Thanks for all your attention. We are trying to categorize the edition
data by subject. There are almost 25 million editions in the current dump.
We loaded 5 lakh (500,000) random samples into our db to see whether the
data can be categorized from the samples. We were also curious to see how
often subjects repeat. After running our parser script over the OL edition
dump, we saw the corruption mentioned below in several rows. I don't think
it is possible to check 25 million editions manually and correct them, and
I don't think we can filter out the incorrect data with a script either. We
will have to think of another way. Thanks anyway.
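
As a rough illustration of the punctuation problem Karen describes in the
quoted message below, here is a minimal Python sketch. It joins the subfields
of her example heading into one display string and flags fragments such as
"(2nd :" whose parentheses do not balance on their own. It assumes the
subfields arrive in their original order, which Tom notes is not the case in
either the Amazon or the Open Library data, so it would not repair our rows
as they stand. The function names and the " -- " separator before
subdivision subfields are assumptions made for this example, not anything
taken from the OL code.

```python
# Subfield codes treated as subject subdivisions (assumption for this sketch).
SUBDIVISION_CODES = {"v", "x", "y", "z"}


def join_subfields(subfields):
    """Join ordered (code, value) pairs into a single display string.

    Subdivision subfields are preceded by " -- "; everything else is
    separated by a single space. Order is taken as given.
    """
    parts = []
    for code, value in subfields:
        value = value.strip()
        if not parts:
            parts.append(value)
        elif code in SUBDIVISION_CODES:
            parts.append(" -- " + value)
        else:
            parts.append(" " + value)
    return "".join(parts)


def has_unbalanced_parens(text):
    """Return True if the counts of '(' and ')' in text differ."""
    return text.count("(") != text.count(")")


# The heading from Karen's message, in its original subfield order.
heading = [
    ("a", "Vatican Council"),
    ("n", "(2nd :"),
    ("d", "1962-1965)."),
    ("t", "Declaratio de libertate religiosa"),
    ("x", "Congresses."),
]

# Fragments that look broken when stored as separate subjects.
suspect = [value for _, value in heading if has_unbalanced_parens(value)]
display = join_subfields(heading)
```

Here `suspect` picks out the two fragments "(2nd :" and "1962-1965).", and
`display` has balanced parentheses again. Once the order is scrambled, the
flagging still works, but the join does not.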

Rgds,
Sujoy

On Thu, May 3, 2012 at 2:18 AM, Tom Morris <[email protected]> wrote:

> On Wed, May 2, 2012 at 1:43 PM, Karen Coyle <[email protected]> wrote:
> > On 5/2/12 10:22 AM, Ben Companjen wrote:
> >
> >> What the best solution is for you depends on your goals with the data,
> >> I think.
> >> You could write software that tries to combine the "(2nd :" and
> >> "1962-1965)", but I don't know whether anyone would ever use that in a
> >> subject search.
> >
> > Remember that OL organizes subjects in subject pages, it doesn't just
> > allow search. The date-related subjects will gather together books with
> > the same or similar dates. Unfortunately, LCSH (and probably the Amazon
> > subjects) is pretty quirky.
> >
> > I forgot the FAST link on the last message:
> >
> > http://experimental.worldcat.org/fast/1360394/
> >
> > That's the one for a similar heading, but not this exact heading. This
> > heading has five subfields:
> >
> > |a Vatican Council
> > |n (2nd :
> > |d 1962-1965).
> > |t Declaratio de libertate religiosa
> > |x Congresses.
> >
> > and you'll see that "(2nd:" and "1962-1965)" are in separate subfields.
> > There's no way to know that they are supposed to display with "(2nd:
> > 1962-1965)" as a single unit unless you go to the effort of interpreting
> > the punctuation.
>
> You'd also need the pieces in the correct order and the order has been
> scrambled in both Amazon and Open Library.
>
> Tom
> _______________________________________________
> Ol-tech mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> To unsubscribe from this mailing list, send email to
> [email protected]
>