Re: [ol-discuss] Open Library Dumps

George Oates Thu, 08 Apr 2010 16:13:03 -0700

Hi Apo,

Sorry the techie talk is a little overwhelming. We do actually have a more 
techie list set up, but ol-discuss has more subscribers, so we tend to abuse it.


In future, we'll try to make tech-related announcements on ol-tech instead.

But! The fact that there are also more technical people subscribed to ol-discuss 
shouldn't prevent you from talking more generally about things other than 
software development!

Regards,
george


Apostolis wrote:
> Since the time I joined this list (about a year ago) I must admit that 
> very little of all these discussions was really understood or could be 
> applied when I was adding new titles. I am a professional with a long 
> academic background, and I am always learning.
> Some of you people are overspecialised in the subject of programming 
> library software.
> We, the silent majority, are not at your level.
> You must understand that, the way you conduct the discussion of 
> proposals for changes and improvements, you just speak among yourselves 
> in an elite group of, let me guess, probably no more than ten persons.
> We the silent majority, the audience, do not understand a word of all 
> the jargon you use about how and where all these proposals will be applied.
> We participate in this list in order to learn to use this great tool of 
> OL and add more material and make corrections.
> My proposal is: you, the highly specialised programming people, must:
>  
> Either dedicate some time and make all this discussion simple, 
> so that even a non computer expert, a non librarian expert, and all the 
> ordinary, down-to-earth people may follow you and be able to get 
> something out of the whole discussion and benefit.
> Or you shift to another group list and let us the non specialists, the 
> amateurs but OL enthusiasts, stay here and keep the 
> discussions at a level which everybody understands. 
> I guess that the non specialists are the majority, so they must stay.
> Those who share my opinion will, I guess, back up my proposal here in 
> writing. Please act now!
>  
> Apo Papageorgiou, Greece
>  
> 2010/4/7 Anand Chitipothu <[email protected]>
> 
>      > I am perhaps assuming more than is warranted, but I think that the
>      > JSON data should always occur in the last column, for ease of
>      > parsing (if the data is comma delimited, and the JSON data contains
>      > commas, it makes it more complex to parse the non-JSON data). I
>      > would suggest making the column order explicit, such as:
>      >
>      > type, key, revision, timestamp, json
> 
>     Sorry, I forgot to mention that the columns will be tab separated. So
>     JSON will not interfere with parsing.
> 
>     I like the idea of keeping the json as the last column. I'm fine with
>     keeping type as the first column.
> 
>      > In this case, if I'm only interested in records last modified
>      > after an arbitrary time it is easy to find the timestamp without
>      > touching the JSON data. Likewise, if I'm only interested in author
>      > records, having each line start with "type/[whatever]" makes it
>      > quite easy to skip over everything that is /not/ an author record.
> 
>     That was the reason for adding those columns.
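
A minimal sketch, in Python, of the filtering described above: reading only the leading columns and parsing the JSON just for the lines that match. It assumes the column order proposed in this thread (type, key, revision, timestamp, json) separated by tabs, a type value written like "/type/author", and ISO 8601 timestamps that can be compared as plain strings. None of those details is confirmed in this thread, and `iter_records` is a hypothetical helper name.

```python
import json

def iter_records(lines, wanted_type=None, modified_after=None):
    """Yield (type, key, revision, timestamp, data) for matching dump lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split off the four leading columns; the remainder is the JSON text.
        rec_type, key, revision, timestamp, raw_json = line.split("\t", 4)
        if wanted_type is not None and rec_type != wanted_type:
            continue
        # Assumed: ISO 8601 timestamps, so string comparison follows time order.
        if modified_after is not None and timestamp <= modified_after:
            continue
        # The JSON column is parsed only for lines that passed both filters.
        yield rec_type, key, int(revision), timestamp, json.loads(raw_json)
```

For example, `iter_records(open("ol_dump.txt"), wanted_type="/type/author")` would walk only the author records, where the file name is a placeholder.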
> 
>      > On a related note, it seems to me that the OpenLibrary catalog data
>      > can be divided generally into two classes: metadata about books and
>      > library holdings, and metadata about the OpenLibrary catalog system
>      > and web interface.
> 
>     The second part is quite small (less than 1%).
>     Do you think it is helpful to generate ol_author_dump, ol_edition_dump
>     and ol_work_dump along with ol_dump? I didn't put them because they
>     can be generated quite easily from ol_dump.
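
As a rough illustration of how the per-type dumps could be derived from ol_dump, the Python sketch below groups lines by their first column. The type values ("/type/author" and so on) and the helper name `split_by_type` are assumptions made for this example; only the dump names come from the message above.

```python
from collections import defaultdict

# Assumed mapping from type values to the per-type dump names in the thread.
DUMP_NAMES = {
    "/type/author": "ol_author_dump",
    "/type/edition": "ol_edition_dump",
    "/type/work": "ol_work_dump",
}

def split_by_type(lines):
    """Group dump lines by target dump name, keeping each line unchanged."""
    buckets = defaultdict(list)
    for line in lines:
        # Only the first column is needed, so split once and stop.
        rec_type = line.split("\t", 1)[0]
        name = DUMP_NAMES.get(rec_type)
        if name is not None:
            buckets[name].append(line)
    return dict(buckets)
```

Lines whose type is not in the mapping (for example, records about the catalog system itself) are simply skipped here; a real script would more likely stream each group to its own file than hold everything in memory.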
> 
>      > I may be terribly wrong, but I would bet that the number of people
>      > who want to recreate the OpenLibrary web interface can be counted on
>      > the fingers of one finger. All the rest are interested in the book
>      > metadata and little else. Thus, I would recommend splitting up the
>      > dump files into two parts: OpenLibrary catalog data about books, and
>      > OpenLibrary data about the OpenLibrary catalog system.
> 
>     We are also planning to use these dumps internally.
> 
>     [snip]
> 
>      >> More columns may be added to the dumps in future if needed. To
>      >> maintain backward-compatibility new columns will always be added at
>      >> the end and the above mentioned column order is maintained. People
>      >> writing code for parsing these dumps should keep this in mind.
>      >
>      > Again, I think JSON data should always appear in the last column. New
>      > columns would therefore be added after the last explicit column and
>      > before the JSON data. If the JSON data always starts with a curly
>      > brace ('{') it should not unduly complicate parsing of a line.
> 
>     I agree.
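
One way a parser might stay compatible with the convention agreed above (fixed leading columns, JSON always last, any new columns in between) is sketched below in Python. It assumes tab-separated lines and JSON serialized without literal tab characters; `parse_line` and the "extra" key are hypothetical names chosen for this example.

```python
import json

# Column names as proposed in this thread; the JSON column is always last.
FIXED_COLUMNS = ("type", "key", "revision", "timestamp")

def parse_line(line):
    """Parse one dump line, tolerating extra columns before the JSON."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(FIXED_COLUMNS) + 1:
        raise ValueError("expected at least 5 tab-separated columns")
    # zip stops after the four fixed names, whatever follows them.
    record = dict(zip(FIXED_COLUMNS, fields))
    # Columns added later sit between the fixed ones and the JSON.
    record["extra"] = fields[len(FIXED_COLUMNS):-1]
    record["json"] = json.loads(fields[-1])
    return record
```

Code written this way reads the first four columns by position and the JSON from the end of the line, so an added column does not shift either one.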
> 
>     Thanks for your feedback, Lee!
> 
>     Anand
>     _______________________________________________
>     Ol-discuss mailing list
>     [email protected] <mailto:[email protected]>
>     http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
>     To unsubscribe from this mailing list, send email to
>     [email protected]
>     <mailto:[email protected]>
> 
> 
> 
