Re: [CODE4LIB] Q: XML2JSON converter
Godmar Back wrote: Hi, Can anybody recommend an open source XML2JSON converter in PHP or Python (or potentially other languages, including XSLT stylesheets)? Ideally, it should implement one of the common JSON conventions, such as Google's JSON convention for GData [1], but anything that preserves all elements, attributes, and text content of the XML file would be acceptable. Note that json_encode(simplexml_load_file(...)) does not meet this requirement - in fact, nothing based on simplexml_load_file() will. (It can't even load MarcXML correctly.) Thanks! - Godmar [1] http://code.google.com/apis/gdata/docs/json.html Hi, try this: http://code.google.com/p/xml2json-xslt/ best, Ulrich -- Dr.-Ing. Ulrich Schaefer http://dfki.de/~uschaefer phone:+496813025154 DFKI Language Technology Lab, D-66123 Saarbruecken, Germany
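For reference, the GData convention Godmar cites can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not Google's converter: attributes become string properties, element text goes in "$t", repeated child elements become arrays, and namespace prefixes are written as "prefix$local" from a prefix map you supply. The `prefixes` argument and all function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def _gdata_name(qname, prefixes):
    """Map ElementTree's '{uri}local' names to GData-style 'prefix$local'.

    URIs missing from `prefixes` keep the '{uri}local' form so nothing is lost.
    """
    if qname.startswith("{"):
        uri, local = qname[1:].split("}", 1)
        if uri in prefixes:
            prefix = prefixes[uri]
            return f"{prefix}${local}" if prefix else local
    return qname


def _element_to_obj(elem, prefixes):
    obj = {}
    # Attributes become plain string properties.
    for name, value in elem.attrib.items():
        obj[_gdata_name(name, prefixes)] = value
    # Text goes in '$t'. Whitespace-only text between child elements is dropped,
    # but leaf text is kept verbatim (MARC data can contain significant spaces).
    if elem.text is not None and (elem.text.strip() or len(elem) == 0):
        obj["$t"] = elem.text
    # Child elements become object properties; a repeated name becomes an array.
    for child in elem:
        key = _gdata_name(child.tag, prefixes)
        value = _element_to_obj(child, prefixes)
        if key not in obj:
            obj[key] = value
        elif isinstance(obj[key], list):
            obj[key].append(value)
        else:
            obj[key] = [obj[key], value]
    return obj


def xml_to_gdata_json(xml_text, prefixes=None):
    """Convert an XML document string to a GData-style JSON string."""
    root = ET.fromstring(xml_text)
    prefixes = prefixes or {}
    return json.dumps({_gdata_name(root.tag, prefixes): _element_to_obj(root, prefixes)})
```

Known gaps in this sketch: text that follows a child element (mixed content) is dropped, ElementTree does not expose xmlns declarations so they are not emitted, an attribute and a child element with the same name would collide, and whether a property is an object or an array depends on how often the element actually occurs rather than on a schema.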
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 3:59 AM, Ulrich Schaefer ulrich.schae...@dfki.de wrote: Hi, try this: http://code.google.com/p/xml2json-xslt/ I should have mentioned that I already tried everything I could find after googling - this stylesheet doesn't meet the requirements, not by far. It drops attributes just like simplexml_json does. The one thing I didn't try is a program called 'BadgerFish.php', which I couldn't locate - Google once indexed it at badgerfish.ning.com - Godmar
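For comparison, the core BadgerFish rules (text in a "$" property, attributes in "@"-prefixed properties, same-named siblings collected into an array) are small enough to sketch directly. The Python below is a hypothetical stand-in, not a port of BadgerFish.php; it leaves out BadgerFish's "@xmlns" namespace handling, so namespaced names keep ElementTree's "{uri}local" form.

```python
import json
import xml.etree.ElementTree as ET


def _to_badgerfish(elem):
    obj = {}
    # Attributes get an '@' prefix; XML names cannot start with '@',
    # so they never clash with child element names.
    for name, value in elem.attrib.items():
        obj["@" + name] = value
    # Text content goes in the '$' property (whitespace-only text between
    # child elements is treated as formatting and skipped).
    if elem.text is not None and (elem.text.strip() or len(elem) == 0):
        obj["$"] = elem.text
    # Same-named siblings become an array.
    for child in elem:
        value = _to_badgerfish(child)
        if child.tag not in obj:
            obj[child.tag] = value
        elif isinstance(obj[child.tag], list):
            obj[child.tag].append(value)
        else:
            obj[child.tag] = [obj[child.tag], value]
    return obj


def xml_to_badgerfish_json(xml_text):
    """Convert an XML document string to a BadgerFish-style JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: _to_badgerfish(root)})
```

For example, xml_to_badgerfish_json('<alice charlie="david">bob</alice>') returns {"alice": {"@charlie": "david", "$": "bob"}}. The "@" prefix is what keeps attributes from disappearing or colliding, which is exactly the information simplexml-based approaches lose.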
Re: [CODE4LIB] Code4Lib Midwest?
+1 ELM, I'm happy to help coordinate in whatever way you need. Also, if we can find a drummer, we could do a blues trio (count me in on bass). I could bring our band's drummer (a HUGE ND fan) down for a day or two if needed--he's awesome. --SG WMU in Kalamazoo - Original Message - From: Eric Lease Morgan emor...@nd.edu To: CODE4LIB@LISTSERV.ND.EDU Sent: Thursday, March 4, 2010 4:38:53 PM Subject: Re: [CODE4LIB] Code4Lib Midwest? On Mar 4, 2010, at 3:29 PM, Jonathan Brinley wrote: 2. share demonstrations I'd like to see this be something like a blend between lightning talks and the ask anything session at the last conference. This certainly works for me, and the length of time of each talk would/could be directly proportional to the number of people who attend. 4. give a presentation to library staff What sort of presentation did you have in mind, Eric? This also raises the issue of weekday vs. weekend. I'm game for either. Anyone else have a preference? What I was thinking here was a possible presentation to library faculty/staff and/or computing faculty/staff from across campus. The presentation could be one or two cool hacks or solutions that solved wider, less geeky problems. Instead of tweaking Solr's term-weighting algorithms to index OAI-harvested content, it would be about making journal articles easier to find. This would be an opportunity to show off the good work done by institutions outside Notre Dame. A prophet in their own land is not as convincing as the expert from afar. I was thinking it would happen on a weekday. There would be more going on here on campus, and it would give everybody a break from their normal work week. More specifically, I would suggest such an event take place on a Friday so the people who stayed overnight would not have to take so many days off of work. 5.
have a hack session It would be good to have 2 or 3 projects we can/should work on decided ahead of time (in case no one has any good ideas at the time), and perhaps a couple more inspired by the earlier presentations. True. -- ELM University of Notre Dame
Re: [CODE4LIB] Code4Lib Midwest?
On Fri, Mar 5, 2010 at 8:37 AM, Scott Garrison scott.garri...@wmich.edu wrote: Also, if we can find a drummer, we could do a blues trio (count me in on bass). If someone can bring drums, I can play them. -- Jonathan M. Brinley jonathanbrin...@gmail.com http://xplus3.net/
Re: [CODE4LIB] Q: XML2JSON converter
Internet Archive seems to have a copy of that: http://web.archive.org/web/20071013052842/badgerfish.ning.com/file.php?format=src&path=lib/BadgerFish.php as well as several versions of the site: http://web.archive.org/web/*/http://badgerfish.ning.com Kevin On Fri, Mar 5, 2010 at 8:15 AM, Godmar Back god...@gmail.com wrote: On Fri, Mar 5, 2010 at 3:59 AM, Ulrich Schaefer ulrich.schae...@dfki.de wrote: Hi, try this: http://code.google.com/p/xml2json-xslt/ I should have mentioned that I already tried everything I could find after googling - this stylesheet doesn't meet the requirements, not by far. It drops attributes just like simplexml_json does. The one thing I didn't try is a program called 'BadgerFish.php' which I couldn't locate - Google once indexed it at badgerfish.ning.com - Godmar
Re: [CODE4LIB] Q: XML2JSON converter
On 3/5/10 8:15 AM, Godmar Back wrote: On Fri, Mar 5, 2010 at 3:59 AM, Ulrich Schaefer ulrich.schae...@dfki.de wrote: Hi, try this: http://code.google.com/p/xml2json-xslt/ I should have mentioned that I already tried everything I could find after googling - this stylesheet doesn't meet the requirements, not by far. It drops attributes just like simplexml_json does. The one thing I didn't try is a program called 'BadgerFish.php' which I couldn't locate - Google once indexed it at badgerfish.ning.com - Godmar Godmar, I'd be interested in collaborating with you on creating one. I'd bounced this question off the CouchDB IRC channel a while back, and the summary was that you'd generally create a JSON structure for your document and then write the code to map the XML to JSON. However, I do think something more generic like Google's GData-to-JSON convention would fit the bill for most use cases... sadly, it doesn't seem they've made the conversion code available. If you're looking at putting MARC into JSON, there was some discussion of that during code4lib 2010. Jonathan Rochkind, who was at code4lib 2010, blogged about marc-json recently: http://bibwild.wordpress.com/2010/03/03/marc-json/ He references a project that Bill Dueber's been playing with for a year: http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/ All told, there's growing momentum for a MARC-in-JSON format to be created, so you might jump in there. Additionally, I'd love to find a project building code to do what Google's done with the GData-to-JSON format. If you find one, I'd enjoy seeing it. Thanks, Godmar, Benjamin -- President BigBlueHat P: 864.232.9553 W: http://www.bigbluehat.com/ http://www.linkedin.com/in/benjaminyoung
Re: [CODE4LIB] Q: XML2JSON converter
You can find it here, although I wouldn't get too excited: http://bit.ly/acROxH You could also fish for more info by badgering its creator at http://www.sklar.com/page/section/contact. Cary On Fri, Mar 5, 2010 at 5:15 AM, Godmar Back god...@gmail.com wrote: On Fri, Mar 5, 2010 at 3:59 AM, Ulrich Schaefer ulrich.schae...@dfki.de wrote: Hi, try this: http://code.google.com/p/xml2json-xslt/ I should have mentioned that I already tried everything I could find after googling - this stylesheet doesn't meet the requirements, not by far. It drops attributes just like simplexml_json does. The one thing I didn't try is a program called 'BadgerFish.php' which I couldn't locate - Google once indexed it at badgerfish.ning.com - Godmar -- Cary Gordon The Cherry Hill Company http://chillco.com
Re: [CODE4LIB] Code4Lib 2011 Proposals
I could say you're a dreamer, but you're not the only one. The reality is that III sees their APIs as gold mines that they can market to a captive audience. For example, their patron API -- a simple web interface to patron records -- probably cost them much less to develop than they get for a single license. Don't look for them to actually get off of that anytime soon. Koha might motivate them to bump their FUD-generating budget, but that's about it. Cary On Thu, Mar 4, 2010 at 2:41 PM, Esme Cowles escow...@ucsd.edu wrote: After seeing some of the cool things people can do with other ILS's and how negative developers are about III, there's always the chance they might decide to open up a bit more and engage with code4lib types (we can always dream). And if that doesn't work, maybe Ian Walls' talk (Becoming Truly Innovative: Migrating from Millennium to Koha) will motivate them... -Esme -- Esme Cowles escow...@ucsd.edu They extend copyrights perpetually. They don't get how that in itself is a form of theft. -- Lawrence Lessig, Free Culture On Mar 4, 2010, at 5:08 PM, Jill Ellern wrote: We tried to get some of the ILS's interested...with little success. But who knows...I did some heavy promotion to III this year...(despite the many --s, she promised to talk to headquarters) so perhaps they might help some next year... Jill -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Paul Joseph Sent: Wednesday, March 03, 2010 9:56 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib 2011 Proposals No need to be concerned about the vendors: they're the same suspects who sponsored C4L10. Paul On Wed, Mar 3, 2010 at 2:37 PM, Ya'aqov Ziso z...@rowan.edu wrote: also, I can assure you that to help keep registration fees low we'll be leaning on our vendors ... Who would these vendors be? Seems CODE4LIB (bringing in creative, leading edge, OpenSource ideas where ILS have monolithically reigned) is the bad dream of ILS vendors.
WorldCat DeveNet/Research may make an exception, but will it be $ufficient? Ya'aqov -- Cary Gordon The Cherry Hill Company http://chillco.com
Re: [CODE4LIB] Q: XML2JSON converter
have you tried this? http://www.bramstein.com/projects/xsltjson/ http://github.com/bramstein/xsltjson Using the parameter use-rayfish=true seems to preserve everything but namespaces, but there is also a parameter to preserve namespaces. Mark On 3/5/2010 12:54 AM, Godmar Back wrote: Hi, Can anybody recommend an open source XML2JSON converter in PHP or Python (or potentially other languages, including XSLT stylesheets)? Ideally, it should implement one of the common JSON conventions, such as Google's JSON convention for GData [1], but anything that preserves all elements, attributes, and text content of the XML file would be acceptable. Note that json_encode(simplexml_load_file(...)) does not meet this requirement - in fact, nothing based on simplexml_load_file() will. (It can't even load MarcXML correctly.) Thanks! - Godmar [1] http://code.google.com/apis/gdata/docs/json.html
Re: [CODE4LIB] Code4Lib Midwest?
+1 I suspect a few of us from OCLC would attend. Ralph -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Scott Garrison Sent: Friday, March 05, 2010 8:37 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib Midwest? +1 ELM, I'm happy to help coordinate in whatever way you need. Also, if we can find a drummer, we could do a blues trio (count me in on bass). I could bring our band's drummer (a HUGE ND fan) down for a day or two if needed--he's awesome. --SG WMU in Kalamazoo
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, 5 Mar 2010, Godmar Back wrote: On Fri, Mar 5, 2010 at 3:59 AM, Ulrich Schaefer ulrich.schae...@dfki.de wrote: Hi, try this: http://code.google.com/p/xml2json-xslt/ I should have mentioned that I already tried everything I could find after googling - this stylesheet doesn't meet the requirements, not by far. It drops attributes just like simplexml_json does. The one thing I didn't try is a program called 'BadgerFish.php' which I couldn't locate - Google once indexed it at badgerfish.ning.com http://web.archive.org/web/20080216200903/http://badgerfish.ning.com/ http://web.archive.org/web/20071013052842/badgerfish.ning.com/file.php?format=src&path=lib/BadgerFish.php -Joe
Re: [CODE4LIB] Code4Lib Midwest?
Exciting opportunity. I bet we could get several people from the Ball State Library IT shop up to ND for this. Jim + James Hammons, M.L.S. Head of Library Technologies Library Information Technology Services voice: (765) 285-8032 Bracken Library fax: (765) 285-1096 Ball State University e-mail: jhamm...@bsu.edu Muncie, IN 47306 http://www.bsu.edu/library U.S.A. Ball State University Libraries A destination for research, learning, and friends + -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of LeVan,Ralph Sent: Friday, March 05, 2010 9:53 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib Midwest? +1 I suspect a few of us from OCLC would attend. Ralph
Re: [CODE4LIB] Code4Lib Midwest?
I would come from Ohio to wherever we choose. Kalamazoo would suit me just fine; I've not been back there in entirely too long! Ken -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Scott Garrison Sent: Friday, March 05, 2010 8:37 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib Midwest? +1 ELM, I'm happy to help coordinate in whatever way you need. Also, if we can find a drummer, we could do a blues trio (count me in on bass). I could bring our band's drummer (a HUGE ND fan) down for a day or two if needed--he's awesome. --SG WMU in Kalamazoo
Re: [CODE4LIB] Q: XML2JSON converter
If PHP/Python isn't a hard requirement, I think this would be fairly simple to do in Perl using a combination of the XML::Simple [1] and JSON::XS [2] modules. In fact it's so simple, here's the code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use JSON::XS;
    use XML::Simple;

    my $filename = shift @ARGV;
    my $parsed   = XMLin($filename);       # parse the XML file into a Perl data structure
    my $json     = encode_json($parsed);   # serialize that structure as UTF-8 JSON
    print $json, "\n";

XML::Simple, in spite of the name, actually allows a myriad of options for how the Perl data structure gets created from the XML, including attribute preservation, grouping of elements, etc. --jay [1] http://search.cpan.org/~grantm/XML-Simple-2.18/lib/XML/Simple.pm [2] http://search.cpan.org/~makamaka/JSON-2.17/lib/JSON.pm On Fri, Mar 5, 2010 at 9:55 AM, Joe Hourcle onei...@grace.nascom.nasa.gov wrote: On Fri, 5 Mar 2010, Godmar Back wrote: On Fri, Mar 5, 2010 at 3:59 AM, Ulrich Schaefer ulrich.schae...@dfki.de wrote: Hi, try this: http://code.google.com/p/xml2json-xslt/ I should have mentioned that I already tried everything I could find after googling - this stylesheet doesn't meet the requirements, not by far. It drops attributes just like simplexml_json does. The one thing I didn't try is a program called 'BadgerFish.php' which I couldn't locate - Google once indexed it at badgerfish.ning.com http://web.archive.org/web/20080216200903/http://badgerfish.ning.com/ http://web.archive.org/web/20071013052842/badgerfish.ning.com/file.php?format=src&path=lib/BadgerFish.php -Joe
Re: [CODE4LIB] Code4Lib Midwest?
I'm pretty sure I could make it from Ann Arbor! On Fri, Mar 5, 2010 at 10:12 AM, Ken Irwin kir...@wittenberg.edu wrote: I would come from Ohio to wherever we choose. Kalamazoo would suit me just fine; I've not been back there in entirely too long! Ken -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Code4Lib 2011 Proposals
I can't see why III would want to have anything to do with this conference. I think most of us who attend the conference are open-source types, and are trying to do things beyond what we could do with the vendors (who are risk-averse and profit-oriented). If III wants to be truly innovative, they should send a technical person (not a salesperson) to this conference. Martin -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Cary Gordon Sent: Friday, March 05, 2010 9:37 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib 2011 Proposals I could say you're a dreamer, but you're not the only one. The reality is that III sees their APIs as gold mines that they can market to a captive audience. For example, their patron API -- a simple web interface to patron records -- probably cost them much less to develop than they get for a single license. Don't look for them to actually get off of that anytime soon. Koha might motivate them to bump their FUD-generating budget, but that's about it. Cary
Re: [CODE4LIB] Q: XML2JSON converter
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Benjamin Young Sent: Friday, March 05, 2010 09:26 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Q: XML2JSON converter If you're looking at putting MARC into JSON, there was some discussion of that during code4lib 2010. Jonathan Rochkind, who was at code4lib 2010, blogged about marc-json recently: http://bibwild.wordpress.com/2010/03/03/marc-json/ He references a project that Bill Dueber's been playing with for a year: http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/ All told, there's growing momentum for a MARC in JSON format to be created, so you might jump in there. Too bad I didn't attend code4lib. OCLC Research has created a version of MARC in JSON and will probably release FAST concepts in MARC binary, MARC-XML and our MARC-JSON format, among other formats. I'm wondering whether there is some consensus that can be reached and standardized at LC's level, just like OCLC, RLG and LC came to consensus on MARC-XML. Unfortunately, I have not had the time to document the format, although it is fairly straightforward, and yes, we have an XSLT to convert from MARC-XML to MARC-JSON. Basically the format I'm using is:

    [ ... ]

which represents a collection of MARC records, or

    { ... }

which represents a single MARC record that takes the form:

    {
      "leader" : "01192cz a2200301n 4500",
      "controlfield" :
      [
        { "tag" : "001", "data" : "fst01303409" },
        { "tag" : "003", "data" : "OCoLC" },
        { "tag" : "005", "data" : "20100202194747.3" },
        { "tag" : "008", "data" : "060620nn anznnbabn || ana d" }
      ],
      "datafield" :
      [
        {
          "tag" : "040",
          "ind1" : " ",
          "ind2" : " ",
          "subfield" :
          [
            { "code" : "a", "data" : "OCoLC" },
            { "code" : "b", "data" : "eng" },
            { "code" : "c", "data" : "OCoLC" },
            { "code" : "d", "data" : "OCoLC-O" },
            { "code" : "f", "data" : "fast" }
          ]
        },
        {
          "tag" : "151",
          "ind1" : " ",
          "ind2" : " ",
          "subfield" :
          [
            { "code" : "a", "data" : "Hawaii" },
            { "code" : "z", "data" : "Diamond Head" }
          ]
        }
      ]
    }
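This layout maps one-to-one onto MARC-XML elements, so producing it is mostly bookkeeping. The following is a minimal Python sketch, not OCLC's XSLT, assuming records in the standard MARC 21 slim namespace and using the key names from the example above (leader, controlfield, datafield, tag, ind1, ind2, subfield, code, data); the function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# MARC 21 slim namespace, in ElementTree's '{uri}' qualified-name form.
MARC = "{http://www.loc.gov/MARC21/slim}"


def marcxml_record_to_dict(record):
    """Map one MARC-XML <record> element onto the leader/controlfield/datafield layout."""
    return {
        "leader": record.findtext(MARC + "leader", default=""),
        "controlfield": [
            {"tag": cf.get("tag"), "data": cf.text or ""}
            for cf in record.findall(MARC + "controlfield")
        ],
        "datafield": [
            {
                "tag": df.get("tag"),
                # A missing indicator is treated as blank, i.e. a single space.
                "ind1": df.get("ind1", " "),
                "ind2": df.get("ind2", " "),
                "subfield": [
                    {"code": sf.get("code"), "data": sf.text or ""}
                    for sf in df.findall(MARC + "subfield")
                ],
            }
            for df in record.findall(MARC + "datafield")
        ],
    }


def marcxml_to_json(xml_text):
    """A <collection> becomes a JSON array of records; a lone <record> becomes one object."""
    root = ET.fromstring(xml_text)
    if root.tag == MARC + "collection":
        return json.dumps([marcxml_record_to_dict(r) for r in root.findall(MARC + "record")])
    return json.dumps(marcxml_record_to_dict(root))
```

Because leaf text is copied verbatim, significant spaces in the leader and 008 survive the conversion.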
[CODE4LIB] UBC jobs
Anyone know the status of library systems jobs opening up at UBC? I don't see any posted yet on their site (http://hr.ubc.ca/careers/staff_postings.html), but heard there would be 4 positions open soon. I'll be moving to Vancouver next month, and am looking for work there. Thanks for any information you can provide! Jill Earles
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 12:01 PM, Houghton,Andrew hough...@oclc.org wrote: Too bad I didn't attend code4lib. OCLC Research has created a version of MARC in JSON and will probably release FAST concepts in MARC binary, MARC-XML and our MARC-JSON format among other formats. I'm wondering whether there is some consensus that can be reached and standardized at LC's level, just like OCLC, RLG and LC came to consensus on MARC-XML. Unfortunately, I have not had the time to document the format, although it fairly straight forward, and yes we have an XSLT to convert from MARC-XML to MARC-JSON. Basically the format I'm using is: The stuff I've been doing: http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/ ... is pretty much the same, except: 1. I don't explicitly split up control and data fields. There's a single field list; an item that has two elements is a control field (tag/data); one with four is a data field (tag / ind1 / ind2 / array_of_subfields). 2. Instead of putting a collection in a big JSON array, I use newline-delimited JSON (basically, just stick one record on each line as a single JSON hash). This has the advantage that it makes streaming much, much easier, and makes doing some other things (e.g., grabbing the first record or two) much cheaper for even the dumbest JSON parser. I'm not sure what the state of JSON streaming parsers is; I know Jackson (for Java) can do it, and Perl's JSON::XS can...kind of...but it's not great. 3. I include a type (MARC-JSON, MARC-HASH, whatever) and version: [major, minor] in each record. There's already a ton of JSON floating around the library world; labeling what the heck a structure is is just friendly :-) MARC's structure is dumb enough that we collectively basically can't miss; there's only so much you can do with the stuff, and a round-trip to JSON and back is easy to implement. I'm not super-against explicitly labeling the data elements (tag:, ind1:, etc.)
but I don't see where it's necessary unless you're planning on adding out-of-band data to the records/fields/subfields at some point. Which might be kinda cool (e.g., language hints on a per-subfield basis? Tokenization hints for non-whitespace-delimited languages? URIs for unique concepts and authorities where they exist for easy creation of RDF?) I *am*, however, willing to push and push and push for NDJ instead of having to deal with streaming JSON parsing, which to my limited understanding is hard to get right and to my more qualified understanding is a pain in the ass to work with. And anything we do should explicitly be UTF-8 only; converting from MARC-8 is a problem for the server, not the receiver. Support for what I've been calling marc-hash (I like to decouple it from the eventual JSON format in case the serialization preferences change, or at least so implementations don't get stuck with a single JSON library) is already baked into ruby-marc, and obviously implementations are dead-easy no matter what the underlying language is. Anyone from the LoC want to get in on this? -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
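Bill's three differences (one ordered field list of short arrays, one record per line, and a type/version label) can be sketched in Python as follows. The key names "type", "version", "leader" and "fields" and the "marc-hash" value are assumptions for illustration, not taken from ruby-marc; the input is a record in Andy's leader/controlfield/datafield layout.

```python
import io
import json


def to_marc_hash(rec):
    """Flatten a leader/controlfield/datafield record into one ordered field list.

    Control fields become [tag, data]; data fields become
    [tag, ind1, ind2, [[code, data], ...]].
    """
    fields = [[cf["tag"], cf["data"]] for cf in rec["controlfield"]]
    for df in rec["datafield"]:
        subfields = [[sf["code"], sf["data"]] for sf in df["subfield"]]
        fields.append([df["tag"], df["ind1"], df["ind2"], subfields])
    # Hypothetical labelling keys, so a consumer knows what structure it has.
    return {"type": "marc-hash", "version": [1, 0],
            "leader": rec["leader"], "fields": fields}


def write_ndj(records, out):
    """Write newline-delimited JSON: one complete record per line."""
    for rec in records:
        # json.dumps escapes newlines inside strings, so a record never spans lines;
        # ensure_ascii=False keeps non-ASCII text as-is for a UTF-8 output file.
        out.write(json.dumps(to_marc_hash(rec), ensure_ascii=False) + "\n")


def read_ndj(lines):
    """Stream records back one line at a time; no streaming JSON parser needed."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


if __name__ == "__main__":
    sample = {
        "leader": "01192cz a2200301n 4500",
        "controlfield": [{"tag": "001", "data": "fst01303409"}],
        "datafield": [{"tag": "151", "ind1": " ", "ind2": " ",
                       "subfield": [{"code": "a", "data": "Hawaii"}]}],
    }
    buf = io.StringIO()
    write_ndj([sample, sample], buf)
    buf.seek(0)
    print(next(read_ndj(buf))["fields"])
```

Grabbing "the first record or two" is then just reading the first line or two of the file; when writing to disk, open it with encoding="utf-8" to match the UTF-8-only suggestion.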
Re: [CODE4LIB] Q: XML2JSON converter
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Bill Dueber Sent: Friday, March 05, 2010 12:30 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Q: XML2JSON converter On Fri, Mar 5, 2010 at 12:01 PM, Houghton,Andrew hough...@oclc.org wrote: Too bad I didn't attend code4lib. OCLC Research has created a version of MARC in JSON and will probably release FAST concepts in MARC binary, MARC-XML and our MARC-JSON format among other formats. I'm wondering whether there is some consensus that can be reached and standardized at LC's level, just like OCLC, RLG and LC came to consensus on MARC-XML. Unfortunately, I have not had the time to document the format, although it is fairly straightforward, and yes, we have an XSLT to convert from MARC-XML to MARC-JSON. Basically the format I'm using is: The stuff I've been doing: http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/ ... is pretty much the same, except:

I decided to stick closer to a MARC-XML type definition since it would be easier to explain how the two specifications are related, rather than take a more radical approach in producing a less familiar specification. Not to say that other approaches are bad; they just have different advantages and disadvantages. I was going for simple and familiar.

I certainly would be willing to work with LC on creating a MARC-JSON specification as I did in creating the MARC-XML specification.

Andy.
Re: [CODE4LIB] Code4Lib 2011 Proposals
Hiya - San Diego is friggin expensive, and we don't have a small campus feel at all. Robert McDonald and I worked out the costs a few years ago and we'd be almost double what Asheville conf cost folks. It's killing me not to have you all out to paradise in Feb, but I can barely afford to live here :) D

-Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Walter Lewis Sent: Wednesday, March 03, 2010 11:21 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib 2011 Proposals On 3 Mar 10, at 9:52 AM, Julia Bauder wrote: Also, the farther north we go, the more likely that snow+airplane incompatibilities will foil speakers' (and attendees'!) travel plans at the last minute, which isn't fun for anyone. somewhere_out_of_nor'easter_and_lake_effect_range_in_february++

Actually there is a clear line (at least on the eastern half of the continent) where the further north you go, the *less* snow you got this winter. Buffalo is trailing a number of places on the east coast in total snow accumulation and Toronto has been dusted a few times this winter, with nothing of real substance. Detroit and Chicago were well below seasonal averages last time I checked. ALL of that said, where are the San Diego gang or the folks from Miami? Walter who can only dream of pubs with open patios in February
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 1:10 PM, Houghton,Andrew hough...@oclc.org wrote: I decided to stick closer to a MARC-XML type definition since it would be easier to explain how the two specifications are related, rather than take a more radical approach in producing a less familiar specification. Not to say that other approaches are bad; they just have different advantages and disadvantages. I was going for simple and familiar.

That makes sense, but please consider adding a format/version (which we get in MARC-XML from the namespace and isn't present here). In fact, please consider adding a format / version / URI, so people know what they've got.

I'm also going to again push the newline-delimited-json stuff. The collection-as-array is simple and very clean, but leads to trouble for production (where for most of us we'd have to get the whole freakin' collection in memory first and then call JSON.dump or whatever) or consumption (have to deal with a streaming json parser). The production part is particularly worrisome, since I'd hate for everyone to have to default to writing out a '[', looping through the records, and writing a ']'. Yeah, it's easy enough, but it's an ugly hack that *everyone* would have to do, as opposed to just something like:

while (r = nextRecord) { print r.to_json, "\n" }

Unless, of course, writing json to a stream and reading json from a stream is a lot easier than I make it out to be across a variety of languages and I just don't know it, which is entirely possible. The streaming writer interfaces for Perl (http://search.cpan.org/dist/JSON-Streaming-Writer/lib/JSON/Streaming/Writer.pm) and Java's Jackson (http://wiki.fasterxml.com/JacksonInFiveMinutes#Streaming_API_Example) are a little more daunting than I'd like them to be.

Not wanting to argue unnecessarily, here; just adding input before things get effectively set in stone.

-Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
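The contrast Bill draws can be sketched in Python with only the standard json module. The newline-delimited writer emits each record as soon as it is produced; writing a single JSON array incrementally needs the hand-written bracket and comma bookkeeping he calls an ugly hack (the alternative is holding the whole list in memory for one json.dump call). The function names, the placeholder leader value and the in-memory streams are assumptions for illustration only.

```python
import io
import json

def write_ndjson(records, out):
    """One JSON document per line; memory use is bounded by a single record."""
    for rec in records:
        out.write(json.dumps(rec, ensure_ascii=False))
        out.write("\n")

def write_json_array(records, out):
    """Streamed JSON array: brackets and separators have to be managed by hand."""
    out.write("[")
    for i, rec in enumerate(records):
        if i:
            out.write(",")
        out.write(json.dumps(rec, ensure_ascii=False))
    out.write("]")

def record_source():
    # Hypothetical stand-in for a generator pulling records from a catalogue;
    # the leader is a placeholder value, not real data.
    for n in range(3):
        yield {"leader": "00000nam a2200000 a 4500", "id": n}

ndj, arr = io.StringIO(), io.StringIO()
write_ndjson(record_source(), ndj)
write_json_array(record_source(), arr)
```

Both outputs carry the same records; the difference is that the newline-delimited form needs no state between records, which is what makes Bill's one-line loop possible.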
Re: [CODE4LIB] Q: XML2JSON converter
On 3/5/10 1:10 PM, Houghton,Andrew wrote: I decided to stick closer to a MARC-XML type definition since it would be easier to explain how the two specifications are related, rather than take a more radical approach in producing a less familiar specification. Not to say that other approaches are bad; they just have different advantages and disadvantages. I was going for simple and familiar. I certainly would be willing to work with LC on creating a MARC-JSON specification as I did in creating the MARC-XML specification. Andy.

A CouchDB friend of mine just pointed me to the BibJSON format by the Bibliographic Knowledge Network: http://www.bibkn.org/bibjson/index.html Might be worth looking through for future collaboration/transformation options.

Benjamin
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 1:10 PM, Houghton,Andrew hough...@oclc.org wrote: I certainly would be willing to work with LC on creating a MARC-JSON specification as I did in creating the MARC-XML specification.

Quite frankly, I think I (and I imagine others) would much rather see a more open, RFC-style process to creating a marc-json spec than "I talked to LC and here you go." Maybe I'm misreading this last paragraph a bit, however.

-Ross.
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 2:06 PM, Benjamin Young byo...@bigbluehat.com wrote: A CouchDB friend of mine just pointed me to the BibJSON format by the Bibliographic Knowledge Network: http://www.bibkn.org/bibjson/index.html Might be worth looking through for future collaboration/transformation options.

marc-json and BibJSON serve two different purposes: marc-json would need to be a lossless serialization of a MARC record which may or may not contain bibliographic data (it may be an authority, holding or CID record, for example). BibJSON is more of a merging of data model and serialization (which, admittedly, is no stranger to MARC) for the purpose of bibliographic /citations/. So it will probably be lossy and there would most likely be a lot of MARC data that is out of scope.

That's not to say it wouldn't be useful to figure out how to get from MARC to BibJSON, but from my perspective it's difficult to see the advantage it brings (being tied to JSON) vs. BIBO.

-Ross.
Re: [CODE4LIB] Q: XML2JSON converter
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Bill Dueber Sent: Friday, March 05, 2010 01:59 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Q: XML2JSON converter On Fri, Mar 5, 2010 at 1:10 PM, Houghton,Andrew hough...@oclc.org wrote: I decided to stick closer to a MARC-XML type definition since it would be easier to explain how the two specifications are related, rather than take a more radical approach in producing a less familiar specification. Not to say that other approaches are bad; they just have different advantages and disadvantages. I was going for simple and familiar. That makes sense, but please consider adding a format/version (which we get in MARC-XML from the namespace and isn't present here). In fact, please consider adding a format / version / URI, so people know what they've got.

This sounds reasonable and I'll consider adding it to our specification.

I'm also going to again push the newline-delimited-json stuff. The collection-as-array is simple and very clean, but leads to trouble for production (where for most of us we'd have to get the whole freakin' collection in memory first ...

As far as our MARC-JSON specification is concerned, a server application can return either a collection or a record, which mimics the MARC-XML specification, where either the collection or the record element can be used as the document element.

Unless, of course, writing json to a stream and reading json from a stream is a lot easier than I make it out to be across a variety of languages and I just don't know it, which is entirely possible. The streaming writer interfaces for Perl (http://search.cpan.org/dist/JSON-Streaming-Writer/lib/JSON/Streaming/Writer.pm) and Java's Jackson (http://wiki.fasterxml.com/JacksonInFiveMinutes#Streaming_API_Example) are a little more daunting than I'd like them to be.

As you point out, JSON streaming doesn't work with all clients, and I am hesitant to build on anything that all clients cannot accept.
I think part of the issue here is proper API design. Sending tens of megabytes back to a client and expecting them to process it seems like a poor API design regardless of whether they can stream it or not. It might make more sense to have a server API send back 10 of our MARC-JSON records in a JSON collection and have the client request an additional batch of records for the result set. In addition, if I remember correctly, JSON streaming or other streaming methods keep the connection to the server open, which is not a good thing to do to maintain server throughput.

Andy.
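Andy's batching suggestion can be sketched as a server-side helper that returns one page of records at a time, plus a client loop that follows the pages until none remain. The page size of 10 comes from his message; the response keys ("records", "next") and the function names are hypothetical, and a real service would pass the offset in an HTTP request rather than a direct function call.

```python
def get_page(result_set, start=0, page_size=10):
    """Server side: one batch of records plus the offset of the next batch (or None)."""
    batch = result_set[start:start + page_size]
    next_start = start + page_size
    return {
        "records": batch,
        "next": next_start if next_start < len(result_set) else None,
    }

def fetch_all(result_set, page_size=10):
    """Client side: keep requesting batches until the server reports no next page."""
    start = 0
    while start is not None:
        page = get_page(result_set, start, page_size)
        yield from page["records"]
        start = page["next"]

results = [{"id": n} for n in range(25)]
first = get_page(results)
print(len(first["records"]), first["next"])  # 10 10
print(sum(1 for _ in fetch_all(results)))     # 25
```

Each response is a small, complete JSON collection, so no streaming parser is needed on either side; the trade-off is one round trip per batch.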
Re: [CODE4LIB] Q: XML2JSON converter
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Benjamin Young Sent: Friday, March 05, 2010 02:06 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Q: XML2JSON converter A CouchDB friend of mine just pointed me to the BibJSON format by the Bibliographic Knowledge Network: http://www.bibkn.org/bibjson/index.html Might be worth looking through for future collaboration/transformation options. Unfortunately, it doesn't really work for authority and classification data that I'm frequently involved with. Andy.
Re: [CODE4LIB] Q: XML2JSON converter
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ross Singer Sent: Friday, March 05, 2010 02:32 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Q: XML2JSON converter On Fri, Mar 5, 2010 at 1:10 PM, Houghton,Andrew hough...@oclc.org wrote: I certainly would be willing to work with LC on creating a MARC-JSON specification as I did in creating the MARC-XML specification. Quite frankly, I think I (and I imagine others) would much rather see a more open, RFC-style process to creating a marc-json spec than "I talked to LC and here you go." Maybe I'm misreading this last paragraph a bit, however.

Yes, you misread the last paragraph.

Andy.
Re: [CODE4LIB] Q: XML2JSON converter
On 3/5/10 2:46 PM, Ross Singer wrote: On Fri, Mar 5, 2010 at 2:06 PM, Benjamin Young byo...@bigbluehat.com wrote: A CouchDB friend of mine just pointed me to the BibJSON format by the Bibliographic Knowledge Network: http://www.bibkn.org/bibjson/index.html Might be worth looking through for future collaboration/transformation options. marc-json and BibJSON serve two different purposes: marc-json would need to be a lossless serialization of a MARC record which may or may not contain bibliographic data (it may be an authority, holding or CID record, for example). BibJSON is more of a merging of data model and serialization (which, admittedly, is no stranger to MARC) for the purpose of bibliographic /citations/. So it will probably be lossy and there would most likely be a lot of MARC data that is out of scope. That's not to say it wouldn't be useful to figure out how to get from MARC to BibJSON, but from my perspective it's difficult to see the advantage it brings (being tied to JSON) vs. BIBO. -Ross.

Thanks for the clarification, Ross. I thought it would be helpful (if nothing else) to see how data was being mapped in a related domain into and out of JSON. I'm new to library data in general, so I appreciate the clarification on which format is for what.

Appreciated, Benjamin
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 3:14 PM, Houghton,Andrew hough...@oclc.org wrote: As you point out JSON streaming doesn't work with all clients and I am hesitant to build on anything that all clients cannot accept. I think part of the issue here is proper API design. Sending tens of megabytes back to a client and expecting them to process it seems like a poor API design regardless of whether they can stream it or not. It might make more sense to have a server API send back 10 of our MARC-JSON records in a JSON collection and have the client request an additional batch of records for the result set. In addition, if I remember correctly, JSON streaming or other streaming methods keep the connection to the server open which is not a good thing to do to maintain server throughput.

I guess my concern here is that the specification, as you're describing it, is closing off potential uses. It seems fine if, for example, your primary concern is javascript-in-the-browser, and browser-request, pagination-enabled systems might be all you're worried about right now. That's not the whole universe of uses, though. People are going to want to dump these things into a file to read later -- no possibility for pagination in that situation. Others may, in fact, want to stream a few thousand records down the pipe at once, but without a streaming parser that can't happen if it's all one big array.

I worry that as specified, the *only* use will be, "Pull these down a thin pipe, and if you want to keep them for later, or want a bunch of them, you have to deal with marc-xml." Part of my incentive is to *not* have to use marc-xml, but in this case I'd just be trading one technology I don't like (marc-xml) for two technologies, one of which I don't like (that'd be marc-xml again).

I really do understand the desire to make this parallel to marc-xml, but there's a seam between the two technologies that makes that a problematic approach.
-- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] Q: XML2JSON converter
-Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Bill Dueber I really do understand the desire to make this parallel to marc-xml, but there's a seam between the two technologies that makes that a problematic approach.

As a confession, here in OCLC Research, we do pass around files of marc-xml records that are newline delimited without a wrapper element containing them. We do that for all the reasons you gave for wanting the same thing for JSON records.

Ralph
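The practice Ralph describes (one MARC-XML record per line, with no wrapping collection element) can be read with Python's standard xml.etree.ElementTree by parsing each line as its own small document. This sketch assumes every record was serialized without internal line breaks; the helper name and the placeholder 001 values are illustrative only, while the namespace is the standard MARC 21 slim namespace.

```python
import io
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

def iter_marcxml_lines(stream):
    """Yield one parsed <record> element per non-blank line."""
    for line in stream:
        line = line.strip()
        if line:
            yield ET.fromstring(line)

# Two placeholder records, one per line, with no <collection> wrapper.
sample = io.StringIO(
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    '<controlfield tag="001">rec-1</controlfield></record>\n'
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    '<controlfield tag="001">rec-2</controlfield></record>\n'
)

ids = [rec.find(MARC_NS + "controlfield").text for rec in iter_marcxml_lines(sample)]
print(ids)
```

Because each line is a complete, well-formed document, the reader never needs an event-based XML parser to keep memory bounded.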
Re: [CODE4LIB] Q: XML2JSON converter
For my part, I'd like to explore the options of putting MARC data into CouchDB (which stores documents as JSON) which could then open the door for replicating that data between any number of installations of CouchDB as well as providing for various output formats (marc-xml, etc). It's just an idea, but it's one that uses JSON outside of the browser and is a good proof case for any MARC in JSON format. Thanks, Benjamin -- President BigBlueHat P: 864.232.9553 W: http://www.bigbluehat.com/ http://www.linkedin.com/in/benjaminyoung
Re: [CODE4LIB] Q: XML2JSON converter
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Bill Dueber Sent: Friday, March 05, 2010 03:45 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Q: XML2JSON converter I guess my concern here is that the specification, as you're describing it, is closing off potential uses. It seems fine if, for example, your primary concern is javascript-in-the-browser, and browser-request, pagination-enabled systems might be all you're worried about right now. That's not the whole universe of uses, though. People are going to want to dump these things into a file to read later -- no possibility for pagination in that situation.

I disagree that you couldn't dump a paginated result set into a file for reading later. I do this all the time, not only in JavaScript, but in many other programming languages.

Others may, in fact, want to stream a few thousand records down the pipe at once, but without a streaming parser that can't happen if it's all one big array.

Well, if your service isn't allowing them to be streamed a few thousand records at a time, then that isn't an issue :)

Maybe I have been misled or have misunderstood JSON streaming. My understanding was that you can generate an arbitrarily large outgoing stream on the server side and can read an arbitrarily large incoming stream on the client side. So it shouldn't matter if the result set was delivered as one big JSON array. The SAX-like interface that JSON streaming uses provides the necessary events to allow you to pull the individual records from that arbitrarily large array.

I worry that as specified, the *only* use will be, "Pull these down a thin pipe, and if you want to keep them for later, or want a bunch of them, you have to deal with marc-xml."

I don't quite follow this. MARC-XML is an XML format; MARC-JSON is our JSON format for expressing the various MARC-21 formats, e.g., authority, bibliographic, classification, community information and holdings, in JSON.
The JSON is based on the structure of MARC-XML, which was based on the structure of ISO 2709. I don't see how MARC-XML comes into play when you are dealing with JSON. If you want to save our MARC-JSON, you don't have to convert it to MARC-XML on the client side. Just save it as a text file.

Part of my incentive is to *not* have to use marc-xml, but in this case I'd just be trading one technology I don't like (marc-xml) for two technologies, one of which I don't like (that'd be marc-xml again).

Again, I'm not sure how to address this concern. If you are dealing with library data, then its current communication formats are either MARC binary (ISO 2709) or MARC-XML, ignoring IFLA's MARC-XML-ish format for the moment. You might not like it, but that is life in library land. You can go develop your own formats based on the various MARC-21 format specifications, but you are unlikely to achieve any sort of interoperability with the existing library systems and services.

We chose our MARC-JSON structure to maintain the structural components of MARC-XML and hence MARC binary (ISO 2709). In MARC, control fields have different semantics from data fields and you cannot merge them into one thing called "field". If you look closely at the MARC-XML schema, you might notice that the controlfield and datafield elements can have non-numeric tags. If you merge everything into something called "field", then you cannot distinguish between a non-numeric tag for a controlfield vs. a datafield element. There are valid reasons why we decided to maintain the existing structure of MARC.

Andy.
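For comparison, the SAX-like approach Andy describes (pulling individual records out of one arbitrarily large JSON array) can be approximated with the standard library's json.JSONDecoder.raw_decode over a buffer that is refilled in chunks. This is a hand-rolled sketch, not a real streaming-parser library, and it rests on stated assumptions: the array's elements are JSON objects or arrays (as MARC-JSON records are), and commas between elements are skipped leniently rather than validated.

```python
import io
import json

def iter_json_array(stream, chunk_size=65536):
    """Yield the elements of a top-level JSON array without decoding it all at once.

    Assumes every element is a JSON object or array; a bare number split
    across two chunks could otherwise be decoded too early.
    """
    decoder = json.JSONDecoder()
    buf = ""
    started = False
    while True:
        chunk = stream.read(chunk_size)
        buf += chunk
        pos = 0
        while True:
            # Skip whitespace and the commas between elements (leniently).
            while pos < len(buf) and buf[pos] in " \t\r\n,":
                pos += 1
            if pos == len(buf):
                break
            if not started:
                if buf[pos] != "[":
                    raise ValueError("expected a top-level JSON array")
                started = True
                pos += 1
                continue
            if buf[pos] == "]":
                return
            try:
                obj, pos = decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                break  # element is incomplete: read another chunk
            yield obj
        buf = buf[pos:]  # keep only the unconsumed tail
        if not chunk:
            raise ValueError("input ended before the closing ']'")

data = '[{"tag": "001"}, {"tag": "003"},\n {"tag": "005"}]'
tags = [rec["tag"] for rec in iter_json_array(io.StringIO(data), chunk_size=4)]
print(tags)
```

It works, but the buffering, separator handling and partial-element retries are exactly the kind of per-language bookkeeping the newline-delimited alternative avoids.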
Re: [CODE4LIB] Q: XML2JSON converter
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Benjamin Young Sent: Friday, March 05, 2010 04:24 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Q: XML2JSON converter For my part, I'd like to explore the options of putting MARC data into CouchDB (which stores documents as JSON) which could then open the door for replicating that data between any number of installations of CouchDB as well as providing for various output formats (marc-xml, etc). It's just an idea, but it's one that uses JSON outside of the browser and is a good proof case for any MARC in JSON format. This was partly the reason why I developed our MARC-JSON format since I'm using MongoDB [1] which is a NoSQL database based on JSON. Andy. [1] http://www.mongodb.org/display/DOCS/Home
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 4:38 PM, Houghton,Andrew hough...@oclc.org wrote: Maybe I have been misled or misunderstood JSON streaming.

This is my central point. I'm actually saying that JSON streaming is painful and rare enough that it should be avoided as a requirement for working with any new format. I guess, in sum, I'm making the following assertions:

1. Streaming APIs for JSON, where they exist, are a pain in the ass. And they don't exist everywhere. Without a JSON streaming parser, you have to pull the whole array of documents up into memory, which may be impossible. This is the crux of my argument -- if you disagree with it, then I would assume you disagree with the other points as well.
2. Many people -- and I don't think I'm exaggerating here, honestly -- really don't like using MARC-XML but have to because of the length restrictions on MARC-binary. A useful alternative, based on dead-easy parsing and production, is very appealing.
2.5 Having to deal with a streaming API takes away the "dead-easy" part.
3. If you accept my assertions about streaming parsers, then dealing with the format you've proposed for large sets is either painful (with a streaming API) or impossible (where such an API doesn't exist) due to memory constraints.
4. Streaming JSON writer APIs are also painful; everything that applies to reading applies to writing. Sans a streaming writer, trying to *write* a large JSON document also results in you having to have the whole thing in memory.
5. People are going to want to deal with this format, because of its benefits over marc21 (record length) and marc-xml (ease of processing), which means we're going to want to deal with big sets of data and/or dump batches of it to a file. Which brings us back to #1, the pain or absence of streaming APIs.

"Write a better JSON parser/writer" or "use a different language" seem like bad solutions to me, especially when a (potentially) useful alternative exists.
As I pointed out, if streaming JSON is no harder/unavailable to you than non-streaming json, then this is mostly moot. I assert that for many people in this community it is one or the other, which is why I'm leery of it. -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
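Bill's alternative needs no streaming parser at all: each line is an ordinary JSON document, so a reader only ever decodes one record at a time. A minimal Python sketch, assuming UTF-8 newline-delimited input as he proposes; the helper name and the sample data are illustrative.

```python
import io
import json

def iter_ndjson(stream):
    """Yield one decoded record per non-blank line of newline-delimited JSON."""
    for lineno, line in enumerate(stream, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"bad record on line {lineno}: {err}") from err

data = '{"leader": "a"}\n{"leader": "b"}\n\n{"leader": "c"}\n'
records = iter_ndjson(io.StringIO(data))
first = next(records)  # only the first line has been parsed so far
rest = [r["leader"] for r in records]
print(first["leader"], rest)
```

With a real file, the same generator works on open(path, encoding="utf-8"), and grabbing "the first record or two" costs no more than reading the first line or two.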
Re: [CODE4LIB] Code4Lib 2011 Proposals
Miami is also very expensive; it's now considered one of the top 3 most expensive places to live. Plus, I must add that Feb is also our high season, which means hotel rates and airfares are more than double the usual rates. We also have a poor public transportation system... sorry, unless someone else in Florida can host the conference.

Vanessa Meireles v.meire...@miami.edu Computer Programmer, Information Mgmt Systems and Digital Initiatives University of Miami Richter Library Coral Gables, FL 33124-0320 URL: http://www.library.miami.edu/

-Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Fleming, Declan Sent: Friday, March 05, 2010 1:24 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib 2011 Proposals Hiya - San Diego is friggin expensive, and we don't have a small campus feel at all. Robert McDonald and I worked out the costs a few years ago and we'd be almost double what Asheville conf cost folks. It's killing me not to have you all out to paradise in Feb, but I can barely afford to live here :) D
Re: [CODE4LIB] Code4Lib 2011 Proposals
Anyone interested in Burlington, Vt.? If I had some help (and the deadline extended a couple days) I'd be willing to throw in the hat. Sibyl Schaefer University of Vermont

On Fri, Mar 5, 2010 at 5:22 PM, Meireles, Vanessa v.meire...@miami.edu wrote: Miami is also very expensive, it's considered top 3 now in the most expensive places to live, plus I must add that Feb is also our high season which means hotel rates and airfares are more than double the usual rates. We also have a poor public transportation system... sorry, unless someone else in Florida can host the conference. Vanessa Meireles v.meire...@miami.edu Computer Programmer, Information Mgmt Systems and Digital Initiatives University of Miami Richter Library Coral Gables, FL 33124-0320 URL: http://www.library.miami.edu/
Re: [CODE4LIB] Code4Lib 2011 Proposals
Hi Sibyl, I'd love Burlington. It might not be warm, but there are a lot of good winter activities. However, it is probably too late for this year to find out what the costs, etc. are, but if you want to put in a proposal for 2012, count me in. Edward

On Fri, Mar 5, 2010 at 5:53 PM, Sibyl Schaefer sibylschae...@gmail.com wrote: Anyone interested in Burlington, Vt.? If I had some help (and the deadline extended a couple days) I'd be willing to throw in the hat. Sibyl Schaefer University of Vermont
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 6:25 PM, Houghton,Andrew hough...@oclc.org wrote: OK, I will bite. You stated: 1. That large datasets are a problem. 2. That streaming APIs are a pain to deal with. 3. That tool sets have memory constraints. So how do you propose to process large JSON datasets that: 1. Comply with the JSON specification. 2. Can be read by any JavaScript/JSON processor. 3. Do not require the use of a streaming API. 4. Do not exceed the memory limitations of current JSON processors. What I'm proposing is that we don't process large JSON datasets; I'm proposing that we process smallish JSON documents one at a time by pulling them out of a stream based on an end-of-record character. This is basically what we use for MARC21 binary format -- have a defined structure for a valid record, and separate multiple well-formed record structures with an end-of-record character. This preserves JSON specification adherence at the record level and uses a different scheme to represent collections. Obviously, MARC-XML uses a different mechanism to define a collection of records -- putting well-formed record structures inside a collection tag. So... I'm proposing we define what we mean by a single MARC record serialized to JSON (in whatever format; I'm not very opinionated on this point) that preserves the order, indicators, tags, data, etc. we need to round-trip between marc21binary, marc-xml, and marc-json. And then separate those valid records with an end-of-record character -- \n. Unless I've read all this wrong, you've come to the conclusion that the benefit of having a JSON serialization that is valid JSON at both the record and collection level outweighs the pain of having to deal with a streaming parser and writer. This allows a single collection to be treated as any other JSON document, which has obvious benefits (which I certainly don't mean to minimize) and all the drawbacks we've been talking about *ad nauseam*. I go the other way.
I think the pain of dealing with a streaming API outweighs the benefits of having a single valid JSON structure for a collection, and I have instead put forward that we use a combination of JSON records and a well-defined end-of-record character (\n) to represent a collection. I recognize that this involves providing special-purpose code which must call JSON deserialization on each line, instead of being able to throw the whole stream/file/whatever at your JSON parser. I accept that because getting each line of a text file is something I find easy compared to dealing with streaming parsers. And our point of disagreement, I think, is that I believe that defining the collection structure in such a way that we need two steps (get a line; deserialize that line) and can't just call the equivalent of JSON.parse(stream) has benefits in ease of implementation and use that outweigh the loss of having both a single record and a collection of records be valid JSON. And you, I think, don't :-) I'm going to bow out of this now, unless I've got some part of our positions wrong, to let any others that care (which may number zero) chime in. -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
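For illustration, the two-step scheme described above (get a line; deserialize that line) takes only a few lines in most languages. What follows is a minimal Python sketch, not an agreed MARC-in-JSON format. It assumes a made-up record shape, since the thread deliberately leaves that format open, and it uses "\n" as the end-of-record character as proposed. The names `RECORD`, `write_collection`, and `read_collection`, and the field layout, are hypothetical.

```python
import io
import json

# Hypothetical JSON shape for one MARC record (an assumption, not a
# standard): fields sit in a list so their order survives, and each data
# field keeps its tag, both indicators, and an ordered list of subfields.
RECORD = {
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        {"tag": "001", "value": "ocm12345678"},
        {"tag": "245", "ind1": "1", "ind2": "0",
         "subfields": [["a", "Example title :"], ["b", "a subtitle."]]},
    ],
}


def write_collection(records, out):
    """Write each record as one line of JSON, terminated by a newline."""
    for rec in records:
        # json.dumps (no indent) emits a single line and escapes newlines
        # inside string values, so the only raw newline is the one added here.
        out.write(json.dumps(rec, ensure_ascii=False) + "\n")


def read_collection(stream):
    """Yield records one at a time: get a line, then deserialize that line."""
    for line in stream:
        if line.strip():  # skip blank lines, e.g. a stray trailing newline
            yield json.loads(line)


# Round trip through an in-memory text stream standing in for a file.
buf = io.StringIO()
write_collection([RECORD, RECORD], buf)
buf.seek(0)
records = list(read_collection(buf))
```

In this sketch, fields go in a JSON array rather than an object keyed by tag because MARC tags repeat and the JSON specification does not guarantee the order of object members. Only one record is parsed at a time. The alternative, `json.load` on a single JSON array, is one call but builds every record in memory at once.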
Re: [CODE4LIB] Q: XML2JSON converter
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ross Singer Sent: Friday, March 05, 2010 09:18 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Q: XML2JSON converter I actually just wrote the same exact email as Bill (although probably not as polite -- I called the marcxml collection element a contrivance that appears nowhere in marc21). I even wrote the "marc21 is EOR character delimited files" bit. I was hoping to figure out how to use unix split to make my point, couldn't, and then discarded my draft. But I was *right there*. -Ross. I'll answer Bill's message tomorrow after I have had some sleep :) Actually, I contend that the MARC-XML collection element does appear in MARC (ISO 2709), but it is at the physical layer and not at the structural layer. Remember that MARC records were placed on a tape reel, so the tape reel was the collection (container). Placed on disk in a file, the file is the collection (container). I agree that it's not spelled out in the standard, but the concept of a collection (container) is implicit when you have more than one record of anything. Basic set theory: a set is a container for its members :) The obvious reason why it exists in XML is that the XML infoset requires a single document element (container). This is why the MARC-XML schema allows either a collection or a record element to be specified as the document element. It is unfortunate that the XML infoset requires a single document element; otherwise you would be back to the file on disk being the implicit collection (container), as it is in ISO 2709. Andy.
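The contrast between the physical-layer container and the XML document element can be shown concretely. Below is a minimal Python sketch. MARC 21 binary records each end with the record terminator byte 0x1D, so a file of them can be split with no wrapper at all. By contrast, two sibling `<record>` elements are not a well-formed XML document until they sit inside a `<collection>` element. The byte strings and the record below are toy placeholders, not valid MARC records. `split_iso2709` and `is_well_formed` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# MARC 21 (ISO 2709): every record ends with the record terminator 0x1D;
# the file (or tape) itself is the implicit collection.
RECORD_TERMINATOR = b"\x1d"


def split_iso2709(data: bytes) -> list[bytes]:
    """Return each record, terminator included, from concatenated MARC 21 data."""
    return [chunk + RECORD_TERMINATOR
            for chunk in data.split(RECORD_TERMINATOR)
            if chunk.strip()]  # drop an empty or whitespace-only tail


# MARC-XML: a well-formed XML document has exactly one document element,
# so several <record> elements need a <collection> wrapper.
NS = "http://www.loc.gov/MARC21/slim"


def is_well_formed(xml_text: str) -> bool:
    """True if xml_text parses as a single XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


record = f'<record xmlns="{NS}"><leader>00000nam a2200000 a 4500</leader></record>'
bare_records = record * 2                                        # two document elements
wrapped = f'<collection xmlns="{NS}">{record * 2}</collection>'  # one document element

print(is_well_formed(bare_records))  # False: "junk after document element"
print(is_well_formed(wrapped))       # True
```

A stricter ISO 2709 reader could use the five-digit record length at the start of each leader instead of scanning for 0x1D. Splitting on the terminator is simply the closest analogue to the newline-delimited JSON idea discussed earlier in the thread.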