Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Ulrich Schaefer

Godmar Back wrote:

Hi,

Can anybody recommend an open source XML2JSON converter in PHP or
Python (or potentially other languages, including XSLT stylesheets)?

Ideally, it should implement one of the common JSON conventions, such
as Google's JSON convention for GData [1], but anything that preserves
all elements, attributes, and text content of the XML file would be
acceptable.

Note that json_encode(simplexml_load_file(...)) does not meet this
requirement - in fact, nothing based on simplexml_load_file() will.
(It can't even load MarcXML correctly).

Thanks!

 - Godmar

[1] http://code.google.com/apis/gdata/docs/json.html
  

Hi,
try this: http://code.google.com/p/xml2json-xslt/

best,
Ulrich

--
Dr.-Ing. Ulrich Schaefer http://dfki.de/~uschaefer phone:+496813025154
   DFKI Language Technology Lab, D-66123 Saarbruecken, Germany
---
  Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Trippstadter Strasse 122, D-67663 Kaiserslautern, Germany
  Geschaeftsfuehrung: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster
(Vorsitzender), Dr. Walter Olthoff. Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes. Amtsgericht Kaiserslautern, HRB 2313


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Godmar Back
On Fri, Mar 5, 2010 at 3:59 AM, Ulrich Schaefer ulrich.schae...@dfki.de wrote:

 Hi,
 try this: http://code.google.com/p/xml2json-xslt/


I should have mentioned that I already tried everything I could find after
googling - this stylesheet doesn't meet the requirements, not by far. It
drops attributes just like simplexml_json does.

The one thing I didn't try is a program called 'BadgerFish.php' which I
couldn't locate - Google once indexed it at badgerfish.ning.com
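
For reference, here is a minimal Python sketch of a BadgerFish-style conversion: attributes become "@name" properties, text content goes under "$", and repeated sibling elements become an array. It is only an illustration of the convention, not the BadgerFish.php code. The function names element_to_dict and xml_to_json are made up for this example, and namespace declarations and mixed content (text after a child element) are not handled.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert one element to a dict, keeping attributes, text and children."""
    node = {}
    # Attributes are stored under "@name" keys.
    for name, value in elem.attrib.items():
        node["@" + name] = value
    # Non-empty text content is stored under "$".
    text = (elem.text or "").strip()
    if text:
        node["$"] = text
    # Children are stored under their tag; repeated tags become a list.
    for child in elem:
        converted = element_to_dict(child)
        if child.tag in node:
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(converted)
        else:
            node[child.tag] = converted
    return node


def xml_to_json(xml_text):
    """Parse an XML string and return a JSON string (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)})


if __name__ == "__main__":
    sample = '<record id="1"><field tag="245">Title</field></record>'
    print(xml_to_json(sample))
```

Unlike json_encode(simplexml_load_file(...)), a converter along these lines keeps an element's attributes and its text side by side, which is the requirement stated in the original question.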

 - Godmar


Re: [CODE4LIB] Code4Lib Midwest?

2010-03-05 Thread Scott Garrison
+1

ELM, I'm happy to help coordinate in whatever way you need.

Also, if we can find a drummer, we could do a blues trio (count me in on bass). 
I could bring our band's drummer (a HUGE ND fan) down for a day or two if 
needed--he's awesome.

--SG
WMU in Kalamazoo

- Original Message -
From: Eric Lease Morgan emor...@nd.edu
To: CODE4LIB@LISTSERV.ND.EDU
Sent: Thursday, March 4, 2010 4:38:53 PM
Subject: Re: [CODE4LIB] Code4Lib Midwest?

On Mar 4, 2010, at 3:29 PM, Jonathan Brinley wrote:

  2. share demonstrations
 
 I'd like to see this be something like a blend between lightning talks
 and the ask anything session at the last conference

This certainly works for me, and the length of time of each talk would/could 
be directly proportional to the number of people who attend.


  4. give a presentation to library staff
 
 What sort of presentation did you have in mind, Eric?
 
 This also raises the issue of weekday vs. weekend. I'm game for
 either. Anyone else have a preference?

What I was thinking here was a possible presentation to library faculty/staff 
and/or computing faculty/staff from across campus. The presentation could be 
one or two cool hacks or solutions that solved wider, less geeky problems. 
Instead of tweaking Solr's term-weighting algorithms to index OAI-harvested 
content it would be making journal articles easier to find. This would be an 
opportunity to show off the good work done by institutions outside Notre Dame. 
A prophet in their own land is not as convincing as the expert from afar.

I was thinking it would happen on a weekday. There would be more stuff going on 
here on campus, as well as give everybody a break from their normal work week. 
More specifically, I would suggest such an event take place on a Friday so the 
people who stayed overnight would not have to take so many days off of work.


  5. have a hack session
 
 It would be good to have 2 or 3 projects we can/should work on decided
 ahead of time (in case no one has any good ideas at the time), and
 perhaps a couple more inspired by the earlier presentations.



True.

-- 
ELM
University of Notre Dame


Re: [CODE4LIB] Code4Lib Midwest?

2010-03-05 Thread Jonathan Brinley
On Fri, Mar 5, 2010 at 8:37 AM, Scott Garrison scott.garri...@wmich.edu wrote:
 Also, if we can find a drummer, we could do a blues trio (count me in on 
 bass).

If someone can bring drums, I can play them.



-- 
Jonathan M. Brinley

jonathanbrin...@gmail.com
http://xplus3.net/


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Kevin S. Clarke
Internet Archive seems to have a copy of that:

http://web.archive.org/web/20071013052842/badgerfish.ning.com/file.php?format=srcpath=lib/BadgerFish.php

as well as several versions of the site:

http://web.archive.org/web/*/http://badgerfish.ning.com

Kevin



On Fri, Mar 5, 2010 at 8:15 AM, Godmar Back god...@gmail.com wrote:
 On Fri, Mar 5, 2010 at 3:59 AM, Ulrich Schaefer 
 ulrich.schae...@dfki.de wrote:

 Hi,
 try this: http://code.google.com/p/xml2json-xslt/


 I should have mentioned that I already tried everything I could find after
 googling - this stylesheet doesn't meet the requirements, not by far. It
 drops attributes just like simplexml_json does.

 The one thing I didn't try is a program called 'BadgerFish.php' which I
 couldn't locate - Google once indexed it at badgerfish.ning.com

  - Godmar



Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Benjamin Young

On 3/5/10 8:15 AM, Godmar Back wrote:

On Fri, Mar 5, 2010 at 3:59 AM, Ulrich Schaefer ulrich.schae...@dfki.de wrote:

   

Hi,
try this: http://code.google.com/p/xml2json-xslt/


 

I should have mentioned that I already tried everything I could find after
googling - this stylesheet doesn't meet the requirements, not by far. It
drops attributes just like simplexml_json does.

The one thing I didn't try is a program called 'BadgerFish.php' which I
couldn't locate - Google once indexed it at badgerfish.ning.com

  - Godmar
   

Godmar,

I'd be interested in collaborating with you on creating one. I'd bounced 
this question off the CouchDB IRC channel a while back, and the summary 
was that you'd generally create a JSON structure for your document and 
then write the code to map the XML to JSON. However, I do think 
something more generic like Google's GData to JSON would fit the bill 
for most use cases...sadly, it doesn't seem they've made the conversion 
code available.
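
To illustrate the document-specific approach described above (design the JSON structure first, then write the mapping code), here is a small Python sketch. The XML vocabulary (book, title, author, isbn) and the function name book_to_json are invented for this example and do not come from CouchDB or from any project mentioned in this thread.

```python
import json
import xml.etree.ElementTree as ET


def book_to_json(xml_text):
    """Map a hypothetical <book> document onto a hand-designed JSON shape."""
    book = ET.fromstring(xml_text)
    doc = {
        # The attribute becomes a plain property.
        "isbn": book.get("isbn"),
        # A single child element becomes a string.
        "title": book.findtext("title"),
        # Repeated child elements become an array of strings.
        "authors": [author.text for author in book.findall("author")],
    }
    return json.dumps(doc)


if __name__ == "__main__":
    sample = ('<book isbn="123"><title>Example</title>'
              '<author>A</author><author>B</author></book>')
    print(book_to_json(sample))
```

The trade-off is that the resulting JSON is pleasant to consume, but the mapping has to be rewritten for every XML vocabulary, which is why a generic convention is attractive.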


If you're looking at putting MARC into JSON, there was some discussion 
of that during code4lib 2010. Jonathan Rochkind, who was at code4lib 
2010, blogged about marc-json recently:

http://bibwild.wordpress.com/2010/03/03/marc-json/
He references a project that Bill Dueber's been playing with for a year:
http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/

All told, there's growing momentum for a MARC in JSON format to be 
created, so you might jump in there.


Additionally, I'd love to find a project building code to do what 
Google's done with the GData to JSON format. If you find one, I'd enjoy 
seeing it.


Thanks, Godmar,
Benjamin

--
President
BigBlueHat
P: 864.232.9553
W: http://www.bigbluehat.com/
http://www.linkedin.com/in/benjaminyoung


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Cary Gordon
You can find it here, although I wouldn't get too excited: http://bit.ly/acROxH

You could also fish for more info by badgering its creator at
http://www.sklar.com/page/section/contact.

Cary

On Fri, Mar 5, 2010 at 5:15 AM, Godmar Back god...@gmail.com wrote:
 On Fri, Mar 5, 2010 at 3:59 AM, Ulrich Schaefer 
 ulrich.schae...@dfki.de wrote:

 Hi,
 try this: http://code.google.com/p/xml2json-xslt/


 I should have mentioned that I already tried everything I could find after
 googling - this stylesheet doesn't meet the requirements, not by far. It
 drops attributes just like simplexml_json does.

 The one thing I didn't try is a program called 'BadgerFish.php' which I
 couldn't locate - Google once indexed it at badgerfish.ning.com

  - Godmar




-- 
Cary Gordon
The Cherry Hill Company
http://chillco.com


Re: [CODE4LIB] Code4Lib 2011 Proposals

2010-03-05 Thread Cary Gordon
I could say you're a dreamer, but you're not the only one.

The reality is that III sees their APIs as gold mines that they can
market to a captive audience. For example, their patron API -- a
simple web interface to patron records -- probably cost them much less
to develop than they get for a single license. Don't look for them to
actually get off of that anytime soon.

Koha might motivate them to bump their FUD-generating budget, but
that's about it.

Cary

On Thu, Mar 4, 2010 at 2:41 PM, Esme Cowles escow...@ucsd.edu wrote:
 After seeing some of the cool things people can do with other ILS's and how 
 negative developers are about III, there's always the chance they might 
 decide to open up a bit more and engage with code4lib types (we can always 
 dream).

 And if that doesn't work, maybe the Ian Walls' talk (Becoming Truly 
 Innovative: Migrating from Millennium to Koha) will motivate them...

 -Esme
 --
 Esme Cowles escow...@ucsd.edu

 They extend copyrights perpetually. They don't get how that in itself is a
  form of theft. -- Lawrence Lessig, Free Culture

 On Mar 4, 2010, at 5:08 PM, Jill Ellern wrote:

 We tried to get some of the ILS's interested...with little success.  But who 
 knows...I did some heavy promotion to III this year...(despite the many 
 --s, she promised to talk to headquarters) so perhaps they might help some 
 next year...

 Jill

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Paul 
 Joseph
 Sent: Wednesday, March 03, 2010 9:56 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Code4Lib 2011 Proposals

 No need to be concerned about the vendors: they're the same suspects who
 sponsored C4L10.
 Paul


 On Wed, Mar 3, 2010 at 2:37 PM, Ya'aqov Ziso z...@rowan.edu wrote:

  also, I can assure you that to help keep registration fees low we'll
 be
 leaning on our vendors ...
 =
 Who would be these vendors? Seems CODE4LIB (bringing in creative, leading
 edge, OpenSource ideas where ILS have monolithically reigned) are the bad
 dream of ILS vendors. WorldCat DeveNet/Research may make an exception, but
 will it be $ufficient?
 Ya'aqov





-- 
Cary Gordon
The Cherry Hill Company
http://chillco.com


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Mark Mounts

have you tried this?

http://www.bramstein.com/projects/xsltjson/
http://github.com/bramstein/xsltjson

Using the parameter use-rayfish=true seems to preserve everything but 
namespaces, and there is another parameter to preserve namespaces as well.


Mark

On 3/5/2010 12:54 AM, Godmar Back wrote:

Hi,

Can anybody recommend an open source XML2JSON converter in PHP or
Python (or potentially other languages, including XSLT stylesheets)?

Ideally, it should implement one of the common JSON conventions, such
as Google's JSON convention for GData [1], but anything that preserves
all elements, attributes, and text content of the XML file would be
acceptable.

Note that json_encode(simplexml_load_file(...)) does not meet this
requirement - in fact, nothing based on simplexml_load_file() will.
(It can't even load MarcXML correctly).

Thanks!

  - Godmar

[1] http://code.google.com/apis/gdata/docs/json.html
   


Re: [CODE4LIB] Code4Lib Midwest?

2010-03-05 Thread LeVan,Ralph
+1

I suspect a few of us from OCLC would attend.

Ralph

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Scott Garrison
 Sent: Friday, March 05, 2010 8:37 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Code4Lib Midwest?
 
 +1
 
 ELM, I'm happy to help coordinate in whatever way you need.
 
 Also, if we can find a drummer, we could do a blues trio (count me in on 
 bass). I
 could bring our band's drummer (a HUGE ND fan) down for a day or two if
 needed--he's awesome.
 
 --SG
 WMU in Kalamazoo
 
 - Original Message -
 From: Eric Lease Morgan emor...@nd.edu
 To: CODE4LIB@LISTSERV.ND.EDU
 Sent: Thursday, March 4, 2010 4:38:53 PM
 Subject: Re: [CODE4LIB] Code4Lib Midwest?
 
 On Mar 4, 2010, at 3:29 PM, Jonathan Brinley wrote:
 
   2. share demonstrations
 
  I'd like to see this be something like a blend between lightning talks
  and the ask anything session at the last conference
 
 This certainly works for me, and the length of time of each talk 
 would/could be
 directly proportional to the number of people who attend.
 
 
   4. give a presentation to library staff
 
  What sort of presentation did you have in mind, Eric?
 
  This also raises the issue of weekday vs. weekend. I'm game for
  either. Anyone else have a preference?
 
 What I was thinking here was a possible presentation to library faculty/staff
 and/or computing faculty/staff from across campus. The presentation could be
 one or two cool hacks or solutions that solved wider, less geeky problems.
 Instead of tweaking Solr's term-weighting algorithms to index OAI-harvested
 content it would be making journal articles easier to find. This would be 
 an
 opportunity to show off the good work done by institutions outside Notre Dame.
 A prophet in their own land is not as convincing as the expert from afar.
 
 I was thinking it would happen on a weekday. There would be more stuff going
 on here on campus, as well as give everybody a break from their normal work
 week. More specifically, I would suggest such an event take place on a Friday
 so the people who stayed overnight would not have to take so many days off of
 work.
 
 
   5. have a hack session
 
  It would be good to have 2 or 3 projects we can/should work on decided
  ahead of time (in case no one has any good ideas at the time), and
  perhaps a couple more inspired by the earlier presentations.
 
 
 
 True.
 
 --
 ELM
 University of Notre Dame


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Joe Hourcle

On Fri, 5 Mar 2010, Godmar Back wrote:


On Fri, Mar 5, 2010 at 3:59 AM, Ulrich Schaefer ulrich.schae...@dfki.de wrote:


Hi,
try this: http://code.google.com/p/xml2json-xslt/



I should have mentioned that I already tried everything I could find after
googling - this stylesheet doesn't meet the requirements, not by far. It
drops attributes just like simplexml_json does.

The one thing I didn't try is a program called 'BadgerFish.php' which I
couldn't locate - Google once indexed it at badgerfish.ning.com


http://web.archive.org/web/20080216200903/http://badgerfish.ning.com/

http://web.archive.org/web/20071013052842/badgerfish.ning.com/file.php?format=srcpath=lib/BadgerFish.php

-Joe


Re: [CODE4LIB] Code4Lib Midwest?

2010-03-05 Thread Hammons, James W.
Exciting opportunity. I bet we could get several people from the Ball State 
Library IT shop up to ND for this.

Jim


+
James Hammons, M.L.S.   
Head of Library Technologies
Library Information Technology Services voice:  (765) 285-8032
Bracken Library fax: (765) 285-1096
Ball State University   e-mail: jhamm...@bsu.edu
Muncie, IN 47306        http://www.bsu.edu/library
U.S.A.

Ball State University Libraries
A destination for research, learning, and friends
+



 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 LeVan,Ralph
 Sent: Friday, March 05, 2010 9:53 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Code4Lib Midwest?
 
 +1
 
 I suspect a few of us from OCLC would attend.
 
 Ralph
 
  -Original Message-
  From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf
 Of
  Scott Garrison
  Sent: Friday, March 05, 2010 8:37 AM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] Code4Lib Midwest?
 
  +1
 
  ELM, I'm happy to help coordinate in whatever way you need.
 
  Also, if we can find a drummer, we could do a blues trio (count me in
 on bass). I
  could bring our band's drummer (a HUGE ND fan) down for a day or two
 if
  needed--he's awesome.
 
  --SG
  WMU in Kalamazoo
 
  - Original Message -
  From: Eric Lease Morgan emor...@nd.edu
  To: CODE4LIB@LISTSERV.ND.EDU
  Sent: Thursday, March 4, 2010 4:38:53 PM
  Subject: Re: [CODE4LIB] Code4Lib Midwest?
 
  On Mar 4, 2010, at 3:29 PM, Jonathan Brinley wrote:
 
2. share demonstrations
  
   I'd like to see this be something like a blend between lightning
 talks
   and the ask anything session at the last conference
 
  This certainly works for me, and the length of time of each talk
 would/could be
  directly proportional to the number of people who attend.
 
 
4. give a presentation to library staff
  
   What sort of presentation did you have in mind, Eric?
  
   This also raises the issue of weekday vs. weekend. I'm game for
   either. Anyone else have a preference?
 
  What I was thinking here was a possible presentation to library
 faculty/staff
  and/or computing faculty/staff from across campus. The presentation
 could be
  one or two cool hacks or solutions that solved wider, less geeky
 problems.
  Instead of tweaking Solr's term-weighting algorithms to index OAI-
 harvested
  content it would be making journal articles easier to find. This
 would be an
  opportunity to show off the good work done by institutions outside
 Notre Dame.
  A prophet in their own land is not as convincing as the expert from
 afar.
 
  I was thinking it would happen on a weekday. There would be more stuff
 going
  on here on campus, as well as give everybody a break from their normal
 work
  week. More specifically, I would suggest such an event take place on a
 Friday
  so the people who stayed overnight would not have to take so many
 days off of
  work.
 
 
5. have a hack session
  
   It would be good to have 2 or 3 projects we can/should work on
 decided
   ahead of time (in case no one has any good ideas at the time), and
   perhaps a couple more inspired by the earlier presentations.
 
 
 
  True.
 
  --
  ELM
  University of Notre Dame


Re: [CODE4LIB] Code4Lib Midwest?

2010-03-05 Thread Ken Irwin
I would come from Ohio to wherever we choose. Kalamazoo would suit me just 
fine; I've not been back there in entirely too long! 
Ken

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Scott Garrison
 Sent: Friday, March 05, 2010 8:37 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Code4Lib Midwest?

 +1

 ELM, I'm happy to help coordinate in whatever way you need.

 Also, if we can find a drummer, we could do a blues trio (count me in on 
 bass). I
 could bring our band's drummer (a HUGE ND fan) down for a day or two if
 needed--he's awesome.

 --SG
 WMU in Kalamazoo

 - Original Message -
 From: Eric Lease Morgan emor...@nd.edu
 To: CODE4LIB@LISTSERV.ND.EDU
 Sent: Thursday, March 4, 2010 4:38:53 PM
 Subject: Re: [CODE4LIB] Code4Lib Midwest?

 On Mar 4, 2010, at 3:29 PM, Jonathan Brinley wrote:

   2. share demonstrations
 
  I'd like to see this be something like a blend between lightning talks
  and the ask anything session at the last conference

 This certainly works for me, and the length of time of each talk 
 would/could be
 directly proportional to the number of people who attend.


   4. give a presentation to library staff
 
  What sort of presentation did you have in mind, Eric?
 
  This also raises the issue of weekday vs. weekend. I'm game for
  either. Anyone else have a preference?

 What I was thinking here was a possible presentation to library faculty/staff
 and/or computing faculty/staff from across campus. The presentation could be
 one or two cool hacks or solutions that solved wider, less geeky problems.
 Instead of tweaking Solr's term-weighting algorithms to index OAI-harvested
 content it would be making journal articles easier to find. This would be 
 an
 opportunity to show off the good work done by institutions outside Notre Dame.
 A prophet in their own land is not as convincing as the expert from afar.

 I was thinking it would happen on a weekday. There would be more stuff going
 on here on campus, as well as give everybody a break from their normal work
 week. More specifically, I would suggest such an event take place on a Friday
 so the people who stayed overnight would not have to take so many days off of
 work.


   5. have a hack session
 
  It would be good to have 2 or 3 projects we can/should work on decided
  ahead of time (in case no one has any good ideas at the time), and
  perhaps a couple more inspired by the earlier presentations.



 True.

 --
 ELM
 University of Notre Dame


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Jay Luker
If PHP/python isn't a hard requirement, I think this would be fairly
simple to do in perl using a combination of the XML::Simple [1] and
JSON::XS [2] modules.

In fact it's so simple, here's the code:


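#!/usr/bin/perl

use strict;
use warnings;

use JSON::XS;      # exports encode_json
use XML::Simple;   # exports XMLin

# Read the XML file named on the command line into a Perl data structure.
my $filename = shift @ARGV;
my $parsed = XMLin($filename);

# Serialize that structure as JSON and print it.
my $json = encode_json($parsed);
print $json, "\n";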


XML::Simple, in spite of the name, actually allows for a myriad of
options for how the perl data structure gets created from the xml,
including attribute preservation, grouping of elements, etc.

--jay

[1] http://search.cpan.org/~grantm/XML-Simple-2.18/lib/XML/Simple.pm
[2] http://search.cpan.org/~makamaka/JSON-2.17/lib/JSON.pm

On Fri, Mar 5, 2010 at 9:55 AM, Joe Hourcle
onei...@grace.nascom.nasa.gov wrote:
 On Fri, 5 Mar 2010, Godmar Back wrote:

 On Fri, Mar 5, 2010 at 3:59 AM, Ulrich Schaefer
 ulrich.schae...@dfki.de wrote:

 Hi,
 try this: http://code.google.com/p/xml2json-xslt/


 I should have mentioned that I already tried everything I could find after
 googling - this stylesheet doesn't meet the requirements, not by far. It
 drops attributes just like simplexml_json does.

 The one thing I didn't try is a program called 'BadgerFish.php' which I
 couldn't locate - Google once indexed it at badgerfish.ning.com

        http://web.archive.org/web/20080216200903/http://badgerfish.ning.com/

  http://web.archive.org/web/20071013052842/badgerfish.ning.com/file.php?format=srcpath=lib/BadgerFish.php

 -Joe



Re: [CODE4LIB] Code4Lib Midwest?

2010-03-05 Thread Bill Dueber
I'm pretty sure I could make it from Ann Arbor!

On Fri, Mar 5, 2010 at 10:12 AM, Ken Irwin kir...@wittenberg.edu wrote:

 I would come from Ohio to wherever we choose. Kalamazoo would suit me just
 fine; I've not been back there in entirely too long!
 Ken

  -Original Message-
  From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
  Scott Garrison
  Sent: Friday, March 05, 2010 8:37 AM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] Code4Lib Midwest?
 
  +1
 
  ELM, I'm happy to help coordinate in whatever way you need.
 
  Also, if we can find a drummer, we could do a blues trio (count me in on
 bass). I
  could bring our band's drummer (a HUGE ND fan) down for a day or two if
  needed--he's awesome.
 
  --SG
  WMU in Kalamazoo
 
  - Original Message -
  From: Eric Lease Morgan emor...@nd.edu
  To: CODE4LIB@LISTSERV.ND.EDU
  Sent: Thursday, March 4, 2010 4:38:53 PM
  Subject: Re: [CODE4LIB] Code4Lib Midwest?
 
  On Mar 4, 2010, at 3:29 PM, Jonathan Brinley wrote:
 
2. share demonstrations
  
   I'd like to see this be something like a blend between lightning talks
   and the ask anything session at the last conference
 
  This certainly works for me, and the length of time of each talk
 would/could be
  directly proportional to the number of people who attend.
 
 
4. give a presentation to library staff
  
   What sort of presentation did you have in mind, Eric?
  
   This also raises the issue of weekday vs. weekend. I'm game for
   either. Anyone else have a preference?
 
  What I was thinking here was a possible presentation to library
 faculty/staff
  and/or computing faculty/staff from across campus. The presentation could
 be
  one or two cool hacks or solutions that solved wider, less geeky
 problems.
  Instead of tweaking Solr's term-weighting algorithms to index
 OAI-harvested
  content it would be making journal articles easier to find. This would
 be an
  opportunity to show off the good work done by institutions outside Notre
 Dame.
  A prophet in their own land is not as convincing as the expert from afar.
 
  I was thinking it would happen on a weekday. There would be more stuff
 going
  on here on campus, as well as give everybody a break from their normal
 work
  week. More specifically, I would suggest such an event take place on a
 Friday
  so the people who stayed overnight would not have to take so many days
 off of
  work.
 
 
5. have a hack session
  
   It would be good to have 2 or 3 projects we can/should work on decided
   ahead of time (in case no one has any good ideas at the time), and
   perhaps a couple more inspired by the earlier presentations.
 
 
 
  True.
 
  --
  ELM
  University of Notre Dame




-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: [CODE4LIB] Code4Lib 2011 Proposals

2010-03-05 Thread Mehrling, Martin
I can't see why III would want to have anything to do with this conference.  I 
think most of us who attend the conference are open-source types, and are 
trying to do things beyond what we could do with the vendors (who are 
risk-averse and profit-oriented.)  If III wants to be truly innovative, they 
should send a technical person (not a sales person) to this conference.


Martin 


-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Cary 
Gordon
Sent: Friday, March 05, 2010 9:37 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Code4Lib 2011 Proposals

I could say you're a dreamer, but you're not the only one.

The reality is that III sees their APIs as gold mines that they can
market to a captive audience. For example, their patron API -- a
simple web interface to patron records -- probably cost them much less
to develop than they get for a single license. Don't look for them to
actually get off of that anytime soon.

Koha might motivate them to bump their FUD-generating budget, but
that's about it.

Cary

On Thu, Mar 4, 2010 at 2:41 PM, Esme Cowles escow...@ucsd.edu wrote:
 After seeing some of the cool things people can do with other ILS's and how 
 negative developers are about III, there's always the chance they might 
 decide to open up a bit more and engage with code4lib types (we can always 
 dream).

 And if that doesn't work, maybe the Ian Walls' talk (Becoming Truly 
 Innovative: Migrating from Millennium to Koha) will motivate them...

 -Esme
 --
 Esme Cowles escow...@ucsd.edu

 They extend copyrights perpetually. They don't get how that in itself is a
  form of theft. -- Lawrence Lessig, Free Culture

 On Mar 4, 2010, at 5:08 PM, Jill Ellern wrote:

 We tried to get some of the ILS's interested...with little success.  But who 
 knows...I did some heavy promotion to III this year...(despite the many 
 --s, she promised to talk to headquarters) so perhaps they might help some 
 next year...

 Jill

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Paul 
 Joseph
 Sent: Wednesday, March 03, 2010 9:56 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Code4Lib 2011 Proposals

 No need to be concerned about the vendors: they're the same suspects who
 sponsored C4L10.
 Paul


 On Wed, Mar 3, 2010 at 2:37 PM, Ya'aqov Ziso z...@rowan.edu wrote:

  also, I can assure you that to help keep registration fees low we'll
 be
 leaning on our vendors ...
 =
 Who would be these vendors? Seems CODE4LIB (bringing in creative, leading
 edge, OpenSource ideas where ILS have monolithically reigned) are the bad
 dream of ILS vendors. WorldCat DeveNet/Research may make an exception, but
 will it be $ufficient?
 Ya'aqov





-- 
Cary Gordon
The Cherry Hill Company
http://chillco.com


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Benjamin Young
 Sent: Friday, March 05, 2010 09:26 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Q: XML2JSON converter
 
 If you're looking at putting MARC into JSON, there was some discussion
 of that during code4lib 2010. Jonathan Rochkind, who was at code4lib
 2010, blogged about marc-json recently:
 http://bibwild.wordpress.com/2010/03/03/marc-json/
 He references a project that Bill Dueber's been playing with for a
 year:
 http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/
 
 All told, there's growing momentum for a MARC in JSON format to be
 created, so you might jump in there.

Too bad I didn't attend code4lib.  OCLC Research has created a version of MARC 
in JSON and will probably release FAST concepts in MARC binary, MARC-XML and 
our MARC-JSON format among other formats.  I'm wondering whether there is some 
consensus that can be reached and standardized at LC's level, just like OCLC, 
RLG and LC came to consensus on MARC-XML.  Unfortunately, I have not had the 
time to document the format, although it is fairly straightforward, and yes, we 
have an XSLT to convert from MARC-XML to MARC-JSON.  Basically the format I'm 
using is:

[
  ...
]

which represents a collection of MARC records or 

{
  ...
}

which represents a single MARC record that takes the form:

{
  "leader" : "01192cz  a2200301n  4500",
  "controlfield" :
  [
    { "tag" : "001", "data" : "fst01303409" },
    { "tag" : "003", "data" : "OCoLC" },
    { "tag" : "005", "data" : "20100202194747.3" },
    { "tag" : "008", "data" : "060620nn anznnbabn  || ana d" }
  ],
  "datafield" :
  [
    {
      "tag" : "040",
      "ind1" : " ",
      "ind2" : " ",
      "subfield" :
      [
        { "code" : "a", "data" : "OCoLC" },
        { "code" : "b", "data" : "eng" },
        { "code" : "c", "data" : "OCoLC" },
        { "code" : "d", "data" : "OCoLC-O" },
        { "code" : "f", "data" : "fast" }
      ]
    },
    {
      "tag" : "151",
      "ind1" : " ",
      "ind2" : " ",
      "subfield" :
      [
        { "code" : "a", "data" : "Hawaii" },
        { "code" : "z", "data" : "Diamond Head" }
      ]
    }
  ]
}
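
As a rough illustration of how a MARC-XML record could be mapped onto the structure above, here is a Python sketch using only the standard library. It is not the OCLC XSLT mentioned in this message. The function names record_to_dict and marcxml_to_json are made up for the example, and the only external fact it relies on is the MARC-XML namespace http://www.loc.gov/MARC21/slim.

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by MARC-XML documents.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def record_to_dict(record):
    """Map one MARC-XML <record> element onto the labeled JSON structure."""
    return {
        "leader": record.findtext("marc:leader", default="", namespaces=NS),
        "controlfield": [
            {"tag": cf.get("tag"), "data": cf.text or ""}
            for cf in record.findall("marc:controlfield", NS)
        ],
        "datafield": [
            {
                "tag": df.get("tag"),
                "ind1": df.get("ind1", " "),
                "ind2": df.get("ind2", " "),
                "subfield": [
                    {"code": sf.get("code"), "data": sf.text or ""}
                    for sf in df.findall("marc:subfield", NS)
                ],
            }
            for df in record.findall("marc:datafield", NS)
        ],
    }


def marcxml_to_json(xml_text):
    """Return a JSON object for a single record, or an array for a collection."""
    root = ET.fromstring(xml_text)
    if root.tag == "{%s}record" % NS["marc"]:
        return json.dumps(record_to_dict(root))
    records = root.findall("marc:record", NS)
    return json.dumps([record_to_dict(r) for r in records])
```

A document whose root is a single record yields one JSON object; a collection yields a JSON array, matching the two forms described in this message.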


[CODE4LIB] UBC jobs

2010-03-05 Thread Earles, Jill Denae
Anyone know the status of library systems jobs opening up at UBC?  I
don't see any posted yet on their site
(http://hr.ubc.ca/careers/staff_postings.html), but heard there would be
4 positions open soon.

I'll be moving to Vancouver next month, and am looking for work there.

Thanks for any information you can provide!

Jill Earles


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Bill Dueber
On Fri, Mar 5, 2010 at 12:01 PM, Houghton,Andrew hough...@oclc.org wrote:

 Too bad I didn't attend code4lib.  OCLC Research has created a version of
 MARC in JSON and will probably release FAST concepts in MARC binary,
 MARC-XML and our MARC-JSON format among other formats.  I'm wondering
 whether there is some consensus that can be reached and standardized at LC's
 level, just like OCLC, RLG and LC came to consensus on MARC-XML.
  Unfortunately, I have not had the time to document the format, although it
 is fairly straightforward, and yes, we have an XSLT to convert from MARC-XML to
 MARC-JSON.  Basically the format I'm using is:


The stuff I've been doing:

  http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/

... is pretty much the same, except:

  1. I don't explicitly split up control and data fields. There's a single
field list; an item that has two elements is a control field (tag/data); one
with four is a data field (tag / ind1 /ind2 / array_of_subfield)

  2. Instead of putting a collection in a big json array, I use
newline-delimited-json (basically, just stick one record on each line as a
single json hash). This has the advantage that it makes streaming much, much
easier, and makes doing some other things (e.g., grabbing the first record or
two) much cheaper, even for the dumbest JSON parser. I'm not sure what the
state of JSON streaming parsers is; I know Jackson (for Java) can do it,
and Perl's JSON::XS can...kind of...but it's not great.

  3. I include a type (MARC-JSON, MARC-HASH, whatever) and version: [major,
minor] in each record. There's already a ton of JSON floating around the
library world; labeling what the heck a structure is is just friendly :-)
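
The three differences above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the ruby-marc implementation: the key names "leader" and "fields", the [code, value] pair used for each subfield, the type string "marc-hash" and the version [1, 0] are guesses made for the example. The input is assumed to be a dict in the labeled form quoted in this thread (leader / controlfield / datafield).

```python
import json


def to_marc_hash(record):
    """Convert a labeled record dict into a single-field-list structure."""
    # Control fields become two-element items: [tag, data].
    fields = [[cf["tag"], cf["data"]] for cf in record.get("controlfield", [])]
    # Data fields become four-element items: [tag, ind1, ind2, subfields].
    for df in record.get("datafield", []):
        subfields = [[sf["code"], sf["data"]] for sf in df["subfield"]]
        fields.append([df["tag"], df["ind1"], df["ind2"], subfields])
    return {
        "type": "marc-hash",   # assumed label
        "version": [1, 0],     # assumed [major, minor]
        "leader": record["leader"],
        "fields": fields,
    }


def dump_ndj(records, out):
    """Write one JSON object per line (newline-delimited JSON)."""
    for record in records:
        line = json.dumps(to_marc_hash(record), ensure_ascii=False)
        out.write(line + "\n")
```

Because json.dumps escapes any newline inside a string value, each record is guaranteed to occupy exactly one line of the output.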

MARC's structure is dumb enough that we collectively basically can't miss;
there's only so much you can do with the stuff, and a round-trip to JSON and
back is easy to implement.
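
To make the three differences above concrete, here is a minimal Python sketch of one record in the single-field-list layout, written as one line of newline-delimited JSON. The key names (`type`, `version`, `leader`, `fields`) and the helper names are assumptions for illustration, not the actual marc-hash specification; the field values are borrowed from the FAST example quoted in this thread.

```python
import json

# Hypothetical record in a marc-hash-like layout; key names are assumptions.
record = {
    "type": "marc-hash",
    "version": [1, 0],
    "leader": "01192cz  a2200301n  4500",
    "fields": [
        ["001", "fst01303409"],   # control field: tag, data
        ["003", "OCoLC"],
        # data field: tag, ind1, ind2, list of [code, data] subfields
        ["151", " ", " ", [["a", "Hawaii"], ["z", "Diamond Head"]]],
    ],
}

def is_control_field(field):
    """Two elements means a control field; four means a data field."""
    return len(field) == 2

def to_ndj_line(rec):
    """Serialize one record as a single line of JSON plus a newline."""
    # json.dumps without indent never emits a raw newline; newlines inside
    # string values are escaped, so one record always fits on one line.
    return json.dumps(rec, ensure_ascii=False) + "\n"

line = to_ndj_line(record)
round_tripped = json.loads(line)
```

Because every value is a plain list, string, or number, the record survives the trip through `json.dumps` and `json.loads` unchanged.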

I'm not super-against explicitly labeling the data elements (tag:, ind1:,
etc.) but I don't see where it's necessary unless you're planning on adding
out-of-band data to the records/fields/subfields at some point. Which might
be kinda cool (e.g., language hints on a per-subfield basis? Tokenization
hints for non-whitespace-delimited languages? URIs for unique concepts and
authorities where they exist for easy creation of RDF?)

I *am*, however, willing to push and push and push for NDJ instead of having
to deal with streaming JSON parsing, which to my limited understanding is
hard to get right and to my more qualified understanding is a pain in the
ass to work with.

And anything we do should explicitly be UTF-8 only; converting from MARC-8
is a problem for the server, not the receiver.

Support for what I've been calling marc-hash (I like to decouple it from the
eventual JSON format in case the serialization preferences change, or at
least so implementations don't get stuck with a single JSON library) is
already baked into ruby-marc, and obviously implementations are dead-easy no
matter what the underlying language is.

Anyone from the LoC want to get in on this?

 -Bill-




 [
  ...
 ]

 which represents a collection of MARC records or

 {
  ...
 }

 which represents a single MARC record that takes the form:

 {
   "leader" : "01192cz  a2200301n  4500",
   "controlfield" :
   [
     { "tag" : "001", "data" : "fst01303409" },
     { "tag" : "003", "data" : "OCoLC" },
     { "tag" : "005", "data" : "20100202194747.3" },
     { "tag" : "008", "data" : "060620nn anznnbabn  || ana d" }
   ],
   "datafield" :
   [
     {
       "tag" : "040",
       "ind1" : " ",
       "ind2" : " ",
       "subfield" :
       [
         { "code" : "a", "data" : "OCoLC" },
         { "code" : "b", "data" : "eng" },
         { "code" : "c", "data" : "OCoLC" },
         { "code" : "d", "data" : "OCoLC-O" },
         { "code" : "f", "data" : "fast" }
       ]
     },
     {
       "tag" : "151",
       "ind1" : " ",
       "ind2" : " ",
       "subfield" :
       [
         { "code" : "a", "data" : "Hawaii" },
         { "code" : "z", "data" : "Diamond Head" }
       ]
     }
   ]
 }
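
As a rough illustration of how a client might read a record in this shape, here is a minimal Python sketch. The record is a trimmed copy of the example above with JSON quoting restored, and the helper name `subfields` is hypothetical; nothing here is part of an OCLC specification.

```python
import json

# Trimmed copy of the example record, written as valid JSON.
doc = """
{
  "leader": "01192cz  a2200301n  4500",
  "controlfield": [
    { "tag": "001", "data": "fst01303409" }
  ],
  "datafield": [
    { "tag": "151", "ind1": " ", "ind2": " ",
      "subfield": [
        { "code": "a", "data": "Hawaii" },
        { "code": "z", "data": "Diamond Head" }
      ]
    }
  ]
}
"""

record = json.loads(doc)

def subfields(rec, tag, code):
    """Return the data of every subfield `code` in every data field `tag`."""
    return [
        sf["data"]
        for field in rec["datafield"] if field["tag"] == tag
        for sf in field["subfield"] if sf["code"] == code
    ]

heading = subfields(record, "151", "a")
```

The separate `controlfield` and `datafield` keys mean a reader never has to inspect an entry to know which kind of field it is.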




-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Bill Dueber
 Sent: Friday, March 05, 2010 12:30 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Q: XML2JSON converter
 
 On Fri, Mar 5, 2010 at 12:01 PM, Houghton,Andrew hough...@oclc.org wrote:

  Too bad I didn't attend code4lib.  OCLC Research has created a version of
  MARC in JSON and will probably release FAST concepts in MARC binary,
  MARC-XML and our MARC-JSON format among other formats.  I'm wondering
  whether there is some consensus that can be reached and standardized at
  LC's level, just like OCLC, RLG and LC came to consensus on MARC-XML.
  Unfortunately, I have not had the time to document the format, although it
  is fairly straightforward, and yes we have an XSLT to convert from MARC-XML
  to MARC-JSON.  Basically the format I'm using is:
 
 
 The stuff I've been doing:
 
   http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/
 
 ... is pretty much the same, except:

I decided to stick closer to a MARC-XML type definition since it would be
easier to explain how the two specifications are related, rather than take a
more radical approach in producing a specification less familiar.  That is not
to say that other approaches are bad; they just have different advantages and
disadvantages.  I was going for simple and familiar.

I certainly would be willing to work with LC on creating a MARC-JSON
specification, as I did in creating the MARC-XML specification.


Andy.


Re: [CODE4LIB] Code4Lib 2011 Proposals

2010-03-05 Thread Fleming, Declan
Hiya - San Diego is friggin expensive, and we don't have a small campus feel at 
all.  Robert McDonald and I worked out the costs a few years ago and we'd be 
almost double what Asheville conf cost folks.

It's killing me not to have you all out to paradise in Feb, but I can barely 
afford to live here :)

D

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Walter 
Lewis
Sent: Wednesday, March 03, 2010 11:21 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Code4Lib 2011 Proposals

On 3 Mar 10, at 9:52 AM, Julia Bauder wrote:

 Also, the farther north we go, the more likely that snow+airplane
 incompatibilities will foil speakers' (and attendees'!) travel plans at the
 last minute, which isn't fun for anyone.
 
 somewhere_out_of_nor'easter_and_lake_effect_range_in_february++

Actually there is a clear line (at least on the eastern half of the continent) 
where the further north you go, the *less* snow you got this winter.  Buffalo is
trailing a number of places on the east coast in total snow accumulation and 
Toronto has been dusted a few times this winter, with nothing of real 
substance.  Detroit and Chicago were well below seasonal averages last time I 
checked.

ALL of that said,  where are the San Diego gang or the folks from Miami?

Walter
  who can only dream of pubs with open patios in February


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Bill Dueber
On Fri, Mar 5, 2010 at 1:10 PM, Houghton,Andrew hough...@oclc.org wrote:


 I decided to stick closer to a MARC-XML type definition since it would be
 easier to explain how the two specifications are related, rather than take a
 more radical approach in producing a specification less familiar.  Not to
 say that other approaches are bad, they just have different advantages and
 disadvantages.  I was going for simple and familiar.


That makes sense, but please consider adding a format/version (which we get
in MARC-XML from the namespace and isn't present here). In fact, please
consider adding a format / version / URI, so people know what they've got.

I'm also going to again push the newline-delimited-json stuff. The
collection-as-array is simple and very clean, but leads to trouble
for production (where for most of us we'd have to get the whole freakin'
collection in memory first and then call JSON.dump or whatever)
or consumption (have to deal with a streaming json parser). The production
part is particularly worrisome, since I'd hate for everyone to have to
default to writing out a '[', looping through the records, and writing a
']'. Yeah, it's easy enough, but it's an ugly hack that *everyone* would
have to do, as opposed to just something like:

  while (r = nextRecord) {
     print r.to_json, "\n"
  }

Unless, of course, writing json to a stream and reading json from a stream
is a lot easier than I make it out to be across a variety of languages and I
just don't know it, which is entirely possible. The streaming writer
interfaces for Perl (
http://search.cpan.org/dist/JSON-Streaming-Writer/lib/JSON/Streaming/Writer.pm)
and Java's Jackson (
http://wiki.fasterxml.com/JacksonInFiveMinutes#Streaming_API_Example) are a
little more daunting than I'd like them to be.
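
The production-side argument above can be sketched in Python as follows. This is only an illustration under assumptions: `records()` is a stand-in for a real record source, and the function names are made up for this example. It contrasts the one-record-per-line loop with the hand-written bracket-and-comma framing needed to emit a single JSON array without holding it all in memory.

```python
import io
import json

def records():
    """Stand-in for a real record source; yields one dict at a time."""
    for n in range(3):
        yield {"id": n}

def write_ndj(recs, out):
    """One JSON document per line; only one record is in memory at a time."""
    count = 0
    for rec in recs:
        out.write(json.dumps(rec))
        out.write("\n")
        count += 1
    return count

def write_array(recs, out):
    """Manual framing: '[' first, commas between records, ']' at the end."""
    out.write("[")
    for i, rec in enumerate(recs):
        if i:
            out.write(",")
        out.write(json.dumps(rec))
    out.write("]")

ndj = io.StringIO()
written = write_ndj(records(), ndj)
arr = io.StringIO()
write_array(records(), arr)
```

Both writers stream, but the array version has to track whether a comma is due, which is the bookkeeping every producer would have to repeat.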

Not wanting to argue unnecessarily, here; just adding input before things
get effectively set in stone.

 -Bill-

-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Benjamin Young

On 3/5/10 1:10 PM, Houghton,Andrew wrote:

From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Bill Dueber
Sent: Friday, March 05, 2010 12:30 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Q: XML2JSON converter

On Fri, Mar 5, 2010 at 12:01 PM, Houghton,Andrew hough...@oclc.org wrote:

Too bad I didn't attend code4lib.  OCLC Research has created a version of
MARC in JSON and will probably release FAST concepts in MARC binary,
MARC-XML and our MARC-JSON format among other formats.  I'm wondering
whether there is some consensus that can be reached and standardized at
LC's level, just like OCLC, RLG and LC came to consensus on MARC-XML.
Unfortunately, I have not had the time to document the format, although it
is fairly straightforward, and yes we have an XSLT to convert from MARC-XML
to MARC-JSON.  Basically the format I'm using is:

The stuff I've been doing:

   http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/

... is pretty much the same, except:

I decided to stick closer to a MARC-XML type definition since it would be
easier to explain how the two specifications are related, rather than take a
more radical approach in producing a specification less familiar.  Not to say
that other approaches are bad, they just have different advantages and
disadvantages.  I was going for simple and familiar.

I certainly would be willing to work with LC on creating a MARC-JSON
specification as I did in creating the MARC-XML specification.


Andy.
   
A CouchDB friend of mine just pointed me to the BibJSON format by the 
Bibliographic Knowledge Network:

http://www.bibkn.org/bibjson/index.html

Might be worth looking through for future collaboration/transformation 
options.


Benjamin


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Ross Singer
On Fri, Mar 5, 2010 at 1:10 PM, Houghton,Andrew hough...@oclc.org wrote:

 I certainly would be willing to work with LC on creating a MARC-JSON
 specification as I did in creating the MARC-XML specification.

Quite frankly, I think I (and I imagine others) would much rather see
a more open, RFC-style process to creating a marc-json spec than "I
talked to LC and here you go."

Maybe I'm misreading this last paragraph a bit, however.

-Ross.


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Ross Singer
On Fri, Mar 5, 2010 at 2:06 PM, Benjamin Young byo...@bigbluehat.com wrote:

 A CouchDB friend of mine just pointed me to the BibJSON format by the
 Bibliographic Knowledge Network:
 http://www.bibkn.org/bibjson/index.html

 Might be worth looking through for future collaboration/transformation
 options.

marc-json and BibJSON serve two different purposes:  marc-json would
need to be a loss-less serialization of a MARC record which may or may
not contain bibliographic data (it may be an authority, holding or CID
record, for example).  BibJSON is more of a merging of data model and
serialization (which, admittedly, is no stranger to MARC) for the
purpose of bibliographic /citations/.  So it will probably be lossy
and there would most likely be a lot of MARC data that is out of
scope.

That's not to say it wouldn't be useful to figure out how to get from
MARC to BibJSON, but from my perspective it's difficult to see the
advantage it brings (being tied to JSON) vs. BIBO.

-Ross.


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Bill Dueber
 Sent: Friday, March 05, 2010 01:59 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Q: XML2JSON converter
 
 On Fri, Mar 5, 2010 at 1:10 PM, Houghton,Andrew hough...@oclc.org wrote:

  I decided to stick closer to a MARC-XML type definition since it would be
  easier to explain how the two specifications are related, rather than take
  a more radical approach in producing a specification less familiar.  Not to
  say that other approaches are bad, they just have different advantages and
  disadvantages.  I was going for simple and familiar.

 That makes sense, but please consider adding a format/version (which we get
 in MARC-XML from the namespace and isn't present here). In fact, please
 consider adding a format / version / URI, so people know what they've got.

This sounds reasonable and I'll consider adding it to our specification.

 I'm also going to again push the newline-delimited-json stuff. The
 collection-as-array is simple and very clean, but leads to trouble
 for production (where for most of us we'd have to get the whole
 freakin' collection in memory first ...

As far as our MARC-JSON specification is concerned, a server application can
return either a collection or a record, which mimics the MARC-XML specification,
where the collection or record element can be used as the document element.

 Unless, of course, writing json to a stream and reading json from a stream
 is a lot easier than I make it out to be across a variety of languages and
 I just don't know it, which is entirely possible. The streaming writer
 interfaces for Perl (
 http://search.cpan.org/dist/JSON-Streaming-Writer/lib/JSON/Streaming/Writer.pm)
 and Java's Jackson (
 http://wiki.fasterxml.com/JacksonInFiveMinutes#Streaming_API_Example) are a
 little more daunting than I'd like them to be.

As you point out, JSON streaming doesn't work with all clients, and I am hesitant
to build on anything that all clients cannot accept.  I think part of the issue
here is proper API design.  Sending tens of megabytes back to a client and
expecting them to process it seems like a poor API design regardless of whether
they can stream it or not.  It might make more sense to have a server API send
back 10 of our MARC-JSON records in a JSON collection and have the client
request an additional batch of records for the result set.  In addition, if I
remember correctly, JSON streaming or other streaming methods keep the
connection to the server open, which is not a good thing to do if you want to
maintain server throughput.
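
A minimal Python sketch of the batching idea, under stated assumptions: `fetch_batch` is a hypothetical stand-in for a server call (here it just slices an in-memory list), and the batch size of 10 comes from the paragraph above. No real service or API is implied.

```python
import json

# In-memory stand-in for a result set held by a server.
RESULT_SET = [{"id": n} for n in range(25)]

def fetch_batch(start, size=10):
    """Hypothetical server call: returns one page as a JSON array string."""
    return json.dumps(RESULT_SET[start:start + size])

def fetch_all(size=10):
    """Client loop: request batches until a short (or empty) page comes back."""
    start = 0
    while True:
        page = json.loads(fetch_batch(start, size))
        yield from page
        if len(page) < size:
            break
        start += size

fetched = list(fetch_all())
```

Each response is a small, complete JSON array, so an ordinary non-streaming parser is enough on the client side.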


Andy.


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Benjamin Young
 Sent: Friday, March 05, 2010 02:06 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Q: XML2JSON converter
 
 A CouchDB friend of mine just pointed me to the BibJSON format by the
 Bibliographic Knowledge Network:
 http://www.bibkn.org/bibjson/index.html
 
 Might be worth looking through for future collaboration/transformation
 options.

Unfortunately, it doesn't really work for authority and classification data 
that I'm frequently involved with.

Andy.


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Ross Singer
 Sent: Friday, March 05, 2010 02:32 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Q: XML2JSON converter
 
 On Fri, Mar 5, 2010 at 1:10 PM, Houghton,Andrew hough...@oclc.org wrote:

  I certainly would be willing to work with LC on creating a MARC-JSON
  specification as I did in creating the MARC-XML specification.

 Quite frankly, I think I (and I imagine others) would much rather see
 a more open, RFC-style process to creating a marc-json spec than "I
 talked to LC and here you go."
 
 Maybe I'm misreading this last paragraph a bit, however.

Yes, you misread the last paragraph.

Andy.


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Benjamin Young

On 3/5/10 2:46 PM, Ross Singer wrote:

On Fri, Mar 5, 2010 at 2:06 PM, Benjamin Youngbyo...@bigbluehat.com  wrote:

   

A CouchDB friend of mine just pointed me to the BibJSON format by the
Bibliographic Knowledge Network:
http://www.bibkn.org/bibjson/index.html

Might be worth looking through for future collaboration/transformation
options.
 

marc-json and BibJSON serve two different purposes:  marc-json would
need to be a loss-less serialization of a MARC record which may or may
not contain bibliographic data (it may be an authority, holding or CID
record, for example).  BibJSON is more of a merging of data model and
serialization (which, admittedly, is no stranger to MARC) for the
purpose of bibliographic /citations/.  So it will probably be lossy
and there would most likely be a lot of MARC data that is out of
scope.

That's not to say it wouldn't be useful to figure out how to get from
MARC to BibJSON, but from my perspective it's difficult to see the
advantage it brings (being tied to JSON) vs. BIBO.

-Ross.
   
Thanks for the clarification, Ross. I thought it would be helpful (if 
nothing else) to see how data was being mapped in a related domain into 
and out of JSON. I'm new to library data in general, so I appreciate the 
clarification on which format is for what.


Appreciated,
Benjamin


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Bill Dueber
On Fri, Mar 5, 2010 at 3:14 PM, Houghton,Andrew hough...@oclc.org wrote:


 As you point out JSON streaming doesn't work with all clients and I am
 hesitant to build on anything that all clients cannot accept.  I think part
 of the issue here is proper API design.  Sending tens of megabytes back to a
 client and expecting them to process it seems like a poor API design
 regardless of whether they can stream it or not.  It might make more sense
 to have a server API send back 10 of our MARC-JSON records in a JSON
 collection and have the client request an additional batch of records for
 the result set.  In addition, if I remember correctly, JSON streaming or
 other streaming methods keep the connection to the server open which is not
 a good thing to do to maintain server throughput.


I guess my concern here is that the specification, as you're describing it,
is closing off potential uses.  It seems fine if, for example, your primary
concern is javascript-in-the-browser, and browser-request,
pagination-enabled systems might be all you're worried about right now.

That's not the whole universe of uses, though. People are going to want to
dump these things into a file to read later -- no possibility for pagination
in that situation. Others may, in fact, want to stream a few thousand
records down the pipe at once, but without a streaming parser that can't
happen if it's all one big array.

I worry that as specified, the *only* use will be, "Pull these down a thin
pipe, and if you want to keep them for later, or want a bunch of them, you
have to deal with marc-xml." Part of my incentive is to *not* have to use
marc-xml, but in this case I'd just be trading one technology I don't like
(marc-xml) for two technologies, one of which I don't like (that'd be
marc-xml again).

I really do understand the desire to make this parallel to marc-xml, but
there's a seam between the two technologies that makes that a problematic
approach.
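
For the dump-to-a-file case, a small Python sketch of reading newline-delimited JSON back, assuming one record per line. The `io.StringIO` object stands in for a file on disk and `read_ndj` is a name made up for this example.

```python
import io
import itertools
import json

def read_ndj(stream):
    """Yield one parsed record per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

# Stand-in for a file on disk: three records, one per line.
dump = io.StringIO('{"id": 1}\n{"id": 2}\n{"id": 3}\n')

# Grab just the first two records; the third line is never parsed.
first_two = list(itertools.islice(read_ndj(dump), 2))
remaining_line = dump.readline()
```

Only the lines actually requested are read and parsed, which is what makes peeking at the first record or two cheap.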



-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread LeVan,Ralph
 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Bill Dueber

 I really do understand the desire to make this parallel to marc-xml, but
 there's a seam between the two technologies that makes that a problematic
 approach.

As a confession, here in OCLC Research, we do pass around files of
marc-xml records that are newline delimited without a wrapper element
containing them.  We do that for all the reasons you gave for wanting
the same thing for JSON records.

Ralph


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Benjamin Young

On 3/5/10 3:45 PM, Bill Dueber wrote:

On Fri, Mar 5, 2010 at 3:14 PM, Houghton,Andrewhough...@oclc.org  wrote:


   

As you point out JSON streaming doesn't work with all clients and I am
hesitant to build on anything that all clients cannot accept.  I think part
of the issue here is proper API design.  Sending tens of megabytes back to a
client and expecting them to process it seems like a poor API design
regardless of whether they can stream it or not.  It might make more sense
to have a server API send back 10 of our MARC-JSON records in a JSON
collection and have the client request an additional batch of records for
the result set.  In addition, if I remember correctly, JSON streaming or
other streaming methods keep the connection to the server open which is not
a good thing to do to maintain server throughput.

 

I guess my concern here is that the specification, as you're describing it,
is closing off potential uses.  It seems fine if, for example, your primary
concern is javascript-in-the-browser, and browser-request,
pagination-enabled systems might be all you're worried about right now.

That's not the whole universe of uses, though. People are going to want to
dump these things into a file to read later -- no possibility for pagination
in that situation. Others may, in fact, want to stream a few thousand
records down the pipe at once, but without a streaming parser that can't
happen if it's all one big array.

I worry that as specified, the *only* use will be, "Pull these down a thin
pipe, and if you want to keep them for later, or want a bunch of them, you
have to deal with marc-xml." Part of my incentive is to *not* have to use
marc-xml, but in this case I'd just be trading one technology I don't like
(marc-xml) for two technologies, one of which I don't like (that'd be
marc-xml again).

I really do understand the desire to make this parallel to marc-xml, but
there's a seam between the two technologies that makes that a problematic
approach.
   
For my part, I'd like to explore the options of putting MARC data into 
CouchDB (which stores documents as JSON) which could then open the door 
for replicating that data between any number of installations of CouchDB 
as well as providing for various output formats (marc-xml, etc).


It's just an idea, but it's one that uses JSON outside of the browser 
and is a good proof case for any MARC in JSON format.


Thanks,
Benjamin

--
President
BigBlueHat
P: 864.232.9553
W: http://www.bigbluehat.com/
http://www.linkedin.com/in/benjaminyoung


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Bill Dueber
 Sent: Friday, March 05, 2010 03:45 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Q: XML2JSON converter
 
 I guess my concern here is that the specification, as you're describing
 it, is closing off potential uses.  It seems fine if, for example, your
 primary concern is javascript-in-the-browser, and browser-request,
 pagination-enabled systems might be all you're worried about right now.
 
 That's not the whole universe of uses, though. People are going to want
 to dump these things into a file to read later -- no possibility for
 pagination in that situation.

I disagree that you couldn't dump a paginated result set into a file for
reading later.  I do this all the time, not only in Javascript but in many other
programming languages.

 Others may, in fact, want to stream a few thousand
 records down the pipe at once, but without a streaming parser that
 can't happen if it's all one big array.

Well, if your service isn't allowing them to be streamed a few thousand records
at a time, then that isn't an issue :)

Maybe I have been misled or have misunderstood JSON streaming.  My understanding
was that you can generate an arbitrarily large outgoing stream on the server side
and can read an arbitrarily large incoming stream on the client side.  So it
shouldn't matter if the result set was delivered as one big JSON array.  The
SAX-like interface that JSON streaming uses provides the necessary events to
allow you to pull the individual records from that arbitrarily large array.
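
A rough Python sketch of what pulling records one at a time out of one big JSON array might look like. Note the swap: this is not an event-based (SAX-like) parser. It is a simplification that uses the standard library's `json.JSONDecoder.raw_decode` to peel one object at a time off a buffer that is filled in small chunks. The function name `iter_array_items` and the chunk size are assumptions, and the sketch only handles arrays whose elements are JSON objects.

```python
import io
import json

def iter_array_items(stream, chunk_size=64):
    """Yield objects from one top-level JSON array without loading it whole."""
    decoder = json.JSONDecoder()
    buf, started, eof = "", False, False
    while True:
        buf = buf.lstrip()
        if buf and not started:
            if buf[0] != "[":
                raise ValueError("expected a JSON array")
            buf, started = buf[1:], True
            continue
        if buf and buf[0] == "]":
            return                      # end of the array
        if buf and buf[0] == ",":
            buf = buf[1:]               # separator between elements
            continue
        if buf:
            if buf[0] != "{":
                raise ValueError("this sketch only handles arrays of objects")
            try:
                item, end = decoder.raw_decode(buf)
            except ValueError:
                pass                    # object incomplete: read more below
            else:
                yield item
                buf = buf[end:]
                continue
        if eof:
            raise ValueError("unexpected end of input")
        chunk = stream.read(chunk_size)
        if chunk:
            buf += chunk
        else:
            eof = True

big = json.dumps([{"tag": "001", "data": "rec%d" % n} for n in range(50)])
items = list(iter_array_items(io.StringIO(big), chunk_size=7))
```

The buffer never holds much more than one record plus one chunk, but the code re-parses an incomplete record each time more data arrives, so it is a demonstration rather than something tuned for large records.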

 I worry that as specified, the *only* use will be, "Pull these down a
 thin pipe, and if you want to keep them for later, or want a bunch of
 them, you have to deal with marc-xml."

Don't quite follow this.  MARC-XML is an XML format; MARC-JSON is our JSON
format for expressing the various MARC-21 formats (authority, bibliographic,
classification, community information and holdings) in JSON.  The JSON is
based on the structure of MARC-XML, which was based on the structure of
ISO 2709.  I don't see how MARC-XML comes into play when you are dealing with
JSON.  If you want to save our MARC-JSON, you don't have to convert it to
MARC-XML on the client side.  Just save it as a text file.

 Part of my incentive is to *not* have to use marc-xml, but in this 
 case I'd just be trading one technology I don't like (marc-xml) 
 for two technologies, one of which I don't like (that'd be marc-xml 
 again).

Again not sure how to address this concern.  If you are dealing with library 
data, then its current communication formats are either MARC binary (ISO 2709) 
or MARC-XML, ignoring IFLA's MARC-XML-ish format for the moment.  You might not 
like it, but that is life in library land.  You can go develop your own formats 
based on the various MARC-21 format specifications, but are unlikely to achieve 
any sort of interoperability with the existing library systems and services.

We chose our MARC-JSON to maintain the structural components of MARC-XML and
hence MARC binary (ISO 2709).  In MARC, control fields have different semantics
from data fields, and you cannot merge them into one thing called "field".  If you
look closely at the MARC-XML schema, you might notice that the controlfield and
datafield elements can have non-numeric tags.  If you merge everything into
something called "field", then you cannot distinguish between a non-numeric tag
for a controlfield vs. a datafield element.  There are valid reasons why we
decided to maintain the existing structure of MARC.


Andy.


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Benjamin Young
 Sent: Friday, March 05, 2010 04:24 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Q: XML2JSON converter
 
 For my part, I'd like to explore the options of putting MARC data into
 CouchDB (which stores documents as JSON) which could then open the door
 for replicating that data between any number of installations of CouchDB
 as well as providing for various output formats (marc-xml, etc).
 
 It's just an idea, but it's one that uses JSON outside of the browser
 and is a good proof case for any MARC in JSON format.

This was partly the reason why I developed our MARC-JSON format since I'm using 
MongoDB [1] which is a NoSQL database based on JSON.


Andy.

[1] http://www.mongodb.org/display/DOCS/Home


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Bill Dueber
On Fri, Mar 5, 2010 at 4:38 PM, Houghton,Andrew hough...@oclc.org wrote:


 Maybe I have been misled or misunderstood JSON streaming.


This is my central point. I'm actually saying that JSON streaming is painful
and rare enough that it should be avoided as a requirement for working with
any new format.

I guess, in sum, I'm making the following assertions:

1. Streaming APIs for JSON, where they exist, are a pain in the ass. And
they don't exist everywhere. Without a JSON streaming parser, you have to
pull the whole array of documents up into memory, which may be impossible.
This is the crux of my argument -- if you disagree with it, then I would
assume you disagree with the other points as well.

2. Many people -- and I don't think I'm exaggerating here, honestly --
really don't like using MARC-XML but have to because of the length
restrictions on MARC-binary. A useful alternative, based on dead-easy
parsing and production, is very appealing.

2.5 Having to deal with a streaming API takes away the dead-easy part.

3. If you accept my assertions about streaming parsers, then dealing with
the format you've proposed for large sets is either painful (with a
streaming API) or impossible (where such an API doesn't exist) due to memory
constraints.

4. Streaming JSON writer APIs are also painful; everything that applies to
reading applies to writing. Sans a streaming writer, trying to *write* a
large JSON document also results in you having to have the whole thing in
memory.

5. People are going to want to deal with this format, because of its
benefits over marc21 (record length) and marc-xml (ease of processing),
which means we're going to want to deal with big sets of data and/or dump
batches of it to a file. Which brings us back to #1, the pain or absence of
streaming apis.

"Write a better JSON parser/writer" or "use a different language" seem like
bad solutions to me, especially when a (potentially) useful alternative
exists.

As I pointed out, if streaming JSON is no harder for you, and no less
available, than non-streaming JSON, then this is mostly moot. I assert that
for many people in this community it is one or the other, which is why I'm
leery of it.

  -Bill-


-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: [CODE4LIB] Code4Lib 2011 Proposals

2010-03-05 Thread Meireles, Vanessa
Miami is also very expensive; it's now considered one of the top 3 most
expensive places to live. I must add that Feb is also our high season, which
means hotel rates and airfares are more than double the usual rates.  We also
have a poor public transportation system... sorry, unless someone else in
Florida can host the conference.

Vanessa Meireles
v.meire...@miami.edu
Computer Programmer, 
Information Mgmt  Systems and Digital Initiatives

University of Miami Richter Library
Coral Gables, FL  33124-0320
URL:  http://www.library.miami.edu/




Re: [CODE4LIB] Code4Lib 2011 Proposals

2010-03-05 Thread Sibyl Schaefer
Anyone interested in Burlington, Vt.? If I had some help (and the
deadline extended a couple days) I'd be willing to throw in the hat.

Sibyl Schaefer
University of Vermont

On Fri, Mar 5, 2010 at 5:22 PM, Meireles, Vanessa v.meire...@miami.edu wrote:
 Miami is also very expensive, it's considered top 3 now in the most expensive 
 places to live, plus I must add that Feb is also our high season which means 
 hotel rates and airfares are more than double the usual rates.   We also have 
 a poor public transportation system... sorry, unless someone else in Florida 
 can host the conference.

 Vanessa Meireles
 v.meire...@miami.edu
 Computer Programmer,
 Information Mgmt  Systems and Digital Initiatives

 University of Miami Richter Library
 Coral Gables, FL  33124-0320
 URL:  http://www.library.miami.edu/


 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of 
 Fleming, Declan
 Sent: Friday, March 05, 2010 1:24 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Code4Lib 2011 Proposals

 Hiya - San Diego is friggin expensive, and we don't have a small campus feel 
 at all.  Robert McDonald and I worked out the costs a few years ago and we'd 
 be almost double what Asheville conf cost folks.

 It's killing me not to have you all out to paradise in Feb, but I can barely 
 afford to live here :)

 D

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of 
 Walter Lewis
 Sent: Wednesday, March 03, 2010 11:21 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Code4Lib 2011 Proposals

 On 3 Mar 10, at 9:52 AM, Julia Bauder wrote:

 Also, the farther north we go, the more likely that snow+airplane
 incompatibilities will foil speakers' (and attendees'!) travel plans at the
 last minute, which isn't fun for anyone.

 somewhere_out_of_nor'easter_and_lake_effect_range_in_february++

 Actually there is a clear line (at least on the eastern half of the 
 continent) where the further north you go, the *less* snow you got this winter.  
 Buffalo is trailing a number of places on the east coast in total snow 
 accumulation and Toronto has been dusted a few times this winter, with 
 nothing of real substance.  Detroit and Chicago were well below seasonal 
 averages last time I checked.

 ALL of that said,  where are the San Diego gang or the folks from Miami?

 Walter
  who can only dream of pubs with open patios in February



Re: [CODE4LIB] Code4Lib 2011 Proposals

2010-03-05 Thread Edward M. Corrado
Hi Sibyl,

I'd love Burlington. It might not be warm but there are a lot of good
winter activities. However, it is probably too late for this year to
find out what the costs, etc. are, but if you want to put in a proposal
for 2012, count me in.

Edward

On Fri, Mar 5, 2010 at 5:53 PM, Sibyl Schaefer sibylschae...@gmail.com wrote:
 Anyone interested in Burlington, Vt.? If I had some help (and the
 deadline extended a couple days) I'd be willing to throw in the hat.

 Sibyl Schaefer
 University of Vermont

 On Fri, Mar 5, 2010 at 5:22 PM, Meireles, Vanessa v.meire...@miami.edu 
 wrote:
 Miami is also very expensive, it's considered top 3 now in the most 
 expensive places to live, plus I must add that Feb is also our high season 
 which means hotel rates and airfares are more than double the usual rates.   
 We also have a poor public transportation system... sorry, unless someone 
 else in Florida can host the conference.

 Vanessa Meireles
 v.meire...@miami.edu
 Computer Programmer,
 Information Mgmt  Systems and Digital Initiatives

 University of Miami Richter Library
 Coral Gables, FL  33124-0320
 URL:  http://www.library.miami.edu/


 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of 
 Fleming, Declan
 Sent: Friday, March 05, 2010 1:24 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Code4Lib 2011 Proposals

 Hiya - San Diego is friggin expensive, and we don't have a small campus feel 
 at all.  Robert McDonald and I worked out the costs a few years ago and we'd 
 be almost double what Asheville conf cost folks.

 It's killing me not to have you all out to paradise in Feb, but I can barely 
 afford to live here :)

 D

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of 
 Walter Lewis
 Sent: Wednesday, March 03, 2010 11:21 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Code4Lib 2011 Proposals

 On 3 Mar 10, at 9:52 AM, Julia Bauder wrote:

 Also, the farther north we go, the more likely that snow+airplane
 incompatibilities will foil speakers' (and attendees'!) travel plans at the
 last minute, which isn't fun for anyone.

 somewhere_out_of_nor'easter_and_lake_effect_range_in_february++

 Actually there is a clear line (at least on the eastern half of the 
 continent) where the further north you go, the *less* snow you got this winter.  
 Buffalo is trailing a number of places on the east coast in total snow 
 accumulation and Toronto has been dusted a few times this winter, with 
 nothing of real substance.  Detroit and Chicago were well below seasonal 
 averages last time I checked.

 ALL of that said,  where are the San Diego gang or the folks from Miami?

 Walter
  who can only dream of pubs with open patios in February




Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Bill Dueber
On Fri, Mar 5, 2010 at 6:25 PM, Houghton,Andrew hough...@oclc.org wrote:

 OK, I will bite, you stated:

 1. That large datasets are a problem.
 2. That streaming APIs are a pain to deal with.
 3. That tool sets have memory constraints.

 So how do you propose to process large JSON datasets that:

 1. Comply with the JSON specification.
 2. Can be read by any JavaScript/JSON processor.
 3. Do not require the use of streaming API.
 4. Do not exceed the memory limitations of current JSON processors.


What I'm proposing is that we don't process large JSON datasets; I'm
proposing that we process smallish JSON documents one at a time by pulling
them out of a stream based on an end-of-record character.

This is basically what we use for MARC21 binary format -- have a defined
structure for a valid record, and separate multiple well-formed record
structures with an end-of-record character. This preserves JSON
specification adherence at the record level and uses a different scheme to
represent collections. Obviously, MARC-XML uses a different mechanism to
define a collection of records -- putting well-formed record structures
inside a collection tag.

So... I'm proposing we define what we mean by a single MARC record serialized
to JSON (in whatever format; I'm not very opinionated on this point) that
preserves the order, indicators, tags, data, etc. we need to round-trip
between marc21binary, marc-xml, and marc-json.

And then separate those valid records with an end-of-record character --
\n.
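
A minimal sketch of the writing side of this scheme, in Python. The record
layout below (a "leader" plus an ordered list of fields) is a hypothetical
placeholder, since the actual MARC-in-JSON structure is left open above; the
only point illustrated is that each record becomes one line of valid JSON
followed by \n.

```python
import io
import json


def write_records(records, stream):
    """Write each record as a single line of JSON followed by a newline."""
    for record in records:
        # With no indent argument, json.dumps emits no raw line breaks, and
        # newlines inside string values are escaped as \n, so every record
        # is guaranteed to occupy exactly one line.
        stream.write(json.dumps(record))
        stream.write("\n")


# Hypothetical record layout, for illustration only. Lists are used for
# fields and subfields so that their order survives a round trip.
sample_records = [
    {"leader": "placeholder leader 1",
     "fields": [["245", "1", "0", [["a", "First title"]]]]},
    {"leader": "placeholder leader 2",
     "fields": [["500", " ", " ", [["a", "A note with\nan embedded newline"]]]]},
]

buffer = io.StringIO()
write_records(sample_records, buffer)
output = buffer.getvalue()
```

Because embedded newlines are escaped inside JSON strings, \n can safely act
as the end-of-record character without any extra quoting rules.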

Unless I've read all this wrong, you've come to the conclusion that the
benefit of having a JSON serialization that is valid JSON at both the record
and collection level outweighs the pain of having to deal with a streaming
parser and writer.  This allows a single collection to be treated as any
other JSON document, which has obvious benefits (which I certainly don't
mean to minimize) and all the drawbacks we've been talking about *ad nauseam*.

I go the other way. I think the pain of dealing with a streaming API
outweighs the benefits of having a single valid JSON structure for a
collection, and instead have put forward that we use a combination of JSON
records and a well-defined end-of-record character (\n) to represent a
collection.  I recognize that this involves providing special-purpose code
which must call for JSON-deserialization on each line, instead of being able
to throw the whole stream/file/whatever at your JSON parser. I accept
that because getting each line of a text file is something I find easy
compared to dealing with streaming parsers.
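
The two steps (get a line; deserialize that line) might look roughly like
this in Python. This is only a sketch: the function name read_records and the
tiny {"id": ...} records are made up for illustration, and skipping blank
lines is an assumption, not part of the proposal.

```python
import io
import json


def read_records(stream):
    """Yield one deserialized record per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if not line:
            # Assumption: tolerate blank lines rather than treat them as errors.
            continue
        # Only one record is held in memory at a time.
        yield json.loads(line)


# Stand-in for a file of newline-delimited JSON records.
data = io.StringIO('{"id": 1}\n{"id": 2}\n\n{"id": 3}\n')
ids = [record["id"] for record in read_records(data)]
```

Any ordinary JSON library can be used this way; no streaming parser is needed
because each call to json.loads sees one small, complete document.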

And our point of disagreement, I think, is that I believe that defining the
collection structure in such a way that we need two steps (get a line;
deserialize that line) and can't just call the equivalent of
JSON.parse(stream) has benefits in ease of implementation and use that
outweigh the loss of having both a single record and a collection of records
be valid JSON. And you, I think, don't :-)

I'm going to bow out of this now, unless I've got some part of our positions
wrong, to let any others that care (which may number zero) chime in.

 -Bill-










-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Houghton,Andrew
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Ross Singer
 Sent: Friday, March 05, 2010 09:18 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Q: XML2JSON converter
 
 I actually just wrote the same exact email as Bill (although probably
 not as polite -- I called the marcxml collection element a
 contrivance that appears nowhere in marc21).  I even wrote the
 marc21 is EOR character delimited files bit.  I was hoping to figure
 out how to use unix split to make my point, couldn't, and then
 discarded my draft.
 
 But I was *right there*.
 
 -Ross.

I'll answer Bill's message tomorrow after I have had some sleep :) 

Actually, I contend that the MARC-XML collection element does appear in MARC 
(ISO 2709), but it is at the physical layer and not at the structural layer.  
Remember MARC records were placed on a tape reel, thus the tape reel was the 
collection (container).  Placed on disk in a file, the file is the collection 
(container).  I agree that it's not spelled out in the standard, but the 
concept of a collection (container) is implicit when you have more than one 
record of anything.

Basic set theory: a set is a container for its members :)

The obvious reason why it exists in XML is that the XML infoset requires a 
single document element (container).  This is why the MARC-XML schema allows 
either a collection or record element to be specified as the document element.  
It is unfortunate that the XML infoset requires a single document element, 
otherwise you would be back to the file on disk being the implicit collection 
(container) as it is in ISO 2709.
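
As a rough illustration of what that means for code reading MARC-XML, here is
a Python sketch that accepts either document element. The function name
records_from and the two sample documents are invented for the example; the
namespace URI is the MARC21 slim one.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
RECORD_TAG = f"{{{MARC_NS}}}record"
COLLECTION_TAG = f"{{{MARC_NS}}}collection"


def records_from(xml_text):
    """Return the record elements whether the root is a record or a collection."""
    root = ET.fromstring(xml_text)
    if root.tag == RECORD_TAG:
        # A lone record is its own document element.
        return [root]
    if root.tag == COLLECTION_TAG:
        # The collection element is the explicit container.
        return root.findall(RECORD_TAG)
    raise ValueError(f"unexpected document element: {root.tag}")


# Invented sample documents, for illustration only.
single = f'<record xmlns="{MARC_NS}"><leader>placeholder</leader></record>'
several = f'<collection xmlns="{MARC_NS}"><record/><record/></collection>'

single_count = len(records_from(single))
several_count = len(records_from(several))
```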


Andy.