date:20101025

Sorry. That was rude, and uncalled for. I disagree that the problem is
easily solved, even without the politics. There've been lots of attempts to
try to come up with a sufficiently expressive toolset for dealing with
biblio data, and we're still working on it. If you do think you've got some
insight, I'm sure we're all ears, but try to frame it in terms of the existing
work if you can (RDA, some of the dublin core stuff, etc.) so we have a
frame of reference.

On Mon, Oct 25, 2010 at 10:18 PM, Bill Dueber  wrote:

> On Mon, Oct 25, 2010 at 10:10 PM, Alexander Johannesen <
> alexander.johanne...@gmail.com> wrote:
>
>> Political? For sure. Engineering? Not so much.
>
>
> Ok. Solve it. Let us know when you're done.
>
>
>
> --
> Bill Dueber
> Library Systems Programmer
> University of Michigan Library
>

-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library

Re: [CODE4LIB] MARCXML - What is it for?

On Mon, Oct 25, 2010 at 10:10 PM, Alexander Johannesen <
alexander.johanne...@gmail.com> wrote:

> Political? For sure. Engineering? Not so much.


Ok. Solve it. Let us know when you're done.


-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Dana Pearson

i'm not a coder but i undertook a study of XML some years after it
came onto the scene and with a likely confused notion that it would be
the next significant technology, I learned some XSL and later was able
to weave PubMed Central journal information (CSV transformed into XML)
together with Dublin Core metadata of journal articles into MARCXML
during harvest with MarcEdit (which the inestimable Terry Reese
continues to tweak).  Also used the same XML journal data to augment
NLM journal records with PubMed Central holdings and other data with
a transform in my IDE, though it took me weeks to get right... so, no
aspirations to become a coder.

Probably did not get all of the MARC cataloging rules right and I can
empathize with those who come to MARC and cataloging standards without
cataloging training or experience. My library experience was primarily
as library director...my expertise on library specializations would
always be under question.

regards,
dana

-- 
Dana Pearson
dbpearsonmlis.com

Re: [CODE4LIB] MARCXML - What is it for?

On Tue, Oct 26, 2010 at 12:48 PM, Bill Dueber  wrote:
> Here, I think you're guilty of radically underestimating "lots of people
> around the library world." No one thinks MARC is a good solution to
> our modern problems, and no one who actually knows what MARC
> is has trouble understanding MARC-XML as an XML serialization of
> the same old data -- certainly not anyone capable of meaningful
> contribution to work on an alternative.

Slow down, Tex. "Lots of people in the library world" is not the same
as developers, or even good developers, or even good XML developers,
or even good XML developers who know what the document model imposes
on a data-centric approach.

> The problem we're dealing with is *hard*. Mind-numbingly hard.

This is no justification for not doing things better. (And I'd love to
know what the hard bits are; always interesting to hear from various
people as to what they think are the *real* problems of library
problems, as opposed to any other problem they have)

> The library world has several generations of infrastructure built
> around MARC (by which I mean AACR2), and devising data
> structures and standards that are a big enough improvement over
>  MARC to warrant replacing all that infrastructure is an engineering
>  and political nightmare.

Political? For sure. Engineering? Not so much. This is just that whole
"blinded by MARC" issue that keeps cropping up from time to time, and
rightly so; it is truly a beast - at least the way we have come to
know it through AACR2 and all its friends and its death-defying focus
on all things bibliographic - that has paralyzed library innovation,
probably to the point of making libraries almost irrelevant to the
world.

> I'm happy to take potshots at the RDA stuff from the sidelines, but I never
> forget that I'm on the sidelines, and that the people active in the game are
> among the best and brightest we have to offer, working on a problem that
>  invariably seems more intractable the deeper in you go.

Well, that's a pretty scary sentence, for all sorts of reasons, but I
think I shall not go there.

> If you think MARC-XML is some sort of an actual problem

What, because you don't agree with me the problem doesn't exist? :)

> and that people
> just need to be shouted at to realize that and do something about it, then,
> well, I think you're just plain wrong.

Fair enough, although you seem to be under the assumption that all of
the stuff I'm saying is a figment of my imagination (I've been
involved in several projects lambasted because managers think MARCXML
is solving some imaginary problem; this is not bullshit, but pain and
suffering from the battlefields of library development), that I'm not
one of those developers (or one of you, although judging from this
discussion it's clear that I am not), that the things I say somehow
don't apply because you don't agree with, umm, what I'm assuming is
my somewhat direct approach to stating my heretic opinions.

Alex
-- 
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ --
-- http://www.google.com/profiles/alexander.johannesen ---

Re: [CODE4LIB] MARCXML - What is it for?

On Mon, Oct 25, 2010 at 9:32 PM, Alexander Johannesen <
alexander.johanne...@gmail.com> wrote:

> Lots of people around the library world infra-structure will think
> that since your data is now in XML it has taken some important step
> towards being inter-operable with the rest of the world, that library
> data now is part of the real world in *any* meaningful way, but this
> is simply demonstrably deceivingly not true.

Here, I think you're guilty of radically underestimating "lots of people
around the library world." No one thinks MARC is a good solution to our
modern problems, and no one who actually knows what MARC is has trouble
understanding MARC-XML as an XML serialization of the same old data --
certainly not anyone capable of meaningful contribution to work on an
alternative.

You seem to presuppose that there's an enormous pent-up energy poised to
sweep in changes to an obviously-better data format, and that the existence
of MARC-XML somehow defuses all that energy. The truth is that a high
percentage of people that work with MARC data actively think about (or
curse) things that are wrong with it and gobs and gobs of ridiculously-smart
people work on a variety of alternate solutions (not the least of which is
RDA) and get their organizations to spend significant money to do so. The
problem we're dealing with is *hard*. Mind-numbingly hard.

The library world has several generations of infrastructure built around
MARC (by which I mean AACR2), and devising data structures and standards
that are a big enough improvement over MARC to warrant replacing all
that infrastructure is an engineering and political nightmare. I'm happy to
take potshots at the RDA stuff from the sidelines, but I never forget that
I'm on the sidelines, and that the people active in the game are among the
best and brightest we have to offer, working on a problem that invariably
seems more intractable the deeper in you go.

If you think MARC-XML is some sort of an actual problem, and that people
just need to be shouted at to realize that and do something about it, then,
well, I think you're just plain wrong.

  -Bill-

-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library

Re: [CODE4LIB] MARCXML - What is it for?

On Tue, Oct 26, 2010 at 11:56 AM, Walker, David  wrote:
> Your criticisms of MARC-XML all seem to presume that MARC-XML is the
> goal, the end point in the process.  But MARC-XML is really better seen as a
> utility, a middle step between binary MARC and the real goal, which is some
> other "useful and interesting" XML schema.

How do you create an ontological commitment in a community to an
expanding and useful set of tools and vocabularies? I think I need to
remind people of what MARCXML is supposed to be ;

"a framework for working with MARC data in a XML environment. This
framework is intended to be flexible and extensible to allow users to
work with MARC data in ways specific to their needs. The framework
itself includes many components such as schemas, stylesheets, and
software tools."

I'm not assuming MARCXML is a goal, no matter how we define that. I'm
poo-pooing MARCXML for the semantics we, as a community, have been
given by a process I suspect had goals very different from reality.
Very few people would "work with MARC through MARCXML", they would use
it to convert it, filter it, hack around it to something else
entirely. And I'm afraid lots of people are missing the point of
stubbing the developments in a community by embracing tools that
push a packet that inhibits innovation. So, here's the point, in
paraphrased point;

   "Here's our new thing. And we did it by simply converting all our
MARC into MARCXML that runs on a cron job every midnight, and a bit of
horrendous XSLT that's impossible to maintain."

   "But it looks just like the old thing using MARC and some templates?"

   "Ah yes, but now we're doing it in XML!"

   (Yeah, yeah, your mileage will vary)

I'm sorry if I'm overly pessimistic about the XML goodness in the
world, not for the XML itself, but the consequences of the named
entities involved. I've been a die-hard XML wonk for far too many
years, and the tools in that tool-chest don't automatically solve
hard problems better by wrapping stuff up in angle brackets, and -
dare I say it? - perhaps introduce a whole fleet of other problems
rarely talked about when XML is the latest buzz-word, like using a
document model on what's a traditional records model, character
encodings, whitespace issues, unicode, size and efficiencies (the
other part of this thread), and so on.

But let me also be a bit more specific about that hard semantic
problem I'm talking about;

Lots of people around the library world infra-structure will think
that since your data is now in XML it has taken some important step
towards being inter-operable with the rest of the world, that library
data now is part of the real world in *any* meaningful way, but this
is simply demonstrably deceivingly not true. Having our data in XML
has killed a few good projects where people have gone "A new project
to convert our MARC into useful XML? Aha! LoC has already solved that
problem for us."

Btw, to those who find me so obnoxious, at no point do I say it was
intentionally evil, just evil none the same. The road to hell is, as
always, paved with good intentions.

Alex
-- 
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ --
-- http://www.google.com/profiles/alexander.johannesen ---

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Eric Lease Morgan

On Oct 25, 2010, at 8:56 PM, Walker, David wrote:

> Your criticisms of MARC-XML all seem to presume that MARC-XML is the goal, 
> the end point in the process.  But MARC-XML is really better seen as a 
> utility, a middle step between binary MARC and the real goal, which is some 
> other "useful and interesting" XML schema.

Exactly.

-- 
Eric Morgan

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Walker, David

> b) expanding it to be actually useful and interesting.

But here I think you've missed the very utility of MARC-XML.

Let's say you have a binary MARC file (the kind that comes out of an ILS) and 
want to transform that into MODS, Dublin Core, or maybe some other XML schema.  

How would you do that?  

One way is to first transform the MARC into MARC-XML.  Then you can use XSLT to 
crosswalk the MARC-XML into that other schema.  Very handy.

Your criticisms of MARC-XML all seem to presume that MARC-XML is the goal, the 
end point in the process.  But MARC-XML is really better seen as a utility, a 
middle step between binary MARC and the real goal, which is some other "useful 
and interesting" XML schema.
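The "middle step" idea can be sketched in a few lines. This is only an illustration under stated assumptions: it uses plain Python (the standard library's ElementTree) in place of the XSLT mentioned above, the sample record is made up, and the two-field mapping to Dublin Core is a toy, not a real crosswalk. The function names subfield and crosswalk are hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"      # MARC-XML namespace
DC_NS = "http://purl.org/dc/elements/1.1/"      # Dublin Core elements namespace

# Hypothetical one-record MARC-XML sample, made up for illustration.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">ocm00000001</controlfield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">Jane Doe.</subfield>
  </datafield>
</record>"""


def subfield(record, tag, code):
    """Return the text of the first matching subfield, or None."""
    path = (f"{{{MARC_NS}}}datafield[@tag='{tag}']/"
            f"{{{MARC_NS}}}subfield[@code='{code}']")
    node = record.find(path)
    return node.text if node is not None else None


def crosswalk(marcxml):
    """Map a couple of MARC fields onto Dublin Core elements (toy mapping)."""
    record = ET.fromstring(marcxml)
    dc = ET.Element("metadata")
    mapping = {"title": ("245", "a"), "creator": ("100", "a")}
    for dc_name, (tag, code) in mapping.items():
        value = subfield(record, tag, code)
        if value:
            # Strip trailing ISBD punctuation that MARC carries in the data.
            ET.SubElement(dc, f"{{{DC_NS}}}{dc_name}").text = value.rstrip(" /.")
    return dc


dc_record = crosswalk(SAMPLE)
print(ET.tostring(dc_record, encoding="unicode"))
```

Because the input is ordinary namespaced XML, any XML toolchain can do this step; the binary-to-MARC-XML conversion has to happen first, with a MARC library of your choice.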

--Dave

==
David Walker
Library Web Services Manager
California State University
http://xerxes.calstate.edu

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Alexander 
Johannesen [alexander.johanne...@gmail.com]
Sent: Monday, October 25, 2010 12:38 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?

Hiya,

On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack  wrote:
> Switching to an XML format doesn't help with that at all.

I'm willing to take it further and say that MARCXML was the worst
thing the library world ever did. Some might argue it was a good first
step, and that it was better with something rather than nothing, to
which I respond ;

Poppycock!

MARCXML is nothing short of evil. Not only does it go against every
principle of good XML anywhere (don't rely on whitespace, structure
over code, namespace conventions, identity management, document
control, separation of entities and properties, and on and on), it
breaks the ontological commitment that a better treatment of the MARC
data could bring, deterring people from actually a) using the darn
thing as anything but a bare minimal crutch, and b) expanding it to be
actually useful and interesting.

The quicker the library world can get rid of this monstrosity, the
better, although I doubt that will ever happen; it will hang around
like a foul stench for as long as there is MARC in the world. A long
time. A long sad time.

A few extra notes;
   http://shelterit.blogspot.com/2008/09/marcxml-beast-of-burden.html

Can you tell I'm not a fan? :)

Kind regards,

Alex
--
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ --
-- http://www.google.com/profiles/alexander.johannesen ---

Re: [CODE4LIB] Django

2010-10-25 Thread Luciano Ramalho

On Mon, Oct 25, 2010 at 6:33 PM, Gabriel Farrell  wrote:
> If you already know PHP you might want to check out Symfony or another
> PHP framework to get the hang of web frameworks, then move onto other
> languages from there.

I've been using Django for a couple of years now, and have been tasked
with introducing Django to a team at my current employer. Two of the
developers here, both experienced in PHP but just learning Python,
told me that they've found Django much simpler and easier to learn
than Symfony.

Besides the original Django Book, my colleagues have also enjoyed
"Python Web Development with Django", which includes half a dozen
simple and diverse example applications.

http://www.amazon.com/Python-Development-Django-Jeff-Forcier/dp/0132356139

-- 
Luciano Ramalho
programador repentista || stand-up programmer
Twitter: @luciano

Re: [CODE4LIB] Django

2010-10-25 Thread Junior Tidal

I know the difference. 

>>> Andrew Hankinson  10/25/2010 4:40 PM >>>
Django is a web framework; Python is the language.

If you don't know the difference, I'd suggest sticking with PHP and going with 
one of the frameworks available to you there.


On 2010-10-25, at 4:25 PM, Junior Tidal wrote:

> Thanks for the suggestions everyone. I haven't actively looked for resources 
> since I'm busy doing collection development. However, I came across an 
> advertisement for a Django book and figured it would be a useful language to 
> learn. I already know php, so it seems logical that django is the next step?
> 
> Best,  
> 
> Junior Tidal
> Assistant Professor
> Web Services and Multimedia Librarian
> New York City College of Technology, CUNY 
> 300 Jay Street
> Brooklyn, NY 11210
> 718.260.5481
> 
> http://library.citytech.cuny.edu 
> 
> 
> >>> Andrew Hankinson  10/25/2010 10:23 AM >>>
> There's the Django Book: http://www.djangobook.com/ (Make sure you choose the 
> revised edition for 1.0)
> The Django docs, with some intro tutorials: 
> http://docs.djangoproject.com/en/1.2/ 
> 
> Did you try those already?
> 
> 
> On 2010-10-25, at 10:19 AM, Junior Tidal wrote:
> 
>> Hello Code4Lib,
>> 
>> Does anyone have any recommendations for learning Django? Books, websites, 
>> video tutorials, etc. ...
>> 
>> thanks,
>> 
>> Junior Tidal
>> Assistant Professor
>> Web Services and Multimedia Librarian
>> New York City College of Technology, CUNY 
>> 300 Jay Street
>> Brooklyn, NY 11210
>> 718.260.5481
>> 
>> http://library.citytech.cuny.edu

Re: [CODE4LIB] MARCXML - What is it for?

I know there are two parts of this discussion (speed on the one hand,
applicability/features on the other), but for the former, running a little
benchmark just isn't that hard. Aren't we supposed to, you know, prefer to
make decisions based on data?

Note: I'm only testing deserialization because there isn't, as of now, a
fast serialization option for ruby-marc. It uses REXML, and it's dog-slow. I
already looked at marc-in-json vs. marc binary at
http://robotlibrarian.billdueber.com/sizespeed-of-various-marc-serializations-using-ruby-marc/

Benchmark Source: http://gist.github.com/645683

18,883 records as either an XML collection or newline-delimited json.
Open the file, read every record, pull out a title. Repeat 5 times for a
total of 94,415 records (i.e., just under 100K records total).

Under ruby-marc, using the libxml deserializer is the fastest option. If
you're using the REXML parser, well,  god help us all.

ruby 1.8.7 (2010-08-16 patchlevel 302) [i686-darwin9.8.0]. User time
reported in seconds.

  xml w/libxml 227 seconds
  marc-in-json w/yajl  130 seconds

So... quite a bit faster (more than 40%). For a million records (assuming I
can just say 10*these_values) you're talking about a difference of 16
minutes due to just reading speed. Assuming, of course, you're running your
code on my desktop. Today.

For the 8M records I have to deal with, that'd be roughly 8M * ((227-130)
/ 94,415)  = 8219 seconds, or about 137 minutes. So... a lot.
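The extrapolation is easy to re-run. A minimal sketch, assuming only the two timings and the record count reported in this message; the name minutes_saved is made up for illustration and is not part of the benchmark gist.

```python
# Timings reported above: user time in seconds for 5 passes over 18,883 records.
records_timed = 18_883 * 5          # 94,415 records in total
xml_seconds = 227                   # MARC-XML read with libxml
json_seconds = 130                  # marc-in-json read with yajl

# Seconds saved per record by reading JSON instead of XML.
saved_per_record = (xml_seconds - json_seconds) / records_timed


def minutes_saved(record_count):
    """Linear extrapolation of the read-time difference, in minutes."""
    return record_count * saved_per_record / 60


# Fraction of the XML read time that the JSON reader avoids.
relative_saving = (xml_seconds - json_seconds) / xml_seconds

for count in (1_000_000, 8_000_000):
    print(f"{count:>9,} records: {minutes_saved(count):6.1f} minutes saved")
print(f"relative saving: {relative_saving:.0%}")
```

The linear scaling is itself an assumption: it ignores disk caching, garbage collection, and anything else that does not grow in step with record count.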

Of course, if you're using a slower XML library or a slower JSON library,
your numbers will vary quite a bit. REXML is unforgivingly slow, and
json/pure (and even 'json') are quite a bit slower than yajl. And don't
forget that you need to serialize these things from your source somehow...

 -Bill-

On Mon, Oct 25, 2010 at 4:23 PM, Stephen Meyer wrote:

> Kyle Banerjee wrote:
>
>> On Mon, Oct 25, 2010 at 12:38 PM, Tim Spalding 
>> wrote:
>>
>>  Does processing speed of something matter anymore? You'd have to be
>>> doing a LOT of processing to care, wouldn't you?
>>>
>>>
>> Data migrations and data dumps are a common use case. Needing to break or
>> make hundreds of thousands or millions of records is not uncommon.
>>
>> kyle
>>
>
> To make this concrete, we process the MARC records from 14 separate ILS's
> throughout the University of Wisconsin System. We extract, sort on OCLC
> number, dedup and merge pieces from any campus that has a record for the
> work. The MARC that we then index and display here
>
>  http://forward.library.wisconsin.edu/catalog/ocm37443537?school_code=WU
>
> is not identical to the version of the MARC record from any of the 4
> schools that hold it.
>
> We extract 13 million records and dedup down to 8 million every week. Speed
> is paramount.
>
> -sm
> --
> Stephen Meyer
> Library Application Developer
> UW-Madison Libraries
> 436 Memorial Library
> 728 State St.
> Madison, WI 53706
>
> sme...@library.wisc.edu
> 608-265-2844 (ph)
>
>
> "Just don't let the human factor fail to be a factor at all."
> - Andrew Bird, "Tables and Chairs"
>

-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library

Re: [CODE4LIB] MARCXML - What is it for?

Ray Denenberg, Library of Congress  wrote:
> It really is possible to make your point without being quite so obnoxious.

Obnoxious?


Alex
-- 
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ --
-- http://www.google.com/profiles/alexander.johannesen ---

Re: [CODE4LIB] Django

Django is a web framework; Python is the language.

If you don't know the difference, I'd suggest sticking with PHP and going with 
one of the frameworks available to you there.


On 2010-10-25, at 4:25 PM, Junior Tidal wrote:

> Thanks for the suggestions everyone. I haven't actively looked for resources 
> since I'm busy doing collection development. However, I came across an 
> advertisement for a Django book and figured it would be a useful language to 
> learn. I already know php, so it seems logical that django is the next step?
> 
> Best,  
> 
> Junior Tidal
> Assistant Professor
> Web Services and Multimedia Librarian
> New York City College of Technology, CUNY 
> 300 Jay Street
> Brooklyn, NY 11210
> 718.260.5481
> 
> http://library.citytech.cuny.edu
> 
> 
> >>> Andrew Hankinson  10/25/2010 10:23 AM >>>
> There's the Django Book: http://www.djangobook.com/ (Make sure you choose the 
> revised edition for 1.0)
> The Django docs, with some intro tutorials: 
> http://docs.djangoproject.com/en/1.2/ 
> 
> Did you try those already?
> 
> 
> On 2010-10-25, at 10:19 AM, Junior Tidal wrote:
> 
>> Hello Code4Lib,
>> 
>> Does anyone have any recommendations for learning Django? Books, websites, 
>> video tutorials, etc. ...
>> 
>> thanks,
>> 
>> Junior Tidal
>> Assistant Professor
>> Web Services and Multimedia Librarian
>> New York City College of Technology, CUNY 
>> 300 Jay Street
>> Brooklyn, NY 11210
>> 718.260.5481
>> 
>> http://library.citytech.cuny.edu

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Ray Denenberg, Library of Congress

It really is possible to make your point without being quite so obnoxious.
Everyone else seems to be able to do so. --Ray

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Alexander Johannesen
Sent: Monday, October 25, 2010 3:38 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?

Hiya,

On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack  wrote:
> Switching to an XML format doesn't help with that at all.

I'm willing to take it further and say that MARCXML was the worst thing the
library world ever did. Some might argue it was a good first step, and that
it was better with something rather than nothing, to which I respond ;

Poppycock!

MARCXML is nothing short of evil. Not only does it go against every
principle of good XML anywhere (don't rely on whitespace, structure over
code, namespace conventions, identity management, document control,
separation of entities and properties, and on and on), it breaks the
ontological commitment that a better treatment of the MARC data could bring,
deterring people from actually a) using the darn thing as anything but a
bare minimal crutch, and b) expanding it to be actually useful and
interesting.

The quicker the library world can get rid of this monstrosity, the better,
although I doubt that will ever happen; it will hang around like a foul
stench for as long as there is MARC in the world. A long time. A long sad
time.

A few extra notes;
   http://shelterit.blogspot.com/2008/09/marcxml-beast-of-burden.html

Can you tell I'm not a fan? :)

Kind regards,

Alex
--
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ --
-- http://www.google.com/profiles/alexander.johannesen ---

Re: [CODE4LIB] Django

2010-10-25 Thread Gabriel Farrell

If you already know PHP you might want to check out Symfony or another
PHP framework to get the hang of web frameworks, then move onto other
languages from there.

On Mon, Oct 25, 2010 at 4:25 PM, Junior Tidal  wrote:
> Thanks for the suggestions everyone. I haven't actively looked for resources 
> since I'm busy doing collection development. However, I came across an 
> advertisement for a Django book and figured it would be a useful language to 
> learn. I already know php, so it seems logical that django is the next step?
>
> Best,
>
> Junior Tidal
> Assistant Professor
> Web Services and Multimedia Librarian
> New York City College of Technology, CUNY
> 300 Jay Street
> Brooklyn, NY 11210
> 718.260.5481
>
> http://library.citytech.cuny.edu
>
>
> >>> Andrew Hankinson  10/25/2010 10:23 AM >>>
> There's the Django Book: http://www.djangobook.com/ (Make sure you choose the 
> revised edition for 1.0)
> The Django docs, with some intro tutorials: 
> http://docs.djangoproject.com/en/1.2/
>
> Did you try those already?
>
>
> On 2010-10-25, at 10:19 AM, Junior Tidal wrote:
>
>> Hello Code4Lib,
>>
>> Does anyone have any recommendations for learning Django? Books, websites, 
>> video tutorials, etc. ...
>>
>> thanks,
>>
>> Junior Tidal
>> Assistant Professor
>> Web Services and Multimedia Librarian
>> New York City College of Technology, CUNY
>> 300 Jay Street
>> Brooklyn, NY 11210
>> 718.260.5481
>>
>> http://library.citytech.cuny.edu
>

Re: [CODE4LIB] Django

2010-10-25 Thread Gabriel Farrell

Agreed on the docs at the website. If you can't figure something out
from those, dig into the source. Happy hacking!

On Mon, Oct 25, 2010 at 10:25 AM, Michael J. Giarlo
 wrote:
> I'd start here:
>
>   http://docs.djangoproject.com/en/1.2/
>
> There are some tutorials in there as well.
>
> -Mike
>
>
>
> On Mon, Oct 25, 2010 at 10:19, Junior Tidal  wrote:
>> Hello Code4Lib,
>>
>> Does anyone have any recommendations for learning Django? Books, websites, 
>> video tutorials, etc. ...
>>
>> thanks,
>>
>> Junior Tidal
>> Assistant Professor
>> Web Services and Multimedia Librarian
>> New York City College of Technology, CUNY
>> 300 Jay Street
>> Brooklyn, NY 11210
>> 718.260.5481
>>
>> http://library.citytech.cuny.edu
>>
>

Re: [CODE4LIB] Django

2010-10-25 Thread Junior Tidal

Thanks for the suggestions everyone. I haven't actively looked for resources 
since I'm busy doing collection development. However, I came across an 
advertisement for a Django book and figured it would be a useful language to 
learn. I already know php, so it seems logical that django is the next step?

Best,  

Junior Tidal
Assistant Professor
Web Services and Multimedia Librarian
New York City College of Technology, CUNY 
300 Jay Street
Brooklyn, NY 11210
718.260.5481

http://library.citytech.cuny.edu

>>> Andrew Hankinson  10/25/2010 10:23 AM >>>
There's the Django Book: http://www.djangobook.com/ (Make sure you choose the 
revised edition for 1.0)
The Django docs, with some intro tutorials: 
http://docs.djangoproject.com/en/1.2/ 

Did you try those already?

On 2010-10-25, at 10:19 AM, Junior Tidal wrote:

> Hello Code4Lib,
> 
> Does anyone have any recommendations for learning Django? Books, websites, 
> video tutorials, etc. ...
> 
> thanks,
> 
> Junior Tidal
> Assistant Professor
> Web Services and Multimedia Librarian
> New York City College of Technology, CUNY 
> 300 Jay Street
> Brooklyn, NY 11210
> 718.260.5481
> 
> http://library.citytech.cuny.edu

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Stephen Meyer


Kyle Banerjee wrote:
> On Mon, Oct 25, 2010 at 12:38 PM, Tim Spalding  wrote:
>
>> Does processing speed of something matter anymore? You'd have to be
>> doing a LOT of processing to care, wouldn't you?
>
> Data migrations and data dumps are a common use case. Needing to break or
> make hundreds of thousands or millions of records is not uncommon.
>
> kyle


To make this concrete, we process the MARC records from 14 separate 
ILS's throughout the University of Wisconsin System. We extract, sort on 
OCLC number, dedup and merge pieces from any campus that has a record 
for the work. The MARC that we then index and display here


 http://forward.library.wisconsin.edu/catalog/ocm37443537?school_code=WU

is not identical to the version of the MARC record from any of the 4 
schools that hold it.


We extract 13 million records and dedup down to 8 million every week. 
Speed is paramount.


-sm
--
Stephen Meyer
Library Application Developer
UW-Madison Libraries
436 Memorial Library
728 State St.
Madison, WI 53706

sme...@library.wisc.edu
608-265-2844 (ph)


"Just don't let the human factor fail to be a factor at all."
- Andrew Bird, "Tables and Chairs"

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread MJ Suhonos

JSON++

I routinely re-index about 2.5M JSON records (originally from binary MARC), and 
it's several orders of magnitude faster than XML (measured in single-digit 
minutes rather than double-digit hours).  I'm not sure if it's in the same 
range as binary MARC, but as Tim says, it's plenty fast enough for pragmatic 
purposes.

Unfortunately JSON doesn't have as many mature tools for manipulation as XML 
(yet?), but I'd be inclined to call it the best of both worlds rather than a 
middle-ground or compromise.

MJ

> Marc in JSON can be a nice middle-ground, faster/smaller than MarcXML 
> (although probably still not as fast or small as binary), based on a standard 
> low-level data format so easier to work with using existing tools (and 
> developers' eyes) than binary, no maximum record length. 
> There have been a couple competing attempts to define a 
> marc-expressed-in-json 'standard', none have really caught on yet. I like 
> Ross's latest attempt:  
> http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/
> 
> Patrick Hochstenbach wrote:
>> Dear Nate,
>> 
>> There is a trade-off: do you want very fast processing of data -> go for 
>> binary data. do you want to share your data globally easily in many (not per 
>> se library related) environments -> go for XML/RDF. Open your data and do 
>> both :-)
>> 
>> Pat
>> 
>> Sent from my iPhone
>> 
>> On 25 Oct 2010, at 20:39, "Nate Vack"  wrote:
>> 
>>  
>>> Hi all,
>>> 
>>> I've just spent the last couple of weeks delving into and decoding a
>>> binary file format. This, in turn, got me thinking about MARCXML.
>>> 
>>> In a nutshell, it looks like it's supposed to contain the exact same
>>> data as a normal MARC record, except in XML form. As in, it should be
>>> round-trippable.
>>> 
>>> What's the advantage to this? I can see using a human-readable format
>>> for poorly-documented file formats -- they're relatively easy to read
>>> and understand. But MARC is well, well-documented, with more than one
>>> free implementation in cursory searching. And once you know a binary
>>> file's format, it's no harder to parse than XML, and the data's
>>> smaller and processing faster.
>>> 
>>> So... why the XML?
>>> 
>>> Curious,
>>> -Nate
>>>
>> 
>>

Re: [CODE4LIB] MARCXML - What is it for?


Tim Spalding wrote:
> Does processing speed of something matter anymore? You'd have to be
> doing a LOT of processing to care, wouldn't you?
  


Yes, which sometimes you are. Say, when you're indexing 2 or 3 or 10
million marc records into, say, solr.


Which is faster depends on what language and what libraries you are
using for both binary marc and marcxml. But in many of our experiences,
parsing and serializing binary marc _is_ significantly faster than
parsing and serializing marcxml.  That is of course just one of the
various criteria that come into play when choosing a format.


Here are Bill Dueber's benchmarks comparing MarcXML, marc binary, and a
marc-in-json format; in ruby, using various library alternatives.  I
rather like the marc-in-json format for being a happy medium.  Whether
it's "standard" or not doesn't necessarily matter when you're dealing
with your own records, passing them through several stops on a
toolchain, and have tools available that can do it. Who cares if
any/everyone else uses it.


http://robotlibrarian.billdueber.com/sizespeed-of-various-marc-serializations-using-ruby-marc/

Re: [CODE4LIB] MARCXML - What is it for?

Marc in JSON can be a nice middle-ground, faster/smaller than MarcXML
(although probably still not as fast or small as binary), based on a
standard low-level data format so easier to work with using existing tools
(and developers' eyes) than binary, no maximum record length.

There have been a couple of competing attempts to define a
marc-expressed-in-json 'standard'; none has really caught on yet. I
like Ross's latest attempt:
http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/
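To make the "standard low-level data format" point concrete, here is a rough Python sketch of what a MARC record might look like as plain JSON. The field layout (a "leader" string plus a "fields" list of one-key objects) is only my reading of the proposal linked above and should be checked against that post; the record contents and the field() helper are made up for illustration.

```python
import json


def field(tag, value=None, ind1=" ", ind2=" ", subfields=None):
    """Build one field entry (hypothetical helper, not from any library).

    Control fields map the tag straight to a string; data fields map
    the tag to indicators plus an ordered list of one-key subfields.
    """
    if subfields is None:
        return {tag: value}
    return {
        tag: {
            "ind1": ind1,
            "ind2": ind2,
            "subfields": [{code: text} for code, text in subfields],
        }
    }


# A made-up record; ordering of fields and subfields is preserved by lists.
record = {
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        field("001", "example-1"),
        field(
            "245",
            ind1="1",
            ind2="0",
            subfields=[("a", "An example title :"), ("b", "a made-up subtitle.")],
        ),
    ],
}

# Any stock JSON library can serialize and restore it; no MARC tooling needed.
serialized = json.dumps(record)
restored = json.loads(serialized)
```

Because every JSON string carries its own length implicitly, nothing in this shape imposes a record-size ceiling.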

Patrick Hochstenbach wrote:

Dear Nate,

There is a trade-off: do you want very fast processing of data -> go for binary data. do you want to share your data globally easily in many (not per se library related) environments -> go for XML/RDF.
Open your data and do both :-)

Pat

Sent from my iPhone

On 25 Oct 2010, at 20:39, "Nate Vack" wrote:

Hi all,

I've just spent the last couple of weeks delving into and decoding a
binary file format. This, in turn, got me thinking about MARCXML.

In a nutshell, it looks like it's supposed to contain the exact same
data as a normal MARC record, except in XML form. As in, it should be
round-trippable.

What's the advantage to this? I can see using a human-readable format
for poorly-documented file formats -- they're relatively easy to read
and understand. But MARC is well, well-documented, with more than one
free implementation in cursory searching. And once you know a binary
file's format, it's no harder to parse than XML, and the data's
smaller and processing faster.

So... why the XML?

Curious,
-Nate

Re: [CODE4LIB] MARCXML - What is it for?

MODS was an attempt to mostly-but-not-entirely-roundtrippably represent 
data in MARC in a format that's more 'normal' XML, without packed bytes 
in elements, with element names that are more or less self-documenting, 
etc.  It's caught on even less than MARCXML though, so if you find 
MARCXML under-adopted (I disagree), you won't like MODS.


Personally I think MODS is kind of the worst of both worlds. The only 
reason to stick with something that looks anything like MARC is to be 
round-trippable with legacy MARC, which MODS is not.  But if you're 
going to give that up, you really want more improvements than MODS 
supplies; it's still got a lot of the unfortunate legacy of MARC in it.


Nate Vack wrote:

On Mon, Oct 25, 2010 at 2:09 PM, Tim Spalding  wrote:
  

- XML is self-describing, binary is not.

Not to quibble, but that's only in a theoretical sense here. Something
like Amazon XML is truly self-describing. MARCXML is self-obfuscating.
At least MARC records kinda imitate catalog cards.



Yeah -- this is kinda the source of my confusion. In the case of the
files I'm reading, it's not that it's hard to find out where the
nMeasurement field lives (it's six short ints starting at offset 64),
but what the field means, and whether or not I care about it.

Switching to an XML format doesn't help with that at all.

WRT character encoding issues and validation: if MARC and MARCXML are
round-trippable, a solution in one environment is equivalent to a
solution in the other.

And I think we've all seen plenty of unvalidated, badly-formed XML,
and plenty with Character Encoding Problems™ ;-)

Thanks for the input!
-Nate

Re: [CODE4LIB] MARCXML - What is it for?

Yes, it is designed to be a round-trippable expression of ordinary marc 
in XML. Some reasons this is useful:


1. No maximum record length, unlike actual marc which tops out at ~100k 
(99,999 bytes).
2. You can use XSLT and other XML tools to work with it, and store it in 
stores optimized for XML (or that only accept XML), etc.
3. You can embed it inside XML schemas that allow arbitrary embeddable 
XML.
4. (Of much lesser importance than these others, but still ends up being 
important to me -- saving the time of the developer does matter) it's a 
lot easier to debug the raw data, doesn't require me to open up a hex 
editor and count bytes.
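
The length ceiling in point 1 comes from the binary format itself: the leader stores the total record length in five ASCII digits, and each directory entry stores a field length in four. Below is a minimal Python sketch of a binary MARC (ISO 2709) writer and reader that shows where those limits bite. The function names and the sample record are hypothetical, and this is an illustration of the layout rather than a replacement for a real MARC library.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
SF = b"\x1f"  # subfield delimiter


def build_record(fields):
    """Serialize [(tag, body_bytes), ...] into one ISO 2709 record."""
    directory = b""
    data = b""
    for tag, body in fields:
        body = body + FT
        if len(body) > 9999:
            raise ValueError("field too long for a 4-digit length")
        # Directory entry: 3-byte tag, 4-digit length, 5-digit offset.
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(body), len(data))
        data += body
    base = 24 + len(directory) + 1  # leader + directory + its terminator
    total = base + len(data) + 1    # plus the record terminator
    if total > 99999:
        raise ValueError("record too long for a 5-digit length field")
    # Leader: bytes 0-4 record length, bytes 12-16 base address of data.
    leader = b"%05dnam a22%05d a 4500" % (total, base)
    return leader + directory + FT + data + RT


def parse_record(raw):
    """Return (declared_length, [(tag, body_bytes), ...])."""
    length = int(raw[0:5])
    base = int(raw[12:17])
    directory = raw[24:base - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        flen = int(entry[3:7])
        start = int(entry[7:12])
        # Slice the field out of the data area, dropping its terminator.
        fields.append((tag, raw[base + start:base + start + flen - 1]))
    return length, fields


sample = build_record([
    ("001", b"example-1"),
    ("245", b"10" + SF + b"aAn example title"),
])
declared, parsed = parse_record(sample)
```

The byte counting that build_record does by hand is exactly the bookkeeping that MARCXML hands off to the XML parser.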


Nate Vack wrote:

Hi all,

I've just spent the last couple of weeks delving into and decoding a
binary file format. This, in turn, got me thinking about MARCXML.

In a nutshell, it looks like it's supposed to contain the exact same
data as a normal MARC record, except in XML form. As in, it should be
round-trippable.

What's the advantage to this? I can see using a human-readable format
for poorly-documented file formats -- they're relatively easy to read
and understand. But MARC is well, well-documented, with more than one
free implementation in cursory searching. And once you know a binary
file's format, it's no harder to parse than XML, and the data's
smaller and processing faster.

So... why the XML?

Curious,
-Nate

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Kyle Banerjee

On Mon, Oct 25, 2010 at 12:22 PM, Eric Hellman  wrote:

> I think you'd have a very hard time demonstrating any speed advantage to
> MARC over MARCXML. XML parsers have been speed optimized out the wazoo; If
> there exists a MARC parser that has ever been speed-optimized without
> serious compromise, I'm sure someone on this list will have a good story
> about it.

I'll take MarcEdit over an XML parser for MARCXML any day. For a benchmark
test, try roundtripping a million records. Unless I've been messing with the
wrong stuff, the differences are dramatic.

kyle

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Tim Spalding

Does processing speed of something matter anymore? You'd have to be
doing a LOT of processing to care, wouldn't you?

Tim

On Mon, Oct 25, 2010 at 3:35 PM, MJ Suhonos  wrote:
> I'll just leave this here:
>
> http://www.indexdata.com/blog/2010/05/turbomarc-faster-xml-marc-records
>
> That trade-off ought to offend both camps, though I happen to think it's 
> quite clever.
>
> MJ
>
> On 2010-10-25, at 3:22 PM, Eric Hellman wrote:
>
>> I think you'd have a very hard time demonstrating any speed advantage to 
>> MARC over MARCXML. XML parsers have been speed optimized out the wazoo; If 
>> there exists a MARC parser that has ever been speed-optimized without 
>> serious compromise, I'm sure someone on this list will have a good story 
>> about it.
>>
>> On Oct 25, 2010, at 3:05 PM, Patrick Hochstenbach wrote:
>>
>>> Dear Nate,
>>>
>>> There is a trade-off: do you want very fast processing of data -> go for 
>>> binary data. do you want to share your data globally easily in many (not 
>>> per se library related) environments -> go for XML/RDF.
>>> Open your data and do both :-)
>>>
>>> Pat
>>>
>>> Sent from my iPhone
>>>
>>> On 25 Oct 2010, at 20:39, "Nate Vack"  wrote:
>>>
>>>> Hi all,
>>>>
>>>> I've just spent the last couple of weeks delving into and decoding a
>>>> binary file format. This, in turn, got me thinking about MARCXML.
>>>>
>>>> In a nutshell, it looks like it's supposed to contain the exact same
>>>> data as a normal MARC record, except in XML form. As in, it should be
>>>> round-trippable.
>>>>
>>>> What's the advantage to this? I can see using a human-readable format
>>>> for poorly-documented file formats -- they're relatively easy to read
>>>> and understand. But MARC is well, well-documented, with more than one
>>>> free implementation in cursory searching. And once you know a binary
>>>> file's format, it's no harder to parse than XML, and the data's
>>>> smaller and processing faster.
>>>>
>>>> So... why the XML?
>>>>
>>>> Curious,
>>>> -Nate
>>
>> Eric Hellman
>> President, Gluejar, Inc.
>> 41 Watchung Plaza, #132
>> Montclair, NJ 07042
>> USA
>>
>> e...@hellman.net
>> http://go-to-hellman.blogspot.com/
>> @gluejar
>

-- 
Check out my library at http://www.librarything.com/profile/timspalding

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Kyle Banerjee

On Mon, Oct 25, 2010 at 12:38 PM, Tim Spalding  wrote:

> Does processing speed of something matter anymore? You'd have to be
> doing a LOT of processing to care, wouldn't you?
>

Data migrations and data dumps are a common use case. Needing to break or
make hundreds of thousands or millions of records is not uncommon.

kyle

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread MJ Suhonos

I'll just leave this here:

http://www.indexdata.com/blog/2010/05/turbomarc-faster-xml-marc-records

That trade-off ought to offend both camps, though I happen to think it's quite 
clever.

MJ

On 2010-10-25, at 3:22 PM, Eric Hellman wrote:

> I think you'd have a very hard time demonstrating any speed advantage to MARC 
> over MARCXML. XML parsers have been speed optimized out the wazoo; If there 
> exists a MARC parser that has ever been speed-optimized without serious 
> compromise, I'm sure someone on this list will have a good story about it.
> 
> On Oct 25, 2010, at 3:05 PM, Patrick Hochstenbach wrote:
> 
>> Dear Nate,
>> 
>> There is a trade-off: do you want very fast processing of data -> go for 
>> binary data. do you want to share your data globally easily in many (not per 
>> se library related) environments -> go for XML/RDF. 
>> Open your data and do both :-)
>> 
>> Pat
>> 
>> Sent from my iPhone
>> 
>> On 25 Oct 2010, at 20:39, "Nate Vack"  wrote:
>> 
>>> Hi all,
>>> 
>>> I've just spent the last couple of weeks delving into and decoding a
>>> binary file format. This, in turn, got me thinking about MARCXML.
>>> 
>>> In a nutshell, it looks like it's supposed to contain the exact same
>>> data as a normal MARC record, except in XML form. As in, it should be
>>> round-trippable.
>>> 
>>> What's the advantage to this? I can see using a human-readable format
>>> for poorly-documented file formats -- they're relatively easy to read
>>> and understand. But MARC is well, well-documented, with more than one
>>> free implementation in cursory searching. And once you know a binary
>>> file's format, it's no harder to parse than XML, and the data's
>>> smaller and processing faster.
>>> 
>>> So... why the XML?
>>> 
>>> Curious,
>>> -Nate
> 
> Eric Hellman
> President, Gluejar, Inc.
> 41 Watchung Plaza, #132
> Montclair, NJ 07042
> USA
> 
> e...@hellman.net 
> http://go-to-hellman.blogspot.com/
> @gluejar

Re: [CODE4LIB] MARCXML - What is it for?

I guess what I meant is that in MARCXML, you have a <datafield> element with 
subsequent <subfield> elements each with fairly clear attributes, which, while 
not my idea of fun Sunday-afternoon reading, requires less specialized tools to 
parse (hello Textmate!) and is a bit easier than trying to count INT positions. 
One quick XPath query and you can have all 245 fields, regardless of their 
length or position in the record.
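
As a rough illustration of that "one quick XPath query", here is a Python sketch using only the standard library's ElementTree, which supports a limited XPath subset. The sample record and the titles() helper are made up; the namespace and the datafield/subfield element names are those of the MARCXML (MARC21 slim) schema.

```python
import xml.etree.ElementTree as ET

# A made-up one-record collection, for illustration only.
MARCXML = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>00000nam a2200000 a 4500</leader>
    <controlfield tag="001">example-1</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">An example title :</subfield>
      <subfield code="b">a made-up subtitle.</subfield>
    </datafield>
  </record>
</collection>"""

NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def titles(xml_text):
    """Return the joined subfield text of every 245 field in the document."""
    root = ET.fromstring(xml_text)
    found = []
    # The predicate selects data fields by tag, wherever they sit in the record.
    for datafield in root.findall(".//marc:datafield[@tag='245']", NS):
        parts = [sf.text or "" for sf in datafield.findall("marc:subfield", NS)]
        found.append(" ".join(parts))
    return found


print(titles(MARCXML))
```

No offsets or lengths appear anywhere in the query, which is the convenience being described.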

On 2010-10-25, at 3:26 PM, Nate Vack wrote:

> On Mon, Oct 25, 2010 at 2:09 PM, Tim Spalding  wrote:
>> - XML is self-describing, binary is not.
>> 
>> Not to quibble, but that's only in a theoretical sense here. Something
>> like Amazon XML is truly self-describing. MARCXML is self-obfuscating.
>> At least MARC records kinda imitate catalog cards.
> 
> Yeah -- this is kinda the source of my confusion. In the case of the
> files I'm reading, it's not that it's hard to find out where the
> nMeasurement field lives (it's six short ints starting at offset 64),
> but what the field means, and whether or not I care about it.
> 
> Switching to an XML format doesn't help with that at all.
> 
> WRT character encoding issues and validation: if MARC and MARCXML are
> round-trippable, a solution in one environment is equivalent to a
> solution in the other.
> 
> And I think we've all seen plenty of unvalidated, badly-formed XML,
> and plenty with Character Encoding Problems™ ;-)
> 
> Thanks for the input!
> -Nate

Re: [CODE4LIB] MARCXML - What is it for?

Hiya,

On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack  wrote:
> Switching to an XML format doesn't help with that at all.

I'm willing to take it further and say that MARCXML was the worst
thing the library world ever did. Some might argue it was a good first
step, and that it was better to have something rather than nothing, to
which I respond:

Poppycock!

MARCXML is nothing short of evil. Not only does it go against every
principle of good XML anywhere (don't rely on whitespace, structure
over code, namespace conventions, identity management, document
control, separation of entities and properties, and on and on), it
breaks the ontological commitment that a better treatment of the MARC
data could bring, deterring people from actually a) using the darn
thing as anything but a bare minimal crutch, and b) expanding it to be
actually useful and interesting.

The quicker the library world can get rid of this monstrosity, the
better, although I doubt that will ever happen; it will hang around
like a foul stench for as long as there is MARC in the world. A long
time. A long sad time.

A few extra notes;
   http://shelterit.blogspot.com/2008/09/marcxml-beast-of-burden.html

Can you tell I'm not a fan? :)


Kind regards,

Alex
-- 
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ --
-- http://www.google.com/profiles/alexander.johannesen ---

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Nate Vack

On Mon, Oct 25, 2010 at 2:09 PM, Tim Spalding  wrote:
> - XML is self-describing, binary is not.
>
> Not to quibble, but that's only in a theoretical sense here. Something
> like Amazon XML is truly self-describing. MARCXML is self-obfuscating.
> At least MARC records kinda imitate catalog cards.

Yeah -- this is kinda the source of my confusion. In the case of the
files I'm reading, it's not that it's hard to find out where the
nMeasurement field lives (it's six short ints starting at offset 64),
but what the field means, and whether or not I care about it.

Switching to an XML format doesn't help with that at all.
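
To illustrate how little work the "where does it live" half of the problem is, here is a Python sketch that reads six short ints at offset 64 with the standard struct module. The byte order (little-endian) and signedness are assumptions, since the message does not say, and the sample blob is fabricated.

```python
import struct


def read_measurements(blob, offset=64, count=6):
    """Unpack `count` 16-bit ints starting at `offset`.

    "<" (little-endian) and "h" (signed short) are assumptions about
    the file format; swap in ">" or "H" if the real format differs.
    """
    return struct.unpack_from("<%dh" % count, blob, offset)


# Fabricated file contents: 64 bytes of header, then six shorts.
blob = bytes(64) + struct.pack("<6h", 1, 2, 3, -4, 500, 6)
values = read_measurements(blob)
print(values)
```

The code recovers the numbers in one call, but says nothing about what nMeasurement means, which is the part no serialization format answers.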

WRT character encoding issues and validation: if MARC and MARCXML are
round-trippable, a solution in one environment is equivalent to a
solution in the other.

And I think we've all seen plenty of unvalidated, badly-formed XML,
and plenty with Character Encoding Problems™ ;-)

Thanks for the input!
-Nate

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Eric Hellman

I think you'd have a very hard time demonstrating any speed advantage to MARC 
over MARCXML. XML parsers have been speed optimized out the wazoo; If there 
exists a MARC parser that has ever been speed-optimized without serious 
compromise, I'm sure someone on this list will have a good story about it.

On Oct 25, 2010, at 3:05 PM, Patrick Hochstenbach wrote:

> Dear Nate,
> 
> There is a trade-off: do you want very fast processing of data -> go for 
> binary data. do you want to share your data globally easily in many (not per 
> se library related) environments -> go for XML/RDF. 
> Open your data and do both :-)
> 
> Pat
> 
> Sent from my iPhone
> 
> On 25 Oct 2010, at 20:39, "Nate Vack"  wrote:
> 
>> Hi all,
>> 
>> I've just spent the last couple of weeks delving into and decoding a
>> binary file format. This, in turn, got me thinking about MARCXML.
>> 
>> In a nutshell, it looks like it's supposed to contain the exact same
>> data as a normal MARC record, except in XML form. As in, it should be
>> round-trippable.
>> 
>> What's the advantage to this? I can see using a human-readable format
>> for poorly-documented file formats -- they're relatively easy to read
>> and understand. But MARC is well, well-documented, with more than one
>> free implementation in cursory searching. And once you know a binary
>> file's format, it's no harder to parse than XML, and the data's
>> smaller and processing faster.
>> 
>> So... why the XML?
>> 
>> Curious,
>> -Nate

Eric Hellman
President, Gluejar, Inc.
41 Watchung Plaza, #132
Montclair, NJ 07042
USA

e...@hellman.net 
http://go-to-hellman.blogspot.com/
@gluejar

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Bryan Baldus

On  Monday, October 25, 2010 1:50 PM, Andrew Hankinson wrote:
>- Documents can be validated for their "well-formedness" using these existing 
>tools and a pre-defined schema (a validator for MARC would need to be 
>custom-coded)

In Perl, MARC::Lint might be an example of such a validator (though I need to 
update it with the most recent MARC updates at some point soon). MarcEdit also 
includes a validator.

Bryan Baldus
bryan.bal...@quality-books.com
eij...@cpan.org
http://home.comcast.net/~eijabb/

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Tim Spalding

- XML is self-describing, binary is not.

Not to quibble, but that's only in a theoretical sense here. Something
like Amazon XML is truly self-describing. MARCXML is self-obfuscating.
At least MARC records kinda imitate catalog cards.
:)

Tim

On Mon, Oct 25, 2010 at 2:50 PM, Andrew Hankinson
 wrote:
> I'm not a big user of MARCXML, but I can think of a few reasons off the top 
> of my head:
>
> - Existing libraries for reading, manipulating and searching XML-based 
> documents are very mature.
> - Documents can be validated for their "well-formedness" using these existing 
> tools and a pre-defined schema (a validator for MARC would need to be 
> custom-coded)
> - MARCXML can easily be incorporated into XML-based meta-metadata schemas, 
> like METS.
> - It can be parsed and manipulated in a web service context without sending a 
> binary blob over the wire.
> - XML is self-describing, binary is not.
>
> There's nothing stopping you from reading the MARCXML into a binary blob and 
> working on it from there. But when sharing documents from different 
> institutions around the globe, using a wide variety of tools and techniques, 
> XML seems to be the lowest common denominator.
>
> -Andrew
>
> On 2010-10-25, at 2:38 PM, Nate Vack wrote:
>
>> Hi all,
>>
>> I've just spent the last couple of weeks delving into and decoding a
>> binary file format. This, in turn, got me thinking about MARCXML.
>>
>> In a nutshell, it looks like it's supposed to contain the exact same
>> data as a normal MARC record, except in XML form. As in, it should be
>> round-trippable.
>>
>> What's the advantage to this? I can see using a human-readable format
>> for poorly-documented file formats -- they're relatively easy to read
>> and understand. But MARC is well, well-documented, with more than one
>> free implementation in cursory searching. And once you know a binary
>> file's format, it's no harder to parse than XML, and the data's
>> smaller and processing faster.
>>
>> So... why the XML?
>>
>> Curious,
>> -Nate
>



-- 
Check out my library at http://www.librarything.com/profile/timspalding

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread MJ Suhonos

It's helpful to think of MARCXML as a sort of lingua franca.

> - Existing libraries for reading, manipulating and searching XML-based 
> documents are very mature.
Including XSLT and XPath; very powerful stuff.

> There's nothing stopping you from reading the MARCXML into a binary blob and 
> working on it from there. But when sharing documents from different 
> institutions around the globe, using a wide variety of tools and techniques, 
> XML seems to be the lowest common denominator.

Assuming it's also round-trippable, MARC-in-JSON would accomplish this as well.

Not to mention it's nice to be able to read and edit MARC records in any 
(any!!) text editor for those of us who are comfortable looking at JSON or XML 
but can't handle staring at binary bytestreams without having an aneurysm.

MJ

> On 2010-10-25, at 2:38 PM, Nate Vack wrote:
> 
>> Hi all,
>> 
>> I've just spent the last couple of weeks delving into and decoding a
>> binary file format. This, in turn, got me thinking about MARCXML.
>> 
>> In a nutshell, it looks like it's supposed to contain the exact same
>> data as a normal MARC record, except in XML form. As in, it should be
>> round-trippable.
>> 
>> What's the advantage to this? I can see using a human-readable format
>> for poorly-documented file formats -- they're relatively easy to read
>> and understand. But MARC is well, well-documented, with more than one
>> free implementation in cursory searching. And once you know a binary
>> file's format, it's no harder to parse than XML, and the data's
>> smaller and processing faster.
>> 
>> So... why the XML?
>> 
>> Curious,
>> -Nate

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Patrick Hochstenbach

Dear Nate,

There is a trade-off: do you want very fast processing of data -> go for binary 
data. do you want to share your data globally easily in many (not per se 
library related) environments -> go for XML/RDF. 
Open your data and do both :-)

Pat

Sent from my iPhone

On 25 Oct 2010, at 20:39, "Nate Vack"  wrote:

> Hi all,
> 
> I've just spent the last couple of weeks delving into and decoding a
> binary file format. This, in turn, got me thinking about MARCXML.
> 
> In a nutshell, it looks like it's supposed to contain the exact same
> data as a normal MARC record, except in XML form. As in, it should be
> round-trippable.
> 
> What's the advantage to this? I can see using a human-readable format
> for poorly-documented file formats -- they're relatively easy to read
> and understand. But MARC is well, well-documented, with more than one
> free implementation in cursory searching. And once you know a binary
> file's format, it's no harder to parse than XML, and the data's
> smaller and processing faster.
> 
> So... why the XML?
> 
> Curious,
> -Nate

Re: [CODE4LIB] MARCXML - What is it for?

I'm not a big user of MARCXML, but I can think of a few reasons off the top of 
my head:

- Existing libraries for reading, manipulating and searching XML-based 
documents are very mature.
- Documents can be validated for their "well-formedness" using these existing 
tools and a pre-defined schema (a validator for MARC would need to be 
custom-coded)
- MARCXML can easily be incorporated into XML-based meta-metadata schemas, like 
METS.
- It can be parsed and manipulated in a web service context without sending a 
binary blob over the wire.
- XML is self-describing, binary is not.

There's nothing stopping you from reading the MARCXML into a binary blob and 
working on it from there. But when sharing documents from different 
institutions around the globe, using a wide variety of tools and techniques, 
XML seems to be the lowest common denominator.

-Andrew

On 2010-10-25, at 2:38 PM, Nate Vack wrote:

> Hi all,
> 
> I've just spent the last couple of weeks delving into and decoding a
> binary file format. This, in turn, got me thinking about MARCXML.
> 
> In a nutshell, it looks like it's supposed to contain the exact same
> data as a normal MARC record, except in XML form. As in, it should be
> round-trippable.
> 
> What's the advantage to this? I can see using a human-readable format
> for poorly-documented file formats -- they're relatively easy to read
> and understand. But MARC is well, well-documented, with more than one
> free implementation in cursory searching. And once you know a binary
> file's format, it's no harder to parse than XML, and the data's
> smaller and processing faster.
> 
> So... why the XML?
> 
> Curious,
> -Nate

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Tim Spalding

MARC records break parsing far too frequently. Apart from requiring no
truly specialized tools, MARCXML should—should!—eliminate many of
those problems. That's not to mention that MARC character sets vary a
lot (DanMARC anyone?), and more even in practice than in theory.

From my perspective the problem is simply that MARCXML isn't as
ubiquitous as MARC. For what we do, at least, there's no point. We'd
need to parse non-XML MARC data anyway. So if we're going to do it, we
might as well do it for everything.

Best,
Tim

On Mon, Oct 25, 2010 at 2:38 PM, Nate Vack  wrote:
> Hi all,
>
> I've just spent the last couple of weeks delving into and decoding a
> binary file format. This, in turn, got me thinking about MARCXML.
>
> In a nutshell, it looks like it's supposed to contain the exact same
> data as a normal MARC record, except in XML form. As in, it should be
> round-trippable.
>
> What's the advantage to this? I can see using a human-readable format
> for poorly-documented file formats -- they're relatively easy to read
> and understand. But MARC is well, well-documented, with more than one
> free implementation in cursory searching. And once you know a binary
> file's format, it's no harder to parse than XML, and the data's
> smaller and processing faster.
>
> So... why the XML?
>
> Curious,
> -Nate
>



-- 
Check out my library at http://www.librarything.com/profile/timspalding

[CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Nate Vack

Hi all,

I've just spent the last couple of weeks delving into and decoding a
binary file format. This, in turn, got me thinking about MARCXML.

In a nutshell, it looks like it's supposed to contain the exact same
data as a normal MARC record, except in XML form. As in, it should be
round-trippable.

What's the advantage to this? I can see using a human-readable format
for poorly-documented file formats -- they're relatively easy to read
and understand. But MARC is well, well-documented, with more than one
free implementation in cursory searching. And once you know a binary
file's format, it's no harder to parse than XML, and the data's
smaller and processing faster.

So... why the XML?

Curious,
-Nate

Re: [CODE4LIB] Help with DLF-ILS GetAvailability


Emily Lynema wrote:
standardized metadata! While we had envisioned using something like 
MARCXML or ISO Holdings here to express things like serial runs, there 
  


Kind of a side note, but please consider ONIX Serial Holdings for 
expressing serial runs!   It is by far the best schema I've seen for 
doing this -- simple for simple cases, flexible for other cases, 
actually DOES express things in a machine-interpretable way. Everything 
else I've seen is both way too complicated, even for simple cases, and 
often ends up expressing holdings in a way that a machine can't act upon 
anyway.

Re: [CODE4LIB] Help with DLF-ILS GetAvailability

2010-10-25 Thread Emily Lynema

I agree with Jonathan and David. The only reason there are no examples 
of including  within  is 
because no one thought of a use case for why you would do that. The xsd 
for  explicitly states that it is simply "Metadata must 
be expressed in XML that complies with another XML Schema 
(namespace=#other). Metadata must be explicitly qualified in the 
response." So the only restriction is that it's some kind of 
standardized metadata! While we had envisioned using something like 
MARCXML or ISO Holdings here to express things like serial runs, there 
is no reason that simpleavailability could not be employed to describe a 
different kind of collection of items. The  and 
 are after all intended to represent a collection of 
items, and as David points out, the ISO Holdings schema explicitly 
allows for collection-level availability summary. And I will also note 
that ISO Holdings certainly does express availability in addition to 
'holdings'; they are really one and the same thing.

I guess I should note that I was a member of the original DLF group, so 
I suppose this is a fairly authoritative perspective on the original 
intent of the elements. :)

-emily

--
Date: Thu, 21 Oct 2010 16:26:54 -0400
From: Jonathan Rochkind
Subject: Re: Help with DLF-ILS GetAvailability

I don't think that's an abuse. I consider  to be for information about 
a "holdingset", or some collection of "items", while  is for information 
about an individual item.

I think regardless of what you do you are being over-optimistic in 
thinking that if you just "do dlf", your stuff will be interchangeable 
with any other clients or servers "doing dlf". The spec is way too 
open-ended for that; it leaves a whole bunch of details not specified 
and up to the implementer. For better or worse. I made more comments 
about this in the blog post I referenced earlier.

Jonathan

Owen Stephens wrote:

> Thanks Dave,
>
> Yes - my reading was that dlf:holdings was for pure 'holdings' as opposed to
> 'availability'. We could put the simpleavailability in there I guess but as
> you say since we are controlling both ends then there doesn't seem any point
> in abusing it like that. The downside is we'd hoped to do something that
> could be taken by other sites - the original plan was to use the Juice
> framework - developed by Talis using jQuery to parse a standard availability
> format so that this could then be applied easily in other environments.
> Obviously we can still achieve the outcome we need for the immediate
> requirements of the project by using a custom format.
>
> Thanks again
>
> Owen
>
>
> On Thu, Oct 21, 2010 at 4:28 PM, Walker, David  wrote:
>
>   

>> Hey Owen,
>>
>> Seems like you could use the  element to hold this kind
>> of individual library information.
>>
>> The DLF-ILS documentation doesn't seem to think that you would use
>> dlf:simpleavailability here, though, but rather MARC or ISO holdings
>> schemas.
>>
>> But if you're controlling both ends of the communication, I don't know if
>> it really matters.
>>
>> --Dave
>>
>> ==
>> David Walker
>> Library Web Services Manager
>> California State University
>> http://xerxes.calstate.edu
>> 
>> From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Owen
>> Stephens [o...@ostephens.com]
>> Sent: Wednesday, October 20, 2010 12:22 PM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: [CODE4LIB] Help with DLF-ILS GetAvailability
>>
>> I'm working with the University of Oxford to look at integrating some
>> library services into their VLE/Learning Management System (Sakai). One of
>> the services is something that will give availability for items on a reading
>> list in the VLE (the Sakai 'Citation Helper'), and I'm looking at the
>> DLF-ILS GetAvailability specification to achieve this.
>>
>> For physical items, the availability information I was hoping to use is
>> expressed at the level of a physical collection. For example, if several
>> college libraries within the University hold an item, I have aggregated
>> information that tells me the availability of the item in each of the
>> college libraries. However, I don't have item level information.
>>
>> I can see how I can use simpleavailability to say over the entire
>> institution whether (e.g.) a book is available or not. However, I'm not
>> clear I can express this in a more granular way (say availability on a
>> library by library basis) except by going to item level. Also although it
>> seems you can express multiple locations in simpleavailability, and multiple
>> availabilitymsg, there is no way I can see to link these, so although I
>> could list each location OK, I can't attach an availabilitymsg to a specific
>> location (unless I only express one location).
>>
>> Am I missing something, or is my interpretation correct?
>>
>> Any other suggestions?
>>
>> Thanks,
>>
>> Owen
>>
>> PS also looked at DAIA which I like, but this (as far as I can tel

[CODE4LIB] testing testing testing - Solr indexing software

2010-10-25 Thread Naomi Dushay

I just finished a bunch of blog posts about the sorts of tests to  
write for Solr indexing software.  Comments are welcome.  Try not to  
drool when you fall asleep on your keyboard.


Start with this one:

http://discovery-grindstone.blogspot.com/2010/10/testing-solr-indexing-software.html

- Naomi

[CODE4LIB] (LC) call number searching in Solr

2010-10-25 Thread Naomi Dushay

I recently set up a testing framework allowing me to twiddle Solr  
knobs until I met acceptance criteria for LC call number searching.  I  
came up with two Solr field types that worked for my criteria.


You can read all about it here:

http://discovery-grindstone.blogspot.com/2010/10/lc-call-number-searching-in-solr.html

- Naomi

Re: [CODE4LIB] Django

2010-10-25 Thread Nate Vack

On Mon, Oct 25, 2010 at 9:19 AM, Junior Tidal  wrote:

> Does anyone have any recommendations for learning Django? Books, websites, 
> video tutorials, etc. ...

For resources, "learn django" in Google shows a bunch of promising hints.

Methodology-wise: Start with a fairly concrete, well-defined problem.
Have a product in mind before you start. Work hard with the tool you
choose to make your product. Don't stress about whether you've chosen
the best tools (you haven't) or whether you're doing it perfectly (you
aren't). Make the thing.

You can spend months looking over example code and tutorials and blog
posts and not learn nearly as much as you would attacking the problem.
Plus, you'll have gotten closer to solving the problem as you learn.

Or, DHH says it a bit better:

http://37signals.com/svn/posts/2582-how-do-i-learn-to-program

Cheers,
-Nate

Re: [CODE4LIB] VPN vs. Proxy - Quick Question

2010-10-25 Thread Thomas Bennett

We have both VPN and a proxy (III WAM) available here, although for our online
resources VPN doesn't get you anything special; you still go through the proxy.
The regular URLs and proxy URLs are in a PostgreSQL database, and the page with
the links to online resources is generated dynamically based on your IP (the
HTTP variable HTTP_X_FORWARDED_FOR).  Apache forwards all requests to the Zope
server, which is why I'm not checking the REMOTE_ADDR variable.  If your IP is
not in our domain, that is, if the first two octets don't match, then you get a
proxy link which goes to our III authentication page.  Online resources that
are free get the same URL on campus and off campus, not a proxy link.

  I use a simple Python script to check the HTTP variable
'HTTP_X_FORWARDED_FOR' and return 0 or 1 in the variable 'hostname' to a Zope
(Python-based web server) page. A simple IF conditional statement determines
which URL to display based on the return value of the script.
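
A minimal sketch of the kind of check described above, not the actual script.
The name ip_add_flag comes from this message; the campus prefix 10.20, the
choice that 1 means "on campus", and the helper pick_url with its parameters
are all assumptions made up for illustration.

```python
# Hypothetical campus prefix: the first two octets of on-campus addresses.
CAMPUS_OCTETS = ("10", "20")


def ip_add_flag(environ, campus_octets=CAMPUS_OCTETS):
    """Return 1 if the client looks on campus, else 0 (assumed convention)."""
    # Apache sits in front of Zope, so REMOTE_ADDR would be the proxy itself;
    # the original client address arrives in HTTP_X_FORWARDED_FOR instead.
    forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
    # The header can hold a comma-separated chain; the first entry is the client.
    client_ip = forwarded.split(",")[0].strip()
    octets = client_ip.split(".")
    if len(octets) != 4:
        return 0  # missing or malformed header: treat as off campus
    return 1 if tuple(octets[:2]) == tuple(campus_octets) else 0


def pick_url(hostname, direct_url, proxy_url, is_free=False):
    """Choose which link to display, mirroring the IF conditional on the page."""
    # Free resources get the same URL everywhere; others depend on the flag.
    if is_free or hostname == 1:
        return direct_url
    return proxy_url


if __name__ == "__main__":
    flag = ip_add_flag({"HTTP_X_FORWARDED_FOR": "192.0.2.44"})
    print(pick_url(flag, "http://db.example.org/",
                   "http://proxy.example.edu/login?url=http://db.example.org/"))
```

Keep in mind that X-Forwarded-For is supplied by the client side of the
connection unless the front-end server overwrites it, so a check like this is
only as trustworthy as the Apache configuration in front of it.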

# call the python script ip_add_flag and set the return value to the variable 
hostname

 (opens in a new window)

The campus offers a VPN service, but you don't get the usual campus domain IP,
so we handle it the same as any other off-campus IP. Our vendors are not given
this range either, so it is not in the group of IPs used for licensing certain
databases.

As far as user complaints go, we have a form; a small group of people here
receive those submissions, put them into Trac, and work through the issues
individually.

I don't know the ratio of Proxy:VPN users, since I don't have a definitive
range of VPN IPs to work with.  The campus VPN is used to access certain
servers that are not normally accessible off campus because of the VLAN they
are in.

Thomas  

On Monday 25 October 2010 09:33:55 Tim McGeary wrote:
> Hi all,
> 
> I realize that some of you may not directly deal with this issue, but I
> was wondering if I could get some quick replies about how your
> institutions are handling access to off-campus resources via VPN and Proxy.
> 
> Do you offer a VPN service?  If so, do you split-tunnel the traffic so
> that the VPN only handles traffic to inside your campus IP?  If you
> split-tunnel, do users complain about not being able to connect to
> external library resources (databases, journals, etc)?
> 
> Do you offer a Proxy service?  Will your proxy service work for users
> already connected to VPN?
> 
> Do you know an estimated ratio of Proxy:VPN users?
> 
> Thanks,
> Tim
> 

-- 
==
Thomas McMillan Grant Bennett   Appalachian State University
Operations & Systems Analyst    P O Box 32026
University Library              Boone, North Carolina 28608
(828) 262 6587

Library Systems Help Desk: https://www.library.appstate.edu/help/
==

Re: [CODE4LIB] Django

2010-10-25 Thread Michael J. Giarlo

I'd start here:

   http://docs.djangoproject.com/en/1.2/

There are some tutorials in there as well.

-Mike



On Mon, Oct 25, 2010 at 10:19, Junior Tidal  wrote:
> Hello Code4Lib,
>
> Does anyone have any recommendations for learning Django? Books, websites, 
> video tutorials, etc. ...
>
> thanks,
>
> Junior Tidal
> Assistant Professor
> Web Services and Multimedia Librarian
> New York City College of Technology, CUNY
> 300 Jay Street
> Brooklyn, NY 11210
> 718.260.5481
>
> http://library.citytech.cuny.edu
>

Re: [CODE4LIB] Django

There's the Django Book: http://www.djangobook.com/ (Make sure you choose the 
revised edition for 1.0)
The Django docs, with some intro tutorials: 
http://docs.djangoproject.com/en/1.2/

Did you try those already?


On 2010-10-25, at 10:19 AM, Junior Tidal wrote:

> Hello Code4Lib,
> 
> Does anyone have any recommendations for learning Django? Books, websites, 
> video tutorials, etc. ...
> 
> thanks,
> 
> Junior Tidal
> Assistant Professor
> Web Services and Multimedia Librarian
> New York City College of Technology, CUNY 
> 300 Jay Street
> Brooklyn, NY 11210
> 718.260.5481
> 
> http://library.citytech.cuny.edu

[CODE4LIB] Django

2010-10-25 Thread Junior Tidal

Hello Code4Lib,

Does anyone have any recommendations for learning Django? Books, websites, 
video tutorials, etc. ...

thanks,

Junior Tidal
Assistant Professor
Web Services and Multimedia Librarian
New York City College of Technology, CUNY 
300 Jay Street
Brooklyn, NY 11210
718.260.5481
 
http://library.citytech.cuny.edu

Re: [CODE4LIB] Simple Flexible ILS written in Django