Re: [CODE4LIB] is this valid marc ?
> Is it really true that newline characters are not allowed in a marc
> value?

Yes.

CONTROL FUNCTION CODES [1]

Eight characters are specifically designated as control characters for MARC 21 use:

- escape character, 1B(hex) in MARC-8 and Unicode encoding
- subfield delimiter, 1F(hex) in MARC-8 and Unicode encoding
- field terminator, 1E(hex) in MARC-8 and Unicode encoding
- record terminator, 1D(hex) in MARC-8 and Unicode encoding
- non-sorting character(s) begin, 88(hex) in MARC-8 and 98(hex) in Unicode encoding
- non-sorting character(s) end, 89(hex) in MARC-8 and 9C(hex) in Unicode encoding
- joiner, 8D(hex) in MARC-8 and 200D(hex) in Unicode encoding
- nonjoiner, 8E(hex) in MARC-8 and 200C(hex) in Unicode encoding

[1] http://www.loc.gov/marc/specifications/specchargeneral.html#controlfunction

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/

> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Jonathan Rochkind
> Sent: Thursday, May 19, 2011 1:27 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] is this valid marc ?
>
> Is it really true that newline characters are not allowed in a marc
> value? I thought they were, not with any special meaning, just as
> ordinary data. If they're not, that's useful to know, so I don't put
> any there!
>
> I'd ask for a reference to the standard that says this, but I suspect
> it's going to be some impenetrable implication of a side effect of a
> subtle adjective either way.
>
> On 5/19/2011 2:19 PM, Karen Coyle wrote:
> > Quoting Andreas Orphanides:
> >
> >> Anyway, I think having these two parts of the same URL data on
> >> separate lines is definitely Not Right, but I am not sure if it adds
> >> up to invalid MARC.
> >
> > Exactly. The CR and LF characters are NOT defined as valid in the MARC
> > character set and should not be used. In fact, in MARC there is no
> > concept of "lines", only variable length strings (usually up to
> > char).
> >
> > kc
> >
> >> -dre.
> >>
> >> [1] http://www.loc.gov/marc/bibliographic/bd856.html
> >> [2] I am not a cataloger. Don't hurt me.
> >> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
> >>
> >> On 5/19/2011 12:37 PM, James Lecard wrote:
> >>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
> >>> from a partner.
> >>>
> >>> The 856 field is split across 2 lines, causing the ruby library to
> >>> ignore it (I've patched it to overcome this issue), but I want to know
> >>> if this kind of marc is valid ?
> >>>
> >>> =LDR 00638nam 2200181uu 4500
> >>> =001 cla-MldNA01
> >>> =008 080101s2008\\\|fre||
> >>> =040 \\$aMy Provider
> >>> =041 0\$afre
> >>> =245 10$aThis Subject
> >>> =260 \\$aParis$bJ. Doe$c2008
> >>> =490 \\$aSome topic
> >>> =650 1\$aNarratif, Autre forme
> >>> =655 \7$abook$2lcsh
> >>> =752 \\$aA Place on earth
> >>> =776 \\$dParis: John Doe and Cie, 1973
> >>> =856 \2$qtext/html
> >>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
> >>>
> >>> Thanks,
> >>>
> >>> James L.
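Since the LC list above enumerates the only control functions MARC 21 allows, a record sanity check falls out of it directly. Here is a minimal Ruby sketch (plain string scanning, no ruby-marc dependency; the method name `disallowed_controls` is mine, not part of any library):

```ruby
# MARC 21 permits only these control characters inside record data
# (per the LC control function list quoted above): ESC, subfield
# delimiter, field terminator, record terminator.
MARC_ALLOWED_CONTROLS = [0x1B, 0x1F, 0x1E, 0x1D].freeze

# Returns the codepoints of any disallowed C0 control characters
# found in a field/subfield value.
def disallowed_controls(value)
  value.each_char
       .map(&:ord)
       .select { |cp| cp < 0x20 && !MARC_ALLOWED_CONTROLS.include?(cp) }
end

bad = disallowed_controls("http://example.org/part-one\npart-two")
puts bad.inspect  # => [10]  (the embedded LF)
```

Run over every subfield value before writing a record, this catches exactly the CR/LF case the thread is about.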
Re: [CODE4LIB] is this valid marc ?
Thanks Michael. So one weird thing is that at least some of those characters "specifically designated as control characters" aren't ordinarily what everyone else considers "control characters". To me, "control character" means an ASCII code point less than 20 (hex), which the last four aren't. So now it's unclear what the "prohibited" (by not being mentioned) control characters are, since I don't know what MARC considers a 'control character' exactly. But I'm really just picking nits to demonstrate the impenetrability of MARC specs. I believe you all (especially Terry) that CR and LF aren't allowed.

But, two, Michael, are you the doran in this? http://rocky.uta.edu/doran/charsets/marc8default.html You might want to remove CR, LF, and the other disallowed control characters from your own published list of MARC8 characters!

On 5/19/2011 3:16 PM, Doran, Michael D wrote:
> > Is it really true that newline characters are not allowed in a marc
> > value?
>
> Yes.
>
> CONTROL FUNCTION CODES [1]
>
> Eight characters are specifically designated as control characters for MARC 21 use:
> - escape character, 1B(hex) in MARC-8 and Unicode encoding
> - subfield delimiter, 1F(hex) in MARC-8 and Unicode encoding
> - field terminator, 1E(hex) in MARC-8 and Unicode encoding
> - record terminator, 1D(hex) in MARC-8 and Unicode encoding
> - non-sorting character(s) begin, 88(hex) in MARC-8 and 98(hex) in Unicode encoding
> - non-sorting character(s) end, 89(hex) in MARC-8 and 9C(hex) in Unicode encoding
> - joiner, 8D(hex) in MARC-8 and 200D(hex) in Unicode encoding
> - nonjoiner, 8E(hex) in MARC-8 and 200C(hex) in Unicode encoding
>
> [1] http://www.loc.gov/marc/specifications/specchargeneral.html#controlfunction
>
> -- Michael
>
> # Michael Doran, Systems Librarian
> # University of Texas at Arlington
> # 817-272-5326 office
> # 817-688-1926 mobile
> # do...@uta.edu
> # http://rocky.uta.edu/doran/
>
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Jonathan Rochkind
> Sent: Thursday, May 19, 2011 1:27 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] is this valid marc ?
>
> Is it really true that newline characters are not allowed in a marc
> value? I thought they were, not with any special meaning, just as
> ordinary data. If they're not, that's useful to know, so I don't put
> any there!
>
> I'd ask for a reference to the standard that says this, but I suspect
> it's going to be some impenetrable implication of a side effect of a
> subtle adjective either way.
>
> On 5/19/2011 2:19 PM, Karen Coyle wrote:
> > Quoting Andreas Orphanides:
> >
> >> Anyway, I think having these two parts of the same URL data on
> >> separate lines is definitely Not Right, but I am not sure if it adds
> >> up to invalid MARC.
> >
> > Exactly. The CR and LF characters are NOT defined as valid in the MARC
> > character set and should not be used. In fact, in MARC there is no
> > concept of "lines", only variable length strings (usually up to
> > char).
> >
> > kc
> >
> >> -dre.
> >>
> >> [1] http://www.loc.gov/marc/bibliographic/bd856.html
> >> [2] I am not a cataloger. Don't hurt me.
> >> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
> >>
> >> On 5/19/2011 12:37 PM, James Lecard wrote:
> >>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
> >>> from a partner.
> >>>
> >>> The 856 field is split across 2 lines, causing the ruby library to
> >>> ignore it (I've patched it to overcome this issue), but I want to know
> >>> if this kind of marc is valid ?
> >>>
> >>> =LDR 00638nam 2200181uu 4500
> >>> =001 cla-MldNA01
> >>> =008 080101s2008\\\|fre||
> >>> =040 \\$aMy Provider
> >>> =041 0\$afre
> >>> =245 10$aThis Subject
> >>> =260 \\$aParis$bJ. Doe$c2008
> >>> =490 \\$aSome topic
> >>> =650 1\$aNarratif, Autre forme
> >>> =655 \7$abook$2lcsh
> >>> =752 \\$aA Place on earth
> >>> =776 \\$dParis: John Doe and Cie, 1973
> >>> =856 \2$qtext/html
> >>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
> >>>
> >>> Thanks,
> >>>
> >>> James L.
Re: [CODE4LIB] is this valid marc ?
It's been a while since I looked at the ISO spec (which I still can't believe I had to buy to read) -- but you can certainly infer by looking at the legal characters laid out by LC. In reality -- only a handful of unprintable characters are technically allowed in a MARC record -- but you have to remember that when MARC was created -- it was for block reading -- and generally, early (and current) readers stop on hard breaks.

--TR

> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Jonathan Rochkind
> Sent: Thursday, May 19, 2011 11:49 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] is this valid marc ?
>
> On 5/19/2011 2:33 PM, Reese, Terry wrote:
> > Jonathan,
> >
> > Karen is correct -- CR/LF are invalid characters within a MARC record.
> > This has nothing to do with whether the character is valid in the set --
> > the format itself doesn't allow it.
>
> I'm curious where in the spec it says this -- of course, it's an
> intellectual exercise at this point, because even if the spec says one
> thing, it doesn't matter if everyone (including tool-writers) has always
> understood it differently. (This is a problem for me with lots of library
> 'standards' including MARC. "Oh yeah, it might APPEAR to say/allow/prohibit
> that, but don't believe it, 'everyone' has always understood it
> differently." Or two parts of a spec which contradict each other.)
>
> In the glossary here:
> http://www.loc.gov/marc/specifications/speccharintro.html
>
> It does say "Consequently, code points less than 80 (hex) have the same
> meaning in both of the encodings used in MARC 21 and may be referred to as
> ASCII in either environment." Which could be interpreted to include control
> chars such as CR and LF. (Thanks Dan Scott.) Of course, the glossary
> section may not actually be an operative part of the standard, or it may
> not mean what it seems to mean, or everyone may have always acted as if it
> meant something different. Welcome to MARC.
>
> But I'm not successfully finding anything else that says one way or another
> on the legality. Most of the ASCII control chars do seem to be missing from
> Marc8 (whether by design or accident), but that doesn't necessarily mean
> they're illegal in a MARC record using some other (legal for MARC)
> encoding.
>
> But I believe Terry that it's not allowed (I believe Terry about just about
> everything). It's just really an intellectual exercise in the difficulty of
> finding answers in the MARC spec at the moment.
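Terry's point about block reading can be made concrete: a reader that splits input on hard line breaks will shred a field value containing an embedded LF, while a reader keyed on the real MARC delimiters (field terminator 1E hex, record terminator 1D hex) treats that LF as ordinary bytes. A small Ruby sketch with made-up field content:

```ruby
# A toy record using the real MARC structural delimiters:
# 0x1E ends a field, 0x1D ends the record. The 856 value contains
# an embedded LF, which the spec discussion above says is not allowed.
raw = "245 10\x1FaThis Subject\x1E" \
      "856 42\x1Fuhttp://example.org/part-one\npart-two\x1E\x1D"

# Terminator-oriented read: the embedded LF is just data; both fields survive.
fields = raw.chomp("\x1D").split("\x1E")
puts fields.length        # => 2

# Line-oriented read (a reader that "stops on hard breaks"):
# the 856 is cut into two pieces, neither of which is a whole field.
lines = raw.split("\n")
puts lines.length         # => 2
```

Which is presumably why a record that smuggles CR/LF into a value parses fine in one tool and breaks in the next.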
Re: [CODE4LIB] is this valid marc ?
On 5/19/2011 2:33 PM, Reese, Terry wrote:
> Jonathan,
>
> Karen is correct -- CR/LF are invalid characters within a MARC record. This
> has nothing to do with whether the character is valid in the set -- the
> format itself doesn't allow it.

I'm curious where in the spec it says this -- of course, it's an intellectual exercise at this point, because even if the spec says one thing, it doesn't matter if everyone (including tool-writers) has always understood it differently. (This is a problem for me with lots of library 'standards' including MARC. "Oh yeah, it might APPEAR to say/allow/prohibit that, but don't believe it, 'everyone' has always understood it differently." Or two parts of a spec which contradict each other.)

In the glossary here: http://www.loc.gov/marc/specifications/speccharintro.html

It does say "Consequently, code points less than 80 (hex) have the same meaning in both of the encodings used in MARC 21 and may be referred to as ASCII in either environment." Which could be interpreted to include control chars such as CR and LF. (Thanks Dan Scott.) Of course, the glossary section may not actually be an operative part of the standard, or it may not mean what it seems to mean, or everyone may have always acted as if it meant something different. Welcome to MARC.

But I'm not successfully finding anything else that says one way or another on the legality. Most of the ASCII control chars do seem to be missing from Marc8 (whether by design or accident), but that doesn't necessarily mean they're illegal in a MARC record using some other (legal for MARC) encoding.

But I believe Terry that it's not allowed (I believe Terry about just about everything). It's just really an intellectual exercise in the difficulty of finding answers in the MARC spec at the moment.
Re: [CODE4LIB] is this valid marc ?
On 5/19/2011 2:33 PM, Kyle Banerjee wrote:
> However, what would be the use case for including them as you don't know
> how they'll be interpreted by the app that you hand the data to?

Only when the destination is an app you have complete control over too.

One use case I was idly turning over in my head lately. I export data about my bibs from my ILS to Solr in Marc. But I am increasingly needing to stuff 'local' data that doesn't fit into any Marc field in there too, because I need it available at the Solr indexing stage. I'm not concerned with doing this in a 'standard' way, I just need to get it in there SOMEHOW, because Marc is all that makes it to my Solr indexer. (And it would be somewhat complicated to change my pipeline to send a package that includes Marc plus other metadata payloads; there are a bunch of pieces in the pipeline that really want Marc-as-marc.)

So one idea I had was encoding it as arbitrary key/value pairs in YAML, and just sticking the YAML in a 9xx field. But a newline is a significant character for YAML. I don't care about this data being _meaningful_ to anyone other than my own custom local destination, but I do care about leaving the Marc structurally legal (especially because if it's not, some of the individual elements of the pipeline might choke on it or corrupt it).

Another different idea I was also thinking about: all of our MARC 'summaries' (520) show up in our interfaces as one giant paragraph, even when they are publisher back-of-the-book copy that was originally multiple paragraphs. Sometimes a MARC record has the exact same text in it as an Amazon description, but the Amazon description is a lot more readable because it is rightly multiple paragraphs. If newlines were legal in a 520, then a cataloger could preserve them -- systems that just ignored it would continue to, no loss; but systems that wanted to take account of it could, for instance by using HTML <p> or <br> tags to paragraph-ize on newlines before outputting to an HTML display.

But not if newlines aren't legal in a value, of course.

Jonathan

> I've seen people put HTML in certain fields to achieve a certain effect in
> catalogs, but this is a dodgy practice since it relies on the questionable
> assumption that the end application will just pass through whatever is
> sent.
>
> kyle
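The YAML-in-a-9xx idea collides with the no-newline rule immediately: the YAML stdlib's default block style is newline-delimited by design. A quick Ruby check (the local key/value data here is invented for illustration):

```ruby
require 'yaml'

# Hypothetical local key/value data destined for a 9xx field.
local_data = { "digitized" => true, "collection" => "local-a" }

serialized = YAML.dump(local_data)
# YAML's default block style separates entries with newlines,
# so the serialized form can't be dropped into a MARC field as-is.
puts serialized.include?("\n")   # => true
```

So to stay structurally legal you'd need a serialization that keeps the whole payload on one line (or escapes newlines yourself) before stuffing it into the field.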
Re: [CODE4LIB] is this valid marc ?
Jonathan,

Karen is correct -- CR/LF are invalid characters within a MARC record. This has nothing to do with whether the character is valid in the set -- the format itself doesn't allow it.

--TR

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind
Sent: Thursday, May 19, 2011 11:29 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] is this valid marc ?

I wonder if it depends on whether your record is in Marc8 or UTF-8, if I'm reading Karen right to say that CR/LF aren't in the Marc8 character set. They're certainly in UTF-8! And a Marc record can be in UTF-8.

On 5/19/2011 2:27 PM, Jonathan Rochkind wrote:
> Is it really true that newline characters are not allowed in a marc
> value? I thought they were, not with any special meaning, just as
> ordinary data. If they're not, that's useful to know, so I don't put
> any there!
>
> I'd ask for a reference to the standard that says this, but I suspect
> it's going to be some impenetrable implication of a side effect of a
> subtle adjective either way.
>
> On 5/19/2011 2:19 PM, Karen Coyle wrote:
>> Quoting Andreas Orphanides:
>>
>>> Anyway, I think having these two parts of the same URL data on
>>> separate lines is definitely Not Right, but I am not sure if it adds
>>> up to invalid MARC.
>>
>> Exactly. The CR and LF characters are NOT defined as valid in the
>> MARC character set and should not be used. In fact, in MARC there is
>> no concept of "lines", only variable length strings (usually up to
>> char).
>>
>> kc
>>
>>> -dre.
>>>
>>> [1] http://www.loc.gov/marc/bibliographic/bd856.html
>>> [2] I am not a cataloger. Don't hurt me.
>>> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
>>>
>>> On 5/19/2011 12:37 PM, James Lecard wrote:
>>>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
>>>> from a partner.
>>>>
>>>> The 856 field is split across 2 lines, causing the ruby library to
>>>> ignore it (I've patched it to overcome this issue), but I want to know
>>>> if this kind of marc is valid ?
>>>>
>>>> =LDR 00638nam 2200181uu 4500
>>>> =001 cla-MldNA01
>>>> =008 080101s2008\\\|fre||
>>>> =040 \\$aMy Provider
>>>> =041 0\$afre
>>>> =245 10$aThis Subject
>>>> =260 \\$aParis$bJ. Doe$c2008
>>>> =490 \\$aSome topic
>>>> =650 1\$aNarratif, Autre forme
>>>> =655 \7$abook$2lcsh
>>>> =752 \\$aA Place on earth
>>>> =776 \\$dParis: John Doe and Cie, 1973
>>>> =856 \2$qtext/html
>>>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
>>>>
>>>> Thanks,
>>>>
>>>> James L.
Re: [CODE4LIB] is this valid marc ?
> Is it really true that newline characters are not allowed in a marc value?
> I thought they were, not with any special meaning, just as ordinary data.
> If they're not, that's useful to know, so I don't put any there!

This is also my understanding. However, what would be the use case for including them as you don't know how they'll be interpreted by the app that you hand the data to?

I've seen people put HTML in certain fields to achieve a certain effect in catalogs, but this is a dodgy practice since it relies on the questionable assumption that the end application will just pass through whatever is sent.

kyle
Re: [CODE4LIB] is this valid marc ?
I wonder if it depends on whether your record is in Marc8 or UTF-8, if I'm reading Karen right to say that CR/LF aren't in the Marc8 character set. They're certainly in UTF-8! And a Marc record can be in UTF-8.

On 5/19/2011 2:27 PM, Jonathan Rochkind wrote:
> Is it really true that newline characters are not allowed in a marc
> value? I thought they were, not with any special meaning, just as
> ordinary data. If they're not, that's useful to know, so I don't put
> any there!
>
> I'd ask for a reference to the standard that says this, but I suspect
> it's going to be some impenetrable implication of a side effect of a
> subtle adjective either way.
>
> On 5/19/2011 2:19 PM, Karen Coyle wrote:
>> Quoting Andreas Orphanides:
>>
>>> Anyway, I think having these two parts of the same URL data on
>>> separate lines is definitely Not Right, but I am not sure if it adds
>>> up to invalid MARC.
>>
>> Exactly. The CR and LF characters are NOT defined as valid in the
>> MARC character set and should not be used. In fact, in MARC there is
>> no concept of "lines", only variable length strings (usually up to
>> char).
>>
>> kc
>>
>>> -dre.
>>>
>>> [1] http://www.loc.gov/marc/bibliographic/bd856.html
>>> [2] I am not a cataloger. Don't hurt me.
>>> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
>>>
>>> On 5/19/2011 12:37 PM, James Lecard wrote:
>>>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
>>>> from a partner.
>>>>
>>>> The 856 field is split across 2 lines, causing the ruby library to
>>>> ignore it (I've patched it to overcome this issue), but I want to know
>>>> if this kind of marc is valid ?
>>>>
>>>> =LDR 00638nam 2200181uu 4500
>>>> =001 cla-MldNA01
>>>> =008 080101s2008\\\|fre||
>>>> =040 \\$aMy Provider
>>>> =041 0\$afre
>>>> =245 10$aThis Subject
>>>> =260 \\$aParis$bJ. Doe$c2008
>>>> =490 \\$aSome topic
>>>> =650 1\$aNarratif, Autre forme
>>>> =655 \7$abook$2lcsh
>>>> =752 \\$aA Place on earth
>>>> =776 \\$dParis: John Doe and Cie, 1973
>>>> =856 \2$qtext/html
>>>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
>>>>
>>>> Thanks,
>>>>
>>>> James L.
Re: [CODE4LIB] is this valid marc ?
Is it really true that newline characters are not allowed in a marc value? I thought they were, not with any special meaning, just as ordinary data. If they're not, that's useful to know, so I don't put any there!

I'd ask for a reference to the standard that says this, but I suspect it's going to be some impenetrable implication of a side effect of a subtle adjective either way.

On 5/19/2011 2:19 PM, Karen Coyle wrote:
> Quoting Andreas Orphanides:
>
>> Anyway, I think having these two parts of the same URL data on separate
>> lines is definitely Not Right, but I am not sure if it adds up to
>> invalid MARC.
>
> Exactly. The CR and LF characters are NOT defined as valid in the MARC
> character set and should not be used. In fact, in MARC there is no
> concept of "lines", only variable length strings (usually up to char).
>
> kc
>
>> -dre.
>>
>> [1] http://www.loc.gov/marc/bibliographic/bd856.html
>> [2] I am not a cataloger. Don't hurt me.
>> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
>>
>> On 5/19/2011 12:37 PM, James Lecard wrote:
>>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
>>> from a partner.
>>>
>>> The 856 field is split across 2 lines, causing the ruby library to
>>> ignore it (I've patched it to overcome this issue), but I want to know
>>> if this kind of marc is valid ?
>>>
>>> =LDR 00638nam 2200181uu 4500
>>> =001 cla-MldNA01
>>> =008 080101s2008\\\|fre||
>>> =040 \\$aMy Provider
>>> =041 0\$afre
>>> =245 10$aThis Subject
>>> =260 \\$aParis$bJ. Doe$c2008
>>> =490 \\$aSome topic
>>> =650 1\$aNarratif, Autre forme
>>> =655 \7$abook$2lcsh
>>> =752 \\$aA Place on earth
>>> =776 \\$dParis: John Doe and Cie, 1973
>>> =856 \2$qtext/html
>>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
>>>
>>> Thanks,
>>>
>>> James L.
Re: [CODE4LIB] is this valid marc ?
Quoting Andreas Orphanides:
> Anyway, I think having these two parts of the same URL data on separate
> lines is definitely Not Right, but I am not sure if it adds up to invalid
> MARC.

Exactly. The CR and LF characters are NOT defined as valid in the MARC character set and should not be used. In fact, in MARC there is no concept of "lines", only variable length strings (usually up to char).

kc

> -dre.
>
> [1] http://www.loc.gov/marc/bibliographic/bd856.html
> [2] I am not a cataloger. Don't hurt me.
> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
>
> On 5/19/2011 12:37 PM, James Lecard wrote:
>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
>> from a partner.
>>
>> The 856 field is split across 2 lines, causing the ruby library to
>> ignore it (I've patched it to overcome this issue), but I want to know
>> if this kind of marc is valid ?
>>
>> =LDR 00638nam 2200181uu 4500
>> =001 cla-MldNA01
>> =008 080101s2008\\\|fre||
>> =040 \\$aMy Provider
>> =041 0\$afre
>> =245 10$aThis Subject
>> =260 \\$aParis$bJ. Doe$c2008
>> =490 \\$aSome topic
>> =650 1\$aNarratif, Autre forme
>> =655 \7$abook$2lcsh
>> =752 \\$aA Place on earth
>> =776 \\$dParis: John Doe and Cie, 1973
>> =856 \2$qtext/html
>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
>>
>> Thanks,
>>
>> James L.

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Re: [CODE4LIB] is this valid marc ?
On Thu, May 19, 2011 at 1:33 PM, Bill Dueber wrote:
> record['856'] is defined to return the *first* 856 in the record, which, if
> you look at the documentation...er...ok. Which is not documented as such in
> MARC::Record (http://rubydoc.info/gems/marc/0.4.2/MARC/Record)
>
> To get them all, you need to do something like
>
>   sixfifties = record.fields '650'  # returns array of results
>
> Or, to iterate
>
>   record.each_by_tag('650') do |f|
>     puts f['u'] if f['u']  # print out a URL if we've got one
>   end

What Bill said. Also, there's a somewhat complicated calculus that comes into play here regarding ruby-marc and looking up subfields and performance. Modern ruby-marc (of which 0.4.2 is an example) has the capability of providing a hash of the fields for much faster access than:

  eight_fifty_sixes = record.find_all { |field| field.tag == "856" }

However, it comes at a cost (that is, there's a penalty in building the field map). This penalty is offset if you wind up doing a lot of one-off lookups in a single record. If you're simply looking for a single field in every record (or know, beforehand, what fields you're looking for), it's *much* faster to do something like:

  tags = ['001', '020', '100', '110', '111', '245', '650', '856']
  fields = record.find_all { |field| tags.include?(field.tag) }

or whatever. At some point we did a benchmark of this (Bill Dueber did it: https://gist.github.com/591907) and the threshold was somewhere around 6 or so #find_all calls needed to offset building the field map. This is why it's not really documented. This is the sort of thing that really needs to go into the ruby-marc wiki.

BTW, the behavior exists for subfields, too. If you do something like record['043']['a'] and there are multiple subfield "a"s, you'll only get the first one.

-Ross.

> On Thu, May 19, 2011 at 1:16 PM, James Lecard wrote:
>> I'll dig in this one, thanks for this input Jonathan... I'm not so
>> familiar with the library yet. I'll do some more debugging, but in fact
>> what is happening is that I have no value with an access such as
>> record['856']['u'], while I get one for record['856']['q'].
>> And the marc you are seeing is copy/pasted from a marc editor gui; it's
>> not the actual marc record. I edited it so that its data is not
>> recognisable (for confidentiality).
>>
>> James
>>
>> 2011/5/19 Jonathan Rochkind
>>> Now whether it _means_ what you want it to mean is another question,
>>> yeah. As Andreas said, I don't think that particular example _ought_ to
>>> have two 856's.
>>>
>>> But it ought to be perfectly parseable marc.
>>>
>>> If your 'patch' is to make ruby-marc combine those multiple 856's into
>>> one -- that is not right; two separate 856's are two separate 856's,
>>> same as any other marc field. Applying that patch would mess up
>>> ruby-marc, not fix it.
>>>
>>> You need to be more specific about what you're doing and what you mean
>>> exactly by 'causing the ruby library to ignore it'. I wonder if you are
>>> just using a method in ruby-marc which only returns the first field
>>> matching a given tag when there is more than one.
>>>
>>> On 5/19/2011 12:51 PM, Andreas Orphanides wrote:
>>>> From the MARC documentation [1]:
>>>>
>>>> "Field 856 is repeated when the location data elements vary (the URL
>>>> in subfield $u or subfields $a, $b, $d, when used). It is also
>>>> repeated when more than one access method is used, different portions
>>>> of the item are available electronically, mirror sites are recorded,
>>>> different formats/resolutions with different URLs are indicated, and
>>>> related items are recorded."
>>>>
>>>> So it looks like however the URL is handled, a single 856 field should
>>>> be used to indicate the location [2]. I am not familiar enough with
>>>> MARC to say how it "should" have been done, but it looks like $q and
>>>> $u would probably be sufficient (if they're in the same line).
>>>>
>>>> However, since the field is repeatable, the parser shouldn't be
>>>> choking on it, unless it's choking on it for a sophisticated reason
>>>> (e.g., "These aren't the subfield tags I expect to be seeing"). It
>>>> also looks like if $u is provided, the first subfield should indicate
>>>> access method (in this case "4" for HTTP). Maybe that's what's causing
>>>> the problem? [3]
>>>>
>>>> Anyway, I think having these two parts of the same URL data on
>>>> separate lines is definitely Not Right, but I am not sure if it adds
>>>> up to invalid MARC.
>>>>
>>>> -dre.
>>>>
>>>> [1] http://www.loc.gov/marc/bibliographic/bd856.html
>>>> [2] I am not a cataloger. Don't hurt me.
>>>> [3] I am not an expert on MARC ingest or on ruby-marc. I could be
>>>> wrong.
>>>>
>>>> On 5/19/2011 12:37 PM, James Lecard wrote:
>>>>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I
>>>>> get from a partner.
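Bill's and Ross's points about first-match vs. all-matches lookup don't need the marc gem to demonstrate. Here's a sketch using a plain Struct as a stand-in for MARC::DataField (the `Field` struct and the sample data are mine, purely illustrative; in ruby-marc you'd be working with a MARC::Record):

```ruby
# Stand-in for a MARC field: just a tag and a value.
Field = Struct.new(:tag, :value)

fields = [
  Field.new("856", "$qtext/html"),
  Field.new("856", "$uhttp://example.org/"),
  Field.new("650", "$aNarratif"),
]

# record['856']-style lookup: first match only -- the second 856 is invisible.
first_856 = fields.find { |f| f.tag == "856" }
puts first_856.value             # => "$qtext/html"

# record.fields('856')-style lookup: all matches.
all_856 = fields.select { |f| f.tag == "856" }
puts all_856.length              # => 2

# Ross's "known tag set" pattern: one pass over the record for several tags.
wanted = ["245", "650", "856"]
picked = fields.select { |f| wanted.include?(f.tag) }
puts picked.length               # => 3
```

This is exactly why James saw record['856']['q'] but never record['856']['u']: the $u lived in the second 856, which the single-field shortcut never reaches.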
Re: [CODE4LIB] MARCXML to MODS: 590 Field
Jon and Karen are correct. LC doesn't map/convert local fields because usage varies.

Tracy

Tracy Meehleib
Network Development and MARC Standards Office
Library of Congress
101 Independence Ave SE
Washington, DC 20540-4402
+1 202 707 0121 (voice)
+1 202 707 0115 (fax)
t...@loc.gov

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen Miller
Sent: Thursday, May 19, 2011 12:35 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML to MODS: 590 Field

Joel,

The 590 is indeed defined for local use, so whatever your local institution uses it for should guide your mapping to MODS. There are some examples of what it's used for on the OCLC Bibliographic Formats and Standards pages: http://www.oclc.org/bibformats/en/5xx/590.shtm

Frequently it's used as a note that is specific to a local copy of an item. If your institution uses it inconsistently, you might want to just map it to mods:note.

Karen

Karen D. Miller
Monographic/Digital Projects Cataloger
Bibliographic Services Dept.
Northwestern University Library
Evanston, IL
k-mill...@northwestern.edu
847-467-3462

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jon Stroop
Sent: Thursday, May 19, 2011 11:07 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML to MODS: 590 Field

I'm going to guess that it's because 59x fields are defined for local use: http://www.loc.gov/marc/bibliographic/bd59x.html ...but someone from LC should be able to confirm.

-Jon

--
Jon Stroop
Metadata Analyst
Firestone Library
Princeton University
Princeton, NJ 08544
Email: jstr...@princeton.edu
Phone: (609)258-0059
Fax: (609)258-0441
http://pudl.princeton.edu
http://diglib.princeton.edu
http://diglib.princeton.edu/ead
http://www.cpanda.org/cpanda

On 05/19/2011 11:45 AM, Richard, Joel M wrote:
> Dear hive-mind,
>
> Does anyone know why the Library of Congress-supplied MARCXML to MODS XSLT
> [1] does not handle the MARC 590 Local Notes field? It seems to handle
> everything else, not that I've done an exhaustive search... :)
>
> Granted, I could copy/create my own XSLT and add this functionality in
> myself, but I'm curious as to whether or not there's some logic behind this
> decision to not include it -- logic that I would not naturally understand
> since I'm not formally trained as a librarian.
>
> Thanks!
> --Joel
>
> [1] http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3-4.xsl
>
> Joel Richard
> IT Specialist, Web Services Department
> Smithsonian Institution Libraries | http://www.sil.si.edu/
> (202) 633-1706 | richar...@si.edu
Re: [CODE4LIB] Seth Godin on The future of the library
On 5/19/2011 1:23 PM, Ryan Engel wrote:
> There are some who argue that if it's valuable to others, then others
> should pay for it (even when the improved access benefits your institution
> first and foremost, and distribution of the improvements is an arguably
> beneficial side effect). Why should one institution carry the financial
> burden of improving something that benefits others beyond that institution?
> It's not an argument I agree with, but it's one I've heard before.

It is a somewhat odd position, especially for libraries, which have been in the business of providing service to others at no profit to themselves for many years, including in technical matters such as cooperative cataloging and lending via ILL. Libraries have gotten to be where they are today by being willing to chip in for the general good on a sort of "generalized reciprocity" basis.

Jonathan
Re: [CODE4LIB] is this valid marc ?
I believe that with the ruby-marc API, when you do record['856'], you just get the first 856 if there is more than one. You have to use another API call (I forget which offhand) to get more than one; the ['856'] is just a shortcut for when you will only have one or only care about the first one. So I don't think there's any bug in ruby-marc.

Your data example is _odd_ though, it's not usual to record 856's like that, and it probably shouldn't be recorded like that. Multiple 856's can exist where there are in fact multiple URLs recorded.

On 5/19/2011 1:16 PM, James Lecard wrote:

I'll dig into this one, thanks for this input Jonathan... I'm not so familiar with the library yet; I'll do some more debugging. But in fact what is happening is that I have no value with an access such as record['856']['u'], while I get one for record['856']['q'].

And the marc you are seeing is copy/pasted from a marc editor gui; it's not the actual marc record. I edited it so that its data is not recognisable (for confidentiality).

James

2011/5/19 Jonathan Rochkind

Now whether it _means_ what you want it to mean is another question, yeah. As Andreas said, I don't think that particular example _ought_ to have two 856's.

But it ought to be perfectly parseable marc.

If your 'patch' is to make ruby-marc combine those multiple 856's into one -- that is not right: two separate 856's are two separate 856's, same as any other marc field. Applying that patch would mess up ruby-marc, not fix it.

You need to be more specific about what you're doing and what you mean exactly by 'causing the ruby library to ignore it'. I wonder if you are just using a method in ruby-marc which only returns the first field matching a given tag when there is more than one.

On 5/19/2011 12:51 PM, Andreas Orphanides wrote:

From the MARC documentation [1]:

"Field 856 is repeated when the location data elements vary (the URL in subfield $u or subfields $a, $b, $d, when used). It is also repeated when more than one access method is used, different portions of the item are available electronically, mirror sites are recorded, different formats/resolutions with different URLs are indicated, and related items are recorded."

So it looks like however the URL is handled, a single 856 field should be used to indicate the location [2]. I am not familiar enough with MARC to say how it "should" have been done, but it looks like $q and $u would probably be sufficient (if they're in the same line).

However, since the field is repeatable, the parser shouldn't be choking on it, unless it's choking on it for a sophisticated reason (e.g., "These aren't the subfield tags I expect to be seeing"). It also looks like if $u is provided, the first subfield should indicate access method (in this case "4" for HTTP). Maybe that's what's causing the problem? [3]

Anyway, I think having these two parts of the same URL data on separate lines is definitely Not Right, but I am not sure if it adds up to invalid MARC.

-dre.

[1] http://www.loc.gov/marc/bibliographic/bd856.html
[2] I am not a cataloger. Don't hurt me.
[3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.

On 5/19/2011 12:37 PM, James Lecard wrote:

I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get from a partner.

The 856 field is split across two lines, causing the ruby library to ignore it (I've patched it to overcome this issue), but I want to know if this kind of marc is valid?

=LDR 00638nam 2200181uu 4500
=001 cla-MldNA01
=008 080101s2008\\\|fre||
=040 \\$aMy Provider
=041 0\$afre
=245 10$aThis Subject
=260 \\$aParis$bJ. Doe$c2008
=490 \\$aSome topic
=650 1\$aNarratif, Autre forme
=655 \7$abook$2lcsh
=752 \\$aA Place on earth
=776 \\$dParis: John Doe and Cie, 1973
=856 \2$qtext/html
=856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library

Thanks,

James L.
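Jonathan's description of the lookup behavior can be sketched with a small stand-in class. This is NOT the actual ruby-marc API, just a minimal pure-Ruby model of a record as a list of (tag, subfields) pairs, with a first-match lookup mirroring `record['856']` and an all-matches lookup mirroring `record.fields('856')`. The tags and $q value are taken from James's example; the example.org URL is hypothetical:

```ruby
# Minimal stand-in for a MARC record: an array of [tag, subfields] pairs.
# (Illustrative only -- not the ruby-marc API; it just mirrors the shape of
# record['856'] vs record.fields('856') described above.)
Record = Struct.new(:fields_list) do
  # First-match lookup, like ruby-marc's record['856'].
  def [](tag)
    fields_list.find { |t, _| t == tag }&.last
  end

  # All matches, like ruby-marc's record.fields('856').
  def fields(tag)
    fields_list.select { |t, _| t == tag }.map(&:last)
  end
end

rec = Record.new([
  ['856', { 'q' => 'text/html' }],
  ['856', { 'u' => 'http://example.org/resource' }], # hypothetical URL
])

rec['856']               # => {"q"=>"text/html"} -- the $u field is invisible here
rec.fields('856').length # => 2
```

With a record shaped like James's, the first-match lookup sees only the $q field, which would explain getting a value for record['856']['q'] but nothing for record['856']['u'].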
Re: [CODE4LIB] is this valid marc ?
record['856'] is defined to return the *first* 856 in the record, which, if you look at the documentation... er... OK, which is not documented as such in MARC::Record (http://rubydoc.info/gems/marc/0.4.2/MARC/Record).

To get them all, you need to do something like:

sixfifties = record.fields('650') # returns array of results

Or, to iterate:

record.each_by_tag('650') do |f|
  puts f['u'] if f['u'] # print out a URL if we've got one
end

On Thu, May 19, 2011 at 1:16 PM, James Lecard wrote:
> I'll dig into this one, thanks for this input Jonathan... I'm not so
> familiar with the library yet; I'll do some more debugging. But in fact what
> is happening is that I have no value with an access such as
> record['856']['u'], while I get one for record['856']['q'].
> And the marc you are seeing is copy/pasted from a marc editor gui; it's not
> the actual marc record. I edited it so that its data is not recognisable
> (for confidentiality).
>
> James
>
> 2011/5/19 Jonathan Rochkind
>
> > Now whether it _means_ what you want it to mean is another question, yeah.
> > As Andreas said, I don't think that particular example _ought_ to have two
> > 856's.
> >
> > But it ought to be perfectly parseable marc.
> >
> > If your 'patch' is to make ruby-marc combine those multiple 856's into one
> > -- that is not right: two separate 856's are two separate 856's, same as
> > any other marc field. Applying that patch would mess up ruby-marc, not fix it.
> >
> > You need to be more specific about what you're doing and what you mean
> > exactly by 'causing the ruby library to ignore it'. I wonder if you are
> > just using a method in ruby-marc which only returns the first field
> > matching a given tag when there is more than one.
> >
> > On 5/19/2011 12:51 PM, Andreas Orphanides wrote:
> >
> >> From the MARC documentation [1]:
> >>
> >> "Field 856 is repeated when the location data elements vary (the URL in
> >> subfield $u or subfields $a, $b, $d, when used). It is also repeated when
> >> more than one access method is used, different portions of the item are
> >> available electronically, mirror sites are recorded, different
> >> formats/resolutions with different URLs are indicated, and related items
> >> are recorded."
> >>
> >> So it looks like however the URL is handled, a single 856 field should be
> >> used to indicate the location [2]. I am not familiar enough with MARC to
> >> say how it "should" have been done, but it looks like $q and $u would
> >> probably be sufficient (if they're in the same line).
> >>
> >> However, since the field is repeatable, the parser shouldn't be choking
> >> on it, unless it's choking on it for a sophisticated reason (e.g., "These
> >> aren't the subfield tags I expect to be seeing"). It also looks like if $u
> >> is provided, the first subfield should indicate access method (in this
> >> case "4" for HTTP). Maybe that's what's causing the problem? [3]
> >>
> >> Anyway, I think having these two parts of the same URL data on separate
> >> lines is definitely Not Right, but I am not sure if it adds up to invalid
> >> MARC.
> >>
> >> -dre.
> >>
> >> [1] http://www.loc.gov/marc/bibliographic/bd856.html
> >> [2] I am not a cataloger. Don't hurt me.
> >> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
> >>
> >> On 5/19/2011 12:37 PM, James Lecard wrote:
> >>
> >>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
> >>> from a partner.
> >>>
> >>> The 856 field is split across two lines, causing the ruby library to
> >>> ignore it (I've patched it to overcome this issue), but I want to know
> >>> if this kind of marc is valid?
> >>>
> >>> =LDR 00638nam 2200181uu 4500
> >>> =001 cla-MldNA01
> >>> =008 080101s2008\\\|fre||
> >>> =040 \\$aMy Provider
> >>> =041 0\$afre
> >>> =245 10$aThis Subject
> >>> =260 \\$aParis$bJ. Doe$c2008
> >>> =490 \\$aSome topic
> >>> =650 1\$aNarratif, Autre forme
> >>> =655 \7$abook$2lcsh
> >>> =752 \\$aA Place on earth
> >>> =776 \\$dParis: John Doe and Cie, 1973
> >>> =856 \2$qtext/html
> >>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
> >>>
> >>> Thanks,
> >>>
> >>> James L.

--
Bill Dueber
Library Systems Programmer
University of Michigan Library
Re: [CODE4LIB] Seth Godin on The future of the library
There are some who argue that if it's valuable to others, then others should pay for it (even when the improved access benefits your institution first and foremost, and distribution of the improvements is an arguably beneficial side effect). Why should one institution carry the financial burden of improving something that benefits others beyond that institution? It's not an argument I agree with, but it's one I've heard before.

Luciano Ramalho wrote:

On Thu, May 19, 2011 at 6:24 AM, graham wrote:

2. It is hard to justify spending time on improving access to free stuff when the end result would be good for everyone, not just the institution doing the work (unless it can be kept in a consortium and outside-world access limited)

Why is it hard to justify anything that would be good for everyone?
Re: [CODE4LIB] is this valid marc ?
You've gotten some other good responses, but I thought I'd mention the LoC and OCLC sites on MARC if you haven't seen them yet. First, the LoC site at http://www.loc.gov/marc/. This is what I use as a guide and a reference. Some folks prefer the OCLC docs at http://www.oclc.org/bibformats/en/, particularly if they're an OCLC member. Of course, these apply to MARC 21 and not UniMARC. Not sure what good resources are out there for UniMARC.

Jon Gorman
Re: [CODE4LIB] is this valid marc ?
I'll dig into this one, thanks for this input Jonathan... I'm not so familiar with the library yet; I'll do some more debugging. But in fact what is happening is that I have no value with an access such as record['856']['u'], while I get one for record['856']['q'].

And the marc you are seeing is copy/pasted from a marc editor gui; it's not the actual marc record. I edited it so that its data is not recognisable (for confidentiality).

James

2011/5/19 Jonathan Rochkind

> Now whether it _means_ what you want it to mean is another question, yeah.
> As Andreas said, I don't think that particular example _ought_ to have two
> 856's.
>
> But it ought to be perfectly parseable marc.
>
> If your 'patch' is to make ruby-marc combine those multiple 856's into one
> -- that is not right: two separate 856's are two separate 856's, same as any
> other marc field. Applying that patch would mess up ruby-marc, not fix it.
>
> You need to be more specific about what you're doing and what you mean
> exactly by 'causing the ruby library to ignore it'. I wonder if you are
> just using a method in ruby-marc which only returns the first field
> matching a given tag when there is more than one.
>
> On 5/19/2011 12:51 PM, Andreas Orphanides wrote:
>
>> From the MARC documentation [1]:
>>
>> "Field 856 is repeated when the location data elements vary (the URL in
>> subfield $u or subfields $a, $b, $d, when used). It is also repeated when
>> more than one access method is used, different portions of the item are
>> available electronically, mirror sites are recorded, different
>> formats/resolutions with different URLs are indicated, and related items
>> are recorded."
>>
>> So it looks like however the URL is handled, a single 856 field should be
>> used to indicate the location [2]. I am not familiar enough with MARC to
>> say how it "should" have been done, but it looks like $q and $u would
>> probably be sufficient (if they're in the same line).
>>
>> However, since the field is repeatable, the parser shouldn't be choking on
>> it, unless it's choking on it for a sophisticated reason (e.g., "These
>> aren't the subfield tags I expect to be seeing"). It also looks like if $u
>> is provided, the first subfield should indicate access method (in this case
>> "4" for HTTP). Maybe that's what's causing the problem? [3]
>>
>> Anyway, I think having these two parts of the same URL data on separate
>> lines is definitely Not Right, but I am not sure if it adds up to invalid
>> MARC.
>>
>> -dre.
>>
>> [1] http://www.loc.gov/marc/bibliographic/bd856.html
>> [2] I am not a cataloger. Don't hurt me.
>> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
>>
>> On 5/19/2011 12:37 PM, James Lecard wrote:
>>
>>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
>>> from a partner.
>>>
>>> The 856 field is split across two lines, causing the ruby library to
>>> ignore it (I've patched it to overcome this issue), but I want to know
>>> if this kind of marc is valid?
>>>
>>> =LDR 00638nam 2200181uu 4500
>>> =001 cla-MldNA01
>>> =008 080101s2008\\\|fre||
>>> =040 \\$aMy Provider
>>> =041 0\$afre
>>> =245 10$aThis Subject
>>> =260 \\$aParis$bJ. Doe$c2008
>>> =490 \\$aSome topic
>>> =650 1\$aNarratif, Autre forme
>>> =655 \7$abook$2lcsh
>>> =752 \\$aA Place on earth
>>> =776 \\$dParis: John Doe and Cie, 1973
>>> =856 \2$qtext/html
>>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
>>>
>>> Thanks,
>>>
>>> James L.
Re: [CODE4LIB] is this valid marc ?
I'm curious what's going on here; it doesn't make any sense. Do you just mean that your MARC file has more than one 856 in it? That's what your pasted marc looks like, but that is definitely legal, AND I've parsed many many marc files with more than one 856 in them with ruby-marc; it was not a problem. I do it all the time.

Or do you mean your 856 had a newline ("\n") in it? I don't know if I've ever tried that, although yes, it should be legal. But if ruby-marc has a bug there, yes, it needs to be fixed.

What form is your marc in that you are parsing with ruby-marc? marc21 binary? marcxml? Or are you actually trying to parse what you pasted in, that weird marc-as-human-readable-text format? I vaguely recall ruby-marc having a method to parse such marc-as-human-readable-text, but I'm not sure if it's actually a _standard_ at all, so I'm not sure it's possible to say what should or shouldn't be legal in it.

Jonathan

On 5/19/2011 12:49 PM, James Lecard wrote:

Thanks a lot Richard,

So I guess my patch could be ported to the source code of ruby-marc. Let me know if you're interested.

James

2011/5/19 Richard, Joel M

I'm no MARC expert, but I've learned enough to say that yes, this is valid, in that what you're seeing is the $q (Electronic format type) and $u (Uniform Resource Identifier) subfields of the 856 field.

http://www.oclc.org/bibformats/en/8xx/856.shtm

You'll see other things when you get multiple authors (creators) on an item, or multiple anythings that can occur more than once.

--Joel

Joel Richard
IT Specialist, Web Services Department
Smithsonian Institution Libraries | http://www.sil.si.edu/
(202) 633-1706 | richar...@si.edu

On May 19, 2011, at 12:37 PM, James Lecard wrote:

I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get from a partner.

The 856 field is split across two lines, causing the ruby library to ignore it (I've patched it to overcome this issue), but I want to know if this kind of marc is valid?

=LDR 00638nam 2200181uu 4500
=001 cla-MldNA01
=008 080101s2008\\\|fre||
=040 \\$aMy Provider
=041 0\$afre
=245 10$aThis Subject
=260 \\$aParis$bJ. Doe$c2008
=490 \\$aSome topic
=650 1\$aNarratif, Autre forme
=655 \7$abook$2lcsh
=752 \\$aA Place on earth
=776 \\$dParis: John Doe and Cie, 1973
=856 \2$qtext/html
=856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library

Thanks,

James L.
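Since part of the confusion in this thread is whether a raw newline inside a field value is even legal, here is a small sanity-check sketch. The reserved control characters below (escape 1B, record terminator 1D, field terminator 1E, subfield delimiter 1F) come from the MARC 21 character-set specification cited elsewhere in the thread, and CR/LF are flagged because they are not defined as valid MARC data characters. The helper itself is illustrative only, not part of ruby-marc:

```ruby
# Control characters reserved for MARC 21 record structure; they must never
# appear inside a subfield value. CR and LF are not valid MARC data
# characters either, so they are flagged as well.
FORBIDDEN_IN_VALUES = [0x1B, 0x1D, 0x1E, 0x1F].map(&:chr) + ["\r", "\n"]

# Return the offending characters found in a would-be subfield value.
def suspicious_chars(value)
  value.chars.select { |c| FORBIDDEN_IN_VALUES.include?(c) }
end

suspicious_chars("http://example.org/a\nb")  # => ["\n"]
suspicious_chars("text/html")                # => []
```

An 856 that really contained a raw newline would fail this check; two separate 856 fields, as in James's pasted record, would pass.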
Re: [CODE4LIB] is this valid marc ?
Now whether it _means_ what you want it to mean is another question, yeah. As Andreas said, I don't think that particular example _ought_ to have two 856's.

But it ought to be perfectly parseable marc.

If your 'patch' is to make ruby-marc combine those multiple 856's into one -- that is not right: two separate 856's are two separate 856's, same as any other marc field. Applying that patch would mess up ruby-marc, not fix it.

You need to be more specific about what you're doing and what you mean exactly by 'causing the ruby library to ignore it'. I wonder if you are just using a method in ruby-marc which only returns the first field matching a given tag when there is more than one.

On 5/19/2011 12:51 PM, Andreas Orphanides wrote:

From the MARC documentation [1]:

"Field 856 is repeated when the location data elements vary (the URL in subfield $u or subfields $a, $b, $d, when used). It is also repeated when more than one access method is used, different portions of the item are available electronically, mirror sites are recorded, different formats/resolutions with different URLs are indicated, and related items are recorded."

So it looks like however the URL is handled, a single 856 field should be used to indicate the location [2]. I am not familiar enough with MARC to say how it "should" have been done, but it looks like $q and $u would probably be sufficient (if they're in the same line).

However, since the field is repeatable, the parser shouldn't be choking on it, unless it's choking on it for a sophisticated reason (e.g., "These aren't the subfield tags I expect to be seeing"). It also looks like if $u is provided, the first subfield should indicate access method (in this case "4" for HTTP). Maybe that's what's causing the problem? [3]

Anyway, I think having these two parts of the same URL data on separate lines is definitely Not Right, but I am not sure if it adds up to invalid MARC.

-dre.

[1] http://www.loc.gov/marc/bibliographic/bd856.html
[2] I am not a cataloger. Don't hurt me.
[3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.

On 5/19/2011 12:37 PM, James Lecard wrote:

I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get from a partner.

The 856 field is split across two lines, causing the ruby library to ignore it (I've patched it to overcome this issue), but I want to know if this kind of marc is valid?

=LDR 00638nam 2200181uu 4500
=001 cla-MldNA01
=008 080101s2008\\\|fre||
=040 \\$aMy Provider
=041 0\$afre
=245 10$aThis Subject
=260 \\$aParis$bJ. Doe$c2008
=490 \\$aSome topic
=650 1\$aNarratif, Autre forme
=655 \7$abook$2lcsh
=752 \\$aA Place on earth
=776 \\$dParis: John Doe and Cie, 1973
=856 \2$qtext/html
=856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library

Thanks,

James L.
Re: [CODE4LIB] is this valid marc ?
In my last message, some of my "subfield"s should of course read "indicator". Still digesting lunch.

-dre.

On 5/19/2011 12:37 PM, James Lecard wrote:

I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get from a partner.

The 856 field is split across two lines, causing the ruby library to ignore it (I've patched it to overcome this issue), but I want to know if this kind of marc is valid?

=LDR 00638nam 2200181uu 4500
=001 cla-MldNA01
=008 080101s2008\\\|fre||
=040 \\$aMy Provider
=041 0\$afre
=245 10$aThis Subject
=260 \\$aParis$bJ. Doe$c2008
=490 \\$aSome topic
=650 1\$aNarratif, Autre forme
=655 \7$abook$2lcsh
=752 \\$aA Place on earth
=776 \\$dParis: John Doe and Cie, 1973
=856 \2$qtext/html
=856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library

Thanks,

James L.
Re: [CODE4LIB] is this valid marc ?
Thanks a lot Richard,

So I guess my patch could be ported to the source code of ruby-marc. Let me know if you're interested.

James

2011/5/19 Richard, Joel M

> I'm no MARC expert, but I've learned enough to say that yes, this is valid,
> in that what you're seeing is the $q (Electronic format type) and $u
> (Uniform Resource Identifier) subfields of the 856 field.
>
> http://www.oclc.org/bibformats/en/8xx/856.shtm
>
> You'll see other things when you get multiple authors (creators) on an item,
> or multiple anythings that can occur more than once.
>
> --Joel
>
> Joel Richard
> IT Specialist, Web Services Department
> Smithsonian Institution Libraries | http://www.sil.si.edu/
> (202) 633-1706 | richar...@si.edu
>
> On May 19, 2011, at 12:37 PM, James Lecard wrote:
>
> > I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
> > from a partner.
> >
> > The 856 field is split across two lines, causing the ruby library to
> > ignore it (I've patched it to overcome this issue), but I want to know
> > if this kind of marc is valid?
> >
> > =LDR 00638nam 2200181uu 4500
> > =001 cla-MldNA01
> > =008 080101s2008\\\|fre||
> > =040 \\$aMy Provider
> > =041 0\$afre
> > =245 10$aThis Subject
> > =260 \\$aParis$bJ. Doe$c2008
> > =490 \\$aSome topic
> > =650 1\$aNarratif, Autre forme
> > =655 \7$abook$2lcsh
> > =752 \\$aA Place on earth
> > =776 \\$dParis: John Doe and Cie, 1973
> > =856 \2$qtext/html
> > =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
> >
> > Thanks,
> >
> > James L.
Re: [CODE4LIB] is this valid marc ?
From the MARC documentation [1]:

"Field 856 is repeated when the location data elements vary (the URL in subfield $u or subfields $a, $b, $d, when used). It is also repeated when more than one access method is used, different portions of the item are available electronically, mirror sites are recorded, different formats/resolutions with different URLs are indicated, and related items are recorded."

So it looks like however the URL is handled, a single 856 field should be used to indicate the location [2]. I am not familiar enough with MARC to say how it "should" have been done, but it looks like $q and $u would probably be sufficient (if they're in the same line).

However, since the field is repeatable, the parser shouldn't be choking on it, unless it's choking on it for a sophisticated reason (e.g., "These aren't the subfield tags I expect to be seeing"). It also looks like if $u is provided, the first subfield should indicate access method (in this case "4" for HTTP). Maybe that's what's causing the problem? [3]

Anyway, I think having these two parts of the same URL data on separate lines is definitely Not Right, but I am not sure if it adds up to invalid MARC.

-dre.

[1] http://www.loc.gov/marc/bibliographic/bd856.html
[2] I am not a cataloger. Don't hurt me.
[3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.

On 5/19/2011 12:37 PM, James Lecard wrote:

I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get from a partner.

The 856 field is split across two lines, causing the ruby library to ignore it (I've patched it to overcome this issue), but I want to know if this kind of marc is valid?

=LDR 00638nam 2200181uu 4500
=001 cla-MldNA01
=008 080101s2008\\\|fre||
=040 \\$aMy Provider
=041 0\$afre
=245 10$aThis Subject
=260 \\$aParis$bJ. Doe$c2008
=490 \\$aSome topic
=650 1\$aNarratif, Autre forme
=655 \7$abook$2lcsh
=752 \\$aA Place on earth
=776 \\$dParis: John Doe and Cie, 1973
=856 \2$qtext/html
=856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library

Thanks,

James L.
Re: [CODE4LIB] MARCXML to MODS: 590 Field
Thanks, Karen and Jon! That's what I suspected, but I couldn't find anything on the web about the thought process behind ignoring the 590 altogether. We'll likely end up using a local version of the XSLT to map it to mods:note as you suggested. We simply don't want this information to be lost in our MODS record as we, for example, embed it inside a METS document.

--Joel

On May 19, 2011, at 12:34 PM, Karen Miller wrote:

> Joel,
>
> The 590 is indeed defined for local use, so whatever your local institution
> uses it for should guide your mapping to MODS. There are some examples of
> what it's used for on the OCLC Bibliographic Formats and Standards pages:
>
> http://www.oclc.org/bibformats/en/5xx/590.shtm
>
> Frequently it's used as a note that is specific to a local copy of an item.
> If your institution uses it inconsistently, you might want to just map it to
> mods:note.
>
> Karen
>
> Karen D. Miller
> Monographic/Digital Projects Cataloger
> Bibliographic Services Dept.
> Northwestern University Library
> Evanston, IL
> k-mill...@northwestern.edu
> 847-467-3462
>
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jon
> Stroop
> Sent: Thursday, May 19, 2011 11:07 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] MARCXML to MODS: 590 Field
>
> I'm going to guess that it's because 59x fields are defined for local use:
>
> http://www.loc.gov/marc/bibliographic/bd59x.html
>
> ...but someone from LC should be able to confirm.
> -Jon > > -- > Jon Stroop > Metadata Analyst > Firestone Library > Princeton University > Princeton, NJ 08544 > > Email: jstr...@princeton.edu > Phone: (609)258-0059 > Fax: (609)258-0441 > > http://pudl.princeton.edu > http://diglib.princeton.edu > http://diglib.princeton.edu/ead > http://www.cpanda.org/cpanda > > > > On 05/19/2011 11:45 AM, Richard, Joel M wrote: >> Dear hive-mind, >> >> Does anyone know why the Library of Congress-supplied MARCXML to MODS XSLT > [1] does not handle the MARC 590 Local Notes field? It seems to handle > everything else, not that I've done an exhaustive search... :) >> >> Granted, I could copy/create my own XSLT and add this functionality in > myself, but I'm curious as to whether or not there's some logic behind this > decision to not include it. Logic that I would not naturally understand > since I'm not formally trained as a librarian. >> >> Thanks! >> --Joel >> >> [1] http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3-4.xsl >> >> >> Joel Richard >> IT Specialist, Web Services Department >> Smithsonian Institution Libraries | http://www.sil.si.edu/ >> (202) 633-1706 | richar...@si.edu
Re: [CODE4LIB] is this valid marc ?
I'm no MARC expert, but I've learned enough to say that yes, this is valid, in that what you're seeing is the $q (Electronic format type) and $u (Uniform Resource Identifier) subfields of the 856 field.

http://www.oclc.org/bibformats/en/8xx/856.shtm

You'll see other things when you get multiple authors (creators) on an item, or multiple anythings that can occur more than once.

--Joel

Joel Richard
IT Specialist, Web Services Department
Smithsonian Institution Libraries | http://www.sil.si.edu/
(202) 633-1706 | richar...@si.edu

On May 19, 2011, at 12:37 PM, James Lecard wrote:

> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
> from a partner.
>
> The 856 field is split across two lines, causing the ruby library to
> ignore it (I've patched it to overcome this issue), but I want to know
> if this kind of marc is valid?
>
> =LDR 00638nam 2200181uu 4500
> =001 cla-MldNA01
> =008 080101s2008\\\|fre||
> =040 \\$aMy Provider
> =041 0\$afre
> =245 10$aThis Subject
> =260 \\$aParis$bJ. Doe$c2008
> =490 \\$aSome topic
> =650 1\$aNarratif, Autre forme
> =655 \7$abook$2lcsh
> =752 \\$aA Place on earth
> =776 \\$dParis: John Doe and Cie, 1973
> =856 \2$qtext/html
> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
>
> Thanks,
>
> James L.
[CODE4LIB] is this valid marc ?
I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get from a partner.

The 856 field is split across two lines, causing the ruby library to ignore it (I've patched it to overcome this issue), but I want to know if this kind of marc is valid?

=LDR 00638nam 2200181uu 4500
=001 cla-MldNA01
=008 080101s2008\\\|fre||
=040 \\$aMy Provider
=041 0\$afre
=245 10$aThis Subject
=260 \\$aParis$bJ. Doe$c2008
=490 \\$aSome topic
=650 1\$aNarratif, Autre forme
=655 \7$abook$2lcsh
=752 \\$aA Place on earth
=776 \\$dParis: John Doe and Cie, 1973
=856 \2$qtext/html
=856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library

Thanks,

James L.
Re: [CODE4LIB] MARCXML to MODS: 590 Field
Joel, The 590 is indeed defined for local use, so whatever your local institution uses it for should guide your mapping to MODS. There are some examples of what it's used for on the OCLC Bibliographic Formats and Standards pages: http://www.oclc.org/bibformats/en/5xx/590.shtm Frequently it's used as a note that is specific to a local copy of an item. If your institution uses it inconsistently, you might want to just map it to mods:note. Karen Karen D. Miller Monographic/Digital Projects Cataloger Bibliographic Services Dept. Northwestern University Library Evanston, IL k-mill...@northwestern.edu 847-467-3462 -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jon Stroop Sent: Thursday, May 19, 2011 11:07 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARCXML to MODS: 590 Field I'm going to guess that it's because 59x fields are defined for local use: http://www.loc.gov/marc/bibliographic/bd59x.html ...but someone from LC should be able to confirm. -Jon -- Jon Stroop Metadata Analyst Firestone Library Princeton University Princeton, NJ 08544 Email: jstr...@princeton.edu Phone: (609)258-0059 Fax: (609)258-0441 http://pudl.princeton.edu http://diglib.princeton.edu http://diglib.princeton.edu/ead http://www.cpanda.org/cpanda On 05/19/2011 11:45 AM, Richard, Joel M wrote: > Dear hive-mind, > > Does anyone know why the Library of Congress-supplied MARCXML to MODS XSLT [1] does not handle the MARC 590 Local Notes field? It seems to handle everything else, not that I've done an exhaustive search... :) > > Granted, I could copy/create my own XSLT and add this functionality in myself, but I'm curious as to whether or not there's some logic behind this decision to not include it. Logic that I would not naturally understand since I'm not formally trained as a librarian. > > Thanks! 
> --Joel > > [1] http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3-4.xsl > > > Joel Richard > IT Specialist, Web Services Department > Smithsonian Institution Libraries | http://www.sil.si.edu/ > (202) 633-1706 | richar...@si.edu
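Karen's suggestion to map the 590 to mods:note in a local customization would normally live in the XSLT itself, but the shape of the output can be sketched in Ruby with the stdlib REXML. This is only an illustration: the helper name, the type="local" attribute, and the sample note text are assumptions, not anything prescribed by MODS or by the LC stylesheet:

```ruby
require 'rexml/document'

# Build a mods:note element from the text of a MARC 590 local note.
# (Hypothetical helper; tagging the note with type="local" is our own
# labeling choice, not an LC mapping.)
def local_note_to_mods(note_text)
  note = REXML::Element.new('mods:note')
  note.add_attribute('type', 'local')
  note.text = note_text
  note
end

# Hypothetical 590 $a value:
el = local_note_to_mods('Library copy signed by the author.')
puts el
```

The same element-per-590 approach is what a local copy of the LC stylesheet would produce if you added a template matching datafield[@tag='590'].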
Re: [CODE4LIB] MARCXML to MODS: 590 Field
I'm going to guess that it's because 59x fields are defined for local use: http://www.loc.gov/marc/bibliographic/bd59x.html ...but someone from LC should be able to confirm. -Jon -- Jon Stroop Metadata Analyst Firestone Library Princeton University Princeton, NJ 08544 Email: jstr...@princeton.edu Phone: (609)258-0059 Fax: (609)258-0441 http://pudl.princeton.edu http://diglib.princeton.edu http://diglib.princeton.edu/ead http://www.cpanda.org/cpanda On 05/19/2011 11:45 AM, Richard, Joel M wrote: Dear hive-mind, Does anyone know why the Library of Congress-supplied MARCXML to MODS XSLT [1] does not handle the MARC 590 Local Notes field? It seems to handle everything else, not that I've done an exhaustive search... :) Granted, I could copy/create my own XSLT and add this functionality in myself, but I'm curious as to whether or not there's some logic behind this decision to not include it. Logic that I would not naturally understand since I'm not formally trained as a librarian. Thanks! --Joel [1] http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3-4.xsl Joel Richard IT Specialist, Web Services Department Smithsonian Institution Libraries | http://www.sil.si.edu/ (202) 633-1706 | richar...@si.edu
Re: [CODE4LIB] Seth Godin on The future of the library
On Thu, May 19, 2011 at 8:31 AM, Andreas Orphanides wrote:

> - As Graham says, there's a sunk-cost issue: you're going to prioritize the
> stuff you paid for over free stuff, since you've already invested resources
> in it.

Everybody who believes in sunk costs should learn to play Go, the ancient Japanese game. One of the things that you learn playing Go is to let go (pun intended) of resources already spent unwisely when there are better courses of action.

Wikipedia has a good introductory article on the subject, "Escalation of commitment": http://en.wikipedia.org/wiki/Escalation_of_commitment

--
Luciano Ramalho
programador repentista || stand-up programmer
Twitter: @luciano
Re: [CODE4LIB] Seth Godin on The future of the library
On Thu, May 19, 2011 at 6:24 AM, graham wrote: > 2. It is hard to justify spending time on improving access to free stuff > when the end result would be good for everyone, not just the institution > doing the work (unless it can be kept in a consortium and outside-world > access limited) Why is it hard to justify anything that would be good for everyone? -- Luciano Ramalho programador repentista || stand-up programmer Twitter: @luciano
[CODE4LIB] MARCXML to MODS: 590 Field
Dear hive-mind, Does anyone know why the Library of Congress-supplied MARCXML to MODS XSLT [1] does not handle the MARC 590 Local Notes field? It seems to handle everything else, not that I've done an exhaustive search... :) Granted, I could copy/create my own XSLT and add this functionality in myself, but I'm curious as to whether or not there's some logic behind this decision to not include it. Logic that I would not naturally understand since I'm not formally trained as a librarian. Thanks! --Joel [1] http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3-4.xsl Joel Richard IT Specialist, Web Services Department Smithsonian Institution Libraries | http://www.sil.si.edu/ (202) 633-1706 | richar...@si.edu
Re: [CODE4LIB] Seth Godin on The future of the library
On 5/19/2011 11:01 AM, graham wrote:

Replying to Jonathan's mail rather at random, since several people are saying similar things.

1. 'Free resources can vanish any time.' But so can commercial ones, which is why LOCKSS was created. This isn't an insoluble issue or one unique to free resources.

You missed my point. The difficulty we have of dealing with the "breaking resources" problem is proportional to the number of vendors/sources we are dealing with. Dealing with 10 or 100 vendors is hard; dealing with 1000s of sources is harder. Ignoring free stuff is one easy way of not having to deal with this. (Not necessarily an optimal one!)

I do not disagree that there are huge advantages to free resources, of course! Just trying to analyze some of the practical difficulties, which are not simply irrational prejudices or what have you. Also, I didn't mean to say that any of the challenges are insoluble or unique to free resources.
Re: [CODE4LIB] Seth Godin on The future of the library
There is no such thing as a zero-cost lunch; but there is such a thing as a freedom lunch. I concur with Karen that (once again) much confusion is being generated here by the English language's lamentable use of the same word "free" to mean two such different things. -- Mike. On 19 May 2011 16:01, graham wrote: > Replying to Jonathan's mail rather at random, since several people are > saying similar things. > > 1. 'Free resources can vanish any time.' But so can commercial ones, > which is why LOCKSS was created. This isn't an insoluble issue or one > unique to free resources. > > 2. 'Managing 100s of paid resources is difficult, managing 1000s of free > ones would be impossible'. But why on earth would you try? There are > many specialized free resources, only a few of which are likely to > provide material your particular library wants in its collection. Surely > you would select the ones you want, not least on grounds of reliability. > And on those grounds (longevity and reliability) you would end up using > Gutenberg in preference to any commercial supplier (not that I'm > suggesting you should). Selection of commercial resources is done at > least in part by cost; selection of free ones can be done on more > appropriate grounds. > > 3. 'There is no such thing as a free lunch'. Who said there was? But > resources which can be used freely have advantages over ones that can't. > > Graham
Re: [CODE4LIB] Seth Godin on The future of the library
I wonder if we aren't conflating a diverse set of issues here. - free (no cost) - free and online - free = not peer reviewed - online As Jonathan notes, we already face problems with online materials, even those we subscribe to. And libraries do take in free hard-copy books in the form of donations (although weeding through those is almost not worth the trouble). In addition, there are free materials like government documents (at least in the US) that are considered quite valuable. So it seems like "free" isn't the big issue here, it's management, selection, etc. kc Quoting Jonathan Rochkind : Another problem with free online resources is not just 'collection selection' but maintenance/support once selected. A resource hosted elsewhere can stop working at any time, which is a management challenge. The present environment is ALREADY a management challenge, of course. But consider the present environment: You subscribe to anywhere from a handful to around 100 separate vendor 'platforms'. Each one can change its interface at any time, or go down at any time, breaking your integration or access to it. When it does, you've got to notice (a hard problem in itself), and then file a support incident with the vendor. This is already a mess we have trouble keeping straight. But. Compare to the idea of hundreds or thousands or more different suppliers hosting free content, each one of which can change its interface or go down at any time, and when you notice (still a hard problem, now even harder because you have more content from more hosts)... what do you do? One solution to this would be free content aggregators which hosted LOTS of free content on one platform (cutting down your number of sources to keep track of and make sure they're working), and additionally, presumably for a fee, offered support services. Another direction would be not relying on remote platforms to host content, but hosting it internally. Which may be more 'business case' feasible with free content than with pay content -- the owners/providers don't want to let us host the pay content locally. But hosting content locally comes with its own expenses; the library needs to invest resources in developing/maintaining or purchasing the software (and hardware) to do that, as well as respond to maintenance issues with the local hosting. In the end, there's no such thing as a free lunch, as usual. "Free" content still isn't free for libraries to integrate with local interfaces and support well, whether that cost comes from internal staffing and other budgeting, or from paying a third party to help. Of course, some solutions are more cost efficient than others; not all are equal. Jonathan
Re: [CODE4LIB] Seth Godin on The future of the library
Replying to Jonathan's mail rather at random, since several people are saying similar things. 1. 'Free resources can vanish any time.' But so can commercial ones, which is why LOCKSS was created. This isn't an insoluble issue or one unique to free resources. 2. 'Managing 100s of paid resources is difficult, managing 1000s of free ones would be impossible'. But why on earth would you try? There are many specialized free resources, only a few of which are likely to provide material your particular library wants in its collection. Surely you would select the ones you want, not least on grounds of reliability. And on those grounds (longevity and reliability) you would end up using Gutenberg in preference to any commercial supplier (not that I'm suggesting you should). Selection of commercial resources is done at least in part by cost; selection of free ones can be done on more appropriate grounds. Graham On 05/19/11 15:44, Jonathan Rochkind wrote: > Another problem with free online resources is not just 'collection > selection' but maintenance/support once selected. A resource hosted > elsewhere can stop working at any time, which is a management challenge. > > The present environment is ALREADY a management challenge, of course. > But consider the present environment: You subscribe to anywhere from a > handful to around 100 separate vendor 'platforms'. Each one can change > its interface at any time, or go down at any time, breaking your > integration or access to it. When it does, you've got to notice (a hard > problem in itself), and then file a support incident with the vendor. > This is already a mess we have trouble keeping straight. But. > > Compare to the idea of hundreds or thousands or more different suppliers > hosting free content, each one of which can change its interface or go > down at any time, and when you notice (still a hard problem, now even > harder because you have more content from more hosts)... what do you do? > > One solution to this would be free content aggregators which hosted LOTS > of free content on one platform (cutting down your number of sources to > keep track of and make sure they're working), and additionally, presumably > for a fee, offered support services. > > Another direction would be not relying on remote platforms to host > content, but hosting it internally. Which may be more 'business case' > feasible with free content than with pay content -- the owners/providers > don't want to let us host the pay content locally. But hosting content > locally comes with its own expenses; the library needs to invest > resources in developing/maintaining or purchasing the software (and > hardware) to do that, as well as respond to maintenance issues with the > local hosting. > > In the end, there's no such thing as a free lunch, as usual. "Free" > content still isn't free for libraries to integrate with local > interfaces and support well, whether that cost comes from internal > staffing and other budgeting, or from paying a third party to help. Of > course, some solutions are more cost efficient than others; not all are > equal. > > Jonathan
[CODE4LIB] Job Posting: Web Developer, Smithsonian Institution Libraries
The Smithsonian Institution Libraries is recruiting for a web developer position. We are in the midst of many interesting projects right now, including working with linked open data, building a new digital library, moving to Drupal, mass-digitization, and other projects. The Libraries serves a broad audience including researchers throughout the Institution – from Art to Zoology – as well as affiliated scientists and curators, students, and the general public. We are a small and friendly department that has a lot of support from management. More information can be found here http://www.sil.si.edu/link/?webdev or on http://www.USAjobs.gov by searching for Job Announcement Number: 11R-LG-296860-MPA-SIL The Smithsonian Institution is an EEO employer. Joel Richard IT Specialist, Web Services Department Smithsonian Institution Libraries | http://www.sil.si.edu/ (202) 633-1706 | richar...@si.edu
Re: [CODE4LIB] wikipedia/author disambiguation
Curious what script you've used that isn't production ready -- I don't think you meant to post in the URL for the JQuery library? On 5/19/2011 10:39 AM, Karen Coyle wrote: This sounds like a great way to "translate" from library forms to wikipedia name forms. But for on-the-fly use I wonder if it wouldn't be more efficient to eliminate the "middle man." Karen, can you say a little about what it took to link library names to WP? Was it a one-step, two-step, etc.? There is a script that I've seen used, although it doesn't seem to be production ready: https://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js One interesting note from the OL experience of linking to WP: generally you need to "re-reverse" the names to get a match: from Twain, Mark to Mark Twain. But for some names that isn't the case: Mao, Tse-Tung. Edward Betts used Wikipedia to determine which names do not get "re-reversed". The OL code for its wikipedia lookup is at: https://github.com/openlibrary/openlibrary/tree/master/openlibrary/catalog/wikipedia It, however, runs against dumps rather than an API. kc
Re: [CODE4LIB] Seth Godin on The future of the library
Another problem with free online resources is not just 'collection selection' but maintenance/support once selected. A resource hosted elsewhere can stop working at any time, which is a management challenge. The present environment is ALREADY a management challenge, of course. But consider the present environment: You subscribe to anywhere from a handful to around 100 separate vendor 'platforms'. Each one can change its interface at any time, or go down at any time, breaking your integration or access to it. When it does, you've got to notice (a hard problem in itself), and then file a support incident with the vendor. This is already a mess we have trouble keeping straight. But. Compare to the idea of hundreds or thousands or more different suppliers hosting free content, each one of which can change its interface or go down at any time, and when you notice (still a hard problem, now even harder because you have more content from more hosts)... what do you do? One solution to this would be free content aggregators which hosted LOTS of free content on one platform (cutting down your number of sources to keep track of and make sure they're working), and additionally, presumably for a fee, offered support services. Another direction would be not relying on remote platforms to host content, but hosting it internally. Which may be more 'business case' feasible with free content than with pay content -- the owners/providers don't want to let us host the pay content locally. But hosting content locally comes with its own expenses; the library needs to invest resources in developing/maintaining or purchasing the software (and hardware) to do that, as well as respond to maintenance issues with the local hosting. In the end, there's no such thing as a free lunch, as usual. "Free" content still isn't free for libraries to integrate with local interfaces and support well, whether that cost comes from internal staffing and other budgeting, or from paying a third party to help. Of course, some solutions are more cost efficient than others; not all are equal. Jonathan On 5/19/2011 9:31 AM, Bill Dueber wrote: My short answer: It's too damn expensive to check out everything that's available for free to see if it's worth selecting for inclusion, and libraries (at least as I see them) are supposed to be curated, not comprehensive.
Re: [CODE4LIB] wikipedia/author disambiguation
This sounds like a great way to "translate" from library forms to wikipedia name forms. But for on-the-fly use I wonder if it wouldn't be more efficient to eliminate the "middle man." Karen, can you say a little about what it took to link library names to WP? Was it a one-step, two-step, etc.? There is a script that I've seen used, although it doesn't seem to be production ready: https://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js One interesting note from the OL experience of linking to WP: generally you need to "re-reverse" the names to get a match: from Twain, Mark to Mark Twain. But for some names that isn't the case: Mao, Tse-Tung. Edward Betts used Wikipedia to determine which names do not get "re-reversed". The OL code for its wikipedia lookup is at: https://github.com/openlibrary/openlibrary/tree/master/openlibrary/catalog/wikipedia It, however, runs against dumps rather than an API. kc Quoting Karen Coombs : Graham, I'd advocate using WorldCat Identities to get to the appropriate url for dbpedia. -- Karen Coyle kco...@kcoyle.net http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet
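The "re-reverse" step Karen describes can be sketched in a few lines. Note the exception set below is illustrative only; as she says, OL derived its real exception list from Wikipedia itself:

```python
# Sketch: turn a library-style inverted heading ("Twain, Mark") into the
# direct-order form Wikipedia uses ("Mark Twain"). Some headings, such as
# "Mao, Tse-tung", must NOT be re-reversed. The exception set here is a
# hypothetical stand-in for the list OL mined from Wikipedia.

KEEP_INVERTED = {"Mao, Tse-tung"}  # illustrative, not the real list

def wikipedia_form(heading: str) -> str:
    heading = heading.strip().rstrip(".")
    if heading in KEEP_INVERTED or "," not in heading:
        return heading  # single-word names and known exceptions pass through
    surname, _, forename = heading.partition(",")
    return f"{forename.strip()} {surname.strip()}"

print(wikipedia_form("Twain, Mark"))    # -> Mark Twain
print(wikipedia_form("Mao, Tse-tung"))  # -> Mao, Tse-tung (stays inverted)
```

A real implementation would also need to strip dates and relator terms from the heading before matching, which this sketch ignores.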
Re: [CODE4LIB] wikipedia/author disambiguation
In addition to the approaches you note, it might be worth investigating this tool that came up in a thread just a few days ago on this list: http://wikipedia-miner.sourceforge.net/ I don't think anybody's done enough with this yet to be sure what will work best; you're going to have to experiment and let us know. VIAF/OCLC services are presumably using some sort of statistical analysis/text mining approaches under the hood; wikipedia-miner is using such approaches but giving you the code in open source too, if you're curious exactly what they're doing. I suspect statistical approaches like wikipedia-miner uses are likely to be more productive than pure "parsing" approaches considering only one record at a time in isolation. But writing your own statistical-analysis algorithms is probably more work than you want, especially when wikipedia-miner and/or VIAF/OCLC services already exist. If you don't do statistical analysis of the corpus, and do end up actually trying to search wikipedia directly, then I suspect dbpedia is a much more convenient endpoint than trying to screen-scrape HTML wikipedia. That's pretty much what dbpedia is for. But these are all just my guesses, not informed by any work I've done. Jonathan On 5/19/2011 5:40 AM, graham wrote: I need to be able to take author data from a catalogue record and use it to look up the author on Wikipedia on the fly. So I may have birth date and possibly year of death in addition to (one spelling of) the name, the title of one book the author wrote etc. I know there are various efforts in progress that will improve the current situation, but as things stand at the moment what is the best* way to do this? 1. query wikipedia for as much as possible, parse and select the best fitting result 2. go via dbpedia/freebase and work back from there 3. use VIAF and/or OCLC services 4. Other?
(I have no experience of 2-4 yet :-( Thanks Graham * 'best' being constrained by: - need to do this in real-time - need to avoid dependence on services which may be taken away or charged for - being able to justify to librarians as reasonably accurate :-)
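For option 2, a starting point could look something like the sketch below: build a SPARQL query against DBpedia's public endpoint that matches an author by name plus birth year. The property names (dbo:birthDate, foaf:isPrimaryTopicOf) are the usual DBpedia ontology terms, but treat the exact modelling as an assumption to verify, and note graham's caveat that any free endpoint may be taken away or changed:

```python
# Sketch: disambiguate an author against DBpedia by combining the name
# with a birth year, then recover the Wikipedia page URL. Builds the
# request URL only; actually issuing it depends on the endpoint being up.
from urllib.parse import urlencode

ENDPOINT = "https://dbpedia.org/sparql"

def author_query(name: str, birth_year: int) -> str:
    # Name match here is exact-label; a production version would want
    # fuzzier matching (bif:contains, alternate labels, etc.).
    return f"""
    SELECT ?person ?page WHERE {{
      ?person rdfs:label "{name}"@en ;
              dbo:birthDate ?born ;
              foaf:isPrimaryTopicOf ?page .
      FILTER (year(?born) = {birth_year})
    }} LIMIT 5
    """

def request_url(name: str, birth_year: int) -> str:
    params = {
        "query": author_query(name, birth_year),
        "format": "application/sparql-results+json",
    }
    return ENDPOINT + "?" + urlencode(params)

url = request_url("Mark Twain", 1835)
print(url[:80])
```

Filtering on a known book title (via dbo:author on the work) would be a natural second pass when birth dates are missing from the catalogue record.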
Re: [CODE4LIB] Seth Godin on The future of the library
On 2011-05-18 20:30, Eric Hellman wrote: Exactly. I apologize if my comment was perceived as coy, but I've chosen to invest in the possibility that Creative Commons licensing is a viable way forward for libraries, authors, readers, etc. Here's a link to the last of a five-part series on open-access ebooks. I hope it inspires work in the code4lib community to make libraries more friendly to free stuff. http://go-to-hellman.blogspot.com/2011/05/open-access-ebooks-part-5-changing.html Here's a post from a Jewish Studies scholar about his own decision to self-publish under a CC license http://www.rationalistjudaism.com/2011/05/changing-world-of-jewish-scholarship.html -- Yitzchak Schaffer
Re: [CODE4LIB] wikipedia/author disambiguation
Graham, I'd advocate using WorldCat Identities to get to the appropriate url for dbpedia. Each Identity record has a wikipedia element in it that you could use to link to either Wikipedia or dbpedia. If you want to see an example of this in action you can check out the Author Info demo I did for code4lib 2010 here - http://www.librarywebchic.net/mashups/author_info/info_about_this_author.php?OCLCNum=32939031 The code for this demo is available for download at - http://www.worldcat.org/devnet/code/devnetDemos/trunk/ You'll want the author_info folder and identity_info.php Karen Karen A. Coombs Product Manager OCLC Developer Network coom...@oclc.org On Thu, May 19, 2011 at 4:40 AM, graham wrote: > I need to be able to take author data from a catalogue record and use it > to look up the author on Wikipedia on the fly. So I may have birth date > and possibly year of death in addition to (one spelling of) the name, > the title of one book the author wrote etc. > > I know there are various efforts in progress that will improve the > current situation, but as things stand at the moment what is the best* > way to do this? > > 1. query wikipedia for as much as possible, parse and select the best > fitting result > > 2. go via dbpedia/freebase and work back from there > > 3. use VIAF and/or OCLC services > > 4. Other? > > (I have no experience of 2-4 yet :-( > > > Thanks > Graham > * 'best' being constrained by: > - need to do this in real-time > - need to avoid dependence on services which may be taken away > or charged for > - being able to justify to librarians as reasonably accurate :-) >
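Extracting the wikipedia element Karen mentions could be sketched like this. She confirms the element exists in each Identity record, but the exact XML shape below (a flat `<wikipedia>` child) is my assumption, so the traversal would need adjusting to whatever the real record looks like:

```python
# Sketch: pull wikipedia element(s) out of a WorldCat Identities record.
# The record structure in `sample` is hypothetical; only the existence of
# a wikipedia element is taken from Karen's description.
from xml.etree import ElementTree as ET

def wikipedia_links(identity_xml: str) -> list[str]:
    root = ET.fromstring(identity_xml)
    # iter() walks the whole tree, so nesting depth doesn't matter.
    return [el.text for el in root.iter("wikipedia") if el.text]

sample = """
<Identity>
  <nameInfo type="personal"><rawName><suba>Twain, Mark</suba></rawName></nameInfo>
  <wikipedia>Mark_Twain</wikipedia>
</Identity>
"""
print(wikipedia_links(sample))
```

From there the article title maps straightforwardly onto either a Wikipedia URL or a dbpedia.org/resource/ URI.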
Re: [CODE4LIB] Seth Godin on The future of the library
My short answer: It's too damn expensive to check out everything that's available for free to see if it's worth selecting for inclusion, and libraries (at least as I see them) are supposed to be curated, not comprehensive. My long answer: The most obvious issue is that the OPAC is traditionally a listing of "holdings," and free ebooks aren't "held" in any sense that helps disambiguate them from any other random text on the Internet. Certainly the fact that someone bothered to transform it into ebook form isn't indicative of anything. Not everything that's available can be cataloged. I see "stuff we paid for" not as an arbitrary bias, but simply as a very, very useful way to define the borders of the library. "Free" is a very recent phenomenon, but it just adds more complexity to the existing problem of deciding what publications are within the library's scope. Library collections are curated, and that curation mission is not simply a side effect of limited funds. The filtering process that goes into deciding what a library will hold is itself an incredibly valuable aspect of the collection. Up until very recently, the most important pre-purchase filter was the fact that some publisher thought she could make some money by printing text on paper, and by doing so also allocated resources to edit/typeset/etc. For a traditionally-published work, we know that real person(s), with relatively transparent goals, has already read it and decided it was worth the gamble to sink some fixed costs into the project. It certainly wasn't a perfect filter, but anyone who claims it didn't add enormous information to the system is being disingenuous. Now that (e)publishing and (e)printing costs have nosedived toward $0.00, that filter is breaking. Even print-on-paper costs have been reduced enormously.
But going through the slush pile, doing market research, filtering, editing, marketing -- these things all cost money, and for the moment the traditional publishing houses still do them better and more efficiently than anyone else. And they expect to be paid for their work, and they should.

There's a tendency in the library world, I think, to dismiss the value of non-academic professionals and assume random people or librarians can just do the work (see also: web-site development, usability studies, graphic design, instructional design and development), but successful publishers are incredibly good at what they do, and the value they add shouldn't be dismissed (although their business practices should certainly be under scrutiny).

Of course, I'm not differentiating free (no money) and free (CC0). One can imagine models where the functions of the publishing house move to a work-for-hire model and the final content is released CC0, but it's not clear who's going to pay them for their time.

-Bill-

On Thu, May 19, 2011 at 8:04 AM, Andreas Orphanides <andreas_orphani...@ncsu.edu> wrote:
> On 5/19/2011 7:36 AM, Mike Taylor wrote:
>> I dunno. How do you assess the whole realm of proprietary stuff?
>> Wouldn't the same approach work for free stuff?
>>
>> -- Mike.
>
> A fair question. I think there's maybe at least two parts: marketing and
> bundling.
>
> Marketing is of course not ideal, and likely counterproductive on a number
> of measures, but at least when a product is marketed you get sales demos.
> Even if they are designed to make a product or collection look as good as
> possible, it still gives you some sense of scale, quality, content, etc.
>
> I think bundling is probably more important. It's a challenge in the
> free-stuff realm, but for open access products where there is bundling (for
> instance, Directory of Open Access Journals) I think you are likely to see
> wider adoption.
>
> Bundling can of course be both good (lower management cost) and bad
> (potentially diluting collection quality for your target audience). But when
> there isn't any bundling, which is true for a whole lot of free stuff,
> you've got to locally gather a million little bits into a collection.
>
> I guess what's really happening in the bundling case, at least for free
> content, is that collection and quality management activities are being
> "outsourced" to a third party. This is probably why DOAJ gets decent
> adoption. But of course, this still requires SOME group to be willing to
> perform these activities, and for the content/package to remain free, they
> either have to get some kind of outside funding (e.g., donations) or be
> willing to volunteer their services.

--
Bill Dueber
Library Systems Programmer
University of Michigan Library
Re: [CODE4LIB] Seth Godin on The future of the library
On 5/19/2011 7:36 AM, Mike Taylor wrote:
> I dunno. How do you assess the whole realm of proprietary stuff?
> Wouldn't the same approach work for free stuff?
>
> -- Mike.

A fair question. I think there's maybe at least two parts: marketing and bundling.

Marketing is of course not ideal, and likely counterproductive on a number of measures, but at least when a product is marketed you get sales demos. Even if they are designed to make a product or collection look as good as possible, it still gives you some sense of scale, quality, content, etc.

I think bundling is probably more important. It's a challenge in the free-stuff realm, but for open access products where there is bundling (for instance, Directory of Open Access Journals) I think you are likely to see wider adoption.

Bundling can of course be both good (lower management cost) and bad (potentially diluting collection quality for your target audience). But when there isn't any bundling, which is true for a whole lot of free stuff, you've got to locally gather a million little bits into a collection.

I guess what's really happening in the bundling case, at least for free content, is that collection and quality management activities are being "outsourced" to a third party. This is probably why DOAJ gets decent adoption. But of course, this still requires SOME group to be willing to perform these activities, and for the content/package to remain free, they either have to get some kind of outside funding (e.g., donations) or be willing to volunteer their services.
Re: [CODE4LIB] Seth Godin on The future of the library
On 19 May 2011 12:31, Andreas Orphanides wrote:
> - I think there's a fear of a slippery slope and/or information overload:
>   How do you assess the whole realm of freely-available stuff?

I dunno. How do you assess the whole realm of proprietary stuff? Wouldn't the same approach work for free stuff?

-- Mike.
Re: [CODE4LIB] Seth Godin on The future of the library
Quoting Karen Coyle, 05/19/11 1:32 AM:
> Eric,
>
> In what ways do you think that libraries today are not friendly to free stuff?
>
> kc

From my own (rather limited) experience, I think collection developers see free/open source/open access stuff as a bit of a management challenge:

- As Graham says, there's a sunk-cost issue: you're going to prioritize the stuff you paid for over free stuff since you've already invested resources in it.

- I think there's a fear of a slippery slope and/or information overload: How do you assess the whole realm of freely-available stuff? How do you prioritize it? How do you ingest it? How do you find the staff energy to maintain all the records? How do you know when to stop? There's also the possibility of drowning out your core collection strengths with material that's irrelevant to your main users, unless you spend a lot of time and energy selecting carefully.

- I imagine there's also the lingering perception of getting what you pay for in many minds: it may be perceived that free stuff simply isn't of sufficient quality to include in a high-profile collection. If you do want to vet the free stuff you add to the collection, there's more staff cost.

I am sure there are other perceived challenges. I'm curious to see what Eric has to say; he's way more savvy on this kind of thing than I am, that's for sure.

-Dre.
[CODE4LIB] Materio and modules
Hi,

After about a year of development, we (a hospital library in Sweden) have published some programs that might be of interest for other libraries. They include:

Materio - a publication platform which gives a common login system, where one can install modules (programs) which do stuff. Modules can be installed and upgraded on the fly for a (hopefully) zero-downtime environment. Modules can have separate data layers so that multiple libraries can use one and the same module.

Modules we have created so far:

Article harvester - aggregates published articles and presents users with new articles each week. We use it for academic coverage for doctors. It's easy for subscribers and gives them just the new stuff published.

Little Boxes CMS - a CMS which can publish just about anything, but specialises in resources with dedicated link boxes, with file upload capability and a WYSIWYG interface. Aimed to be quick and easy for administrators to work with. Functions approximately like iGoogle or Netvibes. (You can try it out at http://demo.fabicutv.com).

Boing - IP-sensitive links. Can create permanent links which can point to different places according to caller IP. You can, for example, create a link called "Your library catalogue" that goes to the regional library catalogue (depending on caller IP).

Materio and modules are translatable, and are currently translated to English and Swedish. In the works is an OpenURL resolver with integrated A-Z (journal) list.

Everything is licensed under AGPL 3 and created using PHP, MySQL and jQuery. If you wanna help out with development, please do.

Materio and modules: http://materio.fabicutv.com/wiki/doku.php

Happy trails,

Tony Mattsson
IT-Librarian
Landstinget Dalarna Bibliotek och informationscentral
http://www.ldbib.se
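[Ed.: the idea behind Boing -- one stable link resolved to different targets by caller IP -- can be sketched in a few lines. Boing itself is PHP; this Python sketch is only an illustration of the technique, and the network ranges, link name, and target URLs below are invented examples.]

```python
# Minimal sketch of an IP-sensitive permanent link, in the spirit of Boing:
# one stable link name, resolved to different target URLs by caller IP.
# All ranges and URLs here are made-up examples.
import ipaddress

ROUTES = {
    'catalogue': [
        (ipaddress.ip_network('10.1.0.0/16'), 'http://catalogue.region-a.example/'),
        (ipaddress.ip_network('10.2.0.0/16'), 'http://catalogue.region-b.example/'),
    ],
}
DEFAULT_TARGET = 'http://catalogue.example/'

def resolve(link_name, caller_ip, routes=ROUTES, default=DEFAULT_TARGET):
    """Pick the redirect target for this link based on the caller's IP."""
    addr = ipaddress.ip_address(caller_ip)
    for network, target in routes.get(link_name, []):
        if addr in network:
            return target
    return default
```

In a web app this would sit behind the permanent URL and issue an HTTP redirect to whatever `resolve()` returns.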
[CODE4LIB] wikipedia/author disambiguation
I need to be able to take author data from a catalogue record and use it to look up the author on Wikipedia on the fly. So I may have birth date and possibly year of death in addition to (one spelling of) the name, the title of one book the author wrote, etc.

I know there are various efforts in progress that will improve the current situation, but as things stand at the moment what is the best* way to do this?

1. query wikipedia for as much as possible, parse and select the best fitting result
2. go via dbpedia/freebase and work back from there
3. use VIAF and/or OCLC services
4. Other?

(I have no experience of 2-4 yet :-( )

Thanks
Graham

* 'best' being constrained by:
- need to do this in real-time
- need to avoid dependence on services which may be taken away or charged for
- being able to justify to librarians as reasonably accurate :-)
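[Ed.: option 1 above -- query Wikipedia, then pick the best-fitting result -- can be sketched against the MediaWiki search API. The scoring heuristic below (count how many known life dates appear in the result snippet) is invented for illustration and would need tuning against real catalogue data.]

```python
# Sketch of option 1: search the MediaWiki API with the author's name,
# then score each hit by how many known life dates appear in its snippet.
# The scoring heuristic is a made-up example, not a tested algorithm.
import json
import urllib.parse
import urllib.request

def score_candidate(snippet, birth=None, death=None):
    """Crude relevance score: +1 for each known life date found in the snippet."""
    score = 0
    for year in (birth, death):
        if year is not None and str(year) in snippet:
            score += 1
    return score

def best_wikipedia_match(name, birth=None, death=None):
    """Return the title of the search hit that best matches the life dates."""
    params = urllib.parse.urlencode({
        'action': 'query', 'list': 'search', 'format': 'json',
        'srsearch': name, 'srlimit': 10,
    })
    url = 'https://en.wikipedia.org/w/api.php?' + params
    with urllib.request.urlopen(url) as resp:
        hits = json.load(resp)['query']['search']
    if not hits:
        return None
    best = max(hits, key=lambda h: score_candidate(h.get('snippet', ''), birth, death))
    return best['title']
```

This keeps everything real-time and dependent only on Wikipedia itself, which speaks to the "services which may be taken away" constraint, though disambiguating common names on snippet text alone will need more signals (book title, VIAF cross-checks) to satisfy the accuracy constraint.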
Re: [CODE4LIB] Seth Godin on The future of the library
Not replying for Eric, but I hope he doesn't mind me butting in too...

As a newcomer to (academic) libraries from a software background, some of the things that first struck me were:

1. The amount of money spent on non-free stuff means it has to be emphasized over free stuff in publicity to try to get the usage to justify the spend.

2. It is hard to justify spending time on improving access to free stuff when the end result would be good for everyone, not just the institution doing the work (unless it can be kept in a consortium and outside-world access limited).

3. Bizarre (to me) academic attitudes to free stuff feed through to libraries: many academics seem to feel that wikipedia should be blocked rather than improved, for example.

Graham

On 05/19/11 06:30, Karen Coyle wrote:
> Quoting Eric Hellman:
>
>> Exactly. I apologize if my comment was perceived as coy, but I've
>> chosen to invest in the possibility that Creative Commons licensing is
>> a viable way forward for libraries, authors, readers, etc. Here's a
>> link to the last of a 5-part series on open-access ebooks. I hope it
>> inspires work in the code4lib community to make libraries more
>> friendly to free stuff.
>
> Eric,
>
> In what ways do you think that libraries today are not friendly to free
> stuff?
>
> kc
>
>> http://go-to-hellman.blogspot.com/2011/05/open-access-ebooks-part-5-changing.html
>>
>> On May 18, 2011, at 7:20 PM, David Friggens wrote:
>>>> Some ebooks, in fact some of the greatest ever written, already cost less
>>>> than razor blades.
>>> Do you mean ones not under copyright?
>>
>> Those, plus Creative Commons etc.