Re: [CODE4LIB] is this valid marc ?
> Is it really true that newline characters are not allowed in a marc
> value?

Yes.

CONTROL FUNCTION CODES [1]

Eight characters are specifically designated as control characters for MARC 21 use:

- escape character, 1B(hex) in MARC-8 and Unicode encoding
- subfield delimiter, 1F(hex) in MARC-8 and Unicode encoding
- field terminator, 1E(hex) in MARC-8 and Unicode encoding
- record terminator, 1D(hex) in MARC-8 and Unicode encoding
- non-sorting character(s) begin, 88(hex) in MARC-8 and 98(hex) in Unicode encoding
- non-sorting character(s) end, 89(hex) in MARC-8 and 9C(hex) in Unicode encoding
- joiner, 8D(hex) in MARC-8 and 200D(hex) in Unicode encoding
- nonjoiner, 8E(hex) in MARC-8 and 200C(hex) in Unicode encoding

[1] http://www.loc.gov/marc/specifications/specchargeneral.html#controlfunction

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/

> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Jonathan Rochkind
> Sent: Thursday, May 19, 2011 1:27 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] is this valid marc ?
>
> Is it really true that newline characters are not allowed in a marc
> value? I thought they were, not with any special meaning, just as
> ordinary data. If they're not, that's useful to know, so I don't put
> any there!
>
> I'd ask for a reference to the standard that says this, but I suspect
> it's going to be some impenetrable implication of a side effect of a
> subtle adjective either way.
>
> On 5/19/2011 2:19 PM, Karen Coyle wrote:
> > Quoting Andreas Orphanides:
> >
> >> Anyway, I think having these two parts of the same URL data on
> >> separate lines is definitely Not Right, but I am not sure if it adds
> >> up to invalid MARC.
> >
> > Exactly. The CR and LF characters are NOT defined as valid in the MARC
> > character set and should not be used. In fact, in MARC there is no
> > concept of "lines", only variable length strings (usually up to
> > char).
> >
> > kc
> >
> >> -dre.
> >>
> >> [1] http://www.loc.gov/marc/bibliographic/bd856.html
> >> [2] I am not a cataloger. Don't hurt me.
> >> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
> >>
> >> On 5/19/2011 12:37 PM, James Lecard wrote:
> >>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
> >>> from a partner.
> >>>
> >>> The 856 field is split across 2 lines, causing the ruby library to
> >>> ignore it (I've patched it to overcome this issue), but I want to know
> >>> if this kind of marc is valid ?
> >>>
> >>> =LDR 00638nam 2200181uu 4500
> >>> =001 cla-MldNA01
> >>> =008 080101s2008\\\|fre||
> >>> =040 \\$aMy Provider
> >>> =041 0\$afre
> >>> =245 10$aThis Subject
> >>> =260 \\$aParis$bJ. Doe$c2008
> >>> =490 \\$aSome topic
> >>> =650 1\$aNarratif, Autre forme
> >>> =655 \7$abook$2lcsh
> >>> =752 \\$aA Place on earth
> >>> =776 \\$dParis: John Doe and Cie, 1973
> >>> =856 \2$qtext/html
> >>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
> >>>
> >>> Thanks,
> >>>
> >>> James L.
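Since the LC list above enumerates the only control functions MARC 21 allows, a record sanity check falls out of it directly. Here is a minimal Ruby sketch (plain string scanning, no ruby-marc dependency; the method name `disallowed_controls` is mine, not part of any library):

```ruby
# MARC 21 permits only these control characters inside record data
# (per the LC control function list quoted above): ESC, subfield
# delimiter, field terminator, record terminator.
MARC_ALLOWED_CONTROLS = [0x1B, 0x1F, 0x1E, 0x1D].freeze

# Returns the codepoints of any disallowed C0 control characters
# found in a field/subfield value.
def disallowed_controls(value)
  value.each_char
       .map(&:ord)
       .select { |cp| cp < 0x20 && !MARC_ALLOWED_CONTROLS.include?(cp) }
end

bad = disallowed_controls("http://example.org/part-one\npart-two")
puts bad.inspect  # => [10]  (the embedded LF)
```

Run over every subfield value before writing a record, this catches exactly the CR/LF case the thread is about.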
Re: [CODE4LIB] is this valid marc ?
Thanks Michael. So one weird thing is that at least some of those characters "specifically designated as control characters" aren't ordinarily what everyone else considers "control characters". To me, "control character" means an ASCII code point less than 20 (hex), which the last four aren't. So now it's unclear what the "prohibited" (by not being mentioned) control characters are, since I don't know what MARC considers a 'control character' exactly. But I'm really just picking nits to demonstrate the impenetrability of MARC specs. I believe you all (especially Terry) that CR and LF aren't allowed.

But, two, Michael, are you the doran in this? http://rocky.uta.edu/doran/charsets/marc8default.html You might want to remove CR, LF, and the other disallowed control characters from your own published list of MARC8 characters!

On 5/19/2011 3:16 PM, Doran, Michael D wrote:
> > Is it really true that newline characters are not allowed in a marc
> > value?
>
> Yes.
>
> CONTROL FUNCTION CODES [1]
>
> Eight characters are specifically designated as control characters for MARC 21 use:
> - escape character, 1B(hex) in MARC-8 and Unicode encoding
> - subfield delimiter, 1F(hex) in MARC-8 and Unicode encoding
> - field terminator, 1E(hex) in MARC-8 and Unicode encoding
> - record terminator, 1D(hex) in MARC-8 and Unicode encoding
> - non-sorting character(s) begin, 88(hex) in MARC-8 and 98(hex) in Unicode encoding
> - non-sorting character(s) end, 89(hex) in MARC-8 and 9C(hex) in Unicode encoding
> - joiner, 8D(hex) in MARC-8 and 200D(hex) in Unicode encoding
> - nonjoiner, 8E(hex) in MARC-8 and 200C(hex) in Unicode encoding
>
> [1] http://www.loc.gov/marc/specifications/specchargeneral.html#controlfunction
>
> -- Michael
>
> # Michael Doran, Systems Librarian
> # University of Texas at Arlington
> # 817-272-5326 office
> # 817-688-1926 mobile
> # do...@uta.edu
> # http://rocky.uta.edu/doran/
>
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Jonathan Rochkind
> Sent: Thursday, May 19, 2011 1:27 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] is this valid marc ?
>
> Is it really true that newline characters are not allowed in a marc
> value? I thought they were, not with any special meaning, just as
> ordinary data. If they're not, that's useful to know, so I don't put
> any there!
>
> I'd ask for a reference to the standard that says this, but I suspect
> it's going to be some impenetrable implication of a side effect of a
> subtle adjective either way.
>
> On 5/19/2011 2:19 PM, Karen Coyle wrote:
> > Quoting Andreas Orphanides:
> >
> >> Anyway, I think having these two parts of the same URL data on
> >> separate lines is definitely Not Right, but I am not sure if it adds
> >> up to invalid MARC.
> >
> > Exactly. The CR and LF characters are NOT defined as valid in the MARC
> > character set and should not be used. In fact, in MARC there is no
> > concept of "lines", only variable length strings (usually up to
> > char).
> >
> > kc
> >
> >> -dre.
> >>
> >> [1] http://www.loc.gov/marc/bibliographic/bd856.html
> >> [2] I am not a cataloger. Don't hurt me.
> >> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
> >>
> >> On 5/19/2011 12:37 PM, James Lecard wrote:
> >>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
> >>> from a partner.
> >>>
> >>> The 856 field is split across 2 lines, causing the ruby library to
> >>> ignore it (I've patched it to overcome this issue), but I want to know
> >>> if this kind of marc is valid ?
> >>>
> >>> =LDR 00638nam 2200181uu 4500
> >>> =001 cla-MldNA01
> >>> =008 080101s2008\\\|fre||
> >>> =040 \\$aMy Provider
> >>> =041 0\$afre
> >>> =245 10$aThis Subject
> >>> =260 \\$aParis$bJ. Doe$c2008
> >>> =490 \\$aSome topic
> >>> =650 1\$aNarratif, Autre forme
> >>> =655 \7$abook$2lcsh
> >>> =752 \\$aA Place on earth
> >>> =776 \\$dParis: John Doe and Cie, 1973
> >>> =856 \2$qtext/html
> >>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
> >>>
> >>> Thanks,
> >>>
> >>> James L.
Re: [CODE4LIB] is this valid marc ?
It's been a while since I looked at the ISO spec (which I still can't believe I had to buy to read) -- but you can certainly infer by looking at the legal characters laid out by LC. In reality -- only a handful of unprintable characters are technically allowed in a MARC record -- but you have to remember that when MARC was created -- it was for block reading -- and generally, early (and current) readers stop on hard breaks.

--TR

> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Jonathan Rochkind
> Sent: Thursday, May 19, 2011 11:49 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] is this valid marc ?
>
> On 5/19/2011 2:33 PM, Reese, Terry wrote:
> > Jonathan,
> >
> > Karen is correct -- CR/LF are invalid characters within a MARC record.
> > This has nothing to do with whether the character is valid in the set --
> > the format itself doesn't allow it.
>
> I'm curious where in the spec it says this -- of course, it's an
> intellectual exercise at this point, because even if the spec says one
> thing, it doesn't matter if everyone (including tool-writers) has always
> understood it differently. (This is a problem for me with lots of library
> 'standards' including MARC. "Oh yeah, it might APPEAR to say/allow/prohibit
> that, but don't believe it, 'everyone' has always understood it
> differently." Or two parts of a spec which contradict each other.)
>
> In the glossary here:
> http://www.loc.gov/marc/specifications/speccharintro.html
>
> It does say "Consequently, code points less than 80 (hex) have the same
> meaning in both of the encodings used in MARC 21 and may be referred to as
> ASCII in either environment." Which could be interpreted to include control
> chars such as CR and LF. (Thanks Dan Scott.) Of course, the glossary
> section may not actually be an operative part of the standard, or it may
> not mean what it seems to mean, or everyone may have always acted as if it
> meant something different. Welcome to MARC.
>
> But I'm not successfully finding anything else that says one way or another
> on the legality. Most of the ASCII control chars do seem to be missing from
> Marc8 (whether by design or accident), but that doesn't necessarily mean
> they're illegal in a MARC record using some other (legal for MARC)
> encoding.
>
> But I believe Terry that it's not allowed (I believe Terry about just about
> everything). It's just really an intellectual exercise in the difficulty of
> finding answers in the MARC spec at the moment.
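Terry's point about block reading can be made concrete: a reader that splits input on hard line breaks will shred a field value containing an embedded LF, while a reader keyed on the real MARC delimiters (field terminator 1E hex, record terminator 1D hex) treats that LF as ordinary bytes. A small Ruby sketch with made-up field content:

```ruby
# A toy record using the real MARC structural delimiters:
# 0x1E ends a field, 0x1D ends the record. The 856 value contains
# an embedded LF, which the spec discussion above says is not allowed.
raw = "245 10\x1FaThis Subject\x1E" \
      "856 42\x1Fuhttp://example.org/part-one\npart-two\x1E\x1D"

# Terminator-oriented read: the embedded LF is just data; both fields survive.
fields = raw.chomp("\x1D").split("\x1E")
puts fields.length        # => 2

# Line-oriented read (a reader that "stops on hard breaks"):
# the 856 is cut into two pieces, neither of which is a whole field.
lines = raw.split("\n")
puts lines.length         # => 2
```

Which is presumably why a record that smuggles CR/LF into a value parses fine in one tool and breaks in the next.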
Re: [CODE4LIB] is this valid marc ?
On 5/19/2011 2:33 PM, Reese, Terry wrote:
> Jonathan,
>
> Karen is correct -- CR/LF are invalid characters within a MARC record. This
> has nothing to do with whether the character is valid in the set -- the
> format itself doesn't allow it.

I'm curious where in the spec it says this -- of course, it's an intellectual exercise at this point, because even if the spec says one thing, it doesn't matter if everyone (including tool-writers) has always understood it differently. (This is a problem for me with lots of library 'standards' including MARC. "Oh yeah, it might APPEAR to say/allow/prohibit that, but don't believe it, 'everyone' has always understood it differently." Or two parts of a spec which contradict each other.)

In the glossary here: http://www.loc.gov/marc/specifications/speccharintro.html

It does say "Consequently, code points less than 80 (hex) have the same meaning in both of the encodings used in MARC 21 and may be referred to as ASCII in either environment." Which could be interpreted to include control chars such as CR and LF. (Thanks Dan Scott.) Of course, the glossary section may not actually be an operative part of the standard, or it may not mean what it seems to mean, or everyone may have always acted as if it meant something different. Welcome to MARC.

But I'm not successfully finding anything else that says one way or another on the legality. Most of the ASCII control chars do seem to be missing from Marc8 (whether by design or accident), but that doesn't necessarily mean they're illegal in a MARC record using some other (legal for MARC) encoding.

But I believe Terry that it's not allowed (I believe Terry about just about everything). It's just really an intellectual exercise in the difficulty of finding answers in the MARC spec at the moment.
Re: [CODE4LIB] is this valid marc ?
On 5/19/2011 2:33 PM, Kyle Banerjee wrote:
> However, what would be the use case for including them as you don't know
> how they'll be interpreted by the app that you hand the data to?

Only when the destination is an app you have complete control over too.

One use case I was idly turning over in my head lately. I export data about my bibs from my ILS to Solr in Marc. But I am increasingly needing to stuff 'local' data that doesn't fit into any Marc field in there too, because I need it available at the Solr indexing stage. I'm not concerned with doing this in a 'standard' way, I just need to get it in there SOMEHOW, because Marc is all that makes it to my Solr indexer. (And it would be somewhat complicated to change my pipeline to send a package that includes Marc plus other metadata payloads; there are a bunch of pieces in the pipeline that really want Marc-as-marc.)

So one idea I had was encoding it as arbitrary key/value pairs in YAML, and just sticking the YAML in a 9xx field. But a newline is a significant character for YAML. I don't care about this data being _meaningful_ to anyone other than my own custom local destination, but I do care about leaving the Marc structurally legal (especially because if it's not, some of the individual elements of the pipeline might choke on it or corrupt it).

Another different idea I was also thinking about: all of our MARC 'summaries' (520) show up in our interfaces as one giant paragraph, even when they are publisher back-of-the-book copy that was originally multiple paragraphs. Sometimes a MARC record has the exact same text in it as an Amazon description, but the Amazon description is a lot more readable because it is rightly multiple paragraphs. If newlines were legal in a 520, then a cataloger could preserve them -- systems that just ignored it would continue to, no loss; but systems that wanted to take account of it could, for instance by using HTML <p> or <br> tags to paragraph-ize on newlines before outputting to an HTML display.

But not if newlines aren't legal in a value, of course.

Jonathan

> I've seen people put HTML in certain fields to achieve a certain effect in
> catalogs, but this is a dodgy practice since it relies on the questionable
> assumption that the end application will just pass through whatever is
> sent.
>
> kyle
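The YAML-in-a-9xx idea collides with the no-newline rule immediately: the YAML stdlib's default block style is newline-delimited by design. A quick Ruby check (the local key/value data here is invented for illustration):

```ruby
require 'yaml'

# Hypothetical local key/value data destined for a 9xx field.
local_data = { "digitized" => true, "collection" => "local-a" }

serialized = YAML.dump(local_data)
# YAML's default block style separates entries with newlines,
# so the serialized form can't be dropped into a MARC field as-is.
puts serialized.include?("\n")   # => true
```

So to stay structurally legal you'd need a serialization that keeps the whole payload on one line (or escapes newlines yourself) before stuffing it into the field.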
Re: [CODE4LIB] is this valid marc ?
Jonathan,

Karen is correct -- CR/LF are invalid characters within a MARC record. This has nothing to do with whether the character is valid in the set -- the format itself doesn't allow it.

--TR

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind
Sent: Thursday, May 19, 2011 11:29 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] is this valid marc ?

I wonder if it depends on whether your record is in Marc8 or UTF-8, if I'm reading Karen right to say that CR/LF aren't in the Marc8 character set. They're certainly in UTF-8! And a Marc record can be in UTF-8.

On 5/19/2011 2:27 PM, Jonathan Rochkind wrote:
> Is it really true that newline characters are not allowed in a marc
> value? I thought they were, not with any special meaning, just as
> ordinary data. If they're not, that's useful to know, so I don't put
> any there!
>
> I'd ask for a reference to the standard that says this, but I suspect
> it's going to be some impenetrable implication of a side effect of a
> subtle adjective either way.
>
> On 5/19/2011 2:19 PM, Karen Coyle wrote:
>> Quoting Andreas Orphanides:
>>
>>> Anyway, I think having these two parts of the same URL data on
>>> separate lines is definitely Not Right, but I am not sure if it adds
>>> up to invalid MARC.
>>
>> Exactly. The CR and LF characters are NOT defined as valid in the
>> MARC character set and should not be used. In fact, in MARC there is
>> no concept of "lines", only variable length strings (usually up to
>> char).
>>
>> kc
>>
>>> -dre.
>>>
>>> [1] http://www.loc.gov/marc/bibliographic/bd856.html
>>> [2] I am not a cataloger. Don't hurt me.
>>> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
>>>
>>> On 5/19/2011 12:37 PM, James Lecard wrote:
>>>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
>>>> from a partner.
>>>>
>>>> The 856 field is split across 2 lines, causing the ruby library to
>>>> ignore it (I've patched it to overcome this issue), but I want to know
>>>> if this kind of marc is valid ?
>>>>
>>>> =LDR 00638nam 2200181uu 4500
>>>> =001 cla-MldNA01
>>>> =008 080101s2008\\\|fre||
>>>> =040 \\$aMy Provider
>>>> =041 0\$afre
>>>> =245 10$aThis Subject
>>>> =260 \\$aParis$bJ. Doe$c2008
>>>> =490 \\$aSome topic
>>>> =650 1\$aNarratif, Autre forme
>>>> =655 \7$abook$2lcsh
>>>> =752 \\$aA Place on earth
>>>> =776 \\$dParis: John Doe and Cie, 1973
>>>> =856 \2$qtext/html
>>>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
>>>>
>>>> Thanks,
>>>>
>>>> James L.
Re: [CODE4LIB] is this valid marc ?
> Is it really true that newline characters are not allowed in a marc value?
> I thought they were, not with any special meaning, just as ordinary data.
> If they're not, that's useful to know, so I don't put any there!

This is also my understanding. However, what would be the use case for including them as you don't know how they'll be interpreted by the app that you hand the data to?

I've seen people put HTML in certain fields to achieve a certain effect in catalogs, but this is a dodgy practice since it relies on the questionable assumption that the end application will just pass through whatever is sent.

kyle
Re: [CODE4LIB] is this valid marc ?
I wonder if it depends on whether your record is in Marc8 or UTF-8, if I'm reading Karen right to say that CR/LF aren't in the Marc8 character set. They're certainly in UTF-8! And a Marc record can be in UTF-8.

On 5/19/2011 2:27 PM, Jonathan Rochkind wrote:
> Is it really true that newline characters are not allowed in a marc
> value? I thought they were, not with any special meaning, just as
> ordinary data. If they're not, that's useful to know, so I don't put
> any there!
>
> I'd ask for a reference to the standard that says this, but I suspect
> it's going to be some impenetrable implication of a side effect of a
> subtle adjective either way.
>
> On 5/19/2011 2:19 PM, Karen Coyle wrote:
>> Quoting Andreas Orphanides:
>>
>>> Anyway, I think having these two parts of the same URL data on
>>> separate lines is definitely Not Right, but I am not sure if it adds
>>> up to invalid MARC.
>>
>> Exactly. The CR and LF characters are NOT defined as valid in the
>> MARC character set and should not be used. In fact, in MARC there is
>> no concept of "lines", only variable length strings (usually up to
>> char).
>>
>> kc
>>
>>> -dre.
>>>
>>> [1] http://www.loc.gov/marc/bibliographic/bd856.html
>>> [2] I am not a cataloger. Don't hurt me.
>>> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
>>>
>>> On 5/19/2011 12:37 PM, James Lecard wrote:
>>>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
>>>> from a partner.
>>>>
>>>> The 856 field is split across 2 lines, causing the ruby library to
>>>> ignore it (I've patched it to overcome this issue), but I want to know
>>>> if this kind of marc is valid ?
>>>>
>>>> =LDR 00638nam 2200181uu 4500
>>>> =001 cla-MldNA01
>>>> =008 080101s2008\\\|fre||
>>>> =040 \\$aMy Provider
>>>> =041 0\$afre
>>>> =245 10$aThis Subject
>>>> =260 \\$aParis$bJ. Doe$c2008
>>>> =490 \\$aSome topic
>>>> =650 1\$aNarratif, Autre forme
>>>> =655 \7$abook$2lcsh
>>>> =752 \\$aA Place on earth
>>>> =776 \\$dParis: John Doe and Cie, 1973
>>>> =856 \2$qtext/html
>>>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
>>>>
>>>> Thanks,
>>>>
>>>> James L.
Re: [CODE4LIB] is this valid marc ?
Is it really true that newline characters are not allowed in a marc value? I thought they were, not with any special meaning, just as ordinary data. If they're not, that's useful to know, so I don't put any there!

I'd ask for a reference to the standard that says this, but I suspect it's going to be some impenetrable implication of a side effect of a subtle adjective either way.

On 5/19/2011 2:19 PM, Karen Coyle wrote:
> Quoting Andreas Orphanides:
>
>> Anyway, I think having these two parts of the same URL data on separate
>> lines is definitely Not Right, but I am not sure if it adds up to
>> invalid MARC.
>
> Exactly. The CR and LF characters are NOT defined as valid in the MARC
> character set and should not be used. In fact, in MARC there is no
> concept of "lines", only variable length strings (usually up to char).
>
> kc
>
>> -dre.
>>
>> [1] http://www.loc.gov/marc/bibliographic/bd856.html
>> [2] I am not a cataloger. Don't hurt me.
>> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
>>
>> On 5/19/2011 12:37 PM, James Lecard wrote:
>>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
>>> from a partner.
>>>
>>> The 856 field is split across 2 lines, causing the ruby library to
>>> ignore it (I've patched it to overcome this issue), but I want to know
>>> if this kind of marc is valid ?
>>>
>>> =LDR 00638nam 2200181uu 4500
>>> =001 cla-MldNA01
>>> =008 080101s2008\\\|fre||
>>> =040 \\$aMy Provider
>>> =041 0\$afre
>>> =245 10$aThis Subject
>>> =260 \\$aParis$bJ. Doe$c2008
>>> =490 \\$aSome topic
>>> =650 1\$aNarratif, Autre forme
>>> =655 \7$abook$2lcsh
>>> =752 \\$aA Place on earth
>>> =776 \\$dParis: John Doe and Cie, 1973
>>> =856 \2$qtext/html
>>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
>>>
>>> Thanks,
>>>
>>> James L.
Re: [CODE4LIB] is this valid marc ?
Quoting Andreas Orphanides:
> Anyway, I think having these two parts of the same URL data on separate
> lines is definitely Not Right, but I am not sure if it adds up to invalid
> MARC.

Exactly. The CR and LF characters are NOT defined as valid in the MARC character set and should not be used. In fact, in MARC there is no concept of "lines", only variable length strings (usually up to char).

kc

> -dre.
>
> [1] http://www.loc.gov/marc/bibliographic/bd856.html
> [2] I am not a cataloger. Don't hurt me.
> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
>
> On 5/19/2011 12:37 PM, James Lecard wrote:
>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
>> from a partner.
>>
>> The 856 field is split across 2 lines, causing the ruby library to
>> ignore it (I've patched it to overcome this issue), but I want to know
>> if this kind of marc is valid ?
>>
>> =LDR 00638nam 2200181uu 4500
>> =001 cla-MldNA01
>> =008 080101s2008\\\|fre||
>> =040 \\$aMy Provider
>> =041 0\$afre
>> =245 10$aThis Subject
>> =260 \\$aParis$bJ. Doe$c2008
>> =490 \\$aSome topic
>> =650 1\$aNarratif, Autre forme
>> =655 \7$abook$2lcsh
>> =752 \\$aA Place on earth
>> =776 \\$dParis: John Doe and Cie, 1973
>> =856 \2$qtext/html
>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
>>
>> Thanks,
>>
>> James L.

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Re: [CODE4LIB] is this valid marc ?
On Thu, May 19, 2011 at 1:33 PM, Bill Dueber wrote:
> record['856'] is defined to return the *first* 856 in the record, which, if
> you look at the documentation...er...ok. Which is not documented as such in
> MARC::Record (http://rubydoc.info/gems/marc/0.4.2/MARC/Record)
>
> To get them all, you need to do something like
>
>   sixfifties = record.fields '650'  # returns array of results
>
> Or, to iterate
>
>   record.each_by_tag('650') do |f|
>     puts f['u'] if f['u']  # print out a URL if we've got one
>   end

What Bill said. Also, there's a somewhat complicated calculus that comes into play here regarding ruby-marc and looking up subfields and performance. Modern ruby-marc (of which 0.4.2 is an example) has the capability of providing a hash of the fields for much faster access than:

  eight_fifty_sixes = record.find_all { |field| field.tag == "856" }

However, it comes at a cost (that is, there's a penalty in building the field map). This penalty is offset if you wind up doing a lot of one-off lookups in a single record. If you're simply looking for a single field in every record (or know, beforehand, what fields you're looking for), it's *much* faster to do something like:

  tags = ['001', '020', '100', '110', '111', '245', '650', '856']
  fields = record.find_all { |field| tags.include?(field.tag) }

or whatever. At some point we did a benchmark of this (Bill Dueber did it: https://gist.github.com/591907) and the threshold was somewhere around 6 or so #find_all calls needed to offset building the field map. This is why it's not really documented. This is the sort of thing that really needs to go into the ruby-marc wiki.

BTW, the behavior exists for subfields, too. If you do something like record['043']['a'] and there are multiple subfield "a"s, you'll only get the first one.

-Ross.

> On Thu, May 19, 2011 at 1:16 PM, James Lecard wrote:
>> I'll dig in this one, thanks for this input Jonathan... I'm not so
>> familiar with the library yet. I'll do some more debugging, but in fact
>> what is happening is that I have no value with an access such as
>> record['856']['u'], while I get one for record['856']['q'].
>> And the marc you are seeing is copy/pasted from a marc editor gui; it's
>> not the actual marc record. I edited it so that its data is not
>> recognisable (for confidentiality).
>>
>> James
>>
>> 2011/5/19 Jonathan Rochkind
>>> Now whether it _means_ what you want it to mean is another question,
>>> yeah. As Andreas said, I don't think that particular example _ought_ to
>>> have two 856's.
>>>
>>> But it ought to be perfectly parseable marc.
>>>
>>> If your 'patch' is to make ruby-marc combine those multiple 856's into
>>> one -- that is not right; two separate 856's are two separate 856's,
>>> same as any other marc field. Applying that patch would mess up
>>> ruby-marc, not fix it.
>>>
>>> You need to be more specific about what you're doing and what you mean
>>> exactly by 'causing the ruby library to ignore it'. I wonder if you are
>>> just using a method in ruby-marc which only returns the first field
>>> matching a given tag when there is more than one.
>>>
>>> On 5/19/2011 12:51 PM, Andreas Orphanides wrote:
>>>> From the MARC documentation [1]:
>>>>
>>>> "Field 856 is repeated when the location data elements vary (the URL
>>>> in subfield $u or subfields $a, $b, $d, when used). It is also
>>>> repeated when more than one access method is used, different portions
>>>> of the item are available electronically, mirror sites are recorded,
>>>> different formats/resolutions with different URLs are indicated, and
>>>> related items are recorded."
>>>>
>>>> So it looks like however the URL is handled, a single 856 field should
>>>> be used to indicate the location [2]. I am not familiar enough with
>>>> MARC to say how it "should" have been done, but it looks like $q and
>>>> $u would probably be sufficient (if they're in the same line).
>>>>
>>>> However, since the field is repeatable, the parser shouldn't be
>>>> choking on it, unless it's choking on it for a sophisticated reason
>>>> (e.g., "These aren't the subfield tags I expect to be seeing"). It
>>>> also looks like if $u is provided, the first subfield should indicate
>>>> access method (in this case "4" for HTTP). Maybe that's what's causing
>>>> the problem? [3]
>>>>
>>>> Anyway, I think having these two parts of the same URL data on
>>>> separate lines is definitely Not Right, but I am not sure if it adds
>>>> up to invalid MARC.
>>>>
>>>> -dre.
>>>>
>>>> [1] http://www.loc.gov/marc/bibliographic/bd856.html
>>>> [2] I am not a cataloger. Don't hurt me.
>>>> [3] I am not an expert on MARC ingest or on ruby-marc. I could be
>>>> wrong.
>>>>
>>>> On 5/19/2011 12:37 PM, James Lecard wrote:
>>>>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I
>>>>> get from a partner.
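Bill's and Ross's points about first-match vs. all-matches lookup don't need the marc gem to demonstrate. Here's a sketch using a plain Struct as a stand-in for MARC::DataField (the `Field` struct and the sample data are mine, purely illustrative; in ruby-marc you'd be working with a MARC::Record):

```ruby
# Stand-in for a MARC field: just a tag and a value.
Field = Struct.new(:tag, :value)

fields = [
  Field.new("856", "$qtext/html"),
  Field.new("856", "$uhttp://example.org/"),
  Field.new("650", "$aNarratif"),
]

# record['856']-style lookup: first match only -- the second 856 is invisible.
first_856 = fields.find { |f| f.tag == "856" }
puts first_856.value             # => "$qtext/html"

# record.fields('856')-style lookup: all matches.
all_856 = fields.select { |f| f.tag == "856" }
puts all_856.length              # => 2

# Ross's "known tag set" pattern: one pass over the record for several tags.
wanted = ["245", "650", "856"]
picked = fields.select { |f| wanted.include?(f.tag) }
puts picked.length               # => 3
```

This is exactly why James saw record['856']['q'] but never record['856']['u']: the $u lived in the second 856, which the single-field shortcut never reaches.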
Re: [CODE4LIB] MARCXML to MODS: 590 Field
Jon and Karen are correct. LC doesn't map/convert local fields because usage varies.

Tracy

Tracy Meehleib
Network Development and MARC Standards Office
Library of Congress
101 Independence Ave SE
Washington, DC 20540-4402
+1 202 707 0121 (voice)
+1 202 707 0115 (fax)
t...@loc.gov

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen Miller
Sent: Thursday, May 19, 2011 12:35 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML to MODS: 590 Field

Joel,

The 590 is indeed defined for local use, so whatever your local institution uses it for should guide your mapping to MODS. There are some examples of what it's used for on the OCLC Bibliographic Formats and Standards pages: http://www.oclc.org/bibformats/en/5xx/590.shtm

Frequently it's used as a note that is specific to a local copy of an item. If your institution uses it inconsistently, you might want to just map it to mods:note.

Karen

Karen D. Miller
Monographic/Digital Projects Cataloger
Bibliographic Services Dept.
Northwestern University Library
Evanston, IL
k-mill...@northwestern.edu
847-467-3462

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jon Stroop
Sent: Thursday, May 19, 2011 11:07 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML to MODS: 590 Field

I'm going to guess that it's because 59x fields are defined for local use: http://www.loc.gov/marc/bibliographic/bd59x.html ...but someone from LC should be able to confirm.

-Jon

--
Jon Stroop
Metadata Analyst
Firestone Library
Princeton University
Princeton, NJ 08544
Email: jstr...@princeton.edu
Phone: (609)258-0059
Fax: (609)258-0441
http://pudl.princeton.edu
http://diglib.princeton.edu
http://diglib.princeton.edu/ead
http://www.cpanda.org/cpanda

On 05/19/2011 11:45 AM, Richard, Joel M wrote:
> Dear hive-mind,
>
> Does anyone know why the Library of Congress-supplied MARCXML to MODS XSLT
> [1] does not handle the MARC 590 Local Notes field? It seems to handle
> everything else, not that I've done an exhaustive search... :)
>
> Granted, I could copy/create my own XSLT and add this functionality in
> myself, but I'm curious as to whether or not there's some logic behind this
> decision to not include it -- logic that I would not naturally understand
> since I'm not formally trained as a librarian.
>
> Thanks!
> --Joel
>
> [1] http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3-4.xsl
>
> Joel Richard
> IT Specialist, Web Services Department
> Smithsonian Institution Libraries | http://www.sil.si.edu/
> (202) 633-1706 | richar...@si.edu
Re: [CODE4LIB] Seth Godin on The future of the library
On 5/19/2011 1:23 PM, Ryan Engel wrote:
> There are some who argue that if it's valuable to others, then others
> should pay for it (even when the improved access benefits your institution
> first and foremost, and distribution of the improvements is an arguably
> beneficial side effect). Why should one institution carry the financial
> burden of improving something that benefits others beyond that institution?
> It's not an argument I agree with, but it's one I've heard before.

It is a somewhat odd position, especially for libraries, which have been in the business of providing service to others at no profit to themselves for many years, including in technical matters such as cooperative cataloging and lending via ILL. Libraries have gotten to be where they are today by being willing to chip in for the general good on a sort of "generalized reciprocity" basis.

Jonathan
Re: [CODE4LIB] is this valid marc ?
I believe that with the ruby-marc API, when you do record['856'], you just get the first 856 if there is more than one. You have to use another API call (I forget which offhand) to get more than one; the ['856'] is just a shortcut for when you will only have one or only care about the first one. So I don't think there's any bug in ruby-marc.

Your data example is _odd_ though, it's not usual to record 856's like that, and it probably shouldn't be recorded like that. Multiple 856's can exist where there are in fact multiple URLs recorded.

On 5/19/2011 1:16 PM, James Lecard wrote:

I'll dig into this one, thanks for this input Jonathan... I'm not so familiar with the library yet; I'll do some more debugging. But in fact what is happening is that I have no value with an access such as record['856']['u'], while I get one for record['856']['q'].

And the marc you are seeing is copy/pasted from a marc editor gui; it's not the actual marc record. I edited it so that its data is not recognisable (for confidentiality).

James

2011/5/19 Jonathan Rochkind

Now whether it _means_ what you want it to mean is another question, yeah. As Andreas said, I don't think that particular example _ought_ to have two 856's.

But it ought to be perfectly parseable marc.

If your 'patch' is to make ruby-marc combine those multiple 856's into one -- that is not right: two separate 856's are two separate 856's, same as any other marc field. Applying that patch would mess up ruby-marc, not fix it.

You need to be more specific about what you're doing and what you mean exactly by 'causing the ruby library to ignore it'. I wonder if you are just using a method in ruby-marc which only returns the first field matching a given tag when there is more than one.

On 5/19/2011 12:51 PM, Andreas Orphanides wrote:

From the MARC documentation [1]:

"Field 856 is repeated when the location data elements vary (the URL in subfield $u or subfields $a, $b, $d, when used). It is also repeated when more than one access method is used, different portions of the item are available electronically, mirror sites are recorded, different formats/resolutions with different URLs are indicated, and related items are recorded."

So it looks like however the URL is handled, a single 856 field should be used to indicate the location [2]. I am not familiar enough with MARC to say how it "should" have been done, but it looks like $q and $u would probably be sufficient (if they're in the same line).

However, since the field is repeatable, the parser shouldn't be choking on it, unless it's choking on it for a sophisticated reason (e.g., "These aren't the subfield tags I expect to be seeing"). It also looks like if $u is provided, the first subfield should indicate access method (in this case "4" for HTTP). Maybe that's what's causing the problem? [3]

Anyway, I think having these two parts of the same URL data on separate lines is definitely Not Right, but I am not sure if it adds up to invalid MARC.

-dre.

[1] http://www.loc.gov/marc/bibliographic/bd856.html
[2] I am not a cataloger. Don't hurt me.
[3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.

On 5/19/2011 12:37 PM, James Lecard wrote:

I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get from a partner.

The 856 field is split across two lines, causing the ruby library to ignore it (I've patched it to overcome this issue), but I want to know if this kind of marc is valid?

=LDR 00638nam 2200181uu 4500
=001 cla-MldNA01
=008 080101s2008\\\|fre||
=040 \\$aMy Provider
=041 0\$afre
=245 10$aThis Subject
=260 \\$aParis$bJ. Doe$c2008
=490 \\$aSome topic
=650 1\$aNarratif, Autre forme
=655 \7$abook$2lcsh
=752 \\$aA Place on earth
=776 \\$dParis: John Doe and Cie, 1973
=856 \2$qtext/html
=856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library

Thanks,

James L.
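Jonathan's description of the lookup behavior can be sketched with a small stand-in class. This is NOT the actual ruby-marc API, just a minimal pure-Ruby model of a record as a list of (tag, subfields) pairs, with a first-match lookup mirroring `record['856']` and an all-matches lookup mirroring `record.fields('856')`. The tags and $q value are taken from James's example; the example.org URL is hypothetical:

```ruby
# Minimal stand-in for a MARC record: an array of [tag, subfields] pairs.
# (Illustrative only -- not the ruby-marc API; it just mirrors the shape of
# record['856'] vs record.fields('856') described above.)
Record = Struct.new(:fields_list) do
  # First-match lookup, like ruby-marc's record['856'].
  def [](tag)
    fields_list.find { |t, _| t == tag }&.last
  end

  # All matches, like ruby-marc's record.fields('856').
  def fields(tag)
    fields_list.select { |t, _| t == tag }.map(&:last)
  end
end

rec = Record.new([
  ['856', { 'q' => 'text/html' }],
  ['856', { 'u' => 'http://example.org/resource' }], # hypothetical URL
])

rec['856']               # => {"q"=>"text/html"} -- the $u field is invisible here
rec.fields('856').length # => 2
```

With a record shaped like James's, the first-match lookup sees only the $q field, which would explain getting a value for record['856']['q'] but nothing for record['856']['u'].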
Re: [CODE4LIB] is this valid marc ?
record['856'] is defined to return the *first* 856 in the record, which, if you look at the documentation... er... OK, which is not documented as such in MARC::Record (http://rubydoc.info/gems/marc/0.4.2/MARC/Record).

To get them all, you need to do something like:

sixfifties = record.fields('650') # returns array of results

Or, to iterate:

record.each_by_tag('650') do |f|
  puts f['u'] if f['u'] # print out a URL if we've got one
end

On Thu, May 19, 2011 at 1:16 PM, James Lecard wrote:
> I'll dig into this one, thanks for this input Jonathan... I'm not so
> familiar with the library yet; I'll do some more debugging. But in fact what
> is happening is that I have no value with an access such as
> record['856']['u'], while I get one for record['856']['q'].
> And the marc you are seeing is copy/pasted from a marc editor gui; it's not
> the actual marc record. I edited it so that its data is not recognisable
> (for confidentiality).
>
> James
>
> 2011/5/19 Jonathan Rochkind
>
> > Now whether it _means_ what you want it to mean is another question, yeah.
> > As Andreas said, I don't think that particular example _ought_ to have two
> > 856's.
> >
> > But it ought to be perfectly parseable marc.
> >
> > If your 'patch' is to make ruby-marc combine those multiple 856's into one
> > -- that is not right: two separate 856's are two separate 856's, same as
> > any other marc field. Applying that patch would mess up ruby-marc, not fix it.
> >
> > You need to be more specific about what you're doing and what you mean
> > exactly by 'causing the ruby library to ignore it'. I wonder if you are
> > just using a method in ruby-marc which only returns the first field
> > matching a given tag when there is more than one.
> >
> > On 5/19/2011 12:51 PM, Andreas Orphanides wrote:
> >
> >> From the MARC documentation [1]:
> >>
> >> "Field 856 is repeated when the location data elements vary (the URL in
> >> subfield $u or subfields $a, $b, $d, when used). It is also repeated when
> >> more than one access method is used, different portions of the item are
> >> available electronically, mirror sites are recorded, different
> >> formats/resolutions with different URLs are indicated, and related items
> >> are recorded."
> >>
> >> So it looks like however the URL is handled, a single 856 field should be
> >> used to indicate the location [2]. I am not familiar enough with MARC to
> >> say how it "should" have been done, but it looks like $q and $u would
> >> probably be sufficient (if they're in the same line).
> >>
> >> However, since the field is repeatable, the parser shouldn't be choking
> >> on it, unless it's choking on it for a sophisticated reason (e.g., "These
> >> aren't the subfield tags I expect to be seeing"). It also looks like if $u
> >> is provided, the first subfield should indicate access method (in this
> >> case "4" for HTTP). Maybe that's what's causing the problem? [3]
> >>
> >> Anyway, I think having these two parts of the same URL data on separate
> >> lines is definitely Not Right, but I am not sure if it adds up to invalid
> >> MARC.
> >>
> >> -dre.
> >>
> >> [1] http://www.loc.gov/marc/bibliographic/bd856.html
> >> [2] I am not a cataloger. Don't hurt me.
> >> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
> >>
> >> On 5/19/2011 12:37 PM, James Lecard wrote:
> >>
> >>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
> >>> from a partner.
> >>>
> >>> The 856 field is split across two lines, causing the ruby library to
> >>> ignore it (I've patched it to overcome this issue), but I want to know
> >>> if this kind of marc is valid?
> >>>
> >>> =LDR 00638nam 2200181uu 4500
> >>> =001 cla-MldNA01
> >>> =008 080101s2008\\\|fre||
> >>> =040 \\$aMy Provider
> >>> =041 0\$afre
> >>> =245 10$aThis Subject
> >>> =260 \\$aParis$bJ. Doe$c2008
> >>> =490 \\$aSome topic
> >>> =650 1\$aNarratif, Autre forme
> >>> =655 \7$abook$2lcsh
> >>> =752 \\$aA Place on earth
> >>> =776 \\$dParis: John Doe and Cie, 1973
> >>> =856 \2$qtext/html
> >>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
> >>>
> >>> Thanks,
> >>>
> >>> James L.

--
Bill Dueber
Library Systems Programmer
University of Michigan Library
Re: [CODE4LIB] Seth Godin on The future of the library
There are some who argue that if it's valuable to others, then others should pay for it (even when the improved access benefits your institution first and foremost, and distribution of the improvements is an arguably beneficial side effect). Why should one institution carry the financial burden of improving something that benefits others beyond that institution? It's not an argument I agree with, but it's one I've heard before.

Luciano Ramalho wrote:

On Thu, May 19, 2011 at 6:24 AM, graham wrote:

2. It is hard to justify spending time on improving access to free stuff when the end result would be good for everyone, not just the institution doing the work (unless it can be kept in a consortium and outside-world access limited)

Why is it hard to justify anything that would be good for everyone?
Re: [CODE4LIB] is this valid marc ?
You've gotten some other good responses, but I thought I'd mention the LoC and OCLC sites on MARC if you haven't seen them yet. First, the LoC site at http://www.loc.gov/marc/. This is what I use as a guide and a reference. Some folks prefer the OCLC docs at http://www.oclc.org/bibformats/en/, particularly if they're an OCLC member. Of course, these apply to MARC 21 and not UniMARC. Not sure what good resources are out there for UniMARC.

Jon Gorman
Re: [CODE4LIB] is this valid marc ?
I'll dig into this one, thanks for this input Jonathan... I'm not so familiar with the library yet; I'll do some more debugging. But in fact what is happening is that I have no value with an access such as record['856']['u'], while I get one for record['856']['q'].

And the marc you are seeing is copy/pasted from a marc editor gui; it's not the actual marc record. I edited it so that its data is not recognisable (for confidentiality).

James

2011/5/19 Jonathan Rochkind

> Now whether it _means_ what you want it to mean is another question, yeah.
> As Andreas said, I don't think that particular example _ought_ to have two
> 856's.
>
> But it ought to be perfectly parseable marc.
>
> If your 'patch' is to make ruby-marc combine those multiple 856's into one
> -- that is not right: two separate 856's are two separate 856's, same as any
> other marc field. Applying that patch would mess up ruby-marc, not fix it.
>
> You need to be more specific about what you're doing and what you mean
> exactly by 'causing the ruby library to ignore it'. I wonder if you are
> just using a method in ruby-marc which only returns the first field
> matching a given tag when there is more than one.
>
> On 5/19/2011 12:51 PM, Andreas Orphanides wrote:
>
>> From the MARC documentation [1]:
>>
>> "Field 856 is repeated when the location data elements vary (the URL in
>> subfield $u or subfields $a, $b, $d, when used). It is also repeated when
>> more than one access method is used, different portions of the item are
>> available electronically, mirror sites are recorded, different
>> formats/resolutions with different URLs are indicated, and related items
>> are recorded."
>>
>> So it looks like however the URL is handled, a single 856 field should be
>> used to indicate the location [2]. I am not familiar enough with MARC to
>> say how it "should" have been done, but it looks like $q and $u would
>> probably be sufficient (if they're in the same line).
>>
>> However, since the field is repeatable, the parser shouldn't be choking on
>> it, unless it's choking on it for a sophisticated reason (e.g., "These
>> aren't the subfield tags I expect to be seeing"). It also looks like if $u
>> is provided, the first subfield should indicate access method (in this case
>> "4" for HTTP). Maybe that's what's causing the problem? [3]
>>
>> Anyway, I think having these two parts of the same URL data on separate
>> lines is definitely Not Right, but I am not sure if it adds up to invalid
>> MARC.
>>
>> -dre.
>>
>> [1] http://www.loc.gov/marc/bibliographic/bd856.html
>> [2] I am not a cataloger. Don't hurt me.
>> [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
>>
>> On 5/19/2011 12:37 PM, James Lecard wrote:
>>
>>> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
>>> from a partner.
>>>
>>> The 856 field is split across two lines, causing the ruby library to
>>> ignore it (I've patched it to overcome this issue), but I want to know
>>> if this kind of marc is valid?
>>>
>>> =LDR 00638nam 2200181uu 4500
>>> =001 cla-MldNA01
>>> =008 080101s2008\\\|fre||
>>> =040 \\$aMy Provider
>>> =041 0\$afre
>>> =245 10$aThis Subject
>>> =260 \\$aParis$bJ. Doe$c2008
>>> =490 \\$aSome topic
>>> =650 1\$aNarratif, Autre forme
>>> =655 \7$abook$2lcsh
>>> =752 \\$aA Place on earth
>>> =776 \\$dParis: John Doe and Cie, 1973
>>> =856 \2$qtext/html
>>> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
>>>
>>> Thanks,
>>>
>>> James L.
Re: [CODE4LIB] is this valid marc ?
I'm curious what's going on here; it doesn't make any sense. Do you just mean that your MARC file has more than one 856 in it? That's what your pasted marc looks like, but that is definitely legal, AND I've parsed many many marc files with more than one 856 in them with ruby-marc; it was not a problem. I do it all the time.

Or do you mean your 856 had a newline ("\n") in it? I don't know if I've ever tried that, although yes, it should be legal. But if ruby-marc has a bug there, yes, it needs to be fixed.

What form is your marc in that you are parsing with ruby-marc? marc21 binary? marcxml? Or are you actually trying to parse what you pasted in, that weird marc-as-human-readable-text format? I vaguely recall ruby-marc having a method to parse such marc-as-human-readable-text, but I'm not sure if it's actually a _standard_ at all, so I'm not sure it's possible to say what should or shouldn't be legal in it.

Jonathan

On 5/19/2011 12:49 PM, James Lecard wrote:

Thanks a lot Richard,

So I guess my patch could be ported to the source code of ruby-marc. Let me know if you're interested.

James

2011/5/19 Richard, Joel M

I'm no MARC expert, but I've learned enough to say that yes, this is valid, in that what you're seeing is the $q (Electronic format type) and $u (Uniform Resource Identifier) subfields of the 856 field.

http://www.oclc.org/bibformats/en/8xx/856.shtm

You'll see other things when you get multiple authors (creators) on an item, or multiple anythings that can occur more than once.

--Joel

Joel Richard
IT Specialist, Web Services Department
Smithsonian Institution Libraries | http://www.sil.si.edu/
(202) 633-1706 | richar...@si.edu

On May 19, 2011, at 12:37 PM, James Lecard wrote:

I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get from a partner.

The 856 field is split across two lines, causing the ruby library to ignore it (I've patched it to overcome this issue), but I want to know if this kind of marc is valid?

=LDR 00638nam 2200181uu 4500
=001 cla-MldNA01
=008 080101s2008\\\|fre||
=040 \\$aMy Provider
=041 0\$afre
=245 10$aThis Subject
=260 \\$aParis$bJ. Doe$c2008
=490 \\$aSome topic
=650 1\$aNarratif, Autre forme
=655 \7$abook$2lcsh
=752 \\$aA Place on earth
=776 \\$dParis: John Doe and Cie, 1973
=856 \2$qtext/html
=856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library

Thanks,

James L.
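Since part of the confusion in this thread is whether a raw newline inside a field value is even legal, here is a small sanity-check sketch. The reserved control characters below (escape 1B, record terminator 1D, field terminator 1E, subfield delimiter 1F) come from the MARC 21 character-set specification cited elsewhere in the thread, and CR/LF are flagged because they are not defined as valid MARC data characters. The helper itself is illustrative only, not part of ruby-marc:

```ruby
# Control characters reserved for MARC 21 record structure; they must never
# appear inside a subfield value. CR and LF are not valid MARC data
# characters either, so they are flagged as well.
FORBIDDEN_IN_VALUES = [0x1B, 0x1D, 0x1E, 0x1F].map(&:chr) + ["\r", "\n"]

# Return the offending characters found in a would-be subfield value.
def suspicious_chars(value)
  value.chars.select { |c| FORBIDDEN_IN_VALUES.include?(c) }
end

suspicious_chars("http://example.org/a\nb")  # => ["\n"]
suspicious_chars("text/html")                # => []
```

An 856 that really contained a raw newline would fail this check; two separate 856 fields, as in James's pasted record, would pass.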
Re: [CODE4LIB] is this valid marc ?
Now whether it _means_ what you want it to mean is another question, yeah. As Andreas said, I don't think that particular example _ought_ to have two 856's.

But it ought to be perfectly parseable marc.

If your 'patch' is to make ruby-marc combine those multiple 856's into one -- that is not right: two separate 856's are two separate 856's, same as any other marc field. Applying that patch would mess up ruby-marc, not fix it.

You need to be more specific about what you're doing and what you mean exactly by 'causing the ruby library to ignore it'. I wonder if you are just using a method in ruby-marc which only returns the first field matching a given tag when there is more than one.

On 5/19/2011 12:51 PM, Andreas Orphanides wrote:

From the MARC documentation [1]:

"Field 856 is repeated when the location data elements vary (the URL in subfield $u or subfields $a, $b, $d, when used). It is also repeated when more than one access method is used, different portions of the item are available electronically, mirror sites are recorded, different formats/resolutions with different URLs are indicated, and related items are recorded."

So it looks like however the URL is handled, a single 856 field should be used to indicate the location [2]. I am not familiar enough with MARC to say how it "should" have been done, but it looks like $q and $u would probably be sufficient (if they're in the same line).

However, since the field is repeatable, the parser shouldn't be choking on it, unless it's choking on it for a sophisticated reason (e.g., "These aren't the subfield tags I expect to be seeing"). It also looks like if $u is provided, the first subfield should indicate access method (in this case "4" for HTTP). Maybe that's what's causing the problem? [3]

Anyway, I think having these two parts of the same URL data on separate lines is definitely Not Right, but I am not sure if it adds up to invalid MARC.

-dre.

[1] http://www.loc.gov/marc/bibliographic/bd856.html
[2] I am not a cataloger. Don't hurt me.
[3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.

On 5/19/2011 12:37 PM, James Lecard wrote:

I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get from a partner.

The 856 field is split across two lines, causing the ruby library to ignore it (I've patched it to overcome this issue), but I want to know if this kind of marc is valid?

=LDR 00638nam 2200181uu 4500
=001 cla-MldNA01
=008 080101s2008\\\|fre||
=040 \\$aMy Provider
=041 0\$afre
=245 10$aThis Subject
=260 \\$aParis$bJ. Doe$c2008
=490 \\$aSome topic
=650 1\$aNarratif, Autre forme
=655 \7$abook$2lcsh
=752 \\$aA Place on earth
=776 \\$dParis: John Doe and Cie, 1973
=856 \2$qtext/html
=856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library

Thanks,

James L.
Re: [CODE4LIB] is this valid marc ?
In my last message, some of my "subfield"s should of course read "indicator". Still digesting lunch.

-dre.

On 5/19/2011 12:37 PM, James Lecard wrote:

I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get from a partner.

The 856 field is split across two lines, causing the ruby library to ignore it (I've patched it to overcome this issue), but I want to know if this kind of marc is valid?

=LDR 00638nam 2200181uu 4500
=001 cla-MldNA01
=008 080101s2008\\\|fre||
=040 \\$aMy Provider
=041 0\$afre
=245 10$aThis Subject
=260 \\$aParis$bJ. Doe$c2008
=490 \\$aSome topic
=650 1\$aNarratif, Autre forme
=655 \7$abook$2lcsh
=752 \\$aA Place on earth
=776 \\$dParis: John Doe and Cie, 1973
=856 \2$qtext/html
=856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library

Thanks,

James L.
Re: [CODE4LIB] is this valid marc ?
Thanks a lot Richard,

So I guess my patch could be ported to the source code of ruby-marc. Let me know if you're interested.

James

2011/5/19 Richard, Joel M

> I'm no MARC expert, but I've learned enough to say that yes, this is valid,
> in that what you're seeing is the $q (Electronic format type) and $u
> (Uniform Resource Identifier) subfields of the 856 field.
>
> http://www.oclc.org/bibformats/en/8xx/856.shtm
>
> You'll see other things when you get multiple authors (creators) on an item,
> or multiple anythings that can occur more than once.
>
> --Joel
>
> Joel Richard
> IT Specialist, Web Services Department
> Smithsonian Institution Libraries | http://www.sil.si.edu/
> (202) 633-1706 | richar...@si.edu
>
> On May 19, 2011, at 12:37 PM, James Lecard wrote:
>
> > I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
> > from a partner.
> >
> > The 856 field is split across two lines, causing the ruby library to
> > ignore it (I've patched it to overcome this issue), but I want to know
> > if this kind of marc is valid?
> >
> > =LDR 00638nam 2200181uu 4500
> > =001 cla-MldNA01
> > =008 080101s2008\\\|fre||
> > =040 \\$aMy Provider
> > =041 0\$afre
> > =245 10$aThis Subject
> > =260 \\$aParis$bJ. Doe$c2008
> > =490 \\$aSome topic
> > =650 1\$aNarratif, Autre forme
> > =655 \7$abook$2lcsh
> > =752 \\$aA Place on earth
> > =776 \\$dParis: John Doe and Cie, 1973
> > =856 \2$qtext/html
> > =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
> >
> > Thanks,
> >
> > James L.
Re: [CODE4LIB] is this valid marc ?
From the MARC documentation [1]:

"Field 856 is repeated when the location data elements vary (the URL in subfield $u or subfields $a, $b, $d, when used). It is also repeated when more than one access method is used, different portions of the item are available electronically, mirror sites are recorded, different formats/resolutions with different URLs are indicated, and related items are recorded."

So it looks like however the URL is handled, a single 856 field should be used to indicate the location [2]. I am not familiar enough with MARC to say how it "should" have been done, but it looks like $q and $u would probably be sufficient (if they're in the same line).

However, since the field is repeatable, the parser shouldn't be choking on it, unless it's choking on it for a sophisticated reason (e.g., "These aren't the subfield tags I expect to be seeing"). It also looks like if $u is provided, the first subfield should indicate access method (in this case "4" for HTTP). Maybe that's what's causing the problem? [3]

Anyway, I think having these two parts of the same URL data on separate lines is definitely Not Right, but I am not sure if it adds up to invalid MARC.

-dre.

[1] http://www.loc.gov/marc/bibliographic/bd856.html
[2] I am not a cataloger. Don't hurt me.
[3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.

On 5/19/2011 12:37 PM, James Lecard wrote:

I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get from a partner.

The 856 field is split across two lines, causing the ruby library to ignore it (I've patched it to overcome this issue), but I want to know if this kind of marc is valid?

=LDR 00638nam 2200181uu 4500
=001 cla-MldNA01
=008 080101s2008\\\|fre||
=040 \\$aMy Provider
=041 0\$afre
=245 10$aThis Subject
=260 \\$aParis$bJ. Doe$c2008
=490 \\$aSome topic
=650 1\$aNarratif, Autre forme
=655 \7$abook$2lcsh
=752 \\$aA Place on earth
=776 \\$dParis: John Doe and Cie, 1973
=856 \2$qtext/html
=856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library

Thanks,

James L.
Re: [CODE4LIB] MARCXML to MODS: 590 Field
Thanks, Karen and Jon! That's what I suspected, but I couldn't find anything on the web about the thought process behind ignoring the 590 altogether. We'll likely end up using a local version of the XSLT to map it to mods:note as you suggested. We simply don't want this information to be lost in our MODS record as we, for example, embed it inside a METS document.

--Joel

On May 19, 2011, at 12:34 PM, Karen Miller wrote:

> Joel,
>
> The 590 is indeed defined for local use, so whatever your local institution
> uses it for should guide your mapping to MODS. There are some examples of
> what it's used for on the OCLC Bibliographic Formats and Standards pages:
>
> http://www.oclc.org/bibformats/en/5xx/590.shtm
>
> Frequently it's used as a note that is specific to a local copy of an item.
> If your institution uses it inconsistently, you might want to just map it to
> mods:note.
>
> Karen
>
> Karen D. Miller
> Monographic/Digital Projects Cataloger
> Bibliographic Services Dept.
> Northwestern University Library
> Evanston, IL
> k-mill...@northwestern.edu
> 847-467-3462
>
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jon
> Stroop
> Sent: Thursday, May 19, 2011 11:07 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] MARCXML to MODS: 590 Field
>
> I'm going to guess that it's because 59x fields are defined for local use:
>
> http://www.loc.gov/marc/bibliographic/bd59x.html
>
> ...but someone from LC should be able to confirm.
> -Jon > > -- > Jon Stroop > Metadata Analyst > Firestone Library > Princeton University > Princeton, NJ 08544 > > Email: jstr...@princeton.edu > Phone: (609)258-0059 > Fax: (609)258-0441 > > http://pudl.princeton.edu > http://diglib.princeton.edu > http://diglib.princeton.edu/ead > http://www.cpanda.org/cpanda > > > > On 05/19/2011 11:45 AM, Richard, Joel M wrote: >> Dear hive-mind, >> >> Does anyone know why the Library of Congress-supplied MARCXML to MODS XSLT > [1] does not handle the MARC 590 Local Notes field? It seems to handle > everything else, not that I've done an exhaustive search... :) >> >> Granted, I could copy/create my own XSLT and add this functionality in > myself, but I'm curious as to whether or not there's some logic behind this > decision to not include it. Logic that I would not naturally understand > since I'm not formally trained as a librarian. >> >> Thanks! >> --Joel >> >> [1] http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3-4.xsl >> >> >> Joel Richard >> IT Specialist, Web Services Department >> Smithsonian Institution Libraries | http://www.sil.si.edu/ >> (202) 633-1706 | richar...@si.edu
Re: [CODE4LIB] is this valid marc ?
I'm no MARC expert, but I've learned enough to say that yes, this is valid, in that what you're seeing is the $q (Electronic format type) and $u (Uniform Resource Identifier) subfields of the 856 field.

http://www.oclc.org/bibformats/en/8xx/856.shtm

You'll see other things when you get multiple authors (creators) on an item, or multiple anythings that can occur more than once.

--Joel

Joel Richard
IT Specialist, Web Services Department
Smithsonian Institution Libraries | http://www.sil.si.edu/
(202) 633-1706 | richar...@si.edu

On May 19, 2011, at 12:37 PM, James Lecard wrote:

> I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get
> from a partner.
>
> The 856 field is split across two lines, causing the ruby library to
> ignore it (I've patched it to overcome this issue), but I want to know
> if this kind of marc is valid?
>
> =LDR 00638nam 2200181uu 4500
> =001 cla-MldNA01
> =008 080101s2008\\\|fre||
> =040 \\$aMy Provider
> =041 0\$afre
> =245 10$aThis Subject
> =260 \\$aParis$bJ. Doe$c2008
> =490 \\$aSome topic
> =650 1\$aNarratif, Autre forme
> =655 \7$abook$2lcsh
> =752 \\$aA Place on earth
> =776 \\$dParis: John Doe and Cie, 1973
> =856 \2$qtext/html
> =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
>
> Thanks,
>
> James L.
[CODE4LIB] is this valid marc ?
I'm using the ruby-marc parser (v0.4.2) to parse some marc files I get from a partner.

The 856 field is split across two lines, causing the ruby library to ignore it (I've patched it to overcome this issue), but I want to know if this kind of marc is valid?

=LDR 00638nam 2200181uu 4500
=001 cla-MldNA01
=008 080101s2008\\\|fre||
=040 \\$aMy Provider
=041 0\$afre
=245 10$aThis Subject
=260 \\$aParis$bJ. Doe$c2008
=490 \\$aSome topic
=650 1\$aNarratif, Autre forme
=655 \7$abook$2lcsh
=752 \\$aA Place on earth
=776 \\$dParis: John Doe and Cie, 1973
=856 \2$qtext/html
=856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library

Thanks,

James L.
Re: [CODE4LIB] MARCXML to MODS: 590 Field
Joel, The 590 is indeed defined for local use, so whatever your local institution uses it for should guide your mapping to MODS. There are some examples of what it's used for on the OCLC Bibliographic Formats and Standards pages: http://www.oclc.org/bibformats/en/5xx/590.shtm Frequently it's used as a note that is specific to a local copy of an item. If your institution uses it inconsistently, you might want to just map it to mods:note. Karen Karen D. Miller Monographic/Digital Projects Cataloger Bibliographic Services Dept. Northwestern University Library Evanston, IL k-mill...@northwestern.edu 847-467-3462 -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jon Stroop Sent: Thursday, May 19, 2011 11:07 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARCXML to MODS: 590 Field I'm going to guess that it's because 59x fields are defined for local use: http://www.loc.gov/marc/bibliographic/bd59x.html ...but someone from LC should be able to confirm. -Jon -- Jon Stroop Metadata Analyst Firestone Library Princeton University Princeton, NJ 08544 Email: jstr...@princeton.edu Phone: (609)258-0059 Fax: (609)258-0441 http://pudl.princeton.edu http://diglib.princeton.edu http://diglib.princeton.edu/ead http://www.cpanda.org/cpanda On 05/19/2011 11:45 AM, Richard, Joel M wrote: > Dear hive-mind, > > Does anyone know why the Library of Congress-supplied MARCXML to MODS XSLT [1] does not handle the MARC 590 Local Notes field? It seems to handle everything else, not that I've done an exhaustive search... :) > > Granted, I could copy/create my own XSLT and add this functionality in myself, but I'm curious as to whether or not there's some logic behind this decision to not include it. Logic that I would not naturally understand since I'm not formally trained as a librarian. > > Thanks! 
> --Joel > > [1] http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3-4.xsl > > > Joel Richard > IT Specialist, Web Services Department > Smithsonian Institution Libraries | http://www.sil.si.edu/ > (202) 633-1706 | richar...@si.edu
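Karen's suggestion to map the 590 to mods:note in a local customization would normally live in the XSLT itself, but the shape of the output can be sketched in Ruby with the stdlib REXML. This is only an illustration: the helper name, the type="local" attribute, and the sample note text are assumptions, not anything prescribed by MODS or by the LC stylesheet:

```ruby
require 'rexml/document'

# Build a mods:note element from the text of a MARC 590 local note.
# (Hypothetical helper; tagging the note with type="local" is our own
# labeling choice, not an LC mapping.)
def local_note_to_mods(note_text)
  note = REXML::Element.new('mods:note')
  note.add_attribute('type', 'local')
  note.text = note_text
  note
end

# Hypothetical 590 $a value:
el = local_note_to_mods('Library copy signed by the author.')
puts el
```

The same element-per-590 approach is what a local copy of the LC stylesheet would produce if you added a template matching datafield[@tag='590'].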
Re: [CODE4LIB] MARCXML to MODS: 590 Field
I'm going to guess that it's because 59x fields are defined for local use: http://www.loc.gov/marc/bibliographic/bd59x.html ...but someone from LC should be able to confirm. -Jon -- Jon Stroop Metadata Analyst Firestone Library Princeton University Princeton, NJ 08544 Email: jstr...@princeton.edu Phone: (609)258-0059 Fax: (609)258-0441 http://pudl.princeton.edu http://diglib.princeton.edu http://diglib.princeton.edu/ead http://www.cpanda.org/cpanda On 05/19/2011 11:45 AM, Richard, Joel M wrote: Dear hive-mind, Does anyone know why the Library of Congress-supplied MARCXML to MODS XSLT [1] does not handle the MARC 590 Local Notes field? It seems to handle everything else, not that I've done an exhaustive search... :) Granted, I could copy/create my own XSLT and add this functionality in myself, but I'm curious as to whether or not there's some logic behind this decision to not include it. Logic that I would not naturally understand since I'm not formally trained as a librarian. Thanks! --Joel [1] http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3-4.xsl Joel Richard IT Specialist, Web Services Department Smithsonian Institution Libraries | http://www.sil.si.edu/ (202) 633-1706 | richar...@si.edu
Re: [CODE4LIB] Seth Godin on The future of the library
On Thu, May 19, 2011 at 8:31 AM, Andreas Orphanides wrote:

> - As Graham says, there's a sunk-cost issue: you're going to prioritize the
> stuff you paid for over free stuff, since you've already invested resources
> in it.

Everybody who believes in sunk costs should learn to play Go, the ancient Japanese game. One of the things that you learn playing Go is to let go (pun intended) of resources already spent unwisely when there are better courses of action.

Wikipedia has a good introductory article on the subject, "Escalation of commitment": http://en.wikipedia.org/wiki/Escalation_of_commitment

--
Luciano Ramalho
programador repentista || stand-up programmer
Twitter: @luciano
Re: [CODE4LIB] Seth Godin on The future of the library
On Thu, May 19, 2011 at 6:24 AM, graham wrote: > 2. It is hard to justify spending time on improving access to free stuff > when the end result would be good for everyone, not just the institution > doing the work (unless it can be kept in a consortium and outside-world > access limited) Why is it hard to justify anything that would be good for everyone? -- Luciano Ramalho programador repentista || stand-up programmer Twitter: @luciano
[CODE4LIB] MARCXML to MODS: 590 Field
Dear hive-mind, Does anyone know why the Library of Congress-supplied MARCXML to MODS XSLT [1] does not handle the MARC 590 Local Notes field? It seems to handle everything else, not that I've done an exhaustive search... :) Granted, I could copy/create my own XSLT and add this functionality in myself, but I'm curious as to whether or not there's some logic behind this decision to not include it. Logic that I would not naturally understand since I'm not formally trained as a librarian. Thanks! --Joel [1] http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3-4.xsl Joel Richard IT Specialist, Web Services Department Smithsonian Institution Libraries | http://www.sil.si.edu/ (202) 633-1706 | richar...@si.edu
Re: [CODE4LIB] Seth Godin on The future of the library
On 5/19/2011 11:01 AM, graham wrote:

Replying to Jonathan's mail rather at random, since several people are saying similar things.

1. 'Free resources can vanish any time.' But so can commercial ones, which is why LOCKSS was created. This isn't an insoluble issue or one unique to free resources.

You missed my point. The difficulty we have of dealing with the "breaking resources" problem is proportional to the number of vendors/sources we are dealing with. Dealing with 10 or 100 vendors is hard; dealing with 1000s of sources is harder. Ignoring free stuff is one easy way of not having to deal with this. (Not necessarily an optimal one!)

I do not disagree that there are huge advantages to free resources, of course! Just trying to analyze some of the practical difficulties, which are not simply irrational prejudices or what have you. Also, I didn't mean to say that any of the challenges are insoluble or unique to free resources.
Re: [CODE4LIB] Seth Godin on The future of the library
There is no such thing as a zero-cost lunch; but there is such a thing as a freedom lunch. I concur with Karen that (once again) much confusion is being generated here by the English language's lamentable use of the same word "free" to mean two such different things. -- Mike. On 19 May 2011 16:01, graham wrote: > Replying to Jonathan's mail rather at random, since several people are > saying similar things. > > 1. 'Free resources can vanish any time.' But so can commercial ones, > which is why LOCKSS was created. This isn't an insoluble issue or one > unique to free resources. > > 2. 'Managing 100s of paid resources is difficult, managing 1000s of free > ones would be impossible'. But why on earth would you try? There are > many specialized free resources, only a few of which are likely to > provide material your particular library wants in its collection. Surely > you would select the ones you want, not least on grounds of reliability. > And on those grounds (longevity and reliability) you would end up using > Gutenberg in preference to any commercial supplier (not that I'm > suggesting you should). Selection of commercial resources is done at > least in part by cost; selection of free ones can be done on more > appropriate grounds. > > 3. 'There is no such thing as a free lunch'. Who said there was? But > resources which can be used freely have advantages over ones that can't. > > Graham
Re: [CODE4LIB] Seth Godin on The future of the library
I wonder if we aren't conflating a diverse set of issues here. - free (no cost) - free and online - free = not peer reviewed - online As Jonathan notes, we already face problems with online materials, even those we subscribe to. And libraries do take in free hard-copy books in the form of donations (although weeding through those is almost not worth the trouble). In addition, there are free materials like government documents (at least in the US) that are considered quite valuable. So it seems like "free" isn't the big issue here, it's management, selection, etc. kc Quoting Jonathan Rochkind : Another problem with free online resources is not just 'collection selection' but maintenance/support once selected. A resource hosted elsewhere can stop working at any time, which is a management challenge. The present environment is ALREADY a management challenge, of course. But consider the present environment: You subscribe to anywhere from a handful to around 100 separate vendor 'platforms'. Each one can change its interface at any time, or go down at any time, breaking your integration or access to it. When it does, you've got to notice (a hard problem in itself), and then file a support incident with the vendor. This is already a mess we have trouble keeping straight. But. Compare to the idea of hundreds or thousands or more different suppliers hosting free content, each one of which can change its interface or go down at any time, and when you notice (still a hard problem, now even harder because you have more content from more hosts)... what do you do? One solution to this would be free content aggregators which hosted LOTS of free content on one platform (cutting down your number of sources to keep track of and make sure they're working), and additionally, presumably for a fee, offered support services. Another direction would be not relying on remote platforms to host content, but hosting it internally. Which may be more 'business case' feasible with free content than with pay content -- the owners/providers don't want to let us host the pay content locally. But hosting content locally comes with its own expenses; the library needs to invest resources in developing/maintaining or purchasing the software (and hardware) to do that, as well as respond to maintenance issues with the local hosting. In the end, there's no such thing as a free lunch, as usual. "Free" content still isn't free for libraries to integrate with local interfaces and support well, whether that cost comes from internal staffing and other budgeting, or from paying a third party to help. Of course, some solutions are more cost efficient than others; not all are equal. Jonathan
Re: [CODE4LIB] Seth Godin on The future of the library
Replying to Jonathan's mail rather at random, since several people are saying similar things. 1. 'Free resources can vanish any time.' But so can commercial ones, which is why LOCKSS was created. This isn't an insoluble issue or one unique to free resources. 2. 'Managing 100s of paid resources is difficult, managing 1000s of free ones would be impossible'. But why on earth would you try? There are many specialized free resources, only a few of which are likely to provide material your particular library wants in its collection. Surely you would select the ones you want, not least on grounds of reliability. And on those grounds (longevity and reliability) you would end up using Gutenberg in preference to any commercial supplier (not that I'm suggesting you should). Selection of commercial resources is done at least in part by cost; selection of free ones can be done on more appropriate grounds. Graham On 05/19/11 15:44, Jonathan Rochkind wrote: > Another problem with free online resources is not just 'collection > selection' but maintenance/support once selected. A resource hosted > elsewhere can stop working at any time, which is a management challenge. > > The present environment is ALREADY a management challenge, of course. > But consider the present environment: You subscribe to anywhere from a > handful to around 100 separate vendor 'platforms'. Each one can change > its interface at any time, or go down at any time, breaking your > integration or access to it. When it does, you've got to notice (a hard > problem in itself), and then file a support incident with the vendor. > This is already a mess we have trouble keeping straight. But. > > Compare to the idea of hundreds or thousands or more different suppliers > hosting free content, each one of which can change its interface or go > down at any time, and when you notice (still a hard problem, now even > harder because you have more content from more hosts)... what do you do? > > One solution to this would be free content aggregators which hosted LOTS > of free content on one platform (cutting down your number of sources to > keep track of and make sure they're working), and additionally, presumably > for a fee, offered support services. > > Another direction would be not relying on remote platforms to host > content, but hosting it internally. Which may be more 'business case' > feasible with free content than with pay content -- the owners/providers > don't want to let us host the pay content locally. But hosting content > locally comes with its own expenses; the library needs to invest > resources in developing/maintaining or purchasing the software (and > hardware) to do that, as well as respond to maintenance issues with the > local hosting. > > In the end, there's no such thing as a free lunch, as usual. "Free" > content still isn't free for libraries to integrate with local > interfaces and support well, whether that cost comes from internal > staffing and other budgeting, or from paying a third party to help. Of > course, some solutions are more cost efficient than others; not all are > equal. > > Jonathan
[CODE4LIB] Job Posting: Web Developer, Smithsonian Institution Libraries
The Smithsonian Institution Libraries is recruiting for a web developer position. We are in the midst of many interesting projects right now, including working with linked open data, building a new digital library, moving to Drupal, mass-digitization, and other projects. The Libraries serves a broad audience including researchers throughout the Institution – from Art to Zoology – as well as affiliated scientists and curators, students, and the general public. We are a small and friendly department that has a lot of support from management. More information can be found here http://www.sil.si.edu/link/?webdev or on http://www.USAjobs.gov by searching for Job Announcement Number: 11R-LG-296860-MPA-SIL The Smithsonian Institution is an EEO employer. Joel Richard IT Specialist, Web Services Department Smithsonian Institution Libraries | http://www.sil.si.edu/ (202) 633-1706 | richar...@si.edu
Re: [CODE4LIB] wikipedia/author disambiguation
Curious what script you've used that isn't production ready -- I don't think you meant to post in the URL for the JQuery library? On 5/19/2011 10:39 AM, Karen Coyle wrote: This sounds like a great way to "translate" from library forms to wikipedia name forms. But for on-the-fly use I wonder if it wouldn't be more efficient to eliminate the "middle man." Karen, can you say a little about what it took to link library names to WP? Was it a one-step, two-step, etc.? There is a script that I've seen used, although it doesn't seem to be production ready: https://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js One interesting note from the OL experience of linking to WP: generally you need to "re-reverse" the names to get a match: from Twain, Mark to Mark Twain. But for some names that isn't the case: Mao, Tse-Tung. Edward Betts used Wikipedia to determine which names do not get "re-reversed". The OL code for its wikipedia lookup is at: https://github.com/openlibrary/openlibrary/tree/master/openlibrary/catalog/wikipedia It, however, runs against dumps rather than an API. kc
Re: [CODE4LIB] Seth Godin on The future of the library
Another problem with free online resources is not just 'collection selection' but maintenance/support once selected. A resource hosted elsewhere can stop working at any time, which is a management challenge. The present environment is ALREADY a management challenge, of course. But consider the present environment: You subscribe to anywhere from a handful to around 100 separate vendor 'platforms'. Each one can change its interface at any time, or go down at any time, breaking your integration or access to it. When it does, you've got to notice (a hard problem in itself), and then file a support incident with the vendor. This is already a mess we have trouble keeping straight. But. Compare to the idea of hundreds or thousands or more different suppliers hosting free content, each one of which can change its interface or go down at any time, and when you notice (still a hard problem, now even harder because you have more content from more hosts)... what do you do? One solution to this would be free content aggregators which hosted LOTS of free content on one platform (cutting down your number of sources to keep track of and make sure they're working), and additionally, presumably for a fee, offered support services. Another direction would be not relying on remote platforms to host content, but hosting it internally. Which may be more 'business case' feasible with free content than with pay content -- the owners/providers don't want to let us host the pay content locally. But hosting content locally comes with its own expenses; the library needs to invest resources in developing/maintaining or purchasing the software (and hardware) to do that, as well as respond to maintenance issues with the local hosting. In the end, there's no such thing as a free lunch, as usual. "Free" content still isn't free for libraries to integrate with local interfaces and support well, whether that cost comes from internal staffing and other budgeting, or from paying a third party to help. Of course, some solutions are more cost efficient than others; not all are equal. Jonathan On 5/19/2011 9:31 AM, Bill Dueber wrote: My short answer: It's too damn expensive to check out everything that's available for free to see if it's worth selecting for inclusion, and libraries (at least as I see them) are supposed to be curated, not comprehensive.
Re: [CODE4LIB] wikipedia/author disambiguation
This sounds like a great way to "translate" from library forms to wikipedia name forms. But for on-the-fly use I wonder if it wouldn't be more efficient to eliminate the "middle man." Karen, can you say a little about what it took to link library names to WP? Was it a one-step, two-step, etc.? There is a script that I've seen used, although it doesn't seem to be production ready: https://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js One interesting note from the OL experience of linking to WP: generally you need to "re-reverse" the names to get a match: from Twain, Mark to Mark Twain. But for some names that isn't the case: Mao, Tse-Tung. Edward Betts used Wikipedia to determine which names do not get "re-reversed". The OL code for its wikipedia lookup is at: https://github.com/openlibrary/openlibrary/tree/master/openlibrary/catalog/wikipedia It, however, runs against dumps rather than an API. kc Quoting Karen Coombs : Graham, I'd advocate using WorldCat Identities to get to the appropriate url for dbpedia. -- Karen Coyle kco...@kcoyle.net http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet
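The "re-reverse" step Karen describes can be sketched in a few lines. Note the exception set below is illustrative only; as she says, OL derived its real exception list from Wikipedia itself:

```python
# Sketch: turn a library-style inverted heading ("Twain, Mark") into the
# direct-order form Wikipedia uses ("Mark Twain"). Some headings, such as
# "Mao, Tse-tung", must NOT be re-reversed. The exception set here is a
# hypothetical stand-in for the list OL mined from Wikipedia.

KEEP_INVERTED = {"Mao, Tse-tung"}  # illustrative, not the real list

def wikipedia_form(heading: str) -> str:
    heading = heading.strip().rstrip(".")
    if heading in KEEP_INVERTED or "," not in heading:
        return heading  # single-word names and known exceptions pass through
    surname, _, forename = heading.partition(",")
    return f"{forename.strip()} {surname.strip()}"

print(wikipedia_form("Twain, Mark"))    # -> Mark Twain
print(wikipedia_form("Mao, Tse-tung"))  # -> Mao, Tse-tung (stays inverted)
```

A real implementation would also need to strip dates and relator terms from the heading before matching, which this sketch ignores.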
Re: [CODE4LIB] wikipedia/author disambiguation
In addition to the approaches you note, it might be worth investigating this tool that came up in a thread just a few days ago on this list: http://wikipedia-miner.sourceforge.net/ I don't think anybody's done enough with this yet to be sure what will work best; you're going to have to experiment and let us know. VIAF/OCLC services are presumably using some sort of statistical analysis/text mining approaches under the hood; wikipedia-miner is using such approaches but giving you the code in open source too, if you're curious exactly what they're doing. I suspect statistical approaches like wikipedia-miner uses are likely to be more productive than pure "parsing" approaches considering only one record at a time in isolation. But writing your own statistical-analysis algorithms is probably more work than you want, especially when wikipedia-miner and/or VIAF/OCLC services already exist. If you don't do statistical analysis of the corpus, and do end up actually trying to search wikipedia directly, then I suspect dbpedia is a much more convenient endpoint than trying to screen-scrape HTML wikipedia. That's pretty much what dbpedia is for. But these are all just my guesses, not informed by any work I've done. Jonathan On 5/19/2011 5:40 AM, graham wrote: I need to be able to take author data from a catalogue record and use it to look up the author on Wikipedia on the fly. So I may have birth date and possibly year of death in addition to (one spelling of) the name, the title of one book the author wrote etc. I know there are various efforts in progress that will improve the current situation, but as things stand at the moment what is the best* way to do this? 1. query wikipedia for as much as possible, parse and select the best fitting result 2. go via dbpedia/freebase and work back from there 3. use VIAF and/or OCLC services 4. Other?
(I have no experience of 2-4 yet :-( Thanks Graham * 'best' being constrained by: - need to do this in real-time - need to avoid dependence on services which may be taken away or charged for - being able to justify to librarians as reasonably accurate :-)
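For option 2, a starting point could look something like the sketch below: build a SPARQL query against DBpedia's public endpoint that matches an author by name plus birth year. The property names (dbo:birthDate, foaf:isPrimaryTopicOf) are the usual DBpedia ontology terms, but treat the exact modelling as an assumption to verify, and note graham's caveat that any free endpoint may be taken away or changed:

```python
# Sketch: disambiguate an author against DBpedia by combining the name
# with a birth year, then recover the Wikipedia page URL. Builds the
# request URL only; actually issuing it depends on the endpoint being up.
from urllib.parse import urlencode

ENDPOINT = "https://dbpedia.org/sparql"

def author_query(name: str, birth_year: int) -> str:
    # Name match here is exact-label; a production version would want
    # fuzzier matching (bif:contains, alternate labels, etc.).
    return f"""
    SELECT ?person ?page WHERE {{
      ?person rdfs:label "{name}"@en ;
              dbo:birthDate ?born ;
              foaf:isPrimaryTopicOf ?page .
      FILTER (year(?born) = {birth_year})
    }} LIMIT 5
    """

def request_url(name: str, birth_year: int) -> str:
    params = {
        "query": author_query(name, birth_year),
        "format": "application/sparql-results+json",
    }
    return ENDPOINT + "?" + urlencode(params)

url = request_url("Mark Twain", 1835)
print(url[:80])
```

Filtering on a known book title (via dbo:author on the work) would be a natural second pass when birth dates are missing from the catalogue record.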
Re: [CODE4LIB] Seth Godin on The future of the library
On 2011-05-18 20:30, Eric Hellman wrote: Exactly. I apologize if my comment was perceived as coy, but I've chosen to invest in the possibility that Creative Commons licensing is a viable way forward for libraries, authors, readers, etc. Here's a link to the last of a five-part series on open-access ebooks. I hope it inspires work in the code4lib community to make libraries more friendly to free stuff. http://go-to-hellman.blogspot.com/2011/05/open-access-ebooks-part-5-changing.html Here's a post from a Jewish Studies scholar about his own decision to self-publish under a CC license http://www.rationalistjudaism.com/2011/05/changing-world-of-jewish-scholarship.html -- Yitzchak Schaffer
Re: [CODE4LIB] wikipedia/author disambiguation
Graham, I'd advocate using WorldCat Identities to get to the appropriate url for dbpedia. Each Identity record has a wikipedia element in it that you could use to link to either Wikipedia or dbpedia. If you want to see an example of this in action you can check out the Author Info demo I did for code4lib 2010 here - http://www.librarywebchic.net/mashups/author_info/info_about_this_author.php?OCLCNum=32939031 The code for this demo is available for download at - http://www.worldcat.org/devnet/code/devnetDemos/trunk/ You'll want the author_info folder and identity_info.php Karen Karen A. Coombs Product Manager OCLC Developer Network coom...@oclc.org On Thu, May 19, 2011 at 4:40 AM, graham wrote: > I need to be able to take author data from a catalogue record and use it > to look up the author on Wikipedia on the fly. So I may have birth date > and possibly year of death in addition to (one spelling of) the name, > the title of one book the author wrote etc. > > I know there are various efforts in progress that will improve the > current situation, but as things stand at the moment what is the best* > way to do this? > > 1. query wikipedia for as much as possible, parse and select the best > fitting result > > 2. go via dbpedia/freebase and work back from there > > 3. use VIAF and/or OCLC services > > 4. Other? > > (I have no experience of 2-4 yet :-( > > > Thanks > Graham > * 'best' being constrained by: > - need to do this in real-time > - need to avoid dependence on services which may be taken away > or charged for > - being able to justify to librarians as reasonably accurate :-) >
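Extracting the wikipedia element Karen mentions could be sketched like this. She confirms the element exists in each Identity record, but the exact XML shape below (a flat `<wikipedia>` child) is my assumption, so the traversal would need adjusting to whatever the real record looks like:

```python
# Sketch: pull wikipedia element(s) out of a WorldCat Identities record.
# The record structure in `sample` is hypothetical; only the existence of
# a wikipedia element is taken from Karen's description.
from xml.etree import ElementTree as ET

def wikipedia_links(identity_xml: str) -> list[str]:
    root = ET.fromstring(identity_xml)
    # iter() walks the whole tree, so nesting depth doesn't matter.
    return [el.text for el in root.iter("wikipedia") if el.text]

sample = """
<Identity>
  <nameInfo type="personal"><rawName><suba>Twain, Mark</suba></rawName></nameInfo>
  <wikipedia>Mark_Twain</wikipedia>
</Identity>
"""
print(wikipedia_links(sample))
```

From there the article title maps straightforwardly onto either a Wikipedia URL or a dbpedia.org/resource/ URI.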
Re: [CODE4LIB] Seth Godin on The future of the library
My short answer: It's too damn expensive to check out everything that's available for free to see if it's worth selecting for inclusion, and libraries (at least as I see them) are supposed to be curated, not comprehensive. My long answer: The most obvious issue is that the OPAC is traditionally a listing of "holdings," and free ebooks aren't "held" in any sense that helps disambiguate them from any other random text on the Internet. Certainly the fact that someone bothered to transform it into ebook form isn't indicative of anything. Not everything that's available can be cataloged. I see "stuff we paid for" not as an arbitrary bias, but simply as a very, very useful way to define the borders of the library. "Free" is a very recent phenomenon, but it just adds more complexity to the existing problem of deciding what publications are within the library's scope. Library collections are curated, and that curation mission is not simply a side effect of limited funds. The filtering process that goes into deciding what a library will hold is itself an incredibly valuable aspect of the collection. Up until very recently, the most important pre-purchase filter was the fact that some publisher thought she could make some money by printing text on paper, and by doing so also allocated resources to edit/typeset/etc. For a traditionally-published work, we know that real person(s), with relatively transparent goals, has already read it and decided it was worth the gamble to sink some fixed costs into the project. It certainly wasn't a perfect filter, but anyone who claims it didn't add enormous information to the system is being disingenuous. Now that (e)publishing and (e)printing costs have nosedived toward $0.00, that filter is breaking. Even print-on-paper costs have been reduced enormously.
But going through the slush pile, doing market research, filtering, editing, marketing -- these things all cost money, and for the moment the traditional publishing houses still do them better and more efficiently than anyone else. And they expect to be paid for their work, and they should.

There's a tendency in the library world, I think, to dismiss the value of non-academic professionals and assume random people or librarians can just do the work (see also: web-site development, usability studies, graphic design, instructional design and development), but successful publishers are incredibly good at what they do, and the value they add shouldn't be dismissed (although their business practices should certainly be under scrutiny).

Of course, I'm not differentiating free (no money) and free (CC0). One can imagine models where the functions of the publishing house move to a work-for-hire model and the final content is released CC0, but it's not clear who's going to pay them for their time.

-Bill-

On Thu, May 19, 2011 at 8:04 AM, Andreas Orphanides <andreas_orphani...@ncsu.edu> wrote:
> On 5/19/2011 7:36 AM, Mike Taylor wrote:
>> I dunno. How do you assess the whole realm of proprietary stuff?
>> Wouldn't the same approach work for free stuff?
>>
>> -- Mike.
>
> A fair question. I think there's maybe at least two parts: marketing and
> bundling.
>
> Marketing is of course not ideal, and likely counterproductive on a number
> of measures, but at least when a product is marketed you get sales demos.
> Even if they are designed to make a product or collection look as good as
> possible, it still gives you some sense of scale, quality, content, etc.
>
> I think bundling is probably more important. It's a challenge in the
> free-stuff realm, but for open access products where there is bundling (for
> instance, Directory of Open Access Journals) I think you are likely to see
> wider adoption.
>
> Bundling can of course be both good (lower management cost) and bad
> (potentially diluting collection quality for your target audience). But when
> there isn't any bundling, which is true for a whole lot of free stuff,
> you've got to locally gather a million little bits into a collection.
>
> I guess what's really happening in the bundling case, at least for free
> content, is that collection and quality management activities are being
> "outsourced" to a third party. This is probably why DOAJ gets decent
> adoption. But of course, this still requires SOME group to be willing to
> perform these activities, and for the content/package to remain free, they
> either have to get some kind of outside funding (e.g., donations) or be
> willing to volunteer their services.

--
Bill Dueber
Library Systems Programmer
University of Michigan Library
Re: [CODE4LIB] Seth Godin on The future of the library
On 5/19/2011 7:36 AM, Mike Taylor wrote:
> I dunno. How do you assess the whole realm of proprietary stuff?
> Wouldn't the same approach work for free stuff?
>
> -- Mike.

A fair question. I think there's maybe at least two parts: marketing and bundling.

Marketing is of course not ideal, and likely counterproductive on a number of measures, but at least when a product is marketed you get sales demos. Even if they are designed to make a product or collection look as good as possible, it still gives you some sense of scale, quality, content, etc.

I think bundling is probably more important. It's a challenge in the free-stuff realm, but for open access products where there is bundling (for instance, Directory of Open Access Journals) I think you are likely to see wider adoption.

Bundling can of course be both good (lower management cost) and bad (potentially diluting collection quality for your target audience). But when there isn't any bundling, which is true for a whole lot of free stuff, you've got to locally gather a million little bits into a collection.

I guess what's really happening in the bundling case, at least for free content, is that collection and quality management activities are being "outsourced" to a third party. This is probably why DOAJ gets decent adoption. But of course, this still requires SOME group to be willing to perform these activities, and for the content/package to remain free, they either have to get some kind of outside funding (e.g., donations) or be willing to volunteer their services.
Re: [CODE4LIB] Seth Godin on The future of the library
On 19 May 2011 12:31, Andreas Orphanides wrote:
> - I think there's a fear of a slippery slope and/or information overload:
>   How do you assess the whole realm of freely-available stuff?

I dunno. How do you assess the whole realm of proprietary stuff? Wouldn't the same approach work for free stuff?

-- Mike.
Re: [CODE4LIB] Seth Godin on The future of the library
Quoting Karen Coyle, 05/19/11 1:32 AM:
> Eric,
>
> In what ways do you think that libraries today are not friendly to free stuff?
>
> kc

From my own (rather limited) experience, I think collection developers see free/open source/open access stuff as a bit of a management challenge:

- As Graham says, there's a sunk-cost issue: you're going to prioritize the stuff you paid for over free stuff since you've already invested resources in it.

- I think there's a fear of a slippery slope and/or information overload: How do you assess the whole realm of freely-available stuff? How do you prioritize it? How do you ingest it? How do you find the staff energy to maintain all the records? How do you know when to stop? There's also the possibility of drowning out your core collection strengths with material that's irrelevant to your main users, unless you spend a lot of time and energy selecting carefully.

- I imagine there's also the lingering perception of getting what you pay for in many minds: it may be perceived that free stuff simply isn't of sufficient quality to include in a high-profile collection. If you do want to vet the free stuff you add to the collection, there's more staff cost.

I am sure there are other perceived challenges. I'm curious to see what Eric has to say; he's way more savvy on this kind of thing than I am, that's for sure.

-Dre.
[CODE4LIB] Materio and modules
Hi,

After about a year of development, we (a hospital library in Sweden) have published some programs that might be of interest for other libraries. They include:

Materio - a publication platform which gives a common login system, where one can install modules (programs) which do stuff. Modules can be installed and upgraded on the fly for a (hopefully) zero-downtime environment. Modules can have separate data layers so that multiple libraries can use one and the same module.

Modules we have created so far:

Article harvester - aggregates published articles and presents users with new articles each week. We use it for academic coverage for doctors. It's easy for subscribers and gives them just the new stuff published.

Little Boxes CMS - a CMS which can publish just about anything, but specialises in resources with dedicated link boxes, with file upload capability and a WYSIWYG interface. Aimed to be quick and easy for administrators to work with. Functions approximately like iGoogle or Netvibes. (You can try it out at http://demo.fabicutv.com).

Boing - IP-sensitive links. Can create permanent links which can point to different places according to caller IP. You can, for example, create a link called "Your library catalogue" that goes to the regional library catalogue (depending on caller IP).

Materio and modules are translatable, and are currently translated to English and Swedish. In the works is an OpenURL resolver with integrated A-Z (journal) list.

Everything is licensed under AGPL 3 and created using PHP, MySQL and jQuery. If you wanna help out with development, please do.

Materio and modules: http://materio.fabicutv.com/wiki/doku.php

Happy trails,

Tony Mattsson
IT-Librarian
Landstinget Dalarna Bibliotek och informationscentral
http://www.ldbib.se
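[Ed.: the idea behind Boing -- one stable link resolved to different targets by caller IP -- can be sketched in a few lines. Boing itself is PHP; this Python sketch is only an illustration of the technique, and the network ranges, link name, and target URLs below are invented examples.]

```python
# Minimal sketch of an IP-sensitive permanent link, in the spirit of Boing:
# one stable link name, resolved to different target URLs by caller IP.
# All ranges and URLs here are made-up examples.
import ipaddress

ROUTES = {
    'catalogue': [
        (ipaddress.ip_network('10.1.0.0/16'), 'http://catalogue.region-a.example/'),
        (ipaddress.ip_network('10.2.0.0/16'), 'http://catalogue.region-b.example/'),
    ],
}
DEFAULT_TARGET = 'http://catalogue.example/'

def resolve(link_name, caller_ip, routes=ROUTES, default=DEFAULT_TARGET):
    """Pick the redirect target for this link based on the caller's IP."""
    addr = ipaddress.ip_address(caller_ip)
    for network, target in routes.get(link_name, []):
        if addr in network:
            return target
    return default
```

In a web app this would sit behind the permanent URL and issue an HTTP redirect to whatever `resolve()` returns.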
[CODE4LIB] wikipedia/author disambiguation
I need to be able to take author data from a catalogue record and use it to look up the author on Wikipedia on the fly. So I may have birth date and possibly year of death in addition to (one spelling of) the name, the title of one book the author wrote, etc.

I know there are various efforts in progress that will improve the current situation, but as things stand at the moment what is the best* way to do this?

1. query wikipedia for as much as possible, parse and select the best fitting result
2. go via dbpedia/freebase and work back from there
3. use VIAF and/or OCLC services
4. Other?

(I have no experience of 2-4 yet :-( )

Thanks
Graham

* 'best' being constrained by:
- need to do this in real-time
- need to avoid dependence on services which may be taken away or charged for
- being able to justify to librarians as reasonably accurate :-)
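[Ed.: option 1 above -- query Wikipedia, then pick the best-fitting result -- can be sketched against the MediaWiki search API. The scoring heuristic below (count how many known life dates appear in the result snippet) is invented for illustration and would need tuning against real catalogue data.]

```python
# Sketch of option 1: search the MediaWiki API with the author's name,
# then score each hit by how many known life dates appear in its snippet.
# The scoring heuristic is a made-up example, not a tested algorithm.
import json
import urllib.parse
import urllib.request

def score_candidate(snippet, birth=None, death=None):
    """Crude relevance score: +1 for each known life date found in the snippet."""
    score = 0
    for year in (birth, death):
        if year is not None and str(year) in snippet:
            score += 1
    return score

def best_wikipedia_match(name, birth=None, death=None):
    """Return the title of the search hit that best matches the life dates."""
    params = urllib.parse.urlencode({
        'action': 'query', 'list': 'search', 'format': 'json',
        'srsearch': name, 'srlimit': 10,
    })
    url = 'https://en.wikipedia.org/w/api.php?' + params
    with urllib.request.urlopen(url) as resp:
        hits = json.load(resp)['query']['search']
    if not hits:
        return None
    best = max(hits, key=lambda h: score_candidate(h.get('snippet', ''), birth, death))
    return best['title']
```

This keeps everything real-time and dependent only on Wikipedia itself, which speaks to the "services which may be taken away" constraint, though disambiguating common names on snippet text alone will need more signals (book title, VIAF cross-checks) to satisfy the accuracy constraint.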
Re: [CODE4LIB] Seth Godin on The future of the library
Not replying for Eric, but I hope he doesn't mind me butting in too...

As a newcomer to (academic) libraries from a software background, some of the things that first struck me were:

1. The amount of money spent on non-free stuff means it has to be emphasized over free stuff in publicity to try to get the usage to justify the spend.

2. It is hard to justify spending time on improving access to free stuff when the end result would be good for everyone, not just the institution doing the work (unless it can be kept in a consortium and outside-world access limited).

3. Bizarre (to me) academic attitudes to free stuff feed through to libraries: many academics seem to feel that wikipedia should be blocked rather than improved, for example.

Graham

On 05/19/11 06:30, Karen Coyle wrote:
> Quoting Eric Hellman:
>
>> Exactly. I apologize if my comment was perceived as coy, but I've
>> chosen to invest in the possibility that Creative Commons licensing is
>> a viable way forward for libraries, authors, readers, etc. Here's a
>> link to the last of a 5-part series on open-access ebooks. I hope it
>> inspires work in the code4lib community to make libraries more
>> friendly to free stuff.
>
> Eric,
>
> In what ways do you think that libraries today are not friendly to free
> stuff?
>
> kc
>
>> http://go-to-hellman.blogspot.com/2011/05/open-access-ebooks-part-5-changing.html
>>
>> On May 18, 2011, at 7:20 PM, David Friggens wrote:
>>>> Some ebooks, in fact some of the greatest ever written, already cost less
>>>> than razor blades.
>>> Do you mean ones not under copyright?
>>
>> Those, plus Creative Commons etc.