Re: [CODE4LIB] marc21 and usmarc

2009-01-23 Thread Till Kinstler

Ya'aqov Ziso wrote:

 MAB
 German MARC


Just to note:
MAB is (unfortunately) not a German MARC; it is structurally and
semantically totally different from any MARC dialect. That makes MAB
data really hard to work with, because all the nice open source tools
for processing bibliographic records (MARC::Record, marc4j, and software
built on them, like solrmarc ...) are hard to use in applications for
German libraries. There are no nice open source programming libraries
for handling MAB...
There are ongoing efforts to promote MARC 21 in Germany. There is even a
resolution by the German library standardization board, dating back to
2004, to switch to MARC 21 as the official interlibrary exchange format.
But in practice, MARC 21 doesn't play any role yet in Germany... That
makes it really hard to promote Open Source solutions that focus
primarily on MARC data (not so much for technical reasons).


--
Till Kinstler
Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG)
Platz der Göttinger Sieben 1, D 37073 Göttingen
kinst...@gbv.de, +49 (0) 551 39-13431, http://www.gbv.de


Re: [CODE4LIB] marc21 and usmarc

2009-01-23 Thread Brown, Alan
At 22 January 2009 16:28 Eric Lease Morgan wrote:

 Does anybody here know the difference between MARC21 and USMARC?
 
 I am munging sets of MARC bibliographic data from a III catalog with
 holdings data from the same. I am using MARC::Batch to read my bib'
 data (with both strict and warnings turned off), insert 853 and 863
 fields, and writing the data using the as_usmarc method. Therefore, I
 think I am creating USMARC files. I can then use marcdump to... dump
 the records. It returns 0 errors. 

Eric, this isn't an encoding thing, is it? I know that a number of III
catalogues still encode their diacritics using the MARC8 version of
USMARC. We have changed ours to Unicode now, but we did have an issue with
the catalogue outputting Unicode records that weren't tagged as such in
the leader and so couldn't be identified as proper MARC21 (the current
version of USMARC). III have solved this with their latest release. This
issue had me scratching my head over a lot of my MARC::Record scripts,
which generally failed quite spectacularly.

regards

Alan Brown
-- 
Alan Brown
Library Systems Liaison Officer
Resource Services
Bury Libraries
Textile Hall
Manchester Rd
Bury
BL9 0DG
Tel 0161 253 5877
Fax 0161 253 6003
http://www.bury.gov.uk/libraries
http://library.bury.gov.uk


Re: [CODE4LIB] marc21 and usmarc

2009-01-23 Thread Eric Lease Morgan
On 1/23/09 4:39 AM, Brown, Alan a.br...@bury.gov.uk wrote:

 Eric, This isn't an encoding thing is it? I know that a number of III
 catalogues still encode their diacritics using the MARC8 version of
 USMARC. We have changed ours to Unicode now, but we did have an issue of
 the catalogue outputting unicode records that weren't tagged as such in
 the leader and so couldn't be identified as proper MARC21 (current
 version of USMARC). III have solved this with their latest release. This
 issue had me scratching my head with a lot of my MARC::Record scripts,
 but generally they failed quite spectacularly.


Actually, I believe I am suffering from a number of different types of
errors in my MARC data: 1) encoding issues (MARC8 versus UTF-8), 2)
syntactical errors (lack of periods, invalid choices of indicators, etc.),
and 3) incorrect data types (strings entered into fields denoted for
integers, etc.). Just about the only thing I haven't encountered is
structural errors such as an invalid leader, and this doesn't even take
into account possible data entry errors (author is Franklin when Twain was
entered).

Yes, I do have an encoding issue. All of my incoming records are in MARC8.
I'm not sure, but I think the Primo tool expects UTF-8. I can easily update
the encoding bit (change leader position 09 from blank to a), but this does
not change any actual encoding in the bibliographic section of my data.
Consequently, after updating the encoding bit and looping through my munged
data, MARC::Record chokes on records that are flagged as UTF-8 but still
contain MARC8 characters, with the following error:

  utf8 \xE8 does not map to Unicode at
  /usr/lib/perl5/5.8.8/i686-linux/Encode.pm line 166.

Upon looking at the raw MARC, I see that the offending record includes the
word Münich. What can I do to transform MARC8 data into UTF-8? What can I do
to trap the error above, and skip these invalid records?

-- 
Eric Lease Morgan
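
For the "trap and skip" half of the question above, here is a minimal sketch in Python rather than Perl, so it is an illustration and not MARC::Record's API. It assumes the file is a plain concatenation of ISO 2709 records, each ending in the 0x1D record terminator, and that every record has already been flagged as UTF-8. The function names are made up for the example. In MARC-8 a diaeresis is the single byte 0xE8 placed before the letter it modifies, and that byte followed by a plain letter is not a valid UTF-8 sequence, so a strict decode should fail on records like the one described above.

```python
RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record byte


def split_records(data: bytes):
    """Yield raw records from a concatenated MARC file (assumed layout)."""
    for chunk in data.split(RECORD_TERMINATOR):
        if chunk:  # skip the empty piece after the last terminator
            yield chunk + RECORD_TERMINATOR


def partition_by_utf8(data: bytes):
    """Return (decodable, undecodable) lists of raw records.

    A record lands in the second list when a strict UTF-8 decode fails,
    which is what happens when MARC-8 bytes sit in a record flagged UTF-8.
    """
    good, bad = [], []
    for raw in split_records(data):
        try:
            raw.decode("utf-8", errors="strict")
        except UnicodeDecodeError:
            bad.append(raw)  # set aside for conversion instead of crashing
        else:
            good.append(raw)
    return good, bad
```

The records in the second list are the ones that still need a real MARC-8 to UTF-8 conversion; this check only detects them, it does not repair them.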


Re: [CODE4LIB] marc21 and usmarc

2009-01-23 Thread Tod Olson

On Jan 23, 2009, at 5:52 AM, Eric Lease Morgan wrote:


 Yes, I do have an encoding issue. All of my incoming records are in MARC8.
 I'm not sure, but I think the Primo tool expects UTF-8. I can easily update
 the encoding bit (change leader position 09 from blank to a), but this does
 not change any actual encoding in the bibliographic section of my data.
 Consequently, after updating the encoding bit and looping through my munged
 data MARC::Record chokes on records with the following error where UTF-8 is
 denoted but include MARC8 characters:

   utf8 \xE8 does not map to Unicode at
   /usr/lib/perl5/5.8.8/i686-linux/Encode.pm line 166.

 Upon looking at the raw MARC, I see that the offending record includes the
 word Münich. What can I do to transform MARC8 data into UTF-8? What can I do
 to trap the error above, and skip these invalid records?



We've had good luck with the yaz-marcdump utility that's included with
the YAZ toolkit. We're using it to convert our exported Horizon
records from MARC8 to UTF-8 before we import into AquaBrowser. The
tool is easy to compile, blindingly fast, forgiving of common MARC
errors, and changes the coding correctly. It's been serving us well.


-Tod

Tod Olson t...@uchicago.edu
Systems Librarian
University of Chicago Library
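
A rough sketch of what such a conversion call might look like when driven from Python. The option set (-f, -t, -o marc, and -l 9=97 to set leader position 09 to "a") is taken from the YAZ documentation as best I recall it, so please check the yaz-marcdump man page for your version. The file names and function names here are hypothetical, and this is not necessarily the exact invocation used for the Horizon export described above.

```python
import subprocess


def marc8_to_utf8_command(source_path):
    """Build the yaz-marcdump argument list (flags assumed from the YAZ docs)."""
    return [
        "yaz-marcdump",
        "-f", "MARC-8",   # encoding of the incoming records
        "-t", "UTF-8",    # encoding to write
        "-o", "marc",     # write ISO 2709 (binary MARC) output
        "-l", "9=97",     # set leader position 09 to character code 97, "a"
        source_path,
    ]


def convert(source_path, target_path):
    """Run the conversion, sending yaz-marcdump's stdout to target_path."""
    with open(target_path, "wb") as out:
        subprocess.run(marc8_to_utf8_command(source_path), stdout=out, check=True)
```

Something like convert("export.mrc", "export.utf8.mrc") would then produce the converted file, assuming yaz-marcdump is installed and on the PATH.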


Re: [CODE4LIB] marc21 and usmarc

2009-01-23 Thread Mark Jordan
Jonathan Rochkind rochk...@jhu.edu wrote:

 A US-MARC/MARC21 record can actually be in MARC-8 encoding OR in UTF-8,
 and there is actually a field (fixed field I think) to declare which
 encoding is used.

leader pos 09

Mark
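
To make that concrete, a small Python sketch of reading the flag from a raw record. It relies on the MARC 21 leader definition, where position 09 is blank for MARC-8 and "a" for UCS/Unicode. The function name is made up for the example. Note that the flag only tells you what the record claims, not what the bytes in the fields actually are.

```python
LEADER_LENGTH = 24  # a MARC 21 leader is always 24 bytes


def declared_encoding(raw: bytes) -> str:
    """Report the character coding scheme a raw MARC record declares."""
    if len(raw) < LEADER_LENGTH:
        raise ValueError("record is shorter than a 24-byte MARC leader")
    flag = raw[9:10]  # leader position 09, counting from 00
    if flag == b"a":
        return "UTF-8"
    if flag == b" ":
        return "MARC-8"
    return "unknown"
```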


Re: [CODE4LIB] marc21 and usmarc

2009-01-23 Thread Kyle Banerjee
 Actually, I believe I am suffering from a number of different types of
 errors in my MARC data: 1) encoding issues (MARC8 versus UTF-8), 2)
 syntactical errors (lack of periods, invalid choices of indicators, etc.),
 3) incorrect data types (strings entered into fields denoted for integers,
 etc.) Just about the only thing I haven't encountered are structural errors
 such as invalid leader, and this doesn't even take into account possible
 data entry errors (author is Franklin when Twain was entered).

This MARC stuff is more confusing than it needs to be. As far as the
original question about the difference between USMARC and MARC21,
there is none for all practical purposes. In the mid 90's, the USMARC
and CANMARC communities tried to eliminate differences between them to
improve standardization. The outcome was called MARC21.

Structurally, it's all the same stuff. The differences they're talking
about resolving between CANMARC and USMARC refer to what MARC tags
correspond with which data fields rather than substantive differences
in structure.

The MARC format itself is just a container, and it does not require
that the fields be numeric -- that title is in 245 is simply a
cataloging practice. Although catalogers always use numbers, the
structure of the MARC format allows other characters to be used.
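
A minimal Python sketch of the container idea, assuming the usual ISO 2709 layout: a 24-byte leader, a directory of 12-byte entries (3-character tag, 4-digit field length, 5-digit start offset), then the field data. The helper names and the "TTL" tag are invented for illustration, and the field data is treated as opaque bytes. Nothing in the parsing step cares whether a tag is numeric.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"


def build_record(fields):
    """Pack (tag, data) pairs into one ISO 2709 style record."""
    directory, body = b"", b""
    for tag, data in fields:
        field = data + FIELD_TERMINATOR
        # directory entry: tag, field length, offset from the base address
        directory += tag.encode("ascii") + b"%04d%05d" % (len(field), len(body))
        body += field
    directory += FIELD_TERMINATOR
    base = 24 + len(directory)       # base address of data
    total = base + len(body) + 1     # +1 for the record terminator
    leader = b"%05dnam a22%05d a 4500" % (total, base)
    return leader + directory + body + RECORD_TERMINATOR


def read_directory(raw):
    """Return (tag, data) pairs by walking the directory of a raw record."""
    base = int(raw[12:17])
    directory = raw[24:base - 1]     # drop the terminator closing the directory
    entries = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length, start = int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length - 1]  # minus terminator
        entries.append((tag, data))
    return entries
```

Round-tripping a record with a "245" field and a "TTL" field through build_record and read_directory returns both tags unchanged. Whether downstream tools accept a non-numeric tag is a separate question.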

 Despite all of the library community's voiced obsession with doing things
 'by the book' according to standards, anyone that's actually tried to work
 with an actually existing large corpus of MARC data finds that it is all
 over the place, and very non-compliant in many ways.

This sums up the problem nicely. For all their carping about detail,
accuracy, and the like, catalogers are not consistent once you get
beyond a few basic metadata fields.

This is because catalogers like to believe they can exert far more
bibliographic control than is realistically possible. As a result,
they have developed hopelessly complex procedures that would cause any
Byzantine ruler to break down in tears.

Have you ever seen the books catalogers use to do their jobs? There's
not just AACR2, but also the Library of Congress Rule Interpretations,
the Subject Cataloging Manual, LCCS, Cutter Tables, code lists for
various fields, CONSER manual, Romanization tables, Bib formats and
standards, and there are a zillion specialized resources. BTW, there
is nothing unusual about using all the resources mentioned above to
catalog a single piece.

If you mention inconsistency to a cataloger, you'll trigger a
monologue on quality control and who isn't doing what properly.
However, you know the system is poorly designed when people who've
been cataloging for more than 10 years can't get it right. In any
case, the consistency is so bad that you're better off running
heuristic procedures on data strings than trusting special purpose
fields. Even fields as basic as encoding level that all catalogers
know are not trustworthy enough to rely on.

Catalogers. Can't live with 'em. Can't shoot 'em.

kyle (ex-cataloger who created literally thousands of original records
in OCLC during a former lifetime)
-- 
--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
baner...@uoregon.edu / 541.359.9599


[CODE4LIB] Promo for free issues of PyMag or php|architect

2009-01-23 Thread Jay Luker
Hi gang,

On a lark I e-mailed Doug Hellmann, EiC at Python Magazine, to ask about the
possibility of a group coupon code for code4lib. Apparently we qualify. :)

Here's the deal:

1) anyone who would like 3 free issues of either PyMag [1] or php|architect
[2] should first create an account on the respective site. For example, [3].


2) Next we need a way to collect the e-mail addresses of those account
holders. I first thought, wiki page, but some folks might balk at that.
Unless anyone has a better suggestion, you can just e-mail me at
lb...@reallywow.com and put the string [zine] in the subject somewhere so
I can filter it.

3) After two weeks I'll send the addresses to Doug at PyMag and he'll
trigger the promo on those accounts.

I also suggested to Doug the idea of some free subscriptions to give away at
the conference along with the usual slew of O'Reilly books. He's checking
with his publisher.

Cheers,

--jay

PS, there *will* be O'Reilly books this year, right? Oh God, say yes. I live
for that raffle.


Re: [CODE4LIB] Promo for free issues of PyMag or php|architect

2009-01-23 Thread Jay Luker
AAaaand the footnotes:

[1] http://pymag.phparch.com/c/
[2] http://www.phparch.com/
[3] https://store-pymag.phparch.com/c/account/new/account/




Re: [CODE4LIB] Dutch Code4Lib

2009-01-23 Thread Hamparian,Don
Do you see any opportunities to partner with them for a European meeting? Or 
is that more trouble than it's worth?


-----Original Message-----
From: Code for Libraries on behalf of Ross Singer
Sent: Thu 1/22/2009 4:06 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Dutch Code4Lib
 
Eric, this is a good point.  I will be at ELAG this year, and I think
Ed Corrado will, too.

Past presentations look to be very in line with Code4lib and, in fact,
it was billed to me as "If you think of Access or Code4Lib but in a
scenic European setting with great beer then you'll have a good idea
of what we are planning" by Ron Davies, one of the coordinators.

-Ross.

On Thu, Jan 22, 2009 at 2:33 PM, Eric Lease Morgan emor...@nd.edu wrote:
 On 1/22/09 1:02 PM, Ed Summers e...@pobox.com wrote:

 Wow, this sounds too good to be true. Perhaps this is premature, but
 do you think there might be interest in hosting a code4lib2010 in the
 Netherlands? (he asks selfishly).

 On another note, there is already a library conference that is apparently
 very similar to the Access tradition and Code4Lib that takes place in
 Europe, and I think it is called European Library Automation Group (ELAG).
 See:

  http://indico.ulib.sk/MaKaC/conferenceDisplay.py?confId=5

 While I would love to have a Code4Lib thang in Europe, maybe there is
 something already in place. This year it is in Bratislava (Slovakia). Next
 year I believe it takes place somewhere in Norway.

 --
 Eric Morgan



[CODE4LIB] My previous email

2009-01-23 Thread Hamparian,Don
My apologies for my previous email. We are thinking about a European-based 
developers network (Hackathon) meeting in Europe this year. So my first thought 
was whether we could work together on a European meeting. So apologies for 
sending it -- I meant to bounce the idea off a few people first, but due to my 
fat thumbs (and issues with Outlook Web Access) you all got it. 

Don 

OCLC Grid Services