Re: [CODE4LIB] marc21 and usmarc
Ya'aqov Ziso wrote:

MAB German MARC

Just to note: MAB is (unfortunately) not a German MARC; it is structurally and semantically entirely different from any MARC dialect. That makes MAB data really hard to work with, because all the nice open source tools for processing bibliographic records (MARC::Record, marc4j, and software built on them such as solrmarc) are hard to use in applications for German libraries. There are no comparable open source programming libraries for handling MAB.

There are ongoing efforts to promote MARC 21 in Germany. There is even a resolution by the German library standardization board, dating back to 2004, to switch to MARC 21 as the official interlibrary exchange format. In practice, however, MARC 21 doesn't play any role in Germany yet. That makes it really hard to promote open source solutions that focus primarily on MARC data (not so much for technical reasons).

--
Till Kinstler
Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG)
Platz der Göttinger Sieben 1, D 37073 Göttingen
kinst...@gbv.de, +49 (0) 551 39-13431, http://www.gbv.de
Re: [CODE4LIB] marc21 and usmarc
At 16:28 on 22 January 2009, Eric Lease Morgan wrote:

Does anybody here know the difference between MARC21 and USMARC? I am munging sets of MARC bibliographic data from a III catalog with holdings data from the same. I am using MARC::Batch to read my bib data (with both strict and warnings turned off), inserting 853 and 863 fields, and writing the data using the as_usmarc method. Therefore, I think I am creating USMARC files. I can then use marcdump to... dump the records. It returns 0 errors.

Eric,

This isn't an encoding thing, is it? I know that a number of III catalogues still encode their diacritics using the MARC8 version of USMARC. We have changed ours to Unicode now, but we did have an issue of the catalogue outputting Unicode records that weren't flagged as such in the leader and so couldn't be identified as proper MARC21 (the current version of USMARC). III have solved this in their latest release. This issue had me scratching my head with a lot of my MARC::Record scripts, but generally they failed quite spectacularly.

regards

Alan Brown
--
Alan Brown
Library Systems Liaison Officer
Resource Services
Bury Libraries
Textile Hall
Manchester Rd
Bury BL9 0DG
Tel 0161 253 5877
Fax 0161 253 6003
http://www.bury.gov.uk/libraries
http://library.bury.gov.uk
Re: [CODE4LIB] marc21 and usmarc
On 1/23/09 4:39 AM, Brown, Alan a.br...@bury.gov.uk wrote:

This isn't an encoding thing, is it? I know that a number of III catalogues still encode their diacritics using the MARC8 version of USMARC. We have changed ours to Unicode now, but we did have an issue of the catalogue outputting Unicode records that weren't flagged as such in the leader and so couldn't be identified as proper MARC21 (the current version of USMARC). III have solved this in their latest release. This issue had me scratching my head with a lot of my MARC::Record scripts, but generally they failed quite spectacularly.

Actually, I believe I am suffering from a number of different types of errors in my MARC data: 1) encoding issues (MARC8 versus UTF-8); 2) syntactical errors (missing periods, invalid choices of indicators, etc.); 3) incorrect data types (strings entered into fields intended for integers, etc.). Just about the only thing I haven't encountered is structural errors such as an invalid leader, and this doesn't even take into account possible data entry errors (the author is Franklin but Twain was entered).

Yes, I do have an encoding issue. All of my incoming records are in MARC8. I'm not sure, but I think the Primo tool expects UTF-8. I can easily update the encoding bit (change leader position 09 from blank to a), but this does not change any actual encoding in the bibliographic section of my data.
Consequently, after updating the encoding bit and looping through my munged data, MARC::Record chokes on records where UTF-8 is declared but MARC8 characters are included, with the following error:

utf8 \xE8 does not map to Unicode at /usr/lib/perl5/5.8.8/i686-linux/Encode.pm line 166.

Upon looking at the raw MARC, I see that the offending record includes the word Münich. What can I do to transform MARC8 data into UTF-8? What can I do to trap the error above and skip these invalid records?

--
Eric Lease Morgan
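[Editorial aside: In Perl the usual answer to "trap and skip" is to wrap the record read in eval {}. The same idea can be sketched in Python with only the standard library; the sample records below are hypothetical, but 0xE8 is the very byte from Eric's error message, and it is not valid UTF-8 on its own.]

```python
# Trap-and-skip sketch: records whose leader claims UTF-8 but whose data
# is really MARC8 will fail to decode; skip them instead of dying.

def keep_valid_utf8(raw_records):
    """Return only the raw records that decode cleanly as UTF-8."""
    good = []
    for raw in raw_records:
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # skip the bad record, as eval {} would let you do in Perl
        good.append(raw)
    return good

records = [
    b"M\xc3\xbcnich",  # valid UTF-8 (u with umlaut)
    b"M\xe8nich",      # bare 0xE8, a MARC8-style byte: invalid as UTF-8
]
print(keep_valid_utf8(records))  # only the first record survives
```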
Re: [CODE4LIB] marc21 and usmarc
On Jan 23, 2009, at 5:52 AM, Eric Lease Morgan wrote:

Yes, I do have an encoding issue. All of my incoming records are in MARC8. I'm not sure, but I think the Primo tool expects UTF-8. I can easily update the encoding bit (change leader position 09 from blank to a), but this does not change any actual encoding in the bibliographic section of my data.
What can I do to transform MARC8 data into UTF-8? What can I do to trap the error above and skip these invalid records?

We've had good luck with the yaz-marcdump utility that's included with the YAZ toolkit. We're using it to convert our exported Horizon records from MARC8 to UTF-8 before we import them into AquaBrowser. The tool is easy to compile, blindingly fast, forgiving of common MARC errors, and changes the encoding correctly. It's been serving us well.

-Tod

Tod Olson t...@uchicago.edu
Systems Librarian
University of Chicago Library
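[Editorial aside: Tod's suggestion amounts to a one-liner. A sketch, assuming yaz-marcdump from the YAZ toolkit is installed; the file names are hypothetical. The -l 9=97 option sets leader position 09 to "a" (ASCII 97), so the converted records also declare the UTF-8 encoding correctly.]

```shell
# Convert MARC8-encoded ISO 2709 records to UTF-8 and fix the leader
# encoding flag at position 09 in one pass.
yaz-marcdump -f marc8 -t utf8 -i marc -o marc -l 9=97 \
    bibs-marc8.mrc > bibs-utf8.mrc
```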
Re: [CODE4LIB] marc21 and usmarc
Jonathan Rochkind rochk...@jhu.edu wrote:

A US-MARC/MARC21 record can actually be in MARC-8 encoding OR in UTF-8, and there is actually a field (a fixed field, I think) to declare which encoding is used.

That's leader position 09.

Mark
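[Editorial aside: leader position 09 is the character coding scheme byte in MARC 21: blank means MARC-8, "a" means UCS/Unicode. A minimal Python sketch of the check; the 24-character sample leader is hypothetical.]

```python
# Report the character encoding a MARC 21 leader claims, per position 09.

def coding_scheme(leader):
    """Inspect leader byte 09: ' ' = MARC-8, 'a' = UTF-8 (UCS/Unicode)."""
    flag = leader[9]
    if flag == "a":
        return "UTF-8"
    if flag == " ":
        return "MARC-8"
    return "unknown ({!r})".format(flag)

leader = "00714cam a2200205 a 4500"  # hypothetical leader; byte 09 is 'a'
print(coding_scheme(leader))  # → UTF-8
```

Remember Eric's caveat, though: flipping this byte only changes what the record claims, not the actual encoding of the data that follows.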
Re: [CODE4LIB] marc21 and usmarc
Eric Lease Morgan wrote:

Actually, I believe I am suffering from a number of different types of errors in my MARC data: 1) encoding issues (MARC8 versus UTF-8); 2) syntactical errors (missing periods, invalid choices of indicators, etc.); 3) incorrect data types (strings entered into fields intended for integers, etc.).

This MARC stuff is more confusing than it needs to be. As for the original question about the difference between USMARC and MARC21: there is none, for all practical purposes. In the mid 1990s, the USMARC and CANMARC communities tried to eliminate the differences between their formats to improve standardization, and the outcome was called MARC21. Structurally, it's all the same stuff. The differences they resolved between CANMARC and USMARC concern which MARC tags correspond with which data fields, rather than substantive differences in structure. The MARC format itself is just a container, and it does not require that the fields be numeric -- the fact that the title is in 245 is simply a cataloging practice. Although catalogers always use numbers, the structure of the MARC format allows other characters to be used.

Despite all of the library community's voiced obsession with doing things 'by the book' according to standards, anyone who has actually tried to work with an actually existing large corpus of MARC data finds that it is all over the place, and very non-compliant in many ways.

This sums up the problem nicely. For all their carping about detail, accuracy, and the like, catalogers are not consistent once you get beyond a few basic metadata fields. This is because catalogers like to believe they can exert far more bibliographic control than is realistically possible.

As a result, they have developed hopelessly complex procedures that would cause any Byzantine ruler to break down in tears. Have you ever seen the books catalogers use to do their jobs? There's not just AACR2, but also the Library of Congress Rule Interpretations, the Subject Cataloging Manual, LCCS, Cutter tables, code lists for various fields, the CONSER manual, romanization tables, Bib Formats and Standards, and a zillion specialized resources. BTW, there is nothing unusual about using all the resources mentioned above to catalog a single piece.

If you mention inconsistency to a cataloger, you'll trigger a monologue on quality control and who isn't doing what properly. However, you know a system is poorly designed when people who've been cataloging for more than 10 years can't get it right. In any case, the consistency is so bad that you're better off running heuristic procedures on data strings than trusting special purpose fields. Even fields as basic as encoding level, which all catalogers know, are not trustworthy enough to rely on.

Catalogers. Can't live with 'em. Can't shoot 'em.

kyle (ex-cataloger who created literally thousands of original records in OCLC during a former lifetime)
--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
baner...@uoregon.edu / 541.359.9599
[CODE4LIB] Promo for free issues of PyMag or php|architect
Hi gang,

On a lark I e-mailed Doug Hellmann, EiC at Python Magazine, to ask about the possibility of a group coupon code for code4lib. Apparently we qualify. :) Here's the deal:

1) Anyone who would like 3 free issues of either PyMag [1] or php|architect [2] should first create an account on the respective site. For example, [3].

2) Next we need a way to collect the e-mail addresses of those account holders. I first thought of a wiki page, but some folks might balk at that. Unless anyone has a better suggestion, you can just e-mail me at lb...@reallywow.com and put the string [zine] in the subject somewhere so I can filter it.

3) After two weeks I'll send the addresses to Doug at PyMag and he'll trigger the promo on those accounts.

I also suggested to Doug the idea of some free subscriptions to give away at the conference along with the usual slew of O'Reilly books. He's checking with his publisher.

Cheers,
--jay

PS, there *will* be O'Reilly books this year, right? Oh God, say yes. I live for that raffle.
Re: [CODE4LIB] Promo for free issues of PyMag or php|architect
AAaaand the footnotes:

[1] http://pymag.phparch.com/c/
[2] http://www.phparch.com/
[3] https://store-pymag.phparch.com/c/account/new/account/
Re: [CODE4LIB] Dutch Code4Lib
Do you see any opportunities to partner with them for a European meeting? Or is that more trouble than it's worth?

-----Original Message-----
From: Code for Libraries on behalf of Ross Singer
Sent: Thu 1/22/2009 4:06 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Dutch Code4Lib

Eric, this is a good point. I will be at ELAG this year, and I think Ed Corrado will be, too. Past presentations look to be very much in line with Code4Lib and, in fact, it was billed to me as "If you think of Access or Code4Lib, but in a scenic European setting with great beer, then you'll have a good idea of what we are planning" by Ron Davies, one of the coordinators.

-Ross.

On Thu, Jan 22, 2009 at 2:33 PM, Eric Lease Morgan emor...@nd.edu wrote:

On 1/22/09 1:02 PM, Ed Summers e...@pobox.com wrote: Wow, this sounds too good to be true. Perhaps this is premature, but do you think there might be interest in hosting a code4lib2010 in the Netherlands? (he asks selfishly).

On another note, there is already a library conference in Europe that is apparently very similar to the Access tradition and Code4Lib; I think it is called the European Library Automation Group (ELAG). See: http://indico.ulib.sk/MaKaC/conferenceDisplay.py?confId=5

While I would love to have a Code4Lib thang in Europe, maybe there is something already in place. This year it is in Bratislava (Slovakia). Next year I believe it takes place somewhere in Norway.

--
Eric Morgan
[CODE4LIB] My previous email
My apologies for my previous email. We are thinking about a European-based developers network (hackathon) meeting in Europe this year, so my first thought was whether we could work together on a European meeting. Apologies for sending it to the whole list -- I meant to bounce the idea off a few people first, but due to my fat thumbs (and issues with Outlook Web Access) you all got it.

Don
OCLC Grid Services