Re: [CODE4LIB] marc-8
> But I had no idea Marc8 allowed escape sequences to temporarily switch
> to a different encoding. Really? Oh my god.

For you young'uns that were "born Unicode" and are a bit foggy on the MARC-8 environment (and all its... intricacies), I did a short write-up a few years ago:

  Coded Character Sets > A Technical Primer for Librarians
  http://rocky.uta.edu/doran/charsets/

Feel free to skip the intro, but the "MARC-8" and "MARC Unicode" sections are short and worth a read. Plus there's a lot of bonus stuff, including "Resources on the Web" (http://rocky.uta.edu/doran/charsets/resources.html) with an emphasis on library automation and the internet environment.

Begging your pardon for the self-promotion,

-- Michael

> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Monday, October 24, 2011 2:14 PM
> To: Code for Libraries
> Cc: Doran, Michael D
> Subject: Re: [CODE4LIB] marc-8
>
> Yeah, but if there's Perl code and Java code to do it, it can't be _that_
> hard to port to ruby, if I could figure out what you need to do to
> get first-class char encoding support in ruby 1.9 anyway.
>
> I mean, you could do it just as a library without that... but it's
> enough trouble that, yeah, I don't want to do it. If the payoff were
> first-class encoding support, same as any other encoding in ruby 1.9,
> that you can use with the built-in tools for converting encodings and
> any library that uses 'em, that would be a bigger benefit.
>
> But I had no idea Marc8 allowed escape sequences to temporarily switch
> to a different encoding. Really? Oh my god.
Re: [CODE4LIB] marc-8
On Oct 24, 2011, at 3:03 PM, Jon Gorman wrote:

> yaz-marcdump -f MARC-8 -t UTF-8 -o marc -l 9=97 marc21.raw >marc21.utf8.raw

This worked great! My version of yaz-marcdump was older and was not doing the trick.

code4lib++

-- Eric
Re: [CODE4LIB] marc-8
Yeah, but if there's Perl code and Java code to do it, it can't be _that_ hard to port to ruby, if I could figure out what you need to do to get first-class char encoding support in ruby 1.9 anyway.

I mean, you could do it just as a library without that... but it's enough trouble that, yeah, I don't want to do it. If the payoff were first-class encoding support, same as any other encoding in ruby 1.9, that you can use with the built-in tools for converting encodings and any library that uses 'em, that would be a bigger benefit.

But I had no idea Marc8 allowed escape sequences to temporarily switch to a different encoding. Really? Oh my god.

On 10/24/2011 3:10 PM, Doran, Michael D wrote:
> Hi Jonathan,
>
>> I tried to figure out how to custom add a new encoding to ruby 1.9 with
>> the idea of adding Marc8 as an actual ruby 1.9 character encoding
>> supported same as any other built in char encoding
>
> Not a trivial undertaking. Remember that the MARC-8 environment allows
> alternate character sets to be invoked within a MARC record using two
> different "escape" methods [1]. Just one of the reasons why you're not
> finding a bunch of these MARC-8 conversion modules, and one for every
> language. ;-)
>
> -- Michael
>
> [1] Technique 1 is unique to MARC-8 and provides access to a small number
> of Greek symbols, subscripts, and superscripts. Technique 2 is based on the
> ANSI X3.41 (ISO 2022) "Code Extension Techniques for Use with 7-bit and
> 8-bit Character Sets" standard. See the MARC 21 Specification for details on
> accessing alternate graphic character sets
> (http://www.loc.gov/marc/specifications/speccharmarc8.html#alternative).
Re: [CODE4LIB] marc-8
Hi Jonathan,

> I tried to figure out how to custom add a new encoding to ruby 1.9 with
> the idea of adding Marc8 as an actual ruby 1.9 character encoding
> supported same as any other built in char encoding

Not a trivial undertaking. Remember that the MARC-8 environment allows alternate character sets to be invoked within a MARC record using two different "escape" methods [1]. Just one of the reasons why you're not finding a bunch of these MARC-8 conversion modules, and one for every language. ;-)

-- Michael

[1] Technique 1 is unique to MARC-8 and provides access to a small number of Greek symbols, subscripts, and superscripts. Technique 2 is based on the ANSI X3.41 (ISO 2022) "Code Extension Techniques for Use with 7-bit and 8-bit Character Sets" standard. See the MARC 21 Specification for details on accessing alternate graphic character sets (http://www.loc.gov/marc/specifications/speccharmarc8.html#alternative).

> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Jonathan Rochkind
> Sent: Monday, October 24, 2011 2:01 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] marc-8
>
> What _ought_ to be easiest of all is getting our ILS's to NEVER export
> Marc8 _ever_ again. UTF8 only.
>
> Sadly, that only ought to be easiest.
>
> But IMO there's no reason any of us should be dealing with Marc8 ever
> again. The only thing that should deal in Marc8 is an ILS, and should
> only input it, NEVER output it, UTF8 only, please!
>
> But this is not the world we live in.
>
> I tried to figure out how to custom add a new encoding to ruby 1.9 with
> the idea of adding Marc8 as an actual ruby 1.9 character encoding
> supported same as any other built in char encoding, but I couldn't
> figure out if that was possible or how to do it. If it was possible to
> do at that low level in ruby 1.9, it might justify the time to do it.
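To make the "escape" point concrete, here is a minimal Python sketch (not from the thread) of why MARC-8 is stateful. It handles only the Technique 1 escapes from footnote [1]: ESC followed by g, b, p, or s, which the MARC-8 specification assigns to Greek symbols, subscripts, superscripts, and the return to the default ASCII set. The function name and the set labels are made up for illustration, and Technique 2 (ISO 2022) sequences are deliberately left untouched, so this is nowhere near a full decoder.

```python
ESC = 0x1B

# Technique 1 final bytes. The labels on the right are illustrative names,
# not identifiers taken from any library.
TECHNIQUE_1 = {
    ord("g"): "greek-symbols",
    ord("b"): "subscript",
    ord("p"): "superscript",
    ord("s"): "ascii",  # back to the default set
}


def split_technique1(data: bytes):
    """Split MARC-8 bytes into (active_set, chunk) runs.

    Only Technique 1 escapes are recognised; anything else, including
    Technique 2 escape sequences, stays in the chunk as raw bytes.
    """
    runs = []
    active = "ascii"
    chunk = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == ESC and i + 1 < len(data) and data[i + 1] in TECHNIQUE_1:
            if chunk:
                runs.append((active, bytes(chunk)))
                chunk = bytearray()
            active = TECHNIQUE_1[data[i + 1]]
            i += 2  # skip ESC and its final byte
            continue
        chunk.append(byte)
        i += 1
    if chunk:
        runs.append((active, bytes(chunk)))
    return runs


# "H", then a subscript "2", then back to ASCII for "O".
print(split_technique1(b"H\x1bb2\x1bsO"))
# [('ascii', b'H'), ('subscript', b'2'), ('ascii', b'O')]
```

The same byte (0x32 here) means different characters depending on which escape came before it, which is the part that makes MARC-8 awkward to bolt onto a byte-at-a-time encoding table.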
Re: [CODE4LIB] marc-8
>>> In Perl, how do I specify MARC-8 when reading (decoding) and writing
>>> (encoding) data?
>>
>> You can't. MARC-8 is a character set that is unknown to the operating
>> system. Your best bet is to convert MARC-8-encoded records into UTF-8.
>
> /me throws his hands up in the air and screams!
>
> Okay. How do I go about converting MARC-8 encoded records into UTF-8? I know
> yaz-marcdump changes the encoding bit in MARC leaders. Does it also convert
> MARC-8 characters to UTF-8? (I guess I could simply try it and see what
> happens.)

I seem to remember there was an older version of yaz-marcdump that seemed a bit buggy (with a certain combination of command-line options it would just change the leader but not convert the encoding). It's also possible I was just working with a script that specified the encoding change but not the leader.

I'd say get the most recent version of yaz (don't use anything in an OS repository) and then follow the docs: http://www.indexdata.com/yaz/doc/yaz-marcdump.html. The first example is what you want:

  yaz-marcdump -f MARC-8 -t UTF-8 -o marc -l 9=97 marc21.raw >marc21.utf8.raw

The -f is the source encoding, the -t is the target encoding, and the -l 9=97 sets leader position 9 to "a" (97 is the decimal code for the character a).

I've typically found this is one of the easier ways to do the character set conversion, although the various Perl modules (if they're recent enough) should be able to handle the conversion as well through the MARC::Charset library. Check the CPAN pages.

Jon Gorman

ps. For the love of all that is good, don't try to do anything in Perl with the raw MARC record to do the encoding change yourself. I've seen someone really screw records up because they altered individual characters, which in turn led to different byte lengths. This caused all sorts of insanity: really weird things happened with MARC parsers that tried to follow the MARC directory (which uses byte offsets and lengths to locate the variable fields).
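The postscript's failure mode can be reproduced in a few lines. The following Python sketch (an illustration, not anything posted to the list) builds a tiny MARC 21 record by hand, then "converts" one accented character with a raw byte replacement. The helper names are hypothetical and the record is a toy, but the layout follows the MARC 21 record structure: a 24-byte leader, 12-byte directory entries (tag, length, starting offset), and fields ending in the field terminator 0x1E. In MARC-8 an acute e is the combining acute 0xE2 followed by "e" (two bytes); in UTF-8 it is "e" followed by U+0301 (three bytes), so the field grows by one byte and the directory no longer matches.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator


def build_record(fields):
    """Assemble a minimal MARC 21 record from (tag, data) byte pairs."""
    directory = b""
    body = b""
    for tag, data in fields:
        data += FT
        # Directory entry: 3-byte tag, 4-digit length, 5-digit start offset.
        directory += b"%s%04d%05d" % (tag, len(data), len(body))
        body += data
    base = 24 + len(directory) + 1  # leader + directory + its terminator
    total = base + len(body) + 1    # plus the record terminator
    leader = b"%05dnam  22%05d   4500" % (total, base)
    return leader + directory + FT + body + RT


def directory_is_consistent(record):
    """Check that every directory entry points at a complete field."""
    base = int(record[12:17])           # base address of data, leader 12-16
    directory = record[24:base - 1]
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        length = int(entry[3:7])
        start = int(entry[7:12])
        field = record[base + start:base + start + length]
        if len(field) != length or not field.endswith(FT):
            return False
    return True


good = build_record([
    (b"245", b"10\x1faCaf\xe2e"),  # "Cafe" with acute, MARC-8 order
    (b"500", b"  \x1faA note"),
])

# Naive in-place "conversion": swap the MARC-8 bytes for UTF-8 bytes
# without touching the leader or the directory.
broken = good.replace(b"\xe2e", "e\u0301".encode("utf-8"))

print(directory_is_consistent(good))    # True
print(directory_is_consistent(broken))  # False
```

A real converter has to re-serialize the record so that the lengths, offsets, and the record length in the leader are recomputed, which is what the MARC libraries and yaz-marcdump do for you.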
Re: [CODE4LIB] marc-8
What _ought_ to be easiest of all is getting our ILS's to NEVER export Marc8 _ever_ again. UTF8 only.

Sadly, that only ought to be easiest.

But IMO there's no reason any of us should be dealing with Marc8 ever again. The only thing that should deal in Marc8 is an ILS, and should only input it, NEVER output it, UTF8 only, please!

But this is not the world we live in.

I tried to figure out how to custom add a new encoding to ruby 1.9 with the idea of adding Marc8 as an actual ruby 1.9 character encoding supported same as any other built in char encoding, but I couldn't figure out if that was possible or how to do it. If it was possible to do at that low level in ruby 1.9, it might justify the time to do it.

On 10/24/2011 2:55 PM, Doran, Michael D wrote:
> Eric,
>
> Sometimes for grandpa Perl stuff -- especially as concerns charsets and/or
> internationalization -- it's worth pinging these lists:
>
> perl4...@perl.org (yes, still alive and kicking)
>
> perl-i...@perl.org (very low traffic list, but some knowledgeable
> subscribers)
>
> -- Michael
Re: [CODE4LIB] marc-8
On 10/24/2011 2:52 PM, Ross Singer wrote:
> On Mon, Oct 24, 2011 at 7:39 PM, Eric Lease Morgan wrote:
>> Okay. How do I go about converting MARC-8 encoded records into UTF-8? I know
>> yaz-marcdump changes the encoding bit in MARC leaders. Does it also convert
>> MARC-8 characters to UTF-8? (I guess I could simply try it and see what
>> happens.)
>
> Yes, it does. It uses yaz-iconv. Theoretically, you could wrap some
> Perl module around that. I've contemplated it for ruby-marc, but then
> it always seems a lot easier to ignore it and delete any emails that
> request it.

Or use jruby, where you can use Marc4J.

Or actually port either the Java or (apparently?) Perl version into ruby; okay, that one is not "easier" than anything in the short term, but in the long term I'd rather have pure ruby than something that relies on an external bash call or a C extension. The latter are invariably going to be annoying and confusing to maintain down the line, in my experience.

But I'm not doing any of these things anytime soon either. So far all my ruby that deals with Marc gets something else to convert it first. (In my largest case, Java Marc4J converts it before it's stored in a stored field in a Solr index, and my ruby only gets it from the stored field in Solr, already converted.)
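For anyone who does go the "external call" route, a wrapper is short. This is a hedged Python sketch, not code from the thread: the function names are invented, it assumes a reasonably recent yaz-marcdump is on the PATH, and the only options it passes are the ones quoted elsewhere in this thread (-f, -t, -o marc, -l 9=97). yaz-marcdump writes the converted records to standard output, so the wrapper redirects that to a file.

```python
import subprocess


def yaz_marc8_to_utf8_command(source_path):
    """Build the yaz-marcdump argument list quoted in this thread."""
    return [
        "yaz-marcdump",
        "-f", "MARC-8",   # source encoding
        "-t", "UTF-8",    # target encoding
        "-o", "marc",     # write ISO 2709 (binary MARC) output
        "-l", "9=97",     # set leader position 9 to "a"
        source_path,
    ]


def convert_file(source_path, target_path):
    """Run yaz-marcdump and save its stdout; raises if the tool fails."""
    with open(target_path, "wb") as out:
        subprocess.run(
            yaz_marc8_to_utf8_command(source_path),
            stdout=out,
            check=True,
        )


print(" ".join(yaz_marc8_to_utf8_command("marc21.raw")))
```

Passing an argument list rather than a shell string avoids quoting problems with odd file names, though it does nothing about the maintenance worry raised above: the script still depends on whatever yaz version happens to be installed.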
Re: [CODE4LIB] marc-8
If I understand correctly, there's some support for this in pymarc as well:

https://github.com/edsu/pymarc/blob/master/pymarc/marc8.py#L22

-Mike

On Mon, Oct 24, 2011 at 14:52, Jonathan Rochkind wrote:
> Woah, there is a library in Perl to do that? Sweet! Okay, now I know two
> languages with such a library, Perl and Java.
>
> Anyone want to write one for ruby? :)
Re: [CODE4LIB] marc-8
Eric,

Sometimes for grandpa Perl stuff -- especially as concerns charsets and/or internationalization -- it's worth pinging these lists:

perl4...@perl.org (yes, still alive and kicking)

perl-i...@perl.org (very low traffic list, but some knowledgeable subscribers)

-- Michael

> -----Original Message-----
> From: Doran, Michael D
> Sent: Monday, October 24, 2011 1:48 PM
> To: 'Code for Libraries'
> Subject: RE: [CODE4LIB] marc-8
>
> > Okay. How do I go about converting MARC-8 encoded records into UTF-8?
>
> In Perl... using the handy MARC::Charset module (tip 'o the hat to Ed
> Summers, and now maintained by Galen Charlton).
>
> -- Michael
Re: [CODE4LIB] marc-8
On Mon, Oct 24, 2011 at 7:39 PM, Eric Lease Morgan wrote:
> Okay. How do I go about converting MARC-8 encoded records into UTF-8? I know
> yaz-marcdump changes the encoding bit in MARC leaders. Does it also convert
> MARC-8 characters to UTF-8? (I guess I could simply try it and see what
> happens.)

Yes, it does. It uses yaz-iconv. Theoretically, you could wrap some Perl module around that. I've contemplated it for ruby-marc, but then it always seems a lot easier to ignore it and delete any emails that request it.

A whole lot easier.

-Ross.
Re: [CODE4LIB] marc-8
Woah, there is a library in Perl to do that? Sweet! Okay, now I know two languages with such a library, Perl and Java.

Anyone want to write one for ruby? :)

On 10/24/2011 2:47 PM, Doran, Michael D wrote:
>> Okay. How do I go about converting MARC-8 encoded records into UTF-8?
>
> In Perl... using the handy MARC::Charset module (tip 'o the hat to Ed
> Summers, and now maintained by Galen Charlton).
>
> -- Michael
Re: [CODE4LIB] marc-8
The only language that I know of with a library for reading Marc8 and converting to another encoding (such as UTF-8) is Java. The Marc4J package will do it. I suppose there may be C libraries too; is yaz written in C?

As Michael suggests, the easiest thing to do (if you're not in Java) is probably to use the 'yaz' tools to convert to UTF-8 before anything else touches it.

If you do end up writing a Marc8 handling library in another language like Perl (presumably you could use the Java code in Marc4J as a guide), please do share! Heh.

On 10/24/2011 2:34 PM, Doran, Michael D wrote:
> Hi Eric,
>
>> In Perl, how do I specify MARC-8 when reading (decoding) and writing
>> (encoding) data?
>
> You can't. MARC-8 is a character set that is unknown to the operating
> system. Your best bet is to convert MARC-8-encoded records into UTF-8.
>
>> ...it is converted to Perl's internal encoding (UTF-8)
>
> As an FTY, UTF-8 is *not* Perl's internal encoding.
>
> -- Michael
Re: [CODE4LIB] marc-8
> Okay. How do I go about converting MARC-8 encoded records into UTF-8?

In Perl... using the handy MARC::Charset module (tip 'o the hat to Ed Summers, and now maintained by Galen Charlton).

-- Michael

> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Eric Lease Morgan
> Sent: Monday, October 24, 2011 1:39 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] marc-8
>
> On Oct 24, 2011, at 2:34 PM, Doran, Michael D wrote:
>
> >> In Perl, how do I specify MARC-8 when reading (decoding) and writing
> >> (encoding) data?
> >
> > You can't. MARC-8 is a character set that is unknown to the operating
> > system. Your best bet is to convert MARC-8-encoded records into UTF-8.
>
> /me throws his hands up in the air and screams!
>
> Okay. How do I go about converting MARC-8 encoded records into UTF-8? I know
> yaz-marcdump changes the encoding bit in MARC leaders. Does it also convert
> MARC-8 characters to UTF-8? (I guess I could simply try it and see what
> happens.)
>
> --
> Eric Morgan
Re: [CODE4LIB] marc-8
> I know yaz-marcdump changes the encoding bit in MARC
> leaders. Does it also convert MARC-8 characters to UTF-8?

Yes. We use it for that purpose all the time.

--Dave

-----
David Walker
Library Web Services Manager
California State University

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eric Lease Morgan
Sent: Monday, October 24, 2011 11:39 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] marc-8

On Oct 24, 2011, at 2:34 PM, Doran, Michael D wrote:

>> In Perl, how do I specify MARC-8 when reading (decoding) and writing
>> (encoding) data?
>
> You can't. MARC-8 is a character set that is unknown to the operating
> system. Your best bet is to convert MARC-8-encoded records into UTF-8.

/me throws his hands up in the air and screams!

Okay. How do I go about converting MARC-8 encoded records into UTF-8? I know yaz-marcdump changes the encoding bit in MARC leaders. Does it also convert MARC-8 characters to UTF-8? (I guess I could simply try it and see what happens.)

--
Eric Morgan
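The "encoding bit" in question is leader position 9 (character coding scheme): in MARC 21 a blank there means MARC-8 and "a" means UCS/Unicode. A small Python sketch, with made-up function names and a made-up sample leader, shows what the -l 9=97 option amounts to and why flipping that byte on its own converts nothing:

```python
def leader_encoding(leader: bytes) -> str:
    """Report the character coding scheme declared in leader position 9."""
    code = leader[9:10]
    if code == b"a":
        return "UCS/Unicode"
    if code == b" ":
        return "MARC-8"
    return "unknown"


def set_leader_byte(leader: bytes, position: int, value: int) -> bytes:
    """Return a copy of the leader with one byte replaced (like -l POS=VALUE)."""
    return leader[:position] + bytes([value]) + leader[position + 1:]


# A made-up 24-byte leader that declares MARC-8 (blank at position 9).
leader = b"00714cam  2200205 a 4500"

print(leader_encoding(leader))                          # MARC-8
print(chr(97))                                          # a
print(leader_encoding(set_leader_byte(leader, 9, 97)))  # UCS/Unicode
```

Changing the flag only changes what the record claims about itself. The field data still has to be transcoded (the -f/-t part), and a record whose leader says "a" while its data is still MARC-8 is exactly the kind of mismatch an older or misconfigured tool could produce.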
Re: [CODE4LIB] marc-8
On Oct 24, 2011, at 2:34 PM, Doran, Michael D wrote:

>> In Perl, how do I specify MARC-8 when reading (decoding) and writing
>> (encoding) data?
>
> You can't. MARC-8 is a character set that is unknown to the operating
> system. Your best bet is to convert MARC-8-encoded records into UTF-8.

/me throws his hands up in the air and screams!

Okay. How do I go about converting MARC-8 encoded records into UTF-8? I know yaz-marcdump changes the encoding bit in MARC leaders. Does it also convert MARC-8 characters to UTF-8? (I guess I could simply try it and see what happens.)

--
Eric Morgan
Re: [CODE4LIB] marc-8
> As an FTY,

Oops, in a hurry. s/FTY/FYI/

> -----Original Message-----
> From: Doran, Michael D
> Sent: Monday, October 24, 2011 1:35 PM
> To: 'Code for Libraries'
> Subject: RE: marc-8
>
> Hi Eric,
>
> > In Perl, how do I specify MARC-8 when reading (decoding) and writing
> > (encoding) data?
>
> You can't. MARC-8 is a character set that is unknown to the operating
> system. Your best bet is to convert MARC-8-encoded records into UTF-8.
>
> > ...it is converted to Perl's
> > internal encoding (UTF-8)
>
> As an FTY, UTF-8 is *not* Perl's internal encoding.
>
> -- Michael
Re: [CODE4LIB] marc-8
Hi Eric,

> In Perl, how do I specify MARC-8 when reading (decoding) and writing
> (encoding) data?

You can't. MARC-8 is a character set that is unknown to the operating system. Your best bet is to convert MARC-8-encoded records into UTF-8.

> ...it is converted to Perl's
> internal encoding (UTF-8)

As an FTY, UTF-8 is *not* Perl's internal encoding.

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/

> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Eric Lease Morgan
> Sent: Monday, October 24, 2011 1:18 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] marc-8
>
> In Perl, how do I specify MARC-8 when reading (decoding) and writing
> (encoding) data?
>
> Character encoding is the bane of my existence. I have learned that when
> reading from a file I ought to specify the type of encoding the file is in
> and decode accordingly, or else. Once read, it is converted to Perl's
> internal encoding (UTF-8) and can be manipulated. Similarly, when writing I
> am expected to specify the encoding. Both the reading (decoding) and the
> writing (encoding) can be done with the Encode module. Here is some code
> illustrating what I'm trying to do with MARC records which are apparently in
> MARC-8:
>
>   # require
>   use Encode qw( encode decode );
>   use MARC::Batch;
>
>   # initialize
>   my $batch = MARC::Batch->new( 'USMARC', './records.mrc' );
>   open OUT, ' > updated.mrc';
>
>   # process each record
>   while ( my $marc = $batch->next ) {
>
>     # get the title
>     my $_245 = decode( 'FOO', $marc->title );
>
>     # do cool stuff with the title here
>
>     # output the cool stuff
>     print OUT encode( 'FOO', $_245 );
>
>   }
>
>   # done
>   close OUT;
>   exit;
>
> My problem is, I don't know what to put in place of FOO. What is the official
> name of MARC-8's encoding scheme?
>
> --
> Eric "The Ugly American" Morgan
> University of Notre Dame
>
> (574) 631-8604
[CODE4LIB] marc-8
In Perl, how do I specify MARC-8 when reading (decoding) and writing (encoding) data?

Character encoding is the bane of my existence. I have learned that when reading from a file I ought to specify the type of encoding the file is in and decode accordingly, or else. Once read, it is converted to Perl's internal encoding (UTF-8) and can be manipulated. Similarly, when writing I am expected to specify the encoding. Both the reading (decoding) and the writing (encoding) can be done with the Encode module. Here is some code illustrating what I'm trying to do with MARC records which are apparently in MARC-8:

  # require
  use Encode qw( encode decode );
  use MARC::Batch;

  # initialize
  my $batch = MARC::Batch->new( 'USMARC', './records.mrc' );
  open OUT, ' > updated.mrc';

  # process each record
  while ( my $marc = $batch->next ) {

    # get the title
    my $_245 = decode( 'FOO', $marc->title );

    # do cool stuff with the title here

    # output the cool stuff
    print OUT encode( 'FOO', $_245 );

  }

  # done
  close OUT;
  exit;

My problem is, I don't know what to put in place of FOO. What is the official name of MARC-8's encoding scheme?

--
Eric "The Ugly American" Morgan
University of Notre Dame

(574) 631-8604