Re: reading and writing of utf-8 with marc::batch
Hi Eric,

my first guess would be that your terminal is not UTF-8. If commenting out

    #binmode( STDOUT, ':utf8' );

does the trick, then you can start looking into how to change your terminal settings. (That can sometimes be a rather frustrating task, I'm afraid.)

/Leif Andersson
Stockholm UL

From: Eric Lease Morgan [emor...@nd.edu]
Sent: 26 March 2013 21:22
To: perl4lib@perl.org
Subject: reading and writing of utf-8 with marc::batch

For the life of me I can't figure out how to read and write UTF-8 with MARC::Batch. I have a UTF-8 encoded file of MARC records. Dumping the records and grepping for a particular string illustrates their validity:

    $ marcdump und.marc | grep Sainte-Face
    und.marc
    1000 records
    2000 records
    3000 records
    4000 records
    5000 records
    6000 records
    7000 records
    8000 records
    9000 records
    10000 records
    11000 records
    12000 records
    245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
    610 20 _aArchiconfrérie de la Sainte-Face
    13000 records
    $

I then run a Perl script that simply reads each record and dumps it to STDOUT. Notice how I define both my input and my output as UTF-8:

    #!/shared/perl/current/bin/perl

    # configure
    use constant MARC => './und.marc';

    # require
    use strict;
    use MARC::Batch;

    # initialize
    binmode ( MARC, ':utf8' );
    my $batch = MARC::Batch->new( 'USMARC', MARC );
    $batch->strict_off;
    $batch->warnings_off;
    binmode( STDOUT, ':utf8' );

    # read write
    while ( my $marc = $batch->next ) { print $marc->as_usmarc }

    # done
    exit;

But my output is munged:

    $ ./marc.pl > und.mrc
    $ marcdump und.mrc | grep Sainte-Face
    und.mrc
    1000 records
    2000 records
    3000 records
    4000 records
    5000 records
    6000 records
    7000 records
    8000 records
    9000 records
    10000 records
    11000 records
    12000 records
    245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
    610_aArchiconfrérie de la Sainte-Face
    13000 records
    $

What am I doing wrong!?

--
Eric Lease Morgan
University of Notre Dame
574/631-8604
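It may also help to see what a `:utf8` output layer does to data that is already UTF-8 encoded. The core-Perl sketch below assumes that each record reaches `print` as a string of UTF-8 bytes (whether `as_usmarc` hands back bytes or characters may depend on the MARC::Record version). It writes the same bytes through a `:utf8` layer and through a raw layer into in-memory files and compares the results.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption: the record text is already a UTF-8 *byte* string.
my $bytes = encode('UTF-8', "Archiconfr\x{e9}rie");   # "é" is 2 bytes: C3 A9

# Through a :utf8 layer, every byte above 0x7F is treated as a Latin-1
# character and encoded a second time ("double encoding").
open my $utf8_fh, '>:utf8', \my $double or die $!;
print {$utf8_fh} $bytes;
close $utf8_fh;

# Through a raw layer, the bytes are written unchanged.
open my $raw_fh, '>:raw', \my $single or die $!;
print {$raw_fh} $bytes;
close $raw_fh;

printf "input %d bytes, :utf8 wrote %d, :raw wrote %d\n",
    length $bytes, length $double, length $single;
```

With the `:utf8` layer, "é" (C3 A9) comes out as C3 83 C2 A9, so every directory length after it is off. If that is what happens here, dropping the `binmode( STDOUT, ':utf8' )` line, or using `:raw`, would leave the record bytes untouched.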
Re: MARC::Charset 1.34
It gunzips fine, but then there seems to be something wrong with the tar file... /Leif Andersson Stockholm University Library
Re: MARC::Charset 1.34
Corrupt tar file RESOLVED. But here's the background anyway.

I downloaded MARC-Charset-1.34.tar.gz to Windows 7. "The archive is corrupt" was the error message from WinRAR on Windows, and other utilities on the same platform agreed. A second download gave the same result. Now, on the third try, it suddenly works! So either something happened during the first two transfers, or some mirror somewhere has a corrupt copy (and my downloads happened to use that mirror).

/Leif

From: Galen Charlton [gmcha...@gmail.com]
Sent: 11 February 2013 20:01
To: Leif Andersson
Cc: perl4lib
Subject: Re: MARC::Charset 1.34

Hi,

On Mon, Feb 11, 2013 at 10:50 AM, Leif Andersson <leif.anders...@sub.su.se> wrote:
> It gunzips fine, but then there seems to be something wrong with the tar file...

Could you elaborate? In particular, what platform are you on and what error message are you getting? I tried installing MARC::Charset 1.34 via a 'cpan MARC::Charset' on a fresh Debian box, and it worked for me.

Regards,

Galen
--
Galen Charlton
gmcha...@gmail.com
Re: Anybody know what this USMARC.pm error is?
I am not sure I really got this. Because if I did:

- The error messages in your original posting were not the exact error messages.
- The originally posted code was not the actual code producing the errors.
- And the sample MARC records, supplied to demonstrate the errors, were actually OK.

Sometimes you are lucky :-) Glad you solved the case.

/Leif
Re: MARC blob to MARC::Record object
I hope you will forgive me for a late addendum. Not only do I have to apologize for the late arrival of this post, I should also apologize for its (lack of) seriousness. This is in every respect just a programming scherzo, so to speak (even though the code below works, at least for me). Now you are all warned. ;-)

So: if you are used to letting MARC::Batch read the records from a file, then you can simply read from your database (i.e. your statement handle) as if you were reading from a file. Like this:

<code>
    #!/usr/local/bin/perl -w
    use DBI;
    use MARC::Batch;
    use strict;

    #BEGIN {
    #    $ENV{NLS_LANG} = ...;
    #}

    my $dbh = DBI->connect(...) || die 1;
    $dbh->{LongReadLen} = 9;
    $dbh->{LongTruncOk} = 0;

    my $sql = q(
        select GetBibBlob(bib_id)
        from bib_master
        where rownum = 3
    );
    my $sth = $dbh->prepare($sql) || die 2;
    my $rv  = $sth->execute() || die 3;

    # add some magic:
    tie(*MARC, 'dbfile', $sth);

    # pass the virtual filehandle to MARC::Batch
    my $batch = MARC::Batch->new('USMARC', *MARC );
    $batch->strict_off;

    # read as usual
    while ( my $marc = $batch->next ) {
        print $marc->as_formatted(), "\n\n";
    }

    #---
    package dbfile;
    use strict;

    sub TIEHANDLE {
        my ($class, $sth) = @_;
        my $i = {
            'sth' => $sth,
            'eof' => 0,
        };
        bless $i, $class;
    }

    sub READLINE {
        my ($marc) = $_[0]->{sth}->fetchrow_array();
        if (defined $marc) {
            # keep only as many bytes as the leader says the record has
            my $len = substr($marc, 0, 5);
            return substr($marc, 0, $len);
        } else {
            $_[0]->{'eof'} = 1;
            return undef;
        }
    }

    sub EOF {    # eof()
        $_[0]->{'eof'};
    }

    sub FILENO  {1}
    sub BINMODE {1}
    sub CLOSE   {1}
    sub DESTROY {1}

    __END__
</code>

That's all folks,
/Leif

Leif Andersson, Systems Librarian
Stockholm University Library

From: Doran, Michael D [do...@uta.edu]
Sent: 7 January 2011 15:11
To: Leif Andersson; 'Jon Gorman'; perl4lib
Subject: RE: MARC blob to MARC::Record object

Hi Leif and Jon,

> use MARC::Record;
> ...
> my $record = MARC::Record->new_from_usmarc( $blob );

This works!
> From: Jon Gorman [mailto:jonathan.gor...@gmail.com]
> Sent: Friday, January 07, 2011 7:51 AM
> You'll probably think of this when you get up, but did you make sure to import the package? i.e. use MARC::File::USMARC;?

This made the other way work, too! (I had only "use MARC::File".) Many thanks to Leif and Jon.

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/

-----Original Message-----
From: Leif Andersson [mailto:leif.anders...@sub.su.se]
Sent: Friday, January 07, 2011 7:50 AM
To: Doran, Michael D; perl4lib
Subject: Re: MARC blob to MARC::Record object

Hi Michael,

this is how I, in principle, usually do it:

    use MARC::Record;
    ...
    my $record = MARC::Record->new_from_usmarc( $blob );

/Leif
Leif Andersson, Systems librarian
Stockholm University Library

From: Doran, Michael D [do...@uta.edu]
Sent: 7 January 2011 00:18
To: perl4lib
Subject: MARC blob to MARC::Record object

I am working on a Perl script that retrieves data from our Voyager ILS via an SQL query. Among other data, I have MARC records in blob form, and the script processes one MARC record at a time. I want to be able to parse and modify/convert the MARC record (using MARC::Record) before writing/printing data to a file. How do I make the MARC blob into a MARC::Record object (without having to first save it to a file and read it in with MARC::File/Batch)? The MARC blob is already in a variable, so it doesn't make sense (to me) to write it out to a file just so I can read it back in. Unless I have to, natch. I apologize if I am missing something obvious.

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/
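Leif's tied-handle "scherzo" works because Perl lets any object stand in for a filehandle once it implements `TIEHANDLE`, `READLINE` and friends. The sketch below strips the idea down to core Perl: the handle pulls one record per `readline` from an arbitrary iterator. The class name `RecordIteratorHandle` and the `@blobs` array standing in for `$sth->fetchrow_array` are hypothetical, and MARC::Batch itself is not involved here.

```perl
use strict;
use warnings;

# Hypothetical class: a read-only filehandle fed by an iterator code ref
# that returns the next record, or undef when there are no more.
package RecordIteratorHandle;

sub TIEHANDLE {
    my ($class, $next) = @_;
    return bless { next => $next, eof => 0 }, $class;
}

sub READLINE {
    my ($self) = @_;
    if (wantarray) {                      # <FH> in list context: drain everything
        my @all;
        while (defined(my $rec = $self->READLINE)) { push @all, $rec }
        return @all;
    }
    return undef if $self->{eof};
    my $rec = $self->{next}->();
    $self->{eof} = 1 unless defined $rec;
    return $rec;
}

sub EOF     { $_[0]{eof} }
sub BINMODE { 1 }
sub CLOSE   { 1 }

package main;

# Stand-in for the database; with DBI the iterator might instead be
#   sub { my ($blob) = $sth->fetchrow_array; $blob }
my @blobs = ("first record\x1D", "second record\x1D");
my $i = 0;
tie *RECORDS, 'RecordIteratorHandle', sub { $i < @blobs ? $blobs[$i++] : undef };

my @seen;
while (defined(my $rec = <RECORDS>)) { push @seen, $rec }
print scalar(@seen), " records read\n";
```

Leif's version additionally trims each blob to the length given in leader positions 0-4, which guards against trailing bytes in the database column.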
Re: MARC blob to MARC::Record object
Hi Michael,

this is how I, in principle, usually do it:

    use MARC::Record;
    ...
    my $record = MARC::Record->new_from_usmarc( $blob );

/Leif
Leif Andersson, Systems librarian
Stockholm University Library

From: Doran, Michael D [do...@uta.edu]
Sent: 7 January 2011 00:18
To: perl4lib
Subject: MARC blob to MARC::Record object

I am working on a Perl script that retrieves data from our Voyager ILS via an SQL query. Among other data, I have MARC records in blob form, and the script processes one MARC record at a time. I want to be able to parse and modify/convert the MARC record (using MARC::Record) before writing/printing data to a file. How do I make the MARC blob into a MARC::Record object (without having to first save it to a file and read it in with MARC::File/Batch)? The MARC blob is already in a variable, so it doesn't make sense (to me) to write it out to a file just so I can read it back in. Unless I have to, natch. I apologize if I am missing something obvious.

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/
Re: Moose based Perl library for MARC records
Frédéric,

just out of curiosity: what was your main motivation for writing another MARC module? In what ways does your distribution differ from MARC::Record?

/Leif
Re: MARC-perl: different versions yield different results
Hi Galen,

let me tell you I really appreciate the work you and many others have put into the MARC::Record suite. I don't quite consider myself a programmer. I just happen to do some of my work by taking advantage of the programming resources available to me. As for the patch, I am not sure my brute-force hack qualifies as one. But for now I will leave that up to others to decide.

/Leif

From: Galen Charlton [gmcha...@gmail.com]
Sent: 12 October 2010 17:35
To: Leif Andersson
Cc: Al; perl4lib@perl.org
Subject: Re: MARC-perl: different versions yield different results

Hi Leif,

On Tue, Oct 12, 2010 at 10:58 AM, Leif Andersson <leif.anders...@sub.su.se> wrote:
> To change directly in code like this is totally no-no to many programmers. If you feel uncomfortable with this, there are other methods doing the same stuff.

As it happens, this is the very mailing list where patches to MARC::* are typically discussed. Feel free to send one.

Regards,

Galen
--
Galen Charlton
gmcha...@gmail.com
Re: MARC-perl: different versions yield different results
This has nothing to do with Perl versions. MARC::Record 1.38 and earlier do not display this problem. MARC::Record 2.0.0, the so-called Unicode version, introduced the problem you describe: when writing records, it produces an incorrect leader length and corrupted UTF-8.

There are different ways to deal with this. I have changed one of the modules myself: MARC::File::USMARC. It has a function called encode() around line 315, and I have added a "use bytes;" just before the final return, like this:

    use bytes;
    return join("", $marc->leader, @$directory, END_OF_FIELD, @$fields, END_OF_RECORD);

Changing the code directly like this is a total no-no for many programmers. If you feel uncomfortable with it, there are other ways of doing the same thing. You could write a package (saved as MARC_Record_hack.pm somewhere in @INC):

    package MARC_Record_hack;
    use MARC::File::USMARC;
    no warnings 'redefine';
    sub MARC::File::USMARC::encode() {
        my $marc = shift;
        $marc = shift if (ref($marc)||$marc) =~ /^MARC::File/;
        my ($fields,$directory,$reclen,$baseaddress) = MARC::File::USMARC::_build_tag_directory($marc);
        $marc->set_leader_lengths( $reclen, $baseaddress );
        # Glomp it all together
        use bytes;
        return join("", $marc->leader, @$directory, "\x1E", @$fields, "\x1D");
    }
    use warnings;
    1;
    __END__

With this package included, your original code should work fine, I'd guess:

    use MARC::Batch;
    use MARC_Record_hack;
    my $batch = new MARC::Batch('USMARC', $ARGV[0]);
    $batch->strict_off ();
    $batch->warnings_off ();
    #binmode( STDOUT, ':raw' );
    #binmode STDOUT;
    my $record = $batch->next;
    print $record->as_usmarc;

As a habit I use "binmode FH;" when I write records to a file. It is not needed, but it keeps me from the temptation of making any other assumptions about character encodings.
/Leif Andersson
Stockholm University Library

From: Al [ra...@berkeley.edu]
Sent: 12 October 2010 00:03
To: perl4lib@perl.org
Subject: MARC-perl: different versions yield different results

Example MARC record is here: http://www.mediafire.com/file/u5cxkrfwh9ew09z/example.zip

When I process the record above in Perl 5.8, MARC::Record version 1.38, and Encode.pm version 2.12, the record comes out fine. When I use Perl 5.10, MARC::Record version 2.0.0, and Encode.pm 2.40, the record comes out corrupted and MARC::Record will no longer read the result. The problem is with a Unicode character (big surprise). The earlier version leaves the \xC3A1 character intact; the later version changes it to \xE1, which is invalid. I've read as many of the perl4lib messages on the subject of UTF-8 as I could, but my eyes are spinning. I'm hoping that by including a complete but simple Perl program and making a MARC record available, somebody can explain to me in detail what is going on. My inclination is to simply revert to the earlier version of Perl, but perhaps if I really understood the issue that may not be necessary.

Here is the test program I use:

    use MARC::Batch;
    my $batch = new MARC::Batch('USMARC', $ARGV[0]);
    $batch->strict_off ();
    $batch->warnings_off ();
    #binmode( STDOUT, ':utf8' );
    my $record = $batch->next;
    print $record->as_usmarc;

Run the program on the record, then run it again on the output, and the second time Perl quits with an error:

    utf8 "\xE1" does not map to Unicode at Encode.pm line 174.

That should not happen. Why the different behavior with the different versions? I can't see anything wrong with the original record; it's valid UTF-8 as far as I can tell. Leader byte 9 is correctly set to 'a'. Uncommenting the binmode line seems to work: the character is output unchanged, as is supposed to happen. The problem is that my record batches are a mixture of UTF-8 and MARC-8, and explicitly setting binmode screws things up.
I need a solution that transparently handles a mix of record encodings. I rather suspect the problem is with Encode.pm and not MARC-perl, but I can't be sure. It may also be due to differences in the way Perl handles I/O between versions 5.8 and 5.10. BTW, the problem happens on both Windows and Unix.

Thanks for any advice you can give me,
Al
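Why does a single "use bytes;" fix the leader? A minimal core-Perl sketch, assuming (as Leif describes) that the field text reaches encode() as a decoded character string; this is an illustration, not MARC::File::USMARC's actual code. It measures one field three ways and builds a directory entry (tag, length, start position) from each measure.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: the field arrives as a decoded character string ("\x1E" is
# the MARC end-of-field character).
my $field = decode('UTF-8', "Archiconfr\xC3\xA9rie") . "\x1E";

# length() counts characters, so the "é" counts as one.
my $char_len = length $field;

# The file will contain UTF-8 bytes, where the "é" takes two.
my $byte_len = length encode('UTF-8', $field);

# Under "use bytes", length() reports the internal byte count, which for a
# string stored internally as UTF-8 (utf8 flag on) equals the encoded length.
my $bytes_pragma_len = do { use bytes; length $field };

# Directory entries (tag, 4-digit length, 5-digit start) from each measure.
my $entry_chars = sprintf '%03s%04d%05d', '245', $char_len, 0;
my $entry_bytes = sprintf '%03s%04d%05d', '245', $byte_len, 0;

print "$entry_chars (characters) vs $entry_bytes (bytes)\n";
```

The "use bytes" figure is only right because the string happens to be stored internally as UTF-8, and the bytes pragma's own documentation discourages it for anything but debugging. Measuring `encode('UTF-8', ...)` does not depend on that internal detail.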
Re: MARC-perl: different versions yield different results
Hi Ed,

yes, I meant that the drawback is in modifying a CPAN module locally. As for undesirable side effects, I don't know of any; I have used this technique myself for almost three years now.

The idea is that the MARC::Record object per se should be just binary. The effort made in the leap from 1.38 to 2.0.0 to treat this blob as an (always well-formed!) UTF-8 string was a mistake in my eyes. It has resulted in at least two common problems:

1. When writing records: the leader length / corrupted UTF-8 problem I responded to in my post.
2. When reading bad UTF-8 records: special care has to be taken so that your whole application does not just die at that record.

Almost all postings to this forum since 2.0.0 have been concerned with one of these problems (exaggerating a little, but not much).

Putting in "use bytes" is a shortcut instead of rewriting a whole bunch of code. A rewrite would probably be more aesthetically pleasing, but it is obviously much more work...

And by the way, the second problem can be dealt with by changing:

    sub MARC::File::Encode::marc_to_utf8 {
        return Encode::decode( 'UTF-8', $_[0], 0 );  # do NOT check if UTF-8 is valid!
    }

Yes, that is also a hack!

To sum up: I think it is a good idea to make the MARC blob a binary object, so to speak. I don't know whether you should just apply my simple hacks to the CPAN code, or whether a thorough rewrite of some parts of the modules is called for. Those changes may involve some changes in coding style in the scripts that use MARC::Record. But probably all you would have to do is remove all the strange code you put in there as workarounds for the character bugs.

And yes, I have been using MARC::Charset in combination with this technique, without any problems that I can recall.
:-)
/Leif

From: ed.summ...@gmail.com [ed.summ...@gmail.com] on behalf of Ed Summers [...@pobox.com]
Sent: 12 October 2010 17:13
To: perl4lib@perl.org
Subject: Re: MARC-perl: different versions yield different results

Hi Leif,

Is the downside to this approach that you are modifying a CPAN module in place, or is it something to do with the behavior of 'use bytes'? Would there be any undesirable side effects to adding 'use bytes' to MARC::File::USMARC::encode on CPAN?

//Ed

On Tue, Oct 12, 2010 at 7:58 AM, Leif Andersson <leif.anders...@sub.su.se> wrote:
> Myself I have changed one of the modules. MARC::File::USMARC
> It has a function called encode() around line 315
> I have added a
>   use bytes;
> just before the final return.
> [snip]
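The two `marc_to_utf8` variants discussed in this thread differ only in the CHECK argument passed to `Encode::decode`. A minimal sketch using only Encode, with a made-up string that contains a stray Latin-1 byte (`\xE1`, the byte from Al's error message), shows what each setting does:

```perl
use strict;
use warnings;
use Encode ();

# Made-up data: valid UTF-8 except for one stray Latin-1 byte (\xE1).
my $bad = "Jos\xE1 Mart\xC3\xADn";

# CHECK = 1 (Encode::FB_CROAK), as in MARC::Record 2.0.0: decoding dies.
# decode() may modify its source when CHECK is true, so work on a copy.
my $copy   = $bad;
my $strict = eval { Encode::decode('UTF-8', $copy, 1) };
my $error  = $@;

# CHECK = 0 (Encode::FB_DEFAULT), as in Leif's hack: each malformed
# sequence becomes U+FFFD and the rest of the string is still decoded.
my $lenient = Encode::decode('UTF-8', $bad, 0);

print defined $strict ? "strict decode succeeded\n" : "strict decode died: $error";
```

The lenient variant trades a crash for silent substitution, so records decoded that way may deserve a warning or a log entry.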
Re: Stripping out Unicode combining characters (diacritics)
I've been doing it the way Mike R suggested for quite a while. But some characters do not map nicely into this scheme, so you may want to take care of things like the German eszett, the oe ligature, etc. manually:

    s/\x{00df}/ss/g;
    s/\x{0152}/Oe/g;
    s/\x{0153}/oe/g;
    ...to be continued...

Leif
==
Leif Andersson, Systems Librarian
Stockholm University Library
SE-106 91 Stockholm
SWEDEN
Phone : +46 8 162769
Mobile: +46 70 6904281

-----Original Message-----
From: Doran, Michael D [mailto:[EMAIL PROTECTED]]
Sent: 6 May 2008 04:13
To: Mike Rylander
Cc: [EMAIL PROTECTED]; Perl4lib
Subject: RE: Stripping out Unicode combining characters (diacritics)

Hi Mike,

I appreciate the quick reply. I am familiar with the Unicode::Normalize module (and will also be using it), but I left it out of this question because it's not relevant to the problem I'm currently trying to solve. The text I'm trying to strip diacritics out of does not have precomposed accented characters.

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 cell
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/

-----Original Message-----
From: Mike Rylander [mailto:[EMAIL PROTECTED]]
Sent: Mon 5/5/2008 8:52 PM
To: Doran, Michael D
Cc: [EMAIL PROTECTED]; Perl4lib
Subject: Re: Stripping out Unicode combining characters (diacritics)

On Mon, May 5, 2008 at 8:26 PM, Doran, Michael D <[EMAIL PROTECTED]> wrote:
> [snip]
> I'm pulling my hair out on this... so any help would be appreciated. If there's any other info I can provide, let me know.

You'll want to transform the text to NFD format (nominally, base characters plus combining marks) instead of NFC (precombined characters) using Unicode::Normalize:

    use Unicode::Normalize;
    my $text = NFD($original);
    $text =~ s/\pM+//go;

Hope that helps.

--
Mike Rylander
| VP, Research and Design
| Equinox Software, Inc. / The Evergreen Experts
| phone: 1-877-OPEN-ILS (673-6457)
| email: [EMAIL PROTECTED]
| web: http://www.esilibrary.com
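Combining Mike's NFD approach with Leif's manual replacements gives something like the sketch below. The order matters: characters such as ß and Œ have no canonical decomposition, so NFD leaves them intact and they have to be mapped by hand. The mapping table is illustrative only (other letters without a decomposition, such as ø or ł, would need entries too), and the function expects an already decoded character string.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Characters with no canonical decomposition: map them by hand first.
# Illustrative, incomplete list.
my %manual = (
    "\x{00DF}" => 'ss',   # LATIN SMALL LETTER SHARP S
    "\x{0152}" => 'Oe',   # LATIN CAPITAL LIGATURE OE
    "\x{0153}" => 'oe',   # LATIN SMALL LIGATURE OE
);
my $manual_re = join '|', map { quotemeta } keys %manual;

sub strip_diacritics {
    my ($text) = @_;                          # a decoded character string
    $text =~ s/($manual_re)/$manual{$1}/g;    # hand-mapped characters
    $text = NFD($text);                       # e.g. "é" -> "e" + U+0301
    $text =~ s/\pM+//g;                       # drop all combining marks
    return $text;
}

print strip_diacritics("Archiconfr\x{e9}rie"), "\n";
```

Because NFD is applied inside the function, the same code handles Michael's case (text that already contains combining marks) and precomposed input alike.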
Re: Help for utf-8 output
It seems there is a little bug (by design) kicking in. The leader and some characters come out wrong in this case:

+ Reading a raw MARC record (UTF-8) from a file
+ Turning it into a MARC::Record object
+ Writing it out to a file, without modification.

Yes. Even without modification the bug manifests itself!

Let's start with code that simply copies one record from a file, utf8.mrc, containing one or more MARC records. This basic operation, which does not involve MARC::Record, is OK:

    #!perl -w
    use strict;
    #
    open(IN, "utf8.mrc") || die 1;
    open(OUT, ">out_good.mrc") || die 2;
    binmode IN;
    binmode OUT;
    #
    # Read in raw MARC
    $/ = "\x1D";
    my $marc = <IN>;
    print OUT $marc;
    __END__

Now we add MARC::Record to the process, along with some debug info. Example code producing a *faulty* record:

    #!perl -w
    use strict;
    use MARC::Record;
    use Devel::Peek;
    #
    open(IN, "utf8.mrc") || die 1;
    open(OUT, ">out_bad.mrc") || die 2;
    binmode IN;
    binmode OUT;
    #
    # Read in raw MARC
    $/ = "\x1D";
    my $marc = <IN>;
    Dump($marc);    # the utf8 flag is not on
    my $obj = MARC::Record->new_from_usmarc( $marc );
    # Convert back to raw MARC
    my $marc2 = $obj->as_usmarc();
    Dump($marc2);   # the utf8 flag IS on
    print OUT $marc2;
    __END__

In this case the leader and the actual length will not agree, as your UTF-8 characters have turned into Latin-1. The problem is that $marc2 has the utf8 flag set internally by Perl, and the conversion on output is made in spite of binmode. We can get around the problem by, for instance, either

    use bytes;

or (with Encode loaded)

    Encode::_utf8_off($marc2);

before printing to the file. But shouldn't MARC::Record take care of this for us? A file of MARC records may contain records in different encodings. The text parts of a MARC record can be treated as being in a certain encoding, but the blob itself, I suppose, should be exposed to the caller as pure binary. Are there any drawbacks to letting MARC::Record strip off any utf8 flag before returning the record from as_usmarc()?
If not, I suggest this change be made in a future release of MARC::Record. I should also add that this character mess only sets in when doing I/O. If you are updating your databases through one API or another, you are probably OK!

Leif
==
Leif Andersson, Systems Librarian
Stockholm University Library
SE-106 91 Stockholm
SWEDEN
Phone : +46 8 162769
Mobile: +46 70 6904281

-----Original Message-----
From: Doran, Michael D [mailto:[EMAIL PROTECTED]]
Sent: 21 February 2008 18:49
To: perl4lib@perl.org
Subject: RE: Help for utf-8 output

Hi Jackie,

I'm working on a very similar problem... converting theses/dissertations records (in XML) to MARC records. I'm still in the testing stage, but have had similar problems with records that have diacritics in the 100 or 245 fields (however, diacritics in a 520a field don't seem to cause any problems). Since our records are not diacritic-rich, it's hard to determine the exact extent of the problem. I am using these versions:

    Perl v5.8.8
    MARC::Charset 0.98
    MARC::Lint 1.43
    MARC::Record 2.0
    XML::LibXML 1.66

Here's an example bad record (which I have minimized to just the 245 field):

    marcdump test.mrc
    test.mrc
    LDR 00127cam a2200037 4500
    245 13 _aAn Empirical Test Of The Situational Leadership® Model In Japan /
           _cRiho Yoshioka.

     Recs  Errs Filename
    ----- ----- --------
        1     1 test.mrc

When I run test.mrc through MARC::Lint, I get these messages:

    Invalid record length in record 1: Leader says 00127 bytes but it's actually 125
    Invalid length in directory for tag 245 in record 1
    field does not end in end of field character in tag 245 in record 1

When examined in vi, the character in question, a Registered Sign, appears to be correctly UTF-8 encoded (C2AE), and the bib Leader (position 09 = a) indicates that it is Unicode encoded. I've attached the MARC record.
I noticed that when I run your record (ck245.dat) through MARC::Lint, I get the same invalid record length message:

    Invalid record length in record 3: Leader says 00567 bytes but it's actually 569
    field does not end in end of field character in tag 100 in record 3
    field does not end in end of field character in tag 245 in record 3
    Invalid indicators .10 forced to blanks in record 3 for tag 245
    field does not end in end of field character in tag 260 in record 3
    Invalid indicators . forced to blanks in record 3 for tag 260
    field does not end in end of field character in tag 300 in record 3
    Invalid indicators . forced to blanks in record 3 for tag 300
    field does not end in end of field character in tag 502 in record 3
    Invalid indicators . forced to blanks in record 3 for tag 502
    field does not end in end of field character in tag 504 in record 3
    Invalid indicators . forced to blanks in record 3 for tag 504
    field does not end in end of field character in tag 690 in record 3
    Invalid indicators . 4 forced to blanks in record 3 for tag 690

Anybody have any ideas?

-- Michael
# Michael Doran
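Leif's explanation in this thread (the utf8 flag on `$marc2` survives `binmode`, so characters come out as Latin-1) can be reproduced in core Perl. The sketch assumes that `as_usmarc` returns a flagged string, which is what his Devel::Peek output shows; `decode` is used here only to manufacture such a string. It prints it to a raw in-memory handle, then applies the two workarounds he mentions.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for $obj->as_usmarc(): a string with the utf8 flag on.
my $marc2 = decode('UTF-8', "Leadership\xC2\xAE Model");

# Printed to a binmode'd (raw) handle, characters below U+0100 are written
# as single Latin-1 bytes, so the Registered Sign shrinks from 2 bytes to 1.
open my $raw, '>:raw', \my $latin1_out or die $!;
print {$raw} $marc2;
close $raw;

# Workaround A: encode to UTF-8 bytes explicitly before printing.
open my $fixed, '>:raw', \my $utf8_out or die $!;
print {$fixed} encode('UTF-8', $marc2);
close $fixed;

# Workaround B (Leif's): turn the flag off, exposing the internal UTF-8 bytes.
my $copy = $marc2;
Encode::_utf8_off($copy);

printf "raw print: %d bytes, encoded: %d bytes\n", length $latin1_out, length $utf8_out;
```

Each such character loses one byte, which matches the kind of "Leader says N bytes but it's actually N-k" mismatch MARC::Lint reports. Characters above U+00FF would instead be written as UTF-8 with a "Wide character" warning.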
Re: passing parameters to function as variable
Merritt,

I guess you can do it with eval, but you can also do it with an array instead of a string:

    my @a245 = ('a', 'The Title',
                'b', 'Subtitle',
                'c', 'Author',
                'h', '[Electronic resource]',
               );
    my $field = $record->field('245');
    my $revised_245 = MARC::Field->new('245', '', '', @a245);
    $field->replace_with($revised_245);

Leif Andersson

-----Original Message-----
From: Merritt H Lennox [mailto:[EMAIL PROTECTED]]
Sent: 16 August 2007 20:35
To: perl4lib@perl.org
Subject: passing parameters to function as variable

Hi - I was wondering if anyone has tried creating a variable containing the parameter list for a MARC method. In the snippet below, I'm doing some cleanup on subfields of the 245, and inserting a default subfield h for those lacking one. I never know which subfields I'll have, so rather than using a cumbersome series of if...elsif...else statements, I'd construct the string I want to pass to the new() function and insert it once. If any subfield doesn't exist, the SFstring variable for that subfield is empty. I know that $subfield_list_245 contains the string I want, from having used the print statement [1], but every run gives me the error "Field 245 must have at least one subfield". I've tried using eval on $subfield_list_245, and using various quotation marks around it, but new() just doesn't recognize or want to work with the string. Is there a way to make this approach work? Are the blanks I've concatenated to the end a problem? Or is there a more succinct way of approaching this in the first place?

Start snippet

    if ( ! $subfield_245h ) {
        my $subfield_list_245 = $a_SFstring . $b_SFstring . $c_SFstring . ',h=' . ' . [Electronic resource] . ' . $n_SFstring . $ p_SFstring;
        ## print $subfield_list_245;
        my $revised_245 = MARC::Field->new('245', '', '',$subfield_list_245);
        $record->replace_with($revised_245);
    }

end snippet

Thanks for any ideas!
Merritt

[1] For my simple test record, the subfield list prints out:

    a='Accountancy Ireland',h='[Electronic resource]'

Merritt Lennox
Library Management System Administrator
550A Bird Library
Syracuse University
Syracuse, NY 13244
315-443-9629
[EMAIL PROTECTED]

"Never doubt that a small group of thoughtful, committed citizens can change the world. Indeed, it's the only thing that ever has." - Margaret Mead
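Leif's array approach also removes the need for the if/elsif chain. The list can be built from whatever subfields happen to be present. A minimal core-Perl sketch follows. The helper name `build_245_subfields` and the hash-of-subfields input are hypothetical, and the subfield order a, b, c, h, n, p is taken from Merritt's snippet. The MARC::Record calls appear only in comments.

```perl
use strict;
use warnings;

# Build the flat (code => value, ...) list that MARC::Field->new expects
# after the tag and indicators. Missing or empty subfields are skipped,
# and a default $h is added when none is present.
sub build_245_subfields {
    my (%have) = @_;                                 # e.g. (a => 'Title', n => '')
    $have{h} = '[Electronic resource]'
        unless defined $have{h} && length $have{h};
    my @list;
    for my $code (qw(a b c h n p)) {                 # order from Merritt's snippet
        next unless defined $have{$code} && length $have{$code};
        push @list, $code => $have{$code};
    }
    return @list;
}

my @subfields = build_245_subfields(a => 'Accountancy Ireland', b => '', n => undef);

# With MARC::Record, the array would be passed straight through, e.g.:
#   my $old = $record->field('245');
#   my $new = MARC::Field->new('245', $old->indicator(1), $old->indicator(2), @subfields);
#   $old->replace_with($new);
print join('|', @subfields), "\n";
```

The original attempt failed because a single string, however it is quoted, arrives in new() as one argument. An array is flattened into separate arguments.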
MARC::Record and importing broken UTF8
Nice!

But what is the best way to deal with all those broken UTF-8 encodings we encounter over and over again when importing MARC records from outer space? As it is now, the application dies with something like 'utf8 "\xXX" does not map to Unicode at C:/Perl/lib/Encode.pm line 166.' The problem seems to lie in MARC::File::Encode:

    sub marc_to_utf8 {
        # if there is invalid utf8 date then this will through an exception
        # let's just hope it's valid :-)
        return decode( 'UTF-8', $_[0], 1 );
    }

Is it possible to introduce a sloppy mode switch?

Leif
==
Leif Andersson, Systems Librarian
Stockholm University Library
SE-106 91 Stockholm
SWEDEN
Phone : +46 8 162769
Mobile: +46 70 6904281

-----Original Message-----
From: Mike Rylander [mailto:[EMAIL PROTECTED]]
Sent: 26 January 2007 02:35
To: Public Open-ILS tech discussion; perl4lib
Subject: Re: Fwd: Module update for MARC::Record

OK, folks, MARC::Record 2.0.0 is officially out.

http://search.cpan.org/~mikery/MARC-Record-2.0.0/lib/MARC/Record.pm

Give it a go, and let me know if you see anything broken. Sorry for the delay!

-miker
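Until something like a sloppy mode exists in the module, the same effect can be had at the application level by decoding each record inside its own `eval`, so one bad record cannot take down the whole run. The sketch below is core Perl only. `decode_batch` and its `$sloppy` flag are hypothetical names, not part of MARC::Record, and the sample strings are made up.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK FB_DEFAULT LEAVE_SRC);

# Hypothetical "sloppy mode switch", applied per record:
#   $sloppy false - skip records that are not valid UTF-8
#   $sloppy true  - keep them, with malformed bytes replaced by U+FFFD
# Either way, the indices of the bad records are reported back.
sub decode_batch {
    my ($sloppy, @raw) = @_;
    my (@decoded, @problems);
    for my $i (0 .. $#raw) {
        # LEAVE_SRC stops decode() from modifying $raw[$i].
        my $text = eval { decode('UTF-8', $raw[$i], FB_CROAK | LEAVE_SRC) };
        if (!defined $text) {
            push @problems, $i;
            next unless $sloppy;
            $text = decode('UTF-8', $raw[$i], FB_DEFAULT | LEAVE_SRC);
        }
        push @decoded, $text;
    }
    return (\@decoded, \@problems);
}

my @raw = ("caf\xC3\xA9", "Jos\xE1 Mart\xC3\xADn", "plain ASCII");
my ($strict_out, $strict_bad) = decode_batch(0, @raw);
my ($sloppy_out, $sloppy_bad) = decode_batch(1, @raw);
printf "strict kept %d, sloppy kept %d, bad: %s\n",
    scalar @$strict_out, scalar @$sloppy_out, join(',', @$sloppy_bad);
```

Reporting the bad indices in both modes means the silently repaired records can still be reviewed afterwards.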
MARC::Charset and transcoding of MARC::Record objects
How would you guys transcode a whole MARC record, contained in a MARC::Record object, from MARC-8 to UTF-8? I can see from the documentation that it is quite easy to transcode smaller pieces of data using MARC::Charset. But how do you deal with it when it comes to whole records? I seem to recall someone on this list mentioning the path MARC record (MARC-8) -> MARCXML (UTF-8) -> MARC record (UTF-8). This trip involves, I'd guess, MARC::File::XML in addition to MARC::Charset. But I suspect there may be different approaches here.

Leif
==
Leif Andersson, Systems Librarian
Stockholm University Library
SE-106 91 Stockholm
SWEDEN
Phone : +46 8 162769
Mobile: +46 70 6904281
MARC::Record ordering of fields
How would you re-order the fields in a MARC::Record record? I just needed that kind of thing, and after some struggling came up with:

    @{$record->{_fields}} = sort { lc($a->{_tag}) cmp lc($b->{_tag}) } @{$record->{_fields}};

It seems to work, but I am interested in how others would address the same problem.

Leif
==
Leif Andersson, Systems Librarian
Stockholm University Library
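One detail worth considering is repeated tags. Several 650s, for example, should probably keep their original relative order. Perl's `sort` has been a stable mergesort since 5.8, but stability is only documented as guaranteed under `use sort 'stable'`. An explicit tie-break on the original position makes the intent visible. The sketch below uses plain `[tag, data]` pairs (a hypothetical representation) rather than MARC::Record objects.

```perl
use strict;
use warnings;

# Sort fields by tag while keeping fields with the same tag in their original
# order. Each field is a hypothetical [tag, data] pair.
sub sort_fields_by_tag {
    my @fields = @_;
    my @idx = sort {
        lc($fields[$a][0]) cmp lc($fields[$b][0])   # primary key: the tag
            or $a <=> $b                            # tie-break: original position
    } 0 .. $#fields;
    return @fields[@idx];
}

my @sorted = sort_fields_by_tag(
    ['650', 'Subject B'], ['245', 'Title'], ['650', 'Subject A'], ['100', 'Author'],
);
print join(', ', map { "$_->[0] $_->[1]" } @sorted), "\n";
```

With MARC::Record the same comparator could be applied to `$field->tag`. Rebuilding the record through its public methods (for example `delete_fields` followed by `append_fields`) rather than assigning to `_fields` would avoid depending on internals that may change between releases, though I haven't checked that route for edge cases.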
Re: Sort with MARC::Record
This is one way to do it:

    #!/usr/local/bin/perl -w
    use strict;
    use MARC::Batch;

    # sort MARC records on field 001
    # usage: sort_marc.pl infile.mrc > outfile.mrc

    my $batch = new MARC::Batch( 'USMARC', $ARGV[0] );
    my @records = ();
    my @f001 = ();
    my $idx = 0;
    while ( my $MARC = $batch->next ) {
        push(@records, $MARC);
        # the tag must be the string '001'; a bare 001 is an octal number (1)
        push(@f001, [$idx++, $MARC->field('001')->as_string]);
    }
    foreach my $rec (sort { $a->[1] <=> $b->[1] } @f001) {
        print $records[$rec->[0]]->as_usmarc;
    }
    __END__

You may need to guard against records that have no field 001. 518 records, if that is what you have to deal with, should under normal conditions not raise any memory issues.

Leif

-----Original Message-----
From: Jackie Shieh [mailto:[EMAIL PROTECTED]]
Sent: 31 January 2005 21:40
To: perl4lib@perl.org
Subject: Sort with MARC::Record

Has anyone sorted a file of hundreds of records by 001? I have a file of 518 unsorted records and a file of 406 sorted IDs from 001. I would like to sort my 518 MARC records first, before extracting the 406 records listed in the second file from the set of 518 records. I'd appreciate any suggestions, thanks.

--Jackie
|Jackie Shieh
|Special Projects Collections Team
|Harlan Hatcher Graduate Library
|University of Michigan
|920 North University
|Ann Arbor, MI 48109-1205
|Phone: 734.936.2401 FAX: 734.615.9788
|E-mail: [EMAIL PROTECTED]
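If only the 001 is needed for sorting, the raw records can also be sorted without building MARC::Record objects. The 001 can be read straight from the ISO 2709 structure: leader positions 12-16 give the base address of the data, and the directory that follows the 24-byte leader holds 12-byte entries (3-byte tag, 4-byte length, 5-byte start). A minimal core-Perl sketch, assuming well-formed records held as bytes (for example read with `$/ = "\x1D"` from a file opened in binmode):

```perl
use strict;
use warnings;

# Return the data of control field 001 from one raw ISO 2709 record,
# or undef if the record has none. Assumes a well-formed byte string.
sub raw_001 {
    my ($raw) = @_;
    my $base = substr($raw, 12, 5) + 0;               # leader/12-16: base address
    my $dir  = substr($raw, 24, $base - 25);          # directory minus its \x1E
    for (my $pos = 0; $pos + 12 <= length $dir; $pos += 12) {
        my ($tag, $len, $start) = unpack 'A3 A4 A5', substr($dir, $pos, 12);
        next unless $tag eq '001';
        (my $data = substr($raw, $base + $start, $len)) =~ s/\x1E\z//;
        return $data;
    }
    return;                                           # no 001 in this record
}

# Sort raw records by 001 as strings; records without a 001 sort first.
sub sort_raw_by_001 {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { my $id = raw_001($_); [ defined $id ? $id : '', $_ ] } @_;
}
```

String comparison (`cmp`) tolerates non-numeric IDs but puts "10" before "9". For purely numeric 001s of varying length, `<=>` as in Leif's script is the better choice. Usage might look like `local $/ = "\x1D"; my @raw = <$in>; print sort_raw_by_001(@raw);`.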
Return values from MARC::Record
I think the return values from various methods in the MARC::Record distribution could be more intuitive, and also more consistent.

If we have a BAD record in $record and try $record->field($tag), we get 0 in return. But if we try $record->subfield($sub), we get undef. I would prefer undef for both.

With the same BAD record we try

    $subfield = eval { $record->field($tag)->subfield($sub) };

This is the only case where we have to put the code in an eval. Should MARC:: take care of the eval for us? I am beginning to think so.

At the bottom of all this we have the creation of the record. What would we expect to get back from these?

    my $record1 = MARC::Record->new_from_usmarc( );
    my $record2 = MARC::Record->new_from_usmarc( undef );
    my $record3 = MARC::Record->new_from_usmarc( '' );
    my $record4 = MARC::Record->new_from_usmarc( 'not a valid record' );

Currently they all give us a broken record object. From the first three I would myself prefer to get undef in return; that is how MARC::Batch treats the records. $record4 is a bit more complicated: we have to decide what a valid record is in this context. The answer would be, I'd guess, that if we can perform other methods on the object, it is a valid record (so far).

Leif
==
Leif Andersson, Systems Librarian
Stockholm University Library
SE-106 91 Stockholm
SWEDEN
Phone : +46 8 162769
Mobile: +46 70 6904281
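Until the constructor behaves that way, a caller can get undef-on-garbage by checking the raw string before calling `new_from_usmarc`. The sketch below is one possible pre-check in core Perl. The name `looks_like_usmarc` and the exact set of checks are my own choice, not MARC::Record behaviour. It assumes the input is a byte string, so that `length` counts bytes.

```perl
use strict;
use warnings;

# Cheap structural check of a single ISO 2709 record held as bytes.
# Returns 1 if the string looks like a record, 0 otherwise.
sub looks_like_usmarc {
    my ($raw) = @_;
    return 0 unless defined $raw && length($raw) >= 26;   # leader + \x1E + \x1D
    my ($reclen, $base) = (substr($raw, 0, 5), substr($raw, 12, 5));
    return 0 unless $reclen =~ /\A\d{5}\z/ && $base =~ /\A\d{5}\z/;
    return 0 unless $reclen == length $raw;               # leader length must match
    return 0 unless $base > 24 && $base < $reclen;        # data after the directory
    return 0 unless substr($raw, $base - 1, 1) eq "\x1E"; # directory terminator
    return 0 unless substr($raw, -1) eq "\x1D";           # record terminator
    return 1;
}

# Hypothetical wrapper giving the behaviour asked for above:
#   sub record_or_undef {
#       my ($raw) = @_;
#       return looks_like_usmarc($raw) ? MARC::Record->new_from_usmarc($raw) : undef;
#   }

for my $candidate (undef, '', 'not a valid record') {
    print looks_like_usmarc($candidate) ? "ok\n" : "rejected\n";
}
```

A record that passes this check can still have broken fields, so it only covers cases like $record1 to $record4 above. It is not a substitute for MARC::Lint.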