A bulletin from the "haste makes waste" department...

>                   $ME =~ s/[\xE1-\xFE]//g;
>                   $TITLE =~ s/[\xE1-\xFE]//g;

Oops, that should be "E0" instead of "E1" as the first hex value in the
substitutions:
                   $ME =~ s/[\xE0-\xFE]//g;
                   $TITLE =~ s/[\xE0-\xFE]//g;
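
For anyone who wants to try the corrected range outside of a full script, here is a minimal sketch. It assumes the strings are raw MARC-8 bytes (not decoded Unicode text) using the default basic and extended Latin sets. The helper names strip_marc8_diacritics and search_key_part are made up for illustration, and the sample name is invented test data, not taken from a real record.

```perl
use strict;
use warnings;

# Remove MARC-8 combining diacritics (bytes 0xE0-0xFE).
# Assumption: $text holds raw MARC-8 bytes, not decoded Unicode text.
sub strip_marc8_diacritics {
    my ($text) = @_;
    return '' unless defined $text;    # subfield may be missing
    $text =~ s/[\xE0-\xFE]//g;
    return $text;
}

# Hypothetical helper: first $n characters after stripping diacritics.
sub search_key_part {
    my ( $text, $n ) = @_;
    return substr( strip_marc8_diacritics($text), 0, $n );
}

# Invented sample: byte 0xF2 sits in front of the "t" it modifies.
my $name = "Pa\xF2tel";
print search_key_part( $name, 4 ), "\n";    # prints "Pate"
```

Without the substitution, substr would count the 0xF2 byte as one of the four characters and return only three searchable letters.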

Sorry,

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 cell
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/ 

> -----Original Message-----
> From: Doran, Michael D 
> Sent: Tuesday, January 11, 2005 2:13 PM
> To: perl4lib@perl.org
> Subject: RE: Ignoring Diacritics accessing Fixed Field Data
> 
> Hi Jane,
> 
> These answers assume that the data you are processing:
> 1) is encoded in the MARC-8 character set, and
> 2) consists of the MARC-8 default basic and extended Latin characters.
> 
> > Dave,Ayod\2003
> > Paòt,Kaâs\2002
> > Baks,Dasa\2003
> > ,Viâs\2002
> >
> > Problem 1: As you can see, I don't really want the first four 
> > characters, I want the first four SEARCHABLE characters. How
> > can I tell MARC Record to give me the first four characters, 
> > excluding diacritics?
> 
> Assuming that you are asking how to strip out the MARC-8
> combining diacritic characters, try inserting the
> substitution commands shown below just prior to
> the substr commands:
> 
> >                 my $ME = $field->subfield('a');
>                   $ME =~ s/[\xE1-\xFE]//g;
> >                 my $four100 = substr( $ME, 0, 4 );
> 
> >                 my $TITLE = $field->subfield('a');
>                   $TITLE =~ s/[\xE1-\xFE]//g;
> >                 my $four245 = substr( $TITLE, 0, 4 );
> 
> -- Michael
> 
> # Michael Doran, Systems Librarian
> # University of Texas at Arlington
> # 817-272-5326 office
> # 817-688-1926 cell
> # [EMAIL PROTECTED]
> # http://rocky.uta.edu/doran/ 
> 
> > -----Original Message-----
> > From: Jacobs, Jane W [mailto:[EMAIL PROTECTED] 
> > Sent: Tuesday, January 11, 2005 12:30 PM
> > To: perl4lib@perl.org
> > Subject: Ignoring Diacritics accessing Fixed Field Data
> > 
> > Hi folks,
> > 
> > I'm trying to write a routine to construct a text file of 
> > OCLC search keys from a group of existing records.  What I
> > want is something like:
> > 
> > Brah,vasa/2003
> > 
> > That is 1st four letters of 100 + comma + 1st four letters of 
> > 245 + slash + date.
> > 
> > In principle I have this working with:
> > 
> > 
> > open( FOURS, ">4-4-date.txt" );
> > 
> > 
> > while ( my $r = $batch->next() ) {
> >       
> >         my @fields = $r->field( '100' );
> >         foreach my $field ( @fields ) {
> >                 my $ME = $field->subfield('a');
> >                 my $four100 = substr( $ME, 0, 4 );
> >       
> >                 print FOURS "$four100";
> >         }     
> > 
> >         my @fields = $r->field( '245' );
> >         foreach my $field ( @fields ) {
> >                 my $TITLE = $field->subfield('a');
> >                 my $four245 = substr( $TITLE, 0, 4 );
> >                 print FOURS ",$four245";
> >         }     
> > 
> >         my @fields = $r->field( '260' );
> >         foreach my $field ( @fields ) {
> >                 my $PD = $field->subfield('c');
> >                 my $four260 = substr( $PD, 0, 4);
> >                 print FOURS "\\$four260\n";
> >         }                     
> > 
> > 
> > My result was something like:
> > 
> > Dave,Ayod\2003
> > Paòt,Kaâs\2002
> > Baks,Dasa\2003
> > ,Viâs\2002
> > 
> > Problem 1: As you can see, I don't really want the first four 
> > characters, I want the first four SEARCHABLE characters.  How 
> > can I tell MARC Record to give me the first four characters, 
> > excluding diacritics?
> > 
> > Problem 2:  In these examples 260 $c works OK, but I could 
> > get a cleaner result by accessing the date from the fixed 
> > field (008 07-10).  How would I do that?  I was looking in 
> > the tutorial, but couldn't seem to find anything that seemed 
> > to help.  If I'm missing something there please point it out.
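
On Problem 2, one possible approach, offered as a sketch rather than a tested recipe: the 008 is a control field, so its contents are a single fixed-width string, and positions 07-10 (counted from zero) hold Date 1. The helper name date1_from_008 and the sample 008 value below are made up for illustration. With MARC::Record the string would normally come from the control field's data() method, shown only in a comment here so the example runs without the module.

```perl
use strict;
use warnings;

# Return Date 1 (008 positions 07-10, zero-based), or '' if the
# field is missing or too short to contain it.
sub date1_from_008 {
    my ($f008) = @_;
    return '' unless defined $f008 && length($f008) >= 11;
    return substr( $f008, 7, 4 );
}

# Invented 40-character 008 for a book, built piece by piece so the
# positions are easy to check.
my $f008 = join '',
    '050111',     # 00-05 date entered on file
    's',          # 06    type of date
    '2003',       # 07-10 Date 1
    '    ',       # 11-14 Date 2
    'ii ',        # 15-17 place of publication
    ' ' x 11,     # 18-28
    '000', ' ', '0', ' ',    # 29-34
    'hin',        # 35-37 language
    ' ', 'd';     # 38-39

# With MARC::Record, inside the while loop, something like:
#   my $field008 = $r->field('008');
#   my $f008     = $field008 ? $field008->data() : undef;
print date1_from_008($f008), "\n";    # prints "2003"
```

Unlike 260 $c, this position never carries brackets, a leading "c", or other punctuation, though it can contain "u" for unknown digits.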
> > 
> >  Thanks in advance to anyone who can help.
> > 
> >  
> > JJ
> > 
> > 
> > 
> > **Views expressed by the author do not necessarily represent 
> > those of the Queens Library.**
> > 
> > Jane Jacobs
> > Asst. Coord., Catalog Division
> > Queens Borough Public Library
> > 89-11 Merrick Blvd.
> > Jamaica, NY 11432
> > 
> > tel.: (718) 990-0804
> > e-mail: [EMAIL PROTECTED]
> > FAX. (718) 990-8566 
> > 
> > 
> 
