A bulletin from the "haste makes waste" department...

> $ME =~ s/[\xE1-\xFE]//g;
> $TITLE =~ s/[\xE1-\xFE]//g;
Oops, that should be "E0" instead of "E1" as the first hex value in the substitutions:

    $ME =~ s/[\xE0-\xFE]//g;
    $TITLE =~ s/[\xE0-\xFE]//g;

Sorry,

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 cell
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/

> -----Original Message-----
> From: Doran, Michael D
> Sent: Tuesday, January 11, 2005 2:13 PM
> To: perl4lib@perl.org
> Subject: RE: Ignoring Diacritics accessing Fixed Field Data
>
> Hi Jane,
>
> These answers assume that the data you are processing:
> 1) is encoded in the MARC-8 character set, and
> 2) consists of the MARC-8 default basic and extended Latin characters.
>
> > Dave,Ayod\2003
> > Paòt,Kaâs\2002
> > Baks,Dasa\2003
> > ,Viâs\2002
> >
> > Problem 1: As you can see, I don't really want the first four
> > characters, I want the first four SEARCHABLE characters. How
> > can I tell MARC Record to give me the first four characters,
> > excluding diacritics?
>
> Assuming that you are asking how to strip out the MARC-8
> combining diacritic characters, try inserting the
> substitution commands listed (as shown below) just prior to
> the substr commands:
>
> >     my $ME = $field->subfield('a');
>       $ME =~ s/[\xE1-\xFE]//g;
> >     my $four100 = substr( $ME, 0, 4 );
>
> >     my $TITLE = $field->subfield('a');
>       $TITLE =~ s/[\xE1-\xFE]//g;
> >     my $four245 = substr( $TITLE, 0, 4 );
>
> -- Michael
>
> # Michael Doran, Systems Librarian
> # University of Texas at Arlington
> # 817-272-5326 office
> # 817-688-1926 cell
> # [EMAIL PROTECTED]
> # http://rocky.uta.edu/doran/
>
> > -----Original Message-----
> > From: Jacobs, Jane W [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, January 11, 2005 12:30 PM
> > To: perl4lib@perl.org
> > Subject: Ignoring Diacritics accessing Fixed Field Data
> >
> > Hi folks,
> >
> > I'm trying to write a routine to construct a text file of
> > OCLC search keys from a group of existing records.
> > What I want is something like:
> >
> >     Brah,vasa/2003
> >
> > That is 1st four letters of 100 + comma + 1st four letters of
> > 245 + slash + date.
> >
> > In principle I have this working with:
> >
> >     open( FOURS, ">4-4-date.txt" );
> >
> >     while ( my $r = $batch->next() ) {
> >
> >         my @fields = $r->field( '100' );
> >         foreach my $field ( @fields ) {
> >             my $ME = $field->subfield('a');
> >             my $four100 = substr( $ME, 0, 4 );
> >
> >             print FOURS "$four100";
> >         }
> >
> >         my @fields = $r->field( '245' );
> >         foreach my $field ( @fields ) {
> >             my $TITLE = $field->subfield('a');
> >             my $four245 = substr( $TITLE, 0, 4 );
> >             print FOURS ",$four245";
> >         }
> >
> >         my @fields = $r->field( '260' );
> >         foreach my $field ( @fields ) {
> >             my $PD = $field->subfield('c');
> >             my $four260 = substr( $PD, 0, 4);
> >             print FOURS "\\$four260\n";
> >         }
> >     }
> >
> > My result was something like:
> >
> >     Dave,Ayod\2003
> >     Paòt,Kaâs\2002
> >     Baks,Dasa\2003
> >     ,Viâs\2002
> >
> > Problem 1: As you can see, I don't really want the first four
> > characters, I want the first four SEARCHABLE characters. How
> > can I tell MARC Record to give me the first four characters,
> > excluding diacritics?
> >
> > Problem 2: In these examples 260 $c works OK, but I could
> > get a cleaner result by accessing the date from the fixed
> > field (008 07-10). How would I do that? I was looking in
> > the tutorial, but couldn't seem to find anything that seemed
> > to help. If I'm missing something there please point it out.
> >
> > Thanks in advance to anyone who can help.
> >
> > JJ
> >
> > **Views expressed by the author do not necessarily represent
> > those of the Queens Library.**
> >
> > Jane Jacobs
> > Asst. Coord., Catalog Division
> > Queens Borough Public Library
> > 89-11 Merrick Blvd.
> > Jamaica, NY 11432
> >
> > tel.: (718) 990-0804
> > e-mail: [EMAIL PROTECTED]
> > FAX. (718) 990-8566
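Pulling the thread together, here is a minimal sketch, not code from any of the messages above. It strips MARC-8 combining diacritics using the corrected 0xE0-0xFE range before taking the first four characters. It also covers Problem 2 by reading Date 1 from 008 positions 07-10 (zero-based offset 7, length 4). It assumes the strings hold raw MARC-8 bytes with no conversion to Unicode. The subroutine names are hypothetical, and the "/" separator follows the key format described in the original question rather than the backslash her code printed.

```perl
use strict;
use warnings;

# Remove MARC-8 combining diacritics (bytes 0xE0-0xFE), using the
# corrected range from this thread. ASSUMPTION: $text holds raw MARC-8
# bytes, not decoded Unicode characters.
sub strip_marc8_diacritics {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/[\xE0-\xFE]//g;
    return $text;
}

# First four characters left after the diacritics are removed.
# substr() simply returns a shorter string if fewer than four remain.
sub first_four {
    my ($text) = @_;
    return substr( strip_marc8_diacritics($text), 0, 4 );
}

# Date 1 from the 008 fixed field: positions 07-10, i.e. offset 7,
# length 4. Returns '' if the 008 data is missing or too short.
sub date1_from_008 {
    my ($data008) = @_;
    return '' unless defined $data008 && length($data008) >= 11;
    return substr( $data008, 7, 4 );
}

# Hypothetical helper building the "1004,2454/date" key described
# in the question, e.g. "Dave,Ayod/2003".
sub search_key {
    my ( $main_entry, $title, $data008 ) = @_;
    return sprintf( '%s,%s/%s',
        first_four($main_entry),
        first_four($title),
        date1_from_008($data008) );
}
```

Inside the `while` loop from the original message, the arguments could come from `$r->field('100')->subfield('a')`, `$r->field('245')->subfield('a')` and `$r->field('008')->data()`. In scalar context `field()` returns the first matching field, or undef when the record lacks that tag, so check the field before calling a method on it.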