That worked well!
Thanks!
JJ

**Views expressed by the author do not necessarily represent those of the 
Queens Library.**

Jane Jacobs
Asst. Coord., Catalog Division
Queens Borough Public Library
89-11 Merrick Blvd.
Jamaica, NY 11432

tel.: (718) 990-0804
e-mail: [EMAIL PROTECTED]
FAX. (718) 990-8566



-----Original Message-----
From: Doran, Michael D [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, January 11, 2005 3:13 PM
To: perl4lib@perl.org
Subject: RE: Ignoring Diacritics accessing Fixed Field Data


Hi Jane,

These answers assume that the data you are processing:
1) is encoded in the MARC-8 character set, and
2) consists of the MARC-8 default basic and extended Latin characters.

> Dave,Ayod\2003
> Paòt,Kaâs\2002
> Baks,Dasa\2003
> ,Viâs\2002
>
> Problem 1: As you can see, I don't really want the first four
> characters, I want the first four SEARCHABLE characters. How
> can I tell MARC Record to give me the first four characters, 
> excluding diacritics?

Assuming that you are asking how to strip out the MARC-8 combining diacritic 
characters, try inserting the substitution commands shown below (the lines 
without the ">" prefix) just prior to the substr commands:

>                 my $ME = $field->subfield('a');
                  $ME =~ s/[\xE1-\xFE]//g;
>                 my $four100 = substr( $ME, 0, 4 );

>                 my $TITLE = $field->subfield('a');
                  $TITLE =~ s/[\xE1-\xFE]//g;
>                 my $four245 = substr( $TITLE, 0, 4 );
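
For anyone who wants to try the idea without a MARC file at hand, here is a
minimal, self-contained sketch of the same substitution wrapped in a helper.
The helper name first_searchable and the sample strings are hypothetical and
only for illustration; the byte range is the one given above, and the sketch
assumes the input is a MARC-8 byte string in which diacritics are separate
combining bytes. It joins the date with a forward slash, as in the search key
format described in the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: strip MARC-8 combining diacritic bytes (the range
# suggested in this message), then return the first $n remaining characters.
sub first_searchable {
    my ( $text, $n ) = @_;
    return '' unless defined $text;    # e.g. a record with no 100 $a
    $text =~ s/[\xE1-\xFE]//g;         # remove combining diacritics
    return substr( $text, 0, $n );
}

# Made-up sample data: each string has one combining byte after the "a".
my $name  = "Pa\xF2tel";
my $title = "Ka\xE2si";
my $date  = '2002';

my $key = first_searchable( $name, 4 ) . ','
        . first_searchable( $title, 4 ) . '/' . $date;
print "$key\n";    # prints "Pate,Kasi/2002"
```

Returning an empty string for an undefined subfield keeps a record without a
100 $a from triggering "uninitialized value" warnings under "use warnings".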

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 cell
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/ 

> -----Original Message-----
> From: Jacobs, Jane W [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 11, 2005 12:30 PM
> To: perl4lib@perl.org
> Subject: Ignoring Diacritics accessing Fixed Field Data
> 
> Hi folks,
> 
> I'm trying to write a routine to construct a text file of
> OCLC search keys from a group of existing records.  What I 
> want is something like:
> 
> Brah,vasa/2003
> 
> That is 1st four letters of 100 + comma + 1st four letters of
> 245 + slash + date.
> 
> In principle I have this working with:
> 
> 
> open( FOURS, ">4-4-date.txt" );
> 
> 
> while ( my $r = $batch->next() ) {
>       
>         my @fields = $r->field( '100' );
>         foreach my $field ( @fields ) {
>                 my $ME = $field->subfield('a');
>                 my $four100 = substr( $ME, 0, 4 );
>       
>                 print FOURS "$four100";
>         }     
> 
>         my @fields = $r->field( '245' );
>         foreach my $field ( @fields ) {
>                 my $TITLE = $field->subfield('a');
>                 my $four245 = substr( $TITLE, 0, 4 );
>                 print FOURS ",$four245";
>         }     
> 
>         my @fields = $r->field( '260' );
>         foreach my $field ( @fields ) {
>                 my $PD = $field->subfield('c');
>                 my $four260 = substr( $PD, 0, 4);
>                 print FOURS "\\$four260\n";
>         }
> }
> 
> 
> My result was something like:
> 
> Dave,Ayod\2003
> Paòt,Kaâs\2002
> Baks,Dasa\2003
> ,Viâs\2002
> 
> Problem 1: As you can see, I don't really want the first four
> characters, I want the first four SEARCHABLE characters.  How 
> can I tell MARC Record to give me the first four characters, 
> excluding diacritics?
> 
> Problem 2:  In these examples 260 $c works OK, but I could
> get a cleaner result by accessing the date from the fixed 
> field (008 07-10).  How would I do that?  I was looking in 
> the tutorial, but couldn't find anything that seemed 
> to help.  If I'm missing something there, please point it out.
> 
>  Thanks in advance to anyone who can help.
> 
>  
> JJ
