First off, it's entirely possible that you have bad UTF-8 (perhaps rogue MARC-8, perhaps just lousy characters) in your MARC. I know we have plenty of that crap.
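If you want to rule that out first, here is a minimal sketch of a well-formedness check using the core Encode module. The helper name is_valid_utf8 is made up for illustration, and the sample byte strings are my own assumption about what the data might look like (the \xF4 is the byte named in the marcdump error quoted below); you would feed it the raw bytes of a record or field instead.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, 0 otherwise.
# decode() with FB_CROAK dies on malformed input, so we trap that with eval.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

# "\xC3\xB4" is o-circumflex encoded as UTF-8.
# A lone "\xF4" is the same letter in Latin-1 and is not valid UTF-8.
print is_valid_utf8("c\xC3\xB4te") ? "valid\n" : "bad UTF-8\n";
print is_valid_utf8("c\xF4te")     ? "valid\n" : "bad UTF-8\n";
```

This only tells you whether the bytes are well-formed UTF-8; it says nothing about whether the leader claims the right encoding.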
You need to tell Perl that you'll be outputting UTF-8 by using binmode:

    binmode(FILE, ':utf8');

In general, you'll want to do this for basically every file you open for reading or writing.

A great overview of Perl and UTF-8 can be found at:

http://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default

On Mon, Jul 30, 2012 at 6:51 PM, Shelley Doljack <sdolj...@stanford.edu> wrote:

> Hi,
>
> I wrote a script that extracts marc records from a file given certain
> conditions and puts them in a new file. When my input record is correctly
> encoded in UTF-8 and I run my script from windows command prompt, this
> warning message appears: "Wide character in print at record_extraction.pl
> line 99" (the line in my script where I print to a new file using
> as_usmarc). I compared the extracted record before and after in MarcEdit
> and the diacritic was changed. I tried marcdump newfile.mrc to see what
> happens and I get this error: "utf8 \xF4 does not map to Unicode at
> C:/Perl64/lib/Encode.pm line 176." When I run my extraction script again
> with MARC-8 encoded data then I don't have the same problem.
>
> The basic outline of my script is:
>
> my $batch = MARC::Batch->new('USMARC', $input_file);
>
> while (my $record = $batch->next()) {
>     #do some checks
>     #if checks ok then
>     print FILE $record->as_usmarc();
> }
>
> Do I need to add something that specifies to interpret the data as UTF-8?
> Does MARC::Record not handle UTF-8 at all?
>
> Thanks,
> Shelley
>
> ----
> Shelley Doljack
> E-Resources Metadata Librarian
> Metadata and Library Systems
> Stanford University Libraries
> sdolj...@stanford.edu
> 650-725-0167
>

--
Bill Dueber
Programmer -- Library Systems
University of Michigan