Hello all,
Still working on Unicode in Koha.
We are stuck on a not-so-nice problem. (Many, many thanks to the
librarians who wrote the MARC21 and UNIMARC standards...)
Let me explain:
Yesterday:
joshua: "the new MARC::File::XML works fine with UTF-8 now".
me: "Great! I'll give it a try"
Today:
me: "oh non, ça ne marche pas" (in English: "hey, it doesn't work...")
My XML (coming from Zebra) is UTF-8, but the MARC::Record I get after
my $record = MARC::Record->new_from_xml($raw, 'utf8');
is MARC-8...
One hour later, joshua woke up, like most Americans, and we began
digging on the #koha IRC channel.
One hour after that, the problem was identified:
PROBLEM:
* in MARC21, the encoding is defined by position 9 of the leader: 'a'
means UTF-8, and a blank means MARC-8
* in UNIMARC, that position is always blank! The encoding is given in
positions 26-27 and 28-29 of 100$a (fields below 200 are all fixed-length
coded fields in UNIMARC:
http://bibliotheque.bgp-fr.com/Unimarc_abrege.pdf, page 8 for 100$a)
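
To make the two locations concrete, here is a minimal sketch in plain
Perl (no MARC modules needed). The helper names marc21_is_utf8 and
unimarc_charsets are hypothetical, the sample leader and 100$a strings
are made-up dummies, and I am assuming '50' is the UNIMARC code for
ISO 10646 / UTF-8, so please check it against the manual:

```perl
use strict;
use warnings;

# Hypothetical helper: does a MARC21 leader declare UTF-8?
# Positions are 0-based, so "position 9" is the 10th character.
sub marc21_is_utf8 {
    my ($leader) = @_;
    return 0 if !defined $leader || length($leader) < 10;
    return substr($leader, 9, 1) eq 'a' ? 1 : 0;
}

# Hypothetical helper: return the two character-set codes stored in
# UNIMARC 100$a at positions 26-27 and 28-29 (0-based).
# Returns an empty list if the subfield is too short.
sub unimarc_charsets {
    my ($f100a) = @_;
    return if !defined $f100a || length($f100a) < 30;
    return (substr($f100a, 26, 2), substr($f100a, 28, 2));
}

# Dummy data for illustration only.
my $marc21_leader  = '00714cam a2200205 a 4500';   # position 9 is 'a'
my $unimarc_leader = '00714cam  2200205   450 ';   # position 9 is blank
# 36-character 100$a with only the character-set positions filled in;
# '50' is assumed to mean ISO 10646 / UTF-8.
my $unimarc_100a   = (' ' x 26) . '50' . '  ' . (' ' x 6);

print "MARC21 leader says UTF-8:  ", marc21_is_utf8($marc21_leader), "\n";
print "UNIMARC leader says UTF-8: ", marc21_is_utf8($unimarc_leader), "\n";
my ($set1, $set2) = unimarc_charsets($unimarc_100a);
print "UNIMARC 100\$a charsets: '$set1' and '$set2'\n";
```

The second print is the whole problem in one line: a check that only
looks at the leader cannot see that the UNIMARC record is UTF-8.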
BIG PROBLEM:
MARC::File::XML only checks position 9, assuming the XML is
necessarily a MARC21 record. Since that position is blank in UNIMARC,
our UTF-8 records end up being treated as MARC-8.
I think (& joshua agrees) we will have to hack MARC::File::XML to solve
this problem.
We have 2 solutions:
* add a test to determine whether the record is UNIMARC or MARC21. In
UNIMARC, the title is in field 200, while field 200 is not used in MARC21.
* add a parameter, as in ->new_from_xml($xml,'UTF-8','UNIMARC'), to
specify that we are sending the parser a UNIMARC record.
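
Here is a rough sketch of how the two solutions could fit together, in
plain Perl so it runs without any MARC module. It is not a patch for
MARC::File::XML: guess_flavour and is_utf8_record are hypothetical
names, the inputs (a list of tags, the leader, the 100$a string) are
simplified stand-ins for a real record, and '50' is again my assumed
UNIMARC code for UTF-8:

```perl
use strict;
use warnings;

# Solution 1 (heuristic): guess the flavour from the tags present.
# A record with a 200 field is taken to be UNIMARC, anything else MARC21.
sub guess_flavour {
    my (@tags) = @_;
    my $has_200 = grep { $_ eq '200' } @tags;
    return $has_200 ? 'UNIMARC' : 'MARC21';
}

# Solution 2 (explicit): the caller names the flavour. When it is not
# given, fall back to the heuristic above.
sub is_utf8_record {
    my (%args) = @_;
    my $flavour = $args{flavour} || guess_flavour(@{ $args{tags} || [] });

    if ($flavour eq 'UNIMARC') {
        # Only the first character-set code (positions 26-27) is checked.
        my $f100a = defined $args{f100a} ? $args{f100a} : '';
        return 0 if length($f100a) < 28;
        return substr($f100a, 26, 2) eq '50' ? 1 : 0;
    }

    # MARC21: leader position 9 (0-based) must be 'a'.
    my $leader = defined $args{leader} ? $args{leader} : '';
    return 0 if length($leader) < 10;
    return substr($leader, 9, 1) eq 'a' ? 1 : 0;
}

# Dummy UNIMARC record: blank leader position 9, '50' in 100$a.
my %unimarc = (
    tags   => [ '001', '100', '200', '700' ],
    leader => '00714cam  2200205   450 ',
    f100a  => (' ' x 26) . '50' . (' ' x 8),
);

print "guessed flavour: ", guess_flavour(@{ $unimarc{tags} }), "\n";
print "UTF-8 (guessed flavour): ", is_utf8_record(%unimarc), "\n";
print "UTF-8 (forced MARC21):   ",
    is_utf8_record(%unimarc, flavour => 'MARC21'), "\n";
```

The heuristic needs no change in the calling code but can guess wrong
on odd records; the explicit parameter is safer but every caller has to
pass it. Doing both (parameter first, guess as fallback) may be a
reasonable compromise.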
Ed et al., let me know what you think, thanks.
--
Paul POULAIN et Henri Damien LAURENT
Independent consultants
in free software and library science (http://www.koha-fr.org)