Hello all,

Still working on UNICODE in Koha.

We are stuck with a not-so-nice problem. (Many, many thanks to the librarians who wrote the MARC21 and UNIMARC standards...)

Let me explain:

Yesterday:
Joshua: "The new MARC::File::XML works fine with UTF-8 now."
Me: "Great! I'll give it a try."

Today:
Me: "Oh non, ça ne marche pas" (in English: "hey, it doesn't work...")

My XML (coming from Zebra) is UTF-8, but the MARC::Record I get after
    my $record = MARC::Record->new_from_xml($raw, 'utf8');
is MARC-8...

One hour later, Joshua woke up, like most Americans, and we began digging on the #koha IRC channel.
1 hour later the problem was identified :
PROBLEM :
* In MARC21, the encoding is defined by position 9 of the leader. 'a' means UTF-8.
* In UNIMARC, this is an empty position! The encoding is in positions 26-27 and 28-29 of 100$a (fields below 200 are all fixed-length coded fields in UNIMARC: http://bibliotheque.bgp-fr.com/Unimarc_abrege.pdf, page 8 for 100$a).
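
To make the two locations concrete, here is a minimal sketch in plain Perl. The helper names (marc21_is_utf8, unimarc_charsets) and the sample leader and 100$a values are made up for illustration; they are not part of MARC::Record or MARC::File::XML. I am also assuming that code '50' in 100$a means ISO 10646 / UTF-8, so please check that against the UNIMARC manual.

```perl
use strict;
use warnings;

# Hypothetical helpers, NOT part of MARC::Record or MARC::File::XML.

# MARC21: position 9 of the leader (counted from 0); 'a' means UTF-8.
sub marc21_is_utf8 {
    my ($leader) = @_;
    return 0 unless defined $leader && length($leader) > 9;
    return substr($leader, 9, 1) eq 'a' ? 1 : 0;
}

# UNIMARC: the character sets are in positions 26-27 and 28-29 of 100$a.
# Returns the two 2-character codes, or an empty list if the data is too short.
sub unimarc_charsets {
    my ($field_100a) = @_;
    return unless defined $field_100a && length($field_100a) >= 30;
    return (substr($field_100a, 26, 2), substr($field_100a, 28, 2));
}

# Made-up sample data, for illustration only.
my $sample_leader = '00000nam a2200000 a 4500';
my $sample_100a   = join '',
    '20060101', 'd', '2006', ' ' x 4,   # positions 0-16: dates
    'u  ', 'y', '0', 'fre', 'y',        # positions 17-25: other coded data
    '50', '  ',                         # positions 26-29: character sets
    ' ' x 4, 'ba';                      # positions 30-35

print "MARC21 leader says UTF-8: ", marc21_is_utf8($sample_leader), "\n";
print "UNIMARC character sets: [", join('] [', unimarc_charsets($sample_100a)), "]\n";
```

The point is only that the same question ("is this record UTF-8?") needs two completely different lookups depending on the MARC flavour.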

BIG PROBLEM :
MARC::File::XML only checks position 9 of the leader, assuming the XML is necessarily a MARC21 record.

I think (& joshua agrees) we will have to hack MARC::File::XML to solve this problem.
We have 2 possible solutions:
* Add a test to determine whether the record is UNIMARC or MARC21. In UNIMARC, the title is in 200, while 200 is empty in MARC21.
* Add a parameter, as in ->new_from_xml($xml,'UTF-8','UNIMARC'), to specify that we are sending the parser a UNIMARC file.
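
A rough sketch of how the two solutions could fit together, in plain Perl with no MARC modules: an explicit format wins if it is given, otherwise we guess from the presence of field 200. Everything here (guess_marc_flavour, record_is_utf8, the argument names) is hypothetical and only meant to show the logic, not how MARC::File::XML does or should implement it. It again assumes that '50' in positions 26-27 of 100$a means UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helpers, NOT part of MARC::Record or MARC::File::XML.

# Solution 1: guess the flavour from the tags present in the record.
# In UNIMARC the title is in 200, while 200 is empty in MARC21.
sub guess_marc_flavour {
    my (@tags) = @_;
    return (grep { $_ eq '200' } @tags) ? 'UNIMARC' : 'MARC21';
}

# Solution 2: accept an explicit 'format' argument, and fall back to the
# guess above when the caller does not pass one.
sub record_is_utf8 {
    my (%args) = @_;
    my $format = defined $args{format}
        ? $args{format}
        : guess_marc_flavour(@{ $args{tags} || [] });

    if ($format eq 'UNIMARC') {
        # Assumption: code '50' in positions 26-27 of 100$a means UTF-8.
        my $coded = defined $args{field_100a} ? $args{field_100a} : '';
        return 0 if length($coded) < 28;
        return substr($coded, 26, 2) eq '50' ? 1 : 0;
    }

    # MARC21: position 9 of the leader, 'a' means UTF-8.
    my $leader = defined $args{leader} ? $args{leader} : '';
    return 0 if length($leader) < 10;
    return substr($leader, 9, 1) eq 'a' ? 1 : 0;
}

# Made-up UNIMARC record: blank leader position 9, '50' in 100$a.
my $is_utf8 = record_is_utf8(
    leader     => '00000nam  2200000   450 ',
    tags       => [ '001', '100', '200' ],
    field_100a => ('x' x 26) . '50' . (' ' x 8),
);
print "Record treated as UTF-8: $is_utf8\n";
```

The guess is fragile (a broken UNIMARC record without a 200 would be treated as MARC21), which is one argument for letting the caller pass the format explicitly.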

Ed et al., let me know what you think. Thanks.
--
Paul POULAIN et Henri Damien LAURENT
Consultants indépendants
en logiciels libres et bibliothéconomie (http://www.koha-fr.org)
