MARC::Charset 1.35

Galen Charlton Tue, 13 Aug 2013 20:07:39 -0700

Hi,

I have uploaded version 1.35 of MARC::Charset to CPAN.  This is a
relatively significant bugfix release, particularly for folks who need
to handle MARC-8 records containing extended Cyrillic and Arabic
characters.  Changes from 1.34 are:


- improve conversion of certain composed characters to MARC8

  Some characters should not be fully decomposed
  before converting them to MARC8.  This patch adds
  a table of such characters, based on Annex A of
  http://www.loc.gov/marc/marbi/2006/2006-04.html
  and on some sample records provided by Jason
  Stephenson of MVLC.

- recognize G0 and G1 characters properly

  When converting from MARC8 to UTF8, MARC::Charset now
  properly recognizes whether a (single-byte) MARC8 character falls
  in G0 or G1.

  This is part of the fix for RT#63271 (converting characters
  in the Extended Cyrillic character set), but should also
  fix similar issues with converting characters in the extended
  Arabic set.

  This commit also means that all MARC8 character sets that support
  both G0 and G1 will be properly converted, regardless of whether
  they're currently set as the G0 or G1 character set.  For example,
  it is now possible to convert Extended Latin as G0 or Basic Latin
  as G1.

  This fixes RT#63271.

- have MARC::Charset::Code->marc_value() handle G0/G1 conversion

  Since there's at present no need to do things like have
  ANSEL be the G0 character set when converting from UTF8 to
  MARC8, this commit centralizes the logic for deciding
  whether to return the G0 or G1 MARC8 representation of a
  character.

  Also add MARC::Charset::Code->g0_marc_value(), which returns
  the G0 representation of the character for use by the
  character DB.

- New test cases for converting Vietnamese and Extended Cyrillic
  text.
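
For anyone unfamiliar with the G0/G1 distinction mentioned above, here
is a rough sketch of the idea in Python.  It is not the module's Perl
code, and the names (graphic_set, to_g0, to_g1) are made up for
illustration.  It assumes only the ISO 2022 layout that MARC-8 uses for
single-byte sets: G0 occupies 0x21-0x7E, G1 occupies 0xA1-0xFE, and the
same character sits 0x80 apart in the two ranges.

```python
# Illustrative only: hypothetical helpers, not MARC::Charset's API.
G0_RANGE = range(0x21, 0x7F)  # graphic set G0: bytes 0x21-0x7E
G1_RANGE = range(0xA1, 0xFF)  # graphic set G1: bytes 0xA1-0xFE


def graphic_set(byte):
    """Return 'G0' or 'G1' for a graphic byte, else None (space, controls, delete)."""
    if byte in G0_RANGE:
        return "G0"
    if byte in G1_RANGE:
        return "G1"
    return None


def to_g0(byte):
    """Map a G1 byte down to its G0 position; leave other bytes unchanged."""
    return byte - 0x80 if byte in G1_RANGE else byte


def to_g1(byte):
    """Map a G0 byte up to its G1 position; leave other bytes unchanged."""
    return byte + 0x80 if byte in G0_RANGE else byte
```

Keying a lookup table on the G0 position (as to_g0 does here) means one
entry per character is enough, whichever graphic set the character set
happens to be designated as at the time.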

Regards,

Galen
-- 
Galen Charlton
[email protected]
