Re: Marc::XML with MARC21

2010-01-26 Thread Ed Summers
Oops, I forgot to attach the script as promised, didn't I? I also meant
to say that this is a fine place to discuss questions about EPrints
too, although it might be good to ask on EPrints-specific lists, where
there may be more EPrints eyes.

//Ed


test.pl
Description: Binary data


Re: Marc::XML with MARC21

2010-01-26 Thread Ed Summers
Hi Michele:

Yes, I see a UTF-8 encoding error in that file when I try to check it
with xmllint (from the libxml2 package):

e...@curry:~/Downloads$ xmllint marc.xml
marc.xml:1: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xE0 0x20 0x3A 0x3C
ld code="b">le infrastrutture, l' organizzazione, i contratti e le responsabilit

This causes MARC::Record->new_from_xml to blow up too, with a somewhat
unhelpful error:

not well-formed (invalid token) at line 1, column 1533, byte 1533 at
/usr/lib/perl5/XML/Parser.pm line 187

It looks like your XML file might be in ISO-8859-1 (at least that's what
the Unix file command told me):

e...@curry:~/Projects/marc-xml$ file marc.xml
marc.xml: ISO-8859 text, with very long lines, with no line terminators

So you could try to convert your XML string with Encode before handing
it off to MARC::Record->new_from_xml:

  use Encode qw(from_to);
  from_to($xml, 'iso-8859-1', 'utf-8');  # converts $xml in place
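
A minimal, self-contained sketch of that conversion, using only the core Encode module. The sample string, the variable names and the is_valid_utf8 helper are made up for illustration; they are not taken from the attached test.pl. The sketch assumes that any input which is not valid UTF-8 is ISO-8859-1, which may not hold for every export.

```perl
use strict;
use warnings;
use Encode qw(decode from_to FB_CROAK LEAVE_SRC);

# Hypothetical helper: true if $bytes decodes cleanly as strict UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

# Hypothetical stand-in for the XML read from marc.xml: the byte 0xE0 is
# "à" in ISO-8859-1, but is not valid on its own in UTF-8.
my $xml = "<subfield code=\"b\">le responsabilit\xE0 :</subfield>";

# Assumption: input that is not valid UTF-8 is ISO-8859-1.
unless (is_valid_utf8($xml)) {
    # from_to() is a plain function (not a class method); it converts
    # the string in place.
    from_to($xml, 'iso-8859-1', 'utf-8');
}

print is_valid_utf8($xml) ? "valid UTF-8\n" : "still not UTF-8\n";
# $xml could now be handed to MARC::Record->new_from_xml.
```

Checking first avoids double-encoding a file that is already UTF-8, which from_to would otherwise do without complaint.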

I attached the full script, which seems to work OK. Note that if you are
on Ubuntu, their libmarc-xml-perl package (v0.88) looks to be a few
versions behind the latest on CPAN (v0.92), and v0.88 doesn't handle
namespaces properly.

//Ed


Re: Marc::XML with MARC21

2010-01-26 Thread Michele Pinassi
Hi Ed,

I ran some tests and everything works correctly, as you said. At this
point I'm having some trouble importing MARC records into EPrints with
the MARC plugin developed by Jose Miguel. But maybe this is not the right
place to ask about it :-)

Thanks, Michele

Ed Summers wrote:
> Hi Michele:
> 
> I copied and pasted the XML from your email and ran it through a
> simple test script (both attached) and the record seemed to be parsed
> ok. What do you see if you run the attached test.pl?
> 
> //Ed
> 


--
|| Michele Pinassi
|| System Manager Area Sistema Biblioteche - UniSi
|| https://sites.google.com/a/unisi.it/o-zone/
|| Assistenza: +39.577.232299 (int. 2299)
|| Personale: +39.577.232477 (int. 2477)
|| FAX: +39.577.232430 (int. 2430)


Re: Marc::XML with MARC21

2010-01-26 Thread Michele Pinassi
Dear Ed,
Yes, it works as expected! I've just tried again with my marc.xml (as
attached) and there seem to be encoding problems. Maybe Aleph doesn't
export in UTF-8?

Thanks for your help,
Michele

Ed Summers wrote:
> Hi Michele:
> 
> I copied and pasted the XML from your email and ran it through a
> simple test script (both attached) and the record seemed to be parsed
> ok. What do you see if you run the attached test.pl?
> 
> //Ed
> 


http://www.loc.gov/MARC21/slim" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">^cam^^22^^i^4507626628814075913ITServizio Bibliotecario SeneseRICAVI, 262 p. ;24 cmNavigazione da diportoLegislazioneAntonini,AlfredoMorandi,FrancescoitaLa navigazione da diporto :le infrastrutture, l' organizzazione, i contratti e le responsabilità :atti del convegno, Trieste, 27 marzo 1998 /a cura di Alfredo Antonini e Francesco MorandiMilano :Giuffrè1999Collana del Dipartimento di scienze giuridiche e della Facoltà di giurisprudenza dell' Università di Modena e Reggio EmiliaNuova serie ;0048Collana del Dipartimento di scienze giuridiche e della Facoltà di giurisprudenza dell' Università di Modena e Reggio Emilia0048343.4509620^^sxx^|r^|||

Re: Marc::XML with MARC21

2010-01-25 Thread Ed Summers
Hi Michele:

I copied and pasted the XML from your email and ran it through a
simple test script (both attached) and the record seemed to be parsed
ok. What do you see if you run the attached test.pl?

//Ed


test.pl
Description: Binary data


Re: Marc::XML with MARC21

2010-01-25 Thread Jon Gorman
>
> my $file = MARC::Record->new_from_xml($marc->serialize(), "UTF-8", "MARC21");
> $epdata = $plugin->EPrints::Plugin::Import::MARC::convert_input($file);
>
> and here the trouble starts: only a few metadata fields are interpreted
> correctly, and a lot of data is lost.

Ummm, so what metadata makes it through?  I see examples of what you
feed it, but not what is coming out.  Just from looking quickly at the
MARCXML, the only thing that seems really weird right away is the
008 control field trailing at the end of the record.  I don't know what
the XSD states about the ordering, but typically all the controlfields
are at the top of a MARC record.
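
The ordering point above can be checked mechanically. The following is only a sketch in plain Perl: the misplaced_control_fields helper and the sample tag list are hypothetical, and the tags are not read from the actual record. It treats tags 001 to 009 as control fields and reports any that show up after a data field.

```perl
use strict;
use warnings;

# Hypothetical helper: return the control-field tags (001-009) that
# appear after at least one data field in the given tag order.
sub misplaced_control_fields {
    my @tags = @_;
    my $seen_data_field = 0;
    my @misplaced;
    for my $tag (@tags) {
        if ($tag =~ /^00[1-9]$/) {
            push @misplaced, $tag if $seen_data_field;
        }
        else {
            $seen_data_field = 1;
        }
    }
    return @misplaced;
}

# Made-up tag order, loosely modelled on a record whose 008 comes last.
# With MARC::Record the list could be built from the parsed record, e.g.
#   my @tags = map { $_->tag } $record->fields;
my @tags = qw(001 040 100 245 260 300 650 008);

my @late = misplaced_control_fields(@tags);
print @late
    ? "control fields after data fields: @late\n"
    : "all control fields come first\n";
```

Whether a trailing 008 actually explains the missing metadata is a separate question; this only confirms or rules out the ordering oddity.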

Jon Gorman