Whoops, sorry to Rick for the double post, but I realized I didn't send
my answers to the list directly...

> perl 5.10.0 --
>  linux/ ubuntu  2.6.31-19-generic-pae #56-Ubuntu SMP

OK, that should have some nice Unicode options and should be using
Unicode internally.


> however, when I use: LWP, and 'get',
> like this:
>     $URL = "http://www.worldcat.org/webservices/catalog/content/$onum?servicelevel=full&wskey=$WSKEY";
>     $xml_text = get $URL;
> and I print xml_text, the copyright symbol is now just 0xA9, so when I do a
>
>  new MARC::Record->new_from_xml($xml_text);

In this case the first place I'd look wouldn't be MARC::Record.  I'd
start with what LWP does.  It has quite a few encoding options, and
Perl now has more built-in UTF-8 support as well.  What happens if you
just dump the LWP result to a file?  Also, look at the decoded_content
method described at
http://search.cpan.org/~gaas/HTTP-Message-6.02/lib/HTTP/Response.pm.
Check what the response headers actually say about the charset, too;
you may have to override them.

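i.e. $response->decoded_content, where $response is the HTTP::Response
object you get back from LWP::UserAgent (decoded_content is a method on
the response object, not on the plain string in $xml_text).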
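
Here's a rough sketch of the difference between the raw bytes and the
decoded text.  To keep it runnable without a network call (or a wskey),
it builds an HTTP::Response by hand instead of fetching one; the body
and header are made-up sample data, not what WorldCat actually sends.
In real use $response would come from LWP::UserAgent->new->get($URL).

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-built stand-in for the object LWP::UserAgent->new->get($URL)
# returns. The body is made-up sample data: the copyright sign sent
# as the two UTF-8 bytes 0xC2 0xA9.
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/xml; charset=UTF-8' ],
    "<title>\xC2\xA9 2010</title>",
);

# What the server claims about the encoding.
print "Content-Type: ", $response->header('Content-Type'), "\n";

my $raw  = $response->content;          # bytes exactly as sent
my $text = $response->decoded_content;  # characters, decoded per the header

# If the header is wrong or missing, the charset can be forced instead.
my $forced = $response->decoded_content(charset => 'UTF-8');

printf "raw: %d bytes, decoded: %d characters\n",
    length($raw), length($text);
```

The decoded string is one character shorter than the raw one because
the two-byte sequence has become a single character.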

Then make sure the leader is set up correctly (position 09 should be
'a' for a Unicode record; blank means MARC-8).
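
For the leader, the byte that matters is position 09, the character
coding scheme.  A minimal sketch on a plain string; the leader value
here is made up for illustration.  With MARC::Record you would read the
string with $record->leader and write it back with
$record->leader($leader).

```perl
use strict;
use warnings;

# Made-up 24-character leader; position 09 is blank, i.e. MARC-8.
my $leader = '00714cam  2200205 a 4500';

# Flag the record as Unicode by setting position 09 to 'a'.
if (substr($leader, 9, 1) ne 'a') {
    substr($leader, 9, 1) = 'a';
}

print "leader: [$leader]\n";
```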

You might need to use the Encode module, but I don't think so.  (I
think LWP would put it into UTF-8 by default, but that might change
based on system settings.)  If you have problems, look at the encoding
stuff: decode on the way in from LWP, then encode into UTF-8 before
handing to MARC::Record.
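
Something like this is what I mean by decode on the way in, encode on
the way out.  It's only a sketch: the input string is made up, and I'm
assuming the incoming bytes are ISO-8859-1 (which is where a lone 0xA9
for the copyright sign would come from).  Check the response headers
before assuming that for your data.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Made-up input: ISO-8859-1 bytes, where the copyright sign is 0xA9.
my $latin1_bytes = "Copyright \xA9 2010";

# Decode on the way in: bytes -> Perl character string.
my $chars = decode('ISO-8859-1', $latin1_bytes);

# Encode on the way out: characters -> UTF-8 bytes.
# The single byte 0xA9 becomes the two bytes 0xC2 0xA9.
my $utf8_bytes = encode('UTF-8', $chars);

# Hex dump of the result, to check the bytes by eye.
print join(' ', map { sprintf '%02X', ord } split //, $utf8_bytes), "\n";
```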

(Btw, are you using LWP, or are you actually using LWP::Simple?
Simple doesn't have as many Unicode options, so I'd go with full-scale
LWP.)

I know I've used http://juerd.nl/site.plp/perluniadvice in the past.
It's got some useful info.

Jon Gorman
