tional/questions/qa-forms-utf-8
You'll need to add the MARC control characters ^_, ^^, and ^] (0x1F, 0x1E
and 0x1D) to the ASCII part of the expression in the above page. (I think the
W3C example is aimed at XML 1.0, in which the MARC control characters are not
allowed.)
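
As a rough illustration of the change described above, here is a sketch in Python rather than Perl. It assumes the expression on the W3C page is the widely circulated byte-range pattern for well-formed UTF-8; the name `is_valid_utf8_marc` is made up for this example. The only difference from that pattern is the extra `\x1D-\x1F` range in the ASCII branch.

```python
import re

# W3C-style UTF-8 validity pattern. The ASCII branch is widened with
# \x1D-\x1F so that the MARC record terminator, field terminator and
# subfield delimiter are accepted (an assumption about what "the ASCII
# part of the expression" refers to).
UTF8_WITH_MARC = re.compile(
    rb"""
    (?:
        [\x09\x0A\x0D\x1D-\x1F\x20-\x7E]    # ASCII plus MARC delimiters
      | [\xC2-\xDF][\x80-\xBF]              # non-overlong 2-byte
      | \xE0[\xA0-\xBF][\x80-\xBF]          # excluding overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}   # straight 3-byte
      | \xED[\x80-\x9F][\x80-\xBF]          # excluding surrogates
      | \xF0[\x90-\xBF][\x80-\xBF]{2}       # planes 1-3
      | [\xF1-\xF3][\x80-\xBF]{3}           # planes 4-15
      | \xF4[\x80-\x8F][\x80-\xBF]{2}       # plane 16
    )*
    """,
    re.VERBOSE,
)


def is_valid_utf8_marc(data: bytes) -> bool:
    """Return True if every byte of data belongs to a well-formed sequence."""
    return UTF8_WITH_MARC.fullmatch(data) is not None
```

A record that fails this check is not necessarily MARC-8; it may simply be damaged, so the result is best treated as a hint.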
Ashley.
--
Ashley Sanders a.sand...@manchester
but the results weren't very good.
>>
>> Anyone have recommendations? I'd prefer it be in perl but it doesn't have to
>> be.
>>
>> Arvin
>>
--
Ashley Sanders a.sand...@manchester.ac.uk
Copac http://copac.ac.uk -- A Mimas service funded by JISC
-8 data mixed in with UTF-8. Or a MARC-8 record with
the wrong leader info.
Unfortunately bad UTF-8 is pretty common in my experience. You can
use a regexp to check if something is valid utf-8:
http://keithdevens.com/weblog/archive/2004/Jun/29/UTF-8.regex
Then it's up to you to take appropriate
e MARC end-of-record characters
into newlines. Then use the split command to carve up
the output of tr into files of 1000 records.
You then may have to use tr to convert the newlines back
to MARC end-of-record characters.
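
The same idea can be sketched in Python instead of tr and split. This is only an illustration: the function name `split_marc` is hypothetical, and it assumes the end-of-record character is 0x1D and that the whole file fits in memory. Splitting directly on the terminator avoids the round trip through newlines.

```python
RECORD_TERMINATOR = b"\x1d"  # MARC end-of-record character


def split_marc(data: bytes, per_file: int = 1000) -> list[bytes]:
    """Cut a buffer of MARC records into chunks of at most per_file records."""
    # Drop empty or whitespace-only pieces, such as whatever follows the
    # final terminator, and put the terminator back on each record.
    records = [
        piece + RECORD_TERMINATOR
        for piece in data.split(RECORD_TERMINATOR)
        if piece.strip()
    ]
    return [
        b"".join(records[start:start + per_file])
        for start in range(0, len(records), per_file)
    ]
```

Each element of the returned list can then be written to its own file in binary mode.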
Ashley.
struction, ie:
$mgr = new Net::Z3950::Manager(async => 0, user => xxx, pass => yyy);
or perhaps adding it here instead:
$conn = new Net::Z3950::Connection($mgr, $host, $port, async => 0);
Hope this helps,
Regards,
Ashley.
k
the same MARC bib record that you are already getting,
and attached to it, a series of holdings records either
in MARC format or in a z39.50 specific format. How you
access these holdings records will, again, vary depending
on the client software you are using.
Regards,
Ashley.
--
Ashley Sanders
appear somewhere.
Ashley.
]|\xf0[cC])
which may be rather too simple. For a critical application I'd come up
with something a bit better (after first eye-balling a load of records.)
Just as an aside, I'm not using Perl -- I'm using the Boost Regex
library for C++ (which is a good implementation of Perl regexp
ring of text (which admittedly you don't tend to get in MARC records) that
tests as UTF-8 is very unlikely to be anything else. Distinguishing Latin-1
from MARC-8 is a bit more like guesswork. As a test for MARC-8 I look for
the common combining diacritics followed by a vowel.
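
A minimal Python sketch of that order of tests follows. The particular byte ranges are an assumption chosen for illustration: in MARC-8 the combining marks precede the base letter, E1-E4 are grave, acute, circumflex and tilde, E8 is the umlaut and F0 is the cedilla. The function name `guess_encoding` is hypothetical, and the Latin-1 answer is simply the fallback.

```python
import re

# A few common MARC-8 combining diacritics followed by a vowel, plus a
# cedilla followed by c or C. This is an illustrative guess, not a
# complete or authoritative test.
MARC8_HINT = re.compile(rb"[\xE1-\xE4\xE8][aeiouAEIOU]|\xF0[cC]")


def guess_encoding(data: bytes) -> str:
    """Guess 'ascii', 'utf-8', 'marc-8' or 'latin-1' for a byte string."""
    if all(byte < 0x80 for byte in data):
        return "ascii"
    try:
        # Non-ASCII text that decodes cleanly as UTF-8 is very unlikely
        # to be anything else.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    if MARC8_HINT.search(data):
        return "marc-8"
    return "latin-1"  # fallback; this is the guesswork part
```

For anything critical, the pattern should be tuned after looking at a sample of real records.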
Regards,
A
umber of records
it will return in any one request, but I wouldn't have thought
20 records would cause any server a problem (unless they are
very large records.))
Ashley.
) | ed - xx
Ashley.
a hash reference. I think you need to use the
-labels option which does take a reference to a hash. The -labels
hash lets you display one thing to the user and return another
value to your script.
Regards,
Ashley.
t; by
John Doe becomes Doe,Foo in our database.
Which is a long-winded way of saying that a simple
substr ($TITLE, 0, 4) may not be appropriate in all
cases.
Regards,
Ashley.
cord does not play nicely with Unicode (UTF8).
http://rt.cpan.org/NoAuth/Bug.html?id=3707
It is possible they are MARC-8 characters rather than UTF-8. In MARC-8
E5 is "macron" and F2 is "dot below." Is MARC::Record trying to treat
them as Unicode when in fact they are MARC-
MARC 21 specifications for record
structure, character sets and exchange media" published in 2000
by the LoC. ISBN 0844410063.
Ashley.
eroes."
So by the above definitions a tag of "00A" is a control field
whereas "SYS" is a data field.
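
The rule as applied in the two examples above can be written as a one-line check. This Python sketch assumes the definition being quoted is that a control field is one whose three-character tag begins with two zeroes; the function name `is_control_field` is made up for the illustration.

```python
def is_control_field(tag: str) -> bool:
    """True if a three-character MARC tag begins with two zeroes."""
    return len(tag) == 3 and tag.startswith("00")


print(is_control_field("00A"))  # control field
print(is_control_field("SYS"))  # data field
```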
Regards,
Ashley.
cords or of a special Z39.50 defined format.
Regards,
Ashley.
and '=' or else weirdness will ensue. :)
Just to point out that '?' and '=' are (amongst many other
non-alphanumeric characters) explicitly allowed by MARC21 for use in local
data elements. So they are standards-conforming really.
Ashley.
Ed,
> Thanks for the details Ashley.
The full details (my email had a couple of typos) are at:
http://www.loc.gov/marc/bibliographic/ecbdldrd.html
(I think the above page uses # to represent a space character.)
Ashley.
--
Ashley Sanders[EMAIL PROTEC
Test Author00aTest Title
If an application such as zebra is doing things correctly, then
it has every right to think the record is bad if it sees these
errors.
Of course, it may be something else completely.
Regards,
Ashley.
--
Ashley Sanders[EMAIL PROTEC
'isbn' => $isbn);
> foreach my $i (1..$rs->count) {
> $book = $rs->match($i);
> print $book->title_proper, "\n";
> ... other MARC::Record operations here ...
> }
Have you seen the perl binding of Zoom; an easy to u