# do...@uta.edu
# http://rocky.uta.edu/doran/
-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Deng, Sai
Sent: Friday, April 20, 2012 8:55 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] more on MARC char encoding
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Robert Haschart
Sent: Thursday, April 19, 2012 2:23 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
ISO_2709 and MARC21
On 4/18/2012 12:08 PM, Jonathan Rochkind wrote:
On 4/18/2012 11:09 AM, Doran, Michael D wrote:
Sent: [...], 2012 2:14 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] more on MARC char encoding
Ah, thanks Terry.
That canned cleaner in MarcEdit sounds potentially useful -- I'm in a
continuing battle to keep the character encoding in our local marc corpus clean.
(The real blame here is on cataloger [...]
[...] outside of the general
smart quote issue.
--TR
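The MarcEdit cleaner itself isn't described in this thread. As a purely illustrative sketch of one piece of the smart quote issue, assuming the field text has already been decoded from UTF-8, typographic quotation marks could be mapped to ASCII with `str.translate`. The names `straighten_quotes` and `count_smart_quotes` are hypothetical, and whether to normalize quotes at all is a local policy call.

```python
# Hypothetical cleanup pass; NOT MarcEdit's actual "canned cleaner".
# Assumes text is already decoded (e.g. from a UTF-8 record).
SMART_QUOTES = str.maketrans({
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
})


def straighten_quotes(text: str) -> str:
    """Replace typographic quotes in decoded field text with ASCII quotes."""
    return text.translate(SMART_QUOTES)


def count_smart_quotes(text: str) -> int:
    """Count the characters straighten_quotes would change, for reporting."""
    # str.maketrans converts the keys to code points, so compare ord(ch).
    return sum(1 for ch in text if ord(ch) in SMART_QUOTES)
```

Counting before replacing makes it easy to log which records were touched, rather than silently rewriting a whole corpus.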
-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Deng,
Sai
Sent: Friday, April 20, 2012 6:55 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] more on MARC char encoding
If a canned cleaner can be added
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Tod
Olson
Sent: Tuesday, April 17, 2012 10:13 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709
and MARC21
Sent: [...] AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] more on MARC char encoding
If your records are really in MARC8 not UTF8, your best bet is to use a tool to
convert them to UTF8 before hitting your XSLT.
The open source 'yaz' command line tools can do it for Marc21.
The Marc4J package can
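The conversion step suggested in this message could be scripted roughly as follows. This is a minimal Python sketch, assuming the yaz `yaz-marcdump` tool is installed and on the PATH; the `-f MARC-8 -t UTF-8 -o marc -l 9=97` options follow the example in the yaz-marcdump documentation, so double-check them against your installed version. The helper names `leader_encoding`, `marc8_to_utf8_command` and `convert` are hypothetical. In MARC 21, leader position 09 is blank for MARC-8 and `a` for UCS/Unicode, which is why `-l 9=97` (97 is the ASCII code for `a`) rewrites it in the output.

```python
import subprocess
from typing import List


def leader_encoding(record: bytes) -> str:
    """Report the character coding scheme declared in MARC 21 leader/09.

    Blank (0x20) means MARC-8; 'a' means UCS/Unicode. Anything else is
    reported as unknown rather than guessed.
    """
    if len(record) < 24:
        raise ValueError("record is shorter than the 24-octet leader")
    flag = record[9:10]
    if flag == b" ":
        return "MARC-8"
    if flag == b"a":
        return "UCS/Unicode"
    return "unknown"


def marc8_to_utf8_command(infile: str) -> List[str]:
    """Build a yaz-marcdump command line converting MARC-8 to UTF-8.

    -f/-t give source and target character sets, -o marc keeps ISO 2709
    output, and -l 9=97 sets leader/09 to 'a' (ASCII 97) in the output.
    """
    return ["yaz-marcdump", "-f", "MARC-8", "-t", "UTF-8",
            "-o", "marc", "-l", "9=97", infile]


def convert(infile: str, outfile: str) -> None:
    """Run the conversion. Assumes yaz-marcdump is installed and on PATH."""
    with open(outfile, "wb") as out:
        subprocess.run(marc8_to_utf8_command(infile), stdout=out, check=True)
```

Checking `leader_encoding` first lets a batch job skip files that already declare Unicode, but the flag only reports what a record claims, not which bytes it actually contains.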
quotes/values.
--TR
-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Jonathan Rochkind
Sent: Thursday, April 19, 2012 11:13 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] more on MARC char encoding
On 4/19/2012 3:23 PM, LeVan,Ralph wrote:
We see Unicode data pasted into MARC8 records all the time. It happens enough
that my MARC8-Unicode converter takes a second look at illegal MARC8 bytes and
tries a UTF-8 encoding as well.
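The thread doesn't show how Ralph's converter implements that second look. A coarse, field-level sketch of the idea in Python, assuming you already have some MARC-8 decoder that raises `ValueError` on illegal bytes (`UnicodeDecodeError` is a subclass). `decode_with_utf8_fallback` is a hypothetical name, and `ascii_only_stub` is a hypothetical stand-in so the example runs; it is not a MARC-8 decoder.

```python
from typing import Callable


def decode_with_utf8_fallback(raw: bytes,
                              marc8_decode: Callable[[bytes], str]) -> str:
    """Decode a field as MARC-8, retrying as UTF-8 if MARC-8 decoding fails.

    marc8_decode is whatever MARC-8 decoder you already use; it is assumed
    to raise ValueError on bytes that are illegal in MARC-8.
    """
    try:
        return marc8_decode(raw)
    except ValueError as marc8_error:
        try:
            # Second look: the "illegal" bytes may be UTF-8 pasted into a MARC-8 record.
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Neither interpretation works; surface the original MARC-8 problem.
            raise marc8_error


def ascii_only_stub(raw: bytes) -> str:
    """Hypothetical stand-in for a MARC-8 decoder: accepts 7-bit ASCII only.

    A real MARC-8 decoder also handles ANSEL diacritics and escape sequences.
    """
    return raw.decode("ascii")
```

One limitation is worth keeping in mind: UTF-8 multibyte sequences use octets from the upper half of the byte range, much of which is also assigned in MARC-8's default extended Latin (ANSEL) set, so a fallback like this only helps when the MARC-8 decoder actually rejects something.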
Right. I see it too. I'm arguing that means cataloger entry
On 4/18/2012 12:08 PM, Jonathan Rochkind wrote:
On 4/18/2012 11:09 AM, Doran, Michael D wrote:
I don't believe that is the case. Take UTF-8 out of the picture, and
consider the MARC-8 character set with its escape sequences and
combining characters. A character such as an n with a tilde
It has to mean UTF-8. ISO 2709 is very byte-oriented, from the directory
structure to the byte-offsets in the fixed fields. The values in these places
all assume 8-bit character data, it's completely baked in to the file format.
-Tod
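Tod's point that ISO 2709 is byte-oriented shows up clearly in a minimal sketch that builds and reads back a tiny MARC 21-style record. The leader's record length (positions 00-04) and base address of data (12-16), and each 12-octet directory entry's field length and starting position, are all counts of octets, so a UTF-8 `ñ` costs two of them. The helper names are hypothetical, and leader values other than the two lengths (`nam a22`, `   4500`) are placeholders, not a validated record.

```python
# MARC 21 structural delimiters used in the ISO 2709 record layout.
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def build_record(fields):
    """Assemble a minimal record from (tag, data_bytes) pairs.

    Every length and offset written here is a count of octets, not characters.
    """
    directory = b""
    body = b""
    for tag, data in fields:
        chunk = data + FIELD_TERMINATOR
        # 12-octet directory entry: tag (3), field length (4), start position (5).
        directory += tag.encode("ascii") + b"%04d" % len(chunk) + b"%05d" % len(body)
        body += chunk
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)            # the leader is always 24 octets
    record_length = base_address + len(body) + 1  # +1 for the record terminator
    leader = b"%05d" % record_length + b"nam a22" + b"%05d" % base_address + b"   4500"
    return leader + directory + body + RECORD_TERMINATOR


def read_directory(record):
    """Return (tag, field_length, field_bytes) for each directory entry."""
    base_address = int(record[12:17])
    directory = record[24:base_address - 1]       # drop the directory's terminator
    entries = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        field = record[base_address + start:base_address + start + length]
        entries.append((tag, length, field[:-1]))  # strip the field terminator
    return entries


if __name__ == "__main__":
    title = b"10" + SUBFIELD_DELIMITER + b"a" + "Espa\u00f1a".encode("utf-8")
    rec = build_record([("001", b"rec1"), ("245", title)])
    for tag, length, data in read_directory(rec):
        print(tag, length, len(data.decode("utf-8")))
```

Here the 245 entry records a length of 12 octets, while the same field decoded is only 11 characters long (including its terminator). Code that counted characters instead of octets would compute a wrong starting position for every field after it.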
On Apr 17, 2012, at 6:55 PM, Jonathan Rochkind wrote:
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Bill
Dueber
Sent: Tuesday, April 17, 2012 5:50 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709
and MARC21
On Tue, Apr 17, 2012 at 8:46 PM, Simon Spero sesunc
On 4/18/2012 6:04 AM, Tod Olson wrote:
It has to mean UTF-8. ISO 2709 is very byte-oriented, from the directory
structure to the byte-offsets in the fixed fields. The values in these places
all assume 8-bit character data, it's completely baked in to the file format.
I'm not sure that
-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Tod Olson
Sent: Wednesday, April 18, 2012 5:04 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] more
In fact, I worry that the standard may pre-date UTF-8, with its
reference to UCS --- if I understand things right, at one point there
was only one Unicode encoding, called UCS, which is basically a
backwards-compatible subset of what became UTF-16.
So I worry the standard really means
UTF-8 was the marc standard from the beginning:
http://www.loc.gov/marc/marbi/1998/98-18.html
The first proposals were a character mapping between Unicode and MARC-8
and didn't mention the character encodings, thus the term UCS which
was a common term for Unicode at that time. (see:
-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Doran, Michael D
Sent: Wednesday, April 18, 2012 10:05 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
ISO_2709 and MARC21
Hi Tod,
I'm
-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Huwig,Steve
Sent: Wednesday, April 18, 2012 9:21 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
ISO_2709 and MARC21
I could be mistaken (never having
I don't know about ISO 2709 itself, but the MARC21 implementation of
it refers to octets, aka 8-bit bytes:
http://www.loc.gov/marc/specifications/specrecstruc.html
Characters may be encoded using one or more than one octet, depending
on the character set. All ASCII characters are encoded using
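The octet point in the MARC 21 specification quoted here can be illustrated with a single letter, the n with tilde mentioned earlier in this thread. A minimal Python sketch: the MARC-8 bytes are written by hand, assuming 0xE4 is the combining tilde in MARC-8's extended Latin (ANSEL) set and that MARC-8 stores a combining mark before its base letter (verify against the Library of Congress MARC-8 code tables). The Unicode forms are computed with the standard `unicodedata` module.

```python
import unicodedata

# Assumption: 0xE4 is the MARC-8 (ANSEL) combining tilde, and MARC-8 places
# combining marks *before* the base letter. Check LC's MARC-8 code tables.
MARC8_N_TILDE = b"\xe4n"

PRECOMPOSED = "\u00f1"                                  # LATIN SMALL LETTER N WITH TILDE
DECOMPOSED = unicodedata.normalize("NFD", PRECOMPOSED)  # "n" + U+0303 COMBINING TILDE

# One visible character, three different octet sequences.
ENCODINGS = {
    "MARC-8": MARC8_N_TILDE,
    "UTF-8, precomposed (NFC)": PRECOMPOSED.encode("utf-8"),
    "UTF-8, decomposed (NFD)": DECOMPOSED.encode("utf-8"),
}

if __name__ == "__main__":
    for name, raw in ENCODINGS.items():
        print(f"{name:26} {raw.hex(' ')}  ({len(raw)} octets)")
```

So the length a directory entry records depends on both the character set and, for Unicode, the normalization form of the data, not just on how many letters a reader sees.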
-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Jonathan Rochkind
Sent: Tuesday, April 17, 2012 19:55
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] more on MARC char encoding: Now we're about
ISO_2709 and MARC21
Okay, forget XML for a moment, let's just look at marc 'binary'.
.) ;-)
-- Michael
-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Wednesday, April 18, 2012 11:09 AM
To: Code for Libraries
Cc: Doran, Michael D
Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
ISO_2709 and MARC21
On 4/18/2012 11:09 AM
In practice it seems to mean UTF-8. At least I've only seen UTF-8, and I can't
imagine the code that processes this stuff being safe for UTF-16 or UTF-32. All
of the offsets are byte-oriented, and there's too much legacy code that makes
assumptions about null-terminated strings.
-Tod
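Both of Tod's concerns about UTF-16 can be checked directly. Encoding plain ASCII such as the tag `245` in UTF-16 doubles every octet count the directory would record and introduces 0x00 octets, which code written around null-terminated strings treats as end-of-string. A minimal Python sketch with hypothetical helper names, assuming big-endian UTF-16 without a byte-order mark. The last line uses a character outside the Basic Multilingual Plane, which UTF-16 stores as a surrogate pair and UCS-2 cannot represent at all.

```python
def octet_lengths(text: str) -> dict:
    """Octet length of text in UTF-8 and in UTF-16 (big-endian, no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),
    }


def has_nul_octet(text: str, encoding: str) -> bool:
    """True if the encoded text contains 0x00, which C-style code reads as end-of-string."""
    return b"\x00" in text.encode(encoding)


if __name__ == "__main__":
    print(octet_lengths("245"))               # {'utf-8': 3, 'utf-16': 6}
    print(has_nul_octet("245", "utf-8"))      # False
    print(has_nul_octet("245", "utf-16-be"))  # True
    # U+20000 is outside the BMP: a surrogate pair in UTF-16.
    print(octet_lengths("\U00020000"))        # {'utf-8': 4, 'utf-16': 4}
```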
On Tue, Apr 17, 2012 at 7:55 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
Okay, forget XML for a moment, let's just look at marc 'binary'.
First, for Anglophone-centric MARC21.
On Tue, Apr 17, 2012 at 8:46 PM, Simon Spero sesunc...@gmail.com wrote:
Actually Anglo and Francophone centric. And the USMARC style 245 was a poor
replacement for the UKMARC approach (someone at the British Library-hosted
Linked Data meeting wondered why there were punctuation characters