Re: [CODE4LIB] free source for issn-periodical-type data?

2012-04-17 Thread Michael Hopwood
Just a quick note:

The correct URL for ONIX for Serials is 
http://www.editeur.org/17/ONIX-for-Serials/ - note that this is a family of 
standards, so it covers a very wide range of data types and content. The code 
lists Tom mentioned are available there in human-readable form.

Also: it sounded to me that Ken was after an actual database of the journal 
product type information - something like a serials in print database?

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Tom 
Pasley
Sent: 16 April 2012 22:15
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] free source for issn-periodical-type data?

Hi Ken,

Actually, I'm not sure this will answer all of your needs - although it does 
cover peer-review:

Metadata fields for an ISSN

A number of metadata fields can be associated with an ISSN number:

   - form: Each ISSN has a production form, indicated by an ONIX production
   form code http://www.editeur.org/onixserials.html. Currently supported
   values include: JB (Printed serial), JC (Serial distributed
   electronically by carrier), JD (Electronic serial distributed online),
   MA (Microform)
   - oclcnum: Oclcnum
   - peerreview: Peerreview, 'Y' if the ISSN is peer-reviewed, 'N' if the
   ISSN is not peer-reviewed.
   - publisher: Publisher
   - rawcoverage: Human-readable Coverage
   - title: Title
   - issnl: Linking ISSN, as defined here:
   http://www.issn.org/2-22637-What-is-an-ISSN-L.php
   - rssurl: Journal feed URL, data obtained from ticTOCS:
   http://www.tictocs.ac.uk/
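
For anyone scripting against fields like these, a minimal Python sketch of decoding the form code and the peerreview flag might look like the following. The code-to-label table is copied from the list above; the record dict, the `describe` function, and the sample values are hypothetical and do not come from any real service response.

```python
# Form codes and labels as given in the field list above.
FORM_LABELS = {
    "JB": "Printed serial",
    "JC": "Serial distributed electronically by carrier",
    "JD": "Electronic serial distributed online",
    "MA": "Microform",
}


def describe(record):
    """Summarise a metadata record (a hypothetical dict keyed by field name)."""
    form = FORM_LABELS.get(record.get("form"), "Unknown form")
    # 'Y' / 'N' per the field description; anything else is treated as unknown.
    peer = {"Y": "peer-reviewed", "N": "not peer-reviewed"}.get(
        record.get("peerreview"), "peer review unknown"
    )
    return f"{record.get('title', '(no title)')}: {form}, {peer}"


# Made-up sample record, for illustration only.
sample = {"title": "Example Journal", "form": "JD", "peerreview": "Y"}
print(describe(sample))
```

Note that these fields separate peer-reviewed titles from the rest, but they do not by themselves distinguish magazines, newspapers, and trade journals.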

T.

On Tue, Apr 17, 2012 at 1:33 AM, Ken Irwin kir...@wittenberg.edu wrote:

 Hi folks,

 Does anyone know of a free data source that correlates ISSNs with data 
 that includes what kind of publication is this? e.g.

 *Academic journal (+/- peer review?)

 *Popular magazine

 *Newspaper

 *Trade journal

 *Etc

 Obviously, there's some wiggle room in these designations, and I don't 
 need a super-solid answer.

 I've been asked to supply information about our academic journal 
 collection, and I don't have a particularly good way of 
 differentiating between our e-journals and e-magazines, for instance. 
 Individual suppliers might make these distinctions, but I'm really 
 hoping that a query-able (or,
 better: downloadable) file exists.

 Any ideas?

 Thanks
 Ken



[CODE4LIB] Job: Archivist, Institute of Jazz Studies at Rutgers-Newark

2012-04-17 Thread jobs
RESPONSIBILITIES:

The Rutgers University Libraries seek an experienced, innovative, and
service-oriented librarian to fill the position of Archivist in the Institute
of Jazz Studies, John Cotton Dana Library on the Newark
Campus of Rutgers, The State University of New Jersey.
Reporting to the Director of the Institute of Jazz Studies,
the Archivist will take a leadership role in the management and oversight of
archival and research collections in IJS in ensuring the
effective provision of library and information services to the
diverse community of users. Will receive, arrange, describe,
preserve and create finding aids using best practices and cutting-edge
techniques for the Institute's archival collections, which
consist of music manuscripts, personal papers, photographs,
memorabilia, and other materials. Will provide in-depth assistance to visiting
researchers and scholars as well as respond to requests by
mail, email and phone. Will provide materials for the
media, performing arts and other organizations. Will
identify, solicit and steward donors, and advise the Dana
Library Director on the acceptance of gift collections for
the Institute. Will oversee the activities of grant-funded
archivists. Will supervise student workers and
interns, including the provision of training in archival
practices. Will provide outreach, enhancing the visibility
of the Institute and its collections, by conducting tours of
the Institute and preparing exhibits, and represent the
Institute at professional meetings and conferences.
Will collaborate in the Libraries' digitization
efforts. As a member of a university-wide faculty, the
Archivist is expected to participate in system-wide
initiatives, committees, and task forces, and to demonstrate
commitment to continual professional development through
scholarly research relevant to areas of responsibility, including
publications, presentations and participation and leadership in the work of
relevant professional associations.

  
QUALIFICATIONS: 

A record of professional experience in an academic or research library,
archives, or similar setting, with emphasis on experience in
archival processing, management and preservation.
Extensive knowledge of and experience in the development of
EAD finding aids. Extensive knowledge of issues relating to
managing and preserving digital collections. Awareness of
national issues and trends in archives and in collections
services. Must have the ability and desire to meet tenure and promotion
requirements. Thus, the successful
candidate will have a Master's degree from an ALA-accredited institution
and/or a Master's degree in Archival Studies. Knowledge of
or familiarity with jazz history is desired.

  
SALARY:

Salary and rank will be commensurate with qualifications and experience.

  
STATUS/BENEFITS:

Faculty status, calendar year appointment, retirement plans, life/health
insurance, prescription drug, dental and eyeglass plans,
tuition remission, one month vacation.

  
LIBRARY AND UNIVERSITY PROFILE: 

Rutgers University is a member of the Association of
American Universities. The university, spread over three
regional campuses, includes over 50,000 graduate and
undergraduate students and 2,500 faculty, engaged in
numerous degree-granting, research and professional programs
in all disciplines, as well as a broad spectrum of service programs for the
state. Situated on 35 acres in downtown Newark, Rutgers-Newark
is part of a dynamic urban environment and is positioned to take
a leading role in the further revitalization of Newark.
The Newark campus is a doctoral-degree granting
research institution, classified as a Carnegie Research
Intensive institution. Rutgers-Newark offers 14 doctoral
programs: American studies, applied physics, behavioral and
neural science, biology, chemistry, criminal
justice, environmental science, global affairs, management,
mathematical sciences, nursing, psychology,
public administration, and urban systems. With more than
11,000 graduate and undergraduate students and anticipated



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/891/


[CODE4LIB] Job: Senior Web Development and User Experience Technician, Discovery Systems at Queen's University

2012-04-17 Thread jobs
**Description and Duties:**  
  
Within the framework of established policies, regulations and procedures, in
consultation with the Systems Coordinator, the Division Head of Discovery
Systems and other Discovery Systems staff, the incumbent provides technical
expertise and support for the Library's web presence. Duties include
development of new web applications and maintenance and enhancement of
existing web applications from simple to complex; ensuring the smooth
operation of designated Library web software systems (e.g. Drupal, WordPress,
DokuWiki, and in-house web applications) with appropriate documentation, back-
up, maintenance and upgrades; diagnosis, research and troubleshooting of
problems with the Library's web presence, escalating issues as necessary, and
documenting solutions; exploring new web software systems; providing user
support for web systems; assisting in development of web analytics tracking
reports and in implementing analytics and usability driven web page
modifications; providing back-up for other senior Discovery Systems
technicians in the areas of user support, database administration, and user
support.

  
**Qualifications**  
  
Recent college diploma or other post-secondary education specializing in web
application development and user driven design, and a minimum of one year of
proven and recent experience in complex database driven website development,
preferably in a high-demand user-centred environment OR The equivalent
combination of education and experience which must include a minimum of two
years' proven and recent experience in complex database and user focused
website development, preferably in a high-demand user-centred environment.

Proven proficiency with PHP, SQL, relational databases (e.g. MySQL, Oracle,
PostgreSQL), HTML, CSS, CVS/SVN. Desirable experience: Unix system
administration, Apache administration, experience with web application
performance tuning.

  
**Desirable experience:**  
  
Proven experience with library specific web applications (e.g. ILS, Discovery
Layer, Open Journal Software, Institutional Repository, OpenURL resolver).
Proven experience working in a team environment an asset. Familiarity with
Queen's computing infrastructure an asset.



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/892/


[CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Jonathan Rochkind
I know how char encodings work in MARC ISO binary -- the encoding can 
legally be either Marc8 or UTF8 (nothing else).  The encoding of a 
record is specified in its header. In the wild, specified encodings are 
frequently wrong, or data includes weird mixed encodings. Okay!
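
For reference, in MARC 21 the declared encoding lives at leader position 09 (the character coding scheme): a blank means MARC-8 and `a` means UCS/Unicode. A minimal Python sketch of reading it is below; the function name and the two sample leaders are made up for illustration, and `"marc-8"` is only a label here, not a codec Python knows about.

```python
def leader_encoding(leader: str) -> str:
    """Return the encoding declared at leader position 09 of a MARC 21 record."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader is exactly 24 characters long")
    code = leader[9]
    if code == "a":
        return "utf-8"   # 'a' = UCS/Unicode
    if code == " ":
        return "marc-8"  # blank = MARC-8 (label only, not a Python codec name)
    raise ValueError(f"unexpected character coding scheme: {code!r}")


# Illustrative leaders that differ only at position 09.
print(leader_encoding("00714cam a2200205 a 4500"))
print(leader_encoding("00714cam  2200205 a 4500"))
```

This only reports what the record claims about itself; as noted above, the claim is often wrong in real data.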


But what's going on with MarcXML?  What are the legal encodings for 
MarcXML?  Only Marc8 and UTF8, or anything that can be expressed in 
XML?  The MARC header is (or can be) present in MarcXML -- trust the 
MARC header, or trust the XML doctype char encoding?


What's the legal thing  to do? What's actually found 'in the wild' with 
MarcXML?


Can anyone advise?

Jonathan


Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread LeVan,Ralph
There are probably a couple of answers to that.

XML rules define what character set is used. The encoding attribute on
the <?xml ... ?> header is where you find out what character set is being
used.

I've always gone under the assumption that if an encoding wasn't
specified, then UTF-8 is in effect and that has always worked for me.
It turns out the standard says US-ASCII is the default encoding.

But, ignoring the encoding, the original MarcXML rules were the same as
the MARC-21 rules for character repertoire and you were supposed to
restrict yourself to characters that could be mapped back into MARC-8.
I don't know if that rule is still in force, but everyone ignores it.

I hope that helps!

Ralph



Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Kyle Banerjee
 What's the legal thing  to do? What's actually found 'in the wild' with
 MarcXML?


In some cases, invalid XML.

In an ideal world, the encoding should be included in the declaration. But
I wouldn't trust it.

kyle


-- 
--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
baner...@uoregon.edu / 503.999.9787


Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Jonathan Rochkind
So what if the <?xml ... ?> declaration says one charset encoding, but the 
MARC header included in the MarcXML says a different encoding... which 
one is the 'legal' one to believe?


Is it legal to have MarcXML that is not UTF-8 _or_ Marc8, that is an 
entirely different charset that is legal in XML?  If you did that, what 
should the MARC header included in the XML say?


I know how char encodings work in XML.  I don't understand what the 
standards say about how that interacts with the MARC data in MarcXML.


Jonathan




Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Jonathan Rochkind

On 4/17/2012 1:57 PM, Kyle Banerjee wrote:

In some cases, invalid XML. In an ideal world, the encoding should be 
included in the declaration. But I wouldn't trust it. kyle 


So would you use the Marc header payload instead?

Or you're just saying you wouldn't trust _any_ encoding declarations you 
find anywhere?


When writing a library to handle marc, I think the base line should be 
making it do the official legal standards-compliant right thing.  Extra 
heuristics to deal with invalid data can be added on top.


But my trouble here is I can't even figure out what the official legal 
standards-compliant thing is.


Maybe that's because the MarcXML standard simply doesn't address it, and 
it's all implementation dependent. sigh.


The problem is how the XML document's own char encoding is supposed to 
interact with the MARC header; especially because there's no way to put 
Marc8 in an XML char encoding doctype (is there?);  and whether 
encodings other than Marc8 or UTF8 are legal in MarcXML, even though 
they aren't in MARC ISO binary.


I think the answer might be nobody knows, and there is no standard 
right way to do it. Which is unfortunate.


Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Jonathan Rochkind

Okay, maybe here's another way to approach the question.

If I want to have a MarcXML document encoded in Marc8 -- what should it 
look like?  What should be in the XML declaration? What should be in the 
MARC header embedded in the XML?  Or is it not in fact legal at all?


If I want to have a MarcXML document encoded in UTF8, what should it 
look like? What should be in the XML declaration? What should be in the 
MARC header embedded in the XML?


If I want to have a MarcXML document with a char encoding that is 
_neither_ Marc8 nor UTF8, but something else generally legal for XML -- 
is this legal at all? And if so, what should it look like? What should 
be in the XML declaration? What should be in the MARC header embedded in 
the XML?






[CODE4LIB] Job: Director of Library Information Technology Production Services at University of Illinois at Urbana-Champaign

2012-04-17 Thread jobs
**Director of Library Information Technology Production Services**  
Academic Professional Position

University of Illinois at Urbana-Champaign

  
**Position Available**: This position is available July, 2012. This is a 
100%-time, twelve-month appointment Academic Professional position.  
  
**Duties and Responsibilities**: The University of Illinois at Urbana-Champaign 
seeks an innovative, collaborative, and service-oriented professional for the 
position of Director of Library Information Technology Production Services. The 
University Library maintains a robust infrastructure for digital collections 
and services that supports the needs of nearly 100 million virtual visitors 
each year. Reporting to the Associate University Librarian for Information 
Technology Planning and Policy, the successful candidate will oversee the 
staff, technology support, networking, infrastructure, and applications support 
for Library enterprise IT systems, including Infrastructure Management and 
Support (IMS), Workstation and Network Support (WNS), and Help Desk (HD) 
services. See https://jobs.illinois.edu for complete list of duties.  
  
**Qualifications: _Required_:** A Bachelor's degree; experience in a library or 
academic computing services setting; demonstrated experience implementing 
user-focused or customer service technology services in a high-volume academic 
setting; project management experience in substantial computing or information 
system implementations or migrations; experience supervising and mentoring 
technical professionals; ability to facilitate effective prioritization and 
collaboration on projects with multiple customer groups, including domain 
experts, academic users at various skill levels, and IT professionals; 
demonstrated ability to lead and manage professional staff, to make decisions 
in a collaborative team environment, and successfully direct and support 
multiple production operations; excellent oral and written communication 
skills; familiarity with data storage requirements. See 
https://jobs.illinois.edu for list of preferred qualifications.  
  
**To Apply**: To ensure full consideration, please complete your candidate 
profile at https://jobs.illinois.edu and upload a letter of interest, resume, 
and contact information including email addresses for three professional 
references. Applications not submitted through this website will not be 
considered. For questions, please call: 217-333-8169.  
  
**Deadline**: In order to ensure full consideration, applications must be 
received by May 14, 2012.  
  
Illinois is an Affirmative Action/Equal Opportunity Employer and welcomes
individuals with diverse backgrounds, experiences, and ideas who embrace and
value diversity and inclusivity. www.inclusiveillinois.illinois.edu



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/893/


Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread LeVan,Ralph
 If I want to have a MarcXML document encoded in Marc8 -- what should it
 look like?  What should be in the XML declaration? What should be in the
 MARC header embedded in the XML?  Or is it not in fact legal at all?

I'm going out on a limb here, but I don't think it is legal.  There is
no formal encoding that corresponds to MARC-8, so there's no way to tell
XML tools how to interpret the bytes.


 If I want to have a MarcXML document encoded in UTF8, what should it
 look like? What should be in the XML declaration? What should be in the
 MARC header embedded in the XML?

<?xml version="1.0" encoding="UTF-8"?>

I suppose you'll want to set the leader to UTF-8 as well, but it doesn't
really matter to any XML tools.


 If I want to have a MarcXML document with a char encoding that is
 _neither_ Marc8 nor UTF8, but something else generally legal for XML --
 is this legal at all? And if so, what should it look like? What should
 be in the XML declaration? What should be in the MARC header embedded in
 the XML?

I'd claim this is legal, if it is legal XML.  Set your encoding to
anything that is valid.

As a Java programmer, using Java XML tools, the encoding is just a hint
to the tools.  I end up with Unicode strings after the XML is read.  So
I always ignore the encoding byte in the leader.

Following that logic, that byte is about encoding.  It has meaning when
ISO 2709 is the transfer mechanism.  But, in this case, XML is the
transfer mechanism and its rules for identifying the encoding are what
matter.  I'm proposing that the encoding byte in the leader is
meaningless.
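
The same behaviour can be seen outside Java. Here is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming an invented record whose leader byte 09 says MARC-8 (blank) while the document is actually serialised as ISO-8859-1 and declares that. The record content is made up; the namespace is the MARCXML "slim" namespace.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented record: leader position 09 is blank ("MARC-8"), but the bytes
# are ISO-8859-1 and the XML declaration says so.
xml_text = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    "<leader>00714cam  2200205 a 4500</leader>"
    '<datafield tag="245" ind1="0" ind2="0">'
    '<subfield code="a">Caf\u00e9 society</subfield>'
    "</datafield>"
    "</record>"
)
raw = xml_text.encode("iso-8859-1")  # the e-acute becomes the single byte 0xE9

# The parser decodes according to the XML declaration and hands back
# ordinary Unicode strings; it never consults the MARC leader.
record = ET.fromstring(raw)
title = record.find("marc:datafield[@tag='245']/marc:subfield[@code='a']", NS).text
leader = record.find("marc:leader", NS).text

print(title)
print(repr(leader[9]))
```

After parsing, the leader byte is just data that came along for the ride; whether a tool should rewrite it on output is a separate policy decision.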

Ralph


[CODE4LIB] Code4Lib West Registration Form: July 30, 2012

2012-04-17 Thread Reese, Terry
The University of Oregon Libraries and Oregon State University Libraries invite 
you to code4lib west, Monday, July 30, 2012, at the UO Knight Library. There is 
no registration fee for this conference. Registration is limited to 50 
participants. All participants are expected to deliver a lightning talk. In the 
event registration fills up quickly, limits on participation per institution 
may be employed. Your registration is not confirmed until you receive an email. 
Registrations will be confirmed by April 30, 2012.

URL: 
https://docs.google.com/spreadsheet/viewform?formkey=dGRFM0Zob1dsNEE2RU9VY25SNlllUEE6MQ

--TR

***
Terry Reese, Associate Professor
Gray Family Chair for
Innovative Library Services
121 Valley Library
Corvallis, OR 97331
tel: 541.737.6384
***


Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Doran, Michael D
Hi Ralph,

 But, ignoring the encoding, the original MarcXML rules were the same as
 the MARC-21 rules for character repertoire and you were supposed to
 restrict yourself to characters that could be mapped back into MARC-8.
 I don't know if that rule is still in force, but everyone ignores it.

That rule no longer applies per the December 2007 revision of the MARC 21 
Specifications:

To facilitate the movement of records between MARC-8 
and Unicode environments, it was recommended for an 
initial period that the use of Unicode be restricted 
to a repertoire identical in extent to the MARC-8 
repertoire. [...] however, such a restriction is no 
longer appropriate. The full UCS repertoire, as currently 
defined at the Unicode web site, is valid for encoding 
MARC 21 records subject only to the constraints described 
[in the current MARC 21 Specifications].

-- from MARC 21 Specifications (revised December 2007) [1]

-- Michael

[1] http://www.loc.gov/marc/specifications/speccharucs.html



Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Jonathan Rochkind

Thanks, this is helpful feedback at least.

I think it's completely irrelevant, when determining what is legal under 
standards, to talk about what certain Java tools happen to do though, I 
don't care too much what some tool you happen to use does.


In this case, I'm _writing_ the tools. I want to make them do 'the right 
thing', with some mix of what's actually official legally correct and 
what's practically useful.  What your Java tools do is more or less 
irrelevant to me. I certainly _could_ make my tool respect the Marc 
leader encoded in MarcXML over the XML declaration if I wanted to. I 
could even make it assume the data is Marc8 in XML, even though there's 
no XML charset type for it, if the leader says it's Marc8.


But do others agree that there is in fact no legal way to have Marc8 in 
MarcXML?


Do others agree that you can use non-UTF8 encodings in MarcXML, so long 
as they are legal XML?


I won't even ask someone to cite standards documents, because it's 
pretty clear that LC forgot to consider this when establishing MarcXML.  
(And I have no faith that one could get LC to make a call on this and 
publish it any time this century).


Has anyone seen any Marc8-encoded MarcXML in the wild? Is it common? How 
is it represented with regard to the XML declaration and the Marc header?


Has anyone seen any MarcXML with char encodings that are neither Marc8 
nor UTF8 in the wild? Are they common? How are they represented with 
regard to XML declaration and Marc header?




Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Sheila M. Morrissey
Re: But do others agree that there is in fact no legal way to have Marc8 in 
MarcXML?

No -- it is perfectly legal -- but you MUST declare the encoding to BE Marc8 in 
the XML prolog, and you will want to be aware that XML processors are only 
REQUIRED to process UTF-8 and UTF-16. In practice many (including Java-based 
ones) can handle other encodings, but you will have to make sure whatever XML 
processor you use, in whatever language it is written, has a handy-dandy MARC8 
coder/decoder ring.

Sheila



Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Houghton,Andrew
 Jonathan Rochkind
 Sent: Tuesday, April 17, 2012 14:18
 Subject: Re: [CODE4LIB] MarcXML and char encodings
 
 Okay, maybe here's another way to approach the question.
 
 If I want to have a MarcXML document encoded in Marc8 -- what should it
 look like?  What should be in the XML declaration? What should be in the
 MARC header embedded in the XML?  Or is it not in fact legal at all?
 
 If I want to have a MarcXML document encoded in UTF8, what should it
 look like? What should be in the XML declaration? What should be in the
 MARC header embedded in the XML?
 
 If I want to have a MarcXML document with a char encoding that is
 _neither_ Marc8 nor UTF8, but something else generally legal for XML --
 is this legal at all? And if so, what should it look like? What should
 be in the XML declaration? What should be in the MARC header embedded in
 the XML?

You cannot have a MARC-XML document encoded in MARC-8, well sort of, but it's 
not standard. To answer your questions you have to refer to a variety of 
standards:

http://www.w3.org/TR/2008/REC-xml-20081126/#NT-EncodingDecl
In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", 
and "ISO-10646-UCS-4" should be used for the various encodings and 
transformations of Unicode / ISO/IEC 10646, the values "ISO-8859-1", 
"ISO-8859-2", ... "ISO-8859-n" (where n is the part number) should be used 
for the parts of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and 
"EUC-JP" should be used for the various encoded forms of JIS X-0208-1997. It is 
recommended that character encodings registered (as charsets) with the Internet 
Assigned Numbers Authority [IANA-CHARSETS], other than those just listed, be 
referred to using their registered names; other encodings should use names 
starting with an "x-" prefix. XML processors should match character encoding 
names in a case-insensitive way and should either interpret an IANA-registered 
name as the encoding registered at IANA for that name or treat it as unknown 
(processors are, of course, not required to support all IANA-registered 
encodings).

In the absence of information provided by an external transport protocol (e.g. 
HTTP or MIME), it is a fatal error for an entity including an encoding 
declaration to be presented to the XML processor in an encoding other than that 
named in the declaration, or for an entity which begins with neither a Byte 
Order Mark nor an encoding declaration to use an encoding other than UTF-8. 
Note that since ASCII is a subset of UTF-8, ordinary ASCII entities do not 
strictly need an encoding declaration.


1) The above says that <?xml version="1.0" ?> means the same as <?xml
version="1.0" encoding="utf-8" ?> and if you prefer you can omit the XML
declaration and that is assumed to be UTF-8 unless there is a BOM (Byte Order
Mark) which determines UTF-8 vs UTF-16BE vs UTF-16LE.
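
A rough Python sketch of point 1, for illustration only. The helper name
sniff_xml_encoding is made up here, and it handles only the BOM, declaration
and default cases described above, not the full autodetection procedure in the
XML spec:

```python
import codecs
import re

# Encoding pseudo-attribute of an XML declaration at the very start of the
# document. This only works for ASCII-compatible encodings.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(raw: bytes) -> str:
    """Guess which encoding an XML processor would assume for `raw`."""
    # A BOM, if present, decides between UTF-8 and the two UTF-16 byte orders.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # Otherwise trust the declaration, if there is one.
    match = _DECL.match(raw)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declaration: the spec says to assume UTF-8.
    return "utf-8"
```

With this, a document with no declaration at all and one that says only
version="1.0" both come back as "utf-8".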

2) If you really wanted to encode the XML in MARC-8 you would need to use a
name with an "x-" prefix, since if you refer to:
http://www.iana.org/assignments/character-sets MARC-8 isn't a registered
character set, hence cannot be specified in the encoding attribute unless the
name was prefixed with "x-". Which implies that no standard XML library will
know how to convert the MARC-8 characters into Unicode so the XML DOM can be
used. So unless you want to write your own MARC-8 to Unicode conversion
routines and integrate them into your preferred XML library, it isn't going to
work out of the box for anyone else but yourself.
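
To see what "won't work out of the box" looks like, here is a small Python
sketch using the standard library parser. The encoding name "x-marc-8" is an
invented private name and try_parse is a hypothetical helper; the exact
exception type may vary between parser versions, so several are caught:

```python
import xml.etree.ElementTree as ET

# "x-marc-8" is a made-up private name; no codec is registered under it.
DOC = b'<?xml version="1.0" encoding="x-marc-8"?><leader>00000cam</leader>'


def try_parse(raw: bytes) -> str:
    """Return 'ok' if the document parses, else the exception class name."""
    try:
        ET.fromstring(raw)
    except (ET.ParseError, LookupError, ValueError) as exc:
        # The parser has no idea how to turn these bytes into characters.
        return type(exc).__name__
    return "ok"
```

Calling try_parse(DOC) reports a failure, while the same document declared as
UTF-8 parses. Supplying a real codec for the private name would be the "write
your own conversion routines" step mentioned above.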

When dealing with MARC-XML you should ignore the values in LDR/00-04, LDR/10, 
LDR/11, LDR/12-16, LDR/20-23. If you look at the MARC-XML schema you will note 
that the definition for leaderDataType specifies LDR/00-04 [\d ]{5}, LDR/10 
and LDR/11 (2| ), LDR/12-16 [\d ]{5}, LDR/20-23 (4500| ). Note the 
MARC-XML schema allows spaces in those positions because they are not relevant 
in the XML format, though very relevant in the binary format.
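
As a sketch only, the leader positions named above could be checked in Python
like this. The rules table and the leader_ok name are illustrative; positions
the paragraph does not mention are left unchecked, and the four-space
alternative for LDR/20-23 is an assumption about what the schema allows:

```python
import re

# (slice of the 24-character leader, pattern quoted for that slice)
LEADER_RULES = [
    (slice(0, 5), r"[0-9 ]{5}"),     # LDR/00-04 record length
    (slice(10, 11), r"2| "),         # LDR/10 indicator count
    (slice(11, 12), r"2| "),         # LDR/11 subfield code count
    (slice(12, 17), r"[0-9 ]{5}"),   # LDR/12-16 base address of data
    (slice(20, 24), r"4500| {4}"),   # LDR/20-23 entry map (spaces assumed)
]


def leader_ok(leader: str) -> bool:
    """True if the listed positions hold either real values or blanks."""
    if len(leader) != 24:
        return False
    return all(
        re.fullmatch(pattern, leader[span]) is not None
        for span, pattern in LEADER_RULES
    )
```

A leader with those positions blanked out passes just as well as one carried
over unchanged from the binary record.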

You probably should ignore LDR/09 since most MARC to MARC-XML converters do not 
change this value to 'a' although many converters do change the value when 
converting MARC binary between MARC-8 and UTF-8. The only valid character set 
for MARC-XML is Unicode and it *should* be encoded in UTF-8 in Unicode 
normalization form D (NFD) although most XML libraries will not know the 
difference if it was encoded as UTF-16BE or UTF-16LE in Unicode normalization 
form D since the XML libraries internally work with Unicode.
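
For anyone unsure what normalization form D means in practice, a short Python
illustration using the standard unicodedata module (the to_nfd wrapper is just
a convenience name for this example):

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Return `text` in Unicode normalization form D (decomposed)."""
    return unicodedata.normalize("NFD", text)


composed = "\u00e9"            # e-acute as a single code point
decomposed = to_nfd(composed)  # base letter "e" plus combining acute U+0301

# The byte encoding is a separate choice from the normalization form:
# the same decomposed text can be serialized as UTF-8 or UTF-16.
as_utf8 = decomposed.encode("utf-8")
as_utf16 = decomposed.encode("utf-16-be")
```

Both byte strings decode back to the same two code points, which is why an XML
library working with Unicode internally does not notice the difference.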

I could have sworn that this information was specified on LC's site at one 
point in time, but I'm having trouble finding the documentation.


Hope this helps, Andy.


[CODE4LIB] Job: Web Developer at Michigan Technological University

2012-04-17 Thread jobs
Michigan Technological University's Van Pelt and Opie Library seeks an
energetic, user-focused and collegial Web developer who enjoys working on a
variety of projects with library and IT staff, faculty, and students that
support library services, instruction and research.

  
Michigan Technological University (mtu.edu) is a leading public research
university developing new technologies and preparing students to create the
future for a prosperous and sustainable world. Michigan Tech offers more than
130 undergraduate and graduate degree programs in engineering; forest
resources; computing; technology; business; economics; natural, physical and
environmental sciences; arts; humanities; and social sciences.

  
Located on the Keweenaw Peninsula on Michigan's picturesque and peaceful Upper
Peninsula, Houghton has been named recently as one of America's best small
towns as well as a 10 Top Adrenaline Outposts by National Geographic.
Four-season outdoor activities range from downhill and cross-country skiing
to hiking, birding and fishing. The university's Rozsa Center for the Performing
Arts, the Calumet Theatre, Pine Mountain Festival and numerous year-round
heritage and international festivities provide a range of art, music and craft
opportunities.

  
For more information and to apply, go to:
https://www.jobs.mtu.edu/postings/468

  
Michigan Technological University is an equal opportunity educational
institution/equal opportunity employer, committed to excellence through
diversity in education and employment.



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/894/


Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Kyle Banerjee

 So would you use the Marc header payload instead?


 Or you're just saying you wouldn't trust _any_ encoding declarations you
 find anywhere?


This.

The short version is that too many vendors and systems just supply some
value without making sure that's what they're spitting out. I haven't had
to mess with this stuff for a few years, so I'm hoping Terry Reese weighs
in on this conversation -- he has a lot of experience dealing with encoding
headaches. However, the bottom line is that the most reliable method is to
use heuristics to detect what's going on. Yeah, that totally kills the
point of listing encodings in the first place, but just as is the case with any
unreliably used data point, it's all GIGO.
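
A very crude Python sketch of the kind of heuristic meant here, assuming only
that pure ASCII is harmless, that MARC-8 switches character sets with ESC
(0x1B) sequences, and that anything else either decodes as strict UTF-8 or
does not. The function name and return labels are invented for illustration;
real-world detection needs far more than this:

```python
def guess_encoding(raw: bytes) -> str:
    """Guess 'ascii', 'maybe-marc-8', 'utf-8' or 'not-utf-8' for record data."""
    # An escape byte suggests a MARC-8 character set switch.
    if b"\x1b" in raw:
        return "maybe-marc-8"
    # Seven-bit data reads the same under either encoding.
    if all(byte < 0x80 for byte in raw):
        return "ascii"
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # High bytes that are not valid UTF-8: MARC-8, Latin-1, or garbage.
        return "not-utf-8"
    return "utf-8"
```

Note that the declared encoding never enters into it, which is exactly the
GIGO point above.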

When writing a library to handle marc, I think the base line should be
 making it do the official legal standards-compliant right thing.  Extra
 heuristics to deal with invalid data can be added on top.


I'm hoping things have improved, but if heuristics are more reliable than
reading the right areas of the record, you have to ignore what's there
(which makes even reading it pointless). I do think there is value in
encouraging vendors to actually pay attention to this stuff as such basic
screwups undermine both the credibility of the data source and the
service that depends on the data.


 But my trouble here is I can't even figure out what the official legal
 standards-compliant thing is.

 Maybe that's because the MarcXML standard simply doesn't address it, and
 it's all implementation dependent. sigh.

 The problem is how the XML document's own char encoding is supposed to
 interact with the MARC header; especially because there's no way to put
 Marc8 in an XML char encoding doctype (is there?);  and whether encodings
 other than Marc8 or UTF8 are legal in MarcXML, even though they aren't in
 MARC ISO binary.

 I think the answer might be nobody knows, and there is no standard right
 way to do it. Which is unfortunate.


A good summary of the situation as I understand it.

kyle

-- 
--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
baner...@uoregon.edu / 503.999.9787


Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Karen Coyle
The discussions at the MARC standards group relating to Unicode all had 
to do with using Unicode *within* ISO2709. I can't find any evidence 
that MARCXML ever went through the standards process. (This may not be a 
bad thing.) So none of what we know about the MARBI discussions and 
resulting standards can really help us here, except perhaps by analogy.


In LC's own example on the MARCXML page (the Sandburg example) the 
Leader is copied without change from the ISO2709/MARC-8 record to the 
MARCXML/Unicode record -- in other words, it still has a blank in offset 
09, which means MARC-8. (The XML record is UTF-8.) My gut feeling is 
that the Leader in MARCXML should be treated like the human appendix -- 
something that once had a use, but is now just being carried along for 
historical reasons. I would not expect it to reflect the XML record 
within which it is embedded. Unfortunately, it is the only source of 
some key information, like type of record. The more I think about it, 
the more MARCXML strikes me as a really messed-up format.
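
To make the mismatch concrete, here is a small Python sketch that pulls the
Leader out of a MARCXML record and reads the two offsets discussed. The sample
record and its leader value are illustrative stand-ins, not copied from LC's
page, and leader_fields is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Illustrative record: declared as UTF-8, leader offset 09 still blank.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>01142cam  2200301 a 4500</leader>
</record>"""


def leader_fields(raw: bytes) -> dict:
    """Return type of record (LDR/06) and the encoding flag (LDR/09)."""
    root = ET.fromstring(raw)
    leader = root.findtext(".//marc:leader", namespaces=MARC_NS)
    if leader is None or len(leader) != 24:
        raise ValueError("no usable leader found")
    return {"record_type": leader[6], "encoding_flag": leader[9]}
```

For the sample this gives a record type of "a" and a blank encoding flag, even
though the XML around it is UTF-8.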


kc



On 4/17/12 11:46 AM, Jonathan Rochkind wrote:

Thanks, this is helpful feedback at least.

I think it's completely irrelevant, when determining what is legal under
standards, to talk about what certain Java tools happen to do though, I
don't care too much what some tool you happen to use does.

In this case, I'm _writing_ the tools. I want to make them do 'the right
thing', with some mix of what's actually official legally correct and
what's practically useful. What your Java tools do is more or less
irrelevant to me. I certainly _could_ make my tool respect the Marc
leader encoded in MarcXML over the XML declaration if I wanted to. I
could even make it assume the data is Marc8 in XML, even though there's
no XML charset type for it, if the leader says it's Marc8.

But do others agree that there is in fact no legal way to have Marc8 in
MarcXML?

Do others agree that you can use non-UTF8 encodings in MarcXML, so long
as they are legal XML?

I won't even ask someone to cite standards documents, because it's
pretty clear that LC forgot to consider this when establishing MarcXML.
(And I have no faith that one could get LC to make a call on this and
publish it any time this century).

Has anyone seen any Marc8-encoded MarcXML in the wild? Is it common? How
is it represented with regard to the XML leader and the Marc header?

Has anyone seen any MarcXML with char encodings that are neither Marc8
nor UTF8 in the wild? Are they common? How are they represented with
regard to XML leader and Marc header?

On 4/17/2012 2:32 PM, LeVan,Ralph wrote:

If I want to have a MarcXML document encoded in Marc8 -- what should it
look like? What should be in the XML declaration? What should be in the
MARC header embedded in the XML? Or is it not in fact legal at all?

I'm going out on a limb here, but I don't think it is legal. There is
no formal encoding that corresponds to MARC-8, so there's no way to tell
XML tools how to interpret the bytes.



If I want to have a MarcXML document encoded in UTF8, what should it
look like? What should be in the XML declaration? What should be in the
MARC header embedded in the XML?

<?xml encoding="UTF-8"?>

I suppose you'll want to set the leader to UTF-8 as well, but it doesn't
really matter to any XML tools.



If I want to have a MarcXML document with a char encoding that is
_neither_ Marc8 nor UTF8, but something else generally legal for XML


--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Houghton,Andrew
 Karen Coyle
 Sent: Tuesday, April 17, 2012 15:41
 Subject: Re: [CODE4LIB] MarcXML and char encodings
 
 The discussions at the MARC standards group relating to Unicode all had
 to do with using Unicode *within* ISO2709. I can't find any evidence
 that MARCXML ever went through the standards process. (This may not be
 a
 bad thing.) So none of what we know about the MARBI discussions and
 resulting standards can really help us here, except perhaps by analogy.

Well I can confirm that the MARCXML didn't go through MARBI since I was
one of OCLC's representatives who solidified MARCXML. MARCXML came out
of a meeting at LC between the MARC Standards office, OCLC, RLG, and 
one or two other interested parties whom I cannot remember or find in
my emails or notes about the meeting.


Andy.


Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Decasm
Let me make some recommendations. These are what I would consider best 
practices for interoperability.

1) Never put marc8 in xml. Just don't do it. No one expects it. Few will be 
willing to bother with it.

2) Always prefer utf8 for marcxml. You can use any standard charset if you need
to, but without special circumstances, use utf8.

3) Ignore leader 9 in marcxml. Only consider the prolog. (Consider, not trust.)
If you reasonably can, fail when the charset is wrong.
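
Recommendation 3 might look something like this in Python, as a sketch under
the assumption that the standard library parser is acceptable. The function
name parse_by_prolog and the sample documents are made up for illustration:

```python
import xml.etree.ElementTree as ET

# The same Latin-1 byte (0xE9) under a correct and an incorrect prolog.
GOOD = b'<?xml version="1.0" encoding="ISO-8859-1"?><subfield>caf\xe9</subfield>'
BAD = b'<?xml version="1.0" encoding="UTF-8"?><subfield>caf\xe9</subfield>'


def parse_by_prolog(raw: bytes) -> ET.Element:
    """Parse using only the XML prolog; leader 9 is never consulted."""
    try:
        # Passing bytes lets the parser apply the declared encoding itself.
        return ET.fromstring(raw)
    except ET.ParseError as exc:
        # Fail loudly rather than guess when the bytes do not fit the prolog.
        raise ValueError(f"data does not match declared charset: {exc}") from exc
```

GOOD parses to the text "café", while BAD is rejected instead of being passed
along with mangled characters.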

/dev



Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Jonathan Rochkind

On 4/17/2012 3:01 PM, Sheila M. Morrissey wrote:

No -- it is perfectly legal -- but you MUST declare the encoding to BE Marc8 in 
the XML prolog,


Wait, how can you declare a Marc8 encoding in an XML 
declaration/prolog/whatever it's called?


The things that appear there need to be from a specific list, and I 
didn't think Marc8 was on that list?


Can you give me an example?  And, if you happen to have it, link to XML 
standard that says this is legal?


Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread LeVan,Ralph
 No -- it is perfectly legal -- but you MUST declare the encoding to
 BE Marc8 in the XML prolog,

 Wait, how can you declare a Marc8 encoding in an XML 
 declaration/prolog/whatever it's called?

Nope, you can't do that.  There is no approved name for the MARC-8
encoding.  As Andy said, the closest you could get would be to make up
an experimental name, like x-marc-8, but no tool in the world would
recognize that.

Ralph


Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Sheila M. Morrissey
In XML standard:

It is RECOMMENDED that character encodings registered (as charsets) 
with the Internet Assigned Numbers Authority [IANA-CHARSETS], other than those 
just listed, be referred to using their registered names; other encodings 
SHOULD use names starting with an "x-" prefix. XML processors SHOULD match 
character encoding names in a case-insensitive way and SHOULD either 
interpret an IANA-registered name as the encoding registered at IANA for that 
name or treat it as unknown (processors are, of course, not required to support 
all IANA-registered encodings).


As I suggested -- since MARC8 isn't (so far as I know) registered -- you won't 
get far with most standard tools, in whatever language -- you'll have to extend 
them to first recognize the encoding name, and second, decode the content.

smm



Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Eric Lease Morgan
MARC-8. Cool in its time. Dumb now. Typical. --ELM


Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Sheila M. Morrissey
I think this is a case of being in violent agreement -- see some earlier 
replies in this thread -- 
Pragmatically, if you are going to hew to marc-8 encoding transported in XML -- 
you are losing the usefulness of standard tools for xml --
smm



[CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-17 Thread Jonathan Rochkind

Okay, forget XML for a moment, let's just look at marc 'binary'.

First, for Anglophone-centric MARC21.

The LC docs don't actually say quite what I thought about leader byte 
09, used to advertise encoding:



a - UCS/Unicode
Character coding in the record makes use of characters from the 
Universal Coded Character Set (UCS) (ISO 10646), or Unicode™, an 
industry subset.




That doesn't say UTF-8. It says UCS or Unicode. What does that 
actually mean?  Does it mean UTF-8, or does it mean UTF-16 (closer to 
what used to be called UCS I think?).  Whatever it actually means, do 
people violate it in the wild?
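
For reference, reading that byte from a binary record is trivial. A Python
sketch, with leader_09 as an illustrative name and the two labels taken from
the MARC21 definitions of blank and "a" at that position:

```python
def leader_09(record: bytes) -> str:
    """Report what leader byte 09 of a binary MARC21 record advertises."""
    if len(record) < 24:
        raise ValueError("record is shorter than a 24-byte leader")
    flag = record[9:10]
    if flag == b"a":
        # Says UCS/Unicode, without naming UTF-8 or UTF-16.
        return "UCS/Unicode"
    if flag == b" ":
        return "MARC-8"
    return "undefined"
```

Whether the bytes that follow actually match what the flag advertises is a
separate question, which is the one being asked here.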




Now we get to non-Anglophone centric marc. I think all of which is 
ISO_2709?  A standard which of course is not open access, so I can't get 
it to see what it says.


But leader 09 being used for encoding -- is that Marc21 specific, or is 
it true of any ISO-2709?  Marc8 and unicode being the only valid 
encodings can't be true of any ISO-2709, right?


Is there a generic ISO-2709 way to deal with this, or not so much?


Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-17 Thread Simon Spero
On Tue, Apr 17, 2012 at 7:55 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 Okay, forget XML for a moment, let's just look at marc 'binary'.

 First, for Anglophone-centric MARC21.


Actually Anglo and Francophone centric. And the USMARC style 245 was a poor
replacement for the UKMARC approach (someone at the British Library-hosted
Linked Data meeting wondered why there were punctuation characters included
in the data in the title field. The catalogers wept slightly).

Simon


Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-17 Thread Bill Dueber
On Tue, Apr 17, 2012 at 8:46 PM, Simon Spero sesunc...@gmail.com wrote:

 Actually Anglo and Francophone centric. And the USMARC style 245 was a poor
 replacement for the UKMARC approach (someone at the British Library-hosted
 Linked Data meeting wondered why there were punctuation characters included
 in the data in the title field. The catalogers wept slightly).

 Simon



Slightly? I cry my eyes out *every single day* about that. Well, every
weekday, anyway.


-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library