Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

2012-03-09 Thread Brian Kennison
On Mar 8, 2012, at 1:46 PM, Terray, James wrote:

 UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 9: 
 ordinal not in range(128)


Hello everyone,

I just ran into this the other day when trying to write to a file. I searched 
the documentation and found this:

fp = codecs.open('dc.csv', mode='w', encoding='utf-8')

This opens a file that is utf-8 aware and it let me write the file. Doesn't 
answer your question about the encoding but it will let you save the record. 
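
For anyone hitting the same error on output, here is a minimal sketch of that approach, assuming Python and the standard codecs module. The helper name write_rows and the naive comma join (no CSV quoting) are only for illustration; the point is that the file object does the UTF-8 encoding, so nothing goes through the default ascii codec.

```python
import codecs


def write_rows(path, rows):
    """Write rows (lists of text values) as comma-separated UTF-8 lines.

    Hypothetical helper for illustration; not part of pymarc.
    """
    # codecs.open returns a file object that encodes text to UTF-8 on write,
    # so non-ASCII characters never pass through the default ascii codec.
    with codecs.open(path, mode="w", encoding="utf-8") as fp:
        for row in rows:
            fp.write(u",".join(row) + u"\n")
```

Calling write_rows('dc.csv', [[u'M\u00fcnchen', u'1939']]) should produce a file whose bytes are valid UTF-8.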

--
Brian Kennison
Western Connecticut State University


Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

2012-03-09 Thread Tom Keays
I'm out of my depth here, but I'm curious how this all works. Is it true
that, in MARC8 records, there is supposed to be an 066 field included that
defines non-Latin character sets? I'm drawing this conclusion from some
things I read on the LOC website. ANSEL is mentioned as one of the
instances where this might be necessary.

http://www.loc.gov/marc/specifications/speccharucs.html#field066
http://www.loc.gov/marc/specifications/speccharconversion.html#escape
http://www.loc.gov/marc/bibliographic/bd066.html


On Thu, Mar 8, 2012 at 1:02 PM, Godmar Back god...@gmail.com wrote:

 Hi,

 a few days ago, I showed pymarc to a group of technical librarians to
 demonstrate how easily certain tasks can be scripted/automated.

 Unfortunately, it blew up at me when I tried to write a record:

 UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 9:
 ordinal not in range(128)

 Investigation revealed this culprit:

 =LDR  00916nam a2200241I  4500
 =001  ocm10685946
 =005  19880203211447.0
 =007  cr\bn||abp
 =007  cr\bn||cda
 =008  840503s1939gw00010\ger\d
 =040  \\$aMBB$cMBB$dCRL
 =049  \\$aCRLL
 =100  10$aEsser, Hermann,$d1900-
 =245  14$aDie jE8udischer Weltpest ;$bjudendE1ammerung auf dem
 Erdball,$cvon Hermann Esser.
 =260  0\$aME8unchen,$bZentralverlag der N S D A P., F. Eher
 ahchf.,$c1939.
 =300  \\$a243 [1] p.$c23 cm.
 =533  \\$aAlso available as electronic reproduction.$bChicago :$cCenter for
 Research Libraries,$d[2009]
 =650  \0$aJewish question.
 =700  12$aBierbrauer, Johann Jacob,$d1705-1760?
 =710  2\$aCenter for Research Libraries (U.S.)
 =856  41$uhttp://dds.crl.edu/CRLdelivery.asp?tid=10538$zOnline version
 =907  \\$a.b28931622$b08-30-10$c08-30-10
 =998  \\$awww$b08-30-10$cm$dz$e-$fger$ggw $h4$i0

 The leader[9] field is set to 'a', so the record should contain
 UTF8-encoded Unicode [1], but E8 75 in the 245$a appears to be ANSEL where
 'E8' denotes the Umlaut preceding the lowercase 'u' (0x75). [2]

 To me, this record looks misencoded... am I correct here? There are
 thousands of such records in the data set I'm dealing with, which was
 obtained using the 'Data Exchange' feature of III's Millennium system.

 My question is how others, especially pymarc users dealing with III
 records, deal with this issue or whatever other
 experiences/hints/practices/kludges exist in this area.

 Thanks.

  - Godmar

 [1] http://www.loc.gov/marc/bibliographic/bdleader.html
 [2] http://lcweb2.loc.gov/diglib/codetables/45.html
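
The mismatch described in the message above can be checked mechanically. The sketch below is an illustration only, not pymarc's implementation: it tests whether a byte string is valid UTF-8, and converts the single ANSEL case discussed here (0xE8, the umlaut, which precedes its base letter) to Unicode. Both function names are hypothetical, and a real converter would need the full MARC-8 code tables.

```python
import unicodedata


def looks_like_utf8(raw):
    """Return True if the bytes decode cleanly as UTF-8.

    Pure ASCII also returns True, since ASCII is valid UTF-8.
    """
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def ansel_umlaut_to_unicode(raw):
    """Convert bytes that use only ASCII plus ANSEL 0xE8 (umlaut) to text.

    ANSEL puts the diacritic BEFORE the base letter; Unicode combining
    marks come AFTER it, so the mark is held back until the base arrives.
    """
    out = []
    pending = []  # combining marks waiting for their base letter
    for b in bytearray(raw):
        if b == 0xE8:
            pending.append(u"\u0308")  # COMBINING DIAERESIS
        elif b < 0x80:
            out.append(chr(b))
            out.extend(pending)
            pending = []
        else:
            raise ValueError("byte 0x%02X is outside this sketch" % b)
    if pending:
        raise ValueError("diacritic with no base letter at end of input")
    # NFC composes 'u' + COMBINING DIAERESIS into a single precomposed letter.
    return unicodedata.normalize("NFC", u"".join(out))
```

On the byte sequence quoted from the 245$a ('j', 0xE8, 'u', ...), looks_like_utf8 returns False, which is consistent with the record holding MARC-8 data even though leader[9] says 'a'.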



Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

2012-03-09 Thread Mark A. Matienzo
On Fri, Mar 9, 2012 at 7:23 AM, Godmar Back god...@gmail.com wrote:

 Mark, while I would be able to contribute code to pymarc, I probably won't
 (unless my collaborators' needs in respect to pymarc become urgent.)

Such is our conundrum. Most of my uses of pymarc only involve reading
records, not writing them.

 That's something occasional
 contributors cannot do, it requires work by the core team, in discussion
 with frequent users.

If you've looked at some of the past issues, you may have seen that
we've had some healthy discussions. Not all are resolved, clearly.
Speaking as an individual and not for the pymarc team, I agree that we
need this discussion.

 (I would have liked to take this discussion to a
 pymarc-users list, but didn't find any.)

Per the README [0]: The pymarc developers encourage you to join the
pymarc Google Group [1] if you need help.  Also, please feel free to
use issue tracking [2] on Github to submit feature requests or bug
reports. If you've got an itch to scratch, please scratch it, and send
merge requests on Github [3].


[0] https://github.com/edsu/pymarc/blob/master/README.md
[1] http://groups.google.com/group/pymarc
[2] https://github.com/edsu/pymarc/issues
[3] https://github.com/edsu/pymarc

Mark


Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

2012-03-09 Thread Michael B. Klein
The internal discussion then becomes, I have a need, and I've written
something that satisfies it. I think it could also be useful to others, but
I'm not going to have time to make major changes or implement features
others need. Should I open source this or keep it to myself? Does freeing
my code come with an implicit requirement to maintain and support it?
Should it?

I'd vote open source just about every time. If someone sees the need and
has the time to do a functional/requirements analysis and develop a core
team around pymarc, more power to them. The code that's already there will
give them a head start. Or they can start from scratch.

Until then, it will remain a fork-patch-and-pull, community-supported
project.

On Fri, Mar 9, 2012 at 4:23 AM, Godmar Back god...@gmail.com wrote:

 On Thu, Mar 8, 2012 at 3:53 PM, Mark A. Matienzo m...@matienzo.org
 wrote:

  On Thu, Mar 8, 2012 at 3:32 PM, Godmar Back god...@gmail.com wrote:
 
   One side comment here; while smart handling/automatic detection of
   encodings would be a nice feature to have, it would help if pymarc could
   operate in an 'agnostic', or 'raw' mode where it would simply preserve the
   encoding that's there after a record has been read when writing the record.

   [ Right now, pymarc does not have such a mode - if leader[9] == 'a', the
   data is unconditionally utf8 encoded on output as per mbklein's patch. ]
 
  Please feel free to write a patch and submit a pull request if you're
  able to contribute code to do this.
 
 
 Mark, while I would be able to contribute code to pymarc, I probably won't
 (unless my collaborators' needs in respect to pymarc become urgent.)

 I've been contributing to open source for over 15 years, my first major
 contribution having been the ext2fs filesystem code in the FreeBSD kernel
 (http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/filesystems-linux.html)
 and I'm a bit confused by how the spirit in the community has changed.  The
 phrase "patches welcome" used to be reserved for when there was a feature
 request somebody wanted, but you (the owner/maintainer of the software)
 didn't have the time or considered the problem not important.

 Back then, it used to be that all suggestions were welcome. For instance,
 if a user pointed out a typo, you'd fix it. Similarly, if a user or fellow
 developer pointed out a potential design flaw, you'd understand that you
 don't ask for patches, but that you go back to the drawing board and think
 about your software's design. In pymarc's case, what's needed is not more
 code (it already has a moderately confusing set of almost a dozen switches
 for reading/writing), but a requirement analysis where you think about use
 cases you want to support. For instance, whether you want to support
 reading/writing real world records in batches (without touching them) even
 if they have flaws or not. And/Or whether you insist on interpreting a
 record's data in terms of encoding, always. That's something occasional
 contributors cannot do, it requires work by the core team, in discussion
 with frequent users. (I would have liked to take this discussion to a
 pymarc-users list, but didn't find any.)
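
 The "without touching them" use case can be sketched with no decoding at all, assuming only the MARC 21 convention that the first five bytes of the leader give the total record length in ASCII digits. The names iter_raw_records and copy_records are hypothetical, and this is not how pymarc reads or writes records.

```python
def iter_raw_records(stream):
    """Yield each record from a binary MARC stream as untouched bytes."""
    while True:
        length_field = stream.read(5)
        if not length_field:
            return  # clean end of file
        if len(length_field) < 5 or not length_field.isdigit():
            raise ValueError("bad record length field: %r" % length_field)
        length = int(length_field)
        # The length counts the whole record, including these five bytes.
        body = stream.read(length - 5)
        if len(body) != length - 5:
            raise ValueError("truncated record")
        yield length_field + body


def copy_records(src, dst, keep=lambda raw: True):
    """Copy records from src to dst byte-for-byte; return how many were kept.

    keep is a predicate on the raw bytes, so records can be filtered
    without ever interpreting their character encoding.
    """
    count = 0
    for raw in iter_raw_records(src):
        if keep(raw):
            dst.write(raw)
            count += 1
    return count
```

 Because nothing is decoded, a record with a wrong leader[9] comes out exactly as it went in, flaws included.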

  - Godmar



Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

2012-03-09 Thread Godmar Back
On Fri, Mar 9, 2012 at 10:37 AM, Michael B. Klein mbkl...@gmail.com wrote:

 The internal discussion then becomes, I have a need, and I've written
 something that satisfies it. I think it could also be useful to others, but
 I'm not going to have time to make major changes or implement features
 others need. Should I open source this or keep it to myself? Does freeing
 my code come with an implicit requirement to maintain and support it?
 Should it?


It used to be that way, at least it was this way when I grew up in open
source (in the 90s, before Eric Raymond invented the term). And it makes
sense, for successful projects that have at least a moderate number of
users.  Just dumping your code on github helps very few people.


 I'd vote open source just about every time. If someone sees the need and
 has the time to do a functional/requirements analysis and develop a core
 team around pymarc, more power to them. The code that's already there will
 give them a head start. Or they can start from scratch.

 Until then, it will remain a fork-patch-and-pull, community-supported
 project.


It's not just an agreement on design goals the core team must reach, it's
also the issue of maintaining a record (in email discussions/posts and in
the developer's minds) of what issues arose, what legacy decisions were
made, where backwards compatibility is required. That's something
maintainers do, it enables them to reason about future design
decisions. People who feel a sense of ownership and mental
investment. Sure, I could throw in a flag 'dont_utf8_encode' to make the
code work for my case. But it wouldn't improve the software.  (In pymarc's
case, I'd also recommend a discussion about data structures. For instance,
what should the type of the elements of the subfield array be that's passed
to a Field constructor? 8-bit string or unicode objects? The thread you
link to shows ambiguity here.)
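
One way to make the data-structure question concrete: the sketch below assumes a policy in which subfield values are always text internally, and byte strings are decoded once, at the boundary. The helper name normalize_subfields is hypothetical and is not part of pymarc's API; it only illustrates what settling the question could look like.

```python
def normalize_subfields(subfields, encoding="utf-8"):
    """Return a new list in which every subfield code and value is text.

    Byte strings are decoded with the given encoding; bytes that are not
    valid in that encoding raise UnicodeDecodeError here, at construction
    time, rather than later when the record is written.
    """
    normalized = []
    for item in subfields:
        if isinstance(item, bytes):
            item = item.decode(encoding)
        elif not isinstance(item, str):
            raise TypeError("subfield must be bytes or str, got %s"
                            % type(item).__name__)
        normalized.append(item)
    return normalized
```

The design choice is that a misencoded value fails early and loudly, at the point where the caller still knows where the data came from.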

Staying with fork-patch-and-pull may help individual people meet their
individual needs, but can prevent wide-spread adoption - and creates
frustration for users who may lack the expertise to track down encoding
errors or who are even unable to understand where the code they're using
lives on their machine. Once a piece of software has reached the stage
where it's distributed as a package (which pymarc, I believe, is), the
distributors have taken on a piece of responsibility. Related, being
unwilling to fix even documentation typos unless someone clones the
repository and delivers a pull request (on a silver platter?) seems unusual
to me, but - perhaps I'm just too old and culturally out of tune with
today's open source movement. (I'm not being ironic here, maybe there has
been a shift and I should just get with it.)

 - Godmar


Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

2012-03-09 Thread Jon Gorman
 It used to be that way, at least it was this way when I grew up in open
 source (in the 90s, before Eric Raymond invented the term). And it makes
 sense, for successful projects that have at least a moderate number of
 users.  Just dumping your code on github helps very few people.


You realize this isn't Apache, right?  It seems a small project,
mostly maintained by folks as they get time.  There's no SCRUM
meetings or hallway meetings, no foundation, no checklist.  Surely you
can't generalize from two interactions as if they reflected the culture
of open source.  It seems to have been a small piece of code shared
so others wouldn't have to do it over again and it's grown with time.
The primary thrust seems to be for library developers, not catalogers
or folks learning python code.

The typo you brought up was patched by one of the team members within
an hour or two, from what I can tell (assuming you meant issue #22,
https://github.com/edsu/pymarc/issues/22).

In general though github is the sourceforge of years past, but even
better.  It seems entirely reasonable to ask for a patch to me.
Perhaps it could have been handled more delicately by both sides.
Perhaps you weren't treated as nicely as you'd like.  There's probably
some truth to that.  But at the same time, Ed did include a wink at
the end after requesting the patch.  Had you perhaps cut him some
slack instead of immediately responding incredulously, you'd have found
it was fixed when he got time. Or not.  He has his own priorities as do
other folks who contributed to the code.

If you're unhappy with the dump on github approach, then don't use the
software.  No one ran around forcing folks to do it.  It's one of
those lightweight github approaches, just another approach to open
source software.  In all the years I've also been involved with open
source every project has had its own unique culture.  There's
responsibility on the user before using software to figure out what it
is.  If it doesn't meet their expectation, I see little reason that
the developer should feel compelled to change unless they're getting
paid for the work.  Obviously some people have found the dump on
github approach useful if they've contributed patches.

Can't we all just shake hands virtually or something?

Jon Gorman


Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

2012-03-09 Thread Godmar Back
On Fri, Mar 9, 2012 at 11:48 AM, Jon Gorman jonathan.gor...@gmail.com wrote:


 Can't we all just shake hands virtually or something?


Here's my hand ||*(  [1].

I overreacted, for which I'm sorry. (Also, I didn't see the entire github
conversation until I just now visited the website, the github email
notification seems selective and only sent me Ed's replies (?) in my
emailbox.)

 - Godmar

[1] http://www.kadifeli.com/fedon/smiley.htm


Re: [CODE4LIB] Sharing code

2012-03-09 Thread Whitworth, Cliff
NOOB to list and am appreciative of this discussion. My boss is encouraging me 
to share code and pointed me to code4lib. The majority of my code is recycled /
repurposed from others so I've had reservations about sharing mainly because of 
what's taken from others. At the least, I'm mindful about leaving 
acknowledgements intact. Is there a good resource on how to start sharing code 
and ethical considerations? 

Thanks for letting me chime in and best regards, 




[CODE4LIB] Job: Metadata and Taxonomy Librarian at Library of Parliament

2012-03-09 Thread jobs
**Metadata and Taxonomy Librarian**  
**Information and Document Resource Service**  
  
Indeterminate Position

  
Classification: LS-3 ($69,866 - $83,554)

(bilingual imperative: CBC/CBC)

  
Closing Date: 2012/03/26

  
**The ideal candidate possesses the following:**  

  * Knowledge of MARC coding, RCAA2, LCSH, CSH, RVM, LC classification and the 
new RDA standards
  * Knowledge of non-MARC metadata schemas such as Dublin Core, MODS and METS
  * Knowledge of information technologies, especially emerging technologies 
applicable to system interoperability
  * Knowledge of standards, codes and protocols used in standardized 
description and metadata
  * General knowledge of the Library of Parliament's products, services and 
publications
  * Excellent analytical skills to design tools adapted to client needs
  * Excellent oral and written communication skills in both official languages
  * Ability to soundly manage time and workload according to individual and 
team priorities
  * Flexibility, resourcefulness and sound judgement
  * Team spirit, initiative and good interpersonal skills
  
**To be considered, candidates must have:**  

  * Preference will be given to candidates with a Master's degree in Library 
Sciences or in Library and Information Sciences from a recognized university; a 
combination of education and extensive experience related directly to the 
position may also be considered
  * Experience with metadata schemas and the development of vocabularies
  * Experience in the standardized description of resources and the use of 
controlled vocabularies
  * Experience in project management, follow up and quality control
  * Experience writing and producing manuals for taxonomy users and overseeing 
the application of guidelines
  * Experience working with an integrated library system
  * Experience holding training sessions and providing technical advice is an 
asset
  
**Candidates retained in this selection process will be required to obtain:**

  * A successful second-language evaluation (bilingual imperative: CBC/CBC)
  * A successful pre-employment screening
  
**Additional information:**

  * This selection process is open to employees of the Senate, the House of 
Commons, the Library of Parliament, the Office of the Senate Ethics Officer, 
the Office of the Conflict of Interest and Ethics Commissioner, the public 
service and the public.
  * A written exam may be administered
  * Qualified candidates from this selection process may be considered for 
temporary or indeterminate positions requiring similar competencies at the 
Library of Parliament
  * Satisfactory references are an essential condition of employment
  * Education and experience requirements will be used as part of the initial 
selection process
  * Proof of education will be required
  * We are committed to employment equity
  
To apply, please send your C.V. and cover letter clearly indicating how you
meet each of the requirements of the position by March 26, 2012. Please quote
Competition 11-I-44.

  
  
By email: lop...@parl.gc.ca

  
By fax: 613-995-9582

  
By mail:

 50 O'Connor Street

 Library of Parliament

 Human Resources Division

 Ottawa, ON K1A 0A9

  
Please address questions to Human Resources at 613-996-2424 or
lop...@parl.gc.ca. We thank all those who apply. Only those selected for
further consideration will be contacted.



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/833/


[CODE4LIB] Job: Software Developer (Java) at University of Maryland, College Park

2012-03-09 Thread jobs
**Note: this is a reposting of
[http://jobs.code4lib.org/job/801/](http://jobs.code4lib.org/job/801/)
because the close date has been extended to March 23.**
  
An opportunity exists for one or more experienced software
developers to work within the team environment of the University of Maryland
(UM) Libraries in College Park, the largest university library system in the
region and in close proximity to the nation's capital. Visit the UM Libraries
web-site at http://www.lib.umd.edu.

  
Note: this announcement will be used to fill TWO vacancies.

  
Responsibilities

  
The UM Libraries' Information Technology Division supports the library
automation needs of the University System of Maryland and Affiliated
Institutions (USMAI). Working within a team environment, the successful
candidate(s) will provide broad programming support to the UM Libraries for
the design, development, and delivery of Java-based software applications,
large-scale digital collections, and web interfaces. The successful
candidate(s) will:

  * Design and develop tools for managing production workflows, large-scale 
ingestion, inventory control and preservation of digital collections;
  * Select and utilize appropriate software languages, frameworks and platforms 
for new and existing library projects;
  * Provide object-oriented programming for various library initiatives;
  * Provide web interface development support for digital collection management 
systems;
  * Research and develop applications to interface with bibliographic systems, 
acquisition systems, reference and circulation systems;
  * Utilize project management tools such as JIRA to record and monitor 
progress; and
  * Lead technical development on some projects.

Qualifications

  
Required:

  * Bachelor's degree in a field related to information sciences, computer 
sciences and engineering, or information management
  * Minimum of three (3) years of programming experience using the Java language
  * Experience creating web applications using JSP
  * Experience using JDBC to interact with a relational database such as 
PostgreSQL or MySQL
  * Experience using version control software such as Subversion or Git
  * Excellent interpersonal skills; Excellent written and verbal communications 
skills

APPLICATIONS: Electronic applications required. Please apply online at
https://jobs.umd.edu/applicants/Central?quickFind=56411. No
relocation assistance will be provided. The University of Maryland Libraries
will not sponsor individuals for employment. You must be
legally able to work in the United States. An application
consists of a cover letter which includes the source of advertisement, a
resume and names/e-mail addresses of three references. Applications will be
reviewed as they are received and accepted until March 9, 2012.

  
The University of Maryland, College Park, actively subscribes to a policy of
equal employment opportunity, and will not discriminate against any employee
or applicant because of race, age, sex, color, sexual orientation, physical or
mental disability, religion, ancestry or national origin, marital status,
genetic information, or political affiliation. Minorities and women are
encouraged to apply.



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/834/