[CODE4LIB] Job: Records Management Archivist at Johns Hopkins University

2012-04-18 Thread jobs
The Johns Hopkins University Sheridan Libraries is hiring a Records Management
Archivist to work with the University Archivist to develop an innovative
approach to records management with the purpose of improving our stewardship
of a university history that exists in print, digitized, and born-digital form
and can come from within and outside the boundaries of official university
activity.

  
Essential skills and knowledge areas include:

  
strong technological literacy and curiosity;

deep understanding of born-digital archives and the emerging tools and
techniques used to manage born-digital archives;

experience with traditional archives functions and processes;

comprehension of holistic approaches to information management that account
for recorded memory that takes a variety of forms, including analog,
digitized, and born-digital;

excellent interpersonal and communication skills needed when interviewing a
diverse community of records creators for the purpose of discerning
information management behavior and its impact on the documentation of
university memory;

excellent writing and information visualization skills needed to create
records retention schedules, functional requirements and other documentation,
and information models;

the creativity, entrepreneurial spirit, and critical thinking competencies
needed to play a crucial role in redefining institutional records management
for an increasingly born-digital world in which important analog traces will
endure.

  
Questions about the position should be addressed to jsteele at jhu dot edu (no
phone calls, please). For a more prescribed list of duties and additional
qualifications, and to apply for this unique opportunity, please visit
https://hrnt.jhu.edu/jhujobs/job_view.cfm?view_req_id=52166.



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/895/


[CODE4LIB] Google Scholar Indexing Guidelines: Highwire Press vs. Eprints vs. BE Press vs. PRISM?

2012-04-18 Thread Brett Bonfield
<<< No Message Collected >>>


Re: [CODE4LIB] Job: Senior Application Developer at New York Public Library

2012-04-18 Thread Cary Gordon
<<< No Message Collected >>>


[CODE4LIB] JCDL 2012 registration opens today, April 5

2012-04-18 Thread Howard, Barrie
<<< No Message Collected >>>


Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Tod Olson
In practice it seems to mean UTF-8. At least I've only seen UTF-8, and I can't 
imagine the code that processes this stuff being safe for UTF-16 or UTF-32. All 
of the offsets are byte-oriented, and there's too much legacy code that makes 
assumptions about null-terminated strings.

-Tod

On Apr 17, 2012, at 6:55 PM, Jonathan Rochkind wrote:

> Okay, forget XML for a moment, let's just look at marc 'binary'.
> 
> First, for Anglophone-centric MARC21.
> 
> The LC docs don't actually say quite what I thought about leader byte 09, 
> used to advertise encoding:
> 
> 
> a - UCS/Unicode
> Character coding in the record makes use of characters from the Universal 
> Coded Character Set (UCS) (ISO 10646), or Unicode™, an industry subset.
> 
> 
> 
> That doesn't say UTF-8. It says UCS or "Unicode". What does that actually 
> mean?  Does it mean UTF-8, or does it mean UTF-16 (closer to what used to be 
> called "UCS" I think?).  Whatever it actually means, do people violate it in 
> the wild?
> 
> 
> 
> Now we get to non-Anglophone centric marc. I think all of which is ISO_2709?  
> A standard which of course is not open access, so I can't get it to see what 
> it says.
> 
> But leader 09 being used for encoding -- is that Marc21 specific, or is it 
> true of any ISO-2709?  Marc8 and "unicode" being the only valid encodings 
> can't be true of any ISO-2709, right?
> 
> Is there a generic ISO-2709 way to deal with this, or not so much?


Re: [CODE4LIB] Job: Senior Application Developer at New York Public Library

2012-04-18 Thread Ross Singer
<<< No Message Collected >>>


[CODE4LIB] Representing geographic hierarchy in linked data

2012-04-18 Thread Ethan Gruber
<<< No Message Collected >>>


[CODE4LIB] Islandora Camp 2012 Registration & Public Brainstorm/Call for Proposals

2012-04-18 Thread David Wilcox
* Apologies for cross-posting *

We're excited to invite you all to the third annual Islandora Camp
(Aug 1-3, 2012).  Islandora Camp welcomes developers, administrators,
and users of Islandora to meet, learn, and grow the ecosystem!
Registration for Islandora Camp is now open, and is available via the
following link:
http://islandora.ca/node/add/islandora-camp-registration

Registration is $350, and includes a banquet dinner at Stanhope
(http://www.stanhopebeachresort.com) on August 2nd. The agenda is
still pending our call for proposals (see below). However, we expect a
similar structure to last year, with concurrent sessions running all
three days, suited to both beginners and advanced Islandorians.

Public Brainstorm and Call for Proposals

We've created a Google Moderator stream for Islandora Camp here:
http://www.google.com/moderator/#16/e=1fe634. You can view all of the
presentation ideas and vote on your favourites! You can also suggest
your own ideas for posters, presentations, papers, user groups, and
workshops - just indicate whether you're volunteering to present or
just interested in attending a session on a particular topic. Please
get your suggestions into the system by the end of May to make sure
they're considered for the conference schedule.

Mark your calendars: The Red Island Repository Institute will be back
in 2012. Tentative dates are September 24-28, 2012. We will post more
information as it becomes available.

-- 
David Wilcox, BA, MLIS
Islandora Training/Support Coordinator
Robertson Library
University of Prince Edward Island
dwil...@upei.ca
Skype Name: david.wilcox82
902.620.5167


Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Jonathan Rochkind

On 4/18/2012 11:09 AM, Doran, Michael D wrote:

> I don't believe that is the case.  Take UTF-8 out of the picture, and
> consider the MARC-8 character set with its escape sequences and combining
> characters.  A character such as an "n" with a tilde would consist of two
> bytes.  The Greek small letter alpha, if invoked in accordance with ANSI
> X3.41, would consist of five bytes (two bytes for the initial escape
> sequence, a byte for the character, and then two bytes for the escape
> sequence returning to the default character set).


ISO 2709 doesn't care how many bytes your characters are. The directory 
and offsets and other things count bytes, not characters. (which was, in 
my opinion, the _right_ decision, for once with marc!)


How bytes translate into characters is not a concern of ISO 2709.

The majority of non-7-bit-ASCII encodings will have chars that are more 
than one byte, either sometimes or always. This is true of MARC8 (some 
chars), UTF8 (some chars), and UTF16 (all chars), all of them. (It is 
not true of Latin-1 though, for instance, I don't think).


ISO 2709 doesn't care what char encodings you use, and there's no 
standard ISO 2709 way to determine what char encodings are used for 
_data_ in the MARC record. ISO 2709 does say that _structural_ elements 
like field names, subfield names, the directory itself, separator chars, 
etc, all need to be (essentially, over-simplifying) 7-bit-ASCII. The 
actual data itself is application dependent, 2709 doesn't care, and 2709 
doesn't give any standard cross-2709 way to determine it.


That is my conclusion at the moment, helped by all of you all in this 
thread, thanks!
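
A minimal Python sketch of the byte-counting point above. It assumes the MARC 21 flavour of ISO 2709 (a 24-byte leader, then 12-byte directory entries laid out as a 3-character tag, a 4-digit field length and a 5-digit start offset); the helper names build_record and read_fields are made up for illustration, and this is not a complete parser.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator


def build_record(fields):
    """Build a tiny ISO 2709-style record from (tag, body_bytes) pairs."""
    directory = b""
    data = b""
    for tag, body in fields:
        field = body + FT
        # Length and start offset are counted in bytes, not characters.
        directory += f"{tag}{len(field):04d}{len(data):05d}".encode("ascii")
        data += field
    base = 24 + len(directory) + 1      # leader + directory + its terminator
    total = base + len(data) + 1        # + record terminator
    leader = f"{total:05d}nam a22{base:05d}   4500".encode("ascii")
    return leader + directory + FT + data + RT


def read_fields(record):
    """Slice fields out of a record using only the byte offsets in the directory."""
    base = int(record[12:17])           # base address of data (leader 12-16)
    directory = record[24:base - 1]     # drop the directory's field terminator
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        # Drop the trailing field terminator from the slice.
        fields[tag] = record[base + start:base + start + length - 1]
    return fields


if __name__ == "__main__":
    # "Se\u00f1or" is 5 characters but 6 bytes in UTF-8.
    body = b"10\x1fa" + "Se\u00f1or".encode("utf-8")
    rec = build_record([("245", body)])
    print(rec[24:36])                   # the single directory entry
    print(read_fields(rec)["245"].decode("utf-8"))
```

The slicing never needs to know how bytes map to characters; decoding only happens after the field has been cut out, which matches the conclusion that the data encoding is the application's concern.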


[CODE4LIB] a slight change of venue for code4lib north - also: only five spots remain!

2012-04-18 Thread Mita Williams
There's been a slight change of venue for code4lib north. Instead of the
new engineering building, we will be taking over the 4th floor of the Leddy
Library, University of Windsor.

We made the change because we wanted to make sure we had ample space for
small working groups as there seems to be a lot of interest in hackfest
projects this time around:
http://wiki.code4lib.org/index.php/North

For those registered, please consider giving a talk!  For those still
thinking about registering, well you better hurry because as of the time of
this email, only five spots remain: http://c4ln2012.eventbrite.com/

All the best
Mita


Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Jonathan Rochkind

On 4/18/2012 10:33 AM, Karen Coyle wrote:

> UTF-8 was the marc standard from the beginning:
>
> http://www.loc.gov/marc/marbi/1998/98-18.html


Thank you Karen!

Who wants to try to get LC to update the docs at:

http://www.loc.gov/marc/bibliographic/bdleader.html
and
http://www.loc.gov/marc/bibliographic/concise/bdleader.html

accordingly?  They just say "UCS/Unicode", which is vague, and even 
implies the legacy "UCS" encoding (which is a backwards-compatible 
version of what became UTF-16) instead of UTF-8.


Standards documentation, treat them like they matter if you want them to 
matter!


Jonathan



> The first proposals were a character mapping between Unicode and MARC-8
> and didn't mention the character encodings, thus the term "UCS" which
> was a common term for Unicode at that time. (see:
> http://www.loc.gov/marc/marbi/1996/96-10.html). But when it got down to
> brass tacks, it was UTF-8, and left open the possibility of UTF-16
> (which was still a viable rival to UTF-8 at the time, as I recall.)
> UTF-16 had the advantage of every character being of uniform length, but
> it also did not cover all of the characters of interest to libraries.
>
> The decision was also made to use byte count rather than character count
> in the directory. This was influenced by the UTF-8 decision.
>
> kc
>
> On 4/18/12 7:04 AM, Jonathan Rochkind wrote:
> > On 4/18/2012 6:04 AM, Tod Olson wrote:
> > > It has to mean UTF-8. ISO 2709 is very byte-oriented, from the
> > > directory structure to the byte-offsets in the fixed fields. The
> > > values in these places all assume 8-bit character data, it's
> > > completely baked in to the file format.
> >
> > I'm not sure that follows. One could certainly have UTF-16 in a Marc
> > record, and still count bytes to get a directory structure and byte
> > offsets. (In some ways it'd be easier since every char would be two
> > bytes).
> >
> > In fact, I worry that the standard may pre-date UTF-8, with its
> > reference to "UCS" --- if I understand things right, at one point there
> > was only one unicode encoding, called "UCS", which is basically a
> > backwards-compatible subset of what became UTF-16.
> >
> > So I worry the standard really "means" UCS/UTF-16.
> >
> > But if in fact records in the wild with the 'a' value are far more
> > likely to be UTF-8... well it's certainly not the first time the MARC21
> > standard was useless/ignored as a standard in answering such questions.




Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Jonathan Rochkind
Was not talking about MarcXML in this thread, shifted to talking about 
Marc21 binary, changed the subject and started a new thread accordingly.


On 4/18/2012 10:23 AM, LeVan,Ralph wrote:

> > In fact, I worry that the standard may pre-date UTF-8, with its
> > reference to "UCS" --- if I understand things right, at one point there
> > was only one unicode encoding, called "UCS", which is basically a
> > backwards-compatible subset of what became UTF-16.
> >
> > So I worry the standard really "means" UCS/UTF-16.
>
> Now you're just trying to scare yourself.  I've never seen UTF-16
> MarcXML.  I've never seen anything but UTF-8 encoded MarcXML.
>
> Ralph



Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Simon Spero
On Wed, Apr 18, 2012 at 12:38 PM, Doran, Michael D wrote:

> > ISO 2709 doesn't care how many bytes your characters are. The directory
> > and offsets and other things count bytes, not characters.
>
> That was exactly my point.  (Which I am stating since you quoted me and I
> couldn't tell if you were refuting my point, or using it to support your
> conclusion.)  ;-)
>

Z39.2 counts octets, but says it is counting characters.  If you find a
record that appears to use characters instead of bytes, ignore it; it's
legacy R'LMARC, which has been declared officially dead- it's Z39.2/ISO2709
that are eternally lying.

UNIMARC *can* allow UCS-2 encodings in data fields, but it does not seem
possible for this to imply that lengths are in characters on any charitable
reading. This is because the information that UCS-2 will be used is located
at a non-zero offset within a fixed field. If offsets were in character
units, rather than bytes, it would not be possible to locate this value
within the field.

Simon


Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Doran, Michael D
> ISO 2709 doesn't care how many bytes your characters are. The directory
> and offsets and other things count bytes, not characters.

That was exactly my point.  (Which I am stating since you quoted me and I 
couldn't tell if you were refuting my point, or using it to support your 
conclusion.)  ;-)

-- Michael

> -Original Message-
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Wednesday, April 18, 2012 11:09 AM
> To: Code for Libraries
> Cc: Doran, Michael D
> Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
> ISO_2709 and MARC21
> 
> On 4/18/2012 11:09 AM, Doran, Michael D wrote:
> > I don't believe that is the case.  Take UTF-8 out of the picture, and
> consider the MARC-8 character set with its escape sequences and combining
> characters.  A character such as an "n" with a tilde would consist of two
> bytes.  The Greek small letter alpha, if invoked in accordance with ANSI
> X3.41, would consist of five bytes (two bytes for the initial escape
> sequence, a byte for the character, and then two bytes for the escape
> sequence returning to the default character set).
> 
> ISO 2709 doesn't care how many bytes your characters are. The directory
> and offsets and other things count bytes, not characters. (which was, in
> my opinion, the _right_ decision, for once with marc!)
> 
> How bytes translate into characters is not a concern of ISO 2709.
> 
> The majority of non-7-bit-ASCII encodings will have chars that are more
> than one byte, either sometimes or always. This is true of MARC8 (some
> chars), UTF8 (some chars), and UTF16 (all chars), all of them. (It is
> not true of Latin-1 though, for instance, I don't think).
> 
> ISO 2709 doesn't care what char encodings you use, and there's no
> standard ISO 2709 way to determine what char encodings are used for
> _data_ in the MARC record. ISO 2709 does say that _structural_ elements
> like field names, subfield names, the directory itself, separator chars,
> etc, all need to be (essentially, over-simplifying) 7-bit-ASCII. The
> actual data itself is application dependent, 2709 doesn't care, and 2709
> doesn't give any standard cross-2709 way to determine it.
> 
> That is my conclusion at the moment, helped by all of you all in this
> thread, thanks!


Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Houghton,Andrew
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Jonathan Rochkind
> Sent: Tuesday, April 17, 2012 19:55
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] more on MARC char encoding: Now we're about
> ISO_2709 and MARC21
> 
> Okay, forget XML for a moment, let's just look at marc 'binary'.
> 
> First, for Anglophone-centric MARC21.
> 
> The LC docs don't actually say quite what I thought about leader byte
> 09, used to advertise encoding:
> 
> 
> a - UCS/Unicode
> Character coding in the record makes use of characters from the
> Universal Coded Character Set (UCS) (ISO 10646), or Unicode™, an
> industry subset.
> 
> 
> 
> That doesn't say UTF-8. It says UCS or "Unicode". What does that
> actually mean?  Does it mean UTF-8, or does it mean UTF-16 (closer to
> what used to be called "UCS" I think?).  Whatever it actually means, do
> people violate it in the wild?
> 
First, UCS and Unicode basically mean the same thing. Second, UTF-8, UTF-16, and
UTF-32 are encoding forms for UCS/Unicode. The MARC documentation does actually
say that MARC binary records *must* be encoded in UTF-8 when LDR/09 has the
value 'a'.

You need to refer to the appropriate standards for this information and 
definitions:


Unicode specifies three encoding forms, of which only one, UTF-8 (UCS 
Transformation Format 8), is authorized for use in MARC 21 records.


UCS. Acronym for Universal Character Set, which is specified by International 
Standard ISO/IEC 10646, which is equivalent in repertoire to the Unicode 
Standard.


Unicode Encoding Form. A character encoding form that assigns each Unicode 
scalar value to a unique code unit sequence. The Unicode Standard defines three 
Unicode encoding forms: UTF-8, UTF-16, and UTF-32. (See definition D79 in 
Section 3.9, Unicode Encoding Forms.)


UTF-8. A multibyte encoding for text that represents each Unicode character 
with 1 to 4 bytes, and which is backward-compatible with ASCII. UTF-8 is the 
predominant form of Unicode in web pages. More technically: (1) The UTF-8 
encoding form. (2) The UTF-8 encoding scheme. (3) “UCS Transformation Format 
8,” defined in Annex D of ISO/IEC 10646:2003, technically equivalent to the 
definitions in the Unicode Standard.


UTF-16. A multibyte encoding for text that represents each Unicode character 
with 2 or 4 bytes; it is not backward-compatible with ASCII. It is the internal 
form of Unicode in many programming languages, such as Java, C#, and 
JavaScript, and in many operating systems. More technically: (1) The UTF-16 
encoding form. (2) The UTF-16 encoding scheme. (3) “Transformation format for 
16 planes of Group 00,” defined in Annex C of ISO/IEC 10646:2003; technically 
equivalent to the definitions in the Unicode Standard.

Andy
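
As a rough illustration of the LDR/09 rule described above, here is a small Python sketch. The function name declared_encoding is hypothetical. The 'a' case follows the message above; treating a blank as MARC-8 is an assumption based on the MARC 21 leader documentation. Note that "marc-8" is only a label here, since Python ships no MARC-8 codec.

```python
def declared_encoding(record: bytes) -> str:
    """Return the character coding scheme that a MARC 21 leader advertises.

    Only leader position 09 is inspected; nothing is verified against the
    actual field data, so a mislabelled record will not be caught.
    """
    if len(record) < 24:
        raise ValueError("record is shorter than the 24-byte leader")
    flag = record[9:10]
    if flag == b"a":
        return "utf-8"      # UCS/Unicode, which MARC 21 restricts to UTF-8
    if flag == b" ":
        return "marc-8"     # assumption: blank means MARC-8 (label only)
    raise ValueError(f"unexpected leader/09 value: {flag!r}")


if __name__ == "__main__":
    print(declared_encoding(b"00049nam a2200037   4500"))
    print(declared_encoding(b"00049nam  2200037   4500"))
```

Because the leader only declares an encoding, code that wants to be safe against mislabelled records would still need to attempt a decode and handle failures.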


Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Andy Kohler
I don't know about ISO 2709 itself, but the MARC21 implementation of
it refers to octets, aka 8-bit bytes:
http://www.loc.gov/marc/specifications/specrecstruc.html

Characters "may be encoded using one or more than one octet, depending
on the character set. All ASCII characters are encoded using one octet
in the ASCII encoding and the Unicode UTF-8 encoding, thus a character
is equivalent in length to an octet when an element's values are
restricted to ASCII."

--Andy
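
The octet-versus-character point can be checked quickly in Python. This is only a sketch: octet_lengths is a made-up helper, and the sample characters are arbitrary choices.

```python
def octet_lengths(ch: str) -> dict:
    """Count the octets needed for one character in three Unicode encoding forms."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-be", "utf-32-be")}


if __name__ == "__main__":
    # ASCII letter, Latin small n with tilde, Greek alpha, a CJK ideograph,
    # and a character outside the Basic Multilingual Plane.
    for ch in ["a", "\u00f1", "\u03b1", "\u4e2d", "\U0001d11e"]:
        print(f"U+{ord(ch):04X}", octet_lengths(ch))
```

Only the ASCII character comes out as a single octet in UTF-8, which is why a character equals an octet only in elements restricted to ASCII.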

On Wed, Apr 18, 2012 at 7:20 AM, Huwig,Steve wrote:
> I could be mistaken (never having had the pleasure of reading it), but
> isn't ISO-2709 specified as a fixed number of characters, and any
> conflation of characters and 8-bit bytes is on the part of users and
> implementations?
>
> I think ISO 2709 might not know from bytes, only characters.


Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Karen Coyle
At the time of creation, characters and bytes were 1-to-1 because MARC 
used only ASCII. So there was no distinction at the outset. Some 
positions are still limited to ASCII characters (Leader, fixed fields, 
subfield codes, etc.).


kc

On 4/18/12 7:20 AM, Huwig,Steve wrote:

> I could be mistaken (never having had the pleasure of reading it), but
> isn't ISO-2709 specified as a fixed number of characters, and any
> conflation of characters and 8-bit bytes is on the part of users and
> implementations?
>
> I think ISO 2709 might not know from bytes, only characters.




--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] Archivists' Toolkit: Adding Digital Objects via MySQL

2012-04-18 Thread Mennerich, Donald
Rosalyn,

I've written a number of scripts of this nature. Here's a quick one I wrote 
recently to add DAOs to our AT for an audio digitization project (note it does 
not include file versions, just Components, Instances and DAOs).
It starts at the ResourceComponent identified by the long at the top of the 
script. The resourceId is also hard-coded in a number of places. I've got some 
tidier Java that runs as part of a automated process for a large digitization 
project, but all the basic Inserts are in this: 
https://github.com/yalemssa/ATK_DAO_Scripts/blob/master/components_atk.groovy

Don Mennerich
donald.menner...@yale.edu


From: Rosalyn Metz <rosalynm...@gmail.com>
Date: Wed, Apr 18, 2012 at 9:23 AM
Subject: [CODE4LIB] Archivists' Toolkit: Adding Digital Objects via MySQL
To: CODE4LIB@listserv.nd.edu


Hi Everyone,

I posted this over on the Archivists' Toolkit listserv and got no response
(yet), so I thought I might try here as well.

I have a large quantity (around 300+) of digital objects that I need to add
to Archivists' Toolkit.  I think I've figured out what queries I need to
run in order to do this in MySQL (rather than the interface) but I wanted
to get opinions from the peanut gallery before trying it out on my test
instance.

It seems that there are actually two insert queries that need to be used
when creating a Digital Object.  They are:

insert into ArchDescriptionInstances
(instanceType, resourceComponentId, resourceId, parentResourceId,
instanceDescriminator, archDescriptionInstancesId)
values
('Digital object', 336673, null, 543, 'digital', 22567003)


and...

insert into DigitalObjects
(version, lastUpdated, created, lastUpdatedBy, createdBy, title,
dateExpression, dateBegin, dateEnd, languageCode, restrictionsApply,
eadDaoActuate, eadDaoShow, metsIdentifier, objectType, label, objectOrder,
componentId, parentDigitalObjectId, archDescriptionInstancesId,
repositoryId)
values
(0, '2012-04-17 12:05:15', '2012-04-17 12:05:15', 'username', 'username',
'title', '1938-1959', null, null, '', 0, 'onRequest', 'new', '678.1829',
'text', '', 0, '', null, 22567003, 1)


There also appear to be some update queries as well, but I'm guessing that
they are less important (please correct me if I'm wrong).  Has anyone tried
to do this in the past? If so do you have scripts that will create Digital
Objects for you that you wouldn't mind sharing?  Is there anything you
think I should know before testing this out in my test instance of AT?  Any
caveats for me?

Any help anyone can provide would be greatly appreciated.

Thanks,
Rosalyn
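
One way to keep the two inserts above in step is to run them inside a single transaction with a shared archDescriptionInstancesId. The Python sketch below is only an illustration under loud assumptions: it uses an in-memory SQLite database as a stand-in so it runs without a MySQL server, the table definitions are a simplified subset of the columns quoted above rather than the real Archivists' Toolkit schema, the UNIQUE constraint exists only for the demo, and make_demo_db and add_digital_object are hypothetical names.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory stand-in holding a simplified subset of the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ArchDescriptionInstances (
            archDescriptionInstancesId INTEGER PRIMARY KEY,
            instanceType TEXT,
            resourceComponentId INTEGER,
            parentResourceId INTEGER,
            instanceDescriminator TEXT
        );
        CREATE TABLE DigitalObjects (
            title TEXT,
            dateExpression TEXT,
            metsIdentifier TEXT UNIQUE,  -- uniqueness is an assumption for this demo
            objectType TEXT,
            archDescriptionInstancesId INTEGER,
            repositoryId INTEGER
        );
    """)
    return conn


def add_digital_object(conn, instance_id, component_id, parent_resource_id,
                       title, date_expression, mets_identifier, repository_id=1):
    """Insert the instance row and its digital object row as one unit."""
    with conn:  # commits if both inserts succeed, rolls back if either fails
        conn.execute(
            "INSERT INTO ArchDescriptionInstances "
            "(archDescriptionInstancesId, instanceType, resourceComponentId, "
            " parentResourceId, instanceDescriminator) "
            "VALUES (?, 'Digital object', ?, ?, 'digital')",
            (instance_id, component_id, parent_resource_id),
        )
        conn.execute(
            "INSERT INTO DigitalObjects "
            "(title, dateExpression, metsIdentifier, objectType, "
            " archDescriptionInstancesId, repositoryId) "
            "VALUES (?, ?, ?, 'text', ?, ?)",
            (title, date_expression, mets_identifier, instance_id, repository_id),
        )


if __name__ == "__main__":
    conn = make_demo_db()
    add_digital_object(conn, 22567003, 336673, 543, "title", "1938-1959", "678.1829")
    print(conn.execute("SELECT COUNT(*) FROM DigitalObjects").fetchone()[0])
```

Passing values as parameters rather than pasting them into the SQL string also avoids quoting problems when looping over a few hundred titles. Against a real AT database the same pattern would need a MySQL driver and the full column lists, and should be tried on a test instance first.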


Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Doran, Michael D
> I could be mistaken (never having had the pleasure of reading it), but
> isn't ISO-2709 specified as a fixed number of characters, and any
> conflation of characters and 8-bit bytes is on the part of users and
> implementations?

I don't believe that is the case.  Take UTF-8 out of the picture, and consider 
the MARC-8 character set with its escape sequences and combining characters.  A 
character such as an "n" with a tilde would consist of two bytes.  The Greek 
small letter alpha, if invoked in accordance with ANSI X3.41, would consist of 
five bytes (two bytes for the initial escape sequence, a byte for the 
character, and then two bytes for the escape sequence returning to the default 
character set).

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/
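
The byte counts above can be spelled out as a short Python sketch. The specific byte values are assumptions recalled from the MARC 21 character set specifications (0xE4 for the ANSEL combining tilde, ESC g to switch to the Greek symbol set, ESC s to return to the default set) and should be checked against the LC tables before being relied on; the function names are made up.

```python
ESC = 0x1B              # escape control character
COMBINING_TILDE = 0xE4  # assumed ANSEL value; the diacritic precedes its base letter


def marc8_n_tilde() -> bytes:
    """'n' with a tilde: combining tilde byte followed by ASCII 'n'."""
    return bytes([COMBINING_TILDE, ord("n")])


def marc8_greek_alpha() -> bytes:
    """Greek alpha: escape into the Greek symbol set, the character, escape back."""
    return bytes([ESC, ord("g"), ord("a"), ESC, ord("s")])


if __name__ == "__main__":
    print(len(marc8_n_tilde()), marc8_n_tilde().hex())
    print(len(marc8_greek_alpha()), marc8_greek_alpha().hex())
```

Either way, one displayed character occupies several bytes, so a directory that counted characters could not be used to slice the record.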

> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Huwig,Steve
> Sent: Wednesday, April 18, 2012 9:21 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
> ISO_2709 and MARC21
> 
> I could be mistaken (never having had the pleasure of reading it), but
> isn't ISO-2709 specified as a fixed number of characters, and any
> conflation of characters and 8-bit bytes is on the part of users and
> implementations?
> 
> I think ISO 2709 might not know from bytes, only characters.
> 
> > -Original Message-
> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
> Of
> > Doran, Michael D
> > Sent: Wednesday, April 18, 2012 10:05 AM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
> > ISO_2709 and MARC21
> >
> > Hi Tod,
> >
> > I'm not understanding how UTF-8 would be considered 8-bit character
> > data (other than the ASCII-range of the Unicode repertoire, natch).  I
> > don't think ISO 2709 knows from characters, only bytes.
> >
> > -- Michael
> >
> > # Michael Doran, Systems Librarian
> > # University of Texas at Arlington
> > # 817-272-5326 office
> > # 817-688-1926 mobile
> > # do...@uta.edu
> > # http://rocky.uta.edu/doran/
> >
> >
> > > -Original Message-
> > > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
> > Of
> > > Tod Olson
> > > Sent: Wednesday, April 18, 2012 5:04 AM
> > > To: CODE4LIB@LISTSERV.ND.EDU
> > > Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
> > > ISO_2709 and MARC21
> > >
> > > It has to mean UTF-8. ISO 2709 is very byte-oriented, from the
> > directory
> > > structure to the byte-offsets in the fixed fields. The values in
> > these
> > > places all assume 8-bit character data, it's completely baked in to
> > the
> > > file format.
> > >
> > > -Tod
> > >
> > > On Apr 17, 2012, at 6:55 PM, Jonathan Rochkind wrote:
> > >
> > > > Okay, forget XML for a moment, let's just look at marc 'binary'.
> > > >
> > > > First, for Anglophone-centric MARC21.
> > > >
> > > > The LC docs don't actually say quite what I thought about leader
> > byte
> > > 09, used to advertise encoding:
> > > >
> > > >
> > > > a - UCS/Unicode
> > > > Character coding in the record makes use of characters from the
> > > Universal Coded Character Set (UCS) (ISO 10646), or Unicode(tm), an
> > industry
> > > subset.
> > > >
> > > >
> > > >
> > > > That doesn't say UTF-8. It says UCS or "Unicode". What does that
> > > actually mean?  Does it mean UTF-8, or does it mean UTF-16 (closer
> to
> > > what used to be called "UCS" I think?).  Whatever it actually means,
> > do
> > > people violate it in the wild?
> > > >
> > > >
> > > >
> > > > Now we get to non-Anglophone centric marc. I think all of which is
> > > ISO_2709?  A standard which of course is not open access, so I can't
> > get
> > > it to see what it says.
> > > >
> > > > But leader 09 being used for encoding -- is that Marc21 specific,
> > or is
> > > it true of any ISO-2709?  Marc8 and "unicode" being the only valid
> > > encodings can't be true of any ISO-2709, right?
> > > >
> > > > Is there a generic ISO-2709 way to deal with this, or not so much?


[CODE4LIB] Job: Senior Developer, WGBH Media Library & Archives, Boston

2012-04-18 Thread Courtney Michael
*Please excuse cross-postings*

WGBH is looking for a creative and energetic Senior Developer to lead the 
development of a digital asset management (DAM) preservation system for the 
WGBH Media Library and Archives.

The Senior Developer will play a leading role in designing and implementing the 
architecture, workflows, and applications for WGBH MLA digital library 
services. The system will be based on the Hydra Project technology stack, which 
includes Ruby on Rails, Blacklight, Apache Solr, and the Fedora Commons 
repository. In addition, the Senior Developer will work on web based projects 
for the Media Library and Archives, including the implementation of a website 
to give scholars and researchers access to material in the WGBH Archive.

Working closely with the Media Library and Archive’s Director, Project Manager, 
and a WGBH Interactive Designer, the Senior Developer will specify, document 
and develop the technical architecture of a prototype digital asset management 
system for digital preservation. They will develop user interfaces to the 
system. They will also continue to develop the Open Vault website: 
http://openvault.wgbh.org.

 Specific duties include:

  *   Gather requirements and develop specifications for the digital library 
architecture; work closely with digital object creators and managers to 
understand their needs.
  *   Working with open-source applications and toolkits, design and implement 
a multi-purpose repository infrastructure that supports the ingestion, 
preservation, and delivery of digital objects.
  *   Test, evaluate, and recommend potential toolkits and applications for 
inclusion in the repository architecture.
  *   Design and implement workflows to extract, transform and repurpose 
metadata and digital objects as needed.
  *   Customize open source applications to provide front-end interfaces to the 
repository for end-user delivery
  *   Maintain digital library architecture, troubleshooting issues whenever 
they arise.
  *   Keep abreast of community-wide developments in the realm of digital 
library software and infrastructure.
  *   Contribute to the development of Open Source applications.
  *   Write and maintain documentation.
  *   May supervise junior programmers

 Please note that this position has the possibility of being extended based 
upon funding levels.

Responsible for maintaining a working environment that leverages the potential 
and diversity of the department's entire staff. Provide direction and 
leadership in such a way as to nurture, create and maintain an environment that 
is (1) free from discrimination, intolerance and harassment and (2) provides 
employees with equal access to opportunities for growth and advancement 
including professional development whenever possible.

Skills Required:

The ideal candidate:

  *   Has experience implementing digital archives, using repository software 
such as DSpace or Fedora Commons.
  *   Is Unix proficient.
  *   Has some experience with Blacklight, Hydra, Ruby on Rails and/or Solr.
  *   Can demonstrate understanding of Internet technologies including HTML, 
CSS, JavaScript and XML (particularly XSLT, XPath and RDF).
  *   Has worked with web services such as REST, SOAP and/or XML-RPC.
  *   Is familiar with one or more RDBMS, such as MySQL. Experience integrating 
with, or extracting data from, FileMaker Pro will also be helpful.
  *   Is familiar with online media workflows (from post-production to 
compression to distribution).

 WGBH is a Mac shop, with LAMP servers. Candidates should be prepared to share 
and discuss code samples.

Educational Requirements:

To perform the required duties, the Senior Developer must possess the skills 
and qualities required to complete a Bachelor's Degree in Computer Science, and 
more than 3 years of work experience developing web applications. Demonstrated 
interest in library or moving images archive issues preferred.

Department Overview:

WGBH produces the best and most well known television, radio and online 
programs for public media. The WGBH Media Library and Archives preserves and 
helps re-purpose WGBH creations into the future. The MLA establishes the 
policies and procedures for the access, acquisition, intellectual control, and 
preservation of WGBH’s physical media and digital production and administrative 
assets. The MLA also offers production organization of archival materials from 
projects start up to shut down, research services, rights clearances, and 
licenses WGBH stock footage.  This is a full-time, on-site position with 
benefits, starting as soon as possible. It is funded for 12 months, with the 
possibility of renewal after that. Moderate travel may be required. We work 
hard, but believe in work/life balance.

Apply at http://www.wgbh.org/about/employmentOpportunities.cfm


Courtney Michael
Project Manager
WGBH Media Library & Archives
One Guest Street
Boston, MA 02135
p. 617-300-2673
f. 617-

Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Houghton,Andrew
> Jonathan Rochkind
> Sent: Tuesday, April 17, 2012 19:55
> Subject: [CODE4LIB] more on MARC char encoding: Now we're about
> ISO_2709 and MARC21
>
> The LC docs don't actually say quite what I thought about leader byte
> 09, used to advertise encoding:
> 
> 
> a - UCS/Unicode
> Character coding in the record makes use of characters from the
> Universal Coded Character Set (UCS) (ISO 10646), or Unicode™, an
> industry subset.
> 
> 
> 
> That doesn't say UTF-8. It says UCS or "Unicode". What does that
> actually mean?  Does it mean UTF-8, or does it mean UTF-16 (closer to
> what used to be called "UCS" I think?).  Whatever it actually means, do
> people violate it in the wild?

First, UCS and Unicode basically mean the same thing. Second, UTF-8, UTF-16, 
and UTF-32 are encoding forms for UCS/Unicode. The MARC documentation does 
actually say that MARC binary records *must* be encoded in UTF-8 when LDR/09 
has the value 'a'.

You need to refer to the appropriate standards for this information and 
definitions:


Unicode specifies three encoding forms, of which only one, UTF-8 (UCS 
Transformation Format 8), is authorized for use in MARC 21 records.

 
UCS. Acronym for Universal Character Set, which is specified by International 
Standard ISO/IEC 10646, which is equivalent in repertoire to the Unicode 
Standard.


Unicode Encoding Form. A character encoding form that assigns each Unicode 
scalar value to a unique code unit sequence. The Unicode Standard defines three 
Unicode encoding forms: UTF-8, UTF-16, and UTF-32. (See definition D79 in 
Section 3.9, Unicode Encoding Forms.)

 
UTF-8. A multibyte encoding for text that represents each Unicode character 
with 1 to 4 bytes, and which is backward-compatible with ASCII. UTF-8 is the 
predominant form of Unicode in web pages. More technically: (1) The UTF-8 
encoding form. (2) The UTF-8 encoding scheme. (3) “UCS Transformation Format 
8,” defined in Annex D of ISO/IEC 10646:2003, technically equivalent to the 
definitions in the Unicode Standard.

 
UTF-16. A multibyte encoding for text that represents each Unicode character 
with 2 or 4 bytes; it is not backward-compatible with ASCII. It is the internal 
form of Unicode in many programming languages, such as Java, C#, and 
JavaScript, and in many operating systems. More technically: (1) The UTF-16 
encoding form. (2) The UTF-16 encoding scheme. (3) “Transformation format for 
16 planes of Group 00,” defined in Annex C of ISO/IEC 10646:2003; technically 
equivalent to the definitions in the Unicode Standard.
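
As a rough illustration of the LDR/09 rule described above, the following 
Python sketch inspects position 09 of a raw MARC 21 leader and reports which 
encoding the record declares. The function name and the sample leaders are 
made up for illustration. The blank value meaning MARC-8 comes from the MARC 21 
leader definition, and records in the wild may not honor either value.

```python
def declared_encoding(record: bytes) -> str:
    """Return the character coding scheme declared in leader position 09."""
    if len(record) < 24:
        raise ValueError("a MARC leader is 24 bytes long")
    flag = record[9:10]
    if flag == b"a":
        return "utf-8"      # UCS/Unicode, which MARC 21 restricts to UTF-8
    if flag == b" ":
        return "marc-8"     # blank means MARC-8
    raise ValueError(f"unexpected LDR/09 value: {flag!r}")


# Hypothetical 24-byte leaders, not taken from real records.
unicode_leader = b"00714cam a2200205 a 4500"
marc8_leader = b"00714cam  2200205 a 4500"

print(declared_encoding(unicode_leader))
print(declared_encoding(marc8_leader))
```

This only reads what the record claims about itself; it does not check that 
the field data is actually valid in that encoding.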


Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Huwig,Steve
I could be mistaken (never having had the pleasure of reading it), but
isn't ISO-2709 specified as a fixed number of characters, and any
conflation of characters and 8-bit bytes is on the part of users and
implementations?

I think ISO 2709 might not know from bytes, only characters. 

> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Doran, Michael D
> Sent: Wednesday, April 18, 2012 10:05 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
> ISO_2709 and MARC21
> 
> Hi Tod,
> 
> I'm not understanding how UTF-8 would be considered 8-bit character
> data (other than the ASCII-range of the Unicode repertoire, natch).  I
> don't think ISO 2709 knows from characters, only bytes.
> 
> -- Michael
> 
> # Michael Doran, Systems Librarian
> # University of Texas at Arlington
> # 817-272-5326 office
> # 817-688-1926 mobile
> # do...@uta.edu
> # http://rocky.uta.edu/doran/
> 
> 
> > -Original Message-
> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> > Tod Olson
> > Sent: Wednesday, April 18, 2012 5:04 AM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
> > ISO_2709 and MARC21
> >
> > It has to mean UTF-8. ISO 2709 is very byte-oriented, from the directory
> > structure to the byte-offsets in the fixed fields. The values in these
> > places all assume 8-bit character data, it's completely baked in to the
> > file format.
> >
> > -Tod
> >
> > On Apr 17, 2012, at 6:55 PM, Jonathan Rochkind wrote:
> >
> > > Okay, forget XML for a moment, let's just look at marc 'binary'.
> > >
> > > First, for Anglophone-centric MARC21.
> > >
> > > The LC docs don't actually say quite what I thought about leader byte
> > > 09, used to advertise encoding:
> > >
> > >
> > > a - UCS/Unicode
> > > Character coding in the record makes use of characters from the
> > > Universal Coded Character Set (UCS) (ISO 10646), or Unicode(tm), an
> > > industry subset.
> > >
> > >
> > >
> > > That doesn't say UTF-8. It says UCS or "Unicode". What does that
> > > actually mean?  Does it mean UTF-8, or does it mean UTF-16 (closer to
> > > what used to be called "UCS" I think?).  Whatever it actually means, do
> > > people violate it in the wild?
> > >
> > >
> > >
> > > Now we get to non-Anglophone centric marc. I think all of which is
> > > ISO_2709?  A standard which of course is not open access, so I can't get
> > > it to see what it says.
> > >
> > > But leader 09 being used for encoding -- is that Marc21 specific, or is
> > > it true of any ISO-2709?  Marc8 and "unicode" being the only valid
> > > encodings can't be true of any ISO-2709, right?
> > >
> > > Is there a generic ISO-2709 way to deal with this, or not so much?


Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Karen Coyle

UTF-8 was the MARC standard from the beginning:

http://www.loc.gov/marc/marbi/1998/98-18.html

The first proposals were a character mapping between Unicode and MARC-8 
and didn't mention the character encodings, thus the term "UCS" which 
was a common term for Unicode at that time. (see: 
http://www.loc.gov/marc/marbi/1996/96-10.html). But when it got down to 
brass tacks, it was UTF-8, and left open the possibility of UTF-16 
(which was still a viable rival to UTF-8 at the time, as I recall.) 
UTF-16 had the advantage of every character being of uniform length, but 
it also did not cover all of the characters of interest to libraries.


The decision was also made to use byte count rather than character count 
in the directory. This was influenced by the UTF-8 decision.
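
To make the byte-versus-character distinction concrete, here is a small Python 
sketch. It encodes a made-up field value as UTF-8 and compares the two counts, 
then formats a directory entry from the byte count. The sample string, the tag, 
and the variable names are illustrative only, and the value omits indicators 
and subfield delimiters to keep the arithmetic visible.

```python
FIELD_TERMINATOR = b"\x1e"

# Hypothetical field content with three non-ASCII characters
# (r with caron, a with acute, i with acute), written as escapes.
value = "Dvo\u0159\u00e1k, Anton\u00edn"
encoded = value.encode("utf-8") + FIELD_TERMINATOR

char_count = len(value) + 1      # characters plus the terminator
byte_count = len(encoded)        # what a byte-counting directory records

# MARC 21 directory entry: 3-character tag, 4-digit length, 5-digit offset.
entry = f"100{byte_count:04d}{0:05d}"

print(char_count, byte_count, entry)
```

The two counts differ by one for every two-byte character, so a reader that 
treats the directory length as a character count would run past the end of the 
field.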


kc

On 4/18/12 7:04 AM, Jonathan Rochkind wrote:

On 4/18/2012 6:04 AM, Tod Olson wrote:

It has to mean UTF-8. ISO 2709 is very byte-oriented, from the
directory structure to the byte-offsets in the fixed fields. The
values in these places all assume 8-bit character data, it's
completely baked in to the file format.


I'm not sure that follows. One could certainly have UTF-16 in a Marc
record, and still count bytes to get a directory structure and byte
offsets. (In some ways it'd be easier since every char would be two bytes).

In fact, I worry that the standard may pre-date UTF-8, with its
reference to "UCS" --- if I understand things right, at one point there
was only one unicode encoding, called "UCS", which is basically a
backwards-compatible subset of what became UTF-16.

So I worry the standard really "means" UCS/UTF-16.

But if in fact records in the wild with the 'a' value are far more
likely to be UTF-8... well it's certainly not the first time the MARC21
standard was useless/ignored as a standard in answering such questions.


--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread LeVan,Ralph
> In fact, I worry that the standard may pre-date UTF-8, with its
> reference to "UCS" --- if I understand things right, at one point there
> was only one unicode encoding, called "UCS", which is basically a
> backwards-compatible subset of what became UTF-16.

> So I worry the standard really "means" UCS/UTF-16.

Now you're just trying to scare yourself.  I've never seen UTF-16
MarcXML.  I've never seen anything but UTF-8 encoded MarcXML.

Ralph


Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Doran, Michael D
Hi Tod,

I'm not understanding how UTF-8 would be considered 8-bit character data (other 
than the ASCII-range of the Unicode repertoire, natch).  I don't think ISO 2709 
knows from characters, only bytes.

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/


> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Tod Olson
> Sent: Wednesday, April 18, 2012 5:04 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
> ISO_2709 and MARC21
> 
> It has to mean UTF-8. ISO 2709 is very byte-oriented, from the directory
> structure to the byte-offsets in the fixed fields. The values in these
> places all assume 8-bit character data, it's completely baked in to the
> file format.
> 
> -Tod
> 
> On Apr 17, 2012, at 6:55 PM, Jonathan Rochkind wrote:
> 
> > Okay, forget XML for a moment, let's just look at marc 'binary'.
> >
> > First, for Anglophone-centric MARC21.
> >
> > The LC docs don't actually say quite what I thought about leader byte
> > 09, used to advertise encoding:
> >
> >
> > a - UCS/Unicode
> > Character coding in the record makes use of characters from the
> > Universal Coded Character Set (UCS) (ISO 10646), or Unicode(tm), an
> > industry subset.
> >
> >
> >
> > That doesn't say UTF-8. It says UCS or "Unicode". What does that
> > actually mean?  Does it mean UTF-8, or does it mean UTF-16 (closer to
> > what used to be called "UCS" I think?).  Whatever it actually means, do
> > people violate it in the wild?
> >
> >
> >
> > Now we get to non-Anglophone centric marc. I think all of which is
> > ISO_2709?  A standard which of course is not open access, so I can't get
> > it to see what it says.
> >
> > But leader 09 being used for encoding -- is that Marc21 specific, or is
> > it true of any ISO-2709?  Marc8 and "unicode" being the only valid
> > encodings can't be true of any ISO-2709, right?
> >
> > Is there a generic ISO-2709 way to deal with this, or not so much?


Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Jonathan Rochkind

On 4/18/2012 6:04 AM, Tod Olson wrote:

It has to mean UTF-8. ISO 2709 is very byte-oriented, from the directory 
structure to the byte-offsets in the fixed fields. The values in these places 
all assume 8-bit character data, it's completely baked in to the file format.


I'm not sure that follows. One could certainly have UTF-16 in a Marc 
record, and still count bytes to get a directory structure and byte 
offsets. (In some ways it'd be easier since every char would be two bytes).


In fact, I worry that the standard may pre-date UTF-8, with its 
reference to "UCS" ---  if I understand things right, at one point there 
was only one unicode encoding, called "UCS", which is basically a 
backwards-compatible subset of what became UTF-16.


So I worry the standard really "means" UCS/UTF-16.

But if in fact records in the wild with the 'a' value are far more 
likely to be UTF-8... well it's certainly not the first time the MARC21 
standard was useless/ignored as a standard in answering such questions.
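
The Unicode glossary entries quoted elsewhere in this thread describe UTF-16 as 
using 2 or 4 bytes per character, so the "every char would be two bytes" 
assumption only holds inside the Basic Multilingual Plane. A quick Python 
check, with sample characters chosen purely for illustration:

```python
samples = {
    "LATIN CAPITAL A": "A",
    "CJK IDEOGRAPH U+4E2D": "\u4e2d",
    "MUSICAL SYMBOL G CLEF U+1D11E": "\U0001d11e",
}

for name, ch in samples.items():
    utf8_len = len(ch.encode("utf-8"))
    utf16_len = len(ch.encode("utf-16-be"))   # big-endian, no byte order mark
    print(f"{name}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

Either way, counting bytes for the directory works; what changes is how many 
bytes each character contributes.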


Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Peter Noerr
We cried our eyes out in 1976 when this first came to our attention at the BL. 
Even more crying when we couldn't get rid of it in the MARC-I to MARC-II 
conversion (well before MARC21 was even a twinkle) - a lot of tears are 
gathering somewhere.

Peter



> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Bill 
> Dueber
> Sent: Tuesday, April 17, 2012 5:50 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 
> and MARC21
> 
> On Tue, Apr 17, 2012 at 8:46 PM, Simon Spero  wrote:
> 
> > Actually Anglo and Francophone centric. And the USMARC style 245 was a
> > poor replacement for the UKMARC approach (someone at the British
> > Library hosted Linked Data meeting wondered why there were punctuation
> > characters included in the data in the title field. The catalogers wept 
> > slightly).
> >
> > Simon
> >
> 
> 
> Slightly? I cry my eyes out *every single day* about that. Well, every 
> weekday, anyway.
> 
> 
> --
> Bill Dueber
> Library Systems Programmer
> University of Michigan Library


Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Tod Olson
It has to mean UTF-8. ISO 2709 is very byte-oriented, from the directory 
structure to the byte-offsets in the fixed fields. The values in these places 
all assume 8-bit character data, it's completely baked in to the file format.

-Tod

On Apr 17, 2012, at 6:55 PM, Jonathan Rochkind wrote:

> Okay, forget XML for a moment, let's just look at marc 'binary'.
> 
> First, for Anglophone-centric MARC21.
> 
> The LC docs don't actually say quite what I thought about leader byte 09, 
> used to advertise encoding:
> 
> 
> a - UCS/Unicode
> Character coding in the record makes use of characters from the Universal 
> Coded Character Set (UCS) (ISO 10646), or Unicode™, an industry subset.
> 
> 
> 
> That doesn't say UTF-8. It says UCS or "Unicode". What does that actually 
> mean?  Does it mean UTF-8, or does it mean UTF-16 (closer to what used to be 
> called "UCS" I think?).  Whatever it actually means, do people violate it in 
> the wild?
> 
> 
> 
> Now we get to non-Anglophone centric marc. I think all of which is ISO_2709?  
> A standard which of course is not open access, so I can't get it to see what 
> it says.
> 
> But leader 09 being used for encoding -- is that Marc21 specific, or is it 
> true of any ISO-2709?  Marc8 and "unicode" being the only valid encodings 
> can't be true of any ISO-2709, right?
> 
> Is there a generic ISO-2709 way to deal with this, or not so much?


[CODE4LIB] Job: Two-Year Research Fellowship in Digital Curation at University of Colorado at Boulder

2012-04-18 Thread jobs
Two-Year Research Fellowship in Digital Curation

Journalism and Mass Communication

University of Colorado at Boulder

  
We are seeking to hire a research fellow with a degree in Library and/or
Information Science, or an arts, humanities or social science discipline in
which the candidate has acquired significant research and practical expertise
in the area of digital curation. The ideal candidate should provide evidence
of past practical experience in digital curation and possess a clear research
and/or creative work agenda in which digital curation is the central activity.
We seek to hire someone who has earned a graduate degree within the past three
years that emphasizes digital archiving, preservation and curation. A Ph.D. is
preferred, but strong candidates with M.A. or M.S. degrees also will be
considered.

  
During his/her tenure as a digital curation fellow, the person hired will: (1)
curate an original or collaborative project on campus; (2) make occasional
campus presentations about the subject of digital curation and its value
across disciplines; (3) conduct a graduate seminar, open to
graduate students from multiple disciplines, surveying research and best
practices within the field of digital curation; and (4) advise faculty and
administrators on the development of curriculum in the field of digital
curation.

  
The person hired would provide outreach to various constituencies on campus,
particularly visual artists, musicians, journalists, filmmakers, librarians,
museum curators and archivists who seek to acquire curation skills for
creating digital archives of primary data (image, sound, text), and for
accessing, analyzing, and presenting such data.

  
The salary would be US $50,000 per year for a two-year contract. Full faculty
benefits would be provided for the period of the contract.

  
Screening of applications will begin May 1, 2012 and will continue until the
position is filled. For guidelines on applying, go to: www.jobsatcu.com. The
job posting number is: **817241**

  
The University of Colorado Boulder is an equal opportunity employer committed
to diversity and equality in education and employment.



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/896/


[CODE4LIB] Archivists' Toolkit: Adding Digital Objects via MySQL

2012-04-18 Thread Rosalyn Metz
Hi Everyone,

I posted this over on the Archivists' Toolkit listserv and got no response
(yet), so I thought I might try here as well.

I have a large number (300+) of digital objects that I need to add
to Archivists' Toolkit.  I think I've figured out what queries I need to
run in order to do this in MySQL (rather than the interface) but I wanted
to get opinions from the peanut gallery before trying it out on my test
instance.

It seems that there are actually two insert queries that need to be run
when creating a Digital Object.  They are:

insert into ArchDescriptionInstances
(instanceType, resourceComponentId, resourceId, parentResourceId,
instanceDescriminator, archDescriptionInstancesId)
values
('Digital object', 336673, null, 543, 'digital', 22567003);


and...

insert into DigitalObjects
(version, lastUpdated, created, lastUpdatedBy, createdBy, title,
dateExpression, dateBegin, dateEnd, languageCode, restrictionsApply,
eadDaoActuate, eadDaoShow, metsIdentifier, objectType, label, objectOrder,
componentId, parentDigitalObjectId, archDescriptionInstancesId,
repositoryId)
values
(0, '2012-04-17 12:05:15', '2012-04-17 12:05:15', 'username', 'username',
'title', '1938-1959', null, null, '', 0, 'onRequest', 'new', '678.1829',
'text', '', 0, '', null, 22567003, 1);


There also appear to be some update queries, but I'm guessing that
they are less important (please correct me if I'm wrong).  Has anyone tried
to do this in the past? If so do you have scripts that will create Digital
Objects for you that you wouldn't mind sharing?  Is there anything you
think I should know before testing this out in my test instance of AT?  Any
caveats for me?

Any help anyone can provide would be greatly appreciated.

Thanks,
Rosalyn
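
One way to script the paired inserts described above is to run both inside a 
single transaction, so a failure never leaves an instance row without its 
digital object. The Python sketch below is only an illustration under loud 
assumptions: it uses an in-memory SQLite database as a stand-in for the AT 
MySQL database, the two tables are cut down to a handful of the columns named 
in the queries, and digitalObjectId and add_digital_object are hypothetical 
names. A MySQL driver would use its own connection call and placeholder style.

```python
import sqlite3

# Stand-in for the AT database: two cut-down tables with only a few of the
# columns named in the queries above. digitalObjectId is a hypothetical key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ArchDescriptionInstances (
        archDescriptionInstancesId INTEGER PRIMARY KEY,
        instanceType TEXT, resourceComponentId INTEGER,
        parentResourceId INTEGER, instanceDescriminator TEXT);
    CREATE TABLE DigitalObjects (
        digitalObjectId INTEGER PRIMARY KEY,
        title TEXT, metsIdentifier TEXT,
        archDescriptionInstancesId INTEGER NOT NULL);
""")


def add_digital_object(conn, instance_id, component_id, parent_resource_id,
                       title, mets_id):
    """Insert the instance row and its digital object row, or neither."""
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute(
            "INSERT INTO ArchDescriptionInstances (instanceType, "
            "resourceComponentId, parentResourceId, instanceDescriminator, "
            "archDescriptionInstancesId) VALUES (?, ?, ?, ?, ?)",
            ("Digital object", component_id, parent_resource_id, "digital",
             instance_id))
        conn.execute(
            "INSERT INTO DigitalObjects (title, metsIdentifier, "
            "archDescriptionInstancesId) VALUES (?, ?, ?)",
            (title, mets_id, instance_id))


# Values copied from the example queries in the message above.
add_digital_object(conn, 22567003, 336673, 543, "title", "678.1829")
count = conn.execute("SELECT COUNT(*) FROM DigitalObjects").fetchone()[0]
print(count)
```

Looping this over a spreadsheet of 300 objects would still need a safe way to 
pick each archDescriptionInstancesId, which this sketch does not address.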