Re: [CODE4LIB] character-sets for dummies?

2009-12-16 Thread Thomale, J
Ken,

Great suggestions so far--I have just one thing to add.

If you ever reach the point at which you find yourself examining code tables to 
figure out what character set something is using, you might also want to find a 
good hex editor so that you can examine your data byte by byte. Since what 
you're looking at otherwise is always going to be the data as interpreted by a 
particular program (email program, web browser, text editor), looking at it 
with a hex editor can give you a nice grounding in reality, without that extra 
layer of interpretation.

I use XVI32: http://www.chmaas.handshake.de/delphi/freeware/xvi32/xvi32.htm
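If a dedicated hex editor isn't handy, the same byte-level view can be approximated in a few lines. The following is a minimal Python sketch, not a replacement for XVI32. The function name `hexdump` is illustrative, not from any library. It prints each byte in hex next to its printable ASCII form.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Return a hex-editor style view: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is; every other byte (including each
        # byte of a multi-byte UTF-8 character) is shown as '.'.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    # The same word in two encodings: UTF-8 stores "í" as the two bytes
    # c3 ad, while Latin-1 stores it as the single byte ed.
    print(hexdump("Oncología".encode("utf-8")))
    print(hexdump("Oncología".encode("latin-1")))
```

Seeing `c3 ad` in one file and `ed` in another is exactly the kind of ground truth a hex view gives you. Neither the email client nor the browser gets a chance to reinterpret the bytes first.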

Jason Thomale
Metadata Librarian
Texas Tech University Libraries



 -----Original Message-----
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Ken Irwin
 Sent: Wednesday, December 16, 2009 11:02 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: [CODE4LIB] character-sets for dummies?
 
 Hi all,
 
 I'm looking for a good source to help me understand character sets and
 how to use them. I pretty much know nothing about this - the whole
 world of Unicode, ASCII, octal, UTF-8, etc. is baffling to me.
 
 My immediate issue is that I think I need to integrate data from a
 variety of character sets into one MySQL table - I expect I need some
 way to convert from one to another, but I don't really even know how to
 tell which data are in which format.
 
 Our homegrown journal list (akin to SerialsSolutions) includes data
 ingested from publishers, vendors, the library catalog (III), etc. When
 I look at the data in emacs, some of it renders like this:
  Revista de Oncolog\303\255a  [slashes-and-digits instead of diacritics]
 And other data looks more like:
  Revista de Música Latinoamericana  [weird characters instead of diacritics]
 
 My MySQL table is currently set up with the collation set to utf8_bin,
 and the titles from the second category (weird characters in emacs)
 render properly when the database data is output to a web browser. The
 data from the former example (\###) renders as an "I don't know what
 character this is" placeholder in Firefox and IE.
 
 So, can someone please point me toward any or all of the following?
 
 · A good primer for understanding all of this stuff
 
 · A method for converting all of my data to the same character
 set so it plays nicely in the database
 
 · The names of which character-sets I might be working with
 here
 
 Many thanks!
 
 Ken
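The `\303\255` in the first example is emacs showing raw bytes as octal escapes. Octal 303 and 255 are the bytes 0xC3 0xAD, which is the UTF-8 encoding of "í". That suggests those titles are already UTF-8 bytes that emacs was not decoding as UTF-8.

Below is a minimal Python sketch of one way to normalize mixed input. It turns the octal escapes back into bytes, then tries UTF-8 and falls back to Windows-1252. The function names and the cp1252 fallback are assumptions for illustration. Trying UTF-8 first is a common heuristic, not real detection. MARC-8 data from a catalog export would need a separate converter, and this sketch does not handle it.

```python
import re

# Matches emacs-style octal byte escapes such as \303 (values 000-377 only).
OCTAL_ESCAPE = re.compile(r"\\([0-3][0-7]{2})")


def octal_escapes_to_bytes(text: str) -> bytes:
    r"""Turn a display like 'Oncolog\303\255a' back into the raw bytes.

    Assumes the non-escaped characters are plain ASCII.
    """
    return OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), text).encode("latin-1")


def decode_guess(raw: bytes) -> tuple[str, str]:
    """Decode bytes as UTF-8 if they are valid UTF-8, else as Windows-1252.

    This is a heuristic: legacy single-byte text is rarely valid UTF-8 by
    accident, but any single-byte decoding will "succeed" on anything.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback for Western-European vendor data; bytes that are
        # undefined in cp1252 become U+FFFD instead of raising.
        return raw.decode("cp1252", errors="replace"), "cp1252"


if __name__ == "__main__":
    raw = octal_escapes_to_bytes(r"Revista de Oncolog\303\255a")
    print(raw)                                      # b'Revista de Oncolog\xc3\xada'
    print(decode_guess(raw))                        # ('Revista de Oncología', 'utf-8')
    print(decode_guess("Música".encode("cp1252")))  # ('Música', 'cp1252')
    # Once everything is a str, re-encode as UTF-8 before loading into MySQL.
    print(decode_guess(raw)[0].encode("utf-8"))
```

The design choice is to get everything into one decoded form first and only then write a single encoding out. Converting pairwise between the source encodings is harder to get right.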


Re: [CODE4LIB] MARC/MODS and Automating Migration to Linked-Data Standards

2009-08-12 Thread Thomale, J
 Ross Singer wrote:
 
  One of the problems here is that it doesn't begin to address the DCAM
  -- these are 59 properties that can be reused among 22 classes, giving
  them different semantic meaning.
 
 
 Uh, no. That's the opposite of what the DC terms are about. Each term
 has a defined range -- so the defined range of creator is Agent. It can
 only be used as an Agent. You don't mix and match, and you don't assign
 different semantics to the same property under different circumstances.
 You can further *restrict* properties (which is what the application
 profile is all about) but you can't change the semantics of properties.
 The DCAM allows a property to be a member of more than one class, but I
 don't see any examples of that (will ask on the DC list) in DC terms.
 Remember that the A in DCAM is abstract. Some of the things it makes
 possible may be unusable in real life.

I think what Ross is getting at isn't exactly what he said: you're right of 
course that the semantics of a DC property don't change when you use it in 
various circumstances. What happens is that you use the same DC property to 
describe the same type of attribute for different entities within your data 
set. So, you use a dc:creator property to describe the creator of a digital 
image file, a dc:creator property to describe the creator of the thing in your 
image, and a dc:creator property to describe the creator of the metadata 
record. The properties point to different things, but it's still the same 
property--you don't need to define three separate and distinct properties in 
your schema. This is one of the things that makes MARC (and so many other 
metadata standards) unwieldy, because you've got the same essential type of 
property (like a creator property) that is defined separately depending on 
what it is supposed to be describing (when it doesn't necessarily have to be 
that way).
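The reuse described above can be shown with plain triples. The following is a minimal Python sketch in which one property definition describes three different entities. Only the dc:creator URI is real (DC Elements 1.1); the image, statue, and record URIs are made-up examples, and the triples are plain tuples rather than any RDF library's API.

```python
# dc:creator's real URI (DC Elements 1.1); every other URI is a made-up example.
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

triples = [
    ("http://example.org/scan.tif", DC_CREATOR, "http://example.org/photographer"),
    ("http://example.org/statue", DC_CREATOR, "http://example.org/sculptor"),
    ("http://example.org/record/1", DC_CREATOR, "http://example.org/cataloger"),
]


def objects_by_subject(graph, prop):
    """Group the values of one property by the entity each one describes."""
    grouped = {}
    for subject, predicate, obj in graph:
        if predicate == prop:
            grouped.setdefault(subject, []).append(obj)
    return grouped


if __name__ == "__main__":
    # One property definition, three different described entities.
    print(len({p for _, p, _ in triples}))
    for subject, creators in objects_by_subject(triples, DC_CREATOR).items():
        print(subject, "->", creators)
```

The subject of each triple says which entity a creator belongs to, so the schema never needs separate "image creator" or "record creator" properties.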

  Dublin Core is toothless and practically worthless in XML form.  It is
  considerably more powerful when used in RDF, however, because they
  play to their mutual strengths, namely that in RDF, you generally
  don't use a schema in isolation.
 
 
 The elements in Dublin Core are the elements in Dublin Core. The
 serialization shouldn't really matter. 

I think the reason it matters--and the reason that Ross mentioned it--is 
because the commonly-used XML serialization is just a flat DC record describing 
a single resource (isn't it?). Using a data model like RDF (or the DCAM), DC 
becomes a lot more powerful because of the ability to reuse the DC metadata 
properties to describe different entities/resources easily.
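The following is a minimal Python sketch that makes the flat-record point concrete, using the standard library's ElementTree. It assumes a bare `<record>` wrapper around DC 1.1 elements. The wrapper is a placeholder, not the actual oai_dc schema, and the names are invented.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)


def flat_dc_record(creators):
    """Build a flat record: every dc:creator is a sibling describing one resource.

    The <record> wrapper is a placeholder, not the oai_dc schema.
    """
    record = ET.Element("record")
    for name in creators:
        ET.SubElement(record, f"{{{DC}}}creator").text = name
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    # Nothing here says which creator made the file, the depicted object,
    # or the metadata record: all three hang off the same single resource.
    print(flat_dc_record(["Photographer, A.", "Sculptor, B.", "Cataloger, C."]))
```

All three values end up as undifferentiated siblings. A triple-based model keeps them apart because each statement carries its own subject.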

Your later point that some DC properties aren't specific enough (e.g., title) 
still stands, although I wonder if it would be kosher (as far as DC is 
concerned) to do something like this:

<http://example.org/resource-uri>        dc:title       <http://example.org/resource-uri/title> .
<http://example.org/resource-uri/title>  mods:title     "The Title of a Book" .
<http://example.org/resource-uri/title>  mods:subtitle  "The Subtitle" .

Similarly with names:

<http://example.org/resource-uri>  dc:creator                  <http://example.org/johndoe> .
<http://example.org/johndoe>       awesomenamevocab:firstname  "John" .
<http://example.org/johndoe>       awesomenamevocab:lastname   "Doe" .

That way you're making use of DC elements that are already defined and adding 
specificity where necessary, instead of reinventing new metadata elements where 
you really don't need to.
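The title/name pattern above can be exercised with a tiny in-memory graph. The following is a minimal Python sketch with these assumptions:

- Triples are plain tuples.
- Prefixed names stand in for full URIs.
- `mods:title`, `mods:subtitle`, and `awesomenamevocab:*` are the hypothetical properties from the example, not verified vocabulary terms.

A consumer that only knows DC gets the node URI behind `dc:title`. A consumer that knows the extra properties can follow that URI for more detail.

```python
# Prefixed names stand in for full URIs; the mods:* and awesomenamevocab:*
# properties are the hypothetical ones from the example above.
graph = [
    ("http://example.org/resource-uri", "dc:title", "http://example.org/resource-uri/title"),
    ("http://example.org/resource-uri/title", "mods:title", "The Title of a Book"),
    ("http://example.org/resource-uri/title", "mods:subtitle", "The Subtitle"),
    ("http://example.org/resource-uri", "dc:creator", "http://example.org/johndoe"),
    ("http://example.org/johndoe", "awesomenamevocab:firstname", "John"),
    ("http://example.org/johndoe", "awesomenamevocab:lastname", "Doe"),
]


def value(graph, subject, prop):
    """Return the first object for (subject, prop), or None."""
    return next((o for s, p, o in graph if s == subject and p == prop), None)


def display_title(graph, resource):
    """Follow dc:title to its node, then use the more specific properties."""
    node = value(graph, resource, "dc:title")
    title = value(graph, node, "mods:title")
    subtitle = value(graph, node, "mods:subtitle")
    if title is None:
        return node  # a DC-only view: just the title node's URI
    return f"{title}: {subtitle}" if subtitle else title


def display_name(graph, resource):
    """Follow dc:creator to the person node and format the name parts."""
    person = value(graph, resource, "dc:creator")
    first = value(graph, person, "awesomenamevocab:firstname")
    last = value(graph, person, "awesomenamevocab:lastname")
    return f"{last}, {first}"


if __name__ == "__main__":
    print(display_title(graph, "http://example.org/resource-uri"))
    print(display_name(graph, "http://example.org/resource-uri"))
```

The DC property stays the entry point; the added specificity lives one hop further out in the graph.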

Jason Thomale
Metadata Librarian
Texas Tech University Libraries


Re: [CODE4LIB] MathML-image conversion?

2008-09-24 Thread Thomale, J
I haven't used it myself, but it looks like JEuclid would do what you need--it 
includes a command-line tool for converting MathML to a variety of different 
image formats.

http://jeuclid.sourceforge.net/
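If the formulas are embedded in larger XHTML/XML book files, one preparatory step would be pulling each `<math>` element into its own file for a converter to process in bulk. The following is a minimal Python sketch of that step only. The output file naming is a hypothetical choice, and the actual JEuclid invocation is left to its documentation rather than guessed here.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
# Serialize MathML with a default xmlns instead of an auto-generated "ns0:" prefix.
ET.register_namespace("", MATHML_NS)


def extract_mathml(xml_text: str, out_dir: Path) -> list[Path]:
    """Write every MathML <math> element in a document to its own .mml file."""
    root = ET.fromstring(xml_text)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, math in enumerate(root.iter(f"{{{MATHML_NS}}}math"), start=1):
        math.tail = None  # tostring() would otherwise include the trailing text
        path = out_dir / f"formula-{i:04d}.mml"  # hypothetical naming scheme
        path.write_text(ET.tostring(math, encoding="unicode"), encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    import sys

    # Usage: python extract_mathml.py book.xhtml out-dir
    for p in extract_mathml(Path(sys.argv[1]).read_text(encoding="utf-8"), Path(sys.argv[2])):
        print(p)
```

Keeping one formula per file also makes it easy to map each generated image back to its position in the book, using the numbered file names.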

Jason Thomale


 -----Original Message-----
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Thomas Dowling
 Sent: Wednesday, September 24, 2008 7:33 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: [CODE4LIB] MathML-image conversion?

 Code4Libbers--

 To support some commercial e-books that we load locally and publish to
 our users with XTF, we need to handle formulas marked up in MathML.

 For their own web site, the book publisher farms this out to a
 third-party vendor that somehow converts the MathML to inline images for
 display in HTML pages.  The publisher is unable or unwilling to tell us
 what their vendor uses for the job.

 I'm all for sending MathML down the pipe (caveat browsor), but our
 e-book guru hasn't been able to slip it through XTF.  If that isn't an
 option, does anyone know a tool to convert possibly large sets of MathML
 formulas to PNG (or GIF) images?  I've run across similar converters for
 LaTeX, but not MathML.


 --
 Thomas Dowling
 [EMAIL PROTECTED]


[CODE4LIB] Position Announcement: Metadata Librarian

2008-05-20 Thread Thomale, J
Note that this message has been cross-posted to several lists. Please
forgive and ignore any duplication.

The Texas Tech University Libraries is looking to hire a second metadata
librarian. The person who fills this position will be responsible for
helping develop metadata standards and best practices for digital
materials acquired by the Libraries. The person will also participate in
providing access to library materials acquired by the University
Libraries and will contribute to the development of departmental
policies and procedures.

This is an entry-level position. It is also a tenure-track faculty
position.

View the complete position announcement here:

http://library.ttu.edu/hr/images/metadata.pdf

To apply for this position, follow the instructions in the announcement,
or visit http://jobs.texastech.edu/, click on search postings, and
search for a requisition number of 76561.


Re: [CODE4LIB] Radioactive records for Solr

2007-02-09 Thread Thomale, J
I just saw this message as well as the follow-ups today. I was one of
the graduate assistants working on this phase of Dr. Moen's Zinterop
project, and I worked quite closely with the Radioactive records and the
Radioactive perl module that Mike Taylor from Index Data developed. I
also helped write some of the documentation for the test scripts we
developed using that perl module. There had been plans at some point to
write a paper outlining how to use the perl module for in-depth Z39.50
interoperability testing, but I graduated, got a full-time position, and
haven't had time to think about it much since. I always thought it had a
lot of potential.

I would be more than happy to lend any help at all in any capacity that
I can. I was always a little bit disappointed that we hadn't gotten to
really follow up on the project much, and I'm glad to see that there is
some interest in it. I just had lunch with Dr. Moen at the OR 2007
conference in San Antonio at the end of last month, so I'm still on very
good terms with him and could perhaps enlist his help as well.

Please let me know how I can help. Work keeps me busy, but I can find
time after hours if I need to. It's been over a year, so it might take
me a bit to get back into it, but I'm sure it will come back to me.

(I hope that didn't sound too desperate.) :-)

Thanks,

Jason Thomale
Metadata Librarian
Texas Tech University Libraries



 -----Original Message-----
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Binkley, Peter
 Sent: Thursday, February 08, 2007 3:13 PM
 To: CODE4LIB@listserv.nd.edu
 Subject: [CODE4LIB] Radioactive records for Solr

 In hunting for data to help model subject faceting for MARC records,
 I've just been looking at Bill Moen's Zinterop report
 (http://www.unt.edu/zinterop/ZInterop2/Documents/ZInterop2FinalReport_wem4Dec2005.pdf).
 It occurs to me that with all our various projects working on indexing
 MARC records in Solr, we should set up and distribute a set of
 radioactive records to use in each project to diagnose and compare
 indexing and querying behaviour. Probably we could just use the Zinterop
 records (which are described in detail in that pdf but aren't available
 for download anywhere I could find); but we might want to enhance them
 with data suitable for testing our faceting systems. Not sure what that
 would mean but I thought I'd throw it out.

 If you were at Access '05, you heard Bill describe the Z39.50 testing he
 was doing with radioactive records: records with known unique values in
 all indexed fields, that could be used for automated testing of Z39.50
 search functionality. The same approach might be very useful as we feel
 our way towards a Solr MARC indexing system.

 Has anyone already done something like this?

 Peter

 Peter Binkley
 Digital Initiatives Technology Librarian
 Information Technology Services
 4-30 Cameron Library
 University of Alberta Libraries
 Edmonton, Alberta
 Canada T6G 2J8
 Phone: (780) 492-3743
 Fax: (780) 492-9243
 e-mail: [EMAIL PROTECTED]
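The radioactive-record idea (records with known unique values in all indexed fields) lends itself to a self-checking harness. The following is a minimal Python sketch of that technique. The field list, the token format, and the toy in-memory index are assumptions standing in for a real Solr or Z39.50 target. In practice the `search` callable would wrap a query against the real index.

```python
from collections import defaultdict

# Hypothetical field list; a real run would use the target's indexed fields.
FIELDS = ["title", "author", "subject", "isbn"]


def make_radioactive_records(n):
    """Build records whose every field holds a token found nowhere else."""
    return [
        {"id": f"rec{i}", **{f: f"rad{f}{i:06d}" for f in FIELDS}}
        for i in range(n)
    ]


def build_toy_index(records):
    """Field-scoped inverted index; a stand-in for Solr, not Solr itself."""
    index = defaultdict(set)
    for rec in records:
        for field in FIELDS:
            for token in rec[field].lower().split():
                index[(field, token)].add(rec["id"])
    return index


def find_failures(records, search):
    """Search each unique value in its own field; exactly one hit is expected."""
    failures = []
    for rec in records:
        for field in FIELDS:
            if search(field, rec[field].lower()) != {rec["id"]}:
                failures.append((rec["id"], field))
    return failures


if __name__ == "__main__":
    records = make_radioactive_records(100)
    index = build_toy_index(records)
    print(find_failures(records, lambda field, term: index.get((field, term), set())))
```

Because every value is unique, any result other than exactly one matching record points at a specific record and field. Those are the known expected values that make the records useful for diagnosis and comparison across projects.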


[CODE4LIB] Position Announcement: Metadata Librarian

2006-12-05 Thread Thomale, J
Note that this message has been cross-posted to several lists. Please
forgive and ignore any duplication.

The Texas Tech University Libraries is looking to hire a second metadata
librarian. The person who fills this position will be responsible for
the development of metadata standards and best practices for digital
materials acquired by the Libraries. The person will also participate in
providing access to library materials acquired by the University
Libraries and will contribute to the development of departmental
policies and procedures.

This is an entry-level position. It is also a tenure-track faculty
position.

View the complete position announcement here:

http://library.ttu.edu/hr/images/metadata%20librarian.pdf

To apply for this position, follow the instructions in the announcement,
or visit http://jobs.texastech.edu/, click on search postings, and
search for a requisition number of 72946.

Thanks,

Jason Thomale
Metadata Librarian
Texas Tech University Library