Re: [CODE4LIB] character-sets for dummies?
Ken,

Great suggestions so far--I have just one thing to add. If you ever reach the point at which you find yourself examining code tables to figure out what character set something is using, you might also want to find a good hex editor so that you can examine your data byte by byte. Otherwise, what you're looking at is always the data as interpreted by a particular program (email program, web browser, text editor). Looking at it with a hex editor can give you a nice grounding in reality, without that extra layer of interpretation.

I use XVI32: http://www.chmaas.handshake.de/delphi/freeware/xvi32/xvi32.htm

Jason Thomale
Metadata Librarian
Texas Tech University Libraries

-----Original Message-----
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ken Irwin
Sent: Wednesday, December 16, 2009 11:02 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] character-sets for dummies?

Hi all,

I'm looking for a good source to help me understand character sets and how to use them. I pretty much know nothing about this - the whole world of Unicode, ASCII, octal, UTF-8, etc. is baffling to me.

My immediate issue is that I think I need to integrate data from a variety of character sets into one MySQL table - I expect I need some way to convert from one to another, but I don't really even know how to tell which data are in which format.

Our homegrown journal list (akin to SerialsSolutions) includes data ingested from publishers, vendors, the library catalog (III), etc. When I look at the data in emacs, some of it renders like this:

Revista de Oncolog\303\255a [slashes-and-digits instead of diacritics]

And other data looks more like:

Revista de Música Latinoamericana [weird characters instead of diacritics]

My MySQL table is currently set up with the collation set to utf8_bin, and the titles from the second category (weird characters display in emacs) render properly when the database data is output to a web browser. The data from the former example (\###) renders as an "I don't know what character this is" placeholder in Firefox and IE.

So, can someone please point me toward any or all of the following?

· A good primer for understanding all of this stuff
· A method for converting all of my data to the same character set so it plays nicely in the database
· The names of which character-sets I might be working with here

Many thanks!
Ken
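For anyone following along, here is a minimal Python sketch of what the two displays above correspond to at the byte level. It only decodes the byte sequences quoted in Ken's message; it makes no claim about how his actual data is stored. The helper `to_utf8_text` and its `latin-1` fallback are assumptions for illustration, not something proposed in the thread.

```python
# The octal escapes emacs shows are raw byte values: \303\255 is 0xC3 0xAD,
# which is the UTF-8 encoding of "í".
raw = b"Revista de Oncolog\303\255a"
print(raw.hex(" "))         # byte-by-byte view, like a hex editor would give
print(raw.decode("utf-8"))  # Revista de Oncología

# UTF-8 bytes that a program interprets as Latin-1 give the "weird characters".
mojibake = "Música".encode("utf-8").decode("latin-1")
print(mojibake)             # Música


def to_utf8_text(data: bytes, fallback: str = "latin-1") -> str:
    """Decode bytes as UTF-8; if that fails, use an assumed legacy encoding.

    The fallback encoding is a guess that has to be checked against the
    actual source of each data feed.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)


print(to_utf8_text(b"M\xfasica"))               # Latin-1 input
print(to_utf8_text("Música".encode("utf-8")))   # UTF-8 input
```

Trying strict UTF-8 first works reasonably well because text in a legacy single-byte encoding that contains accented characters is rarely valid UTF-8 by accident, but it is a heuristic, not a detector.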
Re: [CODE4LIB] MARC/MODS and Automating Migration to Linked-Data Standards
Ross Singer wrote:

One of the problems here is that it doesn't begin to address the DCAM -- these are 59 properties that can be reused among 22 classes, giving them different semantic meaning.

Uh, no. That's the opposite of what the DC terms are about. Each term has a defined range -- so the defined range of creator is Agent. It can only be used as an Agent. You don't mix and match, and you don't assign different semantics to the same property under different circumstances. You can further *restrict* properties (which is what the application profile is all about) but you can't change the semantics of properties. The DCAM allows a property to be a member of more than one class, but I don't see any examples of that (will ask on the DC list) in DC terms. Remember that the A in DCAM is abstract. Some of the things it makes possible may be unusable in real life.

I think what Ross is getting at isn't exactly what he said: you're right of course that the semantics of a DC property don't change when you use it in various circumstances. What happens is that you use the same DC property to describe the same type of attribute for different entities within your data set. So, you use a dc:creator property to describe the creator of a digital image file, a dc:creator property to describe the creator of the thing in your image, and a dc:creator property to describe the creator of the metadata record. The properties point to different things, but it's still the same property--you don't need to define three separate and distinct properties in your schema. This is one of the things that makes MARC (and so many other metadata standards) unwieldy, because you've got the same essential type of property (like a creator property) that is defined separately depending on what it is supposed to be describing (when it doesn't necessarily have to be that way).

Dublin Core is toothless and practically worthless in XML form. It is considerably more powerful when used in RDF, however, because they play to their mutual strengths, namely that in RDF, you generally don't use a schema in isolation.

The elements in Dublin Core are the elements in Dublin Core. The serialization shouldn't really matter.

I think the reason it matters--and the reason that Ross mentioned it--is because the commonly-used XML serialization is just a flat DC record describing a single resource (isn't it?). Using a data model like RDF (or the DCAM), DC becomes a lot more powerful because of the ability to reuse the DC metadata properties to describe different entities/resources easily.

Your later point that some DC properties aren't specific enough (e.g., title) still stands, although I wonder if it would be kosher (as far as DC is concerned) to do something like this:

http://example.org/resource-uri dc:title http://example.org/resource-uri/title.
http://example.org/resource-uri/title mods:title The Title of a Book.
http://example.org/resource-uri/title mods:subtitle The Subtitle.

Similarly with names:

http://example.org/resource-uri dc:creator http://example.org/johndoe.
http://example.org/johndoe awesomenamevocab:firstname John.
http://example.org/johndoe awesomenamevocab:lastname Doe.

So that way you're making use of DC elements that are already defined and adding specificity where necessary instead of reinventing new metadata elements where you really don't need to.

Jason Thomale
Metadata Librarian
Texas Tech University Libraries
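As a rough illustration of the reuse idea above, here is a small Python sketch that models statements as plain (subject, predicate, object) tuples. It is not an RDF library and says nothing about whether the pattern is valid Dublin Core. All of the `ex:` identifiers and the helper names `objects` and `full_title` are hypothetical, made up for this example.

```python
# Each statement is a (subject, predicate, object) triple. The "ex:" names are
# hypothetical placeholders, not identifiers from any real data set.
triples = [
    # The same dc:creator property, reused for three different entities.
    ("ex:image-file", "dc:creator", "ex:photographer"),
    ("ex:pictured-object", "dc:creator", "ex:artist"),
    ("ex:metadata-record", "dc:creator", "ex:cataloger"),
    # dc:title points at a node that carries the more specific MODS parts.
    ("ex:resource-uri", "dc:title", "ex:resource-uri/title"),
    ("ex:resource-uri/title", "mods:title", "The Title of a Book"),
    ("ex:resource-uri/title", "mods:subtitle", "The Subtitle"),
]


def objects(subject, predicate):
    """Return every object stated for this subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def full_title(resource):
    """Follow dc:title to its node, then join the MODS title and subtitle."""
    parts = []
    for node in objects(resource, "dc:title"):
        parts += objects(node, "mods:title") + objects(node, "mods:subtitle")
    return ": ".join(parts)


print(objects("ex:image-file", "dc:creator"))
print(full_title("ex:resource-uri"))
```

The point of the sketch is only that one property name serves all three creator statements, while the extra specificity lives on the node the property points to.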
Re: [CODE4LIB] MathML-image conversion?
I haven't used it myself, but it looks like JEuclid would do what you need--it includes a command-line tool for converting MathML to a variety of different image formats.

http://jeuclid.sourceforge.net/

Jason Thomale

-----Original Message-----
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Thomas Dowling
Sent: Wednesday, September 24, 2008 7:33 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] MathML-image conversion?

Code4Libbers--

To support some commercial e-books that we load locally and publish to our users with XTF, we need to handle formulas marked up in MathML. For their own web site, the book publisher farms this out to a third-party vendor that somehow converts the MathML to inline images for display in HTML pages. The publisher is unable or unwilling to tell us what their vendor uses for the job.

I'm all for sending MathML down the pipe (caveat browsor), but our e-book guru hasn't been able to slip it through XTF. If that isn't an option, does anyone know a tool to convert possibly large sets of MathML formulas to PNG (or GIF) images? I've run across similar converters for LaTeX, but not MathML.

--
Thomas Dowling
[EMAIL PROTECTED]
[CODE4LIB] Position Announcement: Metadata Librarian
Note that this message has been cross-posted to several lists. Please forgive and ignore any duplication. The Texas Tech University Libraries is looking to hire a second metadata librarian. The person who fills this position will be responsible for helping develop metadata standards and best practices for digital materials acquired by the Libraries. The person will also participate in providing access to library materials acquired by the University Libraries and will contribute to the development of departmental policies and procedures. This is an entry-level position. It is also a tenure-track faculty position. View the complete position announcement here: http://library.ttu.edu/hr/images/metadata.pdf To apply for this position, follow the instructions in the announcement, or visit http://jobs.texastech.edu/, click on search postings, and search for a requisition number of 76561.
Re: [CODE4LIB] Radioactive records for Solr
I just saw this message as well as the follow-ups today.

I was one of the graduate assistants working on this phase of Dr. Moen's Zinterop project, and I worked quite closely with the Radioactive records and the Radioactive Perl module that Mike Taylor from Index Data developed. I also helped write some of the documentation for the test scripts that we developed that utilized that Perl module. There had been plans in there somewhere to develop a paper that really outlined how to use the Perl module to do in-depth Z39.50 interoperability testing, but I graduated, got a full-time position, and haven't had time to think about it much since. I always thought it had a lot of potential.

I would be more than happy to lend any help at all in any capacity that I can. I was always a little bit disappointed that we hadn't gotten to really follow up on the project much, and I'm glad to see that there is some interest in it. I just had lunch with Dr. Moen at the OR 2007 conference in San Antonio at the end of last month, so I'm still on very good terms with him and could perhaps help enlist his help, as well.

Please let me know how I can help. Work keeps me busy, but I can find time after hours if I need to. It's been over a year, so it might take me a bit to get back into it, but I'm sure it will come back to me.

(I hope that didn't sound too desperate.) :-)

Thanks,

Jason Thomale
Metadata Librarian
Texas Tech University Libraries

-----Original Message-----
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Binkley, Peter
Sent: Thursday, February 08, 2007 3:13 PM
To: CODE4LIB@listserv.nd.edu
Subject: [CODE4LIB] Radioactive records for Solr

In hunting for data to help model subject faceting for MARC records, I've just been looking at Bill Moen's Zinterop report (http://www.unt.edu/zinterop/ZInterop2/Documents/ZInterop2FinalReport_wem4Dec2005.pdf). It occurs to me that with all our various projects working on indexing MARC records in Solr, we should set up and distribute a set of radioactive records to use in each project to diagnose and compare indexing and querying behaviour. Probably we could just use the Zinterop records (which are described in detail in that pdf but aren't available for download anywhere I could find); but we might want to enhance them with data suitable for testing our faceting systems. Not sure what that would mean but I thought I'd throw it out.

If you were at Access '05, you heard Bill describe the Z39.50 testing he was doing with radioactive records: records with known unique values in all indexed fields, that could be used for automated testing of Z39.50 search functionality. The same approach might be very useful as we feel our way towards a Solr MARC indexing system.

Has anyone already done something like this?

Peter

Peter Binkley
Digital Initiatives Technology Librarian
Information Technology Services
4-30 Cameron Library
University of Alberta Libraries
Edmonton, Alberta Canada T6G 2J8
Phone: (780) 492-3743
Fax: (780) 492-9243
e-mail: [EMAIL PROTECTED]
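To make the "known unique values in all indexed fields" idea concrete, here is a toy Python sketch. It is not the Zinterop Perl module and it does not talk to Solr or Z39.50; a small in-memory dictionary stands in for the index. The field names, the token format, and all function names are assumptions invented for this illustration.

```python
# Hypothetical field list; a real test set would mirror the indexed MARC fields.
FIELDS = ("title", "author", "subject")


def make_radioactive_records(count):
    """Build records whose value in every field is unique across the whole set."""
    records = []
    for i in range(count):
        record = {"id": f"rad{i:03d}"}
        for field in FIELDS:
            record[field] = f"zz{field}{i:03d}"  # e.g. "zztitle000"
        records.append(record)
    return records


def build_index(records):
    """Toy stand-in for a search index: (field, token) -> set of record ids."""
    index = {}
    for record in records:
        for field in FIELDS:
            for token in record[field].lower().split():
                index.setdefault((field, token), set()).add(record["id"])
    return index


def check(records, index):
    """Search each unique value; report any field that does not return
    exactly the one record it was planted in."""
    failures = []
    for record in records:
        for field in FIELDS:
            hits = index.get((field, record[field].lower()), set())
            if hits != {record["id"]}:
                failures.append((record["id"], field, sorted(hits)))
    return failures


records = make_radioactive_records(5)
index = build_index(records)
print(check(records, index))  # an empty list means every lookup was exact
```

Because each planted value occurs in exactly one field of one record, any extra or missing hit points directly at the field mapping that misbehaved, which is what makes this kind of test easy to automate.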
[CODE4LIB] Position Announcement: Metadata Librarian
Note that this message has been cross-posted to several lists. Please forgive and ignore any duplication. The Texas Tech University Libraries is looking to hire a second metadata librarian. The person who fills this position will be responsible for the development of metadata standards and best practices for digital materials acquired by the Libraries. The person will also participate in providing access to library materials acquired by the University Libraries and will contribute to the development of departmental policies and procedures. This is an entry-level position. It is also a tenure-track faculty position. View the complete position announcement here: http://library.ttu.edu/hr/images/metadata%20librarian.pdf To apply for this position, follow the instructions in the announcement, or visit http://jobs.texastech.edu/, click on search postings, and search for a requisition number of 72946. Thanks, Jason Thomale Metadata Librarian Texas Tech University Library