Re: [CODE4LIB] Slicing/dicing/combining large amounts of data efficiently
Kyle -- if this was me -- I'd break the file into a database. You have a lot of different options, but the last time I had to do something like this, I broke the data into 10 tables: a control table with a primary key and OCLC number, then a table for 0xx fields, a table for 1xx, 2xx, etc., each including the OCLC number and the key they relate to. You can actually do this with MarcEdit (if you have MySQL installed) -- but on a laptop, I'm not going to guarantee speed with the process. Plus, the time to generate the SQL data will be significant. It might take 15 hours to generate the database, but then you'd have it, could create indexes on it, and could use it to prep the files for later work.

--TR

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kyle Banerjee
Sent: Wednesday, February 27, 2013 9:45 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Slicing/dicing/combining large amounts of data efficiently

I'm involved in a migration project that requires identification of local information in millions of MARC records. The master records I need to compare with are 14 GB total. I don't know what the others will be, but since the masters are deduped and the source files aren't (plus they contain loads of other garbage), there will be considerably more. Roughly speaking, if I compare 1,000 master records per second, it would take about 2 1/2 hours to cut through the file. I need to be able to ask the file whatever questions the librarians might have (i.e. many), so speed is important. For reasons I won't go into right now, I'm stuck doing this on my laptop in Cygwin, and that affects my range of motion. I'm trying to figure out the best way to proceed. Currently, I'm extracting specific fields for comparison. Each field tag gets a single line keyed by OCLC number (repeated fields are catted together with a delimiter).
The idea is that if I deal with only one field at a time, I can slurp the master info into memory and retrieve it via hash (OCLC control number) as I loop through the comparison data. Local data will either be stored in special files that are loaded separately from the bibs or recorded in reports for maintenance projects.

This process is clunky because a special comparison file has to be created for each question, but it does seem to work (generating preprocess files and then doing the compare is measured in minutes rather than hours). I didn't use a DB because there's no way I could store the reference data in memory and I figured I'd just thrash my drive.

Is this a reasonable approach, and whether or not it is, what tools should I be thinking of using for this?

Thanks, kyle
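For what it's worth, Terry's ten-table layout can be sketched concretely. This is a hypothetical sketch -- the table and column names are invented here, and it uses SQLite purely for illustration (Terry mentions MySQL; MarcEdit's actual SQL output may differ):

```python
import sqlite3

# Sketch of the layout Terry describes: a control table keyed by OCLC
# number, plus one table per field group (0xx, 1xx, ... 9xx), each row
# pointing back at the control record it belongs to.
# In-memory for the sketch; point this at a file (or MySQL) for 14 GB.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE control (
                   id   INTEGER PRIMARY KEY,
                   oclc TEXT UNIQUE NOT NULL)""")
for grp in ("f0xx", "f1xx", "f2xx", "f3xx", "f4xx",
            "f5xx", "f6xx", "f7xx", "f8xx", "f9xx"):
    cur.execute(f"""CREATE TABLE {grp} (
                        control_id INTEGER REFERENCES control(id),
                        tag        TEXT,
                        value      TEXT)""")
    # Index the join column so per-question lookups don't scan whole tables.
    cur.execute(f"CREATE INDEX idx_{grp} ON {grp}(control_id)")
conn.commit()
```

Loading the records is the slow, one-time cost Terry estimates at ~15 hours; once built, each librarian question becomes an indexed join against `control.oclc` rather than another multi-hour pass over the flat file.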
Re: [CODE4LIB] wiki page about the chode4lib irc bot created
Looking at that, the only absolutely library-specific content there appears to be the MARC plugin (which isn't documented in detail). MARC and not well documented...that sounds about right. --tr * Terry Reese, Associate Professor Gray Family Chair for Innovative Library Services 121 Valley Library Corvallis, OR 97331 tel: 541.737.6384 *
Re: [CODE4LIB] code4lib.org domain
Wilhelmina, to answer your two questions: 1) Yes, during the 30-day expiration period when registration lapses, your site will typically become unavailable. 2) This isn't just about one person at OSU. Ryan Ordway is our sysadmin, but c4l is supported by a number of folks at the institution in various capacities...up to the director. Were Ryan to leave, the process for maintaining the infrastructure would simply fall to someone else at the Library.

Tr

* Terry Reese, Associate Professor Gray Family Chair for Innovative Library Services 121 Valley Library Corvallis, OR 97331 541.737.6384

From: Wilhelmina Randtke
Sent: 12/18/2012 2:00 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] code4lib.org domain

Paying for it shouldn't be an issue. It's like $10 a year to register the domain, right? So don't make a big deal out of OSU paying for it; the fee is negligible. The key concern is how committed to OSU Ryan Ordway is, and what the climate there is like. I see this as transferring to the people who are currently technical contacts at OSU, not to a faceless organization. If they already hold several other URLs, and have a policy and timeframe for tracking and renewing these, then that's a plus. Also, I asked before, and I'm going to ask again: will the domain stop working (i.e., stop pointing at nameservers) during the redemption period? If so, then a worst-case scenario is not too bad, because there will be some warning and a late fee, assuming the registered owner can be contacted, rather than just losing the domain if the bill isn't paid.

-Wilhelmina Randtke

On Tue, Dec 18, 2012 at 3:41 PM, Jonathan Rochkind rochk...@jhu.edu wrote: I definitely see what you're saying, but think there are pros and cons both ways. OSU is already responsible for the bulk of our infrastructure too; adding the DNS would be minor. But there are definitely pros (as well as cons) to individual and/or non-institutional ownership/responsibility/management, compared to institutional.
In the end, as with much of Code4Lib, as with many volunteer projects -- what it comes down to is who's offering to volunteer to do it. OSU is offering to volunteer to do it (and pay for it, apparently?), and we obviously find OSU to be generally responsible, since they host the rest of our infrastructure. Someone offering to do it right now, someone we find generally responsible -- always beats the hypothetical other solution that has nobody actually volunteering to do it. So, Wilhelmina, are you volunteering to run the DNS instead? :) (and pay for it, or fundraise to pay for it) If you are, then we might have two options. Otherwise, we've got one, and no reason to reject it unless we thought OSU was not trustworthy with the responsibility or something (which, if we did, would be a big problem, since they're already responsible for a lot more than that).

On 12/18/2012 4:34 PM, Wilhelmina Randtke wrote: I'm for individual ownership and management over organizational. Organizations tend not to have written documentation, and to rely on institutional memory. I see two things going wrong: the contact at OSU leaves OSU and no one thinks to renew the domain, or OSU doesn't have a dedicated contact and at some point they don't renew because they don't see the value. Also important: OSU is on state funding cycles, so may have some rule against renewing for more than a year at a time. So the deadline to renew will come more frequently than it would with unrestricted funds and the ability to renew for 5 or 10 years at a time. When the domain expires, it will go into a redemption period of about a month. I remember what the whois record looks like for domains in the redemption period, and whois does give the contact information. Does the URL stop working during this period? If so, then that's great, because if there is a problem with a renewal then many people will notice the URL not working, and be able to check the status of the domain and get on it.
-Wilhelmina Randtke

On Tue, Dec 18, 2012 at 2:32 PM, Ed Summers e...@pobox.com wrote: Hi all, I've owned the code4lib.org domain since 2005 and have been thinking it might be wise to transfer ownership of it to someone else. Sometimes I forget to pay bills and miss emails, and it seems like the domain means something to a larger group of people. With Ryan Ordway's help, Oregon State University indicated they would be willing to take over administration of the domain. They have also been responsible for running the Drupal instance at code4lib.org and the Mediawiki instance at wiki.code4lib.org -- so it seems like a logical move. But I thought I would bring it up here first in the interests of transparency, community building and whatnot, to see if there were any objections or ideas. //Ed
Re: [CODE4LIB] Leader in MarcXML Files ( Record Length )
I wouldn't. One of the benefits of MARCXML is that you are not constrained by MARC's record-length issues. Deciding to calculate that value would add an arbitrary length limitation to the format (in my opinion).

Tr

* Terry Reese, Associate Professor Gray Family Chair for Innovative Library Services 121 Valley Library Corvallis, OR 97331 541.737.6384

From: Sullivan, Mark V
Sent: 6/29/2012 6:52 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Leader in MarcXML Files ( Record Length )

All, I received a question regarding a software library I have created and released as open source. The record length in the leader (positions 0-4) was not being calculated correctly when writing as MarcXML. However, this raises a larger, more philosophical question: what is the point of the first five digits of the leader, outside of an ISO2709 / MARC21 encoded record? Should I calculate the record length AS IF it would be encoded in ISO2709? This would be computationally non-trivial and would likely double the time necessary for my software to write a MarcXML file. Should I just make the first five digits of the leader '0', since it means nothing in the context of a MarcXML file? Has anyone else pondered this question or have any input on how current systems work? Keep in mind I could be writing a MarcXML record for a record created or modified in memory, so just using a pre-existing record length is not an option. Many thanks for your consideration.

Mark V Sullivan Digital Development and Web Coordinator Technology and Support Services University of Florida Libraries 352-273-2907 (office) 352-682-9692 (mobile) mars...@uflib.ufl.edu
Re: [CODE4LIB] Leader in MarcXML Files ( Record Length )
If I'm writing MARCXML from scratch, I agree. If I'm converting it from MARC, I print out the length value from the record, more for historical purposes.

Tr

* Terry Reese, Associate Professor Gray Family Chair for Innovative Library Services 121 Valley Library Corvallis, OR 97331 541.737.6384

From: Devon
Sent: 6/29/2012 7:09 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Leader in MarcXML Files ( Record Length )

When writing MARC XML, you should use zeros. The following document [1] says you can use blanks, but the schema [2] uses a pattern that indicates digits should be used. When reading MARC XML, you should just ignore whatever is in those positions.

[1] http://www.loc.gov/standards/marcxml/marcxml-design.html
[2] http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd

/dev

On Fri, Jun 29, 2012 at 9:51 AM, Sullivan, Mark V mars...@uflib.ufl.edu wrote: All, I received a question regarding a software library I have created and released as open source. The record length in the leader (positions 0-4) was not being calculated correctly when writing as MarcXML. However, this raises a larger, more philosophical question: what is the point of the first five digits of the leader, outside of an ISO2709 / MARC21 encoded record? Should I calculate the record length AS IF it would be encoded in ISO2709? This would be computationally non-trivial and would likely double the time necessary for my software to write a MarcXML file. Should I just make the first five digits of the leader '0', since it means nothing in the context of a MarcXML file? Has anyone else pondered this question or have any input on how current systems work? Keep in mind I could be writing a MarcXML record for a record created or modified in memory, so just using a pre-existing record length is not an option. Many thanks for your consideration.
Mark V Sullivan Digital Development and Web Coordinator Technology and Support Services University of Florida Libraries 352-273-2907 (office) 352-682-9692 (mobile) mars...@uflib.ufl.edu -- Sent from my GMail account.
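Devon's "use zeros" advice is cheap to implement at serialization time, with none of the cost of recomputing an ISO2709 length. A minimal sketch (the function name is my own, not from any particular MARC library):

```python
def zero_leader_length(leader: str) -> str:
    """Return the 24-character leader with positions 00-04 (record
    length) zeroed out for MARCXML output, where the ISO 2709 record
    length is meaningless anyway."""
    return "00000" + leader[5:]

print(zero_leader_length("01234cam a2200301 a 4500"))
# -> 00000cam a2200301 a 4500
```

This sidesteps the "calculate AS IF ISO2709" cost entirely, and readers following the schema's digits-only pattern still get a valid leader.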
Re: [CODE4LIB] Best way to process large XML files
I would really consider SAX. In MarcEdit, I had originally utilized an XSLT process for handling MARCXML translations (using both SAXON and MSXML parsers) -- but as you noticed, there ends up being an upper limit to what you can process. The breaking point for me came while working with some researchers experimenting with data from the HathiTrust: they had a 32 GB XML file of MARCXML that needed to be processed. Using the DOM model, the process was untenable. Reworking the code so that it was SAX-based required building, to some degree, the same type of templating to react to specific elements and nested elements -- but it shifted processing time so that it took ~8 minutes to translate those 32 GB of MARCXML data into MARC (and allowed me to include code that handled some common issues related to field length, etc. at the point of translation). Not knowing what your XML files look like, my guess is that if you do it right, you can template your SAX code in such a way that the actual processing code is smaller and much more efficient than anything you could create using a DOM method.

--tr

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kyle Banerjee
Sent: Friday, June 08, 2012 11:36 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Best way to process large XML files

I'm working on a script that needs to be able to crosswalk at least a couple hundred XML files regularly, some of which are quite large. I've thought of a number of ways to go about this, but I wanted to bounce this off the list since I'm sure people here deal with this problem all the time. My goal is to make something that's easy to read/maintain without pegging the CPU and consuming too much memory. The performance and load I'm seeing from running the large files through LibXML and SimpleXML is completely unacceptable. SAX is not out of the question, but I'm trying to avoid it if possible to keep the code more compact and easier to read.
I'm tempted to stream-edit out all line breaks (since they occur in unpredictable places) and write new ones at the end of each record into a temp file. Then I can read the temp file one line at a time and process each line using SimpleXML. That way, there's no need to load giant files into memory, create huge arrays, etc., and the code would be easy enough for a 6th grader to follow. My proposed method doesn't sound very efficient to me, but it should consume predictable resources that don't increase with file size. How do you guys deal with large XML files?

Thanks, kyle

<rant>Why the heck does the XML spec require a root element, particularly since large files usually consist of a large number of records/documents? This makes it absolutely impossible to process a file of any size without resorting to SAX or string parsing -- which takes away many of the advantages you'd normally have with an XML structure.</rant>

--
Kyle Banerjee Digital Services Program Manager Orbis Cascade Alliance baner...@uoregon.edu / baner...@orbiscascade.org / 503.999.9787
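Between full SAX and the line-break hack there is a middle ground the thread doesn't mention: streaming pull parsing. A sketch using Python's `xml.etree.ElementTree.iterparse`, assuming records are wrapped in `<record>` elements (adjust the tag name for your schema):

```python
import xml.etree.ElementTree as ET
from io import BytesIO

def records(stream, tag="record"):
    """Yield each completed <record> element, then free it, so memory
    stays flat no matter how large the file is."""
    for _, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            yield elem
            elem.clear()  # drop children we've already processed

# Tiny stand-in for a multi-gigabyte file with a single root element.
sample = BytesIO(b"<collection>"
                 b"<record><title>A</title></record>"
                 b"<record><title>B</title></record>"
                 b"</collection>")
titles = [r.findtext("title") for r in records(sample)]
```

Each record arrives as a small in-memory tree you can query DOM-style, which keeps the per-record code about as readable as the SimpleXML approach while giving SAX-like memory behavior -- and it copes with the root-element requirement Kyle rants about, since the root is never fully materialized.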
Re: [CODE4LIB] more on MARC char encoding
Dealing with smart quotes is easy -- dealing with chemistry and mathematics symbols is much more challenging because there is so much variety. If you sent me some example documents off list so I could put together some sample files, I could take a closer look, but I couldn't make any promises outside of the general smart quote issue.

--TR

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Deng, Sai
Sent: Friday, April 20, 2012 6:55 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] more on MARC char encoding

If a canned cleaner can be added in MarcEdit to deal with smart quotes/values, that would be great! Besides the smart quotes, please consider other special characters, including chemistry and mathematics symbols (these are different types of special characters, right?). To better understand the character encoding issue, can anybody point me to some resources or a list, like UTF8-encoded data that is not in the MARC8 character set? Thanks a lot. Sophie

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind
Sent: Thursday, April 19, 2012 2:14 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] more on MARC char encoding

Ah, thanks Terry. That canned cleaner in MarcEdit sounds potentially useful -- I'm in a continuing battle to keep the character encoding in our local MARC corpus clean. (The real blame here is on cataloger interfaces that let catalogers save data containing bytes that are illegal for the character set it's being saved as. And/or display the data back to the cataloger using a translation that lets it show up as expected even though it is _wrong_ for the character set being saved as. Connexion is theoretically the Rolls-Royce of cataloger interfaces; does it do this? Gosh, I hope not.)

On 4/19/2012 2:20 PM, Reese, Terry wrote: Actually -- the issue isn't one of MARC8 versus UTF8 (since this data is being harvested from DSpace and is UTF8 encoded).
It's actually an issue with user-entered data -- specifically, smart quotes and the like. These values obviously are not in the MARC8 character set, and they cause problems for many who transform user-entered data from XML to MARC (smart quotes tend to be inserted by default on Windows). If you are sticking with a strictly UTF8-based system, there generally are no issues, because these are valid characters. If you move them into a system where the data needs to be represented in MARC -- then you have more problems. We do a lot of harvesting, and because of that, we run into these types of issues moving data that is in UTF8, but has characters not represented in MARC8, into Connexion, and having some of that data flattened. Given the wide range of data not in the MARC8 set that can show up in UTF8, it's not a surprise that this would happen. My guess is that you could add a template to your XSLT translation that attempted to filter the most common forms of these smart quotes/values and replace them with the more standard values. Likewise, if there were a great enough need, I could provide a canned cleaner in MarcEdit that could fix many of the most common varieties of these smart quotes/values.

--TR

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind
Sent: Thursday, April 19, 2012 11:13 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] more on MARC char encoding

If your records are really in MARC8, not UTF8, your best bet is to use a tool to convert them to UTF8 before hitting your XSLT. The open source 'yaz' command line tools can do it for Marc21. The Marc4J package can do it in Java, and will probably work for any MARC variant, not just Marc21. Char encoding issues are tricky. You might want to first figure out if your records are really in Marc8, thus the problems, or if instead they illegally contain bad data or data in some other encoding (Latin1). Char encoding is a tricky topic; you might want to do some reading on it in general.
The Unicode docs are pretty decent.

On 4/19/2012 11:06 AM, Deng, Sai wrote: Hi list, I am a Metadata librarian but not a programmer, sorry if my question seems naïve. We use an XSLT stylesheet to transform some harvested DC records from DSpace to MARC in MarcEdit, and then export them to OCLC. Some characters do not display correctly and need manual editing, for example (shown as: in MarcEditor / transferred to OCLC / edit in OCLC):

Bayes’ theorem / Bayes⁰́₉ theorem / Bayes' theorem
―it won‘t happen here‖ attitude / ⁰́₅it won⁰́₈t happen here⁰́₆ attitude / it won't happen here attitude
“Generation Y” / ⁰́₋Generation Y
Re: [CODE4LIB] more on MARC char encoding
Actually -- the issue isn't one of MARC8 versus UTF8 (since this data is being harvested from DSpace and is UTF8 encoded). It's actually an issue with user-entered data -- specifically, smart quotes and the like. These values obviously are not in the MARC8 character set, and they cause problems for many who transform user-entered data from XML to MARC (smart quotes tend to be inserted by default on Windows). If you are sticking with a strictly UTF8-based system, there generally are no issues, because these are valid characters. If you move them into a system where the data needs to be represented in MARC -- then you have more problems. We do a lot of harvesting, and because of that, we run into these types of issues moving data that is in UTF8, but has characters not represented in MARC8, into Connexion, and having some of that data flattened. Given the wide range of data not in the MARC8 set that can show up in UTF8, it's not a surprise that this would happen. My guess is that you could add a template to your XSLT translation that attempted to filter the most common forms of these smart quotes/values and replace them with the more standard values. Likewise, if there were a great enough need, I could provide a canned cleaner in MarcEdit that could fix many of the most common varieties of these smart quotes/values.

--TR

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind
Sent: Thursday, April 19, 2012 11:13 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] more on MARC char encoding

If your records are really in MARC8, not UTF8, your best bet is to use a tool to convert them to UTF8 before hitting your XSLT. The open source 'yaz' command line tools can do it for Marc21. The Marc4J package can do it in Java, and will probably work for any MARC variant, not just Marc21. Char encoding issues are tricky.
You might want to first figure out if your records are really in Marc8, thus the problems, or if instead they illegally contain bad data or data in some other encoding (Latin1). Char encoding is a tricky topic; you might want to do some reading on it in general. The Unicode docs are pretty decent.

On 4/19/2012 11:06 AM, Deng, Sai wrote: Hi list, I am a Metadata librarian but not a programmer, sorry if my question seems naïve. We use an XSLT stylesheet to transform some harvested DC records from DSpace to MARC in MarcEdit, and then export them to OCLC. Some characters do not display correctly and need manual editing, for example (shown as: in MarcEditor / transferred to OCLC / edit in OCLC):

Bayes’ theorem / Bayes⁰́₉ theorem / Bayes' theorem
―it won‘t happen here‖ attitude / ⁰́₅it won⁰́₈t happen here⁰́₆ attitude / it won't happen here attitude
“Generation Y” / ⁰́₋Generation Y⁰́₊ / Generation Y
listeners‟ evaluations / listeners⁰́ evaluations / listeners' evaluations
high school – from / high school ⁰́₃ from / high school – from
Co₀․₅Zn₀․₅Fe₂O₄ / Co²́⁰⁰́Þ²́⁵Zn²́⁰⁰́Þ²́⁵Fe²́²O²́⁴ / Co0.5Zn0.5Fe2O4?
μ / Îơ / μ
Nafion® / Nafion℗ʼ / Nafion®
Lévy / L©♭vy / Lévy
43±13.20 years / 43℗ł13.20 years / 43±13.20 years
12.6 ± 7.05 ft∙lbs / 12.6 ℗ł 7.05 ft⁸́₉lbs / 12.6 ± 7.05 ft•lbs
‘Pouring on the Pounds' / ⁰́₈Pouring on the Pounds' / 'Pouring on the Pounds'
k-ε turbulence / k-Îæ turbulence / k-ε turbulence
student—neither parents / student⁰́₄neither parents / student-neither parents
Λ = M – {p1, p2,…,pκ} / Î₎ = M ⁰́₃ {p1, p2,⁰́Œ,pÎð} / ? (won’t save)
M = (0, δ)x × Y / M = (0, Îþ)x ©₇ Y / ?
100°
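The "canned cleaner" Terry describes could start as little more than a translation table. A minimal sketch covering only the most common Windows smart punctuation -- the mapping choices here are mine, not MarcEdit's, and the chemistry/math symbols in Sophie's examples are deliberately out of scope:

```python
# Flatten common Windows smart punctuation to plain equivalents before
# the XML-to-MARC step, so the characters survive the trip into MARC8.
SMART_MAP = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201C": '"', "\u201D": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2026": "...",                # horizontal ellipsis
}

def flatten_smart_chars(text: str) -> str:
    return text.translate(str.maketrans(SMART_MAP))

print(flatten_smart_chars("\u201CGeneration Y\u201D won\u2019t"))
# -> "Generation Y" won't
```

The same table could be expressed as an XSLT `translate()`/template pass, per Terry's suggestion; a per-character dictionary just makes the mapping easy to extend as catalogers report new offenders.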
[CODE4LIB] Code4Lib West Registration Form: July 30, 2012
The University of Oregon Libraries and Oregon State University Libraries invite you to code4lib west, Monday, July 30, 2012, at the UO Knight Library. There is no registration fee for this conference. Registration is limited to 50 participants. All participants are expected to deliver a lightning talk. In the event registration fills up quickly, limits on participation per institution may be employed. Your registration is not confirmed until you receive an email. Registrations will be confirmed by April 30, 2012. URL: https://docs.google.com/spreadsheet/viewform?formkey=dGRFM0Zob1dsNEE2RU9VY25SNlllUEE6MQ --TR *** Terry Reese, Associate Professor Gray Family Chair for Innovative Library Services 121 Valley Library Corvallis, OR 97331 tel: 541.737.6384 ***
[CODE4LIB] Save the date for Code4Lib West; July 30, 2012
The University of Oregon Libraries and Oregon State University Libraries invite you to code4lib west, Monday, July 30, 2012, at the UO Knight Library. There is no registration fee for this conference. Registration is limited to 50 participants. All participants are expected to deliver a lightning talk. In the event registration fills up quickly, limits on participation per institution may be employed. The conference will be a combination of lightning talks, code/system troubleshooting, and birds-of-a-feather groups. See http://oregondigital.org/digcol/code4libwest/ for more information.

-Karen and Terry

*** Karen Estlund Digital Library Services, Head Oregon Digital Newspaper Program, Director University of Oregon Libraries Eugene, OR 97403-1299 541-346-185 kestl...@uoregon.edu Terry Reese, Associate Professor Gray Family Chair for Innovative Library Services 121 Valley Library Corvallis, OR 97331 tel: 541.737.6384 ***
Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records
This is one of the reasons you really can't trust the information found in position 9. It's why, when I wrote MarcEdit, I utilize a mixed process when working with data and determining character set -- a process that reads this byte and takes the information under advisement, but in the end treats it more as a suggestion: one part of a larger heuristic analysis of the record data to determine whether the information is in UTF8 or not. Fortunately, determining whether a set of data is in UTF8 or something else is a fairly easy process. Determining the something else is much more difficult, but generally not necessary. For that reason, if I were advising other people working on MARC processing libraries, I'd advocate having a process for recognizing that certain informational data may not be set correctly, and essentially utilizing a compatibility process to read and correct it. Unfortunately, while the number of vendors and systems that set this encoding byte correctly has increased dramatically (it used to be pretty much no one), it's still so uneven that I generally consider this information unreliable.

--TR

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Godmar Back
Sent: Thursday, March 08, 2012 11:01 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

On Thu, Mar 8, 2012 at 1:46 PM, Terray, James james.ter...@yale.edu wrote: Hi Godmar, UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 9: ordinal not in range(128) Having seen my fair share of these kinds of encoding errors in Python, I can speculate (without seeing the pymarc source code, so please don't hold me to this) that it's the Python code that's not set up to handle the UTF-8 strings from your data source. In fact, the error indicates it's using the default 'ascii' codec rather than 'utf-8'.
If it said 'utf-8' codec can't decode..., then I'd suspect a problem with the data. If you were to send the full traceback (all the gobbledy-gook that Python spews when it encounters an error) and the version of pymarc you're using to the program's author(s), they may be able to help you out further.

My question is less about the Python error, which I understand, than about the MARC record causing the error, and about how others deal with this issue (if it's a common issue, which I do not know). But here's the long story from pymarc's perspective. The record has leader[9] == 'a', but really, truly contains ANSEL-encoded data. When reading the record with a MARCReader(to_unicode=False) instance, the record reads OK since no decoding is attempted, but attempts at writing the record fail with the above error, since pymarc attempts to utf8-encode the ANSEL-encoded string, which contains non-ASCII chars such as 0xe8 (the ANSEL umlaut prefix). It does so because leader[9] == 'a' (see [1]). When reading the record with a MARCReader(to_unicode=True) instance, it'll throw an exception during marc_decode when trying to utf8-decode the ANSEL-encoded string. Rightly so. I don't blame pymarc for this behavior; to me, the record looks wrong.

- Godmar

(ps: that said, what pymarc does fails in different circumstances -- from what I can see, pymarc shouldn't assume that it's OK to utf8-encode the field data if leader[9] is 'a'. For instance, this would double-encode correctly encoded Marc/Unicode records that were read with a MARCReader(to_unicode=False) instance. But that's a separate issue that is not my immediate concern. pymarc should probably remember whether or not a record needs encoding when writing it, rather than consulting the leader[9] field.)

[1] https://github.com/mbklein/pymarc/commit/ff312861096ecaa527d210836dbef904c24baee6
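Godmar's specific failure -- leader/09 claiming UTF-8 over bytes that aren't -- can be detected before a record ever reaches pymarc. A sketch of such a guard (the function is hypothetical and not part of pymarc's API; it only catches this one direction of lying, since ANSEL data under a non-'a' leader can't be distinguished from ASCII this cheaply):

```python
def misencoded_utf8_claim(record_bytes: bytes) -> bool:
    """Return True when leader/09 claims UTF-8 ('a') but the raw record
    bytes are not valid UTF-8 -- the misencoded III case in this thread."""
    if record_bytes[9:10] != b"a":   # leader/09 != 'a': no UTF-8 claim made
        return False
    try:
        record_bytes.decode("utf-8")
        return False                 # claim checks out
    except UnicodeDecodeError:
        return True                  # claims UTF-8, but isn't

# Leader says 'a' while the data holds a raw ANSEL 0xE8 byte:
bad = b"00251nam a2200000   4500\xe8u"
print(misencoded_utf8_claim(bad))  # -> True
```

Records the guard flags could then be routed through a MARC8-assuming path (or repaired) instead of letting the leader drive the encode/decode decision.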
Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records
Ed, sure -- but this is one part of a much larger process. MarcEdit has two MARC algorithms: one that is a strict processing algorithm, and one that is a loose processing algorithm able to process data that would otherwise be invalid for most processors (this is done because, in the real world, vendors send bad records...often). Anyway, the character encoding is actually one of the last things MarcEdit handles before writing the processed file to disk. The reason for this is that MarcEdit reads and interacts with MARC data at the bit level, meaning character set is pretty meaningless for the vast majority of the work that it does. When writing to disk, though, .NET requires the filestream to be set to the correct encoding; otherwise data can be flattened and diacritics lost. Essentially, at that last step, the record is passed to a function called RecognizeUTF8 that takes a byte array. The program then enumerates the bytes to determine if the record is recognizable as UTF8, using a process based loosely on some of the work done by the International Components for Unicode (http://site.icu-project.org/) -- who have some incredible C libraries that do much more than you'd ever need to know how to do. While these don't work in C#, they demonstrate some well-known methods for evaluating byte-level data for code page evaluation. Of course, one area where I split directions is that I'm not interested in other character sets, and MARC data with poorly coded UTF8 needs to be forced to render as MARC8 (my opinion) until the invalid characters are corrected. So, in my process, invalid UTF8 data will flag the process and force data output in the mnemonic data format I use for MARC8-encoded data. Does that make sense?

--TR

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ed Summers
Sent: Thursday, March 08, 2012 12:19 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Q.: MARC8 vs.
MARC/Unicode and pymarc and misencoded III records

Hi Terry,

On Thu, Mar 8, 2012 at 2:36 PM, Reese, Terry terry.re...@oregonstate.edu wrote: This is one of the reasons you really can't trust the information found in position 9. This is one of the reasons why when I wrote MarcEdit, I utilize a mixed process when working with data and determining characterset -- a process that reads this byte and takes the information under advisement, but in the end treats it more as a suggestion and one part of a larger heuristic analysis of the record data to determine whether the information is in UTF8 or not. Fortunately, determining if a set of data is in UTF8 or something else, is a fairly easy process. Determining the something else is much more difficult, but generally not necessary.

Can you describe in a bit more detail how MarcEdit sniffs the record to determine the encoding? This has come up enough times w/ pymarc to make it worth implementing. //Ed
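MarcEdit's RecognizeUTF8 itself isn't shown in this thread, but the byte-enumeration idea Terry describes can be sketched. This is my own simplified validator (the real MarcEdit code is C# and more involved); note that pure ASCII also passes, since ASCII is valid UTF8, which is why a full heuristic treats the result as one signal among several:

```python
def looks_like_utf8(data: bytes) -> bool:
    """Walk the bytes and verify UTF-8 multi-byte sequence structure --
    a simplified sketch of the heuristic idea, not MarcEdit's code."""
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                    # plain ASCII byte
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:
            need = 1                    # lead byte of a 2-byte sequence
        elif 0xE0 <= b <= 0xEF:
            need = 2                    # lead byte of a 3-byte sequence
        elif 0xF0 <= b <= 0xF4:
            need = 3                    # lead byte of a 4-byte sequence
        else:
            return False                # stray continuation or invalid lead
        if i + need >= len(data):
            return False                # sequence truncated at end of data
        if any(not (0x80 <= data[i + j] <= 0xBF) for j in range(1, need + 1)):
            return False                # continuation byte out of range
        i += need + 1
    return True

print(looks_like_utf8("naïve".encode("utf-8")))  # -> True
print(looks_like_utf8(b"caf\xe8 au lait"))       # -> False (MARC8/Latin-1-style byte)
```

A lone 0xE8 (the ANSEL umlaut prefix from the earlier messages) fails immediately because the byte following it isn't a continuation byte, which is exactly the signal that lets the "something else" be treated as MARC8 without identifying it precisely.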
Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records
I also used to think it would be cool if we could get MARC8 encoding/decoding into the Python standard library, but then I realized I'd rather work on other stuff while MARC8 withers and dies. Wouldn't that be nice. In MarcEdit, all data wants to be treated as UTF8; MARC8 support is there as a legacy. Which is why processing MARC8 data in MarcEdit is slightly slower than UTF8 (because there is a kind of emulation that occurs to translate character sets on the fly when needed).

--TR

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Gabriel Farrell
Sent: Thursday, March 08, 2012 12:19 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

Sounds like what you do, Terry, and what we need in PyMARC, is something like UnicodeDammit [0]. Actually handling all of these esoteric encodings would be quite the chore, though. I also used to think it would be cool if we could get MARC8 encoding/decoding into the Python standard library, but then I realized I'd rather work on other stuff while MARC8 withers and dies.

[0] https://github.com/bdoms/beautifulsoup/blob/master/BeautifulSoup.py#L1753

On Thu, Mar 8, 2012 at 2:36 PM, Reese, Terry terry.re...@oregonstate.edu wrote: This is one of the reasons you really can't trust the information found in position 9. This is one of the reasons why when I wrote MarcEdit, I utilize a mixed process when working with data and determining characterset -- a process that reads this byte and takes the information under advisement, but in the end treats it more as a suggestion and one part of a larger heuristic analysis of the record data to determine whether the information is in UTF8 or not. Fortunately, determining if a set of data is in UTF8 or something else, is a fairly easy process. Determining the something else is much more difficult, but generally not necessary.
For that reason, if I were advising other people working on MARC processing libraries, I'd advocate having a process for recognizing that certain informational data may not be set correctly, and essentially utilizing a compatibility process to read and correct them. Unfortunately, while the number of vendors and systems that set this encoding byte correctly has increased dramatically (it used to be pretty much no one), it's still so uneven that I generally consider this information unreliable. --TR -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Godmar Back Sent: Thursday, March 08, 2012 11:01 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records On Thu, Mar 8, 2012 at 1:46 PM, Terray, James james.ter...@yale.edu wrote: Hi Godmar, UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 9: ordinal not in range(128) Having seen my fair share of these kinds of encoding errors in Python, I can speculate (without seeing the pymarc source code, so please don't hold me to this) that it's the Python code that's not set up to handle the UTF-8 strings from your data source. In fact, the error indicates it's using the default 'ascii' codec rather than 'utf-8'. If it said 'utf-8' codec can't decode..., then I'd suspect a problem with the data. If you were to send the full traceback (all the gobbledy-gook that Python spews when it encounters an error) and the version of pymarc you're using to the program's author(s), they may be able to help you out further. My question is less about the Python error, which I understand, than about the MARC record causing the error and about how others deal with this issue (if it's a common issue, which I do not know). But, here's the long story from pymarc's perspective. The record has leader[9] == 'a', but really, truly contains ANSEL-encoded data.
When reading the record with a MARCReader(to_unicode = False) instance, the record reads OK since no decoding is attempted, but attempts at writing the record fail with the above error since pymarc attempts to utf8-encode the ANSEL-encoded string, which contains non-ascii chars such as 0xe8 (the ANSEL umlaut prefix). It does so because leader[9] == 'a' (see [1]). When reading the record with a MARCReader(to_unicode=True) instance, it'll throw an exception during marc_decode when trying to utf8-decode the ANSEL-encoded string. Rightly so. I don't blame pymarc for this behavior; to me, the record looks wrong. - Godmar (ps: that said, what pymarc does fails in different circumstances - from what I can see, pymarc shouldn't assume that it's ok to utf8-encode the field data if leader[9] is 'a'. For instance, this would double-encode correctly encoded MARC/Unicode records that were read with a MARCReader(to_unicode=False) instance. But that's a separate issue.)
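Godmar's failure mode (leader/09 says 'a', body holds ANSEL bytes) is easy to reproduce and detect with plain Python, independent of pymarc. A sketch of the "take leader/09 under advisement, verify against the bytes" approach Terry describes; the record below is hand-built for illustration and is not a complete, field-tagged MARC record:

```python
def declared_encoding(record: bytes) -> str:
    """Leader position 09: 'a' claims Unicode, blank claims MARC-8."""
    return "utf-8" if record[9:10] == b"a" else "marc-8"

def probable_encoding(record: bytes) -> str:
    """Take leader/09 under advisement, but verify against the bytes."""
    try:
        record.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "marc-8"

# Hand-built toy record: the leader claims Unicode (position 9 = 'a')
# but the body holds MARC-8/ANSEL bytes (0xE8 = combining umlaut).
bad = b"00036nam a2200025   4500" + b"\x1eSchr\xe8oder\x1e\x1d"
assert declared_encoding(bad) == "utf-8"    # what leader/09 says
assert probable_encoding(bad) == "marc-8"   # what the bytes say
```

A reader that trusts only `declared_encoding` will utf8-encode or utf8-decode ANSEL data and blow up exactly as in the traceback above.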
Re: [CODE4LIB] MarcEdit command line tool
Here's the problem -- you are missing a switch. In MarcEdit, the XSLT conversions run through MARC21XML. To move from MARC21XML to MARC, MarcEdit uses a crosswalk to the mnemonic format. When you use the GUI -- this value is set for you -- but since it is user configurable, the command-line requires you to set it. So, for example, here's an example of how it would look running on my machine (below). --TR Here's a full command-line example: c:\Program Files\MarcEdit 5.0>cmarcedit -s c:\users\reeset\desktop\2011.xml -d c:\users\reeset\desktop\2011b.mrc -xmlmarc -mxslt c:\program files\MarcEdit 5.0\xslt\MARC21slim2Mnemonic.xsl Beginning Process... 2 records have been processed in 2.686151 seconds. -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Crowe, Sean (crowesn) Sent: Thursday, September 01, 2011 1:38 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] MarcEdit command line tool I'm scripting some batch editing routines and I'd like to use the MarcEdit command line tool to convert Marc21XML to MARC. I can successfully convert using the 5.2 gui but no dice using the command line tool. To this point I've only ever used the command line tool to break and make marc files. Here is my syntax: C:\Program Files\MarcEdit 5.0>cmarcedit.exe -s xmltest.xml -d xmltest.mrc -xmlmarc Beginning Process... -1 records have been processed in 0.00 seconds. Header and namespace info from xml doc: <?xml version='1.0' encoding='UTF-8' ?> <collection xsi:schemaLocation='http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd' xmlns='http://www.loc.gov/MARC21/slim' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'> <record> Should I be referencing an xslt file? Thanks in advance, Sean Crowe Electronic Resources Librarian Electronic Resources Dept. University of Cincinnati Libraries PO Box 210033 Cincinnati OH 45221-0033 Tel: (513) 556-1899 Fax: (513) 556-4393 Email: sean.cr...@uc.edu gchat: crowesn
[CODE4LIB] Job announcement: Associate University Librarian for Research and Scholarly Communication, Oregon State University
Please share this announcement with colleagues who would be interested. JOB ANNOUNCEMENT: Associate University Librarian for Research and Scholarly Communication Oregon State University Libraries Oregon State University Libraries seeks an innovative, dynamic, and experienced library leader to join the organization's senior leadership team. The Associate University Librarian for Research and Scholarly Communication will shape the Libraries' digital library strategies as they advance the development and communication of scholarly research and further the University's goal of becoming a top ten land-grant institution. As a member of the senior management team, the AUL for Research and Scholarly Communication will contribute to long-range planning, program development and evaluation, resource development, budget formulation, and allocation of resources in support of the Libraries' mission. He/she will work with department heads to identify and implement the strategic directions for the Center for Digital Scholarship and Services, Emerging Technologies and Services, University Archives, and Special Collections. The AUL must demonstrate a strong commitment to the collaborative development and implementation of innovative digital and web initiatives and services that respond adroitly to all users' evolving needs as researchers and scholars. OSU Libraries has nearly 2 million volumes and vast digital resources including ScholarsArchive@OSU (the 4th ranked single-university repository in the U.S.), internationally recognized digital collections like the Oregon Explorer natural resources digital library, and an agile development environment which has produced the LibraryFind(tm) metasearch application, the Library à la Carte Content Management System and other digital initiatives to serve the university's 24,000 students, faculty scholars and researchers, and the public. 
OSU Libraries is a member of the Orbis Cascade Alliance of Northwest universities and colleges, which has a total of more than 9 million holdings. The OSU Libraries' Special Collections include the Ava Helen and Linus Pauling Papers as the cornerstone for collections on the history of science and technology in the 20th century. University Archives collections record the history of OSU and include the Oregon Multicultural Archives, which documents the lives and activities of ethnic minority communities in Oregon; and extensive collections pertaining to natural resources in Oregon and the Northwest. Required Qualifications: * MLS from an ALA-accredited library program or foreign equivalent. * Minimum of seven years of increasing responsibility in an academic or research library. * Applied knowledge of the principles of library management and organization. * Experience with budget operations and strategic planning. * Knowledge of new information technologies, evolving models of scholarship, and the presentation of services in the Web environment, and the ability to articulate how these influence teaching, learning and scholarship. * Strong record of scholarly publication, research and national participation in professional societies suitable for appointment as associate professor. * Demonstrated commitment to service to all constituencies. * A record of accomplishment in dealing with change and mentoring and coaching staff at all levels, including successful experience supporting tenure-track faculty. * Experience in working with state and/or regional consortia. * Excellent analytical, interpersonal, oral and written communication skills. * A demonstrable commitment to promoting and enhancing diversity. * A demonstrated commitment to working collaboratively. * Experience managing and administering digital library initiatives and services.
* Experience with assessment and evaluation techniques, especially as applied to programs and services relevant to position responsibilities. Preferred Qualifications: * Additional graduate degree. * Experience working with special collections and archives. * Experience participating in a library fundraising and development program, engaging with new and ongoing donors and providing stewardship information to major donors. Environment: Oregon State is a leading research university located in one of the safest, smartest, greenest small cities in the nation: http://oregonstate.edu/main/about. Situated 90 miles south of Portland, and an hour from the Cascades or the Pacific Coast, Corvallis is the perfect home base for exploring Oregon's natural wonders. The university has an institution-wide commitment to diversity, multiculturalism and community and actively recruits and retains a diverse workforce and student body that includes members of historically underrepresented groups. Employment Conditions: Full-time, 12-month, annual tenure-track appointment at the rank of Associate Professor. Salary is commensurate
Re: [CODE4LIB] z39.50 and write operations
Yes, but only if the server you are using supports Z39.50 Extended Services (the part of the standard that covers record update). However, few commercial ILS systems seem to support it by default. Tr * Terry Reese Gray Family Chair for Innovative Library Services 121 Valley Library Corvallis, OR 97331 phone: 541.737.6384 * -Original Message- From: Eric Lease Morgan Sent: Friday, June 03, 2011 12:16 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] z39.50 and write operations Does Z39.50 support write operations? We here at Notre Dame may be working with a vendor in the near future who will be reading our MARC bibliographic records via Z39.50. I have also been told, not necessarily by the vendor, that these same records will be updated and reinserted back into our catalog by the vendor. I am quite familiar with Z39.50's ability to search and download content, but I am not familiar with its ability to write back to the server. Is this possible? -- Eric Lease Morgan University of Notre Dame
Re: [CODE4LIB] is this valid marc ?
Jonathan, Karen is correct -- CR/LF are invalid characters within a MARC record. This has nothing to do with whether the character is valid in the character set -- the format itself doesn't allow it. --TR -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind Sent: Thursday, May 19, 2011 11:29 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] is this valid marc ? I wonder if it depends on if your record is in Marc8 or UTF-8, if I'm reading Karen right to say that CR/LF aren't in the Marc8 character set. They're certainly in UTF-8! And a Marc record can be in UTF-8. On 5/19/2011 2:27 PM, Jonathan Rochkind wrote: Is it really true that newline characters are not allowed in a marc value? I thought they were, not with any special meaning, just as ordinary data. If they're not, that's useful to know, so I don't put any there! I'd ask for a reference to the standard that says this, but I suspect it's going to be some impenetrable implication of a side effect of a subtle adjective either way. On 5/19/2011 2:19 PM, Karen Coyle wrote: Quoting Andreas Orphanides andreas_orphani...@ncsu.edu: Anyway, I think having these two parts of the same URL data on separate lines is definitely Not Right, but I am not sure if it adds up to invalid MARC. Exactly. The CR and LF characters are NOT defined as valid in the MARC character set and should not be used. In fact, in MARC there is no concept of lines, only variable length strings (usually up to char). kc -dre. [1] http://www.loc.gov/marc/bibliographic/bd856.html [2] I am not a cataloger. Don't hurt me. [3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong. On 5/19/2011 12:37 PM, James Lecard wrote: I'm using the ruby-marc parser (v.0.4.2) to parse some marc files I get from a partner. The 856 field is split over 2 lines, causing the ruby library to ignore it (I've patched it to overcome this issue) but I want to know if this kind of marc is valid?
=LDR 00638nam 2200181uu 4500 =001 cla-MldNA01 =008 080101s2008\\\|fre|| =040 \\$aMy Provider =041 0\$afre =245 10$aThis Subject =260 \\$aParis$bJ. Doe$c2008 =490 \\$aSome topic =650 1\$aNarratif, Autre forme =655 \7$abook$2lcsh =752 \\$aA Place on earth =776 \\$dParis: John Doe and Cie, 1973 =856 \2$qtext/html =856 \\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library Thanks, James L.
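A quick way to catch James's split-856 problem before a parser chokes on it is to scan the raw ISO 2709 records for CR/LF bytes inside field data, since the format has no concept of lines, only length-prefixed fields. An illustrative sketch, not ruby-marc's logic; the sample record is hand-built (with a consistent but otherwise meaningless leader):

```python
FT = b"\x1e"   # field terminator
RT = b"\x1d"   # record terminator

def fields_with_line_breaks(record: bytes):
    """Yield the index of each variable field whose data contains CR or LF.

    CR/LF carry no meaning in ISO 2709 / MARC 21, so any embedded
    newline is a sign the record was mangled somewhere in transit.
    """
    base = int(record[12:17])          # base address of data, from the leader
    body = record[base:].rstrip(RT)
    for i, field in enumerate(body.split(FT)):
        if b"\r" in field or b"\n" in field:
            yield i

# Toy record: 24-byte leader, empty directory, then two fields --
# the second has a URL broken across a CR/LF, like the 856 above.
record = (b"00048nam a2200025   4500" + FT +
          b"ok data" + FT + b"broken\r\nfield" + FT + RT)
assert list(fields_with_line_breaks(record)) == [1]
```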
Re: [CODE4LIB] is this valid marc ?
It's been a while since I looked at the ISO spec (which I still can't believe I had to buy to read) -- but you can certainly infer by looking at the legal characters laid out by LC. In reality -- only a handful of unprintable characters are technically allowed in a MARC record -- but you have to remember that when MARC was created -- it was for block reading -- and generally, early (and current) readers stop on hard breaks. --TR -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind Sent: Thursday, May 19, 2011 11:49 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] is this valid marc ? On 5/19/2011 2:33 PM, Reese, Terry wrote: Jonathan, Karen is correct -- CR/LF are invalid characters within a MARC record. This has nothing to do with whether the character is valid in the set -- the format itself doesn't allow it. I'm curious where in the spec it says this -- of course, it's an intellectual exercise at this point, because even if the spec says one thing, it doesn't matter if everyone (including tool-writers) has always understood it differently. (This is a problem for me with lots of library 'standards' including MARC. Oh yeah, it might APPEAR to say/allow/prohibit that, but don't believe it, 'everyone' has always understood it differently. Or two parts of a spec which contradict each other). In the glossary here: http://www.loc.gov/marc/specifications/speccharintro.html It does say Consequently, code points less than 80 (hex) have the same meaning in both of the encodings used in MARC 21 and may be referred to as ASCII in either environment. Which could be interpreted to include control chars such as CR and LF. (Thanks Dan Scott). Of course, the glossary section may not actually be an operative part of the standard, or it may not mean what it seems to mean, or everyone may have always acted as if it meant something different. Welcome to MARC.
But I'm not successfully finding anything else that says one way or another on the legality. Most of the ascii control chars do seem to be missing from Marc8 (whether by design or accident), but that doesn't necessarily mean they're illegal in a MARC record using some other (legal for MARC) encoding. But I believe Terry that it's not allowed (I believe Terry about just about everything). It's just really an intellectual exercise in the difficulty of finding answers in the MARC spec at the moment.
Re: [CODE4LIB] MARC magic for file
Actually, you can have records that are MARC21 coming out of vendor databases (who sometimes embed control characters into the leader) and still be valid. Once you stop looking at just your ILS or OCLC, you probably wouldn't be surprised to know that records start looking very different. --TR Terry Reese, Associate Professor Gray Family Chair for Innovative Library Services 121 Valley Libraries Corvallis, Or 97331 tel: 541.737.6384 -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind Sent: Wednesday, April 06, 2011 9:44 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC magic for file Can't you have a legal MARC file that does NOT have 4500 in those leader positions? It's just not legal Marc21, right? Other marc formats may specify or even allow flexibility in the things these bytes specify: * Length of the length-of-field portion * Number of characters in the starting-character-position portion of a Directory entry * Number of characters in the implementation-defined portion of a Directory entry Or, um, 23, which I guess is left to the specific Marc implementation (ie, Marc21 is one such) to use for its own purposes. I have no idea how that should inform the 'marc magic'. Is mime-type application/marc defined as specifically Marc21, or as any Marc? Jonathan On 4/6/2011 12:28 PM, Ford, Kevin wrote: Well, this brings us right up against the issue of files that adhere to their specifications versus forgiving applications. Think of browsers and HTML. Suffice it to say, MARC applications are quite likely to be forgiving of leader positions 20-23. In my non-conforming MARC file and in Bill's, the leader positions 20-21 (45) seemed constant, but things could fall apart for positions 22-23. So... I present the following (in-line and attached, to preserve tabs) in an attempt to straddle the two sides of this issue: applications forgiving of non-conforming files.
Should the two characters following 45 (at position 20) *not* be 00, then the identification will be noted as non-conforming. We could classify this as reasonable identification but hardly ironclad (indeed, simply checking to confirm that part of the first 24 positions match the specification hardly constitutes a robust identification, but it's something). It will also give you a mimetype too, now. Would anyone like to test it out more fully on their own files? # # MARC 21 Magic (Third cut) # Set at position 0 0 byte x # leader position 20-21 must be 45 20 string 45 # leader starts with 5 digits, followed by codes specific to MARC format 0 regex/1 (^[0-9]{5})[acdnp][^bhlnqsu-z] MARC Bibliographic !:mime application/marc 0 regex/1 (^[0-9]{5})[acdnosx][z] MARC Authority !:mime application/marc 0 regex/1 (^[0-9]{5})[cdn][uvxy] MARC Holdings !:mime application/marc 0 regex/1 (^[0-9]{5})[acdn][w] MARC Classification !:mime application/marc 0 regex/1 (^[0-9]{5})[cdn][q] MARC Community !:mime application/marc # leader position 22-23, should be 00 but is it? 0 regex/1 (^.{21})([^0]{2}) (non-conforming) !:mime application/marc If this works, I'll see about submitting this copy. Thanks to all your efforts already. Warmly, Kevin -- Library of Congress Network Development and MARC Standards Office From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Simon Spero [s...@unc.edu] Sent: Sunday, April 03, 2011 14:01 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC magic for file I am pretty sure that the marc4j standard reader ignores them; the tolerant reader definitely does. Otherwise JHU might have about two parseable records based on the mangled leaders that J-Rock gets stuck with :-) An analysis of the ~7M LC bib records from the scriblio.net data files (~ Dec 2006) indicated that the leader has less than 8 bits of information in it (Shannon-Weaver definition). This excludes the initial length value, which is redundant given the end of record marker.
The LC V'GER adds a pseudo tag 000 to its HTML view of the MARC leader. The final characters of the leader are 450. Also, I object to the phrase decent MARC tool. Any tool capable of dealing with MARC as it exists cannot afford the luxury of decency :-) [ HA: A clear conscience? BW: Yes, Sir Humphrey. HA: When did you acquire this taste for luxuries?] Simon On Fri, Apr 1, 2011 at 5:16 AM, Owen Stephens o...@ostephens.com wrote: I'm sure any decent MARC tool can deal with them, since decent MARC tools are certainly going to be forgiving enough to deal with four characters that
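Kevin's magic rules translate almost directly into Python regexes over the first 24 bytes, which is handy for sanity-checking what the magic would match before installing it. This is a rough, unofficial port of the rules above (same patterns, same "non-conforming" flag for leaders whose positions 20-23 stray from 4500):

```python
import re

# Rough Python port of Kevin Ford's proposed MARC 21 magic: five digits
# (record length), a record-status byte, then the type-of-record byte
# that distinguishes the MARC formats. Bibliographic goes last because
# its second class is a broad negated set.
MARC_FLAVORS = [
    (re.compile(rb"^[0-9]{5}[acdnosx]z"), "MARC Authority"),
    (re.compile(rb"^[0-9]{5}[cdn][uvxy]"), "MARC Holdings"),
    (re.compile(rb"^[0-9]{5}[acdn]w"), "MARC Classification"),
    (re.compile(rb"^[0-9]{5}[cdn]q"), "MARC Community"),
    (re.compile(rb"^[0-9]{5}[acdnp][^bhlnqsu-z]"), "MARC Bibliographic"),
]

def sniff_marc(leader: bytes):
    """Classify a 24-byte leader; flag entry maps that aren't '4500'."""
    for pattern, label in MARC_FLAVORS:
        if pattern.match(leader):
            suffix = "" if leader[20:24] == b"4500" else " (non-conforming)"
            return label + suffix
    return None

assert sniff_marc(b"00714cam a2200205 a 4500") == "MARC Bibliographic"
assert sniff_marc(b"00714cz  a2200205n  4500") == "MARC Authority"
assert sniff_marc(b"00714cam a2200205 a 450 ") == \
    "MARC Bibliographic (non-conforming)"
```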
Re: [CODE4LIB] MARC magic for file
Actually -- I'd disagree because that is a very narrow view of the specification. When validating MARC, I'd take the approach to validate structure (which allows you to then read any MARC format) -- then use a separate process for validating content of fields, which in my opinion, is more open to interpretation based on system usage of the data. For example, 22 and 23 are undefined values that local systems may very well have a practical need to define and use given that there are only so many values in the leader. This is why I sometimes see additional values in position 09 (which should be 'a' or blank) to define different character set types, or additional elements added to other fields. If I want to validate the content of those fields, I'd validate it through a different process -- but I separate the process from the validation of the structure -- because the two are not the same thing. --TR -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, April 06, 2011 9:59 AM To: Code for Libraries Cc: Reese, Terry Subject: Re: [CODE4LIB] MARC magic for file I'm not sure what you mean Terry. Maybe we have different understandings of valid. If leader bytes 20-23 are not 4500, I suggest that is _by definition_ not a valid Marc21 file. It violates the Marc21 specification. Now, they may still be _usable_, by software that ignores these bytes anyway or works around them. We definitely have a lot of software that does that. Which can end up causing problems that remind me of very analogous problems caused by the early days of web browsers that felt like being 'tolerant' of bad data. My html works in every web browser BUT this one, why not? Oh, because that's the only one that actually followed the standard, oops. I actually ran into an example of that problem with this exact issue. MOST software just ignores marc leader bytes 20-23, and assumes the semantics of 4500---the only legal semantics for Marc21.
But Marc4j actually _respected_ them, apparently the author thought that some marc in the wild might intentionally set different bytes here (no idea if that's true or not). So if the leader bytes 20-23 were invalid (according to the spec), Marc4j would suddenly decide that the length of field portion was NOT 4, but actually BELIEVE whatever was in leader byte 20, causing the record to be parsed improperly. And I had records like that coming out of my ILS (not even a vendor database). That was an unfun couple of days of debugging to figure out what was going on. On 4/6/2011 12:52 PM, Reese, Terry wrote: Actually, you can have records that are MARC21 coming out of vendor databases (who sometimes embed control characters into the leader) and still be valid. Once you stop looking at just your ILS or OCLC, you probably wouldn't be surprised to know that records start looking very different. --TR Terry Reese, Associate Professor Gray Family Chair for Innovative Library Services 121 Valley Libraries Corvallis, Or 97331 tel: 541.737.6384 -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind Sent: Wednesday, April 06, 2011 9:44 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC magic for file Can't you have a legal MARC file that does NOT have 4500 in those leader positions? It's just not legal Marc21, right? Other marc formats may specify or even allow flexibility in the things these bytes specify: * Length of the length-of-field portion * Number of characters in the starting-character-position portion of a Directory entry * Number of characters in the implementation-defined portion of a Directory entry Or, um, 23, which I guess is left to the specific Marc implementation (ie, Marc21 is one such) to use for its own purposes. I have no idea how that should inform the 'marc magic'. Is mime-type application/marc defined as specifically Marc21, or as any Marc?
Jonathan On 4/6/2011 12:28 PM, Ford, Kevin wrote: Well, this brings us right up against the issue of files that adhere to their specifications versus forgiving applications. Think of browsers and HTML. Suffice it to say, MARC applications are quite likely to be forgiving of leader positions 20-23. In my non-conforming MARC file and in Bill's, the leader positions 20-21 (45) seemed constant, but things could fall apart for positions 22-23. So... I present the following (in-line and attached, to preserve tabs) in an attempt to straddle the two sides of this issue: applications forgiving of non- conforming files. Should the two characters following 45 (at position 20) *not* be 00, then the identification will be noted as non-conforming. We could classify this as reasonable identification but hardly ironclad
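Terry's two-pass approach -- validate the ISO 2709 container first, then apply content rules like "leader 20-23 must be 4500" as a separate, optional pass -- might look something like this. A simplified sketch under stated assumptions (MarcEdit's real checks are more involved, and the sample records are hand-built):

```python
FT, RT = b"\x1e", b"\x1d"

def structurally_valid(record: bytes) -> bool:
    """Pass 1: does the container parse as ISO 2709 at all?"""
    try:
        if len(record) < 24 or not record.endswith(RT):
            return False
        if int(record[0:5]) != len(record):   # stated length is right
            return False
        base = int(record[12:17])             # base address of data
        if record[base - 1:base] != FT:       # directory ends with FT
            return False
        return (base - 25) % 12 == 0          # 12-byte directory entries
    except ValueError:                        # leader "digits" aren't digits
        return False

def content_valid_marc21(record: bytes) -> bool:
    """Pass 2 (a separate concern): MARC 21 fixes the entry map at '4500'.
    Local systems may diverge here while still being structurally fine."""
    return record[20:24] == b"4500"

# Minimal hand-built record: 24-byte leader, empty directory, no fields.
rec = b"00026nam a2200025   4500" + FT + RT
assert structurally_valid(rec) and content_valid_marc21(rec)

# Same container with a nonstandard entry map: pass 1 ok, pass 2 not.
odd = rec[:20] + b"450 " + rec[24:]
assert structurally_valid(odd) and not content_valid_marc21(odd)
```

This is exactly the split Jonathan's Marc4j story argues for: a reader can accept the container while a stricter pass reports the MARC 21 content violation, instead of reinterpreting the directory geometry from bad bytes.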
Re: [CODE4LIB] MARC magic for file
I'm honestly not familiar with magic. I can tell you that in MarcEdit, the way the process works is that there is a very generic function that reads the structure of the data, not trusting the information in the leader (since I find this data very unreliable). Then, if users want to apply a set of rules to the validation -- I apply those as a secondary process. If you are looking to validate specific content within a record, then what you want to do in this function may be appropriate -- though you'll find some local systems will consistently fail the process. --tr From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of William Denton [w...@pobox.com] Sent: Wednesday, April 06, 2011 10:29 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC magic for file On 6 April 2011, Reese, Terry wrote: Actually -- I'd disagree because that is a very narrow view of the specification. When validating MARC, I'd take the approach to validate structure (which allows you to then read any MARC format) -- then use a separate process for validating content of fields, which in my opinion, is more open to interpretation based on system usage of the data. What do you think is the best way to recognize MARC files (up to some level of validity, given all the MARC you've seen and parsed) that could be made to work the way magic is defined? Bill -- William Denton, Toronto : miskatonic.org www.frbr.org openfrbr.org
Re: [CODE4LIB] utf8 \xC2 does not map to Unicode
I'd echo Jonathan's question -- the 0xC2 code is the sound recording marker in MARC-8. I'd guess the file isn't in UTF8. --TR -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind Sent: Wednesday, April 06, 2011 1:28 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] utf8 \xC2 does not map to Unicode I am not familiar with that Perl module. But I'm more familiar than I'd want with char encoding in Marc. I don't recognize the byte 0xC2 (there are some bytes I became pathetically familiar with in past debugging, but I've forgotten em), but the first things to look at: 1. Is your Marc file encoded in Marc8 or UTF-8? I'm betting Marc8. Theoretically there is a Marc leader byte that tells you whether it's Marc8 or UTF-8, but the leader byte is often wrong in real world records. Is it wrong? 2. Does Perl MARC::Batch have a function to convert from Marc8 to UTF-8? If so, how does it decide whether to convert? Is it trying to do that? Is it assuming that the leader byte of the record accurately identifies the encoding, and if so, is the leader byte wrong? Is it trying to convert from Marc8 to UTF-8, when the source was UTF-8 in the first place? Or is it assuming the source was UTF-8 in the first place, when in fact it was Marc8? Not the answer you wanted, maybe someone else will have that. Debugging char encoding is hands down the most annoying kind of debugging I ever do. On 4/6/2011 4:13 PM, Eric Lease Morgan wrote: Ack! While using the venerable Perl MARC::Batch module I get the following error while trying to read a MARC record: utf8 \xC2 does not map to Unicode This is a real pain, and I'm hoping someone here can help me either: 1) trap this error allowing me to move on, or 2) figure out how to open the file correctly.
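Eric's option 1 -- trap the error and move on -- can be prototyped without MARC::Batch by splitting the raw file on the record terminator (0x1D) and decoding each record independently, so one mislabeled MARC-8 record doesn't kill the whole batch. A Python sketch of the idea (the Perl analogue would be wrapping $batch->next in eval {}); the sample records are hand-built:

```python
RT = b"\x1d"  # MARC record terminator

def iter_decoded(raw: bytes):
    """Yield each record's text, or None for records that aren't valid
    UTF-8 (likely MARC-8 despite what leader/09 claims). The batch
    keeps going instead of dying on the first bad record."""
    for chunk in raw.split(RT):
        if not chunk:
            continue
        try:
            yield (chunk + RT).decode("utf-8")
        except UnicodeDecodeError:
            yield None  # set aside for a MARC-8 -> UTF-8 conversion pass

good = "00026nam a2200025   4500\x1e\x1d".encode("utf-8")
bad = b"00027nam a2200025   4500\x1e\xc2\x1d"  # stray 0xC2: not valid UTF-8
results = list(iter_decoded(good + bad))
assert results[0] is not None and results[1] is None
```

Collecting the `None` positions also tells you which records to route through a MARC-8 conversion step (option 2) instead of guessing one encoding for the whole file.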
Re: [CODE4LIB] marcxml
Yes -- that's right. There is a zip file with install instructions for any non-windows based system for which a MONO port is present. --TR -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Joel Marchesoni Sent: Thursday, November 11, 2010 8:40 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] marcxml There actually is a version of MARCEdit for Linux now. I think (although I can't remember and can't find it on the site) that it relies on Mono. MARCEdit download page: http://people.oregonstate.edu/~reeset/marcedit/html/downloads.html Joel -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of J.D.Gravestock Sent: Thursday, November 11, 2010 6:26 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] marcxml I'd be interested to know if anyone is using a good marcxml to marc converter (other than marcedit, i.e. non windows). I've tried the perl module marc::xml but having a few problems with the conversion which I can't replicate in marcedit. Are there any that I've missed? Jill ** Jill Gravestock Open University Library Milton Keynes -- The Open University is incorporated by Royal Charter (RC 000391), an exempt charity in England Wales and a charity registered in Scotland (SC 038302). --
Re: [CODE4LIB] Copy Cataloging MARC record manipulation
Andy, Since I write MarcEdit, maybe I can help. If you can give me an idea what you are up to, I'll see if it's something that can be dealt with. Tr Terry Reese Gray Family Chair for Innovative Library Services Oregon State University Libraries Corvallis, OR 97331 tel: 541.737.6384 web: http://people.oregonstate.edu/~reeset/ On Aug 19, 2010, at 6:16 PM, Andy Kelly a.m.ke...@gmail.com wrote: Greetings all, I am in a bit of a fix. I am working to get my library working up a more effective copy-cataloging workflow and was looking for some software suggestions. I'm more or less trapped on Windows XP and have so far been running the Mercury Z39.50 client with some success. My search would end here if exporting one. record. at. a. time. wasn't so painful. I've been evaluating MarcEdit and its associated Z39.50 client. I've found it to be slow, buggy and always trapped in windows of fixed sizes. It can also only search one Z39.50 server at a time, so it replaces one bottleneck with another. I get the impression I'm sort of in the Dark Ages here in that we're just not OCLC copy-cataloging subscribers, but I can't seem to convince my superiors that that service is worth making room for in the budget, though perhaps this is a more common situation than I'm aware of. Ideally: I feed in a txt file or CSV of ISBNs and I get out one big MARC file to feed my [ancient, fussy] OPAC. This might be one of those ...why don't you do it with a Perl script? problems that might get me to really dive into my copy of Introducing Perl. (I've looked at the ZOOM Perl bindings and the MARC module on CPAN; both look promising but far beyond my current limited abilities and likely even further beyond my boss's, future replacements', and/or student workers' ability to maintain or use.) Thanks for your help and suggestions. ~Andy
[CODE4LIB] Pacific Northwest Code4Lib chapter and meeting
FYI for the larger group. Since many members in the PNW simply cannot travel to the larger C4L meeting due to budgetary constraints (this year, and very likely the next), etc. -- we will be starting up a PNW local chapter and hosting a one-day C4L meeting for those in the area who are interested, but maybe otherwise were not able to attend the annual C4L meeting. Info can be found at: http://groups.google.com/group/pnwcode4lib?hl=en. Plus, it will give the PNW a group that can start crafting a plan to bring the C4L conference back to its PNW home. :) --TR *** Terry Reese The Gray Family Chair for Innovative Library Services Oregon State University Libraries Corvallis, OR 97331 tel: 541-737-6384 email: terry.re...@oregonstate.edu http: http://oregonstate.edu/~reeset ***
[CODE4LIB] FW: Please visit RDA Test Website
Posted on behalf of Dianne McCutcheon * Terry Reese The Gray Family Chair for Innovative Library Services Oregon State University Libraries Corvallis, OR 97331 tel: 541-737-6384 email: terry.re...@oregonstate.edu http: http://oregonstate.edu/~reeset * The US National Libraries RDA Test Steering Committee has launched a Website for the RDA test project, at URL http://www.loc.gov/bibliographic-future/rda/ The site includes a link to a fill-in PDF application form that you can use to let us know if you're interested in being selected as a test partner. The Test Steering Committee received excellent comments about the project after the RDA Test Planning Forum at ALA Midwinter in Denver. As a result of this feedback, we realized that we needed to ask for more precise information from the potential test participants. So we revised the application form and made it available on the RDA Test Planning Website. Please complete and return the form, even if you submitted an expression of interest earlier. The Website also has links to a proposed timeline and to the methodology that the Steering Committee plans to use for the testing. We'll update the site with additional information as we develop a complete test protocol. Thank you very much for your interest in the US National Libraries RDA Test project. We look forward to hearing from you. As the application form states, we're requesting that anyone interested in participating as a test partner return the PDF application, via email, by April 13 to Susan Morris. The email link in the form will return it to Susan. Please get in touch with her if you have any questions or if there is any problem with the PDF. Susan R. Morris Special Assistant to the Director for Acquisitions and Bibliographic Access Library of Congress voice: 202-707-6073 fax: 202-252-3220 For the US National Libraries RDA Test Steering Committee: co-chairs Chris Cole, Dianne McCutcheon, and Beacher Wiggins
Re: [CODE4LIB] Zotero under attack
This seems like a real grey area. I can see Thomson Scientific putting up a fuss when using ENS files generated by the creator of EndNote. But ENS files can be -- and have been -- created by just about anyone (librarians, journal publishers, researchers) and published on the open web. I'm not sure that's what they are saying. EndNote does come with .ens files that they create (I believe that was the case the last time I looked at the software), managed and provided as part of their application. They certainly can claim rights to those (this isn't really a grey area) -- and unless the Zotero software is able to distinguish user-generated files from files distributed as part of the EndNote application, it could be problematic. --TR *** Terry Reese Cataloger for Networked Resources Digital Production Unit Head Oregon State University Libraries Corvallis, OR 97331 tel: 541-737-6384 email: [EMAIL PROTECTED] http: http://oregonstate.edu/~reeset *** From: Code for Libraries on behalf of Peter Murray Sent: Sun 9/28/2008 5:46 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Zotero under attack I've posted some analysis and plenty of links to critical bits at http://dltj.org/article/endnote-zotero-lawsuit/ Some other thoughts... On Sep 26, 2008, at 4:01 PM, Reese, Terry wrote: While reverse engineering the .ens style files really isn't that big of a deal (this kind of reverse engineering is generally legally permitted), utilizing the collected knowledge-base from an EndNote application is. I've run into this in the past with other software that I've worked on -- there is a good deal of legal tiptoeing that often needs to be done when you are building software that will essentially bird-dog another (proprietary) application's knowledge-base. This seems like a real grey area. I can see Thomson Scientific putting up a fuss when using ENS files generated by the creator of EndNote.
But ENS files can be -- and have been -- created by just about anyone (librarians, journal publishers, researchers) and published on the open web. I don't see anything in the license agreement or argued elsewhere that says Thomson Scientific has rights over these works (the citation definition files) created and published by others. That would seem akin to Microsoft claiming rights over documents written in Word. Peter -- Peter Murray http://www.pandc.org/peter/work/ Assistant Director, New Service Development tel:+1-614-728-3600;ext=338 OhioLINK: the Ohio Library and Information Network Columbus, Ohio The Disruptive Library Technology Jester http://dltj.org/ Attrib-Noncomm-Share http://creativecommons.org/licenses/by-nc-sa/2.5/
Re: [CODE4LIB] Zotero under attack
Hopefully, this quote from the article: "A significant and highly touted feature of the new beta version of Zotero, however, is its ability to convert - in direct violation of the License Agreement - Thomson's 3,500 plus proprietary .ens style files within the EndNote Software into free, open source, easily distributable Zotero .csl files" isn't quite this straightforward. While reverse engineering the .ens style files really isn't that big of a deal (this kind of reverse engineering is generally legally permitted), utilizing the collected knowledge-base from an EndNote application is. I've run into this in the past with other software that I've worked on -- there is a good deal of legal tiptoeing that often needs to be done when you are building software that will essentially bird-dog another (proprietary) application's knowledge-base. --TR -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of wally grotophorst Sent: Friday, September 26, 2008 12:09 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Zotero under attack http://www.courthousenews.com/2008/09/17/Reuters_Says_George_Mason_University_Is_Handing_Out_Its_Proprietary_Software.htm I guess stuff like this is what gives me that anti-corporate bias...
Re: [CODE4LIB] what's the best way to get from Portland to San Francisco on Feb 28?
You'll want to fly. On the West Coast, taking the train is a bit of a crap shoot, and I wouldn't advise it unless you had a day between when you are supposed to arrive and when you need to arrive. The few times I've taken Amtrak on the West Coast between Seattle and Los Angeles, I've never been on time. I've been anywhere from 5 hours to one day late, depending on the distance traveled. In fact, given my past experience, if I wasn't going to fly -- I would drive. It will take you approximately 12-13 hours to drive down I-5 from Portland to San Francisco. By train, almost twice as long. --TR *** Terry Reese Cataloger for Networked Resources Digital Production Unit Head Oregon State University Libraries Corvallis, OR 97331 tel: 541-737-6384 email: [EMAIL PROTECTED] http: http://oregonstate.edu/~reeset *** From: Code for Libraries on behalf of Elizabeth Sadler Sent: Wed 2/20/2008 6:59 PM To: CODE4LIB@listserv.nd.edu Subject: [CODE4LIB] what's the best way to get from Portland to San Francisco on Feb 28? Dear Code4Lib folks, Can some of you west-coasters advise me on the best (read: cheapest and most fun) way to get from Portland to San Francisco on February 28? Taking a plane is my last resort. My first choice would be hitching a ride with any conference attendees who would be going that way anyway, and I also thought about taking the train, but Amtrak has stymied me. Greyhound would take much too long, I imagine. Anyway, I thought it was worth a question. Any suggestions from the community? Bess Elizabeth (Bess) Sadler Research and Development Librarian Digital Scholarship Services (DSS) Box 400129 Alderman Library University of Virginia Charlottesville, VA 22904 [EMAIL PROTECTED] (434) 243-2305
Re: [CODE4LIB] getting Worldcat records
Since these are your libraries' records, you can certainly download them again from OCLC. I've also known libraries in the past that have been able to have OCLC generate a subset of records from their database -- though in these cases, this has always involved a cost to purchase the records. In terms of how easy it is to do on your own -- if you don't have OCLC do it, you would likely need a list of all the OCLC numbers that you are interested in. With that list, you could easily batch export the data again from WorldCat using Connexion. --TR -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Alberto Accomazzi Sent: Thursday, February 07, 2008 10:35 AM To: CODE4LIB@listserv.nd.edu Subject: [CODE4LIB] getting Worldcat records Our project maintains a database of bibliographic metadata for all things in astronomy and most of physics. We'd like to add records for books that have been recently added to our library and to correlate existing records with the library holdings. Sounds easy enough, but because of the intricacies of Harvard libraries administration we haven't been able to get a dump of the records, much less a feed. The recent emails about OCLC WorldCat records made me wonder if we could get the equivalent data from them (since our library subscribes to them). Essentially what I'd like is a dump of all QB and QC records in OCLC entered by Harvard, so we can index them and then point to the library record in OCLC. Is this (a) legal, (b) feasible, (c) easy? I assume the answer to (a) and (b) is yes, since we have our library's support. If not, are there alternatives? I learned about Open Library only yesterday, so I haven't had a chance to explore what's in it yet... Thanks, -- Alberto
Re: [CODE4LIB] Records for Open Library
Isn't sharing such records a no-no? No, OCLC's guidelines for transfer (http://www.oclc.org/support/documentation/worldcat/records/guidelines/default.htm) specifically give unrestricted transfer rights to libraries and non-commercial entities. The Open Library is both. It's a registered library in California and a non-profit. So in either situation, it's not a problem. --TR *** Terry Reese Cataloger for Networked Resources Digital Production Unit Head Oregon State University Libraries Corvallis, OR 97331 tel: 541-737-6384 email: [EMAIL PROTECTED] http: http://oregonstate.edu/~reeset *** From: Code for Libraries on behalf of Peter Murray Sent: Wed 2/6/2008 2:50 PM To: CODE4LIB@listserv.nd.edu Subject: Re: [CODE4LIB] Records for Open Library On Feb 5, 2008, at 12:11 PM, K.G. Schneider wrote: Has your library considered contributing records to Open Library ( http://www.openlibrary.org/ )? If so I'd like to hear from you on or off list. How would that work? Most of the records in OhioLINK are probably derived from OCLC WorldCat. Isn't sharing such records a no-no? Peter -- Peter Murray http://www.pandc.org/peter/work/ Assistant Director, New Service Development tel:+1-614-728-3600;ext=338 OhioLINK: the Ohio Library and Information Network Columbus, Ohio The Disruptive Library Technology Jester http://dltj.org/ Attrib-Noncomm-Share http://creativecommons.org/licenses/by-nc-sa/2.5/
Re: [CODE4LIB] low-cost software for prison libraries?
I'd suggest Koha -- but if they are looking for something simple and low-cost, you could try something like CDS/ISIS (http://portal.unesco.org/ci/en/ev.php-URL_ID=5330&URL_DO=DO_TOPIC&URL_SECTION=201.html) -- it's free and developed by UNESCO. Another one you could try is ResourceMate (http://www.resourcemate.com/) -- it's a low-cost Windows-based system that I've used before. This list here might also be useful -- though a little dated: http://www.librarysupportstaff.com/4automate.html --TR *** Terry Reese Cataloger for Networked Resources Digital Production Unit Head Oregon State University Libraries Corvallis, OR 97331 tel: 541-737-6384 email: [EMAIL PROTECTED] http: http://oregonstate.edu/~reeset *** From: Code for Libraries on behalf of Jonathan Rochkind Sent: Wed 1/30/2008 8:54 PM To: CODE4LIB@listserv.nd.edu Subject: [CODE4LIB] low-cost software for prison libraries? Hi all, this is forwarded from a prison librarian listserv. Does anyone know of any very low-cost (or open source?) library systems that would be suitable for small and/or low-staffed libraries? I'm thinking something like Koha or Evergreen would probably be overkill and/or too hard to install without much/any tech/systems staff, but I could very well be wrong; I don't know much about either system. I also don't know much about the needs of that kind of small library. If anyone does have ideas, could you send them directly to Mary (in addition to CCing the list if you want, because I'm interested too and I bet other list members would be). I've been curious for a while about solutions available to the very small/limited-resource library in the way of 'automation', but know almost nothing about it and am not sure if there's an easy way to find out. If anyone happens to know something about this (or is interested in researching it), I personally think the Code4Lib Journal would be a great place to publish an essay or survey on that topic.
Jonathan Begin forwarded message: From: [EMAIL PROTECTED] Date: January 30, 2008 9:12:19 PM EST To: [EMAIL PROTECTED] Subject: [prison-l] Library automation software Greetings: Last month there was some discussion here about cheap/free/reasonably priced automation software for correctional libraries. I am on a statewide committee which has just been formed to research and recommend a software package to replace Athena (formerly by Sagebrush, now Follett) in most of the correctional libraries in Virginia. After years in public libraries I am very familiar with some of the big vendors, but they are simply financially out of the question for our agency, not to mention web-based. I have looked at the websites for LibraryThing, Auto Librarian, and ResourceMate, which were recommended here in the previous discussion. If you know of or have a circ/cat system that is reasonably priced (or dirt cheap) and works well for you, please share the information with me, with pros and cons if you like. All replies greatly appreciated, and thanks in advance. Mary Geist, librarian Dept. of Correctional Education Brunswick Correctional Center 1147 Planter's Road Lawrenceville, VA 23868 434.848.4131, ext. 1146
Re: [CODE4LIB] Library Software Manifesto
Roy, While your rights are interesting, I find the consumer responsibilities are actually more important (and always more difficult to see followed). As someone who develops software for wide public consumption (read: not developers, but the computer illiterate in many cases), I find that points 1-3 are the most difficult for most people. Invariably, people don't really know what they want from an application -- just an idea of a workflow as to how something might have worked (or been learned) before. Likewise, most assume that if you just say "x doesn't work," then as the developer you'll be able to decode the problem. Sometimes I can decode the problem as the user (which tells me that what I'm doing needs to be more straightforward) -- other times, I rely on the user to provide as much information as possible to reproduce problems, which can be like a trip to the dentist. I think our software vendors are in the same position. Many have fallen asleep in terms of understanding what libraries want today -- but at the same time -- librarians have traditionally been, as a user group (I'm painting in broad strokes here), a bunch of whiners that really don't know what they want to begin with. Any library software manifesto that includes vendor responsibilities needs to equally highlight the responsibilities users have in this relationship (which looks like the direction you are going -- just don't undersell it). --TR *** Terry Reese Cataloger for Networked Resources Digital Production Unit Head Oregon State University Libraries Corvallis, OR 97331 tel: 541-737-6384 email: [EMAIL PROTECTED] http: http://oregonstate.edu/~reeset *** From: Code for Libraries on behalf of Roy Tennant Sent: Tue 11/6/2007 10:07 AM To: CODE4LIB@listserv.nd.edu Subject: [CODE4LIB] Library Software Manifesto I have a presentation coming up and I'm considering doing what I'm calling a Library Software Manifesto.
Some of the following may not be completely understandable on the face of it, and I would be explaining the meaning during the presentation, but this is what I have so far and I'd be interested in other ideas this group has or comments on this. Thanks, Roy

Consumer Rights
- I have a right to use what I buy
- I have a right to the API if I've bought the product
- I have a right to accurate, complete documentation
- I have a right to my data
- I have a right to not have simple things needlessly complicated

Consumer Responsibilities
- I have a responsibility to communicate my needs clearly and specifically
- I have a responsibility to report reproducible bugs in a way that facilitates reproducing them
- I have a responsibility to report irreproducible bugs with as much detail as I can provide
- I have a responsibility to request new features responsibly
- I have a responsibility to view any adjustments to default settings critically
Re: [CODE4LIB] library find and bibliographic citation export?
COinS are included in the output, but because the current pages are loaded via AJAX, the data isn't visible to browser plugins like LibX, Zotero, etc. 0.8.3 will remove nearly all the AJAX -- and when that happens, the COinS data should be visible. --TR *** Terry Reese Cataloger for Networked Resources Digital Production Unit Head Oregon State University Libraries Corvallis, OR 97331 tel: 541-737-6384 email: [EMAIL PROTECTED] http: http://oregonstate.edu/~reeset *** From: Code for Libraries on behalf of Karen Coombs Sent: Thu 9/27/2007 11:31 AM To: CODE4LIB@listserv.nd.edu Subject: Re: [CODE4LIB] library find and bibliographic citation export? I believe that LibraryFind includes COinS, but they aren't working quite right in the current version. If the COinS were working correctly (which they are supposed to in the next version), then Zotero would read them and allow you to import results. I don't know of anyone who has added a citation export feature otherwise, though. Jeremy or Terry, please correct me if I've got my COinS information confused about which version. Karen On 9/27/07 11:57 AM, Tim Shearer [EMAIL PROTECTED] wrote: Hi, I'm interested to know if anyone working with LibraryFind has begun work to create a tool for bibliographic export to citation management tools like RefWorks, etc. Thanks! Tim +++ Tim Shearer Web Development Coordinator The University Library University of North Carolina at Chapel Hill [EMAIL PROTECTED] 919-962-1288 +++ -- Karen A. Coombs Head of Libraries' Web Services University of Houston 114 University Libraries Houston, TX 77204-2000 Phone: (713) 743-3713 Fax: (713) 743-9811 Email: [EMAIL PROTECTED]
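For anyone unfamiliar with why the AJAX issue matters: a COinS is just an empty span with class "Z3988" whose title attribute carries a URL-encoded OpenURL ContextObject, and tools like Zotero scan the DOM for those spans when the page loads -- markup injected afterwards by XMLHttpRequest can be missed. A minimal sketch of generating one follows; the field choices are illustrative, not LibraryFind's actual output.

```python
from urllib.parse import urlencode

def coins_span(title, author, isbn):
    """Build a COinS <span> for a book: an empty span whose title
    attribute carries the URL-encoded OpenURL ContextObject."""
    ctx = urlencode([
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),  # KEV book format
        ("rft.btitle", title),
        ("rft.au", author),
        ("rft.isbn", isbn),
    ])
    # inside an HTML attribute, & must be written as &amp;
    return '<span class="Z3988" title="%s"></span>' % ctx.replace("&", "&amp;")
```

Because the span is empty, nothing is rendered; the metadata rides along invisibly, which is exactly why it must be present in the initial HTML for extension-based tools to find it.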
Re: [CODE4LIB] Polls open for Code4Lib 2007 T-Shirt design
Per the Rosie the Riveter Memorial (http://www.rosietheriveter.org/faq.htm) regarding the image: given that it's a commissioned work by the United States War Production Commission, I'd say that it's likely to be in the public domain. I wouldn't worry about it. "4. Is the Rosie the Riveter image copyrighted? The image that has become most widely known was commissioned by the United States War Production Commission - Co-coordinating Committee for use on a recruiting poster in 1943. It was intended to be displayed for only two weeks, February 15 through February 28. The artist was J. Howard Miller. It is widely held that this image is in the public domain, but we are aware of no official documentation to that effect. There are less well-known images, including a painting by Norman Rockwell entitled Rosie the Riveter, that remain under copyright." --TR *** Terry Reese Cataloger for Networked Resources Digital Production Unit Head Oregon State University Libraries Corvallis, OR 97331 tel: 541-737-6384 email: [EMAIL PROTECTED] http: http://oregonstate.edu/~reeset *** From: Code for Libraries on behalf of Edward Corrado Sent: Mon 1/29/2007 5:45 AM To: CODE4LIB@listserv.nd.edu Subject: Re: [CODE4LIB] Polls open for Code4Lib 2007 T-Shirt design I have no idea of the legal status of the photo. I believe the length of time for copyright in the USA is 75 years (unless you're Disney, then it is forever), thus it still may be covered on this side of the pond. I think we need to clarify this before printing up a bunch of shirts with this photo. Edward - who doubts anyone will be chasing after code4lib but still, we should do things the right way Rob Styles said the following on 1/29/2007 4:54 AM: The photo is an original WWII photo from 1944; it's outside of the 50 years covered by copyright here in the UK and is in use by several different organisations. I believe we don't need any clearance.
rob Rob Styles Programme Manager, Data Services, Talis tel: +44 (0)870 400 5000 fax: +44 (0)870 400 5001 direct: +44 (0)870 400 5004 mobile: +44 (0)7971 475 257 msn: [EMAIL PROTECTED] irc: irc.freenode.net/mrob,isnick -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Roy Tennant Sent: 26 January 2007 21:10 To: CODE4LIB@listserv.nd.edu Subject: Re: [CODE4LIB] Polls open for Code4Lib 2007 T-Shirt design I hate to be the one to raise this, but it seems like I must since the design is leading in the polls, but do we have (or can we obtain) the right to reproduce that photo? Roy -- Edward M. Corrado http://www.tcnj.edu/~corrado/ Systems Librarian The College of New Jersey 403E TCNJ Library PO Box 7718 Ewing, NJ 08628-0718 Tel: 609.771.3337 Fax: 609.637.5177 Email: [EMAIL PROTECTED]