Re: [CODE4LIB] Google can give you answers, but librarians give you the right answers
I would suggest that librarians are interested in, among other things, promoting information literacy skills to our patrons. According to ACRL's Standards for Information Literacy in Higher Education (2000 edition): http://www.ala.org/acrl/sites/ala.org.acrl/files/content/standards/standards.pdf

An information literate individual is able to:
- Determine the nature and extent of information needed
- Access the needed information effectively and efficiently
- Evaluate information and its sources critically
- Incorporate selected information into one's knowledge base
- Use information effectively to accomplish a specific purpose
- Understand the economic, legal, and social issues surrounding the use of information, and access and use information ethically and legally

Because Google can't provide the contextual nature of the information it presents, and because people vary in their levels of information literacy, a librarian -- presumably with an advanced skillset and knowledge base in this area -- can help provide assistance and context for what a patron might need. In that sense, I think a librarian can often add tremendous value to a search.

Mike Beccaria
Director of Library Services
Paul Smith's College
7833 New York 30
Paul Smiths, NY 12970
518.327.6376
mbecca...@paulsmiths.edu
www.paulsmiths.edu

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kyle Banerjee
Sent: Friday, April 01, 2016 2:00 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Google can give you answers, but librarians give you the right answers

On Thu, Mar 31, 2016 at 9:31 PM, Cornel Darden Jr. wrote:

> "Google can give you answers, but librarians give you the right answers."
>
> Is it me? Or is there something wrong with this statement?

There's nothing wrong with the statement. As is the case with all sound bites, it should be used to stimulate thought rather than express reality. Librarians have a schizophrenic relationship with Google.
We dump on Google all the time, but it's one of the tools librarians of all stripes rely on the most. When we build things, we emulate Google's look, feel, and functionality. And while we blast Google on privacy issues, human librarians know a lot about what the individuals they serve use, why, and how -- it is much easier to get anonymous help from Google than a librarian. There are many animals in the information ecosystem, libraries and Google being among them. Our origins and evolutionary path differ, and this diversity is a good thing. kyle
Re: [CODE4LIB] Creating a Linked Data Service
Really helpful responses all. Moving forward with a plan that is much simpler than before. Thanks so much!

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Dan Scott
Sent: Saturday, August 09, 2014 1:41 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Creating a Linked Data Service

On Wed, Aug 6, 2014 at 2:45 PM, Michael Beccaria mbecca...@paulsmiths.edu wrote:

> I have recently had the opportunity to create a new library web page and host it on my own servers. One of the elements of the new page that I want to improve upon is providing live or near live information on technology availability (10 of 12 laptops available, etc.). That data resides on my ILS server and I thought it might be a good time to upgrade the bubble gum and duct tape solution I now have to creating a real linked data service that would provide that availability information to the web server. The problem is there is a lot of overly complex and complicated information out there on linked data and RDF and the semantic web etc.

Yes... this is where I was a year or two ago. Content negotiation / triple stores / ontologies / Turtle / n-quads / blah blah blah / head hits desk.

> and I'm looking for a simple guide to creating a very simple linked data service with php or python or whatever. Does such a resource exist? Any advice on where to start?

Adding to the barrage of suggestions, I would suggest a simple structured data approach:

a) Get your web page working first, clearly showing the availability of the hardware: make the humans happy!
b) Enhance the markup of your web page to use microdata or RDFa to provide structured data around the web page content: make the machines happy!
Let's assume your web page lists hardware as follows:

<h1>Laptops</h1>
<ul>
  <li>Laptop 1: available (circulation desk)</li>
  <li>Laptop 2: loaned out</li>
  ...
</ul>

Assuming your hardware has the general attributes of type, location, name, and status, you could use microdata to mark this up like so:

<h1>Laptops</h1>
<ul>
  <li itemscope itemtype="http://example.org/laptop"><span itemprop="name">Laptop 1</span>: <span itemprop="status">available</span> (<span itemprop="location">circulation desk</span>)</li>
  <li itemscope itemtype="http://example.org/laptop"><span itemprop="name">Laptop 2</span>: <span itemprop="status">loaned out</span></li>
  ...
</ul>

(We're using the itemtype attribute to specify the type of the object, using a made-up vocabulary... which is fine to start with.)

Toss that into the structured data linter at http://linter.structured-data.org and you can see (roughly) what any microdata parser will spit out. That's already fairly useful to machines that would want to parse the page for their own purposes (mobile apps, or aggregators of all available library hardware across public and academic libraries in your area, or whatever).

The advantage of using structured data is that you can later on decide to use <div> or <table> markup, and as long as you keep the itemscope/itemtype/itemprop attributes generating the same output, any clients using microdata parsers are going to just keep on working... whereas screen-scraping approaches will generally crash and burn if you change the HTML out from underneath them.

For what it's worth, you're not serving up linked data at this point, because you're not really linking to anything, and you're not providing any identifiers to which others could link.
You can add itemid attributes to satisfy the latter goal:

<h1>Laptops</h1>
<ul>
  <li itemscope itemtype="http://example.org/laptop" itemid="#laptop1"><span itemprop="name">Laptop 1</span>: <span itemprop="status">available</span> (<span itemprop="location">circulation desk</span>)</li>
  <li itemscope itemtype="http://example.org/laptop" itemid="#laptop2"><span itemprop="name">Laptop 2</span>: <span itemprop="status">loaned out</span></li>
  ...
</ul>

I guess if you wanted to avoid this being a linked data silo, you could link out from the web page to the manufacturer's page to identify the make/model of each piece of hardware; but realistically that's probably not going to help anyone, so why bother?

Long story short, you can achieve a lot of linked data / semantic web goals by simply generating basic structured data without having to worry about content negotiation to serve up RDF/XML and JSON-LD and Turtle, setting up triple stores, or other such nonsense. You can use whatever technology you're using to generate your web pages (assuming they're dynamically generated) to add in this structured data.

If you're interested, over the last year I've put together a couple of gentle self-guiding tutorials on using RDFa (fulfills roughly the same role as microdata) with schema.org (a general vocabulary of types and their properties). The shorter one is at https
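If the page is generated dynamically, Dan's annotations can be emitted from the same code that renders the list. A minimal sketch in Python: the item type URL and property names are the made-up vocabulary from his example, and the dict layout is an assumption of mine.

```python
# Sketch: render the microdata-annotated laptop list from availability
# data. The itemtype URL and property names match the made-up
# vocabulary in the example above; the input dicts are illustrative.
from html import escape

def laptop_li(item):
    """Render one piece of hardware as an annotated <li>."""
    parts = [
        '<li itemscope itemtype="http://example.org/laptop">',
        '<span itemprop="name">%s</span>: ' % escape(item["name"]),
        '<span itemprop="status">%s</span>' % escape(item["status"]),
    ]
    if item.get("location"):
        parts.append(' (<span itemprop="location">%s</span>)'
                     % escape(item["location"]))
    parts.append('</li>')
    return ''.join(parts)

def laptops_page(items):
    lis = '\n'.join(laptop_li(i) for i in items)
    return '<h1>Laptops</h1>\n<ul>\n%s\n</ul>' % lis
```

Because the annotations travel with the rendering code, a later switch to `<div>` or `<table>` markup only has to keep the same itemprop output, as Dan notes.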
Re: [CODE4LIB] Creating a Linked Data Service
I'm a one-man shop, and I sometimes go to these conferences where many of you brilliant people are building brilliant solutions: ubiquitous black-box data services that talk to one another using a standardized query language. I felt inspired and thought maybe I had been doing patchwork on a job that really ought to be done a better way. I'm all about the bubble gum and duct tape stuff, but I was at a point where it would have been a good time to migrate to something a little more robust. I'm getting the impression that, for the size of the projects I'm working on, linked data and other similar solutions are very much overkill. I'll have a PHP script output some custom XML that can be ingested on the other end and call it a day. Done :-)

This is also, at least for me, a challenge of being a wear-a-lot-of-hats-and-sometimes-write-code person at a small institution. Most of the time I'm not sure what I'm supposed to be doing, so I just make a solution that works without having others to bounce ideas off of. Thanks for the support.

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Riley-Huff, Debra
Sent: Wednesday, August 06, 2014 11:52 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Creating a Linked Data Service

I agree with Roy. Seems like something that could be easily handled with PHP or Python scripts. Someone on the list may even have a homegrown solution (improved duct tape) they would be happy to share. I fail to see what the project has to do with linked data or why you would go that route.
Debra Riley-Huff
Head of Web Services
Associate Professor
JD Williams Library
University of Mississippi
University, MS 38677
662-915-7353
riley...@olemiss.edu

On Wed, Aug 6, 2014 at 9:33 PM, Roy Tennant roytenn...@gmail.com wrote:

> I'm puzzled about why you want to use linked data for this. At first glance the requirement simply seems to be to fetch data from your ILS server, which likely could be sent in any number of simple packages that don't require an RDF wrapper. If you are the only one consuming this data then you can use whatever (simplistic, proprietary) format you want. I just don't see what benefits you would get by creating linked data in this case that you wouldn't get by doing something much more straightforward and simple. And don't be harshing on duct tape. Duct tape is a perfectly fine solution for many problems.
> Roy
>
> On Wed, Aug 6, 2014 at 2:45 PM, Michael Beccaria mbecca...@paulsmiths.edu wrote:
>
>> I have recently had the opportunity to create a new library web page and host it on my own servers. One of the elements of the new page that I want to improve upon is providing live or near live information on technology availability (10 of 12 laptops available, etc.). That data resides on my ILS server and I thought it might be a good time to upgrade the bubble gum and duct tape solution I now have to creating a real linked data service that would provide that availability information to the web server. The problem is there is a lot of overly complex and complicated information out there on linked data and RDF and the semantic web etc. and I'm looking for a simple guide to creating a very simple linked data service with php or python or whatever. Does such a resource exist? Any advice on where to start?
>>
>> Thanks,
>> Mike Beccaria
>> Systems Librarian
>> Head of Digital Initiative
>> Paul Smith's College
>> 518.327.6376
>> mbecca...@paulsmiths.edu
>> Become a friend of Paul Smith's Library on Facebook today!
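Mike later settles on having a script output some custom XML for the web server to ingest. A sketch of that simpler approach, here in Python rather than PHP; the element names are made up, and the counts would really come from the ILS:

```python
# Sketch of the "script outputs some custom XML" plan: turn a dict of
# availability counts into a small XML document the web page can consume.
# Element names here are illustrative, not from any real feed.
import xml.etree.ElementTree as ET

def availability_xml(equipment):
    """equipment: dict mapping type -> (available, total)."""
    root = ET.Element("availability")
    for kind, (avail, total) in sorted(equipment.items()):
        item = ET.SubElement(root, "equipment", type=kind)
        ET.SubElement(item, "available").text = str(avail)
        ET.SubElement(item, "total").text = str(total)
    return ET.tostring(root, encoding="unicode")
```

The output is trivially parseable on the other end with the same library, which is about as much "standardized query language" as a one-consumer feed needs.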
[CODE4LIB] Creating a Linked Data Service
I have recently had the opportunity to create a new library web page and host it on my own servers. One of the elements of the new page that I want to improve upon is providing live or near-live information on technology availability (10 of 12 laptops available, etc.). That data resides on my ILS server, and I thought it might be a good time to upgrade the bubble gum and duct tape solution I now have to a real linked data service that would provide that availability information to the web server. The problem is there is a lot of overly complex and complicated information out there on linked data and RDF and the semantic web etc., and I'm looking for a simple guide to creating a very simple linked data service with PHP or Python or whatever. Does such a resource exist? Any advice on where to start?

Thanks,
Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!
[CODE4LIB] Recommendations for IT Department Management Resources
I'm looking for resources on managing IT departments and infrastructure in an academic environment: resources that cover high-level organizational topics like essential job roles, policies, standard operating procedures, etc. Does anyone know of any good resources out there that they consider useful or essential?

Thanks,
Mike

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!
[CODE4LIB] Python (or similar) package to read counter stat reports?
This might be a bit obscure, but is there a Python package (or a package in another programming language) designed to read library COUNTER statistics reports? I'm looking to start building a data warehouse for some of our ebook and journal vendors and want to pull data from these reports. I can script it, but wanted to know if anything might exist to help along the way. Anybody else working on, or completed, something similar?

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!
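No existing package is named in the thread, but the script-it-yourself fallback Mike mentions might start like this sketch: summing per-title totals out of a COUNTER-style CSV export. The column layout here is an assumption; real JR1/TR files carry several header rows and vendor-specific quirks that would need skipping.

```python
# Sketch: pull title-level usage totals out of a COUNTER-style CSV.
# Assumes one title column followed by numeric monthly columns;
# non-numeric rows (headers, notes) are simply skipped.
import csv
import io

def totals_from_counter_csv(text, title_col=0, first_data_col=1):
    """Return {title: total} summing the numeric columns of each row."""
    totals = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[first_data_col:]:
            continue
        try:
            counts = [int(c) for c in row[first_data_col:]]
        except ValueError:
            continue  # header or malformed row
        totals[row[title_col]] = sum(counts)
    return totals
```

A warehouse loader would call this per vendor file and attach the vendor and reporting period before inserting rows.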
Re: [CODE4LIB] De-dup MARC Ebook records
Steve,
I don't think it's so much finding a control field (the closest match I can use is ISBN or eISBN, which has its issues) as normalizing the data in the fields so that matches are produced. It will no doubt take some time to figure out.

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of McDonald, Stephen
Sent: Friday, August 16, 2013 8:16 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] De-dup MARC Ebook records

Michael Beccaria said:

> Thanks for the replies. To clarify, I am working with 2 (or more in the future) MARC records outside of the ILS. I've tried using MarcEdit but my usage did vary... not much overlap with the control fields that were available to me. I have a feeling they are a bit varied. I'm also messing around with marcXimiL a little but I'm having trouble getting it to output any records at all. I also was looking at the XC aggregation module but I was having trouble getting that to work properly as well, and the listserv was unresponsive. It seemed like good software but it required me to set up an OAI harvest source to allow it to ingest the records and that... well... enough is enough... I think I will probably need to write something, and at least that way I know what it will be doing rather than plowing through software that has little to no support. Please feel free to let me know of a particular strategy you think might work best in this regard...

If you couldn't get adequate deduping from the control fields available in MarcEdit deduping, what control fields do you think you need to dedup on? You can actually specify any arbitrary field and subfield for deduping in MarcEdit.

Steve McDonald
steve.mcdon...@tufts.edu
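The normalization step Mike describes might start with something like this sketch: reduce every ISBN to a bare ISBN-13 so vendor punctuation and 10- vs. 13-digit forms don't block matches. The helper name is mine, not from any package mentioned in the thread.

```python
# Sketch of ISBN normalization for dedup keys: strip punctuation and
# convert ISBN-10 to ISBN-13 (978 prefix + recomputed check digit),
# so the same title keys identically across vendor files.
import re

def normalize_isbn(raw):
    """Return a bare ISBN-13 string, or None if it doesn't look like an ISBN."""
    digits = re.sub(r"[^0-9Xx]", "", raw)
    if len(digits) == 13:
        return digits
    if len(digits) == 10:
        core = "978" + digits[:9]  # drop the old ISBN-10 check digit
        check = sum((1 if i % 2 == 0 else 3) * int(d)
                    for i, d in enumerate(core))
        return core + str((10 - check % 10) % 10)
    return None
```

Even with this, ISBN matching has the issues Mike flags (print vs. e-ISBNs, multiple ISBNs per record), so it is a first-pass key rather than a complete answer.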
Re: [CODE4LIB] De-dup MARC Ebook records
Karen, Do you have a sense of how well it actually works? Is Open Library implementing it? Mike Beccaria Systems Librarian Head of Digital Initiative Paul Smith's College 518.327.6376 mbecca...@paulsmiths.edu Become a friend of Paul Smith's Library on Facebook today! -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen Coyle Sent: Thursday, August 22, 2013 11:53 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] De-dup MARC Ebook records The record matching algorithm used by the Open Library is available here: https://github.com/openlibrary/openlibrary/tree/master/openlibrary/catalog/merge The original spec, which may have changed in the implementation, is here: http://kcoyle.net/merge.html kc On 8/22/13 8:07 AM, Michael Beccaria wrote: Steve, I don't think it's so much find a control field (however, the closest match I can use is ISBN or eISBN which has its issues) but also normalizing the data in the fields so that matches are produced. It will no doubt take some time to figure out. Mike Beccaria Systems Librarian Head of Digital Initiative Paul Smith's College 518.327.6376 mbecca...@paulsmiths.edu Become a friend of Paul Smith's Library on Facebook today! -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of McDonald, Stephen Sent: Friday, August 16, 2013 8:16 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] De-dup MARC Ebook records Michael Beccaria said: Thanks for the replies. To clarify, I am working with 2 (or more in the future) marc records outside of the ILS. I've tried using Marcedit but my usage did vary...not much overlap with the control fields that were available to me. I have a feeling they are a bit varied. I'm also messing around with marcXimiL a little but I'm having trouble getting it to output any records at all. 
I also was looking at the XC aggregation module but I was having trouble getting that to work properly as well and the listserv was unresponsive. It seemed like good software but it required me to set up an OAI harvest source to allow it to ingest the records and that...well...enough is enough... I think I will probably need to write something, and at least that way I know what it will be doing rather than plowing through software that has little to no support. Please feel free to let me know of a particular strategy you think might work best in this regard... If you couldn't get adequate deduping from the control fields available in MarcEdit deduping, what control fields do you think you need to dedup on? You can actually specify any arbitrary field and subfield for deduping in MarcEdit. Steve McDonald steve.mcdon...@tufts.edu -- Karen Coyle kco...@kcoyle.net http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet
[CODE4LIB] De-dup MARC Ebook records
Has anyone had any luck finding a good way to de-duplicate MARC records from ebook vendors? We're looking to integrate the Ebrary and EBSCO Academic eBook collections, and they estimate an overlap in the tens of thousands. Strategies, tools, software?

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!
Re: [CODE4LIB] De-dup MARC Ebook records
Thanks for the replies. To clarify, I am working with 2 (or more in the future) marc records outside of the ILS. I've tried using Marcedit but my usage did vary...not much overlap with the control fields that were available to me. I have a feeling they are a bit varied. I'm also messing around with marcXimiL a little but I'm having trouble getting it to output any records at all. I also was looking at the XC aggregation module but I was having trouble getting that to work properly as well and the listserv was unresponsive. It seemed like good software but it required me to set up an OAI harvest source to allow it to ingest the records and that...well...enough is enough... I think I will probably need to write something, and at least that way I know what it will be doing rather than plowing through software that has little to no support. Please feel free to let me know of a particular strategy you think might work best in this regard... Mike Beccaria Systems Librarian Head of Digital Initiative Paul Smith's College 518.327.6376 mbecca...@paulsmiths.edu Become a friend of Paul Smith's Library on Facebook today! -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Andy Kohler Sent: Thursday, August 15, 2013 2:29 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] De-dup MARC Ebook records Are you expecting to work with two files of records, outside of your ILS? If so, for a project like that I'd probably write Perl script(s) using MARC::Record (there are similar code libraries for Ruby, Python and Java at least). For each record in each file, use the ISBN (and/or OCLC number and/or LCCN) as a key. Compare all sets, and keep one record per key. This assumes that the vendors are supplying records with standard identifiers, and not just their own record numbers. If you're comparing each file with what's already in your ILS, then it'll depend on the tools the ILS offers for matching incoming records to the database. 
Or, export the database and compare it with the files, as above. Andy Kohler / UCLA Library Info Tech akoh...@library.ucla.edu / 310 206-8312 On Thu, Aug 15, 2013 at 10:11 AM, Michael Beccaria mbecca...@paulsmiths.edu wrote: Has anyone had any luck finding a good way to de-duplicate MARC records from ebook vendors. We're looking to integrate Ebrary and Ebsco Academic Ebook collections and they estimate an overlap into the 10's of thousands.
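Andy's compare-all-sets approach, sketched here with stand-in records (plain dicts rather than MARC::Record or pymarc objects; with real files the keys would come from the 020/010/035 fields, ideally normalized first):

```python
# Sketch of keyed dedup across vendor files: keep one record per
# standard identifier, first batch wins. key_func returns the list of
# identifiers (ISBNs, OCLC numbers, LCCNs) found on a record.
def dedup_by_key(record_batches, key_func):
    """Merge batches of records, keeping the first record seen per key."""
    seen = set()
    kept = []
    for batch in record_batches:
        for rec in batch:
            keys = key_func(rec)
            if any(k in seen for k in keys):
                continue  # identifier already covered by an earlier record
            seen.update(keys)
            kept.append(rec)
    return kept
```

Passing the preferred vendor's file as the first batch decides whose record survives when both vendors hold a title; records carrying only proprietary identifiers will never collide, which is the caveat Andy raises.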
Re: [CODE4LIB] web-based ocr
Tesseract's quality was really poor the last time I tried it, and the ABBYY server product is ridiculously expensive (and charges per page). LEADTOOLS has an OCR SDK, but it too is expensive. If you want to go relatively cheap on this (and, I don't know for sure, but you would probably break some licensing agreement with ABBYY), you could set up a web server with a $99 version of ABBYY FineReader with a hot folder set up to convert anything that is dropped into it to txt. You would then have to write the backend to keep track of the files that were submitted, let ABBYY convert them, and then show the results to the end user.

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eric Lease Morgan
Sent: Tuesday, March 12, 2013 2:16 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] web-based ocr

Thank you for the prompt replies. Call me cheap or unable to navigate the political/fiscal landscape, but I don't see myself subscribing to a service. Instead I see putting a wrapper around Tesseract, but alas, the wrappers are written in languages that I don't know. [1] Hmmm... On the Perl side, I am having problems installing Image::OCR::Tesseract.

[1] Wrappers - http://code.google.com/p/tesseract-ocr/wiki/AddOns

--
Eric "Still Cogitating" Morgan
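The bookkeeping backend Mike describes (track submissions, let the engine convert, show results) might start as a polling loop like this sketch. The converter is injected so the actual OCR step -- ABBYY watching the folder, or tesseract invoked via subprocess -- stays out of the sketch; all names here are mine.

```python
# Sketch of hot-folder bookkeeping: scan a drop directory, run each
# new file through a converter callable, and remember its status so
# the web frontend can report back to the submitter.
import os

def process_hotfolder(folder, convert, status=None, listdir=os.listdir):
    """status maps filename -> 'done' or 'failed'; returns it updated."""
    status = {} if status is None else status
    for name in sorted(listdir(folder)):
        if name in status:
            continue  # already handled on an earlier pass
        try:
            convert(os.path.join(folder, name))
            status[name] = "done"
        except Exception:
            status[name] = "failed"
    return status
```

Run on a timer (or via a filesystem-watching library), this gives the "keep track of the files that were submitted" half; the frontend just reads the status map.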
Re: [CODE4LIB] XML Parsing and Python
I ended up doing a regular expression find-and-replace function to replace all illegal XML characters with a dash or something. I was more disappointed by the fact that, on the XML creation end, minidom was able to create non-compliant XML files. I assumed that if minidom could make it, it would be compliant, but that doesn't seem to be the case. Now I have to add a find-and-replace function on the creation side to avoid this issue in the future. Good learning experience, I guess. Thanks for all your suggestions.

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Chris Beer
Sent: Tuesday, March 05, 2013 1:48 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] XML Parsing and Python

I'll note that 0x is a UTF-8 non-character, and these noncharacters "should never be included in text interchange between implementations." [1] I assume the OCR engine may be using 0x when it can't recognize a character? So, it's not wrong for a parser to complain (or, not complain) about 0x, and you can just scrub the string like Jon suggests.

Chris

[1] http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Noncharacters

On 5 Mar, 2013, at 9:16, Jon Stroop jstr...@princeton.edu wrote:

> Mike,
> I haven't used minidom extensively, but my guess is that doc.toprettyxml(indent=" ", encoding="utf-8") isn't actually changing the encoding because it can't parse the string in your content variable. I'm surprised that you're not getting tossed a UnicodeError, but the docs for Node.toxml() [1] might shed some light: "To avoid UnicodeError exceptions in case of unrepresentable text data, the encoding argument should be specified as utf-8." So what happens if you're not explicit about the encoding, i.e. just doc.toprettyxml()?
> This would hopefully at least move your exception to a more appropriate place. In any case, one solution would be to scrub the string in your content variable to get rid of the invalid characters (hopefully they're insignificant). Maybe something like this:
>
> def unicode_filter(char):
>     try:
>         unicode(char, encoding='utf-8', errors='strict')
>         return char
>     except UnicodeDecodeError:
>         return ''
>
> content = 'abc\xFF'
> content = ''.join(map(unicode_filter, content))
> print content
>
> Not really my area of expertise, but maybe worth a shot.
> -Jon
>
> 1. http://docs.python.org/2/library/xml.dom.minidom.html#xml.dom.minidom.Node.toxml
>
> --
> Jon Stroop
> Digital Initiatives Programmer/Analyst
> Princeton University Library
> jstr...@princeton.edu

On 03/04/2013 03:00 PM, Michael Beccaria wrote:

> I'm working on a project that takes the OCR data found in a PDF and places it in a custom XML file. I use Python scripts to create the XML file. Something like this (trimmed down a bit):
>
> from xml.dom.minidom import Document
>
> doc = Document()
> Page = doc.createElement("Page")
> doc.appendChild(Page)
> f = StringIO(txt)
> lines = f.readlines()
> for line in lines:
>     word = doc.createElement("String")
>     ...
>     word.setAttribute("CONTENT", content)
>     Page.appendChild(word)
> return doc.toprettyxml(indent=" ", encoding="utf-8")
>
> This creates a file, simply, that looks like this:
>
> <?xml version="1.0" encoding="utf-8"?>
> <Page HEIGHT="3296" WIDTH="2609">
>   <String CONTENT="BuffaloLaunch" />
>   <String CONTENT="Club" />
>   <String CONTENT="Offices" />
>   <String CONTENT="Installed" />
>   ...
> </Page>
>
> I am able to get this document to be created ok and saved to an XML file. The problem occurs when I try and have it read using the lxml library:
>
> from lxml import etree
> doc = etree.parse(filename)
>
> I am running across errors like "XMLSyntaxError: Char 0x out of allowed range, line 94, column 19." Which, when I look at the file, is true. There is a 0x character in the content field.
> How is a file able to be created using minidom (which I assume would create a valid XML file) and then fail when parsing with lxml? What should I do to fix this on the encoding side so that errors don't show up on the parsing side?
>
> Thanks,
> Mike
>
> Mike Beccaria
> Systems Librarian
> Head of Digital Initiative
> Paul Smith's College
> 518.327.6376
> mbecca...@paulsmiths.edu
> Become a friend of Paul Smith's Library on Facebook today!
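The regex find-and-replace Mike settled on might look like this sketch: strip (or replace) every character outside the XML 1.0 Char production before the text reaches minidom, so the emitted file parses cleanly on the other side.

```python
# Sketch: remove characters that XML 1.0 forbids. The character class
# is the complement of the XML 1.0 "Char" production (tab, LF, CR,
# and the valid Unicode ranges); everything else is scrubbed.
import re

_XML_ILLEGAL = re.compile(
    '[^\x09\x0a\x0d\x20-\ud7ff\ue000-\ufffd'
    '\U00010000-\U0010ffff]'
)

def scrub(text, replacement=''):
    """Replace XML-illegal characters (e.g. with '' or '-')."""
    return _XML_ILLEGAL.sub(replacement, text)
```

Applied to each CONTENT value on the creation side, this prevents the "Char out of allowed range" errors lxml raises on the parsing side.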
[CODE4LIB] XML Parsing and Python
I'm working on a project that takes the OCR data found in a PDF and places it in a custom XML file. I use Python scripts to create the XML file. Something like this (trimmed down a bit):

from xml.dom.minidom import Document

doc = Document()
Page = doc.createElement("Page")
doc.appendChild(Page)
f = StringIO(txt)
lines = f.readlines()
for line in lines:
    word = doc.createElement("String")
    ...
    word.setAttribute("CONTENT", content)
    Page.appendChild(word)
return doc.toprettyxml(indent=" ", encoding="utf-8")

This creates a file, simply, that looks like this:

<?xml version="1.0" encoding="utf-8"?>
<Page HEIGHT="3296" WIDTH="2609">
  <String CONTENT="BuffaloLaunch" />
  <String CONTENT="Club" />
  <String CONTENT="Offices" />
  <String CONTENT="Installed" />
  ...
</Page>

I am able to get this document created OK and saved to an XML file. The problem occurs when I try to have it read using the lxml library:

from lxml import etree
doc = etree.parse(filename)

I am running across errors like "XMLSyntaxError: Char 0x out of allowed range, line 94, column 19." Which, when I look at the file, is true. There is a 0x character in the content field. How is a file able to be created using minidom (which I assume would create a valid XML file) and then fail when parsing with lxml? What should I do to fix this on the encoding side so that errors don't show up on the parsing side?

Thanks,
Mike

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!
[CODE4LIB] OCR To ALTO without ABBYY
I inadvertently purchased ABBYY FineReader 11 Corporate thinking that it would be capable of outputting ALTO XML. I was wrong. ABBYY FineReader Engine does. :-/

Ultimately, I want to OCR some newspaper images and export them to ALTO XML and, until the proof of concept is done, I want to try to do it on the cheap. My plan this morning was to write some scripts to OCR them using Microsoft Office Document Imaging (MODI) and then export the results to ALTO XML, which could be a big project. Has anyone done this before, or does anyone know of a quick and dirty way to get some OCR data?

Thanks,
Mike Beccaria
Systems Librarian
Paul Smith's College
518.327.6376
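For the export side, a stripped-down sketch of the ALTO shape being targeted: words carried as String elements with CONTENT and pixel coordinates. Real ALTO requires the proper namespace, a Description section, and measurement-unit metadata; this is only the skeleton, with all helper names mine.

```python
# Stripped-down sketch of an ALTO-shaped document: one Page, one
# TextBlock/TextLine, and each OCR'd word as a <String> with its
# CONTENT and pixel box. Not schema-valid ALTO -- just the skeleton.
import xml.etree.ElementTree as ET

def words_to_alto(words, page_width, page_height):
    """words: iterable of (text, hpos, vpos, width, height) tuples."""
    alto = ET.Element("alto")
    page = ET.SubElement(ET.SubElement(alto, "Layout"), "Page",
                         WIDTH=str(page_width), HEIGHT=str(page_height))
    block = ET.SubElement(ET.SubElement(page, "PrintSpace"), "TextBlock")
    line = ET.SubElement(block, "TextLine")
    for text, hpos, vpos, w, h in words:
        ET.SubElement(line, "String", CONTENT=text, HPOS=str(hpos),
                      VPOS=str(vpos), WIDTH=str(w), HEIGHT=str(h))
    return ET.tostring(alto, encoding="unicode")
```

A real exporter would also group words into per-line TextLine elements using the engine's coordinates, which is where most of the "big project" Mike anticipates lives.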
Re: [CODE4LIB] Silently print (no GUI) in Windows
Wireless driverless-install print solutions usually do this, and I think the full version of Adobe Acrobat does this when it converts files from, say, Word to PDF: they automate a print job and print to a PDF-writer printer. This usually requires that whatever software is needed to print (i.e. Acrobat, Excel, Word, etc.) be installed on the machine. You could easily write a VBScript or PowerShell script to print them like so:

How to print a PDF file:

set oWsh = CreateObject("Wscript.Shell")
oWsh.run "Acrobat.exe /p /h FileName", , true

And a Word document:

Set objWord = CreateObject("Word.Application")
Set objDoc = objWord.Documents.Open("c:\scripts\inventory.doc")
objDoc.PrintOut()
objWord.Quit

Or, for Word documents, you can use the command line to print (via a batch file or other scripting program); refer to this: http://www.christowles.com/2011/04/microsoft-word-printing-from-command.html

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kyle Banerjee
Sent: Tuesday, April 03, 2012 3:17 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Silently print (no GUI) in Windows

Would Google Cloud Print be helpful? Otherwise, I think you may need to use multiple apps to actually print things (i.e. you actually need Word to print Word docs) unless the files are all converted. While at least in the case of Word, this can be done from the command line with switches, it actually invokes the whole program, which is a huge waste -- it's probably better to just have Office running and then have an Office Basic program scan for files and send them to the printer.

kyle

On Tue, Apr 3, 2012 at 11:48 AM, Kozlowski, Brendon bkozlow...@sals.edu wrote:

> Not a dumb question at all.
In this particular case, the receiving PC that is to be storing/printing the documents will be taking jobs from multiple networks, buildings, etc by either piping an email account, or downloading via a user's upload from a webpage. We already have a solution for catching jobs in the print spooler (not ours), but need to automate the sending of the documents to the spooler itself. The only way I've ever sent documents to the spooler was by opening up the full application (ex: Microsoft Word), and using the GUI to send the print job. Since the PC housing and releasing these files is expected to be un-manned and sit in a back room, we just need to be able to silently print the jobs in the background. Opening multiple applications over and over again would use up a lot of resources, so a silent, no-GUI option would be the best from my very little understanding - if it's even possible. Brendon Kozlowski Web Administrator Saratoga Springs Public Library 49 Henry Street Saratoga Springs, NY, 12866 [518] 584-7860 x217 From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Kyle Banerjee [baner...@uoregon.edu] Sent: Tuesday, April 03, 2012 1:25 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Silently print (no GUI) in Windows At the risk of asking a dumb question, why wouldn't a print server meet your use case if the print jobs come from elsewhere? kyle On Tue, Apr 3, 2012 at 9:15 AM, Kozlowski,Brendon bkozlow...@sals.edu wrote: I'm curious to know if anyone has discovered ways of silently printing documents from such Windows applications as: - Acrobat Reader (current version) - Microsoft Office 2007 (Word, Excel, Powerpoint, Visio, etc...) - Windows Picture and Fax Viewer I unfortunately haven't had much luck finding any resources on this. I'd like to be able to receive documents in a queue like fashion to a single PC and simply print them off as they arrive. 
However, automating the loading/exiting of the full-blown application each time, and on-demand, seems a little too cumbersome and unnecessary. I have not yet decided on whether I'd be scripting it (PHP, AutoIT, batch files, VBS, Powershell, etc...) or learning and then writing a .NET application. If .NET solutions use the COM object, the scripting becomes a potential candidate. Unfortunately I need to know how, or even whether, it's possible to do first. Thank you for any and all feedback or assistance. Brendon Kozlowski Web Administrator Saratoga Springs Public Library 49 Henry Street Saratoga Springs, NY, 12866 [518] 584-7860 x217 Please consider the environment before printing this message. To report this message as spam, offensive, or if you feel you have received this in error, please send e-mail to ab...@sals.edu including the entire contents and subject of the message. It will be reviewed by staff and acted upon appropriately. --
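Pulling the thread's pieces together, a watch-folder dispatcher could be sketched in Python roughly like this. The extension-to-command table is an assumption: the Acrobat `/p /h` switches come from the thread, the Word switches follow the command-line-printing article linked above, and the executable names/paths will differ per machine.

```python
import subprocess
from pathlib import Path

# Hypothetical extension-to-command table; executables and switches must
# match what is actually installed on the spool PC.
PRINT_COMMANDS = {
    ".pdf": ["Acrobat.exe", "/p", "/h"],
    ".doc": ["winword.exe", "/q", "/n", "/mFilePrintDefault", "/mFileExit"],
}

def build_print_command(path):
    """Return the silent-print command line for *path*, or None if the
    extension has no registered handler."""
    cmd = PRINT_COMMANDS.get(Path(path).suffix.lower())
    return cmd + [str(path)] if cmd else None

def drain_spool(spool_dir, run=subprocess.run):
    """Print, then delete, every recognized file sitting in the spool folder."""
    for f in sorted(Path(spool_dir).iterdir()):
        cmd = build_print_command(f)
        if cmd:
            run(cmd, check=True)
            f.unlink()
```

Run `drain_spool()` from a scheduled task every minute or so and the back-room PC never needs a GUI session; unrecognized files are simply left in place for a human to inspect.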
Re: [CODE4LIB] image zoom for iPad
I thought Microsoft released a Seadragon mobile app a while back as well. I remember playing with it on my iPod Touch. Mike Beccaria Systems Librarian Head of Digital Initiative Paul Smith's College 518.327.6376 mbecca...@paulsmiths.edu Become a friend of Paul Smith's Library on Facebook today! -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Friscia, Michael Sent: Monday, January 30, 2012 4:20 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] image zoom for iPad Hi all, I'm wondering if anyone can recommend an image zoom option for the iPad that provides functionality like Zoomify/Seadragon but works on the iPad. I'm hoping for some Ajax/jQuery library I never heard of that will work and provide good functionality. Or maybe I'm doing something wrong and my use of Seadragon would be better if I did x, y and z... Any thoughts or suggestions that do not include "don't do zoom" would be greatly appreciated. Thanks, -mike ___ Michael Friscia Manager, Digital Library Programming Services Yale University Library (203) 432-1856
Re: [CODE4LIB] NEcode4lib?
I'd be very interested in going. Yale is a good location for me. Mike Beccaria Systems Librarian Head of Digital Initiative Paul Smith's College 518.327.6376 mbecca...@paulsmiths.edu Become a friend of Paul Smith's Library on Facebook today! -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Joseph Montibello Sent: Friday, December 16, 2011 9:42 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] NEcode4lib? Hi, It looks like there was a New England regional a couple of years ago. Is there still any activity/interest in this region? I can imagine that in addition to folks who missed the registration power-hour, there might be a significant group that can't get their library to support a trip to Seattle. Just curious. Joe Montibello, MLIS Library Systems Manager Dartmouth College Library 603.646.9394 joseph.montibe...@dartmouth.edumailto:joseph.montibe...@dartmouth.edu
Re: [CODE4LIB] Examples of visual searching or browsing
Microsoft Labs released a few visualization tools a while back that might be of interest: 1. Deep Zoom: quickly explore gigapixel-sized images and collections of images. http://www.microsoft.com/silverlight/deep-zoom/ Here's one of my favorite examples of Yosemite valley: http://www.xrez.com/yose_proj/yose_deepzo And the classic Hard Rock Memorabilia site: http://memorabilia.hardrock.com/ 2. Pivot: http://www.microsoft.com/silverlight/pivotviewer/ 3. Photosynth - really innovative but perhaps limited in its scope. http://photosynth.net/ Mike Beccaria Systems Librarian Head of Digital Initiative Paul Smith's College 518.327.6376 mbecca...@paulsmiths.edu Become a friend of Paul Smith's Library on Facebook today! -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Julia Bauder Sent: Thursday, October 27, 2011 4:27 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Examples of visual searching or browsing Dear fans of cool Web-ness, I'm looking for examples of projects that use visual (= largely non-text and non-numeric) interfaces to let patrons browse/search collections. Things like the GeoSearch on North Carolina Maps[1], or projects that use Simile's Timeline or Exhibit widgets[2] to provide access to collections (e.g., what's described here: https://letterpress.uchicago.edu/index.php/jdhcs/article/download/59/70), or in-the-wild uses of Recollection[3]. I'm less interested in knowing about tools (although I'm never *uninterested* in finding out about cool tools) than about production or close-to-production sites that are making good use of these or similar tools to provide visual, non-linear access to collections. Who's doing slick stuff in this area that deserves a look? Thanks! 
Julia [1] http://dc.lib.unc.edu/ncmaps/search.php [2] http://www.simile-widgets.org/ [3] http://recollection.zepheira.com/ * Julia Bauder Data Services Librarian Interim Director of the Data Analysis and Social Inquiry Lab (DASIL) Grinnell College Libraries Sixth Ave. Grinnell, IA 50112 641-269-4431
Re: [CODE4LIB] id services from loc
While not exactly what you're looking for, the OCLC Collection Analysis documentation contains a file on how they map LC\Dewey call numbers to subjects. I'm not sure if they are LC subject headings, though. Click OCLC Conspectus on this page for the Excel sheet: http://www.oclc.org/support/documentation/collectionanalysis/default.htm Mike Beccaria Systems Librarian Paul Smith's College 518.327.6376 From: Code for Libraries on behalf of Enrico Silterra Sent: Tue 10/18/2011 2:11 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] id services from loc is there any way to go from a LC call number, like DF853 to http://id.loc.gov/authorities/subjects/sh85057107 via some sort of api? opensearch? thanks, rick -- Enrico Silterra Software Engineer 501 Olin Library Cornell University Ithaca NY 14853 Voice: 607-255-6851 Fax: 607-255-6110 E-mail: es...@cornell.edu http://www.library.cornell.edu/dlit Out of the crooked timber of humanity no straight thing was ever made CONFIDENTIALITY NOTE The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and destroy any copies of this document.
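As a toy illustration of the call-number-to-subject mapping idea, the lookup is just a longest-prefix match on the alphabetic class. The rows below are invented stand-ins for the much larger Conspectus spreadsheet:

```python
import re

# Invented sample rows; the real OCLC Conspectus mapping is far larger.
LC_RANGES = [
    ("D", "History (General)"),
    ("DF", "History of Greece"),
    ("QA", "Mathematics"),
]

def subject_for_call_number(call_number):
    """Map an LC call number like 'DF853' to a broad subject label."""
    m = re.match(r"[A-Z]{1,3}", call_number.upper())
    if not m:
        return None
    prefix = m.group(0)
    # Prefer the most specific (longest) class that matches.
    for cls, subject in sorted(LC_RANGES, key=lambda r: -len(r[0])):
        if prefix.startswith(cls):
            return subject
    return None
```

With a table like this, `subject_for_call_number("DF853")` lands on the DF row rather than the broader D row; getting from there to an id.loc.gov URI would still need a subject-heading lookup on top.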
[CODE4LIB] Software for Capstone\Theses Projects
I've been looking for an out-of-the-box solution to archive capstone\theses projects and make them accessible to web users. The caveat being that when the author submits the paper, they would be able to provide permissions and metadata for the document (copyright and access) and, based on those permissions, either the entire document would be made public or only the metadata. I know that there are large repository software packages like DSpace or Fedora Commons that probably do this, but I was looking for something smaller. I don't need to scale to millions of documents and have all of the potential bells and whistles. Just something that lets people create an account, upload, set permissions and then have documents show up in the search interface. Anything like this around? Mike Beccaria Systems Librarian Head of Digital Initiative Paul Smith's College 518.327.6376 mbecca...@paulsmiths.edu
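The submit-time permission logic described above is simple enough to sketch. The record shape, field names, and access-flag values here are invented for illustration:

```python
# Invented record shape: each submission stores its content plus an
# author-chosen access flag ("public" or "metadata-only").
def public_view(record):
    """Return what an anonymous web user may see of a submission."""
    metadata = {k: record[k] for k in ("title", "author", "year")}
    if record["access"] == "public":
        return {**metadata, "fulltext": record["fulltext"]}
    return metadata  # access restricted: surface the metadata only
```

Whatever package ends up doing the storage, the search index would be fed from `public_view()`-style output so restricted full text never leaks into results.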
Re: [CODE4LIB] Free/Open OCR solutions?
No, other than it is possible to do so: http://office.microsoft.com/en-us/help/about-ocr-international-issues-HP003081238.aspx Mike Beccaria Systems Librarian Head of Digital Initiative Paul Smith's College 518.327.6376 mbecca...@paulsmiths.edu -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of stuart yeates Sent: Monday, August 02, 2010 4:46 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Free/Open OCR solutions? Michael Beccaria wrote: Andrew, If you have MS Office, Microsoft has an OCR engine built in. I used it to OCR some college yearbooks at MPOW. It's not ABBYY but it works pretty well! It's scriptable using VBScript or your MS language of choice. http://msdn.microsoft.com/en-us/library/aa167607(office.11).aspx Notice the OCR method in the document. Could someone comment on the efficacy of this OCR on languages with non-latin characters? cheers stuart -- Stuart Yeates http://www.nzetc.org/ New Zealand Electronic Text Centre http://researcharchive.vuw.ac.nz/ Institutional Repository
Re: [CODE4LIB] New books RSS feed / badge with cover images?
Laura, While not directly related to your question, another route you might want to go is to try to get thumbnail images from other sources (i.e. Amazon, Open Library, etc.) in addition to Syndetics, and cache them for future use. I do this on our website, and I wrote an article on the basics here: (http://journal.code4lib.org/articles/1009). I can send you the code I have on my server for our complete solution if you want. Here's our new books page, database driven with local thumbnails (http://www.paulsmiths.edu/library/books.php). I also use the same data to send out a new books email every 2 weeks to subscribers. You can sign up here to get a sample (http://library.paulsmiths.edu/newbooks/subscribe.php) and then unsubscribe in your 1st email if you want. Again, if you want the Python code that generates and sends the emails, let me know. Also, if you're a PHP fan, VuFind has some code on downloading book covers here: https://vufind.svn.sourceforge.net/svnroot/vufind/trunk/web/bookcover.php Hope that helps in a roundabout way. Be Well, Mike Beccaria Systems Librarian Head of Digital Initiative Paul Smith's College 518.327.6376 mbecca...@paulsmiths.edu -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Laura Harris Sent: Friday, April 09, 2010 10:07 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] New books RSS feed / badge with cover images? Hi, all - I suspect something like this is being done already, so I thought I would check in and ask. Essentially, what I would like to do is display the library's new books on a web page in a graphic format - I'd like it to look very similar to the sorts of widgets that GoodReads or LibraryThing users can create. 
I threw up a few quick examples here: http://gvsu.edu/library/zzwidget-test-171.htm Now, we have an RSS feed for our new books (Millennium is our ILS if it matters), and as I understand it, the images we get from Syndetic Solutions are parsed as enclosures to that RSS feed. Is there a way to take the RSS feed, and only show those enclosures (if they exist, and are not the default grey box we see if the book doesn't have a cover image) somehow? Or perhaps there's a really easy way to do this that I'm overlooking. Would appreciate your insight! Thanks,
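The fetch-and-cache strategy Mike describes can be sketched with Open Library's cover endpoint (the URL pattern is real); the cache folder name and the absence of fallback sources (Amazon, Syndetics) are simplifications for this sketch:

```python
import urllib.request
from pathlib import Path

CACHE_DIR = Path("covers")  # assumed local cache folder

def cover_url(isbn, size="M"):
    """Open Library cover endpoint; size is S, M, or L."""
    return "https://covers.openlibrary.org/b/isbn/%s-%s.jpg" % (isbn, size)

def cached_cover(isbn, size="M"):
    """Return a local path to the thumbnail, downloading it on a cache miss."""
    CACHE_DIR.mkdir(exist_ok=True)
    target = CACHE_DIR / ("%s-%s.jpg" % (isbn, size))
    if not target.exists():
        with urllib.request.urlopen(cover_url(isbn, size)) as resp:
            target.write_bytes(resp.read())
    return target
```

After the first hit, every page render serves the image locally, which keeps the new-books page fast and avoids hammering the upstream service.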
Re: [CODE4LIB] Conference followup; open position at Google Cambridge
Will, I didn't get a chance to attend code4lib this year, but thought I would respond via the list. At Paul Smith's College we are using Google Books thumbnails for our catalog as well as using the embedded viewers to allow users to view previews of available books. Here is a sample search: http://library.paulsmiths.edu/catalog/search/?q=forest&index=text&sort=&limit=pubdaterange:2000-2009&facetclick=pubdaterange We will be switching over to VuFind this summer and I will likely use GB in a similar way with that interface as well. I plan (hopefully this summer) to build a web service that uses OCLC Web Services, Open Library, Hathi Trust, and Google Books to search for and return similar items from those resources to display in our catalog. I really like the service overall. Mike Beccaria Systems Librarian Head of Digital Initiative Paul Smith's College 518.327.6376 mbecca...@paulsmiths.edu -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Will Brockman Sent: Tuesday, March 09, 2010 5:02 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Conference followup; open position at Google Cambridge As a first-time Code4Lib attendee, let me say thanks for a fun conference - a very interesting and creative group of people! A question I posed to some of you in person, and would love to hear more answers to: What are you doing with Google Books? Do you have a new way of using that resource? Are there things you'd like to do with it that aren't possible yet? Also, a couple of people asked if Google is hiring. Not only are we hiring large numbers of software engineers, but we're now seeking a librarian / software developer (below). I'm happy to take questions about either. All the best, Will brock...@google Metadata Analyst Google Books is looking for a hybrid librarian/software developer to help us organize all the world's books. 
This person would work closely with software engineers and librarians on a variety of tasks, ranging from algorithm evaluation to designing and implementing improvements to Google Books. Candidates should have: * An MLS or MLIS degree, ideally with cataloguing experience * Programming experience in Python, C++, or Java * Project management experience a plus, but not required This position is full-time and based in Cambridge, MA.
Re: [CODE4LIB] University of Rochester Releases IR+ Institutional Repository System
Nathan, Can you summarize how the IR+ software is different from other major institutional repository software? I'm not directly involved with a repository and so my understanding of the scope of these products lacks detail. Where does IR+ fit into the big picture? Thanks, Mike Beccaria Systems Librarian Head of Digital Initiatives Paul Smith's College 518.327.6376 -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Sarr, Nathan Sent: Tuesday, December 15, 2009 2:57 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] University of Rochester Releases IR+ Institutional Repository System The University of Rochester is pleased to announce the 1.0 production version of its new open source institutional repository software platform, IR+. The University has been running IR+ in production since August 2009. The download can be found here: http://code.google.com/p/irplus/downloads/list The website for the project can be found here: http://www.irplus.org IR+ includes the following features: - Repository Wide Statistics: download counts at the repository collection and publication level. The statistics exclude web crawler results, and include the ability to retroactively remove previously unknown crawlers or download counts that should not be included, for more accurate statistical reporting. - Researcher Pages, to allow users (faculty, graduate students, researchers) to highlight their work and post their CV o Example of a current researcher: https://urresearch.rochester.edu/viewResearcherPage.action?researcherId=30 - Ability to create Personal publications that allow users to have full control over their work and see download counts without publishing into the repository. 
- An online workspace where users can store files they are working on, and if needed, share files with colleagues or their thesis advisor. - Contributor pages where users can view download counts for all publications that they are associated with in the repository. o Example of a contributor page: https://urresearch.rochester.edu/viewContributorPage.action?personNameId=20 - Faceted Searching (example search for: Graduate Student Research) o https://urresearch.rochester.edu/searchRepositoryItems.action?query=Medical+Image - Embargos (example below embargoed until 2011-01-01) o https://urresearch.rochester.edu/institutionalPublicationPublicView.action?institutionalItemId=8057 - Name Authority Control (Notice changes in last name) o https://urresearch.rochester.edu/viewContributorPage.action?personNameId=209 You can see the IR+ system customized for our university and in action here: https://urresearch.rochester.edu A further explanation of highlights can be found on my researcher page here: https://urresearch.rochester.edu/researcherPublicationView.action?researcherPublicationId=11 The documentation for the system (install/user/administration) with lots of pictures can be found on my researcher page here: https://urresearch.rochester.edu/researcherPublicationView.action?researcherPublicationId=16 We would be happy to give you a personal tour of the system and the features it provides. 
Please feel free to contact me with any questions you may have. -Nate Nathan Sarr Senior Software Engineer River Campus Libraries University of Rochester Rochester, NY 14627 (585) 275-0692
Re: [CODE4LIB] holdings standards/protocols
VuFind has a connector that works pretty well for SirsiDynix Unicorn/Symphony users. It leverages a server-side script on the ILS (Perl, I think) to interface with the API to get holdings data. It is possible to get account data the same way, though it hasn't been developed. You can see it run here on our beta install: http://library.paulsmiths.edu/vufind/Search/Home?lookfor=dog&type=all&submit=Find Wait a few seconds following page load and the holdings data should update. Mike Beccaria Systems Librarian Head of Digital Initiatives Paul Smith's College 518.327.6376 -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Chris Keene Sent: Monday, November 16, 2009 9:05 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] holdings standards/protocols Hi We recently implemented a new third party web catalogue (Aquabrowser). So far so good, and separating the web based discovery layer from the monolithic ILS seems to be the right direction. However there seem to be two areas of weakness: Holdings and 'My Account' (renewal, reservations). i.e. the need for *any* catalogue/discovery system to allow users to see holdings and account info from *any* ILS. I'm trying to get my facts right on the current situation. Any help appreciated. re Holdings. Two things come up when asking around and looking on the web (somewhat briefly), Z39.50 and ISO 20775. Can anyone give me an idea if any/many/all (ILS) Z implementations have implemented the holdings information? Is there a way of testing this using a client such as yaz (e.g. a worked example of seeing holdings via Z) Is there interest from ILS suppliers in the ISO holdings standard, are any of them implementing it? http://www.loc.gov/standards/iso20775/ http://www.portia.dk/zholdings/ http://www.nlbconference.com/ilds/plenary4B.htm Thanks for any info Chris -- Chris Keene c.j.ke...@sussex.ac.uk Technical Development Manager Tel (01273) 877950 University of Sussex Library http://www.sussex.ac.uk/library/
Re: [CODE4LIB] preconference proposals - solr
I can't make it to c4l this year :( But knowing that the preconferences are really very valuable, if there is a way that this information could be recorded and placed online like the main presentations that would be amazing! Mike Beccaria Systems Librarian Head of Digital Initiatives Paul Smith's College 518.327.6376 -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Bess Sadler Sent: Friday, November 13, 2009 11:26 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] preconference proposals - solr Hey, how about this? I've been discussing this off list with Erik and Naomi and this is what we came up with (I also added it to the wiki): This is a proposal for several pre-conference sessions that would fit together nicely for people interested in implementing a next-gen catalog system. 1. Morning session - solr white belt Instructor: Bess Sadler (anyone else want to join me?) The journey of solr mastery begins with installation. We will then proceed to data types, indexing, querying, and inner harmony. You will leave this session with enough information to start running a solr service with your own data. 2. Morning session - solr black belt Instructors: Erik Hatcher (and Naomi Dushay? she has offered to help, if that's of interest) Amaze your friends with your ability to combine boolean and weighted searching. Confound your enemies with your mastery of the secrets of dismax. Leave slow queries in the dust as you performance tune solr within an inch of its life. [We should probably add more specific advanced topics here... suggestions welcome] 3. Afternoon session - Blacklight Instructors: Naomi Dushay, Jessie Keck, and Bess Sadler Apply your solr skills to running Blacklight as a front end for your library catalog, institutional repository, or anything you can index into solr. We'll cover installation, source control with git, local modifications, test driving development, and writing object-specific behaviors. 
You'll leave this workshop ready to revolutionize discovery at your library. Solr white belts or black belts are welcome. And then anyone else who had a topic that built on solr (e.g., vufind?) could add it in the afternoon. Obviously I'm biased, but I really do think the topic of implementing a next gen catalog is meaty enough for a half day and I know people are asking me about it and eager to attend such a thing. What do you think, folks? Bess On 12-Nov-09, at 4:10 PM, Gabriel Farrell wrote: On Tue, Nov 10, 2009 at 02:47:42PM +, Jodi Schneider wrote: If you'd be up for it Erik, I'd envision a basic session in the morning. Some of us (like me) have never gotten Solr up and running. Then the afternoon could break off for an advanced session. Though I like Bess's idea, too! Would that be suitable for a conference breakout? Not sure I'd want to pit it against Solr advanced session! The preconfs should be as inclusive as possible, but I'm wondering if the Solr session might be more beneficial if we dive into the particulars right off the bat in the morning. There are only a few steps to get Solr up and running -- it's in the configuration for our custom needs that the advice of a certain Mr. Hatcher can really be helpful. You're right, though, that the NGC thing sounds more like a BOF session. I'd support that in order to attend a full preconf day of Solr. Gabriel Elizabeth (Bess) Sadler Chief Architect for the Online Library Environment Box 400129 Alderman Library University of Virginia Charlottesville, VA 22904 b...@virginia.edu (434) 243-2305
[CODE4LIB] SerialsSolutions Javascript Question
I was intrigued by someone who posted to the WorldCat Developers Network forum. They were asking about the xISSN service and having it return whether an ISSN is peer reviewed or not. Which got me thinking... Has anyone been able to finagle a feature into their SerialsSolutions A-Z list where it shows peer-reviewed status for the titles that are returned, using a WC service? SS allows only limited editing of their page, so the JavaScript question is this: if I can edit ONLY the header, is it possible to alter span tags of a loaded web page using JavaScript? Can I insert some JavaScript there that will scrape the ISSN from these span tags and add some dynamic content from a web service using JavaScript alone? I'm not very proficient at JavaScript, so be gentle. Mike Beccaria Systems Librarian Head of Digital Initiatives Paul Smith's College 518.327.6376
Re: [CODE4LIB] SerialsSolutions Javascript Question
I should clarify. The most granular piece of information in the HTML is a class attribute (i.e. there is no id). Here is a snippet:

<div class="SS_Holding" style="background-color: #CECECE">
<!-- Journal Information -->
<span class="SS_JournalTitle"><strong>Annals of forest science.</strong></span>&nbsp;<span class="SS_JournalISSN">(1286-4560)</span>

I want to alter the <span class="SS_JournalISSN">(1286-4560)</span> section. Maybe add some HTML after the ISSN that tells whether it is peer reviewed or not. Mike Beccaria Systems Librarian Head of Digital Initiatives Paul Smith's College 518.327.6376 -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Michael Beccaria Sent: Wednesday, October 28, 2009 9:13 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] SerialsSolutions Javascript Question I was intrigued by someone who posted to the Worldcat Developers Network forum. They were asking about the xISSN service and having it return whether an ISSN is peer reviewed or not. Which got me thinking...Has anyone been able to finagle a feature into their SerialsSolutions A-Z list where it shows peer reviewed status for the titles that are returned using a WC service? SS has limited editing capabilities on their page so the javascript question is this: Is it possible when being able to edit ONLY the header to alter span tags of a loaded web page using javascript? Can I insert some javascript in those sections that will scrape the ISSN number from these span tags and add some dynamic content from a web service using javascript alone? I'm not very proficient at javascript so be gentle. Mike Beccaria Systems Librarian Head of Digital Initiatives Paul Smith's College 518.327.6376
Re: [CODE4LIB] OCR PDFs
It's not exactly what you're looking for, but Microsoft Office comes with a scriptable OCR engine that works on TIFFs. I use it to get text from yearbooks we are scanning so people can look for names and such. While I wouldn't put it on par with ABBYY, it does a pretty decent job. I wrote a simple script in VBScript that scans all the TIFF files in a folder and exports a txt file with the same name as the image that has all of the text it finds. If you want it, let me know and I'll send it your way. Mike Beccaria Systems Librarian Head of Digital Initiatives Paul Smith's College 518.327.6376 [EMAIL PROTECTED] --- This message may contain confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of James Tuttle Sent: Friday, October 17, 2008 7:57 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] OCR PDFs -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 I wonder if any of you might have experience with creating text PDFs from TIFFs. I've been using tiffcp to stitch TIFFs together into a single image and then using tiff2pdf to generate PDFs from the single TIFF. I've had to pass this image-based PDF to someone with Acrobat to use its batch processing facility to OCR the text and save a text-based PDF. I wonder if anyone has suggestions for software I can integrate into the script (Python on Linux) I'm using. 
Thanks, James - -- - --- James Tuttle Digital Repository Librarian NCSU Libraries, Box 7111 North Carolina State University Raleigh, NC 27695-7111 [EMAIL PROTECTED] (919)513-0651 Phone (919)515-3031 Fax -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFI+H1zKxpLzx+LOWMRAgxIAJwNXyeMJbk6r6hmHpNAdEvWIQbCVgCgp8JR nyS3WZ4UuRbU/6DTH7ohe/M= =mT2T -END PGP SIGNATURE-
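For James's Python-on-Linux pipeline, the open-source Tesseract engine is a common substitute for the Acrobat step; this sketch assumes the `tesseract` binary is on the PATH and simply batches a folder of TIFFs into per-image text files:

```python
import subprocess
from pathlib import Path

def ocr_command(tiff_path, out_base):
    """Tesseract writes out_base.txt alongside the scanned image."""
    return ["tesseract", str(tiff_path), str(out_base)]

def ocr_folder(folder, run=subprocess.run):
    """OCR every .tif in *folder*, producing one .txt per image."""
    for tiff in sorted(Path(folder).glob("*.tif")):
        run(ocr_command(tiff, tiff.with_suffix("")), check=True)
```

Newer Tesseract releases can also emit a searchable PDF directly (a `pdf` output option), which would replace the Acrobat batch step entirely rather than just the text extraction.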
Re: [CODE4LIB] marc4j 2.4 released
Very cool! I noticed that a feature, MarcDirStreamReader, is capable of iterating over all marc record files in a given directory. Does anyone know of any de-duplicating efforts done with marc4j? For example, libraries that have similar holdings would have their records merged into one record with a location tag somewhere. I know places do it (consortia etc.) but I haven't been able to find a good open program that handles stuff like that. Mike Beccaria Systems Librarian Head of Digital Initiatives Paul Smith's College 518.327.6376 [EMAIL PROTECTED] --- This message may contain confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Bess Sadler Sent: Monday, October 20, 2008 11:12 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] marc4j 2.4 released Dear Code4Libbers, I'm very pleased to announce that for the first time in almost two years there has been a new release of marc4j. Release 2.4 is a minor release in the sense that it shouldn't break any existing code, but it's a major release in the sense that it represents an influx of new people into the development of this project, and a significant improvement in marc4j's ability to handle malformed or mis-encoded marc records. Release notes are here: http://marc4j.tigris.org/files/documents/ 220/44060/changes.txt And the project website, including download links, is here: http:// marc4j.tigris.org/ We've been using this new marc4j code in solrmarc since solrmarc started, so if you're using Blacklight or VuFind, you're probably using it already, just in an unreleased form. 
Bravo to Bob Haschart, Wayne Graham, and Bas Peters for making these improvements to marc4j and getting this release out the door. Bess Elizabeth (Bess) Sadler Research and Development Librarian Digital Scholarship Services Box 400129 Alderman Library University of Virginia Charlottesville, VA 22904 [EMAIL PROTECTED] (434) 243-2305
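On the de-duplication question, the merge itself is straightforward once you pick a match key; this sketch uses plain dicts (an invented record shape) where a real implementation would read MARC fields via marc4j or pymarc, typically keying on OCLC number or a normalized ISBN:

```python
def deduplicate(records, key="oclc"):
    """Collapse records that share *key*, unioning their holding locations."""
    merged = {}
    for rec in records:
        existing = merged.get(rec[key])
        if existing:
            existing["locations"].extend(rec["locations"])
        else:
            merged[rec[key]] = {key: rec[key], "locations": list(rec["locations"])}
    return list(merged.values())
```

The hard part in consortial practice is not this merge but choosing the match key and deciding which copy's bibliographic fields win; fuzzy title/author matching is where the real work goes.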
[CODE4LIB] Google Books Dynamic Links API and Python
Not everyone will care, but I will put it here for posterity's sake, and probably for my own reference when I forget in the future. I was having trouble getting the new Google Books Dynamic Links API to work right with Python (http://code.google.com/apis/books/docs/dynamic-links.html). I was using the basic urllib Python library with a non-working code base that looked like this:

import urllib, urllib2
gparams = urllib.urlencode({'bibkeys': 'ISBN:061837943', 'jscmd': 'viewapi', 'callback': 'mycallback'})
g = urllib2.urlopen(url="http://books.google.com/books?%s" % gparams)
print g.read()

I was getting an HTTP 401 error, Unauthorized. Code4lib IRC folks told me it was probably the headers urllib was sending, and they were right. I wrote code to modify the headers to make Google believe I was requesting from Firefox. The working code is below. I know most of you can write this stuff in your sleep, but I thought this might save someone like me some time in the end. Hope it helps, Mike Beccaria Systems Librarian Head of Digital Initiatives Paul Smith's College 518.327.6376 [EMAIL PROTECTED]

import urllib, urllib2
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
params = urllib.urlencode({'bibkeys': 'ISBN:061837943', 'jscmd': 'viewapi', 'callback': 'mycallback'})
request = urllib2.Request('http://books.google.com/books?bibkeys=0618379436&jscmd=viewapi&callback=mycallback')
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')]
data = opener.open(request).read()
print data
Re: [CODE4LIB] Google Books Dynamic Links API and Python
Scratch that, the code is simpler. Serves me right for not checking things twice:

import urllib, urllib2
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
request = urllib2.Request('http://books.google.com/books?bibkeys=0618379436&jscmd=viewapi&callback=mycallback')
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')]
data = opener.open(request).read()
print data

Mike Beccaria Systems Librarian Head of Digital Initiatives Paul Smith's College 518.327.6376 [EMAIL PROTECTED]
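For later readers: urllib and urllib2 were merged into urllib.request in Python 3, so the same User-Agent trick looks like this (URL and User-Agent string as in the thread; the network call itself is left commented out):

```python
import urllib.request

url = ("http://books.google.com/books"
       "?bibkeys=0618379436&jscmd=viewapi&callback=mycallback")
req = urllib.request.Request(url, headers={
    "User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; "
                  "rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3",
})
# data = urllib.request.urlopen(req).read()  # network call, run as needed
```

Passing headers at Request construction time replaces the opener.addheaders dance from the Python 2 version.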
Re: [CODE4LIB] Free covers from Google
If you can find a public email address or comment form anywhere, let us know. In the meantime, you can send a response to them here: http://www.google.com/support/librariancenter/bin/request.py The form seems to be aimed at librarians, so maybe they'll understand the issue and talk to people who may be able to make a change. Mike Beccaria Systems Librarian Head of Digital Initiatives Paul Smith's College 518.327.6376 [EMAIL PROTECTED]
[CODE4LIB] Whatbird Interface Framework
Hey all, I'm considering trying to create a framework/tool that lets people build a whatbird.com-like interface for other types of datasets (plants, trees, anything really). The idea is a framework for building discovery tools with attribute selections that narrow down the result set. So, for example, our faculty/students would identify attributes found in all trees (leaf shape, fruit, bark, form, etc.) and input those into the tool, which would then let them enter actual trees and associate them with the attributes (as well as enter description info, pictures, etc.). The end result would look something like what whatbird.com does with birds. This will be a challenge for me (but a good one). My thought is to use a web framework like Django (picked because I know it a little), but I'm unsure whether it can organize the database tables with the relationships properly. I considered using Solr but thought it would be overkill given the relatively small datasets this tool would be used to create (under 1000 objects); in the end, though, it might be a good bet. If approved (I have to talk to the dean of our forestry department to see if he will buy into the idea), I will try to create the bulk of it during January and tweak it the rest of the semester. Anyone interested in working on this type of project with me? Mike Beccaria Systems Librarian Head of Digital Initiatives Paul Smith's College 518.327.6376 [EMAIL PROTECTED]
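In case it helps with sketching the data model: the attribute-narrowing idea can be illustrated framework-free in a few lines of plain Python. The tree names and attribute values below are invented for illustration; in Django this would presumably map to Attribute/Value tables joined to a Species table via many-to-many relationships rather than flat dicts:

```python
# Each record maps attribute names to values; each user selection narrows
# the candidate set, as in whatbird.com's step-by-step identification UI.
# All species and attribute values here are made up for illustration.
TREES = [
    {'name': 'Sugar Maple', 'leaf': 'lobed', 'fruit': 'samara', 'bark': 'furrowed'},
    {'name': 'Paper Birch', 'leaf': 'toothed', 'fruit': 'catkin', 'bark': 'peeling'},
    {'name': 'Red Oak', 'leaf': 'lobed', 'fruit': 'acorn', 'bark': 'furrowed'},
]

def narrow(records, **selected):
    """Keep only records matching every attribute the user has picked so far."""
    return [r for r in records
            if all(r.get(attr) == value for attr, value in selected.items())]

print([t['name'] for t in narrow(TREES, leaf='lobed')])
# -> ['Sugar Maple', 'Red Oak']
print([t['name'] for t in narrow(TREES, leaf='lobed', fruit='acorn')])
# -> ['Red Oak']
```

Each additional selection is just another filter term, which is why a relational schema (or even Solr facets, if the project grows) maps onto this so naturally.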
[CODE4LIB] Preconference Location
I noticed that the pre-conference location was changed to the Tate Student Center. Is this near the Georgia Center Hotel? Come to think of it, is the Georgia Center for Continuing Education nearby as well? I'm asking because I won't have a car...just that shuttle to and from the airport. Thanks, Mike Mike Beccaria Systems Librarian Head of Digital Initiatives Paul Smith's College 518.327.6376 [EMAIL PROTECTED]