Re: [CODE4LIB] Google can give you answers, but librarians give you the right answers

2016-04-04 Thread Michael Beccaria
I would suggest that librarians are interested in, among other things, 
promoting information literacy skills to our patrons. According to ACRL's 
Standards for Information Literacy in Higher Education (2000 edition):
http://www.ala.org/acrl/sites/ala.org.acrl/files/content/standards/standards.pdf

An information literate individual is able to:
-Determine the nature and extent of information needed
-Access the needed information effectively and efficiently
-Evaluate information and its sources critically
-Incorporate selected information into one’s knowledge base
-Use information effectively to accomplish a specific purpose
-Understand the economic, legal, and social issues surrounding the use of 
information, and access and use information ethically and legally

Given that Google can't provide the contextual nature of the information it 
presents, and that people vary in their levels of information literacy, a 
librarian, presumably with an advanced skillset and knowledge base in this 
area, can provide assistance and context for what a patron might need. In 
that sense, I think a librarian can often add tremendous value to a search.

Mike Beccaria
Director of Library Services
Paul Smith’s College
7833 New York 30
Paul Smiths, NY 12970
518.327.6376
mbecca...@paulsmiths.edu
www.paulsmiths.edu

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kyle 
Banerjee
Sent: Friday, April 01, 2016 2:00 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Google can give you answers, but librarians give you 
the right answers

On Thu, Mar 31, 2016 at 9:31 PM, Cornel Darden Jr.  wrote:

>
> "Google can give you answers, but librarians give you the right answers."
>
> Is it me? Or is there something wrong with this statement?
>

There's nothing wrong with the statement. As is the case with all sound bites, 
it should be used to stimulate thought rather than express reality.

Librarians have a schizophrenic relationship with Google. We dump on Google all 
the time, but it's one of the tools librarians of all stripes rely on the most. 
When we build things, we emulate Google's look, feel, and functionality. And 
while we blast Google on privacy issues, human librarians know a lot about what 
the individuals they serve use, why, and how -- it is much easier to get 
anonymous help from Google than from a librarian.

There are many animals in the information ecosystem, libraries and Google being 
among them. Our origins and evolutionary path differ, and this diversity is a 
good thing.

kyle


Re: [CODE4LIB] Creating a Linked Data Service

2014-08-13 Thread Michael Beccaria
Really helpful responses all. Moving forward with a plan that is much simpler 
than before. Thanks so much!

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Dan 
Scott
Sent: Saturday, August 09, 2014 1:41 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Creating a Linked Data Service

On Wed, Aug 6, 2014 at 2:45 PM, Michael Beccaria mbecca...@paulsmiths.edu
wrote:

 I have recently had the opportunity to create a new library web page 
 and host it on my own servers. One of the elements of the new page 
 that I want to improve upon is providing live or near live information 
 on technology availability (10 of 12 laptops available, etc.). That 
 data resides on my ILS server and I thought it might be a good time to 
 upgrade the bubble gum and duct tape solution I now have to creating a 
 real linked data service that would provide that availability information to 
 the web server.

 The problem is there is a lot of overly complex and complicated 
 information out there on linked data and RDF and the semantic web etc.


Yes... this is where I was a year or two ago. Content negotiation / triple 
stores / ontologies / Turtle / n-quads / blah blah blah / head hits desk.


 and I'm looking for a simple guide to creating a very simple linked 
 data service with php or python or whatever. Does such a resource 
 exist? Any advice on where to start?


Adding to the barrage of suggestions, I would suggest a simple structured data 
approach:

a) Get your web page working first, clearly showing the availability of the
hardware: make the humans happy!
b) Enhance the markup of your web page to use microdata or RDFa to provide 
structured data around the web page content: make the machines happy!

Let's assume your web page lists hardware as follows:

<h1>Laptops</h1>
<ul>
  <li>Laptop 1: available (circulation desk)</li>
  <li>Laptop 2: loaned out</li>
  ...
</ul>

Assuming your hardware has the general attributes of type, location, 
name, and status, you could use microdata to mark this up like so:

<h1>Laptops</h1>
<ul>
  <li itemscope itemtype="http://example.org/laptop"><span
itemprop="name">Laptop 1</span>: <span itemprop="status">available</span>
(<span itemprop="location">circulation desk</span>)</li>
  <li itemscope itemtype="http://example.org/laptop"><span
itemprop="name">Laptop 2</span>: <span itemprop="status">loaned out</span></li>
  ...
</ul>

(We're using the itemtype attribute to specify the type of the object, using a 
made-up vocabulary... which is fine to start with).

Toss that into the structured data linter at http://linter.structured-data.org 
and you can see (roughly) what any microdata parser will spit out. That's 
already fairly useful to machines that would want to parse the page for their 
own purposes (mobile apps, or aggregators of all available library hardware 
across public and academic libraries in your area, or whatever). The advantage 
of using structured data is that you can later on decide to use <div> or 
<table> markup, and as long as you keep the itemscope/itemtype/itemprop 
properties generating the same output, any clients using microdata parsers are 
going to just keep on working... whereas screen-scraping approaches will 
generally crash and burn if you change the HTML out from underneath them.
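For what it's worth, a microdata consumer doesn't need much machinery. Here's a minimal Python 3 sketch (stdlib only, and assuming the simple span-based markup above -- a real consumer would use a proper microdata parser) that pulls the itemprop values back out:

```python
from html.parser import HTMLParser

class ItempropExtractor(HTMLParser):
    """Collect itemprop name/value pairs, one dict per itemscope element."""

    def __init__(self):
        super().__init__()
        self.items = []      # parsed items, in document order
        self._prop = None    # itemprop currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.items.append({})
        if "itemprop" in attrs:
            self._prop = attrs["itemprop"]

    def handle_data(self, data):
        if self._prop is not None and self.items:
            item = self.items[-1]
            item[self._prop] = item.get(self._prop, "") + data

    def handle_endtag(self, tag):
        if tag == "span":    # properties are marked up as spans here
            self._prop = None

html = """
<ul>
  <li itemscope itemtype="http://example.org/laptop"><span
itemprop="name">Laptop 1</span>: <span itemprop="status">available</span>
(<span itemprop="location">circulation desk</span>)</li>
  <li itemscope itemtype="http://example.org/laptop"><span
itemprop="name">Laptop 2</span>: <span itemprop="status">loaned out</span></li>
</ul>
"""
parser = ItempropExtractor()
parser.feed(html)
print(parser.items)
```

The point is that a client written against the itemprop names keeps working however the surrounding HTML changes.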

For what it's worth, you're not serving up linked data at this point, because 
you're not really linking to anything, and you're not providing any identifiers 
to which others could link. You can add itemid attributes to satisfy the latter 
goal:

<h1>Laptops</h1>
<ul>
  <li itemscope itemtype="http://example.org/laptop"
itemid="#laptop1"><span itemprop="name">Laptop 1</span>: <span
itemprop="status">available</span> (<span itemprop="location">circulation
desk</span>)</li>
  <li itemscope itemtype="http://example.org/laptop" itemid="#laptop2"><span
itemprop="name">Laptop 2</span>: <span itemprop="status">loaned out</span></li>
  ...
</ul>

I guess if you wanted to avoid this being a linked data silo, you could link 
out from the web page to the manufacturer's page to identify the make/model of 
each piece of hardware; but realistically that's probably not going to help 
anyone, so why bother?

Long story short, you can achieve a lot of linked data / semantic web goals by 
simply generating basic structured data without having to worry about content 
negotiation to serve up RDF/XML and JSON-LD and Turtle, setting up triple 
stores, or other such nonsense. You can use whatever technology you're using to 
generate your web pages (assuming they're dynamically
generated) to add in this structured data.

If you're interested, over the last year I've put together a couple of gentle 
self-guiding tutorials on using RDFa (fulfills roughly the same role as 
microdata) with schema.org (a general vocabulary of types and their 
properties). The shorter one is at https

Re: [CODE4LIB] Creating a Linked Data Service

2014-08-07 Thread Michael Beccaria
I'm a one-man shop, and I sometimes go to conferences where many of you 
brilliant people are presenting brilliant solutions -- ubiquitous black-box 
data services that talk to one another using a standardized query language -- 
and I felt inspired and thought maybe I have been doing patch work on a job 
that really ought to be done a better way. I'm all about the bubble gum and 
duct tape stuff, but I was at a point where it would have been a good time to 
migrate to something a little more robust. I'm getting the impression that, 
for the size of the projects I'm working on, linked data and similar solutions 
are very much overkill. I'll have a PHP script output some custom XML that can 
be ingested on the other end and call it a day. Done :-)
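For what it's worth, that handoff is only a few lines on each end. A sketch (in Python rather than PHP, with made-up element and attribute names):

```python
import xml.etree.ElementTree as ET

def availability_xml(items):
    """Producer (ILS side): emit the custom availability feed.
    Element and attribute names here are invented for the example."""
    root = ET.Element("hardware")
    for name, status in items:
        ET.SubElement(root, "item", name=name, status=status)
    return ET.tostring(root, encoding="unicode")

def read_feed(xml_text):
    """Consumer (web-server side): turn the feed back into Python data."""
    return [(el.get("name"), el.get("status"))
            for el in ET.fromstring(xml_text).iter("item")]

feed = availability_xml([("Laptop 1", "available"), ("Laptop 2", "loaned out")])
print(feed)
print(read_feed(feed))
```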

This is also, at least for me, a challenge I have with being a 
wear-a-lot-of-hats-and-sometimes-write-code person at a small institution. Most 
of the time I'm not sure what I am supposed to be doing so I just make a 
solution that works without having others to bounce ideas off of. Thanks for 
the support.

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
Riley-Huff, Debra
Sent: Wednesday, August 06, 2014 11:52 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Creating a Linked Data Service

I agree with Roy. Seems like something that could be easily handled with PHP or 
Python scripts. Someone on the list may even have a homegrown solution 
(improved duct tape) they would be happy to share. I fail to see what the 
project has to do with linked data or why you would go that route.

Debra Riley-Huff
Head of Web Services & Associate Professor
J.D. Williams Library
University of Mississippi
University, MS 38677
662-915-7353
riley...@olemiss.edu


On Wed, Aug 6, 2014 at 9:33 PM, Roy Tennant roytenn...@gmail.com wrote:

 I'm puzzled about why you want to use linked data for this. At first 
 glance the requirement simply seems to be to fetch data from your ILS 
 server, which likely could be sent in any number of simple packages 
 that don't require an RDF wrapper. If you are the only one consuming 
 this data then you can use whatever (simplistic, proprietary) format 
 you want. I just don't see what benefits you would get by creating 
 linked data in this case that you wouldn't get by doing something 
 much more straightforward and simple. And don't be harshing on duct 
 tape. Duct tape is a perfectly fine solution for many problems.
 Roy


 On Wed, Aug 6, 2014 at 2:45 PM, Michael Beccaria 
 mbecca...@paulsmiths.edu
 
 wrote:

  I have recently had the opportunity to create a new library web page 
  and host it on my own servers. One of the elements of the new page 
  that I
 want
  to improve upon is providing live or near live information on 
  technology availability (10 of 12 laptops available, etc.). That 
  data resides on my ILS server and I thought it might be a good time 
  to upgrade the bubble
 gum
  and duct tape solution I now have to creating a real linked data 
  service that would provide that availability information to the web server.
 
  The problem is there is a lot of overly complex and complicated 
  information out there on linked data and RDF and the semantic web 
  etc. and I'm looking for a simple guide to creating a very simple 
  linked data service with php or python or whatever. Does such a 
  resource exist? Any advice on where to start?
  Thanks,
 
  Mike Beccaria
  Systems Librarian
  Head of Digital Initiative
  Paul Smith's College
  518.327.6376
  mbecca...@paulsmiths.edu
  Become a friend of Paul Smith's Library on Facebook today!
 



[CODE4LIB] Creating a Linked Data Service

2014-08-06 Thread Michael Beccaria
I have recently had the opportunity to create a new library web page and host 
it on my own servers. One of the elements of the new page that I want to 
improve upon is providing live or near-live information on technology 
availability (10 of 12 laptops available, etc.). That data resides on my ILS 
server, and I thought it might be a good time to upgrade the bubble gum and 
duct tape solution I now have to a real linked data service that would provide 
that availability information to the web server.

The problem is that there is a lot of overly complex and complicated 
information out there on linked data, RDF, the semantic web, etc., and I'm 
looking for a simple guide to creating a very simple linked data service with 
PHP or Python or whatever. Does such a resource exist? Any advice on where to 
start?
Thanks,

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!


[CODE4LIB] Recommendations for IT Department Management Resources

2014-05-02 Thread Michael Beccaria
I'm looking for resources on managing IT departments and infrastructure in an 
academic environment -- resources that cover high-level organizational topics 
like essential job roles, policies, standard operating procedures, etc. Does 
anyone know of good resources they consider useful or essential?
Thanks,
Mike

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!


[CODE4LIB] Python (or similar) package to read counter stat reports?

2014-02-11 Thread Michael Beccaria
This might be a bit obscure, but is there a python package or other programming 
language package that is designed to read library Counter statistics reports? 
I'm looking to start building a data warehouse for some of our ebook and 
journal vendors and want to pull data from these reports. I can script it, but 
I wanted to know if anything exists to help along the way. Has anybody else 
worked on or completed something similar?
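For the common case of a JR1 CSV export, the stdlib goes a long way. A sketch -- real COUNTER files put several metadata rows above the column row and vary by vendor and COUNTER revision, so the two assumptions here (a column row whose first cell is exactly "Journal", and a "Reporting Period Total" column) would need adjusting per vendor:

```python
import csv
import io

def parse_jr1(csv_text):
    """Read a (simplified) COUNTER JR1 report into {title: yearly total}."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Skip vendor metadata rows until the column-header row.
    header_idx = next(i for i, r in enumerate(rows) if r and r[0] == "Journal")
    total_col = rows[header_idx].index("Reporting Period Total")
    return {r[0]: int(r[total_col]) for r in rows[header_idx + 1:] if r}

# Illustrative sample, not a real vendor file.
sample = (
    "JR1 Report (R4),,\n"
    "Some Vendor,,\n"
    "Journal,Publisher,Reporting Period Total\n"
    "Journal of Examples,Acme,123\n"
    "Annals of Testing,Acme,45\n"
)
print(parse_jr1(sample))
```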

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!


Re: [CODE4LIB] De-dup MARC Ebook records

2013-08-22 Thread Michael Beccaria
Steve,
I don't think it's so much finding a control field (the closest match I can 
use is ISBN or eISBN, which has its issues) as normalizing the data in the 
fields so that matches are produced. It will no doubt take some time to 
figure out.
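One normalization that helps with ISBN matching is reducing everything to a bare ISBN-13 before comparing, so that hyphenation, qualifiers, and ISBN-10 vs ISBN-13 differences don't block a match. A sketch (the "(pbk.)" qualifier and sample numbers are illustrative):

```python
def normalize_isbn(raw):
    """Reduce an ISBN string to a bare ISBN-13 match key (or None).

    Strips hyphens, spaces, and trailing qualifiers like "(pbk.)",
    then upgrades ISBN-10 to ISBN-13 by prefixing 978 and recomputing
    the check digit, so both forms of the same number compare equal.
    """
    digits = "".join(c for c in raw.upper() if c.isdigit() or c == "X")
    if len(digits) == 13:
        return digits
    if len(digits) == 10:
        core = "978" + digits[:9]  # the ISBN-10 check digit is dropped
        total = sum((1, 3)[i % 2] * int(d) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return None  # not something we can use as a match key

print(normalize_isbn("0-306-40615-2 (pbk.)"))   # 9780306406157
print(normalize_isbn("978-0-306-40615-7"))      # 9780306406157
```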

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
McDonald, Stephen
Sent: Friday, August 16, 2013 8:16 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] De-dup MARC Ebook records

Michael Beccaria said:
 Thanks for the replies. To clarify, I am working with 2 (or more in 
 the future) marc records outside of the ILS. I've tried using Marcedit 
 but my usage did vary...not much overlap with the control fields that 
 were available to me. I have a feeling they are a bit varied. I'm also 
 messing around with marcXimiL a little but I'm having trouble getting 
 it to output any records at all. I also was looking at the XC 
 aggregation module but I was having trouble getting that to work 
 properly as well and the listserv was unresponsive. It seemed like 
 good software but it required me to set up an OAI harvest source to 
 allow it to ingest the records and that...well...enough is enough... I 
 think I will probably need to write something, and at least that way I 
 know what it will be doing rather than plowing through software that 
 has little to no support. Please feel free to let me know of a particular 
 strategy you think might work best in this regard...

If you couldn't get adequate deduping from the control fields available in 
MarcEdit deduping, what control fields do you think you need to dedup on?  You 
can actually specify any arbitrary field and subfield for deduping in MarcEdit.

Steve McDonald
steve.mcdon...@tufts.edu


Re: [CODE4LIB] De-dup MARC Ebook records

2013-08-22 Thread Michael Beccaria
Karen,
Do you have a sense of how well it actually works? Is Open Library implementing 
it?

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen 
Coyle
Sent: Thursday, August 22, 2013 11:53 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] De-dup MARC Ebook records

The record matching algorithm used by the Open Library is available here:
https://github.com/openlibrary/openlibrary/tree/master/openlibrary/catalog/merge

The original spec, which may have changed in the implementation, is here:

http://kcoyle.net/merge.html

kc


On 8/22/13 8:07 AM, Michael Beccaria wrote:
 Steve,
 I don't think it's so much find a control field (however, the closest match I 
 can use is ISBN or eISBN which has its issues) but also normalizing the data 
 in the fields so that matches are produced. It will no doubt take some time 
 to figure out.

 Mike Beccaria
 Systems Librarian
 Head of Digital Initiative
 Paul Smith's College
 518.327.6376
 mbecca...@paulsmiths.edu
 Become a friend of Paul Smith's Library on Facebook today!


 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf 
 Of McDonald, Stephen
 Sent: Friday, August 16, 2013 8:16 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] De-dup MARC Ebook records

 Michael Beccaria said:
 Thanks for the replies. To clarify, I am working with 2 (or more in 
 the future) marc records outside of the ILS. I've tried using 
 Marcedit but my usage did vary...not much overlap with the control 
 fields that were available to me. I have a feeling they are a bit 
 varied. I'm also messing around with marcXimiL a little but I'm 
 having trouble getting it to output any records at all. I also was 
 looking at the XC aggregation module but I was having trouble getting 
 that to work properly as well and the listserv was unresponsive. It 
 seemed like good software but it required me to set up an OAI harvest 
 source to allow it to ingest the records and that...well...enough is 
 enough... I think I will probably need to write something, and at 
 least that way I know what it will be doing rather than plowing 
 through software that has little to no support. Please feel free to let me 
 know of a particular strategy you think might work best in this regard...
 If you couldn't get adequate deduping from the control fields available in 
 MarcEdit deduping, what control fields do you think you need to dedup on?  
 You can actually specify any arbitrary field and subfield for deduping in 
 MarcEdit.

   Steve McDonald
   steve.mcdon...@tufts.edu

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


[CODE4LIB] De-dup MARC Ebook records

2013-08-15 Thread Michael Beccaria
Has anyone had any luck finding a good way to de-duplicate MARC records from 
ebook vendors? We're looking to integrate Ebrary and EBSCO Academic eBook 
collections, and the vendors estimate an overlap in the tens of thousands.

Strategies, tools, software?

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!


Re: [CODE4LIB] De-dup MARC Ebook records

2013-08-15 Thread Michael Beccaria
Thanks for the replies. To clarify, I am working with 2 (or more in the future) 
marc records outside of the ILS. I've tried using Marcedit but my usage did 
vary...not much overlap with the control fields that were available to me. I 
have a feeling they are a bit varied. I'm also messing around with marcXimiL a 
little but I'm having trouble getting it to output any records at all. I also 
was looking at the XC aggregation module but I was having trouble getting that 
to work properly as well and the listserv was unresponsive. It seemed like good 
software but it required me to set up an OAI harvest source to allow it to 
ingest the records and that...well...enough is enough... I think I will 
probably need to write something, and at least that way I know what it will be 
doing rather than plowing through software that has little to no support. 
Please feel free to let me know of a particular strategy you think might work 
best in this regard...

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Andy 
Kohler
Sent: Thursday, August 15, 2013 2:29 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] De-dup MARC Ebook records

Are you expecting to work with two files of records, outside of your ILS?
If so, for a project like that I'd probably write Perl script(s) using 
MARC::Record (there are similar code libraries for Ruby, Python and Java at 
least).

For each record in each file, use the ISBN (and/or OCLC number and/or LCCN) as 
a key.  Compare all sets, and keep one record per key.

This assumes that the vendors are supplying records with standard identifiers, 
and not just their own record numbers.
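The keep-one-record-per-identifier approach above can be sketched with plain dicts standing in for parsed MARC records (with MARC::Record or Python's pymarc you would pull these values from the 020/035/010 fields; the field choices and sample data here are illustrative):

```python
def dedup_records(records, keys=("isbn", "oclc", "lccn")):
    """Keep the first record seen for each standard identifier."""
    seen = set()
    unique = []
    for rec in records:
        ids = [rec[k] for k in keys if rec.get(k)]
        if any(i in seen for i in ids):
            continue  # duplicate of a record kept from an earlier file
        seen.update(ids)
        unique.append(rec)
    return unique

records = [
    {"title": "Birds of NY", "isbn": "9780306406157"},
    {"title": "Birds of NY (vendor 2)", "isbn": "9780306406157"},
    {"title": "Trees of NY", "oclc": "ocm12345"},
]
print([r["title"] for r in dedup_records(records)])
# ['Birds of NY', 'Trees of NY']
```

Identifiers should be normalized (hyphens stripped, ISBN-10 upgraded to 13, etc.) before being used as keys, or near-matches will slip through.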

If you're comparing each file with what's already in your ILS, then it'll 
depend on the tools the ILS offers for matching incoming records to the 
database.  Or, export the database and compare it with the files, as above.

Andy Kohler / UCLA Library Info Tech
akoh...@library.ucla.edu / 310 206-8312

On Thu, Aug 15, 2013 at 10:11 AM, Michael Beccaria mbecca...@paulsmiths.edu
 wrote:

 Has anyone had any luck finding a good way to de-duplicate MARC 
 records from ebook vendors. We're looking to integrate Ebrary and 
 Ebsco Academic Ebook collections and they estimate an overlap into the 10's 
 of thousands.




Re: [CODE4LIB] web-based ocr

2013-03-13 Thread Michael Beccaria
Tesseract had really poor quality the last time I tried it, and the ABBYY 
server product is ridiculously expensive (and charges per page). Leadtools has 
an OCR SDK, but it too is expensive. If you want to go relatively cheap on this 
(and I don't know for sure, but it would probably break some licensing 
agreement with ABBYY), you could set up a web server with a $99 version of 
ABBYY FineReader with a hotfolder configured to convert anything dropped into 
it to txt. You would then have to write the backend to keep track of the files 
that were submitted, let ABBYY convert them, and show the results to the end 
user.
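The backend half of that hotfolder idea could be as small as a polling loop. A Python sketch, assuming the hotfolder profile is set to write `<job_id>.txt` into the same folder (a configurable FineReader setting, not a given):

```python
import os
import time

def wait_for_result(hotfolder, job_id, timeout=300, poll=2.0):
    """Poll the hotfolder until the OCR engine's .txt output appears."""
    out_path = os.path.join(hotfolder, job_id + ".txt")
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(out_path):
            with open(out_path, encoding="utf-8") as fh:
                return fh.read()
        time.sleep(poll)
    raise TimeoutError("no OCR result for %s after %ss" % (job_id, timeout))
```

The web frontend would copy the upload into the hotfolder under a unique job id, call something like this, and hand the text back to the user.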

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eric 
Lease Morgan
Sent: Tuesday, March 12, 2013 2:16 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] web-based ocr

Thank you for the prompt replies. 

Call me cheap or unable to navigate the political/fiscal landscape, but I don't 
see myself subscribing to a service. Instead I see putting a wrapper around 
Tesseract, but alas, the wrappers are written in languages that I don't know. 
[1] Hmmm... On the Perl side, I am having problems installing 
Image::OCR::Tesseract. 

[1] Wrappers - http://code.google.com/p/tesseract-ocr/wiki/AddOns

--
Eric Still Cogitating Morgan


Re: [CODE4LIB] XML Parsing and Python

2013-03-07 Thread Michael Beccaria
I ended up doing a regular expression find and replace function to replace all 
illegal xml characters with a dash or something. I was more disappointed in the 
fact that on the xml creation end, minidom was able to create non-compliant xml 
files. I assumed that if minidom could make it, it would be compliant but that 
doesn't seem to be the case. Now I have to add a find and replace function on 
the creation side to avoid this issue in the future. Good learning experience I 
guess. Thanks for all your suggestions.
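A find-and-replace like the one described might look like this in Python 3 (the character class is the XML 1.0 "Char" production restricted to the BMP, and the dash replacement matches the approach above):

```python
import re

# Characters *allowed* by XML 1.0, BMP only: tab, LF, CR, and everything
# from U+0020 up, minus surrogates and the U+FFFE/U+FFFF noncharacters.
_XML_ILLEGAL = re.compile("[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]")

def scrub(text, replacement="-"):
    """Replace XML-illegal characters before they go into setAttribute()."""
    return _XML_ILLEGAL.sub(replacement, text)

print(scrub("Club\uFFFF"))   # Club-
```

Running every CONTENT value through a filter like this on the creation side keeps lxml (and any other conforming parser) happy later.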

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Chris 
Beer
Sent: Tuesday, March 05, 2013 1:48 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] XML Parsing and Python

I'll note that 0xFFFF is a Unicode noncharacter, and these noncharacters 
should never be included in text interchange between implementations. [1] I 
assume the OCR engine may be using 0xFFFF when it can't recognize a character? 
So it's not wrong for a parser to complain (or not complain) about 0xFFFF, and 
you can just scrub the string like Jon suggests.

Chris


[1] http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Noncharacters

On 5 Mar, 2013, at 9:16 , Jon Stroop jstr...@princeton.edu wrote:

 Mike,
 I haven't used minidom extensively but my guess is that 
 doc.toprettyxml(indent= ,encoding=utf-8) isn't actually changing the 
 encoding because it can't parse the string in your content variable. I'm 
 surprised that you're not getting tossed a UnicodeError, but The docs for 
 Node.toxml() [1] might shed some light:
 
 To avoid UnicodeError exceptions in case of unrepresentable text data, the 
 encoding argument should be specified as "utf-8".
 
 So what happens if you're not explicit about the encoding, i.e. just 
 doc.toprettyxml()? This would hopefully at least move your exception to a 
 more appropriate place.
 
 In any case, one solution would be to scrub the string in your content 
 variable to get rid of the invalid characters (hopefully they're 
 insignificant). Maybe something like this:
 
 def unicode_filter(char):
try:
unicode(char, encoding='utf-8', errors='strict')
return char
except UnicodeDecodeError:
return ''
 
 content = 'abc\xFF'
 content = ''.join(map(unicode_filter, content))
 print content
 
 Not really my area of expertise, but maybe worth a shot
 -Jon
 
 1. 
 http://docs.python.org/2/library/xml.dom.minidom.html#xml.dom.minidom.
 Node.toxml
 
 --
 Jon Stroop
 Digital Initiatives Programmer/Analyst Princeton University Library 
 jstr...@princeton.edu
 
 
 
 
 On 03/04/2013 03:00 PM, Michael Beccaria wrote:
 I'm working on a project that takes the ocr data found in a pdf and places 
 it in a custom xml file.
 
 I use Python scripts to create the xml file. Something like this (trimmed 
 down a bit):
 
 from xml.dom.minidom import Document
 doc = Document()
 Page = doc.createElement("Page")
 doc.appendChild(Page)
 f = StringIO(txt)
 lines = f.readlines()
 for line in lines:
     word = doc.createElement("String")
     ...
     word.setAttribute("CONTENT", content)
     Page.appendChild(word)
 return doc.toprettyxml(indent="  ", encoding="utf-8")
 
 
 This creates a file, simply, that looks like this:
 <?xml version="1.0" encoding="utf-8"?>
 <Page HEIGHT="3296" WIDTH="2609">
   <String CONTENT="BuffaloLaunch" />
   <String CONTENT="Club" />
   <String CONTENT="Offices" />
   <String CONTENT="Installed" />
   ...
 </Page>
 
 I am able to get this document to be created ok and saved to an xml file. 
 The problem occurs when I try and have it read using the lxml library:
 
 from lxml import etree
 doc = etree.parse(filename)
 
 
 I am running across errors like XMLSyntaxError: Char 0xFFFF out of allowed 
 range, line 94, column 19. Which, when I look at the file, is true: there is 
 a 0xFFFF character in the content field.
 
 How is a file able to be created using minidom (which I assume would create 
 a valid xml file) and then failing when parsing with lxml? What should I do 
 to fix this on the encoding side so that errors don't show up on the parsing 
 side?
 Thanks,
 Mike
 
 Mike Beccaria
 Systems Librarian
 Head of Digital Initiative
 Paul Smith's College
 518.327.6376
 mbecca...@paulsmiths.edu
 Become a friend of Paul Smith's Library on Facebook today!


[CODE4LIB] XML Parsing and Python

2013-03-04 Thread Michael Beccaria
I'm working on a project that takes the ocr data found in a pdf and places it 
in a custom xml file.

I use Python scripts to create the xml file. Something like this (trimmed down 
a bit):

from xml.dom.minidom import Document
doc = Document()
Page = doc.createElement("Page")
doc.appendChild(Page)
f = StringIO(txt)
lines = f.readlines()
for line in lines:
    word = doc.createElement("String")
    ...
    word.setAttribute("CONTENT", content)
    Page.appendChild(word)
return doc.toprettyxml(indent="  ", encoding="utf-8")


This creates a file, simply, that looks like this:
<?xml version="1.0" encoding="utf-8"?>
<Page HEIGHT="3296" WIDTH="2609">
  <String CONTENT="BuffaloLaunch" />
  <String CONTENT="Club" />
  <String CONTENT="Offices" />
  <String CONTENT="Installed" />
  ...
</Page>

I am able to get this document to be created ok and saved to an xml file. The 
problem occurs when I try and have it read using the lxml library:

from lxml import etree
doc = etree.parse(filename)


I am running across errors like XMLSyntaxError: Char 0xFFFF out of allowed 
range, line 94, column 19. Which, when I look at the file, is true: there is a 
0xFFFF character in the content field.

How is a file able to be created using minidom (which I assume would create a 
valid xml file) and then failing when parsing with lxml? What should I do to 
fix this on the encoding side so that errors don't show up on the parsing side?
Thanks,
Mike

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!


[CODE4LIB] OCR To ALTO without ABBYY

2012-09-06 Thread Michael Beccaria
I inadvertently purchased ABBYY FineReader 11 Corporate thinking that it would 
be capable of outputting ALTO XML. I was wrong. ABBYY FineReader Engine 
does. :-/

Ultimately, I want to OCR some newspaper images and export them to ALTO XML 
and, until the proof of concept is done, I want to try to do it on the cheap. 
My plan this morning was to write some scripts to OCR them using Microsoft 
Office Document Imaging (MODI) and then export the results to ALTO XML which 
could be a big project. Has anyone done this before or know of a quick and 
dirty way to get some OCR data?
Thanks,
Mike Beccaria
Systems Librarian
Paul Smith's College
518.327.6376


Re: [CODE4LIB] Silently print (no GUI) in Windows

2012-04-04 Thread Michael Beccaria
Wireless no-device-driver-install print solutions usually do this, and I think 
Adobe Acrobat full version does this when it converts files from, say, Word to 
PDF. They automate a print job and print to a PDF writer printer. This usually 
requires whatever software that is needed to print be installed on the machine 
(i.e. acrobat, excel, word, etc). You could easily write a vbscript or 
powershell script to print them like so:

How to print a PDF file:
set oWsh = CreateObject("Wscript.Shell")
oWsh.run "Acrobat.exe /p /h FileName", , true

And a Word document:
Set objWord = CreateObject("Word.Application")
Set objDoc = objWord.Documents.Open("c:\scripts\inventory.doc")
objDoc.PrintOut()
objWord.Quit

Or, for word documents, you can use the command line to print (via a batch file 
or other scripting program) refer to this:
http://www.christowles.com/2011/04/microsoft-word-printing-from-command.html

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kyle 
Banerjee
Sent: Tuesday, April 03, 2012 3:17 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Silently print (no GUI) in Windows

Would Google Cloud Print be helpful?

Otherwise, I think you may need to use multiple apps to actually print things 
(i.e. you actually need Word to print Word docs) unless the files are all 
converted. While at least in the case of Word, this can be done from the 
command line with switches, it actually invokes the whole program which is a 
huge waste -- it's probably better to just have Office running and then have an 
Office Basic program scan for files and send them to the printer.

kyle

On Tue, Apr 3, 2012 at 11:48 AM, Kozlowski, Brendon bkozlow...@sals.edu wrote:

 Not a dumb question at all. In this particular case, the receiving PC 
 that is to be storing/printing the documents will be taking jobs from 
 multiple networks, buildings, etc by either piping an email account, 
 or downloading via a user's upload from a webpage. We already have a 
 solution for catching jobs in the print spooler (not ours), but need 
 to automate the sending of the documents to the spooler itself.

 The only way I've ever sent documents to the spooler was by opening up 
 the full application (ex: Microsoft Word), and using the GUI to send 
 the print job. Since the PC housing and releasing these files is 
 expected to be un-manned and sit in a back room, we just need to be 
 able to silently print the jobs in the background. Opening multiple 
 applications over and over again would use up a lot of resources, so a 
 silent, no-GUI option would be the best from my very little understanding - 
 if it's even possible.



 Brendon Kozlowski
 Web Administrator
 Saratoga Springs Public Library
 49 Henry Street
 Saratoga Springs, NY, 12866
 [518] 584-7860 x217
 
 From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Kyle 
 Banerjee [baner...@uoregon.edu]
 Sent: Tuesday, April 03, 2012 1:25 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Silently print (no GUI) in Windows

 At the risk of asking a dumb question, why wouldn't a print server 
 meet your use case if the print jobs come from elsewhere?

 kyle

 On Tue, Apr 3, 2012 at 9:15 AM, Kozlowski,Brendon bkozlow...@sals.edu
 wrote:

  I'm curious to know if anyone has discovered ways of silently 
  printing documents from such Windows applications as:
 
 
 
  - Acrobat Reader (current version)
 
  - Microsoft Office 2007 (Word, Excel, Powerpoint, Visio, etc...)
 
  - Windows Picture and Fax Viewer
 
 
 
  I unfortunately haven't had much luck finding any resources on this.
 
 
 
  I'd like to be able to receive documents in a queue like fashion to 
  a single PC and simply print them off as they arrive. However, 
  automating
 the
  loading/exiting of the full-blown application each time, and 
  on-demand, seems a little too cumbersome and unnecessary.
 
 
 
  I have not yet decided on whether I'd be scripting it (PHP, AutoIT, 
  batch files, VBS, Powershell, etc...) or learning and then writing a 
  .NET application. If .NET solutions use the COM object, the 
  scripting becomes
 a
  potential candidate. Unfortunately, I first need to know how, or even
  whether, it's possible to do.
 
 
 
  Thank you for any and all feedback or assistance.
 
 
 
 
  Brendon Kozlowski
  Web Administrator
  Saratoga Springs Public Library
  49 Henry Street
  Saratoga Springs, NY, 12866
  [518] 584-7860 x217
 
  Please consider the environment before printing this message.
 
  To report this message as spam, offensive, or if you feel you have 
  received this in error, please send e-mail to ab...@sals.edu 
  including the entire contents and subject of the message.
  It will be reviewed by staff and acted upon appropriately.
 



 --
 

Re: [CODE4LIB] image zoom for iPad

2012-01-31 Thread Michael Beccaria
I thought Microsoft released a Seadragon mobile app a while back as well. I
remember playing with it on my iPod Touch.

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Friscia, Michael
Sent: Monday, January 30, 2012 4:20 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] image zoom for iPad

Hi all,
I'm wondering if anyone can recommend an image zoom option for the iPad that
provides functionality like Zoomify/Seadragon but works on the iPad. I'm
hoping for some Ajax/jQuery library I've never heard of that will work and
provide good functionality. Or maybe I'm doing something wrong and my
use of Seadragon would be better if I did x, y, and z...

Any thoughts or suggestions that do not include "don't do zoom" would be
greatly appreciated.

Thanks,
-mike
___
Michael Friscia
Manager, Digital Library & Programming Services

Yale University Library
(203) 432-1856


Re: [CODE4LIB] NEcode4lib?

2011-12-16 Thread Michael Beccaria
I'd be very interested in going. Yale is a good location for me.
 
Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Joseph Montibello
Sent: Friday, December 16, 2011 9:42 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] NEcode4lib?

Hi,

It looks like there was a New England regional a couple of years ago. Is
there still any activity/interest in this region? I can imagine that in
addition to folks who missed the registration power-hour, there might be
a significant group that can't get their library to support a trip to
Seattle.

Just curious.
Joe Montibello, MLIS
Library Systems Manager
Dartmouth College Library
603.646.9394
joseph.montibe...@dartmouth.edumailto:joseph.montibe...@dartmouth.edu


Re: [CODE4LIB] Examples of visual searching or browsing

2011-10-31 Thread Michael Beccaria
Microsoft Labs released a few visualization tools a while back that might
be of interest:
1. Deep Zoom: quickly explore gigapixel-sized images and collections of
images.
http://www.microsoft.com/silverlight/deep-zoom/ 
Here's one of my favorite examples of Yosemite valley:
http://www.xrez.com/yose_proj/yose_deepzo 
And the classic Hard Rock Memorabilia site:
http://memorabilia.hardrock.com/
2. Pivot:
http://www.microsoft.com/silverlight/pivotviewer/
3. Photosynth - really innovative but perhaps limited in its scope.
http://photosynth.net/


Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu
Become a friend of Paul Smith's Library on Facebook today!

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Julia Bauder
Sent: Thursday, October 27, 2011 4:27 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Examples of visual searching or browsing

Dear fans of cool Web-ness,

I'm looking for examples of projects that use visual (i.e., largely non-text
and non-numeric) interfaces to let patrons browse/search collections. Things
like the GeoSearch on North Carolina Maps[1], or projects that use
Simile's Timeline or Exhibit widgets[2] to provide access to collections
(e.g., what's described here:
https://letterpress.uchicago.edu/index.php/jdhcs/article/download/59/70)
, or in-the-wild uses of Recollection[3]. I'm less interested in knowing
about tools (although I'm never *uninterested* in finding out about cool
tools) than about production or close-to-production sites that are
making good use of these or similar tools to provide visual, non-linear
access to collections. Who's doing slick stuff in this area that
deserves a look?

Thanks!

Julia

[1] http://dc.lib.unc.edu/ncmaps/search.php
[2] http://www.simile-widgets.org/
[3] http://recollection.zepheira.com/




*

Julia Bauder
Data Services Librarian
Interim Director of the Data Analysis and Social Inquiry Lab (DASIL)
Grinnell College Libraries
 Sixth Ave.
Grinnell, IA 50112
641-269-4431


Re: [CODE4LIB] id services from loc

2011-10-18 Thread Michael Beccaria
While not exactly what you're looking for, the OCLC Collection Analysis
documentation contains a file on how they map LC/Dewey call numbers to subjects.
I'm not sure whether they are LC subject headings, though. Click "OCLC Conspectus" on
this page for the Excel sheet:
http://www.oclc.org/support/documentation/collectionanalysis/default.htm
Mike Beccaria
Systems Librarian
Paul Smith's College
518.327.6376



From: Code for Libraries on behalf of Enrico Silterra
Sent: Tue 10/18/2011 2:11 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] id services from loc



Is there any way to go from an LC call number,
like DF853, to http://id.loc.gov/authorities/subjects/sh85057107
via some sort of API? OpenSearch?
Thanks,
rick



--
Enrico Silterra Software Engineer
501 Olin Library Cornell University Ithaca NY 14853
Voice: 607-255-6851 Fax: 607-255-6110 E-mail: es...@cornell.edu
http://www.library.cornell.edu/dlit
Out of the crooked timber of humanity no straight thing was ever made
CONFIDENTIALITY NOTE
The information transmitted, including attachments, is intended only
for the person or entity to which it is addressed and may contain
confidential and/or privileged material. Any review, retransmission,
dissemination or other use of, or taking of any action in reliance
upon, this information by persons or entities other than the intended
recipient is prohibited. If you received this in error, please contact
the sender and destroy any copies of this document.


[CODE4LIB] Software for Capstone\Theses Projects

2011-09-21 Thread Michael Beccaria
I've been looking for an out of the box solution to archive and make
accessible capstone\theses projects to web users. The caveat is that
when the author submits the paper, they would be able to provide
permissions and metadata for the document (copyright and access) and,
based on those permissions, either the entire document or only its
metadata would be made public. I know that there are large repository software
packages like DSpace or Fedora Commons that probably do this, but I was
looking for something smaller. I don't need to scale to millions of
documents and have all of the potential bells and whistles. Just
something that lets people create an account, upload, set permissions,
and then have documents show up in the search interface.

Anything like this around?

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu


Re: [CODE4LIB] Free/Open OCR solutions?

2010-08-03 Thread Michael Beccaria
No, other than it is possible to do so:
http://office.microsoft.com/en-us/help/about-ocr-international-issues-HP003081238.aspx

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of stuart 
yeates
Sent: Monday, August 02, 2010 4:46 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Free/Open OCR solutions?

Michael Beccaria wrote:
 Andrew, 
 If you have MS Office, Microsoft has an OCR engine built in. I used it
 to OCR some college yearbooks at MPOW. It's not ABBYY but it works
 pretty well! It's scriptable using VBScript or your MS language of
 choice.
 
 http://msdn.microsoft.com/en-us/library/aa167607(office.11).aspx
 Notice the OCR method in the document.

Could someone comment on the efficacy of this OCR on languages with 
non-latin characters?

cheers
stuart
-- 
Stuart Yeates
http://www.nzetc.org/   New Zealand Electronic Text Centre
http://researcharchive.vuw.ac.nz/ Institutional Repository


Re: [CODE4LIB] New books RSS feed / badge with cover images?

2010-04-14 Thread Michael Beccaria
Laura,
While not directly related to your question, another route you might
want to go is to try and get thumbnail images from other sources (i.e.
amazon, openlibrary, etc.) in addition to syndetics and cache them for
future use. I do this on our website and I wrote an article on the
basics here: (http://journal.code4lib.org/articles/1009). 

I can send you the code I have on my server for our complete solution if
you want.

Here's our new books page, database driven with local thumbnails
(http://www.paulsmiths.edu/library/books.php). I also use the same data
to send out a new books email every 2 weeks to subscribers. You can sign
up here to get a sample
(http://library.paulsmiths.edu/newbooks/subscribe.php) and then
unsubscribe in your 1st email if you want. Again, if you want the Python
code that generates and sends the emails, let me know.

Also, if you're a PHP fan, VuFind has some code on downloading book
covers here:
https://vufind.svn.sourceforge.net/svnroot/vufind/trunk/web/bookcover.php

Hope that helps in a roundabout way.
Be Well,

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu


-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Laura Harris
Sent: Friday, April 09, 2010 10:07 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] New books RSS feed / badge with cover images?

Hi, all - I suspect something like this is being done already, so I
thought I would check in and ask. 

Essentially, what I would like to do is display the library's new books
on a web page in a graphic format - I'd like it to look very similar to
the sorts of widgets that GoodReads or LibraryThing users can create. I
threw up a few quick examples here:

http://gvsu.edu/library/zzwidget-test-171.htm 

Now, we have an RSS feed for our new books (Millennium is our ILS if it
matters), and as I understand it, the images we get from Syndetic
Solutions are parsed as enclosures to that RSS feed. Is there a way to
take the RSS feed, and only show those enclosures (if they exist, and
are not the default grey box we see if the book doesn't have a cover
image) somehow? 
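If it helps, stripping a feed down to its usable enclosure URLs is only a few lines with a stock XML parser. A sketch with an invented sample feed; the placeholder-cover filename is likewise made up, since I don't know what the Syndetics grey-box URL actually looks like:

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 snippet standing in for the Millennium new-books feed
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item><title>Book one</title>
    <enclosure url="http://example.org/covers/111.jpg" type="image/jpeg"/></item>
  <item><title>Book two (placeholder cover)</title>
    <enclosure url="http://example.org/covers/default-grey.gif" type="image/gif"/></item>
  <item><title>Book three, no enclosure</title></item>
</channel></rss>"""

def cover_urls(feed_xml, placeholder="default-grey"):
    """Return enclosure URLs, skipping items whose image is the placeholder."""
    root = ET.fromstring(feed_xml)
    urls = []
    for item in root.iter("item"):
        enc = item.find("enclosure")
        if enc is not None and placeholder not in enc.get("url", ""):
            urls.append(enc.get("url"))
    return urls

print(cover_urls(SAMPLE_FEED))  # -> ['http://example.org/covers/111.jpg']
```

The same filter could run client-side in JavaScript against the fetched feed; the logic (keep enclosures, drop the known placeholder) is identical.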

Or perhaps there's a really easy way to do this that I'm overlooking. 

Would appreciate your insight! 

Thanks,


Re: [CODE4LIB] Conference followup; open position at Google Cambridge

2010-03-15 Thread Michael Beccaria
Will,
I didn't get a chance to attend code4lib this year, but thought I would
respond via the list. At Paul Smith's College we are using google book
thumbnails for our catalog as well as using the embedded viewers to
allow users to view previews of available books. Here is a sample
search:

http://library.paulsmiths.edu/catalog/search/?q=forestindex=textsort=;
limit=pubdaterange:2000-2009facetclick=pubdaterange

We will be switching over to VuFind this summer and I will likely use GB
in a similar way with that interface as well. I plan (hopefully this
summer) to build a web service that uses OCLC Web Services, Open
Library, Hathi Trust, and Google Books to search for and return similar
items from those resources to display in our catalog. I really like the
service overall.

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Will Brockman
Sent: Tuesday, March 09, 2010 5:02 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Conference followup; open position at Google
Cambridge

As a first-time Code4Lib attendee, let me say thanks for a fun
conference - a very interesting and creative group of people!

A question I posed to some of you in person, and would love to hear
more answers to: What are you doing with Google Books?  Do you have a
new way of using that resource?  Are there things you'd like to do
with it that aren't possible yet?

Also, a couple of people asked if Google is hiring.  Not only are we
hiring large numbers of software engineers, but we're now seeking a
librarian / software developer (below).  I'm happy to take questions
about either.

All the best,
Will
brock...@google


Metadata Analyst

Google Books is looking for a hybrid librarian/software developer to
help us organize all the world's books.  This person would work
closely with software engineers and librarians on a variety of tasks,
ranging from algorithm evaluation to designing and implementing
improvements to Google Books.

Candidates should have:
* An MLS or MLIS degree, ideally with cataloguing experience
* Programming experience in Python, C++, or Java
* Project management experience a plus, but not required

This position is full-time and based in Cambridge, MA.


Re: [CODE4LIB] University of Rochester Releases IR+ Institutional Repository System

2009-12-15 Thread Michael Beccaria
Nathan,
Can you summarize how the IR+ software is different than other major
institutional repository software? I'm not directly involved with a
repository and so my understanding of the scope of these products lacks
detail. Where does IR+ fit into the big picture?
Thanks,

Mike Beccaria
Systems Librarian
Head of Digital Initiatives
Paul Smith's College
518.327.6376

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Sarr, Nathan
Sent: Tuesday, December 15, 2009 2:57 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] University of Rochester Releases IR+ Institutional
Repository System

The University of Rochester is pleased to announce the 1.0 production
version of its new open source institutional repository software
platform, IR+.  The University has been running IR+ in production since
August 2009.  
 

The download can be found here:

 

http://code.google.com/p/irplus/downloads/list
 
 
The website for the project can be found here:
 
http://www.irplus.org
 
 
IR+ includes the following features:
 
-   Repository-wide statistics: download counts at the repository,
collection, and publication level. The statistics exclude web-crawler
results, and include the ability to retroactively remove previously
unknown crawlers or download counts that should not be included, for
more accurate statistical reporting.
 
-   Researcher Pages, to allow users (faculty, graduate students,
researchers) to highlight their work and post their CV
 

o   Example of a current researcher:
https://urresearch.rochester.edu/viewResearcherPage.action?researcherId=30

 

-   Ability to create Personal publications that allows users to have
full control over their work and see download counts without publishing
into the repository.

 

-   An online workspace where users can store files they are working on,
and if needed, share files with colleagues or their thesis advisor. 

 

-  Contributor pages where users can view download counts for
all publications that they are associated with in the repository.

 

o   Example of a contributor page:
https://urresearch.rochester.edu/viewContributorPage.action?personNameId=20

 

-  Faceted Searching (example search for: Graduate Student
Research)

 

o
https://urresearch.rochester.edu/searchRepositoryItems.action?query=Medical+Image

 

-  Embargoes (example below embargoed until 2011-01-01)

 

o
https://urresearch.rochester.edu/institutionalPublicationPublicView.action?institutionalItemId=8057

 

 

-  Name Authority Control (Notice changes in last name)

o
https://urresearch.rochester.edu/viewContributorPage.action?personNameId=209

 

 

 
You can see the IR+ system customized for our university and in action
here: https://urresearch.rochester.edu
 
 
A further explanation of highlights can be found on my researcher page
here:
 

https://urresearch.rochester.edu/researcherPublicationView.action?researcherPublicationId=11

 

The documentation for the system (install/user/administration) with lots
of pictures can be found on my researcher page here:

 

https://urresearch.rochester.edu/researcherPublicationView.action?researcherPublicationId=16

 

We would be happy to give you a personal tour of the system and the
features it provides. 

 

Please feel free to contact me with any questions you may have.  

-Nate

 

 

Nathan Sarr

Senior Software Engineer

River Campus Libraries

University of Rochester

Rochester, NY  14627

(585) 275-0692

 


Re: [CODE4LIB] holdings standards/protocols

2009-11-16 Thread Michael Beccaria
VuFind has a connector that works pretty well for SirsiDynix
Unicorn/Symphony users. It leverages a server-side script on the ILS (Perl, I
think) to interface with the API and get holdings data. It is possible to
get account data the same way, though that hasn't been developed.

You can see it run here on our beta install:
http://library.paulsmiths.edu/vufind/Search/Home?lookfor=dog&type=all&submit=Find
Wait a few seconds following page load and the holdings data should
update.

Mike Beccaria
Systems Librarian
Head of Digital Initiatives
Paul Smith's College
518.327.6376


-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Chris Keene
Sent: Monday, November 16, 2009 9:05 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] holdings standards/protocols

Hi

We recently implemented a new third party web catalogue (Aquabrowser).

So far so good, and separating the web based discovery layer from the 
monolithic ILS seems to be the right direction.

However, there seem to be two areas of weakness: holdings and 'My
Account' (renewals, reservations); i.e. the need for *any*
catalogue/discovery system to allow users to see holdings and account
info from *any* ILS.


I'm trying to get my facts right on the current situation. Any help 
appreciated.

Re holdings: two things come up when asking around and looking on the
web (somewhat briefly): Z39.50 and ISO 20775.

Can anyone give me an idea if any/many/all (ILS) Z implementations have 
implemented the holdings information?

Is there a way of testing this using a client such as YAZ (e.g. a worked
example of seeing holdings via Z39.50)?

Is there interest from ILS suppliers in the ISO holdings standard, are 
any of them implementing it?

http://www.loc.gov/standards/iso20775/
http://www.portia.dk/zholdings/
http://www.nlbconference.com/ilds/plenary4B.htm


Thanks for any info
Chris
-- 
Chris Keene c.j.ke...@sussex.ac.uk
Technical Development Manager   Tel (01273) 877950
University of Sussex Library
http://www.sussex.ac.uk/library/


Re: [CODE4LIB] preconference proposals - solr

2009-11-16 Thread Michael Beccaria
I can't make it to c4l this year :( But knowing that the preconferences
are really very valuable, if there were a way for this information to be
recorded and placed online like the main presentations, that would be
amazing!

Mike Beccaria
Systems Librarian
Head of Digital Initiatives
Paul Smith's College
518.327.6376


-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Bess Sadler
Sent: Friday, November 13, 2009 11:26 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] preconference proposals - solr

Hey, how about this? I've been discussing this off list with Erik and  
Naomi and this is what we came up with (I also added it to the wiki):

This is a proposal for several pre-conference sessions that would fit  
together nicely for people interested in implementing a next-gen  
catalog system.

1. Morning session - solr white belt
Instructor: Bess Sadler (anyone else want to join me?)
The journey of solr mastery begins with installation. We will then  
proceed to data types, indexing, querying, and inner harmony. You will  
leave this session with enough information to start running a solr  
service with your own data.

2. Morning session - solr black belt
Instructors: Erik Hatcher (and Naomi Dushay? she has offered to help,  
if that's of interest)
Amaze your friends with your ability to combine boolean and weighted  
searching. Confound your enemies with your mastery of the secrets of  
dismax. Leave slow queries in the dust as you performance tune solr  
within an inch of its life. [We should probably add more specific  
advanced topics here... suggestions welcome]

3. Afternoon session - Blacklight
Instructors: Naomi Dushay, Jessie Keck, and Bess Sadler
Apply your solr skills to running Blacklight as a front end for your  
library catalog, institutional repository, or anything you can index  
into solr. We'll cover installation, source control with git, local  
modifications, test driving development, and writing object-specific  
behaviors. You'll leave this workshop ready to revolutionize discovery  
at your library. Solr white belts or black belts are welcome.

And then anyone else who had a topic that built on solr (e.g.,  
vufind?) could add it in the afternoon. Obviously I'm biased, but I  
really do think the topic of implementing a next gen catalog is meaty  
enough for a half day and I know people are asking me about it and  
eager to attend such a thing.

What do you think, folks?

Bess

On 12-Nov-09, at 4:10 PM, Gabriel Farrell wrote:

 On Tue, Nov 10, 2009 at 02:47:42PM +, Jodi Schneider wrote:
 If you'd be up for it Erik, I'd envision a basic session in the  
 morning.
 Some of us (like me) have never gotten Solr up and running.

 Then the afternoon could break off for an advanced session.

 Though I like Bess's idea, too! Would that be suitable for a  
 conference
 breakout? Not sure I'd want to pit it against Solr advanced session!

 The preconfs should be as inclusive as possible, but I'm wondering if
 the Solr session might be more beneficial if we dive into the
 particulars right off the bat in the morning.  There are only a few
 steps to get Solr up and running -- it's in the configuration for our
 custom needs that the advice of a certain Mr. Hatcher can really be
 helpful.

 You're right, though, that the NGC thing sounds more like a BOF  
 session.
 I'd support that in order to attend a full preconf day of Solr.


 Gabriel

Elizabeth (Bess) Sadler
Chief Architect for the Online Library Environment
Box 400129
Alderman Library
University of Virginia
Charlottesville, VA 22904

b...@virginia.edu
(434) 243-2305


[CODE4LIB] SerialsSolutions Javascript Question

2009-10-28 Thread Michael Beccaria
I was intrigued by someone who posted to the Worldcat Developers Network
forum. They were asking about the xISSN service and having it return
whether an ISSN is peer reviewed or not. Which got me thinking...Has
anyone been able to finagle a feature into their SerialsSolutions A-Z
list where it shows peer reviewed status for the titles that are
returned using a WC service? SS has limited editing capabilities on
their page so the javascript question is this:

Is it possible, when you can edit ONLY the header, to alter span
tags of a loaded web page using JavaScript? Can I insert some JavaScript
in those sections that will scrape the ISSN number from these span tags
and add some dynamic content from a web service, using JavaScript alone?

I'm not very proficient at javascript so be gentle.

Mike Beccaria
Systems Librarian
Head of Digital Initiatives
Paul Smith's College
518.327.6376


Re: [CODE4LIB] SerialsSolutions Javascript Question

2009-10-28 Thread Michael Beccaria
I should clarify. The most granular piece of information in the HTML is
a class attribute (i.e. there is no id). Here is a snippet:
<div class="SS_Holding" style="background-color: #CECECE">
<!-- Journal Information -->
<span class="SS_JournalTitle"><strong>Annals of forest
science.</strong></span>&nbsp;<span
class="SS_JournalISSN">(1286-4560)</span>


I want to alter the <span class="SS_JournalISSN">(1286-4560)</span>
section. Maybe add some HTML after the ISSN that tells whether it is
peer reviewed or not.

Mike Beccaria
Systems Librarian
Head of Digital Initiatives
Paul Smith's College
518.327.6376




Re: [CODE4LIB] OCR PDFs

2008-10-20 Thread Michael Beccaria
It's not exactly what you're looking for, but Microsoft Office comes
with a scriptable OCR engine that works on TIFFs. I use it to get text
from yearbooks we are scanning so people can look for names and such.
While I wouldn't put it on par with ABBYY, it does a pretty decent job.

I wrote a simple script in vbscript that scans all the tiff files in a
folder and exports a txt file with the same name as the image that has
all of the text it finds. If you want it, let me know and I'll send it
your way.
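For reference, the same loop is easy to express in Python with a stand-in OCR call (the `ocr_tiff` stub below is hypothetical; wire it to MODI via COM on Windows, tesseract elsewhere):

```python
from pathlib import Path

def ocr_tiff(tiff_path):
    """Placeholder: call MODI/tesseract/etc. here and return the text."""
    return ""  # hypothetical stub -- no real OCR happens in this sketch

def ocr_folder(folder):
    """OCR every TIFF in `folder`, writing <name>.txt beside each image."""
    written = []
    for tiff in sorted(Path(folder).glob("*.tif")):
        out = tiff.with_suffix(".txt")  # a.tif -> a.txt
        out.write_text(ocr_tiff(tiff))
        written.append(out.name)
    return written
```

Only the image-to-text call is engine-specific; the walk-the-folder, same-name-sidecar convention carries over unchanged.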

Mike Beccaria 
Systems Librarian 
Head of Digital Initiatives 
Paul Smith's College 
518.327.6376 
[EMAIL PROTECTED] 
 
---
This message may contain confidential information and is intended only
for the individual named. If you are not the named addressee you should
not disseminate, distribute or copy this e-mail. Please notify the
sender immediately by e-mail if you have received this e-mail by mistake
and delete this e-mail from your system.
-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
James Tuttle
Sent: Friday, October 17, 2008 7:57 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] OCR PDFs

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

I wonder if any of you might have experience with creating text PDFs
from  TIFFs.  I've been using tiffcp to stitch TIFFs together into a
single image and then using tiff2pdf to generate PDFs from the single
TIFF.  I've had to pass this image-based PDF to someone with Acrobat to
use its batch-processing facility to OCR the text and save a text-based
PDF.  I wonder if anyone has suggestions for software I can integrate
into the script (Python on Linux) I'm using.

Thanks,
James

- --
- ---
James Tuttle
Digital Repository Librarian

NCSU Libraries, Box 7111
North Carolina State University
Raleigh, NC 27695-7111
[EMAIL PROTECTED]

(919)513-0651 Phone
(919)515-3031  Fax

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFI+H1zKxpLzx+LOWMRAgxIAJwNXyeMJbk6r6hmHpNAdEvWIQbCVgCgp8JR
nyS3WZ4UuRbU/6DTH7ohe/M=
=mT2T
-END PGP SIGNATURE-


Re: [CODE4LIB] marc4j 2.4 released

2008-10-20 Thread Michael Beccaria
Very cool! I noticed that a feature, MarcDirStreamReader, is capable of
iterating over all marc record files in a given directory. Does anyone
know of any de-duplicating efforts done with marc4j? For example,
libraries that have similar holdings would have their records merged
into one record with a location tag somewhere. I know places do it
(consortia etc.) but I haven't been able to find a good open program
that handles stuff like that.
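The core of such a de-duplication pass is picking a match key and unioning holdings. The records below are plain dicts standing in for parsed MARC records; a real implementation would read them with marc4j (Java) or pymarc (Python) and use a stronger match key (OCLC number, LCCN, or a normalized title/author/date hash) than the bare ISBN used here. All names are invented for illustration.

```python
def merge_duplicates(records):
    """Merge records sharing a match key, unioning their location lists."""
    merged = {}
    for rec in records:
        # Fall back to a normalized title when no ISBN is present.
        key = rec.get("isbn") or rec["title"].lower().strip()
        if key in merged:
            # Duplicate: keep the first bib record, add the new location.
            merged[key]["locations"].extend(rec["locations"])
        else:
            merged[key] = {**rec, "locations": list(rec["locations"])}
    return list(merged.values())

recs = [
    {"isbn": "0618379436", "title": "The Lord of the Rings",
     "locations": ["Library A"]},
    {"isbn": "0618379436", "title": "Lord of the Rings, The",
     "locations": ["Library B"]},
    {"isbn": None, "title": "Walden", "locations": ["Library A"]},
]
print(merge_duplicates(recs))  # two records; the first holds both locations
```

The hard part in consortial practice is the match key itself, not the merge; ISBNs alone conflate editions and miss pre-ISBN material.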

Mike Beccaria 
Systems Librarian 
Head of Digital Initiatives 
Paul Smith's College 
518.327.6376 
[EMAIL PROTECTED] 
 

-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
Bess Sadler
Sent: Monday, October 20, 2008 11:12 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] marc4j 2.4 released

Dear Code4Libbers,

I'm very pleased to announce that for the first time in almost two  
years there has been a new release of marc4j. Release 2.4 is a minor  
release in the sense that it shouldn't break any existing code, but  
it's a major release in the sense that it represents an influx of new  
people into the development of this project, and a significant  
improvement in marc4j's ability to handle malformed or mis-encoded  
marc records.

Release notes are here:
http://marc4j.tigris.org/files/documents/220/44060/changes.txt

And the project website, including download links, is here:
http://marc4j.tigris.org/

We've been using this new marc4j code in solrmarc since solrmarc  
started, so if you're using Blacklight or VuFind, you're probably  
using it already, just in an unreleased form.

Bravo to Bob Haschart, Wayne Graham, and Bas Peters for making these  
improvements to marc4j and getting this release out the door.

Bess

Elizabeth (Bess) Sadler
Research and Development Librarian
Digital Scholarship Services
Box 400129
Alderman Library
University of Virginia
Charlottesville, VA 22904

[EMAIL PROTECTED]
(434) 243-2305


[CODE4LIB] Google Books Dynamic Links API and Python

2008-10-02 Thread Michael Beccaria
Not everyone will care, but I will put it in here for posterity's sake and
probably for my own reference when I forget in the future.

I was having trouble getting the new google books dynamic link api to
work right with python
(http://code.google.com/apis/books/docs/dynamic-links.html). I was using
the basic urllib python library with a non-working code base that looks
like this:

import urllib,urllib2
gparams = urllib.urlencode({'bibkeys': 'ISBN:061837943',
'jscmd':'viewapi','callback':'mycallback'})
g = urllib2.urlopen("http://books.google.com/books?%s" % gparams)
print g.read()

I was getting an http 401 error, Unauthorized. Code4lib IRC folks told
me it was probably the headers urllib was sending, and they were right.
I wrote code to modify the headers to make Google believe I was
requesting from Firefox. The working code is below. I know most of you
can write this stuff in your sleep, but I thought this might save
someone like me some time in the end.
Hope it helps,
Mike Beccaria 
Systems Librarian 
Head of Digital Initiatives 
Paul Smith's College 
518.327.6376 
[EMAIL PROTECTED] 


import urllib,urllib2
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
params = urllib.urlencode({'bibkeys': 'ISBN:061837943',
'jscmd':'viewapi','callback':'mycallback'})

request =
urllib2.Request('http://books.google.com/books?bibkeys=0618379436&jscmd=viewapi&callback=mycallback')
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT
5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')]
data = opener.open(request).read()
print data


Re: [CODE4LIB] Google Books Dynamic Links API and Python

2008-10-02 Thread Michael Beccaria
Scratch that, the code is simpler. Serves me right for not checking
things twice:
import urllib,urllib2
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
request =
urllib2.Request('http://books.google.com/books?bibkeys=0618379436&jscmd=viewapi&callback=mycallback')
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT
5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')]
data = opener.open(request).read()
print data
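For readers on Python 3, where urllib and urllib2 were merged into urllib.request, the equivalent is a Request with an explicit User-Agent header. The sketch below only builds the request; the User-Agent string is illustrative, since any browser-like value worked around the 401 described above.

```python
from urllib.request import Request

# Python 3 equivalent of the urllib2 snippet: set a browser-like
# User-Agent on the request so the API does not return 401.
url = ("http://books.google.com/books"
       "?bibkeys=ISBN:0618379436&jscmd=viewapi&callback=mycallback")
request = Request(url, headers={"User-Agent": "Mozilla/5.0"})

# To actually fetch:
#   from urllib.request import urlopen
#   data = urlopen(request).read()
print(request.get_header("User-agent"))  # urllib stores keys capitalized
```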

Mike Beccaria 
Systems Librarian 
Head of Digital Initiatives 
Paul Smith's College 
518.327.6376 
[EMAIL PROTECTED] 
 


Re: [CODE4LIB] Free covers from Google

2008-03-18 Thread Michael Beccaria
> If you can find a public email address
> anywhere or comment form, let us know.

You can send a response to them here:
http://www.google.com/support/librariancenter/bin/request.py

The form seems to be for librarians so maybe they'll understand the
issue and talk to people who may be able to make a change.

Mike Beccaria
Systems Librarian
Head of Digital Initiatives
Paul Smith's College
518.327.6376
[EMAIL PROTECTED]


[CODE4LIB] Whatbird Interface Framework

2007-12-18 Thread Michael Beccaria
Hey all,

I'm considering trying to create a framework/tool to allow people to
create a whatbird.com like interface for other types of datasets
(plants, trees, anything really).

The idea is to create a framework allowing users to create a discovery
tool with attribute selections to narrow down the result set. So, for
example, our faculty/students would identify attributes found in all
trees (leaf shape, fruit, bark, form, etc.) and then input this data
into the tool which would then allow them to input actual trees and
associate them with the attributes (as well as input description info,
pictures, etc.). The end result would look something like whatbird.com
does with birds.

This will be a challenge for me (but a good one). My thought is to use a
web framework like Django (picked because I know it a little), though I'm
unsure whether it can model the database tables and relationships
properly. I considered using Solr but thought it would be overkill for
the relatively small datasets this tool would be used to create (under
1,000 objects), though in the end it might be a good bet. If approved (I
have to talk to the dean of our forestry department to see if he will buy
into the idea), I will try to create the bulk of it during January and
tweak it for the rest of the semester.
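The data model is essentially a many-to-many relationship between items and attribute values, with search as set intersection. A minimal sketch in plain Python follows; in Django this would be two models (say, Item and AttributeValue) joined by a ManyToManyField, and all names and sample data here are invented for illustration.

```python
# Toy dataset: each tree maps to the set of attribute values it has.
TREES = {
    "Sugar Maple": {"leaf:lobed", "fruit:samara", "bark:furrowed"},
    "White Pine":  {"leaf:needle", "fruit:cone", "bark:furrowed"},
    "Paper Birch": {"leaf:toothed", "fruit:catkin", "bark:smooth"},
}

def narrow(selected, items=TREES):
    """Return items whose attribute sets include every selected attribute."""
    return sorted(name for name, attrs in items.items()
                  if selected <= attrs)

# Each attribute the user picks narrows the candidate set, whatbird-style.
print(narrow({"bark:furrowed"}))                 # still two candidates
print(narrow({"bark:furrowed", "leaf:needle"}))  # narrowed to one
```

At under 1,000 objects this intersection can run in memory on every request, which supports the instinct that Solr would be overkill.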

Anyone interested in working on this type of project with me?

Mike Beccaria
Systems Librarian
Head of Digital Initiatives
Paul Smith's College
518.327.6376
[EMAIL PROTECTED]


[CODE4LIB] Preconference Location

2007-02-16 Thread Michael Beccaria
I noticed that the pre-conference location was changed to the Tate
Student Center. Is this near the Georgia Center Hotel? Come to think of
it, is the Georgia Center for Continuing Education nearby as well?

I'm asking because I won't have a car...just that shuttle to and from
the airport.

Thanks,
Mike


Mike Beccaria
Systems Librarian
Head of Digital Initiatives
Paul Smith's College
518.327.6376
[EMAIL PROTECTED]