Re: [CODE4LIB] web-based ocr

2013-03-12 Thread Richard Sarvas
Something like this is on my to-do list for our future Fedora Commons 
deployment here at UConn. I was considering wrapping a SOAP interface around 
something like the Perl Image::OCR::Tesseract module and adding it to our 
ingest pipeline, unless someone can recommend a better OCR application.


Rick


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Till 
Kinstler
Sent: Tuesday, March 12, 2013 12:30 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] web-based ocr

Am 12.03.2013 16:57, schrieb Eric Lease Morgan:

 Does anybody know of something like this that exists already?

We are running something like this. Not with a HTML or REST-ful front end, but 
WebDAV. The users of this service do mass digitization. They mount their 
individual WebDAV share, push scanned image files there and read the OCR 
results from output files (usually not by hand but with some software that 
manages their digitization workflow).
The actual OCR is done by an ABBYY Recognition Server, the WebDAV front end 
including accounting is a straightforward home-brewed solution.
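
For readers picturing the client side of such a workflow, here is a minimal Python sketch. It assumes the WebDAV share is already mounted as an ordinary directory and that each result file carries the image's base name plus a .txt extension. The directory layout, the naming rule, and the function name are assumptions for illustration, not a description of the VZG setup.

```python
from pathlib import Path

# Image types the hypothetical digitization workflow pushes to the share.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}

def pending_images(inbox, outbox, result_suffix=".txt"):
    """Return names of images in `inbox` with no matching result in `outbox` yet."""
    inbox, outbox = Path(inbox), Path(outbox)
    pending = []
    for image in sorted(inbox.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # ignore anything that is not a scanned image
        if not (outbox / (image.stem + result_suffix)).exists():
            pending.append(image.name)
    return pending
```

Workflow software could call pending_images() on a timer and collect results as they appear.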

Till

--
Till Kinstler
Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG) Platz der Göttinger 
Sieben 1, D 37073 Göttingen kinst...@gbv.de, +49 (0) 551 39-13431, 
http://www.gbv.de


Re: [CODE4LIB] web-based ocr

2013-03-13 Thread Richard Sarvas
FYI - the Image::OCR::Tesseract install was a real pain for me on RHEL. I kept 
running into problems getting one of the dependency modules, Time::Format, 
installed (Date::Manip::TZ_Base errors). Eventually I had to install 
Date::Manip via yum, then do a force install of Time::Format. After that, 
Image::OCR::Tesseract refused to recognize that the Tesseract executable was 
installed, because compiling and installing from source placed the executable 
in /usr/local/bin and not /usr/bin. Once I moved the Tesseract executable to 
/usr/bin, the Image::OCR::Tesseract module install worked fine (ImageMagick and 
Leptonica having been previously installed).
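
A quick way to see where an executable actually landed is a small check like the Python sketch below. It is only an illustration: the function name and the default directory list are assumptions, and it says nothing about how Image::OCR::Tesseract itself searches for the binary.

```python
import os
import shutil
from pathlib import Path

def locate_executable(name, extra_dirs=("/usr/local/bin", "/usr/bin")):
    """Return the full path of `name`, checking PATH first, then `extra_dirs`."""
    found = shutil.which(name)  # honours the current PATH
    if found:
        return found
    for directory in extra_dirs:
        candidate = Path(directory) / name
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return str(candidate)
    return None  # not installed, or installed somewhere unexpected
```

If locate_executable("tesseract") reports /usr/local/bin/tesseract, a symlink from /usr/bin would likely have the same effect as moving the file.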


Rick


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eric 
Lease Morgan
Sent: Wednesday, March 13, 2013 8:54 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] web-based ocr

On Mar 13, 2013, at 8:07 AM, Ben Brumfield benwb...@gmail.com wrote:

 https://github.com/idigbio-aocr/RESTAPI/tree/master/doc

Interesting. Printed for future reference. Thank you.

BTW, I did finally get Image::OCR::Tesseract to make, make test, and make 
install correctly. I did not have the correct/proper libraries installed for 
Tesseract's supporting Leptonica library. Now I need to find a PDF library 
similar to libtiff and libpng. 

--
Eric Morgan


Re: [CODE4LIB] Lorem Ipsum metadata? Is there such a thing?

2013-12-09 Thread Richard Sarvas
I've had good luck using both the Data::Faker and Text::Lorem Perl modules to 
generate large amounts (30k+ rows) of Archivists' Toolkit test data. Other ports 
of Data::Faker would probably work just as well, though it needs a bit more 
code to generate more than name, address and contact info. At the time I was 
mostly just generating person data for the AT names table. I had considered 
one day extending the code so that more detailed person data could be created, 
but I never got around to it. It never occurred to me that there might 
actually be a need for something along the lines of a scholarly NPC 
generator. 
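
For anyone who wants the flavor of this without Perl, here is a rough Python sketch that produces name, address and contact rows as CSV. It uses only the standard library rather than Data::Faker or any of its ports, and the word lists, column names and function names are all made up for illustration.

```python
import csv
import io
import random

# Tiny illustrative vocabularies; a real run would want far larger lists.
FIRST_NAMES = ["Ada", "Grace", "Melvil", "Seymour", "Henriette"]
LAST_NAMES = ["Lovelace", "Hopper", "Dewey", "Lubetzky", "Avram"]
STREETS = ["Main St", "Library Ln", "Archive Ave", "Stacks Rd"]

def fake_person(rng):
    """Build one fake person record from the vocabularies above."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    return {
        "name": f"{last}, {first}",
        "address": f"{rng.randint(1, 999)} {rng.choice(STREETS)}",
        "email": f"{first}.{last}@example.org".lower(),
    }

def fake_people_csv(rows, seed=0):
    """Return `rows` fake people as CSV text; the same seed gives the same data."""
    rng = random.Random(seed)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "address", "email"])
    writer.writeheader()
    for _ in range(rows):
        writer.writerow(fake_person(rng))
    return buffer.getvalue()
```

Seeding the generator keeps test data reproducible between runs, which helps when a bug only shows up on one particular row.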


Rick

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Sean 
Hannan
Sent: Sunday, December 08, 2013 7:00 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Lorem Ipsum metadata? Is there such a thing?

In ruby, there's the ffaker gem (https://github.com/EmmanuelOga/ffaker), which 
itself is a port of Perl's Data::Faker. 

-Sean

From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Pottinger, 
Hardy J. [pottinge...@missouri.edu]
Sent: Saturday, December 07, 2013 11:51 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Lorem Ipsum metadata? Is there such a thing?

Hi, I asked this on Google Plus earlier today, but I figured I'd better take 
this question here: my brain is trying to tell me that there's a service or app 
that makes fake metadata, kind of like Lorem Ipsum but you feed it your 
fields and it gives you nonsense metadata back. But, it looks right enough for 
testing. Yesterday, I had to make up about 50 rows of fake metadata to test 
some code that handles paging in a UI, and I had to make it all up by hand. 
This hurts my soul. Someone please tell me such a service exists, and link me 
to it, so I never have to do this again. Or else, I may just make such a 
service, to save us all. But I don't want to go coding some new service if it 
already exists, because that sort of thing is for chumps.
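
As a rough idea of what "feed it your fields and get nonsense back" could look like, here is a small Python sketch. It is not an existing service; the function name, the word list and the parameters are assumptions chosen for illustration.

```python
import random

# Stock filler words; any vocabulary would do.
LOREM_WORDS = (
    "lorem ipsum dolor sit amet consectetur adipiscing elit sed do "
    "eiusmod tempor incididunt ut labore et dolore magna aliqua"
).split()

def fake_records(fields, count, seed=0, words_per_value=(1, 4)):
    """Return `count` dicts, each mapping every field name to filler text."""
    rng = random.Random(seed)
    low, high = words_per_value
    records = []
    for _ in range(count):
        record = {}
        for field in fields:
            n = rng.randint(low, high)  # vary the length so rows look less uniform
            words = [rng.choice(LOREM_WORDS) for _ in range(n)]
            record[field] = " ".join(words).capitalize()
        records.append(record)
    return records
```

Calling fake_records(["title", "creator", "subject"], 50) would give the roughly 50 rows needed to exercise paging in a UI.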


--
HARDY POTTINGER pottinge...@umsystem.edu University of Missouri Library 
Systems http://lso.umsystem.edu/~pottingerhj/
https://MOspace.umsystem.edu/
Making things that are beautiful is real fun. --Lou Reed


Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?

2013-12-19 Thread Richard Sarvas
I did some experimentation wrapping the Perl Image::ExifTool module (along with 
Image::OCR::Tesseract) in some code that exposed it as a SOAP service for use 
in a Fedora Commons ingest service. It seemed to work well enough for bulk file 
processing in testing, though the approach of a custom ingest system, in 
general, was eventually abandoned when consultants were brought in. 

Were I to do it again I'd probably also add a REST interface to the generic 
service wrapper.
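
To make the REST idea concrete, here is a minimal Python sketch of an HTTP wrapper around the tesseract command-line tool. It is not the SOAP code described above: the /ocr path, the port, and all function names are hypothetical, and it assumes a tesseract build that accepts "stdout" as the output base.

```python
import subprocess
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer

def ocr_command(image_path, lang="eng"):
    """Build the tesseract command line; 'stdout' asks it to print the text."""
    return ["tesseract", image_path, "stdout", "-l", lang]

def run_ocr(image_bytes, runner=subprocess.run):
    """Write the upload to a temp file, run OCR on it, and return the text."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(image_bytes)
        tmp.flush()
        result = runner(ocr_command(tmp.name),
                        capture_output=True, text=True, check=True)
    return result.stdout

class OcrHandler(BaseHTTPRequestHandler):
    """Accept POST /ocr with raw image bytes and reply with plain text."""

    def do_POST(self):
        if self.path != "/ocr":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            text = run_ocr(self.rfile.read(length))
        except (OSError, subprocess.CalledProcessError):
            self.send_error(500, "OCR failed")
            return
        body = text.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def make_server(port=8080):
    """Create (but do not start) a local server for the handler above."""
    return HTTPServer(("127.0.0.1", port), OcrHandler)
```

Running make_server().serve_forever() would start it; an ingest pipeline would then POST each image and store the returned text alongside the object.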


Rick


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Edward 
Summers
Sent: Tuesday, December 17, 2013 4:54 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: 
Possible or Pipedream?

I remember hearing somewhere that ExifTool is pretty good for extracting image 
metadata. 
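
For a sense of how that extraction can be scripted, here is a small Python sketch that shells out to the exiftool command-line program and parses its JSON output. The function names are made up for illustration, and it assumes exiftool is installed and on the PATH.

```python
import json
import subprocess

def exiftool_command(path):
    """Build the command line; -json requests machine-readable output."""
    return ["exiftool", "-json", path]

def extract_metadata(path, runner=subprocess.run):
    """Return the embedded metadata of one file as a dict of tag names to values."""
    result = runner(exiftool_command(path),
                    capture_output=True, text=True, check=True)
    # exiftool prints a JSON array with one object per input file.
    return json.loads(result.stdout)[0]
```

Which tags come back depends entirely on what the camera or scanning software wrote into the file.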

edsu--


Re: [CODE4LIB] EZProxy changes / alternatives ?

2014-01-29 Thread Richard Sarvas
What about using some of the open source WSO2 products to mimic the same 
functionality as EZProxy? This sounds like a task that Enterprise Service Bus 
(ESB) combined with Identity Server (IS) could do. Most of their products are 
some version of an Apache project or other wrapped up in a common user 
interface.

http://wso2.com/products/

They're free, so the price is right.


Rick

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Riley 
Childs
Sent: Tuesday, January 28, 2014 10:13 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] EZProxy changes / alternatives ?

What about CAS, it has some proxy component...I think.

Sent from my iPhone

 On Jan 28, 2014, at 10:11 PM, tmccanna tmcca...@georgialibraries.org 
 wrote:

 Ditto to Andreas.


 Sent from my Verizon Wireless 4G LTE Smartphone

 -Original Message-
 From: Andreas Orphanides akorp...@ncsu.edu
 Date: 01/28/2014 9:29 PM (GMT-05:00)
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] EZProxy changes / alternatives ?

 That's simple for the techs, but VPNs can be a royal pain in the keester 
 if you're an end-user, for a variety of reasons. It should be incumbent on 
 us as information specialists to unburden the user to the extent possible.


 On Tue, Jan 28, 2014 at 9:23 PM, Aaron Addison 
 addi...@library.umass.edu wrote:

 Some use Squid, it's not hard to set up.  But most vendors publish 
 rules with ezproxy in mind.

 The other fairly simple solution is to run a VPN for access, and 
 require people to use that.

 Aaron


 On Tuesday, January 28, 2014, stuart yeates stuart.yea...@vuw.ac.nz wrote:

 We've just received notification of forthcoming changes to EZProxy, which 
 will require us to pay an arm and a leg for future versions to 
 install locally and/or host with OCLC AU with a ~ 10,000km round trip.

 What are the alternatives?

 cheers
 stuart
 --
 Stuart Yeates
 Library Technology Services http://www.victoria.ac.nz/library/



Re: [CODE4LIB] separate list for jobs

2014-05-06 Thread Richard Sarvas
Not to be a jerk about this, but why is the answer always No? There seem to 
be more posts on this list relating to job openings than there are relating to 
code discussions. Are job postings part of why this list was originally created? 
If so, I'll stop now. 

Then again, perhaps as a group we are just not posting enough code-related 
topics to drown out the occasional job posting.


Rick


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
Wilhelmina Randtke
Sent: Tuesday, May 06, 2014 12:39 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] separate list for jobs

This comes up all the time, and always it's no.  For anyone who doesn't like 
the job postings, use email filters.
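
For anyone unsure what such a filter amounts to, here is a minimal Python sketch of the rule. It assumes job postings carry "Job:" in the subject line, as the proposal quoted below implies; the folder names and function names are hypothetical, and real mail clients express the same rule through their own filter settings.

```python
from email import message_from_string

def is_job_posting(raw_message, marker="Job:"):
    """True if the Subject header contains the job marker (case-insensitive)."""
    subject = message_from_string(raw_message).get("Subject", "")
    return marker.lower() in subject.lower()

def target_folder(raw_message):
    """Pick a folder name for a message based on the rule above."""
    return "code4lib-jobs" if is_job_posting(raw_message) else "code4lib"
```

A thread like this one, whose subject merely mentions jobs, stays in the main folder because it lacks the "Job:" marker.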


-Wilhelmina

On Tue, May 6, 2014 at 11:34 AM, Dan Chudnov daniel.chud...@gmail.com wrote:
 Is it time to reconsider:  should we start a separate list for Job: 
 postings?  code4lib-jobs, perhaps?

   -Dan


Re: [CODE4LIB] separate list for jobs

2014-05-06 Thread Richard Sarvas
Very well then, carry on with the job postings.


Rick


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Steve 
Meyer
Sent: Tuesday, May 06, 2014 1:34 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] separate list for jobs

There is another benefit in addition to, skills should I cultivate. There is 
a follow-the-money factor. Declaring I'm for Linked Data is one thing.
Putting Linked Data in a job title is something far more significant.

Since code4lib is not always boast4lib-ish, it would be too great a loss to not 
see the evidence of financial investment by institutions for things like the 
Hydra stack (Solr, Fedora, Blacklight...) over the last few years.
When your HR department says you are building an RDF-based triple store, I am 
pretty certain you will be doing it.


On Tue, May 6, 2014 at 1:01 PM, Kyle Banerjee kyle.baner...@gmail.com wrote:

 On Tue, May 6, 2014 at 9:59 AM, Richard Sarvas richard.sar...@lib.uconn.edu 
 wrote:

  Not to be a jerk about this, but why is the answer always No? There 
  seem to be more posts on this list relating to job openings than there 
  are relating to code discussions. Are job postings a part why this 
  list was originally created? If so, I'll stop now.
 

 Fragmentation dilutes the community and creates an unnecessary barrier 
 by requiring people to know one more thing. Email filters take no time 
 at all to set up so anyone who considers them noise doesn't need to be 
 exposed to them.

 kyle



Re: [CODE4LIB] separate list for jobs

2014-05-06 Thread Richard Sarvas
Actually, I am not complaining. I am just wondering why I am receiving so many 
job postings on a listserv that I thought was supposed to relate to 
Code4Lib conferences and coding in library environments. Had the list been 
called Code4LibJobs I suspect I never would have asked the question in the 
first place. As that is not the title of this list I felt it was a reasonable 
question, mostly because every time this topic comes up people simply respond 
No without explaining why. When the topic was proposed by another member I 
took the time to seek clarification.

Still, thanks for taking the time to explain the reason why so many job 
postings appear on this list. 


Rick


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Stuart 
Yeates
Sent: Tuesday, May 06, 2014 3:51 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] separate list for jobs

On 05/07/2014 04:59 AM, Richard Sarvas wrote:
 Not to be a jerk about this, but why is the answer always No? There seem to 
 be more posts on this list relating to job openings than there are relating 
 to code discussions. Are job postings a part why this list was originally 
 created? If so, I'll stop now.

The answer is always no because we are collectively using the possession 
of an email client with filtering capability and the personal knowledge of how 
to use it as a Shibboleth for group membership. Those who find it easier to 
complain than write a filter mark themselves as members of the outgroup 
intruding on the ingroup.

cheers
stuart


Re: [CODE4LIB] Withdraw my post was: Re: [CODE4LIB] separate list for jobs

2014-05-08 Thread Richard Sarvas
Let's not dwell on any single reply in this thread - that tends to make people 
uncomfortable, and is not something I want to be a part of. We have a lively and 
interesting discussion going and we've also gained some new insights as to how 
some subscribers are using this list and for what reasons. I think the main 
point discovered so far is that the job postings are considered far more 
important by the overall community than some of us previously suspected (myself 
included). 

I have the answer to the question I was originally looking for, thank you all. 



Rick


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Stuart 
Yeates
Sent: Wednesday, May 07, 2014 8:28 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Withdraw my post was: Re: [CODE4LIB] separate list for jobs

The fact that the only person who has given any acknowledgement of 
understanding my message was someone else in .ac.nz suggests that despite my 
best efforts my message content was effectively shredded by the implicit 
conversion from New Zealand English to International English.

My apologies; I withdraw my original email.

To translate explicitly into International English, my point was:

I have observed that an individual's position on mail filtering vs separate 
mailing lists appears to be an implicit marker of group membership in this 
group (i.e. a shibboleth).

Note that I do not endorse this or any other marker of group membership, but my 
understanding of the psychology of groups suggests that all functional groups 
have markers of group membership and that attempting to eliminate markers of 
group membership in an attempt at inclusiveness (a) can in itself be a marker of 
group membership and (b) is only likely to drive a shift from relatively explicit 
markers to relatively implicit markers.

cheers
stuart

On 05/08/2014 10:17 AM, David Friggens wrote:
 This is a pretty terrible reply.

 I thought it was a great reply.

 obscure words (seriously, shibboleth?)

 Somewhat obscure, but not so much in Code4Lib.
 http://en.wikipedia.org/wiki/Shibboleth
 http://en.wikipedia.org/wiki/Shibboleth_(Internet2)

 Unless you're trying to be sarcastic...in which case ignore this.

 He most definitely was.

 I believe Stuart's point was to suggest that when the multiple 
 requests for a separate list for job notices get immediately shot down 
 with no - use an email filter, or are you stupid? [1] it doesn't 
 help to create an inclusive and good learning environment.

 [1] NB the respondents aren't explicitly saying "are you stupid" but that's 
 how it may be taken by some people.

 And to answer the original question - job listings help more people than 
 they annoy so they should be kept as-is.

 My view is that it would make more sense to have separate discussion 
 and job notice lists, as I see in other places. But I'm not that 
 bothered personally, as I would subscribe to both and filter them into 
 the same folder in my mail client. :-)

 Cheers
 David



Re: [CODE4LIB] Mac OS 9 emulator

2015-04-23 Thread Richard Sarvas
Lynda,
Do you need to use the data on the file system in an emulated environment or 
are you just trying to access the data on the file system created by OS9?


Rick


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kyle 
Banerjee
Sent: Thursday, April 23, 2015 1:38 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Mac OS 9 emulator

On Thu, Apr 23, 2015 at 10:20 AM, Schmitz Fuhrig, Lynda  
schmitzfuhr...@si.edu wrote:

 Thanks for the responses.

 We actually need to read media within it so Virtual Box would not work 
 for us.


Could you say a bit more about your use case? Some applications such as dealing 
with archival materials might actually require actual hardware in which case 
ebay may be the best option.

kyle


Re: [CODE4LIB] Mac OS 9 emulator

2015-04-23 Thread Richard Sarvas
For data transfer between old virtual Mac drive volumes (and CDs), I used to 
use a free program called HFSExplorer on Windows. Since it is written in Java, 
it might work on other platforms too. The source code appears to be available 
for download as well.

http://www.catacombae.org/hfsexplorer/

Another option might be to create an old Windows NT or 2000 system with 
Services for Macintosh installed to directly read old Mac drives or to 
connect legacy Mac hardware over a network.


Rick

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Schmitz 
Fuhrig, Lynda
Sent: Thursday, April 23, 2015 3:04 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Mac OS 9 emulator

Typically we are trying to access the data and transfer it for preservation 
work. There could be cases though where the data will need to be in the 
emulated environment in order to replay. We can't always predict what types of 
records we are going to get from across the Institution. We have encountered 
CDs and diskettes that will only read in the OS 9 environment and won't even be 
recognized in OS 10.x.

Lynda Schmitz Fuhrig
Electronic Records Archivist
Digital Services Division
Smithsonian Institution Archives
Capital Gallery Building
600 Maryland Ave SW
Suite 3000
MRC 507
Washington, DC 20024-2520

siarchives.si.edu | @SmithsonianArch | Facebook | e-newsletter

A gift in support of the Archives will help make more of our collections 
accessible!




Re: [CODE4LIB] looking for free hosting for html code

2015-05-22 Thread Richard Sarvas
What about using Code Anywhere?

https://codeanywhere.com/

They have a free hosting option that can also drag/drop data from Dropbox and 
Google Drive, though content hosting can only be done from the sandbox server 
they create for you as part of your account. The integrated web-based HTML 
editor is not that bad to work with.


Rick


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Charlie 
Morris
Sent: Friday, May 22, 2015 9:14 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] looking for free hosting for html code

I've never done this, but I've heard you can use DropBox in an unofficial 
capacity to host basic pages too:
http://www.dropboxwiki.com/tips-and-tricks/host-websites-with-dropbox

On Fri, May 22, 2015 at 8:59 AM, Joe Hourcle onei...@grace.nascom.nasa.gov
wrote:

 On Fri, 22 May 2015, Sarles Patricia (18K500) wrote:

 [trimmed]

  I plan to teach coding to my 6th and 12th grade students next school 
  year and our lab has a mixture of old (2008) and new Macs (2015) so I 
  want to make all the Macs functional for writing code in an editor.

 My next question is this:

 I am familiar with free Web creation and hosting sites like Weebly, 
 Wix, Google sites, Wikispaces, WordPress, and Blogger, but do you 
 know of any free hosting sites that will allow you to plug in your 
 own code. i.e. host your own html files?


 If it's straight HTML, and doesn't need any sort of text 
 pre-processing (SSI, ASP, JSP, PHP, ColdFusion, etc.), I think that 
 you can use Google Drive.  This help page seems to suggest that's true:

 https://support.google.com/drive/answer/2881970?hl=en

 With all static files it might also be possible to lay things out so 
 that you could serve it through github or similar.  (and teaching them 
 about version control isn't a bad idea, either)

 -Joe