Re: [CODE4LIB] web-based ocr
Something like this is on my to-do list for our future Fedora Commons deployment here at UConn. I was considering wrapping a SOAP interface around something like the Perl Image::OCR::Tesseract module and adding it to our ingest pipeline, unless someone can recommend a better OCR application. Rick -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Till Kinstler Sent: Tuesday, March 12, 2013 12:30 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] web-based ocr On 12.03.2013 at 16:57, Eric Lease Morgan wrote: Does anybody know of something like this that exists already? We are running something like this. Not with an HTML or RESTful front end, but WebDAV. The users of this service do mass digitization. They mount their individual WebDAV share, push scanned image files there and read the OCR results from output files (usually not by hand but with some software that manages their digitization workflow). The actual OCR is done by an ABBYY Recognition Server; the WebDAV front end, including accounting, is a straightforward home-brewed solution. Till -- Till Kinstler Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG) Platz der Göttinger Sieben 1, D 37073 Göttingen kinst...@gbv.de, +49 (0) 551 39-13431, http://www.gbv.de
Re: [CODE4LIB] web-based ocr
FYI - the Image::OCR::Tesseract install was a real pain for me on RHEL. I kept running into problems getting one of the dependency modules, Time::Format, installed (Date::Manip::TZ_Base errors). Eventually I had to install Date::Manip via YUM, then do a force install of Time::Format. After that, Image::OCR::Tesseract refused to recognize that the Tesseract executable was installed, because the source-code compile and install had placed the executable in /usr/local/bin and not /usr/bin. Once I moved the Tesseract executable to /usr/bin, the Image::OCR::Tesseract module install worked fine (ImageMagick and Leptonica having been previously installed). Rick -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eric Lease Morgan Sent: Wednesday, March 13, 2013 8:54 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] web-based ocr On Mar 13, 2013, at 8:07 AM, Ben Brumfield benwb...@gmail.com wrote: https://github.com/idigbio-aocr/RESTAPI/tree/master/doc Interesting. Printed for future reference. Thank you. BTW, I did finally get Image::OCR::Tesseract to make, make test, and make install correctly. I did not have the correct/proper libraries installed for Tesseract's supporting Leptonica library. Now I need to find a PDF library similar to libtiff and libpng. -- Eric Morgan
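For anyone hitting the same /usr/local/bin versus /usr/bin problem, a rough way to see where an executable actually lives before starting a module install might look like the sketch below. It is written in Python rather than Perl purely for illustration; the function name find_executable and the default directory list are assumptions made for this example, not anything taken from Image::OCR::Tesseract itself.

```python
import os
import shutil


def find_executable(name, extra_dirs=("/usr/local/bin", "/usr/bin")):
    """Return (path found via PATH, list of other copies in extra_dirs).

    The directory list is an assumption for illustration; adjust it to
    wherever your source builds and packages install binaries.
    """
    on_path = shutil.which(name)  # None if the name is not on PATH
    elsewhere = []
    for directory in extra_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            # Skip the copy that PATH already resolves to.
            if on_path is None or os.path.realpath(candidate) != os.path.realpath(on_path):
                elsewhere.append(candidate)
    return on_path, elsewhere


if __name__ == "__main__":
    on_path, elsewhere = find_executable("tesseract")
    print("tesseract on PATH:", on_path or "not found")
    for candidate in elsewhere:
        print("also installed at:", candidate)
```

If the binary only shows up under "also installed at", an installer that looks in a fixed location or on a restricted PATH will probably not see it, which would be consistent with the behaviour described above.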
Re: [CODE4LIB] Lorem Ipsum metadata? Is there such a thing?
I've had good luck using both the Data::Faker and Text::Lorem Perl modules to generate large amounts (30k+ rows) of Archivists' Toolkit test data. Other ports of Data::Faker would probably work just as well, though it needs a bit more code to generate more than name, address, and contact info. At the time I was mostly just generating person data for the AT names table. I had considered one day extending the code so that more detailed person data could be created, but I never got around to it. It never occurred to me that there might actually be a need for something along the lines of a scholarly NPC generator. Rick -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Sean Hannan Sent: Sunday, December 08, 2013 7:00 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Lorem Ipsum metadata? Is there such a thing? In Ruby, there's the ffaker gem (https://github.com/EmmanuelOga/ffaker), which itself is a port of Perl's Data::Faker. -Sean From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Pottinger, Hardy J. [pottinge...@missouri.edu] Sent: Saturday, December 07, 2013 11:51 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Lorem Ipsum metadata? Is there such a thing? Hi, I asked this on Google Plus earlier today, but I figured I'd better take this question here: my brain is trying to tell me that there's a service or app that makes fake metadata, kind of like Lorem Ipsum, but you feed it your fields and it gives you nonsense metadata back. But it looks right enough for testing. Yesterday, I had to make up about 50 rows of fake metadata to test some code that handles paging in a UI, and I had to make it all up by hand. This hurts my soul. Someone please tell me such a service exists, and link me to it, so I never have to do this again. Or else, I may just make such a service, to save us all. But I don't want to go coding some new service if it already exists, because that sort of thing is for chumps. 
-- HARDY POTTINGER pottinge...@umsystem.edu University of Missouri Library Systems http://lso.umsystem.edu/~pottingerhj/ https://MOspace.umsystem.edu/ Making things that are beautiful is real fun. --Lou Reed
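As a rough illustration of the "feed it your fields, get nonsense back" idea from this thread, here is a minimal Python sketch that uses only the standard library. It is not Data::Faker, Text::Lorem, or ffaker; the function names (fake_value, fake_rows, rows_to_csv) and the word list are made up for this example.

```python
import csv
import io
import random

# A small lorem-ipsum vocabulary; any word list would do.
WORDS = ("lorem ipsum dolor sit amet consectetur adipiscing elit sed do "
         "eiusmod tempor incididunt ut labore et dolore magna aliqua").split()


def fake_value(rng, min_words=2, max_words=6):
    """Build one nonsense field value from a few random words."""
    count = rng.randint(min_words, max_words)
    return " ".join(rng.choice(WORDS) for _ in range(count)).capitalize()


def fake_rows(fields, count, seed=None):
    """Return `count` dicts, each with a nonsense value for every field name."""
    rng = random.Random(seed)  # a fixed seed gives repeatable test data
    return [{field: fake_value(rng) for field in fields} for _ in range(count)]


def rows_to_csv(rows, fields):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    fields = ["title", "creator", "subject"]
    print(rows_to_csv(fake_rows(fields, 50, seed=1), fields))
```

Fifty rows is about what the paging test described above needed. Type-aware values (dates, names, identifiers) would need per-field generators, which is where a Data::Faker-style library earns its keep.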
Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?
I did some experimentation wrapping the Perl Image::ExifTool module (along with Image::OCR::Tesseract) in some code that exposed it as a SOAP service for use in a Fedora Commons ingest service. It seemed to work well enough for bulk file processing in testing, though the approach of a custom ingest system, in general, was eventually abandoned when consultants were brought in. Were I to do it again I'd probably also add a REST interface to the generic service wrapper. Rick -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Edward Summers Sent: Tuesday, December 17, 2013 4:54 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream? I remember hearing somewhere that ExifTool is pretty good for extracting image metadata. edsu--
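For readers who want to try ExifTool-based extraction without the Perl module or a SOAP wrapper, a minimal sketch that calls the exiftool command-line program with its -json option might look like this. It assumes exiftool is installed and on the PATH; the helper names parse_exiftool_json and extract_metadata are hypothetical and are not part of the service described above.

```python
import json
import shutil
import subprocess
import sys


def parse_exiftool_json(text):
    """Turn `exiftool -json` output (a JSON array, one object per file)
    into a dict keyed by each object's SourceFile value."""
    records = json.loads(text)
    return {record["SourceFile"]: record for record in records}


def extract_metadata(paths):
    """Run exiftool on the given files and return their parsed tags.

    Raises RuntimeError if exiftool is not on PATH, and
    subprocess.CalledProcessError if exiftool exits non-zero.
    """
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool executable not found on PATH")
    result = subprocess.run(
        ["exiftool", "-json", *paths],
        capture_output=True, text=True, check=True,
    )
    return parse_exiftool_json(result.stdout)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        for path, tags in extract_metadata(sys.argv[1:]).items():
            print(path, "->", len(tags), "tags")
```

A REST or SOAP front end would then only need to accept a file, call something like extract_metadata, and return the resulting dictionary.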
Re: [CODE4LIB] EZProxy changes / alternatives ?
What about using some of the open source WSO2 products to mimic the same functionality as EZProxy? This sounds like a task that Enterprise Service Bus (ESB) combined with Identity Server (IS) could do. Most of their products are some version of an Apache project or other wrapped up in a common user interface. http://wso2.com/products/ They're free, so the price is right. Rick -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Riley Childs Sent: Tuesday, January 28, 2014 10:13 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] EZProxy changes / alternatives ? What about CAS, it has some proxy component...I think. Sent from my iPhone On Jan 28, 2014, at 10:11 PM, tmccanna tmcca...@georgialibraries.org wrote: Ditto to Andreas. Sent from my Verizon Wireless 4G LTE Smartphone Original message From: Andreas Orphanides akorp...@ncsu.edu Date: 01/28/2014 9:29 PM (GMT-05:00) To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] EZProxy changes / alternatives ? That's simple for the techs, but VPNs can be a royal pain in the keester if you're an end-user, for a variety of reasons. It should be incumbent on us as information specialists to unburden the user to the extent possible. On Tue, Jan 28, 2014 at 9:23 PM, Aaron Addison addi...@library.umass.edu wrote: Some use Squid, it's not hard to set up. But most vendors publish rules with ezproxy in mind. The other fairly simple solution is to run a VPN for access, and require people to use that. Aaron On Tuesday, January 28, 2014, stuart yeates stuart.yea...@vuw.ac.nz wrote: We've just received notification of forthcoming changes to EZProxy, which will require us to pay an arm and a leg for future versions to install locally and/or host with OCLC AU with a ~ 10,000km round trip. What are the alternatives? cheers stuart -- Stuart Yeates Library Technology Services http://www.victoria.ac.nz/library/
Re: [CODE4LIB] separate list for jobs
Not to be a jerk about this, but why is the answer always No? There seem to be more posts on this list relating to job openings than there are relating to code discussions. Are job postings part of why this list was originally created? If so, I'll stop now. Then again, perhaps as a group we are just not posting enough code-related topics to drown out the occasional job posting. Rick -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Wilhelmina Randtke Sent: Tuesday, May 06, 2014 12:39 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] separate list for jobs This comes up all the time, and always it's no. For anyone who doesn't like the job postings, use email filters. -Wilhelmina On Tue, May 6, 2014 at 11:34 AM, Dan Chudnov daniel.chud...@gmail.com wrote: Is it time to reconsider: should we start a separate list for Job: postings? code4lib-jobs, perhaps? -Dan
Re: [CODE4LIB] separate list for jobs
Very well then, carry on with the job postings. Rick -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Steve Meyer Sent: Tuesday, May 06, 2014 1:34 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] separate list for jobs There is another benefit in addition to "skills should I cultivate." There is a follow-the-money factor. Declaring "I'm for Linked Data" is one thing. Putting Linked Data in a job title is something far more significant. Since code4lib is not always boast4lib-ish, it would be too great a loss to not see the evidence of financial investment by institutions for things like the Hydra stack (Solr, Fedora, Blacklight...) over the last few years. When your HR department says you are building an RDF-based triple store, I am pretty certain you will be doing it. On Tue, May 6, 2014 at 1:01 PM, Kyle Banerjee kyle.baner...@gmail.com wrote: On Tue, May 6, 2014 at 9:59 AM, Richard Sarvas richard.sar...@lib.uconn.edu wrote: Not to be a jerk about this, but why is the answer always No? There seem to be more posts on this list relating to job openings than there are relating to code discussions. Are job postings part of why this list was originally created? If so, I'll stop now. Fragmentation dilutes the community and creates an unnecessary barrier by requiring people to know one more thing. Email filters take no time at all to set up, so anyone who considers them noise doesn't need to be exposed to them. kyle
Re: [CODE4LIB] separate list for jobs
Actually, I am not complaining. I am just wondering why I am receiving so many job postings on a listserv that I thought was supposed to be about Code4Lib conferences and coding in library environments. Had the list been called Code4LibJobs, I suspect I never would have asked the question in the first place. As that is not the title of this list, I felt it was a reasonable question, mostly because every time this topic comes up people simply respond No without explaining why. When the topic was proposed by another member, I took the time to seek clarification. Still, thanks for taking the time to explain the reason why so many job postings appear on this list. Rick -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Stuart Yeates Sent: Tuesday, May 06, 2014 3:51 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] separate list for jobs On 05/07/2014 04:59 AM, Richard Sarvas wrote: Not to be a jerk about this, but why is the answer always No? There seem to be more posts on this list relating to job openings than there are relating to code discussions. Are job postings part of why this list was originally created? If so, I'll stop now. The answer is always no because we are collectively using the possession of an email client with filtering capability and the personal knowledge of how to use it as a Shibboleth for group membership. Those who find it easier to complain than write a filter mark themselves as members of the outgroup intruding on the ingroup. cheers stuart
Re: [CODE4LIB] Withdraw my post was: Re: [CODE4LIB] separate list for jobs
Let's not dwell on any single reply in this thread - that tends to make people uncomfortable, and is not something I want to be a part of. We have a lively and interesting discussion going and we've also gained some new insights as to how some subscribers are using this list and for what reasons. I think the main point discovered so far is that the job postings are considered far more important by the overall community than some of us previously suspected (myself included). I have the answer to the question I was originally looking for, thank you all. Rick -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Stuart Yeates Sent: Wednesday, May 07, 2014 8:28 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Withdraw my post was: Re: [CODE4LIB] separate list for jobs The fact that the only person who has given any acknowledgement of understanding my message was someone else in .ac.nz suggests that despite my best efforts my message content was effectively shredded by the implicit conversion from New Zealand English to International English. My apologies; I withdraw my original email. To translate explicitly into International English, my point was: I have observed that an individual's position on mail filtering vs separate mailing lists appears to be an implicit marker of group membership in this group (i.e. a shibboleth). Note that I do not endorse this or any other marker of group membership, but my understanding of the psychology of groups suggests that all functional groups have markers of group membership and that attempting to eliminate markers of group membership in an attempt at inclusiveness (a) can in itself be a marker of group membership and (b) is only likely to drive a shift from relatively explicit markers to relatively implicit markers. cheers stuart On 05/08/2014 10:17 AM, David Friggens wrote: This is a pretty terrible reply. I thought it was a great reply. obscure words (seriously, shibboleth?) 
Somewhat obscure, but not so much in Code4Lib. http://en.wikipedia.org/wiki/Shibboleth http://en.wikipedia.org/wiki/Shibboleth_(Internet2) Unless you're trying to be sarcastic...in which case ignore this. He most definitely was. I believe Stuart's point was to suggest that when the multiple requests for a separate list for job notices get immediately shot down with "no - use an email filter, or are you stupid?" [1] it doesn't help to create an inclusive and good learning environment. [1] NB the respondents aren't explicitly saying "are you stupid" but that's how it may be taken by some people. And to answer the original question - job listings help more people than they annoy so they should be kept as-is. My view is that it would make more sense to have separate discussion and job notice lists, as I see in other places. But I'm not that bothered personally, as I would subscribe to both and filter them into the same folder in my mail client. :-) Cheers David
Re: [CODE4LIB] Mac OS 9 emulator
Lynda, Do you need to use the data on the file system in an emulated environment or are you just trying to access the data on the file system created by OS9? Rick -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kyle Banerjee Sent: Thursday, April 23, 2015 1:38 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Mac OS 9 emulator On Thu, Apr 23, 2015 at 10:20 AM, Schmitz Fuhrig, Lynda schmitzfuhr...@si.edu wrote: Thanks for the responses. We actually need to read media within it so Virtual Box would not work for us. Could you say a bit more about your use case? Some applications such as dealing with archival materials might actually require actual hardware in which case ebay may be the best option. kyle
Re: [CODE4LIB] Mac OS 9 emulator
For data transfer between old virtual Mac drive volumes (and CDs), I used to use a free program called HFSExplorer on Windows, but being written in Java it might work on other platforms. The source code appears to be available for download as well. http://www.catacombae.org/hfsexplorer/ Another option might be to create an old Windows NT or 2000 system with Services for Macintosh installed to directly read old Mac drives or to connect legacy Mac hardware over a network. Rick -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Schmitz Fuhrig, Lynda Sent: Thursday, April 23, 2015 3:04 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Mac OS 9 emulator Typically we are trying to access the data and transfer it for preservation work. There could be cases though where the data will need to be in the emulated environment in order to replay. We can't always predict what types of records we are going to get from across the Institution. We have encountered CDs and diskettes that will only read in the OS 9 environment and won't even be recognized in OS 10.x. Lynda Schmitz Fuhrig Electronic Records Archivist Digital Services Division Smithsonian Institution Archives Capital Gallery Building 600 Maryland Ave SW Suite 3000 MRC 507 Washington, DC 20024-2520 siarchives.si.edu | @SmithsonianArch -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Richard Sarvas Sent: Thursday, April 23, 2015 2:37 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Mac OS 9 emulator Lynda, Do you need to use the data on the file system in an emulated environment or are you just trying to access the data on the file system created by OS9? 
Rick -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kyle Banerjee Sent: Thursday, April 23, 2015 1:38 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Mac OS 9 emulator On Thu, Apr 23, 2015 at 10:20 AM, Schmitz Fuhrig, Lynda schmitzfuhr...@si.edu wrote: Thanks for the responses. We actually need to read media within it so Virtual Box would not work for us. Could you say a bit more about your use case? Some applications such as dealing with archival materials might actually require actual hardware in which case ebay may be the best option. kyle
Re: [CODE4LIB] looking for free hosting for html code
What about using Code Anywhere? https://codeanywhere.com/ They have a free hosting option that also lets you drag and drop data from Dropbox and Google Drive, though content can only be hosted from the sandbox server they create for you as part of your account. The integrated web-based HTML editor is not that bad to work with. Rick -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Charlie Morris Sent: Friday, May 22, 2015 9:14 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] looking for free hosting for html code I've never done this, but I've heard you can use Dropbox in an unofficial capacity to host basic pages too: http://www.dropboxwiki.com/tips-and-tricks/host-websites-with-dropbox On Fri, May 22, 2015 at 8:59 AM, Joe Hourcle onei...@grace.nascom.nasa.gov wrote: On Fri, 22 May 2015, Sarles Patricia (18K500) wrote: [trimmed] I plan to teach coding to my 6th and 12th grade students next school year and our lab has a mixture of old (2008) and new Macs (2015) so I want to make all the Macs functional for writing code in an editor. My next question is this: I am familiar with free Web creation and hosting sites like Weebly, Wix, Google Sites, Wikispaces, WordPress, and Blogger, but do you know of any free hosting sites that will allow you to plug in your own code, i.e. host your own HTML files? If it's straight HTML, and doesn't need any sort of text pre-processing (SSI, ASP, JSP, PHP, ColdFusion, etc.), I think that you can use Google Drive. This help page seems to suggest that's true: https://support.google.com/drive/answer/2881970?hl=en With all static files it might also be possible to lay things out so that you could serve it through GitHub or similar. (And teaching them about version control isn't a bad idea, either.) -Joe