On Oct 14, 2013, at 7:56 AM, Nicolas Franck wrote:
> Could this also be done by Apache Tika? Or do I miss a crucial point?
>
> http://tika.apache.org/1.4/gettingstarted.html
Nicolas, this looks VERY promising! It seemingly can extract the OCR from a PDF
document as well as extract the text fr
On Oct 14, 2013, at 1:48 AM, Penelope Campbell
wrote:
>> For a limited period of time I am making publicly available a Web-based
>> program called PDF2TXT -- http://bit.ly/1bJRyh8
>
> As a small special library (solo librarian) in an Australian State
> Government Department I use DB/Text works
On Oct 13, 2013, at 6:21 PM, David Friggens wrote:
>>> For a limited period of time I am making publicly available a Web-based
>>> program called PDF2TXT --http://bit.ly/1bJRyh8
>>
>> PDF2TXT extracts the text from an OCRed PDF document
>
> The file I tried was digital native (probably from Wo
On Oct 11, 2013, at 6:39 PM, Mark Pernotto wrote:
> Just from a curiosity standpoint, what encoding is being utilized? I know
> nothing about Perl. It seemed to have no problem parsing a dash (-) if it
> was up against another character (2007-2012), but barfs when it's by itself
> (2007 – 2012)
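
Mark's observation is consistent with a character-encoding mismatch rather than anything peculiar to Perl. Here is a minimal sketch, assuming (this is a guess, not something confirmed in the thread) that the lone dash was an en dash stored as a single Windows-1252 byte and then read as UTF-8, while the hyphen in 2007-2012 is plain ASCII. The function name is hypothetical.

```python
# Assumption: the standalone dash is U+2013 (en dash), saved as Windows-1252.
ascii_range = "2007-2012"          # hyphen-minus, plain ASCII
dashed_range = "2007 \u2013 2012"  # en dash between spaces

def read_as_utf8(text: str, stored_as: str = "cp1252") -> str:
    """Encode text the way the file stored it, then decode it as UTF-8."""
    raw = text.encode(stored_as)
    # errors="replace" substitutes U+FFFD for bytes that are not valid UTF-8
    return raw.decode("utf-8", errors="replace")

print(read_as_utf8(ascii_range))   # ASCII bytes survive unchanged
print(read_as_utf8(dashed_range))  # the en dash byte becomes U+FFFD
```

If this is what happened, declaring the input encoding explicitly when reading the file would likely avoid the problem.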
On Oct 11, 2013, at 6:39 PM, Mark Pernotto wrote:
> Putting my devil's advocate hat on, it doesn't parse foreign documents well
> (I got it to break!). I also got inconsistent results feeding it PDF files
> with tables embedded (but haven't been able to figure out what it is about
> them it does
On Oct 12, 2013, at 8:31 AM, Nicolas Franck wrote:
>> For a limited period of time I am making publicly available a Web-based
>> program called PDF2TXT -- http://bit.ly/1bJRyh8
>
> If you're looking for the pdf in question, search for "IIPv105.pdf" in your
> log files.
> It is a very simple pd
On Oct 11, 2013, at 1:49 PM, Matthew Sherman wrote:
>> For a limited period of time I am making publicly available a Web-based
>> program called PDF2TXT -- http://bit.ly/1bJRyh8
>
> Very slick, good work. I can see where this tool can be very helpful. It
> does have some issues with some chara
On Oct 11, 2013, at 12:57 PM, Sean Hannan wrote:
>> For a limited period of time I am making publicly available a Web-based
>> program called PDF2TXT -- http://bit.ly/1bJRyh8
>
> Very cool. But, why only for a limited period of time?
Sean, thank you for your support. I'm making it available te
On Oct 11, 2013, at 11:57 AM, Peter Murray wrote:
>> For a limited period of time I am making publicly available a Web-based
>> program called PDF2TXT --http://bit.ly/1bJRyh8
>
> Very neat. I couldn't get the 'network diagram' link to work (from
> http://dh.crc.nd.edu/sandbox/pdf2txt/pdf2txt.
w. The output is
ugly.) Also, please be gentle with it because it does not process things the
size of the Bible.
--
Eric Lease Morgan
Digital Initiatives Librarian
University of Notre Dame
Room 131, Hesburgh Libraries
Notre Dame, IN 46556
o: 57
https://github.com/ericleasemorgan/htrc
[2] Center's Data API - http://www.hathitrust.org/htrc/api-guide
--
Eric Lease Morgan
Digital Initiatives Librarian
University of Notre Dame
Room 130, Hesburgh Libraries
Notre Dame, IN 46556
o: 574-631-8604
e: emor
On Sep 4, 2013, at 9:42 AM, Eric Lease Morgan wrote:
>> I get the basic concepts of linked data. But what I don't understand is
>> why the idea has been around so long, yet there seems to be a dearth of
>> useful applications that live up to the hype. So, what I
http://sites.tufts.edu/liam/2013/08/08/liam-guidebook/
--
Eric Lease Morgan
On Sep 3, 2013, at 10:49 AM, "Coombs,Karen" wrote:
> Option 2. - Use HTTPful code library (http://phphttpclient.com/). This is a
> well developed and supported code base which is designed specifically to
> support REST interactions. It is easy to install via Composer or Phar, or
> manually. It
om/content/view/181/190/
--
Eric Lease Morgan, Digital Initiatives Librarian
Hesburgh Libraries
University of Notre Dame
574/631-8604
>From a different list I learned of the following WorldCat metadata API
>webinar, and I thought some of us over here may be interested too. From the
>description:
The WorldCat Metadata API supports a variety of cataloging
functionality for libraries to catalog their collections in
WorldCat
On Aug 15, 2013, at 11:11 AM, diego ferreyra wrote:
> Hi, we are looking for a web tool to manage Authority Data with these general
> desirable features:
>
> - Web tool
> - Open source
> - Capabilities to expose web services or document API
> - Manage data about persons and institutions
I'm
completed by March 2014. On my mark.
Get set. Go. Wish me luck, and let’s see if we can build some community.
[0] LiAM - http://sites.tufts.edu/liam/
[1] prospectus - http://bit.ly/15TX0rs
[2] Catholic Research Resources Alliance - http://www.catholicresearch.net/
--
Eric Lease Morgan
In the very recently announced call for participation at the Speaking In Code
symposium at UVa there is a section called "You are welcome here", and it
explicitly invites non-white non-males to participate --
http://codespeak.scholarslab.org/#inclusivity Kudos. --Eric Morgan
ational Science Foundation grant
(DUE-0333601) called OCKHAM Library Network, Integrating the NSDL into
Traditional Library Services.
* Exploiting "Light-weight" Protocols and Open Source Tools to Implement
Digital Library Collections and Services by Xiaorong Xiang and Eric Lease
Mo
ou. --Eric Lease Morgan
On Aug 6, 2013, at 2:59 AM, Patrick Hochstenbach
wrote:
> LibreCat
> -=-=-=-=
>
> LibreCat is an open collaboration of the university libraries of Lund,
> Ghent, and Bielefeld to create tools for library and research services.
> One of the toolkits we
Here is a pointer to a recent article on policies for 3D printers. FYI:
A Model for Managing 3D Printing Services in Academic Libraries
by Vincent F. Scalfani in Issues in Science and Technology
Librarianship, Spring 2013
The appearance of 3D printers in university libraries opens many
p://dh.crc.nd.edu/blog/2013/05/htrc/
--
Eric Lease Morgan
University of Notre Dame
lysis
against it. Fun with modern librarianship?
[0] blog posting about the HTRC - http://dh.crc.nd.edu/blog/2013/05/htrc/
--
Eric Lease Morgan, Digital Initiatives Librarian
University of Notre Dame
> The HTRC is the beginnings of a service providing a computable interface to
> some of the content of the HathiTrust.
On a side note, the sheer number of programmatic interfaces to bibliographic
information coupled with the increasing availability of full text access is
creating an enormous p
> Is the integer just a sequential count of all items or is it specific to
> the name title pair?
It is a unique integer (sequential count), and possibly a key of some sort.
--ELM
On May 21, 2013, at 2:10 PM, Michael Lackhoff wrote:
> Ebooks as you get them have very different naming schemes and I would
> like to rename them according to a common naming convention which should
> include some important bibliographic data like publication year, title
> (up to what length?),
(I don't envy the job of articulating policies, and our profession sure is full
of rules! --ELM)
e thing still is pretty cool.
--
Eric Lease Morgan, Digital Initiatives Librarian
University of Notre Dame
574/631-8604
Create EAD files to describe the collections in your archives because EAD is
the MARC of the archives world. There are no two ways about it. --ELM
On May 6, 2013, at 11:08 AM, Eric Lease Morgan wrote:
> The second is a cool gender visualization brought to my attention by a
> colleague here in the Libraries -- Lauren Ajamie. (Again, "Thank you.") It
> illustrates the percentage of women to men in the publishing of s
[Posted by request. --ELM]
> Web Services Librarian
> Sonoma State University, Rohnert Park, CA
>
> Just 50 miles north of San Francisco in beautiful Sonoma County, the
> University Library in
> the Jean and Charles Schulz Information Center thrives on innovation and
> creativity. We
> are see
to be exported from member
integrated library systems and accessible at the other end of a URL. I have an
alliance member who has lost their systems librarian, and thus not sure how to
do the necessary export. Can somebody here show me some instructions on how to
accomplish this task.
--
Eric Lease Morgan
Here's an idea for web-based OCR:
1. Have Web-based OCR available
2. Make it easy for people to save
content in a Web-accessible
location thing like Box.net
3. Allow readers (I don't use the
word "users" anymore) to select
items from their Web-accessible
location a
On Mar 13, 2013, at 8:07 AM, Ben Brumfield wrote:
> https://github.com/idigbio-aocr/RESTAPI/tree/master/doc
Interesting. Printed for future reference. Thank you.
BTW, I did finally get Image::OCR::Tesseract to make, make test, and make
install correctly. I did not have the correct/proper libra
On Mar 12, 2013, at 3:26 PM, chris fitzpatrick wrote:
> About using Google Drive: yeah, we're very small (115 students!), so
> we're very interested in keeping our overheads nice and low.
> I guess I'm old enough to think that 100 GB for $5 a month is a pretty
> good deal, so we started
Thank you for the prompt replies.
Call me cheap or unable to navigate the political/fiscal landscape, but I don't
see myself subscribing to a service. Instead I see putting a wrapper around
Tesseract, but alas, the wrappers are written in languages that I don't know.
[1] Hmmm… On the Perl side
something like this that exists already?
--
Eric Lease Morgan
University of Notre Dame
nd.edu/sandbox/cyl/catalog/details/adventurewithapa00ferriala.html
[6] Catholic Youth Literature - http://dh.crc.nd.edu/sandbox/cyl/catalog/
[7] ELIS - http://freeyourmetadata.org/named-entity-extraction/
--
Eric Lease Morgan
University of Notre Dame
574/631-8604
ms are created. It is a lot like music as well. Here is the tool -- a
flute. Here are the notes it can play. Combine the notes to create beautiful
music. Combine the functions of computers to create beautiful solutions.
Combine the elements of mathematics to create beautiful descriptions.
--
Eric
On Jan 28, 2013, at 5:56 PM, Misty De Meo
wrote:
> This service uses Freebase to determine the gender of names, and offers a
> JSON API: http://genderednames.freebaseapps.com/
This looks like it will work quite well. Thank you.
Using the API and for a good time, I plan to extract the first na
Does anybody here know of an API allowing me to feed it a personal name (like
Tom, Dick, or Harry), and have it return the possible/probable gender of the
name?
--
ELM
On Jan 18, 2013, at 10:58 AM, Kevin S. Clarke wrote:
> On Fri, Jan 18, 2013 at 10:54 AM, Devon wrote:
>> If zoia is in violation of the Code of Conduct, then
>> remedial action is warranted. I think in this case, rather than getting rid
>> of the bot, we can just remove the offending plugins.
>
d, and I agree with him. I don't really
think there is a need for additional "social structures", but no one is
stopping anybody else from creating one. I really like the idea of "and" not
"or". Personally, I believe we need fewer lists, not more.
--
Eric Lease Morgan
University of Notre Dame
Rosalyn Metz wrote:
> https://www.surveymonkey.com/s/68G5TBG
H3ll, two questions. That was too easy! --Eric
On Dec 3, 2012, at 2:15 PM, Rosalyn Metz wrote:
> Yay! We had a 13.2% response rate.
Please send the URL of the survey out to the mailing list at least one more
time. I'm sure you will get at least one more respondent, me. --Earache
> Can you tell me how many people are on the list? I'm curious for the
> results of the gender survey. I'm hoping that we got a decent sample size.
There are 2,250 subscribers to the Code4Lib mailing list. "Long live Code4Lib"
(That sort of rhymes.) --ELM
As the Code4Lib LISTSERV manager, I am able to make a number of anecdotal
mailing list observations from the past few weeks:
* The volume of mailing list traffic always surges just before the annual
meeting, but the conversation this time around was much broader than in
previous years. This t
On Nov 29, 2012, at 9:36 AM, Ross Singer wrote:
>> Hear, hear! But I do really try to figure out what the code does
>> before implementing/deploying.
>
> I just cut, paste, and deploy.
>
> The users will tell me if I got it right.
Seriously, that is my way of coding too. In reality I think I
On Nov 13, 2012, at 12:03 AM, Ross Singer wrote:
> http://vote.code4lib.org/election/24
Vote early. Vote often. Thank you, Ross. The implementation worked well for me.
--ELM
uing in library "catalogs".
--
Eric Lease Morgan
University of Notre Dame
574/631-8604
For better or for worse, one of the many Code4Lib Mailing List Home Pages has
been moved to the following address because the host where it resided is all
but defunct - http://dh.crc.nd.edu/sandbox/mailing-lists/code4lib/ FYI
--
Eric Lease Morgan
Hesburgh Libraries
University of Notre Dame
574
> The traditional Unix tool for this job is procmail[1].
procmail++ That cool little email filter thing was the core of my Mr.
Serials Process "way back" in 1994 or so. And it still works great! The syntax
of its recipes is a bit obtuse, but still… --ELM
On Oct 1, 2012, at 7:55 PM, Bess Sadler wrote:
> For a full-text search system we're prototyping, we are being asked to
> provide term co-occurrence analysis. I'm not very familiar with this concept,
> so maybe someone on the list can describe it better, but I believe that what
> is wanted is t
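
For readers unfamiliar with the idea, one common reading of term co-occurrence analysis is counting how often pairs of terms appear near each other. Here is a minimal sketch, assuming a simple whitespace tokenizer and a fixed-size sliding window. Both are illustrative choices, and nothing in the thread says the prototype works this way.

```python
from collections import Counter

def cooccurrences(text: str, window: int = 3) -> Counter:
    """Count unordered term pairs that fall within `window` tokens of each other."""
    tokens = text.lower().split()
    counts = Counter()
    for i, left in enumerate(tokens):
        # pair the current token with the next (window - 1) tokens
        for right in tokens[i + 1 : i + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts

pairs = cooccurrences("full text search over full text archives", window=2)
print(pairs.most_common(3))
```

A production system would more likely derive these counts from the search index itself, but the counting logic is the same.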
On Sep 17, 2012, at 3:12 PM, wrote:
> But I'm having trouble coming up with an algorithm that can consistently spit
> these out in the form we'd want to display given the data available in TGN.
A dense but rich, just-published article from D-Lib Magazine about geocoding --
Fulltext Geocoding
dea,
then I could create all sorts of "kewl" URLs returning interesting information
about all sorts of texts.
Do you know how to write a CGI script that calls SEASR/Meandre flows?
[1] SEASR - http://seasr.org/
[2] Meandre - http://seasr.org/meandre/
--
Eric Lease Morgan
University of Notre Dame
On Aug 30, 2012, at 1:03 PM, miles stauffer
wrote:
> http://selection.datavisualization.ch/
Wow!! --ELM
On Aug 7, 2012, at 1:23 AM, Yong Tang wrote:
> First of all, what tool /tools do you use to manipulate PDF file
> directly in a script? I tried some Perl modules such as CAM::PDF and
> PDF::API2. The results were not pretty. The original text format was lost.
Yong, what type of manipulation do
If I needed/wanted to know what materials held by my library were also in the
HathiTrust, then programmatically how could I figure this out? In other words,
do you know of a way to query the HathiTrust and limit the results to items my
library owns? --Eric Lease Morgan
I think the "flood" of job postings is a good problem to have. --ELM
What are some techniques y'all would suggest for maintaining state for text
processing?
I have created a number of terminal- and Web-based interfaces facilitating text
mining services against… texts. All of these interfaces take a piece of input
denoting what text to process, for example,
http
numbers, extract call numbers from OCLC classify
service
# Eric Lease Morgan
# July 17, 2012 - first cut; "Thanks to Terry Reese for suggesting classify!"
# configure
use constant QUERY =>
'http://classify.oclc.org/classify2/Classify?oclc=##OCLC##&summary=true';
#
> If so, then given an OCLC number is it possible to get an LCCN ^h^h^h^ Library
> of Congress call number, and if so, then how?
Oops! I know I can get an LCCN. Given an OCLC number, what I need is a Library
of Congress call number. --Earache
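
The Classify lookup mentioned in this thread can be sketched as a URL template with the OCLC number substituted in. The template below is the QUERY constant from the July 17, 2012 script quoted in this archive. The function name is hypothetical, and the sketch stops at building the URL because the shape of the service's response is not shown in the thread.

```python
from urllib.parse import quote

# Template copied from the QUERY constant in the thread; ##OCLC## is its placeholder.
QUERY = "http://classify.oclc.org/classify2/Classify?oclc=##OCLC##&summary=true"

def classify_url(oclc_number: str) -> str:
    """Return the Classify URL for one OCLC number (hypothetical helper)."""
    return QUERY.replace("##OCLC##", quote(str(oclc_number).strip()))

print(classify_url("464597226"))
```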
Is it true that the WorldCat API consistently does not return call numbers when
accessed via the following sort of URL:
http://www.worldcat.org/webservices/catalog/content/464597226?wskey=[key]
If so, then given an OCLC number is it possible to get an LCCN, and if so, then
how?
If not, then I'm
On Jun 1, 2012, at 8:47 AM, Edward Iglesias wrote:
> http://www.wandora.org/wandora/wiki/index.php?title=Topic_Maps
Along those same lines, a force-directed graph might be fun as well:
http://mbostock.github.com/d3/ex/force.html
--
Eric Lease Morgan
there will also be a
facilitation of roundtable discussions on the topic of the day.
To register, simply send your name to Eric Lease Morgan
, and you will be registered. Easy! For more
detail, see:
http://blogs.nd.edu/emorgan/2012/03/pda/
Everybody is welcome, and the more the merrie
MARC-8. Cool in its time. Dumb now. Typical. --ELM
I have a single co-located host and I get ping, power, pipe, and air
conditioned comfort for $75/month. I haven't seen nor touched my (Linux) server
in more than four or five years, and I might have restarted it four times.
--
Eric Lease Morgan
What are the sorts of hardware and software requirements for hosting
code4lib.org? I own a co-located host that might be able to do the job,
maybe.
--
Eric Lease Morgan
> LlkjyYYYYyetyeyppf
> Prpfc
> EXpdpppePeppp
> Pp
> P$
> $p
>
> Pp$epepp
> $ppeppPP
> PRpp
> PepplpereprpeprrprPRPeeopwprprPprppertrretrtrrterrtwrtrtww
> TrWtwteteetrteeetetttetrteyertEtrrtEgrerrtetteyeyeeytwtyeyeyeeyeeeyeey
> eryeeyeyyyeryyyeyeyeyeyeyyyeyyy
Here is a link to additional position announcements, again, forwarded by
request. --ELM
https://wikimediafoundation.org/wiki/Job_openings
Senior Software Engineer Frontend
Systems Engineer - Data Analytics
Software Developer Backend
Software Developer Frontend
QA Lead
Director
The following is a specific position description that may be of interest to our
community. Passed along by request. --ELM
https://wikimediafoundation.org/wiki/Job_openings/Software_Developer_Frontend
Job Title: Software Developer
Reports To: Director of Features
Job Purpose
As a software engine
While I will not be able to attend, there sure were a lot of cool domain names
in that announcement:
* http://serials.lt
* http://lrs.lt
* http://uniqueids.org
* http://openlib.org/
* http://authorprofile.org
+1
--
ELM
The mailing list includes approximately 1800 people:
http://infomotions.com/blog/2011/03/where-in-the-world-is-the-mail-going/
--
ELM
I too got a cease and desist letter almost twenty years ago. I wrote a CGI
script that would calculate the phase of the moon. I called it LunaTick. The
letter was from a lawyer defending a trademark for a fishing lure. --Eric Morgan
Ironically, I had (or there was) some trouble with the term
"MyLibrary@NCState". Granted, the term was originally a variation of My
Netscape, My Yahoo, and My Deja News, but all sorts of things followed it, like
MyiLibrary, the Google Books My Library, and then there was an ALA thing. I'm
not ne
r programs. All the hundreds of others are
simply variations on themes."
--
Eric Lease Morgan
IMHO, the idea of intellectual "property" on things that can be duplicated
without any sort of degradation -- like software -- is absolutely absurd and
bogus. --Eric Morgan
[The following is being forwarded upon request. --ELM]
Head of Systems Librarian -- Milner Library, Illinois State University:
This position is responsible for planning, implementing, overseeing, and
evaluating the services, projects, systems, and other resources of the Web
Services and End-Us
On Oct 24, 2011, at 3:03 PM, Jon Gorman wrote:
> yaz-marcdump -f MARC-8 -t UTF-8 -o marc -l 9=97 marc21.raw >marc21.utf8.raw
This worked great! My version of yaz-marcdump was older and was not doing the
trick. code4lib++
--
Eric
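
Jon's command can be wrapped in a script when many files need converting. Here is a minimal sketch, assuming yaz-marcdump is installed and on the PATH. The flags are exactly those in the quoted command; the function names are hypothetical.

```python
import subprocess

def marcdump_args(source: str) -> list[str]:
    """Build the argument list for the MARC-8 to UTF-8 conversion quoted above."""
    # -l 9=97 sets leader position 9 to "a" (ASCII 97), marking the record as Unicode
    return ["yaz-marcdump", "-f", "MARC-8", "-t", "UTF-8",
            "-o", "marc", "-l", "9=97", source]

def convert(source: str, target: str) -> None:
    """Run yaz-marcdump and write its standard output to `target`."""
    with open(target, "wb") as out:
        # check=True raises CalledProcessError if yaz-marcdump exits non-zero
        subprocess.run(marcdump_args(source), stdout=out, check=True)

print(" ".join(marcdump_args("marc21.raw")))
```

Calling convert("marc21.raw", "marc21.utf8.raw") would then mirror the shell redirection in the original command. As the reply notes, an older yaz-marcdump may not do the trick.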
On Oct 24, 2011, at 2:34 PM, Doran, Michael D wrote:
>> In Perl, how do I specify MARC-8 when reading (decoding) and writing
>> (encoding) data?
>
> You can't. MARC-8 is a character set that is unknown to the operating
> system. Your best bet is to convert MARC-8-encoded records into UTF-8.
In Perl, how do I specify MARC-8 when reading (decoding) and writing (encoding)
data?
Character encoding is the bane of my existence. I have learned that when
reading from a file I ought to specify the type of encoding the file is in and
decode accordingly, or else. Once read, it is converted i
On Oct 3, 2011, at 11:32 AM, Demian Katz wrote:
> Additionally, I notice that there are different versions of the PDF here:
>
> http://www.archive.org/details/canarybird00schm
>
> (one labeled PDF, another B/W PDF)
>
> Does one version work better on tablets than the other?
At first glance, t
On Oct 3, 2011, at 11:12 AM, Dave Caroline wrote:
> Diva was announced here on the 6th of June
> https://listserv.nd.edu/cgi-bin/wa?A2=ind1106&L=CODE4LIB&T=0&F=&S=&P=27064
>
> The clever part is you only send the visible part at the scale they
> are viewing so little excess bandwidth.
>
> For online
Most people seem to have mixed results when trying to open the PDF files on
their tablet-based (Android and iOS) devices. Bummer! These PDF files were
harvested from the Internet Archive. They seem to be viewable just fine for
desktop machines, but not tablets.
The number of files I have in the
On Oct 3, 2011, at 10:26 AM, Dave Caroline wrote:
> It is educational to look at memory use in the pc when that pdf is loaded.
> Evince here is using 600meg do you have space for such objects on
> these little toys
>
> try something like diva so you don't suck the resources dry on the client
Plea
Are any of you able to open the following URL with an Android-based tablet
device:
http://dh.crc.nd.edu/sandbox/cyl/corpus/canarybird00schm.pdf
I have harvested about 60 PDF documents from the Internet Archive, and I
created a rudimentary tablet-based interface to the collection here:
http:/
On Sep 28, 2011, at 2:29 PM, Michael J. Giarlo wrote:
> P.S. Perhaps those who take issue with Mr. Tennant's listserv
> etiquette and ethics can take this up privately?
I concur. Let's please talk about code and libraries.
--
Eric Lease Morgan
Mailing List Owner and All Around Heavy
condition. Really.
I'm serious.
--
Eric Lease Morgan
> Are there any ground rules or terms of use for this list... All I can
> find is this:
>
> https://listserv.nd.edu/cgi-bin/wa?A2=ind0312&L=CODE4LIB&T=0&F=&S=&P=61
Now that is one h3ll of a policy, if I do say so myself! --ELM
> Let's accept that and move on.
I concur. Let's accept that and move on.
--
Eric Lease Morgan
plotting place names on world maps.
http://bit.ly/ojWmzN
For more information about the DPLA, see -- http://bit.ly/irjzqO
--
Eric Lease Morgan
University of Notre Dame
lifeandliterature.org/p/code-challenge.html
Interesting, and I see a growing number of these sorts of "challenges". Fun! Is
there a prize?
--
Eric Lease Morgan
University of Notre Dame
server", we will write a piece of intermediary software to act as a go
between. This isn't really a big deal since all of our other implementations of
Fedora are expected to work in the same way. Wish us luck.
--
Eric Lease Morgan
University of Notre Dame
On Aug 29, 2011, at 3:38 PM, James Gilbert wrote:
> Works fine on my computer... Are both Adobe Reader and Windows Updates
> current?
>
> I had this issue on a book-keeper's computer... installed more recent
> version of Adobe Reader, and seemed to fix it.
I don't know if things are up-to-date o
ntral. Networking issue? Port
issue? IE PDF plug-in? Invalid HTTP headers? On-campus versus off-campus issue?
Could some of y'all try to load some of the URLs with IE and tell me your
experience? Other suggestions would be greatly appreciated as well.
--
Eric Lease Morgan
University of Notre Dame
to do." Finally, like most institutions, libraries are risk-averse
places. I believe all of these factors contribute to the idea that open source
software is still in the adoption phase.
--
Eric Lease Morgan
developers have "scratched their itch", made their software
available to the world, and if so inclined, spent time and effort building a
community around the software. Your approach seems RFP-like. Statements of work
will be drafted by LTS. Developers (consultants) will respond, be selected, and
contracted. Software will be created.
--
Eric Lease Morgan
subsets of the complete metadata, with a link to a more
extensive, standalone representation URI (see the Section Listing
Acquisition Feeds for more).
The thing described is an electronic catalog, albeit not necessarily a "library
catalog", but a catalog nonetheless.
--
Eric Lease Morgan
On Jul 30, 2011, at 10:05 AM, Eric Lease Morgan wrote:
>> http://opds-spec.org/2011/06/15/opds-1-1-call-for-comments/
>
> By way of the Code4Lib mailing list and Ed
Sh!t Don't you hate when that happens.
--
Earache Least Moron