Re: [CODE4LIB] Linux tools for making PDFs

2009-02-03 Thread Posthumus, Etienne
Yan, I am curious what kind of image files you mean that contain East Asian characters; surely an image has no characters in it, just pixels? Or do you mean the filenames? This: http://www.swftools.org/about.html is the best thing since sliced bread for me. Among other things it can convert

Re: [CODE4LIB] Linux tools for making PDFs

2009-02-03 Thread Anand Chitipothu
> Yan, not sure how it handles east Asian characters, but imagemagick will > create PDFs, e.g., > > convert FILE.jpg FILE.pdf I have used this for converting an SVG image with some text in Telugu language without any issues.

Re: [CODE4LIB] OCR engine for Persian/Dari

2009-02-03 Thread Mark Jordan
Hi again Yan, There's this one: http://www.worldlanguage.com/Products/Readiris-Pro-11-Middle-East-Edition-ArabicReadiris-Farsi-Persian-Arabic-Farsi-110226.htm We have a copy of the Traditional Chinese version of Readiris and find its accuracy to be fairly poor (and its performance on latin char

Re: [CODE4LIB] Linux tools for making PDFs

2009-02-03 Thread Mark Jordan
Yan, not sure how it handles east Asian characters, but imagemagick will create PDFs, e.g., convert FILE.jpg FILE.pdf See http://www.imagemagick.org/script/convert.php for more info. Mark - "Yan Han" wrote: > Hello, > > > > Do you know a tool running under Linux to make PDFs from i
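For a whole directory of scans, the convert command above can be wrapped in a loop. This is only a minimal sketch, assuming ImageMagick's convert is on the PATH and the images are JPEGs in the current directory; the pdf_name helper is a made-up name for illustration, not part of ImageMagick.

```shell
#!/bin/sh
# Hypothetical helper: derive the PDF name from an image name,
# e.g. scan01.jpg -> scan01.pdf
pdf_name() {
    printf '%s.pdf\n' "${1%.*}"
}

# Convert each JPEG in the current directory to its own PDF.
for img in ./*.jpg; do
    [ -e "$img" ] || continue   # no matches: the glob stays literal, so skip it
    convert "$img" "$(pdf_name "$img")"
done
```

Passing several images and one output name (convert page1.jpg page2.jpg book.pdf) should instead give a single multi-page PDF. Since the image data is stored as pixels, the script of any text in the scans should not matter at this step.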

[CODE4LIB] OCR engine for Persian/Dari

2009-02-03 Thread Han, Yan
Hello, Do you know an OCR engine for Persian/Dari? If so, what is its accuracy rate? Thanks, Yan

[CODE4LIB] Linux tools for making PDFs

2009-02-03 Thread Han, Yan
Hello, Do you know a tool running under Linux to make PDFs from images? I use Adobe Acrobat Professional on Windows to create PDFs from image files. However, Acrobat does not handle image files with East Asian characters. Yan

[CODE4LIB] A particular OCR challenge...

2009-02-03 Thread Karen Coyle
The 11th edition of the Dewey Decimal system, which he wrote in his 'reformed spelling.' Amazingly, the Google text (at least the part I've scanned) catches it perfectly: "In the clast card catalog the clasification is mapt out abuv the cards by projecting gyds, making reference almost instant

[CODE4LIB] REMINDER: Call for Code4Lib 2010 Hosting Proposals

2009-02-03 Thread Michael J. Giarlo
Folks, Just a reminder that the deadline for Code4Lib 2010 hosting proposals is next Thursday, February 12th. See below for more information. -Mike The Code4Lib Conference Planning Group is putting out a call for proposals to host the 2010 Code4Lib Conference. Information on the kin

Re: [CODE4LIB] "best" OCR package?

2009-02-03 Thread Walter Lewis
Gabriel Farrell wrote: On Tue, Feb 03, 2009 at 10:09:54AM -0500, Walter Lewis wrote: If we had to correct it all: a) it would never get done and b) it would be better than some of the originals which are rife with typographic errors. Hence the genius of Distributed Proofreaders [1] a

Re: [CODE4LIB] "best" OCR package?

2009-02-03 Thread Gabriel Farrell
On Tue, Feb 03, 2009 at 10:09:54AM -0500, Walter Lewis wrote: > If we had to correct it all: a) it would never get done and b) it would > be better than some of the originals which are rife with typographic > errors. Hence the genius of Distributed Proofreaders [1] and reCAPTCHA [2]. [1] http:

[CODE4LIB] Roommates and rides

2009-02-03 Thread Emily Molanphy
Just wanted to send a reminder of this useful wiki page for roommates and rides: http://wiki.code4lib.org/index.php/RoommatesRidesEtc I have an ulterior motive for the reminder--someone I was tentatively going to share a room with turns out not to be able to make it. So if you're interested in s

Re: [CODE4LIB] "best" OCR package?

2009-02-03 Thread Walter Lewis
Karen Coyle wrote: I know that 98% is impressive, but I always like to remember that with an average of 2000 characters per page that means 40 potential errors per book page. Just to give us some perspective on the level of cleanup that will be needed for books being digitized today. The "good"
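The arithmetic behind the "40 potential errors per page" figure can be checked with a few lines of shell. This is a rough sketch only: errors_per_page is a made-up helper name, it uses integer percentages, and it assumes every misread character counts as exactly one error.

```shell
#!/bin/sh
# Hypothetical helper: expected misread characters per page
#   = characters per page * (100 - accuracy%) / 100
errors_per_page() {
    chars=$1
    accuracy_pct=$2
    echo $(( chars * (100 - accuracy_pct) / 100 ))
}

errors_per_page 2000 98   # prints 40
```

At 99% the same page would still carry about 20 errors, so the cleanup load shrinks only in proportion to each accuracy point gained.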

Re: [CODE4LIB] "best" OCR package?

2009-02-03 Thread Karen Coyle
Randy Stern wrote: Abbyy Finereader and Nuance Omnipage are the two leading commercial OCR products. Both can achieve 98% + character accuracy on most book-like material scanned at 300 dpi. I know that 98% is impressive, but I always like to remember that with an average of 2000 characters pe

Re: [CODE4LIB] "best" OCR package?

2009-02-03 Thread Walter Lewis
Randy Stern wrote: Abbyy Finereader and Nuance Omnipage are the two leading commercial OCR products. Both can achieve 98% + character accuracy on most book-like material scanned at 300 dpi. At 07:37 AM 2/3/2009 -0500, Nicole Engard wrote: I'm with Christian - I loved Abbyy FineReader when I u

Re: [CODE4LIB] "best" OCR package?

2009-02-03 Thread Randy Stern
Abbyy Finereader and Nuance Omnipage are the two leading commercial OCR products. Both can achieve 98% + character accuracy on most book-like material scanned at 300 dpi. - Randy Stern (who formerly worked in the OCR industry) At 07:37 AM 2/3/2009 -0500, Nicole Engard wrote: I'm with Christia

Re: [CODE4LIB] "best" OCR package?

2009-02-03 Thread Nicole Engard
I'm with Christian - I loved Abbyy FineReader when I used it at both my previous libraries. It's very accurate and it's affordable if you're not using it for mass digitization :) but we never got the server contract because like Christian said - it is quite expensive. --- Nicole C. Engard Open S

Re: [CODE4LIB] "best" OCR package?

2009-02-03 Thread MJ Ray
Alberto Accomazzi wrote: > [...] I know about OCRopus but I have a feeling that > commercial products still have a significant edge over public domain > packages. [...] OCRopus is released under the Apache License 2.0, which allows commercial development. It is not a public domain package. Fee

Re: [CODE4LIB] "best" OCR package?

2009-02-03 Thread Christian Mahnke
Hello, 2009/2/3 Alberto Accomazzi > Sorry if this is a bit off-topic, but I was wondering if any of you clever > fellows have a recommendation for an OCR package, possibly with a native > linux port. I know about OCRopus but I have a feeling that commercial > products still have a significant e

Re: [CODE4LIB] "best" OCR package?

2009-02-03 Thread Emmanuel Di Pretoro
Hi, This isn't a recommendation since I've never tried it, but I've heard a lot of good things about Tesseract. It is currently developed by Google, but I don't know if they use it. Some links: - http://code.google.com/p/tesseract-ocr/ - http://en.wikipedia.org/wiki/Tesseract_%28software%29 Hope this help