I have thousands of forms equivalent to invoices that I'd like to put into
a database. I'm thinking I would like to have some OCR app/tool scan these
forms, and then generate a CSV with each field. Does anyone have
recommendations on software for this?
--
Adam Vande More
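
For the CSV step, here is a minimal sketch, assuming the OCR tool has already produced plain text for each form. The field names and regular expressions are hypothetical and would have to be adapted to the real forms:

```python
import csv
import io
import re

# Hypothetical field names; adapt them to the real forms.
FIELDS = ["invoice_number", "date", "total"]

# Hypothetical label patterns, one per field.
PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)", re.I),
    "date": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.I),
    "total": re.compile(r"Total\s*:?\s*\$?([\d,]+\.\d{2})", re.I),
}


def extract_fields(ocr_text):
    """Pull each known field out of one form's OCR text; missing fields stay empty."""
    row = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        row[name] = match.group(1) if match else ""
    return row


def write_csv(rows, out):
    """Write one CSV line per form, with a header row."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


sample = "Invoice No: A-1001\nDate: 2009-01-28\nTotal: $1,234.50\n"
buffer = io.StringIO()
write_csv([extract_fields(sample)], buffer)
print(buffer.getvalue())
```

Fields the OCR pass misreads come out empty rather than wrong, which makes them easy to find and fix by hand afterwards.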
From: Gary Kline kl...@thought.org
Sent: Thursday, January 29, 2009 4:23 AM
To: Andrew Gould andrewlylego...@gmail.com
Cc: Reko Turja reko.tu...@liukuma.net; FreeBSD Mailing List
freebsd-questions@freebsd.org
Subject: Re: OCR...
On Wed, Jan 28, 2009 at 07:33:41PM -0600, Andrew Gould wrote:
On Wed, Jan 28, 2009 at 5:09 PM, Gary Kline kl...@thought.org wrote:
On Wed, Jan 28, 2009 at 01:32:57PM -0600, Andrew Gould wrote:
On Wed, Jan 28, 2009 at 1:22
Gary Kline wrote:
well, i'm ashamed to admit that i've put at least a dozen hours in
trying, then re-re-retrying to OCR an imaged PDF file with as many
open source ocr packages as i can find.
I have seen good results with tesseract which is in the ports and free.
Otherwise with OmniPage
to OCR.
-Reko
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org
either feature or qualitywise. No idea if Finereader runs under
emulator though. If the file is already a PDF and 72 DPI with text as
graphics most of the damage has already been done, and it will be
extremely hard to OCR.
well, damage is probably done. how can i check the resolution?
i tried to increase it by creating huge ppm and tif files, but
then that's really absurd since there can only be just so much
data per image. i _could_ try xv and jpeg
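
On checking the resolution: one rough way is to divide the pixel width of the embedded page image by the page width in inches (PDF page sizes are given in points, 72 to the inch). If a recent poppler is installed, `pdfimages -list file.pdf` should report the pixel dimensions. A small sketch of the arithmetic; the numbers are made-up examples, not measurements from the file discussed here:

```python
def effective_dpi(pixel_width, page_width_points):
    """Return the scan resolution implied by an image that fills the page width."""
    page_width_inches = page_width_points / 72.0  # 72 PDF points per inch
    return pixel_width / page_width_inches


# US Letter is 612 points (8.5 inches) wide.
print(effective_dpi(612, 612))   # 72.0
print(effective_dpi(2550, 612))  # 300.0
```

Converting a 72 DPI image to a huge ppm or tif only interpolates the same pixels, so it does not raise this figure in any way that helps the OCR engine.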
guys,
well, i'm ashamed to admit that i've put at least a dozen hours in
trying, then re-re-retrying to OCR an imaged PDF file with as many
open source ocr packages as i can find. before i quit for supper
tonight, i finally threw in the towel. realized that i would have
been THROUGH with all
of .jpg/.gif/.whatever. Read the manual carefully before
attempting; also note this can be a slow process.
Which still doesn't give plain text. But in this case one would need an
OCR app.
There is a new one available in ports called cuneiform. It is supposed
to be quite good, but I haven't had
On Tue, Dec 02, 2008 at 02:07:30AM +0100, Roland Smith wrote:
On Mon, Dec 01, 2008 at 03:14:43PM -0800, Gary Kline wrote:
pdftotext fail on the large [32MB] file I've got. Is there any
other way I can translate this huge textfile to ascii or html or
text?
Please define fail
'pypdf' to split a multipage
PDF scan into individual pages, then used the tesseract OCR to convert
to text. Not 100% of course, and it really got confused by pages that
were not right-side-up, but not a bad start for pages that are really
scans -- images -- rather than PDF representation
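
A page-by-page OCR run along these lines could be sketched as follows. Note the substitution: instead of splitting with pypdf, this version renders each page to a PNG with poppler's `pdftoppm`, because tesseract reads images. The 300 DPI default, the `page` prefix and the working directory are assumptions for illustration:

```python
import subprocess
from pathlib import Path


def render_command(pdf_path, out_prefix, dpi=300):
    """Build the pdftoppm call that writes one PNG per page as <out_prefix>-N.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(out_prefix)]


def ocr_command(image_path):
    """Build the tesseract call; tesseract adds '.txt' to the output base itself."""
    image_path = Path(image_path)
    return ["tesseract", str(image_path), str(image_path.with_suffix(""))]


def ocr_pdf(pdf_path, work_dir):
    """Render every page of a scanned PDF, OCR each page, and return the joined text."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(render_command(pdf_path, work_dir / "page"), check=True)
    texts = []
    # Assumption: pdftoppm zero-pads page numbers, so a plain sort keeps page order.
    for image in sorted(work_dir.glob("page-*.png")):
        subprocess.run(ocr_command(image), check=True)
        texts.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(texts)
```

Pages that are upside down or sideways would still need rotating before the tesseract step, as noted above.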
Guys,
pdftotext fail on the large [32MB] file I've got. Is there any other
way I can translate this huge textfile to ascii or html or text?
thanks,
gary
--
Gary Kline [EMAIL PROTECTED] http://www.thought.org Public Service Unix
Roland Smith writes:
pdftotext fail on the large [32MB] file I've got. Is there any
other way I can translate this huge textfile to ascii or html or
text?
Please define fail in this context? I've used pdftotext on
documents exceeding 40MB. However there are of course
1) Some PDFs are just wrappers around JPEG images. In this case
there is no text for pdftotext to convert = epic fail.
In this case convert from the ImageMagick port will get you a
series of .jpg/.gif/.whatever. Read the manual carefully before
attempting; also note this can be a slow process.
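
Whether a given PDF falls into case 1 can be checked before reaching for ImageMagick. A minimal sketch, assuming poppler's `pdftotext` is on the PATH; the function names are made up for illustration:

```python
import subprocess


def pdftotext_command(pdf_path):
    """Build the pdftotext call; '-' as the output file sends the text to stdout."""
    return ["pdftotext", str(pdf_path), "-"]


def has_text_layer(extracted_text):
    """True if pdftotext produced anything besides whitespace and page breaks."""
    # Form feeds (page separators) and newlines are whitespace, so strip() drops them.
    return bool(extracted_text.strip())


def needs_ocr(pdf_path):
    """Run pdftotext and report whether the PDF looks like a pure image wrapper."""
    result = subprocess.run(
        pdftotext_command(pdf_path), capture_output=True, text=True, check=True
    )
    return not has_text_layer(result.stdout)
```

An empty result only says there is no extractable text; it says nothing about whether the embedded images are good enough to OCR.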
On Thu, Sep 01, 2005 at 08:07:26PM -0700, Gary Kline wrote:
People,
I want to scan ~400 pp of an out-of-print and out-of-copyright
book (from 1913) and need to know what the best scanner is
and if there has been substantial improvement in OCR
been substantial improvement in OCR
software in recent years. This book has few footnotes
or different typefaces, so it should make things easier.
Oh, and if there is something that plugs into DOS/DOZE
and just works, super. I'll use my W2K box. (Hopefully
is
substantial improvement in OCR
software in recent years. This book has few footnotes
or different typefaces, so it should make things easier.
There are several free OCR programs. I've used gocr
(http://jocr.sourceforge.net/ and no, that's not a typo) and ocrad
(http
it's been photographed, with something to keep the opposite page out of the
camera's way.
I have to admit that I do all my scanning and OCR on an OS X system, only
marginally related to FreeBSD. I use an older HP Scanjet with automatic
document feeder (ADF), and the HP software will scan straight