I have been following this list for many years. The vast majority of
the questions are the same ones over and over. After a while, it gets
really tiresome to give the same answers time and again. So, I think
many are too sick and tired of it to keep responding, particularly when
the answers
On Fri, 18 Mar 2016, Edson Luis Moretti wrote:
Yes, I was wondering the same. When I posted, I saw
that Gmail's preview didn't open the image. I already
converted them to work with Tesseract, but the original
file is 353 KB and the converted output is about
twice as big.
I need
On Mon, 21 Mar 2016, Javier Escribano wrote:
Hi all,
I want to create a searchable PDF from existing scanned
PDF without losing its properties. I'm confused about this
point.
How can I make a PDF searchable without altering its
properties? (PDF/A, black & white...)
I tried with Tess4j but I
On Tue, 20 Jan 2015, newbie wrote:
I found that vip1200.jpg works at a scale of width 8654 px
and height 5748 px, but most of the time I get either an
invalid memory access or an out-of-memory (heap) error
before I am able to rescale to the optimal scale. I need
to come up with some other generic way to
On Wed, 14 Jan 2015, newbie wrote:
Flash Thunder, I think I got ahead
of myself in the email below. The upscaled image has the
same dpi as the original image (96 dpi). I have upscaled
pixels for which the OCR works without doing steps 2 and
3 (by trial and error). But I don't
On Mon, 27 Oct 2014, Rick Leir wrote:
Hi Rob
My preprocessing is mentioned in this post:
https://groups.google.com/forum/#!topic/tesseract-ocr/jONGSChLRv4
Maybe you would call it adaptive?
Adaptive means that the threshold for choosing between
black and white changes over different parts of
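
As a rough illustration of the idea described in this snippet, here is a minimal sketch of local-mean thresholding in plain Python. It is not the preprocessing from the linked post; the function name, the window radius and the offset parameter are all assumptions made for the example.

```python
def adaptive_threshold(gray, radius=1, offset=0):
    """Binarize a grayscale image given as a list of rows of 0-255 ints.

    Hypothetical helper: each pixel is compared with the mean of its own
    neighbourhood instead of one global threshold for the whole page.
    """
    height, width = len(gray), len(gray[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Clip the window at the image borders.
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            window = [gray[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            local_mean = sum(window) / len(window)
            # White if at least as bright as the local mean, otherwise black.
            row.append(255 if gray[y][x] >= local_mean - offset else 0)
        out.append(row)
    return out


# A page with a bright left half and a dark right half, one "ink" pixel in each.
page = [
    [200, 200, 200, 50, 50, 50],
    [200, 150, 200, 50, 20, 50],
    [200, 200, 200, 50, 50, 50],
]
binary = adaptive_threshold(page)
```

A single global threshold of, say, 100 would turn the whole dark half black; with the local mean, only the two ink pixels stand out from their surroundings.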
On Fri, 17 Oct 2014, Rick Leir wrote:
I opened the jpg in Gimp, and you can see that it is about
100 pixels per text line:
[gimpOriginal.png]
That image looks to be scanned at about 150 dpi. With
such faint characters, scanning at 300 or 600 dpi would
have been better. Anyway, try scaling
On Wed, 8 Oct 2014, Quan Nguyen wrote:
Rescaling to 300DPI would help.
The DPI is confusing if you're not running a scanner,
and the minimum only applies to typical font sizes.
It's the size of the characters in pixels that is
important. Perhaps the wiki page should have some
images for the
On Fri, 20 Jun 2014, Traun Leyden wrote:
Thanks, this is really useful. (and shame on me for not
RTFM'ing a bit more first)
That document mentions to make sure the orientation/skew
is straight, but does not give any hints on how to
actually do this in an automated fashion. Any tips?
You can
On Thu, 10 Apr 2014, Joel Wheeler wrote:
Thank you for your suggestion! I had read that suggestion
in the docs prior to my attempts but didn't believe that
taking the same image and simply bumping up the dpi on it
would fix the translation errors. It seems like this would
be something that
On Wed, 12 Mar 2014, Leonardo Gabrielli wrote:
I'd like to bump this topic. I am new to OCR and I find
it weird that there's no hardware for OCR nowadays
(dedicated ICs), no libraries for embedded architectures,
and the only option is to use ARM platforms with Linux and
Tesseract.
OCR is
On Tue, 5 Nov 2013, Nick White wrote:
Hi Patrick,
1) Is it normal for tesseract to give different results on different operating
systems?
2) If so, what sort of things account for the differences?
3) Is it possible to get more consistent results through configuration?
There are examples of
Hi Stuart,
if the characters that touch do so consistently, then
maybe you can train your own language, including
in it the pairs of characters that usually connect.
I'm pretty sure that Google already does this for
cases like fi and fl. You can then tell tesseract
to use both english and your
On Wed, 11 Sep 2013, Stuart wrote:
I'm trying to convert some old C code I only have
printouts of back to source. I expected to have to do a
little editing, but Tesseract is having serious problems.
I scanned the images in at 800 DPI, they look clean and I
tried some of the ImageMagick scripts
On Thu, 12 Sep 2013, Stuart wrote:
Automatically subdividing each image into character cells
and OCR'ing each character separately seems like the only
way out of this. I am experimenting with makebox to define
the boxes first.
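
A minimal sketch of the subdivision idea, assuming a strictly monospaced printout whose text grid origin and cell size are already known. The function and its parameters are hypothetical; this is not the makebox output format or any Tesseract API.

```python
def character_cells(left, top, cell_width, cell_height, columns, rows):
    """Return (x0, y0, x1, y1) pixel boxes for a fixed-pitch text grid.

    Hypothetical helper: boxes are listed row by row, left to right.
    """
    return [
        (
            left + col * cell_width,
            top + row * cell_height,
            left + (col + 1) * cell_width,
            top + (row + 1) * cell_height,
        )
        for row in range(rows)
        for col in range(columns)
    ]


cells = character_cells(left=10, top=5, cell_width=8, cell_height=16,
                        columns=2, rows=2)
```

Each box could then be cropped and recognized on its own. This only makes sense for a monospaced font, which is exactly the catch raised in the reply below.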
Argh! When I read proportional font I thought
monospace font,
On Wed, 17 Apr 2013, Sven Pedersen wrote:
This is covered in the FAQ:
https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_add_just_one_character_or_one_font_to_my_favourite_lang
which links to the training wiki:
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
--Sven
On Wed, 20 Mar 2013, Choi wrote:
hello everyone!
It is about a function in api/baseapi.cpp.
void SetImage(const unsigned char* imagedata, int width, int height,
              int bytes_per_pixel, int bytes_per_line);
Is it possible that bytes_per_line is not equal to
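
For background on why a separate bytes-per-line value exists at all: in image buffers generally, rows are often padded so that each row starts on an aligned address, which makes the row stride larger than width times bytes per pixel. The sketch below shows that arithmetic only. The 4-byte alignment and both helper names are assumptions, and it says nothing about how Tesseract itself treats the parameter.

```python
def padded_stride(width, bytes_per_pixel, alignment=4):
    """Bytes per row when every row is padded up to a multiple of `alignment`."""
    raw = width * bytes_per_pixel
    return (raw + alignment - 1) // alignment * alignment


def pixel_offset(x, y, bytes_per_pixel, bytes_per_line):
    """Byte offset of pixel (x, y) in a row-major buffer with the given stride."""
    return y * bytes_per_line + x * bytes_per_pixel


# A 5-pixel-wide RGB row needs 15 bytes but is padded to 16.
stride = padded_stride(width=5, bytes_per_pixel=3)
offset = pixel_offset(x=2, y=1, bytes_per_pixel=3, bytes_per_line=stride)
```

Indexing with the stride rather than with width times bytes per pixel is what keeps the second and later rows from drifting.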
On Fri, 26 Oct 2012, Gaara Sabaku wrote:
Tesseract is designed to read TIFFs. Use ImageMagick to
convert the file.
MS Paint in Vista and up can save files in TIFF format.
TIFF data can be compressed with all kinds of compression
methods, including JPEG. So, converting from JPEG to TIFF
may
On Thu, 6 Sep 2012, Nick White wrote:
Hi Piyush,
As you said 600 DPI image would be good for OCRs. But I am not able to relate
600 DPI with these parameters. My guess is DPI is same as density. Any
suggestion would be highly appreciated.
DPI is the same as ImageMagick's -density option, at
On Thu, 6 Sep 2012, newtotesseract wrote:
Hi Nick,
I tried passing in the CCITTFaxDecode data to tesseract,
but it was not detected as TIFF.
It seems that CCITT fax is not the same as TIFF.
A Google search showed me that a few other people also faced
the same issue
On Fri, 22 Jun 2012, Ananth K wrote:
Hi all,
I'm new to OCR and I'm an iPhone application developer. I
tried the OCR demo code, which builds and even runs fine.
But I'm facing an issue while archiving it. The Xcode
version is 4.3.2. The sample project I'm using
On Wed, 6 Jun 2012, La Monte H. P. Yarroll wrote:
Am I the only one wondering what a printable control
character might look like? To me a control character is
something like carriage return or form feed, which doesn't have
a printable representation.
Those actually are printable because they do
On Tue, 29 Nov 2011, Jarrett Jordaan wrote:
Thanks.
I tarred all those files and unpacked them on a new server.
Just had to run ldconfig though.
I don't have a build server as yet. But I will have a look at
automation.
For now, manually compiling on the target platform will have to do :)
What an
On Tue, 29 Nov 2011, Jarrett Jordaan wrote:
I'm not a C++ pro. I know a bit about Linux. That said, are all the
files used by Tesseract in:
/usr/local/lib/
/usr/local/bin/
/usr/local/share/tessdata
Have I missed anything?
That way I can compile on one server, compress them and unpack the files
On Fri, 30 Sep 2011, merve t wrote:
Again, my question is: shouldn't the installation put the headers into
the /usr/include directory?
Generally, external packages built from sources get installed into
/usr/local or /opt by default. That way they survive system upgrades
without being
On Fri, 19 Aug 2011, Andriy Malovanyy wrote:
To sriranga:
I tried changing the dpi (check the previous post). It doesn't work.
Did you rescale the image from 72 dpi to 300 dpi, or just change
the tag on the original image to say 300 dpi? The latter won't work.
Tesseract seems to be tuned to work
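
The difference between rescaling and retagging comes down to pixel counts: resampling changes how many pixels the image has, while editing the dpi tag leaves them untouched. A minimal sketch of the arithmetic, using a hypothetical helper and a 640x480 image at 72 dpi as the illustrative input:

```python
def resampled_size(width_px, height_px, source_dpi, target_dpi):
    """Pixel dimensions after resampling so the same physical size is
    represented at target_dpi instead of source_dpi (hypothetical helper)."""
    return (
        round(width_px * target_dpi / source_dpi),
        round(height_px * target_dpi / source_dpi),
    )


# Resampling 72 dpi -> 300 dpi enlarges the pixel grid by 300/72.
new_size = resampled_size(640, 480, source_dpi=72, target_dpi=300)

# Only changing the tag is the same as a scale factor of 1: nothing changes.
retagged_size = resampled_size(640, 480, source_dpi=300, target_dpi=300)
```

The actual resampling would be done by an image tool or library; this only shows what size to ask for.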
On Fri, 19 Aug 2011, Andriy Malovanyy wrote:
To Rob:
Initially I had a 640x480 image at 72 dpi, with the number occupying almost
all of the image. What I did was just open the image in Photoshop, go
to the image size menu, change the resolution to 300 dpi (the image
increased in size) and set the image
On Fri, 6 May 2011, Patrick Collins wrote:
Hi, I am trying to scan a series of documents which have been badly skewed by
the book's edge. Has anyone seen any commercial or open source
implementations of deskewing software which can handle advanced deskews
like this?
Patrick.
It's hard to
On Sat, 30 Apr 2011, Enrico Segre wrote:
Thx. I tried your patch in rev 581, and on my test page it worked like
I expected only if I cropped the image very close to the left text
margin. With a larger margin, a line break is inserted at every line.
It may well have to do with the changed
On Fri, 29 Apr 2011, Dmitri Silaev wrote:
No, with Tesseract itself it's not possible.
This is a job for good old batch files or scripts.
Warm regards,
Dmitri Silaev
www.CustomOCR.com
On Fri, Apr 29, 2011 at 5:41 AM, faye stefan.der.pr...@googlemail.com wrote:
Is there an option to let
On Mon, 25 Apr 2011, Enrico Segre wrote:
I'm striving to use Tesseract for providing content to Project
Gutenberg. There, the proofing workflow requires that one blank line is
inserted between each recognized paragraph, paragraphs being defined
by a changing indentation of their first line
Hi all,
a while ago, I wrote myself a hack to tesseract to insert
a blank line before every new paragraph. I did that by
checking the x-position of the first word in every line
with respect to the left side of the current block in
baseapi.cpp:TessBaseAPI::GetUTF8Text(). This code worked
well
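
The heuristic described in this snippet can be sketched outside Tesseract as follows. This is not the author's patch to GetUTF8Text(); the input format (one tuple of first-word x-position and text per line), the function name and the 20-pixel indent threshold are assumptions made for illustration.

```python
def insert_paragraph_breaks(lines, block_left, indent_threshold=20):
    """Join recognized lines, adding a blank line before each indented one.

    lines: list of (first_word_x, text) tuples in reading order.
    block_left: x-position of the left side of the current block.
    """
    out = []
    for index, (first_word_x, text) in enumerate(lines):
        indented = first_word_x - block_left > indent_threshold
        # No blank line before the very first line of the block.
        if indented and index > 0:
            out.append("")
        out.append(text)
    return "\n".join(out)


lines = [
    (60, "First paragraph starts here"),
    (20, "and continues."),
    (62, "Second paragraph starts here"),
    (21, "and continues too."),
]
text = insert_paragraph_breaks(lines, block_left=20)
```

A fixed threshold is fragile when the block's left edge is misjudged, for example because of a wide page margin, so in practice the threshold would likely need to scale with the line height.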
On Fri, 13 Aug 2010, zdenko podobny wrote:
Because, AFAIK, nobody reacted to Catalin's e-mail, I proposed to him that we
create a project to collect patches and possibly solve known issues. Because I
have little time, the project is still looking for an owner/contributors. Warmly
welcomed are experts in Python
On Wed, 7 Apr 2010, MARTIN Pierre wrote:
I know this is off topic, but I suspect people here know this area
well: can OCRopus be compiled for Windows? Without Cygwin? Is there a C++
API usable by ordinary MS Visual Studio users, for example?
On Thu, 8 Apr 2010, MARTIN Pierre wrote:
Maybe _you_ could be the resource that helps them port to Windows.
I'll offer them my skills, yes. But why not on Tesseract?
It would be less work than writing another OCR engine from scratch,
and you would get results a lot sooner.
Yes it
On Tue, 1 Sep 2009, Bertl wrote:
On Sep 1, 6:00 pm, SteveP spohor...@sjm.com wrote:
This might be the same issue discussed in the thread Latest Tesseract
on Mac OS?.
yep, indeed ... maybe the README or INSTALL.* files should cover that
part in the near future?
anyway, thanks a bunch,
On Thu, 20 Aug 2009, MoKy wrote:
Am I the only one who cannot see this file?
Moky
On 18 Aug, 18:41, MoKy joffrey.agou...@gmail.com wrote:
Hello,
I've just checked the SVN and there is still no
ccmain/tesseractmain.cpp.
Where can I find this file? Can the SVN be updated to be complete
On Thu, 20 Aug 2009, Ray Smith wrote:
The problem is that the configure script was out of date. I have just updated
the configure script and it should now
work, unless your system doesn't have the correct version of autotools, in
which case you still have to run runautoconf.