Re: [tesseract-ocr] Any success story?

2023-11-15 Thread Robert Komar
I have been following this list for many years.  The vast majority of the questions are the same ones over and over.  After a while, it gets really tiresome to give the same answers time and again.  So, I think many are too sick and tired of it to keep responding, particularly when the answers

Re: [tesseract-ocr] Re: Problem with colored Tif Images

2016-03-21 Thread Robert Komar
On Fri, 18 Mar 2016, Edson Luis Moretti wrote: Yes, I was wondering the same. When I made the post I saw that Gmail's preview didn't open the image. I have already converted them to work with Tesseract, but the original file is 353 kB and the converted output is about twice as big. I need

Re: [tesseract-ocr] Searchable PDF

2016-03-21 Thread Robert Komar
On Mon, 21 Mar 2016, Javier Escribano wrote: Hi all, I want to create a searchable PDF from an existing scanned PDF without losing its properties. I'm confused about this point. How can I make a PDF searchable without altering its properties? (PDF/A, black & white...) I tried with Tess4J but I

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-22 Thread Robert Komar
On Tue, 20 Jan 2015, newbie wrote: I found that vip1200.jpg works at a scale of width 8654px and height 5748px, but most of the time I get either an invalid memory access or an out-of-memory (heap) error before I am able to rescale to the optimal scale. I need to come up with some other generic way to

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-14 Thread Robert Komar
On Wed, 14 Jan 2015, newbie wrote: Flash Thunder, I think I got ahead of myself in the email below. The upscaled image has the same dpi as the original image (96 dpi). I have upscaled pixels for which the OCR works without doing steps 2 and 3 (by trial and error). But I don't

Re: [tesseract-ocr] how can I get better results for this

2014-10-27 Thread Robert Komar
On Mon, 27 Oct 2014, Rick Leir wrote: Hi Rob, my preprocessing is mentioned in this post: https://groups.google.com/forum/#!topic/tesseract-ocr/jONGSChLRv4 Maybe you would call it adaptive? Adaptive means that the threshold for choosing between black and white changes over different parts of
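To illustrate what "adaptive" means here, the following is a minimal sketch of local-mean thresholding, not the preprocessing either poster actually used. The function name, the `radius` and `offset` parameters, and the list-of-rows image representation are all assumptions made for the example.

```python
def adaptive_threshold(gray, radius=1, offset=10):
    """Binarize a grayscale image given as a list of rows of 0-255 ints.

    Each pixel is compared with the mean of its own neighbourhood
    (a (2*radius+1) square, clipped at the image border), so the
    black/white threshold changes over different parts of the image.
    `offset` (an assumed tuning knob) keeps flat background white.
    """
    height, width = len(gray), len(gray[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            window = [gray[j][i] for j in ys for i in xs]
            local_mean = sum(window) / len(window)
            row.append(255 if gray[y][x] > local_mean - offset else 0)
        result.append(row)
    return result
```

A single global threshold would turn a dimly lit background black; comparing against the local mean keeps it white while still picking out darker strokes. Real images need a much larger `radius` than the default used here.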

Re: [tesseract-ocr] how can I get better results for this

2014-10-17 Thread Robert Komar
On Fri, 17 Oct 2014, Rick Leir wrote: I opened the jpg in Gimp, and you can see that it is about 100 pixels per text line: [gimpOriginal.png] That image looks to be scanned at about 150 dpi. With such faint characters, scanning at 300 or 600 dpi would have been better. Anyway, try scaling

Re: [tesseract-ocr] Re: Tesseract on simple image

2014-10-09 Thread Robert Komar
On Wed, 8 Oct 2014, Quan Nguyen wrote: Rescaling to 300DPI would help. The DPI is confusing if you're not running a scanner, and the minimum only applies to typical font sizes. It's the size of the characters in pixels that is important. Perhaps the wiki page should have some images for the

Re: [tesseract-ocr] Any suggestions on pre-processing to improve accuracy?

2014-06-22 Thread Robert Komar
On Fri, 20 Jun 2014, Traun Leyden wrote: Thanks, this is really useful. (and shame on me for not RTFM'ing a bit more first) That document mentions to make sure the orientation/skew is straight, but does not give any hints on how to actually do this in an automated fashion. Any tips? You can

Re: [tesseract-ocr] Re: Poor translations on first attempts

2014-04-10 Thread Robert Komar
On Thu, 10 Apr 2014, Joel Wheeler wrote: Thank you for your suggestion! I had read that suggestion in the docs prior to my attempts but didn't believe that taking the same image and simply bumping up the dpi on it would fix the translation errors. It seems like this would be something that

Re: minimum requirements for tesseract

2014-03-12 Thread Robert Komar
On Wed, 12 Mar 2014, Leonardo Gabrielli wrote: I'd like to bump this topic. I am new to OCR and I find it weird that there's no hardware for OCR nowadays (dedicated ICs), no libraries for embedded architectures, and the only option is to use ARM platforms with Linux and tesseract. OCR is

Re: Different Results on Linux vs Windows

2013-11-05 Thread Robert Komar
On Tue, 5 Nov 2013, Nick White wrote: Hi Patrick, 1) Is it normal for tesseract to give different results on different operating systems? 2) If so, what sort of things accounts for the differences? 3) Is it possible to get more consistent results through configuration? There are examples of

Re: OCR of C code

2013-09-13 Thread Robert Komar
Hi Stuart, if the characters that touch do so consistently, then maybe you can train your own language, including in it the pairs of characters that usually connect. I'm pretty sure that Google already does this for cases like fi and fl. You can then tell tesseract to use both english and your

Re: OCR of C code

2013-09-12 Thread Robert Komar
On Wed, 11 Sep 2013, Stuart wrote: I'm trying to convert some old C code I only have printouts of back to source. I expected to have to do a little editing, but Tesseract is having serious problems. I scanned the images in at 800 DPI, they look clean, and I tried some of the ImageMagick scripts

Re: OCR of C code

2013-09-12 Thread Robert Komar
On Thu, 12 Sep 2013, Stuart wrote: Automatically subdividing each image into character cells and OCR'ing each character separately seems like the only way out of this. I am experimenting with makebox to define the boxes first. Argh! When I read proportional font I thought monospace font,

Re: Hindi training data - unicharset_extractor error

2013-04-17 Thread Robert Komar
On Wed, 17 Apr 2013, Sven Pedersen wrote: This is covered in the FAQ: https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_add_just_one_character_or_one_font_to_my_favourite_lang which links to the training wiki: https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 --Sven

Re: I have an question about baseapi.c++ code

2013-03-20 Thread Robert Komar
On Wed, 20 Mar 2013, Choi wrote: hello everyone! It is about a function in api/baseapi.cpp. void SetImage(const unsigned char* imagedata, int width, int height, int bytes_per_pixel, int bytes_per_line); Is it possible that bytes_per_line not equal to
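As background to the question about `bytes_per_line`: image buffers often pad each row to an alignment boundary, in which case the row stride is larger than `width * bytes_per_pixel`. The sketch below only shows that arithmetic. The 4-byte alignment and both function names are assumptions for illustration, not something taken from the Tesseract sources.

```python
def padded_bytes_per_line(width, bytes_per_pixel, alignment=4):
    """Row size in bytes when each row is padded up to a multiple of
    `alignment` (4 bytes is an assumed, but common, convention)."""
    raw = width * bytes_per_pixel
    return (raw + alignment - 1) // alignment * alignment


def pixel_offset(x, y, bytes_per_pixel, bytes_per_line):
    """Byte offset of pixel (x, y) in a buffer whose rows start
    bytes_per_line bytes apart."""
    return y * bytes_per_line + x * bytes_per_pixel
```

For a 5-pixel-wide RGB row the raw size is 15 bytes but the padded stride is 16, which is why a separate `bytes_per_line` argument is useful: indexing with `width * bytes_per_pixel` would drift by one byte per row.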

Re: Error when executed

2012-10-26 Thread Robert Komar
On Fri, 26 Oct 2012, Gaara Sabaku wrote: Tesseract is designed to read TIFFs. Use ImageMagick to convert the file. MS Paint in Vista and up can save files in TIFF format. TIFF data can be compressed with all kinds of compression methods, including JPEG. So, converting from JPEG to TIFF may

Re: How to improve recognition on TIFF black-and white Romanian text?

2012-09-06 Thread Robert Komar
On Thu, 6 Sep 2012, Nick White wrote: Hi Piyush, As you said 600 DPI image would be good for OCRs. But I am not able to relate 600 DPI with these parameters. My guess is DPI is same as density. Any suggestion would be highly appreciated. DPI is the same as imagemagick's -density command, at

Re: can tesseract decode the OCR from CCITT facsimile standard images?

2012-09-06 Thread Robert Komar
On Thu, 6 Sep 2012, newtotesseract wrote: Hi Nick, I tried passing in the CCITTFaxDecode data to tesseract, but it was not detected as TIFF. It seems like CCITT fax is not the same as TIFF. A Google search showed me that a few other people also faced the same issue

Re: Error while archiving OCR Demo on iPhone

2012-06-23 Thread Robert Komar
On Fri, 22 Jun 2012, Ananth K wrote: Hi all, I'm new to OCR and I'm an iPhone application developer. I tried the OCR demo code, which builds and even runs fine. But I'm facing an issue while archiving it. The Xcode version is 4.3.2. The sample project I'm using

Re: Is there a way to train Tesseract to NOT output/recognize a character?

2012-06-06 Thread Robert Komar
On Wed, 6 Jun 2012, La Monte H. P. Yarroll wrote: Am I the only one wondering what a printable control character might look like? To me control character is a thing like carriage return or form feed which doesn't have a printable representation. Those actually are printable because they do

Re: Packaging

2011-12-01 Thread Robert Komar
On Tue, 29 Nov 2011, Jarrett Jordaan wrote: Thanks. I tarred all those files and unpacked them on a new server. Just had to run ldconfig though. I don't have a build server as yet. But will have a look at automation. For now, manually compiling on the target platform will have to do :) What an

Re: Packaging

2011-11-29 Thread Robert Komar
On Tue, 29 Nov 2011, Jarrett Jordaan wrote: I'm not a C++ pro. I know a bit about Linux. That said, are all the files used by tesseract in: /usr/local/lib/ /usr/local/bin/ /usr/local/share/tessdata Have I missed anything? That way I can compile on one server, compress them and unpack files

Re: Installation

2011-09-30 Thread Robert Komar
On Fri, 30 Sep 2011, merve t wrote: again My question is should not the installation distribute headers into usr/include directory??? Generally, external packages built from sources get installed into /usr/local or /opt by default. That way they survive system upgrades without being

Re: Ang.: Re: logging data with web camera and tessaract

2011-08-19 Thread Robert Komar
On Fri, 19 Aug 2011, Andriy Malovanyy wrote: To sriranga: I tried changing dpi (check the previous post). It doesn't work. Did you rescale the image from 72 dpi to 300 dpi, or just change the tag on the original image to say 300 dpi? The latter won't work. Tesseract seems to be tuned to work
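The difference between rescaling and retagging comes down to pixel counts: changing the dpi tag leaves the pixels untouched, while resampling multiplies both dimensions by the ratio of the two resolutions. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the numbers are illustrative, using a 640x480 image at 72 dpi.

```python
def rescale_dimensions(width_px, height_px, source_dpi, target_dpi):
    """Pixel dimensions after resampling an image from source_dpi to
    target_dpi. Merely editing the dpi tag would leave them unchanged."""
    scale = target_dpi / source_dpi
    return round(width_px * scale), round(height_px * scale)


# Illustrative numbers: a 640x480 image going from 72 dpi to 300 dpi.
new_width, new_height = rescale_dimensions(640, 480, 72, 300)
print(new_width, new_height)
```

Every character ends up roughly 4.2 times taller in pixels, and it is the character size in pixels that the recognizer actually sees.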

Re: Ang.: Re: logging data with web camera and tessaract

2011-08-19 Thread Robert Komar
On Fri, 19 Aug 2011, Andriy Malovanyy wrote: To Rob: Initially I had 640x480 image with 72dpi with number occupying almost all the image. What I did is just opened the image in Photoshop, went to size of image menu, changed the resolution to 300 dpi (image increased in size) and set the image

Re: Deskew waves in a document

2011-05-06 Thread Robert Komar
On Fri, 6 May 2011, Patrick Collins wrote: Hi, I am trying to scan a series of documents which have been badly skewed by the book's edge. Has anyone seen any commercial or open source implementations of deskewing software which can handle advanced deskews like this? Patrick. It's hard to

Re: insert blank line (or any other mark) between paragraphs

2011-04-30 Thread Robert Komar
On Sat, 30 Apr 2011, Enrico Segre wrote: Thx. I tried your patch in rev 581, and on my test page it worked like I expected only if I cropped the image very close to the left text margin. With a larger margin, a line break is inserted at every line. It may well have to do with the changed

Re: Several input files into one output file

2011-04-28 Thread Robert Komar
On Fri, 29 Apr 2011, Dmitri Silaev wrote: No, with Tesseract itself it's not possible. This is a job for good old batch files or scripts. Warm regards, Dmitri Silaev www.CustomOCR.com On Fri, Apr 29, 2011 at 5:41 AM, faye stefan.der.pr...@googlemail.com wrote: Is there an option to let
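A script of the kind suggested above might look like the sketch below. It assumes the `tesseract IMAGE OUTPUTBASE` command-line form, which writes `OUTPUTBASE.txt`. The function name and the injectable `run` parameter are hypothetical and exist only to make the example testable without Tesseract installed.

```python
import subprocess
from pathlib import Path


def ocr_to_single_file(image_paths, combined_path, run=subprocess.run):
    """Run tesseract once per image, then join the per-image text
    files into one output file, in the order the images were given."""
    parts = []
    for image in map(Path, image_paths):
        output_base = image.with_suffix("")        # tesseract appends ".txt"
        run(["tesseract", str(image), str(output_base)], check=True)
        text_path = Path(f"{output_base}.txt")
        parts.append(text_path.read_text(encoding="utf-8"))
    Path(combined_path).write_text("\n".join(parts), encoding="utf-8")
```

Calling `ocr_to_single_file(["page1.tif", "page2.tif"], "book.txt")` would leave `page1.txt` and `page2.txt` next to the images and write their concatenation to `book.txt`.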

Re: insert blank line (or any other mark) between paragraphs

2011-04-25 Thread Robert Komar
On Mon, 25 Apr 2011, Enrico Segre wrote: I'm striving to use tesseract for providing content to the Project Gutenberg. There, proofing workflow requires that one blank line is inserted between each recognized paragraph, paragraphs being defined by a changing indentation of their first line

block bounding box no longer being determined?

2010-11-23 Thread Robert Komar
Hi all, a while ago, I wrote myself a hack to tesseract to insert a blank line before every new paragraph. I did that by checking the x-position of the first word in every line with respect to the left side of the current block in baseapi.cpp:TessBaseAPI::GetUTF8Text(). This code worked well
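The idea behind that hack can be sketched outside Tesseract as follows. This is not the patch itself, which lived in `baseapi.cpp`; it is a minimal Python illustration in which the function name, the `(x_left, text)` line representation, and the `min_indent` pixel threshold are all assumptions.

```python
def insert_paragraph_breaks(lines, block_left, min_indent=20):
    """Join recognized text lines, putting a blank line before each
    line whose first word is indented relative to the block's left edge.

    lines: list of (x_left, text) tuples in reading order, where x_left
           is the x-position of the first word on that line.
    block_left: x-position of the left side of the enclosing block.
    """
    output = []
    for index, (x_left, text) in enumerate(lines):
        if index > 0 and x_left - block_left >= min_indent:
            output.append("")          # blank line marks a new paragraph
        output.append(text)
    return "\n".join(output)
```

The approach depends on `block_left` being accurate: if the block's bounding box is not determined correctly, every line can look indented and a break is inserted before each one.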

Re: Announcement: new version of pyTesseractTrainer available

2010-08-13 Thread Robert Komar
On Fri, 13 Aug 2010, zdenko podobny wrote: Because AFAIK nobody reacted to Catalin's e-mail, I offered him to create a project to collect patches and possibly to solve known issues. Because I have little time, the project is still looking for an owner/contributors. Warmly welcomed are expect for python

Re: Using Tesseract from a C++ application.

2010-04-07 Thread Robert Komar
On Wed, 7 Apr 2010, MARTIN Pierre wrote: I know that's off topic, but I suspect people here are knowledgeable in this domain: is OCROpus compilable for Windows? Without Cygwin? Is there a C++ API usable by ordinary MSVStudio users, for example?

Re: Using Tesseract from a C++ application.

2010-04-07 Thread Robert Komar
On Thu, 8 Apr 2010, MARTIN Pierre wrote: Maybe _you_ could be the resource that helps them port to Windows. I'll make them an offer of my skills, yes. But why not on Tesseract? It would be less work than writing another OCR engine from scratch, and you would get results a lot sooner. Yes it

Re: Tesseract 3.0

2009-09-01 Thread Robert Komar
On Tue, 1 Sep 2009, Bertl wrote: On Sep 1, 6:00 pm, SteveP spohor...@sjm.com wrote: This might be the same issue discussed in the thread Latest Tesseract on Mac OS?. yep, indeed ... maybe the README or INSTALL.* files should cover that part in the near future? anyway, thanks a bunch,

Re: tesseract 3.0 missing svn file

2009-08-20 Thread Robert Komar
On Thu, 20 Aug 2009, MoKy wrote: Am I the only one to not see this file? Moky On 18 août, 18:41, MoKy joffrey.agou...@gmail.com wrote: Hello, I've just checked the SVN and there is still no ccmain/tesseractmain.cpp. Where can I find this file? Can the svn be updated to be complete

Re: tesseract 3.0 missing svn file

2009-08-20 Thread Robert Komar
On Thu, 20 Aug 2009, Ray Smith wrote: The problem is that the configure script was out of date. I have just updated the configure script and it should now work, unless your system doesn't have the correct version of autotools, in which case you still have to run runautoconf.