Re: How to detect inverted image in a picture

2011-03-16 Thread Dmitry Silaev
the outer outline of the big white blob is very long). One of Tess's notable features is that it can handle inverted text. Though it should be able to get all outlines, and you've helped him to achieve this. Warm regards, Dmitry Silaev On Wed, Mar 16, 2011 at 10:57 AM, Ice Head iceh
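Tess can cope with inverted (light-on-dark) text by itself, but when you prepare images you may prefer to normalize polarity before recognition. The following is a minimal sketch of one simple heuristic. It is not Tesseract's internal method. It assumes a binary image stored as a list of rows, with 0 = black and 1 = white, and treats the page as inverted when dark pixels are in the majority.

```python
def is_inverted(binary, dark=0):
    """Heuristic: a page is 'inverted' if dark pixels outnumber light ones."""
    total = sum(len(row) for row in binary)
    dark_count = sum(row.count(dark) for row in binary)
    return dark_count * 2 > total


def invert(binary):
    """Swap black (0) and white (1) in a 0/1 binary image."""
    return [[1 - p for p in row] for row in binary]


def normalize_polarity(binary):
    """Return the image as dark-text-on-light-background, inverting if needed."""
    return invert(binary) if is_inverted(binary) else binary
```

A majority vote like this breaks down on pages that mix normal and inverted regions. Those need a per-region decision instead.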

Re: Especial Characteres

2011-03-14 Thread Dmitry Silaev
Manuel, I'm afraid just chaining command line tools won't help in this case. I'm talking about programming. And yes, I did solve many practical problems related to layout analysis, and other fields of document image processing, and succeeded in it )) Warm regards, Dmitry Silaev On Mon, Mar

Re: how to get the character in an image file which is in table format.

2011-03-14 Thread Dmitry Silaev
. Warm regards, Dmitry Silaev On Mon, Mar 14, 2011 at 8:23 AM, David Hoffer dhoff...@gmail.com wrote: Hi Vicky, Can you tell me more about this paper?  It looks like this is not a free document so I can't just read it to see if it would solve the problem I have. My problem is that I have

Re: Tesseract 3.00

2011-03-14 Thread Dmitry Silaev
Actually, there's more than just VietOCR. Check this: http://en.wikipedia.org/wiki/Tesseract_(software)#User_interfaces Warm regards, Dmitry Silaev On Mon, Mar 14, 2011 at 2:13 AM, Onion onionzwie...@gmail.com wrote: Ok, thanks. That will be too complicated for me to use. Will have

Re: Tesseract 3.00

2011-03-14 Thread Dmitry Silaev
You don't need to bother using *two together*. Tesseract is the engine FreeOCR is built on, so the two are already together. FreeOCR's graphical interface is quite user-friendly: just install it and use it. I don't know what else needs to be said )) Warm regards, Dmitry Silaev On Mon, Mar 14, 2011

Re: Customising Tesseract for character recognition

2011-03-14 Thread Dmitry Silaev
Ehmm... I don't get it. If you've succeeded in using iterators, it's at your full disposal to format the output in any way you want programmatically, isn't it? Warm regards, Dmitry Silaev On Mon, Mar 14, 2011 at 1:56 PM, Jose diox...@gmail.com wrote: *I only modify how the result is printed

Re: how to get the character in an image file which is in table format.

2011-03-14 Thread Dmitry Silaev
Dave, Yep, quality is relatively poor, so don't expect high accuracy from Tess. Do you need every table cell's contents? Or is getting the numbers enough, so that in the next step you can restore the [predefined] item names? Warm regards, Dmitry Silaev On Mon, Mar 14, 2011 at 4:19 PM, David Hoffer

Re: how to get the character in an image file which is in table format.

2011-03-14 Thread Dmitry Silaev
Dave, What is the format and resolution in which you initially get your images? For such poor quality every conversion makes an image even worse... Warm regards, Dmitry Silaev On Mon, Mar 14, 2011 at 5:29 PM, David Hoffer dhoff...@gmail.com wrote: Dmitry, Would using a loss-less format

Re: Especial Characteres

2011-03-14 Thread Dmitry Silaev
If you see no clues in these posts, then you need to send your sample images; there's no other way to help you. Warm regards, Dmitry Silaev On Mon, Mar 14, 2011 at 5:22 PM, manuel...@gmail.com manuel...@gmail.com wrote: Thanks. I need a GUI that tells to tesseract to recognize just

Re: how to get the character in an image file which is in table format.

2011-03-14 Thread Dmitry Silaev
As far as I can see, your source data can be treated as a 1-bit (binary), losslessly compressed image. So a lossless conversion to any image format (it makes no difference which) will do no harm. Warm regards, Dmitry Silaev On Tue, Mar 15, 2011 at 8:31 AM, David Hoffer dhoff...@gmail.com wrote: Dmitry

Re: how to get the character in an image file which is in table format.

2011-03-13 Thread Dmitry Silaev
own opinion, and it does not necessarily coincide with the views of other document image processing people. Warm regards, Dmitry Silaev On Sun, Mar 13, 2011 at 12:52 AM, TP wing...@gmail.com wrote: How about this technique mentioned in the Leptonica documentation (its even easier if you can

Re: Especial Characteres

2011-03-13 Thread Dmitry Silaev
this was tested under Windows. I could probably try this under Ubuntu, but I don't know when I'll have enough time to reboot, set up a C++ compiler, build Tesseract and do some testing, sorry )) Are you sure you downloaded the latest stable version of Tesseract? Warm regards, Dmitry Silaev On Thu

Re: Tesseract 3.00

2011-03-13 Thread Dmitry Silaev
-ocr/wiki/ReadMe#Windows Warm regards, Dmitry Silaev On Sun, Mar 13, 2011 at 11:36 PM, Onion onionzwie...@gmail.com wrote: I installed Tesseract 3.00 and the German and Czech languages as well as English. Now how do I run it? Are there directions somewhere? When I click Start Tesseract OCR

Re: Customising Tesseract for character recognition

2011-03-13 Thread Dmitry Silaev
as with it, and always the result was satisfactory. Let me know the details on your command line and OS. Warm regards, Dmitry Silaev On Sun, Mar 13, 2011 at 11:18 PM, patrickq patrick.questemb...@gmail.com wrote: You expect way too much from Tesseract: it's not Tesseract's job to slice and dice

Re: how to get the character in an image file which is in table format.

2011-03-12 Thread Dmitry Silaev
Tesseract's layout analysis. Then go PSM_SINGLE_LINE and PSM_SINGLE_BLOCK. However for PSM_SINGLE_WORD or PSM_SINGLE_CHAR you'd need to do your own segmentation. I don't know if you are ready to dive into such serious development. HTH Warm regards, Dmitry Silaev On Sat, Mar 12, 2011 at 7:39 AM

Re: how to get the character in an image file which is in table format.

2011-03-11 Thread Dmitry Silaev
. You need to remove lines and borders and pass the cleaned image to Tesseract. Many issues can arise in this process, but I don't think there's a need to go into them now, unless you're interested. Warm regards, Dmitry Silaev On Fri, Mar 11, 2011 at 7:21 AM

Re: are there parameters to increase the chances for white space between words?

2011-03-11 Thread Dmitry Silaev
Try textord_words_min_minspace (a fraction of the x-height). Warm regards, Dmitry Silaev On Mon, Mar 7, 2011 at 8:28 PM, JMW white.j...@gmail.com wrote: I'm having some consistent problems with lack of white space between words.  I.e. Thisisyour statementthatshows theamount you owe foryour
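A space threshold expressed as a fraction of the x-height scales with the font size. The following sketch illustrates that principle only. It is a deliberately simplified model and not Tesseract's actual textord spacing logic, which is considerably more involved. Here a gap between neighbouring character boxes counts as a word space when it is at least `min_space_frac * x_height` pixels. The function and its box format are hypothetical.

```python
def join_with_spaces(chars, x_height, min_space_frac):
    """chars: list of (text, left, right) boxes sorted by left edge.

    Inserts a space wherever the horizontal gap between consecutive boxes
    is at least min_space_frac * x_height pixels (simplified model).
    """
    threshold = min_space_frac * x_height
    out = []
    prev_right = None
    for text, left, right in chars:
        if prev_right is not None and left - prev_right >= threshold:
            out.append(" ")
        out.append(text)
        prev_right = right
    return "".join(out)
```

With boxes `[("a", 0, 8), ("b", 10, 18), ("c", 24, 32)]` and an x-height of 20, a fraction of 0.5 (10 px) yields `"abc"`, while 0.25 (5 px) yields `"ab c"`. In this model, lowering the fraction is what lets narrower gaps become spaces.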

Re: noise output

2011-03-04 Thread Dmitry Silaev
need to extend your pre-processing so that you only feed Tess images that actually contain text. Decisions can be made based on contrast estimation, distinctive color distribution, etc. HTH Warm regards, Dmitry Silaev On Fri, Mar 4, 2011 at 5:25 PM, zdravco zdra...@gmail.com wrote: Hello, I am
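Contrast estimation as a pre-filter might look like the following minimal sketch. It computes RMS contrast (the standard deviation of intensities normalized to 0..1) over an 8-bit grayscale image given as a list of rows, and skips images whose contrast is too low to plausibly contain text. The `min_contrast=0.15` cutoff is an arbitrary placeholder and should be tuned on your own samples. Color-distribution checks are not shown.

```python
import math


def rms_contrast(gray):
    """RMS contrast of an 8-bit grayscale image (list of rows of 0..255)."""
    pixels = [p / 255.0 for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))


def worth_ocr(gray, min_contrast=0.15):
    """Placeholder rule: only pass reasonably contrasted images to OCR."""
    return rms_contrast(gray) >= min_contrast
```

A flat, uniform image has zero contrast and is rejected. High contrast alone does not guarantee text, so treat this as one signal among several.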

Fwd: noise output

2011-03-04 Thread Dmitry Silaev
method based on edge detector.PDF HTH Warm regards, Dmitry Silaev On Sat, Mar 5, 2011 at 8:56 AM, Saurabh Gandhi saurabh...@gmail.com wrote: Hey, Any algorithm / whitepaper suggestions for text extraction, especially if the text is not over-lay text but a part of the image itself

Re: Especial Characteres

2011-03-03 Thread Dmitry Silaev
Sriranga, Thanks for letting me know. You are the first one then, and I reinvented the wheel )) However, an article might still be more useful than a verbose forum discussion... Maybe you'd like to write it then? Warm regards, Dmitry Silaev On Thu, Mar 3, 2011 at 3:55 PM, Sriranga(78yrsold

Re: Especial Characteres

2011-03-03 Thread Dmitry Silaev
in programming can make this traineddata file himself )) Warm regards, Dmitry Silaev On Thu, Mar 3, 2011 at 5:08 PM, Sriranga(78yrsold) withblessi...@gmail.com wrote: Dmitry, No I am NOT the first invented but actually credited to spohor...@sjm.com -who helped me very lot including

Re: Especial Characteres

2011-03-03 Thread Dmitry Silaev
Manuel, Is the error message generated by version 2.xx? Did you try to run version 3.xx with my por.traineddata file? I don't get it - have you succeeded or not? Please provide us with the image you are trying to recognize. Warm regards, Dmitry Silaev On Thu, Mar 3, 2011 at 5:34 PM, manuel

Re: image binarization

2011-03-02 Thread Dmitry Silaev
Without any image samples, you can only get vague advice. Provide the community with samples and you might get a concrete, satisfactory response. Warm regards, Dmitry Silaev On Wed, Mar 2, 2011 at 1:43 PM, Cong Nguyen congnguye...@gmail.com wrote: Please be careful with the Otsu algorithm
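For reference, here is a minimal pure-Python sketch of Otsu's global threshold on an 8-bit grayscale image (list of rows). It picks the threshold `t` that maximizes the between-class variance of the two classes "≤ t" and "> t". Because it is a single global threshold, it is known to struggle with uneven illumination and low-contrast regions, which is one reason to be careful with it.

```python
def otsu_threshold(gray):
    """Return the Otsu threshold t (0..255); pixels <= t form the dark class."""
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_b = 0      # number of pixels in the dark class
    sum_b = 0    # intensity sum of the dark class
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b
        m_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, t):
    """Map pixels <= t to 0 (black) and the rest to 255 (white)."""
    return [[0 if p <= t else 255 for p in row] for row in gray]
```

On a cleanly bimodal image the two peaks separate well. On images with shading gradients, a local (adaptive) method is usually the safer choice.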

Re: Customising Tesseract for character recognition

2011-02-24 Thread Dmitry Silaev
, Dmitry Silaev On Thu, Feb 24, 2011 at 1:05 PM, Jose diox...@gmail.com wrote: Hi, (as you now Saurabh because we talked in private the other day) I tried the PSM_SINGLE_COLUMN and the accuracy drops dramatically... I can't afford to loose that accuracy. Is it possible to change the way the output

Re: Customising Tesseract for character recognition

2011-02-24 Thread Dmitry Silaev
, Dmitry Silaev On Thu, Feb 24, 2011 at 1:50 PM, Jose diox...@gmail.com wrote: Dmitry the recognition works the only thing is the way it is parsing it... :S I think segmentation of the images would be too much painful! I only won't to change the other that is display or the bounding boxes so I

Re: Customising Tesseract for character recognition

2011-02-24 Thread Dmitry Silaev
The best way to explain everything would be just to send your source image examples, describe what information you want to get from them and provide the community with the code snippets you use to interface with Tess. And please be as detailed as possible. Warm regards, Dmitry Silaev On Thu

Re: [Tesseract 3] English training text

2011-02-22 Thread Dmitry Silaev
Interesting. I was wondering about Cube since its traces began to appear in the source code, but I hadn't had enough time to investigate it thoroughly. Zdenko, would you please kindly share your other findings on Cube? Regards, Dmitry On Tue, Feb 22, 2011 at 11:13 AM, zdenko podobny zde...@gmail.com

Re: problem in single word recognition

2011-02-22 Thread Dmitry Silaev
I might not have understood you fully, but here is the relevant excerpt from baseapi.h: "Each SetRectangle clears the recognition results so multiple rectangles can be recognized with the same image." Indeed, SetRectangle() calls ClearResults(), which deletes the pageres and clears the block list ready for

Re: Adaptive Data

2011-02-21 Thread Dmitry Silaev
Hi Zvezdoslav, Check out the code of the Classify::EndAdaptiveClassifier() and Classify::InitAdaptiveClassifier() methods. Also search for classify_use_pre_adapted_templates and classify_save_adapted_templates HTH Regards, Dmitry On Feb 16, 4:50 pm, Zvezdoslav Kunov z.ku...@gmail.com wrote:

Re: Image pre-processing for good OCR results

2011-02-20 Thread Dmitry Silaev
Jon, I don't know if it's intended, but all your links to images report "We're sorry. The page you tried to access is not available." So nothing can be advised on your issue... Warm regards, Dmitry Silaev On Mon, Feb 21, 2011 at 5:02 AM, Jon Andersen jande...@gmail.com wrote: Hi, My

Re: Wrappers for tessearct3.01?

2011-02-15 Thread Dmitry Silaev
. Instead of rummaging in Tess's guts, I'd rather use the pretty convenient, high-level interface provided by ResultIterator (see GetIterator() in baseapi.h, then read all the comments in resultiterator.h and pageiterator.h). Warm regards, Dmitry Silaev On Wed, Feb 16, 2011 at 5:34 AM, devTess

Re: Provide/visualize baseline info?

2011-02-08 Thread Dmitry Silaev
no value (( *** I'm still seeking somebody's help regarding this topic's subject. *** Warm regards, Dmitry Silaev 2011/2/8 Sriranga(78yrsold) withblessi...@gmail.com Dmitry, Congratulations !! successfully installed in winXP and tried using phototest.tif 1st commandline tesseract

Re: Wrappers for tessearct3.01?

2011-02-08 Thread Dmitry Silaev
step-by-step debugging is also of use )) Warm regards, Dmitry Silaev On Tue, Feb 8, 2011 at 6:44 PM, devTess jim...@googlemail.com wrote: Hi Dimitry, with the guidelines provided from you, I prepared a strong cup of coffee and start reading the top part of baseapi.h Q1 Init(datapath

Re: Provide/visualize baseline info?

2011-02-06 Thread Dmitry Silaev
the reasonable forum post size here in Google Groups, I placed the more verbose and overall nicer looking instructions in my blog at http://rdaemons.blogspot.com/2011/02/tesseract-ocr-setting-up-interactive.html Warm regards, Dmitry Silaev 2011/2/6 Sriranga(78yrsold) withblessi...@gmail.com Dear

Re: Tesseract Training

2011-01-24 Thread Dmitry Silaev
appropriate people to do this job. Warm regards, Dmitry Silaev -- You received this message because you are subscribed to the Google Groups tesseract-ocr group. To post to this group, send email to tesseract-ocr@googlegroups.com. To unsubscribe from this group, send email to tesseract-ocr

Re: Tesseract Training

2011-01-20 Thread Dmitry Silaev
not much of a recent graduate already (( Warm regards, Dmitry Silaev

Re: Tesseract Training

2011-01-18 Thread Dmitry Silaev
to the entire glyph combination. Then during post-processing you'll need to replace this single code with a predefined dependent Unicode pair. Hope I've managed to express myself clearly. Warm regards, Dmitry Silaev
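The post-processing step described above can be sketched as a lookup table. Each code that was assigned to a whole glyph combination during training is expanded back into its real Unicode sequence after recognition. The sketch assumes, hypothetically, that training used Private Use Area code points. The single mapping shown (U+E000 standing for Khmer KA + COENG + KA, i.e. KA with a subscript KA) is an illustrative example, not an established convention.

```python
# Hypothetical table: private-use code assigned during training -> real Unicode sequence.
CLUSTER_MAP = {
    "\ue000": "\u1780\u17d2\u1780",  # KHMER LETTER KA + KHMER SIGN COENG + KHMER LETTER KA
}


def expand_clusters(text, mapping=CLUSTER_MAP):
    """Replace every placeholder code in OCR output with its Unicode sequence."""
    return "".join(mapping.get(ch, ch) for ch in text)
```

Working character by character keeps the expansion unambiguous, as long as the placeholder codes never occur as ordinary text.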

Re: Tesseract Training

2011-01-17 Thread Dmitry Silaev
Dear Sochenda, I've checked the Unicode table range you've sent and now I see what the problem is. I'd agree that in such an algorithmic writing system (as opposed to simpler positional systems such as Roman or Cyrillic), pre- and post-processing stages are inevitable. I'd suggest making

Re: Tesseract Training

2011-01-16 Thread Dmitry Silaev
://code.google.com/p/tesseract-ocr/wiki/ReadMe). These documents are not easy to search, but they contain all the info you might need. Warm regards, Dmitry Silaev On Sun, Jan 16, 2011 at 10:42 AM, KHEM Sochenda khemsoche...@gmail.com wrote: Dear Dmitry, Thank you very much

Re: Tesseract Training

2011-01-14 Thread Dmitry Silaev
, Dmitry Silaev On Fri, Jan 14, 2011 at 10:25 AM, KHEM Sochenda khemsoche...@gmail.com wrote: Dear Tesseract Team, In training new language step, we have to assign a unicode value to each box. I would like to know if a shape that is composed of *several unicode characters? Is there anyway

Re: Can't get the user dictionary to work

2010-07-30 Thread Dmitry Silaev
On the plus side, it turns out that there are functions buried in the code to serialise/deserialise the classifier state, so it might be useful to run a whole corpus of short images through tess in one batch, save the state, and load that at startup. Could you please be more specific, what

Re: tesseract output correction gear

2010-07-13 Thread Dmitry Silaev
There was a minor bug that prevented magnified textline images from being displayed in the viewport after saving; now it's fixed. Eh, it's a development version, as I said. On Tue, Jul 13, 2010 at 3:36 PM, Jimmy O'Regan jore...@gmail.com wrote: On 13 July 2010 11:55, daemon-s daemons2...@gmail.com wrote: Please