Re: how to get the character in an image file which is in table format.

2011-03-13 Thread David Hoffer
Hi Vicky, Can you tell me more about this paper? It looks like this is not a free document so I can't just read it to see if it would solve the problem I have. My problem is that I have grey-scale image data (tif/jpg/etc) that contains text within a table format, i.e. cells on the page. The doc

RE: how to get the character in an image file which is in table format.

2011-03-13 Thread Vicky Budhiraja
Hello, I used this paper (for pre-processing): Lee and Ryu, "Parameter-Free Geometric Document Layout Analysis", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1240-1256, Nov. 2001. Best Regards, Vicky

Re: Especial Characteres

2011-03-13 Thread manuel...@gmail.com
What would you recommend for splitting the columns? I think I will need to scan with tesseract column by column, and afterwards merge the results to rebuild the correct rows. Can you point me in the right direction? What tools (unix compatible tools) can I use to tell tesseract to scan a specif
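The merge step described in this message can be sketched as follows. This is only an illustration, assuming each column has already been recognised separately and yields exactly one text line per table row, in the same order. The function name `merge_columns` and the sample data are hypothetical, not part of Tesseract.

```python
from itertools import zip_longest


def merge_columns(columns, sep="\t"):
    """Merge per-column lists of text lines into table rows.

    columns[i][j] is the text of line j recognised in column i.
    Shorter columns are padded with empty strings.
    """
    rows = zip_longest(*columns, fillvalue="")
    return [sep.join(cell.strip() for cell in row) for row in rows]


if __name__ == "__main__":
    # Hypothetical OCR output for two columns of the same table.
    left = ["Name", "Alice", "Bob"]
    right = ["Amount", "10.50"]
    for row in merge_columns([left, right]):
        print(row)
```

The one-line-per-row assumption breaks as soon as a cell is empty and the OCR step emits nothing for it, so matching lines by their vertical position is likely to be more robust on sparse tables.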

Re: Tesseract 3.00

2011-03-13 Thread Quan Nguyen
There are some GUI frontends that you can use, such as VietOCR, which are available as Java and .NET apps. http://vietocr.sf.net On Mar 13, 6:13 pm, Onion wrote: > Ok, thanks. That will be too complicated for me to use. Will have to > uninstall it. -- You received this message because you are

Re: Tesseract 3.00

2011-03-13 Thread Onion
Ok, thanks. That will be too complicated for me to use. Will have to uninstall it.

Re: Customising Tesseract for character recognition

2011-03-13 Thread Dmitry Silaev
Jose, I run Tesseract revision 549 from the command line under Windows with no special config and get a segmentation that is almost correct. What language file do you use? I used the following command line: tesseract 3.tiff test3 -l eng with no pageseg_mode (-psm argument) as well as with it,

Re: Tesseract 3.00

2011-03-13 Thread Dmitry Silaev
Although the Tesseract team is working to make it more user-friendly, many obvious user issues are still opaque or hard to find an answer to... Tesseract is a console application; it has no GUI. You should open a Windows command line and type a command. Read more at http://code.google.com/p/tesseract-oc

Tesseract 3.00

2011-03-13 Thread Onion
I installed Tesseract 3.00 and the German and Czech languages as well as English. Now how do I run it? Are there directions somewhere? When I click Start > Tesseract OCR, a DOS screen flashes for a split second, then nothing happens. Thanks

Re: Customising Tesseract for character recognition

2011-03-13 Thread patrickq
You expect way too much from Tesseract: it's not Tesseract's job to slice and dice the text according to various organizational requirements of applications - that's for the application to handle. You can get all the coordinates of all characters and easily determine which ones are in what you consi
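As a rough sketch of the application-side grouping suggested here: given character bounding boxes, characters can be bucketed into rows by their vertical centre and then ordered left to right. The tuple layout `(char, left, bottom, right, top)`, the assumption that y grows upward, and the `tolerance` value are all hypothetical choices for illustration, not a Tesseract API.

```python
def group_into_rows(boxes, tolerance=10):
    """Group character boxes into text rows.

    boxes: iterable of (char, left, bottom, right, top) tuples,
    with y assumed to grow upward. Returns one string per row,
    topmost row first, characters ordered left to right.
    """
    rows = []  # each row: {"y": centre of first box, "items": [(left, char)]}
    # Visit boxes from the top of the page to the bottom.
    ordered = sorted(boxes, key=lambda b: -(b[2] + b[4]) / 2)
    for char, left, bottom, right, top in ordered:
        centre = (bottom + top) / 2
        for row in rows:
            if abs(row["y"] - centre) <= tolerance:
                row["items"].append((left, char))
                break
        else:
            # No existing row is close enough: start a new one.
            rows.append({"y": centre, "items": [(left, char)]})
    return ["".join(c for _, c in sorted(row["items"])) for row in rows]


if __name__ == "__main__":
    sample = [
        ("B", 30, 100, 40, 120),
        ("A", 10, 101, 20, 121),
        ("1", 10, 50, 20, 70),
        ("2", 30, 52, 40, 72),
    ]
    print(group_into_rows(sample))
```

The same idea applies to columns by bucketing on the horizontal coordinate instead. A fixed tolerance is the simplest choice and would need tuning to the font size and skew of the actual scans.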

Re: Customising Tesseract for character recognition

2011-03-13 Thread Jose
Hi Patrick, yes, the results are correct, but the format of the results is not! That's my trouble.

Re: Customising Tesseract for character recognition

2011-03-13 Thread patrickq
Tesseract 3.00 gets this text 100% correct, including the smudged numbers at the bottom. See: http://www.scanbizcards.com/plate1.jpg http://www.scanbizcards.com/plate2.jpg (scanning was done with ScanBizCards on an iPhone - if you try it yourself with the app on Android or iPhone, please disable i

Re: Customising Tesseract for character recognition

2011-03-13 Thread Jose
Hi Dmitry, sorry for the delay... I produced some samples; see if you can give them a look! Regards, Jose

Re: Especial Characteres

2011-03-13 Thread Dmitry Silaev
Running via ports can cause diverse errors. Try to compile Tesseract natively. I use revision 549 and as I said it works fine. Tables like yours present a challenge for simple layout processing algorithms, due to sparsely located text. A minimal skew which is almost inevitable could break al

Re: Especial Characteres

2011-03-13 Thread Dmitry Silaev
Manuel, the sample you provided definitely has insufficient resolution. You may only expect some part of the heading to be recognized, and that is what happened when I ran the recognition on your image. But I didn't get any error or warning messages with my "por.traineddata" at all! However al

Re: how to get the character in an image file which is in table format.

2011-03-13 Thread Dmitry Silaev
The first step in this technique is to threshold the image using a manually selected threshold value. Within the text of the article this step got only a line of code (pix1 = pixThresholdToBinary(pixs, 150)), and not a single word of explanation. However, the fact that such a convenient threshold luckily exis
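To make the thresholding step concrete, here is a minimal pure-Python sketch of fixed-value binarization. It is not the Leptonica implementation; it assumes the convention that pixels darker than the threshold become foreground (1), and the function name `threshold_to_binary` is hypothetical.

```python
def threshold_to_binary(gray, thresh=150):
    """Binarize a grey-scale image with a fixed threshold.

    gray: list of rows, each a list of ints in 0..255.
    Returns rows of 0/1, where 1 marks pixels darker than thresh.
    """
    return [[1 if px < thresh else 0 for px in row] for row in gray]


if __name__ == "__main__":
    # Dark text (value 40) on a light background (value 220).
    clean = [[220, 40, 220], [220, 40, 220]]
    # The same strokes on a dim scan: the background drops below 150.
    dim = [[140, 30, 140], [140, 30, 140]]
    print(threshold_to_binary(clean))
    print(threshold_to_binary(dim))
```

On the dim scan every pixel falls below 150, so the whole image turns into foreground and the strokes are lost. That is the weakness of a hand-picked value: it only works when the scans happen to share similar brightness.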