Thank you.

That seems to have fixed the dropped characters problem.

jt
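
For later readers: the suggestion quoted below (switching the page segmentation mode to PSM_SINGLE_BLOCK) can also be tried from the command line before changing any API code. This is a minimal Python sketch, not the C# code used in this thread. It assumes the `tesseract` executable is on the PATH, that 6 is the numeric value of PSM_SINGLE_BLOCK, and that the flag is spelled `--psm` (older 3.x builds may expect `-psm`). The file name `table.png` and the helper names are hypothetical.

```python
import shutil
import subprocess

# Assumption: 6 is the value of PSM_SINGLE_BLOCK in Tesseract's PageSegMode enum
# ("assume a single uniform block of text").
PSM_SINGLE_BLOCK = 6


def build_tesseract_command(image_path, psm=PSM_SINGLE_BLOCK, psm_flag="--psm"):
    """Return the argv list for a tesseract run that writes text to stdout."""
    return ["tesseract", image_path, "stdout", psm_flag, str(psm)]


def run_ocr(image_path, psm=PSM_SINGLE_BLOCK):
    """Run tesseract if it is installed; return the recognized text, else None."""
    if shutil.which("tesseract") is None:
        return None  # tesseract is not on the PATH
    result = subprocess.run(
        build_tesseract_command(image_path, psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # "table.png" is a placeholder for an image such as the one in this thread.
    print(build_tesseract_command("table.png"))
```

Comparing the output for a few mode values on the same image is a quick way to see whether dropped characters come from page segmentation rather than from the image itself.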


On Wednesday, March 9, 2016 at 10:20:27 AM UTC-8, zdenop wrote:
>
> Use SetPageSegMode and try PSM_SINGLE_BLOCK.
>
> See:
>
> https://github.com/tesseract-ocr/tesseract/wiki/APIExample#orientation-and-script-detection-osd-example
>
> https://github.com/tesseract-ocr/tesseract/blob/master/ccstruct/publictypes.h#L151
>
> Zdenko
>
> On Wed, Mar 9, 2016 at 6:45 PM, 'John Taves' via tesseract-ocr <
> [email protected]> wrote:
>
>> I am using the C# API with whatever page segmentation mode is the default. 
>> Which Tesseract variable[1] should I adjust?
>>
>> [1]http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version
>>
>> jt
>>
>> On Wednesday, March 9, 2016 at 8:44:02 AM UTC-8, zdenop wrote:
>>>
>>> What page segmentation method[1] did you use?
>>>
>>> [1] 
>>> https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#page-segmentation-method
>>>
>>> Zdenko
>>>
>>> On Wed, Mar 9, 2016 at 5:14 PM, 'John Taves' via tesseract-ocr <
>>> [email protected]> wrote:
>>>
>>>> I am trying to recognize a flawless image. I created the image from a 
>>>> PDF that is entirely vector, not raster. It has no noise, no skew, and 
>>>> flawless characters at any DPI that I want.
>>>>
>>>>
>>>> The recognition from Tesseract sucks. Generally the problem is dropped 
>>>> characters. It seems to randomly ignore perfectly good looking characters.
>>>>
>>>>
>>>> The screenshot shows the text results in the upper left and the image 
>>>> in the background (only the upper left of the image is visible). The 
>>>> bounding boxes of the results are shown in red on that image. Notice all 
>>>> the missing characters. On this particular image all the characters to the 
>>>> right of what you can see are found and recognized properly. The image 
>>>> consists of a table of information (rows of item #, size, description, and 
>>>> qty). The columns are not nicely aligned (although this example is pretty 
>>>> good). Some rows are separated by a line (this example has a line for each 
>>>> row, and notice that tesseract gives me a bounding box for some of the 
>>>> lines, but not all). I tried removing the lines, but that just changed the 
>>>> set of dropped characters with no rhyme or reason to it. Other images from 
>>>> this same set are very similar but tesseract will drop characters on the 
>>>> right, or whole lines will be missing. I have tried DPI values from 75 
>>>> to 300, but the results were just as disappointing.
>>>>
>>>>
>>>> Can anyone suggest how this might be solved?
>>>>
>>>>
>>>> <https://lh3.googleusercontent.com/-YwT5YW2wYGo/VuBLmZ-_lSI/AAAAAAAAAZ8/FhfW1gGg_8g/s1600/BadOCR.png>
>>>>
>>>>
>>>> <https://lh3.googleusercontent.com/-ER5AgyxXtY4/VuBLtP6wWvI/AAAAAAAAAaA/1Lxb767Xiqs/s1600/foo700219.png>
>>>>
>>>>
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/8c27aca6-3a45-4c23-97af-676fc6b0b611%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/8c27aca6-3a45-4c23-97af-676fc6b0b611%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>
>
>
