Quan/Shree,
Do you know of a tool that would leave only the text on the image? In other
words, a way to preprocess the image for Tesseract?
Thanks
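One common answer to this question is to preprocess the image before OCR. Below is a minimal sketch of one such step, assuming only the JDK (java.awt.image and javax.imageio, no tess4j calls). It converts the image to grayscale and binarizes it with Otsu's global threshold, so the text ends up black on white. The class name `Binarize` is hypothetical, and the sketch assumes dark text on a light background. Binarization alone does not remove logos or buttons. It only simplifies the image that Tesseract sees.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper: grayscale conversion plus Otsu binarization, JDK only. */
public class Binarize {

    /** Luminance (0-255) of a packed RGB pixel, using ITU-R BT.601 weights. */
    static int luminance(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
    }

    /** Otsu's method: returns the threshold t that maximizes between-class variance. */
    static int otsuThreshold(int[] histogram, int total) {
        long sumAll = 0;
        for (int i = 0; i < histogram.length; i++) {
            sumAll += (long) i * histogram[i];
        }
        long sumBack = 0;
        int weightBack = 0;
        double bestVariance = -1;
        int best = 0;
        for (int t = 0; t < histogram.length; t++) {
            weightBack += histogram[t];          // pixels with value <= t
            if (weightBack == 0) continue;
            int weightFore = total - weightBack; // pixels with value > t
            if (weightFore == 0) break;
            sumBack += (long) t * histogram[t];
            double meanBack = (double) sumBack / weightBack;
            double meanFore = (double) (sumAll - sumBack) / weightFore;
            double diff = meanBack - meanFore;
            double variance = (double) weightBack * weightFore * diff * diff;
            if (variance > bestVariance) {
                bestVariance = variance;
                best = t;
            }
        }
        return best;
    }

    /** Returns a 1-bit copy: pixels at or below the threshold become black, others white. */
    public static BufferedImage binarize(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        int[] gray = new int[w * h];
        int[] histogram = new int[256];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int v = luminance(src.getRGB(x, y));
                gray[y * w + x] = v;
                histogram[v]++;
            }
        }
        int threshold = otsuThreshold(histogram, w * h);
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Assumes dark text on a light background; flip the comparison otherwise.
                int argb = gray[y * w + x] <= threshold ? 0xFF000000 : 0xFFFFFFFF;
                out.setRGB(x, y, argb);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Binarize input.png output.png");
            return;
        }
        BufferedImage src = ImageIO.read(new File(args[0]));
        if (src == null) {
            throw new IOException("Unsupported or unreadable image: " + args[0]);
        }
        ImageIO.write(binarize(src), "png", new File(args[1]));
    }
}
```

A global threshold is the simplest choice. Images with uneven lighting may need a local (adaptive) threshold instead.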
On Tuesday, November 11, 2014 3:41:21 PM UTC-5, Quan Nguyen wrote:
>
> The buttons, ports, signs, symbols, and logos (those non-text elements) all
> help confuse Tesseract.
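Quan's point can be made concrete with a simple heuristic. It does not come from this thread and is not part of Tesseract. After binarizing, find connected blobs of dark pixels and erase any blob whose size does not look like a character, such as a large logo or button outline, or a tiny speck. Below is a minimal sketch in plain Java. The class name `BlobFilter` and the size limits are hypothetical and would need tuning for each image. Text that touches a graphic forms a single blob with it, so it would be removed along with the graphic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: keeps only connected ink blobs whose size looks like a character. */
public class BlobFilter {

    /**
     * ink[y][x] is true for dark (foreground) pixels, e.g. the output of a binarization step.
     * Blobs wider than maxWidth, taller than maxHeight, or smaller than minArea pixels are erased.
     */
    public static boolean[][] keepTextSizedBlobs(boolean[][] ink, int maxWidth, int maxHeight, int minArea) {
        int h = ink.length;
        int w = h == 0 ? 0 : ink[0].length;
        boolean[][] visited = new boolean[h][w];
        boolean[][] kept = new boolean[h][w];
        int[] dx = {1, -1, 0, 0};
        int[] dy = {0, 0, 1, -1};
        for (int sy = 0; sy < h; sy++) {
            for (int sx = 0; sx < w; sx++) {
                if (!ink[sy][sx] || visited[sy][sx]) continue;
                // Breadth-first flood fill collects one 4-connected blob and its bounding box.
                List<int[]> blob = new ArrayList<>();
                ArrayDeque<int[]> queue = new ArrayDeque<>();
                queue.add(new int[] {sx, sy});
                visited[sy][sx] = true;
                int minX = sx, maxX = sx, minY = sy, maxY = sy;
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    blob.add(p);
                    minX = Math.min(minX, p[0]);
                    maxX = Math.max(maxX, p[0]);
                    minY = Math.min(minY, p[1]);
                    maxY = Math.max(maxY, p[1]);
                    for (int k = 0; k < 4; k++) {
                        int nx = p[0] + dx[k];
                        int ny = p[1] + dy[k];
                        if (nx >= 0 && ny >= 0 && nx < w && ny < h && ink[ny][nx] && !visited[ny][nx]) {
                            visited[ny][nx] = true;
                            queue.add(new int[] {nx, ny});
                        }
                    }
                }
                int blobWidth = maxX - minX + 1;
                int blobHeight = maxY - minY + 1;
                boolean textSized = blobWidth <= maxWidth && blobHeight <= maxHeight && blob.size() >= minArea;
                if (textSized) {
                    for (int[] p : blob) kept[p[1]][p[0]] = true;
                }
            }
        }
        return kept;
    }
}
```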
>
> On Tuesday, November 11, 2014 2:04:35 PM UTC-6, newbie wrote:
>>
>> Quan,
>> Can you elaborate on the image-processing problem? What do you mean by
>> non-text objects? I attached the image to my reply to Shree earlier in
>> this thread.
>> Thanks
>>
>>
>> On Tuesday, November 11, 2014 2:17:30 PM UTC-5, Quan Nguyen wrote:
>>>
>>> Looks like you have an image-processing problem, not a training problem.
>>> There are many non-text objects in your image that any OCR engine would
>>> have problems with. If you eliminate them, you'll get better results.
>>>
>>> On Tuesday, November 11, 2014 9:30:24 AM UTC-6, newbie wrote:
>>>>
>>>> Shree,
>>>> Thanks for taking the time to respond. I think I am lost at the first
>>>> step. I have an image (ArrisVIP500.png, attached as a sample) from which
>>>> I need to extract the text, and I need to train the Tesseract/tess4j
>>>> engine to pick up the text from this image.
>>>>
>>>> But the Tiff/Box Generator expects a text file. So I started with a
>>>> Notepad file (vip2500.txt, also attached) containing the text in the
>>>> image, in a similar font (WhatTheFont suggested a sans-serif font; I
>>>> don't know if that is right). When I load the txt file into the Tiff/Box
>>>> Generator, I don't see the generate button for creating the .tif and box
>>>> files.
>>>>
>>>> Any help is appreciated.
>>>>
>>>> On Tuesday, November 11, 2014 3:16:03 AM UTC-5, shree wrote:
>>>>>
>>>>> jTessBoxEditor has three tabs:
>>>>>
>>>>> Use *Tiff/Box Generator* to generate TIFF and box files from a given
>>>>> text file for the chosen font.
>>>>>
>>>>> The box files created by the Tiff/Box Generator are based on the
>>>>> rendering of the text in the chosen font and will be accurate; however,
>>>>> you may still get 'blob not found' errors during training.
>>>>>
>>>>> Use *Trainer* in *Make Box File* mode to generate box files from an
>>>>> image using the chosen language's traineddata.
>>>>>
>>>>> Please note that the box files created by Tesseract under Trainer will
>>>>> only be as good as Tesseract's recognition with the traineddata being
>>>>> used, and may require a lot of modification.
>>>>>
>>>>> Use *Box Editor* to edit the box files (if needed).
>>>>>
>>>>> Use *Trainer* in *Train with existing Box* mode to use box/TIFF pairs
>>>>> that you may have.
>>>>>
>>>>> If you want to train using jTessBoxEditor, you need to create the other
>>>>> files required for training (see /samples/vie for the Vietnamese files);
>>>>> you may be able to use some of the files from Tesseract's langdata repo
>>>>> as a start.
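For reference, each line of a Tesseract box file describes one character as `symbol left bottom right top page`, with pixel coordinates measured from the bottom-left corner of the image. Below is a minimal parsing sketch in plain Java. The `BoxFile` and `Box` names are hypothetical and are not jTessBoxEditor's API. The sketch also converts a box to java.awt's top-left-origin `Rectangle`. Treating `right` and `top` as exclusive edges when computing width and height is an assumption.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for reading Tesseract box-file lines (not jTessBoxEditor's API). */
public class BoxFile {

    /** One box-file line: a symbol, its bounding box (bottom-left origin), and a page index. */
    public static final class Box {
        final String symbol;
        final int left, bottom, right, top, page;

        Box(String symbol, int left, int bottom, int right, int top, int page) {
            this.symbol = symbol;
            this.left = left;
            this.bottom = bottom;
            this.right = right;
            this.top = top;
            this.page = page;
        }

        /**
         * Converts to java.awt's top-left-origin coordinates for an image of the given height.
         * Treating right/top as exclusive edges is an assumption of this sketch.
         */
        Rectangle toAwt(int imageHeight) {
            return new Rectangle(left, imageHeight - top, right - left, top - bottom);
        }
    }

    /** Parses "symbol left bottom right top page" lines, skipping blank ones. */
    public static List<Box> parse(List<String> lines) {
        List<Box> boxes = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] f = trimmed.split("\\s+");
            if (f.length != 6) {
                throw new IllegalArgumentException("Expected 6 fields, got: " + line);
            }
            boxes.add(new Box(f[0], Integer.parseInt(f[1]), Integer.parseInt(f[2]),
                    Integer.parseInt(f[3]), Integer.parseInt(f[4]), Integer.parseInt(f[5])));
        }
        return boxes;
    }
}
```

The lines themselves could be read with `java.nio.file.Files.readAllLines(path, StandardCharsets.UTF_8)`, because box files are UTF-8 text.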
>>>>>
>>>>> ShreeDevi
>>>>>
>>>>> On Mon, Nov 10, 2014 at 11:58 PM, newbie <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I have installed jTessBoxEditor to train my images for tess4j, but I
>>>>>> am unable to open the file (PNG, TIFF) in the box editor. The tutorial
>>>>>> says to use TIFF/box files as input to the editor, but when it browses
>>>>>> for files it seems to be looking for text files. I have an original PNG
>>>>>> file, which I converted into TIFF. I also tried converting the PNG to
>>>>>> 8bpp grayscale, but in vain. I still cannot see the image file in
>>>>>> jTessBoxEditor. Any help is appreciated.
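The message above describes converting a PNG to an 8-bit grayscale TIFF. Below is a minimal sketch of that conversion step, assuming only the JDK's javax.imageio. The built-in TIFF writer ships with Java 9 and later, and `ImageIO.write` returns false when no writer is available. The class name `ToGrayTiff` is hypothetical. This sketch only performs the conversion and does not address how jTessBoxEditor locates its input files.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper: converts an image to 8-bit grayscale and saves it as TIFF. */
public class ToGrayTiff {

    /** Draws the source into a TYPE_BYTE_GRAY image (8 bits per pixel). */
    public static BufferedImage toGray(BufferedImage src) {
        BufferedImage gray = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return gray;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ToGrayTiff input.png output.tif");
            return;
        }
        BufferedImage src = ImageIO.read(new File(args[0]));
        if (src == null) {
            throw new IOException("Unsupported or unreadable image: " + args[0]);
        }
        // ImageIO.write returns false if this JRE has no TIFF writer (built in from Java 9).
        if (!ImageIO.write(toGray(src), "tiff", new File(args[1]))) {
            throw new IOException("No TIFF writer available in this JRE");
        }
    }
}
```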
>>>>>>
>>>>>> On Wednesday, September 25, 2013 10:02:13 PM UTC-4, Quan Nguyen wrote:
>>>>>>>
>>>>>>> jTessBoxEditor is a Java box editor for Tesseract OCR data. It can
>>>>>>> read images of common image formats, including multi-page TIFF. The
>>>>>>> program requires JRE 6.0 or later.
>>>>>>>
>>>>>>> Version 1.0 Beta integrates support for full automation of Tesseract
>>>>>>> training. Please post your comments/feedback here. Thank you.
>>>>>>>
>>>>>>> http://vietocr.sourceforge.net/training.html
>>>>>>> http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/
>>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To post to this group, send email to [email protected].
>>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/63554b1e-5e5d-48a5-b751-220ccd006cde%40googlegroups.com
>>>>>>
>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/63554b1e-5e5d-48a5-b751-220ccd006cde%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>>>>