You don't need to train in order to extract text.

Have you tried the English traineddata? It is available from
https://code.google.com/p/tesseract-ocr/source/browse/?repo=tessdata
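
For illustration only, here is a minimal Python sketch of running the stock
English model from the command line without any training. It assumes the
tesseract executable is on your PATH and that eng.traineddata is already in
its tessdata directory. The helper names build_tesseract_command and
extract_text are hypothetical, not part of Tesseract or tess4j.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", image_path, output_base, "-l", lang]


def extract_text(image_path, output_base="out", lang="eng"):
    """Run Tesseract and return the recognized text.

    Returns None when the tesseract executable is not installed (assumption:
    it must be discoverable on PATH).
    """
    if shutil.which("tesseract") is None:
        return None
    # Tesseract writes its result to <output_base>.txt
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read()
```

Calling extract_text("ArrisVIP500.png") would be the equivalent of running
"tesseract ArrisVIP500.png out -l eng" and reading out.txt. If the result is
poor, cleaning up or enlarging the image is usually worth trying before
training.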


ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Nov 11, 2014 at 9:00 PM, newbie <[email protected]> wrote:

> Shree,
>          Thanks for taking the time to respond. I think I am lost at the
> first step. I have an image (ArrisVIP500.png, attached as a sample) from
> which I need to extract the text. I need to train the
> Tesseract/tess4j engine to pick up the text from this image.
>
> But the Tiff/Box Generator is looking for a text file. So I started out
> with a Notepad file (vip2500.txt, also attached) containing the text in the
> image in the same font type (the closest match I got from WhatTheFont was
> sans-serif; I don't know if that is right). When I load the
> txt file into the Tiff/Box Generator, I don't see the generate button to
> generate the .tif and box files.
>
> Any help is appreciated.
>
> On Tuesday, November 11, 2014 3:16:03 AM UTC-5, shree wrote:
>>
>> JTessBoxEditor has three tabs:
>>
>> Use *Tiff/Box Generator* to generate tiff and box files from a given
>> text file for the chosen font.
>>
>> The box files created by Tiff/Box Generator are based on the rendering of
>> the text in the chosen font and will be accurate; however, training may
>> still report 'blob not found' errors.
>>
>> Use *Trainer* in *Make Box File* mode to generate box files from an
>> image using the chosen language's traineddata.
>>
>> Please note that the box files created by Tesseract under Trainer will
>> only be as good as Tesseract's recognition with the traineddata being
>> used, and may require a lot of modification.
>>
>> Use *Box Editor* to edit the box files (if needed).
>>
>> Use *Trainer* in *Train with Existing Box* mode to train with box/tiff
>> pairs that you may already have.
>>
>> If you want to do training using JTessBoxEditor, you need to create the
>> other files required for training (see /samples/vie for the files for
>> Vietnamese). You may be able to use some of the files from Tesseract's
>> langdata repo as a starting point.
>>
>>
>>
>> ShreeDevi
>>
>> On Mon, Nov 10, 2014 at 11:58 PM, newbie <[email protected]> wrote:
>>
>>> I have installed JTessBoxEditor to train on my images for tess4j, but I am
>>> unable to open the file (png, tiff) in the box editor. The
>>> tutorial says to use tiff/box files as input to the editor, but when it
>>> browses for files it seems to be looking for text files. I have an
>>> original png file, which I converted to tiff. I also tried converting the
>>> png to 8bpp grayscale, but in vain. I still cannot see the image
>>> file in JTessBoxEditor. Any help is appreciated.
>>>
>>>
>>>
>>> On Wednesday, September 25, 2013 10:02:13 PM UTC-4, Quan Nguyen wrote:
>>>>
>>>> jTessBoxEditor is a Java box editor for Tesseract OCR data. It can read
>>>> images of common image formats, including multi-page TIFF. The
>>>> program requires JRE 6.0 or later.
>>>>
>>>> Version 1.0 Beta integrates support for full automation of Tesseract
>>>> training. Please post your comments/feedback here. Thank you.
>>>>
>>>> http://vietocr.sourceforge.net/training.html
>>>> http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/
>>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/63554b1e-5e5d-48a5-b751-220ccd006cde%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
