Re: [tesseract-ocr] Re: jTessBoxEditor - Tesseract box editor & trainer

newbie Tue, 11 Nov 2014 10:54:46 -0800

Shree,
The eng.traineddata that comes with tess4j, which I presume is the one 
from the Google link below, gives me the output below. I should be able 
to read the vip2500 and AT&T U-verse text from the image, which it is 
not doing. Hence I thought I might have to train it.



AT&T U-verse

rowan <3 3
/ --

vxvzsoo ‘Q’
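
For anyone following the box-file discussion quoted below: a Tesseract box 
file is plain text with one glyph per line, in the form 
`<glyph> <left> <bottom> <right> <top> <page>`, with coordinates measured 
from the bottom-left corner of the image. A minimal Python sketch for 
reading such lines might look like this. The names `Box`, `parse_box_line`, 
and `parse_box_text`, and the sample coordinates, are illustrative 
assumptions and not part of jTessBoxEditor or tess4j.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """One line of a Tesseract box file (hypothetical helper type)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Split from the right so the glyph field is whatever precedes
    # the last five integer fields.
    parts = line.rstrip("\n").rsplit(" ", 5)
    if len(parts) != 6:
        raise ValueError(f"not a box line: {line!r}")
    char, *numbers = parts
    left, bottom, right, top, page = (int(n) for n in numbers)
    return Box(char, left, bottom, right, top, page)


def parse_box_text(text: str) -> List[Box]:
    # Skip blank lines; every other line must be a valid box entry.
    return [parse_box_line(l) for l in text.splitlines() if l.strip()]


# Made-up sample data, only to show the layout of a box file.
SAMPLE = "v 10 20 30 52 0\ni 32 20 38 60 0\np 40 8 62 52 0\n"

if __name__ == "__main__":
    for box in parse_box_text(SAMPLE):
        width, height = box.right - box.left, box.top - box.bottom
        print(box.char, width, height)
```

Checking a box file this way before training can help spot glyphs whose 
boxes are empty or inverted, which is one possible source of training 
errors.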

On Tuesday, November 11, 2014 1:11:04 PM UTC-5, shree wrote:
>
> You don't need to train in order to extract text.
>
> Have you tried the English traineddata, available from 
> https://code.google.com/p/tesseract-ocr/source/browse/?repo=tessdata
>
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Tue, Nov 11, 2014 at 9:00 PM, newbie <[email protected]> wrote:
>
>> Shree, 
>> Thanks for taking the time to respond. I think I am lost at the first 
>> step. I have an image (ArrisVIP500.png, attached as a sample) from which 
>> I need to extract the text. I need to train the tesseract/tess4j engine 
>> to pick up the text from this image.
>>
>> But the Tiff/Box Generator is looking for a text file. So I started out 
>> with a Notepad file (vip2500.txt, also attached) containing the text in 
>> the image in a similar font (the closest match I got from WhatTheFont 
>> was sans-serif; I don't know if that is right). When I load the txt file 
>> into the Tiff/Box Generator, I don't see the generate button to generate 
>> the .tif and box files.
>>
>> Any help is appreciated.
>>
>>
>>
>>
>>
>>
>> On Tuesday, November 11, 2014 3:16:03 AM UTC-5, shree wrote:
>>>
>>> JTessBoxEditor has three tabs
>>>
>>> Use *Tiff/Box Generator* to generate tiff and box files from a given 
>>> text file for the chosen font.
>>>
>>> The box files created by Tiff/Box Generator are based on the rendering 
>>> of the text in the chosen font and will be accurate. However, they may 
>>> still produce 'blob not found' errors during training. 
>>>
>>> Use *Trainer* in *Make Box File* mode to generate box files from an 
>>> image using the chosen language's traineddata.
>>>
>>> Please note that the box files created by Tesseract under Trainer will 
>>> only be as good as Tesseract's recognition with the traineddata being 
>>> used, and may require a lot of modification. 
>>>
>>> Use *Box Editor* to edit the box files (if needed).
>>>
>>> Use *Trainer* in *Train with existing Box* mode to use box/tiff pairs 
>>> that you may already have.
>>>
>>> If you want to do training using JTessBoxEditor, you need to create the 
>>> other files required for training (see /samples/vie for the Vietnamese 
>>> files). You may be able to use some of the files from tesseract's 
>>> langdata repo as a start.
>>>
>>>
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Mon, Nov 10, 2014 at 11:58 PM, newbie <[email protected]> wrote:
>>>
>>>> I have installed JTessBoxEditor to train my images for tess4j, but I am 
>>>> unable to open the file (png, tiff) in the box editor. The tutorial says 
>>>> to use tiff/box files as input to the editor, but when it browses for 
>>>> files it seems to be looking for text files. I have an original png 
>>>> file, which I converted into tiff. I also tried converting the png to 
>>>> an 8bpp grayscale image, but in vain. I am still struggling to see the 
>>>> image file in JTessBoxEditor. Any help is appreciated.
>>>>
>>>>
>>>>
>>>> On Wednesday, September 25, 2013 10:02:13 PM UTC-4, Quan Nguyen wrote:
>>>>>
>>>>> jTessBoxEditor is a Java box editor for Tesseract OCR data. It can 
>>>>> read images of common image formats, including multi-page TIFF. The
>>>>> program requires JRE 6.0 or later.
>>>>>
>>>>> Version 1.0 Beta integrates support for full automation of Tesseract 
>>>>> training. Please post your comments/feedback here. Thank you.
>>>>>
>>>>> http://vietocr.sourceforge.net/training.html
>>>>> http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/
>>>>>
>>>>  -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/63554b1e-5e5d-48a5-b751-220ccd006cde%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>
>
>

