The Google link you gave me below does not let me download the file. I just wanted to check whether it is different from the one I have.
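Once both copies are on disk, one way to check whether the tessdata copy of eng.traineddata differs from the one bundled with tess4j is to compare checksums. A minimal sketch using only the JDK's `MessageDigest`; the class name `TraineddataChecksum` is a hypothetical helper, not part of tess4j or Tesseract.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of tess4j or Tesseract): decides whether two
// copies of a file such as eng.traineddata are byte-for-byte identical by
// comparing their SHA-256 digests.
class TraineddataChecksum {

    // Returns the SHA-256 digest of the given bytes as a lowercase hex string.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Reads both files fully (fine for traineddata-sized files) and compares digests.
    static boolean sameContents(Path a, Path b) throws IOException {
        return sha256Hex(Files.readAllBytes(a)).equals(sha256Hex(Files.readAllBytes(b)));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java TraineddataChecksum <file1> <file2>");
            return;
        }
        Path a = Paths.get(args[0]);
        Path b = Paths.get(args[1]);
        String ha = sha256Hex(Files.readAllBytes(a));
        String hb = sha256Hex(Files.readAllBytes(b));
        System.out.println(ha + "  " + a);
        System.out.println(hb + "  " + b);
        System.out.println(ha.equals(hb) ? "identical" : "different");
    }
}
```

For example, `java TraineddataChecksum eng.traineddata tess4j/tessdata/eng.traineddata` prints both digests and whether they match; both paths are placeholders for wherever the two copies actually live.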
On Tuesday, November 11, 2014 1:53:57 PM UTC-5, newbie wrote:
>
> Shree,
> The eng.traineddata that comes with tess4j, which I presume is the one from the Google link below, gives me the output below. I should be able to read "vip2500" and "AT&T U-verse" from the image, which it is not doing. Hence I thought I might have to train it.
>
> AT&T U-verse
>
> rowan <3 3
> / --
>
> vxvzsoo ‘Q’
>
> On Tuesday, November 11, 2014 1:11:04 PM UTC-5, shree wrote:
>>
>> You don't need to train in order to extract text.
>>
>> Have you tried the English traineddata, available from
>> https://code.google.com/p/tesseract-ocr/source/browse/?repo=tessdata
>>
>> ShreeDevi
>> ____________________________________________________________
>> Bhajans, kirtans, aartis (devotional songs) @ http://bhajans.ramparivar.com
>>
>> On Tue, Nov 11, 2014 at 9:00 PM, newbie wrote:
>>
>>> Shree,
>>> Thanks for taking the time to respond. I think I am lost at the first step. I have an image (ArrisVIP500.png, attached as a sample) from which I need to extract the text. I need to train the Tesseract/tess4j engine to pick up the text from this image.
>>>
>>> But the Tiff/Box Generator is looking for a text file. So I started out with a Notepad file (vip2500.txt, also attached) containing the text from the image, in the same font type (the closest font I found for the image on WhatTheFont was sans-serif; I don't know if that is right). When I load the .txt file into the Tiff/Box Generator, I don't see the Generate button to generate the .tif and box files.
>>>
>>> Any help is appreciated.
>>>
>>> On Tuesday, November 11, 2014 3:16:03 AM UTC-5, shree wrote:
>>>>
>>>> jTessBoxEditor has three tabs.
>>>>
>>>> Use *Tiff/Box Generator* to generate tiff and box files from a given text file for the chosen font.
>>>>
>>>> The box files created by the Tiff/Box Generator are based on the rendering of the text in the chosen font and will be accurate; however, they may still produce 'blob not found' errors during training.
>>>>
>>>> Use *Trainer* in *Make Box File* mode to generate box files from an image using the chosen language's traineddata.
>>>>
>>>> Please note that the box files created by Tesseract under Trainer will only be as good as Tesseract's recognition with the traineddata being used, and may require a lot of modification.
>>>>
>>>> Use *Box Editor* to edit the box files (if needed).
>>>>
>>>> Use *Trainer* in *Train with existing Box* mode to use box/tiff pairs that you may have.
>>>>
>>>> If you want to train using jTessBoxEditor, you need to create the other files required for training (see /samples/vie for the Vietnamese files); you may be able to use some of the files from Tesseract's langdata repo as a start.
>>>>
>>>> ShreeDevi
>>>>
>>>> On Mon, Nov 10, 2014 at 11:58 PM, newbie wrote:
>>>>
>>>>> I have installed jTessBoxEditor to train my images for tess4j, but I am unable to open the image file (PNG, TIFF) in the Box Editor. The tutorial says to use tiff/box files as input to the editor, but when it browses for files it seems to be looking for text files. I have an original PNG file, which I converted into TIFF. I also tried converting the PNG to 8bpp grayscale, but in vain. I am still struggling to see the image file in jTessBoxEditor. Any help is appreciated.
>>>>>
>>>>> On Wednesday, September 25, 2013 10:02:13 PM UTC-4, Quan Nguyen wrote:
>>>>>>
>>>>>> jTessBoxEditor is a Java box editor for Tesseract OCR data. It can read images of common image formats, including multi-page TIFF. The program requires JRE 6.0 or later.
>>>>>>
>>>>>> Version 1.0 Beta integrates support for full automation of Tesseract training. Please post your comments/feedback here. Thank you.
>>>>>>
>>>>>> http://vietocr.sourceforge.net/training.html
>>>>>> http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/
>>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/63554b1e-5e5d-48a5-b751-220ccd006cde%40googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
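For reference on the box files discussed in this thread: in Tesseract 3.x a box file is plain text with one glyph per line, `<symbol> <left> <bottom> <right> <top> <page>`, where the coordinates are pixels measured from the bottom-left corner of the page image. Java's image classes put the origin at the top-left, so a tool that draws these boxes over the image has to flip the y axis. The sketch below shows that conversion for a single line; `BoxLine` and its methods are hypothetical names for illustration, not part of jTessBoxEditor or tess4j, and it assumes the symbol field contains no whitespace.

```java
import java.awt.Rectangle;

// Hypothetical parser for one line of a Tesseract 3.x box file:
//   <symbol> <left> <bottom> <right> <top> <page>
// Coordinates are pixels with the origin at the bottom-left of the page image.
// Assumes the symbol itself contains no whitespace.
class BoxLine {
    final String symbol;
    final int left, bottom, right, top, page;

    BoxLine(String symbol, int left, int bottom, int right, int top, int page) {
        this.symbol = symbol;
        this.left = left;
        this.bottom = bottom;
        this.right = right;
        this.top = top;
        this.page = page;
    }

    // Splits on whitespace and expects exactly six fields.
    static BoxLine parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 6) {
            throw new IllegalArgumentException("expected 6 fields but got " + f.length + ": " + line);
        }
        return new BoxLine(f[0],
                Integer.parseInt(f[1]), Integer.parseInt(f[2]),
                Integer.parseInt(f[3]), Integer.parseInt(f[4]),
                Integer.parseInt(f[5]));
    }

    // Flips the box into top-left image coordinates (the java.awt / BufferedImage
    // convention), given the height of the page image in pixels.
    Rectangle toImageRect(int imageHeight) {
        return new Rectangle(left, imageHeight - top, right - left, top - bottom);
    }

    // Serializes back to the same six-field box-file layout.
    String toBoxLine() {
        return symbol + " " + left + " " + bottom + " " + right + " " + top + " " + page;
    }

    public static void main(String[] args) {
        // Example values only; a real line would come from a .box file.
        BoxLine box = BoxLine.parse("V 12 30 25 48 0");
        System.out.println(box.symbol + " -> " + box.toImageRect(100));
    }
}
```

With a 100-pixel-high page, the box `V 12 30 25 48 0` becomes a rectangle whose top edge sits 52 pixels below the top of the image.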
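On the recognition side (the garbled output quoted above for the ArrisVIP500.png sample), Tesseract tends to do better on small screen text once the image has been converted to 8-bit grayscale and enlarged. This is a common preprocessing tip rather than something confirmed in this thread. A minimal sketch using only the JDK's `java.awt.image` and `javax.imageio` classes; the class name `OcrPreprocess` and the default scale factor of 3 are assumptions to experiment with. It writes PNG because the JDK's ImageIO only ships a TIFF writer from Java 9 onward.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical preprocessing step before OCR (not part of tess4j): converts an
// image to 8-bit grayscale and enlarges it by an integer factor.
class OcrPreprocess {

    static BufferedImage toScaledGray(BufferedImage src, int scale) {
        if (scale < 1) {
            throw new IllegalArgumentException("scale must be at least 1");
        }
        int w = src.getWidth() * scale;
        int h = src.getHeight() * scale;
        BufferedImage gray = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            // Start from white so transparent areas of a PNG do not turn black.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, w, h);
            // Bicubic interpolation gives smoother enlarged glyph edges than nearest-neighbour.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            // Drawing into a TYPE_BYTE_GRAY image converts the colours to gray levels.
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return gray;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java OcrPreprocess <input image> <output.png> [scale]");
            return;
        }
        BufferedImage in = ImageIO.read(new File(args[0]));
        if (in == null) {
            throw new IOException("no ImageIO reader for " + args[0]);
        }
        int scale = args.length > 2 ? Integer.parseInt(args[2]) : 3;
        ImageIO.write(toScaledGray(in, scale), "png", new File(args[1]));
    }
}
```

For example, `java OcrPreprocess ArrisVIP500.png ArrisVIP500-gray.png 3` would write a grayscale copy three times larger, which could then be given to Tesseract in place of the original.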

