Thanks for letting me know. No, I haven't had a chance yet. I will try 4.0, although I have never worked with Tesseract manually. For 3.x I've been using programs that trained and made the box files automatically.
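
For anyone else trying the comparison from the quoted message below without a GUI tool, here is a minimal sketch of timing the two engine modes from Python. It assumes a Tesseract 4.x `tesseract` binary on the PATH and an `eng.traineddata` that contains both the legacy and LSTM models. The helper names `build_tesseract_cmd` and `run_and_time` are made up for illustration; the image name is the one shown in the quoted output.

```python
import os
import shutil
import subprocess
import time


def build_tesseract_cmd(image_path, oem, lang="eng"):
    """Return the argv list for one Tesseract CLI run (hypothetical helper)."""
    # Using "stdout" as the output base makes tesseract print the text
    # instead of writing it to a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang, "--oem", str(oem)]


def run_and_time(image_path, oem):
    """Run tesseract once and return (recognized_text, elapsed_seconds)."""
    start = time.perf_counter()
    result = subprocess.run(
        build_tesseract_cmd(image_path, oem),
        capture_output=True,
        text=True,
        check=True,  # raise if tesseract exits with an error
    )
    return result.stdout.strip(), time.perf_counter() - start


if __name__ == "__main__":
    image = "troublewith98-300.jpg"  # file name taken from the quoted output
    if shutil.which("tesseract") is None or not os.path.exists(image):
        print("tesseract binary or test image not found; nothing to run")
    else:
        # --oem 0 selects the legacy engine, --oem 1 the LSTM engine.
        for oem in (0, 1):
            text, seconds = run_and_time(image, oem)
            print(f"--oem {oem}: {text!r} in {seconds:.2f}s")
```

Wall-clock time measured this way includes process start-up, so it will not match the `real`/`user`/`sys` figures from `time` exactly.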

On Apr 24, 2017 12:43 AM, "ShreeDevi Kumar" <[email protected]> wrote:

> James,
>
> Were you able to get this to work for you with 3.04/3.05?
>
> I get accurate results using Tesseract 4.0 alpha, though it takes longer
> with --oem 1 than --oem 0.
>
> ./troublewith98-300.jpg
> Tesseract Open Source OCR Engine v4.00.00alpha-385-gab41465 with Leptonica
>
> real 0m1.203s
> user 0m0.578s
> sys 0m0.203s
> Tesseract Open Source OCR Engine v4.00.00alpha-385-gab41465 with Leptonica
>
> real 0m4.485s
> user 0m5.125s
> sys 0m0.234s
>
> See attached ..
>
> You can test with
> https://sourceforge.net/projects/vietocr/files/vietocr.net/5.0alpha/
> which uses Tesseract.NET (Tesseract 4.00alpha 362b68e)
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Sun, Apr 23, 2017 at 9:25 AM, ShreeDevi Kumar <[email protected]> wrote:
>
>> Try training using more samples of 8, 9, B etc.
>>
>> What results do you get with the provided eng.traineddata? Are they
>> better or worse?
>>
>> Have you tried changing DPI of image to 300?
>>
>> - excuse the brevity, sent from mobile
>>
>> On 22-Apr-2017 10:29 PM, "James Abney" <[email protected]> wrote:
>>
>>> Oh yes I guess I forgot to include that information, I did train using
>>> only that font and with the same size font. I am on windows 7 and I used
>>> 3.05 to train, although the .net wrapper i use is 3.04. I don't see how it
>>> has difficulty with the 9 and 8, seems very odd.
>>>
>>> On Friday, April 21, 2017 at 11:05:49 PM UTC-4, shree wrote:
>>>>
>>>> Which version of Tesseract. Which o/s?
>>>>
>>>> If all your text is in tungsten-semibold, have you tried training with
>>>> just that font?
>>>>
>>>> - excuse the brevity, sent from mobile
>>>>
>>>> On 22-Apr-2017 12:50 AM, "James Abney" <[email protected]> wrote:
>>>>
>>>> The font is tungsten semibold
>>>>
>>>> On Friday, April 21, 2017 at 2:08:53 PM UTC-4, James Abney wrote:
>>>>>
>>>>> I'm having issues with tesseract dealing with the number 9 and 8
>>>>> especially when they are next to each other. This is really the only issue
>>>>> I have. Even when ocr a tiff file it shows 123456789 as 123456788. I will
>>>>> link an example. Any help is appreciated. The following image is an
>>>>> example where my software using tesseract interprets the 899B8993B as
>>>>> 88888-838.
>>>>>
>>>>> <https://lh3.googleusercontent.com/-HF3RzbqMD6I/WPo8RYC6GaI/AAAAAAAAAJg/phkq6dgtvSE5f3upJQrfowEp1vyW8TQXwCLcB/s1600/troublewith98.png>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CA%2By7dTc3Tou53_O%2Bcys%3DOAZCX9x6%2BzHe4egLx0UXKmQXgTFgcA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

