Your results match what I wrote earlier. However, I don't know what caused the error in your third experiment; it was probably a browser glitch or something similar. In any case, you should be able to send an image for processing right after opening the demo's webpage. Please try again.
As for Kannada, I think the latest traineddata file is sufficient for the demo at this point. Please send it to me.

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Thu, Sep 8, 2011 at 12:01 PM, Sriranga(78yrsold) <[email protected]> wrote:
>
> On Thu, Sep 8, 2011 at 1:24 PM, Sriranga(78yrsold) <[email protected]>
> wrote:
>>
>> Hi Dmitri,
>> Thanks for the encouragement to pursue the OCR. I am really extremely
>> grateful to you for all valuable guidance rendered to me from time to time-
>> which I cannot forget your great noble help.
>>
>> As suggested I download again OCR.tim from the Tim website and saved
>> downloaded and uploaded in your cloud demo. The result is reproduced below:
>> 1)for ocrbook-1.tif (unedited - original)=
>> s On the Insert tab, the galleries include items that are designed to
>> coordinate with the overall look of
>> your document. You can use these galleries to insert tables, headers,
>> footers, lists, cover pages, and
>> other document building blocks. When you create pictures, charts, or
>> diagrams, they also coordinate
>> = g with your current document look.
>> two times experimented but output is same.
>> 2)ocrbook-2.tif(edited in paintbrush - removed speckles with help of
>> magnifier in the paintbrush itself. and tested in your demo) output was
>> correct. reproduced below =
>>
>> On the Insert tab, the galleries include items that are designed to
>> coordinate with the overall look of
>> your document. You can use these galleries to insert tables, headers,
>> footers, lists, cover pages, and
>> other document building blocks. When you create pictures, charts, or
>> diagrams, they also coordinate
>> with your current document look.
>>
>> 3) testing.tif(this is renamed file for unittled.tif forwarded to you
>> earlier)
>> when uploaded in your Demo : error displayed as
>> ERROR: "Illegal parameter values: 'name' cannot be blank."
>> Where I made mistake?
>> For your experiment purpose, whether traineddata file is sufficient or all
>> generated data files like unicharset, etc. are required for your testing?
>> I tested using kannada.tif file, output was in English - this proves your
>> demo is supported for all langs - depends on <Lang>traineddata files are
>> installed in the cloud - I think.
>> With Warmest Regards,
>> -sriranga(78yrs)
>>
>> On Thu, Sep 8, 2011 at 2:56 AM, Dmitri Silaev <[email protected]>
>> wrote:
>>>
>>> Hi Sriranga!
>>>
>>> Glad you are now OK. I must express my respect and admiration on your
>>> efforts in the OCR field while having all these troubles with your
>>> health.
>>>
>>> You are right, the result for *your image* with CustomOCR Tesseract
>>> demo is exactly like you've attached. But *your image* is not the same
>>> as the image *Tim had sent*: the Tim's is much smaller, having as much
>>> background as needed around the text, while yours is having huge
>>> whitespace to the bottom right to the text. Shame to Tesseract but
>>> this degrades recognition accuracy much.
>>>
>>> A hint on how to obtain Tim's image the right way. Click Tim's link,
>>> then in the menu choose File\Download Original. Then save the file
>>> onto your local hard drive. After that indicate that file in the Image
>>> file field of the CustomOCR Standard Tesseract OCR demo and then run
>>> processing.
>>>
>>> Once you've tested the demo with Tim's image you will get the perfect,
>>> crisp and clear result, check this yourself.
>>>
>>> And the last. Absolutely no objections on making Kannada recognition
>>> in the form of CustomOCR demo. As I see now, this should be a separate
>>> demo. I'll be glad to make this for the community and waiting for you
>>> kindly send me your last traineddata components as well as the
>>> compiled traineddata file.
>>>
>>> Warm regards,
>>> Dmitri Silaev
>>> www.CustomOCR.com
>>>
>>> On Wed, Sep 7, 2011 at 6:58 AM, Sriranga(78yrsold)
>>> <[email protected]> wrote:
>>> > Hi Dmitri,
>>> > I got laser treatment for my blurred vision. Now OK. I tested in your
>>> > demo
>>> > attached output below
>>> > On file Insert tab, the gallzries xnclude items that are dcslgled to
>>> > onnrdinab: with the ova:-all look of
>>> > your dncumem. You can use than galleries w insert mum, heudms, footers,
>>> > um,
>>> > cover pig»-5, and
>>> > other ducumcnt budding blucls. wh=- you mm piclures, mm, at diagrams,
>>> > they
>>> > also ccordinlle
>>> > with yuur wmm document lnnk.
>>> > I am using r-527 winxp
>>> > commandline used as follow:
>>> > M:\>tesseract untitled.TIF testtif
>>> > Tesseract Open Source OCR Engine with Leptonica
>>> > Number of found pages: 1.
>>> > M:\>
>>> > M:\>tesseract untitled.TIF 2testtif -l eng
>>> > Tesseract Open Source OCR Engine with Leptonica
>>> > Number of found pages: 1.
>>> >
>>> > M:\>
>>> > submitted for your perusal. I find no difference between demo and cmd
>>> > output. Where i made a mistake.
>>> > I may kindly be informed whether your demo cannot be tested for Kannada
>>> > ?
>>> > With regards,
>>> > -sriranga(78yrs)
>>> >
>>> > On Fri, Sep 2, 2011 at 8:19 AM, Sriranga(78yrsold)
>>> > <[email protected]>
>>> > wrote:
>>> >>
>>> >> HI dmitri,
>>> >> I am still using r-527 and winxp. I am suffering from blurred vision.
>>> >> With warm regards,
>>> >> -sriranga(78)
>>> >>
>>> >> On Thu, Sep 1, 2011 at 8:22 PM, Dmitri Silaev <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> I don't know your Tesseract's version but here you can witness that
>>> >>> with rev. 580 the result is perfect:
>>> >>>
>>> >>> http://www.customocr.com/index.php?r=site/page&view=demos.tesseract_ocr
>>> >>> The extra chars in the first and last lines are due to some speckle
>>> >>> noise to the left of these lines.
>>> >>>
>>> >>> Warm regards,
>>> >>> Dmitri Silaev
>>> >>> www.CustomOCR.com
>>> >>>
>>> >>> On Thu, Sep 1, 2011 at 2:36 PM, Tim Alexander
>>> >>> <[email protected]>
>>> >>> wrote:
>>> >>> > Apologies. Have google docced a portion of the tif file I ran
>>> >>> > tesseract on:
>>> >>> >
>>> >>> > https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B-BfHrAa9J5kZDEzNWRmODItZGFiZi00Y2NkLWI2N2MtZjA5MDg1OTEzYjky&hl=en_US
>>> >>> >
>>> >>> > Regards
>>> >>> >
>>> >>> > Tim
>>> >>> >
>>> >>> > On Aug 31, 8:08 pm, Dmitri Silaev <[email protected]> wrote:
>>> >>> >> No chance to answer your questions without a sample image. Please
>>> >>> >> provide.
>>> >>> >>
>>> >>> >> Warm regards,
>>> >>> >> Dmitri Silaev
>>> >>> >> www.CustomOCR.com
>>> >>> >>
>>> >>> >> On Wed, Aug 31, 2011 at 3:43 PM, Tim Alexander
>>> >>> >> <[email protected]> wrote:
>>> >>> >> > Seem to have tesseract setup and scripted ok running on Ubuntu
>>> >>> >> > 11.04. However I am finding my accuracy for OCR to be fairly low.
>>> >>> >> > At first I thought it was the scanned documents I was using but I
>>> >>> >> > recently ran my script against a printed and scanned Word document
>>> >>> >> > using Times New Roman with the output from MS Word's random
>>> >>> >> > paragraph function.
>>> >>> >> >
>>> >>> >> > I was under the impression that the english training data that is
>>> >>> >> > downloadable from the site included times new roman as one of the
>>> >>> >> > pre trained fonts? Either way my results look like this:
>>> >>> >> >
>>> >>> >> > "On the Insertt ab, the galleriesi nclude itemst hat are
>>> >>> >> > designedto coordinatew ith the overall look of
>>> >>> >> > yourd ocumenYt. ou canu set heseg alleriesto insertt ablesh,
>>> >>> >> > eadersfo, otersl,i sts,c overp agesa, nd
>>> >>> >> > other document building blocks. When you create pictures,
>>> >>> >> > charts, or diagrams, they also coordinate
>>> >>> >> > with your current document look."
>>> >>> >> >
>>> >>> >> > As you can see there are several words where the delineation
>>> >>> >> > between two words is somewhat jumbled. Is this a case of having to
>>> >>> >> > train tesseract or is it more down to the scan quality or
>>> >>> >> > preprocessing (or lack of)?
>>> >>> >> >
>>> >>> >> > Any help or input greatly appreciated.
>>> >>> >> >
>>> >>> >> > Regards
>>> >>> >> >
>>> >>> >> > Tim
>>> >>> >> >
>>> >>> >> > --
>>> >>> >> > You received this message because you are subscribed to the
>>> >>> >> > Google Groups "tesseract-ocr" group.
>>> >>> >> > To post to this group, send email to [email protected]
>>> >>> >> > To unsubscribe from this group, send email to
>>> >>> >> > [email protected]
>>> >>> >> > For more options, visit this group at
>>> >>> >> > http://groups.google.com/group/tesseract-ocr?hl=en
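
P.S. For reference, the two image problems identified in the quoted messages (speckle noise producing extra characters, and a large empty area around the text lowering accuracy) can both be handled before an image is sent to Tesseract. The following is only a minimal sketch in plain Python, not what the CustomOCR demo or Tesseract does internally. It assumes the page is already loaded as a list of rows of grayscale values (0 = black, 255 = white). The function names, the threshold of 128, the minimum component size and the margin are all illustrative assumptions.

```python
from collections import deque


def remove_speckles(img, threshold=128, min_size=4):
    """Return a copy of img where small dark blobs are painted white.

    A "blob" is an 8-connected group of pixels darker than `threshold`.
    Blobs with fewer than `min_size` pixels are treated as speckle noise.
    """
    height = len(img)
    width = len(img[0]) if height else 0
    out = [row[:] for row in img]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if seen[y][x] or img[y][x] >= threshold:
                continue
            # Breadth-first flood fill to collect one dark component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and not seen[ny][nx]
                                and img[ny][nx] < threshold):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = 255
    return out


def crop_to_text(img, threshold=128, margin=2):
    """Crop img to the bounding box of its dark pixels plus a small margin."""
    height = len(img)
    width = len(img[0]) if height else 0
    dark = [(y, x) for y in range(height) for x in range(width)
            if img[y][x] < threshold]
    if not dark:
        # Nothing dark on the page: return an unchanged copy.
        return [row[:] for row in img]
    ys = [y for y, _ in dark]
    xs = [x for _, x in dark]
    top = max(min(ys) - margin, 0)
    bottom = min(max(ys) + margin, height - 1)
    left = max(min(xs) - margin, 0)
    right = min(max(xs) + margin, width - 1)
    return [row[left:right + 1] for row in img[top:bottom + 1]]
```

Speckles are removed first because a single stray dot far from the text would otherwise stretch the bounding box and keep most of the empty area. In practice the pixel data would come from an image library, and the cleaned image would be saved back to a TIFF before running the same `tesseract untitled.TIF testtif -l eng` command shown in the thread.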

