On Thu, Sep 8, 2011 at 1:24 PM, Sriranga(78yrsold) <[email protected]> wrote:
> Hi Dmitri,
> Thanks for the encouragement to pursue the OCR. I am extremely
> grateful to you for all the valuable guidance you have given me from
> time to time; I cannot forget your great, noble help.
>
> As suggested, I downloaded OCR.tim again from Tim's website, saved it,
> and uploaded it to your cloud demo. The results are reproduced below:
>
> 1) ocrbook-1.tif (unedited, original):
> *s On the Insert tab, the galleries include items that are designed to
> coordinate with the overall look of
> your document. You can use these galleries to insert tables, headers,
> footers, lists, cover pages, and
>
> other document building blocks. When you create pictures, charts, or
> diagrams, they also coordinate
> = g with your current document look.*
> I tried twice, but the output is the same.
>
> 2) ocrbook-2.tif (edited in Paintbrush: I removed the speckles with the
> help of the magnifier in Paintbrush itself, then tested it in your demo).
> The output was correct, reproduced below:
> *
> On the Insert tab, the galleries include items that are designed to
> coordinate with the overall look of
> your document. You can use these galleries to insert tables, headers,
> footers, lists, cover pages, and
>
> other document building blocks. When you create pictures, charts, or
> diagrams, they also coordinate
> with your current document look.
> *
>
> 3) testing.tif (a renamed copy of the untitled.tif forwarded to you
> earlier). When I uploaded it to your demo, this error was displayed:
> ERROR: "Illegal parameter values: 'name' cannot be blank."
> Where did I make a mistake?
>
> For your experiment, is the traineddata file sufficient, or are all the
> generated data files (unicharset, etc.) required for your testing?
> I tested using the kannada.tif file; the output was in English. I think
> this shows that your demo can support all languages, depending on which
> <Lang>.traineddata files are installed in the cloud.
> With Warmest Regards,
> -sriranga(78yrs)
>
> On Thu, Sep 8, 2011 at 2:56 AM, Dmitri Silaev <[email protected]> wrote:
>
>> Hi Sriranga!
>>
>> Glad you are now OK. I must express my respect and admiration for your
>> efforts in the OCR field despite all these troubles with your health.
>>
>> You are right, the result for *your image* with the CustomOCR Tesseract
>> demo is exactly like the one you've attached. But *your image* is not the
>> same as the image *Tim had sent*: Tim's is much smaller, with only as
>> much background as needed around the text, while yours has a huge area
>> of whitespace below and to the right of the text. Shame on Tesseract,
>> but this degrades recognition accuracy considerably.
>>
>> A hint on how to obtain Tim's image the right way. Click Tim's link,
>> then in the menu choose File\Download Original. Then save the file
>> onto your local hard drive. After that, select that file in the Image
>> file field of the CustomOCR Standard Tesseract OCR demo and then run
>> processing.
>>
>> Once you've tested the demo with Tim's image, you will get a perfect,
>> crisp and clear result; check this yourself.
>>
>> And one last thing. Absolutely no objections to making Kannada
>> recognition available as a CustomOCR demo. As I see it now, this should
>> be a separate demo. I'll be glad to make this for the community, and I
>> am waiting for you to kindly send me your latest traineddata components
>> as well as the compiled traineddata file.
>>
>> Warm regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>> On Wed, Sep 7, 2011 at 6:58 AM, Sriranga(78yrsold)
>> <[email protected]> wrote:
>> > Hi Dmitri,
>> > I got laser treatment for my blurred vision. Now OK. I tested in your
>> > demo; the output is attached below:
>> > On file Insert tab, the gallzries xnclude items that are dcslgled to
>> > onnrdinab: with the ova:-all look of
>> > your dncumem. You can use than galleries w insert mum, heudms, footers, um,
>> > cover pig»-5, and
>> > other ducumcnt budding blucls. wh=- you mm piclures, mm, at diagrams, they
>> > also ccordinlle
>> > with yuur wmm document lnnk.
>> > I am using r-527 on WinXP.
>> > The command lines used were as follows:
>> > M:\>tesseract untitled.TIF testtif
>> > Tesseract Open Source OCR Engine with Leptonica
>> > Number of found pages: 1.
>> > M:\>
>> > M:\>tesseract untitled.TIF 2testtif -l eng
>> > Tesseract Open Source OCR Engine with Leptonica
>> > Number of found pages: 1.
>> >
>> > M:\>
>> > Submitted for your perusal. I find no difference between the demo and
>> > command-line output. Where did I make a mistake?
>> > May I kindly be informed whether your demo can be tested for Kannada?
>> > With regards,
>> > -sriranga(78yrs)
>> >
>> > On Fri, Sep 2, 2011 at 8:19 AM, Sriranga(78yrsold) <[email protected]>
>> > wrote:
>> >>
>> >> Hi Dmitri,
>> >> I am still using r-527 and WinXP. I am suffering from blurred vision.
>> >> With warm regards,
>> >> -sriranga(78)
>> >>
>> >> On Thu, Sep 1, 2011 at 8:22 PM, Dmitri Silaev <[email protected]>
>> >> wrote:
>> >>>
>> >>> I don't know your Tesseract version, but here you can see that
>> >>> with rev. 580 the result is perfect:
>> >>> http://www.customocr.com/index.php?r=site/page&view=demos.tesseract_ocr
>> >>> The extra chars in the first and last lines are due to some speckle
>> >>> noise to the left of these lines.
>> >>>
>> >>> Warm regards,
>> >>> Dmitri Silaev
>> >>> www.CustomOCR.com
>> >>>
>> >>> On Thu, Sep 1, 2011 at 2:36 PM, Tim Alexander <[email protected]>
>> >>> wrote:
>> >>> > Apologies. I have put a portion of the TIF file I ran tesseract on
>> >>> > in Google Docs:
>> >>> >
>> >>> > https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B-BfHrAa9J5kZDEzNWRmODItZGFiZi00Y2NkLWI2N2MtZjA5MDg1OTEzYjky&hl=en_US
>> >>> >
>> >>> > Regards
>> >>> >
>> >>> > Tim
>> >>> >
>> >>> > On Aug 31, 8:08 pm, Dmitri Silaev <[email protected]> wrote:
>> >>> >> No chance to answer your questions without a sample image. Please
>> >>> >> provide one.
>> >>> >>
>> >>> >> Warm regards,
>> >>> >> Dmitri Silaev
>> >>> >> www.CustomOCR.com
>> >>> >>
>> >>> >> On Wed, Aug 31, 2011 at 3:43 PM, Tim Alexander
>> >>> >> <[email protected]> wrote:
>> >>> >> > I seem to have tesseract set up and scripted OK, running on Ubuntu
>> >>> >> > 11.04. However, I am finding my OCR accuracy to be fairly low. At
>> >>> >> > first I thought it was the scanned documents I was using, but I
>> >>> >> > recently ran my script against a printed and scanned Word document
>> >>> >> > using Times New Roman, with the output from MS Word's random
>> >>> >> > paragraph function.
>> >>> >> >
>> >>> >> > I was under the impression that the English training data that is
>> >>> >> > downloadable from the site included Times New Roman as one of the
>> >>> >> > pre-trained fonts? Either way, my results look like this:
>> >>> >> >
>> >>> >> > "On the Insertt ab, the galleriesi nclude itemst hat are designedto
>> >>> >> > coordinatew ith the overall look of
>> >>> >> > yourd ocumenYt. ou canu set heseg alleriesto insertt ablesh,
>> >>> >> > eadersfo,
>> >>> >> > otersl,i sts,c overp agesa, nd
>> >>> >> > other document building blocks. When you create pictures, charts, or
>> >>> >> > diagrams, they also coordinate
>> >>> >> > with your current document look."
>> >>> >> >
>> >>> >> > As you can see, there are several words where the delineation
>> >>> >> > between two words is somewhat jumbled. Is this a case of having to
>> >>> >> > train tesseract, or is it more down to the scan quality or
>> >>> >> > preprocessing (or lack of)?
>> >>> >> >
>> >>> >> > Any help or input greatly appreciated.
>> >>> >> >
>> >>> >> > Regards
>> >>> >> >
>> >>> >> > Tim
>> >>> >> >
>> >>> >> > --
>> >>> >> > You received this message because you are subscribed to the Google
>> >>> >> > Groups "tesseract-ocr" group.
>> >>> >> > To post to this group, send email to [email protected]
>> >>> >> > To unsubscribe from this group, send email to
>> >>> >> > [email protected]
>> >>> >> > For more options, visit this group at
>> >>> >> > http://groups.google.com/group/tesseract-ocr?hl=en
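Two causes of degraded output come up in this thread: speckle noise next to the text (blamed for the extra characters on the first and last lines) and a large empty area around the text. The following is a minimal sketch of both clean-up steps in plain Python, assuming the page has already been binarized into a grid (a list of rows, 1 = ink, 0 = background). The function names despeckle and crop_to_ink and the parameters max_size and margin are illustrative assumptions; this is not what Tesseract or the CustomOCR demo does internally.

```python
from collections import deque


def despeckle(img, max_size=2):
    """Return a copy of a binary image with tiny ink blobs erased.

    img is a list of rows; 1 = ink, 0 = background (an assumed format).
    Any 8-connected blob of at most max_size pixels is treated as noise.
    """
    height = len(img)
    width = len(img[0]) if height else 0
    out = [row[:] for row in img]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill to collect every pixel of this connected blob.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny in range(max(cy - 1, 0), min(cy + 2, height)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, width)):
                        if img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(blob) <= max_size:
                for by, bx in blob:
                    out[by][bx] = 0
    return out


def crop_to_ink(img, margin=1):
    """Crop to the bounding box of all ink pixels, plus a small margin."""
    rows = [y for y, row in enumerate(img) if any(row)]
    if not rows:
        return [row[:] for row in img]  # blank page: nothing to crop to
    width = len(img[0])
    cols = [x for x in range(width) if any(row[x] for row in img)]
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + margin, len(img) - 1)
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + margin, width - 1)
    return [row[left:right + 1] for row in img[top:bottom + 1]]


if __name__ == "__main__":
    # Toy page: one speckle at the far left, a small "glyph", empty space.
    page = [
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 0, 0, 1, 1, 1, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    ]
    for row in crop_to_ink(despeckle(page)):
        print("".join("#" if p else "." for p in row))
```

The order matters in this sketch: cropping before despeckling would keep the stray pixel inside the bounding box. In practice the grid would come from an imaging library and be written back to a TIFF before running tesseract, and max_size would need tuning per scan, since too large a value can erase periods and the dots on letters such as "i".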

