Re: Quality of OCR

Sriranga(78yrsold) Thu, 08 Sep 2011 01:03:56 -0700

On Thu, Sep 8, 2011 at 1:24 PM, Sriranga(78yrsold)
<[email protected]> wrote:


> Hi Dmitri,
> Thanks for the encouragement to pursue OCR. I am extremely grateful to you
> for all the valuable guidance you have given me from time to time; I will
> never forget your great and noble help.
>
> As suggested, I downloaded Tim's image again from Tim's link, saved it,
> and uploaded it to your cloud demo. The results are reproduced below:
> 1) For ocrbook-1.tif (unedited original):
> *s On the Insert tab, the galleries include items that are designed to coordinate with the overall look of
> your document. You can use these galleries to insert tables, headers, footers, lists, cover pages, and
>
> other document building blocks. When you create pictures, charts, or diagrams, they also coordinate
> = g with your current document look.*
> I tried this twice, but the output was the same.
> 2) For ocrbook-2.tif (edited in Paintbrush: I removed the speckles using
> Paintbrush's own magnifier, then tested it in your demo), the output was
> correct. It is reproduced below:
> *
> On the Insert tab, the galleries include items that are designed to coordinate with the overall look of
> your document. You can use these galleries to insert tables, headers, footers, lists, cover pages, and
>
> other document building blocks. When you create pictures, charts, or diagrams, they also coordinate
> with your current document look.
> *
>
> 3) For testing.tif (this is untitled.tif, which I forwarded to you earlier,
> renamed), uploading it to your demo displayed this error:
> ERROR: "Illegal parameter values: 'name' cannot be blank."
> Where did I make a mistake?
> For your experiments, is the traineddata file sufficient, or do you also
> need all the generated data files, such as unicharset?
> I tested with the kannada.tif file and the output was in English. I think
> this shows that your demo can support any language, depending on which
> <lang>.traineddata files are installed in the cloud.
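Sriranga's point is that the recognition language depends on which <lang>.traineddata files are installed, and the command lines later in the thread select one with `-l eng`. A minimal Python sketch of checking a tessdata directory before building that command line; the function names and the directory argument are hypothetical, and it assumes the Kannada data is named kan.traineddata (the file name Sriranga actually uses may differ):

```python
import os


def installed_languages(tessdata_dir):
    """Return the codes <lang> for which a <lang>.traineddata file exists."""
    suffix = ".traineddata"
    return sorted(name[:-len(suffix)] for name in os.listdir(tessdata_dir)
                  if name.endswith(suffix))


def tesseract_command(image, output_base, lang, tessdata_dir):
    """Build a `tesseract <image> <outputbase> -l <lang>` argument list.

    Raises ValueError early if <lang>.traineddata is not in tessdata_dir
    (hypothetical helper; it does not run Tesseract itself).
    """
    available = installed_languages(tessdata_dir)
    if lang not in available:
        raise ValueError(f"{lang}.traineddata not found in {tessdata_dir}; "
                         f"installed: {', '.join(available) or 'none'}")
    return ["tesseract", image, output_base, "-l", lang]
```

The returned list can be handed to `subprocess.run`; Tesseract then writes the recognized text to `<output_base>.txt`. Only the .traineddata files are counted here, so loose component files such as unicharset are ignored.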
> With Warmest Regards,
> -sriranga(78yrs)
>
>
>
>
> On Thu, Sep 8, 2011 at 2:56 AM, Dmitri Silaev <[email protected]> wrote:
>
>> Hi Sriranga!
>>
>> Glad you are now OK. I must express my respect and admiration for your
>> efforts in the OCR field despite all these troubles with your health.
>>
>> You are right, the result for *your image* with the CustomOCR Tesseract
>> demo is exactly as you've attached. But *your image* is not the same as
>> the image *Tim had sent*: Tim's is much smaller, with just as much
>> background as needed around the text, while yours has a huge amount of
>> whitespace below and to the right of the text. Shame on Tesseract, but
>> this degrades recognition accuracy considerably.
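Dmitri's explanation is that the large blank area below and to the right of the text hurts recognition. A minimal sketch of trimming that whitespace before running OCR, assuming the page is already loaded as rows of 8-bit gray values (0 = black ink, 255 = white paper); the function name and the `threshold` and `margin` defaults are illustrative assumptions, not settings of Tesseract or the CustomOCR demo:

```python
def crop_to_content(pixels, threshold=128, margin=10):
    """Crop a grayscale page to the bounding box of its ink, plus a margin.

    pixels: list of equal-length rows of ints 0-255 (0 = ink, 255 = paper).
    A pixel darker than `threshold` counts as ink. Up to `margin` pixels of
    the original page are kept on each side of the ink bounding box.
    """
    ink_rows = [y for y, row in enumerate(pixels)
                if any(v < threshold for v in row)]
    if not ink_rows:
        return [row[:] for row in pixels]  # blank page: nothing to crop
    ink_cols = [x for x in range(len(pixels[0]))
                if any(row[x] < threshold for row in pixels)]
    top = max(ink_rows[0] - margin, 0)
    bottom = min(ink_rows[-1] + margin, len(pixels) - 1)
    left = max(ink_cols[0] - margin, 0)
    right = min(ink_cols[-1] + margin, len(pixels[0]) - 1)
    return [row[left:right + 1] for row in pixels[top:bottom + 1]]
```

Keeping a margin matches Dmitri's description of Tim's image, which still has "as much background as needed around the text". Stray speckles far from the text enlarge the box, so this works best on an already cleaned page.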
>>
>> Here is a hint on how to obtain Tim's image the right way. Click Tim's
>> link, then choose File\Download Original from the menu and save the file
>> onto your local hard drive. After that, select that file in the Image
>> file field of the CustomOCR Standard Tesseract OCR demo and run
>> processing.
>>
>> Once you test the demo with Tim's image, you will get a perfect, crisp
>> and clear result; check this yourself.
>>
>> And lastly: absolutely no objections to making Kannada recognition
>> available as a CustomOCR demo. As I see it now, this should be a separate
>> demo. I'll be glad to make it for the community, and I'm waiting for you
>> to kindly send me your latest traineddata components as well as the
>> compiled traineddata file.
>>
>> Warm regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>>
>>
>>
>>
>> On Wed, Sep 7, 2011 at 6:58 AM, Sriranga(78yrsold)
>> <[email protected]> wrote:
>> > Hi Dmitri,
>> > I had laser treatment for my blurred vision and am now OK. I tested your
>> > demo; the output is attached below:
>> > On ﬁle Insert tab, the gallzries xnclude items that are dcslgled to onnrdinab: with the ova:-all look of
>> > your dncumem. You can use than galleries w insert mum, heudms, footers, um, cover pig»-5, and
>> > other ducumcnt budding blucls. wh=- you mm piclures, mm, at diagrams, they also ccordinlle
>> > with yuur wmm document lnnk.
>> > I am using r527 on WinXP.
>> > The command lines I used are as follows:
>> > M:\>tesseract untitled.TIF testtif
>> > Tesseract Open Source OCR Engine with Leptonica
>> > Number of found pages: 1.
>> > M:\>
>> > M:\>tesseract untitled.TIF 2testtif -l eng
>> > Tesseract Open Source OCR Engine with Leptonica
>> > Number of found pages: 1.
>> >
>> > M:\>
>> > Submitted for your perusal. I find no difference between the demo
>> > output and the command-line output. Where did I make a mistake?
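Sriranga compares the demo output with the command-line output by eye. A small sketch of making that comparison mechanical with Python's standard difflib module, assuming both outputs are saved as UTF-8 text files (for example the testtif.txt that `tesseract untitled.TIF testtif` writes); the function name and the "demo"/"cmdline" labels are hypothetical:

```python
import difflib


def compare_ocr_outputs(text_a, text_b):
    """Compare two OCR outputs.

    Returns (ratio, changed_lines): ratio is difflib's character-level
    similarity in [0, 1]; changed_lines lists lines only in text_a
    (prefixed "-") or only in text_b (prefixed "+").
    """
    ratio = difflib.SequenceMatcher(None, text_a, text_b).ratio()
    diff = difflib.unified_diff(text_a.splitlines(), text_b.splitlines(),
                                "demo", "cmdline", lineterm="")
    # Keep only changed lines, dropping the "---"/"+++" file headers.
    changed = [line for line in diff
               if line.startswith(("+", "-"))
               and not line.startswith(("+++", "---"))]
    return ratio, changed
```

Usage: read each file with `open(path, encoding="utf-8").read()` and pass the two strings in. A ratio of 1.0 with an empty change list means the two outputs are identical.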
>> > Could you kindly let me know whether your demo can be tested with Kannada?
>> > With regards,
>> > -sriranga(78yrs)
>> >
>> >
>> >
>> > On Fri, Sep 2, 2011 at 8:19 AM, Sriranga(78yrsold) <[email protected]>
>> > wrote:
>> >>
>> >> Hi Dmitri,
>> >> I am still using r527 on WinXP. I am suffering from blurred vision.
>> >> With warm regards,
>> >> -sriranga(78)
>> >>
>> >> On Thu, Sep 1, 2011 at 8:22 PM, Dmitri Silaev <[email protected]>
>> >> wrote:
>> >>>
>> >>> I don't know which Tesseract version you have, but here you can see
>> >>> that with rev. 580 the result is perfect:
>> >>> http://www.customocr.com/index.php?r=site/page&view=demos.tesseract_ocr
>> >>> The extra characters in the first and last lines are due to some
>> >>> speckle noise to the left of these lines.
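Dmitri attributes the extra characters to speckle noise, which Sriranga removed by hand in Paintbrush in his later test. A minimal sketch of doing that automatically, assuming an already binarized page stored as rows of 0/1 values (1 = ink): connected blobs smaller than `min_size` pixels are erased. The function name and the default `min_size` are hypothetical; a threshold that is too large will also erase periods and the dots on i and j, so it has to be tuned to the scan resolution.

```python
from collections import deque


def remove_speckles(bitmap, min_size=4):
    """Erase 8-connected ink blobs with fewer than `min_size` pixels.

    bitmap: list of equal-length rows of 0/1 values (1 = ink).
    Returns a new bitmap; the input is left unchanged.
    """
    h, w = len(bitmap), len(bitmap[0])
    out = [row[:] for row in bitmap]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if bitmap[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Breadth-first flood fill collects one connected component.
            component = []
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and bitmap[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) < min_size:
                for y, x in component:
                    out[y][x] = 0  # too small to be part of a letter
    return out
```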
>> >>>
>> >>> Warm regards,
>> >>> Dmitri Silaev
>> >>> www.CustomOCR.com
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Thu, Sep 1, 2011 at 2:36 PM, Tim Alexander <[email protected]>
>> >>> wrote:
>> >>> > Apologies. I have put a portion of the TIF file I ran Tesseract on
>> >>> > into Google Docs:
>> >>> >
>> >>> > https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B-BfHrAa9J5kZDEzNWRmODItZGFiZi00Y2NkLWI2N2MtZjA5MDg1OTEzYjky&hl=en_US
>> >>> >
>> >>> > Regards
>> >>> >
>> >>> > Tim
>> >>> >
>> >>> > On Aug 31, 8:08 pm, Dmitri Silaev <[email protected]> wrote:
>> >>> >> There is no way to answer your questions without a sample image.
>> >>> >> Please provide one.
>> >>> >>
>> >>> >> Warm regards,
>> >>> >> Dmitri Silaev
>> >>> >> www.CustomOCR.com
>> >>> >>
>> >>> >> On Wed, Aug 31, 2011 at 3:43 PM, Tim Alexander
>> >>> >> <[email protected]> wrote:
>> >>> >> > I seem to have Tesseract set up and scripted OK, running on Ubuntu
>> >>> >> > 11.04. However, I am finding my OCR accuracy to be fairly low. At
>> >>> >> > first I thought it was the scanned documents I was using, but I
>> >>> >> > recently ran my script against a printed and scanned Word document
>> >>> >> > using Times New Roman, with the output from MS Word's random
>> >>> >> > paragraph function.
>> >>> >>
>> >>> >> > I was under the impression that the English training data
>> >>> >> > downloadable from the site included Times New Roman as one of the
>> >>> >> > pre-trained fonts; is that right? Either way, my results look like
>> >>> >> > this:
>> >>> >>
>> >>> >> > "On the Insertt ab, the galleriesi nclude itemst hat are designedto coordinatew ith the overall look of
>> >>> >> > yourd ocumenYt. ou canu set heseg alleriesto insertt ablesh, eadersfo, otersl,i sts,c overp agesa, nd
>> >>> >> > other document building blocks. When you create pictures, charts, or diagrams, they also coordinate
>> >>> >> > with your current document look."
>> >>> >>
>> >>> >> > As you can see, there are several places where the boundary
>> >>> >> > between two words is jumbled. Is this a case of having to train
>> >>> >> > Tesseract, or is it more down to the scan quality or preprocessing
>> >>> >> > (or lack of it)?
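Tim asks whether preprocessing matters. One widely used preprocessing step is global binarization with Otsu's method, which picks the gray level that maximizes the between-class variance of the image histogram. The sketch below is a swapped-in illustration in plain Python, assuming an 8-bit grayscale page already loaded as rows of 0-255 values; nothing in the thread says Tim used it, and the function names are hypothetical.

```python
def otsu_threshold(gray_values):
    """Return the Otsu threshold for a flat sequence of 8-bit gray values."""
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0 = 0    # number of pixels with level <= t (the dark class)
    sum0 = 0  # sum of their gray levels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no dark pixels yet
        if w1 == 0:
            break     # every pixel is already in the dark class
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(gray_rows, t):
    """Map each pixel to 1 (ink) if its level is <= t, else 0 (paper)."""
    return [[1 if v <= t else 0 for v in row] for row in gray_rows]
```

Usage: flatten the page with `[v for row in page for v in row]`, compute the threshold, then call `binarize(page, t)`. A single global threshold suits evenly lit scans like a printed Word page; unevenly lit photos usually need a local (adaptive) method instead.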
>> >>> >>
>> >>> >> > Any help or input greatly appreciated.
>> >>> >>
>> >>> >> > Regards
>> >>> >>
>> >>> >> > Tim
>> >>> >>
>> >>> >> > --
>> >>> >> > You received this message because you are subscribed to the Google
>> >>> >> > Groups "tesseract-ocr" group.
>> >>> >> > To post to this group, send email to [email protected]
>> >>> >> > To unsubscribe from this group, send email to
>> >>> >> > [email protected]
>> >>> >> > For more options, visit this group at
>> >>> >> > http://groups.google.com/group/tesseract-ocr?hl=en
>> >>>
>> >>
>> >
>>
>
>
