Hi Charles,
Screen captures usually don't OCR well as-is because their resolution is too
low. You'll need to increase the resolution first, for example with ImageMagick.
Tesseract needs each letter to be at least a certain height in pixels (the
documentation has the details), so try upscaling from 95 to 200 dpi; you should
get better results. The ideal range is 200 to 300 dpi. It is much better still
to create the original image at that resolution than to upscale afterwards.
--Sven
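
A rough sketch of the upscaling step, in case it helps. It works out the
resize percentage needed to go from one dpi to another and builds an
ImageMagick command line for it, without running anything. The function
names, the file names and the 300 dpi default are made up for illustration;
`convert` is the ImageMagick 6 command (ImageMagick 7 calls it `magick`), and
`-resize`, `-units` and `-density` are standard ImageMagick options.

```python
import shlex
from typing import List


def upscale_percent(source_dpi: float, target_dpi: float) -> int:
    """Return the resize percentage that takes an image from source_dpi to target_dpi."""
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("dpi values must be positive")
    return round(100 * target_dpi / source_dpi)


def imagemagick_command(src: str, dst: str,
                        source_dpi: float = 95,
                        target_dpi: float = 300) -> List[str]:
    """Build (but do not run) an ImageMagick command that upscales src into dst."""
    percent = upscale_percent(source_dpi, target_dpi)
    return [
        "convert", src,
        "-resize", f"{percent}%",      # enlarge the pixel dimensions
        "-units", "PixelsPerInch",     # interpret the density below as dpi
        "-density", str(target_dpi),   # record the new resolution in the output
        dst,
    ]


if __name__ == "__main__":
    # Hypothetical file names; the 95 dpi source value is the one from the message above.
    cmd = imagemagick_command("svenska.png", "svenska-300dpi.png")
    print(shlex.join(cmd))
```

With the defaults this builds a command that resizes by 316%, since 300 / 95
is about 3.16. Upscaling only interpolates pixels, so an image created at 200
to 300 dpi in the first place should still OCR better.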


On Tue, Jun 14, 2011 at 6:02 AM, Charles Roos <[email protected]> wrote:

> VietOCR with the Swedish pack gives very poor OCR results on my "svenska.png" file.
> Could you check whether you get equally bad results with the attached file?
> C.
>
>
> 2011/6/12 Quan Nguyen <[email protected]>
>
>> You've mixed up the Tesseract program version and the data version.
>> *.traineddata files are for 3.0x. VietOCR.NET is currently only compatible
>> with 2.04. To use *.traineddata, you'll need the Java version at
>> http://sourceforge.net/projects/vietocr/files/vietocr/3.1.3 .
>>
>> On Jun 12, 10:02 am, Charles Roos <[email protected]> wrote:
>> > I don't have a "tess.exe" file at all.
>> > But I do have these files:
>> > C:\Program Files\VietOCR.NET\VietOCR.exe
>> > C:\Program Files\VietOCR.NET\tessdata\nor.traineddata
>> > C:\Program Files\FreeOCR\FreeOCR.exe
>> > C:\Program Files\FreeOCR\tessdata\nor.traineddata
>> > C:\WINDOWS\tessdata\nor.traineddata
>> >
>> > FreeOCR shows an option to choose the language "Norway" when running, but
>> > VietOCR doesn't show this language.
>> > So VietOCR doesn't let me install a new language at all, while FreeOCR
>> > does, but the installed language doesn't produce any output when OCR-ing.
>> > I will try both programs on my other computer on Monday.
>> > I don't think I'll post screenshots here; I don't believe they would show
>> > anything that helps solve this.
>> > C.
>> >
>> > On Jun 12, 5:37 pm, "Sriranga(78yrsold)" <[email protected]>
>> > wrote:
>> >
>> > > The same mistake was made here as with FreeOCR. In fact, in VietOCR the
>> > > tessdata folder is inside the tesseract folder, which contains tess.exe
>> > > and the tessdata folder.
>> >
>> > > On Sun, Jun 12, 2011 at 8:04 PM, Charles Roos <[email protected]> wrote:
>> >
>> > > > I have now installed VietOCR; the language combo box only lists English
>> > > > and Vietnamese.
>> > > > I copied FreeOCR's Norwegian and Swedish language files to the folder:
>> > > > "C:\Program Files\VietOCR.NET\tessdata"
>> > > > After restarting, the "OCR Language" select box still didn't list the
>> > > > new languages.
>> > > > When I choose Vietnamese I get a system error when OCR-ing;
>> > > > with the English option everything works.
>> > > > Perhaps something is wrong with my computer.
>> > > > Thanks anyway,
>> > > > C.
>> >
>> > > > On Jun 12, 5:21 pm, "Sriranga(78yrsold)" <[email protected]>
>> > > > wrote:
>> > > > > Why not try VietOCR, which supports all languages and all image
>> > > > > formats?
>> >
>> > > > > On Sun, Jun 12, 2011 at 7:49 PM, Charles Roos <[email protected]> wrote:
>> >
>> > > > > > I read and followed exactly what is described at this link:
>> > > > > > http://www.paperfile.net/ocr_lang.htm
>> > > > > > If I click the 'Settings' menu and choose 'Open Language Folder',
>> > > > > > this folder opens:
>> > > > > > "C:\WINDOWS\tessdata\"
>> > > > > > There I see 8 files starting with "eng.", and also the files
>> > > > > > "nor.traineddata" and "swe.traineddata", both about 2332 KB in size.
>> > > > > > When I start FreeOCR I see 3 languages in the "OCR Language:"
>> > > > > > drop-down box:
>> > > > > > eng
>> > > > > > nor
>> > > > > > swe
>> > > > > > If I select "eng", OCR succeeds. With the other 2 languages, OCR
>> > > > > > doesn't succeed: no new data appears in the right panel.
>> > > > > > Maybe I should try an older FreeOCR version; I will try to find
>> > > > > > one.
>> > > > > > C.
>> >
>> > > > > > On Jun 12, 5:12 pm, "Sriranga(78yrsold)" <[email protected]>
>> > > > > > wrote:
>> > > > > > > No, I don't agree with your views. Even the Kannada language works
>> > > > > > > well in FreeOCR. Have you read the instructions on how to add data
>> > > > > > > files under the tessdata folder of FreeOCR?
>> >
>> > > > > > > On Sun, Jun 12, 2011 at 7:38 PM, Charles Roos <[email protected]> wrote:
>> >
>> > > > > > > > Re-installing the software didn't change anything either: I can
>> > > > > > > > only do OCR in English. I can now select "nor" and "swe" in the
>> > > > > > > > Language combo box, but they don't work.
>> > > > > > > > I downloaded the exe file from here:
>> > > > > > > > http://www.paperfile.net/freeocr.exe
>> > > > > > > > I have Windows XP.
>> > > > > > > > It seems to me that only English works; the other languages don't.
>> > > > > > > > C.
>> >
>> > > > > > > > On Jun 12, 4:54 pm, Charles Roos <[email protected]> wrote:
>> > > > > > > > > The Norwegian language pack doesn't produce any characters for me
>> > > > > > > > > either.
>> > > > > > > > > Creating the directory
>> > > > > > > > > "C:\Program Files\FreeOCR\tessdata"
>> > > > > > > > > by hand doesn't improve anything.
>> > > > > > > > > I restarted the computer, again without success.
>> > > > > > > > > I will try to re-install the FreeOCR software now.
>> > > > > > > > > C.
>> >
>> > > > > > > > > On Jun 12, 4:34 pm, Sven Pedersen <[email protected]> wrote:
>> >
>> > > > > > > > > > Hi Charles,
>> > > > > > > > > > That is for fraktur fonts, I believe. It was my understanding
>> > > > > > > > > > that there was another training set for regular Swedish. Check
>> > > > > > > > > > out swe.traineddata.gz at
>> > > > > > > > > > http://code.google.com/p/tesseract-ocr/downloads/list
>> > > > > > > > > > But I'm of Norwegian extraction, so haven't looked into it
>> > > > > > > > > > much... :-P
>> > > > > > > > > > -_Sven
>> >
>> > > > > > > > > > On Sun, Jun 12, 2011 at 8:20 AM, Charles Roos <[email protected]> wrote:
>> >
>> > > > > > > > > > > I downloaded the Swedish language pack file
>> > > > > > > > > > > ("swe-frak.traineddata") from here:
>> > > > > > > > > > > http://code.google.com/p/tesseract-ocr/downloads/detail?name=swe-frak...
>> > > > > > > > > > > I saved it to the folder
>> > > > > > > > > > > "C:\WINDOWS\tessdata\"
>> > > > > > > > > > > I restarted "FreeOCR v3" and chose the item "swe" from the
>> > > > > > > > > > > "OCR Language" combo box.
>> > > > > > > > > > > I pressed "Scan", and the document image was scanned into the
>> > > > > > > > > > > left pane.
>> > > > > > > > > > > Then I clicked "OCR", but nothing happened: the right pane
>> > > > > > > > > > > still showed the default help text.
>> > > > > > > > > > > Then I changed the language to "Eng" and pressed "OCR", and
>> > > > > > > > > > > the right panel was filled with the scanned text, but the
>> > > > > > > > > > > Swedish letters come out wrong this way.
>> > > > > > > > > > > Why doesn't the Swedish OCR work?
>> > > > > > > > > > > Br.,
>> > > > > > > > > > > C.
>> >
>> > > > > > > > > > > On Jun 12, 4:07 pm, Charles Roos <[email protected]> wrote:
>> > > > > > > > > > > > Hi,
>> > > > > > > > > > > > I found it,
>> > > > > > > > > > > > thx.
>> > > > > > > > > > > > http://code.google.com/p/tesseract-ocr/downloads/detail?name=swe-frak...
>> >
>> > > > > > > > > > > > On Jun 12, 3:40 pm, patrickq <[email protected]> wrote:
>> >
>> > > > > > > > > > > > > The Swedish language pack is right there on the downloads
>> > > > > > > > > > > > > page (and we've been using it successfully). Don't know
>> > > > > > > > > > > > > about Estonian.
>> >
>> > > > > > > > > > > > > On Jun 12, 8:19 am, Charles Roos <[email protected]> wrote:
>> >
>> > > > > > > > > > > > > > Do you have a
>> > > > > > > > > > > > > > language pack for Swedish or Estonian?
>> > > > > > > > > > > > > > Or do you know of free OCR software for those languages?
>> > > > > > > > > > > > > > Thx.
>> >
>> > > > > > > > > > > --
>> > > > > > > > > > > You received this message because you are subscribed to the
>> > > > > > > > > > > Google Groups "tesseract-ocr" group.
>> > > > > > > > > > > To post to this group, send email to [email protected]
>> > > > > > > > > > > To unsubscribe from this group, send email to
>> > > > > > > > > > > [email protected]
>> > > > > > > > > > > For more options, visit this group at
>> > > > > > > > > > > http://groups.google.com/group/tesseract-ocr?hl=en
>> >
>> > > > > > > > > > --
>> > > > > > > > > > ``All that is gold does not glitter,
>> > > > > > > > > >   not all those who wander are lost;
>> > > > > > > > > > the old that is strong does not wither,
>> > > > > > > > > >   deep roots are not reached by the frost.
>> > > > > > > > > > From the ashes a fire shall be woken,
>> > > > > > > > > >   a light from the shadows shall spring;
>> > > > > > > > > > renewed shall be blade that was broken,
>> > > > > > > > > >   the crownless again shall be king.”
>> >
>>
>>
>
>



-- 
``All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”

