Hi Charles,

Screen captures usually don't OCR well as-is because their resolution is too low. You'll need to increase the resolution first, for example with ImageMagick. Tesseract needs each letter to be at least a certain height in pixels (the documentation has the details), so try upsizing from 95 to 200 dpi and you should get better results. The ideal range is 200 to 300 dpi. It is much better still to create the original image at that resolution.

--Sven
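P.S. A rough sketch of the upsizing arithmetic, in Python. It only works out the scale factor for going from 95 to 200 dpi and builds an ImageMagick command line; it does not run anything. The 1024x768 capture size and the output filename are made-up examples, and `convert` is the ImageMagick 6 command name (ImageMagick 7 calls it `magick`), so adjust for your install.

```python
import math

def upscale_plan(width_px, height_px, source_dpi=95, target_dpi=200):
    """Return (percent, new_width, new_height) for resampling an image.

    The percentage is rounded up so the result never falls below the target dpi.
    """
    scale = target_dpi / source_dpi
    percent = math.ceil(scale * 100)
    return percent, round(width_px * scale), round(height_px * scale)

def imagemagick_args(src, dst, percent):
    """Build (but do not run) an ImageMagick resize command as an argument list."""
    return ["convert", src, "-resize", f"{percent}%", dst]

# Hypothetical 1024x768 screen capture at 95 dpi, upsized to about 200 dpi.
percent, new_w, new_h = upscale_plan(1024, 768)
print(percent, new_w, new_h)
print(" ".join(imagemagick_args("svenska.png", "svenska-200dpi.png", percent)))
```

Keep in mind that resampling only enlarges the letters; it adds no detail, which is why capturing at 200 to 300 dpi in the first place is the better option.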
On Tue, Jun 14, 2011 at 6:02 AM, Charles Roos <[email protected]> wrote:

> VietOCR with Swe-pack does very bad OCR-ing of my "svenska.png" file.
> Can you try if you get as bad results with attached file?
> C.
>
>
> 2011/6/12 Quan Nguyen <[email protected]>
>
>> You've mixed up between Tesseract program vs. data version.
>> *.traineddata is for 3.0x. VietOCR.NET is currently only compatible
>> with 2.04. To use *.traineddata, you'll need the Java version @
>> http://sourceforge.net/projects/vietocr/files/vietocr/3.1.3 .
>>
>> On Jun 12, 10:02 am, Charles Roos <[email protected]> wrote:
>> > I don't have file "tess.exe" at all.
>> > But i have those files:
>> > C:\Program Files\VietOCR.NET\VietOCR.exe
>> > C:\Program Files\VietOCR.NET\tessdata\nor.traineddata
>> > C:\Program Files\FreeOCR\FreeOCR.exe
>> > C:\Program Files\FreeOCR\tessdata\nor.traineddata
>> > C:\WINDOWS\tessdata\nor.traineddata
>> >
>> > FreeOCR when running shows option to choose language "Norway", but
>> > VietOCR doesn't show this language.
>> > So, VietOCR doesn't allow to install new language at all, but FreeOCR
>> > allows but the installed language doesn't produce any output when
>> > ocr-ing.
>> > I will try both programs in my other computer on Monday.
>> > I think i won't post screenshots here, i don't believe anything
>> > solving can be seen on those.
>> > C.
>> >
>> > On Jun 12, 5:37 pm, "Sriranga(78yrsold)" <[email protected]>
>> > wrote:
>> >
>> > > here also same mistake done in the freeocr. In fact in vietocr tessdata
>> > > folder is in the tesseract folder wherein it contains tess.exe and tessdata
>> > > folder.
>> >
>> > > On Sun, Jun 12, 2011 at 8:04 PM, Charles Roos <[email protected]> wrote:
>> >
>> > > > I installed vietOCR now, the language combo has only English and
>> > > > vietnamese language there.
>> > > > I copypasted FreeOCR's Norway and Swedish language files to folder:
>> > > > "C:\Program Files\VietOCR.NET\tessdata"
>> > > > After restarting, the select-box "OCR-Language" didn't get those new
>> > > > languages there.
>> > > > When choosing vietnamese language i get system error/bug when OCR-ing,
>> > > > with English option everything works.
>> > > > I think something is wrong with my computer perhaps.
>> > > > Thanks anyway,
>> > > > C.
>> >
>> > > > On Jun 12, 5:21 pm, "Sriranga(78yrsold)" <[email protected]>
>> > > > wrote:
>> > > > > why not try with vietOCR which supports all langs and all formats of image
>> >
>> > > > > On Sun, Jun 12, 2011 at 7:49 PM, Charles Roos <[email protected]> wrote:
>> >
>> > > > > > I read and did exactly how is described under this link:
>> > > > > > http://www.paperfile.net/ocr_lang.htm
>> > > > > > If i click 'Settings' menu and then choose 'Open Language Folder'
>> > > > > > then this folder is opened for me:
>> > > > > > "C:\WINDOWS\tessdata\"
>> > > > > > There i see 8 files starting with "eng.", and also i see files
>> > > > > > "nor.traineddata", "swe.traineddata" both have ca 2332KB size.
>> > > > > > When i start FreeOCR i see 3 languages in drop-down box "OCR
>> > > > > > Language:":
>> > > > > > eng
>> > > > > > nor
>> > > > > > swe.
>> > > > > > If i select "eng", then OCR succeeds. But with other 2 language ocr-ing
>> > > > > > doesn't succeed. No new data comes to right panel.
>> > > > > > Maybe i should try older FreeOCR version, i will try to find older
>> > > > > > version.
>> > > > > > C.
>> >
>> > > > > > On Jun 12, 5:12 pm, "Sriranga(78yrsold)" <[email protected]>
>> > > > > > wrote:
>> > > > > > > No I don't agree with your views. Even Kannada lang works well in the
>> > > > > > > FreeOCR. have you read instructions how to add datafiles under tessdata
>> > > > > > > folder of FreeOCR?
>> >
>> > > > > > > On Sun, Jun 12, 2011 at 7:38 PM, Charles Roos <[email protected]> wrote:
>> >
>> > > > > > > > Also re-installing software didn't change anything- i can only do OCR
>> > > > > > > > in English, however i can select in Language combo box "nor" and "swe"
>> > > > > > > > now, which doesn't work.
>> > > > > > > > I downloaded the exe-file from there:
>> > > > > > > > http://www.paperfile.net/freeocr.exe
>> > > > > > > > I have Windows Xp.
>> > > > > > > > Seems for me that only english language works, other languages don't
>> > > > > > > > work.
>> > > > > > > > C.
>> >
>> > > > > > > > On Jun 12, 4:54 pm, Charles Roos <[email protected]> wrote:
>> > > > > > > > > Also NORway language pack OCR doesn't produce any character for me.
>> > > > > > > > > Also when i create by hand directory
>> > > > > > > > > "C:\Program Files\FreeOCR\tessdata"
>> > > > > > > > > then nothing changes to better again.
>> > > > > > > > > I restarted computer, no success of that again.
>> > > > > > > > > I will try to re-install Free-OCR software now.
>> > > > > > > > > C.
>> >
>> > > > > > > > > On Jun 12, 4:34 pm, Sven Pedersen <[email protected]> wrote:
>> >
>> > > > > > > > > > Hi Charles,
>> > > > > > > > > > That is for fraktur fonts, I believe. It was my understanding that there was
>> > > > > > > > > > another training set for regular Swedish. Check out swe.traineddata.gz
>> > > > > > > > > > at http://code.google.com/p/tesseract-ocr/downloads/list http://code.goog...
>> > > > > > > > > > But I'm of Norwegian extraction, so haven't looked into it much...
>> > > > > > > > > > :-P
>> > > > > > > > > > --Sven
>> >
>> > > > > > > > > > On Sun, Jun 12, 2011 at 8:20 AM, Charles Roos <[email protected]> wrote:
>> >
>> > > > > > > > > > > I downloaded Swedish language pack file ("swe-frak.traineddata") from
>> > > > > > > > > > > there:
>> > > > > > > > > > > http://code.google.com/p/tesseract-ocr/downloads/detail?name=swe-frak...
>> > > > > > > > > > > I saved it to folder
>> > > > > > > > > > > "C:\WINDOWS\tessdata\"
>> > > > > > > > > > > I restarted "FreeOCR v3", i chose from combobox "OCR Language" item
>> > > > > > > > > > > "swe".
>> > > > > > > > > > > I pressed "Scan", document image was scanned into left pane.
>> > > > > > > > > > > Then i clicked "OCR", but nothing happened- the right pane content
>> > > > > > > > > > > stayed with helpful default text.
>> > > > > > > > > > > Then i changed language to "Eng" and pressed "OCR", and right panel
>> > > > > > > > > > > was filled with scanned text, but swedish letters are wrong in this
>> > > > > > > > > > > way.
>> > > > > > > > > > > Why Swe-ocr doesn't work?
>> > > > > > > > > > > Br.,
>> > > > > > > > > > > C.
>> >
>> > > > > > > > > > > On Jun 12, 4:07 pm, Charles Roos <[email protected]> wrote:
>> > > > > > > > > > > > Hi,
>> > > > > > > > > > > > i found it,
>> > > > > > > > > > > > thx.
>> > > > > > > > > > > > http://code.google.com/p/tesseract-ocr/downloads/detail?name=swe-frak...
>> >
>> > > > > > > > > > > > On Jun 12, 3:40 pm, patrickq <[email protected]> wrote:
>> >
>> > > > > > > > > > > > > The Swedish language pack is right there on the downloads page (and
>> > > > > > > > > > > > > we've been using it successfully). Don't know about Estonian.
>> >
>> > > > > > > > > > > > > On Jun 12, 8:19 am, Charles Roos <[email protected]> wrote:
>> >
>> > > > > > > > > > > > > > Do you have
>> > > > > > > > > > > > > > Language pack for: Swedish language, Estonian Language?
>> > > > > > > > > > > > > > Or do you know free ocr software for those languages?
>> > > > > > > > > > > > > > Thx.
>> >
>> > > > > > > > > > > --
>> > > > > > > > > > > You received this message because you are subscribed to the Google
>> > > > > > > > > > > Groups "tesseract-ocr" group.
>> > > > > > > > > > > To post to this group, send email to [email protected]
>> > > > > > > > > > > To unsubscribe from this group, send email to
>> > > > > > > > > > > [email protected]
>> > > > > > > > > > > For more options, visit this group at
>> > > > > > > > > > > http://groups.google.com/group/tesseract-ocr?hl=en
>> >
>> > > > > > > > > > --
>> > > > > > > > > > “All that is gold does not glitter,
>> > > > > > > > > > not all those who wander are lost;
>> > > > > > > > > > the old that is strong does not wither,
>> > > > > > > > > > deep roots are not reached by the frost.
>> > > > > > > > > > From the ashes a fire shall be woken,
>> > > > > > > > > > a light from the shadows shall spring;
>> > > > > > > > > > renewed shall be blade that was broken,
>> > > > > > > > > > the crownless again shall be king.”
>> >
>> > > > > > > > --
>> > > > > > > > You received this message because you are subscribed to the Google
>> > > > > > > > Groups "tesseract-ocr" group.
--
“All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king.”

