Charles, congratulations! You have succeeded in using VietOCR. To improve accuracy further, you can also use "<lang>.DangAmbigs.txt" under the "Data" folder, as well as "dict".
Wishing you good luck,
-sriranga (78 yrs)
On Wed, Jun 15, 2011 at 12:41 AM, Charles Roos <[email protected]> wrote:

> Hi,
> I finally got my Swedish OCR working with the Java-based VietOCR, with
> not-so-bad results.
> Thx everybody,
> C.

2011/6/14 Quan Nguyen <[email protected]>

> Charles, please try again with Image > Screenshot Mode turned on.
>
> Alternatively, scan the image again with proper resolution, as Sven
> suggested. The resolution of screen captures is generally not adequate
> for OCR purposes.

On Jun 14, 6:02 am, Charles Roos <[email protected]> wrote:

> VietOCR with the Swe pack does very bad OCR of my "svenska.png" file.
> Can you check whether you get equally bad results with the attached file?
> C.

2011/6/12 Quan Nguyen <[email protected]>

> You've mixed up the Tesseract program version and the data version.
> *.traineddata is for 3.0x. VietOCR.NET is currently only compatible
> with 2.04. To use *.traineddata, you'll need the Java version @
> http://sourceforge.net/projects/vietocr/files/vietocr/3.1.3.

On Jun 12, 10:02 am, Charles Roos <[email protected]> wrote:

> I don't have the file "tess.exe" at all.
> But I have these files:
> C:\Program Files\VietOCR.NET\VietOCR.exe
> C:\Program Files\VietOCR.NET\tessdata\nor.traineddata
> C:\Program Files\FreeOCR\FreeOCR.exe
> C:\Program Files\FreeOCR\tessdata\nor.traineddata
> C:\WINDOWS\tessdata\nor.traineddata
>
> When running, FreeOCR offers the language "Norway", but VietOCR
> doesn't show this language.
> So VietOCR doesn't allow installing a new language at all, while FreeOCR
> allows it but the installed language doesn't produce any output when
> OCR-ing.
> I will try both programs on my other computer on Monday.
> I don't think I'll post screenshots here; I don't believe they would
> show anything that helps solve this.
> C.

On Jun 12, 5:37 pm, "Sriranga(78yrsold)" <[email protected]> wrote:

> The same mistake was made here as with FreeOCR. In fact, in VietOCR the
> tessdata folder is inside the tesseract folder, which contains tess.exe
> and the tessdata folder.

On Sun, Jun 12, 2011 at 8:04 PM, Charles Roos <[email protected]> wrote:

> I installed VietOCR now; the language combo box has only English and
> Vietnamese.
> I copied FreeOCR's Norwegian and Swedish language files to the folder:
> "C:\Program Files\VietOCR.NET\tessdata"
> After restarting, the "OCR Language" select box still didn't show the
> new languages.
> When I choose Vietnamese I get a system error/bug when OCR-ing; with the
> English option everything works.
> Perhaps something is wrong with my computer.
> Thanks anyway,
> C.

On Jun 12, 5:21 pm, "Sriranga(78yrsold)" <[email protected]> wrote:

> Why not try VietOCR, which supports all languages and all image
> formats?

On Sun, Jun 12, 2011 at 7:49 PM, Charles Roos <[email protected]> wrote:

> I read and did exactly what is described under this link:
> http://www.paperfile.net/ocr_lang.htm
> If I click the 'Settings' menu and then choose 'Open Language Folder',
> this folder is opened for me:
> "C:\WINDOWS\tessdata\"
> There I see 8 files starting with "eng.", and I also see the files
> "nor.traineddata" and "swe.traineddata", both ca. 2332 KB in size.
> When I start FreeOCR I see 3 languages in the "OCR Language:" drop-down
> box:
> eng
> nor
> swe.
> If I select "eng", OCR succeeds. But with the other 2 languages OCR
> doesn't succeed. No new data appears in the right panel.
> Maybe I should try an older FreeOCR version; I will try to find one.
> C.

On Jun 12, 5:12 pm, "Sriranga(78yrsold)" <[email protected]> wrote:

> No, I don't agree with your views. Even the Kannada language works well
> in FreeOCR. Have you read the instructions on how to add data files
> under the tessdata folder of FreeOCR?

On Sun, Jun 12, 2011 at 7:38 PM, Charles Roos <[email protected]> wrote:

> Re-installing the software didn't change anything either. I can only do
> OCR in English; I can now select "nor" and "swe" in the Language combo
> box, but they don't work.
> I downloaded the exe file from here:
> http://www.paperfile.net/freeocr.exe
> I have Windows XP.
> It seems to me that only English works; other languages don't.
> C.

On Jun 12, 4:54 pm, Charles Roos <[email protected]> wrote:

> The Norwegian language pack doesn't produce any characters for me
> either.
> Creating the directory
> "C:\Program Files\FreeOCR\tessdata"
> by hand doesn't improve anything either.
> I restarted the computer, again with no success.
> I will try to re-install the FreeOCR software now.
> C.

On Jun 12, 4:34 pm, Sven Pedersen <[email protected]> wrote:

> Hi Charles,
> That is for fraktur fonts, I believe. It was my understanding that
> there was another training set for regular Swedish. Check out
> swe.traineddata.gz at
> http://code.google.com/p/tesseract-ocr/downloads/list
> But I'm of Norwegian extraction, so haven't looked into it much... :-P
> -_Sven

On Sun, Jun 12, 2011 at 8:20 AM, Charles Roos <[email protected]> wrote:

> I downloaded the Swedish language pack file ("swe-frak.traineddata")
> from here:
> http://code.google.com/p/tesseract-ocr/downloads/detail?name=swe-frak...
> I saved it to the folder
> "C:\WINDOWS\tessdata\"
> I restarted "FreeOCR v3" and chose the item "swe" from the "OCR
> Language" combo box.
> I pressed "Scan", and the document image was scanned into the left pane.
> Then I clicked "OCR", but nothing happened: the right pane kept showing
> its default help text.
> Then I changed the language to "Eng" and pressed "OCR", and the right
> panel was filled with the scanned text, but the Swedish letters come
> out wrong this way.
> Why doesn't Swedish OCR work?
> Br.,
> C.

On Jun 12, 4:07 pm, Charles Roos <[email protected]> wrote:

> Hi,
> I found it,
> thx.
> http://code.google.com/p/tesseract-ocr/downloads/detail?name=swe-frak...

On Jun 12, 3:40 pm, patrickq <[email protected]> wrote:

> The Swedish language pack is right there on the downloads page (and
> we've been using it successfully). Don't know about Estonian.

On Jun 12, 8:19 am, Charles Roos <[email protected]> wrote:

> Do you have a language pack for Swedish or Estonian?
> Or do you know of free OCR software for those languages?
> Thx.

> --
> "All that is gold does not glitter,
> not all those who wander are lost;
> the old that is strong does not wither,
> deep roots are not reached by the frost.
> From the ashes a fire shall be woken,
> a light from the shadows shall spring;
> renewed shall be blade that was broken,
> the crownless again shall be king."

> Attachments: vietocr_very_bad_swe_ocr.png (421K), svenska.png (54K)

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
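
A note on the program vs. data version mix-up discussed in the thread: Quan points out that *.traineddata is for 3.0x while VietOCR.NET is only compatible with 2.04, and Charles reports seeing 8 files starting with "eng." next to single "nor.traineddata" and "swe.traineddata" files. A quick way to see which kind of data each language in a tessdata folder has might look like the sketch below. It is only an illustration: the function names `classify_tessdata` and `report` are made up for this example, and it assumes that any `<lang>.<suffix>` file other than `<lang>.traineddata` belongs to the older multi-file layout.

```python
from collections import defaultdict
from pathlib import Path


def classify_tessdata(folder):
    """Group the files in a tessdata folder by language code.

    Returns (combined, components):
      combined   - set of language codes that have a single
                   <lang>.traineddata file (per the thread, the 3.0x format)
      components - dict mapping a language code to the suffixes of its other
                   <lang>.<suffix> files (ASSUMED here to be the older,
                   multi-file layout)
    """
    combined = set()
    components = defaultdict(list)
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue
        lang, dot, suffix = path.name.partition(".")
        if not dot or not lang:
            continue  # no "<lang>." prefix, so not a language data file
        if suffix == "traineddata":
            combined.add(lang)
        else:
            components[lang].append(suffix)
    return combined, dict(components)


def report(folder):
    """Return one human-readable line per language found in the folder."""
    combined, components = classify_tessdata(folder)
    lines = []
    for lang in sorted(combined | set(components)):
        if lang in combined:
            lines.append(f"{lang}: combined .traineddata file")
        else:
            count = len(components[lang])
            lines.append(f"{lang}: {count} separate component files")
    return lines
```

Calling `print("\n".join(report(r"C:\WINDOWS\tessdata")))` on the folder Charles mentions would list each language with the kind of data found for it. A language that only has a combined .traineddata file would, going by Quan's explanation, need a 3.0x-based program such as the Java version of VietOCR.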

