Re: [tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-11 Thread Fanatico
After some research into Korean, I found that the language does use Chinese characters, so it is correct to set Chinese as a sublanguage. The problem is that kor.training_text doesn't contain Chinese characters, so the training covers only Korean and ignores the Chinese, so if I tesseract

Re: [tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-09 Thread Fanatico
The config from kor already had it:
#Fixes https://github.com/tesseract-ocr/tesseract/issues/1009
preserve_interword_spaces 1

Re: [tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-09 Thread ShreeDevi Kumar
For Korean, please check whether adding the following lines to the config improves your results further:
#Fixes https://github.com/tesseract-ocr/tesseract/issues/1009
preserve_interword_spaces 1

Re: [tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-09 Thread ShreeDevi Kumar
Leftover from 3.04, my guess.
On Mon 9 Apr, 2018, 12:52 PM Fanatico, wrote:
> It worked, thanks.
>
> Any reason for this chi_tra there?
>
> On Monday, 9 April 2018 03:24:44 UTC-3, shree wrote:
>> Please remove the sub language line from config file, and use combine

Re: [tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-09 Thread Fanatico
It worked, thanks. Any reason for this chi_tra there?
On Monday, 9 April 2018 03:24:44 UTC-3, shree wrote:
> Please remove the sub language line from config file, and use combine
> tessdata to overwrite it.
>
> Right now it seems to be using chi_tra also.
>
> On Mon 9 Apr, 2018, 11:48 AM

Re: [tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-09 Thread ShreeDevi Kumar
Please remove the sublanguage line from the config file, and use combine_tessdata to overwrite it. Right now it seems to be using chi_tra also.
On Mon 9 Apr, 2018, 11:48 AM Fanatico, wrote:
> I used one traineddata that I created on removing the top layer from the
>
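
The config edit suggested above can be sketched in Python. This is a minimal illustration, not the poster's actual procedure: the function name drop_sublangs is hypothetical, and the sample text is assembled from the config lines quoted elsewhere in this thread. It assumes the config component has already been extracted from kor.traineddata with combine_tessdata and will be written back with the same tool afterwards.

```python
def drop_sublangs(config_text: str) -> str:
    """Return the config text without any tessedit_load_sublangs line.

    Hypothetical helper: it only edits text, it does not touch traineddata.
    """
    kept = []
    for line in config_text.splitlines():
        tokens = line.split()
        # Skip the line that makes Tesseract load extra languages.
        if tokens and tokens[0] == "tessedit_load_sublangs":
            continue
        kept.append(line)
    return "".join(line + "\n" for line in kept)


# Sample config assembled from the lines quoted in this thread (illustrative).
sample = (
    "#Fixes https://github.com/tesseract-ocr/tesseract/issues/1009\n"
    "preserve_interword_spaces 1\n"
    "tessedit_load_sublangs chi_tra\n"
)

cleaned = drop_sublangs(sample)
print(cleaned, end="")
```

All other lines, including the comment and preserve_interword_spaces, pass through unchanged, so the only behavioural difference should be that chi_tra is no longer loaded alongside kor.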

Re: [tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-09 Thread Fanatico
I used a traineddata that I created by removing the top layer from the kor.traineddata in "tessdata_best". After this I replaced it with the original one from "tessdata_best" and got the same problem. Yes, it includes chi_tra as a sublanguage:
tessedit_load_sublangs chi_tra

Re: [tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-08 Thread ShreeDevi Kumar
Which traineddata are you using? Use combine_tessdata to extract the config file and see whether Chinese is included as a sublanguage. Also look at the lstm-unicharset to see whether Chinese characters are included in it.
On Mon 9 Apr, 2018, 11:09 AM Fanatico, wrote:
> I'm
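
The unicharset check suggested above can be approximated with a short Python sketch. It rests on assumptions: that the lstm-unicharset has been unpacked to a text file, that its first line is the entry count, and that each following line starts with the character followed by space-separated properties. The helper names and the sample lines are hypothetical, with the properties replaced by a placeholder, and the Han ranges cover only the most common CJK blocks.

```python
def is_han(ch: str) -> bool:
    """True if ch falls in one of the common CJK ideograph blocks."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0xF900 <= cp <= 0xFAFF   # CJK Compatibility Ideographs
    )


def han_entries(unicharset_lines):
    """Return the unicharset entries whose character contains a Han ideograph."""
    found = []
    for line in unicharset_lines[1:]:  # assumed: first line is the entry count
        parts = line.split()
        if parts and any(is_han(c) for c in parts[0]):
            found.append(parts[0])
    return found


# Hypothetical sample; real entries carry more properties than the "0" shown.
sample = ["4", "NULL 0", "한 0", "國 0", "a 0"]
print(han_entries(sample))
```

An empty result for the real file would suggest the model cannot output Chinese characters on its own, which points back at the sublanguage setting in the config.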

[tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-08 Thread Fanatico
I'm running tesseract with the "-l kor" parameter, but it is detecting some Chinese characters. The image really does have 3 Chinese characters, but none of them is returned correctly (and I'm not expecting them to be), but the other Korean characters are being recognized as Chinese