Re: [tesseract-ocr] Re: Change unicharset

2018-04-12 Thread ShreeDevi Kumar
1. Concatenate the two training texts:
   cat ./langdata/kor/kor.training_text ./langdata/chi_tra/chi_tra.training_text > ./langdata/kor/kor-chi_tra.training_text
2. Run tesstrain.sh with (update for your paths; as a test, run with just one font that supports both languages)
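For anyone who prefers to script the concatenation step, here is a minimal Python sketch of the same idea as the `cat` command above. The function name `merge_training_texts` is made up for illustration and is not part of Tesseract or its training tools. Unlike plain `cat`, it also terminates the last line of each input, so the final Korean line is not glued onto the first Chinese line.

```python
from pathlib import Path


def merge_training_texts(sources, destination):
    """Concatenate UTF-8 training texts in order, similar to `cat a b > out`.

    Hypothetical helper for illustration; not part of Tesseract.
    """
    parts = []
    for source in sources:
        text = Path(source).read_text(encoding="utf-8")
        # Unlike plain `cat`, terminate the last line of each file so the
        # first line of the next file does not get appended to it.
        if text and not text.endswith("\n"):
            text += "\n"
        parts.append(text)
    merged = "".join(parts)
    Path(destination).write_text(merged, encoding="utf-8")
    return merged
```

With the paths from the message, a call would look like `merge_training_texts(["./langdata/kor/kor.training_text", "./langdata/chi_tra/chi_tra.training_text"], "./langdata/kor/kor-chi_tra.training_text")`.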

[tesseract-ocr] Re: Change unicharset

2018-04-12 Thread Fanatico
And if I look at the "kor.unicharset" created after executing "training/tesstrain.sh", it contains only the Korean characters, even after I changed "kor.lstm-unicharset" from the "kor.traineddata".
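One way to confirm which characters are actually missing is to compare the training text against the generated unicharset. The sketch below is only a rough check under a stated assumption: that the unicharset is a text file whose first line is the entry count and whose later lines each begin with the symbol, followed by space-separated properties. The helper names are hypothetical. Entries can span more than one code point, so a per-character comparison is approximate.

```python
def unicharset_symbols(unicharset_text):
    """Return the set of symbols listed in a unicharset file's text.

    Assumed layout (not verified here): line 1 holds the entry count and
    every later line starts with the symbol, then its properties.
    """
    lines = unicharset_text.splitlines()
    return {line.split(None, 1)[0] for line in lines[1:] if line.strip()}


def missing_characters(training_text, unicharset_text):
    """List non-whitespace characters of the text absent from the unicharset."""
    known = unicharset_symbols(unicharset_text)
    wanted = {ch for ch in training_text if not ch.isspace()}
    return sorted(wanted - known)
```

Passing the contents of the merged training text and of the generated "kor.unicharset" to `missing_characters` should return the hanja that never made it into the character set, if the format assumption holds.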

Re: [tesseract-ocr] Change unicharset

2018-04-12 Thread Fanatico
I already did it, but I keep getting this error on "training/tesstrain.sh":
No block overlapping textline: 가능한 튤립 첫 칼럼 절차 주 - 하기 말썽쟁이 같다 ㆍ 상품권 팁 |
No block overlapping textline: 겪은 덕숀 수 대 라이브 넥 ' 토론 게시판 10 헵번 등 관련 담뿍
No block overlapping textline: 자리 유통 댈 월 피부 쥬얼리 에 뿌찢 타겟 그룹 안팎일 똑똑한
No block

Re: [tesseract-ocr] Change unicharset

2018-04-12 Thread ShreeDevi Kumar
You cannot just overwrite the lstm.unicharset in a traineddata file; the unicharset has to be in sync with the other files in it, i.e. lstm, dawgs, recoder, etc.
> I'm merging the ```kor.training_text``` with the ```chi_tra.training_text``` for tests
You need to go through the complete training

[tesseract-ocr] Change unicharset

2018-04-12 Thread Fanatico
I'm trying to add Chinese to my Korean charset, but I'm not able to do it. Note: since Korean can use some Chinese characters (hanja), I'm merging the ```kor.training_text``` with the ```chi_tra.training_text``` for tests. Reference: https://en.wikipedia.org/wiki/Hanja

[tesseract-ocr] Re: Tesseract 4.0 on Alpine Linux Docker Container

2018-04-12 Thread Kalven Schraut
It also looks like calling configure with --disable-openmp prevents the seg fault, which allows me to continue using tesseract for now. I would still like to figure out why the seg fault is occurring with multiple threads.
On Thursday, April 12, 2018 at 8:23:06 PM UTC-5, Kalven Schraut wrote:

[tesseract-ocr] Re: Tesseract 4.0 on Alpine Linux Docker Container

2018-04-12 Thread Kalven Schraut
Looks like the issue is from #pragma omp parallel for num_threads(kNumThreads) at https://github.com/tesseract-ocr/tesseract/blob/master/lstm/fullyconnected.cpp#L140. Does anyone more familiar with OpenMP know a possible reason for a seg fault there?
On Tuesday, April 10, 2018 at 7:29:31 AM UTC-5,

Re: [tesseract-ocr] How to include tesseract 4.00 to my visual studio c++ ??

2018-04-12 Thread Zdenko Podobny
You should download the source, then build and install it with cppan + cmake. See https://github.com/tesseract-ocr/tesseract/wiki/Compiling#develop-tesseract
Zdenko
2018-04-11 4:21 GMT+02:00 :
> i have been using tesseract 3.04 i could use it just by adding the include

[tesseract-ocr] Re: Tesseract 4.0 on Alpine Linux Docker Container

2018-04-12 Thread Kalven Schraut
After doing some more digging and running valgrind on the code, the last few lines were:
==360==    by 0x95B913A: tesseract::Tesseract::classify_word_and_language(int, PAGE_RES_IT*, tesseract::WordData*) (control.cpp:1314)
==360==    by 0x95BC63B: tesseract::Tesseract::RecogAllWordsPassN(int,