1. concatenate the two training texts
cat ./langdata/kor/kor.training_text ./langdata/chi_tra/chi_tra.training_text > ./langdata/kor/kor-chi_tra.training_text
2. run tesstrain.sh with (update for your paths, run with just one font
which supports both languages as a test)
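Step 1 can also be sketched in Python, which makes it easy to confirm that the merged file really contains characters from both scripts before running the training script. This is only a minimal sketch: the helper names merge_training_texts and distinct_characters are made up for illustration, and the paths in the usage comment are the ones from the cat command above.

```python
from pathlib import Path


def merge_training_texts(sources, destination):
    """Concatenate UTF-8 text files into one file and return the merged text."""
    parts = []
    for src in sources:
        text = Path(src).read_text(encoding="utf-8")
        # Keep the last line of one file from running into the first line of the next.
        if text and not text.endswith("\n"):
            text += "\n"
        parts.append(text)
    merged = "".join(parts)
    Path(destination).write_text(merged, encoding="utf-8")
    return merged


def distinct_characters(text):
    """Return the set of non-whitespace characters that occur in the text."""
    return {ch for ch in text if not ch.isspace()}


# Example usage (paths taken from the cat command above; adjust for your tree):
# merged = merge_training_texts(
#     ["./langdata/kor/kor.training_text",
#      "./langdata/chi_tra/chi_tra.training_text"],
#     "./langdata/kor/kor-chi_tra.training_text",
# )
# print(len(distinct_characters(merged)), "distinct characters in merged text")
```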
And if I look at the "kor.unicharset" created after executing
"training/tesstrain.sh", it only contains the Korean characters, even after
I changed "kor.lstm-unicharset" from the "kor.traineddata"
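One way to check which characters did not make it into a generated unicharset is to compare it against the training text. The sketch below is not part of Tesseract. It assumes the usual unicharset layout (first line is the entry count, each later line starts with the character or grapheme followed by a space and its properties), and the helper names parse_unicharset and missing_from_unicharset are hypothetical. Entries that span several code points are compared as whole strings, so the per-character check is only approximate.

```python
def parse_unicharset(content):
    """Return the set of entries (first field of each line) in a unicharset.

    Assumes the first line holds the entry count and is skipped.
    """
    entries = set()
    for line in content.splitlines()[1:]:
        if not line.strip():
            continue
        # The entry itself is everything before the first space.
        entries.add(line.split(" ", 1)[0])
    return entries


def missing_from_unicharset(training_text, entries):
    """Return the sorted non-whitespace characters of the text not in entries."""
    return sorted(
        {ch for ch in training_text if not ch.isspace() and ch not in entries}
    )


# Example usage (file names are the ones mentioned in this thread):
# with open("kor.unicharset", encoding="utf-8") as f:
#     entries = parse_unicharset(f.read())
# with open("./langdata/kor/kor-chi_tra.training_text", encoding="utf-8") as f:
#     print(missing_from_unicharset(f.read(), entries))
```

If the list is full of hanja, the merged training text was probably not the one that was actually rendered and fed to unicharset extraction.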
I already did it, but I keep getting this error on "training/tesstrain.sh":
No block overlapping textline: 가능한 튤립 첫 칼럼 절차 주 - 하기 말썽쟁이 같다 ㆍ 상품권 팁 |
No block overlapping textline: 겪은 덕숀 수 대 라이브 넥 ' 토론 게시판 10 헵번 등 관련 담뿍
No block overlapping textline: 자리 유통 댈 월 피부 쥬얼리 에 뿌찢 타겟 그룹 안팎일 똑똑한
No block
You cannot just overwrite the lstm-unicharset in a traineddata file; the
unicharset has to be in sync with the other files in it, i.e. lstm, dawgs,
recoder, etc.
> I'm merging the ```kor.training_text``` with the
```chi_tra.training_text``` for tests
You need to go through the complete training
I'm trying to add Chinese to my Korean charset, but I'm not able to do it.
Note: Since Korean can use some Chinese characters (hanja), I'm merging the
```kor.training_text``` with the ```chi_tra.training_text``` for tests
Reference:
https://en.wikipedia.org/wiki/Hanja
It also looks like calling configure with --disable-openmp prevents the
seg fault, which allows me to continue to use tesseract for now. I would still
like to figure out why the seg fault is occurring with multiple threads.
On Thursday, April 12, 2018 at 8:23:06 PM UTC-5, Kalven Schraut wrote:
>
>
Looks like the issue is from
#pragma omp parallel for num_threads(kNumThreads) at
https://github.com/tesseract-ocr/tesseract/blob/master/lstm/fullyconnected.cpp#L140
Does anyone more familiar with OpenMP know a possible reason for a seg fault
there?
On Tuesday, April 10, 2018 at 7:29:31 AM UTC-5,
You should download the source and build and install it with cppan + cmake.
See
https://github.com/tesseract-ocr/tesseract/wiki/Compiling#develop-tesseract
Zdenko
2018-04-11 4:21 GMT+02:00 :
> i have been using tesseract 3.04 i could use it just by adding the include
After doing some more digging and running valgrind on the code, the last few
lines were
==360==    by 0x95B913A: tesseract::Tesseract::classify_word_and_language(int, PAGE_RES_IT*, tesseract::WordData*) (control.cpp:1314)
==360==    by 0x95BC63B: tesseract::Tesseract::RecogAllWordsPassN(int,