date:20180412

Re: [tesseract-ocr] Re: Change unicharset

2018-04-12 Thread ShreeDevi Kumar

1. Concatenate the two training texts:
   cat ./langdata/kor/kor.training_text ./langdata/chi_tra/chi_tra.training_text > ./langdata/kor/kor-chi_tra.training_text
2. Run tesstrain.sh with (update for your paths; run with just one font which supports both languages as a test)
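For anyone who prefers to script the concatenation step, here is a minimal Python sketch of the same idea as the `cat` command. The function name `merge_training_texts` is hypothetical and not part of the Tesseract tooling. It assumes both training texts are UTF-8, and unlike plain `cat` it adds a newline when a source file lacks a trailing one, so the last Korean line and the first Chinese line do not run together.

```python
from pathlib import Path


def merge_training_texts(sources, destination):
    """Concatenate UTF-8 training texts into one file, similar to `cat a b > c`.

    Hypothetical helper for illustration; not part of the Tesseract tooling.
    """
    with open(destination, "w", encoding="utf-8", newline="") as out:
        for source in sources:
            text = Path(source).read_text(encoding="utf-8")
            out.write(text)
            # Assumption: each training text should end with a newline so that
            # lines from different files are not fused together.
            if text and not text.endswith("\n"):
                out.write("\n")
    return Path(destination)
```

With the paths from the message above, a call could look like `merge_training_texts(["./langdata/kor/kor.training_text", "./langdata/chi_tra/chi_tra.training_text"], "./langdata/kor/kor-chi_tra.training_text")`.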

[tesseract-ocr] Re: Change unicharset

2018-04-12 Thread Fanatico

And if I look at the "kor.unicharset" created after executing "training/tesstrain.sh", it only contains the Korean characters, even after I changed "kor.lstm-unicharset" from the "kor.traineddata"
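One way to see which characters are missing is to compare the training text against the generated unicharset. The sketch below is only an illustration, and both function names are hypothetical. It assumes the plain-text unicharset layout in which the first line holds the entry count, each following line starts with the character followed by a space, and the token `NULL` stands for the space character. It also checks single code points only, so multi-character unicharset entries are not handled.

```python
def read_unicharset(path):
    """Return the set of characters listed in a plain-text unicharset file.

    Assumed layout: line 1 is the entry count; every later line starts with
    the character itself, then a space, then its properties.
    """
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().split("\n")
    chars = set()
    for line in lines[1:]:  # skip the count on the first line
        if not line:
            continue
        token = line.split(" ", 1)[0]
        # Assumption: the token "NULL" represents the space character.
        chars.add(" " if token == "NULL" else token)
    return chars


def missing_characters(text, charset):
    """Return the sorted non-whitespace characters of text absent from charset."""
    return sorted({ch for ch in text if not ch.isspace() and ch not in charset})
```

Running `missing_characters` on the merged training text with the set read from "kor.unicharset" would list the hanja that did not make it into the character set.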

Re: [tesseract-ocr] Change unicharset

2018-04-12 Thread Fanatico

I already did it, but I keep getting this error on "training/tesstrain.sh":
No block overlapping textline: 가능한 튤립 첫 칼럼 절차 주 - 하기 말썽쟁이 같다 ㆍ 상품권 팁 |
No block overlapping textline: 겪은 덕숀 수 대 라이브 넥 ' 토론 게시판 １０ 헵번 등 관련 담뿍
No block overlapping textline: 자리 유통 댈 월 피부 쥬얼리 에 뿌찢 타겟 그룹 안팎일 똑똑한
No block

Re: [tesseract-ocr] Change unicharset

2018-04-12 Thread ShreeDevi Kumar

You cannot just overwrite the lstm.unicharset in a traineddata file; the unicharset has to be in sync with the other files in it, i.e. lstm, dawgs, recoder, etc.
> I'm merging the ```kor.training_text``` with the ```chi_tra.training_text``` for tests
You need to go through the complete training

[tesseract-ocr] Change unicharset

2018-04-12 Thread Fanatico

I'm trying to add Chinese to my Korean charset, but I'm not able to do it. Note: Since Korean can use some Chinese characters (hanja), I'm merging the ```kor.training_text``` with the ```chi_tra.training_text``` for tests. Reference: https://en.wikipedia.org/wiki/Hanja

[tesseract-ocr] Re: Tesseract 4.0 on Alpine Linux Docker Container

2018-04-12 Thread Kalven Schraut

It also looks like calling configure with --disable-openmp prevents the seg fault, which allows me to continue using tesseract for now. I would still like to figure out why the seg fault is occurring with multiple threads.
On Thursday, April 12, 2018 at 8:23:06 PM UTC-5, Kalven Schraut wrote:

[tesseract-ocr] Re: Tesseract 4.0 on Alpine Linux Docker Container

2018-04-12 Thread Kalven Schraut

Looks like the issue is from
#pragma omp parallel for num_threads(kNumThreads)
at https://github.com/tesseract-ocr/tesseract/blob/master/lstm/fullyconnected.cpp#L140
Does anyone more familiar with OpenMP know a possible reason for a seg fault there?
On Tuesday, April 10, 2018 at 7:29:31 AM UTC-5,

Re: [tesseract-ocr] How to include tesseract 4.00 to my visual studio c++ ??

2018-04-12 Thread Zdenko Podobny

You should download the source, then build and install it with cppan + cmake. See https://github.com/tesseract-ocr/tesseract/wiki/Compiling#develop-tesseract
Zdenko
2018-04-11 4:21 GMT+02:00 :
> i have been using tesseract 3.04 i could use it just by adding the include

[tesseract-ocr] Re: Tesseract 4.0 on Alpine Linux Docker Container

2018-04-12 Thread Kalven Schraut

After doing some more digging and running valgrind on the code, the last few lines were:
==360==    by 0x95B913A: tesseract::Tesseract::classify_word_and_language(int, PAGE_RES_IT*, tesseract::WordData*) (control.cpp:1314)
==360==    by 0x95BC63B: tesseract::Tesseract::RecogAllWordsPassN(int,


9 matches
