[tesseract-ocr] Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Firlefanz
I downloaded deu_frak.traineddata, Fraktur.traineddata and frk.traineddata 
to /usr/local/share/tessdata. But when I run

$ tesseract file.tiff -l Fraktur Fraktur

I get the error message

Error opening data file ./tessdata/Fraktur.traineddata 
Please make sure the TESSDATA_PREFIX environment variable is set to the 
parent directory of your "tessdata" directory. Failed loading language 
'Fraktur' Tesseract couldn't load any languages! Could not initialize 
tesseract.


-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/e190c5c4-9099-4077-98a8-bf03802e509d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] bounding boxes

2018-04-10 Thread Slavomir Klis
Hi, I'm trying to figure out how tesseract produces bounding boxes. What 
factors decide whether two words end up inside one box or are split into 
two distinct boxes? It seems random, or dependent on image quality, 
but I cannot find more specific rules.

Where to look for more information?

Slavomir



Re: [tesseract-ocr] Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Zdenko Podobny
First of all: your command is wrong. It should be constructed this way:

tesseract image output [options]


See tesseract --help for more details.

Next: the error message is clear:

Error opening data file ./tessdata/Fraktur.traineddata

You (or your installation) instructed tesseract to look for the traineddata
in the current directory (./). Do you have it there?

Next, tesseract gave you a hint on how to fix the problem ("Please make
sure the TESSDATA_PREFIX..."). Did you use it?
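
A minimal sketch of the argument order described above (image first, then the output base name, then options). The function name ocr_fraktur is made up for illustration, and the sketch assumes tesseract can already find Fraktur.traineddata.

```shell
#!/bin/sh
# Hypothetical helper: run tesseract with its arguments in the order
# "tesseract image output [options]".
ocr_fraktur() {
    image=$1        # input image, e.g. file.tiff
    output_base=$2  # output base name; tesseract appends .txt to it
    tesseract "$image" "$output_base" -l Fraktur
}

# Usage (needs tesseract and Fraktur.traineddata installed):
#   ocr_fraktur file.tiff Fraktur    # writes Fraktur.txt
```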


Zdenko

2018-04-10 11:39 GMT+02:00 Firlefanz:

> I downloaded deu_frak.traineddata, Fraktur.traineddata and frk.traineddata
> to /usr/local/share/tessdata. But when I run
>
> $ tesseract file.tiff -l Fraktur Fraktur
>
> I get the error message
>
> Error opening data file ./tessdata/Fraktur.traineddata
> Please make sure the TESSDATA_PREFIX environment variable is set to the
> parent directory of your "tessdata" directory. Failed loading language
> 'Fraktur' Tesseract couldn't load any languages! Could not initialize
> tesseract.
>


Re: [tesseract-ocr] Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Firlefanz

Thank you for your reply. I used the command following this guide 
https://www.youtube.com/watch?v=QhJiOCwz-_I -- if it's wrong, then I will 
not follow this guide anymore.

Yes, I have Fraktur.traineddata in /usr/local/share/tessdata.

I do not know how to change "the TESSDATA_PREFIX environment variable".



[tesseract-ocr] Tesseract 4.0 on Alpine Linux Docker Container

2018-04-10 Thread Kalven Schraut
I am attempting to use tesseract's API in my project. Everything works 
as expected when I run the code on Ubuntu, but I get a segfault after 
moving everything over to an Alpine Docker container.

The backtrace from the segfault:
#0  0x72c4a50a in ?? () from /usr/lib/libgomp.so.1
#1  0x72c45d02 in GOMP_parallel () from /usr/lib/libgomp.so.1
#2  0x7492cfea in tesseract::FullyConnected::Forward 
(this=0x577dbb20, debug=, input=..., 
input_transpose=,
scratch=0x577f9fc8, output=0x5877b6a0) at fullyconnected.cpp:140
#3  0x749598ff in tesseract::Series::Forward (this=0x57803f60, 
debug=, input=..., input_transpose=,
scratch=0x577f9fc8, output=0x5877b6a0) at series.cpp:123
#4  0x749598ff in tesseract::Series::Forward (this=0x57803d60, 
debug=, input=..., input_transpose=,
scratch=0x577f9fc8, output=0x7fffc380) at series.cpp:123
#5  0x7493b8ce in tesseract::LSTMRecognizer::RecognizeLine 
(this=this@entry=0x577f9c80, image_data=..., invert=invert@entry=true,
debug=debug@entry=false, re_invert=re_invert@entry=false, 
upside_down=upside_down@entry=false, scale_factor=0x7fffc35c, 
inputs=0x7fffc410,
outputs=0x7fffc380) at lstmrecognizer.cpp:256
#6  0x7493c4d0 in tesseract::LSTMRecognizer::RecognizeLine 
(this=0x577f9c80, image_data=..., invert=invert@entry=true, debug=false,
worst_dict_cert=worst_dict_cert@entry=-3.5714285373687744, 
line_box=..., words=words@entry=0x7fffc600) at lstmrecognizer.cpp:190
#7  0x7480978f in tesseract::Tesseract::LSTMRecognizeWord 
(this=this@entry=0x57782420, block=..., row=row@entry=0x5861c2e0, 
word=,
words=words@entry=0x7fffc600) at linerec.cpp:241
#8  0x747ef729 in tesseract::Tesseract::classify_word_pass1 
(this=0x57782420, word_data=..., in_word=0x585bd4e0, 
out_words=0x7fffc600)
at control.cpp:1373
#9  0x747f09a5 in tesseract::Tesseract::RetryWithLanguage 
(this=0x57782420, word_data=..., recognizer=, 
debug=debug@entry=false,
in_word=0x585bd4e0, best_words=0x7fffc6e0) at control.cpp:898
#10 0x747f113b in tesseract::Tesseract::classify_word_and_language 
(this=this@entry=0x57782420, pass_n=pass_n@entry=1,
pr_it=pr_it@entry=0x7fffc850, 
word_data=word_data@entry=0x5863ac08) at control.cpp:1314
#11 0x747f463c in tesseract::Tesseract::RecogAllWordsPassN 
(this=this@entry=0x57782420, pass_n=pass_n@entry=1, 
monitor=monitor@entry=0x0,
pr_it=pr_it@entry=0x7fffc850, words=words@entry=0x7fffc830) at 
control.cpp:265
#12 0x747f612d in tesseract::Tesseract::recog_all_words 
(this=0x57782420, page_res=0x585d6160, monitor=monitor@entry=0x0,
target_word_box=target_word_box@entry=0x0, 
word_config=word_config@entry=0x0, dopasses=dopasses@entry=0) at 
control.cpp:352

I first noticed the issue with the tesseract-git package on Alpine. I have 
since compiled the master branch of tesseract-ocr myself, and I am still 
getting the segfault.

Also my compiled version of tesseract runs fine through the CLI.

I am lost as to what else could be the problem and would appreciate any 
help/direction on how to solve this issue.



Re: [tesseract-ocr] Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Zdenko Podobny
If you followed someone's tutorial, you should complain to its author ;-).

I am not familiar with Mac, but on Linux you can do it (on the command line)
this way:

 export TESSDATA_PREFIX=/usr/local/share/

Maybe it is similar on Mac. Try to google how to set an environment variable
on Mac.
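
A minimal sketch of setting the variable the way the error message quoted in this thread asks for (the parent directory of "tessdata"). The helper name tessdata_prefix_for and the example path are assumptions, not details confirmed for the installation discussed here. Other tesseract versions may expect a different directory, so follow the wording of the error your version prints.

```shell
#!/bin/sh
# Hypothetical helper: given a tessdata directory and a language name,
# check that <dir>/<lang>.traineddata exists, then print the parent of
# that directory (what the quoted error message asks for).
tessdata_prefix_for() {
    tessdata_dir=$1
    lang=$2
    if [ ! -f "$tessdata_dir/$lang.traineddata" ]; then
        echo "missing: $tessdata_dir/$lang.traineddata" >&2
        return 1
    fi
    dirname "$tessdata_dir"
}

# Usage (the path is an assumption; use wherever the files really are):
#   TESSDATA_PREFIX=$(tessdata_prefix_for /usr/local/share/tessdata Fraktur) \
#       && export TESSDATA_PREFIX
#   echo "$TESSDATA_PREFIX"   # an empty line means the variable is not set
```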

Zdenko

2018-04-10 13:43 GMT+02:00 Firlefanz:

>
> Thank you for your reply. I used the command following this guide
> https://www.youtube.com/watch?v=QhJiOCwz-_I -- if it's wrong, then I will
> not follow this guide anymore.
>
> Yes, I have Fraktur.traineddata in /usr/local/share/tessdata.
>
> I do not know how to change "the TESSDATA_PREFIX environment variable".
>


[tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Fanatico
Did you install it using brew or compile it yourself?

Try typing this in the terminal and post the result here:

echo $TESSDATA_PREFIX




[tesseract-ocr] Doubt on "--eval_listfile"

2018-04-10 Thread Fanatico
Platform: MAC OS X
Tesseract: 4.0.0-beta.1-69-g10f4

When I execute a command like:

SCROLLVIEW_PATH=~/projects/tesseract/java \
  ~/projects/tesseract/training/lstmtraining \
    --debug_interval 100 \
    --continue_from ~/projects/ocr/training/kortrain/kor_from_full/kor.lstm \
    --traineddata ~/projects/ocr/training/kortrain/new_train/kor/kor.traineddata \
    --append_index 5 \
    --net_spec '[Lfx256 O1c111]' \
    --model_output ~/projects/ocr/training/kortrain/kor_from_full/base \
    --train_listfile ~/projects/ocr/training/kortrain/new_train/kor.training_files.txt \
    --eval_listfile ~/projects/ocr/training/kortrain/eval/kor.training_files.txt \
    --target_error_rate 1 \
    &>~/projects/ocr/training/kortrain/kor_from_full/basetrain.log

I have "--train_listfile", which gives the location of my training files for 
each font, and I have "--eval_listfile", which I suppose is the location of 
the training files used to test the result of the training.

So my questions are:
1 - Why would I train with the fonts "A", "B" and "C" but test with the 
fonts "D", "E" and "F"?
2 - And if I need to test using the same fonts, then why do I need to pass 
the same file twice?



[tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Firlefanz
Nothing happens if I type in echo $TESSDATA_PREFIX

I thought about installing tesseract 4.0 beta. Is there a step-by-step guide 
for this? With brew install tesseract I cannot choose the version; 
it installs 3.05.01.


On Tuesday, 10 April 2018 15:07:18 UTC+2, Fanatico wrote:
>
> Did you install it using brew or compile it yourself?
>
> Try typing this in the terminal and post the result here:
>
> echo $TESSDATA_PREFIX
>
>
>



Re: [tesseract-ocr] Doubt on "--eval_listfile"

2018-04-10 Thread ShreeDevi Kumar
To make sure that the model is not overfitted to training data, your eval
set should be different.

You can use a different text file and different fonts from the training set
to check that the model performs well on text and fonts it has not seen
earlier.
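
One way to keep the two lists disjoint by font, as a minimal sketch. It assumes each line of the list file is a path whose file name contains the font name (names such as kor.FontD.exp0.lstmf below are made-up placeholders), and the helper name split_by_font is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split one list file into a training list and an
# evaluation list. Lines matching the held-out font pattern go to the
# eval list; every other line goes to the training list.
split_by_font() {
    all_list=$1
    train_list=$2
    eval_list=$3
    held_out=$4      # extended regex, e.g. 'FontD|FontE|FontF'
    [ -r "$all_list" ] || return 1
    # grep exits non-zero when nothing matches; that is not an error here.
    grep -E -- "$held_out" "$all_list" > "$eval_list" || true
    grep -E -v -- "$held_out" "$all_list" > "$train_list" || true
}

# Usage (file names are placeholders):
#   split_by_font kor.all_files.txt kor.training_files.txt \
#       kor.eval_files.txt 'FontD|FontE|FontF'
```

The two resulting files could then be passed as --train_listfile and --eval_listfile respectively.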

On Tue 10 Apr, 2018, 8:16 PM Fanatico wrote:

> Platform: MAC OS X
> Tesseract: 4.0.0-beta.1-69-g10f4
>
> When I execute a command like:
>
> SCROLLVIEW_PATH=~/projects/tesseract/java \
>   ~/projects/tesseract/training/lstmtraining \
>     --debug_interval 100 \
>     --continue_from ~/projects/ocr/training/kortrain/kor_from_full/kor.lstm \
>     --traineddata ~/projects/ocr/training/kortrain/new_train/kor/kor.traineddata \
>     --append_index 5 \
>     --net_spec '[Lfx256 O1c111]' \
>     --model_output ~/projects/ocr/training/kortrain/kor_from_full/base \
>     --train_listfile ~/projects/ocr/training/kortrain/new_train/kor.training_files.txt \
>     --eval_listfile ~/projects/ocr/training/kortrain/eval/kor.training_files.txt \
>     --target_error_rate 1 \
>     &>~/projects/ocr/training/kortrain/kor_from_full/basetrain.log
>
> I have "--train_listfile", which gives the location of my training files
> for each font, and I have "--eval_listfile", which I suppose is the location
> of the training files used to test the result of the training.
>
> So my questions are:
> 1 - Why would I train with the fonts "A", "B" and "C" but test with the
> fonts "D", "E" and "F"?
> 2 - And if I need to test using the same fonts, then why do I need to pass
> the same file twice?
>


[tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Fanatico
Try this command in the console:
brew info tesseract

This should print some info, including the path where your tesseract is 
installed. Copy that path and run this command in your console:
export TESSDATA_PREFIX=[the path you just copied]

Then try your command again. If it works, you can paste the export line into 
your .bash_profile, or run it in every new terminal you open.
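
A minimal sketch of persisting the export line, assuming bash reads ~/.bash_profile on your system. The helper name persist_tessdata_prefix is hypothetical; it appends the line only if it is not already present, so running it twice does not duplicate it.

```shell
#!/bin/sh
# Hypothetical helper: append "export TESSDATA_PREFIX=<prefix>" to a
# profile file unless that exact line is already there.
persist_tessdata_prefix() {
    prefix=$1
    profile=${2:-$HOME/.bash_profile}
    line="export TESSDATA_PREFIX=$prefix"
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF "$line" "$profile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile"
    fi
}

# Usage (replace the argument with the path you copied):
#   persist_tessdata_prefix "[the path you just copied]"
```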

I wrote a step-by-step guide to building and using tesseract 4.0 on my Mac; you 
can see it here.

Note: Read everything before doing it.



[tesseract-ocr] Re: Doubt on "--eval_listfile"

2018-04-10 Thread Fanatico
I see, thanks for the reply.

On Tuesday, 10 April 2018 11:45:59 UTC-3, Fanatico wrote:
>
> Platform: MAC OS X
> Tesseract: 4.0.0-beta.1-69-g10f4
>
> When I execute a command like:
>
> SCROLLVIEW_PATH=~/projects/tesseract/java \
>   ~/projects/tesseract/training/lstmtraining \
>     --debug_interval 100 \
>     --continue_from ~/projects/ocr/training/kortrain/kor_from_full/kor.lstm \
>     --traineddata ~/projects/ocr/training/kortrain/new_train/kor/kor.traineddata \
>     --append_index 5 \
>     --net_spec '[Lfx256 O1c111]' \
>     --model_output ~/projects/ocr/training/kortrain/kor_from_full/base \
>     --train_listfile ~/projects/ocr/training/kortrain/new_train/kor.training_files.txt \
>     --eval_listfile ~/projects/ocr/training/kortrain/eval/kor.training_files.txt \
>     --target_error_rate 1 \
>     &>~/projects/ocr/training/kortrain/kor_from_full/basetrain.log
>
> I have "--train_listfile", which gives the location of my training files
> for each font, and I have "--eval_listfile", which I suppose is the location
> of the training files used to test the result of the training.
>
> So my questions are:
> 1 - Why would I train with the fonts "A", "B" and "C" but test with the
> fonts "D", "E" and "F"?
> 2 - And if I need to test using the same fonts, then why do I need to pass
> the same file twice?
>



[tesseract-ocr] Re: Doubt on "--eval_listfile"

2018-04-10 Thread Fanatico
I just had a thought: can I pass only the ".training_text" file as a param,
like --training_text?



[tesseract-ocr] Re: Doubt on "--eval_listfile"

2018-04-10 Thread Fanatico
When I asked about passing the ".training_text" as a param, I meant in the 
creation of the training data with "training/tesstrain.sh".

On Tuesday, 10 April 2018 13:30:05 UTC-3, Fanatico wrote:
>
> I just had a thought: can I pass only the ".training_text" file as a param,
> like --training_text?
>



Re: [tesseract-ocr] Re: Doubt on "--eval_listfile"

2018-04-10 Thread ShreeDevi Kumar
Yes, and you can use different text files for training and eval.



ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Apr 10, 2018 at 10:01 PM, Fanatico wrote:

> When I asked about passing the ".training_text" as a param, I meant in the
> creation of the training data with "training/tesstrain.sh".
>
> On Tuesday, 10 April 2018 13:30:05 UTC-3, Fanatico wrote:
>>
>> I just had a thought: can I pass only the ".training_text" file as a param,
>> like --training_text?
>>


[tesseract-ocr] How to train for multiple languages?

2018-04-10 Thread Fanatico
I want to train for kor+chi. How can I do it?



Re: [tesseract-ocr] How to train for multiple languages?

2018-04-10 Thread ShreeDevi Kumar
Ray has not given instructions for multi-language or script-type training.

You can try to concatenate the two training texts and word lists, merge the
unicharsets (merge_unicharsets command), and then do replace-a-layer
training with your primary language as the base.

Also, unpack the Han and Hangul script traineddata using combine_tessdata
-u and look at the unicharset, word lists, etc. in it.
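
A rough sketch of the steps above. The file names are placeholders and the helper concat_texts is hypothetical. The merge_unicharsets and combine_tessdata invocations are left as comments: they assume the usual argument order for those tools (inputs first and output last for merge_unicharsets; -u followed by the traineddata file and an output prefix for combine_tessdata), so check each tool's own usage message before relying on them.

```shell
#!/bin/sh
# Hypothetical helper: concatenate several text files (training texts or
# word lists) into one combined file. First argument is the output file.
concat_texts() {
    out=$1
    shift
    cat "$@" > "$out"
}

# Remaining steps, with placeholder file names (assumed argument order):
#   concat_texts korchi.training_text kor.training_text chi.training_text
#   concat_texts korchi.wordlist kor.wordlist chi.wordlist
#   merge_unicharsets kor.unicharset chi.unicharset korchi.unicharset
#   combine_tessdata -u Hangul.traineddata Hangul.
```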

On Wed 11 Apr, 2018, 7:19 AM Fanatico wrote:

> I want to train for kor+chi. How can I do it?
>


[tesseract-ocr] Column splitting failed around fuzzy line

2018-04-10 Thread Ewan Mellor


Hi,


I am using Tesseract 4 (git 10f4998a) to process a file with two columns.  
A snippet of the image is shown below.  The problem is that there is a 
fuzzy line between the two columns, and the column detector has got 
confused.  I've ended up with one block covering the first column up to 
"The" on the second line, but then a block covering both columns with the 
"patient has ..." all the way across to "history of low".


I've looked in the debug views, and it looks to me like the line removal 
hasn't managed to remove that fuzzy line down the middle.  The "good" is 
then close enough that the column finder is deciding to merge the two 
blocks on that line.


Looking at the code in linefind.cpp and colfind.cpp, I see lots of 
constants for various thresholds, but I don't see any configurable ones, 
and I'm not sure which way to go now.  Would it be better to work on the 
line detector in linefind.cpp and try and get rid of that vertical line?  
Or would I be better to run a columnar histogram and try and do column 
splitting myself?  Or should I ignore the fact that the line hasn't been 
removed, and concentrate on tightening up the column finder so that it's 
able to separate these two columns correctly?  It seems to me that there's 
enough of a gap there that it ought to be able to separate the columns (it 
does a pretty good job on the rest of the document, so it can't be far off).
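
For the histogram option, a minimal sketch could look like this. It assumes the page has already been binarized and dumped as rows of space-separated 0/1 values (1 = ink), which is a made-up intermediate format for illustration and not something tesseract produces; column_profile and widest_gap are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: read rows of space-separated 0/1 pixels (1 = ink)
# from a file and print the ink count of every column on one line.
column_profile() {
    awk '
        { for (i = 1; i <= NF; i++) sum[i] += $i
          if (NF > width) width = NF }
        END { for (i = 1; i <= width; i++)
                  printf "%d%s", sum[i], (i < width ? " " : "\n") }
    ' "$1"
}

# Hypothetical helper: read one profile line on stdin and print the 1-based
# start and end of the widest run of columns with ink count <= threshold.
widest_gap() {
    awk -v max="$1" '
        { best_len = 0; run = 0
          for (i = 1; i <= NF; i++) {
              if ($i <= max) {
                  run++
                  if (run > best_len) { best_len = run; best_end = i }
              } else {
                  run = 0
              }
          }
          if (best_len > 0) print best_end - best_len + 1, best_end }
    '
}

# Usage (pixels.txt is a placeholder for the dumped bitmap):
#   column_profile pixels.txt | widest_gap 0
```

A vertical rule in the gutter would show up in the profile as a narrow peak between two low runs, so the threshold and the choice of run would need tuning for a page like the one described.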


Any recommendations would be appreciated.


Thanks,


Ewan.








[tesseract-ocr] How to include tesseract 4.00 to my visual studio c++ ??

2018-04-10 Thread abdelsalam . h . a . a
I have been using tesseract 3.04, which I could use just by adding the include 
file to my project. But when I downloaded the new version, tesseract 4.00, 
there was no include file. Can anyone help me with this? Thank you.
