[tesseract-ocr] bounding boxes

2018-04-10 Thread Slavomir Klis
Hi, I'm trying to figure out how Tesseract produces bounding boxes. What factors decide whether two words end up inside one box or are split into two distinct boxes? It seems to be random or to depend on image quality, but I cannot find more specific rules. Where to look for more

[tesseract-ocr] Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Firlefanz
I downloaded deu_frak.traineddata, Fraktur.traineddata and frk.traineddata to usr/loca/share/tessdata. But when using $ tesseract file.tiff -l Fraktur Fraktur I get the error message: Error opening data file ./tessdata/Fraktur.traineddata Please make sure the TESSDATA_PREFIX environment

Re: [tesseract-ocr] Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Firlefanz
Thank you for your reply. I used the command following this guide https://www.youtube.com/watch?v=QhJiOCwz-_I -- if it's wrong, then I will not follow this guide anymore. Yes, I have Fraktur.traineddata in usr/loca/share/tessdata. I do not know how to change "the TESSDATA_PREFIX environment

Re: [tesseract-ocr] Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Zdenko Podobny
First of all: your command is wrong. It should be constructed this way: tesseract image output [options] See tesseract --help for more details. Next: the error message is clear: Error opening data file ./tessdata/Fraktur.traineddata You (or your installation) instructed to look for traineddata

[tesseract-ocr] Doubt on "--eval_listfile"

2018-04-10 Thread Fanatico
Platform: MAC OS X Tesseract: 4.0.0-beta.1-69-g10f4 When I execute a command like: SCROLLVIEW_PATH=~/projects/tesseract/java \ ~/projects/tesseract/training/lstmtraining \ --debug_interval 100 \ --continue_from ~/projects/ocr/training/kortrain/kor_from_full/kor.lstm \ --traineddata

Re: [tesseract-ocr] Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Zdenko Podobny
If you followed someone's tutorial you should complain to its author ;-). I am not familiar with Mac, but on Linux you can do it (on the command line) this way: export TESSDATA_PREFIX=/usr/loca/share/ Maybe it is similar on Mac. Try to google how to set an environment variable on Mac. Zdenko 2018-04-10
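
A minimal sketch of how one might check a candidate TESSDATA_PREFIX before exporting it. It assumes, as far as I understand the two release lines, that Tesseract 3.x expects the prefix to be the parent of the tessdata directory while 4.x expects the tessdata directory itself, so the helper looks in both places. The name find_traineddata is hypothetical and not part of Tesseract.

```shell
# Hypothetical helper (not part of Tesseract): print the path of
# <lang>.traineddata if it exists under the given prefix, else fail.
# Assumption: 3.x reads <prefix>/tessdata/<lang>.traineddata,
# 4.x reads <prefix>/<lang>.traineddata.
find_traineddata() (
  prefix=${1%/}   # drop one trailing slash, if any
  lang=$2
  for candidate in "$prefix/tessdata/$lang.traineddata" "$prefix/$lang.traineddata"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      exit 0      # leaves only the function's subshell
    fi
  done
  printf 'no %s.traineddata found under %s\n' "$lang" "$prefix" >&2
  exit 1
)
```

Calling it as find_traineddata "$TESSDATA_PREFIX" Fraktur before running tesseract shows whether the prefix points at a directory that actually contains the language file.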

[tesseract-ocr] Tesseract 4.0 on Alpine Linux Docker Container

2018-04-10 Thread Kalven Schraut
I am attempting to use Tesseract's API in my project, and everything works as expected on Ubuntu when running the code, but I am receiving a segfault after moving everything over to an Alpine Docker container. The backtrace from the segfault: #0 0x72c4a50a in ?? () from

[tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Firlefanz
Nothing happens if I type in echo $TESSDATA_PREFIX I thought about installing tesseract 4.0beta; is there a step-by-step guide on how to do this? With brew install tesseract I cannot choose the version, i.e. it's 3.05.01 On Tuesday, 10 April 2018 15:07:18 UTC+2, Fanatico wrote: > > You

[tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Fanatico
Did you install it using brew or compile it yourself? Try typing this in the terminal and post the result here: echo $TESSDATA_PREFIX -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails

Re: [tesseract-ocr] Doubt on "--eval_listfile"

2018-04-10 Thread ShreeDevi Kumar
To make sure that the model is not overfitted to training data, your eval set should be different. You can use a different text file, different fonts from the training set to check that the model performs well on text and fonts it has not seen earlier. On Tue 10 Apr, 2018, 8:16 PM Fanatico,
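
One way to sanity-check the advice above is to confirm that the training and evaluation list files share no entries. This is only a sketch: listfile_overlap is a hypothetical helper, the file names are placeholders, and it compares list entries only, so it cannot tell whether two differently named files were rendered from the same text or font.

```shell
# Hypothetical helper: print every entry that appears in BOTH list files.
# Empty output means the two lists are disjoint.
listfile_overlap() {
  # Deduplicate each list on its own first, so that an entry repeated
  # inside a single file is not reported as an overlap.
  { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}

# Intended use (paths are placeholders), before passing the lists to
# lstmtraining via --train_listfile and --eval_listfile:
#   listfile_overlap train.list eval.list
```

If the command prints anything, those entries would be evaluated on data the model has already seen during training.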

Re: [tesseract-ocr] Re: Doubt on "--eval_listfile"

2018-04-10 Thread ShreeDevi Kumar
Yes, and you can use different text files for training and eval. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Apr 10, 2018 at 10:01 PM, Fanatico wrote: > When I asked about passing

[tesseract-ocr] Re: Doubt on "--eval_listfile"

2018-04-10 Thread Fanatico
I see, thanks for the reply. On Tuesday, 10 April 2018 11:45:59 UTC-3, Fanatico wrote: > > Platform: MAC OS X > Tesseract: 4.0.0-beta.1-69-g10f4 > > When I execute a command like: > > SCROLLVIEW_PATH=~/projects/tesseract/java \ > ~/projects/tesseract/training/lstmtraining \ >

[tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Fanatico
Try this command in the console: brew info tesseract This should return some info, including the path where your tesseract is installed. Copy it and execute this command in your console: export TESSDATA_PREFIX=[the path you just copied] Try to execute your command again; if it works you can past

[tesseract-ocr] Re: Doubt on "--eval_listfile"

2018-04-10 Thread Fanatico
When I asked about passing the ".training_text" as a param, I meant during the creation of the training data with "training/tesstrain.sh" On Tuesday, 10 April 2018 13:30:05 UTC-3, Fanatico wrote: > > I just thought, but can I pass only the ".training_text" file as a param ? > like --training_text > --

[tesseract-ocr] Re: Doubt on "--eval_listfile"

2018-04-10 Thread Fanatico
I just thought, but can I pass only the ".training_text" file as a param ? like --training_text

Re: [tesseract-ocr] How to train for multiple languages?

2018-04-10 Thread ShreeDevi Kumar
Ray has not given instructions for multi-language or script-type training. You can try to concatenate the two training texts and word lists, merge the unicharsets (merge_unicharsets command), and then do replace-a-layer training with your primary language as base. Also, unpack the Han and Hangul
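
The concatenation steps suggested above could be sketched roughly as follows. The helper names combine_texts and combine_wordlists are hypothetical, every file name is a placeholder, and the merge_unicharsets argument order shown in the comment (inputs first, output last) is my reading of the tool's usage message, so it is worth checking against your build.

```shell
# Sketch only: both helpers are hypothetical and all file names passed
# to them are placeholders.

# Concatenate several training texts in the order given.
combine_texts() {
  cat "$@"
}

# Merge several word lists into one sorted list without duplicates.
# LC_ALL=C keeps the sort order independent of the user's locale.
combine_wordlists() {
  LC_ALL=C sort -u "$@"
}

# The unicharsets would then be merged with the merge_unicharsets tool
# named in the reply, for example:
#   merge_unicharsets first.unicharset second.unicharset combined.unicharset
```

Redirecting each helper's output to a new file, for example combine_texts first.training_text second.training_text > combined.training_text, gives the combined inputs for the training step.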

[tesseract-ocr] How to train for multiple languages?

2018-04-10 Thread Fanatico
I want to train for kor+chi. How can I do it?