[tesseract-ocr] Change text from training

2018-04-15 Thread Fanatico
What is the correct way to change the training text for a traineddata that I'm working on? I'm training a new traineddata and it has started to give some results, but now I want to change the text used to train it and continue from where I stopped. How can I do it? -- You received this message

[tesseract-ocr] Re: Change unicharset

2018-04-12 Thread Fanatico
And if I look at the "kor.unicharset" created after executing "training/tesstrain.sh", it only contains the Korean characters, even after I changed "kor.lstm-unicharset" in the "kor.traineddata".

Re: [tesseract-ocr] Change unicharset

2018-04-12 Thread Fanatico
training_text``` for tests > > You need to go through the complete training process after this. Only then > both set of characters will reflected in it. > > You can try add a layer training with tessdata_best/kor.traineddata to > continue from. > > ShreeDevi > __

[tesseract-ocr] Change unicharset

2018-04-12 Thread Fanatico
I'm trying to add Chinese to my Korean charset, but I'm not able to do it. Note: since Korean can use some Chinese characters (hanja), I'm merging the ```kor.training_text``` with the ```chi_tra.training_text``` for tests. Reference: https://en.wikipedia.org/wiki/Hanja
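
A minimal sketch of the merge described in this message, assuming it is a plain concatenation of the two text files. The helper name `merge_training_text`, the temporary directory, and the one-line sample contents are placeholders for illustration; the real inputs would be the `kor.training_text` and `chi_tra.training_text` files from a langdata checkout.

```shell
# Concatenate two training text files into one merged file.
# merge_training_text is a hypothetical helper, not part of Tesseract.
merge_training_text() {
  cat "$1" "$2" > "$3"
}

# Stand-in inputs so the sketch runs anywhere; replace them with the real
# kor.training_text and chi_tra.training_text paths.
workdir="$(mktemp -d)"
printf '%s\n' '한국어 문장' > "$workdir/kor.training_text"
printf '%s\n' '漢字 文章' > "$workdir/chi_tra.training_text"

merge_training_text "$workdir/kor.training_text" \
  "$workdir/chi_tra.training_text" \
  "$workdir/merged.training_text"

# Count the lines that ended up in the merged file.
merged_lines="$(wc -l < "$workdir/merged.training_text" | tr -d ' ')"
echo "merged lines: $merged_lines"
```

Merging the text alone does not change an existing traineddata; the unicharset only picks up the new characters once the training data is regenerated from the merged file.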

Re: [tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-11 Thread Fanatico
on an image that has Korean and Chinese it is going to recognize some Korean characters as Chinese and some Chinese characters as Korean. On Monday, 9 April 2018 05:15:57 UTC-3, shree wrote: > > Leftover from 3.04, my guess. > > On Mon 9 Apr, 2018, 12:52 PM Fanatico, <fana

[tesseract-ocr] Re: How to train for multiple languages?

2018-04-11 Thread Fanatico
Thanks, I was going to do this. I just wanted to be sure there wasn't a way to train 2 traineddata like the current one.

[tesseract-ocr] How to train for multiple languages?

2018-04-10 Thread Fanatico
I want to train for kor+chi. How can I do it?

[tesseract-ocr] Re: Doubt on "--eval_listfile"

2018-04-10 Thread Fanatico
When I asked about passing the ".training_text" as a param, I meant in the creation of the training data with "training/tesstrain.sh". On Tuesday, 10 April 2018 13:30:05 UTC-3, Fanatico wrote: > > I just thought, but can I pass only the ".training_text" file

[tesseract-ocr] Re: Doubt on "--eval_listfile"

2018-04-10 Thread Fanatico
I see, thanks for the reply. On Tuesday, 10 April 2018 11:45:59 UTC-3, Fanatico wrote: > > Platform: MAC OS X > Tesseract: 4.0.0-beta.1-69-g10f4 > > When I execute a command like: > > SCROLLVIEW_PATH=~/projects/tesseract/java \ > ~/projects/tessera

[tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Fanatico
Try this command in the console: "brew info tesseract". This should return some info, including the path where your tesseract is installed. Copy it and execute this command in your console: "export TESSDATA_PREFIX=[the path you just copied]". Then try to execute your code again; if it works you can past
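
The steps in this message can be sketched as follows. The path below is a placeholder, not a value from the thread; use whatever "brew info tesseract" reports on your machine. Whether the variable should point at the tessdata directory itself or at its parent depends on the Tesseract version, so treat that as something to verify.

```shell
# Placeholder path: substitute the directory reported by "brew info tesseract".
tessdata_dir="/usr/local/share/tessdata"

# Make the location visible to tesseract for this shell session.
export TESSDATA_PREFIX="$tessdata_dir"

# Sanity check: report whether any .traineddata files are visible there.
if ls "$TESSDATA_PREFIX"/*.traineddata >/dev/null 2>&1; then
  prefix_status="traineddata found"
else
  prefix_status="no traineddata found"
fi
echo "$TESSDATA_PREFIX: $prefix_status"
```

The export only lasts for the current shell session, so it has to be repeated in each new terminal unless it is stored somewhere the shell reads at startup.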

[tesseract-ocr] Doubt on "--eval_listfile"

2018-04-10 Thread Fanatico
Platform: MAC OS X
Tesseract: 4.0.0-beta.1-69-g10f4
When I execute a command like:
SCROLLVIEW_PATH=~/projects/tesseract/java \
~/projects/tesseract/training/lstmtraining \
--debug_interval 100 \
--continue_from ~/projects/ocr/training/kortrain/kor_from_full/kor.lstm \
--traineddata

[tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Fanatico
Did you install it using brew or compile it yourself? Try typing this in the terminal and post the result here: echo $TESSDATA_PREFIX

Re: [tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-09 Thread Fanatico
The config from kor already had it:
#Fixes https://github.com/tesseract-ocr/tesseract/issues/1009
preserve_interword_spaces 1

Re: [tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-09 Thread Fanatico
9 Apr, 2018, 11:48 AM Fanatico, <fanati...@gmail.com > > wrote: > >> I used one traineddata that I created on removing the top layer from the >> kor.traineddata from "tessdata_best", after this I replaced this >> traineddata with the one from "tess

Re: [tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-09 Thread Fanatico
I used a traineddata that I created by removing the top layer from the kor.traineddata from "tessdata_best". After this I replaced this traineddata with the one from "tessdata_best" and got the same problem. Yes, it includes chi_tra as a sublanguage: tessedit_load_sublangs chi_tra
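
A small sketch of how the sublanguage setting could be checked, assuming the language config has already been extracted from the traineddata into a plain text file (the extraction step is not shown). The helper name `list_sublangs` and the stand-in config file are hypothetical; the two settings written into it are the ones quoted in this thread.

```shell
# Print the value of every tessedit_load_sublangs entry in a config file.
# list_sublangs is a hypothetical helper, not part of Tesseract.
list_sublangs() {
  awk '$1 == "tessedit_load_sublangs" { $1 = ""; sub(/^ +/, ""); print }' "$1"
}

# Stand-in config holding the two settings mentioned in this thread.
config_file="$(mktemp)"
printf '%s\n' 'tessedit_load_sublangs chi_tra' \
  'preserve_interword_spaces 1' > "$config_file"

sublangs="$(list_sublangs "$config_file")"
echo "sublanguages: $sublangs"
```

If the helper prints chi_tra, the config asks Tesseract to load chi_tra alongside kor, which would be consistent with Chinese characters appearing in output produced with "-l kor".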

[tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-08 Thread Fanatico
I'm running tesseract with the "-l kor" param, but it is detecting some Chinese characters. The image really does have 3 Chinese characters, and none of them is returned correctly (I'm not expecting them to be), but the other Korean characters are being recognized as Chinese

[tesseract-ocr] Install and run tesseract 4.0 on MAC OSX step by step

2018-04-08 Thread Fanatico
I just posted a step-by-step guide in the repo issues covering what I needed to do to use Tesseract 4.0 on my Mac, so I'm sharing the link in case someone runs into the same problems I did. Note: it can save a few days of your life. https://github.com/tesseract-ocr/tesseract/issues/1453

Re: [tesseract-ocr] Failed to build ScrollView.jar on MAC OSX

2018-04-08 Thread Fanatico
I managed to build it, but I needed to clone the repo and build it myself to use it. So I don't recommend installing tesseract using brew.

Re: [tesseract-ocr] Failed to build ScrollView.jar on MAC OSX

2018-04-07 Thread Fanatico
From the java folder ("cd ~/projects/tesseract/java" in my case). On Saturday, 7 April 2018 12:40:29 UTC-3, shree wrote: > > Please see > https://github.com/tesseract-ocr/tesseract/blob/master/Makefile.am > > From which dir did you try > > make ScrollView.jar > > ShreeDevi >

[tesseract-ocr] Failed to build ScrollView.jar on MAC OSX

2018-04-07 Thread Fanatico
Hi. I finally got the training from 4.0 to work, but I was unable to build the ScrollView.jar, so I'm currently running the test with "--debug_interval -1". Can someone help me? System Platform: MAC OS X 10.13.3 (installed with brew) Tesseract: 4.0.0-beta.1 leptonica: 1.75.3 libjpeg 9c :

Re: [tesseract-ocr] ERROR: exp0.box does not exist or is not readable

2018-04-07 Thread Fanatico
Please look here: https://github.com/tesseract-ocr/tesseract/issues/736 On Saturday, 7 April 2018 04:35:36 UTC-3, shree wrote: > > Look in your tmp directory in the sub folders referred in the console > output > > Check the log file and other files there > > On Sat 7 Apr, 2018

Re: [tesseract-ocr] ERROR: exp0.box does not exist or is not readable

2018-04-06 Thread Fanatico
Yes, the location is correct. I tried putting the full path to the folder and got the same error. I just cloned the https://github.com/tesseract-ocr/langdata repo. On Friday, 6 April 2018 23:28:06 UTC-3, shree wrote: > > Is your langdata in --langdata_dir ../../langdata

[tesseract-ocr] ERROR: exp0.box does not exist or is not readable

2018-04-06 Thread Fanatico
I'm trying to execute the training from the 4.0 tutorial, but I'm getting an error. Can someone help with this? Platform: MAC OS X 10.13.3 Tesseract: 4.0.0-beta.1 leptonica: 1.75.3 libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 Code used ../../tesseract/training/tesstrain.sh \

Re: [tesseract-ocr] Error at training 4.0

2018-04-05 Thread Fanatico
Thanks for the quick response, I did not see this part in the documentation ... My problem is that for the image "kor.AppleMyungjo.exp0.tif" tesseract recognizes nothing and the box file is empty, and for the image "kor.AppleMyungjo.exp1.tif" it does not recognize the last quotation marks

[tesseract-ocr] Error at training 4.0

2018-04-04 Thread Fanatico
Hi, I'm new to Tesseract and OCR in general, and need some help training my tesseract. Config Platform: Mac OS X 10.13.3 Tesseract Version: 4.0.0-beta.1 leptonica: 1.75.3 libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 images used kor.AppleMyungjo.exp1.tif