[tesseract-ocr] does tesseract have some kind of cache?

2014-07-17 Thread Jing JC
I am exhausted from trying to figure out how user-words and user-patterns work. I ran over 10 different experiments, and the result never matched the word I put in user-words. What are the possible reasons? Thanks again in advance. -- You received this message because you are subscribed to the Google

[tesseract-ocr] can anyone shed light on their experiments/experiences with tesseract 3.03? I only plan to use the text2image function in 3.03; is it still worth upgrading?

2014-07-17 Thread Jing JC
I do not need a zillion fonts or images though; I just want to train some numbers. I read through all the posts and didn't see many cons to upgrading to 3.03. Any hints? Thanks in advance.

[tesseract-ocr] JTessbox Modifying the boxes

2014-07-17 Thread Jing JC
https://lh4.googleusercontent.com/-Cop8Qo5A9VQ/U8d3mOfAyWI/Aos/Lu9gT01DbKo/s1600/bounding+box+without+actual+letters+touching.png Ray's tutorial said the bounding boxes overlap. So when I modify the boxes inside JTessbox, do I keep the overlapping boxes, or make the boxes non

[tesseract-ocr] how does tesseract make decisions when classifying something?

2014-07-17 Thread Jing JC
https://lh5.googleusercontent.com/-yFFAJvw4F1U/U8d3NPEBjcI/Aok/G8IIxiz5aLA/s1600/selecting+.png It seems eng.cube.freq-words is not the only thing in play; the result depends on other factors too.

[tesseract-ocr] what does width= right -left = no silly +1/-1 mean in this tutorial?

2014-07-17 Thread Jing JC
I am going through Ray Smith's tutorial and don't get it. Can anyone shed some light on it? Thank you. https://lh4.googleusercontent.com/-zJbwuVTCRg4/U8dqBCSoIEI/AoU/SSWEXrP9LwY/s1600/no+silly+%2B1%3A-1.png

[tesseract-ocr] Re: About the jpn.traindata

2014-07-17 Thread Barrie Treloar
On Monday, May 23, 2011 10:50:31 AM UTC+9:30, Mostafa wrote: Dear Mr. Smith, I hope you are having a lovely day. I had posted a FAQ about jpn.traineddata; the thread is below:

Re: [tesseract-ocr] what does width= right -left = no silly +1/-1 mean in this tutorial?

2014-07-17 Thread Nick White
On Wed, Jul 16, 2014 at 11:17:00PM -0700, Jing JC wrote: I am going through Ray Smith's tutorial and don't get it. He means that as the co-ordinate system uses bottom left as the origin, you will never get a negative co-ordinate (as you could if the origin was elsewhere).
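
A minimal Python sketch of the arithmetic being discussed, assuming the usual six-field box line layout (glyph, left, bottom, right, top, page) with the origin at the bottom left of the image. The names Box, parse_box_line and to_top_left are hypothetical helpers for illustration, not part of Tesseract or JTessbox.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """One line of a box file, assuming 'glyph left bottom right top page'."""
    glyph: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

    @property
    def width(self) -> int:
        # Plain difference, as in the tutorial: width = right - left.
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.top - self.bottom


def parse_box_line(line: str) -> Box:
    """Split a single box line into its six whitespace-separated fields."""
    glyph, left, bottom, right, top, page = line.split()
    return Box(glyph, int(left), int(bottom), int(right), int(top), int(page))


def to_top_left(box: Box, image_height: int) -> Tuple[int, int, int, int]:
    """Convert to (x, y, w, h) for tools that put the origin at the top left."""
    return (box.left, image_height - box.top, box.width, box.height)
```

With a bottom-left origin every co-ordinate inside the image is zero or positive, and the only conversion needed for a top-left viewer is flipping the vertical axis against the image height.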

Re: [tesseract-ocr] JTessbox Modifying the boxes

2014-07-17 Thread Nick White
On Thu, Jul 17, 2014 at 12:14:43AM -0700, Jing JC wrote: Ray's tutorial said the bounding boxes overlap. So when I modify the boxes inside JTessbox, do I keep the overlapping boxes, or make the boxes non-touching? That's interesting, actually; I didn't realise Tesseract did outlining

[tesseract-ocr] Re: JTessbox Modifying the boxes

2014-07-17 Thread Paul
Citing from the Wiki (https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Old_Manual_method): It is sometimes important to space out the text a bit when printing, so up the inter-character and inter-line spacing in your word processor. Not spacing text out sufficiently will cause

[tesseract-ocr] error when shape clustering

2014-07-17 Thread Jing JC
[root@centos57 AdaleMono]# shapeclustering -F font_properties -U unicharset eng.matrx60x40.exp0.tr
Reading eng.matrx60x40.exp0.tr ...
Error: Unable to open eng.matrx60x40.exp0.tr!
signal_termination_handler:Error:Signal_termination_handler called:Code 3000
Segmentation fault
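
Since the error above is about a file that could not be opened, a quick pre-flight check before invoking the training tools can narrow things down. This is a sketch only: missing_inputs is a hypothetical helper, and it says nothing about why Tesseract did or did not produce the .tr file in the first place.

```python
import os
from typing import List


def missing_inputs(paths: List[str]) -> List[str]:
    """Return the paths that are not readable regular files."""
    return [
        p for p in paths
        if not (os.path.isfile(p) and os.access(p, os.R_OK))
    ]


if __name__ == "__main__":
    # File names taken from the command line shown in the post.
    wanted = ["font_properties", "unicharset", "eng.matrx60x40.exp0.tr"]
    for path in missing_inputs(wanted):
        print("cannot read:", path)
```

If the .tr file shows up as unreadable here, the problem lies in the earlier box.train step rather than in shapeclustering itself.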

[tesseract-ocr] Re: JTessbox Modifying the boxes

2014-07-17 Thread Jing JC
Yep, agreed. I haven't run into overlapping boxes during the training stage yet. On Thursday, 17 July 2014 11:32:59 UTC-7, Paul wrote: Citing from the Wiki ( https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Old_Manual_method ): It is sometimes important to space out the text a bit

Re: [tesseract-ocr] JTessbox Modifying the boxes

2014-07-17 Thread Jing JC
Yep. It happened with the bounding boxes I generated myself; it hasn't happened to the .box file during the training step yet. On Thursday, 17 July 2014 10:17:39 UTC-7, Nick White wrote: On Thu, Jul 17, 2014 at 12:14:43AM -0700, Jing JC wrote: Ray's tutorial said the bounding box

[tesseract-ocr] Re: error when shape clustering

2014-07-17 Thread Jing JC
I followed the naming convention, though: tesseract [lang].[fontname].exp[num].tif [lang].[fontname].exp[num] nobatch box.train. My font name contained numbers (matrx60x40); when I changed the name to arial, it worked. Weird. I tried replacing the font name to
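
A small Python sketch for checking a file name against the [lang].[fontname].exp[num] convention quoted above. The regular expression is my own reading of that convention, and split_training_name is a hypothetical helper; it does not reproduce how Tesseract itself parses these names, so it cannot explain why the digits in the font name caused trouble.

```python
import re
from typing import Optional, Tuple

# Assumed shape: lang.fontname.expN.ext, with no dots inside any field.
_NAME_RE = re.compile(
    r"^(?P<lang>[^.]+)\.(?P<font>[^.]+)\.exp(?P<num>\d+)\.(?P<ext>[^.]+)$"
)


def split_training_name(filename: str) -> Optional[Tuple[str, str, int, str]]:
    """Return (lang, fontname, num, ext), or None if the name does not fit."""
    m = _NAME_RE.match(filename)
    if m is None:
        return None
    return m["lang"], m["font"], int(m["num"]), m["ext"]
```

Under this reading, eng.matrx60x40.exp0.tr splits cleanly into eng, matrx60x40, 0 and tr, so the name follows the convention as written.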

Re: [tesseract-ocr] Re: Missing detailed documentation about Unicharset files

2014-07-17 Thread Albrecht Hilker
Hello Nick, it is great that you are motivated to write documentation and that you answer the questions in the forum. Nevertheless, I read a post from Ray where he says that he receives millions of emails and the last thing he likes to do is writing long texts (email responses or