[tesseract-ocr] Difference trained data for Chinese

2017-08-11 Thread Yang Yu
Good day! Recently I have been using tesseract (4.0 alpha) for Chinese OCR, and it works really well. Now I want to pick the best model to use, but I found several versions. What is the difference between them? 1. chi_sim from https://github.com/tesseract-ocr/tesseract/wiki/Data-Files (around 50M)

[tesseract-ocr] Re: How Much TO Enlarge Screenshot?

2017-08-11 Thread Dbsk Dbsk
Because tesseract is designed to work at around 300 dpi, you should scale the image up to that level. For example, if the screen is 72 dpi, enlarge the screenshot to about 400%. On Friday, August 4, 2017 at 1:56:32 AM UTC+8, James Lee wrote: > > Is there a way to find out how much to enlarge a screenshot for best > accuracy? > Is
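The arithmetic behind that advice can be sketched as below. This is only an illustration, assuming the 300 dpi target mentioned in the reply; the function names `scale_factor` and `scaled_pixels` are hypothetical and not part of tesseract. For a 72 dpi screen the exact ratio is 300 / 72, roughly 4.17, so "400%" is a rounded figure.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical helper: factor by which to enlarge an image captured at
// source_dpi so that it reaches target_dpi (300 is the value assumed in
// the reply above, not something queried from tesseract).
double scale_factor(double source_dpi, double target_dpi = 300.0) {
    if (source_dpi <= 0.0 || target_dpi <= 0.0) {
        throw std::invalid_argument("DPI values must be positive");
    }
    return target_dpi / source_dpi;
}

// Hypothetical helper: width or height in pixels after scaling,
// rounded to the nearest whole pixel.
long scaled_pixels(long pixels, double source_dpi, double target_dpi = 300.0) {
    return std::lround(pixels * scale_factor(source_dpi, target_dpi));
}
```

For example, `scale_factor(72.0)` gives about 4.17, and a 720-pixel-wide screenshot from a 72 dpi screen would become 3000 pixels wide. The actual resizing would be done with whatever image library is already in use.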

[tesseract-ocr] How to know how many symbol is a word in pagelayout?

2017-08-11 Thread Dbsk Dbsk
I can use the code below to draw the bounding box of every word and every symbol. Now I want to know: once I have a word, can I find out how many symbols it contains? Thanks for any info! = #include #include #include #include using namespace std; int main() {

[tesseract-ocr] Re: Difference trained data for Chinese

2017-08-11 Thread shree
Please see https://github.com/tesseract-ocr/tessdata/issues/72 On Friday, August 11, 2017 at 2:26:55 PM UTC+5:30, Yang Yu wrote: > > Good day! > > Recently I was using tesseract (4.0 alpha) to do Chinese OCR and it works > really great. Now I want to pick up a best model to use but I find