On Friday, February 19, 2016 at 3:00:42 PM UTC-5, viraf wrote:
>
> Tom, I created a multi-page TIFF as per earlier recommendation on this
> thread (avoid multiple inits). Running it on Linux from the command line
> provided me with a reference by which to compute PPM that I could target
> with Tess4J. I had hoped to get 10+ PPM / core and shift focus on
> accuracy. I am at about 6 PPM and unclear where / how to improve
> performance (speed).
>

I take it the question about the representativeness of that size file was too sensitive/boring/trivial/... to answer.

Given the issues with multi-page TIFFs, one experiment worth running is to try a list of single-page TIFFs instead of one ridiculously large file.

    $ cat > filelist.txt
    page0001.tif
    page0002.tif
    ...
    page1800.tif
    $ tesseract filelist.txt

Tom
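
For anyone repeating the experiment, here is a rough Python sketch of the same idea: build the file list from a directory of single-page TIFFs, time one tesseract run over it, and turn the elapsed time into PPM. The function names, the `*.tif` pattern, and the `out` output base are my own placeholders, not anything from this thread, and it assumes the `tesseract` binary is on the PATH and is invoked as `tesseract <list file> <output base>`.

```python
import subprocess
import time
from pathlib import Path


def write_filelist(page_dir, list_path="filelist.txt", pattern="*.tif"):
    """Write one image path per line, sorted by name; return the page count."""
    pages = sorted(Path(page_dir).glob(pattern))
    Path(list_path).write_text("".join(f"{p}\n" for p in pages))
    return len(pages)


def pages_per_minute(page_count, elapsed_seconds):
    """Convert a page count and a wall-clock duration into pages per minute."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return page_count * 60.0 / elapsed_seconds


def time_tesseract(list_path="filelist.txt", output_base="out"):
    """Run tesseract once over the whole list and return elapsed seconds.

    Assumption: tesseract is installed and accepts a text file of image
    paths as its input argument, followed by an output base name.
    """
    start = time.perf_counter()
    subprocess.run(["tesseract", str(list_path), output_base], check=True)
    return time.perf_counter() - start
```

Typical use would be `n = write_filelist("pages")`, then `pages_per_minute(n, time_tesseract())`, compared against the figure from the multi-page TIFF run on the same machine.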

