https://lists.apache.org/x/thread.html/r8cacffe1c39a0c35471b29e1f6029be0226dd8a4c6670e3f607a6a3c@%3Cuser.tika.apache.org%3E
On Mon, Jan 11, 2021 at 6:18 PM Peter Kronenberg <[email protected]> wrote:

> Hi, I don't think you ever responded to this one. What is the purpose of
> the resize option? Why do we want to increase the size of the image at
> all, much less by 9x? Does it make OCR easier on a bigger image? What
> kind of assessment would you do to decide how much to enlarge? (I looked
> for your TODO but couldn't find it.)
>
> Now that I better understand what enableImageProcessing means, I need to
> re-run some of my performance tests.
>
> From: Peter Kronenberg
> Sent: Friday, January 8, 2021 1:36 PM
> To: [email protected]; [email protected]
> Subject: RE: tesseract resize option
>
> Why does it increase the size of the image by 9 times? What is the
> purpose? Does it help with a specific area? What are the reasons I
> would want to change the default?
>
> In one of your samples, you used 300. Is there a reason you changed it
> from 900?
>
> From: Tim Allison <[email protected]>
> Sent: Friday, January 8, 2021 1:22 PM
> To: [email protected]
> Subject: tesseract resize option
>
> I was asked about this privately.
>
> This is an area of the codebase I've only just become familiar with; I
> hadn't touched it before. And, frankly, given how little traffic this
> area has had on the user lists, I don't think many others are exercising
> this part of the codebase. Also, I'm not sure it is documented
> _anywhere_, but I could be wrong.
>
> If you have ImageMagick configured _and_ you tell Tesseract to
> "enableImageProcessing", the Tesseract OCR parser will include the
> "-resize" option in the ImageMagick command line.
>
> The number is a percentage of the image's current size.
> The default is 900%, which means that if everything is working, your
> image will be expanded to 9x its current width and height (81x the
> pixel count).
>
> As I noted in a TODO I recently added to the source code, we should
> probably do some dynamic assessment of how much to enlarge an image
> (auto mode), because 9x takes _FOREVER_ even on our unit test file.
>
> The command-line args for ImageMagick are set via:
>
>     String[] args = new String[]{
>         "-density", Integer.toString(config.getDensity()),
>         "-depth", Integer.toString(config.getDepth()),
>         "-colorspace", config.getColorspace(),
>         "-filter", config.getFilter(),
>         "-resize", config.getResize() + "%", // percentage of current width/height
>         "-rotate", angle,
>         sourceFile.toAbsolutePath().toString(),
>         targFile.toAbsolutePath().toString()
>     };
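To put numbers on the resize question: a single `-resize` percentage in ImageMagick scales width and height by the same factor, so 900% multiplies the pixel count by 81, which goes some way toward explaining the slow runs. The sketch below works through that arithmetic and adds one possible "auto mode" of the kind the TODO describes. The `autoResizePercent` heuristic, its 300 dpi target, and the 900% cap are hypothetical choices for illustration, not anything Tika implements.

```java
/**
 * Sketch of the resize arithmetic, plus a hypothetical "auto" heuristic.
 * The 300 dpi target and the 100%..maxPercent clamp are illustrative
 * assumptions, not Tika behavior.
 */
public final class ResizeMath {

    private ResizeMath() {}

    /** "-resize N%" scales each dimension (width and height) by N/100. */
    public static long scaledDimension(long pixels, int resizePercent) {
        return Math.round(pixels * resizePercent / 100.0);
    }

    /** Output pixels / input pixels = (N/100)^2, since both axes are scaled. */
    public static double pixelCountFactor(int resizePercent) {
        double linear = resizePercent / 100.0;
        return linear * linear;
    }

    /**
     * Hypothetical auto mode: enlarge just enough to bring the source up to
     * targetDpi, never shrink, and never exceed maxPercent.
     */
    public static int autoResizePercent(int sourceDpi, int targetDpi, int maxPercent) {
        if (sourceDpi <= 0) {
            throw new IllegalArgumentException("sourceDpi must be positive");
        }
        int percent = (int) Math.ceil(100.0 * targetDpi / sourceDpi);
        return Math.max(100, Math.min(maxPercent, percent));
    }

    public static void main(String[] args) {
        int resize = 900; // the default discussed in the thread
        long w = 1000, h = 800;
        // prints "1000x800 -> 9000x7200 (81x the pixels)"
        System.out.printf("%dx%d -> %dx%d (%.0fx the pixels)%n",
                w, h, scaledDimension(w, resize), scaledDimension(h, resize),
                pixelCountFactor(resize));
        System.out.println(autoResizePercent(72, 300, 900));  // prints 417
        System.out.println(autoResizePercent(300, 300, 900)); // prints 100
    }
}
```

Under this heuristic, a 72 dpi screenshot would be enlarged to about 417%, while a 300 dpi scan would be left at its original size.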

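For anyone reproducing the ImageMagick step outside Tika, here is a minimal sketch that assembles the same options as a `List` and hands it to `ProcessBuilder`. Each list element reaches the process verbatim as one argument, so option names must not carry stray whitespace such as a trailing space. The `build` helper is hypothetical and not part of Tika's API; the executable name `convert` is an assumption (ImageMagick 7 installs typically use `magick`), and every value in `main()` other than the 900% resize is a placeholder.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/**
 * Hypothetical helper: builds the ImageMagick command discussed in the thread,
 * one option or value per list element. Not part of Tika's API.
 */
public final class MagickCommand {

    private MagickCommand() {}

    public static List<String> build(String executable, int density, int depth,
                                     String colorspace, String filter,
                                     int resizePercent, String angle,
                                     Path source, Path target) {
        return List.of(
                executable,
                "-density", Integer.toString(density),
                "-depth", Integer.toString(depth),   // elements are passed verbatim: no trailing space
                "-colorspace", colorspace,
                "-filter", filter,
                "-resize", resizePercent + "%",      // percentage of current width and height
                "-rotate", angle,
                source.toAbsolutePath().toString(),
                target.toAbsolutePath().toString());
    }

    public static void main(String[] args) {
        // Placeholder values; only the 900% resize default comes from the thread.
        List<String> cmd = build("convert", 300, 4, "gray", "triangle", 900, "0",
                Paths.get("in.png"), Paths.get("out.tiff"));
        ProcessBuilder pb = new ProcessBuilder(cmd);
        System.out.println(String.join(" ", pb.command()));
        // pb.inheritIO().start().waitFor(); // would actually run ImageMagick
    }
}
```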