https://lists.apache.org/x/thread.html/r8cacffe1c39a0c35471b29e1f6029be0226dd8a4c6670e3f607a6a3c@%3Cuser.tika.apache.org%3E


On Mon, Jan 11, 2021 at 6:18 PM Peter Kronenberg <[email protected]>
wrote:

> Hi, don’t think you ever responded to this one.  What is the purpose of
> the resize option?  Why do we want to increase the size of the image at
> all, much less by 9x?  Does it make the OCR easier on a bigger image?  What
> kind of assessment would you do to decide how much to enlarge? (I looked
> for your TODO but couldn’t find it)
>
>
>
> Now that I understand better what enableImageProcessing means, I need to
> re-run some of my performance tests
>
>
>
> *From:* Peter Kronenberg
> *Sent:* Friday, January 8, 2021 1:36 PM
> *To:* [email protected]; [email protected]
> *Subject:* RE: {EXTERNAL}tesseract resize option
>
>
>
> Why does it increase the size of the image by 9 times?  What is the
> purpose?  Does it help with a specific area?  What are the reasons I
> would want to change the default?
>
> In one of your samples, you used 300.  Is there a reason you changed it
> from 900?
>
>
>
> *From:* Tim Allison <[email protected]>
> *Sent:* Friday, January 8, 2021 1:22 PM
> *To:* [email protected]
> *Subject:* {EXTERNAL}tesseract resize option
>
>
>
> I was asked about this privately.
>
>
>
> This is an area of the codebase I've only recently become familiar
> with, and hadn't touched before.  Frankly, given how little traffic
> this area has had on the user lists, I don't think many others are
> exercising this part of the codebase.  Also, I'm not sure it is documented
> _anywhere_, but I could be wrong.
>
>
>
> If you have ImageMagick configured _and_ you tell tesseract to
> "enableImageProcessing", the tesseract ocr parser will include the "resize"
> option in the commandline for imagemagick.
>
>
>
> The number is a percentage of the image's current size.  The default is
> 900%, which means that if everything is working, your image will be
> expanded to 9x its current size.
>
>
>
> As I recently added in a TODO in the source code, we should probably do
> some dynamic assessment of how much to enlarge an image (auto mode) because
> 9x even on our unit test file takes _FOREVER_.
>
>
>
> The commandline args for imagemagick are set via:
>
>
>
> String[] args = new String[]{
>         "-density", Integer.toString(config.getDensity()),
>         "-depth", Integer.toString(config.getDepth()),
>         "-colorspace", config.getColorspace(),
>         "-filter", config.getFilter(),
>         "-resize", config.getResize() + "%",
>         "-rotate", angle,
>         sourceFile.toAbsolutePath().toString(),
>         targFile.toAbsolutePath().toString()
> };
>
>
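
To put numbers on the resize percentage described in the quoted message: ImageMagick applies a percentage given to -resize to both the width and the height, so 900% multiplies the pixel count by roughly 81, not 9. Below is a minimal Java sketch of that arithmetic, not Tika's actual code. The class name ResizeSketch, its helper methods, the file names, and every option value other than the 900 resize are assumptions chosen for illustration. The argument order follows the String[] quoted in the message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only; not part of Tika.
class ResizeSketch {

    // Approximate size of one dimension after a percentage resize.
    // ImageMagick's own rounding may differ by a pixel.
    static long scaled(long pixels, int resizePercent) {
        return pixels * resizePercent / 100;
    }

    // How many times more pixels the resized image holds than the original.
    static double pixelGrowth(int resizePercent) {
        double factor = resizePercent / 100.0;
        return factor * factor; // both width and height are scaled
    }

    // Builds an argument list in the same order as the quoted snippet.
    static List<String> buildArgs(int density, int depth, String colorspace,
                                  String filter, int resizePercent, String angle,
                                  String source, String target) {
        List<String> args = new ArrayList<>();
        args.add("-density");    args.add(Integer.toString(density));
        args.add("-depth");      args.add(Integer.toString(depth));
        args.add("-colorspace"); args.add(colorspace);
        args.add("-filter");     args.add(filter);
        args.add("-resize");     args.add(resizePercent + "%");
        args.add("-rotate");     args.add(angle);
        args.add(source);
        args.add(target);
        return args;
    }

    public static void main(String[] unused) {
        int resize = 900; // the default mentioned in the thread
        // A 1000 x 800 input is an arbitrary example size.
        System.out.println(scaled(1000, resize) + " x " + scaled(800, resize));
        System.out.println("pixel growth: " + pixelGrowth(resize) + "x");
        // All values below except the resize are placeholder assumptions.
        System.out.println(String.join(" ",
                buildArgs(300, 4, "gray", "triangle", resize, "0",
                          "in.png", "out.png")));
    }
}
```

For the 1000 x 800 example, the sketch estimates a 9000 x 7200 output, which is 81 times as many pixels for ImageMagick to write and for tesseract to read. That is consistent with the observation in the quoted message that 9x is very slow even on the unit test file. Dropping the percentage to 300 gives a 9-fold pixel increase instead.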
