Wait, that _is_ tess4j: an 8MB jar with bundled tessdata and win32 binaries.
Can we somehow liberate jdeskew so we don't have to package all of tess4j?
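
For what it's worth, the rotation step itself (what we currently hand to
ImageMagick) shouldn't need anything from tess4j. Below is a rough sketch
using only java.awt, assuming the skew angle in degrees comes from something
like ImageDeskew.getSkewAngle(). The class name, the 0.05 degree threshold
and the white background are my own placeholders, and the sign to pass in
(angle vs. -angle) would have to be checked against whatever detector we end
up with. It also skips the work entirely when the angle is effectively 0.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

public class PureJavaRotate {

    // Hypothetical threshold: below this the page is treated as straight.
    static final double MIN_ANGLE_DEGREES = 0.05;

    /**
     * Rotates the image about its center by the given angle in degrees.
     * Returns the input untouched when the angle is negligible.
     */
    static BufferedImage rotate(BufferedImage src, double degrees) {
        if (Math.abs(degrees) < MIN_ANGLE_DEGREES) {
            return src; // nothing to do, avoid a pointless copy
        }
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Fill with white so the exposed corners don't turn black.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, w, h);
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            AffineTransform at = AffineTransform.getRotateInstance(
                    Math.toRadians(degrees), w / 2.0, h / 2.0);
            g.drawImage(src, at, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

This keeps the canvas the same size as the input, so large angles would clip
the corners of the page; for the few degrees of skew we're talking about
that is probably acceptable.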

On Wed, Jan 13, 2021 at 7:15 AM Tim Allison <[email protected]> wrote:

> Peter,
>   If you have a chance, can you see if that tess4j module has the same
> functionality as what we're getting with ImageMagick?  It'd be great to
> knock out 2 external dependencies with native Java if possible.
>
>    Cheers,
>
>          Tim
>
> On Wed, Jan 13, 2021 at 7:14 AM Tim Allison <[email protected]> wrote:
>
>> >But does the same thing apply in this case? It's not really using all of
>> tess4j. Just 1 package from it
>>
>> Sorry, I should have checked on this exact point before responding.  If
>> it isn't massive and has no native libraries, yes, let's go for it.  Let me
>> look into it a bit today.
>>
>> On Tue, Jan 12, 2021 at 10:42 PM Peter Kronenberg <
>> [email protected]> wrote:
>>
>>> I sort of see your reasons for not using Tess4j to replace the current
>>> command line calls to Tesseract. But does the same thing apply in this
>>> case? It's not really using all of tess4j. Just 1 package from it
>>>
>>>
>>> ------------------------------
>>> *From:* Tim Allison <[email protected]>
>>> *Sent:* Tuesday, January 12, 2021 9:11:58 PM
>>> *To:* [email protected] <[email protected]>
>>> *Subject:* Re: Rotation script
>>>
>>> I really like the idea of moving to pure Java for deskewing. We chose
>>> not to use tess4j earlier as a Java binding for tesseract because it
>>> requires native code....[1]
>>>
>>> If we can do it with another call to tesseract from the command line or
>>> if there is a fairly lightweight pure Java, ASL 2.0 friendly image library
>>> that works well, that would be great.
>>>
>>> [1]
>>> https://issues.apache.org/jira/browse/TIKA-2293
>>>
>>> On Tue, Jan 12, 2021 at 8:28 PM Peter Kronenberg <
>>> [email protected]> wrote:
>>>
>>> I'd been meaning to ask why you calculate the rotation with a Python
>>> script.  As far as I can tell, that is the only reason for the Python
>>> dependency, which just adds a little (lot?) more complexity to the whole
>>> project, as well as who knows how much extra overhead there is to make the
>>> Python call. (not to mention, it took me practically a whole day last week
>>> to get all the dependencies working on a Linux system in order to be able
>>> to run Rotation.py)
>>>
>>> But now, I have a more important reason to question this.  The Rotation
>>> script does not work very well.  I ran it on the attached files.  I started
>>> with the straight file and rotated it using Irfanview (in the file names,
>>> 15 = 1.5 degrees, 25 = 2.5 degrees).  Rotation.py returns 0 for the 1 and
>>> 1.5 degree files, and it returns 1 for the 2 degree file.  It also seems to
>>> always return an integer, or at least a value rounded to an integer.
>>>
>>> Here is a simple Java routine that does the same thing and appears to
>>> be far more accurate.  It uses Tess4j:
>>>
>>> <dependency>
>>>   <groupId>net.sourceforge.tess4j</groupId>
>>>   <artifactId>tess4j</artifactId>
>>>   <version>4.5.4</version>
>>> </dependency>
>>>
>>>
>>> import com.recognition.software.jdeskew.ImageDeskew;
>>>
>>> import javax.imageio.ImageIO;
>>> import java.awt.image.BufferedImage;
>>> import java.io.File;
>>> import java.io.IOException;
>>>
>>> public class GetAngle {
>>>
>>>     public static void main(String[] args) throws IOException {
>>>         BufferedImage bi = ImageIO.read(new File("c:\\Testfiles\\Dickens_skew25.png"));
>>>         ImageDeskew id = new ImageDeskew(bi);
>>>         double imageSkewAngle = id.getSkewAngle(); // determine skew angle
>>>         System.out.println(imageSkewAngle);
>>>     }
>>> }
>>>
>>> I've been poking around this code and might actually do the change we
>>> discussed about not doing the rotation when the angle is 0, as well as
>>> allowing rotation even if you're not doing the pre-processing.  I'd be glad
>>> to take a look at this as well if you think it's a worthwhile direction.
>>>
>>>
