Oh, you’re already ahead of me. Yes, that’s what I thought. You can just copy the 1 or 2 classes that are needed and not import all of Tess4j.

From: Tim Allison <[email protected]>
Sent: Wednesday, January 13, 2021 8:39 AM
To: [email protected]
Subject: Re: Rotation script

Y, jdeskew only imports java.awt... no other dependencies. We can copy/paste that source into our codebase and remove rotation.py.

On Wed, Jan 13, 2021 at 8:35 AM Tim Allison <[email protected]> wrote:

Wait, that _is_ tess4j, 8MB jar with bundled tessdata and win32 binaries. Can we somehow liberate jdeskew so we don't package all of tess4j?

On Wed, Jan 13, 2021 at 7:15 AM Tim Allison <[email protected]> wrote:

Peter,
If you have a chance, can you see if that tess4j module has the same functionality as what we're getting with ImageMagick? It'd be great to knock out 2 external dependencies with native Java if possible.

Cheers,
Tim

On Wed, Jan 13, 2021 at 7:14 AM Tim Allison <[email protected]> wrote:

>But does the same thing apply in this case? It's not really using all of
>tess4j. Just 1 package from it

Sorry, I should have checked on this exact point before responding. If it isn't massive and has no native libraries, y, let's go for it. Let me look into it a bit today.

On Tue, Jan 12, 2021 at 10:42 PM Peter Kronenberg <[email protected]> wrote:

I sort of see your reasons for not using Tess4j to replace the current command line calls to Tesseract. But does the same thing apply in this case? It's not really using all of tess4j. Just 1 package from it

________________________________
From: Tim Allison <[email protected]>
Sent: Tuesday, January 12, 2021 9:11:58 PM
To: [email protected] <[email protected]>
Subject: Re: Rotation script

I really like the idea of moving to pure Java for deskewing.
We chose not to use tess4j earlier as a Java binding for tesseract because it requires native code....[1] If we can do it with another call to tesseract from the command line or if there is a fairly lightweight pure Java, ASL 2.0 friendly image library that works well, that would be great.

[1] https://issues.apache.org/jira/browse/TIKA-2293

On Tue, Jan 12, 2021 at 8:28 PM Peter Kronenberg <[email protected]> wrote:

I'd been meaning to ask why you calculate the rotation with a Python script. As far as I can tell, that is the only reason for the Python dependency, which just adds a little (lot?) more complexity to the whole project, as well as who knows how much extra overhead there is to make the Python call. (not to mention, it took me practically a whole day last week to get all the dependencies working on a Linux system in order to be able to run Rotation.py)

But now, I have a more important reason to question this. The Rotation script does not work very well. I ran it on the attached files. I started with the straight file and rotated them using Irfanview (15 = 1.5, 25 = 2.5)

Rotation.py returns 0 for the 1 and 1.5 degree file. And it returns 1 for the 2 degree file. And it seems to always return an integer, or at least rounded to an integer.

Here is a simple Java routine which does the same thing and it appears to be far more accurate.
It uses Tess4j

<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.5.4</version>
</dependency>

import com.recognition.software.jdeskew.ImageDeskew;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

public class GetAngle {
    public static void main(String[] args) throws IOException {
        BufferedImage bi = ImageIO.read(new File("c:\\Testfiles\\Dickens_skew25.png"));
        ImageDeskew id = new ImageDeskew(bi);
        double imageSkewAngle = id.getSkewAngle(); // determine skew angle
        System.out.println(imageSkewAngle);
    }
}

I've been poking around this code and might actually do the change we discussed about not doing the rotation when the angle is 0, as well as allowing rotation even if you're not doing the pre-processing. I'd be glad to take a look at this as well if you think it's a worthwhile direction.
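For the "don't rotate when the angle is 0" change, here is a rough sketch of what a pure-Java version might look like, using only java.awt. The class name DeskewSketch, the method rotateIfNeeded and the threshold parameter are made up for illustration; none of them come from the existing code or from tess4j. Treat it as a starting point, not as the actual change.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of Tika or tess4j.
final class DeskewSketch {

    /**
     * Returns the input image untouched when the absolute angle is below
     * thresholdDegrees; otherwise returns a new image of the same size,
     * rotated by angleDegrees about the image center.
     */
    static BufferedImage rotateIfNeeded(BufferedImage src, double angleDegrees,
                                        double thresholdDegrees) {
        if (Math.abs(angleDegrees) < thresholdDegrees) {
            return src; // effectively straight already, skip the rotation
        }
        int w = src.getWidth();
        int h = src.getHeight();
        // TYPE_CUSTOM cannot be passed to the BufferedImage constructor,
        // so fall back to ARGB in that case (an assumption for this sketch).
        int type = src.getType() == BufferedImage.TYPE_CUSTOM
                ? BufferedImage.TYPE_INT_ARGB
                : src.getType();
        BufferedImage dst = new BufferedImage(w, h, type);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            // Rotate around the center; corners that rotate out of the
            // frame are clipped and uncovered areas keep the default fill.
            g.rotate(Math.toRadians(angleDegrees), w / 2.0, h / 2.0);
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

With the GetAngle example above, the value from getSkewAngle() would be the angle passed in, possibly negated; I have not checked the sign convention, and the threshold value would need tuning against real scans.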
