Wait, that _is_ tess4j: an 8MB jar with bundled tessdata and win32 binaries. Can we somehow liberate jdeskew so we don't have to package all of tess4j?

On Wed, Jan 13, 2021 at 7:15 AM Tim Allison <[email protected]> wrote:

> Peter,
> If you have a chance, can you see if that tess4j module has the same
> functionality as what we're getting with ImageMagick? It'd be great to
> knock out 2 external dependencies with native Java if possible.
>
> Cheers,
>
> Tim
>
> On Wed, Jan 13, 2021 at 7:14 AM Tim Allison <[email protected]> wrote:
>
>> >But does the same thing apply in this case? It's not really using all of
>> tess4j. Just 1 package from it
>>
>> Sorry, I should have checked on this exact point before responding. If
>> it isn't massive and has no native libraries, y, let's go for it. Let me
>> look into it a bit today.
>>
>> On Tue, Jan 12, 2021 at 10:42 PM Peter Kronenberg <
>> [email protected]> wrote:
>>
>>> I sort of see your reasons for not using Tess4j to replace the current
>>> command line calls to Tesseract. But does the same thing apply in this
>>> case? It's not really using all of tess4j. Just 1 package from it
>>>
>>>
>>> ------------------------------
>>> *From:* Tim Allison <[email protected]>
>>> *Sent:* Tuesday, January 12, 2021 9:11:58 PM
>>> *To:* [email protected] <[email protected]>
>>> *Subject:* Re: Rotation script
>>>
>>> I really like the idea of moving to pure Java for deskewing. We chose
>>> not to use tess4j earlier as a Java binding for tesseract because it
>>> requires native code....[1]
>>>
>>> If we can do it with another call to tesseract from the command line or
>>> if there is a fairly lightweight pure Java, ASL 2.0 friendly image library
>>> that works well, that would be great.
>>>
>>> [1]
>>> https://issues.apache.org/jira/browse/TIKA-2293
>>>
>>> On Tue, Jan 12, 2021 at 8:28 PM Peter Kronenberg <
>>> [email protected]> wrote:
>>>
>>> I'd been meaning to ask why you calculate the rotation with a Python
>>> script. As far as I can tell, that is the only reason for the Python
>>> dependency, which just adds a little (lot?) more complexity to the whole
>>> project, as well as who knows how much extra overhead there is to make the
>>> Python call. (not to mention, it took me practically a whole day last week
>>> to get all the dependencies working on a Linux system in order to be able
>>> to run Rotation.py)
>>>
>>> But now, I have a more important reason to question this. The Rotation
>>> script does not work very well. I ran it on the attached files. I started
>>> with the straight file and rotated them using Irfanview (15 = 1.5, 25 = 2.5)
>>> Rotation.py returns 0 for the 1 and 1.5 degree file. And it returns 1
>>> for the 2 degree file. And it seems to always return an integer, or at
>>> least rounded to an integer.
>>>
>>> Here is a simple Java routine which does the same thing and it appears to
>>> be far more accurate. It uses Tess4j
>>>
>>> <dependency>
>>>     <groupId>net.sourceforge.tess4j</groupId>
>>>     <artifactId>tess4j</artifactId>
>>>     <version>4.5.4</version>
>>> </dependency>
>>>
>>>
>>> import com.recognition.software.jdeskew.ImageDeskew;
>>>
>>> import javax.imageio.ImageIO;
>>> import java.awt.image.BufferedImage;
>>> import java.io.File;
>>> import java.io.IOException;
>>>
>>> public class GetAngle {
>>>
>>>     public static void main(String[] args) throws IOException {
>>>         BufferedImage bi = ImageIO.read(new File("c:\\Testfiles\\Dickens_skew25.png"));
>>>         ImageDeskew id = new ImageDeskew(bi);
>>>         double imageSkewAngle = id.getSkewAngle(); // determine skew angle
>>>         System.out.println(imageSkewAngle);
>>>     }
>>> }
>>>
>>> I've been poking around this code and might actually do the change we
>>> discussed about not doing the rotation when the angle is 0, as well as
>>> allowing rotation even if you're not doing the pre-processing. I'd be glad
>>> to take a look at this as well if you think it's a worthwhile direction.
>>>
>>>
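
On the change mentioned at the end of Peter's message (skipping the rotation when the angle is 0), here is a minimal sketch of how that could look using only JDK classes. It is an illustration, not the existing Tika code: the class name DeskewSketch, the method rotateIfNeeded and the MIN_ANGLE_DEGREES threshold are all hypothetical, and the sign of the angle to pass in depends on the convention of whatever computes the skew, which I have not verified here.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class DeskewSketch {

    // Hypothetical threshold: smaller angles are treated as "no skew".
    static final double MIN_ANGLE_DEGREES = 0.05;

    /**
     * Rotates src about its center by the given angle in degrees.
     * Returns src itself, untouched, when the angle is effectively zero.
     * The caller is responsible for choosing the sign of the correction.
     */
    static BufferedImage rotateIfNeeded(BufferedImage src, double degrees) {
        if (Math.abs(degrees) < MIN_ANGLE_DEGREES) {
            return src; // nothing to do, skip the rotation entirely
        }
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Fill with white so the corners exposed by the rotation are not black.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, w, h);
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            // Rotate around the image center, then draw the source image.
            g.rotate(Math.toRadians(degrees), w / 2.0, h / 2.0);
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

The output keeps the original dimensions, so corners of the rotated page can be clipped; for the small angles discussed in this thread (1 to 2.5 degrees) that may be acceptable, but it is a design choice worth checking against real scans.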
