Yes, jdeskew only imports java.awt; it has no other dependencies. We can copy/paste that source into our codebase and remove rotation.py.
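For the rotation step itself, here is a rough sketch of what a java.awt-only version might look like, including the "don't rotate when the angle is 0" shortcut discussed further down the thread. The class name DeskewSketch, the method rotateIfSkewed, and the minAngleDegrees threshold are all made up for illustration, and the sign convention (rotate by the negative of the reported skew) is an assumption that would need checking against whatever getSkewAngle() actually returns.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

// Hypothetical helper, not existing Tika or jdeskew code.
class DeskewSketch {

    /**
     * Returns src unchanged when the skew is smaller than minAngleDegrees,
     * otherwise a new image of the same size rotated about its center.
     * Assumption: rotating by -skewAngleDegrees straightens the page.
     */
    static BufferedImage rotateIfSkewed(BufferedImage src,
                                        double skewAngleDegrees,
                                        double minAngleDegrees) {
        if (Math.abs(skewAngleDegrees) < minAngleDegrees) {
            return src; // effectively straight: skip the rotation entirely
        }
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Fill with white so the exposed corners do not come out black.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, w, h);
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            AffineTransform rotation = AffineTransform.getRotateInstance(
                    Math.toRadians(-skewAngleDegrees), w / 2.0, h / 2.0);
            g.drawImage(src, rotation, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

The angle passed in would come from something like the id.getSkewAngle() call in the GetAngle example quoted below. The output canvas is kept at the original size, so a large rotation would clip the corners; for skews of a few degrees that is probably acceptable, but that is a guess, not something measured.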
On Wed, Jan 13, 2021 at 8:35 AM Tim Allison <[email protected]> wrote:

> Wait, that _is_ tess4j, 8MB jar with bundled tessdata and win32 binaries.
> Can we somehow liberate jdeskew so we don't package all of tess4j?
>
> On Wed, Jan 13, 2021 at 7:15 AM Tim Allison <[email protected]> wrote:
>
>> Peter,
>> If you have a chance, can you see if that tess4j module has the same
>> functionality as what we're getting with ImageMagick? It'd be great to
>> knock out 2 external dependencies with native Java if possible.
>>
>> Cheers,
>>
>> Tim
>>
>> On Wed, Jan 13, 2021 at 7:14 AM Tim Allison <[email protected]> wrote:
>>
>>> >But does the same thing apply in this case? It's not really using all
>>> of tess4j. Just 1 package from it
>>>
>>> Sorry, I should have checked on this exact point before responding. If
>>> it isn't massive and has no native libraries, y, let's go for it. Let me
>>> look into it a bit today.
>>>
>>> On Tue, Jan 12, 2021 at 10:42 PM Peter Kronenberg <
>>> [email protected]> wrote:
>>>
>>>> I sort of see your reasons for not using Tess4j to replace the current
>>>> command line calls to Tesseract. But does the same thing apply in this
>>>> case? It's not really using all of tess4j. Just 1 package from it
>>>>
>>>>
>>>> ------------------------------
>>>> *From:* Tim Allison <[email protected]>
>>>> *Sent:* Tuesday, January 12, 2021 9:11:58 PM
>>>> *To:* [email protected] <[email protected]>
>>>> *Subject:* Re: Rotation script
>>>>
>>>> I really like the idea of moving to pure Java for deskewing. We chose
>>>> not to use tess4j earlier as a Java binding for tesseract because it
>>>> requires native code....[1]
>>>>
>>>> If we can do it with another call to tesseract from the command line or
>>>> if there is a fairly lightweight pure Java, ASL 2.0 friendly image library
>>>> that works well, that would be great.
>>>>
>>>> [1]
>>>> https://issues.apache.org/jira/browse/TIKA-2293
>>>>
>>>> On Tue, Jan 12, 2021 at 8:28 PM Peter Kronenberg <
>>>> [email protected]> wrote:
>>>>
>>>> I'd been meaning to ask why you calculate the rotation with a Python
>>>> script. As far as I can tell, that is the only reason for the Python
>>>> dependency, which just adds a little (lot?) more complexity to the whole
>>>> project, as well as who knows how much extra overhead there is to make the
>>>> Python call. (not to mention, it took me practically a whole day last week
>>>> to get all the dependencies working on a Linux system in order to be able
>>>> to run Rotation.py)
>>>>
>>>> But now, I have a more important reason to question this. The Rotation
>>>> script does not work very well. I ran it on the attached files. I started
>>>> with the straight file and rotated them using Irfanview (15 = 1.5, 25 =
>>>> 2.5)
>>>> Rotation.py returns 0 for the 1 and 1.5 degree file. And it returns 1
>>>> for the 2 degree file. And it seems to always return an integer, or at
>>>> least rounded to an integer.
>>>>
>>>> Here is a simple Java routine which does the same thing and it appears to
>>>> be far more accurate. It uses Tess4j
>>>>
>>>> <dependency>
>>>>   <groupId>net.sourceforge.tess4j</groupId>
>>>>   <artifactId>tess4j</artifactId>
>>>>   <version>4.5.4</version>
>>>> </dependency>
>>>>
>>>>
>>>> import com.recognition.software.jdeskew.ImageDeskew;
>>>>
>>>> import javax.imageio.ImageIO;
>>>> import java.awt.image.BufferedImage;
>>>> import java.io.File;
>>>> import java.io.IOException;
>>>>
>>>> public class GetAngle {
>>>>
>>>>     public static void main(String[] args) throws IOException {
>>>>         BufferedImage bi = ImageIO.read(new File("c:\\Testfiles\\Dickens_skew25.png"));
>>>>         ImageDeskew id = new ImageDeskew(bi);
>>>>         double imageSkewAngle = id.getSkewAngle(); // determine skew angle
>>>>         System.out.println(imageSkewAngle);
>>>>     }
>>>> }
>>>>
>>>> I've been poking around this code and might actually do the change we
>>>> discussed about not doing the rotation when the angle is 0, as well as
>>>> allowing rotation even if you're not doing the pre-processing. I'd be glad
>>>> to take a look at this as well if you think it's a worthwhile direction.
>>>>
