Peter,
If you have a chance, can you check whether that tess4j module offers the
same functionality we're getting from ImageMagick? It'd be great to
replace two external dependencies with pure Java if possible.
Cheers,
Tim
On Wed, Jan 13, 2021 at 7:14 AM Tim Allison <[email protected]> wrote:
> >But does the same thing apply in this case? It's not really using all of
> tess4j. Just 1 package from it
>
> Sorry, I should have checked on this exact point before responding. If it
> isn't massive and has no native libraries, yes, let's go for it. Let me look
> into it a bit today.
>
> On Tue, Jan 12, 2021 at 10:42 PM Peter Kronenberg <
> [email protected]> wrote:
>
>> I sort of see your reasons for not using Tess4j to replace the current
>> command line calls to Tesseract. But does the same thing apply in this
>> case? It's not really using all of tess4j. Just 1 package from it
>>
>>
>> ------------------------------
>> *From:* Tim Allison <[email protected]>
>> *Sent:* Tuesday, January 12, 2021 9:11:58 PM
>> *To:* [email protected] <[email protected]>
>> *Subject:* Re: Rotation script
>>
>> I really like the idea of moving to pure Java for deskewing. We chose not
>> to use tess4j earlier as a Java binding for tesseract because it requires
>> native code. [1]
>>
>> If we can do it with another call to tesseract from the command line, or
>> if there is a fairly lightweight, pure-Java, ASL 2.0-friendly image library
>> that works well, that would be great.
>>
>> [1] https://issues.apache.org/jira/browse/TIKA-2293
>>
>> On Tue, Jan 12, 2021 at 8:28 PM Peter Kronenberg <
>> [email protected]> wrote:
>>
>> I'd been meaning to ask why you calculate the rotation with a Python
>> script. As far as I can tell, that is the only reason for the Python
>> dependency, which adds a little (a lot?) more complexity to the whole
>> project, plus who knows how much extra overhead for each Python call.
>> (Not to mention, it took me practically a whole day last week to get all
>> the dependencies working on a Linux system just to be able to run
>> Rotation.py.)
>>
>> But now I have a more important reason to question this: the Rotation
>> script does not work very well. I ran it on the attached files. I started
>> with the straight file and rotated it using IrfanView (in the file names,
>> 15 = 1.5 degrees, 25 = 2.5 degrees). Rotation.py returns 0 for the 1 and
>> 1.5 degree files, and it returns 1 for the 2 degree file. It also seems to
>> always return an integer, or at least a value rounded to an integer.
>>
>> Here is a simple Java routine that does the same thing and appears to be
>> far more accurate. It uses Tess4j:
>>
>> <dependency>
>>     <groupId>net.sourceforge.tess4j</groupId>
>>     <artifactId>tess4j</artifactId>
>>     <version>4.5.4</version>
>> </dependency>
>>
>>
>> import com.recognition.software.jdeskew.ImageDeskew;
>>
>> import javax.imageio.ImageIO;
>> import java.awt.image.BufferedImage;
>> import java.io.File;
>> import java.io.IOException;
>>
>> public class GetAngle {
>>
>>     public static void main(String[] args) throws IOException {
>>         BufferedImage bi = ImageIO.read(new File("c:\\Testfiles\\Dickens_skew25.png"));
>>         ImageDeskew id = new ImageDeskew(bi);
>>         double imageSkewAngle = id.getSkewAngle(); // determine skew angle
>>         System.out.println(imageSkewAngle);
>>     }
>> }
>>
>> I've been poking around this code and might actually make the changes we
>> discussed: skipping the rotation when the angle is 0, and allowing
>> rotation even if you're not doing the pre-processing. I'd be glad to take
>> a look at this as well if you think it's a worthwhile direction.
>>
>>
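The "skip the rotation when the angle is 0" idea from the earliest message could look roughly like the sketch below. It is only an illustration: the class name SkewDecision, the method shouldRotate, and the 0.05 degree cutoff are all hypothetical and come from neither Tika nor tess4j. A small cutoff is used instead of an exact comparison with 0 because a detector that returns a double will rarely report exactly 0.0 for a straight page.

```java
class SkewDecision {

    // Hypothetical cutoff in degrees; an assumption for illustration only,
    // not a value taken from Tika or tess4j.
    static final double MIN_SKEW_DEGREES = 0.05;

    /**
     * Returns true when the measured skew is large enough to justify
     * a rotation pass. NaN (no usable measurement) never triggers rotation.
     */
    static boolean shouldRotate(double skewAngleDegrees) {
        if (Double.isNaN(skewAngleDegrees)) {
            return false;
        }
        return Math.abs(skewAngleDegrees) >= MIN_SKEW_DEGREES;
    }

    public static void main(String[] args) {
        // Sample angles, in degrees; negative values are skew in the other direction.
        double[] angles = {0.0, 0.01, 1.5, -2.5};
        for (double angle : angles) {
            String decision = shouldRotate(angle) ? "rotate" : "skip";
            System.out.println(angle + " -> " + decision);
        }
    }
}
```

The angle passed in would come from whichever detector ends up being used, for example the value printed by the GetAngle routine above. Keeping the decision in its own method means the cutoff can be tuned, or made configurable, without touching the detection code.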