Hi guys,

Some of you may know of the Tesjeract project, which brings Tesseract to Java on Windows. I'm interested in running Audiveris (which uses Tesjeract) on Linux, so I've been trying to port Tesjeract to Linux. It uses tessdll.dll, which initially posed a problem because there is no direct equivalent for Linux.
I eventually managed to create a Linux equivalent. To keep things simple, it only contains the minimum needed by Tesjeract. In order to build it, I had to build Tesseract itself as shared libraries, something that hasn't been done on Linux up till now. See issue 174 about that, though I don't think it's the cause of the problem I'm about to mention.

I now have Audiveris successfully calling Tesseract through Tesjeract without crashing. The problem is that only ~ is being returned. I understand that this is what happens when a glyph isn't recognised?

I've checked the files being fed to Tesseract and they are indeed uncompressed 8-bit TIFFs. Here's a sample:

http://groups.google.com/group/tesseract-ocr/web/tesjeract-linux.tiff

Tesseract correctly reports the image information. Here's some of the output I get:

[java] Image has 8 * 1 bits per pixel, and size (83,24)
[java] Resolution=1
[java] omr.glyph.text.Sentence.recognize(Sentence.java:636) -- INFO: Glyph#763 (eng)->"~"
[java] Image has 8 * 1 bits per pixel, and size (80,24)
[java] Resolution=1
[java] omr.glyph.text.Sentence.recognize(Sentence.java:636) -- INFO: Glyph#768 (eng)->"~"
[java] Image has 8 * 1 bits per pixel, and size (50,23)
[java] Resolution=1
[java] omr.glyph.text.Sentence.recognize(Sentence.java:636) -- INFO: Glyph#438 (eng)->"~"
[java] Image has 8 * 1 bits per pixel, and size (617,27)
[java] Resolution=1
[java] omr.glyph.text.Sentence.recognize(Sentence.java:636) -- INFO: Glyph#765 (eng)->"~"

I also fed the sample to Tesseract directly and it correctly recognised it as "COUNTRY", so it seems there's nothing wrong with my installation. I did see a mailing list thread about the DLL returning ~ when the application didn't, but I got the impression that this problem was fixed.

Since I've only mimicked existing code, I don't have a deep understanding of how Tesseract works, so I'd appreciate it if someone could take a look and see if there's anything obviously wrong.
Someone familiar with the DLL and/or Tesjeract should be able to follow this code quite easily.

http://groups.google.com/group/tesseract-ocr/web/tesjeract-linux.zip

If you want to try and build it, let me know and I'll post some instructions. Note that I'm using the latest Tesseract code from SVN.
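
For anyone who wants to confirm the image format independently of Tesseract, here is a minimal Python sketch that reads the relevant tags straight from a TIFF header. The function name tiff_summary is made up for illustration and is not part of Tesseract, Tesjeract or Audiveris. It assumes a baseline TIFF, looks only at the first image directory, and ignores tags whose values are not stored inline.

```python
import struct

# Baseline TIFF tag numbers (from the TIFF 6.0 specification).
TAGS = {256: "width", 257: "height", 258: "bits_per_sample", 259: "compression"}


def tiff_summary(data: bytes) -> dict:
    """Return selected tags from the first IFD of a TIFF held in memory.

    Hypothetical helper for illustration; only single-valued SHORT and LONG
    fields are decoded, everything else is skipped.
    """
    if data[:2] == b"II":
        endian = "<"  # little-endian file
    elif data[:2] == b"MM":
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")

    # Bytes 2-3 hold the magic number 42, bytes 4-7 the offset of the first IFD.
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: bad magic number")

    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    summary = {}
    for i in range(entry_count):
        # Each IFD entry is 12 bytes: tag, field type, count, value-or-offset.
        start = ifd_offset + 2 + 12 * i
        tag, field_type, count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag not in TAGS or count != 1:
            continue
        value_field = data[start + 8:start + 12]
        if field_type == 3:  # SHORT: value sits in the first two bytes
            (value,) = struct.unpack(endian + "H", value_field[:2])
        elif field_type == 4:  # LONG: value fills all four bytes
            (value,) = struct.unpack(endian + "I", value_field)
        else:
            continue
        summary[TAGS[tag]] = value
    return summary
```

To use it, read the sample file in binary mode and pass the bytes to tiff_summary. In the TIFF specification a compression value of 1 means no compression, so an uncompressed 8-bit greyscale image should show compression 1 and bits_per_sample 8.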

