On 25 Jan 2009, at 11:17 AM, Hydro Meteor wrote:


On Wed, Jan 14, 2009 at 4:43 AM, Christiaan Hofman <[email protected]> wrote:

On 14 Jan 2009, at 3:04 PM, Adam M. Goldstein wrote:

> On Jan 14, 2009, at 7:12 AM, Christiaan Hofman wrote:
>
>> Tesseract is an example of what I was calling "it won't be good
>> enough". It's source code for a command line tool, not a program, and
>> it does only text analysis, not layout analysis. The latter is also
>> crucial to be able to select. And it certainly does not output PDF.
>> So you're still (very) far from having selectable PDFs, as Noam is
>> asking for. Unfortunately.
>
> A layout tool called "ocropus" integrates tesseract to give better
> quality results than with tesseract alone. At the google pages about
> this (http://sites.google.com/site/ocropus/platforms/os-x) it is
> claimed that it has been successfully compiled on OSX, although Linux
> seems to be the main target platform. Google claims that this
> combination works as well as commercially available OCR software. They
> seem to have a vested interest in this because they want to get the
> text from all of the scanned images of library books in their google
> library project.
>

I also saw that project. It does indeed take the next step, but it is
still far from sufficient.

> Anyhow, I don't know how you'd manipulate the scanned text to match
> the PDF so text can be selected.

As I mentioned in the RFE about this, it really is a big show stopper
for integration in Skim, because we simply have no access to the
PDFKit internals to patch.

Is PDFKit a moving target (meaning, it's closed up by Apple, thus no access to source code)? What about in the context of GnuStep? Since Skim can be compiled (at least in theory) to run on GnuStep,

Who says that? It most definitely cannot (if only because GnuStep doesn't have PDFKit yet).

BTW, AFAIC, even IF it were possible, I certainly wouldn't spend the (enormous amount of) time to implement it.

So anyone who thinks it's possible is welcome to join and implement it. Keeping on asking us is pointless.

Christiaan

would it be possible to combine Skim + ocropus + tesseract under the context of GnuStep? That would be a potentially rocking solution. I'd love to see it. It just so happens that I'm in the market for a scanner and I want a sheet feeder (I'll probably get a Fujitsu ScanSnap). I'm looking at SANE for open source scanning capability. To be able to add open source OCR to SANE-backed scanning and then top it off with Skim would be nirvana. I can well imagine even running this on GnuStep on a Linux desktop, itself in a virtual machine (under VMWare or Parallels, say) with OS X as the host OS.

Cheers!

[SNIP]


_______________________________________________
Skim-app-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/skim-app-users
