Hi,

On Mon, Dec 8, 2008 at 2:05 PM, Nadav Har'El <[EMAIL PROTECTED]> wrote:
> Yes, I admitted that it is a scary idea, but in the long run, what *will*
> the Tika developers do if indeed there is a bug in a specific PDF construct?
> Hope that the PDFbox developers fix it?

Yes. We file an issue at PDFBox and upgrade to the next release that fixes that problem. If we actually *know* how to fix the problem then we attach the patch to that issue. If the PDFBox developers are unresponsive, then we start looking for some other PDF parser library that better meets our needs.

Forking PDFBox is IMHO only the very last option after the above alternatives are exhausted. And even then I wouldn't move the code to Tika, but rather start a new project where people interested in PDF (as opposed to generic text extraction) can come together and work on the code.

> If indeed such a symbiosis is possible, then it will indeed be great because
> the size issue becomes moot (although others like the "dll hell" mentioned
> earlier don't). I just wonder if we have any power to influence these
> other projects.

Most projects I know are eager to welcome people with good ideas and patches. The basic principle of open source development is that you influence projects by contributing to them. I don't see any parser project rejecting a proposal like "it would be helpful for Tika if you adopted this patch I've prepared" unless the patch itself is crappy.

BR,

Jukka Zitting