I have a 3 MB PDF file (among others) whose content takes 3 minutes to extract. In my test I am using AutoDetectParser (and PDFParser). I have built Tika from source, i.e. I am using the 1.5 snapshot.
Can anybody explain why/how this is possible? Where/how can I send the document in question? Regards, Clemens
