I filed a bug report, https://issues.apache.org/jira/browse/TIKA-1213, although
I'm not sure whether it's a bug or me using the API inappropriately.

Could the newly introduced NonSequentialPDFParser "help"?

-----Original Message-----
From: Clemens Wyss DEV [mailto:[email protected]]
Sent: Sunday, 22 December 2013 10:08
To: [email protected]
Subject: How can parsing a 5Mb take 3minutes?

I have a 3 MB PDF file (and others) whose content takes 3 minutes to
extract. In my test I am using AutoDetectParser (and PDFParser).
I have built Tika from source, i.e. I am using the 1.5 snapshot.

Can anybody explain why/how this is possible?

Where/how can I send the document in question?

Regards
Clemens
