Opened an issue for this: https://issues.apache.org/jira/browse/PDFBOX-1821
-----Original Message-----
From: Clemens Wyss - MySign AG [mailto:[email protected]]
Sent: Sunday, 22 December 2013 17:37
To: '[email protected]'
Subject: Parsing a pdf file takes 3minutes

I initially posted this question on the Tika mailing list, and I even created an issue for it: https://issues.apache.org/jira/browse/TIKA-1213

Hoping I am now on the right list, I will rephrase the problem I am facing:
I have several PDF documents that take up to 3 minutes to parse/extract (for later Lucene indexing). For example, the PDF attached to the JIRA issue takes 3 minutes.
How/why is this possible? How can I improve on this?

Any help appreciated
Clemens

