[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17330882#comment-17330882
]
David Pilato commented on TIKA-3364:
Oh my god! I'm feeling stupid.
Anyway, I was not able to choose
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17330851#comment-17330851
]
Tim Allison commented on TIKA-3364:
---
Try {{pdfParser.setExtractBookmarksText(false);}}
> PDF Content is
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17330827#comment-17330827
]
Nick Burch commented on TIKA-3364:
--
I'm not sure if we already have outlines/bookmarks elsewhere in other
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17330824#comment-17330824
]
David Pilato commented on TIKA-3364:
So I tried this:
{code:java}
PDFParser pdfParser =
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17330810#comment-17330810
]
Tim Allison commented on TIKA-3364:
---
We should probably add extra markup in the xhtml to identify the
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17330809#comment-17330809
]
Tim Allison commented on TIKA-3364:
---
You can see the text under the {{Outlines}} node.
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17330805#comment-17330805
]
Tim Allison commented on TIKA-3364:
---
{noformat}
Dummy PDF file
{noformat}
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17330799#comment-17330799
]
Tim Allison commented on TIKA-3364:
---
The PDF contains bookmark text, which is what is triggering the .