[
https://issues.apache.org/jira/browse/NUTCH-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
M A updated NUTCH-2742:
---
Attachment: (was: segment-dump.txt)
> Unable to parse specific pdf file
> -
>
>
[
https://issues.apache.org/jira/browse/NUTCH-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
M A updated NUTCH-2742:
---
Comment: was deleted
(was: saf)
> Unable to parse specific pdf file
> -
>
>
[
https://issues.apache.org/jira/browse/NUTCH-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
M A updated NUTCH-2742:
---
Description:
It appears that the Tika plugin is not parsing some PDF files.
When I completed a dump of the segment d
[
https://issues.apache.org/jira/browse/NUTCH-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
M A updated NUTCH-2742:
---
Attachment: (was: crawl.log)
> Unable to parse specific pdf file
> -
>
>
[
https://issues.apache.org/jira/browse/NUTCH-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
M A reopened NUTCH-2742:
saf
> Unable to parse specific pdf file
> -
>
> Key: NUTCH-2742
>
[
https://issues.apache.org/jira/browse/NUTCH-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
M A closed NUTCH-2742.
--
Resolution: Not A Problem
> Unable to parse specific pdf file
> -
>
> K
[
https://issues.apache.org/jira/browse/NUTCH-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
M A updated NUTCH-2742:
---
Comment: was deleted
(was: Apologies, didn't realise that was a feature.)
> Unable to parse specific pdf file
>