[jira] [Updated] (NUTCH-2742) Unable to parse specific pdf file

2019-10-06 Thread M A (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] M A updated NUTCH-2742: Attachment: (was: segment-dump.txt) > Unable to parse specific pdf file

[jira] [Issue Comment Deleted] (NUTCH-2742) Unable to parse specific pdf file

2019-10-06 Thread M A (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] M A updated NUTCH-2742: Comment: was deleted (was: saf) > Unable to parse specific pdf file

[jira] [Updated] (NUTCH-2742) Unable to parse specific pdf file

2019-10-06 Thread M A (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] M A updated NUTCH-2742: Description: It appears that the Tika plugin is not parsing some PDF files. When I completed a dump of the segment d

[jira] [Updated] (NUTCH-2742) Unable to parse specific pdf file

2019-10-06 Thread M A (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] M A updated NUTCH-2742: Attachment: (was: crawl.log) > Unable to parse specific pdf file

[jira] [Reopened] (NUTCH-2742) Unable to parse specific pdf file

2019-10-06 Thread M A (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] M A reopened NUTCH-2742: saf > Unable to parse specific pdf file > Key: NUTCH-2742

[jira] [Closed] (NUTCH-2742) Unable to parse specific pdf file

2019-10-06 Thread M A (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] M A closed NUTCH-2742. -- Resolution: Not A Problem > Unable to parse specific pdf file > - > > K

[jira] [Issue Comment Deleted] (NUTCH-2742) Unable to parse specific pdf file

2019-10-06 Thread M A (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] M A updated NUTCH-2742: Comment: was deleted (was: Apologies, didn't realise that was a feature.) > Unable to parse specific pdf file