[ https://issues.apache.org/jira/browse/NUTCH-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
M A reopened NUTCH-2742:
------------------------

saf

> Unable to parse specific pdf file
> ---------------------------------
>
>                 Key: NUTCH-2742
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2742
>             Project: Nutch
>          Issue Type: Bug
>          Components: nutchNewbie, parser
>    Affects Versions: 1.15
>            Reporter: M A
>            Priority: Minor
>
> It appears that the Tika plugin is not parsing some PDF files.
> An example is
> "https://parlinfo.aph.gov.au/parlInfo/download/chamber/hansards/1b090c4f-e4d9-4785-a733-b5270139d035/toc_pdf/Senate_2019_02_12_6907_Official.pdf"
> When I completed a dump of the segment data there is no content
>
> EDIT: See attached for output and crawl log
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)