Hi, I have crawled a bunch of PDF URLs using Nutch 1.8. For some of them, Nutch returned an empty "title" and "content". When I open one of these URLs, the text is easily selectable, and the PDF does *not* consist of scanned images (as a non-OCR'd PDF would). So I am confused about why Nutch returned empty values for "title" and "content" for such a PDF. Example URL for which Nutch returned an empty title and content: http://www.fs.fed.us/global/iitf/pubs/ja_iitf_2012_holm001.pdf
I found out that the title and content were empty through the Solr Admin UI. After the page was crawled and indexed into Solr, I searched for that URL in the Solr Admin UI, and it had these values for the title, content, url, and type fields:

From Solr Admin:
title,content,url,type
"",,http://www.fs.fed.us/global/iitf/pubs/ja_iitf_2012_holm001.pdf,"application/pdf,application,pdf"

Any thoughts, please? Thanks
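The same check can also be done without clicking through the Admin UI. Below is a minimal Python 3 sketch of that, under stated assumptions: Solr is running locally at port 8983 with a core named collection1 (host, port, and core name are assumptions, so adjust them to your setup), and the fields are named url, title, content, and type, as in the Solr Admin output. It queries Solr's standard select handler for the URL and reports which of title/content are missing or blank.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: local Solr with a core named "collection1"; adjust to your setup.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def build_query_url(page_url):
    """Build a select-handler URL that looks up one page by its url field."""
    params = {
        # Quoting the URL makes it a phrase query, so ':' and '/' need no escaping.
        "q": 'url:"%s"' % page_url,
        "fl": "url,title,content,type",  # only return the fields we care about
        "wt": "json",                    # ask Solr for a JSON response
    }
    return SOLR_SELECT + "?" + urllib.parse.urlencode(params)


def empty_fields(doc, fields=("title", "content")):
    """Return the names of fields that are missing or blank in a Solr document."""
    missing = []
    for name in fields:
        value = doc.get(name)
        # Multi-valued fields come back as lists; join them before checking.
        if isinstance(value, list):
            value = " ".join(str(v) for v in value)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def check(page_url):
    """Query Solr (requires a running instance) and print any empty fields."""
    with urllib.request.urlopen(build_query_url(page_url)) as resp:
        data = json.load(resp)
    for doc in data["response"]["docs"]:
        print(doc.get("url"), "empty:", empty_fields(doc))
```

Usage: check("http://www.fs.fed.us/global/iitf/pubs/ja_iitf_2012_holm001.pdf"). A document indexed like the one above, with title "" and no content value, would be reported with both title and content listed as empty.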

