Hi,

Can you provide a concrete example?
What does Google show as the title?
If there is no title defined in the PDF's "info" container
(aka properties, aka metadata), Google must take the title
from somewhere else, e.g.,
- file name / URL
- first heading
or something similar.
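
To make the fallback idea concrete, here is a minimal sketch in plain Java.
It is not what Google or Nutch actually does. The class and method names
(TitleFallback, chooseTitle, fileNameOf) are made up for illustration, and
it assumes you already have the metadata title and the first heading as
strings (either may be null or empty) and that the URL is syntactically
valid:

```java
import java.net.URI;

// Hypothetical helper, not part of Nutch or Tika.
final class TitleFallback {

    // Returns the first usable candidate, in this order:
    // metadata title, first heading, file name taken from the URL.
    static String chooseTitle(String metadataTitle, String firstHeading, String url) {
        if (!isBlank(metadataTitle)) {
            return metadataTitle.trim();
        }
        if (!isBlank(firstHeading)) {
            return firstHeading.trim();
        }
        return fileNameOf(url);
    }

    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Last path segment of the URL; falls back to the whole URL
    // if the path is empty or just "/".
    // Assumption: the URL parses with URI.create (no raw spaces etc.).
    static String fileNameOf(String url) {
        String path = URI.create(url).getPath();
        if (path == null || path.isEmpty() || path.equals("/")) {
            return url;
        }
        if (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String url = "http://example.com/docs/report-2014.pdf";
        // No metadata title, no heading: the file name is used.
        System.out.println(chooseTitle(null, null, url));
        // Blank metadata title, but a heading is available.
        System.out.println(chooseTitle("  ", "Annual Report", url));
    }
}
```

In Nutch, logic like this would probably live in a custom parse or
indexing filter, but that wiring is left out here.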

Nutch 2.2.1 uses Tika 1.3 to parse PDFs.
If in doubt, you should check the behavior of the current
Tika version and possibly ask on the Tika mailing list
if you think it's a defect of the PDF parser.

Thanks,
Sebastian


On 04/12/2014 11:20 PM, A Laxmi wrote:
> Hi,
> 
> Nutch doesn't seem to grab the title of PDF files when there is *no
> title* defined in PDF properties, whereas Google does. Could someone
> explain if any additional tweaking has to be done from the Nutch side
> so it does not return an empty title?
> 
> Thanks!
> 
