George Weller wrote:
> Hi all,
> 
> First I note in the logs that a large number of PDF documents have been
> fetched, and yet only two have been indexed, and indeed only these two
> appear in search results. The content limit is set high enough to allow
> these documents to be indexed, so I can't think why this should be.

Are there any related errors in the log?

> Secondly for those documents that ARE indexed, rather bizarrely, the
> document titles in the search results have a '.xls' extension. I can even
> search for all PDF documents just by using the query 'xls'. Note that this
> suffix is most definitely NOT in the actual title of those files. I also
> chanced upon a site that seems to use Nutch (no affiliation, I just googled)
> and found the same problem...

In the examples from your site, the title is extracted from the PDF
metadata, so the search result just shows the title stored within the
PDF document, not the file name.
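
To check what title a given PDF actually carries, something like the
rough Python sketch below may help. It is not Nutch code: the helper
name pdf_metadata_title and the sample bytes are made up for
illustration, and it only handles the simple case of an uncompressed
Info dictionary whose /Title is a literal string without escapes or
nested parentheses. A real PDF library is the better tool for anything
beyond a quick look.

```python
import re

# Hypothetical helper (not part of Nutch): find the /Title entry in raw PDF bytes.
# Assumption: the Info dictionary is uncompressed and the title is a plain
# literal string, e.g. /Title (something), with no escapes or nested parentheses.
TITLE_RE = re.compile(rb"/Title\s*\(([^()\\]*)\)")


def pdf_metadata_title(raw: bytes):
    """Return the metadata title found in the raw bytes, or None if absent."""
    match = TITLE_RE.search(raw)
    if match is None:
        return None
    # Latin-1 maps every byte to a character, so decoding never fails here.
    return match.group(1).decode("latin-1")


# Made-up Info dictionary, as a PDF exported from a spreadsheet might contain.
sample = b"1 0 obj\n<< /Title (budget.xls) /Producer (Some PDF printer) >>\nendobj\n"
print(pdf_metadata_title(sample))
```

If the stored title turns out to end in '.xls', the suffix most likely
comes from the tool that produced the PDF rather than from the indexer.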

-- 
 Sami Siren
