Hi Ian

> The problem I'm finding is that the crawler is not apparently visiting or
> indexing the content of these urls. The document at the far end of the link
> has this url
>
> http://[domain]/medialibrary.axd?id=414405745
>
> is actually a pdf. I am using the tika plugin which I thought would allow
> for indexing pdfs.
>
>
Don't blame parse-tika: if the URL is never fetched, it has no chance of
being parsed, let alone indexed.

Check your URL filter: the link above contains a '?', which by default
causes the URL to be filtered out before it is ever fetched.
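
To illustrate, here is a rough Python sketch of how a first-match-wins
regex URL filter treats that link. It is not Nutch's own code. The rule
list is a simplified assumption modelled on the stock
conf/regex-urlfilter.txt (where '-[?*!@=]' skips probable query URLs),
and url_passes is a hypothetical helper name. Check the rules in your
own conf directory, since they may differ.

```python
import re

# Assumed, simplified subset of the default rules: '-' rejects, '+' accepts.
DEFAULT_RULES = [
    "-^(file|ftp|mailto):",  # skip non-http schemes
    "-[?*!@=]",              # skip URLs that look like queries
    "+.",                    # accept anything else
]


def url_passes(url, rules):
    """Return True if the first rule whose pattern is found in url is a '+' rule."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    # No rule matched at all: treat the URL as rejected.
    return False


# The URL from the original message, with the domain left elided as posted.
pdf_url = "http://[domain]/medialibrary.axd?id=414405745"

# With the query-character rule in place the URL is dropped.
print(url_passes(pdf_url, DEFAULT_RULES))

# Removing (or commenting out) that rule lets the URL through.
relaxed_rules = [r for r in DEFAULT_RULES if r != "-[?*!@=]"]
print(url_passes(pdf_url, relaxed_rules))
```

If your filter file has that rule, commenting it out or adding a more
specific '+' rule above it for the medialibrary.axd links should let
the crawler fetch them.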



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
