Having said that, if the URL filters are correct, the next step is to check
that the parser actually returns the outlink. Google for ParserChecker and
run it on the URL of the page containing the link.

On 23 January 2012 16:04, Julien Nioche <lists.digitalpeb...@gmail.com> wrote:

> Hi Ian
>
>
>> The problem I'm finding is that the crawler apparently isn't visiting or
>> indexing the content of these URLs. The document at the far end of the link
>> has this URL
>>
>> http://[domain]/medialibrary.axd?id=414405745
>>
>> and is actually a PDF. I am using the Tika plugin, which I thought would
>> allow PDFs to be indexed.
>>
> Don't blame parse-tika: if the URL is not fetched, it has no chance
> of being parsed and then indexed.
>
> Check your URL filter: the link above contains a '?', which by default
> causes the URL to be filtered out.
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>
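To see why the '?' matters: Nutch's default conf/regex-urlfilter.txt contains, among other rules, the line `-[?*!@=]` (skip URLs containing characters typical of queries) and ends with `+.` to accept everything else. The Java sketch that follows is a simplified, hypothetical re-implementation of that style of filtering, not Nutch's actual filter class. It assumes rules are tried in order, the first pattern found anywhere in the URL decides (+ accept, - reject), and a URL matching no rule is rejected. The class name `UrlFilterCheck` and the host `example.com` (standing in for the real domain) are placeholders. Note that this URL also contains '=', so removing only '?' from the character class would not be enough.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified sketch of regex-urlfilter style evaluation:
 * rules are checked in order, the first pattern found in the URL decides
 * (+ accept, - reject), and a URL that matches no rule is rejected.
 */
public class UrlFilterCheck {

    /** One "+regex" or "-regex" rule line. */
    record Rule(boolean accept, Pattern pattern) {}

    static List<Rule> parse(String... lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines and comments, as in the config file format.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(trimmed.substring(1))));
        }
        return rules;
    }

    static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            // find() matches the pattern anywhere in the URL.
            if (rule.pattern().matcher(url).find()) {
                return rule.accept();
            }
        }
        return false; // no rule matched: reject
    }

    public static void main(String[] args) {
        String url = "http://example.com/medialibrary.axd?id=414405745";

        // Default-style rules: reject query-like characters, accept the rest.
        List<Rule> defaults = parse("-[?*!@=]", "+.");
        // Relaxed rules: '?' and '=' are no longer rejected.
        List<Rule> relaxed = parse("-[*!@]", "+.");

        System.out.println("default: " + accepts(defaults, url));
        System.out.println("relaxed: " + accepts(relaxed, url));
    }
}
```

With the default-style rules it prints `default: false`, because the first rule finds the '?'; with the relaxed character class no reject rule matches, `+.` applies, and it prints `relaxed: true`.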
