Having said that, if the URL filters are correct, the next step is to check that the parser actually returns the outlink. Google for ParserChecker and try it on the URL of the page containing the link.
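
To make the filter point concrete, here is a rough Python sketch of how an ordered list of regex rules decides whether a URL survives. It is only an illustration, not Nutch code: the rule lists are simplified stand-ins modelled on the kind of entries found in regex-urlfilter.txt (your own conf file may differ), `url_passes` is a hypothetical helper, and example.com stands in for the real domain. The assumption is that the first rule whose pattern is found in the URL wins, and a URL matching no rule is dropped.

```python
import re

# Simplified stand-in for a default-style regex-urlfilter.txt (assumption:
# the real file has more rules). '-' rejects, '+' accepts.
DEFAULT_RULES = [
    "-^(file|ftp|mailto):",  # skip non-http schemes
    "-[?*!@=]",              # skip URLs that look like queries
    "+.",                    # accept anything else
]

# Hypothetical relaxed variant: '?' and '=' removed from the reject rule.
RELAXED_RULES = [
    "-^(file|ftp|mailto):",
    "-[*!@]",
    "+.",
]


def url_passes(url, rules):
    """Return True if the first rule found in the URL is an accept rule."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    # No rule matched: treat the URL as filtered out.
    return False


# example.com is a placeholder for the elided domain in the original report.
url = "http://example.com/medialibrary.axd?id=414405745"
print(url_passes(url, DEFAULT_RULES))
print(url_passes(url, RELAXED_RULES))
```

With the first rule set the '?' in the URL hits the reject rule before the catch-all accept is reached; with the relaxed set it falls through to `+.`. Whether relaxing that rule is a good idea for a given crawl is a separate question, since it also lets in every other query-style URL.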

On 23 January 2012 16:04, Julien Nioche <lists.digitalpeb...@gmail.com> wrote:

> Hi Ian
>
>> The problem I'm finding is that the crawler is not apparently visiting or
>> indexing the content of these urls. The document at the far end of the link
>> has this url
>>
>> http://[domain]/medialibrary.axd?id=414405745
>>
>> is actually a pdf. I am using the tika plugin which I thought would allow
>> for indexing pdfs.
>
> don't blame parse-tika : if the URL is not fetched then it has no chance
> of being parsed then indexed
>
> check your URL filter : the link above contains a '?' which by default
> would get the URL to be filtered out
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com