Hi Ian
> The problem I'm finding is that the crawler is not apparently visiting or
> indexing the content of these urls. The document at the far end of the link
> has this url
>
> http://[domain]/medialibrary.axd?id=414405745
>
> is actually a pdf. I am using the tika plugin which I thought would allow
> for indexing pdfs.

Don't blame parse-tika: if the URL is not fetched, it has no chance of being
parsed and then indexed.

Check your URL filter: the link above contains a '?', which by default would
get the URL filtered out.

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
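PS: to illustrate the URL filter point, here is a rough sketch in Python (not Nutch code) that imitates how a regex-urlfilter.txt style rule list is applied. It assumes the stock rule `-[?*!@=]` is present in your conf/regex-urlfilter.txt and that the first matching rule decides the outcome; the helper names `parse_rules` and `accepts` are made up for this example, so check your own config before relying on it.

```python
import re

# Assumed default rules, in regex-urlfilter.txt syntax:
# a leading '-' rejects, a leading '+' accepts, '#' starts a comment.
DEFAULT_RULES = """
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
# accept anything else
+.
"""

# Same rules with the query-character rule commented out.
RELAXED_RULES = """
# -[?*!@=]
+.
"""

def parse_rules(text):
    """Turn rule lines into (allow, compiled_pattern) pairs, skipping comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules

def accepts(rules, url):
    """First rule whose pattern is found in the URL decides; no match means reject."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False

url = "http://[domain]/medialibrary.axd?id=414405745"
print(accepts(parse_rules(DEFAULT_RULES), url))   # rejected: the URL contains '?' and '='
print(accepts(parse_rules(RELAXED_RULES), url))   # accepted once the rule is commented out
```

If your filter file does contain that rule, commenting it out (or adding a more specific `+` rule for the medialibrary.axd URLs above it) should let the URL through to the fetcher, after which parse-tika gets its chance at the PDF.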

