Mike, I had a similar issue. I dealt with it by changing the code in org.apache.nutch.parse.html.HtmlParser (around line 178) to add my own URLs to the list of links, based on information from 'content.getUrl()'. The good thing is that this class is referenced in the 'parse-plugins.xml' configuration, so you can write your own HTML parser and then update that file and one more conf file to use it. In other words, it is extensible.
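
To illustrate the idea, here is a rough sketch in plain Java. It does not use any Nutch classes: the class name ExtraOutlinks, the method addHiddenLinks, the "/search" check and the "/items/..." paths are all made up for the example, and I am assuming that "the list" is the list of links the parser has collected for the page. In a real parser the list would hold Nutch's own link objects rather than strings.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExtraOutlinks {

    // Hypothetical paths of pages that are only reachable via the search page.
    static final List<String> HIDDEN_PATHS = Arrays.asList("/items/1", "/items/2");

    // Returns a copy of the links found on the page, plus extra URLs
    // derived from the page's own URL (what content.getUrl() would give you).
    static List<String> addHiddenLinks(String pageUrl, List<String> outlinks) {
        List<String> result = new ArrayList<>(outlinks);
        URI base = URI.create(pageUrl);

        // Assumption: only the search page should contribute the extra links.
        String path = base.getPath();
        if (path != null && path.startsWith("/search")) {
            for (String hidden : HIDDEN_PATHS) {
                // Resolve against the page URL so scheme and host are reused.
                String extra = base.resolve(hidden).toString();
                if (!result.contains(extra)) {
                    result.add(extra);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> found = Arrays.asList("http://example.com/about");
        List<String> all = addHiddenLinks("http://example.com/search?q=foo", found);
        for (String url : all) {
            System.out.println(url);
        }
    }
}
```

Pages other than the search page pass through unchanged, so the extra links are only injected once per crawl of that page.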

Peyman

On Fri, Nov 18, 2011 at 8:04 AM, Michael Kelleher <[email protected]> wrote:
> I have content that is not linkable from anywhere on a site. This content
> is only reachable via a search page.
>
> Is it possible via some type of connector or custom plugin to index this
> content?
>
> Thanks,
>
> --mike

