Mike,

I had a similar issue. The way I dealt with it was to change the code in
org.apache.nutch.parse.html.HtmlParser (around line 178) so that it adds
my own URLs to the page's outlink list, based on the value of
'content.getUrl()'.
The good thing is that this class is referenced in the 'parse-plugins.xml'
configuration. You can therefore write your own HTML parser, point that
config (and one other config file) at it, and leave the stock parser
untouched, so the approach is extensible.
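To make the idea concrete, here is a minimal standalone sketch of the decision logic only: given the URL of the page being parsed (what 'content.getUrl()' returns), decide which extra URLs to inject. The class name ExtraOutlinks, the search endpoint and the query list are all hypothetical placeholders, and the "only inject on the home page" rule is just an assumed policy. Inside a real parser you would wrap each returned string in Nutch's Outlink class and append it to the outlinks the parser already collects. That wiring differs between Nutch versions, so it is deliberately left out here.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: given the URL of the page being parsed, return extra
 * URLs to add to its outlinks so the crawler can reach content that is only
 * exposed through a search page.
 */
public class ExtraOutlinks {

    // Assumed search endpoint and query terms; replace with your site's own.
    static final String SEARCH_PAGE = "http://www.example.com/search";
    static final String[] QUERIES = {"widgets", "gadgets"};

    public static List<String> forPage(String pageUrl) {
        List<String> extra = new ArrayList<>();
        String path = URI.create(pageUrl).getPath();
        // Assumed policy: only inject the search URLs when parsing the home page,
        // so they are generated once rather than on every page of the site.
        if (path == null || path.isEmpty() || path.equals("/")) {
            for (String q : QUERIES) {
                // URL-encode the query term before building the search URL.
                extra.add(SEARCH_PAGE + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8));
            }
        }
        return extra;
    }

    public static void main(String[] args) {
        // Print the extra links that would be injected for the home page.
        for (String u : forPage("http://www.example.com/")) {
            System.out.println(u);
        }
    }
}
```

The crawler then fetches the injected search-result pages like any other outlink, and the normally unlinked content becomes reachable through the links on those result pages.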

Peyman

On Fri, Nov 18, 2011 at 8:04 AM, Michael Kelleher <[email protected]> wrote:
> I have content that is not linkable from anywhere on a site.  This content
> is only reachable via a search page.
>
> Is it possible via some type of connector or custom plugin to index this
> content?
>
> Thanks,
>
> --mike
>
