I posted a similar question earlier but got no response. I'm running
Nutch 0.9 and crawling an internal site to build the index. However,
the index and search results are accessed externally, so the URLs
stored with the results need to be transformed from their internal
form to one that is reachable externally. Ideally the transformation
would happen when the index and its artifacts are built.
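To make the intent concrete, here is a rough standalone sketch of the kind
of rewrite I have in mind. The host names and the class name are made up
for illustration, and this is plain Java, not wired into any Nutch plugin
API. It also drops any #fragment, which is fine for my purposes.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlRewriteSketch {
    // Hypothetical host names, for illustration only.
    static final String INTERNAL_HOST = "intranet.example.local";
    static final String EXTERNAL_HOST = "www.example.com";

    // Swap the internal host for the external one; leave other URLs alone.
    static String toExternal(String urlString) throws MalformedURLException {
        URL url = new URL(urlString);
        if (!INTERNAL_HOST.equalsIgnoreCase(url.getHost())) {
            return urlString;
        }
        // getFile() returns the path plus the query string, if any.
        return new URL(url.getProtocol(), EXTERNAL_HOST, url.getPort(),
                       url.getFile()).toString();
    }

    public static void main(String[] args) throws MalformedURLException {
        // prints http://www.example.com/docs/a.html
        System.out.println(toExternal("http://intranet.example.local/docs/a.html"));
    }
}
```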
What is the procedure for transforming URLs?
Do I write a custom plugin that extends URLNormalizer? If so, what scope
does it need to be associated with?
I would appreciate any suggestions.
Thanks,
Salman