If you are using a Solr version in the 4.x series, you could update the
fields with atomic updates [1] once the data is indexed. This is not the
Nutch way of doing it, but it is what came to mind and it can work right away.

[1] http://wiki.apache.org/solr/UpdateJSON#Atomic_Updates
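
Here is a rough sketch in Python of what such an atomic update could look
like. It only illustrates the idea and is not a tested recipe. The assumptions
are mine: the Solr unique key is "id" and holds the crawled URL, the field to
overwrite is called "url", Solr listens at
http://localhost:8983/solr/collection1/update, and the helper names
(rewrite_url, build_payload, send) are made up for this example. Atomic
updates also need the update log enabled and the other fields stored.

```python
import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit

# Assumed Solr update endpoint; adjust to your own core and host.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update?commit=true"


def rewrite_url(url):
    """Turn http://host/article=xxx into http://host/kb#article=xxx."""
    parts = urlsplit(url)
    fragment = parts.path.lstrip("/")  # "article=xxx"
    return urlunsplit((parts.scheme, parts.netloc, "/kb", "", fragment))


def build_payload(crawled_urls, field="url"):
    """Build the JSON body for an atomic update.

    Assumes the document id is the crawled URL and that `field`
    is the field you want to overwrite ("set" replaces its value).
    """
    docs = [{"id": u, field: {"set": rewrite_url(u)}} for u in crawled_urls]
    return json.dumps(docs)


def send(payload, solr_url=SOLR_UPDATE_URL):
    """POST the payload to Solr. Not called here; needs a running Solr."""
    request = urllib.request.Request(
        solr_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


if __name__ == "__main__":
    print(build_payload(["http://www.example.com/article=xxx"]))
```

You would still have to collect the ids of the indexed documents first (for
example with a query against Solr) and then call send() with the payload.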


On Mon, Apr 22, 2013 at 9:56 AM, Niels Boldt <nielsbo...@gmail.com> wrote:

> Hi,
>
> We are crawling a site using nutch 1.6 and indexing into solr.
>
> However, we need to rewrite the urls that are indexed in the following way
>
> For instance, nutch crawls a page http://www.example.com/article=xxx but
> when moving data to the index we would like to use the url
>
> http://www.example.com/kb#article=xxx
>
> instead. So when we get data from Solr, it will show links to
> http://www.example.com/kb#article=xxx instead of
> http://www.example.com/article=xxx
>
> Is that possible to do by creating a plugin that extends the UrlNormalizer,
> eg
>
> http://nutch.apache.org/apidocs-1.4/org/apache/nutch/net/URLNormalizer.html
>
> Or is it better to add a new indexed property that we use?
>
> Best Regards
> Niels
>



-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>
