See NUTCH-1300, "Indexer to normalize URLs":
https://issues.apache.org/jira/browse/NUTCH-1300

This will _not_ update documents that are already in the index! You have to
reindex all segments with normalizing enabled.
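
To make the idea of a hostname rewrite concrete, here is a minimal sketch of
what a pattern/substitution rule does to a URL. It is plain Python, not Nutch
code, and the hostnames old-host.example.com and new-host.example.com are
made up for illustration; your own rule in regex-normalize.xml will differ.

```python
import re

# Hypothetical rule: move every URL on old-host.example.com to
# new-host.example.com. Both hostnames are placeholders, not taken
# from the original thread.
RULES = [
    (
        # Match the scheme plus the old host, but only when the host is
        # followed by a port, a path, or the end of the string.
        re.compile(r"^(https?://)old-host\.example\.com(?=[:/]|$)"),
        r"\g<1>new-host.example.com",
    ),
]


def normalize(url: str) -> str:
    """Apply each (pattern, substitution) pair in order and return the result."""
    for pattern, substitution in RULES:
        url = pattern.sub(substitution, url)
    return url


if __name__ == "__main__":
    for u in (
        "http://old-host.example.com/docs/page.html",
        "https://old-host.example.com:8080/",
        "http://other.example.com/docs/page.html",
    ):
        print(u, "->", normalize(u))
```

The lookahead keeps the rule from touching hosts that merely start with the
old name, such as old-host.example.com.evil.org.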

On Monday 12 March 2012 14:32:57 webdev1977 wrote:
> How would one go about changing the hostnames that a large number of urls
> point to in both the crawldb as well as the solr index?  I tried running
> the updatedb with the -normalize switch on. I added a regular expression
> in regex-normalize.xml. Then I ran the solrindex command, but nothing
> seemed to change in my search?
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Hostnames-changed-for-lots-of-URLS-in-crawldb-solr-index-how-to-change-tp3819265p3819265.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

-- 
Markus Jelsma - CTO - Openindex
