Indexer to normalize URLs: https://issues.apache.org/jira/browse/NUTCH-1300
This will _not_ update existing documents! You have to reindex all segments with normalizing enabled.

On Monday 12 March 2012 14:32:57 webdev1977 wrote:
> How would one go about changing the hostnames that a large number of URLs
> point to in both the crawldb and the Solr index? I tried running
> updatedb with the -normalize switch on. I added a regular expression
> in regex-normalize.xml. Then I ran the solrindex command, but nothing
> seemed to change in my search.

--
Markus Jelsma - CTO - Openindex
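For readers wondering what a hostname rule in regex-normalize.xml actually does to a URL, here is a minimal Java sketch. It is not Nutch's own normalizer code: the class name `HostRewriteSketch`, the hosts `oldhost.example.com` and `newhost.example.com`, and the rule itself are hypothetical. It assumes rules are applied in order as regex pattern/substitution pairs using Java regex syntax (`$1`, `$2` group references). The rule anchors on the scheme and on what follows the host, so a lookalike host such as `oldhost.example.com.other.org` is left alone.

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of regex-based hostname rewriting.
 * Each rule is a pattern/substitution pair; rules run in order,
 * each one applied to the output of the previous rule.
 */
public class HostRewriteSketch {

    // One pattern/substitution pair (assumed to mirror a regex rule in regex-normalize.xml).
    private static final class Rule {
        final Pattern pattern;
        final String substitution;

        Rule(Pattern pattern, String substitution) {
            this.pattern = pattern;
            this.substitution = substitution;
        }
    }

    // Assumed example rule: move URLs from oldhost.example.com to newhost.example.com,
    // keeping the scheme ($1) and the port/path separator or end of string ($2).
    private static final List<Rule> RULES = List.of(
        new Rule(Pattern.compile("^(https?://)oldhost\\.example\\.com([:/]|$)"),
                 "$1newhost.example.com$2"));

    public static String normalize(String url) {
        String result = url;
        for (Rule rule : RULES) {
            result = rule.pattern.matcher(result).replaceAll(rule.substitution);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints http://newhost.example.com/page?id=1
        System.out.println(normalize("http://oldhost.example.com/page?id=1"));
        // A host that merely starts with the old name is not rewritten.
        System.out.println(normalize("http://oldhost.example.com.other.org/"));
    }
}
```

A rule like this only changes URLs as they pass through a normalizing step. Documents already in Solr under the old URL are not touched, which is why the segments have to be reindexed with normalizing enabled.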