I'm attempting to apply some new regex-normalize and regex-urlfilter
rules to my existing crawl directory.

For example, one of the new regex-normalize rules:

<regex>
        <pattern>(https?://)www\.(.*)</pattern>
        <substitution>$1$2</substitution>
</regex>
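With that rule, a URL such as http://www.example.com/page should be
rewritten to http://example.com/page. On the filtering side, the new
rules are ordinary regex-urlfilter.txt entries; a minimal sketch, with
the hostname just a placeholder (first matching line wins, and the
final "+." accepts everything not rejected above it):

-^https?://([a-z0-9-]+\.)*unwantedhost\.example\.com/
+.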

I tried the updatedb command and the mergedb command, but neither of
these seems to update what the web application returns:

./nutch updatedb ../TEST1/crawl/crawldb/ \
    ../TEST1/crawl/segments/20071125053435/ -normalize -filter

./nutch mergedb ../TEST1/crawl/crawldb/ ../TEST1/crawl/crawldb/ \
    -normalize -filter
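If I've read the usage right, mergedb expects the output crawldb first
and then the input crawldb(s), so writing into a separate directory
might be cleaner. A minimal sketch (crawldb-new is just a placeholder
name for the new output, which would then be swapped in):

./nutch mergedb ../TEST1/crawl/crawldb-new ../TEST1/crawl/crawldb/ \
    -normalize -filter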

Am I on the right track?
