You can do this with regex-normalize.xml, which allows you to transform URLs based on regular expressions.
So you could make one appear to be the other, or vice versa, or make both appear to be a third. Rules are written like so:

  <regex-normalize>
    <regex>
      <pattern>(https?://)www\.(.*)</pattern>
      <substitution>$1$2</substitution>
    </regex>
    ...

This example removes "www." from URLs. The pattern has only two capture groups, so the substitution refers to $1 (the scheme) and $2 (everything after "www.").

On 4/22/08, Raj Malhotra <[EMAIL PROTECTED]> wrote:
> Hi
>
> I have two URLs: 1) http://servername/mac and 2)
> http://servername/mac/index.html. Is it possible to tell Nutch through
> configuration that these two URLs are the same? If anybody knows how to
> tackle this, please explain how to do it.
>
> regards
>
> Raj
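For the two URLs in the question, a rule along these lines might work. It is a sketch, not a tested configuration. It assumes that the regex URL normalizer is enabled and that it applies each pattern as a Java-style regular expression replacement, where $1 refers to the first capture group. The rule strips a trailing "/index.html", so http://servername/mac/index.html becomes http://servername/mac. http://servername/mac does not match, so it is left unchanged.

```xml
<?xml version="1.0"?>
<regex-normalize>
  <!-- Hypothetical rule (assumption, not a shipped default):
       drop a trailing "/index.html" so that
       http://servername/mac/index.html normalizes to http://servername/mac.
       [^?#]* keeps the match from reaching into a query string or fragment. -->
  <regex>
    <pattern>^(https?://[^?#]*)/index\.html$</pattern>
    <substitution>$1</substitution>
  </regex>
</regex-normalize>
```

Note that this rule leaves http://servername/mac/ (with a trailing slash) as a separate, third form. A further rule would be needed if that variant also shows up in your crawl.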
