regex-normalize.xml

This file lets you rewrite (normalize) URLs using regular expressions.

So you can make one URL look like the other, or vice versa, or map both
to a third form.

Rules are written like so:

<regex-normalize>
<regex>
  <pattern>(https?://)www\.(.*)</pattern>
  <substitution>$1$2</substitution>
</regex>
...
</regex-normalize>

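This example removes the "www." prefix from URLs. The pattern has two
capture groups (the scheme and everything after "www."), so the
substitution joins them with $1$2.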
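Here is a minimal Java sketch of what such rules do. It applies each pattern in order with java.util.regex and `replaceAll`, roughly the way a regex normalizer would, but the class and method names are hypothetical and it is not Nutch's own code. Depending on the Nutch version, the normalizer may use a different regex engine. The second rule is also an assumption: it addresses the question below by dropping a trailing "/index.html", so http://servername/mac/index.html and http://servername/mac end up as the same URL. That is only safe if both really serve the same page.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class RegexNormalizeSketch {
    // Hypothetical rule table (pattern -> substitution), applied in insertion
    // order, mirroring the <regex> entries of a regex-normalize.xml file.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();

    static {
        // The www-stripping rule from the example above (two groups, so $1$2).
        RULES.put(Pattern.compile("(https?://)www\\.(.*)"), "$1$2");
        // Hypothetical extra rule: strip a trailing "/index.html".
        RULES.put(Pattern.compile("^(https?://.*)/index\\.html$"), "$1");
    }

    // Runs the URL through every rule in turn and returns the result.
    static String normalize(String url) {
        String result = url;
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            result = rule.getKey().matcher(result).replaceAll(rule.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://www.example.com/a")); // http://example.com/a
        System.out.println(normalize("http://servername/mac/index.html")); // http://servername/mac
        System.out.println(normalize("http://servername/mac")); // http://servername/mac
    }
}
```

In regex-normalize.xml, the second rule would be another <regex> entry with the same pattern (unescaped) and a substitution of $1.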

On 4/22/08, Raj Malhotra <[EMAIL PROTECTED]> wrote:
> Hi
>  I have two urls  - 1) http://servername/mac and 2)
>  http://servername/mac/index.html. Is it possible to tell nutch that these
>  two urls are the same through configuration? If anybody knows how to tackle
>  this, please explain how to do it.
>
>  regards
>
> Raj
>
