On 9/7/06, David Wallace <[EMAIL PROTECTED]> wrote:
> Just guessing, but could this be caused by session ids in the URL?  Or
> some other unimportant piece of data?  If this is the case, then every
> page would be added to the index when it's crawled, regardless of
> whether it's already in there, with a different session id.  If this is
> what's causing your problem, then you need to use the regexp URL
> normaliser to strip out the session ids.

Nice try but no luck, I'm afraid.
The entire site is absolutely static. The reason is that we've set up
IIS (I'm not too happy about choosing IIS over Apache) to serve files from a
shared directory on the same server, the rationale being that we'd
rather have http://-type links than file://.
From what I've seen in the logs, I don't see URLs varying so I'm still
at square one. Still, thanks for the effort. If you have any other
ideas, I'm eager to hear them.
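
For reference, the session-id theory can be checked against a list of
crawled URLs with a few lines of Python. This is only a rough sketch, not
Nutch configuration: the parameter names (jsessionid, PHPSESSID, sid,
sessionid) and the helper names are assumptions chosen for illustration,
and the real names depend on the server. If the number of distinct URLs
does not shrink after normalizing, session ids are not the cause.

```python
import re

# Assumed session parameter names; adjust to whatever the server really emits.
_PATH_SESSION = re.compile(r";jsessionid=[^?#]*", re.IGNORECASE)
_QUERY_SESSION = re.compile(
    r"(?<=[?&])(?:phpsessid|sid|sessionid)=[^&#]*&?", re.IGNORECASE
)
# A '?' or '&' left dangling at the end of the query after a removal.
_DANGLING = re.compile(r"[?&](?=#|$)")


def normalize(url):
    """Return url with the assumed session-id parameters stripped."""
    url = _PATH_SESSION.sub("", url)
    url = _QUERY_SESSION.sub("", url)
    return _DANGLING.sub("", url)


def distinct_after_normalizing(urls):
    """Set of URLs that remain distinct once session ids are removed."""
    return {normalize(u) for u in urls}
```

Comparing len(set(urls)) with len(distinct_after_normalizing(urls)) shows
whether any URLs differ only by a session id.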

t.n.a.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
