On 9/8/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
Tomi NA wrote:
> On 9/7/06, David Wallace <[EMAIL PROTECTED]> wrote:
>> Just guessing, but could this be caused by session ids in the URL?  Or
>> some other unimportant piece of data?  If this is the case, then every
>> page would be added to the index when it's crawled, regardless of
>> whether it's already in there, with a different session id.  If this is
>> what's causing your problem, then you need to use the regexp URL
>> normaliser to strip out the session ids.
>
> Nice try but no luck, I'm afraid.
> The complete web site is entirely static. The reason is that we've set up
> IIS (I'm not too happy choosing IIS over Apache) to serve files from a
> shared directory on the same server, the rationale being that we'd
> rather have http://-type links than file://.
> From what I've seen in the logs, I don't see URLs varying, so I'm still
> at square one. Still, thanks for the effort. If you have any other
> ideas, I'm eager to hear them.

The best way to discover what's going on is to start from a small subset
of injected urls, and do the following:

* inject

* dump the db to a text file

* generate / fetch / updatedb

* dump the db again to a second text file

* compare the files.
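
For the last step, a small script can make the comparison easier to read than a raw diff. The following is only a rough sketch, not part of any Nutch tooling: it assumes each record in the text dump starts with a line whose first whitespace-delimited token is the URL, and the names `urls_in_dump` and `compare_dumps` are made up for illustration. If your dump looks different, adjust the pattern.

```python
import re
import sys

# Assumption: a record starts with a line whose first token is the URL.
# Adjust this pattern if the dump you get is laid out differently.
URL_LINE = re.compile(r"^(https?://\S+)", re.MULTILINE)


def urls_in_dump(text):
    """Return the set of URLs found at the start of a line in a dump."""
    return set(URL_LINE.findall(text))


def compare_dumps(before_text, after_text):
    """Return (added, removed) as sorted lists of URLs."""
    before = urls_in_dump(before_text)
    after = urls_in_dump(after_text)
    return sorted(after - before), sorted(before - after)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python compare_dumps.py first.txt second.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        first = f.read()
    with open(sys.argv[2], encoding="utf-8") as f:
        second = f.read()
    added, removed = compare_dumps(first, second)
    for url in added:
        print("+ " + url)
    for url in removed:
        print("- " + url)
```

URLs marked with `+` appear only in the second dump. If some of them look like near-duplicates of URLs you injected, that is a good hint about where the extra entries come from.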

I'll see if I'm able to reproduce those steps here, thanks.

t.n.a.
