Re: [WWWOFFLE-Users] refresh-recurse in lastout: possible improvement?

Andrew M. Bishop Sun, 22 Oct 2006 21:47:53 -0700

Albert Reiner <[EMAIL PROTECTED]> writes:

> The (not too) recent changes to the LASTOUT index as regards POST
> requests are an enormous improvement for me.
> 
> Now, the last thing I am aware of that does not quite work the way I
> would like in the LASTOUT index are items like
> 
>   
> <http://localhost:8080/refresh-recurse/?url=http://www.pmsf.de/pub/cmucl/doc/cmu-user/;depth=5;limit=http://www.pmsf.de/pub/cmucl/doc/cmu-user/;images=Y;frames=Y>
> 
> i.e., cases where the "outgoing" URL was actually an instruction for
> WWWOFFLE to fetch pages recursively.  Selecting that link from the
> LASTOUT index does not have the desired effect of informing me about
> the pages that were fetched, and I have to remember the URL and search
> the LASTTIME index instead for these items.
> 
> Would it be possible (with reasonable effort) to make sure that the
> LASTOUT index links to a page listing the new pages recursively
> fetched or something like that?  I assume this information would be
> available by parsing the starting page and checking the LASTTIME
> index.  (Ideally, one would then be able to sort pages as on the index
> pages.  Or one might also present pages in a tree structure
> corresponding to the "generations" of the recursive fetch.)


I don't know what would be the appropriate information to display in
this case.  The cache index for the host would be one option, but it
would not be clear which pages were fetched this time and which were
not.  To produce a list of all of the pages that were fetched this
time would not cover the case of recursive fetches across multiple
hosts.  Your suggestion for a tree structure would be quite difficult
because it would mean re-parsing all of the pages and deciding which
ones might have been fetched and then listing the ones that were
actually fetched.

> Alternatively, it would already be an improvement if selecting that
> link in the LASTOUT index led to a page stating that this link
> represents a recursive selection of items that can be reached starting
> from some URL (http://www.pmsf.de/pub/cmucl/doc/cmu-user/ in the above
> example) that is then linked to.

This would be possible, except that there is no way to tell whether
the link was followed to see which pages were fetched or to request
the pages in the first place.  Following the link in the index is
exactly the same as making the original request.
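
For illustration only, the starting URL and the other options can be
recovered from an index entry like the one quoted above by splitting
the query string on ';'.  The following is a minimal sketch in Python,
not WWWOFFLE code: the function name is hypothetical, and it assumes
the options are separated by semicolons and not URL-encoded, exactly
as they appear in the example link.

```python
from urllib.parse import urlsplit


def parse_refresh_recurse(link):
    """Return the options of a refresh-recurse link as a dict.

    Hypothetical helper: assumes ';'-separated key=value options
    with unencoded values, as in the example link in this thread.
    """
    parts = urlsplit(link)
    if not parts.path.startswith("/refresh-recurse"):
        raise ValueError("not a refresh-recurse link")
    options = {}
    for item in parts.query.split(";"):
        # Split on the first '=' only; skip items that have no '='.
        key, sep, value = item.partition("=")
        if sep:
            options[key] = value
    return options


example = (
    "http://localhost:8080/refresh-recurse/"
    "?url=http://www.pmsf.de/pub/cmucl/doc/cmu-user/"
    ";depth=5"
    ";limit=http://www.pmsf.de/pub/cmucl/doc/cmu-user/"
    ";images=Y;frames=Y"
)
options = parse_refresh_recurse(example)
print(options["url"])    # the starting page of the recursive fetch
print(options["depth"])  # the recursion depth, as a string
```

This only recovers what was requested.  It says nothing about which
pages were actually fetched, which is the harder problem described
above.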

-- 
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop                             [EMAIL PROTECTED]
                                      http://www.gedanken.demon.co.uk/

WWWOFFLE users page:
        http://www.gedanken.demon.co.uk/wwwoffle/version-2.9/user.html
