On 2003/02/07 14:41:44, [EMAIL PROTECTED] wrote:
> Also, should robots.txt parsing be mandatory or on a voluntary basis? If you
> make this configurable, the first thing people will do is shut it off.

I would like to see it enabled by default, with the ability
to disable it on a per-site ('document' in JPluck terms)
basis -- not globally.

By default: be good. Be "bad" on a case by case basis.
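
Something like this is what I'm picturing (a rough Python
sketch; 'ignore_robots' is just a name I invented for the
per-document setting, not an existing option):

    from urllib.parse import urljoin
    import urllib.robotparser

    def may_fetch(url, user_agent, ignore_robots=False):
        # 'ignore_robots' stands in for a per-document setting; the
        # default is to behave and honour robots.txt.
        if ignore_robots:
            return True
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(urljoin(url, "/robots.txt"))
        rp.read()
        return rp.can_fetch(user_agent, url)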


> As an aside: do people think the default user-agent should be AvantGo?

No. Same principle as above. Be good by default.

Again, the ability to change the User-Agent string on a
per-site (document) basis will be required.
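
Again as a sketch (the default string here is only
illustrative; whatever the honest name ends up being):

    import urllib.request

    DEFAULT_USER_AGENT = "Plucker"   # illustrative only

    def fetch(url, user_agent=None):
        # Per-document override of the User-Agent; honest default otherwise.
        req = urllib.request.Request(
            url, headers={"User-Agent": user_agent or DEFAULT_USER_AGENT})
        with urllib.request.urlopen(req) as resp:
            return resp.read()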

The changes you are making to limit the number of simultaneous
requests and to enforce a wait between requests are a really
good idea. :-)
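
I imagine the throttling looking roughly like this (the
numbers are made up, and 'do_fetch' stands in for whatever
actually performs the HTTP request):

    import threading
    import time

    MAX_SIMULTANEOUS = 2    # made-up cap on parallel requests
    REQUEST_DELAY = 1.0     # made-up pause (seconds) between requests

    _slots = threading.Semaphore(MAX_SIMULTANEOUS)

    def polite_fetch(url, do_fetch):
        # Hold one of the limited slots while fetching, then pause
        # briefly before releasing it, so the server isn't hammered.
        with _slots:
            data = do_fetch(url)
            time.sleep(REQUEST_DELAY)
            return data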

To be honest, a lot of the time I use wget first to fetch the
pages, because I find I often have to tweak them before
converting them.

Hopefully, if I start using XSL to do the tweaking, then with
the cache you have added, repeated tweak-convert cycles will
only hit the site the first time and go through the cache
thereafter. (Or will it still hit the site to check whether
the documents were modified? If so, I'd like a way to make it
use only the cache -- that is, to work offline.)
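
In other words, something along these lines (the 'offline'
flag is what I'm asking for, not something that exists today):

    import os

    def get_page(url, cache_path, do_fetch, offline=False):
        # Offline: never touch the network -- serve from the cache or fail.
        if offline:
            with open(cache_path, "rb") as f:
                return f.read()
        # Online: reuse the cached copy if we have one, otherwise fetch
        # and store it. (A smarter version could send If-Modified-Since
        # instead of skipping the network check entirely.)
        if os.path.exists(cache_path):
            with open(cache_path, "rb") as f:
                return f.read()
        data = do_fetch(url)
        with open(cache_path, "wb") as f:
            f.write(data)
        return data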
