> In almost all cases (99% or higher), robots.txt is used to indicate
> that a site shouldn't be crawled, *because* they don't want it
> to be indexed. The intention is painfully clear...
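(For reference, the mechanism in question is just a plain-text file at the site root. Note that a crawling opt-out and an indexing opt-out are actually two different mechanisms: robots.txt speaks to crawlers, while noindex is a meta tag or X-Robots-Tag header that speaks to indexers. A minimal robots.txt that asks all well-behaved crawlers to skip the whole site:)

```
User-agent: *
Disallow: /
```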
Not really. Maybe they couldn't care less about the index, but they
don't want crawlers chewing through all their bandwidth, blowing out
their logs, skewing their stats, etc.

> And they should all move to places where they won't be killed for
> ...
> But that doesn't make it right to act in a way that can be expected
> to harm people when you know better and can avoid it.

People are going to publish links whether or not the site wishes to be
crawled, indexed, found, outed, whatever. Phone books exist, deal with
it. Those who wish to find, kill, or play you don't care about robots,
noindex, passwords, laws, wishes, infiltrating your secret online or
real-life networks, or anything else. Whether or not Tor2Web publishes
its domain/URL list is moot because somewhere, somehow, someone has
already collated and published it. Not least the bad political
oppressors who already tapped Tor2Web's clearnet side and whatever
else to get it.

Re: the original topic... Tor2Web obtains its domains/URLs by acting
as a *proxy*, not by crawling. Robots/noindex plays no part in that.

> rendezvous collection

Operations involving the dirservs have been deprecated, so enumeration
is now a 2^80 search. As to being any specific relay [the RP?], not
sure. But if so, the view of onion domains from there is going to be
narrow and slow going. Someone who has read that part of the design
could answer...

> Hopefully some kind of NG onion would include addition data in

HiddenServiceAuthorizeClient
HidServAuth

_______________________________________________
tor-talk mailing list
[email protected]
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-talk
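(Where the 2^80 above comes from: a v2 onion address is the first 80 bits of the SHA-1 digest of the service's DER-encoded RSA public key, base32-encoded, so guessing addresses without the dirservs means searching an 80-bit space. A sketch of the derivation; the input below is a placeholder, not a real key:)

```python
import base64
import hashlib

def v2_onion_address(pubkey_der: bytes) -> str:
    # v2 onion addresses: first 80 bits (10 bytes) of SHA-1 over the
    # DER-encoded RSA public key, base32-encoded and lowercased.
    # 80 bits is why blind enumeration is a 2^80 search.
    digest = hashlib.sha1(pubkey_der).digest()
    return base64.b32encode(digest[:10]).decode("ascii").lower() + ".onion"

# Placeholder bytes, just to show the shape of the output:
# any input yields a 16-character label (80 bits / 5 bits per base32 char).
addr = v2_onion_address(b"not a real key, just illustrating the format")
print(addr)
```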

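(The two torrc options named above are the legacy v2 client-authorization knobs: the service side declares authorized clients, and Tor writes each client's cookie next to the address in the service's hostname file; the client side pastes that line back in. A sketch with placeholder paths, client name, and cookie:)

```
## Service-side torrc (placeholder paths/names):
HiddenServiceDir /var/lib/tor/hidden_service/
HiddenServicePort 80 127.0.0.1:8080
HiddenServiceAuthorizeClient stealth client1

## Client-side torrc, using the address and cookie Tor generated
## for client1 in the service's hostname file (values made up here):
HidServAuth abcdefghijklmnop.onion ExampleCookieValuePlaceholder
```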