On 2/14/13 10:28 PM, [email protected] wrote:
> On 14/02/13 10:55, Fabio Pietrosanti (naif) wrote:
>
> > Tor2web software by default:
> > - setup a robots.txt to prevent search engine scraping
> > - block a wide set of Crawler UA to further block search activities
> > - prevent hotlinking (from an internet resource to do
> >   <img src="https://blahblah.tor2web.org/image.jpg">
>
> This seems like a strange default to me.

Hotlink prevention is required to stop people from linking "highly
controversial material" on public internet forums and using the few
Tor2web proxies as a sort of "Content Delivery Network".

> I can see why people would want to create hidden services that can be
> discovered using ordinary channels on the Internet such as search
> engines.
>
> If a hidden service operator actually wanted to block search engines,
> they'd know to create their own robots.txt file, or to add appropriate
> meta tags to their HTML, or to simply block based on the User-Agent
> header...

Tor2web 3.0 beta1 is not a final solution. Making it really flexible
(for example, letting a TorHS operator configure this robots.txt
behavior) would require a lot of additional code and features.
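
To make the three defaults discussed above concrete, here is a minimal
sketch in Python. It is not Tor2web's actual code: the function names,
the short crawler list, the `tor2web.org` proxy domain, and the choice to
let requests without a Referer header through are all assumptions made
for illustration.

```python
# Illustrative sketch only; NOT Tor2web's real implementation.
# Every name and list entry below is hypothetical.
from urllib.parse import urlparse

# Deny-all robots.txt that the proxy would serve for every hidden service.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"

# Hypothetical, deliberately short list of crawler User-Agent substrings.
CRAWLER_UA_SUBSTRINGS = ("googlebot", "bingbot", "yandex", "baiduspider")


def is_crawler(user_agent):
    """Return True if the User-Agent contains a known crawler substring."""
    ua = (user_agent or "").lower()
    return any(s in ua for s in CRAWLER_UA_SUBSTRINGS)


def is_hotlink(referer, proxy_domain):
    """Return True if the Referer names a host outside proxy_domain."""
    if not referer:
        # Assumption: a missing Referer is treated as direct navigation.
        return False
    host = (urlparse(referer).hostname or "").lower()
    proxy_domain = proxy_domain.lower()
    return not (host == proxy_domain or host.endswith("." + proxy_domain))


def decide(path, user_agent, referer, proxy_domain="tor2web.org"):
    """Return (status, body); body is None when the request may be proxied."""
    if path == "/robots.txt":
        return 200, ROBOTS_TXT
    if is_crawler(user_agent):
        return 403, "crawlers are not allowed"
    if is_hotlink(referer, proxy_domain):
        return 403, "hotlinking is not allowed"
    return 200, None
```

With these assumptions, an `<img>` tag on an external forum page that
points at the proxy would be answered with a 403, while the same image
requested from a page served through the proxy itself would be passed on.
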
In the meantime, however, there is a simple reason to avoid general
indexing of Tor Hidden Services exposed through Tor2web: survival of
the service. From experience, it is much more difficult to keep a
Tor2web server running than a Tor Exit Node with a completely open
exit policy. If you enable "Google indexing", the number of complaints
you receive will increase exponentially, quickly making it seriously
difficult to keep the Tor2web proxy running. :\

Fabio
_______________________________________________
tor-talk mailing list
[email protected]
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-talk
