Leeroy, to avoid being indexed by Googlebot et al, place the appropriate /robots.txt at your root. It's described in the FAQ.
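
As a rough sketch of what that could look like (not taken from the FAQ), the snippet below embeds a robots.txt that turns Googlebot away while leaving other crawlers alone, then checks it with Python's standard urllib.robotparser. The rule set, the helper name may_fetch, the crawler name ExampleOtherBot, and the use of someonion.onion.city as the host are all assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: refuse Googlebot, allow everyone else.
# An empty "Disallow:" value means nothing is disallowed for that group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://someonion.onion.city/index.html"  # placeholder host
    for agent in ("Googlebot", "ExampleOtherBot"):
        print(agent, may_fetch(ROBOTS_TXT, agent, page))
```

This only shows how a well-behaved crawler would read such a file; it does not stop a crawler that ignores robots.txt. The FAQ linked below is the place to check what onion.city actually honors.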
http://www.onion.city/faq.html

As a historical note, the reason Aaron and I chose Tor2web's URL design was so search engines would automatically see any /robots.txt an onionsite specifies.

-V

On Fri, Feb 13, 2015 at 3:30 PM, l.m <[email protected]> wrote:
>
> >Alas no. I'm aware this is suboptimal. I see GOOG search engine as a
> >temporary-ladder just to get the ball rolling. I am open to using any
> >other index. For what it's worth I'm very pleased with GOOG's
> >performance---right now it's searching an index of 650k onion pages and the
> >number grows every day.
>
> If you instead use a google search appliance couldn't you use google
> engine for indexing without having to use google itself? Wouldn't that
> also avoid the problem of google queries being associated with the
> client making the request?
>
> >Although we technically could read provided passwords, we don't keep logs
> >of passed traffic. However, I understand that many users don't understand
> >the tor2web threat model. But this is the same as all Tor2web nodes, yes?
> >This is not at all unique to OnionCity. As far as I know all Tor2web nodes
> >allow form submissions.
>
> What is unique to onion.city is that access to someonion.onion.city
> occurs using http and doesn't redirect to the .onion if Tor is in use.
> That the tor2web mirror might snoop is implicit--that the exit (if
> using tor) might also snoop is more of a concern.
>
> >You mentioned it'd be better to have it randomly pick among the available
> >Tor2web nodes instead of everything going through OnionCity. This breaks
> >the GOOG search engine which only wants to return "canonical" URLs. We
> >could talk about making OnionCity a DNS round-robin akin to how Tor2web.org
> >currently works, but then I'm just replicating Tor2web.
>
> The ability of tor2web to provide mirrors should be optional. If you
> only know one mirror and that mirror cannot service the request then
> how are you going to get any of the other mirrors? Google engine can
> return related addresses in an order based on the success of loading
> the mirror itself. If onion.city always works it will tend to precede
> tor2web.org. If onion.city goes down (having search front-end separate
> from tor2web mirror) the search engine can reorder the result to
> improve the success of the first click.
>
> >Right now I aggregate existing lists of onion sites and put them into the
> >site map.
> >* https://ahmia.fi/onions/
> >* http://skunksworkedp2cg.onion.city/sites.txt
> >* http://xlmvhk3rpdux26dz.onion.city/
> >* http://kkkkkku5juzqh33a.onion.city/
>
> If google is itself handling the indexing won't that cause a problem
> for sites in those lists, which are normally okay with being indexed,
> just not by googlebot? I for one couldn't care less about being
> indexed by ahmia.fi but it'll be a cold day in hell before I let
> googlebot. Precisely because of how easy it is to link the search to
> the requester.
>
> --leeroy
> --
> tor-talk mailing list - [email protected]
> To unsubscribe or change other settings go to
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-talk
