Mailing list transition [archives]
Can someone make sure all the new lists get submitted/added to markmail? As official archives in Maildir or Mbox are not yet provided (under the curious guise of spam prevention), some alternative indexes to the ones provided by the list engine would be valuable to the community. *** To unsubscribe, send an e-mail to majord...@torproject.org with unsubscribe or-talk in the body. http://archives.seul.org/or/talk/
Re: Scroogle and Tor
Thus spake Matthew (pump...@cotse.net): On 13/02/11 19:09, scroo...@lavabit.com wrote: I've been fighting two different Tor users for a week. Each is apparently having a good time trying to see how quickly they can get results from Scroogle searches via Tor exit nodes. The fastest I've seen is about two per second. Since Tor users are only two percent of all Scroogle searches, I'm not averse to blocking all Tor exits for a while when all else fails. These two Tor users were rotating their search terms, and one also switched his user-agent once. You can see why I might be tempted to throw my "block all Tor" switch on occasion -- sometimes there's no other way to convince the bad guy that he's not going to succeed. For the less than knowledgeable people amongst us (e.g. me) who want to learn a bit more: what was the rationale for those two Tor users doing what they did? What do they get from it? I second this. Daniel, if you can find a way to fingerprint these bots, my suggestion would be to observe the types of queries they are running (perhaps for some of their earlier runs from when you could ban them by user agent?). One of the things Google does is actually decide your 'Captchaness' based on the content of your queries. Well, at least I suspect that's what they are doing, because I have been able to more reliably reproduce Torbutton Captcha-related bugs when I try hard to write queries like robots that are looking for PHP sites to exploit. I would love to hear more about the types of scrapers that abuse Tor. Or rather, I would like to see if someone can at least identify rational behavior behind scrapers that abuse Tor. Some of it could also be misdirected malware that is operating from within Torified browsers. Some of it could also be deliberately torified malware. Google won't tell us any of this, obviously ;). -- Mike Perry Mad Computer Scientist fscked.org evil labs
Re: Scroogle and Tor
scroo...@lavabit.com wrote: I've been fighting two different Tor users for a week. Each is apparently having a good time trying to see how quickly they can get results from Scroogle searches via Tor exit nodes. [snip] As the person who (recently) raised the question about the availability of Scroogle via Tor, I want to thank you both for running Scroogle and for coming on this list to explain what happened. I also apologize to the list for not mentioning that Scroogle is once again available via Tor. (I discovered that and meant to publish that fact approx. 24 hours ago.) You are obviously much more knowledgeable about network issues than I am, so I will leave it to others to advise you about possible mitigations for your problems. It is a real shame about the script kiddies, but such is the world we live in. Jim
Re: Is gatereloaded a Bad Exit?
I never made the claim this was safer. Of course, not quoted as such. Plaintext anywhere is risky. Yet this entire thread is about sniffing. How plaintext-only exits somehow equate to sniffing. And how badexiting plaintext-only exits somehow equates to reducing that risk. Both are weak premises. And said exits were loosely defined as wolves whose only purpose was to log traffic. I cited several engineering reasons why this type of exit policy is a pain for us. Perhaps after the nodes were waxed on the premise of sniffing and the thread exploded. (Dethreading might show otherwise so no picking is intended at all.) It shouldn't matter though as certainly folks would better support decisions to solve anonymity engineering and/or performance problems that are causing a non-trivial impact or holdup. Is Tor really at the point where reducing the exit matrix provides a significant or greater win as opposed to updates in other areas? Does (or will) Tor bundle 80+443 to the same destination via the same exit? What about http[s], smtp[s], imap[s], pop[s], submission grouping? If the user is using different functions or accounts with different protocols, he likely doesn't want this. Better let him do his own bundling with MAPADDRESS or some toggles or something and enhance those tools instead. I've also made the claim that there is no rational reason to operate an exit in this fashion. People are encouraged to help out however they can. Therefore, operator fiat and whim is, by definition, sufficient reason. If Joe operator thinks 6667, 31337, 21, 23, 25, 80, 6969, 12345, 7, 53, 79, 2401, 19, 70, 110, 123 and so on are pretty uber cool, daresay even silly motivation, and wants to support them, that's his right. Just as he can disallow www.{un.int,aclu.org}:80. He doesn't have to announce it with some 'no sniffing, pro rights' policy statement to those that might believe the paper it's printed on, validate his social ties, be contacted, or otherwise vetted. 
If another example is needed, not that one is: corporate, edu and other LANs sometimes think they can block 'ooo, encryption bad' ports so they can watch their users' plaintext URLs with their substandard vendor nanny watch tool of the day. All the while their staff laughs at them as they happily tunnel whatever they want over that (perhaps even the client or exit parts of Tor). Yes, this kind of joke exists :) And another: in some equally crazy backwards braindead jurisdiction, it helps to be able to say 'hey, we're not hiding our traffic in crypto, we forbid it, so look mr. authorized gov agent, you can sniff all that traffic you're getting reports about, and we're not in it, therefore we're off the hook'. Perhaps even in France, etc., with their strange crypto laws. There was also mention of exits to RFC1918 space. No ISP with brains routes this, especially not for customer-facing interfaces. Yet they could simply be exits so that the operator and others can access the 1918 space said operator has deployed internally. They might not care to use a (hidden service OnionCat VPN) for this. Be it due to config, speed, anonymity or otherwise. Nor might they wish to overload routable address space as an exit to their local designs. It's just as crazy. But they're all rational in someone's mind. [I haven't actually tried to map 1918 _in_ to Tor yet, just figured what can be configured not to go out must be capable of going in.] What about the users that want to reach their peer, via that only exit in Siberia whose IP isn't blocked before their peer, that happens to be offering only port 80, to which their peer can listen? It's not a question of whether *we* would do such things or see them as rational. This is network space, any to any, hack to hack. One man's widget is another man's stinky wicket. It's the tools that matter. Tor is a network tool, with a nifty anonymity layer. We also detect throttling by virtue of our bw authorities measuring using 443. 
The same goes for exits that we detect ... throttling 443. Thanks, I yield this hack to be mooted by the project, cool. 443 is the second-most trafficked port by byte on the Tor network, occupying only ~1% of the traffic. Sniffing was needed to determine this :) And, assuming 80 was found to be the first-most (which sounds right), then in the 80+443(+rest) case, a sniffer's cost is only raised, say, sub 10%, not double. So dropping said nodes truly does nothing useful costwise either. (A day's worth of netflow on a faster open exit would show the port distribution breakdown, if anyone wants to.) Node testing methodologies are cool. And what can't be proven beyond that belongs to userland. Engineering is also cool (and there are some potentially good reasons to normalize exits there, beyond the crypto/non-crypto port groups to be sure). And all the various use cases, examples and whims are cool. So why not start a new thread exploring the engineering and, if valid and overriding of same, let the
Re: Is gatereloaded a Bad Exit?
So, with everything said, could we now please Un-BadExit the nodes that were affected? Thanks! morphium
Re: Is gatereloaded a Bad Exit?
Thus spake morphium (morph...@morphium.info): So, with everything said, could we now please Un-BadExit the nodes that were affected? Sure, dude. Since you've read everything that was said, I take it you're volunteering to contact the other node operators and ask them to give reasons for why they chose their exit policy? Let us know their preferred email addresses when you're done. But they'll have to survive a challenge and response round proving they can modify their contact info field ;). -- Mike Perry Mad Computer Scientist fscked.org evil labs
Re: Is gatereloaded a Bad Exit?
On 14.02.2011 14:41, morphium wrote: So, with everything said, could we now please Un-BadExit the nodes that were affected? The whole discussion didn't change my mind. I still support the idea of flagging them as bad exit. Regards Olaf
Re: Is gatereloaded a Bad Exit?
the whole discussion didn't change my mind. I still support the idea of flagging them as bad exit. Same. Mike gave some good reasons for flagging them weeks ago and I've yet to see much else besides ranting that seems to ignore most of this thread. -Damian
Re: Is gatereloaded a Bad Exit?
Sure, dude. Since you've read everything that was said, I take it you're volunteering to contact the other node operators and ask them to give reasons for why they chose their exit policy? So please BadExit all nodes without contact email. If they don't explain why they chose the default exit policy, I think they should be blacklisted! Thanks! morphium
Re: Is gatereloaded a Bad Exit?
On Mon, 2011-02-14 at 14:41 +0100, morphium wrote: So, with everything said, could we now please Un-BadExit the nodes that were affected? Sorry, but this has been a long thread and I want to try to make sure I understand something important. Is it true or false that traffic was actually exiting through gatereloaded et al.? I recall seeing that those nodes weren't marked as exits in the consensus anyway. If that is the case, then all of John Case's arguments related to super-secret movie-plot usages of Tor and servers running on port 80 only accessible through gatereloaded et al. seem to be irrelevant.
Re: Is gatereloaded a Bad Exit?
On Mon, 14 Feb 2011, morphium wrote: Sure, dude. Since you've read everything that was said, I take it you're volunteering to contact the other node operators and ask them to give reasons for why they chose their exit policy? So please BadExit all nodes without contact email. If they don't explain why they chose the default exit policy, I think they should be blacklisted! No, it goes further than that. The real motion here is to BadExit all nodes that aren't being used and deployed exactly like I deploy mine. When the dust settles, could we get the official threat model, and the official end user profile and the official use case documented? Again, for the lulz.
dir-spec.txt and directory-signature entries
The final entries in a consensus document are a number of directory-signature entries. dir-spec.txt says: '"directory-signature" SP identity SP signing-key-digest NL Signature. This is a signature of the status document, with the initial item "network-status-version", and the signature item "directory-signature", using the signing key. (In this case, we take the hash through the _space_ after directory-signature, not the newline: this ensures that all authorities sign the same thing.) "identity" is the hex-encoded digest of the authority identity key of the signing authority, and "signing-key-digest" is the hex-encoded digest of the current authority signing key of the signing authority.' Does that mean "the hash from the network-status-version entry to the *first* directory-signature entry, including a SP"? Or something else? The wording in dir-spec.txt is ambiguous to me. Any help appreciated. Cheers /Jocke
Re: Is gatereloaded a Bad Exit?
I suppose the anarchist genes in me are not strong enough. I have to agree with Mike Perry's arguments, given his credibility, and his clearer perspective than most of the rest of us. If this BadExit policy is being made up ad-hoc, that's fine by me. If the offending Tor node operators want to stand up and defend themselves, or their choices, that's fine too. -- Julie C. ju...@h-ck.ca GPG key FF4E2E70 available at http://keys.gnupg.net On Mon, Feb 14, 2011 at 9:44 AM, John Case c...@sdf.lonestar.org wrote: On Mon, 14 Feb 2011, morphium wrote: Sure, dude. Since you've read everything that was said, I take it you're volunteering to contact the other node operators and ask them to give reasons for why they chose their exit policy? So please BadExit all nodes without contact email. If they don't explain why they chose the default exit policy, I think they should be blacklisted! No, it goes further than that. The real motion here is to BadExit all nodes that aren't being used and deployed exactly like I deploy mine. When the dust settles, could we get the official threat model, and the official end user profile and the official use case documented? Again, for the lulz.
Re: dir-spec.txt and directory-signature entries
On 2011-02-14 19:46, Nick Mathewson wrote: [snip] Does that mean "the hash from the network-status-version entry to the *first* directory-signature entry, including a SP"? It means everything beginning with the string "network-status-version" and ending with the first string "directory-signature ". This refers to the _string_ "directory-signature " (with included space), not to the entire directory signature. (It _can't_ refer to the entire directory signature, since when the authority computes the signature, it doesn't know what the signature is going to be.) Yes, that was my understanding as well. Thanks for the clarification. I looked elsewhere in my code and realised that the shared signature code added an extra \n after directory-signature when verifying consensus documents. I got extremely confused because I could verify both router descriptor and key certificate documents. In other words: my bad, i.e. I needed someone to talk to. :-) Sorry for the noise. Cheers /Jocke
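A minimal Python sketch of the reading Nick confirms above, for anyone checking their own verifier: the signed bytes run from "network-status-version" through the first "directory-signature " including its trailing space, and nothing after it. The function names are hypothetical, and the SHA-1 default is an assumption here (check dir-spec.txt for the algorithm your document uses). The sketch only locates and hashes the signed portion; it does not check the signature itself.

```python
import hashlib


def signed_portion(document: bytes) -> bytes:
    """Return the bytes covered by a consensus signature.

    The range starts at "network-status-version" and ends right after the
    first "directory-signature " keyword, trailing space included, so the
    identity/digest fields and the newline that follow are excluded.
    """
    start = document.index(b"network-status-version")
    # Match the keyword at the start of a line so the same text inside
    # another item cannot be mistaken for it.
    marker = b"\ndirectory-signature "
    end = document.index(marker, start) + len(marker)
    return document[start:end]


def signed_digest(document: bytes, algorithm: str = "sha1") -> str:
    """Hex digest of the signed portion (SHA-1 by default, an assumption)."""
    return hashlib.new(algorithm, signed_portion(document)).hexdigest()
```

Jocke's bug corresponds to hashing signed_portion(doc) + b"\n": the digest changes, so verification fails even though the document itself is fine.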
Re: Is gatereloaded a Bad Exit?
On Mon, 2011-02-14 at 17:41 +, John Case wrote: On Mon, 14 Feb 2011, Ted Smith wrote: Sorry, but this has been a long thread and I want to try to make sure I understand something important. Is it true or false that traffic was actually exiting through gatereloaded et al.? I recall seeing that those nodes weren't marked as exits in the consensus anyway. If that is the case, then all of John Case's arguments related to super-secret movie-plot usages of Tor and servers running on port 80 only accessible through gatereloaded et al. seem to be irrelevant. And therefore will always be irrelevant, never affecting a single ToR user into the infinite future. Is there an or-parliament list I should be on if I want to be an Official Tor Project Legislator, making these *important policy decisions* that affect Tor into the *infinite future*? I know it's easier to send emails about something incredibly unimportant to inflate one's own ego than it is to actually get shit done, but this is ridiculous.
Re: Is gatereloaded a Bad Exit?
On 2/14/2011 7:48 AM, grarpamp wrote: [snip] If another example is needed, not that one is: corporate, edu and other LANs sometimes think they can block 'ooo, encryption bad' ports so they can watch their users' plaintext URLs with their substandard vendor nanny watch tool of the day. All the while their staff laughs at them as they happily tunnel whatever they want over that (perhaps even the client or exit parts of Tor). Yes, this kind of joke exists :) [/snip] Although I've been keeping out of this argument for the most part, and even though I'm leaning towards seeing things Mike's way, I just wanted to comment that I've actually been in an environment like this several times, once at my previous university, and once working for a local government organization. As asinine as such reasoning is on the part of the network administrator (or the person who signs their checks), I can see why the *ability* to run strange exit policies could be a good thing, and should be preserved in the software. However, I see no reason why providing an anonymous contact email would be so hard. Certainly if you're going out of your way to avoid [insert conspiracy of choice] in order to run a node, you have the skills to use one of the hundreds of free email services out there? I don't think asking for a tiny bit of responsibility on the part of exit operators is too much to ask, and I'm amazed that "allow them to continue to function as middle nodes until they explain why their node appears broken or malicious" is continually being turned into some kind of human-rights violation. ~Justin Aplin
Re: Is gatereloaded a Bad Exit?
On Monday 14 February 2011 14:17:45 Aplin, Justin M wrote: However, I see no reason why providing an anonymous contact email would be so hard. Certainly if you're going out of your way to avoid [insert conspiracy of choice] in order to run a node, you have the skills to use one of the hundreds of free email services out there? I don't think asking for a tiny bit of responsibility on the part of exit operators is too much to ask, and I'm amazed that "allow them to continue to function as middle nodes until they explain why their node appears broken or malicious" is continually being turned into some kind of human-rights violation. Or even better, create a nym using remailers. This does take some maintenance, since if one of the remailers goes down, you have to make a new chain of remailers for the nym to work, but it's more secure than a Yahoo/Hotmail/etc. account. cmeclax
Re: Is gatereloaded a Bad Exit?
Hello Julie, On Mon, 14 Feb 2011, Julie C wrote: I suppose the anarchist genes in me are not strong enough. I have to agree with Mike Perry's arguments, given his credibility, and his clearer perspective than most of the rest of us. If this BadExit policy is being made up ad-hoc, that's fine by me. If the offending Tor node operators want to stand up and defend themselves, or their choices, that's fine too. Great. What's the acceptable companion port to 119? How about 6667? Since these ports, like 25, have no standard companion (as 80/443 typically do), what collection of encrypted ports needs to be maintained to balance out running 199/6667? Come on people - I thought there would be quick answers to all of this... RE: clearer perspective - it's easy to have a clear perspective when you discount all possible use cases that aren't what I do.
Re: Is gatereloaded a Bad Exit?
On 2011-02-14 John Case wrote: Where's the answer to this? I chose edge-case scenarios above, for sure, but this is the real meat of the implementation of your plans, and I'd like to know if you've given any thought to this whatsoever. What _is_ the proper corresponding open port for 25? What _do_ you find an acceptable match for 53? What system of weights will you give ports that don't have an obvious corollary? Oh, by the way - I used TCP port 80 this morning for something other than cleartext HTTP. You've already made perfectly clear that you don't get the point. Can we now stop beating the dead horse? Thank you. Regards Ansgar Wiechers -- All vulnerabilities deserve a public fear period prior to patches becoming available. --Jason Coombs on Bugtraq
Re: Is gatereloaded a Bad Exit?
On Mon, Feb 14, 2011 at 4:32 PM, John Case c...@sdf.lonestar.org wrote: Hello Julie, On Mon, 14 Feb 2011, Julie C wrote: I suppose the anarchist genes in me are not strong enough. I have to agree with Mike Perry's arguments, given his credibility, and his clearer perspective than most of the rest of us. If this BadExit policy is being made up ad-hoc, that's fine by me. If the offending Tor node operators want to stand up and defend themselves, or their choices, that's fine too. Great. What's the acceptable companion port to 119? How about 6667? Since these ports, like 25, have no standard companion (as 80/443 typically do), what collection of encrypted ports needs to be maintained to balance out running 199/6667? Come on people - I thought there would be quick answers to all of this... RE: clearer perspective - it's easy to have a clear perspective when you discount all possible use cases that aren't what I do. Here's an argument tip: When you think you've spotted some enormous hole in the other side's argument, there is at least a small chance that you've actually spotted a hole in your understanding of their position. You should probably take a moment to reflect and make sure you're confident that you know where the error is before hitting send. I refrained from answering this the first time you asked it because I thought if I gave you more time you might realize that it wasn't really a useful question. No one has suggested every unencrypted port must be matched. There are some very clear matches which do exist (e.g. HTTP/HTTPS) and for those matches action can be taken. Nothing requires anything to be done about all the other cases where such nice and popular parallels are not obvious or where the protocols are unpopular enough to begin with. HTTP is an overwhelmingly popular port, and there really isn't anything wrong with special-casing _just_ that, if that's all that it ever came to.
Your examples aren't the best though: SSL SMTP is on 465, and it's probably common enough that a similar rule could be enforced if anyone cared. IRC ports aren't all that consistent even without the introduction of security, so there isn't much that can be said there. [snip] and people that need this are in literally life or death (or at least free or jail) situations. Then they need to not run an exit. If running an exit is probably going to get you killed or put in jail you should not be running one. If you're right and the decision to allow wacko exit policies discourages people with their life on the line from running exits, then I could imagine no better policy.
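A minimal sketch of the kind of rule Gregory describes, assuming an exit's accepted ports have already been extracted into a set (parsing real accept/reject exit-policy lines and port ranges is left out). The pairing table is hypothetical: 80/443 is the pair the thread centers on, 25/465 is the SMTP-over-SSL pair he mentions, and ports with no obvious counterpart, such as 119 or 6667, are simply absent from the table and never flagged.

```python
# Hypothetical pairing table: plaintext port -> encrypted counterpart.
PORT_PAIRS = {
    80: 443,  # HTTP / HTTPS
    25: 465,  # SMTP / SMTP over SSL
}


def unpaired_plaintext_ports(allowed_ports, pairs=PORT_PAIRS):
    """Return plaintext ports an exit allows while rejecting their encrypted pair.

    Ports missing from the pairing table are ignored entirely.
    """
    allowed = set(allowed_ports)
    return sorted(
        plain
        for plain, encrypted in pairs.items()
        if plain in allowed and encrypted not in allowed
    )
```

An exit accepting {80, 110, 6667} would be reported as [80]; one accepting both 80 and 443 as []. What to do with a flagged exit (BadExit, or asking the operator first) is the policy question the thread is arguing about, not something the check decides.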
ToR: A network by/for ToR admins
On Mon, 14 Feb 2011, Gregory Maxwell wrote: Then they need to not run an exit. If running an exit is probably going to get you killed or put in jail you should not be running one. If you're right and the decision to allow wacko exit policies discourages people with their life on the line from running exits, then I could imagine no better policy. Thank you, thank you. It took some time and some goading, but we've finally arrived: ToR will be used the way we think it should be. That's all I needed to hear.
Re: Scroogle and Tor
Some have wondered why anyone would want to abuse Scroogle using Tor. Apart from some malicious types that may be doing it for their own amusement, it looks to me like they are trying to datamine Google -- arguably the largest, most diverse database on the planet. If you can manage to run a script 24/7 that datamines Google, you can monetize your results. Search engine optimizers would like to be able to do this. So would various directory builders. Doing it by scraping google.com directly is not easy. Scroogle provides 100 links of organic results per request, with less than one-half the byte-bloat that Google delivers for the same links and snippets. It is also much easier to parse Scroogle's simple output page than it is to parse Google's output page. I spend a couple hours per day blocking abusers. A huge amount of this is done through a couple dozen monitoring programs I've written, but for the most part these programs provide candidates for blocking only, and my wetware is needed to make the final determination. My efforts to counter abuse occasionally cause some programmers to consider using Tor to get Scroogle's results. About a year ago I began requiring any and all Tor searches at Scroogle to use SSL. Using SSL is always a good idea, but the main reason I did this is that the SSL requirement discouraged script writers who didn't know how to add this to their scripts. This policy helped immensely in cutting back on the abuse I was seeing from Tor. Now I'm seeing script writers who have solved the SSL problem. This leaves me with the user-agent, the search terms, and as a last resort, blocking Tor exit nodes. If they vary their search terms and user-agents, it can take hours to analyze patterns and accurately block them by returning a blank page. That's the way I prefer to do it, because I don't like to block Tor exit nodes. Those who are most sympathetic with what Tor is doing are also sympathetic with what Scroogle is doing. 
There's a lot of collateral damage associated with blocking Tor exit nodes, and I don't want to alienate the Tor community except as a last resort. One reason why Scroogle has lasted for more than six years is that we are nonprofit, and Google knows by now that I don't tolerate abuse. My job is to stop the abuser before Scroogle passes their search terms to Google. Abusers who use Tor make this more difficult for me. Blocking an IP address is easy, but blocking Tor abusers without alienating other Tor users is more complex. -- Daniel Brandt
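Daniel's description of monitoring programs that only nominate candidates lends itself to a simple sliding-window rate check. The sketch below is hypothetical, not Scroogle's actual tooling: it flags any key that makes more than max_requests searches within window seconds. The key choice is an assumption worth stating. For Tor traffic, keying on the exit IP alone would lump many unrelated users together, so the example uses an (exit IP, user-agent) pair, which the user-agent rotation mentioned earlier in the thread can still evade.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Flag keys that exceed max_requests within a sliding time window.

    A flag only marks a *candidate*; a human still makes the blocking call.
    """

    def __init__(self, max_requests: int, window: float):
        self.max_requests = max_requests
        self.window = window
        self.events = defaultdict(deque)  # key -> timestamps still in window

    def record(self, key, timestamp: float) -> bool:
        """Record one request and return True if the key is now over the limit."""
        q = self.events[key]
        q.append(timestamp)
        # Drop timestamps that have slid out of the window.
        while q and q[0] <= timestamp - self.window:
            q.popleft()
        return len(q) > self.max_requests
```

With max_requests=3 and window=2.0, a key sending a search every half second is flagged on its fourth request, roughly the "two per second" pace described at the start of the thread.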
Re: Scroogle and Tor
On 02/14/2011 06:29 PM, scroo...@lavabit.com wrote: Some have wondered why anyone would want to abuse Scroogle using Tor. Apart from some malicious types that may be doing it for their own amusement, it looks to me like they are trying to datamine Google -- arguably the largest, most diverse database on the planet. Makes a lot of sense. Actually, I can hardly blame them for wanting to mine the data. Of course, you make it pretty easily available, as you detail. I can see why this starts to present a problem. I spend a couple hours per day blocking abusers. A huge amount of this is done through a couple dozen monitoring programs I've written, but for the most part these programs provide candidates for blocking only, and my wetware is needed to make the final determination. Ouch, that really sucks... time like that adds up fast. Now I'm seeing script writers who have solved the SSL problem. This leaves me with the user-agent, the search terms, and as a last resort, blocking Tor exit nodes. If they vary their search terms and user-agents, it can take hours to analyze patterns and accurately block them by returning a blank page. That's the way I prefer to do it, because I don't like to block Tor exit nodes. Those who are most sympathetic with what Tor is doing are also sympathetic with what Scroogle is doing. There's a lot of collateral damage associated with blocking Tor exit nodes, and I don't want to alienate the Tor community except as a last resort. Well... Google uses a CAPTCHA system. Hard to say how well that works. I doubt anything too simple is going to work here, for many reasons, including the ones that you specify. How about this... we know you can (mostly reliably) detect Tor exits. I think you have your goals wrong. You don't need to stop the scripts from getting to Google; even Google can't stop that on their own site. What you need is to make abusive use unprofitable on a scale that matters. Tor users care about their privacy, right...
but you need a way to differentiate them. So how about a temporary registration system? I get sent to a page with a CAPTCHA (or two kinds even). If I pass, then I get a token (set in a cookie, or put in the query string) that lets me do searches. Maybe I can set when it should expire (up to a max), and maybe put in a 30-second timeout before it becomes active (to slow them down some more)... Maybe limit the rate per IP over time for registrations? Secondly, have you considered poisoning their stream? If you detect an obviously abusive script, return randomized cached results. Ruining their work, rather than just slowing them down, might convince them to move on and try somewhere else. It is a thought anyway. One reason why Scroogle has lasted for more than six years is that we are nonprofit, and Google knows by now that I don't tolerate abuse. My job is to stop the abuser before Scroogle passes their search terms to Google. Abusers who use Tor make this more difficult for me. Blocking an IP address is easy, but blocking Tor abusers without alienating other Tor users is more complex. It will be sad to see Tor users lose your service (I had only heard the name before this thread; I'm very curious to check it out now). -Steve
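Steve's temporary registration idea can be sketched as a stateless, HMAC-signed token issued after the CAPTCHA is passed (the CAPTCHA step itself is not shown). Everything here is hypothetical: the token format, the one-hour cap on the user-chosen lifetime, and the 30-second activation delay taken from his suggestion. Because the token validates itself, a script that passes the CAPTCHA once can reuse it, so a real deployment would still want a per-token rate limit on top.

```python
import hashlib
import hmac
import secrets

SECRET = secrets.token_bytes(32)  # hypothetical server-side key
ACTIVATION_DELAY = 30             # seconds before a fresh token works
MAX_LIFETIME = 3600               # hypothetical cap on the chosen lifetime


def _sign(payload: str, secret: bytes) -> str:
    return hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(now: int, lifetime: int, secret: bytes = SECRET) -> str:
    """Issue a token after a passed CAPTCHA; lifetime is capped server-side."""
    lifetime = min(lifetime, MAX_LIFETIME)
    payload = f"{now}:{lifetime}:{secrets.token_hex(8)}"
    return f"{payload}:{_sign(payload, secret)}"


def token_valid(token: str, now: int, secret: bytes = SECRET) -> bool:
    """Check the signature, then the activation delay and the expiry."""
    try:
        issued, lifetime, nonce, sig = token.split(":")
        issued_at, life = int(issued), int(lifetime)
    except ValueError:
        return False
    expected = _sign(f"{issued}:{lifetime}:{nonce}", secret)
    # Compare as bytes so odd characters in a forged token cannot raise.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    active_from = issued_at + ACTIVATION_DELAY
    return active_from <= now < active_from + life
```

Counting the lifetime from activation rather than issuance keeps a short user-chosen lifetime from being swallowed entirely by the delay.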
Re: Scroogle and Tor
Thus spake scroo...@lavabit.com (scroo...@lavabit.com): My efforts to counter abuse occasionally cause some programmers to consider using Tor to get Scroogle's results. About a year ago I began requiring any and all Tor searches at Scroogle to use SSL. Using SSL is always a good idea, but the main reason I did this is that the SSL requirement discouraged script writers who didn't know how to add this to their scripts. This policy helped immensely in cutting back on the abuse I was seeing from Tor. Now I'm seeing script writers who have solved the SSL problem. This leaves me with the user-agent, the search terms, and as a last resort, blocking Tor exit nodes. If they vary their search terms and user-agents, it can take hours to analyze patterns and accurately block them by returning a blank page. That's the way I prefer to do it, because I don't like to block Tor exit nodes. Those who are most sympathetic with what Tor is doing are also sympathetic with what Scroogle is doing. There's a lot of collateral damage associated with blocking Tor exit nodes, and I don't want to alienate the Tor community except as a last resort. Great, now that we know the motivations of the scrapers and a history of the arms race so far, it becomes a bit easier to try to do some things to mitigate their efforts. I particularly like the idea of feeding them random, incorrect search results when you can fingerprint them. If you want my suggestions for next steps in the arms race for this, (having written some benevolent scrapers and web scanners myself), it would actually be to do things that require your adversary to implement and load more and more bits of a proper web browser into their crawlers for them to succeed in properly issuing queries to you. Some examples: 1. A couple layers of crazy CSS. 
If you use CSS style sheets that fetch other randomly generated, programmatically controlled style elements that are also keyed to the search form submission (via an extra hidden parameter containing their hash, or something similar), then you can verify on the server side that a given query also loaded enough CSS to be genuine. The problem is that this will break things for people who use your search plugin or search keywords, but for problem IPs you could instead do it on a brief landing page that is displayed *after* the query, but before a 302 or meta-refresh to the actual results. 2. Storing identifiers in the cache. http://crypto.stanford.edu/sameorigin/safecachetest.html has some PoC of this. Torbutton protects against long-term cache identifiers, but for performance reasons the memory cache is enabled by default, so you could use this to differentiate crawlers who do not properly obey all browser caching semantics. Caching is actually pretty darn hard to get right, so there's probably quite a bit more room here than just plain identifiers. 3. JavaScript proof of work. If the client supports JavaScript, you can have it factor some medium-sized integers and post the factorization with the query string to prove some level of periodic work. The factors could be stored in cookies and given a lifetime. The obvious downside is that I bet a fair share of your users run NoScript or prefer to disable JS and cookies. Anyway, thanks for your efforts with Scroogle. Hopefully the above ideas are easy enough to implement on your infrastructure that they are worth using for all problem IPs, not just Tor. -- Mike Perry Mad Computer Scientist fscked.org evil labs
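A rough sketch of the chained-CSS check from item 1, in Python. It assumes a hypothetical URL scheme (`/css/<nonce>/<index>.css`), a chain of three stylesheets linked by `@import`, and a hidden form field carrying the SHA-256 hash of a per-page nonce; none of these details come from Scroogle, and the "randomly generated style elements" are reduced to bare `@import` rules. The search handler would accept a query only if `verify_query` returns true for the submitted hidden value.

```python
import hashlib
import secrets
from typing import Dict, Optional, Set, Tuple

CHAIN_LENGTH = 3  # hypothetical number of chained stylesheets per page view


class CssChainCheck:
    """In-memory sketch; a real server would need expiry and shared storage."""

    def __init__(self) -> None:
        self._fetched: Dict[str, Set[int]] = {}  # nonce -> sheet indices fetched
        self._by_hidden: Dict[str, str] = {}     # hidden form value -> nonce

    def new_page(self) -> Tuple[str, str]:
        """Return (first stylesheet URL, hidden form value) for one page view."""
        nonce = secrets.token_hex(16)
        hidden = hashlib.sha256(nonce.encode()).hexdigest()
        self._fetched[nonce] = set()
        self._by_hidden[hidden] = nonce
        return f"/css/{nonce}/0.css", hidden

    def serve_stylesheet(self, nonce: str, index: int) -> Optional[str]:
        """Record the fetch; each sheet @imports the next one in the chain."""
        if nonce not in self._fetched or not 0 <= index < CHAIN_LENGTH:
            return None
        self._fetched[nonce].add(index)
        if index + 1 < CHAIN_LENGTH:
            return f'@import url("/css/{nonce}/{index + 1}.css");\n'
        return "/* end of chain */\n"

    def verify_query(self, hidden: str) -> bool:
        """Single use: true only if every sheet in this page's chain was loaded."""
        nonce = self._by_hidden.pop(hidden, None)
        if nonce is None:
            return False
        seen = self._fetched.pop(nonce)
        return len(seen) == CHAIN_LENGTH
```

A crawler that parses the HTML form but never fetches and follows the stylesheets fails this check; one that does follow `@import` passes, which is the kind of escalation the message describes: each layer forces a bit more of a real browser into the scraper.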
Re: Scroogle and Tor
On Mon, 14 Feb 2011 20:19:50 -0800 Mike Perry mikepe...@fscked.org wrote: 2. Storing identifiers in the cache. http://crypto.stanford.edu/sameorigin/safecachetest.html has some PoC of this. Torbutton protects against long-term cache identifiers, but for performance reasons the memory cache is enabled by default, so you could use this to differentiate crawlers who do not properly obey all browser caching semantics. Caching is actually pretty darn hard to get right, so there's probably quite a bit more room here than just plain identifiers. Polipo monkey-wrenches Torbutton's protection against long-term cache identifiers. Robert Ransom
Re: Scroogle and Tor
Thus spake Robert Ransom (rransom.8...@gmail.com): On Mon, 14 Feb 2011 20:19:50 -0800 Mike Perry mikepe...@fscked.org wrote: 2. Storing identifiers in the cache. http://crypto.stanford.edu/sameorigin/safecachetest.html has some PoC of this. Torbutton protects against long-term cache identifiers, but for performance reasons the memory cache is enabled by default, so you could use this to differentiate crawlers who do not properly obey all browser caching semantics. Caching is actually pretty darn hard to get right, so there's probably quite a bit more room here than just plain identifiers. Polipo monkey-wrenches Torbutton's protection against long-term cache identifiers. I hate Polipo. I've been trying to ignore it until it fucking dies. But it's like a zombie that just won't stop gnawing on our brains. Worse, a crack-smoking zombie that got us all addicted to it through secondhand crack smoke. Or something. But hey, it's better than Privoxy. Maybe? I was under the impression that we hacked it to also be memory-only, though. But you're right: if I toggle Torbutton to clear my cache, Polipo's is still there... -- Mike Perry Mad Computer Scientist fscked.org evil labs