Mailing list transition [archives]

2011-02-14 Thread grarpamp
Can someone make sure all the new lists get submitted/added
to markmail?

As official archives in Maildir or Mbox are not yet provided (under
the curious guise of spam prevention), some alternative indexes
to the ones provided by the list engine would be valuable to
the community.
***
To unsubscribe, send an e-mail to majord...@torproject.org with
unsubscribe or-talk in the body. http://archives.seul.org/or/talk/


Re: Scroogle and Tor

2011-02-14 Thread Mike Perry
Thus spake Matthew (pump...@cotse.net):

 On 13/02/11 19:09, scroo...@lavabit.com wrote:
 I've been fighting two different Tor users for a week. Each is
 apparently having a good time trying to see how quickly they
 can get results from Scroogle searches via Tor exit nodes.
 The fastest I've seen is about two per second. Since Tor users
 are only two percent of all Scroogle searches, I'm not averse
 to blocking all Tor exits for a while when all else fails.
 These two Tor users were rotating their search terms, and one
 also switched his user-agent once. You can see why I might be
 tempted to throw my block all Tor switch on occasion --
 sometimes there's no other way to convince the bad guy that
 he's not going to succeed.
 
 For the less than knowledgeable people amongst us (e.g., me) who want to 
 learn a bit more: what was the rationale for those two Tor users doing what 
 they did?  What do they get from it?

I second this.

Daniel,

If you can find a way to fingerprint these bots, my suggestion would
be to observe the types of queries they are running (perhaps for some
of their earlier runs from when you could ban them by user agent?).

One of the things Google does is actually decide your 'Captchaness'
based on the content of your queries. Well, at least I suspect that's
what they are doing, because I have been able to more reliably
reproduce torbutton Captcha-related bugs when I try hard to write
queries like robots that are looking for php sites to exploit.

I would love to hear more about the types of scrapers that abuse Tor.
Or rather, I would like to see if someone can at least identify
rational behavior behind scrapers that abuse Tor. Some of it could
also be misdirected malware that is operating from within Torified
browsers. Some of it could also be deliberately torified malware.

Google won't tell us any of this, obviously ;).


-- 
Mike Perry
Mad Computer Scientist
fscked.org evil labs




Re: Scroogle and Tor

2011-02-14 Thread Jim

scroo...@lavabit.com wrote:

I've been fighting two different Tor users for a week. Each is
apparently having a good time trying to see how quickly they
can get results from Scroogle searches via Tor exit nodes. 
[snip]


As the person who (recently) raised the question about the availability 
of Scroogle via Tor, I want to thank you both for running Scroogle and 
for coming on this list to explain what happened.  I also apologize to 
the list for not mentioning that Scroogle is once again available via 
Tor.  (I discovered that and meant to publish that fact approx. 24 hours 
ago.)


You are obviously much more knowledgeable about network issues than I am 
so I will leave it to others to advise you about possible mitigations 
for your problems.  It is a real shame about the script kiddies, but 
such is the world we live in.


Jim




Re: Is gatereloaded a Bad Exit?

2011-02-14 Thread grarpamp
 I never made the claim this was safer.

Of course, not quoted as such. Plaintext anywhere is risky. Yet
this entire thread is about sniffing. How plaintext-only exits
somehow equate to sniffing. And how badexiting plaintext-only exits
somehow equates to reducing that risk. Both are weak premises. And
said exits were loosely defined as wolves whose only purpose was
to log traffic.

 I cited several engineering reasons why this type of exit policy
 is a pain for us.

Perhaps after the nodes were waxed on the premise of sniffing and
the thread exploded. (Dethreading might show otherwise so no picking
is intended at all.) It shouldn't matter though as certainly folks
would better support decisions to solve anonymity engineering and/or
performance problems that are causing a non-trivial impact or holdup.

Is Tor really at the point where reducing the exit matrix provides
significant or greater win as opposed to updates in other areas?

Does (or will) Tor bundle 80+443 to the same destination via the
same exit? What about http[s], smtp[s], imap[s], pop[s], submission
grouping? If the user is using different functions or accounts with
different protocols, he likely doesn't want this. Better let him
do his own bundling with MAPADDRESS or some toggles or something
and enhance those tools instead.

 I've also made the claim that there is no rational reason to
 operate an exit in this fashion

People are encouraged to help out however they can. Therefore,
operator fiat and whim is, by definition, sufficient reason. If Joe
operator thinks 6667, 31337, 21, 23, 25, 80, 6969, 12345, 7, 53,
79, 2401, 19, 70, 110, 123 and so on are pretty uber cool, daresay
even silly motivation, and wants to support them, that's his right.
Just as he can disallow www.{un.int,aclu.org}:80. He doesn't have
to announce it with some 'no sniffing, pro rights' policy statement
to those that might believe the paper it's printed on, validate his
social ties, be contacted, or otherwise vetted.

If another example is needed, not that one is: corporate, edu and
other LANs sometimes think they can block 'ooo, encryption bad'
ports so they can watch their users' plaintext URLs with their
substandard vendor nanny watch tool of the day. All the while their
staff laughs at them as they happily tunnel whatever they want over
that (perhaps even the client or exit parts of Tor). Yes, this kind
of joke exists :)

And another; In some equally crazy backwards braindead jurisdiction,
being able to say 'hey, we're not hiding our traffic in crypto, we
forbid it, so look mr. authorized gov agent, you can sniff all that
traffic you're getting reports about, and we're not in it, therefore
we're off the hook'. Perhaps even in France, etc, with their strange
crypto laws.

There was also mention of exits to RFC1918 space. No ISP with brains
routes this, especially not for customer facing interfaces. Yet
they could simply be exits so that the operator and others can
access the 1918 space said operator has deployed internally. They
might not care to use a (hidden service OnionCat VPN) for this. Be
it due to config, speed, anonymity or otherwise. Nor might they
wish to overload routable address space as an exit to their local
designs. It's just as crazy. But they're all rational in someone's
mind.

[I haven't actually tried to map 1918 _in_ to Tor yet, just figured
what can be configured not to go out must be capable of going in.]

What about the users who want to reach their peer via the one exit
in Siberia whose IP isn't blocked upstream of their peer, an exit that
happens to offer only port 80, on which their peer can listen?

It's not a question of whether *we* would do such things or see
them as rational. This is network space, any to any, hack to hack.
One man's widget is another man's stinky wicket. It's the tools
that matter. Tor is a network tool, with a nifty anonymity layer.


 We also detect throttling by virtue of our bw authorities measuring
 using 443.
 The same goes for exits that we detect ... throttling 443

Thanks, I yield this hack to be mooted by the project, cool.

 443 is the second-most trafficked port by byte on the Tor network,
 occupying only ~1% of the traffic.

Sniffing was needed to determine this :) And, assuming 80 was found
to be the first-most (which sounds right); then in the 80+443(+rest)
case, a sniffer's cost is only raised, say, sub 10%, not double.
So dropping said nodes truly does nothing useful costwise either.
(A days worth of netflow on a faster open exit would show the port
distribution breakdown, if anyone wants to.)


Node testing methodologies are cool. And what can't be proven
beyond that belongs to userland. Engineering is also cool (and
there are some potentially good reasons to normalize exits there,
beyond the crypto/non-crypto port groups to be sure). And all the
various use cases, examples and whims are cool. So why not start
a new thread exploring the engineering and, if valid and overriding
of same, let the 

Re: Is gatereloaded a Bad Exit?

2011-02-14 Thread morphium
So, with everything said, could we now please Un-BadExit the nodes
that were affected?

Thanks!
morphium


Re: Is gatereloaded a Bad Exit?

2011-02-14 Thread Mike Perry
Thus spake morphium (morph...@morphium.info):

 So, with everything said, could we now please Un-BadExit the nodes
 that were affected?

Sure, dude. Since you've read everything that was said, I take it
you're volunteering to contact the other node operators and ask them
to give reasons for why they chose their exit policy?

Let us know their preferred email addresses when you're done. But
they'll have to survive a challenge and response round proving they
can modify their contact info field ;).


-- 
Mike Perry
Mad Computer Scientist
fscked.org evil labs




Re: Is gatereloaded a Bad Exit?

2011-02-14 Thread Olaf Selke
Am 14.02.2011 14:41, schrieb morphium:
 So, with everything said, could we now please Un-BadExit the nodes
 that were affected?

the whole discussion didn't change my mind. I still support the idea of
flagging them as bad exit.

regards Olaf


Re: Is gatereloaded a Bad Exit?

2011-02-14 Thread Damian Johnson
 the whole discussion didn't change my mind. I still support the idea of
 flagging them as bad exit.

Same. Mike gave some good reasons for flagging them weeks ago and I've
yet to see much else besides ranting that seems to ignore most of this
thread. -Damian


Re: Is gatereloaded a Bad Exit?

2011-02-14 Thread morphium
 Sure, dude. Since you've read everything that was said, I take it
 you're volunteering to contact the other node operators and ask them
 to give reasons for why they chose their exit policy?

So please BadExit all nodes without contact email, if they don't
explain why they chose the default exit policy, I think they should be
blacklisted!

Thanks!
morphium


Re: Is gatereloaded a Bad Exit?

2011-02-14 Thread Ted Smith
On Mon, 2011-02-14 at 14:41 +0100, morphium wrote:
 So, with everything said, could we now please Un-BadExit the nodes
 that were affected?
 

Sorry, but this has been a long thread and I want to try to make sure I
understand something important.

Is it true or false that traffic was actually exiting through
gatereloaded et al.?

I recall seeing that those nodes weren't marked as exits in the
consensus anyway. If that is the case, then all of John Case's arguments
related to super-secret movie-plot usages of Tor and servers running on
port 80 only accessible through gatereloaded et al. seem to be
irrelevant.




Re: Is gatereloaded a Bad Exit?

2011-02-14 Thread John Case


On Mon, 14 Feb 2011, morphium wrote:


Sure, dude. Since you've read everything that was said, I take it
you're volunteering to contact the other node operators and ask them
to give reasons for why they chose their exit policy?


So please BadExit all nodes without contact email, if they don't
explain why they chose the default exit policy, I think they should be
blacklisted!



No, it goes further than that.  The real motion here is to BadExit all 
nodes that aren't being used and deployed exactly like I deploy mine.


When the dust settles, could we get the official threat model, and the 
official end user profile and the official use case documented ?  Again, 
for the lulz.



dir-spec.txt and directory-signature entries

2011-02-14 Thread J
The final entries in a consensus document are a number of directory-
signature entries.

dir-spec.txt says:

<cite>

  "directory-signature" SP identity SP signing-key-digest NL Signature

This is a signature of the status document, with the initial item
"network-status-version", and the signature item
"directory-signature", using the signing key.  (In this case, we take
the hash through the _space_ after "directory-signature", not the
newline: this ensures that all authorities sign the same thing.)
"identity" is the hex-encoded digest of the authority identity key of
the signing authority, and "signing-key-digest" is the hex-encoded
digest of the current authority signing key of the signing authority.

</cite>

Does that mean "the hash from the network-status-version entry to the
*first* directory-signature entry, including a SP"?

Or something else? The wording in dir-spec.txt is ambiguous to me.

Any help appreciated.

Cheers
/Jocke


Re: Is gatereloaded a Bad Exit?

2011-02-14 Thread Julie C
I suppose the anarchist genes in me are not strong enough. I have to agree
with Mike Perry's arguments, given his credibility, and his clearer
perspective than most of the rest of us. If this BadExit policy is being
made up ad-hoc, that's fine by me. If the offending Tor node operators want
to stand up and defend themselves, or their choices, that's fine too.

--
Julie C.
ju...@h-ck.ca
GPG key FF4E2E70 available at http://keys.gnupg.net




On Mon, Feb 14, 2011 at 9:44 AM, John Case c...@sdf.lonestar.org wrote:


 On Mon, 14 Feb 2011, morphium wrote:

  Sure, dude. Since you've read everything that was said, I take it
 you're volunteering to contact the other node operators and ask them
 to give reasons for why they chose their exit policy?


 So please BadExit all nodes without contact email, if they don't
 explain why they chose the default exit policy, I think they should be
 blacklisted!



 No, it goes further than that.  The real motion here is to BadExit all
 nodes that aren't being used and deployed exactly like I deploy mine.

 When the dust settles, could we get the official threat model, and the
 official end user profile and the official use case documented ?  Again, for
 the lulz.




Re: dir-spec.txt and directory-signature entries

2011-02-14 Thread Joakim G.
On 2011-02-14 19:46, Nick Mathewson wrote:

<snip/>


 Does that mean The hash from the network-status-version entry to the
 *first* directory-signature entry including a SP?
 
 It means everything beginning with the string "network-status-version"
 and ending with the first string "directory-signature ".  This refers
 to the _string_ "directory-signature " (with included space), not to
 the entire directory signature.  (It _can't_ refer to the entire
 directory signature, since when the authority computes the signature,
 it doesn't know what the signature is going to be.)

Yes, that was my understanding as well. Thanks for the clarification.

I looked elsewhere in my code and realised that the shared signature
code added an extra \n after "directory-signature " when verifying
consensus documents. I got extremely confused because I could still
verify both router descriptor and key certificate documents.
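
A minimal sketch of the rule discussed in this thread: the signed portion
of a consensus runs from "network-status-version" through the first
"directory-signature " token, trailing space included and newline
excluded. The function names are hypothetical, and the use of SHA-1 is an
assumption about the digest in use; check dir-spec.txt for the algorithm
that applies to the document being verified.

```python
import hashlib

START = b"network-status-version"
END = b"directory-signature "  # trailing space is part of the signed data


def signed_portion(consensus: bytes) -> bytes:
    """Return the bytes from START through the first END token, inclusive."""
    begin = consensus.index(START)
    # Signature items start a line, so search for the token after a newline.
    newline_pos = consensus.index(b"\n" + END, begin)
    end = newline_pos + 1 + len(END)
    return consensus[begin:end]


def consensus_digest(consensus: bytes) -> str:
    """Hex digest of the signed portion (SHA-1 assumed for this sketch)."""
    return hashlib.sha1(signed_portion(consensus)).hexdigest()
```

Appending anything after the space (for example an extra \n) changes the
digest, which would make signature verification fail in the way described
above.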

In other words: My bad, i.e. I needed someone to talk to. :-)

Sorry for the noise

Cheers
/Jocke


Re: Is gatereloaded a Bad Exit?

2011-02-14 Thread Ted Smith
On Mon, 2011-02-14 at 17:41 +, John Case wrote:
 On Mon, 14 Feb 2011, Ted Smith wrote:
 
  Sorry, but this has been a long thread and I want to try to make
 sure I
  understand something important.
 
  Is it true or false that traffic was actually exiting through
  gatereloaded et al.?
 
  I recall seeing that those nodes weren't marked as exits in the
  consensus anyway. If that is the case, then all of John Case's
 arguments
  related to super-secret movie-plot usages of Tor and servers running
 on
  port 80 only accessible through gatereloaded et al. seem to be
  irrelevant.
 
 
 And therefore will always be irrelevant, never affecting a single ToR user 
 into the infinite future. 

Is there an or-parliament list I should be on if I want to be an
Official Tor Project Legislator, making these *important policy
decisions* that affect Tor into the *infinite future*?

I know it's easier to send emails about something incredibly unimportant
to inflate one's own ego than it is to actually get shit done, but this
is ridiculous.




Re: Is gatereloaded a Bad Exit?

2011-02-14 Thread Aplin, Justin M

On 2/14/2011 7:48 AM, grarpamp wrote:
[snip]

If another example is needed, not that one is: corporate, edu and
other LANs sometimes think they can block 'ooo, encryption bad'
ports so they can watch their users' plaintext URLs with their
substandard vendor nanny watch tool of the day. All the while their
staff laughs at them as they happily tunnel whatever they want over
that (perhaps even the client or exit parts of Tor). Yes, this kind
of joke exists :)

[/snip]

Although I've been keeping out of this argument for the most part, and 
even though I'm leaning towards seeing things Mike's way, I just wanted 
to comment that I've actually been in an environment like this several 
times, once at my previous university, and once working for a local 
government organization. As asinine as such reasoning is on the part of 
the network administrator (or the person who signs their checks), I can 
see why the *ability* to run strange exit policies could be a good 
thing, and should be preserved in the software.


However, I see no reason why providing an anonymous contact email would 
be so hard. Certainly if you're going out of your way to avoid [insert 
conspiracy of choice] in order to run a node, you have the skills to use 
one of the hundreds of free email services out there? I don't think 
asking for a tiny bit of responsibility on the part of exit operators is 
too much to ask, and I'm amazed that allowing them to continue to function 
as middle nodes until they explain why their node appears broken or 
malicious is continually being turned into some kind of human-rights 
violation.


~Justin Aplin



Re: Is gatereloaded a Bad Exit?

2011-02-14 Thread cmeclax-sazri
On Monday 14 February 2011 14:17:45 Aplin, Justin M wrote:
 However, I see no reason why providing an anonymous contact email would
 be so hard. Certainly if you're going out of your way to avoid [insert
 conspiracy of choice] in order to run a node, you have the skills to use
 one of the hundreds of free email services out there? I don't think
 asking for a tiny bit of responsibility on the part of exit operators is
 too much to ask, and I'm amazed that allowing them to continue to function
 as middle nodes until they explain why their node appears broken or
 malicious is continually being turned into some kind of human-rights
 violation.

Or even better, create a nym using remailers. This does take some maintenance, 
as if one of the remailers goes down, you have to make a new chain of 
remailers for the nym to work, but it's more secure than a Yahoo/Hotmail/etc. 
account.

cmeclax


Re: Is gatereloaded a Bad Exit?

2011-02-14 Thread John Case


Hello Julie,

On Mon, 14 Feb 2011, Julie C wrote:


I suppose the anarchist genes in me are not strong enough. I have to agree
with Mike Perry's arguments, given his credibility, and his clearer
perspective than most of the rest of us. If this BadExit policy is being
made up ad-hoc, that's fine by me. If the offending Tor node operators want
to stand up and defend themselves, or their choices, that's fine too.



Great.  What's the acceptable companion port to 119 ?  How about 6667 ?

Since these ports, like 25, have no standard companion (like 80/443 
typically does) what collection of encrypted ports need to be maintained 
to balance out running 119/6667 ?


Come on people - I thought there would be quick answers to all of this...

RE: clearer perspective - it's easy to have a clear perspective when you 
discount all possible use cases that aren't what I do.



Re: Is gatereloaded a Bad Exit?

2011-02-14 Thread Ansgar Wiechers
On 2011-02-14 John Case wrote:
 Where's the answer to this ?  I chose edge-case scenarios above, for
 sure, but this is the real meat of the implementation of your plans,
 and I'd  like to know if you've given any thought to this whatsoever.

 What _is_ the proper corresponding open port for 25 ?  What _do_ you
 find an acceptable match for 53 ?  What system of weights will you
 give  ports that don't have an obvious corollary ?

 Oh, by the way - I used TCP port 80 this morning for something other
 than  cleartext HTTP.

You've already made perfectly clear that you don't get the point. Can we
now stop beating the dead horse? Thank you.

Regards
Ansgar Wiechers
-- 
All vulnerabilities deserve a public fear period prior to patches
becoming available.
--Jason Coombs on Bugtraq


Re: Is gatereloaded a Bad Exit?

2011-02-14 Thread Gregory Maxwell
On Mon, Feb 14, 2011 at 4:32 PM, John Case c...@sdf.lonestar.org wrote:
 Hello Julie,
 On Mon, 14 Feb 2011, Julie C wrote:

 I suppose the anarchist genes in me are not strong enough. I have to agree
 with Mike Perry's arguments, given his credibility, and his clearer
 perspective than most of the rest of us. If this BadExit policy is being
 made up ad-hoc, that's fine by me. If the offending Tor node operators
 want
 to stand up and defend themselves, or their choices, that's fine too.
 Great.  What's the acceptable companion port to 119 ?  How about 6667 ?

 Since these ports, like 25, have no standard companion (like 80/443
 typically does) what collection of encrypted ports need to be maintained to
 balance out running 119/6667 ?

 Come on people - I thought there would be quick answers to all of this...

 RE: clearer perspective - it's easy to have a clear perspective when you
 discount all possible use cases that aren't what I do.


Here's an argument tip: When you think you've spotted some enormous
hole in the other side's argument, there is at least a small chance
that you've actually instead spotted a hole in your understanding of
their position. You should probably take a moment to reflect and make
sure you're confident that you know where the error is before hitting
send.  I refrained from answering this the first time you asked it
because I thought if I gave you more time you might realize that it
wasn't really a useful question.

No one has suggested every unencrypted port must be matched.  There
are some very clear matches which do exist (e.g. HTTP/HTTPS) and for
those matches action can be taken.  Nothing requires anything to be
done about all the other cases where such nice and popular parallels
are not obvious or where the protocols are unpopular enough to begin
with.  HTTP is an overwhelmingly popular port, and there really isn't
anything wrong with special casing _just_ that, if that's all that it
ever came to.

Your examples aren't the best though, SSL SMTP is on 465, and it's
probably common enough that a similar rule could be enforced if anyone
cared. IRC ports aren't all that consistent even without the
introduction of security, so there isn't much that can be said there.
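
A minimal sketch of the kind of special-cased check described above:
flag an exit policy that accepts a plaintext port while rejecting its
well-known encrypted counterpart. This is not the project's actual
BadExit logic. It is a simplification that ignores the address part of
each rule, treats unmatched ports as rejected (an assumption of this
sketch), and uses hypothetical function names. Only the 80/443 pair is
checked by default; a 25/465 pair could be passed in the same way.

```python
def parse_rule(line):
    """Parse 'accept *:80' or 'reject *:6660-6667' into (accept, lo, hi)."""
    action, target = line.split()
    _addr, portspec = target.rsplit(":", 1)  # address part is ignored here
    if portspec == "*":
        lo, hi = 1, 65535
    elif "-" in portspec:
        lo, hi = (int(p) for p in portspec.split("-"))
    else:
        lo = hi = int(portspec)
    return action == "accept", lo, hi


def port_allowed(rules, port):
    """First matching rule wins; no match is treated as reject (assumption)."""
    for line in rules:
        accept, lo, hi = parse_rule(line)
        if lo <= port <= hi:
            return accept
    return False


PAIRS = {80: 443}  # plaintext port -> encrypted counterpart


def unmatched_plaintext_ports(rules, pairs=PAIRS):
    """Plaintext ports that are accepted while their counterpart is not."""
    return [plain for plain, enc in pairs.items()
            if port_allowed(rules, plain) and not port_allowed(rules, enc)]
```

For example, unmatched_plaintext_ports(["accept *:80", "reject *:*"])
reports port 80, while a policy that also accepts 443 reports nothing.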

[snip]
 and people that need this are in literally life or death (or at least free or 
 jail) situations

Then they need to not run an exit. If running an exit is probably
going to get you killed or put in jail you should not be running one.
If you're right and the decision to allow wacko exit policies
discourages people with their life on the line from running exits,
then I could imagine no better policy.


ToR: A network by/for ToR admins

2011-02-14 Thread John Case


On Mon, 14 Feb 2011, Gregory Maxwell wrote:


Then they need to not run an exit. If running an exit is probably
going to get you killed or put in jail you should not be running one.
If you're right and the decision to allow wacko exit policies
discourages people with their life on the line from running exits,
then I could imagine no better policy.



Thank you, thank you.  It took some time and some goading, but we've 
finally arrived:


ToR will be used the way we think it should be.

That's all I needed to hear.


Re: Scroogle and Tor

2011-02-14 Thread scroogle
Some have wondered why anyone would want to abuse Scroogle
using Tor. Apart from some malicious types that may be
doing it for their own amusement, it looks to me like they
are trying to datamine Google -- arguably the largest,
most diverse database on the planet.

If you can manage to run a script 24/7 that datamines
Google, you can monetize your results. Search engine
optimizers would like to be able to do this. So would
various directory builders.

Doing it by scraping google.com directly is not easy.
Scroogle provides 100 links of organic results per
request, with less than one-half the byte-bloat that
Google delivers for the same links and snippets. It is
also much easier to parse Scroogle's simple output page
than it is to parse Google's output page.

I spend a couple hours per day blocking abusers. A huge
amount of this is done through a couple dozen monitoring
programs I've written, but for the most part these
programs provide candidates for blocking only, and
my wetware is needed to make the final determination.
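
A minimal sketch of what a candidate-only monitor of this kind might look
like. It is not Scroogle's actual code: the class name, the thresholds,
and the choice of key (a user-agent string, a search-term pattern, an
exit address) are all assumptions. It only flags a key whose request
rate exceeds a limit within a sliding window, leaving the blocking
decision to a human.

```python
from collections import defaultdict, deque


class RateFlagger:
    """Flag keys that exceed max_requests within a sliding time window."""

    def __init__(self, max_requests=20, window=10.0):
        self.max_requests = max_requests
        self.window = window  # seconds
        self._seen = defaultdict(deque)  # key -> recent request timestamps

    def record(self, key, timestamp):
        """Record one request; return True if key is now a block candidate."""
        recent = self._seen[key]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while recent and recent[0] <= timestamp - self.window:
            recent.popleft()
        return len(recent) > self.max_requests
```

With the defaults above, a client sustaining two requests per second
would be flagged after roughly ten seconds.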

My efforts to counter abuse occasionally cause some
programmers to consider using Tor to get Scroogle's
results. About a year ago I began requiring any and all
Tor searches at Scroogle to use SSL. Using SSL is always
a good idea, but the main reason I did this is that the
SSL requirement discouraged script writers who didn't
know how to add this to their scripts. This policy
helped immensely in cutting back on the abuse I was
seeing from Tor.

Now I'm seeing script writers who have solved the SSL
problem. This leaves me with the user-agent, the search
terms, and as a last resort, blocking Tor exit nodes.
If they vary their search terms and user-agents, it can
take hours to analyze patterns and accurately block them
by returning a blank page. That's the way I prefer to do
it, because I don't like to block Tor exit nodes. Those
who are most sympathetic with what Tor is doing are also
sympathetic with what Scroogle is doing. There's a lot of
collateral damage associated with blocking Tor exit nodes,
and I don't want to alienate the Tor community except as
a last resort.

One reason why Scroogle has lasted for more than six
years is that we are nonprofit, and Google knows by now
that I don't tolerate abuse. My job is to stop the abuser
before Scroogle passes their search terms to Google.
Abusers who use Tor make this more difficult for me.
Blocking an IP address is easy, but blocking Tor abusers
without alienating other Tor users is more complex.

-- Daniel Brandt





Re: Scroogle and Tor

2011-02-14 Thread thecarp
On 02/14/2011 06:29 PM, scroo...@lavabit.com wrote:
 Some have wondered why anyone would want to abuse Scroogle
 using Tor. Apart from some malicious types that may be
 doing it for their own amusement, it looks to me like they
 are trying to datamine Google -- arguably the largest,
 most diverse database on the planet.

Makes a lot of sense. Actually, can hardly blame them for wanting to
mine the data. Of course, you make it pretty easily available, as you
detail. I can see why this starts to present a problem.
 I spend a couple hours per day blocking abusers. A huge
 amount of this is done through a couple dozen monitoring
 programs I've written, but for the most part these
 programs provide candidates for blocking only, and
 my wetware is needed to make the final determination.

Ouch, that really sucks... time like that adds up fast.

 Now I'm seeing script writers who have solved the SSL
 problem. This leaves me with the user-agent, the search
 terms, and as a last resort, blocking Tor exit nodes.
 If they vary their search terms and user-agents, it can
 take hours to analyze patterns and accurately block them
 by returning a blank page. That's the way I prefer to do
 it, because I don't like to block Tor exit nodes. Those
 who are most sympathetic with what Tor is doing are also
 sympathetic with what Scroogle is doing. There's a lot of
 collateral damage associated with blocking Tor exit nodes,
 and I don't want to alienate the Tor community except as
 a last resort.


Well...google uses the captcha system. Hard to say how well that works.
I doubt anything too simple is going to work here, for many reasons,
including the ones that you specify. How about this... we know you can
(mostly reliably) detect tor exits.

I think you have your goals wrong. You don't need to stop the scripts
from getting to google, even google can't stop that on their own site.
What you need is to make abusive use unprofitable on a scale that matters.

Tor users care about their privacy right... but you need a way to
differentiate them. So how about a temporary registration system? I get
sent to a page with a captcha (or two kinds even). If I pass, then I get
a token (set in a cookie, or put in the query string) that lets me do
searches. Maybe I can set when it should expire (up to a max) maybe
put in a 30 second timeout before it becomes active. (slow them down
some more)... maybe limit the rate per ip over time for registrations?
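
A minimal sketch of the temporary-token idea, assuming the captcha check
has already passed before a token is issued. The names, the secret, and
the limits are hypothetical. The token carries a not-before time (the
30 second activation delay) and an expiry capped at a maximum lifetime,
and is signed with an HMAC so the server needs no per-user state.

```python
import hashlib
import hmac

SECRET = b"replace-with-a-random-server-secret"  # hypothetical placeholder
ACTIVATION_DELAY = 30   # seconds before a new token becomes usable
MAX_LIFETIME = 3600     # upper bound on the lifetime a user may request


def _sign(payload):
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(now, lifetime=MAX_LIFETIME):
    """Issue 'not_before:expires:signature' after a solved captcha."""
    lifetime = min(lifetime, MAX_LIFETIME)
    not_before = int(now) + ACTIVATION_DELAY
    expires = not_before + lifetime
    payload = f"{not_before}:{expires}"
    return f"{payload}:{_sign(payload)}"


def token_valid(token, now):
    """True only if the signature matches and now is inside the window."""
    try:
        not_before, expires, sig = token.split(":")
        start, end = int(not_before), int(expires)
    except ValueError:
        return False
    expected = _sign(f"{not_before}:{expires}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    return start <= now < end
```

The token could travel in a cookie or the query string as suggested
above; limiting registrations per IP over time would be a separate check.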

Secondly, have you considered poisoning their stream? If you detect an
obvious abusive script, return randomized cached results. Ruining their
work, rather than just slowing them down, might convince them to move on
and try somewhere else. It is a thought anyway.

 One reason why Scroogle has lasted for more than six
 years is that we are nonprofit, and Google knows by now
 that I don't tolerate abuse. My job is to stop the abuser
 before Scroogle passes their search terms to Google.
 Abusers who use Tor make this more difficult for me.
 Blocking an IP address is easy, but blocking Tor abusers
 without alienating other Tor users is more complex.

It will be sad to see Tor users lose your service (I had actually only
heard the name before this thread, and am very curious to check it out now).

-Steve



Re: Scroogle and Tor

2011-02-14 Thread Mike Perry
Thus spake scroo...@lavabit.com (scroo...@lavabit.com):

 My efforts to counter abuse occasionally cause some
 programmers to consider using Tor to get Scroogle's
 results. About a year ago I began requiring any and all
 Tor searches at Scroogle to use SSL. Using SSL is always
 a good idea, but the main reason I did this is that the
 SSL requirement discouraged script writers who didn't
 know how to add this to their scripts. This policy
 helped immensely in cutting back on the abuse I was
 seeing from Tor.
 
 Now I'm seeing script writers who have solved the SSL
 problem. This leaves me with the user-agent, the search
 terms, and as a last resort, blocking Tor exit nodes.
 If they vary their search terms and user-agents, it can
 take hours to analyze patterns and accurately block them
 by returning a blank page. That's the way I prefer to do
 it, because I don't like to block Tor exit nodes. Those
 who are most sympathetic with what Tor is doing are also
 sympathetic with what Scroogle is doing. There's a lot of
 collateral damage associated with blocking Tor exit nodes,
 and I don't want to alienate the Tor community except as
 a last resort.

Great, now that we know the motivations of the scrapers and a history
of the arms race so far, it becomes a bit easier to try to do some
things to mitigate their efforts. I particularly like the idea of
feeding them random, incorrect search results when you can fingerprint
them.


If you want my suggestions for next steps in the arms race for this
(having written some benevolent scrapers and web scanners myself), it
would actually be to do things that require your adversary to
implement and load more and more bits of a proper web browser into
their crawlers for them to succeed in properly issuing queries to you.

Some examples:

1. A couple layers of crazy CSS.

If you use CSS style sheets that fetch other randomly generated and
programmatically controlled style elements that are also keyed to the
form submit for the search query (via an extra hidden parameter or
something that is their hash), then you can verify on your server side
that a given query also loaded sufficient CSS to be genuine. 

The problem with this is it will mess with people who use your search
plugin or search keywords, but you could also do it in a brief landing
page that is displayed *after* the query, but before a 302 or
meta-refresh to actual results, for problem IPs.

2. Storing identifiers in the cache

http://crypto.stanford.edu/sameorigin/safecachetest.html has some PoC
of this. Torbutton protects against long-term cache identifiers, but
for performance reasons the memory cache is enabled by default, so you
could use this to differentiate crawlers who do not properly obey all
browser caching semantics. Caching is actually pretty darn hard to get
right, so there's probably quite a bit more room here than just plain
identifiers.

3. Javascript proof of work

If the client supports javascript, you can have them factor some
medium-sized integers and post the factorization with the query
string, to prove some level of periodic work. The factors could be
stored in cookies and given a lifetime. The obvious downside of this
is that I bet a fair share of your users are running NoScript, or
prefer to disable js and cookies.
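
For example 3, the server side of the proof of work might look something like the sketch below. It is only an illustration under assumptions: the function names and the 20-bit prime size are made up, and factor() is a Python stand-in for the work the client's javascript would do. The server would also have to remember (or sign) the challenge it handed out, which is not shown.

```python
import secrets


def is_prime(n):
    """Trial-division primality test, adequate for the small sizes used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def random_prime(bits):
    """Pick a random odd prime with exactly `bits` bits."""
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_prime(candidate):
            return candidate


def make_challenge(bits=20):
    """Server side: a product of two random primes for the client to factor."""
    return random_prime(bits) * random_prime(bits)


def verify_factorization(n, factors):
    """Server side: cheap check that the posted factors are non-trivial."""
    if len(factors) < 2:
        return False
    product = 1
    for f in factors:
        if not isinstance(f, int) or f <= 1 or f >= n:
            return False
        product *= f
    return product == n


def factor(n):
    """Stand-in for the client's work: plain trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors
```

The asymmetry is the point: verifying costs the server one multiplication, while the client has to search for the factors. The size of the primes is the knob for how much work each query costs.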


Anyways, thanks for your efforts with Scroogle. Hopefully the above
ideas are actually easy enough to implement on your infrastructure to
make it worth your while to use for all problem IPs, not just Tor.
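
For the CSS idea in example 1, the server-side bookkeeping could be sketched roughly as follows. This assumes a chain of stylesheets that each @import the next one, with the search form carrying a hidden per-form nonce. The URL layout, the chain length, and all of the names here are invented for illustration.

```python
import hashlib
import hmac
import secrets

# Hypothetical settings for this sketch.
SECRET_KEY = b"replace-with-a-random-server-secret"
CHAIN_LENGTH = 3

# nonce -> set of stylesheet indices that have been fetched for that form
fetched = {}


def css_token(nonce, index):
    """Unguessable token tying one stylesheet URL to one form nonce."""
    msg = f"{nonce}:{index}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()[:16]


def new_form():
    """Create a nonce for the hidden form field and the first stylesheet URL."""
    nonce = secrets.token_hex(8)
    fetched[nonce] = set()
    return nonce, f"/css/{nonce}/0/{css_token(nonce, 0)}.css"


def serve_css(nonce, index, token):
    """Record a stylesheet fetch; return its body, or None if the URL is bogus."""
    if nonce not in fetched or not 0 <= index < CHAIN_LENGTH:
        return None
    if not hmac.compare_digest(token.encode(), css_token(nonce, index).encode()):
        return None
    fetched[nonce].add(index)
    nxt = index + 1
    if nxt < CHAIN_LENGTH:
        return f'@import url("/css/{nonce}/{nxt}/{css_token(nonce, nxt)}.css");'
    return "/* end of chain */"


def query_looks_genuine(nonce):
    """On form submit: was every stylesheet in the chain fetched? Single use."""
    seen = fetched.pop(nonce, None)
    return seen is not None and len(seen) == CHAIN_LENGTH
```

A crawler that posts the form without parsing and following the stylesheet chain never fills in the set for its nonce, so its query fails the check.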

-- 
Mike Perry
Mad Computer Scientist
fscked.org evil labs




Re: Scroogle and Tor

2011-02-14 Thread Robert Ransom
On Mon, 14 Feb 2011 20:19:50 -0800
Mike Perry mikepe...@fscked.org wrote:

 2. Storing identifiers in the cache
 
 http://crypto.stanford.edu/sameorigin/safecachetest.html has some PoC
 of this. Torbutton protects against long-term cache identifiers, but
 for performance reasons the memory cache is enabled by default, so you
 could use this to differentiate crawlers who do not properly obey all
 browser caching semantics. Caching is actually pretty darn hard to get
 right, so there's probably quite a bit more room here than just plain
 identifiers.

Polipo monkey-wrenches Torbutton's protection against long-term cache
identifiers.


Robert Ransom




Re: Scroogle and Tor

2011-02-14 Thread Mike Perry
Thus spake Robert Ransom (rransom.8...@gmail.com):

 On Mon, 14 Feb 2011 20:19:50 -0800
 Mike Perry mikepe...@fscked.org wrote:
 
  2. Storing identifiers in the cache
  
  http://crypto.stanford.edu/sameorigin/safecachetest.html has some PoC
  of this. Torbutton protects against long-term cache identifiers, but
  for performance reasons the memory cache is enabled by default, so you
  could use this to differentiate crawlers who do not properly obey all
  browser caching semantics. Caching is actually pretty darn hard to get
  right, so there's probably quite a bit more room here than just plain
  identifiers.
 
 Polipo monkey-wrenches Torbutton's protection against long-term cache
 identifiers.

I hate Polipo. I've been trying to ignore it until it fucking dies. But
it's like a zombie that just won't stop gnawing on our brains. Worse,
a crack smoking zombie that got us all addicted to it through second
hand crack smoke. Or something. But hey, it's better than privoxy.
Maybe?

I was under the impression that we hacked it to also be memory-only,
though. But you're right, if I toggle Torbutton to clear my cache,
Polipo's is still there...


-- 
Mike Perry
Mad Computer Scientist
fscked.org evil labs

