Re: [tor-dev] [GSoC '16] Exitmap project - Introduction and request for comments

2016-03-19 Thread Philipp Winter
Hi Mridul,

Thanks for your interest in exitmap.

On Fri, Mar 18, 2016 at 11:26:01AM +0530, Mridul Malpotra wrote:
> I will also be reading the tech report on Exitmap and would be
> grateful if you can recommend any other resource(s) that I should be
> referring to.

Don't bother reading the technical report.  The PETS paper that you've
already read is the most recent version.

> a. How was the bifurcation between stand-alone and same-process
> modules decided? Are there any advantages to allow for multiple forked
> processes for specific modules?

Do you mean the modules that are written in pure Python versus the
modules that use external tools?  Generally, we prefer to have pure
Python modules, but in some cases it's more convenient and practical to
run an external tool, say openssl, and parse its output.
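
A minimal sketch of the external-tool approach follows. The function names and the choice of `openssl x509 -noout -fingerprint` are assumptions made for illustration and are not taken from exitmap's code.

```python
import re
import subprocess

# Matches a line such as "SHA1 Fingerprint=AB:CD:..." in openssl's output.
_FINGERPRINT_RE = re.compile(
    r"^\s*\S+ Fingerprint=([0-9A-Fa-f:]+)\s*$", re.MULTILINE
)


def parse_fingerprint(openssl_output):
    """Return the certificate fingerprint as lowercase hex, or None."""
    match = _FINGERPRINT_RE.search(openssl_output)
    if match is None:
        return None
    return match.group(1).replace(":", "").lower()


def fingerprint_of_pem(pem_path):
    """Hypothetical helper: ask the openssl binary for a fingerprint."""
    output = subprocess.check_output(
        ["openssl", "x509", "-noout", "-fingerprint", "-in", pem_path]
    )
    return parse_fingerprint(output.decode("ascii", "replace"))
```

Keeping the parsing separate from the subprocess call means the parser can be tested without the external tool being installed.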

> b. For testing active attacks, can there be modules developed
> keeping other cleartext protocols like SNMP and Telnet in mind?
> Alternatively, is there a way to determine what protocols are being used
> over Tor and their popularity?

Yes, you can certainly write SNMP and Telnet modules.  Depending on what
particular attack you are trying to expose (e.g., credential sniffing,
content injection), the complexity of this can range from
straightforward to quite tricky.

At this point, we have no privacy-preserving way to measure port
popularity.

> c. How is Exitmap being crowdsourced currently? I'm interested to
> know how data is being collected from volunteers running the scanner.

A bunch of people, including myself, occasionally run exitmap.  Some of
these people wrote their own modules, which is great, because simply
re-running the same modules wouldn't be all that useful.  If somebody
catches a bad exit relay, we report the result to
bad-rel...@torproject.org.

You can see that the process is informal, which is problematic because
scans are not archived centrally.  It would be neat to have a server
that accepts incoming scan results, archives them, and provides an
interface to analyse scans.  I don't consider this high priority, but
you might want to add it as an optional task if you still have time
towards the end of GSoC.
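
The archiving half of such a server could be sketched as below. The record fields, the JSON-lines file layout, and both function names are assumptions for illustration; no such format exists in exitmap today.

```python
import json
import os
import time


def archive_scan_result(archive_dir, module_name, exit_fingerprint, result,
                        now=None):
    """Append one scan result to a per-module JSON-lines file.

    Returns the path of the file that was written.
    """
    if now is None:
        now = time.time()
    if not os.path.isdir(archive_dir):
        os.makedirs(archive_dir)
    record = {
        "timestamp": int(now),
        "module": module_name,
        "exit": exit_fingerprint,
        "result": result,
    }
    path = os.path.join(archive_dir, module_name + ".jsonl")
    with open(path, "a") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return path


def load_scan_results(path):
    """Read back all records from one archive file, oldest first."""
    with open(path) as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

An append-only file per module keeps old scans intact, which is the property that matters for later analysis.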

> 1. Achieve autonomous scanning in Exitmap with periodic scans that,
> based on a certain algorithm, fetch relay descriptors and automate
> various subtasks for consistent data collection and verification. The main
> challenges that I expect will be intelligently recognizing which tasks to
> automate and when, and making the entire background process execution
> efficient in resource consumption.

This is the most important issue.  A solid implementation of this would
be very helpful.
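
One small piece of this, deciding which exit relays are due for another scan, might look like the sketch below. The function name, the 24-hour default interval, and the dictionary of last scan times are assumptions; how the relay list is fetched is left out.

```python
def relays_due_for_scan(exit_fingerprints, last_scanned, now,
                        interval=24 * 60 * 60):
    """Return the relays to scan next, least recently scanned first.

    `last_scanned` maps a fingerprint to the Unix time of its last scan.
    Relays that were never scanned are always due and sort first.
    """
    never = float("-inf")
    due = [
        fingerprint for fingerprint in exit_fingerprints
        if now - last_scanned.get(fingerprint, never) >= interval
    ]
    return sorted(due, key=lambda fpr: last_scanned.get(fpr, never))
```

A periodic job would call this after each descriptor fetch and hand the resulting list to the scanner, which keeps the scheduling decision separate from the scan itself.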

> 2. Emulating multiple user interaction in individual modules and in
> Exitmap overall to make Exitmap indistinguishable from regular
> users. I will try to explore libraries for this purpose like Splinter with
> Selenium or BeautifulSoup with Requests that help interact dynamically
> with the web resource. The main challenges that I expect will be to scale
> this automated testing alongside the running asynchronous jobs and making
> the entire scans look like genuine user interactions. Any suggestions on
> better ways to do this will be helpful.

Yes, that would be great to have.  At this point, we are mimicking Tor
Browser, which doesn't work that well because the HTTP headers are
stored in an orderless dictionary:
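
One way to keep header order under control is to store headers as a list of pairs instead of a dictionary. The sketch below is only an illustration: the function name is hypothetical, and the header names, values, and order are placeholders, not Tor Browser's actual request.

```python
def build_request(method, path, headers):
    """Serialize an HTTP/1.1 request with headers in the given order.

    `headers` is a list of (name, value) tuples; a list preserves order,
    which a plain dictionary does not guarantee on older Pythons.
    """
    lines = ["%s %s HTTP/1.1" % (method, path)]
    lines.extend("%s: %s" % (name, value) for name, value in headers)
    return "\r\n".join(lines) + "\r\n\r\n"


# Placeholder headers; a real module would copy the order a browser sends.
HEADERS = [
    ("Host", "example.com"),
    ("User-Agent", "ExampleBrowser/1.0"),
    ("Accept", "text/html"),
    ("Accept-Language", "en-US,en;q=0.5"),
    ("Connection", "keep-alive"),
]
```

Calling `build_request("GET", "/", HEADERS)` yields a request whose header lines appear in exactly the order listed.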


> 3. Making the codebase more robust by adding unit test cases. I
> plan on using either the plain unittest/unittest2 framework or
> nose/nose2/pytest tools or any other alternatives that I may find or be
> recommended. I plan to simultaneously write the unit test cases for new
> code added and improve upon the existing testing programs.

Sounds good.  For what it's worth, py.test was added in commit 63671d3f.
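
For illustration, a py.test-style test is just a function whose name starts with `test_` and that uses plain `assert` statements. The helper under test here, `is_valid_fingerprint`, is hypothetical and not part of exitmap.

```python
import re


def is_valid_fingerprint(fingerprint):
    """Hypothetical helper: True for a 40-character hex relay fingerprint."""
    return re.match(r"^[0-9A-Fa-f]{40}\Z", fingerprint) is not None


def test_accepts_well_formed_fingerprint():
    assert is_valid_fingerprint("A" * 40)
    assert is_valid_fingerprint("0123456789abcdef" * 2 + "01234567")


def test_rejects_malformed_fingerprints():
    assert not is_valid_fingerprint("A" * 39)
    assert not is_valid_fingerprint("G" * 40)
    assert not is_valid_fingerprint("A" * 40 + "\n")
    assert not is_valid_fingerprint("")
```

Running `py.test` in the directory that contains such a file collects and runs both test functions.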

> 4. (Optional) I read from the mail threads on the tor-dev mailing
> list that the code needs to be converted to be Python3 compatible. Would
> like your opinion on whether it is a viable option and if it is possible,
> would like to include this in my list of tasks.

I haven't given it a shot myself, but I cannot think of a reason why it
would be hard.  (Famous last words!)  I would add it to the list of
optional tasks.
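
One thing that typically needs attention in such a port is the split between bytes and text, since data read from sockets is bytes on Python 3. The helper below is a hypothetical sketch of how that could be handled in one place, not something exitmap contains.

```python
def to_text(data, encoding="utf-8"):
    """Return `data` as text, decoding it first if it is a byte string.

    Undecodable bytes are replaced instead of raising, because data
    coming back through an exit relay cannot be trusted to be well formed.
    """
    if isinstance(data, bytes):
        return data.decode(encoding, "replace")
    return data
```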

> 5. (Optional) If I can spare time in the milestone timeline and if
> discussion leads to some clarity, I would like to add another module for
> more cleartext protocols that could be implemented like SNMP or Telnet. I
> am also looking at possible local to remote attacks that are active at the
> application layer and could be tested in Exitmap. I'll update if I find
> anything.

Sounds good.

Cheers,
Philipp
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


[tor-dev] [GSoC '16] Exitmap project - Introduction and request for comments

2016-03-19 Thread Mridul Malpotra
Hi everyone! I'm Mridul. I wish to apply for the Exitmap improvements
project mentored by Dr. Philipp Winter for the Google Summer of Code 2016.
My current IRC handle is mtyamantau.

Contents

1. Introduction - About myself and experience with Tor
2. Exitmap - Current progress and questions
3. GSoC - Rough proposal structure and questions

1. Introduction - About myself and experience with Tor
------------------------------------------------------

I'm Mridul Malpotra, currently in my senior year pursuing a bachelor's
degree in Computer Science at IIIT Delhi, India. My interests primarily
lie in computer networks and network security, specifically anonymous
networks like Tor and I2P. Through my year-long undergraduate thesis work
under Dr. Sambuddho Chakravarty, I have had exposure to the Tor network,
relevant literature and some related projects, which helped me better
understand and appreciate the current research and development going on.

My work involved manually setting up testbeds through testing Tor networks
on our institute intranet as well as on PlanetLab (for those wondering, I
had recommended Chutney and Shadow). The current private testing Tor
network is running on a PlanetLab slice (iiitd_mridul2) with ~170 nodes
globally and 3 directory authorities. I used the control protocol through
the Stem library to create multiple circuits and attach streams to them
for measuring the performance of a piece of software over Tor.

I also have experience with open source software, having contributed to
the Non-Intrusive Load Monitoring Toolkit (NILMTK), which is based on Python
and Pandas. While working there, I helped contribute code for additional
features, fixed a few bugs and also worked with a few of Python's package
management and documentation systems. Relevant links:
github.com/nilmtk/nilmtk/commits?author=mridulmalpotra


2. Exitmap - Current progress and questions
-------------------------------------------

I recently read about Exitmap in the 'Differential Treatment of Anonymous
Users' paper by Khattak et al. The use case for fast automated scanning
through Exitmap to evaluate ~1000 exit nodes was really interesting. On top
of that, it fit my use case of testing a particular piece of software's
performance over Tor. Familiarizing myself with the source code, I think I
understand the basic layout for how the scanner works and appreciate the
modularity of task executions. I followed the project's progress on GitHub
and have read the 'Spoiled Onions' paper by Winter et al.

In the coming 2 days, I plan to tinker around more with the code, discuss
concerns, issues and/or suggestions if any, and get myself properly
familiarized with the codebase. I also have certain ideas regarding what
modules could be added and improvements made, some of which I have
mentioned in the next section. I will also be reading the tech report on
Exitmap and would be grateful if you can recommend any other resource(s)
that I should be referring to.

Lastly, I had a few queries related to the project and/or paper and
apologize for the naivety in the questions if any.
a. How was the bifurcation between stand-alone and same-process
modules decided? Are there any advantages to allow for multiple forked
processes for specific modules?
b. For testing active attacks, can there be modules developed
keeping other cleartext protocols like SNMP and Telnet in mind?
Alternatively, is there a way to determine what protocols are being used
over Tor and their popularity?
c. How is Exitmap being crowdsourced currently? I'm interested to
know how data is being collected from volunteers running the scanner.


3. GSoC - Rough proposal structure and questions
------------------------------------------------

Here I am listing the possible objectives that my project will be focusing
on. I request your feedback and comments on the chosen topics and their
descriptions.

1. Achieve autonomous scanning in Exitmap with periodic scans that,
based on a certain algorithm, fetch relay descriptors and automate
various subtasks for consistent data collection and verification. The main
challenges that I expect will be intelligently recognizing which tasks to
automate and when, and making the entire background process execution
efficient in resource consumption.

2. Emulating multiple user interaction in individual modules and in
Exitmap overall to make Exitmap indistinguishable from regular
users. I will try to explore libraries for this purpose like Splinter with
Selenium or BeautifulSoup with Requests that help interact dynamically
with the web resource. The main challenges that I expect will be to scale
this automated testing alongside the running asynchronous jobs and making
the entire scans look like genuine user interactions. Any suggestions on
better ways to do this will be helpful.

3. Making the codebase more robust by adding unit test cases. I
plan on using either 

Re: [tor-dev] [GSoC '16] Exitmap project - Introduction and request for comments

2016-03-19 Thread grarpamp
On 3/18/16, Mridul Malpotra wrote:
> b. For testing active attacks, can there be modules developed
> keeping other cleartext protocols like SNMP and Telnet in mind?

Tor only supports TCP, of course; however, any cleartext application
protocol running over it is subject to snooping / modification: HTTP,
POP3, NNTP, etc. And if the cert is MITM'd or the server is faked, so is
TLS. A map to a honeypot of passwords [telnet pop3 ...] would be fun.
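
A rough sketch of the honeypot idea: send a different decoy login through each exit relay, so that a later login at the honeypot identifies which relay sniffed the credentials. The function names and the HMAC-based derivation are assumptions for illustration only.

```python
import hashlib
import hmac


def decoy_credentials(secret, exit_fingerprint):
    """Derive a unique (username, password) pair for one exit relay.

    `secret` is a byte string known only to the scanner and the honeypot.
    """
    digest = hmac.new(
        secret, exit_fingerprint.encode("ascii"), hashlib.sha256
    ).hexdigest()
    return "user" + digest[:8], digest[8:24]


def relay_for_login(secret, exit_fingerprints, username, password):
    """Return the relay whose decoy credentials were replayed, or None."""
    for fingerprint in exit_fingerprints:
        expected_user, expected_password = decoy_credentials(
            secret, fingerprint
        )
        if (hmac.compare_digest(username, expected_user)
                and hmac.compare_digest(password, expected_password)):
            return fingerprint
    return None
```

Deriving the credentials from a secret means the honeypot needs no per-relay database; it only needs the secret and the list of relays that were scanned.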

> Alternatively, is there a way to determine what protocols are being used
> over Tor and their popularity?

That might guide which protocol to develop a module for, along with
thinking about what the payoff of snooping on / modifying that protocol is.
Note that Tor claims such traffic analysis research is likely too
sensitive to conduct, even though people privately conduct
it all the time.