Author: atagar
Date: 2012-02-27 15:08:12 +0000 (Mon, 27 Feb 2012)
New Revision: 25482

Modified:
   website/trunk/getinvolved/en/volunteer.wml
Log:
Adding Karsten's metrics project to the volunteer page



Modified: website/trunk/getinvolved/en/volunteer.wml
===================================================================
--- website/trunk/getinvolved/en/volunteer.wml  2012-02-27 06:16:47 UTC (rev 25481)
+++ website/trunk/getinvolved/en/volunteer.wml  2012-02-27 15:08:12 UTC (rev 25482)
@@ -543,6 +543,11 @@
     Karsten Loesing.
     </p>
     
+    <p>
+    <b>Project Ideas:</b><br />
+    <i><a href="#metricsSearch">Searchable Tor descriptor and Metrics data archive</a></i> (Python/Django?)
+    </p>
+    
     <a id="project-torstatus"></a>
     <h3><a href="https://trac.torproject.org/projects/tor/wiki/projects/TorStatus">TorStatus</a> (<a
     href="https://gitweb.torproject.org/torstatus.git">code</a>)</h3>
@@ -968,6 +973,25 @@
     </li>
     -->
     
+    <a id="metricsSearch"></a>
+    <li>
+    <b>Searchable Tor descriptor and Metrics data archive</b>
+    <br>
+    Priority: <i>Medium</i>
+    <br>
+    Effort Level: <i>Medium</i>
+    <br>
+    Skill Level: <i>Medium</i>
+    <br>
+    Likely Mentors: <i>Karsten</i>
+    <p>The <a href="https://metrics.torproject.org/data.html">Metrics data archive</a> of Tor relay descriptors and other Tor-related network data has grown to over 100G in size, bz2-compressed.  We have developed two search interfaces: the <a href="https://metrics.torproject.org/relay-search.html">relay search</a> finds relays by nickname, fingerprint, or IP address in a given month; <a href="https://metrics.torproject.org/exonerator-beta.html">ExoneraTor</a> finds whether a given IP address was a relay on a given day.</p>
+    
+    <p>We'd like to have a more general search application for Tor descriptors and metrics data.  There are more <a href="https://metrics.torproject.org/formats.html">descriptor types</a> that we'd like to include in the search.  The search application should handle most of them and understand some semantics like what's a timestamp, what's an IP address, and what's a link to another descriptor.  Users should then be able to search for arbitrary strings or limit their search to given time periods or IP address ranges.  Descriptors that reference other descriptors should contain links, and descriptors should be able to say from where they are linked.  The goal is to make the archive easily browsable.</p>
+    
+    <p>The search application shall be separate from the metrics website and shouldn't rely on the metrics website codebase.  The search application will receive hourly updated descriptor data from the metrics website via rsync.  Programming language and database system are not specified yet, though there's a slight preference for Python/Django and Postgres for maintenance reasons.  If there are good reasons to pick something else, e.g., some NoSQL variant or some search application framework, that's fine, too.  Further requirements are that lookups should be really fast and that changes to the search application can be implemented in reasonable time.</p>
+    
+    <p>Applications for this project should come with a design of the proposed search application, ideally with a proof-of-concept based on a subset of the available data to show that it will be able to handle the 100G+ of data.</p>
+    
     <a id="unitTesting"></a>
     <li>
     <b>Improve our unit testing process</b>

_______________________________________________
tor-commits mailing list
[email protected]
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-commits
