Hi Zack,

On Thu, Jun 05, 2014 at 12:45:11PM -0400, Zack Weinberg wrote:
> I'd like to restart the conversation about hardening Wikipedia (or
> possibly Wikimedia in general) against traffic analysis.  I brought
> this up ... last November, I think, give or take a month?  but it got
> lost in a larger discussion about HTTPS.

This sounds like a great idea to me. Thanks for thinking about it
and sharing it. The privacy of people's reading habits is critical,
and the more we can do to protect it, the better.

> With that data in hand, the next phase would be to develop some sort
> of algorithm for automatically padding HTTP responses to maximize
> eavesdropper confusion while minimizing overhead.  I don't yet know
> exactly how this would work.  I imagine that it would be based on
> clustering the database into sets of pages with similar length but
> radically different contents.  The output of this would be some
> combination of changes to MediaWiki core (for instance, to ensure that
> the overall length of the HTTP response headers does not change when
> one logs in) and an extension module that actually performs the bulk
> of the padding.  I am not at all a PHP developer, so I would need help
> from someone who is with this part.

I'm not a big PHP developer, but given the right project I can be
enticed into doing some, and I'd be very happy to help out with
this. Making sure the changes don't add complexity would be very
important, but that should be doable.
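
To make the clustering idea a bit more concrete, here is a rough
sketch in Python (not PHP, and not tied to MediaWiki in any way). It
assumes we only look at response body sizes, that every padded size
should be shared by at least k pages, and that padding happens after
compression; it ignores the "radically different contents" part. The
function names and the choice of k are made up for illustration.

```python
from bisect import bisect_left


def build_buckets(sizes, k):
    """Return sorted padded sizes so that each one covers at least k pages.

    Hypothetical helper: sorts the observed sizes, cuts them into runs of
    k, and uses the largest size in each run as the padding target.
    """
    ordered = sorted(sizes)
    targets = []
    for start in range(0, len(ordered), k):
        group = ordered[start:start + k]
        targets.append(group[-1])
    # A short final run would cover fewer than k pages, so fold it into
    # the previous run by raising that run's target.
    if len(ordered) % k and len(targets) > 1:
        last = targets.pop()
        targets[-1] = last
    return targets


def padded_size(size, targets):
    """Return the smallest target that is >= size."""
    index = bisect_left(targets, size)
    if index == len(targets):
        # Sketch only: pages larger than anything seen so far are not handled.
        raise ValueError("size exceeds the largest known bucket")
    return targets[index]


def pad_body(body, targets):
    """Pad a bytes body with spaces up to its bucket's target length."""
    target = padded_size(len(body), targets)
    return body + b" " * (target - len(body))
```

With sizes like [1000, 1200, 1500, 4000, 4100, 9000, 9500] and k=3,
the leftover 9500 gets folded into the second bucket, so a 4000-byte
page is padded all the way to 9500. That is the overhead side of the
tradeoff you describe, and real data would tell us how bad it is.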

As was mentioned, external resources like variously sized images
would probably be the trickiest thing to find good ways around.
IIRC SPDY has some way of sending multiple resources in the same
packet, which we might be able to take advantage of here (it's been
ages since I read about it, though).

Nick

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
