(Only the latter part of this post is related to Twisted. I apologise in
advance for the former; you can skip straight to the second half to read the
gist.)

Hi Laurens,

> I think you're doing fine. Distributed systems are just kind of hard :-)

This is encouraging to hear!

> It sounds like you're fundamentally building an eventually consistent 
> distributed database. We have a few of those already: it might be 
> significantly less work to just use one of them. I suppose it depends why 
> you're trying to make it distributed. 

I have done some superficial research, and so far it points to implementing my
own solution. Or rather, what I am producing is not a full-fledged eventually
consistent database at all; it's loosely knit bunches of people. There is no
'database' in the classical sense.

> Is this about reliability in the face of e.g. hardware failure, or is this 
> about being able to disseminate data when someone tries to stop you from 
> doing so? Are you trying to protect against byzantine failures too?

It's about the right to free speech, so I would say the latter. I'm from Turkey
and currently living in the United States, so while I am protected under the
First Amendment, my compatriots are not. This is significant: as you might have
heard, two months ago we had a wave of peaceful protests against the demolition
of Gezi Park, the last green space in the heart of Istanbul, to make way for a
shopping mall. The riot police attacked unarmed, stationary people with tear
gas canisters, which led to a series of much larger protests (a few million
strong at their height) and an international outcry. Thousands were arrested
and five died, and the state apparatus is pressing charges against everyone
from high schoolers to suburban moms. This country is not a banana republic,
and that is what made these events so shocking: Turkey is the 15th largest
economy in the world, larger than South Korea in terms of GDP, in the process
of joining the European Union, etc.

The reason I'm telling you this is what happened afterwards. After EU and US
pressure forced the govt. to stop the police violence, it quietly began a
witch-hunt of Twitter and Facebook users and bloggers it deemed to be
'promoting armed insurgency against the state', and there are currently around
20-25 people under detention, waiting for trial. Of course, Twitter and
Facebook do not give the Turkish government any IP addresses: the authorities
just look at people's profiles and grab the most likely person with the same
name. Our very broken legal system allows for up to five years (!) of contempt
of court without pressing any charges, something which the Islamist government
really loves to use against its people.

I am building a tool that allows people to express their opinions without
necessarily revealing their identities. It's called Aether, a distributed
network that allows its users (all users are anonymous and unregistered by
default) to exercise their right to free speech without being endangered by
state violence. Everything within it is public, and everything posted on Aether
is in the public domain. (And please excuse the holier-than-thou sounding
copywriting on the webpage; this was my thesis project and it was one of the
requirements.)

The backend process of this application runs entirely on Twisted. The business
rules are simple. Consider Alice the local node, Bob the remote node, and Carol
another remote node. When Bob connects to Alice and gets a list of the posts
Alice has, he requests the posts he does not have. The posts Alice makes
publicly available are the posts she has either a) created, or b) upvoted. If
Bob, at some point, also likes a post he has gotten from Alice, he will also
start to publicly distribute that post. At that point, it is impossible to
determine whether Alice or Bob created the post; Alice might as well have
gotten it from a third party Bob is not aware of. The post, now being
distributed from two nodes, has a higher chance of being found by Carol. If you
extrapolate this to a thousand people who have all upvoted the post, it becomes
practically impossible to determine the origin. The act of 'liking' something
is exactly the same thing as sharing it, as is 'creating' it: there is
absolutely no difference, and every node only has the IP address of the last
link in the chain.

The nodes simply count how many times they encounter the same post digest to
determine the number of upvotes the post has gotten, and they use that count to
determine the lineup of posts the app shows to its user. Other than that, there
is no global database and no global state, and nobody in the network is aware
of everyone else; I just strive to distribute the maximum amount of popular
data to the maximum number of people possible. The client application pieces
all this information together into a coherent whole of topics, subjects and
posts. There is also distribution of user addresses through Aether, to allow
people to find new nodes to connect to.
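
The exchange rules above can be modelled in a few lines of plain Python. This
is only a minimal sketch under stated assumptions: the Node class, its method
names and the in-memory dictionaries are illustrative and are not Aether's
actual code, and there is no networking here at all. It also assumes that an
"encounter" means one distinct peer offering a digest, so that syncing twice
with the same peer does not inflate the count.

```python
from collections import defaultdict


class Node:
    """Hypothetical in-memory model of one node; not Aether's real code."""

    def __init__(self, name):
        self.name = name
        self.posts = {}                    # digest -> post body held locally
        self.public = set()                # digests this node serves to peers
        self.seen_from = defaultdict(set)  # digest -> names of peers offering it

    def create(self, digest, body):
        # Creating a post publishes it, exactly like upvoting it would.
        self.posts[digest] = body
        self.public.add(digest)

    def upvote(self, digest):
        # Upvoting a post we already hold starts redistributing it.
        if digest in self.posts:
            self.public.add(digest)

    def sync_from(self, remote):
        """Fetch every public post of `remote` that we do not hold yet."""
        fetched = []
        for digest in sorted(remote.public):
            # Assumption: one sighting per distinct peer offering the digest.
            self.seen_from[digest].add(remote.name)
            if digest not in self.posts:
                self.posts[digest] = remote.posts[digest]
                fetched.append(digest)
        return fetched

    def votes(self, digest):
        return len(self.seen_from.get(digest, ()))

    def lineup(self):
        # Most-sighted digests first; ties broken by digest for stable output.
        return sorted(self.seen_from, key=lambda d: (-self.votes(d), d))
```

With Alice creating a post, Bob fetching and upvoting it, and Carol syncing
from both, Carol sees the digest offered by two peers and cannot tell which of
them, if either, created it.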

There are also other details, both in cryptography to defeat a global passive
adversary and in detailed business rules to detect and hide abuses from the
local user, among many other things. This is a large project I have been
working on alone for a very long time, so some parts of it are rightly
esoteric. And I have already been off-topic for way longer than acceptable. I
have the entire local application finished, and the only remaining part is
networking, which is why I am trying so hard to figure out Twisted. Here are a
bunch of screenshots for your perusal. Image 1  Image 2  Image 3  Image 4.

> While you can't rely on synchronized clocks (in the wall-clock time sense) in 
> a distributed system, you *can* rely on timestamps of your immutable 
> messages. You could send only message id's in the preceding time window, for 
> example. You can use hash chains to guarantee that the boards share the same 
> history.

I'm just discarding posts whose UNIX timestamp is ahead of the local UNIX
time; for this specific purpose, it works well.
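
As a rough illustration of that rule, combined with the time-window idea from
the quoted suggestion, the check could look like the sketch below. The function
names and the (digest, timestamp) tuple layout are assumptions made for the
example, not Aether's real data model.

```python
import time


def is_acceptable(timestamp, now=None):
    """Reject posts whose UNIX timestamp is ahead of the local clock."""
    if now is None:
        now = time.time()
    return timestamp <= now


def ids_in_window(posts, window_seconds, now=None):
    """Return digests of posts stamped within the last `window_seconds`.

    `posts` is an iterable of (digest, unix_timestamp) pairs; this layout is
    hypothetical. Future-dated posts are dropped by the same rule as above.
    """
    if now is None:
        now = time.time()
    oldest = now - window_seconds
    return [digest for digest, ts in posts
            if oldest <= ts and is_acceptable(ts, now)]
```

Passing `now` explicitly keeps the functions easy to test; in normal use it is
left out and the local clock is read.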

All of those commands are handled through the AMP protocol, and so far I am
treating AMP like a local protocol with no chance of failure, which won't be
the case on a real network. I can serve errors over AMP, but it gets very
complex very fast when you have no guarantees about the order in which things
arrive. There are certain actions I want to forbid if a certain sequence has
not been completed with that peer yet, but otherwise the protocol is remarkably
flexible, and likewise remarkably pain-inflicting in its implementation. I
guess I just want to know if I am using Twisted in this project in the sanest
way possible; I have enough insanity going on in my project to last a lifetime
already!
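
One way to forbid actions until a sequence has been completed with a peer is to
keep a small piece of state per connection and check it before handling each
command. The sketch below is plain Python and deliberately independent of
Twisted; the command names and the prerequisite table are hypothetical. In AMP
terms, such a session object could live on the protocol instance (there is one
per connection), and an exception class listed in a command's `errors` mapping
is reported to the peer as an error response rather than dropping the
connection.

```python
class SequenceError(Exception):
    """Raised when a peer issues a command before its prerequisite."""


# Hypothetical command names; each maps to the command that must come first.
PREREQUISITES = {
    "handshake": None,
    "request_headers": "handshake",
    "request_post": "request_headers",
}


class PeerSession:
    """Tracks which steps one peer has completed on one connection."""

    def __init__(self):
        self.completed = set()

    def check(self, command):
        # Unknown commands are rejected outright.
        if command not in PREREQUISITES:
            raise SequenceError("unknown command: %s" % command)
        required = PREREQUISITES[command]
        if required is not None and required not in self.completed:
            raise SequenceError("%s requires %s first" % (command, required))

    def complete(self, command):
        # Record a step only after verifying it was allowed.
        self.check(command)
        self.completed.add(command)
```

A responder would call `session.check(...)` on entry and
`session.complete(...)` once it has finished successfully, so an out-of-order
request fails early with one well-defined error.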

Sorry about the semi off-topic wall of text; it won't happen again.

Thanks,
Burak





On Aug 21, 2013, at 3:28 PM, Laurens Van Houtven <_...@lvh.io> wrote:

> Hi Burak,
> 
> 
> I think you're doing fine. Distributed systems are just kind of hard :-)
> 
> It sounds like you're fundamentally building an eventually consistent 
> distributed database. We have a few of those already: it might be 
> significantly less work to just use one of them. I suppose it depends why 
> you're trying to make it distributed. Is this about reliability in the face 
> of e.g. hardware failure, or is this about being able to disseminate data 
> when someone tries to stop you from doing so? Are you trying to protect 
> against byzantine failures too?
> 
> That said, you might want to consider how you communicate posts. Six months 
> worth of posts is a lot. Even with ten posts per day, you'd end up with 
> ~10*30*6 = 1800 hash values. The digest size of BLAKE2 is variable, but if 
> you're using 512 bit digests, that's 64 bytes, or 112.5 kibibytes for the 
> whole thing. That's probably more than you want to send in a single message.
> 
> While you can't rely on synchronized clocks (in the wall-clock time sense) in 
> a distributed system, you *can* rely on timestamps of your immutable 
> messages. You could send only message id's in the preceding time window, for 
> example. You can use hash chains to guarantee that the boards share the same 
> history.
> 
> cheers
> lvh
> _______________________________________________
> Twisted-Python mailing list
> Twisted-Python@twistedmatrix.com
> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python

