Max,

How two operators communicate depends on stream locality. For thread-local streams, the tuple is passed via the thread stack; for container-local streams, there is a queue in between. For all other localities there is a buffer server, which is effectively a pub-sub mechanism built over TCP/IP. The default serialization is Kryo, but you can plug in your own. Addressing is via pub-sub as well: the sender does not bother about addressing; the receiver connects to the buffer server and registers its interest. Routing is set up by the master (Stram) and is decided at launch time or whenever the physical plan is redone. A redo happens on an outage or on dynamic changes.
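
To make the pub-sub addressing concrete, here is a toy sketch in plain Java. It is not the Apex buffer server or its API: the class name ToyBufferServer, the publish/subscribe methods, and the stream id are made up for illustration, everything runs in one JVM instead of over TCP, and replaying buffered tuples to a late subscriber is an assumption of the toy. The only point it shows is that the sender never learns who the receiver is.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy model only (hypothetical names); NOT the Apex buffer server. */
class ToyBufferServer {
    // Serialized tuples kept per stream so that a late subscriber can catch up.
    private final Map<String, List<byte[]>> published = new HashMap<>();
    // Receivers that have registered interest in a stream.
    private final Map<String, List<Consumer<byte[]>>> subscribers = new HashMap<>();

    /** Sender side: publish to a named stream without knowing who listens. */
    public synchronized void publish(String streamId, byte[] tuple) {
        published.computeIfAbsent(streamId, k -> new ArrayList<>()).add(tuple);
        List<Consumer<byte[]>> receivers =
                subscribers.getOrDefault(streamId, Collections.emptyList());
        for (Consumer<byte[]> receiver : receivers) {
            receiver.accept(tuple);
        }
    }

    /** Receiver side: register interest; buffered tuples are replayed first. */
    public synchronized void subscribe(String streamId, Consumer<byte[]> receiver) {
        List<byte[]> backlog = published.getOrDefault(streamId, Collections.emptyList());
        for (byte[] tuple : backlog) {
            receiver.accept(tuple);
        }
        subscribers.computeIfAbsent(streamId, k -> new ArrayList<>()).add(receiver);
    }

    public static void main(String[] args) {
        ToyBufferServer server = new ToyBufferServer();
        // The upstream operator publishes before anyone has subscribed.
        server.publish("stream-1", "first".getBytes(StandardCharsets.UTF_8));
        // The downstream operator connects later and declares its interest.
        server.subscribe("stream-1",
                t -> System.out.println(new String(t, StandardCharsets.UTF_8)));
        server.publish("stream-1", "second".getBytes(StandardCharsets.UTF_8));
    }
}
```

Running main prints "first" and then "second": the subscriber receives the backlog when it connects and live tuples afterwards, while the publisher only ever names the stream.
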

Thks
Amol

On Fri, Nov 25, 2016 at 8:33 AM, Max Bridgewater <[email protected]> wrote:

> Hi Folks,
>
> I was giving an Apex demo the other day and people asked following
> questions:
>
> 1) what is the communication protocol between operators when they are on
> distant nodes. That means, how does Apex transport the tuples from one node
> to the other?
> Is it a custom protocol on top of TCP/IP or is it RPC?
> 2) What is the serialization algorithm used?
> 3) What is the addressing scheme between operators? That means how does
> Apex know where an operator is located and how to route data to it? Is
> there an operator registry? If so, where does it reside?
>
> Thoughts?
>
> Thanks,
> Max.
