Max,

Apex deploys a fast pub-sub server called buffer server in every yarn
container it gets (except AM) before deploying operators on it. For all the
operators which are connected downstream to operators outside the current
container, their output ports become publishers to the buffer server. The
downstream operators' input ports become subscribers to the buffer server.
So there is no concept a central operator/port registry, however all the
downstream operators do register their input ports with the buffer server.
The serialization algorithm is kryo based. The data transport protocol is
based on Netlet: https://github.com/DataTorrent/Netlet which is on top of
TCP/IP.

Regards,
Ashwin.

On Fri, Nov 25, 2016 at 8:33 AM, Max Bridgewater <[email protected]>
wrote:

> Hi Folks,
>
> I was giving an Apex demo the other day and people asked following
> questions:
>
> 1) what is the communication protocol between operators when they are on
> distant nodes. That means, how does Apex transport the tuples from one node
> to the other?
> Is it a custom protocol on top of TCP/IP or is it RPC?
> 2) What is the serialization algorithm used?
> 3) What is the addressing scheme between operators? That means how does
> Apex know where an operator is located and how to route data to it? Is
> there an operator registry? If so, where does it reside?
>
> Thoughts?
>
> Thanks,
> Max.
>
>


-- 

Regards,
Ashwin.

Reply via email to