Hi

I am a newbie to Spark and Spark Streaming - I only recently became aware of
them, but they seem really relevant to what I am trying to achieve.

I am looking into using Spark Streaming with an input stream from ZeroMQ.
I am trying to figure out which machine actually listens on the ZeroMQ
socket.

If I have a 10-machine cluster, with one machine acting as the "application
machine" - the machine that runs the code I write and submits jobs to the
cluster - which of these machines subscribes to the ZeroMQ socket?
Looking at the code, I see a SUB socket is created, meaning all the data
will pass through that one socket. This leads me to believe that the
application machine is the one listening on the socket and passing all
received data to the cluster, so this single machine might become a
bottleneck in a high-throughput use case.
A better approach would be for each node in the cluster to listen on a
"fanout" socket, so that each node receives only a part of the
data.
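To illustrate what I have in mind, here is a rough sketch (I am not sure this is how it works - it assumes the ZeroMQUtils.createStream API from the spark-streaming-zeromq connector, and that the publisher endpoints "tcp://pub1:5563" etc. each fan out a disjoint slice of the data; those endpoint names are made up for the example). The idea is to create several receivers, which Spark should place on different workers, and union them so no single machine sees the whole stream:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.zeromq.ZeroMQUtils
import akka.util.ByteString
import akka.zeromq.Subscribe

object ZmqFanoutSketch {
  // Decode each multipart ZeroMQ message into strings.
  def bytesToStrings(bytes: Seq[ByteString]): Iterator[String] =
    bytes.map(_.utf8String).iterator

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("zmq-fanout-sketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Hypothetical publisher endpoints, each carrying a slice of the data.
    val endpoints = Seq("tcp://pub1:5563", "tcp://pub2:5563", "tcp://pub3:5563")

    // One receiver per endpoint; Spark schedules receivers on worker nodes,
    // so (if I understand correctly) the traffic is spread across the cluster.
    val streams = endpoints.map { url =>
      ZeroMQUtils.createStream(ssc, url, Subscribe("topic"), bytesToStrings _)
    }

    // Union the partial streams back into one DStream for processing.
    val unified = ssc.union(streams)
    unified.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Is something along these lines the intended way to avoid the single-receiver bottleneck, or does the receiver always run on the driver/application machine?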

I am not sure about any of this, as I am not an expert in ZeroMQ, and
definitely not in Spark. If someone could clarify how this works, I'd really
appreciate it.
In addition, if there are other input streams that operate in a different
manner, one that does not require all the data to pass through the
"application machine", I'd really appreciate knowing that too.

Thank you very much in advance,
Amir



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-on-a-cluster-tp1576.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.