Hi,

I am planning to go to production with Spark standalone mode using the
following configuration. I would like to know if I am missing anything;
any other suggestions are welcome.

1) Three Spark standalone masters deployed on different nodes, using
Apache ZooKeeper for leader election.
2) Two or three worker nodes. (Our workload needs to process 5000
messages/sec; two worker nodes were more than sufficient in our tests,
but to be on the safe side we may use three.)
3) HDFS for storing recoverable state, the WAL, checkpoints, etc., since
we are running a streaming application.
4) Some sort of monitoring and alerting framework.
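
For item 1, here is a minimal sketch of the settings involved. It assumes
hypothetical host names (master1..3, zk1..3), the default ports 7077 and
2181, and a ZooKeeper directory of /spark. The spark.deploy.* property
names are the ones listed in the Spark standalone documentation; the
Python helpers are only an illustration for generating the strings and
are not part of Spark.

```python
# Hypothetical host names; replace with the real master and ZooKeeper nodes.
MASTERS = ["master1", "master2", "master3"]
ZOOKEEPERS = ["zk1", "zk2", "zk3"]


def master_url(hosts, port=7077):
    """Build the multi-master URL that workers and applications connect to."""
    return "spark://" + ",".join(f"{h}:{port}" for h in hosts)


def daemon_java_opts(zk_hosts, zk_port=2181, zk_dir="/spark"):
    """Build a SPARK_DAEMON_JAVA_OPTS value that enables ZooKeeper recovery."""
    zk_url = ",".join(f"{h}:{zk_port}" for h in zk_hosts)
    opts = {
        "spark.deploy.recoveryMode": "ZOOKEEPER",
        "spark.deploy.zookeeper.url": zk_url,
        "spark.deploy.zookeeper.dir": zk_dir,  # assumed path, not a requirement
    }
    return " ".join(f"-D{key}={value}" for key, value in opts.items())


if __name__ == "__main__":
    # URL to hand to workers and to the application's master setting.
    print(master_url(MASTERS))
    # Line intended for conf/spark-env.sh on each master node.
    print(f'export SPARK_DAEMON_JAVA_OPTS="{daemon_java_opts(ZOOKEEPERS)}"')
```

The comma-separated master URL is what lets a worker or driver find
whichever master is currently the leader, so the same string would be
used on every node.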

Do I need anything else apart from this? What's not clear to me is how
service discovery is done. For example, right now we have to manually edit
the IP addresses of the worker machines in SPARK_HOME/conf/slaves, so we
have to bring the entire cluster down. What is the most common way to
solve this, given that we don't plan on using Mesos or YARN? I know of some
tools that can help me here, but I would like to know which of them are
widely used. Any other suggestions, in case I am missing something, are
welcome.

Thanks,
kant
