I have been offline for a few months so I haven't seen the discussions
around this, but without knowing the full context I would be concerned with
using ZooKeeper for things that could/should go in a conf file. Given that
ZooKeeper is inherently a potential bottleneck in a cluster and leveraged

A couple of times during config discussions we floated the idea of
integrating ZooKeeper for process config. This would both facilitate
broader deployments and bring the project into design parity with the way
other projects like Spark handle config at a cluster level. This might be a
good time to

Hi Nathanael,
My thought was that the Spark-specific parameters defined in /etc/spot.conf
(SPK_*, for cores and memory per executor, etc.) were global defaults that
could apply to any Spark application (could be Spark streaming ingest, or
the Spark ML LDA, ...) although a particular Spark

I’m adding some checks into ml_ops.sh to avoid passing spark-submit a bunch of
empty variables.
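A minimal sketch of the kind of guard I mean, assuming hypothetical SPK_* names (the actual keys come from /etc/spot.conf, and the real ml_ops.sh may structure this differently):

```shell
# Collect spark-submit arguments, skipping any setting that is unset or empty
# so we never pass spark-submit a flag with a blank value.
SPARK_ARGS=()

# add_arg FLAG VALUE: append the flag only when its value is non-empty.
add_arg() {
    local flag="$1" value="$2"
    if [[ -n "$value" ]]; then
        SPARK_ARGS+=("$flag" "$value")
    fi
}

# Illustrative values; normally these would be set by: source /etc/spot.conf
SPK_EXEC_MEM="4g"
SPK_DRIVER_MEM=""   # empty on purpose: should be skipped, not passed as ""

add_arg "--executor-memory" "$SPK_EXEC_MEM"
add_arg "--driver-memory"   "$SPK_DRIVER_MEM"

echo "spark-submit ${SPARK_ARGS[*]} ..."
```

With the empty SPK_DRIVER_MEM skipped, only `--executor-memory 4g` reaches the command line instead of a dangling `--driver-memory ""`.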
My question is whether the LDA_* options in spot.conf should really be
SPK_LDA_*? They are variables for the Spark job, and yet it’s not instantly
clear that they need to be included and cannot be