Re: [SPOT-ML] LDA options in spot.conf

2018-05-02 Thread Michael Ridley
I have been offline for a few months so I haven't seen the discussions around this, but without knowing the full context I would be concerned about using ZooKeeper for things that could/should go in a conf file. Given that ZooKeeper is inherently a potential bottleneck in a cluster and leveraged

Re: [SPOT-ML] LDA options in spot.conf

2018-05-02 Thread Austin Leahy
A couple of times during config discussions we floated the idea of integrating ZooKeeper for process config. This would both facilitate broader deployments and bring the project into design parity with the way other projects like Spark handle config at a cluster level. This might be a good time to

Re: [SPOT-ML] LDA options in spot.conf

2018-05-01 Thread Curtis Howard
Hi Nathanael, My thought was that the Spark-specific parameters defined in /etc/spot.conf (SPK_*, for cores and memory per executor, etc.) were global defaults that could apply to any Spark application (could be Spark streaming ingest, or the Spark ML LDA, ...) although a particular Spark

[SPOT-ML] LDA options in spot.conf

2018-04-30 Thread Nate Smith
I’m adding some checks into ml_ops.sh to avoid passing spark-submit a bunch of empty variables. My question is whether the LDA_* options in spot.conf should really be SPK_LDA_*? They are variables for the Spark job and yet it’s not instantly clear that they need to be included and cannot be
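
A minimal sketch of the kind of check described above, assuming the goal is to append each option to the spark-submit command line only when its variable is non-empty. The variable names, values, jar name, and LDA flag names here are hypothetical and are not taken from spot.conf or ml_ops.sh; only --executor-memory and --executor-cores are standard spark-submit options.

```shell
#!/bin/sh
# Sketch only: these names and values are illustrative assumptions,
# not the actual contents of spot.conf.
SPK_EXEC_MEM="4g"      # set in the conf file
SPK_EXEC_CORES=""      # left empty in the conf file
LDA_ALPHA="1.02"       # set
LDA_BETA=""            # left empty

# Start from an empty argument list and append a flag only when its
# value is non-empty, so spark-submit never receives a dangling flag.
set --
if [ -n "$SPK_EXEC_MEM" ]; then
    set -- "$@" --executor-memory "$SPK_EXEC_MEM"
fi
if [ -n "$SPK_EXEC_CORES" ]; then
    set -- "$@" --executor-cores "$SPK_EXEC_CORES"
fi

# Application jar (hypothetical name), followed by application arguments.
set -- "$@" app.jar
if [ -n "$LDA_ALPHA" ]; then
    set -- "$@" --ldaalpha "$LDA_ALPHA"   # hypothetical flag name
fi
if [ -n "$LDA_BETA" ]; then
    set -- "$@" --ldabeta "$LDA_BETA"     # hypothetical flag name
fi

# Print the assembled command instead of running it.
CMD="spark-submit $*"
echo "$CMD"
# prints: spark-submit --executor-memory 4g app.jar --ldaalpha 1.02
```

Building the argument list incrementally keeps the empty SPK_* and LDA_* values out of the final command, whichever prefix the options end up using.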