Re: Spark ML online serving

2018-07-12 Thread Maximiliano Felice
Hi! I know I'm late, but I just want to point out some highlights of our use case. We currently: - use Spark as an ETL tool, followed by - a Python (numpy/pandas-based) pipeline to preprocess information, and - use TensorFlow for training our neural networks. What we'd love to do, and why we don't:

Re: Interest in adding ability to request GPU's to the spark client?

2018-07-12 Thread Maximiliano Felice
Hi, I've been meaning to reply to this email for a while now; sorry for taking so long. I personally think that adding GPU resource management would let us boost ETL performance considerably. For the last year, I've worked on transforming some Machine Learning pipelines from Python in

Re: Spark Monitoring using Jolokia

2018-01-08 Thread Maximiliano Felice
Hi! I don't know very much about them, but I'm currently working on posting custom metrics to Graphite. I found the internals described in this library useful: https://github.com/groupon/spark-metrics Hope this can at least give you a hint. Best of

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Maximiliano Felice
Hi Jeroen, I experienced a similar issue a few weeks ago. The situation was the result of a mix of speculative execution and OOM issues in the container. First of all, when an executor takes too long in Spark, it is handled by YARN speculative execution, which will launch a new executor