RE: build models in parallel
You can use your groupId as a grid parameter and filter your dataset by that id in a pipeline stage before feeding it to the model. The following may help: http://spark.apache.org/docs/latest/ml-tuning.html
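The pattern above can be sketched in plain Python (a stand-in for the Spark calls, so it runs anywhere; `group_id` and `fit_mean_model` are illustrative names, not Spark APIs — in Spark the filter step would be `df.filter(col("groupId") == gid)` and the model a real estimator):

```python
# Per-group model building: collect the distinct group ids,
# filter the rows for each id, and fit one (toy) model per group.
rows = [
    {"group_id": "a", "x": 1.0},
    {"group_id": "a", "x": 3.0},
    {"group_id": "b", "x": 10.0},
]

def fit_mean_model(subset):
    """Toy 'model': just the mean of x over the group's rows."""
    xs = [r["x"] for r in subset]
    return sum(xs) / len(xs)

group_ids = sorted({r["group_id"] for r in rows})
models = {}
for gid in group_ids:
    # This filter is the pipeline stage the reply describes.
    subset = [r for r in rows if r["group_id"] == gid]
    models[gid] = fit_mean_model(subset)

print(models)  # {'a': 2.0, 'b': 10.0}
```

The same loop structure applies with a Spark DataFrame: iterate over `df.select("groupId").distinct().collect()`, filter per id, and fit one pipeline per subset.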
Re: build models in parallel
They use such functionality via PySpark in this talk: https://www.youtube.com/watch?v=R-6nAwLyWCI. Xiaomeng Wan wrote on Tue, Nov 29, 2016 at 17:54: > I want to divide big data into groups (e.g. group by some id) and build one > model for each group. I am wondering whether I can