Could you check the log to see how many iterations your LR run takes? Does
your program output the same model across different attempts?
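For example, with Spark 2.0's pyspark.ml API a quick check could look like the sketch below (untested on a cluster; the summary attributes come from LogisticRegressionTrainingSummary, while the helper names are made up for illustration):

```python
# Hedged sketch: assumes Spark >= 2.0, where a fitted LogisticRegressionModel
# exposes a training summary with totalIterations and objectiveHistory.

def report_training(model):
    """Print how many iterations the fit used and the loss history."""
    summary = model.summary  # LogisticRegressionTrainingSummary in pyspark.ml
    print("iterations:", summary.totalIterations)
    print("objective history:", summary.objectiveHistory)
    return summary.totalIterations

def models_agree(m1, m2, tol=1e-6):
    """True if two fitted models learned (numerically) the same coefficients."""
    c1 = m1.coefficients.toArray()
    c2 = m2.coefficients.toArray()
    return len(c1) == len(c2) and all(abs(a - b) <= tol for a, b in zip(c1, c2))
```

If report_training shows the fit always hitting the maxIter cap of 50, the optimizer may not be converging; fitting twice and calling models_agree on the two models would show whether the runs are deterministic.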
Thanks
Yanbo
2016-08-12 3:08 GMT-07:00 olivierjeunen :
> I'm using pyspark ML's logistic regression implementation to do some
> classification on an AWS EMR Yarn cluster.
>
> The cluster consists of 10 m3.xlarge nodes and is set up as follows:
> spark.driver.memory 10g, spark.driver.cores 3, spark.executor.memory 10g,
> spark.executor.cores 4.
>
> I enabled yarn's dynamic allocation abilities.
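> I have not set any explicit executor bounds; for reference, if I were to cap
> dynamic allocation it would presumably look something like this (these are
> standard Spark properties; the values are only illustrative):
>
> ```properties
> spark.dynamicAllocation.enabled true
> spark.shuffle.service.enabled true
> spark.dynamicAllocation.minExecutors 2
> spark.dynamicAllocation.maxExecutors 20
> ```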
>
> The problem is that my results are very unstable. Sometimes my application
> finishes using 13 executors in total; sometimes all of them seem to die and
> the application ends up using anywhere between 100 and 200...
>
> Any insight on what could cause this stochastic behaviour would be greatly
> appreciated.
>
> The code used to run the logistic regression:
>
> from pyspark.ml.classification import LogisticRegression
> from pyspark.ml.evaluation import BinaryClassificationEvaluator
>
> data = spark.read.parquet(storage_path).repartition(80)
> lr = LogisticRegression()
> lr.setMaxIter(50)
> lr.setRegParam(0.063)
> evaluator = BinaryClassificationEvaluator()
> lrModel = lr.fit(data.filter(data.test == 0))
> predictions = lrModel.transform(data.filter(data.test == 1))
> auROC = evaluator.evaluate(predictions)
> print("auROC on test set:", auROC)
> The data is a DataFrame of roughly 2.8 GB.
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-s-Logistic-Regression-runs-unstable-on-Yarn-cluster-tp27520.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.