Re: Spark's Logistic Regression runs unstable on Yarn cluster

2016-08-16 Thread Yanbo Liang
Could you check the log to see how many iterations your LoR run takes? Does
your program output the same model across different attempts?

Thanks
Yanbo
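
For reference, a minimal sketch of checking the iteration count from the fitted
model itself rather than the logs. This assumes Spark >= 2.0, where the fitted
LogisticRegressionModel exposes a training summary, and an already-fitted
lrModel as in the quoted code below:

trainingSummary = lrModel.summary
print("iterations:", trainingSummary.totalIterations)
print("objective history:", trainingSummary.objectiveHistory)

If the objective history differs between attempts, the runs really are
converging to different solutions, not just using different resources.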

2016-08-12 3:08 GMT-07:00 olivierjeunen :

> I'm using pyspark ML's logistic regression implementation to do some
> classification on an AWS EMR Yarn cluster.
>
> The cluster consists of 10 m3.xlarge nodes and is set up as follows:
> spark.driver.memory 10g, spark.driver.cores 3, spark.executor.memory 10g,
> spark.executor.cores 4.
>
> I enabled YARN's dynamic allocation.
>
> The problem is that my results are very unstable. Sometimes my application
> finishes using 13 executors in total; sometimes all of them seem to die and
> the application ends up using anywhere between 100 and 200...
>
> Any insight on what could cause this stochastic behaviour would be greatly
> appreciated.
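>
> For context, the dynamic allocation settings in play look roughly like this
> (a sketch of a spark-defaults.conf fragment; the min/max bounds are
> illustrative, not values stated above):
>
> spark.dynamicAllocation.enabled true
> spark.shuffle.service.enabled true
> # Bounding the executor count keeps YARN from scaling far beyond the
> # 10-node cluster:
> spark.dynamicAllocation.minExecutors 2
> spark.dynamicAllocation.maxExecutors 40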
>
> The code used to run the logistic regression:
>
> from pyspark.ml.classification import LogisticRegression
> from pyspark.ml.evaluation import BinaryClassificationEvaluator
>
> data = spark.read.parquet(storage_path).repartition(80)
> lr = LogisticRegression()
> lr.setMaxIter(50)
> lr.setRegParam(0.063)
> evaluator = BinaryClassificationEvaluator()
> # Train on the rows flagged as training data, evaluate on the test rows.
> lrModel = lr.fit(data.filter(data.test == 0))
> predictions = lrModel.transform(data.filter(data.test == 1))
> auROC = evaluator.evaluate(predictions)
> print("auROC on test set:", auROC)
> The data is a DataFrame of roughly 2.8 GB.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-s-Logistic-Regression-runs-unstable-on-Yarn-cluster-tp27520.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
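
The numbers in the quoted post support a quick back-of-the-envelope check
(figures taken from the post: 2.8 GB, 80 partitions, 10 nodes, 4 executor
cores; the one-executor-per-node assumption is mine):

```python
# Rough sizing for the job described in the quoted post.
data_mb = 2.8 * 1024      # dataset size from the post
partitions = 80           # repartition(80)
nodes = 10                # m3.xlarge nodes
cores_per_executor = 4    # spark.executor.cores

mb_per_partition = data_mb / partitions
print(f"~{mb_per_partition:.0f} MB per partition")   # ~36 MB

# Assuming one executor per node (not stated in the post):
concurrent_tasks = nodes * cores_per_executor
print(f"{concurrent_tasks} concurrent tasks, "
      f"{partitions / concurrent_tasks:.0f} waves per stage")   # 40 tasks, 2 waves
```

At roughly 36 MB per partition the tasks are small and 40 cores cover the 80
partitions in two waves, so there is little genuine demand for hundreds of
executors; that makes the 100-200 executor runs look more like executors dying
and being replaced (as the post describes) than real scaling.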

