Re: Slow Performance with Apache Spark Gradient Boosted Tree training runs

Yashwanth Kumar Tue, 22 Sep 2015 03:03:18 -0700

Hi vkutsenko,

Can you try explicitly repartitioning the input labeled RDD? For example:


 // Repartition so that each of the 5 cores gets its own partition to work on
 JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(jsc.sc(),
     "s3://somebucket/somekey/plaintext_libsvm_file").toJavaRDD().repartition(5);


Here I used 5, since you have 5 cores.
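To make the reasoning behind matching partitions to cores concrete, here is a minimal plain-Java sketch. It uses a simplified model (an assumption, not Spark's actual scheduler): each partition becomes one task, and each core runs one task at a time (the default spark.task.cpus of 1), ignoring data locality and stragglers. The class and method names (PartitionWaves, waves, idleCoresFirstWave) are hypothetical and only for illustration.

```java
// Simplified model (assumption, not Spark's real scheduler): one task per
// partition, and each core runs one task at a time (spark.task.cpus = 1).
final class PartitionWaves {

    // Number of scheduling "waves" needed to run one task per partition.
    static int waves(int partitions, int cores) {
        return (partitions + cores - 1) / cores; // ceiling division
    }

    // Cores that sit idle during the first wave when partitions < cores.
    static int idleCoresFirstWave(int partitions, int cores) {
        return Math.max(0, cores - partitions);
    }

    public static void main(String[] args) {
        int cores = 5; // the 5 cores mentioned in this thread
        for (int partitions : new int[] {1, 2, 5, 10}) {
            System.out.printf("partitions=%d waves=%d idleCoresFirstWave=%d%n",
                partitions, waves(partitions, cores),
                idleCoresFirstWave(partitions, cores));
        }
    }
}
```

Under this model, an RDD with only 1 partition leaves 4 of the 5 cores idle, 5 partitions keep all cores busy in a single wave, and 10 partitions keep them busy over two waves.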

Also, for further benchmarking and performance tuning, see:

http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Slow-Performance-with-Apache-Spark-Gradient-Boosted-Tree-training-runs-tp24758p24764.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
