Hi vkutsenko, Can you just give partitions to the input labeled rdd, like:
<LabeledPoint> data = MLUtils.loadLibSVMFile(jsc.sc(), "s3://somebucket/somekey/plaintext_libsvm_file").toJavaRDD().*repartition(5)*; Here, i used 5, since you have have 5 cores. Also for further benchmark and performance tuning: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/ http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/ -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Slow-Performance-with-Apache-Spark-Gradient-Boosted-Tree-training-runs-tp24758p24764.html Sent from the Apache Spark User List mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org