Spark Random Forest training takes the same time on YARN as on standalone

2016-10-20 Thread
I'm training a random forest model using Spark 2.0 on YARN with a command like: $SPARK_HOME/bin/spark-submit \ --class com.netease.risk.prediction.HelpMain --master yarn --deploy-mode client --driver-cores 1 --num-executors 32 --executor-cores 2 --driver-memory 10g --executor-memory 6g \ --conf

Spark ML OOM problem

2016-10-12 Thread
Hi, I'm using Spark ML to train a RandomForest model. There are over 200,000 lines in the training data file and about 100 features. I'm running Spark in local mode with JAVA_OPTS like: -Xms1024m -Xmx10296m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps, but OOM errors keep occurring, I

How to continuously update or refresh RandomForestClassificationModel

2016-08-19 Thread
Hi All, I'm using my training data to generate the RandomForestClassificationModel, and I can use it to make predictions on upcoming data. But if a prediction fails, I'll put the failed features into the training data. Here is my question: how can I update or refresh the model? Which API should

How to Improve Random Forest classifier accuracy

2016-08-18 Thread
Hi All, I'm using the Spark ML Random Forest classifier. I have only two label categories (1, 0), about 30 features, and a data size of over 100,000. I ran the Spark JavaRandomForestClassifierExample code, and the model produced these results (I made some changes to show more detailed results): Test Error =

How to Improve Random Forest classifier accuracy

2016-08-18 Thread
Hi All

Questions about ml.random forest (only one decision tree?)

2016-08-04 Thread
Hi all, I'm trying to use Spark ML to do some prediction with random forest. From reading the example code https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaRandomForestClassifierExample.java , I can only find out it's similar to