I tried GBDTs with both Python's sklearn and Spark's local stand-alone 
MLlib implementation, using default settings, for a binary classification problem. 
I kept numIterations and the loss function the same in both cases. The features 
are all real-valued and continuous. However, the AUC from the MLlib implementation 
was way off compared to sklearn's. These were the parameters for sklearn's 
classifier:

GradientBoostingClassifier(
    init=None, learning_rate=0.001, loss='deviance', max_depth=8,
    max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
    min_samples_split=2, min_weight_fraction_leaf=0.0,
    n_estimators=100, random_state=None, subsample=1.0,
    verbose=0, warm_start=False)
I wanted to check if there's a way to find out and set these parameters in MLlib, or 
whether MLlib already assumes the same settings (which are pretty standard).
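
For reference, this is a sketch of what I assume the closest pyspark.mllib call 
would look like; the mapping of sklearn's parameters onto MLlib's (e.g. whether 
loss='deviance' corresponds to loss='logLoss') is exactly what I'm unsure about, 
and the LIBSVM path is just a placeholder:

from pyspark import SparkContext
from pyspark.mllib.tree import GradientBoostedTrees
from pyspark.mllib.util import MLUtils

sc = SparkContext(appName="gbt-comparison")

# Training data in LIBSVM format (placeholder path)
data = MLUtils.loadLibSVMFile(sc, "data/train.libsvm")

model = GradientBoostedTrees.trainClassifier(
    data,
    categoricalFeaturesInfo={},  # all features are continuous
    loss="logLoss",              # assuming this maps to sklearn's 'deviance'
    numIterations=100,           # sklearn n_estimators
    learningRate=0.001,          # sklearn learning_rate
    maxDepth=8)                  # sklearn max_depth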

Any pointers to help figure out the difference would be appreciated.
