Re: how to print auc & prc for GBTClassifier, which is okay for RandomForestClassifier

2016-11-28 Thread Nick Pentreath
This is because currently GBTClassifier and its model don't extend the
Classifier / ClassificationModel abstract classes, which are what provide the
rawPredictionCol parameter and the related methods for generating that column.

I'm not sure offhand whether this was because the GBT implementation could
not produce the raw prediction value, or because implementing all the
classifier methods was deferred until multi-class support is added.
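
One possible workaround, as a minimal sketch: BinaryClassificationEvaluator
can be pointed at the Double "prediction" column instead of "rawPrediction".
This assumes a Spark version whose evaluator accepts a Double column for
rawPredictionCol (the 2.0 API documentation describes this). The object and
method names below (GbtMetrics, aucAndPrc) are hypothetical helpers, not part
of Spark.

```scala
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.sql.DataFrame

object GbtMetrics {
  // Hypothetical helper (not a Spark API): evaluates a DataFrame that has
  // already been scored, e.g. the output of pipelineModel.transform(testData).
  // Returns (area under ROC, area under PR).
  def aucAndPrc(predictions: DataFrame, labelCol: String): (Double, Double) = {
    val evaluator = new BinaryClassificationEvaluator()
      .setLabelCol(labelCol)
      // Assumption: the evaluator accepts a Double column here. The GBT model
      // emits no "rawPrediction" column, so use its "prediction" column.
      .setRawPredictionCol("prediction")

    val auc = evaluator.setMetricName("areaUnderROC").evaluate(predictions)
    val prc = evaluator.setMetricName("areaUnderPR").evaluate(predictions)
    (auc, prc)
  }
}
```

With the pipeline from the original post this might be called as
GbtMetrics.aucAndPrc(pipelineModel.transform(data), "indexedLabel"). The same
setRawPredictionCol("prediction") setting can be applied to the evaluator
passed to CrossValidator. Because the predictions are hard 0/1 labels rather
than scores, the ROC curve has a single interior point, so the resulting AUC
is much coarser than one computed from real scores.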


On Sun, 27 Nov 2016 at 19:52 Zhiliang Zhu wrote:

>
> Hi All,
>
> I need to print AUC and PRC for a GBTClassifier model. This seems to work
> for RandomForestClassifier but not for GBTClassifier, even though the
> rawPrediction column is not in the original data in either case.
>
> The code is:
>
> ..
> // Set up Pipeline
> val stages = new mutable.ArrayBuffer[PipelineStage]()
>
> val labelColName = if (algo == "GBTClassification") "indexedLabel" else "label"
> if (algo == "GBTClassification") {
>   val labelIndexer = new StringIndexer()
>     .setInputCol("label")
>     .setOutputCol(labelColName)
>   stages += labelIndexer
> }
>
> val rawFeatureSize = data.select("rawFeatures").first().toString().split(",").length
> // Indices 0 until rawFeatureSize, i.e. keep every raw feature
> val indices: Array[Int] = Array.range(0, rawFeatureSize)
> val featuresSlicer = new VectorSlicer()
>   .setInputCol("rawFeatures")
>   .setOutputCol("features")
>   .setIndices(indices)
> stages += featuresSlicer
>
> val dt = algo match {
>
>   // THE PROBLEM IS HERE:
>   //
>   // GBTClassifier will not work: the error says that the field rawPrediction
>   // is missing, and it is raised at the last line of code, pipeline.fit(data).
>   // However, similar code is okay for RandomForestClassifier.
>   // In fact, the rawPrediction column is not in the original data; it seems
>   // to be generated automatically within the pipeline for
>   // BinaryClassificationEvaluator.
>
>   case "GBTClassification" =>
>     new GBTClassifier()
>       .setFeaturesCol("features")
>       .setLabelCol(labelColName)
>   case _ =>
>     throw new IllegalArgumentException(s"Algo ${params.algo} not supported.")
> }
>
> val grid = new ParamGridBuilder()
>   .addGrid(dt.maxDepth, Array(1))
>   .addGrid(dt.subsamplingRate, Array(0.5))
>   .build()
> val cv = new CrossValidator()
>   .setEstimator(dt)
>   .setEstimatorParamMaps(grid)
>   .setEvaluator((new BinaryClassificationEvaluator))
>   .setNumFolds(6)
> stages += cv
>
> val pipeline = new Pipeline().setStages(stages.toArray)
>
> // Fit the Pipeline
> val pipelineModel = pipeline.fit(data)
> 
>
> Thanks in advance ~~
>
> Zhiliang
>
>
>


how to print auc & prc for GBTClassifier, which is okay for RandomForestClassifier

2016-11-27 Thread Zhiliang Zhu

Hi All,

I need to print AUC and PRC for a GBTClassifier model. This seems to work for
RandomForestClassifier but not for GBTClassifier, even though the
rawPrediction column is not in the original data in either case.

The code is:
..
    // Set up Pipeline
    val stages = new mutable.ArrayBuffer[PipelineStage]()

    val labelColName = if (algo == "GBTClassification") "indexedLabel" else "label"
    if (algo == "GBTClassification") {
      val labelIndexer = new StringIndexer()
        .setInputCol("label")
        .setOutputCol(labelColName)
      stages += labelIndexer
    }

    val rawFeatureSize = data.select("rawFeatures").first().toString().split(",").length
    // Indices 0 until rawFeatureSize, i.e. keep every raw feature
    val indices: Array[Int] = Array.range(0, rawFeatureSize)
    val featuresSlicer = new VectorSlicer()
      .setInputCol("rawFeatures")
      .setOutputCol("features")
      .setIndices(indices)
    stages += featuresSlicer

    val dt = algo match {
      // THE PROBLEM IS HERE:
      //
      // GBTClassifier will not work: the error says that the field rawPrediction
      // is missing, and it is raised at the last line of code, pipeline.fit(data).
      // However, similar code is okay for RandomForestClassifier.
      // In fact, the rawPrediction column is not in the original data; it seems
      // to be generated automatically within the pipeline for
      // BinaryClassificationEvaluator.
      case "GBTClassification" =>
        new GBTClassifier()
          .setFeaturesCol("features")
          .setLabelCol(labelColName)
      case _ =>
        throw new IllegalArgumentException(s"Algo ${params.algo} not supported.")
    }

    val grid = new ParamGridBuilder()
      .addGrid(dt.maxDepth, Array(1))
      .addGrid(dt.subsamplingRate, Array(0.5))
      .build()
    val cv = new CrossValidator()
      .setEstimator(dt)
      .setEstimatorParamMaps(grid)
      .setEvaluator((new BinaryClassificationEvaluator))
      .setNumFolds(6)
    stages += cv

    val pipeline = new Pipeline().setStages(stages.toArray)

    // Fit the Pipeline
    val pipelineModel = pipeline.fit(data)

Thanks in advance ~~
Zhiliang