[ 
https://issues.apache.org/jira/browse/SPARK-45154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795716#comment-17795716
 ] 

APeng Zhang commented on SPARK-45154:
-------------------------------------

[~oumarnour] I think you need to set the _seed_ param of CrossValidator.

> Pyspark DecisionTreeClassifier: results and tree structure in spark3 very 
> different from that of the spark2 version on the same data and with the same 
> hyperparameters.
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-45154
>                 URL: https://issues.apache.org/jira/browse/SPARK-45154
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib, PySpark, Spark Core
>    Affects Versions: 3.0.0, 3.3.1, 3.2.4, 3.3.3, 3.3.2, 3.4.0, 3.4.1
>            Reporter: Oumar Nour
>            Priority: Critical
>              Labels: decisiontree, pyspark3, spark2, spark3
>
> Hello,
> I have an engine running on spark2 using a DecisionTreeClassifier model using 
> the CrossValidator. 
>  
> {code:java}
> dt  = DecisionTreeClassifier(maxBins=10000, seed=0)   
> cv_dt_evaluator = BinaryClassificationEvaluator(
>             metricName="", 
>             rawPredictionCol="probability")
> # Create param grid and cross validator for model selection
> dt_grid = ParamGridBuilder()\
>             .addGrid(
>                 dt.minInstancesPerNode, [100]
>         )\
>             .addGrid(
>                 dt.maxDepth, [10]
>         )\
>             .build()
> cv = CrossValidator(
>             estimator=dt, estimatorParamMaps=dt_grid, 
> evaluator=cv_dt_evaluator,
>             parallelism=4
>             numFolds=4
>         ){code}
>  
> I want to {*}migrate from spark2  to spark3{*}. I've run 
> *DecisionTreeClassifier* on the same data with the same parameter values. But 
> unfortunately my results are {*}completely different, especially in terms of 
> tree structure{*}. I have trees with less depth and fewer splits on spark3. 
> I've tried to read the documentation but I haven't found an answer to my 
> question.
>  
> Can you help me find a solution to this problem?
> Thanks in advance for your help 
>         
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to