GitHub user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/19979
  
    @jkbradley 
    > When there has been a shuffle, it is likely the Rows will not follow a fixed order.
    
    Agreed. But we can make sure a fixed order is generated from the last
    shuffle position onward in the physical plan's RDD lineage. For models that
    work like a `map` transformation, I think we can make sure the output row
    order is exactly the same as the input row order.
    
    > test statistics (such as min/max ) on global transformer output
    
    This is also used in some tests, such as the "predictRaw and
    predictProbability" test case in `DecisionTreeClassifierSuite`.
    
    > For comparing results with expected values, I much prefer for those values to be in a column in the original input dataset.
    
    Agreed.
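
    A minimal sketch of why this helps, written in plain Python rather than
    against the Spark API. The names `transform`, `shuffle_rows`, `dataset`
    and the "model" that doubles a feature are all hypothetical, chosen only
    for illustration. Because the expected value travels with each row, the
    check stays valid even when a shuffle reorders the rows.

```python
import random


def transform(row):
    # Hypothetical map-like "model": adds a prediction column and keeps
    # every other column of the row unchanged.
    return {**row, "prediction": 2.0 * row["feature"]}


def shuffle_rows(rows, seed):
    # Stand-in for a shuffle: the row order afterwards is arbitrary.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# The expected value is a column of the input dataset itself.
dataset = [{"feature": float(i), "expected": 2.0 * i} for i in range(5)]

# A map-like transformation applied directly keeps the input row order.
ordered_output = [transform(row) for row in dataset]

# After a shuffle the order is no longer known in advance.
shuffled_output = [transform(row) for row in shuffle_rows(dataset, seed=42)]

# Each row is checked on its own, so the comparison never relies on order.
all_match = all(
    row["prediction"] == row["expected"] for row in shuffled_output
)
```

    Comparing against a separately held list of expected values would only
    be safe for `ordered_output`; the per-row check works for both.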


