GitHub user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19979
@jkbradley
> When there has been a shuffle, it is likely the Rows will not follow a fixed order.
Agreed. But we can make sure a fixed order is generated from the last shuffle
position in the physical plan's RDD lineage onward. For models that work like a
`map` transformation, I think we can ensure that the output row order is exactly
the same as the input row order.
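As a rough illustration of the `map` argument, here is a toy sketch in plain Python, not Spark code. The names `map_partitions` and `add_prediction` and the sample rows are hypothetical, and the nested lists only stand in for the partition layout after the last shuffle. It shows that a per-row transform leaves partition boundaries and row order untouched:

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Partition = List[Row]


def map_partitions(partitions: List[Partition],
                   fn: Callable[[Row], Row]) -> List[Partition]:
    """Apply fn to every row, keeping partition boundaries and row order."""
    return [[fn(row) for row in part] for part in partitions]


def add_prediction(row: Row) -> Row:
    """Hypothetical stand-in for a model's per-row transform."""
    out = dict(row)
    out["prediction"] = 1.0 if row["feature"] > 0.5 else 0.0
    return out


# Pretend this is the partition layout right after the last shuffle.
input_partitions = [
    [{"id": 0, "feature": 0.1}, {"id": 1, "feature": 0.9}],
    [{"id": 2, "feature": 0.7}],
]
output_partitions = map_partitions(input_partitions, add_prediction)

# Flatten both sides in partition order and compare the row ids.
input_ids = [row["id"] for part in input_partitions for row in part]
output_ids = [row["id"] for part in output_partitions for row in part]
predictions = [row["prediction"] for part in output_partitions for row in part]
```

In this model `input_ids` and `output_ids` are identical, because nothing in the transform moves rows between or within partitions.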
> test statistics (such as min/max) on global transformer output
This is also used in some tests, such as the "predictRaw and
predictProbability" test case in `DecisionTreeClassifierSuite`.
> For comparing results with expected values, I much prefer for those values to be in a column in the original input dataset.
Agreed.
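To make the benefit concrete, here is a minimal sketch in plain Python (again not Spark code; `predict`, `find_mismatches`, and the `expected` column name are hypothetical). Because each row carries its own expected value, the check is done row by row and does not depend on the order in which rows come back:

```python
import random
from typing import Callable, Dict, List

Row = Dict[str, float]


def predict(row: Row) -> float:
    """Hypothetical stand-in for the transformer under test."""
    return 1.0 if row["feature"] > 0.5 else 0.0


def find_mismatches(rows: List[Row],
                    predict_fn: Callable[[Row], float],
                    expected_col: str = "expected") -> List[Row]:
    """Return the rows whose prediction differs from their expected value."""
    return [row for row in rows if predict_fn(row) != row[expected_col]]


# The expected value travels with each input row.
rows = [
    {"feature": 0.1, "expected": 0.0},
    {"feature": 0.9, "expected": 1.0},
    {"feature": 0.7, "expected": 1.0},
]

# Simulate an arbitrary row order, as might follow a shuffle.
shuffled_rows = rows[:]
random.Random(42).shuffle(shuffled_rows)

mismatches_in_order = find_mismatches(rows, predict)
mismatches_shuffled = find_mismatches(shuffled_rows, predict)
```

Both mismatch lists come out empty here, whatever order the rows are in, which is why this style of test does not need any ordering guarantee.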