Github user smurching commented on the issue:
https://github.com/apache/spark/pull/19433
Made a few updates, here's a quick summary/what I'd propose moving
forward:
Right now:
* Shared row indices for all (categorical & continuous) features are stored
& updated in `TrainingInfo`
* `LocalDecisionTree.computeBestSplits` computes best splits/sufficient
stats for a single feature at a time
* A utility method (`LocalDecisionTreeUtils.updateArrayForSplit`) is used
to sort both feature values and shared row indices
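
For reference, the co-sorting idea can be sketched roughly as below. This is only an illustration, not the actual `LocalDecisionTreeUtils.updateArrayForSplit` code: the object name `CoSortSketch`, the method `sortWithIndices`, its signature, and the use of `Int` (binned) feature values are all assumptions made for the example.

```scala
object CoSortSketch {
  /**
   * Hypothetical helper: sorts values(from until to) in ascending order and
   * applies the same permutation to indices(from until to), so each row index
   * stays aligned with its feature value. Elements outside the range are untouched.
   */
  def sortWithIndices(values: Array[Int], indices: Array[Int], from: Int, to: Int): Unit = {
    // Pair each value with its row index, then sort the pairs by value.
    // sortBy is stable, so rows with equal values keep their relative order.
    val sorted = (from until to).map(i => (values(i), indices(i))).sortBy(_._1)
    var i = from
    for ((value, rowIndex) <- sorted) {
      values(i) = value
      indices(i) = rowIndex
      i += 1
    }
  }
}
```

Sorting a sub-range rather than the whole array reflects the assumption that each tree node owns a contiguous slice of the arrays.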
When we add support for raw continuous feature values:
* Add a subclass of `FeatureColumn` (e.g. `ContinuousFeatureColumn`) that
stores and sorts its own array of row indices, and pass these row indices
to the methods that require them.
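
A rough sketch of what such a subclass might look like, to make the proposal concrete. Everything here is an assumption for illustration: the shape of the `FeatureColumn` trait, the `Double` value type, and the `rowIndices` and `sortRange` members are hypothetical and not taken from the PR.

```scala
// Hypothetical stand-in for the PR's FeatureColumn; the real class may differ.
trait FeatureColumn {
  def featureIndex: Int
  def values: Array[Double]
}

// Sketch of a column that owns (and sorts) its own row indices,
// instead of relying on indices shared across all features.
final class ContinuousFeatureColumn(
    val featureIndex: Int,
    val values: Array[Double]) extends FeatureColumn {

  // Per-column row indices, initially the identity mapping 0 until n.
  val rowIndices: Array[Int] = Array.range(0, values.length)

  /** Sorts values(from until to) ascending, keeping rowIndices aligned with them. */
  def sortRange(from: Int, to: Int): Unit = {
    val sorted = (from until to)
      .map(i => (values(i), rowIndices(i)))
      // sortWith is stable; java.lang.Double.compare gives a total order on doubles.
      .sortWith((a, b) => java.lang.Double.compare(a._1, b._1) < 0)
    var i = from
    for ((value, rowIndex) <- sorted) {
      values(i) = value
      rowIndices(i) = rowIndex
      i += 1
    }
  }
}
```

Methods that need row indices would then take `column.rowIndices` as an argument rather than reading them from `TrainingInfo`.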
I also renamed `FeatureVector` to `FeatureColumn` since the former seemed
like it'd confuse developers (`FeatureVector` sounds like a single data point)
---