Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/17383
Hi, since this work was finished a long time ago, I have reviewed it
myself.
After a careful review: because SparseVector is stored in compressed sparse
row format, the only benefit of this PR would be reduced data storage, and
that comes at the cost of performance. For tree methods, however, it is
uncommon to handle features with a very large number of dimensions, so the
storage saving rarely matters. Hence, the PR does not satisfy me.
I prefer [SPARK-3717: DecisionTree, RandomForest: Partition by
feature](https://issues.apache.org/jira/browse/SPARK-3717) as an alternative,
which should benefit both performance and storage, if I understand
correctly. So I am closing this PR. Thanks, everyone, for the reviews and comments.
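
To illustrate the storage versus performance trade-off mentioned above, here is a minimal sketch in plain Python. It is not Spark's implementation: `SparseVec` and `stored_entries` are hypothetical names, and the layout (a sorted index array plus a parallel value array) is an assumption made for illustration. Reading one feature from the sparse layout needs a binary search, while a dense array is a direct index.

```python
from bisect import bisect_left


class SparseVec:
    """Hypothetical sparse vector: sorted indices plus parallel values."""

    def __init__(self, size, indices, values):
        # Assumption: indices are strictly increasing, one value per index.
        assert len(indices) == len(values)
        assert list(indices) == sorted(set(indices))
        self.size = size
        self.indices = list(indices)
        self.values = list(values)

    def __getitem__(self, i):
        if not 0 <= i < self.size:
            raise IndexError(i)
        # Binary search over the stored indices: O(log nnz) per lookup,
        # versus O(1) for a dense array.
        pos = bisect_left(self.indices, i)
        if pos < len(self.indices) and self.indices[pos] == i:
            return self.values[pos]
        return 0.0  # entries that are not stored are implicit zeros

    def stored_entries(self):
        # Numbers actually kept in memory (indices and values).
        return len(self.indices) + len(self.values)


dense = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 7.5, 0.0]
sparse = SparseVec(8, [2, 6], [3.0, 7.5])
```

Here the sparse form keeps 4 numbers instead of 8, but every feature read pays for a search. The saving only becomes large when the dimension is very high and most entries are zero.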