zhengruifeng edited a comment on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624985569
This PR is an update of https://github.com/apache/spark/pull/27374; it can
avoid performance regression on sparse datasets by default (with blockSize=1).
On dense
zhengruifeng edited a comment on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624427337
performance test on **sparse dataset**: the first 10,000 instances of
`webspam_wc_normalized_trigram`
code:
```scala
val df =