srowen commented on PR #40263:
URL: https://github.com/apache/spark/pull/40263#issuecomment-1477246959
I don't know enough to say whether it's worth a new method. Can we start
with the change that needs no new API? Is it a big enough win?
--
This is an automated message from the Apache Git Service.
srowen commented on PR #40263:
URL: https://github.com/apache/spark/pull/40263#issuecomment-1473071461
If it's faster and gives the right answers, sure.
srowen commented on PR #40263:
URL: https://github.com/apache/spark/pull/40263#issuecomment-1466280043
So this seems slower on a medium-sized data set. I don't know if delaying
the collect() matters much; the overall execution time matters. I'm worried
that this gets much slower on 1M or 10
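The point that delaying `collect()` matters less than overall execution time can be illustrated with a minimal, hypothetical sketch. It uses a plain Python generator as a stand-in for a lazy pipeline, which is an assumption and not Spark code. Building the lazy pipeline is nearly free, so a fair benchmark has to time through materialization (the analogue of `collect()`) and repeat at several input sizes. The names `lazy_squares` and `time_end_to_end` are illustrative only.

```python
import time


def lazy_squares(n):
    # Lazy pipeline (hypothetical stand-in): creating the generator does no
    # work until it is consumed, loosely like a transformation before an action.
    return (i * i for i in range(n))


def time_end_to_end(build, n):
    """Return (build_seconds, end_to_end_seconds, row_count).

    Times both constructing the pipeline and materializing its result,
    since only the end-to-end figure reflects the real cost.
    """
    start = time.perf_counter()
    pipeline = build(n)
    built = time.perf_counter()
    result = list(pipeline)  # materialize, analogous to calling collect()
    done = time.perf_counter()
    return built - start, done - start, len(result)


# Compare across growing sizes to see how end-to-end cost scales.
for n in (10_000, 100_000, 1_000_000):
    build_s, total_s, count = time_end_to_end(lazy_squares, n)
    print(f"n={n:>9,}  build={build_s:.6f}s  end-to-end={total_s:.6f}s  rows={count}")
```

Comparing two implementations this way, at more than one size, is what shows whether a change that merely defers work is actually faster overall.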