GitHub user srowen commented on the issue:
https://github.com/apache/spark/pull/16038
Right now, everything is dense here, right? That's the worst case. Your
goal is to avoid serializing a dense zero vector and I say it can just be
sparse, which solves the immediate problem. From there, some operations may or
may not result in sparse vectors, but that's not relevant -- it can only be
better than being all dense, which is the current case. It matters at the end
because the return type is dense, but making it dense is easy. Paying the cost
of copying to a dense representation is the only downside, but that's small
compared to the saving in serialization (I presume).
I don't understand the problem you're describing. Are you saying that you
_don't_ want _any_ operations to be on sparse vectors? I'd leave that choice to
the implementations, but if you're worried about it, simply force the argument
of the seqOp to become dense. Then the only change is the smaller
serialization, and everything should be identical afterwards.
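
To make the suggestion concrete, here is a minimal sketch in plain Python (standard library only, not Spark's actual API). Everything in it is an assumption for illustration: `sparse_zeros`, `to_dense`, `seq_op`, `comb_op`, and `aggregate` are hypothetical names, the sparse vector is modelled as a `(size, {index: value})` tuple, and `pickle` stands in for whatever serializer ships the zero value to each partition.

```python
import pickle


def sparse_zeros(size):
    # Hypothetical sparse representation: (size, {index: value}).
    # An all-zero vector stores no entries at all.
    return (size, {})


def to_dense(vec):
    # Dense vectors are plain lists; pass them through unchanged.
    if isinstance(vec, list):
        return vec
    size, entries = vec
    dense = [0.0] * size
    for i, x in entries.items():
        dense[i] = x
    return dense


def seq_op(acc, point):
    # Force the accumulator to be dense before doing any arithmetic,
    # so all the math after the first call runs on dense data.
    acc = to_dense(acc)
    for i, x in enumerate(point):
        acc[i] += x
    return acc


def comb_op(a, b):
    a, b = to_dense(a), to_dense(b)
    for i, x in enumerate(b):
        a[i] += x
    return a


def aggregate(partitions, zero, seq_op, comb_op):
    # Serialize the zero value once and give every partition its own
    # deserialized copy, roughly how a distributed aggregate behaves.
    payload = pickle.dumps(zero)
    partials = []
    for part in partitions:
        acc = pickle.loads(payload)
        for point in part:
            acc = seq_op(acc, point)
        partials.append(acc)
    result = pickle.loads(payload)
    for partial in partials:
        result = comb_op(result, partial)
    # The caller expects a dense result, so convert once at the end.
    return to_dense(result)


if __name__ == "__main__":
    n = 1000
    print(len(pickle.dumps([0.0] * n)), "bytes for the dense zero")
    print(len(pickle.dumps(sparse_zeros(n))), "bytes for the sparse zero")
```

In this sketch the only thing that differs between the two zero values is the size of the serialized payload; the aggregated result is the same dense vector either way, at the cost of one dense copy per partition.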