Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/4591#issuecomment-74411356
@rxin So I should emphasize that the result of parallelizing an empty `Seq`
right now is an RDD with many empty partitions, rather than no partitions. It
_almost_ works to return an empty RDD with no partitions, except that I found
one case in the Python API, in its own parallelize() implementation, where it
builds the result for an xrange by taking the result of parallelize([]) and
mapping its empty partitions to the desired partitions with a generator function.
Returning a 0-partition RDD is basically the much simpler state after my first
commit in this PR.
If you can think of a better way to implement that bit of the Python API that
doesn't rely on getting back an RDD with many empty partitions, that could be a
way forward. The nice thing is that that version solves the problem of
`parallelize(Seq())` for all numbers of partitions. I suppose it introduces a
small behavior change, in that you get back a 0-partition RDD here rather than
one with the default number of partitions.
Anyhow, otherwise, I tend to agree with just documenting this and moving on.
I'll pause a day for more thoughts.
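
To make the Python dependency concrete, here is a rough pure-Python sketch of the pattern described above. It does not use Spark: lists of lists stand in for RDD partitions, and `parallelize_empty`, `map_partitions_with_index` and `parallelize_range` are hypothetical names invented for illustration, not the PySpark API. It uses Python 3's `range` in place of `xrange`.

```python
def parallelize_empty(num_slices):
    # Stand-in for parallelize([], num_slices) as it behaves today:
    # one empty partition per requested slice.
    return [[] for _ in range(num_slices)]


def map_partitions_with_index(partitions, f):
    # Stand-in for mapping a function over (partition index, iterator).
    return [list(f(i, iter(p))) for i, p in enumerate(partitions)]


def parallelize_range(r, num_slices, empty=parallelize_empty):
    # Build the partitions of a range without materializing it up front:
    # each empty partition is replaced by the sub-range for its index.
    size = len(r)
    if size == 0:
        return empty(num_slices)
    start0, step = r.start, r.step

    def get_start(split):
        return start0 + (split * size // num_slices) * step

    def f(split, _iterator):
        return range(get_start(split), get_start(split + 1), step)

    return map_partitions_with_index(empty(num_slices), f)


# With many empty partitions the range is spread across them.
print(parallelize_range(range(10), 3))

# If the empty collection came back with no partitions at all,
# there would be nothing to map over and the data would be lost.
print(parallelize_range(range(10), 3, empty=lambda n: []))
```

The second call is the failure mode: the generator function is only ever invoked once per existing partition, so a 0-partition input yields a 0-partition, empty result.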