Github user darabos commented on the pull request:

    https://github.com/apache/spark/pull/7285#issuecomment-120898722
  
    > Ah, I think this may have to be a check higher up, on the argument to 
`repartition`? this looks too low level. An RDD with 0 partitions is OK, just 
not repartitioning a (non-empty) RDD to 0 partitions.
    
    `repartition` just calls `coalesce`, which just calls `CoalescedRDD`, and 
that is where I put the assertion. That assertion seems fine: it was not 
triggered during the tests, and it does not interfere with zero-partition RDDs. 
(It would, however, have prevented repartitioning an empty RDD into zero 
partitions, so I've added a condition to allow that.)
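    To make the intended condition concrete, here is a minimal sketch (names and method are hypothetical, not the actual `CoalescedRDD` code): a target of zero partitions is acceptable only when the parent RDD itself has zero partitions, so nothing would be silently dropped.

```java
public class CoalesceGuard {
    // Hypothetical guard matching the condition described above: allow
    // coalescing to zero partitions only when the parent is already empty.
    static boolean coalesceTargetIsValid(int targetPartitions, int parentPartitions) {
        return targetPartitions > 0 || parentPartitions == 0;
    }

    public static void main(String[] args) {
        System.out.println(coalesceTargetIsValid(0, 0)); // empty RDD -> 0 partitions: allowed
        System.out.println(coalesceTargetIsValid(0, 4)); // non-empty RDD -> 0 partitions: rejected
    }
}
```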
    
    The tests triggered the other assertion, in `HashPartitioner`. The failures 
make it clear that there are valid cases where zero-size `HashPartitioner`s are 
created (for example, running `groupByKey` on an empty RDD). As long as 
`getPartition` is never called on them, this is harmless; and if it is called, 
it divides by zero, so the problem is detected even without my assertion.
    
    For negative partition counts, however, `getPartition` would silently 
return bogus (positive) results, so I kept the assertion against negative 
partition counts. I admit it's a bit silly. Let me know what you think.
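    The two failure modes can be illustrated with a simplified `nonNegativeMod`-style mapping of the kind `HashPartitioner.getPartition` uses (a hedged sketch, not the actual Spark code). Zero partitions fail loudly at call time, while a negative count produces a plausible-looking positive result, because Java/Scala `%` takes the sign of the dividend.

```java
public class NonNegativeModDemo {
    // Simplified hash-to-partition mapping in the style of nonNegativeMod:
    static int nonNegativeMod(int x, int mod) {
        int raw = x % mod;              // throws ArithmeticException when mod == 0
        return raw + (raw < 0 ? mod : 0);
    }

    public static void main(String[] args) {
        // Zero partitions: fails loudly on first use, even without an assertion.
        try {
            nonNegativeMod(42, 0);
        } catch (ArithmeticException e) {
            System.out.println("0 partitions: " + e.getMessage());
        }
        // Negative partition count: slips through with a positive result,
        // which is why the assertion against it is worth keeping.
        System.out.println(nonNegativeMod(7, -3)); // Java: 7 % -3 == 1
    }
}
```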


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]