[GitHub] spark pull request: Fixed bug in setMinPartitions

srowen Fri, 01 Jan 2016 11:11:39 -0800

Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/10546#issuecomment-168332657
  
    Agreed; compare with the implementation in `WholeTextFileInputFormat`. It can be tidier, and also handle minPartitions = 0, with:
    
    ```
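      def setMinPartitions(context: JobContext, minPartitions: Int): Unit = {
        // Total size in bytes of all input files, skipping directories.
        val totalLen = listStatus(context).asScala.filterNot(_.isDir).map(_.getLen).sum
        // Clamp minPartitions to at least 1 so that minPartitions = 0 does not divide by zero.
        val maxSplitSize = math.ceil(totalLen / math.max(minPartitions, 1.0)).toLong
        super.setMaxSplitSize(maxSplitSize)
      }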
    ```
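
To make the arithmetic concrete, here is a minimal standalone sketch of the split-size calculation from the snippet above. The object `SplitSizeSketch` and the helper `maxSplitSize` are hypothetical names used only for illustration; they take the total byte count as a plain argument rather than reading it from `listStatus`, so the sketch runs without Hadoop on the classpath.

```scala
object SplitSizeSketch {
  // Hypothetical helper: same arithmetic as the suggested setMinPartitions,
  // but with the total input length passed in directly.
  def maxSplitSize(totalLen: Long, minPartitions: Int): Long =
    math.ceil(totalLen / math.max(minPartitions, 1.0)).toLong

  def main(args: Array[String]): Unit = {
    // 1000 bytes over at least 3 partitions: ceil(333.33...) bytes per split.
    println(maxSplitSize(1000L, 3)) // prints 334
    // minPartitions = 0 is clamped to 1, so one split may hold the whole input.
    println(maxSplitSize(1000L, 0)) // prints 1000
  }
}
```

Using `1.0` as the second argument to `math.max` forces floating-point division, so the result is rounded up by `math.ceil` instead of being truncated by integer division.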
    
    But @datafarmer, please first see https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark for how we suggest proposing changes.
    
    @kmader WDYT?


