GitHub user yinxusen opened a pull request:

    https://github.com/apache/spark/pull/376

    [SPARK-1415] Hadoop min split for wholeTextFiles()

    JIRA issue [here](https://issues.apache.org/jira/browse/SPARK-1415).
    
    The new Hadoop `InputFormat` API does not provide a `minSplits` parameter, 
which makes the APIs of `HadoopRDD` and `NewHadoopRDD` inconsistent. This PR 
makes the two APIs compatible.
    
    Though `minSplits` is deprecated in the new Hadoop API, we think it is 
better to keep the APIs compatible here. 
    
    **Note** that `minSplits` in `wholeTextFiles` can only be treated as a 
*suggestion*: since `isSplitable()` returns `false`, files are never split, so 
the actual number of splits may end up smaller than `minSplits`.
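
    A minimal usage sketch of the proposed API, assuming the parameter keeps 
the name `minSplits` from the PR title (the merged signature may differ) and 
that a local Spark runtime is available:

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    object WholeTextFilesExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("WholeTextFilesExample")
          .setMaster("local[2]")
        val sc = new SparkContext(conf)

        // Each element is a (path, fileContents) pair covering one whole file.
        // The second argument is the *suggested* minimum number of splits;
        // because isSplitable() is false, a file is never divided, so the
        // actual partition count can fall below the requested minimum.
        val files = sc.wholeTextFiles("/tmp/data", 8)

        println(files.partitions.length)
        sc.stop()
      }
    }
    ```

    The directory path `/tmp/data` and the minimum of `8` are illustrative 
values only.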


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yinxusen/spark hadoop-min-split

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/376.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #376
    
----
commit b0e9c0cb8ac3f9d31b0f441b2b5635b400ae422c
Author: Xusen Yin <[email protected]>
Date:   2014-04-10T03:21:58Z

    add minSplits for WholeTextFiles

commit 85975a047eb25b810794cf32dee040fe3ae3b8e7
Author: Xusen Yin <[email protected]>
Date:   2014-04-10T04:41:22Z

    refine Java API and comments

----

