GitHub user 10110346 opened a pull request:

    https://github.com/apache/spark/pull/22725

    [SPARK-24610][CORE][FOLLOW-UP] Fix reading small files via BinaryFileRDD

    ## What changes were proposed in this pull request?
    
    This is a follow-up of #21601; `StreamFileInputFormat` and `WholeTextFileInputFormat` have the same problem: when reading many small files, the configured minimum split size per node/rack can be larger than the computed maximum split size, and Hadoop's `CombineFileInputFormat.getSplits` then fails with:
    
    java.io.IOException: Minimum split size pernode 5123456 cannot be larger than maximum split size 4194304
        at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:201)
        at org.apache.spark.rdd.BinaryFileRDD.getPartitions(BinaryFileRDD.scala:52)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:254)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:252)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2138)
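
    A minimal sketch of the kind of guard that avoids this, assuming the follow-up mirrors the clamping done in #21601. The subclass and method names below are hypothetical (Spark's real classes live in `org.apache.spark.input`), while the Hadoop constants and the protected `setMinSplitSizeNode`/`setMinSplitSizeRack`/`setMaxSplitSize` setters are existing `CombineFileInputFormat` API: lower the configured per-node/per-rack minimums to the computed max split size before `getSplits` runs, so small inputs can no longer trigger the IOException above.

        import org.apache.hadoop.io.{BytesWritable, Text}
        import org.apache.hadoop.mapreduce.JobContext
        import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat

        // Hypothetical subclass used only to illustrate the clamping idea.
        abstract class SmallFilesInputFormat extends CombineFileInputFormat[Text, BytesWritable] {

          // maxSplitSize is the per-partition byte budget computed from the
          // total size of the (small) input files.
          def clampSplitSizes(context: JobContext, maxSplitSize: Long): Unit = {
            val conf = context.getConfiguration
            val minSizeNode = conf.getLong(CombineFileInputFormat.SPLIT_MINSIZE_PERNODE, 0L)
            val minSizeRack = conf.getLong(CombineFileInputFormat.SPLIT_MINSIZE_PERRACK, 0L)

            // If the configured per-node / per-rack minimums exceed the computed
            // max, Hadoop's getSplits() throws the IOException shown above, so
            // lower them to maxSplitSize first.
            if (maxSplitSize < minSizeNode) {
              super.setMinSplitSizeNode(maxSplitSize)
            }
            if (maxSplitSize < minSizeRack) {
              super.setMinSplitSizeRack(maxSplitSize)
            }
            super.setMaxSplitSize(maxSplitSize)
          }
        }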
    
    ## How was this patch tested?
    Added a unit test.
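
    A hypothetical standalone reproduction (not the unit test added in this patch) of the failure path through `sc.binaryFiles`/`BinaryFileRDD`, assuming a local-mode SparkContext and a few tiny temp files; the `5123456` value matches the per-node minimum in the trace above:

        import java.nio.file.Files

        import org.apache.spark.{SparkConf, SparkContext}

        object SmallBinaryFilesRepro {
          def main(args: Array[String]): Unit = {
            val conf = new SparkConf()
              .setMaster("local[2]")
              .setAppName("SPARK-24610-follow-up-repro")
              // Per-node/per-rack minimums larger than the max split size
              // computed for a handful of tiny files.
              .set("spark.hadoop.mapreduce.input.fileinputformat.split.minsize.per.node", "5123456")
              .set("spark.hadoop.mapreduce.input.fileinputformat.split.minsize.per.rack", "5123456")
            val sc = new SparkContext(conf)
            try {
              // Create a few tiny binary files in a temp directory.
              val dir = Files.createTempDirectory("small-binary-files")
              (1 to 3).foreach { i =>
                Files.write(dir.resolve(s"part-$i.bin"), Array.fill[Byte](16)(i.toByte))
              }
              // Without the fix this fails with the IOException shown above,
              // raised from BinaryFileRDD.getPartitions.
              val count = sc.binaryFiles(dir.toString).count()
              println(s"read $count small binary files")
            } finally {
              sc.stop()
            }
          }
        }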


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/10110346/spark maxSplitSize_node_rack

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22725.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22725
    
----
commit 54ffcdb7a18471a7a24fe36a000ca0cc4e8d0eba
Author: liuxian <liu.xian3@...>
Date:   2018-10-15T07:28:31Z

    fix

----


---
