[jira] [Comment Edited] (SPARK-16575) partition calculation mismatch with sc.binaryFiles

Tarun Kumar (JIRA) Fri, 14 Oct 2016 01:40:00 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-16575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572950#comment-15572950
 ]


Tarun Kumar edited comment on SPARK-16575 at 10/14/16 8:38 AM:
---------------------------------------------------------------

[~rxin] I have now added support for openCostInBytes, similar to SQL (thanks 
for pointing that out). It now creates an optimized number of partitions. 
Please review and suggest any changes. Once again, thanks for your suggestion; 
it worked like a charm!
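
For readers following along, here is a rough sketch of the kind of calculation the SQL file source uses, which is what "similar to SQL" refers to. It is not the patch itself. The object name SplitSizeSketch, the method names, and the two default constants are assumptions chosen for illustration (the constants mirror the documented defaults of spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes), and files are treated as unsplittable, as with sc.binaryFiles.

```scala
// Illustrative sketch only: hypothetical names, not the code submitted for SPARK-16575.
object SplitSizeSketch {
  // Assumed defaults, mirroring spark.sql.files.maxPartitionBytes (128 MB)
  // and spark.sql.files.openCostInBytes (4 MB).
  val DefaultMaxPartitionBytes: Long = 128L * 1024 * 1024
  val DefaultOpenCostInBytes: Long = 4L * 1024 * 1024

  // Target split size: every file is padded with the open cost, the padded
  // total is spread over the available cores, and the result is clamped
  // between openCostInBytes and maxPartitionBytes.
  def maxSplitBytes(
      fileSizes: Seq[Long],
      defaultParallelism: Int,
      maxPartitionBytes: Long = DefaultMaxPartitionBytes,
      openCostInBytes: Long = DefaultOpenCostInBytes): Long = {
    require(defaultParallelism > 0, "defaultParallelism must be positive")
    val totalBytes = fileSizes.map(_ + openCostInBytes).sum
    val bytesPerCore = totalBytes / defaultParallelism
    math.min(maxPartitionBytes, math.max(openCostInBytes, bytesPerCore))
  }

  // Greedy packing of whole files, largest first: a partition is closed when
  // adding the next file would push it past maxSplit.
  def estimatePartitions(
      fileSizes: Seq[Long],
      maxSplit: Long,
      openCostInBytes: Long = DefaultOpenCostInBytes): Int = {
    var partitions = 0
    var currentSize = 0L
    var currentCount = 0
    fileSizes.sorted(Ordering[Long].reverse).foreach { size =>
      if (currentCount > 0 && currentSize + size > maxSplit) {
        partitions += 1
        currentSize = 0L
        currentCount = 0
      }
      currentSize += size + openCostInBytes
      currentCount += 1
    }
    if (currentCount > 0) partitions += 1
    partitions
  }
}
```

With many small files, the open cost dominates each file's padded size, so files are grouped into a moderate number of partitions rather than collapsing into 2 or producing one partition per file.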


was (Author: fidato13):
[~rxin] I have now added the support of openCostInBytes, similar to SQL (thanks 
for pointing out). It does now creates an optimized number of partitions. 
Request you to review and suggest. Once again, Thanks for your suggestion, It 
worked like a charm!

> partition calculation mismatch with sc.binaryFiles
> --------------------------------------------------
>
>                 Key: SPARK-16575
>                 URL: https://issues.apache.org/jira/browse/SPARK-16575
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output, Java API, Shuffle, Spark Core, Spark Shell
>    Affects Versions: 1.6.1, 1.6.2
>            Reporter: Suhas
>            Priority: Critical
>
> sc.binaryFiles always creates an RDD with 2 partitions.
> Steps to reproduce (tested on Databricks Community Edition):
> 1. Create an RDD using sc.binaryFiles. In this example, the airlines 
> folder has 1922 files.
>      Ex: {noformat}val binaryRDD = sc.binaryFiles("/databricks-datasets/airlines/*"){noformat}
> 2. Check the number of partitions of the above RDD:
>     - binaryRDD.partitions.size = 2 (the expected value is more than 2).
> 3. If the RDD is created using sc.textFile, the number of partitions is 
> 1921.
> 4. The same sc.binaryFiles call creates 1921 partitions in Spark 1.5.1.
> For an explanation with a screenshot, please see the link below:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Partition-calculation-issue-with-sc-binaryFiles-on-Spark-1-6-2-tt18314.html



