Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/7210#issuecomment-119824697
  
    Also, I think users can easily work around this issue without using 
`CombineFileInputFormat` by adding a `coalesce(n)` call, where `n` is the 
desired number of tasks. In MapReduce, the framework basically decides how many 
splits to use, but in Spark, the number of partitions (and thus tasks) can be 
controlled explicitly. For example:
    
    ```scala
    sqlContext.read.parquet("hdfs://some/path").coalesce(1).collect()
    ```
    
    In this way, only a single task is used to read all the files at the given 
path. Does this trick work for you?
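
    If more than one task is wanted, the same trick generalizes: pass the desired 
partition count to `coalesce`. Below is a minimal sketch (the path and the 
partition count of 4 are hypothetical) that caps a scan over many small Parquet 
files at four tasks and checks the resulting partition count:

    ```scala
    // Hypothetical directory containing many small Parquet files.
    val df = sqlContext.read.parquet("hdfs://some/path/with/many/small/files")

    // coalesce(4) collapses the input splits into at most 4 partitions,
    // so the read is performed by at most 4 tasks.
    val reduced = df.coalesce(4)
    println(reduced.rdd.partitions.length) // partition count after coalescing
    reduced.collect()
    ```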

