Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/7210#issuecomment-119824697
Also, I think users can easily work around this issue without using
`CombineFileInputFormat` by adding a `coalesce(n)` call, where `n` is the
desired number of tasks. In MapReduce, the framework basically decides how many
splits to use, but in Spark this can be controlled explicitly. For example:
```scala
// read all Parquet files under the path, then collect with a single task
sqlContext.read.parquet("hdfs://some/path").coalesce(1).collect()
```
In this way, only a single task is used to read all the files at the given
path. Does this trick work for you?
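For reference, the same approach works with `n` greater than one; here is a
minimal sketch (the path and the value of `n` below are just placeholders):
```scala
val df = sqlContext.read.parquet("hdfs://some/path")
// coalesce(n) caps the number of tasks used by the next action;
// with n = 4, at most four tasks read all the Parquet files.
val n = 4
df.coalesce(n).count()
```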