GitHub user liancheng opened a pull request:
https://github.com/apache/spark/pull/8236
[SPARK-7837] [SQL] Avoids double closing output writers when commitTask() fails
When inserting data into a `HadoopFsRelation`, if `commitTask()` of the
writer container fails, `abortTask()` is invoked. However, both
`commitTask()` and `abortTask()` try to close the output writer(s). The problem
is that closing the underlying writers may not be an idempotent operation. For
example, `ParquetRecordWriter.close()` throws an NPE when called twice.
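One way to avoid the double close is to drop the writer reference as soon as it
has been closed, so a second close attempt becomes a no-op. Below is a minimal
sketch of that idea, not the actual patch; `OutputWriter`, `WriterContainer`,
and `clearCurrentWriter()` here are illustrative stand-ins for the real Spark
SQL classes:

```scala
// Illustrative sketch only; not the code from this PR.
// `OutputWriter` stands in for Spark SQL's output writer abstraction.
trait OutputWriter {
  def close(): Unit
}

class WriterContainer(newWriter: () => OutputWriter) {
  // The writer currently in use; set to null once it has been closed
  // so that commitTask()/abortTask() cannot close it twice.
  private var currentWriter: OutputWriter = newWriter()

  private def clearCurrentWriter(): Unit = {
    if (currentWriter != null) {
      currentWriter.close()
      currentWriter = null // subsequent calls become no-ops
    }
  }

  def commitTask(): Unit = {
    clearCurrentWriter() // close the writer before committing task output
    // ... commit logic; if it throws, abortTask() is called, but the
    // writer has already been released above, so it won't be closed again.
  }

  def abortTask(): Unit = {
    clearCurrentWriter() // safe even if commitTask() already closed it
    // ... abort/cleanup logic
  }
}
```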
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/liancheng/spark spark-7837/double-closing
Alternatively, you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/8236.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #8236
----
commit 9b668c3ca708b7f899ecbab21ff96b3e35fb2ea7
Author: Cheng Lian <[email protected]>
Date: 2015-08-17T09:30:15Z
Avoids double closing output writers when commitTask() fails
----