GitHub user ksakellis opened a pull request:
https://github.com/apache/spark/pull/3120
[SPARK-4092] [CORE] Fix InputMetrics for coalesce'd Rdds
The input metrics calculation assumed that a task reads from only one
block. This is not true for some operations, including coalesce. This
patch adds to the task's existing input metrics when they were recorded
with the same read method, instead of replacing them.
A limitation of this patch is that if a task reads from two blocks with
different read methods, the metrics for one will override the other.
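
The merge rule described above can be sketched roughly as follows. This is
a language-neutral illustration in Python, not the code in this pull
request: the names ReadMethod, InputMetrics, and merge_input_metrics are
hypothetical, and the choice that the newer metrics replace the older ones
on a read-method mismatch is an assumption (the description only says that
one overrides the other).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReadMethod(Enum):
    # Hypothetical read methods, named for illustration only.
    HADOOP = "hadoop"
    MEMORY = "memory"


@dataclass
class InputMetrics:
    # Hypothetical stand-in for a task's input metrics.
    read_method: ReadMethod
    bytes_read: int = 0


def merge_input_metrics(existing: Optional[InputMetrics],
                        new: InputMetrics) -> InputMetrics:
    """Combine metrics from a newly read block with a task's existing metrics."""
    if existing is not None and existing.read_method == new.read_method:
        # Same read method: accumulate rather than overwrite.
        return InputMetrics(new.read_method,
                            existing.bytes_read + new.bytes_read)
    # No previous metrics, or a different read method. Assumption: the
    # newer metrics win, which is the limitation noted in the description.
    return new


# A coalesced task reading two blocks with the same read method.
metrics = merge_input_metrics(None, InputMetrics(ReadMethod.HADOOP, 100))
metrics = merge_input_metrics(metrics, InputMetrics(ReadMethod.HADOOP, 50))
same_method_total = metrics.bytes_read

# A third block with a different read method replaces the running total.
metrics = merge_input_metrics(metrics, InputMetrics(ReadMethod.MEMORY, 10))
```

In this sketch the first two reads accumulate, while the third read
discards the earlier total, which is the case the limitation refers to.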
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ksakellis/spark kostas-spark-4092
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3120.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3120
----
commit 467ebfa4b786274f3fa66d3aad1fdfe433ed771e
Author: Sandy Ryza <[email protected]>
Date: 2014-10-31T23:51:57Z
SPARK-4178. Hadoop input metrics ignore bytes read in RecordReader
instantiation
commit a61eaedd2a1e78102c7bea4da5a2f0a21ba2983c
Author: Sandy Ryza <[email protected]>
Date: 2014-11-03T20:37:55Z
Kostas's review feedback
commit f1a615f0c758adec7868256b6774e29f24b2ff33
Author: Kostas Sakellis <[email protected]>
Date: 2014-11-04T01:59:18Z
[SPARK-4092] [CORE] Fix InputMetrics for coalesce'd Rdds
When calculating the input metrics there was an assumption
that one task only reads from one block - this is not true
for some operations including coalesce. This patch simply
increments the task's input metrics if previous ones existed
of the same read method.
A limitation to this patch is that if a task reads from
two different blocks of different read methods, one will override
the other.
----