GitHub user ksakellis opened a pull request:

    https://github.com/apache/spark/pull/3120

    [SPARK-4092] [CORE] Fix InputMetrics for coalesce'd Rdds

    The input metrics calculation assumed that a task reads from only one
    block. That does not hold for some operations, including coalesce. This
    patch increments the task's existing input metrics when metrics with the
    same read method are already present.

    A limitation of this patch is that if a task reads from two blocks with
    different read methods, one will override the other.
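
The accumulation rule described above can be sketched as follows. This is only an illustrative model, not the code in this patch: `InputMetrics`, `TaskMetrics`, `DataReadMethod`, and `recordBlockRead` are simplified, hypothetical stand-ins defined here for the example, and the real Spark classes may differ in shape and naming.

```scala
// Hypothetical, simplified stand-ins for illustration only;
// these are NOT Spark's actual classes or signatures.
object DataReadMethod extends Enumeration {
  val Memory, Disk, Hadoop, Network = Value
}

final class InputMetrics(val readMethod: DataReadMethod.Value) {
  var bytesRead: Long = 0L
}

final class TaskMetrics {
  var inputMetrics: Option[InputMetrics] = None

  // Record the bytes read from one block (hypothetical helper).
  def recordBlockRead(method: DataReadMethod.Value, bytes: Long): Unit = {
    inputMetrics match {
      // Same read method as the previous block: add to the running total.
      case Some(existing) if existing.readMethod == method =>
        existing.bytesRead += bytes
      // No previous metrics, or a different read method: start over.
      // This branch is the limitation noted above, since the earlier
      // metrics are overridden.
      case _ =>
        val fresh = new InputMetrics(method)
        fresh.bytesRead = bytes
        inputMetrics = Some(fresh)
    }
  }
}

object CoalesceMetricsDemo {
  def main(args: Array[String]): Unit = {
    val metrics = new TaskMetrics
    // A coalesced task reading two blocks with the same read method.
    metrics.recordBlockRead(DataReadMethod.Hadoop, 100L)
    metrics.recordBlockRead(DataReadMethod.Hadoop, 50L)
    println(metrics.inputMetrics.map(_.bytesRead)) // Some(150)

    // A third block with a different read method replaces the total.
    metrics.recordBlockRead(DataReadMethod.Memory, 10L)
    println(metrics.inputMetrics.map(_.bytesRead)) // Some(10)
  }
}
```

In this sketch the two Hadoop reads sum to 150 bytes, while the later Memory read discards that total, which mirrors the override behaviour called out as a limitation.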

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ksakellis/spark kostas-spark-4092

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3120.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3120
    
----
commit 467ebfa4b786274f3fa66d3aad1fdfe433ed771e
Author: Sandy Ryza <[email protected]>
Date:   2014-10-31T23:51:57Z

    SPARK-4178. Hadoop input metrics ignore bytes read in RecordReader instantiation

commit a61eaedd2a1e78102c7bea4da5a2f0a21ba2983c
Author: Sandy Ryza <[email protected]>
Date:   2014-11-03T20:37:55Z

    Kostas's review feedback

commit f1a615f0c758adec7868256b6774e29f24b2ff33
Author: Kostas Sakellis <[email protected]>
Date:   2014-11-04T01:59:18Z

    [SPARK-4092] [CORE] Fix InputMetrics for coalesce'd Rdds
    
    When calculating the input metrics there was an assumption
    that one task only reads from one block - this is not true
    for some operations including coalesce. This patch simply
    increments the task's input metrics if previous ones existed
    of the same read method.
    
    A limitation to this patch is that if a task reads from
    two different blocks of different read methods, one will override
    the other.

----

