[GitHub] spark pull request: SPARK-2630 Input data size of CoalescedRDD cou...

pwendell Sat, 20 Sep 2014 22:55:13 -0700

Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2310#issuecomment-56290108
  
    @ash211 This particular fix will only work for HadoopRDD, not for the other
places where we track input bytes read. A more general fix is pretty simple,
though, I think.
    
    The way to fix this is to change the input bytes-read counter from a Long to
an AtomicLong and to add to it (for example with `addAndGet`) wherever it is
modified (currently in HadoopRDD, NewHadoopRDD, and BlockManager), instead of
setting the value directly. I think that would fix the case where it is
updated multiple times within a single task.
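    A minimal Java sketch of the accumulate-instead-of-set idea, assuming a hypothetical `InputMetricsSketch` holder (not Spark's actual input-metrics class) and a single task that reads three parent partitions, as a CoalescedRDD task can. Because each read contributes a byte count rather than 1, the sketch uses `AtomicLong.addAndGet(long)`; `incrementAndGet()` only ever adds one.

```java
import java.util.concurrent.atomic.AtomicLong;

final class InputBytesDemo {
    // Hypothetical stand-in for a task's input metrics; not Spark's real class.
    static final class InputMetricsSketch {
        private final AtomicLong bytesRead = new AtomicLong(0L);

        // Old behaviour: each reader overwrites whatever was recorded before.
        void setBytesRead(long bytes) {
            bytesRead.set(bytes);
        }

        // Proposed behaviour: each reader atomically adds its own count.
        long addBytesRead(long bytes) {
            return bytesRead.addAndGet(bytes);
        }

        long bytesRead() {
            return bytesRead.get();
        }
    }

    // Simulates one task that reads several parent partitions in sequence.
    static long simulate(long[] partitionSizes, boolean accumulate) {
        InputMetricsSketch metrics = new InputMetricsSketch();
        for (long size : partitionSizes) {
            if (accumulate) {
                metrics.addBytesRead(size);
            } else {
                metrics.setBytesRead(size);
            }
        }
        return metrics.bytesRead();
    }

    public static void main(String[] args) {
        long[] sizes = {100L, 250L, 50L};
        System.out.println("set:   " + simulate(sizes, false)); // only the last read survives
        System.out.println("added: " + simulate(sizes, true));  // sum of all reads
    }
}
```

    Running `main` prints 50 for the overwrite path (only the last partition's count survives) and 400 for the accumulating path. The atomic type additionally keeps concurrent updates from being lost if more than one thread touches the same counter, but the part that fixes the under-count is adding rather than assigning.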


