[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...

ksakellis Mon, 02 Feb 2015 09:20:34 -0800

Github user ksakellis commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4067#discussion_r23940117
  
    --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
    @@ -47,9 +49,13 @@ private[spark] class CacheManager(blockManager: 
BlockManager) extends Logging {
             val inputMetrics = blockResult.inputMetrics
             val existingMetrics = context.taskMetrics
               .getInputMetricsForReadMethod(inputMetrics.readMethod)
    -        existingMetrics.addBytesRead(inputMetrics.bytesRead)
    +        existingMetrics.incBytesRead(inputMetrics.bytesRead)
     
    -        new InterruptibleIterator(context, 
blockResult.data.asInstanceOf[Iterator[T]])
    +        val iter = blockResult.data.asInstanceOf[Iterator[T]]
    +        new InterruptibleIterator(context, 
AfterNextInterceptingIterator(iter, (next: T) => {
    +          existingMetrics.incRecordsRead(1)
    --- End diff --
    
    This is a valid concern but I followed the pattern we have with 
CompletionIterator and InterruptableIterator. The CompletionIterator adds a 
function call after every "hasNext()" call - granted this is at the block level 
(not record). The IterruptableIterator adds a function call at each hasNext() 
and next() at the record level too. Did we evaluate the performance impact of 
these iterators too? 
    
    @pwendell can you explain to me where the locking occurs? calling 
incRecordsRead calls the AtomicLong.setAndget which uses a compare and swap. I 
don't see any locking. Since this is going to be called by one thread (the task 
thread), the compare should never fail so there shouldn't be any retrying. 
    
    @rxin I think adding the counting in the InterruptibleIterator is not a 
good separation of concerns. Also the InterruptibleIterator is used in a few 
places so we'd have to add some switching logic to know whether we should 
update the recordsread metrics?




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...

Reply via email to