GitHub user dibbhatt commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6707#discussion_r31933485
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceivedBlockHandler.scala ---
    @@ -64,11 +68,17 @@ private[streaming] class BlockManagerBasedBlockHandler(
       extends ReceivedBlockHandler with Logging {
     
       def storeBlock(blockId: StreamBlockId, block: ReceivedBlock): ReceivedBlockStoreResult = {
    +    var numRecords = None: Option[Long]
    +    val countIterator = block match {
    +      case ArrayBufferBlock(arrayBuffer) => new CountingIterator(arrayBuffer.iterator)
    +      case IteratorBlock(iterator) => new CountingIterator(iterator)
    +      case _ => null
    +    }
         val putResult: Seq[(BlockId, BlockStatus)] = block match {
           case ArrayBufferBlock(arrayBuffer) =>
    -        blockManager.putIterator(blockId, arrayBuffer.iterator, storageLevel, tellMaster = true)
    +        blockManager.putIterator(blockId, countIterator, storageLevel, tellMaster = true)
    --- End diff --
    
    I think reading numRecords right after putIterator will be a problem if the BlockManager is not able to unroll the block safely into memory. In that case the block-id will not appear in the put result and a SparkException will be thrown. We should count the number of records only after the block-id is present in putResult. Do let me know what you think.
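    
    To make the ordering concrete, here is a minimal, self-contained sketch of what I mean (plain Scala, not the actual Spark API; this CountingIterator is a toy stand-in and its count accessor is hypothetical). The point is that the count is only read after the block-id has been confirmed in the put result:
    
        // Toy stand-in for the CountingIterator in this patch.
        class CountingIterator[T](iterator: Iterator[T]) extends Iterator[T] {
          private var _count = 0L
          def hasNext: Boolean = iterator.hasNext
          def next(): T = { _count += 1; iterator.next() }
          // The count is only trustworthy once the wrapped iterator is fully drained.
          def count: Option[Long] = if (!iterator.hasNext) Some(_count) else None
        }
    
        object StoreOrderingSketch {
          def main(args: Array[String]): Unit = {
            val blockId = "input-0-0"
            val countIterator = new CountingIterator(Iterator(1, 2, 3))
    
            // Stand-in for blockManager.putIterator: simulate a failed unroll
            // that consumes part of the iterator and reports no status for blockId.
            countIterator.next()
            val storedBlockIds: Seq[String] = Seq.empty
    
            // Count only after the block-id is confirmed in the put result.
            val numRecords: Option[Long] =
              if (storedBlockIds.contains(blockId)) countIterator.count else None
    
            println(numRecords) // None: the partial count is never surfaced
          }
        }
    
    With this ordering a failed unroll can never surface a partial count, because the failure path (in the real code, the SparkException) short-circuits before numRecords is ever read.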

