GitHub user dibbhatt opened a pull request:
https://github.com/apache/spark/pull/6707
[SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct
count at Spark UI
@tdas @zsxwing this is the new PR for SPARK-8080.
I have merged https://github.com/apache/spark/pull/6659 into it.
Also worth mentioning: with the MEMORY_ONLY storage level, when a block cannot
be unrolled into memory by unrollSafely because there is not enough space,
BlockManager won't try to put the block, and ReceivedBlockHandler will throw a
SparkException because it cannot find the block ID in the PutResult. So the
number of records in a block is not counted if the block failed to unroll in
memory, which is fine.
With the MEMORY_AND_DISK storage level, if BlockManager is not able to unroll
the block to memory, the block will still be written to disk. The same holds
for the WAL-based store. So in those cases (storage level = memory + disk) the
number of records is counted even though the block could not be unrolled to
memory.
Thus I added isFullyConsumed to the CountingIterator but have not used it,
because the case where a block is not fully consumed and ReceivedBlockHandler
still gets the block ID will never happen.
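
To make the idea concrete, here is a minimal sketch of a counting wrapper
around an iterator. It is an illustration only and not the code in this pull
request: the class body, the field name _count and the count accessor are
assumptions; only the names CountingIterator and isFullyConsumed come from the
description above.

```scala
// Hypothetical sketch, not the implementation from the pull request.
// Wraps an iterator and counts the records that pass through it.
class CountingIterator[T](underlying: Iterator[T]) extends Iterator[T] {
  // Number of records handed out so far (assumed field name).
  private var _count = 0L

  override def hasNext: Boolean = underlying.hasNext

  override def next(): T = {
    val item = underlying.next()
    _count += 1
    item
  }

  // Records consumed so far (assumed accessor).
  def count: Long = _count

  // True once the wrapped iterator has no more elements, i.e. the
  // count covers every record in the block.
  def isFullyConsumed: Boolean = !underlying.hasNext
}
```

A caller would read count only after the store call has returned a block ID;
isFullyConsumed is available as a guard should a partially consumed block ever
need to be told apart.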
I have also added a few test cases to cover those block-unrolling scenarios.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dibbhatt/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/6707.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6707
----
commit 01e6dc8ad9ac6353ef8e073b93a96bffb6e46ca6
Author: U-PEROOT\UBHATD1 <[email protected]>
Date: 2015-06-08T14:17:16Z
A
commit 4c5931d660c6d0642dbb63c2340b24f5493e19d3
Author: Dibyendu Bhattacharya <[email protected]>
Date: 2015-06-08T15:04:04Z
[SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct
count at Spark UI
----