[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59883237 @rxin that's a fair solution, too, although the bitmap needs to be losslessly compressed. I could imagine cases where data is already partitioned but a user performs partition-preserving operations without specifying `preservesPartitioning`, then does a filtering operation that would otherwise benefit from partitioning. In these cases, you might have this extreme bimodal distribution where most blocks are zero but the remaining blocks might be big. In these cases, do you care about the exact sizes of those blocks? Probably not in most cases, since there will be few blocks. I'll look into folding this into the compressed version as you've suggested. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user Ishiihara commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59883834 @JoshRosen I have been looking into the compressed bitmap and already have a good idea of how to use Roaring Bitmap to perform the task. If this work is not urgent, can you give me a day or two to get the compressed bitmap part completed? Thanks!
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59883993 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21964/consoleFull) for PR 2866 at commit [`c23897a`](https://github.com/apache/spark/commit/c23897aea7881eb819ec074073a4431ec8ba7eb5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59883998 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21964/ Test PASSed.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59966852 Compressed bitmaps are generally just lossless variants of run-length encoding, so they should be able to handle your case too.
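To make the run-length point concrete, here is a minimal Scala sketch that run-length encodes a mostly-empty block bitmap and decodes it back without loss. It is not Spark code and not how Roaring Bitmap lays out its containers; `RunLength`, `encode`, and `decode` are hypothetical names used only for illustration.

```scala
// Hypothetical illustration of lossless run-length encoding for a block bitmap.
// Real compressed bitmaps (e.g. Roaring) use more elaborate container formats.
object RunLength {
  // Encode a bitmap as (bitValue, runLength) pairs, merging equal neighbours.
  def encode(bits: Seq[Boolean]): List[(Boolean, Int)] =
    bits.foldRight(List.empty[(Boolean, Int)]) {
      case (b, (v, n) :: rest) if v == b => (v, n + 1) :: rest
      case (b, acc)                      => (b, 1) :: acc
    }

  // Expand the runs back into the original bitmap; nothing is lost.
  def decode(runs: List[(Boolean, Int)]): List[Boolean] =
    runs.flatMap { case (v, n) => List.fill(n)(v) }
}
```

A 2001-entry bitmap with only two set bits collapses to five runs, and decoding reproduces every bit, so the exact empty/non-empty status of each block survives even in the extreme bimodal case JoshRosen describes.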
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59976845 @Ishiihara Thanks for the reminder about Roaring Bitmap. I'm just going to do this myself, since it should only take a few minutes.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user Ishiihara commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59979144 @JoshRosen Thank you.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59989609 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22001/consoleFull) for PR 2866 at commit [`609407d`](https://github.com/apache/spark/commit/609407de8a0bd78ca043de19c185c7a43bcf5b5e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59989763 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22001/ Test FAILed.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59989759 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22001/consoleFull) for PR 2866 at commit [`609407d`](https://github.com/apache/spark/commit/609407de8a0bd78ca043de19c185c7a43bcf5b5e). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2866#discussion_r19175249
--- Diff: core/src/test/scala/org/apache/spark/scheduler/MapStatusSuite.scala ---
@@ -56,37 +78,25 @@ class MapStatusSuite extends FunSuite {
 assert(status.getSizeForBlock(2000) === 150L)
 }
- test(classOf[HighlyCompressedMapStatus].getName + ": estimated size is within 10%") {
--- End diff --
I removed this test because it was broken as originally written. The test claims to check HighlyCompressedMapStatus's estimation error, but it never verified that a HighlyCompressedMapStatus was actually created. Since this test used only 50 map outputs rather than more than 2000, it never actually exercised HighlyCompressedMapStatus's code.
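One way to avoid this kind of vacuously passing test is to assert which implementation was chosen before checking its estimates. The Scala sketch below uses hypothetical stand-ins (`SizeTracker`, `ExactSizes`, `AverageSize`) rather than Spark's real classes; only the "more than 2000 blocks" threshold comes from this PR.

```scala
// Hypothetical stand-ins for the two MapStatus flavours discussed in this PR.
// Only the "more than 2000 blocks" threshold is taken from the PR description.
sealed trait SizeTracker {
  def getSizeForBlock(reduceId: Int): Long
}

// Keeps every block size exactly.
final class ExactSizes(sizes: Array[Long]) extends SizeTracker {
  def getSizeForBlock(reduceId: Int): Long = sizes(reduceId)
}

// Keeps only one average size for all blocks.
final class AverageSize(avg: Long) extends SizeTracker {
  def getSizeForBlock(reduceId: Int): Long = avg
}

object SizeTracker {
  val Threshold = 2000

  // Pick the compact representation only for large block counts.
  def apply(sizes: Array[Long]): SizeTracker =
    if (sizes.length > Threshold) new AverageSize(sizes.sum / sizes.length)
    else new ExactSizes(sizes)
}
```

A test aimed at the compact representation would open with an `isInstanceOf[AverageSize]` check; with only 50 blocks, that check fails immediately instead of the test passing without ever touching the compact code path.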
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59990095 I've updated this to use Roaring Bitmap for tracking which blocks are non-empty. I also changed HighlyCompressedMapStatus to compute its average size over the non-empty blocks only; this should provide better estimates for map outputs that contain a few huge partitions and many empty ones.
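A small worked example of why averaging only the non-empty blocks gives better estimates, assuming 2001 blocks of which three hold 1 GiB each and the rest are empty. The object and method names are hypothetical, and truncating integer division is an assumption standing in for however the real status rounds.

```scala
// Hypothetical helpers comparing two ways of estimating a per-block size.
object AverageEstimates {
  // Average over every block, empty or not.
  def averageOverAll(sizes: Array[Long]): Long =
    if (sizes.isEmpty) 0L else sizes.sum / sizes.length

  // Average over the non-empty blocks only.
  def averageOverNonEmpty(sizes: Array[Long]): Long = {
    val nonEmpty = sizes.filter(_ > 0)
    if (nonEmpty.isEmpty) 0L else nonEmpty.sum / nonEmpty.length
  }
}

object AverageEstimatesExample {
  val OneGiB: Long = 1L << 30
  // 2001 blocks: three hold 1 GiB each, the rest are empty.
  val sizes: Array[Long] = Array.tabulate(2001)(i => if (i < 3) OneGiB else 0L)
}
```

Averaging over all 2001 blocks yields 1,609,807 bytes (about 1.5 MiB) per block, which badly underestimates the three real partitions, while averaging over the non-empty blocks yields exactly 1 GiB.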
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59990957 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59991107 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22004/consoleFull) for PR 2866 at commit [`ba2e71c`](https://github.com/apache/spark/commit/ba2e71c398c21a3b1f10d29d617c6d15e687ed6c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59991652 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22004/ Test FAILed.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59991650 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22004/consoleFull) for PR 2866 at commit [`ba2e71c`](https://github.com/apache/spark/commit/ba2e71c398c21a3b1f10d29d617c6d15e687ed6c). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59993759 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22005/ Test FAILed.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-60003719 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-60004163 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22010/consoleFull) for PR 2866 at commit [`ba2e71c`](https://github.com/apache/spark/commit/ba2e71c398c21a3b1f10d29d617c6d15e687ed6c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/2866

[SPARK-4019] Fix MapStatus compression bug that could lead to empty results

This commit fixes a bug in MapStatus that could cause jobs to wrongly return empty results if those jobs contained stages with more than 2000 partitions where most of those partitions were empty.

For jobs with more than 2000 partitions, MapStatus uses HighlyCompressedMapStatus, which only stores the average size of blocks. If the average block size is zero, then all blocks will be reported as empty, causing BlockFetcherIterator to mistakenly skip them. For example, this would return an empty result:

    sc.makeRDD(0 until 10, 1000).repartition(2001).collect()

The root problem here is that MapStatus has a (previously undocumented) correctness property that was violated by HighlyCompressedMapStatus: if a block is non-empty, then getSizeForBlock must be non-zero.

I fixed this by introducing a new SparseCompressedMapStatus which only stores the sizes of non-empty blocks. I also added new tests and assertions.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark spark-4019

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2866.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2866

commit 91276a3bea64a11ff443baeb32df4ef1dab9d7c8
Author: Josh Rosen joshro...@databricks.com
Date: 2014-10-21T01:34:57Z

    [SPARK-4019] Fix MapStatus compression bug that could lead to empty results.
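The failure mode can be reproduced outside Spark with a minimal Scala sketch, assuming the average-only status stores a single size computed as the truncating integer average over all blocks, empty ones included. `AverageOnlyBug` and its methods are hypothetical names, not Spark APIs.

```scala
// Hypothetical model of an average-only map status, assuming the single
// stored size is the integer average over all blocks (empty ones included).
object AverageOnlyBug {
  // Every block is reported with the same averaged size.
  def reportedSizes(sizes: Array[Long]): Array[Long] = {
    val avg = if (sizes.isEmpty) 0L else sizes.sum / sizes.length // truncating division
    Array.fill(sizes.length)(avg)
  }

  // The property from the PR: a non-empty block must never be reported as size zero.
  def violatesInvariant(actual: Array[Long], reported: Array[Long]): Boolean =
    actual.indices.exists(i => actual(i) > 0 && reported(i) == 0L)
}
```

With 2001 blocks of which ten hold 100 bytes each, the total is 1000 bytes and 1000 / 2001 truncates to 0, so every block, including the ten non-empty ones, is reported as empty and would be skipped by the fetcher.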
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59867805 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21951/consoleFull) for PR 2866 at commit [`91276a3`](https://github.com/apache/spark/commit/91276a3bea64a11ff443baeb32df4ef1dab9d7c8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59871327 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21951/ Test FAILed.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59871323 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21951/consoleFull) for PR 2866 at commit [`91276a3`](https://github.com/apache/spark/commit/91276a3bea64a11ff443baeb32df4ef1dab9d7c8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59871966 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21957/consoleFull) for PR 2866 at commit [`c23897a`](https://github.com/apache/spark/commit/c23897aea7881eb819ec074073a4431ec8ba7eb5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59876264 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21957/consoleFull) for PR 2866 at commit [`c23897a`](https://github.com/apache/spark/commit/c23897aea7881eb819ec074073a4431ec8ba7eb5). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59876271 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21957/ Test FAILed.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59879420 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59879396 ;retest
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59879704 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21964/consoleFull) for PR 2866 at commit [`c23897a`](https://github.com/apache/spark/commit/c23897aea7881eb819ec074073a4431ec8ba7eb5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59882025 Oh wow. Thanks for fixing this.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59882089 Actually, instead of introducing a new MapStatus class, what if we add a compressed bitmap that tracks zero-sized blocks, and then use the average size only for the non-zero blocks?
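A minimal sketch of this proposal, using `java.util.BitSet` as an uncompressed stand-in for a compressed bitmap such as Roaring Bitmap. `EmptyBlocksAndAverage` is a hypothetical name and this is not the class that was eventually merged; it only shows how a bitmap of empty blocks plus one average over the non-empty blocks fits together.

```scala
import java.util.BitSet

// Sketch of the proposal: a bitmap marks empty blocks, and a single average
// is kept over the non-empty ones. BitSet is an uncompressed stand-in for a
// compressed bitmap such as RoaringBitmap.
final class EmptyBlocksAndAverage private (emptyBlocks: BitSet, avgNonEmpty: Long) {
  // Empty blocks report 0; every other block reports the non-empty average.
  def getSizeForBlock(reduceId: Int): Long =
    if (emptyBlocks.get(reduceId)) 0L else avgNonEmpty
}

object EmptyBlocksAndAverage {
  def apply(sizes: Array[Long]): EmptyBlocksAndAverage = {
    val emptyBits = new BitSet(sizes.length)
    var total = 0L
    var nonEmpty = 0
    for (i <- sizes.indices) {
      if (sizes(i) == 0L) emptyBits.set(i)
      else { total += sizes(i); nonEmpty += 1 }
    }
    val avg = if (nonEmpty == 0) 0L else total / nonEmpty
    new EmptyBlocksAndAverage(emptyBits, avg)
  }
}
```

Because every non-empty block holds at least 1 byte, the total over non-empty blocks is at least their count, so the average is at least 1 and the correctness property (non-empty implies a non-zero reported size) holds by construction, at least while the total does not overflow a `Long`.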