Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18231
**[Test build #77862 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77862/testReport)**
for PR 18231 at commit
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/18231
> There isn't a reference here anymore; there could be elsewhere.
Only if there was a bug in the RPC layer, since this is an RPC handler and
the message should not be referenced by the RPC
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/18231
There isn't a reference _here_ anymore; there could be elsewhere. It sounds
like there's good reason to believe there is not another reference hanging
around though.
---
If your project is set up
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/18231
@srowen I don't see any references to the original `OpenBlocks` message or
to the block id array in the updated code. Why do you think there's
still a reference somewhere?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18231
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18231
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77831/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18231
**[Test build #77831 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77831/testReport)**
for PR 18231 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18231
**[Test build #77831 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77831/testReport)**
for PR 18231 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18231
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18231
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77811/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18231
**[Test build #77811 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77811/testReport)**
for PR 18231 at commit
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18231
@srowen
I ran a test to verify this patch.
I wrapped a number of blocks inside `OpenBlocks` and sent it to
`ExternalShuffleBlockHandler`.
With this change:
it cost about 133M in the
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18231
Yes, I think it would be good to run some tests and provide solid evidence.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/18231
It's not clear to me that that's true, no. Not, at least, in the lifetime of the
iterator. That's what has to be true for this to help anything. Do you have
evidence this is true? For example, if you have tests
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18231
Nothing references `msg` anywhere, right? I guess `msg` will be
garbage collected without trouble.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18231
**[Test build #77811 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77811/testReport)**
for PR 18231 at commit
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/18231
That's not the question though. The question is whether they could be freed
even after this change. msg still references it. That's what you need to
establish, if only by some empirical testing.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18231
The blockIds cannot be freed because they are referenced by the iterator.
With this change they are not: we reference the mapIdAndReduceIds instead.
Thus the blockIds in OpenBlocks can be
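To make the point about references concrete, here is a minimal sketch of an iterator that keeps only a flat `int[]` of (mapId, reduceId) pairs and a cursor, so it holds no reference to the original block id strings. This is not the code from the PR: the class name `PackedBlockIterator` is hypothetical, and returning a rebuilt block id `String` stands in for whatever the real handler returns per block.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration, not the actual Spark class.
final class PackedBlockIterator implements Iterator<String> {
    private final int shuffleId;
    // Layout assumption: [mapId0, reduceId0, mapId1, reduceId1, ...]
    private final int[] mapIdAndReduceIds;
    private int index = 0;

    PackedBlockIterator(int shuffleId, int[] mapIdAndReduceIds) {
        if (mapIdAndReduceIds.length % 2 != 0) {
            throw new IllegalArgumentException("Expected an even number of ints");
        }
        this.shuffleId = shuffleId;
        this.mapIdAndReduceIds = mapIdAndReduceIds;
    }

    @Override
    public boolean hasNext() {
        return index < mapIdAndReduceIds.length;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Rebuild the id on demand; a real handler would look up the block here.
        String blockId = "shuffle_" + shuffleId + "_"
            + mapIdAndReduceIds[index] + "_" + mapIdAndReduceIds[index + 1];
        index += 2;
        return blockId;
    }

    public static void main(String[] args) {
        Iterator<String> it =
            new PackedBlockIterator(20, new int[] {1000, 2000, 1001, 2000});
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Whether this actually lowers peak memory still depends on the question raised in the thread: the original message and its `String[]` must become unreachable while the iterator is still alive.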
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/18231
I get it. But that doesn't make the reference in OpenBlocks go away. This
only helps if msg/msgObj can be garbage collected earlier. Is that the
case? Right now this is allocating
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18231
I mean the blockIds in `OpenBlocks`; they are referenced by the iterator.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/18231
The current iterator doesn't have any state except for an int. What are you
referring to?
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18231
@srowen Sorry, I didn't make it clear.
1. In the current code, all blockIds are stored in the iterator. They are
released only once the iterator has been traversed.
2. Now I change the `String` to
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18231
Actually it's more than 12 bytes.
Yes, there are millions of these. In my heap dump, it's 1.5 G
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/18231
That's 12 bytes. Are there millions of these?
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18231
@vanzin
Thanks a lot for reviewing this. I've refined the patch according to your comments.
Please take another look at this when you have time :)
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18231
@srowen
Thanks a lot for looking into this :)
For example: blockId="shuffle_20_1000_2000" is stored as a `String`,
which costs more than 20 bytes. In this change, it will cost only 8
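As an illustration of the size argument, the sketch below parses block ids of the form `shuffle_<shuffleId>_<mapId>_<reduceId>` into a flat `int[]`, so each block is represented by two 4-byte ints rather than a `String`. The helper name `packMapAndReduceIds` and the class `BlockIdPacking` are hypothetical and are not taken from the PR; the error handling is an assumption.

```java
import java.util.Arrays;

// Hypothetical illustration, not the actual Spark code.
final class BlockIdPacking {
    // Returns [mapId0, reduceId0, mapId1, reduceId1, ...] for the given ids.
    static int[] packMapAndReduceIds(String[] blockIds, int expectedShuffleId) {
        int[] packed = new int[2 * blockIds.length];
        for (int i = 0; i < blockIds.length; i++) {
            String[] parts = blockIds[i].split("_");
            if (parts.length != 4 || !parts[0].equals("shuffle")) {
                throw new IllegalArgumentException(
                    "Unexpected shuffle block id format: " + blockIds[i]);
            }
            if (Integer.parseInt(parts[1]) != expectedShuffleId) {
                throw new IllegalArgumentException(
                    "Expected shuffle id " + expectedShuffleId + ", got: " + blockIds[i]);
            }
            packed[2 * i] = Integer.parseInt(parts[2]);     // mapId
            packed[2 * i + 1] = Integer.parseInt(parts[3]); // reduceId
        }
        return packed;
    }

    public static void main(String[] args) {
        String[] ids = {"shuffle_20_1000_2000", "shuffle_20_1001_2000"};
        // prints [1000, 2000, 1001, 2000]
        System.out.println(Arrays.toString(packMapAndReduceIds(ids, 20)));
    }
}
```

The shuffle id is the same for every block in one request, so it can be stored once instead of per block.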
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18231
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18231
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77806/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18231
**[Test build #77806 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77806/testReport)**
for PR 18231 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18231
**[Test build #77806 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77806/testReport)**
for PR 18231 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18231
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77795/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18231
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18231
**[Test build #77795 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77795/testReport)**
for PR 18231 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18231
**[Test build #77795 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77795/testReport)**
for PR 18231 at commit
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18231
In my cluster, the shuffle service is suffering from OOM errors.
We found that a lot of executors are fetching blocks from a single
shuffle-service. Analyzing the memory, we found that the