Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/19048
Created https://github.com/apache/spark/pull/19081.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/19048
jenkins retest this please.
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19048
Not sure why the PRB is not picking up my requests. @sitalkedia can you
close and re-open the PR to see if that does it?
(The change looks fine, it just would be nice to get a clean test
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19048
retest this please
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19048
SparkR tests have been super flaky lately.
retest this please
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/19048
Not sure why the test failed. Maybe the build is unstable? cc @vanzin
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/19048
Yes, I agree the change made in this PR looks good for your issue. I'm just
suggesting that we could refactor the code further, maybe as follow-up
work.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19048
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19048
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81195/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19048
**[Test build #81195 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81195/testReport)**
for PR 19048 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19048
**[Test build #81195 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81195/testReport)**
for PR 19048 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19048
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19048
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81194/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19048
**[Test build #81194 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81194/testReport)**
for PR 19048 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19048
**[Test build #81194 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81194/testReport)**
for PR 19048 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19048
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19048
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81193/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19048
**[Test build #81193 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81193/testReport)**
for PR 19048 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19048
**[Test build #81193 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81193/testReport)**
for PR 19048 at commit
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/19048
@jiangxb1987 - I agree with you. I do not have the context or history to
comment on that. Unfortunately, the API has been designed that way and
bookkeeping of the target number of executors is done
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/19048
One thing I don't clearly understand is why we should update
`requestedTotalExecutors` inside the function `killExecutors`; asking to kill
some executor(s) doesn't imply we are requesting
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/19048
>>Or it can call killExecutors() like it does today and then call
requestTotalExecutors right after, same result without the awkwardness of the
parameter name, but that adds a trip to the
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19048
(Or it can call `killExecutors()` like it does today and then call
`requestTotalExecutors` right after, same result without the awkwardness of the
parameter name, but that adds a trip to the cluster
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19048
Well, that's adding an API that does the same thing as existing APIs but a
little bit differently. In my view that adds to the problem, instead of fixing
it. Now every caller into the
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/19048
At a high level, I agree that keeping the state in 3 places creates a
mess, but changing that would require a big refactoring, which is probably
outside the scope of this change.
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19048
> Maybe it would be cleaner if we provided a new API like this:
`killExecutorsAndNotUpdateTotal`?
I think the main thing that bothers me is that adding anything to the API
is making all this
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/19048
>> this code in the EAM: Should be changed to account for the current
number of executors, so that the EAM doesn't tell the CGSB that it wants fewer
executors than currently exist.
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19048
I think I see what you're saying. But I still think it's the fault of the
EAM.
> But please note that while killing 2 executors the EAM did not reduce its
target to 3, it is still 5.
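
The disagreement quoted above is easier to follow with a toy model. The sketch below is not Spark code: `ToyAllocationManager` and `ToyBackend` are hypothetical stand-ins for the EAM and the CGSB, and the rule that a kill also lowers the backend's requested total is an assumption made only to illustrate how the two counters can drift apart. The numbers (10 executors, 4 tasks per executor, 20 running tasks, 2 executors killed) are the ones used in this thread.

```python
import math


class ToyAllocationManager:
    """Hypothetical stand-in for the EAM: keeps its own executor target."""

    def __init__(self, tasks_per_executor):
        self.tasks_per_executor = tasks_per_executor
        self.num_executors_target = 0

    def update_target(self, running_tasks):
        # Enough executors to run every task at once, rounded up.
        self.num_executors_target = math.ceil(
            running_tasks / self.tasks_per_executor
        )
        return self.num_executors_target


class ToyBackend:
    """Hypothetical stand-in for the CGSB: keeps the total it has requested."""

    def __init__(self, running_executors):
        self.running_executors = running_executors
        self.requested_total = running_executors

    def request_total_executors(self, total):
        self.requested_total = total

    def kill_executors(self, count):
        # Assumption for illustration: a kill also lowers the requested total.
        self.running_executors -= count
        self.requested_total = max(self.requested_total - count, 0)


eam = ToyAllocationManager(tasks_per_executor=4)
backend = ToyBackend(running_executors=10)

# 20 running tasks at 4 tasks per executor give a target of 5.
backend.request_total_executors(eam.update_target(running_tasks=20))

# Two executors are killed; only the backend's counter moves.
backend.kill_executors(2)

drift = eam.num_executors_target - backend.requested_total
print(eam.num_executors_target, backend.requested_total, drift)
```

In this model the manager still believes the target is 5 while the backend has recorded 3, which is the out-of-sync state the thread is debating. Whether the fix belongs in the manager or in the backend is exactly what the commenters disagree on, and the sketch takes no side.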
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/19048
>> Why? Because of the idle timeout? If that's your point, then the change
I referenced above should avoid that.
Yes, because of the idle timeout. Note that the `numExecutorsTarget` is 5 and
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19048
If you can actually provide logs that show what you're trying to say that
would probably be easier.
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19048
> but the EAM asks the scheduler to kill 2 of them.
Why? Because of the idle timeout? If that's your point, then the change I
referenced above should avoid that.
> The scheduler
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/19048
To be clear, there is no issue on the EAM side. Consider the following
situation:
- 10 executors are running, and each executor can run at most 4 tasks.
- 20 tasks are running so EAM sets the
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19048
I think I'm starting to understand what you're getting at, but I still
don't see why this has anything to do with the CGSB. What I understand from
your comment is that the EAM may reduce its target
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/19048
That's not really true.
The EAM uses the `requestTotalExecutors` API to set the target for the
scheduler.
- 10 executors are running, and each executor can run at most 4 tasks.
- 20
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19048
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19048
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81132/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19048
**[Test build #81132 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81132/testReport)**
for PR 19048 at commit
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19048
> This is when things get out of sync because now the scheduler will set
the number of total executors needed from 4 to 1.
Have you actually observed that behavior?
The way I
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/19048
Looking at the scheduler and the dynamic executor allocator code, this is
my understanding; correct me if I am wrong.
Let's say the dynamic executor allocator is ramping down the
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19048
I'm not sure I understand why this is a problem. What is the undesired
behavior that happens because of this? That's not explained in either the PR
or the bug.
The way I understand the
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19048
**[Test build #81132 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81132/testReport)**
for PR 19048 at commit
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/19048
Jenkins retest this please.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19048
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81116/
Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19048
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19048
**[Test build #81116 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81116/testReport)**
for PR 19048 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19048
**[Test build #81116 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81116/testReport)**
for PR 19048 at commit
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/19048
cc @markhamstra, @sameeragarwal, @rxin, @vanzin
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19048
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81110/
Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19048
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19048
**[Test build #81110 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81110/testReport)**
for PR 19048 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19048
**[Test build #81110 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81110/testReport)**
for PR 19048 at commit