Github user baluchicken commented on the issue:
https://github.com/apache/spark/pull/21067
Thanks for the responses, I learned a lot from this:) I am going to close
this PR for now, and maybe collaborate on the Kubernetes ticket raised by this
PR. Thanks.
Github user liyinan926 commented on the issue:
https://github.com/apache/spark/pull/21067
+1 on what @foxish said. If using a Job is ultimately the right way to go,
it would be good to open a discussion with sig-apps on adding an option to the Job API &
controller to use deterministic pod name
Github user foxish commented on the issue:
https://github.com/apache/spark/pull/21067
> ReadWriteOnce storage can only be attached to one node.
This is well known. Using the RWO volume for fencing here would work - but
this is not representative of all users. This breaks down
Github user skonto commented on the issue:
https://github.com/apache/spark/pull/21067
@baluchicken yeah I thought of that but I was hoping for more automation.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark
Github user baluchicken commented on the issue:
https://github.com/apache/spark/pull/21067
@skonto if the node never becomes available again, the new driver will stay
in the Pending state until, as @foxish said, "the user explicitly force-kills the
old driver".
Github user skonto commented on the issue:
https://github.com/apache/spark/pull/21067
> Once the partitioned node become available again the unknown old driver
pod got terminated, the volume got unattached and get reattached to the new
driver pod which state now changed from pending t
Github user baluchicken commented on the issue:
https://github.com/apache/spark/pull/21067
I ran some more tests on this. I think we can say that this change can
add resiliency to Spark batch jobs where, just as with YARN, Spark will
retry the job from the beginning if an err
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21067
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93201/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21067
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21067
**[Test build #93201 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93201/testReport)**
for PR 21067 at commit
[`c04179b`](https://github.com/apache/spark/commit/c
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21067
**[Test build #93201 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93201/testReport)**
for PR 21067 at commit
[`c04179b`](https://github.com/apache/spark/commit/c0
Github user foxish commented on the issue:
https://github.com/apache/spark/pull/21067
> After a short/configurable delay the driver pod state changed to Unknown
and the Job controller initiated a new spark driver.
This is dangerous behavior. The old spark driver can still b
Github user promiseofcake commented on the issue:
https://github.com/apache/spark/pull/21067
@baluchicken, did that test involve using checkpointing in a shared
location?
Github user baluchicken commented on the issue:
https://github.com/apache/spark/pull/21067
@foxish I just checked on a Google Kubernetes Engine cluster with Kubernetes
version 1.10.4-gke.2. I created a two-node cluster and emulated a "network
partition" with iptables rules (node running the
Github user liyinan926 commented on the issue:
https://github.com/apache/spark/pull/21067
+1 on what @foxish said. I would also like to see a detailed discussion of
the semantic differences this brings to the table before committing to
this approach.
Github user foxish commented on the issue:
https://github.com/apache/spark/pull/21067
I don't think this current approach will suffice. Correctness is important
here, especially for folks using spark streaming. I understand that we're
proposing the use of backoff limits but there is *
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21067
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21067
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92685/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21067
**[Test build #92685 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92685/testReport)**
for PR 21067 at commit
[`0f280f4`](https://github.com/apache/spark/commit/0
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21067
**[Test build #92685 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92685/testReport)**
for PR 21067 at commit
[`0f280f4`](https://github.com/apache/spark/commit/0f
Github user baluchicken commented on the issue:
https://github.com/apache/spark/pull/21067
@skonto thanks, I am going to check it.
Github user skonto commented on the issue:
https://github.com/apache/spark/pull/21067
@baluchicken probably this is covered here:
https://github.com/apache/spark/pull/21260. I kind of missed that, as I thought
it was only for hostpaths but it also covers PVs.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21067
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92650/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21067
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21067
**[Test build #92650 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92650/testReport)**
for PR 21067 at commit
[`4e0b3b0`](https://github.com/apache/spark/commit/4
Github user baluchicken commented on the issue:
https://github.com/apache/spark/pull/21067
@mccheah rebased to master and updated the PR, now the
KubernetesDriverBuilder will create the driver job instead of the configuration
steps.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21067
**[Test build #92650 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92650/testReport)**
for PR 21067 at commit
[`4e0b3b0`](https://github.com/apache/spark/commit/4e
Github user baluchicken commented on the issue:
https://github.com/apache/spark/pull/21067
@skonto sorry, I have a couple of other things to do, but I am trying to update
this as time allows.
Yes we are planning to create a PR about the PVs related stuff as soon as
this one went
Github user skonto commented on the issue:
https://github.com/apache/spark/pull/21067
@baluchicken @foxish any update on this? The HA story is pretty critical for
production in many cases.
Github user baluchicken commented on the issue:
https://github.com/apache/spark/pull/21067
@felixcheung rebased to master and fixed failing unit tests
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21067
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21067
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91660/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21067
**[Test build #91660 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91660/testReport)**
for PR 21067 at commit
[`00a149a`](https://github.com/apache/spark/commit/0
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21067
**[Test build #91660 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91660/testReport)**
for PR 21067 at commit
[`00a149a`](https://github.com/apache/spark/commit/00
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/21067
any update?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21067
Can one of the admins verify this patch?
Github user liyinan926 commented on the issue:
https://github.com/apache/spark/pull/21067
@foxish on concerns of the lack of exactly-one semantics.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21067
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90898/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21067
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21067
**[Test build #90898 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90898/testReport)**
for PR 21067 at commit
[`2b1de38`](https://github.com/apache/spark/commit/2
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21067
**[Test build #90898 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90898/testReport)**
for PR 21067 at commit
[`2b1de38`](https://github.com/apache/spark/commit/2b
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21067
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90895/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21067
**[Test build #90895 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90895/testReport)**
for PR 21067 at commit
[`95f6886`](https://github.com/apache/spark/commit/9
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21067
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21067
**[Test build #90895 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90895/testReport)**
for PR 21067 at commit
[`95f6886`](https://github.com/apache/spark/commit/95
Github user baluchicken commented on the issue:
https://github.com/apache/spark/pull/21067
@felixcheung fixed the Scala style validations, sorry.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21067
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90868/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21067
**[Test build #90868 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90868/testReport)**
for PR 21067 at commit
[`f19bf1a`](https://github.com/apache/spark/commit/f
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21067
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21067
**[Test build #90868 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90868/testReport)**
for PR 21067 at commit
[`f19bf1a`](https://github.com/apache/spark/commit/f1
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/21067
Jenkins, ok to test
Github user baluchicken commented on the issue:
https://github.com/apache/spark/pull/21067
Rebased again to master.
Github user baluchicken commented on the issue:
https://github.com/apache/spark/pull/21067
@mccheah Rebased to master and added support for a configurable backoffLimit.
Github user stoader commented on the issue:
https://github.com/apache/spark/pull/21067
@mccheah
> But whether or not the driver should be relaunchable should be determined
by the application submitter, and not necessarily done all the time. Can we
make this behavior configur
Github user mccheah commented on the issue:
https://github.com/apache/spark/pull/21067
> We don't have a solid story for checkpointing streaming computation right
now, and even if we did, you'll certainly lose all progress from batch jobs.
Should probably clarify re: streaming
Github user mccheah commented on the issue:
https://github.com/apache/spark/pull/21067
Looks like there are a lot of conflicts from the refactor that was just
merged.
In general though I don't think this buys us too much. The problem is that
when the driver fails, you'll lose a
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21067
Can one of the admins verify this patch?