[GitHub] incubator-griffin pull request #421: GRIFFIN-197 Treat non-existing YARN app...

2018-09-30 asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-griffin/pull/421


---



2018-09-23 chemikadze
Github user chemikadze commented on a diff in the pull request:

https://github.com/apache/incubator-griffin/pull/421#discussion_r219724832
  
--- Diff: service/src/main/java/org/apache/griffin/core/util/YarnNetUtil.java ---
@@ -56,6 +62,14 @@ public static boolean update(String url, JobInstanceBean instance) {
 instance.setState(LivySessionStates.toLivyState(state));
 }
 return true;
+} catch (HttpClientErrorException e) {
+LOGGER.warn("client error {} from yarn: {}",
+e.getMessage(), e.getResponseBodyAsString());
+if (e.getStatusCode() == HttpStatus.NOT_FOUND) {
+// in sync with Livy behavior, see com.cloudera.livy.utils.SparkYarnApp
+instance.setState(DEAD);
--- End diff --

Only 404 is handled here, and a 404 should not be the result of a network issue.

It looks like any kind of error reported by the YARN client (after its internal
retries) results in the job being marked DEAD on the Livy side:
https://github.com/cloudera/livy/blob/master/server/src/main/scala/com/cloudera/livy/utils/SparkYarnApp.scala#L307
I'll need to double-check whether not-found applications are ever retried, to
make sure the behavior is the same as on the Livy side. If they are not, then
marking the job DEAD is exactly what Livy would do.
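A minimal sketch of the decision the diff makes, written in plain Java so it runs without Spring on the classpath. The names `YarnErrorMapping`, `AppState` and `stateAfterYarnError`, modeling "no HTTP response at all" as a `null` status, and keeping the current state for every non-404 error are assumptions for illustration; the quoted diff only shows the 404 branch.

```java
import java.util.Optional;

public class YarnErrorMapping {
    // Hypothetical stand-in for the Livy session states used by Griffin.
    enum AppState { RUNNING, DEAD }

    // httpStatus == null models a failure with no HTTP response
    // (connection refused, timeout), i.e. a genuine network problem.
    // Returns the state to record, or empty to keep the current state.
    static Optional<AppState> stateAfterYarnError(Integer httpStatus) {
        if (httpStatus != null && httpStatus == 404) {
            // The ResourceManager answered and does not know the app:
            // mark it DEAD, in line with the Livy behavior cited above.
            return Optional.of(AppState.DEAD);
        }
        // Network failures and other client errors (assumption): leave the
        // state unchanged so the next poll can try again.
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(stateAfterYarnError(404));  // Optional[DEAD]
        System.out.println(stateAfterYarnError(403));  // Optional.empty
        System.out.println(stateAfterYarnError(null)); // Optional.empty
    }
}
```

The point of separating the `null` case is the argument made in the comment: a 404 is a definite answer from YARN, whereas a missing response says nothing about the application.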


---



2018-09-22 guoyuepeng
Github user guoyuepeng commented on a diff in the pull request:

https://github.com/apache/incubator-griffin/pull/421#discussion_r219672109
  

Agreed, we need to handle the state. But what if this is caused by a network
issue? Should we double-check before concluding that the instance is dead?
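One possible way to "double-confirm", sketched here only to illustrate the suggestion: require several consecutive 404 responses for the same application before marking it DEAD, and reset the count whenever a lookup succeeds. The class `NotFoundConfirmer`, the threshold value and the application id are hypothetical and not part of Griffin; the diff quoted in this thread sets DEAD on the first 404.

```java
import java.util.HashMap;
import java.util.Map;

public class NotFoundConfirmer {
    // Hypothetical: consecutive 404s required before declaring DEAD.
    private final int threshold;
    // Consecutive 404 count per YARN application id (in-memory only).
    private final Map<String, Integer> misses = new HashMap<>();

    public NotFoundConfirmer(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
    }

    // Record a 404 for appId; returns true once it should be marked DEAD.
    public boolean recordNotFound(String appId) {
        int count = misses.merge(appId, 1, Integer::sum);
        return count >= threshold;
    }

    // A successful lookup resets the counter for that application.
    public void recordFound(String appId) {
        misses.remove(appId);
    }

    public static void main(String[] args) {
        NotFoundConfirmer confirmer = new NotFoundConfirmer(3);
        String app = "application_1537000000000_0001"; // illustrative id
        System.out.println(confirmer.recordNotFound(app)); // false
        System.out.println(confirmer.recordNotFound(app)); // false
        confirmer.recordFound(app);                         // app reappeared
        System.out.println(confirmer.recordNotFound(app)); // false
        System.out.println(confirmer.recordNotFound(app)); // false
        System.out.println(confirmer.recordNotFound(app)); // true
    }
}
```

The trade-off is latency: with a threshold of N, a job whose application really is gone stays in its previous state for N polling intervals, and the counts are lost if the service restarts.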


---



2018-09-21 chemikadze
GitHub user chemikadze opened a pull request:

https://github.com/apache/incubator-griffin/pull/421

GRIFFIN-197 Treat non-existing YARN app as FAILED

This prevents jobs from getting stuck in the UNKNOWN state on the Service side.
It also improves logging for YARN client errors.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chemikadze/incubator-griffin GRIFFIN-197

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-griffin/pull/421.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #421


commit 3c7a240992327a827ebcd51f684b59f466e60f99
Author: Nikolay Sokolov 
Date:   2018-09-21T20:44:15Z

GRIFFIN-197 Treat non-existing YARN app as FAILED

This avoids jobs becoming stuck in UNKNOWN state on Service side.
Also, improves logging for YARN client errors.




---