[GitHub] incubator-griffin pull request #421: GRIFFIN-197 Treat non-existing YARN app...
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-griffin/pull/421

---
[GitHub] incubator-griffin pull request #421: GRIFFIN-197 Treat non-existing YARN app...
Github user chemikadze commented on a diff in the pull request:

    https://github.com/apache/incubator-griffin/pull/421#discussion_r219724832

    --- Diff: service/src/main/java/org/apache/griffin/core/util/YarnNetUtil.java ---
    @@ -56,6 +62,14 @@ public static boolean update(String url, JobInstanceBean instance) {
                     instance.setState(LivySessionStates.toLivyState(state));
                 }
                 return true;
    +        } catch (HttpClientErrorException e) {
    +            LOGGER.warn("client error {} from yarn: {}",
    +                    e.getMessage(), e.getResponseBodyAsString());
    +            if (e.getStatusCode() == HttpStatus.NOT_FOUND) {
    +                // in sync with Livy behavior, see com.cloudera.livy.utils.SparkYarnApp
    +                instance.setState(DEAD);
    --- End diff --

    Only 404 is handled here, and a 404 should not be the result of a network issue. It looks like any kind of error reported by the YARN client (after internal retries) results in the job being marked DEAD on the Livy side:
    https://github.com/cloudera/livy/blob/master/server/src/main/scala/com/cloudera/livy/utils/SparkYarnApp.scala#L307

    I'll need to double-check whether not-found applications are ever retried, to make sure the behavior is the same as on the Livy side. If they are not, then marking the job DEAD is what Livy would do.

---
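The distinction drawn in this comment, between a 404 that YARN actually returned and a failure to reach YARN at all, can be sketched without any Spring or Griffin types. This is only an illustration under stated assumptions, not the code in the pull request: `YarnPollSketch`, `State`, `StatusException`, and `poll` are hypothetical names, and `StatusException` merely stands in for an HTTP client error that carries a status code.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical stand-ins for illustration; none of these names come from Griffin or Livy.
final class YarnPollSketch {

    enum State { UNKNOWN, RUNNING, DEAD }

    // Stand-in for an HTTP client error that carries the response status code.
    static final class StatusException extends Exception {
        final int status;

        StatusException(int status) {
            super("HTTP " + status);
            this.status = status;
        }
    }

    // Returns the state to record after one poll of the (assumed) YARN status endpoint.
    static State poll(Callable<State> fetch, State current) {
        try {
            return fetch.call();
        } catch (StatusException e) {
            // The server answered. A 404 means it does not know the application,
            // so the instance is treated as DEAD; other client errors change nothing.
            return e.status == 404 ? State.DEAD : current;
        } catch (IOException e) {
            // No response at all (for example a network failure): keep the current state.
            return current;
        } catch (Exception e) {
            // Anything else is also treated as inconclusive in this sketch.
            return current;
        }
    }
}
```

With this shape, a connection timeout never moves an instance to DEAD; only an explicit not-found answer does.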
[GitHub] incubator-griffin pull request #421: GRIFFIN-197 Treat non-existing YARN app...
Github user guoyuepeng commented on a diff in the pull request:

    https://github.com/apache/incubator-griffin/pull/421#discussion_r219672109

    --- Diff: service/src/main/java/org/apache/griffin/core/util/YarnNetUtil.java ---
    @@ -56,6 +62,14 @@ public static boolean update(String url, JobInstanceBean instance) {
                     instance.setState(LivySessionStates.toLivyState(state));
                 }
                 return true;
    +        } catch (HttpClientErrorException e) {
    +            LOGGER.warn("client error {} from yarn: {}",
    +                    e.getMessage(), e.getResponseBodyAsString());
    +            if (e.getStatusCode() == HttpStatus.NOT_FOUND) {
    +                // in sync with Livy behavior, see com.cloudera.livy.utils.SparkYarnApp
    +                instance.setState(DEAD);
    --- End diff --

    I agree we need to handle this state, but what if it is caused by a network issue? Should we confirm a second time before concluding that the instance is dead?

---
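One possible reading of "confirm a second time" is to require several consecutive not-found responses before declaring an instance dead. The sketch below illustrates that idea only; it is not part of the pull request, and `NotFoundConfirmSketch`, `record`, and the threshold parameter are hypothetical names chosen for the example.

```java
// Hypothetical helper for illustration; not part of Griffin or Livy.
final class NotFoundConfirmSketch {

    private final int threshold;
    private int consecutiveNotFound = 0;

    NotFoundConfirmSketch(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
    }

    // Records the HTTP status of one poll and returns true once the number of
    // consecutive 404 responses has reached the threshold.
    boolean record(int httpStatus) {
        if (httpStatus == 404) {
            consecutiveNotFound++;
        } else {
            // Any other status breaks the streak.
            consecutiveNotFound = 0;
        }
        return consecutiveNotFound >= threshold;
    }
}
```

A threshold of 1 reproduces the behavior in the diff, where a single 404 is enough; a larger value trades slower detection for protection against a transient not-found answer.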
[GitHub] incubator-griffin pull request #421: GRIFFIN-197 Treat non-existing YARN app...
GitHub user chemikadze opened a pull request:

    https://github.com/apache/incubator-griffin/pull/421

    GRIFFIN-197 Treat non-existing YARN app as FAILED

    This avoids jobs becoming stuck in UNKNOWN state on Service side.
    Also, improves logging for YARN client errors.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chemikadze/incubator-griffin GRIFFIN-197

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-griffin/pull/421.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #421

commit 3c7a240992327a827ebcd51f684b59f466e60f99
Author: Nikolay Sokolov
Date:   2018-09-21T20:44:15Z

    GRIFFIN-197 Treat non-existing YARN app as FAILED

    This avoids jobs becoming stuck in UNKNOWN state on Service side.
    Also, improves logging for YARN client errors.

---