[jira] [Commented] (FLINK-10481) Wordcount end-to-end test in docker env unstable
[ https://issues.apache.org/jira/browse/FLINK-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773755#comment-16773755 ]

Congxian Qiu(klion26) commented on FLINK-10481:
---
another instance: https://api.travis-ci.org/v3/job/496339787/log.txt

> Wordcount end-to-end test in docker env unstable
>
> Key: FLINK-10481
> URL: https://issues.apache.org/jira/browse/FLINK-10481
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.7.0
> Reporter: Till Rohrmann
> Assignee: Dawid Wysakowicz
> Priority: Critical
> Labels: pull-request-available, test-stability
> Fix For: 1.6.3, 1.7.0
>
> The {{Wordcount end-to-end test in docker env}} fails sometimes on Travis with the following problem:
> {code}
> Status: Downloaded newer image for java:8-jre-alpine
> ---> fdc893b19a14
> Step 2/16 : RUN apk add --no-cache bash snappy
> ---> [Warning] IPv4 forwarding is disabled. Networking will not work.
> ---> Running in 4329ebcd8a77
> fetch http://dl-cdn.alpinelinux.org/alpine/v3.4/main/x86_64/APKINDEX.tar.gz
> WARNING: Ignoring http://dl-cdn.alpinelinux.org/alpine/v3.4/main/x86_64/APKINDEX.tar.gz: temporary error (try again later)
> fetch http://dl-cdn.alpinelinux.org/alpine/v3.4/community/x86_64/APKINDEX.tar.gz
> WARNING: Ignoring http://dl-cdn.alpinelinux.org/alpine/v3.4/community/x86_64/APKINDEX.tar.gz: temporary error (try again later)
> ERROR: unsatisfiable constraints:
>   bash (missing):
>     required by: world[bash]
>   snappy (missing):
>     required by: world[snappy]
> The command '/bin/sh -c apk add --no-cache bash snappy' returned a non-zero code: 2
> {code}
> https://api.travis-ci.org/v3/job/434909395/log.txt
> It seems as if it is related to https://github.com/gliderlabs/docker-alpine/issues/264 and https://github.com/gliderlabs/docker-alpine/issues/279.
> We might want to switch to a different base image to avoid these problems in the future.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10481) Wordcount end-to-end test in docker env unstable
[ https://issues.apache.org/jira/browse/FLINK-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769076#comment-16769076 ]

Congxian Qiu commented on FLINK-10481:
--
Is the problem relevant to this issue?
- Travis log link: [https://api.travis-ci.org/v3/job/493604030/log.txt]
- Error log:
{code:java}
Step 2/16 : RUN apk add --no-cache bash snappy libc6-compat
---> [Warning] IPv4 forwarding is disabled. Networking will not work.
---> Running in 8c5ab0903f84
fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/main/x86_64/APKINDEX.tar.gz
WARNING: Ignoring http://dl-cdn.alpinelinux.org/alpine/v3.9/main/x86_64/APKINDEX.tar.gz: temporary error (try again later)
fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/community/x86_64/APKINDEX.tar.gz
  bash (missing):
    required by: world[bash]
  libc6-compat (missing):
    required by: world[libc6-compat]
  snappy (missing):
    required by: world[snappy]
WARNING: Ignoring http://dl-cdn.alpinelinux.org/alpine/v3.9/community/x86_64/APKINDEX.tar.gz: temporary error (try again later)
ERROR: unsatisfiable constraints:
The command '/bin/sh -c apk add --no-cache bash snappy libc6-compat' returned a non-zero code: 3
Command: build_image failed. Retrying...
Command: build_image failed 3 times.
Failed to build docker image. Aborting...
[FAIL] Test script contains errors.
Checking for errors...
No errors in log files.
Checking for exceptions...
No exceptions in log files.
Checking for non-empty .out files...
grep: /home/travis/build/apache/flink/flink-dist/target/flink-1.8-SNAPSHOT-bin/flink-1.8-SNAPSHOT/log/*.out: No such file or directory
No non-empty .out files.
[FAIL] 'Wordcount end-to-end test in docker env' failed after 1 minutes and 35 seconds! Test exited with exit code 1
No taskexecutor daemon to stop on host travis-job-9baf0d81-84bb-4970-897d-6beb240d4b16.
No standalonesession daemon to stop on host travis-job-9baf0d81-84bb-4970-897d-6beb240d4b16.
travis_time:end:12dbc4b0:start=1550211894080441746,finish=1550214946487459619,duration=3052407017873
The command "./tools/travis_controller.sh" exited with 1.
{code}

> Wordcount end-to-end test in docker env unstable
>
> Key: FLINK-10481
> URL: https://issues.apache.org/jira/browse/FLINK-10481
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.7.0
> Reporter: Till Rohrmann
> Assignee: Dawid Wysakowicz
> Priority: Critical
> Labels: pull-request-available, test-stability
> Fix For: 1.6.3, 1.7.0
[jira] [Commented] (FLINK-10481) Wordcount end-to-end test in docker env unstable
[ https://issues.apache.org/jira/browse/FLINK-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691952#comment-16691952 ]

ASF GitHub Bot commented on FLINK-10481:

dawidwys closed pull request #7074: [FLINK-10481][e2e] Added retry logic for building docker image
URL: https://github.com/apache/flink/pull/7074

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/flink-end-to-end-tests/test-scripts/common.sh b/flink-end-to-end-tests/test-scripts/common.sh
index 4e6254864c1..275d9c49f4b 100644
--- a/flink-end-to-end-tests/test-scripts/common.sh
+++ b/flink-end-to-end-tests/test-scripts/common.sh
@@ -663,3 +663,22 @@ function find_latest_completed_checkpoint {
     local checkpoint_meta_file=$(ls -d ${checkpoint_root_directory}/chk-[1-9]*/_metadata | sort -Vr | head -n1)
     echo "$(dirname "${checkpoint_meta_file}")"
 }
+
+function retry_times() {
+    local retriesNumber=$1
+    local backoff=$2
+    local command=${@:3}
+
+    for (( i = 0; i < ${retriesNumber}; i++ ))
+    do
+        if ${command}; then
+            return 0
+        fi
+
+        echo "Command: ${command} failed. Retrying..."
+        sleep ${backoff}
+    done
+
+    echo "Command: ${command} failed ${retriesNumber} times."
+    return 1
+}
diff --git a/flink-end-to-end-tests/test-scripts/test_docker_embedded_job.sh b/flink-end-to-end-tests/test-scripts/test_docker_embedded_job.sh
index 370ef052a4d..2d8aa4f47d4 100755
--- a/flink-end-to-end-tests/test-scripts/test_docker_embedded_job.sh
+++ b/flink-end-to-end-tests/test-scripts/test_docker_embedded_job.sh
@@ -21,6 +21,8 @@ source "$(dirname "$0")"/common.sh
 
 DOCKER_MODULE_DIR=${END_TO_END_DIR}/../flink-container/docker
 DOCKER_SCRIPTS=${END_TO_END_DIR}/test-scripts/container-scripts
+DOCKER_IMAGE_BUILD_RETRIES=3
+BUILD_BACKOFF_TIME=5
 
 export FLINK_JOB=org.apache.flink.examples.java.wordcount.WordCount
 export FLINK_DOCKER_IMAGE_NAME=test_docker_embedded_job
@@ -30,12 +32,19 @@ export INPUT_PATH=/data/test/input
 export OUTPUT_PATH=/data/test/output
 export FLINK_JOB_ARGUMENTS="--input ${INPUT_PATH}/words --output ${OUTPUT_PATH}/docker_wc_out"
 
-# user inside the container must be able to createto workaround in-container permissions
+build_image() {
+    ./build.sh --from-local-dist --job-jar ${FLINK_DIR}/examples/batch/WordCount.jar --image-name ${FLINK_DOCKER_IMAGE_NAME}
+}
+
+# user inside the container must be able to create files, this is a workaround in-container permissions
 mkdir -p $OUTPUT_VOLUME
 chmod 777 $OUTPUT_VOLUME
 
 cd "$DOCKER_MODULE_DIR"
-./build.sh --from-local-dist --job-jar ${FLINK_DIR}/examples/batch/WordCount.jar --image-name ${FLINK_DOCKER_IMAGE_NAME}
+if ! retry_times $DOCKER_IMAGE_BUILD_RETRIES ${BUILD_BACKOFF_TIME} build_image; then
+    echo "Failed to build docker image. Aborting..."
+    exit 1
+fi
 cd "$END_TO_END_DIR"
 
 docker-compose -f ${DOCKER_MODULE_DIR}/docker-compose.yml -f ${DOCKER_SCRIPTS}/docker-compose.test.yml up --abort-on-container-exit --exit-code-from job-cluster &> /dev/null

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Wordcount end-to-end test in docker env unstable
>
> Key: FLINK-10481
> URL: https://issues.apache.org/jira/browse/FLINK-10481
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.7.0
> Reporter: Till Rohrmann
> Assignee: Dawid Wysakowicz
> Priority: Critical
> Labels: pull-request-available, test-stability
> Fix For: 1.5.6, 1.6.3, 1.8.0, 1.7.1
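The `retry_times` helper in the diff above keeps the command in a single string (`local command=${@:3}`). That is sufficient for a bare function name such as `build_image`, but an argument containing spaces would be split again when `${command}` is expanded. A minimal sketch of an array-based variant follows; it is not part of PR #7074, and the name `retry_times_array` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical variant of retry_times; NOT part of PR #7074.
# Usage: retry_times_array <retries> <backoff-seconds> <command> [args...]
retry_times_array() {
    local retries=$1
    local backoff=$2
    shift 2
    # Keep the command as an array so each argument stays one word.
    local -a cmd=("$@")
    local i

    for (( i = 0; i < retries; i++ )); do
        if "${cmd[@]}"; then
            return 0
        fi
        echo "Command: ${cmd[*]} failed. Retrying..."
        sleep "${backoff}"
    done

    echo "Command: ${cmd[*]} failed ${retries} times."
    return 1
}
```

Under these assumptions, the call site from the diff would read `retry_times_array "${DOCKER_IMAGE_BUILD_RETRIES}" "${BUILD_BACKOFF_TIME}" build_image`.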
[jira] [Commented] (FLINK-10481) Wordcount end-to-end test in docker env unstable
[ https://issues.apache.org/jira/browse/FLINK-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687930#comment-16687930 ]

ASF GitHub Bot commented on FLINK-10481:

kl0u commented on a change in pull request #7074: [FLINK-10481][e2e] Added retry logic for building docker image
URL: https://github.com/apache/flink/pull/7074#discussion_r233818875

## File path: flink-end-to-end-tests/test-scripts/test_docker_embedded_job.sh

@@ -30,14 +32,23 @@ export INPUT_PATH=/data/test/input
 export OUTPUT_PATH=/data/test/output
 export FLINK_JOB_ARGUMENTS="--input ${INPUT_PATH}/words --output ${OUTPUT_PATH}/docker_wc_out"
 
-# user inside the container must be able to createto workaround in-container permissions
+build_image() {
+cd "$DOCKER_MODULE_DIR"

Review comment:
I would put the `cd "$DOCKER_MODULE_DIR"` and `cd "$END_TO_END_DIR"` around the `if` statement for clarity, and also leave the `mkdir -p $OUTPUT_VOLUME` and `chmod 777 $OUTPUT_VOLUME` on top, so that the test fails fast if we do not have the necessary permissions. In other words, it could look something like:

```
build_image() {
    ./build.sh --from-local-dist --job-jar ${FLINK_DIR}/examples/batch/WordCount.jar --image-name ${FLINK_DOCKER_IMAGE_NAME}
    local error_code=$?
    return ${error_code}
}

# user inside the container must be able to create files, this is a workaround in-container permissions
mkdir -p $OUTPUT_VOLUME
chmod 777 $OUTPUT_VOLUME

cd "$DOCKER_MODULE_DIR"
if ! retry_times $DOCKER_IMAGE_BUILD_RETRIES ${BUILD_BACKOFF_TIME} build_image; then
    echo "Failed to build docker image. Aborting..."
    exit 1
fi
cd "$END_TO_END_DIR"
```

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Wordcount end-to-end test in docker env unstable
>
> Key: FLINK-10481
> URL: https://issues.apache.org/jira/browse/FLINK-10481
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.7.0
> Reporter: Till Rohrmann
> Assignee: Dawid Wysakowicz
> Priority: Critical
> Labels: pull-request-available, test-stability
> Fix For: 1.5.6, 1.6.3, 1.7.0
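The review comment above brackets the build with a pair of `cd` calls. One alternative, shown here only as a sketch and not something proposed in the review, is to run the command in a subshell so the caller's working directory is never changed, even when the command fails. The helper name `run_in_dir` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper; NOT part of the Flink test scripts.
# Usage: run_in_dir <directory> <command> [args...]
run_in_dir() {
    local dir=$1
    shift
    # The parentheses start a subshell, so the cd does not affect the caller.
    # If the cd itself fails, the command is not run and the status is non-zero.
    ( cd "${dir}" && "$@" )
}
```

With such a helper, the build step could be written as `run_in_dir "$DOCKER_MODULE_DIR" retry_times $DOCKER_IMAGE_BUILD_RETRIES ${BUILD_BACKOFF_TIME} build_image`, assuming `retry_times` and `build_image` are defined as in the PR.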
[jira] [Commented] (FLINK-10481) Wordcount end-to-end test in docker env unstable
[ https://issues.apache.org/jira/browse/FLINK-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681478#comment-16681478 ]

ASF GitHub Bot commented on FLINK-10481:

dawidwys opened a new pull request #7074: [FLINK-10481][e2e] Added retry logic for building docker image
URL: https://github.com/apache/flink/pull/7074

Added retry logic around building docker job image.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Wordcount end-to-end test in docker env unstable
>
> Key: FLINK-10481
> URL: https://issues.apache.org/jira/browse/FLINK-10481
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.7.0
> Reporter: Till Rohrmann
> Priority: Critical
> Labels: pull-request-available, test-stability
> Fix For: 1.5.6, 1.6.3, 1.7.0
[jira] [Commented] (FLINK-10481) Wordcount end-to-end test in docker env unstable
[ https://issues.apache.org/jira/browse/FLINK-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641036#comment-16641036 ]

Till Rohrmann commented on FLINK-10481:
---
Maybe we can harden the test case to check for these kinds of failures and then either retry or not let the test fail.

> Wordcount end-to-end test in docker env unstable
>
> Key: FLINK-10481
> URL: https://issues.apache.org/jira/browse/FLINK-10481
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.7.0
> Reporter: Till Rohrmann
> Priority: Critical
> Labels: test-stability
> Fix For: 1.7.0
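Checking for this kind of failure could be as simple as searching the captured build output for the message that `apk` prints in the logs quoted in this thread. The following is a minimal sketch under that assumption; the function name `is_transient_apk_failure` and the idea of writing the build output to a log file are hypothetical and not taken from the Flink test scripts.

```shell
#!/usr/bin/env bash
# Hypothetical check; NOT part of the Flink test scripts.
# Returns 0 if the given build log contains the apk mirror warning seen in
# the Travis logs above, and non-zero otherwise.
is_transient_apk_failure() {
    local log_file=$1
    # -F matches the text literally, -q suppresses output.
    grep -qF "temporary error (try again later)" "${log_file}"
}
```

A test script could then retry the image build, or skip the test, only when this check succeeds, and fail immediately on any other build error.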
[jira] [Commented] (FLINK-10481) Wordcount end-to-end test in docker env unstable
[ https://issues.apache.org/jira/browse/FLINK-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636387#comment-16636387 ]

JIN SUN commented on FLINK-10481:
-
This seems like an intermittent issue, possibly a Docker repository issue; other versions have the same problem.

> Wordcount end-to-end test in docker env unstable
>
> Key: FLINK-10481
> URL: https://issues.apache.org/jira/browse/FLINK-10481
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.7.0
> Reporter: Till Rohrmann
> Priority: Critical
> Labels: test-stability
> Fix For: 1.7.0