Here is the corresponding JIRA ticket: https://issues.apache.org/jira/browse/FLINK-15806
On Wed, Jan 29, 2020 at 3:16 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Theo,
>
> your assumption is correct that Flink won't clean up its files when using
> `yarn application -kill ID`. This should also hold true for other temporary
> files generated by Flink's Blob service, shuffle service and io manager.
> These files are usually stored under /tmp and should be cleaned up
> eventually, though.
>
> I think a better approach is to reconnect to the Flink Yarn session
> cluster and then issue the "stop" command. You can either do it via
> `bin/yarn-session.sh -id APP_ID` and then type "stop" or you do
> `echo "stop" | bin/yarn-session.sh -id APP_ID`.
>
> I think we should also update the logging statements of the
> yarn-session.sh which say that you should use `yarn application -kill` in
> order to stop the process.
>
> Cheers,
> Till
>
> On Tue, Jan 28, 2020 at 6:21 PM Theo Diefenthal <
> theo.diefent...@scoop-software.de> wrote:
>
>> Hi there,
>>
>> Today I realized that we currently have a lot of Flink distribution jar
>> files that are never cleaned up and would like to know what to do about
>> this, i.e. how to properly housekeep them.
>>
>> In the HDFS home directory of the user submitting the jobs, I find a
>> subdirectory called `.flink` with hundreds of subfolders like
>> `application_1573731655031_0420`, with the following structure:
>>
>> -rw-r--r-- 3 dev dev    861 2020-01-27 21:17 /user/dev/.flink/application_1580155950981_0010/4797ff6e-853b-460c-81b3-34078814c5c9-taskmanager-conf.yaml
>> -rw-r--r-- 3 dev dev    691 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/application_1580155950981_0010-flink-conf.yaml2755466919863419496.tmp
>> -rw-r--r-- 3 dev dev    861 2020-01-27 21:17 /user/dev/.flink/application_1580155950981_0010/fdb5ef57-c140-4f6d-9791-c226eb1438ce-taskmanager-conf.yaml
>> -rw-r--r-- 3 dev dev 92.2 M 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/flink-dist_2.11-1.9.1.jar
>> drwxr-xr-x - dev dev      0 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/lib
>> -rw-r--r-- 3 dev dev  2.6 K 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/log4j.properties
>> -rw-r--r-- 3 dev dev  2.3 K 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/logback.xml
>> drwxr-xr-x - dev dev      0 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/plugins
>>
>> With tons of those folders (one for each Flink session we
>> launched/killed in our CI/CD pipeline), they add up to some terabytes of
>> used space in our HDFS.
>> I suppose I am killing our Flink sessions the wrong way. We start and stop
>> sessions and jobs separately like so:
>>
>> Start:
>>
>> ${OS_ROOT}/flink/bin/yarn-session.sh -jm 4g -tm 32g --name "${FLINK_SESSION_NAME}" -d -Denv.java.opts="-XX:+HeapDumpOnOutOfMemoryError"
>>
>> ${OS_ROOT}/flink/bin/flink run -m ${FLINK_HOST} [..savepoint/checkpoint options...] -d -n "${JOB_JAR}" $*
>>
>> Stop:
>>
>> ${OS_ROOT}/flink/bin/flink stop -p ${SAVEPOINT_BASEDIR}/${FLINK_JOB_NAME} -m ${FLINK_HOST} ${ID}
>>
>> yarn application -kill "${ID}"
>>
>> `yarn application -kill` was the best I could find, as the Flink docs
>> only mention closing the Linux session process ("Stop the YARN
>> session by stopping the unix process (using CTRL+C) or by entering ‘stop’
>> into the client.").
>>
>> Now my question: Is there a more elegant way to kill a YARN session
>> (remotely from some host in the cluster, not necessarily the one that
>> started the detached session) which also does the housekeeping? Or should
>> I do the housekeeping manually myself (pretty easy to script)? Do I need
>> to expect any other side effects when killing the session with
>> `yarn application -kill`?
>>
>> Best regards
>> Theo
>>
>> --
>> SCOOP Software GmbH - Gut Maarhausen - Eiler Straße 3 P - D-51107 Köln
>> Theo Diefenthal
>>
>> T +49 221 801916-196 - F +49 221 801916-17 - M +49 160 90506575
>> theo.diefent...@scoop-software.de - www.scoop-software.de
>> Registered office: Köln, commercial register: Köln,
>> registration number: HRB 36625
>> Managing directors: Dr. Oleg Balovnev, Frank Heinen,
>> Martin Müller-Rohde, Dr. Wolfgang Reddig, Roland Scheel
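
For the directories already left behind by earlier `yarn application -kill` calls, the manual housekeeping Theo mentions could be scripted roughly as follows. This is only a sketch under assumptions and not something taken from the thread or from Flink itself: the function name `stale_staging_dirs` and the file `active.txt` are made up for illustration, the path `/user/dev/.flink` comes from the listing above, and the wiring shown in the trailing comments assumes a Hadoop version whose `hdfs dfs -ls` supports `-C` (paths only). The function does no deletion; it only prints the `.flink` subdirectories whose application ID is not in the list of active YARN applications.

```shell
#!/usr/bin/env bash
# Sketch only: report .flink staging directories whose YARN application is no
# longer active. The function name and active.txt are hypothetical.

# Usage: <directory paths on stdin> | stale_staging_dirs ACTIVE_IDS_FILE
# Prints every input path whose last component looks like a YARN application ID
# and does not appear (as a whole line) in ACTIVE_IDS_FILE.
stale_staging_dirs() {
  active_file="$1"
  while IFS= read -r dir; do
    app_id="${dir##*/}"
    case "$app_id" in
      application_*_*) ;;   # looks like application_<clusterTimestamp>_<sequence>
      *) continue ;;        # ignore anything else found under .flink
    esac
    # Keep the path only if the ID is not in the list of active applications.
    grep -qxF -- "$app_id" "$active_file" || printf '%s\n' "$dir"
  done
}

# Assumed wiring (dry run: it only prints the candidates):
#   yarn application -list -appStates RUNNING,ACCEPTED 2>/dev/null \
#     | awk '$1 ~ /^application_/ {print $1}' > active.txt
#   hdfs dfs -ls -C /user/dev/.flink | stale_staging_dirs active.txt
```

It is worth reviewing the printed list before passing it to something like `hdfs dfs -rm -r`. If the `yarn application -list` call fails and `active.txt` ends up empty, every directory is reported as stale, including those of sessions that are still running.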