[JIRA] (JENKINS-58290) WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Title: Jesse Glick resolved as Fixed
Laurent Hory, please do not reopen. If you believe you have discovered a regression, file a separate issue linked to this one with complete, minimal steps to reproduce from scratch.
Jenkins / JENKINS-58290 WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Change By: Jesse Glick
Status: Reopened → Resolved
Resolution: Fixed
This message was sent by Atlassian Jira (v7.13.12#713012-sha1:6e07c38)
--
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
[JIRA] (JENKINS-58290) WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Title: Laurent Hory reopened an issue
Hi, the fix for this issue also causes a freeze during pytest execution on a Docker node. The sh commands block during execution. With the diagnostics parameter the execution works correctly:

    -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true

The context is a pipeline that launches pytest with tox:

    docker_img_version='0.17'
    launch_container_with_user='jenkins'
    parser_name='sweep_tools'
    github_cred_id='xx'
    github_url='g...@github.com:xxx.git'
    pipeline {
        agent {
            docker {
                image 'x/analytics/ci_process:'+docker_img_version
                label 'slaves_host'
                args '-u jenkins -e POSTGRESQL_USER= -e POSTGRESQL_PASSWORD= -e POSTGRESQL_DATABASE=x'
                customWorkspace "/home/jenkins/workspace/${env.JOB_NAME}"
            }
        }
        stages {
            stage('Delete tox cache') {
                steps {
                    dir("${env.WORKSPACE}/.tox") {
                        deleteDir()
                    }
                }
            }
            stage('Test execution') {
                steps {
                    sh 'tox'
                    junit 'tests/results/results.xml'
                    cobertura autoUpdateHealth: false, autoUpdateStability: false,
                        coberturaReportFile: 'tests/results/coverage.xml',
                        conditionalCoverageTargets: '70, 0, 0',
                        failUnhealthy: false, failUnstable: false,
                        lineCoverageTargets: '80, 0, 0', maxNumberOfBuilds: 0,
                        methodCoverageTargets: '80, 0, 0', onlyStable: false,
                        sourceEncoding: 'ASCII', zoomCoverageChart: false
                }
            }
        }
    }

The code in the tox.ini file is:

    [tox]
    envlist = py36

    [testenv]
    install_command = pip install {opts} {packages}
    deps =
        pytest
        pytest-cov
        pytest-flask
        chardet
    commands = pytest --cov=sweep_tools --cov-report term-missing --cov-report xml:tests/results/coverage.xml --junitxml tests/results/results.xml

The command line actually executed in the step, which freezes just after pytest execution, is:

    sh -c "({ while [ -d '/home/jenkins/workspace/auto_ci/sweep_tools/dev@tmp/durable-d4d01f8d' -a \! -f '/home/jenkins/workspace/auto_ci/sweep_tools/dev@tmp/durable-d4d01f8d/jenkins-result.txt' ]; do touch '/home/jenkins/workspace/auto_ci/sweep_tools/dev@tmp/durable-d4d01f8d/jenkins-log.txt'; sleep 3; done } & jsc=durable-07cc39a960911d2b0363cd9d28761c7c; JENKINS_SERVER_COOKIE=$jsc 'sh' -xe '/home/jenkins/workspace/auto_ci/sweep_tools/dev@tmp/durable-d4d01f8d/script.sh' > '/home/jenkins/workspace/auto_ci/sweep_tools/dev@tmp/durable-d4d01f8d/jenkins-log.txt' 2>&1; echo $? > '/home/jenkins/workspace/auto_ci/sweep_tools/dev@tmp/durable-d4d01f8d/jenkins-result.txt.tmp'; mv '/home/jenkins/workspace/auto_ci/sweep_tools/dev@tmp/durable-d4d01f8d/jenkins-result.txt.tmp' '/home/jenkins/workspace/auto_ci/sweep_tools/dev@tmp/durable-d4d01f8d/jenkins-result.txt'; wait) >&- 2>&- &"

I can fix the freeze with the following command line:
[JIRA] (JENKINS-58290) WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Title: Daniel Jeznach commented on JENKINS-58290
Re: WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Jesse Glick Yes, pkill was guilty in this case. Thank you for the quick and accurate response.
[JIRA] (JENKINS-58290) WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Title: Jesse Glick commented on JENKINS-58290
Re: WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Daniel Jeznach you can file a separate issue (using the "is caused by" option to Link) with complete, minimal steps to reproduce from scratch. Offhand I suspect the pkill command is to blame, perhaps by killing too much. It is possible you did not need this to begin with: exiting a node block already sends a termination signal to all processes which inherited the (IIRC) JENKINS_MAGIC_COOKIE environment variable passed to any nested sh steps.
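The cleanup idea in the comment above, signalling every process that inherited a marker environment variable, can be illustrated with a rough shell sketch. This is not Jenkins' actual process-killing code. The variable name DEMO_COOKIE and its value are made up for the illustration, and reading /proc/<pid>/environ is a Linux-specific assumption.

```shell
#!/bin/sh
# Sketch: terminate every process that inherited a marker environment variable.
# Assumptions: Linux /proc is mounted; DEMO_COOKIE is a hypothetical name.
cookie="demo-$$"

# Start a background process that carries the marker in its environment.
DEMO_COOKIE="$cookie" sleep 300 &
child=$!
sleep 1   # give the child time to exec, so /proc shows its final environment

# Walk /proc and signal each process whose environment contains the marker.
killed=0
for envfile in /proc/[0-9]*/environ; do
  pid=${envfile#/proc/}
  pid=${pid%/environ}
  # environ entries are NUL-separated; translate to newlines before matching.
  # Unreadable or vanished entries are silently skipped.
  if { tr '\0' '\n' < "$envfile"; } 2>/dev/null | grep -qxF "DEMO_COOKIE=$cookie"; then
    kill -TERM "$pid" 2>/dev/null && killed=$((killed + 1))
  fi
done

# Reap the child and record how it ended.
status=0
wait "$child" 2>/dev/null || status=$?
echo "signalled $killed process(es); child wait status: $status"
```

Selecting by an inherited marker limits the signal to descendants of one launch, whereas a filter such as `pkill -P1 -U $(id -u)` selects by parent PID and user only.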
[JIRA] (JENKINS-58290) WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Title: Daniel Jeznach edited a comment on JENKINS-58290
Re: WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Hello, the fix for this issue causes one of our pipelines to get stuck. Passing the {noformat}-Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true{noformat} parameter reverts to the old behavior, so I blame running the sh build step in a subshell as the cause of this.
Pipeline code is:
{code:java}
pipeline {
    agent { label "xx" + (env.JOB_NAME.find(/-(u14|u16|u18|w7|w10)(?=-)/) ?:'') }
    options {
        timeout(time: 60, unit: 'MINUTES')
    }
    environment {
        TMP = "${env.WORKSPACE}/tmp"
        PROJECT_NAME = "${env.JOB_NAME}".find(/^[^-]+/)
        PROJECT_BRANCH = "${env.JOB_NAME}".find(/(?<=^[^-]+-)[^-]+/)
        PROJECT_RELEASE = "${env.PROJECT_BRANCH}".replace('MAIN', 'rc')
        P4_BRANCH = "REL/${env.PROJECT_BRANCH}".replace('REL/MAIN', 'MAIN')
    }
    parameters {
        string(name: 'change', defaultValue: '', description: 'Changelist to build.', trim: true)
        string(name: 'status', defaultValue: '', description: 'Changelist status (shelved/submitted).', trim: true)
        string(name: 'review', defaultValue: '', description: 'Helix Swarm review number.', trim: true)
        string(name: 'pass', defaultValue: '', description: 'Success callback URL.', trim: true)
        string(name: 'fail', defaultValue: '', description: 'Failure callback URL.', trim: true)
    }
    triggers {
        pollSCM('@midnight')
    }
    stages {
        stage('sync') {
            steps {
                p4sync(
                    credential: 'x',
                    format: '.-${NODE_NAME}-${JOB_NAME}-${EXECUTOR_NUMBER}_ws',
                    populate: forceClean(),
                    source: depotSource("//path/${env.P4_BRANCH}/...")
                )
                buildName "${env.P4_CHANGELIST} ${params.status?:''} ${params.change?:''}".trim()
                sh '''sed 's/\\\$Change: [0-9]\\+ \\\$/\$Change: '$P4_CHANGELIST' \$/' -i x/version.py'''
                script {
                    def dot = env.PROJECT_RELEASE.find(/rc$/) ? '' : '.'
                    env.PACKAGE_NAME = "${env.PROJECT_NAME}-${env.PROJECT_RELEASE}${dot}${env.P4_CHANGELIST}.tar.gz"
                }
                echo "PROJECT_NAME = $PROJECT_NAME"
                echo "PROJECT_BRANCH = $PROJECT_BRANCH"
                echo "PROJECT_RELEASE = $PROJECT_RELEASE"
                echo "P4_BRANCH = $P4_BRANCH"
                echo "PACKAGE_NAME = $PACKAGE_NAME"
            }
        }
        stage('build') {
            steps {
                sh '''#!/bin/bash -xe
# Clean up possible orphans from other test sessions
pkill -P1 -U $(id -u) || true
mkdir $TMP
virtualenv venv
. venv/bin/activate
python setup.py sdist
pip install dist/${PROJECT_NAME}-${PROJECT_RELEASE}*.tar.gz
xx master --print-requirements | xargs pip install
pip install -r tests/test_requirements.txt
'''
            }
        }
        stage('test') {
            steps {
                sh '''. venv/bin/activate
pytest -n 4 \
    --timeout=300 \
    --junit-xml ${WORKSPACE}/testresults.xml \
    --verbose \
    --cov=${PROJECT_NAME} \
    --cov-branch \
    --cov-report xml:${WORKSPACE}/coverage.xml \
[JIRA] (JENKINS-58290) WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Title: Jesse Glick updated JENKINS-58290
Jenkins / JENKINS-58290 WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Change By: Jesse Glick
Status: In Review → Resolved
Resolution: Fixed
Released As: durable-task 1.30
[JIRA] (JENKINS-58290) WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Title: Jesse Glick updated JENKINS-58290
Jenkins / JENKINS-58290 WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Change By: Jesse Glick
Status: In Progress → In Review
[JIRA] (JENKINS-58290) WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Title: Jesse Glick updated an issue
Jenkins / JENKINS-58290 WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Change By: Jesse Glick
The {{durable-task}} plugin runs a wrapper process which redirects the user process' stdout/err to a file and sends its exit code to another file. Thus there is no need for the agent JVM to hold onto a process handle for the wrapper; it should be fork-and-forget. In fact the {{Proc}} is discarded.

Unfortunately, the current implementation in {{BourneShellScript}} does not actually allow the {{Proc}} to exit until the user process also exits. On a regular agent this does not matter much. But when you run {{sh}} steps inside {{container}} on a Kubernetes agent, {{ContainerExecDecorator}} and {{ContainerExecProc}} actually keep a WebSocket open for the duration of the launched process. This consumes resources on the Kubernetes API server; it is possible to run out of connections. It also consumes three master-side Java threads per {{sh}}, like
{code:none}
"OkHttp http://…/..." #361 prio=5 os_prio=0 tid=… nid=… runnable […]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at okio.Okio$2.read(Okio.java:140)
    at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
    at okio.RealBufferedSource.request(RealBufferedSource.java:68)
    at okio.RealBufferedSource.require(RealBufferedSource.java:61)
    at okio.RealBufferedSource.readByte(RealBufferedSource.java:74)
    at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
    at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
    at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
    at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

"OkHttp WebSocket http://…/..." #359 prio=5 os_prio=0 tid=… nid=… waiting on condition […]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <…> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at
[JIRA] (JENKINS-58290) WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Title: Jesse Glick started work on JENKINS-58290
Change By: Jesse Glick
Status: Open → In Progress
[JIRA] (JENKINS-58290) WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Title: Jesse Glick created an issue
Jenkins / JENKINS-58290 WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator
Issue Type: Bug
Assignee: Jesse Glick
Components: durable-task-plugin, kubernetes-plugin
Created: 2019-07-01 22:52
Labels: threads leak
Priority: Critical
Reporter: Jesse Glick

The durable-task plugin runs a wrapper process which redirects the user process' stdout/err to a file and sends its exit code to another file. Thus there is no need for the agent JVM to hold onto a process handle for the wrapper; it should be fork-and-forget. In fact the Proc is discarded.

Unfortunately, the current implementation in BourneShellScript does not actually allow the Proc to exit until the user process also exits. On a regular agent this does not matter much. But when you run sh steps inside container on a Kubernetes agent, ContainerExecDecorator and ContainerExecProc actually keep a WebSocket open for the duration of the launched process. This consumes three master-side Java threads per sh, and also consumes resources on the Kubernetes API server (it is possible to run out of connections).
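The fork-and-forget wrapper pattern described in this issue can be sketched in plain shell. This is only an illustration under stated assumptions, not the plugin's actual launcher: the scratch directory and the user script are hypothetical stand-ins, and the file names jenkins-log.txt and jenkins-result.txt are borrowed from the launch command quoted elsewhere in this thread.

```shell
#!/bin/sh
# Sketch of a fork-and-forget wrapper (illustrative; not the plugin's real launcher).
# Assumption: control files live in a scratch directory created here.
ctl=$(mktemp -d)

# Hypothetical user script: prints one line and exits with status 3.
cat > "$ctl/script.sh" <<'EOF'
echo "hello from user script"
exit 3
EOF

# Launch the wrapper in the background with stdin/stdout/stderr detached, so the
# launching process does not need to stay attached while the user script runs.
(
  rc=0
  sh "$ctl/script.sh" > "$ctl/jenkins-log.txt" 2>&1 || rc=$?
  # Write the exit code to a temporary file and rename it, so a reader
  # never observes a partially written result file.
  echo "$rc" > "$ctl/jenkins-result.txt.tmp"
  mv "$ctl/jenkins-result.txt.tmp" "$ctl/jenkins-result.txt"
) < /dev/null > /dev/null 2>&1 &

# The launcher could return at this point; a watcher polls for the result file.
while [ ! -f "$ctl/jenkins-result.txt" ]; do sleep 1; done
status=$(cat "$ctl/jenkins-result.txt")
log=$(cat "$ctl/jenkins-log.txt")
echo "exit status: $status"
echo "log: $log"
```

Because both the output and the exit status end up in files, nothing has to hold a handle on the wrapper. On a Kubernetes agent this matters because each launched process holds a WebSocket open for as long as it runs.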