Re: apache-beam-jenkins-15 out of disk

2019-07-03 Thread Yifan Zou
I reimaged beam15 and re-enabled the worker. Let us know if anything weird happens on any agent. Thanks. Yifan
On Mon, Jul 1, 2019 at 10:00 AM Yifan Zou wrote:
> https://issues.apache.org/jira/browse/BEAM-7650 tracks the docker issue.
>
> On Sun, Jun 30, 2019 at 2:35 PM Mark Liu wrote:
>

Re: apache-beam-jenkins-15 out of disk

2019-07-01 Thread Yifan Zou
https://issues.apache.org/jira/browse/BEAM-7650 tracks the docker issue.
On Sun, Jun 30, 2019 at 2:35 PM Mark Liu wrote:
> Thank you for triaging and working out a solution, Yifan and Ankur.
>
> Ankur, from what you discovered, we should fix this race condition otherwise same problem will

Re: apache-beam-jenkins-15 out of disk

2019-06-28 Thread Yifan Zou
Sorry for the inconvenience. I disabled the worker. I'll need more time to restore it.
On Fri, Jun 28, 2019 at 3:56 PM Daniel Oliveira wrote:
> Any updates to this issue today? It seems like this (or a similar bug) is still happening across many Pre and Postcommits.
>
> On Fri, Jun 28, 2019

Re: apache-beam-jenkins-15 out of disk

2019-06-28 Thread Daniel Oliveira
Any updates on this issue today? It seems like this (or a similar bug) is still happening across many Pre and Postcommits.
On Fri, Jun 28, 2019 at 12:33 AM Yifan Zou wrote:
> I did the prune on beam15. The disk was freed, but all jobs fail with other weird problems. Looks like docker prune

Re: apache-beam-jenkins-15 out of disk

2019-06-28 Thread Yifan Zou
I did the prune on beam15. The disk was freed, but all jobs now fail with other weird problems. It looks like docker prune was overkill, but I don't have evidence. Will look further in the AM.
On Thu, Jun 27, 2019 at 11:20 PM Udi Meiri wrote:
> See how the hdfs IT already avoids tag collisions.
>
> On Thu,

Re: apache-beam-jenkins-15 out of disk

2019-06-28 Thread Udi Meiri
See how the hdfs IT already avoids tag collisions.
On Thu, Jun 27, 2019, 20:42 Yichi Zhang wrote:
> For flakiness, I guess a tag is needed to keep concurrent builds apart.
>
> On Thu, Jun 27, 2019 at 8:39 PM Yichi Zhang wrote:
>
>> maybe a cron job on jenkins node that does docker prune
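
One way to keep concurrent builds from colliding on an image tag can be sketched in Python. This is only an illustration, not how the HDFS IT does it (the thread does not show that). `unique_image_tag` is a hypothetical helper, and the sketch assumes the Jenkins `BUILD_TAG` environment variable is set on the agent, falling back to a random value otherwise.

```python
import os
import re
import uuid


def unique_image_tag(repository, build_id=None):
    """Return 'repository:tag' where the tag is unique to one build.

    build_id defaults to Jenkins' BUILD_TAG (assumed to be set on the
    agent); a random hex string is used when neither is available.
    """
    raw = build_id or os.environ.get("BUILD_TAG") or uuid.uuid4().hex
    # Docker tags allow only [A-Za-z0-9_.-], at most 128 characters,
    # and may not start with '.' or '-'.
    tag = re.sub(r"[^A-Za-z0-9_.-]", "-", raw)[:128].lstrip(".-")
    return f"{repository}:{tag or uuid.uuid4().hex}"
```

With a per-build tag, two jobs on the same machine no longer overwrite each other's image, but each build also leaves its own image behind, so some cleanup is still needed.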

Re: apache-beam-jenkins-15 out of disk

2019-06-27 Thread Yichi Zhang
Maybe a cron job on the Jenkins node that does docker prune every day?
On Thu, Jun 27, 2019 at 6:58 PM Ankur Goenka wrote:
> This highlights the race condition caused by using a single docker registry on a machine.
> If 2 tests create "jenkins-docker-apache.bintray.io/beam/python" one after
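
A minimal sketch of the daily prune idea, assuming a cron entry on the Jenkins node runs a small Python script. The helper names and the 24-hour cutoff are illustrative assumptions, not something the thread settled on. The `until` filter limits pruning to images older than the cutoff, so the cutoff should be longer than the longest-running job.

```python
import subprocess


def prune_command(max_age_hours=24):
    """Build the docker CLI call that removes unused images older than the cutoff."""
    return [
        "docker", "image", "prune",
        "--all",    # all unused images, not only dangling ones
        "--force",  # skip the interactive confirmation (needed under cron)
        "--filter", f"until={max_age_hours}h",
    ]


def prune_stale_images(max_age_hours=24):
    """Run the prune and return docker's textual report."""
    result = subprocess.run(
        prune_command(max_age_hours),
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Calling `prune_stale_images()` once a day from cron would cover the suggestion above. It does not address the tag race itself.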

Re: apache-beam-jenkins-15 out of disk

2019-06-27 Thread Yifan Zou
The problem was caused by the large number of stale docker images generated by the Python portable tests and the HDFS IT. Dumping the docker disk usage gives me:
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 1039356

Re: apache-beam-jenkins-15 out of disk

2019-06-27 Thread Yifan Zou
Something was eating the disk. Disconnected the worker so jobs could be allocated to other nodes. Will look deeper.
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sda1   485G  485G  96K    100%  /
On Thu, Jun 27, 2019 at 10:54 AM Yifan Zou wrote:
> I'm on it.
>
> On Thu, Jun 27, 2019
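
A full disk like the one in the `df` output above could be caught earlier with a small check on the agent. The following is a sketch only: the function names and the 95% threshold are assumptions, and the ratio is used/total, which can differ slightly from the `Use%` column that `df` prints.

```python
import shutil


def used_fraction(total_bytes, used_bytes):
    """Fraction of the filesystem in use, as used / total."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def disk_is_nearly_full(path="/", threshold=0.95):
    """True when the filesystem holding `path` is at or above the threshold."""
    usage = shutil.disk_usage(path)
    return used_fraction(usage.total, usage.used) >= threshold
```

A job could call `disk_is_nearly_full("/")` before building images and fail fast with a clear message instead of an unrelated error.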