I reimaged beam15. The worker is re-enabled. Let us know if anything
weird happens on any agent.
Thanks.
Yifan
On Mon, Jul 1, 2019 at 10:00 AM Yifan Zou wrote:
https://issues.apache.org/jira/browse/BEAM-7650 tracks the docker issue.
On Sun, Jun 30, 2019 at 2:35 PM Mark Liu wrote:
> Thank you for triaging and working out a solution, Yifan and Ankur.
>
> Ankur, from what you discovered, we should fix this race condition,
> otherwise the same problem will
Sorry for the inconvenience. I disabled the worker. I'll need more time to
restore it.
On Fri, Jun 28, 2019 at 3:56 PM Daniel Oliveira wrote:
Any updates to this issue today? It seems like this (or a similar bug) is
still happening across many Pre and Postcommits.
On Fri, Jun 28, 2019 at 12:33 AM Yifan Zou wrote:
I did the prune on beam15. The disk was freed, but all jobs fail with other
weird problems. Looks like the docker prune was overkill, but I don't have
evidence. Will look further in the AM.
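If a full prune really is too aggressive, one gentler option is to prune
only objects that have been unused for a while. A sketch, with an assumed
48h cutoff ('until' is a standard filter for both prune commands):

    # assumption: remove only images/containers unused for 2+ days,
    # instead of wiping everything the running jobs might still need
    docker image prune --all --force --filter "until=48h"
    docker container prune --force --filter "until=48h"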
On Thu, Jun 27, 2019 at 11:20 PM Udi Meiri wrote:
See how the hdfs IT already avoids tag collisions.
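For anyone who hasn't looked, the general shape of that pattern is roughly
the sketch below; the hdfs IT's actual mechanism may differ, and the
cleanup step is an assumption. BUILD_TAG is set by Jenkins to
jenkins-${JOB_NAME}-${BUILD_NUMBER}, so concurrent jobs never share a tag:

    TAG=${BUILD_TAG:-local-$(date +%s)}
    docker build -t jenkins-docker-apache.bintray.io/beam/python:"$TAG" .
    # ... run the test against the :"$TAG" image ...
    docker rmi jenkins-docker-apache.bintray.io/beam/python:"$TAG"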
On Thu, Jun 27, 2019, 20:42 Yichi Zhang wrote:
> for flakiness, I guess a tag is needed to tell concurrent builds apart.
On Thu, Jun 27, 2019 at 8:39 PM Yichi Zhang wrote:
maybe a cron job on the jenkins node that does docker prune every day?
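Concretely, that could be a crontab entry along these lines (the 3 AM
schedule and log path are assumptions; a plain 'docker system prune' only
removes stopped containers, dangling images, unused networks, and build
cache):

    0 3 * * * docker system prune --force >> /var/log/docker-prune.log 2>&1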
On Thu, Jun 27, 2019 at 6:58 PM Ankur Goenka wrote:
> This highlights the race condition caused by using a single docker
> registry on a machine.
> If 2 tests create "jenkins-docker-apache.bintray.io/beam/python" one after
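To spell that race out (the image name comes from the message above; the
job labels and timing are illustrative):

    #  t0  job A builds jenkins-docker-apache.bintray.io/beam/python
    #  t1  job B builds the same name, retagging it to point at B's image
    #  t2  job A runs by name and silently gets B's image
    # A's old image is left untagged ("dangling") and piles up on disk.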
The problem was caused by the large number of stale docker images
generated by the Python portable tests and the HDFS IT.
Dumping the docker disk usage (docker system df) gives me:
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          1039      356
Something was eating the disk. I disconnected the worker so jobs can be
allocated to other nodes. Will look deeper.
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 485G 485G 96K 100% /
On Thu, Jun 27, 2019 at 10:54 AM Yifan Zou wrote:
> I'm on it.