Re: apache-beam-jenkins-15 out of disk

2019-07-03 Thread Yifan Zou
I reimaged beam15 and the worker is re-enabled. Let us know if anything
weird happens on any agent.

Thanks.
Yifan

Re: apache-beam-jenkins-15 out of disk

2019-07-01 Thread Yifan Zou
https://issues.apache.org/jira/browse/BEAM-7650 tracks the docker issue.

On Sun, Jun 30, 2019 at 2:35 PM Mark Liu  wrote:

> Thank you for triaging and working out a solution, Yifan and Ankur.
>
> Ankur, from what you discovered, we should fix this race condition;
> otherwise the same problem will happen again in the future. Is there a JIRA
> tracking this issue?

Re: apache-beam-jenkins-15 out of disk

2019-06-28 Thread Yifan Zou
Sorry for the inconvenience. I disabled the worker. I'll need more time to
restore it.

Re: apache-beam-jenkins-15 out of disk

2019-06-28 Thread Daniel Oliveira
Any updates on this issue today? It seems like this (or a similar bug) is
still happening across many PreCommit and PostCommit jobs.

Re: apache-beam-jenkins-15 out of disk

2019-06-28 Thread Yifan Zou
I did the prune on beam15. The disk was freed, but now all jobs fail with
other weird problems. It looks like the docker prune removed too much, but I
don't have evidence. Will look further in the AM.
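
A possible explanation for the prune removing too much (a guess, not
verified in the thread): "docker image prune" only removes dangling layers,
while "docker system prune -a" also deletes every image not referenced by a
running or stopped container, including base images the next builds expect
to find in the local cache.

    docker image prune -f      # dangling layers only; relatively safe
    docker system prune -af    # all unused images, containers, networks; aggressive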

Re: apache-beam-jenkins-15 out of disk

2019-06-28 Thread Udi Meiri
See how the hdfs IT already avoids tag collisions.

On Thu, Jun 27, 2019, 20:42 Yichi Zhang  wrote:

> for flakiness, I guess a tag is needed to keep concurrent builds apart.
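
A minimal sketch of that unique-tag idea (illustrative only, not the actual
hdfs IT code; BUILD_ID is the Jenkins build number):

    # Tag each build's image uniquely so concurrent jobs cannot clobber each other.
    TAG="${BUILD_ID:-local}"
    IMAGE="jenkins-docker-apache.bintray.io/beam/python:${TAG}"
    docker build -t "${IMAGE}" .
    # ... run the test against ${IMAGE} ...
    docker rmi -f "${IMAGE}"   # drop the per-build tag when the job finishes
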
Re: apache-beam-jenkins-15 out of disk

2019-06-27 Thread Yichi Zhang
maybe a cron job on the jenkins nodes that does a docker prune every day?
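
A hypothetical crontab entry along those lines (the prune flags are standard
docker CLI; the schedule and log path are invented):

    # Every day at 03:00, remove docker data unused for more than 24 hours.
    0 3 * * * docker system prune -af --filter "until=24h" >> /var/log/docker-prune.log 2>&1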

On Thu, Jun 27, 2019 at 6:58 PM Ankur Goenka  wrote:

> This highlights the race condition caused by using a single docker registry
> on a machine.
> If 2 tests create "jenkins-docker-apache.bintray.io/beam/python" one
> after another, then the 2nd one will replace the 1st one and cause
> flakiness.
>
> Is there a way to dynamically create and destroy a docker repository on a
> machine and clean all the relevant data?
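
One possible direction (a sketch, not something proposed in the thread):
instead of separate registries per job, label images with the job name at
build time and prune by label when the job ends. Both the --label build flag
and label filters on prune are standard docker CLI; the "beam-ci-job" label
name is made up here.

    IMAGE="jenkins-docker-apache.bintray.io/beam/python:latest"
    docker build --label "beam-ci-job=${JOB_NAME:-unknown}" -t "${IMAGE}" .
    # after the job, remove only unused images built for this job:
    docker image prune -af --filter "label=beam-ci-job=${JOB_NAME:-unknown}"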

Re: apache-beam-jenkins-15 out of disk

2019-06-27 Thread Yifan Zou
The problem was caused by the large quantity of stale docker images
generated by the Python portable tests and the HDFS IT.

Dumping the docker disk usage (docker system df -v) gives me:

TYPE            TOTAL   ACTIVE  SIZE        RECLAIMABLE
Images          1039    356     424GB       384.2GB (90%)
Containers      987     2       2.042GB     2.041GB (99%)
Local Volumes   126     0       392.8MB     392.8MB (100%)

REPOSITORY                                                  TAG      IMAGE ID      CREATED        SIZE      SHARED SIZE  UNIQUE SIZE  CONTAINERS
jenkins-docker-apache.bintray.io/beam/python3               latest   ff1b949f4442  22 hours ago   1.639GB   922.3MB      716.9MB      0
jenkins-docker-apache.bintray.io/beam/python                latest   1dda7b9d9748  22 hours ago   1.624GB   913.7MB      710.3MB      0
<none>                                                      <none>   05458187a0e3  22 hours ago   732.9MB   625.1MB      107.8MB      4
<none>                                                      <none>   896f35dd685f  23 hours ago   1.639GB   922.3MB      716.9MB      0
<none>                                                      <none>   db4d24ca9f2b  23 hours ago   1.624GB   913.7MB      710.3MB      0
<none>                                                      <none>   547df4d71c31  23 hours ago   732.9MB   625.1MB      107.8MB      4
<none>                                                      <none>   dd7d9582c3e0  23 hours ago   1.639GB   922.3MB      716.9MB      0
<none>                                                      <none>   664aae255239  23 hours ago   1.624GB   913.7MB      710.3MB      0
<none>                                                      <none>   b528fedf9228  23 hours ago   732.9MB   625.1MB      107.8MB      4
<none>                                                      <none>   8e996f22435e  25 hours ago   1.624GB   913.7MB      710.3MB      0
hdfs_it-jenkins-beam_postcommit_python_verify_pr-818_test   latest   24b73b3fec06  25 hours ago   1.305GB   965.7MB      339.5MB      0
<none>                                                      <none>   096325fb48de  25 hours ago   732.9MB   625.1MB      107.8MB      2
jenkins-docker-apache.bintray.io/beam/java                  latest   c36d8ff2945d  25 hours ago   685.6MB   625.1MB      60.52MB      0
<none>                                                      <none>   11c86ebe025f  26 hours ago   1.639GB   922.3MB      716.9MB      0
<none>                                                      <none>   2ecd69c89ec1  26 hours ago   1.624GB   913.7MB      710.3MB      0
hdfs_it-jenkins-beam_postcommit_python_verify-8590_test     latest   3d1d589d44fe  2 days ago     1.305GB   965.7MB      339.5MB      0
hdfs_it-jenkins-beam_postcommit_python_verify_pr-801_test   latest   d1cc503ebe8e  2 days ago     1.305GB   965.7MB      339.2MB      0
hdfs_it-jenkins-beam_postcommit_python_verify-8577_test     latest   8582c6ca6e15  3 days ago     1.305GB   965.7MB      339.2MB      0
hdfs_it-jenkins-beam_postcommit_python_verify-8576_test     latest   4591e0948170  3 days ago     1.305GB   965.7MB      339.2MB      0
hdfs_it-jenkins-beam_postcommit_python_verify-8575_test     latest   ab181c49d56e  4 days ago     1.305GB   965.7MB      339.2MB      0
hdfs_it-jenkins-beam_postcommit_python_verify-8573_test     latest   2104ba0a6db7  4 days ago     1.305GB   965.7MB      339.2MB      0
...
<1000+ images>

I removed the unused images and beam15 is back now.
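
The exact cleanup commands aren't recorded in the thread; a sketch of a more
targeted cleanup using standard docker CLI flags:

    docker image prune -f                          # dangling layers only
    docker image prune -af --filter "until=168h"   # unused images older than 7 days
    docker container prune -f                      # stopped containers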

Opened https://issues.apache.org/jira/browse/BEAM-7650.
Ankur, I assigned the issue to you. Feel free to reassign it if needed.

Thank you.
Yifan

Re: apache-beam-jenkins-15 out of disk

2019-06-27 Thread Yifan Zou
Something was eating the disk. I disconnected the worker so jobs could be
allocated to other nodes. Will look deeper.

Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sda1   485G  485G  96K    100%  /
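
Two standard commands for chasing down this kind of disk usage (generic
tooling, not from the thread):

    sudo du -xh --max-depth=2 / 2>/dev/null | sort -rh | head -20   # largest directories
    docker system df                                                # docker's share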


On Thu, Jun 27, 2019 at 10:54 AM Yifan Zou  wrote:

> I'm on it.
>
> On Thu, Jun 27, 2019 at 10:17 AM Udi Meiri  wrote:
>
>> Opened a bug here: https://issues.apache.org/jira/browse/BEAM-7648
>>
>> Can someone investigate what's going on?
>>
>