Re: [rdo-dev] [infra][tripleo-ci] Disk space usage in logs.rdoproject.org

2019-08-14 Thread Gabriele Cerami
On 14 Aug, Javier Pena wrote:
> Two months after this e-mail, we're having the same situation.

Looking at the example patch you used before (which hasn't merged yet),
I see this as a combination of factors:

On one side, we are launching jobs that would not be affected by the
patch, which increases the number of jobs in a single run. In these
cases, even without a permanent solution, I tend to create a temporary
CI patch that removes every job except the one I need, and make my
patch depend on it.

On the other side, there's our excessive reliance on end-to-end
integration tests (when we deploy everything to test our code, we also
have to collect everything). Because of the poor execution-path
coverage we get from these kinds of tests, we tend to err on the side
of caution and add several integration tests to be sure we're not
breaking any other execution path. This also increases the number of
rechecks needed, as it takes us more time and resources to discover
that a patch set has broken an execution path.
Shifting our strategy towards more functional tests would help in this
case.

It would be interesting to gather statistics on which repos trigger the
most jobs, and see where we could start reducing.
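A minimal Python sketch of such statistics, assuming the build records have already been exported as a list of JSON objects that each carry a `project` field (the shape of a Zuul builds listing, though the exact field name is an assumption here). `jobs_per_project` is a hypothetical helper and the sample projects are illustrative only:

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def jobs_per_project(
    builds: Iterable[Mapping[str, str]], top: int = 10
) -> List[Tuple[str, int]]:
    """Return the `top` projects that triggered the most builds.

    Each build is assumed to be a mapping with a "project" key, e.g. one
    record of a builds listing exported as JSON (field name is an assumption).
    """
    counts = Counter(build["project"] for build in builds)
    return counts.most_common(top)


# Illustrative input only, not real data.
sample = [{"project": "project-a"}, {"project": "project-b"}, {"project": "project-a"}]
print(jobs_per_project(sample))  # prints [('project-a', 2), ('project-b', 1)]
```

Counting per project first keeps the question simple; grouping by `(project, job_name)` instead would show which individual jobs dominate.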
___
dev mailing list
dev@lists.rdoproject.org
http://lists.rdoproject.org/mailman/listinfo/dev

To unsubscribe: dev-unsubscr...@lists.rdoproject.org


Re: [rdo-dev] [infra][tripleo-ci] Disk space usage in logs.rdoproject.org

2019-08-14 Thread Javier Pena



- Original Message -
> Hi all,
> 
> For the last few days, I have been monitoring a spike in disk space
> utilization for logs.rdoproject.org. The current situation is:
> 
> - 94% of space used, with less than 140GB out of 2TB available.
> - The log pruning script has been reclaiming less space than we are using for
> new logs during this week.
> - I expect the situation to improve over the weekend, but we're definitely
> running out of space.
> 
> I have looked at a random job (https://review.opendev.org/639324, patch set
> 26), and found that each run is consuming 1.2 GB of disk space in logs. The
> worst offenders I have found are:
> 
> - atop.bin.gz files (one per job, 8 jobs per recheck), ranging between 15 and
> 40 MB each
> - logs/undercloud/home/zuul/tempest/.stackviz directory on
> tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001 jobs, which is a
> virtualenv eating up 81 MB.
> 
> As a temporary measure, I am reducing log retention from 21 days to 14, but
> we still need to reduce the rate at which we are uploading logs. Would it be
> possible to check the oooq-generated logs and see where we can reduce? These
> jobs are by far the ones consuming the most space.
> 

Two months after this e-mail, we're having the same situation.

We're close to 95% disk space usage, and because disk I/O performance on
RDO Cloud is not great, deleting old logs is slower than adding new ones. On
top of this, any attempt to clear logs more aggressively causes additional
load on the server, which results in failed log uploads [1].
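Since deletion competes with uploads for the same slow disk, one common way to clear space without starving uploads is to delete in small batches with pauses in between. A minimal Python sketch, assuming the list of directories to delete is already known; `throttled_delete`, the batch size and the pause length are hypothetical and would need tuning against the server's real I/O capacity:

```python
import shutil
import time
from pathlib import Path
from typing import Callable, Iterable


def throttled_delete(
    dirs: Iterable[Path],
    batch_size: int = 10,
    pause_s: float = 5.0,
    remove: Callable[[Path], None] = shutil.rmtree,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Remove directories in batches of `batch_size`, sleeping `pause_s`
    seconds after each batch to leave disk bandwidth for log uploads.
    Returns the number of directories removed."""
    removed = 0
    for d in dirs:
        remove(d)
        removed += 1
        if removed % batch_size == 0:
            sleep(pause_s)  # yield I/O to concurrent uploads
    return removed
```

`remove` and `sleep` are injectable only so the pacing can be checked without touching the disk; in normal use the defaults apply.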

Please, could we tackle the excessive log uploads ASAP? I see the .stackviz
virtualenv directories are still being uploaded. If we don't fix this soon,
we'll end up with unwanted downtime on the log server, which will affect all
jobs.

Thanks,
Javier

[1] - https://review.rdoproject.org/zuul/builds?result=POST_FAILURE


> Thanks,
> Javier


Re: [rdo-dev] [infra][tripleo-ci] Disk space usage in logs.rdoproject.org

2019-06-14 Thread Phill. Whiteside
Ahh, consumer-class drives... are the cloud-class drives made and badged by
Apple? MTBF for HDDs? No difference: if an HDD decides to die, it dies.
That's why we have RAID.

Just my 1c worth.

Regards,
Phill.

On Fri, 14 Jun 2019 at 22:50, Alan Pevec  wrote:

> > 10-14TB hard drives are not really so expensive.
>
> True for consumer-class drives; cloud storage is more like >$1k/month
> for a 10TB HDD and >$5k/month for a 10TB SSD.
>
> Cheers,
> Alan


Re: [rdo-dev] [infra][tripleo-ci] Disk space usage in logs.rdoproject.org

2019-06-14 Thread Alan Pevec
> 10-14TB hard drives are not really so expensive.

True for consumer-class drives; cloud storage is more like >$1k/month
for a 10TB HDD and >$5k/month for a 10TB SSD.

Cheers,
Alan


Re: [rdo-dev] [infra][tripleo-ci] Disk space usage in logs.rdoproject.org

2019-06-14 Thread Sorin Sbarnea
2TB seems very little for a data hoarder like me; I happen to have ~30TB of
storage at home.

10-14TB hard drives are not really so expensive.

While I totally agree that we should better control/limit what we collect, I
wonder if we should aim for a setup where we do not struggle with disk space,
one where we can keep logs for ~60 days without having to think too much
about running out of disk space.
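A back-of-the-envelope estimate of what ~60 days would need, using the figures from this thread (94% of 2TB in use with 21-day retention). It assumes a steady state in which stored logs roughly equal daily upload volume times retention days; `storage_for_retention` is a hypothetical helper:

```python
def storage_for_retention(used_tb: float, current_days: float, target_days: float) -> float:
    """Rough steady-state estimate: stored logs ~= daily upload volume * retention days."""
    daily_tb = used_tb / current_days  # average upload volume per day
    return daily_tb * target_days


# 94% of 2TB in use with 21-day retention (figures from this thread)
print(f"{storage_for_retention(0.94 * 2.0, 21, 60):.1f} TB")  # prints 5.4 TB
```

That is before any free-space headroom, and since the pruning script was already falling behind uploads, the real number would be higher if upload volume keeps growing.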

Another approach I have used in the past was a cleanup script that removed
old builds, oldest first, for as long as free disk space stayed below a
specific value (10%?). That gives a dynamic retention period.
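The dynamic-retention idea can be sketched in Python as follows. This is not the script Sorin used; it assumes the caller already has a list of per-build log directories, treats a directory's mtime as its age, and estimates reclaimed space from file sizes rather than re-checking the filesystem after each deletion. `select_builds_to_prune`, `prune` and the 10% default are illustrative:

```python
import os
import shutil
from pathlib import Path
from typing import List


def tree_size(path: Path) -> int:
    """Apparent size in bytes of the regular files under `path`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):  # don't count symlink targets
                total += os.path.getsize(full)
    return total


def select_builds_to_prune(
    build_dirs: List[Path], min_free: float, total_bytes: int, free_bytes: int
) -> List[Path]:
    """Pick builds, oldest first, until the space they hold would bring free
    space up to `min_free` (a fraction of `total_bytes`)."""
    needed = int(min_free * total_bytes) - free_bytes
    selected: List[Path] = []
    # Assumption: a build directory's mtime approximates its upload time.
    for build in sorted(build_dirs, key=lambda d: d.stat().st_mtime):
        if needed <= 0:
            break  # threshold reached: everything newer is kept
        selected.append(build)
        needed -= tree_size(build)  # estimate; block-level usage may differ
    return selected


def prune(
    root: Path, build_dirs: List[Path], min_free: float = 0.10, dry_run: bool = True
) -> List[Path]:
    """Apply the policy to the filesystem holding `root`; deletes only when
    dry_run is False. Returns the builds selected for deletion."""
    usage = shutil.disk_usage(root)
    victims = select_builds_to_prune(build_dirs, min_free, usage.total, usage.free)
    if not dry_run:
        for build in victims:
            shutil.rmtree(build)
    return victims
```

With this policy the retention period falls out of the upload rate instead of being a fixed 14 or 21 days; a floor on minimum retention would probably still be wanted so that a burst of uploads cannot wipe out recent logs.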

Thanks
Sorin


Re: [rdo-dev] [infra][tripleo-ci] Disk space usage in logs.rdoproject.org

2019-06-13 Thread Wesley Hayutin
On Thu, Jun 13, 2019 at 8:55 AM Javier Pena  wrote:

>
>
> --
>
>
>
> On Thu, Jun 13, 2019 at 8:22 AM Javier Pena  wrote:
>
>> Hi all,
>>
>> For the last few days, I have been monitoring a spike in disk space
>> utilization for logs.rdoproject.org. The current situation is:
>>
>> - 94% of space used, with less than 140GB out of 2TB available.
>> - The log pruning script has been reclaiming less space than we are using
>> for new logs during this week.
>> - I expect the situation to improve over the weekend, but we're
>> definitely running out of space.
>>
>> I have looked at a random job (https://review.opendev.org/639324, patch
>> set 26), and found that each run is consuming 1.2 GB of disk space in logs.
>> The worst offenders I have found are:
>>
>> - atop.bin.gz files (one per job, 8 jobs per recheck), ranging between 15
>> and 40 MB each
>> - logs/undercloud/home/zuul/tempest/.stackviz directory on
>> tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001 jobs, which is a
>> virtualenv eating up 81 MB.
>>
>
> Can we sync up on how you are calculating these results? They do not
> match ours.
> I see each job consuming about 215M of space; we roughly agree on stackviz
> being 83M. Oddly, I don't see atop.bin.gz in our calculations, so I'll have
> to look into that.
>
> I've checked it directly using du on the logserver. By 1.2 GB I meant the
> aggregate of the 8 jobs running for a single patchset. PS26 is currently
> using 2.5 GB and had one recheck.
>
> About the atop.bin.gz file:
>
> # find . -name atop.bin.gz -exec du -sh {} \;
> 16M	./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch/042cb8f/logs/undercloud/var/log/atop.bin.gz
> 16M	./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens-branch/e4171d7/logs/undercloud/var/log/atop.bin.gz
> 28M	./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-rocky-branch/ffd4de9/logs/undercloud/var/log/atop.bin.gz
> 26M	./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-rocky-branch/34d44bf/logs/undercloud/var/log/atop.bin.gz
> 25M	./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein-branch/b89761d/logs/undercloud/var/log/atop.bin.gz
> 24M	./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein-branch/9ade834/logs/undercloud/var/log/atop.bin.gz
> 29M	./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053/a10447d/logs/undercloud/var/log/atop.bin.gz
> 44M	./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053/99a5f9f/logs/undercloud/var/log/atop.bin.gz
> 15M	./tripleo-ci-centos-7-multinode-1ctlr-featureset010/c8a8c60/logs/subnode-2/var/log/atop.bin.gz
> 33M	./tripleo-ci-centos-7-multinode-1ctlr-featureset010/c8a8c60/logs/undercloud/var/log/atop.bin.gz
> 16M	./tripleo-ci-centos-7-multinode-1ctlr-featureset010/73ef532/logs/subnode-2/var/log/atop.bin.gz
> 33M	./tripleo-ci-centos-7-multinode-1ctlr-featureset010/73ef532/logs/undercloud/var/log/atop.bin.gz
> 40M	./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035/109d5ae/logs/undercloud/var/log/atop.bin.gz
> 45M	./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035/c2ebeae/logs/undercloud/var/log/atop.bin.gz
> 39M	./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/7fe5bbb/logs/undercloud/var/log/atop.bin.gz
> 16M	./tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/5e6cb0f/logs/undercloud/var/log/atop.bin.gz
> 40M	./tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039/c6bf5ea/logs/undercloud/var/log/atop.bin.gz
> 40M	./tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039/6ec5ac6/logs/undercloud/var/log/atop.bin.gz
>
> Can I safely delete all .stackviz directories? I guess that would give us
> some breathing room.
>

Yup, go for it
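A cautious way to do that cleanup, sketched in Python under the assumption that the log tree is reachable as a local directory on the log server: find every `.stackviz` directory, report how much space it holds, and only delete when `dry_run` is switched off. `find_named_dirs`, `dir_size` and `remove_stackviz` are hypothetical helpers:

```python
import os
import shutil
from pathlib import Path
from typing import List, Tuple


def find_named_dirs(root: Path, name: str = ".stackviz") -> List[Path]:
    """Return every directory called `name` under `root`, without walking
    into the matches themselves."""
    found: List[Path] = []
    for dirpath, dirnames, _filenames in os.walk(root):
        for d in list(dirnames):
            if d == name:
                found.append(Path(dirpath) / d)
                dirnames.remove(d)  # prune: don't descend into a match
    return found


def dir_size(path: Path) -> int:
    """Apparent size in bytes of the regular files under `path`
    (virtualenv symlinks are skipped, not followed)."""
    return sum(
        (Path(dp) / f).stat().st_size
        for dp, _dn, files in os.walk(path)
        for f in files
        if not (Path(dp) / f).is_symlink()
    )


def remove_stackviz(root: Path, dry_run: bool = True) -> Tuple[int, int]:
    """Measure (and, unless dry_run, delete) the .stackviz directories under
    `root`. Returns (number of directories, bytes they hold)."""
    targets = find_named_dirs(root)
    reclaimed = sum(dir_size(t) for t in targets)
    if not dry_run:
        for t in targets:
            shutil.rmtree(t)
    return len(targets), reclaimed
```

Running it once with the default `dry_run=True` against the log root (a placeholder path such as `Path("/path/to/logs")`) shows how much would be reclaimed before anything is removed.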


>
> Regards,
> Javier
>
> Each job reports the size of the logs e.g. [1]
>
> http://logs.rdoproject.org/24/639324/26/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein-branch/9ade834/logs/quickstart_files/log-size.txt
>
>
>> As a temporary measure, I am reducing log retention from 21 days to 14,
>> but we still need to reduce the rate at which we are uploading logs. Would
>> it be possible to check the oooq-generated logs and see where we can
>> reduce? These jobs are by far the ones consuming the most space.
>>
>> Thanks,
>> Javier
>
>


Re: [rdo-dev] [infra][tripleo-ci] Disk space usage in logs.rdoproject.org

2019-06-13 Thread Wesley Hayutin
On Thu, Jun 13, 2019 at 8:22 AM Javier Pena  wrote:

> Hi all,
>
> For the last few days, I have been monitoring a spike in disk space
> utilization for logs.rdoproject.org. The current situation is:
>
> - 94% of space used, with less than 140GB out of 2TB available.
> - The log pruning script has been reclaiming less space than we are using
> for new logs during this week.
> - I expect the situation to improve over the weekend, but we're definitely
> running out of space.
>
> I have looked at a random job (https://review.opendev.org/639324, patch
> set 26), and found that each run is consuming 1.2 GB of disk space in logs.
> The worst offenders I have found are:
>
> - atop.bin.gz files (one per job, 8 jobs per recheck), ranging between 15
> and 40 MB each
> - logs/undercloud/home/zuul/tempest/.stackviz directory on
> tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001 jobs, which is a
> virtualenv eating up 81 MB.
>

Can we sync up on how you are calculating these results? They do not
match ours.
I see each job consuming about 215M of space; we roughly agree on stackviz
being 83M. Oddly, I don't see atop.bin.gz in our calculations, so I'll have
to look into that.

Each job reports the size of the logs e.g. [1]
http://logs.rdoproject.org/24/639324/26/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein-branch/9ade834/logs/quickstart_files/log-size.txt
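To compare the two sets of numbers, the `find . -name atop.bin.gz -exec du -sh {} \;` listing quoted in this thread can be totaled per job. A minimal Python sketch, assuming `du -h` style sizes (1024-based suffixes) separated from the path by whitespace, and the job name being the first path component; `parse_du_line` and `totals_by_job` are hypothetical helpers:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# GNU du -h suffixes are powers of 1024 (K = KiB, M = MiB, ...).
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_du_line(line: str) -> Tuple[int, str]:
    """Split one `du -sh` line such as "16M<TAB>./job/..." into (bytes, path)."""
    size, path = line.split(None, 1)
    if size[-1].isdigit():  # plain numbers carry no unit suffix
        number, unit = size, ""
    else:
        number, unit = size[:-1], size[-1].upper()
    return int(float(number) * _UNITS[unit]), path.strip()


def totals_by_job(lines: Iterable[str]) -> Dict[str, int]:
    """Sum sizes per job, taking the job name from the first path component
    (./<job-name>/<build>/logs/...), as in the listing quoted in this thread."""
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        size, path = parse_du_line(line)
        if path.startswith("./"):
            path = path[2:]
        totals[path.split("/", 1)[0]] += size
    return dict(totals)
```

Fed the 18 entries of that listing, the atop.bin.gz files add up to 525M; since `du -h` rounds each entry, the total is approximate.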



>
> As a temporary measure, I am reducing log retention from 21 days to 14,
> but we still need to reduce the rate at which we are uploading logs. Would
> it be possible to check the oooq-generated logs and see where we can
> reduce? These jobs are by far the ones consuming the most space.
>
> Thanks,
> Javier