Re: [onap-discuss] /var/log/syslog taking up a lot of space on OOM Beijing

2018-06-04 Thread Michael O'Brien
Hi,
   Could you post your running pod list? From the logs it looks like you may 
have some stuck container deletions - these usually require manual deletion for 
now, as a workaround for a Kubernetes issue in the current Docker release 
relating to PVs.
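
If that's what you're hitting, a manual cleanup along these lines usually 
clears it - just a sketch; <pod>, <namespace> and <container-id> are 
placeholders you'd fill in from your own listing:

{noformat}
# find pods stuck in Terminating or Error state
kubectl get pods --all-namespaces | grep -E 'Terminating|Error'

# force-delete a stuck pod (bypasses the graceful-shutdown wait)
kubectl delete pod <pod> -n <namespace> --grace-period=0 --force

# on the affected host, remove any leftover container directly
docker ps -a | grep -i dead
docker rm -f <container-id>
{noformat}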

I checked my 2 long-lived CD systems and posted some log stats in the JIRA 
below - I don't see more than 12 MB of syslog data, but we should definitely 
watch the filesystem size, since a 100% full disk is usually a system's first 
point of failure. I have, however, seen a fully saturated master once in the past.

Usually the cluster uses about 5 GB for the /dockerdata-nfs share; the master 
needs under 60 GB, and each cluster host needs around 100 GB to run for days 
with roughly 40 GB of Docker image downloads. Things like ONAP log files 
saturating the disk are also a concern we are keenly interested in, so we 
should track this via the JIRA below.

https://jira.onap.org/browse/LOG-453

On cluster.onap.info I only see 12 MB:

{noformat}
ubu...@cluster1.onap.info
ubuntu@ip-172-31-28-156:~$ df
Filesystem                                       1K-blocks     Used        Available Use% Mounted on
udev                                              15691140        0         15691140   0% /dev
tmpfs                                              3139820   351036          2788784  12% /run
/dev/xvda1                                        81254044 65222160         16015500  81% /
tmpfs                                             15699096     6892         15692204   1% /dev/shm
tmpfs                                                 5120        0             5120   0% /run/lock
tmpfs                                             15699096        0         15699096   0% /sys/fs/cgroup
fs-023adc1b.efs.us-west-1.amazonaws.com:/ 9007199254739968  3729408 9007199251010560   1% /dockerdata-nfs
tmpfs                                              3139820        0          3139820   0% /run/user/1000

-rw-r----- 1 syslog adm  8292567 Jun  4 21:07 syslog
-rw-r----- 1 syslog adm 12395747 Jun  4 06:25 syslog.1
-rw-r----- 1 syslog adm   710414 Jun  3 06:25 syslog.2.gz
-rw-r----- 1 syslog adm   700347 Jun  2 06:25 syslog.3.gz
-rw-r----- 1 syslog adm   721147 Jun  1 06:25 syslog.4.gz
-rw-r----- 1 syslog adm   636081 May 31 06:25 syslog.5.gz
-rw-r----- 1 syslog adm   373696 May 30 06:25 syslog.6.gz
-rw-r----- 1 syslog adm   109797 May 29 06:25 syslog.7.gz
{noformat}
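
A quick way to keep an eye on this per host - generic commands, nothing 
ONAP-specific:

{noformat}
# root filesystem headroom
df -h /

# largest items under /var/log, biggest first
du -sh /var/log/* 2>/dev/null | sort -rh | head
{noformat}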

Thanks for bringing this up
/michael


Re: [onap-discuss] /var/log/syslog taking up a lot of space on OOM Beijing

2018-06-04 Thread abdelmuhaimen.seaudi
Hi,

After 4 days, the installation is still stable; however, /var/log/syslog and 
/var/log/syslog.1 have reached 44 GB again.


-rw-r----- 1 syslog adm  7043982261 Jun  4 10:23 syslog
-rw-r----- 1 syslog adm 37589664321 Jun  4 06:25 syslog.1
-rw-r----- 1 syslog adm       99405 Jun  3 06:25 syslog.2.gz
-rw-r----- 1 syslog adm       79388 Jun  2 06:25 syslog.3.gz
-rw-r----- 1 syslog adm      155344 Jun  1 06:25 syslog.4.gz
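
In case it helps narrow down the source, a rough tally of which process tag is 
producing the lines - assuming the default "Mon DD HH:MM:SS host tag: message" 
syslog layout, where the tag is field 5:

{noformat}
# count syslog lines per originating process, noisiest first
awk '{print $5}' /var/log/syslog | sort | uniq -c | sort -rn | head
{noformat}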



Abdelmuhaimen Seaudi
Orange Labs Egypt
Email: abdelmuhaimen.sea...@orange.com
Mobile: +2012 84644 733


[onap-discuss] /var/log/syslog taking up a lot of space on OOM Beijing

2018-06-02 Thread abdelmuhaimen.seaudi
Hi,

I have an OOM Beijing instance running on 1 VM for the Rancher server and 3 
Rancher host VMs, each with 8 vCPUs, 52 GB RAM, a 50 GB root partition, and a 
100 GB second partition for /var/lib/docker/.

After running for 2 days, the installation is stable so far; only OOM-SNIRO 
gives FAIL in the Robot health check, and only the onap-oof pod has been 
failing since the installation.

However, I noticed one of the nodes using a lot of root storage space, and 
found that it's /var/log/syslog and /var/log/syslog.1, which are taking up 
~43 GB.

What is the reason for this behaviour?

root@olc-bjng-2:~# free -h
              total        used        free      shared  buff/cache   available
Mem:            51G         25G         12G        557M         13G         24G
Swap:            0B          0B          0B
root@olc-bjng-2:~# df -h /dev/vda1 /dev/vdb
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        49G   44G  5.0G  90% /      <- a lot of space, not from /var/lib/docker
/dev/vdb         99G   30G   64G  32% /mnt
root@olc-bjng-2:~#

root@olc-bjng-2:/var/log# ls -l
total 42153316
...
-rw-r----- 1 syslog adm  9811749726 Jun  2 10:40 syslog
-rw-r----- 1 syslog adm 33351210056 Jun  2 06:25 syslog.1
-rw-r----- 1 syslog adm      149784 Jun  1 06:25 syslog.2.gz

I see the following lines near the top of syslog.1:

Jun  1 08:58:44 olc-bjng-2 dockerd[9856]: time="2018-06-01T08:58:44.599081414Z" 
level=warning msg="Unknown healthcheck type 'NONE' (expected 'CMD') in 
container 6b3b60d6dde71c0ea16b698b1b7a964e53fb3dd54bdf1e198a49d51f117a2c40"
Jun  1 08:58:45 olc-bjng-2 dockerd[9856]: time="2018-06-01T08:58:45.907586780Z" 
level=error msg="Handler for GET 
/v1.22/containers/952333fb19fec201d9adf226847d8a3e21045d831fe73c10082fe1124271f31a/json
 returned error: No such container: 
952333fb19fec201d9adf226847d8a3e21045d831fe73c10082fe1124271f31a"
Jun  1 08:59:43 olc-bjng-2 dockerd[9856]: time="2018-06-01T08:59:43.372667000Z" 
level=warning msg="failed to close stdin: rpc error: code = 2 desc = write 
/var/run/docker/libcontainerd/containerd/4780e1142fbea7740b0eded42057be2ab17cc0998b2319596654a9a3606b9045/6f635acd5a26750d0f84ee506ed28c4b18b39a64c8994966636d138a88db8e6c/control:
 bad file descriptor"
Jun  1 09:02:28 olc-bjng-2 dockerd[9856]: time="2018-06-01T09:02:28.564875759Z" 
level=error msg="Handler for DELETE 
/v1.27/images/sha256:b7fa6b9cb097d4be9c482f44a2ab2d84d0067b598f80dacfedc11b30feaf2fc6
 returned error: conflict: unable to delete b7fa6b9cb097 (cannot be forced) - 
image is being used by running container c55cddbe095c"
Jun  1 09:02:28 olc-bjng-2 dockerd[9856]: time="2018-06-01T09:02:28.568346021Z" 
level=error msg="Handler for DELETE 
/v1.27/images/sha256:14de771cc17886ac2b6e6eace825c8c67afb59b88804944421a5c2dbebe1ddaf
 returned error: conflict: unable to delete 14de771cc178 (cannot be forced) - 
image is being used by running container 1f3008e7bc83"
Jun  1 09:02:28 olc-bjng-2 dockerd[9856]: time="2018-06-01T09:02:28.570397502Z" 
level=error msg="Handler for DELETE 
/v1.27/images/sha256:bd33f8c865b1cefab6e876006de8542892e21205f33ecd0de82c778be72a2b39
 returned error: conflict: unable to delete bd33f8c865b1 (cannot be forced) - 
image is being used by running container 82a6677eaf0f"

And I see the following lines near the bottom of syslog.1:
Jun  2 06:25:01 olc-bjng-2 dockerd[9856]: time="2018-06-02T06:25:01.634823245Z" 
level=error msg="Failed to log msg \"\\tat 
org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:419)\"
 for logger json-file: write 
/mnt/containers/36441e295ace461c6c1e5a2a60f8004272c056619d381846349ce47c8900385a/36441e295ace461c6c1e5a2a60f8004272c056619d381846349ce47c8900385a-json.log:
 no space left on device"
Jun  2 06:25:01 olc-bjng-2 dockerd[9856]: time="2018-06-02T06:25:01.635142774Z" 
level=error msg="Failed to log msg \"\\tat 
org.glassfish.jersey.client.InboundJaxrsResponse.readEntity(InboundJaxrsResponse.java:108)\"
 for logger json-file: write 
/mnt/containers/36441e295ace461c6c1e5a2a60f8004272c056619d381846349ce47c8900385a/36441e295ace461c6c1e5a2a60f8004272c056619d381846349ce47c8900385a-json.log:
 no space left on device"
Jun  2 06:25:01 olc-bjng-2 dockerd[9856]: time="2018-06-02T06:25:01.635522112Z" 
level=error msg="Failed to log msg \"\\tat 
org.onap.usecaseui.server.util.DmaapSubscriber.getDMaaPData(DmaapSubscriber.java:112)\"
 for logger json-file: write 
/mnt/containers/36441e295ace461c6c1e5a2a60f8004272c056619d381846349ce47c8900385a/36441e295ace461c6c1e5a2a60f8004272c056619d381846349ce47c8900385a-json.log:
 no space left on device"
Jun  2 06:25:01 olc-bjng-2 dockerd[9856]: time="2018-06-02T06:25:01.636113243Z" 
level=error msg="Failed to log msg \"\\tat 
org.onap.usecaseui.server.util.DmaapSubscriber.subscribe(DmaapSubscriber.java:79)\"
 for logger json-file: write
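
Those trailing errors show the container json-file logs on /mnt filling up as 
well as syslog. One possible mitigation - a sketch, not an official OOM 
setting, and it should be merged with any options already in your daemon.json - 
is to cap and rotate Docker's json-file logs:

{noformat}
# /etc/docker/daemon.json - cap each container log at 100 MB, keep 3 rotations
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
{noformat}

After editing the file, restart Docker (e.g. systemctl restart docker); the 
limits apply to newly created containers.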