On 11/01/2017 12:56 PM, Rich Megginson wrote:
On 11/01/2017 01:18 PM, Tim Dudgeon wrote:
More data on this.
Just to confirm that the journal on the node is receiving events:

sudo journalctl -n 25
-- Logs begin at Wed 2017-11-01 14:24:08 UTC, end at Wed 2017-11-01 19:15:15 UTC. --
Nov 01 19:14:23 master-1.openstacklocal origin-master[15148]: I1101 19:14:23.286735   15148 rest.go:324] Starting watch for /api/v1/configmaps, rv=1940 labels= fields
Nov 01 19:14:24 master-1.openstacklocal origin-master[15148]: I1101 19:14:24.288497   15148 rest.go:324] Starting watch for /api/v1/nodes, rv=6595 labels= fields= tim
Nov 01 19:14:29 master-1.openstacklocal origin-master[15148]: I1101 19:14:29.283528   15148 rest.go:324] Starting watch for /apis/extensions/v1beta1/ingresses, rv=4 l
Nov 01 19:14:36 master-1.openstacklocal origin-master[15148]: I1101 19:14:36.566696   15148 rest.go:324] Starting watch for /api/v1/pods, rv=6028 labels= fields= time
Nov 01 19:14:40 master-1.openstacklocal origin-master[15148]: I1101 19:14:40.284191   15148 rest.go:324] Starting watch for /api/v1/persistentvolumeclaims, rv=1606 la
Nov 01 19:14:43 master-1.openstacklocal origin-master[15148]: I1101 19:14:43.291205   15148 rest.go:324] Starting watch for /apis/authorization.openshift.io/v1/policy
Nov 01 19:14:43 master-1.openstacklocal origin-master[15148]: I1101 19:14:43.348888   15148 rest.go:324] Starting watch for /oapi/v1/hostsubnets, rv=1054 labels= fiel
Nov 01 19:14:47 master-1.openstacklocal origin-node[20672]: I1101 19:14:47.255576   20672 operation_generator.go:609] MountVolume.SetUp succeeded for volume "kubernet
Nov 01 19:14:47 master-1.openstacklocal origin-node[20672]: I1101 19:14:47.256440   20672 operation_generator.go:609] MountVolume.SetUp succeeded for volume "kubernet
Nov 01 19:14:47 master-1.openstacklocal origin-node[20672]: I1101 19:14:47.258455   20672 operation_generator.go:609] MountVolume.SetUp succeeded for volume "kubernet
Nov 01 19:14:48 master-1.openstacklocal origin-master[15148]: I1101 19:14:48.291988   15148 rest.go:324] Starting watch for /apis/authorization.openshift.io/v1/cluste
Nov 01 19:14:51 master-1.openstacklocal sshd[46103]: Invalid user admin from 118.89.45.36 port 17929
Nov 01 19:14:51 master-1.openstacklocal sshd[46103]: input_userauth_request: invalid user admin [preauth]
Nov 01 19:14:52 master-1.openstacklocal sshd[46103]: Connection closed by 118.89.45.36 port 17929 [preauth]
Nov 01 19:14:56 master-1.openstacklocal origin-master[15148]: I1101 19:14:56.206290   15148 rest.go:324] Starting watch for /api/v1/services, rv=2008 labels= fields=
Nov 01 19:14:57 master-1.openstacklocal origin-master[15148]: I1101 19:14:57.559640   15148 rest.go:324] Starting watch for /api/v1/namespaces, rv=1845 labels= fields
Nov 01 19:14:59 master-1.openstacklocal origin-master[15148]: I1101 19:14:59.275807   15148 rest.go:324] Starting watch for /api/v1/podtemplates, rv=4 labels= fields=
Nov 01 19:14:59 master-1.openstacklocal origin-master[15148]: I1101 19:14:59.459554   15148 rest.go:324] Starting watch for /apis/storage.k8s.io/v1beta1/storageclasse
Nov 01 19:15:01 master-1.openstacklocal origin-master[15148]: I1101 19:15:01.286182   15148 rest.go:324] Starting watch for /apis/extensions/v1beta1/replicasets, rv=4
Nov 01 19:15:06 master-1.openstacklocal origin-master[15148]: I1101 19:15:06.270704   15148 rest.go:324] Starting watch for /apis/security.openshift.io/v1/securitycon
Nov 01 19:15:06 master-1.openstacklocal origin-master[15148]: I1101 19:15:06.290752   15148 rest.go:324] Starting watch for /apis/batch/v2alpha1/cronjobs, rv=4 labels
Nov 01 19:15:08 master-1.openstacklocal origin-master[15148]: I1101 19:15:08.330948   15148 rest.go:324] Starting watch for /api/v1/services, rv=2008 labels= fields=
Nov 01 19:15:08 master-1.openstacklocal origin-master[15148]: I1101 19:15:08.460997   15148 rest.go:324] Starting watch for /api/v1/serviceaccounts, rv=1909 labels= f
Nov 01 19:15:14 master-1.openstacklocal origin-master[15148]: I1101 19:15:14.286471   15148 rest.go:324] Starting watch for /apis/rbac.authorization.k8s.io/v1beta1/ro
Nov 01 19:15:15 master-1.openstacklocal sudo[46140]:   centos : TTY=pts/0 ; PWD=/home/centos ; USER=root ; COMMAND=/bin/journalctl -n 25

So why is the fluentd running on that node not picking up these events?

I think it has to do with this:

2017-11-01 16:59:47 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:49 +0000 [warn]: no patterns matched tag="kubernetes.journal.container"

That's very bad.

Noriko, what is the fix for the missing @OUTPUT section?

I have 2 questions.

In the fluentd pod:

   oc rsh $FLUENTDPOD

Do we have a filter-post-z-* config file in /etc/fluent/configs.d?
# ls /etc/fluent/configs.d/openshift/filter-post-z-*
/etc/fluent/configs.d/openshift/filter-post-z-retag-two.conf
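
Another quick check from inside the same pod (just a sketch; the exact file layout can differ between image versions): list which <label> and <match> sections fluentd actually loads, since "no patterns matched" means records reach the pipeline but nothing routes their tags.
# grep -rE '<(label|match)' /etc/fluent/fluent.conf /etc/fluent/configs.d/openshift/ | sort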

Also, what does the fluentd configmap look like?
oc edit configmap logging-fluentd

Does the configmap have <label @OUTPUT> as follows?
8<-----------------------------------------------------------------------------------------
    <label @INGRESS>
    ## filters
      @include configs.d/openshift/filter-pre-*.conf
      @include configs.d/openshift/filter-retag-journal.conf
      @include configs.d/openshift/filter-k8s-meta.conf
      @include configs.d/openshift/filter-kibana-transform.conf
      @include configs.d/openshift/filter-k8s-flatten-hash.conf
      @include configs.d/openshift/filter-k8s-record-transform.conf
      @include configs.d/openshift/filter-syslog-record-transform.conf
      @include configs.d/openshift/filter-viaq-data-model.conf
      @include configs.d/openshift/filter-post-*.conf
    ##
    </label>

    <label @OUTPUT>
    ## matches
      @include configs.d/openshift/output-pre-*.conf
      @include configs.d/openshift/output-operations.conf
      @include configs.d/openshift/output-applications.conf
      # no post - applications.conf matches everything left
    ##
    </label>
8<-----------------------------------------------------------------------------------------

If there is no filter-post-z-* config file in /etc/fluent/configs.d/openshift, please remove the </label> and <label @OUTPUT> lines so that the matches stay inside the @INGRESS label, as follows:
8<-----------------------------------------------------------------------------------------
    <label @INGRESS>
    ## filters
      @include configs.d/openshift/filter-pre-*.conf
      @include configs.d/openshift/filter-retag-journal.conf
      @include configs.d/openshift/filter-k8s-meta.conf
      @include configs.d/openshift/filter-kibana-transform.conf
      @include configs.d/openshift/filter-k8s-flatten-hash.conf
      @include configs.d/openshift/filter-k8s-record-transform.conf
      @include configs.d/openshift/filter-syslog-record-transform.conf
      @include configs.d/openshift/filter-viaq-data-model.conf
      @include configs.d/openshift/filter-post-*.conf
    ##

    ## matches
      @include configs.d/openshift/output-pre-*.conf
      @include configs.d/openshift/output-operations.conf
      @include configs.d/openshift/output-applications.conf
      # no post - applications.conf matches everything left
    ##
    </label>
8<-----------------------------------------------------------------------------------------

If you do have a filter-post-z-* config file in /etc/fluent/configs.d/openshift but the configmap does not have </label> and <label @OUTPUT>, please add them. (I don't think that is the case, since fluentd's run.sh does not install filter-post-z-* unless <label @OUTPUT> is found in the configmap.)
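
One more note (an assumption about your setup; adjust the project and selector if yours differ): the running fluentd pods do not reload the configmap automatically, so after editing it restart them, e.g.:
# assumes the default "logging" project and the component=fluentd pod label
oc delete pod -n logging -l component=fluentd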

Thanks,
--noriko



On 01/11/2017 18:16, Tim Dudgeon wrote:
Correction/update on this.

The `journalctl -n 100` command runs OK on the host but not inside the pod.

The file `/var/log/journal.pos` is present both on the host and in the pod.
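
(As a further check, here is a sketch with a placeholder pod name: listing the journal directories from inside the pod shows whether it can see any journal files at all.)
oc exec <fluentd-pod> -- ls /run/log/journal /var/log/journal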

Tim


On 01/11/2017 17:28, Tim Dudgeon wrote:
So I've tried this and a few other variants but have not made any progress.
The issue seems to be that there are no journal logs?

# journalctl -n 100
No journal files were found.
-- No entries --

Even though:

# cat /var/log/journal.pos
s=8da3038f46274f8f80cadbf839d487a5;i=45bd;b=80a3902da560465e8799ccf3e6fb2ef7;m=27729aac6;t=55cef2678535f;x=42fa04d62b52d49fsh-4.2

And in the logs of the pod I see this:

$  oc logs logging-fluentd-h6f3h
umounts of dead containers will fail. Ignoring...
umount: /var/lib/docker/containers/30effb9ff35fc74b9bf37ebeeb5d0d61b515a55e4f3ae52e9bb618ac55704d73/shm: not mounted
umount: /var/lib/docker/containers/39b5c1572e79dd2e698917a7116c6110d2c6eb0a6761142a6e718904f6c43022/shm: not mounted
umount: /var/lib/docker/containers/64c1c27537aa7441ded69a04c78f2f2ce60920fa6e4dc628637a19289b2ead6a/shm: not mounted
umount: /var/lib/docker/containers/7b8564902f011522917c6cffd8a39133cabb8588229f2836c9fbcee95960ac78/shm: not mounted
umount: /var/lib/docker/containers/b85e6d1123da047a7ffe679edfb71376267ef27e9525c7097f3fd6668acd110e/shm: not mounted
umount: /var/lib/docker/containers/c02f10b8dcf69979a95305a76c2f570aaf37fb9c2c0cad6893ed1822f7f24274/shm: not mounted
umount: /var/lib/docker/containers/c30c4f0f34b2470ef5280c85a6db3910b143707df997ad6ee6ed2c2208009a70/shm: not mounted
umount: /var/lib/docker/containers/c67c4e5e89b5f41c593ba2e538671821b6b43936962e8d49785b292644c4a031/shm: not mounted
2017-11-01 16:59:42 +0000 [info]: reading config file path="/etc/fluent/fluent.conf"
2017-11-01 16:59:43 +0000 [warn]: 'block' action stops input process until the buffer full is resolved. Check your pipeline this action is fit or not
2017-11-01 16:59:43 +0000 [warn]: 'block' action stops input process until the buffer full is resolved. Check your pipeline this action is fit or not
2017-11-01 16:59:43 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:43 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:43 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:43 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:43 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:43 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:43 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:43 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:44 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:44 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:46 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:47 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:49 +0000 [warn]: no patterns matched tag="kubernetes.journal.container"
2017-11-01 16:59:51 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:52 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 17:01:02 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 17:02:30 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 17:06:04 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 17:10:01 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 17:14:01 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 17:18:06 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 17:22:09 +0000 [warn]: no patterns matched tag="journal.system"


This is really a basic CentOS 7 image; the only modifications were installing the packages required by OpenShift and then running the Ansible installer.
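
(One thing that might matter here, though it is only a guess: a stock CentOS 7 journald keeps the journal in volatile storage under /run/log/journal unless /var/log/journal exists or Storage=persistent is set, so it is worth confirming where the journal files actually live on the node and that the same path is visible to the fluentd pod.)
# on the node
grep -i storage /etc/systemd/journald.conf
ls -d /run/log/journal/* /var/log/journal/* 2>/dev/null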

Tim


On 31/10/2017 18:15, Rich Megginson wrote:
Very strange. It would appear that fluentd was not able to keep up with the log rate to the journal to such an extent that fluentd's current cursor position was rotated away . . .

You can "reset" fluentd by shutting it down and then removing that cursor file. That will tell fluentd to start reading from the tail of the journal. But NOTE: THAT WILL LOSE ALL RECORDS CURRENTLY IN THE JOURNAL. If you want to try to recover everything in the journal, then `oc set env ds/logging-fluentd JOURNAL_READ_FROM_HEAD=true` - but note that this may take several hours until you have recent records in Elasticsearch, depending on the log rate to the journal and how fast fluentd can keep up.

If you go the JOURNAL_READ_FROM_HEAD=true route, setting the env should trigger a redeployment of fluentd, so you should not have to restart/relabel.
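
For example (assuming the default daemonset and pod label names from the installer):
oc set env ds/logging-fluentd JOURNAL_READ_FROM_HEAD=true
oc get pods -l component=fluentd -w   # watch the pods get recreated

Otherwise, the reset route looks like this: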

oc label node --all --overwrite logging-infra-fluentd-
... wait for `oc get pods` to report no logging-fluentd pods ...
rm -f /var/log/journal.pos
oc label node --all --overwrite logging-infra-fluentd=true

Then, monitor fluentd like this:

https://github.com/openshift/origin-aggregated-logging/blob/master/hack/testing/entrypoint.sh#L56

and monitor the journald log rate (number of logs/minute) like this:

https://github.com/openshift/origin-aggregated-logging/blob/master/hack/testing/entrypoint.sh#L70
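
A rough local approximation of the log rate (my own sketch, not part of that script) is to count journal entries over a fixed window on the node:
sudo journalctl --since "1 minute ago" --no-pager | wc -l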

On 10/31/2017 11:57 AM, Tim Dudgeon wrote:
$ sudo docker info | grep -i log
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
Logging Driver: journald
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

$ journalctl -r -n 1 --show-cursor
-- Logs begin at Sun 2017-10-29 03:04:42 UTC, end at Tue 2017-10-31 17:54:37 UTC. --
Oct 31 17:54:37 worker-1.openstacklocal dockerd-current[6135]: {"type":"response","@timestamp":"2017-10-31T17:54:37Z","tags":[],"pid":8,"
-- cursor: s=f746c7090d724f5ab0ece0d13683fc53;i=a54f2;b=93b6daa912044dd9ae9f05521c603efc;m=55116ad995;t=55cdb72d7c92d;x=5a16032caedc4423


On 31/10/2017 17:31, Rich Megginson wrote:

# docker info | grep -i log

# journalctl -r -n 1 --show-cursor


On 10/31/2017 11:12 AM, Tim Dudgeon wrote:

Thanks. Those links are useful.

It looks to me like it's a problem at the fluentd level. This is what I see on one of the fluentd pods:

sh-4.2# cat /var/log/es-containers.log.pos
cat: /var/log/es-containers.log.pos: No such file or directory
sh-4.2# cat /var/log/journal.pos
s=52fdd277f90749b0a442c78739b1efa7;i=50d69;b=2a3f1736a1a1486d83f95db719fdc281;m=5465b53fd1;t=55cdac4738846;x=85596f3f5f5a27e4sh-4.2#
sh-4.2# journalctl -c `cat /var/log/journal.pos`
No journal files were found.
-- No entries --

Which might sort of explain why everything is running but no logs are being processed.
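
(A related sketch, using the same cursor file: ask the host's journal, outside the pod, whether that saved cursor still exists; if the host cannot seek to it either, the position has been rotated away.)
sudo sh -c 'journalctl --cursor "$(cat /var/log/journal.pos)" | head -n 1'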

This is based on a CentOS 7 image with only the necessary OpenShift packages installed, with OpenShift then installed using Ansible. The logging setup in the inventory file is this:

openshift_hosted_logging_deployer_version=v3.6.0
openshift_hosted_logging_deploy=true
openshift_hosted_logging_storage_kind=nfs
openshift_hosted_logging_storage_access_modes=['ReadWriteOnce']
openshift_hosted_logging_storage_nfs_directory=/exports
openshift_hosted_logging_storage_nfs_options='*(rw,root_squash)'
openshift_hosted_logging_storage_volume_name=logging
openshift_hosted_logging_storage_volume_size=10Gi
openshift_hosted_logging_storage_labels={'storage': 'logging'}


Tim


On 31/10/2017 16:37, Jeff Cantrill wrote:
Please provide additional information, logs, etc., or post the output of [1] someplace for review. Additionally, consider reviewing [2].

[1] https://github.com/openshift/origin-aggregated-logging/blob/master/hack/logging-dump.sh
[2] https://github.com/openshift/origin-aggregated-logging/blob/master/docs/checking-efk-health.md

On Tue, Oct 31, 2017 at 11:47 AM, Tim Dudgeon <tdudgeon...@gmail.com> wrote:

    Hi All,

    I've deployed logging using the Ansible installer (v3.6.0) for a fairly simple OpenShift setup and everything appears to be running:

    NAME                                       READY     STATUS    RESTARTS   AGE
    logging-curator-1-gvh73                    1/1       Running   24         3d
    logging-es-data-master-xz0e7a0c-1-deploy   0/1       Error     0          3d
    logging-es-data-master-xz0e7a0c-4-deploy   0/1       Error     0          3d
    logging-es-data-master-xz0e7a0c-5-deploy   0/1       Error     0          3d
    logging-es-data-master-xz0e7a0c-7-t4xpf    1/1       Running   0          3d
    logging-fluentd-4rm2w                      1/1       Running   0          3d
    logging-fluentd-8h944                      1/1       Running   0          3d
    logging-fluentd-n00bn                      1/1       Running   0          3d
    logging-fluentd-vt8hh                      1/1       Running   0          3d
    logging-kibana-1-g7l4z                     2/2       Running   0          3d

    (the failed pods were related to getting elasticsearch running,
    but that was resolved).

    The problem is that I don't see any logs in Kibana. When I look
    in the fluentd pod logs I see lots of stuff like this:

    2017-10-31 13:53:15 +0000 [warn]: no patterns matched tag="journal.system"
    2017-10-31 13:58:02 +0000 [warn]: no patterns matched tag="kubernetes.journal.container"
    2017-10-31 14:02:18 +0000 [warn]: no patterns matched tag="journal.system"
    2017-10-31 14:07:15 +0000 [warn]: no patterns matched tag="journal.system"
    2017-10-31 14:11:20 +0000 [warn]: no patterns matched tag="journal.system"
    2017-10-31 14:15:16 +0000 [warn]: no patterns matched tag="journal.system"
    2017-10-31 14:19:58 +0000 [warn]: no patterns matched tag="journal.system"

    Is this the cause, and if so, what is wrong?
    If not, how do I debug this?

    Tim







--
Jeff Cantrill
Senior Software Engineer, Red Hat Engineering
OpenShift Integration Services
Red Hat, Inc.
Office: 703-748-4420 | 866-546-8970 ext. 8162420
jcant...@redhat.com
http://www.redhat.com



_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
