On 11/02/2017 02:01 AM, Tim Dudgeon wrote:
Noriko, that fixed it.
There was no filter-post-z-* file and the </label> and <label @OUTPUT>
tags were present.
After removing those tags and restarting the fluentd pods logs are
getting pushed to ES.
So the question is how to avoid this problem in the first place?
Upstream logging is a bit of a mess right now.
Some time ago we decoupled the configuration of logging from the
implementation. That is, we moved all of the configuration into
openshift-ansible. That meant we needed to either release
openshift-ansible packages and logging images in absolute lock-step
(which didn't happen - in fact we never released upstream logging images
for 3.6.x - this is now being addressed -
https://github.com/openshift/origin-aggregated-logging/pull/758), or
ensure that openshift-ansible logging changes did not depend on
the image version, and vice versa (this also didn't happen - we released
changes to the logging images that assumed they would only ever be
deployed with a specific version of openshift-ansible, instead of
adopting a more "defensive programming" style).
This was a simple ansible install using this in the inventory file:
openshift_logging_image_version=v3.6.1
openshift_hosted_logging_deploy=true
openshift_logging_fluentd_journal_read_from_head=false
(note, the image tag for the ES deployment currently needs to be
changed to :latest for ES to start, but that's a separate issue).
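Since the underlying problem is version skew between openshift-ansible and the logging images, it may help to confirm which image tags the logging pods are actually running, compared with the v3.6.1 requested above. A minimal sketch in bash; the helper name `image_tag` is hypothetical, and the usage line assumes the logging components live in a project named `logging` (adjust to your install).

```bash
#!/usr/bin/env bash
# Hypothetical helper: read image references on stdin, print one tag per line.
# A reference without an explicit tag is reported as "latest"; a digest
# reference (name@sha256:...) is reported as its digest.
image_tag() {
  local ref name
  while read -r ref || [ -n "$ref" ]; do
    [ -z "$ref" ] && continue
    name="${ref##*/}"            # drop registry/namespace, keep "repo[:tag][@digest]"
    case "$name" in
      *@*) echo "${name#*@}" ;;  # pinned by digest
      *:*) echo "${name##*:}" ;; # explicit tag
      *)   echo "latest" ;;      # no tag given
    esac
  done
}

# Usage (assumes the project is named "logging"):
#   oc get pods -n logging \
#     -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}' \
#     | tr ' ' '\n' | image_tag | sort | uniq -c
```

More than one distinct tag in the output is a hint that components from different releases are mixed.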
On 01/11/2017 21:00, Noriko Hosoi wrote:
On 11/01/2017 12:56 PM, Rich Megginson wrote:
On 11/01/2017 01:18 PM, Tim Dudgeon wrote:
More data on this.
Just to confirm that the journal on the node is receiving events:
sudo journalctl -n 25
-- Logs begin at Wed 2017-11-01 14:24:08 UTC, end at Wed 2017-11-01 19:15:15 UTC. --
Nov 01 19:14:23 master-1.openstacklocal origin-master[15148]: I1101 19:14:23.286735 15148 rest.go:324] Starting watch for /api/v1/configmaps, rv=1940 labels= fields
Nov 01 19:14:24 master-1.openstacklocal origin-master[15148]: I1101 19:14:24.288497 15148 rest.go:324] Starting watch for /api/v1/nodes, rv=6595 labels= fields= tim
Nov 01 19:14:29 master-1.openstacklocal origin-master[15148]: I1101 19:14:29.283528 15148 rest.go:324] Starting watch for /apis/extensions/v1beta1/ingresses, rv=4 l
Nov 01 19:14:36 master-1.openstacklocal origin-master[15148]: I1101 19:14:36.566696 15148 rest.go:324] Starting watch for /api/v1/pods, rv=6028 labels= fields= time
Nov 01 19:14:40 master-1.openstacklocal origin-master[15148]: I1101 19:14:40.284191 15148 rest.go:324] Starting watch for /api/v1/persistentvolumeclaims, rv=1606 la
Nov 01 19:14:43 master-1.openstacklocal origin-master[15148]: I1101 19:14:43.291205 15148 rest.go:324] Starting watch for /apis/authorization.openshift.io/v1/policy
Nov 01 19:14:43 master-1.openstacklocal origin-master[15148]: I1101 19:14:43.348888 15148 rest.go:324] Starting watch for /oapi/v1/hostsubnets, rv=1054 labels= fiel
Nov 01 19:14:47 master-1.openstacklocal origin-node[20672]: I1101 19:14:47.255576 20672 operation_generator.go:609] MountVolume.SetUp succeeded for volume "kubernet
Nov 01 19:14:47 master-1.openstacklocal origin-node[20672]: I1101 19:14:47.256440 20672 operation_generator.go:609] MountVolume.SetUp succeeded for volume "kubernet
Nov 01 19:14:47 master-1.openstacklocal origin-node[20672]: I1101 19:14:47.258455 20672 operation_generator.go:609] MountVolume.SetUp succeeded for volume "kubernet
Nov 01 19:14:48 master-1.openstacklocal origin-master[15148]: I1101 19:14:48.291988 15148 rest.go:324] Starting watch for /apis/authorization.openshift.io/v1/cluste
Nov 01 19:14:51 master-1.openstacklocal sshd[46103]: Invalid user admin from 118.89.45.36 port 17929
Nov 01 19:14:51 master-1.openstacklocal sshd[46103]: input_userauth_request: invalid user admin [preauth]
Nov 01 19:14:52 master-1.openstacklocal sshd[46103]: Connection closed by 118.89.45.36 port 17929 [preauth]
Nov 01 19:14:56 master-1.openstacklocal origin-master[15148]: I1101 19:14:56.206290 15148 rest.go:324] Starting watch for /api/v1/services, rv=2008 labels= fields=
Nov 01 19:14:57 master-1.openstacklocal origin-master[15148]: I1101 19:14:57.559640 15148 rest.go:324] Starting watch for /api/v1/namespaces, rv=1845 labels= fields
Nov 01 19:14:59 master-1.openstacklocal origin-master[15148]: I1101 19:14:59.275807 15148 rest.go:324] Starting watch for /api/v1/podtemplates, rv=4 labels= fields=
Nov 01 19:14:59 master-1.openstacklocal origin-master[15148]: I1101 19:14:59.459554 15148 rest.go:324] Starting watch for /apis/storage.k8s.io/v1beta1/storageclasse
Nov 01 19:15:01 master-1.openstacklocal origin-master[15148]: I1101 19:15:01.286182 15148 rest.go:324] Starting watch for /apis/extensions/v1beta1/replicasets, rv=4
Nov 01 19:15:06 master-1.openstacklocal origin-master[15148]: I1101 19:15:06.270704 15148 rest.go:324] Starting watch for /apis/security.openshift.io/v1/securitycon
Nov 01 19:15:06 master-1.openstacklocal origin-master[15148]: I1101 19:15:06.290752 15148 rest.go:324] Starting watch for /apis/batch/v2alpha1/cronjobs, rv=4 labels
Nov 01 19:15:08 master-1.openstacklocal origin-master[15148]: I1101 19:15:08.330948 15148 rest.go:324] Starting watch for /api/v1/services, rv=2008 labels= fields=
Nov 01 19:15:08 master-1.openstacklocal origin-master[15148]: I1101 19:15:08.460997 15148 rest.go:324] Starting watch for /api/v1/serviceaccounts, rv=1909 labels= f
Nov 01 19:15:14 master-1.openstacklocal origin-master[15148]: I1101 19:15:14.286471 15148 rest.go:324] Starting watch for /apis/rbac.authorization.k8s.io/v1beta1/ro
Nov 01 19:15:15 master-1.openstacklocal sudo[46140]: centos : TTY=pts/0 ; PWD=/home/centos ; USER=root ; COMMAND=/bin/journalctl -n 25
So why is the fluentd running on that node not picking up these
events?
I think it has to do with this:
2017-11-01 16:59:47 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:49 +0000 [warn]: no patterns matched tag="kubernetes.journal.container"
That's very bad.
Noriko, what is the fix for the missing @OUTPUT section?
I have 2 questions.
In the fluentd pod:
oc rsh $FLUENTDPOD
Do we have a filter-post-z-* config file in /etc/fluent/configs.d?
# ls /etc/fluent/configs.d/openshift/filter-post-z-*
/etc/fluent/configs.d/openshift/filter-post-z-retag-two.conf
Also, what does the fluentd configmap look like?
oc edit configmap logging-fluentd
Does the configmap have <label @OUTPUT> as follows?
8<-----------------------------------------------------------------------------------------
<label @INGRESS>
## filters
@include configs.d/openshift/filter-pre-*.conf
@include configs.d/openshift/filter-retag-journal.conf
@include configs.d/openshift/filter-k8s-meta.conf
@include configs.d/openshift/filter-kibana-transform.conf
@include configs.d/openshift/filter-k8s-flatten-hash.conf
@include configs.d/openshift/filter-k8s-record-transform.conf
@include configs.d/openshift/filter-syslog-record-transform.conf
@include configs.d/openshift/filter-viaq-data-model.conf
@include configs.d/openshift/filter-post-*.conf
##
</label>
<label @OUTPUT>
## matches
@include configs.d/openshift/output-pre-*.conf
@include configs.d/openshift/output-operations.conf
@include configs.d/openshift/output-applications.conf
# no post - applications.conf matches everything left
##
</label>
8<-----------------------------------------------------------------------------------------
If there is no filter-post-z-* config file in
/etc/fluent/configs.d/openshift, please remove </label> and <label
@OUTPUT> as follows:
8<-----------------------------------------------------------------------------------------
<label @INGRESS>
## filters
@include configs.d/openshift/filter-pre-*.conf
@include configs.d/openshift/filter-retag-journal.conf
@include configs.d/openshift/filter-k8s-meta.conf
@include configs.d/openshift/filter-kibana-transform.conf
@include configs.d/openshift/filter-k8s-flatten-hash.conf
@include configs.d/openshift/filter-k8s-record-transform.conf
@include configs.d/openshift/filter-syslog-record-transform.conf
@include configs.d/openshift/filter-viaq-data-model.conf
@include configs.d/openshift/filter-post-*.conf
##
## matches
@include configs.d/openshift/output-pre-*.conf
@include configs.d/openshift/output-operations.conf
@include configs.d/openshift/output-applications.conf
# no post - applications.conf matches everything left
##
</label>
8<-----------------------------------------------------------------------------------------
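If you would rather script that edit than make it by hand, here is a minimal sketch that deletes a `</label>` line only when the very next line is `<label @OUTPUT>`, leaving every other line untouched. It works on a local copy of the fluent.conf text (for example one obtained with `oc extract configmap/logging-fluentd`, where the configmap name `logging-fluentd` is an assumption); putting the result back into the configmap is left to you. The function name `drop_output_label` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove a "</label>" line that is immediately followed
# by "<label @OUTPUT>", merging @OUTPUT's contents into the preceding label.
# Reads the named files (or stdin) and writes the result to stdout.
drop_output_label() {
  awk '
    pending {
      # Previous line was a held "</label>": drop it together with @OUTPUT.
      if ($0 ~ /^[ \t]*<label[ \t]+@OUTPUT>[ \t]*$/) { pending = 0; next }
      print held; pending = 0
    }
    /^[ \t]*<\/label>[ \t]*$/ { held = $0; pending = 1; next }
    { print }
    END { if (pending) print held }
  ' "$@"
}

# Usage:
#   drop_output_label fluent.conf > fluent.conf.new
```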
If you have the filter-post-z-* config file in
/etc/fluent/configs.d/openshift and do not have </label> and <label
@OUTPUT>, please add them. (I don't think that's the case since the
fluentd run.sh does not install filter-post-z-* unless <label
@OUTPUT> is found in the configmap.)
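The two cases above amount to a consistency check between the files on disk and the configmap. A minimal sketch of that check in bash, assuming you pass the path to a copy of the configmap's fluent.conf and, optionally, the configs directory (defaulting to the path used in this thread); the function name `check_output_label` and its messages are hypothetical and not part of the fluentd image.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether <label @OUTPUT> in the configmap text
# agrees with the presence of filter-post-z-* files in the configs directory.
check_output_label() {
  local conf="$1" dir="${2:-/etc/fluent/configs.d/openshift}"
  local have_post=no have_label=no f
  for f in "$dir"/filter-post-z-*; do
    # An unmatched glob stays literal, so test that the file really exists.
    [ -e "$f" ] && { have_post=yes; break; }
  done
  grep -Eq '^[[:space:]]*<label[[:space:]]+@OUTPUT>' "$conf" && have_label=yes
  case "$have_post/$have_label" in
    no/yes) echo "remove </label> and <label @OUTPUT>" ;;
    yes/no) echo "add </label> and <label @OUTPUT>" ;;
    *)      echo "consistent" ;;
  esac
}
```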
Thanks,
--noriko
On 01/11/2017 18:16, Tim Dudgeon wrote:
Correction/update on this.
The `journalctl -n 100` command works on the host but not inside
the pod.
The file `/var/log/journal.pos` is present both on the host and in
the pod.
Tim
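When `journalctl` works on the host but reports "No journal files were found" in the pod, one thing worth ruling out is where journald is actually writing. With journald's default `Storage=auto`, entries are persisted under /var/log/journal only if that directory exists; otherwise they stay in the volatile /run/log/journal. A minimal sketch that lists which of the two directories holds journal files, meant to be run once on the host and once inside the pod (via `oc rsh`) and compared; the function name and its optional root-prefix argument are hypothetical, the prefix being there only to make it testable.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each standard journal directory that contains
# *.journal files (journald keeps one subdirectory per machine ID).
# Returns non-zero if neither directory has any journal files.
journal_dirs() {
  local root="${1:-}" d found=0
  for d in /var/log/journal /run/log/journal; do
    if ls "$root$d"/*/*.journal >/dev/null 2>&1; then
      echo "$d"
      found=1
    fi
  done
  [ "$found" -eq 1 ]
}
```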
On 01/11/2017 17:28, Tim Dudgeon wrote:
So I've tried this and a few other variants but not made any
progress.
The issue seems to be that there are no journal logs?
# journalctl -n 100
No journal files were found.
-- No entries --
Even though:
# cat /var/log/journal.pos
s=8da3038f46274f8f80cadbf839d487a5;i=45bd;b=80a3902da560465e8799ccf3e6fb2ef7;m=27729aac6;t=55cef2678535f;x=42fa04d62b52d49fsh-4.2
And in the logs of the pod I see this:
$ oc logs logging-fluentd-h6f3h
umounts of dead containers will fail. Ignoring...
umount: /var/lib/docker/containers/30effb9ff35fc74b9bf37ebeeb5d0d61b515a55e4f3ae52e9bb618ac55704d73/shm: not mounted
umount: /var/lib/docker/containers/39b5c1572e79dd2e698917a7116c6110d2c6eb0a6761142a6e718904f6c43022/shm: not mounted
umount: /var/lib/docker/containers/64c1c27537aa7441ded69a04c78f2f2ce60920fa6e4dc628637a19289b2ead6a/shm: not mounted
umount: /var/lib/docker/containers/7b8564902f011522917c6cffd8a39133cabb8588229f2836c9fbcee95960ac78/shm: not mounted
umount: /var/lib/docker/containers/b85e6d1123da047a7ffe679edfb71376267ef27e9525c7097f3fd6668acd110e/shm: not mounted
umount: /var/lib/docker/containers/c02f10b8dcf69979a95305a76c2f570aaf37fb9c2c0cad6893ed1822f7f24274/shm: not mounted
umount: /var/lib/docker/containers/c30c4f0f34b2470ef5280c85a6db3910b143707df997ad6ee6ed2c2208009a70/shm: not mounted
umount: /var/lib/docker/containers/c67c4e5e89b5f41c593ba2e538671821b6b43936962e8d49785b292644c4a031/shm: not mounted
2017-11-01 16:59:42 +0000 [info]: reading config file path="/etc/fluent/fluent.conf"
2017-11-01 16:59:43 +0000 [warn]: 'block' action stops input process until the buffer full is resolved. Check your pipeline this action is fit or not
2017-11-01 16:59:43 +0000 [warn]: 'block' action stops input process until the buffer full is resolved. Check your pipeline this action is fit or not
2017-11-01 16:59:43 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:43 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:43 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:43 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:43 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:43 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:43 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:43 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:44 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:44 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:46 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:47 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:49 +0000 [warn]: no patterns matched tag="kubernetes.journal.container"
2017-11-01 16:59:51 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 16:59:52 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 17:01:02 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 17:02:30 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 17:06:04 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 17:10:01 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 17:14:01 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 17:18:06 +0000 [warn]: no patterns matched tag="journal.system"
2017-11-01 17:22:09 +0000 [warn]: no patterns matched tag="journal.system"
This is really a basic centos7 image with the only modifications
done by installing the packages required by openshift and then
running the ansible installer.
Tim
On 31/10/2017 18:15, Rich Megginson wrote:
Very strange. It would appear that fluentd was not able to keep
up with the log rate to the journal, to such an extent that
fluentd's current cursor position was rotated away...
You can "reset" fluentd by shutting it down, then removing that
cursor file. That will tell fluentd to start reading from the
tail of the journal. But NOTE: THAT WILL LOSE ALL RECORDS
CURRENTLY IN THE JOURNAL. If you want to try to recover
everything in the journal, then run oc set env ds/logging-fluentd
JOURNAL_READ_FROM_HEAD=true, but note that it may take
several hours before you have recent records in Elasticsearch,
depending on the log rate to the journal and how fast
fluentd can keep up.
If you go the JOURNAL_READ_FROM_HEAD=true route, setting the env
should trigger a redeployment of fluentd, so you should not have
to restart/relabel.
oc label node --all --overwrite logging-infra-fluentd-
... wait until oc get pods reports no logging-fluentd pods ...
rm -f /var/log/journal.pos
oc label node --all --overwrite logging-infra-fluentd=true
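The manual "wait" step above can be scripted. A minimal sketch in bash, assuming the fluentd pods are named `logging-fluentd-*` as elsewhere in this thread and run in a project called `logging` (an assumption; adjust `-n`). The function name `wait_for_no_fluentd` and its timeout/interval arguments are hypothetical. The `rm` still has to happen on each node's host, because /var/log/journal.pos lives in the host's /var/log.

```bash
#!/usr/bin/env bash
# Poll until no logging-fluentd-* pods are listed, giving up after a timeout.
# Returns 0 when the pods are gone, 1 on timeout, 2 if `oc` itself fails
# (so an auth or connection error is not mistaken for "no pods").
wait_for_no_fluentd() {
  local timeout="${1:-300}" interval="${2:-5}" waited=0 out
  while :; do
    out="$(oc get pods -n logging --no-headers 2>/dev/null)" || return 2
    # No matching pod names left: fluentd has shut down everywhere.
    printf '%s\n' "$out" | grep -q '^logging-fluentd-' || return 0
    if [ "$waited" -ge "$timeout" ]; then
      echo "logging-fluentd pods still present after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# The reset sequence, with the wait scripted (left commented out on purpose,
# since removing the cursor file discards records currently in the journal):
#   oc label node --all --overwrite logging-infra-fluentd-
#   wait_for_no_fluentd 300 5 || exit 1
#   rm -f /var/log/journal.pos     # on each node's host
#   oc label node --all --overwrite logging-infra-fluentd=true
```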
Then, monitor fluentd like this:
https://github.com/openshift/origin-aggregated-logging/blob/master/hack/testing/entrypoint.sh#L56
and monitor the journald log rate (number of logs/minute) like
this:
https://github.com/openshift/origin-aggregated-logging/blob/master/hack/testing/entrypoint.sh#L70
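As a rough stand-in for the log-rate check linked above (this is not what that script does, only an approximation), you can count the journal entries written in the last minute. A sketch in bash, assuming a `journalctl` that accepts `--since "1 min ago"` and prints one JSON object per line with `-o json`; the names `journal_rate` and `watch_journal_rate` are hypothetical.

```bash
#!/usr/bin/env bash
# Approximate journald log rate: entries written in the last minute.
# With -o json each entry is one line starting with "{", so counting those
# lines counts entries and ignores any informational lines.
journal_rate() {
  journalctl -q --since "1 min ago" -o json 2>/dev/null | grep -c '^{'
}

# Print a timestamped sample once a minute until interrupted (Ctrl-C).
watch_journal_rate() {
  while :; do
    printf '%s %s entries/min\n' "$(date -u +%H:%M:%S)" "$(journal_rate)"
    sleep 60
  done
}
```

If the rate stays well above what fluentd manages to ship, that is consistent with the cursor falling behind and being rotated away.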
On 10/31/2017 11:57 AM, Tim Dudgeon wrote:
$ sudo docker info | grep -i log
WARNING: Usage of loopback devices is strongly discouraged for
production use. Use `--storage-opt dm.thinpooldev` to specify a
custom block storage device.
Logging Driver: journald
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
$ journalctl -r -n 1 --show-cursor
-- Logs begin at Sun 2017-10-29 03:04:42 UTC, end at Tue
2017-10-31 17:54:37 UTC. --
Oct 31 17:54:37 worker-1.openstacklocal dockerd-current[6135]:
{"type":"response","@timestamp":"2017-10-31T17:54:37Z","tags":[],"pid":8,"
-- cursor:
s=f746c7090d724f5ab0ece0d13683fc53;i=a54f2;b=93b6daa912044dd9ae9f05521c603efc;m=55116ad995;t=55cdb72d7c92d;x=5a16032caedc4423
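Because the logging driver here is journald, container logs reach fluentd through the journal (hence the `kubernetes.journal.container` tag), so both checks above are worth repeating on each node. A minimal sketch that wraps them; `docker_log_driver` and `newest_cursor` are hypothetical names, and they only parse the `Logging Driver:` line and the `-- cursor:` trailer shown in the output above.

```bash
#!/usr/bin/env bash
# Print Docker's logging driver from the "Logging Driver:" line of `docker info`.
docker_log_driver() {
  docker info 2>/dev/null | sed -n 's/^[[:space:]]*Logging Driver:[[:space:]]*//p'
}

# Print the cursor of the newest journal entry, taken from the
# "-- cursor: ..." trailer that --show-cursor appends.
newest_cursor() {
  journalctl -r -n 1 --show-cursor 2>/dev/null | sed -n 's/^-- cursor: //p'
}

# Hypothetical usage on a node's host:
#   [ "$(docker_log_driver)" = journald ] || echo "container logs are not going to the journal"
#   echo "newest:  $(newest_cursor)"
#   echo "fluentd: $(cat /var/log/journal.pos)"
```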
On 31/10/2017 17:31, Rich Megginson wrote:
# docker info | grep -i log
# journalctl -r -n 1 --show-cursor
On 10/31/2017 11:12 AM, Tim Dudgeon wrote:
Thanks. Those links are useful.
It looks to me like it's a problem at the fluentd level. This
is what I see on one of the fluentd pods:
sh-4.2# cat /var/log/es-containers.log.pos
cat: /var/log/es-containers.log.pos: No such file or directory
sh-4.2# cat /var/log/journal.pos
s=52fdd277f90749b0a442c78739b1efa7;i=50d69;b=2a3f1736a1a1486d83f95db719fdc281;m=5465b53fd1;t=55cdac4738846;x=85596f3f5f5a27e4sh-4.2#
sh-4.2# journalctl -c `cat /var/log/journal.pos`
No journal files were found.
-- No entries --
Which might sort of explain why everything is running but no
logs are being processed.
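Rich's explanation elsewhere in this thread is that fluentd fell far enough behind for its saved position to be rotated out of the journal. One way to test that on the host (where the journal is visible) is to decode when the saved cursor was written and compare it with where the journal now begins. A journald cursor is a `;`-separated list of hex fields, and `t=` is the entry's realtime timestamp in microseconds since the epoch. A minimal sketch; the function name `cursor_realtime_us` is hypothetical.

```bash
#!/usr/bin/env bash
# Decode the realtime (t=) field of a journald cursor into microseconds
# since the Unix epoch. Returns non-zero if the cursor has no t= field.
cursor_realtime_us() {
  local cursor field
  local -a fields
  # Strip any whitespace/newlines around the value read from a file.
  cursor="$(printf '%s' "$1" | tr -d '[:space:]')"
  IFS=';' read -r -a fields <<<"$cursor"
  for field in "${fields[@]}"; do
    case "$field" in
      t=*) printf '%d\n' "0x${field#t=}"; return 0 ;;  # hex -> decimal
    esac
  done
  return 1
}

# Hypothetical usage on a node's host:
#   us="$(cursor_realtime_us "$(cat /var/log/journal.pos)")"
#   date -u -d "@$(( us / 1000000 ))"      # when fluentd's position was written
#   journalctl --no-pager | head -n 1      # "-- Logs begin at ..."
```

If the decoded time is earlier than the "Logs begin at" time, the entry fluentd was positioned at is no longer in the journal.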
This is based on a centos7 image with only the necessary
openshift packages installed and then openshift installed
using ansible. The logging setup in the inventory file is this:
openshift_hosted_logging_deployer_version=v3.6.0
openshift_hosted_logging_deploy=true
openshift_hosted_logging_storage_kind=nfs
openshift_hosted_logging_storage_access_modes=['ReadWriteOnce']
openshift_hosted_logging_storage_nfs_directory=/exports
openshift_hosted_logging_storage_nfs_options='*(rw,root_squash)'
openshift_hosted_logging_storage_volume_name=logging
openshift_hosted_logging_storage_volume_size=10Gi
openshift_hosted_logging_storage_labels={'storage': 'logging'}
Tim
On 31/10/2017 16:37, Jeff Cantrill wrote:
Please provide additional information, logs, etc or post the
output of [1] someplace for review. Additionally, consider
reviewing [2].
[1] https://github.com/openshift/origin-aggregated-logging/blob/master/hack/logging-dump.sh
[2] https://github.com/openshift/origin-aggregated-logging/blob/master/docs/checking-efk-health.md
On Tue, Oct 31, 2017 at 11:47 AM, Tim Dudgeon
<tdudgeon...@gmail.com> wrote:
Hi All,
I've deployed logging using the ansible installer (v3.6.0) for a
fairly simple openshift setup and everything appears to be
running:
NAME                                       READY     STATUS    RESTARTS   AGE
logging-curator-1-gvh73                    1/1       Running   24         3d
logging-es-data-master-xz0e7a0c-1-deploy   0/1       Error     0          3d
logging-es-data-master-xz0e7a0c-4-deploy   0/1       Error     0          3d
logging-es-data-master-xz0e7a0c-5-deploy   0/1       Error     0          3d
logging-es-data-master-xz0e7a0c-7-t4xpf    1/1       Running   0          3d
logging-fluentd-4rm2w                      1/1       Running   0          3d
logging-fluentd-8h944                      1/1       Running   0          3d
logging-fluentd-n00bn                      1/1       Running   0          3d
logging-fluentd-vt8hh                      1/1       Running   0          3d
logging-kibana-1-g7l4z                     2/2       Running   0          3d
(the failed pods were related to getting elasticsearch running,
but that was resolved).
The problem is that I don't see any logs in Kibana. When I look
in the fluentd pod logs I see lots of stuff like this:
2017-10-31 13:53:15 +0000 [warn]: no patterns matched tag="journal.system"
2017-10-31 13:58:02 +0000 [warn]: no patterns matched tag="kubernetes.journal.container"
2017-10-31 14:02:18 +0000 [warn]: no patterns matched tag="journal.system"
2017-10-31 14:07:15 +0000 [warn]: no patterns matched tag="journal.system"
2017-10-31 14:11:20 +0000 [warn]: no patterns matched tag="journal.system"
2017-10-31 14:15:16 +0000 [warn]: no patterns matched tag="journal.system"
2017-10-31 14:19:58 +0000 [warn]: no patterns matched tag="journal.system"
Is this the cause, and if so what is wrong?
If not how to debug this?
Tim
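fluentd prints "no patterns matched" when an event's tag reaches no `<match>` directive, so these events are being dropped rather than sent on. A quick way to see which tags are falling through (in this thread it turned out to be every journal tag, because of the `<label @OUTPUT>` section in the configmap) is to tally the warnings by tag. A minimal sketch; the function name `count_unmatched_tags` is hypothetical.

```bash
#!/usr/bin/env bash
# Tally fluentd "no patterns matched" warnings by tag, most frequent first.
# Reads fluentd log lines on stdin; prints "<count> <tag>" per line.
count_unmatched_tags() {
  awk '/no patterns matched/ && match($0, /tag="[^"]*"/) {
         # Strip the leading tag=" (5 chars) and the closing quote.
         tag = substr($0, RSTART + 5, RLENGTH - 6); n[tag]++
       }
       END { for (t in n) print n[t], t }' | sort -k1,1nr -k2
}

# Usage:
#   oc logs logging-fluentd-4rm2w | count_unmatched_tags
```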
_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
--
Jeff Cantrill
Senior Software Engineer, Red Hat Engineering
OpenShift Integration Services
Red Hat, Inc.
Office: 703-748-4420 | 866-546-8970 ext. 8162420
jcant...@redhat.com
http://www.redhat.com