They surely do, although it would probably be easiest here to just get them
from `oc logs` against the ES pod, especially if we can't trust ES storage.
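
Something along these lines should work (the pod name below is a placeholder;
pick whichever logging-es pod you have):

    oc get pods -n logging | grep logging-es
    oc logs -n logging <logging-es-pod>
    oc logs -n logging <logging-es-pod> --previous   # if the pod has restarted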

On Fri, Jul 15, 2016 at 3:26 PM, Peter Portante <pport...@redhat.com> wrote:

> Eric, Luke,
>
> Do the logs from the ES instance itself flow into that ES instance?
>
> -peter
>
> On Fri, Jul 15, 2016 at 12:14 PM, Alex Wauck <alexwa...@exosite.com> wrote:
> > I'm not sure that I can.  I clicked the "Archive" link for the logging-es
> > pod and then changed the query in Kibana to "kubernetes_container_name:
> > logging-es-cycd8veb && kubernetes_namespace_name: logging".  I got no
> > results, instead getting this error:
> >
> > Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.12
> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@6b1f2699]
> > Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.14
> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@66b9a5fb]
> > Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.15
> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@512820e]
> > Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.29
> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@3dce96b9]
> > Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.30
> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@2f774477]
> >
> > When I initially clicked the "Archive" link, I saw a lot of messages with
> > the kubernetes_container_name "logging-fluentd", which is not what I
> > expected to see.
> >
> >
> > On Fri, Jul 15, 2016 at 10:44 AM, Peter Portante <pport...@redhat.com> wrote:
> >>
> >> Can you go back further in the logs to the point where the errors started?
> >>
> >> I am thinking about possible Java HEAP issues, or possibly ES
> >> restarting for some reason.
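> >>
> >> A quick way to check both (the pod name is just a placeholder) would be
> >> something like:
> >>
> >>     oc get pods -n logging                 # RESTARTS column for the ES pod
> >>     oc logs -n logging <logging-es-pod> | grep -iE "OutOfMemoryError|heap"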
> >>
> >> -peter
> >>
> >> On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček <lvl...@redhat.com> wrote:
> >> > Also looking at this.
> >> > Alex, is it possible to investigate whether you were having some kind of
> >> > network connectivity issue in the ES cluster (I mean between the
> >> > individual cluster nodes)?
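> >> >
> >> > A quick way to see whether the nodes still see each other (run from
> >> > inside one of the ES pods; add https and the client certs if the endpoint
> >> > is secured) would be something like:
> >> >
> >> >     curl -s localhost:9200/_cluster/health?pretty
> >> >     curl -s localhost:9200/_cat/nodes?v
> >> >
> >> > and then check "status" and "number_of_nodes" in the output.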
> >> >
> >> > Regards,
> >> > Lukáš
> >> >
> >> >
> >> >
> >> >
> >> >> On 15 Jul 2016, at 17:08, Peter Portante <pport...@redhat.com> wrote:
> >> >>
> >> >> Just catching up on the thread, will get back to you all in a few ...
> >> >>
> >> >> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz <ewoli...@redhat.com> wrote:
> >> >>> Adding Lukas and Peter
> >> >>>
> >> >>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer <lme...@redhat.com> wrote:
> >> >>>>
> >> >>>> I believe the "queue capacity" there is the number of parallel
> >> >>>> searches that can be queued while the existing search workers operate.
> >> >>>> It sounds like it has plenty of capacity there and it has a different
> >> >>>> reason for rejecting the query.  I would guess the data requested is
> >> >>>> missing, given it couldn't fetch shards it expected to.
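> >> >>>>
> >> >>>> For what it's worth, that 1000 is just an ES setting rather than
> >> >>>> something that scales with CPU/RAM.  A rough sketch for checking the
> >> >>>> current search queue depth and rejection counts (certs omitted) would
> >> >>>> be:
> >> >>>>
> >> >>>>     curl -s localhost:9200/_cat/thread_pool?v
> >> >>>>
> >> >>>> On the ES 1.x/2.x shipped with the logging stack it can be raised via
> >> >>>> threadpool.search.queue_size in elasticsearch.yml, though that only
> >> >>>> papers over having too many shards.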
> >> >>>>
> >> >>>> The number of shards is a multiple (for redundancy) of the number of
> >> >>>> indices, and there is an index created per project per day.  So even
> >> >>>> for a small cluster this doesn't sound out of line.
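> >> >>>>
> >> >>>> If you want to sanity-check that math against the cluster, something
> >> >>>> like this (again with https/certs added if ES is secured) shows the
> >> >>>> raw counts:
> >> >>>>
> >> >>>>     curl -s localhost:9200/_cat/indices | wc -l   # number of indices
> >> >>>>     curl -s localhost:9200/_cat/shards | wc -l    # total shard copies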
> >> >>>>
> >> >>>> Can you give a little more information about your logging deployment?
> >> >>>> Have you deployed multiple ES nodes for redundancy, and what are you
> >> >>>> using for storage?  Could you attach full ES logs?  How many OpenShift
> >> >>>> nodes and projects do you have?  Any history of events that might have
> >> >>>> resulted in lost data?
> >> >>>>
> >> >>>> On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck <alexwa...@exosite.com> wrote:
> >> >>>>>
> >> >>>>> When doing searches in Kibana, I get error messages similar to
> >> >>>>> "Courier Fetch: 919 of 2020 shards failed".  Deeper inspection reveals
> >> >>>>> errors like this:
> >> >>>>> "EsRejectedExecutionException[rejected execution (queue capacity 1000)
> >> >>>>> on org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e]".
> >> >>>>>
> >> >>>>> A bit of investigation led me to conclude that our Elasticsearch
> >> >>>>> server was not sufficiently powerful, so I spun up a new one with four
> >> >>>>> times the CPU and RAM of the original, but the queue capacity is still
> >> >>>>> only 1000.  Also, 2020 seems like a ridiculously large number of
> >> >>>>> shards.  Any idea what's going on here?
> >> >>>>>
> >> >>>>> --
> >> >>>>>
> >> >>>>> Alex Wauck // DevOps Engineer
> >> >>>>>
> >> >>>>> E X O S I T E
> >> >>>>> www.exosite.com
> >> >>>>>
> >> >>>>> Making Machines More Human.
> >> >>>>>
> >> >>>>>
> >> >
> >
> >
> >
> >
> > --
> >
> > Alex Wauck // DevOps Engineer
> >
> > E X O S I T E
> > www.exosite.com
> >
> > Making Machines More Human.
>
_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
