I also tried to fetch the logs from our logging-ops ES instance, without success. Searching there for "kubernetes_namespace_name: logging" led to "No results found".
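In case it is useful, one way to take Kibana out of the picture is to query ES directly from the ES pod. Something along these lines should work; the pod name is a placeholder and the cert/key paths are my guess at the defaults for an origin-aggregated-logging deployment, so adjust both to match yours:

    # Count documents for the "logging" namespace straight from the ops ES instance.
    # logging-es-ops-<hash> is a placeholder pod name; get the real one from `oc get pods -n logging`.
    oc exec -n logging logging-es-ops-<hash> -- \
      curl -s -k --cert /etc/elasticsearch/secret/admin-cert \
           --key /etc/elasticsearch/secret/admin-key \
           'https://localhost:9200/_count?q=kubernetes_namespace_name:logging&pretty'

If that also comes back with a count of 0, the records are not being indexed at all, as opposed to Kibana failing to display them.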
On Fri, Jul 15, 2016 at 2:48 PM, Peter Portante <[email protected]> wrote:
> Well, we don't send ES logs to itself. I think you can create a
> feedback loop that breaks the whole thing down.
> -peter
>
> On Fri, Jul 15, 2016 at 3:39 PM, Luke Meyer <[email protected]> wrote:
> > They surely do. Although it would probably be easiest here to just get them
> > from `oc logs` against the ES pod, especially if we can't trust ES storage.
> >
> > On Fri, Jul 15, 2016 at 3:26 PM, Peter Portante <[email protected]> wrote:
> >>
> >> Eric, Luke,
> >>
> >> Do the logs from the ES instance itself flow into that ES instance?
> >>
> >> -peter
> >>
> >> On Fri, Jul 15, 2016 at 12:14 PM, Alex Wauck <[email protected]> wrote:
> >> > I'm not sure that I can. I clicked the "Archive" link for the logging-es
> >> > pod and then changed the query in Kibana to "kubernetes_container_name:
> >> > logging-es-cycd8veb && kubernetes_namespace_name: logging". I got no
> >> > results, instead getting this error:
> >> >
> >> > Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.12
> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on
> >> > org.elasticsearch.search.action.SearchServiceTransportAction$23@6b1f2699]
> >> > Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.14
> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on
> >> > org.elasticsearch.search.action.SearchServiceTransportAction$23@66b9a5fb]
> >> > Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.15
> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on
> >> > org.elasticsearch.search.action.SearchServiceTransportAction$23@512820e]
> >> > Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.29
> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on
> >> > org.elasticsearch.search.action.SearchServiceTransportAction$23@3dce96b9]
> >> > Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.30
> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on
> >> > org.elasticsearch.search.action.SearchServiceTransportAction$23@2f774477]
> >> >
> >> > When I initially clicked the "Archive" link, I saw a lot of messages with
> >> > the kubernetes_container_name "logging-fluentd", which is not what I
> >> > expected to see.
> >> >
> >> > On Fri, Jul 15, 2016 at 10:44 AM, Peter Portante <[email protected]> wrote:
> >> >>
> >> >> Can you go back further in the logs to the point where the errors
> >> >> started?
> >> >>
> >> >> I am thinking about possible Java HEAP issues, or possibly ES
> >> >> restarting for some reason.
> >> >>
> >> >> -peter
> >> >>
> >> >> On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček <[email protected]> wrote:
> >> >> > Also looking at this.
> >> >> > Alex, is it possible to investigate if you were having some kind of
> >> >> > network connection issues in the ES cluster (I mean between individual
> >> >> > cluster nodes)?
> >> >> >
> >> >> > Regards,
> >> >> > Lukáš
> >> >> >
> >> >> >> On 15 Jul 2016, at 17:08, Peter Portante <[email protected]> wrote:
> >> >> >>
> >> >> >> Just catching up on the thread, will get back to you all in a few ...
> >> >> >>
> >> >> >> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz <[email protected]> wrote:
> >> >> >>> Adding Lukas and Peter
> >> >> >>>
> >> >> >>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer <[email protected]> wrote:
> >> >> >>>>
> >> >> >>>> I believe the "queue capacity" there is the number of parallel searches
> >> >> >>>> that can be queued while the existing search workers operate. It sounds
> >> >> >>>> like it has plenty of capacity there and it has a different reason for
> >> >> >>>> rejecting the query. I would guess the data requested is missing, given
> >> >> >>>> it couldn't fetch shards it expected to.
> >> >> >>>>
> >> >> >>>> The number of shards is a multiple (for redundancy) of the number of
> >> >> >>>> indices, and there is an index created per project per day. So even for
> >> >> >>>> a small cluster this doesn't sound out of line.
> >> >> >>>>
> >> >> >>>> Can you give a little more information about your logging deployment?
> >> >> >>>> Have you deployed multiple ES nodes for redundancy, and what are you
> >> >> >>>> using for storage? Could you attach full ES logs? How many OpenShift
> >> >> >>>> nodes and projects do you have? Any history of events that might have
> >> >> >>>> resulted in lost data?
> >> >> >>>>
> >> >> >>>> On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck <[email protected]> wrote:
> >> >> >>>>>
> >> >> >>>>> When doing searches in Kibana, I get error messages similar to "Courier
> >> >> >>>>> Fetch: 919 of 2020 shards failed". Deeper inspection reveals errors like
> >> >> >>>>> this: "EsRejectedExecutionException[rejected execution (queue capacity 1000) on
> >> >> >>>>> org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e]".
> >> >> >>>>>
> >> >> >>>>> A bit of investigation led me to conclude that our Elasticsearch server
> >> >> >>>>> was not sufficiently powerful, so I spun up a new one with four times
> >> >> >>>>> the CPU and RAM of the original one, but the queue capacity is still
> >> >> >>>>> only 1000. Also, 2020 seems like a really ridiculous number of shards.
> >> >> >>>>> Any idea what's going on here?
> >> >> >>>>>
> >> >> >>>>> --
> >> >> >>>>> Alex Wauck // DevOps Engineer
> >> >> >>>>> E X O S I T E
> >> >> >>>>> www.exosite.com
> >> >> >>>>> Making Machines More Human.
> >> >
> >> > --
> >> > Alex Wauck // DevOps Engineer
> >> > E X O S I T E
> >> > www.exosite.com
> >> > Making Machines More Human.
> >
>
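One more data point for anyone who finds this thread later: the shard count roughly checks out against Luke's explanation. If the indices are created with Elasticsearch's stock default of 5 primary shards, then, for example, 40 projects retained for 10 days is already 2,000 shards before replicas, so a number like 2020 isn't out of line. A few quick checks against the ES API can confirm this (on a secured deployment the same client-cert curl options as above would be needed; the endpoints themselves are just the stock _cat APIs):

    # One line per shard; a quick way to get the total shard count
    curl -s 'https://localhost:9200/_cat/shards' | wc -l

    # See how many per-project daily indices exist and their sizes
    curl -s 'https://localhost:9200/_cat/indices?v'

    # Search thread pool per node: queue length and cumulative rejections
    curl -s 'https://localhost:9200/_cat/thread_pool?v&h=host,search.active,search.queue,search.rejected'

As far as I can tell, the 1000 is the default search thread pool queue size (threadpool.search.queue_size in elasticsearch.yml), which does not scale with CPU or RAM, so a bigger node by itself won't change that number.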
--
Alex Wauck // DevOps Engineer
E X O S I T E
www.exosite.com
Making Machines More Human.

_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
