The logging-ops instance will contain only the logs from /var/log/messages* and the "default", "openshift", and "openshift-infra" namespaces.
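
If you want to double-check what the ops instance actually holds, listing its indices directly is quick. A minimal sketch, assuming the default deployer pod label (component=es-ops) and the deployer-created admin cert paths; both may differ in your install:

  # find the ops ES pod (label is an assumption based on a default deployment)
  oc get pods -n logging -l component=es-ops

  # list its indices; replace <es-ops-pod> with the pod name from above
  oc exec -n logging <es-ops-pod> -- curl -s \
      --cacert /etc/elasticsearch/secret/admin-ca \
      --cert /etc/elasticsearch/secret/admin-cert \
      --key /etc/elasticsearch/secret/admin-key \
      'https://localhost:9200/_cat/indices?v'

If I remember right, the forwarded logs all land in .operations.* indices there, which is consistent with the "No results found" for kubernetes_namespace_name: logging below.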
On Fri, Jul 15, 2016 at 3:28 PM, Alex Wauck <[email protected]> wrote:
> I also tried to fetch the logs from our logging-ops ES instance. That also
> met with failure. Searching for "kubernetes_namespace_name: logging" there
> led to "No results found".
>
> On Fri, Jul 15, 2016 at 2:48 PM, Peter Portante <[email protected]> wrote:
>> Well, we don't send ES logs to itself. I think you can create a feedback
>> loop that breaks the whole thing down.
>> -peter
>>
>> On Fri, Jul 15, 2016 at 3:39 PM, Luke Meyer <[email protected]> wrote:
>>> They surely do. Although it would probably be easiest here to just get
>>> them from `oc logs` against the ES pod, especially if we can't trust ES
>>> storage.
>>>
>>> On Fri, Jul 15, 2016 at 3:26 PM, Peter Portante <[email protected]> wrote:
>>>> Eric, Luke,
>>>>
>>>> Do the logs from the ES instance itself flow into that ES instance?
>>>>
>>>> -peter
>>>>
>>>> On Fri, Jul 15, 2016 at 12:14 PM, Alex Wauck <[email protected]> wrote:
>>>>> I'm not sure that I can. I clicked the "Archive" link for the
>>>>> logging-es pod and then changed the query in Kibana to
>>>>> "kubernetes_container_name: logging-es-cycd8veb &&
>>>>> kubernetes_namespace_name: logging". I got no results, instead getting
>>>>> this error:
>>>>>
>>>>> Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.12
>>>>> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@6b1f2699]
>>>>> Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.14
>>>>> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@66b9a5fb]
>>>>> Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.15
>>>>> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@512820e]
>>>>> Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.29
>>>>> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@3dce96b9]
>>>>> Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.30
>>>>> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@2f774477]
>>>>>
>>>>> When I initially clicked the "Archive" link, I saw a lot of messages
>>>>> with the kubernetes_container_name "logging-fluentd", which is not
>>>>> what I expected to see.
>>>>>
>>>>> On Fri, Jul 15, 2016 at 10:44 AM, Peter Portante <[email protected]> wrote:
>>>>>> Can you go back further in the logs to the point where the errors
>>>>>> started?
>>>>>>
>>>>>> I am thinking about possible Java HEAP issues, or possibly ES
>>>>>> restarting for some reason.
>>>>>>
>>>>>> -peter
>>>>>>
>>>>>> On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček <[email protected]> wrote:
>>>>>>> Also looking at this.
>>>>>>>
>>>>>>> Alex, is it possible to investigate if you were having some kind of
>>>>>>> network connection issues in the ES cluster (I mean between
>>>>>>> individual cluster nodes)?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Lukáš
>>>>>>>
>>>>>>> On 15 Jul 2016, at 17:08, Peter Portante <[email protected]> wrote:
>>>>>>>> Just catching up on the thread, will get back to you all in a few ...
>>>>>>>>
>>>>>>>> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz <[email protected]> wrote:
>>>>>>>>> Adding Lukas and Peter
>>>>>>>>>
>>>>>>>>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer <[email protected]> wrote:
>>>>>>>>>> I believe the "queue capacity" there is the number of parallel
>>>>>>>>>> searches that can be queued while the existing search workers
>>>>>>>>>> operate. It sounds like it has plenty of capacity there and it has
>>>>>>>>>> a different reason for rejecting the query. I would guess the data
>>>>>>>>>> requested is missing given it couldn't fetch shards it expected to.
>>>>>>>>>>
>>>>>>>>>> The number of shards is a multiple (for redundancy) of the number
>>>>>>>>>> of indices, and there is an index created per project per day. So
>>>>>>>>>> even for a small cluster this doesn't sound out of line.
>>>>>>>>>>
>>>>>>>>>> Can you give a little more information about your logging
>>>>>>>>>> deployment? Have you deployed multiple ES nodes for redundancy,
>>>>>>>>>> and what are you using for storage? Could you attach full ES logs?
>>>>>>>>>> How many OpenShift nodes and projects do you have? Any history of
>>>>>>>>>> events that might have resulted in lost data?
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck <[email protected]> wrote:
>>>>>>>>>>> When doing searches in Kibana, I get error messages similar to
>>>>>>>>>>> "Courier Fetch: 919 of 2020 shards failed". Deeper inspection
>>>>>>>>>>> reveals errors like this:
>>>>>>>>>>> "EsRejectedExecutionException[rejected execution (queue capacity
>>>>>>>>>>> 1000) on
>>>>>>>>>>> org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e]".
>>>>>>>>>>>
>>>>>>>>>>> A bit of investigation led me to conclude that our Elasticsearch
>>>>>>>>>>> server was not sufficiently powerful, but I spun up a new one
>>>>>>>>>>> with four times the CPU and RAM of the original one, but the
>>>>>>>>>>> queue capacity is still only 1000. Also, 2020 seems like a really
>>>>>>>>>>> ridiculous number of shards. Any idea what's going on here?
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Alex Wauck // DevOps Engineer
>>>>>>>>>>> E X O S I T E
>>>>>>>>>>> www.exosite.com
>>>>>>>>>>> Making Machines More Human.
>
> --
> Alex Wauck // DevOps Engineer
> E X O S I T E
> www.exosite.com
> Making Machines More Human.
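
For the "queue capacity 1000" rejections and the 2020-shard count discussed above, the cat APIs will show whether searches are still being rejected and how many indices and shards the cluster is actually carrying. A rough sketch to run from a shell inside the ES pod; the es() helper is just shorthand I'm introducing here, and the cert paths again assume the deployer defaults:

  # oc rsh -n logging <es-pod>   ...then, inside the pod:
  es() {
    curl -s --cacert /etc/elasticsearch/secret/admin-ca \
         --cert /etc/elasticsearch/secret/admin-cert \
         --key /etc/elasticsearch/secret/admin-key \
         "https://localhost:9200$1"
  }

  es '/_cat/thread_pool?v&h=host,search.active,search.queue,search.rejected'
  es '/_cat/indices?v' | wc -l     # rough index count (one per project per day)
  es '/_cluster/health?pretty'     # active / unassigned shard totals
  es '/_nodes/stats/jvm?pretty'    # heap pressure, re the Java HEAP question

One note on the queue itself: the search queue depth is a fixed setting rather than something sized from CPU/RAM, which is why the larger instance still reports 1000. On the ES 1.x line I believe the logging images ship, that setting is threadpool.search.queue_size in elasticsearch.yml, so bumping the hardware alone won't change it.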
_______________________________________________ users mailing list [email protected] http://lists.openshift.redhat.com/openshiftmm/listinfo/users
