Can you go back further in the logs to the point where the errors started? I am thinking about possible Java HEAP issues, or possibly ES restarting for some reason.
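If it helps, the node stats API shows both heap pressure and JVM uptime per node (uptime dropping back to near zero is a quick tell that ES restarted). Rough sketch below -- it assumes ES is reachable directly at http://localhost:9200, which with the OpenShift logging stack usually means running it from inside the ES pod (oc exec / oc rsh), so treat the URL as a placeholder:

import json
import urllib.request

ES_URL = "http://localhost:9200"  # placeholder; adjust for your deployment

# Pull per-node JVM stats: heap usage and uptime.
with urllib.request.urlopen(ES_URL + "/_nodes/stats/jvm") as resp:
    stats = json.load(resp)

for node_id, node in stats["nodes"].items():
    mem = node["jvm"]["mem"]
    uptime_min = node["jvm"]["uptime_in_millis"] / 60000.0
    print("%s: heap %s%% used (%d / %d bytes), JVM up %.1f min" % (
        node.get("name", node_id),
        mem["heap_used_percent"],
        mem["heap_used_in_bytes"],
        mem["heap_max_in_bytes"],
        uptime_min,
    ))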
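And for the rejections and the shard count discussed below, the cat APIs give a quick read. Same caveat about the endpoint, and the thread_pool column names here are the ones the ES 1.x/2.x releases accept:

import urllib.request

ES_URL = "http://localhost:9200"  # placeholder; adjust for your deployment

def cat(path):
    with urllib.request.urlopen(ES_URL + path) as resp:
        return resp.read().decode()

# Search thread pool per node: a growing "rejected" count is what surfaces
# in Kibana as EsRejectedExecutionException (queue capacity 1000).
print(cat("/_cat/thread_pool?v&h=host,search.active,search.queue,search.rejected"))

# One line per shard; the total should match what Kibana reports (e.g. 2020),
# and UNASSIGNED lines point at shards the search couldn't fetch.
shard_lines = cat("/_cat/shards").splitlines()
print(len(shard_lines), "shards total,",
      sum(1 for l in shard_lines if "UNASSIGNED" in l), "unassigned")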
-peter

On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček <[email protected]> wrote:
> Also looking at this.
> Alex, is it possible to investigate if you were having some kind of network
> connection issues in the ES cluster (I mean between individual cluster nodes)?
>
> Regards,
> Lukáš
>
>> On 15 Jul 2016, at 17:08, Peter Portante <[email protected]> wrote:
>>
>> Just catching up on the thread, will get back to you all in a few ...
>>
>> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz <[email protected]> wrote:
>>> Adding Lukas and Peter
>>>
>>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer <[email protected]> wrote:
>>>>
>>>> I believe the "queue capacity" there is the number of parallel searches
>>>> that can be queued while the existing search workers operate. It sounds like
>>>> it has plenty of capacity there and it has a different reason for rejecting
>>>> the query. I would guess the data requested is missing, given it couldn't
>>>> fetch shards it expected to.
>>>>
>>>> The number of shards is a multiple (for redundancy) of the number of
>>>> indices, and there is an index created per project per day. So even for a
>>>> small cluster this doesn't sound out of line.
>>>>
>>>> Can you give a little more information about your logging deployment? Have
>>>> you deployed multiple ES nodes for redundancy, and what are you using for
>>>> storage? Could you attach full ES logs? How many OpenShift nodes and
>>>> projects do you have? Any history of events that might have resulted in
>>>> lost data?
>>>>
>>>> On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck <[email protected]> wrote:
>>>>>
>>>>> When doing searches in Kibana, I get error messages similar to "Courier
>>>>> Fetch: 919 of 2020 shards failed". Deeper inspection reveals errors like
>>>>> this: "EsRejectedExecutionException[rejected execution (queue capacity 1000)
>>>>> on org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e]".
>>>>>
>>>>> A bit of investigation led me to conclude that our Elasticsearch server
>>>>> was not sufficiently powerful, so I spun up a new one with four times the
>>>>> CPU and RAM of the original one, but the queue capacity is still only 1000.
>>>>> Also, 2020 seems like a really ridiculous number of shards. Any idea
>>>>> what's going on here?
>>>>>
>>>>> --
>>>>>
>>>>> Alex Wauck // DevOps Engineer
>>>>>
>>>>> E X O S I T E
>>>>> www.exosite.com
>>>>>
>>>>> Making Machines More Human.

_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
