Can you go back further in the logs to the point where the errors started? I am thinking about possible Java HEAP issues, or possibly ES restarting for some reason.
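If it helps, the node stats API shows both heap pressure and JVM uptime per node (uptime dropping back to near zero is a quick tell that ES restarted). Rough sketch below -- it assumes ES is reachable directly at http://localhost:9200, which with the OpenShift logging stack usually means running it from inside the ES pod (oc exec / oc rsh), so treat the URL as a placeholder:

import json
import urllib.request

ES_URL = "http://localhost:9200"  # placeholder; adjust for your deployment

# Pull per-node JVM stats: heap usage and uptime.
with urllib.request.urlopen(ES_URL + "/_nodes/stats/jvm") as resp:
    stats = json.load(resp)

for node_id, node in stats["nodes"].items():
    mem = node["jvm"]["mem"]
    uptime_min = node["jvm"]["uptime_in_millis"] / 60000.0
    print("%s: heap %s%% used (%d / %d bytes), JVM up %.1f min" % (
        node.get("name", node_id),
        mem["heap_used_percent"],
        mem["heap_used_in_bytes"],
        mem["heap_max_in_bytes"],
        uptime_min,
    ))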
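And for the rejections and the shard count discussed below, the cat APIs give a quick read. Same caveat about the endpoint, and the thread_pool column names here are the ones the ES 1.x/2.x releases accept:

import urllib.request

ES_URL = "http://localhost:9200"  # placeholder; adjust for your deployment

def cat(path):
    with urllib.request.urlopen(ES_URL + path) as resp:
        return resp.read().decode()

# Search thread pool per node: a growing "rejected" count is what surfaces
# in Kibana as EsRejectedExecutionException (queue capacity 1000).
print(cat("/_cat/thread_pool?v&h=host,search.active,search.queue,search.rejected"))

# One line per shard; the total should match what Kibana reports (e.g. 2020),
# and UNASSIGNED lines point at shards the search couldn't fetch.
shard_lines = cat("/_cat/shards").splitlines()
print(len(shard_lines), "shards total,",
      sum(1 for l in shard_lines if "UNASSIGNED" in l), "unassigned")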
-peter

On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček <[email protected]> wrote:
> Also looking at this.
> Alex, is it possible to investigate if you were having some kind of network
> connection issues in the ES cluster (I mean between individual cluster nodes)?
>
> Regards,
> Lukáš
>
>> On 15 Jul 2016, at 17:08, Peter Portante <[email protected]> wrote:
>>
>> Just catching up on the thread, will get back to you all in a few ...
>>
>> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz <[email protected]> wrote:
>>> Adding Lukas and Peter
>>>
>>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer <[email protected]> wrote:
>>>>
>>>> I believe the "queue capacity" there is the number of parallel searches
>>>> that can be queued while the existing search workers operate. It sounds like
>>>> it has plenty of capacity there and it has a different reason for rejecting
>>>> the query. I would guess the data requested is missing, given it couldn't
>>>> fetch shards it expected to.
>>>>
>>>> The number of shards is a multiple (for redundancy) of the number of
>>>> indices, and there is an index created per project per day. So even for a
>>>> small cluster this doesn't sound out of line.
>>>>
>>>> Can you give a little more information about your logging deployment? Have
>>>> you deployed multiple ES nodes for redundancy, and what are you using for
>>>> storage? Could you attach full ES logs? How many OpenShift nodes and
>>>> projects do you have? Any history of events that might have resulted in
>>>> lost data?
>>>>
>>>> On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck <[email protected]> wrote:
>>>>>
>>>>> When doing searches in Kibana, I get error messages similar to "Courier
>>>>> Fetch: 919 of 2020 shards failed". Deeper inspection reveals errors like
>>>>> this: "EsRejectedExecutionException[rejected execution (queue capacity 1000)
>>>>> on org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e]".
>>>>>
>>>>> A bit of investigation led me to conclude that our Elasticsearch server
>>>>> was not sufficiently powerful, so I spun up a new one with four times the
>>>>> CPU and RAM of the original one, but the queue capacity is still only 1000.
>>>>> Also, 2020 seems like a really ridiculous number of shards. Any idea
>>>>> what's going on here?
>>>>>
>>>>> --
>>>>>
>>>>> Alex Wauck // DevOps Engineer
>>>>>
>>>>> E X O S I T E
>>>>> www.exosite.com
>>>>>
>>>>> Making Machines More Human.

_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
