Eric, Luke,

Do the logs from the ES instance itself flow into that ES instance?

-peter
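One way to answer that directly is to count the ES container's own log records from inside the logging-es pod. This is only a sketch: it assumes the stock origin-aggregated-logging layout where the admin client certs are mounted under /etc/elasticsearch/secret/, and it reuses the flat kubernetes_container_name field from the Kibana query quoted below; the pod name is a placeholder.

    # Placeholder pod name; take the real one from `oc get pods -n logging`.
    POD=logging-es-cycd8veb-1-xxxxx

    # Count documents produced by the ES container itself. Zero hits would
    # suggest the ES instance's own logs are not being indexed into it.
    oc exec -n logging "$POD" -- curl -s \
      --cacert /etc/elasticsearch/secret/admin-ca \
      --cert /etc/elasticsearch/secret/admin-cert \
      --key /etc/elasticsearch/secret/admin-key \
      'https://localhost:9200/_count?q=kubernetes_container_name:logging-es*&pretty'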
On Fri, Jul 15, 2016 at 12:14 PM, Alex Wauck <[email protected]> wrote:
> I'm not sure that I can. I clicked the "Archive" link for the logging-es pod
> and then changed the query in Kibana to "kubernetes_container_name:
> logging-es-cycd8veb && kubernetes_namespace_name: logging". I got no results,
> instead getting this error:
>
> Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.12
> Shard: 2
> Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@6b1f2699]
>
> Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.14
> Shard: 2
> Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@66b9a5fb]
>
> Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.15
> Shard: 2
> Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@512820e]
>
> Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.29
> Shard: 2
> Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@3dce96b9]
>
> Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.30
> Shard: 2
> Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@2f774477]
>
> When I initially clicked the "Archive" link, I saw a lot of messages with the
> kubernetes_container_name "logging-fluentd", which is not what I expected to
> see.
>
> On Fri, Jul 15, 2016 at 10:44 AM, Peter Portante <[email protected]> wrote:
>> Can you go back further in the logs to the point where the errors started?
>>
>> I am thinking about possible Java HEAP issues, or possibly ES restarting
>> for some reason.
>>
>> -peter
>>
>> On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček <[email protected]> wrote:
>>> Also looking at this.
>>> Alex, is it possible to investigate if you were having some kind of
>>> network connection issues in the ES cluster (I mean between individual
>>> cluster nodes)?
>>>
>>> Regards,
>>> Lukáš
>>>
>>> On 15 Jul 2016, at 17:08, Peter Portante <[email protected]> wrote:
>>>> Just catching up on the thread, will get back to you all in a few ...
>>>>
>>>> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz <[email protected]> wrote:
>>>>> Adding Lukas and Peter
>>>>>
>>>>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer <[email protected]> wrote:
>>>>>> I believe the "queue capacity" there is the number of parallel
>>>>>> searches that can be queued while the existing search workers operate.
>>>>>> It sounds like it has plenty of capacity there and it has a different
>>>>>> reason for rejecting the query. I would guess the data requested is
>>>>>> missing given it couldn't fetch shards it expected to.
>>>>>>
>>>>>> The number of shards is a multiple (for redundancy) of the number of
>>>>>> indices, and there is an index created per project per day. So even
>>>>>> for a small cluster this doesn't sound out of line.
>>>>>>
>>>>>> Can you give a little more information about your logging deployment?
>>>>>> Have you deployed multiple ES nodes for redundancy, and what are you
>>>>>> using for storage? Could you attach full ES logs? How many OpenShift
>>>>>> nodes and projects do you have? Any history of events that might have
>>>>>> resulted in lost data?
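For the counts Luke describes, Elasticsearch's _cat APIs give a quick picture of how many indices and shards exist and what the search queue is doing. A sketch, run from a shell inside the ES pod (oc rsh into it first); the certificate paths assume the stock secret mount and the column names assume the ES 1.x _cat interface, so adjust both if your deployment differs.

    # Small helper so the client-cert flags aren't repeated; adjust the
    # paths if your deployment mounts the secrets elsewhere.
    es_curl() {
      curl -s --cacert /etc/elasticsearch/secret/admin-ca \
              --cert /etc/elasticsearch/secret/admin-cert \
              --key /etc/elasticsearch/secret/admin-key "$@"
    }

    es_curl 'https://localhost:9200/_cat/indices?v'        # roughly one index per project per day
    es_curl 'https://localhost:9200/_cat/shards' | wc -l   # total shards across all indices
    es_curl 'https://localhost:9200/_cat/thread_pool?v&h=host,search.active,search.queue,search.rejected'

A steadily climbing search.rejected count would point at a saturated search queue, while missing shards would show up in the _cat/shards output instead.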
>>>>>> On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck <[email protected]> wrote:
>>>>>>> When doing searches in Kibana, I get error messages similar to "Courier
>>>>>>> Fetch: 919 of 2020 shards failed". Deeper inspection reveals errors like
>>>>>>> this: "EsRejectedExecutionException[rejected execution (queue capacity
>>>>>>> 1000) on
>>>>>>> org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e]".
>>>>>>>
>>>>>>> A bit of investigation led me to conclude that our Elasticsearch server
>>>>>>> was not sufficiently powerful, so I spun up a new one with four times
>>>>>>> the CPU and RAM of the original one, but the queue capacity is still
>>>>>>> only 1000. Also, 2020 seems like a really ridiculous number of shards.
>>>>>>> Any idea what's going on here?
>>>>>>>
>>>>>>> --
>>>>>>> Alex Wauck // DevOps Engineer
>>>>>>> E X O S I T E
>>>>>>> www.exosite.com
>>>>>>> Making Machines More Human.
>
> --
> Alex Wauck // DevOps Engineer
> E X O S I T E
> www.exosite.com
> Making Machines More Human.
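For reference on the two numbers in the original report: 1000 is the stock size of the search thread pool queue in the Elasticsearch 1.x line (threadpool.search.queue_size), and 2020 is simply the total number of shards the Kibana query fanned out to across all the per-project daily indices. Raising the queue is only a band-aid; the durable fix is fewer shards, for example fewer shards per daily index plus curating old indices away. Below is a sketch of both using plain ES 1.x APIs; nothing here is specific to the OpenShift images, the template name is made up, and later ES versions spell these settings differently.

    # Same client-cert flags as in the earlier sketches.
    ES='https://localhost:9200'
    CERTS='--cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key'

    # Band-aid: enlarge the search queue at runtime (dynamic in ES 1.x).
    curl -s $CERTS -XPUT "$ES/_cluster/settings" -d '{
      "transient": { "threadpool.search.queue_size": 2000 }
    }'

    # Longer term: have newly created indices use one primary shard and no
    # replica (reasonable only on a single-node cluster); existing indices
    # are unaffected.
    curl -s $CERTS -XPUT "$ES/_template/single-shard-logging" -d '{
      "template": "*",
      "order": 10,
      "settings": { "number_of_shards": 1, "number_of_replicas": 0 }
    }'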
