I also tried to fetch the logs from our logging-ops ES instance, without success. Searching there for "kubernetes_namespace_name: logging" led to "No results found".
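In case it is useful, one way to take Kibana out of the picture is to query ES directly from the ES pod. Something along these lines should work; the pod name is a placeholder and the cert/key paths are my guess at the defaults for an origin-aggregated-logging deployment, so adjust both to match yours:

    # Count documents for the "logging" namespace straight from the ops ES instance.
    # logging-es-ops-<hash> is a placeholder pod name; get the real one from `oc get pods -n logging`.
    oc exec -n logging logging-es-ops-<hash> -- \
      curl -s -k --cert /etc/elasticsearch/secret/admin-cert \
           --key /etc/elasticsearch/secret/admin-key \
           'https://localhost:9200/_count?q=kubernetes_namespace_name:logging&pretty'

If that also comes back with a count of 0, the records are not being indexed at all, as opposed to Kibana failing to display them.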
On Fri, Jul 15, 2016 at 2:48 PM, Peter Portante <[email protected]> wrote:
> Well, we don't send ES logs to itself. I think you can create a
> feedback loop that breaks the whole thing down.
> -peter
>
> On Fri, Jul 15, 2016 at 3:39 PM, Luke Meyer <[email protected]> wrote:
> > They surely do. Although it would probably be easiest here to just get them
> > from `oc logs` against the ES pod, especially if we can't trust ES storage.
> >
> > On Fri, Jul 15, 2016 at 3:26 PM, Peter Portante <[email protected]> wrote:
> >>
> >> Eric, Luke,
> >>
> >> Do the logs from the ES instance itself flow into that ES instance?
> >>
> >> -peter
> >>
> >> On Fri, Jul 15, 2016 at 12:14 PM, Alex Wauck <[email protected]> wrote:
> >> > I'm not sure that I can. I clicked the "Archive" link for the logging-es
> >> > pod and then changed the query in Kibana to "kubernetes_container_name:
> >> > logging-es-cycd8veb && kubernetes_namespace_name: logging". I got no
> >> > results, instead getting this error:
> >> >
> >> > Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.12
> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on
> >> > org.elasticsearch.search.action.SearchServiceTransportAction$23@6b1f2699]
> >> > Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.14
> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on
> >> > org.elasticsearch.search.action.SearchServiceTransportAction$23@66b9a5fb]
> >> > Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.15
> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on
> >> > org.elasticsearch.search.action.SearchServiceTransportAction$23@512820e]
> >> > Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.29
> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on
> >> > org.elasticsearch.search.action.SearchServiceTransportAction$23@3dce96b9]
> >> > Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.30
> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on
> >> > org.elasticsearch.search.action.SearchServiceTransportAction$23@2f774477]
> >> >
> >> > When I initially clicked the "Archive" link, I saw a lot of messages with
> >> > the kubernetes_container_name "logging-fluentd", which is not what I
> >> > expected to see.
> >> >
> >> > On Fri, Jul 15, 2016 at 10:44 AM, Peter Portante <[email protected]> wrote:
> >> >>
> >> >> Can you go back further in the logs to the point where the errors
> >> >> started?
> >> >>
> >> >> I am thinking about possible Java HEAP issues, or possibly ES
> >> >> restarting for some reason.
> >> >>
> >> >> -peter
> >> >>
> >> >> On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček <[email protected]> wrote:
> >> >> > Also looking at this.
> >> >> > Alex, is it possible to investigate if you were having some kind of
> >> >> > network connection issues in the ES cluster (I mean between individual
> >> >> > cluster nodes)?
> >> >> >
> >> >> > Regards,
> >> >> > Lukáš
> >> >> >
> >> >> >> On 15 Jul 2016, at 17:08, Peter Portante <[email protected]> wrote:
> >> >> >>
> >> >> >> Just catching up on the thread, will get back to you all in a few ...
> >> >> >>
> >> >> >> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz <[email protected]> wrote:
> >> >> >>> Adding Lukas and Peter
> >> >> >>>
> >> >> >>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer <[email protected]> wrote:
> >> >> >>>>
> >> >> >>>> I believe the "queue capacity" there is the number of parallel searches
> >> >> >>>> that can be queued while the existing search workers operate. It sounds
> >> >> >>>> like it has plenty of capacity there and it has a different reason for
> >> >> >>>> rejecting the query. I would guess the data requested is missing, given
> >> >> >>>> it couldn't fetch shards it expected to.
> >> >> >>>>
> >> >> >>>> The number of shards is a multiple (for redundancy) of the number of
> >> >> >>>> indices, and there is an index created per project per day. So even for
> >> >> >>>> a small cluster this doesn't sound out of line.
> >> >> >>>>
> >> >> >>>> Can you give a little more information about your logging deployment?
> >> >> >>>> Have you deployed multiple ES nodes for redundancy, and what are you
> >> >> >>>> using for storage? Could you attach full ES logs? How many OpenShift
> >> >> >>>> nodes and projects do you have? Any history of events that might have
> >> >> >>>> resulted in lost data?
> >> >> >>>>
> >> >> >>>> On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck <[email protected]> wrote:
> >> >> >>>>>
> >> >> >>>>> When doing searches in Kibana, I get error messages similar to "Courier
> >> >> >>>>> Fetch: 919 of 2020 shards failed". Deeper inspection reveals errors like
> >> >> >>>>> this: "EsRejectedExecutionException[rejected execution (queue capacity 1000) on
> >> >> >>>>> org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e]".
> >> >> >>>>>
> >> >> >>>>> A bit of investigation led me to conclude that our Elasticsearch server
> >> >> >>>>> was not sufficiently powerful, so I spun up a new one with four times
> >> >> >>>>> the CPU and RAM of the original one, but the queue capacity is still
> >> >> >>>>> only 1000. Also, 2020 seems like a really ridiculous number of shards.
> >> >> >>>>> Any idea what's going on here?
> >> >> >>>>>
> >> >> >>>>> --
> >> >> >>>>> Alex Wauck // DevOps Engineer
> >> >> >>>>> E X O S I T E
> >> >> >>>>> www.exosite.com
> >> >> >>>>> Making Machines More Human.
> >> >
> >> > --
> >> > Alex Wauck // DevOps Engineer
> >> > E X O S I T E
> >> > www.exosite.com
> >> > Making Machines More Human.
> >
>
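One more data point for anyone who finds this thread later: the shard count roughly checks out against Luke's explanation. If the indices are created with Elasticsearch's stock default of 5 primary shards, then, for example, 40 projects retained for 10 days is already 2,000 shards before replicas, so a number like 2020 isn't out of line. A few quick checks against the ES API can confirm this (on a secured deployment the same client-cert curl options as above would be needed; the endpoints themselves are just the stock _cat APIs):

    # One line per shard; a quick way to get the total shard count
    curl -s 'https://localhost:9200/_cat/shards' | wc -l

    # See how many per-project daily indices exist and their sizes
    curl -s 'https://localhost:9200/_cat/indices?v'

    # Search thread pool per node: queue length and cumulative rejections
    curl -s 'https://localhost:9200/_cat/thread_pool?v&h=host,search.active,search.queue,search.rejected'

As far as I can tell, the 1000 is the default search thread pool queue size (threadpool.search.queue_size in elasticsearch.yml), which does not scale with CPU or RAM, so a bigger node by itself won't change that number.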
--
Alex Wauck // DevOps Engineer
E X O S I T E
www.exosite.com
Making Machines More Human.

_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
