The logging-ops instance will contain only the logs from /var/log/messages* and the "default", "openshift", and "openshift-infra" namespaces.
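
If you want to double-check what the ops instance actually holds, listing its indices directly is quick. A minimal sketch, assuming the default deployer pod label (component=es-ops) and the deployer-created admin cert paths; both may differ in your install:

  # find the ops ES pod (label is an assumption based on a default deployment)
  oc get pods -n logging -l component=es-ops

  # list its indices; replace <es-ops-pod> with the pod name from above
  oc exec -n logging <es-ops-pod> -- curl -s \
      --cacert /etc/elasticsearch/secret/admin-ca \
      --cert /etc/elasticsearch/secret/admin-cert \
      --key /etc/elasticsearch/secret/admin-key \
      'https://localhost:9200/_cat/indices?v'

If I remember right, the forwarded logs all land in .operations.* indices there, which is consistent with the "No results found" for kubernetes_namespace_name: logging below.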
On Fri, Jul 15, 2016 at 3:28 PM, Alex Wauck <[email protected]> wrote:
> I also tried to fetch the logs from our logging-ops ES instance. That also
> met with failure. Searching for "kubernetes_namespace_name: logging" there
> led to "No results found".
>
> On Fri, Jul 15, 2016 at 2:48 PM, Peter Portante <[email protected]> wrote:
>> Well, we don't send ES logs to itself. I think you can create a feedback
>> loop that breaks the whole thing down.
>> -peter
>>
>> On Fri, Jul 15, 2016 at 3:39 PM, Luke Meyer <[email protected]> wrote:
>>> They surely do. Although it would probably be easiest here to just get
>>> them from `oc logs` against the ES pod, especially if we can't trust ES
>>> storage.
>>>
>>> On Fri, Jul 15, 2016 at 3:26 PM, Peter Portante <[email protected]> wrote:
>>>> Eric, Luke,
>>>>
>>>> Do the logs from the ES instance itself flow into that ES instance?
>>>>
>>>> -peter
>>>>
>>>> On Fri, Jul 15, 2016 at 12:14 PM, Alex Wauck <[email protected]> wrote:
>>>>> I'm not sure that I can. I clicked the "Archive" link for the
>>>>> logging-es pod and then changed the query in Kibana to
>>>>> "kubernetes_container_name: logging-es-cycd8veb &&
>>>>> kubernetes_namespace_name: logging". I got no results, instead getting
>>>>> this error:
>>>>>
>>>>> Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.12
>>>>> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@6b1f2699]
>>>>> Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.14
>>>>> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@66b9a5fb]
>>>>> Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.15
>>>>> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@512820e]
>>>>> Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.29
>>>>> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@3dce96b9]
>>>>> Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.30
>>>>> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@2f774477]
>>>>>
>>>>> When I initially clicked the "Archive" link, I saw a lot of messages
>>>>> with the kubernetes_container_name "logging-fluentd", which is not
>>>>> what I expected to see.
>>>>>
>>>>> On Fri, Jul 15, 2016 at 10:44 AM, Peter Portante <[email protected]> wrote:
>>>>>> Can you go back further in the logs to the point where the errors
>>>>>> started?
>>>>>>
>>>>>> I am thinking about possible Java HEAP issues, or possibly ES
>>>>>> restarting for some reason.
>>>>>>
>>>>>> -peter
>>>>>>
>>>>>> On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček <[email protected]> wrote:
>>>>>>> Also looking at this.
>>>>>>>
>>>>>>> Alex, is it possible to investigate if you were having some kind of
>>>>>>> network connection issues in the ES cluster (I mean between
>>>>>>> individual cluster nodes)?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Lukáš
>>>>>>>
>>>>>>> On 15 Jul 2016, at 17:08, Peter Portante <[email protected]> wrote:
>>>>>>>> Just catching up on the thread, will get back to you all in a few ...
>>>>>>>>
>>>>>>>> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz <[email protected]> wrote:
>>>>>>>>> Adding Lukas and Peter
>>>>>>>>>
>>>>>>>>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer <[email protected]> wrote:
>>>>>>>>>> I believe the "queue capacity" there is the number of parallel
>>>>>>>>>> searches that can be queued while the existing search workers
>>>>>>>>>> operate. It sounds like it has plenty of capacity there and it has
>>>>>>>>>> a different reason for rejecting the query. I would guess the data
>>>>>>>>>> requested is missing given it couldn't fetch shards it expected to.
>>>>>>>>>>
>>>>>>>>>> The number of shards is a multiple (for redundancy) of the number
>>>>>>>>>> of indices, and there is an index created per project per day. So
>>>>>>>>>> even for a small cluster this doesn't sound out of line.
>>>>>>>>>>
>>>>>>>>>> Can you give a little more information about your logging
>>>>>>>>>> deployment? Have you deployed multiple ES nodes for redundancy,
>>>>>>>>>> and what are you using for storage? Could you attach full ES logs?
>>>>>>>>>> How many OpenShift nodes and projects do you have? Any history of
>>>>>>>>>> events that might have resulted in lost data?
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck <[email protected]> wrote:
>>>>>>>>>>> When doing searches in Kibana, I get error messages similar to
>>>>>>>>>>> "Courier Fetch: 919 of 2020 shards failed". Deeper inspection
>>>>>>>>>>> reveals errors like this:
>>>>>>>>>>> "EsRejectedExecutionException[rejected execution (queue capacity
>>>>>>>>>>> 1000) on
>>>>>>>>>>> org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e]".
>>>>>>>>>>>
>>>>>>>>>>> A bit of investigation led me to conclude that our Elasticsearch
>>>>>>>>>>> server was not sufficiently powerful, but I spun up a new one
>>>>>>>>>>> with four times the CPU and RAM of the original one, but the
>>>>>>>>>>> queue capacity is still only 1000. Also, 2020 seems like a really
>>>>>>>>>>> ridiculous number of shards. Any idea what's going on here?
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Alex Wauck // DevOps Engineer
>>>>>>>>>>> E X O S I T E
>>>>>>>>>>> www.exosite.com
>>>>>>>>>>> Making Machines More Human.
>
> --
> Alex Wauck // DevOps Engineer
> E X O S I T E
> www.exosite.com
> Making Machines More Human.
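
For the "queue capacity 1000" rejections and the 2020-shard count discussed above, the cat APIs will show whether searches are still being rejected and how many indices and shards the cluster is actually carrying. A rough sketch to run from a shell inside the ES pod; the es() helper is just shorthand I'm introducing here, and the cert paths again assume the deployer defaults:

  # oc rsh -n logging <es-pod>   ...then, inside the pod:
  es() {
    curl -s --cacert /etc/elasticsearch/secret/admin-ca \
         --cert /etc/elasticsearch/secret/admin-cert \
         --key /etc/elasticsearch/secret/admin-key \
         "https://localhost:9200$1"
  }

  es '/_cat/thread_pool?v&h=host,search.active,search.queue,search.rejected'
  es '/_cat/indices?v' | wc -l     # rough index count (one per project per day)
  es '/_cluster/health?pretty'     # active / unassigned shard totals
  es '/_nodes/stats/jvm?pretty'    # heap pressure, re the Java HEAP question

One note on the queue itself: the search queue depth is a fixed setting rather than something sized from CPU/RAM, which is why the larger instance still reports 1000. On the ES 1.x line I believe the logging images ship, that setting is threadpool.search.queue_size in elasticsearch.yml, so bumping the hardware alone won't change it.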
_______________________________________________ users mailing list [email protected] http://lists.openshift.redhat.com/openshiftmm/listinfo/users
