Re: logging-es errors: shards failed

2016-07-15 Thread Eric Wolinetz
The logging-ops instance will contain the logs from /var/log/messages* and
the "default", "openshift" and "openshift-infra" name spaces only.

On Fri, Jul 15, 2016 at 3:28 PM, Alex Wauck  wrote:

> I also tried to fetch the logs from our logging-ops ES instance.  That
> also met with failure.  Searching for "kubernetes_namespace_name: logging"
> there led to "No results found".
>
> On Fri, Jul 15, 2016 at 2:48 PM, Peter Portante 
> wrote:
>
>> Well, we don't send the ES logs back into ES itself.  I think that could
>> create a feedback loop that brings the whole thing down.
>> -peter
>>
>> On Fri, Jul 15, 2016 at 3:39 PM, Luke Meyer  wrote:
>> > They surely do. Although it would probably be easiest here to just get
>> them
>> > from `oc logs` against the ES pod, especially if we can't trust ES
>> storage.
>> >
>> > On Fri, Jul 15, 2016 at 3:26 PM, Peter Portante 
>> wrote:
>> >>
>> >> Eric, Luke,
>> >>
>> >> Do the logs from the ES instance itself flow into that ES instance?
>> >>
>> >> -peter
>> >>
>> >> On Fri, Jul 15, 2016 at 12:14 PM, Alex Wauck 
>> >> wrote:
>> >> > I'm not sure that I can.  I clicked the "Archive" link for the
>> >> > logging-es
>> >> > pod and then changed the query in Kibana to
>> "kubernetes_container_name:
>> >> > logging-es-cycd8veb && kubernetes_namespace_name: logging".  I got no
>> >> > results, instead getting this error:
>> >> >
>> >> > Index:
>> unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.12
>> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution
>> (queue
>> >> > capacity 1000) on
>> >> >
>> >> >
>> org.elasticsearch.search.action.SearchServiceTransportAction$23@6b1f2699]
>> >> > Index:
>> unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.14
>> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution
>> (queue
>> >> > capacity 1000) on
>> >> >
>> >> >
>> org.elasticsearch.search.action.SearchServiceTransportAction$23@66b9a5fb]
>> >> > Index:
>> unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.15
>> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution
>> (queue
>> >> > capacity 1000) on
>> >> >
>> org.elasticsearch.search.action.SearchServiceTransportAction$23@512820e]
>> >> > Index:
>> unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.29
>> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution
>> (queue
>> >> > capacity 1000) on
>> >> >
>> >> >
>> org.elasticsearch.search.action.SearchServiceTransportAction$23@3dce96b9]
>> >> > Index:
>> unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.30
>> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution
>> (queue
>> >> > capacity 1000) on
>> >> >
>> >> >
>> org.elasticsearch.search.action.SearchServiceTransportAction$23@2f774477]
>> >> >
>> >> > When I initially clicked the "Archive" link, I saw a lot of messages
>> >> > with
>> >> > the kubernetes_container_name "logging-fluentd", which is not what I
>> >> > expected to see.
>> >> >
>> >> >
>> >> > On Fri, Jul 15, 2016 at 10:44 AM, Peter Portante <
>> pport...@redhat.com>
>> >> > wrote:
>> >> >>
>> >> >> Can you go back further in the logs to the point where the errors
>> >> >> started?
>> >> >>
>> >> >> I am thinking about possible Java HEAP issues, or possibly ES
>> >> >> restarting for some reason.
>> >> >>
>> >> >> -peter
>> >> >>
>> >> >> On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček 
>> >> >> wrote:
>> >> >> > Also looking at this.
>> >> >> > Alex, is it possible to investigate if you were having some kind
>> of
>> >> >> > network connection issues in the ES cluster (I mean between
>> >> >> > individual
>> >> >> > cluster nodes)?
>> >> >> >
>> >> >> > Regards,
>> >> >> > Lukáš
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >> On 15 Jul 2016, at 17:08, Peter Portante 
>> >> >> >> wrote:
>> >> >> >>
>> >> >> >> Just catching up on the thread, will get back to you all in a few
>> >> >> >> ...
>> >> >> >>
>> >> >> >> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz
>> >> >> >> 
>> >> >> >> wrote:
>> >> >> >>> Adding Lukas and Peter
>> >> >> >>>
>> >> >> >>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer 
>> >> >> >>> wrote:
>> >> >> 
>> >> >>  I believe the "queue capacity" there is the number of parallel
>> >> >>  searches
>> >> >>  that can be queued while the existing search workers operate.
>> It
>> >> >>  sounds like
>> >> >>  it has plenty of capacity there and it has a different reason
>> for
>> >> >>  rejecting
>> >> >>  the query. I would guess the data requested is missing given it
>> >> >>  couldn't
>> >> >>  fetch shards it expected to.
>> >> >> 
>> >> >>  The number of shards is a multiple (for redundancy) of the
>> number
>> >> >>  of
>> >> >>  indices, and there is an index created per 

Re: logging-es errors: shards failed

2016-07-15 Thread Alex Wauck
I also tried to fetch the logs from our logging-ops ES instance.  That also
met with failure.  Searching for "kubernetes_namespace_name: logging" there
lead to "No results found".

On Fri, Jul 15, 2016 at 2:48 PM, Peter Portante  wrote:

> Well, we don't send the ES logs back into ES itself.  I think that could
> create a feedback loop that brings the whole thing down.
> -peter
>
> On Fri, Jul 15, 2016 at 3:39 PM, Luke Meyer  wrote:
> > They surely do. Although it would probably be easiest here to just get
> them
> > from `oc logs` against the ES pod, especially if we can't trust ES
> storage.
> >
> > On Fri, Jul 15, 2016 at 3:26 PM, Peter Portante 
> wrote:
> >>
> >> Eric, Luke,
> >>
> >> Do the logs from the ES instance itself flow into that ES instance?
> >>
> >> -peter
> >>
> >> On Fri, Jul 15, 2016 at 12:14 PM, Alex Wauck 
> >> wrote:
> >> > I'm not sure that I can.  I clicked the "Archive" link for the
> >> > logging-es
> >> > pod and then changed the query in Kibana to
> "kubernetes_container_name:
> >> > logging-es-cycd8veb && kubernetes_namespace_name: logging".  I got no
> >> > results, instead getting this error:
> >> >
> >> > Index:
> unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.12
> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution
> (queue
> >> > capacity 1000) on
> >> >
> >> >
> org.elasticsearch.search.action.SearchServiceTransportAction$23@6b1f2699]
> >> > Index:
> unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.14
> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution
> (queue
> >> > capacity 1000) on
> >> >
> >> >
> org.elasticsearch.search.action.SearchServiceTransportAction$23@66b9a5fb]
> >> > Index:
> unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.15
> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution
> (queue
> >> > capacity 1000) on
> >> >
> org.elasticsearch.search.action.SearchServiceTransportAction$23@512820e]
> >> > Index:
> unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.29
> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution
> (queue
> >> > capacity 1000) on
> >> >
> >> >
> org.elasticsearch.search.action.SearchServiceTransportAction$23@3dce96b9]
> >> > Index:
> unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.30
> >> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution
> (queue
> >> > capacity 1000) on
> >> >
> >> >
> org.elasticsearch.search.action.SearchServiceTransportAction$23@2f774477]
> >> >
> >> > When I initially clicked the "Archive" link, I saw a lot of messages
> >> > with
> >> > the kubernetes_container_name "logging-fluentd", which is not what I
> >> > expected to see.
> >> >
> >> >
> >> > On Fri, Jul 15, 2016 at 10:44 AM, Peter Portante  >
> >> > wrote:
> >> >>
> >> >> Can you go back further in the logs to the point where the errors
> >> >> started?
> >> >>
> >> >> I am thinking about possible Java HEAP issues, or possibly ES
> >> >> restarting for some reason.
> >> >>
> >> >> -peter
> >> >>
> >> >> On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček 
> >> >> wrote:
> >> >> > Also looking at this.
> >> >> > Alex, is it possible to investigate if you were having some kind of
> >> >> > network connection issues in the ES cluster (I mean between
> >> >> > individual
> >> >> > cluster nodes)?
> >> >> >
> >> >> > Regards,
> >> >> > Lukáš
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >> On 15 Jul 2016, at 17:08, Peter Portante 
> >> >> >> wrote:
> >> >> >>
> >> >> >> Just catching up on the thread, will get back to you all in a few
> >> >> >> ...
> >> >> >>
> >> >> >> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz
> >> >> >> 
> >> >> >> wrote:
> >> >> >>> Adding Lukas and Peter
> >> >> >>>
> >> >> >>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer 
> >> >> >>> wrote:
> >> >> 
> >> >>  I believe the "queue capacity" there is the number of parallel
> >> >>  searches
> >> >>  that can be queued while the existing search workers operate. It
> >> >>  sounds like
> >> >>  it has plenty of capacity there and it has a different reason
> for
> >> >>  rejecting
> >> >>  the query. I would guess the data requested is missing given it
> >> >>  couldn't
> >> >>  fetch shards it expected to.
> >> >> 
> >> >>  The number of shards is a multiple (for redundancy) of the
> number
> >> >>  of
> >> >>  indices, and there is an index created per project per day. So
> >> >>  even
> >> >>  for a
> >> >>  small cluster this doesn't sound out of line.
> >> >> 
> >> >>  Can you give a little more information about your logging
> >> >>  deployment?
> >> >>  Have
> >> >>  you deployed multiple ES nodes for redundancy, and what are you
> >> >>  using
> >> >>  for
> >> >>  

Re: logging-es errors: shards failed

2016-07-15 Thread Luke Meyer
They surely do. Although it would probably be easiest here to just get them
from `oc logs` against the ES pod, especially if we can't trust ES storage.
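
For example, a minimal sketch (pod name is a placeholder):

    # Find the ES pod(s) in the logging project.
    oc get pods -n logging | grep logging-es

    # Dump the container's stdout/stderr directly, bypassing ES storage.
    oc logs -n logging <logging-es-pod> > es.log

    # If the pod restarted, the previous container's log may hold the errors.
    oc logs -n logging --previous <logging-es-pod> > es-previous.log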

On Fri, Jul 15, 2016 at 3:26 PM, Peter Portante  wrote:

> Eric, Luke,
>
> Do the logs from the ES instance itself flow into that ES instance?
>
> -peter
>
> On Fri, Jul 15, 2016 at 12:14 PM, Alex Wauck 
> wrote:
> > I'm not sure that I can.  I clicked the "Archive" link for the logging-es
> > pod and then changed the query in Kibana to "kubernetes_container_name:
> > logging-es-cycd8veb && kubernetes_namespace_name: logging".  I got no
> > results, instead getting this error:
> >
> > Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.12
> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
> > capacity 1000) on
> > org.elasticsearch.search.action.SearchServiceTransportAction$23@6b1f2699
> ]
> > Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.14
> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
> > capacity 1000) on
> > org.elasticsearch.search.action.SearchServiceTransportAction$23@66b9a5fb
> ]
> > Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.15
> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
> > capacity 1000) on
> > org.elasticsearch.search.action.SearchServiceTransportAction$23@512820e]
> > Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.29
> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
> > capacity 1000) on
> > org.elasticsearch.search.action.SearchServiceTransportAction$23@3dce96b9
> ]
> > Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.30
> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
> > capacity 1000) on
> > org.elasticsearch.search.action.SearchServiceTransportAction$23@2f774477
> ]
> >
> > When I initially clicked the "Archive" link, I saw a lot of messages with
> > the kubernetes_container_name "logging-fluentd", which is not what I
> > expected to see.
> >
> >
> > On Fri, Jul 15, 2016 at 10:44 AM, Peter Portante 
> > wrote:
> >>
> >> Can you go back further in the logs to the point where the errors
> started?
> >>
> >> I am thinking about possible Java HEAP issues, or possibly ES
> >> restarting for some reason.
> >>
> >> -peter
> >>
> >> On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček 
> wrote:
> >> > Also looking at this.
> >> > Alex, is it possible to investigate if you were having some kind of
> >> > network connection issues in the ES cluster (I mean between individual
> >> > cluster nodes)?
> >> >
> >> > Regards,
> >> > Lukáš
> >> >
> >> >
> >> >
> >> >
> >> >> On 15 Jul 2016, at 17:08, Peter Portante 
> wrote:
> >> >>
> >> >> Just catching up on the thread, will get back to you all in a few ...
> >> >>
> >> >> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz  >
> >> >> wrote:
> >> >>> Adding Lukas and Peter
> >> >>>
> >> >>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer 
> wrote:
> >> 
> >>  I believe the "queue capacity" there is the number of parallel
> >>  searches
> >>  that can be queued while the existing search workers operate. It
> >>  sounds like
> >>  it has plenty of capacity there and it has a different reason for
> >>  rejecting
> >>  the query. I would guess the data requested is missing given it
> >>  couldn't
> >>  fetch shards it expected to.
> >> 
> >>  The number of shards is a multiple (for redundancy) of the number
> of
> >>  indices, and there is an index created per project per day. So even
> >>  for a
> >>  small cluster this doesn't sound out of line.
> >> 
> >>  Can you give a little more information about your logging
> deployment?
> >>  Have
> >>  you deployed multiple ES nodes for redundancy, and what are you
> using
> >>  for
> >>  storage? Could you attach full ES logs? How many OpenShift nodes
> and
> >>  projects do you have? Any history of events that might have
> resulted
> >>  in lost
> >>  data?
> >> 
> >>  On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck  >
> >>  wrote:
> >> >
> >> > When doing searches in Kibana, I get error messages similar to
> >> > "Courier
> >> > Fetch: 919 of 2020 shards failed".  Deeper inspection reveals
> errors
> >> > like
> >> > this: "EsRejectedExecutionException[rejected execution (queue
> >> > capacity 1000)
> >> > on
> >> >
> >> >
> org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e
> ]".
> >> >
> >> > A bit of investigation led me to conclude that our Elasticsearch
> >> > server
> >> > was not sufficiently powerful, so I spun up a new one with four
> >> > times the
> >> > CPU 

Re: logging-es errors: shards failed

2016-07-15 Thread Peter Portante
Eric, Luke,

Do the logs from the ES instance itself flow into that ES instance?

-peter

On Fri, Jul 15, 2016 at 12:14 PM, Alex Wauck  wrote:
> I'm not sure that I can.  I clicked the "Archive" link for the logging-es
> pod and then changed the query in Kibana to "kubernetes_container_name:
> logging-es-cycd8veb && kubernetes_namespace_name: logging".  I got no
> results, instead getting this error:
>
> Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.12
> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
> capacity 1000) on
> org.elasticsearch.search.action.SearchServiceTransportAction$23@6b1f2699]
> Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.14
> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
> capacity 1000) on
> org.elasticsearch.search.action.SearchServiceTransportAction$23@66b9a5fb]
> Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.15
> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
> capacity 1000) on
> org.elasticsearch.search.action.SearchServiceTransportAction$23@512820e]
> Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.29
> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
> capacity 1000) on
> org.elasticsearch.search.action.SearchServiceTransportAction$23@3dce96b9]
> Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.30
> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
> capacity 1000) on
> org.elasticsearch.search.action.SearchServiceTransportAction$23@2f774477]
>
> When I initially clicked the "Archive" link, I saw a lot of messages with
> the kubernetes_container_name "logging-fluentd", which is not what I
> expected to see.
>
>
> On Fri, Jul 15, 2016 at 10:44 AM, Peter Portante 
> wrote:
>>
>> Can you go back further in the logs to the point where the errors started?
>>
>> I am thinking about possible Java HEAP issues, or possibly ES
>> restarting for some reason.
>>
>> -peter
>>
>> On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček  wrote:
>> > Also looking at this.
>> > Alex, is it possible to investigate if you were having some kind of
>> > network connection issues in the ES cluster (I mean between individual
>> > cluster nodes)?
>> >
>> > Regards,
>> > Lukáš
>> >
>> >
>> >
>> >
>> >> On 15 Jul 2016, at 17:08, Peter Portante  wrote:
>> >>
>> >> Just catching up on the thread, will get back to you all in a few ...
>> >>
>> >> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz 
>> >> wrote:
>> >>> Adding Lukas and Peter
>> >>>
>> >>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer  wrote:
>> 
>>  I believe the "queue capacity" there is the number of parallel
>>  searches
>>  that can be queued while the existing search workers operate. It
>>  sounds like
>>  it has plenty of capacity there and it has a different reason for
>>  rejecting
>>  the query. I would guess the data requested is missing given it
>>  couldn't
>>  fetch shards it expected to.
>> 
>>  The number of shards is a multiple (for redundancy) of the number of
>>  indices, and there is an index created per project per day. So even
>>  for a
>>  small cluster this doesn't sound out of line.
>> 
>>  Can you give a little more information about your logging deployment?
>>  Have
>>  you deployed multiple ES nodes for redundancy, and what are you using
>>  for
>>  storage? Could you attach full ES logs? How many OpenShift nodes and
>>  projects do you have? Any history of events that might have resulted
>>  in lost
>>  data?
>> 
>>  On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck 
>>  wrote:
>> >
>> > When doing searches in Kibana, I get error messages similar to
>> > "Courier
>> > Fetch: 919 of 2020 shards failed".  Deeper inspection reveals errors
>> > like
>> > this: "EsRejectedExecutionException[rejected execution (queue
>> > capacity 1000)
>> > on
>> >
>> > org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e]".
>> >
>> > A bit of investigation led me to conclude that our Elasticsearch
>> > server
>> > was not sufficiently powerful, so I spun up a new one with four
>> > times the
>> > CPU and RAM of the original one, but the queue capacity is still
>> > only 1000.
>> > Also, 2020 seems like a really ridiculous number of shards.  Any
>> > idea what's
>> > going on here?
>> >
>> > --
>> >
>> > Alex Wauck // DevOps Engineer
>> >
>> > E X O S I T E
>> > www.exosite.com
>> >
>> > Making Machines More Human.
>> >
>> >
>> > ___
>> > users mailing list
>> > 

Re: Atomic Centos, can't upgrade

2016-07-15 Thread Colin Walters
On Mon, Jul 11, 2016, at 09:56 AM, Scott Dodson wrote:
> That commit is mostly related to the fact that we cannot
> upgrade/downgrade docker on atomic host like we can on RHEL, so we abort
> the docker upgrade playbook early.

As a short-term fix, it is, however, possible to use `atomic host deploy` to
reset to an earlier known version.  But it's not a long-term solution, because
that also means one isn't getting kernel security updates and such.
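
A minimal sketch of that rollback flow (the version string below is only a
placeholder — check `atomic host status` for the deployments and versions that
are actually available):

    # Show the booted deployment and any rollback deployment.
    atomic host status

    # Either flip back to the previous deployment...
    atomic host rollback

    # ...or pin an explicit known-good version, then reboot into it.
    atomic host deploy 7.20160700   # placeholder version
    systemctl reboot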

We're working on new mechanisms addressing the privileged/system container case.

___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


Re: logging-es errors: shards failed

2016-07-15 Thread Alex Wauck
I'm not sure that I can.  I clicked the "Archive" link for the logging-es
pod and then changed the query in Kibana to "kubernetes_container_name:
logging-es-cycd8veb && kubernetes_namespace_name: logging".  I got no
results, instead getting this error:


   - Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.12
     Shard: 2
     Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000)
     on org.elasticsearch.search.action.SearchServiceTransportAction$23@6b1f2699]
   - Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.14
     Shard: 2
     Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000)
     on org.elasticsearch.search.action.SearchServiceTransportAction$23@66b9a5fb]
   - Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.15
     Shard: 2
     Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000)
     on org.elasticsearch.search.action.SearchServiceTransportAction$23@512820e]
   - Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.29
     Shard: 2
     Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000)
     on org.elasticsearch.search.action.SearchServiceTransportAction$23@3dce96b9]
   - Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.30
     Shard: 2
     Reason: EsRejectedExecutionException[rejected execution (queue capacity 1000)
     on org.elasticsearch.search.action.SearchServiceTransportAction$23@2f774477]

When I initially clicked the "Archive" link, I saw a lot of messages with
the kubernetes_container_name "logging-fluentd", which is not what I
expected to see.
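
If it helps to take Kibana out of the loop, a sketch of running roughly the same
query straight against ES — the pod name, the Searchguard cert paths, and the
logging.* index pattern are assumptions about this deployment:

    # Lucene query-string equivalent of the Kibana search above.
    oc exec -n logging <logging-es-pod> -- \
      curl -s --cacert /etc/elasticsearch/secret/admin-ca \
           --cert /etc/elasticsearch/secret/admin-cert \
           --key /etc/elasticsearch/secret/admin-key \
           'https://localhost:9200/logging.*/_search?pretty&size=5&q=kubernetes_container_name:logging-es-cycd8veb%20AND%20kubernetes_namespace_name:logging'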


On Fri, Jul 15, 2016 at 10:44 AM, Peter Portante 
wrote:

> Can you go back further in the logs to the point where the errors started?
>
> I am thinking about possible Java HEAP issues, or possibly ES
> restarting for some reason.
>
> -peter
>
> On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček  wrote:
> > Also looking at this.
> > Alex, is it possible to investigate if you were having some kind of
> network connection issues in the ES cluster (I mean between individual
> cluster nodes)?
> >
> > Regards,
> > Lukáš
> >
> >
> >
> >
> >> On 15 Jul 2016, at 17:08, Peter Portante  wrote:
> >>
> >> Just catching up on the thread, will get back to you all in a few ...
> >>
> >> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz 
> wrote:
> >>> Adding Lukas and Peter
> >>>
> >>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer  wrote:
> 
>  I believe the "queue capacity" there is the number of parallel
> searches
>  that can be queued while the existing search workers operate. It
> sounds like
>  it has plenty of capacity there and it has a different reason for
> rejecting
>  the query. I would guess the data requested is missing given it
> couldn't
>  fetch shards it expected to.
> 
>  The number of shards is a multiple (for redundancy) of the number of
>  indices, and there is an index created per project per day. So even
> for a
>  small cluster this doesn't sound out of line.
> 
>  Can you give a little more information about your logging deployment?
> Have
>  you deployed multiple ES nodes for redundancy, and what are you using
> for
>  storage? Could you attach full ES logs? How many OpenShift nodes and
>  projects do you have? Any history of events that might have resulted
> in lost
>  data?
> 
>  On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck 
> wrote:
> >
> > When doing searches in Kibana, I get error messages similar to
> "Courier
> > Fetch: 919 of 2020 shards failed".  Deeper inspection reveals errors
> like
> > this: "EsRejectedExecutionException[rejected execution (queue
> capacity 1000)
> > on
> >
> org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e
> ]".
> >
> > A bit of investigation led me to conclude that our Elasticsearch
> server
> > was not sufficiently powerful, so I spun up a new one with four
> times the
> > CPU and RAM of the original one, but the queue capacity is still
> only 1000.
> > Also, 2020 seems like a really ridiculous number of shards.  Any
> idea what's
> > going on here?
> >
> > --
> >
> > Alex Wauck // DevOps Engineer
> >
> > E X O S I T E
> > www.exosite.com
> >
> > Making Machines More Human.
> >
> >
> > ___
> > users mailing list
> > users@lists.openshift.redhat.com
> > http://lists.openshift.redhat.com/openshiftmm/listinfo/users
> >
> 
> 
>  ___
>  users mailing list
>  users@lists.openshift.redhat.com
>  

Re: Atomic Centos, can't upgrade

2016-07-15 Thread Philippe Lafoucrière
https://docs.openshift.org/latest/dev_guide/shared_memory.html

fixed the issue, but it seems something changed regarding /dev or shm
docker mounts between 1.2.0 and 1.2.1.
Can someone confirm?
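
If it helps narrow it down, a quick way to compare how /dev/shm is mounted
inside a pod on 1.2.0 vs. 1.2.1 (pod name is a placeholder):

    oc exec <pod-name> -- df -h /dev/shm
    oc exec <pod-name> -- mount | grep shm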
___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


Re: logging-es errors: shards failed

2016-07-15 Thread Peter Portante
Can you go back further in the logs to the point where the errors started?

I am thinking about possible Java HEAP issues, or possibly ES
restarting for some reason.

-peter
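
A couple of quick checks along those lines (names are placeholders; this is just
a sketch):

    # A non-zero restart count hints at OOM kills or crashes of the ES container.
    oc get pods -n logging | grep logging-es

    # If it did restart, the previous container's log usually shows why
    # (OutOfMemoryError, long GC pauses, etc.).
    oc logs -n logging --previous <logging-es-pod>

    # The last termination state/reason (e.g. OOMKilled) also shows up here.
    oc describe pod -n logging <logging-es-pod> | grep -i -A 5 'last state'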

On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček  wrote:
> Also looking at this.
> Alex, is it possible to investigate if you were having some kind of network 
> connection issues in the ES cluster (I mean between individual cluster nodes)?
>
> Regards,
> Lukáš
>
>
>
>
>> On 15 Jul 2016, at 17:08, Peter Portante  wrote:
>>
>> Just catching up on the thread, will get back to you all in a few ...
>>
>> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz  wrote:
>>> Adding Lukas and Peter
>>>
>>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer  wrote:

 I believe the "queue capacity" there is the number of parallel searches
 that can be queued while the existing search workers operate. It sounds 
 like
 it has plenty of capacity there and it has a different reason for rejecting
 the query. I would guess the data requested is missing given it couldn't
 fetch shards it expected to.

 The number of shards is a multiple (for redundancy) of the number of
 indices, and there is an index created per project per day. So even for a
 small cluster this doesn't sound out of line.

 Can you give a little more information about your logging deployment? Have
 you deployed multiple ES nodes for redundancy, and what are you using for
 storage? Could you attach full ES logs? How many OpenShift nodes and
 projects do you have? Any history of events that might have resulted in 
 lost
 data?

 On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck  wrote:
>
> When doing searches in Kibana, I get error messages similar to "Courier
> Fetch: 919 of 2020 shards failed".  Deeper inspection reveals errors like
> this: "EsRejectedExecutionException[rejected execution (queue capacity 
> 1000)
> on
> org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e]".
>
> A bit of investigation led me to conclude that our Elasticsearch server
> was not sufficiently powerful, so I spun up a new one with four times the
> CPU and RAM of the original one, but the queue capacity is still only 
> 1000.
> Also, 2020 seems like a really ridiculous number of shards.  Any idea 
> what's
> going on here?
>
> --
>
> Alex Wauck // DevOps Engineer
>
> E X O S I T E
> www.exosite.com
>
> Making Machines More Human.
>
>
> ___
> users mailing list
> users@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>


 ___
 users mailing list
 users@lists.openshift.redhat.com
 http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>

___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


Re: logging-es errors: shards failed

2016-07-15 Thread Eric Wolinetz
Adding Lukas and Peter

On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer  wrote:

> I believe the "queue capacity" there is the number of parallel searches
> that can be queued while the existing search workers operate. It sounds
> like it has plenty of capacity there and it has a different reason for
> rejecting the query. I would guess the data requested is missing given it
> couldn't fetch shards it expected to.
>
> The number of shards is a multiple (for redundancy) of the number of
> indices, and there is an index created per project per day. So even for a
> small cluster this doesn't sound out of line.
>
> Can you give a little more information about your logging deployment? Have
> you deployed multiple ES nodes for redundancy, and what are you using for
> storage? Could you attach full ES logs? How many OpenShift nodes and
> projects do you have? Any history of events that might have resulted in
> lost data?
>
> On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck  wrote:
>
>> When doing searches in Kibana, I get error messages similar to "Courier
>> Fetch: 919 of 2020 shards failed".  Deeper inspection reveals errors like
>> this: "EsRejectedExecutionException[rejected execution (queue capacity
>> 1000) on
>> org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e
>> ]".
>>
>> A bit of investigation led me to conclude that our Elasticsearch server
>> was not sufficiently powerful, so I spun up a new one with four times the
>> CPU and RAM of the original one, but the queue capacity is still only
>> 1000.  Also, 2020 seems like a really ridiculous number of shards.  Any
>> idea what's going on here?
>>
>> --
>>
>> Alex Wauck // DevOps Engineer
>>
>> E X O S I T E
>> www.exosite.com
>>
>> Making Machines More Human.
>>
>>
>> ___
>> users mailing list
>> users@lists.openshift.redhat.com
>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>
>>
>
> ___
> users mailing list
> users@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>
___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


Re: Atomic Centos, can't upgrade

2016-07-15 Thread Philippe Lafoucrière
I confirm: it's fixed :)
thanks!
___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


Re: logging-es errors: shards failed

2016-07-15 Thread Luke Meyer
I believe the "queue capacity" there is the number of parallel searches
that can be queued while the existing search workers operate. It sounds
like it has plenty of capacity there and it has a different reason for
rejecting the query. I would guess the data requested is missing given it
couldn't fetch shards it expected to.

The number of shards is a multiple (for redundancy) of the number of
indices, and there is an index created per project per day. So even for a
small cluster this doesn't sound out of line.

Can you give a little more information about your logging deployment? Have
you deployed multiple ES nodes for redundancy, and what are you using for
storage? Could you attach full ES logs? How many OpenShift nodes and
projects do you have? Any history of events that might have resulted in
lost data?
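
To put numbers on both of those, a sketch of checking the search thread pool and
the index/shard counts directly — the pod name and Searchguard cert paths are
assumptions about this deployment, and the search queue size itself comes from
threadpool.search.queue_size in elasticsearch.yml (1000 is the ES default):

    # Cert flags for the TLS-protected ES API (paths are assumptions).
    CURL="curl -s --cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key"

    # Search thread pool: active threads, queue depth, and rejected count per node.
    oc exec -n logging <logging-es-pod> -- $CURL 'https://localhost:9200/_cat/thread_pool?v'

    # Index and shard counts; UNASSIGNED shards point at missing data rather than load.
    oc exec -n logging <logging-es-pod> -- $CURL 'https://localhost:9200/_cat/indices?v'
    oc exec -n logging <logging-es-pod> -- $CURL 'https://localhost:9200/_cat/shards?v' | grep UNASSIGNED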

On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck  wrote:

> When doing searches in Kibana, I get error messages similar to "Courier
> Fetch: 919 of 2020 shards failed".  Deeper inspection reveals errors like
> this: "EsRejectedExecutionException[rejected execution (queue capacity
> 1000) on
> org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e
> ]".
>
> A bit of investigation led me to conclude that our Elasticsearch server
> was not sufficiently powerful, so I spun up a new one with four times the
> CPU and RAM of the original one, but the queue capacity is still only
> 1000.  Also, 2020 seems like a really ridiculous number of shards.  Any
> idea what's going on here?
>
> --
>
> Alex Wauck // DevOps Engineer
>
> E X O S I T E
> www.exosite.com
>
> Making Machines More Human.
>
>
> ___
> users mailing list
> users@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>
>
___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users