Hi Michael,

(Resending this email, as I found another type of error in the logs.)

Thank you very much for the help. I tried the full system re-index API, but I am still seeing the issues below:
1. One of the ES indexes (usergrid_applications_4) still has 15 shards in the "INITIALIZING" state, and the ES cluster health is red. The log extract from ES (details in the appendices) shows: "[usergrid_applications_4][14] failed to start shard".

2. The Usergrid logs show the below types of errors after invoking the re-index API (details in the appendices):

Type 1
UnavailableShardsException[[usergrid_applications_4][7] Primary shard is not active or isn't assigned is a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@155fc46c]

Type 2
corepersistence.asyncevents.AmazonAsyncEventService.lambda$callEventHandlers$65(359)<Usergrid-SQS-Pool-13>- Failed to index message: 17ed55a5-3091-4f0d-8620-12f2915668c1

Type 3
corepersistence.asyncevents.AmazonAsyncEventService.lambda$null$70(735)<Usergrid-SQS-Pool-16>- Missing messages from queue post operation

Type 4
core.executor.TaskExecutorFactory.rejectedExecution(171)<QueueConsumer_11>- Usergrid-SQS-Pool task queue full, rejecting task rx.schedulers.ExecutorScheduler$ExecutorSchedulerWorker@131a4c5 and running in thread QueueConsumer_11

Type 5
EsRejectedExecutionException[rejected execution (queue capacity 300) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@13afccc8 ];

Please suggest the next steps. I appreciate the help.

Many Thanks,
Harish

-------------------------------------------------------------------
*Appendices*

*1. ELASTICSEARCH LOG*

Jan 25 17:10:23 Elasticsearch elasticsearch.log: [2016-01-25 06:40:44,227][WARN ][indices.cluster ] [Blindside] [usergrid_applications_4][14] failed to start shard
Jan 25 17:10:23 Elasticsearch elasticsearch.log: org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [usergrid_applications_4][14] failed to recover shard
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:287)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at java.lang.Thread.run(Thread.java:745)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: Caused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog corruption while reading from stream
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:70)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:257)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: ... 4 more
Jan 25 17:10:23 Elasticsearch elasticsearch.log: Caused by: java.io.EOFException
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.common.io.stream.InputStreamStreamInput.readBytes(InputStreamStreamInput.java:53)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.translog.BufferedChecksumStreamInput.readBytes(BufferedChecksumStreamInput.java:55)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.common.io.stream.StreamInput.readBytesReference(StreamInput.java:86)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.common.io.stream.StreamInput.readBytesReference(StreamInput.java:74)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.translog.Translog$Index.readFrom(Translog.java:495)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:68)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: ... 5 more
Jan 25 17:10:23 Elasticsearch elasticsearch.log: [2016-01-25 06:40:44,279][WARN ][cluster.action.shard ] [Blindside] [usergrid_applications_4][14] sending failed shard for [usergrid_applications_4][14], node[lb-HRQpWRQGCeIadzTEHSw], [P], s[INITIALIZING], indexUUID [fdyaoJQZQKuFeBONTQSD1g], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[usergrid_applications_4][14] failed to recover shard]; nested: TranslogCorruptedException[translog corruption while reading from stream]; nested: EOFException; ]]
Jan 25 17:10:23 Elasticsearch elasticsearch.log: [2016-01-25 06:40:44,279][WARN ][cluster.action.shard ] [Blindside] [usergrid_applications_4][14] received shard failed for [usergrid_applications_4][14], node[lb-HRQpWRQGCeIadzTEHSw], [P], s[INITIALIZING], indexUUID [fdyaoJQZQKuFeBONTQSD1g], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[usergrid_applications_4][14] failed to recover shard]; nested: TranslogCorruptedException[translog corruption while reading from stream]; nested: EOFException; ]]
......
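(For reference, the stuck shards and their recovery progress can be listed straight from ES; the commands below assume the default HTTP port 9200 on the ES node.)

*Cluster health, per index:*

curl -s "http://localhost:9200/_cluster/health?level=indices&pretty"

*Shards that are not yet STARTED:*

curl -s "http://localhost:9200/_cat/shards?v" | grep -v STARTED

*Recovery progress:*

curl -s "http://localhost:9200/_cat/recovery?v"

The trace above shows the primary failing to replay a truncated translog (TranslogCorruptedException caused by EOFException), which is consistent with the earlier disk-full incident, so those shards will keep failing recovery no matter how often the re-index runs. In ES 1.x the commonly cited workaround was to stop the node, move the damaged translog files out of the affected shard's translog directory (back up the data directory first), and restart so the shard can start without them; the operations lost that way would then be restored by the Cassandra re-index.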
*2. USERGRID LOG*

Jan 25 16:05:19 Usergrid-2 catalina.out: 2016-01-25 05:35:19 INFO rest.system.IndexResource.rebuildIndexesPost(78)<http-bio-80-exec-92>- Rebuilding all applications
Jan 25 16:05:19 Usergrid-2 catalina.out: 2016-01-25 05:35:19 INFO corepersistence.index.ReIndexServiceImpl.lambda$rebuildIndex$97(131)<RxCachedThreadScheduler-35>- Sending batch of 1000 to be indexed.
Jan 25 16:05:21 Usergrid-2 catalina.out: 2016-01-25 05:35:20 ERROR index.impl.EsIndexProducerImpl.sendRequest(209)<Usergrid-SQS-Pool-13>- Unable to index id=appId(cd2bd460-a3e8-11e5-a327-0a75091e6d25,application).entityId(7c3328cc-bdd1-11e5-88d3-0a75091e6d25,activity).version(7c3328cd-bdd1-11e5-88d3-0a75091e6d25).nodeId(99400999-a3ef-11e5-a327-0a75091e6d25,group).edgeName(zzzcollzzz|feed).nodeType(TARGET), type=entity, index=usergrid_applications_4, failureMessage=UnavailableShardsException[[usergrid_applications_4][4] Primary shard is not active or isn't assigned is a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@3d277a1]
.....
Jan 25 16:05:21 Usergrid-2 catalina.out: 2016-01-25 05:35:20 ERROR corepersistence.asyncevents.AmazonAsyncEventService.lambda$callEventHandlers$65(359)<Usergrid-SQS-Pool-13>- Failed to index message: 17ed55a5-3091-4f0d-8620-12f2915668c1
Jan 25 16:05:21 Usergrid-2 catalina.out: java.lang.RuntimeException: Error during processing of bulk index operations one of the responses failed.
Jan 25 16:05:21 Usergrid-2 catalina.out: UnavailableShardsException[[usergrid_applications_4][4] Primary shard is not active or isn't assigned is a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@3d277a1]
Jan 25 16:05:21 Usergrid-2 catalina.out: UnavailableShardsException[[usergrid_applications_4][7] Primary shard is not active or isn't assigned is a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@155fc46c]
Jan 25 16:05:21 Usergrid-2 catalina.out: UnavailableShardsException[[usergrid_applications_4][10] Primary shard is not active or isn't assigned is a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@445968e3]
.......
Jan 25 16:53:02 Usergrid-2 catalina.out: 2016-01-25 06:23:02 ERROR index.impl.EsIndexProducerImpl.sendRequest(209)<Usergrid-SQS-Pool-97>- Unable to index id=appId(65ccc2b7-bde0-11e5-88d3-0a75091e6d25,application).entityId(b7dcf6c9-bf62-11e5-88d3-0a75091e6d25,activity).version(b7dcf6cb-bf62-11e5-88d3-0a75091e6d25).nodeId(65ccc2b7-bde0-11e5-88d3-0a75091e6d25,application).edgeName(zzzcollzzz|activities).nodeType(TARGET), type=entity, index=usergrid_applications_3, failureMessage=RemoteTransportException[[Blindside][inet[/10.0.0.148:9300]][indices:data/write/bulk[s]]]; nested: EsRejectedExecutionException[rejected execution (queue capacity 300) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@13afccc8 ];
Jan 25 16:53:03 Usergrid-2 catalina.out: RemoteTransportException[[Blindside][inet[/10.0.0.148:9300]][indices:data/write/bulk[s]]]; nested: EsRejectedExecutionException[rejected execution (queue capacity 300) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@59da0cf2 ];
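(A note on the rejected-execution errors above: EsRejectedExecutionException with "queue capacity 300" means the ES bulk thread pool's queue overflowed while the re-index flooded the node with bulk requests. This explains only the back-pressure errors, Types 4 and 5, not the unassigned primaries. One possible mitigation, assuming ES 1.x settings, is to raise the bulk queue size in elasticsearch.yml and restart the node, or to throttle the re-index batch rate on the Usergrid side.)

*Example elasticsearch.yml setting (value is an assumption; tune to the node):*

threadpool.bulk.queue_size: 1000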
On Fri, Jan 22, 2016 at 10:29 PM, Michael Russo <[email protected]> wrote:

> Hi Harish,
>
> Yeah, in Usergrid 2 you can re-index all of the entity data that exists
> in Cassandra.
> Here are example curl requests that invoke the reindex APIs:
>
> *Full system re-index:*
>
> curl -i -X POST -u <sysadmin user>:<sysadmin pass> "http://localhost:8080/system/index/rebuild"
>
> *Per application re-index:*
>
> curl -i -X POST -u <sysadmin user>:<sysadmin pass> "http://localhost:8080/system/index/rebuild/<application uuid>"
>
> Thanks.
> -Michael
>
> On Fri, Jan 22, 2016 at 4:37 AM, Harish Singh Bisht <[email protected]> wrote:
>
>> Hi Team,
>>
>> We have been testing our application based on Usergrid 2 (master branch)
>> and started noticing unusually poor performance, with spikes in the
>> response time.
>>
>> Our investigation revealed that during the load testing we ran out of
>> HDD space on the single-node Elasticsearch cluster. This led to indexing
>> failures.
>>
>> So we increased the HDD space and restarted ES. But now the cluster
>> health is red and a lot of shards are in the INITIALIZING state. It seems
>> data has been lost on the ES node.
>>
>> Is there any way to recover the lost data in ES? Specifically, is there
>> a way to trigger a re-index of data from Cassandra to ES?
>>
>> Appreciate the help.
>>
>> Thanks
>> Harish
>>
>> --
>> Regards,
>> Harish Singh Bisht

--
Regards,
Harish Singh Bisht
