Hi Michael,

Thank you very much for the help. I tried the full-system re-index API, but
I am still seeing the following issues:

1. One of the ES indexes (usergrid_applications_4) still has 15 shards in
"INITIALIZING" state, and the ES cluster health is red. The log extract
from ES (details in the appendices) shows:
"[usergrid_applications_4][14] failed to start shard.".

2. The Usergrid logs show the following types of errors after the re-index
API is invoked (details in the appendices):
Type 1
UnavailableShardsException[[usergrid_applications_4][7] Primary shard is
not active or isn't assigned is a known node. Timeout: [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@155fc46c]
Type 2
corepersistence.asyncevents.AmazonAsyncEventService.lambda$callEventHandlers$65(359)<Usergrid-SQS-Pool-13>-
Failed to index message: 17ed55a5-3091-4f0d-8620-12f2915668c1
Type 3
corepersistence.asyncevents.AmazonAsyncEventService.lambda$null$70(735)<Usergrid-SQS-Pool-16>-
Missing messages from queue post operation
Type 4
core.executor.TaskExecutorFactory.rejectedExecution(171)<QueueConsumer_11>-
Usergrid-SQS-Pool task queue full, rejecting task
rx.schedulers.ExecutorScheduler$ExecutorSchedulerWorker@131a4c5 and running
in thread QueueConsumer_11
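For reference, here is how I have been inspecting the stuck shards and the cluster health (assuming ES is reachable on localhost:9200; adjust the host as needed):

```shell
# List the state of every shard in the affected index (?v adds a header row).
curl -s 'http://localhost:9200/_cat/shards/usergrid_applications_4?v'

# Cluster health broken down to shard-level detail.
curl -s 'http://localhost:9200/_cluster/health?level=shards&pretty'
```

Both commands confirm that shards 14 and others remain in INITIALIZING with the cluster red.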

Please suggest the next steps. I appreciate the help.

Many Thanks,
Harish



-------------------------------------------------------------------
*Appendices*
*1. ELASTICSEARCH LOG*
Jan 25 17:10:23 Elasticsearch elasticsearch.log:  [2016-01-25
06:40:44,227][WARN
][indices.cluster          ] [Blindside] [usergrid_applications_4][14]
failed to start shard
Jan 25 17:10:23 Elasticsearch elasticsearch.log:
 org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[usergrid_applications_4][14] failed to recover shard
Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:287)
Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
java.lang.Thread.run(Thread.java:745)
Jan 25 17:10:23 Elasticsearch elasticsearch.log:  Caused by:
org.elasticsearch.index.translog.TranslogCorruptedException: translog
corruption while reading from stream
Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:70)
Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:257)
Jan 25 17:10:23 Elasticsearch elasticsearch.log:   ... 4 more
Jan 25 17:10:23 Elasticsearch elasticsearch.log:  Caused by:
java.io.EOFException
Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
org.elasticsearch.common.io.stream.InputStreamStreamInput.readBytes(InputStreamStreamInput.java:53)
Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
org.elasticsearch.index.translog.BufferedChecksumStreamInput.readBytes(BufferedChecksumStreamInput.java:55)
Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
org.elasticsearch.common.io.stream.StreamInput.readBytesReference(StreamInput.java:86)
Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
org.elasticsearch.common.io.stream.StreamInput.readBytesReference(StreamInput.java:74)
Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
org.elasticsearch.index.translog.Translog$Index.readFrom(Translog.java:495)
Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:68)
Jan 25 17:10:23 Elasticsearch elasticsearch.log:   ... 5 more
Jan 25 17:10:23 Elasticsearch elasticsearch.log:  [2016-01-25
06:40:44,279][WARN
][cluster.action.shard     ] [Blindside] [usergrid_applications_4][14]
sending failed shard for [usergrid_applications_4][14],
node[lb-HRQpWRQGCeIadzTEHSw], [P], s[INITIALIZING], indexUUID
[fdyaoJQZQKuFeBONTQSD1g], reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[usergrid_applications_4][14] failed to
recover shard]; nested: TranslogCorruptedException[translog corruption
while reading from stream]; nested: EOFException; ]]
Jan 25 17:10:23 Elasticsearch elasticsearch.log:  [2016-01-25
06:40:44,279][WARN
][cluster.action.shard     ] [Blindside] [usergrid_applications_4][14]
received shard failed for [usergrid_applications_4][14],
node[lb-HRQpWRQGCeIadzTEHSw], [P], s[INITIALIZING], indexUUID
[fdyaoJQZQKuFeBONTQSD1g], reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[usergrid_applications_4][14] failed to
recover shard]; nested: TranslogCorruptedException[translog corruption
while reading from stream]; nested: EOFException; ]]
......
*2. USERGRID LOG*
Jan 25 16:05:19 Usergrid-2 catalina.out:  2016-01-25 05:35:19 INFO
rest.system.IndexResource.rebuildIndexesPost(78)<http-bio-80-exec-92>-
Rebuilding all applications
Jan 25 16:05:19 Usergrid-2 catalina.out:  2016-01-25 05:35:19 INFO
corepersistence.index.ReIndexServiceImpl.lambda$rebuildIndex$97(131)<RxCachedThreadScheduler-35>-
Sending batch of 1000 to be indexed.
Jan 25 16:05:21 Usergrid-2 catalina.out:  2016-01-25 05:35:20 ERROR
index.impl.EsIndexProducerImpl.sendRequest(209)<Usergrid-SQS-Pool-13>-
Unable to index
id=appId(cd2bd460-a3e8-11e5-a327-0a75091e6d25,application).entityId(7c3328cc-bdd1-11e5-88d3-0a75091e6d25,activity).version(7c3328cd-bdd1-11e5-88d3-0a75091e6d25).nodeId(99400999-a3ef-11e5-a327-0a75091e6d25,group).edgeName(zzzcollzzz|feed).nodeType(TARGET),
type=entity, index=usergrid_applications_4,
failureMessage=UnavailableShardsException[[usergrid_applications_4][4]
Primary shard is not active or isn't assigned is a known node. Timeout:
[1m], request: org.elasticsearch.action.bulk.BulkShardRequest@3d277a1]
.....
Jan 25 16:05:21 Usergrid-2 catalina.out:  2016-01-25 05:35:20 ERROR
corepersistence.asyncevents.AmazonAsyncEventService.lambda$callEventHandlers$65(359)<Usergrid-SQS-Pool-13>-
Failed to index message: 17ed55a5-3091-4f0d-8620-12f2915668c1
Jan 25 16:05:21 Usergrid-2 catalina.out:  java.lang.RuntimeException: Error
during processing of bulk index operations one of the responses failed.
Jan 25 16:05:21 Usergrid-2 catalina.out:
 UnavailableShardsException[[usergrid_applications_4][4] Primary shard is
not active or isn't assigned is a known node. Timeout: [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@3d277a1]
Jan 25 16:05:21 Usergrid-2 catalina.out:
 UnavailableShardsException[[usergrid_applications_4][7] Primary shard is
not active or isn't assigned is a known node. Timeout: [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@155fc46c]
Jan 25 16:05:21 Usergrid-2 catalina.out:
 UnavailableShardsException[[usergrid_applications_4][10] Primary shard is
not active or isn't assigned is a known node. Timeout: [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@445968e3]


On Fri, Jan 22, 2016 at 10:29 PM, Michael Russo <[email protected]>
wrote:

> Hi Harish,
>
> Yeah, in Usergrid 2 you can re-index all of the entity data that exists in
> Cassandra.  Here are example curl requests that invoke the reindex APIs:
>
> *Full system re-index:*
>
> curl -i -X POST -u <sysadmin user>:<sysadmin pass> "
> http://localhost:8080/system/index/rebuild"
>
> *Per application re-index:*
>
> curl -i -X POST -u <sysadmin user>:<sysadmin pass> "
> http://localhost:8080/system/index/rebuild/<application uuid>"
>
> Thanks.
> -Michael
>
> On Fri, Jan 22, 2016 at 4:37 AM, Harish Singh Bisht <
> [email protected]> wrote:
>
>> Hi Team,
>>
>> We have been testing our application based on Usergrid 2 (master
>> branch) and started noticing unusually poor performance, with spikes in
>> response time.
>>
>> Our investigation revealed that during the load testing we ran out of
>> HDD space on the single-node Elasticsearch cluster. This led to indexing
>> failures.
>>
>> So we increased the HDD space and restarted ES. But now the cluster
>> health is red and a lot of shards are in the initializing state. It seems
>> data has been lost on the ES node.
>>
>> Is there any way to recover the lost data in ES? Specifically, is there a
>> way to trigger a re-index of data from Cassandra to ES?
>>
>> Appreciate the help.
>>
>> Thanks
>> Harish
>>
>> --
>> Regards,
>> Harish Singh Bisht
>>
>>
>


-- 
Regards,
Harish Singh Bisht
