Hi Michael,

Thank you very much for the help. I tried the full system re-index API, but I am still seeing the issues below:
1. One of the ES indexes (usergrid_applications_4) still has 15 shards in the "INITIALIZING" state, and the ES cluster health is red. The log extract from ES (details in the appendices) shows:

"[usergrid_applications_4][14] failed to start shard."

2. The Usergrid logs show the below types of errors after invoking the re-index API (details in the appendices):

Type 1
UnavailableShardsException[[usergrid_applications_4][7] Primary shard is not active or isn't assigned is a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@155fc46c]

Type 2
corepersistence.asyncevents.AmazonAsyncEventService.lambda$callEventHandlers$65(359)<Usergrid-SQS-Pool-13>- Failed to index message: 17ed55a5-3091-4f0d-8620-12f2915668c1

Type 3
corepersistence.asyncevents.AmazonAsyncEventService.lambda$null$70(735)<Usergrid-SQS-Pool-16>- Missing messages from queue post operation

Type 4
core.executor.TaskExecutorFactory.rejectedExecution(171)<QueueConsumer_11>- Usergrid-SQS-Pool task queue full, rejecting task rx.schedulers.ExecutorScheduler$ExecutorSchedulerWorker@131a4c5 and running in thread QueueConsumer_11
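For reference, the cluster and shard state can be confirmed with the standard Elasticsearch health and cat APIs (the host and port here are assumptions for a default single-node setup):

curl -i -X GET "http://localhost:9200/_cluster/health?pretty"
curl -i -X GET "http://localhost:9200/_cat/shards/usergrid_applications_4?v"

The second command lists each shard of usergrid_applications_4 with its current state; that is where the 15 INITIALIZING shards show up.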
From the full ES log in appendix 1, the failed shard recovery is caused by a TranslogCorruptedException ("translog corruption while reading from stream"), so the translog files for the affected shards appear to have been damaged when the node ran out of disk space.
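Since all of the entity data still lives in Cassandra, would it be reasonable to stop ES, remove the corrupt translog files for the failing shards, restart, and then re-run the full system re-index? A minimal sketch of what I have in mind (the data path and cluster name are assumptions for a default ES 1.x local-gateway install, and deleting a translog discards any operations that were never flushed to the Lucene index):

sudo service elasticsearch stop
# Path below is an assumption (default packaged install, cluster name
# "elasticsearch", node ordinal 0); one translog directory per failing
# shard, e.g. shard 14 of usergrid_applications_4:
rm /var/lib/elasticsearch/elasticsearch/nodes/0/indices/usergrid_applications_4/14/translog/translog-*
sudo service elasticsearch start
curl -i -X POST -u <sysadmin user>:<sysadmin pass> "http://localhost:8080/system/index/rebuild"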
Please suggest the next steps. Appreciate the help.

Many Thanks,
Harish

-------------------------------------------------------------------
*Appendices*

*1. ELASTICSEARCH LOG*

Jan 25 17:10:23 Elasticsearch elasticsearch.log: [2016-01-25 06:40:44,227][WARN ][indices.cluster ] [Blindside] [usergrid_applications_4][14] failed to start shard
Jan 25 17:10:23 Elasticsearch elasticsearch.log: org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [usergrid_applications_4][14] failed to recover shard
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:287)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at java.lang.Thread.run(Thread.java:745)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: Caused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog corruption while reading from stream
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:70)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:257)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: ... 4 more
Jan 25 17:10:23 Elasticsearch elasticsearch.log: Caused by: java.io.EOFException
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.common.io.stream.InputStreamStreamInput.readBytes(InputStreamStreamInput.java:53)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.translog.BufferedChecksumStreamInput.readBytes(BufferedChecksumStreamInput.java:55)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.common.io.stream.StreamInput.readBytesReference(StreamInput.java:86)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.common.io.stream.StreamInput.readBytesReference(StreamInput.java:74)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.translog.Translog$Index.readFrom(Translog.java:495)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:68)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: ... 5 more
Jan 25 17:10:23 Elasticsearch elasticsearch.log: [2016-01-25 06:40:44,279][WARN ][cluster.action.shard ] [Blindside] [usergrid_applications_4][14] sending failed shard for [usergrid_applications_4][14], node[lb-HRQpWRQGCeIadzTEHSw], [P], s[INITIALIZING], indexUUID [fdyaoJQZQKuFeBONTQSD1g], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[usergrid_applications_4][14] failed to recover shard]; nested: TranslogCorruptedException[translog corruption while reading from stream]; nested: EOFException; ]]
Jan 25 17:10:23 Elasticsearch elasticsearch.log: [2016-01-25 06:40:44,279][WARN ][cluster.action.shard ] [Blindside] [usergrid_applications_4][14] received shard failed for [usergrid_applications_4][14], node[lb-HRQpWRQGCeIadzTEHSw], [P], s[INITIALIZING], indexUUID [fdyaoJQZQKuFeBONTQSD1g], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[usergrid_applications_4][14] failed to recover shard]; nested: TranslogCorruptedException[translog corruption while reading from stream]; nested: EOFException; ]]
......

*2. USERGRID LOG*

Jan 25 16:05:19 Usergrid-2 catalina.out: 2016-01-25 05:35:19 INFO rest.system.IndexResource.rebuildIndexesPost(78)<http-bio-80-exec-92>- Rebuilding all applications
Jan 25 16:05:19 Usergrid-2 catalina.out: 2016-01-25 05:35:19 INFO corepersistence.index.ReIndexServiceImpl.lambda$rebuildIndex$97(131)<RxCachedThreadScheduler-35>- Sending batch of 1000 to be indexed.
Jan 25 16:05:21 Usergrid-2 catalina.out: 2016-01-25 05:35:20 ERROR index.impl.EsIndexProducerImpl.sendRequest(209)<Usergrid-SQS-Pool-13>- Unable to index id=appId(cd2bd460-a3e8-11e5-a327-0a75091e6d25,application).entityId(7c3328cc-bdd1-11e5-88d3-0a75091e6d25,activity).version(7c3328cd-bdd1-11e5-88d3-0a75091e6d25).nodeId(99400999-a3ef-11e5-a327-0a75091e6d25,group).edgeName(zzzcollzzz|feed).nodeType(TARGET), type=entity, index=usergrid_applications_4, failureMessage=UnavailableShardsException[[usergrid_applications_4][4] Primary shard is not active or isn't assigned is a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@3d277a1]
.....
Jan 25 16:05:21 Usergrid-2 catalina.out: 2016-01-25 05:35:20 ERROR corepersistence.asyncevents.AmazonAsyncEventService.lambda$callEventHandlers$65(359)<Usergrid-SQS-Pool-13>- Failed to index message: 17ed55a5-3091-4f0d-8620-12f2915668c1
Jan 25 16:05:21 Usergrid-2 catalina.out: java.lang.RuntimeException: Error during processing of bulk index operations one of the responses failed.
Jan 25 16:05:21 Usergrid-2 catalina.out: UnavailableShardsException[[usergrid_applications_4][4] Primary shard is not active or isn't assigned is a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@3d277a1]
Jan 25 16:05:21 Usergrid-2 catalina.out: UnavailableShardsException[[usergrid_applications_4][7] Primary shard is not active or isn't assigned is a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@155fc46c]
Jan 25 16:05:21 Usergrid-2 catalina.out: UnavailableShardsException[[usergrid_applications_4][10] Primary shard is not active or isn't assigned is a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@445968e3]

On Fri, Jan 22, 2016 at 10:29 PM, Michael Russo <[email protected]> wrote:

> Hi Harish,
>
> Yeah, in Usergrid 2 you can re-index all of the entity data that exists in
> Cassandra. Here are example curl requests that invoke the re-index APIs:
>
> *Full system re-index:*
>
> curl -i -X POST -u <sysadmin user>:<sysadmin pass> "http://localhost:8080/system/index/rebuild"
>
> *Per-application re-index:*
>
> curl -i -X POST -u <sysadmin user>:<sysadmin pass> "http://localhost:8080/system/index/rebuild/<application uuid>"
>
> Thanks.
> -Michael
>
> On Fri, Jan 22, 2016 at 4:37 AM, Harish Singh Bisht <[email protected]> wrote:
>
>> Hi Team,
>>
>> We have been testing our application based on Usergrid 2 (master branch)
>> and started noticing unusually poor performance, with spikes in the
>> response time.
>>
>> Our investigations revealed that during the load testing we ran out of
>> HDD space on the single-node Elasticsearch cluster. This led to indexing
>> failures.
>>
>> So we increased the HDD space and restarted ES. But now the cluster
>> health is red and a lot of shards are in the initializing state. It seems
>> data has been lost on the ES node.
>>
>> Is there any way to recover the lost data in ES? Specifically, is there a
>> way to trigger a re-index of data from Cassandra to ES?
>>
>> Appreciate the help.
>>
>> Thanks,
>> Harish
>>
>> --
>> Regards,
>> Harish Singh Bisht
>>
>>

--
Regards,
Harish Singh Bisht
