Hi Michael,

(Resending this email, as I found another type of error in the logs.)

Thank you very much for the help. I tried the full system re-index API, but I am still seeing the issues below:
1. One of the ES indexes (usergrid_applications_4) still has 15 shards in the "INITIALIZING" state, and the ES cluster health is red. The log extract from ES (details in the appendices) shows: "[usergrid_applications_4][14] failed to start shard".

2. The Usergrid logs show the below types of errors after invoking the re-index API (details in the appendices):

Type 1
UnavailableShardsException[[usergrid_applications_4][7] Primary shard is not active or isn't assigned is a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@155fc46c]

Type 2
corepersistence.asyncevents.AmazonAsyncEventService.lambda$callEventHandlers$65(359)<Usergrid-SQS-Pool-13>- Failed to index message: 17ed55a5-3091-4f0d-8620-12f2915668c1

Type 3
corepersistence.asyncevents.AmazonAsyncEventService.lambda$null$70(735)<Usergrid-SQS-Pool-16>- Missing messages from queue post operation

Type 4
core.executor.TaskExecutorFactory.rejectedExecution(171)<QueueConsumer_11>- Usergrid-SQS-Pool task queue full, rejecting task rx.schedulers.ExecutorScheduler$ExecutorSchedulerWorker@131a4c5 and running in thread QueueConsumer_11

Type 5
EsRejectedExecutionException[rejected execution (queue capacity 300) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@13afccc8 ];

Please suggest the next steps. I appreciate the help.

Many Thanks,
Harish

-------------------------------------------------------------------
*Appendices*

*1. ELASTICSEARCH LOG*

Jan 25 17:10:23 Elasticsearch elasticsearch.log: [2016-01-25 06:40:44,227][WARN ][indices.cluster ] [Blindside] [usergrid_applications_4][14] failed to start shard
Jan 25 17:10:23 Elasticsearch elasticsearch.log: org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [usergrid_applications_4][14] failed to recover shard
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:287)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at java.lang.Thread.run(Thread.java:745)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: Caused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog corruption while reading from stream
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:70)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:257)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: ... 4 more
Jan 25 17:10:23 Elasticsearch elasticsearch.log: Caused by: java.io.EOFException
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.common.io.stream.InputStreamStreamInput.readBytes(InputStreamStreamInput.java:53)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.translog.BufferedChecksumStreamInput.readBytes(BufferedChecksumStreamInput.java:55)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.common.io.stream.StreamInput.readBytesReference(StreamInput.java:86)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.common.io.stream.StreamInput.readBytesReference(StreamInput.java:74)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.translog.Translog$Index.readFrom(Translog.java:495)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:68)
Jan 25 17:10:23 Elasticsearch elasticsearch.log: ... 5 more
Jan 25 17:10:23 Elasticsearch elasticsearch.log: [2016-01-25 06:40:44,279][WARN ][cluster.action.shard ] [Blindside] [usergrid_applications_4][14] sending failed shard for [usergrid_applications_4][14], node[lb-HRQpWRQGCeIadzTEHSw], [P], s[INITIALIZING], indexUUID [fdyaoJQZQKuFeBONTQSD1g], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[usergrid_applications_4][14] failed to recover shard]; nested: TranslogCorruptedException[translog corruption while reading from stream]; nested: EOFException; ]]
Jan 25 17:10:23 Elasticsearch elasticsearch.log: [2016-01-25 06:40:44,279][WARN ][cluster.action.shard ] [Blindside] [usergrid_applications_4][14] received shard failed for [usergrid_applications_4][14], node[lb-HRQpWRQGCeIadzTEHSw], [P], s[INITIALIZING], indexUUID [fdyaoJQZQKuFeBONTQSD1g], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[usergrid_applications_4][14] failed to recover shard]; nested: TranslogCorruptedException[translog corruption while reading from stream]; nested: EOFException; ]]
......
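(For reference, the stuck shards and their recovery progress can be listed straight from ES; the commands below assume the default HTTP port 9200 on the ES node.)

*Cluster health, per index:*

curl -s "http://localhost:9200/_cluster/health?level=indices&pretty"

*Shards that are not yet STARTED:*

curl -s "http://localhost:9200/_cat/shards?v" | grep -v STARTED

*Recovery progress:*

curl -s "http://localhost:9200/_cat/recovery?v"

The trace above shows the primary failing to replay a truncated translog (TranslogCorruptedException caused by EOFException), which is consistent with the earlier disk-full incident, so those shards will keep failing recovery no matter how often the re-index runs. In ES 1.x the commonly cited workaround was to stop the node, move the damaged translog files out of the affected shard's translog directory (back up the data directory first), and restart so the shard can start without them; the operations lost that way would then be restored by the Cassandra re-index.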
*2. USERGRID LOG*

Jan 25 16:05:19 Usergrid-2 catalina.out: 2016-01-25 05:35:19 INFO rest.system.IndexResource.rebuildIndexesPost(78)<http-bio-80-exec-92>- Rebuilding all applications
Jan 25 16:05:19 Usergrid-2 catalina.out: 2016-01-25 05:35:19 INFO corepersistence.index.ReIndexServiceImpl.lambda$rebuildIndex$97(131)<RxCachedThreadScheduler-35>- Sending batch of 1000 to be indexed.
Jan 25 16:05:21 Usergrid-2 catalina.out: 2016-01-25 05:35:20 ERROR index.impl.EsIndexProducerImpl.sendRequest(209)<Usergrid-SQS-Pool-13>- Unable to index id=appId(cd2bd460-a3e8-11e5-a327-0a75091e6d25,application).entityId(7c3328cc-bdd1-11e5-88d3-0a75091e6d25,activity).version(7c3328cd-bdd1-11e5-88d3-0a75091e6d25).nodeId(99400999-a3ef-11e5-a327-0a75091e6d25,group).edgeName(zzzcollzzz|feed).nodeType(TARGET), type=entity, index=usergrid_applications_4, failureMessage=UnavailableShardsException[[usergrid_applications_4][4] Primary shard is not active or isn't assigned is a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@3d277a1]
.....
Jan 25 16:05:21 Usergrid-2 catalina.out: 2016-01-25 05:35:20 ERROR corepersistence.asyncevents.AmazonAsyncEventService.lambda$callEventHandlers$65(359)<Usergrid-SQS-Pool-13>- Failed to index message: 17ed55a5-3091-4f0d-8620-12f2915668c1
Jan 25 16:05:21 Usergrid-2 catalina.out: java.lang.RuntimeException: Error during processing of bulk index operations one of the responses failed.
Jan 25 16:05:21 Usergrid-2 catalina.out: UnavailableShardsException[[usergrid_applications_4][4] Primary shard is not active or isn't assigned is a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@3d277a1]
Jan 25 16:05:21 Usergrid-2 catalina.out: UnavailableShardsException[[usergrid_applications_4][7] Primary shard is not active or isn't assigned is a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@155fc46c]
Jan 25 16:05:21 Usergrid-2 catalina.out: UnavailableShardsException[[usergrid_applications_4][10] Primary shard is not active or isn't assigned is a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@445968e3]
.......
Jan 25 16:53:02 Usergrid-2 catalina.out: 2016-01-25 06:23:02 ERROR index.impl.EsIndexProducerImpl.sendRequest(209)<Usergrid-SQS-Pool-97>- Unable to index id=appId(65ccc2b7-bde0-11e5-88d3-0a75091e6d25,application).entityId(b7dcf6c9-bf62-11e5-88d3-0a75091e6d25,activity).version(b7dcf6cb-bf62-11e5-88d3-0a75091e6d25).nodeId(65ccc2b7-bde0-11e5-88d3-0a75091e6d25,application).edgeName(zzzcollzzz|activities).nodeType(TARGET), type=entity, index=usergrid_applications_3, failureMessage=RemoteTransportException[[Blindside][inet[/10.0.0.148:9300]][indices:data/write/bulk[s]]]; nested: EsRejectedExecutionException[rejected execution (queue capacity 300) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@13afccc8 ];
Jan 25 16:53:03 Usergrid-2 catalina.out: RemoteTransportException[[Blindside][inet[/10.0.0.148:9300]][indices:data/write/bulk[s]]]; nested: EsRejectedExecutionException[rejected execution (queue capacity 300) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@59da0cf2 ];
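(A note on the rejected-execution errors above: EsRejectedExecutionException with "queue capacity 300" means the ES bulk thread pool's queue overflowed while the re-index flooded the node with bulk requests. This explains only the back-pressure errors, Types 4 and 5, not the unassigned primaries. One possible mitigation, assuming ES 1.x settings, is to raise the bulk queue size in elasticsearch.yml and restart the node, or to throttle the re-index batch rate on the Usergrid side.)

*Example elasticsearch.yml setting (value is an assumption; tune to the node):*

threadpool.bulk.queue_size: 1000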
On Fri, Jan 22, 2016 at 10:29 PM, Michael Russo <[email protected]> wrote:

> Hi Harish,
>
> Yeah, in Usergrid 2 you can re-index all of the entity data that exists
> in Cassandra.
> Here are example curl requests that invoke the reindex APIs:
>
> *Full system re-index:*
>
> curl -i -X POST -u <sysadmin user>:<sysadmin pass> "http://localhost:8080/system/index/rebuild"
>
> *Per application re-index:*
>
> curl -i -X POST -u <sysadmin user>:<sysadmin pass> "http://localhost:8080/system/index/rebuild/<application uuid>"
>
> Thanks.
> -Michael
>
> On Fri, Jan 22, 2016 at 4:37 AM, Harish Singh Bisht <[email protected]> wrote:
>
>> Hi Team,
>>
>> We have been testing our application based on Usergrid 2 (master branch)
>> and started noticing unusually poor performance, with spikes in the
>> response time.
>>
>> Our investigation revealed that during the load testing we ran out of
>> HDD space on the single-node Elasticsearch cluster. This led to indexing
>> failures.
>>
>> So we increased the HDD space and restarted ES. But now the cluster
>> health is red and a lot of shards are in the INITIALIZING state. It seems
>> data has been lost on the ES node.
>>
>> Is there any way to recover the lost data in ES? Specifically, is there
>> a way to trigger a re-index of data from Cassandra to ES?
>>
>> Appreciate the help.
>>
>> Thanks
>> Harish
>>
>> --
>> Regards,
>> Harish Singh Bisht

--
Regards,
Harish Singh Bisht
