Hi Michael,

Thank you very much for the detailed explanation. I deleted the ES data
folders, increased the ES bulk queue size to 2000, restarted ES / Usergrid,
and then started the re-indexing process. This fixed most of the errors - a
big thanks to you.
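For reference, the bulk queue change was along these lines in
elasticsearch.yml (a sketch of my setting, using the 1.x-era key name; the
exact key may differ on other ES versions):

```yaml
# elasticsearch.yml - enlarge the bulk thread pool queue (key name per ES 1.x)
threadpool.bulk.queue_size: 2000
```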
Below are my observations from when I started the re-indexing process today:

1. Usergrid logs show the warning below very frequently (Usergrid-SQS-Pool
task queue full):
Jan 27 14:33:00 Usergrid-2 catalina.out:  2016-01-27 04:03:00 INFO
corepersistence.index.ReIndexServiceImpl.lambda$rebuildIndex$97(131)<RxCachedThreadScheduler-14>-
Sending batch of 1000 to be indexed.
Jan 27 14:33:01 Usergrid-2 catalina.out:  2016-01-27 04:03:01 WARN
core.executor.TaskExecutorFactory.rejectedExecution(171)<QueueConsumer_9>-
Usergrid-SQS-Pool task queue full, rejecting task
rx.schedulers.ExecutorScheduler$ExecutorSchedulerWorker@4ef107a and running
in thread QueueConsumer_9

Does this mean some index requests are being rejected, resulting in a loss
of data in ES? Do I need to set scheduler.io.threads to a higher value in
the Usergrid property file, and will that prevent data loss during
re-indexing?
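For example, something like this is what I would try (a sketch only: the
property name is from Michael's reply below, while the file name and the
value 100 are assumptions about my setup):

```shell
# Append the pool-size property to the Usergrid properties file, then verify it.
# The file name and the value are illustrative - adjust for your deployment.
PROPS=usergrid-deployment.properties
echo "scheduler.io.threads=100" >> "$PROPS"
grep "^scheduler.io.threads=" "$PROPS"
```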

2. I needed to trigger the re-index API twice, as the first time it appears
to have hung / stopped partway through (only 2 indexes were created out of
6, and no docs were being created after that).
After waiting for a few hours, I triggered the rebuild API again, which
seems to be working well now (the 5th index is in progress).

3. Does the re-index 'status' API
"http://localhost:8080/system/index/rebuild/<job-uuid>" return the number
of indexing errors / failures?

Please advise. Appreciate the help.

Many Thanks,
Harish

On Mon, Jan 25, 2016 at 10:15 PM, Michael Russo <[email protected]>
wrote:

> Hi Harish,
>
> I'm going to try explaining the errors by type:
>
> 1)
>
> This is occurring because Elasticsearch is unable to start the shard for
> the index.  If your cluster health is red, it is possible that you don't
> have enough disk space, or at one point you did not have enough disk
> space, so the already-created shards for Usergrid indexes are corrupt.
> Since Usergrid can completely rebuild the index, I'd suggest that you
> completely delete the Elasticsearch data directories, restart
> Elasticsearch, restart Usergrid, and begin the re-indexing process again.
> As an FYI, we have configured in the code a default of 18 shards and 1
> replica for each index.
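The sequence above could be scripted roughly as follows (a dry-run sketch:
the data path and service names are assumptions for a typical Linux
install, so the script only echoes each step rather than executing it):

```shell
# Dry-run of the wipe-and-reindex sequence; nothing destructive is executed.
# ES_DATA and the service names are assumptions - verify against your install.
ES_DATA=/var/lib/elasticsearch/data
steps=(
  "sudo service elasticsearch stop"
  "sudo rm -rf ${ES_DATA:?}/*"      # delete the ES data directories
  "sudo service elasticsearch start"
  "sudo service tomcat7 restart"    # restarts Usergrid
  "curl -i -X POST -u <sysadmin user>:<sysadmin pass> http://localhost:8080/system/index/rebuild"
)
for s in "${steps[@]}"; do
  echo "$s"   # echo only; run each command by hand once verified
done
```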
>
> 2.)
>
> *Type 1* - Example exceptions that can occur due to the issues seen in
> #1.  Usergrid's use of the Elasticsearch SDK as a client will print these
> types of statements, as it's aware of the Elasticsearch cluster state.
>
> *Type 2* - This shows that the async index queue consumers were unable to
> index an entity.  This is most likely due to #1 as well, but #1 does not
> always have to be the cause of this exception.  If the consumer threads
> get any exception while trying to index the entity in Elasticsearch, this
> type of error will be seen.
>
> *Type 3 *- This message goes with 'Type 2'.
>
> *Type 4* - We have a finite thread pool size set for the asynchronous
> indexing (Usergrid-SQS-Pool).  The default used to be 100, but is now 40
> in the master branch.  If there is no thread available in the pool
> (rejected task), we implemented a 'caller runs' strategy in which the
> thread trying to hand off the event to the pool (QueueConsumer) will
> execute the request itself.  The number of threads available in this pool
> can be configured with the *scheduler.io.threads* property.
>
> *Type 5* - Elasticsearch also maintains thread pool sizes, and will queue
> up requests when a thread is not available to process them.  The default
> queue sizes are fairly small if you're aiming for high concurrency.
> However, if there is a problem with ES and it's unable to process requests
> properly, the queue would likely fill up regardless.  See
> https://www.elastic.co/guide/en/elasticsearch/reference/2.x/modules-threadpool.html.
> As an FYI, when we scaled Usergrid to 10,000 requests per second, we set
> our queue sizes for search and bulk to 5000.  That is a bit high; you
> should probably start with something lower, like 1000, if you plan to
> change it.
>
>
> Hope this info helps!  Let me know how things progress.
>
> Thanks!
> -Michael
>
>
> On Mon, Jan 25, 2016 at 5:06 AM, Harish Singh Bisht <
> [email protected]> wrote:
>
>> Hi Michael,
>>
>> (resending the email, as I found another type of error in the logs)
>> Thank you very much for the help.  I tried the full system re-index API,
>> but I am still seeing the issues below:
>>
>> 1. One of the ES indexes (usergrid_applications_4) still has 15 shards
>> in the "INITIALIZING" state.  The ES cluster health is red.
>> A log extract from ES (details in the appendices) shows:
>> "[usergrid_applications_4][14] failed to start shard."
>>
>> 2. Usergrid logs show the following types of errors after invoking the
>> re-index API (details in the appendices):
>> Type 1
>> UnavailableShardsException[[usergrid_applications_4][7] Primary shard is
>> not active or isn't assigned is a known node. Timeout: [1m], request:
>> org.elasticsearch.action.bulk.BulkShardRequest@155fc46c]
>> Type 2
>> corepersistence.asyncevents.AmazonAsyncEventService.lambda$callEventHandlers$65(359)<Usergrid-SQS-Pool-13>-
>> Failed to index message: 17ed55a5-3091-4f0d-8620-12f2915668c1
>> Type 3
>> corepersistence.asyncevents.AmazonAsyncEventService.lambda$null$70(735)<Usergrid-SQS-Pool-16>-
>> Missing messages from queue post operation
>> Type 4
>> core.executor.TaskExecutorFactory.rejectedExecution(171)<QueueConsumer_11>-
>> Usergrid-SQS-Pool task queue full, rejecting task
>> rx.schedulers.ExecutorScheduler$ExecutorSchedulerWorker@131a4c5 and
>> running in thread QueueConsumer_11
>> Type 5
>> EsRejectedExecutionException[rejected execution (queue capacity 300) on
>> org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@13afccc8
>> ];
>>
>> Please suggest the next steps. Appreciate the help.
>>
>> Many Thanks,
>> Harish
>>
>>
>>
>> -------------------------------------------------------------------
>> *Appendices*
>> *1. ELASTICSEARCH LOG*
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:  [2016-01-25 
>> 06:40:44,227][WARN
>> ][indices.cluster          ] [Blindside] [usergrid_applications_4][14]
>> failed to start shard
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:
>>  org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
>> [usergrid_applications_4][14] failed to recover shard
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
>> org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:287)
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
>> org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
>> java.lang.Thread.run(Thread.java:745)
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:  Caused by:
>> org.elasticsearch.index.translog.TranslogCorruptedException: translog
>> corruption while reading from stream
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
>> org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:70)
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
>> org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:257)
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:   ... 4 more
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:  Caused by:
>> java.io.EOFException
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
>> org.elasticsearch.common.io.stream.InputStreamStreamInput.readBytes(InputStreamStreamInput.java:53)
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
>> org.elasticsearch.index.translog.BufferedChecksumStreamInput.readBytes(BufferedChecksumStreamInput.java:55)
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
>> org.elasticsearch.common.io.stream.StreamInput.readBytesReference(StreamInput.java:86)
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
>> org.elasticsearch.common.io.stream.StreamInput.readBytesReference(StreamInput.java:74)
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
>> org.elasticsearch.index.translog.Translog$Index.readFrom(Translog.java:495)
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:   at
>> org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:68)
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:   ... 5 more
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:  [2016-01-25 
>> 06:40:44,279][WARN
>> ][cluster.action.shard     ] [Blindside] [usergrid_applications_4][14]
>> sending failed shard for [usergrid_applications_4][14],
>> node[lb-HRQpWRQGCeIadzTEHSw], [P], s[INITIALIZING], indexUUID
>> [fdyaoJQZQKuFeBONTQSD1g], reason [Failed to start shard, message
>> [IndexShardGatewayRecoveryException[[usergrid_applications_4][14] failed to
>> recover shard]; nested: TranslogCorruptedException[translog corruption
>> while reading from stream]; nested: EOFException; ]]
>> Jan 25 17:10:23 Elasticsearch elasticsearch.log:  [2016-01-25 
>> 06:40:44,279][WARN
>> ][cluster.action.shard     ] [Blindside] [usergrid_applications_4][14]
>> received shard failed for [usergrid_applications_4][14],
>> node[lb-HRQpWRQGCeIadzTEHSw], [P], s[INITIALIZING], indexUUID
>> [fdyaoJQZQKuFeBONTQSD1g], reason [Failed to start shard, message
>> [IndexShardGatewayRecoveryException[[usergrid_applications_4][14] failed to
>> recover shard]; nested: TranslogCorruptedException[translog corruption
>> while reading from stream]; nested: EOFException; ]]
>> ......
>> *2. USERGRID LOG*
>> Jan 25 16:05:19 Usergrid-2 catalina.out:  2016-01-25 05:35:19 INFO
>> rest.system.IndexResource.rebuildIndexesPost(78)<http-bio-80-exec-92>-
>> Rebuilding all applications
>> Jan 25 16:05:19 Usergrid-2 catalina.out:  2016-01-25 05:35:19 INFO
>> corepersistence.index.ReIndexServiceImpl.lambda$rebuildIndex$97(131)<RxCachedThreadScheduler-35>-
>> Sending batch of 1000 to be indexed.
>> Jan 25 16:05:21 Usergrid-2 catalina.out:  2016-01-25 05:35:20 ERROR
>> index.impl.EsIndexProducerImpl.sendRequest(209)<Usergrid-SQS-Pool-13>-
>> Unable to index
>> id=appId(cd2bd460-a3e8-11e5-a327-0a75091e6d25,application).entityId(7c3328cc-bdd1-11e5-88d3-0a75091e6d25,activity).version(7c3328cd-bdd1-11e5-88d3-0a75091e6d25).nodeId(99400999-a3ef-11e5-a327-0a75091e6d25,group).edgeName(zzzcollzzz|feed).nodeType(TARGET),
>> type=entity, index=usergrid_applications_4,
>> failureMessage=UnavailableShardsException[[usergrid_applications_4][4]
>> Primary shard is not active or isn't assigned is a known node. Timeout:
>> [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@3d277a1]
>> .....
>> Jan 25 16:05:21 Usergrid-2 catalina.out:  2016-01-25 05:35:20 ERROR
>> corepersistence.asyncevents.AmazonAsyncEventService.lambda$callEventHandlers$65(359)<Usergrid-SQS-Pool-13>-
>> Failed to index message: 17ed55a5-3091-4f0d-8620-12f2915668c1
>> Jan 25 16:05:21 Usergrid-2 catalina.out:  java.lang.RuntimeException:
>> Error during processing of bulk index operations one of the responses
>> failed.
>> Jan 25 16:05:21 Usergrid-2 catalina.out:
>>  UnavailableShardsException[[usergrid_applications_4][4] Primary shard is
>> not active or isn't assigned is a known node. Timeout: [1m], request:
>> org.elasticsearch.action.bulk.BulkShardRequest@3d277a1]
>> Jan 25 16:05:21 Usergrid-2 catalina.out:
>>  UnavailableShardsException[[usergrid_applications_4][7] Primary shard is
>> not active or isn't assigned is a known node. Timeout: [1m], request:
>> org.elasticsearch.action.bulk.BulkShardRequest@155fc46c]
>> Jan 25 16:05:21 Usergrid-2 catalina.out:
>>  UnavailableShardsException[[usergrid_applications_4][10] Primary shard is
>> not active or isn't assigned is a known node. Timeout: [1m], request:
>> org.elasticsearch.action.bulk.BulkShardRequest@445968e3]
>> .......
>> Jan 25 16:53:02 Usergrid-2 catalina.out: 2016-01-25 06:23:02 ERROR
>> index.impl.EsIndexProducerImpl.sendRequest(209)<Usergrid-SQS-Pool-97>-
>> Unable to index
>> id=appId(65ccc2b7-bde0-11e5-88d3-0a75091e6d25,application).entityId(b7dcf6c9-bf62-11e5-88d3-0a75091e6d25,activity).version(b7dcf6cb-bf62-11e5-88d3-0a75091e6d25).nodeId(65ccc2b7-bde0-11e5-88d3-0a75091e6d25,application).edgeName(zzzcollzzz|activities).nodeType(TARGET),
>> type=entity, index=usergrid_applications_3,
>> failureMessage=RemoteTransportException[[Blindside][inet[/10.0.0.148:9300]][indices:data/write/bulk[s]]];
>> nested: EsRejectedExecutionException[rejected execution (queue capacity
>> 300) on
>> org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@13afccc8
>> ];
>> Jan 25 16:53:03 Usergrid-2 catalina.out:
>> RemoteTransportException[[Blindside][inet[/10.0.0.148:9300]][indices:data/write/bulk[s]]];
>> nested: EsRejectedExecutionException[rejected execution (queue capacity
>> 300) on
>> org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@59da0cf2
>> ];
>>
>> On Mon, Jan 25, 2016 at 5:54 PM, Harish Singh Bisht <
>> [email protected]> wrote:
>>
>>> Hi Michael,
>>>
>>> Thank you very much for the help.  I tried the full system re-index
>>> API, but I am still seeing the issues below:
>>>
>>> 1. One of the ES indexes (usergrid_applications_4) still has 15 shards
>>> in "INITIALIZING" state. The ES cluster health is red.
>>>  Log Extract from ES (details in appendices) shows that:
>>> "[usergrid_applications_4][14] failed to start shard.".
>>>
>>> 2. Usergrid logs show the below types of errors after invoking the
>>> re-index api (details in appendices):
>>> Type 1
>>> UnavailableShardsException[[usergrid_applications_4][7] Primary shard is
>>> not active or isn't assigned is a known node. Timeout: [1m], request:
>>> org.elasticsearch.action.bulk.BulkShardRequest@155fc46c]
>>> Type 2
>>> corepersistence.asyncevents.AmazonAsyncEventService.lambda$callEventHandlers$65(359)<Usergrid-SQS-Pool-13>-
>>> Failed to index message: 17ed55a5-3091-4f0d-8620-12f2915668c1
>>> Type 3
>>> corepersistence.asyncevents.AmazonAsyncEventService.lambda$null$70(735)<Usergrid-SQS-Pool-16>-
>>> Missing messages from queue post operation
>>> Type 4
>>> core.executor.TaskExecutorFactory.rejectedExecution(171)<QueueConsumer_11>-
>>> Usergrid-SQS-Pool task queue full, rejecting task
>>> rx.schedulers.ExecutorScheduler$ExecutorSchedulerWorker@131a4c5 and
>>> running in thread QueueConsumer_11
>>>
>>> Please suggest the next steps.  Appreciate the help.
>>>
>>> Many Thanks,
>>> Harish
>>>
>>>
>>> On Fri, Jan 22, 2016 at 10:29 PM, Michael Russo <[email protected]
>>> > wrote:
>>>
>>>> Hi Harish,
>>>>
>>>> Yeah, in Usergrid 2 you can re-index all of the entity data that exists
>>>> in Cassandra.  Here are example curl requests that invoke the reindex APIs:
>>>>
>>>> *Full system re-index:*
>>>>
>>>> curl -i -X POST -u <sysadmin user>:<sysadmin pass> "http://localhost:8080/system/index/rebuild"
>>>>
>>>> *Per application re-index:*
>>>>
>>>> curl -i -X POST -u <sysadmin user>:<sysadmin pass> "http://localhost:8080/system/index/rebuild/<application uuid>"
>>>>
>>>> Thanks.
>>>> -Michael
>>>>
>>>> On Fri, Jan 22, 2016 at 4:37 AM, Harish Singh Bisht <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Team,
>>>>>
>>>>> We have been testing our application based on Usergrid 2 (master
>>>>> branch) and started noticing unusually poor performance, with spikes
>>>>> in the response time.
>>>>>
>>>>> Our investigation revealed that during the load testing we ran out of
>>>>> HDD space on the single-node Elasticsearch cluster.  This led to
>>>>> indexing failures.
>>>>>
>>>>> So we increased the HDD space and restarted ES.  But now the cluster
>>>>> health is red and a lot of shards are in the INITIALIZING state.  It
>>>>> seems data has been lost on the ES node.
>>>>>
>>>>> Is there any way to recover the lost data in ES? Specifically, is
>>>>> there a way to trigger a re-index of data from Cassandra to ES?
>>>>>
>>>>> Appreciate the help.
>>>>>
>>>>> Thanks
>>>>> Harish
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Harish Singh Bisht
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Harish Singh Bisht
>>>
>>>
>>
>>
>> --
>> Regards,
>> Harish Singh Bisht
>>
>>
>


-- 
Regards,
Harish Singh Bisht
