Hi Michael,

This makes sense. I can confirm that while we have been seeing missing-entity
errors under high load, these resolve themselves automatically as the load
decreases.

Another anomaly we have noticed is that Usergrid responds with a 401 status
and the message "Unable to authenticate OAuth credentials" for certain users'
credentials under high load, while the same credentials work fine after the
load reduces. Can we assume that this issue (intermittently invalid
credentials) has the same underlying root cause (i.e. Elasticsearch not
responding)? Below are a few examples of the error_description for such 401
errors:
1. 'invalid username or password'
2. 'Unable to authenticate OAuth credentials'
3. 'Unable to authenticate due to corrupt access token'

Regarding your suggestion to increase the search thread pool queue size: we
were already using a queue size of 1000 (with 320 threads). Should we consider
increasing this further, or simply provide additional resources (CPU / RAM) to
the ES process?
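
For reference, the relevant lines in our elasticsearch.yml currently look
roughly like this; the values are the ones mentioned above, and the key names
are taken from the ES 1.x thread pool docs you linked:

# elasticsearch.yml on the data nodes -- current search pool settings
threadpool.search.type: fixed          # fixed is the default for the search pool
threadpool.search.size: 320            # worker threads
threadpool.search.queue_size: 1000     # pending searches before rejection

Before raising the queue further, we plan to watch the search pool's
"rejected" counter under load (e.g. via the _cat/thread_pool or nodes stats
APIs) to confirm that searches are actually being rejected rather than just
slow.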

Additionally, we are seeing Cassandra connection timeouts under high load,
specifically the exceptions below:
ERROR stage.write.WriteCommit.call(132)<Usergrid-Collection-Pool-12>-
Failed to execute write asynchronously
com.netflix.astyanax.connectionpool.exceptions.TimeoutException:
TimeoutException: [host=10.0.0.237(10.0.0.237):9160, latency=2003(2003),
attempts=1]org.apache.thrift.transport.TTransportException:
java.net.SocketTimeoutException: Read timed out

These exceptions occur even though OpsCenter was reporting only medium load on
our cluster. Is there a way to tune the Astyanax library (for example, its
connection pool or timeout settings)? Please let us know if you have any
recommendations in this area.
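
In case it helps frame the question, below is roughly the kind of Astyanax
connection pool tuning we have been experimenting with in a standalone test
client (a sketch only: the host and port come from the stack trace above, but
the cluster / keyspace names and pool sizes are placeholders, and we realise
Usergrid drives these values through its own configuration rather than code
like this):

import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

public class CassandraPoolSketch {

    public static void main(String[] args) {
        // Pool knobs we suspect are relevant: latency=2003 in the trace looks
        // like a ~2 s read (socket) timeout being hit under load.
        ConnectionPoolConfigurationImpl poolConfig =
                new ConnectionPoolConfigurationImpl("UsergridTestPool")
                        .setPort(9160)                 // Thrift port from the stack trace
                        .setSeeds("10.0.0.237:9160")   // seed host from the stack trace
                        .setMaxConnsPerHost(32)        // placeholder
                        .setConnectTimeout(2000)       // ms
                        .setSocketTimeout(5000);       // ms; raised above the ~2 s we observe

        AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
                .forCluster("TestCluster")             // placeholder
                .forKeyspace("test_keyspace")          // placeholder
                .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                        .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE))
                .withConnectionPoolConfiguration(poolConfig)
                .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
                .buildKeyspace(ThriftFamilyFactory.getInstance());

        context.start();
        Keyspace keyspace = context.getClient();
        System.out.println("Connected to keyspace: " + keyspace.getKeyspaceName());
        context.shutdown();
    }
}

Our main question is whether raising the socket timeout (or connection count)
at this layer is the right lever, or whether these timeouts point to something
we should address on the Cassandra side instead.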

Thanks a lot for the help.

Jaskaran

On Mon, Dec 7, 2015 at 2:29 AM, Michael Russo <michaelaru...@gmail.com>
wrote:

> Here are a couple things to check:
>
> 1) Can you query all of these entities out when the system is not under
> load?
> 2) Elasticsearch has a search queue for index query requests. (
> https://www.elastic.co/guide/en/elasticsearch/reference/1.6/modules-threadpool.html)
> When this queue is full, searches are rejected. Currently Usergrid surfaces
> this as no results returned, rather than an "unable to query" or other
> identifying error message (we're aware and plan to fix this in the future).
> Try increasing the queue size to 1000. You might see delayed results, but
> this can prevent empty results for data that's known to be in the index.
>
> Thanks.
> -Michael R.
>
> On Dec 5, 2015, at 07:07, Jaskaran Singh <
> jaskaran.si...@comprotechnologies.com> wrote:
>
> Hello All,
>
> We are testing Usergrid 2.x (master branch) for our application, which was
> previously prototyped on Usergrid 1.x. We are noticing some odd anomalies
> that cause errors in our application, which otherwise works fine against
> Usergrid 1.x. Specifically, we are seeing empty responses when querying
> custom collections for a particular entity record.
> The following is an example of one such query:
> http://server-name/b2perf1/default/userdata?client_id=
> <...>&client_secret=<....>&ql=userproductid='4d543507-9839-11e5-ba08-0a75091e6d25~~5c856de9-9828-11e5-ba08-0a75091e6d25'
>
> In the above scenario we are querying a custom collection, "userdata".
> Under high load conditions (performance tests), this query starts returning
> an empty entities array (see below), even though the entity did exist at
> one point and we have no code or logic that deletes entities.
> {
>     "action": "get",
>     "application": "0f7a2396-9826-11e5-ba08-0a75091e6d25",
>     "params": {
>         "ql": [
>
> "userproductid='4d543507-9839-11e5-ba08-0a75091e6d25~~5c856de9-9828-11e5-ba08-0a75091e6d25'"
>         ]
>     },
>     "path": "/userdata",
>     "uri": "http://localhost:8080/b2perf1/default/userdata";,
>     "entities": [],
>     "timestamp": 1449322746733,
>     "duration": 1053,
>     "organization": "b2perf1",
>     "applicationName": "default",
>     "count": 0
> }
>
> This has been happening quite randomly and intermittently, and we have not
> been able to isolate reproduction steps beyond running load / performance
> tests until the problem eventually shows up.
> Note that the entities are created prior to the load test, and we can
> confirm that they existed before the load test was run.
>
> We have never noticed this issue for non-query calls (i.e. calls that do
> not directly provide a field to query on).
>
> Our suspicion is that while these records do exist in Cassandra (because
> we have never deleted them), the Elasticsearch index is not in sync or is
> not functioning properly.
> How do we go about debugging this problem? Is there any particular logging
> or metric we can check to confirm whether the Elasticsearch index is up to
> date with the changes in Cassandra?
>
> Any other suggestions will be greatly appreciated.
>
> Thanks
> Jaskaran
>
>
