Hi Michael,

I am providing an update on my situation. We have changed our application
logic to minimize the use of queries (i.e. calls with "ql=...") in usergrid
2.x. This seems to have provided a significant benefit, and all of the
problems reported below seem to have disappeared.

To some extent this is good news. However, we were fortunate to be able to
work around that logic, and we would like to understand any limitations or
best practices around the use of queries (which are serviced by
elasticsearch in usergrid 2.x) under high-load conditions.

Also, please let me know if there is an existing Jira issue for addressing
the empty entity response when elasticsearch is overloaded, or should I add
one?

Thanks in advance,

Thanks
Jaskaran


On Tue, Dec 8, 2015 at 6:00 PM, Jaskaran Singh <
jaskaran.si...@comprotechnologies.com> wrote:

> Hi Michael,
>
> This makes sense. I can confirm that while we have been seeing missing
> entity errors under high load, these automatically resolve themselves as
> the load decreases.
>
> Another anomaly that we have noticed is that usergrid responds with a
> "401" code and the message "Unable to authenticate OAuth credentials" for
> certain users' credentials under high load, while the same credentials
> work fine after the load reduces. Can we assume that this issue
> (intermittent invalid credentials) has the same underlying root cause
> (i.e. elasticsearch is not responding)? Below are a few examples of the
> error_description values for such 401 errors:
> 1. 'invalid username or password'
> 2. 'Unable to authenticate OAuth credentials'
> 3. 'Unable to authenticate due to corrupt access token'
>
> Regarding your suggestion to increase the search thread pool queue size,
> we were already using a setting of 1000 (with 320 threads). Should we
> consider increasing this further, or simply provide additional resources
> (cpu / ram) to the ES process?
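>
> For reference, this is roughly what we have in elasticsearch.yml today (a
> sketch of our current settings, assuming the ES 1.x property names):
>
>     threadpool.search.size: 320
>     threadpool.search.queue_size: 1000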
>
> Additionally, we are also seeing cassandra connection timeouts under high
> load conditions, specifically exceptions like the one below:
> ERROR stage.write.WriteCommit.call(132)<Usergrid-Collection-Pool-12>-
> Failed to execute write asynchronously
> com.netflix.astyanax.connectionpool.exceptions.TimeoutException:
> TimeoutException: [host=10.0.0.237(10.0.0.237):9160, latency=2003(2003),
> attempts=1]org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: Read timed out
>
> These exceptions occur even though opscenter was reporting only medium
> load on our cluster. Is there a way to optimize the astyanax library?
> Please let us know if you have any recommendations in this area.
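>
> To frame the question, the kind of tuning we imagine is along these lines
> (just a sketch of the astyanax connection pool builder; we have not
> confirmed how, or whether, Usergrid 2.x exposes these settings, and the
> values are illustrative only):
>
>     import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
>
>     // Hypothetical pool settings; latency=2003 in the exception above
>     // suggests the effective socket timeout is currently around 2 seconds.
>     ConnectionPoolConfigurationImpl poolConfig =
>         new ConnectionPoolConfigurationImpl("UsergridConnectionPool")
>             .setPort(9160)            // thrift port, as in the exception
>             .setMaxConnsPerHost(20)   // illustrative value
>             .setConnectTimeout(5000)  // ms, illustrative value
>             .setSocketTimeout(10000); // ms, above the apparent ~2s timeout
>
> Would raising the socket timeout be the right lever here, or is the
> timeout just a symptom of the cluster being saturated?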
>
> Thanks a lot for the help.
>
> Thanks
> Jaskaran
>
> On Mon, Dec 7, 2015 at 2:29 AM, Michael Russo <michaelaru...@gmail.com>
> wrote:
>
>> Here are a couple things to check:
>>
>> 1) Can you query all of these entities out when the system is not under
>> load?
>> 2) Elasticsearch has a search queue for index query requests (
>> https://www.elastic.co/guide/en/elasticsearch/reference/1.6/modules-threadpool.html).
>> When this queue is full, searches are rejected. Currently Usergrid
>> surfaces this as no results returned rather than an "unable to query" or
>> other identifying error message (we're aware of this and plan to fix it
>> in the future). Try increasing the queue size to 1000. You might see
>> delayed results, but this can prevent empty results for data that's known
>> to be in the index.
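>>
>> One quick way to confirm whether rejected searches are the cause is to
>> watch the search thread pool counters while the load test runs (assuming
>> ES is reachable on its default HTTP port; column names vary slightly by
>> version):
>>
>>     curl 'http://localhost:9200/_cat/thread_pool?v'
>>
>> If the search rejected count climbs during the test, the empty responses
>> line up with rejected searches rather than missing documents.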
>>
>> Thanks.
>> -Michael R.
>>
>> On Dec 5, 2015, at 07:07, Jaskaran Singh <
>> jaskaran.si...@comprotechnologies.com> wrote:
>>
>> Hello All,
>>
>> We are testing usergrid 2.x (master branch) for our application, which
>> was previously prototyped on usergrid 1.x. We are noticing some odd
>> anomalies that cause errors in our application, which otherwise works fine
>> against usergrid 1.x. Specifically, we are seeing empty responses when
>> querying custom collections for a particular entity record.
>> The following is an example of one such query:
>> http://server-name/b2perf1/default/userdata?client_id=
>> <...>&client_secret=<....>&ql=userproductid='4d543507-9839-11e5-ba08-0a75091e6d25~~5c856de9-9828-11e5-ba08-0a75091e6d25'
>>
>> In the above scenario, we are querying a custom collection "userdata".
>> Under high-load conditions (performance tests), this query starts
>> returning an empty entities array (see below), even though the entity did
>> exist at one point and we have no code / logic that deletes entities.
>> {
>>     "action": "get",
>>     "application": "0f7a2396-9826-11e5-ba08-0a75091e6d25",
>>     "params": {
>>         "ql": [
>>             "userproductid='4d543507-9839-11e5-ba08-0a75091e6d25~~5c856de9-9828-11e5-ba08-0a75091e6d25'"
>>         ]
>>     },
>>     "path": "/userdata",
>>     "uri": "http://localhost:8080/b2perf1/default/userdata";,
>>     "entities": [],
>>     "timestamp": 1449322746733,
>>     "duration": 1053,
>>     "organization": "b2perf1",
>>     "applicationName": "default",
>>     "count": 0
>> }
>>
>> This has been happening quite randomly / intermittently, and we have not
>> been able to isolate any reproduction steps beyond running load /
>> performance tests until the problem eventually shows up.
>> Note that the entities are created prior to the load test, and we can
>> confirm that they existed before the test was run.
>>
>> We have never noticed this issue for non-query calls (i.e. calls that do
>> not directly provide a field to query on).
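>>
>> For example, a direct fetch like the following (with a hypothetical
>> entity UUID, credentials elided as above) has never come back empty for
>> us, while the ql form shown earlier does under load:
>>
>> http://server-name/b2perf1/default/userdata/<entity-uuid>?client_id=<...>&client_secret=<....>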
>>
>> Our suspicion is that while these records do exist in Cassandra (because
>> we have never deleted them), the ElasticSearch index is not in sync or is
>> not functioning properly.
>> How do we go about debugging this problem? Is there any particular
>> logging or metric that we can check to confirm whether the elasticsearch
>> index is up to date with the changes in cassandra?
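>>
>> For instance, would comparing the per-index document counts against the
>> number of entities we created be a reasonable sanity check (assuming ES
>> on its default port), or is there a Usergrid-specific metric we should
>> look at instead?
>>
>>     curl 'http://localhost:9200/_cat/indices?v'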
>>
>> Any other suggestions will be greatly appreciated.
>>
>> Thanks
>> Jaskaran
>>
>>
>
