cache.query() starts to block when the Ignite server nodes are being
restarted and there's no baseline topology yet. The server nodes do not
block; it's the client that blocks.

The dump files are from the server nodes. The screenshot is from the client
app, taken with the YourKit profiler on the client side; the blocked threads
are marked in red in YourKit.

The app is simple: on each HTTP request it runs a cache SQL query on Ignite
and, if that succeeds, does a put back to Ignite.

The client-disconnected exception only happens when all server nodes in the
cluster are down. The blocking only happens while the cluster is trying to
establish the baseline topology.
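
As a possible client-side workaround while this is investigated, the blocking
call could be run on a separate thread and bounded with Future.get(timeout),
so the HTTP handler can still answer with an error instead of hanging. This is
only a sketch using plain java.util.concurrent: BoundedCall and
callWithTimeout are hypothetical names, not part of Ignite, and the Callable
stands in for whatever wraps cache.query(). It assumes that abandoning the
worker thread is acceptable; the blocked query itself may not react to the
interrupt.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not an Ignite API.
public class BoundedCall {
    // Daemon threads so that stuck workers do not keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-call");
        t.setDaemon(true);
        return t;
    });

    // Runs the call on a worker thread and waits at most timeoutMs for it.
    // Returns Optional.empty() on timeout (a null result is also reported as empty).
    public static <T> Optional<T> callWithTimeout(Callable<T> call, long timeoutMs)
            throws ExecutionException, InterruptedException {
        Future<T> future = POOL.submit(call);
        try {
            return Optional.ofNullable(future.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Best effort: interrupts the worker, which the blocked call may ignore.
            future.cancel(true);
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for a query that returns promptly and one that hangs.
        Optional<String> fast = callWithTimeout(() -> "rows", 1000);
        Optional<String> slow = callWithTimeout(() -> {
            Thread.sleep(5000);
            return "rows";
        }, 100);
        System.out.println("fast present: " + fast.isPresent());
        System.out.println("slow present: " + slow.isPresent());
    }
}
```

In the HTTP handler the Callable would wrap the query and the read of its
cursor, and an empty result would map to a "cluster not ready" response. Each
timed-out call can leave a worker thread blocked until the cluster recovers,
so a bounded pool may be the safer choice under load.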

On Wed., Aug. 12, 2020, 6:28 p.m. Denis Magda, <dma...@apache.org> wrote:

> John,
>
> I don't see any signs of an application-caused deadlock in the thread
> dumps. Please elaborate on the following:
>
> 7- Restart 1st node, run operation, operation fails with
>> ClientDisconnectedException but application still able to complete its
>> request.
>
>
> What's the IP address of the server node the client app uses to join the
> cluster? If that's not the address of the 1st node, that is already
> restarted, then the client couldn't join the cluster and it's expected that
> it fails with the ClientDisconnectedException.
>
> 8- Start 2nd node, run operation, from here on all operations just block.
>
>
> Are the operations unblocked and completed successfully when the third
> node joins the cluster and the cluster gets activated automatically?
>
> -
> Denis
>
>
> On Wed, Aug 12, 2020 at 11:08 AM John Smith <java.dev....@gmail.com>
> wrote:
>
>> Ok Denis here they are...
>>
>> 3 nodes, and I captured a YourKit screenshot of what it thinks are
>> deadlocks on the client app.
>>
>> https://www.dropbox.com/sh/2cxjkngvx0ubw3b/AADa--HQg-rRsY3RBo2vQeJ9a?dl=0
>>
>> On Wed, 12 Aug 2020 at 11:07, John Smith <java.dev....@gmail.com> wrote:
>>
>>> Hi Denis. I will ASAP, but I think you were right: it is the query
>>> that blocks.
>>>
>>> My application first runs a select on the cache and then does a
>>> put to cache.
>>>
>>> On Tue, 11 Aug 2020 at 19:22, Denis Magda <dma...@apache.org> wrote:
>>>
>>>> John,
>>>>
>>>> It sounds like a deadlock caused by the application logic. Is there any
>>>> chance that the operation you run on step 8 accesses several keys in one
>>>> order while the other operations work with the same keys but in a different
>>>> order? Deadlocks are possible when you use the Ignite Transaction API or
>>>> simply execute bulk operations such as cache.getAll(..) or
>>>> cache.putAll(..).
>>>>
>>>> Please take and attach thread dumps from all the cluster nodes for
>>>> analysis if we need to dig deeper.
>>>>
>>>> -
>>>> Denis
>>>>
>>>>
>>>> On Mon, Aug 10, 2020 at 6:23 PM John Smith <java.dev....@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Denis, I think you are right. It's the query that blocks; the other
>>>>> k/v operations are OK.
>>>>>
>>>>> Any thoughts on this?
>>>>>
>>>>> On Mon, 10 Aug 2020 at 15:28, John Smith <java.dev....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I tried with 2.8.1, same issue. Operations block indefinitely...
>>>>>>
>>>>>> 1- Start 3 node cluster
>>>>>> 2- Start client application client = true with Ignition.start()
>>>>>> 3- Run some cache operations, everything ok...
>>>>>> 4- Shut down one node, run operation, still ok
>>>>>> 5- Shut down 2nd node, run operation, still ok
>>>>>> 6- Shut down 3rd node, run operation, still ok... Operations start
>>>>>> failing with ClientDisconnectedException...
>>>>>> 7- Restart 1st node, run operation, operation fails with
>>>>>> ClientDisconnectedException but application still able to complete its
>>>>>> request.
>>>>>> 8- Start 2nd node, run operation, from here on all operations just
>>>>>> block.
>>>>>>
>>>>>> Basically the client application is an HTTP server that runs a cache
>>>>>> operation on each HTTP request.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, 7 Aug 2020 at 19:46, John Smith <java.dev....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> No, everything blocks... Also using 2.7.0 just in case.
>>>>>>>
>>>>>>> The only time I get an exception is if the cluster is completely off;
>>>>>>> then I get ClientDisconnectedException...
>>>>>>>
>>>>>>> On Fri, 7 Aug 2020 at 18:52, Denis Magda <dma...@apache.org> wrote:
>>>>>>>
>>>>>>>> If I'm not mistaken, key-value operations (cache.get/put) and
>>>>>>>> compute calls fail with an exception if the cluster is deactivated. Do
>>>>>>>> those fail on your end?
>>>>>>>>
>>>>>>>> As for the async and SQL operations, let's see what other community
>>>>>>>> members say.
>>>>>>>>
>>>>>>>> -
>>>>>>>> Denis
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Aug 7, 2020 at 1:06 PM John Smith <java.dev....@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi any thoughts on this?
>>>>>>>>>
>>>>>>>>> On Thu, 6 Aug 2020 at 23:33, John Smith <java.dev....@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Here is another example where it blocks.
>>>>>>>>>>
>>>>>>>>>> SqlFieldsQuery query = new SqlFieldsQuery(
>>>>>>>>>>         "select * from my_table")
>>>>>>>>>>         .setArgs(providerId, carrierCode);
>>>>>>>>>> query.setTimeout(1000, TimeUnit.MILLISECONDS);
>>>>>>>>>>
>>>>>>>>>> try (QueryCursor<List<?>> cursor = cache.query(query))
>>>>>>>>>>
>>>>>>>>>> cache.query just blocks even with the timeout set.
>>>>>>>>>>
>>>>>>>>>> Is there a way to timeout and at least have the application
>>>>>>>>>> continue and respond with an appropriate message?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, 6 Aug 2020 at 23:06, John Smith <java.dev....@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi running 2.7.0
>>>>>>>>>>>
>>>>>>>>>>> When I reboot a node and it begins to rejoin the cluster, or the
>>>>>>>>>>> cluster is not yet activated with baseline topology, operations seem
>>>>>>>>>>> to block forever, even operations that are supposed to return
>>>>>>>>>>> IgniteFuture, i.e. putAsync, getAsync, etc. They just block until the
>>>>>>>>>>> cluster resolves its state.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
