[jira] [Updated] (IGNITE-4395) Implement communication backpressure per policy - SYSTEM or PUBLIC

2017-04-06 Thread Dmitry Karachentsev (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Karachentsev updated IGNITE-4395:

Fix Version/s: (was: 2.0)
   2.1

> Implement communication backpressure per policy - SYSTEM or PUBLIC
> --
>
> Key: IGNITE-4395
> URL: https://issues.apache.org/jira/browse/IGNITE-4395
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache, compute
>Affects Versions: 1.7
>Reporter: Dmitry Karachentsev
>Assignee: Dmitry Karachentsev
> Fix For: 2.1
>
>
> 1) Start two data nodes with some cache.
> 2) From one node in async mode post some big number of jobs to another. That 
> jobs do some cache operations.
> 3) Grid hangs almost immediately and all threads are sleeping except public 
> ones, they are waiting for response.
> This happens because all cache and job messages are queued on communication 
> and limited with default number (1024). It looks like jobs are waiting for 
> cache responses that could not be received due to this limit.
> Proper solution here is to have communication backpressure per policy -
> SYSTEM or PUBLIC, but not single point as it is now. It could be achieved
> with having two queues per communication session or (which looks a bit
> easier to implement) to have separate connections.
> [PR#1331|https://github.com/apache/ignite/pull/1331] with test that leads to 
> grid hang.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (IGNITE-4395) Implement communication backpressure per policy - SYSTEM or PUBLIC

2017-02-13 Thread Anton Vinogradov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Vinogradov updated IGNITE-4395:
-
Fix Version/s: (was: 1.9)
   2.0

FixVersion changed to 2.0.
In case issue can be done till Feb 17 please rollback this change.

> Implement communication backpressure per policy - SYSTEM or PUBLIC
> --
>
> Key: IGNITE-4395
> URL: https://issues.apache.org/jira/browse/IGNITE-4395
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache, compute
>Affects Versions: 1.7
>Reporter: Dmitry Karachentsev
>Assignee: Dmitry Karachentsev
> Fix For: 2.0
>
>
> 1) Start two data nodes with some cache.
> 2) From one node in async mode post some big number of jobs to another. That 
> jobs do some cache operations.
> 3) Grid hangs almost immediately and all threads are sleeping except public 
> ones, they are waiting for response.
> This happens because all cache and job messages are queued on communication 
> and limited with default number (1024). It looks like jobs are waiting for 
> cache responses that could not be received due to this limit.
> Proper solution here is to have communication backpressure per policy -
> SYSTEM or PUBLIC, but not single point as it is now. It could be achieved
> with having two queues per communication session or (which looks a bit
> easier to implement) to have separate connections.
> [PR#1331|https://github.com/apache/ignite/pull/1331] with test that leads to 
> grid hang.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (IGNITE-4395) Implement communication backpressure per policy - SYSTEM or PUBLIC

2017-01-25 Thread Dmitry Karachentsev (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Karachentsev updated IGNITE-4395:

Fix Version/s: (was: 2.0)
   1.9

> Implement communication backpressure per policy - SYSTEM or PUBLIC
> --
>
> Key: IGNITE-4395
> URL: https://issues.apache.org/jira/browse/IGNITE-4395
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache, compute
>Affects Versions: 1.7
>Reporter: Dmitry Karachentsev
>Assignee: Dmitry Karachentsev
> Fix For: 1.9
>
>
> 1) Start two data nodes with some cache.
> 2) From one node in async mode post some big number of jobs to another. That 
> jobs do some cache operations.
> 3) Grid hangs almost immediately and all threads are sleeping except public 
> ones, they are waiting for response.
> This happens because all cache and job messages are queued on communication 
> and limited with default number (1024). It looks like jobs are waiting for 
> cache responses that could not be received due to this limit.
> Proper solution here is to have communication backpressure per policy -
> SYSTEM or PUBLIC, but not single point as it is now. It could be achieved
> with having two queues per communication session or (which looks a bit
> easier to implement) to have separate connections.
> [PR#1331|https://github.com/apache/ignite/pull/1331] with test that leads to 
> grid hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (IGNITE-4395) Implement communication backpressure per policy - SYSTEM or PUBLIC

2016-12-27 Thread Dmitry Karachentsev (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Karachentsev updated IGNITE-4395:

Fix Version/s: 2.0

> Implement communication backpressure per policy - SYSTEM or PUBLIC
> --
>
> Key: IGNITE-4395
> URL: https://issues.apache.org/jira/browse/IGNITE-4395
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache, compute
>Affects Versions: 1.7
>Reporter: Dmitry Karachentsev
> Fix For: 2.0
>
>
> 1) Start two data nodes with some cache.
> 2) From one node in async mode post some big number of jobs to another. That 
> jobs do some cache operations.
> 3) Grid hangs almost immediately and all threads are sleeping except public 
> ones, they are waiting for response.
> This happens because all cache and job messages are queued on communication 
> and limited with default number (1024). It looks like jobs are waiting for 
> cache responses that could not be received due to this limit.
> Proper solution here is to have communication backpressure per policy -
> SYSTEM or PUBLIC, but not single point as it is now. It could be achieved
> with having two queues per communication session or (which looks a bit
> easier to implement) to have separate connections.
> [PR#1331|https://github.com/apache/ignite/pull/1331] with test that leads to 
> grid hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (IGNITE-4395) Implement communication backpressure per policy - SYSTEM or PUBLIC

2016-12-07 Thread Dmitry Karachentsev (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Karachentsev updated IGNITE-4395:

Description: 
1) Start two data nodes with some cache.
2) From one node in async mode post some big number of jobs to another. That 
jobs do some cache operations.
3) Grid hangs almost immediately and all threads are sleeping except public 
ones, they are waiting for response.

This happens because all cache and job messages are queued on communication and 
limited with default number (1024). It looks like jobs are waiting for cache 
responses that could not be received due to this limit.

Proper solution here is to have communication backpressure per policy -
SYSTEM or PUBLIC, but not single point as it is now. It could be achieved
with having two queues per communication session or (which looks a bit
easier to implement) to have separate connections.

[PR#1331|https://github.com/apache/ignite/pull/1331] with test that leads to 
grid hang.

  was:
1) Start two data nodes with some cache.
2) From one node in async mode post some big number of jobs to another. That 
jobs do some cache operations.
3) Grid hangs almost immediately and all threads are sleeping except public 
ones, they are waiting for response.

This happens because all cache and job messages are queued on communication and 
limited with default number (1024). It looks like jobs are waiting for cache 
responses that could not be received due to this limit.

Proper solution here is to have communication backpressure per policy -
SYSTEM or PUBLIC, but not single point as it is now. It could be achieved
with having two queues per communication session or (which looks a bit
easier to implement) to have separate connections.


> Implement communication backpressure per policy - SYSTEM or PUBLIC
> --
>
> Key: IGNITE-4395
> URL: https://issues.apache.org/jira/browse/IGNITE-4395
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache, compute
>Affects Versions: 1.7
>Reporter: Dmitry Karachentsev
>
> 1) Start two data nodes with some cache.
> 2) From one node in async mode post some big number of jobs to another. That 
> jobs do some cache operations.
> 3) Grid hangs almost immediately and all threads are sleeping except public 
> ones, they are waiting for response.
> This happens because all cache and job messages are queued on communication 
> and limited with default number (1024). It looks like jobs are waiting for 
> cache responses that could not be received due to this limit.
> Proper solution here is to have communication backpressure per policy -
> SYSTEM or PUBLIC, but not single point as it is now. It could be achieved
> with having two queues per communication session or (which looks a bit
> easier to implement) to have separate connections.
> [PR#1331|https://github.com/apache/ignite/pull/1331] with test that leads to 
> grid hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)