[jira] [Commented] (GEODE-1885) Missing subscription event with Offheap partitioned region during bucket rebalance.

2016-09-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15527787#comment-15527787
 ] 

ASF subversion and git services commented on GEODE-1885:


Commit 55a65840a4e4d427acaed1182aca869bf92ecae6 in incubator-geode's branch 
refs/heads/feature/GEODE-1801 from [~dschneider]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-geode.git;h=55a6584 ]

GEODE-1885: fix infinite loop

The previous fix for GEODE-1885 introduced a hang on off-heap regions.
If the region is concurrently closed or destroyed while other threads
are modifying it, a modifying thread can get stuck in a hot loop that
never terminates.
The hot loop is in AbstractRegionMap, where it tests the existing
region entry it finds to see whether that entry can be modified.
If the region entry has a value that marks it as removed,
the operation spins around and tries again.
It expects the thread that marked the entry as removed
to also remove it from the map.
The previous fix for GEODE-1885 can cause that remove not to happen.
So this fix does two things:
 1. On retry, remove the existing removed region entry from the map.
 2. putEntryIfAbsent now releases the current entry only if it has an
    off-heap reference.
This prevents the infinite loop, which was caused by the current thread,
having just added a new entry with REMOVE_PHASE1, releasing that entry
(changing it to REMOVE_PHASE2) because it sees that the region is
closed/destroyed.
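
The retry behavior described in this commit message can be sketched roughly as follows. This is a minimal illustration, not Geode's code: RetryLoopSketch, Entry, REMOVED, the removeStaleOnRetry flag and the maxAttempts bound are all hypothetical names invented here, and a plain ConcurrentHashMap stands in for the region map. It only models the first change (removing the stale entry on retry), not the off-heap release logic in putEntryIfAbsent.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a region map; not Geode's AbstractRegionMap.
final class RetryLoopSketch {
    // Marker value meaning "this entry has been logically removed".
    static final Object REMOVED = new Object();

    // A map entry with a mutable value slot, loosely like a region entry.
    static final class Entry {
        final AtomicReference<Object> value;
        Entry(Object initial) { this.value = new AtomicReference<>(initial); }
    }

    final ConcurrentMap<String, Entry> map = new ConcurrentHashMap<>();

    /**
     * Stores newValue under key and returns the number of attempts used,
     * or -1 if maxAttempts was exhausted. The bound exists only so that
     * this sketch cannot hang; the loop described above has no such bound.
     */
    int put(String key, Object newValue, boolean removeStaleOnRetry, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Entry existing = map.putIfAbsent(key, new Entry(newValue));
            if (existing == null) {
                return attempt; // our new entry went into the map
            }
            Object current = existing.value.get();
            if (current != REMOVED && existing.value.compareAndSet(current, newValue)) {
                return attempt; // updated a live entry in place
            }
            if (current == REMOVED && removeStaleOnRetry) {
                // Do not wait for whoever marked the entry: drop it ourselves.
                // remove(key, value) only removes if key still maps to this entry.
                map.remove(key, existing);
            }
            // Otherwise retry and hope the marking thread removes the entry.
        }
        return -1;
    }
}
```

With a stale REMOVED entry left in the map and removeStaleOnRetry set to false, put exhausts its attempts and returns -1, which is the bounded analogue of the hang. With the flag set to true, the first attempt clears the stale entry and the second attempt succeeds.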


> Missing subscription event with Offheap partitioned region during bucket 
> rebalance.
> ---
>
> Key: GEODE-1885
> URL: https://issues.apache.org/jira/browse/GEODE-1885
> Project: Geode
>  Issue Type: Bug
>  Components: offheap
>Reporter: Anilkumar Gingade
>Assignee: Darrel Schneider
> Fix For: 1.0.0-incubating
>
>
> During a transaction, if a redundant bucket rebalance is in progress 
> concurrently, the client can miss a subscription event if its primary 
> queue is hosted on the node the bucket is moved from.
> Consider a three-node cluster N1, N2 and N3, with:
> - Client C1 connected to node N2.
> - Primary bucket region B1 on N1, and a secondary bucket for B1 on N2.
> - A transaction is started on N2, which creates an entry on B1.
> - The TX is committed. At the same time, bucket B1 on N2 is moved to 
> N3.
> - The TX commit message from N1 is sent to N2. It also includes the 
> subscription message for client C1.
> - On N2, for an offheap region, when the bucket is not found locally, an 
> exception response is sent back to N1 without processing the 
> subscription message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (GEODE-1885) Missing subscription event with Offheap partitioned region during bucket rebalance.

2016-09-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524264#comment-15524264
 ] 

ASF subversion and git services commented on GEODE-1885:


Commit 55a65840a4e4d427acaed1182aca869bf92ecae6 in incubator-geode's branch 
refs/heads/develop from [~dschneider]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-geode.git;h=55a6584 ]

GEODE-1885: fix infinite loop

The previous fix for GEODE-1885 introduced a hang on off-heap regions.
If the region is concurrently closed or destroyed while other threads
are modifying it, a modifying thread can get stuck in a hot loop that
never terminates.
The hot loop is in AbstractRegionMap, where it tests the existing
region entry it finds to see whether that entry can be modified.
If the region entry has a value that marks it as removed,
the operation spins around and tries again.
It expects the thread that marked the entry as removed
to also remove it from the map.
The previous fix for GEODE-1885 can cause that remove not to happen.
So this fix does two things:
 1. On retry, remove the existing removed region entry from the map.
 2. putEntryIfAbsent now releases the current entry only if it has an
    off-heap reference.
This prevents the infinite loop, which was caused by the current thread,
having just added a new entry with REMOVE_PHASE1, releasing that entry
(changing it to REMOVE_PHASE2) because it sees that the region is
closed/destroyed.




[jira] [Commented] (GEODE-1885) Missing subscription event with Offheap partitioned region during bucket rebalance.

2016-09-22 Thread Darrel Schneider (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514048#comment-15514048
 ] 

Darrel Schneider commented on GEODE-1885:
-

This fix caused a deadlock. If an offheap region is being destroyed while 
concurrent modifications are in progress, and a clear is done on it, the 
deadlock can happen.

The deadlock is caused by the code setting the offheap region entry value to a 
REMOVE token without throwing an exception. This causes the higher-level code 
to leave the entry in the map (had an exception been thrown, the higher-level 
code would have removed the entry from the map). Another thread that holds 
the RVV read lock then keeps finding this entry with the REMOVE token, spinning 
around, and finding it again. Because that thread holds the RVV read lock, it 
blocks the clear, which is trying to get the RVV write lock. The clear in turn 
blocks the region destroy from completing, because the destroy waits for any 
in-progress clear.
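
The first link in that chain, a spinning reader starving the clear of the write lock, can be reproduced in isolation. The sketch below is an assumption-laden illustration: it models the RVV lock with java.util.concurrent's ReentrantReadWriteLock, which may not be how Geode implements it, and RvvLockSketch and its method name are hypothetical. A latch stands in for the thread that keeps spinning on the REMOVE token.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the lock chain; not Geode's RVV locking code.
final class RvvLockSketch {

    /**
     * Returns true if a "clear" manages to take the write lock while a
     * "modifier" thread is still holding the read lock.
     */
    static boolean clearCanProceedWhileModifierSpins() {
        ReentrantReadWriteLock rvvLock = new ReentrantReadWriteLock();
        CountDownLatch modifierHoldsReadLock = new CountDownLatch(1);
        CountDownLatch stopSpinning = new CountDownLatch(1);

        Thread modifier = new Thread(() -> {
            rvvLock.readLock().lock();
            try {
                modifierHoldsReadLock.countDown();
                // Stand-in for the hot loop: stay here with the read lock held.
                stopSpinning.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                rvvLock.readLock().unlock();
            }
        });
        modifier.start();

        try {
            modifierHoldsReadLock.await();
            // The "clear" needs the write lock; give up after a short wait
            // so that the sketch reports the blockage instead of hanging.
            boolean acquired = rvvLock.writeLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                rvvLock.writeLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            stopSpinning.countDown();
            try {
                modifier.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

The method returns false: as long as the modifier never leaves its loop, the write lock is never granted, and anything waiting on the clear (here, the region destroy) waits as well.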




[jira] [Commented] (GEODE-1885) Missing subscription event with Offheap partitioned region during bucket rebalance.

2016-09-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494571#comment-15494571
 ] 

ASF subversion and git services commented on GEODE-1885:


Commit 9b710ab0af2bc6af2667010c004ad4798b0b8700 in incubator-geode's branch 
refs/heads/develop from [~agingade]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-geode.git;h=9b710ab ]

GEODE-1885: Removed call to check readiness (region check) after the offheap 
region entry is released.

GEODE-1885: Missing subscription event with Offheap partitioned region during 
bucket rebalance.

During the transaction commit on a redundant bucket region, if the bucket region 
is moved, the callback logic (to deliver subscription events) was not invoked 
because of the check-readiness call made for offheap regions. Check-readiness 
throws an exception if the region is not found, which causes the code to return 
early without sending the subscription events.

In this scenario, calling check-readiness is not needed...
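
The control-flow problem this commit describes can be illustrated with a small sketch. Everything in it is hypothetical (CallbackOrderSketch, RegionGoneException, applyCommit and the checkReadinessAfterRelease flag are invented for illustration and do not correspond to Geode's classes or methods); it only shows how an exception thrown between releasing the entry and invoking the callbacks skips event delivery.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the commit path; not Geode's transaction code.
final class CallbackOrderSketch {
    static final class RegionGoneException extends RuntimeException {
        RegionGoneException() { super("bucket region not found locally"); }
    }

    final List<String> deliveredEvents = new ArrayList<>();
    boolean regionPresent = true;

    // Stand-in for a readiness check that throws when the region is gone.
    void checkReadiness() {
        if (!regionPresent) {
            throw new RegionGoneException();
        }
    }

    void applyCommit(String event, boolean checkReadinessAfterRelease) {
        // ... the off-heap entry would be released here ...
        if (checkReadinessAfterRelease) {
            checkReadiness(); // throws if the bucket has moved away
        }
        // Callback logic: hand the subscription event to the client queue.
        deliveredEvents.add(event);
    }
}
```

With regionPresent set to false, applyCommit("e1", true) throws before the event is added, while applyCommit("e1", false) still delivers it, which matches the reasoning for dropping the check in this scenario.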


> Missing subscription event with Offheap partitioned region during bucket 
> rebalance.
> ---
>
> Key: GEODE-1885
> URL: https://issues.apache.org/jira/browse/GEODE-1885
> Project: Geode
>  Issue Type: Bug
>  Components: offheap
>Reporter: Anilkumar Gingade
>Assignee: Anilkumar Gingade
>
> During a transaction, if a redundant bucket rebalance is in progress 
> concurrently, the client can miss a subscription event if its primary 
> queue is hosted on the node the bucket is moved from.
> Consider a three-node cluster N1, N2 and N3, with:
> - Client C1 connected to node N2.
> - Primary bucket region B1 on N1, and a secondary bucket for B1 on N2.
> - A transaction is started on N2, which creates an entry on B1.
> - The TX is committed. At the same time, bucket B1 on N2 is moved to 
> N3.
> - The TX commit message from N1 is sent to N2. It also includes the 
> subscription message for client C1.
> - On N2, for an offheap region, when the bucket is not found locally, an 
> exception response is sent back to N1 without processing the 
> subscription message.




