[jira] [Updated] (GEODE-10400) Function execution triggering internal exception

2022-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated GEODE-10400:
---
Labels: needsTriage pull-request-available  (was: needsTriage)

> Function execution triggering internal exception
> 
>
> Key: GEODE-10400
> URL: https://issues.apache.org/jira/browse/GEODE-10400
> Project: Geode
>  Issue Type: Bug
>  Components: native client
>Reporter: Mario Salazar de Torres
>Assignee: Mario Salazar de Torres
>Priority: Major
>  Labels: needsTriage, pull-request-available
>
> *GIVEN* a cluster with at least 3 members
> *AND* a partitioned region with 1 redundant-copy
> *AND* a server function called *JustAFunction* with isHA=false, 
> hasResult=true, optimizeForWrite=true
> *AND* a native client configured to connect to the above cluster with a pool 
> using PR-Single-Hop=true
> *WHEN* *JustAFunction* is executed with onRegion and no filters
> *IF* the client has partial metadata due to the cluster starting up or a 
> rebalance occurring
> *THEN* an exception of type 
> *"org.apache.geode.internal.cache.execute.InternalFunctionInvocationTargetException:
>  Multiple target nodes found for single hop operation"* is thrown by one of 
> the servers
> ---
> *Additional information.* Currently, in geode-native, whenever the metadata 
> is incomplete and the user executes the server function with onRegion and 
> no filters, a request of type EXECUTE_REGION_FUNCTION_SINGLE_HOP is sent to 
> each node.
> However, the bucket partition used by the client is incorrect, which leads 
> to the exception above.
> *Potential solution.* Detect that the metadata is incomplete before 
> executing the function and send an EXECUTE_REGION_FUNCTION request to one of 
> the cluster nodes instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (GEODE-10400) Function execution triggering internal exception

2022-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583483#comment-17583483
 ] 

ASF GitHub Bot commented on GEODE-10400:


gaussianrecurrence opened a new pull request, #982:
URL: https://github.com/apache/geode-native/pull/982

- When executing server functions with isHA=false, hasResult=true
  and optimizeForWrite=true, if the metadata is incomplete, a request of
  type EXECUTE_REGION_FUNCTION_SINGLE_HOP is sent with an invalid
  bucket set partition. This causes an
  InternalFunctionInvocationTargetException.
- To solve this issue, groupByServerToBuckets now returns nullptr
  whenever any of the buckets lacks a valid location, which triggers
  the fallback mechanism: sending EXECUTE_REGION_FUNCTION to one of
  the nodes instead.
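The fallback described in the PR can be sketched as follows. This is a minimal Python model, not geode-native code: `group_by_server_to_buckets` and `plan_requests` are hypothetical stand-ins for the C++ groupByServerToBuckets logic, bucket locations are modeled as a plain dict, and picking the first server for the fallback request is an assumption (the PR only says "one of the nodes").

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical metadata model: bucket id -> server hosting the bucket, or
# None when the client's single-hop metadata has no location for it yet.
BucketLocations = Dict[int, Optional[str]]
Request = Tuple[str, str, Optional[List[int]]]


def group_by_server_to_buckets(
    locations: BucketLocations, total_buckets: int
) -> Optional[Dict[str, Set[int]]]:
    """Group every bucket of the region by its hosting server.

    Returns None as soon as one bucket has no known location, playing the
    role of the nullptr return described in the PR.
    """
    grouped: Dict[str, Set[int]] = {}
    for bucket_id in range(total_buckets):
        server = locations.get(bucket_id)
        if server is None:
            return None  # incomplete metadata: the caller must fall back
        grouped.setdefault(server, set()).add(bucket_id)
    return grouped


def plan_requests(
    locations: BucketLocations, total_buckets: int, servers: List[str]
) -> List[Request]:
    """Decide which requests an onRegion execution without filters sends."""
    grouped = group_by_server_to_buckets(locations, total_buckets)
    if grouped is None:
        # Fallback: a single non-single-hop request to one server
        # (choosing the first one is an assumption of this sketch).
        return [("EXECUTE_REGION_FUNCTION", servers[0], None)]
    # Complete metadata: one single-hop request per server with its buckets.
    return [
        ("EXECUTE_REGION_FUNCTION_SINGLE_HOP", server, sorted(buckets))
        for server, buckets in sorted(grouped.items(), key=lambda kv: kv[0])
    ]
```

With complete metadata, each server receives only its own buckets; as soon as one bucket location is unknown, the sketch sends a single EXECUTE_REGION_FUNCTION request instead of single-hop requests built from a wrong partition.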









[jira] [Updated] (GEODE-10402) Fix FunctionException handling

2022-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated GEODE-10402:
---
Labels: needsTriage pull-request-available  (was: needsTriage)

> Fix FunctionException handling
> --
>
> Key: GEODE-10402
> URL: https://issues.apache.org/jira/browse/GEODE-10402
> Project: Geode
>  Issue Type: Bug
>  Components: native client
>Reporter: Mario Salazar de Torres
>Assignee: Mario Salazar de Torres
>Priority: Major
>  Labels: needsTriage, pull-request-available
>
> *GIVEN* a ServerFunction throwing a FunctionException
> *WHEN* it is executed
> *THEN* a CacheServerException is thrown rather than a FunctionException
> 
> *Additional info.* FunctionException does not seem to be handled, which is 
> why the native API throws its default exception, CacheServerException.





[jira] [Commented] (GEODE-10402) Fix FunctionException handling

2022-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-10402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583458#comment-17583458
 ] 

ASF GitHub Bot commented on GEODE-10402:


gaussianrecurrence opened a new pull request, #981:
URL: https://github.com/apache/geode-native/pull/981

- Fixed handling for FunctionException.
- Added InternalFunctionInvocationTargetException and replaced
  GF_FUNCTION_EXCEPTION by
  GF_INTERNAL_FUNCTION_INVOCATION_TARGET_EXCEPTION, so function
  exceptions are properly handled.
- Code modified to adapt to the above changes.
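A minimal sketch of the kind of mapping the PR describes, assuming the client translates the server-side Java exception class name into a client exception type and falls back to CacheServerException for anything it does not recognize. The Python classes, the `SERVER_EXCEPTION_MAP` table and `translate_server_exception` are hypothetical illustrations; the real geode-native code uses C++ exception types and the GF_* error codes named in the PR.

```python
class CacheServerException(Exception):
    """Default client-side error for server exceptions with no mapping."""


class FunctionException(Exception):
    """Client-side counterpart of the server's FunctionException."""


class InternalFunctionInvocationTargetException(Exception):
    """Client-side counterpart of the internal invocation-target exception."""


FUNCTION_EXCEPTION = "org.apache.geode.cache.execute.FunctionException"
INTERNAL_TARGET_EXCEPTION = (
    "org.apache.geode.internal.cache.execute."
    "InternalFunctionInvocationTargetException"
)

# Hypothetical table: fully qualified Java class name -> client exception type.
SERVER_EXCEPTION_MAP = {
    FUNCTION_EXCEPTION: FunctionException,
    INTERNAL_TARGET_EXCEPTION: InternalFunctionInvocationTargetException,
}


def translate_server_exception(java_class: str, message: str) -> Exception:
    """Build the client exception for an exception reported by the server."""
    # Unmapped classes fall back to CacheServerException, which is what the
    # issue reports for FunctionException before the fix.
    exc_type = SERVER_EXCEPTION_MAP.get(java_class, CacheServerException)
    return exc_type(f"{java_class}: {message}")
```

Removing `FUNCTION_EXCEPTION` from the table reproduces the reported symptom: a server-side FunctionException surfaces as CacheServerException.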









[jira] [Updated] (GEODE-10410) Rebalance Guard Prevent Lost Bucket Recovery

2022-08-23 Thread Weijie Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Xu updated GEODE-10410:
--
Attachment: server2.log
test.tar.gz

> Rebalance Guard Prevent Lost Bucket Recovery
> 
>
> Key: GEODE-10410
> URL: https://issues.apache.org/jira/browse/GEODE-10410
> Project: Geode
>  Issue Type: Bug
>Reporter: Weijie Xu
>Priority: Major
>  Labels: needsTriage
> Attachments: server2.log, test.tar.gz
>
>
> The following steps reproduce the issue:
> Run the start.gfsh in the attached example, which configures a Geode system 
> with a partitioned region and a gateway sender. So there are two regions: the 
> manually created region and the queue region.
> Then run the example code, which loads ~400M of data and 5 times that amount 
> of events into the system. All data is loaded into the system with no lost 
> buckets and no out-of-memory errors.
> Then stop one of the servers and revoke its disk files.
> Then start the server again, which triggers a bucket recovery. After that, 
> some secondary buckets are lost:
> gfsh>show metrics --region=/example-region
>           | numBucketsWithoutRedundancy  | 63
>  
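The metric above reports buckets without full redundancy. A simplified model of that count, assuming a bucket lacks redundancy when it is hosted on fewer than redundant-copies + 1 members; the redundancy setting of the example is not stated here, so the sketch assumes 1, and `num_buckets_without_redundancy` is a hypothetical helper, not Geode's implementation.

```python
from typing import Dict, Set


def num_buckets_without_redundancy(
    hosts: Dict[int, Set[str]], redundant_copies: int
) -> int:
    """Count buckets hosted on fewer members than 1 primary + redundant copies."""
    required_copies = redundant_copies + 1
    return sum(1 for members in hosts.values() if len(members) < required_copies)


# Hypothetical layout after recovery: buckets 1 and 3 lost their secondary.
hosts = {
    0: {"server1", "server2"},
    1: {"server1"},
    2: {"server2", "server3"},
    3: {"server3"},
}
print(num_buckets_without_redundancy(hosts, redundant_copies=1))  # prints 2
```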





[jira] [Updated] (GEODE-10410) Rebalance Guard Prevent Lost Bucket Recovery

2022-08-23 Thread Alexander Murmann (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Murmann updated GEODE-10410:
--
Labels: needsTriage  (was: )






[jira] [Created] (GEODE-10410) Rebalance Guard Prevent Lost Bucket Recovery

2022-08-23 Thread Weijie Xu (Jira)
Weijie Xu created GEODE-10410:
-

 Summary: Rebalance Guard Prevent Lost Bucket Recovery
 Key: GEODE-10410
 URL: https://issues.apache.org/jira/browse/GEODE-10410
 Project: Geode
  Issue Type: Bug
Reporter: Weijie Xu







[jira] [Updated] (GEODE-10409) Rebalance Model Missing Collocated Regions At Server Startup

2022-08-23 Thread Weijie Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Xu updated GEODE-10409:
--
Attachment: server2.log
test.tar.gz

> Rebalance Model Missing Collocated Regions At Server Startup
> 
>
> Key: GEODE-10409
> URL: https://issues.apache.org/jira/browse/GEODE-10409
> Project: Geode
>  Issue Type: Bug
>Reporter: Weijie Xu
>Priority: Major
>  Labels: needsTriage
> Attachments: server2.log, test.tar.gz
>
>
> The following steps reproduce the issue:
> Run the start.gfsh in the attached example, which configures a Geode system 
> with a partitioned region, a gateway sender, and a region colocated with the 
> partitioned region. So there are three regions in total: the leader region, 
> the colocated region, and the queue region.
> Then run the example code, which loads ~400M of data and 5 times that amount 
> of events into the system.
> Then stop one of the servers and revoke its disk files.
> Then start the server again, which triggers a bucket recovery.
> From lines 596, 598 and 5958 of the attached log, we can see that the queue 
> region is not included in the rebalance model, in either the data size column 
> or the max size column.
> Then run a manual rebalance after the server is up; this time the log shows 
> that the queue region is added to the model (lines 6010, 6012, 6014 and 6028).
>  
> The inconsistent behavior leads to two negative results:
> 1) The rebalance result differs between the server startup phase and a manual 
> trigger: the startup rebalance reports that everything is OK and finishes, but 
> the manually triggered rebalance reports that there is not enough space, since 
> it includes in the model the queue region, which has 5 times the data size of 
> the leader region.
> 2) A mismatch between the rebalance model and the data actually being 
> rebalanced (the queue region data is in fact rebalanced even though the region 
> is not included in the model during the server startup phase).
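The arithmetic behind result 1) can be sketched as a simple capacity check. This is a hypothetical model, not Geode's rebalance code: the member count, per-member capacity and size units are invented for illustration, only the 5:1 queue-to-leader ratio comes from the report, and redundancy and the colocated data region are ignored.

```python
from typing import Dict, Iterable


def model_fits(
    region_sizes: Dict[str, float],
    regions_in_model: Iterable[str],
    members: int,
    max_size_per_member: float,
) -> bool:
    """Return True if the modeled data fits in the members' combined capacity."""
    modeled_total = sum(region_sizes[name] for name in regions_in_model)
    return modeled_total <= members * max_size_per_member


# Hypothetical sizes: only the 5x queue/leader ratio comes from the report.
sizes = {"leader": 400.0, "queue": 5 * 400.0}

# Startup model (queue region missing): 400 <= 3 * 700, so it looks fine.
print(model_fits(sizes, ["leader"], members=3, max_size_per_member=700.0))  # True
# Manual rebalance (queue region included): 2400 > 2100, not enough space.
print(model_fits(sizes, ["leader", "queue"], members=3, max_size_per_member=700.0))  # False
```

The same cluster passes or fails the check depending only on whether the queue region is part of the model, which is the inconsistency described above.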





[jira] [Created] (GEODE-10409) Rebalance Model Missing Collocated Regions At Server Startup

2022-08-23 Thread Weijie Xu (Jira)
Weijie Xu created GEODE-10409:
-

 Summary: Rebalance Model Missing Collocated Regions At Server Startup
 Key: GEODE-10409
 URL: https://issues.apache.org/jira/browse/GEODE-10409
 Project: Geode
  Issue Type: Bug
Reporter: Weijie Xu







[jira] [Updated] (GEODE-10409) Rebalance Model Missing Collocated Regions At Server Startup

2022-08-23 Thread Alexander Murmann (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Murmann updated GEODE-10409:
--
Labels: needsTriage  (was: )



