[jira] [Updated] (GEODE-10400) Function execution triggering internal exception
[ https://issues.apache.org/jira/browse/GEODE-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated GEODE-10400:
-----------------------------------
    Labels: needsTriage pull-request-available  (was: needsTriage)

> Function execution triggering internal exception
> ------------------------------------------------
>
>                 Key: GEODE-10400
>                 URL: https://issues.apache.org/jira/browse/GEODE-10400
>             Project: Geode
>          Issue Type: Bug
>          Components: native client
>            Reporter: Mario Salazar de Torres
>            Assignee: Mario Salazar de Torres
>            Priority: Major
>              Labels: needsTriage, pull-request-available
>
> *GIVEN* a cluster with at least 3 members
> *AND* a partitioned region with 1 redundant-copy
> *AND* a server function called *JustAFunction* with isHA=false, hasResult=true, optimizeForWrite=true
> *AND* a native client configured to connect to the above cluster with a pool using PR-Single-Hop=true
> *WHEN* *JustAFunction* is executed with onRegion and no filters
> *IF* the client has partial metadata due to the cluster starting up or a rebalance occurring
> *THEN* an exception of type *"org.apache.geode.internal.cache.execute.InternalFunctionInvocationTargetException: Multiple target nodes found for single hop operation"* is thrown by one of the servers
> ----
> *Additional information.* Currently, in geode-native, whenever the metadata is incomplete and the user executes a server function with onRegion and no filters, a request of type EXECUTE_REGION_FUNCTION_SINGLE_HOP is sent to each node. However, the bucket partition used by the client is incorrect, which leads to the exception above.
> *Potential solution.* Detect that the metadata is incomplete before executing the function, and send an EXECUTE_REGION_FUNCTION request to one of the cluster nodes instead.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (GEODE-10400) Function execution triggering internal exception
[ https://issues.apache.org/jira/browse/GEODE-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583483#comment-17583483 ]

ASF GitHub Bot commented on GEODE-10400:
----------------------------------------

gaussianrecurrence opened a new pull request, #982:
URL: https://github.com/apache/geode-native/pull/982

   - When executing server functions with isHA=false, hasResult=true and optimizeForWrite=true, if the metadata is incomplete, a request of type EXECUTE_REGION_FUNCTION_SINGLE_HOP is sent with an invalid bucket set partition. This causes an InternalFunctionInvocationTargetException.
   - To solve this issue, groupByServerToBuckets now returns nullptr if there isn't a valid location for one of the buckets. This triggers the fallback mechanism, which sends EXECUTE_REGION_FUNCTION to one of the nodes instead.
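The fallback described in the pull request can be sketched roughly as follows. This is a minimal illustration, not the geode-native code: the type aliases, the use of an empty string to mean "location unknown", and the chooseRequestType helper are all assumptions made for the example. Only the function name groupByServerToBuckets, the nullptr result, and the two request types come from the messages above.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for the client's metadata types; the real
// geode-native classes differ.
using ServerId = std::string;
using BucketSet = std::set<int>;
using ServerToBuckets = std::map<ServerId, BucketSet>;
// bucketLocations[i] names the server known to host bucket i. An empty
// string is this sketch's way of saying "location not known yet".
using BucketLocations = std::vector<ServerId>;

// Groups every bucket under the server that hosts it. Returns nullptr as
// soon as one bucket has no known location, so the caller can tell
// "metadata incomplete" apart from a valid grouping.
std::unique_ptr<ServerToBuckets> groupByServerToBuckets(
    const BucketLocations& bucketLocations) {
  auto grouped = std::make_unique<ServerToBuckets>();
  for (std::size_t bucket = 0; bucket < bucketLocations.size(); ++bucket) {
    const ServerId& location = bucketLocations[bucket];
    if (location.empty()) {
      return nullptr;  // incomplete metadata: do not attempt single hop
    }
    (*grouped)[location].insert(static_cast<int>(bucket));
  }
  return grouped;
}

enum class RequestType {
  ExecuteRegionFunctionSingleHop,
  ExecuteRegionFunction
};

// Hypothetical helper: single hop only when every bucket has a location,
// otherwise fall back to the plain request sent to one node.
RequestType chooseRequestType(const BucketLocations& bucketLocations) {
  return groupByServerToBuckets(bucketLocations) != nullptr
             ? RequestType::ExecuteRegionFunctionSingleHop
             : RequestType::ExecuteRegionFunction;
}
```

With locations {"s1", "s2", "s1"} the sketch selects the single-hop request; with {"s1", "", "s2"} it falls back to the plain request, because bucket 1 has no known host.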
[jira] [Updated] (GEODE-10402) Fix FunctionException handling
[ https://issues.apache.org/jira/browse/GEODE-10402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated GEODE-10402:
-----------------------------------
    Labels: needsTriage pull-request-available  (was: needsTriage)

> Fix FunctionException handling
> ------------------------------
>
>                 Key: GEODE-10402
>                 URL: https://issues.apache.org/jira/browse/GEODE-10402
>             Project: Geode
>          Issue Type: Bug
>          Components: native client
>            Reporter: Mario Salazar de Torres
>            Assignee: Mario Salazar de Torres
>            Priority: Major
>              Labels: needsTriage, pull-request-available
>
> *GIVEN* a ServerFunction throwing a FunctionException
> *WHEN* it is executed
> *THEN* a CacheServerException is thrown rather than a FunctionException
>
> *Additional info.* FunctionException does not seem to be handled, so the native API throws its default exception, CacheServerException, instead.
[jira] [Commented] (GEODE-10402) Fix FunctionException handling
[ https://issues.apache.org/jira/browse/GEODE-10402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583458#comment-17583458 ]

ASF GitHub Bot commented on GEODE-10402:
----------------------------------------

gaussianrecurrence opened a new pull request, #981:
URL: https://github.com/apache/geode-native/pull/981

   - Fixed handling for FunctionException.
   - Added InternalFunctionInvocationTargetException and replaced GF_FUNCTION_EXCEPTION by GF_INTERNAL_FUNCTION_INVOCATION_TARGET_EXCEPTION, so function exceptions are properly handled.
   - Code modified to adapt to the above changes.
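The behaviour reported in GEODE-10402 can be pictured with a small sketch: an exception coming back from the server is mapped to a client-side exception type, and anything without its own mapping falls through to CacheServerException. Everything below is hypothetical and simplified: the three structs are stand-ins rather than the geode-native classes, and dispatching on the Java class name is an assumption made for illustration (the pull request refers to error codes such as GF_FUNCTION_EXCEPTION).

```cpp
#include <stdexcept>
#include <string>

// Hypothetical, simplified client-side exception types.
struct CacheServerException : std::runtime_error {
  using std::runtime_error::runtime_error;
};
struct FunctionException : std::runtime_error {
  using std::runtime_error::runtime_error;
};
struct InternalFunctionInvocationTargetException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Maps the class name of a server-side exception to a client exception.
// Without the first two branches every function failure would surface as
// CacheServerException, which is the symptom described in the issue.
[[noreturn]] void throwForServerException(const std::string& serverClass,
                                          const std::string& message) {
  if (serverClass == "org.apache.geode.cache.execute.FunctionException") {
    throw FunctionException(message);
  }
  if (serverClass ==
      "org.apache.geode.internal.cache.execute."
      "InternalFunctionInvocationTargetException") {
    throw InternalFunctionInvocationTargetException(message);
  }
  // Default handling for anything that has no dedicated mapping.
  throw CacheServerException(message);
}
```

A caller that wraps function execution in a try block can then catch FunctionException directly instead of having to inspect a generic CacheServerException.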
[jira] [Updated] (GEODE-10410) Rebalance Guard Prevent Lost Bucket Recovery
[ https://issues.apache.org/jira/browse/GEODE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weijie Xu updated GEODE-10410:
------------------------------
    Attachment: server2.log
                test.tar.gz

> Rebalance Guard Prevent Lost Bucket Recovery
> --------------------------------------------
>
>                 Key: GEODE-10410
>                 URL: https://issues.apache.org/jira/browse/GEODE-10410
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Weijie Xu
>            Priority: Major
>              Labels: needsTriage
>         Attachments: server2.log, test.tar.gz
>
> The following steps reproduce the issue:
> Run the start.gfsh in the attached example, which configures a Geode system with a partitioned region and a gateway sender. So there are two regions: the manually created region and the queue region.
> Then run the example code, which loads ~400M of data and five times that amount of events into the system. All data is loaded, no buckets are lost, and there is no out-of-memory condition.
> Then stop one of the servers, and revoke the disk file of the server.
> Then start the server, which triggers a bucket recovery. After that, some of the secondary buckets are lost.
> gfsh>show metrics --region=/example-region
> | numBucketsWithoutRedundancy | 63
[jira] [Updated] (GEODE-10410) Rebalance Guard Prevent Lost Bucket Recovery
[ https://issues.apache.org/jira/browse/GEODE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Murmann updated GEODE-10410:
--------------------------------------
    Labels: needsTriage  (was: )
[jira] [Created] (GEODE-10410) Rebalance Guard Prevent Lost Bucket Recovery
Weijie Xu created GEODE-10410:
---------------------------------

             Summary: Rebalance Guard Prevent Lost Bucket Recovery
                 Key: GEODE-10410
                 URL: https://issues.apache.org/jira/browse/GEODE-10410
             Project: Geode
          Issue Type: Bug
            Reporter: Weijie Xu

The following steps reproduce the issue:
Run the start.gfsh in the attached example, which configures a Geode system with a partitioned region and a gateway sender. So there are two regions: the manually created region and the queue region.
Then run the example code, which loads ~400M of data and five times that amount of events into the system. All data is loaded, no buckets are lost, and there is no out-of-memory condition.
Then stop one of the servers, and revoke the disk file of the server.
Then start the server, which triggers a bucket recovery. After that, some of the secondary buckets are lost.
gfsh>show metrics --region=/example-region
| numBucketsWithoutRedundancy | 63
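When repeating the reproduction steps, the check on numBucketsWithoutRedundancy could be scripted. The sketch below is only an assumption about how one might do that: it parses a single pipe-separated row of the form quoted in the report, and parseMetricRow is a hypothetical helper, not part of gfsh or Geode. The full layout of the gfsh table is not shown in the report, so the code looks only for the metric name followed by its value.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Strips leading and trailing spaces and tabs from one table cell.
static std::string trim(const std::string& s) {
  const auto first = s.find_first_not_of(" \t");
  if (first == std::string::npos) return "";
  const auto last = s.find_last_not_of(" \t");
  return s.substr(first, last - first + 1);
}

// Hypothetical helper: looks for `metric` in a '|'-separated row and
// stores the number found in the following cell into `value`.
// Returns false if the metric is absent or the value is not a number.
bool parseMetricRow(const std::string& row, const std::string& metric,
                    long& value) {
  std::vector<std::string> cells;
  std::istringstream in(row);
  std::string cell;
  while (std::getline(in, cell, '|')) {
    cells.push_back(trim(cell));
  }
  for (std::size_t i = 0; i + 1 < cells.size(); ++i) {
    if (cells[i] == metric) {
      try {
        value = std::stol(cells[i + 1]);
        return true;
      } catch (const std::exception&) {
        return false;  // value cell was empty or not numeric
      }
    }
  }
  return false;
}
```

Applied to the row from the report, "| numBucketsWithoutRedundancy | 63", the helper yields 63; a recovery that restored redundancy would be expected to yield 0.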
[jira] [Updated] (GEODE-10409) Rebalance Model Missing Collocated Regions At Server Startup
[ https://issues.apache.org/jira/browse/GEODE-10409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weijie Xu updated GEODE-10409:
------------------------------
    Attachment: server2.log
                test.tar.gz

> Rebalance Model Missing Collocated Regions At Server Startup
> ------------------------------------------------------------
>
>                 Key: GEODE-10409
>                 URL: https://issues.apache.org/jira/browse/GEODE-10409
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Weijie Xu
>            Priority: Major
>              Labels: needsTriage
>         Attachments: server2.log, test.tar.gz
>
> The following steps reproduce the issue:
> Run the start.gfsh in the attached example, which configures a Geode system with a partitioned region, a gateway sender and a region collocated with the partitioned region. So there are three regions in total: the leader region, the collocated region and the queue region.
> Then run the example code, which loads ~400M of data and five times that amount of events into the system.
> Then stop one of the servers, and revoke the disk file of the server.
> Then start the server, which triggers a bucket recovery.
> From the attached log (line596, line598 and line5958), we can see that the queue region is not included in the rebalance model, neither in the data size column nor in the max size column.
> Then do a manual rebalance after the server is up; this time the log shows that the queue region is added to the model (line6010, line6012, line6014 and line6028).
>
> The inconsistent behavior leads to two negative results:
> 1) Different rebalance results between the server startup phase and a manual trigger: the startup rebalance reports that everything is OK and the rebalance finished, but the manually triggered rebalance reports that there is not enough space, because it includes the queue region, which has 5 times the data size of the leader region, in the model.
> 2) A mismatch between the rebalance model and the actual data being rebalanced (the queue region data is actually rebalanced even though the region is not included in the model at the server startup phase).
[jira] [Created] (GEODE-10409) Rebalance Model Missing Collocated Regions At Server Startup
Weijie Xu created GEODE-10409:
---------------------------------

             Summary: Rebalance Model Missing Collocated Regions At Server Startup
                 Key: GEODE-10409
                 URL: https://issues.apache.org/jira/browse/GEODE-10409
             Project: Geode
          Issue Type: Bug
            Reporter: Weijie Xu

The following steps reproduce the issue:
Run the start.gfsh in the attached example, which configures a Geode system with a partitioned region, a gateway sender and a region collocated with the partitioned region. So there are three regions in total: the leader region, the collocated region and the queue region.
Then run the example code, which loads ~400M of data and five times that amount of events into the system.
Then stop one of the servers, and revoke the disk file of the server.
Then start the server, which triggers a bucket recovery.
From the attached log (line596, line598 and line5958), we can see that the queue region is not included in the rebalance model, neither in the data size column nor in the max size column.
Then do a manual rebalance after the server is up; this time the log shows that the queue region is added to the model (line6010, line6012, line6014 and line6028).

The inconsistent behavior leads to two negative results:
1) Different rebalance results between the server startup phase and a manual trigger: the startup rebalance reports that everything is OK and the rebalance finished, but the manually triggered rebalance reports that there is not enough space, because it includes the queue region, which has 5 times the data size of the leader region, in the model.
2) A mismatch between the rebalance model and the actual data being rebalanced (the queue region data is actually rebalanced even though the region is not included in the model at the server startup phase).
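The first negative result can be illustrated with simple arithmetic. The sketch below is not Geode's rebalance model: RegionLoad, modelFits and the capacity figure used in the note afterwards are hypothetical, the unit is assumed to be megabytes, and the collocated region is left out because the report gives no size for it. It shows only that the same capacity check can pass or fail depending on whether the queue region is part of the model.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified view of one region inside a rebalance model.
struct RegionLoad {
  std::string name;
  long long dataMegabytes;
};

// Total data the model believes has to be placed.
long long modelledMegabytes(const std::vector<RegionLoad>& model) {
  long long total = 0;
  for (const auto& region : model) {
    total += region.dataMegabytes;
  }
  return total;
}

// True when everything in the model fits within the given capacity.
bool modelFits(const std::vector<RegionLoad>& model,
               long long capacityMegabytes) {
  return modelledMegabytes(model) <= capacityMegabytes;
}

// Startup-style model as described in the report: queue region missing.
std::vector<RegionLoad> startupModel() {
  return {{"leader", 400}};
}

// Manual-rebalance model: queue region included at about 5x the leader.
std::vector<RegionLoad> manualModel() {
  return {{"leader", 400}, {"queue", 2000}};
}
```

With an assumed capacity of 1000 MB, modelFits(startupModel(), 1000) is true while modelFits(manualModel(), 1000) is false, which mirrors the "everything is OK" versus "space not enough" outcomes described above even though the data actually being moved is the same.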
[jira] [Updated] (GEODE-10409) Rebalance Model Missing Collocated Regions At Server Startup
[ https://issues.apache.org/jira/browse/GEODE-10409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Murmann updated GEODE-10409:
--------------------------------------
    Labels: needsTriage  (was: )