[jira] [Commented] (GEODE-4322) Locator fails to start with NPE during join to the distributed system
[ https://issues.apache.org/jira/browse/GEODE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337943#comment-16337943 ]

Bruce Schuchardt commented on GEODE-4322:
-----------------------------------------

If you're going to shut down the whole cluster you can delete the .dat file when you stop each locator. If you shut down the first locator and then change your mind, that's okay too: if you restart it, the locator will recover its state from the still-running locator.

I've asked Brian Baynes to talk with you about this ticket and am re-assigning it to him.

> Locator fails to start with NPE during join to the distributed system
> ---------------------------------------------------------------------
>
>                 Key: GEODE-4322
>                 URL: https://issues.apache.org/jira/browse/GEODE-4322
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.2.0
>            Reporter: Vahram Aharonyan
>            Assignee: Brian Baynes
>            Priority: Major
>
> Found out that after setting security-udp-dhalgo=AES:128 in the properties files,
> the locator sometimes fails to come online with the following exception:
>
> [severe 2018/01/19 04:22:12.194 PST tid=0x45] Exception in processing request from 10.144.248.41
> java.lang.RuntimeException: Not found public key for member 16nodedata6(d4b4f5d4-47d2-44b1-a07c-6a7f5755e52d:11493):10002
>     at org.apache.geode.distributed.internal.membership.gms.messenger.GMSEncrypt.getPublicKey(GMSEncrypt.java:177)
>     at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.getPublicKey(JGroupsMessenger.java:1365)
>     at org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator.processRequest(GMSLocator.java:271)
>     at org.apache.geode.distributed.internal.InternalLocator$PrimaryHandler.processRequest(InternalLocator.java:1256)
>     at org.apache.geode.distributed.internal.tcpserver.TcpServer.lambda$processRequest$0(TcpServer.java:401)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>     at org.apache.geode.distributed.internal.membership.gms.messenger.GMSEncrypt.getPeerEncryptor(GMSEncrypt.java:258)
>     at org.apache.geode.distributed.internal.membership.gms.messenger.GMSEncrypt.getPublicKey(GMSEncrypt.java:175)
>     ... 7 more
>
> Please note that this issue is generally hit after a cluster restart. This is
> important, because during power-off the locator can go offline first, and one of the
> other members will then become coordinator and update the view file accordingly.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GEODE-4322) Locator fails to start with NPE during join to the distributed system
[ https://issues.apache.org/jira/browse/GEODE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337596#comment-16337596 ]

Vahram Aharonyan commented on GEODE-4322:
-----------------------------------------

Hi [~bschuchardt],

Thanks for the info. Yes, we are shutting down the whole cluster.

By the way, let's assume we have two locators (Loc-1 and Loc-2). If we stop Loc-1 and remove its .dat file before restarting it (while Loc-2 remains alive), will this cause problems? Or should we remove the .dat files from all locator nodes only when the whole cluster is powered off?

Thanks,
Vahram.

> Locator fails to start with NPE during join to the distributed system
> ---------------------------------------------------------------------
>
>                 Key: GEODE-4322
>                 URL: https://issues.apache.org/jira/browse/GEODE-4322
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.2.0
>            Reporter: Vahram Aharonyan
>            Assignee: Bruce Schuchardt
>            Priority: Major
[jira] [Commented] (GEODE-4322) Locator fails to start with NPE during join to the distributed system
[ https://issues.apache.org/jira/browse/GEODE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336523#comment-16336523 ]

Bruce Schuchardt commented on GEODE-4322:
-----------------------------------------

Hi [~vaharonyan],

One of the fixes for GEODE-2542, which pushes the cluster key to newly joining nodes, is not in 1.3. That one fixes a null pointer exception thrown when receiving an encrypted message with id "-150" when starting/restarting locators concurrently. This is something you are likely to hit. This fix is in the 1.4 release.

The fix for the issue you are hitting is in the 1.3 release. The symptom for this issue is an NPE where the stack trace includes GMSEncrypt.getPeerEncryptor(), as shown in this ticket's description.

Are you shutting down the whole cluster? If so, you might consider deleting the locatorView.dat files before starting the locators. You can't delete them if you're doing a rolling restart or rolling upgrade, but if the whole cluster is down it is safe to delete them. They contain reboot information that lets the locator rejoin the cluster. This information is causing confusion in the 1.2 algorithms that leads to the exception, and deleting the files may clear up the issue for you. These files are in the locator directories and have the locator's port in the file name, such as locator10334view.dat.

> Locator fails to start with NPE during join to the distributed system
> ---------------------------------------------------------------------
>
>                 Key: GEODE-4322
>                 URL: https://issues.apache.org/jira/browse/GEODE-4322
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.2.0
>            Reporter: Vahram Aharonyan
>            Assignee: Bruce Schuchardt
>            Priority: Major
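The cleanup suggested in the comment above could be scripted roughly as follows. This is only a sketch under stated assumptions: the file-name pattern is inferred from the single example given in the thread (locator10334view.dat), the function names and the `cluster_is_down` flag are hypothetical and not part of Geode, and the script has no way to verify that the cluster is really down, so the caller has to assert that explicitly.

```python
import re
from pathlib import Path

# Pattern inferred from the example in this thread ("locator10334view.dat"):
# the literal "locator", the locator's port, then "view.dat". This is an assumption.
VIEW_FILE = re.compile(r"^locator\d+view\.dat$")


def find_view_files(locator_dirs):
    """Return the locator view files found directly inside the given directories."""
    found = []
    for directory in locator_dirs:
        for path in sorted(Path(directory).iterdir()):
            if path.is_file() and VIEW_FILE.match(path.name):
                found.append(path)
    return found


def delete_view_files(locator_dirs, cluster_is_down=False):
    """Delete view files only when the caller confirms the whole cluster is down.

    The guard mirrors the advice in the thread: the files must not be deleted
    during a rolling restart or rolling upgrade.
    """
    if not cluster_is_down:
        raise RuntimeError("refusing to delete view files: cluster not confirmed down")
    removed = find_view_files(locator_dirs)
    for path in removed:
        path.unlink()
    return removed
```

Running `find_view_files` on its own first gives a dry run of what would be removed before anything is deleted.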
[jira] [Commented] (GEODE-4322) Locator fails to start with NPE during join to the distributed system
[ https://issues.apache.org/jira/browse/GEODE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335448#comment-16335448 ]

Vahram Aharonyan commented on GEODE-4322:
-----------------------------------------

Hi [~bschuchardt],

Unfortunately we don't have a way to quickly test this with Geode 1.3. We only have setups with 1.1.0 and 1.2.0.

Do you know the status of the test mentioned in GEODE-2542 on the Geode 1.3 branch? The stack traces from its failures look very similar to what we are observing.

Thanks,
Vahram.

> Locator fails to start with NPE during join to the distributed system
> ---------------------------------------------------------------------
>
>                 Key: GEODE-4322
>                 URL: https://issues.apache.org/jira/browse/GEODE-4322
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.2.0
>            Reporter: Vahram Aharonyan
>            Assignee: Bruce Schuchardt
>            Priority: Major
[jira] [Commented] (GEODE-4322) Locator fails to start with NPE during join to the distributed system
[ https://issues.apache.org/jira/browse/GEODE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335204#comment-16335204 ]

Bruce Schuchardt commented on GEODE-4322:
-----------------------------------------

Hi [~vaharonyan], have you tested with Geode 1.3? The NPE occurred when the locator was processing a "find coordinator" request during restart. A lot of the code concerning that processing was revised in 1.3 and I'd like to know if the problem still existed after those changes.

> Locator fails to start with NPE during join to the distributed system
> ---------------------------------------------------------------------
>
>                 Key: GEODE-4322
>                 URL: https://issues.apache.org/jira/browse/GEODE-4322
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.2.0
>            Reporter: Vahram Aharonyan
>            Assignee: Bruce Schuchardt
>            Priority: Major
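To check whether a locator log from another version shows the same failure, one could scan it for the signature discussed in this thread: a java.lang.NullPointerException whose stack frames include GMSEncrypt.getPeerEncryptor(). The helper below is a minimal sketch, not a Geode tool; the function name is hypothetical, and it assumes the log contains ordinary Java stack traces with one frame per line, each starting with "at".

```python
NPE = "java.lang.NullPointerException"
FRAME = "GMSEncrypt.getPeerEncryptor("


def has_peer_encryptor_npe(log_text):
    """Return True if an NPE in the log has a GMSEncrypt.getPeerEncryptor frame."""
    lines = [line.strip() for line in log_text.splitlines()]
    for i, line in enumerate(lines):
        if NPE not in line:
            continue
        # Walk only the stack frames that directly follow this exception line.
        for frame in lines[i + 1:]:
            if not frame.startswith("at "):
                break
            if FRAME in frame:
                return True
    return False
```

Matching on the method name rather than the line number (GMSEncrypt.java:258 in the 1.2.0 trace) keeps the check usable across versions where the source lines may have moved.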
[jira] [Commented] (GEODE-4322) Locator fails to start with NPE during join to the distributed system
[ https://issues.apache.org/jira/browse/GEODE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334622#comment-16334622 ]

Anthony Baker commented on GEODE-4322:
--------------------------------------

[~vaharonyan] thanks for the information. Normally we don't assign a target FixVersion without discussion on the dev list to confirm that someone is willing to commit to fixing the issue.

We have already cut the release branch for 1.4.0 and I wouldn't be surprised to see a release candidate coming in the next day or so. Perhaps it would make sense to fix this bug in a patch release that follows soon after.

> Locator fails to start with NPE during join to the distributed system
> ---------------------------------------------------------------------
>
>                 Key: GEODE-4322
>                 URL: https://issues.apache.org/jira/browse/GEODE-4322
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.2.0
>            Reporter: Vahram Aharonyan
>            Priority: Major
[jira] [Commented] (GEODE-4322) Locator fails to start with NPE during join to the distributed system
[ https://issues.apache.org/jira/browse/GEODE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334002#comment-16334002 ]

Vahram Aharonyan commented on GEODE-4322:
-----------------------------------------

Hi [~amb],

This is a really important issue, as it is blocking us from encrypting inter-node UDP communication. This means that all UDP communication within the cluster goes out as plain text.

Can't we have this fixed sooner, in the upcoming 1.4.0? A patch for older releases would be an option as well.

Thanks,
Vahram.

> Locator fails to start with NPE during join to the distributed system
> ---------------------------------------------------------------------
>
>                 Key: GEODE-4322
>                 URL: https://issues.apache.org/jira/browse/GEODE-4322
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.2.0
>            Reporter: Vahram Aharonyan
>            Priority: Major