[jira] [Commented] (ZOOKEEPER-3280) ClientCnxn xid rollover can break sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772346#comment-16772346 ] Jonathan Park commented on ZOOKEEPER-3280: -- I'm afraid I don't know enough to help out. > ClientCnxn xid rollover can break sessions > -- > > Key: ZOOKEEPER-3280 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3280 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.6, 3.4.12 >Reporter: Jonathan Park >Priority: Major > Attachments: ZKOutOfOrder.java > > > > {code:java} > 2019-02-15 13:40:21,471 [myid:] - DEBUG > [main-SendThread(localhost:2181):ClientCnxn$SendThread@759] - Got auth > sessionid:0x168f2c5e9c60017 > 2019-02-15 13:40:21,472 [myid:] - WARN > [main-SendThread(localhost:2181):ClientCnxn$SendThread@1166] - Session > 0x168f2c5e9c60017 for server localhost/0:0:0:0:0:0:0:1:2181, unexpected > error, closing socket connection and attempting reconnect > java.io.IOException: Xid out of order. Got Xid -3 with err 0 expected Xid -4 > for a packet with details: clientPath:null serverPath:null finished:false > header:: -4,8 replyHeader:: 0,0,-4 request:: '/,F response:: v{} > at > org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:828) > at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143) > 2019-02-15 13:40:22,520 [myid:] - INFO > [main-SendThread(localhost:2181):ClientCnxn$SendThread@1027] - Opening socket > connection to server localhost/127.0.0.1:2181. Will not attempt to > authenticate using SASL (unknown error) > 2019-02-15 13:40:22,521 [myid:] - INFO > [main-SendThread(localhost:2181):ClientCnxn$SendThread@877] - Socket > connection established to localhost/127.0.0.1:2181, initiating session > 2019-02-15 13:40:22,521 [myid:] - DEBUG > [main-SendThread(localhost:2181):ClientCnxn$SendThread@950] - Session > establishment request sent on localhost/127.0.0.1:2181 > 2019-02-15 13:40:22,522 [myid:] - INFO > [main-SendThread(localhost:2181):ClientCnxn$SendThread@1301] - Session > establishment complete on server localhost/127.0.0.1:2181, sessionid = > 0x168f2c5e9c60017, negotiated timeout = 3 > 2019-02-15 13:40:22,525 [myid:] - DEBUG > [main-SendThread(localhost:2181):ClientCnxn$SendThread@742] - Got ping > response for sessionid: 0x168f2c5e9c60017 after 235329552ms > {code} > ClientCnxn xid's are tracked as java int's. For long-lived ZK clients this > can lead to rollover into the negative xid space. Xid = -4 is treated as a > special xid reserved for auth requests. With xid rollover, a normal ZK > request can also have xid = -4 but the response will be treated as an auth > response making subsequent packet processing error with the exception above. > We can reproduce this more readily by changing the starting xid in ClientCnxn > from 1 to -100. The ZK client will transparently reconnect and establish a > new session but features that depended on the same session persisting will > unnecessarily experience a disconnected event. > > I've attached simple class with a main() method that reproduces the failure > quickly against a local ZK server after modifying the initial value of > ClientCnxn.xid from 1 to -100. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3280) ClientCnxn xid rollover can break sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769876#comment-16769876 ] Jonathan Park commented on ZOOKEEPER-3280: -- Liam, I'm afraid I don't have the context to understand what you're referring to. > ClientCnxn xid rollover can break sessions > -- > > Key: ZOOKEEPER-3280 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3280 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.6, 3.4.12 >Reporter: Jonathan Park >Priority: Major > Attachments: ZKOutOfOrder.java > > > > {code:java} > 2019-02-15 13:40:21,471 [myid:] - DEBUG > [main-SendThread(localhost:2181):ClientCnxn$SendThread@759] - Got auth > sessionid:0x168f2c5e9c60017 > 2019-02-15 13:40:21,472 [myid:] - WARN > [main-SendThread(localhost:2181):ClientCnxn$SendThread@1166] - Session > 0x168f2c5e9c60017 for server localhost/0:0:0:0:0:0:0:1:2181, unexpected > error, closing socket connection and attempting reconnect > java.io.IOException: Xid out of order. Got Xid -3 with err 0 expected Xid -4 > for a packet with details: clientPath:null serverPath:null finished:false > header:: -4,8 replyHeader:: 0,0,-4 request:: '/,F response:: v{} > at > org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:828) > at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143) > 2019-02-15 13:40:22,520 [myid:] - INFO > [main-SendThread(localhost:2181):ClientCnxn$SendThread@1027] - Opening socket > connection to server localhost/127.0.0.1:2181. Will not attempt to > authenticate using SASL (unknown error) > 2019-02-15 13:40:22,521 [myid:] - INFO > [main-SendThread(localhost:2181):ClientCnxn$SendThread@877] - Socket > connection established to localhost/127.0.0.1:2181, initiating session > 2019-02-15 13:40:22,521 [myid:] - DEBUG > [main-SendThread(localhost:2181):ClientCnxn$SendThread@950] - Session > establishment request sent on localhost/127.0.0.1:2181 > 2019-02-15 13:40:22,522 [myid:] - INFO > [main-SendThread(localhost:2181):ClientCnxn$SendThread@1301] - Session > establishment complete on server localhost/127.0.0.1:2181, sessionid = > 0x168f2c5e9c60017, negotiated timeout = 3 > 2019-02-15 13:40:22,525 [myid:] - DEBUG > [main-SendThread(localhost:2181):ClientCnxn$SendThread@742] - Got ping > response for sessionid: 0x168f2c5e9c60017 after 235329552ms > {code} > ClientCnxn xid's are tracked as java int's. For long-lived ZK clients this > can lead to rollover into the negative xid space. Xid = -4 is treated as a > special xid reserved for auth requests. With xid rollover, a normal ZK > request can also have xid = -4 but the response will be treated as an auth > response making subsequent packet processing error with the exception above. > We can reproduce this more readily by changing the starting xid in ClientCnxn > from 1 to -100. The ZK client will transparently reconnect and establish a > new session but features that depended on the same session persisting will > unnecessarily experience a disconnected event. > > I've attached simple class with a main() method that reproduces the failure > quickly against a local ZK server after modifying the initial value of > ClientCnxn.xid from 1 to -100. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3280) ClientCnxn xid rollover can break sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769796#comment-16769796 ] Jonathan Park commented on ZOOKEEPER-3280: -- Yes, go ahead unless you want to use a different medium that Jira comments. > ClientCnxn xid rollover can break sessions > -- > > Key: ZOOKEEPER-3280 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3280 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.6, 3.4.12 >Reporter: Jonathan Park >Priority: Major > Attachments: ZKOutOfOrder.java > > > > {code:java} > 2019-02-15 13:40:21,471 [myid:] - DEBUG > [main-SendThread(localhost:2181):ClientCnxn$SendThread@759] - Got auth > sessionid:0x168f2c5e9c60017 > 2019-02-15 13:40:21,472 [myid:] - WARN > [main-SendThread(localhost:2181):ClientCnxn$SendThread@1166] - Session > 0x168f2c5e9c60017 for server localhost/0:0:0:0:0:0:0:1:2181, unexpected > error, closing socket connection and attempting reconnect > java.io.IOException: Xid out of order. Got Xid -3 with err 0 expected Xid -4 > for a packet with details: clientPath:null serverPath:null finished:false > header:: -4,8 replyHeader:: 0,0,-4 request:: '/,F response:: v{} > at > org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:828) > at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143) > 2019-02-15 13:40:22,520 [myid:] - INFO > [main-SendThread(localhost:2181):ClientCnxn$SendThread@1027] - Opening socket > connection to server localhost/127.0.0.1:2181. Will not attempt to > authenticate using SASL (unknown error) > 2019-02-15 13:40:22,521 [myid:] - INFO > [main-SendThread(localhost:2181):ClientCnxn$SendThread@877] - Socket > connection established to localhost/127.0.0.1:2181, initiating session > 2019-02-15 13:40:22,521 [myid:] - DEBUG > [main-SendThread(localhost:2181):ClientCnxn$SendThread@950] - Session > establishment request sent on localhost/127.0.0.1:2181 > 2019-02-15 13:40:22,522 [myid:] - INFO > [main-SendThread(localhost:2181):ClientCnxn$SendThread@1301] - Session > establishment complete on server localhost/127.0.0.1:2181, sessionid = > 0x168f2c5e9c60017, negotiated timeout = 3 > 2019-02-15 13:40:22,525 [myid:] - DEBUG > [main-SendThread(localhost:2181):ClientCnxn$SendThread@742] - Got ping > response for sessionid: 0x168f2c5e9c60017 after 235329552ms > {code} > ClientCnxn xid's are tracked as java int's. For long-lived ZK clients this > can lead to rollover into the negative xid space. Xid = -4 is treated as a > special xid reserved for auth requests. With xid rollover, a normal ZK > request can also have xid = -4 but the response will be treated as an auth > response making subsequent packet processing error with the exception above. > We can reproduce this more readily by changing the starting xid in ClientCnxn > from 1 to -100. The ZK client will transparently reconnect and establish a > new session but features that depended on the same session persisting will > unnecessarily experience a disconnected event. > > I've attached simple class with a main() method that reproduces the failure > quickly against a local ZK server after modifying the initial value of > ClientCnxn.xid from 1 to -100. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ZOOKEEPER-3280) ClientCnxn xid rollover can break sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park resolved ZOOKEEPER-3280. -- Resolution: Duplicate > ClientCnxn xid rollover can break sessions > -- > > Key: ZOOKEEPER-3280 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3280 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.6, 3.4.12 >Reporter: Jonathan Park >Priority: Major > Attachments: ZKOutOfOrder.java > > > > {code:java} > 2019-02-15 13:40:21,471 [myid:] - DEBUG > [main-SendThread(localhost:2181):ClientCnxn$SendThread@759] - Got auth > sessionid:0x168f2c5e9c60017 > 2019-02-15 13:40:21,472 [myid:] - WARN > [main-SendThread(localhost:2181):ClientCnxn$SendThread@1166] - Session > 0x168f2c5e9c60017 for server localhost/0:0:0:0:0:0:0:1:2181, unexpected > error, closing socket connection and attempting reconnect > java.io.IOException: Xid out of order. Got Xid -3 with err 0 expected Xid -4 > for a packet with details: clientPath:null serverPath:null finished:false > header:: -4,8 replyHeader:: 0,0,-4 request:: '/,F response:: v{} > at > org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:828) > at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143) > 2019-02-15 13:40:22,520 [myid:] - INFO > [main-SendThread(localhost:2181):ClientCnxn$SendThread@1027] - Opening socket > connection to server localhost/127.0.0.1:2181. Will not attempt to > authenticate using SASL (unknown error) > 2019-02-15 13:40:22,521 [myid:] - INFO > [main-SendThread(localhost:2181):ClientCnxn$SendThread@877] - Socket > connection established to localhost/127.0.0.1:2181, initiating session > 2019-02-15 13:40:22,521 [myid:] - DEBUG > [main-SendThread(localhost:2181):ClientCnxn$SendThread@950] - Session > establishment request sent on localhost/127.0.0.1:2181 > 2019-02-15 13:40:22,522 [myid:] - INFO > [main-SendThread(localhost:2181):ClientCnxn$SendThread@1301] - Session > establishment complete on server localhost/127.0.0.1:2181, sessionid = > 0x168f2c5e9c60017, negotiated timeout = 3 > 2019-02-15 13:40:22,525 [myid:] - DEBUG > [main-SendThread(localhost:2181):ClientCnxn$SendThread@742] - Got ping > response for sessionid: 0x168f2c5e9c60017 after 235329552ms > {code} > ClientCnxn xid's are tracked as java int's. For long-lived ZK clients this > can lead to rollover into the negative xid space. Xid = -4 is treated as a > special xid reserved for auth requests. With xid rollover, a normal ZK > request can also have xid = -4 but the response will be treated as an auth > response making subsequent packet processing error with the exception above. > We can reproduce this more readily by changing the starting xid in ClientCnxn > from 1 to -100. The ZK client will transparently reconnect and establish a > new session but features that depended on the same session persisting will > unnecessarily experience a disconnected event. > > I've attached simple class with a main() method that reproduces the failure > quickly against a local ZK server after modifying the initial value of > ClientCnxn.xid from 1 to -100. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3280) ClientCnxn xid rollover can break sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769787#comment-16769787 ] Jonathan Park commented on ZOOKEEPER-3280: -- Indeed it is. My apologies, I looked for a duplicate but did not find one. Closing this one in favor of ZOOKEEPER-3253 > ClientCnxn xid rollover can break sessions > -- > > Key: ZOOKEEPER-3280 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3280 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.6, 3.4.12 >Reporter: Jonathan Park >Priority: Major > Attachments: ZKOutOfOrder.java > > > > {code:java} > 2019-02-15 13:40:21,471 [myid:] - DEBUG > [main-SendThread(localhost:2181):ClientCnxn$SendThread@759] - Got auth > sessionid:0x168f2c5e9c60017 > 2019-02-15 13:40:21,472 [myid:] - WARN > [main-SendThread(localhost:2181):ClientCnxn$SendThread@1166] - Session > 0x168f2c5e9c60017 for server localhost/0:0:0:0:0:0:0:1:2181, unexpected > error, closing socket connection and attempting reconnect > java.io.IOException: Xid out of order. Got Xid -3 with err 0 expected Xid -4 > for a packet with details: clientPath:null serverPath:null finished:false > header:: -4,8 replyHeader:: 0,0,-4 request:: '/,F response:: v{} > at > org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:828) > at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143) > 2019-02-15 13:40:22,520 [myid:] - INFO > [main-SendThread(localhost:2181):ClientCnxn$SendThread@1027] - Opening socket > connection to server localhost/127.0.0.1:2181. Will not attempt to > authenticate using SASL (unknown error) > 2019-02-15 13:40:22,521 [myid:] - INFO > [main-SendThread(localhost:2181):ClientCnxn$SendThread@877] - Socket > connection established to localhost/127.0.0.1:2181, initiating session > 2019-02-15 13:40:22,521 [myid:] - DEBUG > [main-SendThread(localhost:2181):ClientCnxn$SendThread@950] - Session > establishment request sent on localhost/127.0.0.1:2181 > 2019-02-15 13:40:22,522 [myid:] - INFO > [main-SendThread(localhost:2181):ClientCnxn$SendThread@1301] - Session > establishment complete on server localhost/127.0.0.1:2181, sessionid = > 0x168f2c5e9c60017, negotiated timeout = 3 > 2019-02-15 13:40:22,525 [myid:] - DEBUG > [main-SendThread(localhost:2181):ClientCnxn$SendThread@742] - Got ping > response for sessionid: 0x168f2c5e9c60017 after 235329552ms > {code} > ClientCnxn xid's are tracked as java int's. For long-lived ZK clients this > can lead to rollover into the negative xid space. Xid = -4 is treated as a > special xid reserved for auth requests. With xid rollover, a normal ZK > request can also have xid = -4 but the response will be treated as an auth > response making subsequent packet processing error with the exception above. > We can reproduce this more readily by changing the starting xid in ClientCnxn > from 1 to -100. The ZK client will transparently reconnect and establish a > new session but features that depended on the same session persisting will > unnecessarily experience a disconnected event. > > I've attached simple class with a main() method that reproduces the failure > quickly against a local ZK server after modifying the initial value of > ClientCnxn.xid from 1 to -100. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ZOOKEEPER-3280) ClientCnxn xid rollover can break sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ZOOKEEPER-3280: - Description: {code:java} 2019-02-15 13:40:21,471 [myid:] - DEBUG [main-SendThread(localhost:2181):ClientCnxn$SendThread@759] - Got auth sessionid:0x168f2c5e9c60017 2019-02-15 13:40:21,472 [myid:] - WARN [main-SendThread(localhost:2181):ClientCnxn$SendThread@1166] - Session 0x168f2c5e9c60017 for server localhost/0:0:0:0:0:0:0:1:2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Xid out of order. Got Xid -3 with err 0 expected Xid -4 for a packet with details: clientPath:null serverPath:null finished:false header:: -4,8 replyHeader:: 0,0,-4 request:: '/,F response:: v{} at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:828) at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143) 2019-02-15 13:40:22,520 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1027] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2019-02-15 13:40:22,521 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@877] - Socket connection established to localhost/127.0.0.1:2181, initiating session 2019-02-15 13:40:22,521 [myid:] - DEBUG [main-SendThread(localhost:2181):ClientCnxn$SendThread@950] - Session establishment request sent on localhost/127.0.0.1:2181 2019-02-15 13:40:22,522 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1301] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x168f2c5e9c60017, negotiated timeout = 3 2019-02-15 13:40:22,525 [myid:] - DEBUG [main-SendThread(localhost:2181):ClientCnxn$SendThread@742] - Got ping response for sessionid: 0x168f2c5e9c60017 after 235329552ms {code} ClientCnxn xid's are tracked as java int's. For long-lived ZK clients this can lead to rollover into the negative xid space. Xid = -4 is treated as a special xid reserved for auth requests. With xid rollover, a normal ZK request can also have xid = -4 but the response will be treated as an auth response making subsequent packet processing error with the exception above. We can reproduce this more readily by changing the starting xid in ClientCnxn from 1 to -100. The ZK client will transparently reconnect and establish a new session but features that depended on the same session persisting will unnecessarily experience a disconnected event. I've attached simple class with a main() method that reproduces the failure quickly against a local ZK server after modifying the initial value of ClientCnxn.xid from 1 to -100. was: {code:java} 181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Xid out of order. Got Xid -3 with err 0 expected Xid -4 for a packet with details: clientPath:null serverPath:null finished:false header:: -4,8 replyHeader:: 0,0,-4 request:: '/,F response:: v{} at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:828) at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143) 2019-02-15 13:40:22,520 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1027] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2019-02-15 13:40:22,521 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@877] - Socket connection established to localhost/127.0.0.1:2181, initiating session 2019-02-15 13:40:22,521 [myid:] - DEBUG [main-SendThread(localhost:2181):ClientCnxn$SendThread@950] - Session establishment request sent on localhost/127.0.0.1:2181 2019-02-15 13:40:22,522 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1301] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x168f2c5e9c60017, negotiated timeout = 3 2019-02-15 13:40:22,525 [myid:] - DEBUG [main-SendThread(localhost:2181):ClientCnxn$SendThread@742] - Got ping response for sessionid: 0x168f2c5e9c60017 after 235329552ms {code} ClientCnxn xid's are tracked as java int's. For long-lived ZK clients this can lead to rollover into the negative xid space. Xid = -4 is treated as a special xid reserved for auth requests. With xid rollover, a normal ZK request can also have xid = -4 but the response will be treated as an auth response making subsequent packet processing error with the exception above. We can
[jira] [Created] (ZOOKEEPER-3280) ClientCnxn xid rollover can break sessions
Jonathan Park created ZOOKEEPER-3280: Summary: ClientCnxn xid rollover can break sessions Key: ZOOKEEPER-3280 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3280 Project: ZooKeeper Issue Type: Bug Components: java client Affects Versions: 3.4.12, 3.4.6 Reporter: Jonathan Park Attachments: ZKOutOfOrder.java {code:java} 181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Xid out of order. Got Xid -3 with err 0 expected Xid -4 for a packet with details: clientPath:null serverPath:null finished:false header:: -4,8 replyHeader:: 0,0,-4 request:: '/,F response:: v{} at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:828) at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143) 2019-02-15 13:40:22,520 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1027] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2019-02-15 13:40:22,521 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@877] - Socket connection established to localhost/127.0.0.1:2181, initiating session 2019-02-15 13:40:22,521 [myid:] - DEBUG [main-SendThread(localhost:2181):ClientCnxn$SendThread@950] - Session establishment request sent on localhost/127.0.0.1:2181 2019-02-15 13:40:22,522 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1301] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x168f2c5e9c60017, negotiated timeout = 3 2019-02-15 13:40:22,525 [myid:] - DEBUG [main-SendThread(localhost:2181):ClientCnxn$SendThread@742] - Got ping response for sessionid: 0x168f2c5e9c60017 after 235329552ms {code} ClientCnxn xid's are tracked as java int's. For long-lived ZK clients this can lead to rollover into the negative xid space. Xid = -4 is treated as a special xid reserved for auth requests. With xid rollover, a normal ZK request can also have xid = -4 but the response will be treated as an auth response making subsequent packet processing error with the exception above. We can reproduce this more readily by changing the starting xid in ClientCnxn from 1 to -100. The ZK client will transparently reconnect and establish a new session but features that depended on the same session persisting will unnecessarily experience a disconnected event. I've attached simple class with a main() method that reproduces the failure quickly against a local ZK server after modifying the initial value of ClientCnxn.xid from 1 to -100. -- This message was sent by Atlassian JIRA (v7.6.3#76005)