Huizhi Lu created ZOOKEEPER-4053:
------------------------------------

Summary: ConnectionLossException is vague for failing to read/write large znode
Key: ZOOKEEPER-4053
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4053
Project: ZooKeeper
Issue Type: Improvement
Components: java client
Affects Versions: 3.6.2
Reporter: Huizhi Lu
Assignee: Huizhi Lu
h2. Description

[Related discussion thread|https://mail-archives.apache.org/mod_mbox/zookeeper-dev/202101.mbox/ajax/%3CCAHVM2p%3DJ2GE1jQ3_rs2npSZ%2Bm8evszATKTvBQrmjqMdM5is22Q%40mail.gmail.com%3E]

Assume the default 1 MB jute.maxbuffer is in use. If a zk client tries to write a znode larger than 1 MB, the server rejects the request: it logs "Len error" and closes the connection. The client only sees a connection loss, with nothing to indicate that the request was too large.

A third-party ZkClient library (e.g. I0IZkClient) keeps retrying the operation on connection loss. Because the same oversized request fails the same way every time, the retries never end, and this retry loop could take down the zk server.

h2. Log

The client log shows only a generic I/O error:

{noformat}
2021/01/04 18:49:06.372 WARN [ClientCnxn] [main-SendThread(localhost:2181)] Session 0x776989df3190104 for server localhost:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Broken pipe
2021/01/04 20:03:22.535 WARN [ClientCnxn] [main-SendThread(localhost:2181)] Session 0x776989df3190104 for server localhost:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
{noformat}

In fact, the error log on the server has more meaningful information:

{noformat}
2021-01-04 19:19:38,467 [myid:8] - WARN [NIOServerCxn.Factory:/0.0.0.0:2181:NIOServerCnxn@373] - Exception causing close of session 0x976988b591a010b due to java.io.IOException: Len error 1076482
2021-01-04 19:19:38,842 [myid:8] - WARN
{noformat}

h2. Proposed Solution

The client side should also block large data writes: add a sanity check on the buffer size of each outgoing request, and throw a new KeeperException that signals clients to stop retrying the same operation. This is more efficient because the request is never sent to the server, so a round trip is saved and the server does not have to close the connection.
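
To make the proposed client-side check concrete, here is a minimal sketch in plain Java. It is not the actual patch: the class {{RequestSizeCheck}}, the exception {{RequestTooLargeException}} (standing in for the proposed new KeeperException) and the way the size is estimated are all hypothetical. The only real name is the {{jute.maxbuffer}} system property; the fallback value 0xfffff is an assumption about the default.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical client-side guard; none of these names come from ZooKeeper itself. */
final class RequestSizeCheck {

    /** Hypothetical non-retryable error, standing in for the proposed new KeeperException. */
    static final class RequestTooLargeException extends Exception {
        RequestTooLargeException(String path, int size, int limit) {
            super("Request for " + path + " is about " + size
                    + " bytes, which exceeds the limit of " + limit + " bytes");
        }
    }

    /** Reads the limit from the jute.maxbuffer system property (fallback value is assumed). */
    static int maxBuffer() {
        return Integer.getInteger("jute.maxbuffer", 0xfffff);
    }

    /**
     * Rejects an outgoing write before it is sent. The size is only a rough
     * estimate (path bytes plus data bytes); a real request also carries
     * headers and ACLs, which this sketch ignores.
     */
    static void checkSize(String path, byte[] data, int limit) throws RequestTooLargeException {
        int dataLength = (data == null) ? 0 : data.length;
        int size = path.getBytes(StandardCharsets.UTF_8).length + dataLength;
        if (size > limit) {
            throw new RequestTooLargeException(path, size, limit);
        }
    }

    public static void main(String[] args) {
        // Payload size taken from the "Len error 1076482" server log line.
        byte[] payload = new byte[1076482];
        try {
            checkSize("/large-znode", payload, maxBuffer());
            System.out.println("Request is within the limit and would be sent");
        } catch (RequestTooLargeException e) {
            // A retrying client can treat this as fatal instead of looping forever.
            System.out.println("Not sending request: " + e.getMessage());
        }
    }
}
```

Using a dedicated exception type rather than a connection loss lets a retrying wrapper distinguish "this request can never succeed" from "the connection dropped, try again", which is the distinction the retry loop described above is missing.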