[jira] [Commented] (NIFI-9878) DistributedCacheMap Handshake failure, processor hang indefinitely.
[ https://issues.apache.org/jira/browse/NIFI-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17603647#comment-17603647 ]

Jon Shoemaker commented on NIFI-9878:
-------------------------------------

[~exceptionfactory] Submitted pull request

> DistributedCacheMap Handshake failure, processor hang indefinitely.
> -------------------------------------------------------------------
>
>                 Key: NIFI-9878
>                 URL: https://issues.apache.org/jira/browse/NIFI-9878
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.15.3, 1.17.0, 1.16.3
>            Reporter: Aaron Rich
>            Assignee: Jon Shoemaker
>            Priority: Major
>              Labels: Handshake, distributed_cache
>         Attachments: 0001-NIFI-9878-fix-for-hanging-client-thread-with-handsha.patch, image-2022-04-05-21-54-31-002.png, image-2022-04-05-21-55-16-221.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a DistributedCacheMapClient attempts to connect to a
> DistributedCacheMapServer, but the handshake response is never received by
> the client, the PutDistributedCacheMap processor will hang indefinitely. The
> handshake never times out.
> A situation like this can occur if a proxy allows the TCP connection
> to be established between client and server but fails to deliver handshake
> data to/from the DistributedCacheMapServer (for example, an unstable Istio
> service mesh between the two). It could also happen if a client was
> accidentally misconfigured to point to the wrong TCP endpoint (one that is
> not hosting a DistributedCacheMapServer).
> Steps to recreate:
> 1) Set up a PutDistributedCacheMap processor with a
> DistributedMapCacheClientService.
> 2) Configure the DistributedMapCacheClientService to point to a TCP server
> that is not a DistributedCacheMapServer (nc -lk 127.0.0.1 4457). This
> simulates a situation where the socket connection can be made but there is
> no handshake response from the server (for example, the server is in a bad
> state and unable to respond, a proxy is misbehaving, etc.).
> 3) Use GenerateFlowFile to trigger the PutDistributedCacheMap processor.
> 4) The processor will hang with no failure or success. The processor will
> have to be force terminated.
> !image-2022-04-05-21-54-31-002.png!
> !image-2022-04-05-21-55-16-221.png!
> Hang occurs at:
> CacheClientRequestHandler.java:92: handshakeHandler.waitHandshakeComplete();
>
> Currently, the "connection timeout" parameter is only used to time out the
> establishment of the TCP socket connection, not the full application layer
> connection.
> Suggestion:
> The handshake should have a timeout too, so that the client is robust to a
> network outage where the TCP connection can be created but the handshake
> data can't be exchanged. The processor hanging prevents any way to handle
> this error in a dataflow.
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
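
The failure mode in the report above (the TCP connection is established, but no handshake reply ever arrives) can be illustrated with plain Java sockets. This is only a minimal sketch under stated assumptions: it is not NiFi's client code and not the attached patch. The class name `HandshakeTimeoutSketch`, the method `handshakeAnswered`, and the timeout values are hypothetical, and the "handshake" is reduced to waiting for a single byte. It shows that a connect timeout bounds only the TCP connect, while a separate read timeout is needed to bound the wait for the reply.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class HandshakeTimeoutSketch {

    /**
     * Connects to the given address and waits for the first byte of a reply.
     * Returns true if a byte arrived, false if the peer closed the stream or
     * the read timed out. (Hypothetical helper, for illustration only.)
     */
    static boolean handshakeAnswered(InetSocketAddress server,
                                     int connectTimeoutMs,
                                     int handshakeTimeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            // Bounds only the TCP connect, like the existing "connection timeout".
            socket.connect(server, connectTimeoutMs);
            // Bounds each blocking read; with the default of 0 the read below
            // could block forever when the peer stays silent.
            socket.setSoTimeout(handshakeTimeoutMs);
            try {
                return socket.getInputStream().read() != -1;
            } catch (SocketTimeoutException e) {
                return false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A listener that never accepts or replies; a stand-in for the
        // `nc -lk 127.0.0.1 4457` step in the report.
        try (ServerSocket silent = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            InetSocketAddress address =
                    new InetSocketAddress(silent.getInetAddress(), silent.getLocalPort());
            boolean answered = handshakeAnswered(address, 1000, 500);
            System.out.println("handshake answered: " + answered);
        }
    }
}
```

On a typical OS the kernel completes the TCP connection for a listening socket before the application accepts it, so the connect succeeds and only the read timeout ends the wait. Removing the `setSoTimeout` call makes the sketch block in `read()`, which mirrors the indefinite hang described in the report.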
[jira] [Commented] (NIFI-9878) DistributedCacheMap Handshake failure, processor hang indefinitely.
[ https://issues.apache.org/jira/browse/NIFI-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17603602#comment-17603602 ]

Jon Shoemaker commented on NIFI-9878:
-------------------------------------

Due to refactoring of the DistributedCacheClient code, the patch, as written, only works with 1.17 and 1.18.
[jira] [Updated] (NIFI-9878) DistributedCacheMap Handshake failure, processor hang indefinitely.
[ https://issues.apache.org/jira/browse/NIFI-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Shoemaker updated NIFI-9878:
--------------------------------
    Affects Version/s: 1.17.0
[jira] [Updated] (NIFI-9878) DistributedCacheMap Handshake failure, processor hang indefinitely.
[ https://issues.apache.org/jira/browse/NIFI-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Shoemaker updated NIFI-9878:
--------------------------------
    Affects Version/s: 1.16.3
               Status: Patch Available  (was: Open)
[jira] [Updated] (NIFI-9878) DistributedCacheMap Handshake failure, processor hang indefinitely.
[ https://issues.apache.org/jira/browse/NIFI-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Shoemaker updated NIFI-9878:
--------------------------------
    Attachment: 0001-NIFI-9878-fix-for-hanging-client-thread-with-handsha.patch
[jira] [Assigned] (NIFI-9878) DistributedCacheMap Handshake failure, processor hang indefinitely.
[ https://issues.apache.org/jira/browse/NIFI-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Shoemaker reassigned NIFI-9878:
-----------------------------------
    Assignee: Jon Shoemaker
[jira] [Commented] (NIFI-9878) DistributedCacheMap Handshake failure, processor hang indefinitely.
[ https://issues.apache.org/jira/browse/NIFI-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583714#comment-17583714 ]

Jon Shoemaker commented on NIFI-9878:
-------------------------------------

Experiencing the same issue. In our scenario it works correctly most of the time, but occasionally the handshake response is never received and the processor thread hangs until the processor is terminated. Some of these stuck threads happen when the DistributedCacheServer is restarted.
[jira] [Created] (NIFI-9788) Update Apache Commons Codec to 1.15
Jon Shoemaker created NIFI-9788:
-----------------------------------

             Summary: Update Apache Commons Codec to 1.15
                 Key: NIFI-9788
                 URL: https://issues.apache.org/jira/browse/NIFI-9788
             Project: Apache NiFi
          Issue Type: Task
            Reporter: Jon Shoemaker

--
This message was sent by Atlassian Jira
(v8.20.1#820001)