[jira] [Created] (TS-1102) Cleanup obsolete debugging code
Cleanup obsolete debugging code
---
Key: TS-1102
URL: https://issues.apache.org/jira/browse/TS-1102
Project: Traffic Server
Issue Type: Bug
Components: Core, Logging, Performance
Affects Versions: 3.0.2
Environment: Any
Reporter: Uri Shachar
Priority: Minor

The current Diags.h D/EClosure mechanism is obsolete. ATS requires gcc >= 4.1 for all compilation environments, and that compiler supports variadic macros, including the ##__VA_ARGS__ form, which deletes the final comma if no arguments are provided. Removing the added layer should also improve performance when high-volume debugging is turned on.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
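The comma-deleting behavior mentioned in the description can be sketched as follows. This is a minimal illustration, not the Diags.h code: `MY_DEBUG`, `debug_print`, `debug_enabled` and `last_line` are hypothetical names, and `, ##__VA_ARGS__` is a GNU extension (accepted by GCC and Clang) rather than standard C++.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical sink: keeps the last formatted line so it can be inspected.
static std::string last_line;
static bool debug_enabled = true;

// Plain variadic function that does the actual formatting.
static void debug_print(const char *tag, const char *fmt, ...)
{
  char buf[256];
  va_list ap;
  va_start(ap, fmt);
  std::vsnprintf(buf, sizeof(buf), fmt, ap);
  va_end(ap);
  last_line = std::string("[") + tag + "] " + buf;
}

// GNU extension: ", ##__VA_ARGS__" removes the preceding comma when the
// macro is invoked without variadic arguments, so one macro serves both
// MY_DEBUG("tag", "text") and MY_DEBUG("tag", "fmt %d", value).
#define MY_DEBUG(tag, fmt, ...)                 \
  do {                                          \
    if (debug_enabled)                          \
      debug_print(tag, fmt, ##__VA_ARGS__);     \
  } while (0)

// Small helpers that exercise both call shapes.
std::string demo_no_args()
{
  MY_DEBUG("http", "request received");
  return last_line;
}

std::string demo_with_args()
{
  MY_DEBUG("http", "status %d", 200);
  return last_line;
}
```

Because the macro expands directly into the function call, no intermediate closure object is needed to carry the tag and source location, which is presumably the "added layer" the report wants to remove.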
[jira] [Updated] (TS-1102) Cleanup obsolete debugging code
[ https://issues.apache.org/jira/browse/TS-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uri Shachar updated TS-1102:
Attachment: diags_cleanup.patch

Base cleanup patch. Will add 2 more patches to remove the DebugOn macro (which was an unoptimized version of Debug anyway) and remove the obsolete 'prefix' argument.

Key: TS-1102
URL: https://issues.apache.org/jira/browse/TS-1102
Project: Traffic Server
Issue Type: Bug
Components: Core, Logging, Performance
Affects Versions: 3.0.2
Environment: Any
Reporter: Uri Shachar
Priority: Minor
Attachments: diags_cleanup.patch
Original Estimate: 24h
Remaining Estimate: 24h
[jira] [Commented] (TS-1079) Add an API function to turn debugging on for specific transactions/sessions
[ https://issues.apache.org/jira/browse/TS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198639#comment-13198639 ]

Uri Shachar commented on TS-1079:

Yes, the next step is to change some Debug calls in the core. I plan to go through the HttpSM/HttpTransact/HttpClientSession files and change every possible Debug call to use DebugSpecific. Any thoughts regarding adding an internal config file (remap-like) to allow turning this flag on without plugin intervention?

Add an API function to turn debugging on for specific transactions/sessions
---
Key: TS-1079
URL: https://issues.apache.org/jira/browse/TS-1079
Project: Traffic Server
Issue Type: Improvement
Components: Core, HTTP
Reporter: Uri Shachar
Priority: Minor
Attachments: debug_specific.patch
Original Estimate: 72h
Remaining Estimate: 72h

When troubleshooting issues on a production ATS system, it is often difficult or impossible to turn on any of the 'high-volume' debug tags, such as http, because of the performance impact. This enhancement allows a plugin to set a debug flag for a specific txn/ssn, and replaces some of the internal Debug calls with a new function that checks whether the flag is set and, if it is, outputs the debug line regardless of the tag (the diags enable/disable flag is still taken into account). The API will also have TSDebugSpecific so that plugins can use the same functionality. In addition, we might consider adding an internal config file (remap-like) to allow turning this flag on without plugin intervention.
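The decision logic described above (global switch first, then the per-transaction flag, then the tag) could look roughly like the sketch below. All names here (`DemoTxn`, `DiagsConfig`, `should_debug`) are hypothetical stand-ins, not the types or functions in debug_specific.patch.

```cpp
#include <set>
#include <string>

// Hypothetical per-transaction state; a plugin would set debug_override.
struct DemoTxn {
  bool debug_override = false;
};

// Hypothetical global diagnostics configuration.
struct DiagsConfig {
  bool enabled = true;             // global diags enable/disable switch
  std::set<std::string> tags;      // globally enabled debug tags
};

// Returns true when a debug line for `tag` should be emitted for `txn`.
bool should_debug(const DiagsConfig &cfg, const DemoTxn &txn, const std::string &tag)
{
  if (!cfg.enabled)
    return false;                  // the global switch is still respected
  if (txn.debug_override)
    return true;                   // flagged transaction: tag is ignored
  return cfg.tags.count(tag) > 0;  // otherwise fall back to the tag check
}
```

With this shape, a single flagged transaction produces full debug output while every other transaction pays only the cost of a boolean check.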
[jira] [Commented] (TS-1079) Add an API function to turn debugging on for specific transactions/sessions
[ https://issues.apache.org/jira/browse/TS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198640#comment-13198640 ]

Uri Shachar commented on TS-1079:

I've created TS-1102 for the cleanup task. Once the patch there is merged into the trunk I'll upload an updated patch for this issue.

Key: TS-1079
URL: https://issues.apache.org/jira/browse/TS-1079
Project: Traffic Server
Issue Type: Improvement
Components: Core, HTTP
Reporter: Uri Shachar
Priority: Minor
Attachments: debug_specific.patch
Original Estimate: 72h
Remaining Estimate: 72h
[jira] [Updated] (TS-1102) Cleanup obsolete debugging code
[ https://issues.apache.org/jira/browse/TS-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uri Shachar updated TS-1102:
Attachment: remove_prefix_arg.patch

Added patch to remove the unused prefix argument (includes the previous patch).

Key: TS-1102
URL: https://issues.apache.org/jira/browse/TS-1102
Project: Traffic Server
Issue Type: Bug
Components: Core, Logging, Performance
Affects Versions: 3.0.2
Environment: Any
Reporter: Uri Shachar
Priority: Minor
Attachments: diags_cleanup.patch, remove_prefix_arg.patch
Original Estimate: 24h
Remaining Estimate: 24h
[jira] [Commented] (TS-1079) Add an API function to turn debugging on for specific transactions/sessions
[ https://issues.apache.org/jira/browse/TS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198837#comment-13198837 ]

Leif Hedstrom commented on TS-1079:

Cool, I'll take a look at TS-1102 today (unless someone else beats me to it :).

As for the API / remap: One possibility is to look at the APIs we added to support overriding records.config settings per transaction. It's a bit hacky, but works fairly well. The only caveat here is that it only works in places in the code where you have access to the HttpSM's State. So that might be a problem for this? If necessary, we could add another @ option to remap.config, but we tried to avoid that as much as possible with the new per-transaction records.config features (in fact, we removed 3 @ options after adding this feature).

Key: TS-1079
URL: https://issues.apache.org/jira/browse/TS-1079
Project: Traffic Server
Issue Type: Improvement
Components: Core, HTTP
Reporter: Uri Shachar
Priority: Minor
Attachments: debug_specific.patch
Original Estimate: 72h
Remaining Estimate: 72h
[jira] [Assigned] (TS-1102) Cleanup obsolete debugging code
[ https://issues.apache.org/jira/browse/TS-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom reassigned TS-1102:
Assignee: Leif Hedstrom

Key: TS-1102
URL: https://issues.apache.org/jira/browse/TS-1102
Project: Traffic Server
Issue Type: Bug
Components: Core, Logging, Performance
Affects Versions: 3.0.2
Environment: Any
Reporter: Uri Shachar
Assignee: Leif Hedstrom
Priority: Minor
Fix For: 3.1.2
Attachments: diags_cleanup.patch, remove_prefix_arg.patch
Original Estimate: 24h
Remaining Estimate: 24h
[jira] [Updated] (TS-1102) Cleanup obsolete debugging code
[ https://issues.apache.org/jira/browse/TS-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom updated TS-1102:
Fix Version/s: 3.1.2

Key: TS-1102
URL: https://issues.apache.org/jira/browse/TS-1102
Project: Traffic Server
Issue Type: Bug
Components: Core, Logging, Performance
Affects Versions: 3.0.2
Environment: Any
Reporter: Uri Shachar
Assignee: Leif Hedstrom
Priority: Minor
Fix For: 3.1.2
Attachments: diags_cleanup.patch, remove_prefix_arg.patch
Original Estimate: 24h
Remaining Estimate: 24h
[jira] [Commented] (TS-1102) Cleanup obsolete debugging code
[ https://issues.apache.org/jira/browse/TS-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198881#comment-13198881 ]

Leif Hedstrom commented on TS-1102:

Uri, the remove_prefix_arg.patch fails on current trunk. Any chance you can prepare a patch for trunk?

Key: TS-1102
URL: https://issues.apache.org/jira/browse/TS-1102
Project: Traffic Server
Issue Type: Bug
Components: Core, Logging, Performance
Affects Versions: 3.0.2
Environment: Any
Reporter: Uri Shachar
Assignee: Leif Hedstrom
Priority: Minor
Fix For: 3.1.2
Attachments: diags_cleanup.patch, remove_prefix_arg.patch
Original Estimate: 24h
Remaining Estimate: 24h
[jira] [Updated] (TS-1038) TSHttpTxnErrorBodySet() can leak memory (pt 2)
[ https://issues.apache.org/jira/browse/TS-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Galić updated TS-1038:
Backport to Version: 3.0.3

TSHttpTxnErrorBodySet() can leak memory (pt 2)
---
Key: TS-1038
URL: https://issues.apache.org/jira/browse/TS-1038
Project: Traffic Server
Issue Type: Bug
Affects Versions: 3.0.1
Reporter: Brian Geffon
Assignee: Igor Galić
Fix For: 3.1.2
Attachments: TSHttpTxnErrorBodySet.patch

TS-826 resolved a memory leak with TSHttpTxnErrorBodySet, but it appears that mimetype is still being leaked. See HttpSM::setup_internal_transfer line 5416, which frees internal_msg_buffer_type. Since the buffer is freed there, mimetype is expected to have been malloc'ed, which means there is still a memory leak in TSHttpTxnErrorBodySet(). Patch included.
[jira] [Assigned] (TS-1038) TSHttpTxnErrorBodySet() can leak memory (pt 2)
[ https://issues.apache.org/jira/browse/TS-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Galić reassigned TS-1038:
Assignee: Igor Galić (was: Leif Hedstrom)

Key: TS-1038
URL: https://issues.apache.org/jira/browse/TS-1038
Project: Traffic Server
Issue Type: Bug
Affects Versions: 3.0.1
Reporter: Brian Geffon
Assignee: Igor Galić
Fix For: 3.1.2
Attachments: TSHttpTxnErrorBodySet.patch
[jira] [Assigned] (TS-1061) TSHttpTxnServerReqHdrBytesGet in ./proxy/InkAPI.cc has an extra parameter (int *bytes) from the prototype in ./proxy/api/ts/ts.h. The extra parameter needs to be removed a
[ https://issues.apache.org/jira/browse/TS-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Galić reassigned TS-1061:
Assignee: Igor Galić (was: Leif Hedstrom)

TSHttpTxnServerReqHdrBytesGet in ./proxy/InkAPI.cc has an extra parameter (int *bytes) from the prototype in ./proxy/api/ts/ts.h. The extra parameter needs to be removed as it is not used.
---
Key: TS-1061
URL: https://issues.apache.org/jira/browse/TS-1061
Project: Traffic Server
Issue Type: Bug
Components: Plugins
Affects Versions: 3.1.1, 3.0.1
Environment: Redhat Linux but it is not environment specific
Reporter: Alistair Stevenson
Assignee: Igor Galić
Priority: Minor
Labels: api-change
Fix For: 3.1.2
Original Estimate: 1h
Remaining Estimate: 1h

The definitions are:

./proxy/InkAPI.cc: TSHttpTxnServerReqHdrBytesGet(TSHttpTxn txnp, int *bytes)
./proxy/api/ts/ts.h.in: tsapi int TSHttpTxnServerReqHdrBytesGet(TSHttpTxn txnp);

The int *bytes parameter is not used, and its presence means that the function declared in the header does not resolve and so cannot be used.
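A minimal sketch of why such a mismatch leaves the declared function unusable, and of the matching form. In C++, a definition whose parameter list differs from the prototype does not define the declared function; it defines a separate overload, so callers of the prototype end up with an unresolved symbol. The names below (`DemoTxn`, `DemoTxnServerReqHdrBytesGet`) are hypothetical, and the assumption that the public header gives its functions C linkage is mine, not something stated in the report.

```cpp
// --- what a public header might declare (hypothetical names) ---
struct DemoTxn {
  int server_req_hdr_bytes;
};

extern "C" int DemoTxnServerReqHdrBytesGet(const DemoTxn *txnp);

// --- implementation file ---
// The parameter list matches the prototype exactly, so this really is the
// definition of the function declared above. Writing
//   int DemoTxnServerReqHdrBytesGet(const DemoTxn *txnp, int *bytes)
// here instead would leave the one-argument function without a definition.
extern "C" int DemoTxnServerReqHdrBytesGet(const DemoTxn *txnp)
{
  return txnp->server_req_hdr_bytes;
}
```

When both the declaration and the definition carry `extern "C"`, compilers such as GCC and Clang typically reject a differing parameter list outright, which turns this kind of drift into a build error instead of a runtime surprise for plugin authors.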
[jira] [Updated] (TS-1061) TSHttpTxnServerReqHdrBytesGet in ./proxy/InkAPI.cc has an extra parameter (int *bytes) from the prototype in ./proxy/api/ts/ts.h. The extra parameter needs to be removed as
[ https://issues.apache.org/jira/browse/TS-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Galić updated TS-1061:
Backport to Version: 3.0.3

Key: TS-1061
URL: https://issues.apache.org/jira/browse/TS-1061
Project: Traffic Server
Issue Type: Bug
Components: Plugins
Affects Versions: 3.1.1, 3.0.1
Environment: Redhat Linux but it is not environment specific
Reporter: Alistair Stevenson
Assignee: Igor Galić
Priority: Minor
Labels: api-change
Fix For: 3.1.2
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Updated] (TS-1049) TS hangs (dead lock) on HTTPS POST requests
[ https://issues.apache.org/jira/browse/TS-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Galić updated TS-1049:
Backport to Version: 3.0.3

TS hangs (dead lock) on HTTPS POST requests
---
Key: TS-1049
URL: https://issues.apache.org/jira/browse/TS-1049
Project: Traffic Server
Issue Type: Bug
Components: Core, HTTP, SSL
Affects Versions: 3.1.1, 3.1.0, 3.0.2
Environment: RedHat Enterprise Linux 6.0, Intel 32-bit
Reporter: Wilson Ho
Assignee: Leif Hedstrom
Priority: Blocker
Fix For: 3.1.2
Attachments: records.config

A very reproducible bug where the body of an HTTPS POST request is never forwarded to the origin server. The client submits an HTTPS POST request to TS, which is supposed to forward it to the backend/origin server via HTTP. TS processes the HTTP headers and establishes a connection to the origin server, but the body of the HTTPS POST is never read. The request hangs until the client times out and shuts down the connection.

To reproduce:
1) The client connects to TS using HTTPS (works OK if it is just HTTP).
2) It must be a POST request.
3) TS must use at least 2 worker threads.
4) It is easier to reproduce when the connection to the origin server is HTTP (not HTTPS).
5) The POST body must be large enough that the HTTP request headers and POST body do *NOT* fit within the same TCP packet (2000 bytes is a good size).
6) I can consistently reproduce this problem using 2 separate clients, each simultaneously submitting 2 requests back to back (i.e., 2 requests from each client, a total of 4 requests). This gives a high probability that at least one of the requests will hang.

Observation:
1) Thread A accepted and processed the HTTP headers, and called UnixNetProcessor::connect_re to prepare a new connection to the origin server.
2) Thread A must not have read the body of the POST. Otherwise, it works fine.
3) Thread B was assigned the task of handling the origin server connection. If the same thread A was picked, then everything works fine.
4) Apparently, one of the first things that thread B does is to acquire the mutex for reading from the client. (Why does it do that??)
5) While thread B was holding the mutex, thread A proceeded in SSLNetVConnection::net_read_io, and tried and failed to acquire the mutex. Thread A typically retried SSLNetVConnection::net_read_io soon afterwards, but gave up after the second failure. If thread B released the mutex soon enough, thread A could proceed and everything works.
6) From this point, the body of the POST is never read from the client, there is nothing to be proxied to the origin server, and neither the consumer nor the producer task is scheduled to run again, at least until the client times out. I tried setting the client-side timeout to as long as 3-5 minutes, and TS really does not recover by itself until the client closes the connection.

This is the first time I have used this bug system. Please let me know how I could produce the configuration files and trace logs, etc. Thanks!
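The failure mode in observation 5 can be modeled with a toy try-lock, shown below. This is only an illustration of the reported behavior under assumed names (`TryLock`, `read_pass`, `ReadResult`); it is not the SSLNetVConnection code, and the "never give up" variant is not a claim about how the bug was fixed.

```cpp
#include <atomic>

// Hypothetical non-blocking lock standing in for the client-read mutex.
struct TryLock {
  std::atomic<bool> held{false};

  bool try_acquire()
  {
    bool expected = false;
    return held.compare_exchange_strong(expected, true);
  }

  void release() { held.store(false); }
};

enum class ReadResult { Done, Retry, GaveUp };

// One scheduling pass of a reader task.
// `failures` counts failed lock attempts so far for this task.
// max_retries < 0 means the task always asks to be rescheduled.
ReadResult read_pass(TryLock &lock, int &failures, int max_retries)
{
  if (!lock.try_acquire()) {
    ++failures;
    if (max_retries >= 0 && failures > max_retries)
      return ReadResult::GaveUp;   // nobody will read the body any more
    return ReadResult::Retry;      // caller should re-queue this task
  }
  // ... the POST body would be read here ...
  lock.release();
  return ReadResult::Done;
}
```

With `max_retries` set to 1 the reader drops out after its second failed attempt, matching the report; a reader that is always re-queued completes as soon as the other thread releases the lock.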
[jira] [Updated] (TS-1049) TS hangs (dead lock) on HTTPS POST requests
[ https://issues.apache.org/jira/browse/TS-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Galić updated TS-1049:
Description: same wording as the TS-1049 report in the preceding message; the "To reproduce" and "Observation" lists were re-marked from "1)" style numbering to "#" list markup.
[jira] [Updated] (TS-1065) traffic_cop segment fault when enable TRACE_LOG_COP
[ https://issues.apache.org/jira/browse/TS-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Galić updated TS-1065:
Backport to Version: 3.0.3

traffic_cop segment fault when enable TRACE_LOG_COP
---
Key: TS-1065
URL: https://issues.apache.org/jira/browse/TS-1065
Project: Traffic Server
Issue Type: Bug
Affects Versions: 3.1.1, 3.0.2
Environment: mac os 10.7.2, centos 5.4 64bit
Reporter: Conan Wang
Assignee: Leif Hedstrom
Priority: Minor
Fix For: 3.1.2
Attachments: traffic_cop.diff

When traffic_cop's debug log is enabled with

#define TRACE_LOG_COP 1

some cop_log invocations cause a segmentation fault, because the va_list object in cop_log is used twice between 'va_start' and 'va_end'.

{code}
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x
0x7fff846b64f0 in strlen ()
(gdb) bt
#0 0x7fff846b64f0 in strlen ()
#1 0x7fff846578c3 in __vfprintf ()
#2 0x7fff846a109b in vsprintf_l ()
#3 0x00011883 in cop_log (priority=5, format=0x172a8 "--- Cop Starting [Version: %s] ---\n") at TrafficCop.cc:172
#4 0x00012244 in check_lockfile () at TrafficCop.cc:1733
#5 0x000122c0 in init () at TrafficCop.cc:1894
#6 0x00016689 in main (argc=1, argv=0x7fff5fbffbb0) at TrafficCop.cc:1958
{code}

Reference: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/stdarg.h.html
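The usual remedy for consuming the same arguments twice is `va_copy`, since a `va_list` is left in an indeterminate state once a `v*printf` function has walked it. The sketch below shows that pattern; `log_twice` is a hypothetical function, not the cop_log code or the contents of traffic_cop.diff.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical logger that formats the same arguments into two buffers,
// for example one for a trace file and one for syslog.
std::string log_twice(const char *format, ...)
{
  char first[128];
  char second[128];

  va_list args;
  va_start(args, format);

  // Take an independent copy before the first consumer touches `args`.
  va_list args_copy;
  va_copy(args_copy, args);

  std::vsnprintf(first, sizeof(first), format, args);        // consumes args
  std::vsnprintf(second, sizeof(second), format, args_copy); // consumes the copy

  va_end(args_copy);
  va_end(args);

  return std::string(first) + "|" + second;
}
```

An equivalent alternative is to call `va_end` and then `va_start` again before the second formatting pass; either way, each pass gets a fresh argument cursor.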
[jira] [Updated] (TS-1055) Wrong implementation of TSHttpSsnArgGet
[ https://issues.apache.org/jira/browse/TS-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Galić updated TS-1055:
Backport to Version: 3.0.3

Wrong implementation of TSHttpSsnArgGet
---
Key: TS-1055
URL: https://issues.apache.org/jira/browse/TS-1055
Project: Traffic Server
Issue Type: Bug
Components: TS API
Affects Versions: 3.1.1
Reporter: Yakov Kopel
Assignee: Leif Hedstrom
Labels: api-change
Fix For: 3.1.2
Original Estimate: 24h
Remaining Estimate: 24h

There is a difference between the interface of TSHttpSsnArgGet and its implementation.

In the interface (proxy/api/ts/ts.h.in):
tsapi void* TSHttpSsnArgGet(TSHttpSsn ssnp, int arg_idx);

In the implementation (proxy/InkAPI.cc):
void * TSHttpSsnArgGet(TSHttpSsn ssnp, int arg_idx, void **argp)

So, I wrote a simple patch to fix this problem:

Index: InkAPI.cc
===
--- InkAPI.cc (revision 1220421)
+++ InkAPI.cc (working copy)
@@ -5500,7 +5500,7 @@
 }
 void *
-TSHttpSsnArgGet(TSHttpSsn ssnp, int arg_idx, void **argp)
+TSHttpSsnArgGet(TSHttpSsn ssnp, int arg_idx)
 {
   sdk_assert(sdk_sanity_check_http_ssn(ssnp) == TS_SUCCESS);
   sdk_assert(arg_idx >= 0 && arg_idx < HTTP_SSN_TXN_MAX_USER_ARG);
[jira] [Assigned] (TS-1055) Wrong implementation of TSHttpSsnArgGet
[ https://issues.apache.org/jira/browse/TS-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Galić reassigned TS-1055:
Assignee: Igor Galić (was: Leif Hedstrom)

Key: TS-1055
URL: https://issues.apache.org/jira/browse/TS-1055
Project: Traffic Server
Issue Type: Bug
Components: TS API
Affects Versions: 3.1.1
Reporter: Yakov Kopel
Assignee: Igor Galić
Labels: api-change
Fix For: 3.1.2
Original Estimate: 24h
Remaining Estimate: 24h
[jira] [Assigned] (TS-1065) traffic_cop segment fault when enable TRACE_LOG_COP
[ https://issues.apache.org/jira/browse/TS-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Galić reassigned TS-1065:
Assignee: Igor Galić (was: Leif Hedstrom)

Key: TS-1065
URL: https://issues.apache.org/jira/browse/TS-1065
Project: Traffic Server
Issue Type: Bug
Affects Versions: 3.1.1, 3.0.2
Environment: mac os 10.7.2, centos 5.4 64bit
Reporter: Conan Wang
Assignee: Igor Galić
Priority: Minor
Fix For: 3.1.2
Attachments: traffic_cop.diff
[jira] [Reopened] (TS-1065) traffic_cop segment fault when enable TRACE_LOG_COP
[ https://issues.apache.org/jira/browse/TS-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Galić reopened TS-1065: reopen for backport traffic_cop segment fault when enable TRACE_LOG_COP --- Key: TS-1065 URL: https://issues.apache.org/jira/browse/TS-1065 Project: Traffic Server Issue Type: Bug Affects Versions: 3.1.1, 3.0.2 Environment: mac os 10.7.2, centos 5.4 64bit Reporter: Conan Wang Assignee: Igor Galić Priority: Minor Fix For: 3.1.2 Attachments: traffic_cop.diff
When traffic_cop's debug log is enabled (#define TRACE_LOG_COP 1), some cop_log invocations cause a segmentation fault, because the va_list object in cop_log is used twice between 'va_start' and 'va_end'.
{code}
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x
0x7fff846b64f0 in strlen ()
(gdb) bt
#0 0x7fff846b64f0 in strlen ()
#1 0x7fff846578c3 in __vfprintf ()
#2 0x7fff846a109b in vsprintf_l ()
#3 0x00011883 in cop_log (priority=5, format=0x172a8 "--- Cop Starting [Version: %s] ---\n") at TrafficCop.cc:172
#4 0x00012244 in check_lockfile () at TrafficCop.cc:1733
#5 0x000122c0 in init () at TrafficCop.cc:1894
#6 0x00016689 in main (argc=1, argv=0x7fff5fbffbb0) at TrafficCop.cc:1958
{code}
Reference: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/stdarg.h.html
[jira] [Reopened] (TS-1074) PluginVC should schedule to the local queue instead of the external queue.
[ https://issues.apache.org/jira/browse/TS-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Galić reopened TS-1074: reopen for backport PluginVC should schedule to the local queue instead of the external queue. -- Key: TS-1074 URL: https://issues.apache.org/jira/browse/TS-1074 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.0.1 Reporter: Brian Geffon Assignee: Igor Galić Fix For: 3.1.2 Attachments: PluginVC.patch
In TS-867 a patch was introduced to resolve a crash that was appearing with TSFetchURL. The patch schedules events on the same thread if it is a net thread; only if it is not does it schedule with the event processor. If you're scheduling on the same thread, wouldn't it be more efficient to place the event directly on the local queue? It turns out that going through the ExternalQueue under low load causes the event to be delayed. Patch attached. The symptoms are best seen in the complaints in TS-912 and TS-1043. I have verified that this patch fixes the 10ms symptom seen in TS-912 and TS-1043.
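The difference between the two queues can be shown with a toy model. This is a deliberately simplified sketch and not the ATS event loop: ThreadSketch, schedule_local, schedule_external, and run_once are hypothetical names, and the rule that externally queued work is only picked up at the end of an iteration is an assumption chosen to make the extra latency visible.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Toy per-thread scheduler with a lock-free local queue and a locked
// external queue that other threads may push to.
struct ThreadSketch {
  using Task = std::function<void()>;

  std::deque<Task> local_queue;     // touched only by the owning thread
  std::deque<Task> external_queue;  // shared, guarded by external_mutex
  std::mutex external_mutex;

  void schedule_local(Task t) { local_queue.push_back(std::move(t)); }

  void schedule_external(Task t)
  {
    std::lock_guard<std::mutex> guard(external_mutex);
    external_queue.push_back(std::move(t));
  }

  // One loop iteration: run what is already local, then move externally
  // queued work over so it runs on the next iteration.
  void run_once()
  {
    std::deque<Task> ready;
    ready.swap(local_queue);
    for (auto &task : ready) {
      task();
    }
    std::lock_guard<std::mutex> guard(external_mutex);
    while (!external_queue.empty()) {
      local_queue.push_back(std::move(external_queue.front()));
      external_queue.pop_front();
    }
  }
};
```

In this model a task scheduled locally runs in the current iteration, while the same task scheduled through the external queue waits for the next one and pays for a lock on the way. How long that wait is in a real server depends on how the loop sleeps between iterations, which this sketch does not model.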
[jira] [Assigned] (TS-1074) PluginVC should schedule to the local queue instead of the external queue.
[ https://issues.apache.org/jira/browse/TS-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Galić reassigned TS-1074: -- Assignee: Igor Galić (was: Brian Geffon) PluginVC should schedule to the local queue instead of the external queue. -- Key: TS-1074 URL: https://issues.apache.org/jira/browse/TS-1074 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.0.1 Reporter: Brian Geffon Assignee: Igor Galić Fix For: 3.1.2 Attachments: PluginVC.patch
In TS-867 a patch was introduced to resolve a crash that was appearing with TSFetchURL. The patch schedules events on the same thread if it is a net thread; only if it is not does it schedule with the event processor. If you're scheduling on the same thread, wouldn't it be more efficient to place the event directly on the local queue? It turns out that going through the ExternalQueue under low load causes the event to be delayed. Patch attached. The symptoms are best seen in the complaints in TS-912 and TS-1043. I have verified that this patch fixes the 10ms symptom seen in TS-912 and TS-1043.
[jira] [Assigned] (TS-1004) transformation plugins cause connection close when content length is not known ahead
[ https://issues.apache.org/jira/browse/TS-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Galić reassigned TS-1004: -- Assignee: Igor Galić (was: Brian Geffon) transformation plugins cause connection close when content length is not known ahead Key: TS-1004 URL: https://issues.apache.org/jira/browse/TS-1004 Project: Traffic Server Issue Type: Bug Components: HTTP, Plugins Affects Versions: 3.0.1 Reporter: Otto van der Schaaf Assignee: Igor Galić Fix For: 3.1.1 Attachments: chunk_transform_response.diff whenever the null transform plugin (or gzip) is executed, ATS will force the ua connection closed. when the user agent supports it, sending a chunked response w/ keepalive enabled would be preferred i guess i'll add a patch for review. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
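The chunked response suggested in TS-1004 relies on the HTTP/1.1 chunked transfer coding, which lets a server keep the connection open without knowing the content length in advance. The sketch below shows only the wire framing; encode_chunk_sketch and last_chunk_sketch are hypothetical helpers and are not taken from the attached chunk_transform_response.diff.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Frame one body fragment as a chunk: size in hex, CRLF, data, CRLF.
// Empty fragments produce no output, because a zero-size chunk would
// be read by the client as the end of the body.
std::string encode_chunk_sketch(const std::string &data)
{
  if (data.empty()) {
    return std::string();
  }
  char size_line[32];
  std::snprintf(size_line, sizeof(size_line), "%zx\r\n", data.size());
  return std::string(size_line) + data + "\r\n";
}

// The zero-size chunk that terminates the body (no trailer fields).
std::string last_chunk_sketch()
{
  return "0\r\n\r\n";
}
```

A transform that emits "hello" and then finishes would send "5\r\nhello\r\n" followed by "0\r\n\r\n", together with a Transfer-Encoding: chunked response header, and could then reuse the connection for the next request.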
[jira] [Reopened] (TS-1049) TS hangs (dead lock) on HTTPS POST requests
[ https://issues.apache.org/jira/browse/TS-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Galić reopened TS-1049: reopen for backport TS hangs (dead lock) on HTTPS POST requests --- Key: TS-1049 URL: https://issues.apache.org/jira/browse/TS-1049 Project: Traffic Server Issue Type: Bug Components: Core, HTTP, SSL Affects Versions: 3.1.1, 3.1.0, 3.0.2 Environment: RedHat Enterprise Linux 6.0, Intel 32-bit Reporter: Wilson Ho Assignee: Igor Galić Priority: Blocker Fix For: 3.1.2 Attachments: records.config
A very reproducible bug where the body of an HTTPS POST request is never forwarded to the origin server. The client submits an HTTPS POST request to TS, which is supposed to forward it to the backend/origin server via HTTP. TS processes the HTTP headers and establishes a connection to the origin server, but the body of the HTTPS POST is never read. This hangs until the client times out and shuts down the connection.
To reproduce:
# Client connects to TS using HTTPS (works OK if it is just HTTP).
# It must be a POST request.
# TS must use at least 2 worker threads.
# Easier to reproduce when the connection to the origin server is HTTP (not HTTPS).
# POST body must be large enough so that the HTTP request headers and POST body do *NOT* fit within the same TCP packet. (2000 bytes is a good size)
# I can consistently reproduce this problem using 2 separate clients, each simultaneously submitting 2 requests back to back (i.e., 2 requests from each client, a total of 4 requests). This gives a high probability that at least one of the requests will hang.
Observation:
# Thread A accepted and processed the HTTP headers, and called UnixNetProcessor::connect_re to prepare a new connection to the origin server.
# Thread A must not have read the body of the POST. Otherwise, it works fine.
# Thread B was assigned the task of handling the origin server connection. If the same thread A was picked, then everything works fine.
# Apparently, one of the first things that thread B does is to acquire the mutex for reading from the client. (Why does it do that??)
# While thread B was holding the mutex, thread A proceeded in SSLNetVConnection::net_read_io, and tried and failed to acquire the mutex. Thread A typically retried calling SSLNetVConnection::net_read_io soon after, but gave up after the second failure. If thread B released the mutex soon enough, thread A could proceed happily and everything works.
# From this point, the body of the POST is never read from the client, there is nothing to be proxied to the origin server, and neither the consumer nor the producer task is scheduled to run again until the client times out.
I tried setting the client-side timeout to as long as 3-5 minutes, and TS really does not recover by itself until the client closes the connection. This is the first time I have used this bug system. Please let me know how I can provide the configuration files, trace logs, etc. Thanks!
[jira] [Reopened] (TS-1061) TSHttpTxnServerReqHdrBytesGet in ./proxy/InkAPI.cc has an extra parameter (int *bytes) from the prototype in ./proxy/api/ts/ts.h. The extra parameter needs to be removed a
[ https://issues.apache.org/jira/browse/TS-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Galić reopened TS-1061: reopen for backport TSHttpTxnServerReqHdrBytesGet in ./proxy/InkAPI.cc has an extra parameter (int *bytes) from the prototype in ./proxy/api/ts/ts.h. The extra parameter needs to be removed as it is not used. - Key: TS-1061 URL: https://issues.apache.org/jira/browse/TS-1061 Project: Traffic Server Issue Type: Bug Components: Plugins Affects Versions: 3.1.1, 3.0.1 Environment: Redhat Linux but it is not environment specific Reporter: Alistair Stevenson Assignee: Igor Galić Priority: Minor Labels: api-change Fix For: 3.1.2 Original Estimate: 1h Remaining Estimate: 1h The definitions are: ./proxy/InkAPI.cc:TSHttpTxnServerReqHdrBytesGet(TSHttpTxn txnp, int *bytes) ./proxy/api/ts/ts.h.in: tsapi int TSHttpTxnServerReqHdrBytesGet(TSHttpTxn txnp); The int * bytes parameter is not used and means that the function does not resolve and so cannot be used. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (TS-876) forward map based on request receive port
[ https://issues.apache.org/jira/browse/TS-876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Galić reassigned TS-876: - Assignee: Igor Galić (was: Brian Geffon) forward map based on request receive port - Key: TS-876 URL: https://issues.apache.org/jira/browse/TS-876 Project: Traffic Server Issue Type: New Feature Components: Remap API Reporter: Manjesh Nilange Assignee: Igor Galić Fix For: 3.1.0 Attachments: TS876.fixed.patch, map_with_recv_port.patch
Currently the port in the 'from' field of every remap rule is compared against the port in the request (given explicitly in the request or deduced implicitly from the protocol). TS supports listening on multiple ports, so there is a use case for a remap rule that uses the TS port on which the request is received instead of the request port.
[jira] [Reopened] (TS-1038) TSHttpTxnErrorBodySet() can leak memory (pt 2)
[ https://issues.apache.org/jira/browse/TS-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Galić reopened TS-1038: reopen for backport TSHttpTxnErrorBodySet() can leak memory (pt 2) -- Key: TS-1038 URL: https://issues.apache.org/jira/browse/TS-1038 Project: Traffic Server Issue Type: Bug Affects Versions: 3.0.1 Reporter: Brian Geffon Assignee: Igor Galić Fix For: 3.1.2 Attachments: TSHttpTxnErrorBodySet.patch TS-826 resolved a memory leak with TSHttpTxnErrorBodySet but it appears that mimetype is still being leaked. See HttpSM::setup_internal_transfer line 5416 which frees internal_msg_buffer_type...it's expected that mimetype was malloced since clearly it's being freed. So that means there is still a memory leak in TSHttpTxnErrorBodySet(). Patch included. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
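The ownership rule implied by TS-1038 is that a setter which takes malloc'ed buffers must release whatever it held before, for the MIME type as well as for the body. The sketch below illustrates that rule only; ErrorBodySketch, error_body_set_sketch, error_body_clear_sketch, and dup_sketch are hypothetical names, and the code does not reproduce the attached TSHttpTxnErrorBodySet.patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical holder for an error body and its MIME type. Both pointers
// are owned by the holder and were allocated with malloc.
struct ErrorBodySketch {
  char *body = nullptr;
  char *mimetype = nullptr;
};

// Takes ownership of both buffers. Anything stored previously is freed
// first, so repeated calls do not leak either field.
void error_body_set_sketch(ErrorBodySketch &eb, char *body, char *mimetype)
{
  std::free(eb.body);      // free(nullptr) is a no-op
  std::free(eb.mimetype);
  eb.body = body;
  eb.mimetype = mimetype;
}

// Releases both buffers, for use when the owner is torn down.
void error_body_clear_sketch(ErrorBodySketch &eb)
{
  error_body_set_sketch(eb, nullptr, nullptr);
}

// Small helper that returns a malloc'ed copy of a C string.
char *dup_sketch(const char *s)
{
  const std::size_t n = std::strlen(s) + 1;
  char *copy = static_cast<char *>(std::malloc(n));
  if (copy != nullptr) {
    std::memcpy(copy, s, n);
  }
  return copy;
}
```

Calling error_body_set_sketch twice in a row and then error_body_clear_sketch frees all four buffers; a tool such as valgrind can confirm that in a test program.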
[jira] [Assigned] (TS-1095) 3.0.x ts.h.in has incorrect declaration for TSFetchURL
[ https://issues.apache.org/jira/browse/TS-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Galić reassigned TS-1095: -- Assignee: Igor Galić (was: Brian Geffon) 3.0.x ts.h.in has incorrect declaration for TSFetchURL -- Key: TS-1095 URL: https://issues.apache.org/jira/browse/TS-1095 Project: Traffic Server Issue Type: Bug Affects Versions: 3.0.2 Reporter: Brian Geffon Assignee: Igor Galić Fix For: 3.0.3 Attachments: ts.h.in.patch If you look at the declaration in ts.h.in for TSFetchURL it doesn't match the definition in InkAPI.cc. Patch attached, and updating STATUS file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-937) EThread::execute still processing cancelled event
[ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199504#comment-13199504 ] Leif Hedstrom commented on TS-937: -- Brian, should we keep this for 3.1.2, or move out to 3.1.3 ? EThread::execute still processing cancelled event - Key: TS-937 URL: https://issues.apache.org/jira/browse/TS-937 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.0.1, 2.1.9 Environment: RHEL6 Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 3.1.2 Attachments: UnixEThread.patch
The included GDB log shows that ATS is trying to process an event that has already been cancelled. Examining UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled.
Brian
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x764fa700 (LWP 28518)]
0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
130 MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
(gdb) bt
#0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
#1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232
#2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
#3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0
#4 0x00361f8e577d in clone () from /lib64/libc.so.6
(gdb) bt full
#0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
lock = {m = {m_ptr = 0x764f9d20}, lock_acquired = 202}
#1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232
done_one = false
e = 0x1db45c0
NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0xfc75f0}, tail = 0xfc75f0}
next_time = 1314647904419648000
#2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
p = 0xfb7e80
#3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4 0x00361f8e577d in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb) f 0
#0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
130 MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
(gdb) p *e
$2 = {<Action> = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x768ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {<SLink<Event>> = {next = 0x0}, prev = 0x0}}
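The missing check described in TS-937 can be sketched as a dispatch loop that tests the cancelled flag before touching anything else on the event. EventSketch and run_events_sketch are hypothetical names for illustration and this is not the code in UnixEThread.patch. A check like this is single-threaded by construction, so it does not by itself address a cancellation that races with dispatch from another thread.

```cpp
#include <cassert>
#include <deque>
#include <functional>

// Hypothetical event with a cancellation flag and a handler to invoke.
struct EventSketch {
  bool cancelled = false;
  std::function<void()> handler;
};

// Drain the queue, dispatching only events that are still live.
// Returns the number of handlers that were actually invoked.
int run_events_sketch(std::deque<EventSketch *> &queue)
{
  int dispatched = 0;
  while (!queue.empty()) {
    EventSketch *e = queue.front();
    queue.pop_front();
    if (e->cancelled) {
      continue;  // skip before touching the handler or any lock it owns
    }
    e->handler();
    ++dispatched;
  }
  return dispatched;
}
```

With one cancelled and one live event queued, only the live handler runs and the function returns 1.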
[jira] [Commented] (TS-937) EThread::execute still processing cancelled event
[ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199506#comment-13199506 ] Brian Geffon commented on TS-937: - I'll move it out to 3.1.3; I'm preparing my fix. Basically it's going to involve 1) blowing away the HANDLER_NAME macro for a quick fix, 2) identifying why PluginVC is cancelling an action without holding the lock, and 3) blowing away the TS_HAS_PURIFY crap, which is not a big deal but affects hundreds of files. EThread::execute still processing cancelled event - Key: TS-937 URL: https://issues.apache.org/jira/browse/TS-937 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.0.1, 2.1.9 Environment: RHEL6 Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 3.1.3 Attachments: UnixEThread.patch
The included GDB log shows that ATS is trying to process an event that has already been cancelled. Examining UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled.
Brian
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x764fa700 (LWP 28518)]
0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
130 MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
(gdb) bt
#0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
#1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232
#2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
#3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0
#4 0x00361f8e577d in clone () from /lib64/libc.so.6
(gdb) bt full
#0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
lock = {m = {m_ptr = 0x764f9d20}, lock_acquired = 202}
#1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232
done_one = false
e = 0x1db45c0
NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0xfc75f0}, tail = 0xfc75f0}
next_time = 1314647904419648000
#2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
p = 0xfb7e80
#3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4 0x00361f8e577d in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb) f 0
#0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
130 MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
(gdb) p *e
$2 = {<Action> = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x768ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {<SLink<Event>> = {next = 0x0}, prev = 0x0}}
[jira] [Updated] (TS-937) EThread::execute still processing cancelled event
[ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Geffon updated TS-937: Fix Version/s: (was: 3.1.2) 3.1.3 EThread::execute still processing cancelled event - Key: TS-937 URL: https://issues.apache.org/jira/browse/TS-937 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.0.1, 2.1.9 Environment: RHEL6 Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 3.1.3 Attachments: UnixEThread.patch
The included GDB log shows that ATS is trying to process an event that has already been cancelled. Examining UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled.
Brian
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x764fa700 (LWP 28518)]
0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
130 MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
(gdb) bt
#0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
#1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232
#2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
#3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0
#4 0x00361f8e577d in clone () from /lib64/libc.so.6
(gdb) bt full
#0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
lock = {m = {m_ptr = 0x764f9d20}, lock_acquired = 202}
#1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232
done_one = false
e = 0x1db45c0
NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0xfc75f0}, tail = 0xfc75f0}
next_time = 1314647904419648000
#2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
p = 0xfb7e80
#3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4 0x00361f8e577d in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb) f 0
#0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
130 MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
(gdb) p *e
$2 = {<Action> = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x768ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {<SLink<Event>> = {next = 0x0}, prev = 0x0}}