[jira] [Commented] (TS-1100) Coredump at startup when there's a duplicate remap rule
[ https://issues.apache.org/jira/browse/TS-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197789#comment-13197789 ] Nick Kew commented on TS-1100: -- To exit at startup on such an error - fine (though a clear error message would be helpful - starting up through the trafficserver script leaves only cryptic failure). But a coredump? What kind of by design is that? Coredump at startup when there's a duplicate remap rule --- Key: TS-1100 URL: https://issues.apache.org/jira/browse/TS-1100 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.1.1 Environment: Bog standard linux/x86; own build. Reporter: Nick Kew Priority: Minor A minor accident with cutpaste in vi leads to TS coredumping at startup. I had duplicated a line in remap.config: map http://myhost:8080/ http://target/ () map http://myhost:8080/ http://target/ The second instance is line 125 in remap.config, and the dump shows: FATAL: [ReverseProxy] Unable to add mapping rule to lookup table at line 125 bin/traffic_server - STACK TRACE: /usr/local/trafficserver/lib/libtsutil.so.3(ink_fatal_va+0xc7)[0xe21873] /usr/local/trafficserver/lib/libtsutil.so.3(ink_fatal+0x2b)[0xe218c5] bin/traffic_server(_ZN10UrlRewrite10BuildTableEv+0x1e48)[0x81e1120] bin/traffic_server(_ZN10UrlRewriteC1EPKc+0x562)[0x810] bin/traffic_server(_Z18init_reverse_proxyv+0x41)[0x815500b] bin/traffic_server(_Z20init_HttpProxyServerv+0xe)[0x818fccc] bin/traffic_server(main+0xf7f)[0x813b65d] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6)[0x692bd6] bin/traffic_server[0x80f6001] Aborted -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-974) TS should have a mode to hold partial objects in cache
[ https://issues.apache.org/jira/browse/TS-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197819#comment-13197819 ] Johan Acevedo commented on TS-974: -- Is there currently any cache software for http that manages to do this? TS should have a mode to hold partial objects in cache -- Key: TS-974 URL: https://issues.apache.org/jira/browse/TS-974 Project: Traffic Server Issue Type: Improvement Components: Cache Affects Versions: 3.0.1 Reporter: William Bardwell For ATS to do an excelent job caching large files like video it would need to be able to hold partial objects for a large file. This could be done in a plugin or in the core. This would need to be integrated with the Range handling code to serve requests out of the partial objects and to get more parts of a file to satisfy a Range request. An intermediate step (also do-able in the core or in a plugin) would be to have some settings to let the Range handling code be able to trigger a full file download either asynchronously when a Range response indicates that the file isn't larger than some threshold, or synchronously when a Range request could reasonably be answered quickly from a full request. (Right now Range requests are tunneled if there is not full cached content as far as I can tell.) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-475) HTTP SM should support efficient byte range requests
[ https://issues.apache.org/jira/browse/TS-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197820#comment-13197820 ] Johan Acevedo commented on TS-475: -- Has there been any new work done on this? HTTP SM should support efficient byte range requests Key: TS-475 URL: https://issues.apache.org/jira/browse/TS-475 Project: Traffic Server Issue Type: Bug Components: HTTP Reporter: Leif Hedstrom Priority: Critical Fix For: 3.1.3 The cache has support for efficiently locate a particular range in the cached object, but the HTTP SM does not support this. In order to make Range: request efficient (particularly on large objects), the SM should support this new cache feature. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-1100) Coredump at startup when there's a duplicate remap rule
[ https://issues.apache.org/jira/browse/TS-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197935#comment-13197935 ] Conan Wang commented on TS-1100: I think TS-948 has fixed this? https://issues.apache.org/jira/browse/TS-948 Coredump at startup when there's a duplicate remap rule --- Key: TS-1100 URL: https://issues.apache.org/jira/browse/TS-1100 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.1.1 Environment: Bog standard linux/x86; own build. Reporter: Nick Kew Priority: Minor A minor accident with cutpaste in vi leads to TS coredumping at startup. I had duplicated a line in remap.config: map http://myhost:8080/ http://target/ () map http://myhost:8080/ http://target/ The second instance is line 125 in remap.config, and the dump shows: FATAL: [ReverseProxy] Unable to add mapping rule to lookup table at line 125 bin/traffic_server - STACK TRACE: /usr/local/trafficserver/lib/libtsutil.so.3(ink_fatal_va+0xc7)[0xe21873] /usr/local/trafficserver/lib/libtsutil.so.3(ink_fatal+0x2b)[0xe218c5] bin/traffic_server(_ZN10UrlRewrite10BuildTableEv+0x1e48)[0x81e1120] bin/traffic_server(_ZN10UrlRewriteC1EPKc+0x562)[0x810] bin/traffic_server(_Z18init_reverse_proxyv+0x41)[0x815500b] bin/traffic_server(_Z20init_HttpProxyServerv+0xe)[0x818fccc] bin/traffic_server(main+0xf7f)[0x813b65d] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6)[0x692bd6] bin/traffic_server[0x80f6001] Aborted -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TS-1094) TS hangs after repeated requests from the same kept-alive connection
[ https://issues.apache.org/jira/browse/TS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan M. Carroll updated TS-1094: Attachment: ts-1094.diff TS hangs after repeated requests from the same kept-alive connection Key: TS-1094 URL: https://issues.apache.org/jira/browse/TS-1094 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.1.0 Environment: Oracle Enterprise Linux 5.5 64-bit Reporter: Wilson Ho Assignee: Leif Hedstrom Priority: Blocker Fix For: 3.1.2 Attachments: ts-1094.diff When a client submits multiple requests while re-using the same keep-alived connection, TS hangs. Usually, the client eventually times out, and at that point TS will be waken up and forwards the request to the original server. But by then it's too late and the client already closed connection. In real life traffic, this bug is very hard to reproduce. But here is an artificial test case. First, make sure client-side keep alive is on. My test case uses HTTP (port 80) GET. Second, make sure the total header size of the requests is exactly 275 bytes, including the carriage returns and line feeds. One byte more or less would fail to reproduce the bug. Third, repeatedly submit the same request through this keep-alived connection. At exactly the 283rd iteration, TS hangs. Note that if the client opens a new connection every time, TS works fine. There is a second test case, where the header size is exactly 283 bytes, and TS hangs at exactly the 275th iteration. (Does 275 x 283 mean something?) These magic numbers seem to suggest a memory buffer size (or allocation) problem. I speculate that headers from repeated requests are placed in a buffer (or a circular buffer?), and when the total hits a particular size, some boundary conditions must have be violated and resulted in memory corruption. In real life traffic, each request typically has slightly different header size, so it is really hard to hit this bug. I suspect there is a +/- 1 calculation error in some buffer. BTW, turning on/off caching does not make any difference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-1094) TS hangs after repeated requests from the same kept-alive connection
[ https://issues.apache.org/jira/browse/TS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197964#comment-13197964 ] Alan M. Carroll commented on TS-1094: - I have attached an alternative patch, which on my magic dev box fixes the problem, which I think was the case where the input buffer ended with the CR of a CR LF pair and the CR was inappropriately counted as normal input. TS hangs after repeated requests from the same kept-alive connection Key: TS-1094 URL: https://issues.apache.org/jira/browse/TS-1094 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.1.0 Environment: Oracle Enterprise Linux 5.5 64-bit Reporter: Wilson Ho Assignee: Leif Hedstrom Priority: Blocker Fix For: 3.1.2 Attachments: ts-1094.diff When a client submits multiple requests while re-using the same keep-alived connection, TS hangs. Usually, the client eventually times out, and at that point TS will be waken up and forwards the request to the original server. But by then it's too late and the client already closed connection. In real life traffic, this bug is very hard to reproduce. But here is an artificial test case. First, make sure client-side keep alive is on. My test case uses HTTP (port 80) GET. Second, make sure the total header size of the requests is exactly 275 bytes, including the carriage returns and line feeds. One byte more or less would fail to reproduce the bug. Third, repeatedly submit the same request through this keep-alived connection. At exactly the 283rd iteration, TS hangs. Note that if the client opens a new connection every time, TS works fine. There is a second test case, where the header size is exactly 283 bytes, and TS hangs at exactly the 275th iteration. (Does 275 x 283 mean something?) These magic numbers seem to suggest a memory buffer size (or allocation) problem. I speculate that headers from repeated requests are placed in a buffer (or a circular buffer?), and when the total hits a particular size, some boundary conditions must have be violated and resulted in memory corruption. In real life traffic, each request typically has slightly different header size, so it is really hard to hit this bug. I suspect there is a +/- 1 calculation error in some buffer. BTW, turning on/off caching does not make any difference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-1094) TS hangs after repeated requests from the same kept-alive connection
[ https://issues.apache.org/jira/browse/TS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197969#comment-13197969 ] Leif Hedstrom commented on TS-1094: --- Cool, I'll test this in a bit. Question: Do we still need that second change from TS-466, we reorders the call to mime_scanner_append() ? TS hangs after repeated requests from the same kept-alive connection Key: TS-1094 URL: https://issues.apache.org/jira/browse/TS-1094 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.1.0 Environment: Oracle Enterprise Linux 5.5 64-bit Reporter: Wilson Ho Assignee: Leif Hedstrom Priority: Blocker Fix For: 3.1.2 Attachments: ts-1094.diff When a client submits multiple requests while re-using the same keep-alived connection, TS hangs. Usually, the client eventually times out, and at that point TS will be waken up and forwards the request to the original server. But by then it's too late and the client already closed connection. In real life traffic, this bug is very hard to reproduce. But here is an artificial test case. First, make sure client-side keep alive is on. My test case uses HTTP (port 80) GET. Second, make sure the total header size of the requests is exactly 275 bytes, including the carriage returns and line feeds. One byte more or less would fail to reproduce the bug. Third, repeatedly submit the same request through this keep-alived connection. At exactly the 283rd iteration, TS hangs. Note that if the client opens a new connection every time, TS works fine. There is a second test case, where the header size is exactly 283 bytes, and TS hangs at exactly the 275th iteration. (Does 275 x 283 mean something?) These magic numbers seem to suggest a memory buffer size (or allocation) problem. I speculate that headers from repeated requests are placed in a buffer (or a circular buffer?), and when the total hits a particular size, some boundary conditions must have be violated and resulted in memory corruption. In real life traffic, each request typically has slightly different header size, so it is really hard to hit this bug. I suspect there is a +/- 1 calculation error in some buffer. BTW, turning on/off caching does not make any difference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TS-1094) TS hangs after repeated requests from the same kept-alive connection
[ https://issues.apache.org/jira/browse/TS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan M. Carroll updated TS-1094: Attachment: ts-1094.diff TS hangs after repeated requests from the same kept-alive connection Key: TS-1094 URL: https://issues.apache.org/jira/browse/TS-1094 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.1.0 Environment: Oracle Enterprise Linux 5.5 64-bit Reporter: Wilson Ho Assignee: Leif Hedstrom Priority: Blocker Fix For: 3.1.2 Attachments: ts-1094.diff When a client submits multiple requests while re-using the same keep-alived connection, TS hangs. Usually, the client eventually times out, and at that point TS will be waken up and forwards the request to the original server. But by then it's too late and the client already closed connection. In real life traffic, this bug is very hard to reproduce. But here is an artificial test case. First, make sure client-side keep alive is on. My test case uses HTTP (port 80) GET. Second, make sure the total header size of the requests is exactly 275 bytes, including the carriage returns and line feeds. One byte more or less would fail to reproduce the bug. Third, repeatedly submit the same request through this keep-alived connection. At exactly the 283rd iteration, TS hangs. Note that if the client opens a new connection every time, TS works fine. There is a second test case, where the header size is exactly 283 bytes, and TS hangs at exactly the 275th iteration. (Does 275 x 283 mean something?) These magic numbers seem to suggest a memory buffer size (or allocation) problem. I speculate that headers from repeated requests are placed in a buffer (or a circular buffer?), and when the total hits a particular size, some boundary conditions must have be violated and resulted in memory corruption. In real life traffic, each request typically has slightly different header size, so it is really hard to hit this bug. I suspect there is a +/- 1 calculation error in some buffer. BTW, turning on/off caching does not make any difference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TS-1094) TS hangs after repeated requests from the same kept-alive connection
[ https://issues.apache.org/jira/browse/TS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan M. Carroll updated TS-1094: Attachment: (was: ts-1094.diff) TS hangs after repeated requests from the same kept-alive connection Key: TS-1094 URL: https://issues.apache.org/jira/browse/TS-1094 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.1.0 Environment: Oracle Enterprise Linux 5.5 64-bit Reporter: Wilson Ho Assignee: Leif Hedstrom Priority: Blocker Fix For: 3.1.2 Attachments: ts-1094.diff When a client submits multiple requests while re-using the same keep-alived connection, TS hangs. Usually, the client eventually times out, and at that point TS will be waken up and forwards the request to the original server. But by then it's too late and the client already closed connection. In real life traffic, this bug is very hard to reproduce. But here is an artificial test case. First, make sure client-side keep alive is on. My test case uses HTTP (port 80) GET. Second, make sure the total header size of the requests is exactly 275 bytes, including the carriage returns and line feeds. One byte more or less would fail to reproduce the bug. Third, repeatedly submit the same request through this keep-alived connection. At exactly the 283rd iteration, TS hangs. Note that if the client opens a new connection every time, TS works fine. There is a second test case, where the header size is exactly 283 bytes, and TS hangs at exactly the 275th iteration. (Does 275 x 283 mean something?) These magic numbers seem to suggest a memory buffer size (or allocation) problem. I speculate that headers from repeated requests are placed in a buffer (or a circular buffer?), and when the total hits a particular size, some boundary conditions must have be violated and resulted in memory corruption. In real life traffic, each request typically has slightly different header size, so it is really hard to hit this bug. I suspect there is a +/- 1 calculation error in some buffer. BTW, turning on/off caching does not make any difference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-1094) TS hangs after repeated requests from the same kept-alive connection
[ https://issues.apache.org/jira/browse/TS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198098#comment-13198098 ] Alan M. Carroll commented on TS-1094: - Yes. Some checks need to be done that depend on data_size, so it can't be cleared until after that. TS hangs after repeated requests from the same kept-alive connection Key: TS-1094 URL: https://issues.apache.org/jira/browse/TS-1094 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.1.0 Environment: Oracle Enterprise Linux 5.5 64-bit Reporter: Wilson Ho Assignee: Leif Hedstrom Priority: Blocker Fix For: 3.1.2 Attachments: ts-1094.diff When a client submits multiple requests while re-using the same keep-alived connection, TS hangs. Usually, the client eventually times out, and at that point TS will be waken up and forwards the request to the original server. But by then it's too late and the client already closed connection. In real life traffic, this bug is very hard to reproduce. But here is an artificial test case. First, make sure client-side keep alive is on. My test case uses HTTP (port 80) GET. Second, make sure the total header size of the requests is exactly 275 bytes, including the carriage returns and line feeds. One byte more or less would fail to reproduce the bug. Third, repeatedly submit the same request through this keep-alived connection. At exactly the 283rd iteration, TS hangs. Note that if the client opens a new connection every time, TS works fine. There is a second test case, where the header size is exactly 283 bytes, and TS hangs at exactly the 275th iteration. (Does 275 x 283 mean something?) These magic numbers seem to suggest a memory buffer size (or allocation) problem. I speculate that headers from repeated requests are placed in a buffer (or a circular buffer?), and when the total hits a particular size, some boundary conditions must have be violated and resulted in memory corruption. In real life traffic, each request typically has slightly different header size, so it is really hard to hit this bug. I suspect there is a +/- 1 calculation error in some buffer. BTW, turning on/off caching does not make any difference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-1094) TS hangs after repeated requests from the same kept-alive connection
[ https://issues.apache.org/jira/browse/TS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198170#comment-13198170 ] Wilson Ho commented on TS-1094: --- Thank you very much for the patch! I've verified that it fixes our problem. TS hangs after repeated requests from the same kept-alive connection Key: TS-1094 URL: https://issues.apache.org/jira/browse/TS-1094 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.1.0 Environment: Oracle Enterprise Linux 5.5 64-bit Reporter: Wilson Ho Assignee: Leif Hedstrom Priority: Blocker Fix For: 3.1.2 Attachments: ts-1094.diff When a client submits multiple requests while re-using the same keep-alived connection, TS hangs. Usually, the client eventually times out, and at that point TS will be waken up and forwards the request to the original server. But by then it's too late and the client already closed connection. In real life traffic, this bug is very hard to reproduce. But here is an artificial test case. First, make sure client-side keep alive is on. My test case uses HTTP (port 80) GET. Second, make sure the total header size of the requests is exactly 275 bytes, including the carriage returns and line feeds. One byte more or less would fail to reproduce the bug. Third, repeatedly submit the same request through this keep-alived connection. At exactly the 283rd iteration, TS hangs. Note that if the client opens a new connection every time, TS works fine. There is a second test case, where the header size is exactly 283 bytes, and TS hangs at exactly the 275th iteration. (Does 275 x 283 mean something?) These magic numbers seem to suggest a memory buffer size (or allocation) problem. I speculate that headers from repeated requests are placed in a buffer (or a circular buffer?), and when the total hits a particular size, some boundary conditions must have be violated and resulted in memory corruption. In real life traffic, each request typically has slightly different header size, so it is really hard to hit this bug. I suspect there is a +/- 1 calculation error in some buffer. BTW, turning on/off caching does not make any difference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TS-937) EThread::execute still processing cancelled event
[ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Geffon updated TS-937: Comment: was deleted (was: So, I've began to look into this bug again. To try to determine where the action is being canceled I modified Action to add a const char * volatile cancelled_by; and then simply replaced any instance of -cancel() to pass the name of the method doing the cancelling: ) EThread::execute still processing cancelled event - Key: TS-937 URL: https://issues.apache.org/jira/browse/TS-937 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.0.1, 2.1.9 Environment: RHEL6 Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 3.1.3 Attachments: UnixEThread.patch The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. Brian Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x764fa700 (LWP 28518)] 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64 (gdb) bt #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 #4 0x00361f8e577d in clone () from /lib64/libc.so.6 (gdb) bt full #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 lock = {m = {m_ptr = 0x764f9d20}, lock_acquired = 202} #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 done_one = false e = 0x1db45c0 NegativeQueue = {DLLEvent, Event::Link_link = {head = 0xfc75f0}, tail = 0xfc75f0} next_time = 1314647904419648000 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 p = 0xfb7e80 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #4 0x00361f8e577d in clone () from /lib64/libc.so.6 No symbol table info available. (gdb) f 0 #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); (gdb) p *e $2 = {Action = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x768ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {SLinkEvent = {next = 0x0}, prev = 0x0}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-937) EThread::execute still processing cancelled event
[ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198522#comment-13198522 ] Brian Geffon commented on TS-937: - So, I've began to look into this bug again. To try to determine where the action is being canceled I modified Action to add a const char * volatile cancelled_by; and then simply replaced any instance of -cancel() to pass the name of the method doing the cancelling: EThread::execute still processing cancelled event - Key: TS-937 URL: https://issues.apache.org/jira/browse/TS-937 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.0.1, 2.1.9 Environment: RHEL6 Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 3.1.3 Attachments: UnixEThread.patch The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. Brian Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x764fa700 (LWP 28518)] 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64 (gdb) bt #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 #4 0x00361f8e577d in clone () from /lib64/libc.so.6 (gdb) bt full #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 lock = {m = {m_ptr = 0x764f9d20}, lock_acquired = 202} #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 done_one = false e = 0x1db45c0 NegativeQueue = {DLLEvent, Event::Link_link = {head = 0xfc75f0}, tail = 0xfc75f0} next_time = 1314647904419648000 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 p = 0xfb7e80 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #4 0x00361f8e577d in clone () from /lib64/libc.so.6 No symbol table info available. (gdb) f 0 #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); (gdb) p *e $2 = {Action = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x768ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {SLinkEvent = {next = 0x0}, prev = 0x0}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-937) EThread::execute still processing cancelled event
[ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198523#comment-13198523 ] Brian Geffon commented on TS-937: - So I've tracked down where the event is being cancelled: PluginVC::process_close() line 699: if (core_lock_retry_event) { core_lock_retry_event-cancel(); core_lock_retry_event = NULL; } EThread::execute still processing cancelled event - Key: TS-937 URL: https://issues.apache.org/jira/browse/TS-937 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.0.1, 2.1.9 Environment: RHEL6 Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 3.1.3 Attachments: UnixEThread.patch The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. Brian Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x764fa700 (LWP 28518)] 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64 (gdb) bt #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 #4 0x00361f8e577d in clone () from /lib64/libc.so.6 (gdb) bt full #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 lock = {m = {m_ptr = 0x764f9d20}, lock_acquired = 202} #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 done_one = false e = 0x1db45c0 NegativeQueue = {DLLEvent, Event::Link_link = {head = 0xfc75f0}, tail = 0xfc75f0} next_time = 1314647904419648000 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 p = 0xfb7e80 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #4 0x00361f8e577d in clone () from /lib64/libc.so.6 No symbol table info available. (gdb) f 0 #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); (gdb) p *e $2 = {Action = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x768ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {SLinkEvent = {next = 0x0}, prev = 0x0}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-937) EThread::execute still processing cancelled event
[ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198526#comment-13198526 ] Brian Geffon commented on TS-937: - So it turns out that this bug is fixed in TS-1074, I've verified this and am closing this bug. EThread::execute still processing cancelled event - Key: TS-937 URL: https://issues.apache.org/jira/browse/TS-937 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.0.1, 2.1.9 Environment: RHEL6 Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 3.1.2 Attachments: UnixEThread.patch The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. Brian Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x764fa700 (LWP 28518)] 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64 (gdb) bt #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 #4 0x00361f8e577d in clone () from /lib64/libc.so.6 (gdb) bt full #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 lock = {m = {m_ptr = 0x764f9d20}, lock_acquired = 202} #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 done_one = false e = 0x1db45c0 NegativeQueue = {DLLEvent, Event::Link_link = {head = 0xfc75f0}, tail = 0xfc75f0} next_time = 1314647904419648000 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 p = 0xfb7e80 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #4 0x00361f8e577d in clone () from /lib64/libc.so.6 No symbol table info available. (gdb) f 0 #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); (gdb) p *e $2 = {Action = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x768ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {SLinkEvent = {next = 0x0}, prev = 0x0}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (TS-937) EThread::execute still processing cancelled event
[ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Geffon reopened TS-937: - EThread::execute still processing cancelled event - Key: TS-937 URL: https://issues.apache.org/jira/browse/TS-937 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.0.1, 2.1.9 Environment: RHEL6 Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 3.1.2 Attachments: UnixEThread.patch The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. Brian Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x764fa700 (LWP 28518)] 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64 (gdb) bt #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 #4 0x00361f8e577d in clone () from /lib64/libc.so.6 (gdb) bt full #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 lock = {m = {m_ptr = 0x764f9d20}, lock_acquired = 202} #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 done_one = false e = 0x1db45c0 NegativeQueue = {DLLEvent, Event::Link_link = {head = 0xfc75f0}, tail = 0xfc75f0} next_time = 1314647904419648000 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 p = 0xfb7e80 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #4 0x00361f8e577d in clone () from /lib64/libc.so.6 No symbol table info available. (gdb) f 0 #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); (gdb) p *e $2 = {Action = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x768ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {SLinkEvent = {next = 0x0}, prev = 0x0}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-937) EThread::execute still processing cancelled event
[ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198530#comment-13198530 ] Brian Geffon commented on TS-937: - nevermind, I lied. Reopening :/ EThread::execute still processing cancelled event - Key: TS-937 URL: https://issues.apache.org/jira/browse/TS-937 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.0.1, 2.1.9 Environment: RHEL6 Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 3.1.2 Attachments: UnixEThread.patch The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. Brian Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x764fa700 (LWP 28518)] 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64 (gdb) bt #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 #4 0x00361f8e577d in clone () from /lib64/libc.so.6 (gdb) bt full #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 lock = {m = {m_ptr = 0x764f9d20}, lock_acquired = 202} #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 done_one = false e = 0x1db45c0 NegativeQueue = {DLLEvent, Event::Link_link = {head = 0xfc75f0}, tail = 0xfc75f0} next_time = 1314647904419648000 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 p = 0xfb7e80 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #4 0x00361f8e577d in clone () from /lib64/libc.so.6 No symbol table info available. (gdb) f 0 #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); (gdb) p *e $2 = {Action = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x768ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {SLinkEvent = {next = 0x0}, prev = 0x0}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-937) EThread::execute still processing cancelled event
[ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198532#comment-13198532 ] Brian Geffon commented on TS-937: - So it appears that the event being cancelled is an event callback related to a MUTEX being held. See PluginVC.cc:489 EThread::execute still processing cancelled event - Key: TS-937 URL: https://issues.apache.org/jira/browse/TS-937 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.0.1, 2.1.9 Environment: RHEL6 Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 3.1.2 Attachments: UnixEThread.patch The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. Brian Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x764fa700 (LWP 28518)] 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64 (gdb) bt #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 #4 0x00361f8e577d in clone () from /lib64/libc.so.6 (gdb) bt full #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 lock = {m = {m_ptr = 0x764f9d20}, lock_acquired = 202} #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 done_one = false e = 0x1db45c0 NegativeQueue = {DLLEvent, Event::Link_link = {head = 0xfc75f0}, tail = 0xfc75f0} next_time = 1314647904419648000 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 p = 0xfb7e80 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #4 0x00361f8e577d in clone () from /lib64/libc.so.6 No symbol table info available. (gdb) f 0 #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); (gdb) p *e $2 = {Action = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x768ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {SLinkEvent = {next = 0x0}, prev = 0x0}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TS-937) EThread::execute still processing cancelled event
[ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Geffon updated TS-937: Comment: was deleted (was: So it turns out that this bug is fixed in TS-1074, I've verified this and am closing this bug.) EThread::execute still processing cancelled event - Key: TS-937 URL: https://issues.apache.org/jira/browse/TS-937 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.0.1, 2.1.9 Environment: RHEL6 Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 3.1.2 Attachments: UnixEThread.patch The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. Brian Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x764fa700 (LWP 28518)] 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64 (gdb) bt #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 #4 0x00361f8e577d in clone () from /lib64/libc.so.6 (gdb) bt full #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 lock = {m = {m_ptr = 0x764f9d20}, lock_acquired = 202} #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 done_one = false e = 0x1db45c0 NegativeQueue = {DLLEvent, Event::Link_link = {head = 0xfc75f0}, tail = 0xfc75f0} next_time = 1314647904419648000 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 p = 0xfb7e80 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #4 0x00361f8e577d in clone () from /lib64/libc.so.6 No symbol table info available. (gdb) f 0 #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); (gdb) p *e $2 = {Action = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x768ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {SLinkEvent = {next = 0x0}, prev = 0x0}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TS-937) EThread::execute still processing cancelled event
[ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Geffon updated TS-937: Comment: was deleted (was: Ignore notifications about a comment talking about inclusion in 3.0.2, I was mistaken. ) EThread::execute still processing cancelled event - Key: TS-937 URL: https://issues.apache.org/jira/browse/TS-937 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.0.1, 2.1.9 Environment: RHEL6 Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 3.1.2 Attachments: UnixEThread.patch The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. Brian Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x764fa700 (LWP 28518)] 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64 (gdb) bt #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 #4 0x00361f8e577d in clone () from /lib64/libc.so.6 (gdb) bt full #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 lock = {m = {m_ptr = 0x764f9d20}, lock_acquired = 202} #1 0x006fcbaf in EThread::execute (this=0x768ff010) at UnixEThread.cc:232 done_one = false e = 0x1db45c0 NegativeQueue = {DLLEvent, Event::Link_link = {head = 0xfc75f0}, tail = 0xfc75f0} next_time = 1314647904419648000 #2 0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88 p = 0xfb7e80 #3 0x0036204077e1 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #4 0x00361f8e577d in clone () from /lib64/libc.so.6 No symbol table info available. (gdb) f 0 #0 0x006fc663 in EThread::process_event (this=0x768ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130 130 MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation); (gdb) p *e $2 = {Action = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x768ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {SLinkEvent = {next = 0x0}, prev = 0x0}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira