[jira] [Commented] (TS-1100) Coredump at startup when there's a duplicate remap rule

2012-02-01 Thread Nick Kew (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197789#comment-13197789
 ] 

Nick Kew commented on TS-1100:
--

To exit at startup on such an error - fine (though a clear error message would 
be helpful - starting up through the trafficserver script leaves only a cryptic 
failure).

But a coredump?  What kind of "by design" is that?

 Coredump at startup when there's a duplicate remap rule
 ---

 Key: TS-1100
 URL: https://issues.apache.org/jira/browse/TS-1100
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 3.1.1
 Environment: Bog standard linux/x86; own build.
Reporter: Nick Kew
Priority: Minor

 A minor accident with cut-and-paste in vi leads to TS coredumping at startup.
 I had duplicated a line in remap.config:
 map http://myhost:8080/  http://target/
 ()
 map http://myhost:8080/  http://target/
 The second instance is line 125 in remap.config, and the dump shows:
 FATAL: [ReverseProxy] Unable to add mapping rule to lookup table at line 125
 bin/traffic_server - STACK TRACE: 
 /usr/local/trafficserver/lib/libtsutil.so.3(ink_fatal_va+0xc7)[0xe21873]
 /usr/local/trafficserver/lib/libtsutil.so.3(ink_fatal+0x2b)[0xe218c5]
 bin/traffic_server(_ZN10UrlRewrite10BuildTableEv+0x1e48)[0x81e1120]
 bin/traffic_server(_ZN10UrlRewriteC1EPKc+0x562)[0x810]
 bin/traffic_server(_Z18init_reverse_proxyv+0x41)[0x815500b]
 bin/traffic_server(_Z20init_HttpProxyServerv+0xe)[0x818fccc]
 bin/traffic_server(main+0xf7f)[0x813b65d]
 /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6)[0x692bd6]
 bin/traffic_server[0x80f6001]
 Aborted
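
A minimal sketch of the friendlier behaviour asked for in the comment: scan the rules up front and report each duplicate with its line number, so the caller can print a clear message and exit cleanly instead of aborting. This is not Traffic Server's actual code. DuplicateRule and find_duplicate_maps are hypothetical names, only plain "map" lines are handled, and keying duplicates on the source URL alone is an assumption.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One diagnostic per repeated "map" source URL (hypothetical type).
struct DuplicateRule {
  int line;         // line number of the repeated rule
  int first_line;   // line number where the source URL first appeared
  std::string from; // the duplicated source URL
};

// Scan remap.config-style text and collect duplicate "map <from> <to>" rules.
// Comments, blank lines and other rule types are skipped.
std::vector<DuplicateRule> find_duplicate_maps(const std::string &config)
{
  std::vector<DuplicateRule> dups;
  std::map<std::string, int> seen; // source URL -> first line number
  std::istringstream in(config);
  std::string line;
  int lineno = 0;

  while (std::getline(in, line)) {
    ++lineno;
    std::istringstream fields(line);
    std::string keyword, from, to;
    if (!(fields >> keyword >> from >> to) || keyword != "map") {
      continue;
    }
    std::pair<std::map<std::string, int>::iterator, bool> result =
      seen.insert(std::make_pair(from, lineno));
    if (!result.second) {
      DuplicateRule d = {lineno, result.first->second, from};
      dups.push_back(d);
    }
  }
  return dups;
}
```

A caller could log something like "duplicate map rule at line N (first seen at line M)" for each entry and return a non-zero exit status, which leaves no core file behind.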

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-974) TS should have a mode to hold partial objects in cache

2012-02-01 Thread Johan Acevedo (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197819#comment-13197819
 ] 

Johan Acevedo commented on TS-974:
--

Is there currently any cache software for http that manages to do this?

 TS should have a mode to hold partial objects in cache
 --

 Key: TS-974
 URL: https://issues.apache.org/jira/browse/TS-974
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache
Affects Versions: 3.0.1
Reporter: William Bardwell

 For ATS to do an excellent job caching large files like video it would need to 
 be able to hold partial objects for a large file.  This could be done in a 
 plugin or in the core.  This would need to be integrated with the Range 
 handling code to serve requests out of the partial objects and to get more 
 parts of a file to satisfy a Range request.
 An intermediate step (also do-able in the core or in a plugin) would be to 
 have some settings to let the Range handling code be able to trigger a full 
 file download either asynchronously when a Range response indicates that the 
 file isn't larger than some threshold, or synchronously when a Range request 
 could reasonably be answered quickly from a full request.  (Right now Range 
 requests are tunneled if there is not full cached content as far as I can 
 tell.)
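
To make the "serve requests out of the partial objects and get more parts of a file" step concrete, here is a rough sketch of the bookkeeping it implies: given the byte ranges already held for an object, work out which parts of a requested range still have to be fetched. It is not based on the ATS cache API. ByteRange and missing_ranges are hypothetical names, and ranges are inclusive as in the HTTP Range header.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Inclusive byte range [first, last], matching HTTP Range semantics.
typedef std::pair<int64_t, int64_t> ByteRange;

// Return the parts of `want` not covered by any cached fragment.
// An empty result means the request can be served entirely from cache.
std::vector<ByteRange> missing_ranges(std::vector<ByteRange> cached, ByteRange want)
{
  assert(want.first <= want.second);
  std::vector<ByteRange> missing;
  std::sort(cached.begin(), cached.end()); // order fragments by start offset
  int64_t next = want.first;               // first byte not yet known to be covered

  for (size_t i = 0; i < cached.size() && next <= want.second; ++i) {
    if (cached[i].second < next) {
      continue; // fragment ends before the uncovered part starts
    }
    if (cached[i].first > want.second) {
      break; // this and all later fragments start after the request ends
    }
    if (cached[i].first > next) {
      missing.push_back(ByteRange(next, cached[i].first - 1)); // hole before fragment
    }
    next = cached[i].second + 1;
  }
  if (next <= want.second) {
    missing.push_back(ByteRange(next, want.second)); // tail past the last fragment
  }
  return missing;
}
```

With fragments 0-99 and 200-299 cached, a request for bytes 50-249 leaves only 100-199 to fetch from the origin.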





[jira] [Commented] (TS-475) HTTP SM should support efficient byte range requests

2012-02-01 Thread Johan Acevedo (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197820#comment-13197820
 ] 

Johan Acevedo commented on TS-475:
--

Has there been any new work done on this?

 HTTP SM should support efficient byte range requests
 

 Key: TS-475
 URL: https://issues.apache.org/jira/browse/TS-475
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP
Reporter: Leif Hedstrom
Priority: Critical
 Fix For: 3.1.3


 The cache has support for efficiently locating a particular range in the cached 
 object, but the HTTP SM does not support this. In order to make Range: 
 requests efficient (particularly on large objects), the SM should support this 
 new cache feature.





[jira] [Commented] (TS-1100) Coredump at startup when there's a duplicate remap rule

2012-02-01 Thread Conan Wang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197935#comment-13197935
 ] 

Conan Wang commented on TS-1100:


I think TS-948 has fixed this?
https://issues.apache.org/jira/browse/TS-948





[jira] [Updated] (TS-1094) TS hangs after repeated requests from the same kept-alive connection

2012-02-01 Thread Alan M. Carroll (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan M. Carroll updated TS-1094:


Attachment: ts-1094.diff

 TS hangs after repeated requests from the same kept-alive connection
 

 Key: TS-1094
 URL: https://issues.apache.org/jira/browse/TS-1094
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 3.1.0
 Environment: Oracle Enterprise Linux 5.5 64-bit
Reporter: Wilson Ho
Assignee: Leif Hedstrom
Priority: Blocker
 Fix For: 3.1.2

 Attachments: ts-1094.diff


 When a client submits multiple requests while re-using the same kept-alive 
 connection, TS hangs.  Usually, the client eventually times out, and at that 
 point TS wakes up and forwards the request to the origin server.  
 But by then it's too late and the client has already closed the connection.
 In real life traffic, this bug is very hard to reproduce.  But here is an 
 artificial test case.
 First, make sure client-side keep alive is on.  My test case uses HTTP (port 
 80) GET.
 Second, make sure the total header size of the requests is exactly 275 bytes, 
 including the carriage returns and line feeds.  One byte more or less would 
 fail to reproduce the bug.
 Third, repeatedly submit the same request through this kept-alive 
 connection.  At exactly the 283rd iteration, TS hangs.  Note that if the 
 client opens a new connection every time, TS works fine.
 There is a second test case, where the header size is exactly 283 bytes, and 
 TS hangs at exactly the 275th iteration.  (Does 275 x 283 mean something?)
 These magic numbers seem to suggest a memory buffer size (or allocation) 
 problem.  I speculate that headers from repeated requests are placed in a 
 buffer (or a circular buffer?), and when the total hits a particular size, 
 some boundary condition must have been violated, resulting in memory 
 corruption.
 In real life traffic, each request typically has slightly different header 
 size, so it is really hard to hit this bug.  I suspect there is a +/- 1 
 calculation error in some buffer.
 BTW, turning on/off caching does not make any difference.  
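
On the question of whether 275 x 283 means something: 275 x 283 = 77825 = 19 x 4096 + 1. If the input is buffered in 4096-byte blocks (an assumption; the report does not state the block size), the 283rd request of 275 bytes ends exactly one byte past a block boundary, and the same holds for the 275th request of 283 bytes. The sketch below searches for the first such iteration; first_split_iteration is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Smallest 1-based iteration k at which the k-th request's final byte is the
// first byte of a new block, i.e. k * header_size is one more than a multiple
// of block_size. Returns 0 if that does not happen within `limit` iterations.
uint64_t first_split_iteration(uint64_t header_size, uint64_t block_size, uint64_t limit)
{
  assert(block_size > 1);
  for (uint64_t k = 1; k <= limit; ++k) {
    if ((k * header_size) % block_size == 1) {
      return k;
    }
  }
  return 0;
}
```

Under the 4096-byte assumption this returns 283 for a 275-byte header and 275 for a 283-byte header, matching both test cases in the report and the suspicion of an off-by-one at a buffer boundary.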





[jira] [Commented] (TS-1094) TS hangs after repeated requests from the same kept-alive connection

2012-02-01 Thread Alan M. Carroll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197964#comment-13197964
 ] 

Alan M. Carroll commented on TS-1094:
-

I have attached an alternative patch, which on my magic dev box fixes the 
problem, which I think was the case where the input buffer ended with the CR of 
a CR LF pair and the CR was inappropriately counted as normal input.





[jira] [Commented] (TS-1094) TS hangs after repeated requests from the same kept-alive connection

2012-02-01 Thread Leif Hedstrom (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197969#comment-13197969
 ] 

Leif Hedstrom commented on TS-1094:
---

Cool, I'll test this in a bit. Question: Do we still need that second change 
from TS-466, which reorders the call to mime_scanner_append()?





[jira] [Updated] (TS-1094) TS hangs after repeated requests from the same kept-alive connection

2012-02-01 Thread Alan M. Carroll (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan M. Carroll updated TS-1094:


Attachment: ts-1094.diff





[jira] [Updated] (TS-1094) TS hangs after repeated requests from the same kept-alive connection

2012-02-01 Thread Alan M. Carroll (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan M. Carroll updated TS-1094:


Attachment: (was: ts-1094.diff)





[jira] [Commented] (TS-1094) TS hangs after repeated requests from the same kept-alive connection

2012-02-01 Thread Alan M. Carroll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198098#comment-13198098
 ] 

Alan M. Carroll commented on TS-1094:
-

Yes. Some checks need to be done that depend on data_size, so it can't be 
cleared until after that.





[jira] [Commented] (TS-1094) TS hangs after repeated requests from the same kept-alive connection

2012-02-01 Thread Wilson Ho (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198170#comment-13198170
 ] 

Wilson Ho commented on TS-1094:
---

Thank you very much for the patch!  I've verified that it fixes our problem.





[jira] [Updated] (TS-937) EThread::execute still processing cancelled event

2012-02-01 Thread Brian Geffon (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Geffon updated TS-937:


Comment: was deleted

(was: So, I've begun to look into this bug again. To try to determine where the 
action is being cancelled, I modified Action to add a const char * volatile 
cancelled_by; and then simply replaced every instance of ->cancel() to pass the 
name of the method doing the cancelling:
)

 EThread::execute still processing cancelled event
 -

 Key: TS-937
 URL: https://issues.apache.org/jira/browse/TS-937
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 3.0.1, 2.1.9
 Environment: RHEL6
Reporter: Brian Geffon
Assignee: Brian Geffon
 Fix For: 3.1.3

 Attachments: UnixEThread.patch


 The included GDB log shows that ATS is trying to process an event that 
 has already been cancelled. Examining the code at UnixEThread.cc line 232 
 shows that EThread::process_event gets called without a check for the event 
 being cancelled. 
 Brian
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x764fa700 (LWP 28518)]
 0x006fc663 in EThread::process_event (this=0x768ff010, 
 e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
 130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
 Missing separate debuginfos, use: debuginfo-install 
 expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 
 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 
 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 
 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 
 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 
 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
 (gdb) bt
 #0  0x006fc663 in EThread::process_event (this=0x768ff010, 
 e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
 #1  0x006fcbaf in EThread::execute (this=0x768ff010) at 
 UnixEThread.cc:232
 #2  0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
 #3  0x0036204077e1 in start_thread () from /lib64/libpthread.so.0
 #4  0x00361f8e577d in clone () from /lib64/libc.so.6
 (gdb) bt full
 #0  0x006fc663 in EThread::process_event (this=0x768ff010, 
 e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
 lock = {m = {m_ptr = 0x764f9d20}, lock_acquired = 202}
 #1  0x006fcbaf in EThread::execute (this=0x768ff010) at 
 UnixEThread.cc:232
 done_one = false
 e = 0x1db45c0
 NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0xfc75f0}, 
 tail = 0xfc75f0}
 next_time = 1314647904419648000
 #2  0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
 p = 0xfb7e80
 #3  0x0036204077e1 in start_thread () from /lib64/libpthread.so.0
 No symbol table info available.
 #4  0x00361f8e577d in clone () from /lib64/libc.so.6
 No symbol table info available.
 (gdb) f 0
 #0  0x006fc663 in EThread::process_event (this=0x768ff010, 
 e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
 130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
 (gdb) p *e
 $2 = {<Action> = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = 
 {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x768ff010, 
 in_the_prot_queue = 0, in_the_priority_queue = 0, 
   immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, 
 timeout_at = 0, period = 0, cookie = 0x0, link = {<SLink<Event>> = {next = 
 0x0}, prev = 0x0}}
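
The missing check described at the top of this report can be pictured with a simplified, single-threaded loop that tests the cancelled flag before dispatching, so a cancelled event's continuation and mutex are never touched. This is a sketch of the idea only, not UnixEThread.cc: Event and run_pending are hypothetical stand-ins, and the real code also has to deal with locking and event lifetime.

```cpp
#include <deque>
#include <functional>

// Hypothetical, stripped-down event: a cancelled flag plus a callback.
struct Event {
  bool cancelled;
  std::function<void()> handler;
};

// Drain the queue, dispatching only events that have not been cancelled.
// Returns the number of handlers actually invoked.
int run_pending(std::deque<Event *> &queue)
{
  int dispatched = 0;
  while (!queue.empty()) {
    Event *e = queue.front();
    queue.pop_front();
    if (e->cancelled) {
      continue; // skip: the owner may already have torn down its state
    }
    e->handler();
    ++dispatched;
  }
  return dispatched;
}
```

In the gdb output above the event has cancelled = 1 and the crash is on the lock of e->mutex, which is the kind of access this early check is meant to avoid.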





[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

2012-02-01 Thread Brian Geffon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198522#comment-13198522
 ] 

Brian Geffon commented on TS-937:
-

So, I've begun to look into this bug again. To try to determine where the 
action is being cancelled, I modified Action to add a const char * volatile 
cancelled_by; and then simply replaced every instance of ->cancel() to pass the 
name of the method doing the cancelling:
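
The code that originally followed this comment is not preserved in the archive. As a rough illustration of the technique described (and not the author's actual change), a cancel() that records its caller might look like the sketch below. TracedAction is a hypothetical stand-in for Action, and keeping only the first caller is an assumption made here so that later calls do not overwrite the culprit.

```cpp
#include <cstddef>

// Hypothetical stand-in for Action with a debugging breadcrumb.
struct TracedAction {
  volatile int cancelled;
  const char *volatile cancelled_by; // name of the cancelling method, or NULL

  TracedAction() : cancelled(0), cancelled_by(NULL) {}

  // Record who cancelled. The first caller wins (an assumption of this sketch).
  void cancel(const char *who)
  {
    if (!cancelled) {
      cancelled_by = who;
      cancelled = 1;
    }
  }
};
```

Each call site would pass its own name, for example a string literal or __func__, so that printing the event in gdb shows which method cancelled it.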


 EThread::execute still processing cancelled event
 -

 Key: TS-937
 URL: https://issues.apache.org/jira/browse/TS-937
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 3.0.1, 2.1.9
 Environment: RHEL6
Reporter: Brian Geffon
Assignee: Brian Geffon
 Fix For: 3.1.3

 Attachments: UnixEThread.patch


 The included GDB log will show that ATS is trying to process an event that 
 has already been canceled, examining the code of UnixEThread.cc line 232 
 shows that EThread::process_event gets called without a check for the event 
 being cancelled. 
 Brian
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x764fa700 (LWP 28518)]
 0x006fc663 in EThread::process_event (this=0x768ff010, 
 e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
 130  MUTEX_TRY_LOCK_FOR(lock, e-mutex.m_ptr, this, e-continuation);
 Missing separate debuginfos, use: debuginfo-install 
 expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 
 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 
 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 
 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 
 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 
 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
 (gdb) bt
 #0  0x006fc663 in EThread::process_event (this=0x768ff010, 
 e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
 #1  0x006fcbaf in EThread::execute (this=0x768ff010) at 
 UnixEThread.cc:232
 #2  0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
 #3  0x0036204077e1 in start_thread () from /lib64/libpthread.so.0
 #4  0x00361f8e577d in clone () from /lib64/libc.so.6
 (gdb) bt full
 #0  0x006fc663 in EThread::process_event (this=0x768ff010, 
 e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
 lock = {m = {m_ptr = 0x764f9d20}, lock_acquired = 202}
 #1  0x006fcbaf in EThread::execute (this=0x768ff010) at 
 UnixEThread.cc:232
 done_one = false
 e = 0x1db45c0
 NegativeQueue = {DLLEvent, Event::Link_link = {head = 0xfc75f0}, 
 tail = 0xfc75f0}
 next_time = 1314647904419648000
 #2  0x006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
 p = 0xfb7e80
 #3  0x0036204077e1 in start_thread () from /lib64/libpthread.so.0
 No symbol table info available.
 #4  0x00361f8e577d in clone () from /lib64/libc.so.6
 No symbol table info available.
 (gdb) f 0
 #0  0x006fc663 in EThread::process_event (this=0x768ff010, 
 e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
 130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
 (gdb) p *e
 $2 = {Action = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = 
 {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x768ff010, 
 in_the_prot_queue = 0, in_the_priority_queue = 0, 
   immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, 
 timeout_at = 0, period = 0, cookie = 0x0, link = {SLinkEvent = {next = 
 0x0}, prev = 0x0}}
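
The missing check described in the report can be illustrated with a minimal, single-threaded sketch. SimpleEvent and run_pending are hypothetical names, not the ATS Event class or the code in UnixEThread.cc, and the sketch does not address freeing events or cancellation racing with dispatch on another thread:

```cpp
#include <cassert>
#include <deque>
#include <functional>

// Hypothetical, simplified event; the real ATS Event has many more fields.
struct SimpleEvent {
  bool cancelled = false;
  std::function<void()> handler; // stands in for the continuation callback
};

// Drain the queue, dispatching only events that have not been cancelled.
// Returns the number of handlers actually invoked.
inline int run_pending(std::deque<SimpleEvent *> &queue) {
  int dispatched = 0;
  while (!queue.empty()) {
    SimpleEvent *e = queue.front();
    queue.pop_front();
    if (e->cancelled) {
      continue; // never touch the handler or mutex of a cancelled event
    }
    e->handler();
    ++dispatched;
  }
  return dispatched;
}
```

The point of the check is ordering: the cancelled flag is tested before anything the event refers to (its mutex or continuation) is dereferenced, since the owner may already have torn those down.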

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

2012-02-01 Thread Brian Geffon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198523#comment-13198523
 ] 

Brian Geffon commented on TS-937:
-

So I've tracked down where the event is being cancelled: 

PluginVC::process_close() line 699:

  if (core_lock_retry_event) {
    core_lock_retry_event->cancel();
    core_lock_retry_event = NULL;
  }









[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

2012-02-01 Thread Brian Geffon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198526#comment-13198526
 ] 

Brian Geffon commented on TS-937:
-

So it turns out that this bug is fixed in TS-1074, I've verified this and am 
closing this bug.

 EThread::execute still processing cancelled event
 -

 Key: TS-937
 URL: https://issues.apache.org/jira/browse/TS-937
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 3.0.1, 2.1.9
 Environment: RHEL6
Reporter: Brian Geffon
Assignee: Brian Geffon
 Fix For: 3.1.2

 Attachments: UnixEThread.patch






[jira] [Reopened] (TS-937) EThread::execute still processing cancelled event

2012-02-01 Thread Brian Geffon (Reopened) (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Geffon reopened TS-937:
-






[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

2012-02-01 Thread Brian Geffon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198530#comment-13198530
 ] 

Brian Geffon commented on TS-937:
-

Never mind, I lied. Reopening :/





[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

2012-02-01 Thread Brian Geffon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198532#comment-13198532
 ] 

Brian Geffon commented on TS-937:
-

So it appears that the event being cancelled is an event callback related to a 
MUTEX being held. See PluginVC.cc:489
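
As a hedged illustration of the general pattern (not the actual PluginVC code), a retry event is typically scheduled when a try-lock fails and must be cancelled if the owner closes first. RetryEvent, Scheduler and Connection below are hypothetical names, and the got_lock parameter stands in for the result of a mutex try-lock:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical retry token; stands in for a scheduled event.
struct RetryEvent {
  bool cancelled = false;
};

// Hypothetical scheduler: just a list of pending events it co-owns.
using Scheduler = std::vector<std::shared_ptr<RetryEvent>>;

struct Connection {
  std::shared_ptr<RetryEvent> lock_retry; // at most one outstanding retry
  int work_done = 0;

  // got_lock stands in for the result of a mutex try-lock.
  void process(bool got_lock, Scheduler &sched) {
    if (!got_lock) {
      if (!lock_retry) {
        // Lock is held elsewhere: schedule a retry instead of blocking.
        lock_retry = std::make_shared<RetryEvent>();
        sched.push_back(lock_retry);
      }
      return;
    }
    ++work_done;
  }

  // On close, cancel the outstanding retry and drop our reference to it.
  void close() {
    if (lock_retry) {
      lock_retry->cancelled = true;
      lock_retry.reset();
    }
  }
};
```

In this sketch the scheduler still holds the cancelled event after close(), which is why the dispatching side has to honour the cancelled flag before calling back into the closed connection.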





[jira] [Updated] (TS-937) EThread::execute still processing cancelled event

2012-02-01 Thread Brian Geffon (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Geffon updated TS-937:


Comment: was deleted

(was: So it turns out that this bug is fixed in TS-1074, I've verified this and 
am closing this bug.)





[jira] [Updated] (TS-937) EThread::execute still processing cancelled event

2012-02-01 Thread Brian Geffon (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Geffon updated TS-937:


Comment: was deleted

(was: Ignore notifications about a comment talking about inclusion in 3.0.2, I 
was mistaken.
)
