[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-26 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=31117=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-31117
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 26/Oct/16 20:00
Start Date: 26/Oct/16 20:00
Worklog Time Spent: 10m 
  Work Description: Github user jacksontj commented on the issue:

https://github.com/apache/trafficserver/pull/1108
  
Closing the PR-- as we'll take care of this in Jira.


Issue Time Tracking
---

Worklog Id: (was: 31117)
Time Spent: 5h  (was: 4h 50m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
> Fix For: 7.1.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-26 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=31116=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-31116
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 26/Oct/16 20:00
Start Date: 26/Oct/16 20:00
Worklog Time Spent: 10m 
  Work Description: Github user jacksontj closed the pull request at:

https://github.com/apache/trafficserver/pull/1108


Issue Time Tracking
---

Worklog Id: (was: 31116)
Time Spent: 4h 50m  (was: 4h 40m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
> Fix For: 7.1.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-26 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=31118=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-31118
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 26/Oct/16 20:00
Start Date: 26/Oct/16 20:00
Worklog Time Spent: 10m 
  Work Description: Github user jacksontj commented on the issue:

https://github.com/apache/trafficserver/pull/1109
  
Closing the PR-- as we'll take care of this in Jira.


Issue Time Tracking
---

Worklog Id: (was: 31118)
Time Spent: 5h 10m  (was: 5h)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
> Fix For: 7.1.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30803=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30803
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 18/Oct/16 15:07
Start Date: 18/Oct/16 15:07
Worklog Time Spent: 10m 
  Work Description: Github user shinrich commented on the issue:

https://github.com/apache/trafficserver/pull/1109
  
Not a complete solution as Thomas noted, but may be sufficient to keep 
6.2.x moving.  Alternatively, we may just want to back port TS-4590.


Issue Time Tracking
---

Worklog Id: (was: 30803)
Time Spent: 4h 40m  (was: 4.5h)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
> Fix For: 7.1.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30795=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30795
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 18/Oct/16 01:01
Start Date: 18/Oct/16 01:01
Worklog Time Spent: 10m 
  Work Description: Github user zwoop commented on the issue:

https://github.com/apache/trafficserver/pull/1109
  
Is this a duplicate PR?


Issue Time Tracking
---

Worklog Id: (was: 30795)
Time Spent: 4.5h  (was: 4h 20m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30770=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30770
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 17/Oct/16 20:33
Start Date: 17/Oct/16 20:33
Worklog Time Spent: 10m 
  Work Description: Github user atsci commented on the issue:

https://github.com/apache/trafficserver/pull/1108
  
FreeBSD build *successful*! See 
https://ci.trafficserver.apache.org/job/Github-FreeBSD/1036/ for details.
 



Issue Time Tracking
---

Worklog Id: (was: 30770)
Time Spent: 4h 20m  (was: 4h 10m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30769=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30769
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 17/Oct/16 20:29
Start Date: 17/Oct/16 20:29
Worklog Time Spent: 10m 
  Work Description: Github user atsci commented on the issue:

https://github.com/apache/trafficserver/pull/1108
  
Linux build *successful*! See 
https://ci.trafficserver.apache.org/job/Github-Linux/928/ for details.
 



Issue Time Tracking
---

Worklog Id: (was: 30769)
Time Spent: 4h 10m  (was: 4h)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30768=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30768
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 17/Oct/16 20:26
Start Date: 17/Oct/16 20:26
Worklog Time Spent: 10m 
  Work Description: Github user atsci commented on the issue:

https://github.com/apache/trafficserver/pull/1109
  
Linux build *successful*! See 
https://ci.trafficserver.apache.org/job/Github-Linux/927/ for details.
 



Issue Time Tracking
---

Worklog Id: (was: 30768)
Time Spent: 4h  (was: 3h 50m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30767=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30767
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 17/Oct/16 20:25
Start Date: 17/Oct/16 20:25
Worklog Time Spent: 10m 
  Work Description: Github user atsci commented on the issue:

https://github.com/apache/trafficserver/pull/1109
  
FreeBSD build *successful*! See 
https://ci.trafficserver.apache.org/job/Github-FreeBSD/1035/ for details.
 



Issue Time Tracking
---

Worklog Id: (was: 30767)
Time Spent: 3h 50m  (was: 3h 40m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30764=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30764
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 17/Oct/16 20:15
Start Date: 17/Oct/16 20:15
Worklog Time Spent: 10m 
  Work Description: Github user jacksontj commented on the issue:

https://github.com/apache/trafficserver/pull/1109
  
After doing some more looking, the `m_deleted` flag is just marking the 
VConn as "we should delete this" and that combined with `m_deletable` lets it 
reschedule the delete in the future when it can. The fundamental problem I'm 
seeing is it is double freed under some conditions-- since all events trigger a 
delete. This is fixed in 7.x with  TS-4590 -- so I've updated this PR to 
workaround the issue in a simpler mechanism that more closely mirrors what is 
happening on 7.x


Issue Time Tracking
---

Worklog Id: (was: 30764)
Time Spent: 3h 40m  (was: 3.5h)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30751=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30751
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 17/Oct/16 15:26
Start Date: 17/Oct/16 15:26
Worklog Time Spent: 10m 
  Work Description: Github user jacksontj commented on the issue:

https://github.com/apache/trafficserver/pull/1109
  
@shinrich I did look a bit more into the other backport-- and it seems that 
the outcome would be quite similar-- as the `handle_event` method is still 
using this `m_deleted` flag to determine if the struct was deleted, so it ends 
up being a superset of this change.


Issue Time Tracking
---

Worklog Id: (was: 30751)
Time Spent: 3.5h  (was: 3h 20m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30710=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30710
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 15/Oct/16 05:05
Start Date: 15/Oct/16 05:05
Worklog Time Spent: 10m 
  Work Description: Github user jacksontj commented on the issue:

https://github.com/apache/trafficserver/pull/1108
  
@jpeach yes, although since 5.2.x drops support in the next month-- its 
probably not work backporting to the 5.2.x branch-- there is another PR for 
6.2.x.


Issue Time Tracking
---

Worklog Id: (was: 30710)
Time Spent: 3h 20m  (was: 3h 10m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30699=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30699
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 22:27
Start Date: 14/Oct/16 22:27
Worklog Time Spent: 10m 
  Work Description: Github user jacksontj commented on the issue:

https://github.com/apache/trafficserver/pull/1109
  
@shinrich we could-- this patch seems to be working fine for our build 
though-- seemed like a less intrusive patch to an LTS release.


Issue Time Tracking
---

Worklog Id: (was: 30699)
Time Spent: 3h 10m  (was: 3h)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30669=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30669
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 19:57
Start Date: 14/Oct/16 19:57
Worklog Time Spent: 10m 
  Work Description: Github user shinrich commented on the issue:

https://github.com/apache/trafficserver/pull/1109
  
Would it be easier to just back port the fix from TS-4590?  I'm a bit 
concerned about the proposed fix in this PR.  I don't think is correctly using 
the m_deleted/m_deletable parameters?


Issue Time Tracking
---

Worklog Id: (was: 30669)
Time Spent: 3h  (was: 2h 50m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30663=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30663
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 19:31
Start Date: 14/Oct/16 19:31
Worklog Time Spent: 10m 
  Work Description: Github user SolidWallOfCode commented on the issue:

https://github.com/apache/trafficserver/pull/1109
  
Does the `destroy` method also clear `m_deletable`?


Issue Time Tracking
---

Worklog Id: (was: 30663)
Time Spent: 2h 50m  (was: 2h 40m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30660=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30660
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 18:48
Start Date: 14/Oct/16 18:48
Worklog Time Spent: 10m 
  Work Description: Github user shinrich commented on a diff in the pull 
request:

https://github.com/apache/trafficserver/pull/1108#discussion_r83480249
  
--- Diff: proxy/InkAPI.cc ---
@@ -1053,15 +1053,14 @@ int
 INKVConnInternal::handle_event(int event, void *edata)
 {
   handle_event_count(event);
-  if (m_deleted) {
-if (m_deletable) {
-  this->mutex = NULL;
-  m_read_vio.set_continuation(NULL);
-  m_write_vio.set_continuation(NULL);
-  INKVConnAllocator.free(this);
-}
-  } else {
+  // If the VConn isn't deleted, call the handler
+  if (!m_deleted) {
 return m_event_func((TSCont) this, (TSEvent) event, edata);
+  } else {
+// if the VConn is deleted, and we are in DEBUG mode-- we should assert
+// because this means that the VConn was cleaned up before all the 
callbacks
+// (timeouts, etc.) where canceled.
+ink_assert("event on deleted INKVConnInternal");
--- End diff --

Yes warnings would be good.


Issue Time Tracking
---

Worklog Id: (was: 30660)
Time Spent: 2h 40m  (was: 2.5h)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I 

[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30654=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30654
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 17:17
Start Date: 14/Oct/16 17:17
Worklog Time Spent: 10m 
  Work Description: Github user jpeach commented on the issue:

https://github.com/apache/trafficserver/pull/1108
  
Wait, this it for 5.2.x?


Issue Time Tracking
---

Worklog Id: (was: 30654)
Time Spent: 2.5h  (was: 2h 20m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30647=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30647
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 16:22
Start Date: 14/Oct/16 16:22
Worklog Time Spent: 10m 
  Work Description: Github user jacksontj commented on the issue:

https://github.com/apache/trafficserver/pull/1108
  
@jpeach The issue is that an event shows up after the object was 
destroyed() the destroy() mechanism sets the deleted flag-- previously what was 
happening is that if an event showed up after it was `destroy()`d then it would 
attempt to re-destroy it. This change is simply to not re-destroy it, but 
rather do nothing.


Issue Time Tracking
---

Worklog Id: (was: 30647)
Time Spent: 2h  (was: 1h 50m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30649=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30649
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 16:23
Start Date: 14/Oct/16 16:23
Worklog Time Spent: 10m 
  Work Description: Github user jacksontj commented on a diff in the pull 
request:

https://github.com/apache/trafficserver/pull/1108#discussion_r83455610
  
--- Diff: proxy/InkAPI.cc ---
@@ -1053,15 +1053,14 @@ int
 INKVConnInternal::handle_event(int event, void *edata)
 {
   handle_event_count(event);
-  if (m_deleted) {
-if (m_deletable) {
-  this->mutex = NULL;
-  m_read_vio.set_continuation(NULL);
-  m_write_vio.set_continuation(NULL);
-  INKVConnAllocator.free(this);
-}
-  } else {
+  // If the VConn isn't deleted, call the handler
+  if (!m_deleted) {
 return m_event_func((TSCont) this, (TSEvent) event, edata);
+  } else {
--- End diff --

@igalic This handler shouldn't be called after the VConn was destroyed. The 
else exists soely to get the assert in there (for debug builds)-- because if we 
hit that `else` path there is a bug-- but we don't want ATS to crash on it if 
possible.


Issue Time Tracking
---

Worklog Id: (was: 30649)
Time Spent: 2h 20m  (was: 2h 10m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks 

[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30648=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30648
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 16:22
Start Date: 14/Oct/16 16:22
Worklog Time Spent: 10m 
  Work Description: Github user jacksontj commented on a diff in the pull 
request:

https://github.com/apache/trafficserver/pull/1108#discussion_r83342298
  
--- Diff: proxy/InkAPI.cc ---
@@ -1053,15 +1053,14 @@ int
 INKVConnInternal::handle_event(int event, void *edata)
 {
   handle_event_count(event);
-  if (m_deleted) {
-if (m_deletable) {
-  this->mutex = NULL;
-  m_read_vio.set_continuation(NULL);
-  m_write_vio.set_continuation(NULL);
-  INKVConnAllocator.free(this);
-}
-  } else {
+  // If the VConn isn't deleted, call the handler
+  if (!m_deleted) {
 return m_event_func((TSCont) this, (TSEvent) event, edata);
+  } else {
+// if the VConn is deleted, and we are in DEBUG mode-- we should assert
+// because this means that the VConn was cleaned up before all the 
callbacks
+// (timeouts, etc.) where canceled.
+ink_assert("event on deleted INKVConnInternal");
--- End diff --

I think it might also be worthwhile to put a log line (warning?) here for 
non-debug builds. Thoughts?


Issue Time Tracking
---

Worklog Id: (was: 30648)
Time Spent: 2h 10m  (was: 2h)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave 

[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30646=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30646
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 16:20
Start Date: 14/Oct/16 16:20
Worklog Time Spent: 10m 
  Work Description: Github user jacksontj commented on the issue:

https://github.com/apache/trafficserver/pull/1109
  
@jpeach TBH I'm not sure, I figured the PR would be good at least to get 
the CI to run on it-- I'm not sure what our official process is now that we do 
github for *some* things


Issue Time Tracking
---

Worklog Id: (was: 30646)
Time Spent: 1h 50m  (was: 1h 40m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30641=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30641
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 15:28
Start Date: 14/Oct/16 15:28
Worklog Time Spent: 10m 
  Work Description: Github user jpeach commented on the issue:

https://github.com/apache/trafficserver/pull/1109
  
Are we using github for backports? I thought the process was to mark the 
JIRA?


Issue Time Tracking
---

Worklog Id: (was: 30641)
Time Spent: 1h 40m  (was: 1.5h)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30632=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30632
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 13:43
Start Date: 14/Oct/16 13:43
Worklog Time Spent: 10m 
  Work Description: Github user igalic commented on a diff in the pull 
request:

https://github.com/apache/trafficserver/pull/1108#discussion_r83423846
  
--- Diff: proxy/InkAPI.cc ---
@@ -1053,15 +1053,14 @@ int
 INKVConnInternal::handle_event(int event, void *edata)
 {
   handle_event_count(event);
-  if (m_deleted) {
-if (m_deletable) {
-  this->mutex = NULL;
-  m_read_vio.set_continuation(NULL);
-  m_write_vio.set_continuation(NULL);
-  INKVConnAllocator.free(this);
-}
-  } else {
+  // If the VConn isn't deleted, call the handler
+  if (!m_deleted) {
 return m_event_func((TSCont) this, (TSEvent) event, edata);
+  } else {
--- End diff --

why is this an `else` if we previously `return`ed?


Issue Time Tracking
---

Worklog Id: (was: 30632)
Time Spent: 1.5h  (was: 1h 20m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30629=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30629
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 00:55
Start Date: 14/Oct/16 00:55
Worklog Time Spent: 10m 
  Work Description: Github user atsci commented on the issue:

https://github.com/apache/trafficserver/pull/1109
  
Linux build *successful*! See 
https://ci.trafficserver.apache.org/job/Github-Linux/901/ for details.
 



Issue Time Tracking
---

Worklog Id: (was: 30629)
Time Spent: 1h 20m  (was: 1h 10m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30628=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30628
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 00:50
Start Date: 14/Oct/16 00:50
Worklog Time Spent: 10m 
  Work Description: Github user atsci commented on the issue:

https://github.com/apache/trafficserver/pull/1109
  
FreeBSD build *successful*! See 
https://ci.trafficserver.apache.org/job/Github-FreeBSD/1009/ for details.
 



Issue Time Tracking
---

Worklog Id: (was: 30628)
Time Spent: 1h 10m  (was: 1h)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30627=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30627
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 00:40
Start Date: 14/Oct/16 00:40
Worklog Time Spent: 10m 
  Work Description: Github user atsci commented on the issue:

https://github.com/apache/trafficserver/pull/1108
  
FreeBSD build *successful*! See 
https://ci.trafficserver.apache.org/job/Github-FreeBSD/1008/ for details.
 



Issue Time Tracking
---

Worklog Id: (was: 30627)
Time Spent: 1h  (was: 50m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30626=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30626
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 00:37
Start Date: 14/Oct/16 00:37
Worklog Time Spent: 10m 
  Work Description: Github user atsci commented on the issue:

https://github.com/apache/trafficserver/pull/1108
  
Linux build *successful*! See 
https://ci.trafficserver.apache.org/job/Github-Linux/900/ for details.
 



Issue Time Tracking
---

Worklog Id: (was: 30626)
Time Spent: 50m  (was: 40m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30625=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30625
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 00:35
Start Date: 14/Oct/16 00:35
Worklog Time Spent: 10m 
  Work Description: Github user atsci commented on the issue:

https://github.com/apache/trafficserver/pull/1109
  
FreeBSD build *successful*! See 
https://ci.trafficserver.apache.org/job/Github-FreeBSD/1007/ for details.
 



Issue Time Tracking
---

Worklog Id: (was: 30625)
Time Spent: 40m  (was: 0.5h)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30623=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30623
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 00:30
Start Date: 14/Oct/16 00:30
Worklog Time Spent: 10m 
  Work Description: Github user atsci commented on the issue:

https://github.com/apache/trafficserver/pull/1109
  
Linux build *failed*! See 
https://ci.trafficserver.apache.org/job/Github-Linux/899/ for details.
 



Issue Time Tracking
---

Worklog Id: (was: 30623)
Time Spent: 0.5h  (was: 20m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30622=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30622
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 00:25
Start Date: 14/Oct/16 00:25
Worklog Time Spent: 10m 
  Work Description: GitHub user jacksontj opened a pull request:

https://github.com/apache/trafficserver/pull/1109

TS-4970: Crash in INKVConnInternal when handle_event is called after 
destroy()



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jacksontj/trafficserver TS-4970_ATS6

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/trafficserver/pull/1109.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1109


commit d9998523404d2f3dcc5061e7636d4dfe5d81882d
Author: Thomas Jackson 
Date:   2016-10-14T00:23:50Z

TS-4970: Crash in INKVConnInternal when handle_event is called after 
destroy()




Issue Time Tracking
---

Worklog Id: (was: 30622)
Time Spent: 20m  (was: 10m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).



--
This message was sent by 

[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30621=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30621
 ]

ASF GitHub Bot logged work on TS-4970:
--

Author: ASF GitHub Bot
Created on: 14/Oct/16 00:24
Start Date: 14/Oct/16 00:24
Worklog Time Spent: 10m 
  Work Description: GitHub user jacksontj opened a pull request:

https://github.com/apache/trafficserver/pull/1108

TS-4970: Crash in INKVConnInternal when handle_event is called after 
destroy()



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jacksontj/trafficserver TS-4970_ATS5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/trafficserver/pull/1108.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1108


commit 1210ad96278ccb29a0614b3234a2ed3743c3f6f6
Author: Thomas Jackson 
Date:   2016-10-14T00:22:42Z

TS-4970: Crash in INKVConnInternal when handle_event is called after 
destroy()




Issue Time Tracking
---

Worklog Id: (was: 30621)
Time Spent: 10m
Remaining Estimate: 0h

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd-- as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we where calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that lead up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d . This lead me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).