[jira] [Comment Edited] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely

2015-02-04 Thread Feifei Cai (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306497#comment-14306497
 ] 

Feifei Cai edited comment on TS-2497 at 2/5/15 2:13 AM:


I have not test redirects yet. Will try it.
Hi [~briang], do you remember the crash case? Is it happened on redirects or 
not?


was (Author: ffcai):
I have not test redirects yet. Will try it.
Hi [~briang]], do you remember the crash case? Is it happened on redirects or 
not?

 Failed post results in tunnel buffers being returned to freelist prematurely
 

 Key: TS-2497
 URL: https://issues.apache.org/jira/browse/TS-2497
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Reporter: Brian Geffon
Assignee: Brian Geffon
 Fix For: 4.2.0

 Attachments: TS-2497.patch, client.js, origin-server.js, repro.js


 When a post fails to an origin server either the server died or the server 
 returned a response without reading all of the post data, in either case, TS 
 will destroy buffers too early. This normally does not result in a crash 
 because the MIOBuffers are returned to the freelist and only with sufficient 
 load will the race happen causing a crash. Additionally, even if a crash 
 doesn't happen you might have data corruption across post requests from the 
 buffers being used after being returned to the freelist.
 Thanks to Thomas Jackson for help reproducing and resolving this bug.
 An example stack trace, while we've seen other crashes in write_avail too.
 #0  0x004eff14 in IOBufferBlock::read_avail (this=0x0) at 
 ../iocore/eventsystem/I_IOBuffer.h:362
 #1  0x0050d151 in MIOBuffer::append_block_internal 
 (this=0x2aab38001130, b=0x2aab0c037200) at 
 ../iocore/eventsystem/P_IOBuffer.h:946
 #2  0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, 
 asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986
 #3  0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:994
 #4  0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:1002
 #5  0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:1048
 #6  0x006c18f3 in read_from_net (nh=0x2aaafca0d208, 
 vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234
 #7  0x006c37bf in UnixNetVConnection::net_read_io 
 (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at 
 UnixNetVConnection.cc:816
 #8  0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, 
 event=5, e=0x271d8e0) at UnixNet.cc:380
 #9  0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, 
 event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146
 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, 
 e=0x271d8e0, calling_code=5) at UnixEThread.cc:142
 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at 
 UnixEThread.cc:264
 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88
 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0
 #14 0x0033728e68ed in clone () from /lib64/libc.so.6



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely

2015-01-23 Thread Brian Geffon (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289632#comment-14289632
 ] 

Brian Geffon edited comment on TS-2497 at 1/23/15 6:18 PM:
---

I'd have to do some reading, I don't really remember much about this.


was (Author: briang):
I'd have to do some reading, I don't really remember to much about this.

 Failed post results in tunnel buffers being returned to freelist prematurely
 

 Key: TS-2497
 URL: https://issues.apache.org/jira/browse/TS-2497
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Reporter: Brian Geffon
Assignee: Brian Geffon
 Fix For: 4.2.0

 Attachments: TS-2497.patch, client.js, origin-server.js, repro.js


 When a post fails to an origin server either the server died or the server 
 returned a response without reading all of the post data, in either case, TS 
 will destroy buffers too early. This normally does not result in a crash 
 because the MIOBuffers are returned to the freelist and only with sufficient 
 load will the race happen causing a crash. Additionally, even if a crash 
 doesn't happen you might have data corruption across post requests from the 
 buffers being used after being returned to the freelist.
 Thanks to Thomas Jackson for help reproducing and resolving this bug.
 An example stack trace, while we've seen other crashes in write_avail too.
 #0  0x004eff14 in IOBufferBlock::read_avail (this=0x0) at 
 ../iocore/eventsystem/I_IOBuffer.h:362
 #1  0x0050d151 in MIOBuffer::append_block_internal 
 (this=0x2aab38001130, b=0x2aab0c037200) at 
 ../iocore/eventsystem/P_IOBuffer.h:946
 #2  0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, 
 asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986
 #3  0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:994
 #4  0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:1002
 #5  0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:1048
 #6  0x006c18f3 in read_from_net (nh=0x2aaafca0d208, 
 vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234
 #7  0x006c37bf in UnixNetVConnection::net_read_io 
 (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at 
 UnixNetVConnection.cc:816
 #8  0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, 
 event=5, e=0x271d8e0) at UnixNet.cc:380
 #9  0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, 
 event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146
 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, 
 e=0x271d8e0, calling_code=5) at UnixEThread.cc:142
 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at 
 UnixEThread.cc:264
 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88
 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0
 #14 0x0033728e68ed in clone () from /lib64/libc.so.6



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely

2015-01-23 Thread Feifei Cai (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289206#comment-14289206
 ] 

Feifei Cai edited comment on TS-2497 at 1/23/15 12:53 PM:
--

Memory leak is noticed in our production hosts. It should be related to 
handling 5xx response from origin sever.

The dump info is as follows, it's from 1 host with ~70% POST requests. I 
enabled memory dump {{proxy.config.dump_mem_info_frequency}} and track 
{{proxy.config.res_track_memory}}.

*traffic.out:*
{noformat}
 allocated  |in-use  | type size  |   free list name
|||--
  0 |  0 |2097152 | 
memory/ioBufAllocator[14]
  0 |  0 |1048576 | 
memory/ioBufAllocator[13]
  0 |  0 | 524288 | 
memory/ioBufAllocator[12]
  0 |  0 | 262144 | 
memory/ioBufAllocator[11]
  0 |  0 | 131072 | 
memory/ioBufAllocator[10]
  0 |  0 |  65536 | memory/ioBufAllocator[9]
 1266679808 | 1262354432 |  32768 | memory/ioBufAllocator[8]
  600309760 |  599703552 |  16384 | memory/ioBufAllocator[7]
  395051008 |  391086080 |   8192 | memory/ioBufAllocator[6]
  229113856 |  224432128 |   4096 | memory/ioBufAllocator[5]
  342622208 |  342503424 |   2048 | memory/ioBufAllocator[4]
  245104640 |  245042176 |   1024 | memory/ioBufAllocator[3]
2228224 |2176512 |512 | memory/ioBufAllocator[2]
 622592 | 607232 |256 | memory/ioBufAllocator[1]
2375680 |2370176 |128 | memory/ioBufAllocator[0]
  Location |  Size In-use
---+
  memory/IOBuffer/ProtocolProbeSessionAccept.cc:39 | 66768896
  memory/IOBuffer/HttpClientSession.cc:230 |0
memory/IOBuffer/HttpSM.cc:3314 |0
memory/IOBuffer/HttpSM.cc:5349 |   3003506816
memory/IOBuffer/HttpSM.cc:5668 |0
memory/IOBuffer/HttpSM.cc:5874 |0
memory/IOBuffer/HttpSM.cc:5976 |0
memory/IOBuffer/HttpSM.cc:6267 |0
   memory/IOBuffer/HttpServerSession.cc:87 |0
  memory/IOBuffer/HttpTunnel.cc:95 |0
 memory/IOBuffer/HttpTunnel.cc:100 |0
 TOTAL |   3070275712
{noformat}


I take a refer to [~shaunmcginnity]'s node.js with some changes, and reproduce 
the memory leak in my local environment.

# 
[origin-server.js|https://issues.apache.org/jira/secure/attachment/12694146/client.js]
This origin server responses a 503 when receives more than one single byte, so 
the post would not complete at most cases. I change [~shaunmcginnity]'s code, 
make origin server responses to ats, which would make ats hits another code 
path.
# 
[client.js|https://issues.apache.org/jira/secure/attachment/12694145/origin-server.js]
We create a new client per second, and each client try to post 32K bytes data.
# ats
*remap.config*: remap all to local port 5000
{quote}map / http://127.0.0.1:5000{quote}
*records.config*: listen on 80
{quote}CONFIG proxy.config.http.server_ports STRING 80{quote}

Then we can get dump info as follows, and in-use number of MIOBuffer with 
index=8 (size=32K) would increase 1 per second.

{noformat}
 allocated  |in-use  | type size  |   free list name
|||--
1048576 |  32768 |  32768 | memory/ioBufAllocator[8]
{noformat}

We can also try change the Content-Length in client.js to a smaller size, and 
MIOBuffer with the corresponding index(0-7) would also increase.


I add this simple patch to prevent the memory leak in the case above, just like 
last commit, and it's verified in 1 test host.

free.diff
{code}
diff --git a/proxy/http/HttpSM.cc b/proxy/http/HttpSM.cc
index 932ef97..123b97a 100644
--- a/proxy/http/HttpSM.cc
+++ b/proxy/http/HttpSM.cc
@@ -5074,6 +5074,7 @@ HttpSM::handle_post_failure()
   t_state.current.server-keep_alive = HTTP_NO_KEEPALIVE;

   if (server_buffer_reader-read_avail()  0) {
+tunnel.deallocate_buffers();
 tunnel.reset();
 // There's data from the server so try to read the header
 

[jira] [Comment Edited] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely

2015-01-23 Thread Sudheer Vinukonda (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289413#comment-14289413
 ] 

Sudheer Vinukonda edited comment on TS-2497 at 1/23/15 3:41 PM:


[~ffcai]: 

Does the below patch fix the leak as well as address the concerns about not 
breaking TS-2497?

{code}
diff --git a/proxy/http/HttpTunnel.cc b/proxy/http/HttpTunnel.cc
index d75b5a1..75df1a5 100644
--- a/proxy/http/HttpTunnel.cc
+++ b/proxy/http/HttpTunnel.cc
@@ -621,6 +621,11 @@ HttpTunnel::add_producer(VConnection * vc,
   if ((p = alloc_producer()) != NULL) {
 p-vc = vc;
 p-nbytes = nbytes_arg;
+if (p-read_buffer) {
+  free_MIOBuffer(p-read_buffer);
+  p-read_buffer = NULL;
+  p-buffer_start = NULL;
+}
 p-buffer_start = reader_start;
 p-read_buffer = reader_start-mbuf;
 p-vc_handler = sm_handler;
{code}



was (Author: sudheerv):
[~ffcai]: 

Does the below patch fix the leak as well as address the concerns about not 
breaking TS-2497?


diff --git a/proxy/http/HttpTunnel.cc b/proxy/http/HttpTunnel.cc
index d75b5a1..75df1a5 100644
--- a/proxy/http/HttpTunnel.cc
+++ b/proxy/http/HttpTunnel.cc
@@ -621,6 +621,11 @@ HttpTunnel::add_producer(VConnection * vc,
   if ((p = alloc_producer()) != NULL) {
 p-vc = vc;
 p-nbytes = nbytes_arg;
+if (p-read_buffer) {
+  free_MIOBuffer(p-read_buffer);
+  p-read_buffer = NULL;
+  p-buffer_start = NULL;
+}
 p-buffer_start = reader_start;
 p-read_buffer = reader_start-mbuf;
 p-vc_handler = sm_handler;


 Failed post results in tunnel buffers being returned to freelist prematurely
 

 Key: TS-2497
 URL: https://issues.apache.org/jira/browse/TS-2497
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Reporter: Brian Geffon
Assignee: Brian Geffon
 Fix For: 4.2.0

 Attachments: TS-2497.patch, client.js, origin-server.js, repro.js


 When a post fails to an origin server either the server died or the server 
 returned a response without reading all of the post data, in either case, TS 
 will destroy buffers too early. This normally does not result in a crash 
 because the MIOBuffers are returned to the freelist and only with sufficient 
 load will the race happen causing a crash. Additionally, even if a crash 
 doesn't happen you might have data corruption across post requests from the 
 buffers being used after being returned to the freelist.
 Thanks to Thomas Jackson for help reproducing and resolving this bug.
 An example stack trace, while we've seen other crashes in write_avail too.
 #0  0x004eff14 in IOBufferBlock::read_avail (this=0x0) at 
 ../iocore/eventsystem/I_IOBuffer.h:362
 #1  0x0050d151 in MIOBuffer::append_block_internal 
 (this=0x2aab38001130, b=0x2aab0c037200) at 
 ../iocore/eventsystem/P_IOBuffer.h:946
 #2  0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, 
 asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986
 #3  0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:994
 #4  0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:1002
 #5  0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:1048
 #6  0x006c18f3 in read_from_net (nh=0x2aaafca0d208, 
 vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234
 #7  0x006c37bf in UnixNetVConnection::net_read_io 
 (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at 
 UnixNetVConnection.cc:816
 #8  0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, 
 event=5, e=0x271d8e0) at UnixNet.cc:380
 #9  0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, 
 event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146
 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, 
 e=0x271d8e0, calling_code=5) at UnixEThread.cc:142
 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at 
 UnixEThread.cc:264
 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88
 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0
 #14 0x0033728e68ed in clone () from /lib64/libc.so.6



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely

2015-01-23 Thread Feifei Cai (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289206#comment-14289206
 ] 

Feifei Cai edited comment on TS-2497 at 1/23/15 12:54 PM:
--

Memory leak is noticed in our production hosts. It should be related to 
handling 5xx response from origin sever.

The dump info is as follows, it's from 1 host with ~70% POST requests. I 
enabled memory dump {{proxy.config.dump_mem_info_frequency}} and track 
{{proxy.config.res_track_memory}}.

*traffic.out:*
{noformat}
 allocated  |in-use  | type size  |   free list name
|||--
  0 |  0 |2097152 | 
memory/ioBufAllocator[14]
  0 |  0 |1048576 | 
memory/ioBufAllocator[13]
  0 |  0 | 524288 | 
memory/ioBufAllocator[12]
  0 |  0 | 262144 | 
memory/ioBufAllocator[11]
  0 |  0 | 131072 | 
memory/ioBufAllocator[10]
  0 |  0 |  65536 | memory/ioBufAllocator[9]
 1266679808 | 1262354432 |  32768 | memory/ioBufAllocator[8]
  600309760 |  599703552 |  16384 | memory/ioBufAllocator[7]
  395051008 |  391086080 |   8192 | memory/ioBufAllocator[6]
  229113856 |  224432128 |   4096 | memory/ioBufAllocator[5]
  342622208 |  342503424 |   2048 | memory/ioBufAllocator[4]
  245104640 |  245042176 |   1024 | memory/ioBufAllocator[3]
2228224 |2176512 |512 | memory/ioBufAllocator[2]
 622592 | 607232 |256 | memory/ioBufAllocator[1]
2375680 |2370176 |128 | memory/ioBufAllocator[0]
  Location |  Size In-use
---+
  memory/IOBuffer/ProtocolProbeSessionAccept.cc:39 | 66768896
  memory/IOBuffer/HttpClientSession.cc:230 |0
memory/IOBuffer/HttpSM.cc:3314 |0
memory/IOBuffer/HttpSM.cc:5349 |   3003506816
memory/IOBuffer/HttpSM.cc:5668 |0
memory/IOBuffer/HttpSM.cc:5874 |0
memory/IOBuffer/HttpSM.cc:5976 |0
memory/IOBuffer/HttpSM.cc:6267 |0
   memory/IOBuffer/HttpServerSession.cc:87 |0
  memory/IOBuffer/HttpTunnel.cc:95 |0
 memory/IOBuffer/HttpTunnel.cc:100 |0
 TOTAL |   3070275712
{noformat}


I take a refer to [~shaunmcginnity]'s node.js with some changes, and reproduce 
the memory leak in my local environment.

# 
[origin-server.js|https://issues.apache.org/jira/secure/attachment/12694145/origin-server.js]
This origin server responses a 503 when receives more than one single byte, so 
the post would not complete at most cases. I change [~shaunmcginnity]'s code, 
make origin server responses to ats, which would make ats hits another code 
path.
# 
[client.js|https://issues.apache.org/jira/secure/attachment/12694146/client.js]
We create a new client per second, and each client try to post 32K bytes data.
# ats
*remap.config*: remap all to local port 5000
{quote}map / http://127.0.0.1:5000{quote}
*records.config*: listen on 80
{quote}CONFIG proxy.config.http.server_ports STRING 80{quote}

Then we can get dump info as follows, and in-use number of MIOBuffer with 
index=8 (size=32K) would increase 1 per second.

{noformat}
 allocated  |in-use  | type size  |   free list name
|||--
1048576 |  32768 |  32768 | memory/ioBufAllocator[8]
{noformat}

We can also try change the Content-Length in client.js to a smaller size, and 
MIOBuffer with the corresponding index(0-7) would also increase.


I add this simple patch to prevent the memory leak in the case above, just like 
last commit, and it's verified in 1 test host.

free.diff
{code}
diff --git a/proxy/http/HttpSM.cc b/proxy/http/HttpSM.cc
index 932ef97..123b97a 100644
--- a/proxy/http/HttpSM.cc
+++ b/proxy/http/HttpSM.cc
@@ -5074,6 +5074,7 @@ HttpSM::handle_post_failure()
   t_state.current.server-keep_alive = HTTP_NO_KEEPALIVE;

   if (server_buffer_reader-read_avail()  0) {
+tunnel.deallocate_buffers();
 tunnel.reset();
 // There's data from the server so try to read the header
 

[jira] [Comment Edited] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely

2015-01-23 Thread Sudheer Vinukonda (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289453#comment-14289453
 ] 

Sudheer Vinukonda edited comment on TS-2497 at 1/23/15 4:17 PM:


I wonder if the issue that was originally fixed by [~briang] is similar to the 
issue resolved in TS-3285 (freeing the MIOBuffer while there's a write/read in 
progress, which could eventually corrupt the buffer on the free list). Also 
refer TS-3286 for some proposed improvements to detect buffer corruptions 
sooner/easier.


was (Author: sudheerv):
I wonder if the issue that was originally fixed by [~briang] is similar to the 
issue resolved in TS-3285 (freeing the MIOBuffer while there's a write/read in 
progress, which could eventually corrupt the buffer on the free list).

 Failed post results in tunnel buffers being returned to freelist prematurely
 

 Key: TS-2497
 URL: https://issues.apache.org/jira/browse/TS-2497
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Reporter: Brian Geffon
Assignee: Brian Geffon
 Fix For: 4.2.0

 Attachments: TS-2497.patch, client.js, origin-server.js, repro.js


 When a post fails to an origin server either the server died or the server 
 returned a response without reading all of the post data, in either case, TS 
 will destroy buffers too early. This normally does not result in a crash 
 because the MIOBuffers are returned to the freelist and only with sufficient 
 load will the race happen causing a crash. Additionally, even if a crash 
 doesn't happen you might have data corruption across post requests from the 
 buffers being used after being returned to the freelist.
 Thanks to Thomas Jackson for help reproducing and resolving this bug.
 An example stack trace, while we've seen other crashes in write_avail too.
 #0  0x004eff14 in IOBufferBlock::read_avail (this=0x0) at 
 ../iocore/eventsystem/I_IOBuffer.h:362
 #1  0x0050d151 in MIOBuffer::append_block_internal 
 (this=0x2aab38001130, b=0x2aab0c037200) at 
 ../iocore/eventsystem/P_IOBuffer.h:946
 #2  0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, 
 asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986
 #3  0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:994
 #4  0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:1002
 #5  0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:1048
 #6  0x006c18f3 in read_from_net (nh=0x2aaafca0d208, 
 vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234
 #7  0x006c37bf in UnixNetVConnection::net_read_io 
 (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at 
 UnixNetVConnection.cc:816
 #8  0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, 
 event=5, e=0x271d8e0) at UnixNet.cc:380
 #9  0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, 
 event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146
 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, 
 e=0x271d8e0, calling_code=5) at UnixEThread.cc:142
 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at 
 UnixEThread.cc:264
 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88
 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0
 #14 0x0033728e68ed in clone () from /lib64/libc.so.6



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely

2014-05-05 Thread Brian Geffon (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989925#comment-13989925
 ] 

Brian Geffon edited comment on TS-2497 at 5/5/14 8:17 PM:
--

I also attached a small node js script to reproduce this bug.


was (Author: briang):
A node js script to reproduce this bug.

 Failed post results in tunnel buffers being returned to freelist prematurely
 

 Key: TS-2497
 URL: https://issues.apache.org/jira/browse/TS-2497
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Reporter: Brian Geffon
Assignee: Brian Geffon
 Fix For: 4.2.0

 Attachments: TS-2497.patch, repro.js


 When a post fails to an origin server either the server died or the server 
 returned a response without reading all of the post data, in either case, TS 
 will destroy buffers too early. This normally does not result in a crash 
 because the MIOBuffers are returned to the freelist and only with sufficient 
 load will the race happen causing a crash. Additionally, even if a crash 
 doesn't happen you might have data corruption across post requests from the 
 buffers being used after being returned to the freelist.
 Thanks to Thomas Jackson for help reproducing and resolving this bug.
 An example stack trace, while we've seen other crashes in write_avail too.
 #0  0x004eff14 in IOBufferBlock::read_avail (this=0x0) at 
 ../iocore/eventsystem/I_IOBuffer.h:362
 #1  0x0050d151 in MIOBuffer::append_block_internal 
 (this=0x2aab38001130, b=0x2aab0c037200) at 
 ../iocore/eventsystem/P_IOBuffer.h:946
 #2  0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, 
 asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986
 #3  0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:994
 #4  0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:1002
 #5  0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:1048
 #6  0x006c18f3 in read_from_net (nh=0x2aaafca0d208, 
 vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234
 #7  0x006c37bf in UnixNetVConnection::net_read_io 
 (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at 
 UnixNetVConnection.cc:816
 #8  0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, 
 event=5, e=0x271d8e0) at UnixNet.cc:380
 #9  0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, 
 event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146
 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, 
 e=0x271d8e0, calling_code=5) at UnixEThread.cc:142
 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at 
 UnixEThread.cc:264
 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88
 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0
 #14 0x0033728e68ed in clone () from /lib64/libc.so.6



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely

2014-04-29 Thread Brian Geffon (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984585#comment-13984585
 ] 

Brian Geffon edited comment on TS-2497 at 4/29/14 5:58 PM:
---

[~shaunmcginnity], something is a little unclear to me, the only difference 
between the two is:

Apr 29 11:05:29 - INFO - free_MIOBuffer 0x5001620
Apr 29 11:05:29 - INFO - free_MIOBuffer 0x5001620 DONE

Which happens after add_producer, can you provide a back trace or something for 
those two frees? It's not clear where they are coming from, also, are you sure 
this behavior didn't exist before? Can you also provide output from -T .* 
debug tag? 


was (Author: briang):
[~shaunmcginnity], something is a little unclear to me, the only difference 
between the two is:

Apr 29 11:05:29 - INFO - free_MIOBuffer 0x5001620
Apr 29 11:05:29 - INFO - free_MIOBuffer 0x5001620 DONE

Which happens after add_producer, can you provide a back trace or something for 
those two frees? It's not clear where they are coming from, also, are you sure 
this behavior didn't exist before?

 Failed post results in tunnel buffers being returned to freelist prematurely
 

 Key: TS-2497
 URL: https://issues.apache.org/jira/browse/TS-2497
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Reporter: Brian Geffon
Assignee: Brian Geffon
 Fix For: 4.2.0


 When a post fails to an origin server either the server died or the server 
 returned a response without reading all of the post data, in either case, TS 
 will destroy buffers too early. This normally does not result in a crash 
 because the MIOBuffers are returned to the freelist and only with sufficient 
 load will the race happen causing a crash. Additionally, even if a crash 
 doesn't happen you might have data corruption across post requests from the 
 buffers being used after being returned to the freelist.
 Thanks to Thomas Jackson for help reproducing and resolving this bug.
 An example stack trace, while we've seen other crashes in write_avail too.
 #0  0x004eff14 in IOBufferBlock::read_avail (this=0x0) at 
 ../iocore/eventsystem/I_IOBuffer.h:362
 #1  0x0050d151 in MIOBuffer::append_block_internal 
 (this=0x2aab38001130, b=0x2aab0c037200) at 
 ../iocore/eventsystem/P_IOBuffer.h:946
 #2  0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, 
 asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986
 #3  0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:994
 #4  0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:1002
 #5  0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at 
 ../iocore/eventsystem/P_IOBuffer.h:1048
 #6  0x006c18f3 in read_from_net (nh=0x2aaafca0d208, 
 vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234
 #7  0x006c37bf in UnixNetVConnection::net_read_io 
 (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at 
 UnixNetVConnection.cc:816
 #8  0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, 
 event=5, e=0x271d8e0) at UnixNet.cc:380
 #9  0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, 
 event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146
 #10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, 
 e=0x271d8e0, calling_code=5) at UnixEThread.cc:142
 #11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at 
 UnixEThread.cc:264
 #12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88
 #13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0
 #14 0x0033728e68ed in clone () from /lib64/libc.so.6



--
This message was sent by Atlassian JIRA
(v6.2#6252)