[jira] [Updated] (TS-3285) Seg fault when 100 CONT handling is enabled
[ https://issues.apache.org/jira/browse/TS-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sudheer Vinukonda updated TS-3285: -- Description: With 100 CONT handling enabled in our ats5 production hosts, we are seeing the below seg fault.
{code}
(gdb) bt
#0  0x00316e432925 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00316e434105 in abort () at abort.c:92
#2  0x2b6869944458 in ink_die_die_die (retval=1) at ink_error.cc:43
#3  0x2b6869944525 in ink_fatal_va(int, const char *, typedef __va_list_tag __va_list_tag *) (return_code=1, message_format=0x2b68699518d8 "%s:%d: failed assert `%s`", ap=0x2b686bb1bf00) at ink_error.cc:65
#4  0x2b68699445ee in ink_fatal (return_code=1, message_format=0x2b68699518d8 "%s:%d: failed assert `%s`") at ink_error.cc:73
#5  0x2b6869943160 in _ink_assert (expression=0x7a984e "buf_index_inout == NULL", file=0x7a96e3 "MIME.cc", line=2676) at ink_assert.cc:37
#6  0x0068212d in mime_mem_print (src_d=0x2b686bb1c090 "HTTP/1.1", src_l=8, buf_start=0x0, buf_length=-1811908575, buf_index_inout=0x2b686bb1c1bc, buf_chars_to_skip_inout=0x2b686bb1c1b8) at MIME.cc:2676
#7  0x00671df3 in http_version_print (version=65537, buf=0x0, bufsize=-1811908575, bufindex=0x2b686bb1c1bc, dumpoffset=0x2b686bb1c1b8) at HTTP.cc:415
#8  0x006724fb in http_hdr_print (heap=0x2b6881019010, hdr=0x2b6881019098, buf=0x0, bufsize=-1811908575, bufindex=0x2b686bb1c1bc, dumpoffset=0x2b686bb1c1b8) at HTTP.cc:539
#9  0x004f259b in HTTPHdr::print (this=0x2b68ac06f058, buf=0x0, bufsize=-1811908575, bufindex=0x2b686bb1c1bc, dumpoffset=0x2b686bb1c1b8) at ./hdrs/HTTP.h:897
#10 0x005da903 in HttpSM::write_header_into_buffer (this=0x2b68ac06e910, h=0x2b68ac06f058, b=0x2f163e0) at HttpSM.cc:5554
#11 0x005e5129 in HttpSM::write_response_header_into_buffer (this=0x2b68ac06e910, h=0x2b68ac06f058, b=0x2f163e0) at HttpSM.h:594
#12 0x005dcef2 in HttpSM::setup_server_transfer (this=0x2b68ac06e910) at HttpSM.cc:6295
#13 0x005cd336 in HttpSM::handle_api_return (this=0x2b68ac06e910) at HttpSM.cc:1554
#14 0x005cd040 in HttpSM::state_api_callout (this=0x2b68ac06e910, event=0, data=0x0) at HttpSM.cc:1446
#15 0x005d89b7 in HttpSM::do_api_callout_internal (this=0x2b68ac06e910) at HttpSM.cc:4858
#16 0x005dfdec in HttpSM::set_next_state (this=0x2b68ac06e910) at HttpSM.cc:7115
#17 0x005df0ec in HttpSM::call_transact_and_set_next_state (this=0x2b68ac06e910, f=0) at HttpSM.cc:6900
#18 0x005cd1e3 in HttpSM::handle_api_return (this=0x2b68ac06e910) at HttpSM.cc:1514
#19 0x005cd040 in HttpSM::state_api_callout (this=0x2b68ac06e910, event=6, data=0x0) at HttpSM.cc:1446
#20 0x005cc7d6 in HttpSM::state_api_callback (this=0x2b68ac06e910, event=6, data=0x0) at HttpSM.cc:1264
#21 0x00515bb5 in TSHttpTxnReenable (txnp=0x2b68ac06e910, event=TS_EVENT_HTTP_CONTINUE) at InkAPI.cc:5554
#22 0x2b68806f945b in transform_plugin (event=TS_EVENT_HTTP_READ_RESPONSE_HDR, edata=0x2b68ac06e910) at gzip.cc:693
#23 0x0050a40c in INKContInternal::handle_event (this=0x2ea2bb0, event=60006, edata=0x2b68ac06e910) at InkAPI.cc:1000
#24 0x004f597e in Continuation::handleEvent (this=0x2ea2bb0, event=60006, data=0x2b68ac06e910) at ../iocore/eventsystem/I_Continuation.h:146
#25 0x0050ac53 in APIHook::invoke (this=0x2ea3c80, event=60006, edata=0x2b68ac06e910) at InkAPI.cc:1219
#26 0x005ccda9 in HttpSM::state_api_callout (this=0x2b68ac06e910, event=0, data=0x0) at HttpSM.cc:1371
#27 0x005d89b7 in HttpSM::do_api_callout_internal (this=0x2b68ac06e910) at HttpSM.cc:4858
#28 0x005e54fc in HttpSM::do_api_callout (this=0x2b68ac06e910) at HttpSM.cc:448
#29 0x005ce277 in HttpSM::state_read_server_response_header (this=0x2b68ac06e910, event=100, data=0x2b68a802afc0) at HttpSM.cc:1861
#30 0x005d0582 in HttpSM::main_handler (this=0x2b68ac06e910, event=100, data=0x2b68a802afc0) at HttpSM.cc:2507
#31 0x004f597e in Continuation::handleEvent (this=0x2b68ac06e910, event=100, data=0x2b68a802afc0) at ../iocore/eventsystem/I_Continuation.h:146
#32 0x00531d7d in PluginVC::process_read_side (this=0x2b68a802aec0, other_side_call=true) at PluginVC.cc:671
#33 0x00531612 in PluginVC::process_write_side (this=0x2b68a802b0a8, other_side_call=false) at PluginVC.cc:567
#34 0x005303b4 in PluginVC::main_handler (this=0x2b68a802b0a8, event=1, data=0x2b68a80644f0) at PluginVC.cc:212
(gdb) f 12
#12 0x005dcef2 in HttpSM::setup_server_transfer (this=0x2b68ac06e910) at HttpSM.cc:6295
6295    HttpSM.cc: No such file or directory.
        in HttpSM.cc
(gdb) info local
__func__ = "setup_server_transfer"
hdr_size = 7902907
buf = 0x2f163e0
action = TCA_PASSTHRU_DECHUNKED_CONTENT
alloc_index = 6
nbytes = 47727483405024
{code}
[jira] [Comment Edited] (TS-3285) Seg fault when 100 CONT handling is enabled
[ https://issues.apache.org/jira/browse/TS-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271543#comment-14271543 ] Sudheer Vinukonda edited comment on TS-3285 at 1/23/15 4:59 PM: Further debugging showed that the 100 CONT implementation calls {{do_io_write()}} with a reader on the MIOBuffer {{ua_entry->write_buffer}}. This buffer appears to be freed on {{VC_EVENT_READ_COMPLETE}}/{{HTTP_TUNNEL_EVENT_PRECOMPLETE}} for the POST data being read from the client, rather than on the WRITE_COMPLETE event for the 100 CONT {{do_io_write()}} operation. This could free the buffer prematurely, while the write is still incomplete. Note that {{do_io_write()}}/{{write_to_net()}}, which still holds a reference to the {{_writer}} of the 100 CONT MIOBuffer, may internally allocate a new IOBuffer block via {{write_avail()}}. If that code runs after the 100 CONT buffer has been freed in the READ_COMPLETE handler above, it could access the MIOBuffer after it has been freed (and put back on the free list).
{code}
/home/y/bin/traffic_server(new_IOBufferData_internal(char const*, long, AllocType)+0x51)[0x4f5bef]
/home/y/bin/traffic_server(IOBufferBlock::alloc(long)+0x2c)[0x4f5eec]
/home/y/bin/traffic_server(MIOBuffer::append_block(long)+0x3a)[0x51dee2]
/home/y/bin/traffic_server(MIOBuffer::add_block()+0x22)[0x51df1a]
/home/y/bin/traffic_server(MIOBuffer::check_add_block()+0x4b)[0x6cbd0b]
/home/y/bin/traffic_server(MIOBuffer::write_avail()+0x18)[0x6cbd86]
/home/y/bin/traffic_server(write_to_net_io(NetHandler*, UnixNetVConnection*, EThread*)+0x397)[0x736175]
/home/y/bin/traffic_server(write_to_net(NetHandler*, UnixNetVConnection*, EThread*)+0x80)[0x735dd7]
/home/y/bin/traffic_server(NetHandler::mainNetEvent(int, Event*)+0x654)[0x72f5e2]
/home/y/bin/traffic_server(Continuation::handleEvent(int, void*)+0x6c)[0x4f5ad8]
/home/y/bin/traffic_server(EThread::process_event(Event*, int)+0xc8)[0x756046]
/home/y/bin/traffic_server(EThread::execute()+0x3dc)[0x756550]
/home/y/bin/traffic_server[0x7555c4]
/lib64/libpthread.so.0(+0x3036c079d1)[0x2aeb5b7e89d1]
/lib64/libc.so.6(clone+0x6d)[0x30364e8b6d]
{code}
was (Author: sudheerv): Further debugging showed that 100 cont implementation calls do_io_write() with a reader on MIOBuffer (ua_entry->write_buffer). The point where this buffer (ua_entry->write_buffer) is freed seems to be VC_EVENT_READ_COMPLETE/HTTP_TUNNEL_EVENT_PRECOMPLETE for the POST data being read from the client, instead of a WRITE_COMPLETE event for the 100 cont's do_io_write() operation. This could result in premature free'ing of the buffer, while the WRITE is not complete yet. Note how do_io_write/write_to_net (that still has a reference to the _writer of the 100 cont's MIOBuffer), internally may end up allocating iobuf via write_avail(). This piece of code (which could get executed after the 100 cont buffer is free'd in read_complete event above) could result in accessing the MIOBuffer after it's freed (and is on the free list).
{code}
/home/y/bin/traffic_server(new_IOBufferData_internal(char const*, long, AllocType)+0x51)[0x4f5bef]
/home/y/bin/traffic_server(IOBufferBlock::alloc(long)+0x2c)[0x4f5eec]
/home/y/bin/traffic_server(MIOBuffer::append_block(long)+0x3a)[0x51dee2]
/home/y/bin/traffic_server(MIOBuffer::add_block()+0x22)[0x51df1a]
/home/y/bin/traffic_server(MIOBuffer::check_add_block()+0x4b)[0x6cbd0b]
/home/y/bin/traffic_server(MIOBuffer::write_avail()+0x18)[0x6cbd86]
/home/y/bin/traffic_server(write_to_net_io(NetHandler*, UnixNetVConnection*, EThread*)+0x397)[0x736175]
/home/y/bin/traffic_server(write_to_net(NetHandler*, UnixNetVConnection*, EThread*)+0x80)[0x735dd7]
/home/y/bin/traffic_server(NetHandler::mainNetEvent(int, Event*)+0x654)[0x72f5e2]
/home/y/bin/traffic_server(Continuation::handleEvent(int, void*)+0x6c)[0x4f5ad8]
/home/y/bin/traffic_server(EThread::process_event(Event*, int)+0xc8)[0x756046]
/home/y/bin/traffic_server(EThread::execute()+0x3dc)[0x756550]
/home/y/bin/traffic_server[0x7555c4]
/lib64/libpthread.so.0(+0x3036c079d1)[0x2aeb5b7e89d1]
/lib64/libc.so.6(clone+0x6d)[0x30364e8b6d]
{code}
Seg fault when 100 CONT handling is enabled --- Key: TS-3285 URL: https://issues.apache.org/jira/browse/TS-3285 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 5.0.1 Reporter: Sudheer Vinukonda Assignee: Sudheer Vinukonda Fix For: 5.3.0 With 100 CONT handling enabled in our ats5 production hosts, we are seeing the below seg fault.
{code}
(gdb) bt
#0  0x00316e432925 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00316e434105 in abort () at abort.c:92
#2  0x2b6869944458 in ink_die_die_die (retval=1) at ink_error.cc:43
#3
{code}
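The event ordering in the TS-3285 comment can be replayed deterministically with a small C++ model. Everything in it is hypothetical: WriteBuffer, net_write_step() and replay() are illustrative stand-ins, not ATS classes, and a flag stands in for the real free so the example stays well-defined. The model shows why releasing the 100 CONT buffer on READ_COMPLETE is unsafe and releasing it on WRITE_COMPLETE is not. It does not reproduce the actual TS-3285 fix.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the 100 CONT response buffer (ua_entry->write_buffer
// in the comment). NOT the real MIOBuffer API: "freeing" is modeled with a flag.
struct WriteBuffer {
  std::string pending;       // bytes still queued for the client
  bool on_freelist = false;  // true once the buffer has been "freed"
};

// Hypothetical write step, standing in for write_to_net() on a later net event.
// Returns false if it touched a buffer that was already back on the freelist.
bool net_write_step(WriteBuffer &buf) {
  if (buf.on_freelist) {
    return false;  // the use-after-free described in the comment
  }
  buf.pending.clear();  // pretend the whole interim response was written
  return true;
}

enum class FreePolicy { OnReadComplete, OnWriteComplete };

// Replays one problematic ordering: the POST body finishes reading before the
// net handler has flushed the 100 Continue response. Returns true if the write
// side only ever saw a live buffer.
bool replay(FreePolicy policy) {
  WriteBuffer buf;
  buf.pending = "HTTP/1.1 100 Continue\r\n\r\n";

  // Event 1: VC_EVENT_READ_COMPLETE for the POST body.
  if (policy == FreePolicy::OnReadComplete) {
    buf.on_freelist = true;  // premature: the write is still in flight
  }

  // Event 2: the net handler runs the pending write.
  const bool write_ok = net_write_step(buf);

  // Event 3: VC_EVENT_WRITE_COMPLETE; freeing here cannot race the writer.
  if (policy == FreePolicy::OnWriteComplete) {
    assert(buf.pending.empty());
    buf.on_freelist = true;
  }
  return write_ok;
}
```

`replay(FreePolicy::OnReadComplete)` returns false because the write step reaches a freed buffer. `replay(FreePolicy::OnWriteComplete)` returns true. In the real server the first case does not return an error: it reads freed memory, as in the write_avail() trace above.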
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289600#comment-14289600 ] Susan Hinrichs commented on TS-2497: Looking at it some more, I wonder if the solution is simply to eliminate the reset, at least in the branch without the deallocate_buffer. Another producer will be added to carry the traffic from the server back to the client. However, because of the changes in TS-3190, it is safer to leave the original producer/consumer in place, since the new producer will be started explicitly. In many cases the original logic would start all producers at once. I'm having trouble even constructing a case that calls handle_post_failure; I will come back to this later. Failed post results in tunnel buffers being returned to freelist prematurely Key: TS-2497 URL: https://issues.apache.org/jira/browse/TS-2497 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 4.2.0 Attachments: TS-2497.patch, client.js, origin-server.js, repro.js When a POST to an origin server fails, either because the server died or because it returned a response without reading all of the POST data, TS destroys the buffers too early. This does not normally cause a crash, because the MIOBuffers are returned to the freelist; the race only happens, and crashes, under sufficient load. Even without a crash, data may be corrupted across POST requests because buffers are used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug. An example stack trace follows; we have also seen other crashes in write_avail.
#0  0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362
#1  0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946
#2  0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986
#3  0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994
#4  0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002
#5  0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048
#6  0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234
#7  0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816
#8  0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380
#9  0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146
#10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142
#11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264
#12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88
#13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0
#14 0x0033728e68ed in clone () from /lib64/libc.so.6
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
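The TS-2497 description warns of data corruption across POST requests when a buffer goes back on the freelist while something still writes into it. A toy LIFO freelist makes the mechanism concrete. FreeList, Block and request2_body() are hypothetical and much simpler than the ATS allocators. The only property the sketch relies on is that a recently released block is handed straight back out to the next request.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical buffer block and LIFO freelist (not the ATS allocator API).
struct Block {
  std::string data;
};

class FreeList {
public:
  FreeList() = default;
  FreeList(const FreeList &) = delete;
  FreeList &operator=(const FreeList &) = delete;
  ~FreeList() {
    for (Block *b : free_) {
      delete b;
    }
  }
  Block *alloc() {
    if (free_.empty()) {
      return new Block();
    }
    Block *b = free_.back();  // most recently released block is reused first
    free_.pop_back();
    b->data.clear();
    return b;
  }
  void release(Block *b) {
    assert(b != nullptr);
    free_.push_back(b);
  }

private:
  std::vector<Block *> free_;
};

// Returns what the second POST finds in its buffer. With release_early == true,
// request 1's block is returned while a connection still points at it.
std::string request2_body(bool release_early) {
  FreeList fl;
  Block *req1 = fl.alloc();
  Block *stale_view = req1;  // the network side still references request 1's buffer
  if (release_early) {
    fl.release(req1);  // premature return, as in the failed-POST path
  }
  Block *req2 = fl.alloc();  // same block as req1 if it was released
  req2->data = "request-2 body";
  stale_view->data += " + late request-1 bytes";  // late read into "its" buffer
  std::string seen = req2->data;

  // Hand every outstanding block back exactly once so ~FreeList deletes it.
  fl.release(req2);
  if (!release_early) {
    fl.release(req1);
  }
  return seen;
}
```

With an early release, request 2's body silently picks up request 1's late bytes. Nothing crashes, which fits the description's point that crashes only surface under load.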
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289632#comment-14289632 ] Brian Geffon commented on TS-2497: -- I'd have to do some reading, I don't really remember to much about this.
[jira] [Comment Edited] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289632#comment-14289632 ] Brian Geffon edited comment on TS-2497 at 1/23/15 6:18 PM: --- I'd have to do some reading, I don't really remember much about this. was (Author: briang): I'd have to do some reading, I don't really remember to much about this.
[jira] [Commented] (TS-3294) 5.3.0 Coverity Fixes
[ https://issues.apache.org/jira/browse/TS-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289488#comment-14289488 ] ASF subversion and git services commented on TS-3294: - Commit a2bc1245859c2add6b0468586566c83f630889a5 in trafficserver's branch refs/heads/master from [~sudheerv] [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=a2bc124 ] [TS-3294] Add null pointer check Coverity CID:1021867 5.3.0 Coverity Fixes Key: TS-3294 URL: https://issues.apache.org/jira/browse/TS-3294 Project: Traffic Server Issue Type: Improvement Components: Cleanup, Quality Reporter: Sudheer Vinukonda Assignee: Sudheer Vinukonda Fix For: 5.3.0 Tracker Jira for 5.3.0 Coverity Fixes (Sudheer Vinukonda)
[jira] [Commented] (TS-3294) 5.3.0 Coverity Fixes
[ https://issues.apache.org/jira/browse/TS-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289487#comment-14289487 ] ASF subversion and git services commented on TS-3294: - Commit b62ea0c9c80f8dad41a62253bc1fd7b1d97b22e6 in trafficserver's branch refs/heads/master from [~sudheerv] [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=b62ea0c ] [TS-3294] Add null pointer check Coverity CID:1021868
[jira] [Commented] (TS-3294) 5.3.0 Coverity Fixes
[ https://issues.apache.org/jira/browse/TS-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289489#comment-14289489 ] ASF subversion and git services commented on TS-3294: - Commit 1d19318b0d59436d007ccad85dfcbbfa1a722807 in trafficserver's branch refs/heads/master from [~sudheerv] [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=1d19318 ] [TS-3294] Add null pointer check Coverity CID:1021866
Jenkins build is back to normal : tsqa-master #41
See https://ci.trafficserver.apache.org/job/tsqa-master/41/
[jira] [Commented] (TS-3287) Coverity fixes for v5.3.0 by zwoop
[ https://issues.apache.org/jira/browse/TS-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289759#comment-14289759 ] ASF subversion and git services commented on TS-3287: - Commit fc559c126d302f6928046ec764854cadb540eedf in trafficserver's branch refs/heads/master from [~zwoop] [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=fc559c1 ] TS-3287 Eliminate some dead code around random() Coverity CID #1261573 Coverity fixes for v5.3.0 by zwoop -- Key: TS-3287 URL: https://issues.apache.org/jira/browse/TS-3287 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Leif Hedstrom Assignee: Leif Hedstrom Fix For: 5.3.0 This is my JIRA for Coverity commits for v5.3.0.
[jira] [Commented] (TS-3287) Coverity fixes for v5.3.0 by zwoop
[ https://issues.apache.org/jira/browse/TS-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289761#comment-14289761 ] ASF subversion and git services commented on TS-3287: - Commit 1ccb1ea4c06ad6f563351320607de62e76c860b6 in trafficserver's branch refs/heads/master from [~zwoop] [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=1ccb1ea ] TS-3287 Ignore the warning on random Coverity CID #1261572
[jira] [Commented] (TS-3287) Coverity fixes for v5.3.0 by zwoop
[ https://issues.apache.org/jira/browse/TS-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289757#comment-14289757 ] ASF subversion and git services commented on TS-3287: - Commit f7f3055a22f175d9158f8a5ae473482519fcce43 in trafficserver's branch refs/heads/master from [~zwoop] [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=f7f3055 ] TS-3287 Ignore this coverity error Coverity CID #1261575
[jira] [Commented] (TS-3318) Remove mgmt/web2/WebHttpSession.{cc,h}
[ https://issues.apache.org/jira/browse/TS-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289760#comment-14289760 ] ASF subversion and git services commented on TS-3318: - Commit 7ec9f0cc37a530a4fcfc1b0d439e153f497c6ae1 in trafficserver's branch refs/heads/master from [~zwoop] [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=7ec9f0c ] Added TS-3318 to CHANGES Remove mgmt/web2/WebHttpSession.{cc,h} -- Key: TS-3318 URL: https://issues.apache.org/jira/browse/TS-3318 Project: Traffic Server Issue Type: Improvement Reporter: Leif Hedstrom Assignee: Leif Hedstrom Fix For: 5.3.0 It is unused, and causes some Coverity errors.
[jira] [Commented] (TS-3318) Remove mgmt/web2/WebHttpSession.{cc,h}
[ https://issues.apache.org/jira/browse/TS-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289758#comment-14289758 ] ASF subversion and git services commented on TS-3318: - Commit cb7fc8f9efd67c7286616bcc94bf607035e28693 in trafficserver's branch refs/heads/master from [~zwoop] [ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=cb7fc8f ] TS-3318 Remove mgmt/Web2/WebHttpSession.{cc,h} This also helps fixing Coverity CID #1261573
[jira] [Created] (TS-3318) Remove mgmt/web2/WebHttpSession.{cc,h}
Leif Hedstrom created TS-3318: - Summary: Remove mgmt/web2/WebHttpSession.{cc,h} Key: TS-3318 URL: https://issues.apache.org/jira/browse/TS-3318 Project: Traffic Server Issue Type: Improvement Reporter: Leif Hedstrom It is unused, and causes some Coverity errors.
[jira] [Assigned] (TS-3318) Remove mgmt/web2/WebHttpSession.{cc,h}
[ https://issues.apache.org/jira/browse/TS-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leif Hedstrom reassigned TS-3318: - Assignee: Leif Hedstrom
[jira] [Resolved] (TS-3318) Remove mgmt/web2/WebHttpSession.{cc,h}
[ https://issues.apache.org/jira/browse/TS-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leif Hedstrom resolved TS-3318. --- Resolution: Fixed
[jira] [Updated] (TS-3318) Remove mgmt/web2/WebHttpSession.{cc,h}
[ https://issues.apache.org/jira/browse/TS-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leif Hedstrom updated TS-3318: -- Fix Version/s: 5.3.0
[jira] [Commented] (TS-3315) Assert after try lock
[ https://issues.apache.org/jira/browse/TS-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289937#comment-14289937 ] Alan M. Carroll commented on TS-3315: - That's not really the issue. If the code must have the lock, it should use {{MUTEX_LOCK}}, not {{MUTEX_TRY_LOCK}}. If it uses the latter, it should check for and handle the unlocked case rather than assert on failure. If we really need a check, why not inspect the lock directly to see whether (1) it is locked and (2) the locking thread is this thread, rather than trying to lock it? E.g.
{code}
ink_assert(NULL == cont->mutex || cont->mutex->thread_holding == this_ethread());
{code}
Assert after try lock - Key: TS-3315 URL: https://issues.apache.org/jira/browse/TS-3315 Project: Traffic Server Issue Type: Bug Components: Cache Reporter: Phil Sorber In iocore/cache/Cache.cc there is the following:
{code}
CACHE_TRY_LOCK(lock, cont->mutex, this_ethread());
ink_assert(lock.is_locked());
{code}
Does it really make sense to try the lock and then assert on it, when the try can fail?
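Here is a minimal sketch of the two patterns contrasted in TS-3315. It uses a hypothetical OwnedMutex that records its holder, loosely mirroring the `thread_holding` field in the suggested ink_assert(). It is not ATS's ProxyMutex, and std::mutex stands in for the real lock:

- Code that requires the lock asserts ownership instead of try-locking.
- Code that try-locks treats failure as a normal outcome.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical mutex that remembers which thread holds it (not ATS ProxyMutex).
class OwnedMutex {
public:
  void lock() {
    m_.lock();
    holder_.store(std::this_thread::get_id());
  }
  bool try_lock() {
    if (!m_.try_lock()) {
      return false;
    }
    holder_.store(std::this_thread::get_id());
    return true;
  }
  void unlock() {
    holder_.store(std::thread::id());  // default id means "no thread"
    m_.unlock();
  }
  bool held_by_current_thread() const {
    return holder_.load() == std::this_thread::get_id();
  }

private:
  std::mutex m_;
  std::atomic<std::thread::id> holder_{std::thread::id()};
};

// Pattern 1: the caller must already hold the lock, so verify ownership
// directly instead of try-locking and asserting on the result.
void requires_lock_held(const OwnedMutex *mutex) {
  assert(mutex == nullptr || mutex->held_by_current_thread());
  // ... work that relies on the lock being held ...
}

// Pattern 2: opportunistic locking. A failed try-lock is reported to the
// caller (which might reschedule the continuation), never asserted away.
bool try_do_work(OwnedMutex &mutex, int &counter) {
  if (!mutex.try_lock()) {
    return false;
  }
  ++counter;
  mutex.unlock();
  return true;
}
```

The ownership check matches the intent of the original assert without changing any lock state. The try-lock path gives the caller a chance to back off.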
[jira] [Comment Edited] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289206#comment-14289206 ] Feifei Cai edited comment on TS-2497 at 1/23/15 12:53 PM: -- We have noticed a memory leak on our production hosts. It appears to be related to handling 5xx responses from the origin server. The dump below is from one host serving ~70% POST requests, with memory dumps ({{proxy.config.dump_mem_info_frequency}}) and memory tracking ({{proxy.config.res_track_memory}}) enabled. *traffic.out:*
{noformat}
 allocated |     in-use | type size | free list name
-----------|------------|-----------|---------------------------
         0 |          0 |   2097152 | memory/ioBufAllocator[14]
         0 |          0 |   1048576 | memory/ioBufAllocator[13]
         0 |          0 |    524288 | memory/ioBufAllocator[12]
         0 |          0 |    262144 | memory/ioBufAllocator[11]
         0 |          0 |    131072 | memory/ioBufAllocator[10]
         0 |          0 |     65536 | memory/ioBufAllocator[9]
1266679808 | 1262354432 |     32768 | memory/ioBufAllocator[8]
 600309760 |  599703552 |     16384 | memory/ioBufAllocator[7]
 395051008 |  391086080 |      8192 | memory/ioBufAllocator[6]
 229113856 |  224432128 |      4096 | memory/ioBufAllocator[5]
 342622208 |  342503424 |      2048 | memory/ioBufAllocator[4]
 245104640 |  245042176 |      1024 | memory/ioBufAllocator[3]
   2228224 |    2176512 |       512 | memory/ioBufAllocator[2]
    622592 |     607232 |       256 | memory/ioBufAllocator[1]
   2375680 |    2370176 |       128 | memory/ioBufAllocator[0]

Location                                         | Size In-use
-------------------------------------------------+------------
memory/IOBuffer/ProtocolProbeSessionAccept.cc:39 |    66768896
memory/IOBuffer/HttpClientSession.cc:230         |           0
memory/IOBuffer/HttpSM.cc:3314                   |           0
memory/IOBuffer/HttpSM.cc:5349                   |  3003506816
memory/IOBuffer/HttpSM.cc:5668                   |           0
memory/IOBuffer/HttpSM.cc:5874                   |           0
memory/IOBuffer/HttpSM.cc:5976                   |           0
memory/IOBuffer/HttpSM.cc:6267                   |           0
memory/IOBuffer/HttpServerSession.cc:87          |           0
memory/IOBuffer/HttpTunnel.cc:95                 |           0
memory/IOBuffer/HttpTunnel.cc:100                |           0
TOTAL                                            |  3070275712
{noformat}
I adapted [~shaunmcginnity]'s node.js scripts, with some changes, and reproduced the memory leak in my local environment:
# [origin-server.js|https://issues.apache.org/jira/secure/attachment/12694145/origin-server.js] This origin server responds with a 503 as soon as it receives more than a single byte, so in most cases the POST does not complete. I changed [~shaunmcginnity]'s code so that the origin server sends its response back to ATS, which makes ATS take a different code path.
# [client.js|https://issues.apache.org/jira/secure/attachment/12694146/client.js] A new client is created every second, and each client tries to POST 32K bytes of data.
# ATS *remap.config*: remap everything to local port 5000 {quote}map / http://127.0.0.1:5000{quote} *records.config*: listen on port 80 {quote}CONFIG proxy.config.http.server_ports STRING 80{quote}

With this setup, the dump looks like the following, and the in-use count of the MIOBuffer with index=8 (size=32K) increases by one every second.
{noformat}
 allocated |     in-use | type size | free list name
-----------|------------|-----------|--------------------------
   1048576 |      32768 |     32768 | memory/ioBufAllocator[8]
{noformat}
If the Content-Length in client.js is changed to a smaller size, the MIOBuffer with the corresponding index (0-7) grows in the same way.

I added this simple patch, similar to the last commit, to prevent the memory leak in the case above; it has been verified on one test host. free.diff
{code}
diff --git a/proxy/http/HttpSM.cc b/proxy/http/HttpSM.cc
index 932ef97..123b97a 100644
--- a/proxy/http/HttpSM.cc
+++ b/proxy/http/HttpSM.cc
@@ -5074,6 +5074,7 @@ HttpSM::handle_post_failure()
   t_state.current.server->keep_alive = HTTP_NO_KEEPALIVE;
   if (server_buffer_reader->read_avail() > 0) {
+    tunnel.deallocate_buffers();
     tunnel.reset();
     // There's data from the server so try to read the header
{code}
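The effect of the one-line patch (calling `tunnel.deallocate_buffers()` before `tunnel.reset()` in handle_post_failure()) can be modeled with a pool that only counts outstanding buffers.

- BufferPool, Tunnel and failed_post() are hypothetical.
- The model assumes, as the growing in-use column suggests, that reset() drops the tunnel's buffer references without returning them to the allocator.
- Each call to failed_post() corresponds to one failed POST from client.js.

```cpp
#include <cassert>
#include <vector>

// Hypothetical pool that only counts outstanding buffers, standing in for the
// "in-use" column of the ioBufAllocator dump. No real memory is allocated.
class BufferPool {
public:
  int acquire() {
    ++in_use_;
    return next_id_++;
  }
  void release(int /*id*/) {
    assert(in_use_ > 0);
    --in_use_;
  }
  int in_use() const { return in_use_; }

private:
  int in_use_ = 0;
  int next_id_ = 0;
};

// Hypothetical tunnel that holds buffer ids for its producers/consumers.
class Tunnel {
public:
  explicit Tunnel(BufferPool &pool) : pool_(pool) {}
  void add_buffer() { buffers_.push_back(pool_.acquire()); }
  // Returns every buffer the tunnel holds to the pool.
  void deallocate_buffers() {
    for (int id : buffers_) {
      pool_.release(id);
    }
    buffers_.clear();
  }
  // Forgets all state; buffers still held are dropped without being released.
  void reset() { buffers_.clear(); }

private:
  BufferPool &pool_;
  std::vector<int> buffers_;
};

// One failed POST: the tunnel still holds a buffer for the request body when
// the origin answers early (e.g. with a 503) and the failure handler runs.
void failed_post(BufferPool &pool, bool deallocate_first) {
  Tunnel tunnel(pool);
  tunnel.add_buffer();
  if (deallocate_first) {
    tunnel.deallocate_buffers();  // corresponds to the line added by free.diff
  }
  tunnel.reset();
}
```

Without `deallocate_first`, the pool's in-use count grows by one per failed POST, like the ioBufAllocator[8] row above. With it, the count returns to zero. This model says nothing about Susan's alternative of dropping the reset entirely.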
[jira] [Updated] (TS-3319) Adapt to Openssl 1.,0.2 Certificate Callback
[ https://issues.apache.org/jira/browse/TS-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-3319:
---
Issue Type: Improvement (was: Bug)
Adapt to Openssl 1.,0.2 Certificate Callback
Key: TS-3319
URL: https://issues.apache.org/jira/browse/TS-3319
Project: Traffic Server
Issue Type: Improvement
Reporter: Susan Hinrichs
With TS-3006, we provided a patch for OpenSSL 1.0.1 that allows the SNI callback to pause. In OpenSSL 1.0.2, the client certificate callback is extended to work for server certificate selection, and, like the SNI callback, it can return a value that pauses SSL processing after the client hello. The details are at https://www.openssl.org/docs/ssl/SSL_CTX_set_cert_cb.html
ATS should be extended to use the certificate callback mechanism if OpenSSL 1.0.2 is available.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
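For readers unfamiliar with the callback, here is a minimal C++ sketch of the return-value convention documented for `SSL_CTX_set_cert_cb` in OpenSSL 1.0.2: 1 continues the handshake, 0 fails it, and a negative value suspends it (OpenSSL then reports `SSL_ERROR_WANT_X509_LOOKUP`). It deliberately does not link against OpenSSL: `CertLookup`, `cert_cb` and `handshake_step` are hypothetical stand-ins that only model how a paused certificate lookup could be resumed, not how ATS implements it.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; none of these are OpenSSL or ATS types.
enum class HandshakeStatus { Done, WantCertLookup, Failed };

// Per-connection state a certificate-selection callback might keep.
struct CertLookup {
  std::string sni;              // server name taken from the client hello
  bool lookup_started = false;  // an asynchronous lookup was scheduled
  bool lookup_done = false;     // the lookup has completed
  std::string cert_name;        // result of the lookup (empty = not found)
};

// Mirrors the documented cert_cb convention:
//    1 -> certificate chosen, continue the handshake
//    0 -> error, abort the handshake
//   <0 -> suspend; the application resumes the handshake later
int cert_cb(CertLookup &st) {
  if (st.sni.empty()) {
    return 0;  // nothing to select a certificate with
  }
  if (!st.lookup_done) {
    st.lookup_started = true;  // e.g. hand the lookup to another thread
    return -1;                 // pause after the client hello
  }
  if (st.cert_name.empty()) {
    return 0;  // the lookup found no certificate
  }
  return 1;  // the chosen certificate would be installed here
}

// Models one attempt to advance the handshake.
HandshakeStatus handshake_step(CertLookup &st) {
  const int rc = cert_cb(st);
  if (rc > 0) {
    return HandshakeStatus::Done;
  }
  if (rc == 0) {
    return HandshakeStatus::Failed;
  }
  return HandshakeStatus::WantCertLookup;  // real OpenSSL: SSL_ERROR_WANT_X509_LOOKUP
}
```

In real code the callback would be registered with `SSL_CTX_set_cert_cb(ctx, callback, arg)`, and the caller would retry the handshake (for example `SSL_accept()`) once the lookup has finished.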
[jira] [Created] (TS-3319) Adapt to Openssl 1.,0.2 Certificate Callback
Susan Hinrichs created TS-3319: -- Summary: Adapt to Openssl 1.,0.2 Certificate Callback Key: TS-3319 URL: https://issues.apache.org/jira/browse/TS-3319 Project: Traffic Server Issue Type: Bug Reporter: Susan Hinrichs
[jira] [Assigned] (TS-3319) Adapt to Openssl 1.,0.2 Certificate Callback
[ https://issues.apache.org/jira/browse/TS-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs reassigned TS-3319: -- Assignee: Susan Hinrichs
[jira] [Updated] (TS-3319) Adapt to Openssl 1.0.2 Certificate Callback
[ https://issues.apache.org/jira/browse/TS-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leif Hedstrom updated TS-3319: -- Fix Version/s: 5.3.0
[jira] [Updated] (TS-3319) Adapt to Openssl 1.0.2 Certificate Callback
[ https://issues.apache.org/jira/browse/TS-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leif Hedstrom updated TS-3319: -- Component/s: SSL
[jira] [Updated] (TS-3319) Adapt to Openssl 1.0.2 Certificate Callback
[ https://issues.apache.org/jira/browse/TS-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-3319: --- Summary: Adapt to Openssl 1.0.2 Certificate Callback (was: Adapt to Openssl 1.,0.2 Certificate Callback)
Build failed in Jenkins: tsqa-master #40
See https://ci.trafficserver.apache.org/job/tsqa-master/40/
--
Started by timer
Building remotely on QA1 (qa) in workspace https://ci.trafficserver.apache.org/job/tsqa-master/ws/
/usr/bin/git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
/usr/bin/git config remote.origin.url git://git.apache.org/trafficserver.git # timeout=10
Cleaning workspace
/usr/bin/git rev-parse --verify HEAD # timeout=10
Resetting working tree
/usr/bin/git reset --hard # timeout=10
ERROR: Error fetching remote repo 'origin'
ERROR: Error fetching remote repo 'origin'
[jira] [Commented] (TS-3315) Assert after try lock
[ https://issues.apache.org/jira/browse/TS-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289020#comment-14289020 ] taorui commented on TS-3315:
It would be great if you documented it. If the caller is interested in the result of the remove (cont != NULL), then the current thread must already hold the lock on cont->mutex; if not (cont == NULL), the try should lock.
Assert after try lock
-
Key: TS-3315
URL: https://issues.apache.org/jira/browse/TS-3315
Project: Traffic Server
Issue Type: Bug
Components: Cache
Reporter: Phil Sorber
In iocore/cache/Cache.cc there is the following:
{code}
CACHE_TRY_LOCK(lock, cont->mutex, this_ethread());
ink_assert(lock.is_locked());
{code}
Does it really make sense to try-lock and then assert, when a try can fail?
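The precondition taorui describes can be illustrated with a small C++ sketch. It assumes, as the comment implies, that the continuation's mutex can be re-acquired by the thread that already holds it; `std::recursive_mutex` stands in for ATS's ProxyMutex and `CACHE_TRY_LOCK` is not used, so this models the reasoning rather than the cache code. The try-lock (and therefore the assert) is reliable only when the calling thread already owns the mutex; from any other thread the same try can legitimately fail.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// std::recursive_mutex models a mutex the owning thread may re-acquire.
// Note: the C++ standard technically allows try_lock to fail spuriously;
// common implementations do not do so for an uncontended or self-owned lock.

// Try-lock from the calling thread; the lock is released on return.
bool try_lock_here(std::recursive_mutex &m) {
  std::unique_lock<std::recursive_mutex> lock(m, std::try_to_lock);
  return lock.owns_lock();
}

// The same try-lock, attempted from a different thread.
bool try_lock_elsewhere(std::recursive_mutex &m) {
  bool acquired = false;
  std::thread t([&m, &acquired] { acquired = try_lock_here(m); });
  t.join();
  return acquired;
}
```

In other words, the assert documents the caller's obligation to hold `cont->mutex`; it does not make the try-lock itself safe for callers that do not.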
[jira] [Commented] (TS-3235) PluginVC crashed with unrecognized event
[ https://issues.apache.org/jira/browse/TS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289055#comment-14289055 ] zouyu commented on TS-3235:
---
[~briang], [~amc], [~portl4t]
After checking lib/atscppapi/src/InterceptPlugin.cc, I found some problems:
1. InterceptPlugin uses atscppapi::Mutex, which is separate from the ProxyMutex used by ATS threads.
2. InterceptPlugin::InterceptPlugin uses TSMutexCreate to create a mutex for synchronizing the continuation, but does not use it when calling 'InterceptPlugin::produce' and 'InterceptPlugin::setOutputComplete'. Instead it calls 'getMutex()', which actually calls the parent class method 'TransactionPlugin::getMutex()' and uses TransactionPlugin::state_->mutex. So it cannot synchronize the customer threads with the ATS threads.
3. When calling 'InterceptPlugin::handleEvent', it locks 'plugin_handle->mutex_', which is also 'TransactionPlugin::state_->mutex'. So it cannot synchronize the customer threads with the ATS threads either.
It seems we need to enhance InterceptPlugin to use the correct mutex; I think we should replace all of the mutexes above with the one that is set into the continuation.
PluginVC crashed with unrecognized event
Key: TS-3235
URL: https://issues.apache.org/jira/browse/TS-3235
Project: Traffic Server
Issue Type: Bug
Components: CPP API, HTTP, Plugins
Reporter: kang li
Assignee: Brian Geffon
Fix For: 5.3.0
Attachments: pluginvc-crash.diff
We are using atscppapi to create an Intercept plugin. From the core dump, it seems that the Continuation of the InterceptPlugin had already been destroyed.
{code}
#0 0x00375ac32925 in raise () from /lib64/libc.so.6
#1 0x00375ac34105 in abort () from /lib64/libc.so.6
#2 0x2b21eeae3458 in ink_die_die_die (retval=1) at ink_error.cc:43
#3 0x2b21eeae3525 in ink_fatal_va(int, const char *, typedef __va_list_tag __va_list_tag *) (return_code=1, message_format=0x2b21eeaf08d8 %s:%d: failed assert `%s`, ap=0x2b21f4913ad0) at ink_error.cc:65
#4 0x2b21eeae35ee in ink_fatal (return_code=1, message_format=0x2b21eeaf08d8 %s:%d: failed assert `%s`) at ink_error.cc:73
#5 0x2b21eeae2160 in _ink_assert (expression=0x76ddb8 call_event == core_lock_retry_event, file=0x76dd04 PluginVC.cc, line=203) at ink_assert.cc:37
#6 0x00530217 in PluginVC::main_handler (this=0x2b24ef007cb8, event=1, data=0xe0f5b80) at PluginVC.cc:203
#7 0x004f5854 in Continuation::handleEvent (this=0x2b24ef007cb8, event=1, data=0xe0f5b80) at ../iocore/eventsystem/I_Continuation.h:146
#8 0x00755d26 in EThread::process_event (this=0x309b250, e=0xe0f5b80, calling_code=1) at UnixEThread.cc:145
#9 0x0075610a in EThread::execute (this=0x309b250) at UnixEThread.cc:239
#10 0x00755284 in spawn_thread_internal (a=0x2849330) at Thread.cc:88
#11 0x2b21ef05f9d1 in start_thread () from /lib64/libpthread.so.0
#12 0x00375ace8b7d in clone () from /lib64/libc.so.6
(gdb) p sm_lock_retry_event
$13 = (Event *) 0x2b2496146e90
(gdb) p core_lock_retry_event
$14 = (Event *) 0x0
(gdb) p active_event
$15 = (Event *) 0x0
(gdb) p inactive_event
$16 = (Event *) 0x0
(gdb) p *(INKContInternal*)this->core_obj->connect_to
Cannot access memory at address 0x2b269cd46c10
{code}
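The fix zouyu proposes (every entry point locks the one mutex installed on the continuation) can be sketched in C++ as follows. `ContinuationModel` and `InterceptSketch` are hypothetical stand-ins, not atscppapi classes, and `std::recursive_mutex` stands in for ProxyMutex; the point is only that an ATS-thread entry point (`handle_event`) and a plugin-thread entry point (`produce`) serialize on the same lock, so shared state stays consistent.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical model of a continuation that owns a (shared) mutex.
struct ContinuationModel {
  std::shared_ptr<std::recursive_mutex> mutex = std::make_shared<std::recursive_mutex>();
};

// Hypothetical intercept plugin: every entry point takes the continuation's
// mutex instead of a separate plugin-level or transaction-level mutex.
class InterceptSketch {
public:
  // Models handleEvent, called from an ATS event thread.
  void handle_event() {
    std::lock_guard<std::recursive_mutex> guard(*cont_.mutex);
    ++events_;
  }

  // Models produce(), which a plugin-owned thread may call.
  void produce(const std::string &data) {
    std::lock_guard<std::recursive_mutex> guard(*cont_.mutex);
    output_ += data;
  }

  std::size_t output_size() const {
    std::lock_guard<std::recursive_mutex> guard(*cont_.mutex);
    return output_.size();
  }

  int events() const {
    std::lock_guard<std::recursive_mutex> guard(*cont_.mutex);
    return events_;
  }

private:
  ContinuationModel cont_;  // the only lock the plugin ever takes
  std::string output_;
  int events_ = 0;
};
```

A recursive mutex is used so that an entry point which calls another (for example `handle_event` invoking `produce`) would not deadlock, matching the re-entrant behavior the ATS mutex is assumed to have.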
[jira] [Commented] (TS-3243) Warnings from loading legitimate TLS certificates
[ https://issues.apache.org/jira/browse/TS-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289300#comment-14289300 ] ASF subversion and git services commented on TS-3243:
-
Commit 4f043934c5d8e56e2ea6fa8f88badb7345e37d1c in trafficserver's branch refs/heads/master from shinrich
[ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=4f04393 ]
TS-3243: Remove warnings while loading certificates with duplicate names.
Warnings from loading legitimate TLS certificates
-
Key: TS-3243
URL: https://issues.apache.org/jira/browse/TS-3243
Project: Traffic Server
Issue Type: Bug
Components: SSL
Reporter: Leif Hedstrom
Assignee: Susan Hinrichs
Fix For: 5.3.0
When loading a legitimate certificate (from Go Daddy) whose domain name is trafficserver.apache.org and whose SNs also include trafficserver.apache.org, we get these warnings:
{code}
[Dec 17 16:01:19.540] Server {0x2b58fdcadf40} NOTE: loading SSL certificate configuration from /usr/local/etc/trafficserver/ssl_multicert.config
[Dec 17 16:01:19.545] Server {0x2b58fdcadf40} WARNING: previously indexed 'trafficserver.apache.org' with SSL_CTX 0x1, cannot index it with SSL_CTX #2 now
{code}
I've looked at a couple of certs from GD, and this practice seems normal. I don't think we should warn in this case: if the domain name for the cert is duplicated in the SN, just ignore the latter, right?
[jira] [Commented] (TS-3243) Warnings from loading legitimate TLS certificates
[ https://issues.apache.org/jira/browse/TS-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289301#comment-14289301 ] Susan Hinrichs commented on TS-3243: This fixes the GD case. If the subject name is repeated in the SAN, we do not warn. Not sure if this addresses the case Dave is seeing. Let me know if this does not address the wildcard repeat.
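The rule described above (do not warn when the subject name is repeated in the SAN) can be sketched in C++. This is not the code from commit 4f04393: `NameIndex` and `ctx_id` are hypothetical, and comparing the owning context is just one assumed way to tell a harmless repeat within one certificate from a real conflict between two certificates.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of indexing a certificate's names for SNI lookup.
// ctx_id stands in for the SSL_CTX the certificate was loaded into.
struct NameIndex {
  std::map<std::string, int> ctx_by_name;
  std::vector<std::string> warnings;

  void insert(const std::string &name, int ctx_id) {
    const auto it = ctx_by_name.find(name);
    if (it == ctx_by_name.end()) {
      ctx_by_name.emplace(name, ctx_id);
    } else if (it->second != ctx_id) {
      // A different certificate already owns this name: worth a warning.
      warnings.push_back("duplicate name '" + name + "'");
    }
    // Same name from the same certificate (e.g. the subject CN repeated in
    // the SAN list): nothing to do, so stay quiet.
  }

  void insert_cert(const std::string &subject_cn, const std::vector<std::string> &sans, int ctx_id) {
    insert(subject_cn, ctx_id);
    for (const auto &san : sans) {
      insert(san, ctx_id);
    }
  }
};
```

The first certificate to claim a name keeps it; only a second, different certificate produces a warning. Wildcard repeats, which Susan mentions as possibly still open, are not modeled here.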
[jira] [Resolved] (TS-3243) Warnings from loading legitimate TLS certificates
[ https://issues.apache.org/jira/browse/TS-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs resolved TS-3243. Resolution: Fixed
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289413#comment-14289413 ] Sudheer Vinukonda commented on TS-2497:
---
[~ffcai]: Does the patch below fix the leak and also address the concerns about not breaking TS-2497?
diff --git a/proxy/http/HttpTunnel.cc b/proxy/http/HttpTunnel.cc
index d75b5a1..75df1a5 100644
--- a/proxy/http/HttpTunnel.cc
+++ b/proxy/http/HttpTunnel.cc
@@ -621,6 +621,11 @@ HttpTunnel::add_producer(VConnection * vc,
   if ((p = alloc_producer()) != NULL) {
     p->vc = vc;
     p->nbytes = nbytes_arg;
+    if (p->read_buffer) {
+      free_MIOBuffer(p->read_buffer);
+      p->read_buffer = NULL;
+      p->buffer_start = NULL;
+    }
     p->buffer_start = reader_start;
     p->read_buffer = reader_start->mbuf;
     p->vc_handler = sm_handler;
Failed post results in tunnel buffers being returned to freelist prematurely
Key: TS-2497
URL: https://issues.apache.org/jira/browse/TS-2497
Project: Traffic Server
Issue Type: Bug
Components: Core
Reporter: Brian Geffon
Assignee: Brian Geffon
Fix For: 4.2.0
Attachments: TS-2497.patch, client.js, origin-server.js, repro.js
When a POST to an origin server fails, either because the server died or because it returned a response without reading all of the POST data, TS destroys the buffers too early. This normally does not result in a crash, because the MIOBuffers are returned to the freelist and the race only causes a crash under sufficient load. Additionally, even if a crash doesn't happen, you might see data corruption across POST requests, because the buffers are used after being returned to the freelist. Thanks to Thomas Jackson for help reproducing and resolving this bug.
An example stack trace follows; we've also seen other crashes in write_avail.
#0 0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362
#1 0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946
#2 0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986
#3 0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994
#4 0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002
#5 0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048
#6 0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234
#7 0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816
#8 0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380
#9 0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146
#10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142
#11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264
#12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88
#13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0
#14 0x0033728e68ed in clone () from /lib64/libc.so.6
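The leak that the add_producer patch targets can be modeled in a few lines of C++. `Buffer`, `Producer` and the two `add_producer_*` functions are hypothetical stand-ins (not `MIOBuffer` or `HttpTunnel`), with a counter of live buffers so the leak is observable. The sketch only shows the ownership pattern; whether freeing the old buffer is actually safe depends on nothing (such as a pending read) still referencing it, which is exactly the concern raised about TS-2497.

```cpp
#include <cassert>

// Hypothetical stand-ins for MIOBuffer and a tunnel producer slot.
// The counter of live buffers makes a leak observable.
int live_buffers = 0;

struct Buffer {
  Buffer() { ++live_buffers; }
  ~Buffer() { --live_buffers; }
  Buffer(const Buffer &) = delete;
  Buffer &operator=(const Buffer &) = delete;
};

struct Producer {
  Buffer *read_buffer = nullptr;
};

// Before the patch: a reused slot's old buffer pointer is simply overwritten,
// so the old buffer becomes unreachable (leaked).
void add_producer_unpatched(Producer &p, Buffer *buf) {
  p.read_buffer = buf;
}

// After the patch: release whatever the reused slot still owns first.
// Safe only if nothing else still references the old buffer.
void add_producer_patched(Producer &p, Buffer *buf) {
  if (p.read_buffer != nullptr) {
    delete p.read_buffer;
    p.read_buffer = nullptr;
  }
  p.read_buffer = buf;
}
```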
[jira] [Comment Edited] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289413#comment-14289413 ] Sudheer Vinukonda edited comment on TS-2497 at 1/23/15 3:41 PM:
[~ffcai]: Does the below patch fix the leak as well as address the concerns about not breaking TS-2497?
{code}
diff --git a/proxy/http/HttpTunnel.cc b/proxy/http/HttpTunnel.cc
index d75b5a1..75df1a5 100644
--- a/proxy/http/HttpTunnel.cc
+++ b/proxy/http/HttpTunnel.cc
@@ -621,6 +621,11 @@ HttpTunnel::add_producer(VConnection * vc,
   if ((p = alloc_producer()) != NULL) {
     p->vc = vc;
     p->nbytes = nbytes_arg;
+    if (p->read_buffer) {
+      free_MIOBuffer(p->read_buffer);
+      p->read_buffer = NULL;
+      p->buffer_start = NULL;
+    }
     p->buffer_start = reader_start;
     p->read_buffer = reader_start->mbuf;
     p->vc_handler = sm_handler;
{code}
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289436#comment-14289436 ] Susan Hinrichs commented on TS-2497:
I'm a bit unclear on the original problem that was being solved. Looking at the two commits, it appears that tunnel.deallocate_buffers(); was moved from always being called in HttpSM::handle_post_failure to only being called if (server_buffer_reader->read_avail() <= 0). But tunnel.reset is called in all cases (regardless of the value of server_buffer_reader->read_avail()), so [~ffcai] is seeing a leak in the case where server_buffer_reader->read_avail() > 0. But if we add tunnel.deallocate_buffers(); there, then as far as I can tell we are back in the original case.
Judging from the original stack trace, it looks like there was a lingering read or write on the tunnel buffer. TS-1425 fixed that for the user agent side by canceling the read on the ua_session. Perhaps the real solution here is to cancel the read on the server_session, and then deallocate_buffers for the tunnel in all cases.
[~jacksontj] and [~briang], do you still have your notes on reproducing the original crash? Then we could experiment with adding back the deallocate_buffers with a read cancel and see if we can safely solve the memory leak.
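The ordering Susan proposes (cancel the read on the server session, then deallocate the tunnel buffers in all cases) can be modeled in C++. Everything here is hypothetical: `ServerConn`, `Tunnel`, `start_read` and `cancel_read` are not ATS APIs. The sketch only shows why freeing becomes safe once no outstanding read can still write into the buffer.

```cpp
#include <cassert>
#include <memory>

// Hypothetical model: a server connection with an outstanding read that
// writes into a tunnel-owned buffer whenever data arrives.
struct Buffer {
  int bytes = 0;
};

struct ServerConn {
  Buffer *read_target = nullptr;  // buffer the pending read fills, if any

  void start_read(Buffer *b) { read_target = b; }
  void cancel_read() { read_target = nullptr; }
  bool has_pending_read() const { return read_target != nullptr; }

  // Late data is only written if a read is still outstanding.
  void on_data(int n) {
    if (read_target != nullptr) {
      read_target->bytes += n;
    }
  }
};

struct Tunnel {
  std::unique_ptr<Buffer> buf{new Buffer};

  // Refuses to free while a read could still write into the buffer.
  bool deallocate_buffers(const ServerConn &conn) {
    if (conn.has_pending_read()) {
      return false;
    }
    buf.reset();
    return true;
  }
};

// Proposed ordering for the post-failure path: cancel the server-side read
// first, then free the tunnel buffers unconditionally.
bool handle_post_failure(Tunnel &t, ServerConn &conn) {
  conn.cancel_read();
  return t.deallocate_buffers(conn);
}
```

Freeing without the cancel corresponds to the original use-after-free; never freeing corresponds to the leak. The cancel-then-free order avoids both, assuming the cancel really stops all further writes into the buffer.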
[jira] [Updated] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feifei Cai updated TS-2497: --- Attachment: client.js origin-server.js
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289206#comment-14289206 ] Feifei Cai commented on TS-2497:
We noticed a memory leak on our production hosts. It appears to be related to handling 5xx responses from the origin server. The dump below is from one host where ~70% of requests are POSTs. I enabled the memory dump ({{proxy.config.dump_mem_info_frequency}}) and memory tracking ({{proxy.config.res_track_memory}}).
*traffic.out:*
{noformat}
 allocated |     in-use | type size | free list name
-----------+------------+-----------+--------------------------
         0 |          0 |   2097152 | memory/ioBufAllocator[14]
         0 |          0 |   1048576 | memory/ioBufAllocator[13]
         0 |          0 |    524288 | memory/ioBufAllocator[12]
         0 |          0 |    262144 | memory/ioBufAllocator[11]
         0 |          0 |    131072 | memory/ioBufAllocator[10]
         0 |          0 |     65536 | memory/ioBufAllocator[9]
1266679808 | 1262354432 |     32768 | memory/ioBufAllocator[8]
 600309760 |  599703552 |     16384 | memory/ioBufAllocator[7]
 395051008 |  391086080 |      8192 | memory/ioBufAllocator[6]
 229113856 |  224432128 |      4096 | memory/ioBufAllocator[5]
 342622208 |  342503424 |      2048 | memory/ioBufAllocator[4]
 245104640 |  245042176 |      1024 | memory/ioBufAllocator[3]
   2228224 |    2176512 |       512 | memory/ioBufAllocator[2]
    622592 |     607232 |       256 | memory/ioBufAllocator[1]
   2375680 |    2370176 |       128 | memory/ioBufAllocator[0]

Location                                         | Size In-use
-------------------------------------------------+------------
memory/IOBuffer/ProtocolProbeSessionAccept.cc:39 |    66768896
memory/IOBuffer/HttpClientSession.cc:230         |           0
memory/IOBuffer/HttpSM.cc:3314                   |           0
memory/IOBuffer/HttpSM.cc:5349                   |  3003506816
memory/IOBuffer/HttpSM.cc:5668                   |           0
memory/IOBuffer/HttpSM.cc:5874                   |           0
memory/IOBuffer/HttpSM.cc:5976                   |           0
memory/IOBuffer/HttpSM.cc:6267                   |           0
memory/IOBuffer/HttpServerSession.cc:87          |           0
memory/IOBuffer/HttpTunnel.cc:95                 |           0
memory/IOBuffer/HttpTunnel.cc:100                |           0
TOTAL                                            |  3070275712
{noformat}
Starting from [~shaunmcginnity]'s node.js scripts with some changes, I reproduced the memory leak in my local environment.
# origin-server.js: This origin server responds with a 503 once it has received more than a single byte, so in most cases the POST does not complete. I changed [~shaunmcginnity]'s code so that the origin server sends its response to ATS, which makes ATS take a different code path.
# client.js: A new client is created every second, and each client tries to POST 32 KB of data.
# ATS
*remap.config*: remap everything to local port 5000
{quote}map / http://127.0.0.1:5000{quote}
*records.config*: listen on port 80
{quote}CONFIG proxy.config.http.server_ports STRING 80{quote}
With this setup we get the following dump, and the in-use count of the MIOBuffer with index=8 (size=32K) increases by one every second.
{noformat}
allocated | in-use | type size | free list name
----------+--------+-----------+-------------------------
  1048576 |  32768 |     32768 | memory/ioBufAllocator[8]
{noformat}
If the Content-Length in client.js is changed to a smaller size, the MIOBuffer with the corresponding index (0-7) increases instead.
I added this simple patch to prevent the memory leak in the case above, as in the previous commit, and verified it on one test host.
free.diff
{code}
diff --git a/proxy/http/HttpSM.cc b/proxy/http/HttpSM.cc
index 932ef97..123b97a 100644
--- a/proxy/http/HttpSM.cc
+++ b/proxy/http/HttpSM.cc
@@ -5074,6 +5074,7 @@ HttpSM::handle_post_failure()
   t_state.current.server->keep_alive = HTTP_NO_KEEPALIVE;
   if (server_buffer_reader->read_avail() > 0) {
+    tunnel.deallocate_buffers();
     tunnel.reset();
     // There's data from the server so try to read the header
     setup_server_read_response_header();
{code}
*traffic.out*
{noformat} allocated |in-use | type size | free list name
[jira] [Comment Edited] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289206#comment-14289206 ]

Feifei Cai edited comment on TS-2497 at 1/23/15 12:54 PM:

A memory leak has been observed on our production hosts. It appears to be related to the handling of 5xx responses from the origin server. The dump below is from one host where ~70% of requests are POSTs. I enabled the memory dump ({{proxy.config.dump_mem_info_frequency}}) and memory tracking ({{proxy.config.res_track_memory}}).

*traffic.out:*
{noformat}
  allocated |      in-use | type size | free list name
------------|-------------|-----------|--------------------------
          0 |           0 |   2097152 | memory/ioBufAllocator[14]
          0 |           0 |   1048576 | memory/ioBufAllocator[13]
          0 |           0 |    524288 | memory/ioBufAllocator[12]
          0 |           0 |    262144 | memory/ioBufAllocator[11]
          0 |           0 |    131072 | memory/ioBufAllocator[10]
          0 |           0 |     65536 | memory/ioBufAllocator[9]
 1266679808 |  1262354432 |     32768 | memory/ioBufAllocator[8]
  600309760 |   599703552 |     16384 | memory/ioBufAllocator[7]
  395051008 |   391086080 |      8192 | memory/ioBufAllocator[6]
  229113856 |   224432128 |      4096 | memory/ioBufAllocator[5]
  342622208 |   342503424 |      2048 | memory/ioBufAllocator[4]
  245104640 |   245042176 |      1024 | memory/ioBufAllocator[3]
    2228224 |     2176512 |       512 | memory/ioBufAllocator[2]
     622592 |      607232 |       256 | memory/ioBufAllocator[1]
    2375680 |     2370176 |       128 | memory/ioBufAllocator[0]

Location                                         | Size In-use
-------------------------------------------------+------------
memory/IOBuffer/ProtocolProbeSessionAccept.cc:39 |    66768896
memory/IOBuffer/HttpClientSession.cc:230         |           0
memory/IOBuffer/HttpSM.cc:3314                   |           0
memory/IOBuffer/HttpSM.cc:5349                   |  3003506816
memory/IOBuffer/HttpSM.cc:5668                   |           0
memory/IOBuffer/HttpSM.cc:5874                   |           0
memory/IOBuffer/HttpSM.cc:5976                   |           0
memory/IOBuffer/HttpSM.cc:6267                   |           0
memory/IOBuffer/HttpServerSession.cc:87          |           0
memory/IOBuffer/HttpTunnel.cc:95                 |           0
memory/IOBuffer/HttpTunnel.cc:100                |           0
TOTAL                                            |  3070275712
{noformat}

Using [~shaunmcginnity]'s node.js scripts with some changes, I reproduced the memory leak in my local environment.
# [origin-server.js|https://issues.apache.org/jira/secure/attachment/12694145/origin-server.js]: This origin server responds with a 503 as soon as it receives more than a single byte, so in most cases the POST does not complete. I changed [~shaunmcginnity]'s code so that the origin server sends its response to ATS, which makes ATS take a different code path.
# [client.js|https://issues.apache.org/jira/secure/attachment/12694146/client.js]: A new client is created every second, and each client tries to POST 32K bytes of data.
# ATS
*remap.config*: remap everything to local port 5000
{quote}map / http://127.0.0.1:5000{quote}
*records.config*: listen on port 80
{quote}CONFIG proxy.config.http.server_ports STRING 80{quote}

With this setup the dump looks like the following, and the in-use memory for MIOBuffers with size index 8 (32K) grows by one buffer every second.
{noformat}
  allocated |      in-use | type size | free list name
------------|-------------|-----------|--------------------------
    1048576 |       32768 |     32768 | memory/ioBufAllocator[8]
{noformat}

If the Content-Length in client.js is changed to a smaller size, the MIOBuffer with the corresponding index (0-7) grows in the same way.

I added this simple patch, similar to the last commit, to prevent the memory leak in the case above, and verified it on one test host.

free.diff
{code}
diff --git a/proxy/http/HttpSM.cc b/proxy/http/HttpSM.cc
index 932ef97..123b97a 100644
--- a/proxy/http/HttpSM.cc
+++ b/proxy/http/HttpSM.cc
@@ -5074,6 +5074,7 @@ HttpSM::handle_post_failure()
   t_state.current.server->keep_alive = HTTP_NO_KEEPALIVE;
 
   if (server_buffer_reader->read_avail() > 0) {
+    tunnel.deallocate_buffers();
     tunnel.reset();
     // There's data from the server so try to read the header
[jira] [Commented] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289453#comment-14289453 ]

Sudheer Vinukonda commented on TS-2497:

I wonder if the issue that was originally fixed by [~briang] is similar to the issue resolved in TS-3285 (freeing the MIOBuffer while there's a write/read in progress, which could eventually corrupt the buffer on the free list).

Failed post results in tunnel buffers being returned to freelist prematurely

Key: TS-2497
URL: https://issues.apache.org/jira/browse/TS-2497
Project: Traffic Server
Issue Type: Bug
Components: Core
Reporter: Brian Geffon
Assignee: Brian Geffon
Fix For: 4.2.0
Attachments: TS-2497.patch, client.js, origin-server.js, repro.js

When a POST to an origin server fails, either because the server died or because it returned a response without reading all of the POST data, TS destroys buffers too early. This does not normally result in a crash, because the MIOBuffers are returned to the freelist; only under sufficient load does the race occur and cause a crash. Additionally, even when no crash occurs, data can be corrupted across POST requests because the buffers are still used after being returned to the freelist.

Thanks to Thomas Jackson for help reproducing and resolving this bug.

An example stack trace follows; we have also seen other crashes in write_avail.
#0  0x004eff14 in IOBufferBlock::read_avail (this=0x0) at ../iocore/eventsystem/I_IOBuffer.h:362
#1  0x0050d151 in MIOBuffer::append_block_internal (this=0x2aab38001130, b=0x2aab0c037200) at ../iocore/eventsystem/P_IOBuffer.h:946
#2  0x0050d39b in MIOBuffer::append_block (this=0x2aab38001130, asize_index=15) at ../iocore/eventsystem/P_IOBuffer.h:986
#3  0x0050d49b in MIOBuffer::add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:994
#4  0x0055cee2 in MIOBuffer::check_add_block (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1002
#5  0x0055d115 in MIOBuffer::write_avail (this=0x2aab38001130) at ../iocore/eventsystem/P_IOBuffer.h:1048
#6  0x006c18f3 in read_from_net (nh=0x2aaafca0d208, vc=0x2aab1c009140, thread=0x2aaafca0a010) at UnixNetVConnection.cc:234
#7  0x006c37bf in UnixNetVConnection::net_read_io (this=0x2aab1c009140, nh=0x2aaafca0d208, lthread=0x2aaafca0a010) at UnixNetVConnection.cc:816
#8  0x006be392 in NetHandler::mainNetEvent (this=0x2aaafca0d208, event=5, e=0x271d8e0) at UnixNet.cc:380
#9  0x004f05c4 in Continuation::handleEvent (this=0x2aaafca0d208, event=5, data=0x271d8e0) at ../iocore/eventsystem/I_Continuation.h:146
#10 0x006e361e in EThread::process_event (this=0x2aaafca0a010, e=0x271d8e0, calling_code=5) at UnixEThread.cc:142
#11 0x006e3b13 in EThread::execute (this=0x2aaafca0a010) at UnixEThread.cc:264
#12 0x006e290b in spawn_thread_internal (a=0x2716400) at Thread.cc:88
#13 0x003372c077e1 in start_thread () from /lib64/libpthread.so.0
#14 0x0033728e68ed in clone () from /lib64/libc.so.6

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Comment Edited] (TS-2497) Failed post results in tunnel buffers being returned to freelist prematurely
[ https://issues.apache.org/jira/browse/TS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289453#comment-14289453 ]

Sudheer Vinukonda edited comment on TS-2497 at 1/23/15 4:17 PM:

I wonder if the issue that was originally fixed by [~briang] is similar to the issue resolved in TS-3285 (freeing the MIOBuffer while there's a write/read in progress, which could eventually corrupt the buffer on the free list). Also see TS-3286 for some proposed improvements to detect buffer corruption sooner and more easily.