[jira] [Updated] (TS-5072) logstats: fix log buffer parser running state update
[ https://issues.apache.org/jira/browse/TS-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gancho Tenev updated TS-5072:
-----------------------------
    Fix Version/s: 7.1.0

> logstats: fix log buffer parser running state update
> ----------------------------------------------------
>
>                 Key: TS-5072
>                 URL: https://issues.apache.org/jira/browse/TS-5072
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Logging, Tools
>            Reporter: Gancho Tenev
>             Fix For: 7.1.0
>
> While refactoring for better code reuse while working on TS-5069, I found
> the following comment in the code that parses the log buffer:
> {code}
> // TODO: If we save state (struct) for a run, we probably need to always
> // update the origin data, no matter what the origin_set is.
> {code}
> (before in {{parse_log_buff()}}, now in {{find_or_create_stats()}})

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (TS-5072) logstats: fix log buffer parser running state update
Gancho Tenev created TS-5072:
--------------------------------

             Summary: logstats: fix log buffer parser running state update
                 Key: TS-5072
                 URL: https://issues.apache.org/jira/browse/TS-5072
             Project: Traffic Server
          Issue Type: Bug
          Components: Logging, Tools
            Reporter: Gancho Tenev

While refactoring for better code reuse while working on TS-5069, I found the
following comment in the code that parses the log buffer:

{code}
// TODO: If we save state (struct) for a run, we probably need to always
// update the origin data, no matter what the origin_set is.
{code}

(before in {{parse_log_buff()}}, now in {{find_or_create_stats()}})

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (TS-5069) logstats: add ability to report stats per user instead of host
[ https://issues.apache.org/jira/browse/TS-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gancho Tenev updated TS-5069:
-----------------------------
    Fix Version/s: 7.1.0

> logstats: add ability to report stats per user instead of host
> --------------------------------------------------------------
>
>                 Key: TS-5069
>                 URL: https://issues.apache.org/jira/browse/TS-5069
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Logging, Tools
>            Reporter: Gancho Tenev
>            Assignee: Gancho Tenev
>             Fix For: 7.1.0
>
> We would like to enhance {{traffic_logstats}} with the ability to report
> stats per user instead of per host (taken from the URI).
> Currently {{traffic_logstats}} expects the binary squid log format defined
> by the following ATS log config, and aggregates and reports stats per the
> authority part of the URI ({{host:port}} in the usual case):
> {code}
> <LogFormat>
>   <Name = "squid"/>
>   <Format = "%<cqtq> %<ttms> %<chi> %<crc>/%<pssc> %<psql> %<cqhm> %<cquc> %<caun> %<phr>/%<pqsn> %<psct>"/>
> </LogFormat>
> {code}
> For our use case it would be useful to be able to aggregate and report
> stats based on the 8th squid log field, the username of the authenticated
> client ({{%<caun>}}).
> In our use case we need to aggregate and report stats per
> CDN-customer-specific tag.
> For example, the new functionality would allow us to replace {{%<caun>}}
> with arbitrary header content ({{%<\{@CustomerTagHeader\}cqh>}}) and report
> stats per CDN customer by using a new command-line parameter
> {{--report_per_user}}, without adding extra fields to the binary squid
> format log expected by {{traffic_logstats}}, keeping it backward compatible
> with the previous version.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Assigned] (TS-5069) logstats: add ability to report stats per user instead of host
[ https://issues.apache.org/jira/browse/TS-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gancho Tenev reassigned TS-5069:
--------------------------------
    Assignee: Gancho Tenev

> logstats: add ability to report stats per user instead of host
> --------------------------------------------------------------
>
>                 Key: TS-5069
>                 URL: https://issues.apache.org/jira/browse/TS-5069
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Logging, Tools
>            Reporter: Gancho Tenev
>            Assignee: Gancho Tenev

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (TS-5069) logstats: add ability to report stats per user instead of host
Gancho Tenev created TS-5069:
--------------------------------

             Summary: logstats: add ability to report stats per user instead of host
                 Key: TS-5069
                 URL: https://issues.apache.org/jira/browse/TS-5069
             Project: Traffic Server
          Issue Type: Bug
          Components: Logging, Tools
            Reporter: Gancho Tenev

We would like to enhance {{traffic_logstats}} with the ability to report stats per user instead of per host (taken from the URI).

Currently {{traffic_logstats}} expects the binary squid log format defined by the following ATS log config, and aggregates and reports stats per the authority part of the URI ({{host:port}} in the usual case):

{code}
<LogFormat>
  <Name = "squid"/>
  <Format = "%<cqtq> %<ttms> %<chi> %<crc>/%<pssc> %<psql> %<cqhm> %<cquc> %<caun> %<phr>/%<pqsn> %<psct>"/>
</LogFormat>
{code}

For our use case it would be useful to be able to aggregate and report stats based on the 8th squid log field, the username of the authenticated client ({{%<caun>}}).

In our use case we need to aggregate and report stats per CDN-customer-specific tag. For example, the new functionality would allow us to replace {{%<caun>}} with arbitrary header content ({{%<\{@CustomerTagHeader\}cqh>}}) and report stats per CDN customer by using a new command-line parameter {{--report_per_user}}, without adding extra fields to the binary squid format log expected by {{traffic_logstats}}, keeping it backward compatible with the previous version.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TS-4916) Http2ConnectionState::restart_streams infinite loop causes deadlock
[ https://issues.apache.org/jira/browse/TS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629730#comment-15629730 ]

Gancho Tenev commented on TS-4916:
----------------------------------

Already back-ported with PR #1157.

> Http2ConnectionState::restart_streams infinite loop causes deadlock
> -------------------------------------------------------------------
>
>                 Key: TS-4916
>                 URL: https://issues.apache.org/jira/browse/TS-4916
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core, HTTP/2
>            Reporter: Gancho Tenev
>            Assignee: Gancho Tenev
>            Priority: Blocker
>             Fix For: 7.0.0
>
>          Time Spent: 8h
>  Remaining Estimate: 0h
>
> Http2ConnectionState::restart_streams falls into an infinite loop while
> holding a lock, which leads to cache updates starting to fail.
> The infinite loop is caused by traversing a list whose last element's
> "next" points to the element itself, so the traversal never finishes.
> {code}
> Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)):
> #0  0x2acf3fee in Http2ConnectionState::restart_streams (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
> #1  rcv_window_update_frame (cstate=..., frame=...) at Http2ConnectionState.cc:627
> #2  0x2acf9738 in Http2ConnectionState::main_event_handler (this=0x2ae6ba5284c8, event=<optimized out>, edata=<optimized out>) at Http2ConnectionState.cc:823
> #3  0x2acef1c3 in Continuation::handleEvent (data=0x2aaab3d039a0, event=2253, this=0x2ae6ba5284c8) at ../../iocore/eventsystem/I_Continuation.h:153
> #4  send_connection_event (cont=cont@entry=0x2ae6ba5284c8, event=event@entry=2253, edata=edata@entry=0x2aaab3d039a0) at Http2ClientSession.cc:58
> #5  0x2acef462 in Http2ClientSession::state_complete_frame_read (this=0x2ae6ba528290, event=<optimized out>, edata=0x2aab7b237f18) at Http2ClientSession.cc:426
> #6  0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153
> #7  Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, event=<optimized out>, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
> #8  0x2acef5a3 in Continuation::handleEvent (data=0x2aab7b237f18, event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153
> #9  Http2ClientSession::state_complete_frame_read (this=0x2ae6ba528290, event=<optimized out>, edata=0x2aab7b237f18) at Http2ClientSession.cc:431
> #10 0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153
> #11 Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, event=<optimized out>, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
> #12 0x2ae67e2b in Continuation::handleEvent (data=0x2aab7b237f18, event=100, this=<optimized out>) at ../../iocore/eventsystem/I_Continuation.h:153
> #13 read_signal_and_update (vc=0x2aab7b237e00, vc@entry=0x1, event=event@entry=100) at UnixNetVConnection.cc:153
> #14 UnixNetVConnection::readSignalAndUpdate (this=this@entry=0x2aab7b237e00, event=event@entry=100) at UnixNetVConnection.cc:1036
> #15 0x2ae47653 in SSLNetVConnection::net_read_io (this=0x2aab7b237e00, nh=0x2aaab2409cc0, lthread=0x2aaab2406000) at SSLNetVConnection.cc:595
> #16 0x2ae5558c in NetHandler::mainNetEvent (this=0x2aaab2409cc0, event=<optimized out>, e=<optimized out>) at UnixNet.cc:513
> #17 0x2ae8d2e6 in Continuation::handleEvent (data=0x2aaab0bfa700, event=5, this=<optimized out>) at I_Continuation.h:153
> #18 EThread::process_event (calling_code=5, e=0x2aaab0bfa700, this=0x2aaab2406000) at UnixEThread.cc:148
> #19 EThread::execute (this=0x2aaab2406000) at UnixEThread.cc:275
> #20 0x2ae8c0e6 in spawn_thread_internal (a=0x2aaab0b25bb0) at Thread.cc:86
> #21 0x2d6b3aa1 in start_thread (arg=0x2aaab3d04700) at pthread_create.c:301
> #22 0x2e8bc93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> {code}
> Here is the stream_list trace.
> {code}
> (gdb) thread 51
> [Switching to thread 51 (Thread 0x2aaab3d04700 (LWP 34270))]
> #0  0x2acf3fee in Http2ConnectionState::restart_streams (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
> (gdb) trace_list stream_list
> --- count=0 ---
> id=29
> this=0x2ae673f0c840
> next=0x2aaac05d8900
> prev=(nil)
> --- count=1 ---
> id=27
> this=0x2aaac05d8900
> next=0x2ae5b6bbec00
> prev=0x2ae673f0c840
> --- count=2 ---
> id=19
> this=0x2ae5b6bbec00
> next=0x2ae5b6bbec00
> prev=0x2aaac05d8900
> --- count=3 ---
> id=19
> this=0x2ae5b6bbec00
> next=0x2ae5b6bbec00
> prev=0x2aaac05d8900
> . . .
> --- count=5560 ---
> id=19
> this=0x2ae5b6bbec00
> next=0x2ae5b6bbec00
> prev=0x2aaac05d8900
> . . .
> {code}
> Currently I am working on finding out why the list in question got into
> this "impossible" (broken) state, and eventually coming up with a fix.
[jira] [Comment Edited] (TS-4916) Http2ConnectionState::restart_streams infinite loop causes deadlock
[ https://issues.apache.org/jira/browse/TS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15565557#comment-15565557 ]

Gancho Tenev edited comment on TS-4916 at 10/11/16 3:07 PM:
------------------------------------------------------------

[~shinrich], appreciate your comment! I am running with version 6.2.1, not with master.

{{DLL<>}} not being thread-safe while multiple threads manipulate it concurrently was the first thing that came to mind. I changed the code so that the {{client_streams_count}} ++/-- and {{stream_list.push()}} / {{stream_list.remove()}} calls go through only 2 corresponding new functions, {{add_to_active_streams()}} and {{rm_from_active_streams()}} (in {{Http2ConnectionState}}), and added an assert like the following:

{code}
(1) ink_assert(this->mutex->thread_holding == this_ethread());
{code}

which never triggered in my case (9+ days already)! In fact traffic_server gets into the same broken state described in my previous post (broken DLL structure with a few elements where the last element points to itself) reliably and consistently on all the machines I inspected (10+ at this time), and I saw the same symptoms (cache update failures skyrocket after the H2 infinite loop happens) on many more machines.

Then I came up with the hypothesis described earlier and added the following assert to the {{add_to_active_streams()}} function:

{code}
(2) ink_assert(!stream_list.in(new_stream));
{code}

which started triggering quickly (in < 1 day) on all machines I tested. It was always the case that {{stream_list.head == new_stream}} (a memory chunk with the same address had already been added to stream_list).

Then I changed {{Http2Stream::delete_stream()}} so it can be called safely even if the stream is already deleted, and added a "catch-all" call in {{Http2Stream::destroy()}} to make sure the stream is deleted before being destroyed, and the problem never happened again ((1) and (2) did not trigger for at least 3+ days).

Looking into it more, it turned out that if {{Http2Stream::do_io_close()}} gets into {{_state == HTTP2_STREAM_STATE_HALF_CLOSED_REMOTE}} (in version 6.2.1), it fails to delete the stream before it orders the stream's self-destruction (a few lines below). So when it gets to {{Http2Stream::destroy()}}, it frees the memory without removing the stream from the active list, and then runs into the H2 infinite loop if the sequence of events described in my previous post happens. Making sure {{Http2Stream::do_io_close()}} deletes the stream before requesting the self-destruction (with {{VC_EVENT_EOS}}) fixes the problem (it can be solved in many different ways). Although I can see that there might be a race condition in that part of the code, I have just never run into it ((1) never triggers, and all experiments fit pretty well what I described in my previous post).

After your comment I identified 4 places where we might not be holding the lock properly when modifying stream_list / client_streams_count (not actually happening, just from reading/tracing the code), added the missing locking, and repeated the experiments, getting the same results (including the failures of (2), and the fix).

I see that the master code has changed quite a lot since 6.2.1; I checked with [~zwoop] and will probably prepare a PR for a back-port to 6.2.x. Since the master code changed (i.e. {{Http2Stream::do_io_close()}}), there is a chance we don't run into this H2-infinite-loop condition anymore (I have not tested it yet), but we still use {{DLL<>}} and a memory pool in the same way, so it seems possible to run into the same problem if we are not careful, and I will study the new master code more to see if we can do something to avoid it.

Cheers,
--Gancho
[jira] [Commented] (TS-4916) Http2ConnectionState::restart_streams infinite loop causes deadlock
[ https://issues.apache.org/jira/browse/TS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15565557#comment-15565557 ]

Gancho Tenev commented on TS-4916:
----------------------------------

[~shinrich], appreciate your comment! I am running with version 6.2.1, not with master.

{{DLL<>}} not being thread-safe while multiple threads manipulate it concurrently was the first thing that came to mind. I changed the code so that the {{client_streams_count}} ++/-- and {{stream_list.push()}} / {{stream_list.remove()}} calls go through only 2 corresponding new functions, {{add_to_active_streams()}} and {{rm_from_active_streams()}} (in {{Http2ConnectionState}}), and added an assert like the following:

{code}
(1) ink_assert(this->mutex->thread_holding == this_ethread());
{code}

which never triggered in my case (9+ days already)! In fact traffic_server gets into the same broken state described in my previous post (broken DLL structure with a few elements where the last element points to itself) reliably and consistently on all the machines I inspected (10+ at this time), and I saw the same symptoms (cache update failures skyrocket after the H2 infinite loop happens) on many more machines.

Then I came up with the hypothesis described earlier and added the following assert to the {{add_to_active_streams()}} function:

{code}
(2) ink_assert(!stream_list.in(stream));
{code}

which started triggering quickly (in < 1 day) on all machines I tested. It was always the case that {{stream_list.head == new_stream}} (a memory chunk with the same address had already been added to stream_list).

Then I changed {{Http2Stream::delete_stream()}} so it can be called safely even if the stream is already deleted, and added a "catch-all" call in {{Http2Stream::destroy()}} to make sure the stream is deleted before being destroyed, and the problem never happened again ((1) and (2) did not trigger for at least 3+ days).

Looking into it more, it turned out that if {{Http2Stream::do_io_close()}} gets into {{_state == HTTP2_STREAM_STATE_HALF_CLOSED_REMOTE}} (in version 6.2.1), it fails to delete the stream before it orders the stream's self-destruction (a few lines below). So when it gets to {{Http2Stream::destroy()}}, it frees the memory without removing the stream from the active list, and then runs into the H2 infinite loop if the sequence of events described in my previous post happens. Making sure {{Http2Stream::destroy()}} deletes the stream before requesting the self-destruction (with {{VC_EVENT_EOS}}) fixes the problem (it can be solved in many different ways). Although I can see that there might be a race condition in that part of the code, I have just never run into it ((1) never triggers, and all experiments fit pretty well what I described in my previous post).

After your comment I identified 4 places where we might not be holding the lock properly when modifying stream_list / client_streams_count (not actually happening, just from reading/tracing the code), added the missing locking, and repeated the experiments, getting the same results (including the failures of (2), and the fix).

I see that the master code has changed quite a lot since 6.2.1; I checked with [~zwoop] and will probably prepare a PR for a back-port to 6.2.x. Since the master code changed (i.e. {{Http2Stream::do_io_close()}}), there is a chance we don't run into this H2-infinite-loop condition anymore (I have not tested it yet), but we still use {{DLL<>}} and a memory pool in the same way, so it seems possible to run into the same problem if we are not careful, and I will study the new master code more to see if we can do something to avoid it.
Cheers,
--Gancho

> Http2ConnectionState::restart_streams infinite loop causes deadlock
> -------------------------------------------------------------------
>
>                 Key: TS-4916
>                 URL: https://issues.apache.org/jira/browse/TS-4916
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core, HTTP/2
>            Reporter: Gancho Tenev
>            Assignee: Gancho Tenev
>            Priority: Blocker
>             Fix For: 7.1.0
[jira] [Commented] (TS-4916) Http2ConnectionState::restart_streams infinite loop causes deadlock
[ https://issues.apache.org/jira/browse/TS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15548770#comment-15548770 ]

Gancho Tenev commented on TS-4916:
----------------------------------

Looked into this more; here are my findings / hypothesis.

*Studied the list code (lib/ts/List.h)* and found a couple of problems (filed [TS-4935|https://issues.apache.org/jira/browse/TS-4935]).

If the list is used improperly, its internal structure gets damaged silently. If the same element is added twice in a row, the element's "next" starts pointing to the element itself and all the pre-existing list content is lost. All further additions will look OK, but the next list traversal will be infinite.

*How would we add the same element twice?*

Since a memory pool is used to instantiate the streams, it is entirely possible for the pool to return exactly the same chunk twice.

*How could adding the same chunk happen?*

# a stream N is created and used; no new streams are created in the meanwhile
# stream N is closed and its memory chunk is released back to the pool when the stream is destroyed
# we fail to remove stream N from the list of active streams (bug!)
# a new stream N+1 is created right after destroying stream N (getting exactly the same memory chunk from the memory pool that stream N used)
# the new stream N+1 is added to the list of active streams, adding the same memory chunk to the list for a second time in a row and damaging the list's internal structure
# new streams can be added and deleted after this point, but the next active-stream-list iteration will be infinite

*Hypothesis validation*

By the time we identify the infinite loop (which is pretty straightforward), all the useful info about how we got into this state is gone, so in order to validate the hypothesis I had to collect some more data.
Instrumented the code: added a check for failure to remove the stream from the list of active streams right before destroying it, and in case of failure removed it there (as a "catch-all" safety net just before destroying).

It usually took 1-3 days to reach the infinite loop / deadlock after a restart. The experimental code ran for 3+ days without getting into the infinite-loop / deadlock state, and the collected data indicates that it would otherwise have failed to remove the stream from the active stream list (which would trigger the infinite-loop state) 4 times during that period. I believe this validates the hypothesis.

*Next steps*

Identified an execution path which could fail to remove the element from the list before destroying the stream. Implemented a patch which I just started testing in prod, and will provide an update as soon as I validate it.

Please let me know if more info is needed or something does not make sense, and I will be happy to look into it!

Cheers,
--Gancho
[jira] [Created] (TS-4935) Adding same element twice in a row damages DLL's structure silently
Gancho Tenev created TS-4935:
--------------------------------

             Summary: Adding same element twice in a row damages DLL's structure silently
                 Key: TS-4935
                 URL: https://issues.apache.org/jira/browse/TS-4935
             Project: Traffic Server
          Issue Type: Bug
          Components: Core
            Reporter: Gancho Tenev

If the DLL list (lib/ts/List.h) is used improperly, its internal structure gets damaged silently, without any indication to the caller (no assert or return code).

If the same element is added twice in a row, the element's "next" starts pointing to the element itself and all the existing list content is lost. All further additions will look OK, but the next list traversal will be infinite.

Also noticed that when a new element is added to the list, the element's "prev" is not initialized (not a problem in the most common case, but it should be fixed).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Assigned] (TS-4916) Http2ConnectionState::restart_streams infinite loop causes deadlock
[ https://issues.apache.org/jira/browse/TS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev reassigned TS-4916: Assignee: Gancho Tenev > Http2ConnectionState::restart_streams infinite loop causes deadlock > > > Key: TS-4916 > URL: https://issues.apache.org/jira/browse/TS-4916 > Project: Traffic Server > Issue Type: Bug > Components: Core, HTTP/2 >Reporter: Gancho Tenev >Assignee: Gancho Tenev >Priority: Blocker > Fix For: 7.1.0 > > > Http2ConnectionState::restart_streams falls into an infinite loop while > holding a lock, which leads to cache updates to start failing. > The infinite loop is caused by traversing a list whose last element “next” > points to the element itself and the traversal never finishes. > {code} > Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)): > #0 0x2acf3fee in Http2ConnectionState::restart_streams > (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913 > #1 rcv_window_update_frame (cstate=..., frame=...) at > Http2ConnectionState.cc:627 > #2 0x2acf9738 in Http2ConnectionState::main_event_handler > (this=0x2ae6ba5284c8, event=, edata=) at > Http2ConnectionState.cc:823 > #3 0x2acef1c3 in Continuation::handleEvent (data=0x2aaab3d039a0, > event=2253, this=0x2ae6ba5284c8) at > ../../iocore/eventsystem/I_Continuation.h:153 > #4 send_connection_event (cont=cont@entry=0x2ae6ba5284c8, > event=event@entry=2253, edata=edata@entry=0x2aaab3d039a0) at > Http2ClientSession.cc:58 > #5 0x2acef462 in Http2ClientSession::state_complete_frame_read > (this=0x2ae6ba528290, event=, edata=0x2aab7b237f18) at > Http2ClientSession.cc:426 > #6 0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, > event=100, this=0x2ae6ba528290) at > ../../iocore/eventsystem/I_Continuation.h:153 > #7 Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, > event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399 > #8 0x2acef5a3 in Continuation::handleEvent (data=0x2aab7b237f18, > event=100, this=0x2ae6ba528290) at > 
../../iocore/eventsystem/I_Continuation.h:153 > #9 Http2ClientSession::state_complete_frame_read (this=0x2ae6ba528290, > event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:431 > #10 0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, > event=100, this=0x2ae6ba528290) at > ../../iocore/eventsystem/I_Continuation.h:153 > #11 Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, > event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399 > #12 0x2ae67e2b in Continuation::handleEvent (data=0x2aab7b237f18, > event=100, this=) at > ../../iocore/eventsystem/I_Continuation.h:153 > #13 read_signal_and_update (vc=0x2aab7b237e00, vc@entry=0x1, > event=event@entry=100) at UnixNetVConnection.cc:153 > #14 UnixNetVConnection::readSignalAndUpdate (this=this@entry=0x2aab7b237e00, > event=event@entry=100) at UnixNetVConnection.cc:1036 > #15 0x2ae47653 in SSLNetVConnection::net_read_io > (this=0x2aab7b237e00, nh=0x2aaab2409cc0, lthread=0x2aaab2406000) at > SSLNetVConnection.cc:595 > #16 0x2ae5558c in NetHandler::mainNetEvent (this=0x2aaab2409cc0, > event=, e=) at UnixNet.cc:513 > #17 0x2ae8d2e6 in Continuation::handleEvent (data=0x2aaab0bfa700, > event=5, this=) at I_Continuation.h:153 > #18 EThread::process_event (calling_code=5, e=0x2aaab0bfa700, > this=0x2aaab2406000) at UnixEThread.cc:148 > #19 EThread::execute (this=0x2aaab2406000) at UnixEThread.cc:275 > #20 0x2ae8c0e6 in spawn_thread_internal (a=0x2aaab0b25bb0) at > Thread.cc:86 > #21 0x2d6b3aa1 in start_thread (arg=0x2aaab3d04700) at > pthread_create.c:301 > #22 0x2e8bc93d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 > {code} > Here is the stream_list trace. 
> {code} > (gdb) thread 51 > [Switching to thread 51 (Thread 0x2aaab3d04700 (LWP 34270))] > #0 0x2acf3fee in Http2ConnectionState::restart_streams > (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913 > (gdb) trace_list stream_list > --- count=0 --- > id=29 > this=0x2ae673f0c840 > next=0x2aaac05d8900 > prev=(nil) > --- count=1 --- > id=27 > this=0x2aaac05d8900 > next=0x2ae5b6bbec00 > prev=0x2ae673f0c840 > --- count=2 --- > id=19 > this=0x2ae5b6bbec00 > next=0x2ae5b6bbec00 > prev=0x2aaac05d8900 > --- count=3 --- > id=19 > this=0x2ae5b6bbec00 > next=0x2ae5b6bbec00 > prev=0x2aaac05d8900 > . . . > --- count=5560 --- > id=19 > this=0x2ae5b6bbec00 > next=0x2ae5b6bbec00 > prev=0x2aaac05d8900 > . . . > {code} > Currently I am working on finding out why the list in question got into this > “impossible” (broken) state and eventually coming up with a fix. -- This message was sent by Atlassian JIRA
[jira] [Updated] (TS-4916) Http2ConnectionState::restart_streams infinite loop causes deadlock
[ https://issues.apache.org/jira/browse/TS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4916: - Description: Http2ConnectionState::restart_streams falls into an infinite loop while holding a lock, which leads to cache updates to start failing. The infinite loop is caused by traversing a list whose last element “next” points to the element itself and the traversal never finishes. {code} Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)): #0 0x2acf3fee in Http2ConnectionState::restart_streams (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913 #1 rcv_window_update_frame (cstate=..., frame=...) at Http2ConnectionState.cc:627 #2 0x2acf9738 in Http2ConnectionState::main_event_handler (this=0x2ae6ba5284c8, event=, edata=) at Http2ConnectionState.cc:823 #3 0x2acef1c3 in Continuation::handleEvent (data=0x2aaab3d039a0, event=2253, this=0x2ae6ba5284c8) at ../../iocore/eventsystem/I_Continuation.h:153 #4 send_connection_event (cont=cont@entry=0x2ae6ba5284c8, event=event@entry=2253, edata=edata@entry=0x2aaab3d039a0) at Http2ClientSession.cc:58 #5 0x2acef462 in Http2ClientSession::state_complete_frame_read (this=0x2ae6ba528290, event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:426 #6 0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153 #7 Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399 #8 0x2acef5a3 in Continuation::handleEvent (data=0x2aab7b237f18, event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153 #9 Http2ClientSession::state_complete_frame_read (this=0x2ae6ba528290, event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:431 #10 0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153 #11 Http2ClientSession::state_start_frame_read 
(this=0x2ae6ba528290, event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399 #12 0x2ae67e2b in Continuation::handleEvent (data=0x2aab7b237f18, event=100, this=) at ../../iocore/eventsystem/I_Continuation.h:153 #13 read_signal_and_update (vc=0x2aab7b237e00, vc@entry=0x1, event=event@entry=100) at UnixNetVConnection.cc:153 #14 UnixNetVConnection::readSignalAndUpdate (this=this@entry=0x2aab7b237e00, event=event@entry=100) at UnixNetVConnection.cc:1036 #15 0x2ae47653 in SSLNetVConnection::net_read_io (this=0x2aab7b237e00, nh=0x2aaab2409cc0, lthread=0x2aaab2406000) at SSLNetVConnection.cc:595 #16 0x2ae5558c in NetHandler::mainNetEvent (this=0x2aaab2409cc0, event=, e=) at UnixNet.cc:513 #17 0x2ae8d2e6 in Continuation::handleEvent (data=0x2aaab0bfa700, event=5, this=) at I_Continuation.h:153 #18 EThread::process_event (calling_code=5, e=0x2aaab0bfa700, this=0x2aaab2406000) at UnixEThread.cc:148 #19 EThread::execute (this=0x2aaab2406000) at UnixEThread.cc:275 #20 0x2ae8c0e6 in spawn_thread_internal (a=0x2aaab0b25bb0) at Thread.cc:86 #21 0x2d6b3aa1 in start_thread (arg=0x2aaab3d04700) at pthread_create.c:301 #22 0x2e8bc93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 {code} Here is the stream_list trace. {code} (gdb) thread 51 [Switching to thread 51 (Thread 0x2aaab3d04700 (LWP 34270))] #0 0x2acf3fee in Http2ConnectionState::restart_streams (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913 (gdb) trace_list stream_list --- count=0 --- id=29 this=0x2ae673f0c840 next=0x2aaac05d8900 prev=(nil) --- count=1 --- id=27 this=0x2aaac05d8900 next=0x2ae5b6bbec00 prev=0x2ae673f0c840 --- count=2 --- id=19 this=0x2ae5b6bbec00 next=0x2ae5b6bbec00 prev=0x2aaac05d8900 --- count=3 --- id=19 this=0x2ae5b6bbec00 next=0x2ae5b6bbec00 prev=0x2aaac05d8900 . . . --- count=5560 --- id=19 this=0x2ae5b6bbec00 next=0x2ae5b6bbec00 prev=0x2aaac05d8900 . . . 
{code} Currently I am working on finding out why the list in question got into this “impossible” (broken) state and eventually coming up with a fix. was: Http2ConnectionState::restart_streams falls into an infinite loop while holding a lock, which leads to cache updates to start failing. The infinite loop is caused by traversing a list whose last element “next” points to the element itself and the traversal never finishes. {code} Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)): #0 0x2acf3fee in Http2ConnectionState::restart_streams (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913 #1 rcv_window_update_frame (cstate=..., frame=...) at Http2ConnectionState.cc:627 #2 0x2acf9738 in Http2ConnectionState::main_event_handler (this=0x2ae6ba5284c8, event=, edata=) at Http2ConnectionState.cc:823 #3 0x2acef1c3 in Continuation::handleEvent
[jira] [Created] (TS-4916) Http2ConnectionState::restart_streams infinite loop causes deadlock
Gancho Tenev created TS-4916: Summary: Http2ConnectionState::restart_streams infinite loop causes deadlock Key: TS-4916 URL: https://issues.apache.org/jira/browse/TS-4916 Project: Traffic Server Issue Type: Bug Components: Core, HTTP/2 Reporter: Gancho Tenev Http2ConnectionState::restart_streams falls into an infinite loop while holding a lock, which leads to cache updates to start failing. The infinite loop is caused by traversing a list whose last element “next” points to the element itself and the traversal never finishes. {code} Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)): #0 0x2acf3fee in Http2ConnectionState::restart_streams (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913 #1 rcv_window_update_frame (cstate=..., frame=...) at Http2ConnectionState.cc:627 #2 0x2acf9738 in Http2ConnectionState::main_event_handler (this=0x2ae6ba5284c8, event=, edata=) at Http2ConnectionState.cc:823 #3 0x2acef1c3 in Continuation::handleEvent (data=0x2aaab3d039a0, event=2253, this=0x2ae6ba5284c8) at ../../iocore/eventsystem/I_Continuation.h:153 #4 send_connection_event (cont=cont@entry=0x2ae6ba5284c8, event=event@entry=2253, edata=edata@entry=0x2aaab3d039a0) at Http2ClientSession.cc:58 #5 0x2acef462 in Http2ClientSession::state_complete_frame_read (this=0x2ae6ba528290, event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:426 #6 0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153 #7 Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399 #8 0x2acef5a3 in Continuation::handleEvent (data=0x2aab7b237f18, event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153 #9 Http2ClientSession::state_complete_frame_read (this=0x2ae6ba528290, event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:431 #10 0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, event=100, this=0x2ae6ba528290) at 
../../iocore/eventsystem/I_Continuation.h:153 #11 Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399 #12 0x2ae67e2b in Continuation::handleEvent (data=0x2aab7b237f18, event=100, this=) at ../../iocore/eventsystem/I_Continuation.h:153 #13 read_signal_and_update (vc=0x2aab7b237e00, vc@entry=0x1, event=event@entry=100) at UnixNetVConnection.cc:153 #14 UnixNetVConnection::readSignalAndUpdate (this=this@entry=0x2aab7b237e00, event=event@entry=100) at UnixNetVConnection.cc:1036 #15 0x2ae47653 in SSLNetVConnection::net_read_io (this=0x2aab7b237e00, nh=0x2aaab2409cc0, lthread=0x2aaab2406000) at SSLNetVConnection.cc:595 #16 0x2ae5558c in NetHandler::mainNetEvent (this=0x2aaab2409cc0, event=, e=) at UnixNet.cc:513 #17 0x2ae8d2e6 in Continuation::handleEvent (data=0x2aaab0bfa700, event=5, this=) at I_Continuation.h:153 #18 EThread::process_event (calling_code=5, e=0x2aaab0bfa700, this=0x2aaab2406000) at UnixEThread.cc:148 #19 EThread::execute (this=0x2aaab2406000) at UnixEThread.cc:275 #20 0x2ae8c0e6 in spawn_thread_internal (a=0x2aaab0b25bb0) at Thread.cc:86 #21 0x2d6b3aa1 in start_thread (arg=0x2aaab3d04700) at pthread_create.c:301 #22 0x2e8bc93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 (gdb) thread 51 [Switching to thread 51 (Thread 0x2aaab3d04700 (LWP 34270))] #0 0x2acf3fee in Http2ConnectionState::restart_streams (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913 (gdb) trace_list stream_list --- count=0 --- id=29 this=0x2ae673f0c840 next=0x2aaac05d8900 prev=(nil) --- count=1 --- id=27 this=0x2aaac05d8900 next=0x2ae5b6bbec00 prev=0x2ae673f0c840 --- count=2 --- id=19 this=0x2ae5b6bbec00 next=0x2ae5b6bbec00 prev=0x2aaac05d8900 --- count=3 --- id=19 this=0x2ae5b6bbec00 next=0x2ae5b6bbec00 prev=0x2aaac05d8900 . . . --- count=5560 --- id=19 this=0x2ae5b6bbec00 next=0x2ae5b6bbec00 prev=0x2aaac05d8900 . . . 
{code} Currently I am working on finding out why the list in question got into this “impossible” (broken) state and eventually coming up with a fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
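The {{trace_list stream_list}} output above shows the corruption concretely: stream id=19 ends up with {{next}} pointing back at itself, so a naive {{for (s = head; s; s = s->next)}} walk never terminates. The sketch below is a toy model only, not the actual ATS {{Http2Stream}} code; the {{Stream}} struct and {{walk_list}} helper are illustrative assumptions showing how a traversal can bail out on exactly this broken state instead of spinning while the connection lock is held:

```cpp
#include <cassert>
#include <cstddef>

// Toy model of an intrusive doubly-linked stream list, like the one
// dumped above with "trace_list stream_list". Names are illustrative,
// not the real ATS types.
struct Stream {
  int id;
  Stream *next = nullptr;
  Stream *prev = nullptr;
};

// Walks the list and returns the number of nodes, or -1 if it detects
// the broken state from the trace: a node whose "next" points back at
// itself, which would make a naive traversal loop forever.
int walk_list(Stream *head) {
  int count = 0;
  for (Stream *s = head; s != nullptr; s = s->next) {
    if (s->next == s) {
      return -1; // corrupted list: bail out instead of spinning
    }
    ++count;
  }
  return count;
}
```

With the healthy list from the trace (ids 29 → 27 → 19) the walk returns 3; once node 19's {{next}} is redirected at itself, the guard reports corruption instead of hanging. A more general defense (e.g. Floyd's cycle detection) would also catch longer cycles, but the self-pointing {{next}} check covers the exact state captured in the gdb session.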
[jira] [Commented] (TS-4334) The cache_range_requests plugin always attempts to modify the cache key.
[ https://issues.apache.org/jira/browse/TS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510517#comment-15510517 ] Gancho Tenev commented on TS-4334: -- [~jamesf], sounds good. The above solution was just to demonstrate the idea; you would need to adjust it to your particular use-case and verify that it works. Cheers, --Gancho > The cache_range_requests plugin always attempts to modify the cache key. > > > Key: TS-4334 > URL: https://issues.apache.org/jira/browse/TS-4334 > Project: Traffic Server > Issue Type: Improvement > Components: Plugins >Reporter: Nolan Astrein >Assignee: Gancho Tenev > Fix For: 7.1.0 > > > A TrafficServer administrator should be able to specify whether or not the > cache_range_requests plugin should modify the cache key. The cache key may > be modified by a previous plugin in a plugin chain and there is no way to > configure cache_range_requests not to do any further modifications to the > cache key. Having multiple plugins responsible for cache key modifications > can cause unexpected behavior, especially when a plugin chain ordering is > changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-4334) The cache_range_requests plugin always attempts to modify the cache key.
[ https://issues.apache.org/jira/browse/TS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505097#comment-15505097 ] Gancho Tenev commented on TS-4334: -- Checked with [~latesonarinn] offline and here is an excerpt: 9/16/16, 12:06 PM Gancho Tenev: {quote} Proposed alternative solution using more generic means (cachekey and header_rewrite) here: [TS-4334|https://issues.apache.org/jira/browse/TS-4334] Could you please let me know if it works for you? {quote} 9/19/16, 9:59 AM Nolan Astrein: {quote} Clever solution. I think that will work. {quote} [~latesonarinn], could you please verify and/or close this Jira if all looks good? (if not please let me know if I can help!) Cheers! > The cache_range_requests plugin always attempts to modify the cache key. > > > Key: TS-4334 > URL: https://issues.apache.org/jira/browse/TS-4334 > Project: Traffic Server > Issue Type: Improvement > Components: Plugins >Reporter: Nolan Astrein >Assignee: Gancho Tenev > Fix For: 7.1.0 > > > A TrafficServer administrator should be able to specify whether or not the > cache_range_requests plugin should modify the cache key. The cache key may > be modified by a previous plugin in a plugin chain and there is no way to > configure cache_range_requests not to do any further modifications to the > cache key. Having multiple plugins responsible for cache key modifications > can cause unexpected behavior, especially when a plugin chain ordering is > changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-4707) Parent Consistent Hash Selection - add fname and maxdirs options.
[ https://issues.apache.org/jira/browse/TS-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504833#comment-15504833 ] Gancho Tenev commented on TS-4707: -- [~jrushford], [~pbchou], [~zwoop], I am not sure if I have studied this enough, but here is an idea: it seems that we could add a switch to the {{cachekey}} plugin to make it use the "modify parent selection URI" API call instead of the "modify cache key URI" API call when something like {{\@plugin=cachekey \@pparam=--parent_selection}} is used. This way we may be able to use different {{cachekey}} plugin instances independently for each remap rule (one for modifying the cache key URI and one for modifying the parent selection URI), cover more use-cases, reuse the {{cachekey}} URI-manipulation functionality, and keep a consistent user experience. If this makes sense/works, we may end up renaming the plugin to something more generic. Currently the {{cachekey}} plugin can already achieve the {{fname}} and {{maxdirs}} features through some of its regex-related features (please see {{--capture-path=/capture/replace/}} in the [cachekey docs|https://docs.trafficserver.apache.org/en/latest/admin-guide/plugins/cachekey.en.html#path-section]; I can provide examples if requested). I think {{fname}} would not make sense for manipulating the cache key URI, and if we insist on non-regex ways to achieve {{fname}} and {{maxdirs}}, we could add them as features available only when {{--parent_selection}} is used, or something along those lines. If this sounds sensible/feasible, I can work on the {{cachekey}} plugin change. > Parent Consistent Hash Selection - add fname and maxdirs options. 
> - > > Key: TS-4707 > URL: https://issues.apache.org/jira/browse/TS-4707 > Project: Traffic Server > Issue Type: Improvement > Components: Parent Proxy >Reporter: Peter Chou >Assignee: Peter Chou > Fix For: 7.1.0 > > Time Spent: 11.5h > Remaining Estimate: 0h > > This enhancement adds two options, "fname" and "maxdirs", which can be used > to exclude the file-name and some of the directories in the path. The > remaining portions of the path are then used as part of the hash computation > for selecting among multiple parent caches. > For our usage, it was desirable from an operational perspective to direct all > components of particular sub-tree to a single parent cache (to simplify > trouble-shooting, pre-loading, etc.). This can be achieved by excluding the > query-string, file-name, and right-most portions of the path from the hash > computation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
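Purely to illustrate the proposal in the comment above, a remap rule might load two independent {{cachekey}} instances, one modifying the cache key and one modifying the parent selection URI. Note that {{--parent_selection}} does not exist yet, the rule below is hypothetical, and whether the plugin can be instantiated twice per rule is part of what would need to be implemented:

```
map http://example.com/ http://origin.example.com/ \
    @plugin=cachekey.so @pparam=--static-prefix=cdn-customer-1 \
    @plugin=cachekey.so @pparam=--parent_selection
```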
[jira] [Commented] (TS-4870) Storage can be marked offline multiple times which breaks related metrics
[ https://issues.apache.org/jira/browse/TS-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494410#comment-15494410 ] Gancho Tenev commented on TS-4870: -- Repeat the test with the patch applied: {code} # Initial cache size (when using both disks). $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total proxy.node.cache.bytes_total 268025856 # Take 1st disk offline. Cache size changes as expected. $ sudo ./bin/traffic_ctl storage offline /dev/sdb $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total proxy.node.cache.bytes_total 134012928 # Take same disk offline again. Now good! $ sudo ./bin/traffic_ctl storage offline /dev/sdb $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total proxy.node.cache.bytes_total 134012928 # Take same disk offline again. Good again. $ sudo ./bin/traffic_ctl storage offline /dev/sdb $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total proxy.node.cache.bytes_total 134012928 {code} > Storage can be marked offline multiple times which breaks related metrics > - > > Key: TS-4870 > URL: https://issues.apache.org/jira/browse/TS-4870 > Project: Traffic Server > Issue Type: Bug > Components: Cache, Metrics >Reporter: Gancho Tenev >Assignee: Gancho Tenev > Fix For: 7.0.0 > > > Let us say traffic server is running with 2 disks > {code} > $ cat etc/trafficserver/storage.config > /dev/sdb > /dev/sdc > $ sudo fdisk -l|grep 'Disk /dev/sd[b|c]' > Disk /dev/sdb: 134 MB, 134217728 bytes > Disk /dev/sdc: 134 MB, 134217728 bytes > {code} > Let us see what happens when we mark the same disk 3 times in a row > ({{/dev/sdb}}) and check the {{proxy.node.cache.bytes_total}}. > {code} > # Initial cache size (when using both disks). > $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total > proxy.node.cache.bytes_total 268025856 > # Take 1st disk offline. Cache size changes as expected. 
> $ sudo ./bin/traffic_ctl storage offline /dev/sdb > $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total > proxy.node.cache.bytes_total 134012928 > # Take same disk offline again. Not good! > $ sudo ./bin/traffic_ctl storage offline /dev/sdb > $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total > proxy.node.cache.bytes_total 0 > # Take same disk offline again. Negative value. > $ sudo ./bin/traffic_ctl storage offline /dev/sdb > $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total > proxy.node.cache.bytes_total -134012928 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-4870) Storage can be marked offline multiple times which breaks related metrics
[ https://issues.apache.org/jira/browse/TS-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4870: - Fix Version/s: 7.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-4870) Storage can be marked offline multiple times which breaks related metrics
[ https://issues.apache.org/jira/browse/TS-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4870: - Component/s: Metrics -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-4870) Storage can be marked offline multiple times which breaks related metrics
[ https://issues.apache.org/jira/browse/TS-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4870: - Assignee: Gancho Tenev -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-4870) Storage can be marked offline multiple times which breaks related metrics
Gancho Tenev created TS-4870: Summary: Storage can be marked offline multiple times which breaks related metrics Key: TS-4870 URL: https://issues.apache.org/jira/browse/TS-4870 Project: Traffic Server Issue Type: Bug Components: Cache Reporter: Gancho Tenev Let us say traffic server is running with 2 disks {code} $ cat etc/trafficserver/storage.config /dev/sdb /dev/sdc $ sudo fdisk -l|grep 'Disk /dev/sd[b|c]' Disk /dev/sdb: 134 MB, 134217728 bytes Disk /dev/sdc: 134 MB, 134217728 bytes {code} Let us see what happens when we mark the same disk 3 times in a row ({{/dev/sdb}}) and check the {{proxy.node.cache.bytes_total}}. {code} # Initial cache size (when using both disks). $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total proxy.node.cache.bytes_total 268025856 # Take 1st disk offline. Cache size changes as expected. $ sudo ./bin/traffic_ctl storage offline /dev/sdb $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total proxy.node.cache.bytes_total 134012928 # Take same disk offline again. Not good! $ sudo ./bin/traffic_ctl storage offline /dev/sdb $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total proxy.node.cache.bytes_total 0 # Take same disk offline again. Negative value. $ sudo ./bin/traffic_ctl storage offline /dev/sdb $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total proxy.node.cache.bytes_total -134012928 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
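The sequence above (268025856 → 134012928 → 0 → -134012928) is the signature of a non-idempotent state transition: each repeated offline call subtracts the disk's contribution from {{proxy.node.cache.bytes_total}} again. A minimal sketch of the mechanics and the obvious guard follows; the structures and names are illustrative assumptions, not the actual ATS cache internals or the committed fix:

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the metric bug: if "storage offline" is not idempotent,
// each repeated call subtracts the disk size from the bytes-total
// metric again, producing 0 and then negative values.
struct Disk {
  int64_t size_bytes;
  bool online = true;
};

int64_t cache_bytes_total = 0; // stands in for proxy.node.cache.bytes_total

// Guarded version: marking an already-offline disk is a no-op, so the
// metric is decremented at most once per disk.
void mark_offline(Disk &d) {
  if (!d.online) {
    return; // already offline; do not subtract the size again
  }
  d.online = false;
  cache_bytes_total -= d.size_bytes;
}
```

With the guard, repeating {{traffic_ctl storage offline /dev/sdb}} leaves the metric at the value after the first call, which is exactly the behavior demonstrated in the patched test session above.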
[jira] [Updated] (TS-4834) Expose bad disk and disk access failures
[ https://issues.apache.org/jira/browse/TS-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4834: - Description: We would like to monitor low-level disk access failures and disks marked by ATS as bad. I have a patch that exposes that information through {code} proxy.process.cache.disk_error_count 10 proxy.process.cache.disk_bad_count 5 {code} and the following test shows how it would work... Start ATS with 2 disks and tail {{diags.log}} {code} $ cat etc/trafficserver/storage.config /dev/sdb /dev/sdc $ tail -f var/log/trafficserver/diags.log [Sep 8 12:18:48.149] Server {0x2b5f43db54c0} NOTE: traffic server running [Sep 8 12:18:48.198] Server {0x2b5f44654700} NOTE: cache enabled {code} Check related metrics and observe all 0s {code} $ ./bin/traffic_ctl metric match "proxy.process.cache*.disk.*" "proxy.process.cache.*(read|write).failure" "proxy.process.http.cache_(read|write)_errors" proxy.process.cache.disk_error_count 0 proxy.process.cache.disk_bad_count 0 proxy.process.cache.read.failure 0 proxy.process.cache.write.failure 0 proxy.process.cache.volume_0.read.failure 0 proxy.process.cache.volume_0.write.failure 0 proxy.process.http.cache_write_errors 0 proxy.process.http.cache_read_errors 0 {code} Now, using your favorite hard-disk failure-injection tool, inject 10 failures by setting both disks used by this setup ({{/dev/sdb}} and {{/dev/sdc}}) to fail all reads, then shoot 5 requests causing 10 failed reads. 
{code} $ for i in 1 2 3 4 5; do curl -x 127.0.0.1:80 http://example.com/1 -o /dev/null -s; done $ tail -f var/log/trafficserver/diags.log [Sep 8 12:19:09.758] Server {0x2aaab4302700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.759] Server {0x2aaac0100700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.764] Server {0x2b5f43db54c0} WARNING: Error accessing Disk /dev/sdb [1/10] [Sep 8 12:19:09.769] Server {0x2b5f44654700} WARNING: Error accessing Disk /dev/sdb [2/10] [Sep 8 12:19:09.785] Server {0x2aaac0100700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.786] Server {0x2aaab4302700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.791] Server {0x2b5f44654700} WARNING: Error accessing Disk /dev/sdb [3/10] [Sep 8 12:19:09.796] Server {0x2b5f43db54c0} WARNING: Error accessing Disk /dev/sdb [4/10] [Sep 8 12:19:09.812] Server {0x2aaab4100700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.813] Server {0x2aaacc100700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.817] Server {0x2b5f43db54c0} WARNING: Error accessing Disk /dev/sdb [5/10] [Sep 8 12:19:09.823] Server {0x2b5f44654700} WARNING: Error accessing Disk /dev/sdb [6/10] [Sep 8 12:19:09.843] Server {0x2aaacc302700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.844] Server {0x2aaad8100700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.847] Server {0x2b5f44654700} WARNING: Error accessing Disk /dev/sdb [7/10] [Sep 8 12:19:09.854] Server {0x2b5f43db54c0} WARNING: Error accessing Disk /dev/sdb [8/10] [Sep 8 12:19:09.874] Server {0x2aaacc302700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.875] Server {0x2aaad8100700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.880] Server {0x2b5f43db54c0} WARNING: Error accessing Disk /dev/sdb [9/10] [Sep 8 12:19:09.887] Server {0x2b5f44654700} WARNING: too many errors accessing disk /dev/sdb [10/10]: declaring disk bad 
{code} We see 5 read failures which triggered 10 actual disk reads and marked the failing disk as a bad disk. {code} $ ./bin/traffic_ctl metric match "proxy.process.cache*.disk.*" "proxy.process.cache.*(read|write).failure" "proxy.process.http.cache_(read|write)_errors" proxy.process.cache.disk_error_count 10 proxy.process.cache.disk_bad_count 1 proxy.process.cache.read.failure 5 proxy.process.cache.write.failure 5 proxy.process.cache.volume_0.read.failure 5 proxy.process.cache.volume_0.write.failure 5 proxy.process.http.cache_write_errors 0 proxy.process.http.cache_read_errors 0 {code} Now shoot 5 requests causing 10 failed reads. {code} $ for i in 1 2 3 4 5; do curl -x 127.0.0.1:80 http://example.com/1 -o /dev/null -s; done $ tail -f var/log/trafficserver/diags.log [Sep 8 12:26:02.874] Server {0x2aaae4100700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:26:02.875] Server {0x2aaaf0302700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:26:02.876] Server {0x2b5f44654700} WARNING: Error accessing Disk /dev/sdc [1/10] [Sep 8 12:26:02.885] Server {0x2b5f43db54c0} WARNING: Error accessing Disk /dev/sdc [2/10] [Sep 8 12:26:02.902] Server {0x2aaaf0302700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:26:02.902] Server {0x2aaae4100700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:26:02.907] Server {0x2b5f43db54c0} WARNING: Error
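The new counters lend themselves to simple monitoring. A minimal sketch that parses the {{name value}} lines emitted by {{traffic_ctl metric match}} and flags trouble (the alert action and thresholds are placeholder assumptions, not part of the patch):

```shell
# Reads "metric-name value" lines on stdin and prints an alert when the
# disk health counters exposed by this patch are non-zero. The alert
# action (echo) and the non-zero thresholds are placeholders.
check_disk_metrics() {
  while read -r name value; do
    case "$name" in
      proxy.process.cache.disk_bad_count)
        [ "$value" -gt 0 ] && echo "ALERT: $value disk(s) declared bad" ;;
      proxy.process.cache.disk_error_count)
        [ "$value" -gt 0 ] && echo "WARN: $value disk access errors" ;;
    esac
  done
  return 0
}

# Usage against a running ATS:
#   ./bin/traffic_ctl metric match 'proxy.process.cache.disk_(error|bad)_count' \
#     | check_disk_metrics
```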
[jira] [Updated] (TS-4834) Expose bad disk and disk access failures
[ https://issues.apache.org/jira/browse/TS-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4834: - Assignee: Gancho Tenev > Expose bad disk and disk access failures > > > Key: TS-4834 > URL: https://issues.apache.org/jira/browse/TS-4834 > Project: Traffic Server > Issue Type: Improvement > Components: Cache, Metrics >Reporter: Gancho Tenev >Assignee: Gancho Tenev > Fix For: 7.0.0 > > > We would like to monitor low-level disk access failures and disks marked by > ATS as bad. > I have a patch that exposes that information through > {code} > proxy.process.cache.disk_error_count 10 > proxy.process.cache.disk_bad_count 5 > {code} > and the following test shows how it would work... > Start ATS with 2 disks and tail {{diags.log}} > {code} > $ cat etc/trafficserver/storage.config > /dev/sdb > /dev/sdc > $ tail -f var/log/trafficserver/diags.log > [Sep 8 12:18:48.149] Server {0x2b5f43db54c0} NOTE: traffic server running > [Sep 8 12:18:48.198] Server {0x2b5f44654700} NOTE: cache enabled > {code} > Check related metrics and observe all 0s > {code} > $ ./bin/traffic_ctl metric match "proxy.process.cache*.disk.*" > "proxy.process.cache.*(read|write).failure" > "proxy.process.http.cache_(read|write)_errors" > proxy.process.cache.disk_error_count 0 > proxy.process.cache.disk_bad_count 0 > proxy.process.cache.read.failure 0 > proxy.process.cache.write.failure 0 > proxy.process.cache.volume_0.read.failure 0 > proxy.process.cache.volume_0.write.failure 0 > proxy.process.http.cache_write_errors 0 > proxy.process.http.cache_read_errors 0 > {code} > Now, using your favorite hard disk failure injection tool, inject 10 failures > by setting both disks used by this setup ({{/dev/sdb}} and {{/dev/sdc}}) to > fail all reads, and shoot 5 requests causing 10 failed reads. 
> {code} > $ for i in 1 2 3 4 5; do curl -x 127.0.0.1:80 http://example.com/1 -o > /dev/null -s; done > $ tail -f var/log/trafficserver/diags.log > [Sep 8 12:19:09.758] Server {0x2aaab4302700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.759] Server {0x2aaac0100700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.764] Server {0x2b5f43db54c0} WARNING: Error accessing Disk > /dev/sdb [1/10] > [Sep 8 12:19:09.769] Server {0x2b5f44654700} WARNING: Error accessing Disk > /dev/sdb [2/10] > [Sep 8 12:19:09.785] Server {0x2aaac0100700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.786] Server {0x2aaab4302700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.791] Server {0x2b5f44654700} WARNING: Error accessing Disk > /dev/sdb [3/10] > [Sep 8 12:19:09.796] Server {0x2b5f43db54c0} WARNING: Error accessing Disk > /dev/sdb [4/10] > [Sep 8 12:19:09.812] Server {0x2aaab4100700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.813] Server {0x2aaacc100700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.817] Server {0x2b5f43db54c0} WARNING: Error accessing Disk > /dev/sdb [5/10] > [Sep 8 12:19:09.823] Server {0x2b5f44654700} WARNING: Error accessing Disk > /dev/sdb [6/10] > [Sep 8 12:19:09.843] Server {0x2aaacc302700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.844] Server {0x2aaad8100700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.847] Server {0x2b5f44654700} WARNING: Error accessing Disk > /dev/sdb [7/10] > [Sep 8 12:19:09.854] Server {0x2b5f43db54c0} WARNING: Error accessing Disk > /dev/sdb [8/10] > [Sep 8 12:19:09.874] Server {0x2aaacc302700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.875] Server {0x2aaad8100700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.880] Server {0x2b5f43db54c0} WARNING: Error accessing Disk > /dev/sdb [9/10] > [Sep 8 12:19:09.887] Server 
{0x2b5f44654700} WARNING: too many errors > accessing disk /dev/sdb [10/10]: declaring disk bad > {code} > We see 5 read failures which triggered 10 actual > disk reads and marked the > failing disk as a bad disk. > {code} > $ ./bin/traffic_ctl metric match "proxy.process.cache*.disk.*" > "proxy.process.cache.*(read|write).failure" > "proxy.process.http.cache_(read|write)_errors" > proxy.process.cache.disk_error_count 10 > proxy.process.cache.disk_bad_count 1 > proxy.process.cache.read.failure 5 > proxy.process.cache.write.failure 5 > proxy.process.cache.volume_0.read.failure 5 > proxy.process.cache.volume_0.write.failure 5 > proxy.process.http.cache_write_errors 0 > proxy.process.http.cache_read_errors 0 > {code} > Now shoot 5 requests causing 10 failed reads. > {code} > $ for i in 1 2 3 4 5; do curl -x 127.0.0.1:80 http://example.com/1 -o > /dev/null -s; done > $ tail -f var/log/trafficserver/diags.log > [Sep 8 12:26:02.874] Server
[jira] [Created] (TS-4834) Expose bad disk and disk access failures
Gancho Tenev created TS-4834: Summary: Expose bad disk and disk access failures Key: TS-4834 URL: https://issues.apache.org/jira/browse/TS-4834 Project: Traffic Server Issue Type: Improvement Components: Cache, Metrics Reporter: Gancho Tenev We would like to monitor low-level disk access failures and disks marked by ATS as bad. I have a patch that exposes that information through {code} proxy.process.cache.disk_error_count 10 proxy.process.cache.disk_bad_count 5 {code} and the following test shows how it would work... Start ATS with 2 disks and tail {{diags.log}} {code} $ cat etc/trafficserver/storage.config /dev/sdb /dev/sdc $ tail -f var/log/trafficserver/diags.log [Sep 8 12:18:48.149] Server {0x2b5f43db54c0} NOTE: traffic server running [Sep 8 12:18:48.198] Server {0x2b5f44654700} NOTE: cache enabled {code} Check related metrics and observe all 0s {code} $ ./bin/traffic_ctl metric match "proxy.process.cache*.disk.*" "proxy.process.cache.*(read|write).failure" "proxy.process.http.cache_(read|write)_errors" proxy.process.cache.disk_error_count 0 proxy.process.cache.disk_bad_count 0 proxy.process.cache.read.failure 0 proxy.process.cache.write.failure 0 proxy.process.cache.volume_0.read.failure 0 proxy.process.cache.volume_0.write.failure 0 proxy.process.http.cache_write_errors 0 proxy.process.http.cache_read_errors 0 {code} Now, using your favorite hard disk failure injection tool, inject 10 failures by setting both disks used by this setup ({{/dev/sdb}} and {{/dev/sdc}}) to fail all reads, and shoot 5 requests causing 10 failed reads. 
{code} $ for i in 1 2 3 4 5; do curl -x 127.0.0.1:80 http://example.com/1 -o /dev/null -s; done $ tail -f var/log/trafficserver/diags.log [Sep 8 12:19:09.758] Server {0x2aaab4302700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.759] Server {0x2aaac0100700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.764] Server {0x2b5f43db54c0} WARNING: Error accessing Disk /dev/sdb [1/10] [Sep 8 12:19:09.769] Server {0x2b5f44654700} WARNING: Error accessing Disk /dev/sdb [2/10] [Sep 8 12:19:09.785] Server {0x2aaac0100700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.786] Server {0x2aaab4302700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.791] Server {0x2b5f44654700} WARNING: Error accessing Disk /dev/sdb [3/10] [Sep 8 12:19:09.796] Server {0x2b5f43db54c0} WARNING: Error accessing Disk /dev/sdb [4/10] [Sep 8 12:19:09.812] Server {0x2aaab4100700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.813] Server {0x2aaacc100700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.817] Server {0x2b5f43db54c0} WARNING: Error accessing Disk /dev/sdb [5/10] [Sep 8 12:19:09.823] Server {0x2b5f44654700} WARNING: Error accessing Disk /dev/sdb [6/10] [Sep 8 12:19:09.843] Server {0x2aaacc302700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.844] Server {0x2aaad8100700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.847] Server {0x2b5f44654700} WARNING: Error accessing Disk /dev/sdb [7/10] [Sep 8 12:19:09.854] Server {0x2b5f43db54c0} WARNING: Error accessing Disk /dev/sdb [8/10] [Sep 8 12:19:09.874] Server {0x2aaacc302700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.875] Server {0x2aaad8100700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:19:09.880] Server {0x2b5f43db54c0} WARNING: Error accessing Disk /dev/sdb [9/10] [Sep 8 12:19:09.887] Server {0x2b5f44654700} WARNING: too many errors accessing disk /dev/sdb [10/10]: declaring disk bad 
{code} We see 5 read failures which triggered 10 actual disk reads and marked the failing disk as a bad disk. {code} $ ./bin/traffic_ctl metric match "proxy.process.cache*.disk.*" "proxy.process.cache.*(read|write).failure" "proxy.process.http.cache_(read|write)_errors" proxy.process.cache.disk_error_count 10 proxy.process.cache.disk_bad_count 1 proxy.process.cache.read.failure 5 proxy.process.cache.write.failure 5 proxy.process.cache.volume_0.read.failure 5 proxy.process.cache.volume_0.write.failure 5 proxy.process.http.cache_write_errors 0 proxy.process.http.cache_read_errors 0 {code} Now shoot 5 requests causing 10 failed reads. {code} $ for i in 1 2 3 4 5; do curl -x 127.0.0.1:80 http://example.com/1 -o /dev/null -s; done $ tail -f var/log/trafficserver/diags.log [Sep 8 12:26:02.874] Server {0x2aaae4100700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:26:02.875] Server {0x2aaaf0302700} WARNING: cache disk operation failed READ -1 0 [Sep 8 12:26:02.876] Server {0x2b5f44654700} WARNING: Error accessing Disk /dev/sdc [1/10] [Sep 8 12:26:02.885] Server {0x2b5f43db54c0} WARNING: Error accessing Disk /dev/sdc [2/10] [Sep 8 12:26:02.902] Server {0x2aaaf0302700} WARNING: cache disk operation failed READ -1
[jira] [Updated] (TS-4834) Expose bad disk and disk access failures
[ https://issues.apache.org/jira/browse/TS-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4834: - Fix Version/s: 7.0.0 > Expose bad disk and disk access failures > > > Key: TS-4834 > URL: https://issues.apache.org/jira/browse/TS-4834 > Project: Traffic Server > Issue Type: Improvement > Components: Cache, Metrics >Reporter: Gancho Tenev >Assignee: Gancho Tenev > Fix For: 7.0.0 > > > We would like to monitor low-level disk access failures and disks marked by > ATS as bad. > I have a patch that exposes that information through > {code} > proxy.process.cache.disk_error_count 10 > proxy.process.cache.disk_bad_count 5 > {code} > and the following test shows how it would work... > Start ATS with 2 disks and tail {{diags.log}} > {code} > $ cat etc/trafficserver/storage.config > /dev/sdb > /dev/sdc > $ tail -f var/log/trafficserver/diags.log > [Sep 8 12:18:48.149] Server {0x2b5f43db54c0} NOTE: traffic server running > [Sep 8 12:18:48.198] Server {0x2b5f44654700} NOTE: cache enabled > {code} > Check related metrics and observe all 0s > {code} > $ ./bin/traffic_ctl metric match "proxy.process.cache*.disk.*" > "proxy.process.cache.*(read|write).failure" > "proxy.process.http.cache_(read|write)_errors" > proxy.process.cache.disk_error_count 0 > proxy.process.cache.disk_bad_count 0 > proxy.process.cache.read.failure 0 > proxy.process.cache.write.failure 0 > proxy.process.cache.volume_0.read.failure 0 > proxy.process.cache.volume_0.write.failure 0 > proxy.process.http.cache_write_errors 0 > proxy.process.http.cache_read_errors 0 > {code} > Now, using your favorite hard disk failure injection tool, inject 10 failures > by setting both disks used by this setup ({{/dev/sdb}} and {{/dev/sdc}}) to > fail all reads, and shoot 5 requests causing 10 failed reads. 
> {code} > $ for i in 1 2 3 4 5; do curl -x 127.0.0.1:80 http://example.com/1 -o > /dev/null -s; done > $ tail -f var/log/trafficserver/diags.log > [Sep 8 12:19:09.758] Server {0x2aaab4302700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.759] Server {0x2aaac0100700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.764] Server {0x2b5f43db54c0} WARNING: Error accessing Disk > /dev/sdb [1/10] > [Sep 8 12:19:09.769] Server {0x2b5f44654700} WARNING: Error accessing Disk > /dev/sdb [2/10] > [Sep 8 12:19:09.785] Server {0x2aaac0100700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.786] Server {0x2aaab4302700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.791] Server {0x2b5f44654700} WARNING: Error accessing Disk > /dev/sdb [3/10] > [Sep 8 12:19:09.796] Server {0x2b5f43db54c0} WARNING: Error accessing Disk > /dev/sdb [4/10] > [Sep 8 12:19:09.812] Server {0x2aaab4100700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.813] Server {0x2aaacc100700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.817] Server {0x2b5f43db54c0} WARNING: Error accessing Disk > /dev/sdb [5/10] > [Sep 8 12:19:09.823] Server {0x2b5f44654700} WARNING: Error accessing Disk > /dev/sdb [6/10] > [Sep 8 12:19:09.843] Server {0x2aaacc302700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.844] Server {0x2aaad8100700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.847] Server {0x2b5f44654700} WARNING: Error accessing Disk > /dev/sdb [7/10] > [Sep 8 12:19:09.854] Server {0x2b5f43db54c0} WARNING: Error accessing Disk > /dev/sdb [8/10] > [Sep 8 12:19:09.874] Server {0x2aaacc302700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.875] Server {0x2aaad8100700} WARNING: cache disk operation > failed READ -1 0 > [Sep 8 12:19:09.880] Server {0x2b5f43db54c0} WARNING: Error accessing Disk > /dev/sdb [9/10] > [Sep 8 12:19:09.887] Server 
{0x2b5f44654700} WARNING: too many errors > accessing disk /dev/sdb [10/10]: declaring disk bad > {code} > We see 5 read failures which triggered 10 actual > disk reads and marked the > failing disk as a bad disk. > {code} > $ ./bin/traffic_ctl metric match "proxy.process.cache*.disk.*" > "proxy.process.cache.*(read|write).failure" > "proxy.process.http.cache_(read|write)_errors" > proxy.process.cache.disk_error_count 10 > proxy.process.cache.disk_bad_count 1 > proxy.process.cache.read.failure 5 > proxy.process.cache.write.failure 5 > proxy.process.cache.volume_0.read.failure 5 > proxy.process.cache.volume_0.write.failure 5 > proxy.process.http.cache_write_errors 0 > proxy.process.http.cache_read_errors 0 > {code} > Now shoot 5 requests causing 10 failed reads. > {code} > $ for i in 1 2 3 4 5; do curl -x 127.0.0.1:80 http://example.com/1 -o > /dev/null -s; done > $ tail -f var/log/trafficserver/diags.log > [Sep 8 12:26:02.874] Server
[jira] [Commented] (TS-4334) The cache_range_requests plugin always attempts to modify the cache key.
[ https://issues.apache.org/jira/browse/TS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468540#comment-15468540 ] Gancho Tenev commented on TS-4334: -- [~jamesf], I am sorry if I did not make my idea/example clear enough! I was proposing not to use the {{cache_range_requests}} plugin at all, but instead to use the {{cachekey}} plugin as the central place for cache key manipulation, and then use the {{header_rewrite}} plugin to implement the rest of the logic done by {{cache_range_requests}} (adding/removing the Range header at different hooks). The end result should be practically the same as using the {{cache_range_requests}} plugin, but achieved with more generic plugins (I tested it) instead of hacking into {{cache_range_requests}}, which seems pretty specialized and less configurable. I was wondering if this would work for you. Please let me know; I would gladly help with any problems/concerns. Cheers! > The cache_range_requests plugin always attempts to modify the cache key. > > > Key: TS-4334 > URL: https://issues.apache.org/jira/browse/TS-4334 > Project: Traffic Server > Issue Type: Improvement > Components: Plugins >Reporter: Nolan Astrein >Assignee: Gancho Tenev > Fix For: 7.1.0 > > > A TrafficServer administrator should be able to specify whether or not the > cache_range_requests plugin should modify the cache key. The cache key may > be modified by a previous plugin in a plugin chain and there is no way to > configure cache_range_requests not to do any further modifications to the > cache key. Having multiple plugins responsible for cache key modifications > can cause unexpected behavior, especially when a plugin chain ordering is > changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TS-4809) [header_rewrite] check to make sure "hook" conditions are first in the rule set
[ https://issues.apache.org/jira/browse/TS-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459117#comment-15459117 ] Gancho Tenev edited comment on TS-4809 at 9/2/16 5:51 PM: -- Provided a patch which would error like this: {code} 20160901.23h17m13s [header_rewrite] cond %{REMAP_PSEUDO_HOOK} at hdrs.config:2 should be the first hook condition in the rule set and each rule set should contain only one hook condition {code} In the following 2 use-cases: * The hook condition is not the first in the rule set. {code} $ sudo cat etc/trafficserver/hdrs.config cond %{TRUE} cond %{REMAP_PSEUDO_HOOK} set-header Some-Header "some value" {code} * There are 2 hook conditions in the same rule set. {code} $ sudo cat etc/trafficserver/hdrs.config cond %{REMAP_PSEUDO_HOOK} cond %{TRUE} cond %{SEND_RESPONSE_HDR_HOOK} set-header Some-Header "some value" {code} Also added line numbers to the error messages in {{RuleSet::add_condition()}} and {{RuleSet::add_operator()}}. was (Author: gancho): Provided a patch which would error like this: {code} 20160901.23h17m13s [header_rewrite] cond %{REMAP_PSEUDO_HOOK} at hdrs.config:2 should be the first hook condition in the rule set and each rule set should contain only one hook condition {code} In the following 2 use-cases: * The hook condition is not the first in the rule set. {code} $ sudo cat etc/trafficserver/hdrs.config cond %{TRUE} cond %{REMAP_PSEUDO_HOOK} set-header Some-Header "some value" {code} * There are 2 hook conditions in the same rule set. {code} $ sudo cat etc/trafficserver/hdrs.config cond %{REMAP_PSEUDO_HOOK} cond %{TRUE} cond %{SEND_RESPONSE_HDR_HOOK} set-header Some-Header "some value" {code} Also added a line numbers to the error messages in {{RuleSet::add_condition()}} and {{RuleSet::add_operator()}}. 
> [header_rewrite] check to make sure "hook" conditions are first in the rule > set > > > Key: TS-4809 > URL: https://issues.apache.org/jira/browse/TS-4809 > Project: Traffic Server > Issue Type: Improvement > Components: Plugins >Reporter: Gancho Tenev >Assignee: Gancho Tenev > Fix For: 7.0.0 > > > The following configuration > {code} > $ cat etc/trafficserver/remap.config > map http://example.com http://127.0.0.1: \ > @plugin=header_rewrite.so @pparam=hdrs.config > $ cat etc/trafficserver/hdrs.config > cond %{TRUE} > cond %{REMAP_PSEUDO_HOOK} >set-header Some-Header "some value" > {code} > Triggers the following error which does not show what and where the problem > is: > {code} > 20160901.23h17m13s [header_rewrite] Unknown condition: REMAP_PSEUDO_HOOK > {code} > I would like to add a check which will prevent the above error and print > another error clarifying where and what the problem is, for instance: > {code} > 20160901.23h17m13s [header_rewrite] cond %{REMAP_PSEUDO_HOOK} should come > first in the rule set at hdrs.config:2 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
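The same ordering rule can also be checked before deploying a config. A rough standalone pre-flight sketch (an awk approximation of the patch's check, not the plugin's actual parser; it assumes conditions start at column 0 with {{cond}} and that any other non-empty line is an operator that ends the rule set):

```shell
# Flags a header_rewrite rule set where a %{..._HOOK} condition is not the
# first condition, or where one rule set contains more than one hook
# condition. Illustrative approximation only, not the plugin's real parser.
check_hook_order() {
  awk '
    /^cond %\{[A-Z_]+_HOOK\}/ {
      hooks++
      if (conds > 0 || hooks > 1) {
        printf "%s:%d: hook condition must be first and unique in the rule set\n", FILENAME, FNR
        bad = 1
      }
      conds++
      next
    }
    /^cond/ { conds++; next }
    NF      { conds = 0; hooks = 0 }   # an operator line ends the rule set
    END     { exit bad }
  ' "$1"
}
```

Running it against the first bad config above reports line 2, matching the error the patch emits at {{hdrs.config:2}}.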
[jira] [Commented] (TS-4809) [header_rewrite] check to make sure "hook" conditions are first in the rule set
[ https://issues.apache.org/jira/browse/TS-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459117#comment-15459117 ] Gancho Tenev commented on TS-4809: -- Provided a patch which would error like this: {code} 20160901.23h17m13s [header_rewrite] cond %{REMAP_PSEUDO_HOOK} at hdrs.config:2 should be the first hook condition in the rule set and each rule set should contain only one hook condition {code} In the following 2 use-cases: * The hook condition is not the first in the rule set. {code} $ sudo cat etc/trafficserver/hdrs.config cond %{TRUE} cond %{REMAP_PSEUDO_HOOK} set-header Some-Header "some value" {code} * There are 2 hook conditions in the same rule set. {code} $ sudo cat etc/trafficserver/hdrs.config cond %{REMAP_PSEUDO_HOOK} cond %{TRUE} cond %{SEND_RESPONSE_HDR_HOOK} set-header Some-Header "some value" {code} Also added a line numbers to the error messages in {{RuleSet::add_condition()}} and {{RuleSet::add_operator()}}. > [header_rewrite] check to make sure "hook" conditions are first in the rule > set > > > Key: TS-4809 > URL: https://issues.apache.org/jira/browse/TS-4809 > Project: Traffic Server > Issue Type: Improvement > Components: Plugins >Reporter: Gancho Tenev >Assignee: Gancho Tenev > Fix For: 7.0.0 > > > The following configuration > {code} > $ cat etc/trafficserver/remap.config > map http://example.com http://127.0.0.1: \ > @plugin=header_rewrite.so @pparam=hdrs.config > $ cat etc/trafficserver/hdrs.config > cond %{TRUE} > cond %{REMAP_PSEUDO_HOOK} >set-header Some-Header "some value" > {code} > Triggers the following error which does not show what and where the problem > is: > {code} > 20160901.23h17m13s [header_rewrite] Unknown condition: REMAP_PSEUDO_HOOK > {code} > I would like to add a check which will prevent the above error and print > another error clarifying where and what the problem is, for instance: > {code} > 20160901.23h17m13s [header_rewrite] cond %{REMAP_PSEUDO_HOOK} should come > first in the rule 
set at hdrs.config:2 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-4809) [header_rewrite] check to make sure "hook" conditions are first in the rule set
[ https://issues.apache.org/jira/browse/TS-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4809: - Assignee: Gancho Tenev > [header_rewrite] check to make sure "hook" conditions are first in the rule > set > > > Key: TS-4809 > URL: https://issues.apache.org/jira/browse/TS-4809 > Project: Traffic Server > Issue Type: Improvement > Components: Plugins >Reporter: Gancho Tenev >Assignee: Gancho Tenev > > The following configuration > {code} > $ cat etc/trafficserver/remap.config > map http://example.com http://127.0.0.1: \ > @plugin=header_rewrite.so @pparam=hdrs.config > $ cat etc/trafficserver/hdrs.config > cond %{TRUE} > cond %{REMAP_PSEUDO_HOOK} >set-header Some-Header "some value" > {code} > Triggers the following error which does not show what and where the problem > is: > {code} > 20160901.23h17m13s [header_rewrite] Unknown condition: REMAP_PSEUDO_HOOK > {code} > I would like to add a check which will prevent the above error and print > another error clarifying where and what the problem is, for instance: > {code} > 20160901.23h17m13s [header_rewrite] cond %{REMAP_PSEUDO_HOOK} should come > first in the rule set at hdrs.config:2 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-4809) [header_rewrite] check to make sure "hook" conditions are first in the rule set
Gancho Tenev created TS-4809: Summary: [header_rewrite] check to make sure "hook" conditions are first in the rule set Key: TS-4809 URL: https://issues.apache.org/jira/browse/TS-4809 Project: Traffic Server Issue Type: Improvement Components: Plugins Reporter: Gancho Tenev The following configuration {code} $ cat etc/trafficserver/remap.config map http://example.com http://127.0.0.1: \ @plugin=header_rewrite.so @pparam=hdrs.config $ cat etc/trafficserver/hdrs.config cond %{TRUE} cond %{REMAP_PSEUDO_HOOK} set-header Some-Header "some value" {code} Triggers the following error which does not show what and where the problem is: {code} 20160901.23h17m13s [header_rewrite] Unknown condition: REMAP_PSEUDO_HOOK {code} I would like to add a check which will prevent the above error and print another error clarifying where and what the problem is, for instance: {code} 20160901.23h17m13s [header_rewrite] cond %{REMAP_PSEUDO_HOOK} should come first in the rule set at hdrs.config:2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TS-4334) The cache_range_requests plugin always attempts to modify the cache key.
[ https://issues.apache.org/jira/browse/TS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15441913#comment-15441913 ] Gancho Tenev edited comment on TS-4334 at 8/27/16 5:16 PM: --- It seems to me that the {{cache_range_requests}} functionality can be achieved by using more generic (and feature-rich) plugins like {{header_rewrite}} and {{cachekey}}. Here is an example where {{httpbin.org}} is used as origin which responds to range requests adding {{Cache-Control:max-age=10}} header. There are 2 remap rules defined, one for {{example_no_cache.com}} which is non-caching (ATS does not cache 206 responses by default) and one for {{example.com}} which is caching exactly like the {{cache_range_requests}} plugin does. To test run {{traffic_server}} and {{curl}} 4 times every 5 seconds - 2 times to test the non-caching remap and 2 times to test the "cache_range_requests"-style caching. Here are the configs: {code} $ cat etc/trafficserver/remap.config map http://example.com http://httpbin.org \ @plugin=cachekey.so @pparam=--include-headers=@Original-Range \ @plugin=header_rewrite.so @pparam=cache_range_local.config map http://example_no_cache.com http://httpbin.org \ @plugin=cachekey.so $ cat etc/trafficserver/cache_range_global.config cond %{READ_REQUEST_HDR_HOOK} cond %{CLIENT-URL:HOST} example.com set-header @Original-Range %{HEADER:Range} rm-header Range $ cat etc/trafficserver/cache_range_local.config cond %{SEND_REQUEST_HDR_HOOK} set-header Range %{HEADER:@Original-Range} cond %{READ_RESPONSE_HDR_HOOK} cond %{STATUS} =206 set-status 200 set-header Cache-Control "max-age=10" cond %{SEND_RESPONSE_HDR_HOOK} cond %{STATUS} =200 set-status 206 $ cat etc/trafficserver/plugin.config header_rewrite.so cache_range_global.config xdebug.so {code} And here is a sample test: {code} $ sudo ./bin/traffic_server -T 'header_rewrite|cachekey' --clear_cache . . . 
$ for domain in example_no_cache.com example_no_cache.com example.com example.com; do curl -x 127.0.0.1:80 -v "http://${domain}/range/1024" -H "X-Debug: X-Cache,X-Cache-Key" -r0-16 -s 2>&1|grep -e "HTTP" -e "Cache"; echo "---"; sleep 5; done > GET http://example_no_cache.com/range/1024 HTTP/1.1 > X-Debug: X-Cache,X-Cache-Key < HTTP/1.1 206 PARTIAL CONTENT < X-Cache-Key: /example_no_cache.com/80/range/1024 < X-Cache: miss --- > GET http://example_no_cache.com/range/1024 HTTP/1.1 > X-Debug: X-Cache,X-Cache-Key < HTTP/1.1 206 PARTIAL CONTENT < X-Cache-Key: /example_no_cache.com/80/range/1024 < X-Cache: miss --- > GET http://example.com/range/1024 HTTP/1.1 > X-Debug: X-Cache,X-Cache-Key < HTTP/1.1 206 Partial Content < Cache-Control: max-age=10 < X-Cache-Key: /example.com/80/@Original-Range:bytes=0-16/range/1024 < X-Cache: miss --- > GET http://example.com/range/1024 HTTP/1.1 > X-Debug: X-Cache,X-Cache-Key < HTTP/1.1 206 Partial Content < Cache-Control: max-age=10 < X-Cache-Key: /example.com/80/@Original-Range:bytes=0-16/range/1024 < X-Cache: hit-fresh --- {code} Please let me know if it works for you! Cheers, --Gancho was (Author: gancho): It seems to me that the {{cache_range_requests}} functionality can be achieved by using more generic (and feature-rich) plugins like {{header_rewrite}} and {{cachekey}}. Here is a example where {{httpbin.org}} is used as origin which responds to range requests adding {{Cache-Control:max-age=10}} header. There are 2 remap rules defined, one for {{example_no_cache.com}} which is non-caching (ATS does not cache 206 responses by default) and one for {{example.com}} which is caching exactly like the {{cache_range_requests}} plugin does. To test run {{traffic_server}} and {{curl}} 4 times every 5 seconds - 2 times to test the non-caching remap and 2 times to test the "cache_range_requests"-style caching. 
Here are the configs: {code} $ cat etc/trafficserver/remap.config map http://example.com http://httpbin.org \ @plugin=cachekey.so @pparam=--include-headers=@Original-Range \ @plugin=header_rewrite.so @pparam=cache_range_local.config map http://example_no_cache.com http://httpbin.org \ @plugin=cachekey.so $ cat etc/trafficserver/cache_range_global.config cond %{READ_REQUEST_HDR_HOOK} cond %{CLIENT-URL:HOST} example.com set-header @Original-Range %{HEADER:Range} rm-header Range $ cat etc/trafficserver/cache_range_local.config cond %{SEND_REQUEST_HDR_HOOK} set-header Range %{HEADER:@Original-Range} cond %{READ_RESPONSE_HDR_HOOK} cond %{STATUS} =206 set-status 200 set-header Cache-Control "max-age=10" cond %{SEND_RESPONSE_HDR_HOOK} cond %{STATUS} =200 set-status 206 $ cat etc/trafficserver/plugin.config header_rewrite.so cache_range_global.config xdebug.so {code} And here is a sample test: {code} $ sudo ./bin/traffic_server -T 'header_rewrite|cachekey' --clear_cache . . . $ for domain in example_no_cache.com example_no_cache.com example.com
[jira] [Comment Edited] (TS-4334) The cache_range_requests plugin always attempts to modify the cache key.
[ https://issues.apache.org/jira/browse/TS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15441913#comment-15441913 ] Gancho Tenev edited comment on TS-4334 at 8/27/16 5:15 PM: --- It seems to me that the {{cache_range_requests}} functionality can be achieved by using more generic (and feature-rich) plugins like {{header_rewrite}} and {{cachekey}}. Here is an example where {{httpbin.org}} is used as the origin, which responds to range requests adding a {{Cache-Control:max-age=10}} header. There are 2 remap rules defined, one for {{example_no_cache.com}} which is non-caching (ATS does not cache 206 responses by default) and one for {{example.com}} which is caching exactly like the {{cache_range_requests}} plugin does. To test, run {{traffic_server}} and {{curl}} 4 times every 5 seconds - 2 times to test the non-caching remap and 2 times to test the "cache_range_requests"-style caching. Here are the configs: {code} $ cat etc/trafficserver/remap.config map http://example.com http://httpbin.org \ @plugin=cachekey.so @pparam=--include-headers=@Original-Range \ @plugin=header_rewrite.so @pparam=cache_range_local.config map http://example_no_cache.com http://httpbin.org \ @plugin=cachekey.so $ cat etc/trafficserver/cache_range_global.config cond %{READ_REQUEST_HDR_HOOK} cond %{CLIENT-URL:HOST} example.com set-header @Original-Range %{HEADER:Range} rm-header Range $ cat etc/trafficserver/cache_range_local.config cond %{SEND_REQUEST_HDR_HOOK} set-header Range %{HEADER:@Original-Range} cond %{READ_RESPONSE_HDR_HOOK} cond %{STATUS} =206 set-status 200 set-header Cache-Control "max-age=10" cond %{SEND_RESPONSE_HDR_HOOK} cond %{STATUS} =200 set-status 206 $ cat etc/trafficserver/plugin.config header_rewrite.so cache_range_global.config xdebug.so {code} And here is a sample test: {code} $ sudo ./bin/traffic_server -T 'header_rewrite|cachekey' --clear_cache . . . 
$ for domain in example_no_cache.com example_no_cache.com example.com example.com; do curl -x 127.0.0.1:80 -v "http://${domain}/range/1024" -H "X-Debug: X-Cache,X-Cache-Key" -r0-16 -s 2>&1|grep -e "HTTP" -e "Cache"; echo "---"; sleep 5; done > GET http://example_no_cache.com/range/1024 HTTP/1.1 > X-Debug: X-Cache,X-Cache-Key < HTTP/1.1 206 PARTIAL CONTENT < X-Cache-Key: /example_no_cache.com/80/range/1024 < X-Cache: miss --- > GET http://example_no_cache.com/range/1024 HTTP/1.1 > X-Debug: X-Cache,X-Cache-Key < HTTP/1.1 206 PARTIAL CONTENT < X-Cache-Key: /example_no_cache.com/80/range/1024 < X-Cache: miss --- > GET http://example.com/range/1024 HTTP/1.1 > X-Debug: X-Cache,X-Cache-Key < HTTP/1.1 206 Partial Content < Cache-Control: max-age=10 < X-Cache-Key: /example.com/80/@Original-Range:bytes=0-16/range/1024 < X-Cache: miss --- > GET http://example.com/range/1024 HTTP/1.1 > X-Debug: X-Cache,X-Cache-Key < HTTP/1.1 206 Partial Content < Cache-Control: max-age=10 < X-Cache-Key: /example.com/80/@Original-Range:bytes=0-16/range/1024 < X-Cache: hit-fresh --- {code} Please let me know if it works for you! Cheers, --Gancho was (Author: gancho): It seems to me that the {{cache_range_requests}} functionality can be achieved by using more generic (and feature-rich) plugins like {{header_rewrite}} and {{cachekey}}. Here is a sample where {{httpbin.org}} is used as the origin, which responds to range requests adding a {{Cache-Control:max-age=10}} header. There are 2 remap rules defined, one for {{example_no_cache.com}} which is non-caching (ATS does not cache 206 responses by default) and one for {{example.com}} which is caching exactly like the {{cache_range_requests}} plugin does. To test, run {{traffic_server}} and {{curl}} 4 times every 5 seconds - 2 times to test the non-caching remap and 2 times to test the "cache_range_requests"-style caching. 
Here are the configs: {code} $ cat etc/trafficserver/remap.config map http://example.com http://httpbin.org \ @plugin=cachekey.so @pparam=--include-headers=@Original-Range \ @plugin=header_rewrite.so @pparam=cache_range_local.config map http://example_no_cache.com http://httpbin.org \ @plugin=cachekey.so $ cat etc/trafficserver/cache_range_global.config cond %{READ_REQUEST_HDR_HOOK} cond %{CLIENT-URL:HOST} example.com set-header @Original-Range %{HEADER:Range} rm-header Range $ cat etc/trafficserver/cache_range_local.config cond %{SEND_REQUEST_HDR_HOOK} set-header Range %{HEADER:@Original-Range} cond %{READ_RESPONSE_HDR_HOOK} cond %{STATUS} =206 set-status 200 set-header Cache-Control "max-age=10" cond %{SEND_RESPONSE_HDR_HOOK} cond %{STATUS} =200 set-status 206 $ cat etc/trafficserver/plugin.config header_rewrite.so cache_range_global.config xdebug.so {code} And here is a sample test: {code} $ sudo ./bin/traffic_server -T 'header_rewrite|cachekey' --clear_cache . . . $ for domain in example_no_cache.com example_no_cache.com example.com example.com;
[jira] [Commented] (TS-4334) The cache_range_requests plugin always attempts to modify the cache key.
[ https://issues.apache.org/jira/browse/TS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15441913#comment-15441913 ] Gancho Tenev commented on TS-4334: -- It seems to me that the {{cache_range_requests}} functionality can be achieved by using more generic (and feature-rich) plugins like {{header_rewrite}} and {{cachekey}}. Here is a sample where {{httpbin.org}} is used as the origin, which responds to range requests adding a {{Cache-Control:max-age=10}} header. There are 2 remap rules defined, one for {{example_no_cache.com}} which is non-caching (ATS does not cache 206 responses by default) and one for {{example.com}} which is caching exactly like the {{cache_range_requests}} plugin does. To test, run {{traffic_server}} and {{curl}} 4 times every 5 seconds - 2 times to test the non-caching remap and 2 times to test the "cache_range_requests"-style caching. Here are the configs: {code} $ cat etc/trafficserver/remap.config map http://example.com http://httpbin.org \ @plugin=cachekey.so @pparam=--include-headers=@Original-Range \ @plugin=header_rewrite.so @pparam=cache_range_local.config map http://example_no_cache.com http://httpbin.org \ @plugin=cachekey.so $ cat etc/trafficserver/cache_range_global.config cond %{READ_REQUEST_HDR_HOOK} cond %{CLIENT-URL:HOST} example.com set-header @Original-Range %{HEADER:Range} rm-header Range $ cat etc/trafficserver/cache_range_local.config cond %{SEND_REQUEST_HDR_HOOK} set-header Range %{HEADER:@Original-Range} cond %{READ_RESPONSE_HDR_HOOK} cond %{STATUS} =206 set-status 200 set-header Cache-Control "max-age=10" cond %{SEND_RESPONSE_HDR_HOOK} cond %{STATUS} =200 set-status 206 $ cat etc/trafficserver/plugin.config header_rewrite.so cache_range_global.config xdebug.so {code} And here is a sample test: {code} $ sudo ./bin/traffic_server -T 'header_rewrite|cachekey' --clear_cache . . . 
$ for domain in example_no_cache.com example_no_cache.com example.com example.com; do curl -x 127.0.0.1:80 -v "http://${domain}/range/1024" -H "X-Debug: X-Cache,X-Cache-Key" -r0-16 -s 2>&1|grep -e "HTTP" -e "Cache"; echo "---"; sleep 5; done > GET http://example_no_cache.com/range/1024 HTTP/1.1 > X-Debug: X-Cache,X-Cache-Key < HTTP/1.1 206 PARTIAL CONTENT < X-Cache-Key: /example_no_cache.com/80/range/1024 < X-Cache: miss --- > GET http://example_no_cache.com/range/1024 HTTP/1.1 > X-Debug: X-Cache,X-Cache-Key < HTTP/1.1 206 PARTIAL CONTENT < X-Cache-Key: /example_no_cache.com/80/range/1024 < X-Cache: miss --- > GET http://example.com/range/1024 HTTP/1.1 > X-Debug: X-Cache,X-Cache-Key < HTTP/1.1 206 Partial Content < Cache-Control: max-age=10 < X-Cache-Key: /example.com/80/@Original-Range:bytes=0-16/range/1024 < X-Cache: miss --- > GET http://example.com/range/1024 HTTP/1.1 > X-Debug: X-Cache,X-Cache-Key < HTTP/1.1 206 Partial Content < Cache-Control: max-age=10 < X-Cache-Key: /example.com/80/@Original-Range:bytes=0-16/range/1024 < X-Cache: hit-fresh --- {code} Please let me know if it works for you! Cheers, --Gancho > The cache_range_requests plugin always attempts to modify the cache key. > > > Key: TS-4334 > URL: https://issues.apache.org/jira/browse/TS-4334 > Project: Traffic Server > Issue Type: Improvement > Components: Plugins >Reporter: Nolan Astrein >Assignee: Gancho Tenev > Fix For: 7.1.0 > > > A TrafficServer administrator should be able to specify whether or not the > cache_range_requests plugin should modify the cache key. The cache key may > be modified by a previous plugin in a plugin chain and there is no way to > configure cache_range_requests not to do any further modifications to the > cache key. Having multiple plugins responsible for cache key modifications > can cause unexpected behavior, especially when a plugin chain ordering is > changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
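For illustration, the cache-key values visible in the transcript above can be sketched as a single deterministic composition step (a hypothetical Python sketch, not the cachekey plugin's actual code; the function name and signature are made up):

```python
def build_cache_key(host, port, path, extra_headers=None):
    """Compose a cache key resembling the X-Cache-Key values above:
    /host/port[/@Header:value]/path -- all pieces combined in one place,
    so no other plugin in the chain has to touch the key afterwards."""
    parts = ["", host, str(port)]
    # --include-headers=@Original-Range appends "@Name:value" elements.
    for name, value in (extra_headers or {}).items():
        parts.append("%s:%s" % (name, value))
    parts.append(path.lstrip("/"))
    return "/".join(parts)

# Mirrors the X-Cache-Key values from the test transcript above.
print(build_cache_key("example_no_cache.com", 80, "/range/1024"))
# /example_no_cache.com/80/range/1024
print(build_cache_key("example.com", 80, "/range/1024",
                      {"@Original-Range": "bytes=0-16"}))
# /example.com/80/@Original-Range:bytes=0-16/range/1024
```

Composing the key in one place is the point of the issue: when several plugins each mutate the key independently, the result depends on chain ordering.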
[jira] [Updated] (TS-4686) Move hook-trace plugin from examples to plugins/experimental
[ https://issues.apache.org/jira/browse/TS-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4686: - Assignee: Gancho Tenev > Move hook-trace plugin from examples to plugins/experimental > > > Key: TS-4686 > URL: https://issues.apache.org/jira/browse/TS-4686 > Project: Traffic Server > Issue Type: Improvement > Components: Plugins >Reporter: Leif Hedstrom >Assignee: Gancho Tenev > Fix For: 7.0.0 > > > This makes more sense as a tool in the plugins arsenal. :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-4712) Verify HttpHdr caching functionality
Gancho Tenev created TS-4712: Summary: Verify HttpHdr caching functionality Key: TS-4712 URL: https://issues.apache.org/jira/browse/TS-4712 Project: Traffic Server Issue Type: Task Components: Cleanup, Core Reporter: Gancho Tenev After finding a use-case that was not well supported by the HttpHdr caching functionality ([TS-4706|https://issues.apache.org/jira/browse/TS-4706]), it may make sense to look into its use-cases and verify its functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-4706) SSL hostname verification failed due to truncated SNI name
Gancho Tenev created TS-4706: Summary: SSL hostname verification failed due to truncated SNI name Key: TS-4706 URL: https://issues.apache.org/jira/browse/TS-4706 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Gancho Tenev SSL hostname verification fails due to a truncated SNI name when the escalation plugin is used to redirect a failed request (404) from a primary origin {{primary.com}} to a secondary origin {{secondary.com}}. {code:title=Excerpt from the ATS logs showing the error|borderStyle=solid} DEBUG: (ssl) using SNI name 'secondary.c' for client handshake DEBUG: (ssl.error) SSLNetVConnection::sslClientHandShakeEvent, SSL_ERROR_WANT_READ DEBUG: (ssl) using SNI name 'secondary.c' for client handshake DEBUG: (ssl) Hostname verification failed for ('secondary.c') {code} One can see that the SNI name {{secondary.com}} is truncated to {{secondary.c}} {code:title=Test case to reproduce} $ cat etc/trafficserver/remap.config map http://example.com https://primary.com @plugin=escalate.so @pparam=404:secondary.com $ sudo ./bin/traffic_server -T ssl 2>&1 | egrep -e 'using SNI name .* for client handshake' DEBUG: (ssl) using SNI name 'primary.com' for client handshake DEBUG: (ssl) using SNI name 'secondary.c' for client handshake $ curl -x localhost:80 'http://example.com/path/to/object' {code} I have a fix available which produces the following log (SNI hostname no longer truncated) {code:title=Excerpt from ATS logs after applying the fix} $ sudo ./bin/traffic_server -T ssl 2>&1 | egrep -e 'using SNI name .* for client handshake' DEBUG: (ssl) using SNI name 'primary.com' for client handshake DEBUG: (ssl) using SNI name 'secondary.com' for client handshake {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
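The bug class here is generic: a hostname handed around as (pointer, length) gets silently truncated when a stale or miscomputed length is used, and verification then fails downstream. A hypothetical Python sketch of the symptom (the real fix is in the ATS core SSL code and is not shown here; the function name is made up):

```python
def sni_for_handshake(hostname, length):
    """Length-delimited copy of the SNI name. If `length` is computed
    from the wrong string or is off by a few bytes, the name is
    silently truncated and hostname verification fails later."""
    return hostname[:length]

origin = "secondary.com"
# Correct: use the actual length of the hostname being sent.
print(sni_for_handshake(origin, len(origin)))          # secondary.com
# Buggy: a stale/miscomputed length truncates the name, matching the
# 'secondary.c' seen in the DEBUG log above.
print(sni_for_handshake(origin, len("secondary.c")))   # secondary.c
```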
[jira] [Updated] (TS-4650) cachekey: not thread safe
[ https://issues.apache.org/jira/browse/TS-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4650: - Backport to Version: 6.2.1 > cachekey: not thread safe > - > > Key: TS-4650 > URL: https://issues.apache.org/jira/browse/TS-4650 > Project: Traffic Server > Issue Type: Bug > Components: Plugins >Affects Versions: 6.2.0 >Reporter: Felicity Tarnell > Fix For: 7.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > cachekey's Pattern class is not thread safe; it uses member data to store the > result of pcre_exec(), but only one instance is shared between all threads. > This causes crashes when two threads access the pcre result at the same time. > Fix: use automatic storage for the pcre result data. > PR incoming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
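The fix pattern described above (automatic storage for per-call match results instead of shared member data) is language-agnostic. A hedged Python sketch of the safe shape, not the plugin's C++ code: the compiled pattern is shared and read-only, while each call's result lives in a local variable, so concurrent use is safe.

```python
import re
import threading

class Matcher:
    """Shared, read-only compiled pattern; safe for concurrent use
    because match results are kept in local (automatic) storage and
    returned, never stored on the shared instance."""
    def __init__(self, pattern):
        self._re = re.compile(pattern)  # immutable after construction

    def capture(self, subject):
        m = self._re.search(subject)    # local result, one per call
        return m.group(1) if m else None

shared = Matcher(r"/uri/([a-z0-9]+)")
results = {}

def worker(subject):
    results[subject] = shared.capture(subject)

threads = [threading.Thread(target=worker, args=("/uri/object%d" % i,))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results["/uri/object3"])  # object3
```

The buggy shape is the opposite: storing the equivalent of the pcre_exec() ovector in a member of the shared instance, so two threads matching at once overwrite each other's results.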
[jira] [Updated] (TS-4650) cachekey: not thread safe
[ https://issues.apache.org/jira/browse/TS-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4650: - Fix Version/s: (was: 6.2.1) 7.0.0 > cachekey: not thread safe > - > > Key: TS-4650 > URL: https://issues.apache.org/jira/browse/TS-4650 > Project: Traffic Server > Issue Type: Bug > Components: Plugins >Affects Versions: 6.2.0 >Reporter: Felicity Tarnell > Fix For: 7.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > cachekey's Pattern class is not thread safe; it uses member data to store the > result of pcre_exec(), but only one instance is shared between all threads. > This causes crashes when two threads access the pcre result at the same time. > Fix: use automatic storage for the pcre result data. > PR incoming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-4650) cachekey: not thread safe
[ https://issues.apache.org/jira/browse/TS-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4650: - Fix Version/s: (was: 7.0.0) 6.2.1 > cachekey: not thread safe > - > > Key: TS-4650 > URL: https://issues.apache.org/jira/browse/TS-4650 > Project: Traffic Server > Issue Type: Bug > Components: Plugins >Affects Versions: 6.2.0 >Reporter: Felicity Tarnell > Fix For: 6.2.1 > > Time Spent: 40m > Remaining Estimate: 0h > > cachekey's Pattern class is not thread safe; it uses member data to store the > result of pcre_exec(), but only one instance is shared between all threads. > This causes crashes when two threads access the pcre result at the same time. > Fix: use automatic storage for the pcre result data. > PR incoming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-4367) [clang-analyzer] memory leaks in mgmt/api and proxy/logging
[ https://issues.apache.org/jira/browse/TS-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4367: - Component/s: Logging > [clang-analyzer] memory leaks in mgmt/api and proxy/logging > --- > > Key: TS-4367 > URL: https://issues.apache.org/jira/browse/TS-4367 > Project: Traffic Server > Issue Type: Bug > Components: Logging, Management API >Reporter: Gancho Tenev > > ||Bug Group ||Bug Type ||File ||Function/Method ||Line || > |Memory Error |Memory leak|mgmt/api/GenericParser.cc |cacheParse > |363| > |Memory Error |Memory leak|mgmt/api/GenericParser.cc |socksParse > |660| > |Memory Error |Memory leak|mgmt/api/GenericParser.cc |splitdnsParse > |744| > |Memory Error |Memory leak|proxy/logging/LogCollationAccept.cc > |accept_event |99| -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-4367) [clang-analyzer] memory leaks in mgmt/api and proxy/logging
[ https://issues.apache.org/jira/browse/TS-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4367: - Summary: [clang-analyzer] memory leaks in mgmt/api and proxy/logging (was: [clang-analyzer] memory leaks in mgmt/api) > [clang-analyzer] memory leaks in mgmt/api and proxy/logging > --- > > Key: TS-4367 > URL: https://issues.apache.org/jira/browse/TS-4367 > Project: Traffic Server > Issue Type: Bug > Components: Logging, Management API >Reporter: Gancho Tenev > > ||Bug Group ||Bug Type ||File ||Function/Method ||Line || > |Memory Error |Memory leak|mgmt/api/GenericParser.cc |cacheParse > |363| > |Memory Error |Memory leak|mgmt/api/GenericParser.cc |socksParse > |660| > |Memory Error |Memory leak|mgmt/api/GenericParser.cc |splitdnsParse > |744| > |Memory Error |Memory leak|proxy/logging/LogCollationAccept.cc > |accept_event |99| -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-4367) [clang-analyzer] memory leaks in mgmt/api
Gancho Tenev created TS-4367: Summary: [clang-analyzer] memory leaks in mgmt/api Key: TS-4367 URL: https://issues.apache.org/jira/browse/TS-4367 Project: Traffic Server Issue Type: Bug Components: Management API Reporter: Gancho Tenev ||Bug Group ||Bug Type ||File ||Function/Method ||Line || |Memory Error |Memory leak|mgmt/api/GenericParser.cc |cacheParse |363| |Memory Error |Memory leak|mgmt/api/GenericParser.cc |socksParse |660| |Memory Error |Memory leak|mgmt/api/GenericParser.cc |splitdnsParse |744| |Memory Error |Memory leak|proxy/logging/LogCollationAccept.cc |accept_event |99| -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-4366) [clang-analyzer] Uninitialized stack value used in mp4 plugin
[ https://issues.apache.org/jira/browse/TS-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4366: - Summary: [clang-analyzer] Uninitialized stack value used in mp4 plugin (was: [clang-analyzer] Unitialized stack value used in mp4 plugin) > [clang-analyzer] Uninitialized stack value used in mp4 plugin > - > > Key: TS-4366 > URL: https://issues.apache.org/jira/browse/TS-4366 > Project: Traffic Server > Issue Type: Bug > Components: Plugins >Reporter: Gancho Tenev > Fix For: 7.0.0 > > > Logic error: Result of operation is garbage or undefined > Source: plugins/experimental/mp4/mp4_meta.cc: 951 > Function: Mp4Meta::mp4_read_co64_atom(): > Within the expansion of the macro 'mp4_get_32value': > The left operand of '<<' is a garbage value -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-4366) [clang-analyzer] Unitialized stack value used in mp4 plugin
[ https://issues.apache.org/jira/browse/TS-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4366: - Fix Version/s: 7.0.0 > [clang-analyzer] Unitialized stack value used in mp4 plugin > --- > > Key: TS-4366 > URL: https://issues.apache.org/jira/browse/TS-4366 > Project: Traffic Server > Issue Type: Bug > Components: Plugins >Reporter: Gancho Tenev > Fix For: 7.0.0 > > > Logic error: Result of operation is garbage or undefined > Source: plugins/experimental/mp4/mp4_meta.cc: 951 > Function: Mp4Meta::mp4_read_co64_atom(): > Within the expansion of the macro 'mp4_get_32value': > The left operand of '<<' is a garbage value -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-4366) [clang-analyzer] Unitialized stack value used in mp4 plugin
Gancho Tenev created TS-4366: Summary: [clang-analyzer] Unitialized stack value used in mp4 plugin Key: TS-4366 URL: https://issues.apache.org/jira/browse/TS-4366 Project: Traffic Server Issue Type: Bug Components: Plugins Reporter: Gancho Tenev Logic error: Result of operation is garbage or undefined Source: plugins/experimental/mp4/mp4_meta.cc: 951 Function: Mp4Meta::mp4_read_co64_atom(): Within the expansion of the macro 'mp4_get_32value': The left operand of '<<' is a garbage value -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3556) Implement CDNI URL Signing as a plugin
[ https://issues.apache.org/jira/browse/TS-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-3556: - Fix Version/s: (was: 6.2.0) 7.0.0 > Implement CDNI URL Signing as a plugin > -- > > Key: TS-3556 > URL: https://issues.apache.org/jira/browse/TS-3556 > Project: Traffic Server > Issue Type: New Feature > Components: Plugins, Security >Reporter: Leif Hedstrom >Assignee: Gancho Tenev > Labels: A > Fix For: 7.0.0 > > > The specs are at > https://tools.ietf.org/html/draft-ietf-cdni-uri-signing-03 > I think we should implement this, and work with the IETF community around > this to provide a full featured implementation that covers all our use cases. > This would hopefully supersede the existing url_sig plugin. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3198) Ignore useless MIMEFieldBlockImpl.
[ https://issues.apache.org/jira/browse/TS-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-3198: - Fix Version/s: (was: 6.2.0) 7.0.0 > Ignore useless MIMEFieldBlockImpl. > -- > > Key: TS-3198 > URL: https://issues.apache.org/jira/browse/TS-3198 > Project: Traffic Server > Issue Type: Bug > Components: MIME >Affects Versions: 4.1.2, 4.2.0, 4.2.2, 5.0.0, 5.1.1 >Reporter: portl4t >Assignee: Gancho Tenev > Fix For: 7.0.0 > > Attachments: 0001-TS-3198-Ignore-useless-MIMEFieldBlockImpl.patch > > > ATS will generate a very large marshal header in a rare case. As we know, ATS > will merge the response header if it gets a 304 from the origin server. I found that > the HdrHeap size will increase if duplicated headers exist. > In our production environment, we got a response from the origin server like this: > {code} > HTTP/1.1 200 OK > Content-Length: 60 > ... > Powered-By-CC: MISS from A > Cache-Control: public,max-age=0 > Powered-By-CC: MISS from B > Connection: close > {code} > There is a duplicated header 'Powered-By-CC', and every time the doc was > accessed, ATS had to revalidate this doc from the origin as the max-age is 0. > The origin server responds with a 304 like this: > {code} > HTTP/1.1 304 Not Modified > ... > Powered-By-CC: 8c61e322f02a0343e93ef227d82e5e0a > Cache-Control: public,max-age=0 > Powered-By-CC: e4563610a50c63ed500d27bb5f1df848 > Connection: close > {code} > ATS will merge the headers frequently, and the HdrHeap size will increase > endlessly. 
> {code} > Breakpoint 1, CacheVC::updateVector (this=0x14112f0) at CacheWrite.cc:132 > 132 header_len = write_vector->marshal_length(); > (gdb) n > 133 od->writing_vec = 1; > (gdb) p header_len > $1 = 1068944 > (gdb) bt > #0 CacheVC::updateVector (this=0x14112f0) at CacheWrite.cc:133 > #1 0x006c04c6 in CacheVC::openWriteClose (this=0x14112f0, event=0, > e=0x0) at CacheWrite.cc:1276 > #2 0x0069e827 in CacheVC::die (this=0x14112f0) at > P_CacheInternal.h:738 > #3 0x00690b1f in CacheVC::do_io_close (this=0x14112f0, alerrno=-1) > at Cache.cc:373 > #4 0x004fed48 in VConnection::do_io (this=0x14112f0, op=3, c=0x0, > nbytes=9223372036854775807, cb=0x0, data=0) > at ../iocore/eventsystem/P_VConnection.h:106 > #5 0x00591b5a in HttpCacheSM::close_write (this=0x7fffe7f7d3b0) at > HttpCacheSM.h:118 > #6 0x005897a9 in HttpSM::issue_cache_update (this=0x7fffe7f7b980) at > HttpSM.cc:5590 > #7 0x005895d6 in HttpSM::perform_cache_write_action > (this=0x7fffe7f7b980) at HttpSM.cc:5540 > #8 0x0058ef4d in HttpSM::set_next_state (this=0x7fffe7f7b980) at > HttpSM.cc:7206 > #9 0x0058e0be in HttpSM::call_transact_and_set_next_state > (this=0x7fffe7f7b980, f=0) at HttpSM.cc:6962 > #10 0x0057bedf in HttpSM::handle_api_return (this=0x7fffe7f7b980) at > HttpSM.cc:1531 > #11 0x005944ca in HttpSM::do_api_callout (this=0x7fffe7f7b980) at > HttpSM.cc:452 > #12 0x0057cf73 in HttpSM::state_read_server_response_header > (this=0x7fffe7f7b980, event=100, data=0x7fffe0015c78) at HttpSM.cc:1878 > #13 0x0057f536 in HttpSM::main_handler (this=0x7fffe7f7b980, > event=100, data=0x7fffe0015c78) at HttpSM.cc:2565 > #14 0x004f55a6 in Continuation::handleEvent (this=0x7fffe7f7b980, > event=100, data=0x7fffe0015c78) at ../iocore/eventsystem/I_Continuation.h:146 > #15 0x006ead77 in read_signal_and_update (event=100, > vc=0x7fffe0015b60) at UnixNetVConnection.cc:137 > #16 0x006eb5a7 in read_from_net (nh=0x737cea30, > vc=0x7fffe0015b60, thread=0x737cb010) at UnixNetVConnection.cc:320 > #17 0x006ed221 in 
UnixNetVConnection::net_read_io > (this=0x7fffe0015b60, nh=0x737cea30, lthread=0x737cb010) at > UnixNetVConnection.cc:846 > #18 0x006e4dd1 in NetHandler::mainNetEvent (this=0x737cea30, > event=5, e=0x1089e80) at UnixNet.cc:399 > #19 0x004f55a6 in Continuation::handleEvent (this=0x737cea30, > event=5, data=0x1089e80) at ../iocore/eventsystem/I_Continuation.h:146 > #20 0x0070bace in EThread::process_event (this=0x737cb010, > e=0x1089e80, calling_code=5) at UnixEThread.cc:144 > #21 0x0070bfd8 in EThread::execute (this=0x737cb010) at > UnixEThread.cc:268 > #22 0x00526644 in main (argv=0x7fffe368) at Main.cc:1763 > {code} > In HttpTransact::merge_response_header_with_cached_header(...), ATS will set > the old MIMEField as DELETED if it is duplicated, and attach a new MIMEField; > this will increase the number of MIMEFieldBlockImpl, and the HdrHeap size may > grow larger than 1M. > I suggest ignoring the useless MIMEFieldBlockImpl when copying the MIME header > in mime_hdr_copy_onto(...). -- This message
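The unbounded growth described above can be sketched with a toy header heap: each 304 merge marks the old duplicate DELETED and appends a new slot, so raw storage grows on every revalidation, while a copy that skips DELETED slots (the suggested mime_hdr_copy_onto() change) stays bounded. This is a hypothetical Python illustration of the mechanism, not ATS code:

```python
class ToyHdrHeap:
    """Slots are (name, value, deleted) -- deleted slots are kept
    around, like MIMEFieldBlockImpl entries that are only marked
    DELETED rather than reclaimed."""
    def __init__(self):
        self.slots = []

    def merge_from_304(self, name, value):
        # Mark every existing duplicate deleted, then append the new field.
        self.slots = [(n, v, True) if n == name else (n, v, d)
                      for n, v, d in self.slots]
        self.slots.append((name, value, False))

    def copy_onto(self):
        # Suggested fix: drop the useless (deleted) blocks on copy.
        return [(n, v) for n, v, d in self.slots if not d]

heap = ToyHdrHeap()
for i in range(100):          # 100 revalidations of the same doc
    heap.merge_from_304("Powered-By-CC", "MISS from %d" % i)

print(len(heap.slots))        # grows with every merge: 100
print(len(heap.copy_onto()))  # bounded after compaction: 1
```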
[jira] [Updated] (TS-4356) Deprecate cacheurl plugin
[ https://issues.apache.org/jira/browse/TS-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4356: - Backport to Version: 6.2.0 Fix Version/s: (was: 6.2.0) 7.0.0 > Deprecate cacheurl plugin > - > > Key: TS-4356 > URL: https://issues.apache.org/jira/browse/TS-4356 > Project: Traffic Server > Issue Type: Task > Components: Plugins >Reporter: Gancho Tenev >Assignee: Gancho Tenev > Fix For: 7.0.0 > > > Deprecate cacheurl plugin in favor of cachekey plugin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-4362) Remove cacheurl plugin
[ https://issues.apache.org/jira/browse/TS-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4362: - Fix Version/s: (was: 6.2.0) 7.0.0 > Remove cacheurl plugin > -- > > Key: TS-4362 > URL: https://issues.apache.org/jira/browse/TS-4362 > Project: Traffic Server > Issue Type: Task > Components: Plugins >Reporter: Gancho Tenev >Assignee: Gancho Tenev > Fix For: 7.0.0 > > > Deprecate cacheurl plugin in favor of cachekey plugin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-4362) Remove cacheurl plugin
Gancho Tenev created TS-4362: Summary: Remove cacheurl plugin Key: TS-4362 URL: https://issues.apache.org/jira/browse/TS-4362 Project: Traffic Server Issue Type: Task Components: Plugins Reporter: Gancho Tenev Assignee: Gancho Tenev Fix For: 6.2.0 Deprecate cacheurl plugin in favor of cachekey plugin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-4356) Deprecate cacheurl plugin
Gancho Tenev created TS-4356: Summary: Deprecate cacheurl plugin Key: TS-4356 URL: https://issues.apache.org/jira/browse/TS-4356 Project: Traffic Server Issue Type: Task Components: Plugins Reporter: Gancho Tenev Deprecate cacheurl plugin in favor of cachekey plugin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-4334) The cache_range_requests plugin always attempts to modify the cache key.
[ https://issues.apache.org/jira/browse/TS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231444#comment-15231444 ] Gancho Tenev commented on TS-4334: -- Sure, please assign it to me. > The cache_range_requests plugin always attempts to modify the cache key. > > > Key: TS-4334 > URL: https://issues.apache.org/jira/browse/TS-4334 > Project: Traffic Server > Issue Type: Improvement > Components: Plugins >Reporter: Nolan Astrein > Fix For: 7.0.0 > > > A TrafficServer administrator should be able to specify whether or not the > cache_range_requests plugin should modify the cache key. The cache key may > be modified by a previous plugin in a plugin chain and there is no way to > configure cache_range_requests not to do any further modifications to the > cache key. Having multiple plugins responsible for cache key modifications > can cause unexpected behavior, especially when a plugin chain ordering is > changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-4183) cachekey: URI and URI path capture/replacement
Gancho Tenev created TS-4183: Summary: cachekey: URI and URI path capture/replacement Key: TS-4183 URL: https://issues.apache.org/jira/browse/TS-4183 Project: Traffic Server Issue Type: Improvement Components: Plugins Reporter: Gancho Tenev Add means to add to the cache key by using regex capture and replace from URI path and URI as a whole Plugin parameters: --capture-prefix-uri=regex --capture-prefix-uri=/regex/replacement/ --capture-path-uri=regex --capture-path-uri=/regex/replacement/ --capture-path=regex --capture-path=/regex/replacement/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
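A rough sketch of what a {{/regex/replacement/}} parameter could mean for the cache key. This is a hypothetical Python illustration only; the actual cachekey plugin's capture/replacement syntax (delimiters, backreference notation) may differ, and Python's \1 backreferences are used here for concreteness:

```python
import re

def parse_capture_param(param):
    """Split a '/regex/replacement/' parameter into its two parts.
    Assumes '/' is the delimiter and does not appear inside either part."""
    _, regex, replacement, _ = param.split("/")
    return regex, replacement

def apply_capture(path, param):
    """Apply the capture/replace rule to a URI path to derive a
    cache-key element."""
    regex, replacement = parse_capture_param(param)
    return re.sub(regex, replacement, path)

# Hypothetical example: keep only the object id from the URI path.
param = r"/.*object=([a-z0-9]+).*/id_\1/"
print(apply_capture("/dir/page?object=abc123&x=1", param))  # id_abc123
```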
[jira] [Created] (TS-4161) ProcessManager prone to stack-overflow
Gancho Tenev created TS-4161: Summary: ProcessManager prone to stack-overflow Key: TS-4161 URL: https://issues.apache.org/jira/browse/TS-4161 Project: Traffic Server Issue Type: Bug Components: Manager Reporter: Gancho Tenev ProcessManager::pollLMConnection() can get "stuck" in a loop while handling a big number of messages in a row from the same socket. Since alloca() is used to allocate buffers on the stack for each message read from the socket, and those buffers are not released until the function returns, getting "stuck" in the loop can lead to stack-overflow; FWIW the same could happen if the message length is big enough (accidentally or on purpose). It can be reproduced easily by setting up: proxy.config.lm.pserver_timeout_secs: 0 proxy.config.lm.pserver_timeout_msecs: 0 in records.config and running ./bin/traffic_manager. ATS crashes with a segfault in a weird place (while trying to allocate with malloc()). If you inspect the core you would see that it got "stuck" in the loop before it crashed overflowing the stack (kept allocating buffers on the stack with alloca() until it crashed). It is worth considering replacing the alloca() with a VLA (which "releases" memory when out of scope on each iteration of the loop) or using ats_malloc(), which is supposedly less time-efficient but would be better for handling bigger messages without worrying about stack-overflow. IMO adding a message size limit check is a good practice, especially with the current implementation. If the code gets "stuck" in the while loop while reading a big number of messages in a row from the same socket, then the port configured by proxy.config.process_manager.mgmt_port becomes unavailable (connection refused). Adding a limit on the number of messages that can be processed in a row would be a good idea. I stumbled upon this while running TSQA regression tests, where TSQA kept complaining that the management port is not available and ATS kept crashing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
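The two suggested mitigations (a per-message size limit, and a cap on how many messages are drained in a row before returning) can be sketched roughly like this; the buffer is loop-local, so it is reclaimed each iteration instead of accumulating as with alloca(). Names and limit values are made up for illustration, this is not the ProcessManager code:

```python
MAX_MSG_LEN = 64 * 1024   # hypothetical per-message size limit
MAX_IN_A_ROW = 100        # hypothetical cap on messages drained per poll

def poll_connection(messages, handler):
    """Drain at most MAX_IN_A_ROW (length, payload) messages, rejecting
    oversized ones. Each message body lives in a loop-local variable,
    so memory is released every iteration (unlike alloca() in a loop,
    which is only released when the function returns)."""
    handled = 0
    for length, payload in messages:
        if handled >= MAX_IN_A_ROW:
            break                      # yield; resume on the next poll
        if length > MAX_MSG_LEN:
            raise ValueError("message too long: %d" % length)
        body = payload[:length]        # per-iteration buffer
        handler(body)
        handled += 1
    return handled

seen = []
msgs = [(5, b"hello")] * 250
print(poll_connection(msgs, seen.append))  # 100 -- capped, not 250
```

With the cap in place, a flood of back-to-back messages can no longer pin the poll loop (and starve the mgmt_port), and the size check bounds per-message memory.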
[jira] [Updated] (TS-4161) ProcessManager prone to stack-overflow
[ https://issues.apache.org/jira/browse/TS-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4161: - Description: ProcessManager::pollLMConnection() can get "stuck" in a loop while handling a big number of messages in a row from the same socket. Since alloca() is used to allocate buffers on the stack for each message read from the socket, and those buffers are not released until the function returns, getting "stuck" in the loop can lead to stack-overflow; FWIW the same could happen if the message length is big enough (accidentally or on purpose). It can be reproduced easily by setting up: proxy.config.lm.pserver_timeout_secs: 0 proxy.config.lm.pserver_timeout_msecs: 0 in records.config and running ./bin/traffic_manager. ATS crashes with a segfault in a weird place (while trying to allocate with malloc()). If you inspect the core you would see that it got "stuck" in the loop before it crashed overflowing the stack (kept allocating buffers on the stack with alloca() until it crashed). It is worth considering replacing the alloca() with a VLA (which "releases" memory when out of scope on each iteration of the loop) or using ats_malloc(), which is supposedly less time-efficient but would be better for handling bigger messages without worrying about stack-overflow. IMO adding a message size limit check is a good practice, especially with the current implementation. If the code gets "stuck" in the while loop while reading a big number of messages in a row from the same socket, then the port configured by proxy.config.process_manager.mgmt_port becomes unavailable (connection refused). Adding a limit on the number of messages that can be processed in a row would be a good idea. I stumbled upon this while running TSQA regression tests, where TSQA kept complaining that the management port is not available and ATS kept crashing. 
was: ProcessManager::pollLMConnection() can get "stuck" in a loop while handling big number of messages in a raw from the same socket. Since alloca() is used to allocate buffers on the stack for each message read from the socket, and those buffers are not released until the function returns, getting "stuck" in the loop can lead to stack-overflow, fwiw same could happen if the message length is big enough (accidentally or on purpose). It can be reproduced easily by setting up: proxy.config.lm.pserver_timeout_secs: 0 proxy.config.lm.pserver_timeout_msecs: 0 in records.config and running ./bin/traffic_manager. ATS crashes with a segfault in a weird place (while trying to allocate with malloc()). If you inspect the core you would see that it got "stuck" in the loop before it crashed over-flowing the stack (kept allocating buffers on the stack with alloca() until it crashed). It is worth considering replacing the alloca() with VLA (which "releases" memory when out of scope on each iteration of the loop) or using ats_malloc() which is supposedly less time-efficient but would be better to handle bigger messages without worrying about stack-overflow. IMO adding a message size limit check is a good practice especially with the current implementation. If the code gets "stuck" in the while loop while reading big number of messages in a row from the same socket then the port configured by proxy.config.process_manager.mgmt_port becomes unavailable (connection refused). Adding a limit of messages that can be processed in a row should be a good idea. I stumbled up on this while running TSQA regression tests where TSQA kept complaining that the management port is not available and the ATS kept crashing. 
> ProcessManager prone to stack-overflow > -- > > Key: TS-4161 > URL: https://issues.apache.org/jira/browse/TS-4161 > Project: Traffic Server > Issue Type: Bug > Components: Manager >Reporter: Gancho Tenev >Assignee: Gancho Tenev > Labels: crash > Fix For: 6.2.0 > > > ProcessManager::pollLMConnection() can get "stuck" in a loop while handling > big number of messages in a row from the same socket. > Since alloca() is used to allocate buffers on the stack for each message read > from the socket, and those buffers are not released until the function > returns, getting "stuck" in the loop can lead to stack-overflow, fwiw same > could happen if the message length is big enough (accidentally or on purpose). > It can be reproduced easily by setting up: > proxy.config.lm.pserver_timeout_secs: 0 > proxy.config.lm.pserver_timeout_msecs: 0 > in records.config and running ./bin/traffic_manager. > ATS crashes with a segfault in a weird place (while trying to allocate with > malloc()). If you inspect the core you would see that it got "stuck" in the > loop before it crashed
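The remedies suggested in the ticket (a per-iteration buffer whose storage is released each time around the loop, plus a message size limit) can be sketched as below. This is an illustrative sketch with invented names (read_messages, kMaxMsgLen), not the actual ProcessManager::pollLMConnection() code:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Sketch only: unlike alloca(), whose allocations accumulate until the
// enclosing function returns, a std::vector declared inside the loop
// releases its storage at the end of every iteration.
constexpr std::size_t kMaxMsgLen = 64 * 1024; // illustrative size limit

inline bool read_messages(const std::vector<std::string> &incoming,
                          std::vector<std::string> &out)
{
  for (const std::string &msg : incoming) {
    if (msg.size() > kMaxMsgLen) {
      return false; // reject oversized messages instead of growing the stack
    }
    std::vector<char> buf(msg.size() + 1); // heap buffer, freed each iteration
    std::memcpy(buf.data(), msg.data(), msg.size());
    out.emplace_back(buf.data(), msg.size());
  }
  return true;
}
```

With this shape, peak memory use is bounded by the largest single message rather than by how many messages arrive in a row from the same socket.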
[jira] [Commented] (TS-4023) cachekey plugin
[ https://issues.apache.org/jira/browse/TS-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005547#comment-15005547 ] Gancho Tenev commented on TS-4023: -- Renamed to {{cachekey}}. > cachekey plugin > --- > > Key: TS-4023 > URL: https://issues.apache.org/jira/browse/TS-4023 > Project: Traffic Server > Issue Type: Improvement > Components: Cache >Reporter: Gancho Tenev >Assignee: Gancho Tenev > Fix For: 6.1.0 > > > This plugin allows some common cache key normalizations of the URI. It can > - sort query parameters so reordering can be a cache hit > - ignore specific query parameters from the cache key by name or regular > expression > - ignore all query parameters from the cache key > - only use specific query parameters in the cache key by name or regular > expression > - include headers or cookies by name > - capture / replace values from the User-Agent header. > - classify request using User-Agent and a list of regular expressions -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-4023) cachekey_norm plugin
Gancho Tenev created TS-4023: Summary: cachekey_norm plugin Key: TS-4023 URL: https://issues.apache.org/jira/browse/TS-4023 Project: Traffic Server Issue Type: Improvement Components: Cache Reporter: Gancho Tenev This plugin allows some common cache key normalizations of the URI. It can - sort query parameters so reordering can be a cache hit - ignore specific query parameters from the cache key by name or regular expression - ignore all query parameters from the cache key - only use specific query parameters in the cache key by name or regular expression - include headers or cookies by name - capture / replace values from the User-Agent header - classify requests using User-Agent and a list of regular expressions -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-4023) cachekey plugin
[ https://issues.apache.org/jira/browse/TS-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-4023: - Summary: cachekey plugin (was: cachekey_norm plugin) > cachekey plugin > --- > > Key: TS-4023 > URL: https://issues.apache.org/jira/browse/TS-4023 > Project: Traffic Server > Issue Type: Improvement > Components: Cache >Reporter: Gancho Tenev >Assignee: Gancho Tenev > Fix For: 6.1.0 > > > This plugin allows some common cache key normalizations of the URI. It can > - sort query parameters so reordering can be a cache hit > - ignore specific query parameters from the cache key by name or regular > expression > - ignore all query parameters from the cache key > - only use specific query parameters in the cache key by name or regular > expression > - include headers or cookies by name > - capture / replace values from the User-Agent header. > - classify request using User-Agent and a list of regular expressions -- This message was sent by Atlassian JIRA (v6.3.4#6332)
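The first normalization listed above (sorting query parameters so that a reordered query string still produces the same cache key) can be sketched roughly as follows; sort_query is a hypothetical helper for illustration, not the cachekey plugin's implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Sketch of the "sort query parameters" normalization: split the query
// on '&', sort the parameters, and rejoin, so that "c=3&a=1" and
// "a=1&c=3" map to the same cache key.
inline std::string sort_query(const std::string &query)
{
  std::vector<std::string> params;
  std::stringstream ss(query);
  std::string param;
  while (std::getline(ss, param, '&')) {
    if (!param.empty()) {
      params.push_back(param);
    }
  }
  std::sort(params.begin(), params.end());
  std::string out;
  for (std::size_t i = 0; i < params.size(); ++i) {
    if (i > 0) {
      out += '&';
    }
    out += params[i];
  }
  return out;
}
```

The other normalizations (ignoring or whitelisting parameters, including headers or cookies) follow the same pattern of filtering the split parameter list before rejoining it.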
[jira] [Created] (TS-3883) ats_madvise() has no effect on Linux (including MADV_DONTDUMP)
Gancho Tenev created TS-3883: Summary: ats_madvise() has no effect on Linux (including MADV_DONTDUMP) Key: TS-3883 URL: https://issues.apache.org/jira/browse/TS-3883 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Gancho Tenev While investigating an unrelated issue with truncated core dumps on Linux, we noticed that we were running out of space on a few machines because of huge core dumps whose size was tending toward the ATS process virtual memory size (reported by /proc//status:VmSize on Linux). It looked like the MADV_DONTDUMP memory-use advice was not being set properly. Further debugging showed the following code in ats_madvise(): {code} #if defined(linux) (void)addr; (void)len; (void)flags; return 0; #else . . . {code} This makes ats_madvise() a no-op when "defined(linux)" is true, skipping the madvise() call necessary to set MADV_DONTDUMP. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
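A fixed ats_madvise() would forward its arguments to madvise() on Linux rather than discarding them. A minimal sketch, using a hypothetical ats_madvise_sketch wrapper (the real fix lives in the ATS source tree):

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Sketch of the fix: forward the arguments to madvise() instead of
// casting them to void and returning 0 unconditionally on Linux.
inline int ats_madvise_sketch(void *addr, std::size_t len, int flags)
{
  return madvise(addr, len, flags);
}
```

Note that madvise() requires a page-aligned address, so the advice is normally applied to whole mappings, e.g. the large memory regions one wants excluded from core dumps via MADV_DONTDUMP.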
[jira] [Commented] (TS-3820) Change default for proxy.config.http.redirect_host_no_port (to 1, enabled)
[ https://issues.apache.org/jira/browse/TS-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703241#comment-14703241 ] Gancho Tenev commented on TS-3820: -- The code worked as expected; I will upload the TSQA test that I used to verify it later. Change default for proxy.config.http.redirect_host_no_port (to 1, enabled) -- Key: TS-3820 URL: https://issues.apache.org/jira/browse/TS-3820 Project: Traffic Server Issue Type: Improvement Components: Configuration Reporter: Leif Hedstrom Assignee: Gancho Tenev Fix For: 6.1.0 I think the behavior of not adding on a port if it matches the scheme's default makes more sense. I assume the config, and the default of 0, was done for backwards compatibility? This relates to 56fbfdd2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
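The behavior being made the default here (omitting the port from the redirect host when it matches the scheme's default) can be sketched as below; redirect_host is an illustrative helper, not the ATS implementation:

```cpp
#include <cassert>
#include <string>

// Sketch: append ":port" to the redirect host only when the port
// differs from the scheme's default (80 for http, 443 for https).
inline std::string redirect_host(const std::string &scheme,
                                 const std::string &host, int port)
{
  const int default_port = (scheme == "https") ? 443 : 80;
  if (port == default_port) {
    return host; // e.g. http on port 80 -> no explicit port
  }
  return host + ":" + std::to_string(port);
}
```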
[jira] [Comment Edited] (TS-3740) header_rewrite plugin: set-redirect doesn't work with SEND_RESPONSE_HDR_HOOK
[ https://issues.apache.org/jira/browse/TS-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695916#comment-14695916 ] Gancho Tenev edited comment on TS-3740 at 8/13/15 8:50 PM: --- Yes, just from reading around the 2 patches (I didn't have a chance to experiment with the TS-3137 patch) it seems that they are meant to accomplish the same thing - enable the set-redirect operation when not called from the remap plugin. There is an important difference in the replacing of %{PATH} and in QSA mode (appending query parameters) when forming the Location header. The TS-3740 patch always uses the client request URI when replacing those variables to form the Location header, regardless of which hook condition matches, while the TS-3137 patch uses the corresponding URI at each particular hook (which is consistent with the way set-destination is implemented). I don't have visibility into all header-rewrite use-cases, but although at the time it looked more reasonable to always use the client request URI to form the Location header (it is always available regardless of which hook condition matches, it fitted the above origin time-out use-case well, and seemed a more straightforward way to configure the redirects), it may be more reasonable to do it the TS-3137 way (which also looks more consistent with the set-destination operation implementation). Any ideas and opinions are appreciated! was (Author: gancho): Yes, just by reading around the 2 patches (didn't have a chance to experiment with TS-3137 patch) it seems that they are meant to accomplish the same thing - enable set-redirect operation when not called from the remap plugin. There is an important difference in the replacing of %{PATH} and in QSA mode (appending query parameters to) when forming the Location header. 
TS-3740 patch always uses client request URI when replacing those variables to form the Location header regardless of which hook condition matches while TS-3137 patch uses the corresponding URI at each particular hook (which is consistent with the way set-destination is implemented). I don't have visibility of all header-rewrite use-cases but it seems that although at the time it looked more reasonable to always use the client request URI to form Location header (it is always available regardless of which hook condition matches, it fitted the above origin time-out use-case well and seemed a more straightforward way to configure the redirects), it may be more reasonable to do it in TS-3137 way (which looks also more consistent with set-destination operation implementation as well). Any ideas and opinions are appreciated! header_rewrite plugin: set-redirect doesn't work with SEND_RESPONSE_HDR_HOOK Key: TS-3740 URL: https://issues.apache.org/jira/browse/TS-3740 Project: Traffic Server Issue Type: Bug Components: Plugins Reporter: Gancho Tenev Assignee: Gancho Tenev Fix For: 6.1.0 DESCRIPTION: ATS header_rewrite plugin set-redirect operation doesn't work with SEND_RESPONSE_HDR_HOOK. Please see the debugging notes below for more info. HOW TO REPRODUCE: Here is a sample plugin configuration files that reproduce the problem $ cat /opt/ats/etc/trafficserver/remap.config map http://p1 http://h1:8001 \ @plugin=header_rewrite.so @pparam=/opt/ats/etc/trafficserver/header_rewrite.config $ cat /opt/ats/etc/trafficserver/header_rewrite.config cond %{SEND_RESPONSE_HDR_HOOK} cond %{STATUS} =502 set-redirect 302 http://p0/%{PATH} [QSA] DEBUGGING NOTES: Both conditions in the header_rewrite.config are evaluated correctly but set-redirect has no effect and the response to the UA is not modified as expected. After some debugging it turned out that if the set-redirect (OperatorSetDestination::exec) is not called from the remap plugin it has no effect. 
The header_rewrite plugin creates a continuation to be called from SEND_RESPONSE_HDR_HOOK (TSHttpHookAdd()). OperatorSetDestination::exec doesn't have code to handle the case when the set-redirect operation is _not_ called directly from the remap plugin (TSRemapDoRemap()). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-3740) header_rewrite plugin: set-redirect doesn't work with SEND_RESPONSE_HDR_HOOK
Gancho Tenev created TS-3740: Summary: header_rewrite plugin: set-redirect doesn't work with SEND_RESPONSE_HDR_HOOK Key: TS-3740 URL: https://issues.apache.org/jira/browse/TS-3740 Project: Traffic Server Issue Type: Bug Components: Plugins Reporter: Gancho Tenev DESCRIPTION: ATS header_rewrite plugin set-redirect operation doesn't work with SEND_RESPONSE_HDR_HOOK. Please see the debugging notes below for more info. HOW TO REPRODUCE: Here are sample plugin configuration files that reproduce the problem $ cat /opt/ats/etc/trafficserver/remap.config map http://p1 http://h1:8001 \ @plugin=header_rewrite.so @pparam=/opt/ats/etc/trafficserver/header_rewrite.config $ cat /opt/ats/etc/trafficserver/header_rewrite.config cond %{SEND_RESPONSE_HDR_HOOK} cond %{STATUS} =502 set-redirect 302 http://p0/%{PATH} [QSA] DEBUGGING NOTES: Both conditions in the header_rewrite.config are evaluated correctly but set-redirect has no effect and the response to the UA is not modified as expected. After some debugging it turned out that if the set-redirect (OperatorSetDestination::exec) is not called from the remap plugin it has no effect. The header_rewrite plugin creates a continuation to be called from SEND_RESPONSE_HDR_HOOK (TSHttpHookAdd()). OperatorSetDestination::exec doesn't have code to handle the case when the set-redirect operation is _not_ called directly from the remap plugin (TSRemapDoRemap()). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
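The config above substitutes %{PATH} into the redirect target and, with [QSA], appends the client's query string when forming the Location header. That expansion can be sketched as follows; make_location is a hypothetical helper, not the header_rewrite implementation:

```cpp
#include <cassert>
#include <string>

// Sketch: expand the %{PATH} placeholder with the request path and,
// in QSA mode, append the request's query string.
inline std::string make_location(std::string tmpl, const std::string &path,
                                 const std::string &query, bool qsa)
{
  const std::string token = "%{PATH}";
  const std::size_t pos = tmpl.find(token);
  if (pos != std::string::npos) {
    tmpl.replace(pos, token.size(), path);
  }
  if (qsa && !query.empty()) {
    tmpl += "?" + query;
  }
  return tmpl;
}
```

Which URI the path and query are taken from (the client request's, or the one current at the matching hook) is exactly the design question discussed in the comments above.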
[jira] [Updated] (TS-3649) url_sig plugin security issues (crash by HTTP request, circumvent signature)
[ https://issues.apache.org/jira/browse/TS-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-3649: - Attachment: TS-3649-url_sig-security_issues.patch Please find the patch attached: TS-3649-url_sig-security_issues.patch url_sig plugin security issues (crash by HTTP request, circumvent signature) Key: TS-3649 URL: https://issues.apache.org/jira/browse/TS-3649 Project: Traffic Server Issue Type: Bug Components: Plugins Reporter: Gancho Tenev Assignee: Gancho Tenev Fix For: 6.0.0 Attachments: TS-3649-url_sig-security_issues.patch, TS-3649-url_sig-security_issues.rtf While reading the code I found 2 security issues in the url_sig code which would allow: - Issue 1: crashing an ATS which is running the url_sig plugin by using an HTTP request (segmentation fault due to out-of-bounds array access) - the key index input (query parameter) needs proper sanitization - Issue 2: gaining access to protected assets by signing the URL with an empty secret key if at least one of the 16 keys is not provided in the url_sig plugin configuration. One could scan by trying all keys 0 to 15, and for the empty key the signature validation would succeed - access must be denied if the key specified in the signature is not defined in the plugin config (empty). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3649) url_sig plugin security issues (crash by HTTP request, circumvent signature)
[ https://issues.apache.org/jira/browse/TS-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-3649: - Description: While reading the code found 2 security issues url_sig code which would allow: - Issue 1: to crash ATS which is running the url_sig plugin by using an HTTP request (segmentation fault due out-of-bounds array access) - there is a need of proper sanitation of the key index input (query parameter) - Issue 2: to gain access to protected assets by signing the URL with an empty secret key if at least one of the 16 keys is not provided in the uri_sig plugin configuration. One could scan trying all keys 0 to 15 and for the empty key the signature validation would succeed - must deny access if the key specified in the signature is not defined in the plugin config (empty). was: While reading the code found 2 security issues url_sig code which would allow: - Issue 1: to crash ATS which is running the url_sig plugin by using an HTTP request (segmentation fault due out-of-bounds array access) - there is a need of proper sanitation of the key index input (query parameter) - Issue 2: to gain access to protected assets by signing the URL with an empty secret key if at least one of the 16 keys is not provided in the uri_sig plugin configuration. One could scan trying all keys 0 to 15 and for the empty key the signature validation would succeed - must to deny access if the key specified in the signature is not defined in the plugin config (empty). 
url_sig plugin security issues (crash by HTTP request, circumvent signature) Key: TS-3649 URL: https://issues.apache.org/jira/browse/TS-3649 Project: Traffic Server Issue Type: Bug Components: Plugins Reporter: Gancho Tenev Attachments: TS-3649-url_sig-security_issues.rtf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3649) url_sig plugin security issues (crash by HTTP request, circumvent signature)
[ https://issues.apache.org/jira/browse/TS-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565566#comment-14565566 ] Gancho Tenev commented on TS-3649: -- The fix is ready as well. url_sig plugin security issues (crash by HTTP request, circumvent signature) Key: TS-3649 URL: https://issues.apache.org/jira/browse/TS-3649 Project: Traffic Server Issue Type: Bug Components: Plugins Reporter: Gancho Tenev Attachments: TS-3649-url_sig-security_issues.rtf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-3649) url_sig plugin security issues (crash by HTTP request, circumvent signature)
Gancho Tenev created TS-3649: Summary: url_sig plugin security issues (crash by HTTP request, circumvent signature) Key: TS-3649 URL: https://issues.apache.org/jira/browse/TS-3649 Project: Traffic Server Issue Type: Bug Components: Plugins Reporter: Gancho Tenev -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3649) url_sig plugin security issues (crash by HTTP request, circumvent signature)
[ https://issues.apache.org/jira/browse/TS-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-3649: - Attachment: TS-3649-url_sig-security_issues.rtf Please find information on how to set up the environment and steps to reproduce both issues in the attached file. url_sig plugin security issues (crash by HTTP request, circumvent signature) Key: TS-3649 URL: https://issues.apache.org/jira/browse/TS-3649 Project: Traffic Server Issue Type: Bug Components: Plugins Reporter: Gancho Tenev Attachments: TS-3649-url_sig-security_issues.rtf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3340) Coverity fixes
[ https://issues.apache.org/jira/browse/TS-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-3340: - Description: This issue is meant for attaching all Coverity fixes from Gancho Tenev for 5.3.0 release. (was: This issue is meant for attaching all Coverity fixes from Gancho Tenev for 5.3.x release.) Fix Version/s: 5.3.0 Coverity fixes -- Key: TS-3340 URL: https://issues.apache.org/jira/browse/TS-3340 Project: Traffic Server Issue Type: Improvement Reporter: Gancho Tenev Priority: Minor Fix For: 5.3.0 This issue is meant for attaching all Coverity fixes from Gancho Tenev for 5.3.0 release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-3340) Coverity fixes
Gancho Tenev created TS-3340: Summary: Coverity fixes Key: TS-3340 URL: https://issues.apache.org/jira/browse/TS-3340 Project: Traffic Server Issue Type: Improvement Reporter: Gancho Tenev This issue is meant for attaching all Coverity fixes from Gancho Tenev for 5.3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3340) Coverity fixes
[ https://issues.apache.org/jira/browse/TS-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gancho Tenev updated TS-3340: - Priority: Minor (was: Major) Coverity fixes -- Key: TS-3340 URL: https://issues.apache.org/jira/browse/TS-3340 Project: Traffic Server Issue Type: Improvement Reporter: Gancho Tenev Priority: Minor This issue is meant for attaching all Coverity fixes from Gancho Tenev for 5.3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)