[jira] [Updated] (TS-5072) logstats: fix log buffer parser running state update

2016-12-02 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-5072:
-
Fix Version/s: 7.1.0

> logstats: fix log buffer parser running state update
> 
>
> Key: TS-5072
> URL: https://issues.apache.org/jira/browse/TS-5072
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging, Tools
>Reporter: Gancho Tenev
> Fix For: 7.1.0
>
>
> While refactoring for better code reuse as part of TS-5069, I found the
> following comment in the code that parses the log buffer:
> {code}
> // TODO: If we save state (struct) for a run, we probably need to always
> // update the origin data, no matter what the origin_set is.
> {code}
> (before in {{parse_log_buff()}}, now in {{find_or_create_stats()}})



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-5072) logstats: fix log buffer parser running state update

2016-12-02 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-5072:


 Summary: logstats: fix log buffer parser running state update
 Key: TS-5072
 URL: https://issues.apache.org/jira/browse/TS-5072
 Project: Traffic Server
  Issue Type: Bug
  Components: Logging, Tools
Reporter: Gancho Tenev


While refactoring for better code reuse as part of TS-5069, I found the
following comment in the code that parses the log buffer:

{code}
// TODO: If we save state (struct) for a run, we probably need to always
// update the origin data, no matter what the origin_set is.
{code}

(before in {{parse_log_buff()}}, now in {{find_or_create_stats()}})
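
For context, a minimal sketch of the gating the TODO refers to, assuming
{{origin_set}} acts as an optional filter on the aggregated origins (a
hypothetical shape, not the actual {{find_or_create_stats()}} implementation):

{code}
#include <map>
#include <set>
#include <string>

struct OriginStats { /* per-origin counters elided */ };

std::set<std::string> *origin_set = nullptr; // optional filter (nullptr = keep all)
std::map<std::string, OriginStats> origins;  // per-origin running state

OriginStats *
find_or_create_stats(const std::string &origin)
{
  // Origins outside origin_set are skipped entirely today. With saved
  // per-run state (a struct), this origin's data would still need to be
  // updated unconditionally -- which is what the TODO above points out.
  if (origin_set && origin_set->count(origin) == 0) {
    return nullptr;
  }
  return &origins[origin]; // find or create the per-origin stats entry
}
{code}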



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-5069) logstats: add ability to report stats per user instead of host

2016-11-29 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-5069:
-
Fix Version/s: 7.1.0

> logstats: add ability to report stats per user instead of host
> --
>
> Key: TS-5069
> URL: https://issues.apache.org/jira/browse/TS-5069
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging, Tools
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
> Fix For: 7.1.0
>
>
> We would like to enhance {{traffic_logstats}} with the ability to report 
> stats per user instead of host (from the URI).
> Currently {{traffic_logstats}} expects the binary squid log format defined
> in the following ATS log config, and it aggregates and reports stats per the
> authority part of the URI ({{host:port}} in the usual use-case):
> {code}
> <Format = "%<cqtq> %<ttms> %<chi> %<crc>/%<pssc> %<psql> %<cqhm> %<cquc> %<caun> %<phr>/%<pqsn> %<psct>"/>
> {code}
> It would be useful for our use-case to be able to aggregate and report
> stats based on the 8th squid log field, which is the username of the
> authenticated client ({{%<caun>}}).
> In our use-case we need to aggregate and report stats per 
> CDN-customer-specific-tag. 
> For example, the new functionality would allow us to replace {{%caun}} with
> arbitrary header content ({{%<\{@CustomerTagHeader\}cqh>}}) and report stats
> per CDN customer by using a new command-line parameter
> {{--report_per_user}}, without adding extra fields to the binary squid log
> format expected by {{traffic_logstats}}, keeping it backward compatible with
> the previous version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-5069) logstats: add ability to report stats per user instead of host

2016-11-29 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev reassigned TS-5069:


Assignee: Gancho Tenev

> logstats: add ability to report stats per user instead of host
> --
>
> Key: TS-5069
> URL: https://issues.apache.org/jira/browse/TS-5069
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging, Tools
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
>
> We would like to enhance {{traffic_logstats}} with the ability to report 
> stats per user instead of host (from the URI).
> Currently {{traffic_logstats}} expects the binary squid log format defined
> in the following ATS log config, and it aggregates and reports stats per the
> authority part of the URI ({{host:port}} in the usual use-case):
> {code}
> <Format = "%<cqtq> %<ttms> %<chi> %<crc>/%<pssc> %<psql> %<cqhm> %<cquc> %<caun> %<phr>/%<pqsn> %<psct>"/>
> {code}
> It would be useful for our use-case to be able to aggregate and report
> stats based on the 8th squid log field, which is the username of the
> authenticated client ({{%<caun>}}).
> In our use-case we need to aggregate and report stats per 
> CDN-customer-specific-tag. 
> For example, the new functionality would allow us to replace {{%caun}} with
> arbitrary header content ({{%<\{@CustomerTagHeader\}cqh>}}) and report stats
> per CDN customer by using a new command-line parameter
> {{--report_per_user}}, without adding extra fields to the binary squid log
> format expected by {{traffic_logstats}}, keeping it backward compatible with
> the previous version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-5069) logstats: add ability to report stats per user instead of host

2016-11-29 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-5069:


 Summary: logstats: add ability to report stats per user instead of 
host
 Key: TS-5069
 URL: https://issues.apache.org/jira/browse/TS-5069
 Project: Traffic Server
  Issue Type: Bug
  Components: Logging, Tools
Reporter: Gancho Tenev


We would like to enhance {{traffic_logstats}} with the ability to report stats 
per user instead of host (from the URI).

Currently {{traffic_logstats}} expects the binary squid log format defined in
the following ATS log config, and it aggregates and reports stats per the
authority part of the URI ({{host:port}} in the usual use-case):

{code}
<Format = "%<cqtq> %<ttms> %<chi> %<crc>/%<pssc> %<psql> %<cqhm> %<cquc> %<caun> %<phr>/%<pqsn> %<psct>"/>
{code}

It would be useful for our use-case to be able to aggregate and report stats
based on the 8th squid log field, which is the username of the authenticated
client ({{%<caun>}}).

In our use-case we need to aggregate and report stats per 
CDN-customer-specific-tag. 

For example, the new functionality would allow us to replace {{%caun}} with
arbitrary header content ({{%<\{@CustomerTagHeader\}cqh>}}) and report stats
per CDN customer by using a new command-line parameter {{--report_per_user}},
without adding extra fields to the binary squid log format expected by
{{traffic_logstats}}, keeping it backward compatible with the previous
version.
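
For illustration, such a format could look like the following in
{{logs_xml.config}}, with the 8th field swapped from {{%<caun>}} to the
customer tag header. This is a sketch only: {{@CustomerTagHeader}} is a
placeholder header name, and the format line simply mirrors the standard
squid format shown above:

{code}
<LogFormat>
  <Name = "squid_customer_tag"/>
  <Format = "%<cqtq> %<ttms> %<chi> %<crc>/%<pssc> %<psql> %<cqhm> %<cquc> %<{@CustomerTagHeader}cqh> %<phr>/%<pqsn> %<psct>"/>
</LogFormat>
{code}

{{traffic_logstats --report_per_user}} would then aggregate on that 8th field
instead of the URI authority, with the binary log layout left unchanged.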




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4916) Http2ConnectionState::restart_streams infinite loop causes deadlock

2016-11-02 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629730#comment-15629730
 ] 

Gancho Tenev commented on TS-4916:
--

Already back-ported with PR #1157.

> Http2ConnectionState::restart_streams infinite loop causes deadlock 
> 
>
> Key: TS-4916
> URL: https://issues.apache.org/jira/browse/TS-4916
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, HTTP/2
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
>Priority: Blocker
> Fix For: 7.0.0
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Http2ConnectionState::restart_streams falls into an infinite loop while
> holding a lock, which causes cache updates to start failing.
> The infinite loop is caused by traversing a list whose last element’s “next”
> pointer points to the element itself, so the traversal never finishes.
> {code}
> Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)):
> #0  0x2acf3fee in Http2ConnectionState::restart_streams 
> (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
> #1  rcv_window_update_frame (cstate=..., frame=...) at 
> Http2ConnectionState.cc:627
> #2  0x2acf9738 in Http2ConnectionState::main_event_handler 
> (this=0x2ae6ba5284c8, event=, edata=) at 
> Http2ConnectionState.cc:823
> #3  0x2acef1c3 in Continuation::handleEvent (data=0x2aaab3d039a0, 
> event=2253, this=0x2ae6ba5284c8) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #4  send_connection_event (cont=cont@entry=0x2ae6ba5284c8, 
> event=event@entry=2253, edata=edata@entry=0x2aaab3d039a0) at 
> Http2ClientSession.cc:58
> #5  0x2acef462 in Http2ClientSession::state_complete_frame_read 
> (this=0x2ae6ba528290, event=, edata=0x2aab7b237f18) at 
> Http2ClientSession.cc:426
> #6  0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=0x2ae6ba528290) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #7  Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, 
> event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
> #8  0x2acef5a3 in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=0x2ae6ba528290) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #9  Http2ClientSession::state_complete_frame_read (this=0x2ae6ba528290, 
> event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:431
> #10 0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=0x2ae6ba528290) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #11 Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, 
> event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
> #12 0x2ae67e2b in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #13 read_signal_and_update (vc=0x2aab7b237e00, vc@entry=0x1, 
> event=event@entry=100) at UnixNetVConnection.cc:153
> #14 UnixNetVConnection::readSignalAndUpdate (this=this@entry=0x2aab7b237e00, 
> event=event@entry=100) at UnixNetVConnection.cc:1036
> #15 0x2ae47653 in SSLNetVConnection::net_read_io 
> (this=0x2aab7b237e00, nh=0x2aaab2409cc0, lthread=0x2aaab2406000) at 
> SSLNetVConnection.cc:595
> #16 0x2ae5558c in NetHandler::mainNetEvent (this=0x2aaab2409cc0, 
> event=, e=) at UnixNet.cc:513
> #17 0x2ae8d2e6 in Continuation::handleEvent (data=0x2aaab0bfa700, 
> event=5, this=) at I_Continuation.h:153
> #18 EThread::process_event (calling_code=5, e=0x2aaab0bfa700, 
> this=0x2aaab2406000) at UnixEThread.cc:148
> #19 EThread::execute (this=0x2aaab2406000) at UnixEThread.cc:275
> #20 0x2ae8c0e6 in spawn_thread_internal (a=0x2aaab0b25bb0) at 
> Thread.cc:86
> #21 0x2d6b3aa1 in start_thread (arg=0x2aaab3d04700) at 
> pthread_create.c:301
> #22 0x2e8bc93d in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> {code}
> Here is the stream_list trace.
> {code}
> (gdb) thread 51
> [Switching to thread 51 (Thread 0x2aaab3d04700 (LWP 34270))]
> #0  0x2acf3fee in Http2ConnectionState::restart_streams 
> (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
> (gdb) trace_list stream_list
> --- count=0 ---
> id=29
> this=0x2ae673f0c840
> next=0x2aaac05d8900
> prev=(nil)
> --- count=1 ---
> id=27
> this=0x2aaac05d8900
> next=0x2ae5b6bbec00
> prev=0x2ae673f0c840
> --- count=2 ---
> id=19
> this=0x2ae5b6bbec00
> next=0x2ae5b6bbec00
> prev=0x2aaac05d8900
> --- count=3 ---
> id=19
> this=0x2ae5b6bbec00
> next=0x2ae5b6bbec00
> prev=0x2aaac05d8900
> . . . 
> --- count=5560 ---
> id=19
> this=0x2ae5b6bbec00
> next=0x2ae5b6bbec00
> prev=0x2aaac05d8900
> . . .
> {code}
> Currently I am working on finding out why the list in question got into
> this “impossible” (broken) state and eventually coming up with a fix.

[jira] [Comment Edited] (TS-4916) Http2ConnectionState::restart_streams infinite loop causes deadlock

2016-10-11 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565557#comment-15565557
 ] 

Gancho Tenev edited comment on TS-4916 at 10/11/16 3:07 PM:


[~shinrich], appreciate your comment!

I am running with version 6.2.1 and not with master.

{{DLL<>}} not being thread-safe while multiple threads manipulate it
concurrently was the first thing that came to mind. I changed the code so that
{{client_streams_count}} ++ / -- and {{stream_list.push()}} /
{{stream_list.remove()}} are called only from 2 corresponding new functions,
{{add_to_active_streams()}} and {{rm_from_active_streams()}} (in
{{Http2ConnectionState}}), and added an assert like the following:

{code}
(1) ink_assert(this->mutex->thread_holding == this_ethread());
{code}

which never triggered in my case (9+ days already)! 

In fact traffic_server gets into the same broken state described in my
previous post (a broken DLL structure with a few elements where the last
element points to itself) reliably and consistently on all the machines I
inspected (10+ at this time), and I saw the same symptoms (cache update
failures skyrocket after the H2 infinite loop happens) on many more machines.

Then I came up with the hypothesis described earlier and added the following
assert to the {{add_to_active_streams()}} function:

{code}
(2) ink_assert(stream_list.in(new_stream));
{code}

which started triggering quickly (in < 1 day) on all machines I tested. It was
always the case that {{stream_list.head == new_stream}} (a memory chunk with
the same address had already been added to stream_list).
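
For concreteness, a minimal sketch of that instrumentation, assuming the
wrapper placement described above (names come from this comment, not from an
actual patch; check (2) is written here in negated form so that the assert
fires on the double add):

{code}
void
Http2ConnectionState::add_to_active_streams(Http2Stream *new_stream)
{
  ink_assert(this->mutex->thread_holding == this_ethread()); // (1) lock held?
  ink_assert(!stream_list.in(new_stream)); // (2) fires when the same chunk
                                           //     is being added a second time
  stream_list.push(new_stream);
  ++client_streams_count;
}

void
Http2ConnectionState::rm_from_active_streams(Http2Stream *stream)
{
  ink_assert(this->mutex->thread_holding == this_ethread()); // (1) lock held?
  stream_list.remove(stream);
  --client_streams_count;
}
{code}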

Then I changed {{Http2Stream::delete_stream()}} so it can be called safely
even if the stream is already deleted, and added a “catch-all” call in
{{Http2Stream::destroy()}} to make sure the stream is deleted before being
destroyed; the problem never happened again ((1) and (2) never triggered for
at least 3+ days).

Looking into it more, it turned out that if {{Http2Stream::do_io_close()}}
gets into {{_state==HTTP2_STREAM_STATE_HALF_CLOSED_REMOTE}} (in version 6.2.1)
it fails to delete the stream before it orders the stream’s self-destruction
(a few lines below), so when it gets to {{Http2Stream::destroy()}} it frees
the memory without removing the stream from the active list and then runs into
the H2 infinite loop if the sequence of events described in my previous post
happens.

Making sure {{Http2Stream::do_io_close()}} deletes the stream before
requesting self-destruction (with {{VC_EVENT_EOS}}) fixes the problem (it can
be solved in many different ways).
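
A sketch of the ordering the fix enforces; the identifiers and the access path
to the connection state are simplified and partly hypothetical, so this shows
the shape of the change rather than the actual 6.2.1 diff:

{code}
void
Http2Stream::do_io_close(int /* lerrno */)
{
  // ...
  if (_state == HTTP2_STREAM_STATE_HALF_CLOSED_REMOTE) {
    // Unlink the stream from the connection state BEFORE the self
    // destruction request below; otherwise destroy() later frees the chunk
    // while it is still on the active stream_list.
    connection_state.delete_stream(this); // hypothetical access path
  }
  // ... a few lines below: request self destruction with VC_EVENT_EOS ...
}
{code}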

Although I can see that there might be a race condition in that part of the
code, I have just never run into it ((1) never triggers, and all experiments
fit pretty well with what I described in my previous post).

After your comment I identified 4 places where we might not be holding the
lock properly when modifying stream_list/client_streams_count (found by
reading/tracing the code, not actually observed), added the missing locks,
repeated the experiments, and got the same results (including the (2) failures
and the fix).

I see that the master code has changed quite a lot since 6.2.1; I checked with
[~zwoop] and will probably prepare a PR for a backport to 6.2.x.

Since the master code changed (e.g. {{Http2Stream::do_io_close()}}), there is
a chance we don’t run into this H2-infinite-loop condition anymore (not tested
yet), but we still use {{DLL<>}} and a memory pool in the same way, so it
seems possible to run into the same problem if we are not careful; I will
study the new master code more to see if we can do something to avoid it.

Cheers,
—Gancho



was (Author: gancho):
[~shinrich], appreciate your comment!

I am running with version 6.2.1 and not with master.

{{DLL<>}} not being thread-safe while multiple threads manipulate it
concurrently was the first thing that came to mind. I changed the code so that
{{client_streams_count}} ++ / -- and {{stream_list.push()}} /
{{stream_list.remove()}} are called only from 2 corresponding new functions,
{{add_to_active_streams()}} and {{rm_from_active_streams()}} (in
{{Http2ConnectionState}}), and added an assert like the following:

{code}
(1) ink_assert(this->mutex->thread_holding == this_ethread());
{code}

which never triggered in my case (9+ days already)! 

In fact traffic_server gets into the same broken state described in my
previous post (a broken DLL structure with a few elements where the last
element points to itself) reliably and consistently on all the machines I
inspected (10+ at this time), and I saw the same symptoms (cache update
failures skyrocket after the H2 infinite loop happens) on many more machines.

Then I came up with the hypothesis described earlier and added the following
assert to the {{add_to_active_streams()}} function:

{code}
(2) ink_assert(stream_list.in(stream));
{code}

which started triggering quickly (in < 1 day) on all machines I tested.

[jira] [Comment Edited] (TS-4916) Http2ConnectionState::restart_streams infinite loop causes deadlock

2016-10-11 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565557#comment-15565557
 ] 

Gancho Tenev edited comment on TS-4916 at 10/11/16 3:02 PM:


[~shinrich], appreciate your comment!

I am running with version 6.2.1 and not with master.

{{DLL<>}} not being thread-safe while multiple threads manipulate it
concurrently was the first thing that came to mind. I changed the code so that
{{client_streams_count}} ++ / -- and {{stream_list.push()}} /
{{stream_list.remove()}} are called only from 2 corresponding new functions,
{{add_to_active_streams()}} and {{rm_from_active_streams()}} (in
{{Http2ConnectionState}}), and added an assert like the following:

{code}
(1) ink_assert(this->mutex->thread_holding == this_ethread());
{code}

which never triggered in my case (9+ days already)! 

In fact traffic_server gets into the same broken state described in my
previous post (a broken DLL structure with a few elements where the last
element points to itself) reliably and consistently on all the machines I
inspected (10+ at this time), and I saw the same symptoms (cache update
failures skyrocket after the H2 infinite loop happens) on many more machines.

Then I came up with the hypothesis described earlier and added the following
assert to the {{add_to_active_streams()}} function:

{code}
(2) ink_assert(stream_list.in(stream));
{code}

which started triggering quickly (in < 1 day) on all machines I tested. It was
always the case that {{stream_list.head == new_stream}} (a memory chunk with
the same address had already been added to stream_list).

Then I changed {{Http2Stream::delete_stream()}} so it can be called safely
even if the stream is already deleted, and added a “catch-all” call in
{{Http2Stream::destroy()}} to make sure the stream is deleted before being
destroyed; the problem never happened again ((1) and (2) never triggered for
at least 3+ days).

Looking into it more, it turned out that if {{Http2Stream::do_io_close()}}
gets into {{_state==HTTP2_STREAM_STATE_HALF_CLOSED_REMOTE}} (in version 6.2.1)
it fails to delete the stream before it orders the stream’s self-destruction
(a few lines below), so when it gets to {{Http2Stream::destroy()}} it frees
the memory without removing the stream from the active list and then runs into
the H2 infinite loop if the sequence of events described in my previous post
happens.

Making sure {{Http2Stream::do_io_close()}} deletes the stream before
requesting self-destruction (with {{VC_EVENT_EOS}}) fixes the problem (it can
be solved in many different ways).

Although I can see that there might be a race condition in that part of the
code, I have just never run into it ((1) never triggers, and all experiments
fit pretty well with what I described in my previous post).

After your comment I identified 4 places where we might not be holding the
lock properly when modifying stream_list/client_streams_count (found by
reading/tracing the code, not actually observed), added the missing locks,
repeated the experiments, and got the same results (including the (2) failures
and the fix).

I see that the master code has changed quite a lot since 6.2.1; I checked with
[~zwoop] and will probably prepare a PR for a backport to 6.2.x.

Since the master code changed (e.g. {{Http2Stream::do_io_close()}}), there is
a chance we don’t run into this H2-infinite-loop condition anymore (not tested
yet), but we still use {{DLL<>}} and a memory pool in the same way, so it
seems possible to run into the same problem if we are not careful; I will
study the new master code more to see if we can do something to avoid it.

Cheers,
—Gancho



was (Author: gancho):
[~shinrich], appreciate your comment!

I am running with version 6.2.1 and not with master.

{{DLL<>}} not being thread-safe while multiple threads manipulate it
concurrently was the first thing that came to mind. I changed the code so that
{{client_streams_count}} ++ / -- and {{stream_list.push()}} /
{{stream_list.remove()}} are called only from 2 corresponding new functions,
{{add_to_active_streams()}} and {{rm_from_active_streams()}} (in
{{Http2ConnectionState}}), and added an assert like the following:

{code}
(1) ink_assert(this->mutex->thread_holding == this_ethread());
{code}

which never triggered in my case (9+ days already)! 

In fact traffic_server gets into the same broken state described in my
previous post (a broken DLL structure with a few elements where the last
element points to itself) reliably and consistently on all the machines I
inspected (10+ at this time), and I saw the same symptoms (cache update
failures skyrocket after the H2 infinite loop happens) on many more machines.

Then I came up with the hypothesis described earlier and added the following
assert to the {{add_to_active_streams()}} function:

{code}
(2) ink_assert(stream_list.in(stream));
{code}

which started triggering quickly (in < 1 day) on all machines I tested. It was
always the case that {{stream_list.head == new_stream}} (a memory chunk with
the same address had already been added to stream_list).

[jira] [Commented] (TS-4916) Http2ConnectionState::restart_streams infinite loop causes deadlock

2016-10-11 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565557#comment-15565557
 ] 

Gancho Tenev commented on TS-4916:
--

[~shinrich], appreciate your comment!

I am running with version 6.2.1 and not with master.

{{DLL<>}} not being thread-safe while multiple threads manipulate it
concurrently was the first thing that came to mind. I changed the code so that
{{client_streams_count}} ++ / -- and {{stream_list.push()}} /
{{stream_list.remove()}} are called only from 2 corresponding new functions,
{{add_to_active_streams()}} and {{rm_from_active_streams()}} (in
{{Http2ConnectionState}}), and added an assert like the following:

{code}
(1) ink_assert(this->mutex->thread_holding == this_ethread());
{code}

which never triggered in my case (9+ days already)! 

In fact traffic_server gets into the same broken state described in my
previous post (a broken DLL structure with a few elements where the last
element points to itself) reliably and consistently on all the machines I
inspected (10+ at this time), and I saw the same symptoms (cache update
failures skyrocket after the H2 infinite loop happens) on many more machines.

Then I came up with the hypothesis described earlier and added the following
assert to the {{add_to_active_streams()}} function:

{code}
(2) ink_assert(stream_list.in(stream));
{code}

which started triggering quickly (in < 1 day) on all machines I tested. It was
always the case that {{stream_list.head == new_stream}} (a memory chunk with
the same address had already been added to stream_list).

Then I changed {{Http2Stream::delete_stream()}} so it can be called safely
even if the stream is already deleted, and added a “catch-all” call in
{{Http2Stream::destroy()}} to make sure the stream is deleted before being
destroyed; the problem never happened again ((1) and (2) never triggered for
at least 3+ days).

Looking into it more, it turned out that if {{Http2Stream::do_io_close()}}
gets into {{_state==HTTP2_STREAM_STATE_HALF_CLOSED_REMOTE}} (in version 6.2.1)
it fails to delete the stream before it orders the stream’s self-destruction
(a few lines below), so when it gets to {{Http2Stream::destroy()}} it frees
the memory without removing the stream from the active list and then runs into
the H2 infinite loop if the sequence of events described in my previous post
happens.

Making sure {{Http2Stream::destroy()}} deletes the stream before requesting
self-destruction (with {{VC_EVENT_EOS}}) fixes the problem (it can be solved
in many different ways).

Although I can see that there might be a race condition in that part of the
code, I have just never run into it ((1) never triggers, and all experiments
fit pretty well with what I described in my previous post).

After your comment I identified 4 places where we might not be holding the
lock properly when modifying stream_list/client_streams_count (found by
reading/tracing the code, not actually observed), added the missing locks,
repeated the experiments, and got the same results (including the (2) failures
and the fix).

I see that the master code has changed quite a lot since 6.2.1; I checked with
[~zwoop] and will probably prepare a PR for a backport to 6.2.x.

Since the master code changed (e.g. {{Http2Stream::do_io_close()}}), there is
a chance we don’t run into this H2-infinite-loop condition anymore (not tested
yet), but we still use {{DLL<>}} and a memory pool in the same way, so it
seems possible to run into the same problem if we are not careful; I will
study the new master code more to see if we can do something to avoid it.

Cheers,
—Gancho


> Http2ConnectionState::restart_streams infinite loop causes deadlock 
> 
>
> Key: TS-4916
> URL: https://issues.apache.org/jira/browse/TS-4916
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, HTTP/2
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
>Priority: Blocker
> Fix For: 7.1.0
>
>
> Http2ConnectionState::restart_streams falls into an infinite loop while
> holding a lock, which causes cache updates to start failing.
> The infinite loop is caused by traversing a list whose last element’s “next”
> pointer points to the element itself, so the traversal never finishes.
> {code}
> Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)):
> #0  0x2acf3fee in Http2ConnectionState::restart_streams 
> (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
> #1  rcv_window_update_frame (cstate=..., frame=...) at 
> Http2ConnectionState.cc:627
> #2  0x2acf9738 in Http2ConnectionState::main_event_handler 
> (this=0x2ae6ba5284c8, event=, edata=) at 
> Http2ConnectionState.cc:823
> #3  0x2acef1c3 in Continuation::handleEvent (data=0x2aaab3d039a0, 
> event=2253, this=0x2ae6ba5284c8) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #4 

[jira] [Commented] (TS-4916) Http2ConnectionState::restart_streams infinite loop causes deadlock

2016-10-05 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548770#comment-15548770
 ] 

Gancho Tenev commented on TS-4916:
--

Looked into this more and here are my findings / hypothesis.

*Studied the list code (lib/ts/List.h)* and found a couple of problems (filed 
[TS-4935|https://issues.apache.org/jira/browse/TS-4935])

If the list is used improperly, its internal structure gets damaged silently.
If the same element is added twice in a row, the element’s “next” starts
pointing to the element itself and all the pre-existing list content is lost.
All further additions will look OK, but the next list traversal will be
infinite.

*How would we add the same element twice?*

Since a memory pool is used to instantiate the streams, it is possible to have
exactly the same chunk returned by the pool.

*How could adding the same chunk happen?*
# a stream N is created and used, and no new streams are created in the
meanwhile
# stream N is closed and its memory chunk is released back to the pool when
the stream is destroyed
# stream N is not removed from the list of active streams (bug!)
# a new stream N+1 is created right after destroying stream N and gets exactly
the same memory chunk from the memory pool that stream N used
# the new stream N+1 is added to the list of active streams, adding the same
memory chunk to the list for a second time in a row and damaging the list’s
internal structure
# new streams can be added and deleted after this point, but the next
active-stream-list iteration will be infinite (see the sketch after this list)
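
The following self-contained sketch reproduces steps 1-6 with a toy freelist
and a toy intrusive list modeled loosely on {{DLL<>}} from lib/ts/List.h (all
names are illustrative; none of this is actual ATS code):

{code}
#include <cassert>
#include <cstdio>

struct Stream {
  int id = 0;
  Stream *next = nullptr;
  Stream *prev = nullptr;
};

// Toy list: push() mirrors a DLL<>-style push, which does NOT check whether
// the element is already linked.
struct List {
  Stream *head = nullptr;
  void push(Stream *e) {
    e->next = head;
    if (head) head->prev = e;
    head = e;
  }
};

// Toy pool that hands back the most recently freed chunk, like a freelist.
static Stream slab;
static Stream *freelist = &slab;
Stream *pool_alloc() { Stream *c = freelist; freelist = nullptr; return c; }
void pool_free(Stream *c) { freelist = c; }

int main() {
  List stream_list;

  Stream *n = pool_alloc();   // (1) stream N is created ...
  n->id = 19;
  stream_list.push(n);        //     ... and added to the active list
  pool_free(n);               // (2) stream N destroyed, chunk back to the pool
                              // (3) but N was never removed from the list (bug!)
  Stream *n1 = pool_alloc();  // (4) stream N+1 reuses the SAME chunk
  assert(n1 == n);
  stream_list.push(n1);       // (5) same address pushed twice in a row
  assert(n1->next == n1);     //     -> its "next" now points to itself

  Stream s27, s29;            // (6) later additions look fine ...
  s27.id = 27;
  s29.id = 29;
  stream_list.push(&s27);
  stream_list.push(&s29);

  int count = 0;              // ... but the traversal never terminates
  for (Stream *s = stream_list.head; s && count <= 5; s = s->next, ++count)
    printf("--- count=%d ---\nid=%d this=%p next=%p\n", count, s->id,
           (void *)s, (void *)s->next);
  // Prints id=29, id=27, then id=19 forever (capped at 5 here) -- the exact
  // shape of the stream_list trace in the issue description.
}
{code}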

*Hypothesis validation*

By the time we identify the infinite loop (which is pretty straightforward),
all the useful info about how we got into this state is gone, so in order to
validate this hypothesis I had to collect some more data.

I instrumented the code with a check for failures to remove the stream from
the list of active streams right before destroying it, and in case of failure
removed it there (as a “catch-all” safety net just before destroying).

It usually took 1-3 days to reach the infinite loop/deadlock after restart. 

I ran the experimental code for 3+ days without getting into the infinite
loop / deadlock state, and the collected data indicates that it would have
failed to remove the stream from the active stream list (which would trigger
the infinite-loop state) 4 times during that period. I believe this validates
the hypothesis.

*Next steps*

Identified an execution path which could fail to remove the element from the
list before destroying the stream.
Implemented a patch which I just started testing in prod and will provide an
update as soon as I validate it.


Please let me know if more info is needed or something does not make sense,
and I will be happy to look into it!

Cheers,
--Gancho



> Http2ConnectionState::restart_streams infinite loop causes deadlock 
> 
>
> Key: TS-4916
> URL: https://issues.apache.org/jira/browse/TS-4916
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, HTTP/2
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
>Priority: Blocker
> Fix For: 7.1.0
>
>
> Http2ConnectionState::restart_streams falls into an infinite loop while
> holding a lock, which causes cache updates to start failing.
> The infinite loop is caused by traversing a list whose last element’s “next”
> pointer points to the element itself, so the traversal never finishes.
> {code}
> Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)):
> #0  0x2acf3fee in Http2ConnectionState::restart_streams 
> (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
> #1  rcv_window_update_frame (cstate=..., frame=...) at 
> Http2ConnectionState.cc:627
> #2  0x2acf9738 in Http2ConnectionState::main_event_handler 
> (this=0x2ae6ba5284c8, event=, edata=) at 
> Http2ConnectionState.cc:823
> #3  0x2acef1c3 in Continuation::handleEvent (data=0x2aaab3d039a0, 
> event=2253, this=0x2ae6ba5284c8) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #4  send_connection_event (cont=cont@entry=0x2ae6ba5284c8, 
> event=event@entry=2253, edata=edata@entry=0x2aaab3d039a0) at 
> Http2ClientSession.cc:58
> #5  0x2acef462 in Http2ClientSession::state_complete_frame_read 
> (this=0x2ae6ba528290, event=, edata=0x2aab7b237f18) at 
> Http2ClientSession.cc:426
> #6  0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=0x2ae6ba528290) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #7  Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, 
> event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
> #8  0x2acef5a3 in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=0x2ae6ba528290) at 
> 

[jira] [Created] (TS-4935) Adding same element twice in a row damages DLL's structure silently

2016-10-05 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-4935:


 Summary: Adding same element twice in a row damages DLL's 
structure silently
 Key: TS-4935
 URL: https://issues.apache.org/jira/browse/TS-4935
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Reporter: Gancho Tenev


If the DLL list (lib/ts/List.h) is used improperly, its internal structure
gets damaged silently, without any indication to the caller (no assert or
return code).

If the same element is added twice in a row, the element's “next” starts
pointing to the element itself and all the existing list content is lost. All
further additions will look OK, but the next list traversal will be infinite.

Also noticed that when a new element is added to the list, the element’s
“prev” is not initialized (not a problem in the most common case, but it
should be fixed).
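
A minimal sketch of the two corresponding fixes (an assert that catches the
back-to-back double add, plus initializing “prev” on insertion); this models a
DLL<>-style push and is not the actual lib/ts/List.h code:

{code}
#include <cassert>

struct Elem {
  Elem *next = nullptr;
  Elem *prev = nullptr;
};

struct List {
  Elem *head = nullptr;

  void push(Elem *e) {
    assert(e != head);  // adding the same element twice in a row would
                        // otherwise set e->next = e silently
    e->next = head;
    e->prev = nullptr;  // initialize prev on insertion (second issue above)
    if (head) head->prev = e;
    head = e;
  }
};
{code}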




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-4916) Http2ConnectionState::restart_streams infinite loop causes deadlock

2016-10-04 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev reassigned TS-4916:


Assignee: Gancho Tenev

> Http2ConnectionState::restart_streams infinite loop causes deadlock 
> 
>
> Key: TS-4916
> URL: https://issues.apache.org/jira/browse/TS-4916
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, HTTP/2
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
>Priority: Blocker
> Fix For: 7.1.0
>
>
> Http2ConnectionState::restart_streams falls into an infinite loop while
> holding a lock, which causes cache updates to start failing.
> The infinite loop is caused by traversing a list whose last element’s “next”
> pointer points to the element itself, so the traversal never finishes.
> {code}
> Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)):
> #0  0x2acf3fee in Http2ConnectionState::restart_streams 
> (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
> #1  rcv_window_update_frame (cstate=..., frame=...) at 
> Http2ConnectionState.cc:627
> #2  0x2acf9738 in Http2ConnectionState::main_event_handler 
> (this=0x2ae6ba5284c8, event=, edata=) at 
> Http2ConnectionState.cc:823
> #3  0x2acef1c3 in Continuation::handleEvent (data=0x2aaab3d039a0, 
> event=2253, this=0x2ae6ba5284c8) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #4  send_connection_event (cont=cont@entry=0x2ae6ba5284c8, 
> event=event@entry=2253, edata=edata@entry=0x2aaab3d039a0) at 
> Http2ClientSession.cc:58
> #5  0x2acef462 in Http2ClientSession::state_complete_frame_read 
> (this=0x2ae6ba528290, event=, edata=0x2aab7b237f18) at 
> Http2ClientSession.cc:426
> #6  0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=0x2ae6ba528290) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #7  Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, 
> event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
> #8  0x2acef5a3 in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=0x2ae6ba528290) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #9  Http2ClientSession::state_complete_frame_read (this=0x2ae6ba528290, 
> event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:431
> #10 0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=0x2ae6ba528290) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #11 Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, 
> event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
> #12 0x2ae67e2b in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #13 read_signal_and_update (vc=0x2aab7b237e00, vc@entry=0x1, 
> event=event@entry=100) at UnixNetVConnection.cc:153
> #14 UnixNetVConnection::readSignalAndUpdate (this=this@entry=0x2aab7b237e00, 
> event=event@entry=100) at UnixNetVConnection.cc:1036
> #15 0x2ae47653 in SSLNetVConnection::net_read_io 
> (this=0x2aab7b237e00, nh=0x2aaab2409cc0, lthread=0x2aaab2406000) at 
> SSLNetVConnection.cc:595
> #16 0x2ae5558c in NetHandler::mainNetEvent (this=0x2aaab2409cc0, 
> event=, e=) at UnixNet.cc:513
> #17 0x2ae8d2e6 in Continuation::handleEvent (data=0x2aaab0bfa700, 
> event=5, this=) at I_Continuation.h:153
> #18 EThread::process_event (calling_code=5, e=0x2aaab0bfa700, 
> this=0x2aaab2406000) at UnixEThread.cc:148
> #19 EThread::execute (this=0x2aaab2406000) at UnixEThread.cc:275
> #20 0x2ae8c0e6 in spawn_thread_internal (a=0x2aaab0b25bb0) at 
> Thread.cc:86
> #21 0x2d6b3aa1 in start_thread (arg=0x2aaab3d04700) at 
> pthread_create.c:301
> #22 0x2e8bc93d in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> {code}
> Here is the stream_list trace.
> {code}
> (gdb) thread 51
> [Switching to thread 51 (Thread 0x2aaab3d04700 (LWP 34270))]
> #0  0x2acf3fee in Http2ConnectionState::restart_streams 
> (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
> (gdb) trace_list stream_list
> --- count=0 ---
> id=29
> this=0x2ae673f0c840
> next=0x2aaac05d8900
> prev=(nil)
> --- count=1 ---
> id=27
> this=0x2aaac05d8900
> next=0x2ae5b6bbec00
> prev=0x2ae673f0c840
> --- count=2 ---
> id=19
> this=0x2ae5b6bbec00
> next=0x2ae5b6bbec00
> prev=0x2aaac05d8900
> --- count=3 ---
> id=19
> this=0x2ae5b6bbec00
> next=0x2ae5b6bbec00
> prev=0x2aaac05d8900
> . . . 
> --- count=5560 ---
> id=19
> this=0x2ae5b6bbec00
> next=0x2ae5b6bbec00
> prev=0x2aaac05d8900
> . . .
> {code}
> Currently I am working on finding out why the list in question got into this
> “impossible” (broken) state and eventually coming up with a fix.



--
This message was sent by Atlassian JIRA

[jira] [Updated] (TS-4916) Http2ConnectionState::restart_streams infinite loop causes deadlock

2016-09-30 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4916:
-
Description: 
Http2ConnectionState::restart_streams falls into an infinite loop while
holding a lock, which causes cache updates to start failing.

The infinite loop is caused by traversing a list whose last element’s “next”
pointer points to the element itself, so the traversal never finishes.

{code}
Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)):
#0  0x2acf3fee in Http2ConnectionState::restart_streams 
(this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
#1  rcv_window_update_frame (cstate=..., frame=...) at 
Http2ConnectionState.cc:627
#2  0x2acf9738 in Http2ConnectionState::main_event_handler 
(this=0x2ae6ba5284c8, event=, edata=) at 
Http2ConnectionState.cc:823
#3  0x2acef1c3 in Continuation::handleEvent (data=0x2aaab3d039a0, 
event=2253, this=0x2ae6ba5284c8) at 
../../iocore/eventsystem/I_Continuation.h:153
#4  send_connection_event (cont=cont@entry=0x2ae6ba5284c8, 
event=event@entry=2253, edata=edata@entry=0x2aaab3d039a0) at 
Http2ClientSession.cc:58
#5  0x2acef462 in Http2ClientSession::state_complete_frame_read 
(this=0x2ae6ba528290, event=, edata=0x2aab7b237f18) at 
Http2ClientSession.cc:426
#6  0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, 
event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153
#7  Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, 
event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
#8  0x2acef5a3 in Continuation::handleEvent (data=0x2aab7b237f18, 
event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153
#9  Http2ClientSession::state_complete_frame_read (this=0x2ae6ba528290, 
event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:431
#10 0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, 
event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153
#11 Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, 
event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
#12 0x2ae67e2b in Continuation::handleEvent (data=0x2aab7b237f18, 
event=100, this=) at 
../../iocore/eventsystem/I_Continuation.h:153
#13 read_signal_and_update (vc=0x2aab7b237e00, vc@entry=0x1, 
event=event@entry=100) at UnixNetVConnection.cc:153
#14 UnixNetVConnection::readSignalAndUpdate (this=this@entry=0x2aab7b237e00, 
event=event@entry=100) at UnixNetVConnection.cc:1036
#15 0x2ae47653 in SSLNetVConnection::net_read_io (this=0x2aab7b237e00, 
nh=0x2aaab2409cc0, lthread=0x2aaab2406000) at SSLNetVConnection.cc:595
#16 0x2ae5558c in NetHandler::mainNetEvent (this=0x2aaab2409cc0, 
event=, e=) at UnixNet.cc:513
#17 0x2ae8d2e6 in Continuation::handleEvent (data=0x2aaab0bfa700, 
event=5, this=) at I_Continuation.h:153
#18 EThread::process_event (calling_code=5, e=0x2aaab0bfa700, 
this=0x2aaab2406000) at UnixEThread.cc:148
#19 EThread::execute (this=0x2aaab2406000) at UnixEThread.cc:275
#20 0x2ae8c0e6 in spawn_thread_internal (a=0x2aaab0b25bb0) at 
Thread.cc:86
#21 0x2d6b3aa1 in start_thread (arg=0x2aaab3d04700) at 
pthread_create.c:301
#22 0x2e8bc93d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:115
{code}

Here is the stream_list trace.

{code}
(gdb) thread 51
[Switching to thread 51 (Thread 0x2aaab3d04700 (LWP 34270))]
#0  0x2acf3fee in Http2ConnectionState::restart_streams 
(this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913

(gdb) trace_list stream_list
--- count=0 ---
id=29
this=0x2ae673f0c840
next=0x2aaac05d8900
prev=(nil)
--- count=1 ---
id=27
this=0x2aaac05d8900
next=0x2ae5b6bbec00
prev=0x2ae673f0c840
--- count=2 ---
id=19
this=0x2ae5b6bbec00
next=0x2ae5b6bbec00
prev=0x2aaac05d8900
--- count=3 ---
id=19
this=0x2ae5b6bbec00
next=0x2ae5b6bbec00
prev=0x2aaac05d8900
. . . 
--- count=5560 ---
id=19
this=0x2ae5b6bbec00
next=0x2ae5b6bbec00
prev=0x2aaac05d8900
. . .
{code}

Currently I am working on finding out why the list in question got into this
“impossible” (broken) state and eventually coming up with a fix.

  was:
Http2ConnectionState::restart_streams falls into an infinite loop while
holding a lock, which causes cache updates to start failing.

The infinite loop is caused by traversing a list whose last element’s “next”
pointer points to the element itself, so the traversal never finishes.

{code}
Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)):
#0  0x2acf3fee in Http2ConnectionState::restart_streams 
(this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
#1  rcv_window_update_frame (cstate=..., frame=...) at 
Http2ConnectionState.cc:627
#2  0x2acf9738 in Http2ConnectionState::main_event_handler 
(this=0x2ae6ba5284c8, event=, edata=) at 
Http2ConnectionState.cc:823
#3  0x2acef1c3 in Continuation::handleEvent 

[jira] [Updated] (TS-4916) Http2ConnectionState::restart_streams infinite loop causes deadlock

2016-09-30 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4916:
-
Description: 
Http2ConnectionState::restart_streams falls into an infinite loop while
holding a lock, which causes cache updates to start failing.

The infinite loop is caused by traversing a list whose last element’s “next”
pointer points to the element itself, so the traversal never finishes.

{code}
Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)):
#0  0x2acf3fee in Http2ConnectionState::restart_streams 
(this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
#1  rcv_window_update_frame (cstate=..., frame=...) at 
Http2ConnectionState.cc:627
#2  0x2acf9738 in Http2ConnectionState::main_event_handler 
(this=0x2ae6ba5284c8, event=, edata=) at 
Http2ConnectionState.cc:823
#3  0x2acef1c3 in Continuation::handleEvent (data=0x2aaab3d039a0, 
event=2253, this=0x2ae6ba5284c8) at 
../../iocore/eventsystem/I_Continuation.h:153
#4  send_connection_event (cont=cont@entry=0x2ae6ba5284c8, 
event=event@entry=2253, edata=edata@entry=0x2aaab3d039a0) at 
Http2ClientSession.cc:58
#5  0x2acef462 in Http2ClientSession::state_complete_frame_read 
(this=0x2ae6ba528290, event=, edata=0x2aab7b237f18) at 
Http2ClientSession.cc:426
#6  0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, 
event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153
#7  Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, 
event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
#8  0x2acef5a3 in Continuation::handleEvent (data=0x2aab7b237f18, 
event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153
#9  Http2ClientSession::state_complete_frame_read (this=0x2ae6ba528290, 
event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:431
#10 0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, 
event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153
#11 Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, 
event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
#12 0x2ae67e2b in Continuation::handleEvent (data=0x2aab7b237f18, 
event=100, this=) at 
../../iocore/eventsystem/I_Continuation.h:153
#13 read_signal_and_update (vc=0x2aab7b237e00, vc@entry=0x1, 
event=event@entry=100) at UnixNetVConnection.cc:153
#14 UnixNetVConnection::readSignalAndUpdate (this=this@entry=0x2aab7b237e00, 
event=event@entry=100) at UnixNetVConnection.cc:1036
#15 0x2ae47653 in SSLNetVConnection::net_read_io (this=0x2aab7b237e00, 
nh=0x2aaab2409cc0, lthread=0x2aaab2406000) at SSLNetVConnection.cc:595
#16 0x2ae5558c in NetHandler::mainNetEvent (this=0x2aaab2409cc0, 
event=, e=) at UnixNet.cc:513
#17 0x2ae8d2e6 in Continuation::handleEvent (data=0x2aaab0bfa700, 
event=5, this=) at I_Continuation.h:153
#18 EThread::process_event (calling_code=5, e=0x2aaab0bfa700, 
this=0x2aaab2406000) at UnixEThread.cc:148
#19 EThread::execute (this=0x2aaab2406000) at UnixEThread.cc:275
#20 0x2ae8c0e6 in spawn_thread_internal (a=0x2aaab0b25bb0) at 
Thread.cc:86
#21 0x2d6b3aa1 in start_thread (arg=0x2aaab3d04700) at 
pthread_create.c:301
#22 0x2e8bc93d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:115

(gdb) thread 51
[Switching to thread 51 (Thread 0x2aaab3d04700 (LWP 34270))]
#0  0x2acf3fee in Http2ConnectionState::restart_streams 
(this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913

(gdb) trace_list stream_list
--- count=0 ---
id=29
this=0x2ae673f0c840
next=0x2aaac05d8900
prev=(nil)
--- count=1 ---
id=27
this=0x2aaac05d8900
next=0x2ae5b6bbec00
prev=0x2ae673f0c840
--- count=2 ---
id=19
this=0x2ae5b6bbec00
next=0x2ae5b6bbec00
prev=0x2aaac05d8900
--- count=3 ---
id=19
this=0x2ae5b6bbec00
next=0x2ae5b6bbec00
prev=0x2aaac05d8900
. . . 
--- count=5560 ---
id=19
this=0x2ae5b6bbec00
next=0x2ae5b6bbec00
prev=0x2aaac05d8900
. . .
{code}

Currently I am working on finding out why the list in question got into this
“impossible” (broken) state and eventually coming up with a fix.

  was:
Http2ConnectionState::restart_streams falls into an infinite loop while
holding a lock, which causes cache updates to start failing.

The infinite loop is caused by traversing a list whose last element’s “next”
pointer points to the element itself, so the traversal never finishes.

{code}
Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)):
#0  0x2acf3fee in Http2ConnectionState::restart_streams 
(this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
#1  rcv_window_update_frame (cstate=..., frame=...) at 
Http2ConnectionState.cc:627
#2  0x2acf9738 in Http2ConnectionState::main_event_handler 
(this=0x2ae6ba5284c8, event=, edata=) at 
Http2ConnectionState.cc:823
#3  0x2acef1c3 in Continuation::handleEvent (data=0x2aaab3d039a0, 
event=2253, 

[jira] [Created] (TS-4916) Http2ConnectionState::restart_streams infinite loop causes deadlock

2016-09-30 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-4916:


 Summary: Http2ConnectionState::restart_streams infinite loop
causes deadlock
 Key: TS-4916
 URL: https://issues.apache.org/jira/browse/TS-4916
 Project: Traffic Server
  Issue Type: Bug
  Components: Core, HTTP/2
Reporter: Gancho Tenev


Http2ConnectionState::restart_streams falls into an infinite loop while
holding a lock, which causes cache updates to start failing.

The infinite loop is caused by traversing a list whose last element’s “next”
pointer points to the element itself, so the traversal never finishes.

{code}
Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)):
#0  0x2acf3fee in Http2ConnectionState::restart_streams 
(this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
#1  rcv_window_update_frame (cstate=..., frame=...) at 
Http2ConnectionState.cc:627
#2  0x2acf9738 in Http2ConnectionState::main_event_handler 
(this=0x2ae6ba5284c8, event=, edata=) at 
Http2ConnectionState.cc:823
#3  0x2acef1c3 in Continuation::handleEvent (data=0x2aaab3d039a0, 
event=2253, this=0x2ae6ba5284c8) at 
../../iocore/eventsystem/I_Continuation.h:153
#4  send_connection_event (cont=cont@entry=0x2ae6ba5284c8, 
event=event@entry=2253, edata=edata@entry=0x2aaab3d039a0) at 
Http2ClientSession.cc:58
#5  0x2acef462 in Http2ClientSession::state_complete_frame_read 
(this=0x2ae6ba528290, event=, edata=0x2aab7b237f18) at 
Http2ClientSession.cc:426
#6  0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, 
event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153
#7  Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, 
event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
#8  0x2acef5a3 in Continuation::handleEvent (data=0x2aab7b237f18, 
event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153
#9  Http2ClientSession::state_complete_frame_read (this=0x2ae6ba528290, 
event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:431
#10 0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, 
event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153
#11 Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, 
event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
#12 0x2ae67e2b in Continuation::handleEvent (data=0x2aab7b237f18, 
event=100, this=) at 
../../iocore/eventsystem/I_Continuation.h:153
#13 read_signal_and_update (vc=0x2aab7b237e00, vc@entry=0x1, 
event=event@entry=100) at UnixNetVConnection.cc:153
#14 UnixNetVConnection::readSignalAndUpdate (this=this@entry=0x2aab7b237e00, 
event=event@entry=100) at UnixNetVConnection.cc:1036
#15 0x2ae47653 in SSLNetVConnection::net_read_io (this=0x2aab7b237e00, 
nh=0x2aaab2409cc0, lthread=0x2aaab2406000) at SSLNetVConnection.cc:595
#16 0x2ae5558c in NetHandler::mainNetEvent (this=0x2aaab2409cc0, 
event=, e=) at UnixNet.cc:513
#17 0x2ae8d2e6 in Continuation::handleEvent (data=0x2aaab0bfa700, 
event=5, this=) at I_Continuation.h:153
#18 EThread::process_event (calling_code=5, e=0x2aaab0bfa700, 
this=0x2aaab2406000) at UnixEThread.cc:148
#19 EThread::execute (this=0x2aaab2406000) at UnixEThread.cc:275
#20 0x2ae8c0e6 in spawn_thread_internal (a=0x2aaab0b25bb0) at 
Thread.cc:86
#21 0x2d6b3aa1 in start_thread (arg=0x2aaab3d04700) at 
pthread_create.c:301
#22 0x2e8bc93d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:115

(gdb) thread 51
[Switching to thread 51 (Thread 0x2aaab3d04700 (LWP 34270))]
#0  0x2acf3fee in Http2ConnectionState::restart_streams 
(this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913

(gdb) trace_list stream_list
--- count=0 ---
id=29
this=0x2ae673f0c840
next=0x2aaac05d8900
prev=(nil)
--- count=1 ---
id=27
this=0x2aaac05d8900
next=0x2ae5b6bbec00
prev=0x2ae673f0c840
--- count=2 ---
id=19
this=0x2ae5b6bbec00
next=0x2ae5b6bbec00
prev=0x2aaac05d8900
--- count=3 ---
id=19
this=0x2ae5b6bbec00
next=0x2ae5b6bbec00
prev=0x2aaac05d8900
. . . 
--- count=5560 ---
id=19
this=0x2ae5b6bbec00
next=0x2ae5b6bbec00
prev=0x2aaac05d8900
. . .
{code}

Currently I am working on finding out why the list in question got into this
“impossible” (broken) state and eventually coming up with a fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4334) The cache_range_requests plugin always attempts to modify the cache key.

2016-09-21 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510517#comment-15510517
 ] 

Gancho Tenev commented on TS-4334:
--

[~jamesf], sounds good. The above solution was just to demonstrate the idea;
you would need to adjust it to your particular use-case and verify that it
works.

Cheers,
--Gancho

> The cache_range_requests plugin always attempts to modify the cache key.
> 
>
> Key: TS-4334
> URL: https://issues.apache.org/jira/browse/TS-4334
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Plugins
>Reporter: Nolan Astrein
>Assignee: Gancho Tenev
> Fix For: 7.1.0
>
>
> A TrafficServer administrator should be able to specify whether or not the 
> cache_range_requests plugin should modify the cache key.  The cache key may 
> be modified by a previous plugin in a plugin chain and there is no way to 
> configure cache_range_requests not to do any further modifications to the 
> cache key.  Having multiple plugins responsible for cache key modifications 
> can cause unexpected behavior, especially when a plugin chain ordering is 
> changed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4334) The cache_range_requests plugin always attempts to modify the cache key.

2016-09-19 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505097#comment-15505097
 ] 

Gancho Tenev commented on TS-4334:
--

Checked with [~latesonarinn] offline and here is an excerpt:

9/16/16, 12:06 PM Gancho Tenev:
{quote}
Proposed alternative solution using more generic means (cachekey and 
header_rewrite) here: [TS-4334|https://issues.apache.org/jira/browse/TS-4334]
Could you please let me know if it works for you?
{quote}

9/19/16, 9:59 AM Nolan Astrein: 
{quote}
Clever solution.  I think that will work.
{quote}


[~latesonarinn], could you please verify and/or close this Jira if all looks 
good? 
(if not please let me know if I can help!)

Cheers!

> The cache_range_requests plugin always attempts to modify the cache key.
> 
>
> Key: TS-4334
> URL: https://issues.apache.org/jira/browse/TS-4334
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Plugins
>Reporter: Nolan Astrein
>Assignee: Gancho Tenev
> Fix For: 7.1.0
>
>
> A TrafficServer administrator should be able to specify whether or not the 
> cache_range_requests plugin should modify the cache key.  The cache key may 
> be modified by a previous plugin in a plugin chain and there is no way to 
> configure cache_range_requests not to do any further modifications to the 
> cache key.  Having multiple plugins responsible for cache key modifications 
> can cause unexpected behavior, especially when a plugin chain ordering is 
> changed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4707) Parent Consistent Hash Selection - add fname and maxdirs options.

2016-09-19 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504833#comment-15504833
 ] 

Gancho Tenev commented on TS-4707:
--

[~jrushford], [~pbchou], [~zwoop], 

I am not sure if I have studied this enough, but here is an idea: it seems
that we could add a switch to the {{cachekey}} plugin to make it use the
"modify parent selection URI" API call instead of the "modify cache key URI"
API when something like {{\@plugin=cachekey \@pparam=--parent_selection}} is
used.

This way we may be able to use different {{cachekey}} plugin instances
independently per remap rule (one for modifying the cache key URI and one for
modifying the parent selection URI) to cover more use-cases, reuse the
{{cachekey}} URI-manipulation functionality, and keep a consistent user
experience. If this makes sense and works, we may end up renaming the plugin
to something more generic.

Currently the {{cachekey}} plugin can already provide the {{fname}} and
{{maxdirs}} behavior by using some of its regex-related features (please see
{{--capture-path=/capture/replace/}} in the [cachekey
docs|https://docs.trafficserver.apache.org/en/latest/admin-guide/plugins/cachekey.en.html#path-section];
 I can provide examples if requested).

I think {{fname}} would not make sense for manipulating the cache key URI,
and if we insist on non-regex ways to achieve {{fname}} and {{maxdirs}} we
could add them as features available only when {{--parent_selection}} is used,
or something along those lines.

If this sounds sensible/feasible I can work on the {{cachekey}} plugin change.
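
To make the idea concrete, a hypothetical remap rule with two independent
{{cachekey}} instances; {{--parent_selection}} is the proposed switch (it does
not exist today), and the other parameters are existing {{cachekey}} options
used purely as examples:

{code}
# Hypothetical: the first cachekey instance shapes the cache key, the second
# (with the proposed --parent_selection switch) would shape only the parent
# selection URI, e.g. dropping the file name ("fname") via --capture-path.
map http://demo.example.com/ http://origin.example.com/ \
    @plugin=cachekey.so @pparam=--remove-all-params=true \
    @plugin=cachekey.so @pparam=--parent_selection \
    @pparam=--capture-path=/(.*\/).*/$1/
{code}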

> Parent Consistent Hash Selection - add fname and maxdirs options.
> -
>
> Key: TS-4707
> URL: https://issues.apache.org/jira/browse/TS-4707
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Parent Proxy
>Reporter: Peter Chou
>Assignee: Peter Chou
> Fix For: 7.1.0
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> This enhancement adds two options, "fname" and "maxdirs", which can be used 
> to exclude the file-name and some of the directories in the path. The 
> remaining portions of the path are then used as part of the hash computation 
> for selecting among multiple parent caches.
> For our usage, it was desirable from an operational perspective to direct all 
> components of particular sub-tree to a single parent cache (to simplify 
> trouble-shooting, pre-loading, etc.). This can be achieved by excluding the 
> query-string, file-name, and right-most portions of the path from the hash 
> computation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4870) Storage can be marked offline multiple times which breaks related metrics

2016-09-15 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494410#comment-15494410
 ] 

Gancho Tenev commented on TS-4870:
--

Repeat the test with the patch applied:

{code}
# Initial cache size (when using both disks).
$ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
proxy.node.cache.bytes_total 268025856

# Take 1st disk offline. Cache size changes as expected.
$ sudo ./bin/traffic_ctl storage offline /dev/sdb
$ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
proxy.node.cache.bytes_total 134012928

# Take same disk offline again. Now good!
$ sudo ./bin/traffic_ctl storage offline /dev/sdb
$ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
proxy.node.cache.bytes_total 134012928

# Take same disk offline again. Good again.
$ sudo ./bin/traffic_ctl storage offline /dev/sdb
$ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
proxy.node.cache.bytes_total 134012928
{code}

> Storage can be marked offline multiple times which breaks related metrics
> -
>
> Key: TS-4870
> URL: https://issues.apache.org/jira/browse/TS-4870
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache, Metrics
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
> Fix For: 7.0.0
>
>
> Let us say traffic server is running with 2 disks
> {code}
> $ cat etc/trafficserver/storage.config
> /dev/sdb
> /dev/sdc
> $ sudo fdisk -l|grep 'Disk /dev/sd[b|c]'
> Disk /dev/sdb: 134 MB, 134217728 bytes
> Disk /dev/sdc: 134 MB, 134217728 bytes
> {code}
> Let us see what happens when we mark the same disk offline 3 times in a row 
> ({{/dev/sdb}}) and check the {{proxy.node.cache.bytes_total}}.
> {code}
> # Initial cache size (when using both disks).
> $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
> proxy.node.cache.bytes_total 268025856
> # Take 1st disk offline. Cache size changes as expected.
> $ sudo ./bin/traffic_ctl storage offline /dev/sdb
> $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
> proxy.node.cache.bytes_total 134012928
> # Take same disk offline again. Not good!
> $ sudo ./bin/traffic_ctl storage offline /dev/sdb
> $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
> proxy.node.cache.bytes_total 0
> # Take same disk offline again. Negative value.
> $ sudo ./bin/traffic_ctl storage offline /dev/sdb
> $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
> proxy.node.cache.bytes_total -134012928
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4870) Storage can be marked offline multiple times which breaks related metrics

2016-09-15 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4870:
-
Fix Version/s: 7.0.0

> Storage can be marked offline multiple times which breaks related metrics
> -
>
> Key: TS-4870
> URL: https://issues.apache.org/jira/browse/TS-4870
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache, Metrics
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
> Fix For: 7.0.0
>
>
> Let us say traffic server is running with 2 disks
> {code}
> $ cat etc/trafficserver/storage.config
> /dev/sdb
> /dev/sdc
> $ sudo fdisk -l|grep 'Disk /dev/sd[b|c]'
> Disk /dev/sdb: 134 MB, 134217728 bytes
> Disk /dev/sdc: 134 MB, 134217728 bytes
> {code}
> Let us see what happens when we mark the same disk offline 3 times in a row 
> ({{/dev/sdb}}) and check the {{proxy.node.cache.bytes_total}}.
> {code}
> # Initial cache size (when using both disks).
> $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
> proxy.node.cache.bytes_total 268025856
> # Take 1st disk offline. Cache size changes as expected.
> $ sudo ./bin/traffic_ctl storage offline /dev/sdb
> $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
> proxy.node.cache.bytes_total 134012928
> # Take same disk offline again. Not good!
> $ sudo ./bin/traffic_ctl storage offline /dev/sdb
> $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
> proxy.node.cache.bytes_total 0
> # Take same disk offline again. Negative value.
> $ sudo ./bin/traffic_ctl storage offline /dev/sdb
> $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
> proxy.node.cache.bytes_total -134012928
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4870) Storage can be marked offline multiple times which breaks related metrics

2016-09-15 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4870:
-
Component/s: Metrics

> Storage can be marked offline multiple times which breaks related metrics
> -
>
> Key: TS-4870
> URL: https://issues.apache.org/jira/browse/TS-4870
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache, Metrics
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
> Fix For: 7.0.0
>
>
> Let us say traffic server is running with 2 disks
> {code}
> $ cat etc/trafficserver/storage.config
> /dev/sdb
> /dev/sdc
> $ sudo fdisk -l|grep 'Disk /dev/sd[b|c]'
> Disk /dev/sdb: 134 MB, 134217728 bytes
> Disk /dev/sdc: 134 MB, 134217728 bytes
> {code}
> Let us see what happens when we mark the same disk offline 3 times in a row 
> ({{/dev/sdb}}) and check the {{proxy.node.cache.bytes_total}}.
> {code}
> # Initial cache size (when using both disks).
> $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
> proxy.node.cache.bytes_total 268025856
> # Take 1st disk offline. Cache size changes as expected.
> $ sudo ./bin/traffic_ctl storage offline /dev/sdb
> $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
> proxy.node.cache.bytes_total 134012928
> # Take same disk offline again. Not good!
> $ sudo ./bin/traffic_ctl storage offline /dev/sdb
> $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
> proxy.node.cache.bytes_total 0
> # Take same disk offline again. Negative value.
> $ sudo ./bin/traffic_ctl storage offline /dev/sdb
> $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
> proxy.node.cache.bytes_total -134012928
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4870) Storage can be marked offline multiple times which breaks related metrics

2016-09-15 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4870:
-
Assignee: Gancho Tenev

> Storage can be marked offline multiple times which breaks related metrics
> -
>
> Key: TS-4870
> URL: https://issues.apache.org/jira/browse/TS-4870
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
>
> Let us say traffic server is running with 2 disks
> {code}
> $ cat etc/trafficserver/storage.config
> /dev/sdb
> /dev/sdc
> $ sudo fdisk -l|grep 'Disk /dev/sd[b|c]'
> Disk /dev/sdb: 134 MB, 134217728 bytes
> Disk /dev/sdc: 134 MB, 134217728 bytes
> {code}
> Let us see what happens when we mark the same disk offline 3 times in a row 
> ({{/dev/sdb}}) and check the {{proxy.node.cache.bytes_total}}.
> {code}
> # Initial cache size (when using both disks).
> $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
> proxy.node.cache.bytes_total 268025856
> # Take 1st disk offline. Cache size changes as expected.
> $ sudo ./bin/traffic_ctl storage offline /dev/sdb
> $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
> proxy.node.cache.bytes_total 134012928
> # Take same disk offline again. Not good!
> $ sudo ./bin/traffic_ctl storage offline /dev/sdb
> $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
> proxy.node.cache.bytes_total 0
> # Take same disk offline again. Negative value.
> $ sudo ./bin/traffic_ctl storage offline /dev/sdb
> $ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
> proxy.node.cache.bytes_total -134012928
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4870) Storage can be marked offline multiple times which breaks related metrics

2016-09-15 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-4870:


 Summary: Storage can be marked offline multiple times which breaks 
related metrics
 Key: TS-4870
 URL: https://issues.apache.org/jira/browse/TS-4870
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Reporter: Gancho Tenev


Let us say traffic server is running with 2 disks
{code}
$ cat etc/trafficserver/storage.config
/dev/sdb
/dev/sdc

$ sudo fdisk -l|grep 'Disk /dev/sd[b|c]'
Disk /dev/sdb: 134 MB, 134217728 bytes
Disk /dev/sdc: 134 MB, 134217728 bytes
{code}

Let us see what happens when we mark the same disk offline 3 times in a row 
({{/dev/sdb}}) and check the {{proxy.node.cache.bytes_total}}.

{code}
# Initial cache size (when using both disks).
$ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
proxy.node.cache.bytes_total 268025856

# Take 1st disk offline. Cache size changes as expected.
$ sudo ./bin/traffic_ctl storage offline /dev/sdb
$ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
proxy.node.cache.bytes_total 134012928

# Take same disk offline again. Not good!
$ sudo ./bin/traffic_ctl storage offline /dev/sdb
$ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
proxy.node.cache.bytes_total 0

# Take same disk offline again. Negative value.
$ sudo ./bin/traffic_ctl storage offline /dev/sdb
$ ./bin/traffic_ctl metric get proxy.node.cache.bytes_total
proxy.node.cache.bytes_total -134012928
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4834) Expose bad disk and disk access failures

2016-09-08 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4834:
-
Description: 
We would like to monitor low-level disk access failures and disks marked by ATS 
as bad.

I have a patch that exposes that information through
{code}
proxy.process.cache.disk_error_count 10
proxy.process.cache.disk_bad_count 5
{code}

and the following test shows how it would work...

Start ATS with 2 disks and tail {{diags.log}}

{code}
$ cat etc/trafficserver/storage.config
/dev/sdb
/dev/sdc

$ tail -f var/log/trafficserver/diags.log
[Sep  8 12:18:48.149] Server {0x2b5f43db54c0} NOTE: traffic server running
[Sep  8 12:18:48.198] Server {0x2b5f44654700} NOTE: cache enabled
{code}

Check related metrics and observe all 0s

{code}
$ ./bin/traffic_ctl metric match "proxy.process.cache*.disk.*" 
"proxy.process.cache.*(read|write).failure" 
"proxy.process.http.cache_(read|write)_errors"
proxy.process.cache.disk_error_count 0
proxy.process.cache.disk_bad_count 0
proxy.process.cache.read.failure 0
proxy.process.cache.write.failure 0
proxy.process.cache.volume_0.read.failure 0
proxy.process.cache.volume_0.write.failure 0
proxy.process.http.cache_write_errors 0
proxy.process.http.cache_read_errors 0
{code}

Now, using your favorite hard disk failure injection tool, inject failures by 
setting both disks used by this setup ({{/dev/sdb}} and {{/dev/sdc}}) to fail 
all reads, and shoot 5 requests, causing 10 failed reads.
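
As a concrete possibility (illustration only, the test does not depend on any 
specific tool), on Linux the device-mapper {{error}} target could serve as the 
failure injection tool:

{code}
# Map the whole device through dm-error so that every I/O to it fails
# (sizes are in 512-byte sectors); storage.config would then point at
# /dev/mapper/sdb-bad instead of /dev/sdb.
$ sudo dmsetup create sdb-bad --table "0 $(sudo blockdev --getsz /dev/sdb) error"
{code}

The 5 requests and the resulting {{diags.log}} output: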

{code}
$ for i in 1 2 3 4 5; do curl -x 127.0.0.1:80 http://example.com/1 -o /dev/null 
-s; done

$ tail -f var/log/trafficserver/diags.log
[Sep  8 12:19:09.758] Server {0x2aaab4302700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.759] Server {0x2aaac0100700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.764] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
/dev/sdb [1/10]
[Sep  8 12:19:09.769] Server {0x2b5f44654700} WARNING: Error accessing Disk 
/dev/sdb [2/10]
[Sep  8 12:19:09.785] Server {0x2aaac0100700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.786] Server {0x2aaab4302700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.791] Server {0x2b5f44654700} WARNING: Error accessing Disk 
/dev/sdb [3/10]
[Sep  8 12:19:09.796] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
/dev/sdb [4/10]
[Sep  8 12:19:09.812] Server {0x2aaab4100700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.813] Server {0x2aaacc100700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.817] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
/dev/sdb [5/10]
[Sep  8 12:19:09.823] Server {0x2b5f44654700} WARNING: Error accessing Disk 
/dev/sdb [6/10]
[Sep  8 12:19:09.843] Server {0x2aaacc302700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.844] Server {0x2aaad8100700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.847] Server {0x2b5f44654700} WARNING: Error accessing Disk 
/dev/sdb [7/10]
[Sep  8 12:19:09.854] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
/dev/sdb [8/10]
[Sep  8 12:19:09.874] Server {0x2aaacc302700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.875] Server {0x2aaad8100700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.880] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
/dev/sdb [9/10]
[Sep  8 12:19:09.887] Server {0x2b5f44654700} WARNING: too many errors 
accessing disk /dev/sdb [10/10]: declaring disk bad
{code}

We see 5 read failures, which triggered 10 actual disk reads and got the 
failing disk marked as bad.

{code}
$ ./bin/traffic_ctl metric match "proxy.process.cache*.disk.*" 
"proxy.process.cache.*(read|write).failure" 
"proxy.process.http.cache_(read|write)_errors"
proxy.process.cache.disk_error_count 10
proxy.process.cache.disk_bad_count 1
proxy.process.cache.read.failure 5
proxy.process.cache.write.failure 5
proxy.process.cache.volume_0.read.failure 5
proxy.process.cache.volume_0.write.failure 5
proxy.process.http.cache_write_errors 0
proxy.process.http.cache_read_errors 0
{code}

Now shoot 5 requests causing 10 failed reads.

{code}
$ for i in 1 2 3 4 5; do curl -x 127.0.0.1:80 http://example.com/1 -o /dev/null 
-s; done

$ tail -f var/log/trafficserver/diags.log
[Sep  8 12:26:02.874] Server {0x2aaae4100700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:26:02.875] Server {0x2aaaf0302700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:26:02.876] Server {0x2b5f44654700} WARNING: Error accessing Disk 
/dev/sdc [1/10]
[Sep  8 12:26:02.885] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
/dev/sdc [2/10]
[Sep  8 12:26:02.902] Server {0x2aaaf0302700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:26:02.902] Server {0x2aaae4100700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:26:02.907] Server {0x2b5f43db54c0} WARNING: Error 

[jira] [Updated] (TS-4834) Expose bad disk and disk access failures

2016-09-08 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4834:
-
Assignee: Gancho Tenev

> Expose bad disk and disk access failures
> 
>
> Key: TS-4834
> URL: https://issues.apache.org/jira/browse/TS-4834
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache, Metrics
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
> Fix For: 7.0.0
>
>
> We would like to monitor low-level disk access failures and disks marked by 
> ATS as bad.
> I have a patch that exposes that information through
> {code}
> proxy.process.cache.disk_error_count 10
> proxy.process.cache.disk_bad_count 5
> {code}
> and the following test shows how it would work...
> Start ATS with 2 disks and tail {{diags.log}}
> {code}
> $ cat etc/trafficserver/storage.config
> /dev/sdb
> /dev/sdc
> $ tail -f var/log/trafficserver/diags.log
> [Sep  8 12:18:48.149] Server {0x2b5f43db54c0} NOTE: traffic server running
> [Sep  8 12:18:48.198] Server {0x2b5f44654700} NOTE: cache enabled
> {code}
> Check related metrics and observe all 0s
> {code}
> $ ./bin/traffic_ctl metric match "proxy.process.cache*.disk.*" 
> "proxy.process.cache.*(read|write).failure" 
> "proxy.process.http.cache_(read|write)_errors"
> proxy.process.cache.disk_error_count 0
> proxy.process.cache.disk_bad_count 0
> proxy.process.cache.read.failure 0
> proxy.process.cache.write.failure 0
> proxy.process.cache.volume_0.read.failure 0
> proxy.process.cache.volume_0.write.failure 0
> proxy.process.http.cache_write_errors 0
> proxy.process.http.cache_read_errors 0
> {code}
> Now, using your favorite hard disk failure injection tool, inject failures 
> by setting both disks used by this setup ({{/dev/sdb}} and {{/dev/sdc}}) to 
> fail all reads, and shoot 5 requests, causing 10 failed reads.
> {code}
> $ for i in 1 2 3 4 5; do curl -x 127.0.0.1:80 http://example.com/1 -o 
> /dev/null -s; done
> $ tail -f var/log/trafficserver/diags.log
> [Sep  8 12:19:09.758] Server {0x2aaab4302700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.759] Server {0x2aaac0100700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.764] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
> /dev/sdb [1/10]
> [Sep  8 12:19:09.769] Server {0x2b5f44654700} WARNING: Error accessing Disk 
> /dev/sdb [2/10]
> [Sep  8 12:19:09.785] Server {0x2aaac0100700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.786] Server {0x2aaab4302700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.791] Server {0x2b5f44654700} WARNING: Error accessing Disk 
> /dev/sdb [3/10]
> [Sep  8 12:19:09.796] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
> /dev/sdb [4/10]
> [Sep  8 12:19:09.812] Server {0x2aaab4100700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.813] Server {0x2aaacc100700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.817] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
> /dev/sdb [5/10]
> [Sep  8 12:19:09.823] Server {0x2b5f44654700} WARNING: Error accessing Disk 
> /dev/sdb [6/10]
> [Sep  8 12:19:09.843] Server {0x2aaacc302700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.844] Server {0x2aaad8100700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.847] Server {0x2b5f44654700} WARNING: Error accessing Disk 
> /dev/sdb [7/10]
> [Sep  8 12:19:09.854] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
> /dev/sdb [8/10]
> [Sep  8 12:19:09.874] Server {0x2aaacc302700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.875] Server {0x2aaad8100700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.880] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
> /dev/sdb [9/10]
> [Sep  8 12:19:09.887] Server {0x2b5f44654700} WARNING: too many errors 
> accessing disk /dev/sdb [10/10]: declaring disk bad
> {code}
> We see 5 read failures, which triggered 10 actual disk reads and got the 
> failing disk marked as bad.
> {code}
> $ ./bin/traffic_ctl metric match "proxy.process.cache*.disk.*" 
> "proxy.process.cache.*(read|write).failure" 
> "proxy.process.http.cache_(read|write)_errors"
> proxy.process.cache.disk_error_count 10
> proxy.process.cache.disk_bad_count 1
> proxy.process.cache.read.failure 5
> proxy.process.cache.write.failure 5
> proxy.process.cache.volume_0.read.failure 5
> proxy.process.cache.volume_0.write.failure 5
> proxy.process.http.cache_write_errors 0
> proxy.process.http.cache_read_errors 0
> {code}
> Now shoot 5 requests causing 10 failed reads.
> {code}
> $ for i in 1 2 3 4 5; do curl -x 127.0.0.1:80 http://example.com/1 -o 
> /dev/null -s; done
> $ tail -f var/log/trafficserver/diags.log
> [Sep  8 12:26:02.874] Server 

[jira] [Created] (TS-4834) Expose bad disk and disk access failures

2016-09-08 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-4834:


 Summary: Expose bad disk and disk access failures
 Key: TS-4834
 URL: https://issues.apache.org/jira/browse/TS-4834
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache, Metrics
Reporter: Gancho Tenev


We would like to monitor low-level disk access failures and disks marked by ATS 
as bad.

I have a patch that exposes that information through
{code}
proxy.process.cache.disk_error_count 10
proxy.process.cache.disk_bad_count 5
{code}

and the following test shows how it would work...

Start ATS with 2 disks and tail {{diags.log}}

{code}
$ cat etc/trafficserver/storage.config
/dev/sdb
/dev/sdc

$ tail -f var/log/trafficserver/diags.log
[Sep  8 12:18:48.149] Server {0x2b5f43db54c0} NOTE: traffic server running
[Sep  8 12:18:48.198] Server {0x2b5f44654700} NOTE: cache enabled
{code}

Check related metrics and observe all 0s

{code}
$ ./bin/traffic_ctl metric match "proxy.process.cache*.disk.*" 
"proxy.process.cache.*(read|write).failure" 
"proxy.process.http.cache_(read|write)_errors"
proxy.process.cache.disk_error_count 0
proxy.process.cache.disk_bad_count 0
proxy.process.cache.read.failure 0
proxy.process.cache.write.failure 0
proxy.process.cache.volume_0.read.failure 0
proxy.process.cache.volume_0.write.failure 0
proxy.process.http.cache_write_errors 0
proxy.process.http.cache_read_errors 0
{code}

Now, using your favorite hard disk failure injection tool, inject failures by 
setting both disks used by this setup ({{/dev/sdb}} and {{/dev/sdc}}) to fail 
all reads, and shoot 5 requests, causing 10 failed reads.

{code}
$ for i in 1 2 3 4 5; do curl -x 127.0.0.1:80 http://example.com/1 -o /dev/null 
-s; done

$ tail -f var/log/trafficserver/diags.log
[Sep  8 12:19:09.758] Server {0x2aaab4302700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.759] Server {0x2aaac0100700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.764] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
/dev/sdb [1/10]
[Sep  8 12:19:09.769] Server {0x2b5f44654700} WARNING: Error accessing Disk 
/dev/sdb [2/10]
[Sep  8 12:19:09.785] Server {0x2aaac0100700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.786] Server {0x2aaab4302700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.791] Server {0x2b5f44654700} WARNING: Error accessing Disk 
/dev/sdb [3/10]
[Sep  8 12:19:09.796] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
/dev/sdb [4/10]
[Sep  8 12:19:09.812] Server {0x2aaab4100700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.813] Server {0x2aaacc100700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.817] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
/dev/sdb [5/10]
[Sep  8 12:19:09.823] Server {0x2b5f44654700} WARNING: Error accessing Disk 
/dev/sdb [6/10]
[Sep  8 12:19:09.843] Server {0x2aaacc302700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.844] Server {0x2aaad8100700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.847] Server {0x2b5f44654700} WARNING: Error accessing Disk 
/dev/sdb [7/10]
[Sep  8 12:19:09.854] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
/dev/sdb [8/10]
[Sep  8 12:19:09.874] Server {0x2aaacc302700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.875] Server {0x2aaad8100700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:19:09.880] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
/dev/sdb [9/10]
[Sep  8 12:19:09.887] Server {0x2b5f44654700} WARNING: too many errors 
accessing disk /dev/sdb [10/10]: declaring disk bad
{code}

We see 5 read failures, which triggered 10 actual disk reads and got the 
failing disk marked as bad.

{code}
$ ./bin/traffic_ctl metric match "proxy.process.cache*.disk.*" 
"proxy.process.cache.*(read|write).failure" 
"proxy.process.http.cache_(read|write)_errors"
proxy.process.cache.disk_error_count 10
proxy.process.cache.disk_bad_count 1
proxy.process.cache.read.failure 5
proxy.process.cache.write.failure 5
proxy.process.cache.volume_0.read.failure 5
proxy.process.cache.volume_0.write.failure 5
proxy.process.http.cache_write_errors 0
proxy.process.http.cache_read_errors 0
{code}

Now shoot 5 requests causing 10 failed reads.

{code}
$ for i in 1 2 3 4 5; do curl -x 127.0.0.1:80 http://example.com/1 -o /dev/null 
-s; done

$ tail -f var/log/trafficserver/diags.log
[Sep  8 12:26:02.874] Server {0x2aaae4100700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:26:02.875] Server {0x2aaaf0302700} WARNING: cache disk operation 
failed READ -1 0
[Sep  8 12:26:02.876] Server {0x2b5f44654700} WARNING: Error accessing Disk 
/dev/sdc [1/10]
[Sep  8 12:26:02.885] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
/dev/sdc [2/10]
[Sep  8 12:26:02.902] Server {0x2aaaf0302700} WARNING: cache disk operation 
failed READ -1 

[jira] [Updated] (TS-4834) Expose bad disk and disk access failures

2016-09-08 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4834:
-
Fix Version/s: 7.0.0

> Expose bad disk and disk access failures
> 
>
> Key: TS-4834
> URL: https://issues.apache.org/jira/browse/TS-4834
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache, Metrics
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
> Fix For: 7.0.0
>
>
> We would like to monitor low-level disk access failures and disks marked by 
> ATS as bad.
> I have a patch that exposes that information through
> {code}
> proxy.process.cache.disk_error_count 10
> proxy.process.cache.disk_bad_count 5
> {code}
> and the following test shows how it would work...
> Start ATS with 2 disks and tail {{diags.log}}
> {code}
> $ cat etc/trafficserver/storage.config
> /dev/sdb
> /dev/sdc
> $ tail -f var/log/trafficserver/diags.log
> [Sep  8 12:18:48.149] Server {0x2b5f43db54c0} NOTE: traffic server running
> [Sep  8 12:18:48.198] Server {0x2b5f44654700} NOTE: cache enabled
> {code}
> Check related metrics and observe all 0s
> {code}
> $ ./bin/traffic_ctl metric match "proxy.process.cache*.disk.*" 
> "proxy.process.cache.*(read|write).failure" 
> "proxy.process.http.cache_(read|write)_errors"
> proxy.process.cache.disk_error_count 0
> proxy.process.cache.disk_bad_count 0
> proxy.process.cache.read.failure 0
> proxy.process.cache.write.failure 0
> proxy.process.cache.volume_0.read.failure 0
> proxy.process.cache.volume_0.write.failure 0
> proxy.process.http.cache_write_errors 0
> proxy.process.http.cache_read_errors 0
> {code}
> Now, using your favorite hard disk failure injection tool, inject failures 
> by setting both disks used by this setup ({{/dev/sdb}} and {{/dev/sdc}}) to 
> fail all reads, and shoot 5 requests, causing 10 failed reads.
> {code}
> $ for i in 1 2 3 4 5; do curl -x 127.0.0.1:80 http://example.com/1 -o 
> /dev/null -s; done
> $ tail -f var/log/trafficserver/diags.log
> [Sep  8 12:19:09.758] Server {0x2aaab4302700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.759] Server {0x2aaac0100700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.764] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
> /dev/sdb [1/10]
> [Sep  8 12:19:09.769] Server {0x2b5f44654700} WARNING: Error accessing Disk 
> /dev/sdb [2/10]
> [Sep  8 12:19:09.785] Server {0x2aaac0100700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.786] Server {0x2aaab4302700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.791] Server {0x2b5f44654700} WARNING: Error accessing Disk 
> /dev/sdb [3/10]
> [Sep  8 12:19:09.796] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
> /dev/sdb [4/10]
> [Sep  8 12:19:09.812] Server {0x2aaab4100700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.813] Server {0x2aaacc100700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.817] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
> /dev/sdb [5/10]
> [Sep  8 12:19:09.823] Server {0x2b5f44654700} WARNING: Error accessing Disk 
> /dev/sdb [6/10]
> [Sep  8 12:19:09.843] Server {0x2aaacc302700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.844] Server {0x2aaad8100700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.847] Server {0x2b5f44654700} WARNING: Error accessing Disk 
> /dev/sdb [7/10]
> [Sep  8 12:19:09.854] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
> /dev/sdb [8/10]
> [Sep  8 12:19:09.874] Server {0x2aaacc302700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.875] Server {0x2aaad8100700} WARNING: cache disk operation 
> failed READ -1 0
> [Sep  8 12:19:09.880] Server {0x2b5f43db54c0} WARNING: Error accessing Disk 
> /dev/sdb [9/10]
> [Sep  8 12:19:09.887] Server {0x2b5f44654700} WARNING: too many errors 
> accessing disk /dev/sdb [10/10]: declaring disk bad
> {code}
> We see 5 read failures, which triggered 10 actual disk reads and got the 
> failing disk marked as bad.
> {code}
> $ ./bin/traffic_ctl metric match "proxy.process.cache*.disk.*" 
> "proxy.process.cache.*(read|write).failure" 
> "proxy.process.http.cache_(read|write)_errors"
> proxy.process.cache.disk_error_count 10
> proxy.process.cache.disk_bad_count 1
> proxy.process.cache.read.failure 5
> proxy.process.cache.write.failure 5
> proxy.process.cache.volume_0.read.failure 5
> proxy.process.cache.volume_0.write.failure 5
> proxy.process.http.cache_write_errors 0
> proxy.process.http.cache_read_errors 0
> {code}
> Now shoot 5 requests causing 10 failed reads.
> {code}
> $ for i in 1 2 3 4 5; do curl -x 127.0.0.1:80 http://example.com/1 -o 
> /dev/null -s; done
> $ tail -f var/log/trafficserver/diags.log
> [Sep  8 12:26:02.874] Server 

[jira] [Commented] (TS-4334) The cache_range_requests plugin always attempts to modify the cache key.

2016-09-06 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468540#comment-15468540
 ] 

Gancho Tenev commented on TS-4334:
--

[~jamesf], I am sorry if I did not make my idea/example clear enough!

I was proposing not to use the {{cache_range_requests}} plugin at all: use the 
{{cachekey}} plugin as your central place for cache key manipulation, and then 
use the {{header_rewrite}} plugin to implement the rest of the logic done by 
{{cache_range_requests}} (adding/removing the Range header at different hooks).

The end result should be practically the same as using the 
{{cache_range_requests}} plugin, but achieved with more generic plugins (I 
tested it) instead of hacking into {{cache_range_requests}}, which seems quite 
specialized and less configurable.

I was wondering if this would work for you.

Please let me know; I would gladly help with any problems/concerns.

Cheers!

> The cache_range_requests plugin always attempts to modify the cache key.
> 
>
> Key: TS-4334
> URL: https://issues.apache.org/jira/browse/TS-4334
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Plugins
>Reporter: Nolan Astrein
>Assignee: Gancho Tenev
> Fix For: 7.1.0
>
>
> A TrafficServer administrator should be able to specify whether or not the 
> cache_range_requests plugin should modify the cache key.  The cache key may 
> be modified by a previous plugin in a plugin chain and there is no way to 
> configure cache_range_requests not to do any further modifications to the 
> cache key.  Having multiple plugins responsible for cache key modifications 
> can cause unexpected behavior, especially when a plugin chain ordering is 
> changed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-4809) [header_rewrite] check to make sure "hook" conditions are first in the rule set

2016-09-02 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459117#comment-15459117
 ] 

Gancho Tenev edited comment on TS-4809 at 9/2/16 5:51 PM:
--

Provided a patch which would error like this:

{code}
20160901.23h17m13s [header_rewrite] cond %{REMAP_PSEUDO_HOOK} at hdrs.config:2 
should be the first hook condition in the rule set and each rule set should 
contain only one hook condition
{code}

In the following 2 use-cases:

* The hook condition is not the first in the rule set. {code}
$ sudo cat etc/trafficserver/hdrs.config
cond %{TRUE}
cond %{REMAP_PSEUDO_HOOK}
   set-header Some-Header "some value"
{code}

* There are 2 hook conditions in the same rule set. {code}
$ sudo cat etc/trafficserver/hdrs.config
cond %{REMAP_PSEUDO_HOOK}
cond %{TRUE}
cond %{SEND_RESPONSE_HDR_HOOK}
   set-header Some-Header "some value"
{code}

Also added line numbers to the error messages in {{RuleSet::add_condition()}} 
and {{RuleSet::add_operator()}}.




was (Author: gancho):
Provided a patch which would error like this:

{code}
20160901.23h17m13s [header_rewrite] cond %{REMAP_PSEUDO_HOOK} at hdrs.config:2 
should be the first hook condition in the rule set and each rule set should 
contain only one hook condition
{code}

In the following 2 use-cases:

* The hook condition is not the first in the rule set. {code}
$ sudo cat etc/trafficserver/hdrs.config
cond %{TRUE}
cond %{REMAP_PSEUDO_HOOK}
   set-header Some-Header "some value"
{code}

* There are 2 hook conditions in the same rule set. {code}
$ sudo cat etc/trafficserver/hdrs.config
cond %{REMAP_PSEUDO_HOOK}
cond %{TRUE}
cond %{SEND_RESPONSE_HDR_HOOK}
   set-header Some-Header "some value"
{code}

Also added a line numbers to the error messages in {{RuleSet::add_condition()}} 
and {{RuleSet::add_operator()}}.



> [header_rewrite] check to make sure "hook" conditions are first in the rule 
> set 
> 
>
> Key: TS-4809
> URL: https://issues.apache.org/jira/browse/TS-4809
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Plugins
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
> Fix For: 7.0.0
>
>
> The following configuration
> {code}
> $ cat etc/trafficserver/remap.config
> map http://example.com http://127.0.0.1: \
> @plugin=header_rewrite.so @pparam=hdrs.config
> $ cat etc/trafficserver/hdrs.config
> cond %{TRUE}
> cond %{REMAP_PSEUDO_HOOK}
>set-header Some-Header "some value"
> {code}
> Triggers the following error which does not show what and where the problem 
> is:
> {code}
> 20160901.23h17m13s [header_rewrite] Unknown condition: REMAP_PSEUDO_HOOK
> {code}
> I would like to add a check which will prevent the above error and print 
> another error clarifying where and what the problem is, for instance:
> {code}
> 20160901.23h17m13s [header_rewrite] cond %{REMAP_PSEUDO_HOOK} should come 
> first in the rule set at hdrs.config:2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4809) [header_rewrite] check to make sure "hook" conditions are first in the rule set

2016-09-02 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459117#comment-15459117
 ] 

Gancho Tenev commented on TS-4809:
--

Provided a patch which would error like this:

{code}
20160901.23h17m13s [header_rewrite] cond %{REMAP_PSEUDO_HOOK} at hdrs.config:2 
should be the first hook condition in the rule set and each rule set should 
contain only one hook condition
{code}

In the following 2 use-cases:

* The hook condition is not the first in the rule set. {code}
$ sudo cat etc/trafficserver/hdrs.config
cond %{TRUE}
cond %{REMAP_PSEUDO_HOOK}
   set-header Some-Header "some value"
{code}

* There are 2 hook conditions in the same rule set. {code}
$ sudo cat etc/trafficserver/hdrs.config
cond %{REMAP_PSEUDO_HOOK}
cond %{TRUE}
cond %{SEND_RESPONSE_HDR_HOOK}
   set-header Some-Header "some value"
{code}

Also added a line numbers to the error messages in {{RuleSet::add_condition()}} 
and {{RuleSet::add_operator()}}.



> [header_rewrite] check to make sure "hook" conditions are first in the rule 
> set 
> 
>
> Key: TS-4809
> URL: https://issues.apache.org/jira/browse/TS-4809
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Plugins
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
> Fix For: 7.0.0
>
>
> The following configuration
> {code}
> $ cat etc/trafficserver/remap.config
> map http://example.com http://127.0.0.1: \
> @plugin=header_rewrite.so @pparam=hdrs.config
> $ cat etc/trafficserver/hdrs.config
> cond %{TRUE}
> cond %{REMAP_PSEUDO_HOOK}
>set-header Some-Header "some value"
> {code}
> Triggers the following error which does not show what and where the problem 
> is:
> {code}
> 20160901.23h17m13s [header_rewrite] Unknown condition: REMAP_PSEUDO_HOOK
> {code}
> I would like to add a check which will prevent the above error and print 
> another error clarifying where and what the problem is, for instance:
> {code}
> 20160901.23h17m13s [header_rewrite] cond %{REMAP_PSEUDO_HOOK} should come 
> first in the rule set at hdrs.config:2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4809) [header_rewrite] check to make sure "hook" conditions are first in the rule set

2016-09-02 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4809:
-
Assignee: Gancho Tenev

> [header_rewrite] check to make sure "hook" conditions are first in the rule 
> set 
> 
>
> Key: TS-4809
> URL: https://issues.apache.org/jira/browse/TS-4809
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Plugins
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
>
> The following configuration
> {code}
> $ cat etc/trafficserver/remap.config
> map http://example.com http://127.0.0.1: \
> @plugin=header_rewrite.so @pparam=hdrs.config
> $ cat etc/trafficserver/hdrs.config
> cond %{TRUE}
> cond %{REMAP_PSEUDO_HOOK}
>set-header Some-Header "some value"
> {code}
> Triggers the following error which does not show what and where the problem 
> is:
> {code}
> 20160901.23h17m13s [header_rewrite] Unknown condition: REMAP_PSEUDO_HOOK
> {code}
> I would like to add a check which will prevent the above error and print 
> another error clarifying where and what the problem is, for instance:
> {code}
> 20160901.23h17m13s [header_rewrite] cond %{REMAP_PSEUDO_HOOK} should come 
> first in the rule set at hdrs.config:2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4809) [header_rewrite] check to make sure "hook" conditions are first in the rule set

2016-09-02 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-4809:


 Summary: [header_rewrite] check to make sure "hook" conditions are 
first in the rule set 
 Key: TS-4809
 URL: https://issues.apache.org/jira/browse/TS-4809
 Project: Traffic Server
  Issue Type: Improvement
  Components: Plugins
Reporter: Gancho Tenev


The following configuration

{code}
$ cat etc/trafficserver/remap.config
map http://example.com http://127.0.0.1: \
@plugin=header_rewrite.so @pparam=hdrs.config

$ cat etc/trafficserver/hdrs.config
cond %{TRUE}
cond %{REMAP_PSEUDO_HOOK}
   set-header Some-Header "some value"
{code}

Triggers the following error which does not show what and where the problem is:

{code}
20160901.23h17m13s [header_rewrite] Unknown condition: REMAP_PSEUDO_HOOK
{code}

I would like to add a check which will prevent the above error and print 
another error clarifying where and what the problem is, for instance:

{code}
20160901.23h17m13s [header_rewrite] cond %{REMAP_PSEUDO_HOOK} should come first 
in the rule set at hdrs.config:2
{code}
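
For comparison, a rule set that passes the check keeps the hook condition first 
and uses only one hook condition per rule set:

{code}
$ cat etc/trafficserver/hdrs.config
cond %{REMAP_PSEUDO_HOOK}
cond %{TRUE}
   set-header Some-Header "some value"
{code}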



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-4334) The cache_range_requests plugin always attempts to modify the cache key.

2016-08-27 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15441913#comment-15441913
 ] 

Gancho Tenev edited comment on TS-4334 at 8/27/16 5:16 PM:
---

It seems to me that the {{cache_range_requests}} functionality can be achieved 
by using more generic (and feature-rich) plugins like {{header_rewrite}} and 
{{cachekey}}.

Here is an example where {{httpbin.org}} is used as origin which responds to 
range requests adding {{Cache-Control:max-age=10}} header. 

There are 2 remap rules defined: one for {{example_no_cache.com}}, which is 
non-caching (ATS does not cache 206 responses by default), and one for 
{{example.com}}, which caches exactly like the {{cache_range_requests}} plugin 
does.

To test, run {{traffic_server}} and then {{curl}} 4 times, once every 5 
seconds: 2 times to test the non-caching remap and 2 times to test the 
"cache_range_requests"-style caching.

Here are the configs:

{code}
$ cat etc/trafficserver/remap.config
map http://example.com http://httpbin.org \
@plugin=cachekey.so @pparam=--include-headers=@Original-Range \
@plugin=header_rewrite.so @pparam=cache_range_local.config

map http://example_no_cache.com http://httpbin.org \
@plugin=cachekey.so


$ cat etc/trafficserver/cache_range_global.config
cond %{READ_REQUEST_HDR_HOOK}
cond %{CLIENT-URL:HOST} example.com
set-header @Original-Range %{HEADER:Range}
rm-header Range


$ cat etc/trafficserver/cache_range_local.config
cond %{SEND_REQUEST_HDR_HOOK}
set-header Range %{HEADER:@Original-Range}

cond %{READ_RESPONSE_HDR_HOOK}
cond %{STATUS} =206
set-status 200
set-header Cache-Control "max-age=10"

cond %{SEND_RESPONSE_HDR_HOOK}
cond %{STATUS} =200
set-status 206


$ cat etc/trafficserver/plugin.config
header_rewrite.so cache_range_global.config
xdebug.so
{code}

And here is a sample test:

{code}
$ sudo ./bin/traffic_server -T 'header_rewrite|cachekey' --clear_cache
. . .

$ for domain in example_no_cache.com example_no_cache.com example.com 
example.com; do curl -x 127.0.0.1:80 -v "http://${domain}/range/1024" -H 
"X-Debug: X-Cache,X-Cache-Key" -r0-16 -s 2>&1|grep -e "HTTP" -e "Cache"; echo 
"---"; sleep 5; done
> GET http://example_no_cache.com/range/1024 HTTP/1.1
> X-Debug: X-Cache,X-Cache-Key
< HTTP/1.1 206 PARTIAL CONTENT
< X-Cache-Key: /example_no_cache.com/80/range/1024
< X-Cache: miss
---
> GET http://example_no_cache.com/range/1024 HTTP/1.1
> X-Debug: X-Cache,X-Cache-Key
< HTTP/1.1 206 PARTIAL CONTENT
< X-Cache-Key: /example_no_cache.com/80/range/1024
< X-Cache: miss
---
> GET http://example.com/range/1024 HTTP/1.1
> X-Debug: X-Cache,X-Cache-Key
< HTTP/1.1 206 Partial Content
< Cache-Control: max-age=10
< X-Cache-Key: /example.com/80/@Original-Range:bytes=0-16/range/1024
< X-Cache: miss
---
> GET http://example.com/range/1024 HTTP/1.1
> X-Debug: X-Cache,X-Cache-Key
< HTTP/1.1 206 Partial Content
< Cache-Control: max-age=10
< X-Cache-Key: /example.com/80/@Original-Range:bytes=0-16/range/1024
< X-Cache: hit-fresh
---
{code}

Please let me know if it works for you!
Cheers,
--Gancho


was (Author: gancho):
It seems to me that the {{cache_range_requests}} functionality can be achieved 
by using more generic (and feature-rich) plugins like {{header_rewrite}} and 
{{cachekey}}.

Here is a example where {{httpbin.org}} is used as origin which responds to 
range requests adding {{Cache-Control:max-age=10}} header. 

There are 2 remap rules defined: one for {{example_no_cache.com}}, which is 
non-caching (ATS does not cache 206 responses by default), and one for 
{{example.com}}, which caches exactly like the {{cache_range_requests}} plugin 
does.

To test, run {{traffic_server}} and then {{curl}} 4 times, once every 5 
seconds: 2 times to test the non-caching remap and 2 times to test the 
"cache_range_requests"-style caching.

Here are the configs:

{code}
$ cat etc/trafficserver/remap.config
map http://example.com http://httpbin.org \
@plugin=cachekey.so @pparam=--include-headers=@Original-Range \
@plugin=header_rewrite.so @pparam=cache_range_local.config

map http://example_no_cache.com http://httpbin.org \
@plugin=cachekey.so


$ cat etc/trafficserver/cache_range_global.config
cond %{READ_REQUEST_HDR_HOOK}
cond %{CLIENT-URL:HOST} example.com
set-header @Original-Range %{HEADER:Range}
rm-header Range


$ cat etc/trafficserver/cache_range_local.config
cond %{SEND_REQUEST_HDR_HOOK}
set-header Range %{HEADER:@Original-Range}

cond %{READ_RESPONSE_HDR_HOOK}
cond %{STATUS} =206
set-status 200
set-header Cache-Control "max-age=10"

cond %{SEND_RESPONSE_HDR_HOOK}
cond %{STATUS} =200
set-status 206


$ cat etc/trafficserver/plugin.config
header_rewrite.so cache_range_global.config
xdebug.so
{code}

And here is a sample test:

{code}
$ sudo ./bin/traffic_server -T 'header_rewrite|cachekey' --clear_cache
. . .

$ for domain in example_no_cache.com example_no_cache.com example.com 

[jira] [Comment Edited] (TS-4334) The cache_range_requests plugin always attempts to modify the cache key.

2016-08-27 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15441913#comment-15441913
 ] 

Gancho Tenev edited comment on TS-4334 at 8/27/16 5:15 PM:
---

It seems to me that the {{cache_range_requests}} functionality can be achieved 
by using more generic (and feature-rich) plugins like {{header_rewrite}} and 
{{cachekey}}.

Here is a example where {{httpbin.org}} is used as origin which responds to 
range requests adding {{Cache-Control:max-age=10}} header. 

There are 2 remap rules defined: one for {{example_no_cache.com}}, which is 
non-caching (ATS does not cache 206 responses by default), and one for 
{{example.com}}, which caches exactly like the {{cache_range_requests}} plugin 
does.

To test, run {{traffic_server}} and then {{curl}} 4 times, once every 5 
seconds: 2 times to test the non-caching remap and 2 times to test the 
"cache_range_requests"-style caching.

Here are the configs:

{code}
$ cat etc/trafficserver/remap.config
map http://example.com http://httpbin.org \
@plugin=cachekey.so @pparam=--include-headers=@Original-Range \
@plugin=header_rewrite.so @pparam=cache_range_local.config

map http://example_no_cache.com http://httpbin.org \
@plugin=cachekey.so


$ cat etc/trafficserver/cache_range_global.config
cond %{READ_REQUEST_HDR_HOOK}
cond %{CLIENT-URL:HOST} example.com
set-header @Original-Range %{HEADER:Range}
rm-header Range


$ cat etc/trafficserver/cache_range_local.config
cond %{SEND_REQUEST_HDR_HOOK}
set-header Range %{HEADER:@Original-Range}

cond %{READ_RESPONSE_HDR_HOOK}
cond %{STATUS} =206
set-status 200
set-header Cache-Control "max-age=10"

cond %{SEND_RESPONSE_HDR_HOOK}
cond %{STATUS} =200
set-status 206


$ cat etc/trafficserver/plugin.config
header_rewrite.so cache_range_global.config
xdebug.so
{code}

And here is a sample test:

{code}
$ sudo ./bin/traffic_server -T 'header_rewrite|cachekey' --clear_cache
. . .

$ for domain in example_no_cache.com example_no_cache.com example.com 
example.com; do curl -x 127.0.0.1:80 -v "http://${domain}/range/1024" -H 
"X-Debug: X-Cache,X-Cache-Key" -r0-16 -s 2>&1|grep -e "HTTP" -e "Cache"; echo 
"---"; sleep 5; done
> GET http://example_no_cache.com/range/1024 HTTP/1.1
> X-Debug: X-Cache,X-Cache-Key
< HTTP/1.1 206 PARTIAL CONTENT
< X-Cache-Key: /example_no_cache.com/80/range/1024
< X-Cache: miss
---
> GET http://example_no_cache.com/range/1024 HTTP/1.1
> X-Debug: X-Cache,X-Cache-Key
< HTTP/1.1 206 PARTIAL CONTENT
< X-Cache-Key: /example_no_cache.com/80/range/1024
< X-Cache: miss
---
> GET http://example.com/range/1024 HTTP/1.1
> X-Debug: X-Cache,X-Cache-Key
< HTTP/1.1 206 Partial Content
< Cache-Control: max-age=10
< X-Cache-Key: /example.com/80/@Original-Range:bytes=0-16/range/1024
< X-Cache: miss
---
> GET http://example.com/range/1024 HTTP/1.1
> X-Debug: X-Cache,X-Cache-Key
< HTTP/1.1 206 Partial Content
< Cache-Control: max-age=10
< X-Cache-Key: /example.com/80/@Original-Range:bytes=0-16/range/1024
< X-Cache: hit-fresh
---
{code}

Please let me know if it works for you!
Cheers,
--Gancho


was (Author: gancho):
It seems to me that the {{cache_range_requests}} functionality can be achieved 
by using more generic (and feature-rich) plugins like {{header_rewrite}} and 
{{cachekey}}.

Here is a sample where {{httpbin.org}} is used as origin which responds to 
range requests adding {{Cache-Control:max-age=10}} header. 

There are 2 remap rules defined: one for {{example_no_cache.com}}, which is 
non-caching (ATS does not cache 206 responses by default), and one for 
{{example.com}}, which caches exactly like the {{cache_range_requests}} plugin 
does.

To test, run {{traffic_server}} and then {{curl}} 4 times, once every 5 
seconds: 2 times to test the non-caching remap and 2 times to test the 
"cache_range_requests"-style caching.

Here are the configs:

{code}
$ cat etc/trafficserver/remap.config
map http://example.com http://httpbin.org \
@plugin=cachekey.so @pparam=--include-headers=@Original-Range \
@plugin=header_rewrite.so @pparam=cache_range_local.config

map http://example_no_cache.com http://httpbin.org \
@plugin=cachekey.so


$ cat etc/trafficserver/cache_range_global.config
cond %{READ_REQUEST_HDR_HOOK}
cond %{CLIENT-URL:HOST} example.com
set-header @Original-Range %{HEADER:Range}
rm-header Range


$ cat etc/trafficserver/cache_range_local.config
cond %{SEND_REQUEST_HDR_HOOK}
set-header Range %{HEADER:@Original-Range}

cond %{READ_RESPONSE_HDR_HOOK}
cond %{STATUS} =206
set-status 200
set-header Cache-Control "max-age=10"

cond %{SEND_RESPONSE_HDR_HOOK}
cond %{STATUS} =200
set-status 206


$ cat etc/trafficserver/plugin.config
header_rewrite.so cache_range_global.config
xdebug.so
{code}

And here is a sample test:

{code}
$ sudo ./bin/traffic_server -T 'header_rewrite|cachekey' --clear_cache
. . .

$ for domain in example_no_cache.com example_no_cache.com example.com 
example.com; 

[jira] [Commented] (TS-4334) The cache_range_requests plugin always attempts to modify the cache key.

2016-08-27 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15441913#comment-15441913
 ] 

Gancho Tenev commented on TS-4334:
--

It seems to me that the {{cache_range_requests}} functionality can be achieved 
by using more generic (and feature-rich) plugins like {{header_rewrite}} and 
{{cachekey}}.

Here is a sample where {{httpbin.org}} is used as origin which responds to 
range requests adding {{Cache-Control:max-age=10}} header. 

There are 2 remap rules defined: one for {{example_no_cache.com}}, which is 
non-caching (ATS does not cache 206 responses by default), and one for 
{{example.com}}, which caches exactly like the {{cache_range_requests}} plugin 
does.

To test, run {{traffic_server}} and then {{curl}} 4 times, once every 5 
seconds: 2 times to test the non-caching remap and 2 times to test the 
"cache_range_requests"-style caching.

Here are the configs:

{code}
$ cat etc/trafficserver/remap.config
map http://example.com http://httpbin.org \
@plugin=cachekey.so @pparam=--include-headers=@Original-Range \
@plugin=header_rewrite.so @pparam=cache_range_local.config

map http://example_no_cache.com http://httpbin.org \
@plugin=cachekey.so


$ cat etc/trafficserver/cache_range_global.config
cond %{READ_REQUEST_HDR_HOOK}
cond %{CLIENT-URL:HOST} example.com
set-header @Original-Range %{HEADER:Range}
rm-header Range


$ cat etc/trafficserver/cache_range_local.config
cond %{SEND_REQUEST_HDR_HOOK}
set-header Range %{HEADER:@Original-Range}

cond %{READ_RESPONSE_HDR_HOOK}
cond %{STATUS} =206
set-status 200
set-header Cache-Control "max-age=10"

cond %{SEND_RESPONSE_HDR_HOOK}
cond %{STATUS} =200
set-status 206


$ cat etc/trafficserver/plugin.config
header_rewrite.so cache_range_global.config
xdebug.so
{code}

And here is a sample test:

{code}
$ sudo ./bin/traffic_server -T 'header_rewrite|cachekey' --clear_cache
. . .

$ for domain in example_no_cache.com example_no_cache.com example.com 
example.com; do curl -x 127.0.0.1:80 -v "http://${domain}/range/1024" -H 
"X-Debug: X-Cache,X-Cache-Key" -r0-16 -s 2>&1|grep -e "HTTP" -e "Cache"; echo 
"---"; sleep 5; done
> GET http://example_no_cache.com/range/1024 HTTP/1.1
> X-Debug: X-Cache,X-Cache-Key
< HTTP/1.1 206 PARTIAL CONTENT
< X-Cache-Key: /example_no_cache.com/80/range/1024
< X-Cache: miss
---
> GET http://example_no_cache.com/range/1024 HTTP/1.1
> X-Debug: X-Cache,X-Cache-Key
< HTTP/1.1 206 PARTIAL CONTENT
< X-Cache-Key: /example_no_cache.com/80/range/1024
< X-Cache: miss
---
> GET http://example.com/range/1024 HTTP/1.1
> X-Debug: X-Cache,X-Cache-Key
< HTTP/1.1 206 Partial Content
< Cache-Control: max-age=10
< X-Cache-Key: /example.com/80/@Original-Range:bytes=0-16/range/1024
< X-Cache: miss
---
> GET http://example.com/range/1024 HTTP/1.1
> X-Debug: X-Cache,X-Cache-Key
< HTTP/1.1 206 Partial Content
< Cache-Control: max-age=10
< X-Cache-Key: /example.com/80/@Original-Range:bytes=0-16/range/1024
< X-Cache: hit-fresh
---
{code}

Please let me know if it works for you!
Cheers,
--Gancho

> The cache_range_requests plugin always attempts to modify the cache key.
> 
>
> Key: TS-4334
> URL: https://issues.apache.org/jira/browse/TS-4334
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Plugins
>Reporter: Nolan Astrein
>Assignee: Gancho Tenev
> Fix For: 7.1.0
>
>
> A TrafficServer administrator should be able to specify whether or not the 
> cache_range_requests plugin should modify the cache key.  The cache key may 
> be modified by a previous plugin in a plugin chain and there is no way to 
> configure cache_range_requests not to do any further modifications to the 
> cache key.  Having multiple plugins responsible for cache key modifications 
> can cause unexpected behavior, especially when a plugin chain ordering is 
> changed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4686) Move hook-trace plugin from examples to plugins/experimental

2016-08-23 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4686:
-
Assignee: Gancho Tenev

> Move hook-trace plugin from examples to plugins/experimental
> 
>
> Key: TS-4686
> URL: https://issues.apache.org/jira/browse/TS-4686
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Plugins
>Reporter: Leif Hedstrom
>Assignee: Gancho Tenev
> Fix For: 7.0.0
>
>
> This makes more sense as a tool in the plugins arsenal. :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4712) Verify HttpHdr caching functionality

2016-08-02 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-4712:


 Summary: Verify HttpHdr caching functionality
 Key: TS-4712
 URL: https://issues.apache.org/jira/browse/TS-4712
 Project: Traffic Server
  Issue Type: Task
  Components: Cleanup, Core
Reporter: Gancho Tenev


After finding a use-case that was not supported well by the HttpHdr caching 
functionality ([TS-4706|https://issues.apache.org/jira/browse/TS-4706]), it may 
make sense to look into its use-cases and verify its functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4706) SSL hostname verification failed due to truncated SNI name

2016-07-29 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-4706:


 Summary: SSL hostname verification failed due to truncated SNI name
 Key: TS-4706
 URL: https://issues.apache.org/jira/browse/TS-4706
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Reporter: Gancho Tenev


SSL hostname verification fails due to a truncated SNI name when the 
escalation plugin is used to redirect a failed request (404) from a primary 
origin {{primary.com}} to a secondary origin {{secondary.com}}.

{code:title=Excerpt from the ATS logs showing the error|borderStyle=solid}
DEBUG:  (ssl) using SNI 
name 'secondary.c' for client handshake
DEBUG:  (ssl.error) 
SSLNetVConnection::sslClientHandShakeEvent, SSL_ERROR_WANT_READ
DEBUG:  (ssl) using SNI 
name 'secondary.c' for client handshake
DEBUG:  (ssl) Hostname verification 
failed for ('secondary.c')
{code}

One can see that the SNI name {{secondary.com}} is truncated to 
{{secondary.c}}.

{code:title=Test case to reproduce}
$ cat etc/trafficserver/remap.config
map http://example.com https://primary.com @plugin=escalate.so 
@pparam=404:secondary.com

$ sudo ./bin/traffic_server -T ssl 2>&1 | egrep -e 'using SNI name .* for 
client handshake'
DEBUG:  (ssl) using SNI 
name 'primary.com' for client handshake
DEBUG:  (ssl) using SNI 
name 'secondary.c' for client handshake

$ curl -x localhost:80 'http://example.com/path/to/object'
{code}

I have a fix available which produces the following log (SNI hostname no longer 
truncated)

{code:title=Excerpt from ATS logs after applying the fix}
$ sudo ./bin/traffic_server -T ssl 2>&1 | egrep -e 'using SNI name .* for 
client handshake'
DEBUG:  (ssl) using SNI 
name 'primary.com' for client handshake
DEBUG:  (ssl) using SNI 
name 'secondary.com' for client handshake
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4650) cachekey: not thread safe

2016-07-12 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4650:
-
Backport to Version: 6.2.1

> cachekey: not thread safe
> -
>
> Key: TS-4650
> URL: https://issues.apache.org/jira/browse/TS-4650
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Plugins
>Affects Versions: 6.2.0
>Reporter: Felicity Tarnell
> Fix For: 7.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> cachekey's Pattern class is not thread safe; it uses member data to store the 
> result of pcre_exec(), but only one instance is shared between all threads.  
> This causes crashes when two threads access the pcre result at the same time.
> Fix: use automatic storage for the pcre result data.
> PR incoming.
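
A minimal sketch of the fix (the class shape is illustrative, not the actual 
cachekey code): keep the compiled pattern shared, but move the pcre_exec() 
output vector into automatic storage so every call gets its own copy:

{code}
#include <pcre.h>
#include <string>

class Pattern
{
public:
  // The compiled pattern is immutable after initialization and is
  // therefore safe to share across threads.
  bool
  match(const std::string &subject) const
  {
    // Automatic storage: each call (hence each thread) gets its own
    // result vector instead of racing on a shared member array.
    int ovector[30];
    int rc = pcre_exec(_re, nullptr, subject.c_str(), (int)subject.size(),
                       0, 0, ovector, 30);
    return rc >= 0;
  }

private:
  pcre *_re = nullptr; // compiled once with pcre_compile() during setup
};
{code}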



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4650) cachekey: not thread safe

2016-07-12 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4650:
-
Fix Version/s: (was: 6.2.1)
   7.0.0

> cachekey: not thread safe
> -
>
> Key: TS-4650
> URL: https://issues.apache.org/jira/browse/TS-4650
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Plugins
>Affects Versions: 6.2.0
>Reporter: Felicity Tarnell
> Fix For: 7.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> cachekey's Pattern class is not thread safe; it uses member data to store the 
> result of pcre_exec(), but only one instance is shared between all threads.  
> This causes crashes when two threads access the pcre result at the same time.
> Fix: use automatic storage for the pcre result data.
> PR incoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4650) cachekey: not thread safe

2016-07-12 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4650:
-
Fix Version/s: (was: 7.0.0)
   6.2.1

> cachekey: not thread safe
> -
>
> Key: TS-4650
> URL: https://issues.apache.org/jira/browse/TS-4650
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Plugins
>Affects Versions: 6.2.0
>Reporter: Felicity Tarnell
> Fix For: 6.2.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> cachekey's Pattern class is not thread safe; it uses member data to store the 
> result of pcre_exec(), but only one instance is shared between all threads.  
> This causes crashes when two threads access the pcre result at the same time.
> Fix: use automatic storage for the pcre result data.
> PR incoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4367) [clang-analyzer] memory leaks in mgmt/api and proxy/logging

2016-04-19 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4367:
-
Component/s: Logging

> [clang-analyzer] memory leaks in mgmt/api and proxy/logging
> ---
>
> Key: TS-4367
> URL: https://issues.apache.org/jira/browse/TS-4367
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging, Management API
>Reporter: Gancho Tenev
>
> ||Bug Group ||Bug Type ||File ||Function/Method ||Line ||
> |Memory Error |Memory leak |mgmt/api/GenericParser.cc |cacheParse |363|
> |Memory Error |Memory leak |mgmt/api/GenericParser.cc |socksParse |660|
> |Memory Error |Memory leak |mgmt/api/GenericParser.cc |splitdnsParse |744|
> |Memory Error |Memory leak |proxy/logging/LogCollationAccept.cc |accept_event |99|



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4367) [clang-analyzer] memory leaks in mgmt/api and proxy/logging

2016-04-19 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4367:
-
Summary: [clang-analyzer] memory leaks in mgmt/api and proxy/logging  (was: 
[clang-analyzer] memory leaks in mgmt/api)

> [clang-analyzer] memory leaks in mgmt/api and proxy/logging
> ---
>
> Key: TS-4367
> URL: https://issues.apache.org/jira/browse/TS-4367
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging, Management API
>Reporter: Gancho Tenev
>
> ||Bug Group ||Bug Type ||File ||Function/Method ||Line ||
> |Memory Error |Memory leak |mgmt/api/GenericParser.cc |cacheParse |363|
> |Memory Error |Memory leak |mgmt/api/GenericParser.cc |socksParse |660|
> |Memory Error |Memory leak |mgmt/api/GenericParser.cc |splitdnsParse |744|
> |Memory Error |Memory leak |proxy/logging/LogCollationAccept.cc |accept_event |99|



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4367) [clang-analyzer] memory leaks in mgmt/api

2016-04-19 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-4367:


 Summary: [clang-analyzer] memory leaks in mgmt/api
 Key: TS-4367
 URL: https://issues.apache.org/jira/browse/TS-4367
 Project: Traffic Server
  Issue Type: Bug
  Components: Management API
Reporter: Gancho Tenev


||Bug Group ||Bug Type ||File ||Function/Method ||Line ||
|Memory Error |Memory leak |mgmt/api/GenericParser.cc |cacheParse |363|
|Memory Error |Memory leak |mgmt/api/GenericParser.cc |socksParse |660|
|Memory Error |Memory leak |mgmt/api/GenericParser.cc |splitdnsParse |744|
|Memory Error |Memory leak |proxy/logging/LogCollationAccept.cc |accept_event |99|



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4366) [clang-analyzer] Uninitialized stack value used in mp4 plugin

2016-04-19 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4366:
-
Summary: [clang-analyzer] Uninitialized stack value used in mp4 plugin  
(was: [clang-analyzer] Unitialized stack value used in mp4 plugin)

> [clang-analyzer] Uninitialized stack value used in mp4 plugin
> -
>
> Key: TS-4366
> URL: https://issues.apache.org/jira/browse/TS-4366
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Plugins
>Reporter: Gancho Tenev
> Fix For: 7.0.0
>
>
> Logic error: Result of operation is garbage or undefined
> Source: plugins/experimental/mp4/mp4_meta.cc: 951 
> Function: Mp4Meta::mp4_read_co64_atom():
> Within the expansion of the macro 'mp4_get_32value': 
>   The left operand of '<<' is a garbage value



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4366) [clang-analyzer] Unitialized stack value used in mp4 plugin

2016-04-19 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4366:
-
Fix Version/s: 7.0.0

> [clang-analyzer] Unitialized stack value used in mp4 plugin
> ---
>
> Key: TS-4366
> URL: https://issues.apache.org/jira/browse/TS-4366
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Plugins
>Reporter: Gancho Tenev
> Fix For: 7.0.0
>
>
> Logic error: Result of operation is garbage or undefined
> Source: plugins/experimental/mp4/mp4_meta.cc: 951 
> Function: Mp4Meta::mp4_read_co64_atom():
> Within the expansion of the macro 'mp4_get_32value': 
>   The left operand of '<<' is a garbage value



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4366) [clang-analyzer] Unitialized stack value used in mp4 plugin

2016-04-19 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-4366:


 Summary: [clang-analyzer] Unitialized stack value used in mp4 
plugin
 Key: TS-4366
 URL: https://issues.apache.org/jira/browse/TS-4366
 Project: Traffic Server
  Issue Type: Bug
  Components: Plugins
Reporter: Gancho Tenev


Logic error: Result of operation is garbage or undefined
Source: plugins/experimental/mp4/mp4_meta.cc: 951   
Function: Mp4Meta::mp4_read_co64_atom():

Within the expansion of the macro 'mp4_get_32value': 
  The left operand of '<<' is a garbage value
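
For context, this is what such a macro typically looks like in mp4 demuxers (an 
illustrative definition, not necessarily the exact ATS one). It assembles a 
big-endian 32-bit value from a byte buffer, so if any of the four source bytes 
was never written -- e.g. after an unchecked short read -- the shifted operand 
is garbage:

{code}
#include <stdint.h>

// Big-endian 32-bit fetch from a byte buffer (illustrative definition).
#define mp4_get_32value(p)                        \
  (((uint32_t)((uint8_t *)(p))[0] << 24) |        \
   ((uint32_t)((uint8_t *)(p))[1] << 16) |        \
   ((uint32_t)((uint8_t *)(p))[2] << 8)  |        \
   ((uint32_t)((uint8_t *)(p))[3]))

int
main()
{
  // Fully initialized buffer: the result is well defined (here 300).
  uint8_t buf[4] = {0, 0, 1, 44};
  return mp4_get_32value(buf) == 300 ? 0 : 1;
}
{code}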



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3556) Implement CDNI URL Signing as a plugin

2016-04-19 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-3556:
-
Fix Version/s: (was: 6.2.0)
   7.0.0

> Implement CDNI URL Signing as a plugin
> --
>
> Key: TS-3556
> URL: https://issues.apache.org/jira/browse/TS-3556
> Project: Traffic Server
>  Issue Type: New Feature
>  Components: Plugins, Security
>Reporter: Leif Hedstrom
>Assignee: Gancho Tenev
>  Labels: A
> Fix For: 7.0.0
>
>
> The specs are at
> https://tools.ietf.org/html/draft-ietf-cdni-uri-signing-03
> I think we should implement this, and work with the IETF community around 
> this to provide a full featured implementation that covers all our use cases. 
> This would hopefully supersede the existing url_sig plugin.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3198) Ignore useless MIMEFieldBlockImpl.

2016-04-19 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-3198:
-
Fix Version/s: (was: 6.2.0)
   7.0.0

> Ignore useless MIMEFieldBlockImpl.
> --
>
> Key: TS-3198
> URL: https://issues.apache.org/jira/browse/TS-3198
> Project: Traffic Server
>  Issue Type: Bug
>  Components: MIME
>Affects Versions: 4.1.2, 4.2.0, 4.2.2, 5.0.0, 5.1.1
>Reporter: portl4t
>Assignee: Gancho Tenev
> Fix For: 7.0.0
>
> Attachments: 0001-TS-3198-Ignore-useless-MIMEFieldBlockImpl.patch
>
>
> ATS can generate a very large marshaled header in a rare case. As we know, ATS 
> merges the response header if it gets a 304 from the origin server. I found 
> that the HdrHeap size increases if duplicated headers exist.
> In our production environment, we got a response from origin server like this:
> {code}
> HTTP/1.1 200 OK
> Content-Length: 60
> ...
> Powered-By-CC: MISS from A
> Cache-Control: public,max-age=0
> Powered-By-CC: MISS from B
> Connection: close
> {code}
> There is a duplicated header 'Powered-By-CC', and every time the doc is 
> accessed, ATS has to revalidate it with the origin because max-age is 0. 
> The origin server responds with a 304 like this:
> {code}
> HTTP/1.1 304 Not Modified
> ...
> Powered-By-CC: 8c61e322f02a0343e93ef227d82e5e0a
> Cache-Control: public,max-age=0
> Powered-By-CC: e4563610a50c63ed500d27bb5f1df848
> Connection: close
> {code}
> ATS will merge the header frequently, and the HdrHeap size will increase 
> endlessly.
> {code}
> Breakpoint 1, CacheVC::updateVector (this=0x14112f0) at CacheWrite.cc:132
> 132 header_len = write_vector->marshal_length();
> (gdb) n
> 133 od->writing_vec = 1;
> (gdb) p header_len
> $1 = 1068944
> (gdb) bt
> #0  CacheVC::updateVector (this=0x14112f0) at CacheWrite.cc:133
> #1  0x006c04c6 in CacheVC::openWriteClose (this=0x14112f0, event=0, 
> e=0x0) at CacheWrite.cc:1276
> #2  0x0069e827 in CacheVC::die (this=0x14112f0) at 
> P_CacheInternal.h:738
> #3  0x00690b1f in CacheVC::do_io_close (this=0x14112f0, alerrno=-1) 
> at Cache.cc:373
> #4  0x004fed48 in VConnection::do_io (this=0x14112f0, op=3, c=0x0, 
> nbytes=9223372036854775807, cb=0x0, data=0)
> at ../iocore/eventsystem/P_VConnection.h:106
> #5  0x00591b5a in HttpCacheSM::close_write (this=0x7fffe7f7d3b0) at 
> HttpCacheSM.h:118
> #6  0x005897a9 in HttpSM::issue_cache_update (this=0x7fffe7f7b980) at 
> HttpSM.cc:5590
> #7  0x005895d6 in HttpSM::perform_cache_write_action 
> (this=0x7fffe7f7b980) at HttpSM.cc:5540
> #8  0x0058ef4d in HttpSM::set_next_state (this=0x7fffe7f7b980) at 
> HttpSM.cc:7206
> #9  0x0058e0be in HttpSM::call_transact_and_set_next_state 
> (this=0x7fffe7f7b980, f=0) at HttpSM.cc:6962
> #10 0x0057bedf in HttpSM::handle_api_return (this=0x7fffe7f7b980) at 
> HttpSM.cc:1531
> #11 0x005944ca in HttpSM::do_api_callout (this=0x7fffe7f7b980) at 
> HttpSM.cc:452
> #12 0x0057cf73 in HttpSM::state_read_server_response_header 
> (this=0x7fffe7f7b980, event=100, data=0x7fffe0015c78) at HttpSM.cc:1878
> #13 0x0057f536 in HttpSM::main_handler (this=0x7fffe7f7b980, 
> event=100, data=0x7fffe0015c78) at HttpSM.cc:2565
> #14 0x004f55a6 in Continuation::handleEvent (this=0x7fffe7f7b980, 
> event=100, data=0x7fffe0015c78) at ../iocore/eventsystem/I_Continuation.h:146
> #15 0x006ead77 in read_signal_and_update (event=100, 
> vc=0x7fffe0015b60) at UnixNetVConnection.cc:137
> #16 0x006eb5a7 in read_from_net (nh=0x737cea30, 
> vc=0x7fffe0015b60, thread=0x737cb010) at UnixNetVConnection.cc:320
> #17 0x006ed221 in UnixNetVConnection::net_read_io 
> (this=0x7fffe0015b60, nh=0x737cea30, lthread=0x737cb010) at 
> UnixNetVConnection.cc:846
> #18 0x006e4dd1 in NetHandler::mainNetEvent (this=0x737cea30, 
> event=5, e=0x1089e80) at UnixNet.cc:399
> #19 0x004f55a6 in Continuation::handleEvent (this=0x737cea30, 
> event=5, data=0x1089e80) at ../iocore/eventsystem/I_Continuation.h:146
> #20 0x0070bace in EThread::process_event (this=0x737cb010, 
> e=0x1089e80, calling_code=5) at UnixEThread.cc:144
> #21 0x0070bfd8 in EThread::execute (this=0x737cb010) at 
> UnixEThread.cc:268
> #22 0x00526644 in main (argv=0x7fffe368) at Main.cc:1763
> {code}
> In HttpTransact::merge_response_header_with_cached_header(...), ATS marks the 
> old MIMEField as DELETED if it is duplicated and attaches a new MIMEField; 
> this increases the number of MIMEFieldBlockImpl blocks, and the HdrHeap size 
> may grow to more than 1 MB.
> I suggest ignoring the useless MIMEFieldBlockImpl blocks when copying the 
> MIME header in mime_hdr_copy_onto(...), as sketched below.
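
A sketch of that suggestion (types and names are simplified stand-ins, not the 
real ATS structures): when copying fields onto the destination header, skip 
the slots that were marked deleted, so the dead blocks are not carried into 
the new heap.

{code}
#include <vector>

// Simplified stand-in for a MIME field slot.
struct Field {
  bool deleted; // marked deleted by an earlier 304 merge
  int  name_idx;
  int  value_idx;
};

// Copy only live fields, so repeated 304 merges do not accumulate
// dead slots and endlessly grow the marshaled header size.
std::vector<Field>
copy_live_fields(const std::vector<Field> &src)
{
  std::vector<Field> dst;
  dst.reserve(src.size());
  for (const Field &f : src) {
    if (!f.deleted) {
      dst.push_back(f);
    }
  }
  return dst;
}
{code}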



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (TS-4356) Deprecate cacheurl plugin

2016-04-19 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4356:
-
Backport to Version: 6.2.0
  Fix Version/s: (was: 6.2.0)
 7.0.0

> Deprecate cacheurl plugin
> -
>
> Key: TS-4356
> URL: https://issues.apache.org/jira/browse/TS-4356
> Project: Traffic Server
>  Issue Type: Task
>  Components: Plugins
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
> Fix For: 7.0.0
>
>
> Deprecate cacheurl plugin in favor of cachekey plugin



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4362) Remove cacheurl plugin

2016-04-18 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4362:
-
Fix Version/s: (was: 6.2.0)
   7.0.0

> Remove cacheurl plugin
> --
>
> Key: TS-4362
> URL: https://issues.apache.org/jira/browse/TS-4362
> Project: Traffic Server
>  Issue Type: Task
>  Components: Plugins
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
> Fix For: 7.0.0
>
>
> Deprecate cacheurl plugin in favor of cachekey plugin



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4362) Remove cacheurl plugin

2016-04-18 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-4362:


 Summary: Remove cacheurl plugin
 Key: TS-4362
 URL: https://issues.apache.org/jira/browse/TS-4362
 Project: Traffic Server
  Issue Type: Task
  Components: Plugins
Reporter: Gancho Tenev
Assignee: Gancho Tenev
 Fix For: 6.2.0


Deprecate cacheurl plugin in favor of cachekey plugin



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4356) Deprecate cacheurl plugin

2016-04-17 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-4356:


 Summary: Deprecate cacheurl plugin
 Key: TS-4356
 URL: https://issues.apache.org/jira/browse/TS-4356
 Project: Traffic Server
  Issue Type: Task
  Components: Plugins
Reporter: Gancho Tenev


Deprecate cacheurl plugin in favor of cachekey plugin



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4334) The cache_range_requests plugin always attempts to modify the cache key.

2016-04-07 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231444#comment-15231444
 ] 

Gancho Tenev commented on TS-4334:
--

Sure, please assign it to me.

> The cache_range_requests plugin always attempts to modify the cache key.
> 
>
> Key: TS-4334
> URL: https://issues.apache.org/jira/browse/TS-4334
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Plugins
>Reporter: Nolan Astrein
> Fix For: 7.0.0
>
>
> A TrafficServer administrator should be able to specify whether or not the 
> cache_range_requests plugin should modify the cache key.  The cache key may 
> be modified by a previous plugin in a plugin chain and there is no way to 
> configure cache_range_requests not to do any further modifications to the 
> cache key.  Having multiple plugins responsible for cache key modifications 
> can cause unexpected behavior, especially when the plugin chain ordering is 
> changed.
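
For illustration, a hypothetical remap.config line with the requested knob (the 
--no-modify-cachekey parameter name is a placeholder, not an existing option; 
the cachekey pparam merely stands in for a preceding cache key owner):

{code}
map http://www.example.com http://origin.example.com \
    @plugin=cachekey.so @pparam=--sort-params=true \
    @plugin=cache_range_requests.so @pparam=--no-modify-cachekey
{code}

With such a flag, cache_range_requests would leave the cache key set by the 
preceding plugin in the chain untouched.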



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4183) cachekey: URI and URI path capture/replacement

2016-02-07 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-4183:


 Summary: cachekey: URI and URI path capture/replacement
 Key: TS-4183
 URL: https://issues.apache.org/jira/browse/TS-4183
 Project: Traffic Server
  Issue Type: Improvement
  Components: Plugins
Reporter: Gancho Tenev


Add means to extend the cache key by using regex capture and replace on the URI 
path and on the URI as a whole; a usage sketch follows the parameter list below.
Plugin parameters:
--capture-prefix-uri=regex
--capture-prefix-uri=/regex/replacement/
--capture-path-uri=regex
--capture-path-uri=/regex/replacement/
--capture-path=regex
--capture-path=/regex/replacement/
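
For example, a hypothetical remap.config line using one of the proposed 
parameters (the regex and replacement are illustrative):

{code}
map http://www.example.com http://origin.example.com \
    @plugin=cachekey.so \
    @pparam=--capture-path=/(object_[0-9]+)/$1/
{code}

Here the capture group from the URI path would be appended to the cache key.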



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4161) ProcessManager prone to stack-overflow

2016-01-28 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-4161:


 Summary: ProcessManager prone to stack-overflow
 Key: TS-4161
 URL: https://issues.apache.org/jira/browse/TS-4161
 Project: Traffic Server
  Issue Type: Bug
  Components: Manager
Reporter: Gancho Tenev


ProcessManager::pollLMConnection() can get "stuck" in a loop while handling a 
big number of messages in a row from the same socket. 

Since alloca() is used to allocate buffers on the stack for each message read 
from the socket, and those buffers are not released until the function returns, 
getting "stuck" in the loop can lead to stack overflow; fwiw, the same could 
happen if the message length is big enough (accidentally or on purpose).

It can be reproduced easily by setting up:
proxy.config.lm.pserver_timeout_secs: 0
proxy.config.lm.pserver_timeout_msecs: 0
in records.config and running ./bin/traffic_manager. 

ATS crashes with a segfault in a weird place (while trying to allocate with 
malloc()). If you inspect the core you will see that it got "stuck" in the 
loop before it crashed, overflowing the stack (it kept allocating buffers on 
the stack with alloca() until it crashed).

It is worth considering replacing the alloca() with a VLA (which "releases" its 
memory when it goes out of scope on each iteration of the loop) or with 
ats_malloc(), which is supposedly less time-efficient but would handle bigger 
messages without worrying about stack overflow. 

IMO adding a message size limit check is a good practice especially with the 
current implementation.

If the code gets "stuck" in the while loop while reading a big number of 
messages in a row from the same socket, then the port configured by 
proxy.config.process_manager.mgmt_port becomes unavailable (connection 
refused). Adding a limit on the number of messages that can be processed in a 
row would be a good idea.

I stumbled upon this while running TSQA regression tests, where TSQA kept 
complaining that the management port was not available and ATS kept crashing.
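
A reduced sketch of the failure mode and of a per-iteration alternative 
(illustrative only; the real pollLMConnection() reads messages from the socket 
and is more involved):

{code}
#include <alloca.h>
#include <cstring>

void
drain_messages_with_alloca(int count, size_t msg_len)
{
  for (int i = 0; i < count; ++i) {
    // BUG pattern: alloca() grows the current stack frame, and nothing
    // is released until the function returns, so many messages in a
    // row (or one huge msg_len) overflow the stack.
    char *buf = static_cast<char *>(alloca(msg_len));
    std::memset(buf, 0, msg_len); // stand-in for reading one message
  }
}

void
drain_messages_with_heap(int count, size_t msg_len)
{
  for (int i = 0; i < count; ++i) {
    // Heap storage is released on every iteration, and a size limit
    // check can reject oversized messages up front.
    char *buf = new char[msg_len];
    std::memset(buf, 0, msg_len);
    delete[] buf;
  }
}
{code}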



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4161) ProcessManager prone to stack-overflow

2016-01-28 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4161:
-
Description: 
ProcessManager::pollLMConnection() can get "stuck" in a loop while handling a 
big number of messages in a row from the same socket. 

Since alloca() is used to allocate buffers on the stack for each message read 
from the socket, and those buffers are not released until the function returns, 
getting "stuck" in the loop can lead to stack overflow; fwiw, the same could 
happen if the message length is big enough (accidentally or on purpose).

It can be reproduced easily by setting up:
proxy.config.lm.pserver_timeout_secs: 0
proxy.config.lm.pserver_timeout_msecs: 0
in records.config and running ./bin/traffic_manager. 

ATS crashes with a segfault in a weird place (while trying to allocate with 
malloc()). If you inspect the core you will see that it got "stuck" in the 
loop before it crashed, overflowing the stack (it kept allocating buffers on 
the stack with alloca() until it crashed).

It is worth considering replacing the alloca() with a VLA (which "releases" its 
memory when it goes out of scope on each iteration of the loop) or with 
ats_malloc(), which is supposedly less time-efficient but would handle bigger 
messages without worrying about stack overflow. 

IMO adding a message size limit check is a good practice especially with the 
current implementation.

If the code gets "stuck" in the while loop while reading a big number of 
messages in a row from the same socket, then the port configured by 
proxy.config.process_manager.mgmt_port becomes unavailable (connection 
refused). Adding a limit on the number of messages that can be processed in a 
row would be a good idea.

I stumbled upon this while running TSQA regression tests, where TSQA kept 
complaining that the management port was not available and ATS kept crashing.

  was:
ProcessManager::pollLMConnection() can get "stuck" in a loop while handling big 
number of messages in a raw from the same socket. 

Since alloca() is used to allocate buffers on the stack for each message read 
from the socket, and those buffers are not released until the function returns, 
getting "stuck" in the loop can lead to stack-overflow, fwiw same could happen 
if the message length is big enough (accidentally or on purpose).

It can be reproduced easily by setting up:
proxy.config.lm.pserver_timeout_secs: 0
proxy.config.lm.pserver_timeout_msecs: 0
in records.config and running ./bin/traffic_manager. 

ATS crashes with a segfault in a weird place (while trying to allocate with 
malloc()). If you inspect the core you would see that it got "stuck" in the 
loop before it crashed over-flowing the stack (kept allocating buffers on the 
stack with alloca() until it crashed).

It is worth considering replacing the alloca() with VLA (which "releases" 
memory when out of scope on each iteration of the loop) or using ats_malloc() 
which is supposedly less time-efficient but would be better to handle bigger 
messages without worrying about stack-overflow. 

IMO adding a message size limit check is a good practice especially with the 
current implementation.

If the code gets "stuck" in the while loop while reading big number of messages 
in a row from the same socket then the port configured by 
proxy.config.process_manager.mgmt_port becomes unavailable (connection 
refused). Adding a limit of messages that can be processed in a row should be a 
good idea.

I stumbled up on this while running TSQA regression tests where TSQA kept 
complaining that the management port is not available and the ATS kept crashing.


> ProcessManager prone to stack-overflow
> --
>
> Key: TS-4161
> URL: https://issues.apache.org/jira/browse/TS-4161
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Manager
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
>  Labels: crash
> Fix For: 6.2.0
>
>
> ProcessManager::pollLMConnection() can get "stuck" in a loop while handling 
> a big number of messages in a row from the same socket. 
> Since alloca() is used to allocate buffers on the stack for each message read 
> from the socket, and those buffers are not released until the function 
> returns, getting "stuck" in the loop can lead to stack overflow; fwiw, the 
> same could happen if the message length is big enough (accidentally or on 
> purpose).
> It can be reproduced easily by setting up:
> proxy.config.lm.pserver_timeout_secs: 0
> proxy.config.lm.pserver_timeout_msecs: 0
> in records.config and running ./bin/traffic_manager. 
> ATS crashes with a segfault in a weird place (while trying to allocate with 
> malloc()). If you inspect the core you would see that it got "stuck" in the 
> loop before it crashed 

[jira] [Commented] (TS-4023) cachekey plugin

2015-11-14 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005547#comment-15005547
 ] 

Gancho Tenev commented on TS-4023:
--

Renamed to {{cachekey}}.

> cachekey plugin
> ---
>
> Key: TS-4023
> URL: https://issues.apache.org/jira/browse/TS-4023
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
> Fix For: 6.1.0
>
>
> This plugin allows some common cache key normalizations of the URI.  It can
> - sort query parameters so reordering can be a cache hit
> - ignore specific query parameters from the cache key by name or regular 
> expression
> - ignore all query parameters from the cache key
> - only use specific query parameters in the cache key by name or regular 
> expression
> - include headers or cookies by name
> - capture / replace values from the User-Agent header.
> - classify request using User-Agent and a list of regular expressions



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4023) cachekey_norm plugin

2015-11-13 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-4023:


 Summary: cachekey_norm plugin
 Key: TS-4023
 URL: https://issues.apache.org/jira/browse/TS-4023
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache
Reporter: Gancho Tenev


This plugin allows some common cache key normalizations of the URI; a usage 
sketch follows the list below. It can
- sort query parameters so reordering can be a cache hit
- ignore specific query parameters from the cache key by name or regular 
expression
- ignore all query parameters from the cache key
- only use specific query parameters in the cache key by name or regular 
expression
- include headers or cookies by name
- capture / replace values from the User-Agent header
- classify requests using User-Agent and a list of regular expressions
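
A hypothetical remap.config line showing the intended usage shape (the 
parameter names are placeholders, not necessarily the plugin's final option 
names):

{code}
map http://www.example.com http://origin.example.com \
    @plugin=cachekey.so \
    @pparam=--sort-params=true \
    @pparam=--remove-params=utm_source,utm_medium
{code}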




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4023) cachekey plugin

2015-11-13 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-4023:
-
Summary: cachekey plugin  (was: cachekey_norm plugin)

> cachekey plugin
> ---
>
> Key: TS-4023
> URL: https://issues.apache.org/jira/browse/TS-4023
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
> Fix For: 6.1.0
>
>
> This plugin allows some common cache key normalizations of the URI.  It can
> - sort query parameters so reordering can be a cache hit
> - ignore specific query parameters from the cache key by name or regular 
> expression
> - ignore all query parameters from the cache key
> - only use specific query parameters in the cache key by name or regular 
> expression
> - include headers or cookies by name
> - capture / replace values from the User-Agent header.
> - classify request using User-Agent and a list of regular expressions



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-3883) ats_madvise() has no effect on Linux (including MADV_DONTDUMP)

2015-09-01 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-3883:


 Summary: ats_madvise() has no effect on Linux (including 
MADV_DONTDUMP)
 Key: TS-3883
 URL: https://issues.apache.org/jira/browse/TS-3883
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Reporter: Gancho Tenev


While investigating an unrelated issue with truncated core dumps on Linux, we 
noticed that we ran out of space on a few machines because of huge core dumps, 
whose size was tending toward the ATS process virtual memory size (reported by 
/proc//status:VmSize on Linux).

It looked like MADV_DONTDUMP memory use advice was not set properly.

Further debugging showed that we have the following code in ats_madvise():
{code}
#if defined(linux)
(void)addr;
(void)len;
(void)flags;
return 0;
#else . . .
{code}

This leads ats_madvise() to have no effect when "defined(linux)" is true, 
skipping the necessary madvise() call to set MADV_DONTDUMP.
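
A minimal sketch of the obvious fix (the signature is assumed from the call 
sites): forward the advice to the kernel on Linux too, so that MADV_DONTDUMP 
actually takes effect.

{code}
#include <sys/mman.h>
#include <sys/types.h>

int
ats_madvise(caddr_t addr, size_t len, int flags)
{
  // Forward the advice instead of silently ignoring it, so that
  // MADV_DONTDUMP excludes the region from core dumps.
  return madvise(addr, len, flags);
}
{code}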



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3820) Change default for proxy.config.http.redirect_host_no_port (to 1, enabled)

2015-08-19 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703241#comment-14703241
 ] 

Gancho Tenev commented on TS-3820:
--

The code worked as expected; I will upload the TSQA test that I used to verify 
it later.

 Change default for proxy.config.http.redirect_host_no_port (to 1, enabled)
 --

 Key: TS-3820
 URL: https://issues.apache.org/jira/browse/TS-3820
 Project: Traffic Server
  Issue Type: Improvement
  Components: Configuration
Reporter: Leif Hedstrom
Assignee: Gancho Tenev
 Fix For: 6.1.0


 I think the behavior of not adding the port when it matches the default scheme 
 makes more sense. I assume the config, defaulting to 0, was done for backwards 
 compatibility?
 This relates to 56fbfdd2
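
 Concretely, flipping the default would amount to shipping this records.config 
 value (the record is an INT toggle):

{code}
CONFIG proxy.config.http.redirect_host_no_port INT 1
{code}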



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-3740) header_rewrite plugin: set-redirect doesn't work with SEND_RESPONSE_HDR_HOOK

2015-08-13 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695916#comment-14695916
 ] 

Gancho Tenev edited comment on TS-3740 at 8/13/15 8:50 PM:
---

Yes, just by reading around the 2 patches (didn't have a chance to experiment 
with TS-3137 patch) it seems that they are meant to accomplish the same thing - 
enable set-redirect operation when not called from the remap plugin.

There is an important difference in the replacing of PATH and in QSA mode 
(appending query parameters to) when forming the Location header. TS-3740 
patch always uses client request URI when replacing those variables to form the 
Location header regardless of which hook condition matches while TS-3137 
patch uses the corresponding URI at each particular hook (which is consistent 
with the way set-destination is implemented).

I don't have visibility of all header-rewrite use-cases but it seems that 
although at the time it looked more reasonable to always use the client request 
URI to form Location header (it is always available regardless of which hook 
condition matches, it fitted the above origin time-out use-case well and seemed 
a more straightforward way to configure the redirects), it may be more 
reasonable to do it in TS-3137 way (which looks also more consistent with 
set-destination operation implementation as well).

Any ideas and opinions are appreciated!


was (Author: gancho):
Yes, just by reading around the 2 patches (didn't have a chance to experiment 
with TS-3137 patch) it seems that they are meant to accomplish the same thing - 
enable set-redirect operation when not called from the remap plugin.

There is an important difference in the replacing of %{PATH} and in QSA 
mode (appending query parameters to) when forming the Location header. 
TS-3740 patch always uses client request URI when replacing those variables to 
form the Location header regardless of which hook condition matches while 
TS-3137 patch uses the corresponding URI at each particular hook (which is 
consistent with the way set-destination is implemented).

I don't have visibility of all header-rewrite use-cases but it seems that 
although at the time it looked more reasonable to always use the client request 
URI to form Location header (it is always available regardless of which hook 
condition matches, it fitted the above origin time-out use-case well and seemed 
a more straightforward way to configure the redirects), it may be more 
reasonable to do it in TS-3137 way (which looks also more consistent with 
set-destination operation implementation as well).

Any ideas and opinions are appreciated!

 header_rewrite plugin: set-redirect doesn't work with SEND_RESPONSE_HDR_HOOK
 

 Key: TS-3740
 URL: https://issues.apache.org/jira/browse/TS-3740
 Project: Traffic Server
  Issue Type: Bug
  Components: Plugins
Reporter: Gancho Tenev
Assignee: Gancho Tenev
 Fix For: 6.1.0


 DESCRIPTION:
 ATS header_rewrite plugin set-redirect operation doesn't work with 
 SEND_RESPONSE_HDR_HOOK. Please see the debugging notes below for more info.
 HOW TO REPRODUCE:
 Here is a sample plugin configuration files that reproduce the problem
 $ cat /opt/ats/etc/trafficserver/remap.config
 map http://p1 http://h1:8001 \
 @plugin=header_rewrite.so 
 @pparam=/opt/ats/etc/trafficserver/header_rewrite.config
 $ cat /opt/ats/etc/trafficserver/header_rewrite.config
 cond %{SEND_RESPONSE_HDR_HOOK}
 cond %{STATUS} =502
 set-redirect 302 http://p0/%{PATH} [QSA]
 DEBUGGING NOTES:
 Both conditions in the header_rewrite.config are evaluated correctly but 
 set-redirect has no effect and the response to the UA is not modified as 
 expected.  After some debugging it turned out that if the set-redirect 
 (OperatorSetDestination::exec) is not called from the remap plugin it has no 
 effect. The header_rewrite plugin creates a continuation to be called from 
 SEND_RESPONSE_HDR_HOOK (TSHttpHookAdd()). OperatorSetDestination::exec 
 doesn't have code to handle the case when the set-redirect operation is _not_ 
 called directly from the remap plugin (TSRemapDoRemap()).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3740) header_rewrite plugin: set-redirect doesn't work with SEND_RESPONSE_HDR_HOOK

2015-08-13 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695916#comment-14695916
 ] 

Gancho Tenev commented on TS-3740:
--

Yes, just by reading around the 2 patches (didn't have a chance to experiment 
with TS-3137 patch) it seems that they are meant to accomplish the same thing - 
enable set-redirect operation when not called from the remap plugin.

There is an important difference in the replacing of %{PATH} and in QSA 
mode (appending query parameters to) when forming the Location header. 
TS-3740 patch always uses client request URI when replacing those variables to 
form the Location header regardless of which hook condition matches while 
TS-3137 patch uses the corresponding URI at each particular hook (which is 
consistent with the way set-destination is implemented).

I don't have visibility of all header-rewrite use-cases but it seems that 
although at the time it looked more reasonable to always use the client request 
URI to form Location header (it is always available regardless of which hook 
condition matches, it fitted the above origin time-out use-case well and seemed 
a more straightforward way to configure the redirects), it may be more 
reasonable to do it in TS-3137 way (which looks also more consistent with 
set-destination operation implementation as well).

Any ideas and opinions are appreciated!

 header_rewrite plugin: set-redirect doesn't work with SEND_RESPONSE_HDR_HOOK
 

 Key: TS-3740
 URL: https://issues.apache.org/jira/browse/TS-3740
 Project: Traffic Server
  Issue Type: Bug
  Components: Plugins
Reporter: Gancho Tenev
Assignee: Gancho Tenev
 Fix For: 6.1.0


 DESCRIPTION:
 ATS header_rewrite plugin set-redirect operation doesn't work with 
 SEND_RESPONSE_HDR_HOOK. Please see the debugging notes below for more info.
 HOW TO REPRODUCE:
 Here is a sample plugin configuration files that reproduce the problem
 $ cat /opt/ats/etc/trafficserver/remap.config
 map http://p1 http://h1:8001 \
 @plugin=header_rewrite.so 
 @pparam=/opt/ats/etc/trafficserver/header_rewrite.config
 $ cat /opt/ats/etc/trafficserver/header_rewrite.config
 cond %{SEND_RESPONSE_HDR_HOOK}
 cond %{STATUS} =502
 set-redirect 302 http://p0/%{PATH} [QSA]
 DEBUGGING NOTES:
 Both conditions in the header_rewrite.config are evaluated correctly but 
 set-redirect has no effect and the response to the UA is not modified as 
 expected.  After some debugging it turned out that if the set-redirect 
 (OperatorSetDestination::exec) is not called from the remap plugin it has no 
 effect. The header_rewrite plugin creates a continuation to be called from 
 SEND_RESPONSE_HDR_HOOK (TSHttpHookAdd()). OperatorSetDestination::exec 
 doesn't have code to handle the case when the set-redirect operation is _not_ 
 called directly from the remap plugin (TSRemapDoRemap()).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-3740) header_rewrite plugin: set-redirect doesn't work with SEND_RESPONSE_HDR_HOOK

2015-08-13 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695916#comment-14695916
 ] 

Gancho Tenev edited comment on TS-3740 at 8/13/15 8:55 PM:
---

Yes, just by reading around the 2 patches (didn't have a chance to experiment 
with TS-3137 patch) it seems that they are meant to accomplish the same thing - 
enable set-redirect operation when not called from the remap plugin.

There is an important difference in the replacing of PATH and in QSA mode 
(appending query parameters) when forming the Location header. TS-3740 patch 
always uses client request URI when replacing those variables to form the 
Location header regardless of which hook condition matches while TS-3137 
patch uses the corresponding URI at each particular hook (which is consistent 
with the way set-destination is implemented).

I don't have visibility of all header-rewrite use-cases but it seems that 
although at the time it looked more reasonable to always use the client request 
URI when forming Location header (it is always available regardless of which 
hook condition matches, it fitted the above origin time-out use-case well and 
seemed a more straightforward way to configure the redirects), it may be more 
reasonable to do it in TS-3137 way (which looks also more consistent with 
set-destination operation implementation as well).

Any ideas and opinions are appreciated!


was (Author: gancho):
Yes, just by reading around the 2 patches (didn't have a chance to experiment 
with TS-3137 patch) it seems that they are meant to accomplish the same thing - 
enable set-redirect operation when not called from the remap plugin.

There is an important difference in the replacing of PATH and in QSA mode 
(appending query parameters) when forming the Location header. TS-3740 patch 
always uses client request URI when replacing those variables to form the 
Location header regardless of which hook condition matches while TS-3137 
patch uses the corresponding URI at each particular hook (which is consistent 
with the way set-destination is implemented).

I don't have visibility of all header-rewrite use-cases but it seems that 
although at the time it looked more reasonable to always use the client request 
URI to form Location header (it is always available regardless of which hook 
condition matches, it fitted the above origin time-out use-case well and seemed 
a more straightforward way to configure the redirects), it may be more 
reasonable to do it in TS-3137 way (which looks also more consistent with 
set-destination operation implementation as well).

Any ideas and opinions are appreciated!

 header_rewrite plugin: set-redirect doesn't work with SEND_RESPONSE_HDR_HOOK
 

 Key: TS-3740
 URL: https://issues.apache.org/jira/browse/TS-3740
 Project: Traffic Server
  Issue Type: Bug
  Components: Plugins
Reporter: Gancho Tenev
Assignee: Gancho Tenev
 Fix For: 6.1.0


 DESCRIPTION:
 ATS header_rewrite plugin set-redirect operation doesn't work with 
 SEND_RESPONSE_HDR_HOOK. Please see the debugging notes below for more info.
 HOW TO REPRODUCE:
 Here is a sample plugin configuration files that reproduce the problem
 $ cat /opt/ats/etc/trafficserver/remap.config
 map http://p1 http://h1:8001 \
 @plugin=header_rewrite.so 
 @pparam=/opt/ats/etc/trafficserver/header_rewrite.config
 $ cat /opt/ats/etc/trafficserver/header_rewrite.config
 cond %{SEND_RESPONSE_HDR_HOOK}
 cond %{STATUS} =502
 set-redirect 302 http://p0/%{PATH} [QSA]
 DEBUGGING NOTES:
 Both conditions in the header_rewrite.config are evaluated correctly but 
 set-redirect has no effect and the response to the UA is not modified as 
 expected.  After some debugging it turned out that if the set-redirect 
 (OperatorSetDestination::exec) is not called from the remap plugin it has no 
 effect. The header_rewrite plugin creates a continuation to be called from 
 SEND_RESPONSE_HDR_HOOK (TSHttpHookAdd()). OperatorSetDestination::exec 
 doesn't have code to handle the case when the set-redirect operation is _not_ 
 called directly from the remap plugin (TSRemapDoRemap()).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-3740) header_rewrite plugin: set-redirect doesn't work with SEND_RESPONSE_HDR_HOOK

2015-08-13 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695916#comment-14695916
 ] 

Gancho Tenev edited comment on TS-3740 at 8/13/15 8:56 PM:
---

Yes, just by reading around the 2 patches (didn't have a chance to experiment 
with TS-3137 patch) it seems that they are meant to accomplish the same thing - 
enable set-redirect operation when not called from the remap plugin.

There is an important difference in the replacing of PATH and in QSA mode 
(appending query parameters) when forming the Location header. TS-3740 patch 
always uses client request URI when replacing those variables to form the 
Location header regardless of which hook condition matches while TS-3137 
patch uses the corresponding URI at each particular hook (which is consistent 
with the way set-destination is implemented).

I don't have visibility of all header-rewrite use-cases but it seems that 
although at the time it looked more reasonable to always use the client request 
URI when forming Location header (it is always available regardless of which 
hook condition matches, it fitted the above origin time-out use-case well and 
seemed a more straightforward way to configure the redirects), it may be more 
reasonable to do it in TS-3137 way (which looks more consistent with 
set-destination operation implementation as well).

Any ideas and opinions are appreciated!


was (Author: gancho):
Yes, just by reading around the 2 patches (didn't have a chance to experiment 
with TS-3137 patch) it seems that they are meant to accomplish the same thing - 
enable set-redirect operation when not called from the remap plugin.

There is an important difference in the replacing of PATH and in QSA mode 
(appending query parameters) when forming the Location header. TS-3740 patch 
always uses client request URI when replacing those variables to form the 
Location header regardless of which hook condition matches while TS-3137 
patch uses the corresponding URI at each particular hook (which is consistent 
with the way set-destination is implemented).

I don't have visibility of all header-rewrite use-cases but it seems that 
although at the time it looked more reasonable to always use the client request 
URI when forming Location header (it is always available regardless of which 
hook condition matches, it fitted the above origin time-out use-case well and 
seemed a more straightforward way to configure the redirects), it may be more 
reasonable to do it in TS-3137 way (which looks also more consistent with 
set-destination operation implementation as well).

Any ideas and opinions are appreciated!

 header_rewrite plugin: set-redirect doesn't work with SEND_RESPONSE_HDR_HOOK
 

 Key: TS-3740
 URL: https://issues.apache.org/jira/browse/TS-3740
 Project: Traffic Server
  Issue Type: Bug
  Components: Plugins
Reporter: Gancho Tenev
Assignee: Gancho Tenev
 Fix For: 6.1.0


 DESCRIPTION:
 ATS header_rewrite plugin set-redirect operation doesn't work with 
 SEND_RESPONSE_HDR_HOOK. Please see the debugging notes below for more info.
 HOW TO REPRODUCE:
 Here is a sample plugin configuration files that reproduce the problem
 $ cat /opt/ats/etc/trafficserver/remap.config
 map http://p1 http://h1:8001 \
 @plugin=header_rewrite.so 
 @pparam=/opt/ats/etc/trafficserver/header_rewrite.config
 $ cat /opt/ats/etc/trafficserver/header_rewrite.config
 cond %{SEND_RESPONSE_HDR_HOOK}
 cond %{STATUS} =502
 set-redirect 302 http://p0/%{PATH} [QSA]
 DEBUGGING NOTES:
 Both conditions in the header_rewrite.config are evaluated correctly but 
 set-redirect has no effect and the response to the UA is not modified as 
 expected.  After some debugging it turned out that if the set-redirect 
 (OperatorSetDestination::exec) is not called from the remap plugin it has no 
 effect. The header_rewrite plugin creates a continuation to be called from 
 SEND_RESPONSE_HDR_HOOK (TSHttpHookAdd()). OperatorSetDestination::exec 
 doesn't have code to handle the case when the set-redirect operation is _not_ 
 called directly from the remap plugin (TSRemapDoRemap()).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-3740) header_rewrite plugin: set-redirect doesn't work with SEND_RESPONSE_HDR_HOOK

2015-08-13 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695916#comment-14695916
 ] 

Gancho Tenev edited comment on TS-3740 at 8/13/15 8:52 PM:
---

Yes, just by reading around the 2 patches (didn't have a chance to experiment 
with TS-3137 patch) it seems that they are meant to accomplish the same thing - 
enable set-redirect operation when not called from the remap plugin.

There is an important difference in the replacing of PATH and in QSA mode 
(appending query parameters) when forming the Location header. TS-3740 patch 
always uses client request URI when replacing those variables to form the 
Location header regardless of which hook condition matches while TS-3137 
patch uses the corresponding URI at each particular hook (which is consistent 
with the way set-destination is implemented).

I don't have visibility of all header-rewrite use-cases but it seems that 
although at the time it looked more reasonable to always use the client request 
URI to form Location header (it is always available regardless of which hook 
condition matches, it fitted the above origin time-out use-case well and seemed 
a more straightforward way to configure the redirects), it may be more 
reasonable to do it in TS-3137 way (which looks also more consistent with 
set-destination operation implementation as well).

Any ideas and opinions are appreciated!


was (Author: gancho):
Yes, just by reading around the 2 patches (didn't have a chance to experiment 
with TS-3137 patch) it seems that they are meant to accomplish the same thing - 
enable set-redirect operation when not called from the remap plugin.

There is an important difference in the replacing of PATH and in QSA mode 
(appending query parameters to) when forming the Location header. TS-3740 
patch always uses client request URI when replacing those variables to form the 
Location header regardless of which hook condition matches while TS-3137 
patch uses the corresponding URI at each particular hook (which is consistent 
with the way set-destination is implemented).

I don't have visibility of all header-rewrite use-cases but it seems that 
although at the time it looked more reasonable to always use the client request 
URI to form Location header (it is always available regardless of which hook 
condition matches, it fitted the above origin time-out use-case well and seemed 
a more straightforward way to configure the redirects), it may be more 
reasonable to do it in TS-3137 way (which looks also more consistent with 
set-destination operation implementation as well).

Any ideas and opinions are appreciated!

 header_rewrite plugin: set-redirect doesn't work with SEND_RESPONSE_HDR_HOOK
 

 Key: TS-3740
 URL: https://issues.apache.org/jira/browse/TS-3740
 Project: Traffic Server
  Issue Type: Bug
  Components: Plugins
Reporter: Gancho Tenev
Assignee: Gancho Tenev
 Fix For: 6.1.0


 DESCRIPTION:
 ATS header_rewrite plugin set-redirect operation doesn't work with 
 SEND_RESPONSE_HDR_HOOK. Please see the debugging notes below for more info.
 HOW TO REPRODUCE:
 Here is a sample plugin configuration files that reproduce the problem
 $ cat /opt/ats/etc/trafficserver/remap.config
 map http://p1 http://h1:8001 \
 @plugin=header_rewrite.so 
 @pparam=/opt/ats/etc/trafficserver/header_rewrite.config
 $ cat /opt/ats/etc/trafficserver/header_rewrite.config
 cond %{SEND_RESPONSE_HDR_HOOK}
 cond %{STATUS} =502
 set-redirect 302 http://p0/%{PATH} [QSA]
 DEBUGGING NOTES:
 Both conditions in the header_rewrite.config are evaluated correctly but 
 set-redirect has no effect and the response to the UA is not modified as 
 expected.  After some debugging it turned out that if the set-redirect 
 (OperatorSetDestination::exec) is not called from the remap plugin it has no 
 effect. The header_rewrite plugin creates a continuation to be called from 
 SEND_RESPONSE_HDR_HOOK (TSHttpHookAdd()). OperatorSetDestination::exec 
 doesn't have code to handle the case when the set-redirect operation is _not_ 
 called directly from the remap plugin (TSRemapDoRemap()).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-3740) header_rewrite plugin: set-redirect doesn't work with SEND_RESPONSE_HDR_HOOK

2015-07-06 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-3740:


 Summary: header_rewrite plugin: set-redirect doesn't work with 
SEND_RESPONSE_HDR_HOOK
 Key: TS-3740
 URL: https://issues.apache.org/jira/browse/TS-3740
 Project: Traffic Server
  Issue Type: Bug
  Components: Plugins
Reporter: Gancho Tenev


DESCRIPTION:

ATS header_rewrite plugin set-redirect operation doesn't work with 
SEND_RESPONSE_HDR_HOOK. Please see the debugging notes below for more info.


HOW TO REPRODUCE:

Here are the sample plugin configuration files that reproduce the problem:

$ cat /opt/ats/etc/trafficserver/remap.config
map http://p1 http://h1:8001 \
@plugin=header_rewrite.so 
@pparam=/opt/ats/etc/trafficserver/header_rewrite.config

$ cat /opt/ats/etc/trafficserver/header_rewrite.config
cond %{SEND_RESPONSE_HDR_HOOK}
cond %{STATUS} =502
set-redirect 302 http://p0/%{PATH} [QSA]


DEBUGGING NOTES:

Both conditions in the header_rewrite.config are evaluated correctly, but 
set-redirect has no effect and the response to the UA is not modified as 
expected. After some debugging it turned out that the set-redirect operation 
(OperatorSetDestination::exec) only takes effect when called from the remap 
plugin. The header_rewrite plugin creates a continuation to be called from 
SEND_RESPONSE_HDR_HOOK (TSHttpHookAdd()), but OperatorSetDestination::exec 
doesn't have code to handle the case when the set-redirect operation is _not_ 
called directly from the remap plugin (TSRemapDoRemap()).
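
A sketch of the missing branch (simplified, with error handling and hook wiring 
omitted; illustrative, not the actual header_rewrite implementation): when 
running from SEND_RESPONSE_HDR_HOOK rather than from TSRemapDoRemap(), the 
operator would have to rewrite the client response directly -- set the 
redirect status and attach a Location header:

{code}
#include <ts/ts.h>

static void
set_redirect_on_response(TSHttpTxn txnp, int status, const char *location)
{
  TSMBuffer bufp;
  TSMLoc    hdr_loc;

  if (TSHttpTxnClientRespGet(txnp, &bufp, &hdr_loc) != TS_SUCCESS) {
    return;
  }

  // Rewrite the status line (e.g. 502 -> 302).
  TSHttpHdrStatusSet(bufp, hdr_loc, static_cast<TSHttpStatus>(status));

  // Attach a Location header carrying the redirect target.
  TSMLoc field_loc;
  if (TSMimeHdrFieldCreateNamed(bufp, hdr_loc, "Location", -1, &field_loc) ==
      TS_SUCCESS) {
    TSMimeHdrFieldValueStringSet(bufp, field_loc, -1, location, -1);
    TSMimeHdrFieldAppend(bufp, hdr_loc, field_loc);
    TSHandleMLocRelease(bufp, hdr_loc, field_loc);
  }

  TSHandleMLocRelease(bufp, TS_NULL_MLOC, hdr_loc);
}
{code}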



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3649) url_sig plugin security issues (crash by HTTP request, circumvent signature)

2015-06-01 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-3649:
-
Attachment: TS-3649-url_sig-security_issues.patch

Please find the patch attached: TS-3649-url_sig-security_issues.patch

 url_sig plugin security issues (crash by HTTP request, circumvent signature)
 

 Key: TS-3649
 URL: https://issues.apache.org/jira/browse/TS-3649
 Project: Traffic Server
  Issue Type: Bug
  Components: Plugins
Reporter: Gancho Tenev
Assignee: Gancho Tenev
 Fix For: 6.0.0

 Attachments: TS-3649-url_sig-security_issues.patch, 
 TS-3649-url_sig-security_issues.rtf


 While reading the code I found 2 security issues in the url_sig code which 
 would allow:
 - Issue 1: to crash an ATS running the url_sig plugin by using an HTTP 
 request (segmentation fault due to out-of-bounds array access) - the key 
 index input (query parameter) needs proper sanitation
 - Issue 2: to gain access to protected assets by signing the URL with an 
 empty secret key if at least one of the 16 keys is not provided in the 
 url_sig plugin configuration. One could scan by trying all keys 0 to 15, and 
 for the empty key the signature validation would succeed - access must be 
 denied if the key specified in the signature is not defined in the plugin 
 config (empty).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3649) url_sig plugin security issues (crash by HTTP request, circumvent signature)

2015-05-29 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-3649:
-
Description: 
While reading the code I found 2 security issues in the url_sig code which 
would allow:
- Issue 1: to crash an ATS running the url_sig plugin by using an HTTP request 
(segmentation fault due to out-of-bounds array access) - the key index input 
(query parameter) needs proper sanitation
- Issue 2: to gain access to protected assets by signing the URL with an empty 
secret key if at least one of the 16 keys is not provided in the url_sig plugin 
configuration. One could scan by trying all keys 0 to 15, and for the empty key 
the signature validation would succeed - access must be denied if the key 
specified in the signature is not defined in the plugin config (empty).


  was:
While reading the code found 2 security issues url_sig code which would allow:
- Issue 1: to crash ATS which is running the url_sig plugin by using an HTTP 
request (segmentation fault due out-of-bounds array access) - there is a need 
of proper sanitation of the key index input (query parameter)
- Issue 2: to gain access to protected assets by signing the URL with an empty 
secret key if at least one of the 16 keys is not provided in the uri_sig plugin 
configuration. One could scan trying all keys 0 to 15 and for the empty key 
the signature validation would succeed - must to deny access if the key 
specified in the signature is not defined in the plugin config (empty).



 url_sig plugin security issues (crash by HTTP request, circumvent signature)
 

 Key: TS-3649
 URL: https://issues.apache.org/jira/browse/TS-3649
 Project: Traffic Server
  Issue Type: Bug
  Components: Plugins
Reporter: Gancho Tenev
 Attachments: TS-3649-url_sig-security_issues.rtf


 While reading the code I found 2 security issues in the url_sig code which 
 would allow:
 - Issue 1: to crash an ATS running the url_sig plugin by using an HTTP 
 request (segmentation fault due to out-of-bounds array access) - the key 
 index input (query parameter) needs proper sanitation
 - Issue 2: to gain access to protected assets by signing the URL with an 
 empty secret key if at least one of the 16 keys is not provided in the 
 url_sig plugin configuration. One could scan by trying all keys 0 to 15, and 
 for the empty key the signature validation would succeed - access must be 
 denied if the key specified in the signature is not defined in the plugin 
 config (empty).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3649) url_sig plugin security issues (crash by HTTP request, circumvent signature)

2015-05-29 Thread Gancho Tenev (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565566#comment-14565566
 ] 

Gancho Tenev commented on TS-3649:
--

The fix is ready as well.

 url_sig plugin security issues (crash by HTTP request, circumvent signature)
 

 Key: TS-3649
 URL: https://issues.apache.org/jira/browse/TS-3649
 Project: Traffic Server
  Issue Type: Bug
  Components: Plugins
Reporter: Gancho Tenev
 Attachments: TS-3649-url_sig-security_issues.rtf


 While reading the url_sig code I found 2 security issues which would allow 
 an attacker:
 - Issue 1: to crash an ATS instance running the url_sig plugin with a single 
 HTTP request (segmentation fault due to an out-of-bounds array access) - the 
 key index input (a query parameter) needs proper sanitization
 - Issue 2: to gain access to protected assets by signing the URL with an 
 empty secret key if at least one of the 16 keys is not provided in the 
 url_sig plugin configuration. An attacker could scan key indices 0 to 15; 
 for the missing (empty) key the signature validation would succeed - access 
 must be denied if the key specified in the signature is not defined (empty) 
 in the plugin config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-3649) url_sig plugin security issues (crash by HTTP request, circumvent signature)

2015-05-29 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-3649:


 Summary: url_sig plugin security issues (crash by HTTP request, 
circumvent signature)
 Key: TS-3649
 URL: https://issues.apache.org/jira/browse/TS-3649
 Project: Traffic Server
  Issue Type: Bug
  Components: Plugins
Reporter: Gancho Tenev


While reading the url_sig code I found 2 security issues which would allow 
an attacker:
- Issue 1: to crash an ATS instance running the url_sig plugin with a single 
HTTP request (segmentation fault due to an out-of-bounds array access) - the 
key index input (a query parameter) needs proper sanitization
- Issue 2: to gain access to protected assets by signing the URL with an 
empty secret key if at least one of the 16 keys is not provided in the 
url_sig plugin configuration. An attacker could scan key indices 0 to 15; 
for the missing (empty) key the signature validation would succeed - access 
must be denied if the key specified in the signature is not defined (empty) 
in the plugin config.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3649) url_sig plugin security issues (crash by HTTP request, circumvent signature)

2015-05-29 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-3649:
-
Attachment: TS-3649-url_sig-security_issues.rtf

Please find information on how to set up the environment and the steps to 
reproduce both issues in the attached file.
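
To make Issue 2 concrete, the sketch below computes an HMAC-SHA1 digest with 
a zero-length key, which is what a vulnerable validator effectively compares 
against when the selected key slot is unconfigured. The URL shape and the 
query parameter names are assumptions for illustration; the exact byte range 
url_sig signs is not reproduced here. Builds against OpenSSL with -lcrypto.

{code}
#include <stdio.h>
#include <string.h>
#include <openssl/hmac.h>

int
main(void)
{
  /* Illustrative signed-URL shape only; not the plugin's exact format. */
  const char *url = "http://example.com/asset?E=1735689600&A=1&K=5&P=1&S=";

  unsigned char md[EVP_MAX_MD_SIZE];
  unsigned int md_len = 0;

  /* Sign with an empty (zero-length) key. If key slot 5 is missing from
   * the server configuration, a vulnerable validator derives this same
   * digest and accepts the forged request. */
  HMAC(EVP_sha1(), "", 0, (const unsigned char *)url, strlen(url), md, &md_len);

  for (unsigned int i = 0; i < md_len; i++) {
    printf("%02x", md[i]);
  }
  printf("\n");
  return 0;
}
{code}

A patched plugin never reaches the digest comparison for an undefined key, 
so a probe like this is denied regardless of the signature it carries.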

 url_sig plugin security issues (crash by HTTP request, circumvent signature)
 

 Key: TS-3649
 URL: https://issues.apache.org/jira/browse/TS-3649
 Project: Traffic Server
  Issue Type: Bug
  Components: Plugins
Reporter: Gancho Tenev
 Attachments: TS-3649-url_sig-security_issues.rtf


 While reading the url_sig code I found 2 security issues which would allow 
 an attacker:
 - Issue 1: to crash an ATS instance running the url_sig plugin with a single 
 HTTP request (segmentation fault due to an out-of-bounds array access) - the 
 key index input (a query parameter) needs proper sanitization
 - Issue 2: to gain access to protected assets by signing the URL with an 
 empty secret key if at least one of the 16 keys is not provided in the 
 url_sig plugin configuration. An attacker could scan key indices 0 to 15; 
 for the missing (empty) key the signature validation would succeed - access 
 must be denied if the key specified in the signature is not defined (empty) 
 in the plugin config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3340) Coverity fixes

2015-01-28 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-3340:
-
  Description: This issue is for attaching all Coverity fixes from Gancho 
Tenev for the 5.3.0 release.  (was: This issue is for attaching all Coverity 
fixes from Gancho Tenev for the 5.3.x release.)
Fix Version/s: 5.3.0

 Coverity fixes
 --

 Key: TS-3340
 URL: https://issues.apache.org/jira/browse/TS-3340
 Project: Traffic Server
  Issue Type: Improvement
Reporter: Gancho Tenev
Priority: Minor
 Fix For: 5.3.0


 This issue is for attaching all Coverity fixes from Gancho Tenev for the 
 5.3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-3340) Coverity fixes

2015-01-28 Thread Gancho Tenev (JIRA)
Gancho Tenev created TS-3340:


 Summary: Coverity fixes
 Key: TS-3340
 URL: https://issues.apache.org/jira/browse/TS-3340
 Project: Traffic Server
  Issue Type: Improvement
Reporter: Gancho Tenev


This issue is for attaching all Coverity fixes from Gancho Tenev for the 
5.3.x release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3340) Coverity fixes

2015-01-28 Thread Gancho Tenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gancho Tenev updated TS-3340:
-
Priority: Minor  (was: Major)

 Coverity fixes
 --

 Key: TS-3340
 URL: https://issues.apache.org/jira/browse/TS-3340
 Project: Traffic Server
  Issue Type: Improvement
Reporter: Gancho Tenev
Priority: Minor

 This issue is for attaching all Coverity fixes from Gancho Tenev for the 
 5.3.x release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)