[jira] [Resolved] (TS-5022) Multiple Client Certificate to Origin

2017-01-06 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-5022.

Resolution: Fixed

> Multiple Client Certificate to Origin
> -
>
> Key: TS-5022
> URL: https://issues.apache.org/jira/browse/TS-5022
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Security, SSL, TLS
>Reporter: Scott Beardsley
>Assignee: Syeda Persia Aziz
>  Labels: yahoo
> Fix For: 7.1.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Yahoo has a use case where the origin is doing mutual TLS authentication, 
> which requires ATS to send a client certificate. This works fine (for now) 
> because ATS supports configuring *one* client cert, but the feature should 
> really allow multiple client certificates to be configured, with the 
> certificate chosen based on the origin being contacted.
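For illustration, a minimal standalone sketch of the kind of per-origin selection the ticket asks for: pick a client cert/key pair based on the origin host before building the outbound TLS context. The origin_certs map, file paths, and pick_client_ctx() helper are hypothetical, not ATS configuration or code.

{code}
// Sketch only: select a client cert/key per origin host. The map contents and
// pick_client_ctx() are hypothetical; ATS's configuration and SSL_CTX handling differ.
#include <openssl/ssl.h>
#include <map>
#include <string>

struct ClientCert {
  std::string cert_path;
  std::string key_path;
};

// Hypothetical per-origin table; in ATS this would come from configuration.
static const std::map<std::string, ClientCert> origin_certs = {
  {"origin-a.example.com", {"/etc/ats/a-client.pem", "/etc/ats/a-client.key"}},
  {"origin-b.example.com", {"/etc/ats/b-client.pem", "/etc/ats/b-client.key"}},
};

// Build a client SSL_CTX that presents the certificate configured for this
// origin, or no client certificate when the origin is not listed.
SSL_CTX *pick_client_ctx(const std::string &origin_host)
{
  SSL_CTX *ctx = SSL_CTX_new(SSLv23_client_method());
  if (ctx == nullptr)
    return nullptr;

  auto it = origin_certs.find(origin_host);
  if (it != origin_certs.end()) {
    if (SSL_CTX_use_certificate_file(ctx, it->second.cert_path.c_str(), SSL_FILETYPE_PEM) != 1 ||
        SSL_CTX_use_PrivateKey_file(ctx, it->second.key_path.c_str(), SSL_FILETYPE_PEM) != 1) {
      SSL_CTX_free(ctx);   // misconfigured cert/key pair for this origin
      return nullptr;
    }
  }
  return ctx;
}

int main()
{
  SSL_library_init();      // OpenSSL 1.0.x style initialization
  SSL_CTX *ctx = pick_client_ctx("origin-a.example.com");
  if (ctx != nullptr)      // NULL here unless the illustrative files exist
    SSL_CTX_free(ctx);
  return 0;
}
{code}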



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-5095) IOBufferReader::read_avail adds considerably to CPU utilization

2017-01-06 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-5095.

Resolution: Fixed

> IOBufferReader::read_avail adds considerably to CPU utilization
> ---
>
> Key: TS-5095
> URL: https://issues.apache.org/jira/browse/TS-5095
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 7.1.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When installing a new build of our ATS, we noticed that the CPU utilization 
> was higher than a non-upgraded peer in the same pod. Looking at perf top for 
> that process we saw that 10-17% of CPU was spent in 
> IOBufferReader::read_avail. In the older system, that function didn't show up 
> in the top couple screens of perf top.
> I tracked it down to an "ink_assert(read_avail() > 0)" call in 
> IOBufferReader::consume. We didn't have the debug asserts compiled in, but 
> the assert's argument expression was still being evaluated. Commenting out the 
> ink_assert call brought the CPU utilization back to normal.  We later found 
> a similar growth in IOBufferReader::read() due to the use of 
> IOBufferReader::read_avail.
> It looks like the issue is compiling with openssl 1.0.2 instead of openssl 
> 1.0.1.  We have been running against the openssl 1.0.2 library but were still 
> compiling against openssl 1.0.1.  We recently upgraded our builds to compile 
> against the new openssl as well. 
> I built a version of the older source against openssl 1.0.2 and it had very 
> similar performance to what I was seeing with the newer source.
> Still don't know why compiling against the newer openssl would make such a 
> difference.  But it is easy enough to rework these use cases to take 
> read_avail (which walks the block chain) out of the fast path.  This should 
> be a good optimization in any case.
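For illustration, a small standalone sketch of the rework described above: keep a cached byte count so the hot path never has to walk the block chain the way read_avail() does. BlockChainReader and its members are stand-ins, not ATS's IOBufferReader.

{code}
// Sketch only: cache the available-byte count so the hot path does not walk
// the block chain. BlockChainReader is a stand-in, not ATS's IOBufferReader.
#include <cassert>
#include <cstdint>

struct Block {
  const char *start;
  const char *end;
  Block *next;
};

class BlockChainReader {
public:
  // Analogous to read_avail(): walks every block, so it is expensive to call
  // on each consume()/read() in the fast path.
  int64_t read_avail_walk() const {
    int64_t total = 0;
    for (const Block *b = head_; b != nullptr; b = b->next)
      total += b->end - b->start;
    return total - consumed_;
  }

  // Fast path: rely on a cached count. (When ink_assert compiles down to
  // "(void)(expr)" in release builds, asserting on read_avail_walk() would
  // still pay for the chain walk.)
  void consume(int64_t n) {
    assert(n <= cached_avail_);
    cached_avail_ -= n;
    consumed_ += n;
  }

  void append(Block *b, int64_t bytes) {
    b->next = head_;
    head_ = b;
    cached_avail_ += bytes;
  }

private:
  Block  *head_         = nullptr;
  int64_t consumed_     = 0;
  int64_t cached_avail_ = 0;
};

int main()
{
  static const char data[16] = {};
  Block b = {data, data + sizeof(data), nullptr};
  BlockChainReader r;
  r.append(&b, sizeof(data));
  r.consume(8);                        // no chain walk on the hot path
  return r.read_avail_walk() == 8 ? 0 : 1;
}
{code}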



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-5085) `posix_fadvise` is incorrectly used in traffic_logcat

2017-01-06 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-5085.

Resolution: Fixed

> `posix_fadvise` is incorrectly used in traffic_logcat
> -
>
> Key: TS-5085
> URL: https://issues.apache.org/jira/browse/TS-5085
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging
>Reporter: Daniel Xu
>Assignee: Daniel Xu
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {{traffic_logcat}} is currently using {{posix_fadvise}} in the exact opposite 
> way it should be used. It tells the kernel it doesn't need anything in this 
> file and then immediately proceeds to read everything in the file anyway. 
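For illustration, a minimal sketch of the difference in hints, assuming a tool that is about to read a log file sequentially from start to finish; the file name is illustrative.

{code}
// Sketch only: the wrong and right posix_fadvise hints for a tool that is
// about to read an entire log file sequentially. The path is illustrative.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main()
{
  const char *path = "event.blog";   // illustrative binary log file
  int fd = open(path, O_RDONLY);
  if (fd < 0) {
    perror("open");
    return 1;
  }

  // Wrong (what the bug describes): tell the kernel we will not need the
  // file's pages, then immediately read the whole file anyway.
  //   posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

  // Right: announce a sequential read of the whole file.
  posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

  char buf[64 * 1024];
  ssize_t n;
  while ((n = read(fd, buf, sizeof(buf))) > 0) {
    // process log records ...
  }
  close(fd);
  return 0;
}
{code}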



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4948) CID 1364117 (Forward NULL) in proxy/http/HttpSM.cc

2017-01-06 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4948.

Resolution: Fixed
  Assignee: Leif Hedstrom

> CID 1364117 (Forward NULL) in proxy/http/HttpSM.cc
> --
>
> Key: TS-4948
> URL: https://issues.apache.org/jira/browse/TS-4948
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Leif Hedstrom
>Assignee: Leif Hedstrom
> Fix For: 7.1.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {code}
> *** CID 1364117:(FORWARD_NULL)
> /proxy/http/HttpSM.cc: 2111 in HttpSM::process_hostdb_info(HostDBInfo *)()
> 2105 void
> 2106 HttpSM::process_hostdb_info(HostDBInfo *r)
> 2107 {
> 2108   // Increment the refcount to our item, since we are pointing at it
> 2109   t_state.hostdb_entry = Ptr(r);
> 2110 
>CID 1364117:(FORWARD_NULL)
>Assigning: "client_addr" = "NULL".
> 2111   sockaddr const *client_addr = NULL;
> 2112   bool use_client_addr= 
> t_state.http_config_param->use_client_target_addr == 1 && 
> t_state.client_info.is_transparent &&
> 2113  t_state.dns_info.os_addr_style == 
> HttpTransact::DNSLookupInfo::OS_ADDR_TRY_DEFAULT;
> 2114   if (use_client_addr) {
> 2115 NetVConnection *vc = t_state.state_machine->ua_session ? 
> t_state.state_machine->ua_session->get_netvc() : NULL;
> 2116 if (vc) {
> /proxy/http/HttpSM.cc: 2111 in HttpSM::process_hostdb_info(HostDBInfo *)()
> 2105 void
> 2106 HttpSM::process_hostdb_info(HostDBInfo *r)
> 2107 {
> 2108   // Increment the refcount to our item, since we are pointing at it
> 2109   t_state.hostdb_entry = Ptr(r);
> 2110 
>CID 1364117:(FORWARD_NULL)
>Assigning: "client_addr" = "NULL".
> 2111   sockaddr const *client_addr = NULL;
> 2112   bool use_client_addr= 
> t_state.http_config_param->use_client_target_addr == 1 && 
> t_state.client_info.is_transparent &&
> 2113  t_state.dns_info.os_addr_style == 
> HttpTransact::DNSLookupInfo::OS_ADDR_TRY_DEFAULT;
> 2114   if (use_client_addr) {
> 2115 NetVConnection *vc = t_state.state_machine->ua_session ? 
> t_state.state_machine->ua_session->get_netvc() : NULL;
> 2116 if (vc) {
> {code}
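For illustration, a small standalone sketch of the defensive pattern that addresses this class of FORWARD_NULL finding: a pointer that legitimately stays NULL on some paths is checked before every later dereference. The types and the use_client_address() helper are stand-ins, not the HttpSM code above.

{code}
// Sketch only: guard a pointer that intentionally stays NULL on some paths,
// which is the class of issue the FORWARD_NULL checker flags. The types and
// use_client_address() are stand-ins, not HttpSM.
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstdio>

static void use_client_address(const sockaddr *client_addr)
{
  // client_addr may be NULL when the "use client target addr" path is not
  // taken, so every later use must be guarded.
  if (client_addr == nullptr) {
    printf("falling back to the normal DNS result\n");
    return;
  }
  printf("using client address, family=%d\n", client_addr->sa_family);
}

int main()
{
  bool use_client_addr = false;         // stands in for the config/transparency checks
  const sockaddr *client_addr = nullptr;

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  if (use_client_addr) {
    client_addr = reinterpret_cast<const sockaddr *>(&addr);
  }
  use_client_address(client_addr);      // safe whether or not the address was set
  return 0;
}
{code}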



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-5093) Augment and fix crash in slow log

2017-01-06 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-5093.

Resolution: Fixed

> Augment and fix crash in slow log
> -
>
> Key: TS-5093
> URL: https://issues.apache.org/jira/browse/TS-5093
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 7.1.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> While tracking down a performance problem in HTTP/2, I added some values to the 
> slow log entry. I also rearranged the call to avoid a use-after-free crash 
> when calling update_stats in HttpSM::kill_this().
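For illustration, a schematic sketch of the reordering described (gather the statistics before tearing down the objects they read); the class and member names are stand-ins, not the actual HttpSM::kill_this() code.

{code}
// Sketch only: gather the slow-log/statistics data *before* tearing down the
// objects it reads from. Names are stand-ins, not HttpSM internals.
#include <cstdio>
#include <memory>

struct Session {
  long bytes_sent = 0;
};

class StateMachine {
public:
  void kill_this() {
    // Wrong order (use-after-free): release_session(); update_stats();
    // Right order: stats are read while the session is still valid.
    update_stats();
    release_session();
  }

private:
  void update_stats() {
    if (session_)
      printf("slow log: bytes_sent=%ld\n", session_->bytes_sent);
  }
  void release_session() { session_.reset(); }

  std::unique_ptr<Session> session_ = std::make_unique<Session>();
};

int main()
{
  StateMachine sm;
  sm.kill_this();
  return 0;
}
{code}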



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-5053) const char **argv passed to TSPluginInit is not null terminated

2017-01-06 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-5053.

Resolution: Fixed

> const char **argv passed to TSPluginInit is not null terminated
> ---
>
> Key: TS-5053
> URL: https://issues.apache.org/jira/browse/TS-5053
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Plugins
>Reporter: Daniel Xu
>Assignee: Daniel Xu
> Fix For: 7.1.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> See title. Typically **argv is null terminated on other systems. And who are 
> we to question 1000 years of tradition?
> One example of an issue is that {{lib/ts/ink_args.cc}} actually relies on 
> **argv being null terminated. Interesting segfaults occur in plugins using 
> the ATS argument parser.
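For illustration, a small sketch of the convention in question: an argv-style array carries a trailing NULL entry so a parser can walk it without a separate count. parse_args() is illustrative, not ink_args.cc.

{code}
// Sketch only: an argv-style array is terminated by a NULL entry so a parser
// can walk it without a separate count. parse_args() is illustrative, not ink_args.cc.
#include <cstdio>

static void parse_args(const char **argv)
{
  // Relies on the conventional NULL terminator; without it this loop walks
  // off the end of the array and segfaults, as the issue describes.
  for (int i = 0; argv[i] != nullptr; ++i) {
    printf("arg[%d] = %s\n", i, argv[i]);
  }
}

int main()
{
  // What a plugin would expect to receive: the argc entries plus a final NULL.
  const char *argv[] = {"my_plugin.so", "--flag", "value", nullptr};
  parse_args(argv);
  return 0;
}
{code}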



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-5103) Always tunnel non-keepalive HTTP request if tr-pass enabled

2017-01-06 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-5103.

Resolution: Fixed

> Always tunnel non-keepalive HTTP request if tr-pass enabled
> ---
>
> Key: TS-5103
> URL: https://issues.apache.org/jira/browse/TS-5103
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Oknet Xu
>Assignee: Oknet Xu
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Should use ua_buffer_reader instead of ua_raw_buffer_reader.
> {code}
>   // If we had a GET request that has data after the
>   // get request, do blind tunnel
> } else if (state == PARSE_DONE && 
> t_state.hdr_info.client_request.method_get_wksidx() == HTTP_WKSIDX_GET &&
>ua_raw_buffer_reader->read_avail() > 0 && 
> !t_state.hdr_info.client_request.is_keep_alive_set()) {
>   do_blind_tunnel = true;
> }
> {code}
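For illustration, one plausible reading of why the raw reader is the wrong one to check: it still sees the header bytes the parser already consumed from the parsed reader, so its read_avail() can be non-zero even when nothing follows the request. The Reader type is a stand-in; ATS's actual reader bookkeeping may differ.

{code}
// Sketch only: the raw reader still sees the header bytes the parser consumed
// from the parsed reader, so its read_avail() being non-zero does not prove
// there is extra data after the request. Reader is a stand-in type.
#include <cstdio>
#include <string>

struct Reader {
  const std::string *buf;
  size_t consumed;                               // bytes already consumed
  size_t read_avail() const { return buf->size() - consumed; }
};

int main()
{
  // A non-keep-alive GET with no body and no trailing data.
  std::string request = "GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n";

  Reader raw    = {&request, 0};                 // like ua_raw_buffer_reader: nothing consumed
  Reader parsed = {&request, request.size()};    // like ua_buffer_reader: header fully parsed

  printf("raw    read_avail = %zu (non-zero, would wrongly force a blind tunnel)\n",
         raw.read_avail());
  printf("parsed read_avail = %zu (zero, request handled normally)\n",
         parsed.read_avail());
  return 0;
}
{code}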



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-5103) Always tunnel non-keepalive HTTP request if tr-pass enabled

2017-01-06 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-5103:
---
Assignee: Oknet Xu

> Always tunnel non-keepalive HTTP request if tr-pass enabled
> ---
>
> Key: TS-5103
> URL: https://issues.apache.org/jira/browse/TS-5103
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Oknet Xu
>Assignee: Oknet Xu
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Should use ua_buffer_reader instead of ua_raw_buffer_reader.
> {code}
>   // If we had a GET request that has data after the
>   // get request, do blind tunnel
> } else if (state == PARSE_DONE && 
> t_state.hdr_info.client_request.method_get_wksidx() == HTTP_WKSIDX_GET &&
>ua_raw_buffer_reader->read_avail() > 0 && 
> !t_state.hdr_info.client_request.is_keep_alive_set()) {
>   do_blind_tunnel = true;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-5095) IOBufferReader::read_avail adds considerably to CPU utilization

2016-12-13 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-5095:
--

Assignee: Susan Hinrichs

> IOBufferReader::read_avail adds considerably to CPU utilization
> ---
>
> Key: TS-5095
> URL: https://issues.apache.org/jira/browse/TS-5095
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> When installing a new build of our ATS, we noticed that the CPU utilization 
> was higher than a non-upgraded peer in the same pod. Looking at perf top for 
> that process we saw that 10-17% of CPU was spent in 
> IOBufferReader::read_avail. In the older system, that function didn't show up 
> in the top couple screens of perf top.
> I tracked it down to an "ink_assert(read_avail() > 0)" call in 
> IOBufferReader::consume. We didn't have the debug asserts compiled in, but 
> the assert's argument expression was still being evaluated. Commenting out the 
> ink_assert call brought the CPU utilization back to normal.  We later found 
> a similar growth in IOBufferReader::read() due to the use of 
> IOBufferReader::read_avail.
> It looks like the issue is compiling with openssl 1.0.2 instead of openssl 
> 1.0.1.  We have been running against the openssl 1.0.2 library but were still 
> compiling against openssl 1.0.1.  We recently upgraded our builds to compile 
> against the new openssl as well. 
> I built a version of the older source against openssl 1.0.2 and it had very 
> similar performance to what I was seeing with the newer source.
> Still don't know why compiling against the newer openssl would make such a 
> difference.  But it is easy enough to rework these use cases to take 
> read_avail (which walks the block chain) out of the fast path.  This should 
> be a good optimization in any case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-5095) IOBufferReader::read_avail adds considerably to CPU utilization

2016-12-13 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-5095:
--

 Summary: IOBufferReader::read_avail adds considerably to CPU 
utilization
 Key: TS-5095
 URL: https://issues.apache.org/jira/browse/TS-5095
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Reporter: Susan Hinrichs


When installing a new build of our ATS, we noticed that the CPU utilization was 
higher than a non-upgraded peer in the same pod. Looking at perf top for that 
process we saw that 10-17% of CPU was spent in IOBufferReader::read_avail. In 
the older system, that function didn't show up in the top couple screens of 
perf top.

I tracked it down to an "ink_assert(read_avail() > 0)" call in 
IOBufferReader::consume. We didn't have the debug asserts compiled in, but the 
assert's argument expression was still being evaluated. Commenting out the 
ink_assert call brought the CPU utilization back to normal.  We later found a 
similar growth in IOBufferReader::read() due to the use of IOBufferReader::read_avail.

It looks like the issue is compiling with openssl 1.0.2 instead of openssl 
1.0.1.  We have been running against the openssl 1.0.2 library but were still 
compiling against openssl 1.0.1.  We recently upgraded our builds to compile 
against the new openssl as well. 

I built a version of the older source against openssl 1.0.2 and it had very 
similar performance to what I was seeing with the newer source.

Still don't know why compiling against the newer openssl would make such a 
difference.  But it is easy enough to rework these use cases to take read_avail 
(which walks the block chain) out of the fast path.  This should be a good 
optimization in any case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-5093) Augment and fix crash in slow log

2016-12-13 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-5093:
--

Assignee: Susan Hinrichs

> Augment and fix crash in slow log
> -
>
> Key: TS-5093
> URL: https://issues.apache.org/jira/browse/TS-5093
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> While tracking down a performance problem in HTTP/2, I added some values to the 
> slow log entry. I also rearranged the call to avoid a use-after-free crash 
> when calling update_stats in HttpSM::kill_this().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-5093) Augment and fix crash in slow log

2016-12-13 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-5093:
--

 Summary: Augment and fix crash in slow log
 Key: TS-5093
 URL: https://issues.apache.org/jira/browse/TS-5093
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Reporter: Susan Hinrichs


While tracking down a performance problem in HTTP/2, I added some values to the 
slow log entry. I also rearranged the call to avoid a use-after-free crash when 
calling update_stats in HttpSM::kill_this().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-5092) ATS handling of too many concurrent streams too aggressive and maybe out of spec

2016-12-13 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-5092:
--

 Summary: ATS handling of too many concurrent streams too aggressive 
and maybe out of spec
 Key: TS-5092
 URL: https://issues.apache.org/jira/browse/TS-5092
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP/2
Reporter: Susan Hinrichs


This issue was identified while debugging new errors seen by an internal team 
after they enabled HTTP/2 in their client. On the backend, they saw an increase 
in the cases where ATS sends the origin the POST header but no POST body and 
then closes the connection.
With the addition of Error() messages we were able to see a case where the 
client is trying to open the 101st stream on a session. This is beyond the 100 
max concurrent stream limit, so ATS shuts down the session, which kills the 
previous 100 streams.

A closer reading of section 5.1.2 of the spec 
(https://tools.ietf.org/html/rfc7540#section-5.1.2) indicates that this should 
be a stream error and not a connection error. Bryan Call, Masaori, and Maskit 
confirmed this interpretation. Maskit also noted that the other error case in 
the current createStream method must be treated as a connection error.
Presumably the client library is expecting the refused stream case so it can 
try again later.
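For illustration, a minimal sketch of the distinction RFC 7540 section 5.1.2 draws: the over-limit stream is refused with RST_STREAM/REFUSED_STREAM so the client can retry, while genuinely session-wide violations still get a GOAWAY. send_rst_stream()/send_goaway() are hypothetical placeholders, not ATS APIs.

{code}
// Sketch only: refuse the over-limit stream instead of tearing down the whole
// connection. send_rst_stream()/send_goaway() are hypothetical placeholders.
#include <cstdint>
#include <cstdio>

enum class Http2Error : uint32_t { PROTOCOL_ERROR = 0x1, REFUSED_STREAM = 0x7 };

static void send_rst_stream(uint32_t stream_id, Http2Error e)
{
  printf("RST_STREAM stream=%u error=0x%x\n", stream_id, static_cast<uint32_t>(e));
}

static void send_goaway(uint32_t last_stream_id, Http2Error e)
{
  printf("GOAWAY last_stream=%u error=0x%x\n", last_stream_id, static_cast<uint32_t>(e));
}

static void on_new_stream(uint32_t stream_id, uint32_t active_streams, uint32_t max_concurrent)
{
  if (active_streams >= max_concurrent) {
    // RFC 7540 5.1.2: exceeding MAX_CONCURRENT_STREAMS is a *stream* error, so
    // refuse just this stream and let the client retry later; a GOAWAY here
    // would also kill the streams already in flight.
    send_rst_stream(stream_id, Http2Error::REFUSED_STREAM);
    return;
  }
  if (stream_id % 2 == 0) {
    // Example of a genuine *connection* error: a client-initiated stream with
    // an even id violates the protocol for the whole session (RFC 7540 5.1.1).
    send_goaway(stream_id, Http2Error::PROTOCOL_ERROR);
    return;
  }
  printf("stream %u accepted\n", stream_id);
}

int main()
{
  on_new_stream(201, 100, 100);   // the 101st concurrent stream: refused, not fatal
  on_new_stream(203, 99, 100);    // under the limit: accepted
  on_new_stream(204, 50, 100);    // protocol violation: connection error
  return 0;
}
{code}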



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-5092) ATS handling of too many concurrent streams too aggressive and maybe out of spec

2016-12-13 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-5092:
--

Assignee: Susan Hinrichs

> ATS handling of too many concurrent streams too aggressive and maybe out of 
> spec
> ---
>
> Key: TS-5092
> URL: https://issues.apache.org/jira/browse/TS-5092
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP/2
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> This issue was identified while debugging new errors seen by an internal team 
> after they enabled HTTP/2 in their client. On the backend, they saw an 
> increase in the cases where ATS sends the origin the POST header but no POST 
> body and then closes the connection.
> With the addition of Error() messages we were able to see a case where the 
> client is trying to open the 101st stream on a session. This is beyond the 
> 100 max concurrent stream limit, so ATS shuts down the session, which kills 
> the previous 100 streams.
> A closer reading of section 5.1.2 of the spec 
> (https://tools.ietf.org/html/rfc7540#section-5.1.2) indicates that this 
> should be a stream error and not a connection error. Bryan Call, Masaori, and 
> Maskit confirmed this interpretation. Maskit also noted that the other error 
> case in the current createStream method must be treated as a connection error.
> Presumably the client library is expecting the refused stream case so it can 
> try again later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-5091) Crash if server session from global pool is not alive

2016-12-13 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-5091:
--

 Summary: Crash if server session from global pool is not alive
 Key: TS-5091
 URL: https://issues.apache.org/jira/browse/TS-5091
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Reporter: Susan Hinrichs


We have seen the following stack in production.  The problem is that the 
migration fails, but we set the netvc to null before calling do_io_close.  This causes 
the get_server_ip() call in HttpServerSession::do_io_close to dereference a 
NULL.

{code}
(gdb) bt
#0  0x2aae5acc2625 in raise () from /lib64/libc.so.6
#1  0x2aae5acc3d8d in abort () from /lib64/libc.so.6
#2  0x2aae58062149 in ink_die_die_die () at 
../../../../trafficserver/lib/ts/ink_error.cc:43
#3  0x2aae58062202 in ink_fatal_va(const char *, typedef __va_list_tag 
__va_list_tag *) (fmt=0x2aae58077b18 "%s:%d: failed assert `%s`", 
ap=0x2aae62dccd68) at ../../../../trafficserver/lib/ts/ink_error.cc:65
#4  0x2aae580622a1 in ink_fatal (message_format=0x2aae58077b18 "%s:%d: 
failed assert `%s`") at ../../../../trafficserver/lib/ts/ink_error.cc:73
#5  0x2aae5805fa06 in _ink_assert (expression=0x7c2a51 "server_vc != NULL", 
file=0x7c2a10 
"../../../../trafficserver/proxy/http/../http/HttpServerSession.h", line=123) 
at ../../../../trafficserver/lib/ts/ink_assert.cc:37
#6  0x006038b9 in HttpServerSession::get_server_ip 
(this=0x2aac08b0efd0) at 
../../../../trafficserver/proxy/http/../http/HttpServerSession.h:123
#7  0x0060801e in HttpServerSession::do_io_close (this=0x2aac08b0efd0, 
alerrno=-1) at ../../../../trafficserver/proxy/http/HttpServerSession.cc:130
#8  0x00609b12 in HttpSessionManager::acquire_session (this=0xae1ba0, 
ip=0x2aac2f2817c8, hostname=0x2aac305e8a19 "sc1.ycpi.vip.bf1.yahoo.com", 
ua_session=0x2aac104fca70, sm=0x2aac2f2810b0) at 
../../../../trafficserver/proxy/http/HttpSessionManager.cc:311
#9  0x005f7554 in HttpSM::do_http_server_open (this=0x2aac2f2810b0, 
raw=false) at ../../../../trafficserver/proxy/http/HttpSM.cc:4872
#10 0x00600357 in HttpSM::set_next_state (this=0x2aac2f2810b0) at 
../../../../trafficserver/proxy/http/HttpSM.cc:7385
#11 0x005ff4ee in HttpSM::call_transact_and_set_next_state 
(this=0x2aac2f2810b0, f=0) at 
../../../../trafficserver/proxy/http/HttpSM.cc:7198
#12 0x005ef0d5 in HttpSM::state_cache_open_write (this=0x2aac2f2810b0, 
event=1108, data=0x2aaad0203540) at 
../../../../trafficserver/proxy/http/HttpSM.cc:2581
#13 0x005ef79c in HttpSM::main_handler (this=0x2aac2f2810b0, 
event=1108, data=0x2aaad0203540) at 
../../../../trafficserver/proxy/http/HttpSM.cc:2693
#14 0x0051381a in Continuation::handleEvent (this=0x2aac2f2810b0, 
event=1108, data=0x2aaad0203540) at 
../../../trafficserver/iocore/eventsystem/I_Continuation.h:150
#15 0x005d9e38 in HttpCacheSM::state_cache_open_write 
(this=0x2aac2f282ba0, event=1108, data=0x2aaad0203540) at 
../../../../trafficserver/proxy/http/HttpCacheSM.cc:167
#16 0x0051381a in Continuation::handleEvent (this=0x2aac2f282ba0, 
event=1108, data=0x2aaad0203540) at 
../../../trafficserver/iocore/eventsystem/I_Continuation.h:150
#17 0x0073c9be in CacheVC::callcont (this=0x2aaad0203540, event=1108) 
at ../../../../trafficserver/iocore/cache/P_CacheInternal.h:673
#18 0x00746aca in Cache::open_write (this=0x2aaadc008c90, 
cont=0x2aac2f282ba0, key=0x2aae62dcd650, info=0x0, apin_in_cache=0, 
type=CACHE_FRAG_TYPE_HTTP, hostname=0x2aac0cad3405 
"68.media.tumblr.com9548920493ae47f3954b2a04b9d8763a/tumblr_inline_n1gft6Y4Vj1qjk6k8.jpg",
 host_len=19) at ../../../../trafficserver/iocore/cache/CacheWrite.cc:1789
#19 0x00723667 in Cache::open_write (this=0x2aaadc008c90, 
cont=0x2aac2f282ba0, url=0x2aac2f281158, request=0x2aac2f281828, old_info=0x0, 
pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at 
../../../../trafficserver/iocore/cache/P_CacheInternal.h:1104
#20 0x007211be in CacheProcessor::open_write (this=0x1059780, 
cont=0x2aac2f282ba0, expected_size=0, url=0x2aac2f281158, 
cluster_cache_local=false, request=0x2aac2f281828, old_info=0x0, 
pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at 
../../../../trafficserver/iocore/cache/Cache.cc:3701
#21 0x005da25e in HttpCacheSM::open_write (this=0x2aac2f282ba0, 
url=0x2aac2f281158, request=0x2aac2f281828, old_info=0x0, pin_in_cache=0, 
retry=true, allow_multiple=false) at 
../../../../trafficserver/proxy/http/HttpCacheSM.cc:298
#22 0x005f672b in HttpSM::do_cache_prepare_action (this=0x2aac2f2810b0, 
c_sm=0x2aac2f282ba0, object_read_info=0x0, retry=true, allow_multiple=false) at 
../../../../trafficserver/proxy/http/HttpSM.cc:4686
#23 0x0060648d in HttpSM::do_cache_prepare_write (this=0x2aac2f2810b0) 
at ../../../../trafficserver/proxy/http/HttpSM.cc:4611
#24 0x00600702 in HttpSM::set_next_state (this=0x2aac2f2810b0) 

[jira] [Assigned] (TS-5091) Crash if server session from global pool is not alive

2016-12-13 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-5091:
--

Assignee: Susan Hinrichs

> Crash if server session from global pool is not alive
> -
>
> Key: TS-5091
> URL: https://issues.apache.org/jira/browse/TS-5091
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> We have seen the following stack in production.  The problem is that the 
> migration fails, but we set the netvc to null before calling do_io_close.  This causes 
> the get_server_ip() call in HttpServerSession::do_io_close to dereference a 
> NULL.
> {code}
> (gdb) bt
> #0  0x2aae5acc2625 in raise () from /lib64/libc.so.6
> #1  0x2aae5acc3d8d in abort () from /lib64/libc.so.6
> #2  0x2aae58062149 in ink_die_die_die () at 
> ../../../../trafficserver/lib/ts/ink_error.cc:43
> #3  0x2aae58062202 in ink_fatal_va(const char *, typedef __va_list_tag 
> __va_list_tag *) (fmt=0x2aae58077b18 "%s:%d: failed assert `%s`", 
> ap=0x2aae62dccd68) at ../../../../trafficserver/lib/ts/ink_error.cc:65
> #4  0x2aae580622a1 in ink_fatal (message_format=0x2aae58077b18 "%s:%d: 
> failed assert `%s`") at ../../../../trafficserver/lib/ts/ink_error.cc:73
> #5  0x2aae5805fa06 in _ink_assert (expression=0x7c2a51 "server_vc != 
> NULL", file=0x7c2a10 
> "../../../../trafficserver/proxy/http/../http/HttpServerSession.h", line=123) 
> at ../../../../trafficserver/lib/ts/ink_assert.cc:37
> #6  0x006038b9 in HttpServerSession::get_server_ip 
> (this=0x2aac08b0efd0) at 
> ../../../../trafficserver/proxy/http/../http/HttpServerSession.h:123
> #7  0x0060801e in HttpServerSession::do_io_close 
> (this=0x2aac08b0efd0, alerrno=-1) at 
> ../../../../trafficserver/proxy/http/HttpServerSession.cc:130
> #8  0x00609b12 in HttpSessionManager::acquire_session (this=0xae1ba0, 
> ip=0x2aac2f2817c8, hostname=0x2aac305e8a19 "sc1.ycpi.vip.bf1.yahoo.com", 
> ua_session=0x2aac104fca70, sm=0x2aac2f2810b0) at 
> ../../../../trafficserver/proxy/http/HttpSessionManager.cc:311
> #9  0x005f7554 in HttpSM::do_http_server_open (this=0x2aac2f2810b0, 
> raw=false) at ../../../../trafficserver/proxy/http/HttpSM.cc:4872
> #10 0x00600357 in HttpSM::set_next_state (this=0x2aac2f2810b0) at 
> ../../../../trafficserver/proxy/http/HttpSM.cc:7385
> #11 0x005ff4ee in HttpSM::call_transact_and_set_next_state 
> (this=0x2aac2f2810b0, f=0) at 
> ../../../../trafficserver/proxy/http/HttpSM.cc:7198
> #12 0x005ef0d5 in HttpSM::state_cache_open_write 
> (this=0x2aac2f2810b0, event=1108, data=0x2aaad0203540) at 
> ../../../../trafficserver/proxy/http/HttpSM.cc:2581
> #13 0x005ef79c in HttpSM::main_handler (this=0x2aac2f2810b0, 
> event=1108, data=0x2aaad0203540) at 
> ../../../../trafficserver/proxy/http/HttpSM.cc:2693
> #14 0x0051381a in Continuation::handleEvent (this=0x2aac2f2810b0, 
> event=1108, data=0x2aaad0203540) at 
> ../../../trafficserver/iocore/eventsystem/I_Continuation.h:150
> #15 0x005d9e38 in HttpCacheSM::state_cache_open_write 
> (this=0x2aac2f282ba0, event=1108, data=0x2aaad0203540) at 
> ../../../../trafficserver/proxy/http/HttpCacheSM.cc:167
> #16 0x0051381a in Continuation::handleEvent (this=0x2aac2f282ba0, 
> event=1108, data=0x2aaad0203540) at 
> ../../../trafficserver/iocore/eventsystem/I_Continuation.h:150
> #17 0x0073c9be in CacheVC::callcont (this=0x2aaad0203540, event=1108) 
> at ../../../../trafficserver/iocore/cache/P_CacheInternal.h:673
> #18 0x00746aca in Cache::open_write (this=0x2aaadc008c90, 
> cont=0x2aac2f282ba0, key=0x2aae62dcd650, info=0x0, apin_in_cache=0, 
> type=CACHE_FRAG_TYPE_HTTP, hostname=0x2aac0cad3405 
> "68.media.tumblr.com9548920493ae47f3954b2a04b9d8763a/tumblr_inline_n1gft6Y4Vj1qjk6k8.jpg",
>  host_len=19) at ../../../../trafficserver/iocore/cache/CacheWrite.cc:1789
> #19 0x00723667 in Cache::open_write (this=0x2aaadc008c90, 
> cont=0x2aac2f282ba0, url=0x2aac2f281158, request=0x2aac2f281828, 
> old_info=0x0, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at 
> ../../../../trafficserver/iocore/cache/P_CacheInternal.h:1104
> #20 0x007211be in CacheProcessor::open_write (this=0x1059780, 
> cont=0x2aac2f282ba0, expected_size=0, url=0x2aac2f281158, 
> cluster_cache_local=false, request=0x2aac2f281828, old_info=0x0, 
> pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at 
> ../../../../trafficserver/iocore/cache/Cache.cc:3701
> #21 0x005da25e in HttpCacheSM::open_write (this=0x2aac2f282ba0, 
> url=0x2aac2f281158, request=0x2aac2f281828, old_info=0x0, pin_in_cache=0, 
> retry=true, allow_multiple=false) at 
> ../../../../trafficserver/proxy/http/HttpCacheSM.cc:298
> #22 0x005f672b in 

[jira] [Created] (TS-5068) Trafficserver build should verify that openssl it uses is compiled with thread support

2016-11-28 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-5068:
--

 Summary: Trafficserver build should verify that openssl it uses is 
compiled with thread support
 Key: TS-5068
 URL: https://issues.apache.org/jira/browse/TS-5068
 Project: Traffic Server
  Issue Type: Bug
  Components: Build
Reporter: Susan Hinrichs


One of our teams built their own TrafficServer 7.0 and built their own openssl. 
 The openssl was compiled with thread support disabled.  Running traffic server 
against this openssl caused frequent segfaults.

We should expand our automake logic to verify that the openssl we are compiling 
against has thread support compiled in.  It would also be nice to make the same 
check at runtime.

openssl version -a will show -DOPENSSL_THREADS in the list of compile flags if 
thread support was enabled at compile time.
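For illustration, a small sketch of both checks, assuming OpenSSL 1.0.x: a compile-time guard on OPENSSL_THREADS from the headers, and a runtime look at the compile flags reported by the linked library (the same string `openssl version -a` prints); the exact contents of that string vary by build.

{code}
// Sketch only: check for OpenSSL thread support at compile time and at runtime
// (OpenSSL 1.0.x APIs). The exact flags string varies by build.
#include <openssl/opensslconf.h>
#include <openssl/crypto.h>
#include <cstdio>
#include <cstring>

// Compile time: the headers we build against must have thread support.
#ifndef OPENSSL_THREADS
#error "OpenSSL headers were built without thread support (OPENSSL_THREADS missing)"
#endif

int main()
{
  // Runtime: ask the library we actually linked how it was compiled. This is
  // the same compile-flags string that "openssl version -a" prints, which
  // contains -DOPENSSL_THREADS when threads were enabled at compile time.
  const char *cflags = SSLeay_version(SSLEAY_CFLAGS);
  if (cflags == nullptr || strstr(cflags, "OPENSSL_THREADS") == nullptr) {
    fprintf(stderr, "linked OpenSSL appears to lack thread support: %s\n",
            cflags ? cflags : "(no flags reported)");
    return 1;
  }
  printf("OpenSSL thread support looks enabled: %s\n", cflags);
  return 0;
}
{code}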




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4179) OCSP stapling broken with RSA+ECDSA cert serving

2016-11-28 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-4179:
---
Assignee: Syeda Persia Aziz  (was: Susan Hinrichs)

> OCSP stapling broken with RSA+ECDSA cert serving
> 
>
> Key: TS-4179
> URL: https://issues.apache.org/jira/browse/TS-4179
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: SSL
>Reporter: Scott Beardsley
>Assignee: Syeda Persia Aziz
>Priority: Minor
>  Labels: yahoo
> Fix For: 7.1.0
>
>
> When I try to serve both an RSA and an ECDSA cert using a config like so:
> $ grep ocsp records.config
> CONFIG proxy.config.ssl.ocsp.enabled INT 1
> $ grep -v ^# ssl_multicert.config
> dest_ip=* ssl_cert_name=ecdsa.crt,rsa.crt ssl_key_name=ecdsa.key,rsa.key
> I get the following error displayed in diags.log:
> WARNING: fail to configure SSL_CTX for OCSP Stapling info for certificate at 
> ecdsa.crt
> Also when I connect via either of the following I get no stapled cert:
> $ openssl s_client -connect localhost:443 -cipher 'ECDHE-ECDSA-AES128-SHA' 
> -status
> CONNECTED(0003)
> OCSP response: no response sent
> ...
> $ openssl s_client -connect localhost:443 -cipher 'ECDHE-RSA-AES128-SHA' 
> -status
> CONNECTED(0003)
> OCSP response: no response sent
> ...
> $
> Here are the debug log messages:
> diags.log:[Feb  5 22:44:03.230] Server {0x2afd2845bd80} WARNING: fail to 
> configure SSL_CTX for OCSP Stapling info for certificate at ecdsa.crt
> traffic.out:[Feb  5 22:44:03.230] Server {0x2afd2845bd80} DEBUG: (ssl) ssl 
> ocsp stapling is enabled
> traffic.out:[Feb  5 22:44:41.250] Server {0x2afd2ab89700} DEBUG: (ssl) 
> ssl_callback_ocsp_stapling: fail to get certificate information



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-5022) Multiple Client Certificate to Origin

2016-11-03 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-5022:
---
Assignee: Syeda Persia Aziz  (was: Leif Hedstrom)

> Multiple Client Certificate to Origin
> -
>
> Key: TS-5022
> URL: https://issues.apache.org/jira/browse/TS-5022
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Security, SSL, TLS
>Reporter: Scott Beardsley
>Assignee: Syeda Persia Aziz
>  Labels: yahoo
> Fix For: 7.1.0
>
>
> Yahoo has a use case where the origin is doing mutual TLS authentication, 
> which requires ATS to send a client certificate. This works fine (for now) 
> because ATS supports configuring *one* client cert, but the feature should 
> really allow multiple client certificates to be configured, with the 
> certificate chosen based on the origin being contacted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4974) Bad debug assert in HttpSM::handle_server_setup_error

2016-10-20 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4974.

Resolution: Fixed

> Bad debug assert in HttpSM::handle_server_setup_error
> -
>
> Key: TS-4974
> URL: https://issues.apache.org/jira/browse/TS-4974
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 7.0.0
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 7.1.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When running with debug_enable, the non-release assert in 
> HttpSM::handle_server_setup_error sometimes goes off.
> ink_assert(server_entry->read_vio == data);
> In the crash case, the data corresponds to server_entry->write_vio. Reviewing 
> the function, I don't see why it is bad that this function is called with the 
> write vio.  The actual IO operations are performed against 
> server_entry->read_vio and server_entry->write_vio instead of the parameter 
> vio directly.
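For illustration, a minimal sketch of the relaxation suggested above: let the debug assert accept either VIO, since the body acts on server_entry's own read/write VIOs rather than on the parameter. The types are stand-ins, not HttpSM.

{code}
// Sketch only: relax the debug assert to accept either VIO, since the body
// operates on server_entry's own VIOs rather than on the parameter directly.
// The types here are stand-ins, not the real HttpSM/VIO classes.
#include <cassert>

struct VIO {};

struct Entry {
  VIO read_vio;
  VIO write_vio;
};

static void handle_server_setup_error(Entry *server_entry, VIO *data)
{
  // Old, over-strict form:  assert(data == &server_entry->read_vio);
  // Relaxed form: being invoked with the write VIO is also legitimate.
  assert(data == &server_entry->read_vio || data == &server_entry->write_vio);

  // ... error handling would act on server_entry->read_vio / write_vio here ...
  (void)server_entry;
  (void)data;
}

int main()
{
  Entry e;
  handle_server_setup_error(&e, &e.write_vio);   // no longer trips the assert
  return 0;
}
{code}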



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4972) Allow collapsed_forwarding plugin to be configured global or per remap

2016-10-20 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4972.

Resolution: Fixed

> Allow collapsed_forwarding plugin to be configured global or per remap
> --
>
> Key: TS-4972
> URL: https://issues.apache.org/jira/browse/TS-4972
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Plugins
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 7.1.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently the collapsed_forwarding plugin can only be configured on remap rules.  
> It would be convenient to just configure it globally for some environments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4813) HttpTunnel.cc:1215: failed assertion `p->alive == true || event == HTTP_TUNNEL_EVENT_PRECOMPLETE ...

2016-10-18 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-4813:
---
Backport to Version: 6.2.1

> HttpTunnel.cc:1215: failed assertion `p->alive == true || event == 
> HTTP_TUNNEL_EVENT_PRECOMPLETE ...
> 
>
> Key: TS-4813
> URL: https://issues.apache.org/jira/browse/TS-4813
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Network
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Blocker
>  Labels: crash
> Fix For: 7.0.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Seeing this with current (as of right now) master, on docs.trafficserver:
> {code}
> FATAL: HttpTunnel.cc:1215: failed assertion `p->alive == true || event == 
> HTTP_TUNNEL_EVENT_PRECOMPLETE || event == VC_EVENT_EOS || 
> sm->enable_redirection || (p->self_consumer && p->self_consumer->alive == 
> true)`
> traffic_server: using root directory '/opt/ats'
> traffic_server: Aborted (Signal sent by tkill() 13188 99)
> traffic_server - STACK TRACE:
> /opt/ats/lib/libtsutil.so.7(signal_crash_handler(int, siginfo_t*, 
> void*)+0x18)[0x2b6d1031729e]
> /opt/ats/bin/traffic_server(crash_logger_invoke(int, siginfo_t*, 
> void*)+0x155)[0x534104]
> /lib64/libpthread.so.0(+0xf100)[0x2b6d1240f100]
> /lib64/libc.so.6(gsignal+0x37)[0x2b6d12d6e5f7]
> /lib64/libc.so.6(abort+0x148)[0x2b6d12d6fce8]
> /opt/ats/lib/libtsutil.so.7(ink_warning(char const*, ...)+0x0)[0x2b6d102f6f4d]
> /opt/ats/lib/libtsutil.so.7(+0x733a7)[0x2b6d102f13a7]
> /opt/ats/bin/traffic_server(HttpTunnel::producer_handler(int, 
> HttpTunnelProducer*)+0xd14)[0x768a12]
> /opt/ats/bin/traffic_server(HttpTunnel::main_handler(int, 
> void*)+0x13b)[0x76b6e1]
> /opt/ats/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x149)[0x53a621]
> /opt/ats/bin/traffic_server(HttpSM::state_watch_for_client_abort(int, 
> void*)+0x9fe)[0x68c5e6]
> /opt/ats/bin/traffic_server(HttpSM::main_handler(int, void*)+0x58e)[0x69b7ec]
> /opt/ats/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x149)[0x53a621]
> /opt/ats/bin/traffic_server(Http2Stream::main_event_handler(int, 
> void*)+0x59f)[0x79c1df]
> /opt/ats/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x149)[0x53a621]
> /opt/ats/bin/traffic_server(EThread::process_event(Event*, 
> int)+0x2cf)[0xa809fb]
> /opt/ats/bin/traffic_server(EThread::execute()+0x671)[0xa8140f]
> /opt/ats/bin/traffic_server[0xa7f407]
> /lib64/libpthread.so.0(+0x7dc5)[0x2b6d12407dc5]
> /lib64/libc.so.6(clone+0x6d)[0x2b6d12e2fced]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

2016-10-14 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576330#comment-15576330
 ] 

Susan Hinrichs commented on TS-4970:


Would it be easier to just backport the fix from TS-4590?  I'm a bit concerned 
about the proposed fix in the PR.  I don't think it is correctly using the 
m_deleted/m_deletable parameters.

> Crash in INKVConnInternal when handle_event is called after destroy()
> -
>
> Key: TS-4970
> URL: https://issues.apache.org/jira/browse/TS-4970
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Thomas Jackson
>Assignee: Thomas Jackson
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6.x) 
> where the downstream origin is down with a backtrace that looks something 
> like:
> {code}
> (gdb) bt
> #0  0x in ?? ()
> #1  0x004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
> edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
> calling_code=1) at UnixEThread.cc:144
> #5  0x006f993b in EThread::execute (this=0x2afe3dd07000)
> at UnixEThread.cc:195
> #6  0x006f832a in spawn_thread_internal (a=0x2afe3badf400)
> at Thread.cc:88
> #7  0x003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x0038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> Which looks a bit odd, as frame 0 is missing. From digging into it a bit 
> more (with the help of [~amc]) we found that the VC we were calling was an 
> INKContInternal (meaning an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = { = { = { = 
> { = { = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>   handler = (int (Continuation::*)(Continuation *, int, 
> void *)) 0x4cfd90 , mutex = {
> m_ptr = 0x0}, link = { = {next = 0x0}, 
> prev = 0x0}}, lerrno = 20600}, }, 
> mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>  <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
> m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
> nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
> nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
> vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> From looking at the debug logs that led up to the crash, I'm seeing that 
> some events (namely timeout events) are being called after the VConn has been 
> destroy()'d. This led me to find that INKVConnInternal::handle_event is 
> actually checking if that is the case-- and then re-destroying everything, 
> which makes no sense.
> So although the ideal would be to not call handle_event on a closed VConn, 
> crashing is definitely not acceptable. My solution is to continue to only 
> call the event handler if the VConn hasn't been deleted-- but instead of 
> attempting to re-destroy the connection, we'll leave it be (unless we are in 
> debug mode-- where I'll throw in an assert).
> I did some looking at this on ATS7 and it looks like this is all fixed by the 
> cleanup of the whole free-ing stuff for VConns 
> (https://github.com/apache/trafficserver/pull/752/files).
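For illustration, a schematic sketch of the guard described in the last paragraph: once the internal vconn has been marked deleted, a late event is dropped rather than re-running the destroy path (with an assert reserved for debug builds). The class is a stand-in, not the real INKVConnInternal.

{code}
// Sketch only: ignore events that arrive after destroy() instead of re-running
// the destroy path. A stand-in class, not the real INKVConnInternal.
#include <cstdio>

class InternalVConn {
public:
  void destroy() {
    m_deleted = true;     // mark only; the actual free is deferred until
                          // outstanding events have drained
  }

  int handle_event(int event, void *edata) {
    (void)edata;
    if (m_deleted) {
      // A late event (e.g. a timeout) raced with destroy(). Dropping it is
      // safe; re-destroying here is what crashed. The proposed fix would also
      // assert at this point in debug builds.
      fprintf(stderr, "ignoring event %d on deleted vconn\n", event);
      return 0;
    }
    printf("dispatching event %d\n", event);
    return 0;
  }

private:
  bool m_deleted = false;
};

int main()
{
  InternalVConn vc;
  vc.handle_event(1, nullptr);   // normal dispatch
  vc.destroy();
  vc.handle_event(2, nullptr);   // late timeout: ignored rather than crashing
  return 0;
}
{code}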



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4974) Bad debug assert in HttpSM::handle_server_setup_error

2016-10-14 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-4974:
---
Affects Version/s: 7.0.0

> Bad debug assert in HttpSM::handle_server_setup_error
> -
>
> Key: TS-4974
> URL: https://issues.apache.org/jira/browse/TS-4974
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 7.0.0
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> When running with debug_enable, the non-release assert in 
> HttpSM::handle_server_setup_error sometimes goes off.
> ink_assert(server_entry->read_vio == data);
> In the crash case, the data corresponds to server_entry->write_vio. Reviewing 
> the function, I don't see why it is bad that this function is called with the 
> write vio.  The actual IO operations are performed against 
> server_entry->read_vio and server_entry->write_vio instead of the parameter 
> vio directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-4974) Bad debug assert in HttpSM::handle_server_setup_error

2016-10-14 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-4974:
--

Assignee: Susan Hinrichs

> Bad debug assert in HttpSM::handle_server_setup_error
> -
>
> Key: TS-4974
> URL: https://issues.apache.org/jira/browse/TS-4974
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> When running with debug_enable, the non-release assert in 
> HttpSM::handle_server_setup_error sometimes goes off.
> ink_assert(server_entry->read_vio == data);
> In the crash case, the data corresponds to server_entry->write_vio. Reviewing 
> the function, I don't see why it is bad that this function is called with the 
> write vio.  The actual IO operations are performed against 
> server_entry->read_vio and server_entry->write_vio instead of the parameter 
> vio directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4974) Bad debug assert in HttpSM::handle_server_setup_error

2016-10-14 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-4974:
--

 Summary: Bad debug assert in HttpSM::handle_server_setup_error
 Key: TS-4974
 URL: https://issues.apache.org/jira/browse/TS-4974
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Reporter: Susan Hinrichs


When running with debug_enable, the non-release assert in 
HttpSM::handle_server_setup_error sometimes goes off.

ink_assert(server_entry->read_vio == data);

In the crash case, the data corresponds to server_entry->write_vio. Reviewing 
the function, I don't see why it is bad that this function is called with the 
write vio.  The actual IO operations are performed against 
server_entry->read_vio and server_entry->write_vio instead of the parameter vio 
directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4972) Allow collapsed_forwarding plugin to be configured global or per remap

2016-10-14 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-4972:
--

 Summary: Allow collapsed_forwarding plugin to be configured global 
or per remap
 Key: TS-4972
 URL: https://issues.apache.org/jira/browse/TS-4972
 Project: Traffic Server
  Issue Type: Improvement
  Components: Plugins
Reporter: Susan Hinrichs


Currently the collapsed_forwarding plugin can only be configured on remap rules.  
It would be convenient to just configure it globally for some environments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-4972) Allow collapsed_forwarding plugin to be configured global or per remap

2016-10-14 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-4972:
--

Assignee: Susan Hinrichs

> Allow collapsed_forwarding plugin to be configured global or per remap
> --
>
> Key: TS-4972
> URL: https://issues.apache.org/jira/browse/TS-4972
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Plugins
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> Currently the collapsed_forwarding plugin can only be configured on remap rules.  
> It would be convenient to just configure it globally for some environments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4876) FATAL: Http2DependencyTree.h:319: failed assertion `node->parent->queue->top() == node->entry`

2016-10-11 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566801#comment-15566801
 ] 

Susan Hinrichs commented on TS-4876:


This may be related to TS-4915.  We found issues in the PriorityQueue, which is 
also used in this code.

> FATAL: Http2DependencyTree.h:319: failed assertion 
> `node->parent->queue->top() == node->entry`
> --
>
> Key: TS-4876
> URL: https://issues.apache.org/jira/browse/TS-4876
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP/2
>Reporter: Leif Hedstrom
>Priority: Blocker
>  Labels: crash
> Fix For: 7.1.0
>
>
> All I had to do was navigate docs.trafficserver using Safari on macOS.
> {code}
> #0  0x2b41f07405f7 in __GI_raise (sig=sig@entry=6) at 
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x2b41f0741ce8 in __GI_abort () at abort.c:90
> #2  0x2b41edcc901d in ink_abort (message_format=0x2b41edcf4920 "%s:%d: 
> failed assertion `%s`") at ink_error.cc:79
> #3  0x2b41edcc2bf7 in _ink_assert (expression=0xb82fc0 
> "node->parent->queue->top() == node->entry", file=0xb82f80 
> "Http2DependencyTree.h", line=319) at ink_assert.cc:37
> #4  0x007b83fa in Http2DependencyTree::deactivate 
> (this=0x60299eb0, node=0x607000383cf0, sent=4192) at 
> Http2DependencyTree.h:319
> #5  0x007b1154 in 
> Http2ConnectionState::send_data_frames_depends_on_priority 
> (this=0x6190005278c8) at Http2ConnectionState.cc:1080
> #6  0x007ae45c in Http2ConnectionState::main_event_handler 
> (this=0x6190005278c8, event=2254, edata=0x609313a0) at 
> Http2ConnectionState.cc:808
> #7  0x0053b59f in Continuation::handleEvent (this=0x6190005278c8, 
> event=2254, data=0x609313a0) at ../iocore/eventsystem/I_Continuation.h:153
> #8  0x00ab74d1 in EThread::process_event (this=0x2b41f4d2d800, 
> e=0x609313a0, calling_code=2254) at UnixEThread.cc:146
> #9  0x00ab7b33 in EThread::execute (this=0x2b41f4d2d800) at 
> UnixEThread.cc:200
> #10 0x00ab5d7b in spawn_thread_internal (a=0x60417290) at 
> Thread.cc:84
> #11 0x2b41efdd9dc5 in start_thread (arg=0x2b41f5ee0700) at 
> pthread_create.c:308
> #12 0x2b41f0801ced in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
> {code}
> and
> {code}
> (gdb) print this
> $1 = (Http2DependencyTree * const) 0x60299eb0
> (gdb) print *this
> $2 = {_root = 0x607000383f20, _max_depth = 100, _node_count = 5}
> (gdb) print node
> $3 = (Http2DependencyTree::Node *) 0x607000383cf0
> (gdb) print *node
> $4 = {link = {::Node>> = {next = 
> 0x0}, prev = 0x0}, active = false, queued = true, id = 9, weight = 109, point 
> = 9, parent = 0x607000383a50, children = {head = 0x0}, entry = 
> 0x60299d10, queue = 0x606fbba0, t = 0x61a0ea80}
> (gdb) print node->parent
> $5 = (Http2DependencyTree::Node *) 0x607000383a50
> (gdb) print *node->parent
> $6 = {link = {::Node>> = {next = 
> 0x0}, prev = 0x0}, active = true, queued = true, id = 17, weight = 182, point 
> = 17, parent = 0x607000383ac0, children = {head = 0x607000383cf0}, entry = 
> 0x60299c30, queue = 0x606fba20, t = 0x61a0c680}
> (gdb) print node->entry
> $7 = (PriorityQueueEntry::Node*> *) 
> 0x60299d10
> (gdb) print *node->entry
> $8 = {index = 0, node = 0x607000383cf0}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-11 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-4915:
---
Backport to Version: 7.0.0

> Crash from hostdb in PriorityQueueLess
> --
>
> Key: TS-4915
> URL: https://issues.apache.org/jira/browse/TS-4915
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HostDB
>Reporter: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.1.0
>
> Attachments: ts-4915.diff, ts-4915.diff
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Saw this while testing fix for TS-4813 with debug enabled.
> {code}
> (gdb) bt full
> #0  0x00547bfe in RefCountCacheHashEntry::operator< (this=0x1cc0880, 
> v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
> No locals.
> #1  0x0054988d in 
> PriorityQueueLess::operator() (this=0x2b78a9a2587b, 
> a=@0x2b78f402af68, b=@0x2b78f402aa28)
> at ../lib/ts/PriorityQueue.h:41
> No locals.
> #2  0x00549785 in PriorityQueue PriorityQueueLess >::_bubble_up (this=0x1cb2990, 
> index=2) at ../lib/ts/PriorityQueue.h:191
> comp = {}
> parent = 0
> #3  0x006ecfcc in PriorityQueue PriorityQueueLess >::push (this=0x1cb2990, 
> entry=0x2b78f402af60) at ../../lib/ts/PriorityQueue.h:91
> len = 2
> #4  0x006ec206 in RefCountCachePartition::put 
> (this=0x1cb2900, key=6912554662447498853, item=0x2b78aee04f00, size=96, 
> expire_time=1475202356) at ./P_RefCountCache.h:210
> expiry_entry = 0x2b78f402af60
> __func__ = "put"
> val = 0x1cc0880
> #5  0x006eb3de in RefCountCache::put (this=0x18051e0, 
> key=6912554662447498853, item=0x2b78aee04f00, size=16, 
> expiry_time=1475202356) at ./P_RefCountCache.h:462
> No locals.
> #6  0x006e2d8e in HostDBContinuation::dnsEvent (this=0x2b7938020f00, 
> event=600, e=0x2b78ac009440) at HostDB.cc:1422
> is_rr = false
> old_rr_data = 0x0
> first_record = 0x2b78ac0094f8
> m = 0x1
> failed = false
> old_r = {m_ptr = 0x0}
> af = 2 '\002'
> s_size = 16
> rrsize = 0
> allocSize = 16
> r = 0x2b78aee04f00
> old_info = { = { = {_vptr.ForceVFPTToTop 
> = 0x7f3630}, m_refcount = 0}, iobuffer_index = 0, 
>   key = 47797242059264, app = {allotment = {application1 = 5326300, 
> application2 = 0}, http_data = {http_version = 4, 
>   pipeline_max = 59, keepalive_timeout = 17, fail_count = 81, 
> unused1 = 0, last_failure = 0}, rr = {offset = 5326300}}, data = {
> ip = {sa = {sa_family = 54488, sa_data = 
> "^\000\000\000\000\000\020\034$\274x+\000"}, sin = {sin_family = 54488, 
> sin_port = 94, 
> sin_addr = {s_addr = 0}, sin_zero = "\020\034$\274x+\000"}, 
> sin6 = {sin6_family = 54488, sin6_port = 94, sin6_flowinfo = 0, 
> sin6_addr = {__in6_u = {__u6_addr8 = 
> "\020\034$\274x+\000\000\030\036$\274\375\b\000", __u6_addr16 = {7184, 48164, 
> 11128, 
>   0, 7704, 48164, 2301, 0}, __u6_addr32 = {3156483088, 
> 11128, 3156483608, 2301}}}, sin6_scope_id = 3156478176}}, 
> hostname_offset = 6214872, srv = {srv_offset = 54488, srv_weight 
> = 94, srv_priority = 0, srv_port = 0, key = 3156483088}}, 
>   hostname_offset = 11128, ip_timestamp = 2845989456, 
> ip_timeout_interval = 11128, is_srv = 0, reverse_dns = 0, round_robin = 1, 
>   round_robin_elt = 0}
> valid_records = 0
> tip = {_family = 2, _addr = {_ip4 = 540420056, _ip6 = {__in6_u = 
> {__u6_addr8 = "\330'6 x+\000\000\360L\020\250x+\000", 
> __u6_addr16 = {10200, 8246, 11128, 0, 19696, 43024, 11128, 
> 0}, __u6_addr32 = {540420056, 11128, 2819640560, 11128}}}, 
> _byte = "\330'6 x+\000\000\360L\020\250x+\000", _u32 = 
> {540420056, 11128, 2819640560, 11128}, _u64 = {47794936489944, 
>   47797215710448}}}
> ttl_seconds = 132
> aname = 0x2b7938021000 "fbmm1.zenfs.com"
> offset = 96
> thread = 0x2b78a8101010
> __func__ = "dnsEvent"
> #7  0x005145dc in Continuation::handleEvent (this=0x2b7938020f00, 
> event=600, data=0x2b78ac009440)
> at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #8  0x006f681e in DNSEntry::postEvent (this=0x2b78f4028600) at 
> DNS.cc:1269
> __func__ = "postEvent"
> #9  0x005145dc in Continuation::handleEvent (this=0x2b78f4028600, 
> event=1, data=0x2aac954db040)
> at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #10 0x007bc9be in EThread::process_event (this=0x2b78a8101010, 
> e=0x2aac954db040, calling_code=1) at 

[jira] [Updated] (TS-4938) Crash due to null client_vc

2016-10-11 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-4938:
---
Backport to Version: 6.2.1, 7.0.0  (was: 7.0.0)

> Crash due to null client_vc
> ---
>
> Key: TS-4938
> URL: https://issues.apache.org/jira/browse/TS-4938
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 7.1.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Saw this crash while testing the fix for TS-4813.  Have a fix that checks that 
> get_netvc() returns a non-NULL pointer.  Should make a more comprehensive 
> review of the use of get_netvc() in HttpTransact.cc/HttpSM.cc.
> {code}
> #0  0x0053ed2c in NetVConnection::get_local_addr (this=0x0) at 
> ../iocore/net/P_NetVConnection.h:60
> #1  0x0057dca4 in NetVConnection::get_local_port (this=0x0) at 
> ../iocore/net/P_NetVConnection.h:82
> #2  0x00627844 in 
> HttpTransact::initialize_state_variables_from_request (s=0x2b65700d4a98, 
> obsolete_incoming_request=0x2b65700d51b8)
> at HttpTransact.cc:5709
> #3  0x00632bd1 in HttpTransact::build_error_response 
> (s=0x2b65700d4a98, status_code=HTTP_STATUS_BAD_GATEWAY, 
> reason_phrase_or_null=0x7fb86c "Server Hangup", error_body_type=0x7fb87a 
> "connect#hangup", format=0x0) at HttpTransact.cc:8141
> #4  0x006311fa in HttpTransact::handle_server_died (s=0x2b65700d4a98) 
> at HttpTransact.cc:7789
> #5  0x00620bbc in HttpTransact::handle_server_connection_not_open 
> (s=0x2b65700d4a98) at HttpTransact.cc:3991
> #6  0x0061fd43 in HttpTransact::handle_response_from_server 
> (s=0x2b65700d4a98) at HttpTransact.cc:3824
> #7  0x0061d762 in HttpTransact::HandleResponse (s=0x2b65700d4a98) at 
> HttpTransact.cc:3401
> #8  0x005fc928 in HttpSM::call_transact_and_set_next_state 
> (this=0x2b65700d4a20, 
> f=0x61cf9a ) at 
> HttpSM.cc:7116
> #9  0x005f6902 in HttpSM::handle_server_setup_error 
> (this=0x2b65700d4a20, event=104, data=0x2aabd00372d8) at HttpSM.cc:5505
> #10 0x005e88a4 in HttpSM::state_send_server_request_header 
> (this=0x2b65700d4a20, event=104, data=0x2aabd00372d8) at HttpSM.cc:2053
> #11 0x005eb3ba in HttpSM::main_handler (this=0x2b65700d4a20, 
> event=104, data=0x2aabd00372d8) at HttpSM.cc:2655
> #12 0x005145ac in Continuation::handleEvent (this=0x2b65700d4a20, 
> event=104, data=0x2aabd00372d8) at ../iocore/eventsystem/I_Continuation.h:153
> #13 0x0079906f in write_signal_and_update (event=104, 
> vc=0x2aabd0037140) at UnixNetVConnection.cc:174
> #14 0x007992a6 in write_signal_done (event=104, nh=0x2b64f71b4cf0, 
> vc=0x2aabd0037140) at UnixNetVConnection.cc:216
> #15 0x0079a475 in write_to_net_io (nh=0x2b64f71b4cf0, 
> vc=0x2aabd0037140, thread=0x2b64f71b1010) at UnixNetVConnection.cc:547
> #16 0x00799dc7 in write_to_net (nh=0x2b64f71b4cf0, vc=0x2aabd0037140, 
> thread=0x2b64f71b1010) at UnixNetVConnection.cc:414
> #17 0x0079129d in NetHandler::mainNetEvent (this=0x2b64f71b4cf0, 
> event=5, e=0x1646ac0) at UnixNet.cc:515
> #18 0x005145ac in Continuation::handleEvent (this=0x2b64f71b4cf0, 
> event=5, data=0x1646ac0) at ../iocore/eventsystem/I_Continuation.h:153
> #19 0x007bc90a in EThread::process_event (this=0x2b64f71b1010, 
> e=0x1646ac0, calling_code=5) at UnixEThread.cc:143
> #20 0x007bcf0d in EThread::execute (this=0x2b64f71b1010) at 
> UnixEThread.cc:270
> #21 0x007bbf1e in spawn_thread_internal (a=0x15731f0) at Thread.cc:84
> #22 0x2b64f5fcfaa1 in start_thread () from /lib64/libpthread.so.0
> #23 0x0032310e893d in clone () from /lib64/libc.so.6
> {code}
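For illustration, a minimal sketch of the guard mentioned above ("checks get_netvc() returns a non-NULL") before asking the connection for its local port; the Session/NetVConnection stand-ins and the fallback value are not the real HttpTransact/HttpSM code.

{code}
// Sketch only: guard the get_netvc() result before dereferencing it. The
// Session/NetVConnection stand-ins are not the real ATS classes.
#include <cstdio>

struct NetVConnection {
  int get_local_port() const { return 8080; }
};

struct Session {
  NetVConnection *netvc = nullptr;            // may already be torn down
  NetVConnection *get_netvc() const { return netvc; }
};

static int incoming_local_port(const Session *ua_session)
{
  // Crash path: ua_session->get_netvc()->get_local_port() with a NULL netvc,
  // e.g. while building an error response after the client has gone away.
  NetVConnection *vc = ua_session ? ua_session->get_netvc() : nullptr;
  return vc ? vc->get_local_port() : 0;       // 0 as an illustrative fallback
}

int main()
{
  Session dead_session;                       // netvc already closed / NULL
  printf("local port = %d\n", incoming_local_port(&dead_session));
  return 0;
}
{code}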



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4938) Crash due to null client_vc

2016-10-11 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4938.

Resolution: Fixed

> Crash due to null client_vc
> ---
>
> Key: TS-4938
> URL: https://issues.apache.org/jira/browse/TS-4938
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 7.1.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Saw this crash while testing fix for TS-4813.  Have a fix that checks 
> get_netvc() returns a non-NULL.  Should make a more comprehensive review on 
> the use of get_netvc() in HttpTransact.cc/HttpSM.cc
> {code}
> #0  0x0053ed2c in NetVConnection::get_local_addr (this=0x0) at 
> ../iocore/net/P_NetVConnection.h:60
> #1  0x0057dca4 in NetVConnection::get_local_port (this=0x0) at 
> ../iocore/net/P_NetVConnection.h:82
> #2  0x00627844 in 
> HttpTransact::initialize_state_variables_from_request (s=0x2b65700d4a98, 
> obsolete_incoming_request=0x2b65700d51b8)
> at HttpTransact.cc:5709
> #3  0x00632bd1 in HttpTransact::build_error_response 
> (s=0x2b65700d4a98, status_code=HTTP_STATUS_BAD_GATEWAY, 
> reason_phrase_or_null=0x7fb86c "Server Hangup", error_body_type=0x7fb87a 
> "connect#hangup", format=0x0) at HttpTransact.cc:8141
> #4  0x006311fa in HttpTransact::handle_server_died (s=0x2b65700d4a98) 
> at HttpTransact.cc:7789
> #5  0x00620bbc in HttpTransact::handle_server_connection_not_open 
> (s=0x2b65700d4a98) at HttpTransact.cc:3991
> #6  0x0061fd43 in HttpTransact::handle_response_from_server 
> (s=0x2b65700d4a98) at HttpTransact.cc:3824
> #7  0x0061d762 in HttpTransact::HandleResponse (s=0x2b65700d4a98) at 
> HttpTransact.cc:3401
> #8  0x005fc928 in HttpSM::call_transact_and_set_next_state 
> (this=0x2b65700d4a20, 
> f=0x61cf9a ) at 
> HttpSM.cc:7116
> #9  0x005f6902 in HttpSM::handle_server_setup_error 
> (this=0x2b65700d4a20, event=104, data=0x2aabd00372d8) at HttpSM.cc:5505
> #10 0x005e88a4 in HttpSM::state_send_server_request_header 
> (this=0x2b65700d4a20, event=104, data=0x2aabd00372d8) at HttpSM.cc:2053
> #11 0x005eb3ba in HttpSM::main_handler (this=0x2b65700d4a20, 
> event=104, data=0x2aabd00372d8) at HttpSM.cc:2655
> #12 0x005145ac in Continuation::handleEvent (this=0x2b65700d4a20, 
> event=104, data=0x2aabd00372d8) at ../iocore/eventsystem/I_Continuation.h:153
> #13 0x0079906f in write_signal_and_update (event=104, 
> vc=0x2aabd0037140) at UnixNetVConnection.cc:174
> #14 0x007992a6 in write_signal_done (event=104, nh=0x2b64f71b4cf0, 
> vc=0x2aabd0037140) at UnixNetVConnection.cc:216
> #15 0x0079a475 in write_to_net_io (nh=0x2b64f71b4cf0, 
> vc=0x2aabd0037140, thread=0x2b64f71b1010) at UnixNetVConnection.cc:547
> #16 0x00799dc7 in write_to_net (nh=0x2b64f71b4cf0, vc=0x2aabd0037140, 
> thread=0x2b64f71b1010) at UnixNetVConnection.cc:414
> #17 0x0079129d in NetHandler::mainNetEvent (this=0x2b64f71b4cf0, 
> event=5, e=0x1646ac0) at UnixNet.cc:515
> #18 0x005145ac in Continuation::handleEvent (this=0x2b64f71b4cf0, 
> event=5, data=0x1646ac0) at ../iocore/eventsystem/I_Continuation.h:153
> #19 0x007bc90a in EThread::process_event (this=0x2b64f71b1010, 
> e=0x1646ac0, calling_code=5) at UnixEThread.cc:143
> #20 0x007bcf0d in EThread::execute (this=0x2b64f71b1010) at 
> UnixEThread.cc:270
> #21 0x007bbf1e in spawn_thread_internal (a=0x15731f0) at Thread.cc:84
> #22 0x2b64f5fcfaa1 in start_thread () from /lib64/libpthread.so.0
> #23 0x0032310e893d in clone () from /lib64/libc.so.6
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4938) Crash due to null client_vc

2016-10-11 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-4938:
---
Backport to Version: 7.0.0

> Crash due to null client_vc
> ---
>
> Key: TS-4938
> URL: https://issues.apache.org/jira/browse/TS-4938
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 7.1.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Saw this crash while testing fix for TS-4813.  Have a fix that checks 
> get_netvc() returns a non-NULL.  Should make a more comprehensive review on 
> the use of get_netvc() in HttpTransact.cc/HttpSM.cc
> {code}
> #0  0x0053ed2c in NetVConnection::get_local_addr (this=0x0) at 
> ../iocore/net/P_NetVConnection.h:60
> #1  0x0057dca4 in NetVConnection::get_local_port (this=0x0) at 
> ../iocore/net/P_NetVConnection.h:82
> #2  0x00627844 in 
> HttpTransact::initialize_state_variables_from_request (s=0x2b65700d4a98, 
> obsolete_incoming_request=0x2b65700d51b8)
> at HttpTransact.cc:5709
> #3  0x00632bd1 in HttpTransact::build_error_response 
> (s=0x2b65700d4a98, status_code=HTTP_STATUS_BAD_GATEWAY, 
> reason_phrase_or_null=0x7fb86c "Server Hangup", error_body_type=0x7fb87a 
> "connect#hangup", format=0x0) at HttpTransact.cc:8141
> #4  0x006311fa in HttpTransact::handle_server_died (s=0x2b65700d4a98) 
> at HttpTransact.cc:7789
> #5  0x00620bbc in HttpTransact::handle_server_connection_not_open 
> (s=0x2b65700d4a98) at HttpTransact.cc:3991
> #6  0x0061fd43 in HttpTransact::handle_response_from_server 
> (s=0x2b65700d4a98) at HttpTransact.cc:3824
> #7  0x0061d762 in HttpTransact::HandleResponse (s=0x2b65700d4a98) at 
> HttpTransact.cc:3401
> #8  0x005fc928 in HttpSM::call_transact_and_set_next_state 
> (this=0x2b65700d4a20, 
> f=0x61cf9a ) at 
> HttpSM.cc:7116
> #9  0x005f6902 in HttpSM::handle_server_setup_error 
> (this=0x2b65700d4a20, event=104, data=0x2aabd00372d8) at HttpSM.cc:5505
> #10 0x005e88a4 in HttpSM::state_send_server_request_header 
> (this=0x2b65700d4a20, event=104, data=0x2aabd00372d8) at HttpSM.cc:2053
> #11 0x005eb3ba in HttpSM::main_handler (this=0x2b65700d4a20, 
> event=104, data=0x2aabd00372d8) at HttpSM.cc:2655
> #12 0x005145ac in Continuation::handleEvent (this=0x2b65700d4a20, 
> event=104, data=0x2aabd00372d8) at ../iocore/eventsystem/I_Continuation.h:153
> #13 0x0079906f in write_signal_and_update (event=104, 
> vc=0x2aabd0037140) at UnixNetVConnection.cc:174
> #14 0x007992a6 in write_signal_done (event=104, nh=0x2b64f71b4cf0, 
> vc=0x2aabd0037140) at UnixNetVConnection.cc:216
> #15 0x0079a475 in write_to_net_io (nh=0x2b64f71b4cf0, 
> vc=0x2aabd0037140, thread=0x2b64f71b1010) at UnixNetVConnection.cc:547
> #16 0x00799dc7 in write_to_net (nh=0x2b64f71b4cf0, vc=0x2aabd0037140, 
> thread=0x2b64f71b1010) at UnixNetVConnection.cc:414
> #17 0x0079129d in NetHandler::mainNetEvent (this=0x2b64f71b4cf0, 
> event=5, e=0x1646ac0) at UnixNet.cc:515
> #18 0x005145ac in Continuation::handleEvent (this=0x2b64f71b4cf0, 
> event=5, data=0x1646ac0) at ../iocore/eventsystem/I_Continuation.h:153
> #19 0x007bc90a in EThread::process_event (this=0x2b64f71b1010, 
> e=0x1646ac0, calling_code=5) at UnixEThread.cc:143
> #20 0x007bcf0d in EThread::execute (this=0x2b64f71b1010) at 
> UnixEThread.cc:270
> #21 0x007bbf1e in spawn_thread_internal (a=0x15731f0) at Thread.cc:84
> #22 0x2b64f5fcfaa1 in start_thread () from /lib64/libpthread.so.0
> #23 0x0032310e893d in clone () from /lib64/libc.so.6
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (TS-4902) Http2ConnectionState::stream_list gets in bad state

2016-10-10 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs closed TS-4902.
--
Resolution: Fixed

I'm not seeing this anymore.  I think the fix to TS-4813 addressed this issue.

> Http2ConnectionState::stream_list gets in bad state
> ---
>
> Key: TS-4902
> URL: https://issues.apache.org/jira/browse/TS-4902
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP/2
>Reporter: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.1.0
>
>
> Saw this in one of my runs while debugging TS-4900 and TS-4813.  It might 
> have been due to my attempted fix, but I don't think so. 
> I left current master running in production for an hour.  When I returned, 
> CPU utilization had jumped from 70% to 400%.  perf top showed that the 
> majority of time was being spent in Http2ConnectionState::restart_streams.  
> Connected with debugger and two threads were spinning in that method.  In 
> each case this->stream_list.head->link.next == 
> this->stream_list.head->link.prev and this->thread_list.head->link.next != 
> NULL.
> I assume that we are missing some locking and the stream_list is being 
> manipulated by two threads in parallel.  
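
For illustration, a minimal sketch of the kind of external locking an intrusive list like this needs, using simplified stand-in types rather than the real Http2ConnectionState / DLL<> code (in ATS the equivalent serialization would come from the session's ProxyMutex rather than a raw std::mutex):

{code}
// Hypothetical sketch: every push/remove on the intrusive stream list is
// serialized by one mutex.  Names and types are illustrative only.
#include <mutex>

struct Http2Stream {
  Http2Stream *next = nullptr;
  Http2Stream *prev = nullptr;
};

struct StreamList {
  Http2Stream *head = nullptr;
  std::mutex   state_mutex; // hold this for every push/remove/walk

  void push(Http2Stream *s) {
    std::lock_guard<std::mutex> guard(state_mutex);
    s->prev = nullptr;
    s->next = head;
    if (head)
      head->prev = s;
    head = s;
  }

  void remove(Http2Stream *s) {
    std::lock_guard<std::mutex> guard(state_mutex);
    if (s->prev)
      s->prev->next = s->next;
    else
      head = s->next;
    if (s->next)
      s->next->prev = s->prev;
    s->next = s->prev = nullptr;
  }
};

int main() {
  StreamList list;
  Http2Stream s1, s2;
  list.push(&s1);
  list.push(&s2);
  list.remove(&s1);
  return list.head == &s2 ? 0 : 1;
}
{code}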



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-10 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-4915:
---
Attachment: ts-4915.diff

Updating my patch.  Got another assert where entry->index >= _v.length() in 
erase().  To erase a value, erase() pulls the last value and puts it in the 
erase_index slot.  But it does not update that value's index, and that index 
will soon be invalid (i.e. greater than the vector length).  So when we later 
erase that value, the assertion fails.  If we used that stale index to pull the 
value out of the vector, we would be accessing unused data.

Added assert(false) into PriorityQueue<>::pop() because I don't think it is 
called, and its index-maintenance logic doesn't make sense.  I want to see if 
it is in fact used.
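
For reference, a minimal sketch of the swap-and-reindex idea described above, using a simplified stand-in for lib/ts/PriorityQueue.h rather than the actual ts-4915.diff patch: when the last entry is moved into the erased slot, its stored index has to be rewritten before any later erase() trusts it.

{code}
// Simplified sketch only; Entry/PQ are stand-ins for the real classes in
// lib/ts/PriorityQueue.h and the re-heapify step is omitted.
#include <cassert>
#include <cstddef>
#include <vector>

struct Entry {
  int         key;
  std::size_t index; // position of this entry inside the heap vector
};

struct PQ {
  std::vector<Entry *> v;

  void erase(Entry *e) {
    assert(e->index < v.size()); // a stale index would fire here
    std::size_t slot = e->index;
    Entry *last = v.back();
    v.pop_back();
    if (last != e) {
      v[slot]     = last;
      last->index = slot; // the update the comment above says was missing
      // a full implementation would now bubble up/down from `slot`
    }
  }
};

int main() {
  Entry a{1, 0}, b{2, 0}, c{3, 0};
  PQ q;
  q.v = {&a, &b, &c};
  for (std::size_t i = 0; i < q.v.size(); ++i)
    q.v[i]->index = i;
  q.erase(&a); // c is moved into slot 0 and c.index is rewritten to 0
  q.erase(&c); // without the rewrite, c.index would still be 2 while v.size() is 2
  return q.v.size() == 1 ? 0 : 1;
}
{code}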

> Crash from hostdb in PriorityQueueLess
> --
>
> Key: TS-4915
> URL: https://issues.apache.org/jira/browse/TS-4915
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HostDB
>Reporter: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.1.0
>
> Attachments: ts-4915.diff, ts-4915.diff
>
>
> Saw this while testing fix for TS-4813 with debug enabled.
> {code}
> (gdb) bt full
> #0  0x00547bfe in RefCountCacheHashEntry::operator< (this=0x1cc0880, 
> v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
> No locals.
> #1  0x0054988d in 
> PriorityQueueLess::operator() (this=0x2b78a9a2587b, 
> a=@0x2b78f402af68, b=@0x2b78f402aa28)
> at ../lib/ts/PriorityQueue.h:41
> No locals.
> #2  0x00549785 in PriorityQueue PriorityQueueLess >::_bubble_up (this=0x1cb2990, 
> index=2) at ../lib/ts/PriorityQueue.h:191
> comp = {}
> parent = 0
> #3  0x006ecfcc in PriorityQueue PriorityQueueLess >::push (this=0x1cb2990, 
> entry=0x2b78f402af60) at ../../lib/ts/PriorityQueue.h:91
> len = 2
> #4  0x006ec206 in RefCountCachePartition::put 
> (this=0x1cb2900, key=6912554662447498853, item=0x2b78aee04f00, size=96, 
> expire_time=1475202356) at ./P_RefCountCache.h:210
> expiry_entry = 0x2b78f402af60
> __func__ = "put"
> val = 0x1cc0880
> #5  0x006eb3de in RefCountCache::put (this=0x18051e0, 
> key=6912554662447498853, item=0x2b78aee04f00, size=16, 
> expiry_time=1475202356) at ./P_RefCountCache.h:462
> No locals.
> #6  0x006e2d8e in HostDBContinuation::dnsEvent (this=0x2b7938020f00, 
> event=600, e=0x2b78ac009440) at HostDB.cc:1422
> is_rr = false
> old_rr_data = 0x0
> first_record = 0x2b78ac0094f8
> m = 0x1
> failed = false
> old_r = {m_ptr = 0x0}
> af = 2 '\002'
> s_size = 16
> rrsize = 0
> allocSize = 16
> r = 0x2b78aee04f00
> old_info = { = { = {_vptr.ForceVFPTToTop 
> = 0x7f3630}, m_refcount = 0}, iobuffer_index = 0, 
>   key = 47797242059264, app = {allotment = {application1 = 5326300, 
> application2 = 0}, http_data = {http_version = 4, 
>   pipeline_max = 59, keepalive_timeout = 17, fail_count = 81, 
> unused1 = 0, last_failure = 0}, rr = {offset = 5326300}}, data = {
> ip = {sa = {sa_family = 54488, sa_data = 
> "^\000\000\000\000\000\020\034$\274x+\000"}, sin = {sin_family = 54488, 
> sin_port = 94, 
> sin_addr = {s_addr = 0}, sin_zero = "\020\034$\274x+\000"}, 
> sin6 = {sin6_family = 54488, sin6_port = 94, sin6_flowinfo = 0, 
> sin6_addr = {__in6_u = {__u6_addr8 = 
> "\020\034$\274x+\000\000\030\036$\274\375\b\000", __u6_addr16 = {7184, 48164, 
> 11128, 
>   0, 7704, 48164, 2301, 0}, __u6_addr32 = {3156483088, 
> 11128, 3156483608, 2301}}}, sin6_scope_id = 3156478176}}, 
> hostname_offset = 6214872, srv = {srv_offset = 54488, srv_weight 
> = 94, srv_priority = 0, srv_port = 0, key = 3156483088}}, 
>   hostname_offset = 11128, ip_timestamp = 2845989456, 
> ip_timeout_interval = 11128, is_srv = 0, reverse_dns = 0, round_robin = 1, 
>   round_robin_elt = 0}
> valid_records = 0
> tip = {_family = 2, _addr = {_ip4 = 540420056, _ip6 = {__in6_u = 
> {__u6_addr8 = "\330'6 x+\000\000\360L\020\250x+\000", 
> __u6_addr16 = {10200, 8246, 11128, 0, 19696, 43024, 11128, 
> 0}, __u6_addr32 = {540420056, 11128, 2819640560, 11128}}}, 
> _byte = "\330'6 x+\000\000\360L\020\250x+\000", _u32 = 
> {540420056, 11128, 2819640560, 11128}, _u64 = {47794936489944, 
>   47797215710448}}}
> ttl_seconds = 132
> aname = 0x2b7938021000 "fbmm1.zenfs.com"
> offset = 96
> thread = 0x2b78a8101010
> __func__ = "dnsEvent"
> #7  0x005145dc in 

[jira] [Comment Edited] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-10 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563113#comment-15563113
 ] 

Susan Hinrichs edited comment on TS-4915 at 10/10/16 6:46 PM:
--

Now see a slightly different stack

{code}
(gdb) bt
#0  0x00547b2a in RefCountCacheHashEntry::operator< 
(this=0x2b1dd00d8a80, v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
#1  0x005497ed in 
PriorityQueueLess::operator() (this=0x2b1d5cc1487b, 
a=@0x2b1dd00dae38, b=@0x2b1dd00daea8)
at ../lib/ts/PriorityQueue.h:41
#2  0x005496e5 in PriorityQueue >::_bubble_up (this=0x2155670, 
index=3)
at ../lib/ts/PriorityQueue.h:192
#3  0x006eccdc in PriorityQueue >::push (this=0x2155670, 
entry=0x2b1dd00dae30) at ../../lib/ts/PriorityQueue.h:91
#4  0x006ebf28 in RefCountCachePartition::put 
(this=0x21555e0, key=18396718469509840932, item=0x2b1d7abadf80, size=94, 
expire_time=1476124925) at ./P_RefCountCache.h:210
#5  0x006eb100 in RefCountCache::put (this=0x1ca8220, 
key=18396718469509840932, item=0x2b1d7abadf80, size=14, 
expiry_time=1476124925) at ./P_RefCountCache.h:462
#6  0x006e2ab0 in HostDBContinuation::dnsEvent (this=0x2b1dec047b80, 
event=600, e=0x2b1d622e8000) at HostDB.cc:1424
#7  0x0051453e in Continuation::handleEvent (this=0x2b1dec047b80, 
event=600, data=0x2b1d622e8000) at ../iocore/eventsystem/I_Continuation.h:153
#8  0x006f64b2 in DNSEntry::postEvent (this=0x2b1dd00cf200) at 
DNS.cc:1269
#9  0x0051453e in Continuation::handleEvent (this=0x2b1dd00cf200, 
event=1, data=0x2b1d6ec25220) at ../iocore/eventsystem/I_Continuation.h:153
#10 0x007bc572 in EThread::process_event (this=0x2b1d57270010, 
e=0x2b1d6ec25220, calling_code=1) at UnixEThread.cc:143
#11 0x007bc7e1 in EThread::execute (this=0x2b1d57270010) at 
UnixEThread.cc:197
#12 0x007bbb86 in spawn_thread_internal (a=0x1c9df40) at Thread.cc:84
#13 0x2b1d55a88aa1 in start_thread () from /lib64/libpthread.so.0
#14 0x0032310e893d in clone () from /lib64/libc.so.6
{code}

v2 is bogus.


was (Author: shinrich):
Now see a slightly different stack

{code}
(gdb) bt
#0  0x00547b2a in RefCountCacheHashEntry::operator< 
(this=0x2b1dd00d8a80, v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
#1  0x005497ed in 
PriorityQueueLess::operator() (this=0x2b1d5cc1487b, 
a=@0x2b1dd00dae38, b=@0x2b1dd00daea8)
at ../lib/ts/PriorityQueue.h:41
#2  0x005496e5 in PriorityQueue >::_bubble_up (this=0x2155670, 
index=3)
at ../lib/ts/PriorityQueue.h:192
#3  0x006eccdc in PriorityQueue >::push (this=0x2155670, 
entry=0x2b1dd00dae30) at ../../lib/ts/PriorityQueue.h:91
#4  0x006ebf28 in RefCountCachePartition::put 
(this=0x21555e0, key=18396718469509840932, item=0x2b1d7abadf80, size=94, 
expire_time=1476124925) at ./P_RefCountCache.h:210
#5  0x006eb100 in RefCountCache::put (this=0x1ca8220, 
key=18396718469509840932, item=0x2b1d7abadf80, size=14, 
expiry_time=1476124925) at ./P_RefCountCache.h:462
#6  0x006e2ab0 in HostDBContinuation::dnsEvent (this=0x2b1dec047b80, 
event=600, e=0x2b1d622e8000) at HostDB.cc:1424
#7  0x0051453e in Continuation::handleEvent (this=0x2b1dec047b80, 
event=600, data=0x2b1d622e8000) at ../iocore/eventsystem/I_Continuation.h:153
#8  0x006f64b2 in DNSEntry::postEvent (this=0x2b1dd00cf200) at 
DNS.cc:1269
#9  0x0051453e in Continuation::handleEvent (this=0x2b1dd00cf200, 
event=1, data=0x2b1d6ec25220) at ../iocore/eventsystem/I_Continuation.h:153
#10 0x007bc572 in EThread::process_event (this=0x2b1d57270010, 
e=0x2b1d6ec25220, calling_code=1) at UnixEThread.cc:143
#11 0x007bc7e1 in EThread::execute (this=0x2b1d57270010) at 
UnixEThread.cc:197
#12 0x007bbb86 in spawn_thread_internal (a=0x1c9df40) at Thread.cc:84
#13 0x2b1d55a88aa1 in start_thread () from /lib64/libpthread.so.0
#14 0x0032310e893d in clone () from /lib64/libc.so.6
{code}

> Crash from hostdb in PriorityQueueLess
> --
>
> Key: TS-4915
> URL: https://issues.apache.org/jira/browse/TS-4915
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HostDB
>Reporter: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.1.0
>
> Attachments: ts-4915.diff
>
>
> Saw this while testing fix for TS-4813 with debug enabled.
> {code}
> (gdb) bt full
> #0  0x00547bfe in RefCountCacheHashEntry::operator< (this=0x1cc0880, 

[jira] [Commented] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-10 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563113#comment-15563113
 ] 

Susan Hinrichs commented on TS-4915:


Now see a slightly different stack

{code}
(gdb) bt
#0  0x00547b2a in RefCountCacheHashEntry::operator< 
(this=0x2b1dd00d8a80, v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
#1  0x005497ed in 
PriorityQueueLess::operator() (this=0x2b1d5cc1487b, 
a=@0x2b1dd00dae38, b=@0x2b1dd00daea8)
at ../lib/ts/PriorityQueue.h:41
#2  0x005496e5 in PriorityQueue >::_bubble_up (this=0x2155670, 
index=3)
at ../lib/ts/PriorityQueue.h:192
#3  0x006eccdc in PriorityQueue >::push (this=0x2155670, 
entry=0x2b1dd00dae30) at ../../lib/ts/PriorityQueue.h:91
#4  0x006ebf28 in RefCountCachePartition::put 
(this=0x21555e0, key=18396718469509840932, item=0x2b1d7abadf80, size=94, 
expire_time=1476124925) at ./P_RefCountCache.h:210
#5  0x006eb100 in RefCountCache::put (this=0x1ca8220, 
key=18396718469509840932, item=0x2b1d7abadf80, size=14, 
expiry_time=1476124925) at ./P_RefCountCache.h:462
#6  0x006e2ab0 in HostDBContinuation::dnsEvent (this=0x2b1dec047b80, 
event=600, e=0x2b1d622e8000) at HostDB.cc:1424
#7  0x0051453e in Continuation::handleEvent (this=0x2b1dec047b80, 
event=600, data=0x2b1d622e8000) at ../iocore/eventsystem/I_Continuation.h:153
#8  0x006f64b2 in DNSEntry::postEvent (this=0x2b1dd00cf200) at 
DNS.cc:1269
#9  0x0051453e in Continuation::handleEvent (this=0x2b1dd00cf200, 
event=1, data=0x2b1d6ec25220) at ../iocore/eventsystem/I_Continuation.h:153
#10 0x007bc572 in EThread::process_event (this=0x2b1d57270010, 
e=0x2b1d6ec25220, calling_code=1) at UnixEThread.cc:143
#11 0x007bc7e1 in EThread::execute (this=0x2b1d57270010) at 
UnixEThread.cc:197
#12 0x007bbb86 in spawn_thread_internal (a=0x1c9df40) at Thread.cc:84
#13 0x2b1d55a88aa1 in start_thread () from /lib64/libpthread.so.0
#14 0x0032310e893d in clone () from /lib64/libc.so.6
{code}

> Crash from hostdb in PriorityQueueLess
> --
>
> Key: TS-4915
> URL: https://issues.apache.org/jira/browse/TS-4915
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HostDB
>Reporter: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.1.0
>
> Attachments: ts-4915.diff
>
>
> Saw this while testing fix for TS-4813 with debug enabled.
> {code}
> (gdb) bt full
> #0  0x00547bfe in RefCountCacheHashEntry::operator< (this=0x1cc0880, 
> v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
> No locals.
> #1  0x0054988d in 
> PriorityQueueLess::operator() (this=0x2b78a9a2587b, 
> a=@0x2b78f402af68, b=@0x2b78f402aa28)
> at ../lib/ts/PriorityQueue.h:41
> No locals.
> #2  0x00549785 in PriorityQueue PriorityQueueLess >::_bubble_up (this=0x1cb2990, 
> index=2) at ../lib/ts/PriorityQueue.h:191
> comp = {}
> parent = 0
> #3  0x006ecfcc in PriorityQueue PriorityQueueLess >::push (this=0x1cb2990, 
> entry=0x2b78f402af60) at ../../lib/ts/PriorityQueue.h:91
> len = 2
> #4  0x006ec206 in RefCountCachePartition::put 
> (this=0x1cb2900, key=6912554662447498853, item=0x2b78aee04f00, size=96, 
> expire_time=1475202356) at ./P_RefCountCache.h:210
> expiry_entry = 0x2b78f402af60
> __func__ = "put"
> val = 0x1cc0880
> #5  0x006eb3de in RefCountCache::put (this=0x18051e0, 
> key=6912554662447498853, item=0x2b78aee04f00, size=16, 
> expiry_time=1475202356) at ./P_RefCountCache.h:462
> No locals.
> #6  0x006e2d8e in HostDBContinuation::dnsEvent (this=0x2b7938020f00, 
> event=600, e=0x2b78ac009440) at HostDB.cc:1422
> is_rr = false
> old_rr_data = 0x0
> first_record = 0x2b78ac0094f8
> m = 0x1
> failed = false
> old_r = {m_ptr = 0x0}
> af = 2 '\002'
> s_size = 16
> rrsize = 0
> allocSize = 16
> r = 0x2b78aee04f00
> old_info = { = { = {_vptr.ForceVFPTToTop 
> = 0x7f3630}, m_refcount = 0}, iobuffer_index = 0, 
>   key = 47797242059264, app = {allotment = {application1 = 5326300, 
> application2 = 0}, http_data = {http_version = 4, 
>   pipeline_max = 59, keepalive_timeout = 17, fail_count = 81, 
> unused1 = 0, last_failure = 0}, rr = {offset = 5326300}}, data = {
> ip = {sa = {sa_family = 54488, sa_data = 
> "^\000\000\000\000\000\020\034$\274x+\000"}, sin = {sin_family = 54488, 
> sin_port = 

[jira] [Updated] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-10 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-4915:
---
Attachment: ts-4915.diff

Running with ts-4915.diff.  It seems that the expiry_queue.pop() is harmful: 
the entry should already have been removed during the this->erase() call.

Will make a PR once TS-4938 is committed.
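
A hedged sketch of the hazard being described, with simplified stand-in types rather than the actual RefCountCachePartition::put() from P_RefCountCache.h: once the old entry has been erased, an extra pop() removes whatever live entry currently sits at the front of the expiry queue.

{code}
// Illustrative only: the erase has already taken the old entry out of the
// expiry queue, so a following "pop" would evict an unrelated live entry.
#include <cstdio>
#include <set>

struct Entry {
  int expiry;
};

int main() {
  Entry old_entry{100}, live_entry{200}, new_entry{150};

  auto by_expiry = [](const Entry *a, const Entry *b) { return a->expiry < b->expiry; };
  std::multiset<const Entry *, decltype(by_expiry)> expiry_queue(by_expiry);
  expiry_queue.insert(&old_entry);
  expiry_queue.insert(&live_entry);

  // put(): replace old_entry with new_entry
  expiry_queue.erase(&old_entry); // the erase already removed old_entry
  // expiry_queue.erase(expiry_queue.begin()); // the extra "pop" would now
  //                                           // drop live_entry instead
  expiry_queue.insert(&new_entry);

  std::printf("expiry entries after put(): %zu\n", expiry_queue.size()); // 2
  return 0;
}
{code}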

> Crash from hostdb in PriorityQueueLess
> --
>
> Key: TS-4915
> URL: https://issues.apache.org/jira/browse/TS-4915
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HostDB
>Reporter: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.1.0
>
> Attachments: ts-4915.diff
>
>
> Saw this while testing fix for TS-4813 with debug enabled.
> {code}
> (gdb) bt full
> #0  0x00547bfe in RefCountCacheHashEntry::operator< (this=0x1cc0880, 
> v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
> No locals.
> #1  0x0054988d in 
> PriorityQueueLess::operator() (this=0x2b78a9a2587b, 
> a=@0x2b78f402af68, b=@0x2b78f402aa28)
> at ../lib/ts/PriorityQueue.h:41
> No locals.
> #2  0x00549785 in PriorityQueue PriorityQueueLess >::_bubble_up (this=0x1cb2990, 
> index=2) at ../lib/ts/PriorityQueue.h:191
> comp = {}
> parent = 0
> #3  0x006ecfcc in PriorityQueue PriorityQueueLess >::push (this=0x1cb2990, 
> entry=0x2b78f402af60) at ../../lib/ts/PriorityQueue.h:91
> len = 2
> #4  0x006ec206 in RefCountCachePartition::put 
> (this=0x1cb2900, key=6912554662447498853, item=0x2b78aee04f00, size=96, 
> expire_time=1475202356) at ./P_RefCountCache.h:210
> expiry_entry = 0x2b78f402af60
> __func__ = "put"
> val = 0x1cc0880
> #5  0x006eb3de in RefCountCache::put (this=0x18051e0, 
> key=6912554662447498853, item=0x2b78aee04f00, size=16, 
> expiry_time=1475202356) at ./P_RefCountCache.h:462
> No locals.
> #6  0x006e2d8e in HostDBContinuation::dnsEvent (this=0x2b7938020f00, 
> event=600, e=0x2b78ac009440) at HostDB.cc:1422
> is_rr = false
> old_rr_data = 0x0
> first_record = 0x2b78ac0094f8
> m = 0x1
> failed = false
> old_r = {m_ptr = 0x0}
> af = 2 '\002'
> s_size = 16
> rrsize = 0
> allocSize = 16
> r = 0x2b78aee04f00
> old_info = { = { = {_vptr.ForceVFPTToTop 
> = 0x7f3630}, m_refcount = 0}, iobuffer_index = 0, 
>   key = 47797242059264, app = {allotment = {application1 = 5326300, 
> application2 = 0}, http_data = {http_version = 4, 
>   pipeline_max = 59, keepalive_timeout = 17, fail_count = 81, 
> unused1 = 0, last_failure = 0}, rr = {offset = 5326300}}, data = {
> ip = {sa = {sa_family = 54488, sa_data = 
> "^\000\000\000\000\000\020\034$\274x+\000"}, sin = {sin_family = 54488, 
> sin_port = 94, 
> sin_addr = {s_addr = 0}, sin_zero = "\020\034$\274x+\000"}, 
> sin6 = {sin6_family = 54488, sin6_port = 94, sin6_flowinfo = 0, 
> sin6_addr = {__in6_u = {__u6_addr8 = 
> "\020\034$\274x+\000\000\030\036$\274\375\b\000", __u6_addr16 = {7184, 48164, 
> 11128, 
>   0, 7704, 48164, 2301, 0}, __u6_addr32 = {3156483088, 
> 11128, 3156483608, 2301}}}, sin6_scope_id = 3156478176}}, 
> hostname_offset = 6214872, srv = {srv_offset = 54488, srv_weight 
> = 94, srv_priority = 0, srv_port = 0, key = 3156483088}}, 
>   hostname_offset = 11128, ip_timestamp = 2845989456, 
> ip_timeout_interval = 11128, is_srv = 0, reverse_dns = 0, round_robin = 1, 
>   round_robin_elt = 0}
> valid_records = 0
> tip = {_family = 2, _addr = {_ip4 = 540420056, _ip6 = {__in6_u = 
> {__u6_addr8 = "\330'6 x+\000\000\360L\020\250x+\000", 
> __u6_addr16 = {10200, 8246, 11128, 0, 19696, 43024, 11128, 
> 0}, __u6_addr32 = {540420056, 11128, 2819640560, 11128}}}, 
> _byte = "\330'6 x+\000\000\360L\020\250x+\000", _u32 = 
> {540420056, 11128, 2819640560, 11128}, _u64 = {47794936489944, 
>   47797215710448}}}
> ttl_seconds = 132
> aname = 0x2b7938021000 "fbmm1.zenfs.com"
> offset = 96
> thread = 0x2b78a8101010
> __func__ = "dnsEvent"
> #7  0x005145dc in Continuation::handleEvent (this=0x2b7938020f00, 
> event=600, data=0x2b78ac009440)
> at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #8  0x006f681e in DNSEntry::postEvent (this=0x2b78f4028600) at 
> DNS.cc:1269
> __func__ = "postEvent"
> #9  0x005145dc in Continuation::handleEvent (this=0x2b78f4028600, 
> event=1, data=0x2aac954db040)
> at ../iocore/eventsystem/I_Continuation.h:153

[jira] [Commented] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-10 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562927#comment-15562927
 ] 

Susan Hinrichs commented on TS-4915:


The assert tripped with entry->index == 5 and _v.length() == 4.  Digging through 
the logic to see whether it is reasonable to get into this state, in which case a 
check here should suffice, or whether there is a broader race condition that we 
should be concerned about.
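
A minimal sketch of the kind of guard being considered, using the naming from lib/ts/PriorityQueue.h but not the actual patch; whether a local check is enough depends on whether the stale index points to a wider race:

{code}
// Hypothetical guard: refuse to erase through an index that no longer
// refers to this entry's slot in the heap vector.
#include <cstddef>
#include <vector>

struct Entry {
  std::size_t index;
};

bool safe_to_erase(const std::vector<Entry *> &_v, const Entry *entry) {
  // A stale entry->index (e.g. 5 with only 4 elements) means the entry is
  // no longer where the heap thinks it is; bail out instead of reading
  // past the end of the vector.
  return entry->index < _v.size() && _v[entry->index] == entry;
}

int main() {
  Entry stale{5};
  std::vector<Entry *> _v(4, nullptr); // only 4 slots, as in the assert above
  return safe_to_erase(_v, &stale) ? 1 : 0; // the guard rejects the stale entry
}
{code}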

> Crash from hostdb in PriorityQueueLess
> --
>
> Key: TS-4915
> URL: https://issues.apache.org/jira/browse/TS-4915
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HostDB
>Reporter: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.1.0
>
>
> Saw this while testing fix for TS-4813 with debug enabled.
> {code}
> (gdb) bt full
> #0  0x00547bfe in RefCountCacheHashEntry::operator< (this=0x1cc0880, 
> v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
> No locals.
> #1  0x0054988d in 
> PriorityQueueLess::operator() (this=0x2b78a9a2587b, 
> a=@0x2b78f402af68, b=@0x2b78f402aa28)
> at ../lib/ts/PriorityQueue.h:41
> No locals.
> #2  0x00549785 in PriorityQueue PriorityQueueLess >::_bubble_up (this=0x1cb2990, 
> index=2) at ../lib/ts/PriorityQueue.h:191
> comp = {}
> parent = 0
> #3  0x006ecfcc in PriorityQueue PriorityQueueLess >::push (this=0x1cb2990, 
> entry=0x2b78f402af60) at ../../lib/ts/PriorityQueue.h:91
> len = 2
> #4  0x006ec206 in RefCountCachePartition::put 
> (this=0x1cb2900, key=6912554662447498853, item=0x2b78aee04f00, size=96, 
> expire_time=1475202356) at ./P_RefCountCache.h:210
> expiry_entry = 0x2b78f402af60
> __func__ = "put"
> val = 0x1cc0880
> #5  0x006eb3de in RefCountCache::put (this=0x18051e0, 
> key=6912554662447498853, item=0x2b78aee04f00, size=16, 
> expiry_time=1475202356) at ./P_RefCountCache.h:462
> No locals.
> #6  0x006e2d8e in HostDBContinuation::dnsEvent (this=0x2b7938020f00, 
> event=600, e=0x2b78ac009440) at HostDB.cc:1422
> is_rr = false
> old_rr_data = 0x0
> first_record = 0x2b78ac0094f8
> m = 0x1
> failed = false
> old_r = {m_ptr = 0x0}
> af = 2 '\002'
> s_size = 16
> rrsize = 0
> allocSize = 16
> r = 0x2b78aee04f00
> old_info = { = { = {_vptr.ForceVFPTToTop 
> = 0x7f3630}, m_refcount = 0}, iobuffer_index = 0, 
>   key = 47797242059264, app = {allotment = {application1 = 5326300, 
> application2 = 0}, http_data = {http_version = 4, 
>   pipeline_max = 59, keepalive_timeout = 17, fail_count = 81, 
> unused1 = 0, last_failure = 0}, rr = {offset = 5326300}}, data = {
> ip = {sa = {sa_family = 54488, sa_data = 
> "^\000\000\000\000\000\020\034$\274x+\000"}, sin = {sin_family = 54488, 
> sin_port = 94, 
> sin_addr = {s_addr = 0}, sin_zero = "\020\034$\274x+\000"}, 
> sin6 = {sin6_family = 54488, sin6_port = 94, sin6_flowinfo = 0, 
> sin6_addr = {__in6_u = {__u6_addr8 = 
> "\020\034$\274x+\000\000\030\036$\274\375\b\000", __u6_addr16 = {7184, 48164, 
> 11128, 
>   0, 7704, 48164, 2301, 0}, __u6_addr32 = {3156483088, 
> 11128, 3156483608, 2301}}}, sin6_scope_id = 3156478176}}, 
> hostname_offset = 6214872, srv = {srv_offset = 54488, srv_weight 
> = 94, srv_priority = 0, srv_port = 0, key = 3156483088}}, 
>   hostname_offset = 11128, ip_timestamp = 2845989456, 
> ip_timeout_interval = 11128, is_srv = 0, reverse_dns = 0, round_robin = 1, 
>   round_robin_elt = 0}
> valid_records = 0
> tip = {_family = 2, _addr = {_ip4 = 540420056, _ip6 = {__in6_u = 
> {__u6_addr8 = "\330'6 x+\000\000\360L\020\250x+\000", 
> __u6_addr16 = {10200, 8246, 11128, 0, 19696, 43024, 11128, 
> 0}, __u6_addr32 = {540420056, 11128, 2819640560, 11128}}}, 
> _byte = "\330'6 x+\000\000\360L\020\250x+\000", _u32 = 
> {540420056, 11128, 2819640560, 11128}, _u64 = {47794936489944, 
>   47797215710448}}}
> ttl_seconds = 132
> aname = 0x2b7938021000 "fbmm1.zenfs.com"
> offset = 96
> thread = 0x2b78a8101010
> __func__ = "dnsEvent"
> #7  0x005145dc in Continuation::handleEvent (this=0x2b7938020f00, 
> event=600, data=0x2b78ac009440)
> at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #8  0x006f681e in DNSEntry::postEvent (this=0x2b78f4028600) at 
> DNS.cc:1269
> __func__ = "postEvent"
> #9  0x005145dc in Continuation::handleEvent (this=0x2b78f4028600, 
> event=1, data=0x2aac954db040)
> at 

[jira] [Commented] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-10 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562520#comment-15562520
 ] 

Susan Hinrichs commented on TS-4915:


Interesting.  I assume that entry->index is invalid.  I put in an assert that 
entry->index < _v->length.  Hopefully that gives us a good core dump.

> Crash from hostdb in PriorityQueueLess
> --
>
> Key: TS-4915
> URL: https://issues.apache.org/jira/browse/TS-4915
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HostDB
>Reporter: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.1.0
>
>
> Saw this while testing fix for TS-4813 with debug enabled.
> {code}
> (gdb) bt full
> #0  0x00547bfe in RefCountCacheHashEntry::operator< (this=0x1cc0880, 
> v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
> No locals.
> #1  0x0054988d in 
> PriorityQueueLess::operator() (this=0x2b78a9a2587b, 
> a=@0x2b78f402af68, b=@0x2b78f402aa28)
> at ../lib/ts/PriorityQueue.h:41
> No locals.
> #2  0x00549785 in PriorityQueue PriorityQueueLess >::_bubble_up (this=0x1cb2990, 
> index=2) at ../lib/ts/PriorityQueue.h:191
> comp = {}
> parent = 0
> #3  0x006ecfcc in PriorityQueue PriorityQueueLess >::push (this=0x1cb2990, 
> entry=0x2b78f402af60) at ../../lib/ts/PriorityQueue.h:91
> len = 2
> #4  0x006ec206 in RefCountCachePartition::put 
> (this=0x1cb2900, key=6912554662447498853, item=0x2b78aee04f00, size=96, 
> expire_time=1475202356) at ./P_RefCountCache.h:210
> expiry_entry = 0x2b78f402af60
> __func__ = "put"
> val = 0x1cc0880
> #5  0x006eb3de in RefCountCache::put (this=0x18051e0, 
> key=6912554662447498853, item=0x2b78aee04f00, size=16, 
> expiry_time=1475202356) at ./P_RefCountCache.h:462
> No locals.
> #6  0x006e2d8e in HostDBContinuation::dnsEvent (this=0x2b7938020f00, 
> event=600, e=0x2b78ac009440) at HostDB.cc:1422
> is_rr = false
> old_rr_data = 0x0
> first_record = 0x2b78ac0094f8
> m = 0x1
> failed = false
> old_r = {m_ptr = 0x0}
> af = 2 '\002'
> s_size = 16
> rrsize = 0
> allocSize = 16
> r = 0x2b78aee04f00
> old_info = { = { = {_vptr.ForceVFPTToTop 
> = 0x7f3630}, m_refcount = 0}, iobuffer_index = 0, 
>   key = 47797242059264, app = {allotment = {application1 = 5326300, 
> application2 = 0}, http_data = {http_version = 4, 
>   pipeline_max = 59, keepalive_timeout = 17, fail_count = 81, 
> unused1 = 0, last_failure = 0}, rr = {offset = 5326300}}, data = {
> ip = {sa = {sa_family = 54488, sa_data = 
> "^\000\000\000\000\000\020\034$\274x+\000"}, sin = {sin_family = 54488, 
> sin_port = 94, 
> sin_addr = {s_addr = 0}, sin_zero = "\020\034$\274x+\000"}, 
> sin6 = {sin6_family = 54488, sin6_port = 94, sin6_flowinfo = 0, 
> sin6_addr = {__in6_u = {__u6_addr8 = 
> "\020\034$\274x+\000\000\030\036$\274\375\b\000", __u6_addr16 = {7184, 48164, 
> 11128, 
>   0, 7704, 48164, 2301, 0}, __u6_addr32 = {3156483088, 
> 11128, 3156483608, 2301}}}, sin6_scope_id = 3156478176}}, 
> hostname_offset = 6214872, srv = {srv_offset = 54488, srv_weight 
> = 94, srv_priority = 0, srv_port = 0, key = 3156483088}}, 
>   hostname_offset = 11128, ip_timestamp = 2845989456, 
> ip_timeout_interval = 11128, is_srv = 0, reverse_dns = 0, round_robin = 1, 
>   round_robin_elt = 0}
> valid_records = 0
> tip = {_family = 2, _addr = {_ip4 = 540420056, _ip6 = {__in6_u = 
> {__u6_addr8 = "\330'6 x+\000\000\360L\020\250x+\000", 
> __u6_addr16 = {10200, 8246, 11128, 0, 19696, 43024, 11128, 
> 0}, __u6_addr32 = {540420056, 11128, 2819640560, 11128}}}, 
> _byte = "\330'6 x+\000\000\360L\020\250x+\000", _u32 = 
> {540420056, 11128, 2819640560, 11128}, _u64 = {47794936489944, 
>   47797215710448}}}
> ttl_seconds = 132
> aname = 0x2b7938021000 "fbmm1.zenfs.com"
> offset = 96
> thread = 0x2b78a8101010
> __func__ = "dnsEvent"
> #7  0x005145dc in Continuation::handleEvent (this=0x2b7938020f00, 
> event=600, data=0x2b78ac009440)
> at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #8  0x006f681e in DNSEntry::postEvent (this=0x2b78f4028600) at 
> DNS.cc:1269
> __func__ = "postEvent"
> #9  0x005145dc in Continuation::handleEvent (this=0x2b78f4028600, 
> event=1, data=0x2aac954db040)
> at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #10 0x007bc9be in EThread::process_event (this=0x2b78a8101010, 

[jira] [Commented] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-06 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552510#comment-15552510
 ] 

Susan Hinrichs commented on TS-4915:


Still getting these crashes once every couple hours in production traffic.

> Crash from hostdb in PriorityQueueLess
> --
>
> Key: TS-4915
> URL: https://issues.apache.org/jira/browse/TS-4915
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HostDB
>Reporter: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.1.0
>
>
> Saw this while testing fix for TS-4813 with debug enabled.
> {code}
> (gdb) bt full
> #0  0x00547bfe in RefCountCacheHashEntry::operator< (this=0x1cc0880, 
> v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
> No locals.
> #1  0x0054988d in 
> PriorityQueueLess::operator() (this=0x2b78a9a2587b, 
> a=@0x2b78f402af68, b=@0x2b78f402aa28)
> at ../lib/ts/PriorityQueue.h:41
> No locals.
> #2  0x00549785 in PriorityQueue PriorityQueueLess >::_bubble_up (this=0x1cb2990, 
> index=2) at ../lib/ts/PriorityQueue.h:191
> comp = {}
> parent = 0
> #3  0x006ecfcc in PriorityQueue PriorityQueueLess >::push (this=0x1cb2990, 
> entry=0x2b78f402af60) at ../../lib/ts/PriorityQueue.h:91
> len = 2
> #4  0x006ec206 in RefCountCachePartition::put 
> (this=0x1cb2900, key=6912554662447498853, item=0x2b78aee04f00, size=96, 
> expire_time=1475202356) at ./P_RefCountCache.h:210
> expiry_entry = 0x2b78f402af60
> __func__ = "put"
> val = 0x1cc0880
> #5  0x006eb3de in RefCountCache::put (this=0x18051e0, 
> key=6912554662447498853, item=0x2b78aee04f00, size=16, 
> expiry_time=1475202356) at ./P_RefCountCache.h:462
> No locals.
> #6  0x006e2d8e in HostDBContinuation::dnsEvent (this=0x2b7938020f00, 
> event=600, e=0x2b78ac009440) at HostDB.cc:1422
> is_rr = false
> old_rr_data = 0x0
> first_record = 0x2b78ac0094f8
> m = 0x1
> failed = false
> old_r = {m_ptr = 0x0}
> af = 2 '\002'
> s_size = 16
> rrsize = 0
> allocSize = 16
> r = 0x2b78aee04f00
> old_info = { = { = {_vptr.ForceVFPTToTop 
> = 0x7f3630}, m_refcount = 0}, iobuffer_index = 0, 
>   key = 47797242059264, app = {allotment = {application1 = 5326300, 
> application2 = 0}, http_data = {http_version = 4, 
>   pipeline_max = 59, keepalive_timeout = 17, fail_count = 81, 
> unused1 = 0, last_failure = 0}, rr = {offset = 5326300}}, data = {
> ip = {sa = {sa_family = 54488, sa_data = 
> "^\000\000\000\000\000\020\034$\274x+\000"}, sin = {sin_family = 54488, 
> sin_port = 94, 
> sin_addr = {s_addr = 0}, sin_zero = "\020\034$\274x+\000"}, 
> sin6 = {sin6_family = 54488, sin6_port = 94, sin6_flowinfo = 0, 
> sin6_addr = {__in6_u = {__u6_addr8 = 
> "\020\034$\274x+\000\000\030\036$\274\375\b\000", __u6_addr16 = {7184, 48164, 
> 11128, 
>   0, 7704, 48164, 2301, 0}, __u6_addr32 = {3156483088, 
> 11128, 3156483608, 2301}}}, sin6_scope_id = 3156478176}}, 
> hostname_offset = 6214872, srv = {srv_offset = 54488, srv_weight 
> = 94, srv_priority = 0, srv_port = 0, key = 3156483088}}, 
>   hostname_offset = 11128, ip_timestamp = 2845989456, 
> ip_timeout_interval = 11128, is_srv = 0, reverse_dns = 0, round_robin = 1, 
>   round_robin_elt = 0}
> valid_records = 0
> tip = {_family = 2, _addr = {_ip4 = 540420056, _ip6 = {__in6_u = 
> {__u6_addr8 = "\330'6 x+\000\000\360L\020\250x+\000", 
> __u6_addr16 = {10200, 8246, 11128, 0, 19696, 43024, 11128, 
> 0}, __u6_addr32 = {540420056, 11128, 2819640560, 11128}}}, 
> _byte = "\330'6 x+\000\000\360L\020\250x+\000", _u32 = 
> {540420056, 11128, 2819640560, 11128}, _u64 = {47794936489944, 
>   47797215710448}}}
> ttl_seconds = 132
> aname = 0x2b7938021000 "fbmm1.zenfs.com"
> offset = 96
> thread = 0x2b78a8101010
> __func__ = "dnsEvent"
> #7  0x005145dc in Continuation::handleEvent (this=0x2b7938020f00, 
> event=600, data=0x2b78ac009440)
> at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #8  0x006f681e in DNSEntry::postEvent (this=0x2b78f4028600) at 
> DNS.cc:1269
> __func__ = "postEvent"
> #9  0x005145dc in Continuation::handleEvent (this=0x2b78f4028600, 
> event=1, data=0x2aac954db040)
> at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #10 0x007bc9be in EThread::process_event (this=0x2b78a8101010, 
> e=0x2aac954db040, calling_code=1) at UnixEThread.cc:143
> 

[jira] [Commented] (TS-4916) Http2ConnectionState::restart_streams infinite loop causes deadlock

2016-10-06 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552505#comment-15552505
 ] 

Susan Hinrichs commented on TS-4916:


[~gancho] you might also look at the fix for TS-4813, which is now merged.  It 
rearranges some of the stream_count bookkeeping.  I haven't seen this crash 
again recently, though I am crashing pretty frequently due to TS-4915, so it 
could be that I'm just not running long enough.

It looked like we had simultaneous threads manipulating the stream list.  DLL<> 
is not thread safe.
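
For illustration, a minimal reproduction of the traversal pattern described in this issue, where the last element's next pointer refers back to itself (simplified stand-in types, not the real Http2Stream / DLL<> classes; the hop limit exists only so the example terminates):

{code}
// Illustrative only: a corrupted intrusive list whose last node links to
// itself, matching the stream ids (29, 27, 19) in the trace below.
#include <cstdio>

struct Stream {
  int     id;
  Stream *next = nullptr;
};

int main() {
  Stream a{29}, b{27}, c{19};
  a.next = &b;
  b.next = &c;
  c.next = &c; // corruption: the last element's next points at itself

  // A restart_streams-style walk never exits on its own here, because
  // s->next never becomes nullptr once the walk reaches c.
  int hops = 0;
  for (Stream *s = &a; s != nullptr && hops < 10; s = s->next, ++hops) {
    std::printf("visiting stream %d\n", s->id);
  }
  return 0;
}
{code}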

> Http2ConnectionState::restart_streams infinite loop causes deadlock 
> 
>
> Key: TS-4916
> URL: https://issues.apache.org/jira/browse/TS-4916
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, HTTP/2
>Reporter: Gancho Tenev
>Assignee: Gancho Tenev
>Priority: Blocker
> Fix For: 7.1.0
>
>
> Http2ConnectionState::restart_streams falls into an infinite loop while 
> holding a lock, which leads to cache updates to start failing.
> The infinite loop is caused by traversing a list whose last element “next” 
> points to the element itself and the traversal never finishes.
> {code}
> Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)):
> #0  0x2acf3fee in Http2ConnectionState::restart_streams 
> (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
> #1  rcv_window_update_frame (cstate=..., frame=...) at 
> Http2ConnectionState.cc:627
> #2  0x2acf9738 in Http2ConnectionState::main_event_handler 
> (this=0x2ae6ba5284c8, event=, edata=) at 
> Http2ConnectionState.cc:823
> #3  0x2acef1c3 in Continuation::handleEvent (data=0x2aaab3d039a0, 
> event=2253, this=0x2ae6ba5284c8) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #4  send_connection_event (cont=cont@entry=0x2ae6ba5284c8, 
> event=event@entry=2253, edata=edata@entry=0x2aaab3d039a0) at 
> Http2ClientSession.cc:58
> #5  0x2acef462 in Http2ClientSession::state_complete_frame_read 
> (this=0x2ae6ba528290, event=, edata=0x2aab7b237f18) at 
> Http2ClientSession.cc:426
> #6  0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=0x2ae6ba528290) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #7  Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, 
> event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
> #8  0x2acef5a3 in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=0x2ae6ba528290) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #9  Http2ClientSession::state_complete_frame_read (this=0x2ae6ba528290, 
> event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:431
> #10 0x2acf0982 in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=0x2ae6ba528290) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #11 Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, 
> event=, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
> #12 0x2ae67e2b in Continuation::handleEvent (data=0x2aab7b237f18, 
> event=100, this=) at 
> ../../iocore/eventsystem/I_Continuation.h:153
> #13 read_signal_and_update (vc=0x2aab7b237e00, vc@entry=0x1, 
> event=event@entry=100) at UnixNetVConnection.cc:153
> #14 UnixNetVConnection::readSignalAndUpdate (this=this@entry=0x2aab7b237e00, 
> event=event@entry=100) at UnixNetVConnection.cc:1036
> #15 0x2ae47653 in SSLNetVConnection::net_read_io 
> (this=0x2aab7b237e00, nh=0x2aaab2409cc0, lthread=0x2aaab2406000) at 
> SSLNetVConnection.cc:595
> #16 0x2ae5558c in NetHandler::mainNetEvent (this=0x2aaab2409cc0, 
> event=, e=) at UnixNet.cc:513
> #17 0x2ae8d2e6 in Continuation::handleEvent (data=0x2aaab0bfa700, 
> event=5, this=) at I_Continuation.h:153
> #18 EThread::process_event (calling_code=5, e=0x2aaab0bfa700, 
> this=0x2aaab2406000) at UnixEThread.cc:148
> #19 EThread::execute (this=0x2aaab2406000) at UnixEThread.cc:275
> #20 0x2ae8c0e6 in spawn_thread_internal (a=0x2aaab0b25bb0) at 
> Thread.cc:86
> #21 0x2d6b3aa1 in start_thread (arg=0x2aaab3d04700) at 
> pthread_create.c:301
> #22 0x2e8bc93d in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> {code}
> Here is the stream_list trace.
> {code}
> (gdb) thread 51
> [Switching to thread 51 (Thread 0x2aaab3d04700 (LWP 34270))]
> #0  0x2acf3fee in Http2ConnectionState::restart_streams 
> (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
> (gdb) trace_list stream_list
> --- count=0 ---
> id=29
> this=0x2ae673f0c840
> next=0x2aaac05d8900
> prev=(nil)
> --- count=1 ---
> id=27
> this=0x2aaac05d8900
> next=0x2ae5b6bbec00
> prev=0x2ae673f0c840
> --- count=2 ---
> id=19
> this=0x2ae5b6bbec00
> next=0x2ae5b6bbec00
> prev=0x2aaac05d8900
> --- count=3 ---
> 

[jira] [Assigned] (TS-4938) Crash due to null client_vc

2016-10-05 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-4938:
--

Assignee: Susan Hinrichs

> Crash due to null client_vc
> ---
>
> Key: TS-4938
> URL: https://issues.apache.org/jira/browse/TS-4938
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> Saw this crash while testing fix for TS-4813.  Have a fix that checks 
> get_netvc() returns a non-NULL.  Should make a more comprehensive review on 
> the use of get_netvc() in HttpTransact.cc/HttpSM.cc
> {code}
> #0  0x0053ed2c in NetVConnection::get_local_addr (this=0x0) at 
> ../iocore/net/P_NetVConnection.h:60
> #1  0x0057dca4 in NetVConnection::get_local_port (this=0x0) at 
> ../iocore/net/P_NetVConnection.h:82
> #2  0x00627844 in 
> HttpTransact::initialize_state_variables_from_request (s=0x2b65700d4a98, 
> obsolete_incoming_request=0x2b65700d51b8)
> at HttpTransact.cc:5709
> #3  0x00632bd1 in HttpTransact::build_error_response 
> (s=0x2b65700d4a98, status_code=HTTP_STATUS_BAD_GATEWAY, 
> reason_phrase_or_null=0x7fb86c "Server Hangup", error_body_type=0x7fb87a 
> "connect#hangup", format=0x0) at HttpTransact.cc:8141
> #4  0x006311fa in HttpTransact::handle_server_died (s=0x2b65700d4a98) 
> at HttpTransact.cc:7789
> #5  0x00620bbc in HttpTransact::handle_server_connection_not_open 
> (s=0x2b65700d4a98) at HttpTransact.cc:3991
> #6  0x0061fd43 in HttpTransact::handle_response_from_server 
> (s=0x2b65700d4a98) at HttpTransact.cc:3824
> #7  0x0061d762 in HttpTransact::HandleResponse (s=0x2b65700d4a98) at 
> HttpTransact.cc:3401
> #8  0x005fc928 in HttpSM::call_transact_and_set_next_state 
> (this=0x2b65700d4a20, 
> f=0x61cf9a ) at 
> HttpSM.cc:7116
> #9  0x005f6902 in HttpSM::handle_server_setup_error 
> (this=0x2b65700d4a20, event=104, data=0x2aabd00372d8) at HttpSM.cc:5505
> #10 0x005e88a4 in HttpSM::state_send_server_request_header 
> (this=0x2b65700d4a20, event=104, data=0x2aabd00372d8) at HttpSM.cc:2053
> #11 0x005eb3ba in HttpSM::main_handler (this=0x2b65700d4a20, 
> event=104, data=0x2aabd00372d8) at HttpSM.cc:2655
> #12 0x005145ac in Continuation::handleEvent (this=0x2b65700d4a20, 
> event=104, data=0x2aabd00372d8) at ../iocore/eventsystem/I_Continuation.h:153
> #13 0x0079906f in write_signal_and_update (event=104, 
> vc=0x2aabd0037140) at UnixNetVConnection.cc:174
> #14 0x007992a6 in write_signal_done (event=104, nh=0x2b64f71b4cf0, 
> vc=0x2aabd0037140) at UnixNetVConnection.cc:216
> #15 0x0079a475 in write_to_net_io (nh=0x2b64f71b4cf0, 
> vc=0x2aabd0037140, thread=0x2b64f71b1010) at UnixNetVConnection.cc:547
> #16 0x00799dc7 in write_to_net (nh=0x2b64f71b4cf0, vc=0x2aabd0037140, 
> thread=0x2b64f71b1010) at UnixNetVConnection.cc:414
> #17 0x0079129d in NetHandler::mainNetEvent (this=0x2b64f71b4cf0, 
> event=5, e=0x1646ac0) at UnixNet.cc:515
> #18 0x005145ac in Continuation::handleEvent (this=0x2b64f71b4cf0, 
> event=5, data=0x1646ac0) at ../iocore/eventsystem/I_Continuation.h:153
> #19 0x007bc90a in EThread::process_event (this=0x2b64f71b1010, 
> e=0x1646ac0, calling_code=5) at UnixEThread.cc:143
> #20 0x007bcf0d in EThread::execute (this=0x2b64f71b1010) at 
> UnixEThread.cc:270
> #21 0x007bbf1e in spawn_thread_internal (a=0x15731f0) at Thread.cc:84
> #22 0x2b64f5fcfaa1 in start_thread () from /lib64/libpthread.so.0
> #23 0x0032310e893d in clone () from /lib64/libc.so.6
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4938) Crash due to null client_vc

2016-10-05 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-4938:
--

 Summary: Crash due to null client_vc
 Key: TS-4938
 URL: https://issues.apache.org/jira/browse/TS-4938
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Reporter: Susan Hinrichs


Saw this crash while testing fix for TS-4813.  Have a fix that checks 
get_netvc() returns a non-NULL.  Should make a more comprehensive review on the 
use of get_netvc() in HttpTransact.cc/HttpSM.cc
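
A minimal sketch of the null guard described above; the class definitions and the fallback value are simplified assumptions, not the actual change to HttpTransact.cc:

{code}
// Illustrative only: check that get_netvc() returned a vc before calling
// get_local_port() through it.  These types are stand-ins.
#include <cstdint>

struct NetVConnection {
  std::uint16_t get_local_port() const { return 8080; } // dummy value
};

struct ProxyClientTransaction {
  NetVConnection *netvc = nullptr; // can legitimately be null mid-teardown
  NetVConnection *get_netvc() const { return netvc; }
};

std::uint16_t incoming_local_port(const ProxyClientTransaction *txn) {
  NetVConnection *vc = txn ? txn->get_netvc() : nullptr;
  // The crash in this issue came from calling get_local_port() on a null
  // NetVConnection, so fall back to 0 when the client vc is gone.
  return vc ? vc->get_local_port() : 0;
}

int main() {
  ProxyClientTransaction txn; // netvc deliberately left null
  return incoming_local_port(&txn) == 0 ? 0 : 1;
}
{code}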

{code}
#0  0x0053ed2c in NetVConnection::get_local_addr (this=0x0) at 
../iocore/net/P_NetVConnection.h:60
#1  0x0057dca4 in NetVConnection::get_local_port (this=0x0) at 
../iocore/net/P_NetVConnection.h:82
#2  0x00627844 in HttpTransact::initialize_state_variables_from_request 
(s=0x2b65700d4a98, obsolete_incoming_request=0x2b65700d51b8)
at HttpTransact.cc:5709
#3  0x00632bd1 in HttpTransact::build_error_response (s=0x2b65700d4a98, 
status_code=HTTP_STATUS_BAD_GATEWAY, 
reason_phrase_or_null=0x7fb86c "Server Hangup", error_body_type=0x7fb87a 
"connect#hangup", format=0x0) at HttpTransact.cc:8141
#4  0x006311fa in HttpTransact::handle_server_died (s=0x2b65700d4a98) 
at HttpTransact.cc:7789
#5  0x00620bbc in HttpTransact::handle_server_connection_not_open 
(s=0x2b65700d4a98) at HttpTransact.cc:3991
#6  0x0061fd43 in HttpTransact::handle_response_from_server 
(s=0x2b65700d4a98) at HttpTransact.cc:3824
#7  0x0061d762 in HttpTransact::HandleResponse (s=0x2b65700d4a98) at 
HttpTransact.cc:3401
#8  0x005fc928 in HttpSM::call_transact_and_set_next_state 
(this=0x2b65700d4a20, 
f=0x61cf9a ) at 
HttpSM.cc:7116
#9  0x005f6902 in HttpSM::handle_server_setup_error 
(this=0x2b65700d4a20, event=104, data=0x2aabd00372d8) at HttpSM.cc:5505
#10 0x005e88a4 in HttpSM::state_send_server_request_header 
(this=0x2b65700d4a20, event=104, data=0x2aabd00372d8) at HttpSM.cc:2053
#11 0x005eb3ba in HttpSM::main_handler (this=0x2b65700d4a20, event=104, 
data=0x2aabd00372d8) at HttpSM.cc:2655
#12 0x005145ac in Continuation::handleEvent (this=0x2b65700d4a20, 
event=104, data=0x2aabd00372d8) at ../iocore/eventsystem/I_Continuation.h:153
#13 0x0079906f in write_signal_and_update (event=104, 
vc=0x2aabd0037140) at UnixNetVConnection.cc:174
#14 0x007992a6 in write_signal_done (event=104, nh=0x2b64f71b4cf0, 
vc=0x2aabd0037140) at UnixNetVConnection.cc:216
#15 0x0079a475 in write_to_net_io (nh=0x2b64f71b4cf0, 
vc=0x2aabd0037140, thread=0x2b64f71b1010) at UnixNetVConnection.cc:547
#16 0x00799dc7 in write_to_net (nh=0x2b64f71b4cf0, vc=0x2aabd0037140, 
thread=0x2b64f71b1010) at UnixNetVConnection.cc:414
#17 0x0079129d in NetHandler::mainNetEvent (this=0x2b64f71b4cf0, 
event=5, e=0x1646ac0) at UnixNet.cc:515
#18 0x005145ac in Continuation::handleEvent (this=0x2b64f71b4cf0, 
event=5, data=0x1646ac0) at ../iocore/eventsystem/I_Continuation.h:153
#19 0x007bc90a in EThread::process_event (this=0x2b64f71b1010, 
e=0x1646ac0, calling_code=5) at UnixEThread.cc:143
#20 0x007bcf0d in EThread::execute (this=0x2b64f71b1010) at 
UnixEThread.cc:270
#21 0x007bbf1e in spawn_thread_internal (a=0x15731f0) at Thread.cc:84
#22 0x2b64f5fcfaa1 in start_thread () from /lib64/libpthread.so.0
#23 0x0032310e893d in clone () from /lib64/libc.so.6
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-10-02 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15541238#comment-15541238
 ] 

Susan Hinrichs commented on TS-4915:


Yes, I assume this is a 7.0.0 issue, though I cannot say for certain since I'm 
working on a branch to fix TS-4813 for now.  With that branch, it looks like 
this core appears every couple of hours on a light-traffic, predominantly 
caching production server.

[~jacksontj] since hostdb shows up in the stack, does this look like anything 
from the recent hostdb changes?  I have another, less frequent stack that shows 
up in hostdb land.  I have seen the following stack 2-3 times since Friday.

{code}
#0  0x007b7c3f in ink_atomic_increment 
(mem=0xff4e45fa6000802c, count=1) at ../../lib/ts/ink_atomic.h:95
#1  0x007b7703 in RecIncrGlobalRawStatCount (rsb=0x2b1a60037f90, id=4, 
incr=1) at RecRawStats.cc:467
#2  0x0054915d in RefCountCachePartition::metric_inc 
(this=0x1477c50, metric_enum=refcountcache_total_lookups_stat, data=1)
at ../iocore/hostdb/P_RefCountCache.h:327
#3  0x006ebfca in RefCountCachePartition::get 
(this=0x1477c50, key=18396718469509840932) at ./P_RefCountCache.h:174
#4  0x006eb38f in RefCountCache::get (this=0x1484f00, 
key=18396718469509840932) at ./P_RefCountCache.h:455
#5  0x006de4bf in probe (mutex=0x1457a80, md5=..., 
ignore_timeout=false) at HostDB.cc:527
#6  0x006dfbfe in HostDBProcessor::getbyname_imm (this=0xc173c0, 
cont=0x2b19fc388440, process_hostdb_info=
(void (Continuation::*)(Continuation *, HostDBInfo *)) 0x5e8ece 
, 
hostname=0x2b1a2d0c1419 "lib.lumcs.com", len=0, opt=...) at HostDB.cc:818
#7  0x005f1102 in HttpSM::do_hostdb_lookup (this=0x2b19fc388440) at 
HttpSM.cc:4130
#8  0x005fd691 in HttpSM::set_next_state (this=0x2b19fc388440) at 
HttpSM.cc:7256
#9  0x005fca85 in HttpSM::call_transact_and_set_next_state 
(this=0x2b19fc388440, f=0) at HttpSM.cc:7122
#10 0x005e6eab in HttpSM::handle_api_return (this=0x2b19fc388440) at 
HttpSM.cc:1604
#11 0x0060414c in HttpSM::do_api_callout (this=0x2b19fc388440) at 
HttpSM.cc:438
#12 0x005fcaf2 in HttpSM::set_next_state (this=0x2b19fc388440) at 
HttpSM.cc:7155
#13 0x005fca85 in HttpSM::call_transact_and_set_next_state 
(this=0x2b19fc388440, f=0) at HttpSM.cc:7122
#14 0x005e6eab in HttpSM::handle_api_return (this=0x2b19fc388440) at 
HttpSM.cc:1604
#15 0x0060414c in HttpSM::do_api_callout (this=0x2b19fc388440) at 
HttpSM.cc:438
#16 0x005fcaf2 in HttpSM::set_next_state (this=0x2b19fc388440) at 
HttpSM.cc:7155
#17 0x005fca85 in HttpSM::call_transact_and_set_next_state 
(this=0x2b19fc388440, f=
0x616a42 ) at 
HttpSM.cc:7122
#18 0x005eaf09 in HttpSM::state_cache_open_read (this=0x2b19fc388440, 
event=1102, data=0x2b1a3041e740) at HttpSM.cc:2596
#19 0x005eb4b5 in HttpSM::main_handler (this=0x2b19fc388440, 
event=1102, data=0x2b1a3041e740) at HttpSM.cc:2658
#20 0x005145dc in Continuation::handleEvent (this=0x2b19fc388440, 
event=1102, data=0x2b1a3041e740)
at ../iocore/eventsystem/I_Continuation.h:153
#21 0x005d3818 in HttpCacheSM::state_cache_open_read 
(this=0x2b19fc389d60, event=1102, data=0x2b1a3041e740) at HttpCacheSM.cc:132
#22 0x005145dc in Continuation::handleEvent (this=0x2b19fc389d60, 
event=1102, data=0x2b1a3041e740)
at ../iocore/eventsystem/I_Continuation.h:153
#23 0x00756997 in CacheVC::callcont (this=0x2b1a3041e740, event=1102) 
at P_CacheInternal.h:643
#24 0x00754aa6 in CacheVC::openReadStartEarliest (this=0x2b1a3041e740) 
at CacheRead.cc:914
#25 0x005145dc in Continuation::handleEvent (this=0x2b1a3041e740, 
event=3900, data=0x0) at ../iocore/eventsystem/I_Continuation.h:153
#26 0x007318f5 in CacheVC::handleReadDone (this=0x2b1a3041e740, 
event=3900, e=0x2b1a3041e8c8) at Cache.cc:2445
#27 0x005145dc in Continuation::handleEvent (this=0x2b1a3041e740, 
event=3900, data=0x2b1a3041e8c8)
at ../iocore/eventsystem/I_Continuation.h:153
#28 0x00736ebd in AIOCallbackInternal::io_complete 
(this=0x2b1a3041e8c8, event=1, data=0x2b1a0c0041e0) at 
../../iocore/aio/P_AIO.h:117
#29 0x005145dc in Continuation::handleEvent (this=0x2b1a3041e8c8, 
event=1, data=0x2b1a0c0041e0) at ../iocore/eventsystem/I_Continuation.h:153
#30 0x007bca24 in EThread::process_event (this=0x2b19e1824010, 
e=0x2b1a0c0041e0, calling_code=1) at UnixEThread.cc:143
#31 0x007bcc93 in EThread::execute (this=0x2b19e1824010) at 
UnixEThread.cc:197
#32 0x007bc038 in spawn_thread_internal (a=0x147beb0) at Thread.cc:84
#33 0x2b19db4a9aa1 in start_thread () from /lib64/libpthread.so.0
#34 0x0032310e893d in clone () from /lib64/libc.so.6
{code}

> Crash 

[jira] [Updated] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-09-30 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-4915:
---
Description: 
Saw this while testing fix for TS-4813 with debug enabled.

{code}
(gdb) bt full
#0  0x00547bfe in RefCountCacheHashEntry::operator< (this=0x1cc0880, 
v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
No locals.
#1  0x0054988d in 
PriorityQueueLess::operator() (this=0x2b78a9a2587b, 
a=@0x2b78f402af68, b=@0x2b78f402aa28)
at ../lib/ts/PriorityQueue.h:41
No locals.
#2  0x00549785 in PriorityQueue >::_bubble_up (this=0x1cb2990, 
index=2) at ../lib/ts/PriorityQueue.h:191
comp = {}
parent = 0
#3  0x006ecfcc in PriorityQueue >::push (this=0x1cb2990, 
entry=0x2b78f402af60) at ../../lib/ts/PriorityQueue.h:91
len = 2
#4  0x006ec206 in RefCountCachePartition::put 
(this=0x1cb2900, key=6912554662447498853, item=0x2b78aee04f00, size=96, 
expire_time=1475202356) at ./P_RefCountCache.h:210
expiry_entry = 0x2b78f402af60
__func__ = "put"
val = 0x1cc0880
#5  0x006eb3de in RefCountCache::put (this=0x18051e0, 
key=6912554662447498853, item=0x2b78aee04f00, size=16, 
expiry_time=1475202356) at ./P_RefCountCache.h:462
No locals.
#6  0x006e2d8e in HostDBContinuation::dnsEvent (this=0x2b7938020f00, 
event=600, e=0x2b78ac009440) at HostDB.cc:1422
is_rr = false
old_rr_data = 0x0
first_record = 0x2b78ac0094f8
m = 0x1
failed = false
old_r = {m_ptr = 0x0}
af = 2 '\002'
s_size = 16
rrsize = 0
allocSize = 16
r = 0x2b78aee04f00
old_info = { = { = {_vptr.ForceVFPTToTop = 
0x7f3630}, m_refcount = 0}, iobuffer_index = 0, 
  key = 47797242059264, app = {allotment = {application1 = 5326300, 
application2 = 0}, http_data = {http_version = 4, 
  pipeline_max = 59, keepalive_timeout = 17, fail_count = 81, 
unused1 = 0, last_failure = 0}, rr = {offset = 5326300}}, data = {
ip = {sa = {sa_family = 54488, sa_data = 
"^\000\000\000\000\000\020\034$\274x+\000"}, sin = {sin_family = 54488, 
sin_port = 94, 
sin_addr = {s_addr = 0}, sin_zero = "\020\034$\274x+\000"}, 
sin6 = {sin6_family = 54488, sin6_port = 94, sin6_flowinfo = 0, 
sin6_addr = {__in6_u = {__u6_addr8 = 
"\020\034$\274x+\000\000\030\036$\274\375\b\000", __u6_addr16 = {7184, 48164, 
11128, 
  0, 7704, 48164, 2301, 0}, __u6_addr32 = {3156483088, 
11128, 3156483608, 2301}}}, sin6_scope_id = 3156478176}}, 
hostname_offset = 6214872, srv = {srv_offset = 54488, srv_weight = 
94, srv_priority = 0, srv_port = 0, key = 3156483088}}, 
  hostname_offset = 11128, ip_timestamp = 2845989456, 
ip_timeout_interval = 11128, is_srv = 0, reverse_dns = 0, round_robin = 1, 
  round_robin_elt = 0}
valid_records = 0
tip = {_family = 2, _addr = {_ip4 = 540420056, _ip6 = {__in6_u = 
{__u6_addr8 = "\330'6 x+\000\000\360L\020\250x+\000", 
__u6_addr16 = {10200, 8246, 11128, 0, 19696, 43024, 11128, 0}, 
__u6_addr32 = {540420056, 11128, 2819640560, 11128}}}, 
_byte = "\330'6 x+\000\000\360L\020\250x+\000", _u32 = {540420056, 
11128, 2819640560, 11128}, _u64 = {47794936489944, 
  47797215710448}}}
ttl_seconds = 132
aname = 0x2b7938021000 "fbmm1.zenfs.com"
offset = 96
thread = 0x2b78a8101010
__func__ = "dnsEvent"
#7  0x005145dc in Continuation::handleEvent (this=0x2b7938020f00, 
event=600, data=0x2b78ac009440)
at ../iocore/eventsystem/I_Continuation.h:153
No locals.
#8  0x006f681e in DNSEntry::postEvent (this=0x2b78f4028600) at 
DNS.cc:1269
__func__ = "postEvent"
#9  0x005145dc in Continuation::handleEvent (this=0x2b78f4028600, 
event=1, data=0x2aac954db040)
at ../iocore/eventsystem/I_Continuation.h:153
No locals.
#10 0x007bc9be in EThread::process_event (this=0x2b78a8101010, 
e=0x2aac954db040, calling_code=1) at UnixEThread.cc:143
c_temp = 0x2b78f4028600
lock = {m = {m_ptr = 0x17dea10}, lock_acquired = true}
__func__ = "process_event"
#11 0x007bcc2d in EThread::execute (this=0x2b78a8101010) at 
UnixEThread.cc:197
done_one = false
e = 0x2aac954db040
NegativeQueue = {> = {head = 0x18ce400}, 
tail = 0x18ce400}
next_time = 1475191803711988905
__func__ = "execute"
#12 0x007bbfd2 in spawn_thread_internal (a=0x17fb9a0) at Thread.cc:84
p = 0x17fb9a0
#13 0x2b78a2555aa1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#14 0x0032310e893d in clone () from /lib64/libc.so.6
{code}

[jira] [Created] (TS-4915) Crash from hostdb in PriorityQueueLess

2016-09-30 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-4915:
--

 Summary: Crash from hostdb in PriorityQueueLess
 Key: TS-4915
 URL: https://issues.apache.org/jira/browse/TS-4915
 Project: Traffic Server
  Issue Type: Bug
  Components: HostDB
Reporter: Susan Hinrichs


Saw this while testing fix for TS-4813 with debug enabled.

{code}
(gdb) bt full
#0  0x00547bfe in RefCountCacheHashEntry::operator< (this=0x1cc0880, 
v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
No locals.
#1  0x0054988d in 
PriorityQueueLess::operator() (this=0x2b78a9a2587b, 
a=@0x2b78f402af68, b=@0x2b78f402aa28)
at ../lib/ts/PriorityQueue.h:41
No locals.
#2  0x00549785 in PriorityQueue >::_bubble_up (this=0x1cb2990, 
index=2) at ../lib/ts/PriorityQueue.h:191
comp = {}
parent = 0
#3  0x006ecfcc in PriorityQueue >::push (this=0x1cb2990, 
entry=0x2b78f402af60) at ../../lib/ts/PriorityQueue.h:91
len = 2
#4  0x006ec206 in RefCountCachePartition::put 
(this=0x1cb2900, key=6912554662447498853, item=0x2b78aee04f00, size=96, 
expire_time=1475202356) at ./P_RefCountCache.h:210
expiry_entry = 0x2b78f402af60
__func__ = "put"
val = 0x1cc0880
#5  0x006eb3de in RefCountCache::put (this=0x18051e0, 
key=6912554662447498853, item=0x2b78aee04f00, size=16, 
expiry_time=1475202356) at ./P_RefCountCache.h:462
No locals.
#6  0x006e2d8e in HostDBContinuation::dnsEvent (this=0x2b7938020f00, 
event=600, e=0x2b78ac009440) at HostDB.cc:1422
is_rr = false
old_rr_data = 0x0
first_record = 0x2b78ac0094f8
m = 0x1
failed = false
old_r = {m_ptr = 0x0}
af = 2 '\002'
s_size = 16
rrsize = 0
allocSize = 16
r = 0x2b78aee04f00
old_info = { = { = {_vptr.ForceVFPTToTop = 
0x7f3630}, m_refcount = 0}, iobuffer_index = 0, 
  key = 47797242059264, app = {allotment = {application1 = 5326300, 
application2 = 0}, http_data = {http_version = 4, 
  pipeline_max = 59, keepalive_timeout = 17, fail_count = 81, 
unused1 = 0, last_failure = 0}, rr = {offset = 5326300}}, data = {
ip = {sa = {sa_family = 54488, sa_data = 
"^\000\000\000\000\000\020\034$\274x+\000"}, sin = {sin_family = 54488, 
sin_port = 94, 
sin_addr = {s_addr = 0}, sin_zero = "\020\034$\274x+\000"}, 
sin6 = {sin6_family = 54488, sin6_port = 94, sin6_flowinfo = 0, 
sin6_addr = {__in6_u = {__u6_addr8 = 
"\020\034$\274x+\000\000\030\036$\274\375\b\000", __u6_addr16 = {7184, 48164, 
11128, 
  0, 7704, 48164, 2301, 0}, __u6_addr32 = {3156483088, 
11128, 3156483608, 2301}}}, sin6_scope_id = 3156478176}}, 
hostname_offset = 6214872, srv = {srv_offset = 54488, srv_weight = 
94, srv_priority = 0, srv_port = 0, key = 3156483088}}, 
  hostname_offset = 11128, ip_timestamp = 2845989456, 
ip_timeout_interval = 11128, is_srv = 0, reverse_dns = 0, round_robin = 1, 
  round_robin_elt = 0}
valid_records = 0
tip = {_family = 2, _addr = {_ip4 = 540420056, _ip6 = {__in6_u = 
{__u6_addr8 = "\330'6 x+\000\000\360L\020\250x+\000", 
__u6_addr16 = {10200, 8246, 11128, 0, 19696, 43024, 11128, 0}, 
__u6_addr32 = {540420056, 11128, 2819640560, 11128}}}, 
_byte = "\330'6 x+\000\000\360L\020\250x+\000", _u32 = {540420056, 
11128, 2819640560, 11128}, _u64 = {47794936489944, 
  47797215710448}}}
ttl_seconds = 132
aname = 0x2b7938021000 "fbmm1.zenfs.com"
offset = 96
thread = 0x2b78a8101010
__func__ = "dnsEvent"
#7  0x005145dc in Continuation::handleEvent (this=0x2b7938020f00, 
event=600, data=0x2b78ac009440)
at ../iocore/eventsystem/I_Continuation.h:153
No locals.
#8  0x006f681e in DNSEntry::postEvent (this=0x2b78f4028600) at 
DNS.cc:1269
__func__ = "postEvent"
#9  0x005145dc in Continuation::handleEvent (this=0x2b78f4028600, 
event=1, data=0x2aac954db040)
at ../iocore/eventsystem/I_Continuation.h:153
No locals.
#10 0x007bc9be in EThread::process_event (this=0x2b78a8101010, 
e=0x2aac954db040, calling_code=1) at UnixEThread.cc:143
c_temp = 0x2b78f4028600
lock = {m = {m_ptr = 0x17dea10}, lock_acquired = true}
__func__ = "process_event"
#11 0x007bcc2d in EThread::execute (this=0x2b78a8101010) at 
UnixEThread.cc:197
done_one = false
e = 0x2aac954db040
NegativeQueue = {> = {head = 0x18ce400}, 
tail = 0x18ce400}
next_time = 1475191803711988905
__func__ = "execute"
#12 0x007bbfd2 in spawn_thread_internal (a=0x17fb9a0) at Thread.cc:84
{code}

[jira] [Commented] (TS-4902) Http2ConnectionState::stream_list gets in bad state

2016-09-30 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15535992#comment-15535992
 ] 

Susan Hinrichs commented on TS-4902:


Saw another crash in this area while testing fix for TS-4813.

{code}
#0  0x003231032625 in raise () from /lib64/libc.so.6
#1  0x003231033d8d in abort () from /lib64/libc.so.6
#2  0x2b49eb4b9d8d in ink_abort (message_format=0x2b49eb4ccbdc "%s:%d: 
failed assertion `%s`") at ink_error.cc:79
#3  0x2b49eb4b7586 in _ink_assert (expression=0x805adf 
"client_streams_in_count > 0", file=0x805688 "Http2ConnectionState.cc", 
line=1005) at ink_assert.cc:37
#4  0x0065adeb in Http2ConnectionState::delete_stream 
(this=0x2b4a34a5c190, stream=0x2aac54a69c40) at Http2ConnectionState.cc:1005
#5  0x0065be03 in Http2ConnectionState::send_data_frames 
(this=0x2b4a34a5c190, stream=0x2aac54a69c40)
at Http2ConnectionState.cc:1191
#6  0x0064b57c in Http2Stream::do_io_close (this=0x2aac54a69c40) at 
Http2Stream.cc:330
#7  0x005edb61 in HttpSM::tunnel_handler_ua (this=0x2b4a442dcb40, 
event=103, c=0x2b4a442ddeb8) at HttpSM.cc:3320
#8  0x006439d5 in HttpTunnel::consumer_handler (this=0x2b4a442dde70, 
event=103, c=0x2b4a442ddeb8) at HttpTunnel.cc:1392
#9  0x006442d1 in HttpTunnel::main_handler (this=0x2b4a442dde70, 
event=103, data=0x2aac54a69fd0) at HttpTunnel.cc:1647
#10 0x005145dc in Continuation::handleEvent (this=0x2b4a442dde70, 
event=103, data=0x2aac54a69fd0)
at ../iocore/eventsystem/I_Continuation.h:153
#11 0x0064a8ea in Http2Stream::main_event_handler (this=0x2aac54a69c40, 
event=103, edata=0x2b4a07168100) at Http2Stream.cc:88
#12 0x005145dc in Continuation::handleEvent (this=0x2aac54a69c40, 
event=103, data=0x2b4a07168100)
at ../iocore/eventsystem/I_Continuation.h:153
#13 0x007bc9be in EThread::process_event (this=0x2b49ed3ee010, 
e=0x2b4a07168100, calling_code=103) at UnixEThread.cc:143
#14 0x007bcc2d in EThread::execute (this=0x2b49ed3ee010) at 
UnixEThread.cc:197
#15 0x007bbfd2 in spawn_thread_internal (a=0x15efb60) at Thread.cc:84
#16 0x2b49ec20caa1 in start_thread () from /lib64/libpthread.so.0
#17 0x0032310e893d in clone () from /lib64/libc.so.6
{code}

Digging in at frame 4, stream_list.head is null, but stream->link.prev is 
non-NULL and stream->link.prev->link.next == stream.  So the stream and its 
neighbor still point at each other, but they seem to have been lost from the 
Http2ConnectionState.stream_list.
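
For readers trying to picture the broken invariant: below is a minimal, 
self-contained sketch (simplified stand-in types, not the real ATS intrusive 
list) of the state described above, where a stream's neighbor still points 
back at it while the list head no longer reaches it.

{code}
#include <cstdio>

// Simplified stand-ins for the intrusive list used by
// Http2ConnectionState::stream_list -- NOT the real ATS types.
struct Stream;

struct Link {
  Stream *next;
  Stream *prev;
};

struct Stream {
  int  id;
  Link link;
};

struct StreamList {
  Stream *head;
};

// True if 'node' can still be reached by walking from the list head.
static bool
reachable(const StreamList &list, const Stream *node)
{
  for (const Stream *s = list.head; s != nullptr; s = s->link.next) {
    if (s == node) {
      return true;
    }
  }
  return false;
}

int
main()
{
  Stream a = {1, {nullptr, nullptr}};
  Stream b = {2, {nullptr, nullptr}};
  a.link.next = &b;
  b.link.prev = &a;            // a and b still point at each other...
  StreamList list = {nullptr}; // ...but the head no longer reaches them

  // The state seen in the core: the node's neighbor still points back at it,
  // yet the list head cannot reach it -- an "orphaned" entry.
  bool self_linked = b.link.prev != nullptr && b.link.prev->link.next == &b;
  if (self_linked && !reachable(list, &b)) {
    std::printf("stream %d orphaned from stream_list\n", b.id);
  }
  return 0;
}
{code}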

> Http2ConnectionState::stream_list gets in bad state
> ---
>
> Key: TS-4902
> URL: https://issues.apache.org/jira/browse/TS-4902
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP/2
>Reporter: Susan Hinrichs
>
> Saw this in one of my runs while debugging TS-4900 and TS-4813.  It might 
> have been due to my attempted fix, but I don't think so. 
> I left current master running in production for an hour.  When I returned, 
> CPU utilization had jumped from 70% to 400%.  perf top showed that the 
> majority of time was being spent in Http2ConnectionState::restart_streams.  
> Connected with debugger and two threads were spinning in that method.  In 
> each case this->stream_list.head->link.next == 
> this->stream_list.head->link.prev and this->thread_list.head->link.next != 
> NULL.
> I assume that we are missing some locking and the stream_list is being 
> manipulated by two threads in parallel.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4899) Http2ClientSession object leaks

2016-09-29 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4899.

Resolution: Fixed

> Http2ClientSession object leaks
> ---
>
> Key: TS-4899
> URL: https://issues.apache.org/jira/browse/TS-4899
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP, HTTP/2
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.1.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Running master plus proposed fix for TS-4813 (without this fix, 
> traffic_server crashes very quickly).  
> After a short time (10 minutes), I noticed that the process memory 
> utilization was growing.  I took the machine out of rotation and waited for 
> existing connections to drain.  The memory use summary from SIGUSR1 shows 
> that many (most?) Http2ClientSession, Http2Stream, and Http1ClientSession 
> objects are not being freed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4892) Wrong metrics for proxy.process.http.current_active_client_connections

2016-09-28 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-4892:
---
  Affects Version/s: (was: 7.0.0)
 6.2.0
Backport to Version: 6.2.1

> Wrong metrics for proxy.process.http.current_active_client_connections
> --
>
> Key: TS-4892
> URL: https://issues.apache.org/jira/browse/TS-4892
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Metrics, Network
>Affects Versions: 6.2.0
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> On Docs, running current 7.0.x branch, I'm seeing e.g.
> {code}
> proxy.process.http.current_client_connections 1
> proxy.process.http.current_active_client_connections 14994
> {code}
> This continues to grow indefinitely (but we crash pretty often, so resets).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4892) Wrong metrics for proxy.process.http.current_active_client_connections

2016-09-28 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15530604#comment-15530604
 ] 

Susan Hinrichs commented on TS-4892:


Probably

> Wrong metrics for proxy.process.http.current_active_client_connections
> --
>
> Key: TS-4892
> URL: https://issues.apache.org/jira/browse/TS-4892
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Metrics, Network
>Affects Versions: 7.0.0
>Reporter: Leif Hedstrom
>Assignee: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> On Docs, running current 7.0.x branch, I'm seeing e.g.
> {code}
> proxy.process.http.current_client_connections 1
> proxy.process.http.current_active_client_connections 14994
> {code}
> This continues to grow indefinitely (but we crash pretty often, so resets).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4813) HttpTunnel.cc:1215: failed assertion `p->alive == true || event == HTTP_TUNNEL_EVENT_PRECOMPLETE ...

2016-09-28 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15530406#comment-15530406
 ] 

Susan Hinrichs commented on TS-4813:


I see that our production version of HttpTunnel::producer_handler has an if 
(p->alive) check in the TIMEOUT/ERROR event case.  That explains why we don't 
see this problem.  I will poke around to see whether this is something we added 
that never made it to open source, or whether it was code explicitly removed 
from open source.
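
For illustration, a minimal sketch of the kind of guard described above; the 
types and event names below are placeholders, not the actual 
HttpTunnel::producer_handler signature.

{code}
// Placeholder types and event names -- a sketch of the guard, not the real
// HttpTunnel code.
enum TunnelEvent { EVT_TIMEOUT, EVT_ERROR, EVT_OTHER };

struct Producer {
  bool alive;
};

int
producer_handler_sketch(TunnelEvent event, Producer *p)
{
  switch (event) {
  case EVT_TIMEOUT:
  case EVT_ERROR:
    // A timeout or error can be delivered after the producer has already
    // been shut down; only run the teardown path if it is still alive.
    if (p->alive) {
      p->alive = false;
      // ... error/timeout teardown would go here ...
    }
    return 0;
  default:
    return 0;
  }
}
{code}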

> HttpTunnel.cc:1215: failed assertion `p->alive == true || event == 
> HTTP_TUNNEL_EVENT_PRECOMPLETE ...
> 
>
> Key: TS-4813
> URL: https://issues.apache.org/jira/browse/TS-4813
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Network
>Reporter: Leif Hedstrom
>Priority: Blocker
>  Labels: crash
> Fix For: 7.1.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Seeing this with current (as of right now) master, on docs.trafficserver:
> {code}
> FATAL: HttpTunnel.cc:1215: failed assertion `p->alive == true || event == 
> HTTP_TUNNEL_EVENT_PRECOMPLETE || event == VC_EVENT_EOS || 
> sm->enable_redirection || (p->self_consumer && p->self_consumer->alive == 
> true)`
> traffic_server: using root directory '/opt/ats'
> traffic_server: Aborted (Signal sent by tkill() 13188 99)
> traffic_server - STACK TRACE:
> /opt/ats/lib/libtsutil.so.7(signal_crash_handler(int, siginfo_t*, 
> void*)+0x18)[0x2b6d1031729e]
> /opt/ats/bin/traffic_server(crash_logger_invoke(int, siginfo_t*, 
> void*)+0x155)[0x534104]
> /lib64/libpthread.so.0(+0xf100)[0x2b6d1240f100]
> /lib64/libc.so.6(gsignal+0x37)[0x2b6d12d6e5f7]
> /lib64/libc.so.6(abort+0x148)[0x2b6d12d6fce8]
> /opt/ats/lib/libtsutil.so.7(ink_warning(char const*, ...)+0x0)[0x2b6d102f6f4d]
> /opt/ats/lib/libtsutil.so.7(+0x733a7)[0x2b6d102f13a7]
> /opt/ats/bin/traffic_server(HttpTunnel::producer_handler(int, 
> HttpTunnelProducer*)+0xd14)[0x768a12]
> /opt/ats/bin/traffic_server(HttpTunnel::main_handler(int, 
> void*)+0x13b)[0x76b6e1]
> /opt/ats/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x149)[0x53a621]
> /opt/ats/bin/traffic_server(HttpSM::state_watch_for_client_abort(int, 
> void*)+0x9fe)[0x68c5e6]
> /opt/ats/bin/traffic_server(HttpSM::main_handler(int, void*)+0x58e)[0x69b7ec]
> /opt/ats/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x149)[0x53a621]
> /opt/ats/bin/traffic_server(Http2Stream::main_event_handler(int, 
> void*)+0x59f)[0x79c1df]
> /opt/ats/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x149)[0x53a621]
> /opt/ats/bin/traffic_server(EThread::process_event(Event*, 
> int)+0x2cf)[0xa809fb]
> /opt/ats/bin/traffic_server(EThread::execute()+0x671)[0xa8140f]
> /opt/ats/bin/traffic_server[0xa7f407]
> /lib64/libpthread.so.0(+0x7dc5)[0x2b6d12407dc5]
> /lib64/libc.so.6(clone+0x6d)[0x2b6d12e2fced]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4813) HttpTunnel.cc:1215: failed assertion `p->alive == true || event == HTTP_TUNNEL_EVENT_PRECOMPLETE ...

2016-09-28 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15530389#comment-15530389
 ] 

Susan Hinrichs commented on TS-4813:


My fix seems to have reduced the problem, but I just saw the same trace again 
after running in production for several hours.  Before, it would crash within a 
few minutes.

My most recent crash could be a legitimate lingering transaction that runs into 
the inactivity timeout.  That is a legitimate case and should not cause a 
crash.  The question is why HttpTunnel::producer_handler is not prepared to 
deal with timeouts, and why we had not seen this in previous versions.

> HttpTunnel.cc:1215: failed assertion `p->alive == true || event == 
> HTTP_TUNNEL_EVENT_PRECOMPLETE ...
> 
>
> Key: TS-4813
> URL: https://issues.apache.org/jira/browse/TS-4813
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Network
>Reporter: Leif Hedstrom
>Priority: Blocker
>  Labels: crash
> Fix For: 7.1.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Seeing this with current (as of right now) master, on docs.trafficserver:
> {code}
> FATAL: HttpTunnel.cc:1215: failed assertion `p->alive == true || event == 
> HTTP_TUNNEL_EVENT_PRECOMPLETE || event == VC_EVENT_EOS || 
> sm->enable_redirection || (p->self_consumer && p->self_consumer->alive == 
> true)`
> traffic_server: using root directory '/opt/ats'
> traffic_server: Aborted (Signal sent by tkill() 13188 99)
> traffic_server - STACK TRACE:
> /opt/ats/lib/libtsutil.so.7(signal_crash_handler(int, siginfo_t*, 
> void*)+0x18)[0x2b6d1031729e]
> /opt/ats/bin/traffic_server(crash_logger_invoke(int, siginfo_t*, 
> void*)+0x155)[0x534104]
> /lib64/libpthread.so.0(+0xf100)[0x2b6d1240f100]
> /lib64/libc.so.6(gsignal+0x37)[0x2b6d12d6e5f7]
> /lib64/libc.so.6(abort+0x148)[0x2b6d12d6fce8]
> /opt/ats/lib/libtsutil.so.7(ink_warning(char const*, ...)+0x0)[0x2b6d102f6f4d]
> /opt/ats/lib/libtsutil.so.7(+0x733a7)[0x2b6d102f13a7]
> /opt/ats/bin/traffic_server(HttpTunnel::producer_handler(int, 
> HttpTunnelProducer*)+0xd14)[0x768a12]
> /opt/ats/bin/traffic_server(HttpTunnel::main_handler(int, 
> void*)+0x13b)[0x76b6e1]
> /opt/ats/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x149)[0x53a621]
> /opt/ats/bin/traffic_server(HttpSM::state_watch_for_client_abort(int, 
> void*)+0x9fe)[0x68c5e6]
> /opt/ats/bin/traffic_server(HttpSM::main_handler(int, void*)+0x58e)[0x69b7ec]
> /opt/ats/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x149)[0x53a621]
> /opt/ats/bin/traffic_server(Http2Stream::main_event_handler(int, 
> void*)+0x59f)[0x79c1df]
> /opt/ats/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x149)[0x53a621]
> /opt/ats/bin/traffic_server(EThread::process_event(Event*, 
> int)+0x2cf)[0xa809fb]
> /opt/ats/bin/traffic_server(EThread::execute()+0x671)[0xa8140f]
> /opt/ats/bin/traffic_server[0xa7f407]
> /lib64/libpthread.so.0(+0x7dc5)[0x2b6d12407dc5]
> /lib64/libc.so.6(clone+0x6d)[0x2b6d12e2fced]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4900) con_id member variable is shadowed in Http2ClientSession

2016-09-28 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4900.

Resolution: Fixed

> con_id member variable is shadowed in Http2ClientSession
> 
>
> Key: TS-4900
> URL: https://issues.apache.org/jira/browse/TS-4900
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP/2
>Reporter: Susan Hinrichs
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Noticed this while tracking down another bug.  The debug messages were 
> including the con_id value, which was always null even though the 
> ProxyClientSession class atomically increments a counter to create a new 
> value on new_connection.
> The member variable con_id is declared in both the parent class 
> (ProxyClientSession) and the child class (Http2ClientSession).  Aside from 
> debugging, the con_id member variable doesn't really seem to be used for 
> Http2, so this isn't a critical bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4899) Http2ClientSession object leaks

2016-09-27 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528064#comment-15528064
 ] 

Susan Hinrichs commented on TS-4899:


PR #1055 has the fix.  

> Http2ClientSession object leaks
> ---
>
> Key: TS-4899
> URL: https://issues.apache.org/jira/browse/TS-4899
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP, HTTP/2
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.1.0
>
>
> Running master plus proposed fix for TS-4813 (without this fix, 
> traffic_server crashes very quickly).  
> After a short time (10 minutes), I noticed that the process memory 
> utilization was growing.  I took the machine out of rotation and waited for 
> existing connections to drain.  The memory use summary from SIGUSR1 shows 
> that many (most?) Http2ClientSession, Http2Stream, and Http1ClientSession 
> objects are not being freed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4899) Http2ClientSession object leaks

2016-09-27 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-4899:
---
Summary: Http2ClientSession object leaks  (was: Session and Transaction 
objects leak)

> Http2ClientSession object leaks
> ---
>
> Key: TS-4899
> URL: https://issues.apache.org/jira/browse/TS-4899
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP, HTTP/2
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.1.0
>
>
> Running master plus proposed fix for TS-4813 (without this fix, 
> traffic_server crashes very quickly).  
> After a short time (10 minutes), I noticed that the process memory 
> utilization was growing.  I took the machine out of rotation and waited for 
> existing connections to drain.  The memory use summary from SIGUSR1 shows 
> that many (most?) Http2ClientSession, Http2Stream, and Http1ClientSession 
> objects are not being freed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4902) Http2ConnectionState::stream_list gets in bad state

2016-09-27 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-4902:
--

 Summary: Http2ConnectionState::stream_list gets in bad state
 Key: TS-4902
 URL: https://issues.apache.org/jira/browse/TS-4902
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP/2
Reporter: Susan Hinrichs


Saw this in one of my runs while debugging TS-4900 and TS-4813.  It might have 
been due to my attempted fix, but I don't think so. 

I left current master running in production for an hour.  When I returned, CPU 
utilization had jumped from 70% to 400%.  perf top showed that the majority of 
time was being spent in Http2ConnectionState::restart_streams.  Connected with 
debugger and two threads were spinning in that method.  In each case 
this->stream_list.head->link.next == this->stream_list.head->link.prev and 
this->thread_list.head->link.next != NULL.

I assume that we are missing some locking and the stream_list is being 
manipulated by two threads in parallel.  
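
To make the suspected race concrete, here is a minimal sketch of the locking 
discipline that would prevent it.  ATS serializes access with 
continuation/proxy mutexes rather than std::mutex, so the types below are 
illustrative stand-ins only.

{code}
#include <list>
#include <mutex>

// Illustration only: every reader and writer of the shared stream list must
// hold the same lock, so a concurrent walker can never see a half-updated
// set of links.
struct Http2StreamSketch {
  int id;
};

class ConnectionStateSketch
{
  std::mutex lock; // stands in for the connection/session mutex
  std::list<Http2StreamSketch *> stream_list;

public:
  void
  add_stream(Http2StreamSketch *s)
  {
    std::lock_guard<std::mutex> guard(lock);
    stream_list.push_back(s);
  }

  void
  delete_stream(Http2StreamSketch *s)
  {
    // Unlink under the lock so a concurrent restart_streams() walker cannot
    // observe a partially updated list.
    std::lock_guard<std::mutex> guard(lock);
    stream_list.remove(s);
  }

  void
  restart_streams()
  {
    std::lock_guard<std::mutex> guard(lock);
    for (Http2StreamSketch *s : stream_list) {
      (void)s; // ... resume each stream ...
    }
  }
};
{code}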



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4896) TSHttpTxnClientAddrGet and TSHttpTxnIncomingAddrGet may return NULL

2016-09-26 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-4896:
---
Priority: Minor  (was: Major)

> TSHttpTxnClientAddrGet and TSHttpTxnIncomingAddrGet may return NULL
> ---
>
> Key: TS-4896
> URL: https://issues.apache.org/jira/browse/TS-4896
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Susan Hinrichs
>Priority: Minor
>
> With the cleanup and rearranging done to ensure SSN close occurs after TXN 
> close (TS-4507), the API calls TSHttpTxnClientAddrGet and 
> TSHttpTxnIncomingAddrGet may return NULL.  This can occur in the case where 
> the client connection has terminated but the HttpSM has not yet shut down.  
> We now null out the reference in HttpSM to the client_vc.  These calls fetch 
> the addresses from the client_vc, so if the HttpSM's reference to it has been 
> removed, these APIs will return NULL.
> Locally, we copy these addresses into the ProxyClientSession before the 
> client_vc is disconnected.  We had push back from deployed plugins asking for 
> a short-term fix.  
> Not clear what if anything we want to do in open source.  But wanted people 
> to be aware that this is an issue.
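
For plugin authors, a minimal defensive sketch assuming the standard ts/ts.h 
C API; the plugin and hook registration boilerplate is omitted, and the 
"example" debug tag is made up for illustration.

{code}
#include <ts/ts.h>
#include <netinet/in.h>
#include <arpa/inet.h>

// Defensive handling in a plugin: after the TS-4507 cleanup the client VC
// reference may already be gone when a late hook fires, so treat a NULL
// return from TSHttpTxnClientAddrGet as a normal case instead of
// dereferencing it.
static void
log_client_addr(TSHttpTxn txnp)
{
  const struct sockaddr *addr = TSHttpTxnClientAddrGet(txnp);
  if (addr == nullptr) {
    TSDebug("example", "client address no longer available for this txn");
    return;
  }
  if (addr->sa_family == AF_INET) {
    char buf[INET_ADDRSTRLEN];
    const struct sockaddr_in *sin = reinterpret_cast<const struct sockaddr_in *>(addr);
    inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf));
    TSDebug("example", "client addr %s", buf);
  }
}
{code}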



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4507) It is still possible for SSN_CLOSE hook to be called before TXN_CLOSE hook

2016-09-21 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511300#comment-15511300
 ] 

Susan Hinrichs commented on TS-4507:


Requesting the backport because other stability fixes rely on these changes.

> It is still possible for SSN_CLOSE hook to be called before TXN_CLOSE hook
> --
>
> Key: TS-4507
> URL: https://issues.apache.org/jira/browse/TS-4507
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 7.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> One of our plugins will occasionally crash.  It appears there is still a path 
> for HTTP2 that invokes the SSN_CLOSE hook before the TXN_CLOSE hook.
> Working through solutions that delay the SSN_CLOSE hook until after all the 
> TXN_CLOSE hooks but do not lose the SSN_CLOSE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4507) It is still possible for SSN_CLOSE hook to be called before TXN_CLOSE hook

2016-09-21 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-4507:
---
Backport to Version: 6.2.1

> It is still possible for SSN_CLOSE hook to be called before TXN_CLOSE hook
> --
>
> Key: TS-4507
> URL: https://issues.apache.org/jira/browse/TS-4507
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 7.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> One of our plugins will occasionally crash.  It appears there is still a path 
> for HTTP2 that invokes the SSN_CLOSE hook before the TXN_CLOSE hook.
> Working through solutions that delay the SSN_CLOSE hook until after all the 
> TXN_CLOSE hooks but do not lose the SSN_CLOSE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4819) ATS-6.2.x crashes if the message-body of a chunk is not correctly formatted

2016-09-21 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510882#comment-15510882
 ] 

Susan Hinrichs commented on TS-4819:


Got a fix that addresses Persia's use case, but it may end up closing the 
Client Session (and invoking the SSN_CLOSE hook) before the transaction closes. 
 Poking around in our local code, I see another fix that would preserve the 
hook ordering on shutdown.  Will try to extract that fix and make a PR.

> ATS-6.2.x crashes if the message-body of a chunk is not correctly formatted
> ---
>
> Key: TS-4819
> URL: https://issues.apache.org/jira/browse/TS-4819
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, HTTP
>Reporter: Syeda Persia Aziz
> Fix For: 7.1.0
>
> Attachments: test_post.py
>
>
>  I found this when using the python "requests" library to generate HTTP 
> requests to test ATS. The request method of this library generates an 
> incorrect message body (i.e., one that does not follow the standard format) 
> if both Content-Length and chunked encoding are specified. ATS can handle 
> requests with these two fields being specified; it is the wrong format of 
> the chunk body that makes ATS crash. The test program to reproduce the issue 
> is attached. If the Content-Length is removed from the header, then the 
> library generates the correct format and ATS responds correctly. Ideally, 
> Content-Length and chunked encoding should not be specified together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4819) ATS-6.2.x crashes if the message-body of a chunk is not correctly formatted

2016-09-21 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510374#comment-15510374
 ] 

Susan Hinrichs commented on TS-4819:


I think this bug is related to TS-4664.  On an error, the state machine calls 
ua_session->do_io_close(), which causes the SSN_CLOSE hook to be processed.  

We are running internally with the fix proposed by TS-4664.  This delays the 
ProxyClientSession SSN_CLOSE hook processing until we get to a point where the 
State Machine is gone.  This particular case does not crash in our patched 
5.3.x.  Without this fix, the SSN_CLOSE processing happens immediately, which 
frees the Http1ClientSession (and Http2ClientTransaction object).  In this 
particular case, the freed ua_session object is referenced on the way out of 
the function, causing the crash.  

This error-case freeing might also explain some of the crashes we are seeing in 
6.2/7.0.
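
A rough, hypothetical sketch of the deferral idea referenced above, with 
placeholder types rather than the actual ProxyClientSession code: the session 
only fires SSN_CLOSE once the last transaction has detached.

{code}
// Hypothetical sketch (placeholder types): if do_io_close() is requested
// while transactions are still attached, only mark the session for close,
// and fire the SSN_CLOSE hook after the last transaction detaches.
struct SessionSketch {
  int  transaction_count = 0;
  bool close_pending     = false;

  void
  do_io_close()
  {
    if (transaction_count > 0) {
      close_pending = true; // defer; TXN_CLOSE hooks must run first
    } else {
      fire_ssn_close_hook();
    }
  }

  void
  release_transaction()
  {
    if (--transaction_count == 0 && close_pending) {
      fire_ssn_close_hook(); // now safe: all TXN_CLOSE hooks have run
    }
  }

  void
  fire_ssn_close_hook()
  {
    // invoke the SSN_CLOSE hook continuations here
  }
};
{code}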

> ATS-6.2.x crashes if the message-body of a chunk is not correctly formatted
> ---
>
> Key: TS-4819
> URL: https://issues.apache.org/jira/browse/TS-4819
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, HTTP
>Reporter: Syeda Persia Aziz
> Fix For: 7.1.0
>
> Attachments: test_post.py
>
>
>  I found this when using the python "requests" library to generate HTTP 
> requests to test ATS. The request method of this library generates an 
> incorrect message body (i.e., one that does not follow the standard format) 
> if both Content-Length and chunked encoding are specified. ATS can handle 
> requests with these two fields being specified; it is the wrong format of 
> the chunk body that makes ATS crash. The test program to reproduce the issue 
> is attached. If the Content-Length is removed from the header, then the 
> library generates the correct format and ATS responds correctly. Ideally, 
> Content-Length and chunked encoding should not be specified together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4817) Frequent segfaults in Cache::open_write

2016-09-20 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507046#comment-15507046
 ] 

Susan Hinrichs commented on TS-4817:


I see TransformTerminus on the stack.  Is there a plugin involved?

> Frequent segfaults in Cache::open_write
> ---
>
> Key: TS-4817
> URL: https://issues.apache.org/jira/browse/TS-4817
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cache, HTTP
>Reporter: Mathias Biilmann Christensen
> Fix For: 7.1.0
>
> Attachments: core-dump.tar.gz
>
>
> I've been running some tests with the master branch sending some production 
> traffic to a test server, and am seeing very frequent segfaults.
> I'm running fb8bcbcac3c1d6c7a15340f0093342fb9f207e78
> The stack traces look like this:
> {noformat}
> traffic_server - STACK TRACE: 
> /opt/ts/bin/traffic_server(crash_logger_invoke(int, siginfo_t*, 
> void*)+0xc3)[0x511280]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x2ae12de39330]
> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x2ae12eaa1c37]
> /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x2ae12eaa5028]
> /opt/ts/lib/libtsutil.so.7(ink_warning(char const*, ...)+0x0)[0x2ae12cb11f4e]
> /opt/ts/lib/libtsutil.so.7(ats_base64_encode(unsigned char const*, unsigned 
> long, char*, unsigned long, unsigned long*)+0x0)[0x2ae12cb0f8e4]
> /opt/ts/bin/traffic_server(HttpTunnel::consumer_handler(int, 
> HttpTunnelConsumer*)+0xcc)[0x63efb2]
> /opt/ts/bin/traffic_server(HttpTunnel::main_handler(int, 
> void*)+0x133)[0x63f9b7]
> /opt/ts/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x72)[0x51425a]
> /opt/ts/bin/traffic_server(CacheVC::calluser(int)+0xaa)[0x74f4c8]
> /opt/ts/bin/traffic_server(CacheVC::openWriteMain(int, 
> Event*)+0x3de)[0x759d48]
> /opt/ts/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x72)[0x51425a]
> /opt/ts/bin/traffic_server(CacheVC::callcont(int)+0xfc)[0x74f608]
> /opt/ts/bin/traffic_server(CacheVC::openWriteStartDone(int, 
> Event*)+0x8a4)[0x75a9ee]
> /opt/ts/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x72)[0x51425a]
> /opt/ts/bin/traffic_server(CacheVC::handleReadDone(int, 
> Event*)+0xd15)[0x72a3e7]
> /opt/ts/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x72)[0x51425a]
> /opt/ts/bin/traffic_server(Cache::open_write(Continuation*, ats::CryptoHash 
> const*, HTTPInfo*, long, ats::CryptoHash const*, CacheFragType, char const*, 
> int)+0x6f$
> )[0x75b9df]
> /opt/ts/bin/traffic_server(CacheProcessor::open_write(Continuation*, int, 
> HttpCacheKey const*, bool, HTTPHdr*, HTTPInfo*, long, 
> CacheFragType)+0x129)[0x72eee5]
> /opt/ts/bin/traffic_server(HttpCacheSM::open_write(HttpCacheKey const*, URL*, 
> HTTPHdr*, HTTPInfo*, long, bool, bool)+0x236)[0x5d25a2]
> /opt/ts/bin/traffic_server(HttpSM::do_cache_prepare_action(HttpCacheSM*, 
> HTTPInfo*, bool, bool)+0x352)[0x5f054e]
> /opt/ts/bin/traffic_server(HttpSM::do_cache_prepare_write_transform()+0x69)[0x6013dd]
> /opt/ts/bin/traffic_server(HttpSM::set_next_state()+0x1a27)[0x5fb613]
> /opt/ts/bin/traffic_server(HttpSM::call_transact_and_set_next_state(void 
> (*)(HttpTransact::State*))+0x1ae)[0x5f9be4]
> /opt/ts/bin/traffic_server(HttpSM::state_response_wait_for_transform_read(int,
>  void*)+0x19b)[0x5e35cd]
> /opt/ts/bin/traffic_server(HttpSM::main_handler(int, void*)+0x33a)[0x5e88ac]
> /opt/ts/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x72)[0x51425a]
> /opt/ts/bin/traffic_server(TransformTerminus::handle_event(int, 
> void*)+0x2fe)[0x561096]
> /opt/ts/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x72)[0x51425a]
> /opt/ts/bin/traffic_server(EThread::process_event(Event*, 
> int)+0x136)[0x7b3eac]
> /opt/ts/bin/traffic_server(EThread::execute()+0xdc)[0x7b410c]
> /opt/ts/bin/traffic_server(main+0x139c)[0x546c99]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x2ae12ea8cf45]
> /opt/ts/bin/traffic_server[0x4f9840]
> FATAL: HttpTunnel.cc:1372: failed assertion `c->alive == true`
> {noformat}
> I'm attaching a core dump from a debug build of ATS as well



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4468) http.server_session_sharing.match = both unsafe with HTTPS

2016-09-20 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506995#comment-15506995
 ] 

Susan Hinrichs commented on TS-4468:


I am concerned that we would be reducing client-side session reuse in the 
HTTP/2 case if we get too aggressive in policing the SNI name against the host 
field on new requests.  

Say the client negotiated a new TLS connection with the SNI name set to 
one.bob.com.  It uses HTTP/2 to send a request with HOST set to one.bob.com.  
Then it sends a request with the HOST field set to two.bob.com.  one.bob.com and 
two.bob.com resolve to the same address, and the cert is a wildcard for 
*.bob.com, so the client is reusing the same HTTP/2 session.  If we were as 
stringent as [~oknet] suggests in bullet 2, we would have to reject that client 
request, which would reduce the utility of HTTP/2.

[~jered]'s patch only adapts our reuse to meet the requirements of upstream 
HTTP/1.x servers.

At most, if we decided we needed to be stringent in enforcing SNI/host matching 
on the client side, I would want the ability to opt out.  
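
To make the trade-off concrete, here is an illustrative sketch (placeholder 
types, not the actual session pool code) of the reuse check in suggested fix 3 
below: compare the SNI name the pooled origin session was negotiated with 
against the host of the new request before reusing it.

{code}
#include <string>

// Illustrative reuse check only -- placeholder types, not the ATS session
// pool code.
struct PooledOriginSession {
  std::string sni_name; // SNI sent when the origin connection was created
};

static bool
host_matches_sni(const std::string &request_host, const std::string &sni)
{
  if (request_host == sni) {
    return true;
  }
  // Very rough wildcard handling for names like "*.bob.com"; real certificate
  // name matching is more involved.
  if (sni.size() > 1 && sni[0] == '*') {
    const std::string suffix = sni.substr(1); // ".bob.com"
    return request_host.size() > suffix.size() &&
           request_host.compare(request_host.size() - suffix.size(), suffix.size(), suffix) == 0;
  }
  return false;
}

static bool
can_reuse(const PooledOriginSession &s, const std::string &request_host)
{
  return host_matches_sni(request_host, s.sni_name);
}
{code}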

> http.server_session_sharing.match = both unsafe with HTTPS
> --
>
> Key: TS-4468
> URL: https://issues.apache.org/jira/browse/TS-4468
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP, SSL
>Affects Versions: 6.1.1
>Reporter: Jered Floyd
>Assignee: Susan Hinrichs
> Fix For: 7.0.0
>
> Attachments: TS-4468.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> proxy.config.http.server_session_sharing.match has a default value of "both", 
> which compares IP address, port, and FQDN when determining whether a 
> connection can be reused for further user agent requests.
> The "host" (FQDN) matching does not behave safely when ATS is operating as a 
> reverse proxy.  The compared value is the origin server FQDN after mapping, 
> rather than the initial "Host" target.
> If multiple Hosts map to the same origin server and the scheme is HTTPS, ATS 
> will attempt to reuse a connection that may have an SNI Host that does not 
> match the HTTP Host.  With Apache 2.4 origin servers this results in 400 Bad 
> Request to the user agent.
> PROBLEM REPRODUCTION:
> You can observe this behavior with two mapping rules such as:
> map https://example.com/ https://origin.example.com/
> map https://www.example.com/ https://origin.example.com/
> Non-caching clients alternately fetching URIs from the two targets will see 
> 400 Bad Request responses intermittently.
> WORKAROUND:
> proxy.config.http.server_session_sharing.match should have a default value of 
> "none" when proxy.config.reverse_proxy.enabled is "1"
> SUGGESTED FIXES:
> In order of completeness:
> 1) Do not share server sessions on reverse_proxy requests.
> 2) Do not share server sessions on reverse_proxy requests where scheme is 
> HTTPS.
> 3) Compare target host (SNI host) rather than replacement host when 
> determining if reuse of server session is allowed (when 
> server_session_sharing.match is set to "host" or "both").



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4881) Fix "Warning: Connection leak" log messages.

2016-09-19 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4881.

Resolution: Duplicate

While creating the PR, I see that I already fixed it in open source via TS-4750.

> Fix "Warning: Connection leak" log messages.
> 
>
> Key: TS-4881
> URL: https://issues.apache.org/jira/browse/TS-4881
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> A follow-up fix for TS-3901.
> It looks like there is another drift between the cached server IP value used 
> to insert into the session pool (HttpServerSession::server_ip) and the cached 
> server IP value used to look up from the session pool 
> (NetVConnection::remote_addr).
> I'm guessing that the HttpServerSession value was added to avoid calling 
> vc->get_remote_addr() multiple times. But the vc also caches the remote addr, 
> so calls to vc->get_remote_addr should be pretty cheap. 
> Added a debug print to better understand the difference between 
> vc->get_remote_addr() and server_session->server_ip. In this case, the 
> differences are in the ports.
> DEBUG: (http_ss) remote_ip=xx.xx.xx.xx:3128, server_ip=xx.xx.xx.xx:80
> Specifically, vc->get_remote_addr() is the first value and reflects the 
> "real" port used to connection to the server. The server_ip port is the 
> pre-remap port.
> We ended up using remote_ip for both server session insert and lookup and 
> things have been running solidly for us.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-4881) Fix "Warning: Connection leak" log messages.

2016-09-19 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs reassigned TS-4881:
--

Assignee: Susan Hinrichs

> Fix "Warning: Connection leak" log messages.
> 
>
> Key: TS-4881
> URL: https://issues.apache.org/jira/browse/TS-4881
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
>
> A follow-up fix for TS-3901.
> It looks like there is another drift between the cached server IP value used 
> to insert into the session pool (HttpServerSession::server_ip) and the cached 
> server IP value used to look up from the session pool 
> (NetVConnection::remote_addr).
> I'm guessing that the HttpServerSession value was added to avoid calling 
> vc->get_remote_addr() multiple times. But the vc also caches the remote addr, 
> so calls to vc->get_remote_addr should be pretty cheap. 
> Added a debug print to better understand the difference between 
> vc->get_remote_addr() and server_session->server_ip. In this case, the 
> differences are in the ports.
> DEBUG: (http_ss) remote_ip=xx.xx.xx.xx:3128, server_ip=xx.xx.xx.xx:80
> Specifically, vc->get_remote_addr() is the first value and reflects the 
> "real" port used to connection to the server. The server_ip port is the 
> pre-remap port.
> We ended up using remote_ip for both server session insert and lookup and 
> things have been running solidly for us.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4881) Fix "Warning: Connection leak" log messages.

2016-09-19 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-4881:
--

 Summary: Fix "Warning: Connection leak" log messages.
 Key: TS-4881
 URL: https://issues.apache.org/jira/browse/TS-4881
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP
Reporter: Susan Hinrichs


A follow-up fix for TS-3901.

It looks like there is another drift between the cached server IP value used to 
insert into the session pool (HttpServerSession::server_ip) and the cached 
server IP value used to look up from the session pool 
(NetVConnection::remote_addr).

I'm guessing that the HttpServerSession value was added to avoid calling 
vc->get_remote_addr() multiple times. But the vc also caches the remote addr, 
so calls to vc->get_remote_addr should be pretty cheap. 

Added a debug print to better understand the difference between 
vc->get_remote_addr() and server_session->server_ip. In this case, the 
differences are in the ports.

DEBUG: (http_ss) remote_ip=xx.xx.xx.xx:3128, server_ip=xx.xx.xx.xx:80

Specifically, vc->get_remote_addr() is the first value and reflects the "real" 
port used to connect to the server. The server_ip port is the pre-remap port.

We ended up using remote_ip for both server session insert and lookup and 
things have been running solidly for us.
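
A minimal sketch of the approach described above, with placeholder types: key 
the session pool on the one canonical value (the VC's remote address) for both 
insert and lookup, so the two sides cannot drift apart.

{code}
#include <map>
#include <string>

// Placeholder types -- not the real ATS session pool.
struct ServerSessionSketch {
  std::string remote_addr; // e.g. "xx.xx.xx.xx:3128", taken from the VC
};

class SessionPoolSketch
{
  std::multimap<std::string, ServerSessionSketch *> pool;

public:
  void
  insert(ServerSessionSketch *s)
  {
    pool.emplace(s->remote_addr, s); // keyed on the VC's remote address
  }

  ServerSessionSketch *
  acquire(const std::string &remote_addr)
  {
    auto it = pool.find(remote_addr); // looked up with the same key
    if (it == pool.end()) {
      return nullptr;
    }
    ServerSessionSketch *s = it->second;
    pool.erase(it);
    return s;
  }
};
{code}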





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4856) Default SSL context fails to load.

2016-09-14 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4856.

Resolution: Fixed

> Default SSL context fails to load.
> --
>
> Key: TS-4856
> URL: https://issues.apache.org/jira/browse/TS-4856
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: James Peach
>Assignee: Susan Hinrichs
> Fix For: 7.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This error message appears at startup:
> {noformat}
> [Sep 12 21:07:16.700] Server {0x7f98127d9780} ERROR: failed set default 
> context
> {noformat}
> Out of source context, this error is not especially grammatical.
> The problem seems to be a regression from TS-4671, since the default {{*}} 
> certificate fails to be constructed in {{SSLInitServerContext}} due to the 
> tunnel options check. The default context has neither a certificate nor a 
> tunnel option.
> AFAIK we still need a default certificate to make the TLS negotiation fail 
> when we don't get an actual certificate match.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4769) TSSslServerContextCreate always returns null

2016-09-14 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15490483#comment-15490483
 ] 

Susan Hinrichs commented on TS-4769:


If you are porting this change back, you want to look at the fix for TS-4856 as 
well.

> TSSslServerContextCreate always returns null
> 
>
> Key: TS-4769
> URL: https://issues.apache.org/jira/browse/TS-4769
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL, TS API
>Reporter: Mathias Biilmann Christensen
>Assignee: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The change to SSLInitServerContext in 
> https://github.com/apache/trafficserver/pull/810 breaks the 
> TSSslServerContextCreate API method, since this one calls 
> TSSslServerContextCreate with an empty sslMultCertSettings.
> This means plugins can't create a fresh SSL context and set the certificates 
> themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4664) Crash due to separate event handlers for IO events and plugin events for ClientSession

2016-09-13 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15488290#comment-15488290
 ] 

Susan Hinrichs commented on TS-4664:


In the PR discussion [~jpe...@apache.org] had concerns about changing this 
structure.  Enough other changes were made cleaning up the session/transaction 
shutdown logic that this change may no longer be needed.  I'm going to let it 
ride and reevaluate whether it is needed in the 7.1.0 time frame.

> Crash due to separate event handlers for IO events and plugin events for 
> ClientSession
> --
>
> Key: TS-4664
> URL: https://issues.apache.org/jira/browse/TS-4664
> Project: Traffic Server
>  Issue Type: Bug
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 7.1.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Found while tracking TS-4507 and original fix on that branch.
> Cleaned up handling regular events at the same time as plugin events. The 
> original code relied on the subclasses overriding handle_api_event to handle 
> the regular events, but the handler only handled the TIMEOUT event. Changed 
> that to augment the subclasses' main event handler to call out to 
> state_api_callout in the event of the plugin events.
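
A rough sketch of the dispatch change described above, using placeholder types 
rather than the real session classes: the subclass keeps a single main event 
handler and routes plugin (API) events to the shared state_api_callout path 
instead of relying on a separate overridden handler.

{code}
// Placeholder types and event names -- not the actual ProxyClientSession or
// Http2ClientSession code.
enum EventSketch { EVENT_IO_READ, EVENT_IO_WRITE, EVENT_API_CONTINUE, EVENT_API_ERROR };

struct ClientSessionSketch {
  int
  main_event_handler(EventSketch event, void *data)
  {
    switch (event) {
    case EVENT_API_CONTINUE:
    case EVENT_API_ERROR:
      // Plugin hook events go through the common callout state machine.
      return state_api_callout(event, data);
    default:
      // Regular IO events are handled here as before.
      return handle_io(event, data);
    }
  }

  int
  state_api_callout(EventSketch, void *)
  {
    return 0; // run the pending hook continuations
  }

  int
  handle_io(EventSketch, void *)
  {
    return 0; // normal IO handling
  }
};
{code}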



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4664) Crash due to separate event handlers for IO events and plugin events for ClientSession

2016-09-13 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-4664:
---
Fix Version/s: (was: 7.0.0)
   7.1.0

> Crash due to separate event handlers for IO events and plugin events for 
> ClientSession
> --
>
> Key: TS-4664
> URL: https://issues.apache.org/jira/browse/TS-4664
> Project: Traffic Server
>  Issue Type: Bug
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 7.1.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Found while tracking TS-4507 and original fix on that branch.
> Cleaned up handling regular events at the same time as plugin events. The 
> original code relied on the subclasses overriding handle_api_event to handle 
> the regular events, but the handler only handled the TIMEOUT event. Changed 
> that to augment the subclasses' main event handler to call out to 
> state_api_callout in the event of the plugin events.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4852) TSSslServerContextCreate doesn't get a ticket keyblock.

2016-09-13 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15488280#comment-15488280
 ] 

Susan Hinrichs commented on TS-4852:


I don't see the problem here.  In ssl_callback_session_ticket, if no keyblock 
is associated with the SSLCertContext, the global_default_keyblock is used.  If 
no global_keyblock file was specified, a random keyblock would have been 
created during SSLParseCertificateConfiguration.  

[~persiaAziz] that is the expected flow, right?
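
For reference, a minimal sketch of that fallback flow with placeholder types 
(the real logic lives in ssl_callback_session_ticket):

{code}
// Placeholder types -- a sketch of the fallback described above, not the
// actual SSL ticket code.
struct KeyBlockSketch {
  // ticket key material
};

// Built at startup: either loaded from the configured keyblock file or
// randomly generated when no file is specified.
static KeyBlockSketch *global_default_keyblock = nullptr;

struct CertContextSketch {
  KeyBlockSketch *keyblock = nullptr; // optional per-certificate override
};

static KeyBlockSketch *
ticket_keyblock_for(const CertContextSketch *ctx)
{
  if (ctx != nullptr && ctx->keyblock != nullptr) {
    return ctx->keyblock; // explicitly configured for this cert context
  }
  return global_default_keyblock; // otherwise the global default
}
{code}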

> TSSslServerContextCreate doesn't get a ticket keyblock.
> ---
>
> Key: TS-4852
> URL: https://issues.apache.org/jira/browse/TS-4852
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL, TS API
>Reporter: James Peach
>Assignee: Susan Hinrichs
> Fix For: 7.0.0
>
>
> The ticket keyblock is added in {{SSLParseCertificateConfiguration}}, so when 
> you use {{TSSslServerContextCreate}} you get a SSL context without a 
> keyblock. {{TSSslServerContextCreate}} is supposed to make a default session 
> with all the configured parameters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4703) Adds an API call to retrieve transaction protocol

2016-09-13 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4703.

Resolution: Fixed

> Adds an API call to retrieve transaction protocol
> -
>
> Key: TS-4703
> URL: https://issues.apache.org/jira/browse/TS-4703
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: TS API
>Reporter: Petar Penkov
>Assignee: Susan Hinrichs
> Fix For: 7.0.0
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> It would be useful if there was a way to retrieve the underlying protocol for 
> a given transaction through the tsapi at the very least for plugin logging 
> purposes. This can be achieved with a very simple method since this 
> information is already available internally. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4769) TSSslServerContextCreate always returns null

2016-09-13 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4769.

Resolution: Fixed

> TSSslServerContextCreate always returns null
> 
>
> Key: TS-4769
> URL: https://issues.apache.org/jira/browse/TS-4769
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL, TS API
>Reporter: Mathias Biilmann Christensen
>Assignee: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The change to SSLInitServerContext in 
> https://github.com/apache/trafficserver/pull/810 breaks the 
> TSSslServerContextCreate API method, since this one calls 
> TSSslServerContextCreate with an empty sslMultCertSettings.
> This means plugins can't create a fresh SSL context and set the certificates 
> themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4851) Remove proxy.config.ssl.number.threads remnants.

2016-09-13 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4851.

Resolution: Fixed

> Remove proxy.config.ssl.number.threads remnants.
> 
>
> Key: TS-4851
> URL: https://issues.apache.org/jira/browse/TS-4851
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, SSL
>Reporter: James Peach
>Assignee: Susan Hinrichs
> Fix For: 7.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> TS-3046 left remnants of {{proxy.config.ssl.number.threads}} behind. We 
> should remove them to avoid any confusion.
> {noformat}
> angler:trafficserver.git jpeach$ git grep proxy.config.ssl.number.threads
> ci/jenkins/ats_conf.pl:$recedit->set(conf => 
> "proxy.config.ssl.number.threads", val => "8");
> ci/tsqa/tests/test_keepalive.py:
> cls.configs['records.config']['CONFIG']['proxy.config.ssl.number.threads'] = 
> -1
> ci/tsqa/tests/test_keepalive.py:
> cls.configs['records.config']['CONFIG']['proxy.config.ssl.number.threads'] = 
> -1
> doc/admin-guide/files/records.config.en.rst:.. ts:cv:: CONFIG 
> proxy.config.ssl.number.threads INT -1
> lib/perl/lib/Apache/TS/AdminClient.pm: proxy.config.ssl.number.threads
> proxy/config/records.config.default.in:CONFIG proxy.config.ssl.number.threads 
> INT -1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-4813) HttpTunnel.cc:1215: failed assertion `p->alive == true || event == HTTP_TUNNEL_EVENT_PRECOMPLETE ...

2016-09-12 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486140#comment-15486140
 ] 

Susan Hinrichs edited comment on TS-4813 at 9/13/16 3:46 AM:
-

It would be interesting to dig through a core and see if the client session was 
HTTP/2.  It looks like producer_handler should deal with the timeout cases, but 
it does not appear to.  


was (Author: shinrich):
It would be interesting to dig through a core and see if the client session was 
HTTP/2.

> HttpTunnel.cc:1215: failed assertion `p->alive == true || event == 
> HTTP_TUNNEL_EVENT_PRECOMPLETE ...
> 
>
> Key: TS-4813
> URL: https://issues.apache.org/jira/browse/TS-4813
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Network
>Reporter: Leif Hedstrom
>Priority: Critical
>  Labels: crash
> Fix For: 7.0.0
>
>
> Seeing this with current (as of right now) master, on docs.trafficserver:
> {code}
> FATAL: HttpTunnel.cc:1215: failed assertion `p->alive == true || event == 
> HTTP_TUNNEL_EVENT_PRECOMPLETE || event == VC_EVENT_EOS || 
> sm->enable_redirection || (p->self_consumer && p->self_consumer->alive == 
> true)`
> traffic_server: using root directory '/opt/ats'
> traffic_server: Aborted (Signal sent by tkill() 13188 99)
> traffic_server - STACK TRACE:
> /opt/ats/lib/libtsutil.so.7(signal_crash_handler(int, siginfo_t*, 
> void*)+0x18)[0x2b6d1031729e]
> /opt/ats/bin/traffic_server(crash_logger_invoke(int, siginfo_t*, 
> void*)+0x155)[0x534104]
> /lib64/libpthread.so.0(+0xf100)[0x2b6d1240f100]
> /lib64/libc.so.6(gsignal+0x37)[0x2b6d12d6e5f7]
> /lib64/libc.so.6(abort+0x148)[0x2b6d12d6fce8]
> /opt/ats/lib/libtsutil.so.7(ink_warning(char const*, ...)+0x0)[0x2b6d102f6f4d]
> /opt/ats/lib/libtsutil.so.7(+0x733a7)[0x2b6d102f13a7]
> /opt/ats/bin/traffic_server(HttpTunnel::producer_handler(int, 
> HttpTunnelProducer*)+0xd14)[0x768a12]
> /opt/ats/bin/traffic_server(HttpTunnel::main_handler(int, 
> void*)+0x13b)[0x76b6e1]
> /opt/ats/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x149)[0x53a621]
> /opt/ats/bin/traffic_server(HttpSM::state_watch_for_client_abort(int, 
> void*)+0x9fe)[0x68c5e6]
> /opt/ats/bin/traffic_server(HttpSM::main_handler(int, void*)+0x58e)[0x69b7ec]
> /opt/ats/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x149)[0x53a621]
> /opt/ats/bin/traffic_server(Http2Stream::main_event_handler(int, 
> void*)+0x59f)[0x79c1df]
> /opt/ats/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x149)[0x53a621]
> /opt/ats/bin/traffic_server(EThread::process_event(Event*, 
> int)+0x2cf)[0xa809fb]
> /opt/ats/bin/traffic_server(EThread::execute()+0x671)[0xa8140f]
> /opt/ats/bin/traffic_server[0xa7f407]
> /lib64/libpthread.so.0(+0x7dc5)[0x2b6d12407dc5]
> /lib64/libc.so.6(clone+0x6d)[0x2b6d12e2fced]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4813) HttpTunnel.cc:1215: failed assertion `p->alive == true || event == HTTP_TUNNEL_EVENT_PRECOMPLETE ...

2016-09-12 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486140#comment-15486140
 ] 

Susan Hinrichs commented on TS-4813:


It would be interesting to dig through a core and see if the client session was 
HTTP/2.

> HttpTunnel.cc:1215: failed assertion `p->alive == true || event == 
> HTTP_TUNNEL_EVENT_PRECOMPLETE ...
> 
>
> Key: TS-4813
> URL: https://issues.apache.org/jira/browse/TS-4813
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Network
>Reporter: Leif Hedstrom
>Priority: Critical
>  Labels: crash
> Fix For: 7.0.0
>
>
> Seeing this with current (as of right now) master, on docs.trafficserver:
> {code}
> FATAL: HttpTunnel.cc:1215: failed assertion `p->alive == true || event == 
> HTTP_TUNNEL_EVENT_PRECOMPLETE || event == VC_EVENT_EOS || 
> sm->enable_redirection || (p->self_consumer && p->self_consumer->alive == 
> true)`
> traffic_server: using root directory '/opt/ats'
> traffic_server: Aborted (Signal sent by tkill() 13188 99)
> traffic_server - STACK TRACE:
> /opt/ats/lib/libtsutil.so.7(signal_crash_handler(int, siginfo_t*, 
> void*)+0x18)[0x2b6d1031729e]
> /opt/ats/bin/traffic_server(crash_logger_invoke(int, siginfo_t*, 
> void*)+0x155)[0x534104]
> /lib64/libpthread.so.0(+0xf100)[0x2b6d1240f100]
> /lib64/libc.so.6(gsignal+0x37)[0x2b6d12d6e5f7]
> /lib64/libc.so.6(abort+0x148)[0x2b6d12d6fce8]
> /opt/ats/lib/libtsutil.so.7(ink_warning(char const*, ...)+0x0)[0x2b6d102f6f4d]
> /opt/ats/lib/libtsutil.so.7(+0x733a7)[0x2b6d102f13a7]
> /opt/ats/bin/traffic_server(HttpTunnel::producer_handler(int, 
> HttpTunnelProducer*)+0xd14)[0x768a12]
> /opt/ats/bin/traffic_server(HttpTunnel::main_handler(int, 
> void*)+0x13b)[0x76b6e1]
> /opt/ats/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x149)[0x53a621]
> /opt/ats/bin/traffic_server(HttpSM::state_watch_for_client_abort(int, 
> void*)+0x9fe)[0x68c5e6]
> /opt/ats/bin/traffic_server(HttpSM::main_handler(int, void*)+0x58e)[0x69b7ec]
> /opt/ats/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x149)[0x53a621]
> /opt/ats/bin/traffic_server(Http2Stream::main_event_handler(int, 
> void*)+0x59f)[0x79c1df]
> /opt/ats/bin/traffic_server(Continuation::handleEvent(int, 
> void*)+0x149)[0x53a621]
> /opt/ats/bin/traffic_server(EThread::process_event(Event*, 
> int)+0x2cf)[0xa809fb]
> /opt/ats/bin/traffic_server(EThread::execute()+0x671)[0xa8140f]
> /opt/ats/bin/traffic_server[0xa7f407]
> /lib64/libpthread.so.0(+0x7dc5)[0x2b6d12407dc5]
> /lib64/libc.so.6(clone+0x6d)[0x2b6d12e2fced]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-2987) TS API to identify if the client connection is via HTTP/2

2016-09-12 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-2987.

Resolution: Fixed

> TS API to identify if the client connection is via HTTP/2
> -
>
> Key: TS-2987
> URL: https://issues.apache.org/jira/browse/TS-2987
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: HTTP/2, TS API
>Reporter: Sudheer Vinukonda
>Assignee: Susan Hinrichs
> Fix For: 7.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Need a TS API for the plugins to be able to identify whether the incoming 
> client connection is via SPDY. The plugins would like to relay this 
> information over to the origins which may return a different kind of response 
> for a spdy client vs a non-spdy client. For example, the origins may skip the 
> optimizations such as domain-sharding which work well with non-spdy clients, 
> but, would cancel the benefits of spdy to multiplex requests. 
> The proposed API (the sole credit goes to [~amc]) checks the plugin_tag to 
> identify if the connection is spdy. In the future, the HttpSM data structure 
> may be enhanced to store a spdy indicator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4855) Make const char vs char const consistent in TS API

2016-09-12 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-4855:
--

 Summary: Make const char vs char const consistent in TS API
 Key: TS-4855
 URL: https://issues.apache.org/jira/browse/TS-4855
 Project: Traffic Server
  Issue Type: Improvement
  Components: TS API
Reporter: Susan Hinrichs


Noted by [~jpe...@apache.org] while reviewing the PR for TS-4703.  We use both 
"const char" and "char const" in the TS API prototypes.  We should pick one and be 
consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4842) Log field cqtr is inaccurate for HTTP/2

2016-09-12 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4842.

Resolution: Duplicate

Looks like I bounced a key and created two bugs.

> Log field cqtr is inaccurate for HTTP/2
> ---
>
> Key: TS-4842
> URL: https://issues.apache.org/jira/browse/TS-4842
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP/2, Logging
>Reporter: Susan Hinrichs
>
> cqtr is supposed to log whether a transaction was executed over a previously 
> opened TCP connection.  However, with HTTP/2, this field is created from the 
> Http2ClientSession::get_transact_count() method.  This incorrectly returns 
> the connection id which is constant for the session.  It should instead 
> return the number of streams opened or something like that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4480) Wildcards in certificates should only match one level

2016-09-10 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15479764#comment-15479764
 ] 

Susan Hinrichs commented on TS-4480:


Fix is on PR #992.  Looks like I need to rebase it.

> Wildcards in certificates should only match one level
> -
>
> Key: TS-4480
> URL: https://issues.apache.org/jira/browse/TS-4480
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, SSL
>Reporter: Michael Sokolnicki
>Assignee: Susan Hinrichs
> Fix For: 7.0.0
>
> Attachments: current_patch.diff
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> According to RFC 6125 section 6.4.3:
> {quote}
> If the wildcard character is the only character of the left-most label in the 
> presented identifier, the client SHOULD NOT compare against anything but the 
> left-most label of the reference identifier (e.g., *.example.com would match 
> foo.example.com but not bar.foo.example.com or example.com).
> {quote}
> In the current implementation, certificates are searched for in a trie, and 
> the longest match is returned, but there is no check if that match complies 
> with the above rule. This causes invalid certs to be returned and SSL errors 
> in the browser (in Firefox, we get SSL_ERROR_BAD_CERT_DOMAIN).
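For illustration, a minimal check for the one-level rule quoted above, assuming both names have already been lower-cased (see TS-4459); this is a sketch, not the trie lookup in the actual fix.

{code}
#include <string>

// Returns true only when "pattern" covers "host" per RFC 6125 section 6.4.3:
// a leading "*." matches exactly one additional left-most label.
static bool
wildcard_matches_one_level(const std::string &pattern, const std::string &host)
{
  if (pattern.compare(0, 2, "*.") != 0) {
    return pattern == host;                  // not a wildcard name
  }
  std::string::size_type dot = host.find('.');
  if (dot == std::string::npos || dot == 0) {
    return false;                            // "example.com" must not match "*.example.com"
  }
  // Everything after the first label must match the pattern after "*.", so
  // "bar.foo.example.com" fails against "*.example.com".
  return host.compare(dot + 1, std::string::npos, pattern, 2, std::string::npos) == 0;
}
{code}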



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4459) Force domain names in cert to lower on insert into lookup tree

2016-09-10 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4459.

Resolution: Fixed

Odd, the PR seemed to get de-synced here too.  Finally addressed via PR 972.  
Commit # 12ab6b1f05c16416bc378af568f583b2147325d8

> Force domain names in cert to lower on insert into lookup tree
> --
>
> Key: TS-4459
> URL: https://issues.apache.org/jira/browse/TS-4459
> Project: Traffic Server
>  Issue Type: Bug
>  Components: SSL
>Reporter: Steven Feltner
>Assignee: Susan Hinrichs
> Fix For: 7.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We have certs from a legacy system that were issued with mixed case domain 
> names.  We are migrating this older product over to ATS and found that domain 
> names need to be lower cased before being inserted in the lookup table.
> I will be submitting a pull request to resolve this issue.
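For illustration, a small sketch of the normalization the pull request describes, applied both when certificate names go into the lookup table and to the SNI name before lookup; names here are illustrative, not the committed code.

{code}
#include <algorithm>
#include <cctype>
#include <string>

// Lower-case a hostname so mixed-case certificate names and SNI values
// compare consistently in the lookup table.
static std::string
normalize_hostname(std::string name)
{
  std::transform(name.begin(), name.end(), name.begin(),
                 [](unsigned char c) { return std::tolower(c); });
  return name;
}
{code}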



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4503) MachineFatal should shutdown without cleanup

2016-09-10 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4503.

Resolution: Fixed

Fixed by Commit 0ea702cc3399c4e3061b5d06bcc4cac2bc00f1a1

> MachineFatal should shutdown without cleanup
> 
>
> Key: TS-4503
> URL: https://issues.apache.org/jira/browse/TS-4503
> Project: Traffic Server
>  Issue Type: Bug
>Reporter: Susan Hinrichs
>Assignee: Syeda Persia Aziz
> Fix For: 7.0.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> When MachineFatalClass::raise() is called, it prints a message to the log and 
> then calls exit.  exit causes memory cleanup to be called.  But if we are in 
> such a bad state that MachineFatal is called, it is quite likely that memory is 
> messed up.
> We saw a crash where MachineFatal is called in thread 84.  This stack has the 
> real error.  But the stack that got reported was on thread 1 in class 
> destructor logic.  It would have been better if ATS failed immediately and 
> the stack on thread 84 was reported.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4263) Session tickets keys in ssl_multicert.config do not work with SNI discovered hosts

2016-09-10 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4263.

Resolution: Fixed

Fixed by PR #942

Introduced as commit 1454812852ccc04635ce36db8afb81d7a6e5469e

> Session tickets keys in ssl_multicert.config do not work with SNI discovered 
> hosts
> --
>
> Key: TS-4263
> URL: https://issues.apache.org/jira/browse/TS-4263
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Configuration, SSL
>Reporter: Leif Hedstrom
>Assignee: Syeda Persia Aziz
>  Labels: A
> Fix For: 7.0.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> If you have an ssl_multicert.config without dest_ip= rules, i.e. requiring SNI 
> negotiation to get a TLS session, then you cannot configure the session 
> ticket keys block, at all. Meaning, there's no way to share the keys across 
> more than one machine.
> I went down a bit of a rathole trying to fix this, but it's somewhat ugly. At 
> the point of resuming a session, the SSL callback provides the 16-byte 
> key-name, but the SNI name is seemingly not available at this point.
> A possible solution is to change the lookups to always be on the 16-byte 
> key-name, and keep a separate lookup table for the key blocks. This is in 
> itself a little ugly, because the ownership around SSLCertContext is a 
> little murky. But it seems the cleanest, and definitely seemed to have been 
> the intent from OpenSSL's callback signature.
> Another option, which could not be done in the 6.x release cycle, is to 
> remove the ticket_key_name= option from ssl_multicert.config entirely, and 
> only have a single, global key block configured via records.config.
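For illustration, a sketch of the "separate lookup table keyed on the 16-byte key-name" idea from the description; the committed fix (PR #942) may be structured differently.

{code}
#include <array>
#include <cstring>
#include <map>

struct ssl_ticket_key_t;                         // opaque in this sketch
using TicketKeyName = std::array<unsigned char, 16>;

// Populated while parsing ssl_multicert.config / records.config.
static std::map<TicketKeyName, ssl_ticket_key_t *> ticket_keys_by_name;

// Called from the SSL_CTX_set_tlsext_ticket_key_cb() callback on resumption,
// where only the 16-byte key name (and not the SNI name) is available.
static ssl_ticket_key_t *
find_ticket_key(const unsigned char *keyname)
{
  TicketKeyName k;
  std::memcpy(k.data(), keyname, k.size());
  auto it = ticket_keys_by_name.find(k);
  return it == ticket_keys_by_name.end() ? nullptr : it->second;
}
{code}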



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4152) Build failure when curses is not available

2016-09-09 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478845#comment-15478845
 ] 

Susan Hinrichs commented on TS-4152:


I had problems building on rhel6 after pulling in this change.  traffic_cop 
wouldn't build because it didn't believe I had ncursesw.  Once I reverted 
this commit, all was fine.  I see that it built on jenkins, so I assume there 
are some other intermediate files I need to remove beyond rerunning autoreconf 
and configure.  Once I'm done with my current branch, I'll create a fresh 
git clone, which will hopefully clear things up.

> Build failure when curses is not available
> --
>
> Key: TS-4152
> URL: https://issues.apache.org/jira/browse/TS-4152
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Build
>Reporter: James Peach
>Assignee: Jason Kenny
>  Labels: newbie
> Fix For: 7.0.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> On Centos6 w/ devtoolset-3:
> {code}
> checking for NcursesW wide-character library... yes
> checking for working ncursesw/curses.h... no
> checking for working ncursesw.h... no
> checking for working ncurses.h... no
> configure: WARNING: could not find a working ncursesw/curses.h, ncursesw.h or 
> ncurses.h
> ...
> Making all in traffic_top
> make[2]: Entering directory 
> `/home/vagrant/trafficserver-6.1.0/cmd/traffic_top'
>   CXX  traffic_top.o
> traffic_top.cc:51:2: error: #error "SysV or X/Open-compatible Curses header 
> file required"
>  #error "SysV or X/Open-compatible Curses header file required"
> {code}
> The build is not supposed to try to build {{traffic_top}} if ncurses is not 
> available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4842) Log field cqtr is inaccurate for HTTP/2

2016-09-09 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-4842:
--

 Summary: Log field cqtr is inaccurate for HTTP/2
 Key: TS-4842
 URL: https://issues.apache.org/jira/browse/TS-4842
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP/2, Logging
Reporter: Susan Hinrichs


cqtr is supposed to log whether a transaction was executed over a previously 
opened TCP connection.  However, with HTTP/2, this field is created from the 
Http2ClientSession::get_transact_count() method.  This incorrectly returns the 
connection id which is constant for the session.  It should instead return the 
number of streams opened or something like that.
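For illustration, a sketch of the direction suggested above (count streams rather than return the connection id); this is not the committed change.

{code}
// Simplified stand-in for the HTTP/2 client session, not the real class.
class Http2ClientSessionSketch
{
public:
  void new_stream_opened() { ++stream_count; }

  // cqtr wants "was this a reused connection?", so report how many
  // transactions (streams) this session has carried, not its connection id.
  int get_transact_count() const { return stream_count; }

private:
  int stream_count = 0;
};
{code}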



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4841) Log field cqtr is inaccurate for HTTP/2

2016-09-09 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-4841:
--

 Summary: Log field cqtr is inaccurate for HTTP/2
 Key: TS-4841
 URL: https://issues.apache.org/jira/browse/TS-4841
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP/2, Logging
Reporter: Susan Hinrichs


cqtr is supposed to log whether a transaction was executed over a previously 
opened TCP connection.  However, with HTTP/2, this field is created from the 
Http2ClientSession::get_transact_count() method.  This incorrectly returns the 
connection id which is constant for the session.  It should instead return the 
number of streams opened or something like that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4453) READ_COMPLETE signal is not being sent for short POST's

2016-09-09 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477487#comment-15477487
 ] 

Susan Hinrichs commented on TS-4453:


I re-examined my internal change logs and tried to reproduce the short POST 
problem on an install of the current master using HTTP/2 and HTTP/1.1.  I was 
unable to reproduce the problem.  In both cases, according to the logs, the short 
POST body was processed as a PRECOMPLETE, which means the body was read in 
with the request header.

I'm guessing that the fix got pulled in with TS-4507.  I am going to close the 
issue for now.  We can reopen it if people are still seeing this in the wild.

> READ_COMPLETE signal is not being sent for short POST's
> ---
>
> Key: TS-4453
> URL: https://issues.apache.org/jira/browse/TS-4453
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 7.0.0
>
>
> Reported by [~briang].  Since the TS-3612 commit, he has noticed that the 
> READ_COMPLETE event is not being delivered to the VIO in the case of short 
> POSTs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4453) READ_COMPLETE signal is not being sent for short POST's

2016-09-09 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4453.

Resolution: Cannot Reproduce

> READ_COMPLETE signal is not being sent for short POST's
> ---
>
> Key: TS-4453
> URL: https://issues.apache.org/jira/browse/TS-4453
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 7.0.0
>
>
> Reported by [~briang].  Since the TS-3612 commit, he has noticed that the 
> READ_COMPLETE event is not being delivered to the VIO in the case of short 
> POSTs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4839) Create http.server_session_sharing.match = lax_hostname

2016-09-09 Thread Susan Hinrichs (JIRA)
Susan Hinrichs created TS-4839:
--

 Summary: Create http.server_session_sharing.match = lax_hostname
 Key: TS-4839
 URL: https://issues.apache.org/jira/browse/TS-4839
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP
Reporter: Susan Hinrichs


With TS-4468, session reuse under the "both" and "host" match settings requires 
the SNI name to match as well as the request hostname.  By default this strict 
matching is appropriate, since the origin server may be checking that the SNI 
name matches the HTTP request hostname.

However, some origins do not perform this check, and the strict matching reduces 
session reuse.  For the reverse proxy case, we may want to reintroduce the 
original lax matching for improved performance.
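For illustration, what the proposed setting might look like in records.config; the value name is taken from the issue title and is not a shipped option yet.

{noformat}
# Proposed: keep SNI/hostname checking available but allow the pre-TS-4468
# lax hostname matching for reverse-proxy deployments that want more reuse.
CONFIG proxy.config.http.server_session_sharing.match STRING lax_hostname
{noformat}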



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4453) READ_COMPLETE signal is not being sent for short POST's

2016-09-09 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477385#comment-15477385
 ] 

Susan Hinrichs commented on TS-4453:


I haven't seen this lately.  I think I solved a problem like this internally.  
Let me review my various versions for the fix to make sure it was propagated to 
master.

Are folks still seeing this?

> READ_COMPLETE signal is not being sent for short POST's
> ---
>
> Key: TS-4453
> URL: https://issues.apache.org/jira/browse/TS-4453
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 7.0.0
>
>
> Reported by [~briang].  Since the TS-3612 commit, he has noticed that the 
> READ_COMPLETE event is not being delivered to the VIO in the case of short 
> POSTs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-3046) Phase out proxy.config.ssl.number.threads and force ET_NET threads to handle SSL connections

2016-09-09 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-3046.

Resolution: Fixed

> Phase out proxy.config.ssl.number.threads and force ET_NET threads to handle 
> SSL connections
> 
>
> Key: TS-3046
> URL: https://issues.apache.org/jira/browse/TS-3046
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Reporter: Sudheer Vinukonda
>Assignee: Susan Hinrichs
>Priority: Blocker
>  Labels: incompatible
> Fix For: 7.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This bug is a follow up on completely phasing out ET_SSL threads (refer 
> TS-2574 and TS-3045). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4657) SNI hook sends hook ID for events

2016-09-09 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4657.

Resolution: Fixed

> SNI hook sends hook ID for events
> -
>
> Key: TS-4657
> URL: https://issues.apache.org/jira/browse/TS-4657
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: TS API
>Reporter: James Peach
>Assignee: Susan Hinrichs
> Fix For: 7.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> If you use the {{TS_SSL_SNI_HOOK}} hook, it will send {{TS_SSL_SNI_HOOK}} 
> values as the event. {{TS_SSL_SNI_HOOK}} is not a valid {{TSEvent}} value.
> It is also weird that {{TS_SSL_SNI_HOOK}} and {{TS_SSL_CERT_HOOK}} have the 
> same value. One of these ought to be redundant.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4503) MachineFatal should shutdown without cleanup

2016-09-09 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477205#comment-15477205
 ] 

Susan Hinrichs commented on TS-4503:


Persia is working on this (PR 991).  Don't know why her PR activity isn't 
adding comments here.

> MachineFatal should shutdown without cleanup
> 
>
> Key: TS-4503
> URL: https://issues.apache.org/jira/browse/TS-4503
> Project: Traffic Server
>  Issue Type: Bug
>Reporter: Susan Hinrichs
>Assignee: Syeda Persia Aziz
> Fix For: 7.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When MachineFatalClass::raise() is called, it prints a message to the log and 
> then calls exit.  exit causes memory cleanup to be called.  But if we are in 
> such a bad state that MachineFatal is called, it is quite likely that memory is 
> messed up.
> We saw a crash where MachineFatal is called in thread 84.  This stack has the 
> real error.  But the stack that got reported was on thread 1 in class 
> destructor logic.  It would have been better if ATS failed immediately and 
> the stack on thread 84 was reported.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4468) http.server_session_sharing.match = both unsafe with HTTPS

2016-09-08 Thread Susan Hinrichs (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15475075#comment-15475075
 ] 

Susan Hinrichs commented on TS-4468:


Sorry, disappeared off the thread.  Distracted with other fires and shiny 
things.  [~jered] I took a look at the patch and it seems quite reasonable.  
I'll set up a test build locally and play with it a bit.  I can put up a PR if 
you don't get to it first. 

> http.server_session_sharing.match = both unsafe with HTTPS
> --
>
> Key: TS-4468
> URL: https://issues.apache.org/jira/browse/TS-4468
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP, SSL
>Affects Versions: 6.1.1
>Reporter: Jered Floyd
>Assignee: Susan Hinrichs
> Fix For: 7.0.0
>
> Attachments: TS-4468.patch
>
>
> proxy.config.http.server_session_sharing.match has a default value of "both", 
> which compares IP address, port, and FQDN when determining whether a 
> connection can be reused for further user agent requests.
> The "host" (FQDN) matching does not behave safely when ATS is operating as a 
> reverse proxy.  The compared value is the origin server FQDN after mapping, 
> rather than the initial "Host" target.
> If multiple Hosts map to the same origin server and the scheme is HTTPS, ATS 
> will attempt to reuse a connection that may have an SNI Host that does not 
> match the HTTP Host.  With Apache 2.4 origin servers this results in 400 Bad 
> Request to the user agent.
> PROBLEM REPRODUCTION:
> You can observe this behavior with two mapping rules such as:
> map https://example.com/ https://origin.example.com/
> map https://www.example.com/ https://origin.example.com/
> Non-caching clients alternately fetching URIs from the two targets will see 
> 400 Bad Request responses intermittently.
> WORKAROUND:
> proxy.config.http.server_session_sharing.match should have a default value of 
> "none" when proxy.config.reverse_proxy.enabled is "1"
> SUGGESTED FIXES:
> In order of completeness:
> 1) Do not share server sessions on reverse_proxy requests.
> 2) Do not share server sessions on reverse_proxy requests where scheme is 
> HTTPS.
> 3) Compare target host (SNI host) rather than replacement host when 
> determining if reuse of server session is allowed (when 
> server_session_sharing.match is set to "host" or "both").
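For illustration, the workaround from the description expressed as a records.config override; this is a sketch of an operator-side mitigation, not a fix.

{noformat}
# Avoid reusing origin sessions across differing SNI/Host values by disabling
# session sharing matching for this reverse-proxy deployment.
CONFIG proxy.config.http.server_session_sharing.match STRING none
{noformat}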



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4832) HTTP/2 delete on stack crash

2016-09-08 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-4832:
---
Backport to Version:   (was: 6.2.1)

> HTTP/2 delete on stack crash
> 
>
> Key: TS-4832
> URL: https://issues.apache.org/jira/browse/TS-4832
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP/2
>Reporter: Susan Hinrichs
>
> A crash we had seen in production.
> {code}
> #0  0x0051589a in Mutex_lock (m=0x0, t=0x2b859fe1e010) at 
> ../iocore/eventsystem/I_Lock.h:380
> #1  0x0053d518 in MutexLock::MutexLock (this=0x2b85a571f1d0, am=0x0, 
> t=0x2b859fe1e010) at ../iocore/eventsystem/I_Lock.h:447
> #2  0x00653944 in Http2Stream::initiating_close (this=0x2b86a10995c0) 
> at Http2Stream.cc:367
> #3  0x00650210 in Http2ConnectionState::delete_stream 
> (this=0x2b85ac54ce28, stream=0x2b86a10995c0)
> at Http2ConnectionState.cc:879
> #4  0x0065097d in Http2ConnectionState::send_data_frame 
> (this=0x2b85ac54ce28, stream=0x2b86a10995c0)
> at Http2ConnectionState.cc:986
> #5  0x0064f441 in rcv_window_update_frame (cs=..., cstate=..., 
> frame=...) at Http2ConnectionState.cc:591
> #6  0x0064fcdb in Http2ConnectionState::main_event_handler 
> (this=0x2b85ac54ce28, event=2253, edata=0x2b85a5721610)
> at Http2ConnectionState.cc:753
> #7  0x00515a58 in Continuation::handleEvent (this=0x2b85ac54ce28, 
> event=2253, data=0x2b85a5721610)
> at ../iocore/eventsystem/I_Continuation.h:150
> #8  0x00649ef9 in send_connection_event (cont=0x2b85ac54ce28, 
> event=2253, edata=0x2b85a5721610) at Http2ClientSession.cc:61
> #9  0x0064c247 in Http2ClientSession::state_complete_frame_read 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:482
> #10 0x0064b009 in Http2ClientSession::main_event_handler 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:304
> #11 0x00515a58 in Continuation::handleEvent (this=0x2b85ac54cbf0, 
> event=100, data=0x2b86fc70bff0)
> at ../iocore/eventsystem/I_Continuation.h:150
> #12 0x0064bf8f in Http2ClientSession::state_start_frame_read 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:455
> #13 0x0064b009 in Http2ClientSession::main_event_handler 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:304
> #14 0x00515a58 in Continuation::handleEvent (this=0x2b85ac54cbf0, 
> event=100, data=0x2b86fc70bff0)
> at ../iocore/eventsystem/I_Continuation.h:150
> #15 0x0064c2b8 in Http2ClientSession::state_complete_frame_read 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:487
> #16 0x0064b009 in Http2ClientSession::main_event_handler 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:304
> #17 0x00515a58 in Continuation::handleEvent (this=0x2b85ac54cbf0, 
> event=100, data=0x2b86fc70bff0)
> at ../iocore/eventsystem/I_Continuation.h:150
> #18 0x0064bf8f in Http2ClientSession::state_start_frame_read 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:455
> #19 0x0064b009 in Http2ClientSession::main_event_handler 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:304
> #20 0x00515a58 in Continuation::handleEvent (this=0x2b85ac54cbf0, 
> event=100, data=0x2b86fc70bff0)
> at ../iocore/eventsystem/I_Continuation.h:150
> #21 0x0078679d in read_signal_and_update (event=100, 
> vc=0x2b86fc70bed0) at UnixNetVConnection.cc:148
> #22 0x007895ba in UnixNetVConnection::readSignalAndUpdate 
> (this=0x2b86fc70bed0, event=100) at UnixNetVConnection.cc:1013
> #23 0x0076de67 in SSLNetVConnection::net_read_io 
> (this=0x2b86fc70bed0, nh=0x2b859fe21d60, lthread=0x2b859fe1e010)
> at SSLNetVConnection.cc:576
> ---Type  to continue, or q  to quit---
> #24 0x00780011 in NetHandler::waitForActivity (this=0x2b859fe21d60, 
> timeout=6000) at UnixNet.cc:547
> #25 0x007a7c69 in EThread::execute_regular (this=0x2b859fe1e010) at 
> UnixEThread.cc:266
> #26 0x007a7dac in EThread::execute (this=0x2b859fe1e010) at 
> UnixEThread.cc:304
> #27 0x007a6965 in spawn_thread_internal (a=0x112f520) at Thread.cc:85
> #28 0x2b859d8deaa1 in start_thread () from /lib64/libpthread.so.0
> #29 0x0031d68e893d in clone () from /lib64/libc.so.6
> {code}
> I think the issue was that I removed a schedule_immediate _EOS to do the 
> Http2Stream destroy in an earlier fix and instead did the destroy inline. But 
> that class does not include recursion logic to delay the delete until the 
> stack is unbound. By 

[jira] [Updated] (TS-4830) Http2 write_vio.reenable crash

2016-09-08 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-4830:
---
Backport to Version:   (was: 6.2.1)

> Http2 write_vio.reenable crash
> --
>
> Key: TS-4830
> URL: https://issues.apache.org/jira/browse/TS-4830
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP/2
>Reporter: Susan Hinrichs
>Assignee: Susan Hinrichs
> Fix For: 7.0.0
>
>
> We were seeing crashes with following stack.
> {code}
> gdb) bt
> #0  0x00368400f807 in ?? () from /lib64/libgcc_s.so.1
> #1  0x0036840100b9 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
> #2  0x0036820fe936 in backtrace () from /lib64/libc.so.6
> #3  0x2b3543897af7 in ink_stack_trace_dump () at ink_stack_trace.cc:60
> #4  0x2b3543899cd4 in signal_crash_handler (signo=11) at signals.cc:183
> #5  0x00512b5c in crash_logger_invoke (signo=11, info=0x2b354ce0a4f0, 
> ctx=0x2b354ce0a3c0) at Crash.cc:169
> #6  
> #7  0x0002 in ?? ()
> #8  0x00515a0f in VIO::reenable (this=0x2b36e8020228) at 
> ../iocore/eventsystem/P_VIO.h:112
> #9  0x0064cbc9 in Http2ClientSession::write_reenable 
> (this=0x2b36ad35ed10) at Http2ClientSession.h:195
> #10 0x0064b0e5 in Http2ClientSession::main_event_handler 
> (this=0x2b36ad35ed10, event=2254, 
> edata=0x2b354ce0c980) at Http2ClientSession.cc:298
> #11 0x005159c8 in Continuation::handleEvent (this=0x2b36ad35ed10, 
> event=2254, data=0x2b354ce0c980)
> at ../iocore/eventsystem/I_Continuation.h:150
> #12 0x006508c1 in Http2ConnectionState::send_data_frame 
> (this=0x2b36ad35ef50, stream=0x2b362cfc1120)
> at Http2ConnectionState.cc:977
> #13 0x0065360b in Http2Stream::do_io_close (this=0x2b362cfc1120) at 
> Http2Stream.cc:293
> #14 0x005f5900 in HttpSM::tunnel_handler_ua (this=0x2b369a1b6750, 
> event=103, c=0x2b369a1b7c38)
> at HttpSM.cc:3370
> #15 0x0063f3f6 in HttpTunnel::consumer_handler (this=0x2b369a1b7bf8, 
> event=103, c=0x2b369a1b7c38)
> at HttpTunnel.cc:1326
> #16 0x0063fbbd in HttpTunnel::main_handler (this=0x2b369a1b7bf8, 
> event=103, data=0x2b362cfc14a0)
> at HttpTunnel.cc:1576
> #17 0x005159c8 in Continuation::handleEvent (this=0x2b369a1b7bf8, 
> event=103, data=0x2b362cfc14a0)
> at ../iocore/eventsystem/I_Continuation.h:150
> #18 0x00652b7b in Http2Stream::main_event_handler 
> (this=0x2b362cfc1120, event=103, edata=0x2b36840c0910)
> at Http2Stream.cc:85
> #19 0x005159c8 in Continuation::handleEvent (this=0x2b362cfc1120, 
> event=103, data=0x2b36840c0910)
> at ../iocore/eventsystem/I_Continuation.h:150
> #20 0x007a79ee in EThread::process_event (this=0x2b35468af010, 
> e=0x2b36840c0910, calling_code=103)
> at UnixEThread.cc:145
> #21 0x007a7d8d in EThread::execute_regular (this=0x2b35468af010) at 
> UnixEThread.cc:212
> #22 0x007a8164 in EThread::execute (this=0x2b35468af010) at 
> UnixEThread.cc:304
> #23 0x007a6d1d in spawn_thread_internal (a=0x1775a60) at Thread.cc:85
> #24 0x2b35444b3aa1 in start_thread () from /lib64/libpthread.so.0
> #25 0x0036820e893d in clone () from /lib64/libc.so.6
> {code}
> Digging into the data structures, it looks like the client has already sent 
> an EOS.  Http2ClientSession.client_vc is NULL.  But a write_vio lingers (with 
> cont and nbytes set to 0 but buffer non-NULL).  I would assume we shouldn't 
> be doing a final write in this case if the client has initiated the shutdown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TS-4832) HTTP/2 delete on stack crash

2016-09-08 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs resolved TS-4832.

Resolution: Won't Fix

Nevermind,  I see this issue already got moved up.

> HTTP/2 delete on stack crash
> 
>
> Key: TS-4832
> URL: https://issues.apache.org/jira/browse/TS-4832
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP/2
>Reporter: Susan Hinrichs
>
> A crash we had seen in production.
> {code}
> #0  0x0051589a in Mutex_lock (m=0x0, t=0x2b859fe1e010) at 
> ../iocore/eventsystem/I_Lock.h:380
> #1  0x0053d518 in MutexLock::MutexLock (this=0x2b85a571f1d0, am=0x0, 
> t=0x2b859fe1e010) at ../iocore/eventsystem/I_Lock.h:447
> #2  0x00653944 in Http2Stream::initiating_close (this=0x2b86a10995c0) 
> at Http2Stream.cc:367
> #3  0x00650210 in Http2ConnectionState::delete_stream 
> (this=0x2b85ac54ce28, stream=0x2b86a10995c0)
> at Http2ConnectionState.cc:879
> #4  0x0065097d in Http2ConnectionState::send_data_frame 
> (this=0x2b85ac54ce28, stream=0x2b86a10995c0)
> at Http2ConnectionState.cc:986
> #5  0x0064f441 in rcv_window_update_frame (cs=..., cstate=..., 
> frame=...) at Http2ConnectionState.cc:591
> #6  0x0064fcdb in Http2ConnectionState::main_event_handler 
> (this=0x2b85ac54ce28, event=2253, edata=0x2b85a5721610)
> at Http2ConnectionState.cc:753
> #7  0x00515a58 in Continuation::handleEvent (this=0x2b85ac54ce28, 
> event=2253, data=0x2b85a5721610)
> at ../iocore/eventsystem/I_Continuation.h:150
> #8  0x00649ef9 in send_connection_event (cont=0x2b85ac54ce28, 
> event=2253, edata=0x2b85a5721610) at Http2ClientSession.cc:61
> #9  0x0064c247 in Http2ClientSession::state_complete_frame_read 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:482
> #10 0x0064b009 in Http2ClientSession::main_event_handler 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:304
> #11 0x00515a58 in Continuation::handleEvent (this=0x2b85ac54cbf0, 
> event=100, data=0x2b86fc70bff0)
> at ../iocore/eventsystem/I_Continuation.h:150
> #12 0x0064bf8f in Http2ClientSession::state_start_frame_read 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:455
> #13 0x0064b009 in Http2ClientSession::main_event_handler 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:304
> #14 0x00515a58 in Continuation::handleEvent (this=0x2b85ac54cbf0, 
> event=100, data=0x2b86fc70bff0)
> at ../iocore/eventsystem/I_Continuation.h:150
> #15 0x0064c2b8 in Http2ClientSession::state_complete_frame_read 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:487
> #16 0x0064b009 in Http2ClientSession::main_event_handler 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:304
> #17 0x00515a58 in Continuation::handleEvent (this=0x2b85ac54cbf0, 
> event=100, data=0x2b86fc70bff0)
> at ../iocore/eventsystem/I_Continuation.h:150
> #18 0x0064bf8f in Http2ClientSession::state_start_frame_read 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:455
> #19 0x0064b009 in Http2ClientSession::main_event_handler 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:304
> #20 0x00515a58 in Continuation::handleEvent (this=0x2b85ac54cbf0, 
> event=100, data=0x2b86fc70bff0)
> at ../iocore/eventsystem/I_Continuation.h:150
> #21 0x0078679d in read_signal_and_update (event=100, 
> vc=0x2b86fc70bed0) at UnixNetVConnection.cc:148
> #22 0x007895ba in UnixNetVConnection::readSignalAndUpdate 
> (this=0x2b86fc70bed0, event=100) at UnixNetVConnection.cc:1013
> #23 0x0076de67 in SSLNetVConnection::net_read_io 
> (this=0x2b86fc70bed0, nh=0x2b859fe21d60, lthread=0x2b859fe1e010)
> at SSLNetVConnection.cc:576
> ---Type  to continue, or q  to quit---
> #24 0x00780011 in NetHandler::waitForActivity (this=0x2b859fe21d60, 
> timeout=6000) at UnixNet.cc:547
> #25 0x007a7c69 in EThread::execute_regular (this=0x2b859fe1e010) at 
> UnixEThread.cc:266
> #26 0x007a7dac in EThread::execute (this=0x2b859fe1e010) at 
> UnixEThread.cc:304
> #27 0x007a6965 in spawn_thread_internal (a=0x112f520) at Thread.cc:85
> #28 0x2b859d8deaa1 in start_thread () from /lib64/libpthread.so.0
> #29 0x0031d68e893d in clone () from /lib64/libc.so.6
> {code}
> I think the issue was that I removed a schedule_immediate _EOS to do the 
> Http2Stream destroy in an earlier fix and instead did the destroy inline. But 
> that class does not include recursion logic to delay the 

[jira] [Updated] (TS-4832) HTTP/2 delete on stack crash

2016-09-08 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-4832:
---
Fix Version/s: 7.0.0

> HTTP/2 delete on stack crash
> 
>
> Key: TS-4832
> URL: https://issues.apache.org/jira/browse/TS-4832
> Project: Traffic Server
>  Issue Type: Bug
>  Components: HTTP/2
>Reporter: Susan Hinrichs
>
> A crash we had seen in production.
> {code}
> #0  0x0051589a in Mutex_lock (m=0x0, t=0x2b859fe1e010) at 
> ../iocore/eventsystem/I_Lock.h:380
> #1  0x0053d518 in MutexLock::MutexLock (this=0x2b85a571f1d0, am=0x0, 
> t=0x2b859fe1e010) at ../iocore/eventsystem/I_Lock.h:447
> #2  0x00653944 in Http2Stream::initiating_close (this=0x2b86a10995c0) 
> at Http2Stream.cc:367
> #3  0x00650210 in Http2ConnectionState::delete_stream 
> (this=0x2b85ac54ce28, stream=0x2b86a10995c0)
> at Http2ConnectionState.cc:879
> #4  0x0065097d in Http2ConnectionState::send_data_frame 
> (this=0x2b85ac54ce28, stream=0x2b86a10995c0)
> at Http2ConnectionState.cc:986
> #5  0x0064f441 in rcv_window_update_frame (cs=..., cstate=..., 
> frame=...) at Http2ConnectionState.cc:591
> #6  0x0064fcdb in Http2ConnectionState::main_event_handler 
> (this=0x2b85ac54ce28, event=2253, edata=0x2b85a5721610)
> at Http2ConnectionState.cc:753
> #7  0x00515a58 in Continuation::handleEvent (this=0x2b85ac54ce28, 
> event=2253, data=0x2b85a5721610)
> at ../iocore/eventsystem/I_Continuation.h:150
> #8  0x00649ef9 in send_connection_event (cont=0x2b85ac54ce28, 
> event=2253, edata=0x2b85a5721610) at Http2ClientSession.cc:61
> #9  0x0064c247 in Http2ClientSession::state_complete_frame_read 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:482
> #10 0x0064b009 in Http2ClientSession::main_event_handler 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:304
> #11 0x00515a58 in Continuation::handleEvent (this=0x2b85ac54cbf0, 
> event=100, data=0x2b86fc70bff0)
> at ../iocore/eventsystem/I_Continuation.h:150
> #12 0x0064bf8f in Http2ClientSession::state_start_frame_read 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:455
> #13 0x0064b009 in Http2ClientSession::main_event_handler 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:304
> #14 0x00515a58 in Continuation::handleEvent (this=0x2b85ac54cbf0, 
> event=100, data=0x2b86fc70bff0)
> at ../iocore/eventsystem/I_Continuation.h:150
> #15 0x0064c2b8 in Http2ClientSession::state_complete_frame_read 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:487
> #16 0x0064b009 in Http2ClientSession::main_event_handler 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:304
> #17 0x00515a58 in Continuation::handleEvent (this=0x2b85ac54cbf0, 
> event=100, data=0x2b86fc70bff0)
> at ../iocore/eventsystem/I_Continuation.h:150
> #18 0x0064bf8f in Http2ClientSession::state_start_frame_read 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:455
> #19 0x0064b009 in Http2ClientSession::main_event_handler 
> (this=0x2b85ac54cbf0, event=100, edata=0x2b86fc70bff0)
> at Http2ClientSession.cc:304
> #20 0x00515a58 in Continuation::handleEvent (this=0x2b85ac54cbf0, 
> event=100, data=0x2b86fc70bff0)
> at ../iocore/eventsystem/I_Continuation.h:150
> #21 0x0078679d in read_signal_and_update (event=100, 
> vc=0x2b86fc70bed0) at UnixNetVConnection.cc:148
> #22 0x007895ba in UnixNetVConnection::readSignalAndUpdate 
> (this=0x2b86fc70bed0, event=100) at UnixNetVConnection.cc:1013
> #23 0x0076de67 in SSLNetVConnection::net_read_io 
> (this=0x2b86fc70bed0, nh=0x2b859fe21d60, lthread=0x2b859fe1e010)
> at SSLNetVConnection.cc:576
> ---Type  to continue, or q  to quit---
> #24 0x00780011 in NetHandler::waitForActivity (this=0x2b859fe21d60, 
> timeout=6000) at UnixNet.cc:547
> #25 0x007a7c69 in EThread::execute_regular (this=0x2b859fe1e010) at 
> UnixEThread.cc:266
> #26 0x007a7dac in EThread::execute (this=0x2b859fe1e010) at 
> UnixEThread.cc:304
> #27 0x007a6965 in spawn_thread_internal (a=0x112f520) at Thread.cc:85
> #28 0x2b859d8deaa1 in start_thread () from /lib64/libpthread.so.0
> #29 0x0031d68e893d in clone () from /lib64/libc.so.6
> {code}
> I think the issue was that I removed a schedule_immediate _EOS to do the 
> Http2Stream destroy in an earlier fix and instead did the destroy inline. But 
> that class does not include recursion logic to delay the delete until the 
> stack is unbound. By sending the 
