[jira] [Work logged] (TS-4838) After TS-3612 restructuring, very slow SSL sessions and HttpSM::state_raw_http_server_open errors

2016-11-07 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4838?focusedWorklogId=31734=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-31734
 ]

ASF GitHub Bot logged work on TS-4838:
--

Author: ASF GitHub Bot
Created on: 07/Nov/16 17:36
Start Date: 07/Nov/16 17:36
Worklog Time Spent: 10m 
  Work Description: Github user PSUdaemon closed the pull request at:

https://github.com/apache/trafficserver/pull/1206


Issue Time Tracking
---

Worklog Id: (was: 31734)
Time Spent: 1.5h  (was: 1h 20m)

> After TS-3612 restructuring, very slow SSL sessions and 
> HttpSM::state_raw_http_server_open errors
> -
>
> Key: TS-4838
> URL: https://issues.apache.org/jira/browse/TS-4838
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, SSL
>Affects Versions: 6.2.0, 7.0.0
> Environment: CentOS/RHEL 7.2, x86_64
>Reporter: Dimitry Andric
>Assignee: James Peach
> Fix For: 6.2.1, 7.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We have been using TrafficServer 5.3.2 for quite some time now, for forward 
> proxying of a number of different HTML5 applications, one of the most 
> important ones being YouTube's TV interface, e.g. https://youtube.com/tv.  
> This is all hosted on CentOS 7.2 x86_64 machines.
> We recently upgraded to 6.2.0, and then started having problems with the 
> CONNECT requests for port 443 which are generated by the YouTube app.  It 
> seems like these connections are "stalled" somehow, sometimes for >10 
> seconds.  Meanwhile, {{diags.log}} is getting spammed with lots of the following:
> {noformat}
> [Sep  9 16:45:47.683] Server {0x2b3e50c0b700} ERROR: 
> [HttpSM::state_raw_http_server_open] event: EVENT_INTERVAL state: 0 
> server_entry: (nil)
> {noformat}
> Requests that seem to stall are most likely all of the CONNECT kind, e.g.:
> {noformat}
> 1473432382.474 30405 127.0.0.1 TCP_MISS/200 4916 CONNECT 
> ad.doubleclick.net:443/ - DIRECT/ad.doubleclick.net -
> 1473432382.481 30411 127.0.0.1 TCP_MISS/200 54024 CONNECT i9.ytimg.com:443/ - 
> DIRECT/i9.ytimg.com -
> 1473432382.486 30417 127.0.0.1 TCP_MISS/200 5389 CONNECT 
> pagead2.googlesyndication.com:443/ - DIRECT/pagead2.googlesyndication.com -
> 1473432390.451 42772 127.0.0.1 TCP_MISS/200 5198 CONNECT csi.gstatic.com:443/ 
> - DIRECT/csi.gstatic.com -
> 1473432390.459 43833 127.0.0.1 TCP_MISS/200 11610 CONNECT 
> www.youtube.com:443/ - DIRECT/www.youtube.com -
> 1473432390.483 38414 127.0.0.1 TCP_MISS/200 2870983 CONNECT 
> r17---sn-5hnednl7.googlevideo.com:443/ - 
> DIRECT/r17---sn-5hnednl7.googlevideo.com -
> {noformat}
> As part of figuring out how to diagnose this, I tried a downgrade to 
> TrafficServer 6.1.1, and this made all the stalling and problems disappear.  
> Afterwards, I did a {{git bisect}} on master, from the branch point of 6.1 to 
> the branch point of 6.2, and I ended up at [commit 
> af76977|https://git-dual.apache.org/repos/asf?p=trafficserver.git;a=commit;h=af76977adb9f3c0296a232688bbcb5a1421a6768]:
> {quote}
> Author: Susan Hinrichs 
> Date:   Wed Apr 13 19:57:39 2016 +
> TS-3612: Restructure client session and transaction processing. This 
> closes #570.
> {quote}
> Unfortunately, this is quite a big refactoring commit, so it is not possible 
> to revert it individually to see whether it improves things.
> I read TS-3612 and #570, and I saw there were also a number of follow-up 
> commits to fix various problems with it, but this particular problem of 
> stalled SSL connections is still occurring with master as of today, 
> 2016-09-09.
> I realize that this report is still missing reproduction details, since it is 
> tricky to analyze what the YouTube app is doing, and simple {{curl https://}} 
> tests appear to go fast, and don't seem to trigger any stalling.  But YouTube 
> itself is pretty easy to try out, I think.
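
The stalled requests quoted above can be picked out of an access log mechanically. The following is a minimal sketch, assuming the entries use a Squid-style layout in which the second field is the elapsed time in milliseconds and the sixth is the HTTP method. The names `parse_line` and `slow_connects` and the 10-second threshold are illustrative choices, not part of Traffic Server, and the sample entries have been rejoined onto single lines because the mail wrapped them.

```python
from typing import Iterable, List, NamedTuple


class AccessEntry(NamedTuple):
    timestamp: float   # seconds since the epoch
    elapsed_ms: int    # assumed: elapsed time in milliseconds
    client: str
    result: str        # e.g. "TCP_MISS/200"
    size: int
    method: str
    url: str


def parse_line(line: str) -> AccessEntry:
    """Split one whitespace-separated log entry into named fields."""
    f = line.split()
    return AccessEntry(float(f[0]), int(f[1]), f[2], f[3], int(f[4]), f[5], f[6])


def slow_connects(lines: Iterable[str], threshold_ms: int = 10_000) -> List[AccessEntry]:
    """Return CONNECT entries whose elapsed time is at least threshold_ms."""
    entries = (parse_line(line) for line in lines if line.strip())
    return [e for e in entries if e.method == "CONNECT" and e.elapsed_ms >= threshold_ms]


# Two entries copied from the report, unwrapped to one line each.
SAMPLE = [
    "1473432382.474 30405 127.0.0.1 TCP_MISS/200 4916 CONNECT "
    "ad.doubleclick.net:443/ - DIRECT/ad.doubleclick.net -",
    "1473432390.459 43833 127.0.0.1 TCP_MISS/200 11610 CONNECT "
    "www.youtube.com:443/ - DIRECT/www.youtube.com -",
]

if __name__ == "__main__":
    for entry in slow_connects(SAMPLE):
        print(f"{entry.url} took {entry.elapsed_ms / 1000:.1f}s")
```

If the elapsed-time assumption holds, every entry in the excerpt took 30 seconds or more, which is consistent with the stalls described in the report.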



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work logged] (TS-4838) After TS-3612 restructuring, very slow SSL sessions and HttpSM::state_raw_http_server_open errors

2016-11-07 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4838?focusedWorklogId=31733=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-31733
 ]

ASF GitHub Bot logged work on TS-4838:
--

Author: ASF GitHub Bot
Created on: 07/Nov/16 17:30
Start Date: 07/Nov/16 17:30
Worklog Time Spent: 10m 
  Work Description: Github user atsci commented on the issue:

https://github.com/apache/trafficserver/pull/1206
  
FreeBSD build *successful*! See 
https://ci.trafficserver.apache.org/job/Github-FreeBSD/1189/ for details.
 



Issue Time Tracking
---

Worklog Id: (was: 31733)
Time Spent: 1h 20m  (was: 1h 10m)

> After TS-3612 restructuring, very slow SSL sessions and 
> HttpSM::state_raw_http_server_open errors
> -
>
> Key: TS-4838
> URL: https://issues.apache.org/jira/browse/TS-4838
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, SSL
>Affects Versions: 6.2.0, 7.0.0
> Environment: CentOS/RHEL 7.2, x86_64
>Reporter: Dimitry Andric
>Assignee: James Peach
> Fix For: 7.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (TS-4838) After TS-3612 restructuring, very slow SSL sessions and HttpSM::state_raw_http_server_open errors

2016-11-07 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4838?focusedWorklogId=31732=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-31732
 ]

ASF GitHub Bot logged work on TS-4838:
--

Author: ASF GitHub Bot
Created on: 07/Nov/16 17:26
Start Date: 07/Nov/16 17:26
Worklog Time Spent: 10m 
  Work Description: Github user atsci commented on the issue:

https://github.com/apache/trafficserver/pull/1206
  
Linux build *successful*! See 
https://ci.trafficserver.apache.org/job/Github-Linux/1082/ for details.
 



Issue Time Tracking
---

Worklog Id: (was: 31732)
Time Spent: 1h 10m  (was: 1h)

> After TS-3612 restructuring, very slow SSL sessions and 
> HttpSM::state_raw_http_server_open errors
> -
>
> Key: TS-4838
> URL: https://issues.apache.org/jira/browse/TS-4838
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, SSL
>Affects Versions: 6.2.0, 7.0.0
> Environment: CentOS/RHEL 7.2, x86_64
>Reporter: Dimitry Andric
>Assignee: James Peach
> Fix For: 7.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (TS-4838) After TS-3612 restructuring, very slow SSL sessions and HttpSM::state_raw_http_server_open errors

2016-11-07 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4838?focusedWorklogId=31730=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-31730
 ]

ASF GitHub Bot logged work on TS-4838:
--

Author: ASF GitHub Bot
Created on: 07/Nov/16 17:16
Start Date: 07/Nov/16 17:16
Worklog Time Spent: 10m 
  Work Description: GitHub user PSUdaemon opened a pull request:

https://github.com/apache/trafficserver/pull/1206

TS-4838: CONNECT requests get forgotten across threads.

What happens here is that ProxyClientTransaction::adjust_thread
reschedules the transaction onto a new thread at the start of
HttpSM::do_http_server_open.

Unfortunately, at this point the default handler is
HttpSM::state_raw_http_server_open. When the transaction is
rescheduled, the default handler runs, and receives the EVENT_INTERVAL
that it so fortuitously logs an error for. We have never actually
completed do_http_server_open, so we never make any more progress
on this transaction.

(cherry picked from commit 8fddd77c085d1a64f11de61bb42a50562cd23229)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/PSUdaemon/trafficserver bp-1002

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/trafficserver/pull/1206.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1206


commit c5ab2e686ac0dad4ebe89573cdcc0b2d2a6359a4
Author: James Peach 
Date:   2016-09-09T22:29:05Z

TS-4838: CONNECT requests get forgotten across threads.

What happens here is that ProxyClientTransaction::adjust_thread
reschedules the transaction onto a new thread at the start of
HttpSM::do_http_server_open.

Unfortunately, at this point the default handler is
HttpSM::state_raw_http_server_open. When the transaction is
rescheduled, the default handler runs, and receives the EVENT_INTERVAL
that it so fortuitously logs an error for. We have never actually
completed do_http_server_open, so we never make any more progress
on this transaction.

(cherry picked from commit 8fddd77c085d1a64f11de61bb42a50562cd23229)
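
The failure mode described in this commit message can be illustrated with a toy model. This is a sketch only, written in Python rather than the project's C++: `ToyTransaction`, `simulate`, and the `handle_reschedule` flag are hypothetical names, and the "fixed" branch shows one plausible shape for a fix (re-entering the open routine when the reschedule event arrives), not the actual patch.

```python
EVENT_INTERVAL = "EVENT_INTERVAL"


class ToyTransaction:
    """Hypothetical stand-in for a state machine with a default event handler."""

    def __init__(self, handle_reschedule: bool):
        self.handle_reschedule = handle_reschedule
        self.on_target_thread = False
        self.open_started = False
        self.log = []

    def do_server_open(self):
        if not self.on_target_thread:
            # Model of the thread hand-off: return early and let the event
            # loop deliver EVENT_INTERVAL to the default handler later.
            return EVENT_INTERVAL
        self.open_started = True
        return None

    def default_handler(self, event):
        if event == EVENT_INTERVAL and self.handle_reschedule:
            # Assumed fix: treat the event as "resume the open".
            self.do_server_open()
        elif event == EVENT_INTERVAL:
            # Buggy behaviour: the event is logged and nothing else happens.
            self.log.append("ERROR: event: EVENT_INTERVAL")


def simulate(handle_reschedule: bool) -> ToyTransaction:
    """Run one open attempt that has to hop threads first."""
    txn = ToyTransaction(handle_reschedule)
    pending = txn.do_server_open()
    if pending is not None:
        txn.on_target_thread = True   # the hop has happened
        txn.default_handler(pending)  # the event loop delivers the event
    return txn


if __name__ == "__main__":
    print(simulate(False).open_started, simulate(False).log)
    print(simulate(True).open_started, simulate(True).log)
```

In the unfixed run the open never starts and only the error line is recorded, which mirrors the stalled CONNECT requests and the `diags.log` noise in the report.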




Issue Time Tracking
---

Worklog Id: (was: 31730)
Time Spent: 1h  (was: 50m)

> After TS-3612 restructuring, very slow SSL sessions and 
> HttpSM::state_raw_http_server_open errors
> -
>
> Key: TS-4838
> URL: https://issues.apache.org/jira/browse/TS-4838
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, SSL
>Affects Versions: 6.2.0, 7.0.0
> Environment: CentOS/RHEL 7.2, x86_64
>Reporter: Dimitry Andric
>Assignee: James Peach
> Fix For: 7.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>

[jira] [Work logged] (TS-4838) After TS-3612 restructuring, very slow SSL sessions and HttpSM::state_raw_http_server_open errors

2016-09-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4838?focusedWorklogId=28904=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-28904
 ]

ASF GitHub Bot logged work on TS-4838:
--

Author: ASF GitHub Bot
Created on: 13/Sep/16 03:12
Start Date: 13/Sep/16 03:12
Worklog Time Spent: 10m 
  Work Description: Github user jpeach closed the pull request at:

https://github.com/apache/trafficserver/pull/1002


Issue Time Tracking
---

Worklog Id: (was: 28904)
Time Spent: 50m  (was: 40m)

> After TS-3612 restructuring, very slow SSL sessions and 
> HttpSM::state_raw_http_server_open errors
> -
>
> Key: TS-4838
> URL: https://issues.apache.org/jira/browse/TS-4838
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, SSL
>Affects Versions: 6.2.0, 7.0.0
> Environment: CentOS/RHEL 7.2, x86_64
>Reporter: Dimitry Andric
>Assignee: James Peach
> Fix For: 7.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (TS-4838) After TS-3612 restructuring, very slow SSL sessions and HttpSM::state_raw_http_server_open errors

2016-09-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4838?focusedWorklogId=28671=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-28671
 ]

ASF GitHub Bot logged work on TS-4838:
--

Author: ASF GitHub Bot
Created on: 09/Sep/16 22:51
Start Date: 09/Sep/16 22:51
Worklog Time Spent: 10m 
  Work Description: Github user atsci commented on the issue:

https://github.com/apache/trafficserver/pull/1002
  
FreeBSD build *successful*! See 
https://ci.trafficserver.apache.org/job/Github-FreeBSD/769/ for details.
 



Issue Time Tracking
---

Worklog Id: (was: 28671)
Time Spent: 40m  (was: 0.5h)

> After TS-3612 restructuring, very slow SSL sessions and 
> HttpSM::state_raw_http_server_open errors
> -
>
> Key: TS-4838
> URL: https://issues.apache.org/jira/browse/TS-4838
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, SSL
>Affects Versions: 6.2.0, 7.0.0
> Environment: CentOS/RHEL 7.2, x86_64
>Reporter: Dimitry Andric
>  Time Spent: 40m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (TS-4838) After TS-3612 restructuring, very slow SSL sessions and HttpSM::state_raw_http_server_open errors

2016-09-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4838?focusedWorklogId=28670=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-28670
 ]

ASF GitHub Bot logged work on TS-4838:
--

Author: ASF GitHub Bot
Created on: 09/Sep/16 22:50
Start Date: 09/Sep/16 22:50
Worklog Time Spent: 10m 
  Work Description: Github user atsci commented on the issue:

https://github.com/apache/trafficserver/pull/1002
  
Linux build *successful*! See 
https://ci.trafficserver.apache.org/job/Github-Linux/665/ for details.
 



Issue Time Tracking
---

Worklog Id: (was: 28670)
Time Spent: 0.5h  (was: 20m)

> After TS-3612 restructuring, very slow SSL sessions and 
> HttpSM::state_raw_http_server_open errors
> -
>
> Key: TS-4838
> URL: https://issues.apache.org/jira/browse/TS-4838
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, SSL
>Affects Versions: 6.2.0, 7.0.0
> Environment: CentOS/RHEL 7.2, x86_64
>Reporter: Dimitry Andric
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>


[jira] [Work logged] (TS-4838) After TS-3612 restructuring, very slow SSL sessions and HttpSM::state_raw_http_server_open errors

2016-09-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4838?focusedWorklogId=28669=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-28669
 ]

ASF GitHub Bot logged work on TS-4838:
--

Author: ASF GitHub Bot
Created on: 09/Sep/16 22:47
Start Date: 09/Sep/16 22:47
Worklog Time Spent: 10m 
  Work Description: Github user shinrich commented on the issue:

https://github.com/apache/trafficserver/pull/1002
  
Looks right.  Had fixed it in HttpSM::state_http_server_open but missed the 
raw open case.


Issue Time Tracking
---

Worklog Id: (was: 28669)
Time Spent: 20m  (was: 10m)

> After TS-3612 restructuring, very slow SSL sessions and 
> HttpSM::state_raw_http_server_open errors
> -
>
> Key: TS-4838
> URL: https://issues.apache.org/jira/browse/TS-4838
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core, SSL
>Affects Versions: 6.2.0, 7.0.0
> Environment: CentOS/RHEL 7.2, x86_64
>Reporter: Dimitry Andric
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have been using TrafficServer 5.3.2 for quite some time now, for forward 
> proxying of a number of different HTML5 applications, one of the most 
> important ones being YouTube's TV interface, e.g. https://youtube.com/tv.  
> This is all hosted on CentOS 7.2 x86_64 machines.
> We recently upgraded to 6.2.0, and then started having problems with the 
> CONNECT requests for port 443 which are generated by the YouTube app.  It 
> seems like these connections are "stalled" somehow, sometimes for >10 
> seconds.  Meanwhile, {{diags.log}} is getting spammed lots of the following:
> {noformat}
> [Sep  9 16:45:47.683] Server {0x2b3e50c0b700} ERROR: 
> [HttpSM::state_raw_http_server_open] event: EVENT_INTERVAL state: 0 
> server_entry: (nil)
> {noformat}
> Requests that seem to stall are most likely all of the CONNECT kind, e.g.:
> {noformat}
> 1473432382.474 30405 127.0.0.1 TCP_MISS/200 4916 CONNECT 
> ad.doubleclick.net:443/ - DIRECT/ad.doubleclick.net -
> 1473432382.481 30411 127.0.0.1 TCP_MISS/200 54024 CONNECT i9.ytimg.com:443/ - 
> DIRECT/i9.ytimg.com -
> 1473432382.486 30417 127.0.0.1 TCP_MISS/200 5389 CONNECT 
> pagead2.googlesyndication.com:443/ - DIRECT/pagead2.googlesyndication.com -
> 1473432390.451 42772 127.0.0.1 TCP_MISS/200 5198 CONNECT csi.gstatic.com:443/ 
> - DIRECT/csi.gstatic.com -
> 1473432390.459 43833 127.0.0.1 TCP_MISS/200 11610 CONNECT 
> www.youtube.com:443/ - DIRECT/www.youtube.com -
> 1473432390.483 38414 127.0.0.1 TCP_MISS/200 2870983 CONNECT 
> r17---sn-5hnednl7.googlevideo.com:443/ - 
> DIRECT/r17---sn-5hnednl7.googlevideo.com -
> {noformat}
> As part of figuring out how to diagnose this, I tried a downgrade to 
> TrafficServer 6.1.1, and this made all the stalling and problems disappear.  
> Afterwards, I did a {{git bisect}} on master, from the branch point of 6.1 to 
> the branch point of 6.2, and I ended up at [commit 
> af76977|https://git-dual.apache.org/repos/asf?p=trafficserver.git;a=commit;h=af76977adb9f3c0296a232688bbcb5a1421a6768]:
> {quote}
> Author: Susan Hinrichs 
> Date:   Wed Apr 13 19:57:39 2016 +
> TS-3612: Restructure client session and transaction processing. This 
> closes #570.
> {quote}
> Unfortunately, this is quite a big refactoring commit, so it is not possible 
> to revert it individually to see whether it improves things.
> I read TS-3612 and #570, and I saw there were also a number of follow-up 
> commits to fix various problems with it, but this particular problem of 
> stalled SSL connections is still occurring with master as of today, 
> 2016-09-09.
> I realize that this report is still missing reproduction details, since it is 
> tricky to analyze what the YouTube app is doing, and simple {{curl https://}} 
> tests appear to go fast, and don't seem to trigger any stalling.  But YouTube 
> itself is pretty easy to try out, I think.





[jira] [Work logged] (TS-4838) After TS-3612 restructuring, very slow SSL sessions and HttpSM::state_raw_http_server_open errors

2016-09-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4838?focusedWorklogId=28668&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-28668
 ]

ASF GitHub Bot logged work on TS-4838:
--

Author: ASF GitHub Bot
Created on: 09/Sep/16 22:32
Start Date: 09/Sep/16 22:32
Worklog Time Spent: 10m 
  Work Description: GitHub user jpeach opened a pull request:

https://github.com/apache/trafficserver/pull/1002

TS-4838: CONNECT requests get forgotten across threads.

What happens here is that ProxyClientTransaction::adjust_thread
reschedules the transaction onto a new thread at the start of
HttpSM::do_http_server_open.

Unfortunately, at this point the default handler is
HttpSM::state_raw_http_server_open. When the transaction is
rescheduled, the default handler runs, and receives the EVENT_INTERVAL
that it so fortuitously logs an error for. We have never actually
completed do_http_server_open, so we never make any more progress
on this transaction.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jpeach/trafficserver fix/4838

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/trafficserver/pull/1002.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1002


commit 8fddd77c085d1a64f11de61bb42a50562cd23229
Author: James Peach 
Date:   2016-09-09T22:29:05Z

TS-4838: CONNECT requests get forgotten across threads.




Issue Time Tracking
---

Worklog Id: (was: 28668)
Time Spent: 10m
Remaining Estimate: 0h

> After TS-3612 restructuring, very slow SSL sessions and 
> HttpSM::state_raw_http_server_open errors
> -
>
> Key: TS-4838
> URL: https://issues.apache.org/jira/browse/TS-4838
> Project: Traffic Server
> Issue Type: Bug
> Components: Core, SSL
> Affects Versions: 6.2.0, 7.0.0
> Environment: CentOS/RHEL 7.2, x86_64
> Reporter: Dimitry Andric
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We have been using TrafficServer 5.3.2 for quite some time now, for forward 
> proxying of a number of different HTML5 applications, one of the most 
> important ones being YouTube's TV interface, e.g. https://youtube.com/tv.  
> This is all hosted on CentOS 7.2 x86_64 machines.
> We recently upgraded to 6.2.0, and then started having problems with the 
> CONNECT requests for port 443 which are generated by the YouTube app.  It 
> seems like these connections are "stalled" somehow, sometimes for >10 
> seconds.  Meanwhile, {{diags.log}} is getting spammed with lots of the following:
> {noformat}
> [Sep  9 16:45:47.683] Server {0x2b3e50c0b700} ERROR: 
> [HttpSM::state_raw_http_server_open] event: EVENT_INTERVAL state: 0 
> server_entry: (nil)
> {noformat}
> Requests that seem to stall are most likely all of the CONNECT kind, e.g.:
> {noformat}
> 1473432382.474 30405 127.0.0.1 TCP_MISS/200 4916 CONNECT 
> ad.doubleclick.net:443/ - DIRECT/ad.doubleclick.net -
> 1473432382.481 30411 127.0.0.1 TCP_MISS/200 54024 CONNECT i9.ytimg.com:443/ - 
> DIRECT/i9.ytimg.com -
> 1473432382.486 30417 127.0.0.1 TCP_MISS/200 5389 CONNECT 
> pagead2.googlesyndication.com:443/ - DIRECT/pagead2.googlesyndication.com -
> 1473432390.451 42772 127.0.0.1 TCP_MISS/200 5198 CONNECT csi.gstatic.com:443/ 
> - DIRECT/csi.gstatic.com -
> 1473432390.459 43833 127.0.0.1 TCP_MISS/200 11610 CONNECT 
> www.youtube.com:443/ - DIRECT/www.youtube.com -
> 1473432390.483 38414 127.0.0.1 TCP_MISS/200 2870983 CONNECT 
> r17---sn-5hnednl7.googlevideo.com:443/ - 
> DIRECT/r17---sn-5hnednl7.googlevideo.com -
> {noformat}
> As part of figuring out how to diagnose this, I tried a downgrade to 
> TrafficServer 6.1.1, and this made all the stalling and problems disappear.  
> Afterwards, I did a {{git bisect}} on master, from the branch point of 6.1 to 
> the branch point of 6.2, and I ended up at [commit 
> af76977|https://git-dual.apache.org/repos/asf?p=trafficserver.git;a=commit;h=af76977adb9f3c0296a232688bbcb5a1421a6768]:
> {quote}
> Author: Susan