[jira] [Created] (TS-3084) forwarding mode breaks iPhone activation (ga.apple.com)
Nikolai Gorchilov created TS-3084: Summary: forwarding mode breaks iPhone activation (ga.apple.com) Key: TS-3084 URL: https://issues.apache.org/jira/browse/TS-3084 Project: Traffic Server Issue Type: Bug Components: Core, HTTP Reporter: Nikolai Gorchilov During iDevice restoration, iTunes makes an activation request to ga.apple.com (request attached). When sent via ATS, the request results in an HTTP/1.1 400 (Bad Request) response from the origin server. The proper response (on a direct connection) is also attached for reference. Here's the command to reproduce the problem: {noformat} netcat gs.apple.com 80 < gs.request {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
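For readers without netcat at hand, the reproduction step above can be approximated in Python. This is only a sketch: the helper names `exchange`, `replay_request` and `status_line` are made up for illustration, and it assumes the attached gs.request file holds the raw bytes of the HTTP request.

```python
import socket


def exchange(sock: socket.socket, payload: bytes) -> bytes:
    """Send raw bytes on a connected socket and collect the reply.

    Reads until the peer closes the connection or the socket times out.
    """
    sock.sendall(payload)
    chunks = []
    try:
        while True:
            data = sock.recv(4096)
            if not data:  # peer closed the connection
                break
            chunks.append(data)
    except socket.timeout:
        pass  # keep whatever arrived before the timeout
    return b"".join(chunks)


def replay_request(host: str, port: int, payload: bytes, timeout: float = 5.0) -> bytes:
    """Rough equivalent of `netcat host port < file` for a captured request."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return exchange(sock, payload)


def status_line(response: bytes) -> str:
    """Return the first line of an HTTP response."""
    return response.split(b"\r\n", 1)[0].decode("latin-1")
```

Calling `replay_request("gs.apple.com", 80, open("gs.request", "rb").read())` once directly and once against the host and port where ATS listens (site-specific, so not shown here), then comparing `status_line()` of the two replies, should show whether the 400 only appears on the proxied path.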
[jira] [Updated] (TS-3084) forwarding mode breaks iPhone activation (ga.apple.com)
[ https://issues.apache.org/jira/browse/TS-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Gorchilov updated TS-3084: -- Attachment: gs.response gs.request forwarding mode breaks iPhone activation (ga.apple.com) --- Key: TS-3084 URL: https://issues.apache.org/jira/browse/TS-3084 Project: Traffic Server Issue Type: Bug Components: Core, HTTP Reporter: Nikolai Gorchilov Attachments: gs.request, gs.response
[jira] [Updated] (TS-3084) forwarding mode breaks iPhone activation (gs.apple.com)
[ https://issues.apache.org/jira/browse/TS-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Gorchilov updated TS-3084: -- Description: During iDevice restoration, iTunes makes an activation request to gs.apple.com (request attached). When sent via ATS, the request results in an HTTP/1.1 400 (Bad Request) response from the origin server. The proper response (on a direct connection) is also attached for reference. Here's the command to reproduce the problem: {noformat} netcat gs.apple.com 80 < gs.request {noformat} was: During iDevice restoration, iTunes makes an activation request to ga.apple.com (request attached). When sent via ATS, the request results in an HTTP/1.1 400 (Bad Request) response from the origin server. The proper response (on a direct connection) is also attached for reference. Here's the command to reproduce the problem: {noformat} netcat gs.apple.com 80 < gs.request {noformat} Summary: forwarding mode breaks iPhone activation (gs.apple.com) (was: forwarding mode breaks iPhone activation (ga.apple.com)) forwarding mode breaks iPhone activation (gs.apple.com) --- Key: TS-3084 URL: https://issues.apache.org/jira/browse/TS-3084 Project: Traffic Server Issue Type: Bug Components: Core, HTTP Reporter: Nikolai Gorchilov Attachments: gs.request, gs.response
[jira] [Updated] (TS-3084) forwarding mode breaks iPhone activation (gs.apple.com)
[ https://issues.apache.org/jira/browse/TS-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leif Hedstrom updated TS-3084: -- Fix Version/s: 5.2.0 forwarding mode breaks iPhone activation (gs.apple.com) --- Key: TS-3084 URL: https://issues.apache.org/jira/browse/TS-3084 Project: Traffic Server Issue Type: Bug Components: Core, HTTP Reporter: Nikolai Gorchilov Fix For: 5.2.0 Attachments: gs.request, gs.response
[jira] [Updated] (TS-3083) crash
[ https://issues.apache.org/jira/browse/TS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leif Hedstrom updated TS-3083: -- Fix Version/s: 5.2.0 crash - Key: TS-3083 URL: https://issues.apache.org/jira/browse/TS-3083 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 5.0.2 Reporter: bettydramit Labels: crash Fix For: 5.2.0
c++filt < a.txt
{code}
/lib64/libpthread.so.0(+0xf710)[0x2b4c37949710]
/usr/lib64/trafficserver/libtsutil.so.5(ink_atomiclist_pop+0x3e)[0x2b4c35abb64e]
/usr/lib64/trafficserver/libtsutil.so.5(reclaimable_freelist_new+0x65)[0x2b4c35abc065]
/usr/bin/traffic_server(MIOBuffer_tracker::operator()(long)+0x2b)[0x4a33db]
/usr/bin/traffic_server(PluginVCCore::init()+0x2e3)[0x4d9903]
/usr/bin/traffic_server(PluginVCCore::alloc()+0x11d)[0x4dcf4d]
/usr/bin/traffic_server(TSHttpConnectWithPluginId+0x5d)[0x4b9e9d]
/usr/bin/traffic_server(FetchSM::httpConnect()+0x74)[0x4a0224]
/usr/bin/traffic_server(PluginVC::process_read_side(bool)+0x375)[0x4da675]
/usr/bin/traffic_server(PluginVC::process_write_side(bool)+0x57a)[0x4dafca]
/usr/bin/traffic_server(PluginVC::main_handler(int, void*)+0x315)[0x4dc9a5]
/usr/bin/traffic_server(EThread::process_event(Event*, int)+0x8f)[0x73788f]
/usr/bin/traffic_server(EThread::execute()+0x57b)[0x7381fb]
{code}
[jira] [Updated] (TS-3080) OpenSSL implementation of TLS session cache is very slow.
[ https://issues.apache.org/jira/browse/TS-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leif Hedstrom updated TS-3080: -- Fix Version/s: 5.2.0 OpenSSL implementation of TLS session cache is very slow. - Key: TS-3080 URL: https://issues.apache.org/jira/browse/TS-3080 Project: Traffic Server Issue Type: Bug Components: Core, SSL Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 5.2.0 The OpenSSL implementation of TLS session caching is very slow: we attempted to use it, and its locking blows up at only a few hundred QPS. I'm going to develop a new TLS session cache in TS that performs better under high load.
[jira] [Commented] (TS-3080) OpenSSL implementation of TLS session cache is very slow.
[ https://issues.apache.org/jira/browse/TS-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139128#comment-14139128 ] Leif Hedstrom commented on TS-3080: --- I still find it odd that you'd run into lock contention already at hundreds of sessions/sec. Unless the critical section is large and/or very slow, that shouldn't be happening. Futexes are fast in general :). OpenSSL implementation of TLS session cache is very slow. - Key: TS-3080 URL: https://issues.apache.org/jira/browse/TS-3080 Project: Traffic Server Issue Type: Bug Components: Core, SSL Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 5.2.0
[jira] [Updated] (TS-3082) ATS does not bind sessions to SNI names.
[ https://issues.apache.org/jira/browse/TS-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leif Hedstrom updated TS-3082: -- Labels: security (was: ) ATS does not bind sessions to SNI names. Key: TS-3082 URL: https://issues.apache.org/jira/browse/TS-3082 Project: Traffic Server Issue Type: Bug Components: SSL Reporter: Alexey Ivanov Assignee: Brian Geffon Labels: security Fix For: 5.2.0 More information in paper: Virtual Host Confusion: Weaknesses and Exploits. Black Hat 2014 Report http://bh.ht.vc/vhost_confusion.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3082) ATS does not bind sessions to SNI names.
[ https://issues.apache.org/jira/browse/TS-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leif Hedstrom updated TS-3082: -- Priority: Critical (was: Major) ATS does not bind sessions to SNI names. Key: TS-3082 URL: https://issues.apache.org/jira/browse/TS-3082 Project: Traffic Server Issue Type: Bug Components: SSL Reporter: Alexey Ivanov Assignee: Brian Geffon Priority: Critical Labels: security Fix For: 5.2.0 More information in paper: Virtual Host Confusion: Weaknesses and Exploits. Black Hat 2014 Report http://bh.ht.vc/vhost_confusion.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3082) ATS does not bind sessions to SNI names.
[ https://issues.apache.org/jira/browse/TS-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leif Hedstrom updated TS-3082: -- Component/s: SSL ATS does not bind sessions to SNI names. Key: TS-3082 URL: https://issues.apache.org/jira/browse/TS-3082 Project: Traffic Server Issue Type: Bug Components: SSL Reporter: Alexey Ivanov Assignee: Brian Geffon Labels: security Fix For: 5.2.0 More information in paper: Virtual Host Confusion: Weaknesses and Exploits. Black Hat 2014 Report http://bh.ht.vc/vhost_confusion.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
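To make the TS-3082 summary concrete, here is a toy model of what binding a cached TLS session to its SNI name could look like. It is not ATS or OpenSSL code: `CachedSession` and `SniBoundSessionCache` are hypothetical names, and the only point illustrated is that resumption is refused when the requested server name differs from the one the session was negotiated for.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CachedSession:
    session_id: bytes
    sni_name: str   # server name the session was originally negotiated for
    state: bytes    # placeholder for the real session state


class SniBoundSessionCache:
    """Session store that only resumes a session under its original SNI name."""

    def __init__(self) -> None:
        self._sessions: Dict[bytes, CachedSession] = {}

    def store(self, session: CachedSession) -> None:
        self._sessions[session.session_id] = session

    def resume(self, session_id: bytes, requested_sni: str) -> Optional[CachedSession]:
        session = self._sessions.get(session_id)
        if session is None:
            return None  # unknown session: full handshake
        # DNS names compare case-insensitively.
        if session.sni_name.lower() != requested_sni.lower():
            return None  # different virtual host: refuse resumption
        return session
```

In this model a session stored for `a.example.com` cannot be resumed by a client that presents `b.example.com`, which is the property the ticket title says is missing.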
[jira] [Created] (TS-3085) Large POSTs over (relatively) slower connections failing in ats5
Sudheer Vinukonda created TS-3085: Summary: Large POSTs over (relatively) slower connections failing in ats5 Key: TS-3085 URL: https://issues.apache.org/jira/browse/TS-3085 Project: Traffic Server Issue Type: Bug Components: SSL Reporter: Sudheer Vinukonda
We ran into a production issue where large POSTs (30 MB or larger) fail over slower connections after the ats5 rollout (the problem can be easily reproduced using a Charles proxy with throttling enabled). Further debugging isolated the issue to uploads over SSL connections, and the cause appears to be the following:
ATS calls SSL_read() followed by SSL_get_error() to check whether the read hit an error. This is repeated until either the complete data is read or an error occurs. However, the OpenSSL documentation recommends calling ERR_clear_error() before calling SSL_read() + SSL_get_error() to ensure the error queue is clean of any leftover/garbage errors. It's not clear what might be corrupting the error queue of the SSL context in a tight loop; possibly some new feature in ats5. In any case, calling ERR_clear_error() is a good idea, and adding it seems to resolve the POST failures.
Documentation from OpenSSL and some related notes on Stack Overflow: https://www.openssl.org/docs/ssl/SSL_get_error.html http://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error
{code}
SSL_get_error() returns a result code (suitable for the C ``switch'' statement) for a preceding call to SSL_connect(), SSL_accept(), SSL_do_handshake(), SSL_read(), SSL_peek(), or SSL_write() on ssl. The value returned by that TLS/SSL I/O function must be passed to SSL_get_error() in parameter ret. In addition to ssl and ret, SSL_get_error() inspects the current thread's OpenSSL error queue. Thus, SSL_get_error() must be used in the same thread that performed the TLS/SSL I/O operation, and no other OpenSSL function calls should appear in between. The current thread's error queue must be empty before the TLS/SSL I/O operation is attempted, or SSL_get_error() will not work reliably.
SSL_get_error does not call ERR_get_error. So if you just call SSL_get_error, the error stays in the queue. You should be calling ERR_clear_error prior to ANY SSL-call(SSL_read, SSL_write etc) that is followed by SSL_get_error, otherwise you may be reading an old error that occurred previously in the current thread.
{code}
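The failure mode described in the quoted documentation can be illustrated with a small model. This is a deliberately simplified stand-in, not OpenSSL itself: `ErrorQueueModel` is a hypothetical class, the numeric constants mirror OpenSSL's `SSL_ERROR_*` values, and the decision order (success first, then the error queue, then the want-read state) is an assumption based on the documentation's statement that SSL_get_error() inspects the thread's error queue.

```python
from typing import List

# Values mirror OpenSSL's SSL_ERROR_* constants.
SSL_ERROR_NONE = 0
SSL_ERROR_SSL = 1
SSL_ERROR_WANT_READ = 2
SSL_ERROR_SYSCALL = 5


class ErrorQueueModel:
    """Toy stand-in for one thread's OpenSSL error queue."""

    def __init__(self) -> None:
        self.queue: List[str] = []

    def push(self, reason: str) -> None:
        """Some earlier OpenSSL call on this thread left an error behind."""
        self.queue.append(reason)

    def clear(self) -> None:
        """Plays the role of ERR_clear_error()."""
        self.queue.clear()

    def get_error(self, ret: int, want_read: bool) -> int:
        """Plays the role of SSL_get_error() for a read that returned `ret`."""
        if ret > 0:
            return SSL_ERROR_NONE
        if self.queue:            # a queued error wins, even if it is stale
            return SSL_ERROR_SSL
        if want_read:             # the read simply needs more data
            return SSL_ERROR_WANT_READ
        return SSL_ERROR_SYSCALL
```

With a stale entry in the queue, a read that merely ran out of data (`ret == -1`, `want_read=True`) is reported as `SSL_ERROR_SSL`; after `clear()` the same read is reported as `SSL_ERROR_WANT_READ`, which a caller would treat as "retry later" rather than as a fatal error.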
[jira] [Updated] (TS-3085) Large POSTs over (relatively) slower connections failing in ats5
[ https://issues.apache.org/jira/browse/TS-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sudheer Vinukonda updated TS-3085: -- Description: We ran into a production issue where large POSTs (30MB or high) are failing over slower connection speeds after ats5 roll out (the problem could be easily reproduced using a charles proxy with throttling enabled). Further debugging isolated the issue to uploads over SSL connections and after a lot of debugging the issue appears to be the below: ATS calls SSL_read() followed by SSL_get_error() to check if there was any error in the read. This is repeated until either the complete data is read or an error occurs. However, from the openssl documentation, it is recommended to call ERR_clear_error() prior to calling SSL_read() + SSL_get_error() to ensure the error queue is clean of any leftover/garbage errors. It's not clear what might be corrupting the error queue of the SSL context in a tight loop - possibly, some new feature in ats5. In any case, calling ERR_clear_error() is a good idea and adding this seems to resolve the post failures. Documentation from openSSL and some related notes on stackoverflow: https://www.openssl.org/docs/ssl/SSL_get_error.html http://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error {code} SSL_get_error() returns a result code (suitable for the C ``switch'' statement) for a preceding call to SSL_connect(), SSL_accept(), SSL_do_handshake(), SSL_read(), SSL_peek(), or SSL_write() on ssl. The value returned by that TLS/SSL I/O function must be passed to SSL_get_error() in parameter ret. In addition to ssl and ret, SSL_get_error() inspects the current thread's OpenSSL error queue. Thus, SSL_get_error() must be used in the same thread that performed the TLS/SSL I/O operation, and no other OpenSSL function calls should appear in between. 
The current thread's error queue must be empty before the TLS/SSL I/O operation is attempted, or SSL_get_error() will not work reliably. SSL_get_error does not call ERR_get_error. So if you just call SSL_get_error, the error stays in the queue. You should be calling ERR_clear_error prior to ANY SSL-call(SSL_read, SSL_write etc) that is followed by SSL_get_error, otherwise you may be reading an old error that occurred previously in the current thread. {code} was: We ran into a production issue where large POSTs (30MB or high) are failing over slower connection speeds after ats5 roll out (the problem could be easily reproduced using a charles proxy with throttling enabled). Further debugging isolated the issue to uploads over SSL connections and after a lot of debugging the issue appears to be the below: ATS calls SSL_read() followed by SSL_get_error() to check if there was any error in the read. This is repeated until either the complete data is read or an error occurs. However, from the openssl documentation, it is recommended to call ERR_clear_error() prior to calling SSL_read() + SSL_get_error() to ensure the error queue is clean of any leftover/garbage errors. It's not clear what might be corrupting the error queue of the SSL context in a tight loop - possibly, some new feature in ats5. In any case, calling ERR_clear_error() is a good idea and adding this seems to resolve the post failures. Documentation from openSSL and some related notes on stackoverflow: https://www.openssl.org/docs/ssl/SSL_get_error.html http://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error {code} SSL_get_error() returns a result code (suitable for the C ``switch'' statement) for a preceding call to SSL_connect(), SSL_accept(), SSL_do_handshake(), SSL_read(), SSL_peek(), or SSL_write() on ssl. The value returned by that TLS/SSL I/O function must be passed to SSL_get_error() in parameter ret. 
In addition to ssl and ret, SSL_get_error() inspects the current thread's OpenSSL error queue. Thus, SSL_get_error() must be used in the same thread that performed the TLS/SSL I/O operation, and no other OpenSSL function calls should appear in between. The current thread's error queue must be empty before the TLS/SSL I/O operation is attempted, or SSL_get_error() will not work reliably. SSL_get_error does not call ERR_get_error. So if you just call SSL_get_error, the error stays in the queue. You should be calling ERR_clear_error prior to ANY SSL-call(SSL_read, SSL_write etc) that is followed by SSL_get_error, otherwise you may be reading an old error that occurred previously in the current thread. {code} Affects Version/s: 5.0.1 Backport to Version: 5.1.1 Fix Version/s: 5.2.0 Assignee: Sudheer Vinukonda Labels: yahoo (was: ) The fix is really simple: call ERR_clear_error() before SSL_read(). I will investigate separately on why/who is
[jira] [Commented] (TS-3085) Large POSTs over (relatively) slower connections failing in ats5
[ https://issues.apache.org/jira/browse/TS-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139222#comment-14139222 ] Sudheer Vinukonda commented on TS-3085: --- Per Leif's suggestion on a different JIRA, I've marked this for 5.2 and added a backport to 5.1.1. However, this is a blocker for our ats5 rollout, and anyone with use cases involving large POSTs may need to cherry-pick the fix. Large POSTs over (relatively) slower connections failing in ats5 Key: TS-3085 URL: https://issues.apache.org/jira/browse/TS-3085 Project: Traffic Server Issue Type: Bug Components: SSL Affects Versions: 5.0.1 Reporter: Sudheer Vinukonda Assignee: Sudheer Vinukonda Labels: yahoo Fix For: 5.2.0
[jira] [Commented] (TS-3085) Large POSTs over (relatively) slower connections failing in ats5
[ https://issues.apache.org/jira/browse/TS-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139224#comment-14139224 ] Sudheer Vinukonda commented on TS-3085: --- When a POST fails, below is the log (slightly enhanced and traced using single/ip debugging in production):
{code}
[Sep 18 17:09:26.382] Server {0x2ab554605700} DEBUG: (ssl) [SSL_NetVConnection::ssl_read_from_net] b->write_avail()=32768
[Sep 18 17:09:26.382] Server {0x2ab554605700} DEBUG: (ssl) [SSL_NetVConnection::ssl_read_from_net] rres=-1
[Sep 18 17:09:26.382] Server {0x2ab554605700} DEBUG: (ssl.error) [SSL_NetVConnection::ssl_read_from_net] error 1
[Sep 18 17:09:26.382] Server {0x2ab554605700} DEBUG: (http_tunnel) [510166] producer_handler [user agent post VC_EVENT_ERROR]
[Sep 18 17:09:26.382] Server {0x2ab554605700} DEBUG: (http_redirect) [HttpTunnel::producer_handler] enable_redirection: [1 0 0] event: 3
[Sep 18 17:09:26.382] Server {0x2ab554605700} DEBUG: (http) [510166] [HttpSM::tunnel_handler_post_ua, VC_EVENT_ERROR]
{code}
Large POSTs over (relatively) slower connections failing in ats5 Key: TS-3085 URL: https://issues.apache.org/jira/browse/TS-3085 Project: Traffic Server Issue Type: Bug Components: SSL Affects Versions: 5.0.1 Reporter: Sudheer Vinukonda Assignee: Sudheer Vinukonda Labels: yahoo Fix For: 5.2.0
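When sifting through a larger traffic.out for the same pattern, a small parser can help. The sketch below assumes that the number logged after `error` on `(ssl.error)` lines is the SSL_get_error() result code (the log itself does not say so), and `decode_ssl_errors` is a made-up helper name. It handles both one entry per line and entries run together on a single line.

```python
import re
from typing import List

# Values of OpenSSL's SSL_ERROR_* result codes.
SSL_ERROR_NAMES = {
    0: "SSL_ERROR_NONE",
    1: "SSL_ERROR_SSL",
    2: "SSL_ERROR_WANT_READ",
    3: "SSL_ERROR_WANT_WRITE",
    4: "SSL_ERROR_WANT_X509_LOOKUP",
    5: "SSL_ERROR_SYSCALL",
    6: "SSL_ERROR_ZERO_RETURN",
}

TIMESTAMP = r"\[[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}\]"
# Split in front of every timestamp so flattened logs are handled too.
ENTRY_START = re.compile(r"(?=" + TIMESTAMP + r" )")
ENTRY = re.compile(
    r"(?P<time>" + TIMESTAMP + r") Server \{(?P<thread>0x[0-9a-f]+)\} "
    r"(?P<level>[A-Z]+): \((?P<tag>[\w.]+)\) (?P<message>.*)"
)
ERROR_CODE = re.compile(r"\berror (\d+)\s*$")


def decode_ssl_errors(log_text: str) -> List[str]:
    """Return the symbolic name for every '(ssl.error) ... error N' entry."""
    names = []
    for piece in ENTRY_START.split(log_text):
        entry = ENTRY.match(piece.strip())
        if entry is None or entry.group("tag") != "ssl.error":
            continue
        code = ERROR_CODE.search(entry.group("message"))
        if code:
            names.append(SSL_ERROR_NAMES.get(int(code.group(1)), "unknown"))
    return names
```

Under that assumption, the `error 1` entry in the log above decodes to `SSL_ERROR_SSL`, which would be consistent with a leftover error-queue entry being reported for a read that returned -1.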
[jira] [Commented] (TS-1975) LocalManager may cause manager crash
[ https://issues.apache.org/jira/browse/TS-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139280#comment-14139280 ] Jared Ocker commented on TS-1975: - Here's a stack trace from our traffic.out, including some logs leading up to it. As you can see, we're using the rfc5861 (stale while revalidate) plugin. When I disable it, the issue seems to go away.
{code}
[Sep 15 09:51:55.419] Server {0x2b3a9b284710} DIAG: (sdk) (SDK) null mutex detected in critical region (mutex created)
[Sep 15 09:51:55.419] Server {0x2b3a9b284710} DIAG: (sdk) (SDK) please create continuation [0x2dfc930] with mutex
[Sep 15 09:51:55.419] Server {0x2b3a9b284710} DIAG: (rfc5861) Write Complete
[Sep 15 09:51:55.420] Server {0x2b3a9b284710} DIAG: (rfc5861) Internal Request
[Sep 15 09:51:55.450] Server {0x2b3a9b284710} DIAG: (rfc5861) Read Ready
[Sep 15 09:51:55.450] Server {0x2b3a9b284710} DIAG: (rfc5861) HTTP Status: 304
[Sep 15 09:51:55.450] Server {0x2b3a9b284710} DIAG: (rfc5861) EOS
[Sep 15 09:51:55.450] Server {0x2b3a9b284710} DIAG: (rfc5861) In sync path. setting fresh and re-enabling
[Sep 15 09:51:55.450] Server {0x2b3a9b284710} DIAG: (rfc5861) Attempting new cache lookup
[Sep 15 09:51:55.450] Server {0x2b3a9b284710} DIAG: (rfc5861) Not Stale!
[Sep 15 09:51:55.469] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) External Request
[Sep 15 09:51:55.487] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) CacheLookupStatus is STALE
[Sep 15 09:51:55.487] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) Found a date
[Sep 15 09:51:55.487] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) Found cache-control
[Sep 15 09:51:55.487] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) Unknown field value
[Sep 15 09:51:55.487] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) Found max-age
[Sep 15 09:51:55.487] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) Unknown field value
[Sep 15 09:51:55.487] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) Found stale-while-revalidate
[Sep 15 09:51:55.487] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) Found stale-on-error
[Sep 15 09:51:55.487] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) Looks like we can return fresh data on 500 error
[Sep 15 09:51:55.487] Server {0x2b3a9a7e9ca0} DIAG: (sdk) (SDK) null mutex detected in critical region (mutex created)
[Sep 15 09:51:55.487] Server {0x2b3a9a7e9ca0} DIAG: (sdk) (SDK) please create continuation [0x2dfc930] with mutex
[Sep 15 09:51:55.487] Server {0x2b3a9b183710} DIAG: (rfc5861) Lets do the lookup
[Sep 15 09:51:55.487] Server {0x2b3a9b183710} DIAG: (rfc5861) Set Connection: close
[Sep 15 09:51:55.487] Server {0x2b3a9b183710} DIAG: (rfc5861) Found old Connection hdr
[Sep 15 09:51:55.487] Server {0x2b3a9b183710} DIAG: (rfc5861) Creating Connection hdr
[Sep 15 09:51:55.487] Server {0x2b3a9b183710} DIAG: (rfc5861) Create Buffers
[Sep 15 09:51:55.487] Server {0x2b3a9b183710} DIAG: (sdk) (SDK) null mutex detected in critical region (mutex created)
[Sep 15 09:51:55.487] Server {0x2b3a9b183710} DIAG: (sdk) (SDK) please create continuation [0x2dfc5d0] with mutex
[Sep 15 09:51:55.487] Server {0x2b3a9b183710} DIAG: (rfc5861) Write Complete
[Sep 15 09:51:55.487] Server {0x2b3a9b183710} DIAG: (rfc5861) Internal Request
[Sep 15 09:51:55.515] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) Read Ready
[Sep 15 09:51:55.515] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) HTTP Status: 304
[Sep 15 09:51:55.515] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) EOS
[Sep 15 09:51:55.515] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) In sync path. setting fresh and re-enabling
[Sep 15 09:51:55.515] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) Attempting new cache lookup
[Sep 15 09:51:55.515] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) Not Stale!
[Sep 15 09:52:19.599] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) External Request
[Sep 15 09:52:19.600] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) Not Stale!
[Sep 15 09:52:19.670] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) External Request
[Sep 15 09:52:19.670] Server {0x2b3a9b284710} DIAG: (rfc5861) External Request
[Sep 15 09:52:19.671] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) Not Stale!
[Sep 15 09:52:19.671] Server {0x2b3a9b284710} DIAG: (rfc5861) Not Stale!
[Sep 15 09:52:19.673] Server {0x2b3a9b284710} DIAG: (rfc5861) External Request
[Sep 15 09:52:19.674] Server {0x2b3a9b284710} DIAG: (rfc5861) Not Stale!
[Sep 15 09:52:19.674] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) External Request
[Sep 15 09:52:19.675] Server {0x2b3a9b183710} DIAG: (rfc5861) External Request
[Sep 15 09:52:19.675] Server {0x2b3a9b284710} DIAG: (rfc5861) External Request
[Sep 15 09:52:19.676] Server {0x2b3a9b284710} DIAG: (rfc5861) Not Stale!
[Sep 15 09:52:19.677] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) Not Stale!
[Sep 15 09:52:19.677] Server {0x2b3a9b183710} DIAG: (rfc5861) Not Stale!
[Sep 15 09:52:19.685] Server {0x2b3a9a7e9ca0} DIAG: (rfc5861) External Request
[Sep 15 09:52:19.685] Server {0x2b3a9b284710} DIAG: (rfc5861) External Request
[Sep 15 09:52:19.686] Server {0x2b3a9b284710} DIAG:
[jira] [Commented] (TS-3084) forwarding mode breaks iPhone activation (gs.apple.com)
[ https://issues.apache.org/jira/browse/TS-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139303#comment-14139303 ] James Peach commented on TS-3084: - What is the request that ATS sends to the origin server? forwarding mode breaks iPhone activation (gs.apple.com) --- Key: TS-3084 URL: https://issues.apache.org/jira/browse/TS-3084 Project: Traffic Server Issue Type: Bug Components: Core, HTTP Reporter: Nikolai Gorchilov Fix For: 5.2.0 Attachments: gs.request, gs.response
[jira] [Commented] (TS-3085) Large POSTs over (relatively) slower connections failing in ats5
[ https://issues.apache.org/jira/browse/TS-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139314#comment-14139314 ] James Peach commented on TS-3085: - Good catch, that code looks quite broken. I think a better fix is to only call {{SSL_get_error()}} if {{SSL_read()}} returns <= 0. The error handling for {{SSL_write}} also looks problematic. Can you refactor that to also call {{SSL_get_error()}} correctly? Large POSTs over (relatively) slower connections failing in ats5 Key: TS-3085 URL: https://issues.apache.org/jira/browse/TS-3085 Project: Traffic Server Issue Type: Bug Components: SSL Affects Versions: 5.0.1 Reporter: Sudheer Vinukonda Assignee: Sudheer Vinukonda Labels: yahoo Fix For: 5.2.0
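The control flow suggested in this comment (consult the error state only when the read did not succeed) can be sketched independently of OpenSSL. `ScriptedConnection` below is a hypothetical stand-in that replays canned results; its three methods play the roles of ERR_clear_error(), SSL_read() and SSL_get_error(), and nothing here is taken from the actual ATS code.

```python
from typing import List, Tuple

# Values mirror OpenSSL's SSL_ERROR_* constants.
SSL_ERROR_WANT_READ = 2
SSL_ERROR_ZERO_RETURN = 6


class ScriptedConnection:
    """Hypothetical stand-in for an SSL connection; replays scripted reads."""

    def __init__(self, script: List[Tuple[int, int]]) -> None:
        self._script = list(script)  # (read return value, error code) pairs
        self._last_error = 0
        self.clear_calls = 0
        self.get_error_calls = 0

    def clear_error_queue(self) -> None:   # role of ERR_clear_error()
        self.clear_calls += 1

    def read(self) -> int:                 # role of SSL_read()
        ret, self._last_error = self._script.pop(0)
        return ret

    def get_error(self, ret: int) -> int:  # role of SSL_get_error()
        self.get_error_calls += 1
        return self._last_error


def read_available(conn: ScriptedConnection) -> Tuple[int, str]:
    """Read until the connection would block, closes, or fails."""
    total = 0
    while True:
        conn.clear_error_queue()      # start every read with a clean queue
        ret = conn.read()
        if ret > 0:
            total += ret
            continue                  # success: the error state is not consulted
        err = conn.get_error(ret)     # only reached when ret <= 0
        if err == SSL_ERROR_WANT_READ:
            return total, "retry"
        if err == SSL_ERROR_ZERO_RETURN:
            return total, "closed"
        return total, "error"
```

For a script of two successful reads followed by a would-block result, the loop consults `get_error()` exactly once, which is the property the comment asks for.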
[jira] [Updated] (TS-3084) forwarding mode breaks iPhone activation (gs.apple.com)
[ https://issues.apache.org/jira/browse/TS-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-3084: --- Attachment: partial-request.txt I was just able to reproduce this. It looks like only part of gs.request was sent before ATS sent the FIN. See the partial request attachment. forwarding mode breaks iPhone activation (gs.apple.com) --- Key: TS-3084 URL: https://issues.apache.org/jira/browse/TS-3084 Project: Traffic Server Issue Type: Bug Components: Core, HTTP Reporter: Nikolai Gorchilov Fix For: 5.2.0 Attachments: gs.request, gs.response, partial-request.txt On iDevice restoration, iTunes makes an activation request to gs.apple.com (request attached). When sent via ATS, the request leads to an HTTP/1.1 400 (bad request) response from the origin server. The proper response (on a direct connection) is also attached for your reference. Here's the command to reproduce the problem: {noformat} netcat gs.apple.com 80 < gs.request {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3084) forwarding mode breaks iPhone activation (gs.apple.com)
[ https://issues.apache.org/jira/browse/TS-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139329#comment-14139329 ] Susan Hinrichs commented on TS-3084: Sometimes I see the full request being sent to the origin server, but ATS then terminates the connection to the origin server before the full response has been sent. forwarding mode breaks iPhone activation (gs.apple.com) --- Key: TS-3084 URL: https://issues.apache.org/jira/browse/TS-3084 Project: Traffic Server Issue Type: Bug Components: Core, HTTP Reporter: Nikolai Gorchilov Fix For: 5.2.0 Attachments: gs.request, gs.response, partial-request.txt On iDevice restoration, iTunes makes an activation request to gs.apple.com (request attached). When sent via ATS, the request leads to an HTTP/1.1 400 (bad request) response from the origin server. The proper response (on a direct connection) is also attached for your reference. Here's the command to reproduce the problem: {noformat} netcat gs.apple.com 80 < gs.request {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3084) forwarding mode breaks iPhone activation (gs.apple.com)
[ https://issues.apache.org/jira/browse/TS-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139338#comment-14139338 ] James Peach commented on TS-3084: - Is this in transparent proxy mode? forwarding mode breaks iPhone activation (gs.apple.com) --- Key: TS-3084 URL: https://issues.apache.org/jira/browse/TS-3084 Project: Traffic Server Issue Type: Bug Components: Core, HTTP Reporter: Nikolai Gorchilov Fix For: 5.2.0 Attachments: gs.request, gs.response, partial-request.txt On iDevice restoration, iTunes makes an activation request to gs.apple.com (request attached). When sent via ATS, the request leads to an HTTP/1.1 400 (bad request) response from the origin server. The proper response (on a direct connection) is also attached for your reference. Here's the command to reproduce the problem: {noformat} netcat gs.apple.com 80 < gs.request {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3084) forwarding mode breaks iPhone activation (gs.apple.com)
[ https://issues.apache.org/jira/browse/TS-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139350#comment-14139350 ] Susan Hinrichs commented on TS-3084: Yes, my test system is in transparent mode. forwarding mode breaks iPhone activation (gs.apple.com) --- Key: TS-3084 URL: https://issues.apache.org/jira/browse/TS-3084 Project: Traffic Server Issue Type: Bug Components: Core, HTTP Reporter: Nikolai Gorchilov Fix For: 5.2.0 Attachments: gs.request, gs.response, partial-request.txt On iDevice restoration, iTunes makes an activation request to gs.apple.com (request attached). When sent via ATS, the request leads to an HTTP/1.1 400 (bad request) response from the origin server. The proper response (on a direct connection) is also attached for your reference. Here's the command to reproduce the problem: {noformat} netcat gs.apple.com 80 < gs.request {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-1975) LocalManager may cause manager crash
[ https://issues.apache.org/jira/browse/TS-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139359#comment-14139359 ] Phil Sorber commented on TS-1975: - Do you have a test case that reproduces this easily that you can share? Or can you get a core and a real stack trace with debug symbols? Also, I think [~amc] said he saw something like this and it wasn't rfc5861. Maybe he can comment. Thanks. LocalManager may cause manager crash Key: TS-1975 URL: https://issues.apache.org/jira/browse/TS-1975 Project: Traffic Server Issue Type: Bug Components: Manager Affects Versions: 3.3.4 Reporter: Zhao Yongming Assignee: portl4t Labels: Crash Fix For: 5.2.0 When something goes wrong with the LocalManager, for example [LocalManager::pollMgmtProcessServer] Error in read (errno: 104), both the manager and the server restart. {code} Jun 17 17:40:06 cache163 traffic_manager[25654]: {0x7f528b4297e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jun 17 17:40:06 cache163 traffic_manager[25654]: {0x7f528b4297e0} FATAL: (last system error 104: Connection reset by peer) Jun 17 17:40:06 cache163 traffic_manager[25654]: {0x7f528b4297e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jun 17 17:40:06 cache163 traffic_manager[25654]: {0x7f528b4297e0} ERROR: (last system error 32: Broken pipe) Jun 17 17:40:07 cache163 traffic_cop[25652]: cop received child status signal [25654 2816] Jun 17 17:40:07 cache163 traffic_cop[25652]: traffic_manager not running, making sure traffic_server is dead Jun 17 17:40:07 cache163 traffic_cop[25652]: spawning traffic_manager Jun 17 17:40:07 cache163 traffic_manager[10118]: NOTE: --- Manager Starting --- Jun 17 17:40:07 cache163 traffic_manager[10118]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 51516 on Jun 15 2013 at 16:01:06) Jun 17 17:40:07 cache163 traffic_manager[10118]: NOTE: RLIMIT_NOFILE(7):cur(16),max(16) Jun 17 17:40:07 cache163 
traffic_manager[10118]: {0x7f26fc24a7e0} STATUS: opened /var/log/trafficserver/manager.log Jun 17 17:40:09 cache163 traffic_server[10131]: NOTE: --- Server Starting --- Jun 17 17:40:09 cache163 traffic_server[10131]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 51516 on Jun 15 2013 at 16:01:31) Jun 17 17:40:09 cache163 traffic_server[10131]: {0x2b167ded2280} STATUS: opened /var/log/trafficserver/diags.log {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-1975) LocalManager may cause manager crash
[ https://issues.apache.org/jira/browse/TS-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139478#comment-14139478 ] Jared Ocker commented on TS-1975: - Unfortunately, I don't know how to initiate it, it just happens multiple times per day. Below is a listing of our September log output from {{grep FATAL manager.log}} showing the frequency on a test server with light traffic. I've not seen any core dumps. Do you have any suggestions for debug tags? Even excluding some of the most verbose tags with {{CONFIG proxy.config.diags.debug.tags STRING ^(?!.\*dir_clean|stats|log.*|http)}} still generates a ton of output. {code} [Sep 4 14:41:12.517] Manager {0x7fb24371e7e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 4 20:13:49.649] Manager {0x7fa5f16ed7e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 5 00:34:55.776] Manager {0x7fab124677e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 5 09:25:42.075] Manager {0x7f9edb27b7e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 6 00:13:55.717] Manager {0x7f7b8313a7e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 6 00:58:49.020] Manager {0x7f04482517e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 6 01:00:49.758] Manager {0x7fd4e6c387e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 6 03:01:39.411] Manager {0x7f4b0285b7e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 6 05:15:48.996] Manager {0x7fb6867407e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 6 23:19:50.204] Manager {0x7fb67c3e57e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 7 01:15:41.632] Manager {0x7fe5527627e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 7 01:17:50.320] Manager 
{0x7f60e471c7e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 7 02:43:49.077] Manager {0x7f3f960c27e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 7 03:11:15.545] Manager {0x7f21e77107e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 7 03:12:01.707] Manager {0x7fcc284f37e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 7 03:34:53.128] Manager {0x7f283d1317e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 7 20:01:45.939] Manager {0x7f63d15977e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 7 20:02:54.278] Manager {0x7fec786157e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 7 20:52:53.018] Manager {0x7f34d10fc7e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 8 23:03:53.881] Manager {0x7f97d5be57e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 9 01:17:06.301] Manager {0x7f62dc3837e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 9 04:02:54.184] Manager {0x7f1e2a0587e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 10 02:09:23.664] Manager {0x7f7936f427e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 10 04:05:23.641] Manager {0x7fd0c8fcc7e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 10 21:49:25.050] Manager {0x7f2bd3ba47e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 10 22:55:24.225] Manager {0x7fbe262aa7e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 11 02:15:24.191] Manager {0x7f030a7da7e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 12 01:48:25.133] Manager {0x7f7a396337e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 12 
03:28:24.094] Manager {0x7f53933977e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 12 03:29:24.516] Manager {0x7f0d3d1567e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 12 04:03:25.149] Manager {0x7f55a12b07e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 12 07:02:24.187] Manager {0x7f72d13ee7e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 12 10:18:24.169] Manager {0x7f564bae57e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 12 11:13:15.805] Manager {0x7fb1b9ccc7e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 12 11:30:17.156] Manager {0x7ff22a03d7e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Sep 12 12:39:25.266] Manager {0x7fed1d3867e0} FATAL:
[jira] [Commented] (TS-1975) LocalManager may cause manager crash
[ https://issues.apache.org/jira/browse/TS-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139489#comment-14139489 ] Phil Sorber commented on TS-1975: - Here is a link to the docs on enabling core dumps. You may also need to make OS-level changes to enable core dumps. You also need to make sure you compiled with debug (-g), which I think is the default, so that symbols are available. https://docs.trafficserver.apache.org/en/latest/sdk/troubleshooting-tips/using-a-debugger.en.html LocalManager may cause manager crash Key: TS-1975 URL: https://issues.apache.org/jira/browse/TS-1975 Project: Traffic Server Issue Type: Bug Components: Manager Affects Versions: 3.3.4 Reporter: Zhao Yongming Assignee: portl4t Labels: Crash Fix For: 5.2.0 When something goes wrong with the LocalManager, for example [LocalManager::pollMgmtProcessServer] Error in read (errno: 104), both the manager and the server restart. {code} Jun 17 17:40:06 cache163 traffic_manager[25654]: {0x7f528b4297e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Jun 17 17:40:06 cache163 traffic_manager[25654]: {0x7f528b4297e0} FATAL: (last system error 104: Connection reset by peer) Jun 17 17:40:06 cache163 traffic_manager[25654]: {0x7f528b4297e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Jun 17 17:40:06 cache163 traffic_manager[25654]: {0x7f528b4297e0} ERROR: (last system error 32: Broken pipe) Jun 17 17:40:07 cache163 traffic_cop[25652]: cop received child status signal [25654 2816] Jun 17 17:40:07 cache163 traffic_cop[25652]: traffic_manager not running, making sure traffic_server is dead Jun 17 17:40:07 cache163 traffic_cop[25652]: spawning traffic_manager Jun 17 17:40:07 cache163 traffic_manager[10118]: NOTE: --- Manager Starting --- Jun 17 17:40:07 cache163 traffic_manager[10118]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 3.2.0 - (build # 51516 on Jun 15 2013 at 16:01:06) Jun 17 17:40:07 cache163 
traffic_manager[10118]: NOTE: RLIMIT_NOFILE(7):cur(16),max(16) Jun 17 17:40:07 cache163 traffic_manager[10118]: {0x7f26fc24a7e0} STATUS: opened /var/log/trafficserver/manager.log Jun 17 17:40:09 cache163 traffic_server[10131]: NOTE: --- Server Starting --- Jun 17 17:40:09 cache163 traffic_server[10131]: NOTE: Server Version: Apache Traffic Server - traffic_server - 3.2.0 - (build # 51516 on Jun 15 2013 at 16:01:31) Jun 17 17:40:09 cache163 traffic_server[10131]: {0x2b167ded2280} STATUS: opened /var/log/trafficserver/diags.log {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-3086) Range requests for stale cache entries never use If-Modified-Since/If-None-Match
William Bardwell created TS-3086: Summary: Range requests for stale cache entries never use If-Modified-Since/If-None-Match Key: TS-3086 URL: https://issues.apache.org/jira/browse/TS-3086 Project: Traffic Server Issue Type: Bug Components: HTTP Reporter: William Bardwell Range requests against a stale cache entry always cause the Range request to be tunneled with no conditional. It would be nice if it used conditionals even if it couldn't update the cache entry just to cut down the traffic. (Not sure if updating the cache entry would be right, does If-Modified-Since refer only to the Range requested or to the whole object?) We could also have an option to use If-Range in these cases, but that might not make sense as a global decision... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
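A revalidating Range request of the kind suggested in TS-3086 might look like the sketch below. The helper names, their parameters, and the host/path values are hypothetical and are not ATS code; only the header names (Range, If-None-Match, If-Modified-Since) are the standard HTTP ones. As far as the HTTP specifications go, the validators describe the whole selected representation rather than just the requested byte range.

```c
#include <stdio.h>
#include <string.h>

/* Append one "Name: value\r\n" line; returns the new length or -1 if it does not fit. */
static int append_header(char *buf, size_t cap, size_t used,
                         const char *name, const char *value)
{
    int n = snprintf(buf + used, cap - used, "%s: %s\r\n", name, value);
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    return (int)(used + (size_t)n);
}

/*
 * Hypothetical helper: format a Range request that also carries the
 * validators of the stale cache entry. etag and last_modified may be
 * NULL or empty when the cached entry lacks that validator.
 * Returns the request length, or -1 if buf is too small.
 */
int build_conditional_range_request(char *buf, size_t cap,
                                    const char *host, const char *path,
                                    long first, long last,
                                    const char *etag, const char *last_modified)
{
    int n = snprintf(buf, cap,
                     "GET %s HTTP/1.1\r\nHost: %s\r\nRange: bytes=%ld-%ld\r\n",
                     path, host, first, last);
    if (n < 0 || (size_t)n >= cap)
        return -1;

    if (etag != NULL && strlen(etag) > 0) {
        n = append_header(buf, cap, (size_t)n, "If-None-Match", etag);
        if (n < 0)
            return -1;
    }
    if (last_modified != NULL && strlen(last_modified) > 0) {
        n = append_header(buf, cap, (size_t)n, "If-Modified-Since", last_modified);
        if (n < 0)
            return -1;
    }

    /* A blank line terminates the header block (2 bytes plus the NUL). */
    if ((size_t)n + 2 >= cap)
        return -1;
    memcpy(buf + n, "\r\n", 3);
    return n + 2;
}
```

If the validator still matches, the origin can answer 304 Not Modified without a body; otherwise it answers with the requested range, which is where the traffic saving mentioned in the report would come from.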
[jira] [Comment Edited] (TS-3080) OpenSSL implementation of TLS session cache is very slow.
[ https://issues.apache.org/jira/browse/TS-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139593#comment-14139593 ] Alexey Ivanov edited comment on TS-3080 at 9/18/14 10:00 PM: - The bottleneck seems to manifest itself if: 1) we are at around 1k handshakes/sec. 2) we have a huge session cache size - 30 entries. It manifests itself as all NET threads stuck inside [SSL_CTX_flush_sessions], which is quite logical since it goes through the list of sessions applying the timeout function to each of them while holding a lock: {code} lh_SSL_SESSION_doall_arg(tp.cache, LHASH_DOALL_ARG_FN(timeout), TIMEOUT_PARAM, tp); {code} So we either need to reduce the number of elements in the cache, which will make it useless, or (preferably) write our own implementation using CK and {{SSL_CTX_sess_set_\{new,get,remove\}_cb}} callbacks. (That's how nginx did it, though nginx still allows using the built-in OpenSSL cache, even though it is slow and causes memory fragmentation [nginx_ssl_session_cache].) [SSL_CTX_flush_sessions] https://github.com/openssl/openssl/blob/master/ssl/ssl_sess.c#L964 [nginx_ssl_session_cache] http://nginx.org/en/docs/http/ngx_http_ssl_module.html#ssl_session_cache was (Author: savetherbtz): The bottleneck seems to manifest itself if: 1) we are at around 1k handshakes/sec. 2) we have a huge session cache size - 30 entries. It manifests itself as all NET threads stuck inside [SSL_CTX_flush_sessions], which is quite logical since it goes through the list of sessions applying the timeout function to each of them while holding a lock: {code} lh_SSL_SESSION_doall_arg(tp.cache, LHASH_DOALL_ARG_FN(timeout), TIMEOUT_PARAM, tp); {code} So we either need to reduce the number of elements in the cache, which will make it useless, or (preferably) write our own implementation using CK and {{SSL_CTX_sess_set_\{new,get,remove\}_cb}} callbacks. 
(That's how nginx did it, though nginx still allows using the built-in OpenSSL cache, even though it is slow and causes memory fragmentation [nginx#ssl_session_cache].) [SSL_CTX_flush_sessions] https://github.com/openssl/openssl/blob/master/ssl/ssl_sess.c#L964 [nginx#ssl_session_cache] http://nginx.org/en/docs/http/ngx_http_ssl_module.html#ssl_session_cache OpenSSL implementation of TLS session cache is very slow. - Key: TS-3080 URL: https://issues.apache.org/jira/browse/TS-3080 Project: Traffic Server Issue Type: Bug Components: Core, SSL Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 5.2.0 The OpenSSL implementation of TLS session caching is very slow; we attempted to use it, and its locking blows up at only a few hundred QPS. I'm going to develop a new TLS session cache in TS that is more performant under high load. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3080) OpenSSL implementation of TLS session cache is very slow.
[ https://issues.apache.org/jira/browse/TS-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139593#comment-14139593 ] Alexey Ivanov commented on TS-3080: --- The bottleneck seems to manifest itself if: 1) we are at around 1k handshakes/sec. 2) we have a huge session cache size - 30 entries. It manifests itself as all NET threads stuck inside [SSL_CTX_flush_sessions], which is quite logical since it goes through the list of sessions applying the timeout function to each of them while holding a lock: {code} lh_SSL_SESSION_doall_arg(tp.cache, LHASH_DOALL_ARG_FN(timeout), TIMEOUT_PARAM, tp); {code} So we either need to reduce the number of elements in the cache, which will make it useless, or (preferably) write our own implementation using CK and {{SSL_CTX_sess_set_\{new,get,remove\}_cb}} callbacks. (That's how nginx did it, though nginx still allows using the built-in OpenSSL cache, even though it is slow and causes memory fragmentation [nginx#ssl_session_cache].) [SSL_CTX_flush_sessions] https://github.com/openssl/openssl/blob/master/ssl/ssl_sess.c#L964 [nginx#ssl_session_cache] http://nginx.org/en/docs/http/ngx_http_ssl_module.html#ssl_session_cache OpenSSL implementation of TLS session cache is very slow. - Key: TS-3080 URL: https://issues.apache.org/jira/browse/TS-3080 Project: Traffic Server Issue Type: Bug Components: Core, SSL Reporter: Brian Geffon Assignee: Brian Geffon Fix For: 5.2.0 The OpenSSL implementation of TLS session caching is very slow; we attempted to use it, and its locking blows up at only a few hundred QPS. I'm going to develop a new TLS session cache in TS that is more performant under high load. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
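One way to avoid a table-wide sweep under a lock, as in the flush described above, is to check expiry lazily, one entry at a time, when a session is looked up. The sketch below shows only that storage idea and is an assumption-laden illustration, not the planned ATS cache: it is single-threaded, direct-mapped, and does not touch OpenSSL. A real implementation would be wired in through the {{SSL_CTX_sess_set_\{new,get,remove\}_cb}} callbacks mentioned in the comment and would need its own locking. All names and sizes here are hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#define CACHE_SLOTS 1024  /* illustrative table size */
#define SESSION_ID_MAX 32 /* session IDs are at most 32 bytes in TLS 1.2 and earlier */

struct cache_slot {
    unsigned char id[SESSION_ID_MAX];
    size_t id_len;
    time_t expires; /* absolute expiry time */
    void *session;  /* opaque payload; NULL means the slot is empty */
};

static struct cache_slot cache[CACHE_SLOTS];

/* Simple djb2-style hash of the session ID, reduced to a slot index. */
static size_t slot_for(const unsigned char *id, size_t len)
{
    unsigned long h = 5381;
    for (size_t i = 0; i < len; i++)
        h = h * 33 + id[i];
    return (size_t)(h % CACHE_SLOTS);
}

/* Store a session; an older entry in the same slot is simply overwritten. */
int cache_put(const unsigned char *id, size_t len, void *session,
              time_t now, long ttl)
{
    if (len == 0 || len > SESSION_ID_MAX || session == NULL)
        return -1;
    struct cache_slot *s = &cache[slot_for(id, len)];
    memcpy(s->id, id, len);
    s->id_len = len;
    s->expires = now + ttl;
    s->session = session;
    return 0;
}

/* Look up a session; an expired entry is released here, not in a global sweep. */
void *cache_get(const unsigned char *id, size_t len, time_t now)
{
    if (len == 0 || len > SESSION_ID_MAX)
        return NULL;
    struct cache_slot *s = &cache[slot_for(id, len)];
    if (s->session == NULL || s->id_len != len || memcmp(s->id, id, len) != 0)
        return NULL;
    if (now >= s->expires) {
        s->session = NULL; /* lazy expiry touches only this slot */
        return NULL;
    }
    return s->session;
}
```

The trade-off of this design is that colliding sessions evict each other and expired entries linger until their slot is next touched, in exchange for never walking the whole table while holding a lock.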
[jira] [Commented] (TS-3085) Large POSTs over (relatively) slower connections failing in ats5
[ https://issues.apache.org/jira/browse/TS-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139915#comment-14139915 ] kang li commented on TS-3085: - Hi [~sudheerv], I think the SSL stack corruption may be related to [TS-2986|https://issues.apache.org/jira/browse/TS-2986], since it removed SSLErrorVC to eliminate the SSL error log in diags.log. SSLErrorVC would call ERR_get_error_line_data to clean the error stack. Large POSTs over (relatively) slower connections failing in ats5 Key: TS-3085 URL: https://issues.apache.org/jira/browse/TS-3085 Project: Traffic Server Issue Type: Bug Components: SSL Affects Versions: 5.0.1 Reporter: Sudheer Vinukonda Assignee: Sudheer Vinukonda Labels: yahoo Fix For: 5.2.0 We ran into a production issue where large POSTs (30MB or larger) are failing over slower connections after the ats5 rollout (the problem can be easily reproduced using a Charles proxy with throttling enabled). Further debugging isolated the issue to uploads over SSL connections, and after a lot of debugging the issue appears to be the following: ATS calls SSL_read() followed by SSL_get_error() to check if there was any error in the read. This is repeated until either the complete data is read or an error occurs. However, the OpenSSL documentation recommends calling ERR_clear_error() prior to calling SSL_read() + SSL_get_error() to ensure the error queue is clean of any leftover/garbage errors. It's not clear what might be corrupting the error queue of the SSL context in a tight loop - possibly some new feature in ats5. In any case, calling ERR_clear_error() is a good idea, and adding it seems to resolve the POST failures. 
Documentation from openSSL and some related notes on stackoverflow: https://www.openssl.org/docs/ssl/SSL_get_error.html http://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error {code} SSL_get_error() returns a result code (suitable for the C ``switch'' statement) for a preceding call to SSL_connect(), SSL_accept(), SSL_do_handshake(), SSL_read(), SSL_peek(), or SSL_write() on ssl. The value returned by that TLS/SSL I/O function must be passed to SSL_get_error() in parameter ret. In addition to ssl and ret, SSL_get_error() inspects the current thread's OpenSSL error queue. Thus, SSL_get_error() must be used in the same thread that performed the TLS/SSL I/O operation, and no other OpenSSL function calls should appear in between. The current thread's error queue must be empty before the TLS/SSL I/O operation is attempted, or SSL_get_error() will not work reliably. SSL_get_error does not call ERR_get_error. So if you just call SSL_get_error, the error stays in the queue. You should be calling ERR_clear_error prior to ANY SSL-call(SSL_read, SSL_write etc) that is followed by SSL_get_error, otherwise you may be reading an old error that occurred previously in the current thread. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3084) forwarding mode breaks iPhone activation (gs.apple.com)
[ https://issues.apache.org/jira/browse/TS-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan M. Carroll updated TS-3084: Assignee: Susan Hinrichs forwarding mode breaks iPhone activation (gs.apple.com) --- Key: TS-3084 URL: https://issues.apache.org/jira/browse/TS-3084 Project: Traffic Server Issue Type: Bug Components: Core, HTTP Reporter: Nikolai Gorchilov Assignee: Susan Hinrichs Fix For: 5.2.0 Attachments: gs.request, gs.response, partial-request.txt On iDevice restoration, iTunes makes an activation request to gs.apple.com (request attached). When sent via ATS, the request leads to an HTTP/1.1 400 (bad request) response from the origin server. The proper response (on a direct connection) is also attached for your reference. Here's the command to reproduce the problem: {noformat} netcat gs.apple.com 80 < gs.request {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3073) tr-pass: non-http request gets blocked with error message instead of being tunnelled to the origin server
[ https://issues.apache.org/jira/browse/TS-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susan Hinrichs updated TS-3073: --- Attachment: ts-3084.patch This appears to be very similar to the problem fixed in TS-3073. The ATS logic really doesn't like it when the client closes a connection before the server finishes sending a response. This patch builds on the patch for TS-3073, but that patch addressed a tr-pass case and this one addresses an HTTP POST case. It again uses the half_open flag to track that a client has already sent a FIN, and checks the flag after all the data has been read to do a shutdown(IO_READ), which sends the FIN along to the origin server. tr-pass: non-http request gets blocked with error message instead of being tunnelled to the origin server - Key: TS-3073 URL: https://issues.apache.org/jira/browse/TS-3073 Project: Traffic Server Issue Type: Bug Components: Core, HTTP Reporter: Nikolai Gorchilov Assignee: Susan Hinrichs Fix For: 5.2.0 Attachments: bypass.request, tr-pass-client-close.patch, ts-3084.patch ATS breaks the RIFF Box JTAG Manager software, which uses a proprietary protocol over port 80, even with tr-pass enabled. Instead of creating a tunnel, ATS returns a bad request error. I managed to capture the request that triggers the issue (to be attached as bypass.request). Here's a simple command to reproduce the problem: #$ netcat 93.191.132.28 80 < bypass.request A direct request returns a simple exclamation mark '!', but passing it via ATS results in: {noformat} <HTML> <HEAD> <TITLE>Bad Request</TITLE> </HEAD> <BODY BGCOLOR=white FGCOLOR=black> <H1>Bad Request</H1> <HR> <FONT FACE=Helvetica,Arial><B> Description: Could not process this request. </B></FONT> <HR> </BODY> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
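The client behaviour that trips ATS here, sending the request, then a FIN, and still expecting the full response, can be reproduced in miniature with plain POSIX sockets. The sketch below is an illustration under assumptions, not the attached patch: `half_close_roundtrip` is a hypothetical name, a `socketpair()` stands in for the client-to-origin path with no proxy in between, and POSIX `shutdown(fd, SHUT_WR)` plays the role of the FIN that the patch forwards to the origin server.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send `request`, half-close the sending side, and read the reply that
 * the peer only sends after it has seen end-of-file.
 * Returns the number of reply bytes stored in `response`, or -1 on error.
 * The request must fit in the socket buffer because both ends run in one thread.
 */
ssize_t half_close_roundtrip(const char *request, char *response, size_t cap)
{
    int sv[2];
    if (cap == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    int client = sv[0], origin = sv[1];
    ssize_t n, total = -1;
    char buf[256];

    /* Client: send the request, then send FIN while keeping the read side open. */
    if (write(client, request, strlen(request)) < 0)
        goto done;
    if (shutdown(client, SHUT_WR) != 0)
        goto done;

    /* Origin: drain the request until end-of-file (the FIN) arrives. */
    while ((n = read(origin, buf, sizeof buf)) > 0) {
        /* request bytes are discarded in this sketch */
    }
    if (n < 0)
        goto done;

    /* Origin: reply only after the FIN, then close its end. */
    if (write(origin, "!", 1) != 1)
        goto done;
    close(origin);
    origin = -1;

    /* Client: the response is still readable after the half-close. */
    total = 0;
    while ((size_t)total < cap - 1 &&
           (n = read(client, response + total, cap - 1 - (size_t)total)) > 0)
        total += n;
    response[total] = '\0';

done:
    if (origin >= 0)
        close(origin);
    close(client);
    return total;
}
```

A proxy sitting between the two ends has to preserve this sequence: it should pass the FIN along once all request data has been forwarded, and keep relaying the response rather than tearing down the connection.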
[jira] [Work started] (TS-3073) tr-pass: non-http request gets blocked with error message instead of being tunnelled to the origin server
[ https://issues.apache.org/jira/browse/TS-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on TS-3073 started by Susan Hinrichs. -- tr-pass: non-http request gets blocked with error message instead of being tunnelled to the origin server - Key: TS-3073 URL: https://issues.apache.org/jira/browse/TS-3073 Project: Traffic Server Issue Type: Bug Components: Core, HTTP Reporter: Nikolai Gorchilov Assignee: Susan Hinrichs Fix For: 5.2.0 Attachments: bypass.request, tr-pass-client-close.patch, ts-3084.patch ATS breaks the RIFF Box JTAG Manager software, which uses a proprietary protocol over port 80, even with tr-pass enabled. Instead of creating a tunnel, ATS returns a bad request error. I managed to capture the request that triggers the issue (to be attached as bypass.request). Here's a simple command to reproduce the problem: #$ netcat 93.191.132.28 80 < bypass.request A direct request returns a simple exclamation mark '!', but passing it via ATS results in: {noformat} <HTML> <HEAD> <TITLE>Bad Request</TITLE> </HEAD> <BODY BGCOLOR=white FGCOLOR=black> <H1>Bad Request</H1> <HR> <FONT FACE=Helvetica,Arial><B> Description: Could not process this request. </B></FONT> <HR> </BODY> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)