[jira] [Assigned] (TS-4991) jtest should handle Range request
[ https://issues.apache.org/jira/browse/TS-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhao Yongming reassigned TS-4991:
---------------------------------

    Assignee: song

[~jasondmee] please take care of this request. thanks

> jtest should handle Range request
> ---------------------------------
>
> Key: TS-4991
> URL: https://issues.apache.org/jira/browse/TS-4991
> Project: Traffic Server
> Issue Type: Improvement
> Components: HTTP, Tests, Tools
> Reporter: Zhao Yongming
> Assignee: song
>
> jtest is not able to generate Range requests or handle Range requests; we should make it do so.
> I'd like to see the simple "Range: bytes=100-200/1000" case work first; other Range syntax, or even multiple ranges, can be considered later.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (TS-4991) jtest should handle Range request
Zhao Yongming created TS-4991:
---------------------------------

    Summary: jtest should handle Range request
    Key: TS-4991
    URL: https://issues.apache.org/jira/browse/TS-4991
    Project: Traffic Server
    Issue Type: Improvement
    Components: HTTP, Tests, Tools
    Reporter: Zhao Yongming

jtest is not able to generate Range requests or handle Range requests; we should make it do so.
I'd like to see the simple "Range: bytes=100-200/1000" case work first; other Range syntax, or even multiple ranges, can be considered later.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
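A note on the requested syntax: a Range *request* header carries only `bytes=100-200`; the `/1000` total belongs to the `Content-Range` header of the 206 *response* (`bytes 100-200/1000`, per RFC 7233). A minimal sketch of what jtest would have to generate and then verify — plain Python for illustration, not jtest's actual code:

```python
# Sketch of "generate a Range request and handle the Range response":
# jtest would (a) emit a Range header on some requests and (b) check the
# 206 response's Content-Range and body length against what it asked for.

def make_range_header(start, end):
    """Build a single-range request header value, e.g. 'bytes=100-200'."""
    return "bytes=%d-%d" % (start, end)

def parse_content_range(value):
    """Parse a 206 response's Content-Range, e.g. 'bytes 100-200/1000'.
    Returns (start, end, total)."""
    unit, _, rest = value.partition(" ")
    if unit != "bytes":
        raise ValueError("unsupported range unit: %r" % unit)
    span, _, total = rest.partition("/")
    start, _, end = span.partition("-")
    return int(start), int(end), int(total)

def check_206(req_start, req_end, content_range, body_len):
    """Verify that a 206 response matches the Range that was requested."""
    start, end, total = parse_content_range(content_range)
    return (start, end) == (req_start, req_end) and body_len == end - start + 1
```

Multiple ranges (`bytes=0-99,200-299`) would additionally require parsing a `multipart/byteranges` body, which is the "later" part of the issue.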
[jira] [Updated] (TS-2482) Problems with SOCKS
[ https://issues.apache.org/jira/browse/TS-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhao Yongming updated TS-2482:
------------------------------

    Assignee: Oknet Xu  (was: weijin)

> Problems with SOCKS
> -------------------
>
> Key: TS-2482
> URL: https://issues.apache.org/jira/browse/TS-2482
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Reporter: Radim Kolar
> Assignee: Oknet Xu
> Fix For: sometime
>
> There are several problems with using SOCKS. I am interested in the case where TS
> is the SOCKS client: the client sends an HTTP request and TS uses a SOCKS server to
> make the connection to the internet.
>
> a/ - not documented enough in the default configs
> From the default config comments it seems that for running TS 4.1.2 as a SOCKS
> client it is sufficient to add one line to socks.config:
> dest_ip=0.0.0.0-255.255.255.255 parent="10.0.0.7:9050"
> but the SOCKS proxy is not used. If I sniff packets with tcpdump, TS never tries
> to connect to that SOCKS server.
> From the source code -
> https://github.com/apache/trafficserver/blob/master/iocore/net/Socks.cc - it
> looks like "proxy.config.socks.socks_needed" needs to be set to activate SOCKS
> support. This should be documented in both sample files: socks.config and
> records.config.
>
> b/
> After enabling SOCKS, I am hit by this assert:
> Assertion failed: (ats_is_ip4(target_addr)), function init, file Socks.cc, line 65.
> I run on a dual-stack system (IPv4, IPv6).
> This code is setting the default destination for the SOCKS request? Could you not
> use just 127.0.0.1 for the case where the client connects over IPv6?
> https://github.com/apache/trafficserver/blob/master/iocore/net/Socks.cc#L66

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
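Putting the two halves of point (a) together, the minimal client-side setup the reporter was missing would look like this. The parent address is the one from the report; the record name is the one Socks.cc checks, but verify it against the docs for your version:

```
# records.config -- SOCKS support is inactive unless this is set
CONFIG proxy.config.socks.socks_needed INT 1

# socks.config -- route all destinations through the SOCKS parent
dest_ip=0.0.0.0-255.255.255.255 parent="10.0.0.7:9050"
```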
[jira] [Commented] (TS-4396) Off-by-one error in max redirects with redirection enabled
[ https://issues.apache.org/jira/browse/TS-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360120#comment-15360120 ]

Zhao Yongming commented on TS-4396:
-----------------------------------

proxy.config.http.number_of_redirections = 1 does NOT work as expected; let us fix that first.

> Off-by-one error in max redirects with redirection enabled
> ----------------------------------------------------------
>
> Key: TS-4396
> URL: https://issues.apache.org/jira/browse/TS-4396
> Project: Traffic Server
> Issue Type: Bug
> Components: Core, Network
> Reporter: Felix Buenemann
> Assignee: Zhao Yongming
> Fix For: 7.0.0
>
> There is a problem in the current stable version 6.1.1 where the setting
> proxy.config.http.number_of_redirections = 1 is incorrectly checked when
> following origin redirects with proxy.config.http.redirection_enabled = 1.
> If the requested URL is not already cached, ATS returns the redirect response
> to the client instead of storing the target in the cache and returning it to
> the client.
> The problem can be worked around by setting
> proxy.config.http.number_of_redirections = 2, but we only want to follow one
> redirect, so this is wrong.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
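A minimal sketch of the off-by-one being reported — plain Python, not the ATS redirect-follower itself: if the counter is compared against the limit one step too early, `number_of_redirections = 1` follows zero redirects, which matches the observed need to configure 2 to follow one.

```python
# Model of a redirect-following loop with and without the off-by-one.
# statuses: the sequence of responses the origin would return.

def redirects_followed(statuses, max_redirects, buggy=False):
    """Return how many redirects are actually followed before either the
    limit is hit (redirect goes back to the client) or a final response
    arrives. With buggy=True the limit is checked one step too early."""
    limit = max_redirects - 1 if buggy else max_redirects
    followed = 0
    for status in statuses:
        if status in (301, 302):
            if followed + 1 > limit:   # limit hit: pass redirect to client
                break
            followed += 1              # follow this redirect to the target
        else:
            break                      # final response, serve and cache it
    return followed
```

With `max_redirects=1`, the buggy check refuses the very first redirect, so the client sees the 302 instead of the cached target.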
[jira] [Assigned] (TS-4396) Off-by-one error in max redirects with redirection enabled
[ https://issues.apache.org/jira/browse/TS-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhao Yongming reassigned TS-4396:
---------------------------------

    Assignee: Zhao Yongming

> Off-by-one error in max redirects with redirection enabled
> ----------------------------------------------------------
>
> Key: TS-4396
> URL: https://issues.apache.org/jira/browse/TS-4396
> Project: Traffic Server
> Issue Type: Bug
> Components: Core, Network
> Reporter: Felix Buenemann
> Assignee: Zhao Yongming
> Fix For: 7.0.0
>
> There is a problem in the current stable version 6.1.1 where the setting
> proxy.config.http.number_of_redirections = 1 is incorrectly checked when
> following origin redirects with proxy.config.http.redirection_enabled = 1.
> If the requested URL is not already cached, ATS returns the redirect response
> to the client instead of storing the target in the cache and returning it to
> the client.
> The problem can be worked around by setting
> proxy.config.http.number_of_redirections = 2, but we only want to follow one
> redirect, so this is wrong.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (TS-4368) Segmentation fault
[ https://issues.apache.org/jira/browse/TS-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhao Yongming updated TS-4368:
------------------------------

    Component/s: (was: Logging)

> Segmentation fault
> ------------------
>
> Key: TS-4368
> URL: https://issues.apache.org/jira/browse/TS-4368
> Project: Traffic Server
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 6.1.2
> Reporter: Stef Fen
>
> We have a test trafficserver cluster of 2 nodes where the first node has
> segfaults and the other doesn't.
> We are using this source
> https://github.com/researchgate/trafficserver/tree/6.1.x
> which creates these packages (version 6.1.2)
> https://launchpad.net/~researchgate/+archive/ubuntu/trafficserver
> {code}
> [Apr 20 12:47:52.434] {0x2b72121ca600} ERROR: wrote crash log to /var/log/trafficserver/crash-2016-04-20-124752.log
> traffic_server: Segmentation fault (Address not mapped to object [0x8050])
> traffic_server - STACK TRACE:
> /usr/bin/traffic_server(crash_logger_invoke(int, siginfo_t*, void*)+0x97)[0x2ac6b8d676d7]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x2ac6bafdc340]
> /usr/bin/traffic_server(ink_aio_read(AIOCallback*, int)+0x36)[0x2ac6b8fe2e46]
> /usr/bin/traffic_server(CacheVC::handleRead(int, Event*)+0x3a1)[0x2ac6b8f9d131]
> /usr/bin/traffic_server(Cache::open_read(Continuation*, ats::CryptoHash const*, HTTPHdr*, CacheLookupHttpConfig*, CacheFragType, char const*, int)+0x61f)[0x2ac6b8fc056f]
> /usr/bin/traffic_server(cache_op_ClusterFunction(ClusterHandler*, void*, int)+0x94c)[0x2ac6b8f8fefc]
> /usr/bin/traffic_server(ClusterHandler::process_large_control_msgs()+0xf4)[0x2ac6b8f6dc84]
> /usr/bin/traffic_server(ClusterHandler::update_channels_read()+0x9b)[0x2ac6b8f7099b]
> /usr/bin/traffic_server(ClusterHandler::process_read(long)+0xae)[0x2ac6b8f7471e]
> /usr/bin/traffic_server(ClusterHandler::mainClusterEvent(int, Event*)+0x158)[0x2ac6b8f75048]
> /usr/bin/traffic_server(ClusterState::doIO_read_event(int, void*)+0x160)[0x2ac6b8f78d50]
> /usr/bin/traffic_server(+0x37e4c7)[0x2ac6b90114c7]
> /usr/bin/traffic_server(NetHandler::mainNetEvent(int, Event*)+0x218)[0x2ac6b90005e8]
> /usr/bin/traffic_server(EThread::execute()+0xa82)[0x2ac6b9033b82]
> /usr/bin/traffic_server(+0x39f6ca)[0x2ac6b90326ca]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x2ac6bafd4182]
> /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x2ac6bbd0847d]
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (TS-4368) Segmentation fault
[ https://issues.apache.org/jira/browse/TS-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhao Yongming updated TS-4368:
------------------------------

    Description:

We have a test trafficserver cluster of 2 nodes where the first node has segfaults and the other doesn't.
We are using this source
https://github.com/researchgate/trafficserver/tree/6.1.x
which creates these packages (version 6.1.2)
https://launchpad.net/~researchgate/+archive/ubuntu/trafficserver
{code}
[Apr 20 12:47:52.434] {0x2b72121ca600} ERROR: wrote crash log to /var/log/trafficserver/crash-2016-04-20-124752.log
traffic_server: Segmentation fault (Address not mapped to object [0x8050])
traffic_server - STACK TRACE:
/usr/bin/traffic_server(crash_logger_invoke(int, siginfo_t*, void*)+0x97)[0x2ac6b8d676d7]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x2ac6bafdc340]
/usr/bin/traffic_server(ink_aio_read(AIOCallback*, int)+0x36)[0x2ac6b8fe2e46]
/usr/bin/traffic_server(CacheVC::handleRead(int, Event*)+0x3a1)[0x2ac6b8f9d131]
/usr/bin/traffic_server(Cache::open_read(Continuation*, ats::CryptoHash const*, HTTPHdr*, CacheLookupHttpConfig*, CacheFragType, char const*, int)+0x61f)[0x2ac6b8fc056f]
/usr/bin/traffic_server(cache_op_ClusterFunction(ClusterHandler*, void*, int)+0x94c)[0x2ac6b8f8fefc]
/usr/bin/traffic_server(ClusterHandler::process_large_control_msgs()+0xf4)[0x2ac6b8f6dc84]
/usr/bin/traffic_server(ClusterHandler::update_channels_read()+0x9b)[0x2ac6b8f7099b]
/usr/bin/traffic_server(ClusterHandler::process_read(long)+0xae)[0x2ac6b8f7471e]
/usr/bin/traffic_server(ClusterHandler::mainClusterEvent(int, Event*)+0x158)[0x2ac6b8f75048]
/usr/bin/traffic_server(ClusterState::doIO_read_event(int, void*)+0x160)[0x2ac6b8f78d50]
/usr/bin/traffic_server(+0x37e4c7)[0x2ac6b90114c7]
/usr/bin/traffic_server(NetHandler::mainNetEvent(int, Event*)+0x218)[0x2ac6b90005e8]
/usr/bin/traffic_server(EThread::execute()+0xa82)[0x2ac6b9033b82]
/usr/bin/traffic_server(+0x39f6ca)[0x2ac6b90326ca]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x2ac6bafd4182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x2ac6bbd0847d]
{code}

  was:

We have a test trafficserver cluster of 2 nodes where the first node has segfaults and the other doesn't.
We are using this source
https://github.com/researchgate/trafficserver/tree/6.1.x
which creates this packages (version 6.1.2)
https://launchpad.net/~researchgate/+archive/ubuntu/trafficserver
{code}
[Apr 20 12:47:52.434] {0x2b72121ca600} ERROR: wrote crash log to /var/log/trafficserver/crash-2016-04-20-124752.log
traffic_server: Segmentation fault (Address not mapped to object [0x8050])
traffic_server - STACK TRACE:
/usr/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0x97)[0x2ac6b8d676d7]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x2ac6bafdc340]
/usr/bin/traffic_server(_Z12ink_aio_readP11AIOCallbacki+0x36)[0x2ac6b8fe2e46]
/usr/bin/traffic_server(_ZN7CacheVC10handleReadEiP5Event+0x3a1)[0x2ac6b8f9d131]
/usr/bin/traffic_server(_ZN5Cache9open_readEP12ContinuationPKN3ats10CryptoHashEP7HTTPHdrP21CacheLookupHttpConfig13CacheFragTypePKci+0x61f)[0x2ac6b8fc056f]
/usr/bin/traffic_server(_Z24cache_op_ClusterFunctionP14ClusterHandlerPvi+0x94c)[0x2ac6b8f8fefc]
/usr/bin/traffic_server(_ZN14ClusterHandler26process_large_control_msgsEv+0xf4)[0x2ac6b8f6dc84]
/usr/bin/traffic_server(_ZN14ClusterHandler20update_channels_readEv+0x9b)[0x2ac6b8f7099b]
/usr/bin/traffic_server(_ZN14ClusterHandler12process_readEl+0xae)[0x2ac6b8f7471e]
/usr/bin/traffic_server(_ZN14ClusterHandler16mainClusterEventEiP5Event+0x158)[0x2ac6b8f75048]
/usr/bin/traffic_server(_ZN12ClusterState15doIO_read_eventEiPv+0x160)[0x2ac6b8f78d50]
/usr/bin/traffic_server(+0x37e4c7)[0x2ac6b90114c7]
/usr/bin/traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x218)[0x2ac6b90005e8]
/usr/bin/traffic_server(_ZN7EThread7executeEv+0xa82)[0x2ac6b9033b82]
/usr/bin/traffic_server(+0x39f6ca)[0x2ac6b90326ca]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x2ac6bafd4182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x2ac6bbd0847d]
{code}

> Segmentation fault
> ------------------
>
> Key: TS-4368
> URL: https://issues.apache.org/jira/browse/TS-4368
> Project: Traffic Server
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 6.1.2
> Reporter: Stef Fen
>
> We have a test trafficserver cluster of 2 nodes where the first node has
> segfaults and the other doesn't.
> We are using this source
> https://github.com/researchgate/trafficserver/tree/6.1.x
> which creates these packages (version 6.1.2)
> https://launchpad.net/~researchgate/+archive/ubuntu/trafficserver
> {code}
> [Apr 20 12:47:52.434] {0x2b72121ca600} ERROR: wrote crash log to /var/log/trafficserver/crash-2016-04-20-124752.log
> traffic_server: Segmentation fault (Address not mapped to object [0x8050])
> traffic_server - STACK TRACE:
> /usr/bin/traffic_server(crash_logger_invoke(int,
[jira] [Updated] (TS-4368) Segmentation fault
[ https://issues.apache.org/jira/browse/TS-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhao Yongming updated TS-4368:
------------------------------

    Affects Version/s: 6.1.2
    Component/s: Logging
                 Clustering

> Segmentation fault
> ------------------
>
> Key: TS-4368
> URL: https://issues.apache.org/jira/browse/TS-4368
> Project: Traffic Server
> Issue Type: Bug
> Components: Clustering, Logging
> Affects Versions: 6.1.2
> Reporter: Stef Fen
>
> We have a test trafficserver cluster of 2 nodes where the first node has
> segfaults and the other doesn't.
> We are using this source
> https://github.com/researchgate/trafficserver/tree/6.1.x
> which creates these packages (version 6.1.2)
> https://launchpad.net/~researchgate/+archive/ubuntu/trafficserver
> {code}
> [Apr 20 12:47:52.434] {0x2b72121ca600} ERROR: wrote crash log to /var/log/trafficserver/crash-2016-04-20-124752.log
> traffic_server: Segmentation fault (Address not mapped to object [0x8050])
> traffic_server - STACK TRACE:
> /usr/bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0x97)[0x2ac6b8d676d7]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x2ac6bafdc340]
> /usr/bin/traffic_server(_Z12ink_aio_readP11AIOCallbacki+0x36)[0x2ac6b8fe2e46]
> /usr/bin/traffic_server(_ZN7CacheVC10handleReadEiP5Event+0x3a1)[0x2ac6b8f9d131]
> /usr/bin/traffic_server(_ZN5Cache9open_readEP12ContinuationPKN3ats10CryptoHashEP7HTTPHdrP21CacheLookupHttpConfig13CacheFragTypePKci+0x61f)[0x2ac6b8fc056f]
> /usr/bin/traffic_server(_Z24cache_op_ClusterFunctionP14ClusterHandlerPvi+0x94c)[0x2ac6b8f8fefc]
> /usr/bin/traffic_server(_ZN14ClusterHandler26process_large_control_msgsEv+0xf4)[0x2ac6b8f6dc84]
> /usr/bin/traffic_server(_ZN14ClusterHandler20update_channels_readEv+0x9b)[0x2ac6b8f7099b]
> /usr/bin/traffic_server(_ZN14ClusterHandler12process_readEl+0xae)[0x2ac6b8f7471e]
> /usr/bin/traffic_server(_ZN14ClusterHandler16mainClusterEventEiP5Event+0x158)[0x2ac6b8f75048]
> /usr/bin/traffic_server(_ZN12ClusterState15doIO_read_eventEiPv+0x160)[0x2ac6b8f78d50]
> /usr/bin/traffic_server(+0x37e4c7)[0x2ac6b90114c7]
> /usr/bin/traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x218)[0x2ac6b90005e8]
> /usr/bin/traffic_server(_ZN7EThread7executeEv+0xa82)[0x2ac6b9033b82]
> /usr/bin/traffic_server(+0x39f6ca)[0x2ac6b90326ca]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x2ac6bafd4182]
> /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x2ac6bbd0847d]
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (TS-4156) remove the traffic_sac, stand alone log collation server
[ https://issues.apache.org/jira/browse/TS-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhao Yongming updated TS-4156:
------------------------------

    Fix Version/s: 7.0.0  (was: sometime)

> remove the traffic_sac, stand alone log collation server
> ---------------------------------------------------------
>
> Key: TS-4156
> URL: https://issues.apache.org/jira/browse/TS-4156
> Project: Traffic Server
> Issue Type: Improvement
> Components: Logging
> Reporter: Zhao Yongming
> Assignee: Zhao Yongming
> Fix For: 7.0.0
>
> The stand-alone collation server acts as a dedicated log server for ATS. It is
> a dedicated log product from back in the Inktomi age, and we don't need it, as
> these functions are built into the traffic_server binary in the free
> distribution. It is time to nuke it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TS-4156) remove the traffic_sac, stand alone log collation server
[ https://issues.apache.org/jira/browse/TS-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125402#comment-15125402 ]

Zhao Yongming commented on TS-4156:
-----------------------------------

The log collation works as:

1. Dedicated log collation server: no http cache (or other) functions active, but still a traffic_server with full functions installed; we just don't put any requests on this traffic_server. Someone with very high traffic may use this mode, to keep the collation server out of the http cache service.

2. Mixed with a cache server: both the cache and the log collation server are active, and we log for other hosts (collation clients). Most users may choose this mode; it helps you collect all the logs in one single place, easy to check or back up.

3. traffic_sac stand-alone log server: no server function, just the log collation server. This is the duplicated binary.

By design, log collation helps you simplify logging by storing all the logs in one single place, one single file for the whole site, with just one timeline. And the log collation mode 'proxy.local.log.collation_mode' is a LOCAL directive in records.config, which makes it possible to activate a single host as the collation server while the others are collation clients, while still getting cluster management of the config files.

So I think that traffic_server with log collation mode as client or server is a must-have builtin function if we want to keep the log collation feature, and keeping a completely separate dedicated log collation server binary brings more code complexity.

> remove the traffic_sac, stand alone log collation server
> ---------------------------------------------------------
>
> Key: TS-4156
> URL: https://issues.apache.org/jira/browse/TS-4156
> Project: Traffic Server
> Issue Type: Improvement
> Components: Logging
> Reporter: Zhao Yongming
> Assignee: Zhao Yongming
> Fix For: sometime
>
> The stand-alone collation server acts as a dedicated log server for ATS. It is
> a dedicated log product from back in the Inktomi age, and we don't need it, as
> these functions are built into the traffic_server binary in the free
> distribution. It is time to nuke it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Comment Edited] (TS-4156) remove the traffic_sac, stand alone log collation server
[ https://issues.apache.org/jira/browse/TS-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125402#comment-15125402 ]

Zhao Yongming edited comment on TS-4156 at 1/31/16 4:30 PM:
------------------------------------------------------------

The log collation works as:

1. Dedicated log collation server: no http cache (or other) functions active, but still a traffic_server with full functions installed; we just don't put any requests on this traffic_server. Someone with very high traffic may use this mode, to keep the collation server out of the http cache service.

2. Mixed with a cache server: both the cache and the log collation server are active, and we log for other hosts (collation clients). Most users may choose this mode; it helps you collect all the logs in one single place, easy to check or back up.

3. traffic_sac stand-alone log server: no server function, just the log collation server. This is the duplicated binary.

By design, log collation helps you simplify logging by storing all the logs in one single place, one single file for the whole site, with just one timeline. And the log collation mode 'proxy.local.log.collation_mode' is a LOCAL directive in records.config, which makes it possible to activate a single host as the collation server while the others are collation clients, while still getting cluster management of the config files.

So I think that traffic_server with log collation mode as client or server is a must-have builtin function if we want to keep the log collation feature, and keeping a completely separate dedicated log collation server binary brings more code complexity.

was (Author: zym):
the log collation works as:
1, dedicated log collation server: no http cache (or others) function active, bug still a traffic_server, with full functions installed, that way we just don't put any request on this traffic_server. someone with very high traffic may use this mode, just don't keep the collation server out of the http cache service.
2, mixed with cache server: both cache and logging collation server in active, we log for other hosts(collation clients). most of the users may choose this mode, it will help you collect all the logs into one single place, and easy for check or backup.
3, traffic_sac stand alone log server: no server function, just log collation server. this is the duplicated binary.
by design, the log collation is going to help you simple the logging by store all the logs into one single place, one single file the whole site, with just one timeline. and the log collation mode 'poxy.local.log.collation_mode' is a LOCAL directive in records.config, that make it possible to active a single host as collation server while others as collation server, while still got the cluster management of config files.
so, I think that traffic_server with log collation mode in client or server is just a must builtin function if we want to keep the log collation feature, and keep a completely dedicated log collation server may bring more code complex.

> remove the traffic_sac, stand alone log collation server
> ---------------------------------------------------------
>
> Key: TS-4156
> URL: https://issues.apache.org/jira/browse/TS-4156
> Project: Traffic Server
> Issue Type: Improvement
> Components: Logging
> Reporter: Zhao Yongming
> Assignee: Zhao Yongming
> Fix For: sometime
>
> The stand-alone collation server acts as a dedicated log server for ATS. It is
> a dedicated log product from back in the Inktomi age, and we don't need it, as
> these functions are built into the traffic_server binary in the free
> distribution. It is time to nuke it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
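For reference, a sketch of the records.config knobs the comment above refers to. The record names are from the ATS logging docs; the host value is a placeholder, and the exact mode values should be verified against the docs for your version:

```
# LOCAL scope: each cluster node can take a different role
LOCAL proxy.local.log.collation_mode INT 1    # 0 = collation off
                                              # 1 = this host is the collation server
                                              # 2 = collation client (standard formats)
CONFIG proxy.config.log.collation_host STRING logs.example.com
CONFIG proxy.config.log.collation_port INT 8085
```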
[jira] [Commented] (TS-4156) remove the traffic_sac, stand alone log collation server
[ https://issues.apache.org/jira/browse/TS-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125409#comment-15125409 ]

Zhao Yongming commented on TS-4156:
-----------------------------------

Orphaned logs are a point where we can improve; I think we can build some tools for that. Because an orphaned log is outside the mainline log file, it is hard to archive a single logging file for that period, even if we can collect all the orphaned logs onto one single box.

Orphaned log files happen when the log server is down or has traffic issues; we have seen very few orphaned logs after we improved the log collation server's performance.

> remove the traffic_sac, stand alone log collation server
> ---------------------------------------------------------
>
> Key: TS-4156
> URL: https://issues.apache.org/jira/browse/TS-4156
> Project: Traffic Server
> Issue Type: Improvement
> Components: Logging
> Reporter: Zhao Yongming
> Assignee: Zhao Yongming
> Fix For: sometime
>
> The stand-alone collation server acts as a dedicated log server for ATS. It is
> a dedicated log product from back in the Inktomi age, and we don't need it, as
> these functions are built into the traffic_server binary in the free
> distribution. It is time to nuke it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (TS-4156) remove the traffic_sac, stand alone log collation server
Zhao Yongming created TS-4156:
---------------------------------

    Summary: remove the traffic_sac, stand alone log collation server
    Key: TS-4156
    URL: https://issues.apache.org/jira/browse/TS-4156
    Project: Traffic Server
    Issue Type: Improvement
    Components: Logging
    Reporter: Zhao Yongming

The stand-alone collation server acts as a dedicated log server for ATS. It is a dedicated log product from back in the Inktomi age, and we don't need it, as these functions are built into the traffic_server binary in the free distribution.
It is time to nuke it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (TS-4056) MemLeak: ~NetAccept() do not free alloc_cache(vc)
[ https://issues.apache.org/jira/browse/TS-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhao Yongming updated TS-4056:
------------------------------

    Affects Version/s: 6.1.0

> MemLeak: ~NetAccept() do not free alloc_cache(vc)
> -------------------------------------------------
>
> Key: TS-4056
> URL: https://issues.apache.org/jira/browse/TS-4056
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Affects Versions: 6.1.0
> Reporter: Oknet Xu
>
> NetAccept::alloc_cache is a void pointer used in net_accept().
> The alloc_cache is not released after the NetAccept is cancelled.
> I have looked through all the code and believe "alloc_cache" is a bad idea here.
> I created a pull request on github:
> https://github.com/apache/trafficserver/pull/366
> It also adds a check for vc == NULL after allocate_vc().

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TS-4059) Default value for proxy.config.bin_path does not use value from config.layout
[ https://issues.apache.org/jira/browse/TS-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046143#comment-15046143 ]

Zhao Yongming commented on TS-4059:
-----------------------------------

I think Craig Forbes wants the build & installation to honor '--bindir=DIR  user executables [EPREFIX/bin]', which defaults to 'EPREFIX/bin' if not specified. I think you may submit a patch if that does not work as you wish.

IMO, all the *_path config options should be removed, as those are binary-releasing options; now that we are open source with the whole layout configurable, we should remove them (or hardcode them to the configure-specified directory) from records.config.

Patch welcome, FYI.

> Default value for proxy.config.bin_path does not use value from config.layout
> -----------------------------------------------------------------------------
>
> Key: TS-4059
> URL: https://issues.apache.org/jira/browse/TS-4059
> Project: Traffic Server
> Issue Type: Bug
> Components: Configuration
> Reporter: Craig Forbes
>
> The default value for proxy.config.bin_path defined in RecordsConfig.cc is
> hard coded to "bin".
> The value should be TS_BUILD_BINDIR so the value specified at configure time
> is used.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
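Concretely, what honoring --bindir would look like (the paths here are hypothetical):

```
./configure --prefix=/opt/ts --bindir=/opt/ts/sbin
# After this, the default for proxy.config.bin_path in RecordsConfig.cc
# should come from TS_BUILD_BINDIR (the sbin directory chosen above),
# not the hard-coded literal "bin".
```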
[jira] [Commented] (TS-4058) Logging doesn't work when TS is compiled and run w/ --with-user
[ https://issues.apache.org/jira/browse/TS-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046156#comment-15046156 ]

Zhao Yongming commented on TS-4058:
-----------------------------------

Good catch. _cop is designed to be run as root, and --with-user=danielxu specifies that _server runs as danielxu; that is the current setup. Currently an unprivileged user should not run _cop; in the past it even failed if you wanted to make install as non-root, haha. In most cases we would advise running _server directly for small testing.

It would be nice if you can make _cop run as an unprivileged user.

> Logging doesn't work when TS is compiled and run w/ --with-user
> ---------------------------------------------------------------
>
> Key: TS-4058
> URL: https://issues.apache.org/jira/browse/TS-4058
> Project: Traffic Server
> Issue Type: Bug
> Components: Logging
> Reporter: Daniel Xu
> Assignee: Daniel Xu
>
> i.e. we run this _without_ sudo.
> traffic_cop output seems to point to permission errors that occur within
> traffic_manager

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (TS-3510) header_rewrite is blocking building on raspberry pi
Zhao Yongming created TS-3510:
---------------------------------

    Summary: header_rewrite is blocking building on raspberry pi
    Key: TS-3510
    URL: https://issues.apache.org/jira/browse/TS-3510
    Project: Traffic Server
    Issue Type: Bug
    Components: Build, Plugins
    Reporter: Zhao Yongming

ARM support is so good that we just have the Raspberry Pi failing to build header_rewrite.
{code}
pi@raspberrypi ~/trafficserver/plugins/header_rewrite $ make -j 2
  CXX      conditions.lo
  CXX      header_rewrite.lo
{standard input}: Assembler messages:
{standard input}:1221: Error: selected processor does not support ARM mode `dmb'
  CXX      lulu.lo
  CXX      matcher.lo
  CXX      operator.lo
Makefile:689: recipe for target 'conditions.lo' failed
make: *** [conditions.lo] Error 1
make: *** Waiting for unfinished jobs
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (TS-3472) SNI proxy alike feature for TS
[ https://issues.apache.org/jira/browse/TS-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhao Yongming updated TS-3472:
------------------------------

    Fix Version/s: sometime

> SNI proxy alike feature for TS
> ------------------------------
>
> Key: TS-3472
> URL: https://issues.apache.org/jira/browse/TS-3472
> Project: Traffic Server
> Issue Type: New Feature
> Components: SSL
> Reporter: Zhao Yongming
> Fix For: sometime
>
> When doing a forward-proxy-only setup, sniproxy
> (https://github.com/dlundquist/sniproxy.git) is a very tiny but cool effort
> to set up a TLS-layer proxy with SNI, very good for some dirty tasks.
> ATS already has very good support in all the basic components, so adding an
> SNI blind proxy should be a very good feature, with maybe only tiny small
> changes.
> SNI in TLS will extend the proxy (and caching) to all TLS-based services,
> such as mail etc.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (TS-3472) SNI proxy alike feature for TS
Zhao Yongming created TS-3472:
---------------------------------

    Summary: SNI proxy alike feature for TS
    Key: TS-3472
    URL: https://issues.apache.org/jira/browse/TS-3472
    Project: Traffic Server
    Issue Type: New Feature
    Components: SSL
    Reporter: Zhao Yongming

When doing a forward-proxy-only setup, sniproxy (https://github.com/dlundquist/sniproxy.git) is a very tiny but cool effort to set up a TLS-layer proxy with SNI, very good for some dirty tasks.
ATS already has very good support in all the basic components, so adding an SNI blind proxy should be a very good feature, with maybe only tiny small changes.
SNI in TLS will extend the proxy (and caching) to all TLS-based services, such as mail etc.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
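The core trick an SNI blind proxy relies on, and what makes sniproxy so small: the server_name can be read straight out of the raw ClientHello bytes, with no SSL library and no certificates, because the handshake is only parsed, never terminated. A sketch of that extraction (offsets per RFC 5246 and RFC 6066; this is an illustration, not ATS or sniproxy code):

```python
import struct

def extract_sni(data):
    """Return the server_name from a raw TLS ClientHello record, or None."""
    if len(data) < 5 or data[0] != 0x16:          # not a TLS handshake record
        return None
    pos = 5                                        # skip the 5-byte record header
    if data[pos] != 0x01:                          # not a ClientHello
        return None
    pos += 4                                       # handshake type + 24-bit length
    pos += 2 + 32                                  # client_version + random
    sid_len = data[pos]; pos += 1 + sid_len        # session_id
    cs_len = struct.unpack_from("!H", data, pos)[0]
    pos += 2 + cs_len                              # cipher_suites
    comp_len = data[pos]; pos += 1 + comp_len      # compression_methods
    if pos + 2 > len(data):
        return None                                # no extensions present
    ext_total = struct.unpack_from("!H", data, pos)[0]; pos += 2
    end = pos + ext_total
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", data, pos); pos += 4
        if ext_type == 0x0000:                     # server_name extension
            # skip list length (2) + name type (1), then read name length (2)
            name_len = struct.unpack_from("!H", data, pos + 3)[0]
            return data[pos + 5:pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None
```

A proxy then just maps the returned name to a backend and splices bytes in both directions, which is why the whole thing can stay a pure TCP-layer feature.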
[jira] [Commented] (TS-2482) Problems with SOCKS
[ https://issues.apache.org/jira/browse/TS-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386809#comment-14386809 ] Zhao Yongming commented on TS-2482: --- No time to test; here is the rough patch:
{code}
diff --git a/proxy/http/HttpTransact.cc b/proxy/http/HttpTransact.cc
index c6f55ed..cc4ffdc 100644
--- a/proxy/http/HttpTransact.cc
+++ b/proxy/http/HttpTransact.cc
@@ -865,7 +865,7 @@ HttpTransact::EndRemapRequest(State* s)
   if (s->http_config_param->reverse_proxy_enabled && !s->client_info.is_transparent &&
-      !incoming_request->is_target_in_url()) {
+      !(incoming_request->is_target_in_url() || incoming_request->m_host_length > 0)) {
     // the url mapping failed, reverse proxy was enabled,
     // and the request contains no host:
{code}
and:
{code}
diff --git a/iocore/net/Socks.cc b/iocore/net/Socks.cc
index cfdd214..c04c0f4 100644
--- a/iocore/net/Socks.cc
+++ b/iocore/net/Socks.cc
@@ -62,7 +62,7 @@ SocksEntry::init(ProxyMutex * m, SocksNetVC * vc, unsigned char socks_support, u
   req_data.api_info = 0;
   req_data.xact_start = time(0);
-  assert(ats_is_ip4(target_addr));
+  //assert(ats_is_ip4(target_addr));
   ats_ip_copy(req_data.dest_ip, target_addr);
   // we don't have information about the source; set to destination's
{code}
The assert part of the patch may need more work, and the SOCKS server side only does HTTP checking with no other SOCKS support, so it is not a great SOCKS server, indeed. I'd like to see someone take this and continue improving the SOCKS server feature, so I am pasting the patch here before it is lost in time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
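The enablement steps described in this report boil down to two config edits. A minimal sketch, with the IP:port taken from the report and the value syntax assumed from the ATS 4.x defaults:
{code}
# records.config: SOCKS support must be switched on explicitly
CONFIG proxy.config.socks.socks_needed INT 1

# socks.config: route all destinations through the SOCKS parent
dest_ip=0.0.0.0-255.255.255.255 parent="10.0.0.7:9050"
{code}
As the report says, without the records.config line the socks.config rule is silently ignored, which is why both sample files should document the dependency.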
[jira] [Commented] (TS-3472) SNI proxy alike feature for TS
[ https://issues.apache.org/jira/browse/TS-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386893#comment-14386893 ] Zhao Yongming commented on TS-3472: --- sniproxy does not need to intercept the SSL server or client; it only takes the SNI name and routes the connection to the backend, and it does not even need to link against an SSL library. With ssl_multicert.config, dest_ip=* action=tunnel does not work, as we would need an SSL cert/key file to act as an SSL intercept? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
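To illustrate why no SSL library is needed: the SNI host name sits in clear text inside the ClientHello, so a blind tunnel only has to peek at byte offsets in the first TLS record before splicing the connection. A rough illustrative sketch (not ATS or sniproxy code):

```python
def extract_sni(data: bytes):
    """Return the SNI host name from a raw TLS ClientHello record, or None."""
    if len(data) < 43 or data[0] != 0x16 or data[5] != 0x01:
        return None                      # not a TLS handshake / ClientHello
    pos = 9                              # skip record header + handshake header
    pos += 2 + 32                        # client_version + random
    pos += 1 + data[pos]                 # session_id
    pos += 2 + int.from_bytes(data[pos:pos + 2], "big")  # cipher_suites
    pos += 1 + data[pos]                 # compression_methods
    if pos + 2 > len(data):
        return None                      # no extensions present
    end = pos + 2 + int.from_bytes(data[pos:pos + 2], "big")
    pos += 2
    while pos + 4 <= end:                # walk the extension list
        ext_type = int.from_bytes(data[pos:pos + 2], "big")
        ext_len = int.from_bytes(data[pos + 2:pos + 4], "big")
        pos += 4
        if ext_type == 0x0000:           # server_name (SNI) extension
            # 2-byte list length, 1-byte entry type, 2-byte name length, name
            name_len = int.from_bytes(data[pos + 3:pos + 5], "big")
            return data[pos + 5:pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None


def make_client_hello(host: str) -> bytes:
    """Build a minimal ClientHello carrying `host` as SNI (for testing only)."""
    name = host.encode("ascii")
    entry = b"\x00" + len(name).to_bytes(2, "big") + name
    sni = (b"\x00\x00" + (len(entry) + 2).to_bytes(2, "big")
           + len(entry).to_bytes(2, "big") + entry)
    body = (b"\x03\x03" + bytes(32)      # client_version + random
            + b"\x00"                    # empty session_id
            + b"\x00\x02\x00\x2f"        # one cipher suite
            + b"\x01\x00"                # null compression only
            + len(sni).to_bytes(2, "big") + sni)
    hs = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + len(hs).to_bytes(2, "big") + hs
```

Once the name is extracted, the proxy just opens a TCP connection to the mapped backend and shuffles bytes both ways, which is exactly the blind-tunnel behavior this ticket asks ATS to provide.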
[jira] [Commented] (TS-3472) SNI proxy alike feature for TS
[ https://issues.apache.org/jira/browse/TS-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386944#comment-14386944 ] Zhao Yongming commented on TS-3472: --- Yes, sniproxy makes it possible to proxy (without caching) TLS-based services, with remap-like origin routing control, something like a layer-7 routing/proxy service? In a forward proxy, proxying is sometimes more important than caching. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TS-3472) SNI proxy alike feature for TS
[ https://issues.apache.org/jira/browse/TS-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387868#comment-14387868 ] Zhao Yongming edited comment on TS-3472 at 3/31/15 2:52 AM: A forward proxy has nothing to control; it tries to proxy whatever is not cacheable. While a reverse proxy does its caching on the site side, most forward proxies work on the user side. was (Author: zym): the forwarding proxy have nothing to control, that means they try to proxy if not cache-able. while the reverse proxy do caching on the site side, most of the forwarding proxy works on the user site. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3472) SNI proxy alike feature for TS
[ https://issues.apache.org/jira/browse/TS-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387868#comment-14387868 ] Zhao Yongming commented on TS-3472: --- A forward proxy has nothing to control; it tries to proxy whatever is not cacheable. While a reverse proxy does its caching on the site side, most forward proxies work on the user side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-2205) AIO caused system hang
[ https://issues.apache.org/jira/browse/TS-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386140#comment-14386140 ] Zhao Yongming commented on TS-2205: --- TS-3458 is reported as an index-syncing issue; it needs more information, and I will try to get it. Basically we have not found anything more that needs looking into; I will leave this issue open for a while and close it if no further information arrives. AIO caused system hang -- Key: TS-2205 URL: https://issues.apache.org/jira/browse/TS-2205 Project: Traffic Server Issue Type: Bug Components: Cache Affects Versions: 4.0.1 Reporter: Zhao Yongming Assignee: weijin Priority: Critical Fix For: 6.0.0 the system may hang with AIO thread CPU usage rising: {code} top - 17:10:46 up 38 days, 22:43, 2 users, load average: 11.34, 2.97, 2.75 Tasks: 512 total, 55 running, 457 sleeping, 0 stopped, 0 zombie Cpu(s): 6.9%us, 54.8%sy, 0.0%ni, 37.3%id, 0.0%wa, 0.0%hi, 0.9%si, 0.0%st Mem: 65963696k total, 64318444k used, 1645252k free, 241496k buffers Swap: 33554424k total, 20416k used, 33534008k free, 14864188k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 32498 ats 20 0 59.3g 45g 25m R 65.8 72.1 24:44.15 [ET_AIO 5] 3213 root 20 0 0 0 0 S 15.4 0.0 13:38.32 kondemand/7 3219 root 20 0 0 0 0 S 15.1 0.0 16:32.78 kondemand/13 4 root 20 0 0 0 0 S 13.8 0.0 33:18.13 ksoftirqd/0 13 root 20 0 0 0 0 S 13.4 0.0 21:45.18 ksoftirqd/2 37 root 20 0 0 0 0 S 13.4 0.0 19:42.34 ksoftirqd/8 45 root 20 0 0 0 0 S 13.4 0.0 18:31.17 ksoftirqd/10 32483 ats 20 0 59.3g 45g 25m R 13.4 72.1 16:47.14 [ET_AIO 6] 32487 ats 20 0 59.3g 45g 25m R 13.4 72.1 16:46.93 [ET_AIO 2] 25 root 20 0 0 0 0 S 13.1 0.0 19:02.18 ksoftirqd/5 65 root 20 0 0 0 0 S 13.1 0.0 19:24.04 ksoftirqd/15 32477 ats 20 0 59.3g 45g 25m R 13.1 72.1 16:32.90 [ET_AIO 0] 32478 ats 20 0 59.3g 45g 25m R 13.1 72.1 16:49.77 [ET_AIO 1] 32479 ats 20 0 59.3g 45g 25m S 13.1 72.1 16:41.77 [ET_AIO 2] 32481 ats 20 0 59.3g 45g 25m R 13.1 72.1 16:50.40 [ET_AIO 4] 32482 ats 20 0 59.3g 45g 25m R 13.1 72.1 
16:47.42 [ET_AIO 5] 32484 ats 20 0 59.3g 45g 25m R 13.1 72.1 16:25.81 [ET_AIO 7] 32485 ats 20 0 59.3g 45g 25m S 13.1 72.1 16:52.71 [ET_AIO 0] 32486 ats 20 0 59.3g 45g 25m S 13.1 72.1 16:51.69 [ET_AIO 1] 32491 ats 20 0 59.3g 45g 25m S 13.1 72.1 16:50.58 [ET_AIO 6] 32492 ats 20 0 59.3g 45g 25m S 13.1 72.1 16:49.12 [ET_AIO 7] 32480 ats 20 0 59.3g 45g 25m S 12.8 72.1 16:47.39 [ET_AIO 3] 32488 ats 20 0 59.3g 45g 25m R 12.8 72.1 16:52.16 [ET_AIO 3] 32489 ats 20 0 59.3g 45g 25m S 12.8 72.1 16:50.79 [ET_AIO 4] 32490 ats 20 0 59.3g 45g 25m R 12.8 72.1 16:52.61 [ET_AIO 5] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-965) cache.config can't deal with both revalidate= and ttl-in-cache= specified
[ https://issues.apache.org/jira/browse/TS-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352036#comment-14352036 ] Zhao Yongming commented on TS-965: -- I have no idea of the details, as cache.config is a multi-matching rule system, and there are some hard-coded rules that are not explained anywhere; for example, if you match 'no-cache', then it will not cache. I don't like the cache-control matching, which is hard to extend and hard to use in the real world; maybe we should avoid it in favor of Lua remapping and Lua plugins. cache.config can't deal with both revalidate= and ttl-in-cache= specified - Key: TS-965 URL: https://issues.apache.org/jira/browse/TS-965 Project: Traffic Server Issue Type: Bug Components: Cache Affects Versions: 3.1.0, 3.0.1 Reporter: Igor Galić Assignee: Alan M. Carroll Labels: A, cache-control Fix For: 5.3.0 If both of these options are specified (at the same time?), nothing is cached at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
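For reference, the two directives in question look like this in cache.config; the combination in the last rule is exactly what this ticket reports as broken (the domain is a hypothetical example):
{code}
# revalidate: treat matching objects as fresh for 2 hours, then revalidate
dest_domain=example.com revalidate=2h

# ttl-in-cache: pin matching objects in the cache for 1 day, ignoring freshness headers
dest_domain=example.com ttl-in-cache=1d

# the reported failure: specifying both on one rule caches nothing at all
dest_domain=example.com revalidate=2h ttl-in-cache=1d
{code}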
[jira] [Updated] (TS-3197) dest_ip in cache.config should be expand to network style
[ https://issues.apache.org/jira/browse/TS-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhao Yongming updated TS-3197: -- Summary: dest_ip in cache.config should be expand to network style (was: dest_ip in cache.config doesn't work) dest_ip in cache.config should be expand to network style - Key: TS-3197 URL: https://issues.apache.org/jira/browse/TS-3197 Project: Traffic Server Issue Type: Bug Components: Cache, Configuration, Performance Reporter: Luca Rea Fix For: sometime Hi, I'm trying to exclude a /22 netblock from the cache system but the dest_ip syntax doesn't work, details below: dest_ip=x.y.84.0-x.y.87.255 action=never-cache I've tried to stop, clear-cache, start several times but every time images have been put into the cache and the log shows NONE FIN FIN TCP_MEM_HIT or NONE FIN FIN TCP_IMS_HIT. Other Info: proxy.node.version.manager.long=Apache Traffic Server - traffic_manager - 5.1.0 - (build # 81013 on Sep 10 2014 at 13:13:42) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3197) dest_ip in cache.config should be expand to network style
[ https://issues.apache.org/jira/browse/TS-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhao Yongming updated TS-3197: -- Priority: Minor (was: Major) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3197) dest_ip in cache.config should be expand to network style
[ https://issues.apache.org/jira/browse/TS-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhao Yongming updated TS-3197: -- Issue Type: Improvement (was: Bug) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
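The retitled request is for dest_ip to accept network (CIDR) notation in addition to the explicit range form; keeping the reporter's x.y placeholders, the two equivalent spellings would be:
{code}
# works today: explicit address range
dest_ip=x.y.84.0-x.y.87.255 action=never-cache

# requested: the same /22 netblock in network notation
dest_ip=x.y.84.0/22 action=never-cache
{code}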
[jira] [Commented] (TS-3212) 200 code is returned as 304
[ https://issues.apache.org/jira/browse/TS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344883#comment-14344883 ] Zhao Yongming commented on TS-3212: --- Yeah, let us start tracking the cache-control issue then. What confuses me is why the IMS is there at all if your response includes all the 'no-cache' directives to inform the client that the content is not cacheable; that is weird. Anyway, keep this issue open until we fix cache-control and recheck it. 200 code is returned as 304 --- Key: TS-3212 URL: https://issues.apache.org/jira/browse/TS-3212 Project: Traffic Server Issue Type: Bug Components: Cache Reporter: Luca Rea Fix For: sometime The live streaming videos from akamaihd.net CDN cannot be watched because ATS rewrites 200 codes into 304 and videos continuously re-enter buffering status: {code} GET http://abclive.abcnews.com/z/abc_live1@136327/1200_02769fd3e0d85977-p.bootstrap?g=PDSTQVGEMQKRb=500,300,700,900,1200hdcore=3.1.0plugin=aasp-3.1.0.43.124 HTTP/1.1 Host: abclive.abcnews.com User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3 Accept-Encoding: gzip, deflate Referer: http://a.abcnews.com/assets/player/amp/2.0.0012/amp.premier/AkamaiPremierPlayer.swf Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA== Connection: keep-alive HTTP/1.1 200 OK Server: ContactLab Mime-Version: 1.0 Content-Type: video/abst Content-Length: 122 Last-Modified: Tue, 25 Nov 2014 05:28:32 GMT Expires: Tue, 25 Nov 2014 15:31:53 GMT Cache-Control: max-age=0, no-cache Pragma: no-cache Date: Tue, 25 Nov 2014 15:31:53 GMT access-control-allow-origin: * Set-Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA==; path=/z/abc_live1@136327/; domain=abclive.abcnews.com Age: 0 Connection: keep-alive GET 
http://abclive.abcnews.com/z/abc_live1@136327/1200_02769fd3e0d85977-p.bootstrap?g=PDSTQVGEMQKRb=500,300,700,900,1200hdcore=3.1.0plugin=aasp-3.1.0.43.124 HTTP/1.1 Host: abclive.abcnews.com User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3 Accept-Encoding: gzip, deflate Referer: http://a.abcnews.com/assets/player/amp/2.0.0012/amp.premier/AkamaiPremierPlayer.swf Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA== Connection: keep-alive If-Modified-Since: Tue, 25 Nov 2014 05:28:32 GMT HTTP/1.1 304 Not Modified Date: Tue, 25 Nov 2014 15:31:58 GMT Expires: Tue, 25 Nov 2014 15:31:58 GMT Cache-Control: max-age=0, no-cache Connection: keep-alive Server: ContactLab {code} using the url_regex to skip cache/IMS doesn't work, the workaround is the following line in records.config: CONFIG proxy.config.http.cache.cache_urls_that_look_dynamic INT 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3360) TS don't use peer IP address from icp.config
[ https://issues.apache.org/jira/browse/TS-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344597#comment-14344597 ] Zhao Yongming commented on TS-3360: --- I think that is only the default config file being misleading; according to the official doc: https://docs.trafficserver.apache.org/en/latest/reference/configuration/icp.config.en.html#std:configfile-icp.config only one of Hostname and HostIP needs to be specified, not both :D Can you provide an update to the default config file to make it clear? TS don't use peer IP address from icp.config Key: TS-3360 URL: https://issues.apache.org/jira/browse/TS-3360 Project: Traffic Server Issue Type: Bug Components: Configuration, ICP Reporter: Anton Ageev Fix For: 5.3.0 I use TS 5.0.1. I try to add peer in icp.config: {code} peer1|192.168.0.2|2|80|3130|0|0.0.0.0|1| {code} But I got in the log: {code} DEBUG: (icp_warn) ICP query send, res=90, ip=*Not IP address [0]* {code} The only way to specify peer IP is to specify *real* hostname: {code} google.com|192.168.0.2|2|80|3130|0|0.0.0.0|1| {code} ICP request to google.com in the log: {code} DEBUG: (icp) [ICP_QUEUE_REQUEST] Id=617 send query to [173.194.112.96:3130] {code} Host IP (second field) is parsed to {{\*Not IP address \[0\]\*}} always. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
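If the documentation's either/or reading is right, the reporter's line could drop one of the two identifiers; a sketch of the two single-identifier forms (the exact empty-field syntax should be verified against the icp.config parser):
{code}
# identify the peer by IP only (Hostname field left empty)
|192.168.0.2|2|80|3130|0|0.0.0.0|1|

# or by hostname only (HostIP field left empty)
peer1||2|80|3130|0|0.0.0.0|1|
{code}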
[jira] [Commented] (TS-3212) 200 code is returned as 304
[ https://issues.apache.org/jira/browse/TS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344608#comment-14344608 ] Zhao Yongming commented on TS-3212: --- [~luca.rea] are you still following this issue? I think we have found some of the dark side: 1. your client sends an IMS but does not want the proxy/cache to return a 304; that is really hard to do unless you make the IMS comparison fail. 2. cache.config never-cache is not working as expected as a no-touch pass-through. That is a dark side of ATS cache-control, IMO. I am going to sort out as many of the cache-control issues as possible; I'd like to hear from you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3412) Segmentation fault ET_CLUSTER
[ https://issues.apache.org/jira/browse/TS-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhao Yongming updated TS-3412: -- Description: Can anyone help me please? 2.6.32-431.el6.x86_64 mem : 16GB cpu : 6core * 2 HS = 24core {noformat} kernel: [ET_CLUSTER 1][4508]: segfault at a8 ip 006c1571 sp 2b5b58738890 error 4 in traffic_server[40+421000] {noformat} traffic.out {noformat} traffic_server: using root directory '/opt/ats' traffic_server: Segmentation fault (Address not mapped to object [0xa8])traffic_server - STACK TRACE: /opt/ats/bin/traffic_server(crash_logger_invoke(int, siginfo*, void*)+0x99)[0x4aaf19] /lib64/libpthread.so.0(+0xf710)[0x2b5a25658710] /opt/ats/bin/traffic_server(ClusterProcessor::connect_local(Continuation*, ClusterVCToken*, int, int)+0xa /opt/ats/bin/traffic_server(cache_op_ClusterFunction(ClusterHandler*, void*, int)+0xabd)[0x6a71cd] /opt/ats/bin/traffic_server(ClusterHandler::process_large_control_msgs()+0xe9)[0x6ab5e9] /opt/ats/bin/traffic_server(ClusterHandler::update_channels_read()+0x8b)[0x6b0d7b] /opt/ats/bin/traffic_server(ClusterHandler::process_read(long)+0x138)[0x6b1528] /opt/ats/bin/traffic_server(ClusterHandler::mainClusterEvent(int, Event*)+0x176)[0x6b3f56] /opt/ats/bin/traffic_server(ClusterState::IOComplete()+0x8a)[0x6b701a] /opt/ats/bin/traffic_server(ClusterState::doIO_read_event(int, void*)+0xa7)[0x6b7307] /opt/ats/bin/traffic_server[0x72b2e7] /opt/ats/bin/traffic_server[0x72c53d] /opt/ats/bin/traffic_server(NetHandler::mainNetEvent(int, Event*)+0x1f2)[0x7213c2] /opt/ats/bin/traffic_server(EThread::process_event(Event*, int)+0x125)[0x74d4e5] /opt/ats/bin/traffic_server(EThread::execute()+0x4c9)[0x74de29] /opt/ats/bin/traffic_server[0x74c92a] /lib64/libpthread.so.0(+0x79d1)[0x2b5a256509d1] /lib64/libc.so.6(clone+0x6d)[0x2b5a26fa38fd] traffic_server: using root directory '/opt/ats' traffic_server: Terminated (Signal sent by kill() 28739 0)[E. 
Mgmt] log == [TrafficManager] using rootats' {noformat} records.config {noformat} CONFIG proxy.config.proxy_name STRING cluster-v530 LOCAL proxy.local.cluster.type INT 1 CONFIG proxy.config.cluster.ethernet_interface STRING bond0 CONFIG proxy.config.cluster.cluster_port INT 8086 CONFIG proxy.config.cluster.rsport INT 8088 CONFIG proxy.config.cluster.mcport INT 8089 CONFIG proxy.config.cluster.mc_group_addr STRING 224.0.1.40 CONFIG proxy.config.cluster.cluster_configuration STRING cluster.config CONFIG proxy.config.cluster.threads INT 4 {noformat} was: Can anyone help me please? 2.6.32-431.el6.x86_64 mem : 16GB cpu : 6core * 2 HS = 24core {noformat} kernel: [ET_CLUSTER 1][4508]: segfault at a8 ip 006c1571 sp 2b5b58738890 error 4 in traffic_server[40+421000] {noformat} traffic.out {noformat} traffic_server: using root directory '/opt/ats' traffic_server: Segmentation fault (Address not mapped to object [0xa8])traffic_server - STACK TRACE: /opt/ats/bin/traffic_server(_Z19crash_logger_invokeiP7siginfoPv+0x99)[0x4aaf19] /lib64/libpthread.so.0(+0xf710)[0x2b5a25658710] /opt/ats/bin/traffic_server(_ZN16ClusterProcessor13connect_localEP12ContinuationP14ClusterVCTokenii+0xa /opt/ats/bin/traffic_server(_Z24cache_op_ClusterFunctionP14ClusterHandlerPvi+0xabd)[0x6a71cd] /opt/ats/bin/traffic_server(_ZN14ClusterHandler26process_large_control_msgsEv+0xe9)[0x6ab5e9] /opt/ats/bin/traffic_server(_ZN14ClusterHandler20update_channels_readEv+0x8b)[0x6b0d7b] /opt/ats/bin/traffic_server(_ZN14ClusterHandler12process_readEl+0x138)[0x6b1528] /opt/ats/bin/traffic_server(_ZN14ClusterHandler16mainClusterEventEiP5Event+0x176)[0x6b3f56] /opt/ats/bin/traffic_server(_ZN12ClusterState10IOCompleteEv+0x8a)[0x6b701a] /opt/ats/bin/traffic_server(_ZN12ClusterState15doIO_read_eventEiPv+0xa7)[0x6b7307] /opt/ats/bin/traffic_server[0x72b2e7] /opt/ats/bin/traffic_server[0x72c53d] /opt/ats/bin/traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x1f2)[0x7213c2] 
/opt/ats/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x125)[0x74d4e5] /opt/ats/bin/traffic_server(_ZN7EThread7executeEv+0x4c9)[0x74de29] /opt/ats/bin/traffic_server[0x74c92a] /lib64/libpthread.so.0(+0x79d1)[0x2b5a256509d1] /lib64/libc.so.6(clone+0x6d)[0x2b5a26fa38fd] traffic_server: using root directory '/opt/ats' traffic_server: Terminated (Signal sent by kill() 28739 0)[E. Mgmt] log == [TrafficManager] using rootats' {noformat} records.config {noformat} CONFIG proxy.config.proxy_name STRING cluster-v530 LOCAL proxy.local.cluster.type INT 1 CONFIG proxy.config.cluster.ethernet_interface STRING bond0 CONFIG proxy.config.cluster.cluster_port INT 8086 CONFIG proxy.config.cluster.rsport INT 8088 CONFIG proxy.config.cluster.mcport INT 8089 CONFIG proxy.config.cluster.mc_group_addr STRING 224.0.1.40 CONFIG proxy.config.cluster.cluster_configuration STRING cluster.config CONFIG proxy.config.cluster.threads INT 4 {noformat} Segmentation fault ET_CLUSTER
[jira] [Commented] (TS-3395) Hit ratio drops with high concurrency
[ https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331983#comment-14331983 ] Zhao Yongming commented on TS-3395: --- I really don't know what you want: do you want to stress out what ATS can do, or do you want ATS to do what Nginx/Squid would do? In both issues I have pointed out the ATS way of dealing with your problems, and even guided you step by step toward the root cause and how we handle it in ATS. You are now using ATS, which is very different from Squid etc.; it is powerful and designed in some strange ways. If you are a fresh user, finding out the ATS way is a good start, as it turns out that ATS performs well in most real-world cases. On the testing issue, please refer to jtest (tools/jtest/) if you don't know it; it is another good stress tool, suitable for stressing a performance monster like ATS. Anyway, welcome to the ATS Colosseum. Hit ratio drops with high concurrency - Key: TS-3395 URL: https://issues.apache.org/jira/browse/TS-3395 Project: Traffic Server Issue Type: Bug Components: Cache Reporter: Luca Bruno Fix For: 5.3.0 I'm doing some tests and I've noticed that the hit ratio drops with more than 300 simultaneous http connections. The cache is on a raw disk of 500gb and it's not filled, so no eviction. The ram cache is disabled. The test is done with web-polygraph. Content size vary from 5kb to 20kb uniformly, expected hit ratio 60%, 2000 http connections, documents expire after months. There's no Vary. !http://i.imgur.com/Zxlhgnf.png! Then I thought it could be a problem of polygraph. I wrote my own client/server test code, it works fine also with squid, varnish and nginx. I register a hit if I get either cR or cH in the headers. 
{noformat} 2015/02/19 12:38:28 Starting 100 requests 2015/02/19 12:37:58 Elapsed: 3m51.23552164s 2015/02/19 12:37:58 Total average: 231.235µs/req, 4324.60req/s 2015/02/19 12:37:58 Average size: 12.50kb/req 2015/02/19 12:37:58 Bytes read: 12498412.45kb, 54050.57kb/s 2015/02/19 12:37:58 Errors: 0 2015/02/19 12:37:58 Offered Hit ratio: 59.95% 2015/02/19 12:37:58 Measured Hit ratio: 37.20% 2015/02/19 12:37:58 Hit bytes: 4649000609 2015/02/19 12:37:58 Hit success: 599476/599476 (100.00%), 469.840902ms/req 2015/02/19 12:37:58 Miss success: 400524/400524 (100.00%), 336.301464ms/req {noformat} So similar results, 37.20% on average. Then I thought that could be a problem of how I'm testing stuff, and tried with nginx cache. It achieves 60% hit ratio, but request rate is very slow compared to ATS for obvious reasons. Then I wanted to check if with 200 connections but with longer test time hit ratio also dropped, but no, it's fine: !http://i.imgur.com/oMHscuf.png! So not a problem of my tests I guess. Then I realized by debugging the test server that the same url was asked twice. Out of 100 requests, 78600 urls were asked at least twice. An url was even requested 9 times. These same url are not requested close to each other: even more than 30sec can pass from one request to the other for the same url. I also tweaked the following parameters: {noformat} CONFIG proxy.config.http.cache.fuzz.time INT 0 CONFIG proxy.config.http.cache.fuzz.min_time INT 0 CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.00 CONFIG proxy.config.http.cache.max_open_read_retries INT 4 CONFIG proxy.config.http.cache.open_read_retry_time INT 500 {noformat} And this is the result with polygraph, similar results: !http://i.imgur.com/YgOndhY.png! Tweaked the read-while-writer option, and yet having similar results. Then I've enabled 1GB of ram, it is slightly better at the beginning, but then it drops: !http://i.imgur.com/dFTJI16.png! traffic_top says 25% ram hit, 37% fresh, 63% cold. 
So given that it doesn't seem to be a concurrency problem when requesting the url to the origin server, could it be a problem of concurrent write access to the cache? So that some pages are not cached at all? The traffoc_top fresh percentage also makes me think it can be a problem in writing the cache. Not sure if I explained the problem correctly, ask me further information in case. But in summary: hit ratio drops with a high number of connections, and the problem seems related to pages that are not written to the cache. This is some related issue: http://mail-archives.apache.org/mod_mbox/trafficserver-users/201301.mbox/%3ccd28cb1f.1f44a%25peter.wa...@email.disney.com%3E Also this: http://apache-traffic-server.24303.n7.nabble.com/why-my-proxy-node-cache-hit-ratio-drops-td928.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
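For reproducibility, the read-while-writer tweak mentioned in the report is controlled by a small set of records.config settings; a sketch with the values the ATS docs describe as the prerequisites for enabling it (values here are illustrative, not a tuning recommendation):
{code}
CONFIG proxy.config.cache.enable_read_while_writer INT 1
CONFIG proxy.config.http.background_fill_active_timeout INT 0
CONFIG proxy.config.http.background_fill_completed_threshold FLOAT 0.0
CONFIG proxy.config.cache.max_doc_size INT 0
{code}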
[jira] [Commented] (TS-3395) Hit ratio drops with high concurrency
[ https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14330321#comment-14330321 ] Zhao Yongming commented on TS-3395: --- when you get the disk write IO bottoleneck, you will get a high water writes, and all others will not able to write when ATS will try to forward other request to the origin, that is in the users view, and that will result in cache hit ratio decreased, but you will get a higher request per second nubumer than others. this is a feature by design I think :D Hit ratio drops with high concurrency - Key: TS-3395 URL: https://issues.apache.org/jira/browse/TS-3395 Project: Traffic Server Issue Type: Bug Components: Cache Reporter: Luca Bruno Fix For: 5.3.0 I'm doing some tests and I've noticed that the hit ratio drops with more than 300 simultaneous http connections. The cache is on a raw disk of 500gb and it's not filled, so no eviction. The ram cache is disabled. The test is done with web-polygraph. Content size vary from 5kb to 20kb uniformly, expected hit ratio 60%, 2000 http connections, documents expire after months. There's no Vary. !http://i.imgur.com/Zxlhgnf.png! Then I thought it could be a problem of polygraph. I wrote my own client/server test code, it works fine also with squid, varnish and nginx. I register a hit if I get either cR or cH in the headers. 
{noformat} 2015/02/19 12:38:28 Starting 100 requests 2015/02/19 12:37:58 Elapsed: 3m51.23552164s 2015/02/19 12:37:58 Total average: 231.235µs/req, 4324.60req/s 2015/02/19 12:37:58 Average size: 12.50kb/req 2015/02/19 12:37:58 Bytes read: 12498412.45kb, 54050.57kb/s 2015/02/19 12:37:58 Errors: 0 2015/02/19 12:37:58 Offered Hit ratio: 59.95% 2015/02/19 12:37:58 Measured Hit ratio: 37.20% 2015/02/19 12:37:58 Hit bytes: 4649000609 2015/02/19 12:37:58 Hit success: 599476/599476 (100.00%), 469.840902ms/req 2015/02/19 12:37:58 Miss success: 400524/400524 (100.00%), 336.301464ms/req {noformat} So similar results, 37.20% on average. Then I thought that could be a problem of how I'm testing stuff, and tried with nginx cache. It achieves 60% hit ratio, but request rate is very slow compared to ATS for obvious reasons. Then I wanted to check if with 200 connections but with longer test time hit ratio also dropped, but no, it's fine: !http://i.imgur.com/oMHscuf.png! So not a problem of my tests I guess. Then I realized by debugging the test server that the same url was asked twice. Out of 100 requests, 78600 urls were asked at least twice. An url was even requested 9 times. These same url are not requested close to each other: even more than 30sec can pass from one request to the other for the same url. I also tweaked the following parameters: {noformat} CONFIG proxy.config.http.cache.fuzz.time INT 0 CONFIG proxy.config.http.cache.fuzz.min_time INT 0 CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.00 CONFIG proxy.config.http.cache.max_open_read_retries INT 4 CONFIG proxy.config.http.cache.open_read_retry_time INT 500 {noformat} And this is the result with polygraph, similar results: !http://i.imgur.com/YgOndhY.png! Tweaked the read-while-writer option, and yet having similar results. Then I've enabled 1GB of ram, it is slightly better at the beginning, but then it drops: !http://i.imgur.com/dFTJI16.png! traffic_top says 25% ram hit, 37% fresh, 63% cold. 
So, given that it doesn't seem to be a concurrency problem when requesting the URL from the origin server, could it be a problem of concurrent write access to the cache, so that some pages are not cached at all? The traffic_top fresh percentage also makes me think it could be a problem with writing the cache. Not sure if I explained the problem correctly; ask me for further information if needed. In summary: the hit ratio drops with a high number of connections, and the problem seems related to pages that are not written to the cache. A related issue: http://mail-archives.apache.org/mod_mbox/trafficserver-users/201301.mbox/%3ccd28cb1f.1f44a%25peter.wa...@email.disney.com%3E Also this: http://apache-traffic-server.24303.n7.nabble.com/why-my-proxy-node-cache-hit-ratio-drops-td928.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
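The retry and fuzz settings quoted above interact with read-while-writer, which lets concurrent readers stream an object that is still being written to cache instead of each going to the origin. A minimal records.config sketch (setting names as in ATS 4.x/5.x; verify against your version's documentation before use):

{noformat}
# Let readers attach to an object that is still being written to cache,
# so concurrent requests for the same URL do not all hit the origin.
CONFIG proxy.config.cache.enable_read_while_writer INT 1
# Retry the cache open_read a few times before falling back to an origin fetch.
CONFIG proxy.config.http.cache.max_open_read_retries INT 4
CONFIG proxy.config.http.cache.open_read_retry_time INT 500
{noformat}

Note that read-while-writer only helps once the writer has committed to caching the object; it does not help if the write is aborted under disk pressure, which is the scenario discussed below.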
[jira] [Commented] (TS-3395) Hit ratio drops with high concurrency
[ https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330323#comment-14330323 ] Zhao Yongming commented on TS-3395: --- My suggestion for performance testing is always to avoid the disk I/O (IOPS) limit, as it is a hard cap on the performance of ATS, or of any other proxy/cache system; when it is the bottleneck of the whole system, you can even calculate the real production performance from it.
[jira] [Commented] (TS-3395) Hit ratio drops with high concurrency
[ https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330336#comment-14330336 ] Zhao Yongming commented on TS-3395: --- If you use ATS as a proxy, why not limit the connections on the origin side? We have more options for protecting the origin server than limiting the disk I/O. If you use ATS as a cache, disk I/O and space are the key resources of the cache system, so why not add more disks if you can? A disk write bottleneck is a really rare case when we are talking about a cache system, right?
[jira] [Comment Edited] (TS-3395) Hit ratio drops with high concurrency
[ https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330349#comment-14330349 ] Zhao Yongming edited comment on TS-3395 at 2/21/15 5:08 PM: Good, that is the case I'd like to avoid, haha. I am talking about the limit on the origin side, while TS-3386 is about the connection limit on the UA side; in your case the UA and OS limits got mixed up when you dealt with 127.0.0.1 at the beginning. In our practice we rely heavily on the limit on the UA side, which is a very good solution for both the cache and the origin. Please refer to 'proxy.config.http.origin_max_connections' for the limit on the OS side. When cache connections are held up by waiting or other issues, the httpSM is kept alive, which can cost you a huge amount of memory; in our production systems at 20k qps we like to keep cache connections at about 1k-2k. That is critical to a busy system if you want it stable in service. Also, cache writes can degrade performance more than reads, so pay extra attention to cache writes. was (Author: zym): Good, that is the case I'd like to avoid, haha. I am talking about the limit on the origin side, while TS-3386 is about the connection limit on the UA side; in your case the UA and OS limits got mixed up when you dealt with 127.0.0.1 at the beginning. In our practice we rely heavily on the limit on the UA side, which is a very good solution for both the cache and the origin. Please refer to 'proxy.config.http.origin_max_connections' for the limit on the UA side. When cache connections are held up by waiting or other issues, the httpSM is kept alive, which can cost you a huge amount of memory; in our production systems at 20k qps we like to keep cache connections at about 1k-2k. That is critical to a busy system if you want it stable in service. Also, cache writes can degrade performance more than reads, so pay extra attention to cache writes.
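The origin-side (OS) limit referred to above is a single records.config setting; a sketch of how it might be set (the value is illustrative, not a recommendation, and should match what your origin can actually sustain):

{noformat}
# Cap the number of concurrent connections ATS will open to origin servers.
# 0 disables the limit.
CONFIG proxy.config.http.origin_max_connections INT 500
{noformat}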
[jira] [Commented] (TS-3395) Hit ratio drops with high concurrency
[ https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330352#comment-14330352 ] Zhao Yongming commented on TS-3395: --- And in practice the number of cache write connections will always be less than origin_max_connections. Sounds perfect?
[jira] [Commented] (TS-3395) Hit ratio drops with high concurrency
[ https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330257#comment-14330257 ] Zhao Yongming commented on TS-3395: --- Well, if that is the disk I/O bottleneck, I think it is reasonable. Can you please attach a disk-IOPS version of the disk I/O graph?
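One way to produce the IOPS view requested above is iostat from the sysstat package; the r/s and w/s columns are per-device read and write operations per second (illustrative invocation, not from the original thread):

{noformat}
# Extended per-device statistics every 5 seconds; r/s and w/s are IOPS,
# %util shows how saturated each cache disk is.
iostat -x 5 /dev/sda /dev/sdb
{noformat}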
[jira] [Commented] (TS-3164) why the load of trafficserver occurrs a abrupt rise on a occasion ?
[ https://issues.apache.org/jira/browse/TS-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316244#comment-14316244 ] Zhao Yongming commented on TS-3164: --- I have seen that, but in my case it was some very short lockup-like situations, mostly less than 15s. I still don't know why. why the load of trafficserver occurrs a abrupt rise on a occasion ? --- Key: TS-3164 URL: https://issues.apache.org/jira/browse/TS-3164 Project: Traffic Server Issue Type: Bug Components: Core Environment: CentOS 6.3 64bit, 8 cores, 128G mem Reporter: taoyunxing Fix For: sometime I use tsar to monitor the traffic status of ATS 4.2.0, and come across the following problem: {code}
Time           ---cpu-- ---mem-- ---tcp-- ----traffic---- --sda--- --sdb--- --sdc--- ---load-
Time           util     util     retran   bytin    bytout util     util     util     load1
03/11/14-18:20 40.67    87.19    3.36     24.5M    43.9M  13.02    94.68    0.00     5.34
03/11/14-18:25 40.30    87.20    3.27     22.5M    42.6M  12.38    94.87    0.00     5.79
03/11/14-18:30 40.84    84.67    3.44     21.4M    42.0M  13.29    95.37    0.00     6.28
03/11/14-18:35 43.63    87.36    3.21     23.8M    45.0M  13.23    93.99    0.00     7.37
03/11/14-18:40 42.25    87.37    3.09     24.2M    44.8M  12.84    95.77    0.00     7.25
03/11/14-18:45 42.96    87.44    3.46     23.3M    46.0M  12.96    95.84    0.00     7.10
03/11/14-18:50 44.00    87.42    3.49     22.3M    43.0M  14.17    94.99    0.00     6.57
03/11/14-18:55 42.20    87.44    3.46     22.3M    43.6M  13.19    96.05    0.00     6.09
03/11/14-19:00 44.90    87.53    3.60     23.6M    46.5M  13.61    96.67    0.00     8.06
03/11/14-19:05 46.26    87.73    3.24     25.8M    49.1M  15.39    94.05    0.00     9.98
03/11/14-19:10 43.85    87.69    3.19     25.4M    50.9M  12.88    97.80    0.00     7.99
03/11/14-19:15 45.28    87.69    3.36     25.6M    49.6M  13.10    96.86    0.00     7.47
03/11/14-19:20 44.11    85.20    3.29     24.1M    47.8M  14.24    96.75    0.00     5.82
03/11/14-19:25 45.26    87.78    3.52     24.4M    47.7M  13.21    95.44    0.00     7.61
03/11/14-19:30 44.83    87.80    3.64     25.7M    50.8M  13.27    98.02    0.00     6.85
03/11/14-19:35 44.89    87.78    3.61     23.9M    49.0M  13.34    97.42    0.00     7.04
03/11/14-19:40 69.21    88.88    0.55     18.3M    33.7M  11.39    71.23    0.00     65.80
03/11/14-19:45 72.47    88.66    0.27     15.4M    31.6M  11.51    72.31    0.00     11.56
03/11/14-19:50 44.87    88.72    4.11     22.7M    46.3M  12.99    97.33    0.00     8.29
{code} In addition, the top command shows {code} hi:0 ni:0 si:45.56 st:0 sy:13.92 us:12.58 wa:14.3 id:15.96 {code} Who can help me? Thanks in advance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3386) Heartbeat failed with high load, trafficserver restarted
[ https://issues.apache.org/jira/browse/TS-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316482#comment-14316482 ] Zhao Yongming commented on TS-3386: --- well, things get more interesting. q1: why do you lose the cached content on a restart of traffic server? q1.1: is that a cache issue? q2: you are trying to protect the origin server; why do you think a limit on the UA-side connections is a better solution than a limit on the origin side? q2.1: have you seen any occurrence of connection (httpSM) hangup? q2.2: what is a better way to handle the connection issue, for example a timeout? When you handle tons of cache and tons of traffic, keeping it simple and robust always beats anything intelligent. Yes, we have fixed many cache issues we met, HTTP SM issues, connection timeout issues, connection leaking... I think most of the important changes are already in the official tree, and this is the way we figure out the root issues in ATS, which may lead to just some very tiny fix that only affects very high-traffic sites with very strict SLA requirements. Heartbeat failed with high load, trafficserver restarted Key: TS-3386 URL: https://issues.apache.org/jira/browse/TS-3386 Project: Traffic Server Issue Type: Bug Components: Performance Reporter: Luca Bruno I've been evaluating ATS for some days. I'm using it with mostly default settings, except that I've lowered the number of connections to the backend; I have raw storage of 500 GB and have disabled the ram cache. It was working fine, then I wanted to stress it more. I increased the test to 1000 concurrent requests, then the ATS worker was restarted and thus lost the whole cache. 
/var/log/syslog: {noformat} Feb 11 10:05:52 test-cache traffic_cop[32984]: (http test) received non-200 status(502) Feb 11 10:05:52 test-cache traffic_cop[32984]: server heartbeat failed [1] Feb 11 10:06:02 test-cache traffic_cop[32984]: (http test) received non-200 status(502) Feb 11 10:06:02 test-cache traffic_cop[32984]: server heartbeat failed [2] Feb 11 10:06:02 test-cache traffic_cop[32984]: killing server Feb 11 10:06:02 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: [LocalManager::pollMgmtProcessServer] Server Process terminated due to Sig 9: Killed Feb 11 10:06:02 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: [Alarms::signalAlarm] Server Process was reset Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: --- traffic_server Starting --- Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: traffic_server Version: Apache Traffic Server - traffic_server - 5.2.0 - (build # 11013 on Feb 10 2015 at 13:04:42) Feb 11 10:06:04 test-cache traffic_server[59047]: NOTE: RLIMIT_NOFILE(7):cur(736236),max(736236) Feb 11 10:06:12 test-cache traffic_cop[32984]: (http test) received non-200 status(502) Feb 11 10:06:12 test-cache traffic_cop[32984]: server heartbeat failed [1] Feb 11 10:06:22 test-cache traffic_cop[32984]: (http test) received non-200 status(502) Feb 11 10:06:22 test-cache traffic_cop[32984]: server heartbeat failed [2] Feb 11 10:06:22 test-cache traffic_cop[32984]: killing server Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message Feb 11 10:06:22 test-cache traffic_manager[32985]: {0x7f975c537720} ERROR: (last system error 32: Broken pipe) Feb 11 10:06:22 test-cache traffic_cop[32984]: cop received child status signal [32985 256] Feb 11 10:06:22 test-cache traffic_cop[32984]: traffic_manager not running, 
making sure traffic_server is dead Feb 11 10:06:22 test-cache traffic_cop[32984]: spawning traffic_manager Feb 11 10:06:22 test-cache traffic_cop[32984]: binpath is bin Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: --- Manager Starting --- Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 5.2.0 - (build # 11013 on Feb 10 2015 at 13:05:19) Feb 11 10:06:22 test-cache traffic_manager[59057]: NOTE: RLIMIT_NOFILE(7):cur(736236),max(736236) Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: --- traffic_server Starting --- Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: traffic_server Version: Apache Traffic Server - traffic_server - 5.2.0 - (build # 11013 on Feb 10 2015 at 13:04:42) Feb 11 10:06:24 test-cache traffic_server[59065]: NOTE: RLIMIT_NOFILE(7):cur(736236),max(736236) Feb 11 10:06:32 test-cache traffic_cop[32984]: (http test) received non-200 status(502) Feb 11 10:06:32 test-cache
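The kill/restart cycle in the syslog above follows a simple pattern: traffic_cop probes an HTTP test URL every ~10 seconds, counts consecutive failures ("heartbeat failed [1]", "[2]"), and kills the server process on the second failure. A toy model of that loop, inferred from the log rather than taken from traffic_cop's actual code:

```python
def cop_kills(heartbeat_statuses, threshold=2):
    """Return the indices at which the cop would log 'killing server':
    each non-200 heartbeat bumps a failure counter; reaching the
    threshold triggers a kill and resets the counter."""
    kills, failures = [], 0
    for i, status in enumerate(heartbeat_statuses):
        if status == 200:
            failures = 0
        else:
            failures += 1
            if failures >= threshold:
                kills.append(i)
                failures = 0
    return kills

# the log above: 502, 502 -> kill; restart; 502, 502 -> kill again
print(cop_kills([502, 502, 502, 502]))  # [1, 3]
```

This is why a single slow heartbeat does not kill the server, but a sustained overload (two 502s in a row) does, losing whatever ram-cache state the worker held.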
[jira] [Commented] (TS-3386) Heartbeat failed with high load, trafficserver restarted
[ https://issues.apache.org/jira/browse/TS-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316279#comment-14316279 ] Zhao Yongming commented on TS-3386: --- well, the remap matters. Please don't mix up 127.0.0.1:8080 with most of the services; that is not what ATS is meant to do as a proxy. Use something like map http://mydomain.com:8080/ ., and do your testing with a modified /etc/hosts or with -x 127.0.0.1:8080 in curl. Heartbeat failed with high load, trafficserver restarted Key: TS-3386 URL: https://issues.apache.org/jira/browse/TS-3386 Project: Traffic Server Issue Type: Bug Components: Performance Reporter: Luca Bruno
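As a concrete sketch of the setup the comment suggests (the origin target and object path below are placeholders, not taken from the issue):

{code}
# remap.config: give the proxy a real-looking remap rule instead of
# relying on raw 127.0.0.1:8080 (origin.example.com is a placeholder)
map http://mydomain.com:8080/ http://origin.example.com/

# then point the benchmark client at the proxy explicitly, e.g.:
#   curl -x 127.0.0.1:8080 http://mydomain.com:8080/some/object
{code}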
[jira] [Commented] (TS-3386) Heartbeat failed with high load, trafficserver restarted
[ https://issues.apache.org/jira/browse/TS-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316376#comment-14316376 ] Zhao Yongming commented on TS-3386: --- if you want to talk about the kill, I'd say there should be more work done before taking the server down; but how would you know that the connections are full while everything else still works well? We have tried to put the heartbeat on a connection that is not affected by the connection limit, but that does not sound so good either. The heartbeat is a fake L7 service health check, designed to find out when something is abnormal :D Heartbeat failed with high load, trafficserver restarted Key: TS-3386 URL: https://issues.apache.org/jira/browse/TS-3386 Project: Traffic Server Issue Type: Bug Components: Performance Reporter: Luca Bruno
[jira] [Commented] (TS-3386) Heartbeat failed with high load, trafficserver restarted
[ https://issues.apache.org/jira/browse/TS-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316343#comment-14316343 ] Zhao Yongming commented on TS-3386: --- well, proxy.config.net.connections_throttle = 1000? Are you kidding? ATS is not squid, nor httpd-1.x. Heartbeat failed with high load, trafficserver restarted Key: TS-3386 URL: https://issues.apache.org/jira/browse/TS-3386 Project: Traffic Server Issue Type: Bug Components: Performance Reporter: Luca Bruno
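For reference, the throttle criticized above lives in records.config; a value sized for a busy cache might look like the following (100000 is an illustrative value, not a recommendation from this thread):

{code}
CONFIG proxy.config.net.connections_throttle INT 100000
{code}

Because the heartbeat probe shares this connection budget, a throttle as low as 1000 lets a load test starve the health check itself.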
[jira] [Commented] (TS-3386) Heartbeat failed with high load, trafficserver restarted
[ https://issues.apache.org/jira/browse/TS-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316238#comment-14316238 ] Zhao Yongming commented on TS-3386: --- oh, please run without any traffic load and enable debug on http.*|dns.*; I suspect this is a HostDB reverse lookup on 127.0.0.1, or a lookup-on-localhost issue. Let us dig it out. Heartbeat failed with high load, trafficserver restarted Key: TS-3386 URL: https://issues.apache.org/jira/browse/TS-3386 Project: Traffic Server Issue Type: Bug Components: Performance Reporter: Luca Bruno
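The debug run asked for above can be enabled with the standard diagnostics knobs in records.config (the tag list is taken from the comment):

{code}
CONFIG proxy.config.diags.debug.enabled INT 1
CONFIG proxy.config.diags.debug.tags STRING http.*|dns.*
{code}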
[jira] [Updated] (TS-2482) Problems with SOCKS
[ https://issues.apache.org/jira/browse/TS-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhao Yongming updated TS-2482: -- Assignee: weijin Problems with SOCKS --- Key: TS-2482 URL: https://issues.apache.org/jira/browse/TS-2482 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Radim Kolar Assignee: weijin Fix For: sometime There are several problems with using SOCKS. I am interested in case when TF is sock client. Client sends HTTP request and TF uses SOCKS server to make connection to internet. a/ - not documented enough in default configs From default configs comments it seems that for running TF 4.1.2 as socks client, it is sufficient to add one line to socks.config: dest_ip=0.0.0.0-255.255.255.255 parent=10.0.0.7:9050 but socks proxy is not used. If i run tcpdump sniffing packets TF never tries to connect to that SOCKS. From source code - https://github.com/apache/trafficserver/blob/master/iocore/net/Socks.cc it looks that is needed to set proxy.config.socks.socks_needed to activate socks support. This should be documented in both sample files: socks.config and record.config b/ after enabling socks, i am hit by this assert: Assertion failed: (ats_is_ip4(target_addr)), function init, file Socks.cc, line 65. i run on dual stack system (ip4,ip6). This code is setting default destination for SOCKS request? Can not you use just 127.0.0.1 for case if client gets connected over IP6? https://github.com/apache/trafficserver/blob/master/iocore/net/Socks.cc#L66 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-2482) Problems with SOCKS
[ https://issues.apache.org/jira/browse/TS-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278227#comment-14278227 ] Zhao Yongming commented on TS-2482: --- we have a patch that will fix the problem, I think. It turns ATS into a SOCKS5 server, but it is still pending full testing with the parent-socks feature. The problem here is not only the assert, but also the HTTP transactions. Problems with SOCKS --- Key: TS-2482 URL: https://issues.apache.org/jira/browse/TS-2482 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Radim Kolar Fix For: sometime -- This message was sent by Atlassian JIRA (v6.3.4#6332)
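The assert quoted in the issue, {{ats_is_ip4(target_addr)}}, fires because that code path only handles IPv4 targets. A small illustration of the dual-stack pitfall, using Python's ipaddress module rather than ATS code:

```python
import ipaddress

def is_ipv4_target(addr):
    """True only for IPv4 literals; an IPv6 client address such as ::1
    would fail the equivalent of the ats_is_ip4() check in Socks.cc."""
    try:
        return isinstance(ipaddress.ip_address(addr), ipaddress.IPv4Address)
    except ValueError:
        return False

print(is_ipv4_target("127.0.0.1"), is_ipv4_target("::1"))  # True False
```

On a dual-stack host a client can arrive over IPv6, so any SOCKS path that assumes an IPv4 destination needs either a fallback address or explicit IPv6 handling.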
[jira] [Commented] (TS-3088) Have ATS look at /etc/hosts
[ https://issues.apache.org/jira/browse/TS-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247700#comment-14247700 ] Zhao Yongming commented on TS-3088: --- it looks like some of the SPLIT DNS code was removed; is that feature still working after this commit? Have ATS look at /etc/hosts --- Key: TS-3088 URL: https://issues.apache.org/jira/browse/TS-3088 Project: Traffic Server Issue Type: New Feature Components: DNS Reporter: David Carlin Assignee: Alan M. Carroll Priority: Minor Fix For: 5.3.0 Attachments: ts-3088-3-2-x-patch.diff It would be nice if /etc/hosts was read when resolving hostnames - useful for testing/troubleshooting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
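For illustration, the /etc/hosts lookup the issue requests amounts to a small parser like the one below (a sketch, not the patch attached to the issue):

```python
def parse_hosts(text):
    """Map each hostname or alias to the first address listed for it,
    ignoring comments and blank lines, per usual /etc/hosts semantics."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        parts = line.split()
        if len(parts) >= 2:
            addr, *names = parts
            for name in names:
                table.setdefault(name.lower(), addr)  # first entry wins
    return table

hosts = "127.0.0.1 localhost dev.example # testing\n::1 localhost\n"
print(parse_hosts(hosts)["dev.example"])  # 127.0.0.1
```

A resolver would consult this table before falling back to DNS, which is exactly what makes it handy for testing and troubleshooting.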
[jira] [Commented] (TS-3220) Update http cache stats so we can determine if a response was served from ram cache
[ https://issues.apache.org/jira/browse/TS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14235637#comment-14235637 ] Zhao Yongming commented on TS-3220: --- yeah, nice catch; we have seen some ram cache hit rates higher than expected too. Update http cache stats so we can determine if a response was served from ram cache --- Key: TS-3220 URL: https://issues.apache.org/jira/browse/TS-3220 Project: Traffic Server Issue Type: Improvement Components: Metrics Reporter: Bryan Call Labels: A, Yahoo Fix For: 5.3.0 Currently we use a combination of ram cache stats and some http ram cache information to try to determine if the response was served from ram cache. The ram cache stats don't know about http, and the entry in ram cache might not be valid. It is possible to have a ram cache hit from the cache's point of view, but not serve the response from cache at all. The http cache stats are missing a few stats needed to determine if the response was served from ram. We would need to add a stat for IMS responses served from ram {{proxy.process.http.cache_hit_mem_ims}} and a stat for when a stale response was served from ram {{proxy.process.http.cache_hit_mem_stale_served}}. Ram cache stats for reference {code} proxy.process.cache.ram_cache.hits proxy.process.cache.ram_cache.misses {code} Current http cache stats for reference {code} proxy.process.http.cache_hit_fresh proxy.process.http.cache_hit_mem_fresh proxy.process.http.cache_hit_revalidated proxy.process.http.cache_hit_ims proxy.process.http.cache_hit_stale_served proxy.process.http.cache_miss_cold proxy.process.http.cache_miss_changed proxy.process.http.cache_miss_client_no_cache proxy.process.http.cache_miss_client_not_cacheable proxy.process.http.cache_miss_ims {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
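As a minimal sketch of how the quoted ram-cache counters combine into the hit rate being discussed (the sample values below are made up):

```python
# Hypothetical snapshot of the two ram-cache counters quoted above.
stats = {
    "proxy.process.cache.ram_cache.hits": 900,
    "proxy.process.cache.ram_cache.misses": 100,
}

def ram_hit_ratio(stats):
    """Hits / (hits + misses); 0.0 when no lookups have happened yet."""
    hits = stats["proxy.process.cache.ram_cache.hits"]
    misses = stats["proxy.process.cache.ram_cache.misses"]
    total = hits + misses
    return hits / total if total else 0.0

print(ram_hit_ratio(stats))  # 0.9
```

As the issue points out, this ratio is from the cache's point of view and can over-count: the proposed proxy.process.http.cache_hit_mem_* stats would instead measure what was actually served from ram.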
[jira] [Commented] (TS-3212) 200 code is returned as 304
[ https://issues.apache.org/jira/browse/TS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225607#comment-14225607 ] Zhao Yongming commented on TS-3212: --- well, if ATS returns you a 304, there are two cases: 1. the UA-side IMS is passed to the origin and the origin returns a 304, and that 304 response itself is saved; 2. the content is saved in cache and has expired, then ATS queries the origin with a self-built IMS header, the origin server responds with a 200, but ATS tries to respond with a 304 to the UA. If it is case #2, please confirm that the content is saved in cache and that the origin response is 200; the http_ui, tcpdump, or debug settings in records may help. I think case #2 looks cool, but the content should not have been saved, as here it is marked 'no-cache', right? 200 code is returned as 304 --- Key: TS-3212 URL: https://issues.apache.org/jira/browse/TS-3212 Project: Traffic Server Issue Type: Bug Components: Cache Reporter: Luca Rea The live streaming videos from the akamaihd.net CDN cannot be watched because ATS rewrites 200 codes into 304 and the videos continuously re-enter buffering status: {code} GET http://abclive.abcnews.com/z/abc_live1@136327/1200_02769fd3e0d85977-p.bootstrap?g=PDSTQVGEMQKR&b=500,300,700,900,1200&hdcore=3.1.0&plugin=aasp-3.1.0.43.124 HTTP/1.1 Host: abclive.abcnews.com User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3 Accept-Encoding: gzip, deflate Referer: http://a.abcnews.com/assets/player/amp/2.0.0012/amp.premier/AkamaiPremierPlayer.swf Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA== Connection: keep-alive HTTP/1.1 200 OK Server: ContactLab Mime-Version: 1.0 Content-Type: video/abst Content-Length: 122 Last-Modified: Tue, 25 Nov 2014 05:28:32 GMT Expires: Tue, 25 Nov 2014 15:31:53 GMT Cache-Control: max-age=0, no-cache Pragma: no-cache Date: Tue, 25 Nov 2014 15:31:53 
GMT access-control-allow-origin: * Set-Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA==; path=/z/abc_live1@136327/; domain=abclive.abcnews.com Age: 0 Connection: keep-alive GET http://abclive.abcnews.com/z/abc_live1@136327/1200_02769fd3e0d85977-p.bootstrap?g=PDSTQVGEMQKRb=500,300,700,900,1200hdcore=3.1.0plugin=aasp-3.1.0.43.124 HTTP/1.1 Host: abclive.abcnews.com User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3 Accept-Encoding: gzip, deflate Referer: http://a.abcnews.com/assets/player/amp/2.0.0012/amp.premier/AkamaiPremierPlayer.swf Cookie: _alid_=0OHcZb9VLdpbE6LrNYyDDA== Connection: keep-alive If-Modified-Since: Tue, 25 Nov 2014 05:28:32 GMT HTTP/1.1 304 Not Modified Date: Tue, 25 Nov 2014 15:31:58 GMT Expires: Tue, 25 Nov 2014 15:31:58 GMT Cache-Control: max-age=0, no-cache Connection: keep-alive Server: ContactLab {code} using the url_regex to skip cache/IMS doesn't work, the workaround is the following line in records.config: CONFIG proxy.config.http.cache.cache_urls_that_look_dynamic INT 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
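To tell case #1 from case #2 in the comment above, it helps to replay the conditional request by hand and compare what the origin returns with what ATS returns. A sketch, assuming the origin is directly reachable and using a hypothetical proxy address (your-ats-host:8080) for the ATS instance:

```shell
# Conditional-request replay for the 304 investigation (TS-3212).
# URL and If-Modified-Since value come from the captured traffic above;
# the proxy address below is a placeholder, not a real host.
URL='http://abclive.abcnews.com/z/abc_live1@136327/1200_02769fd3e0d85977-p.bootstrap'
IMS='If-Modified-Since: Tue, 25 Nov 2014 05:28:32 GMT'

# What the origin itself answers to the conditional request (bypassing ATS)
curl -sI -H "$IMS" "$URL" | head -1

# What ATS answers to the same conditional request
curl -sI -H "$IMS" -x http://your-ats-host:8080 "$URL" | head -1
```

If the first command prints a 200 status line while the second prints 304, that points at case #2.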
[jira] [Commented] (TS-3192) implement proxy.config.config_dir
[ https://issues.apache.org/jira/browse/TS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207663#comment-14207663 ] Zhao Yongming commented on TS-3192: --- that is a feature pending removal, IMO. The original TS was designed to keep its config files relocatable because it shipped as a binary distribution, so it accepted records.config and shell ENV settings. Since going open source, we can set the config dir through configure options, and there is no need to make things that complex. FYI implement proxy.config.config_dir - Key: TS-3192 URL: https://issues.apache.org/jira/browse/TS-3192 Project: Traffic Server Issue Type: New Feature Components: Configuration Reporter: James Peach Assignee: James Peach Fix For: 5.2.0 {{proxy.config.config_dir}} has never been implemented, but there are various scenarios where it is useful to be able to point Traffic Server at a non-default set of configuration files. {{TS_ROOT}} is not always sufficient for this because the system config directory is a path relative to the prefix, which otherwise cannot be altered (even assuming you know it). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-1822) Do we still need proxy.config.system.mmap_max ?
[ https://issues.apache.org/jira/browse/TS-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14205773#comment-14205773 ] Zhao Yongming commented on TS-1822: --- we make use of the reclaimable freelist on our 48G-memory systems, handling a 24-32G ram cache with an average content size of about 32KB. The default sysctl setting vm.max_map_count = 65530 is not enough; we have to raise it to 2x the default. So, if we choose to keep this option, I'd make it raise the default sysctl setting, for example via the cop process. Do we still need proxy.config.system.mmap_max ? --- Key: TS-1822 URL: https://issues.apache.org/jira/browse/TS-1822 Project: Traffic Server Issue Type: Improvement Components: Core Reporter: Leif Hedstrom Assignee: Phil Sorber Labels: compatibility Fix For: 6.0.0 A long time ago, we added proxy.config.system.mmap_max to let the traffic_server increase the max number of mmap segments that we want to use. We currently set this to 2MM. I'm wondering, do we really need this still ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
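The 2x bump described above can be applied with sysctl; a sketch, where 131060 is simply double the 65530 default mentioned in the comment:

```shell
# Inspect the current per-process mmap-region limit (default: 65530)
sysctl vm.max_map_count

# Double it for the running kernel, as the comment suggests
sudo sysctl -w vm.max_map_count=131060

# Persist the change across reboots
echo 'vm.max_map_count = 131060' | sudo tee -a /etc/sysctl.conf
```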
[jira] [Created] (TS-3181) manager ports should only do local network interaction
Zhao Yongming created TS-3181: - Summary: manager ports should only do local network interaction Key: TS-3181 URL: https://issues.apache.org/jira/browse/TS-3181 Project: Traffic Server Issue Type: Improvement Components: Manager Reporter: Zhao Yongming the manager ports, such as 8088, 8089, etc., should only accept connections from the local network; by ignoring all connections from outside networks, we can make the interactions more stable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3181) manager ports should only do local network interaction
[ https://issues.apache.org/jira/browse/TS-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203965#comment-14203965 ] Zhao Yongming commented on TS-3181: --- for example, with clustering enabled we should try to filter out messages like these: {code} [Nov 7 15:28:21.428] Manager {0x7f277bfff700} NOTE: [ClusterCom::drainIncomingChannel] Unexpected message on cluster port. Possibly an attack [Nov 7 15:28:57.501] Manager {0x7f277bfff700} NOTE: [ClusterCom::drainIncomingChannel] Unexpected message on cluster port. Possibly an attack [Nov 7 15:34:09.624] Manager {0x7f277bfff700} NOTE: [ClusterCom::drainIncomingChannel] Unexpected message on cluster port. Possibly an attack [Nov 7 15:38:36.235] Manager {0x7f277bfff700} NOTE: [ClusterCom::drainIncomingChannel] Unexpected message on cluster port. Possibly an attack [Nov 7 15:39:45.596] Manager {0x7f277bfff700} NOTE: [ClusterCom::drainIncomingChannel] Unexpected message on cluster port. Possibly an attack {code} manager ports should only do local network interaction -- Key: TS-3181 URL: https://issues.apache.org/jira/browse/TS-3181 Project: Traffic Server Issue Type: Improvement Components: Manager Reporter: Zhao Yongming the manager ports, such as 8088, 8089, etc., should only accept connections from the local network; by ignoring all connections from outside networks, we can make the interactions more stable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-3181) manager ports should only do local network interaction
[ https://issues.apache.org/jira/browse/TS-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhao Yongming updated TS-3181: -- Fix Version/s: sometime manager ports should only do local network interaction -- Key: TS-3181 URL: https://issues.apache.org/jira/browse/TS-3181 Project: Traffic Server Issue Type: Improvement Components: Manager Reporter: Zhao Yongming Fix For: sometime the manager ports, such as 8088, 8089, etc., should only accept connections from the local network; by ignoring all connections from outside networks, we can make the interactions more stable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
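Until the manager itself enforces this, the restriction can be approximated with a host firewall. A sketch with iptables, assuming the 8088/8089 ports mentioned above and a hypothetical 10.0.0.0/8 management network (adjust both to your deployment):

```shell
# Accept manager/cluster traffic only from the local management network,
# then drop everything else aimed at those ports.
iptables -A INPUT -p tcp -m multiport --dports 8088,8089 -s 10.0.0.0/8 -j ACCEPT
iptables -A INPUT -p udp -m multiport --dports 8088,8089 -s 10.0.0.0/8 -j ACCEPT
iptables -A INPUT -p tcp -m multiport --dports 8088,8089 -j DROP
iptables -A INPUT -p udp -m multiport --dports 8088,8089 -j DROP
```

This would also silence the "Possibly an attack" ClusterCom log noise quoted in the comment, since outside packets never reach the cluster port.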
[jira] [Commented] (TS-3174) Kill LRU Ram Cache
[ https://issues.apache.org/jira/browse/TS-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203330#comment-14203330 ] Zhao Yongming commented on TS-3174: --- hmm, are you sure you have a correct understanding of the CLFUS effects? In our use of the ram cache, CLFUS can cause trouble by wasting memory, especially under a rapidly changing traffic pattern, because CLFUS tries to cache more small objects and swaps out the big objects once the ram cache memory is full. That is a good feature, but the memory allocation/de-allocation during that step needs more work; I think swapping big objects out and then de-allocating or reusing their memory is still incomplete. I know this will not kill TS on most busy systems, but you still need to keep an eye on it. The TS cop process will bring back a failed server, so it may hide most of these problems from users. :D And it is easy to verify: on a system with mixed object sizes, i.e. active objects ranging from 1KB to 100MB, raise the ram cache cutoff size from 4M to 100M, follow doc/sdk/troubleshooting-tips/debugging-memory-leaks.en.rst to enable the memory dump, and compare the allocated and in-use memory for each size. FYI Kill LRU Ram Cache -- Key: TS-3174 URL: https://issues.apache.org/jira/browse/TS-3174 Project: Traffic Server Issue Type: Task Reporter: Susan Hinrichs Fix For: 6.0.0 Comment from [~zwoop]. Now that CLFUS is both stable and the default, is there even a reason to keep the old LRU cache? If there are no objections, we should remove it for the next major version change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
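The memory dump mentioned above can be switched on with a records.config fragment (following doc/sdk/troubleshooting-tips/debugging-memory-leaks.en.rst; the 3600-second interval here is an arbitrary example, and exact setting names may vary by ATS version):

```
CONFIG proxy.config.dump_mem_info_frequency INT 3600
CONFIG proxy.config.res_track_memory INT 1
```

With this enabled, traffic_server periodically logs allocated versus in-use bytes per allocator size class, which is what the allocated/used comparison suggested in the comment relies on.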
[jira] [Created] (TS-3180) Linux native aio not support disk 2T
Zhao Yongming created TS-3180: - Summary: Linux native aio not support disk 2T Key: TS-3180 URL: https://issues.apache.org/jira/browse/TS-3180 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Zhao Yongming {code} 21:47 faysal [Nov 8 15:45:30.080] Server {0x2ab53ff36700} WARNING: unable to clear cache directory '/dev/sdc 548864:366283256' 21:48 faysal although brw-rw 1 nobody nobody 8, 32 Nov 8 15:45 /dev/sdc 21:48 faysal fedora 21 21:48 faysal ping anyone 21:49 ming_zym disk fail? 21:52 ming_zym try to restart traffic server? 21:55 faysal i did restarted traffic server couple of times no luck 21:56 faysal by the way this is build with linux native aio enabled 21:56 faysal and latest master pulled today 21:56 ming_zym o, please don't use linux native aio in production 21:57 ming_zym not that ready to be used expect in testing 21:58 ming_zym I am sorry we don't have time to track down all those native aio issues here 21:59 faysal ok 21:59 faysal am compiling now without native aio 21:59 faysal and see what happens and inform you 22:06 faysal ming_zym: if you are working on native aio stuff its the issue 22:07 faysal i compiled without it and now its working fine 22:07 faysal i have noticed this on harddisks over 2T size 22:07 faysal smaller disks work fine with native aio 22:12 ming_zym ok, cool 22:13 faysal thats because i guess my disks are 3T each and one with 240G 22:14 faysal the 240 was taken no problem 22:14 faysal but the 3T has to be in GPT patition format 22:14 faysal and Fedora for some reason had issues identifying it 22:14 ming_zym hmm, maybe that is bug {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
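As a quick way to narrow down the report above, check whether the raw device crosses the 2 TiB boundary and how it is labeled (a sketch; /dev/sdc is the device from the log, and 2199023255552 bytes is 2 TiB, the limit of 32-bit 512-byte sector addressing):

```shell
# Size of the device in bytes; anything above 2199023255552 (2 TiB)
# is beyond what 32-bit sector arithmetic can address.
blockdev --getsize64 /dev/sdc

# Partition-table type: disks over 2 TiB must use GPT rather than MBR,
# matching the GPT observation in the chat log above.
parted -s /dev/sdc print | grep 'Partition Table'
```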
[jira] [Commented] (TS-2314) New config to allow unsatifiable Range: request to go straight to Origin
[ https://issues.apache.org/jira/browse/TS-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143290#comment-14143290 ] Zhao Yongming commented on TS-2314: --- looks like a fundamental bug in the rww, should we take a deep look at it? cutting down the origin traffic is always the critical feature for a cache. New config to allow unsatifiable Range: request to go straight to Origin Key: TS-2314 URL: https://issues.apache.org/jira/browse/TS-2314 Project: Traffic Server Issue Type: Bug Components: Core Reporter: jaekyung oh Labels: range Attachments: TS-2314.diff Basically read_while_writer works fine when ATS handles normal file. In progressive download and playback of mp4 in which moov atom is placed at the end of the file, ATS makes and returns wrong response for range request from unfulfilled cache when read_while_writer is 1. In origin, apache has h264 streaming module. Everything is ok whether the moov atom is placed at the beginning of the file or not in origin except a range request happens with read_while_writer. Mostly our customer’s contents placed moov atom at the end of the file and in the case movie player stops playing when it seek somewhere in the movie. to check if read_while_writer works fine, 1. prepare a mp4 file whose moov atom is placed at the end of the file. 2. curl --range - http://www.test.com/mp4/test.mp4 1 no_cache_from_origin 3. wget http://www.test.com/mp4/test.mp4 4. right after wget, execute “curl --range - http://www.test.com/mp4/test.mp4 1 from_read_while_writer” on other terminal (the point is sending range request while ATS is still downloading) 5. after wget gets done, curl --range - http://www.test.com/mp4/test.mp4 1 from_cache 6. you can check compare those files by bindiff. The response from origin(no_cache_from_origin) for the range request is exactly same to from_cache resulted from #5's range request. but from_read_while_writer from #4 is totally different from others. 
i think a range request should be forwarded to origin server if it can’t find the content with the offset in cache even if the read_while_writer is on, instead ATS makes(from where?) and sends wrong response. (In squid.log it indicates TCP_HIT) That’s why a movie player stops when it seeks right after the movie starts. Well. we turned off read_while_writer and movie play is ok but the problems is read_while_writer is global options. we can’t set it differently for each remap entry by conf_remap. So the downloading of Big file(not mp4 file) gives overhead to origin server because read_while_writer is off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
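The six reproduction steps above can be sketched as a script. The URL and output file names come from the report; the byte range below is a hypothetical example, since the original elides the exact range used:

```shell
# Reproduction sketch for the read_while_writer range bug (TS-2314).
# RANGE is a hypothetical placeholder; the report does not state the range.
URL=http://www.test.com/mp4/test.mp4
RANGE=1000000-2000000

# Steps 1-2: range request against a cold cache -> served from origin
curl -s --range "$RANGE" "$URL" -o no_cache_from_origin

# Steps 3-4: start a full download, and while it is still in flight,
# issue the same range request (served via read_while_writer)
wget -q "$URL" -O /dev/null &
curl -s --range "$RANGE" "$URL" -o from_read_while_writer
wait

# Step 5: same range once the object is fully cached
curl -s --range "$RANGE" "$URL" -o from_cache

# Step 6: origin and full-cache responses should match;
# from_read_while_writer is the one that differs when the bug hits
cmp no_cache_from_origin from_cache
cmp no_cache_from_origin from_read_while_writer
```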
[jira] [Commented] (TS-2314) New config to allow unsatifiable Range: request to go straight to Origin
[ https://issues.apache.org/jira/browse/TS-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143348#comment-14143348 ] Zhao Yongming commented on TS-2314: --- yeah, I think your suggestion is great. and the current rww lack of support for many cases besides this case, for example: 1, how long should a reader in waiting should wait for? in this case, the answer sounds like not at all, but use as much downloaded data as possible 2, should we enable the rww for a file as big as 5G? for example I'd like to make a limited usage with relatively small files such as 30m, due to the origin site is far away from the edge site. 3, should we consider on the patial feature in the https://cwiki.apache.org/confluence/display/TS/Partial+Object+Caching ? 4, well, if it is a low speed user that triggered the cache storing, will it be a speed problem for others readers that waiting? well, that are some of the issues we thinking on rww, I just want rww get more loves :D New config to allow unsatifiable Range: request to go straight to Origin Key: TS-2314 URL: https://issues.apache.org/jira/browse/TS-2314 Project: Traffic Server Issue Type: Bug Components: Core Reporter: jaekyung oh Labels: range Attachments: TS-2314.diff Basically read_while_writer works fine when ATS handles normal file. In progressive download and playback of mp4 in which moov atom is placed at the end of the file, ATS makes and returns wrong response for range request from unfulfilled cache when read_while_writer is 1. In origin, apache has h264 streaming module. Everything is ok whether the moov atom is placed at the beginning of the file or not in origin except a range request happens with read_while_writer. Mostly our customer’s contents placed moov atom at the end of the file and in the case movie player stops playing when it seek somewhere in the movie. to check if read_while_writer works fine, 1. prepare a mp4 file whose moov atom is placed at the end of the file. 2. 
curl --range - http://www.test.com/mp4/test.mp4 1 no_cache_from_origin 3. wget http://www.test.com/mp4/test.mp4 4. right after wget, execute “curl --range - http://www.test.com/mp4/test.mp4 1 from_read_while_writer” on other terminal (the point is sending range request while ATS is still downloading) 5. after wget gets done, curl --range - http://www.test.com/mp4/test.mp4 1 from_cache 6. you can check compare those files by bindiff. The response from origin(no_cache_from_origin) for the range request is exactly same to from_cache resulted from #5's range request. but from_read_while_writer from #4 is totally different from others. i think a range request should be forwarded to origin server if it can’t find the content with the offset in cache even if the read_while_writer is on, instead ATS makes(from where?) and sends wrong response. (In squid.log it indicates TCP_HIT) That’s why a movie player stops when it seeks right after the movie starts. Well. we turned off read_while_writer and movie play is ok but the problems is read_while_writer is global options. we can’t set it differently for each remap entry by conf_remap. So the downloading of Big file(not mp4 file) gives overhead to origin server because read_while_writer is off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TS-2314) New config to allow unsatifiable Range: request to go straight to Origin
[ https://issues.apache.org/jira/browse/TS-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143348#comment-14143348 ] Zhao Yongming edited comment on TS-2314 at 9/22/14 4:10 PM: yeah, I think your suggestion is great. and the current rww lack of support for many cases besides this case, for example: 1, how long a reader in waiting should wait for? in this case, the answer sounds like not at all, but use as much downloaded data as possible 2, should we enable the rww for a file as big as 5G? for example I'd like to make a limited usage with relatively small files such as 30m, due to the origin site is far away from the edge site. 3, should we consider on the patial feature in the https://cwiki.apache.org/confluence/display/TS/Partial+Object+Caching ? 4, well, if it is a low speed user that triggered the cache storing, will it be a speed problem for others readers that waiting? well, that are some of the issues we thinking on rww, I just want rww get more loves :D was (Author: zym): yeah, I think your suggestion is great. and the current rww lack of support for many cases besides this case, for example: 1, how long should a reader in waiting should wait for? in this case, the answer sounds like not at all, but use as much downloaded data as possible 2, should we enable the rww for a file as big as 5G? for example I'd like to make a limited usage with relatively small files such as 30m, due to the origin site is far away from the edge site. 3, should we consider on the patial feature in the https://cwiki.apache.org/confluence/display/TS/Partial+Object+Caching ? 4, well, if it is a low speed user that triggered the cache storing, will it be a speed problem for others readers that waiting? 
well, that are some of the issues we thinking on rww, I just want rww get more loves :D New config to allow unsatifiable Range: request to go straight to Origin Key: TS-2314 URL: https://issues.apache.org/jira/browse/TS-2314 Project: Traffic Server Issue Type: Bug Components: Core Reporter: jaekyung oh Labels: range Attachments: TS-2314.diff Basically read_while_writer works fine when ATS handles normal file. In progressive download and playback of mp4 in which moov atom is placed at the end of the file, ATS makes and returns wrong response for range request from unfulfilled cache when read_while_writer is 1. In origin, apache has h264 streaming module. Everything is ok whether the moov atom is placed at the beginning of the file or not in origin except a range request happens with read_while_writer. Mostly our customer’s contents placed moov atom at the end of the file and in the case movie player stops playing when it seek somewhere in the movie. to check if read_while_writer works fine, 1. prepare a mp4 file whose moov atom is placed at the end of the file. 2. curl --range - http://www.test.com/mp4/test.mp4 1 no_cache_from_origin 3. wget http://www.test.com/mp4/test.mp4 4. right after wget, execute “curl --range - http://www.test.com/mp4/test.mp4 1 from_read_while_writer” on other terminal (the point is sending range request while ATS is still downloading) 5. after wget gets done, curl --range - http://www.test.com/mp4/test.mp4 1 from_cache 6. you can check compare those files by bindiff. The response from origin(no_cache_from_origin) for the range request is exactly same to from_cache resulted from #5's range request. but from_read_while_writer from #4 is totally different from others. i think a range request should be forwarded to origin server if it can’t find the content with the offset in cache even if the read_while_writer is on, instead ATS makes(from where?) and sends wrong response. 
(In squid.log it indicates TCP_HIT) That’s why a movie player stops when it seeks right after the movie starts. Well. we turned off read_while_writer and movie play is ok but the problems is read_while_writer is global options. we can’t set it differently for each remap entry by conf_remap. So the downloading of Big file(not mp4 file) gives overhead to origin server because read_while_writer is off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3083) crash
[ https://issues.apache.org/jira/browse/TS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138429#comment-14138429 ] Zhao Yongming commented on TS-3083: --- hmm, can you provide more information on your configure options and env? I think we may get [~yunkai] take a look if it is the freelist issue crash - Key: TS-3083 URL: https://issues.apache.org/jira/browse/TS-3083 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 5.0.2 Reporter: bettydramit Labels: crash c++filt a.txt {code} /lib64/libpthread.so.0(+0xf710)[0x2b4c37949710] /usr/lib64/trafficserver/libtsutil.so.5(ink_atomiclist_pop+0x3e)[0x2b4c35abb64e] /usr/lib64/trafficserver/libtsutil.so.5(reclaimable_freelist_new+0x65)[0x2b4c35abc065] /usr/bin/traffic_server(MIOBuffer_tracker::operator()(long)+0x2b)[0x4a33db] /usr/bin/traffic_server(PluginVCCore::init()+0x2e3)[0x4d9903] /usr/bin/traffic_server(PluginVCCore::alloc()+0x11d)[0x4dcf4d] /usr/bin/traffic_server(TSHttpConnectWithPluginId+0x5d)[0x4b9e9d] /usr/bin/traffic_server(FetchSM::httpConnect()+0x74)[0x4a0224] /usr/bin/traffic_server(PluginVC::process_read_side(bool)+0x375)[0x4da675] /usr/bin/traffic_server(PluginVC::process_write_side(bool)+0x57a)[0x4dafca] /usr/bin/traffic_server(PluginVC::main_handler(int, void*)+0x315)[0x4dc9a5] /usr/bin/traffic_server(EThread::process_event(Event*, int)+0x8f)[0x73788f] /usr/bin/traffic_server(EThread::execute()+0x57b)[0x7381fb] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes
[ https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115401#comment-14115401 ] Zhao Yongming commented on TS-3032: --- yeah, 64k is too small for you, I'd suggest you 128K, you may use 256K I think. FATAL: ats_malloc: couldn't allocate XX bytes - Key: TS-3032 URL: https://issues.apache.org/jira/browse/TS-3032 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 5.0.1 Reporter: Nikolai Gorchilov Assignee: Brian Geffon Labels: crash Fix For: 5.2.0 Attachments: memory.d.png ATS 5.0.1 under Unbuntu 12.04.4 running happily for days suddenly crashes due to memory allocation issue. Happens once or twice a week. Server is having plenty of RAM - 128G - out of which 64G+ are free. Nothing suspicious in dmesg. {noformat} FATAL: ats_malloc: couldn't allocate 155648 bytes /z/bin/traffic_server - STACK TRACE: /z/lib/libtsutil.so.5(+0x1e837)[0x2b6251b3d837] /z/lib/libtsutil.so.5(ats_malloc+0x30)[0x2b6251b40c50] /z/bin/traffic_server(HdrHeap::coalesce_str_heaps(int)+0x34)[0x62e834] /z/bin/traffic_server(http_hdr_clone(HTTPHdrImpl*, HdrHeap*, HdrHeap*)+0x8f)[0x62a54f] /z/bin/traffic_server(HttpTransactHeaders::copy_header_fields(HTTPHdr*, HTTPHdr*, bool, long)+0x1ae)[0x5d08de] /z/bin/traffic_server(HttpTransact::build_request(HttpTransact::State*, HTTPHdr*, HTTPHdr*, HTTPVersion)+0x5c)[0x5b280c] /z/bin/traffic_server(HttpTransact::HandleCacheOpenReadMiss(HttpTransact::State*)+0x2c8)[0x5c2ce8] /z/bin/traffic_server(HttpSM::call_transact_and_set_next_state(void (*)(HttpTransact::State*))+0x66)[0x58e356] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03] /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528] /z/bin/traffic_server(HttpSM::do_hostdb_lookup()+0x27a)[0x58e84a] /z/bin/traffic_server(HttpSM::set_next_state()+0xd48)[0x5a1038] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03] 
/z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a] /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51] /z/lib/plugins/x3me_dscp.so(http_txn_hook(tsapi_cont*, TSEvent, void*)+0x236)[0x2b626342b508] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2] /z/bin/traffic_server(HttpSM::state_cache_open_read(int, void*)+0x180)[0x59b070] /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98] /z/bin/traffic_server(HttpCacheSM::state_cache_open_read(int, void*)+0x173)[0x57bbb3] /z/bin/traffic_server(Cache::open_read(Continuation*, INK_MD5*, HTTPHdr*, CacheLookupHttpConfig*, CacheFragType, char*, int)+0x616)[0x6d65a6] /z/bin/traffic_server(CacheProcessor::open_read(Continuation*, URL*, bool, HTTPHdr*, CacheLookupHttpConfig*, long, CacheFragType)+0xb0)[0x6b1af0] /z/bin/traffic_server(HttpCacheSM::open_read(URL*, HTTPHdr*, CacheLookupHttpConfig*, long)+0x83)[0x57c2d3] /z/bin/traffic_server(HttpSM::do_cache_lookup_and_read()+0xfb)[0x58baeb] /z/bin/traffic_server(HttpSM::set_next_state()+0x888)[0x5a0b78] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03] /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528] /z/bin/traffic_server(HttpSM::set_next_state()+0x7e2)[0x5a0ad2] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03] /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03] /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a] /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51] /z/lib/plugins/cacheurl.so(+0x17dc)[0x2b6263a477dc] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2] /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a] /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51] /z/lib/plugins/tslua.so(+0x596f)[0x2b626363396f] /z/bin/traffic_server(HttpSM::state_api_callout(int, 
void*)+0x102)[0x5999c2] /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a] /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51] /z/lib/plugins/stats_over_http.so(+0x1235)[0x2b6263228235] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2] /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528] /z/bin/traffic_server(HttpSM::state_read_client_request_header(int, void*)+0x22b)[0x59270b] /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98] /z/bin/traffic_server[0x714a60] /z/bin/traffic_server(NetHandler::mainNetEvent(int, Event*)+0x1ed)[0x7077cd] /z/bin/traffic_server(EThread::process_event(Event*, int)+0x91)[0x736111]
[jira] [Comment Edited] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes
[ https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115401#comment-14115401 ] Zhao Yongming edited comment on TS-3032 at 8/29/14 4:26 PM: yeah, 64k is too small for you, I'd suggest you 128K for 48G ram system, but is that your ram cache set to 10G too? why it still use so many memory here? can you dump out the mem allocator debug info? was (Author: zym): yeah, 64k is too small for you, I'd suggest you 128K, you may use 256K I think. FATAL: ats_malloc: couldn't allocate XX bytes - Key: TS-3032 URL: https://issues.apache.org/jira/browse/TS-3032 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 5.0.1 Reporter: Nikolai Gorchilov Assignee: Brian Geffon Labels: crash Fix For: 5.2.0 Attachments: memory.d.png ATS 5.0.1 under Unbuntu 12.04.4 running happily for days suddenly crashes due to memory allocation issue. Happens once or twice a week. Server is having plenty of RAM - 128G - out of which 64G+ are free. Nothing suspicious in dmesg. 
{noformat} FATAL: ats_malloc: couldn't allocate 155648 bytes /z/bin/traffic_server - STACK TRACE: /z/lib/libtsutil.so.5(+0x1e837)[0x2b6251b3d837] /z/lib/libtsutil.so.5(ats_malloc+0x30)[0x2b6251b40c50] /z/bin/traffic_server(HdrHeap::coalesce_str_heaps(int)+0x34)[0x62e834] /z/bin/traffic_server(http_hdr_clone(HTTPHdrImpl*, HdrHeap*, HdrHeap*)+0x8f)[0x62a54f] /z/bin/traffic_server(HttpTransactHeaders::copy_header_fields(HTTPHdr*, HTTPHdr*, bool, long)+0x1ae)[0x5d08de] /z/bin/traffic_server(HttpTransact::build_request(HttpTransact::State*, HTTPHdr*, HTTPHdr*, HTTPVersion)+0x5c)[0x5b280c] /z/bin/traffic_server(HttpTransact::HandleCacheOpenReadMiss(HttpTransact::State*)+0x2c8)[0x5c2ce8] /z/bin/traffic_server(HttpSM::call_transact_and_set_next_state(void (*)(HttpTransact::State*))+0x66)[0x58e356] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03] /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528] /z/bin/traffic_server(HttpSM::do_hostdb_lookup()+0x27a)[0x58e84a] /z/bin/traffic_server(HttpSM::set_next_state()+0xd48)[0x5a1038] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03] /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a] /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51] /z/lib/plugins/x3me_dscp.so(http_txn_hook(tsapi_cont*, TSEvent, void*)+0x236)[0x2b626342b508] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2] /z/bin/traffic_server(HttpSM::state_cache_open_read(int, void*)+0x180)[0x59b070] /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98] /z/bin/traffic_server(HttpCacheSM::state_cache_open_read(int, void*)+0x173)[0x57bbb3] /z/bin/traffic_server(Cache::open_read(Continuation*, INK_MD5*, HTTPHdr*, CacheLookupHttpConfig*, CacheFragType, char*, int)+0x616)[0x6d65a6] /z/bin/traffic_server(CacheProcessor::open_read(Continuation*, URL*, bool, HTTPHdr*, CacheLookupHttpConfig*, long, CacheFragType)+0xb0)[0x6b1af0] 
/z/bin/traffic_server(HttpCacheSM::open_read(URL*, HTTPHdr*, CacheLookupHttpConfig*, long)+0x83)[0x57c2d3] /z/bin/traffic_server(HttpSM::do_cache_lookup_and_read()+0xfb)[0x58baeb] /z/bin/traffic_server(HttpSM::set_next_state()+0x888)[0x5a0b78] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03] /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528] /z/bin/traffic_server(HttpSM::set_next_state()+0x7e2)[0x5a0ad2] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03] /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03] /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a] /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51] /z/lib/plugins/cacheurl.so(+0x17dc)[0x2b6263a477dc] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2] /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a] /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51] /z/lib/plugins/tslua.so(+0x596f)[0x2b626363396f] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2] /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a] /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51] /z/lib/plugins/stats_over_http.so(+0x1235)[0x2b6263228235] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2] /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528] /z/bin/traffic_server(HttpSM::state_read_client_request_header(int, void*)+0x22b)[0x59270b]
[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes
[ https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115416#comment-14115416 ] Zhao Yongming commented on TS-3032: --- well, looks your memory is starting from 20G, I'd think that your index memory is nearly about 20G, that indicate you may have ~20TB storage, if you haven't change proxy.config.cache.min_average_object_size, is this right? FATAL: ats_malloc: couldn't allocate XX bytes - Key: TS-3032 URL: https://issues.apache.org/jira/browse/TS-3032 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 5.0.1 Reporter: Nikolai Gorchilov Assignee: Brian Geffon Labels: crash Fix For: 5.2.0 Attachments: memory.d.png ATS 5.0.1 under Unbuntu 12.04.4 running happily for days suddenly crashes due to memory allocation issue. Happens once or twice a week. Server is having plenty of RAM - 128G - out of which 64G+ are free. Nothing suspicious in dmesg. {noformat} FATAL: ats_malloc: couldn't allocate 155648 bytes /z/bin/traffic_server - STACK TRACE: /z/lib/libtsutil.so.5(+0x1e837)[0x2b6251b3d837] /z/lib/libtsutil.so.5(ats_malloc+0x30)[0x2b6251b40c50] /z/bin/traffic_server(HdrHeap::coalesce_str_heaps(int)+0x34)[0x62e834] /z/bin/traffic_server(http_hdr_clone(HTTPHdrImpl*, HdrHeap*, HdrHeap*)+0x8f)[0x62a54f] /z/bin/traffic_server(HttpTransactHeaders::copy_header_fields(HTTPHdr*, HTTPHdr*, bool, long)+0x1ae)[0x5d08de] /z/bin/traffic_server(HttpTransact::build_request(HttpTransact::State*, HTTPHdr*, HTTPHdr*, HTTPVersion)+0x5c)[0x5b280c] /z/bin/traffic_server(HttpTransact::HandleCacheOpenReadMiss(HttpTransact::State*)+0x2c8)[0x5c2ce8] /z/bin/traffic_server(HttpSM::call_transact_and_set_next_state(void (*)(HttpTransact::State*))+0x66)[0x58e356] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03] /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528] /z/bin/traffic_server(HttpSM::do_hostdb_lookup()+0x27a)[0x58e84a] 
/z/bin/traffic_server(HttpSM::set_next_state()+0xd48)[0x5a1038] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03] /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a] /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51] /z/lib/plugins/x3me_dscp.so(http_txn_hook(tsapi_cont*, TSEvent, void*)+0x236)[0x2b626342b508] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2] /z/bin/traffic_server(HttpSM::state_cache_open_read(int, void*)+0x180)[0x59b070] /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98] /z/bin/traffic_server(HttpCacheSM::state_cache_open_read(int, void*)+0x173)[0x57bbb3] /z/bin/traffic_server(Cache::open_read(Continuation*, INK_MD5*, HTTPHdr*, CacheLookupHttpConfig*, CacheFragType, char*, int)+0x616)[0x6d65a6] /z/bin/traffic_server(CacheProcessor::open_read(Continuation*, URL*, bool, HTTPHdr*, CacheLookupHttpConfig*, long, CacheFragType)+0xb0)[0x6b1af0] /z/bin/traffic_server(HttpCacheSM::open_read(URL*, HTTPHdr*, CacheLookupHttpConfig*, long)+0x83)[0x57c2d3] /z/bin/traffic_server(HttpSM::do_cache_lookup_and_read()+0xfb)[0x58baeb] /z/bin/traffic_server(HttpSM::set_next_state()+0x888)[0x5a0b78] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03] /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528] /z/bin/traffic_server(HttpSM::set_next_state()+0x7e2)[0x5a0ad2] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03] /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x343)[0x599c03] /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a] /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51] /z/lib/plugins/cacheurl.so(+0x17dc)[0x2b6263a477dc] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2] /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a] 
/z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51] /z/lib/plugins/tslua.so(+0x596f)[0x2b626363396f] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2] /z/bin/traffic_server(HttpSM::state_api_callback(int, void*)+0x8a)[0x59c81a] /z/bin/traffic_server(TSHttpTxnReenable+0x141)[0x4caa51] /z/lib/plugins/stats_over_http.so(+0x1235)[0x2b6263228235] /z/bin/traffic_server(HttpSM::state_api_callout(int, void*)+0x102)[0x5999c2] /z/bin/traffic_server(HttpSM::set_next_state()+0x238)[0x5a0528] /z/bin/traffic_server(HttpSM::state_read_client_request_header(int, void*)+0x22b)[0x59270b] /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x59ad98] /z/bin/traffic_server[0x714a60]
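Zhao's back-of-the-envelope estimate above (cache index RAM scales with storage size divided by min_average_object_size) can be sketched as follows. The ~10 bytes per directory entry is an approximation commonly cited for the ATS cache directory, and the numbers here are illustrative, not measured:

```python
# Rough estimate of ATS cache directory (index) RAM, assuming the
# commonly cited ~10 bytes of RAM per directory entry; treat this
# constant as an approximation, not an exact figure.
DIR_ENTRY_BYTES = 10

def index_ram_bytes(storage_bytes, min_average_object_size=8000):
    # One directory entry per expected object; 8000 is the default
    # value of proxy.config.cache.min_average_object_size.
    entries = storage_bytes // min_average_object_size
    return entries * DIR_ENTRY_BYTES

# ~20 TB of cache storage with the default min_average_object_size
# lands in the same ballpark as the ~20G index inferred above.
print(index_ram_bytes(20 * 10**12) / 2**30)  # roughly 23 GiB
```

This is why shrinking min_average_object_size on a large storage box can quietly multiply the process's baseline memory footprint.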
[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes
[ https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14113762#comment-14113762 ] Zhao Yongming commented on TS-3032: --- Any update, [~ngorchilov]?
[jira] [Commented] (TS-3021) hosting.config vs volume.config
[ https://issues.apache.org/jira/browse/TS-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110898#comment-14110898 ] Zhao Yongming commented on TS-3021: --- Do hosting and volume really have the same usage? I don't think so. volume.config defines how the storage space is partitioned, and hosting.config assigns those volumes to hostnames. Unless you want to remove the control matcher, I would not suggest changing their file syntax. The config files are the end-user interface, and we should discuss carefully before taking any action; a change in the UI is much more costly than a function rename in the code. hosting.config vs volume.config --- Key: TS-3021 URL: https://issues.apache.org/jira/browse/TS-3021 Project: Traffic Server Issue Type: Bug Components: Configuration Reporter: Igor Galić Fix For: sometime It appears to me that hosting.config and volume.config have a very similar purpose / use-case; perhaps it would be good to merge the two. --- n.b.: I'm not up-to-date on the plans re lua-config, but even then we'll need to consider how to present this. -- This message was sent by Atlassian JIRA (v6.2#6252)
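For readers comparing the two files, a minimal illustration of the split described above: volume.config partitions the cache storage, and hosting.config maps hostnames onto those partitions. The hostnames and sizes are made-up examples:

```
# volume.config: split the cache storage into volumes
volume=1 scheme=http size=50%
volume=2 scheme=http size=50%

# hosting.config: assign hostnames (or domains) to volumes;
# a generic fallback record is required
hostname=static.example.com volume=1
domain=example.org volume=2
hostname=* volume=1
```

Merging them would mean collapsing two different matchers (a size allocator and a hostname router) into one file format.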
[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes
[ https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110941#comment-14110941 ] Zhao Yongming commented on TS-3032: --- I'd suggest you set up a tool to log memory usage and other historical data. A tool we use very often when tracing issues like this is tsar (https://github.com/alibaba/tsar, https://blog.zymlinux.net/index.php/archives/251), but any other tool that can collect comparable data is fine. When we dealt with TS-1006, I even built a spreadsheet to show that memory was a big problem. The more data, the better.
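If no monitoring tool is available, the kind of history logging suggested above can be approximated with a few lines of Python that sample the resident set size of the traffic_server process. This is a minimal sketch, not a substitute for tsar; it assumes a Linux /proc filesystem:

```python
import re
import time

def parse_vm_rss_kb(status_text):
    """Extract VmRSS (resident set size, in kB) from /proc/<pid>/status text."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None

def log_rss(pid, interval_s=60, samples=1440):
    """Print timestamped RSS samples for the given pid, e.g. once a minute for a day."""
    for _ in range(samples):
        with open("/proc/%d/status" % pid) as f:
            rss_kb = parse_vm_rss_kb(f.read())
        print("%d %s kB" % (int(time.time()), rss_kb))
        time.sleep(interval_s)
```

Plotting the samples over a few days makes it obvious whether memory grows steadily (a leak) or jumps at a particular time of day (load-driven growth).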
[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes
[ https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109133#comment-14109133 ] Zhao Yongming commented on TS-3032: --- Nothing looks unusual. I think the 'Cached: 25975284 kB' is caused by the access logging, so we need more information from ATS itself: 1. your RAM cache setting, proxy.config.cache.ram_cache.size; if it is not set, please tell us your storage device usage and proxy.config.cache.min_average_object_size. 2. let's dump some memory details from within ATS: https://docs.trafficserver.apache.org/en/latest/sdk/troubleshooting-tips/debugging-memory-leaks.en.html We should also capture all of that data at the breaking point. :D
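The memory-dump step in the linked troubleshooting page boils down to a couple of records.config knobs. A sketch, assuming an ATS version of that era (5.0.x) where these variables exist; check your version's records.config documentation before relying on the exact names:

```
# Print allocator/freelist usage to traffic.out every 60 seconds
CONFIG proxy.config.dump_mem_info_frequency INT 60

# Track memory per IOBuffer user (adds overhead; debugging only)
CONFIG proxy.config.res_track_memory INT 1
```

With the periodic dumps enabled, the per-allocator lines in traffic.out are what the later comments in this thread are reading numbers from.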
[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes
[ https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109214#comment-14109214 ] Zhao Yongming commented on TS-3032: --- Well, you have 7368964608 bytes in the freelist and 4893378608 bytes in use (about 66% of the freelist size), with about 8000 active connections. That all sounds not so bad, except that ~7G is far smaller than the 19G from the pid summary. Why?
[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes
[ https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109216#comment-14109216 ] Zhao Yongming commented on TS-3032: --- I'd like you to keep collecting that data for a few more days, at the same time of day (to get the same load) if you can, so we can figure out which component is wasting the most memory.
[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes
[ https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109225#comment-14109225 ] Zhao Yongming commented on TS-3032: --- And if you have more than one box with this issue, please consider testing one box with the following tweaks: 1. reinstall with the reclaimable freelist enabled, and make sure reclaiming is enabled in records.config. 2. use the standard LRU: set proxy.config.cache.ram_cache.algorithm to 1. And if you have more systems that can run a release test, we can identify which release is proved to be correct. :D
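The reclaimable freelist was a compile-time feature in that era of ATS, so the suggestion above involves both a configure flag and a records.config variable. The names below match the 4.x/5.x naming as best I recall; verify them against your version's build options and records.config documentation:

```
# Build with the reclaimable freelist compiled in:
#   ./configure --enable-reclaimable-freelist ...

# records.config: turn reclaiming on, and fall back to the
# plain LRU ram cache (0 = CLFUS, 1 = LRU)
CONFIG proxy.config.allocator.enable_reclaim INT 1
CONFIG proxy.config.cache.ram_cache.algorithm INT 1
```

Running one box with this configuration next to an unmodified box under the same load is the A/B comparison being proposed.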
[jira] [Commented] (TS-3032) FATAL: ats_malloc: couldn't allocate XXXXXX bytes
[ https://issues.apache.org/jira/browse/TS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108410#comment-14108410 ] Zhao Yongming commented on TS-3032: --- I don't know of anyone with a success story on a BIG-memory system; I'd like to hear of one. For the problem you have, please attach some more data, such as: 1. /proc/meminfo 2. the traffic_server process status: /proc//status 3. more system logs related to allocation and memory, such as dmesg and syslog. And please tell us the configure options used to build the binary, too. Hopefully that will help us inspect the problem. FATAL: ats_malloc: couldn't allocate XX bytes - Key: TS-3032 URL: https://issues.apache.org/jira/browse/TS-3032 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 5.0.1 Reporter: Nikolai Gorchilov Assignee: Brian Geffon Labels: crash Fix For: 5.2.0 ATS 5.0.1 under Ubuntu 12.04.4, running happily for days, suddenly crashes due to a memory allocation issue. This happens once or twice a week. The server has plenty of RAM (128G), of which 64G+ is free. Nothing suspicious in dmesg. 
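The data requested above (memory fields from /proc/meminfo and the process status file) can be collected with a short helper. This is a minimal sketch, assuming the standard Linux procfs "Key: value kB" format; the function and variable names are illustrative, not part of Traffic Server:

```python
# Sketch: extract the memory-related fields from /proc/meminfo and
# /proc/<pid>/status that are useful when diagnosing ats_malloc failures.
# Assumes the usual Linux procfs "Key:   <number> kB" line format.

def parse_proc_kv(text, keys):
    """Return {key: value-in-kB} for the requested keys found in text."""
    out = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in keys and rest.split():
            # values look like "   123456 kB"; keep just the number
            out[name] = int(rest.split()[0])
    return out

def snapshot(pid):
    """Read the live procfs files for a running traffic_server process."""
    with open("/proc/meminfo") as f:
        mem = parse_proc_kv(f.read(), {"MemTotal", "MemFree", "Committed_AS"})
    with open("/proc/%d/status" % pid) as f:
        proc = parse_proc_kv(f.read(), {"VmSize", "VmRSS", "VmData"})
    return mem, proc
```

Attaching a snapshot like this alongside dmesg/syslog gives enough context to tell whether the failure is address-space exhaustion or real memory pressure.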
[jira] [Assigned] (TS-2966) Update Feature not working
[ https://issues.apache.org/jira/browse/TS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhao Yongming reassigned TS-2966: - Assignee: Zhao Yongming Update Feature not working -- Key: TS-2966 URL: https://issues.apache.org/jira/browse/TS-2966 Project: Traffic Server Issue Type: Bug Components: Cache, Core Reporter: Thomas Stinner Assignee: Zhao Yongming Fix For: sometime Attachments: traffic.out, trafficserver.patch I had a problem using the update feature. I received a SegFault in do_host_db_lookup, which was caused by accessing ua_session, which was not initialized (see attached patch). After fixing that I no longer get a SegFault, but the files that are retrieved by recursion are not placed into the cache. They are requested in every schedule. Only the starting file is placed correctly into the cache. When retrieving the files with a client, caching works as expected. So I don't think this is a configuration error. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2895) memory allocation failure
[ https://issues.apache.org/jira/browse/TS-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099899#comment-14099899 ] Zhao Yongming commented on TS-2895: --- [~wangjun] any update on this issue? memory allocation failure - Key: TS-2895 URL: https://issues.apache.org/jira/browse/TS-2895 Project: Traffic Server Issue Type: Test Components: Cache, Clustering Reporter: wangjun Assignee: Zhao Yongming Labels: crash Fix For: sometime Attachments: screenshot-1.jpg, screenshot-2.jpg In this version (ATS 4.0.2) I encountered a bug (memory allocation failure). See the system log in the screenshot below (screenshot-1.jpg) and the program logs in the screenshot below (screenshot-2.jpg). Please help me, thank you. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2903) Connections are leaked at about 1000 per hour
[ https://issues.apache.org/jira/browse/TS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049568#comment-14049568 ] Zhao Yongming commented on TS-2903: --- Well, 3.2.5 is definitely a very old version; can you test on the git master version? And if you find that connections are leaking, you may need to check why the HttpSM is hanging. Please use {http} in the http_ui to get the detailed information; it is the best tool for this issue. Good luck. Connections are leaked at about 1000 per hour - Key: TS-2903 URL: https://issues.apache.org/jira/browse/TS-2903 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Puneet Dhaliwal For version 3.2.5, with keep-alive on for in/out and POST out, connections were leaked at about 1000 per hour. The limit of proxy.config.net.connections_throttle was reached at 30k, and at 60k after enough time.
CONFIG proxy.config.http.keep_alive_post_out INT 1
CONFIG proxy.config.http.keep_alive_enabled_in INT 1
CONFIG proxy.config.http.keep_alive_enabled_out INT 1
This might also be happening on 4.2.1 and 5.0. Please let me know if further information is required. -- This message was sent by Atlassian JIRA (v6.2#6252)
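A leak rate like the one reported here (roughly 1000 connections per hour toward the connections_throttle limit) can be confirmed by sampling the open-connection count over time. The following is a minimal sketch; the sampling source (traffic_line, /proc, netstat, etc.) is deliberately left out, and the function just fits a rate to the samples:

```python
# Sketch: estimate a connection leak rate from (timestamp, count) samples.
# A steadily positive rate while traffic is flat suggests leaked sessions.

def leak_rate_per_hour(samples):
    """samples: list of (unix_time_seconds, open_connection_count)."""
    if len(samples) < 2:
        return 0.0
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    hours = (t1 - t0) / 3600.0
    return (c1 - c0) / hours if hours > 0 else 0.0

def hours_until_throttle(samples, throttle=30000):
    """Rough time until proxy.config.net.connections_throttle is reached."""
    rate = leak_rate_per_hour(samples)
    if rate <= 0:
        return float("inf")
    return (throttle - samples[-1][1]) / rate
```

With hourly samples this gives an early warning well before the throttle limit is hit.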
[jira] [Commented] (TS-2796) Leaking CacheVConnections
[ https://issues.apache.org/jira/browse/TS-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006172#comment-14006172 ] Zhao Yongming commented on TS-2796: --- Any update on this issue? Do you need me to push on the code diffing on Taobao's side? Leaking CacheVConnections - Key: TS-2796 URL: https://issues.apache.org/jira/browse/TS-2796 Project: Traffic Server Issue Type: Bug Components: Cache Affects Versions: 4.0.2, 4.2.1, 5.0.0 Reporter: Brian Geffon Assignee: Brian Geffon Labels: yahoo Fix For: 5.0.0 It appears there is a memory leak in 4.0.x, 4.2.x, and master that leaks CacheVConnections, which results in IOBufAllocator leaks as well. Here is an example:
allocated   | in-use      | type size | free list name
67108864    | 0           | 2097152   | memory/ioBufAllocator[14]
67108864    | 19922944    | 1048576   | memory/ioBufAllocator[13]
4798283776  | 14155776    | 524288    | memory/ioBufAllocator[12]
7281311744  | 98304000    | 262144    | memory/ioBufAllocator[11]
1115684864  | 148242432   | 131072    | memory/ioBufAllocator[10]
497544      | 379977728   | 65536     | memory/ioBufAllocator[9]
9902751744  | 5223546880  | 32768     | memory/ioBufAllocator[8]
14762901504 | 14762311680 | 16384     | memory/ioBufAllocator[7]
6558056448  | 6557859840  | 8192      | memory/ioBufAllocator[6]
41418752    | 30502912    | 4096      | memory/ioBufAllocator[5]
524288      | 0           | 2048      | memory/ioBufAllocator[4]
0           | 0           | 1024      | memory/ioBufAllocator[3]
0           | 0           | 512       | memory/ioBufAllocator[2]
32768       | 0           | 256       | memory/ioBufAllocator[1]
0           | 0           | 128       | memory/ioBufAllocator[0]
2138112     | 2124192     | 928       | memory/cacheVConnection
[~bcall] has observed this issue on 4.0.x, and we have observed it on 4.2.x. The code path in CacheVC that is allocating the IoBuffers is memory/IOBuffer/Cache.cc:2603; however, that's just the observable symptom; the real issue here is the leaking CacheVC. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2796) Leaking CacheVConnections
[ https://issues.apache.org/jira/browse/TS-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997180#comment-13997180 ] Zhao Yongming commented on TS-2796: --- Hmm, from what I know, many people have this memory issue; it is a malloc/GC issue, but they just haven't realized it. That is why I am pushing to have the reclaimable freelist enabled by default. Why not test it, if you can verify the result within hours? And please take a look at 'allocated' minus 'in-use' for the 32K memory/ioBufAllocator row; if you sum those differences up, that is the memory you are leaking. The same as TS-1006. Leaking CacheVConnections - Key: TS-2796 URL: https://issues.apache.org/jira/browse/TS-2796 -- This message was sent by Atlassian JIRA (v6.2#6252)
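The 'allocated' minus 'in-use' arithmetic suggested in the comment above is easy to script. A sketch that parses rows of the freelist dump (the pipe-separated column layout is assumed from the table quoted in this issue):

```python
# Sketch: sum (allocated - in-use) over matching rows of a freelist dump
# to estimate memory held by the allocator but no longer in use.

def freelist_waste(dump_lines, name_filter="ioBufAllocator"):
    """dump_lines: rows like '9902751744 | 5223546880 | 32768 | memory/ioBufAllocator[8]'."""
    total = 0
    for line in dump_lines:
        cols = [c.strip() for c in line.split("|")]
        if len(cols) != 4 or name_filter not in cols[3]:
            continue  # skip header rows and non-matching free lists
        allocated, in_use = int(cols[0]), int(cols[1])
        total += allocated - in_use
    return total
```

For the 32K row in the dump above (9902751744 allocated, 5223546880 in use), this reports about 4.4 GB held but unused by that one free list alone.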
[jira] [Commented] (TS-2796) Leaking CacheVConnections
[ https://issues.apache.org/jira/browse/TS-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997221#comment-13997221 ] Zhao Yongming commented on TS-2796: --- Yeah, I know you may think the reclaimable freelist is hard to manage and evil in the code, but if we can confirm that it helps in this case, I'd like you to consider enabling it by default. We really should not waste so much time here, and we could win back some less experienced users who may think we have a big memory problem in the core. I'd push for whatever other enhancements you'd like in order to make it enabled by default. :D Leaking CacheVConnections - Key: TS-2796 URL: https://issues.apache.org/jira/browse/TS-2796 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2796) Leaking CacheVConnections
[ https://issues.apache.org/jira/browse/TS-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995956#comment-13995956 ] Zhao Yongming commented on TS-2796: --- I'm not sure exactly what this issue is after; if we are focusing on the last line of the memory dump, memory/cacheVConnection, please ignore my comment. Most of the memory leaking in your memory dump is in the 32K memory/ioBufAllocator. From what I can guess, you are using the default CLFUS RAM cache algorithm, which will produce this effect when the system has been running a long time: the big objects in memory are replaced by smaller ones, but the memory used by the big objects is not yet released to the system. That issue is already addressed in TS-1006, which resulted in the reclaimable freelist memory management code, already shipped in the 4.0 releases with a configure option to enable it. So, if this is the cause, please help verify whether your problem is still there with the reclaimable freelist enabled; you may also test the simple LRU algorithm for the RAM cache. Thanks. Leaking CacheVConnections - Key: TS-2796 URL: https://issues.apache.org/jira/browse/TS-2796 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2796) Leaking CacheVConnections
[ https://issues.apache.org/jira/browse/TS-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995961#comment-13995961 ] Zhao Yongming commented on TS-2796: --- And if the reclaimable freelist helps you, please help me promote it to be enabled by default. Leaking CacheVConnections - Key: TS-2796 URL: https://issues.apache.org/jira/browse/TS-2796 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-2669) ATS crash, then restart with all cached objects cleared
[ https://issues.apache.org/jira/browse/TS-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950437#comment-13950437 ] Zhao Yongming commented on TS-2669: --- Well, this may be something that needs more checking. Please find the diags.log lines like:
{code}
[Mar 27 20:31:25.948] {0x2b02748fde00} STATUS: opened /var/log/trafficserver/diags.log
[Mar 27 20:31:25.948] {0x2b02748fde00} NOTE: updated diags config
[Mar 27 20:31:25.954] Server {0x2b02748fde00} NOTE: cache clustering disabled
[Mar 27 20:31:25.964] Server {0x2b02748fde00} NOTE: ip_allow.config updated, reloading
[Mar 27 20:31:25.969] Server {0x2b02748fde00} NOTE: loading SSL certificate configuration from /etc/trafficserver/ssl_multicert.config
[Mar 27 20:31:25.976] Server {0x2b02748fde00} NOTE: cache clustering disabled
[Mar 27 20:31:25.977] Server {0x2b02748fde00} NOTE: logging initialized[15], logging_mode = 3
[Mar 27 20:31:25.978] Server {0x2b02748fde00} NOTE: loading plugin '/usr/lib64/trafficserver/plugins/libloader.so'
[Mar 27 20:31:25.982] Server {0x2b02748fde00} NOTE: loading plugin '/usr/local/ironbee/libexec/ts_ironbee.so'
[Mar 27 20:31:25.983] Server {0x2b02748fde00} NOTE: Rolling interval adjusted from 0 sec to 300 sec for /var/log/trafficserver/ts-ironbee.log
[Mar 27 20:31:25.992] Server {0x2b02748fde00} NOTE: traffic server running
[Mar 27 20:31:26.077] Server {0x2b0275d8e700} NOTE: cache enabled
{code}
You can see that 'traffic server running' indicates that ATS is running, and 'cache enabled' shows that the cache is working. Since your system crashed and the cache is not enabled, I suspect that your diags.log does not have that 'cache enabled' line. This is often caused by something like a privilege issue, where the ATS server process does not have write permission on the disk block device files, etc., or anything else you may find in diags.log or even the system logs. 
The interim cache and AIO bugs may cause you to lose the saved data: the interim cache may lose the data when the server process restarts, and an AIO bug may clear all data. But all those bugs were fixed in the v4.1.0 release. ATS crash, then restart with all cached objects cleared --- Key: TS-2669 URL: https://issues.apache.org/jira/browse/TS-2669 Project: Traffic Server Issue Type: Bug Reporter: AnDao Hi all, I'm using ATS 4.1.2. My ATS just crashed, restarted, and cleared all the cached objects, causing my backend servers to overload. Why does ATS clear all the cached objects when it crashes and restarts? The log is: * manager.log
[Mar 27 12:57:13.022] Manager {0x7f597e3477e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104)
[Mar 27 12:57:13.022] Manager {0x7f597e3477e0} NOTE: [LocalManager::mgmtShutdown] Executing shutdown request.
[Mar 27 12:57:13.022] Manager {0x7f597e3477e0} NOTE: [LocalManager::processShutdown] Executing process shutdown request.
[Mar 27 12:57:13.028] Manager {0x7f597e3477e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message
[Mar 27 12:57:13.028] Manager {0x7f597e3477e0} ERROR: (last system error 32: Broken pipe)
[Mar 27 12:57:13.174] {0x7ffaeec7e7e0} STATUS: opened /zserver/log/trafficserver/manager.log
[Mar 27 12:57:13.174] {0x7ffaeec7e7e0} NOTE: updated diags config
[Mar 27 12:57:13.520] Manager {0x7ffaeec7e7e0} NOTE: [ClusterCom::ClusterCom] Node running on OS: 'Linux' Release: '2.6.32-358.6.2.el6.x86_64'
[Mar 27 12:57:13.550] Manager {0x7ffaeec7e7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 80
[Mar 27 12:57:13.550] Manager {0x7ffaeec7e7e0} NOTE: [TrafficManager] Setup complete
[Mar 27 12:57:14.618] Manager {0x7ffaeec7e7e0} NOTE: [LocalManager::startProxy] Launching ts process
[Mar 27 12:57:14.632] Manager {0x7ffaeec7e7e0} NOTE: [LocalManager::pollMgmtProcessServer] New process connecting fd '15'
[Mar 27 12:57:14.632] Manager {0x7ffaeec7e7e0} NOTE: [Alarms::signalAlarm] Server Process born
*** traffic.out 
*** [E. Mgmt] log == [TrafficManager] using root directory '/zserver/trafficserver-4.1.2' [TrafficServer] using root directory '/zserver/trafficserver-4.1.2' NOTE: Traffic Server received Sig 15: Terminated [E. Mgmt] log == [TrafficManager] using root directory '/zserver/trafficserver-4.1.2' [TrafficServer] using root directory '/zserver/trafficserver-4.1.2' NOTE: Traffic Server received Sig 11: Segmentation fault /zserver/trafficserver-4.1.2/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0(+0x35a360f500)[0x2b3b55819500] /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN12HttpTransact47change_response_header_because_of_range_requestEPNS_5StateEP7HTTPHdr+0x240)[0x54b8a0] /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN12HttpTransact28handle_content_length_headerEPNS_5StateEP7HTTPHdrS3_+0x2c8)[0x54bc38]
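The check described in the comment above (whether diags.log reaches both the 'traffic server running' and 'cache enabled' NOTE lines) is easy to automate. A sketch, assuming the log format matches the excerpt quoted in the comment:

```python
# Sketch: scan diags.log for the two NOTE lines that confirm a healthy
# start: "traffic server running" (server is up) and "cache enabled"
# (cache storage initialized and writable).

def startup_status(log_lines):
    """Return which of the two healthy-start markers were seen."""
    status = {"server_running": False, "cache_enabled": False}
    for line in log_lines:
        if "traffic server running" in line:
            status["server_running"] = True
        elif "cache enabled" in line:
            status["cache_enabled"] = True
    return status
```

If `server_running` is True but `cache_enabled` never appears, the privilege/storage problem described above is the first thing to check.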
[jira] [Commented] (TS-2669) ATS crash, then restart with all cached objects cleared
[ https://issues.apache.org/jira/browse/TS-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950510#comment-13950510 ] Zhao Yongming commented on TS-2669: --- Please attach the server start log from your diags.log file, as I did, and please tell us how your storage is configured. ATS crash, then restart with all cached objects cleared --- Key: TS-2669 URL: https://issues.apache.org/jira/browse/TS-2669 Project: Traffic Server Issue Type: Bug Reporter: AnDao Attachments: cachobjecs.png, storage.png Hi all, I'm using ATS 4.1.2. My ATS just crashed, restarted, and cleared all the cached objects, causing my backend servers to overload. Why does ATS clear all the cached objects when it crashes and restarts? The log is: * manager.log [Mar 27 12:57:13.022] Manager {0x7f597e3477e0} FATAL: [LocalManager::pollMgmtProcessServer] Error in read (errno: 104) [Mar 27 12:57:13.022] Manager {0x7f597e3477e0} NOTE: [LocalManager::mgmtShutdown] Executing shutdown request. [Mar 27 12:57:13.022] Manager {0x7f597e3477e0} NOTE: [LocalManager::processShutdown] Executing process shutdown request. 
[Mar 27 12:57:13.028] Manager {0x7f597e3477e0} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message [Mar 27 12:57:13.028] Manager {0x7f597e3477e0} ERROR: (last system error 32: Broken pipe) [Mar 27 12:57:13.174] {0x7ffaeec7e7e0} STATUS: opened /zserver/log/trafficserver/manager.log [Mar 27 12:57:13.174] {0x7ffaeec7e7e0} NOTE: updated diags config [Mar 27 12:57:13.520] Manager {0x7ffaeec7e7e0} NOTE: [ClusterCom::ClusterCom] Node running on OS: 'Linux' Release: '2.6.32-358.6.2.el6.x86_64' [Mar 27 12:57:13.550] Manager {0x7ffaeec7e7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 80 [Mar 27 12:57:13.550] Manager {0x7ffaeec7e7e0} NOTE: [TrafficManager] Setup complete [Mar 27 12:57:14.618] Manager {0x7ffaeec7e7e0} NOTE: [LocalManager::startProxy] Launching ts process [Mar 27 12:57:14.632] Manager {0x7ffaeec7e7e0} NOTE: [LocalManager::pollMgmtProcessServer] New process connecting fd '15' [Mar 27 12:57:14.632] Manager {0x7ffaeec7e7e0} NOTE: [Alarms::signalAlarm] Server Process born *** traffic.out *** [E. Mgmt] log == [TrafficManager] using root directory '/zserver/trafficserver-4.1.2' [TrafficServer] using root directory '/zserver/trafficserver-4.1.2' NOTE: Traffic Server received Sig 15: Terminated [E. 
Mgmt] log == [TrafficManager] using root directory '/zserver/trafficserver-4.1.2' [TrafficServer] using root directory '/zserver/trafficserver-4.1.2' NOTE: Traffic Server received Sig 11: Segmentation fault /zserver/trafficserver-4.1.2/bin/traffic_server - STACK TRACE: /lib64/libpthread.so.0(+0x35a360f500)[0x2b3b55819500] /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN12HttpTransact47change_response_header_because_of_range_requestEPNS_5StateEP7HTTPHdr+0x240)[0x54b8a0] /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN12HttpTransact28handle_content_length_headerEPNS_5StateEP7HTTPHdrS3_+0x2c8)[0x54bc38] /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN12HttpTransact14build_responseEPNS_5StateEP7HTTPHdrS3_11HTTPVersion10HTTPStatusPKc+0x3e3)[0x54c0c3] /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN12HttpTransact22handle_transform_readyEPNS_5StateE+0x70)[0x54ca40] /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN6HttpSM32call_transact_and_set_next_stateEPFvPN12HttpTransact5StateEE+0x28)[0x51b418] /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN6HttpSM38state_response_wait_for_transform_readEiPv+0xed)[0x52988d] /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0xd8)[0x533178] /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN17TransformTerminus12handle_eventEiPv+0x1d2)[0x4e8c62] /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x8f)[0x6a5a0f] /zserver/trafficserver-4.1.2/bin/traffic_server(_ZN7EThread7executeEv+0x63b)[0x6a658b] /zserver/trafficserver-4.1.2/bin/traffic_server[0x6a48aa] /lib64/libpthread.so.0(+0x35a3607851)[0x2b3b55811851] /lib64/libc.so.6(clone+0x6d)[0x35a32e890d] [E. Mgmt] log == [TrafficManager] using root directory '/zserver/trafficserver-4.1.2' [TrafficServer] using root directory '/zserver/trafficserver-4.1.2' -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (TS-2668) need a way to fetch from the cluster when doing cluster local caching
Zhao Yongming created TS-2668: - Summary: need a way to fetch from the cluster when doing cluster local caching Key: TS-2668 URL: https://issues.apache.org/jira/browse/TS-2668 Project: Traffic Server Issue Type: Sub-task Components: Cache, Clustering Reporter: Zhao Yongming This is the subtask for feature #2 of TS-2184. When you want to do local caching in a cluster environment, you must tell the cache to write the object to the local disk on a cluster hit. We need a good way to handle this, maybe a new API or similar API changes. Be aware that feature #2 may be harmful on its own, and it should work together with the other features. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (TS-2668) need a way to fetch from the cluster when doing cluster local caching
[ https://issues.apache.org/jira/browse/TS-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhao Yongming reassigned TS-2668: - Assignee: weijin Weijin and Yuqing are working on a feature related to the API change requirements; please help find a way to merge with this feature. need a way to fetch from the cluster when doing cluster local caching - Key: TS-2668 URL: https://issues.apache.org/jira/browse/TS-2668 Project: Traffic Server Issue Type: Sub-task Components: Cache, Clustering Reporter: Zhao Yongming Assignee: weijin Fix For: sometime This is the subtask for feature #2 of TS-2184. When you want to do local caching in a cluster environment, you must tell the cache to write the object to the local disk on a cluster hit. We need a good way to handle this, maybe a new API or similar API changes. Be aware that feature #2 may be harmful on its own, and it should work together with the other features. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (TS-2528) better bool handling in public APIs (ts / mgmt)
[ https://issues.apache.org/jira/browse/TS-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhao Yongming reassigned TS-2528: - Assignee: Zhao Yongming better bool handling in public APIs (ts / mgmt) --- Key: TS-2528 URL: https://issues.apache.org/jira/browse/TS-2528 Project: Traffic Server Issue Type: Bug Components: Management API Reporter: Zhao Yongming Assignee: Zhao Yongming Labels: api-change Fix For: 5.0.0
{code}
tsapi bool TSListIsEmpty(TSList l);
tsapi bool TSListIsValid(TSList l);
tsapi bool TSIpAddrListIsEmpty(TSIpAddrList ip_addrl);
tsapi bool TSIpAddrListIsValid(TSIpAddrList ip_addrl);
tsapi bool TSPortListIsEmpty(TSPortList portl);
tsapi bool TSPortListIsValid(TSPortList portl);
tsapi bool TSStringListIsEmpty(TSStringList strl);
tsapi bool TSStringListIsValid(TSStringList strl);
tsapi bool TSIntListIsEmpty(TSIntList intl);
tsapi bool TSIntListIsValid(TSIntList intl, int min, int max);
tsapi bool TSDomainListIsEmpty(TSDomainList domainl);
tsapi bool TSDomainListIsValid(TSDomainList domainl);
tsapi TSError TSRestart(bool cluster);
tsapi TSError TSBounce(bool cluster);
tsapi TSError TSStatsReset(bool cluster, const char *name = NULL);
tsapi TSError TSEventIsActive(char *event_name, bool * is_current);
{code}
and we have:
{code}
#if !defined(linux)
#if defined (__SUNPRO_CC) || (defined (__GNUC__) || ! defined(__cplusplus))
#if !defined (bool)
#if !defined(darwin) && !defined(freebsd) && !defined(solaris)
// XXX: What other platforms are there?
#define bool int
#endif
#endif
#if !defined (true)
#define true 1
#endif
#if !defined (false)
#define false 0
#endif
#endif
#endif // not linux
{code}
I'd like us to make this a typedef, or replace bool with int completely, to make these headers easier for SWIG tools etc. to parse. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TS-1521) Enable compression for binary log format
[ https://issues.apache.org/jira/browse/TS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919388#comment-13919388 ] Zhao Yongming commented on TS-1521: --- [~bettydreamit] submitted their gzipping patch for ASCII logging; please consider accepting this feature too. Enable compression for binary log format Key: TS-1521 URL: https://issues.apache.org/jira/browse/TS-1521 Project: Traffic Server Issue Type: New Feature Components: Logging Environment: RHEL 6+ Reporter: Lans Carstensen Assignee: Yunkai Zhang Fix For: 6.0.0 Attachments: logcompress.patch As noted in a discussion on #traffic-server, gzip can result in 90%+ compression on the binary access logs. By adding a reasonable streaming compression algorithm to the binary format, you could significantly reduce logging-related IOPS. -- This message was sent by Atlassian JIRA (v6.2#6252)
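The streaming compression idea requested here can be prototyped with a stock zlib-style streaming compressor. This is an illustrative sketch, not the actual logcompress.patch: each record is fed through one long-lived compression stream, and a sync flush after every record keeps the on-disk file decodable at any point without closing the stream.

```python
import zlib

# Sketch: append-style streaming compression of binary log records.
# Z_SYNC_FLUSH after each record means the bytes written so far always
# form a decodable prefix, which matters for an always-appending log.

class StreamingLogCompressor:
    def __init__(self, level=6):
        self._z = zlib.compressobj(level)

    def write_record(self, record_bytes):
        """Compress one record; returns bytes to append to the log file."""
        out = self._z.compress(record_bytes)
        out += self._z.flush(zlib.Z_SYNC_FLUSH)
        return out

    def close(self):
        """Finish the stream; append the returned bytes before closing."""
        return self._z.flush(zlib.Z_FINISH)
```

The sync-flush trade-off is slightly worse compression in exchange for crash safety; on repetitive access-log data the ratio still approaches the 90%+ figure cited in the issue.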
[jira] [Commented] (TS-727) Do we need support for streams partitions?
[ https://issues.apache.org/jira/browse/TS-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13914204#comment-13914204 ] Zhao Yongming commented on TS-727: -- I think that removing the streams partitions will result in completely removing the MIXT cache. Someone is working on an RTMP-like streaming cache for ATS; I'd like to talk to them before we nuke it. IMO, the 'stream' cache is much more efficient than HTTP if you would like to use ATS for live streaming broadcasting. Do we need support for streams partitions? Key: TS-727 URL: https://issues.apache.org/jira/browse/TS-727 Project: Traffic Server Issue Type: Improvement Components: Cache Reporter: Leif Hedstrom Assignee: Alan M. Carroll Fix For: 5.0.0 There's code in the cache related to MIXT streams volumes (caches). Since we don't support streams, I'm thinking this code could be removed? Or alternatively, we should expose APIs so that someone writing a plugin who wishes to store a different protocol (e.g. QT) can register this media type with the API and core. The idea being that the core only contains protocols that are in the core, but exposes the cache core so that plugins can take advantage of it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (TS-2184) Fetch from cluster with proxy.config.http.cache.cluster_cache_local enabled
[ https://issues.apache.org/jira/browse/TS-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911428#comment-13911428 ] Zhao Yongming commented on TS-2184: --- When the Cluster was designed, the original goal was to keep ONLY one single valid copy of each object in the cluster, which is a good idea when you have a very large volume of content, and we have stayed on this target: even when some of the machines are flapping in the cluster, at any time there is only one valid copy of an object in the cluster. By contrast, the ICP protocol and similar ones may keep multiple copies of the same content in the ICP cluster, and if you make it complex, the cluster may hold multiple versions of a piece of content at times. So ICP-like protocols are considered not so cool (safe) if you need to enforce consistency of the content you serve to the user agents. Back to this requirement: we could make the cluster act like ICP, first writing to the cluster hashing machine, and letting the second or later reader pull that content from the cluster and write it to the local cache, but that introduces a consistency problem: you don't know which machines hold the content locally when it is updated on the origin side within its freshness window. In most cases, every write to the cache would have to be broadcast to all the machines in the cluster to enforce the change. proxy.config.http.cache.cluster_cache_local is a directive that disables cluster hashing in cluster mode; our original target was to use it to keep some very hot hostnames (or URLs) local, to reduce the intra-cluster traffic. proxy.config.http.cache.cluster_cache_local is overridable, and we have the same directive in cache.config too. When it is active, the Cluster can be considered mode=3, the single-host mode.
So, if we want to achieve an ICP-like feature in the cluster, we mostly should: 1. write the content to the hashing machine if it is a miss in the cluster; 2. read from the cluster if it is missing on the local machine; 3. write to the local cache if it is a hit in the cluster; 4. broadcast the change to all the machines in the cluster if it is an overwrite (i.e. revalidating, etc.); 5. purge on the hashing machine and broadcast the purge to all the machines in the cluster. It would be a very big change in the Cluster and HTTP transaction code. cc [~zwoop] Fetch from cluster with proxy.config.http.cache.cluster_cache_local enabled --- Key: TS-2184 URL: https://issues.apache.org/jira/browse/TS-2184 Project: Traffic Server Issue Type: Improvement Components: Cache, Clustering Reporter: Scott Harris Assignee: Bin Chen Fix For: 6.0.0 With proxy.config.http.cache.cluster_cache_local enabled I would like cluster nodes to store content locally but try to retrieve content from the cluster first (if not cached locally) and if no cluster nodes have content cached then retrieve from origin. Example - 2 Cluster nodes in Full cluster mode. 1. Node1 and Node2 are both empty. 2. Request to Node1 for http://www.example.com/foo.html. 3. Query Cluster for object 4. Not cached in cluster so retrieve from origin, serve to client, object now cached on Node1. 5. Request comes to Node2 for http://www.example.com/foo.html. 6. Node2 retrieves cached version from Node1, serves to client, stores locally. 7. Subsequent request comes to Node1 or Node2 for http://www.example.com/foo.html, object is served to client from local cache. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (TS-2184) Fetch from cluster with proxy.config.http.cache.cluster_cache_local enabled
[ https://issues.apache.org/jira/browse/TS-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911434#comment-13911434 ] Zhao Yongming commented on TS-2184: --- For the very HOT content in the cluster, we have another solution: track the hot content (in traffic view) and put it in the cluster_cache_local list dynamically. This solution needs a workaround for purging: you have to broadcast every purge to all the machines in the cluster. Pulling from the hashing machine in the cluster is not implemented yet either; we are testing to see how well it will work. This function is provided by [~happy_fish100]. FYI Fetch from cluster with proxy.config.http.cache.cluster_cache_local enabled --- Key: TS-2184 URL: https://issues.apache.org/jira/browse/TS-2184 Project: Traffic Server Issue Type: Improvement Components: Cache, Clustering Reporter: Scott Harris Assignee: Bin Chen Fix For: 6.0.0 With proxy.config.http.cache.cluster_cache_local enabled I would like cluster nodes to store content locally but try to retrieve content from the cluster first (if not cached locally) and if no cluster nodes have content cached then retrieve from origin. Example - 2 Cluster nodes in Full cluster mode. 1. Node1 and Node2 are both empty. 2. Request to Node1 for http://www.example.com/foo.html. 3. Query Cluster for object 4. Not cached in cluster so retrieve from origin, serve to client, object now cached on Node1. 5. Request comes to Node2 for http://www.example.com/foo.html. 6. Node2 retrieves cached version from Node1, serves to client, stores locally. 7. Subsequent request comes to Node1 or Node2 for http://www.example.com/foo.html, object is served to client from local cache. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (TS-2531) The default remap rule doesn't match a forward proxy request
[ https://issues.apache.org/jira/browse/TS-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895473#comment-13895473 ] Zhao Yongming commented on TS-2531: --- It seems you want a default host rule in a forward proxy setup; what is this config for? It is a bit weird. The default remap rule doesn't match a forward proxy request Key: TS-2531 URL: https://issues.apache.org/jira/browse/TS-2531 Project: Traffic Server Issue Type: Bug Components: HTTP Reporter: Bryan Call Attachments: 0001-fix-bug-TS_2531.patch When doing a forward proxy request it won't match the default rule, but will match other rules that specify the hostname. Example request: GET http://foo.yahoo.com HTTP/1.1 Host: foo.yahoo.com remap.config: map / http://www.yahoo.com Response: HTTP/1.1 404 Not Found ... However, this works: remap.config: map http://foo.yahoo.com http://www.yahoo.com -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (TS-2561) remove app-template from examples
Zhao Yongming created TS-2561: - Summary: remove app-template from examples Key: TS-2561 URL: https://issues.apache.org/jira/browse/TS-2561 Project: Traffic Server Issue Type: Bug Components: Cleanup Reporter: Zhao Yongming Since the STANDALONE IOCORE has been removed, the app-template example should not be there anymore, and most of what the app-template STANDALONE IOCORE design was meant for can be satisfied with a protocol plugin. Let us remove it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (TS-2561) remove app-template from examples
[ https://issues.apache.org/jira/browse/TS-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhao Yongming updated TS-2561: -- Affects Version/s: 5.0.0 Fix Version/s: 5.0.0 Assignee: Zhao Yongming remove app-template from examples - Key: TS-2561 URL: https://issues.apache.org/jira/browse/TS-2561 Project: Traffic Server Issue Type: Bug Components: Cleanup Affects Versions: 5.0.0 Reporter: Zhao Yongming Assignee: Zhao Yongming Fix For: 5.0.0 Since the STANDALONE IOCORE has been removed, the app-template example should not be there anymore, and most of what the app-template STANDALONE IOCORE design was meant for can be satisfied with a protocol plugin. Let us remove it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (TS-2019) find out what is the problem of reporting OpenReadHead failed on vector inconsistency
[ https://issues.apache.org/jira/browse/TS-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895487#comment-13895487 ] Zhao Yongming commented on TS-2019: --- [~weijin] should check this issue find out what is the problem of reporting OpenReadHead failed on vector inconsistency - Key: TS-2019 URL: https://issues.apache.org/jira/browse/TS-2019 Project: Traffic Server Issue Type: Task Components: Cache Reporter: Zhao Yongming Assignee: Alan M. Carroll Priority: Critical Fix For: 5.0.0 {code} [Jul 10 19:40:33.170] Server {0x2aaf1680} NOTE: OpenReadHead failed for cachekey 44B5C68B : vector inconsistency with 4624 [Jul 10 19:40:33.293] Server {0x2aaf1680} NOTE: OpenReadHead failed for cachekey 2ABA746F : vector inconsistency with 4632 [Jul 10 19:40:33.368] Server {0x2aaf1680} NOTE: OpenReadHead failed for cachekey 389594A0 : vector inconsistency with 4632 [Jul 10 19:40:33.399] Server {0x2aaf1680} NOTE: OpenReadHead failed for cachekey FBC601A3 : vector inconsistency with 4632 [Jul 10 19:40:33.506] Server {0x2aaf1680} NOTE: OpenReadHead failed for cachekey 1F39AD5F : vector inconsistency with 4632 [Jul 10 19:40:33.602] Server {0x2aaf1680} NOTE: OpenReadHead failed for cachekey ABFC6D97 : vector inconsistency with 4632 [Jul 10 19:40:33.687] Server {0x2aaf1680} NOTE: OpenReadHead failed for cachekey 2420ABBF : vector inconsistency with 4632 [Jul 10 19:40:33.753] Server {0x2aaf1680} NOTE: OpenReadHead failed for cachekey 5DD061C8 : vector inconsistency with 4632 {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (TS-2527) mgmtapi.h should be C style
Zhao Yongming created TS-2527: - Summary: mgmtapi.h should be C style Key: TS-2527 URL: https://issues.apache.org/jira/browse/TS-2527 Project: Traffic Server Issue Type: Bug Components: Management API Reporter: Zhao Yongming {code} /*--- statistics operations ---*/ /* TSStatsReset: sets all the statistics variables to their default values * Input: cluster - Reset the stats clusterwide or not * Output: TSError */ tsapi TSError TSStatsReset(bool cluster, const char *name = NULL); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (TS-2528) better bool handling in mgmtapi.h
Zhao Yongming created TS-2528: - Summary: better bool handling in mgmtapi.h Key: TS-2528 URL: https://issues.apache.org/jira/browse/TS-2528 Project: Traffic Server Issue Type: Bug Components: Management API Reporter: Zhao Yongming {code} tsapi bool TSListIsEmpty(TSList l); tsapi bool TSListIsValid(TSList l); tsapi bool TSIpAddrListIsEmpty(TSIpAddrList ip_addrl); tsapi bool TSIpAddrListIsValid(TSIpAddrList ip_addrl); tsapi bool TSPortListIsEmpty(TSPortList portl); tsapi bool TSPortListIsValid(TSPortList portl); tsapi bool TSStringListIsEmpty(TSStringList strl); tsapi bool TSStringListIsValid(TSStringList strl); tsapi bool TSIntListIsEmpty(TSIntList intl); tsapi bool TSIntListIsValid(TSIntList intl, int min, int max); tsapi bool TSDomainListIsEmpty(TSDomainList domainl); tsapi bool TSDomainListIsValid(TSDomainList domainl); tsapi TSError TSRestart(bool cluster); tsapi TSError TSBounce(bool cluster); tsapi TSError TSStatsReset(bool cluster, const char *name = NULL); tsapi TSError TSEventIsActive(char *event_name, bool * is_current); {code} and we have: {code} #if !defined(linux) #if defined (__SUNPRO_CC) || (defined (__GNUC__) || ! defined(__cplusplus)) #if !defined (bool) #if !defined(darwin) && !defined(freebsd) && !defined(solaris) // XXX: What other platforms are there? #define bool int #endif #endif #if !defined (true) #define true 1 #endif #if !defined (false) #define false 0 #endif #endif #endif // not linux {code} I'd like us to make it a typedef, or replace bool with int completely, so these headers are easier to parse with SWIG and similar tools. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (TS-2493) API: introducing UDP API
Zhao Yongming created TS-2493: - Summary: API: introducing UDP API Key: TS-2493 URL: https://issues.apache.org/jira/browse/TS-2493 Project: Traffic Server Issue Type: Improvement Reporter: Zhao Yongming When doing UDP tasks in plugins, there is no UDP API available to use; we need to introduce those APIs. Task for [~xinyuziran] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (TS-2493) API: introducing UDP API
[ https://issues.apache.org/jira/browse/TS-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhao Yongming updated TS-2493: -- Component/s: TS API Affects Version/s: 4.1.2 Fix Version/s: 4.2.0 Assignee: Zhao Yongming Labels: UDP (was: ) API: introducing UDP API Key: TS-2493 URL: https://issues.apache.org/jira/browse/TS-2493 Project: Traffic Server Issue Type: Improvement Components: TS API Affects Versions: 4.1.2 Reporter: Zhao Yongming Assignee: Zhao Yongming Labels: UDP Fix For: 4.2.0 When doing UDP tasks in plugins, there is no UDP API available to use; we need to introduce those APIs. Task for [~xinyuziran] -- This message was sent by Atlassian JIRA (v6.1.5#6160)