[jira] [Created] (TS-4053) Add hit rate and memory usage regressions for RAM cache.
John Plevyak created TS-4053: Summary: Add hit rate and memory usage regressions for RAM cache. Key: TS-4053 URL: https://issues.apache.org/jira/browse/TS-4053 Project: Traffic Server Issue Type: Improvement Components: Cache Reporter: John Plevyak It would be nice to have hit rate and memory usage regression tests for the RAM cache. In particular comparing LRU and CLFUS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
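A hit-rate regression of the kind the issue asks for can be sketched with a toy cache and a deterministic skewed request stream. This is illustrative only: the `lru_get`, `next_key`, and `hit_rate` helpers are hypothetical stand-ins, not the ATS `RamCacheLRU`/`RamCacheCLFUS` interfaces; a real regression would drive those implementations with the same trace and compare the resulting hit rates and byte accounting.

```c
#include <string.h>

/* Toy move-to-front LRU standing in for the RAM cache under test. */
#define CAP 64
static int lru[CAP];
static int lru_n = 0;

/* Returns 1 on hit (and promotes the key), 0 on miss (and inserts it). */
static int lru_get(int key) {
  for (int i = 0; i < lru_n; i++) {
    if (lru[i] == key) {
      memmove(lru + 1, lru, i * sizeof(int)); /* shift more-recent entries down */
      lru[0] = key;
      return 1;
    }
  }
  if (lru_n < CAP)
    lru_n++;                                  /* grow until full, then evict the tail */
  memmove(lru + 1, lru, (lru_n - 1) * sizeof(int));
  lru[0] = key;
  return 0;
}

/* Deterministic skewed key stream: squaring a uniform variate favors small keys. */
static unsigned seed = 12345u;
static int next_key(void) {
  seed = seed * 1103515245u + 12345u;         /* simple LCG, reproducible across runs */
  double u = (seed >> 8) / (double)(1u << 24);
  return (int)(u * u * 1000.0);               /* keys 0..999, skewed toward 0 */
}

static double hit_rate(int nreq) {
  int hits = 0;
  for (int i = 0; i < nreq; i++)
    hits += lru_get(next_key());
  return (double)hits / nreq;
}
```

Because the trace is seeded, the measured hit rate is reproducible, which is what makes it usable as a regression threshold rather than a benchmark.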
[jira] [Assigned] (TS-4053) Add hit rate and memory usage regressions for RAM cache.
[ https://issues.apache.org/jira/browse/TS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak reassigned TS-4053: Assignee: John Plevyak > Add hit rate and memory usage regressions for RAM cache. > > > Key: TS-4053 > URL: https://issues.apache.org/jira/browse/TS-4053 > Project: Traffic Server > Issue Type: Improvement > Components: Cache >Reporter: John Plevyak >Assignee: John Plevyak > > It would be nice to have hit rate and memory usage regression tests for the > RAM cache. In particular comparing LRU and CLFUS.
[jira] [Updated] (TS-4053) Add hit rate and memory usage regressions for RAM cache, tune CLFUS.
[ https://issues.apache.org/jira/browse/TS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak updated TS-4053: - Description: It would be nice to have hit rate and memory usage regression tests for the RAM cache. In particular comparing LRU and CLFUS. Once we have this we can tune the CLFUS implementation. (was: It would be nice to have hit rate and memory usage regression tests for the RAM cache. In particular comparing LRU and CLFUS.) > Add hit rate and memory usage regressions for RAM cache, tune CLFUS. > > > Key: TS-4053 > URL: https://issues.apache.org/jira/browse/TS-4053 > Project: Traffic Server > Issue Type: Improvement > Components: Cache >Reporter: John Plevyak >Assignee: John Plevyak >Priority: Minor > > It would be nice to have hit rate and memory usage regression tests for the > RAM cache. In particular comparing LRU and CLFUS. Once we have this we can > tune the CLFUS implementation.
[jira] [Updated] (TS-4053) Add hit rate and memory usage regressions for RAM cache, tune CLFUS.
[ https://issues.apache.org/jira/browse/TS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak updated TS-4053: - Priority: Minor (was: Major) Summary: Add hit rate and memory usage regressions for RAM cache, tune CLFUS. (was: Add hit rate and memory usage regressions for RAM cache.) > Add hit rate and memory usage regressions for RAM cache, tune CLFUS. > > > Key: TS-4053 > URL: https://issues.apache.org/jira/browse/TS-4053 > Project: Traffic Server > Issue Type: Improvement > Components: Cache >Reporter: John Plevyak >Assignee: John Plevyak >Priority: Minor > > It would be nice to have hit rate and memory usage regression tests for the > RAM cache. In particular comparing LRU and CLFUS.
[jira] [Assigned] (TS-3786) Use a consensus algorithm to elect the cluster master
[ https://issues.apache.org/jira/browse/TS-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak reassigned TS-3786: Assignee: John Plevyak Use a consensus algorithm to elect the cluster master - Key: TS-3786 URL: https://issues.apache.org/jira/browse/TS-3786 Project: Traffic Server Issue Type: Improvement Components: Manager Reporter: John Plevyak Assignee: John Plevyak We should use a consensus algorithm to elect the cluster master and to update the configurations so that there is no single point of failure and machines entering or restarting can be brought to a consistent state.
[jira] [Created] (TS-3786) Use a consensus algorithm to elect the cluster master
John Plevyak created TS-3786: Summary: Use a consensus algorithm to elect the cluster master Key: TS-3786 URL: https://issues.apache.org/jira/browse/TS-3786 Project: Traffic Server Issue Type: Improvement Components: Manager Reporter: John Plevyak We should use a consensus algorithm to elect the cluster master and to update the configurations so that there is no single point of failure and machines entering or restarting can be brought to a consistent state.
[jira] [Created] (TS-3508) use accept4 on linux systems where available to reduce system calls
John Plevyak created TS-3508: Summary: use accept4 on linux systems where available to reduce system calls Key: TS-3508 URL: https://issues.apache.org/jira/browse/TS-3508 Project: Traffic Server Issue Type: Improvement Components: Network Reporter: John Plevyak The accept4() syscall can set flags on the accepted socket.
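The saving is that accept4() applies SOCK_NONBLOCK and SOCK_CLOEXEC at accept time, replacing the accept() plus two fcntl() calls a proxy would otherwise issue per connection. A minimal self-contained demonstration over loopback (the `demo_accept4` helper is just for illustration, not ATS code):

```c
#define _GNU_SOURCE            /* accept4() is a GNU/Linux extension */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

/* Accept one loopback connection with accept4(), setting O_NONBLOCK and
 * FD_CLOEXEC in the same syscall. Returns 1 if the accepted fd came back
 * non-blocking, 0 if not, -1 on setup failure. */
static int demo_accept4(void) {
  int lfd = socket(AF_INET, SOCK_STREAM, 0);
  struct sockaddr_in a;
  memset(&a, 0, sizeof a);
  a.sin_family = AF_INET;
  a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  a.sin_port = 0;                                   /* kernel picks an ephemeral port */
  if (bind(lfd, (struct sockaddr *)&a, sizeof a) < 0) return -1;
  if (listen(lfd, 1) < 0) return -1;
  socklen_t len = sizeof a;
  getsockname(lfd, (struct sockaddr *)&a, &len);    /* learn the chosen port */

  int cfd = socket(AF_INET, SOCK_STREAM, 0);
  if (connect(cfd, (struct sockaddr *)&a, sizeof a) < 0) return -1;

  int sfd = accept4(lfd, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
  if (sfd < 0) return -1;
  int nb = (fcntl(sfd, F_GETFL) & O_NONBLOCK) ? 1 : 0;
  close(sfd); close(cfd); close(lfd);
  return nb;
}
```

Callers still need the fcntl() fallback path on systems without accept4(), which is presumably why the summary says "where available".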
[jira] [Commented] (TS-3401) AIO blocks under lock contention
[ https://issues.apache.org/jira/browse/TS-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334450#comment-14334450 ] John Plevyak commented on TS-3401: -- I generally agree, but it is true that aio_thread_main() uses ink_atomiclist_popall() to grab the entire atomic queue associated with an AIO_Req for a single file descriptor/disk. This means that a bunch of reads could be blocked behind the disk operation (as well as acquiring the mutex for write callbacks, but that is probably less important). We could switch to using ink_atomiclist_pop in aio_move, which would cause only a single op to be moved to the local queue. That said, we should probably reexamine using linux native AIO now that the eventfd code has landed. I think it will be more efficient, and with the new Linux multi-queue support for SSDs we can do millions of ops/sec, so we want to be able to load up that queue, and native AIO with eventfd looks like a good way to do it. We should also consider changing all the delay periods (e.g. AIO_PERIOD) to be 100 msec or more if we have eventfd, as we don't need to busy poll anything... we will be awoken if anything appears in a queue or on a file descriptor. AIO blocks under lock contention Key: TS-3401 URL: https://issues.apache.org/jira/browse/TS-3401 Project: Traffic Server Issue Type: Bug Components: Core Reporter: Brian Geffon Assignee: Brian Geffon Attachments: aio.patch In {{aio_thread_main()}} while trying to process AIO ops the AIO thread will wait on the mutex for the op, which obviously blocks other AIO ops from processing. We should use a try lock instead and reschedule the ops that we couldn't immediately process. Patch attached. Waiting for reviews.
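The popall-vs-pop trade-off in the comment above can be shown with a minimal lock-free intrusive stack. This is a single-consumer sketch, not the real `ink_atomiclist` API (which versions the head pointer to avoid ABA); `al_push`, `al_pop`, and `al_popall` are illustrative names.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node { struct node *next; int id; };
static _Atomic(struct node *) head = NULL;

static void al_push(struct node *n) {
  n->next = atomic_load(&head);
  while (!atomic_compare_exchange_weak(&head, &n->next, n))
    ;                                   /* retry until our node is the new head */
}

/* popall: detach the entire chain in one exchange, as aio_thread_main() does;
 * everything behind a slow op then waits for that op to finish. */
static struct node *al_popall(void) {
  return atomic_exchange(&head, NULL);
}

/* pop: detach only the first element, so later ops stay globally visible. */
static struct node *al_pop(void) {
  struct node *n = atomic_load(&head);
  while (n && !atomic_compare_exchange_weak(&head, &n, n->next))
    ;
  return n;
}

/* Push three nodes, pop one, then popall the remaining two. */
static int demo_counts(void) {
  static struct node n1, n2, n3;
  n1.id = 1; n2.id = 2; n3.id = 3;
  al_push(&n1); al_push(&n2); al_push(&n3);   /* head chain is n3 -> n2 -> n1 */
  struct node *one = al_pop();                /* takes n3, the most recent */
  int rest = 0;
  for (struct node *p = al_popall(); p; p = p->next)
    rest++;                                   /* the remaining two */
  return one->id * 10 + rest;
}
```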
[jira] [Updated] (TS-1264) LRU RAM cache not accounting for overhead
[ https://issues.apache.org/jira/browse/TS-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak updated TS-1264: - Attachment: ram_cache.patch LRU RAM cache not accounting for overhead - Key: TS-1264 URL: https://issues.apache.org/jira/browse/TS-1264 Project: Traffic Server Issue Type: Bug Components: Cache Affects Versions: 3.1.3 Reporter: John Plevyak Assignee: Leif Hedstrom Priority: Minor Fix For: 6.0.0 Attachments: ram_cache.patch The CLFUS RAM cache takes its overhead into account when determining how many bytes it is using. The LRU cache does not, which makes it hard to compare performance between the two and hard to correctly size the LRU RAM cache.
[jira] [Commented] (TS-3044) linux native AIO should use eventfd if available to signal thread
[ https://issues.apache.org/jira/browse/TS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154210#comment-14154210 ] John Plevyak commented on TS-3044: -- The perror is from the original AIO_MODE_NATIVE code. I was just following the style, as this was a minimal patch just to add the eventfd handling. I agree that we should change that to standard ATS errors. For most unix/linux installations waiting for less than 10 msec is the same as waiting for 0 msec, and can result in busy spinning. The iocore has a minimum wait time, so HRTIME_MSECONDS(4) is disingenuous as well as being a poor idea (if it actually was obeyed). linux native AIO should use eventfd if available to signal thread - Key: TS-3044 URL: https://issues.apache.org/jira/browse/TS-3044 Project: Traffic Server Issue Type: Improvement Components: Cache Reporter: John Plevyak Assignee: Phil Sorber Fix For: 5.2.0 Attachments: native-aio-eventfd.patch linux native AIO has the ability to signal the event thread to get off the poll and service the disk via the io_set_eventfd() call. linux native AIO scales better than the thread-based IO, but the current implementation can introduce delays on lightly loaded systems because the thread is waiting on epoll(). This can be remedied by using io_set_eventfd().
[jira] [Updated] (TS-3044) linux native AIO should use eventfd if available to signal thread
[ https://issues.apache.org/jira/browse/TS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak updated TS-3044: - Assignee: weijin linux native AIO should use eventfd if available to signal thread - Key: TS-3044 URL: https://issues.apache.org/jira/browse/TS-3044 Project: Traffic Server Issue Type: Improvement Components: Cache Reporter: John Plevyak Assignee: weijin linux native AIO has the ability to signal the event thread to get off the poll and service the disk via the io_set_eventfd() call. linux native AIO scales better than the thread-based IO, but the current implementation can introduce delays on lightly loaded systems because the thread is waiting on epoll(). This can be remedied by using io_set_eventfd().
[jira] [Created] (TS-3044) linux native AIO should use eventfd if available to signal thread
John Plevyak created TS-3044: Summary: linux native AIO should use eventfd if available to signal thread Key: TS-3044 URL: https://issues.apache.org/jira/browse/TS-3044 Project: Traffic Server Issue Type: Improvement Components: Cache Reporter: John Plevyak linux native AIO has the ability to signal the event thread to get off the poll and service the disk via the io_set_eventfd() call. linux native AIO scales better than the thread-based IO, but the current implementation can introduce delays on lightly loaded systems because the thread is waiting on epoll(). This can be remedied by using io_set_eventfd().
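The mechanism the issue describes can be sketched with raw syscalls instead of libaio: io_set_eventfd() simply sets IOCB_FLAG_RESFD and the aio_resfd field on the iocb, so each completion increments an eventfd that the epoll/event thread can wait on instead of polling io_getevents(). The `demo_aio_eventfd` helper and the temp-file name are illustrative only, and the file is buffered (no O_DIRECT) to keep the sketch portable.

```c
#define _GNU_SOURCE
#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <stdint.h>

/* Thin wrappers over the kernel AIO syscalls (no libaio dependency). */
static long sys_io_setup(unsigned n, aio_context_t *ctx) { return syscall(SYS_io_setup, n, ctx); }
static long sys_io_submit(aio_context_t ctx, long n, struct iocb **ios) { return syscall(SYS_io_submit, ctx, n, ios); }
static long sys_io_destroy(aio_context_t ctx) { return syscall(SYS_io_destroy, ctx); }

/* Submit one write tagged with an eventfd; return the eventfd counter
 * after the completion lands (expected: 1 completion). */
static long demo_aio_eventfd(void) {
  aio_context_t ctx = 0;
  if (sys_io_setup(8, &ctx) < 0) return -1;
  int efd = eventfd(0, 0);
  int fd = open("aio_eventfd_demo.tmp", O_CREAT | O_WRONLY | O_TRUNC, 0600);
  if (fd < 0) return -1;

  static char buf[512] = "hello";
  struct iocb cb;
  memset(&cb, 0, sizeof cb);
  cb.aio_lio_opcode = IOCB_CMD_PWRITE;
  cb.aio_fildes    = (uint32_t)fd;
  cb.aio_buf       = (uint64_t)(uintptr_t)buf;
  cb.aio_nbytes    = sizeof buf;
  cb.aio_flags     = IOCB_FLAG_RESFD;          /* io_set_eventfd() sets these two fields */
  cb.aio_resfd     = (uint32_t)efd;
  struct iocb *list[1] = { &cb };
  if (sys_io_submit(ctx, 1, list) != 1) return -1;

  uint64_t count = 0;                           /* blocks until a completion bumps the eventfd */
  if (read(efd, &count, sizeof count) != (ssize_t)sizeof count) return -1;
  close(fd); close(efd); sys_io_destroy(ctx);
  unlink("aio_eventfd_demo.tmp");
  return (long)count;
}
```

In ATS the eventfd would be registered with the event thread's epoll set, so the thread sleeps normally and is woken only when disk completions actually arrive.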
[jira] [Updated] (TS-3044) linux native AIO should use eventfd if available to signal thread
[ https://issues.apache.org/jira/browse/TS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak updated TS-3044: - Attachment: native-aio-eventfd.patch linux native AIO should use eventfd if available to signal thread - Key: TS-3044 URL: https://issues.apache.org/jira/browse/TS-3044 Project: Traffic Server Issue Type: Improvement Components: Cache Reporter: John Plevyak Assignee: weijin Attachments: native-aio-eventfd.patch linux native AIO has the ability to signal the event thread to get off the poll and service the disk via the io_set_eventfd() call. linux native AIO scales better than the thread-based IO, but the current implementation can introduce delays on lightly loaded systems because the thread is waiting on epoll(). This can be remedied by using io_set_eventfd().
[jira] [Commented] (TS-3044) linux native AIO should use eventfd if available to signal thread
[ https://issues.apache.org/jira/browse/TS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110181#comment-14110181 ] John Plevyak commented on TS-3044: -- Assigned to weijin for review as he was in charge of the native linux AIO and can assess the impact. As I remember, this isn't enabled by default because of latency concerns raised by Leif. With this patch, if the latency concerns are addressed, we might want to enable this feature by default. linux native AIO should use eventfd if available to signal thread - Key: TS-3044 URL: https://issues.apache.org/jira/browse/TS-3044 Project: Traffic Server Issue Type: Improvement Components: Cache Reporter: John Plevyak Assignee: weijin Attachments: native-aio-eventfd.patch linux native AIO has the ability to signal the event thread to get off the poll and service the disk via the io_set_eventfd() call. linux native AIO scales better than the thread-based IO, but the current implementation can introduce delays on lightly loaded systems because the thread is waiting on epoll(). This can be remedied by using io_set_eventfd().
[jira] [Commented] (TS-2193) Trafficserver 4.1 Crash with proxy.config.dns.dedicated_thread = 1
[ https://issues.apache.org/jira/browse/TS-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763219#comment-13763219 ] John Plevyak commented on TS-2193: -- I am concerned about the proxy.config.dns.dedicated_thread option. It is testing a configuration where event threads are not ET_NET. While originally the design accounted for that possibility, all event threads became ET_NET soon thereafter, and I am worried that there are implicit assumptions that this is the case (as seems to be true of the session manager). Is this really necessary? DNS processing should be cheap. Several fixes come to mind:
1) Make the SessionManager not depend on being called on an ET_NET thread (this should probably be done in any case). It could simply shift to any ET_NET thread if it was called from another.
2) Make the DNS processor call back on an ET_NET thread (this is stupid since there is no good reason for it to assume the caller has such a restriction, and indeed what about the other ET_* types?).
3) Make the DNS processor run across threads by hashing hosts to all ET_NET threads. This will fix both the issue we are seeing as well as spread the load.
We should probably do both 1 and 3. There will be a temptation to do 2) because it will be the easy fix, but I think it is the wrong way out. Trafficserver 4.1 Crash with proxy.config.dns.dedicated_thread = 1 -- Key: TS-2193 URL: https://issues.apache.org/jira/browse/TS-2193 Project: Traffic Server Issue Type: Bug Components: DNS Affects Versions: 4.1.0 Reporter: Tommy Lee Fix For: 4.1.0 Attachments: bt-01.txt Hi all, I've tried to enable DNS Thread without luck. When I set proxy.config.dns.dedicated_thread to 1, it crashes with the information below. The ATS is working in Forward Proxy mode. Thanks in advance. 
-- traffic.out NOTE: Traffic Server received Sig 11: Segmentation fault /usr/local/cache-4.1/bin/traffic_server - STACK TRACE: /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x2af714875cb0] /usr/local/cache-4.1/bin/traffic_server(_Z16_acquire_sessionP13SessionBucketPK8sockaddrR7INK_MD5P6HttpSM+0x52)[0x51dac2] /usr/local/cache-4.1/bin/traffic_server(_ZN18HttpSessionManager15acquire_sessionEP12ContinuationPK8sockaddrPKcP17HttpClientSessionP6HttpSM+0x3d1)[0x51e0f1] /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM19do_http_server_openEb+0x30c)[0x53644c] /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x6a0)[0x537560] /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x57e)[0x53743e] /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x57e)[0x53743e] /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM27state_hostdb_reverse_lookupEiPv+0xb9)[0x526b99] /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0xd8)[0x531be8] /usr/local/cache-4.1/bin/traffic_server[0x5d7c8a] /usr/local/cache-4.1/bin/traffic_server(_ZN18HostDBContinuation8dnsEventEiP7HostEnt+0x821)[0x5decd1] /usr/local/cache-4.1/bin/traffic_server(_ZN8DNSEntry9postEventEiP5Event+0x44)[0x5f7a94] /usr/local/cache-4.1/bin/traffic_server[0x5fd382] /usr/local/cache-4.1/bin/traffic_server(_ZN10DNSHandler8recv_dnsEiP5Event+0x852)[0x5fee72] /usr/local/cache-4.1/bin/traffic_server(_ZN10DNSHandler9mainEventEiP5Event+0x14)[0x5ffd94] /usr/local/cache-4.1/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x91)[0x6b2a41] /usr/local/cache-4.1/bin/traffic_server(_ZN7EThread7executeEv+0x514)[0x6b3534] /usr/local/cache-4.1/bin/traffic_server[0x6b17ea] /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x2af71486de9a] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x2af71558dccd] -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
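Option 3 from the comment above amounts to a deterministic host-to-thread mapping: hash the host name, take it modulo the ET_NET thread count, and all DNS work for a host lands on the same thread while different hosts spread across threads. A minimal sketch; `pick_net_thread` is a hypothetical helper (not an ATS API), and FNV-1a is an arbitrary hash choice:

```c
#include <string.h>

/* FNV-1a string hash: cheap, well-distributed for short host names. */
static unsigned fnv1a(const char *s) {
  unsigned h = 2166136261u;
  for (; *s; s++) {
    h ^= (unsigned char)*s;
    h *= 16777619u;
  }
  return h;
}

/* Map a host deterministically to one of n ET_NET threads. */
static unsigned pick_net_thread(const char *host, unsigned n_net_threads) {
  return fnv1a(host) % n_net_threads;
}
```

Determinism is the point: because the mapping is stable, lookups and their callbacks for a given host always share a thread, avoiding the cross-thread lock contention the crash exposes.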
[jira] [Commented] (TS-2193) Trafficserver 4.1 Crash with proxy.config.dns.dedicated_thread = 1
[ https://issues.apache.org/jira/browse/TS-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763225#comment-13763225 ] John Plevyak commented on TS-2193: -- This is listed as an experimental performance feature. Personally, I would like to see some numbers before committing resources; otherwise I would say that the experiment was a failure. Trafficserver 4.1 Crash with proxy.config.dns.dedicated_thread = 1 -- Key: TS-2193 URL: https://issues.apache.org/jira/browse/TS-2193 Project: Traffic Server Issue Type: Bug Components: DNS Affects Versions: 4.1.0 Reporter: Tommy Lee Fix For: 4.1.0 Attachments: bt-01.txt Hi all, I've tried to enable DNS Thread without luck. When I set proxy.config.dns.dedicated_thread to 1, it crashes with the information below. The ATS is working in Forward Proxy mode. Thanks in advance.
[jira] [Commented] (TS-2193) Trafficserver 4.1 Crash with proxy.config.dns.dedicated_thread = 1
[ https://issues.apache.org/jira/browse/TS-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762366#comment-13762366 ] John Plevyak commented on TS-2193: -- The code in dns_result will check to see that we only call back on the same thread that initiated the DNS lookup:
{code}
if (h->mutex->thread_holding == e->submit_thread) {
  MUTEX_TRY_LOCK(lock, e->action.mutex, h->mutex->thread_holding);
  if (!lock) {
    Debug("dns", "failed lock for result %s", e->qname);
    goto Lretry;
  }
  for (int i = 0; i < MAX_DNS_RETRIES; i++) {
    if (e->id[i] < 0)
      break;
    h->release_query_id(e->id[i]);
  }
  e->postEvent(0, 0);
} else {
  for (int i = 0; i < MAX_DNS_RETRIES; i++) {
    if (e->id[i] < 0)
      break;
    h->release_query_id(e->id[i]);
  }
  e->mutex = e->action.mutex;
  SET_CONTINUATION_HANDLER(e, DNSEntry::postEvent);
  e->submit_thread->schedule_imm_signal(e);
}
{code}
There are calls which will schedule on *ANY* event thread (e.g. eventProcessor.schedule_XX). These could schedule (e.g. a timeout or other event) on the ET_DNS thread, which perhaps isn't initialized for all the processors (e.g. sessions). At one point I removed all calls to the non-specific thread schedule calls, but it is possible some still remain. Trafficserver 4.1 Crash with proxy.config.dns.dedicated_thread = 1 -- Key: TS-2193 URL: https://issues.apache.org/jira/browse/TS-2193 Project: Traffic Server Issue Type: Bug Components: DNS Affects Versions: 4.1.0 Reporter: Tommy Lee Fix For: 4.1.0 Hi all, I've tried to enable DNS Thread without luck. When I set proxy.config.dns.dedicated_thread to 1, it crashes with the information below. The ATS is working in Forward Proxy mode. Thanks in advance.
[jira] [Commented] (TS-947) AIO Race condition on non NT systems
[ https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758339#comment-13758339 ] John Plevyak commented on TS-947: - Yes, this has been fixed. john AIO Race condition on non NT systems Key: TS-947 URL: https://issues.apache.org/jira/browse/TS-947 Project: Traffic Server Issue Type: Bug Components: Core Environment: stock build with static libts, running on a 4 core server Reporter: B Wyatt Assignee: John Plevyak Fix For: 4.2.0 Attachments: lock-safe-AIO.patch, timed-wait-AIO.patch Refer to the code below. The timeslice starting when a consumer thread determines that the temp_list is empty (A) and ending when it releases the aio_mutex (C) is unsafe if the work queues are empty and it breaks loop execution at B. During this timeslice (A-C) the consumer holds the aio_mutex, and as a result request producers enqueue items on the temporary atomic list (D). As a consumer in this state will wait for a signal on aio_cond to proceed before processing the temp_list again, any requests on the temp_list are effectively stalled until a future request produces this signal or manually processes the temp_list. In the case of cache volume initialization, there is no future request and the initialization sequence soft locks.
{code:title=iocore/aio/AIO.cc (annotated)}
void *
aio_thread_main(void *arg)
{
  ...
  ink_mutex_acquire(&my_aio_req->aio_mutex);
  for (;;) {
    do {
      current_req = my_aio_req;
      /* A: check if any pending requests on the atomic list */
      if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
        aio_move(my_aio_req);
      if (!(op = my_aio_req->aio_todo.pop()) && !(op = my_aio_req->http_aio_todo.pop()))
        break; /* B */
      ... service request ...
    } while (1);
    ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex); /* C */
  }
  ...
}

static void
aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
{
  ...
  if (!ink_mutex_try_acquire(&req->aio_mutex)) {
    ink_atomiclist_push(&req->aio_temp_list, op); /* D */
  } else {
    /* check if any pending requests on the atomic list */
    if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
      aio_move(req);
    /* now put the new request */
    aio_insert(op, req);
    ink_cond_signal(&req->aio_cond);
    ink_mutex_release(&req->aio_mutex);
  }
  ...
}
{code}
[jira] [Commented] (TS-1648) Segmentation fault in dir_clear_range()
[ https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669893#comment-13669893 ] John Plevyak commented on TS-1648: -- Rather than long we should be using int64, as long is not well defined (it is platform dependent). Are those 10TB RAIDs? If so, you are better off using them as JBOD, since ATS assumes that there is a single disk arm (or equal fraction) for each disk in storage.config. Because of the size of your disk it is possible that you have more than 2^31 directory entries, which would account for the overflow. Also, given the size, the clear may take a long time. Your trace is not long enough for me to see if it repeats. However, if it does repeat, it is possible that it is because dir_in_bucket also takes an int, which is then multiplied to get a directory number. The other possibility is (of course) that you have memory corruption: the directory is the single largest memory user, and it contains a linked list which can be circularized by corruption, but let's concentrate on the other issues first. I would suggest that we change all the bucket/entry/etc offsets to int64 (I can build a patch, but I would appreciate a review). Second, I would suggest (after testing to ensure that the patch fixes your problem) that you move to JBOD rather than RAID-0, or to having multiple NAS volumes which correspond approximately to the number of underlying disks, since ATS will only have one outstanding write (although multiple reads) for each disk in storage.config. Segmentation fault in dir_clear_range() --- Key: TS-1648 URL: https://issues.apache.org/jira/browse/TS-1648 Project: Traffic Server Issue Type: Bug Components: Cache Affects Versions: 3.3.0, 3.2.0 Environment: reverse proxy Reporter: Tomasz Kuzemko Assignee: weijin Labels: A Fix For: 3.3.3 Attachments: 0001-Fix-for-TS-1648-Segmentation-fault-in-dir_clear_rang.patch I use ATS as a reverse proxy. 
I have a fairly large disk cache consisting of 2x 10TB raw disks. I do not use cache compression. After a few days of running (this is a dev machine - not handling any traffic) ATS begins to crash with a segfault shortly after start: [Jan 11 16:11:00.690] Server {0x72bb8700} DEBUG: (rusage) took rusage snap 1357917060690487000 Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x720ad700 (LWP 17292)] 0x00696a71 in dir_clear_range (start=640, end=17024, vol=0x16057d0) at CacheDir.cc:382 382 CacheDir.cc: No such file or directory. in CacheDir.cc (gdb) p i $1 = 214748365 (gdb) l 377 in CacheDir.cc (gdb) p dir_index(vol, i) $2 = (Dir *) 0x7ff997a04002 (gdb) p dir_index(vol, i-1) $3 = (Dir *) 0x7ffa97a03ff8 (gdb) p *dir_index(vol, i-1) $4 = {w = {0, 0, 0, 0, 0}} (gdb) p *dir_index(vol, i-2) $5 = {w = {0, 0, 52431, 52423, 0}} (gdb) p *dir_index(vol, i) Cannot access memory at address 0x7ff997a04002 (gdb) p *dir_index(vol, i+2) Cannot access memory at address 0x7ff997a04016 (gdb) p *dir_index(vol, i+1) Cannot access memory at address 0x7ff997a0400c (gdb) p vol->buckets * DIR_DEPTH * vol->segments $6 = 1246953472 (gdb) bt #0 0x00696a71 in dir_clear_range (start=640, end=17024, vol=0x16057d0) at CacheDir.cc:382 #1 0x0068aba2 in Vol::handle_recover_from_data (this=0x16057d0, event=3900, data=0x16058a0) at Cache.cc:1384 #2 0x004e8e1c in Continuation::handleEvent (this=0x16057d0, event=3900, data=0x16058a0) at ../iocore/eventsystem/I_Continuation.h:146 #3 0x00692385 in AIOCallbackInternal::io_complete (this=0x16058a0, event=1, data=0x135afc0) at ../../iocore/aio/P_AIO.h:80 #4 0x004e8e1c in Continuation::handleEvent (this=0x16058a0, event=1, data=0x135afc0) at ../iocore/eventsystem/I_Continuation.h:146 #5 0x00700fec in EThread::process_event (this=0x736c4010, e=0x135afc0, calling_code=1) at UnixEThread.cc:142 #6 0x007011ff in EThread::execute (this=0x736c4010) at UnixEThread.cc:191 #7 0x006ff8c2 in spawn_thread_internal (a=0x1356040) at Thread.cc:88 #8 
0x7797e8ca in start_thread () from /lib/libpthread.so.0 #9 0x755c6b6d in clone () from /lib/libc.so.6 #10 0x in ?? () This is fixed by running traffic_server -Kk to clear the cache. But after a few days the issue reappears. I will keep the current faulty setup as-is in case you need me to provide more data. I tried to make a core dump but it took a couple of GB even after gzip (I can however provide it on request). *Edit* OS is Debian GNU/Linux 6.0.6 with custom built kernel 3.2.13-grsec--grs-ipv6-64
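The suspected overflow is easy to demonstrate. The gdb dump above shows about 1.25 billion directory entries and a Dir of five uint16_t words (10 bytes), so an entry index times sizeof(Dir) can exceed INT32_MAX, and 32-bit offset math wraps negative while 64-bit math does not. The helper names below are illustrative, and the wrap is made explicit with unsigned arithmetic to keep the example well-defined in C:

```c
#include <stdint.h>

/* 32-bit byte offset for directory entry i, assuming a 10-byte Dir:
 * the product wraps for i > INT32_MAX / 10. */
static int32_t dir_byte_offset_narrow(int32_t i) {
  return (int32_t)((uint32_t)i * 10u);
}

/* 64-bit version: exact for any realistic directory size. */
static int64_t dir_byte_offset_wide(int64_t i) {
  return i * 10;
}
```

With the faulting index from the dump, i = 214748365, the true offset is 2147483650 bytes, just past INT32_MAX, which is exactly where 32-bit arithmetic goes negative and dir_index() walks off the mapped region.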
[jira] [Commented] (TS-1648) Segmentation fault in dir_clear_range()
[ https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13670005#comment-13670005 ] John Plevyak commented on TS-1648: -- I added a patch to make the variables I think are causing the problem int64_t. Segmentation fault in dir_clear_range() --- Key: TS-1648 URL: https://issues.apache.org/jira/browse/TS-1648 Project: Traffic Server Issue Type: Bug Components: Cache Affects Versions: 3.3.0, 3.2.0 Environment: reverse proxy Reporter: Tomasz Kuzemko Assignee: John Plevyak Labels: A Fix For: 3.3.3 Attachments: 0001-Fix-for-TS-1648-Segmentation-fault-in-dir_clear_rang.patch, cachedir_int64-jp-1.patch -- This message is automatically generated by JIRA.
[jira] [Commented] (TS-745) Support ssd
[ https://issues.apache.org/jira/browse/TS-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663697#comment-13663697 ] John Plevyak commented on TS-745: - Hmm... let me read over the code. An SSD layer is necessary at this point, and if this is ephemeral, I am sure we can find a clean integration. thanx! Support ssd --- Key: TS-745 URL: https://issues.apache.org/jira/browse/TS-745 Project: Traffic Server Issue Type: New Feature Components: Cache Reporter: mohan_zl Assignee: weijin Fix For: 3.3.5 Attachments: 0001-TS-745-support-interim-caching-in-storage.patch, ts-745.diff, TS-ssd-2.patch, TS-ssd.patch A patch for SSD support; it does not work well for long runs with --enable-debug
[jira] [Commented] (TS-745) Support ssd
[ https://issues.apache.org/jira/browse/TS-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662684#comment-13662684 ] John Plevyak commented on TS-745: - I think the idea of stealing bits from the directory which are hard coded to point off device (off the hard disk which the directory is a part of) is a huge design departure and a problem. When the cache was first built, it was limited to 8GB disks, which seemed HUGE. For Apache I extended it to .5PB, as by then 8GB was far too small. Currently disks are at 4TB, and this patch would decrease the limit from .5PB to 32TB, which gives us only a few years of headroom - not a good idea. Furthermore, the current design lets you unplug any cache disk from any machine, move it to another machine, and have your cache back. This change stores SSD information in the HDD directory! Why? Changing the configuration, a disk or machine failure, etc. invalidates that information, corrupting the cache. Why not store that information in a side structure and either store it in memory only or on the SSD? The idea of storing the SSD configuration in a string in records.config is also a bad idea. Overall, a stacked cache seems like a better idea, or a minimally invasive extension would be great. This patch is pretty invasive, duplicates code, and generally touches many bits of the code. The ram cache, for example, uses no bits in the HDD directory and only a couple of entry points at well defined places (insert, lookup and delete/invalidate). This patch looks to incur more technical debt at a time when I think we would like to decrease the technical debt. For example, it would be nice to have more, smaller locks, move the HTTP support out of the core via a well defined interface, add layering, etc. Adding yet another set of core code paths is going to make those changes harder. my 2 cents.
[jira] [Commented] (TS-1453) remove InactivityCop and enable define INACTIVITY_TIMEOUT
[ https://issues.apache.org/jira/browse/TS-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636624#comment-13636624 ] John Plevyak commented on TS-1453: -- A couple of things: 1) the lock is always held over accesses to disabled, so it doesn't need to be volatile. 2) I would just change the callback_event to a new EVENT_DISABLED and handle it in NetVConnection::mainEvent. The reason is that this will isolate the changes to the net processor, and I think the interaction of the disabled flag with the timeouts is problematic: you are going to end up rescheduling the event as an immediate eventually, which will cause a lot of busy processing. remove InactivityCop and enable define INACTIVITY_TIMEOUT - Key: TS-1453 URL: https://issues.apache.org/jira/browse/TS-1453 Project: Traffic Server Issue Type: Sub-task Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.5 Attachments: TS-1453.patch when we have O(1), then we can enable the INACTIVITY_TIMEOUT define
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631112#comment-13631112 ] John Plevyak commented on TS-1405: -- A one-third drop in performance on any test is a red flag. There is definitely something wrong. There are two things going on in this patch: 1) it replaces the power of 2 buckets with a time wheel, and 2) it introduces an atomic list as a mechanism for freeing up events quickly. Perhaps we can test the two separately? In particular, we can remove the atomic list effects by just having Event::cancel_event() call cancel_action() and commenting out the call to process_cancelled_events(). Leif, you up for running your test again with that change? apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: Bin Chen Assignee: Bin Chen Fix For: 3.3.2 Attachments: linux_time_wheel.patch, linux_time_wheel_v10jp.patch, linux_time_wheel_v11jp.patch, linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, linux_time_wheel_v7.patch, linux_time_wheel_v8.patch, linux_time_wheel_v9jp.patch when there are more and more events in the event system scheduler, it gets worse. This is the reason why we use InactivityCop to handle keepalive. The new scheduler is a time wheel, which has better time complexity (O(1))
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627411#comment-13627411 ] John Plevyak commented on TS-1405: -- Weird. The min and max are down, but the mean is up. What happens when you go to 500 connections? I am wondering if it is an efficiency or a latency issue.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625447#comment-13625447 ] John Plevyak commented on TS-1405: -- Sounds good. What sort of CPU/Memory improvements are you seeing?
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625460#comment-13625460 ] John Plevyak commented on TS-1405: -- The patch includes:
+#if AIO_MODE == AIO_MODE_NATIVE
+#define AIO_PERIOD -HRTIME_MSECONDS(4)
+#else
Even if it was set to zero, on an unloaded system it would only get polled every 10 msecs because that is the poll rate for epoll(), so you could potentially delay a disk IO by that amount of time.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625510#comment-13625510 ] John Plevyak commented on TS-1405: -- Perhaps this is a larger issue. We use eventfd to wake up the event thread on an unloaded system, but it would be best to avoid using it when the system becomes loaded, as it is expensive and tends to cause spinning on moderately loaded systems. Perhaps instead we should have operational regimes: use blocking IO threads on an unloaded or lightly loaded system and switch to AIO as the system becomes more heavily loaded. I would also be interested to see how this interacts with SSDs, which can have wait times in the microsecond range. The crossover point for an SSD system is likely different than for an HDD system.
[jira] [Commented] (TS-1760) use linux native aio
[ https://issues.apache.org/jira/browse/TS-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621157#comment-13621157 ] John Plevyak commented on TS-1760: -- This patch seems to have a latency issue and a busy-wait issue. The timeout on io_getevents is 4msec, which is below the threshold for busy waiting on some systems, which is often 10msec. Second, it queues the events onto a handler in the same thread rather than doing the io_submit itself. Third, the handler calls io_getevents on an EThread where there is already a blocking call to epoll() on the same thread. Having two blocking calls on the same thread is not a good idea: they will conflict, with one blocking while the other has ready data (i.e. from the net or from the disk). If io_submit is thread safe while there is a currently waiting io_getevents on another thread, then linux aio might be viable for traffic server. If io_getevents played well with epoll(), then linux aio might be viable. Really, to get this to work Linux would need to have an integrated async completion API. use linux native aio Key: TS-1760 URL: https://issues.apache.org/jira/browse/TS-1760 Project: Traffic Server Issue Type: Improvement Components: Core Reporter: weijin Assignee: weijin Fix For: 3.3.2 Attachments: native_aio.patch add a feature that uses linux native aio
[jira] [Updated] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak updated TS-1405: - Attachment: linux_time_wheel_v11jp.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618129#comment-13618129 ] John Plevyak commented on TS-1405: -- I missed one case; fixed in v11. I agree that you won't see the race if the timeout (50msec) is sufficiently large and no thread fails to be rescheduled and run in that amount of time, but I think such timing dependent behavior is to be avoided if possible. We have a couple of other races of this type - uses of new_Freer() and flushing of the log buffers - but the former uses a much larger timeout (1 minute), while the latter may be a cause of occasional crashes which we have not been able to debug for years. Experiences with the log buffer flushing issue are why I am not happy with a race in the event code.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617873#comment-13617873 ] John Plevyak commented on TS-1405: -- No, in the current patch (v10), in process_event the event will only be free'd if cancelled is set to CANCEL_SET, which means that the Event is not in the atomic_list. The current v10 patch is simple, fast, and has no delay, and hence no opportunity for timing related problems. The previous patch checks Event::in_the_priority_queue, which can change state at any time when Event::ethread != this_ethread(). This is a race, and as a result the state of the Event being on the atomic_list is not knowable in the EThread during ::execute(). This will result in crashes. You may not be seeing them because we typically pin all transactions to a single thread unless proxy.config.share_server_session is set to 1, so Event::ethread == this_ethread(); however, that is not the case in general. Try testing with this and the appropriate configuration and you will see the problem.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617877#comment-13617877 ] John Plevyak commented on TS-1405: -- If anyone else would like to chime in, I would appreciate it. Race conditions are subtle and, when they exist, lead to random crashes which are very difficult to debug, so I would like to be sure that we are not introducing any races with this change.
[jira] [Updated] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak updated TS-1405: - Attachment: linux_time_wheel_v10jp.patch
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616342#comment-13616342 ] John Plevyak commented on TS-1405: -- I am still concerned about race conditions with the v9 patch. In particular, when the cancelled flag is set it is possible (but not certain) that the event will be in the atomic list. If it is, then it should not be free'd, but if it is not, it should be. Doing the wrong thing is either a leak or memory corruption. Furthermore, if we are cancelling from a different thread than the one the Event is on, the in_the_priority_queue flag is racy (it may change at any time) and hence should not be relied upon. Attached please find v10. This patch converts the 'cancelled' flag into a multi-state variable which captures whether or not the Event is in the atomic list. All tests of the cancelled variable now do the right thing with respect to the state of the event. Bin Chen: please take a look at this patch and consider the possible races and tell me what you think.
[jira] [Updated] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak updated TS-1405: - Attachment: linux_time_wheel_v9jp.patch Fix Mutex leak and remove delay.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614895#comment-13614895 ] John Plevyak commented on TS-1405: -- I have uploaded a small modification of the recent v8 patch. This modification removes the delay, fixes a memory leak (of Mutex), and avoids going through the atomic list if we are on the same thread (the typical case).
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611638#comment-13611638 ] John Plevyak commented on TS-1405: --
+TS_INLINE void
+Event::cancel_event(Continuation * c)
+{
+  if (!cancelled) {
+    ink_assert(!c || c == continuation);
+    ethread->set_event_cancel(this);
+    cancelled = true;
+  }
+}
Once set_event_cancel has run, the Event may be deleted at any time. Do not set the cancelled flag here. It is set in set_event_cancel() in any case. If you set it here you can overwrite freed memory (or worse, another event).
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611749#comment-13611749 ] John Plevyak commented on TS-1405:
--
I think depending on the delay is brittle. You can never tell how long a thread will be delayed on an overloaded system, and the delay increases memory pressure. Rather, I would remove the delay, moving the line

+ event_cancel_list_head = (Event *) ink_atomiclist_popall(event_cancel_list);

above the loop in process_cancel_event() (and remove the time test). Then I would move the assignment of cancelled = true into set_event_cancel:

if (!e->cancelled) {
  if (e->in_the_priority_queue && (e->timeout_at - e->ethread->cur_time) > HRTIME_SECONDS(event_cancel_limit)) {
    /* prevent more threads racing to cancel one event */
    e->cancelled = true;
    ink_atomiclist_push(event_cancel_list, e);
  } else
    e->cancelled = true;
}

In fact, I would just incorporate the code from set_event_cancel into cancel_event(), since it is only called in one place. So I agree that the delay would most likely have prevented a problem, but I think it would be better not to have it, because when future programmers see a constant delay they might be tempted to decrease it to the point where problems occur.
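The restructuring suggested above — detach the whole cancel list once, before the processing loop, rather than re-reading the shared head inside it — can be sketched in portable C11. The `Event` struct and list functions here are illustrative stand-ins for ATS's `Event` and `ink_atomiclist`, not the actual ATS definitions:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for ATS's Event and ink_atomiclist. */
typedef struct Event {
  struct Event *next;
  int cancelled;
} Event;

static _Atomic(Event *) event_cancel_list = NULL;

/* Treiber-style lock-free push onto the cancel list. */
static void cancel_list_push(Event *e) {
  Event *head = atomic_load(&event_cancel_list);
  do {
    e->next = head;
  } while (!atomic_compare_exchange_weak(&event_cancel_list, &head, e));
}

/* popall: detach the entire list in one atomic exchange; concurrent
 * pushes after this point go onto a fresh, empty list. */
static Event *cancel_list_popall(void) {
  return atomic_exchange(&event_cancel_list, NULL);
}

/* Drain once, up front, instead of re-reading the shared head in the loop. */
static int process_cancel_events(void) {
  int drained = 0;
  for (Event *e = cancel_list_popall(); e != NULL; e = e->next)
    drained++; /* a real implementation would free the event here */
  return drained;
}
```

Because `popall` is a single exchange, the loop then walks a private list and needs no further synchronization.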
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609122#comment-13609122 ] John Plevyak commented on TS-1405:
--
Why is it segfaulting? Can we back out the commit(s) which caused the problem?
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608136#comment-13608136 ] John Plevyak commented on TS-1405:
--
If everything is correct there should be no race. You shouldn't be setting the 'cancelled' flag in cancel_event(), since it is set in set_event_cancel(). Remove the ink_release_assert(); we should not have any of these: they slow the code down and lead to crash storms, which are bad for everyone. There is no race because the caller needs to be holding the mutex, and after the call to cancel_event() the event is considered dead (which is why you shouldn't be setting the cancelled flag AFTER inserting the event into the cancel atomic list — that is a race).
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608142#comment-13608142 ] John Plevyak commented on TS-1405:
--
There are only very limited reasons to use an ink_release_assert, in particular when it looks like we could be returning the wrong content to a user. We shouldn't use them to check other invariants, as such checks just slow down the production server and are better done during regression testing, not at production time. Moreover, a server that crashes can cause major service disruption, so the assert itself may well cause more harm than the bug it guards against.
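The policy described above — invariant checks that run during regression testing but compile away in production — is commonly implemented with a debug-only assert macro. A minimal sketch, using the ATS-style name `ink_debug_assert` but generic mechanics (the actual ATS macro may differ):

```c
#include <assert.h>

/* In a -DDEBUG build, invariants are verified; in a production build,
 * the check expands to nothing and costs nothing at run time. */
#ifdef DEBUG
#define ink_debug_assert(EX) assert(EX)
#else
#define ink_debug_assert(EX) ((void)0)
#endif

/* Hypothetical function whose invariant (non-negative key) is only
 * checked in debug builds. */
static int checked_double(int key) {
  ink_debug_assert(key >= 0);
  return key * 2;
}
```

A release assert (`ink_release_assert`) would stay enabled in all builds, which is why the comment above reserves it for cases like serving wrong content.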
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13606671#comment-13606671 ] John Plevyak commented on TS-1405:
--
You are using EVENT_FREE, which does not free the mutex (which is reference counted) by setting it to NULL. Try using free_event(). Also, I think process_cancel_event shouldn't delay for 4 seconds; that is far too long. Perhaps 10 msec? Finally, why is the ink_atomiclist_popall happening at the end of process_cancel_event? Shouldn't event_cancel_list_head be local, and the call happen at the start (after the delay)?
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604324#comment-13604324 ] John Plevyak commented on TS-1405:
--
Could you update this patch to be against the current master branch? I am getting a compile failure:

UnixEThread.cc: In constructor 'EThread::EThread()':
UnixEThread.cc:57:81: error: 'IOCORE_ReadConfigInteger' was not declared in this scope
UnixEThread.cc: In constructor 'EThread::EThread(ThreadType, int)':
UnixEThread.cc:79:81: error: 'IOCORE_ReadConfigInteger' was not declared in this scope
UnixEThread.cc: In constructor 'EThread::EThread(ThreadType, Event*, ink_sem*)':
UnixEThread.cc:116:81: error: 'IOCORE_ReadConfigInteger' was not declared in this scope

and a patch failure:

--- iocore/net/P_UnixNetVConnection.h
+++ iocore/net/P_UnixNetVConnection.h
@@ -339,7 +339,7 @@
   inactivity_timeout_in = 0;
 #ifdef INACTIVITY_TIMEOUT
   if (inactivity_timeout) {
-    inactivity_timeout->cancel_action(this);
+    inactivity_timeout->cancel_event(this);
     inactivity_timeout = NULL;
   }
 #else
@@ -351,7 +351,7 @@
 UnixNetVConnection::cancel_active_timeout()
 {
   if (active_timeout) {
-    active_timeout->cancel_action(this);
+    active_timeout->cancel_event(this);
     active_timeout = NULL;
     active_timeout_in = 0;
   }
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604486#comment-13604486 ] John Plevyak commented on TS-1405:
--
Thanx!
[jira] [Commented] (TS-1742) Freelists to use 64bit version w/ Double Word Compare and Swap
[ https://issues.apache.org/jira/browse/TS-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600214#comment-13600214 ] John Plevyak commented on TS-1742:
--
There are still a number of volatile declarations associated with head_p, and they need to be made consistent. Anyone with an ARM/i386 system want to do the honors? At the very least it looks like here and in ink_queue.h, where head_p itself is declared volatile.

john

Freelists to use 64bit version w/ Double Word Compare and Swap
--
Key: TS-1742
URL: https://issues.apache.org/jira/browse/TS-1742
Project: Traffic Server
Issue Type: Improvement
Reporter: Brian Geffon
Assignee: Brian Geffon
Fix For: 3.3.2
Attachments: 128bit_cas.patch, 128bit_cas.patch.2

So to those of you familiar with the freelists, you know that the head pointer uses its upper 16 bits for a version to prevent the ABA problem. The big drawback to this is that it requires the following macros to get at the pointer or the version:

{code}
#define FREELIST_POINTER(_x) ((void*)((((intptr_t)(_x).data)<<16>>16) | \
  (((~((((intptr_t)(_x).data)<<16>>63)-1))>>48)<<48))) // sign extend
#define FREELIST_VERSION(_x) (((intptr_t)(_x).data)>>48)
#define SET_FREELIST_POINTER_VERSION(_x,_p,_v) \
  (_x).data = ((((intptr_t)(_p))&0x0000FFFFFFFFFFFFULL) | (((_v)&0xFFFFULL)<<48))
{code}

Additionally, since this only leaves 16 bits, it limits the number of versions you can have. Well, more and more x86_64 processors support DCAS (double word compare and swap / 128-bit CAS). This means that we can use 64 bits for a version, which basically makes the versions unlimited, but more importantly it takes those macros above and simplifies them to:

{code}
#define FREELIST_POINTER(_x) (_x).s.pointer
#define FREELIST_VERSION(_x) (_x).s.version
#define SET_FREELIST_POINTER_VERSION(_x,_p,_v) \
  (_x).s.pointer = _p; (_x).s.version = _v
{code}

As you can imagine this has a performance benefit; in my simple tests I measured an improvement of around 6%. Unfortunately, I'm not an expert with this stuff and I would really appreciate more community feedback before I commit this patch. Note: this only applies if you're not using a reclaimable freelist.
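The 48-bit-pointer/16-bit-version packing described above can be sketched with plain shifts and masks. This is a simplified illustration of the scheme, not the exact ATS macros: the real `FREELIST_POINTER` also sign-extends the 48-bit pointer back to a canonical 64-bit address, which is omitted here for clarity, and the function names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Low 48 bits hold the pointer; high 16 bits hold the ABA version. */
#define PTR_MASK_48 0x0000FFFFFFFFFFFFULL

static uint64_t set_pointer_version(uint64_t p, uint64_t v) {
  return (p & PTR_MASK_48) | ((v & 0xFFFFULL) << 48);
}

static uint64_t get_pointer(uint64_t data) {
  /* NOTE: no sign extension here, unlike the real macro. */
  return data & PTR_MASK_48;
}

static uint64_t get_version(uint64_t data) {
  return data >> 48;
}
```

Note how the version silently wraps at 2^16 — exactly the limitation that motivates moving to a 128-bit double-word CAS, where the version gets a full 64-bit field of its own.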
[jira] [Commented] (TS-1742) Freelists to use 64bit version w/ Double Word Compare and Swap
[ https://issues.apache.org/jira/browse/TS-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598350#comment-13598350 ] John Plevyak commented on TS-1742:
--
Looks good. I might consider adding an ink_debug_assert that the type_size for the freelist now be at least 16 bytes when this is enabled.
[jira] [Commented] (TS-1742) Freelists to use 64bit version w/ Double Word Compare and Swap
[ https://issues.apache.org/jira/browse/TS-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598501#comment-13598501 ] John Plevyak commented on TS-1742:
--
We can take it out because we do the loads manually for the cas. I always liked to use volatile as a marker that the variable was being accessed outside of a lock, but if it is causing a performance problem then we could convert the keyword into a comment:

// Warning: this variable is read and written in multiple threads without a lock; use INK_QUEUE_LD to read safely.

john
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588493#comment-13588493 ] John Plevyak commented on TS-1405:
--
I am getting some compilation errors with gcc 4.7.2:

UnixEThread.cc:159:83: error: no matching function for call to 'ink_atomic_cas(int32_t*, bool, bool)'
UnixEThread.cc:159:83: note: candidate is:
In file included from ../../lib/ts/libts.h:52:0,
                 from P_EventSystem.h:39,
                 from UnixEThread.cc:30:
../../lib/ts/ink_atomic.h:152:1: note: template<class T> bool ink_atomic_cas(volatile T*, T, T)
../../lib/ts/ink_atomic.h:152:1: note: template argument deduction/substitution failed:
UnixEThread.cc:159:83: note: deduced conflicting types for parameter 'T' ('int' and 'bool')

Also:

UnixEThread.cc: In constructor 'EThread::EThread()':
UnixEThread.cc:58:81: error: 'IOCORE_ReadConfigInteger' was not declared in this scope
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588498#comment-13588498 ] John Plevyak commented on TS-1405:
--
Instance variables like CancelList need to start with a lower-case letter and use _ to separate words (like all the other variables in this file).
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588500#comment-13588500 ] John Plevyak commented on TS-1405:
--
The atomic list is singly linked, so you could use SLINK for clink in Event. There are lots of events, so an extra field is worth saving.
[jira] [Commented] (TS-1006) memory management, cut down memory waste ?
[ https://issues.apache.org/jira/browse/TS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529116#comment-13529116 ] John Plevyak commented on TS-1006:
--
I agree. We should land this initially as a compile-time option in the dev branch to get wider production time on it before making it the default. The main reason is that it is invasive and complicated, particularly in the way it will interact with the VM system, and it would be nice to see how it responds in a variety of environments. If it is much better than TCMalloc, then perhaps we should package it up in a more general form as well. Was the design based on another allocator/paper? Any references?

john

memory management, cut down memory waste ?
--
Key: TS-1006
URL: https://issues.apache.org/jira/browse/TS-1006
Project: Traffic Server
Issue Type: Improvement
Components: Core
Affects Versions: 3.1.1
Reporter: Zhao Yongming
Assignee: Bin Chen
Fix For: 3.3.2
Attachments: 0001-Allocator-optimize-InkFreeList-memory-pool.patch, 0002-Allocator-make-InkFreeList-memory-pool-configurable.patch, Memory-Usage-After-Introduced-New-Allocator.png, memusage.ods, memusage.ods

When we review the memory usage in production, there is something abnormal: TS looks like it takes much more memory than index data + common system waste. Here is a memory dump obtained by setting proxy.config.dump_mem_info_frequency to 1, on a not-so-busy forwarding system:

physical memory: 32G
RAM cache: 22G
DISK: 6140 GB
average_object_size: 64000

{code}
  allocated  |   in-use   | type size | free list name
-------------|------------|-----------|---------------
   671088640 |   37748736 |   2097152 | memory/ioBufAllocator[14]
  2248146944 | 2135949312 |   1048576 | memory/ioBufAllocator[13]
  1711276032 | 1705508864 |    524288 | memory/ioBufAllocator[12]
  1669332992 | 1667760128 |    262144 | memory/ioBufAllocator[11]
  2214592512 |     221184 |    131072 | memory/ioBufAllocator[10]
  2325741568 | 2323775488 |     65536 | memory/ioBufAllocator[9]
  2091909120 | 2089123840 |     32768 | memory/ioBufAllocator[8]
  1956642816 | 1956478976 |     16384 | memory/ioBufAllocator[7]
  2094530560 | 2094071808 |      8192 | memory/ioBufAllocator[6]
   356515840 |  355540992 |      4096 | memory/ioBufAllocator[5]
     1048576 |      14336 |      2048 | memory/ioBufAllocator[4]
      131072 |          0 |      1024 | memory/ioBufAllocator[3]
       65536 |          0 |       512 | memory/ioBufAllocator[2]
       32768 |          0 |       256 | memory/ioBufAllocator[1]
       16384 |          0 |       128 | memory/ioBufAllocator[0]
           0 |          0 |       576 | memory/ICPRequestCont_allocator
           0 |          0 |       112 | memory/ICPPeerReadContAllocator
           0 |          0 |       432 | memory/PeerReadDataAllocator
           0 |          0 |        32 | memory/MIMEFieldSDKHandle
           0 |          0 |       240 | memory/INKVConnAllocator
           0 |          0 |        96 | memory/INKContAllocator
        4096 |          0 |        32 | memory/apiHookAllocator
           0 |          0 |       288 | memory/FetchSMAllocator
           0 |          0 |        80 | memory/prefetchLockHandlerAllocator
           0 |          0 |       176 | memory/PrefetchBlasterAllocator
           0 |          0 |        80 | memory/prefetchUrlBlaster
           0 |          0 |        96 | memory/blasterUrlList
           0 |          0 |        96 | memory/prefetchUrlEntryAllocator
           0 |          0 |       128 | memory/socksProxyAllocator
           0 |          0 |       144 | memory/ObjectReloadCont
     3258368 |     576016 |       592 | memory/httpClientSessionAllocator
      825344 |     139568 |       208 | memory/httpServerSessionAllocator
    22597632 |    1284848 |      9808 | memory/httpSMAllocator
           0 |          0 |        32 | memory/CacheLookupHttpConfigAllocator
           0 |          0 |      9856 | memory/httpUpdateSMAllocator
           0 | [dump truncated]
{code}
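The dump above reports, per size class, the bytes held by the freelist ("allocated") versus the bytes actually in use; the gap between them is the waste this ticket is about. A small sketch, using two rows copied verbatim from the dump (the struct and function names are illustrative, not ATS code):

```c
#include <assert.h>
#include <stdint.h>

/* One row of the proxy.config.dump_mem_info_frequency dump. */
typedef struct {
  uint64_t allocated; /* bytes held by the freelist */
  uint64_t in_use;    /* bytes actually in use */
} AllocRow;

/* Bytes sitting idle on the freelist for this size class. */
static uint64_t freelist_waste(AllocRow r) {
  return r.allocated - r.in_use;
}

/* Two rows from the dump above:
 * ioBufAllocator[10] (128 KB class) holds ~2.2 GB but uses ~216 KB,
 * while ioBufAllocator[9] (64 KB class) is almost fully utilized. */
static const AllocRow iobuf10 = { 2214592512ULL, 221184ULL };
static const AllocRow iobuf9  = { 2325741568ULL, 2323775488ULL };
```

The contrast between the two rows shows why per-class accounting matters: the waste is concentrated in a few size classes rather than spread evenly.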
[jira] [Commented] (TS-1006) memory management, cut down memory waste ?
[ https://issues.apache.org/jira/browse/TS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528528#comment-13528528 ] John Plevyak commented on TS-1006:
--
Some of the volatile variables are not listed as such (e.g. InkThreadCache::status). Also, what is the purpose of this status field and how is it updated? It is set in ink_freelist_new to 0 via simple assignment, then tested/assigned via a cas in ink_freelist_free. Some comments or documentation would be nice. Have you tested this against the default memory allocator and TCMalloc? This seems to be doing something similar to TCMalloc, and that code has been extensively tested.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449950#comment-13449950 ] John Plevyak commented on TS-1405:
--
There is a race between the insert into the atomic list in the cancelling thread, the dequeue in the controlling thread, and the setting of the cancelled flag in the cancelling thread. One solution is to take the mutex lock in the check_ready code, as the cancelling thread must be holding that lock over the insert into the atomic list and the setting of the cancelled flag. Note, you could set the cancelled flag before adding to the atomic list and then just ignore it in process_thread() (and any other place), counting on the event getting freed eventually via the atomic list.
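The second option described above — set the cancelled flag before handing the event to the atomic list, and have the owning thread merely skip cancelled events — can be sketched in C11. The types and names here are illustrative stand-ins, not the actual ATS code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event with a cancellation flag and an intrusive link. */
typedef struct Ev {
  struct Ev *next;
  atomic_bool cancelled;
} Ev;

static _Atomic(Ev *) cancel_list = NULL;

/* Cancelling thread (assumed to hold the continuation's mutex):
 * step 1 sets the flag, step 2 publishes the event on the cancel list.
 * Because the flag is set first, no consumer can observe the event on
 * the list while still believing it is runnable. */
static void cancel_event(Ev *e) {
  atomic_store(&e->cancelled, true);           /* 1: flag first ...      */
  Ev *head = atomic_load(&cancel_list);        /* 2: ... then hand off   */
  do {
    e->next = head;
  } while (!atomic_compare_exchange_weak(&cancel_list, &head, e));
}

/* Owning thread: ignore cancelled events instead of freeing them;
 * the cancel list drives the eventual free. */
static bool runnable(Ev *e) {
  return !atomic_load(&e->cancelled);
}
```

The point of the ordering is exactly the race called out in the comment: setting the flag AFTER the push would let the controlling thread free the event while the cancelling thread still intends to write to it.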
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449953#comment-13449953 ] John Plevyak commented on TS-1405:
--
weijin: I don't know that freeing it as soon as possible is as big a goal as race conditions are a problem :) The current code can take up to 5 seconds to free a cancelled event, so this code is much better in that regard, even if we have to wait for the next time the event loop runs.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443575#comment-13443575 ] John Plevyak commented on TS-1405:
--
The current code should have a complexity which is bounded by the need to scan the entire queue every 5 seconds. This is necessary because cancelling an event involves setting the volatile cancelled flag, and not scanning for those would result in running out of memory. Assuming an event is inserted with a 30-second timeout and waits till it runs, it will be touched 30/5 = 6 plus ~10 = 16 times. For a 300-second timeout it will be touched 300/5 = 60 plus ~10 = 70 times. If an event is cancelled (the normal case for timeouts), then it will be touched once (after an average of 2.5 seconds). So, at least according to the design, the cost of the current scheme should be only a small constant factor worse than the time wheel, and should average slightly more than 1 touch per event, which is the best that can be expected. Of course, that is the design; if it is causing problems, then likely there is a bug or something about the workload that is causing them. The time wheel can bring this down to 1 touch every N seconds, with an expected 1 touch per event, or 6 and 60 above. So I think this is a very reasonable change, assuming that it can deal with the out-of-memory issue, and I am interested in seeing the benchmarks, as I am curious to see how theory and practice collide.
[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system
[ https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443583#comment-13443583 ] John Plevyak commented on TS-1405: -- Sorry, the numbers for 30 seconds should be 30/5 + ~17 (every time a power-of-2 bucket is touched, 1/2 of the elements will be moved out, and 1/2 of those will be moved down 2 levels, etc.) = 27, vs 7 for the time wheel. So the time wheel, in the case of short expired timeouts, can be several times more efficient. apply time-wheel scheduler about event system -- Key: TS-1405 URL: https://issues.apache.org/jira/browse/TS-1405 Project: Traffic Server Issue Type: Improvement Components: Core Affects Versions: 3.2.0 Reporter: kuotai Assignee: kuotai Fix For: 3.3.0 Attachments: time-wheel.patch When there are more and more events in the event system scheduler, performance gets worse. This is the reason we use InactivityCop to handle keepalive. The new scheduler is a time wheel, which has better time complexity (O(1)). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
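The O(1) scheduling behavior debated in the thread above can be illustrated with a minimal hashed timing wheel. This is only a sketch of the general technique, not the attached linux_time_wheel.patch: the bucket count, tick granularity, and the Event type are all assumptions. It also shows the lazy reclamation point John raises: a cancelled event is freed when its bucket comes due, with no periodic full-queue scan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <vector>

// Hypothetical event type: just a callback plus the volatile-style
// cancelled flag discussed in the thread.
struct Event {
    std::function<void()> handler;
    bool cancelled = false;  // set by the canceller; freed lazily at expiry
};

class TimeWheel {
public:
    explicit TimeWheel(size_t buckets) : slots_(buckets) {}

    // O(1): hash the absolute expiry tick into a slot.
    // Sketch limitation: timeouts must fit within one wheel revolution.
    void schedule(Event *e, uint64_t ticks_from_now) {
        assert(ticks_from_now > 0 && ticks_from_now < slots_.size());
        slots_[(now_ + ticks_from_now) % slots_.size()].push_back(e);
    }

    // Advance one tick: each event is touched exactly once, at expiry.
    void tick() {
        ++now_;
        auto &slot = slots_[now_ % slots_.size()];
        for (Event *e : slot) {
            if (!e->cancelled)
                e->handler();
            delete e;  // cancelled events are reclaimed here, not by a scan
        }
        slot.clear();
    }

private:
    std::vector<std::list<Event *>> slots_;
    uint64_t now_ = 0;  // current tick
};
```

With this shape, a cancelled timeout costs one touch at its original expiry slot, matching the "1 touch per event" expectation discussed above.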
[jira] [Created] (TS-1264) LRU RAM cache not accounting for overhead
John Plevyak created TS-1264: Summary: LRU RAM cache not accounting for overhead Key: TS-1264 URL: https://issues.apache.org/jira/browse/TS-1264 Project: Traffic Server Issue Type: Bug Components: Cache Affects Versions: 3.1.3 Reporter: John Plevyak Assignee: John Plevyak Priority: Minor The CLFUS RAM cache takes its overhead into account when determining how many bytes it is using. The LRU cache does not, which makes it hard to compare performance between the two and hard to correctly size the LRU RAM cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-1240) Debug assert triggered in LogBuffer.cc:209
[ https://issues.apache.org/jira/browse/TS-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277284#comment-13277284 ] John Plevyak commented on TS-1240: -- What is the downside to restoring the delete delay buffer (was the memory usage too high)? Debug assert triggered in LogBuffer.cc:209 -- Key: TS-1240 URL: https://issues.apache.org/jira/browse/TS-1240 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.1.4 Reporter: Leif Hedstrom Fix For: 3.1.5 From John: {code} [May 1 09:08:44.746] Server {0x77fce800} NOTE: traffic server running FATAL: LogBuffer.cc:209: failed assert `m_unaligned_buffer` /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server - STACK TRACE: /home/jplevyak/projects/ts/ts-2/lib/ts/.libs/libtsutil.so.3(ink_fatal+0xa3)[0x77bae4a5] /home/jplevyak/projects/ts/ts-2/lib/ts/.libs/libtsutil.so.3(_ink_assert+0x3c)[0x77bad47c] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x35)[0x5d3a53] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogObject15_checkout_writeEPmm+0x41)[0x5eef75] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x4cb)[0x5ef5b9] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN16LogObjectManager3logEP9LogAccess+0x4a)[0x5daab4] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN3Log6accessEP9LogAccess+0x235)[0x5d97f9] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM12update_statsEv+0x204)[0x579872] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM9kill_thisEv+0x31d)[0x579525] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM12main_handlerEiPv+0x337)[0x56cec1] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450] 
/a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0x14c)[0x5b24aa] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6bb9d1] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6bbafa] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0x6fa)[0x6bcaaf] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_Z12write_to_netP10NetHandlerP18UnixNetVConnectionP14PollDescriptorP7EThread+0x7d)[0x6bc3b3] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x6e6)[0x6b8828] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN7EThread13process_eventEP5Eventi+0x111)[0x6dde7f] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN7EThread7executeEv+0x431)[0x6de42b] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6dd0bc] /lib64/libpthread.so.0(+0x7d90)[0x77676d90] /lib64/libc.so.6(clone+0x6d)[0x754f9f5d] {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-1238) RAM cache hit rate unexpectedly low
[ https://issues.apache.org/jira/browse/TS-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276422#comment-13276422 ] John Plevyak commented on TS-1238: -- I think the problem is that the LRU cache doesn't account for its overhead, while the CLFUS cache does, which puts it at an unfair disadvantage in terms of relative true memory used per byte allocated. The CLFUS cache is much better behaved when the working set is larger than the RAM cache size, and it supports compression. I am going to commit this fix and leave CLFUS as the default, and file another bug to fix the accounting for RAM in the LRU cache. I think this will make the performance comparable in the best case and better in worse cases. RAM cache hit rate unexpectedly low --- Key: TS-1238 URL: https://issues.apache.org/jira/browse/TS-1238 Project: Traffic Server Issue Type: Bug Components: Cache Affects Versions: 3.1.3 Reporter: John Plevyak Assignee: John Plevyak Fix For: 3.1.4 Attachments: TS-1238-jp-1.patch The RAM cache is not getting the expected hit rate. Looks like there are a couple of issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
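The accounting asymmetry described in this comment (and filed as TS-1264) comes down to what each cache charges per entry. A minimal sketch of the fix, under the assumption that there is a fixed per-object bookkeeping cost: charge every entry for its payload plus its metadata, so LRU and CLFUS report comparable byte usage. `ESTIMATED_OBJECT_OVERHEAD` and the struct names here are hypothetical, not the actual Traffic Server fields or constants.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed per-entry bookkeeping cost: list links, key, size fields, etc.
constexpr size_t ESTIMATED_OBJECT_OVERHEAD = 128;

struct LRUEntry {
    uint64_t key;
    size_t data_size;  // payload bytes only
};

// Byte accounting for a hypothetical LRU RAM cache: the point of the fix
// is that bytes tracks payload *plus* overhead, as CLFUS already does.
struct LRUCacheBytes {
    size_t bytes = 0;

    void on_insert(const LRUEntry &e) {
        bytes += e.data_size + ESTIMATED_OBJECT_OVERHEAD;
    }
    void on_evict(const LRUEntry &e) {
        bytes -= e.data_size + ESTIMATED_OBJECT_OVERHEAD;
    }
};
```

Without the overhead term, a cache full of small objects would report far less memory than it actually consumes, which is exactly what made LRU and CLFUS hard to compare at the same configured size.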
[jira] [Commented] (TS-934) Proxy Mutex null pointer crash
[ https://issues.apache.org/jira/browse/TS-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13274381#comment-13274381 ] John Plevyak commented on TS-934: - Is this still happening with the latest code? Proxy Mutex null pointer crash -- Key: TS-934 URL: https://issues.apache.org/jira/browse/TS-934 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.1.0 Environment: Debian 6.0.2 quadcore, forward transparent proxy. Reporter: Alan M. Carroll Assignee: Alan M. Carroll Fix For: 3.1.4, 3.1.1 Attachments: ts-934-patch.txt [Client report] We had the cache crash gracefully twice last night on a segfault. Both times the callstack produced by trafficserver's signal handler was: /usr/bin/traffic_server[0x529596] /lib/libpthread.so.0(+0xef60)[0x2ab09a897f60] [0x2ab09e7c0a10] /usr/bin/traffic_server(HttpServerSession::do_io_close(int)+0xa8)[0x567a3c] /usr/bin/traffic_server(HttpVCTable::cleanup_entry(HttpVCTableEntry*)+0x4c)[0x56aff6] /usr/bin/traffic_server(HttpVCTable::cleanup_all()+0x64)[0x56b07a] /usr/bin/traffic_server(HttpSM::kill_this()+0x120)[0x57c226] /usr/bin/traffic_server(HttpSM::main_handler(int, void*)+0x208)[0x571b28] /usr/bin/traffic_server(Continuation::handleEvent(int, void*)+0x69)[0x4e4623] I went through the disassembly and the instruction that it is on in ::do_io_close is loading the value of diags (not dereferencing it), so it is unlikely that that threw the segfault (unless this is somehow in thread local storage and that is corrupt). The kernel message claimed that the instruction pointer was 0x4e438e, which in this build is in ProxyMutexPtr::operator ->() on the instruction that dereferences the object pointer to get the stored mutex pointer (bingo!), so it would seem that at some point we are dereferencing a null safe pointer. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-934) Proxy Mutex null pointer crash
[ https://issues.apache.org/jira/browse/TS-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13274383#comment-13274383 ] John Plevyak commented on TS-934: - I think we should undo this, as other changes fixed the bug. Proxy Mutex null pointer crash -- Key: TS-934 URL: https://issues.apache.org/jira/browse/TS-934 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.1.0 Environment: Debian 6.0.2 quadcore, forward transparent proxy. Reporter: Alan M. Carroll Assignee: Alan M. Carroll Fix For: 3.1.4, 3.1.1 Attachments: ts-934-patch.txt [Client report] We had the cache crash gracefully twice last night on a segfault. Both times the callstack produced by trafficserver's signal handler was: /usr/bin/traffic_server[0x529596] /lib/libpthread.so.0(+0xef60)[0x2ab09a897f60] [0x2ab09e7c0a10] /usr/bin/traffic_server(HttpServerSession::do_io_close(int)+0xa8)[0x567a3c] /usr/bin/traffic_server(HttpVCTable::cleanup_entry(HttpVCTableEntry*)+0x4c)[0x56aff6] /usr/bin/traffic_server(HttpVCTable::cleanup_all()+0x64)[0x56b07a] /usr/bin/traffic_server(HttpSM::kill_this()+0x120)[0x57c226] /usr/bin/traffic_server(HttpSM::main_handler(int, void*)+0x208)[0x571b28] /usr/bin/traffic_server(Continuation::handleEvent(int, void*)+0x69)[0x4e4623] I went through the disassembly and the instruction that it is on in ::do_io_close is loading the value of diags (not dereferencing it), so it is unlikely that that threw the segfault (unless this is somehow in thread local storage and that is corrupt). The kernel message claimed that the instruction pointer was 0x4e438e, which in this build is in ProxyMutexPtr::operator ->() on the instruction that dereferences the object pointer to get the stored mutex pointer (bingo!), so it would seem that at some point we are dereferencing a null safe pointer. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TS-934) Proxy Mutex null pointer crash
[ https://issues.apache.org/jira/browse/TS-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak updated TS-934: Assignee: John Plevyak (was: Alan M. Carroll) Proxy Mutex null pointer crash -- Key: TS-934 URL: https://issues.apache.org/jira/browse/TS-934 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.1.0 Environment: Debian 6.0.2 quadcore, forward transparent proxy. Reporter: Alan M. Carroll Assignee: John Plevyak Fix For: 3.1.4, 3.1.1 Attachments: ts-934-jp1.patch, ts-934-patch.txt [Client report] We had the cache crash gracefully twice last night on a segfault. Both times the callstack produced by trafficserver's signal handler was: /usr/bin/traffic_server[0x529596] /lib/libpthread.so.0(+0xef60)[0x2ab09a897f60] [0x2ab09e7c0a10] /usr/bin/traffic_server(HttpServerSession::do_io_close(int)+0xa8)[0x567a3c] /usr/bin/traffic_server(HttpVCTable::cleanup_entry(HttpVCTableEntry*)+0x4c)[0x56aff6] /usr/bin/traffic_server(HttpVCTable::cleanup_all()+0x64)[0x56b07a] /usr/bin/traffic_server(HttpSM::kill_this()+0x120)[0x57c226] /usr/bin/traffic_server(HttpSM::main_handler(int, void*)+0x208)[0x571b28] /usr/bin/traffic_server(Continuation::handleEvent(int, void*)+0x69)[0x4e4623] I went through the disassembly and the instruction that it is on in ::do_io_close is loading the value of diags (not dereferencing it), so it is unlikely that that threw the segfault (unless this is somehow in thread local storage and that is corrupt). The kernel message claimed that the instruction pointer was 0x4e438e, which in this build is in ProxyMutexPtr::operator ->() on the instruction that dereferences the object pointer to get the stored mutex pointer (bingo!), so it would seem that at some point we are dereferencing a null safe pointer. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TS-934) Proxy Mutex null pointer crash
[ https://issues.apache.org/jira/browse/TS-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak updated TS-934: Attachment: ts-934-jp1.patch This undoes the previous patch, as this issue was addressed under a different bug. Proxy Mutex null pointer crash -- Key: TS-934 URL: https://issues.apache.org/jira/browse/TS-934 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 3.1.0 Environment: Debian 6.0.2 quadcore, forward transparent proxy. Reporter: Alan M. Carroll Assignee: John Plevyak Fix For: 3.1.4, 3.1.1 Attachments: ts-934-jp1.patch, ts-934-patch.txt [Client report] We had the cache crash gracefully twice last night on a segfault. Both times the callstack produced by trafficserver's signal handler was: /usr/bin/traffic_server[0x529596] /lib/libpthread.so.0(+0xef60)[0x2ab09a897f60] [0x2ab09e7c0a10] /usr/bin/traffic_server(HttpServerSession::do_io_close(int)+0xa8)[0x567a3c] /usr/bin/traffic_server(HttpVCTable::cleanup_entry(HttpVCTableEntry*)+0x4c)[0x56aff6] /usr/bin/traffic_server(HttpVCTable::cleanup_all()+0x64)[0x56b07a] /usr/bin/traffic_server(HttpSM::kill_this()+0x120)[0x57c226] /usr/bin/traffic_server(HttpSM::main_handler(int, void*)+0x208)[0x571b28] /usr/bin/traffic_server(Continuation::handleEvent(int, void*)+0x69)[0x4e4623] I went through the disassembly and the instruction that it is on in ::do_io_close is loading the value of diags (not dereferencing it), so it is unlikely that that threw the segfault (unless this is somehow in thread local storage and that is corrupt). The kernel message claimed that the instruction pointer was 0x4e438e, which in this build is in ProxyMutexPtr::operator ->() on the instruction that dereferences the object pointer to get the stored mutex pointer (bingo!), so it would seem that at some point we are dereferencing a null safe pointer. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-1240) Debug assert triggered in LogBuffer.cc:209
[ https://issues.apache.org/jira/browse/TS-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13274393#comment-13274393 ] John Plevyak commented on TS-1240: -- I can repro on my machine any time you like :) Debug assert triggered in LogBuffer.cc:209 -- Key: TS-1240 URL: https://issues.apache.org/jira/browse/TS-1240 Project: Traffic Server Issue Type: Bug Components: Logging Affects Versions: 3.1.4 Reporter: Leif Hedstrom Fix For: 3.1.5 From John: {code} [May 1 09:08:44.746] Server {0x77fce800} NOTE: traffic server running FATAL: LogBuffer.cc:209: failed assert `m_unaligned_buffer` /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server - STACK TRACE: /home/jplevyak/projects/ts/ts-2/lib/ts/.libs/libtsutil.so.3(ink_fatal+0xa3)[0x77bae4a5] /home/jplevyak/projects/ts/ts-2/lib/ts/.libs/libtsutil.so.3(_ink_assert+0x3c)[0x77bad47c] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x35)[0x5d3a53] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogObject15_checkout_writeEPmm+0x41)[0x5eef75] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x4cb)[0x5ef5b9] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN16LogObjectManager3logEP9LogAccess+0x4a)[0x5daab4] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN3Log6accessEP9LogAccess+0x235)[0x5d97f9] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM12update_statsEv+0x204)[0x579872] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM9kill_thisEv+0x31d)[0x579525] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM12main_handlerEiPv+0x337)[0x56cec1] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450] 
/a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0x14c)[0x5b24aa] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6bb9d1] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6bbafa] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0x6fa)[0x6bcaaf] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_Z12write_to_netP10NetHandlerP18UnixNetVConnectionP14PollDescriptorP7EThread+0x7d)[0x6bc3b3] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x6e6)[0x6b8828] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN7EThread13process_eventEP5Eventi+0x111)[0x6dde7f] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN7EThread7executeEv+0x431)[0x6de42b] /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6dd0bc] /lib64/libpthread.so.0(+0x7d90)[0x77676d90] /lib64/libc.so.6(clone+0x6d)[0x754f9f5d] {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-1238) RAM cache hit rate unexpectedly low
[ https://issues.apache.org/jira/browse/TS-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13271042#comment-13271042 ] John Plevyak commented on TS-1238: -- It isn't committed. Bryan was going to try it out. It changes one of the defaults (probably for the better) for RAM caching, but I wanted to give him a chance to take a look. I'll see if I can figure it out myself as well. It should be a very safe change. RAM cache hit rate unexpectedly low --- Key: TS-1238 URL: https://issues.apache.org/jira/browse/TS-1238 Project: Traffic Server Issue Type: Bug Components: Cache Affects Versions: 3.1.3 Reporter: John Plevyak Assignee: John Plevyak Fix For: 3.1.4 Attachments: TS-1238-jp-1.patch The RAM cache is not getting the expected hit rate. Looks like there are a couple issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (TS-1238) RAM cache hit rate unexpectedly low
John Plevyak created TS-1238: Summary: RAM cache hit rate unexpectedly low Key: TS-1238 URL: https://issues.apache.org/jira/browse/TS-1238 Project: Traffic Server Issue Type: Bug Components: Cache Affects Versions: 3.1.3 Reporter: John Plevyak Assignee: John Plevyak Fix For: 3.1.4 The RAM cache is not getting the expected hit rate. Looks like there are a couple issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TS-1238) RAM cache hit rate unexpectedly low
[ https://issues.apache.org/jira/browse/TS-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak updated TS-1238: - Attachment: TS-1238-jp-1.patch Add new option to disable/enable the seen_filter in the RAM cache. Fix reporting of RAM cache hits to HTTP. Fix for LRU cache. Add back in seen_filter to LRU (disabled by default). Disable seen filter by default for CLFUS. RAM cache hit rate unexpectedly low --- Key: TS-1238 URL: https://issues.apache.org/jira/browse/TS-1238 Project: Traffic Server Issue Type: Bug Components: Cache Affects Versions: 3.1.3 Reporter: John Plevyak Assignee: John Plevyak Fix For: 3.1.4 Attachments: TS-1238-jp-1.patch The RAM cache is not getting the expected hit rate. Looks like there are a couple issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TS-1225) doc_size still gets casted to int in a few places
[ https://issues.apache.org/jira/browse/TS-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak updated TS-1225: - Attachment: ts-1225.diff Remove the cast of doc_len to 32 bits. doc_size still gets casted to int in a few places - Key: TS-1225 URL: https://issues.apache.org/jira/browse/TS-1225 Project: Traffic Server Issue Type: Bug Components: Cache Reporter: Leif Hedstrom Assignee: John Plevyak Fix For: 3.1.4 Attachments: ts-1225.diff This was also discussed on TS-475, and discovered by bwyatt. I'm filing a separate bug, since I think this should be fixed independent of TS-475. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
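The bug class behind TS-1225 is narrowing a 64-bit document length to `int`: any object over 2^31 - 1 bytes silently wraps. A minimal sketch, assuming a typical two's-complement platform; the function and variable names here are hypothetical illustrations, not the actual cache fields touched by ts-1225.diff.

```cpp
#include <cstdint>

// Old code path: the removed cast. A 3 GB doc_len no longer fits in
// 32 bits, so the stored size wraps to a negative value.
inline int32_t old_doc_size(int64_t doc_len) {
    return static_cast<int32_t>(doc_len);  // silently truncates large objects
}

// After the fix: the full 64-bit length is carried through.
inline int64_t new_doc_size(int64_t doc_len) {
    return doc_len;
}
```

This is why the cast matters even when most objects are small: only the rare large object corrupts, which makes the resulting cache misbehavior hard to reproduce.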
[jira] [Closed] (TS-888) SSL connections working with 2.1.5 fail with 3.0.1 and FireFox
[ https://issues.apache.org/jira/browse/TS-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak closed TS-888. --- Resolution: Fixed Assignee: John Plevyak (was: Leif Hedstrom) Fixed 1155125. SSL connections working with 2.1.5 fail with 3.0.1 and FireFox -- Key: TS-888 URL: https://issues.apache.org/jira/browse/TS-888 Project: Traffic Server Issue Type: Bug Components: SSL Affects Versions: 3.0.1 Environment: Ubuntu 10.04 LTS amd64, Glassfish 3.0.1, FireFox 5.0 Reporter: Kurt Huwig Assignee: John Plevyak Fix For: 3.1.0 Attachments: TS-888-jp.patch ATS has SSL server certificates. The backend is accessed via SSL as well which uses the same certificates. It fails with FireFox, but works with Google Chrome. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TS-888) SSL connections working with 2.1.5 fail with 3.0.1 and FireFox
[ https://issues.apache.org/jira/browse/TS-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak updated TS-888: Attachment: TS-888-jp.patch SSL connections working with 2.1.5 fail with 3.0.1 and FireFox -- Key: TS-888 URL: https://issues.apache.org/jira/browse/TS-888 Project: Traffic Server Issue Type: Bug Components: SSL Affects Versions: 3.0.1 Environment: Ubuntu 10.04 LTS amd64, Glassfish 3.0.1, FireFox 5.0 Reporter: Kurt Huwig Assignee: Leif Hedstrom Fix For: 3.1.0 Attachments: TS-888-jp.patch ATS has SSL server certificates. The backend is accessed via SSL as well which uses the same certificates. It fails with FireFox, but works with Google Chrome. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-844) ReadFromWriter fail in CacheRead.cc
[ https://issues.apache.org/jira/browse/TS-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13076013#comment-13076013 ] John Plevyak commented on TS-844: - I'd like to know what the top of the stack looked like and also what "fail" means in this context. The patch is safe in the sense that it is conservative, but if a write has been closed but not yet been written into the aggregation buffer, this patch will prevent that data from being available for a ReadFromWriter. At least that is how I read it. What I am wondering is: what about a closed but not yet written CacheVC is making ReadFromWriter fail? ReadFromWriter fail in CacheRead.cc --- Key: TS-844 URL: https://issues.apache.org/jira/browse/TS-844 Project: Traffic Server Issue Type: Bug Reporter: mohan_zl Fix For: 3.1.0 Attachments: TS-844.patch {code} #6 0x006ab4d7 in CacheVC::openReadChooseWriter (this=0x2aaaf81523d0, event=1, e=0x0) at CacheRead.cc:320 #7 0x006abdc9 in CacheVC::openReadFromWriter (this=0x2aaaf81523d0, event=1, e=0x0) at CacheRead.cc:411 #8 0x004d302f in Continuation::handleEvent (this=0x2aaaf81523d0, event=1, data=0x0) at I_Continuation.h:146 #9 0x006ae2b9 in Cache::open_read (this=0x2aaab0001c40, cont=0x2aaab4472aa0, key=0x42100b10, request=0x2aaab44710f0, params=0x2aaab4470928, type=CACHE_FRAG_TYPE_HTTP, hostname=0x2aab09581049 js.tongji.linezing.comicon1.gifjs.tongji.linezing.com..., host_len=22) at CacheRead.cc:228 #10 0x0068da30 in Cache::open_read (this=0x2aaab0001c40, cont=0x2aaab4472aa0, url=0x2aaab4471108, request=0x2aaab44710f0, params=0x2aaab4470928, type=CACHE_FRAG_TYPE_HTTP) at P_CacheInternal.h:1068 #11 0x0067d32f in CacheProcessor::open_read 
(this=0xf2c030, cont=0x2aaab4472aa0, url=0x2aaab4471108, request=0x2aaab44710f0, params=0x2aaab4470928, pin_in_cache=0, type=CACHE_FRAG_TYPE_HTTP) at Cache.cc:3011 #12 0x0054e058 in HttpCacheSM::do_cache_open_read (this=0x2aaab4472aa0) at HttpCacheSM.cc:220 #13 0x0054e1a7 in HttpCacheSM::open_read (this=0x2aaab4472aa0, url=0x2aaab4471108, hdr=0x2aaab44710f0, params=0x2aaab4470928, pin_in_cache=0) at HttpCacheSM.cc:252 #14 0x00568404 in HttpSM::do_cache_lookup_and_read (this=0x2aaab4470830) at HttpSM.cc:3893 #15 0x005734b5 in HttpSM::set_next_state (this=0x2aaab4470830) at HttpSM.cc:6436 #16 0x0056115a in HttpSM::call_transact_and_set_next_state (this=0x2aaab4470830, f=0) at HttpSM.cc:6328 #17 0x00574b78 in HttpSM::handle_api_return (this=0x2aaab4470830) at HttpSM.cc:1516 #18 0x0056dbe7 in HttpSM::state_api_callout (this=0x2aaab4470830, event=0, data=0x0) at HttpSM.cc:1448 #19 0x0056de77 in HttpSM::do_api_callout_internal (this=0x2aaab4470830) at HttpSM.cc:4345 #20 0x00578c89 in HttpSM::do_api_callout (this=0x2aaab4470830) at HttpSM.cc:497 #21 0x00572e93 in HttpSM::set_next_state (this=0x2aaab4470830) at HttpSM.cc:6362 #22 0x0056115a in HttpSM::call_transact_and_set_next_state (this=0x2aaab4470830, f=0) at HttpSM.cc:6328 #23 0x00572faf in HttpSM::set_next_state (this=0x2aaab4470830) at HttpSM.cc:6378 #24 0x0056115a in HttpSM::call_transact_and_set_next_state (this=0x2aaab4470830, f=0) at HttpSM.cc:6328 #25 0x00574b78 in HttpSM::handle_api_return (this=0x2aaab4470830) at HttpSM.cc:1516 #26 0x0056dbe7 in HttpSM::state_api_callout (this=0x2aaab4470830, event=0, data=0x0) at HttpSM.cc:1448 #27 0x0056de77 in HttpSM::do_api_callout_internal (this=0x2aaab4470830) at HttpSM.cc:4345 #28 0x00578c89 in HttpSM::do_api_callout (this=0x2aaab4470830) at HttpSM.cc:497 #29 0x00572e93 in HttpSM::set_next_state (this=0x2aaab4470830) at HttpSM.cc:6362 #30 0x0056115a in HttpSM::call_transact_and_set_next_state (this=0x2aaab4470830, f=0) at HttpSM.cc:6328 #31 0x00574b78 in 
HttpSM::handle_api_return (this=0x2aaab4470830) at HttpSM.cc:1516 #32 0x0056dbe7 in HttpSM::state_api_callout (this=0x2aaab4470830, event=0, data=0x0) at HttpSM.cc:1448 #33 0x0056de77 in HttpSM::do_api_callout_internal (this=0x2aaab4470830) at HttpSM.cc:4345 #34 0x00578c89 in HttpSM::do_api_callout (this=0x2aaab4470830) at HttpSM.cc:497 #35 0x00572e93 in HttpSM::set_next_state (this=0x2aaab4470830) at
[jira] [Commented] (TS-866) Need way to clear contents of a cache entry
[ https://issues.apache.org/jira/browse/TS-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070234#comment-13070234 ] John Plevyak commented on TS-866: - Sorry for the delay. I am looking at this patch. It needs a little bit of work: 1) it should be built on remove instead of read (it can still share internal states with read using the stack mechanism) 2) it should interlock writes from the aggregation buffer if they would overlap these writes 3) it needs to support clustering These are not huge changes, but they will require a bit of work. There are other features which need to touch this code as well, so I'll poke around. Need way to clear contents of a cache entry --- Key: TS-866 URL: https://issues.apache.org/jira/browse/TS-866 Project: Traffic Server Issue Type: New Feature Components: Cache Affects Versions: 3.0.0 Reporter: William Bardwell Priority: Minor Fix For: 3.1.0 Attachments: cache_erase.diff I needed a way to clear a cache entry off of disk, not just forget about it. The worry was that if you got content on a server that was illegal or a privacy violation of some sort, we wanted a way to be able to tell customers that after this step there was no way that TS could serve the content again. The normal cache remove just clears the directory entry, but theoretically a bug could allow that data out in some way. 
[jira] [Issue Comment Edited] (TS-866) Need way to clear contents of a cache entry
[ https://issues.apache.org/jira/browse/TS-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070234#comment-13070234 ]

John Plevyak edited comment on TS-866 at 7/24/11 7:25 PM:
----------------------------------------------------------

Sorry for the delay. I am looking at this patch. It needs a little bit of work:

1) it should be built on remove instead of read (it can still share internal states with read using the stack mechanism)
2) it should interlock writes from the aggregation buffer if they would overlap these writes
3) it needs to support clustering

These are not huge changes, but they will require a bit of work. There are other features which need to touch this code as well, so I'll poke around.

was (Author: jplevyak):
Sorry for the delay. I am looking at this patch. It needs a little bit of work:

1) it should be built on remove instead of read (it can still share internal states with using the stack mechanism)
2) it should interlock writes from the aggregation buffer if they would overlap these writes
3) it needs to support clustering

These are not huge changes, but they will require a bit of work. There are other features which need to touch this code as well, so I'll poke around.

> I needed a way to clear a cache entry off of disk, not just forget about it. This was not intended to prevent forensic analysis of the hardware being able to recover the data. And bugs in low-level drivers or the kernel could theoretically allow data to survive due to block remapping or mismanagement of disk caches.
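The distinction the issue draws — clearing the directory entry versus actually scrubbing the stored bytes — can be sketched generically. This is an illustrative model only, not the cache_erase.diff implementation; the `CacheEntry`, `cache_remove`, and `cache_erase` names are hypothetical:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an on-disk cache fragment plus its directory entry.
struct CacheEntry {
    std::vector<uint8_t> data;   // payload bytes
    bool present = true;         // "directory entry" visibility
};

// Plain remove: only the directory entry is cleared; the bytes survive,
// which is the exposure the issue worries about.
void cache_remove(CacheEntry &e) { e.present = false; }

// Erase: overwrite the payload first, then clear the directory entry,
// so no later bug can hand the old bytes back out.
void cache_erase(CacheEntry &e) {
    if (!e.data.empty())
        std::memset(e.data.data(), 0, e.data.size());
    cache_remove(e);
}
```

In the real cache the scrub would have to be coordinated with in-flight aggregation-buffer writes and with cluster peers, which is exactly the extra work the review comment lists.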
[jira] [Commented] (TS-848) Crash Report: ShowNet::showConnectionsOnThread - ShowCont::show
[ https://issues.apache.org/jira/browse/TS-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070236#comment-13070236 ]

John Plevyak commented on TS-848:
---------------------------------

Gack, many of those values (e.g. nbytes) are now 64-bit %lld.

> Crash Report: ShowNet::showConnectionsOnThread - ShowCont::show
> ---------------------------------------------------------------
>
>                 Key: TS-848
>                 URL: https://issues.apache.org/jira/browse/TS-848
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: HTTP
>    Affects Versions: 3.1.0
>            Reporter: Zhao Yongming
>              Labels: http_ui, network
>             Fix For: 3.1.1
>
> when we use the {net} http_ui network interface, it crashed with the following information
> {code}
> NOTE: Traffic Server received Sig 11: Segmentation fault
> /usr/bin/traffic_server - STACK TRACE:
> /usr/bin/traffic_server[0x51ba3e]
> /lib64/libpthread.so.0[0x3f89c0e7c0]
> [0x7fffd20544f8]
> /lib64/libc.so.6(vsnprintf+0x9a)[0x3f8906988a]
> /usr/bin/traffic_server(ShowCont::show(char const*, ...)+0x262)[0x638184]
> /usr/bin/traffic_server(ShowNet::showConnectionsOnThread(int, Event*)+0x481)[0x6ec7bf]
> /usr/bin/traffic_server(Continuation::handleEvent(int, void*)+0x6f)[0x4d302f]
> /usr/bin/traffic_server(EThread::process_event(Event*, int)+0x11e)[0x6f9978]
> /usr/bin/traffic_server(EThread::execute()+0x94)[0x6f9b6a]
> /usr/bin/traffic_server(main+0x10c7)[0x4ff74d]
> /lib64/libc.so.6(__libc_start_main+0xf4)[0x3f8901d994]
> /usr/bin/traffic_server(__gxx_personality_v0+0x491)[0x4b2149]
> /usr/bin/traffic_server(__gxx_personality_v0+0x491)[0x4b2149]
> [New process 31182]
> #0  0x003f890796d0 in strlen () from /lib64/libc.so.6
> (gdb) bt
> #0  0x003f890796d0 in strlen () from /lib64/libc.so.6
> #1  0x003f89046b69 in vfprintf () from /lib64/libc.so.6
> #2  0x003f8906988a in vsnprintf () from /lib64/libc.so.6
> #3  0x00638184 in ShowCont::show (this=0x2aaab44af600, s=0x7732b8 "<tr><td>%d</td><td>%s</td><td>%d</td><td>%d</td><td>%s</td><td>%d</td><td>%d secs ago</td><td>%d</td><td>%d</td><td>%d</td><td>%d</td><td>%d</td><td>%d</td><td>%d</td><td>%d secs</td><td>%d secs</td>"...) at ../../proxy/Show.h:62
> #4  0x006ec7bf in ShowNet::showConnectionsOnThread (this=0x2aaab44af600, event=1, e=0x2aaab5cc2080) at UnixNetPages.cc:75
> #5  0x004d302f in Continuation::handleEvent (this=0x2aaab44af600, event=1, data=0x2aaab5cc2080) at I_Continuation.h:146
> #6  0x006f9978 in EThread::process_event (this=0x2ae29010, e=0x2aaab5cc2080, calling_code=1) at UnixEThread.cc:140
> #7  0x006f9b6a in EThread::execute (this=0x2ae29010) at UnixEThread.cc:189
> #8  0x004ff74d in main (argc=3, argv=0x7fffd2054d88) at Main.cc:1958
> {code}
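The fix the comment implies — widening the format specifiers to match the now 64-bit fields — can be shown in isolation. A minimal sketch; `format_row` is an invented helper, not the actual Show.h template:

```cpp
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Fields such as nbytes grew from int to a 64-bit type. Passing a 64-bit
// value through a "%d" conversion makes vsnprintf read the varargs at the
// wrong width — which is how ShowCont::show ends up handing strlen a bogus
// pointer. Matching the width with the <cinttypes> macros fixes it:
int format_row(char *buf, std::size_t len, int64_t nbytes) {
    // BUG (old style, undefined behavior):
    //   std::snprintf(buf, len, "<td>%d</td>", nbytes);
    return std::snprintf(buf, len, "<td>%" PRId64 "</td>", nbytes);
}
```

`%lld` works too on platforms where `long long` is 64-bit; `PRId64` is the portable spelling.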
[jira] [Commented] (TS-848) Crash Report: ShowNet::showConnectionsOnThread - ShowCont::show
[ https://issues.apache.org/jira/browse/TS-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070292#comment-13070292 ]

John Plevyak commented on TS-848:
---------------------------------

I think this is fixed in 1150526, give it a try.
[jira] [Commented] (TS-834) Crash Report: InactivityCop::check_inactivity, event=2, UnixNet.cc:57
[ https://issues.apache.org/jira/browse/TS-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050605#comment-13050605 ]

John Plevyak commented on TS-834:
---------------------------------

zym, do you still see this with the patch?

> Crash Report: InactivityCop::check_inactivity, event=2, UnixNet.cc:57
> ---------------------------------------------------------------------
>
>                 Key: TS-834
>                 URL: https://issues.apache.org/jira/browse/TS-834
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.1.0
>         Environment: current trunk (the same time as v3.0), --enable-debug
>            Reporter: Zhao Yongming
>              Labels: UnixNet
>         Attachments: TS-834.diff
>
> bt #1
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaaf4091b70, event=1, data=0x4b2d6d0) at I_Continuation.h:146
> 146         return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaaf4091b70, event=1, data=0x4b2d6d0) at I_Continuation.h:146
> #1  0x006ce196 in InactivityCop::check_inactivity (this=0x4b3f780, event=2, e=0x4b2d6d0) at UnixNet.cc:57
> #2  0x004d2c5f in Continuation::handleEvent (this=0x4b3f780, event=2, data=0x4b2d6d0) at I_Continuation.h:146
> #3  0x006f5830 in EThread::process_event (this=0x2ae29010, e=0x4b2d6d0, calling_code=2) at UnixEThread.cc:140
> #4  0x006f5b72 in EThread::execute (this=0x2ae29010) at UnixEThread.cc:217
> #5  0x004ff37d in main (argc=3, argv=0x7fff6f447418) at Main.cc:1958
> (gdb) info f
> Stack level 0, frame at 0x7fff6f446cb0:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) (I_Continuation.h:146); saved rip 0x6ce196
>  called by frame at 0x7fff6f446d00
>  source language c++.
>  Arglist at 0x7fff6f446ca0, args: this=0x2aaaf4091b70, event=1, data=0x4b2d6d0
>  Locals at 0x7fff6f446ca0, Previous frame's sp is 0x7fff6f446cb0
>  Saved registers:
>   rbp at 0x7fff6f446ca0, rip at 0x7fff6f446ca8
> (gdb) x/80x this
> 0x2aaaf4091b70: 0x0076a830 0x 0x006d1902 0x
> 0x2aaaf4091b80: 0x 0x 0x0076a290 0x
> 0x2aaaf4091b90: 0x 0x 0x 0x
> 0x2aaaf4091ba0: 0x 0x 0x 0x
> 0x2aaaf4091bb0: 0x 0x 0x 0x
> 0x2aaaf4091bc0: 0x 0x 0x 0x
> 0x2aaaf4091bd0: 0x 0x 0x 0x
> 0x2aaaf4091be0: 0x 0x 0x 0x
> 0x2aaaf4091bf0: 0x 0x 0x 0x
> 0x2aaaf4091c00: 0x 0x 0x 0x
> 0x2aaaf4091c10: 0x 0x 0x 0x
> 0x2aaaf4091c20: 0x 0x 0x 0x
> 0x2aaaf4091c30: 0x 0x 0x 0x
> 0x2aaaf4091c40: 0x 0x 0x 0x
> 0x2aaaf4091c50: 0x 0x 0x 0x
> 0x2aaaf4091c60: 0x 0x 0x 0x
> 0x2aaaf4091c70: 0x 0x 0x 0x
> 0x2aaaf4091c80: 0x 0x 0x 0x
> 0x2aaaf4091c90: 0x 0x 0x 0x
> 0x2aaaf4091ca0: 0x 0x 0x 0x
> {code}
> bt #2
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x11ed6000, event=1, data=0x11cbc610) at I_Continuation.h:146
> 146         return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x11ed6000, event=1, data=0x11cbc610) at I_Continuation.h:146
> #1  0x006ce196 in InactivityCop::check_inactivity (this=0x2c001f50, event=2, e=0x11cbc610) at UnixNet.cc:57
> #2  0x004d2c5f in Continuation::handleEvent (this=0x2c001f50, event=2, data=0x11cbc610) at I_Continuation.h:146
> #3  0x006f5830 in EThread::process_event (this=0x2af2a010, e=0x11cbc610, calling_code=2) at UnixEThread.cc:140
> #4  0x006f5b72 in EThread::execute (this=0x2af2a010) at UnixEThread.cc:217
> #5  0x006f5181 in spawn_thread_internal (a=0x11cadae0) at Thread.cc:88
> #6  0x0030ec2064a7 in start_thread () from /lib64/libpthread.so.0
> #7  0x0030eb6d3c2d in clone () from /lib64/libc.so.6
> (gdb) info f
> Stack level 0, frame at 0x4198df60:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) (I_Continuation.h:146); saved rip 0x6ce196
>  called by frame at 0x4198dfb0
>  source language c++.
>  Arglist at 0x4198df50, args: this=0x11ed6000, event=1, data=0x11cbc610
>  Locals at 0x4198df50, Previous frame's sp is 0x4198df60
>  Saved
[jira] [Commented] (TS-833) Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, ink_freelist_free related
[ https://issues.apache.org/jira/browse/TS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050620#comment-13050620 ]

John Plevyak commented on TS-833:
---------------------------------

mohan_zl, this latest crash is with TS-833-3.diff ??

> Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, ink_freelist_free related
> ---------------------------------------------------------------------------------------
>
>                 Key: TS-833
>                 URL: https://issues.apache.org/jira/browse/TS-833
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.1.0
>         Environment: current trunk, with --enable-debug
>            Reporter: Zhao Yongming
>              Labels: freelist
>         Attachments: TS-833-2.diff, TS-833-3.diff, TS-833.diff
>
> bt #1
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, event=2, data=0x197c4fc0) at I_Continuation.h:146
> 146         return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, event=2, data=0x197c4fc0) at I_Continuation.h:146
> #1  0x006f5830 in EThread::process_event (this=0x2ae29010, e=0x197c4fc0, calling_code=2) at UnixEThread.cc:140
> #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at UnixEThread.cc:217
> #3  0x004ff37d in main (argc=3, argv=0x7fff76c41528) at Main.cc:1958
> (gdb) info f
> Stack level 0, frame at 0x7fff76c40e40:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) (I_Continuation.h:146); saved rip 0x6f5830
>  called by frame at 0x7fff76c40eb0
>  source language c++.
>  Arglist at 0x7fff76c40e30, args: this=0x19581df0, event=2, data=0x197c4fc0
>  Locals at 0x7fff76c40e30, Previous frame's sp is 0x7fff76c40e40
>  Saved registers:
>   rbp at 0x7fff76c40e30, rip at 0x7fff76c40e38
> (gdb) x/40x this
> 0x19581df0: 0x19581901 0x 0xefbeadde 0xefbeadde
> 0x19581e00: 0xefbeadde 0xefbeadde 0xefbeadde 0xefbeadde
> 0x19581e10: 0xefbeadde 0xefbeadde 0xefbeadde 0xefbeadde
> 0x19581e20: 0xefbeadde 0xefbeadde 0xefbeadde 0xefbeadde
> 0x19581e30: 0xefbeadde 0xefbeadde 0xefbeadde 0xefbeadde
> 0x19581e40: 0xefbeadde 0xefbeadde 0xefbeadde 0xefbeadde
> 0x19581e50: 0xefbeadde 0xefbeadde 0xefbeadde 0xefbeadde
> 0x19581e60: 0xefbeadde 0xefbeadde 0xefbeadde 0xefbeadde
> 0x19581e70: 0xefbeadde 0xefbeadde 0xefbeadde 0xefbeadde
> 0x19581e80: 0xefbeadde 0xefbeadde 0xefbeadde 0xefbeadde
> {code}
> bt #2
> {code}
> #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, data=0xc4408a0) at I_Continuation.h:146
> 146         return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, data=0xc4408a0) at I_Continuation.h:146
> #1  0x0070364c in EThread::process_event (this=0x2ae29010, e=0xc4408a0, calling_code=2) at UnixEThread.cc:140
> #2  0x0070398e in EThread::execute (this=0x2ae29010) at UnixEThread.cc:217
> #3  0x00502aac in main (argc=3, argv=0x7fff32ef2f58) at Main.cc:1961
> (gdb) p *this
> $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x2aaab002f011}, handler = 0xefbeaddeefbeadde, this adjustment -1171307680053154338,
>   handler_name = 0xefbeaddeefbeadde <Address 0xefbeaddeefbeadde out of bounds>, mutex = {m_ptr = 0xefbeaddeefbeadde},
>   link = {SLink<Continuation> = {next = 0xefbeaddeefbeadde}, prev = 0xefbeaddeefbeadde}}
> (gdb)
> {code}
> bt #3
> {code}
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, event=2, data=0x2aaab00d1570) at I_Continuation.h:146
> 146         return (this->*handler) (event, data);
> (gdb) bt
> #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, event=2, data=0x2aaab00d1570) at I_Continuation.h:146
> #1  0x006f5830 in EThread::process_event (this=0x2ae29010, e=0x2aaab00d1570, calling_code=2) at UnixEThread.cc:140
> #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at UnixEThread.cc:217
> #3  0x004ff37d in main (argc=3, argv=0x7fff421f08d8) at Main.cc:1958
> (gdb) info f
> Stack level 0, frame at 0x7fff421f01f0:
>  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) (I_Continuation.h:146); saved rip 0x6f5830
>  called by frame at 0x7fff421f0260
>  source language c++.
>  Arglist at 0x7fff421f01e0, args: this=0x2aaab00615b0, event=2, data=0x2aaab00d1570
>  Locals at 0x7fff421f01e0, Previous frame's sp is 0x7fff421f01f0
>  Saved registers:
>   rbp at 0x7fff421f01e0, rip at 0x7fff421f01e8
> (gdb) p this->handler
> $1 = 0xefbeaddeefbeadde, this adjustment -1171307680053154338
> {code}
[jira] [Updated] (TS-833) Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, ink_freelist_free related
[ https://issues.apache.org/jira/browse/TS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Plevyak updated TS-833:
----------------------------
    Attachment: TS-833-2.diff

This is a possible patch which deals with DNS issues.
[jira] [Updated] (TS-833) Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, ink_freelist_free related
[ https://issues.apache.org/jira/browse/TS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Plevyak updated TS-833:
----------------------------
    Attachment: TS-833-3.diff

Even more conservative coding style.
[jira] [Commented] (TS-833) Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, ink_freelist_free related
[ https://issues.apache.org/jira/browse/TS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048750#comment-13048750 ]

John Plevyak commented on TS-833:
---------------------------------

I have a theory about this, but I am not sure why the problem has only manifested now, as it seems to have been in the codebase for a while. The theory is that the vc_next is bad because it has been closed as a result of the inactivity callback. This could be checked by walking down nh->open_list in the debugger (or code) to see if next_vc is in the list.
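The debugging suggestion above — walk nh->open_list to see whether next_vc is still in it — boils down to a membership scan before trusting a cached "next" pointer. A generic sketch with an invented singly linked node type (the real NetHandler list is an intrusive queue of net VCs, so the names here are hypothetical):

```cpp
// Minimal intrusive list node standing in for a net VC.
struct VC {
    VC *next = nullptr;
};

// Returns true if vc is still reachable from head. A cached next pointer
// that fails this test may point at an object already closed and recycled
// by the inactivity callback (hence the 0xdeadbeef freelist poison above).
bool in_open_list(const VC *head, const VC *vc) {
    for (const VC *p = head; p != nullptr; p = p->next)
        if (p == vc)
            return true;
    return false;
}
```

In production code the usual cure is not to scan but to re-read the next pointer only after the callback returns, or to defer closes until the iteration finishes; the scan is the diagnostic the comment proposes.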
[jira] [Updated] (TS-811) libtool configure warnings on Fedora 15
[ https://issues.apache.org/jira/browse/TS-811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Plevyak updated TS-811:
----------------------------
    Priority: Major  (was: Minor)

autoreconf -i fails. If the configure file is built on some other machine, then it will work fine, so this only impacts developers :) The resulting configure and the Makefiles are not functional:

{code}
make[3]: Entering directory `/a/home/jplevyak/projects/ts/trafficserver-2.1.9-unstable/lib/ts'
/bin/sh ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -D_LARGEFILE64_SOURCE=1 -D_COMPILE64BIT_SOURCE=1 -D_GNU_SOURCE -D_REENTRANT -Dlinux -g -pipe -Wall -Werror -O3 -feliminate-unused-debug-symbols -fno-strict-aliasing -Wno-invalid-offsetof -MT Allocator.lo -MD -MP -MF .deps/Allocator.Tpo -c -o Allocator.lo Allocator.cc
../../libtool: line 2089: ./Allocator.cc: Permission denied
libtool: compile: g++ -DHAVE_CONFIG_H -I. -D_LARGEFILE64_SOURCE=1 -D_COMPILE64BIT_SOURCE=1 -D_GNU_SOURCE -D_REENTRANT -Dlinux -g -pipe -Wall -Werror -O3 -feliminate-unused-debug-symbols -fno-strict-aliasing -Wno-invalid-offsetof -MT Allocator.lo -MD -MP -MF .deps/Allocator.Tpo -c -fPIC -DPIC -o .libs/Allocator.o
g++: error: : No such file or directory
g++: fatal error: no input files
compilation terminated.
make[3]: *** [Allocator.lo] Error 1
make[3]: Leaving directory `/a/home/jplevyak/projects/ts/trafficserver-2.1.9-unstable/lib/ts'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/a/home/jplevyak/projects/ts/trafficserver-2.1.9-unstable/lib/ts'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/a/home/jplevyak/projects/ts/trafficserver-2.1.9-unstable/lib'
{code}

> libtool configure warnings on Fedora 15
> ---------------------------------------
>
>                 Key: TS-811
>                 URL: https://issues.apache.org/jira/browse/TS-811
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 2.1.9
>         Environment: Fedora 15 x86_64.
>            Reporter: John Plevyak
>             Fix For: 3.1.0
>
> configure.ac:465: warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected in body
> ../../lib/autoconf/lang.m4:194: AC_LANG_CONFTEST is expanded from...
> ../../lib/autoconf/general.m4:2662: _AC_LINK_IFELSE is expanded from...
> ../../lib/autoconf/general.m4:2679: AC_LINK_IFELSE is expanded from...
> build/libtool.m4:1084: _LT_SYS_MODULE_PATH_AIX is expanded from...
> build/libtool.m4:5428: _LT_LANG_CXX_CONFIG is expanded from...
> build/libtool.m4:816: _LT_LANG is expanded from...
> build/libtool.m4:799: LT_LANG is expanded from...
> build/libtool.m4:827: _LT_LANG_DEFAULT_CONFIG is expanded from...
> build/libtool.m4:143: _LT_SETUP is expanded from...
> build/libtool.m4:69: LT_INIT is expanded from...
> build/libtool.m4:107: AC_PROG_LIBTOOL is expanded from...
> configure.ac:465: the top level
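For context on the warning itself: the expansion chain shows it originates inside the bundled build/libtool.m4 (_LT_SYS_MODULE_PATH_AIX), so the practical fix is regenerating with a newer libtool release rather than editing configure.ac. The warning just means a raw program body reached AC_LINK_IFELSE without an AC_LANG_SOURCE wrapper; in macros under our own control it is silenced like this (illustrative fragment; `ts_link_ok` is a placeholder variable):

```m4
dnl An unwrapped body triggers "no AC_LANG_SOURCE call detected in body":
dnl   AC_LINK_IFELSE([int main(void) { return 0; }], ...)
dnl Wrapping the body in AC_LANG_SOURCE fixes it:
AC_LINK_IFELSE(
  [AC_LANG_SOURCE([[int main(void) { return 0; }]])],
  [ts_link_ok=yes],
  [ts_link_ok=no])
```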
[jira] [Updated] (TS-621) writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached
[ https://issues.apache.org/jira/browse/TS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak updated TS-621: Backport to Version: 3.0.1 Fix Version/s: (was: 2.1.9) 3.1 This change is just too risky to land in 3.0. We will make the change first thing in 3.1 and then backport if/when it proves stable. writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached - Key: TS-621 URL: https://issues.apache.org/jira/browse/TS-621 Project: Traffic Server Issue Type: Improvement Components: Cache Affects Versions: 2.1.5 Reporter: John Plevyak Assignee: John Plevyak Fix For: 3.1 Attachments: TS-621_cluster_zero_size_objects.patch, ts-621-jp-1.patch, ts-621-jp-2.patch, ts-621-jp-3.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-621) writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached
[ https://issues.apache.org/jira/browse/TS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036424#comment-13036424 ]

John Plevyak commented on TS-621:
---------------------------------

Obviously the patch needs to be fixed up a bit. The Cluster used the CacheDataType as a message type, so I hacked in:

{code}
enum CacheDataType {
  CACHE_DATA_SIZE = VCONNECTION_CACHE_DATA_BASE,
- CACHE_DATA_HTTP_INFO,
+ CACHE_DATA_HTTP_INFO_LEAVE_BODY,
+ CACHE_DATA_HTTP_INFO_REPLACE_BODY,
  CACHE_DATA_KEY,
  CACHE_DATA_RAM_CACHE_HIT_FLAG
};
{code}

which doesn't really make sense. The leave/replace bit should be encoded somewhere else in the message. The changes to CacheWrite are very tricky and I have little faith in them. We could land it, but we would need some serious testing...

> writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TS-621
>                 URL: https://issues.apache.org/jira/browse/TS-621
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Cache
>    Affects Versions: 2.1.5
>            Reporter: John Plevyak
>            Assignee: John Plevyak
>             Fix For: 2.1.9
>
>         Attachments: TS-621_cluster_zero_size_objects.patch, ts-621-jp-1.patch, ts-621-jp-2.patch, ts-621-jp-3.patch
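One way to read "the leave/replace bit should be encoded somewhere else in the message" is a separate flags field alongside the type, so the enum stays a pure type tag instead of growing an enumerator per flag combination. A hypothetical sketch (the field and constant names are invented, and the real enum bases its values on VCONNECTION_CACHE_DATA_BASE):

```cpp
#include <cstdint>

// Type tag only — no *_LEAVE_BODY / *_REPLACE_BODY pair needed.
enum CacheDataType : std::uint8_t {
    CACHE_DATA_SIZE,
    CACHE_DATA_HTTP_INFO,
    CACHE_DATA_KEY,
    CACHE_DATA_RAM_CACHE_HIT_FLAG
};

// Hypothetical flag bit carried next to the type in the cluster message.
constexpr std::uint8_t CACHE_MSG_REPLACE_BODY = 1u << 0;

struct CacheMsgHeader {
    CacheDataType type;    // what the payload is
    std::uint8_t  flags;   // how to apply it: bit 0 = replace body
};
```

Adding a second flag later then costs one bit, not two more enumerators and a wider message-type dispatch.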
[jira] [Updated] (TS-621) writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached
[ https://issues.apache.org/jira/browse/TS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Plevyak updated TS-621: Attachment: ts-621-jp-3.patch This one works... but I would consider it very risky. writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached - Key: TS-621 URL: https://issues.apache.org/jira/browse/TS-621 Project: Traffic Server Issue Type: Improvement Components: Cache Affects Versions: 2.1.5 Reporter: John Plevyak Assignee: John Plevyak Fix For: 2.1.9 Attachments: ts-621-jp-1.patch, ts-621-jp-2.patch, ts-621-jp-3.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-621) writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached
[ https://issues.apache.org/jira/browse/TS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034408#comment-13034408 ] John Plevyak commented on TS-621: - lol, my testing tool treats 0 length as varied testing now. john writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached - Key: TS-621 URL: https://issues.apache.org/jira/browse/TS-621 Project: Traffic Server Issue Type: Improvement Components: Cache Affects Versions: 2.1.5 Reporter: John Plevyak Assignee: John Plevyak Fix For: 2.1.9 Attachments: ts-621-jp-1.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-621) writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached
[ https://issues.apache.org/jira/browse/TS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034424#comment-13034424 ] John Plevyak commented on TS-621: - yes, the HTTP state machine needs some more changes, and these are beyond me. I changed it so that it makes the correct calls to the cache, but it seems that content-length of 0 is hard-wired into HttpSM as an error. The problem emerges in:
{code}
(this=0x7fffea3e01c0, event=103, c=0x7fffea3e1f40) at HttpSM.cc:3162
3162      c->vc->do_io_close(EHTTP_ERROR);
(gdb) list
3157      // we got a truncated header from the origin server
3158      // but decided to accpet it anyways
3159      if (c->write_vio == NULL) {
3160        *status_ptr = HttpTransact::CACHE_WRITE_ERROR;
3161        c->write_success = false;
3162        c->vc->do_io_close(EHTTP_ERROR);
3163      } else {
3164        *status_ptr = HttpTransact::CACHE_WRITE_COMPLETE;
3165        c->write_success = true;
3166        c->write_vio = c->vc->do_io(VIO::CLOSE);
{code}
It seems that c->write_vio is NULL, which causes the HttpSM to close the cache with an error. It is easy to test... just put a breakpoint in CacheVC::openWriteClose. The close should be without error. writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached - Key: TS-621 URL: https://issues.apache.org/jira/browse/TS-621 Project: Traffic Server Issue Type: Improvement Components: Cache Affects Versions: 2.1.5 Reporter: John Plevyak Assignee: John Plevyak Fix For: 2.1.9 Attachments: ts-621-jp-1.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
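The failure mode in the branch quoted above is that write_vio == NULL is ambiguous: it means "truncated origin response" for normal writes but is perfectly legitimate for a 0-byte header-only update. One illustrative way out is to thread through an explicit header-only bit; the names and structure below are invented for illustration, not the real HttpSM code:

```cpp
#include <cassert>

// Hedged sketch of the decision at HttpSM.cc:3162: only treat a NULL
// write_vio as a cache-write error when a body was actually expected.
enum CacheWriteStatus { CACHE_WRITE_ERROR, CACHE_WRITE_COMPLETE };

CacheWriteStatus close_status(bool write_vio_null, bool header_only_update) {
  if (write_vio_null && !header_only_update)
    return CACHE_WRITE_ERROR;     // truncated origin response: close with error
  return CACHE_WRITE_COMPLETE;    // header-only update: close without error
}
```

With that bit available, the breakpoint test described above (CacheVC::openWriteClose closing without error) would pass for the 0-byte case.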
[jira] [Commented] (TS-779) Set thread name for various event types
[ https://issues.apache.org/jira/browse/TS-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033807#comment-13033807 ] John Plevyak commented on TS-779: - Nice! Looks good. Set thread name for various event types --- Key: TS-779 URL: https://issues.apache.org/jira/browse/TS-779 Project: Traffic Server Issue Type: New Feature Components: Core Reporter: Leif Hedstrom Assignee: Leif Hedstrom Attachments: TS-779.diff Where supported, I'd like to set the thread name (using prctl) for the various event threads that we have. This makes it much easier to see what type of thread is consuming resources. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
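For reference, the prctl-based thread naming described in TS-779 looks roughly like this on Linux (the thread-name string is just an example; the 16-byte limit, including the NUL, is the kernel's comm-field limit):

```cpp
#include <sys/prctl.h>
#include <cstring>
#include <cassert>

// Name the calling thread so it shows up in /proc/<pid>/task/<tid>/comm
// and in tools like `top -H` or `ps -L` -- which is exactly what makes it
// easy to see which event-thread type is consuming resources.
static void set_thread_name(const char *name) {
  char buf[16] = {0};                  // comm is limited to 16 bytes incl. NUL
  strncpy(buf, name, sizeof(buf) - 1);
  prctl(PR_SET_NAME, buf, 0, 0, 0);
}
```

Each event thread would call this once at startup with its type name; PR_GET_NAME can read it back for verification.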
[jira] [Commented] (TS-773) Traffic server has a hard limit of 512 gigabytes per RAW disk partition
[ https://issues.apache.org/jira/browse/TS-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033816#comment-13033816 ] John Plevyak commented on TS-773: - Well, I did predict failure :) In this case the problem was that the directory can now be more than 2GB in size, which exceeds an 'int'. The resulting patch touched a lot of the system because it also means we can now read and write 2GB in a single go. I have submitted a patch and tested on a faked disk over 2TB (I had Store.cc lie about the size of the disk). Give it a go. Traffic server has a hard limit of 512 gigabytes per RAW disk partition --- Key: TS-773 URL: https://issues.apache.org/jira/browse/TS-773 Project: Traffic Server Issue Type: Bug Components: Cache Affects Versions: 2.1.8 Environment: Debian Lenny 5.0.8 2.6.34.7 x86_64 12 1.5TB harddrives for cache disks. Reporter: David Robinson Assignee: John Plevyak Fix For: 2.1.9 Using 1.5TB harddrives as cache disks results in ATS only using 512GBs of the disk. The disks are configured in RAW mode with no partition information. storage.config is setup like this, /dev/sda /dev/sdb /dev/sde /dev/sdf /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo fdisk -l /dev/sdo Disk /dev/sdo: 1500.3 GB, 1500301910016 bytes 255 heads, 63 sectors/track, 182401 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x Partitioning a disk into 3 512G partition and adding then to storage.config will make ATS use the entire 1.5TBs of space. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
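The 'int' overflow described in the comment above is easy to demonstrate: once the directory for a multi-terabyte volume grows past 2GB, any size computed in a 32-bit int wraps negative. The bytes-per-directory-entry ratio below is purely illustrative, not the exact ATS constant:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative directory sizing: assume ~10 bytes of directory per 8KB of
// disk. For a 2TB volume that is ~2.68GB of directory, which no longer
// fits in a signed 32-bit int.
int32_t dir_bytes_32(int64_t disk_bytes) {
  return (int32_t)(disk_bytes / 8192 * 10);  // wraps negative for big disks
}
int64_t dir_bytes_64(int64_t disk_bytes) {
  return disk_bytes / 8192 * 10;             // correct with 64-bit sizes
}
```

A negative "size" then poisons every downstream offset/length computation, which is why the fix had to touch so much of the I/O path.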
[jira] [Commented] (TS-621) writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached
[ https://issues.apache.org/jira/browse/TS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033817#comment-13033817 ] John Plevyak commented on TS-621: - I'd like to get it in. The concern, however, is that it changes the API, which means that it will break clustering, so we have to get cluster changes/testing before committing. john writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached - Key: TS-621 URL: https://issues.apache.org/jira/browse/TS-621 Project: Traffic Server Issue Type: Improvement Components: Cache Affects Versions: 2.1.5 Reporter: John Plevyak Assignee: John Plevyak Fix For: 2.1.9 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-773) Traffic server has a hard limit of 512 gigabytes per RAW disk partition
[ https://issues.apache.org/jira/browse/TS-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032061#comment-13032061 ] John Plevyak commented on TS-773: - I checked in what I hope is the fix. The cplist code needs to be cleaned up as it is not clear which counts are counting bytes/store blocks/disk volume blocks. WARNING: this fix required changing the disk structure which will result in a cache WIPE! Traffic server has a hard limit of 512 gigabytes per RAW disk partition --- Key: TS-773 URL: https://issues.apache.org/jira/browse/TS-773 Project: Traffic Server Issue Type: Bug Components: Cache Affects Versions: 2.1.8 Environment: Debian Lenny 5.0.8 2.6.34.7 x86_64 12 1.5TB harddrives for cache disks. Reporter: David Robinson Assignee: John Plevyak Fix For: 2.1.9 Using 1.5TB harddrives as cache disks results in ATS only using 512GBs of the disk. The disks are configured in RAW mode with no partition information. storage.config is setup like this, /dev/sda /dev/sdb /dev/sde /dev/sdf /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo fdisk -l /dev/sdo Disk /dev/sdo: 1500.3 GB, 1500301910016 bytes 255 heads, 63 sectors/track, 182401 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x Partitioning a disk into 3 512G partition and adding then to storage.config will make ATS use the entire 1.5TBs of space. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-773) Traffic server has a hard limit of 512 gigabytes per RAW disk partition
[ https://issues.apache.org/jira/browse/TS-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032119#comment-13032119 ] John Plevyak commented on TS-773: - Unfortunately, the hosting code is pretty complicated and poorly written. I have tried to patch it, but it really should have a more comprehensive audit. If this patch fixes the immediate problem (which it seems to have, but I'd like independent confirmation) we can put off the audit till after 3.0. So I don't foresee more changes if we get the fix verified but I wouldn't be surprised if there was still a problem under some circumstances... until a full audit. Traffic server has a hard limit of 512 gigabytes per RAW disk partition --- Key: TS-773 URL: https://issues.apache.org/jira/browse/TS-773 Project: Traffic Server Issue Type: Bug Components: Cache Affects Versions: 2.1.8 Environment: Debian Lenny 5.0.8 2.6.34.7 x86_64 12 1.5TB harddrives for cache disks. Reporter: David Robinson Assignee: John Plevyak Fix For: 2.1.9 Using 1.5TB harddrives as cache disks results in ATS only using 512GBs of the disk. The disks are configured in RAW mode with no partition information. storage.config is setup like this, /dev/sda /dev/sdb /dev/sde /dev/sdf /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo fdisk -l /dev/sdo Disk /dev/sdo: 1500.3 GB, 1500301910016 bytes 255 heads, 63 sectors/track, 182401 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x Partitioning a disk into 3 512G partition and adding then to storage.config will make ATS use the entire 1.5TBs of space. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-745) Support ssd
[ https://issues.apache.org/jira/browse/TS-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13031254#comment-13031254 ] John Plevyak commented on TS-745: - We should make a branch for this. Then we could collaborate. I would also like to simplify locking in the cache by getting rid of the TryLocks for the partition. This would make it easier to write the SSD code to handle multiple SSDs I think. What do you think mohan_zl? Support ssd --- Key: TS-745 URL: https://issues.apache.org/jira/browse/TS-745 Project: Traffic Server Issue Type: New Feature Components: Cache Reporter: mohan_zl Assignee: mohan_zl Attachments: ssd_cache.patch A patch for supporting, not work well for a long time with --enable-debug -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TS-752) cache scan issues
[ https://issues.apache.org/jira/browse/TS-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026758#comment-13026758 ] John Plevyak commented on TS-752: - I reviewed this and it looks good to me. I'd move vol_relative_length into the header (for symmetry). The fix for partial objects I think is nice as well. I don't see the crash fix in svn5.diff, so I am assuming that the two are independent. Since this patch is all about scan it should be safe. William, how much testing have you given this? If you are comfortable, I'll give it a smoke test and commit. cache scan issues - Key: TS-752 URL: https://issues.apache.org/jira/browse/TS-752 Project: Traffic Server Issue Type: Bug Components: Cache Affects Versions: 2.1.7, 2.1.6, 2.1.5, 2.1.4 Environment: Any Reporter: William Bardwell Assignee: Leif Hedstrom Fix For: 2.1.8 Attachments: svn4.diff, svn4.diff, svn5.diff Using the CacheScan plugin APIs I found a few issues. Issue 1 is that if you cancel a scan really quickly you can get a NULL dereference, the fix for this is easy. Issue 2 is that the cache scan code can skip over entries if the initial header overlaps a buffer boundary. Issue 3 is that the cache scan code is crazy slow if your cache is not full, it still scans everything. I will attach a patch for Issues 2 & 3 mixed together... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
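Issue 2 above — an entry header overlapping a read-buffer boundary — can be illustrated with a generic chunked scanner. The bug class is discarding the partial tail at the end of each buffer; the fix is to carry unconsumed bytes into the next read. This is a self-contained sketch, not the actual cache-scan code:

```cpp
#include <cassert>
#include <cstring>
#include <cstdint>
#include <vector>

// Records are a 4-byte native-endian length header followed by that many
// payload bytes. feed() may be called with arbitrary chunk boundaries; the
// `carry` buffer keeps any partial record so nothing is skipped.
struct ChunkScanner {
  std::vector<uint8_t> carry;
  int found = 0;

  void feed(const uint8_t *buf, size_t len) {
    carry.insert(carry.end(), buf, buf + len);
    size_t off = 0;
    while (carry.size() - off >= 4) {              // header fully available?
      uint32_t body;
      memcpy(&body, carry.data() + off, 4);
      if (carry.size() - off < 4 + body) break;    // record incomplete: wait
      ++found;
      off += 4 + body;
    }
    carry.erase(carry.begin(), carry.begin() + off);  // keep partial record
  }
};
```

A scanner without the `carry` buffer would silently drop the second record whenever its header straddled the chunk boundary, which matches the symptom described.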
[jira] [Commented] (TS-716) Crash in Continuation::handleEvent
[ https://issues.apache.org/jira/browse/TS-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025308#comment-13025308 ] John Plevyak commented on TS-716: - We should make another bug for this. I am not sure why, but it probably has something to do with one or more of the sites having a large round robin and a low TTL. It could also be exacerbated by a startup issue were a lot of outstanding DNS requests for the same host are done unnecessarily. There is supposed to be queuing in the DNS processor, but that may not be working as intended Crash in Continuation::handleEvent --- Key: TS-716 URL: https://issues.apache.org/jira/browse/TS-716 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 2.1.7 Environment: CentOS 5.4 x86_64, 6 * 2T SATA Disks, 48G Memory Reporter: Kissdev Assignee: John Plevyak Priority: Critical Fix For: 2.1.8 Attachments: crasher.patch ATS crashes with the following configuration: - reverse proxy , storage: 6 raw devices (6*2T), 1 partition (2T) - remap config:regex_map http://(.*) http://$1 The load : about 100Mbps, requests for top 4000 internet sites, mainly html,js,pictures,flashes Detail of crashes by core dump: crash #1: {code} #0 0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, event=1, data=0x90f7170) at I_Continuation.h:146 146 I_Continuation.h: No such file or directory. 
in I_Continuation.h (gdb) bt #0 0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, event=1, data=0x90f7170) at I_Continuation.h:146 #1 0x00702b80 in EThread::process_event (this=0x2b101010, e=0x90f7170, calling_code=1) at UnixEThread.cc:140 #2 0x00702fa1 in EThread::execute (this=0x2b101010) at UnixEThread.cc:232 #3 0x007024d2 in spawn_thread_internal (a=0x8d94a70) at Thread.cc:85 #4 0x0036ebc064a7 in start_thread () from /lib64/libpthread.so.0 #5 0x0036eb0d3c2d in clone () from /lib64/libc.so.6 (gdb) frame 0 #0 0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, event=1, data=0x90f7170) at I_Continuation.h:146 146 in I_Continuation.h (gdb) print *this $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x2aaaba360a11}, handler = virtual table offset -1157442765409226770, this adjustment -1157442765409226769, handler_name = 0xefefefefefefefef Address 0xefefefefefefefef out of bounds, mutex = {m_ptr = 0xefefefefefefefef}, link = {SLinkContinuation = { next = 0xefefefefefefefef}, prev = 0xefefefefefefefef}} {code} crash #2: {code} (gdb) bt #0 0x004dd17a in Continuation::handleEvent (this=0x2aaabc0bce80, event=1, data=0x154b5a80) at I_Continuation.h:146 #1 0x006db290 in InactivityCop::check_inactivity (this=0x154c8730, event=2, e=0x154b5a80) at UnixNet.cc:57 #2 0x004dd1bb in Continuation::handleEvent (this=0x154c8730, event=2, data=0x154b5a80) at I_Continuation.h:146 #3 0x00702b80 in EThread::process_event (this=0x2b606010, e=0x154b5a80, calling_code=2) at UnixEThread.cc:140 #4 0x00702ec2 in EThread::execute (this=0x2b606010) at UnixEThread.cc:217 #5 0x007024d2 in spawn_thread_internal (a=0x154852c0) at Thread.cc:85 #6 0x0036ebc064a7 in start_thread () from /lib64/libpthread.so.0 #7 0x0036eb0d3c2d in clone () from /lib64/libc.so.6 (gdb) frame 0 #0 0x004dd17a in Continuation::handleEvent (this=0x2aaabc0bce80, event=1, data=0x154b5a80) at I_Continuation.h:146 146 in I_Continuation.h (gdb) print *this $1 = {force_VFPT_to_top = 
{_vptr.force_VFPT_to_top = 0x16280061}, handler = virtual table offset -1157442765409226770, this adjustment -1157442765409226769, handler_name = 0xefefefefefefefef Address 0xefefefefefefefef out of bounds, mutex = {m_ptr = 0xefefefefefefefef}, link = {SLinkContinuation = { next = 0xefefefefefefefef}, prev = 0xefefefefefefefef}} (gdb) {code} crash #3: {code} (gdb) bt #0 0x004dd17a in Continuation::handleEvent (this=0x2aaab45d3a10, event=2, data=0x5631120) at I_Continuation.h:146 #1 0x00702b80 in EThread::process_event (this=0x2abfc010, e=0x5631120, calling_code=2) at UnixEThread.cc:140 #2 0x00702ec2 in EThread::execute (this=0x2abfc010) at UnixEThread.cc:217 #3 0x0050917c in main (argc=3, argv=0x7fff0af6e3b8) at Main.cc:1962 (gdb) frame 0 #0 0x004dd17a in Continuation::handleEvent (this=0x2aaab45d3a10, event=2, data=0x5631120) at I_Continuation.h:146 146 in I_Continuation.h (gdb) print *this $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x2aaab45df291}, handler = virtual table offset
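The repeated 0xef bytes in the backtraces above (handler_name = 0xefefefefefefefef, mutex.m_ptr = 0xefefefefefefefef) are the signature of a continuation that was freed and poisoned before being dereferenced — assuming a debug allocator that fills freed blocks with a fixed byte, which is an assumption here, not a confirmed ATS detail. The idea can be sketched as:

```cpp
#include <cassert>
#include <cstring>
#include <cstdint>

// Debug-allocator sketch: fill freed blocks with a recognizable byte so a
// use-after-free shows up in gdb as pointers like 0xefefefefefefefef,
// exactly as in the crash dumps above. The poison value and helper name
// are illustrative.
constexpr unsigned char POISON = 0xef;

void poison_free(void *p, size_t n) {
  memset(p, POISON, n);  // real code would then return p to the freelist
}
```

Any pointer field read out of such a block decodes to the all-0xef value, which is why every member of the dumped Continuation looks identical.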
[jira] [Commented] (TS-716) Crash in Continuation::handleEvent
[ https://issues.apache.org/jira/browse/TS-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024834#comment-13024834 ] John Plevyak commented on TS-716: - Just to confirm that this is not a configuration problem, try increasing: CONFIG proxy.config.hostdb.size INT 20 CONFIG proxy.config.hostdb.storage_size INT 33554432 by say a factor of 10. john Crash in Continuation::handleEvent --- Key: TS-716 URL: https://issues.apache.org/jira/browse/TS-716 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 2.1.7 Environment: CentOS 5.4 x86_64, 6 * 2T SATA Disks, 48G Memory Reporter: Kissdev Assignee: John Plevyak Priority: Critical Fix For: 2.1.8 Attachments: crasher.patch ATS crashes with the following configuration: - reverse proxy , storage: 6 raw devices (6*2T), 1 partition (2T) - remap config:regex_map http://(.*) http://$1 The load : about 100Mbps, requests for top 4000 internet sites, mainly html,js,pictures,flashes Detail of crashes by core dump: crash #1: {code} #0 0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, event=1, data=0x90f7170) at I_Continuation.h:146 146 I_Continuation.h: No such file or directory. 
in I_Continuation.h (gdb) bt #0 0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, event=1, data=0x90f7170) at I_Continuation.h:146 #1 0x00702b80 in EThread::process_event (this=0x2b101010, e=0x90f7170, calling_code=1) at UnixEThread.cc:140 #2 0x00702fa1 in EThread::execute (this=0x2b101010) at UnixEThread.cc:232 #3 0x007024d2 in spawn_thread_internal (a=0x8d94a70) at Thread.cc:85 #4 0x0036ebc064a7 in start_thread () from /lib64/libpthread.so.0 #5 0x0036eb0d3c2d in clone () from /lib64/libc.so.6 (gdb) frame 0 #0 0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, event=1, data=0x90f7170) at I_Continuation.h:146 146 in I_Continuation.h (gdb) print *this $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x2aaaba360a11}, handler = virtual table offset -1157442765409226770, this adjustment -1157442765409226769, handler_name = 0xefefefefefefefef Address 0xefefefefefefefef out of bounds, mutex = {m_ptr = 0xefefefefefefefef}, link = {SLinkContinuation = { next = 0xefefefefefefefef}, prev = 0xefefefefefefefef}} {code} crash #2: {code} (gdb) bt #0 0x004dd17a in Continuation::handleEvent (this=0x2aaabc0bce80, event=1, data=0x154b5a80) at I_Continuation.h:146 #1 0x006db290 in InactivityCop::check_inactivity (this=0x154c8730, event=2, e=0x154b5a80) at UnixNet.cc:57 #2 0x004dd1bb in Continuation::handleEvent (this=0x154c8730, event=2, data=0x154b5a80) at I_Continuation.h:146 #3 0x00702b80 in EThread::process_event (this=0x2b606010, e=0x154b5a80, calling_code=2) at UnixEThread.cc:140 #4 0x00702ec2 in EThread::execute (this=0x2b606010) at UnixEThread.cc:217 #5 0x007024d2 in spawn_thread_internal (a=0x154852c0) at Thread.cc:85 #6 0x0036ebc064a7 in start_thread () from /lib64/libpthread.so.0 #7 0x0036eb0d3c2d in clone () from /lib64/libc.so.6 (gdb) frame 0 #0 0x004dd17a in Continuation::handleEvent (this=0x2aaabc0bce80, event=1, data=0x154b5a80) at I_Continuation.h:146 146 in I_Continuation.h (gdb) print *this $1 = {force_VFPT_to_top = 
{_vptr.force_VFPT_to_top = 0x16280061}, handler = virtual table offset -1157442765409226770, this adjustment -1157442765409226769, handler_name = 0xefefefefefefefef Address 0xefefefefefefefef out of bounds, mutex = {m_ptr = 0xefefefefefefefef}, link = {SLinkContinuation = { next = 0xefefefefefefefef}, prev = 0xefefefefefefefef}} (gdb) {code} crash #3: {code} (gdb) bt #0 0x004dd17a in Continuation::handleEvent (this=0x2aaab45d3a10, event=2, data=0x5631120) at I_Continuation.h:146 #1 0x00702b80 in EThread::process_event (this=0x2abfc010, e=0x5631120, calling_code=2) at UnixEThread.cc:140 #2 0x00702ec2 in EThread::execute (this=0x2abfc010) at UnixEThread.cc:217 #3 0x0050917c in main (argc=3, argv=0x7fff0af6e3b8) at Main.cc:1962 (gdb) frame 0 #0 0x004dd17a in Continuation::handleEvent (this=0x2aaab45d3a10, event=2, data=0x5631120) at I_Continuation.h:146 146 in I_Continuation.h (gdb) print *this $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x2aaab45df291}, handler = virtual table offset -1157442765409226770, this adjustment -1157442765409226769, handler_name = 0xefefefefefefefef Address 0xefefefefefefefef out of bounds, mutex = {m_ptr = 0xefefefefefefefef}, link = {SLinkContinuation = {
[jira] [Commented] (TS-745) Support ssd
[ https://issues.apache.org/jira/browse/TS-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024839#comment-13024839 ] John Plevyak commented on TS-745: - Overall several comments:
1) Once we get it working we need to reorganize the patch so that we avoid code duplication.
2) The limitation of a single SSD is probably overly restrictive. SSDs are relatively inexpensive now and it is very likely that folks will want to have 1 per disk or better yet some combination of SSD and SATA.
3) The code uses xmalloc to store docs and creates another CacheVC to do the write in the background. We should be using the various allocators rather than xmalloc, and we should be adding states rather than creating another CacheVC. The write can occur after calling back the user, so there will be no additional latency.
4) The code doesn't seem to have provision for handling multi-fragment documents. There are some tradeoffs there, but in any case the issue needs to be considered. Handling of multi-fragment documents requires tighter control over when the write is done and if and when it is successful. Again, this would indicate we should be doing the write on the same CacheVC in additional states.
5) The code needs to deal with collisions after the initial directory is looked up. Again, this would be easier if the SSD code was operating in the same CacheVC.
Please think about these comments. Let's use http://codereview.appspot.com/ for detailed reviews. I have already added traffic server as a repository. I have imported the patch as well, but I think we still have some design discussion to do before we can get into the details. 
Support ssd --- Key: TS-745 URL: https://issues.apache.org/jira/browse/TS-745 Project: Traffic Server Issue Type: New Feature Components: Cache Reporter: mohan_zl Assignee: mohan_zl Attachments: ssd_cache.patch A patch for supporting, not work well for a long time with --enable-debug -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
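The recurring suggestion in the review comments above — do the SSD write on the same CacheVC by adding states, after the user callback, instead of spawning a second CacheVC — can be sketched as a minimal state machine. The states and fields are invented for illustration and bear no relation to the real CacheVC interface:

```cpp
#include <cassert>

// Toy VC: after the read completes, the user is called back first (so the
// mirror adds no latency), then an extra state performs the SSD copy on
// the same state machine instead of a second, separately-allocated VC.
struct MiniVC {
  enum State { READ_DONE, SSD_WRITE, DONE } state = READ_DONE;
  bool user_called_back = false;
  bool ssd_written = false;

  void run() {
    while (state != DONE) {
      switch (state) {
      case READ_DONE:
        user_called_back = true;  // callback happens before the SSD work
        state = SSD_WRITE;
        break;
      case SSD_WRITE:
        ssd_written = true;       // background copy into the SSD volume
        state = DONE;
        break;
      default:
        break;
      }
    }
  }
};
```

Keeping one state machine also gives the tighter control over write completion that the multi-fragment and collision points (4 and 5) call for.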
[jira] [Commented] (TS-724) disk IO balance in v2.1.7
[ https://issues.apache.org/jira/browse/TS-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022597#comment-13022597 ] John Plevyak commented on TS-724: - FYI a single document ends up on a single disk. If we could get the per-partition cache stats we could see what ATS thought was going on. disk IO balance in v2.1.7 - Key: TS-724 URL: https://issues.apache.org/jira/browse/TS-724 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 2.1.7 Environment: reporting from users, and confirm within my testing evn. v2.1.7 only Reporter: Zhao Yongming Priority: Critical Fix For: 2.1.9 when multiple disk enabled, the disk IO will show much diff in v2.1.7, here is my result on a 7 disk system result: {code:none} [root@cache189 ~]# iostat -x 5 Linux 2.6.18-164.11.1.el5 (cache189.cn8) 03/29/2011 avg-cpu: %user %nice %system %iowait %steal %idle 0.510.000.541.770.00 97.18 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sdb 0.00 0.00 11.80 0.00 360.80 0.0030.58 0.075.73 5.56 6.56 sdc 0.00 0.00 12.00 0.00 413.80 0.0034.48 0.075.42 5.30 6.36 sdd 0.00 0.00 11.60 1.40 375.80 820.8092.05 0.075.06 4.66 6.06 sde 0.00 0.00 13.00 14.00 722.60 8192.00 330.17 0.124.50 2.99 8.06 sdf 0.00 0.00 14.60 0.00 579.40 0.0039.68 0.117.48 7.04 10.28 sdg 0.00 0.00 49.20 0.00 18268.60 0.00 371.31 0.081.66 0.54 2.66 sdb 0.00 0.00 11.60 0.00 253.60 0.0021.86 0.065.45 5.12 5.94 sdc 0.00 0.00 15.80 0.00 738.20 0.0046.72 0.085.22 4.76 7.52 sdd 0.00 0.00 10.80 0.00 728.40 0.0067.44 0.065.81 5.48 5.92 sde 0.00 0.00 11.60 2.00 377.60 1027.20 103.29 0.075.18 4.75 6.46 sdf 0.00 0.00 14.60 0.00 473.60 0.0032.44 0.095.90 5.78 8.44 sdg 0.00 0.00 87.00 0.00 37454.80 0.00 430.51 0.374.26 0.82 7.12 sdb 0.00 0.00 15.80 0.00 786.40 0.0049.77 0.106.56 5.76 9.10 sdc 0.00 0.00 10.20 1.60 217.60 911.2095.66 0.064.93 4.51 5.32 sdd 0.00 0.00 13.00 0.00 665.00 0.0051.15 0.086.12 5.80 7.54 sde 0.00 0.00 11.60 0.00 419.40 0.0036.16 
0.065.43 5.17 6.00 sdf 0.00 0.00 11.00 1.40 315.00 826.8092.08 0.075.27 4.89 6.06 sdg 0.00 0.00 27.00 0.00 8629.60 0.00 319.61 0.020.87 0.37 1.00 sdb 0.00 0.00 12.80 0.00 380.00 0.0029.69 0.075.22 4.98 6.38 sdc 0.00 0.00 14.80 0.00 495.80 0.0033.50 0.085.39 5.19 7.68 sdd 0.00 0.00 10.40 0.00 267.40 0.0025.71 0.065.87 5.46 5.68 sde 0.00 0.00 12.20 0.00 691.20 0.0056.66 0.075.93 5.48 6.68 sdf 0.00 0.00 11.80 0.00 544.40 0.0046.14 0.075.83 5.63 6.64 sdg 0.00 0.00 57.00 0.00 22033.00 0.00 386.54 0.061.07 0.38 2.16 sdb 0.00 0.00 13.20 0.00 546.40 0.0041.39 0.085.73 5.73 7.56 sdc 0.00 0.00 14.00 0.00 583.60 0.0041.69 0.085.57 5.34 7.48 sdd 0.00 0.00 12.80 0.00 639.20 0.0049.94 0.075.61 5.14 6.58 sde 0.00 0.00 12.40 0.00 403.20 0.0032.52 0.26 20.98 11.03 13.68 sdf 0.00 0.00 15.00 0.00 475.80 0.0031.72 0.095.71 5.37 8.06 sdg 0.00 0.00 91.80 0.00 39239.00 0.00 427.44 0.576.24 0.76 6.94 sdb 0.00 0.00 10.60 0.00 326.60 0.0030.81 0.065.60 5.04 5.34 sdc 0.00 0.00 12.80 0.00 644.40 0.0050.34 0.075.72 5.27 6.74 sdd 0.00 0.00 14.80 0.00 624.00 0.0042.16 0.085.61 5.50 8.14 sde 0.00 0.00 9.20 0.00 283.00 0.0030.76 0.055.83 5.67 5.22 sdf 0.00 0.00 13.40 0.00 578.00 0.0043.13 0.075.39 5.15 6.90 sdg 0.00 0.00 12.80
[jira] [Commented] (TS-716) Crash in Continuation::handleEvent
[ https://issues.apache.org/jira/browse/TS-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13021644#comment-13021644 ] John Plevyak commented on TS-716: - I have committed 1095127, which fixes a problem with the DNS HostEnt and DNSEntry continuations, but it is consistent with only some of the stack traces seen here, so this bug stays open until we confirm a fix. Crash in Continuation::handleEvent --- Key: TS-716 URL: https://issues.apache.org/jira/browse/TS-716 Project: Traffic Server Issue Type: Bug Components: Core Affects Versions: 2.1.7 Environment: CentOS 5.4 x86_64, 6 * 2T SATA Disks, 48G Memory Reporter: Kissdev Assignee: John Plevyak Priority: Critical Fix For: 2.1.8 Attachments: crasher.patch ATS crashes with the following configuration: - reverse proxy , storage: 6 raw devices (6*2T), 1 partition (2T) - remap config:regex_map http://(.*) http://$1 The load : about 100Mbps, requests for top 4000 internet sites, mainly html,js,pictures,flashes Detail of crashes by core dump: crash #1: {code} #0 0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, event=1, data=0x90f7170) at I_Continuation.h:146 146 I_Continuation.h: No such file or directory. 
in I_Continuation.h (gdb) bt #0 0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, event=1, data=0x90f7170) at I_Continuation.h:146 #1 0x00702b80 in EThread::process_event (this=0x2b101010, e=0x90f7170, calling_code=1) at UnixEThread.cc:140 #2 0x00702fa1 in EThread::execute (this=0x2b101010) at UnixEThread.cc:232 #3 0x007024d2 in spawn_thread_internal (a=0x8d94a70) at Thread.cc:85 #4 0x0036ebc064a7 in start_thread () from /lib64/libpthread.so.0 #5 0x0036eb0d3c2d in clone () from /lib64/libc.so.6 (gdb) frame 0 #0 0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, event=1, data=0x90f7170) at I_Continuation.h:146 146 in I_Continuation.h (gdb) print *this $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x2aaaba360a11}, handler = virtual table offset -1157442765409226770, this adjustment -1157442765409226769, handler_name = 0xefefefefefefefef Address 0xefefefefefefefef out of bounds, mutex = {m_ptr = 0xefefefefefefefef}, link = {SLinkContinuation = { next = 0xefefefefefefefef}, prev = 0xefefefefefefefef}} {code} crash #2: {code} (gdb) bt #0 0x004dd17a in Continuation::handleEvent (this=0x2aaabc0bce80, event=1, data=0x154b5a80) at I_Continuation.h:146 #1 0x006db290 in InactivityCop::check_inactivity (this=0x154c8730, event=2, e=0x154b5a80) at UnixNet.cc:57 #2 0x004dd1bb in Continuation::handleEvent (this=0x154c8730, event=2, data=0x154b5a80) at I_Continuation.h:146 #3 0x00702b80 in EThread::process_event (this=0x2b606010, e=0x154b5a80, calling_code=2) at UnixEThread.cc:140 #4 0x00702ec2 in EThread::execute (this=0x2b606010) at UnixEThread.cc:217 #5 0x007024d2 in spawn_thread_internal (a=0x154852c0) at Thread.cc:85 #6 0x0036ebc064a7 in start_thread () from /lib64/libpthread.so.0 #7 0x0036eb0d3c2d in clone () from /lib64/libc.so.6 (gdb) frame 0 #0 0x004dd17a in Continuation::handleEvent (this=0x2aaabc0bce80, event=1, data=0x154b5a80) at I_Continuation.h:146 146 in I_Continuation.h (gdb) print *this $1 = {force_VFPT_to_top = 
{_vptr.force_VFPT_to_top = 0x16280061}, handler = virtual table offset -1157442765409226770, this adjustment -1157442765409226769, handler_name = 0xefefefefefefefef Address 0xefefefefefefefef out of bounds, mutex = {m_ptr = 0xefefefefefefefef}, link = {SLinkContinuation = { next = 0xefefefefefefefef}, prev = 0xefefefefefefefef}} (gdb) {code} crash #3: {code} (gdb) bt #0 0x004dd17a in Continuation::handleEvent (this=0x2aaab45d3a10, event=2, data=0x5631120) at I_Continuation.h:146 #1 0x00702b80 in EThread::process_event (this=0x2abfc010, e=0x5631120, calling_code=2) at UnixEThread.cc:140 #2 0x00702ec2 in EThread::execute (this=0x2abfc010) at UnixEThread.cc:217 #3 0x0050917c in main (argc=3, argv=0x7fff0af6e3b8) at Main.cc:1962 (gdb) frame 0 #0 0x004dd17a in Continuation::handleEvent (this=0x2aaab45d3a10, event=2, data=0x5631120) at I_Continuation.h:146 146 in I_Continuation.h (gdb) print *this $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x2aaab45df291}, handler = virtual table offset -1157442765409226770, this adjustment -1157442765409226769, handler_name = 0xefefefefefefefef Address 0xefefefefefefefef out of bounds, mutex = {m_ptr = 0xefefefefefefefef}, link =
[jira] [Commented] (TS-652) SSL random buffer initialization should be checked
[ https://issues.apache.org/jira/browse/TS-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021341#comment-13021341 ] John Plevyak commented on TS-652: - I'd be inclined to 2) and document. If that is the recommended way of using the library, we should just use it. SSL random buffer initialization should be checked -- Key: TS-652 URL: https://issues.apache.org/jira/browse/TS-652 Project: Traffic Server Issue Type: Wish Components: SSL Reporter: John Plevyak Fix For: 2.1.8 The way the SSL random buffers are initialized is interesting... it could also be made more efficient with the new 64-bit random number generator. It looks like it is using whatever is on the stack and then hashing it with 2 different random number generators and skipping the first few bytes... why, no idea. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira