[jira] [Created] (TS-4053) Add hit rate and memory usage regressions for RAM cache.

2015-12-03 Thread John Plevyak (JIRA)
John Plevyak created TS-4053:


 Summary: Add hit rate and memory usage regressions for RAM cache.
 Key: TS-4053
 URL: https://issues.apache.org/jira/browse/TS-4053
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache
Reporter: John Plevyak


It would be nice to have hit rate and memory usage regression tests for the 
RAM cache, in particular comparing LRU and CLFUS.
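A minimal standalone sketch of what such a hit-rate comparison could look like; the toy LRU, the geometric popularity distribution and the sizes below are illustrative assumptions, not the ATS RamCache interface:

{code}
#include <cstdint>
#include <cstdio>
#include <list>
#include <random>
#include <unordered_map>

/* Hypothetical harness: replay one request trace against a cache policy and
   report the hit rate; a second run with a CLFUS-like policy would be compared
   against this number. */
struct ToyLRU {
  size_t capacity; /* max number of objects (bytes in a real cache) */
  std::list<uint64_t> order; /* most recently used at the front */
  std::unordered_map<uint64_t, std::list<uint64_t>::iterator> map;
  explicit ToyLRU(size_t c) : capacity(c) {}
  bool get(uint64_t key) {
    auto it = map.find(key);
    if (it == map.end()) return false;
    order.splice(order.begin(), order, it->second); /* move to front on hit */
    return true;
  }
  void put(uint64_t key) {
    if (map.count(key)) return;
    if (map.size() >= capacity) { /* evict the least recently used entry */
      map.erase(order.back());
      order.pop_back();
    }
    order.push_front(key);
    map[key] = order.begin();
  }
};

int main() {
  ToyLRU cache(1000);
  std::mt19937 rng(7);
  std::geometric_distribution<int> popular(0.001); /* skewed, Zipf-like popularity */
  uint64_t hits = 0, total = 200000;
  for (uint64_t i = 0; i < total; ++i) {
    uint64_t key = (uint64_t)popular(rng);
    if (cache.get(key)) ++hits; else cache.put(key);
  }
  printf("hit rate: %.2f%%\n", 100.0 * hits / total);
  return 0;
}
{code}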



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-4053) Add hit rate and memory usage regressions for RAM cache.

2015-12-03 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak reassigned TS-4053:


Assignee: John Plevyak

> Add hit rate and memory usage regressions for RAM cache.
> 
>
> Key: TS-4053
> URL: https://issues.apache.org/jira/browse/TS-4053
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache
>Reporter: John Plevyak
>Assignee: John Plevyak
>
> It would be nice to have hit rate and memory usage regression tests for the 
> RAM cache, in particular comparing LRU and CLFUS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4053) Add hit rate and memory usage regressions for RAM cache, tune CLFUS.

2015-12-03 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-4053:
-
Description: It would be nice to have hit rate and memory usage regression 
tests for the RAM cache, in particular comparing LRU and CLFUS.  Once we have 
this we can tune the CLFUS implementation.  (was: It would be nice to have hit 
rate and memory usage regression tests for the RAM cache, in particular 
comparing LRU and CLFUS.)

> Add hit rate and memory usage regressions for RAM cache, tune CLFUS.
> 
>
> Key: TS-4053
> URL: https://issues.apache.org/jira/browse/TS-4053
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache
>Reporter: John Plevyak
>Assignee: John Plevyak
>Priority: Minor
>
> It would be nice to have hit rate and memory usage regression tests for the 
> RAM cache, in particular comparing LRU and CLFUS.  Once we have this we can 
> tune the CLFUS implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4053) Add hit rate and memory usage regressions for RAM cache, tune CLFUS.

2015-12-03 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-4053:
-
Priority: Minor  (was: Major)
 Summary: Add hit rate and memory usage regressions for RAM cache, tune 
CLFUS.  (was: Add hit rate and memory usage regressions for RAM cache.)

> Add hit rate and memory usage regressions for RAM cache, tune CLFUS.
> 
>
> Key: TS-4053
> URL: https://issues.apache.org/jira/browse/TS-4053
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Cache
>Reporter: John Plevyak
>Assignee: John Plevyak
>Priority: Minor
>
> It would be nice to have hit rate and memory usage regression tests for the 
> RAM cache, in particular comparing LRU and CLFUS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TS-3786) Use a consensus algorithm to elect the cluster master

2015-07-21 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak reassigned TS-3786:


Assignee: John Plevyak

 Use a consensus algorithm to elect the cluster master
 -

 Key: TS-3786
 URL: https://issues.apache.org/jira/browse/TS-3786
 Project: Traffic Server
  Issue Type: Improvement
  Components: Manager
Reporter: John Plevyak
Assignee: John Plevyak

 We should use a consensus algorithm to elect the cluster master and to update 
 the configurations so that there is no single point of failure and machines 
 entering or restarting can be brought to a consistent state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-3786) Use a consensus algorithm to elect the cluster master

2015-07-21 Thread John Plevyak (JIRA)
John Plevyak created TS-3786:


 Summary: Use a consensus algorithm to elect the cluster master
 Key: TS-3786
 URL: https://issues.apache.org/jira/browse/TS-3786
 Project: Traffic Server
  Issue Type: Improvement
  Components: Manager
Reporter: John Plevyak


We should use a consensus algorithm to elect the cluster master and to update 
the configurations so that there is no single point of failure and machines 
entering or restarting can be brought to a consistent state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-3508) use accept4 on linux systems where available to reduce system calls

2015-04-08 Thread John Plevyak (JIRA)
John Plevyak created TS-3508:


 Summary: use accept4 on linux systems where available to reduce 
system calls
 Key: TS-3508
 URL: https://issues.apache.org/jira/browse/TS-3508
 Project: Traffic Server
  Issue Type: Improvement
  Components: Network
Reporter: John Plevyak


The accept4() syscall can set flags on the accepted socket.
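A small sketch of the difference (plain POSIX/Linux calls; the listen_fd setup and error handling are assumed/omitted): accept4() folds the post-accept fcntl() flag setting into the accept syscall itself.

{code}
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1 /* accept4() is a GNU/Linux extension */
#endif
#include <fcntl.h>
#include <sys/socket.h>

/* Without accept4(): one syscall to accept, two more to set the usual flags. */
int accept_then_set_flags(int listen_fd)
{
  int fd = accept(listen_fd, nullptr, nullptr);
  if (fd >= 0) {
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
    fcntl(fd, F_SETFD, FD_CLOEXEC);
  }
  return fd;
}

/* With accept4(): the flags are applied atomically in the same syscall. */
int accept_with_flags(int listen_fd)
{
  return accept4(listen_fd, nullptr, nullptr, SOCK_NONBLOCK | SOCK_CLOEXEC);
}
{code}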



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3401) AIO blocks under lock contention

2015-02-23 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334450#comment-14334450
 ] 

John Plevyak commented on TS-3401:
--

I generally agree, but it is true that aio_thread_main() uses 
ink_atomiclist_popall() to grab the entire atomic queue associated with an 
AIO_Reqs for a single file descriptor/disk.  This means that a bunch of reads 
could be blocked behind the disk operation (as well as acquiring the mutex for 
write callbacks, but that is probably less important).  We could switch to 
using ink_atomiclist_pop() in aio_move(), which would move only a single op to 
the local queue, as sketched below.
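A rough sketch of that pop-one idea (AIO_Reqs, aio_insert() and the ink_atomiclist_pop() signature are assumed from iocore/aio and lib/ts; treat it as illustrative, not a drop-in patch):

{code}
/* Hypothetical variant of aio_move(): take at most one op off the atomic temp
   list per call, so a burst of queued reads is not drained behind one slow op. */
static void
aio_move_one(AIO_Reqs *req)
{
  AIOCallbackInternal *op = (AIOCallbackInternal *)ink_atomiclist_pop(&req->aio_temp_list);
  if (op)
    aio_insert(op, req); /* later arrivals stay on the atomic list until the next pass */
}
{code}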

That said, we should probably reexamine using linux native AIO now that the 
eventfd code has landed.  I think it will be more efficient, and with the new 
linux multi-queue support for SSDs we can do millions of ops/sec, so we want to 
be able to load up that queue; native AIO with eventfd looks like a good way to 
do it.

We should also consider changing all the delay periods (e.g. AIO_PERIOD) to be 
100 msec or more if we have eventfd, as we don't need to busy-poll anything... 
we will be awoken if anything appears in a queue or on a file descriptor.

 AIO blocks under lock contention
 

 Key: TS-3401
 URL: https://issues.apache.org/jira/browse/TS-3401
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Reporter: Brian Geffon
Assignee: Brian Geffon
 Attachments: aio.patch


 In {{aio_thread_main()}} while trying to process AIO ops the AIO thread will 
 wait on the mutex for the op which obviously blocks other AIO ops from 
 processing. We should use a try lock instead and reschedule the ops that we 
 couldn't immediately process. Patch attached. Waiting for reviews.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-1264) LRU RAM cache not accounting for overhead

2015-01-31 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-1264:
-
Attachment: ram_cache.patch

 LRU RAM cache not accounting for overhead
 -

 Key: TS-1264
 URL: https://issues.apache.org/jira/browse/TS-1264
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 3.1.3
Reporter: John Plevyak
Assignee: Leif Hedstrom
Priority: Minor
 Fix For: 6.0.0

 Attachments: ram_cache.patch


 The CLFUS RAM cache takes its overhead into account when determining how many 
 bytes it is using.  The LRU cache does not, which makes it hard to compare 
 performance between the two and hard to correctly size the LRU RAM cache.
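In other words, each insert should charge the cache for its bookkeeping as well as its payload bytes; a tiny illustration (the entry layout and the 16-byte hash-slot figure are made-up placeholders, not the RamCacheLRU structures):

{code}
#include <cstddef>
#include <cstdint>

struct LruEntry {
  uint64_t key[2];        /* placeholder fields standing in for the real entry */
  LruEntry *next, *prev;
  uint32_t size;
};

/* Bytes the LRU cache should count per object: payload plus per-entry overhead. */
size_t charged_bytes(size_t payload_bytes)
{
  const size_t per_entry_overhead = sizeof(LruEntry) + 16; /* assumed hash-slot cost */
  return payload_bytes + per_entry_overhead;
}
{code}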



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-3044) linux native AIO should use eventfd if available to signal thread

2014-09-30 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14154210#comment-14154210
 ] 

John Plevyak commented on TS-3044:
--

The perror is from the original AIO_MODE_NATIVE code.  I was just following 
the style as this was a minimal patch just to add the eventfd handling.  I 
agree that we should change that to standard ATS errors.  For most unix/linux 
installations waiting for less than 10 msec is the same as waiting for 0 msec, 
and can result in busy spinning.  The iocore has a minimum wait time, so 
HRTIME_MSECONDS(4) is disingenuous as well as being a poor idea (if it were 
actually obeyed).

 linux native AIO should use eventfd if available to signal thread
 -

 Key: TS-3044
 URL: https://issues.apache.org/jira/browse/TS-3044
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache
Reporter: John Plevyak
Assignee: Phil Sorber
 Fix For: 5.2.0

 Attachments: native-aio-eventfd.patch


 linux native AIO has the ability to signal the event thread to get off the 
 poll and service the disk via the io_set_eventfd() call.  linux native AIO 
 scales better than the thread-based IO, but the current implementation can 
 introduce delays on lightly loaded systems because the thread is waiting on 
 epoll().  This can be remedied by using io_set_eventfd().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3044) linux native AIO should use eventfd if available to signal thread

2014-08-25 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-3044:
-

Assignee: weijin

 linux native AIO should use eventfd if available to signal thread
 -

 Key: TS-3044
 URL: https://issues.apache.org/jira/browse/TS-3044
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache
Reporter: John Plevyak
Assignee: weijin

 linux native AIO has the ability to signal the event thread to get off the 
 poll and service the disk via the io_set_eventfd() call.  linux native AIO 
 scales better than the thread-based IO, but the current implementation can 
 introduce delays on lightly loaded systems because the thread is waiting on 
 epoll().  This can be remedied by using io_set_eventfd().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (TS-3044) linux native AIO should use eventfd if available to signal thread

2014-08-25 Thread John Plevyak (JIRA)
John Plevyak created TS-3044:


 Summary: linux native AIO should use eventfd if available to 
signal thread
 Key: TS-3044
 URL: https://issues.apache.org/jira/browse/TS-3044
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache
Reporter: John Plevyak


linux native AIO has the ability to signal the event thread to get off the poll 
and service the disk via the io_set_eventfd() call.  linux native AIO scales 
better than the thread-based IO, but the current implementation can introduce 
delays on lightly loaded systems because the thread is waiting on epoll().  
This can be remedied by using io_set_eventfd().
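A condensed sketch of the mechanism (libaio plus eventfd/epoll; the file name, buffer size and single-iocb flow are placeholders, and error handling is omitted):

{code}
#include <cstdint>
#include <fcntl.h>
#include <libaio.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main()
{
  int file_fd  = open("/tmp/data", O_RDONLY); /* placeholder file */
  int event_fd = eventfd(0, EFD_NONBLOCK);
  int epoll_fd = epoll_create1(0);

  io_context_t ctx = 0;
  io_setup(64, &ctx);

  /* Watch the eventfd from the same epoll loop the net code already blocks on. */
  epoll_event ev{};
  ev.events  = EPOLLIN;
  ev.data.fd = event_fd;
  epoll_ctl(epoll_fd, EPOLL_CTL_ADD, event_fd, &ev);

  /* Submit a read and ask the kernel to bump the eventfd when it completes. */
  char buf[4096];
  iocb cb;
  iocb *cbs[1] = {&cb};
  io_prep_pread(&cb, file_fd, buf, sizeof(buf), 0);
  io_set_eventfd(&cb, event_fd);
  io_submit(ctx, 1, cbs);

  /* The poll wakes as soon as the disk op finishes; no fixed AIO period needed. */
  epoll_event out;
  epoll_wait(epoll_fd, &out, 1, -1);
  uint64_t completions = 0;
  read(event_fd, &completions, sizeof(completions)); /* number of finished iocbs */
  io_event events[1];
  io_getevents(ctx, 0, 1, events, nullptr);          /* reap without blocking */

  io_destroy(ctx);
  close(file_fd); close(event_fd); close(epoll_fd);
  return 0;
}
{code}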



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (TS-3044) linux native AIO should use eventfd if available to signal thread

2014-08-25 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-3044:
-

Attachment: native-aio-eventfd.patch

 linux native AIO should use eventfd if available to signal thread
 -

 Key: TS-3044
 URL: https://issues.apache.org/jira/browse/TS-3044
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache
Reporter: John Plevyak
Assignee: weijin
 Attachments: native-aio-eventfd.patch


 linux native AIO has the ability to signal the event thread to get off the 
 poll and service the disk via the io_set_eventfd() call.  linux native AIO 
 scales better than the thread-based IO, but the current implementation can 
 introduce delays on lightly loaded systems because the thread is waiting on 
 epoll().  This can be remedied by using io_set_eventfd().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TS-3044) linux native AIO should use eventfd if available to signal thread

2014-08-25 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110181#comment-14110181
 ] 

John Plevyak commented on TS-3044:
--

Assigned to weijin for review as he was in charge of the native linux AIO and 
can assess the impact.  As I remember, this isn't enabled by default because of 
the latency concerns by Leif.  With this patch, if the latency concerns are 
addressed, we might want to enable this feature by default.

 linux native AIO should use eventfd if available to signal thread
 -

 Key: TS-3044
 URL: https://issues.apache.org/jira/browse/TS-3044
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache
Reporter: John Plevyak
Assignee: weijin
 Attachments: native-aio-eventfd.patch


 linux native AIO has the ability to signal the event thread to get off the 
 poll and service the disk via the io_set_eventfd() call.  linux native AIO 
 scales better than the thread-based IO, but the current implementation can 
 introduce delays on lightly loaded systems because the thread is waiting on 
 epoll().  This can be remedied by using io_set_eventfd().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TS-2193) Trafficserver 4.1 Crash with proxy.config.dns.dedicated_thread = 1

2013-09-10 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13763219#comment-13763219
 ] 

John Plevyak commented on TS-2193:
--

I am concerned about the proxy.config.dns.dedicated_thread option.  It is 
testing a configuration where event threads are not ET_NET.  While the design 
originally accounted for that possibility, soon thereafter all event threads 
were ET_NET, and I am worried that there are implicit assumptions that this is 
the case (as seems to be true of the session manager).

Is this really necessary?  DNS processing should be cheap.

Several fixes come to mind:

1) make the SessionManager not depend on being called on an ET_NET thread (this 
should probably be done in any case).  It could simply shift to any ET_NET 
thread if it was called from another.
2) make the DNS processor call back on an ET_NET thread (this is stupid since 
there is no good reason for it to assume the caller has such a restriction, and 
indeed what about the other ET_ types?).
3) make the DNS processor run across threads by hashing hosts to all ET_NET 
threads.  This will both fix the issue we are seeing and spread the load (see 
the sketch after this list).

We should probably do both 1 and 3.  There will be a temptation to do 2 
because it is the easy fix, but I think it is the wrong way out.
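A rough illustration of option 3 (the thread array and helper are hypothetical, not existing ATS API; only the idea of hashing the host name to pick an ET_NET thread is being shown):

{code}
#include <cstdint>

struct EThread; /* stand-in for the ATS event thread type */

/* Hypothetical: pick a net thread for a DNS lookup by hashing the host name,
   so lookups and their callbacks spread across all ET_NET threads. */
EThread *thread_for_host(const char *host, EThread *const *net_threads, int n_net_threads)
{
  uint32_t h = 5381;
  for (const char *p = host; *p; ++p)
    h = h * 33u + (unsigned char)*p; /* djb2 hash of the host name */
  return net_threads[h % n_net_threads];
}
{code}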

 Trafficserver 4.1 Crash with proxy.config.dns.dedicated_thread = 1
 --

 Key: TS-2193
 URL: https://issues.apache.org/jira/browse/TS-2193
 Project: Traffic Server
  Issue Type: Bug
  Components: DNS
Affects Versions: 4.1.0
Reporter: Tommy Lee
 Fix For: 4.1.0

 Attachments: bt-01.txt


 Hi all,
   I've tried to enable DNS Thread without luck.
   When I set proxy.config.dns.dedicated_thread to 1, it crashes with the 
 information below.
   The ATS is working in Forward Proxy mode.
   Thanks in advance.
 --
 traffic.out
 NOTE: Traffic Server received Sig 11: Segmentation fault
 /usr/local/cache-4.1/bin/traffic_server - STACK TRACE: 
 /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x2af714875cb0]
 /usr/local/cache-4.1/bin/traffic_server(_Z16_acquire_sessionP13SessionBucketPK8sockaddrR7INK_MD5P6HttpSM+0x52)[0x51dac2]
 /usr/local/cache-4.1/bin/traffic_server(_ZN18HttpSessionManager15acquire_sessionEP12ContinuationPK8sockaddrPKcP17HttpClientSessionP6HttpSM+0x3d1)[0x51e0f1]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM19do_http_server_openEb+0x30c)[0x53644c]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x6a0)[0x537560]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x57e)[0x53743e]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x57e)[0x53743e]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM27state_hostdb_reverse_lookupEiPv+0xb9)[0x526b99]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0xd8)[0x531be8]
 /usr/local/cache-4.1/bin/traffic_server[0x5d7c8a]
 /usr/local/cache-4.1/bin/traffic_server(_ZN18HostDBContinuation8dnsEventEiP7HostEnt+0x821)[0x5decd1]
 /usr/local/cache-4.1/bin/traffic_server(_ZN8DNSEntry9postEventEiP5Event+0x44)[0x5f7a94]
 /usr/local/cache-4.1/bin/traffic_server[0x5fd382]
 /usr/local/cache-4.1/bin/traffic_server(_ZN10DNSHandler8recv_dnsEiP5Event+0x852)[0x5fee72]
 /usr/local/cache-4.1/bin/traffic_server(_ZN10DNSHandler9mainEventEiP5Event+0x14)[0x5ffd94]
 /usr/local/cache-4.1/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x91)[0x6b2a41]
 /usr/local/cache-4.1/bin/traffic_server(_ZN7EThread7executeEv+0x514)[0x6b3534]
 /usr/local/cache-4.1/bin/traffic_server[0x6b17ea]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x2af71486de9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x2af71558dccd]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-2193) Trafficserver 4.1 Crash with proxy.config.dns.dedicated_thread = 1

2013-09-10 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13763225#comment-13763225
 ] 

John Plevyak commented on TS-2193:
--

This is listed as an experimental performance feature.  Personally, I would 
like to see some numbers before committing resources; otherwise I would say 
that the experiment was a failure.

 Trafficserver 4.1 Crash with proxy.config.dns.dedicated_thread = 1
 --

 Key: TS-2193
 URL: https://issues.apache.org/jira/browse/TS-2193
 Project: Traffic Server
  Issue Type: Bug
  Components: DNS
Affects Versions: 4.1.0
Reporter: Tommy Lee
 Fix For: 4.1.0

 Attachments: bt-01.txt


 Hi all,
   I've tried to enable DNS Thread without luck.
   When I set proxy.config.dns.dedicated_thread to 1, it crashes with the 
 information below.
   The ATS is working in Forward Proxy mode.
   Thanks in advance.
 --
 traffic.out
 NOTE: Traffic Server received Sig 11: Segmentation fault
 /usr/local/cache-4.1/bin/traffic_server - STACK TRACE: 
 /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x2af714875cb0]
 /usr/local/cache-4.1/bin/traffic_server(_Z16_acquire_sessionP13SessionBucketPK8sockaddrR7INK_MD5P6HttpSM+0x52)[0x51dac2]
 /usr/local/cache-4.1/bin/traffic_server(_ZN18HttpSessionManager15acquire_sessionEP12ContinuationPK8sockaddrPKcP17HttpClientSessionP6HttpSM+0x3d1)[0x51e0f1]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM19do_http_server_openEb+0x30c)[0x53644c]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x6a0)[0x537560]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x57e)[0x53743e]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x57e)[0x53743e]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM27state_hostdb_reverse_lookupEiPv+0xb9)[0x526b99]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0xd8)[0x531be8]
 /usr/local/cache-4.1/bin/traffic_server[0x5d7c8a]
 /usr/local/cache-4.1/bin/traffic_server(_ZN18HostDBContinuation8dnsEventEiP7HostEnt+0x821)[0x5decd1]
 /usr/local/cache-4.1/bin/traffic_server(_ZN8DNSEntry9postEventEiP5Event+0x44)[0x5f7a94]
 /usr/local/cache-4.1/bin/traffic_server[0x5fd382]
 /usr/local/cache-4.1/bin/traffic_server(_ZN10DNSHandler8recv_dnsEiP5Event+0x852)[0x5fee72]
 /usr/local/cache-4.1/bin/traffic_server(_ZN10DNSHandler9mainEventEiP5Event+0x14)[0x5ffd94]
 /usr/local/cache-4.1/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x91)[0x6b2a41]
 /usr/local/cache-4.1/bin/traffic_server(_ZN7EThread7executeEv+0x514)[0x6b3534]
 /usr/local/cache-4.1/bin/traffic_server[0x6b17ea]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x2af71486de9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x2af71558dccd]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-2193) Trafficserver 4.1 Crash with proxy.config.dns.dedicated_thread = 1

2013-09-09 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13762366#comment-13762366
 ] 

John Plevyak commented on TS-2193:
--

The code in dns_result() checks that we only call back on the same thread that 
initiated the DNS lookup:

  if (h->mutex->thread_holding == e->submit_thread) {
    MUTEX_TRY_LOCK(lock, e->action.mutex, h->mutex->thread_holding);
    if (!lock) {
      Debug("dns", "failed lock for result %s", e->qname);
      goto Lretry;
    }
    for (int i = 0; i < MAX_DNS_RETRIES; i++) {
      if (e->id[i] < 0)
        break;
      h->release_query_id(e->id[i]);
    }
    e->postEvent(0, 0);
  } else {
    for (int i = 0; i < MAX_DNS_RETRIES; i++) {
      if (e->id[i] < 0)
        break;
      h->release_query_id(e->id[i]);
    }
    e->mutex = e->action.mutex;
    SET_CONTINUATION_HANDLER(e, &DNSEntry::postEvent);
    e->submit_thread->schedule_imm_signal(e);
  }

There are calls which will schedule on *ANY* event thread (e.g. 
eventProcessor.schedule_XX).  These could schedule something (e.g. a timeout or 
other event) on the ET_DNS thread, which perhaps isn't initialized for all the 
processors (e.g. sessions).

At one point I removed all of the non-specific thread schedule calls, but it is 
possible that some still remain.
 

 Trafficserver 4.1 Crash with proxy.config.dns.dedicated_thread = 1
 --

 Key: TS-2193
 URL: https://issues.apache.org/jira/browse/TS-2193
 Project: Traffic Server
  Issue Type: Bug
  Components: DNS
Affects Versions: 4.1.0
Reporter: Tommy Lee
 Fix For: 4.1.0


 Hi all,
   I've tried to enable DNS Thread without luck.
   When I set proxy.config.dns.dedicated_thread to 1, it crashes with the 
 information below.
   The ATS is working in Forward Proxy mode.
   Thanks in advance.
 --
 traffic.out
 NOTE: Traffic Server received Sig 11: Segmentation fault
 /usr/local/cache-4.1/bin/traffic_server - STACK TRACE: 
 /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x2af714875cb0]
 /usr/local/cache-4.1/bin/traffic_server(_Z16_acquire_sessionP13SessionBucketPK8sockaddrR7INK_MD5P6HttpSM+0x52)[0x51dac2]
 /usr/local/cache-4.1/bin/traffic_server(_ZN18HttpSessionManager15acquire_sessionEP12ContinuationPK8sockaddrPKcP17HttpClientSessionP6HttpSM+0x3d1)[0x51e0f1]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM19do_http_server_openEb+0x30c)[0x53644c]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x6a0)[0x537560]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x57e)[0x53743e]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x57e)[0x53743e]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM27state_hostdb_reverse_lookupEiPv+0xb9)[0x526b99]
 /usr/local/cache-4.1/bin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0xd8)[0x531be8]
 /usr/local/cache-4.1/bin/traffic_server[0x5d7c8a]
 /usr/local/cache-4.1/bin/traffic_server(_ZN18HostDBContinuation8dnsEventEiP7HostEnt+0x821)[0x5decd1]
 /usr/local/cache-4.1/bin/traffic_server(_ZN8DNSEntry9postEventEiP5Event+0x44)[0x5f7a94]
 /usr/local/cache-4.1/bin/traffic_server[0x5fd382]
 /usr/local/cache-4.1/bin/traffic_server(_ZN10DNSHandler8recv_dnsEiP5Event+0x852)[0x5fee72]
 /usr/local/cache-4.1/bin/traffic_server(_ZN10DNSHandler9mainEventEiP5Event+0x14)[0x5ffd94]
 /usr/local/cache-4.1/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x91)[0x6b2a41]
 /usr/local/cache-4.1/bin/traffic_server(_ZN7EThread7executeEv+0x514)[0x6b3534]
 /usr/local/cache-4.1/bin/traffic_server[0x6b17ea]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x2af71486de9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x2af71558dccd]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-947) AIO Race condition on non NT systems

2013-09-04 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758339#comment-13758339
 ] 

John Plevyak commented on TS-947:
-

Yes, this has been fixed.

john





 AIO Race condition on non NT systems
 

 Key: TS-947
 URL: https://issues.apache.org/jira/browse/TS-947
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
 Environment: stock build with static libts, running on a 4 core server
Reporter: B Wyatt
Assignee: John Plevyak
 Fix For: 4.2.0

 Attachments: lock-safe-AIO.patch, timed-wait-AIO.patch


 Refer to code below.  The timeslice starting when a consumer thread 
 determines that the temp_list is empty (A) and ending when it releases the 
 aio_mutex(C) is unsafe if the work queues are empty and it breaks loop 
 execution at B.  During this timeslice (A-C) the consumer holds the aio_mutex 
 and as a result request producers enqueue items on the temporary atomic list 
 (D).  As a consumer in this state will wait for a signal on aio_cond to 
 proceed before processing the temp_list again, any requests on the temp_list 
 are effectively stalled until a future request produces this signal or 
 manually processes the temp_list.
 In the case of cache volume initialization, there is no future request and 
 the initialization sequence soft locks. 
 {code:title=iocore/aio/AIO.cc(annotated)}
 void *
 aio_thread_main(void *arg)
 {
   ...
   ink_mutex_acquire(&my_aio_req->aio_mutex);
   for (;;) {
     do {
       current_req = my_aio_req;
       /* check if any pending requests on the atomic list */
 A     if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
         aio_move(my_aio_req);
       if (!(op = my_aio_req->aio_todo.pop()) && !(op = my_aio_req->http_aio_todo.pop()))
 B       break;
       ...
       service request
       ...
     } while (1);
 C   ink_cond_wait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex);
   }
   ...
 }
 static void
 aio_queue_req(AIOCallbackInternal *op, int fromAPI = 0)
 {
   ...
   if (!ink_mutex_try_acquire(&req->aio_mutex)) {
 D   ink_atomiclist_push(&req->aio_temp_list, op);
   } else {
     /* check if any pending requests on the atomic list */
     if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
       aio_move(req);
     /* now put the new request */
     aio_insert(op, req);
     ink_cond_signal(&req->aio_cond);
     ink_mutex_release(&req->aio_mutex);
   }
   ...
 }
 {code}
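One way around the lost wakeup, sketched with standard C++ primitives rather than the ink_* wrappers (in the spirit of the attached timed-wait patch; the 100 ms bound and the loop body are illustrative):

{code}
#include <chrono>
#include <condition_variable>
#include <mutex>

std::mutex aio_mutex;
std::condition_variable aio_cond;

void consumer_loop()
{
  std::unique_lock<std::mutex> lk(aio_mutex);
  for (;;) {
    /* ... drain the temp list and the work queues while holding aio_mutex ... */

    /* Bounded wait: even if a producer pushed onto the temp list between A and C
       without signalling, the consumer re-checks it on the next timeout. */
    aio_cond.wait_for(lk, std::chrono::milliseconds(100));
  }
}
{code}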

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1648) Segmentation fault in dir_clear_range()

2013-05-29 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13669893#comment-13669893
 ] 

John Plevyak commented on TS-1648:
--

Rather than long we should be using int64_t, as long is not well defined (it is 
platform dependent).  Are those 10TB RAIDs?  If so, you are better off using 
them as JBOD, since ATS assumes that there is a single disk arm (or equal 
fraction) for each disk in storage.config.  Because of the size of your disks 
it is possible that you have more than 2^31 directory entries, which would 
account for the overflow.  Also, given the size, the clear may take a long 
time.  Your trace is not long enough for me to see if it repeats.  However, if 
it does repeat, it is possible that it is because dir_in_bucket also takes an 
int which is then multiplied to get a directory number.  The other possibility 
is (of course) that you have memory corruption: the directory is the single 
largest memory user, and it contains a linked list which can be circularized by 
corruption, but let's concentrate on the other issues first.

I would suggest that we change all the bucket/entry/etc. offsets to int64_t (I 
can build a patch, but I would appreciate a review).  Second, I would suggest 
(after testing to ensure that the patch fixes your problem) that you move to 
JBOD rather than RAID-0, or to having multiple NAS volumes which correspond 
approximately to the number of underlying disks, since ATS will only have one 
outstanding write (although multiple reads) for each disk in storage.config.
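To make the overflow concrete (the sizes below are illustrative, not taken from the reporter's machine):

{code}
#include <cstdint>
#include <cstdio>

int main()
{
  /* A directory index computed in 32-bit arithmetic wraps once
     buckets * DIR_DEPTH * segments exceeds 2^31 entries. */
  int32_t buckets = 64 * 1024 * 1024, depth = 4, segments = 16;
  int32_t wrapped = buckets * depth * segments;          /* overflows (undefined behavior) */
  int64_t widened = (int64_t)buckets * depth * segments; /* widen before multiplying */
  printf("int32: %d   int64: %lld\n", wrapped, (long long)widened);
  return 0;
}
{code}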

 Segmentation fault in dir_clear_range()
 ---

 Key: TS-1648
 URL: https://issues.apache.org/jira/browse/TS-1648
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 3.3.0, 3.2.0
 Environment: reverse proxy
Reporter: Tomasz Kuzemko
Assignee: weijin
  Labels: A
 Fix For: 3.3.3

 Attachments: 
 0001-Fix-for-TS-1648-Segmentation-fault-in-dir_clear_rang.patch


 I use ATS as a reverse proxy. I have a fairly large disk cache consisting of 
 2x 10TB raw disks. I do not use cache compression. After a few days of 
 running (this is a dev machine - not handling any traffic) ATS begins to 
 crash with a segfault shortly after start:
 [Jan 11 16:11:00.690] Server {0x72bb8700} DEBUG: (rusage) took rusage 
 snap 1357917060690487000
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x720ad700 (LWP 17292)]
 0x00696a71 in dir_clear_range (start=640, end=17024, vol=0x16057d0) 
 at CacheDir.cc:382
 382   CacheDir.cc: No such file or directory.
   in CacheDir.cc
 (gdb) p i
 $1 = 214748365
 (gdb) l
 377   in CacheDir.cc
 (gdb) p dir_index(vol, i)
 $2 = (Dir *) 0x7ff997a04002
 (gdb) p dir_index(vol, i-1)
 $3 = (Dir *) 0x7ffa97a03ff8
 (gdb) p *dir_index(vol, i-1)
 $4 = {w = {0, 0, 0, 0, 0}}
 (gdb) p *dir_index(vol, i-2)
 $5 = {w = {0, 0, 52431, 52423, 0}}
 (gdb) p *dir_index(vol, i)
 Cannot access memory at address 0x7ff997a04002
 (gdb) p *dir_index(vol, i+2)
 Cannot access memory at address 0x7ff997a04016
 (gdb) p *dir_index(vol, i+1)
 Cannot access memory at address 0x7ff997a0400c
  (gdb) p vol->buckets * DIR_DEPTH * vol->segments
 $6 = 1246953472
 (gdb) bt
 #0  0x00696a71 in dir_clear_range (start=640, end=17024, 
 vol=0x16057d0) at CacheDir.cc:382
 #1  0x0068aba2 in Vol::handle_recover_from_data (this=0x16057d0, 
 event=3900, data=0x16058a0) at Cache.cc:1384
 #2  0x004e8e1c in Continuation::handleEvent (this=0x16057d0, 
 event=3900, data=0x16058a0) at ../iocore/eventsystem/I_Continuation.h:146
 #3  0x00692385 in AIOCallbackInternal::io_complete (this=0x16058a0, 
 event=1, data=0x135afc0) at ../../iocore/aio/P_AIO.h:80
 #4  0x004e8e1c in Continuation::handleEvent (this=0x16058a0, event=1, 
 data=0x135afc0) at ../iocore/eventsystem/I_Continuation.h:146
 #5  0x00700fec in EThread::process_event (this=0x736c4010, 
 e=0x135afc0, calling_code=1) at UnixEThread.cc:142
 #6  0x007011ff in EThread::execute (this=0x736c4010) at 
 UnixEThread.cc:191
 #7  0x006ff8c2 in spawn_thread_internal (a=0x1356040) at Thread.cc:88
 #8  0x7797e8ca in start_thread () from /lib/libpthread.so.0
 #9  0x755c6b6d in clone () from /lib/libc.so.6
 #10 0x in ?? ()
 This is fixed by running traffic_server -Kk to clear the cache. But after a 
 few days the issue reappears.
 I will keep the current faulty setup as-is in case you need me to provide 
 more data. I tried to make a core dump but it took a couple of GB even after 
 gzip (I can however provide it on request).
 *Edit*
 OS is Debian GNU/Linux 6.0.6 with custom built kernel 
 3.2.13-grsec--grs-ipv6-64

--
This message is automatically generated by JIRA.

[jira] [Commented] (TS-1648) Segmentation fault in dir_clear_range()

2013-05-29 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13670005#comment-13670005
 ] 

John Plevyak commented on TS-1648:
--

I added a patch to make the variables I think are causing the problem int64_t.

 Segmentation fault in dir_clear_range()
 ---

 Key: TS-1648
 URL: https://issues.apache.org/jira/browse/TS-1648
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 3.3.0, 3.2.0
 Environment: reverse proxy
Reporter: Tomasz Kuzemko
Assignee: John Plevyak
  Labels: A
 Fix For: 3.3.3

 Attachments: 
 0001-Fix-for-TS-1648-Segmentation-fault-in-dir_clear_rang.patch, 
 cachedir_int64-jp-1.patch


 I use ATS as a reverse proxy. I have a fairly large disk cache consisting of 
 2x 10TB raw disks. I do not use cache compression. After a few days of 
 running (this is a dev machine - not handling any traffic) ATS begins to 
 crash with a segfault shortly after start:
 [Jan 11 16:11:00.690] Server {0x72bb8700} DEBUG: (rusage) took rusage 
 snap 1357917060690487000
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x720ad700 (LWP 17292)]
 0x00696a71 in dir_clear_range (start=640, end=17024, vol=0x16057d0) 
 at CacheDir.cc:382
 382   CacheDir.cc: No such file or directory.
   in CacheDir.cc
 (gdb) p i
 $1 = 214748365
 (gdb) l
 377   in CacheDir.cc
 (gdb) p dir_index(vol, i)
 $2 = (Dir *) 0x7ff997a04002
 (gdb) p dir_index(vol, i-1)
 $3 = (Dir *) 0x7ffa97a03ff8
 (gdb) p *dir_index(vol, i-1)
 $4 = {w = {0, 0, 0, 0, 0}}
 (gdb) p *dir_index(vol, i-2)
 $5 = {w = {0, 0, 52431, 52423, 0}}
 (gdb) p *dir_index(vol, i)
 Cannot access memory at address 0x7ff997a04002
 (gdb) p *dir_index(vol, i+2)
 Cannot access memory at address 0x7ff997a04016
 (gdb) p *dir_index(vol, i+1)
 Cannot access memory at address 0x7ff997a0400c
  (gdb) p vol->buckets * DIR_DEPTH * vol->segments
 $6 = 1246953472
 (gdb) bt
 #0  0x00696a71 in dir_clear_range (start=640, end=17024, 
 vol=0x16057d0) at CacheDir.cc:382
 #1  0x0068aba2 in Vol::handle_recover_from_data (this=0x16057d0, 
 event=3900, data=0x16058a0) at Cache.cc:1384
 #2  0x004e8e1c in Continuation::handleEvent (this=0x16057d0, 
 event=3900, data=0x16058a0) at ../iocore/eventsystem/I_Continuation.h:146
 #3  0x00692385 in AIOCallbackInternal::io_complete (this=0x16058a0, 
 event=1, data=0x135afc0) at ../../iocore/aio/P_AIO.h:80
 #4  0x004e8e1c in Continuation::handleEvent (this=0x16058a0, event=1, 
 data=0x135afc0) at ../iocore/eventsystem/I_Continuation.h:146
 #5  0x00700fec in EThread::process_event (this=0x736c4010, 
 e=0x135afc0, calling_code=1) at UnixEThread.cc:142
 #6  0x007011ff in EThread::execute (this=0x736c4010) at 
 UnixEThread.cc:191
 #7  0x006ff8c2 in spawn_thread_internal (a=0x1356040) at Thread.cc:88
 #8  0x7797e8ca in start_thread () from /lib/libpthread.so.0
 #9  0x755c6b6d in clone () from /lib/libc.so.6
 #10 0x in ?? ()
 This is fixed by running traffic_server -Kk to clear the cache. But after a 
 few days the issue reappears.
 I will keep the current faulty setup as-is in case you need me to provide 
 more data. I tried to make a core dump but it took a couple of GB even after 
 gzip (I can however provide it on request).
 *Edit*
 OS is Debian GNU/Linux 6.0.6 with custom built kernel 
 3.2.13-grsec--grs-ipv6-64

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-745) Support ssd

2013-05-21 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13663697#comment-13663697
 ] 

John Plevyak commented on TS-745:
-

Humm... let me read over the code.  An SSD layer is necessary at this point, 
and if this is ephemeral, I am sure we can find a clean integration.

thanx!

 Support ssd
 ---

 Key: TS-745
 URL: https://issues.apache.org/jira/browse/TS-745
 Project: Traffic Server
  Issue Type: New Feature
  Components: Cache
Reporter: mohan_zl
Assignee: weijin
 Fix For: 3.3.5

 Attachments: 0001-TS-745-support-interim-caching-in-storage.patch, 
 ts-745.diff, TS-ssd-2.patch, TS-ssd.patch


 A patch for supporting SSD; it does not work well for long runs with --enable-debug

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-745) Support ssd

2013-05-20 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662684#comment-13662684
 ] 

John Plevyak commented on TS-745:
-

I think the idea of stealing bits from the directory which are hard coded to 
point off device (off the hard disk which the directory is a part of) is a huge 
design departure and a problem.  When the cache was first built, it was limited 
to 8GB disks, which seemed HUGE.  For Apache I extended it to .5PB as by then 
8GB was far too small.  Currently disks are at 4TB, and this patch would 
decrease the limit from .5PB to 32TB, which gives us only a few years of 
headroom; not a good idea.  Furthermore, the current design lets you unplug any 
cache disk from any machine, move it to another machine, and have your cache 
back.  This change stores SSD information in the HDD directory!  Why?  Changing 
the configuration, a disk or machine failure, etc. invalidates that 
information, corrupting the cache.  Why not store that information in a side 
structure and keep it only in memory or on the SSD?

The idea of storing the SSD configuration in a string in records.config is also 
a bad idea.

Overall, a stacked cache seems like a better idea, or a minimally invasive 
extension would be great.  This patch is pretty invasive, duplicates code and 
generally touches many bits of the code.  The ram cache, for example, uses no 
bits in the HDD directory and only a couple of entry points at well-defined 
places (insert, lookup and delete/invalidate).

This patch looks to incur more technical debt at a time when I think we would 
like to decrease the technical debt.  For example, it would be nice to have 
more, smaller locks, move the HTTP support out of the core via a well-defined 
interface, add layering, etc.  Adding yet another set of core code paths is 
going to make those changes harder.

my 2 cents.
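For what it's worth, the kind of narrow seam described above might look roughly like this (the CacheKey/IOBufferData-style names are placeholders, not the actual ram cache declaration):

{code}
#include <cstdint>

struct CacheKey { uint64_t b[2]; }; /* placeholder for the 128-bit cache key */
struct IOBufferData;                /* opaque payload handle, stand-in only */

/* Sketch of a stacked cache layer reachable through a few well-defined entry
   points, leaving the HDD directory untouched. */
struct CacheLayer {
  virtual bool lookup(const CacheKey &key, IOBufferData *&data) = 0;
  virtual void insert(const CacheKey &key, IOBufferData *data)  = 0;
  virtual void remove(const CacheKey &key)                      = 0;
  virtual ~CacheLayer() {}
};
{code}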

 Support ssd
 ---

 Key: TS-745
 URL: https://issues.apache.org/jira/browse/TS-745
 Project: Traffic Server
  Issue Type: New Feature
  Components: Cache
Reporter: mohan_zl
Assignee: weijin
 Fix For: 3.3.5

 Attachments: 0001-TS-745-support-interim-caching-in-storage.patch, 
 ts-745.diff, TS-ssd-2.patch, TS-ssd.patch


 A patch for supporting SSD; it does not work well for long runs with --enable-debug

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1453) remove InactivityCop and enable define INACTIVITY_TIMEOUT

2013-04-19 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13636624#comment-13636624
 ] 

John Plevyak commented on TS-1453:
--

A couple of things: 1) the lock is always held over accesses to disabled, so it 
doesn't need to be volatile; 2) I would just change the callback_event to a new 
EVENT_DISABLED and handle it in NetVConnection::mainEvent (roughly sketched 
below).  The reason is that this will isolate the changes to the net processor, 
and I think the interaction of the disabled flag with the timeouts is 
problematic: you are going to end up rescheduling the event as an immediate 
eventually, which will cause a lot of busy processing.
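A very rough shape of suggestion 2 (EVENT_DISABLED is the new code proposed above; the value, the types and the handler body are purely illustrative):

{code}
/* Illustrative only: a new event code consumed inside the net VC's main handler,
   so the disable logic never interacts with the timeout scheduling. */
enum { EVENT_DISABLED = 20001 }; /* hypothetical value */

struct Event; /* stand-in for the ATS type */

struct NetVC {
  bool disabled = false;
  int mainEvent(int event, Event *)
  {
    if (event == EVENT_DISABLED) { /* proposed: record the state and drop the event */
      disabled = true;
      return 0;                    /* EVENT_DONE in the real code */
    }
    /* ... existing timeout / read / write handling ... */
    return 0;
  }
};
{code}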

 remove InactivityCop and enable define INACTIVITY_TIMEOUT
 -

 Key: TS-1453
 URL: https://issues.apache.org/jira/browse/TS-1453
 Project: Traffic Server
  Issue Type: Sub-task
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.5

 Attachments: TS-1453.patch


 Once we have O(1) scheduling, we can enable the INACTIVITY_TIMEOUT define.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-04-13 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631112#comment-13631112
 ] 

John Plevyak commented on TS-1405:
--

A third drop in performance on any test is a red flag.  There is definitely 
something wrong.  There are two things going on in this patch: 1) it replaces 
the power-of-2 buckets with a time wheel, and 2) it introduces an atomic list 
as a mechanism for freeing up events quickly.  Perhaps we can test the two 
separately?  In particular, we can remove the atomic list effects by just 
having Event::cancel_event() call cancel_action() and commenting out the call 
to process_cancelled_events().

Leif, are you up for running your test again with that change?



 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v10jp.patch, 
 linux_time_wheel_v11jp.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
 linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
 linux_time_wheel_v7.patch, linux_time_wheel_v8.patch, 
 linux_time_wheel_v9jp.patch


 When there are more and more events in the event system scheduler, it gets 
 worse.  This is the reason why we use InactivityCop to handle keepalive.  The 
 new scheduler is a time wheel, which has better time complexity (O(1)).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-04-09 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627411#comment-13627411
 ] 

John Plevyak commented on TS-1405:
--


Weird.  The min and max are down, but the mean is up.  What happens when you go 
to 500 connections?  I am wondering if it is an efficiency or a latency issue.

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v10jp.patch, 
 linux_time_wheel_v11jp.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
 linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
 linux_time_wheel_v7.patch, linux_time_wheel_v8.patch, 
 linux_time_wheel_v9jp.patch


 When there are more and more events in the event system scheduler, it gets 
 worse.  This is the reason why we use InactivityCop to handle keepalive.  The 
 new scheduler is a time wheel, which has better time complexity (O(1)).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-04-08 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13625447#comment-13625447
 ] 

John Plevyak commented on TS-1405:
--

Sounds good.   What sort of CPU/Memory improvements are you seeing?

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v10jp.patch, 
 linux_time_wheel_v11jp.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
 linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
 linux_time_wheel_v7.patch, linux_time_wheel_v8.patch, 
 linux_time_wheel_v9jp.patch


 When there are more and more events in the event system scheduler, it gets 
 worse.  This is the reason why we use InactivityCop to handle keepalive.  The 
 new scheduler is a time wheel, which has better time complexity (O(1)).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-04-08 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13625460#comment-13625460
 ] 

John Plevyak commented on TS-1405:
--

The patch includes:

+#if AIO_MODE == AIO_MODE_NATIVE
+#define AIO_PERIOD  -HRTIME_MSECONDS(4)
+#else

Even if it was set to zero, on an unloaded system it would only get
polled every 10 msecs because that is the poll rate for epoll(), so
you could potentially delay a disk IO by that amount of time.







 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v10jp.patch, 
 linux_time_wheel_v11jp.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
 linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
 linux_time_wheel_v7.patch, linux_time_wheel_v8.patch, 
 linux_time_wheel_v9jp.patch


 When there are more and more events in the event system scheduler, it gets 
 worse.  This is the reason why we use InactivityCop to handle keepalive.  The 
 new scheduler is a time wheel, which has better time complexity (O(1)).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-04-08 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13625510#comment-13625510
 ] 

John Plevyak commented on TS-1405:
--

Perhaps this is a larger issue.  We use eventfd to wake up the event thread on 
an unloaded system, but it would be best to avoid using it when the system 
becomes loaded, as it is expensive and tends to cause spinning on moderately 
loaded systems.  Perhaps instead we should have operational regimes: use 
blocking IO threads on an unloaded or lightly loaded system and switch to AIO 
as the system becomes more heavily loaded.  I would also be interested to see 
how this interacts with SSDs, which can have wait times in the microsecond 
range.  The crossover point for an SSD system is likely different than for an 
HDD system.

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v10jp.patch, 
 linux_time_wheel_v11jp.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
 linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
 linux_time_wheel_v7.patch, linux_time_wheel_v8.patch, 
 linux_time_wheel_v9jp.patch


 When there are more and more events in the event system scheduler, it gets 
 worse.  This is the reason why we use InactivityCop to handle keepalive.  The 
 new scheduler is a time wheel, which has better time complexity (O(1)).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1760) use linux native aio

2013-04-03 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621157#comment-13621157
 ] 

John Plevyak commented on TS-1760:
--

This patch seems to have a latency issue and a busy-wait issue.  The timeout 
on io_getevents is 4 msec, which is below the busy-waiting threshold on some 
systems, often 10 msec.  Second, it queues the events onto a handler in the 
same thread rather than doing the io_submit itself.  Third, the handler calls 
io_getevents on an EThread where there is already another call to epoll() 
blocking the same thread.  Having two blocking calls on the same thread is not 
a good idea: they will conflict, with one blocking while the other has ready 
data (i.e. from the net or from the disk).

If io_submit is thread safe while there is a currently waiting io_getevents on 
another thread, then linux aio might be viable for traffic server.  If 
io_getevents played well with epoll(), then linux aio might be viable.  Really, 
to get this to work Linux would need to have an integrated async completion 
API.

 use linux native aio
 

 Key: TS-1760
 URL: https://issues.apache.org/jira/browse/TS-1760
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Reporter: weijin
Assignee: weijin
 Fix For: 3.3.2

 Attachments: native_aio.patch


 Add a feature that uses Linux native AIO.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (TS-1405) apply time-wheel scheduler about event system

2013-03-30 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-1405:
-

Attachment: linux_time_wheel_v11jp.patch

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v10jp.patch, 
 linux_time_wheel_v11jp.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
 linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
 linux_time_wheel_v7.patch, linux_time_wheel_v8.patch, 
 linux_time_wheel_v9jp.patch


 When there are more and more events in the event system scheduler, it gets 
 worse.  This is the reason why we use InactivityCop to handle keepalive.  The 
 new scheduler is a time wheel, which has better time complexity (O(1)).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-30 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618129#comment-13618129
 ] 

John Plevyak commented on TS-1405:
--

I missed one case; fixed in v11.

I agree that you won't see the race if the timeout (50 msec) is sufficiently 
large and no thread fails to be rescheduled and run in that amount of time, but 
I think such timing-dependent behavior is to be avoided if possible.  We have 
a couple of other races of this type: uses of new_Freer() and flushing of the 
log buffers.  The former uses a much larger timeout (1 minute), while the 
latter may be the cause of occasional crashes which we have not been able to 
debug for years.  Experience with the log buffer flushing issue is why I am 
not happy with a race in the event code.

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v10jp.patch, 
 linux_time_wheel_v11jp.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
 linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
 linux_time_wheel_v7.patch, linux_time_wheel_v8.patch, 
 linux_time_wheel_v9jp.patch


 When there are more and more events in the event system scheduler, it gets 
 worse.  This is the reason why we use InactivityCop to handle keepalive.  The 
 new scheduler is a time wheel, which has better time complexity (O(1)).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-29 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617873#comment-13617873
 ] 

John Plevyak commented on TS-1405:
--

No, in the current patch (v10) in process_event the event will only be free'd 
if cancelled is set to CANCEL_SET which means that the Event is not in the 
atomic_list.

The current v10 patch is simple, fast and has no delay and hence no opportunity 
for timing related problems.

The previous patch checks Event::in_the_priority_queue, which can change state 
at any time when Event::ethread != this_ethread().  This is a race, and as a 
result the state of the Event being on the atomic_list is not knowable in the 
EThread during ::execute().  This will result in crashes.  You may not be 
seeing them because we typically pin all transactions to a single thread unless 
proxy.config.share_server_session is set to 1, so Event::ethread == 
this_ethread(); however, that is not the case in general.  Try testing with 
this and the appropriate configuration and you will see the problem.




 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v10jp.patch, 
 linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, 
 linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, 
 linux_time_wheel_v6.patch, linux_time_wheel_v7.patch, 
 linux_time_wheel_v8.patch, linux_time_wheel_v9jp.patch


 When there are more and more events in the event system scheduler, it gets 
 worse.  This is the reason why we use InactivityCop to handle keepalive.  The 
 new scheduler is a time wheel, which has better time complexity (O(1)).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-29 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617877#comment-13617877
 ] 

John Plevyak commented on TS-1405:
--

If anyone else would like to chime in, I would appreciate it.  Race conditions 
are subtle and when they exist, lead to random crashes which are very difficult 
to debug, so I would like to be sure that we are not introducing any races with 
this change.

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v10jp.patch, 
 linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, 
 linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, 
 linux_time_wheel_v6.patch, linux_time_wheel_v7.patch, 
 linux_time_wheel_v8.patch, linux_time_wheel_v9jp.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (TS-1405) apply time-wheel scheduler about event system

2013-03-28 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-1405:
-

Attachment: linux_time_wheel_v10jp.patch

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v10jp.patch, 
 linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, 
 linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, 
 linux_time_wheel_v6.patch, linux_time_wheel_v7.patch, 
 linux_time_wheel_v8.patch, linux_time_wheel_v9jp.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-28 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616342#comment-13616342
 ] 

John Plevyak commented on TS-1405:
--

I am still concerned about race conditions with the v9 patch.  In particular, 
when the cancelled flag is set it is possible (but not certain) that the event 
will be in the atomic list.  If it is, then it should not be free'd, but if it 
is not, it should be.  Doing the wrong thing is either a leak or memory 
corruption.  Furthermore, if we are cancelling from a different thread than the 
one the Event is on, the in_the_priority_queue flag is racy (it may change at 
any time) and hence should not be relied upon.

Attached please find v10.  This patch converts the 'cancelled' flag into a 
multi-state variable which captures whether or not the Event is in the atomic 
list.   All tests of the cancelled variable now do the right thing with 
respect to the state of the event.

Bin Chen: please take a look at this patch and consider the possible races and 
tell me what you think.
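
As an illustrative sketch only (CANCEL_SET is the value referenced in the comment above; the other state name and the usage fragment are assumptions, not the actual patch):

{code}
// Sketch: a three-state 'cancelled' field instead of a bool.
enum EventCancelState {
  CANCEL_NONE = 0,        // not cancelled
  CANCEL_IN_ATOMIC_LIST,  // cancelled cross-thread; the cancel atomic list still references it
  CANCEL_SET              // cancelled and no longer referenced by the atomic list
};

// In EThread::process_event() only CANCEL_SET allows an immediate free:
//   if (e->cancelled == CANCEL_SET)
//     free_event(e);   // safe: nothing else references the Event
{code}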

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v10jp.patch, 
 linux_time_wheel_v2.patch, linux_time_wheel_v3.patch, 
 linux_time_wheel_v4.patch, linux_time_wheel_v5.patch, 
 linux_time_wheel_v6.patch, linux_time_wheel_v7.patch, 
 linux_time_wheel_v8.patch, linux_time_wheel_v9jp.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (TS-1405) apply time-wheel scheduler about event system

2013-03-26 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-1405:
-

Attachment: linux_time_wheel_v9jp.patch

Fix Mutex leak and remove delay.

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
 linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
 linux_time_wheel_v7.patch, linux_time_wheel_v8.patch, 
 linux_time_wheel_v9jp.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-26 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614895#comment-13614895
 ] 

John Plevyak commented on TS-1405:
--

I have uploaded a small modification on the recent v8 patch.  This modification 
removes the delay, fixes a memory leak (of Mutex) and avoids going through the 
atomic list if we are on the same thread (the typical case).

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
 linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
 linux_time_wheel_v7.patch, linux_time_wheel_v8.patch, 
 linux_time_wheel_v9jp.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-23 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611638#comment-13611638
 ] 

John Plevyak commented on TS-1405:
--

+TS_INLINE void
+Event::cancel_event(Continuation * c)
+{
+  if (!cancelled) {
+    ink_assert(!c || c == continuation);
+    ethread->set_event_cancel(this);
+    cancelled = true;
+  }
+}

Once set_event_cancel has run, the Event may be deleted at any time.   Do not 
set the cancelled flag here.  It is set in set_cancel_event() in any case.  If 
you set it here you can overwrite free memory (or worse a another event).

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
 linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
 linux_time_wheel_v7.patch, linux_time_wheel_v8.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-23 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611749#comment-13611749
 ] 

John Plevyak commented on TS-1405:
--

I think depending on the delay is brittle.  You can never tell how long a 
thread will be delayed in an overloaded system, and the delay increases memory 
pressure.  Rather I would remove the delay, moving the line

+  event_cancel_list_head = (Event *) ink_atomiclist_popall(event_cancel_list);

above the loop in process_cancel_event() (and remove the time test).

Then I would move the assignment of cancelled = true into set_event_cancel:

if (!e->cancelled) {
  if (e->in_the_priority_queue && (e->timeout_at - e->ethread->cur_time) >
      HRTIME_SECONDS(event_cancel_limit)) {
    /* prevent more threads cancel one event racing */
    e->cancelled = true;
    ink_atomiclist_push(event_cancel_list, e);
  } else
    e->cancelled = true;
}

In fact, I would just incorporate the code in set_event_cancel into 
cancel_event() since it is only called in one place.

So, I agree that the delay would most likely have prevented a problem, but I 
think it would be better not to have it, because when future programmers see a 
constant delay, they might be tempted to decrease it to the point where problems 
might occur.
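
To make that concrete, a rough sketch of what folding the set_event_cancel() logic into cancel_event() might look like (illustrative only; the cross-thread test and the event_cancel_list member are assumptions, not the patch):

{code}
TS_INLINE void
Event::cancel_event(Continuation *c)
{
  if (!cancelled) {
    ink_assert(!c || c == continuation);
    cancelled = true;                 // mark before any other thread can see it queued
    if (ethread != this_ethread()) {
      // Cross-thread cancel: hand the Event to its owning thread; the owner
      // drains the cancel atomic list at the top of its loop and frees it there.
      ink_atomiclist_push(&ethread->event_cancel_list, this);
    }
    // Same-thread cancel: nothing else to do; the owner skips cancelled
    // events when it next processes its queues.
  }
}
{code}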

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
 linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
 linux_time_wheel_v7.patch, linux_time_wheel_v8.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-21 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13609122#comment-13609122
 ] 

John Plevyak commented on TS-1405:
--

Why is it segfaulting?  Can we back out the commit(s) which caused the 
problem?

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
 linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
 linux_time_wheel_v7.patch, linux_time_wheel_v8.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-20 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13608136#comment-13608136
 ] 

John Plevyak commented on TS-1405:
--

If everything is correct there should be no race.  You shouldn't be setting the 
'cancelled' flag in cancel_event() since it is set in set_cancelled_event.  
Remove the ink_release_assert().  We should not have any of these: they slow 
the code down and lead to crash storms which are bad for everyone.

There is no race because the caller needs to be holding the mutex, and after 
the call to cancel_event() the event is considered dead (which is why you 
shouldn't be setting the cancelled flag AFTER inserting the event into the 
cancel atomic list, because that is a race).

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
 linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
 linux_time_wheel_v7.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-20 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13608142#comment-13608142
 ] 

John Plevyak commented on TS-1405:
--

There are only very limited reasons to use an ink_release_assert, in particular 
if it looks like we could be returning the wrong content to a user.  We 
shouldn't use them to check other invariants as such checks just slow down the 
production server and are better done during regression testing and not at 
production time.  Moreover, a server that crashes can cause major service 
disruption, so the assert itself may very well cause more harm than a bug.

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
 linux_time_wheel_v5.patch, linux_time_wheel_v6.patch, 
 linux_time_wheel_v7.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-19 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606671#comment-13606671
 ] 

John Plevyak commented on TS-1405:
--

You are using EVENT_FREE, which does not free the mutex (which is reference 
counted) by setting it to NULL.   Try using free_event().

Also, I think process_cancel_event shouldn't delay for 4 seconds, that is far 
too long.  Perhaps 10 msec?

Finally, why is the ink_atomic_popall happening at the end of process cancel 
event? shouldn't event_cancel_list_head be local and the call happen at the 
start (after the delay)?
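
The distinction matters because Event::mutex is a reference-counted Ptr<ProxyMutex>. Roughly, as a sketch of the intent rather than the exact source:

{code}
// An EVENT_FREE-style return to the allocator leaves the Ptr<ProxyMutex>
// member populated, so the mutex's reference is never dropped and it leaks.
//
// free_event()-style cleanup releases the reference first:
void free_event_sketch(Event *e)
{
  e->mutex = NULL;         // drops the reference on the ProxyMutex
  eventAllocator.free(e);  // then return the Event to its freelist
}
{code}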


 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
 linux_time_wheel_v5.patch, linux_time_wheel_v6.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-16 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13604324#comment-13604324
 ] 

John Plevyak commented on TS-1405:
--

Could you update this patch to be against the current master branch?

I am getting a compile failure:

UnixEThread.cc: In constructor 'EThread::EThread()':
UnixEThread.cc:57:81: error: 'IOCORE_ReadConfigInteger' was not declared in 
this scope
UnixEThread.cc: In constructor 'EThread::EThread(ThreadType, int)':
UnixEThread.cc:79:81: error: 'IOCORE_ReadConfigInteger' was not declared in 
this scope
UnixEThread.cc: In constructor 'EThread::EThread(ThreadType, Event*, ink_sem*)':
UnixEThread.cc:116:81: error: 'IOCORE_ReadConfigInteger' was not declared in 
this scope

and a patch failure:

--- iocore/net/P_UnixNetVConnection.h
+++ iocore/net/P_UnixNetVConnection.h
@@ -339,7 +339,7 @@
   inactivity_timeout_in = 0;
 #ifdef INACTIVITY_TIMEOUT
   if (inactivity_timeout) {
-    inactivity_timeout->cancel_action(this);
+    inactivity_timeout->cancel_event(this);
     inactivity_timeout = NULL;
   }
 #else
@@ -351,7 +351,7 @@
 UnixNetVConnection::cancel_active_timeout()
 {
   if (active_timeout) {
-    active_timeout->cancel_action(this);
+    active_timeout->cancel_event(this);
     active_timeout = NULL;
     active_timeout_in = 0;
   }
~





 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
 linux_time_wheel_v5.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-03-16 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13604486#comment-13604486
 ] 

John Plevyak commented on TS-1405:
--

Thanx!



 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch, 
 linux_time_wheel_v5.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1742) Freelists to use 64bit version w/ Double Word Compare and Swap

2013-03-12 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600214#comment-13600214
 ] 

John Plevyak commented on TS-1742:
--

There are still a number of volatile declarations associated with head_p, and 
they need to be made consistent.  Anyone with an ARM/i386 system want to do the 
honors?

At the very least it looks like here and in ink_queue.h, where head_p itself 
is declared volatile.

john




 Freelists to use 64bit version w/ Double Word Compare and Swap
 --

 Key: TS-1742
 URL: https://issues.apache.org/jira/browse/TS-1742
 Project: Traffic Server
  Issue Type: Improvement
Reporter: Brian Geffon
Assignee: Brian Geffon
 Fix For: 3.3.2

 Attachments: 128bit_cas.patch, 128bit_cas.patch.2


 So to those of you familiar with the freelists you know that it works this 
 way the head pointer uses the upper 16 bits for a version to prevent the ABA 
 problem. The big drawback to this is that it requires the following macros to 
 get at the pointer or the version:
 {code}
 #define FREELIST_POINTER(_x) ((void*)(((((intptr_t)(_x).data)<<16)>>16) | \
  (((~((((intptr_t)(_x).data)<<16>>63)-1))>>48)<<48)))  // sign extend
 #define FREELIST_VERSION(_x) (((intptr_t)(_x).data)>>48)
 #define SET_FREELIST_POINTER_VERSION(_x,_p,_v) \
   (_x).data = ((((intptr_t)(_p))&0x0000FFFFFFFFFFFFULL) | (((_v)&0xFFFFULL) \
  << 48))
 {code}
 Additionally, since this only leaves 16 bits it limits the number of versions 
 you can have, well more and more x86_64 processors support DCAS (double word 
 compare and swap / 128bit CAS). This means that we can use 64bits for a 
 version which basically makes the versions unlimited but more importantly it 
 takes those macros above and simplifies them to:
 {code}
 #define FREELIST_POINTER(_x) (_x).s.pointer
 #define FREELIST_VERSION(_x) (_x).s.version
 #define SET_FREELIST_POINTER_VERSION(_x,_p,_v) \
 (_x).s.pointer = _p; (_x).s.version = _v
 {code}
 As you can imagine this will have a performance improvement, in my simple 
 tests I measured a performance improvement of around 6%. Unfortunately, I'm 
 not an expert with this stuff and I would really appreciate more community 
 feedback before I commit this patch.
 Note: this only applies if you're not using a reclaimable freelist.
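
As an illustrative sketch of the DCAS layout being described (assumptions: x86_64 with cmpxchg16b, GCC __sync builtins built with -mcx16, and a freelist whose memory is never unmapped; this is not the attached patch):

{code}
#include <stdint.h>

// 16-byte head: a full 64-bit pointer plus a full 64-bit version, updated
// together with a double-word compare-and-swap.
typedef union {
  struct {
    void    *pointer;
    uint64_t version;
  } s;
  __int128 data;
} head_p __attribute__((aligned(16)));

static inline void *freelist_pop_sketch(head_p *head)
{
  head_p old_h, new_h;
  do {
    old_h.data = head->data;   // not an atomic 16-byte load by itself; an
                               // inconsistent pair is caught by the CAS below
    if (old_h.s.pointer == NULL)
      return NULL;
    new_h.s.pointer = *(void **)old_h.s.pointer;  // next element is stored in the free element
    new_h.s.version = old_h.s.version + 1;        // 64-bit version: effectively unlimited
  } while (!__sync_bool_compare_and_swap(&head->data, old_h.data, new_h.data));
  return old_h.s.pointer;
}
{code}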

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1742) Freelists to use 64bit version w/ Double Word Compare and Swap

2013-03-10 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13598350#comment-13598350
 ] 

John Plevyak commented on TS-1742:
--

Looks good.  I might consider adding an ink_debug_assert that the type_size for 
the freelist now be at least 16 bytes when this is enabled.
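
Something along these lines, for example (a sketch; the guard macro and variable names are assumptions):

{code}
#if TS_HAS_128BIT_CAS
  // With the 16-byte pointer+version head enabled, each freelist element
  // must be at least 16 bytes, as suggested above.
  ink_debug_assert(type_size >= 16);
#endif
{code}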

 Freelists to use 64bit version w/ Double Word Compare and Swap
 --

 Key: TS-1742
 URL: https://issues.apache.org/jira/browse/TS-1742
 Project: Traffic Server
  Issue Type: Improvement
Reporter: Brian Geffon
Assignee: Brian Geffon
 Attachments: 128bit_cas.patch, 128bit_cas.patch.2


 So to those of you familiar with the freelists you know that it works this 
 way the head pointer uses the upper 16 bits for a version to prevent the ABA 
 problem. The big drawback to this is that it requires the following macros to 
 get at the pointer or the version:
 {code}
 #define FREELIST_POINTER(_x) ((void*)(((((intptr_t)(_x).data)<<16)>>16) | \
  (((~((((intptr_t)(_x).data)<<16>>63)-1))>>48)<<48)))  // sign extend
 #define FREELIST_VERSION(_x) (((intptr_t)(_x).data)>>48)
 #define SET_FREELIST_POINTER_VERSION(_x,_p,_v) \
   (_x).data = ((((intptr_t)(_p))&0x0000FFFFFFFFFFFFULL) | (((_v)&0xFFFFULL) \
  << 48))
 {code}
 Additionally, since this only leaves 16 bits it limits the number of versions 
 you can have, well more and more x86_64 processors support DCAS (double word 
 compare and swap / 128bit CAS). This means that we can use 64bits for a 
 version which basically makes the versions unlimited but more importantly it 
 takes those macros above and simplifies them to:
 {code}
 #define FREELIST_POINTER(_x) (_x).s.pointer
 #define FREELIST_VERSION(_x) (_x).s.version
 #define SET_FREELIST_POINTER_VERSION(_x,_p,_v) \
 (_x).s.pointer = _p; (_x).s.version = _v
 {code}
 As you can imagine this will have a performance improvement, in my simple 
 tests I measured a performance improvement of around 6%. Unfortunately, I'm 
 not an expert with this stuff and I would really appreciate more community 
 feedback before I commit this patch.
 Note: this only applies if you're not using a reclaimable freelist.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1742) Freelists to use 64bit version w/ Double Word Compare and Swap

2013-03-10 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13598501#comment-13598501
 ] 

John Plevyak commented on TS-1742:
--

We can take it out because we do the loads manually for the cas.

I always liked to use volatile as a marker that the variable was being accessed 
outside of a lock, but if it is causing a performance problem then we could 
convert the keyword into a comment:

// Warning: this variable is read and written in multiple threads without a 
// lock, use INK_QUEUE_LD to read safely.

john




 Freelists to use 64bit version w/ Double Word Compare and Swap
 --

 Key: TS-1742
 URL: https://issues.apache.org/jira/browse/TS-1742
 Project: Traffic Server
  Issue Type: Improvement
Reporter: Brian Geffon
Assignee: Brian Geffon
 Attachments: 128bit_cas.patch, 128bit_cas.patch.2


 So to those of you familiar with the freelists you know that it works this 
 way the head pointer uses the upper 16 bits for a version to prevent the ABA 
 problem. The big drawback to this is that it requires the following macros to 
 get at the pointer or the version:
 {code}
 #define FREELIST_POINTER(_x) ((void*)(((((intptr_t)(_x).data)<<16)>>16) | \
  (((~((((intptr_t)(_x).data)<<16>>63)-1))>>48)<<48)))  // sign extend
 #define FREELIST_VERSION(_x) (((intptr_t)(_x).data)>>48)
 #define SET_FREELIST_POINTER_VERSION(_x,_p,_v) \
   (_x).data = ((((intptr_t)(_p))&0x0000FFFFFFFFFFFFULL) | (((_v)&0xFFFFULL) \
  << 48))
 {code}
 Additionally, since this only leaves 16 bits it limits the number of versions 
 you can have, well more and more x86_64 processors support DCAS (double word 
 compare and swap / 128bit CAS). This means that we can use 64bits for a 
 version which basically makes the versions unlimited but more importantly it 
 takes those macros above and simplifies them to:
 {code}
 #define FREELIST_POINTER(_x) (_x).s.pointer
 #define FREELIST_VERSION(_x) (_x).s.version
 #define SET_FREELIST_POINTER_VERSION(_x,_p,_v) \
 (_x).s.pointer = _p; (_x).s.version = _v
 {code}
 As you can imagine this will have a performance improvement, in my simple 
 tests I measured a performance improvement of around 6%. Unfortunately, I'm 
 not an expert with this stuff and I would really appreciate more community 
 feedback before I commit this patch.
 Note: this only applies if you're not using a reclaimable freelist.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-02-27 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588493#comment-13588493
 ] 

John Plevyak commented on TS-1405:
--

I am getting some compilation errors with gcc 4.7.2:

UnixEThread.cc:159:83: error: no matching function for call to 
'ink_atomic_cas(int32_t*, bool, bool)'
UnixEThread.cc:159:83: note: candidate is:
In file included from ../../lib/ts/libts.h:52:0,
 from P_EventSystem.h:39,
 from UnixEThread.cc:30:
../../lib/ts/ink_atomic.h:152:1: note: template<class T> bool 
ink_atomic_cas(volatile T*, T, T)
../../lib/ts/ink_atomic.h:152:1: note:   template argument 
deduction/substitution failed:
UnixEThread.cc:159:83: note:   deduced conflicting types for parameter 'T' 
('int' and 'bool')

Also:
UnixEThread.cc: In constructor 'EThread::EThread()':
UnixEThread.cc:58:81: error: 'IOCORE_ReadConfigInteger' was not declared in 
this scope



 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.1

 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-02-27 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588498#comment-13588498
 ] 

John Plevyak commented on TS-1405:
--

Instance variables such as CancelList need to start with a lower case letter and 
use '_' to separate words (like all the other variables in this file).

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.1

 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2013-02-27 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588500#comment-13588500
 ] 

John Plevyak commented on TS-1405:
--

The atomic list is singly linked, so you could use SLINK for clink in Event.  
There are lots of events, so an extra field is worth saving.

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: Bin Chen
Assignee: Bin Chen
 Fix For: 3.3.1

 Attachments: linux_time_wheel.patch, linux_time_wheel_v2.patch, 
 linux_time_wheel_v3.patch, linux_time_wheel_v4.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1006) memory management, cut down memory waste ?

2012-12-11 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529116#comment-13529116
 ] 

John Plevyak commented on TS-1006:
--

I agree.  We should land this initially as a compile time option in the dev
branch to get wider production time on it before moving it to default.

The main reason is that it is invasive and complicated, particularly in the
way it will interact with the VM system and it would be nice to see how it
responds in a variety of environments.

If it is much better than TCMalloc, then perhaps we should package it up in
a more general form as well.

Was the design based on another allocator/paper?  Any references?

john




 memory management, cut down memory waste ?
 --

 Key: TS-1006
 URL: https://issues.apache.org/jira/browse/TS-1006
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.1.1
Reporter: Zhao Yongming
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: 0001-Allocator-optimize-InkFreeList-memory-pool.patch, 
 0002-Allocator-make-InkFreeList-memory-pool-configurable.patch, 
 Memory-Usage-After-Introduced-New-Allocator.png, memusage.ods, memusage.ods


 when we review the memory usage in the production, there is something 
 abnormal, ie, looks like TS take much memory than index data + common system 
 waste, and here is some memory dump result by set 
 proxy.config.dump_mem_info_frequency
 1, the one on a not so busy forwarding system:
 physical memory: 32G
 RAM cache: 22G
 DISK: 6140 GB
 average_object_size 64000
 {code}
  allocated  |in-use  | type size  |   free list name
 |||--
   671088640 |   37748736 |2097152 | 
 memory/ioBufAllocator[14]
  2248146944 | 2135949312 |1048576 | 
 memory/ioBufAllocator[13]
  1711276032 | 1705508864 | 524288 | 
 memory/ioBufAllocator[12]
  1669332992 | 1667760128 | 262144 | 
 memory/ioBufAllocator[11]
  2214592512 | 221184 | 131072 | 
 memory/ioBufAllocator[10]
  2325741568 | 2323775488 |  65536 | 
 memory/ioBufAllocator[9]
  2091909120 | 2089123840 |  32768 | 
 memory/ioBufAllocator[8]
  1956642816 | 1956478976 |  16384 | 
 memory/ioBufAllocator[7]
  2094530560 | 2094071808 |   8192 | 
 memory/ioBufAllocator[6]
   356515840 |  355540992 |   4096 | 
 memory/ioBufAllocator[5]
 1048576 |  14336 |   2048 | 
 memory/ioBufAllocator[4]
  131072 |  0 |   1024 | 
 memory/ioBufAllocator[3]
   65536 |  0 |512 | 
 memory/ioBufAllocator[2]
   32768 |  0 |256 | 
 memory/ioBufAllocator[1]
   16384 |  0 |128 | 
 memory/ioBufAllocator[0]
   0 |  0 |576 | 
 memory/ICPRequestCont_allocator
   0 |  0 |112 | 
 memory/ICPPeerReadContAllocator
   0 |  0 |432 | 
 memory/PeerReadDataAllocator
   0 |  0 | 32 | 
 memory/MIMEFieldSDKHandle
   0 |  0 |240 | 
 memory/INKVConnAllocator
   0 |  0 | 96 | 
 memory/INKContAllocator
4096 |  0 | 32 | 
 memory/apiHookAllocator
   0 |  0 |288 | 
 memory/FetchSMAllocator
   0 |  0 | 80 | 
 memory/prefetchLockHandlerAllocator
   0 |  0 |176 | 
 memory/PrefetchBlasterAllocator
   0 |  0 | 80 | 
 memory/prefetchUrlBlaster
   0 |  0 | 96 | memory/blasterUrlList
   0 |  0 | 96 | 
 memory/prefetchUrlEntryAllocator
   0 |  0 |128 | 
 memory/socksProxyAllocator
   0 |  0 |144 | 
 memory/ObjectReloadCont
 3258368 | 576016 |592 | 
 memory/httpClientSessionAllocator
  825344 | 139568 |208 | 
 memory/httpServerSessionAllocator
22597632 |1284848 |   9808 | memory/httpSMAllocator
   0 |  0 | 32 | 
 memory/CacheLookupHttpConfigAllocator
   0 |  0 |   9856 | 
 memory/httpUpdateSMAllocator
   0 |  

[jira] [Commented] (TS-1006) memory management, cut down memory waste ?

2012-12-10 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528528#comment-13528528
 ] 

John Plevyak commented on TS-1006:
--

Some of the volatile variables are not listed as such (e.g. 
InkThreadCache::status).

Also, what is the purpose of this status field and how is it updated?  It is 
set in ink_freelist_new to 0 via simple assignment, then tested/assigned via a 
cas in ink_freelist_free.  Some comments, or documentation would be nice.

Have you tested this against the default memory allocator and TCMalloc?

This seems to be doing something similar to TCMalloc and that code has been 
extensively tested.

 memory management, cut down memory waste ?
 --

 Key: TS-1006
 URL: https://issues.apache.org/jira/browse/TS-1006
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.1.1
Reporter: Zhao Yongming
Assignee: Bin Chen
 Fix For: 3.3.2

 Attachments: 0001-Allocator-optimize-InkFreeList-memory-pool.patch, 
 0002-Allocator-make-InkFreeList-memory-pool-configurable.patch, 
 Memory-Usage-After-Introduced-New-Allocator.png, memusage.ods, memusage.ods


 when we review the memory usage in the production, there is something 
 abnormal, ie, looks like TS take much memory than index data + common system 
 waste, and here is some memory dump result by set 
 proxy.config.dump_mem_info_frequency
 1, the one on a not so busy forwarding system:
 physical memory: 32G
 RAM cache: 22G
 DISK: 6140 GB
 average_object_size 64000
 {code}
  allocated  |in-use  | type size  |   free list name
 |||--
   671088640 |   37748736 |2097152 | 
 memory/ioBufAllocator[14]
  2248146944 | 2135949312 |1048576 | 
 memory/ioBufAllocator[13]
  1711276032 | 1705508864 | 524288 | 
 memory/ioBufAllocator[12]
  1669332992 | 1667760128 | 262144 | 
 memory/ioBufAllocator[11]
  2214592512 | 221184 | 131072 | 
 memory/ioBufAllocator[10]
  2325741568 | 2323775488 |  65536 | 
 memory/ioBufAllocator[9]
  2091909120 | 2089123840 |  32768 | 
 memory/ioBufAllocator[8]
  1956642816 | 1956478976 |  16384 | 
 memory/ioBufAllocator[7]
  2094530560 | 2094071808 |   8192 | 
 memory/ioBufAllocator[6]
   356515840 |  355540992 |   4096 | 
 memory/ioBufAllocator[5]
 1048576 |  14336 |   2048 | 
 memory/ioBufAllocator[4]
  131072 |  0 |   1024 | 
 memory/ioBufAllocator[3]
   65536 |  0 |512 | 
 memory/ioBufAllocator[2]
   32768 |  0 |256 | 
 memory/ioBufAllocator[1]
   16384 |  0 |128 | 
 memory/ioBufAllocator[0]
   0 |  0 |576 | 
 memory/ICPRequestCont_allocator
   0 |  0 |112 | 
 memory/ICPPeerReadContAllocator
   0 |  0 |432 | 
 memory/PeerReadDataAllocator
   0 |  0 | 32 | 
 memory/MIMEFieldSDKHandle
   0 |  0 |240 | 
 memory/INKVConnAllocator
   0 |  0 | 96 | 
 memory/INKContAllocator
4096 |  0 | 32 | 
 memory/apiHookAllocator
   0 |  0 |288 | 
 memory/FetchSMAllocator
   0 |  0 | 80 | 
 memory/prefetchLockHandlerAllocator
   0 |  0 |176 | 
 memory/PrefetchBlasterAllocator
   0 |  0 | 80 | 
 memory/prefetchUrlBlaster
   0 |  0 | 96 | memory/blasterUrlList
   0 |  0 | 96 | 
 memory/prefetchUrlEntryAllocator
   0 |  0 |128 | 
 memory/socksProxyAllocator
   0 |  0 |144 | 
 memory/ObjectReloadCont
 3258368 | 576016 |592 | 
 memory/httpClientSessionAllocator
  825344 | 139568 |208 | 
 memory/httpServerSessionAllocator
22597632 |1284848 |   9808 | memory/httpSMAllocator
   0 |  0 | 32 | 
 memory/CacheLookupHttpConfigAllocator
   0 |  0 |   9856 | 
 memory/httpUpdateSMAllocator
   0 |  0 |128 | 
 

[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2012-09-06 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449950#comment-13449950
 ] 

John Plevyak commented on TS-1405:
--

There is a race between the adding into the atomic list in the cancelling 
thread, getting dequeued in the controlling thread, and the setting of the 
cancelled flag in the cancelling thread.  One solution is to take the mutex 
lock in the check_ready code as the cancelling thread must be holding that lock 
over the insert into the atomic list and setting the cancelled flag.  Note, you 
could set the cancelled flag before adding to the atomic list and then just 
ignore it in process_thread() (and any other place) counting on it getting 
free'd eventually via the atomic list.  
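
The second option amounts to something like the following sketch (member and list names are assumptions, not the patch):

{code}
// Cancelling thread (must hold the Event's mutex):
static void cancel_from_other_thread(Event *e)
{
  e->cancelled = true;                                     // publish the cancel first ...
  ink_atomiclist_push(&e->ethread->event_cancel_list, e);  // ... then queue it for the owner
}

// Owning thread, e.g. in process_thread()/check_ready(): a cancelled event is
// simply skipped; it is reclaimed when the cancel atomic list is drained, so
// it cannot be freed twice.
static bool should_run(Event *e)
{
  return !e->cancelled;
}
{code}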

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: kuotai
Assignee: kuotai
 Fix For: 3.3.1

 Attachments: linux_time_wheel.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2012-09-06 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449953#comment-13449953
 ] 

John Plevyak commented on TS-1405:
--

weijin: I don't know that freeing it as soon as possible is as big a goal as 
race conditions are a problem :)  The current code can take up to 5 seconds to 
free a cancelled event, so this code is much better in that regard, even if we 
have to wait for the next time the event loop runs.

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: kuotai
Assignee: kuotai
 Fix For: 3.3.1

 Attachments: linux_time_wheel.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2012-08-28 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443575#comment-13443575
 ] 

John Plevyak commented on TS-1405:
--

The current code should have a complexity which is bounded by the need to 
scan the entire queue every 5 seconds.  This is necessary because cancelling an 
event involves setting the volatile cancelled flag and to not scan them would 
result in running out of memory.  Assuming an event is inserted with a 30 
second timeout and waits till it runs, it will be touched 30/5 = 6 + 10 = 16 
times.  For a 300 second timeout it will be touched 300/5 = 60 + 10 = 70 times.

If an event is cancelled (the normal case for timeouts), then it will be 
touched once (after an average of 2.5 seconds).  So, at least according to the 
design, the cost of the current design should be only a small constant factor 
worse than the time wheel and should average slightly more than 1 touch per 
event, which is the best that can be expected.  Of course, that is the 
design; if it is causing problems, then likely there is a bug or something 
about the workload which is causing problems.

The time wheel can bring this down to 1 touch every N seconds with expected 1 
touch per event or 6 and 60 above.

So, I think this is a very reasonable change, assuming that it can deal with 
the out-of-memory issue, and I am interested in seeing the benchmarks, as I am 
curious to see how the theory and practice collide.
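
For reference, a bare-bones timing wheel looks something like the sketch below (illustrative of the O(1) claim only; it is not the attached patch, and it ignores hierarchical buckets for long timeouts):

{code}
#include <list>
#include <vector>

class Event;  // stand-in for the event type

// One slot per scheduling quantum.  An event is placed in the slot for its
// expiry tick and is not touched again until the cursor reaches that slot,
// so the per-event cost is O(1).
struct TimingWheelSketch {
  std::vector<std::list<Event *>> slots;
  size_t cursor = 0;

  explicit TimingWheelSketch(size_t nslots) : slots(nslots) {}

  // Assumes ticks_from_now < slots.size(); a real wheel handles longer
  // timeouts with a hierarchy or an overflow list.
  void schedule(Event *e, size_t ticks_from_now) {
    slots[(cursor + ticks_from_now) % slots.size()].push_back(e);
  }

  // Called once per tick: only the current slot is scanned, instead of
  // rescanning the whole queue every few seconds.
  template <typename F> void advance(F &&fire) {
    for (Event *e : slots[cursor])
      fire(e);
    slots[cursor].clear();
    cursor = (cursor + 1) % slots.size();
  }
};
{code}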

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: kuotai
Assignee: kuotai
 Fix For: 3.3.0

 Attachments: time-wheel.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-1405) apply time-wheel scheduler about event system

2012-08-28 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443583#comment-13443583
 ] 

John Plevyak commented on TS-1405:
--

Sorry, the numbers for 30 seconds should be 30/5 + ~17 (every time a power of 2 
bucket is touched, 1/2 of the elements will be moved out, and 1/2 of 
those will be moved down 2 levels, etc.) = 27 vs 7 for the time wheel.

So the time wheel, in the case of short expired timeouts, can be several times 
more efficient.

 apply time-wheel scheduler  about event system
 --

 Key: TS-1405
 URL: https://issues.apache.org/jira/browse/TS-1405
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Affects Versions: 3.2.0
Reporter: kuotai
Assignee: kuotai
 Fix For: 3.3.0

 Attachments: time-wheel.patch


 when have more and more event in event system scheduler, it's worse. This is 
 the reason why we use inactivecop to handler keepalive. the new scheduler is 
 time-wheel. It's have better time complexity(O(1))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (TS-1264) LRU RAM cache not accounting for overhead

2012-05-19 Thread John Plevyak (JIRA)
John Plevyak created TS-1264:


 Summary: LRU RAM cache not accounting for overhead
 Key: TS-1264
 URL: https://issues.apache.org/jira/browse/TS-1264
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 3.1.3
Reporter: John Plevyak
Assignee: John Plevyak
Priority: Minor


The CLFUS RAM cache takes its overhead into account when determining how many 
bytes it is using.  The LRU cache does not which makes it hard to compare 
performance between the two and hard to correctly size the LRU RAM cache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-1240) Debug assert triggered in LogBuffer.cc:209

2012-05-16 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277284#comment-13277284
 ] 

John Plevyak commented on TS-1240:
--

What is the downside to restoring the delete delay buffer (was the memory usage 
too high)?

 Debug assert triggered in LogBuffer.cc:209
 --

 Key: TS-1240
 URL: https://issues.apache.org/jira/browse/TS-1240
 Project: Traffic Server
  Issue Type: Bug
  Components: Logging
Affects Versions: 3.1.4
Reporter: Leif Hedstrom
 Fix For: 3.1.5


 From John:
 {code}
 [May  1 09:08:44.746] Server {0x77fce800} NOTE: traffic server running
 FATAL: LogBuffer.cc:209: failed assert `m_unaligned_buffer`
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server - STACK 
 TRACE: 
 /home/jplevyak/projects/ts/ts-2/lib/ts/.libs/libtsutil.so.3(ink_fatal+0xa3)[0x77bae4a5]
 /home/jplevyak/projects/ts/ts-2/lib/ts/.libs/libtsutil.so.3(_ink_assert+0x3c)[0x77bad47c]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x35)[0x5d3a53]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogObject15_checkout_writeEPmm+0x41)[0x5eef75]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x4cb)[0x5ef5b9]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN16LogObjectManager3logEP9LogAccess+0x4a)[0x5daab4]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN3Log6accessEP9LogAccess+0x235)[0x5d97f9]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM12update_statsEv+0x204)[0x579872]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM9kill_thisEv+0x31d)[0x579525]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM12main_handlerEiPv+0x337)[0x56cec1]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0x14c)[0x5b24aa]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6bb9d1]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6bbafa]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0x6fa)[0x6bcaaf]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_Z12write_to_netP10NetHandlerP18UnixNetVConnectionP14PollDescriptorP7EThread+0x7d)[0x6bc3b3]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x6e6)[0x6b8828]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN7EThread13process_eventEP5Eventi+0x111)[0x6dde7f]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN7EThread7executeEv+0x431)[0x6de42b]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6dd0bc]
 /lib64/libpthread.so.0(+0x7d90)[0x77676d90]
 /lib64/libc.so.6(clone+0x6d)[0x754f9f5d]
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-1238) RAM cache hit rate unexpectedly low

2012-05-15 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276422#comment-13276422
 ] 

John Plevyak commented on TS-1238:
--

I think the problem is that the LRU cache doesn't account for its overhead, 
while the CLFUS cache does, which puts CLFUS at an unfair disadvantage in terms 
of relative true memory used per byte allocated.  The CLFUS cache is much better 
behaved when the working set is larger than the RAM cache size, and it supports 
compression.  I am going to commit this fix and leave CLFUS as the default, and 
file another bug to fix the accounting for RAM in the LRU cache.  I think this 
will make the performance comparable in the best case and better in worse cases.
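
Concretely, the accounting fix amounts to charging each entry for its bookkeeping the way CLFUS already does; as a sketch (the entry struct here is a stand-in, not the real one):

{code}
// Count the per-entry header alongside the object bytes so the configured
// byte limit reflects real memory use (subtract the same amount on evict).
struct RamCacheLRUEntrySketch { /* key, LRU links, data pointer, ... */ };

static inline size_t lru_charged_bytes(size_t object_len)
{
  return object_len + sizeof(RamCacheLRUEntrySketch);  // payload + overhead
}
{code}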

 RAM cache hit rate unexpectedly low
 ---

 Key: TS-1238
 URL: https://issues.apache.org/jira/browse/TS-1238
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 3.1.3
Reporter: John Plevyak
Assignee: John Plevyak
 Fix For: 3.1.4

 Attachments: TS-1238-jp-1.patch


 The RAM cache is not getting the expected hit rate.  Looks like there are a 
 couple issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-934) Proxy Mutex null pointer crash

2012-05-13 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13274381#comment-13274381
 ] 

John Plevyak commented on TS-934:
-

Is this still happening with the latest code?

 Proxy Mutex null pointer crash
 --

 Key: TS-934
 URL: https://issues.apache.org/jira/browse/TS-934
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 3.1.0
 Environment: Debian 6.0.2 quadcore, forward transparent proxy.
Reporter: Alan M. Carroll
Assignee: Alan M. Carroll
 Fix For: 3.1.4, 3.1.1

 Attachments: ts-934-patch.txt


 [Client report]
 We had the cache crash gracefully twice last night on a segfault.  Both 
 times the callstack produced by trafficserver's signal handler was:
 /usr/bin/traffic_server[0x529596]
 /lib/libpthread.so.0(+0xef60)[0x2ab09a897f60]
 [0x2ab09e7c0a10]
 usr/bin/traffic_server(HttpServerSession::do_io_close(int)+0xa8)[0x567a3c]
 /usr/bin/traffic_server(HttpVCTable::cleanup_entry(HttpVCTableEntry*)+0x4c)[0x56aff6]
 /usr/bin/traffic_server(HttpVCTable::cleanup_all()+0x64)[0x56b07a]
 /usr/bin/traffic_server(HttpSM::kill_this()+0x120)[0x57c226]
 /usr/bin/traffic_server(HttpSM::main_handler(int, void*)+0x208)[0x571b28]
 /usr/bin/traffic_server(Continuation::handleEvent(int, 
 void*)+0x69)[0x4e4623]
 I went through the disassembly and the instruction that it is on in 
 ::do_io_close is loading the value of diags (not dereferencing it), so it 
 is unlikely that that threw a segfault (unless this is somehow in 
 thread local storage and that is corrupt).
 The kernel message claimed that the instruction pointer was 0x4e438e 
 which in this build is in ProxyMutexPtr::operator ->() on the 
 instruction that dereferences the object pointer to get the stored mutex 
 pointer (bingo!), so it would seem that at some point we are 
 dereferencing a null safe pointer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-934) Proxy Mutex null pointer crash

2012-05-13 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13274383#comment-13274383
 ] 

John Plevyak commented on TS-934:
-

I think we should undo this as other changes fixed the bug.

 Proxy Mutex null pointer crash
 --

 Key: TS-934
 URL: https://issues.apache.org/jira/browse/TS-934
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 3.1.0
 Environment: Debian 6.0.2 quadcore, forward transparent proxy.
Reporter: Alan M. Carroll
Assignee: Alan M. Carroll
 Fix For: 3.1.4, 3.1.1

 Attachments: ts-934-patch.txt


 [Client report]
 We had the cache crash gracefully twice last night on a segfault.  Both 
 times the callstack produced by trafficserver's signal handler was:
 /usr/bin/traffic_server[0x529596]
 /lib/libpthread.so.0(+0xef60)[0x2ab09a897f60]
 [0x2ab09e7c0a10]
 usr/bin/traffic_server(HttpServerSession::do_io_close(int)+0xa8)[0x567a3c]
 /usr/bin/traffic_server(HttpVCTable::cleanup_entry(HttpVCTableEntry*)+0x4c)[0x56aff6]
 /usr/bin/traffic_server(HttpVCTable::cleanup_all()+0x64)[0x56b07a]
 /usr/bin/traffic_server(HttpSM::kill_this()+0x120)[0x57c226]
 /usr/bin/traffic_server(HttpSM::main_handler(int, void*)+0x208)[0x571b28]
 /usr/bin/traffic_server(Continuation::handleEvent(int, 
 void*)+0x69)[0x4e4623]
 I went through the disassembly and the instruction that it is on in 
 ::do_io_close is loading the value of diags (not dereferencing it), so it 
 is unlikely that that threw a segfault (unless this is somehow in 
 thread-local storage and that is corrupt).
 The kernel message claimed that the instruction pointer was 0x4e438e, 
 which in this build is in ProxyMutexPtr::operator ->() on the 
 instruction that dereferences the object pointer to get the stored mutex 
 pointer (bingo!), so it would seem that at some point we are 
 dereferencing a null safe pointer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (TS-934) Proxy Mutex null pointer crash

2012-05-13 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-934:


Assignee: John Plevyak  (was: Alan M. Carroll)

 Proxy Mutex null pointer crash
 --

 Key: TS-934
 URL: https://issues.apache.org/jira/browse/TS-934
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 3.1.0
 Environment: Debian 6.0.2 quadcore, forward transparent proxy.
Reporter: Alan M. Carroll
Assignee: John Plevyak
 Fix For: 3.1.4, 3.1.1

 Attachments: ts-934-jp1.patch, ts-934-patch.txt


 [Client report]
 We had the cache crash gracefully twice last night on a segfault.  Both 
 times the callstack produced by trafficserver's signal handler was:
 /usr/bin/traffic_server[0x529596]
 /lib/libpthread.so.0(+0xef60)[0x2ab09a897f60]
 [0x2ab09e7c0a10]
 usr/bin/traffic_server(HttpServerSession::do_io_close(int)+0xa8)[0x567a3c]
 /usr/bin/traffic_server(HttpVCTable::cleanup_entry(HttpVCTableEntry*)+0x4c)[0x56aff6]
 /usr/bin/traffic_server(HttpVCTable::cleanup_all()+0x64)[0x56b07a]
 /usr/bin/traffic_server(HttpSM::kill_this()+0x120)[0x57c226]
 /usr/bin/traffic_server(HttpSM::main_handler(int, void*)+0x208)[0x571b28]
 /usr/bin/traffic_server(Continuation::handleEvent(int, 
 void*)+0x69)[0x4e4623]
 I went through the disassembly and the instruction that it is on in 
 ::do_io_close is loading the value of diags (not dereferencing it), so it 
 is unlikely that that threw a segfault (unless this is somehow in 
 thread-local storage and that is corrupt).
 The kernel message claimed that the instruction pointer was 0x4e438e, 
 which in this build is in ProxyMutexPtr::operator ->() on the 
 instruction that dereferences the object pointer to get the stored mutex 
 pointer (bingo!), so it would seem that at some point we are 
 dereferencing a null safe pointer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (TS-934) Proxy Mutex null pointer crash

2012-05-13 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-934:


Attachment: ts-934-jp1.patch

This undoes the previous patch as this issue was addressed under a different 
bug.

 Proxy Mutex null pointer crash
 --

 Key: TS-934
 URL: https://issues.apache.org/jira/browse/TS-934
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 3.1.0
 Environment: Debian 6.0.2 quadcore, forward transparent proxy.
Reporter: Alan M. Carroll
Assignee: John Plevyak
 Fix For: 3.1.4, 3.1.1

 Attachments: ts-934-jp1.patch, ts-934-patch.txt


 [Client report]
 We had the cache crash gracefully twice last night on a segfault.  Both 
 times the callstack produced by trafficserver's signal handler was:
 /usr/bin/traffic_server[0x529596]
 /lib/libpthread.so.0(+0xef60)[0x2ab09a897f60]
 [0x2ab09e7c0a10]
 usr/bin/traffic_server(HttpServerSession::do_io_close(int)+0xa8)[0x567a3c]
 /usr/bin/traffic_server(HttpVCTable::cleanup_entry(HttpVCTableEntry*)+0x4c)[0x56aff6]
 /usr/bin/traffic_server(HttpVCTable::cleanup_all()+0x64)[0x56b07a]
 /usr/bin/traffic_server(HttpSM::kill_this()+0x120)[0x57c226]
 /usr/bin/traffic_server(HttpSM::main_handler(int, void*)+0x208)[0x571b28]
 /usr/bin/traffic_server(Continuation::handleEvent(int, 
 void*)+0x69)[0x4e4623]
 I went through the disassembly and the instruction that it is on in 
 ::do_io_close is loading the value of diags (not dereferencing it), so it 
 is unlikely that that threw a segfault (unless this is somehow in 
 thread-local storage and that is corrupt).
 The kernel message claimed that the instruction pointer was 0x4e438e, 
 which in this build is in ProxyMutexPtr::operator ->() on the 
 instruction that dereferences the object pointer to get the stored mutex 
 pointer (bingo!), so it would seem that at some point we are 
 dereferencing a null safe pointer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-1240) Debug assert triggered in LogBuffer.cc:209

2012-05-13 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274393#comment-13274393
 ] 

John Plevyak commented on TS-1240:
--

I can repro on my machine any time you like :)

 Debug assert triggered in LogBuffer.cc:209
 --

 Key: TS-1240
 URL: https://issues.apache.org/jira/browse/TS-1240
 Project: Traffic Server
  Issue Type: Bug
  Components: Logging
Affects Versions: 3.1.4
Reporter: Leif Hedstrom
 Fix For: 3.1.5


 From John:
 {code}
 [May  1 09:08:44.746] Server {0x77fce800} NOTE: traffic server running
 FATAL: LogBuffer.cc:209: failed assert `m_unaligned_buffer`
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server - STACK 
 TRACE: 
 /home/jplevyak/projects/ts/ts-2/lib/ts/.libs/libtsutil.so.3(ink_fatal+0xa3)[0x77bae4a5]
 /home/jplevyak/projects/ts/ts-2/lib/ts/.libs/libtsutil.so.3(_ink_assert+0x3c)[0x77bad47c]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogBuffer14checkout_writeEPmm+0x35)[0x5d3a53]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogObject15_checkout_writeEPmm+0x41)[0x5eef75]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN9LogObject3logEP9LogAccessPc+0x4cb)[0x5ef5b9]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN16LogObjectManager3logEP9LogAccess+0x4a)[0x5daab4]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN3Log6accessEP9LogAccess+0x235)[0x5d97f9]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM12update_statsEv+0x204)[0x579872]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM9kill_thisEv+0x31d)[0x579525]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN6HttpSM12main_handlerEiPv+0x337)[0x56cec1]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN10HttpTunnel12main_handlerEiPv+0x14c)[0x5b24aa]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6bb9d1]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6bbafa]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread+0x6fa)[0x6bcaaf]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_Z12write_to_netP10NetHandlerP18UnixNetVConnectionP14PollDescriptorP7EThread+0x7d)[0x6bc3b3]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x6e6)[0x6b8828]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN12Continuation11handleEventEiPv+0x72)[0x4e2450]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN7EThread13process_eventEP5Eventi+0x111)[0x6dde7f]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server(_ZN7EThread7executeEv+0x431)[0x6de42b]
 /a/home/jplevyak/projects/ts/ts-2/proxy/.libs/lt-traffic_server[0x6dd0bc]
 /lib64/libpthread.so.0(+0x7d90)[0x77676d90]
 /lib64/libc.so.6(clone+0x6d)[0x754f9f5d]
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-1238) RAM cache hit rate unexpectedly low

2012-05-08 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271042#comment-13271042
 ] 

John Plevyak commented on TS-1238:
--

It isn't committed.   Bryan was going to try it out.  It changes one of the
defaults (probably for the better) for RAM caching, but I wanted to give
him a chance to take a look.  I'll see if I can figure it out myself as
well.  It should be a very safe change.




 RAM cache hit rate unexpectedly low
 ---

 Key: TS-1238
 URL: https://issues.apache.org/jira/browse/TS-1238
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 3.1.3
Reporter: John Plevyak
Assignee: John Plevyak
 Fix For: 3.1.4

 Attachments: TS-1238-jp-1.patch


 The RAM cache is not getting the expected hit rate.  Looks like there are a 
 couple issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (TS-1238) RAM cache hit rate unexpectedly low

2012-05-01 Thread John Plevyak (JIRA)
John Plevyak created TS-1238:


 Summary: RAM cache hit rate unexpectedly low
 Key: TS-1238
 URL: https://issues.apache.org/jira/browse/TS-1238
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 3.1.3
Reporter: John Plevyak
Assignee: John Plevyak
 Fix For: 3.1.4


The RAM cache is not getting the expected hit rate.  Looks like there are a 
couple issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (TS-1238) RAM cache hit rate unexpectedly low

2012-05-01 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-1238:
-

Attachment: TS-1238-jp-1.patch

Add new option to disable/enable the seen_filter in the RAM cache.  Fix 
reporting of RAM cache hits to HTTP.  Fix for LRU cache.  Add back in 
seen_filter to LRU (disabled by default).  Disable seen filter by default for 
CLFUS.
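
For context on what the seen_filter toggle controls: the idea is a small "seen once before" admission gate in front of the RAM cache, so one-hit-wonder objects are not inserted the first time they are requested. A rough standalone sketch under invented names (SeenFilter/RamCacheStub are illustrative, not the actual LRU/CLFUS code):

{code}
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical admission gate: remember a small fingerprint per bucket and
// only admit an object into the RAM cache the second time it is seen.
class SeenFilter {
  std::vector<uint16_t> slots_;
public:
  explicit SeenFilter(size_t buckets) : slots_(buckets, 0) {}
  bool seen_before(uint64_t key_hash) {
    uint16_t fp = static_cast<uint16_t>(key_hash >> 48) | 1;  // never zero
    uint16_t &slot = slots_[key_hash % slots_.size()];
    if (slot == fp)
      return true;     // second sighting: admit
    slot = fp;         // first sighting: just remember it
    return false;
  }
};

struct RamCacheStub {
  bool use_seen_filter = false;   // mirrors the proposed enable/disable option
  SeenFilter filter{1 << 16};
  std::unordered_map<uint64_t, std::vector<char>> objects;

  bool put(uint64_t key_hash, std::vector<char> body) {
    if (use_seen_filter && !filter.seen_before(key_hash))
      return false;                       // one-hit wonder: skip insertion
    objects[key_hash] = std::move(body);  // admitted (eviction elided)
    return true;
  }
};
{code}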

 RAM cache hit rate unexpectedly low
 ---

 Key: TS-1238
 URL: https://issues.apache.org/jira/browse/TS-1238
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 3.1.3
Reporter: John Plevyak
Assignee: John Plevyak
 Fix For: 3.1.4

 Attachments: TS-1238-jp-1.patch


 The RAM cache is not getting the expected hit rate.  Looks like there are a 
 couple issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (TS-1225) doc_size still gets casted to int in a few places

2012-04-25 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-1225:
-

Attachment: ts-1225.diff

Remove cast to 32bits of doc_len.
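
The hazard the cast creates is easy to demonstrate: pushing a 64-bit document length through an int silently wraps for objects of 2 GB or more. A standalone illustration (not the actual cache code):

{code}
#include <cstdint>
#include <cstdio>

int main() {
  int64_t doc_len = INT64_C(5) * 1024 * 1024 * 1024;  // a 5 GB object
  int truncated = static_cast<int>(doc_len);          // the cast the patch removes
  // Prints doc_len=5368709120 truncated=1073741824 on the usual targets:
  // the length silently comes out as 1 GiB after the 32-bit cast.
  std::printf("doc_len=%lld truncated=%d\n", (long long)doc_len, truncated);
  return 0;
}
{code}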

 doc_size still gets casted to int in a few places
 -

 Key: TS-1225
 URL: https://issues.apache.org/jira/browse/TS-1225
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Reporter: Leif Hedstrom
Assignee: John Plevyak
 Fix For: 3.1.4

 Attachments: ts-1225.diff


 This was also discussed on TS-475, and discovered by bwyatt. I'm filing a 
 separate bug, since I think this should be fixed independent of TS-475.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Closed] (TS-888) SSL connections working with 2.1.5 fail with 3.0.1 and FireFox

2011-08-08 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak closed TS-888.
---

Resolution: Fixed
  Assignee: John Plevyak  (was: Leif Hedstrom)

Fixed in 1155125.

 SSL connections working with 2.1.5 fail with 3.0.1 and FireFox
 --

 Key: TS-888
 URL: https://issues.apache.org/jira/browse/TS-888
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Affects Versions: 3.0.1
 Environment: Ubuntu 10.04 LTS amd64, Glassfish 3.0.1, FireFox 5.0
Reporter: Kurt Huwig
Assignee: John Plevyak
 Fix For: 3.1.0

 Attachments: TS-888-jp.patch


 ATS has SSL server certificates. The backend is accessed via SSL as well 
 which uses the same certificates. It fails with FireFox, but works with 
 Google Chrome.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (TS-888) SSL connections working with 2.1.5 fail with 3.0.1 and FireFox

2011-08-07 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-888:


Attachment: TS-888-jp.patch

 SSL connections working with 2.1.5 fail with 3.0.1 and FireFox
 --

 Key: TS-888
 URL: https://issues.apache.org/jira/browse/TS-888
 Project: Traffic Server
  Issue Type: Bug
  Components: SSL
Affects Versions: 3.0.1
 Environment: Ubuntu 10.04 LTS amd64, Glassfish 3.0.1, FireFox 5.0
Reporter: Kurt Huwig
Assignee: Leif Hedstrom
 Fix For: 3.1.0

 Attachments: TS-888-jp.patch


 ATS has SSL server certificates. The backend is accessed via SSL as well 
 which uses the same certificates. It fails with FireFox, but works with 
 Google Chrome.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-844) ReadFromWriter fail in CacheRead.cc

2011-08-01 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076013#comment-13076013
 ] 

John Plevyak commented on TS-844:
-

I'd like to know what the top of the stack looked like and also what "fail" 
means in this context.

The patch is safe in the sense that it is conservative, but if a write has been 
closed but not yet been written into the aggregation buffer, this patch will 
prevent that data from being available for a ReadFromWriter.  At least that is 
how I read it.

What I am wondering is: what about a closed but not yet written CacheVC is 
making ReadFromWriter fail?
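
To make the trade-off concrete: a conservative chooser skips any closed writer, even one whose data has already reached the aggregation buffer and could still serve the read. A hypothetical sketch with invented names (WriterVC and its fields are illustrative, not the real CacheVC):

{code}
#include <vector>

// Minimal stand-in for the writer-side state a cache VC would expose.
struct WriterVC {
  bool closed = false;          // do_io_close() has been called
  bool flushed_to_agg = false;  // data already copied into the aggregation buffer
};

// Pick a writer a reader may attach to.  The conservative rule skips every
// closed writer; the looser rule still accepts a closed writer whose data
// has reached the aggregation buffer and is therefore still readable.
const WriterVC *choose_writer(const std::vector<WriterVC> &writers,
                              bool conservative) {
  for (const WriterVC &w : writers) {
    if (!w.closed)
      return &w;                               // open writer: always acceptable
    if (!conservative && w.flushed_to_agg)
      return &w;                               // closed but data is still available
  }
  return nullptr;                              // no usable writer: fall back to a miss
}
{code}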


 ReadFromWriter fail in CacheRead.cc
 ---

 Key: TS-844
 URL: https://issues.apache.org/jira/browse/TS-844
 Project: Traffic Server
  Issue Type: Bug
Reporter: mohan_zl
 Fix For: 3.1.0

 Attachments: TS-844.patch


 {code}
 #6  0x006ab4d7 in CacheVC::openReadChooseWriter (this=0x2aaaf81523d0, 
 event=1, e=0x0) at CacheRead.cc:320
 #7  0x006abdc9 in CacheVC::openReadFromWriter (this=0x2aaaf81523d0, 
 event=1, e=0x0) at CacheRead.cc:411
 #8  0x004d302f in Continuation::handleEvent (this=0x2aaaf81523d0, 
 event=1, data=0x0) at I_Continuation.h:146
 #9  0x006ae2b9 in Cache::open_read (this=0x2aaab0001c40, 
 cont=0x2aaab4472aa0, key=0x42100b10, request=0x2aaab44710f0, 
 params=0x2aaab4470928, type=CACHE_FRAG_TYPE_HTTP,
 hostname=0x2aab09581049 
 js.tongji.linezing.com icon1.gif js.tongji.linezing.com [remainder of the hostname buffer is non-printable garbage]...,
  host_len=22) at CacheRead.cc:228
 #10 0x0068da30 in Cache::open_read (this=0x2aaab0001c40, 
 cont=0x2aaab4472aa0, url=0x2aaab4471108, request=0x2aaab44710f0, 
 params=0x2aaab4470928,
 type=CACHE_FRAG_TYPE_HTTP) at P_CacheInternal.h:1068
 #11 0x0067d32f in CacheProcessor::open_read (this=0xf2c030, 
 cont=0x2aaab4472aa0, url=0x2aaab4471108, request=0x2aaab44710f0, 
 params=0x2aaab4470928, pin_in_cache=0,
 type=CACHE_FRAG_TYPE_HTTP) at Cache.cc:3011
 #12 0x0054e058 in HttpCacheSM::do_cache_open_read 
 (this=0x2aaab4472aa0) at HttpCacheSM.cc:220
 #13 0x0054e1a7 in HttpCacheSM::open_read (this=0x2aaab4472aa0, 
 url=0x2aaab4471108, hdr=0x2aaab44710f0, params=0x2aaab4470928, 
 pin_in_cache=0) at HttpCacheSM.cc:252
 #14 0x00568404 in HttpSM::do_cache_lookup_and_read 
 (this=0x2aaab4470830) at HttpSM.cc:3893
 #15 0x005734b5 in HttpSM::set_next_state (this=0x2aaab4470830) at 
 HttpSM.cc:6436
 #16 0x0056115a in HttpSM::call_transact_and_set_next_state 
 (this=0x2aaab4470830, f=0) at HttpSM.cc:6328
 #17 0x00574b78 in HttpSM::handle_api_return (this=0x2aaab4470830) at 
 HttpSM.cc:1516
 #18 0x0056dbe7 in HttpSM::state_api_callout (this=0x2aaab4470830, 
 event=0, data=0x0) at HttpSM.cc:1448
 #19 0x0056de77 in HttpSM::do_api_callout_internal 
 (this=0x2aaab4470830) at HttpSM.cc:4345
 #20 0x00578c89 in HttpSM::do_api_callout (this=0x2aaab4470830) at 
 HttpSM.cc:497
 #21 0x00572e93 in HttpSM::set_next_state (this=0x2aaab4470830) at 
 HttpSM.cc:6362
 #22 0x0056115a in HttpSM::call_transact_and_set_next_state 
 (this=0x2aaab4470830, f=0) at HttpSM.cc:6328
 #23 0x00572faf in HttpSM::set_next_state (this=0x2aaab4470830) at 
 HttpSM.cc:6378
 #24 0x0056115a in HttpSM::call_transact_and_set_next_state 
 (this=0x2aaab4470830, f=0) at HttpSM.cc:6328
 #25 0x00574b78 in HttpSM::handle_api_return (this=0x2aaab4470830) at 
 HttpSM.cc:1516
 #26 0x0056dbe7 in HttpSM::state_api_callout (this=0x2aaab4470830, 
 event=0, data=0x0) at HttpSM.cc:1448
 #27 0x0056de77 in HttpSM::do_api_callout_internal 
 (this=0x2aaab4470830) at HttpSM.cc:4345
 #28 0x00578c89 in HttpSM::do_api_callout (this=0x2aaab4470830) at 
 HttpSM.cc:497
 #29 0x00572e93 in HttpSM::set_next_state (this=0x2aaab4470830) at 
 HttpSM.cc:6362
 #30 0x0056115a in HttpSM::call_transact_and_set_next_state 
 (this=0x2aaab4470830, f=0) at HttpSM.cc:6328
 #31 0x00574b78 in HttpSM::handle_api_return (this=0x2aaab4470830) at 
 HttpSM.cc:1516
 #32 0x0056dbe7 in HttpSM::state_api_callout (this=0x2aaab4470830, 
 event=0, data=0x0) at HttpSM.cc:1448
 #33 0x0056de77 in HttpSM::do_api_callout_internal 
 (this=0x2aaab4470830) at HttpSM.cc:4345
 #34 0x00578c89 in HttpSM::do_api_callout (this=0x2aaab4470830) at 
 HttpSM.cc:497
 #35 0x00572e93 in HttpSM::set_next_state (this=0x2aaab4470830) at 
 

[jira] [Commented] (TS-866) Need way to clear contents of a cache entry

2011-07-24 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070234#comment-13070234
 ] 

John Plevyak commented on TS-866:
-

Sorry for the delay.  I am looking at this patch.  It needs a little bit of 
work:

1) it should be built on remove instead of read (it can still share internal 
states with using the stack mechanism)
2) it should interlock writes from the aggregation buffer if they would overlap 
these writes
3) it needs to support clustering

These are not huge changes, but they will require a bit of work.  There are 
other features which need to touch this code as well, so I'll poke around.
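
As a very rough picture of what a remove-based erase might look like (purely illustrative; the real change would also have to interlock with the aggregation buffer and support clustering, per points 2 and 3 above):

{code}
#include <cstring>
#include <vector>

// Toy model of an on-disk fragment plus its directory entry.
struct DirEntry {
  size_t offset = 0;
  size_t len = 0;
  bool valid = false;
};

struct ToyVol {
  std::vector<char> disk = std::vector<char>(1 << 20);

  // Erase = overwrite the stored bytes first, then drop the directory entry,
  // so a later directory bug cannot hand the content back out.
  void erase(DirEntry &d) {
    if (!d.valid || d.offset + d.len > disk.size())
      return;
    std::memset(&disk[d.offset], 0, d.len);  // scrub the fragment in place
    d.valid = false;                         // then remove it from the directory
  }
};
{code}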

 Need way to clear contents of a cache entry
 ---

 Key: TS-866
 URL: https://issues.apache.org/jira/browse/TS-866
 Project: Traffic Server
  Issue Type: New Feature
  Components: Cache
Affects Versions: 3.0.0
Reporter: William Bardwell
Priority: Minor
 Fix For: 3.1.0

 Attachments: cache_erase.diff


 I needed a way to clear a cache entry off of disk, not just forget about it.  
 The worry was about if you got content on a server that was illegal or a 
 privacy violation of some sort, we wanted a way to be able to tell customers 
 that after this step there was no way that TS could serve the content again.  
 The normal cache remove just clears the directory entry, but theoretically a 
 bug could allow that data out in some way.  This was not intended to prevent 
 forensic analysis of the hardware being able to recover the data.  And bugs 
 in low level drivers or the kernel could theoretically allow data to survive 
 due to block remapping or mis-management of disk caches.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (TS-866) Need way to clear contents of a cache entry

2011-07-24 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070234#comment-13070234
 ] 

John Plevyak edited comment on TS-866 at 7/24/11 7:25 PM:
--

Sorry for the delay.  I am looking at this patch.  It needs a little bit of 
work:

1) it should be built on remove instead of read (it can still share internal 
states with read using the stack mechanism)
2) it should interlock writes from the aggregation buffer if they would overlap 
these writes
3) it needs to support clustering

These are not huge changes, but they will require a bit of work.  There are 
other features which need to touch this code as well, so I'll poke around.

  was (Author: jplevyak):
Sorry for the delay.  I am looking at this patch.  It needs a little bit of 
work:

1) it should be built on remove instead of read (it can still share internal 
states with using the stack mechanism)
2) it should interlock writes from the aggregation buffer if they would overlap 
these writes
3) it needs to support clustering

These are not huge changes, but they will require a bit of work.  There are 
other features which need to touch this code as well, so I'll poke around.
  
 Need way to clear contents of a cache entry
 ---

 Key: TS-866
 URL: https://issues.apache.org/jira/browse/TS-866
 Project: Traffic Server
  Issue Type: New Feature
  Components: Cache
Affects Versions: 3.0.0
Reporter: William Bardwell
Priority: Minor
 Fix For: 3.1.0

 Attachments: cache_erase.diff


 I needed a way to clear a cache entry off of disk, not just forget about it.  
 The worry was about if you got content on a server that was illegal or a 
 privacy violation of some sort, we wanted a way to be able to tell customers 
 that after this step there was no way that TS could serve the content again.  
 The normal cache remove just clears the directory entry, but theoretically a 
 bug could allow that data out in some way.  This was not intended to prevent 
 forensic analysis of the hardware being able to recover the data.  And bugs 
 in low level drivers or the kernel could theoretically allow data to survive 
 due to block remapping or mis-management of disk caches.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-848) Crash Report: ShowNet::showConnectionsOnThread - ShowCont::show

2011-07-24 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070236#comment-13070236
 ] 

John Plevyak commented on TS-848:
-

Gack, many of those values (e.g. nbytes) are now 64-bit %lld.
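
The pattern behind the crash: passing a 64-bit value to a 32-bit conversion desynchronizes the vararg list, so a later %s can pick up a bogus pointer and send strlen() off a cliff. A minimal standalone illustration (not the actual show() format string):

{code}
#include <cinttypes>
#include <cstdint>
#include <cstdio>

int main() {
  int64_t nbytes = INT64_C(6) * 1024 * 1024 * 1024;  // counters are 64-bit now
  const char *name = "client-7";

  // Wrong (the pre-fix pattern): "%d" consumes only 32 bits of nbytes, so the
  // following "%s" can read the remaining half as a bogus pointer -> strlen crash.
  // std::printf("bytes=%d name=%s\n", nbytes, name);   // undefined behavior

  // Right: match 64-bit values with a 64-bit conversion.
  std::printf("bytes=%" PRId64 " name=%s\n", nbytes, name);
  return 0;
}
{code}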

 Crash Report: ShowNet::showConnectionsOnThread - ShowCont::show
 

 Key: TS-848
 URL: https://issues.apache.org/jira/browse/TS-848
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP
Affects Versions: 3.1.0
Reporter: Zhao Yongming
  Labels: http_ui, network
 Fix For: 3.1.1


 when we use the {net} http_ui network interface, it crashed with the 
 following information
 {code}
 NOTE: Traffic Server received Sig 11: Segmentation fault
 /usr/bin/traffic_server - STACK TRACE: 
 /usr/bin/traffic_server[0x51ba3e]
 /lib64/libpthread.so.0[0x3f89c0e7c0]
 [0x7fffd20544f8]
 /lib64/libc.so.6(vsnprintf+0x9a)[0x3f8906988a]
 /usr/bin/traffic_server(ShowCont::show(char const*, ...)+0x262)[0x638184]
 /usr/bin/traffic_server(ShowNet::showConnectionsOnThread(int, 
 Event*)+0x481)[0x6ec7bf]
 /usr/bin/traffic_server(Continuation::handleEvent(int, void*)+0x6f)[0x4d302f]
 /usr/bin/traffic_server(EThread::process_event(Event*, int)+0x11e)[0x6f9978]
 /usr/bin/traffic_server(EThread::execute()+0x94)[0x6f9b6a]
 /usr/bin/traffic_server(main+0x10c7)[0x4ff74d]
 /lib64/libc.so.6(__libc_start_main+0xf4)[0x3f8901d994]
 /usr/bin/traffic_server(__gxx_personality_v0+0x491)[0x4b2149]
 /usr/bin/traffic_server(__gxx_personality_v0+0x491)[0x4b2149]
 [New process 31182]
 #0  0x003f890796d0 in strlen () from /lib64/libc.so.6
 (gdb) bt
 #0  0x003f890796d0 in strlen () from /lib64/libc.so.6
 #1  0x003f89046b69 in vfprintf () from /lib64/libc.so.6
 #2  0x003f8906988a in vsnprintf () from /lib64/libc.so.6
 #3  0x00638184 in ShowCont::show (this=0x2aaab44af600, 
 s=0x7732b8 
 "<tr><td>%d</td><td>%s</td><td>%d</td><td>%d</td><td>%s</td><td>%d</td><td>%d 
 secs 
 ago</td><td>%d</td><td>%d</td><td>%d</td><td>%d</td><td>%d</td><td>%d</td><td>%d</td><td>%d
  secs</td><td>%d secs</td>"...) at ../../proxy/Show.h:62
 #4  0x006ec7bf in ShowNet::showConnectionsOnThread 
 (this=0x2aaab44af600, event=1, e=0x2aaab5cc2080) at UnixNetPages.cc:75
 #5  0x004d302f in Continuation::handleEvent (this=0x2aaab44af600, 
 event=1, data=0x2aaab5cc2080) at I_Continuation.h:146
 #6  0x006f9978 in EThread::process_event (this=0x2ae29010, 
 e=0x2aaab5cc2080, calling_code=1) at UnixEThread.cc:140
 #7  0x006f9b6a in EThread::execute (this=0x2ae29010) at 
 UnixEThread.cc:189
 #8  0x004ff74d in main (argc=3, argv=0x7fffd2054d88) at Main.cc:1958
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-848) Crash Report: ShowNet::showConnectionsOnThread - ShowCont::show

2011-07-24 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070292#comment-13070292
 ] 

John Plevyak commented on TS-848:
-

I think this is fixed in 1150526, give it a try.

 Crash Report: ShowNet::showConnectionsOnThread - ShowCont::show
 

 Key: TS-848
 URL: https://issues.apache.org/jira/browse/TS-848
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP
Affects Versions: 3.1.0
Reporter: Zhao Yongming
  Labels: http_ui, network
 Fix For: 3.1.1


 when we use the {net} http_ui network interface, it crashed with the 
 following information
 {code}
 NOTE: Traffic Server received Sig 11: Segmentation fault
 /usr/bin/traffic_server - STACK TRACE: 
 /usr/bin/traffic_server[0x51ba3e]
 /lib64/libpthread.so.0[0x3f89c0e7c0]
 [0x7fffd20544f8]
 /lib64/libc.so.6(vsnprintf+0x9a)[0x3f8906988a]
 /usr/bin/traffic_server(ShowCont::show(char const*, ...)+0x262)[0x638184]
 /usr/bin/traffic_server(ShowNet::showConnectionsOnThread(int, 
 Event*)+0x481)[0x6ec7bf]
 /usr/bin/traffic_server(Continuation::handleEvent(int, void*)+0x6f)[0x4d302f]
 /usr/bin/traffic_server(EThread::process_event(Event*, int)+0x11e)[0x6f9978]
 /usr/bin/traffic_server(EThread::execute()+0x94)[0x6f9b6a]
 /usr/bin/traffic_server(main+0x10c7)[0x4ff74d]
 /lib64/libc.so.6(__libc_start_main+0xf4)[0x3f8901d994]
 /usr/bin/traffic_server(__gxx_personality_v0+0x491)[0x4b2149]
 /usr/bin/traffic_server(__gxx_personality_v0+0x491)[0x4b2149]
 [New process 31182]
 #0  0x003f890796d0 in strlen () from /lib64/libc.so.6
 (gdb) bt
 #0  0x003f890796d0 in strlen () from /lib64/libc.so.6
 #1  0x003f89046b69 in vfprintf () from /lib64/libc.so.6
 #2  0x003f8906988a in vsnprintf () from /lib64/libc.so.6
 #3  0x00638184 in ShowCont::show (this=0x2aaab44af600, 
 s=0x7732b8 
 "<tr><td>%d</td><td>%s</td><td>%d</td><td>%d</td><td>%s</td><td>%d</td><td>%d 
 secs 
 ago</td><td>%d</td><td>%d</td><td>%d</td><td>%d</td><td>%d</td><td>%d</td><td>%d</td><td>%d
  secs</td><td>%d secs</td>"...) at ../../proxy/Show.h:62
 #4  0x006ec7bf in ShowNet::showConnectionsOnThread 
 (this=0x2aaab44af600, event=1, e=0x2aaab5cc2080) at UnixNetPages.cc:75
 #5  0x004d302f in Continuation::handleEvent (this=0x2aaab44af600, 
 event=1, data=0x2aaab5cc2080) at I_Continuation.h:146
 #6  0x006f9978 in EThread::process_event (this=0x2ae29010, 
 e=0x2aaab5cc2080, calling_code=1) at UnixEThread.cc:140
 #7  0x006f9b6a in EThread::execute (this=0x2ae29010) at 
 UnixEThread.cc:189
 #8  0x004ff74d in main (argc=3, argv=0x7fffd2054d88) at Main.cc:1958
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-834) Crash Report: InactivityCop::check_inactivity, event=2, UnixNet.cc:57

2011-06-16 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050605#comment-13050605
 ] 

John Plevyak commented on TS-834:
-

zym, do you still see this with the patch?

 Crash Report: InactivityCop::check_inactivity, event=2, UnixNet.cc:57
 -

 Key: TS-834
 URL: https://issues.apache.org/jira/browse/TS-834
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 3.1.0
 Environment: current trunk( the same time as v3.0), --enable-debug
Reporter: Zhao Yongming
  Labels: UnixNet
 Attachments: TS-834.diff


 bt #1
 {code}
 #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaaf4091b70, 
 event=1, data=0x4b2d6d0) at I_Continuation.h:146
 146 return (this->*handler) (event, data);
 (gdb) bt
 #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaaf4091b70, 
 event=1, data=0x4b2d6d0) at I_Continuation.h:146
 #1  0x006ce196 in InactivityCop::check_inactivity (this=0x4b3f780, 
 event=2, e=0x4b2d6d0) at UnixNet.cc:57
 #2  0x004d2c5f in Continuation::handleEvent (this=0x4b3f780, event=2, 
 data=0x4b2d6d0) at I_Continuation.h:146
 #3  0x006f5830 in EThread::process_event (this=0x2ae29010, 
 e=0x4b2d6d0, calling_code=2) at UnixEThread.cc:140
 #4  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
 UnixEThread.cc:217
 #5  0x004ff37d in main (argc=3, argv=0x7fff6f447418) at Main.cc:1958
 (gdb) info f
 Stack level 0, frame at 0x7fff6f446cb0:
  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
 (I_Continuation.h:146); saved rip 0x6ce196
  called by frame at 0x7fff6f446d00
  source language c++.
  Arglist at 0x7fff6f446ca0, args: this=0x2aaaf4091b70, event=1, data=0x4b2d6d0
  Locals at 0x7fff6f446ca0, Previous frame's sp is 0x7fff6f446cb0
  Saved registers:
   rbp at 0x7fff6f446ca0, rip at 0x7fff6f446ca8
 (gdb) x/80x this
 0x2aaaf4091b70: 0x0076a830  0x  0x006d1902  0x
 0x2aaaf4091b80: 0x  0x  0x0076a290  0x
 0x2aaaf4091b90: 0x  0x  0x  0x
 0x2aaaf4091ba0: 0x  0x  0x  0x
 0x2aaaf4091bb0: 0x  0x  0x  0x
 0x2aaaf4091bc0: 0x  0x  0x  0x
 0x2aaaf4091bd0: 0x  0x  0x  0x
 0x2aaaf4091be0: 0x  0x  0x  0x
 0x2aaaf4091bf0: 0x  0x  0x  0x
 0x2aaaf4091c00: 0x  0x  0x  0x
 0x2aaaf4091c10: 0x  0x  0x  0x
 0x2aaaf4091c20: 0x  0x  0x  0x
 0x2aaaf4091c30: 0x  0x  0x  0x
 0x2aaaf4091c40: 0x  0x  0x  0x
 0x2aaaf4091c50: 0x  0x  0x  0x
 0x2aaaf4091c60: 0x  0x  0x  0x
 0x2aaaf4091c70: 0x  0x  0x  0x
 0x2aaaf4091c80: 0x  0x  0x  0x
 0x2aaaf4091c90: 0x  0x  0x  0x
 0x2aaaf4091ca0: 0x  0x  0x  0x
 {code}
 bt #2
 {code}
 #0  0x004d2c5c in Continuation::handleEvent (this=0x11ed6000, 
 event=1, data=0x11cbc610) at I_Continuation.h:146
 146 return (this->*handler) (event, data);
 (gdb) bt
 #0  0x004d2c5c in Continuation::handleEvent (this=0x11ed6000, 
 event=1, data=0x11cbc610) at I_Continuation.h:146
 #1  0x006ce196 in InactivityCop::check_inactivity 
 (this=0x2c001f50, event=2, e=0x11cbc610) at UnixNet.cc:57
 #2  0x004d2c5f in Continuation::handleEvent (this=0x2c001f50, 
 event=2, data=0x11cbc610) at I_Continuation.h:146
 #3  0x006f5830 in EThread::process_event (this=0x2af2a010, 
 e=0x11cbc610, calling_code=2) at UnixEThread.cc:140
 #4  0x006f5b72 in EThread::execute (this=0x2af2a010) at 
 UnixEThread.cc:217
 #5  0x006f5181 in spawn_thread_internal (a=0x11cadae0) at Thread.cc:88
 #6  0x0030ec2064a7 in start_thread () from /lib64/libpthread.so.0
 #7  0x0030eb6d3c2d in clone () from /lib64/libc.so.6
 (gdb) info f
 Stack level 0, frame at 0x4198df60:
  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
 (I_Continuation.h:146); saved rip 0x6ce196
  called by frame at 0x4198dfb0
  source language c++.
  Arglist at 0x4198df50, args: this=0x11ed6000, event=1, data=0x11cbc610
  Locals at 0x4198df50, Previous frame's sp is 0x4198df60
  Saved 

[jira] [Commented] (TS-833) Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, ink_freelist_free related

2011-06-16 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050620#comment-13050620
 ] 

John Plevyak commented on TS-833:
-

mohan_zl, this latest crash is with TS-833-3.diff ??

 Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, 
 ink_freelist_free related
 ---

 Key: TS-833
 URL: https://issues.apache.org/jira/browse/TS-833
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 3.1.0
 Environment: current trunk, with --enable-debug
Reporter: Zhao Yongming
  Labels: freelist
 Attachments: TS-833-2.diff, TS-833-3.diff, TS-833.diff


 bt #1
 {code}
 #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
 event=2, data=0x197c4fc0) at I_Continuation.h:146
 146 return (this->*handler) (event, data);
 (gdb) bt
 #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
 event=2, data=0x197c4fc0) at I_Continuation.h:146
 #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
 e=0x197c4fc0, calling_code=2) at UnixEThread.cc:140
 #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
 UnixEThread.cc:217
 #3  0x004ff37d in main (argc=3, argv=0x7fff76c41528) at Main.cc:1958
 (gdb) info f
 Stack level 0, frame at 0x7fff76c40e40:
  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
 (I_Continuation.h:146); saved rip 0x6f5830
  called by frame at 0x7fff76c40eb0
  source language c++.
  Arglist at 0x7fff76c40e30, args: this=0x19581df0, event=2, data=0x197c4fc0
  Locals at 0x7fff76c40e30, Previous frame's sp is 0x7fff76c40e40
  Saved registers:
   rbp at 0x7fff76c40e30, rip at 0x7fff76c40e38
 (gdb) x/40x this
 0x19581df0: 0x19581901  0x  0xefbeadde  0xefbeadde
 0x19581e00: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e10: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e20: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e30: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e40: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e50: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e60: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e70: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e80: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 {code}
 bt #2
 {code}
 #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
 data=0xc4408a0) at I_Continuation.h:146
 146 return (this->*handler) (event, data);
 (gdb) bt
 #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
 data=0xc4408a0) at I_Continuation.h:146
 #1  0x0070364c in EThread::process_event (this=0x2ae29010, 
 e=0xc4408a0, calling_code=2) at UnixEThread.cc:140
 #2  0x0070398e in EThread::execute (this=0x2ae29010) at 
 UnixEThread.cc:217
 #3  0x00502aac in main (argc=3, argv=0x7fff32ef2f58) at Main.cc:1961
 (gdb) p *this
 $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x2aaab002f011}, 
 handler = 0xefbeaddeefbeadde, this adjustment -1171307680053154338, 
   handler_name = 0xefbeaddeefbeadde <Address 0xefbeaddeefbeadde out of 
 bounds>, mutex = {m_ptr = 0xefbeaddeefbeadde}, link = {<SLink<Continuation>> 
 = {
   next = 0xefbeaddeefbeadde}, prev = 0xefbeaddeefbeadde}}
 (gdb) 
 {code}
 bt #3
 {code}
 #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
 event=2, data=0x2aaab00d1570) at I_Continuation.h:146
 146 return (this->*handler) (event, data);
 (gdb) bt
 #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
 event=2, data=0x2aaab00d1570) at I_Continuation.h:146
 #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
 e=0x2aaab00d1570, calling_code=2) at UnixEThread.cc:140
 #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
 UnixEThread.cc:217
 #3  0x004ff37d in main (argc=3, argv=0x7fff421f08d8) at Main.cc:1958
 (gdb) info f
 Stack level 0, frame at 0x7fff421f01f0:
  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
 (I_Continuation.h:146); saved rip 0x6f5830
  called by frame at 0x7fff421f0260
  source language c++.
  Arglist at 0x7fff421f01e0, args: this=0x2aaab00615b0, event=2, 
 data=0x2aaab00d1570
  Locals at 0x7fff421f01e0, Previous frame's sp is 0x7fff421f01f0
  Saved registers:
   rbp at 0x7fff421f01e0, rip at 0x7fff421f01e8
 (gdb) p this->handler
 $1 = 0xefbeaddeefbeadde, this adjustment -1171307680053154338
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (TS-833) Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, ink_freelist_free related

2011-06-14 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-833:


Attachment: TS-833-2.diff

This is a possible patch which deals with DNS issues.

 Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, 
 ink_freelist_free related
 ---

 Key: TS-833
 URL: https://issues.apache.org/jira/browse/TS-833
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 3.1.0
 Environment: current trunk, with --enable-debug
Reporter: Zhao Yongming
  Labels: freelist
 Attachments: TS-833-2.diff, TS-833.diff


 bt #1
 {code}
 #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
 event=2, data=0x197c4fc0) at I_Continuation.h:146
 146 return (this->*handler) (event, data);
 (gdb) bt
 #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
 event=2, data=0x197c4fc0) at I_Continuation.h:146
 #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
 e=0x197c4fc0, calling_code=2) at UnixEThread.cc:140
 #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
 UnixEThread.cc:217
 #3  0x004ff37d in main (argc=3, argv=0x7fff76c41528) at Main.cc:1958
 (gdb) info f
 Stack level 0, frame at 0x7fff76c40e40:
  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
 (I_Continuation.h:146); saved rip 0x6f5830
  called by frame at 0x7fff76c40eb0
  source language c++.
  Arglist at 0x7fff76c40e30, args: this=0x19581df0, event=2, data=0x197c4fc0
  Locals at 0x7fff76c40e30, Previous frame's sp is 0x7fff76c40e40
  Saved registers:
   rbp at 0x7fff76c40e30, rip at 0x7fff76c40e38
 (gdb) x/40x this
 0x19581df0: 0x19581901  0x  0xefbeadde  0xefbeadde
 0x19581e00: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e10: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e20: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e30: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e40: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e50: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e60: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e70: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e80: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 {code}
 bt #2
 {code}
 #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
 data=0xc4408a0) at I_Continuation.h:146
 146 return (this->*handler) (event, data);
 (gdb) bt
 #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
 data=0xc4408a0) at I_Continuation.h:146
 #1  0x0070364c in EThread::process_event (this=0x2ae29010, 
 e=0xc4408a0, calling_code=2) at UnixEThread.cc:140
 #2  0x0070398e in EThread::execute (this=0x2ae29010) at 
 UnixEThread.cc:217
 #3  0x00502aac in main (argc=3, argv=0x7fff32ef2f58) at Main.cc:1961
 (gdb) p *this
 $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x2aaab002f011}, 
 handler = 0xefbeaddeefbeadde, this adjustment -1171307680053154338, 
   handler_name = 0xefbeaddeefbeadde <Address 0xefbeaddeefbeadde out of 
 bounds>, mutex = {m_ptr = 0xefbeaddeefbeadde}, link = {<SLink<Continuation>> 
 = {
   next = 0xefbeaddeefbeadde}, prev = 0xefbeaddeefbeadde}}
 (gdb) 
 {code}
 bt #3
 {code}
 #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
 event=2, data=0x2aaab00d1570) at I_Continuation.h:146
 146 return (this->*handler) (event, data);
 (gdb) bt
 #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
 event=2, data=0x2aaab00d1570) at I_Continuation.h:146
 #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
 e=0x2aaab00d1570, calling_code=2) at UnixEThread.cc:140
 #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
 UnixEThread.cc:217
 #3  0x004ff37d in main (argc=3, argv=0x7fff421f08d8) at Main.cc:1958
 (gdb) info f
 Stack level 0, frame at 0x7fff421f01f0:
  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
 (I_Continuation.h:146); saved rip 0x6f5830
  called by frame at 0x7fff421f0260
  source language c++.
  Arglist at 0x7fff421f01e0, args: this=0x2aaab00615b0, event=2, 
 data=0x2aaab00d1570
  Locals at 0x7fff421f01e0, Previous frame's sp is 0x7fff421f01f0
  Saved registers:
   rbp at 0x7fff421f01e0, rip at 0x7fff421f01e8
 (gdb) p this->handler
 $1 = 0xefbeaddeefbeadde, this adjustment -1171307680053154338
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (TS-833) Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, ink_freelist_free related

2011-06-14 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-833:


Attachment: TS-833-3.diff

Even more conservative coding style.

 Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, 
 ink_freelist_free related
 ---

 Key: TS-833
 URL: https://issues.apache.org/jira/browse/TS-833
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 3.1.0
 Environment: current trunk, with --enable-debug
Reporter: Zhao Yongming
  Labels: freelist
 Attachments: TS-833-2.diff, TS-833-3.diff, TS-833.diff


 bt #1
 {code}
 #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
 event=2, data=0x197c4fc0) at I_Continuation.h:146
 146 return (this->*handler) (event, data);
 (gdb) bt
 #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
 event=2, data=0x197c4fc0) at I_Continuation.h:146
 #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
 e=0x197c4fc0, calling_code=2) at UnixEThread.cc:140
 #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
 UnixEThread.cc:217
 #3  0x004ff37d in main (argc=3, argv=0x7fff76c41528) at Main.cc:1958
 (gdb) info f
 Stack level 0, frame at 0x7fff76c40e40:
  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
 (I_Continuation.h:146); saved rip 0x6f5830
  called by frame at 0x7fff76c40eb0
  source language c++.
  Arglist at 0x7fff76c40e30, args: this=0x19581df0, event=2, data=0x197c4fc0
  Locals at 0x7fff76c40e30, Previous frame's sp is 0x7fff76c40e40
  Saved registers:
   rbp at 0x7fff76c40e30, rip at 0x7fff76c40e38
 (gdb) x/40x this
 0x19581df0: 0x19581901  0x  0xefbeadde  0xefbeadde
 0x19581e00: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e10: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e20: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e30: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e40: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e50: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e60: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e70: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e80: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 {code}
 bt #2
 {code}
 #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
 data=0xc4408a0) at I_Continuation.h:146
 146 return (this->*handler) (event, data);
 (gdb) bt
 #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
 data=0xc4408a0) at I_Continuation.h:146
 #1  0x0070364c in EThread::process_event (this=0x2ae29010, 
 e=0xc4408a0, calling_code=2) at UnixEThread.cc:140
 #2  0x0070398e in EThread::execute (this=0x2ae29010) at 
 UnixEThread.cc:217
 #3  0x00502aac in main (argc=3, argv=0x7fff32ef2f58) at Main.cc:1961
 (gdb) p *this
 $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x2aaab002f011}, 
 handler = 0xefbeaddeefbeadde, this adjustment -1171307680053154338, 
   handler_name = 0xefbeaddeefbeadde <Address 0xefbeaddeefbeadde out of 
 bounds>, mutex = {m_ptr = 0xefbeaddeefbeadde}, link = {<SLink<Continuation>> 
 = {
   next = 0xefbeaddeefbeadde}, prev = 0xefbeaddeefbeadde}}
 (gdb) 
 {code}
 bt #3
 {code}
 #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
 event=2, data=0x2aaab00d1570) at I_Continuation.h:146
 146 return (this->*handler) (event, data);
 (gdb) bt
 #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
 event=2, data=0x2aaab00d1570) at I_Continuation.h:146
 #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
 e=0x2aaab00d1570, calling_code=2) at UnixEThread.cc:140
 #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
 UnixEThread.cc:217
 #3  0x004ff37d in main (argc=3, argv=0x7fff421f08d8) at Main.cc:1958
 (gdb) info f
 Stack level 0, frame at 0x7fff421f01f0:
  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
 (I_Continuation.h:146); saved rip 0x6f5830
  called by frame at 0x7fff421f0260
  source language c++.
  Arglist at 0x7fff421f01e0, args: this=0x2aaab00615b0, event=2, 
 data=0x2aaab00d1570
  Locals at 0x7fff421f01e0, Previous frame's sp is 0x7fff421f01f0
  Saved registers:
   rbp at 0x7fff421f01e0, rip at 0x7fff421f01e8
 (gdb) p this->handler
 $1 = 0xefbeaddeefbeadde, this adjustment -1171307680053154338
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-833) Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, ink_freelist_free related

2011-06-13 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048750#comment-13048750
 ] 

John Plevyak commented on TS-833:
-

I have a theory about this, but I am not sure why the problem has only manifested 
now, as it seems to have been in the codebase for a while.  The theory is that 
the vc_next is bad because it has been closed as a result of the inactivity 
callback.  This could be checked by walking down nh->open_list in the debugger 
(or code) to see if next_vc is in the list.
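
The suggested check can be written directly: before dispatching to next_vc, confirm it is still on the handler's open list. A simplified sketch with invented names (the real open_list is an intrusive queue, so the shape differs):

{code}
#include <list>

struct VC {
  bool closed = false;
};

struct NetHandlerStub {
  std::list<VC *> open_list;  // stand-in for nh->open_list

  // The check from the comment: is this vc still tracked by the handler?
  bool still_open(const VC *vc) const {
    for (const VC *v : open_list)
      if (v == vc)
        return true;
    return false;
  }

  void cop_step(VC *next_vc) {
    // Only dispatch the inactivity event if next_vc was not closed (and
    // removed from open_list) by an earlier callback in this same sweep.
    if (next_vc != nullptr && still_open(next_vc) && !next_vc->closed) {
      // next_vc->handleEvent(EVENT_IMMEDIATE, ...) would go here
    }
  }
};
{code}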

 Crash Report: Continuation::handleEvent, event=2, 0xdeadbeef, 
 ink_freelist_free related
 ---

 Key: TS-833
 URL: https://issues.apache.org/jira/browse/TS-833
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 3.1.0
 Environment: current trunk, with --enable-debug
Reporter: Zhao Yongming
  Labels: freelist

 bt #1
 {code}
 #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
 event=2, data=0x197c4fc0) at I_Continuation.h:146
 146 return (this->*handler) (event, data);
 (gdb) bt
 #0  0x004d2c5c in Continuation::handleEvent (this=0x19581df0, 
 event=2, data=0x197c4fc0) at I_Continuation.h:146
 #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
 e=0x197c4fc0, calling_code=2) at UnixEThread.cc:140
 #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
 UnixEThread.cc:217
 #3  0x004ff37d in main (argc=3, argv=0x7fff76c41528) at Main.cc:1958
 (gdb) info f
 Stack level 0, frame at 0x7fff76c40e40:
  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
 (I_Continuation.h:146); saved rip 0x6f5830
  called by frame at 0x7fff76c40eb0
  source language c++.
  Arglist at 0x7fff76c40e30, args: this=0x19581df0, event=2, data=0x197c4fc0
  Locals at 0x7fff76c40e30, Previous frame's sp is 0x7fff76c40e40
  Saved registers:
   rbp at 0x7fff76c40e30, rip at 0x7fff76c40e38
 (gdb) x/40x this
 0x19581df0: 0x19581901  0x  0xefbeadde  0xefbeadde
 0x19581e00: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e10: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e20: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e30: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e40: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e50: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e60: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e70: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 0x19581e80: 0xefbeadde  0xefbeadde  0xefbeadde  0xefbeadde
 {code}
 bt #2
 {code}
 #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
 data=0xc4408a0) at I_Continuation.h:146
 146 return (this->*handler) (event, data);
 (gdb) bt
 #0  0x004d637c in Continuation::handleEvent (this=0xc3cc390, event=2, 
 data=0xc4408a0) at I_Continuation.h:146
 #1  0x0070364c in EThread::process_event (this=0x2ae29010, 
 e=0xc4408a0, calling_code=2) at UnixEThread.cc:140
 #2  0x0070398e in EThread::execute (this=0x2ae29010) at 
 UnixEThread.cc:217
 #3  0x00502aac in main (argc=3, argv=0x7fff32ef2f58) at Main.cc:1961
 (gdb) p *this
 $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x2aaab002f011}, 
 handler = 0xefbeaddeefbeadde, this adjustment -1171307680053154338, 
   handler_name = 0xefbeaddeefbeadde <Address 0xefbeaddeefbeadde out of 
 bounds>, mutex = {m_ptr = 0xefbeaddeefbeadde}, link = {<SLink<Continuation>> 
 = {
   next = 0xefbeaddeefbeadde}, prev = 0xefbeaddeefbeadde}}
 (gdb) 
 {code}
 bt #3
 {code}
 #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
 event=2, data=0x2aaab00d1570) at I_Continuation.h:146
 146 return (this->*handler) (event, data);
 (gdb) bt
 #0  0x004d2c5c in Continuation::handleEvent (this=0x2aaab00615b0, 
 event=2, data=0x2aaab00d1570) at I_Continuation.h:146
 #1  0x006f5830 in EThread::process_event (this=0x2ae29010, 
 e=0x2aaab00d1570, calling_code=2) at UnixEThread.cc:140
 #2  0x006f5b72 in EThread::execute (this=0x2ae29010) at 
 UnixEThread.cc:217
 #3  0x004ff37d in main (argc=3, argv=0x7fff421f08d8) at Main.cc:1958
 (gdb) info f
 Stack level 0, frame at 0x7fff421f01f0:
  rip = 0x4d2c5c in Continuation::handleEvent(int, void*) 
 (I_Continuation.h:146); saved rip 0x6f5830
  called by frame at 0x7fff421f0260
  source language c++.
  Arglist at 0x7fff421f01e0, args: this=0x2aaab00615b0, event=2, 
 data=0x2aaab00d1570
  Locals at 0x7fff421f01e0, Previous frame's sp is 0x7fff421f01f0
  Saved registers:
   rbp at 0x7fff421f01e0, 

[jira] [Updated] (TS-811) libtool configure warnings on Fedora 15

2011-05-30 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-811:


Priority: Major  (was: Minor)

autoreconf -i fails.  If the configure file is built on some other machine, then
it will work fine, so this only impacts developers :)

The resulting configure and the Makefiles are not functional:

make[3]: Entering directory 
`/a/home/jplevyak/projects/ts/trafficserver-2.1.9-unstable/lib/ts'
/bin/sh ../../libtool --tag=CXX   --mode=compile g++ -DHAVE_CONFIG_H -I.   
-D_LARGEFILE64_SOURCE=1 -D_COMPILE64BIT_SOURCE=1 -D_GNU_SOURCE -D_REENTRANT 
-Dlinux  -g -pipe -Wall -Werror -O3 -feliminate-unused-debug-symbols 
-fno-strict-aliasing -Wno-invalid-offsetof  -MT Allocator.lo -MD -MP -MF 
.deps/Allocator.Tpo -c -o Allocator.lo Allocator.cc
../../libtool: line 2089: ./Allocator.cc: Permission denied
libtool: compile:  g++ -DHAVE_CONFIG_H -I. -D_LARGEFILE64_SOURCE=1 
-D_COMPILE64BIT_SOURCE=1 -D_GNU_SOURCE -D_REENTRANT -Dlinux -g -pipe -Wall 
-Werror -O3 -feliminate-unused-debug-symbols -fno-strict-aliasing 
-Wno-invalid-offsetof -MT Allocator.lo -MD -MP -MF .deps/Allocator.Tpo -c   
-fPIC -DPIC -o .libs/Allocator.o
g++: error: : No such file or directory
g++: fatal error: no input files
compilation terminated.
make[3]: *** [Allocator.lo] Error 1
make[3]: Leaving directory 
`/a/home/jplevyak/projects/ts/trafficserver-2.1.9-unstable/lib/ts'
make[2]: *** [all] Error 2
make[2]: Leaving directory 
`/a/home/jplevyak/projects/ts/trafficserver-2.1.9-unstable/lib/ts'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory 
`/a/home/jplevyak/projects/ts/trafficserver-2.1.9-unstable/lib'


 libtool configure warnings on Fedora 15
 ---

 Key: TS-811
 URL: https://issues.apache.org/jira/browse/TS-811
 Project: Traffic Server
  Issue Type: Bug
  Components: Build
Affects Versions: 2.1.9
 Environment: Fedora 15 x86_64.
Reporter: John Plevyak
 Fix For: 3.1.0


 configure.ac:465: warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected 
 in body
 ../../lib/autoconf/lang.m4:194: AC_LANG_CONFTEST is expanded from...
 ../../lib/autoconf/general.m4:2662: _AC_LINK_IFELSE is expanded from...
 ../../lib/autoconf/general.m4:2679: AC_LINK_IFELSE is expanded from...
 build/libtool.m4:1084: _LT_SYS_MODULE_PATH_AIX is expanded from...
 build/libtool.m4:5428: _LT_LANG_CXX_CONFIG is expanded from...
 build/libtool.m4:816: _LT_LANG is expanded from...
 build/libtool.m4:799: LT_LANG is expanded from...
 build/libtool.m4:827: _LT_LANG_DEFAULT_CONFIG is expanded from...
 build/libtool.m4:143: _LT_SETUP is expanded from...
 build/libtool.m4:69: LT_INIT is expanded from...
 build/libtool.m4:107: AC_PROG_LIBTOOL is expanded from...
 configure.ac:465: the top level

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (TS-621) writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached

2011-05-20 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-621:


Backport to Version: 3.0.1
  Fix Version/s: (was: 2.1.9)
 3.1

This change is just too risky to land in 3.0.   We will make the change first 
thing in 3.1 and then backport if/when it proves stable.

 writing 0 bytes to the HTTP cache means only update the header... need a new 
 API: update_header_only() to allow 0 byte files to be cached
 -

 Key: TS-621
 URL: https://issues.apache.org/jira/browse/TS-621
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache
Affects Versions: 2.1.5
Reporter: John Plevyak
Assignee: John Plevyak
 Fix For: 3.1

 Attachments: TS-621_cluster_zero_size_objects.patch, 
 ts-621-jp-1.patch, ts-621-jp-2.patch, ts-621-jp-3.patch




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-621) writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached

2011-05-19 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036424#comment-13036424
 ] 

John Plevyak commented on TS-621:
-

Obviously the patch needs to be fixed up a bit.  The Cluster used the
CacheDataType as a message type, so I hacked in:

 enum CacheDataType
 {
   CACHE_DATA_SIZE = VCONNECTION_CACHE_DATA_BASE,
-  CACHE_DATA_HTTP_INFO,
+  CACHE_DATA_HTTP_INFO_LEAVE_BODY,
+  CACHE_DATA_HTTP_INFO_REPLACE_BODY,
   CACHE_DATA_KEY,
   CACHE_DATA_RAM_CACHE_HIT_FLAG
 };

Which doesn't really make sense.  The leave/replace bit should be encoded
somewhere else in the message.
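
For illustration only, one way the leave/replace bit could live outside the
data-type enum is as a separate flag word carried with the message; the flag
and struct names below are hypothetical, not the actual cluster message layout:

{code}
// Hypothetical sketch only -- not the real cluster message format.
// CACHE_DATA_HTTP_INFO stays a single data type; the leave/replace
// choice travels in a separate flags field of the message.
#include <cstdint>

constexpr int32_t VCONNECTION_CACHE_DATA_BASE = 0;  // placeholder value

enum CacheDataType {
  CACHE_DATA_SIZE = VCONNECTION_CACHE_DATA_BASE,
  CACHE_DATA_HTTP_INFO,   // unchanged: one type for HTTP info
  CACHE_DATA_KEY,
  CACHE_DATA_RAM_CACHE_HIT_FLAG
};

enum CacheHttpInfoFlags {               // hypothetical flag bits
  CACHE_HTTP_INFO_LEAVE_BODY   = 0x1,   // update header, keep existing body
  CACHE_HTTP_INFO_REPLACE_BODY = 0x2    // update header and replace the body
};

struct ClusterCacheMsgHeader {          // illustrative only
  int32_t data_type;                    // a CacheDataType value
  int32_t flags;                        // CacheHttpInfoFlags when data_type
                                        //   == CACHE_DATA_HTTP_INFO
};
{code}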

The changes to CacheWrite are very tricky and I have little faith in them.

We could land it, but we would need some serious testing...





 writing 0 bytes to the HTTP cache means only update the header... need a new 
 API: update_header_only() to allow 0 byte files to be cached
 -

 Key: TS-621
 URL: https://issues.apache.org/jira/browse/TS-621
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache
Affects Versions: 2.1.5
Reporter: John Plevyak
Assignee: John Plevyak
 Fix For: 2.1.9

 Attachments: TS-621_cluster_zero_size_objects.patch, 
 ts-621-jp-1.patch, ts-621-jp-2.patch, ts-621-jp-3.patch




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (TS-621) writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached

2011-05-17 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated TS-621:


Attachment: ts-621-jp-3.patch

This one works... but I would consider it very risky.

 writing 0 bytes to the HTTP cache means only update the header... need a new 
 API: update_header_only() to allow 0 byte files to be cached
 -

 Key: TS-621
 URL: https://issues.apache.org/jira/browse/TS-621
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache
Affects Versions: 2.1.5
Reporter: John Plevyak
Assignee: John Plevyak
 Fix For: 2.1.9

 Attachments: ts-621-jp-1.patch, ts-621-jp-2.patch, ts-621-jp-3.patch




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-621) writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached

2011-05-16 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034408#comment-13034408
 ] 

John Plevyak commented on TS-621:
-

lol, my testing tool treats 0 length as varied

testing now.

john






 writing 0 bytes to the HTTP cache means only update the header... need a new 
 API: update_header_only() to allow 0 byte files to be cached
 -

 Key: TS-621
 URL: https://issues.apache.org/jira/browse/TS-621
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache
Affects Versions: 2.1.5
Reporter: John Plevyak
Assignee: John Plevyak
 Fix For: 2.1.9

 Attachments: ts-621-jp-1.patch




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-621) writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached

2011-05-16 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034424#comment-13034424
 ] 

John Plevyak commented on TS-621:
-

yes, the HTTP state machine needs some more changes, and these are beyond me.

I changed it so that it makes the correct calls to the cache, but it seems that
content-length of 0 is hard-wired into HttpSM as an error.  The problem emerges
in this frame (this=0x7fffea3e01c0, event=103, c=0x7fffea3e1f40) at HttpSM.cc:3162:
3162      c->vc->do_io_close(EHTTP_ERROR);
(gdb) list
3157    //   we got a truncated header from the origin server
3158    //   but decided to accpet it anyways
3159    if (c->write_vio == NULL) {
3160      *status_ptr = HttpTransact::CACHE_WRITE_ERROR;
3161      c->write_success = false;
3162      c->vc->do_io_close(EHTTP_ERROR);
3163    } else {
3164      *status_ptr = HttpTransact::CACHE_WRITE_COMPLETE;
3165      c->write_success = true;
3166      c->write_vio = c->vc->do_io(VIO::CLOSE);

It seems that c->write_vio is NULL, which causes the HttpSM to close the cache
with an error.

It is easy to test... just put a breakpoint in CacheVC::openWriteClose

The close should be without error.
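
Purely as a sketch of the kind of special case being suggested here (this is
NOT the actual HttpSM.cc change, and response_content_length is a made-up name
for illustration), the zero-length update could be routed to a clean close
instead of the error path:

{code}
// Hypothetical fragment, not the actual patch.
if (c->write_vio == NULL && response_content_length == 0) {
  // header-only (zero byte) update: treat the cache write as a success
  *status_ptr = HttpTransact::CACHE_WRITE_COMPLETE;
  c->write_success = true;
  c->vc->do_io_close();                 // close without EHTTP_ERROR
} else if (c->write_vio == NULL) {
  // truncated header from the origin: keep the existing error path
  *status_ptr = HttpTransact::CACHE_WRITE_ERROR;
  c->write_success = false;
  c->vc->do_io_close(EHTTP_ERROR);
}
{code}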

 writing 0 bytes to the HTTP cache means only update the header... need a new 
 API: update_header_only() to allow 0 byte files to be cached
 -

 Key: TS-621
 URL: https://issues.apache.org/jira/browse/TS-621
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache
Affects Versions: 2.1.5
Reporter: John Plevyak
Assignee: John Plevyak
 Fix For: 2.1.9

 Attachments: ts-621-jp-1.patch




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-779) Set thread name for various event types

2011-05-15 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033807#comment-13033807
 ] 

John Plevyak commented on TS-779:
-

Nice!  Looks good.

 Set thread name for various event types
 ---

 Key: TS-779
 URL: https://issues.apache.org/jira/browse/TS-779
 Project: Traffic Server
  Issue Type: New Feature
  Components: Core
Reporter: Leif Hedstrom
Assignee: Leif Hedstrom
 Attachments: TS-779.diff


 Where supported, I'd like to set the thread name (using prctl) for the 
 various event threads that we have. This makes it much easier to see what 
 type of thread is consuming resources.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-773) Traffic server has a hard limit of 512 gigabytes per RAW disk partition

2011-05-15 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033816#comment-13033816
 ] 

John Plevyak commented on TS-773:
-

Well, I did predict failure :)  In this case the problem was that the directory
can now be more than 2GB in size, which exceeds an 'int'.  The resulting patch
touched lots of the system because it also means we can now read and write
> 2GB in a single go.  I have submitted a patch and tested it on a faked disk of
over 2TB (I had Store.cc lie about the size of the disk).  Give it a go.
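
As a self-contained illustration of the underlying problem (the names here are
illustrative, not taken from the Traffic Server sources): a size over 2GB
simply does not fit in a 32-bit 'int', so the counts have to be 64-bit:

{code}
#include <cstdint>
#include <cstdio>

int main() {
  int64_t dir_bytes = 3LL * 1024 * 1024 * 1024;      // a 3GB directory

  int     as_int   = (int) dir_bytes;                // truncates: > 2^31 - 1
  int64_t as_int64 = dir_bytes;                      // safe

  std::printf("as int:   %d\n", as_int);             // garbage / negative
  std::printf("as int64: %lld\n", (long long) as_int64);
  return 0;
}
{code}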


 Traffic server has a hard limit of 512 gigabytes per RAW disk partition
 ---

 Key: TS-773
 URL: https://issues.apache.org/jira/browse/TS-773
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 2.1.8
 Environment: Debian Lenny 5.0.8 2.6.34.7 x86_64
 12 1.5TB harddrives for cache disks. 
Reporter: David Robinson
Assignee: John Plevyak
 Fix For: 2.1.9


 Using 1.5TB harddrives as cache disks results in ATS only using 512GBs of the 
 disk. The disks are configured in RAW mode with no partition information.
 storage.config is setup like this,
 /dev/sda
 /dev/sdb
 /dev/sde
 /dev/sdf
 /dev/sdh
 /dev/sdi
 /dev/sdj
 /dev/sdk
 /dev/sdl
 /dev/sdm
 /dev/sdn
 /dev/sdo
 fdisk -l /dev/sdo
 Disk /dev/sdo: 1500.3 GB, 1500301910016 bytes
 255 heads, 63 sectors/track, 182401 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes
 Disk identifier: 0x
 Partitioning a disk into 3 512G partition and adding then to storage.config 
 will make ATS use the entire 1.5TBs of space.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-621) writing 0 bytes to the HTTP cache means only update the header... need a new API: update_header_only() to allow 0 byte files to be cached

2011-05-15 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033817#comment-13033817
 ] 

John Plevyak commented on TS-621:
-

I'd like to get it in.  The concern, however, is that it changes the API, which
means that it will break clustering, so we have to get cluster changes/testing
in place before committing.

john




 writing 0 bytes to the HTTP cache means only update the header... need a new 
 API: update_header_only() to allow 0 byte files to be cached
 -

 Key: TS-621
 URL: https://issues.apache.org/jira/browse/TS-621
 Project: Traffic Server
  Issue Type: Improvement
  Components: Cache
Affects Versions: 2.1.5
Reporter: John Plevyak
Assignee: John Plevyak
 Fix For: 2.1.9




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-773) Traffic server has a hard limit of 512 gigabytes per RAW disk partition

2011-05-11 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032061#comment-13032061
 ] 

John Plevyak commented on TS-773:
-

I checked in what I hope is the fix.  The cplist code needs to be cleaned up,
as it is not clear which counts are in bytes, store blocks, or disk volume
blocks.
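
A minimal sketch of the kind of cleanup that would help (the block sizes and
helper names below are assumptions for illustration, not the actual cplist
code): keep every count in one unit and convert explicitly.

{code}
#include <cstdint>

// Assumed sizes, for illustration only.
constexpr int64_t STORE_BLOCK_SIZE = 8192;
constexpr int64_t CACHE_BLOCK_SIZE = 512;

inline int64_t store_blocks_to_bytes(int64_t blocks) { return blocks * STORE_BLOCK_SIZE; }
inline int64_t bytes_to_store_blocks(int64_t bytes)  { return bytes  / STORE_BLOCK_SIZE; }
inline int64_t cache_blocks_to_bytes(int64_t blocks) { return blocks * CACHE_BLOCK_SIZE; }
inline int64_t bytes_to_cache_blocks(int64_t bytes)  { return bytes  / CACHE_BLOCK_SIZE; }
{code}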

WARNING: this fix required changing the disk structure which will result in a 
cache WIPE!

 Traffic server has a hard limit of 512 gigabytes per RAW disk partition
 ---

 Key: TS-773
 URL: https://issues.apache.org/jira/browse/TS-773
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 2.1.8
 Environment: Debian Lenny 5.0.8 2.6.34.7 x86_64
 12 1.5TB harddrives for cache disks. 
Reporter: David Robinson
Assignee: John Plevyak
 Fix For: 2.1.9


 Using 1.5TB harddrives as cache disks results in ATS only using 512GBs of the 
 disk. The disks are configured in RAW mode with no partition information.
 storage.config is setup like this,
 /dev/sda
 /dev/sdb
 /dev/sde
 /dev/sdf
 /dev/sdh
 /dev/sdi
 /dev/sdj
 /dev/sdk
 /dev/sdl
 /dev/sdm
 /dev/sdn
 /dev/sdo
 fdisk -l /dev/sdo
 Disk /dev/sdo: 1500.3 GB, 1500301910016 bytes
 255 heads, 63 sectors/track, 182401 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes
 Disk identifier: 0x
 Partitioning a disk into 3 512G partition and adding then to storage.config 
 will make ATS use the entire 1.5TBs of space.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-773) Traffic server has a hard limit of 512 gigabytes per RAW disk partition

2011-05-11 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032119#comment-13032119
 ] 

John Plevyak commented on TS-773:
-

Unfortunately, the hosting code is pretty complicated and poorly written.
I have tried to patch it, but it really should have a more comprehensive
audit.  If this patch fixes the immediate problem (which it seems to have, but
I'd like independent confirmation), we can put off the audit until after 3.0.

So I don't foresee more changes if we get the fix verified, but I wouldn't be
surprised if there were still a problem under some circumstances until a full
audit is done.

 Traffic server has a hard limit of 512 gigabytes per RAW disk partition
 ---

 Key: TS-773
 URL: https://issues.apache.org/jira/browse/TS-773
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 2.1.8
 Environment: Debian Lenny 5.0.8 2.6.34.7 x86_64
 12 1.5TB harddrives for cache disks. 
Reporter: David Robinson
Assignee: John Plevyak
 Fix For: 2.1.9


 Using 1.5TB harddrives as cache disks results in ATS only using 512GBs of the 
 disk. The disks are configured in RAW mode with no partition information.
 storage.config is setup like this,
 /dev/sda
 /dev/sdb
 /dev/sde
 /dev/sdf
 /dev/sdh
 /dev/sdi
 /dev/sdj
 /dev/sdk
 /dev/sdl
 /dev/sdm
 /dev/sdn
 /dev/sdo
 fdisk -l /dev/sdo
 Disk /dev/sdo: 1500.3 GB, 1500301910016 bytes
 255 heads, 63 sectors/track, 182401 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes
 Disk identifier: 0x
 Partitioning a disk into 3 512G partition and adding then to storage.config 
 will make ATS use the entire 1.5TBs of space.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-745) Support ssd

2011-05-10 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13031254#comment-13031254
 ] 

John Plevyak commented on TS-745:
-

We should make a branch for this.  Then we could collaborate.  I would also 
like to simplify locking in the cache by getting rid of the TryLocks for the 
partition.   This would make it easier to write the SSD code to handle multiple 
SSDs I think.

What do you think mohan_zl?

 Support ssd
 ---

 Key: TS-745
 URL: https://issues.apache.org/jira/browse/TS-745
 Project: Traffic Server
  Issue Type: New Feature
  Components: Cache
Reporter: mohan_zl
Assignee: mohan_zl
 Attachments: ssd_cache.patch


 A patch for supporting, not work well for a long time with --enable-debug

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-752) cache scan issues

2011-04-28 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13026758#comment-13026758
 ] 

John Plevyak commented on TS-752:
-

I reviewed this and it looks good to me.  I'd move vol_relative_length
into the header (for symmetry).

The fix for partial objects I think is nice as well.  

I don't see the crash fix in svn5.diff, so I am assuming that the two are 
independent.

Since this patch is all about scan it should be safe.

William, how much testing have you given this?  If you are comfortable, I'll 
give it a smoke test and commit.

 cache scan issues
 -

 Key: TS-752
 URL: https://issues.apache.org/jira/browse/TS-752
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 2.1.7, 2.1.6, 2.1.5, 2.1.4
 Environment: Any
Reporter: William Bardwell
Assignee: Leif Hedstrom
 Fix For: 2.1.8

 Attachments: svn4.diff, svn4.diff, svn5.diff


 Using the CacheScan plugin APIs I found a few issues.
 Issue 1 is that if you cancel a scan really quickly you can get a NULL 
 dereference, the fix for this is easy.
 Issue 2 is that the cache scan code can skip over entries if the initial 
 header overlaps a buffer boundary.
 Issue 3 is that the cache scan code is crazy slow if your cache is not full, 
 it still scans everything.
 I will attach a patch for Issues 2 & 3 mixed together...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-716) Crash in Continuation::handleEvent

2011-04-26 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025308#comment-13025308
 ] 

John Plevyak commented on TS-716:
-

We should make another bug for this.  I am not sure why, but it probably has 
something to do with one or more of the sites having a large round robin and a 
low TTL.  It could also be exacerbated by a startup issue where a lot of 
outstanding DNS requests for the same host are done unnecessarily.  There is 
supposed to be queuing in the DNS processor, but that may not be working as 
intended.

 Crash in Continuation::handleEvent 
 ---

 Key: TS-716
 URL: https://issues.apache.org/jira/browse/TS-716
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 2.1.7
 Environment: CentOS 5.4 x86_64, 6 * 2T SATA Disks, 48G Memory
Reporter: Kissdev
Assignee: John Plevyak
Priority: Critical
 Fix For: 2.1.8

 Attachments: crasher.patch


 ATS crashes with the following configuration: 
   - reverse proxy , storage: 6 raw devices (6*2T),  1 partition (2T)
   - remap config:regex_map http://(.*) http://$1
 The load :  about 100Mbps, requests for top 4000 internet sites, mainly 
 html,js,pictures,flashes
 Detail of crashes by core dump:
 crash #1:
 {code}
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, 
 event=1, data=0x90f7170) at I_Continuation.h:146
 146 I_Continuation.h: No such file or directory.
 in I_Continuation.h
 (gdb) bt
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, 
 event=1, data=0x90f7170) at I_Continuation.h:146
 #1  0x00702b80 in EThread::process_event (this=0x2b101010, 
 e=0x90f7170, calling_code=1) at UnixEThread.cc:140
 #2  0x00702fa1 in EThread::execute (this=0x2b101010) at 
 UnixEThread.cc:232
 #3  0x007024d2 in spawn_thread_internal (a=0x8d94a70) at Thread.cc:85
 #4  0x0036ebc064a7 in start_thread () from /lib64/libpthread.so.0
 #5  0x0036eb0d3c2d in clone () from /lib64/libc.so.6
 (gdb) frame 0
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, 
 event=1, data=0x90f7170) at I_Continuation.h:146
 146 in I_Continuation.h
 (gdb) print *this
 $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x2aaaba360a11},
   handler = virtual table offset -1157442765409226770, this adjustment 
 -1157442765409226769,
   handler_name = 0xefefefefefefefef <Address 0xefefefefefefefef out of 
 bounds>, mutex = {m_ptr = 0xefefefefefefefef}, link = {SLink<Continuation> 
 = {
   next = 0xefefefefefefefef}, prev = 0xefefefefefefefef}}
 {code}
 crash #2:
 {code}
 (gdb) bt
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaabc0bce80, 
 event=1, data=0x154b5a80) at I_Continuation.h:146
 #1  0x006db290 in InactivityCop::check_inactivity (this=0x154c8730, 
 event=2, e=0x154b5a80) at UnixNet.cc:57
 #2  0x004dd1bb in Continuation::handleEvent (this=0x154c8730, 
 event=2, data=0x154b5a80) at I_Continuation.h:146
 #3  0x00702b80 in EThread::process_event (this=0x2b606010, 
 e=0x154b5a80, calling_code=2) at UnixEThread.cc:140
 #4  0x00702ec2 in EThread::execute (this=0x2b606010) at 
 UnixEThread.cc:217
 #5  0x007024d2 in spawn_thread_internal (a=0x154852c0) at Thread.cc:85
 #6  0x0036ebc064a7 in start_thread () from /lib64/libpthread.so.0
 #7  0x0036eb0d3c2d in clone () from /lib64/libc.so.6
 (gdb) frame 0
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaabc0bce80, 
 event=1, data=0x154b5a80) at I_Continuation.h:146
 146 in I_Continuation.h
 (gdb) print *this
 $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x16280061},
   handler = virtual table offset -1157442765409226770, this adjustment 
 -1157442765409226769,
   handler_name = 0xefefefefefefefef <Address 0xefefefefefefefef out of 
 bounds>, mutex = {m_ptr = 0xefefefefefefefef}, link = {SLink<Continuation> 
 = {
   next = 0xefefefefefefefef}, prev = 0xefefefefefefefef}}
 (gdb)
 {code}
 crash #3:
 {code}
 (gdb) bt
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaab45d3a10, 
 event=2, data=0x5631120) at I_Continuation.h:146
 #1  0x00702b80 in EThread::process_event (this=0x2abfc010, 
 e=0x5631120, calling_code=2) at UnixEThread.cc:140
 #2  0x00702ec2 in EThread::execute (this=0x2abfc010) at 
 UnixEThread.cc:217
 #3  0x0050917c in main (argc=3, argv=0x7fff0af6e3b8) at Main.cc:1962
 (gdb) frame 0
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaab45d3a10, 
 event=2, data=0x5631120) at I_Continuation.h:146
 146 in I_Continuation.h
 (gdb) print *this
 $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x2aaab45df291},
   handler = virtual table offset 

[jira] [Commented] (TS-716) Crash in Continuation::handleEvent

2011-04-25 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024834#comment-13024834
 ] 

John Plevyak commented on TS-716:
-

Just to confirm that this is not a configuration problem, try increasing:

CONFIG proxy.config.hostdb.size INT 20
CONFIG proxy.config.hostdb.storage_size INT 33554432

by say a factor of 10.

john





 Crash in Continuation::handleEvent 
 ---

 Key: TS-716
 URL: https://issues.apache.org/jira/browse/TS-716
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 2.1.7
 Environment: CentOS 5.4 x86_64, 6 * 2T SATA Disks, 48G Memory
Reporter: Kissdev
Assignee: John Plevyak
Priority: Critical
 Fix For: 2.1.8

 Attachments: crasher.patch


 ATS crashes with the following configuration: 
   - reverse proxy , storage: 6 raw devices (6*2T),  1 partition (2T)
   - remap config:regex_map http://(.*) http://$1
 The load :  about 100Mbps, requests for top 4000 internet sites, mainly 
 html,js,pictures,flashes
 Detail of crashes by core dump:
 crash #1:
 {code}
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, 
 event=1, data=0x90f7170) at I_Continuation.h:146
 146 I_Continuation.h: No such file or directory.
 in I_Continuation.h
 (gdb) bt
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, 
 event=1, data=0x90f7170) at I_Continuation.h:146
 #1  0x00702b80 in EThread::process_event (this=0x2b101010, 
 e=0x90f7170, calling_code=1) at UnixEThread.cc:140
 #2  0x00702fa1 in EThread::execute (this=0x2b101010) at 
 UnixEThread.cc:232
 #3  0x007024d2 in spawn_thread_internal (a=0x8d94a70) at Thread.cc:85
 #4  0x0036ebc064a7 in start_thread () from /lib64/libpthread.so.0
 #5  0x0036eb0d3c2d in clone () from /lib64/libc.so.6
 (gdb) frame 0
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, 
 event=1, data=0x90f7170) at I_Continuation.h:146
 146 in I_Continuation.h
 (gdb) print *this
 $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x2aaaba360a11},
   handler = virtual table offset -1157442765409226770, this adjustment 
 -1157442765409226769,
   handler_name = 0xefefefefefefefef <Address 0xefefefefefefefef out of 
 bounds>, mutex = {m_ptr = 0xefefefefefefefef}, link = {SLink<Continuation> 
 = {
   next = 0xefefefefefefefef}, prev = 0xefefefefefefefef}}
 {code}
 crash #2:
 {code}
 (gdb) bt
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaabc0bce80, 
 event=1, data=0x154b5a80) at I_Continuation.h:146
 #1  0x006db290 in InactivityCop::check_inactivity (this=0x154c8730, 
 event=2, e=0x154b5a80) at UnixNet.cc:57
 #2  0x004dd1bb in Continuation::handleEvent (this=0x154c8730, 
 event=2, data=0x154b5a80) at I_Continuation.h:146
 #3  0x00702b80 in EThread::process_event (this=0x2b606010, 
 e=0x154b5a80, calling_code=2) at UnixEThread.cc:140
 #4  0x00702ec2 in EThread::execute (this=0x2b606010) at 
 UnixEThread.cc:217
 #5  0x007024d2 in spawn_thread_internal (a=0x154852c0) at Thread.cc:85
 #6  0x0036ebc064a7 in start_thread () from /lib64/libpthread.so.0
 #7  0x0036eb0d3c2d in clone () from /lib64/libc.so.6
 (gdb) frame 0
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaabc0bce80, 
 event=1, data=0x154b5a80) at I_Continuation.h:146
 146 in I_Continuation.h
 (gdb) print *this
 $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x16280061},
   handler = virtual table offset -1157442765409226770, this adjustment 
 -1157442765409226769,
   handler_name = 0xefefefefefefefef <Address 0xefefefefefefefef out of 
 bounds>, mutex = {m_ptr = 0xefefefefefefefef}, link = {SLink<Continuation> 
 = {
   next = 0xefefefefefefefef}, prev = 0xefefefefefefefef}}
 (gdb)
 {code}
 crash #3:
 {code}
 (gdb) bt
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaab45d3a10, 
 event=2, data=0x5631120) at I_Continuation.h:146
 #1  0x00702b80 in EThread::process_event (this=0x2abfc010, 
 e=0x5631120, calling_code=2) at UnixEThread.cc:140
 #2  0x00702ec2 in EThread::execute (this=0x2abfc010) at 
 UnixEThread.cc:217
 #3  0x0050917c in main (argc=3, argv=0x7fff0af6e3b8) at Main.cc:1962
 (gdb) frame 0
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaab45d3a10, 
 event=2, data=0x5631120) at I_Continuation.h:146
 146 in I_Continuation.h
 (gdb) print *this
 $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x2aaab45df291},
   handler = virtual table offset -1157442765409226770, this adjustment 
 -1157442765409226769,
   handler_name = 0xefefefefefefefef <Address 0xefefefefefefefef out of 
 bounds>, mutex = {m_ptr = 0xefefefefefefefef}, link = {SLink<Continuation> 
 = {
 

[jira] [Commented] (TS-745) Support ssd

2011-04-25 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024839#comment-13024839
 ] 

John Plevyak commented on TS-745:
-

Overall several comments:

1) Once we get it working we need to reorganize the patch so that we avoid code 
duplication.
2) The limitation of a single SSD is probably overly restrictive.  SSDs are 
relatively inexpensive now and it is very likely that folks will want to have 1 
per disk or better yet some combination of SSD and SATA.
3) The code uses xmalloc to store docs and creates another CacheVC to do the 
write in the background.  We would be using the various allocators rather than 
xmalloc, and we should be adding states rather than creating another CacheVC.  
The write can occur after calling back the user, so there will be no additional 
latency.
4) The code doesn't seem to have provision for handling multi-fragment 
documents.  There are some tradeoffs there, but in any case the issue needs to 
be considered.  Handling of multi-fragment documents requires tighter control 
over when the write is done and if and when it is successful.  Again, this 
would indicate we should be doing the write on the same CacheVC in additional 
states (see the rough sketch at the end of this comment).
5) The code needs to deal with collisions after the initial directory is looked 
up.  Again, this would be easier if the SSD code were operating in the same CacheVC.

Please think about these comments.  Let's use http://codereview.appspot.com/ 
for detailed reviews.  I have already added traffic server as a repository.  I 
have imported the patch as well, but I think we still have some design 
discussion to do before we can get into the details.
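
As a very rough, hypothetical sketch of the "extra states on the same VC" idea
(the class and state names are made up, this is not ATS code): the VC calls the
user back first and then continues into an SSD-write state of its own, instead
of allocating a second VC and an xmalloc'd copy of the doc.

{code}
// Purely illustrative; names are invented for this sketch.
struct SketchCacheVC {
  int (SketchCacheVC::*handler)(int event);

  int openReadMain(int event) {
    // ... hand the document to the caller first (no added latency) ...
    handler = &SketchCacheVC::ssdWriteMain;   // then continue in a new state
    return 0;
  }

  int ssdWriteMain(int event) {
    // ... issue the SSD write from buffers this VC already owns,
    //     re-check the directory for collisions, handle multi-fragment
    //     docs, then free the VC ...
    return 0;
  }
};
{code}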



 Support ssd
 ---

 Key: TS-745
 URL: https://issues.apache.org/jira/browse/TS-745
 Project: Traffic Server
  Issue Type: New Feature
  Components: Cache
Reporter: mohan_zl
Assignee: mohan_zl
 Attachments: ssd_cache.patch


 A patch for supporting, not work well for a long time with --enable-debug

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TS-724) disk IO balance in v2.1.7

2011-04-20 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022597#comment-13022597
 ] 

John Plevyak commented on TS-724:
-

FYI a single document ends up on a single disk.  If we could get the 
per-partition cache stats we could see what ATS thought was going on.

 disk IO balance in v2.1.7
 -

 Key: TS-724
 URL: https://issues.apache.org/jira/browse/TS-724
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 2.1.7
 Environment: reporting from users, and confirm within my testing evn. 
 v2.1.7 only
Reporter: Zhao Yongming
Priority: Critical
 Fix For: 2.1.9


 when multiple disk enabled, the disk IO will show much diff in v2.1.7, here 
 is my result on a 7 disk system result:
 {code:none}
 [root@cache189 ~]# iostat -x 5 
 Linux 2.6.18-164.11.1.el5 (cache189.cn8)  03/29/2011
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
0.510.000.541.770.00   97.18
 Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz 
 avgqu-sz   await  svctm  %util
 sdb   0.00 0.00 11.80  0.00   360.80 0.0030.58 
 0.075.73   5.56   6.56
 sdc   0.00 0.00 12.00  0.00   413.80 0.0034.48 
 0.075.42   5.30   6.36
 sdd   0.00 0.00 11.60  1.40   375.80   820.8092.05 
 0.075.06   4.66   6.06
 sde   0.00 0.00 13.00 14.00   722.60  8192.00   330.17 
 0.124.50   2.99   8.06
 sdf   0.00 0.00 14.60  0.00   579.40 0.0039.68 
 0.117.48   7.04  10.28
 sdg   0.00 0.00 49.20  0.00 18268.60 0.00   371.31 
 0.081.66   0.54   2.66
 sdb   0.00 0.00 11.60  0.00   253.60 0.0021.86 
 0.065.45   5.12   5.94
 sdc   0.00 0.00 15.80  0.00   738.20 0.0046.72 
 0.085.22   4.76   7.52
 sdd   0.00 0.00 10.80  0.00   728.40 0.0067.44 
 0.065.81   5.48   5.92
 sde   0.00 0.00 11.60  2.00   377.60  1027.20   103.29 
 0.075.18   4.75   6.46
 sdf   0.00 0.00 14.60  0.00   473.60 0.0032.44 
 0.095.90   5.78   8.44
 sdg   0.00 0.00 87.00  0.00 37454.80 0.00   430.51 
 0.374.26   0.82   7.12
 sdb   0.00 0.00 15.80  0.00   786.40 0.0049.77 
 0.106.56   5.76   9.10
 sdc   0.00 0.00 10.20  1.60   217.60   911.2095.66 
 0.064.93   4.51   5.32
 sdd   0.00 0.00 13.00  0.00   665.00 0.0051.15 
 0.086.12   5.80   7.54
 sde   0.00 0.00 11.60  0.00   419.40 0.0036.16 
 0.065.43   5.17   6.00
 sdf   0.00 0.00 11.00  1.40   315.00   826.8092.08 
 0.075.27   4.89   6.06
 sdg   0.00 0.00 27.00  0.00  8629.60 0.00   319.61 
 0.020.87   0.37   1.00
 sdb   0.00 0.00 12.80  0.00   380.00 0.0029.69 
 0.075.22   4.98   6.38
 sdc   0.00 0.00 14.80  0.00   495.80 0.0033.50 
 0.085.39   5.19   7.68
 sdd   0.00 0.00 10.40  0.00   267.40 0.0025.71 
 0.065.87   5.46   5.68
 sde   0.00 0.00 12.20  0.00   691.20 0.0056.66 
 0.075.93   5.48   6.68
 sdf   0.00 0.00 11.80  0.00   544.40 0.0046.14 
 0.075.83   5.63   6.64
 sdg   0.00 0.00 57.00  0.00 22033.00 0.00   386.54 
 0.061.07   0.38   2.16
 sdb   0.00 0.00 13.20  0.00   546.40 0.0041.39 
 0.085.73   5.73   7.56
 sdc   0.00 0.00 14.00  0.00   583.60 0.0041.69 
 0.085.57   5.34   7.48
 sdd   0.00 0.00 12.80  0.00   639.20 0.0049.94 
 0.075.61   5.14   6.58
 sde   0.00 0.00 12.40  0.00   403.20 0.0032.52 
 0.26   20.98  11.03  13.68
 sdf   0.00 0.00 15.00  0.00   475.80 0.0031.72 
 0.095.71   5.37   8.06
 sdg   0.00 0.00 91.80  0.00 39239.00 0.00   427.44 
 0.576.24   0.76   6.94
 sdb   0.00 0.00 10.60  0.00   326.60 0.0030.81 
 0.065.60   5.04   5.34
 sdc   0.00 0.00 12.80  0.00   644.40 0.0050.34 
 0.075.72   5.27   6.74
 sdd   0.00 0.00 14.80  0.00   624.00 0.0042.16 
 0.085.61   5.50   8.14
 sde   0.00 0.00  9.20  0.00   283.00 0.0030.76 
 0.055.83   5.67   5.22
 sdf   0.00 0.00 13.40  0.00   578.00 0.0043.13 
 0.075.39   5.15   6.90
 sdg   0.00 0.00 12.80  

[jira] [Commented] (TS-716) Crash in Continuation::handleEvent

2011-04-19 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13021644#comment-13021644
 ] 

John Plevyak commented on TS-716:
-

I have committed 1095127, which fixes a problem with the DNS HostEnt and 
DNSEntry continuations, but it is consistent with only some of the stack traces 
seen here, so this bug stays open until we confirm a fix.

 Crash in Continuation::handleEvent 
 ---

 Key: TS-716
 URL: https://issues.apache.org/jira/browse/TS-716
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 2.1.7
 Environment: CentOS 5.4 x86_64, 6 * 2T SATA Disks, 48G Memory
Reporter: Kissdev
Assignee: John Plevyak
Priority: Critical
 Fix For: 2.1.8

 Attachments: crasher.patch


 ATS crashes with the following configuration: 
   - reverse proxy , storage: 6 raw devices (6*2T),  1 partition (2T)
   - remap config:regex_map http://(.*) http://$1
 The load :  about 100Mbps, requests for top 4000 internet sites, mainly 
 html,js,pictures,flashes
 Detail of crashes by core dump:
 crash #1:
 {code}
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, 
 event=1, data=0x90f7170) at I_Continuation.h:146
 146 I_Continuation.h: No such file or directory.
 in I_Continuation.h
 (gdb) bt
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, 
 event=1, data=0x90f7170) at I_Continuation.h:146
 #1  0x00702b80 in EThread::process_event (this=0x2b101010, 
 e=0x90f7170, calling_code=1) at UnixEThread.cc:140
 #2  0x00702fa1 in EThread::execute (this=0x2b101010) at 
 UnixEThread.cc:232
 #3  0x007024d2 in spawn_thread_internal (a=0x8d94a70) at Thread.cc:85
 #4  0x0036ebc064a7 in start_thread () from /lib64/libpthread.so.0
 #5  0x0036eb0d3c2d in clone () from /lib64/libc.so.6
 (gdb) frame 0
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaaba364cb0, 
 event=1, data=0x90f7170) at I_Continuation.h:146
 146 in I_Continuation.h
 (gdb) print *this
 $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x2aaaba360a11},
   handler = virtual table offset -1157442765409226770, this adjustment 
 -1157442765409226769,
   handler_name = 0xefefefefefefefef <Address 0xefefefefefefefef out of 
 bounds>, mutex = {m_ptr = 0xefefefefefefefef}, link = {SLink<Continuation> 
 = {
   next = 0xefefefefefefefef}, prev = 0xefefefefefefefef}}
 {code}
 crash #2:
 {code}
 (gdb) bt
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaabc0bce80, 
 event=1, data=0x154b5a80) at I_Continuation.h:146
 #1  0x006db290 in InactivityCop::check_inactivity (this=0x154c8730, 
 event=2, e=0x154b5a80) at UnixNet.cc:57
 #2  0x004dd1bb in Continuation::handleEvent (this=0x154c8730, 
 event=2, data=0x154b5a80) at I_Continuation.h:146
 #3  0x00702b80 in EThread::process_event (this=0x2b606010, 
 e=0x154b5a80, calling_code=2) at UnixEThread.cc:140
 #4  0x00702ec2 in EThread::execute (this=0x2b606010) at 
 UnixEThread.cc:217
 #5  0x007024d2 in spawn_thread_internal (a=0x154852c0) at Thread.cc:85
 #6  0x0036ebc064a7 in start_thread () from /lib64/libpthread.so.0
 #7  0x0036eb0d3c2d in clone () from /lib64/libc.so.6
 (gdb) frame 0
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaabc0bce80, 
 event=1, data=0x154b5a80) at I_Continuation.h:146
 146 in I_Continuation.h
 (gdb) print *this
 $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x16280061},
   handler = virtual table offset -1157442765409226770, this adjustment 
 -1157442765409226769,
   handler_name = 0xefefefefefefefef <Address 0xefefefefefefefef out of 
 bounds>, mutex = {m_ptr = 0xefefefefefefefef}, link = {SLink<Continuation> 
 = {
   next = 0xefefefefefefefef}, prev = 0xefefefefefefefef}}
 (gdb)
 {code}
 crash #3:
 {code}
 (gdb) bt
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaab45d3a10, 
 event=2, data=0x5631120) at I_Continuation.h:146
 #1  0x00702b80 in EThread::process_event (this=0x2abfc010, 
 e=0x5631120, calling_code=2) at UnixEThread.cc:140
 #2  0x00702ec2 in EThread::execute (this=0x2abfc010) at 
 UnixEThread.cc:217
 #3  0x0050917c in main (argc=3, argv=0x7fff0af6e3b8) at Main.cc:1962
 (gdb) frame 0
 #0  0x004dd17a in Continuation::handleEvent (this=0x2aaab45d3a10, 
 event=2, data=0x5631120) at I_Continuation.h:146
 146 in I_Continuation.h
 (gdb) print *this
 $1 = {force_VFPT_to_top = {_vptr.force_VFPT_to_top = 0x2aaab45df291},
   handler = virtual table offset -1157442765409226770, this adjustment 
 -1157442765409226769,
   handler_name = 0xefefefefefefefef <Address 0xefefefefefefefef out of 
 bounds>, mutex = {m_ptr = 0xefefefefefefefef}, link = 

[jira] [Commented] (TS-652) SSL random buffer initialization should be checked

2011-04-18 Thread John Plevyak (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13021341#comment-13021341
 ] 

John Plevyak commented on TS-652:
-

I'd be inclined to 2) and document.

If that is the recommended way of using the library, we should just use it.


 SSL random buffer initialization should be checked
 --

 Key: TS-652
 URL: https://issues.apache.org/jira/browse/TS-652
 Project: Traffic Server
  Issue Type: Wish
  Components: SSL
Reporter: John Plevyak
 Fix For: 2.1.8


 The way the SSL random buffers are initialized is interesting... it could 
 also be made more efficient
 with the new 64-bit random number generator.  It looks like it is using 
 whatever is on the stack
 and then hashing it with 2 different random number generators and skipping 
 the first few bytes...
 why, no idea.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

