[jira] [Updated] (TS-4888) collapsed_forwarding plugin returns TSREMAP_DID_REMAP though it did not perform remap

2016-09-22 Thread Peter Chou (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Chou updated TS-4888:
---
Fix Version/s: 6.2.1

> collapsed_forwarding plugin returns TSREMAP_DID_REMAP though it did not 
> perform remap
> -
>
> Key: TS-4888
> URL: https://issues.apache.org/jira/browse/TS-4888
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Plugins
>Affects Versions: 6.2.1, 7.1.0
>Reporter: Rajendra Kishore Bonumahanti
> Fix For: 6.2.1, 7.1.0
>
>
> The collapsed_forwarding plugin returns TSREMAP_DID_REMAP even though it 
> does not perform any remap. As a result, ATS skips its own remap and the 
> transaction fails with a DNS lookup error on the "from" URL.
> More details follow.
> Hi,
> I am testing the collapsed_forwarding plugin 
> (https://docs.trafficserver.apache.org/en/latest/admin-guide/plugins/collapsed_forwarding.en.html?highlight=collapsed_forwarding)
>  on the ATS 6.2.x branch.
> We observed the error "DNS error 2 for [testurl.com]" on a cache miss when 
> remap.config is configured with "collapsed_forwarding" as the only remap 
> plugin. We had to modify TSRemapDoRemap() in the plugin to "return 
> TSREMAP_NO_REMAP" for the DNS lookup to succeed. It does not seem right for 
> the plugin to "return TSREMAP_DID_REMAP" when it did not remap.
> Can someone help me understand how this plugin is meant to be used? Or does 
> it require the fix I mentioned above?
> Regards,
> Kishore
> == Sample remap.config entry and cache-miss error when using 
> "collapsed_forwarding" by itself ==
> map http://testurl.com/ http://origin.com/ @plugin=collapsed_forwarding.so @pparam=--delay=10 @pparam=--retries=5
> I observed that on a cache miss the DNS query is made for the 'from' URL 
> hostname in the remap rule, and it fails.
> 
> [Sep  9 19:39:16.355] Server {0x2b170ea6c940} DEBUG: (dns) send query (qtype=1) for testurl.com to fd 43
> [Sep  9 19:39:16.355] Server {0x2b170ea6c940} DEBUG: (dns) sent qname = testurl.com, id = 9287, nameserver = 1
> [Sep  9 19:39:16.355] Server {0x2b170ea6c940} DEBUG: (dns) sent_one: failover_number for resolve 1 is 1
> [Sep  9 19:39:16.628] Server {0x2b170ea6c940} DEBUG: (dns) received packet size = 52
> [Sep  9 19:39:16.628] Server {0x2b170ea6c940} DEBUG: (dns) round-robin: nameserver 1 DNS respons code = 0
> [Sep  9 19:39:16.628] Server {0x2b170ea6c940} DEBUG: (dns) received rcode = 2
> [Sep  9 19:39:16.628] Server {0x2b170ea6c940} DEBUG: (dns) DNS error 2 for [testurl.com]
> [Sep  9 19:39:16.628] Server {0x2b170ea6c940} DEBUG: (dns) doing retry for testurl.com
> I looked further into the code and found that this happens because 
> TSRemapDoRemap() in the plugin returns TSREMAP_DID_REMAP, which makes ATS 
> skip its own remap.
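
The behavior described in the report can be modeled with a small, self-contained sketch. It does not use the ATS headers: the enum, the Request struct, and both functions are local stand-ins invented for illustration, and the core logic is a simplification of what the report describes (the rule's "to" URL is applied only when the plugin says it did not remap).

```cpp
#include <string>

// Local stand-ins that mirror the names in the report; not the ATS headers.
enum RemapStatus { REMAP_NO_REMAP = 0, REMAP_DID_REMAP = 1 };

struct Request {
  std::string host; // host that will be used for the origin DNS lookup
};

// Hypothetical plugin hook: it never touches the URL, so the honest
// answer is REMAP_NO_REMAP. `claim_remap` reproduces the reported bug.
RemapStatus plugin_do_remap(const Request &, bool claim_remap)
{
  return claim_remap ? REMAP_DID_REMAP : REMAP_NO_REMAP;
}

// Hypothetical core logic: apply the rule's "to" host only when the
// plugin reports that it did not rewrite the request itself.
std::string host_for_dns(const std::string &from_host, const std::string &to_host, bool claim_remap)
{
  Request req{from_host};
  if (plugin_do_remap(req, claim_remap) == REMAP_NO_REMAP) {
    req.host = to_host;
  }
  return req.host;
}
```

With claim_remap set to true the lookup host stays testurl.com, which matches the failing DNS query in the log; with false it becomes origin.com.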



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4888) collapsed_forwarding plugin returns TSREMAP_DID_REMAP though it did not perform remap

2016-09-22 Thread Peter Chou (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Chou updated TS-4888:
---
Fix Version/s: (was: 6.2.1)

> collapsed_forwarding plugin returns TSREMAP_DID_REMAP though it did not 
> perform remap
> -
>
> Key: TS-4888
> URL: https://issues.apache.org/jira/browse/TS-4888
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Plugins
>Affects Versions: 6.2.1, 7.1.0
>Reporter: Rajendra Kishore Bonumahanti
> Fix For: 7.1.0
>
>





[jira] [Assigned] (TS-4887) Clean up Parent Selection URL feature.

2016-09-22 Thread Peter Chou (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Chou reassigned TS-4887:
--

Assignee: Peter Chou

> Clean up Parent Selection URL feature.
> --
>
> Key: TS-4887
> URL: https://issues.apache.org/jira/browse/TS-4887
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Parent Proxy
>Reporter: Peter Chou
>Assignee: Peter Chou
> Fix For: 7.1.0
>
>
> * Remove references to 'maxdirs' and 'fname' from the TS API manual page.
> * Rename the "tmp" variable in ParentConsistentHash::getPathHash().
> * Clean up debug messages (remove excessive pointer value reporting).





[jira] [Updated] (TS-4887) Clean up Parent Selection URL feature.

2016-09-22 Thread Peter Chou (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Chou updated TS-4887:
---
Fix Version/s: 7.1.0

> Clean up Parent Selection URL feature.
> --
>
> Key: TS-4887
> URL: https://issues.apache.org/jira/browse/TS-4887
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Parent Proxy
>Reporter: Peter Chou
>Assignee: Peter Chou
> Fix For: 7.1.0
>
>





[jira] [Created] (TS-4887) Clean up Parent Selection URL feature.

2016-09-22 Thread Peter Chou (JIRA)
Peter Chou created TS-4887:
--

 Summary: Clean up Parent Selection URL feature.
 Key: TS-4887
 URL: https://issues.apache.org/jira/browse/TS-4887
 Project: Traffic Server
  Issue Type: Improvement
  Components: Parent Proxy
Reporter: Peter Chou


* Remove references to 'maxdirs' and 'fname' from the TS API manual page.
* Rename the "tmp" variable in ParentConsistentHash::getPathHash().
* Clean up debug messages (remove excessive pointer value reporting).





[jira] [Commented] (TS-4707) Parent Consistent Hash Selection - add fname and maxdirs options.

2016-09-20 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507310#comment-15507310
 ] 

Peter Chou commented on TS-4707:


[~gancho] -- This seems fine to me also. I would like to add that you should 
only implement the specific functionality for 'fname' and 'maxdirs' if you 
think it would benefit the wider community. Otherwise, the focus can just be on 
extending the existing cache URL manipulation capability to the parent 
selection URL.

> Parent Consistent Hash Selection - add fname and maxdirs options.
> -
>
> Key: TS-4707
> URL: https://issues.apache.org/jira/browse/TS-4707
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Parent Proxy
>Reporter: Peter Chou
>Assignee: Peter Chou
> Fix For: 7.1.0
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> This enhancement adds two options, "fname" and "maxdirs", which can be used 
> to exclude the file-name and some of the directories in the path. The 
> remaining portions of the path are then used as part of the hash computation 
> for selecting among multiple parent caches.
> For our usage, it was desirable from an operational perspective to direct all 
> components of a particular sub-tree to a single parent cache (to simplify 
> trouble-shooting, pre-loading, etc.). This can be achieved by excluding the 
> query-string, file-name, and right-most portions of the path from the hash 
> computation.
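
As a rough illustration of the path trimming described above, here is a standalone sketch. The function name and the exact meaning of its parameters are assumptions, not the ticket's implementation: keep_fname controls whether the last path component is kept, and max_dirs is taken to mean "keep at most this many leading directories" (negative keeps all). The query string is assumed to have been removed already.

```cpp
#include <string>

// Assumed semantics (not confirmed by the ticket):
//   keep_fname == false -> drop the last path component (the file name)
//   max_dirs >= 0       -> keep at most max_dirs leading directories
//   max_dirs < 0        -> keep all directories
std::string parent_hash_path(const std::string &path, bool keep_fname, int max_dirs)
{
  // Split the path into its directory part and its file name.
  std::string::size_type last = path.rfind('/');
  std::string dirs  = (last == std::string::npos) ? "" : path.substr(0, last + 1);
  std::string fname = (last == std::string::npos) ? path : path.substr(last + 1);

  if (max_dirs >= 0) {
    // Walk forward over directory components, keeping at most max_dirs.
    std::string::size_type pos = (!dirs.empty() && dirs[0] == '/') ? 1 : 0;
    for (int kept = 0; kept < max_dirs && pos < dirs.size(); ++kept) {
      std::string::size_type next = dirs.find('/', pos);
      if (next == std::string::npos) {
        pos = dirs.size();
        break;
      }
      pos = next + 1;
    }
    dirs.erase(pos); // drop the right-most directories
  }
  return keep_fname ? dirs + fname : dirs;
}
```

Under these assumptions, every object below /a/ maps to the same hash input when the file name is dropped and max_dirs is 1, which is the "whole sub-tree to one parent" effect the description asks for.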





[jira] [Commented] (TS-4707) Parent Consistent Hash Selection - add fname and maxdirs options.

2016-09-16 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15497644#comment-15497644
 ] 

Peter Chou commented on TS-4707:


[~jrushford] [~zwoop] [~jpe...@apache.org] -- Except that you have to create a 
new plugin or modify an existing plugin to accomplish the same manipulation 
with the API in PR #1009. This may be a big hurdle from the user perspective. I 
think that since (a) we have allowed 'qstring' option in the past and (b) 
'fname' and 'maxdirs' are very much in the same ball-park (no new data 
structures are used in the path generation, we just adjust the base pointer 
position and length), that it would benefit the user to be able to accomplish a 
reasonable level of manipulation without invoking additional plugins or 
modifying existing plugins.

> Parent Consistent Hash Selection - add fname and maxdirs options.
> -
>
> Key: TS-4707
> URL: https://issues.apache.org/jira/browse/TS-4707
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Parent Proxy
>Reporter: Peter Chou
>Assignee: Peter Chou
> Fix For: 7.1.0
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>





[jira] [Updated] (TS-4475) Crash in Log-Collation client after using inactivity-cop.

2016-09-16 Thread Peter Chou (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Chou updated TS-4475:
---
Backport to Version: 6.2.1
  Fix Version/s: (was: sometime)
 7.0.0

> Crash in Log-Collation client after using inactivity-cop.
> -
>
> Key: TS-4475
> URL: https://issues.apache.org/jira/browse/TS-4475
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 6.1.1
>Reporter: Peter Chou
> Fix For: 7.0.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Background: We recently started using inactivity-cop by setting it to 
> 300s instead of the default one-day setting. This was to address an issue 
> where, under heavy load, ATS would become unresponsive to client requests, 
> and the condition would persist after traffic was stopped, with the active 
> queue reporting 0 connections but 'netstat -na' showing many established 
> connections (approximately up to the throttle limit).
> Inactivity cop seemed to help ATS handle this situation, but we have since 
> experienced a couple of core dumps over the last four days. It seems that 
> occasionally the Log Collation Client State Machine receives event value 105 
> (VC_EVENT_INACTIVITY_TIMEOUT). When this reaches read_signal_and_update(), 
> the continuation handler is called, and further down the line the handler 
> does not know about this event, causing a core dump on the assertion 
> !"unexpcted state" [sic].
> Here is the back-trace --
> (gdb) bt
> #0  0x2b67cd5405f7 in raise () from /lib64/libc.so.6
> #1  0x2b67cd541e28 in abort () from /lib64/libc.so.6
> #2  0x2b67cb032921 in ink_die_die_die () at ink_error.cc:43
> #3  0x2b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed 
> assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65
> #4  0x2b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: 
> failed assert `%s`") at ink_error.cc:73
> #5  0x2b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted 
> state\"", file=0x7fb35b "LogCollationClientSM.cc",
> line=445) at ink_assert.cc:37
> #6  0x0069c86b in LogCollationClientSM::client_idle 
> (this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445
> #7  0x0069b427 in LogCollationClientSM::client_handler 
> (this=0x2b681400bb00, event=105, data=0x2b680c017020)
> at LogCollationClientSM.cc:119
> #8  0x00502cc6 in Continuation::handleEvent (this=0x2b681400bb00, 
> event=105, data=0x2b680c017020)
> at ../iocore/eventsystem/I_Continuation.h:153
> #9  0x00783d40 in read_signal_and_update (event=105, 
> vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> #10 0x00787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, 
> event=1, e=0x127ad60) at UnixNetVConnection.cc:1188
> #11 0x00502cc6 in Continuation::handleEvent (this=0x2b680c016f00, 
> event=1, data=0x127ad60)
> at ../iocore/eventsystem/I_Continuation.h:153
> #12 0x0077d943 in InactivityCop::check_inactivity (this=0x1209a00, 
> event=2, e=0x127ad60) at UnixNet.cc:102
> #13 0x00502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, 
> data=0x127ad60)
> at ../iocore/eventsystem/I_Continuation.h:153
> #14 0x007a5df6 in EThread::process_event (this=0x2b67cf7bb010, 
> e=0x127ad60, calling_code=2) at UnixEThread.cc:128
> #15 0x007a61f5 in EThread::execute (this=0x2b67cf7bb010) at 
> UnixEThread.cc:207
> #16 0x00534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918
> I believe it takes a wrong turn here --
> #9  0x00783d40 in read_signal_and_update (event=105, 
> vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> (gdb) list
> 145 static inline int
> 146 read_signal_and_update(int event, UnixNetVConnection *vc)
> 147 {
> 148   vc->recursion++;
> 149   if (vc->read.vio._cont) {
> 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> 151   } else {
> 152 switch (event) {
> 153 case VC_EVENT_EOS:
> 154 case VC_EVENT_ERROR:
> (gdb) list
> 155 case VC_EVENT_ACTIVE_TIMEOUT:
> 156 case VC_EVENT_INACTIVITY_TIMEOUT:
> 157   Debug("inactivity_cop", "event %d: null read.vio cont, closing 
> vc %p", event, vc);
> 158   vc->closed = 1;
> 159   break;
> 160 default:
> 161   Error("Unexpected event %d for vc %p", event, vc);
> 162   ink_release_assert(0);
> 163   break;
> 164 }
> Note: I understand that there were several issues related to TS-3196 
> concerning inactivity_cop and this section of code.
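
One possible shape for handling the timeout in the idle state is sketched here. This is a simplified model and not the change that was made in ATS: the enum names, the result type, and the function are hypothetical, and only the value 105 for the inactivity timeout comes from the back-trace above.

```cpp
// Stand-in event codes. The value 105 for the inactivity timeout is taken
// from the back-trace above; the other value is a placeholder.
enum Event { EVENT_IMMEDIATE = 1, EVENT_INACTIVITY_TIMEOUT = 105 };

enum class IdleResult { Continue, Closed, Unexpected };

// Hypothetical idle-state handler: treat an inactivity timeout as a normal
// reason to tear the connection down instead of falling into the
// "unexpected state" branch.
IdleResult client_idle(int event)
{
  switch (event) {
  case EVENT_IMMEDIATE:
    return IdleResult::Continue; // nothing to do yet, stay idle
  case EVENT_INACTIVITY_TIMEOUT:
    return IdleResult::Closed; // close now; the client can reconnect later
  default:
    return IdleResult::Unexpected; // the branch where the real code asserts
  }
}
```

The point of the sketch is only that the timeout gets an explicit case, so it no longer reaches the default branch.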





[jira] [Updated] (TS-4498) RemapConfig.cc - Print out error message on remap plugin init failure.

2016-09-16 Thread Peter Chou (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Chou updated TS-4498:
---
Backport to Version: 6.2.1

> RemapConfig.cc - Print out error message on remap plugin init failure.
> --
>
> Key: TS-4498
> URL: https://issues.apache.org/jira/browse/TS-4498
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Plugins
>Reporter: Peter Chou
>Assignee: James Peach
> Fix For: 7.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add printing of the returned error message to the Warning() if a remap plugin 
> fails to init. Currently it just says "bailing out" which is not as useful.
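
A minimal sketch of the idea, assuming an init hook that reports its reason through a caller-supplied error buffer. The function names, the message wording, and the buffer size are made up for illustration and are not taken from RemapConfig.cc.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical plugin init: returns false and writes a reason into errbuf.
bool plugin_init(char *errbuf, std::size_t errbuf_size)
{
  std::snprintf(errbuf, errbuf_size, "missing required parameter");
  return false;
}

// Build the warning text, including the plugin's own message when it set one.
// Returns an empty string when init succeeds.
std::string init_failure_warning(const std::string &plugin_name)
{
  char errbuf[256] = "";
  if (plugin_init(errbuf, sizeof(errbuf))) {
    return "";
  }
  std::string msg = "failed to initialize plugin " + plugin_name;
  if (errbuf[0] != '\0') {
    msg += ": ";
    msg += errbuf;
  }
  return msg + ", bailing out";
}
```

The warning still ends with "bailing out", but now carries whatever the plugin wrote into the buffer.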





[jira] [Created] (TS-4853) Parent Consistent Hash Selection - add parent selection URL and API.

2016-09-12 Thread Peter Chou (JIRA)
Peter Chou created TS-4853:
--

 Summary: Parent Consistent Hash Selection - add parent selection 
URL and API.
 Key: TS-4853
 URL: https://issues.apache.org/jira/browse/TS-4853
 Project: Traffic Server
  Issue Type: New Feature
  Components: Parent Proxy
Reporter: Peter Chou


Add the ability (via TS and Lua APIs) to set an explicit parent selection URL 
that will be used for parent consistent hash selection hashing.





[jira] [Commented] (TS-4708) traffic_cop looking for libtsutil.so.6 although libtsutil.so.7 was built.

2016-09-08 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15475034#comment-15475034
 ] 

Peter Chou commented on TS-4708:


Additional info -- although my LD_LIBRARY_PATH is not set at compile time, I 
found that LDFLAGS was set to include my $HOME/local/lib. This was done 
earlier in order to find the GeoIP libraries installed there. It may be 
necessary for reproducing the issue, which is that linking against the v7 .la 
file ends up resolving to the v6 .so in some circumstances.

In the reported case, libtsmgmt.so is linked against libtsutil.so v6 
located in $HOME/local/lib, and the resulting traffic_cop depends on both 
libtsutil.so v7 (its own dependency) and v6 (via libtsmgmt.so). If the link 
command line is changed to specify the .so instead of the .la, it works. If 
the libtsmgmt.so link command line omits libtsutil completely, it also works 
(apparently it is not really required).

I am OK with closing this issue. The work-around is to avoid installing ATS 
into standard library search paths such as /usr or $HOME/local; otherwise, 
other versions of ATS installed into other directories, e.g., /opt or 
$HOME/opt, may be linked incorrectly at build time. This would mostly affect 
development and build machines rather than production.

> traffic_cop looking for libtsutil.so.6 although libtsutil.so.7 was built.
> -
>
> Key: TS-4708
> URL: https://issues.apache.org/jira/browse/TS-4708
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cop
>Reporter: Peter Chou
>
> Apologies if this is a known issue. I looked through several pages of search 
> results for traffic_cop and did not see this particular issue. Platform is 
> Ubuntu Linux 14.04 LTS 64-bit. I previously installed and ran 6.1.x 
> under $HOME/local (I am running as an unprivileged user). I just tried 
> compiling and running "master" (7.0.0), installed to $HOME/master. I gave 
> the appropriate "--prefix" option to configure each time. Neither of the 
> directories above is in my LD_LIBRARY_PATH at compile or run time.
> Result: traffic_manager starts OK, traffic_server starts OK, but traffic_cop 
> fails because it is looking for the version-6 library. If I then add 
> $HOME/local/lib (which contains the previous 6.1.x build) to my 
> LD_LIBRARY_PATH, traffic_cop runs using the version-6 library from there. I 
> have no idea why it does not use the version-7 library that was built at the 
> same time and installed under $HOME/master.





[jira] [Created] (TS-4799) Allow minimum log rolling period to be set as low as 30s (down from 60s).

2016-08-30 Thread Peter Chou (JIRA)
Peter Chou created TS-4799:
--

 Summary: Allow minimum log rolling period to be set as low as 30s 
(down from 60s).
 Key: TS-4799
 URL: https://issues.apache.org/jira/browse/TS-4799
 Project: Traffic Server
  Issue Type: Improvement
  Components: Logging
Reporter: Peter Chou


Change MIN_ROLLING_INTERVAL_SEC in proxy/logging/Log.h to 30 (seconds).
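
For illustration, a lower bound of this kind is typically applied as below. The constant name and the value 30 come from this ticket. The helper function and the clamping policy (raising a too-small setting to the minimum) are assumptions, and Log.h may handle an out-of-range value differently.

```cpp
// Value proposed in the ticket; the clamping policy below is an assumption.
constexpr int MIN_ROLLING_INTERVAL_SEC = 30;

// Return the interval that would take effect for a configured value,
// raising anything below the minimum up to the minimum.
constexpr int effective_rolling_interval(int configured_sec)
{
  return configured_sec < MIN_ROLLING_INTERVAL_SEC ? MIN_ROLLING_INTERVAL_SEC : configured_sec;
}
```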





[jira] [Commented] (TS-4708) traffic_cop looking for libtsutil.so.6 although libtsutil.so.7 was built.

2016-08-30 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450165#comment-15450165
 ] 

Peter Chou commented on TS-4708:


I think that I have narrowed this down to a libtool behavior. In the make 
files, we are linking to the .la file rather than to the un-installed shared 
library. Libtool will somehow translate the explicit .la file (within the build 
tree) to -l* which may search outside of the build tree, e.g., 
../../lib/ts/libtsutil.la in the libtool command ends up being -ltsutil in the 
eventual ld command. I happen to have the older ATS 6.x libtsutil.so in my lib 
so it ends up linking against that even when I am building ATS 7.x.

Not sure this is worth the effort to fix. Probably easier just to be aware of 
and avoid the situation.

> traffic_cop looking for libtsutil.so.6 although libtsutil.so.7 was built.
> -
>
> Key: TS-4708
> URL: https://issues.apache.org/jira/browse/TS-4708
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cop
>Reporter: Peter Chou
> Fix For: 7.0.0
>
>





[jira] [Comment Edited] (TS-2770) let proxy.config.log.rolling_interval_sec be less than 5mins

2016-08-30 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449786#comment-15449786
 ] 

Peter Chou edited comment on TS-2770 at 8/30/16 6:42 PM:
-

Yes, it appears to. This minimum wasn't protecting against processing delay to 
roll the log or anything like that, i.e., ensure roll is done before the next 
roll?


was (Author: pbchou):
Yes, it appears to. This minimum wasn't protecting against processing delay to 
roll the log or anything like that, i.e., ensure roll is done before the next 
roll.

> let proxy.config.log.rolling_interval_sec be less than 5mins
> 
>
> Key: TS-2770
> URL: https://issues.apache.org/jira/browse/TS-2770
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Logging
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
> Fix For: 5.0.0
>
>
> 5 minutes is a long time. Let {{proxy.config.log.rolling_interval_sec}} be 
> lower, even as low as 1 minute!





[jira] [Commented] (TS-2770) let proxy.config.log.rolling_interval_sec be less than 5mins

2016-08-30 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449786#comment-15449786
 ] 

Peter Chou commented on TS-2770:


Yes, it appears to. This minimum wasn't protecting against processing delay to 
roll the log or anything like that, i.e., ensure roll is done before the next 
roll.

> let proxy.config.log.rolling_interval_sec be less than 5mins
> 
>
> Key: TS-2770
> URL: https://issues.apache.org/jira/browse/TS-2770
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Logging
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
> Fix For: 5.0.0
>
>





[jira] [Commented] (TS-2770) let proxy.config.log.rolling_interval_sec be less than 5mins

2016-08-30 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449754#comment-15449754
 ] 

Peter Chou commented on TS-2770:


[~jpe...@apache.org] -- Do you think there would be any issue if we wanted to 
lower the minimum further, to 30 seconds?

> let proxy.config.log.rolling_interval_sec be less than 5mins
> 
>
> Key: TS-2770
> URL: https://issues.apache.org/jira/browse/TS-2770
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Logging
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
> Fix For: 5.0.0
>
>





[jira] [Commented] (TS-4475) Crash in Log-Collation client after using inactivity-cop.

2016-08-09 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414477#comment-15414477
 ] 

Peter Chou commented on TS-4475:


[~oknet] Hi. I was able to test the fix against "master" after working around 
the TS-4728 bug that I found. Please review the updated PR when you get a 
chance. I would also like to request that this fix be applied to both the 7.0.0 
and 6.2.x branches.

> Crash in Log-Collation client after using inactivity-cop.
> -
>
> Key: TS-4475
> URL: https://issues.apache.org/jira/browse/TS-4475
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 6.1.1
>Reporter: Peter Chou
> Fix For: sometime
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>

[jira] [Updated] (TS-4728) Null pointer error in LogHost.cc.

2016-08-09 Thread Peter Chou (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Chou updated TS-4728:
---
Affects Version/s: 7.0.0

> Null pointer error in LogHost.cc.
> -
>
> Key: TS-4728
> URL: https://issues.apache.org/jira/browse/TS-4728
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 7.0.0
>Reporter: Peter Chou
>
> [~jpe...@apache.org] I am getting a null pointer access error with the 
> following assertion at the time of traffic_server start-up with log collation 
> enabled (client-side). I was able to get around it by just commenting it out, 
> but perhaps a better fix is required.
> {noformat}
> LogHost::create_orphan_LogFile_object()
> {
>   // We expect that no-one else is holding any refcounts on the
>   // orphan file so that is will be releases when we replace it
>   // below.
>   ink_assert(m_orphan_file->refcount() == 1);
> {noformat}
> Back-trace --
> {noformat}
> #0  0x0053e772 in RefCountObj::refcount (this=0x8) at 
> ../lib/ts/Ptr.h:80
> #1  0x00692f9f in LogHost::create_orphan_LogFile_object 
> (this=0x2268d80) at LogHost.cc:235
> #2  0x00692a45 in LogHost::set_ipstr_port (this=0x2268d80, 
> ipstr=0x2265d40 "127.0.0.1", pt=8085) at LogHost.cc:135
> #3  0x00692b92 in LogHost::set_name_or_ipstr (this=0x2268d80, 
> name_or_ip=0x2265d40 "127.0.0.1") at LogHost.cc:155
> #4  0x00684046 in LogConfig::read_xml_log_config (this=0x21e4110) at 
> LogConfig.cc:1472
> #5  0x0067ff73 in LogConfig::setup_log_objects (this=0x21e4110) at 
> LogConfig.cc:510
> #6  0x0067f858 in LogConfig::init (this=0x21e4110, prev_config=0x0) 
> at LogConfig.cc:395
> #7  0x006721fe in Log::init (flags=0) at Log.cc:925
> #8  0x00542552 in main (argv=0x7ffcc853abd8) at Main.cc:1828
> {noformat}
> I made minimal changes to logs_xml.config to set as client --
> {noformat}
> 
> 
>  : % : %"/>
> 
> 
> 
> 
> 
> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4728) Null pointer error in LogHost.cc.

2016-08-09 Thread Peter Chou (JIRA)
Peter Chou created TS-4728:
--

 Summary: Null pointer error in LogHost.cc.
 Key: TS-4728
 URL: https://issues.apache.org/jira/browse/TS-4728
 Project: Traffic Server
  Issue Type: Bug
  Components: Logging
Reporter: Peter Chou


[~jpe...@apache.org] I am getting a null pointer access error with the 
following assertion at the time of traffic_server start-up with log collation 
enabled (client-side). I was able to get around it by just commenting it out, 
but perhaps a better fix is required.
{noformat}
LogHost::create_orphan_LogFile_object()
{
  // We expect that no-one else is holding any refcounts on the
  // orphan file so that is will be releases when we replace it
  // below.
  ink_assert(m_orphan_file->refcount() == 1);
{noformat}

Back-trace --
{noformat}
#0  0x0053e772 in RefCountObj::refcount (this=0x8) at ../lib/ts/Ptr.h:80
#1  0x00692f9f in LogHost::create_orphan_LogFile_object 
(this=0x2268d80) at LogHost.cc:235
#2  0x00692a45 in LogHost::set_ipstr_port (this=0x2268d80, 
ipstr=0x2265d40 "127.0.0.1", pt=8085) at LogHost.cc:135
#3  0x00692b92 in LogHost::set_name_or_ipstr (this=0x2268d80, 
name_or_ip=0x2265d40 "127.0.0.1") at LogHost.cc:155
#4  0x00684046 in LogConfig::read_xml_log_config (this=0x21e4110) at 
LogConfig.cc:1472
#5  0x0067ff73 in LogConfig::setup_log_objects (this=0x21e4110) at 
LogConfig.cc:510
#6  0x0067f858 in LogConfig::init (this=0x21e4110, prev_config=0x0) at 
LogConfig.cc:395
#7  0x006721fe in Log::init (flags=0) at Log.cc:925
#8  0x00542552 in main (argv=0x7ffcc853abd8) at Main.cc:1828
{noformat}

I made minimal changes to logs_xml.config to set as client --
{noformat}


 : % : %"/>







{noformat}
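
The back-trace shows refcount() being entered with this=0x8, which suggests m_orphan_file is still null when the assertion runs at start-up. Below is a minimal sketch of a null-guarded version of the same check. It is not the actual fix: std::shared_ptr stands in for the ATS Ptr<> type, and SketchLogFile and sketch_can_replace_orphan() are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the orphan LogFile object.
struct SketchLogFile {
};

// Returns true when it is safe to replace the orphan file: either there is
// no orphan yet (the start-up case in the back-trace), or nobody else holds
// a reference to it. std::shared_ptr is an assumption standing in for Ptr<>.
inline bool
sketch_can_replace_orphan(const std::shared_ptr<SketchLogFile> &orphan)
{
  // Only look at the reference count when the pointer is non-null.
  return !orphan || orphan.use_count() == 1;
}
```

With a guard of this shape the check is skipped on first start-up and still catches an unexpected extra holder later on.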





[jira] [Commented] (TS-4498) RemapConfig.cc - Print out error message on remap plugin init failure.

2016-08-09 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414193#comment-15414193
 ] 

Peter Chou commented on TS-4498:


[~jpe...@apache.org] Would it be possible to back-port this to 6.2.x also? It 
should cherry-pick cleanly.

> RemapConfig.cc - Print out error message on remap plugin init failure.
> --
>
> Key: TS-4498
> URL: https://issues.apache.org/jira/browse/TS-4498
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Plugins
>Reporter: Peter Chou
>Assignee: James Peach
> Fix For: 7.0.0
>
>
> Add printing of the returned error message to the Warning() if a remap plugin 
> fails to init. Currently it just says "bailing out" which is not as useful.





[jira] [Commented] (TS-4708) traffic_cop looking for libtsutil.so.6 although libtsutil.so.7 was built.

2016-08-09 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414185#comment-15414185
 ] 

Peter Chou commented on TS-4708:


One more data point. It seems that the pre-install binary (left over after 
compilation) works, but the post-install binary does not. Something happens to 
the binary (I think) when you do "make install". The pre-install binary will 
create a .libs directory in the execution directory with a lt-traffic_cop file 
inside and the program will run. The post-install binary will NOT create the 
.libs directory and then fails since it is searching for the wrong .6 library 
version. Any ideas?

> traffic_cop looking for libtsutil.so.6 although libtsutil.so.7 was built.
> -
>
> Key: TS-4708
> URL: https://issues.apache.org/jira/browse/TS-4708
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Cop
>Reporter: Peter Chou
> Fix For: 7.0.0
>
>
> Apologies if this is a known issue. I looked through several pages of search 
> results for traffic_cop and did not see this particular issue. Platform is 
> Ubuntu Linux 14.04 LTS 64-bit. I have previously installed and run 6.1.x 
> under $HOME/local (I am running as an un-privileged user). I just tried 
> compiling and running "master" (7.0.0) and installed it to $HOME/master. I gave 
> the appropriate "--prefix" option to configure each time. Neither of the 
> directories above is in my LD_LIBRARY_PATH at compile or run time.
> Result: traffic_manager starts OK, traffic_server starts OK, traffic_cop 
> fails since it is looking for the version-6 library. If I then add 
> $HOME/local/lib to my LD_LIBRARY_PATH (contains previous 6.1.x build), then 
> traffic_cop runs using the version-6 library under there. No idea why it 
> doesn't use the version-7 library that was built at the same time and 
> installed under $HOME/master.





[jira] [Commented] (TS-3245) getopt doesn't work correctly when used in plugin chaining

2016-08-08 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412584#comment-15412584
 ] 

Peter Chou commented on TS-3245:


Hi, I opened a PR #845 with a back-port of this patch from "master" [7.0.0] to 
6.2.x.

> getopt doesn't work correctly when used in plugin chaining
> --
>
> Key: TS-3245
> URL: https://issues.apache.org/jira/browse/TS-3245
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Plugins
>Affects Versions: 5.1.1
>Reporter: Sudheer Vinukonda
>Priority: Minor
>  Labels: newbie
> Fix For: sometime
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When multiple plugins that use getopt are chained, getopt does not work 
> correctly for the plugins after the first one. [~jpe...@apache.org] and 
> [~zwoop] suggested that the getopt globals need to be reset (for example, 
> {{optind = opterr = optopt = 0}}) before each use, and that it would be better 
> to do this in the core during plugin loading so that it stays simple and 
> transparent to plugin developers. 
> Note that if a plugin itself uses getopt multiple times on different argv's, 
> it has to reset the globals between those calls. 
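
The reset described above can be sketched as follows. This is only an illustration under stated assumptions: sketch_reset_getopt() and sketch_parse_delay() are hypothetical helpers, and the "-d" option is made up for the example. On glibc, setting optind to 0 forces getopt to re-initialize; elsewhere the sketch falls back to restarting at argv[1].

```cpp
#include <cassert>
#include <string>
#include <unistd.h>

// Hypothetical helper: put the getopt globals back into a known state
// before parsing a new argv.
static void
sketch_reset_getopt()
{
#if defined(__GLIBC__)
  optind = 0; // glibc: 0 triggers a full re-initialization of getopt
#else
  optind = 1; // POSIX: restart scanning at argv[1]
#endif
  opterr = 0; // do not print diagnostics for unknown options
  optopt = 0;
}

// Hypothetical per-plugin parser: returns the argument of "-d", or an
// empty string if the option is absent.
static std::string
sketch_parse_delay(int argc, char *const argv[])
{
  assert(argv != nullptr);
  sketch_reset_getopt();

  std::string delay;
  int c;
  while ((c = getopt(argc, argv, "d:")) != -1) {
    if (c == 'd') {
      delay = optarg;
    }
  }
  return delay;
}
```

Calling sketch_parse_delay() once per plugin argv mimics chained plugins. Without the reset, the second call would start from the optind value left behind by the first.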





[jira] [Commented] (TS-4475) Crash in Log-Collation client after using inactivity-cop.

2016-08-05 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410320#comment-15410320
 ] 

Peter Chou commented on TS-4475:


[~oknet] based on input from Susan Hinrichs, I just piggy-backed the 
VC_EVENT_ACTIVE_TIMEOUT and VC_EVENT_INACTIVITY_TIMEOUT events with the actions 
taken for EOS and ERROR in the switch statement. I also modified the debug 
message accordingly. This is similar to what we originally had (see initial 
topic comment above) with the addition of the VC_EVENT_ACTIVE_TIMEOUT as both 
you and Susan suggested.

Can you take a look at what I did in client_open() with --
{noformat}
net_vc->set_inactivity_timeout(HRTIME_SECONDS(86400));
{noformat}
I am not sure if this is what you are recommending for step 2. It seems 
sufficient to set the time-out for the net vc to the previous default of 86400. 
I was not able to test this part in my development environment under "master", 
but we'll back-port it to 6.1.x and test it in our lab next week. I would 
appreciate your opinion on whether this looks right and is in line with your 
thinking.
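
For reference, the piggy-back approach in the switch statement can be sketched like this. It is a simplified stand-alone model, not the code from the PR: the Sketch* names are hypothetical, and apart from 105 (the inactivity time-out value seen in the back-trace) the event numbers are illustrative stand-ins.

```cpp
#include <cassert>

// Stand-in event codes. Only 105 is taken from the back-trace; the other
// values are assumptions for the purpose of this sketch.
enum SketchEvent {
  SKETCH_EVENT_SEND               = 1,
  SKETCH_EVENT_ERROR              = 3,
  SKETCH_EVENT_EOS                = 104,
  SKETCH_EVENT_INACTIVITY_TIMEOUT = 105,
  SKETCH_EVENT_ACTIVE_TIMEOUT     = 106,
};

enum SketchResult { SKETCH_CONT, SKETCH_FAILED, SKETCH_UNEXPECTED };

// Hypothetical idle-state handler: both time-out events share the
// EOS/ERROR branch instead of falling through to the "unexpected state"
// default that produced the core dump.
inline SketchResult
sketch_client_idle(int event)
{
  switch (event) {
  case SKETCH_EVENT_SEND:
    return SKETCH_CONT; // normal work, keep the connection
  case SKETCH_EVENT_EOS:
  case SKETCH_EVENT_ERROR:
  case SKETCH_EVENT_ACTIVE_TIMEOUT:
  case SKETCH_EVENT_INACTIVITY_TIMEOUT:
    // The real state machine would close the connection and move to its
    // failure state here.
    return SKETCH_FAILED;
  default:
    return SKETCH_UNEXPECTED; // where the assert fired before
  }
}
```

The point of the design is that a time-out tears the connection down through the same path as an error, so the client can reconnect cleanly instead of asserting.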

> Crash in Log-Collation client after using inactivity-cop.
> -
>
> Key: TS-4475
> URL: https://issues.apache.org/jira/browse/TS-4475
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 6.1.1
>Reporter: Peter Chou
> Fix For: sometime
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Background: We recently tried making use of inactivity-cop by setting it to 
> 300s instead of the default one-day setting. This was to address an issue 
> where, under heavy load, ATS would become un-responsive to client requests, 
> and the condition would persist after traffic was stopped with the active 
> queue saying 0 connections but 'netstat -na' showing a bunch of established 
> connections (up to the throttle limit approximately).
> Inactivity cop seemed to help ATS handle this situation, but we have since 
> experienced a couple of core dumps over the last four day period. It seems 
> occasionally the Log Collation Client State Machine will have event value 105 
> or VC_EVENT_INACTIVITY_TIMEOUT, but when it reaches read_signal_and_update() 
> it tries to call the continuation handler which down the line does not know 
> about this event thus causing core dump !"unexpcted state" [sic].
> Here is the back-trace --
> (gdb) bt
> #0  0x2b67cd5405f7 in raise () from /lib64/libc.so.6
> #1  0x2b67cd541e28 in abort () from /lib64/libc.so.6
> #2  0x2b67cb032921 in ink_die_die_die () at ink_error.cc:43
> #3  0x2b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed 
> assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65
> #4  0x2b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: 
> failed assert `%s`") at ink_error.cc:73
> #5  0x2b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted 
> state\"", file=0x7fb35b "LogCollationClientSM.cc",
> line=445) at ink_assert.cc:37
> #6  0x0069c86b in LogCollationClientSM::client_idle 
> (this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445
> #7  0x0069b427 in LogCollationClientSM::client_handler 
> (this=0x2b681400bb00, event=105, data=0x2b680c017020)
> at LogCollationClientSM.cc:119
> #8  0x00502cc6 in Continuation::handleEvent (this=0x2b681400bb00, 
> event=105, data=0x2b680c017020)
> at ../iocore/eventsystem/I_Continuation.h:153
> #9  0x00783d40 in read_signal_and_update (event=105, 
> vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> #10 0x00787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, 
> event=1, e=0x127ad60) at UnixNetVConnection.cc:1188
> #11 0x00502cc6 in Continuation::handleEvent (this=0x2b680c016f00, 
> event=1, data=0x127ad60)
> at ../iocore/eventsystem/I_Continuation.h:153
> #12 0x0077d943 in InactivityCop::check_inactivity (this=0x1209a00, 
> event=2, e=0x127ad60) at UnixNet.cc:102
> #13 0x00502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, 
> data=0x127ad60)
> at ../iocore/eventsystem/I_Continuation.h:153
> #14 0x007a5df6 in EThread::process_event (this=0x2b67cf7bb010, 
> e=0x127ad60, calling_code=2) at UnixEThread.cc:128
> #15 0x007a61f5 in EThread::execute (this=0x2b67cf7bb010) at 
> UnixEThread.cc:207
> #16 0x00534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918
> I believe it takes a wrong turn here --
> #9  0x00783d40 in read_signal_and_update (event=105, 
> vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> (gdb) list
> 145 static inline int
> 146 read_signal_and_update(int event, UnixNetVConnection *vc)
> 147 {
> 148   vc->recursion++;
> 149   if 

[jira] [Created] (TS-4707) Parent Consistent Hash Selection - add fname and maxdirs options.

2016-08-01 Thread Peter Chou (JIRA)
Peter Chou created TS-4707:
--

 Summary: Parent Consistent Hash Selection - add fname and maxdirs 
options.
 Key: TS-4707
 URL: https://issues.apache.org/jira/browse/TS-4707
 Project: Traffic Server
  Issue Type: Improvement
  Components: Parent Proxy
Reporter: Peter Chou


This enhancement adds two options, "fname" and "maxdirs", which can be used to 
exclude the file-name and some of the directories in the path. The remaining 
portions of the path are then used as part of the hash computation for 
selecting among multiple parent caches.
For our usage, it was desirable from an operational perspective to direct all 
components of a particular sub-tree to a single parent cache (to simplify 
trouble-shooting, pre-loading, etc.). This can be achieved by excluding the 
query-string, file-name, and right-most portions of the path from the hash 
computation.
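
To illustrate the idea, here is a small sketch of how the string fed to the hash could be derived from a request path. The exact semantics of the two options are assumptions made for this example: keep_fname models the "fname" switch, and maxdirs is taken to mean "keep at most this many leading directories" (negative meaning no limit). sketch_hash_key() is a hypothetical helper, not the function in the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build the string that would be hashed for parent
// selection. The query string is always dropped; the file name and the
// right-most directories are dropped according to the two options.
inline std::string
sketch_hash_key(const std::string &url_path, bool keep_fname, int maxdirs)
{
  // Drop the query string, if any (find() returns npos when absent).
  std::string path = url_path.substr(0, url_path.find('?'));

  // Split into the directory part (ending in '/') and the file name.
  std::string::size_type last_slash = path.rfind('/');
  std::string dirs  = (last_slash == std::string::npos) ? "" : path.substr(0, last_slash + 1);
  std::string fname = (last_slash == std::string::npos) ? path : path.substr(last_slash + 1);

  if (maxdirs >= 0) {
    // Skip the leading '/', then keep at most maxdirs directory components.
    std::string::size_type pos = (!dirs.empty() && dirs[0] == '/') ? 1 : 0;
    for (int kept = 0; kept < maxdirs && pos < dirs.size(); ++kept) {
      // dirs ends in '/', so a slash is always found at or after pos.
      pos = dirs.find('/', pos) + 1;
    }
    dirs.erase(pos);
  }
  return keep_fname ? dirs + fname : dirs;
}
```

Under these assumptions, "/a/b/c/seg1.ts?x=1" and "/a/b/d/seg2.ts" both reduce to "/a/b/" with keep_fname=false and maxdirs=2, so they would hash to the same parent.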



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TS-4475) Crash in Log-Collation client after using inactivity-cop.

2016-08-01 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402589#comment-15402589
 ] 

Peter Chou commented on TS-4475:


Apologies, I had forgotten that we had modified the original solution to just 
ignore the time-out event and return EVENT_CONT instead. This seems to work for 
us (it generates a time-out debug message every 5 minutes if you just leave it 
idle), and it seemed to be a less drastic approach than killing the connection 
on time-out. So based on the previous comment from Oknet, we should go back to 
the original approach of killing the connection (treating it as an error 
instead of ignoring). Should I just piggy-back the VC_EVENT_INACTIVITY_TIMEOUT 
with the actions for VC_EVENT_EOS and VC_EVENT_ERROR (these two are already 
combined) in the switch() statement? It seems there is some code for handling 
these two events in addition to just calling client_fail(). Perhaps the 
time-out should execute these actions also?

> Crash in Log-Collation client after using inactivity-cop.
> -
>
> Key: TS-4475
> URL: https://issues.apache.org/jira/browse/TS-4475
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 6.1.1
>Reporter: Peter Chou
> Fix For: sometime
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Background: We recently tried making use of inactivity-cop by setting it to 
> 300s instead of the default one-day setting. This was to address an issue 
> where, under heavy load, ATS would become un-responsive to client requests, 
> and the condition would persist after traffic was stopped with the active 
> queue saying 0 connections but 'netstat -na' showing a bunch of established 
> connections (up to the throttle limit approximately).
> Inactivity cop seemed to help ATS handle this situation, but we have since 
> experienced a couple of core dumps over the last four day period. It seems 
> occasionally the Log Collation Client State Machine will have event value 105 
> or VC_EVENT_INACTIVITY_TIMEOUT, but when it reaches read_signal_and_update() 
> it tries to call the continuation handler which down the line does not know 
> about this event thus causing core dump !"unexpcted state" [sic].
> Here is the back-trace --
> (gdb) bt
> #0  0x2b67cd5405f7 in raise () from /lib64/libc.so.6
> #1  0x2b67cd541e28 in abort () from /lib64/libc.so.6
> #2  0x2b67cb032921 in ink_die_die_die () at ink_error.cc:43
> #3  0x2b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed 
> assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65
> #4  0x2b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: 
> failed assert `%s`") at ink_error.cc:73
> #5  0x2b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted 
> state\"", file=0x7fb35b "LogCollationClientSM.cc",
> line=445) at ink_assert.cc:37
> #6  0x0069c86b in LogCollationClientSM::client_idle 
> (this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445
> #7  0x0069b427 in LogCollationClientSM::client_handler 
> (this=0x2b681400bb00, event=105, data=0x2b680c017020)
> at LogCollationClientSM.cc:119
> #8  0x00502cc6 in Continuation::handleEvent (this=0x2b681400bb00, 
> event=105, data=0x2b680c017020)
> at ../iocore/eventsystem/I_Continuation.h:153
> #9  0x00783d40 in read_signal_and_update (event=105, 
> vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> #10 0x00787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, 
> event=1, e=0x127ad60) at UnixNetVConnection.cc:1188
> #11 0x00502cc6 in Continuation::handleEvent (this=0x2b680c016f00, 
> event=1, data=0x127ad60)
> at ../iocore/eventsystem/I_Continuation.h:153
> #12 0x0077d943 in InactivityCop::check_inactivity (this=0x1209a00, 
> event=2, e=0x127ad60) at UnixNet.cc:102
> #13 0x00502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, 
> data=0x127ad60)
> at ../iocore/eventsystem/I_Continuation.h:153
> #14 0x007a5df6 in EThread::process_event (this=0x2b67cf7bb010, 
> e=0x127ad60, calling_code=2) at UnixEThread.cc:128
> #15 0x007a61f5 in EThread::execute (this=0x2b67cf7bb010) at 
> UnixEThread.cc:207
> #16 0x00534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918
> I believe it takes a wrong turn here --
> #9  0x00783d40 in read_signal_and_update (event=105, 
> vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> (gdb) list
> 145 static inline int
> 146 read_signal_and_update(int event, UnixNetVConnection *vc)
> 147 {
> 148   vc->recursion++;
> 149   if (vc->read.vio._cont) {
> 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> 151   } else {
> 152 

[jira] [Commented] (TS-4475) Crash in Log-Collation client after using inactivity-cop.

2016-07-28 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398056#comment-15398056
 ] 

Peter Chou commented on TS-4475:


I submitted a PR with a fix so that the Log Collation Client SM treats the 
VC_EVENT_INACTIVITY_TIMEOUT event as an error (left un-handled, it results in a 
core dump). In the lab, the problem can be reproduced by (1) starting ATS, (2) 
sending a request [ this causes a log collation client connection to be 
established ], and (3) letting it sit idle for 300s with no further requests [ 
ATS will now core dump when the time-out is generated by inactivity cop and 
sent to the client SM ].

Admittedly, this is somewhat of a lab issue, but there is definitely some 
possibility that 5m of idle time could be experienced depending on the site in 
question.

> Crash in Log-Collation client after using inactivity-cop.
> -
>
> Key: TS-4475
> URL: https://issues.apache.org/jira/browse/TS-4475
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 6.1.1
>Reporter: Peter Chou
> Fix For: sometime
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Background: We recently tried making use of inactivity-cop by setting it to 
> 300s instead of the default one-day setting. This was to address an issue 
> where, under heavy load, ATS would become un-responsive to client requests, 
> and the condition would persist after traffic was stopped with the active 
> queue saying 0 connections but 'netstat -na' showing a bunch of established 
> connections (up to the throttle limit approximately).
> Inactivity cop seemed to help ATS handle this situation, but we have since 
> experienced a couple of core dumps over the last four day period. It seems 
> occasionally the Log Collation Client State Machine will have event value 105 
> or VC_EVENT_INACTIVITY_TIMEOUT, but when it reaches read_signal_and_update() 
> it tries to call the continuation handler which down the line does not know 
> about this event thus causing core dump !"unexpcted state" [sic].
> Here is the back-trace --
> (gdb) bt
> #0  0x2b67cd5405f7 in raise () from /lib64/libc.so.6
> #1  0x2b67cd541e28 in abort () from /lib64/libc.so.6
> #2  0x2b67cb032921 in ink_die_die_die () at ink_error.cc:43
> #3  0x2b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed 
> assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65
> #4  0x2b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: 
> failed assert `%s`") at ink_error.cc:73
> #5  0x2b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted 
> state\"", file=0x7fb35b "LogCollationClientSM.cc",
> line=445) at ink_assert.cc:37
> #6  0x0069c86b in LogCollationClientSM::client_idle 
> (this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445
> #7  0x0069b427 in LogCollationClientSM::client_handler 
> (this=0x2b681400bb00, event=105, data=0x2b680c017020)
> at LogCollationClientSM.cc:119
> #8  0x00502cc6 in Continuation::handleEvent (this=0x2b681400bb00, 
> event=105, data=0x2b680c017020)
> at ../iocore/eventsystem/I_Continuation.h:153
> #9  0x00783d40 in read_signal_and_update (event=105, 
> vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> #10 0x00787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, 
> event=1, e=0x127ad60) at UnixNetVConnection.cc:1188
> #11 0x00502cc6 in Continuation::handleEvent (this=0x2b680c016f00, 
> event=1, data=0x127ad60)
> at ../iocore/eventsystem/I_Continuation.h:153
> #12 0x0077d943 in InactivityCop::check_inactivity (this=0x1209a00, 
> event=2, e=0x127ad60) at UnixNet.cc:102
> #13 0x00502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, 
> data=0x127ad60)
> at ../iocore/eventsystem/I_Continuation.h:153
> #14 0x007a5df6 in EThread::process_event (this=0x2b67cf7bb010, 
> e=0x127ad60, calling_code=2) at UnixEThread.cc:128
> #15 0x007a61f5 in EThread::execute (this=0x2b67cf7bb010) at 
> UnixEThread.cc:207
> #16 0x00534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918
> I believe it takes a wrong turn here --
> #9  0x00783d40 in read_signal_and_update (event=105, 
> vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> (gdb) list
> 145 static inline int
> 146 read_signal_and_update(int event, UnixNetVConnection *vc)
> 147 {
> 148   vc->recursion++;
> 149   if (vc->read.vio._cont) {
> 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> 151   } else {
> 152 switch (event) {
> 153 case VC_EVENT_EOS:
> 154 case VC_EVENT_ERROR:
> (gdb) list
> 155 case VC_EVENT_ACTIVE_TIMEOUT:
> 156 case 

[jira] [Commented] (TS-4475) Crash in Log-Collation client after using inactivity-cop.

2016-06-01 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311015#comment-15311015
 ] 

Peter Chou commented on TS-4475:


We only changed the default_inactivity_timeout from 86400 to 300 seconds. The 
inactivity_check_frequency remained at the default of 1s. I understand that the 
default_inactivity_timeout is enabled by default with a long delay. By "making 
use of" I meant setting it to a value that would actually be useful for 
clearing the unresponsive ATS condition in a reasonable amount of time. By the 
way, I am not sure if anyone is supporting this sub-system since there have 
been no other responses. Do you think it is reasonable to allow the log 
collation continuation to handle the inactivity event by treating it like an 
EOS/ERROR (just piggy-back in the switch statement)? We can try this in our 
lab. So far we have experienced three core dumps due to this issue, about 4 
days apart each (ATS is just sitting there idle in a lab environment).

> Crash in Log-Collation client after using inactivity-cop.
> -
>
> Key: TS-4475
> URL: https://issues.apache.org/jira/browse/TS-4475
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 6.1.1
>Reporter: Peter Chou
> Fix For: sometime
>
>
> Background: We recently tried making use of inactivity-cop by setting it to 
> 300s instead of the default one-day setting. This was to address an issue 
> where, under heavy load, ATS would become un-responsive to client requests, 
> and the condition would persist after traffic was stopped with the active 
> queue saying 0 connections but 'netstat -na' showing a bunch of established 
> connections (up to the throttle limit approximately).
> Inactivity cop seemed to help ATS handle this situation, but we have since 
> experienced a couple of core dumps over the last four day period. It seems 
> occasionally the Log Collation Client State Machine will have event value 105 
> or VC_EVENT_INACTIVITY_TIMEOUT, but when it reaches read_signal_and_update() 
> it tries to call the continuation handler which down the line does not know 
> about this event thus causing core dump !"unexpcted state" [sic].
> Here is the back-trace --
> (gdb) bt
> #0  0x2b67cd5405f7 in raise () from /lib64/libc.so.6
> #1  0x2b67cd541e28 in abort () from /lib64/libc.so.6
> #2  0x2b67cb032921 in ink_die_die_die () at ink_error.cc:43
> #3  0x2b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed 
> assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65
> #4  0x2b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: 
> failed assert `%s`") at ink_error.cc:73
> #5  0x2b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted 
> state\"", file=0x7fb35b "LogCollationClientSM.cc",
> line=445) at ink_assert.cc:37
> #6  0x0069c86b in LogCollationClientSM::client_idle 
> (this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445
> #7  0x0069b427 in LogCollationClientSM::client_handler 
> (this=0x2b681400bb00, event=105, data=0x2b680c017020)
> at LogCollationClientSM.cc:119
> #8  0x00502cc6 in Continuation::handleEvent (this=0x2b681400bb00, 
> event=105, data=0x2b680c017020)
> at ../iocore/eventsystem/I_Continuation.h:153
> #9  0x00783d40 in read_signal_and_update (event=105, 
> vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> #10 0x00787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, 
> event=1, e=0x127ad60) at UnixNetVConnection.cc:1188
> #11 0x00502cc6 in Continuation::handleEvent (this=0x2b680c016f00, 
> event=1, data=0x127ad60)
> at ../iocore/eventsystem/I_Continuation.h:153
> #12 0x0077d943 in InactivityCop::check_inactivity (this=0x1209a00, 
> event=2, e=0x127ad60) at UnixNet.cc:102
> #13 0x00502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, 
> data=0x127ad60)
> at ../iocore/eventsystem/I_Continuation.h:153
> #14 0x007a5df6 in EThread::process_event (this=0x2b67cf7bb010, 
> e=0x127ad60, calling_code=2) at UnixEThread.cc:128
> #15 0x007a61f5 in EThread::execute (this=0x2b67cf7bb010) at 
> UnixEThread.cc:207
> #16 0x00534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918
> I believe it takes a wrong turn here --
> #9  0x00783d40 in read_signal_and_update (event=105, 
> vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> (gdb) list
> 145 static inline int
> 146 read_signal_and_update(int event, UnixNetVConnection *vc)
> 147 {
> 148   vc->recursion++;
> 149   if (vc->read.vio._cont) {
> 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> 151   } else {
> 152 switch (event) {
> 153 case 

[jira] [Commented] (TS-4475) Crash in Log-Collation client after using inactivity-cop.

2016-05-24 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299154#comment-15299154
 ] 

Peter Chou commented on TS-4475:


Update - I had an e-mail exchange with sudheerv on this issue. Turns out that 
the continuation (for Log Collator Client SM in this case) should be updated to 
handle the VC_EVENT_INACTIVITY_TIMEOUT (105) much like HttpSM.cc does, for 
example. Unfortunately, I looked and there are no existing instances of 
handling this event in LogCollationClientSM.cc. My first simplistic approach 
would be to piggy-back this event where I see VC_EVENT_EOS and VC_EVENT_ERROR 
being handled. Any recommendations? Also, is the Log Collator client feature 
still supported going forward?

> Crash in Log-Collation client after using inactivity-cop.
> -
>
> Key: TS-4475
> URL: https://issues.apache.org/jira/browse/TS-4475
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 6.1.1
>Reporter: Peter Chou
>
> Background: We recently tried making use of inactivity-cop by setting it to 
> 300s instead of the default one-day setting. This was to address an issue 
> where, under heavy load, ATS would become un-responsive to client requests, 
> and the condition would persist after traffic was stopped with the active 
> queue saying 0 connections but 'netstat -na' showing a bunch of established 
> connections (up to the throttle limit approximately).
> Inactivity cop seemed to help ATS handle this situation, but we have since 
> experienced a couple of core dumps over the last four day period. It seems 
> occasionally the Log Collation Client State Machine will have event value 105 
> or VC_EVENT_INACTIVITY_TIMEOUT, but when it reaches read_signal_and_update() 
> it tries to call the continuation handler which down the line does not know 
> about this event thus causing core dump !"unexpcted state" [sic].
> Here is the back-trace --
> (gdb) bt
> #0  0x2b67cd5405f7 in raise () from /lib64/libc.so.6
> #1  0x2b67cd541e28 in abort () from /lib64/libc.so.6
> #2  0x2b67cb032921 in ink_die_die_die () at ink_error.cc:43
> #3  0x2b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed 
> assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65
> #4  0x2b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: 
> failed assert `%s`") at ink_error.cc:73
> #5  0x2b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted 
> state\"", file=0x7fb35b "LogCollationClientSM.cc",
> line=445) at ink_assert.cc:37
> #6  0x0069c86b in LogCollationClientSM::client_idle 
> (this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445
> #7  0x0069b427 in LogCollationClientSM::client_handler 
> (this=0x2b681400bb00, event=105, data=0x2b680c017020)
> at LogCollationClientSM.cc:119
> #8  0x00502cc6 in Continuation::handleEvent (this=0x2b681400bb00, 
> event=105, data=0x2b680c017020)
> at ../iocore/eventsystem/I_Continuation.h:153
> #9  0x00783d40 in read_signal_and_update (event=105, 
> vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> #10 0x00787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, 
> event=1, e=0x127ad60) at UnixNetVConnection.cc:1188
> #11 0x00502cc6 in Continuation::handleEvent (this=0x2b680c016f00, 
> event=1, data=0x127ad60)
> at ../iocore/eventsystem/I_Continuation.h:153
> #12 0x0077d943 in InactivityCop::check_inactivity (this=0x1209a00, 
> event=2, e=0x127ad60) at UnixNet.cc:102
> #13 0x00502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, 
> data=0x127ad60)
> at ../iocore/eventsystem/I_Continuation.h:153
> #14 0x007a5df6 in EThread::process_event (this=0x2b67cf7bb010, 
> e=0x127ad60, calling_code=2) at UnixEThread.cc:128
> #15 0x007a61f5 in EThread::execute (this=0x2b67cf7bb010) at 
> UnixEThread.cc:207
> #16 0x00534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918
> I believe it takes a wrong turn here --
> #9  0x00783d40 in read_signal_and_update (event=105, 
> vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> (gdb) list
> 145 static inline int
> 146 read_signal_and_update(int event, UnixNetVConnection *vc)
> 147 {
> 148   vc->recursion++;
> 149   if (vc->read.vio._cont) {
> 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> 151   } else {
> 152 switch (event) {
> 153 case VC_EVENT_EOS:
> 154 case VC_EVENT_ERROR:
> (gdb) list
> 155 case VC_EVENT_ACTIVE_TIMEOUT:
> 156 case VC_EVENT_INACTIVITY_TIMEOUT:
> 157   Debug("inactivity_cop", "event %d: null read.vio cont, closing 
> vc %p", event, vc);
> 158   vc->closed = 1;
> 159   

[jira] [Commented] (TS-4461) Not closing client connections

2016-05-24 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298965#comment-15298965
 ] 

Peter Chou commented on TS-4461:


Just curious whether you have the inactivity-cop time-out set to 10 minutes in 
your scenario. We experienced a similar issue with 6.1.1 in our lab testing, 
where ATS would become unresponsive to client requests under heavy load. After 
the load was stopped, ATS continued to ignore client requests (single curl 
requests) for more than an hour. It seemed locked up, with 0 active connections 
internally but approximately throttle-limit established connections reported 
by 'netstat -na'. Lowering the inactivity-cop setting from 86400 seconds to 300 
seconds seemed to avoid the locked-up condition, but resulted in the periodic 
core dumps described in TS-4475 some time later (no load, just sitting there). 
Do you think our results fall under this issue?

> Not closing client connections
> --
>
> Key: TS-4461
> URL: https://issues.apache.org/jira/browse/TS-4461
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 6.2.0
>Reporter: Bryan Call
>Assignee: Susan Hinrichs
>Priority: Blocker
> Fix For: 7.0.0
>
>
> Looks like we are not closing client connections correctly on the 6.2.x 
> branch. The output below was taken after a server had been out of rotation 
> for a while.
> {code}
> [bcall@l28 ~]$ ss -s
> Total: 18212 (kernel 18329)
> TCP:   18122 (estab 17141, closed 123, orphaned 4, synrecv 0, timewait 
> 123/0), ports 152
> {code}
> in traffic top:
> {code}
>                  CLIENT                         ORIGIN SERVER
> Requests     1.8   Head Bytes  492.0   Requests     1.8   Head Bytes  345.7
> Req/Conn     1.0   Body Bytes    0.0   Req/Conn     1.0   Body Bytes    0.0
> New Conn     1.8   Avg Size    269.0   New Conn     1.8   Avg Size    189.0
> Curr Conn    0.0   Net (bits)   3.9K   Curr Conn    0.0   Net (bits)   2.8K
> Active Con   6.6M  Resp (ms)     0.8
> Dynamic KA   0.0
> {code}
> Looks like it is happening on the client connections to TLS ports (ip of the 
> server removed):
> {code}
> [bcall@l28 ~]$ ss -tn | grep 'XXX:44[3-4]' | wc -l
> 12434
> {code}
> And not on the non-TLS ports
> {code}
> [bcall@l28 ~]$ ss -tn | grep 'XXX:8' | wc -l
> 0
> {code}
> Count of the fd for the traffic_server process:
> {code}
> [bcall@l28 ~]$ sudo ls -l /proc/$(pidof traffic_server)/fd | wc -l
> 18127
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4475) Crash in Log-Collation client after using inactivity-cop.

2016-05-23 Thread Peter Chou (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Chou updated TS-4475:
---
Affects Version/s: 6.1.1

> Crash in Log-Collation client after using inactivity-cop.
> -
>
> Key: TS-4475
> URL: https://issues.apache.org/jira/browse/TS-4475
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 6.1.1
>Reporter: Peter Chou
>
> Background: We recently tried making use of inactivity-cop by setting it to 
> 300s instead of the default one-day setting. This was to address an issue 
> where, under heavy load, ATS would become un-responsive to client requests, 
> and the condition would persist after traffic was stopped with the active 
> queue saying 0 connections but 'netstat -na' showing a bunch of established 
> connections (up to the throttle limit approximately).
> Inactivity cop seemed to help ATS handle this situation, but we have since 
> experienced a couple of core dumps over the last four days. It seems that 
> occasionally the Log Collation Client State Machine receives event value 105 
> (VC_EVENT_INACTIVITY_TIMEOUT). When the event reaches 
> read_signal_and_update(), the continuation handler is called, but the handler 
> further down the line does not know about this event, which triggers the 
> assert !"unexpcted state" [sic] and a core dump.
> Here is the back-trace --
> (gdb) bt
> #0  0x2b67cd5405f7 in raise () from /lib64/libc.so.6
> #1  0x2b67cd541e28 in abort () from /lib64/libc.so.6
> #2  0x2b67cb032921 in ink_die_die_die () at ink_error.cc:43
> #3  0x2b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed 
> assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65
> #4  0x2b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: 
> failed assert `%s`") at ink_error.cc:73
> #5  0x2b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted 
> state\"", file=0x7fb35b "LogCollationClientSM.cc",
> line=445) at ink_assert.cc:37
> #6  0x0069c86b in LogCollationClientSM::client_idle 
> (this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445
> #7  0x0069b427 in LogCollationClientSM::client_handler 
> (this=0x2b681400bb00, event=105, data=0x2b680c017020)
> at LogCollationClientSM.cc:119
> #8  0x00502cc6 in Continuation::handleEvent (this=0x2b681400bb00, 
> event=105, data=0x2b680c017020)
> at ../iocore/eventsystem/I_Continuation.h:153
> #9  0x00783d40 in read_signal_and_update (event=105, 
> vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> #10 0x00787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, 
> event=1, e=0x127ad60) at UnixNetVConnection.cc:1188
> #11 0x00502cc6 in Continuation::handleEvent (this=0x2b680c016f00, 
> event=1, data=0x127ad60)
> at ../iocore/eventsystem/I_Continuation.h:153
> #12 0x0077d943 in InactivityCop::check_inactivity (this=0x1209a00, 
> event=2, e=0x127ad60) at UnixNet.cc:102
> #13 0x00502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, 
> data=0x127ad60)
> at ../iocore/eventsystem/I_Continuation.h:153
> #14 0x007a5df6 in EThread::process_event (this=0x2b67cf7bb010, 
> e=0x127ad60, calling_code=2) at UnixEThread.cc:128
> #15 0x007a61f5 in EThread::execute (this=0x2b67cf7bb010) at 
> UnixEThread.cc:207
> #16 0x00534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918
> I believe it takes a wrong turn here --
> #9  0x00783d40 in read_signal_and_update (event=105, 
> vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> (gdb) list
> 145 static inline int
> 146 read_signal_and_update(int event, UnixNetVConnection *vc)
> 147 {
> 148   vc->recursion++;
> 149   if (vc->read.vio._cont) {
> 150     vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> 151   } else {
> 152 switch (event) {
> 153 case VC_EVENT_EOS:
> 154 case VC_EVENT_ERROR:
> (gdb) list
> 155 case VC_EVENT_ACTIVE_TIMEOUT:
> 156 case VC_EVENT_INACTIVITY_TIMEOUT:
> 157   Debug("inactivity_cop", "event %d: null read.vio cont, closing 
> vc %p", event, vc);
> 158   vc->closed = 1;
> 159   break;
> 160 default:
> 161   Error("Unexpected event %d for vc %p", event, vc);
> 162   ink_release_assert(0);
> 163   break;
> 164 }
> Note: I understand that there were several issues related to TS-3196 
> concerning inactivity_cop and this section of code.





[jira] [Created] (TS-4475) Crash in Log-Collation client after using inactivity-cop.

2016-05-23 Thread Peter Chou (JIRA)
Peter Chou created TS-4475:
--

 Summary: Crash in Log-Collation client after using inactivity-cop.
 Key: TS-4475
 URL: https://issues.apache.org/jira/browse/TS-4475
 Project: Traffic Server
  Issue Type: Bug
  Components: Logging
Reporter: Peter Chou


Background: We recently tried making use of inactivity-cop by setting it to 
300s instead of the default one-day setting. This was to address an issue 
where, under heavy load, ATS would become un-responsive to client requests, and 
the condition would persist after traffic was stopped with the active queue 
saying 0 connections but 'netstat -na' showing a bunch of established 
connections (up to the throttle limit approximately).

Inactivity cop seemed to help ATS handle this situation, but we have since 
experienced a couple of core dumps over the last four days. It seems that 
occasionally the Log Collation Client State Machine receives event value 105 
(VC_EVENT_INACTIVITY_TIMEOUT). When the event reaches read_signal_and_update(), 
the continuation handler is called, but the handler further down the line does 
not know about this event, which triggers the assert !"unexpcted state" [sic] 
and a core dump.

Here is the back-trace --

(gdb) bt
#0  0x2b67cd5405f7 in raise () from /lib64/libc.so.6
#1  0x2b67cd541e28 in abort () from /lib64/libc.so.6
#2  0x2b67cb032921 in ink_die_die_die () at ink_error.cc:43
#3  0x2b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed 
assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65
#4  0x2b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: 
failed assert `%s`") at ink_error.cc:73
#5  0x2b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted 
state\"", file=0x7fb35b "LogCollationClientSM.cc",
line=445) at ink_assert.cc:37
#6  0x0069c86b in LogCollationClientSM::client_idle 
(this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445
#7  0x0069b427 in LogCollationClientSM::client_handler 
(this=0x2b681400bb00, event=105, data=0x2b680c017020)
at LogCollationClientSM.cc:119
#8  0x00502cc6 in Continuation::handleEvent (this=0x2b681400bb00, 
event=105, data=0x2b680c017020)
at ../iocore/eventsystem/I_Continuation.h:153
#9  0x00783d40 in read_signal_and_update (event=105, vc=0x2b680c016f00) 
at UnixNetVConnection.cc:150
#10 0x00787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, 
event=1, e=0x127ad60) at UnixNetVConnection.cc:1188
#11 0x00502cc6 in Continuation::handleEvent (this=0x2b680c016f00, 
event=1, data=0x127ad60)
at ../iocore/eventsystem/I_Continuation.h:153
#12 0x0077d943 in InactivityCop::check_inactivity (this=0x1209a00, 
event=2, e=0x127ad60) at UnixNet.cc:102
#13 0x00502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, 
data=0x127ad60)
at ../iocore/eventsystem/I_Continuation.h:153
#14 0x007a5df6 in EThread::process_event (this=0x2b67cf7bb010, 
e=0x127ad60, calling_code=2) at UnixEThread.cc:128
#15 0x007a61f5 in EThread::execute (this=0x2b67cf7bb010) at 
UnixEThread.cc:207
#16 0x00534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918

I believe it takes a wrong turn here --

#9  0x00783d40 in read_signal_and_update (event=105, vc=0x2b680c016f00) 
at UnixNetVConnection.cc:150
150 vc->read.vio._cont->handleEvent(event, &vc->read.vio);
(gdb) list
145 static inline int
146 read_signal_and_update(int event, UnixNetVConnection *vc)
147 {
148   vc->recursion++;
149   if (vc->read.vio._cont) {
150     vc->read.vio._cont->handleEvent(event, &vc->read.vio);
151   } else {
152 switch (event) {
153 case VC_EVENT_EOS:
154 case VC_EVENT_ERROR:
(gdb) list
155 case VC_EVENT_ACTIVE_TIMEOUT:
156 case VC_EVENT_INACTIVITY_TIMEOUT:
157   Debug("inactivity_cop", "event %d: null read.vio cont, closing vc 
%p", event, vc);
158   vc->closed = 1;
159   break;
160 default:
161   Error("Unexpected event %d for vc %p", event, vc);
162   ink_release_assert(0);
163   break;
164 }

Note: I understand that there were several issues related to TS-3196 concerning 
inactivity_cop and this section of code.
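
One way to avoid this kind of abort, sketched under assumptions: the idle-state handler could treat the time-out events as a reason to drop and re-establish the connection rather than as an unexpected state. The enum values, IdleAction, and handle_idle_event() below are hypothetical names chosen for illustration. This is not the LogCollationClientSM code, and not necessarily the fix that was adopted.

```cpp
#include <cassert>

// Hypothetical event kinds an idle state might receive (illustrative names only).
enum class IdleEvent { FlushInterval, EndOfStream, Error, InactivityTimeout, ActiveTimeout, Other };

// Hypothetical outcomes for the idle state.
enum class IdleAction { SendBuffers, Reconnect, Abort };

// Decide what the idle state does with an event. Time-outs are handled
// like other connection-loss events instead of falling into the abort path.
IdleAction handle_idle_event(IdleEvent event)
{
  switch (event) {
  case IdleEvent::FlushInterval:
    return IdleAction::SendBuffers;
  case IdleEvent::EndOfStream:
  case IdleEvent::Error:
  case IdleEvent::InactivityTimeout:
  case IdleEvent::ActiveTimeout:
    return IdleAction::Reconnect;
  case IdleEvent::Other:
    break;
  }
  // Only events that are truly unknown reach the abort path.
  return IdleAction::Abort;
}
```

With a handler shaped like this, an inactivity time-out delivered by the inactivity cop would lead to a reconnect instead of the failed assert shown in frame #5 of the back-trace.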





[jira] [Created] (TS-4450) Syntax error in CI test script test_https.py.

2016-05-17 Thread Peter Chou (JIRA)
Peter Chou created TS-4450:
--

 Summary: Syntax error in CI test script test_https.py.
 Key: TS-4450
 URL: https://issues.apache.org/jira/browse/TS-4450
 Project: Traffic Server
  Issue Type: Bug
  Components: CI
Reporter: Peter Chou


I don't know Python, but the closing parenthesis seems to be unneeded, or at 
least unbalanced, here. The caret points to the parenthesis. Removing it, so 
that the list ends with 'reload'], would presumably fix the error.

  File "/usr/src/git/trafficserver/ci/tsqa/tests/test_https.py", line 318
    signal_cmd = [traffic_ctl, 'config', 'reload')]
                                                 ^
SyntaxError: invalid syntax






[jira] [Comment Edited] (TS-4411) Add a error message on unrecognized remap.config @... option.

2016-05-02 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267317#comment-15267317
 ] 

Peter Chou edited comment on TS-4411 at 5/2/16 7:21 PM:


I opened a PR with the patch. It is just a one-liner that prints a diagnostic 
message to the error.log. I did notice that the remap_check_option() is 
apparently run three times on start-up so the message is printed three times. I 
did not attempt to squash this as it should be a rare exception condition.


was (Author: pbchou):
I opened a PR with the patch. It is just a one-liner that prints a diagnostic 
message to the error.log. I did not notice that the remap_check_option() is 
apparently run three times on start-up so the message is printed three times. I 
did not attempt to squash this as it should be a rare exception condition.

> Add a error message on unrecognized remap.config @... option.
> -
>
> Key: TS-4411
> URL: https://issues.apache.org/jira/browse/TS-4411
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Reporter: Peter Chou
>Assignee: Peter Chou
>  Labels: review
> Fix For: 7.0.0
>
>
> We noticed that unrecognized remap.config options seem to result in "silent" 
> failures, i.e., the remap rule "map /a /b @foo" just reduces to a plain "map 
> /a /b" rule. This is not desirable when we are implementing access control 
> and other functionality in the rule's plugin chain.





[jira] [Commented] (TS-4411) Add a error message on unrecognized remap.config @... option.

2016-05-02 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267317#comment-15267317
 ] 

Peter Chou commented on TS-4411:


I opened a PR with the patch. It is just a one-liner that prints a diagnostic 
message to the error.log. I did not notice that the remap_check_option() is 
apparently run three times on start-up so the message is printed three times. I 
did not attempt to squash this as it should be a rare exception condition.

> Add a error message on unrecognized remap.config @... option.
> -
>
> Key: TS-4411
> URL: https://issues.apache.org/jira/browse/TS-4411
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Core
>Reporter: Peter Chou
>Assignee: Peter Chou
>  Labels: review
> Fix For: 7.0.0
>
>
> We noticed that unrecognized remap.config options seem to result in "silent" 
> failures, i.e., the remap rule "map /a /b @foo" just reduces to a plain "map 
> /a /b" rule. This is not desirable when we are implementing access control 
> and other functionality in the rule's plugin chain.





[jira] [Created] (TS-4411) Add a error message on unrecognized remap.config @... option.

2016-05-02 Thread Peter Chou (JIRA)
Peter Chou created TS-4411:
--

 Summary: Add a error message on unrecognized remap.config @... 
option.
 Key: TS-4411
 URL: https://issues.apache.org/jira/browse/TS-4411
 Project: Traffic Server
  Issue Type: Improvement
  Components: Core
Reporter: Peter Chou


We noticed that unrecognized remap.config options seem to result in "silent" 
failures, i.e., the remap rule "map /a /b @foo" just reduces to a plain "map /a 
/b" rule. This is not desirable when we are implementing access control and 
other functionality in the rule's plugin chain.
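
To illustrate the kind of check being proposed, here is a minimal sketch that scans a remap rule for @... options and returns the ones it does not recognize, so the caller can log an error instead of silently ignoring them. The function name find_unrecognized_options() and the set of known option names passed in are assumptions for illustration. This is not the remap_check_option() code in ATS.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Return the names of all "@name" or "@name=value" tokens in a remap rule
// whose name is not in the caller-supplied set of known options.
std::vector<std::string> find_unrecognized_options(const std::string &rule,
                                                   const std::set<std::string> &known)
{
  std::vector<std::string> unknown;
  std::istringstream tokens(rule);
  std::string token;
  while (tokens >> token) {
    if (token[0] != '@') {
      continue; // not an option token (e.g. "map" or a URL)
    }
    // The option name runs from after '@' up to '=' or the end of the token.
    const std::string::size_type eq = token.find('=');
    const std::string name =
      token.substr(1, eq == std::string::npos ? std::string::npos : eq - 1);
    if (known.count(name) == 0) {
      unknown.push_back(name);
    }
  }
  return unknown;
}
```

For the rule "map /a /b @foo" and a known set containing only plugin and pparam, the function reports foo, which the caller could then write to the error log.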





[jira] [Created] (TS-4410) Fix i386 compiler warning - unsigned-vs-signed comparison in hostdb.

2016-05-02 Thread Peter Chou (JIRA)
Peter Chou created TS-4410:
--

 Summary: Fix i386 compiler warning - unsigned-vs-signed comparison 
in hostdb.
 Key: TS-4410
 URL: https://issues.apache.org/jira/browse/TS-4410
 Project: Traffic Server
  Issue Type: Bug
  Components: DNS
Reporter: Peter Chou


A compiler warning shows up on the i386 32-bit build due to an 
unsigned-vs-signed int comparison.





[jira] [Commented] (TS-4353) Support multiple/custom GeoIP databases.

2016-04-14 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241950#comment-15241950
 ] 

Peter Chou commented on TS-4353:


Following branch is available for those interested --
git pull https://github.com/pbchou/trafficserver TS-4353

> Support multiple/custom GeoIP databases.
> 
>
> Key: TS-4353
> URL: https://issues.apache.org/jira/browse/TS-4353
> Project: Traffic Server
>  Issue Type: Improvement
>  Components: Plugins
>Reporter: Peter Chou
>
> We have an internally developed plugin that we worked on based on suggestions 
> from Shu Kit Chan. This plugin is a global/remap plugin that allows you to 
> specify multiple IPv4 country databases in the global plugin.config file 
> (important for multiple customers on an ATS instance). Each DB is assigned a 
> tag string, e.g., --tag=foo --file=path-to-foo-file --tag=bar 
> --file=path-to-bar-file.
> In the remap context, the plugin will look-up the country code of the client 
> IP and place it into an ATS internal header for down-chain plugins (such as 
> tslua) to use. The selector for controlling which DB to use for the look-up 
> for each remap rule is @pparam=foo.
> I understand that GeoIP enhancements have recently been added to 
> header_rewrite which can perform header changes based on GeoIP information. 
> Would there be some value in adding the multiple/custom-DB feature to 
> header_rewrite or possibly establishing a generic GeoIP helper plugin that 
> handles the DB management for other plugins?





[jira] [Created] (TS-4353) Support multiple/custom GeoIP databases.

2016-04-14 Thread Peter Chou (JIRA)
Peter Chou created TS-4353:
--

 Summary: Support multiple/custom GeoIP databases.
 Key: TS-4353
 URL: https://issues.apache.org/jira/browse/TS-4353
 Project: Traffic Server
  Issue Type: Improvement
  Components: Plugins
Reporter: Peter Chou


We have an internally developed plugin that we worked on based on suggestions 
from Shu Kit Chan. This plugin is a global/remap plugin that allows you to 
specify multiple IPv4 country databases in the global plugin.config file 
(important for multiple customers on an ATS instance). Each DB is assigned a 
tag string, e.g., --tag=foo --file=path-to-foo-file --tag=bar 
--file=path-to-bar-file.

In the remap context, the plugin will look-up the country code of the client IP 
and place it into an ATS internal header for down-chain plugins (such as tslua) 
to use. The selector for controlling which DB to use for the look-up for each 
remap rule is @pparam=foo.

I understand that GeoIP enhancements have recently been added to header_rewrite 
which can perform header changes based on GeoIP information. Would there be 
some value in adding the multiple/custom-DB feature to header_rewrite or 
possibly establishing a generic GeoIP helper plugin that handles the DB 
management for other plugins?
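
The tag/selector scheme described above can be sketched roughly as follows. This is a hypothetical illustration, not the plugin's code: parse_geo_db_args() and select_db() are made-up names, and the databases are represented only by their file paths.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Parse "--tag=NAME --file=PATH" pairs (as they might appear in
// plugin.config) into a map from tag to database file path.
std::map<std::string, std::string> parse_geo_db_args(const std::vector<std::string> &args)
{
  const std::string tag_prefix  = "--tag=";
  const std::string file_prefix = "--file=";
  std::map<std::string, std::string> db_by_tag;
  std::string pending_tag;
  for (const auto &arg : args) {
    if (arg.compare(0, tag_prefix.size(), tag_prefix) == 0) {
      pending_tag = arg.substr(tag_prefix.size());
    } else if (arg.compare(0, file_prefix.size(), file_prefix) == 0 && !pending_tag.empty()) {
      // A --file option binds to the most recent --tag.
      db_by_tag[pending_tag] = arg.substr(file_prefix.size());
      pending_tag.clear();
    }
  }
  return db_by_tag;
}

// Resolve the selector given per remap rule (e.g. @pparam=foo) to a database
// path. Returns an empty string when the tag is unknown.
std::string select_db(const std::map<std::string, std::string> &dbs, const std::string &selector)
{
  const auto it = dbs.find(selector);
  return it == dbs.end() ? std::string() : it->second;
}
```

A remap rule carrying @pparam=foo would then look up the country code in the database registered under the tag foo.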





[jira] [Updated] (TS-4252) Some plugins are causing seg-faults when using getopt_long with optind = 1.

2016-03-30 Thread Peter Chou (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Chou updated TS-4252:
---
Affects Version/s: 6.1.1
  Environment: Linux Intel 64-bit Ubuntu 14.04 and RHEL 7.
  Description: 
There are several global plugins which experience segmentation faults related to 
parsing (argc, argv) arguments using getopt_long(). Often, the plugins display 
debug output showing corrupted arguments, e.g., arguments belonging to previous 
entries in plugin.config. This has been confirmed to happen with 
background_fetch.so and regex_revalidate.so. The other plugins remap_stats and 
stale_while_revalidate may also be affected based on code review.

This issue is corrected if the plugins are modified to use optind = 0 instead 
of optind = 1 before calling getopt_long(). Note that the majority of plugins 
are using optind = 0 already. Per the Linux man page, you should only need to 
set optind = 1 between scanning different argument vectors, but you must set 
optind = 0 to force re-initialization if you make use of GNU extensions in the 
optstring argument of getopt_long(). I am not sure whether this applies when 
the previously loaded plugin used GNU extensions, when the current one does, or 
when switching between the two, but it would seem safer to always use 
optind = 0.

  was:
In "plugin.config" if we just do background_fetch.so with no arguments we get a 
segmentation fault with messages saying invalid option with argument text from 
previous lines in the configuration file. If I just add a garbage argument like 
"background_fetch.so bleah" there is no fault.

I noticed that this plugin initialized optind to 1 before calling getopt_long 
while others such as tcpinfo set it to 0. Setting it to 0 also prevents the 
fault. Is this the correct fix?

  Summary: Some plugins are causing seg-faults when using 
getopt_long with optind = 1.  (was: background_fetch.so segfaults with no 
arguments as a global plugin.)

Updated description to indicate multiple plugins affected.
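
As a minimal sketch of the pattern described above, the following function resets optind to 0 before every scan, so that glibc's getopt_long() re-initializes its internal state for each new argument vector. The option names (--verbose, --log) and the PluginOptions struct are hypothetical and not taken from any ATS plugin. The optind = 0 reset is the GNU-specific behavior described in the Linux man page; other C libraries may need a different reset.

```cpp
#include <getopt.h>

#include <cassert>
#include <string>
#include <vector>

// Hypothetical option set, for illustration only.
struct PluginOptions {
  bool verbose = false;
  std::string log_file;
};

// Parse one plugin's argument vector; args[0] is assumed to be the plugin name.
PluginOptions parse_plugin_args(std::vector<std::string> args)
{
  // Build a mutable, NULL-terminated argv that points into the local strings.
  std::vector<char *> argv;
  for (auto &arg : args) {
    argv.push_back(&arg[0]);
  }
  argv.push_back(nullptr);

  static const struct option longopts[] = {
    {"verbose", no_argument, nullptr, 'v'},
    {"log", required_argument, nullptr, 'l'},
    {nullptr, 0, nullptr, 0},
  };

  PluginOptions opts;
  optind = 0; // GNU getopt: force re-initialization before each new scan
  int c;
  while ((c = getopt_long(static_cast<int>(args.size()), argv.data(), "vl:", longopts, nullptr)) != -1) {
    switch (c) {
    case 'v':
      opts.verbose = true;
      break;
    case 'l':
      opts.log_file = optarg;
      break;
    default:
      break; // unknown option: getopt_long has already reported it
    }
  }
  return opts;
}
```

Calling parse_plugin_args() once per plugin.config line, including lines with no arguments at all, then starts every scan from a clean state instead of reusing positions left over from the previous plugin.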

> Some plugins are causing seg-faults when using getopt_long with optind = 1.
> ---
>
> Key: TS-4252
> URL: https://issues.apache.org/jira/browse/TS-4252
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Plugins
>Affects Versions: 6.1.1
> Environment: Linux Intel 64-bit Ubuntu 14.04 and RHEL 7.
>Reporter: Peter Chou
>Assignee: Leif Hedstrom
> Fix For: 6.2.0
>
>
> There are several global plugins which experience segmentation faults 
> related to parsing (argc, argv) arguments using getopt_long(). Often, the 
> plugins display debug output showing corrupted arguments, e.g., arguments 
> belonging to previous entries in plugin.config. This has been confirmed to 
> happen with background_fetch.so and regex_revalidate.so. The other plugins 
> remap_stats and stale_while_revalidate may also be affected based on code 
> review.
> This issue is corrected if the plugins are modified to use optind = 0 instead 
> of optind = 1 before calling getopt_long(). Note that the majority of plugins 
> are using optind = 0 already. Per the Linux man page, you should only need to 
> set optind = 1 between scanning different argument vectors, but you must set 
> optind = 0 to force re-initialization if you make use of GNU extensions in 
> the optstring argument of getopt_long(). I am not sure whether this applies 
> when the previously loaded plugin used GNU extensions, when the current one 
> does, or when switching between the two, but it would seem safer to always 
> use optind = 0.





[jira] [Commented] (TS-4266) ATS memory statistics shows that memory utilization is doubled after “traffic_ctlconfig reload”. And it is failed as it cannot find enough memory.

2016-03-22 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207119#comment-15207119
 ] 

Peter Chou commented on TS-4266:


Kit, sorry for any confusion. I will work with Kishore to submit a pull request 
to apache/trafficserver. I think Kishore only merged the pull request with his 
own fork at brkishore/trafficserver in his comment above. Thanks for reviewing.

> ATS memory statistics shows that memory utilization is doubled after 
> “traffic_ctlconfig reload”. And it is failed as it cannot find enough memory.
> --
>
> Key: TS-4266
> URL: https://issues.apache.org/jira/browse/TS-4266
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Lua
>Reporter: Rajendra Kishore Bonumahanti
>Assignee: Kit Chan
> Fix For: sometime
>
>
> ATS memory statistics shows memory utilization is doubled after “traffic_ctl 
> config reload”. We get “not enough memory” error in the subsequent attempt 
> and “config reload” fails.
> ATS is configured with 100 map entries in remap.config, all share the same 
> lua script.
> ATS is started: The memory information is..
> [root@mtanjv8cdnc73 trafficserver]# pmap -x 113330 | grep total
> total kB 1416092  670256  663736
> After 1st Config reload:
> [root@mtanjv8cdnc73 trafficserver]# pmap -x 113330 | grep total
> total kB 1932660 1128084 1121544
> After 2nd config reload: It had failed with error “not enough memory” and 
> memory status as..
> [root@mtanjv8cdnc73 trafficserver]# pmap -x 113330 | grep total
> total kB 2170756 1167808 1160836
> Error displayed in diags.log:
> ===
> [Mar  8 23:27:27.580] Server {0x2af92498b700} WARNING: Failed to create new 
> instance for plugin /opt/trafficserver/libexec/trafficserver/tslua.so (not a 
> TS_SUCCESS return)
> [Mar  8 23:27:27.580] Server {0x2af92498b700} WARNING: Could not add rule at 
> line #3; Aborting!
> [Mar  8 23:27:27.580] Server {0x2af92498b700} WARNING: [ReverseProxy] Can't 
> create new remap instance for plugin 
> "/opt/trafficserver/libexec/trafficserver/tslua.so" - [ts_lua_add_module] 
> luaL_loadfile /opt/trafficserver/etc/trafficserver/lua/process_remap.lua 
> failed: not enough memory at line 3
> [Mar  8 23:27:27.580] Server {0x2af92498b700} WARNING: something failed 
> during BuildTable() -- check your remap plugins!
> [Mar  8 23:27:27.595] Server {0x2af92498b700} WARNING: failed to reload 
> remap.config, not replacing!
> Lua VM memory size at that time, as printed by ts.debug(FUNCTION..'Lua VM 
> memory: '..collectgarbage("count")):
> [Mar  8 23:27:27.579] Server {0x2af92498b700} DIAG: (ts_lua) __init__(): Lua 
> VM memory: 3629.7060546875
> This shows that Lua VMs are hitting the max capacity.
> Solution:
> ===
> I looked at the ts_lua code TSRemapDeleteInstance () [ts_lua.c ] and 
> ts_lua_del_module() [ts_lua_util.c] which does cleaning of the lua memory for 
> the instance. However the lua memory is not released and reused. 
> So, I have added code to start the garbage collector in ts_lua_del_module() .
> int
> ts_lua_del_module(ts_lua_instance_conf *conf, ts_lua_main_ctx *arr, int n)
> {
> ….
> lua_newtable(L);
> lua_replace(L, LUA_GLOBALSINDEX); /* L[GLOBAL] = EMPTY  */
> lua_gc(L, LUA_GCCOLLECT, 0);
> TSMutexUnlock(arr[i].mutexp);
> }
>   return 0;
> }
> This has improved the situation. However, I also added garbage collection at 
> the end of ts_lua_add_module(). With these two additions, we have tested the 
> code: the memory utilization is stable and we could do config reload at least 
> 100 times with the background load.





[jira] [Created] (TS-4252) background_fetch.so segfaults with no arguments as a global plugin.

2016-03-03 Thread Peter Chou (JIRA)
Peter Chou created TS-4252:
--

 Summary: background_fetch.so segfaults with no arguments as a 
global plugin.
 Key: TS-4252
 URL: https://issues.apache.org/jira/browse/TS-4252
 Project: Traffic Server
  Issue Type: Bug
  Components: Plugins
Reporter: Peter Chou


In "plugin.config" if we just do background_fetch.so with no arguments we get a 
segmentation fault with messages saying invalid option with argument text from 
previous lines in the configuration file. If I just add a garbage argument like 
"background_fetch.so bleah" there is no fault.

I noticed that this plugin initialized optind to 1 before calling getopt_long 
while others such as tcpinfo set it to 0. Setting it to 0 also prevents the 
fault. Is this the correct fix?





[jira] [Commented] (TS-4134) Traffic Manager aborts on attempted privilege escalation when non-root.

2016-01-15 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102336#comment-15102336
 ] 

Peter Chou commented on TS-4134:


Alan, the problem was evident only when running as a non-root user. I applied 
your patch and it seems to be working fine now. It also seemed to fix another 
issue where running 'trafficserver start' would only start traffic_cop and 
subsequently traffic_manager would have to be started manually and separately. 
Thanks for explaining about the scoping/destructor/auto-de-elevate behavior and 
for the quick response.
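
For readers following the scoping point, here is a minimal sketch of the general behavior, assuming ElevateAccess is an RAII-style guard that elevates in its constructor and de-elevates in its destructor. ScopedElevation and the depth counter are hypothetical stand-ins, not the ATS class. The sketch shows why a guard declared inside an if-block gives up its privileges at the closing brace of that block.

```cpp
#include <cassert>

// Hypothetical stand-in for the process privilege state.
static int g_elevation_depth = 0;

bool is_elevated()
{
  return g_elevation_depth > 0;
}

// Hypothetical RAII guard: elevates on construction, de-elevates on destruction.
class ScopedElevation
{
public:
  explicit ScopedElevation(bool enable) : active_(enable)
  {
    if (active_) {
      ++g_elevation_depth;
    }
  }
  ~ScopedElevation()
  {
    if (active_) {
      --g_elevation_depth;
    }
  }
  ScopedElevation(const ScopedElevation &) = delete;
  ScopedElevation &operator=(const ScopedElevation &) = delete;

private:
  bool active_;
};

// Guard declared inside the conditional: it is destroyed at the end of the
// if-block, so the work that follows runs without elevation.
bool elevated_after_block_scoped_guard(bool need_root)
{
  if (need_root) {
    ScopedElevation guard(true);
  } // guard destroyed here, privileges dropped
  return is_elevated();
}

// Guard declared at function scope with the condition passed in: it stays
// alive for the rest of the function.
bool elevated_with_function_scoped_guard(bool need_root)
{
  ScopedElevation guard(need_root);
  return is_elevated();
}
```

Under this assumption, the condition has to be passed into the guard, or the privileged work has to be placed in the same scope as the guard, for the elevation to cover it.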

> Traffic Manager aborts on attempted privilege escalation when non-root.
> ---
>
> Key: TS-4134
> URL: https://issues.apache.org/jira/browse/TS-4134
> Project: Traffic Server
>  Issue Type: Bug
>Affects Versions: 6.2.0
>Reporter: Peter Chou
>Assignee: Alan M. Carroll
> Fix For: 6.1.0
>
>
> Traffic Manager aborts since it cannot elevate access in mgmt/Rollback.cc and 
> mgmt/LocalManager.cc. The root of the issue might be that the semantics of 
> the ElevateAccess constructor argument was changed from (boolean,level) to 
> just a (level) by commit 6a5f6241 or TS-306. It seems the ElevateAccess 
> access(  ) calls in these two files were not changed.





[jira] [Commented] (TS-4134) Traffic Manager aborts on attempted privilege escalation when non-root.

2016-01-14 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099189#comment-15099189
 ] 

Peter Chou commented on TS-4134:


FYI, grepping for 'ElevateAccess ' shows around ten instances of the 
constructor being called that may need to be reviewed. A call such as 
ElevateAccess access(root_access_needed); would need to be wrapped in a 
conditional instead, e.g., if (root_access_needed) { ElevateAccess access; }, 
and so on.

> Traffic Manager aborts on attempted privilege escalation when non-root.
> ---
>
> Key: TS-4134
> URL: https://issues.apache.org/jira/browse/TS-4134
> Project: Traffic Server
>  Issue Type: Bug
>Affects Versions: 6.2.0
>Reporter: Peter Chou
> Fix For: 6.1.0
>
>
> Traffic Manager aborts since it cannot elevate access in mgmt/Rollback.cc and 
> mgmt/LocalManager.cc. The root of the issue might be that the semantics of 
> the ElevateAccess constructor argument was changed from (boolean,level) to 
> just a (level) by commit 6a5f6241 or TS-306. It seems the ElevateAccess 
> access(  ) calls in these two files were not changed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TS-4134) Traffic Manager aborts on attempted privilege escalation when non-root.

2016-01-14 Thread Peter Chou (JIRA)
Peter Chou created TS-4134:
--

 Summary: Traffic Manager aborts on attempted privilege escalation 
when non-root.
 Key: TS-4134
 URL: https://issues.apache.org/jira/browse/TS-4134
 Project: Traffic Server
  Issue Type: Bug
Reporter: Peter Chou


Traffic Manager aborts since it cannot elevate access in mgmt/Rollback.cc and 
mgmt/LocalManager.cc. The root of the issue might be that the semantics of the 
ElevateAccess constructor argument was changed from (boolean,level) to just a 
(level) by commit 6a5f6241 or TS-306. It seems the ElevateAccess access( 
 ) calls in these two files were not changed.





[jira] [Updated] (TS-4134) Traffic Manager aborts on attempted privilege escalation when non-root.

2016-01-14 Thread Peter Chou (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Chou updated TS-4134:
---
Affects Version/s: 6.2.0

> Traffic Manager aborts on attempted privilege escalation when non-root.
> ---
>
> Key: TS-4134
> URL: https://issues.apache.org/jira/browse/TS-4134
> Project: Traffic Server
>  Issue Type: Bug
>Affects Versions: 6.2.0
>Reporter: Peter Chou
>
> Traffic Manager aborts because it cannot elevate access in mgmt/Rollback.cc 
> and mgmt/LocalManager.cc. The root cause may be that the semantics of the 
> ElevateAccess constructor argument were changed from (boolean, level) to 
> just (level) by commit 6a5f6241 or TS-306. It seems the ElevateAccess 
> access(  ) calls in these two files were not updated to match.





[jira] [Commented] (TS-4021) Lua Plugin - Expose API Call TSHttpTxnFollowRedirect()

2015-11-13 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004713#comment-15004713
 ] 

Peter Chou commented on TS-4021:


Kit, thanks for volunteering to take a look at the proposed changes. I did 
basic testing under Ubuntu Linux 32-bit for this change.

* First, we left "CONFIG proxy.config.http.redirection_enabled" out of 
records.config, so it stayed at its default of 0 (disabled globally).
* Second, in remap.config, we added "map ... @plugin=tslua.so 
@pparam=.../my.lua".
* Third, in my.lua we added:
  function do_remap()
    ts.http.enable_redirect(1)
    return 0
  end

> Lua Plugin - Expose API Call TSHttpTxnFollowRedirect()
> --
>
> Key: TS-4021
> URL: https://issues.apache.org/jira/browse/TS-4021
> Project: Traffic Server
>  Issue Type: New Feature
>  Components: Lua, Plugins
>Affects Versions: 6.1.0
>Reporter: Jeremy Payne
>Assignee: Kit Chan
>Priority: Minor
> Fix For: 6.1.0
>
>
> Instead of relying on a config override, this new plugin feature would 
> allow origin server redirection to be enabled on the fly via a direct API 
> call. 





[jira] [Commented] (TS-3932) TCP TOS not working

2015-10-20 Thread Peter Chou (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965741#comment-14965741
 ] 

Peter Chou commented on TS-3932:


Hi. I confirmed that this behavior is not present in 5.3.0, but it is present 
in later releases starting with 5.3.1. I traced the relevant code change to 
iocore/net/UnixNetProcessor.cc:455 where THREAD_ALLOC in 5.3.0 was changed to 
THREAD_ALLOC_INIT in 5.3.1. However, I think that the actual root problem also 
exists in 5.3.0, but it was just being masked by this difference in the thread 
allocation call.

I believe the root problem is in iocore/net/UnixConnection.cc:303 
(Connection::open()), where the apply_options() call that sets the TOS bits is 
made before addr is assigned a valid value later in Connection::connect(). 
The addr must be valid for the addr.isIp4() check used in apply_options() to 
work. Under 5.3.1, addr is uninitialized on all passes through 
Connection::open(), while in 5.3.0 it is uninitialized on the first pass but 
perhaps contains remnant values on subsequent passes.

Questions:
1. Thoughts on moving apply_options() from Connection::open to 
Connection::connect (after the addr is set by setRemote(target))?
2. We are looking for a fix in the 5.3.x branch so should I open a separate 
issue?

> TCP TOS not working
> ---
>
> Key: TS-3932
> URL: https://issues.apache.org/jira/browse/TS-3932
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 6.0.0
>Reporter: Bryan Call
>  Labels: Regression
> Fix For: 6.1.0
>
>
> jasonstrongman2016:
> In 5.3.0 the below works. However, seems to be broken in other
> releases. Including this one.
> # /opt/trafficserver60rc3/bin/traffic_ctl -V
> Apache Traffic Server - traffic_ctl - 6.0.0 - (build # 091616 on Sep
> 16 2015 at 16:49:13)
> # /opt/trafficserver60rc3/bin/traffic_ctl config match sock_packet_tos_out
> proxy.config.net.sock_packet_tos_out: 184
> #tcpdump
> 17:55:07.377780 IP (tos 0x0, ttl 64, id 45468, offset 0, flags [DF],
> proto TCP (6), length 60)
>10.0.0.71.51306 > 74.125.227.196.80: Flags [S],


