[jira] [Updated] (TS-4888) collapsed_forwarding plugin returns TSREMAP_DID_REMAP though it did not perform remap
[ https://issues.apache.org/jira/browse/TS-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Chou updated TS-4888:
--------------------------
    Fix Version/s: 6.2.1

> collapsed_forwarding plugin returns TSREMAP_DID_REMAP though it did not perform remap
> --------------------------------------------------------------------------------------
>
>                 Key: TS-4888
>                 URL: https://issues.apache.org/jira/browse/TS-4888
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Plugins
>    Affects Versions: 6.2.1, 7.1.0
>            Reporter: Rajendra Kishore Bonumahanti
>             Fix For: 6.2.1, 7.1.0
>
> The collapsed_forwarding plugin returns TSREMAP_DID_REMAP even though it did not perform any remap. This causes ATS to skip its own remap, and the transaction fails with a DNS lookup error on the "from" URL.
> More details:
> Hi,
> I am testing the collapsed_forwarding plugin (https://docs.trafficserver.apache.org/en/latest/admin-guide/plugins/collapsed_forwarding.en.html?highlight=collapsed_forwarding) on the ATS 6.2.x branch.
> We observed the error "DNS error 2 for [testurl.com]" on a cache miss when remap.config is configured with "collapsed_forwarding" as the only remap plugin. We had to modify TSRemapDoRemap() in the plugin to "return TSREMAP_NO_REMAP" for the DNS lookup to succeed. It does not seem right for the plugin to "return TSREMAP_DID_REMAP" when it did not remap.
> Can someone help me understand how this plugin is meant to be used? Or does it require the fix I mentioned above?
> Regards,
> Kishore
>
> == Sample remap.config entry and cache-miss error when "collapsed_forwarding" is used by itself ==
> map http://testurl.com/ http://origin.com/ @plugin=collapsed_forwarding.so @pparam=--delay=10 @pparam=--retries=5
>
> I observed that during a cache miss, the DNS query is made for the 'from' URL hostname of the remap rule, and it fails.
>
> [Sep 9 19:39:16.355] Server {0x2b170ea6c940} DEBUG: (dns) send query (qtype=1) for testurl.com to fd 43
> [Sep 9 19:39:16.355] Server {0x2b170ea6c940} DEBUG: (dns) sent qname = testurl.com, id = 9287, nameserver = 1
> [Sep 9 19:39:16.355] Server {0x2b170ea6c940} DEBUG: (dns) sent_one: failover_number for resolve 1 is 1
> [Sep 9 19:39:16.628] Server {0x2b170ea6c940} DEBUG: (dns) received packet size = 52
> [Sep 9 19:39:16.628] Server {0x2b170ea6c940} DEBUG: (dns) round-robin: nameserver 1 DNS respons code = 0
> [Sep 9 19:39:16.628] Server {0x2b170ea6c940} DEBUG: (dns) received rcode = 2
> [Sep 9 19:39:16.628] Server {0x2b170ea6c940} DEBUG: (dns) DNS error 2 for [testurl.com]
> [Sep 9 19:39:16.628] Server {0x2b170ea6c940} DEBUG: (dns) doing retry for testurl.com
>
> I looked further into the code and found that this happens because TSRemapDoRemap() in the plugin returns TSREMAP_DID_REMAP, which makes ATS skip the remap.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
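The behaviour described in the report can be illustrated with a small toy model. This is only a sketch under stated assumptions: RemapStatus, resolve_host and the host strings are hypothetical stand-ins invented for this example, not the Traffic Server API, and the rule "the core applies the map target only when the plugin reports no remap" is inferred from the report rather than taken from the ATS source.

```cpp
#include <cassert>
#include <string>

// Stand-ins for the two status values named in the report. They mirror the
// names only; this is NOT the Traffic Server header.
enum class RemapStatus { NoRemap, DidRemap };

// Toy model of the core (an assumption inferred from the report): the "to"
// host of the map rule is applied only when the plugin says it did not remap.
// The plugin in this model never edits the request itself.
std::string resolve_host(const std::string &from_host, const std::string &to_host,
                         RemapStatus plugin_status)
{
  if (plugin_status == RemapStatus::NoRemap) {
    return to_host; // core performs the default remap
  }
  return from_host; // core trusts the plugin, so the host stays unchanged
}
```

In this model a plugin that reports DidRemap without rewriting anything leaves testurl.com as the host to resolve, which matches the DNS queries in the log above; reporting NoRemap lets the lookup go to origin.com.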
[jira] [Updated] (TS-4888) collapsed_forwarding plugin returns TSREMAP_DID_REMAP though it did not perform remap
[ https://issues.apache.org/jira/browse/TS-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Chou updated TS-4888:
--------------------------
    Fix Version/s:     (was: 6.2.1)

> collapsed_forwarding plugin returns TSREMAP_DID_REMAP though it did not perform remap
> --------------------------------------------------------------------------------------
>
>                 Key: TS-4888
>                 URL: https://issues.apache.org/jira/browse/TS-4888
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Plugins
>    Affects Versions: 6.2.1, 7.1.0
>            Reporter: Rajendra Kishore Bonumahanti
>             Fix For: 7.1.0
[jira] [Assigned] (TS-4887) Clean up Parent Selection URL feature.
[ https://issues.apache.org/jira/browse/TS-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Chou reassigned TS-4887:
-----------------------------
    Assignee: Peter Chou

> Clean up Parent Selection URL feature.
> --------------------------------------
>
>                 Key: TS-4887
>                 URL: https://issues.apache.org/jira/browse/TS-4887
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Parent Proxy
>            Reporter: Peter Chou
>            Assignee: Peter Chou
>             Fix For: 7.1.0
>
> * Remove references to 'maxdirs' and 'fname' from the TS API manual page.
> * Rename the "tmp" variable in ParentConsistentHash::getPathHash().
> * Clean up debug messages (remove excessive pointer value reporting).
[jira] [Updated] (TS-4887) Clean up Parent Selection URL feature.
[ https://issues.apache.org/jira/browse/TS-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Chou updated TS-4887:
--------------------------
    Fix Version/s: 7.1.0

> Clean up Parent Selection URL feature.
[jira] [Created] (TS-4887) Clean up Parent Selection URL feature.
Peter Chou created TS-4887:
---------------------------

             Summary: Clean up Parent Selection URL feature.
                 Key: TS-4887
                 URL: https://issues.apache.org/jira/browse/TS-4887
             Project: Traffic Server
          Issue Type: Improvement
          Components: Parent Proxy
            Reporter: Peter Chou

* Remove references to 'maxdirs' and 'fname' from the TS API manual page.
* Rename the "tmp" variable in ParentConsistentHash::getPathHash().
* Clean up debug messages (remove excessive pointer value reporting).
[jira] [Commented] (TS-4707) Parent Consistent Hash Selection - add fname and maxdirs options.
[ https://issues.apache.org/jira/browse/TS-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507310#comment-15507310 ]

Peter Chou commented on TS-4707:
--------------------------------

[~gancho] -- This seems fine to me also. I would like to add that you should only implement the specific functionality for 'fname' and 'maxdirs' if you think it would benefit the wider community. Otherwise, the focus can just be on extending the existing cache URL manipulation capability to the parent selection URL.

> Parent Consistent Hash Selection - add fname and maxdirs options.
> -----------------------------------------------------------------
>
>                 Key: TS-4707
>                 URL: https://issues.apache.org/jira/browse/TS-4707
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Parent Proxy
>            Reporter: Peter Chou
>            Assignee: Peter Chou
>             Fix For: 7.1.0
>
>          Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> This enhancement adds two options, "fname" and "maxdirs", which can be used to exclude the file-name and some of the directories in the path. The remaining portions of the path are then used as part of the hash computation for selecting among multiple parent caches.
> For our usage, it was desirable from an operational perspective to direct all components of a particular sub-tree to a single parent cache (to simplify trouble-shooting, pre-loading, etc.). This can be achieved by excluding the query-string, file-name, and right-most portions of the path from the hash computation.
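The path trimming described above can be sketched as follows. This is a minimal illustration, not the ATS implementation: the function name parent_hash_path, its parameters, and the exact semantics (max_dirs > 0 keeps that many left-most directories, 0 means no limit) are assumptions made for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: build the part of a URL path that would be fed to the
// parent-selection hash. Assumed semantics, not taken from ATS:
//   - the query string is always dropped,
//   - use_fname == false drops the file name,
//   - max_dirs > 0 keeps only that many left-most directories (0 = no limit).
std::string parent_hash_path(std::string path, bool use_fname, int max_dirs)
{
  // Drop the query string, if any.
  std::size_t q = path.find('?');
  if (q != std::string::npos) {
    path.erase(q);
  }

  // Split into the directory part (through the last '/') and the file name.
  std::size_t slash = path.rfind('/');
  std::string dirs  = (slash == std::string::npos) ? std::string() : path.substr(0, slash + 1);
  std::string fname = (slash == std::string::npos) ? path : path.substr(slash + 1);

  if (max_dirs > 0) {
    // Walk past up to max_dirs directory components, then cut the rest off.
    std::size_t pos = (!dirs.empty() && dirs[0] == '/') ? 1 : 0;
    for (int kept = 0; kept < max_dirs; ++kept) {
      std::size_t next = dirs.find('/', pos);
      if (next == std::string::npos) {
        break;
      }
      pos = next + 1;
    }
    dirs.erase(pos);
  }
  return use_fname ? dirs + fname : dirs;
}
```

With these assumed semantics, every object under /a/b/ yields the same string when the file name is excluded and max_dirs is 2, so all of them would hash to the same parent, which is the operational goal stated in the description.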
[jira] [Commented] (TS-4707) Parent Consistent Hash Selection - add fname and maxdirs options.
[ https://issues.apache.org/jira/browse/TS-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15497644#comment-15497644 ]

Peter Chou commented on TS-4707:
--------------------------------

[~jrushford] [~zwoop] [~jpe...@apache.org] -- Except that you have to create a new plugin or modify an existing plugin to accomplish the same manipulation with the API in PR #1009. This may be a big hurdle from the user perspective. I think that since (a) we have allowed 'qstring' option in the past and (b) 'fname' and 'maxdirs' are very much in the same ball-park (no new data structures are used in the path generation, we just adjust the base pointer position and length), that it would benefit the user to be able to accomplish a reasonable level of manipulation without invoking additional plugins or modifying existing plugins.

> Parent Consistent Hash Selection - add fname and maxdirs options.
[jira] [Updated] (TS-4475) Crash in Log-Collation client after using inactivity-cop.
[ https://issues.apache.org/jira/browse/TS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Chou updated TS-4475:
--------------------------
    Backport to Version: 6.2.1
          Fix Version/s:     (was: sometime)
                         7.0.0

> Crash in Log-Collation client after using inactivity-cop.
> ---------------------------------------------------------
>
>                 Key: TS-4475
>                 URL: https://issues.apache.org/jira/browse/TS-4475
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Logging
>    Affects Versions: 6.1.1
>            Reporter: Peter Chou
>             Fix For: 7.0.0
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Background: We recently tried making use of inactivity-cop by setting it to 300s instead of the default one-day setting. This was to address an issue where, under heavy load, ATS would become un-responsive to client requests, and the condition would persist after traffic was stopped with the active queue saying 0 connections but 'netstat -na' showing a bunch of established connections (up to the throttle limit approximately).
> Inactivity cop seemed to help ATS handle this situation, but we have since experienced a couple of core dumps over the last four day period. It seems occasionally the Log Collation Client State Machine will have event value 105 or VC_EVENT_INACTIVITY_TIMEOUT, but when it reaches read_signal_and_update() it tries to call the continuation handler which down the line does not know about this event thus causing core dump !"unexpcted state" [sic].
> Here is the back-trace --
> (gdb) bt
> #0 0x2b67cd5405f7 in raise () from /lib64/libc.so.6
> #1 0x2b67cd541e28 in abort () from /lib64/libc.so.6
> #2 0x2b67cb032921 in ink_die_die_die () at ink_error.cc:43
> #3 0x2b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65
> #4 0x2b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: failed assert `%s`") at ink_error.cc:73
> #5 0x2b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted state\"", file=0x7fb35b "LogCollationClientSM.cc", line=445) at ink_assert.cc:37
> #6 0x0069c86b in LogCollationClientSM::client_idle (this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445
> #7 0x0069b427 in LogCollationClientSM::client_handler (this=0x2b681400bb00, event=105, data=0x2b680c017020) at LogCollationClientSM.cc:119
> #8 0x00502cc6 in Continuation::handleEvent (this=0x2b681400bb00, event=105, data=0x2b680c017020) at ../iocore/eventsystem/I_Continuation.h:153
> #9 0x00783d40 in read_signal_and_update (event=105, vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> #10 0x00787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, event=1, e=0x127ad60) at UnixNetVConnection.cc:1188
> #11 0x00502cc6 in Continuation::handleEvent (this=0x2b680c016f00, event=1, data=0x127ad60) at ../iocore/eventsystem/I_Continuation.h:153
> #12 0x0077d943 in InactivityCop::check_inactivity (this=0x1209a00, event=2, e=0x127ad60) at UnixNet.cc:102
> #13 0x00502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, data=0x127ad60) at ../iocore/eventsystem/I_Continuation.h:153
> #14 0x007a5df6 in EThread::process_event (this=0x2b67cf7bb010, e=0x127ad60, calling_code=2) at UnixEThread.cc:128
> #15 0x007a61f5 in EThread::execute (this=0x2b67cf7bb010) at UnixEThread.cc:207
> #16 0x00534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918
> I believe it takes a wrong turn here --
> #9 0x00783d40 in read_signal_and_update (event=105, vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> 150     vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> (gdb) list
> 145 static inline int
> 146 read_signal_and_update(int event, UnixNetVConnection *vc)
> 147 {
> 148   vc->recursion++;
> 149   if (vc->read.vio._cont) {
> 150     vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> 151   } else {
> 152     switch (event) {
> 153     case VC_EVENT_EOS:
> 154     case VC_EVENT_ERROR:
> (gdb) list
> 155     case VC_EVENT_ACTIVE_TIMEOUT:
> 156     case VC_EVENT_INACTIVITY_TIMEOUT:
> 157       Debug("inactivity_cop", "event %d: null read.vio cont, closing vc %p", event, vc);
> 158       vc->closed = 1;
> 159       break;
> 160     default:
> 161       Error("Unexpected event %d for vc %p", event, vc);
> 162       ink_release_assert(0);
> 163       break;
> 164     }
> Note: I understand that there were several issues related to TS-3196 concerning inactivity_cop and this section of code.
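The failure mode in the back-trace, a state handler that asserts on an event it does not expect, can be sketched with a toy handler. Everything here is hypothetical and is not the LogCollationClientSM code: the Outcome enum, the handler name and the kEventWake value are invented for this example. Only the value 105 for the inactivity timeout comes from the report. Closing the connection on timeout is one possible handling, not necessarily the fix that was merged.

```cpp
#include <cassert>

// 105 is the event value quoted in the report for VC_EVENT_INACTIVITY_TIMEOUT.
constexpr int kEventInactivityTimeout = 105;
// Placeholder value invented for this sketch only.
constexpr int kEventWake = 1;

enum class Outcome { StartSend, CloseConnection, Unexpected };

// Hypothetical idle-state handler. The timeout gets its own case, so it no
// longer falls through to the "unexpected event" branch where the real code
// hit its assertion.
Outcome client_idle(int event)
{
  switch (event) {
  case kEventWake:
    return Outcome::StartSend;
  case kEventInactivityTimeout:
    return Outcome::CloseConnection; // tear down cleanly instead of asserting
  default:
    return Outcome::Unexpected; // the branch that asserted in the report
  }
}
```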
[jira] [Updated] (TS-4498) RemapConfig.cc - Print out error message on remap plugin init failure.
[ https://issues.apache.org/jira/browse/TS-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Chou updated TS-4498:
--------------------------
    Backport to Version: 6.2.1

> RemapConfig.cc - Print out error message on remap plugin init failure.
> ----------------------------------------------------------------------
>
>                 Key: TS-4498
>                 URL: https://issues.apache.org/jira/browse/TS-4498
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Plugins
>            Reporter: Peter Chou
>            Assignee: James Peach
>             Fix For: 7.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add printing of the returned error message to the Warning() if a remap plugin fails to init. Currently it just says "bailing out" which is not as useful.
[jira] [Created] (TS-4853) Parent Consistent Hash Selection - add parent selection URL and API.
Peter Chou created TS-4853:
---------------------------

             Summary: Parent Consistent Hash Selection - add parent selection URL and API.
                 Key: TS-4853
                 URL: https://issues.apache.org/jira/browse/TS-4853
             Project: Traffic Server
          Issue Type: New Feature
          Components: Parent Proxy
            Reporter: Peter Chou

Add the ability (via TS and Lua APIs) to set an explicit parent selection URL that will be used for parent consistent hash selection hashing.
[jira] [Commented] (TS-4708) traffic_cop looking for libtsutil.so.6 although libtsutil.so.7 was built.
[ https://issues.apache.org/jira/browse/TS-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15475034#comment-15475034 ]

Peter Chou commented on TS-4708:
--------------------------------

Additional Info -- although my LD_LIBRARY_PATH is not set at compile time, I did find that LDFLAGS was set to include my $HOME/local/lib. This was previously done in order to find the GeoIP libraries installed there. This may be necessary to reproduce the issue, which is that linking against the v7 .la file ends up resolving to the v6 .so in some circumstances.

In this reported case, libtsmgmt.so is linked against libtsutil.so v6 located in $HOME/local/lib, and the resulting traffic_cop depends on both libtsutil.so v7 (its own dependency) and v6 (via libtsmgmt.so). If the linking command line is changed to specify the .so instead of the .la, it works. If the libtsmgmt.so linking command line omits libtsutil completely, it also works (apparently it is not really required).

I am OK with closing this issue. The work-around is not to install ATS into standard library search paths such as /usr or $HOME/local; otherwise, other versions of ATS installed into other directories, e.g., /opt or $HOME/opt, may be linked incorrectly during compilation. This would impact mostly development and build machines rather than production.

> traffic_cop looking for libtsutil.so.6 although libtsutil.so.7 was built.
> -------------------------------------------------------------------------
>
>                 Key: TS-4708
>                 URL: https://issues.apache.org/jira/browse/TS-4708
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cop
>            Reporter: Peter Chou
>
> Apologies if this is a known issue. I looked through several pages of search results for traffic_cop and did not see this particular issue. Platform is Ubuntu Linux 14.04 LTS 64-bit. I have previously installed and run 6.1.x under $HOME/local (I am running as an un-privileged user). I just tried compiling and running "master" or 7.0.0 and installed to $HOME/master. I gave the appropriate "--prefix" option to configure each time. Neither of the directories above is in my LD_LIBRARY_PATH at compile or run time.
> Result: traffic_manager starts OK, traffic_server starts OK, traffic_cop fails since it is looking for the version-6 library. If I then add $HOME/local/lib to my LD_LIBRARY_PATH (contains previous 6.1.x build), then traffic_cop runs using the version-6 library under there. No idea why it doesn't use the version-7 library that was built at the same time and installed under $HOME/master.
[jira] [Created] (TS-4799) Allow minimum log rolling period to be set as low as 30s (down from 60s).
Peter Chou created TS-4799:
---------------------------

             Summary: Allow minimum log rolling period to be set as low as 30s (down from 60s).
                 Key: TS-4799
                 URL: https://issues.apache.org/jira/browse/TS-4799
             Project: Traffic Server
          Issue Type: Improvement
          Components: Logging
            Reporter: Peter Chou

Change MIN_ROLLING_INTERVAL_SEC in proxy/logging/Log.h to 30 (seconds).
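As a rough illustration of what a lower bound like this does, the sketch below clamps a requested rolling interval to a 30-second minimum. The constant and function names are invented for this example, and clamping is an assumption: the ticket only says which constant to change, not how the server treats values below the minimum.

```cpp
#include <algorithm>

// 30 is the value proposed in the ticket; the name is hypothetical.
constexpr int kMinRollingIntervalSec = 30;

// Assumed policy for this sketch: values below the minimum are raised to it.
int effective_rolling_interval(int requested_sec)
{
  return std::max(requested_sec, kMinRollingIntervalSec);
}
```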
[jira] [Commented] (TS-4708) traffic_cop looking for libtsutil.so.6 although libtsutil.so.7 was built.
[ https://issues.apache.org/jira/browse/TS-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450165#comment-15450165 ]

Peter Chou commented on TS-4708:
--------------------------------

I think that I have narrowed this down to a libtool behavior. In the make files, we are linking to the .la file rather than to the un-installed shared library. Libtool will somehow translate the explicit .la file (within the build tree) to -l* which may search outside of the build tree, e.g., ../../lib/ts/libtsutil.la in the libtool command ends up being -ltsutil in the eventual ld command. I happen to have the older ATS 6.x libtsutil.so in my lib so it ends up linking against that even when I am building ATS 7.x. Not sure this is worth the effort to fix. Probably easier just to be aware of and avoid the situation.

> traffic_cop looking for libtsutil.so.6 although libtsutil.so.7 was built.
> -------------------------------------------------------------------------
>
>                 Key: TS-4708
>                 URL: https://issues.apache.org/jira/browse/TS-4708
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cop
>            Reporter: Peter Chou
>             Fix For: 7.0.0
[jira] [Comment Edited] (TS-2770) let proxy.config.log.rolling_interval_sec be less than 5mins
[ https://issues.apache.org/jira/browse/TS-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449786#comment-15449786 ]

Peter Chou edited comment on TS-2770 at 8/30/16 6:42 PM:
---------------------------------------------------------

Yes, it appears to. This minimum wasn't protecting against processing delay to roll the log or anything like that, i.e., ensure roll is done before the next roll?

was (Author: pbchou):
Yes, it appears to. This minimum wasn't protecting against processing delay to roll the log or anything like that, i.e., ensure roll is done before the next roll.

> let proxy.config.log.rolling_interval_sec be less than 5mins
> ------------------------------------------------------------
>
>                 Key: TS-2770
>                 URL: https://issues.apache.org/jira/browse/TS-2770
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Logging
>            Reporter: James Peach
>            Assignee: James Peach
>            Priority: Minor
>             Fix For: 5.0.0
>
> 5 minutes is a long time. Let {{proxy.config.log.rolling_interval_sec}} be lower, even as low as 1 minute!
[jira] [Commented] (TS-2770) let proxy.config.log.rolling_interval_sec be less than 5mins
[ https://issues.apache.org/jira/browse/TS-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449786#comment-15449786 ]

Peter Chou commented on TS-2770:
--------------------------------

Yes, it appears to. This minimum wasn't protecting against processing delay to roll the log or anything like that, i.e., ensure roll is done before the next roll.

> let proxy.config.log.rolling_interval_sec be less than 5mins
[jira] [Commented] (TS-2770) let proxy.config.log.rolling_interval_sec be less than 5mins
[ https://issues.apache.org/jira/browse/TS-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449754#comment-15449754 ]

Peter Chou commented on TS-2770:
--------------------------------

[~jpe...@apache.org] -- Do you think there is any issue if we want to further lower the minimum to 30 seconds?

> let proxy.config.log.rolling_interval_sec be less than 5mins
[jira] [Commented] (TS-4475) Crash in Log-Collation client after using inactivity-cop.
[ https://issues.apache.org/jira/browse/TS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414477#comment-15414477 ]

Peter Chou commented on TS-4475:
--------------------------------

[~oknet] Hi. I was able to test the fix against "master" after working around the TS-4728 bug that I found. Please review the update PR when you get a chance. I would also like to request that this fix be applied to both the 7.0.0 and 6.2.x branches.

> Crash in Log-Collation client after using inactivity-cop.
> ---------------------------------------------------------
>
>                 Key: TS-4475
>                 URL: https://issues.apache.org/jira/browse/TS-4475
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Logging
>    Affects Versions: 6.1.1
>            Reporter: Peter Chou
>             Fix For: sometime
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
[jira] [Updated] (TS-4728) Null pointer error in LogHost.cc.
[ https://issues.apache.org/jira/browse/TS-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Chou updated TS-4728:
--------------------------
    Affects Version/s: 7.0.0

> Null pointer error in LogHost.cc.
> ---------------------------------
>
>                 Key: TS-4728
>                 URL: https://issues.apache.org/jira/browse/TS-4728
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Logging
>    Affects Versions: 7.0.0
>            Reporter: Peter Chou
>
> [~jpe...@apache.org] I am getting a null pointer access error with the following assertion at the time of traffic_server start-up with log collation enabled (client-side). I was able to get around it by just commenting it out, but perhaps a better fix is required.
> {noformat}
> LogHost::create_orphan_LogFile_object()
> {
>   // We expect that no-one else is holding any refcounts on the
>   // orphan file so that is will be releases when we replace it
>   // below.
>   ink_assert(m_orphan_file->refcount() == 1);
> {noformat}
> Back-trace --
> {noformat}
> #0 0x0053e772 in RefCountObj::refcount (this=0x8) at ../lib/ts/Ptr.h:80
> #1 0x00692f9f in LogHost::create_orphan_LogFile_object (this=0x2268d80) at LogHost.cc:235
> #2 0x00692a45 in LogHost::set_ipstr_port (this=0x2268d80, ipstr=0x2265d40 "127.0.0.1", pt=8085) at LogHost.cc:135
> #3 0x00692b92 in LogHost::set_name_or_ipstr (this=0x2268d80, name_or_ip=0x2265d40 "127.0.0.1") at LogHost.cc:155
> #4 0x00684046 in LogConfig::read_xml_log_config (this=0x21e4110) at LogConfig.cc:1472
> #5 0x0067ff73 in LogConfig::setup_log_objects (this=0x21e4110) at LogConfig.cc:510
> #6 0x0067f858 in LogConfig::init (this=0x21e4110, prev_config=0x0) at LogConfig.cc:395
> #7 0x006721fe in Log::init (flags=0) at Log.cc:925
> #8 0x00542552 in main (argv=0x7ffcc853abd8) at Main.cc:1828
> {noformat}
> I made minimal changes to logs_xml.config to set as client --
> {noformat}
> : % : %"/>
> {noformat}
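The this=0x8 in frame #0 suggests that refcount() was called through a null pointer. One possible guard, shown here only as a sketch, is to test the pointer before checking the count. The Counted struct and orphan_replaceable function are stand-ins invented for this example; they are not the Traffic Server classes, and this is not necessarily the fix that was adopted.

```cpp
#include <cassert>

// Minimal stand-in for a reference-counted object (hypothetical).
struct Counted {
  int refs = 0;
  int refcount() const { return refs; }
};

// True when replacing the orphan file is safe under the expectation stated in
// the quoted comment: either no orphan file exists yet, or we are the only
// holder of a reference to it.
bool orphan_replaceable(const Counted *orphan)
{
  return orphan == nullptr || orphan->refcount() == 1;
}
```

An assertion written against a check of this shape would pass at first start-up, when no orphan file has been created yet, while still catching an orphan file that is shared unexpectedly.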
[jira] [Created] (TS-4728) Null pointer error in LogHost.cc.
Peter Chou created TS-4728:
---------------------------

             Summary: Null pointer error in LogHost.cc.
                 Key: TS-4728
                 URL: https://issues.apache.org/jira/browse/TS-4728
             Project: Traffic Server
          Issue Type: Bug
          Components: Logging
            Reporter: Peter Chou

[~jpe...@apache.org] I am getting a null pointer access error with the following assertion at the time of traffic_server start-up with log collation enabled (client-side). I was able to get around it by just commenting it out, but perhaps a better fix is required.
{noformat}
LogHost::create_orphan_LogFile_object()
{
  // We expect that no-one else is holding any refcounts on the
  // orphan file so that is will be releases when we replace it
  // below.
  ink_assert(m_orphan_file->refcount() == 1);
{noformat}
Back-trace --
{noformat}
#0 0x0053e772 in RefCountObj::refcount (this=0x8) at ../lib/ts/Ptr.h:80
#1 0x00692f9f in LogHost::create_orphan_LogFile_object (this=0x2268d80) at LogHost.cc:235
#2 0x00692a45 in LogHost::set_ipstr_port (this=0x2268d80, ipstr=0x2265d40 "127.0.0.1", pt=8085) at LogHost.cc:135
#3 0x00692b92 in LogHost::set_name_or_ipstr (this=0x2268d80, name_or_ip=0x2265d40 "127.0.0.1") at LogHost.cc:155
#4 0x00684046 in LogConfig::read_xml_log_config (this=0x21e4110) at LogConfig.cc:1472
#5 0x0067ff73 in LogConfig::setup_log_objects (this=0x21e4110) at LogConfig.cc:510
#6 0x0067f858 in LogConfig::init (this=0x21e4110, prev_config=0x0) at LogConfig.cc:395
#7 0x006721fe in Log::init (flags=0) at Log.cc:925
#8 0x00542552 in main (argv=0x7ffcc853abd8) at Main.cc:1828
{noformat}
I made minimal changes to logs_xml.config to set as client --
{noformat}
: % : %"/>
{noformat}
[jira] [Commented] (TS-4498) RemapConfig.cc - Print out error message on remap plugin init failure.
[ https://issues.apache.org/jira/browse/TS-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414193#comment-15414193 ]

Peter Chou commented on TS-4498:

[~jpe...@apache.org] Would it be possible to back-port this to 6.2.x also? It should cherry-pick cleanly.

> RemapConfig.cc - Print out error message on remap plugin init failure.
> --
>
> Key: TS-4498
> URL: https://issues.apache.org/jira/browse/TS-4498
> Project: Traffic Server
> Issue Type: Improvement
> Components: Plugins
> Reporter: Peter Chou
> Assignee: James Peach
> Fix For: 7.0.0
>
> Add printing of the returned error message to the Warning() if a remap plugin fails to init. Currently it just says "bailing out" which is not as useful.
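The improvement described in this issue amounts to surfacing the text a plugin writes into its error buffer when initialization fails. A minimal sketch of that idea follows. It is not the RemapConfig.cc change itself: `SketchInitFn`, `describe_init_result` and `failing_init` are hypothetical names, and the init callback is only loosely modelled on the `errbuf`/`errbuf_size` style of plugin init functions.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical init callback: returns true on success, otherwise writes a
// human-readable reason into errbuf.
using SketchInitFn = bool (*)(char *errbuf, int errbuf_size);

// Runs the init callback and returns an empty string on success, or a
// warning line that includes the plugin's own error text on failure.
std::string
describe_init_result(const std::string &plugin_path, SketchInitFn init)
{
  char errbuf[256] = "";
  if (init(errbuf, static_cast<int>(sizeof(errbuf)))) {
    return std::string();
  }
  const char *reason = (errbuf[0] != '\0') ? errbuf : "unknown error";
  return "failed to initialize plugin " + plugin_path + ": " + reason;
}

// Example callback that always fails with an explanatory message.
bool
failing_init(char *errbuf, int errbuf_size)
{
  std::snprintf(errbuf, static_cast<std::size_t>(errbuf_size), "missing required option");
  return false;
}
```

Calling `describe_init_result("example.so", failing_init)` yields a line that names both the plugin and the reason, which is the extra detail the issue asks for in place of a bare "bailing out".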
[jira] [Commented] (TS-4708) traffic_cop looking for libtsutil.so.6 although libtsutil.so.7 was built.
[ https://issues.apache.org/jira/browse/TS-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414185#comment-15414185 ]

Peter Chou commented on TS-4708:

One more data point. It seems that the pre-install binary (left over after compilation) works, but the post-install binary does not. Something happens to the binary (I think) when you do "make install". The pre-install binary will create a .libs directory in the execution directory with a lt-traffic_cop file inside and the program will run. The post-install binary will NOT create the .libs directory and then fails since it is searching for the wrong .6 library version. Any ideas?

> traffic_cop looking for libtsutil.so.6 although libtsutil.so.7 was built.
> -
>
> Key: TS-4708
> URL: https://issues.apache.org/jira/browse/TS-4708
> Project: Traffic Server
> Issue Type: Bug
> Components: Cop
> Reporter: Peter Chou
> Fix For: 7.0.0
>
> Apologies if this is a known issue. I looked through several pages of search results for traffic_cop and did not see this particular issue. Platform is Ubuntu Linux 14.04 LTS 64-bit. I have previously installed and run 6.1.x under $HOME/local (I am running as an un-privileged user). I just tried compiling and running "master" or 7.0.0 and installed to $HOME/master. I gave the appropriate "--prefix" option to configure each time. Neither of the directories above is in my LD_LIBRARY_PATH at compile or run time.
> Result: traffic_manager starts OK, traffic_server starts OK, traffic_cop fails since it is looking for the version-6 library. If I then add $HOME/local/lib to my LD_LIBRARY_PATH (contains previous 6.1.x build), then traffic_cop runs using the version-6 library under there. No idea why it doesn't use the version-7 library that was built at the same time and installed under $HOME/master.
[jira] [Commented] (TS-3245) getopt doesn't work correctly when used in plugin chaining
[ https://issues.apache.org/jira/browse/TS-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412584#comment-15412584 ]

Peter Chou commented on TS-3245:

Hi, I opened a PR #845 with a back-port of this patch from "master" [7.0.0] to 6.2.x.

> getopt doesn't work correctly when used in plugin chaining
> --
>
> Key: TS-3245
> URL: https://issues.apache.org/jira/browse/TS-3245
> Project: Traffic Server
> Issue Type: Improvement
> Components: Plugins
> Affects Versions: 5.1.1
> Reporter: Sudheer Vinukonda
> Priority: Minor
> Labels: newbie
> Fix For: sometime
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When multiple plugins that use getopt are chained, it doesn't work correctly for the subsequent plugins after the first plugin. [~jpe...@apache.org] and [~zwoop] suggested that the getopt globals need to be reset (example, {{optind = opterr = optopt = 0}}) before using it and would be better to do it in the core during plugin loading to keep it simple/transparent from plugin development.
> Note that, if a plugin itself uses getopt multiple times on different argv's, it would have to reset the globals between them.
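The reset quoted in the issue can be illustrated with a small sketch. `parse_delay` is a hypothetical helper standing in for one plugin's option parsing, and the `-d` option is invented for the example. The reset line is the one quoted in the description; as far as I know, strict POSIX only documents `optind` starting at 1, so treat the zero reset as an assumption that holds on the common libc implementations rather than as a portable guarantee.

```cpp
#include <unistd.h> // getopt(), optarg, optind, opterr, optopt

#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for one plugin's option parsing: returns the integer
// given with "-d", or -1 when the option is absent.
int
parse_delay(int argc, char *argv[])
{
  assert(argc > 0 && argv != nullptr);

  // Reset the getopt globals (as quoted in the issue) so the scan starts at
  // the beginning of this argv, whatever an earlier getopt() user left behind.
  optind = opterr = optopt = 0;

  int delay = -1;
  int opt;
  while ((opt = getopt(argc, argv, "d:")) != -1) {
    if (opt == 'd') {
      delay = std::atoi(optarg);
    }
  }
  return delay;
}
```

Without the reset, a second call on a different argv would start from wherever `optind` was left by the first call and would typically see no options at all, which matches the chaining symptom described above.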
[jira] [Commented] (TS-4475) Crash in Log-Collation client after using inactivity-cop.
[ https://issues.apache.org/jira/browse/TS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410320#comment-15410320 ]

Peter Chou commented on TS-4475:

[~oknet] based on input from Susan Hinrichs, I just piggy-backed the VC_EVENT_ACTIVE_TIMEOUT and VC_EVENT_INACTIVITY_TIMEOUT events with the actions taken for EOS and ERROR in the switch statement. I also modified the debug message accordingly. This is similar to what we originally had (see initial topic comment above) with the addition of the VC_EVENT_ACTIVE_TIMEOUT as both you and Susan suggested.

Can you take a look at what I did in client_open() with --
{noformat}
net_vc->set_inactivity_timeout(HRTIME_SECONDS(86400));
{noformat}
I am not sure if this is what you are recommending for step 2. It seems sufficient to set the time-out for the net vc to the previous default of 86400. I was not able to test this part in my development environment under "master", but we'll back-port it to 6.1.x and test in our lab next week. I would appreciate your opinion on whether this looks right and is in line with your thinking.

> Crash in Log-Collation client after using inactivity-cop.
> -
>
> Key: TS-4475
> URL: https://issues.apache.org/jira/browse/TS-4475
> Project: Traffic Server
> Issue Type: Bug
> Components: Logging
> Affects Versions: 6.1.1
> Reporter: Peter Chou
> Fix For: sometime
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Background: We recently tried making use of inactivity-cop by setting it to 300s instead of the default one-day setting. This was to address an issue where, under heavy load, ATS would become un-responsive to client requests, and the condition would persist after traffic was stopped with the active queue saying 0 connections but 'netstat -na' showing a bunch of established connections (up to the throttle limit approximately).
> Inactivity cop seemed to help ATS handle this situation, but we have since experienced a couple of core dumps over the last four-day period. It seems occasionally the Log Collation Client State Machine will have event value 105 or VC_EVENT_INACTIVITY_TIMEOUT, but when it reaches read_signal_and_update() it tries to call the continuation handler which down the line does not know about this event thus causing core dump !"unexpcted state" [sic].
> Here is the back-trace --
> (gdb) bt
> #0 0x2b67cd5405f7 in raise () from /lib64/libc.so.6
> #1 0x2b67cd541e28 in abort () from /lib64/libc.so.6
> #2 0x2b67cb032921 in ink_die_die_die () at ink_error.cc:43
> #3 0x2b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65
> #4 0x2b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: failed assert `%s`") at ink_error.cc:73
> #5 0x2b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted state\"", file=0x7fb35b "LogCollationClientSM.cc", line=445) at ink_assert.cc:37
> #6 0x0069c86b in LogCollationClientSM::client_idle (this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445
> #7 0x0069b427 in LogCollationClientSM::client_handler (this=0x2b681400bb00, event=105, data=0x2b680c017020) at LogCollationClientSM.cc:119
> #8 0x00502cc6 in Continuation::handleEvent (this=0x2b681400bb00, event=105, data=0x2b680c017020) at ../iocore/eventsystem/I_Continuation.h:153
> #9 0x00783d40 in read_signal_and_update (event=105, vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> #10 0x00787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, event=1, e=0x127ad60) at UnixNetVConnection.cc:1188
> #11 0x00502cc6 in Continuation::handleEvent (this=0x2b680c016f00, event=1, data=0x127ad60) at ../iocore/eventsystem/I_Continuation.h:153
> #12 0x0077d943 in InactivityCop::check_inactivity (this=0x1209a00, event=2, e=0x127ad60) at UnixNet.cc:102
> #13 0x00502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, data=0x127ad60) at ../iocore/eventsystem/I_Continuation.h:153
> #14 0x007a5df6 in EThread::process_event (this=0x2b67cf7bb010, e=0x127ad60, calling_code=2) at UnixEThread.cc:128
> #15 0x007a61f5 in EThread::execute (this=0x2b67cf7bb010) at UnixEThread.cc:207
> #16 0x00534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918
> I believe it takes a wrong turn here --
> #9 0x00783d40 in read_signal_and_update (event=105, vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> (gdb) list
> 145 static inline int
> 146 read_signal_and_update(int event, UnixNetVConnection *vc)
> 147 {
> 148 vc->recursion++;
> 149 if
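The piggy-back approach discussed in this comment can be sketched as a switch in which both time-out events share the failure path already used for EOS and ERROR. This is a self-contained illustration, not the LogCollationClientSM code: the enum names, `SketchResult` and `client_idle_sketch` are hypothetical, and apart from 105 (the inactivity time-out value reported in the back-trace) the numeric event values are assumptions.

```cpp
#include <cassert>

// Placeholder event codes. Only 105 comes from the back-trace in this
// thread; the remaining values are assumed for the sake of the example.
enum SketchEvent {
  SKETCH_EVENT_ERROR              = 3,   // assumed value
  SKETCH_EVENT_READ_READY         = 100, // assumed value, a "normal" event
  SKETCH_EVENT_EOS                = 104, // assumed value
  SKETCH_EVENT_INACTIVITY_TIMEOUT = 105, // value seen in the back-trace
  SKETCH_EVENT_ACTIVE_TIMEOUT     = 106, // assumed value
};

enum SketchResult { SKETCH_CONTINUE, SKETCH_FAIL, SKETCH_UNEXPECTED };

// Hypothetical idle-state handler: time-outs are treated like EOS/ERROR
// instead of falling through to the "unexpected state" branch.
SketchResult
client_idle_sketch(int event)
{
  switch (event) {
  case SKETCH_EVENT_READ_READY:
    return SKETCH_CONTINUE;
  case SKETCH_EVENT_EOS:
  case SKETCH_EVENT_ERROR:
  case SKETCH_EVENT_ACTIVE_TIMEOUT:
  case SKETCH_EVENT_INACTIVITY_TIMEOUT:
    // Shared failure path: the connection would be closed and retried here.
    return SKETCH_FAIL;
  default:
    // Corresponds to the branch where the real code asserts.
    return SKETCH_UNEXPECTED;
  }
}
```

The design point is that an unknown event still reaches the default branch, so genuinely unexpected states remain visible, while the two time-outs no longer count as unexpected.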
[jira] [Created] (TS-4707) Parent Consistent Hash Selection - add fname and maxdirs options.
Peter Chou created TS-4707:
--
Summary: Parent Consistent Hash Selection - add fname and maxdirs options.
Key: TS-4707
URL: https://issues.apache.org/jira/browse/TS-4707
Project: Traffic Server
Issue Type: Improvement
Components: Parent Proxy
Reporter: Peter Chou

This enhancement adds two options, "fname" and "maxdirs", which can be used to exclude the file-name and some of the directories in the path. The remaining portions of the path are then used as part of the hash computation for selecting among multiple parent caches.

For our usage, it was desirable from an operational perspective to direct all components of a particular sub-tree to a single parent cache (to simplify trouble-shooting, pre-loading, etc.). This can be achieved by excluding the query-string, file-name, and right-most portions of the path from the hash computation.
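To make the effect on the hash input concrete, here is a minimal sketch of trimming a URL path before hashing. It is not the patch itself: `hash_path_sketch` and its parameters are hypothetical, and the assumed semantics (a flag that drops the file-name, and a count of left-most directories to keep) are a guess at how "fname" and "maxdirs" might behave.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the portion of a URL path that would feed the
// parent-selection hash. Parameter meanings are assumptions of this sketch:
//   keep_fname - when false, drop the file-name (text after the last '/')
//   max_dirs   - when > 0, keep only that many left-most directories
std::string
hash_path_sketch(std::string url_path, bool keep_fname, int max_dirs)
{
  assert(max_dirs >= 0);

  // The query-string never takes part in the hash in this sketch.
  const std::string::size_type query = url_path.find('?');
  if (query != std::string::npos) {
    url_path.erase(query);
  }

  // Split into the directory part (ending in '/') and the file-name.
  const std::string::size_type last_slash = url_path.rfind('/');
  if (last_slash == std::string::npos) {
    return keep_fname ? url_path : std::string();
  }
  std::string dirs        = url_path.substr(0, last_slash + 1);
  const std::string fname = url_path.substr(last_slash + 1);

  if (max_dirs > 0) {
    // Walk past at most max_dirs directory components, then cut the rest.
    std::string::size_type pos = (dirs[0] == '/') ? 1 : 0;
    for (int kept = 0; kept < max_dirs; ++kept) {
      const std::string::size_type next = dirs.find('/', pos);
      if (next == std::string::npos) {
        break;
      }
      pos = next + 1;
    }
    dirs.erase(pos);
  }

  return keep_fname ? dirs + fname : dirs;
}
```

Under these assumptions, `/video/show1/seg/0001.ts?x=1` and `/video/show1/seg/0002.ts` both reduce to `/video/show1/` when the file-name is dropped and two directories are kept, so every object in that sub-tree would hash to the same parent.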
[jira] [Commented] (TS-4475) Crash in Log-Collation client after using inactivity-cop.
[ https://issues.apache.org/jira/browse/TS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402589#comment-15402589 ] Peter Chou commented on TS-4475: Apologies, I had forgotten that we had modified the original solution to just ignore the time-out event and return EVENT_CONT instead. This seems to work for us (it generates a time-out debug message every 5 minutes if you just leave it idle), and it seemed to be a less drastic approach than killing the connection on time-out. So based on the previous comment from Oknet, we should go back to the original approach of killing the connection (treating it as an error instead of ignoring). Should I just piggy-back the VC_EVENT_INACTIVITY_TIMEOUT with the actions for VC_EVENT_EOS and VC_EVENT_ERROR (these two are already combined) in the switch() statement? It seems there is some code for handling these two events in addition to just calling client_fail(). Perhaps the time-out should execute these actions also? > Crash in Log-Collation client after using inactivity-cop. > - > > Key: TS-4475 > URL: https://issues.apache.org/jira/browse/TS-4475 > Project: Traffic Server > Issue Type: Bug > Components: Logging >Affects Versions: 6.1.1 >Reporter: Peter Chou > Fix For: sometime > > Time Spent: 50m > Remaining Estimate: 0h > > Background: We recently tried making use of inactivity-cop by setting it to > 300s instead of the default one-day setting. This was to address an issue > where, under heavy load, ATS would become un-responsive to client requests, > and the condition would persist after traffic was stopped with the active > queue saying 0 connections but 'netstat -na' showing a bunch of established > connections (up to the throttle limit approximately). > Inactivity cop seemed to help ATS handle this situation, but we have since > experienced a couple of core dumps over the last four day period. 
It seems > occasionally the Log Collation Client State Machine will have event value 105 > or VC_EVENT_INACTIVITY_TIMEOUT, but when it reaches read_signal_and_update() > it tries to call the continuation handler which down the line does not know > about this event thus causing core dump !"unexpcted state" [sic]. > Here is the back-trace -- > (gdb) bt > #0 0x2b67cd5405f7 in raise () from /lib64/libc.so.6 > #1 0x2b67cd541e28 in abort () from /lib64/libc.so.6 > #2 0x2b67cb032921 in ink_die_die_die () at ink_error.cc:43 > #3 0x2b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed > assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65 > #4 0x2b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: > failed assert `%s`") at ink_error.cc:73 > #5 0x2b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted > state\"", file=0x7fb35b "LogCollationClientSM.cc", > line=445) at ink_assert.cc:37 > #6 0x0069c86b in LogCollationClientSM::client_idle > (this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445 > #7 0x0069b427 in LogCollationClientSM::client_handler > (this=0x2b681400bb00, event=105, data=0x2b680c017020) > at LogCollationClientSM.cc:119 > #8 0x00502cc6 in Continuation::handleEvent (this=0x2b681400bb00, > event=105, data=0x2b680c017020) > at ../iocore/eventsystem/I_Continuation.h:153 > #9 0x00783d40 in read_signal_and_update (event=105, > vc=0x2b680c016f00) at UnixNetVConnection.cc:150 > #10 0x00787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, > event=1, e=0x127ad60) at UnixNetVConnection.cc:1188 > #11 0x00502cc6 in Continuation::handleEvent (this=0x2b680c016f00, > event=1, data=0x127ad60) > at ../iocore/eventsystem/I_Continuation.h:153 > #12 0x0077d943 in InactivityCop::check_inactivity (this=0x1209a00, > event=2, e=0x127ad60) at UnixNet.cc:102 > #13 0x00502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, > data=0x127ad60) > at ../iocore/eventsystem/I_Continuation.h:153 > #14 0x007a5df6 in EThread::process_event 
(this=0x2b67cf7bb010, > e=0x127ad60, calling_code=2) at UnixEThread.cc:128 > #15 0x007a61f5 in EThread::execute (this=0x2b67cf7bb010) at > UnixEThread.cc:207 > #16 0x00534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918 > I believe it takes a wrong turn here -- > #9 0x00783d40 in read_signal_and_update (event=105, > vc=0x2b680c016f00) at UnixNetVConnection.cc:150 > 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio); > (gdb) list > 145 static inline int > 146 read_signal_and_update(int event, UnixNetVConnection *vc) > 147 { > 148 vc->recursion++; > 149 if (vc->read.vio._cont) { > 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio); > 151 } else { > 152
[jira] [Commented] (TS-4475) Crash in Log-Collation client after using inactivity-cop.
[ https://issues.apache.org/jira/browse/TS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398056#comment-15398056 ] Peter Chou commented on TS-4475: I submitted a PR with a fix so that the Log Collation Client SM will treat the VC_EVENT_INACTIVITY_TIMEOUT event as an error (if un-handled in some way, it would result in a core dump). In the lab, this problem can be created by (1) starting ATS (2) sending a request [ this causes a log collation client connection to be established ] (3) let it sit idle for 300s with no further requests [ ATS will now core dump when the time-out is generated by inactivity cop and sent to the client SM ]. Admittedly, this is somewhat of a lab issue, but there is definitely some possibility that 5m of idle time could be experienced depending on the site in question. > Crash in Log-Collation client after using inactivity-cop. > - > > Key: TS-4475 > URL: https://issues.apache.org/jira/browse/TS-4475 > Project: Traffic Server > Issue Type: Bug > Components: Logging >Affects Versions: 6.1.1 >Reporter: Peter Chou > Fix For: sometime > > Time Spent: 10m > Remaining Estimate: 0h > > Background: We recently tried making use of inactivity-cop by setting it to > 300s instead of the default one-day setting. This was to address an issue > where, under heavy load, ATS would become un-responsive to client requests, > and the condition would persist after traffic was stopped with the active > queue saying 0 connections but 'netstat -na' showing a bunch of established > connections (up to the throttle limit approximately). > Inactivity cop seemed to help ATS handle this situation, but we have since > experienced a couple of core dumps over the last four day period. 
It seems > occasionally the Log Collation Client State Machine will have event value 105 > or VC_EVENT_INACTIVITY_TIMEOUT, but when it reaches read_signal_and_update() > it tries to call the continuation handler which down the line does not know > about this event thus causing core dump !"unexpcted state" [sic]. > Here is the back-trace -- > (gdb) bt > #0 0x2b67cd5405f7 in raise () from /lib64/libc.so.6 > #1 0x2b67cd541e28 in abort () from /lib64/libc.so.6 > #2 0x2b67cb032921 in ink_die_die_die () at ink_error.cc:43 > #3 0x2b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed > assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65 > #4 0x2b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: > failed assert `%s`") at ink_error.cc:73 > #5 0x2b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted > state\"", file=0x7fb35b "LogCollationClientSM.cc", > line=445) at ink_assert.cc:37 > #6 0x0069c86b in LogCollationClientSM::client_idle > (this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445 > #7 0x0069b427 in LogCollationClientSM::client_handler > (this=0x2b681400bb00, event=105, data=0x2b680c017020) > at LogCollationClientSM.cc:119 > #8 0x00502cc6 in Continuation::handleEvent (this=0x2b681400bb00, > event=105, data=0x2b680c017020) > at ../iocore/eventsystem/I_Continuation.h:153 > #9 0x00783d40 in read_signal_and_update (event=105, > vc=0x2b680c016f00) at UnixNetVConnection.cc:150 > #10 0x00787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, > event=1, e=0x127ad60) at UnixNetVConnection.cc:1188 > #11 0x00502cc6 in Continuation::handleEvent (this=0x2b680c016f00, > event=1, data=0x127ad60) > at ../iocore/eventsystem/I_Continuation.h:153 > #12 0x0077d943 in InactivityCop::check_inactivity (this=0x1209a00, > event=2, e=0x127ad60) at UnixNet.cc:102 > #13 0x00502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, > data=0x127ad60) > at ../iocore/eventsystem/I_Continuation.h:153 > #14 0x007a5df6 in EThread::process_event 
(this=0x2b67cf7bb010, > e=0x127ad60, calling_code=2) at UnixEThread.cc:128 > #15 0x007a61f5 in EThread::execute (this=0x2b67cf7bb010) at > UnixEThread.cc:207 > #16 0x00534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918 > I believe it takes a wrong turn here -- > #9 0x00783d40 in read_signal_and_update (event=105, > vc=0x2b680c016f00) at UnixNetVConnection.cc:150 > 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio); > (gdb) list > 145 static inline int > 146 read_signal_and_update(int event, UnixNetVConnection *vc) > 147 { > 148 vc->recursion++; > 149 if (vc->read.vio._cont) { > 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio); > 151 } else { > 152 switch (event) { > 153 case VC_EVENT_EOS: > 154 case VC_EVENT_ERROR: > (gdb) list > 155 case VC_EVENT_ACTIVE_TIMEOUT: > 156 case
[jira] [Commented] (TS-4475) Crash in Log-Collation client after using inactivity-cop.
[ https://issues.apache.org/jira/browse/TS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311015#comment-15311015 ] Peter Chou commented on TS-4475: We only changed the default_inactivity_timeout from 86400 to 300 seconds. The inactivity_check_frequency remained at the default of 1s. Understood the default_inactivity_timeout is by default enabled with a long delay. When I say "making use of" I meant setting it for a value that would actually be useful to clear the unresponsive ATS condition in a reasonable amount of time. By the way, I am not sure if anyone is supporting this sub-system since there have been no other responses. Do you think it is reasonable to allow the log collation continuation to handle the inactivity event by treating it like an EOS/ERROR (just piggy-back in the switch statement)? We can try this in our lab. So far we have experienced three core dumps due to this issue ~ 4-days apart each (ATS is just sitting there idle in a lab environment). > Crash in Log-Collation client after using inactivity-cop. > - > > Key: TS-4475 > URL: https://issues.apache.org/jira/browse/TS-4475 > Project: Traffic Server > Issue Type: Bug > Components: Logging >Affects Versions: 6.1.1 >Reporter: Peter Chou > Fix For: sometime > > > Background: We recently tried making use of inactivity-cop by setting it to > 300s instead of the default one-day setting. This was to address an issue > where, under heavy load, ATS would become un-responsive to client requests, > and the condition would persist after traffic was stopped with the active > queue saying 0 connections but 'netstat -na' showing a bunch of established > connections (up to the throttle limit approximately). > Inactivity cop seemed to help ATS handle this situation, but we have since > experienced a couple of core dumps over the last four day period. 
It seems > occasionally the Log Collation Client State Machine will have event value 105 > or VC_EVENT_INACTIVITY_TIMEOUT, but when it reaches read_signal_and_update() > it tries to call the continuation handler which down the line does not know > about this event thus causing core dump !"unexpcted state" [sic]. > Here is the back-trace -- > (gdb) bt > #0 0x2b67cd5405f7 in raise () from /lib64/libc.so.6 > #1 0x2b67cd541e28 in abort () from /lib64/libc.so.6 > #2 0x2b67cb032921 in ink_die_die_die () at ink_error.cc:43 > #3 0x2b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed > assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65 > #4 0x2b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: > failed assert `%s`") at ink_error.cc:73 > #5 0x2b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted > state\"", file=0x7fb35b "LogCollationClientSM.cc", > line=445) at ink_assert.cc:37 > #6 0x0069c86b in LogCollationClientSM::client_idle > (this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445 > #7 0x0069b427 in LogCollationClientSM::client_handler > (this=0x2b681400bb00, event=105, data=0x2b680c017020) > at LogCollationClientSM.cc:119 > #8 0x00502cc6 in Continuation::handleEvent (this=0x2b681400bb00, > event=105, data=0x2b680c017020) > at ../iocore/eventsystem/I_Continuation.h:153 > #9 0x00783d40 in read_signal_and_update (event=105, > vc=0x2b680c016f00) at UnixNetVConnection.cc:150 > #10 0x00787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, > event=1, e=0x127ad60) at UnixNetVConnection.cc:1188 > #11 0x00502cc6 in Continuation::handleEvent (this=0x2b680c016f00, > event=1, data=0x127ad60) > at ../iocore/eventsystem/I_Continuation.h:153 > #12 0x0077d943 in InactivityCop::check_inactivity (this=0x1209a00, > event=2, e=0x127ad60) at UnixNet.cc:102 > #13 0x00502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, > data=0x127ad60) > at ../iocore/eventsystem/I_Continuation.h:153 > #14 0x007a5df6 in EThread::process_event 
(this=0x2b67cf7bb010, > e=0x127ad60, calling_code=2) at UnixEThread.cc:128 > #15 0x007a61f5 in EThread::execute (this=0x2b67cf7bb010) at > UnixEThread.cc:207 > #16 0x00534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918 > I believe it takes a wrong turn here -- > #9 0x00783d40 in read_signal_and_update (event=105, > vc=0x2b680c016f00) at UnixNetVConnection.cc:150 > 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio); > (gdb) list > 145 static inline int > 146 read_signal_and_update(int event, UnixNetVConnection *vc) > 147 { > 148 vc->recursion++; > 149 if (vc->read.vio._cont) { > 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio); > 151 } else { > 152 switch (event) { > 153 case
[jira] [Commented] (TS-4475) Crash in Log-Collation client after using inactivity-cop.
[ https://issues.apache.org/jira/browse/TS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299154#comment-15299154 ] Peter Chou commented on TS-4475: Update - I had an e-mail exchange with sudheerv on this issue. Turns out that the continuation (for Log Collator Client SM in this case) should be updated to handle the VC_EVENT_INACTIVITY_TIMEOUT (105) much like HttpSM.cc does, for example. Unfortunately, I looked and there are no existing instances of handling this event in LogCollationClientSM.cc. My first simplistic approach would be to piggy-back this event where I see VC_EVENT_EOS and VC_EVENT_ERROR being handled. Any recommendations? Also, is Log Collator client feature still supported going forward? > Crash in Log-Collation client after using inactivity-cop. > - > > Key: TS-4475 > URL: https://issues.apache.org/jira/browse/TS-4475 > Project: Traffic Server > Issue Type: Bug > Components: Logging >Affects Versions: 6.1.1 >Reporter: Peter Chou > > Background: We recently tried making use of inactivity-cop by setting it to > 300s instead of the default one-day setting. This was to address an issue > where, under heavy load, ATS would become un-responsive to client requests, > and the condition would persist after traffic was stopped with the active > queue saying 0 connections but 'netstat -na' showing a bunch of established > connections (up to the throttle limit approximately). > Inactivity cop seemed to help ATS handle this situation, but we have since > experienced a couple of core dumps over the last four day period. It seems > occasionally the Log Collation Client State Machine will have event value 105 > or VC_EVENT_INACTIVITY_TIMEOUT, but when it reaches read_signal_and_update() > it tries to call the continuation handler which down the line does not know > about this event thus causing core dump !"unexpcted state" [sic]. 
> Here is the back-trace -- > (gdb) bt > #0 0x2b67cd5405f7 in raise () from /lib64/libc.so.6 > #1 0x2b67cd541e28 in abort () from /lib64/libc.so.6 > #2 0x2b67cb032921 in ink_die_die_die () at ink_error.cc:43 > #3 0x2b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed > assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65 > #4 0x2b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: > failed assert `%s`") at ink_error.cc:73 > #5 0x2b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted > state\"", file=0x7fb35b "LogCollationClientSM.cc", > line=445) at ink_assert.cc:37 > #6 0x0069c86b in LogCollationClientSM::client_idle > (this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445 > #7 0x0069b427 in LogCollationClientSM::client_handler > (this=0x2b681400bb00, event=105, data=0x2b680c017020) > at LogCollationClientSM.cc:119 > #8 0x00502cc6 in Continuation::handleEvent (this=0x2b681400bb00, > event=105, data=0x2b680c017020) > at ../iocore/eventsystem/I_Continuation.h:153 > #9 0x00783d40 in read_signal_and_update (event=105, > vc=0x2b680c016f00) at UnixNetVConnection.cc:150 > #10 0x00787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, > event=1, e=0x127ad60) at UnixNetVConnection.cc:1188 > #11 0x00502cc6 in Continuation::handleEvent (this=0x2b680c016f00, > event=1, data=0x127ad60) > at ../iocore/eventsystem/I_Continuation.h:153 > #12 0x0077d943 in InactivityCop::check_inactivity (this=0x1209a00, > event=2, e=0x127ad60) at UnixNet.cc:102 > #13 0x00502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, > data=0x127ad60) > at ../iocore/eventsystem/I_Continuation.h:153 > #14 0x007a5df6 in EThread::process_event (this=0x2b67cf7bb010, > e=0x127ad60, calling_code=2) at UnixEThread.cc:128 > #15 0x007a61f5 in EThread::execute (this=0x2b67cf7bb010) at > UnixEThread.cc:207 > #16 0x00534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918 > I believe it takes a wrong turn here -- > #9 0x00783d40 in read_signal_and_update (event=105, > 
vc=0x2b680c016f00) at UnixNetVConnection.cc:150 > 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio); > (gdb) list > 145 static inline int > 146 read_signal_and_update(int event, UnixNetVConnection *vc) > 147 { > 148 vc->recursion++; > 149 if (vc->read.vio._cont) { > 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio); > 151 } else { > 152 switch (event) { > 153 case VC_EVENT_EOS: > 154 case VC_EVENT_ERROR: > (gdb) list > 155 case VC_EVENT_ACTIVE_TIMEOUT: > 156 case VC_EVENT_INACTIVITY_TIMEOUT: > 157 Debug("inactivity_cop", "event %d: null read.vio cont, closing > vc %p", event, vc); > 158 vc->closed = 1; > 159
[jira] [Commented] (TS-4461) Not closing client connections
[ https://issues.apache.org/jira/browse/TS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298965#comment-15298965 ]

Peter Chou commented on TS-4461:

Just curious whether you have the inactivity cop time-out set to 10 minutes in your scenario? We experienced a similar issue with 6.1.1 in our lab testing, where ATS would become unresponsive to client requests under heavy load. After the load was stopped, ATS continued to ignore client requests (single curl requests) for > 1-hour. It seemed locked up with 0 active connections internally, but approximately throttle-limit established connections reported with 'netstat -na'. After we changed inactivity-cop from 86400 seconds to 300 seconds, it seemed to avoid the locked-up condition but resulted in the periodic core dumps of TS-4475 some time later (no load, just sitting there). Do you think our results fall under this issue?

> Not closing client connections
> --
>
> Key: TS-4461
> URL: https://issues.apache.org/jira/browse/TS-4461
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Affects Versions: 6.2.0
> Reporter: Bryan Call
> Assignee: Susan Hinrichs
> Priority: Blocker
> Fix For: 7.0.0
>
> Looks like we are not closing client connections correctly on the 6.2.x branch. After taking a server out of rotation for a while:
> {code}
> [bcall@l28 ~]$ ss -s
> Total: 18212 (kernel 18329)
> TCP: 18122 (estab 17141, closed 123, orphaned 4, synrecv 0, timewait 123/0), ports 152
> {code}
> in traffic top:
> {code}
>              CLIENT                           ORIGIN SERVER
> Requests     1.8   Head Bytes  492.0    Requests   1.8   Head Bytes  345.7
> Req/Conn     1.0   Body Bytes    0.0    Req/Conn   1.0   Body Bytes    0.0
> New Conn     1.8   Avg Size    269.0    New Conn   1.8   Avg Size    189.0
> Curr Conn    0.0   Net (bits)   3.9K    Curr Conn  0.0   Net (bits)   2.8K
> Active Con   6.6M  Resp (ms)     0.8
> Dynamic KA   0.0
> {code}
> Looks like it is happening on the client connections to TLS ports (ip of the server removed):
> {code}
> [bcall@l28 ~]$ ss -tn | grep 'XXX:44[3-4]' | wc -l
> 12434
> {code}
> And not on the non-TLS ports
> {code}
> [bcall@l28 ~]$ ss -tn | grep 'XXX:8' | wc -l
> 0
> {code}
> Count of the fd for the traffic_server process:
> {code}
> [bcall@l28 ~]$ sudo ls -l /proc/$(pidof traffic_server)/fd | wc -l
> 18127
> {code}
[jira] [Updated] (TS-4475) Crash in Log-Collation client after using inactivity-cop.
[ https://issues.apache.org/jira/browse/TS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Chou updated TS-4475:
---
    Affects Version/s: 6.1.1

> Crash in Log-Collation client after using inactivity-cop.
> -
>
>                 Key: TS-4475
>                 URL: https://issues.apache.org/jira/browse/TS-4475
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Logging
>    Affects Versions: 6.1.1
>            Reporter: Peter Chou
>
> Background: We recently tried making use of inactivity-cop by setting it to
> 300s instead of the default one-day setting. This was to address an issue
> where, under heavy load, ATS would become un-responsive to client requests,
> and the condition would persist after traffic was stopped with the active
> queue saying 0 connections but 'netstat -na' showing a bunch of established
> connections (up to the throttle limit approximately).
>
> Inactivity cop seemed to help ATS handle this situation, but we have since
> experienced a couple of core dumps over the last four day period. It seems
> occasionally the Log Collation Client State Machine will have event value 105
> or VC_EVENT_INACTIVITY_TIMEOUT, but when it reaches read_signal_and_update()
> it tries to call the continuation handler which down the line does not know
> about this event thus causing core dump !"unexpcted state" [sic].
>
> Here is the back-trace --
> (gdb) bt
> #0  0x2b67cd5405f7 in raise () from /lib64/libc.so.6
> #1  0x2b67cd541e28 in abort () from /lib64/libc.so.6
> #2  0x2b67cb032921 in ink_die_die_die () at ink_error.cc:43
> #3  0x2b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65
> #4  0x2b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: failed assert `%s`") at ink_error.cc:73
> #5  0x2b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted state\"", file=0x7fb35b "LogCollationClientSM.cc", line=445) at ink_assert.cc:37
> #6  0x0069c86b in LogCollationClientSM::client_idle (this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445
> #7  0x0069b427 in LogCollationClientSM::client_handler (this=0x2b681400bb00, event=105, data=0x2b680c017020) at LogCollationClientSM.cc:119
> #8  0x00502cc6 in Continuation::handleEvent (this=0x2b681400bb00, event=105, data=0x2b680c017020) at ../iocore/eventsystem/I_Continuation.h:153
> #9  0x00783d40 in read_signal_and_update (event=105, vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> #10 0x00787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, event=1, e=0x127ad60) at UnixNetVConnection.cc:1188
> #11 0x00502cc6 in Continuation::handleEvent (this=0x2b680c016f00, event=1, data=0x127ad60) at ../iocore/eventsystem/I_Continuation.h:153
> #12 0x0077d943 in InactivityCop::check_inactivity (this=0x1209a00, event=2, e=0x127ad60) at UnixNet.cc:102
> #13 0x00502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, data=0x127ad60) at ../iocore/eventsystem/I_Continuation.h:153
> #14 0x007a5df6 in EThread::process_event (this=0x2b67cf7bb010, e=0x127ad60, calling_code=2) at UnixEThread.cc:128
> #15 0x007a61f5 in EThread::execute (this=0x2b67cf7bb010) at UnixEThread.cc:207
> #16 0x00534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918
>
> I believe it takes a wrong turn here --
> #9  0x00783d40 in read_signal_and_update (event=105, vc=0x2b680c016f00) at UnixNetVConnection.cc:150
> 150         vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> (gdb) list
> 145     static inline int
> 146     read_signal_and_update(int event, UnixNetVConnection *vc)
> 147     {
> 148       vc->recursion++;
> 149       if (vc->read.vio._cont) {
> 150         vc->read.vio._cont->handleEvent(event, &vc->read.vio);
> 151       } else {
> 152         switch (event) {
> 153         case VC_EVENT_EOS:
> 154         case VC_EVENT_ERROR:
> (gdb) list
> 155         case VC_EVENT_ACTIVE_TIMEOUT:
> 156         case VC_EVENT_INACTIVITY_TIMEOUT:
> 157           Debug("inactivity_cop", "event %d: null read.vio cont, closing vc %p", event, vc);
> 158           vc->closed = 1;
> 159           break;
> 160         default:
> 161           Error("Unexpected event %d for vc %p", event, vc);
> 162           ink_release_assert(0);
> 163           break;
> 164         }
>
> Note: I understand that there were several issues related to TS-3196
> concerning inactivity_cop and this section of code.
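The failure mode in the back-trace can be illustrated outside of ATS. What follows is only a minimal, self-contained sketch, not the actual LogCollationClientSM code and not necessarily how the issue was resolved: a hypothetical idle-state handler that treats the timeout events as an ordinary reason to close the connection instead of falling through to an assert. Only the value 105 for VC_EVENT_INACTIVITY_TIMEOUT comes from the report; the other event values, the IdleResult type and the function name are assumptions made for the example.

```cpp
#include <cstdio>

// Event codes. Only VC_EVENT_INACTIVITY_TIMEOUT = 105 is taken from the
// back-trace above; every other value is a placeholder for this sketch.
enum Event {
  EVENT_IMMEDIATE             = 1,
  VC_EVENT_ERROR              = 3,
  VC_EVENT_EOS                = 104,
  VC_EVENT_INACTIVITY_TIMEOUT = 105,
  VC_EVENT_ACTIVE_TIMEOUT     = 106,
};

// Hypothetical outcome of handling an event while the client is idle.
enum class IdleResult { StayIdle, CloseAndReconnect, Unexpected };

// Hypothetical idle-state handler: timeouts, EOS and errors all lead to an
// orderly close instead of reaching the "unexpected state" branch.
IdleResult
client_idle_sketch(int event)
{
  switch (event) {
  case EVENT_IMMEDIATE:
    return IdleResult::StayIdle;
  case VC_EVENT_EOS:
  case VC_EVENT_ERROR:
  case VC_EVENT_ACTIVE_TIMEOUT:
  case VC_EVENT_INACTIVITY_TIMEOUT:
    std::fprintf(stderr, "idle: event %d, closing connection\n", event);
    return IdleResult::CloseAndReconnect;
  default:
    // A real state machine might assert here; the sketch just reports it.
    return IdleResult::Unexpected;
  }
}
```

With this shape, calling client_idle_sketch(105) yields IdleResult::CloseAndReconnect, while an unknown event number still surfaces as IdleResult::Unexpected so that genuinely unknown events are not silently swallowed.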
[jira] [Created] (TS-4475) Crash in Log-Collation client after using inactivity-cop.
Peter Chou created TS-4475:
--
             Summary: Crash in Log-Collation client after using inactivity-cop.
                 Key: TS-4475
                 URL: https://issues.apache.org/jira/browse/TS-4475
             Project: Traffic Server
          Issue Type: Bug
          Components: Logging
            Reporter: Peter Chou

Background: We recently tried making use of inactivity-cop by setting it to 300s instead of the default one-day setting. This was to address an issue where, under heavy load, ATS would become un-responsive to client requests, and the condition would persist after traffic was stopped with the active queue saying 0 connections but 'netstat -na' showing a bunch of established connections (up to the throttle limit approximately).

Inactivity cop seemed to help ATS handle this situation, but we have since experienced a couple of core dumps over the last four day period. It seems occasionally the Log Collation Client State Machine will have event value 105 or VC_EVENT_INACTIVITY_TIMEOUT, but when it reaches read_signal_and_update() it tries to call the continuation handler which down the line does not know about this event thus causing core dump !"unexpcted state" [sic].

Here is the back-trace --
(gdb) bt
#0  0x2b67cd5405f7 in raise () from /lib64/libc.so.6
#1  0x2b67cd541e28 in abort () from /lib64/libc.so.6
#2  0x2b67cb032921 in ink_die_die_die () at ink_error.cc:43
#3  0x2b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65
#4  0x2b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: failed assert `%s`") at ink_error.cc:73
#5  0x2b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted state\"", file=0x7fb35b "LogCollationClientSM.cc", line=445) at ink_assert.cc:37
#6  0x0069c86b in LogCollationClientSM::client_idle (this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445
#7  0x0069b427 in LogCollationClientSM::client_handler (this=0x2b681400bb00, event=105, data=0x2b680c017020) at LogCollationClientSM.cc:119
#8  0x00502cc6 in Continuation::handleEvent (this=0x2b681400bb00, event=105, data=0x2b680c017020) at ../iocore/eventsystem/I_Continuation.h:153
#9  0x00783d40 in read_signal_and_update (event=105, vc=0x2b680c016f00) at UnixNetVConnection.cc:150
#10 0x00787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, event=1, e=0x127ad60) at UnixNetVConnection.cc:1188
#11 0x00502cc6 in Continuation::handleEvent (this=0x2b680c016f00, event=1, data=0x127ad60) at ../iocore/eventsystem/I_Continuation.h:153
#12 0x0077d943 in InactivityCop::check_inactivity (this=0x1209a00, event=2, e=0x127ad60) at UnixNet.cc:102
#13 0x00502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, data=0x127ad60) at ../iocore/eventsystem/I_Continuation.h:153
#14 0x007a5df6 in EThread::process_event (this=0x2b67cf7bb010, e=0x127ad60, calling_code=2) at UnixEThread.cc:128
#15 0x007a61f5 in EThread::execute (this=0x2b67cf7bb010) at UnixEThread.cc:207
#16 0x00534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918

I believe it takes a wrong turn here --
#9  0x00783d40 in read_signal_and_update (event=105, vc=0x2b680c016f00) at UnixNetVConnection.cc:150
150         vc->read.vio._cont->handleEvent(event, &vc->read.vio);
(gdb) list
145     static inline int
146     read_signal_and_update(int event, UnixNetVConnection *vc)
147     {
148       vc->recursion++;
149       if (vc->read.vio._cont) {
150         vc->read.vio._cont->handleEvent(event, &vc->read.vio);
151       } else {
152         switch (event) {
153         case VC_EVENT_EOS:
154         case VC_EVENT_ERROR:
(gdb) list
155         case VC_EVENT_ACTIVE_TIMEOUT:
156         case VC_EVENT_INACTIVITY_TIMEOUT:
157           Debug("inactivity_cop", "event %d: null read.vio cont, closing vc %p", event, vc);
158           vc->closed = 1;
159           break;
160         default:
161           Error("Unexpected event %d for vc %p", event, vc);
162           ink_release_assert(0);
163           break;
164         }

Note: I understand that there were several issues related to TS-3196 concerning inactivity_cop and this section of code.
[jira] [Created] (TS-4450) Syntax error in CI test script test_https.py.
Peter Chou created TS-4450:
--
             Summary: Syntax error in CI test script test_https.py.
                 Key: TS-4450
                 URL: https://issues.apache.org/jira/browse/TS-4450
             Project: Traffic Server
          Issue Type: Bug
          Components: CI
            Reporter: Peter Chou

I don't know Python, but the parenthesis seems to be un-needed or at least un-balanced here. Sorry about the formatting, the caret is pointing to the parenthesis.

  File "/usr/src/git/trafficserver/ci/tsqa/tests/test_https.py", line 318
    signal_cmd = [traffic_ctl, 'config', 'reload')]
                                                 ^
SyntaxError: invalid syntax
[jira] [Comment Edited] (TS-4411) Add a error message on unrecognized remap.config @... option.
[ https://issues.apache.org/jira/browse/TS-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267317#comment-15267317 ]

Peter Chou edited comment on TS-4411 at 5/2/16 7:21 PM:

I opened a PR with the patch. It is just a one-liner that prints a diagnostic message to the error.log. I did notice that the remap_check_option() is apparently run three times on start-up so the message is printed three times. I did not attempt to squash this as it should be a rare exception condition.

was (Author: pbchou):
I opened a PR with the patch. It is just a one-liner that prints a diagnostic message to the error.log. I did not notice that the remap_check_option() is apparently run three times on start-up so the message is printed three times. I did not attempt to squash this as it should be a rare exception condition.

> Add a error message on unrecognized remap.config @... option.
> -
>
>                 Key: TS-4411
>                 URL: https://issues.apache.org/jira/browse/TS-4411
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Chou
>            Assignee: Peter Chou
>              Labels: review
>             Fix For: 7.0.0
>
> We noticed that unrecognized remap.config options seem to result in "silent"
> failures, i.e., the remap rule "map /a /b @foo" just reduces to a plain "map
> /a /b" rule. This is not desirable when we are implementing access control
> and other functionality in the rule's plugin chain.
[jira] [Commented] (TS-4411) Add a error message on unrecognized remap.config @... option.
[ https://issues.apache.org/jira/browse/TS-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267317#comment-15267317 ] Peter Chou commented on TS-4411: I opened a PR with the patch. It is just a one-liner that prints a diagnostic message to the error.log. I did not notice that the remap_check_option() is apparently run three times on start-up so the message is printed three times. I did not attempt to squash this as it should be a rare exception condition. > Add a error message on unrecognized remap.config @... option. > - > > Key: TS-4411 > URL: https://issues.apache.org/jira/browse/TS-4411 > Project: Traffic Server > Issue Type: Improvement > Components: Core >Reporter: Peter Chou >Assignee: Peter Chou > Labels: review > Fix For: 7.0.0 > > > We noticed that unrecognized remap.config options seem to result in "silent" > failures, i.e., the remap rule "map /a /b @foo" just reduces to a plain "map > /a /b" rule. This is not desirable when we are implementing access control > and other functionality in the rule's plugin chain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-4411) Add a error message on unrecognized remap.config @... option.
Peter Chou created TS-4411: -- Summary: Add a error message on unrecognized remap.config @... option. Key: TS-4411 URL: https://issues.apache.org/jira/browse/TS-4411 Project: Traffic Server Issue Type: Improvement Components: Core Reporter: Peter Chou We noticed that unrecognized remap.config options seem to result in "silent" failures, i.e., the remap rule "map /a /b @foo" just reduces to a plain "map /a /b" rule. This is not desirable when we are implementing access control and other functionality in the rule's plugin chain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
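To make the request concrete, here is a rough sketch of the kind of check being asked for: warn when an "@..." token is not a recognized option, rather than silently dropping it. This is not the ATS remap_check_option() implementation; the function name, the message wording and the (deliberately incomplete) option list are assumptions for illustration only.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical and deliberately incomplete list of option names. The real
// set accepted in remap.config is defined by ATS, not by this sketch.
static const char *const kKnownOptions[] = {"plugin=", "pparam=", "method=", "src_ip=", "action="};

// Returns true when `arg` is either not an "@" option at all or is an "@"
// option with a known name. Otherwise prints a diagnostic and returns false,
// so the caller can reject the rule instead of quietly ignoring the token.
bool
check_remap_option(const char *arg)
{
  if (arg == nullptr || arg[0] != '@') {
    return true; // not an option, nothing to validate
  }
  for (const char *known : kKnownOptions) {
    if (std::strncmp(arg + 1, known, std::strlen(known)) == 0) {
      return true;
    }
  }
  std::fprintf(stderr, "ERROR: unrecognized remap.config option '%s'\n", arg);
  return false;
}
```

Applied to the rule from the description, "map /a /b @foo", the token "@foo" would produce one diagnostic line and a false return value, while "/a" and "/b" pass through untouched.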
[jira] [Created] (TS-4410) Fix i386 compiler warning - unsigned-vs-signed comparison in hostdb.
Peter Chou created TS-4410: -- Summary: Fix i386 compiler warning - unsigned-vs-signed comparison in hostdb. Key: TS-4410 URL: https://issues.apache.org/jira/browse/TS-4410 Project: Traffic Server Issue Type: Bug Components: DNS Reporter: Peter Chou Compiler warning shows up on i386 32-bit build due to unsigned-vs-signed int comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
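The report does not quote the offending hostdb expression, so the following is only a generic illustration of how a comparison can be clean on a 64-bit build and still trip -Wsign-compare on i386. The types and the helper name are assumptions. With a `long` on the left and an `unsigned int` on the right, a 64-bit Linux build converts the unsigned value to `long` without loss, while on i386 both types are 32 bits wide, the signed operand is converted to unsigned, and the compiler warns.

```cpp
// The naive form would be:
//
//   bool less_than(long a, unsigned int b) { return a < b; }
//
// On LP64 this compiles quietly; on ILP32 (i386) `a` is converted to
// unsigned long, a negative `a` compares as a huge value, and
// -Wsign-compare reports the mixed comparison.

// Hypothetical fixed helper: handle the negative case explicitly, then
// compare two values of the same unsigned type.
bool
less_than(long a, unsigned int b)
{
  if (a < 0) {
    return true; // every negative value is below every unsigned value
  }
  return static_cast<unsigned long>(a) < static_cast<unsigned long>(b);
}
```

The explicit negative check matters as much as the cast: casting alone would silence the warning but keep the wrap-around behavior for negative inputs.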
[jira] [Commented] (TS-4353) Support multiple/custom GeoIP databases.
[ https://issues.apache.org/jira/browse/TS-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241950#comment-15241950 ] Peter Chou commented on TS-4353: Following branch is available for those interested -- git pull https://github.com/pbchou/trafficserver TS-4353 > Support multiple/custom GeoIP databases. > > > Key: TS-4353 > URL: https://issues.apache.org/jira/browse/TS-4353 > Project: Traffic Server > Issue Type: Improvement > Components: Plugins >Reporter: Peter Chou > > We have an internally developed plugin that we worked on based on suggestions > from Shu Kit Chan. This plugin is a global/remap plugin that allows you to > specify multiple IPv4 country databases in the global plugin.config file > (important for multiple customers on an ATS instance). Each DB is assigned a > tag string, e.g., --tag=foo --file=path-to-foo-file --tag=bar > --file=path-to-bar-file. > In the remap context, the plugin will look-up the country code of the client > IP and place it into an ATS internal header for down-chain plugins (such as > tslua) to use. The selector for controlling which DB to use for the look-up > for each remap rule is @pparam=foo. > I understand that GeoIP enhancements have recently been added to > header_rewrite which can perform header changes based on GeoIP information. > Would there be some value in adding the multiple/custom-DB feature to > header_rewrite or possibly establishing a generic GeoIP helper plugin that > handles the DB management for other plugins? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TS-4353) Support multiple/custom GeoIP databases.
Peter Chou created TS-4353: -- Summary: Support multiple/custom GeoIP databases. Key: TS-4353 URL: https://issues.apache.org/jira/browse/TS-4353 Project: Traffic Server Issue Type: Improvement Components: Plugins Reporter: Peter Chou We have an internally developed plugin that we worked on based on suggestions from Shu Kit Chan. This plugin is a global/remap plugin that allows you to specify multiple IPv4 country databases in the global plugin.config file (important for multiple customers on an ATS instance). Each DB is assigned a tag string, e.g., --tag=foo --file=path-to-foo-file --tag=bar --file=path-to-bar-file. In the remap context, the plugin will look-up the country code of the client IP and place it into an ATS internal header for down-chain plugins (such as tslua) to use. The selector for controlling which DB to use for the look-up for each remap rule is @pparam=foo. I understand that GeoIP enhancements have recently been added to header_rewrite which can perform header changes based on GeoIP information. Would there be some value in adding the multiple/custom-DB feature to header_rewrite or possibly establishing a generic GeoIP helper plugin that handles the DB management for other plugins? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
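The tag-to-database scheme described above can be sketched in a few lines. This is not the internal plugin (its source is not shown here); it is a minimal illustration, assuming the arguments arrive as "--tag=NAME --file=PATH" pairs as in the example and that the remap rule's @pparam value is simply the tag. The function names are hypothetical, and no GeoIP library is involved: the sketch only resolves which database file a rule would use.

```cpp
#include <map>
#include <string>
#include <vector>

// Parses "--tag=NAME --file=PATH" pairs into a tag -> path map.
// A --file that is not preceded by a --tag is ignored.
std::map<std::string, std::string>
parse_db_args(const std::vector<std::string> &args)
{
  const std::string tag_prefix  = "--tag=";
  const std::string file_prefix = "--file=";
  std::map<std::string, std::string> dbs;
  std::string current_tag;

  for (const std::string &arg : args) {
    if (arg.compare(0, tag_prefix.size(), tag_prefix) == 0) {
      current_tag = arg.substr(tag_prefix.size());
    } else if (arg.compare(0, file_prefix.size(), file_prefix) == 0 && !current_tag.empty()) {
      dbs[current_tag] = arg.substr(file_prefix.size());
      current_tag.clear();
    }
  }
  return dbs;
}

// Resolves the database path selected by a remap rule's @pparam value.
// Returns an empty string when the tag is unknown.
std::string
select_db(const std::map<std::string, std::string> &dbs, const std::string &pparam)
{
  auto it = dbs.find(pparam);
  return it == dbs.end() ? std::string() : it->second;
}
```

With the arguments from the description, select_db(dbs, "foo") resolves to "path-to-foo-file", and an unknown tag resolves to an empty string that the caller can treat as a configuration error.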
[jira] [Updated] (TS-4252) Some plugins are causing seg-faults when using getopt_long with optind = 1.
[ https://issues.apache.org/jira/browse/TS-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Chou updated TS-4252:
---
    Affects Version/s: 6.1.1
          Environment: Linux Intel 64-bit Ubuntu 14.04 and RHEL 7.
          Description:
There are several global plugins which experience segmentation faults related to parsing (argc, argv) arguments using getopt_long(). Often, the plugins display debug output showing corrupted arguments, e.g., arguments belonging to previous entries in plugin.config. This has been confirmed to happen with background_fetch.so and regex_revalidate.so. The other plugins remap_stats and stale_while_revalidate may also be affected based on code review.

This issue is corrected if the plugins are modified to use optind = 0 instead of optind = 1 before calling getopt_long(). Note that the majority of plugins are using optind = 0 already. Per the Linux man page, you should only need to set optind = 1 between scanning different argument vectors, but you must set optind = 0 to cause some re-initialization to occur if you make use of GNU extensions in the optstring argument of getopt_long(). I am not sure if this applies to a prior plugin using GNU extensions or the current one (or going between one and the other), but it would seem safer to use optind = 0 always.

  was:
In "plugin.config" if we just do background_fetch.so with no arguments we get a segmentation fault with messages saying invalid option with argument text from previous lines in the configuration file. If I just add a garbage argument like "background_fetch.so bleah" there is no fault. I noticed that this plugin initialized optind to 1 before calling getopt_long while others such as tcpinfo set it to 0. Setting it to 0 also prevents the fault. Is this the correct fix?

              Summary: Some plugins are causing seg-faults when using getopt_long with optind = 1.  (was: background_fetch.so segfaults with no arguments as a global plugin.)

Updated description to indicate multiple plugins affected.

> Some plugins are causing seg-faults when using getopt_long with optind = 1.
> ---
>
>                 Key: TS-4252
>                 URL: https://issues.apache.org/jira/browse/TS-4252
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Plugins
>    Affects Versions: 6.1.1
>         Environment: Linux Intel 64-bit Ubuntu 14.04 and RHEL 7.
>            Reporter: Peter Chou
>            Assignee: Leif Hedstrom
>             Fix For: 6.2.0
>
> There are several global plugins which experience segmentation faults related
> to parsing (argc, argv) arguments using getopt_long(). Often, the plugins
> display debug output showing corrupted arguments, e.g., arguments belonging
> to previous entries in plugin.config. This has been confirmed to happen with
> background_fetch.so and regex_revalidate.so. The other plugins remap_stats
> and stale_while_revalidate may also be affected based on code review.
>
> This issue is corrected if the plugins are modified to use optind = 0 instead
> of optind = 1 before calling getopt_long(). Note that the majority of plugins
> are using optind = 0 already. Per the Linux man page, you should only need to
> set optind = 1 between scanning different argument vectors, but you must set
> optind = 0 to cause some re-initialization to occur if you make use of GNU
> extensions in the optstring argument of getopt_long(). I am not sure if this
> applies to a prior plugin using GNU extensions or the current one (or going
> between one and the other), but it would seem safer to use optind = 0 always.
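The pattern described above, resetting optind to 0 before each scan of a new argument vector, can be sketched as follows. This is a minimal example assuming glibc's getopt_long() (setting optind to 0 as a re-initialization request is glibc behavior; BSD-derived systems use optreset instead). The "--delay" option and the parse_delay() name are made up for the sketch and do not come from any of the plugins mentioned.

```cpp
#include <getopt.h>
#include <cstdlib>

// Hypothetical option set for the sketch: a single "--delay=N" option.
// Returns the parsed delay, or -1 if the option is absent.
int
parse_delay(int argc, char *const argv[])
{
  static const struct option longopts[] = {
    {"delay", required_argument, nullptr, 'd'},
    {nullptr, 0, nullptr, 0},
  };

  int delay = -1;
  optind    = 0; // glibc: ask getopt to re-initialize its internal state
  opterr    = 0; // keep getopt from printing its own diagnostics

  for (;;) {
    int opt = getopt_long(argc, argv, "d:", longopts, nullptr);
    if (opt == -1) {
      break; // no more options in this argument vector
    }
    if (opt == 'd') {
      delay = std::atoi(optarg);
    }
  }
  return delay;
}
```

Because the reset happens inside the function, it can be called once per configuration line, each with its own argv, including a line that carries no arguments at all, without inheriting scan state from the previous call.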
[jira] [Commented] (TS-4266) ATS memory statistics shows that memory utilization is doubled after “traffic_ctl config reload”. And it is failed as it cannot find enough memory.
[ https://issues.apache.org/jira/browse/TS-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207119#comment-15207119 ]

Peter Chou commented on TS-4266:

Kit, sorry for any confusion. I will work with Kishore to submit a pull request to apache/trafficserver. I think Kishore only merged the pull request with his own fork at brkishore/trafficserver in his comment above. Thanks for reviewing.

> ATS memory statistics shows that memory utilization is doubled after
> “traffic_ctl config reload”. And it is failed as it cannot find enough memory.
> --
>
>                 Key: TS-4266
>                 URL: https://issues.apache.org/jira/browse/TS-4266
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Lua
>            Reporter: Rajendra Kishore Bonumahanti
>            Assignee: Kit Chan
>             Fix For: sometime
>
> ATS memory statistics shows memory utilization is doubled after “traffic_ctl
> config reload”. We get “not enough memory” error in the subsequent attempt
> and “config reload” fails.
> ATS is configured with 100 map entries in remap.config, all share the same
> lua script.
>
> ATS is started: The memory information is..
> [root@mtanjv8cdnc73 trafficserver]# pmap -x 113330 | grep total
> total kB 1416092 670256 663736
>
> After 1st config reload:
> [root@mtanjv8cdnc73 trafficserver]# pmap -x 113330 | grep total
> total kB 1932660 1128084 1121544
>
> After 2nd config reload: It failed with error “not enough memory” and
> memory status as..
> [root@mtanjv8cdnc73 trafficserver]# pmap -x 113330 | grep total
> total kB 2170756 1167808 1160836
>
> Error displayed in diags.log:
> ===
> [Mar 8 23:27:27.580] Server {0x2af92498b700} WARNING: Failed to create new instance for plugin /opt/trafficserver/libexec/trafficserver/tslua.so (not a TS_SUCCESS return)
> [Mar 8 23:27:27.580] Server {0x2af92498b700} WARNING: Could not add rule at line #3; Aborting!
> [Mar 8 23:27:27.580] Server {0x2af92498b700} WARNING: [ReverseProxy] Can't create new remap instance for plugin "/opt/trafficserver/libexec/trafficserver/tslua.so" - [ts_lua_add_module] luaL_loadfile /opt/trafficserver/etc/trafficserver/lua/process_remap.lua failed: not enough memory at line 3
> [Mar 8 23:27:27.580] Server {0x2af92498b700} WARNING: something failed during BuildTable() -- check your remap plugins!
> [Mar 8 23:27:27.595] Server {0x2af92498b700} WARNING: failed to reload remap.config, not replacing!
>
> Lua VM memory size at that time, ts.debug(FUNCTION..'Lua VM memory: '..collectgarbage("count"))
> [Mar 8 23:27:27.579] Server {0x2af92498b700} DIAG: (ts_lua) __init__(): Lua VM memory: 3629.7060546875
> This shows that Lua VMs are hitting the max capacity.
>
> Solution:
> ===
> I looked at the ts_lua code TSRemapDeleteInstance() [ts_lua.c] and
> ts_lua_del_module() [ts_lua_util.c] which does cleaning of the lua memory for
> the instance. However the lua memory is not released and reused.
> So, I have added code to start the garbage collector in ts_lua_del_module().
>
> int
> ts_lua_del_module(ts_lua_instance_conf *conf, ts_lua_main_ctx *arr, int n)
> {
> ….
>     lua_newtable(L);
>     lua_replace(L, LUA_GLOBALSINDEX); /* L[GLOBAL] = EMPTY */
>     lua_gc(L, LUA_GCCOLLECT, 0);
>     TSMutexUnlock(arr[i].mutexp);
>   }
>   return 0;
> }
>
> This has improved the situation. However, I also added garbage collection in
> ts_lua_add_module() at the end. With these two additions, we have tested the
> code, the memory utilization is stable and we could do config reload at least
> 100 times with the background load.
[jira] [Created] (TS-4252) background_fetch.so segfaults with no arguments as a global plugin.
Peter Chou created TS-4252: -- Summary: background_fetch.so segfaults with no arguments as a global plugin. Key: TS-4252 URL: https://issues.apache.org/jira/browse/TS-4252 Project: Traffic Server Issue Type: Bug Components: Plugins Reporter: Peter Chou In "plugin.config" if we just do background_fetch.so with no arguments we get a segmentation fault with messages saying invalid option with argument text from previous lines in the configuration file. If I just add a garbage argument like "background_fetch.so bleah" there is no fault. I noticed that this plugin initialized optind to 1 before calling getopt_long while others such as tcpinfo set it to 0. Setting it to 0 also prevents the fault. Is this the correct fix? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-4134) Traffic Manager aborts on attempted privilege escalation when non-root.
[ https://issues.apache.org/jira/browse/TS-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102336#comment-15102336 ] Peter Chou commented on TS-4134: Alan, the problem was evident only when running as a non-root user. I applied your patch and it seems to be working fine now. It also seemed to fix another issue where running 'trafficserver start' would only start traffic_cop and subsequently traffic_manager would have to be started manually and separately. Thanks for explaining about the scoping/destructor/auto-de-elevate behavior and for the quick response. > Traffic Manager aborts on attempted privilege escalation when non-root. > --- > > Key: TS-4134 > URL: https://issues.apache.org/jira/browse/TS-4134 > Project: Traffic Server > Issue Type: Bug >Affects Versions: 6.2.0 >Reporter: Peter Chou >Assignee: Alan M. Carroll > Fix For: 6.1.0 > > > Traffic Manager aborts since it cannot elevate access in mgmt/Rollback.cc and > mgmt/LocalManager.cc. The root of the issue might be that the semantics of > the ElevateAccess constructor argument was changed from (boolean,level) to > just a (level) by commit 6a5f6241 or TS-306. It seems the ElevateAccess > access( ) calls in these two files were not changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-4134) Traffic Manager aborts on attempted privilege escalation when non-root.
[ https://issues.apache.org/jira/browse/TS-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099189#comment-15099189 ] Peter Chou commented on TS-4134: FYI, from grepping for 'ElevateAccess ' there are around ten instances of the constructor being called that may need to be reviewed. Something like - ElevateAccess access(root_access_needed); - would need to be wrapped in a conditional instead like - if (root_access_needed) { ElevateAccess access; } - and so on. > Traffic Manager aborts on attempted privilege escalation when non-root. > --- > > Key: TS-4134 > URL: https://issues.apache.org/jira/browse/TS-4134 > Project: Traffic Server > Issue Type: Bug >Affects Versions: 6.2.0 >Reporter: Peter Chou > Fix For: 6.1.0 > > > Traffic Manager aborts since it cannot elevate access in mgmt/Rollback.cc and > mgmt/LocalManager.cc. The root of the issue might be that the semantics of > the ElevateAccess constructor argument was changed from (boolean,level) to > just a (level) by commit 6a5f6241 or TS-306. It seems the ElevateAccess > access( ) calls in these two files were not changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
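One subtlety with the conditional wrapping suggested in the comment above is scope: a guard object declared inside the braces of an `if` is destroyed at the closing brace, so any de-elevation done in its destructor happens before the privileged work runs. The sketch below illustrates this with a stand-in class. ScopedElevate and the counter are hypothetical and only mimic the constructor/destructor shape of an RAII guard; this is not the ElevateAccess implementation, nor a statement of how the issue was ultimately fixed.

```cpp
#include <memory>

// Stand-in for a privilege level; a real guard would change process
// credentials or capabilities instead of bumping a counter.
static int g_elevation_depth = 0;

// Hypothetical RAII guard: "elevate" in the constructor, undo it in the
// destructor.
class ScopedElevate
{
public:
  ScopedElevate() { ++g_elevation_depth; }
  ~ScopedElevate() { --g_elevation_depth; }
  ScopedElevate(const ScopedElevate &) = delete;
  ScopedElevate &operator=(const ScopedElevate &) = delete;
};

// Pitfall: the guard dies at the closing brace of the `if`, so the code
// after it runs without elevation. Returns the depth seen by that code.
int
depth_with_guard_inside_if(bool root_access_needed)
{
  if (root_access_needed) {
    ScopedElevate access;
  }
  return g_elevation_depth; // back to the caller's depth already
}

// Alternative: hold the guard conditionally in a smart pointer declared at
// function scope, so it lives until the function returns.
int
depth_with_conditional_guard(bool root_access_needed)
{
  std::unique_ptr<ScopedElevate> access;
  if (root_access_needed) {
    access.reset(new ScopedElevate);
  }
  return g_elevation_depth; // one deeper when elevation was requested
}
```

Starting from a depth of 0, depth_with_guard_inside_if(true) reports 0 while depth_with_conditional_guard(true) reports 1, and in both cases the depth is back to 0 once the function has returned.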
[jira] [Created] (TS-4134) Traffic Manager aborts on attempted privilege escalation when non-root.
Peter Chou created TS-4134: -- Summary: Traffic Manager aborts on attempted privilege escalation when non-root. Key: TS-4134 URL: https://issues.apache.org/jira/browse/TS-4134 Project: Traffic Server Issue Type: Bug Reporter: Peter Chou Traffic Manager aborts since it cannot elevate access in mgmt/Rollback.cc and mgmt/LocalManager.cc. The root of the issue might be that the semantics of the ElevateAccess constructor argument was changed from (boolean,level) to just a (level) by commit 6a5f6241 or TS-306. It seems the ElevateAccess access( ) calls in these two files were not changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TS-4134) Traffic Manager aborts on attempted privilege escalation when non-root.
[ https://issues.apache.org/jira/browse/TS-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Chou updated TS-4134: --- Affects Version/s: 6.2.0 > Traffic Manager aborts on attempted privilege escalation when non-root. > --- > > Key: TS-4134 > URL: https://issues.apache.org/jira/browse/TS-4134 > Project: Traffic Server > Issue Type: Bug >Affects Versions: 6.2.0 >Reporter: Peter Chou > > Traffic Manager aborts since it cannot elevate access in mgmt/Rollback.cc and > mgmt/LocalManager.cc. The root of the issue might be that the semantics of > the ElevateAccess constructor argument was changed from (boolean,level) to > just a (level) by commit 6a5f6241 or TS-306. It seems the ElevateAccess > access( ) calls in these two files were not changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TS-4021) Lua Plugin - Expose API Call TSHttpTxnFollowRedirect()
[ https://issues.apache.org/jira/browse/TS-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004713#comment-15004713 ]

Peter Chou commented on TS-4021:

Kit, thanks for volunteering to take a look at the proposed changes. I did basic testing under Ubuntu Linux 32-bit for this change.

* First, we left "CONFIG proxy.config.http.redirection_enabled" out of records.config so it stayed as default of 0 or disabled globally.
* Second, in remap.config, we added "map ... @plugin=tslua.so @pparam=.../my.lua".
* Third, in my.lua we added --

function do_remap()
  ts.http.enable_redirect(1)
  return 0
end

> Lua Plugin - Expose API Call TSHttpTxnFollowRedirect()
> --
>
>                 Key: TS-4021
>                 URL: https://issues.apache.org/jira/browse/TS-4021
>             Project: Traffic Server
>          Issue Type: New Feature
>          Components: Lua, Plugins
>    Affects Versions: 6.1.0
>            Reporter: Jeremy Payne
>            Assignee: Kit Chan
>            Priority: Minor
>             Fix For: 6.1.0
>
> Instead of relying on a config override, this plugin 'new feature' would
> allow for enabling origin server redirection on the fly; via a direct API
> call.
[jira] [Commented] (TS-3932) TCP TOS not working
[ https://issues.apache.org/jira/browse/TS-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965741#comment-14965741 ]

Peter Chou commented on TS-3932:

Hi. I confirmed that this behavior is not present in 5.3.0, but it is present in later releases starting with 5.3.1. I traced the relevant code change to iocore/net/UnixNetProcessor.cc:455 where THREAD_ALLOC in 5.3.0 was changed to THREAD_ALLOC_INIT in 5.3.1. However, I think that the actual root problem also exists in 5.3.0, but it was just being masked by this difference in the thread allocation call.

I believe the root problem is in iocore/net/UnixConnection.cc:303 (Connection::open()) where the apply_options() call to set the TOS bits is made before addr is assigned a valid value in Connection::connect() later on. The addr must be valid in order for the addr.isIp4() check used in apply_options() to work. Under 5.3.1, addr is uninitialized on all passes through Connection::open() while in 5.3.0 it is uninitialized for the first pass but perhaps contains remnant values on subsequent passes.

Questions:
1. Thoughts on moving apply_options() from Connection::open() to Connection::connect() (after the addr is set by setRemote(target))?
2. We are looking for a fix in the 5.3.x branch so should I open a separate issue?

> TCP TOS not working
> ---
>
>                 Key: TS-3932
>                 URL: https://issues.apache.org/jira/browse/TS-3932
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Network
>    Affects Versions: 6.0.0
>            Reporter: Bryan Call
>              Labels: Regression
>             Fix For: 6.1.0
>
> jasonstrongman2016:
> In 5.3.0 the below works. However, seems to be broken in other
> releases. Including this one.
> # /opt/trafficserver60rc3/bin/traffic_ctl -V
> Apache Traffic Server - traffic_ctl - 6.0.0 - (build # 091616 on Sep 16 2015 at 16:49:13)
> # /opt/trafficserver60rc3/bin/traffic_ctl config match sock_packet_tos_out
> proxy.config.net.sock_packet_tos_out: 184
> #tcpdump
> 17:55:07.377780 IP (tos 0x0, ttl 64, id 45468, offset 0, flags [DF], proto TCP (6), length 60)
>     10.0.0.71.51306 > 74.125.227.196.80: Flags [S],
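The ordering problem suspected in the comment above, choosing the TOS socket option from an address whose family has not been set yet, can be illustrated with plain POSIX sockets. This is a sketch under assumptions, not the ATS apply_options() code: the struct and function names are invented, and the use of IPV6_TCLASS for IPv6 is simply the usual Linux counterpart of IP_TOS rather than something stated in the report.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>

// Which setsockopt() level/name pair marks outgoing packets for a family.
struct TosSockopt {
  bool valid;  // false when the address family is not (yet) known
  int level;
  int optname;
};

// Chooses the socket option from the address family. If the family is still
// unset (for example AF_UNSPEC because the remote address has not been
// assigned), no option can be chosen and the TOS value is never applied.
TosSockopt
tos_sockopt_for_family(int family)
{
  switch (family) {
  case AF_INET:
    return {true, IPPROTO_IP, IP_TOS};
  case AF_INET6:
    return {true, IPPROTO_IPV6, IPV6_TCLASS};
  default:
    return {false, 0, 0};
  }
}

// Applies `tos` to socket `fd`. Returns 0 on success, -1 on failure or when
// the family is unknown. Meant to be called after the address is assigned.
int
apply_tos(int fd, int family, int tos)
{
  TosSockopt opt = tos_sockopt_for_family(family);
  if (!opt.valid) {
    return -1;
  }
  return setsockopt(fd, opt.level, opt.optname, &tos, sizeof(tos));
}
```

Under these assumptions, calling apply_tos(fd, AF_UNSPEC, 184) before the address is known does nothing, which would be consistent with the "tos 0x0" seen in the tcpdump output, whereas calling it with AF_INET once the remote address is set would attempt to mark the socket with 184 (0xb8).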