Re: [RFC] Tokenizer API
On 10/12/2013 9:38 p.m., Robert Collins wrote:
> On 10 December 2013 19:13, Amos Jeffries <squ...@treenet.co.nz> wrote:
>> The problem with comparing input strings to a SBuf of characters is
>> that parsing an input of length N against a charset of size M takes
>> O(N*M) time.
>
> Huh? There are linear time parsers with PEGs. Or maybe I don't
> understand one of your preconditions to come up with an N*M complexity
> here.

The brute-force way is to scan each of the N positions for the M delimiters individually -- SBuf uses while(0..N) {memchr(M)} ... O(N*M).

PS. I suspect the linear time parsers you know of either convert the PEG into a bool-array first before doing a linear-time scan with it, or use multiple CPU threads to do a parallel linear scan on a small M.

Amos
Re: [RFC] Tokenizer API
I'd need to check.

To align the others to our IRC discussion: I'm fine with the design Alex and you are suggesting. My only suggestions are to have remaining() return SBuf instead of const SBuf &, and to have a few predefined CharacterSets.

On Tue, Dec 10, 2013 at 10:52 AM, Amos Jeffries <squ...@treenet.co.nz> wrote:
> On 10/12/2013 9:38 p.m., Robert Collins wrote:
>> On 10 December 2013 19:13, Amos Jeffries <squ...@treenet.co.nz> wrote:
>>> The problem with comparing input strings to a SBuf of characters is
>>> that parsing an input of length N against a charset of size M takes
>>> O(N*M) time.
>>
>> Huh? There are linear time parsers with PEGs. Or maybe I don't
>> understand one of your preconditions to come up with an N*M complexity
>> here.
>
> The brute-force way is to scan each of the N positions for the M
> delimiters individually -- SBuf uses while(0..N) {memchr(M)} ... O(N*M).
>
> PS. I suspect the linear time parsers you know of either convert the PEG
> into a bool-array first before doing a linear-time scan with it, or use
> multiple CPU threads to do a parallel linear scan on a small M.
>
> Amos

--
/kinkie
Re: [RFC] Tokenizer API
On 10 December 2013 19:13, Amos Jeffries <squ...@treenet.co.nz> wrote:
> The problem with comparing input strings to a SBuf of characters is
> that parsing an input of length N against a charset of size M takes
> O(N*M) time.

Huh? There are linear time parsers with PEGs. Or maybe I don't understand one of your preconditions to come up with an N*M complexity here.

-Rob

--
Robert Collins <rbtcoll...@hp.com>
Distinguished Technologist
HP Converged Cloud
Re: [RFC] Tokenizer API
On 12/09/2013 10:46 PM, Francesco Chemolli wrote:
> My suggestion is to have CharacterSet be a SBuf and rely on them, at
> least for now. In any case having them be a SBuf promotes better
> interface decoupling and abstraction.

CharacterSet is a set of characters, which is semantically very different from a string. It would be overall wrong to abuse SBuf for that purpose. It should either start simple (as I suggested) or be immediately optimized into a dedicated array-based class (as Amos proposed).

> Oh, one more argument for having the low-level matching primitives in
> SBuf: it's a pet peeve of mine to use some form of compact tries and/or
> FSM to do single-pass low-level string matching in SBuf.

The purpose of Tokenizer is to help transform input into tokens using simple rules. There is absolutely nothing in that design that prevents someone from adding more matching methods to SBuf or optimizing existing ones. Tokenizer can call those new/optimized methods as needed. That does not lead to code duplication or even performance overheads.

On the other hand, it would be very wrong for the caller to use raw SBuf methods to tokenize input. Tokenization is Tokenizer's job, not SBuf's job. Please let me know if I should give more arguments explaining the difference and supporting this design.

> SBuf was not really designed to be passed by nonconst reference.

I do not think that is a valid claim. Nearly any C++ object can be passed by reference. There is nothing wrong with that as such, it is often a good idea, and no general-purpose design can (or should) prohibit that. The const part of the API (or lack thereof) is just an indication that the method argument cannot (or can) be modified.

>> const SBuf &remaining() const
>
> I'd change the signature to SBuf remaining() const -- copying a SBuf is
> easy, returning one puts a lower requirement on the caller and is less
> constrained.

Copying SBuf is easy but relatively expensive compared to not copying.
I am not sure GCC would optimize that copy away when it can be. Here, I am talking about calling various refcounting methods, not copying of data buffers, of course. The reference-based method would not allow the caller to modify the return value directly, but can you think of a common use pattern or two where such on-the-fly modification would actually be needed or desired? Please note that we can change this aspect of the interface later if needed, without any need to change any of the callers (AFAICT).

> I'd also add to the interface a few constants to describe common
> character sets such as ALPHA, ALNUM, LOWERALPHA, UPPERALPHA etc. (I'd
> use the predefined character classes from grep(1) as a reference for
> common patterns).

Yes, the proposed design will work well only if we predefine virtually all character sets. There is no need for a new interface for that though. Just declare global constant CharacterSet objects (or functions creating/returning them). However, I recommend against adding a lot of such constants until they are needed by a specific caller. Why waste 256 bytes? It is easy to add more as needed.

I would recommend adding a macro to create a CharacterSet from one of the isalpha(3) isfoo() functions/macros, but I am worried that those are locale-dependent while almost everything in Squid should not be. We may be better off hard-coding any protocol-defined character sets instead.

Thank you,

Alex.
Re: [PATCH] annotation support for external ACL helpers
Now I just had a small hole in time to look. The mentioned logic seems pretty reasonable to me.

Eliezer

On 23/11/13 13:02, Amos Jeffries wrote:
>          entryData.tag = label;
> @@ -1603,6 +1605,18 @@
>  {
>      ACLFilledChecklist *checklist = Filled(static_cast<ACLChecklist*>(data));
>      checklist->extacl_entry = cbdataReference((external_acl_entry *)result);
> +
> +    // attach the helper kv-pair to the transaction
> +    if (HttpRequest *req = checklist->request) {
> +        // XXX: we have no access to the transaction / AccessLogEntry so cant SyncNotes().
> +        //      workaround by using anything already set in HttpRequest
> +        //      OR use new and rely on a later Sync copying these to AccessLogEntry
> +        if (!req->notes)
> +            req->notes = new NotePairs;
> +
> +        req->notes->appendNewOnly(checklist->extacl_entry->notes);
> +    }
> +
>      checklist->resumeNonBlockingCheck(ExternalACLLookup::Instance());
>  }
Re: [RFC] Tokenizer API
On 12/09/2013 04:13 PM, Amos Jeffries wrote:
> Two requests for additional scope:
> * can we place this in a separate src/parse/ library please?
>   - we have other generic parse code that deserves to all be bundled up
>     together instead of spread out. Might as well start that collection
>     process now.

No objections.

> * Lets do the charset boolean array earlier rather than later.

No objections. I hope I will not be the one implementing the proposed API or the above additions though. :-)

> CharacterSet(const char * const c, size_t len)

You do not want the len argument. These character sets will nearly always be initialized once, from constant hard-coded c-strings. For esoteric cases, I would add an add(const char c) method to add a single character to the set (and use the merge operation to produce more elaborate sets as needed; see below for a sketch).

It is a good idea to add a public label/name argument/member though, for debugging/triaging:

    CharacterSet(const char *label, const char *characters);
    ...
    const char *name; ///< a label for debugging

> /// add all characters from the given CharacterSet to this one
> void merge(const CharacterSet &src)

Please provide a '+=' operator [that will use merge()]. Here is how I envision some sets might be created:

    const CharacterSet &MySpaces()
    {
        static CharacterSet cs;
        if (!cs.name) {
            cs = CharacterSet("MySpaces", "\n\t "); // RFC 12345
            cs += OtherProtocolSpaces();
            if (Config.weNeedToBeSpecial)
                cs.add('\r');
        }
        return cs;
    }

> private:
> bool match_[256];

I suggest using std::vector<bool> to avoid sprinkling the code with 256 constants or adding another member to store the hard-coded length. I would also call this member chars_ instead of match_, because some sets will be used in a negative sense and "match" will be a little confusing in that negative context.

> NP: most of the time we will be wanting to define these CharacterSet as
> global once-off objects.
> So I'm not sure if the merge() method is useful, but shown here for
> completeness in case we want it for generating composite character sets.

Agreed on all counts.

Cheers,

Alex.
Re: [RFC] Tokenizer API
On Tue, Dec 10, 2013 at 5:06 PM, Alex Rousskov <rouss...@measurement-factory.com> wrote:
> On 12/09/2013 04:13 PM, Amos Jeffries wrote:
>> Two requests for additional scope:
>> * can we place this in a separate src/parse/ library please?
>>   - we have other generic parse code that deserves to all be bundled up
>>     together instead of spread out. Might as well start that collection
>>     process now.
>
> No objections.
>
>> * Lets do the charset boolean array earlier rather than later.
>
> No objections. I hope I will not be the one implementing the proposed
> API or the above additions though. :-)

I can do it. Worst-case there will be more audit rounds than strictly needed.

--
/kinkie
[PATCH] sslcrtvalidator_children concurrency option default value
Hi all,

Currently we have the following situation for the sslcrtvalidator_children configuration option, which may be confusing people:
1) The testing sslcrtvalidator helper supports concurrency.
2) The default concurrency, if sslcrtvalidator_children is not set, is concurrency=0.
3) The default sslcrtvalidator_children line includes concurrency=1.
4) The documentation says: "Defaults to 0 which indicates the certficate validator is a old-style single threaded redirector."

This is confusing. I am attaching a simple patch which sets the concurrency option for sslcrtvalidator_children to 1 by default. I believe that this is best because the testing helper we provide supports concurrency by default.

Other options are:
a) set the default sslcrtvalidator_children line to use concurrency=0
b) better document the current behaviour in cf.data.pre

Opinions?

=== modified file 'src/cf.data.pre'
--- src/cf.data.pre	2013-12-05 11:04:45 +
+++ src/cf.data.pre	2013-12-10 20:02:47 +
@@ -2655,42 +2655,42 @@
 	startup=N

 	Sets the minimum number of processes to spawn when Squid
 	starts or reconfigures. When set to zero the first request will
 	cause spawning of the first child process to handle it.

 	Starting too few children temporary slows Squid under load while it
 	tries to spawn enough additional processes to cope with traffic.

 	idle=N

 	Sets a minimum of how many processes Squid is to try and keep available
 	at all times. When traffic begins to rise above what the existing
 	processes can handle this many more will be spawned up to the
 	maximum configured. A minimum setting of 1 is required.

 	concurrency=

 	The number of requests each certificate validator helper can handle in
-	parallel. Defaults to 0 which indicates the certficate validator
-	is a old-style single threaded redirector.
+	parallel. A value of 0 indicates the certficate validator is an
+	old-style single threaded redirector. Defaults to 1.
 	When this directive is set to a value >= 1 then the protocol
 	used to communicate with the helper is modified to include
 	a request ID in front of the request/response. The request
 	ID from the request must be echoed back with the response
 	to that request.

 	You must have at least one ssl_crt_validator process.
DOC_END

COMMENT_START
 OPTIONS WHICH AFFECT THE NEIGHBOR SELECTION ALGORITHM
 -----------------------------------------------------------------------------
COMMENT_END

NAME: cache_peer
TYPE: peer
DEFAULT: none
LOC: Config.peers
DOC_START

=== modified file 'src/ssl/Config.cc'
--- src/ssl/Config.cc	2013-02-03 10:45:53 +
+++ src/ssl/Config.cc	2013-12-10 19:59:30 +
@@ -1,12 +1,21 @@
 #include "squid.h"
 #include "ssl/Config.h"

 Ssl::Config Ssl::TheConfig;

+Ssl::Config::Config():
+#if USE_SSL_CRTD
+    ssl_crtd(NULL),
+#endif
+    ssl_crt_validator(NULL)
+{
+    ssl_crt_validator_Children.concurrency = 1;
+}
+
 Ssl::Config::~Config()
 {
 #if USE_SSL_CRTD
     xfree(ssl_crtd);
 #endif
     xfree(ssl_crt_validator);
 }

=== modified file 'src/ssl/Config.h'
--- src/ssl/Config.h	2013-02-03 10:45:53 +
+++ src/ssl/Config.h	2013-12-10 19:58:13 +
@@ -1,34 +1,29 @@
 #ifndef SQUID_SSL_CONFIG_H
 #define SQUID_SSL_CONFIG_H

 #include "HelperChildConfig.h"

 namespace Ssl
 {

 class Config
 {
 public:
 #if USE_SSL_CRTD
     char *ssl_crtd; ///< Name of external ssl_crtd application.
     /// The number of processes spawn for ssl_crtd.
     HelperChildConfig ssl_crtdChildren;
 #endif
     char *ssl_crt_validator;
     HelperChildConfig ssl_crt_validator_Children;
-    Config():
-#if USE_SSL_CRTD
-        ssl_crtd(NULL),
-#endif
-        ssl_crt_validator(NULL) {}
-
+    Config();
     ~Config();
 private:
     Config(const Config &); // not implemented
     Config &operator =(const Config &); // not implemented
 };

 extern Config TheConfig;

 } // namespace Ssl

 #endif
Re: [PATCH] sslcrtvalidator_children concurrency option default value
On 2013-12-11 10:46, Tsantilas Christos wrote:
> Hi all,
> Currently we have the following situation for the sslcrtvalidator_children
> configuration option, which may be confusing people:
> 1) The testing sslcrtvalidator helper supports concurrency.
> 2) The default concurrency, if sslcrtvalidator_children is not set, is
>    concurrency=0.
> 3) The default sslcrtvalidator_children line includes concurrency=1.
> 4) The documentation says: "Defaults to 0 which indicates the certficate
>    validator is a old-style single threaded redirector."
>
> This is confusing. I am attaching a simple patch which sets the
> concurrency option for sslcrtvalidator_children to 1 by default. I
> believe that this is best because the testing helper we provide supports
> concurrency by default.
>
> Other options are:
> a) set the default sslcrtvalidator_children line to use concurrency=0
> b) better document the current behaviour in cf.data.pre
>
> Opinions?

+1. Looks good. But can you change the new documentation line to avoid calling it a "redirector" and "single-threaded"?

"A value of 0 indicates the certificate validator does not support concurrency."

should be fine.

Amos
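For context, here is what the two behaviours look like as squid.conf lines after such a change (the numeric children/startup/idle values below are purely illustrative, not taken from the thread):

```
# With the patched default, omitting concurrency= behaves as if you wrote:
sslcrtvalidator_children 5 startup=1 idle=1 concurrency=1

# Admins running an old-style validator that does not understand the
# request-ID protocol must now request the legacy behaviour explicitly:
sslcrtvalidator_children 5 startup=1 idle=1 concurrency=0
```

With concurrency >= 1, each request/response on the helper channel carries a request ID prefix, as described in the cf.data.pre text quoted in the patch.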
[PATCH] Destroy ACLs properly, take2
On 12/01/2013 05:43 PM, Alex Rousskov wrote:
> The attached patch destroys ACLs in the reverse order of creation to
> avoid destruction segfaults during reconfiguration.

Done as trunk r13165.

Sorry, that was not enough. I somehow missed an obvious use case that the committed fix does not cover, despite specifically trying to imagine it during testing: It is actually possible for a group ACL G to use an ACL A1 that was created before G and an ACL A2 that was created after G:

    acl A1 src 127.0.0.1
    acl G all-of A1
    acl A2 src 127.0.0.2
    acl G all-of A2

Such an order of ACL creation makes any creation-based destruction order invalid: Whether we use FIFO (as before) or LIFO (as after r13165), G's destructor may dump core when deleting either A1 or A2, because one of them will have already been deleted earlier via Config.aclList.

The attached patch fixes the problem differently. Instead of using Config.aclList to register and destroy explicit ACLs and then relying on the InnerNode destructor to destroy its sometimes-explicit children, the take2 patch dedicates a new container to register and destroy both implicit and explicit ACLs in one sweep. The InnerNode destructor is gone now, with aclDestroyAcls() replacing its functionality. The Config.aclList global is still used for searching explicit ACLs by name -- no changes there.

Refcounting remains the preferred long-term solution, of course. I have once again reviewed the changes required to make that happen, and can confirm that they are too big for me to volunteer for that project right now. Volunteers welcome.

Thank you,

Alex.

P.S. I left the registration order as in r13165 because it is much faster (it eliminates one linear search through all ACLs when adding a new ACL). That can be reverted if we find a reason to search starting with older ACLs. However, if that search order is important, we should replace the Config.aclList linked list with a container ordered by ACL name, to eliminate another case of a linear search.
Centrally destroy all explicit and implicit ACLs to avoid destruction segfaults during reconfiguration.

Group ACLs created later may use other ACLs created earlier and, vice versa, a group ACL created earlier may use other ACLs created later. The latter is possible when an ACL (e.g., A2 below) is declared when the group already exists:

    acl A1 src 127.0.0.1
    acl Group all-of A1
    acl A2 src 127.0.0.2
    acl Group all-of A2

Thus, the group (i.e., InnerNode) ACL destructor may access already-deleted children regardless of the global ACL deletion order (FIFO or LIFO with respect to ACL creation).

Instead of relying on the deletion order to protect InnerNode, we remove the InnerNode ACL destructor completely and rely on a global set of registered ACLs to destroy all ACLs. The old code was destroying all explicit ACLs in the same centralized fashion. We now add implicit ACLs (commonly used by InnerNodes) to the centralized destruction sequence.

We added a new destruction-dedicated container to avoid messing with the by-name ACL search that the Config.aclList global is used for. This new container will become unnecessary once we start refcounting ACLs.

=== modified file 'src/acl/Acl.cc'
--- src/acl/Acl.cc	2013-12-02 00:35:50 +
+++ src/acl/Acl.cc	2013-12-10 23:48:56 +
@@ -14,40 +14,41 @@
  * sources; see the CREDITS file for full details.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
  * (at your option) any later version.
  *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  * GNU General Public License for more details.
  *
  * You should have received a copy of the GNU General Public License
  * along with this program; if not, write to the Free Software
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111, USA.
  *
  */

 #include "squid.h"
 #include "acl/Acl.h"
 #include "acl/Checklist.h"
+#include "acl/Gadgets.h"
 #include "anyp/PortCfg.h"
 #include "cache_cf.h"
 #include "ConfigParser.h"
 #include "Debug.h"
 #include "dlink.h"
 #include "globals.h"
 #include "profiler/Profiler.h"
 #include "SquidConfig.h"

 const ACLFlag ACLFlags::NoFlags[1] = {ACL_F_END};

 const char *AclMatchedName = NULL;

 bool
 ACLFlags::supported(const ACLFlag f) const
 {
     if (f == ACL_F_REGEX_CASE)
         return true;
     return (supported_.find(f) != std::string::npos);
 }

@@ -278,46 +279,47 @@ ACL::ParseAclLine(ConfigParser &parser,
     /* split the function here */
     A->parse();

     /*
      * Clear AclMatchedName from our temporary hack
      */
     AclMatchedName = NULL;	/* ugly */

     if (!new_acl)
         return;

     if (A->empty()) {
         debugs(28, DBG_CRITICAL, "Warning: empty ACL: " << A->cfgline);
     }