Re: [PATCH] Support client connection annotation by helpers via clt_conn_id=ID

2014-06-14 Thread Amos Jeffries
On 15/06/2014 4:58 a.m., Alex Rousskov wrote:
> On 06/11/2014 08:52 AM, Tsantilas Christos wrote:
> 
>> I must also note that this patch adds an inconsistency. All annotation
>> key=values  pairs received from helpers, accumulated to the existing key
>> notes values. The clt_conn_id=Id pair is always unique and replaces the
>> existing clt_conn_id=Id annotation pair.
>> We may want to make all annotations unique, or maybe implement a
>> configuration mechanism to define which annotations are overwriting
>> their previous values and which appending the new values.
> 
> I suggest making all annotations unique (i.e., new values overwrite old
> ones) because helpers that want to accumulate annotation values can do
> that by returning old values along with new ones:
> 
>   received by helper: name=v1
>   returned by helper: name=v1,v2
> 
> Please note that the opposite does not work: If annotation values are
> always accumulated, a helper cannot overwrite/remove the old value.
> 

Doing that would mean passing all existing annotations to every helper
lookup. Which would seriously degrade the helper result caching.

Amos


[PATCH] validate -n parameter value

2014-06-14 Thread Amos Jeffries
Update the service_name global to an SBuf and use Tokenizer to validate
the -n argument is a pure alphanumeric.

Amos
=== modified file 'src/WinSvc.cc'
--- src/WinSvc.cc   2014-01-24 01:57:15 +
+++ src/WinSvc.cc   2014-06-14 14:20:56 +
@@ -493,41 +493,41 @@
 
 #if defined(_MSC_VER) /* Microsoft C Compiler ONLY */
 
 newHandler = Squid_Win32InvalidParameterHandler;
 
 oldHandler = _set_invalid_parameter_handler(newHandler);
 
 _CrtSetReportMode(_CRT_ASSERT, 0);
 
 #endif
 #if USE_WIN32_SERVICE
 
 if (WIN32_run_mode == _WIN_SQUID_RUN_MODE_SERVICE) {
 char path[512];
 HKEY hndKey;
 
 if (signal(SIGABRT, WIN32_Abort) == SIG_ERR)
 return 1;
 
 /* Register the service Handler function */
-svcHandle = RegisterServiceCtrlHandler(service_name, WIN32_svcHandler);
+svcHandle = RegisterServiceCtrlHandler(service_name.c_str(), 
WIN32_svcHandler);
 
 if (svcHandle == 0)
 return 1;
 
 /* Set Process work dir to directory cointaining squid.exe */
 GetModuleFileName(NULL, path, 512);
 
 WIN32_module_name=xstrdup(path);
 
 path[strlen(path) - 10] = '\0';
 
 if (SetCurrentDirectory(path) == 0)
 return 1;
 
 safe_free(ConfigFile);
 
 /* get config file from Windows Registry */
 if (RegOpenKeyEx(HKEY_LOCAL_MACHINE, REGKEY, 0, KEY_QUERY_VALUE, 
&hndKey) == ERROR_SUCCESS) {
 DWORD Type = 0;
 DWORD Size = 0;
@@ -654,192 +654,195 @@
 status = GetLastError();
 debugs(1, DBG_IMPORTANT, "SetServiceStatus error " << status);
 }
 
 debugs(1, DBG_IMPORTANT, "Leaving Squid service");
 break;
 
 default:
 debugs(1, DBG_IMPORTANT, "Unrecognized opcode " << Opcode);
 }
 
 return;
 }
 
 void
 WIN32_RemoveService()
 {
 SC_HANDLE schService;
 SC_HANDLE schSCManager;
 
-if (!service_name)
-service_name = xstrdup(APP_SHORTNAME);
+if (service_name.isEmpty())
+service_name = SBuf(APP_SHORTNAME);
 
-strcat(REGKEY, service_name);
+const char *service =  service_name.c_str();
+strcat(REGKEY, service);
 
-keys[4] = service_name;
+keys[4] = service;
 
 schSCManager = OpenSCManager(NULL, /* machine (NULL == local)*/
  NULL, /* database (NULL == 
default) */
  SC_MANAGER_ALL_ACCESS /* access required  
  */
 );
 
 if (!schSCManager)
 fprintf(stderr, "OpenSCManager failed\n");
 else {
-schService = OpenService(schSCManager, service_name, 
SERVICE_ALL_ACCESS);
+schService = OpenService(schSCManager, service, SERVICE_ALL_ACCESS);
 
 if (schService == NULL)
 fprintf(stderr, "OpenService failed\n");
 
 /* Could not open the service */
 else {
 /* try to stop the service */
 
 if (ControlService(schService, _WIN_SQUID_SERVICE_CONTROL_STOP,
&svcStatus)) {
 sleep(1);
 
 while (QueryServiceStatus(schService, &svcStatus)) {
 if (svcStatus.dwCurrentState == SERVICE_STOP_PENDING)
 sleep(1);
 else
 break;
 }
 }
 
 /* now remove the service */
 if (DeleteService(schService) == 0)
 fprintf(stderr, "DeleteService failed.\n");
 else
-printf("Service %s deleted successfully.\n", service_name);
+printf("Service " SQUIDSBUFPH " deleted successfully.\n", 
SQUIDSBUFPRINT(service_name));
 
 CloseServiceHandle(schService);
 }
 
 CloseServiceHandle(schSCManager);
 }
 }
 
 void
 WIN32_SetServiceCommandLine()
 {
-if (!service_name)
-service_name = xstrdup(APP_SHORTNAME);
+if (service_name.isEmpty())
+service_name = SBuf(APP_SHORTNAME);
 
-strcat(REGKEY, service_name);
+const char *service = servie_name.c_str();
+strcat(REGKEY, service);
 
-keys[4] = service_name;
+keys[4] = service;
 
 /* Now store the Service Command Line in the registry */
 WIN32_StoreKey(COMMANDLINE, REG_SZ, (unsigned char *) WIN32_Command_Line, 
strlen(WIN32_Command_Line) + 1);
 }
 
 void
 WIN32_InstallService()
 {
 SC_HANDLE schService;
 SC_HANDLE schSCManager;
 char ServicePath[512];
 char szPath[512];
 int lenpath;
 
-if (!service_name)
-service_name = xstrdup(APP_SHORTNAME);
+if (service_name.isEmpty())
+service_name = SBuf(APP_SHORTNAME);
 
-strcat(REGKEY, service_name);
+const char *service = service_name.c_str();
+strcat(REGKEY, service);
 
-keys[4] = service_name;
+keys[4] = service;
 
 if ((lenpath = GetModuleFileName(NULL, ServicePath, 512)) == 0) {
 

Re: [PATCH 5/8] reconfiguration leaks: objects tied to http_port

2014-06-14 Thread Amos Jeffries
On 15/06/2014 4:44 a.m., Alex Rousskov wrote:
> On 06/13/2014 07:25 PM, Amos Jeffries wrote:
>> On 14/06/2014 8:07 a.m., Alex Rousskov wrote:
>>> On 04/25/2014 02:59 AM, Amos Jeffries wrote:
 On 25/04/2014 12:55 p.m., Alex Rousskov wrote:
> Do not leak [SSL] objects tied to http_port and https_port on reconfigure.
>
> PortCfg objects were not destroyed at all (no delete call) and were
> incorrectly stored (excessive cbdata locking). This change adds
> destruction and removes excessive locking to allow the destructed
> object to be freed. It also cleans up forgotten(?) clientca and crlfile
> PortCfg members.
>
> This change fixes a serious leak but also carries an elevated risk:
> There is a lot of code throughout Squid that does not check the pointers
> to the objects that are now properly destroyed. It is possible that some
> of that code will crash some time after reconfigure. It is not possible
> to ensure that this does not happen without rewriting/fixing the
> offending code to use refcounting. Such a rewrite would be a relatively
> large change outside this patch scope. We may decide that it is better
> to leak than to take this additional risk.
>
> Alex.
>

 -0.

 I have a patch moving the SSL config options into a standalone
 ref-counted object. That can be polished up and references added to each
 ConnStateData fairly easily.
>>>
>>> Amos, what is the status of that patch? Any ETA? Do you expect your
>>> changes to be easily portable to v3.3?
>>
>> Stalled behind the larger works. If it is urgent I can did it out and
>> polish it up.
>>
>> It could be back-ported to 3.3 if you like. The design is a new
>> Ref-Countable class to hold all the SSL options (and generated state)
>> leaving just a Pointer to it in the main config class.
>>  * Ports which needed a clone operation took a copy of the pointer and
>> share the context.
>>  * client/server context initialization functions take a Pointer to the
>> class and update its state content.
> 
> Why treat SSL options differently from all other port options? Should
> not the whole PortCfg object be refcounted instead?
> 
> Please note that if you refcount the SSL options and do nothing else,
> the leak will not be fixed. The SSL options (and other things) will
> continue to leak because the leaking port structure will keep pointing
> to them. Thus, as far as I can see, the patch you described will need
> the proposed leak fix anyway.

The other options are POD-style options. They can be reset relatively
okay by the parsers memset() and are filled out by the config parse
routines.

SSL options are partially those type of options and partially SSL
library objects generated from those options. Generated options should
be in SquidConfig2 or one of the new standalone config gobals.

Also, the same set of config settings are needed by *_port, cache_peer,
and sslproxy_* directives. It makes sense to have one class type for SSL
config which is used by all three.


On the other side AnyP::PortCfgPointer already exists to reference count
the port objects. We could spend time converting the raw pointers to use
it instead so your patch here is not dangerous.

Amos


Re: [PATCH 1/8] reconfiguration leaks: implicit ACLs

2014-06-14 Thread Kinkie
I agree on the objective, and while this is not the solution it's a a
workaround; +1 but if you haven't already please add a TODO mentioning
the eventual refcount objective.



On Sat, Jun 14, 2014 at 3:21 AM, Amos Jeffries  wrote:
> On 14/06/2014 7:57 a.m., Alex Rousskov wrote:
>> On 04/25/2014 01:58 AM, Amos Jeffries wrote:
>>> On 25/04/2014 12:46 p.m., Alex Rousskov wrote:
 Do not leak implicit ACLs during reconfigure.

 Many ACLs implicitly created by Squid when grouping multiple ACLs were
 not destroyed because those implicit ACLs where not covered by the
 global ACL registry used for ACL destruction.

 See also: r13210 which did not go far enough because it incorrectly
 assumed that all InnerNode() children are aclRegister()ed and, hence,
 will be centrally freed.
>>
>>
>>> -0.
>>
>> Is this a "negative" vote from "Squid3 voting" rules point of view?
>> http://wiki.squid-cache.org/MergeProcedure#Squid3_Voting
>
> It is "I don't like it but not objecting to a commit if you do it".
>
>
>>
>>
>>> I believe we should move to reference counting ACLs instead of
>>> continuing to work around these edge cases.
>>
>> I agree that reference counting is an overall better design for ACLs, of
>> course. However, since refcounting ACLs would be a large change that
>> nobody has volunteered to implement in the foreseeable future (AFAIK), I
>> suggest that this [significant] leak fix should go in now.
>>
>> Any other votes/opinions?
>>
>>
>> Thank you,
>>
>> Alex.
>>
>



-- 
Francesco


Re: [PATCH] Support client connection annotation by helpers via clt_conn_id=ID

2014-06-14 Thread Alex Rousskov
On 06/11/2014 08:52 AM, Tsantilas Christos wrote:

> I must also note that this patch adds an inconsistency. All annotation
> key=values  pairs received from helpers, accumulated to the existing key
> notes values. The clt_conn_id=Id pair is always unique and replaces the
> existing clt_conn_id=Id annotation pair.
> We may want to make all annotations unique, or maybe implement a
> configuration mechanism to define which annotations are overwriting
> their previous values and which appending the new values.

I suggest making all annotations unique (i.e., new values overwrite old
ones) because helpers that want to accumulate annotation values can do
that by returning old values along with new ones:

  received by helper: name=v1
  returned by helper: name=v1,v2

Please note that the opposite does not work: If annotation values are
always accumulated, a helper cannot overwrite/remove the old value.


If, in the future, we discover a need to support accumulating
annotations, we can add a squid.conf option to enumerate annotation keys
that should be accumulated.

IMO, if others agree that annotations should not automatically
accumulate, that change can be done as a part of this patch (because it
needs the uniqueness exception/code anyway). If others disagree, the
patch can go in "as is" while we wait for the need to add a
configuration option to control uniqueness.


Cheers,

Alex.



Re: [PATCH 5/8] reconfiguration leaks: objects tied to http_port

2014-06-14 Thread Alex Rousskov
On 06/13/2014 07:25 PM, Amos Jeffries wrote:
> On 14/06/2014 8:07 a.m., Alex Rousskov wrote:
>> On 04/25/2014 02:59 AM, Amos Jeffries wrote:
>>> On 25/04/2014 12:55 p.m., Alex Rousskov wrote:
 Do not leak [SSL] objects tied to http_port and https_port on reconfigure.

 PortCfg objects were not destroyed at all (no delete call) and were
 incorrectly stored (excessive cbdata locking). This change adds
 destruction and removes excessive locking to allow the destructed
 object to be freed. It also cleans up forgotten(?) clientca and crlfile
 PortCfg members.

 This change fixes a serious leak but also carries an elevated risk:
 There is a lot of code throughout Squid that does not check the pointers
 to the objects that are now properly destroyed. It is possible that some
 of that code will crash some time after reconfigure. It is not possible
 to ensure that this does not happen without rewriting/fixing the
 offending code to use refcounting. Such a rewrite would be a relatively
 large change outside this patch scope. We may decide that it is better
 to leak than to take this additional risk.

 Alex.

>>>
>>> -0.
>>>
>>> I have a patch moving the SSL config options into a standalone
>>> ref-counted object. That can be polished up and references added to each
>>> ConnStateData fairly easily.
>>
>> Amos, what is the status of that patch? Any ETA? Do you expect your
>> changes to be easily portable to v3.3?
> 
> Stalled behind the larger works. If it is urgent I can did it out and
> polish it up.
> 
> It could be back-ported to 3.3 if you like. The design is a new
> Ref-Countable class to hold all the SSL options (and generated state)
> leaving just a Pointer to it in the main config class.
>  * Ports which needed a clone operation took a copy of the pointer and
> share the context.
>  * client/server context initialization functions take a Pointer to the
> class and update its state content.

Why treat SSL options differently from all other port options? Should
not the whole PortCfg object be refcounted instead?

Please note that if you refcount the SSL options and do nothing else,
the leak will not be fixed. The SSL options (and other things) will
continue to leak because the leaking port structure will keep pointing
to them. Thus, as far as I can see, the patch you described will need
the proposed leak fix anyway.


Cheers,

Alex.



Re: [code] [for discussion] map-trie

2014-06-14 Thread Alex Rousskov
On 06/14/2014 12:39 AM, Kinkie wrote:
> On Fri, Jun 13, 2014 at 4:50 PM, Alex Rousskov
>  wrote:
>> On 06/11/2014 04:43 PM, Kinkie wrote:
>>> level-compact-trie: the mean time is 11 sec; all runs take between
>>> 10.782 and 11.354 secs; 18 Mb of core used
>>
>>> full-trie: mean is 7.5 secs +- 0.2secs; 85 Mb of core.
>>
>>> splay-based: mean time is 16.3sec; all runs take between 16.193 and
>>> 16.427 secs; 14 Mb of core
>>
>> How about std::map? Have you considered using a standard class for this
>> purpose?
> 
> std::map doesn't offer efficient prefix matching, but sure, I'll try
> to prepare a test run on the same data to establish a baseline.

Is explicit support to prefix matching actually required though? I am
thinking of using std::map::lower_bound or similar to find the vicinity
of a possible prefix match. For example, when the map has a b.c domain
stored, and you are searching for a.b.c, the lower_bound method returns
a "pointer" to the stored b.c node. After that, a simple prefix test
between the two strings can return the final match/mismatch answer.

Needless to say, all domain names ought to be stored/compared in the
reverse order of their labels. For example, a.b.c is stored internally
as c.b.a. It just feels more natural for me to render it as a.b.c in a
human-oriented text like this email.

And if we need to support prefixes that do not end at domain label
boundaries (i.e., *foo.bar should match zzzfoo.bar), then we just treat
each character as a "label".


HTH,

Alex.



Re: [code] [for discussion] map-trie

2014-06-14 Thread Robert Collins
FWIW my intent with libTrie was always to head for an LPC Trie
implementation as the only implementation. This should be faster still
than the current naive trie, and faster than the std::map compression
technique you've implemented.

[ IIRC the LPC paper was

Implementing a Dynamic Compressed Trie from Stefan Nilsson, Matti
Tikkanen ]- I have the pdf floating around here somewhere, but a quick
google just hit paywalls these days :(.

-Rob

On 12 June 2014 10:43, Kinkie  wrote:
> Hi,
>   I've done some benchmarking, here are the results so far:
> The proposal I'm suggesting for dstdomain acl is at
> lp:~kinkie/squid/flexitrie . It uses the level-compact trie approach
> I've described in this thread (NOT a Patricia trie). As a comparison,
> lp:~kinkie/squid/domaindata-benchmark implements the same benchmark
> using the current splay-based implementation.
>
> I've implemented a quick-n-dirty benchmarking tool
> (src/acl/testDomainDataPerf); it takes as input an acl definition -
> one dstdomain per line, as if it was included in squid.conf, and a
> hostname list file (one destination hostname per line).
>
> I've run both variants of the code against the same dataset: a 4k
> entries domain list, containing both hostnames and domain names, and a
> 18M entries list of destination hostnames, both matching and not
> matching entries in the domain list (about 7.5M hits, 10.5M misses).
>
> Tested 10 times on a Core 2 PC with plenty of RAM - source datasets
> are in the fs cache.
> level-compact-trie: the mean time is 11 sec; all runs take between
> 10.782 and 11.354 secs; 18 Mb of core used
> full-trie: mean is 7.5 secs +- 0.2secs; 85 Mb of core.
> splay-based: mean time is 16.3sec; all runs take between 16.193 and
> 16.427 secs; 14 Mb of core
>
> I expect compact-trie to scale better as the number of entries in the
> list grows and with the number of clients and requests per second;
> furthermore using it removes 50-100 LOC, and makes code more readable.
>
> IMO it is the best compromise in terms of performance, resources
> useage and expected scalability; before pursuing this further however
> I'd like to have some feedback.
>
> Thanks
>
>
> On Wed, Jun 4, 2014 at 4:11 PM, Alex Rousskov
>  wrote:
>> On 06/04/2014 02:06 AM, Kinkie wrote:
>>
>>> there are use cases
>>> for using a Trie (e.g. prefix matching for dstdomain ACL); these may
>>> be served by other data strcutures, but none we could come up with on
>>> the spot.
>>
>> std::map and StringIdentifier are the ones we came up with on the spot.
>> Both may require some adjustments to address the use case you are after.
>>
>> Alex.
>>
>>
>>> A standard trie uses quite a lot of RAM for those use cases.
>>> There are quite a lot of areas where we can improve so this one is not
>>> urgent. Still I'd like to explore it as it's synchronous code (thus
>>> easier for me to follow) and it's a nice area to tinker with.
>>>
>>> On Tue, Jun 3, 2014 at 10:12 PM, Alex Rousskov
>>>  wrote:
 On 06/03/2014 08:40 AM, Kinkie wrote:
> Hi all,
>   as an experiment and to encourage some discussion I prepared an
> alternate implementation of TrieNode which uses a std::map instead of
> an array to store a node's children.
>
> The expected result is a worst case performance degradation on insert
> and delete from O(N) to O(N log R) where N is the length of the
> c-string being looked up, and R is the size of the alphabet (as R =
> 256, we're talking about 8x worse).
>
> The expected benefit is a noticeable reduction in terms of memory use,
> especially for sparse key-spaces; it'd be useful e.g. in some lookup
> cases.
>
> Comments?


 To evaluate these optimizations, we need to know the targeted use cases.
 Amos mentioned ESI as the primary Trie user. I am not familiar with ESI
 specifics (and would be surprised to learn you want to optimize ESI!),
 but would recommend investigating a different approach if your goal is
 to optimize search/identification of strings from a known-in-advance set.


 Cheers,

 Alex.

>>>
>>>
>>>
>>
>
>
>
> --
> Francesco



-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud