Re: Issues/questions with apr_memcache_multigetp

2015-09-24 Thread Yann Ylavic
Hi Jeffrey,

On Wed, Sep 23, 2015 at 9:23 PM, Jeffrey Crowell  wrote:
>
> We have some patches which were created in an attempt to fix some signed
> bugs in the original code that I think may be causing our issue.
>
> Namely here:
> https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1453
> vs here
> https://gist.github.com/crowell/59bfa1bb9f0cda30c48a#file-multiget-c-L192
> The original code uses an atoi(), which has undefined behavior when called
> on non integer strings, leaving len to be 0.
>
> Second, the check here
> https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1457
> vs https://gist.github.com/crowell/59bfa1bb9f0cda30c48a#file-multiget-c-L199
>
> In the original apr code, this check will never fail. len is an apr_size_t,
> and will never be < 0.

The code you refer has changed with [1] (i.e. APR-1.4.3, the current
version is 1.5.2), but still does not seem to fix the invalid/unknown
length/value/type issue, which are both unexpected errors in
apr_memcache, and as such should probably terminate the connection...

>
> The issue here now is that the check in the "server sent back a key that i
> didn't ask for"
> https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1507
> (same in both forked and upstream), apr_pollset_remove, is never called, and
> the number of queries sent is never changed.
>
> Does it seem possible that this is causing the server to spin here

Yes, in the unexpected cases mentioned above, the socket is not
"consumed" (i.e. apr_brigade_partition() is not called), which let
some immediatly poll()able data for the next loops.

> would apr be interested in accepting a patch to fix the signedness issues?

Of course! I included parse_size() in the attached patch which tries
to address the parsing/unexpected issues by aborting the connection
and returning an error.
Does it work for you (it is based on trunk, so you may need to adapt
it to your version, which preferably should be the latest...)?

I'm not sure about the handling of empty values (the real len == 0
case, not the parsing error).
The previous code did not expect the trailing CRLF, while this patch does...

Regards,
Yann.

[1] http://svn.apache.org/r982408
Index: memcache/apr_memcache.c
===
--- memcache/apr_memcache.c	(revision 1685853)
+++ memcache/apr_memcache.c	(working copy)
@@ -731,6 +731,26 @@ apr_memcache_replace(apr_memcache_t *mc,
 
 }
 
+/*
+ * Parses a decimal size from size_str, returning the value in *size.
+ * Returns 1 if parsing was successful, 0 if parsing failed.
+ */
+static int parse_size(const char *size_str, apr_size_t *size)
+{
+char *endptr;
+long size_as_long;
+
+errno = 0;
+size_as_long = strtol(size_str, , 10);
+if ((size_as_long < 0) || (errno != 0) || (endptr == size_str) ||
+(endptr[0] != '\r') || (endptr[1] != '\n')) {
+return 0;
+}
+
+*size = (unsigned long)size_as_long;
+return 1;
+}
+
 APR_DECLARE(apr_status_t)
 apr_memcache_getp(apr_memcache_t *mc,
   apr_pool_t *p,
@@ -799,14 +819,11 @@ apr_memcache_getp(apr_memcache_t *mc,
 }
 
 length = apr_strtok(NULL, " ", );
-if (length) {
-len = strtol(length, (char **)NULL, 10);
+if (!length || !parse_size(length, )) {
+ms_bad_conn(ms, conn);
+apr_memcache_disable_server(mc, ms);
+return APR_EINVAL;
 }
-
-if (len == 0 )  {
-*new_length = 0;
-*baton = NULL;
-}
 else {
 apr_bucket_brigade *bbb;
 apr_bucket *e;
@@ -1361,74 +1378,69 @@ apr_memcache_multgetp(apr_memcache_t *mc,
char *last;
char *data;
apr_size_t len = 0;
+   apr_bucket *e = NULL;
 
apr_strtok(conn->buffer, " ", ); /* just the VALUE, ignore */
key = apr_strtok(NULL, " ", );
flags = apr_strtok(NULL, " ", );
+   length = apr_strtok(NULL, " ", );
 
-
-   length = apr_strtok(NULL, " ", );
-   if (length) {
-   len = strtol(length, (char **) NULL, 10);
+   if (!length || !parse_size(length, )) {
+   rv = APR_EINVAL;
}
+   else {
+   /* eat the trailing \r\n */
+   rv = apr_brigade_partition(conn->bb, len+2, );
+   }
+   if (rv != APR_SUCCESS) {
+   apr_pollset_remove (pollset, [i]);
+   mget_conn_result(FALSE, FALSE, rv, mc, ms, conn,
+server_query, values, server_queries);
+   queries_sent--;
+   continue;
+   }
 
value = apr_hash_get(values, key, strlen(key));
-
-  

Re: Issues/questions with apr_memcache_multigetp

2015-09-24 Thread Jeffrey Crowell
Hi Yann,

This patch looks like it should fix the hang we've seen based on the
instruction trace provided from a bug report (
http://i.imgur.com/RI3TKrU.png ) where essentially, queries_sent (
https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1414
) is never decremented.

The only reports of the bug have had HAProxy along with memcached, so I'm
not sure if something funky is going on there closing a connection in an
odd way.

One quick question on the parse_size function though.

Based on the protocol.txt for memcached, it seems like it should also be
valid if endptr[0] == ' ', endptr[1] == '\r', entptr[2] == '\n'
or even if just endptr[0] == ' ', no?

https://github.com/memcached/memcached/blob/master/doc/protocol.txt#L227

I'm not really quite sure if the protol description is correct.

Should it in fact read

VALUE   [ ]\r\n

instead of

VALUE[]\r\n

And if the optional cas_unique token is present, wont this fail?

Jeff



On Thu, Sep 24, 2015 at 10:33 AM, Yann Ylavic  wrote:

> Hi Jeffrey,
>
> On Wed, Sep 23, 2015 at 9:23 PM, Jeffrey Crowell 
> wrote:
> >
> > We have some patches which were created in an attempt to fix some signed
> > bugs in the original code that I think may be causing our issue.
> >
> > Namely here:
> >
> https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1453
> > vs here
> >
> https://gist.github.com/crowell/59bfa1bb9f0cda30c48a#file-multiget-c-L192
> > The original code uses an atoi(), which has undefined behavior when
> called
> > on non integer strings, leaving len to be 0.
> >
> > Second, the check here
> >
> https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1457
> > vs
> https://gist.github.com/crowell/59bfa1bb9f0cda30c48a#file-multiget-c-L199
> >
> > In the original apr code, this check will never fail. len is an
> apr_size_t,
> > and will never be < 0.
>
> The code you refer has changed with [1] (i.e. APR-1.4.3, the current
> version is 1.5.2), but still does not seem to fix the invalid/unknown
> length/value/type issue, which are both unexpected errors in
> apr_memcache, and as such should probably terminate the connection...
>
> >
> > The issue here now is that the check in the "server sent back a key that
> i
> > didn't ask for"
> >
> https://github.com/pagespeed/mod_pagespeed/blob/master/third_party/aprutil/apr_memcache2.c#L1507
> > (same in both forked and upstream), apr_pollset_remove, is never called,
> and
> > the number of queries sent is never changed.
> >
> > Does it seem possible that this is causing the server to spin here
>
> Yes, in the unexpected cases mentioned above, the socket is not
> "consumed" (i.e. apr_brigade_partition() is not called), which let
> some immediatly poll()able data for the next loops.
>
> > would apr be interested in accepting a patch to fix the signedness
> issues?
>
> Of course! I included parse_size() in the attached patch which tries
> to address the parsing/unexpected issues by aborting the connection
> and returning an error.
> Does it work for you (it is based on trunk, so you may need to adapt
> it to your version, which preferably should be the latest...)?
>
> I'm not sure about the handling of empty values (the real len == 0
> case, not the parsing error).
> The previous code did not expect the trailing CRLF, while this patch
> does...
>
> Regards,
> Yann.
>
> [1] http://svn.apache.org/r982408
>