On 09/17/2013 04:41 PM, Lukas Slebodnik wrote:
On (16/09/13 18:16), Michal Židek wrote:
On 09/14/2013 12:35 AM, Lukas Slebodnik wrote:
On (13/09/13 19:17), Michal Židek wrote:
On 09/13/2013 05:58 PM, Michal Židek wrote:
Hello,

This patch should add another line of defence against memory cache
problems caused by accessing slot outside of bounds.

Thanks
Michal


After discussion with Lukas I am attaching alternative version
without the call to save the corrupted cache.

Thanks
Michal


I tested patch with latest sssd-1-9 on RHEL6.

Message was written to the sssd_nss.log:
(Sat Sep 14 00:19:24 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash]
(0x0010): Corrupted fastcache. Slot number too big.

But it seems that sssd_nss crashed.

Program received signal SIGILL, Illegal instruction.
0x00007fa63f8b5b03 in mabort () from /lib64/libc.so.6
#0  0x00007fa63f8b5b03 in mabort () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000000000000008 in ?? ()
No symbol table info available.
#2  0x0000000001086cf0 in ?? ()
No symbol table info available.
#3  0x00000000010842a0 in ?? ()
No symbol table info available.
#4  0x00000000004288de in sss_mc_find_free_slots (_mcc=0x1086cf0, rec_len=252, 
key=<value optimized out>, _rec=0x7fff1b6bb008) at 
src/responder/nss/nsssrv_mmap_cache.c:495
         tot_slots = <value optimized out>
         i = <value optimized out>
         rec = <value optimized out>
         cur = 0
         t = <value optimized out>
         used = true
#5  sss_mc_get_record (_mcc=0x1086cf0, rec_len=252, key=<value optimized out>, 
_rec=0x7fff1b6bb008) at src/responder/nss/nsssrv_mmap_cache.c:637
         mcc = 0x7fa632dfa038
         old_rec = <value optimized out>
         rec = <value optimized out>
         old_slots = <value optimized out>
         num_slots = 8
         base_slot = <value optimized out>
         ret = 853516376
         i = <value optimized out>
         __FUNCTION__ = "sss_mc_get_record"
#6  0x0000000000428eee in sss_mmap_cache_pw_store (_mcc=0x10842a0, 
name=0x7fff1b6bb140, pw=0x7fff1b6bb150, uid=126516, gid=16000, 
gecos=0x7fff1b6bb180, homedir=0x7fff1b6bb170, shell=0x7fff1b6bb160) at 
src/responder/nss/nsssrv_mmap_cache.c:731
         mcc = 0x1086cf0
         rec = <value optimized out>
         data = <value optimized out>
         uidkey = {str = 0x7fff1b6bb010 "126516", len = 7}
         uidstr = "126516\000\000\021)\213"
         data_len = 204
         rec_len = 252
         pos = <value optimized out>
         ret = <value optimized out>

LS

The crash was caused, because mmap_cache was invalidated
and memset was called for invalidated record.

memset(rec->data, MC_INVALID_VAL8, <snip> * MC_SIZE_TO_SLOTS(rec->len))
                                                             ^^^^^^^^^^
                                                     rec->len is 0xffffffff

Good catch.

In function sss_mc_get_next_slot_with_hash I returned MC_INVALID_VAL
in case of failure. This function is called in function
sss_mc_rm_rec_from_chain, which returns no error code. So the upper
layers worked with the wrong MC_INVALID_VAL, as if it was valid value
and the MC_PTR_TO_SLOT resulted in some big slot number (which
probably caused the crash). In this patch I populate the information
about failure to to sss_mc_invalidate_rec and terminate it if
necessary.

New patch is attached. Please run the testing script with your
configuration that failed to see if it is fixed.

Thanks
Michal

I ran a stress test for 20+ hours and sssd_nss didn't crash,
only mmap_cache was reseted or reinvalidated.

Here is sssd_nss.log:
=============================================================
(Mon Sep 16 18:49:56 2013) [sssd[nss]] [monitor_common_rotate_logs] (0x0010): 
Debug level changed to 0x0070
(Mon Sep 16 18:49:57 2013) [sssd[nss]] [monitor_common_rotate_logs] (0x0010): 
Debug level changed to 0x0030
(Mon Sep 16 18:53:31 2013) [sssd[nss]] [setpwent_result_timeout] (0x0020): 
setpwent result object has expired. Cleaning up.
(Mon Sep 16 19:37:17 2013) [sssd[nss]] [sss_mc_find_record] (0x0010): Corrupted 
fastcache. name_ptr value is 16.
--- 1st reset.
(Mon Sep 16 21:32:40 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
(0x0010): Corrupted fastcache. Slot number too big.
--- 2nd reset.
(Mon Sep 16 22:04:04 2013) [sssd[nss]] [sss_mc_find_record] (0x0010): Corrupted 
fastcache. name_ptr value is 909128497.
--- 3rd reset.
(Mon Sep 16 23:15:23 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
(0x0010): Corrupted fastcache. Slot number too big.
(Mon Sep 16 23:40:19 2013) [sssd[nss]] [sss_mc_is_valid_rec] (0x0010): 
Corrupted fastcache. Slot number too big.
(Mon Sep 16 23:40:19 2013) [sssd[nss]] [sss_mc_get_record] (0x0020): Fatal 
internal mmap cache error, invalidating cache!
(Mon Sep 16 23:40:19 2013) [sssd[nss]] [fill_pwent] (0x0020): Failed to store 
user testbigldap129485(default) in mmap cache!
--- 1st invalidation.
(Tue Sep 17 06:06:51 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
(0x0010): Corrupted fastcache. Slot number too big.
--- 4th reset.
(Tue Sep 17 06:25:04 2013) [sssd[nss]] [sss_mc_add_rec_to_chain] (0x0010): 
Corrupted fastcache. Slot number too big.
--- 5th reset.
(Tue Sep 17 07:48:23 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
(0x0010): Corrupted fastcache. Slot number too big.
--- 6th reset.
(Tue Sep 17 08:08:36 2013) [sssd[nss]] [sss_mc_is_valid_rec] (0x0010): 
Corrupted fastcache. Slot number too big.
(Tue Sep 17 08:08:36 2013) [sssd[nss]] [sss_mc_get_record] (0x0020): Fatal 
internal mmap cache error, invalidating cache!
(Tue Sep 17 08:08:36 2013) [sssd[nss]] [fill_pwent] (0x0020): Failed to store 
user testbigldap119808(default) in mmap cache!
--- 2nd invalidation.
(Tue Sep 17 12:56:02 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
(0x0010): Corrupted fastcache. Slot number too big.
--- 7th reset.
(Tue Sep 17 13:22:27 2013) [sssd[nss]] [sss_mc_is_valid_rec] (0x0010): 
Corrupted fastcache. Slot number too big.
(Tue Sep 17 13:22:27 2013) [sssd[nss]] [sss_mc_get_record] (0x0020): Fatal 
internal mmap cache error, invalidating cache!
(Tue Sep 17 13:22:27 2013) [sssd[nss]] [fill_pwent] (0x0020): Failed to store 
user testbigldap126987(default) in mmap cache!
--- 3rd invalidation.
(Tue Sep 17 14:30:56 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
(0x0010): Corrupted fastcache. Slot number too big.
--- 8th reset.
(Tue Sep 17 14:54:21 2013) [sssd[nss]] [sss_mc_add_rec_to_chain] (0x0010): 
Corrupted fastcache. Slot number too big.
--- 9th reset.
=============================================================

In older log sile, I also saw reseting caches in another functions.
BTW: I tested my patch with "two next pointers" and it did not crash and
log file is empty :-)

just few small nitpicks.

From ba054788692f713eed0738f5b3f5e6bb48fe4c4b Mon Sep 17 00:00:00 2001
From: Michal Zidek <[email protected]>
Date: Fri, 13 Sep 2013 17:41:28 +0200
Subject: [PATCH] Check slot validity before MC_SLOT_TO_PTR.

resolves:
https://fedorahosted.org/sssd/ticket/2049
---
src/responder/nss/nsssrv_mmap_cache.c | 80 +++++++++++++++++++++++++++++++----
src/sss_client/nss_mc_common.c        |  4 ++
2 files changed, 75 insertions(+), 9 deletions(-)

diff --git a/src/responder/nss/nsssrv_mmap_cache.c 
b/src/responder/nss/nsssrv_mmap_cache.c
index 4e45405..09a4c2b 100644
--- a/src/responder/nss/nsssrv_mmap_cache.c
+++ b/src/responder/nss/nsssrv_mmap_cache.c
@@ -191,6 +191,13 @@ static void sss_mc_add_rec_to_chain(struct sss_mc_ctx *mcc,
     }

     do {
+        if (!MC_SLOT_WITHIN_BOUNDS(slot, mcc->dt_size)) {
+            DEBUG(SSSDBG_FATAL_FAILURE,
+                  ("Corrupted fastcache. Slot number too big.\n"));
+            sss_mmap_cache_reset(mcc);
+            return;
+        }
+
         cur = MC_SLOT_TO_PTR(mcc->data_table, slot, struct sss_mc_rec);
         if (cur == rec) {
             /* rec already stored in hash chain */
@@ -205,16 +212,24 @@ static void sss_mc_add_rec_to_chain(struct sss_mc_ctx 
*mcc,
     cur->next = MC_PTR_TO_SLOT(mcc->data_table, rec);
}

-static inline uint32_t
+static inline errno_t
sss_mc_get_next_slot_with_hash(struct sss_mc_ctx *mcc,
                                struct sss_mc_rec *start_rec,
-                               uint32_t hash)
+                               uint32_t hash,
+                               uint32_t *_slot)
{
     struct sss_mc_rec *rec;
     uint32_t slot;

     slot = start_rec->next;
     while (slot != MC_INVALID_VAL) {
+        if (!MC_SLOT_WITHIN_BOUNDS(slot, mcc->dt_size)) {
+            DEBUG(SSSDBG_FATAL_FAILURE,
+                  ("Corrupted fastcache. Slot number too big.\n"));
+            sss_mmap_cache_reset(mcc);
+            return EINVAL;
+        }
+
         rec = MC_SLOT_TO_PTR(mcc->data_table, slot, struct sss_mc_rec);
         if (rec->hash1 == hash || rec->hash2 == hash) {
             break;
@@ -223,23 +238,26 @@ sss_mc_get_next_slot_with_hash(struct sss_mc_ctx *mcc,
         slot = rec->next;
     }

-    return slot;
+    *_slot = slot;
+
+    return EOK;
}

-static void sss_mc_rm_rec_from_chain(struct sss_mc_ctx *mcc,
+static errno_t sss_mc_rm_rec_from_chain(struct sss_mc_ctx *mcc,
                                      struct sss_mc_rec *rec,
                                      uint32_t hash)
{
     struct sss_mc_rec *prev = NULL;
     struct sss_mc_rec *cur = NULL;
     uint32_t slot;
+    uint32_t ret;

     if (hash > MC_HT_ELEMS(mcc->ht_size)) {
         /* It can happen if rec->hash1 and rec->hash2 was the same.
          * or it is invalid hash. It is better to return
          * than trying to access out of bounds memory
          */
-        return;
+        return EOK;
                  ^^^
           EOK should be returned on if hash is equal to MC_INVALID_VAL

Ok. I also changed the comments (split the error and normal case).


     }

     slot = mcc->hash_table[hash];
@@ -247,18 +265,37 @@ static void sss_mc_rm_rec_from_chain(struct sss_mc_ctx 
*mcc,
         /* record has already been removed. It may happen if rec->hash1 and
          * rec->has2 are the same. (It is not very likely).
          */
-        return;
+        return EOK;
+    }
+
+    if (!MC_SLOT_WITHIN_BOUNDS(slot, mcc->dt_size)) {
+        DEBUG(SSSDBG_FATAL_FAILURE,
+              ("Corrupted fastcache. Slot number too big.\n"));
+        sss_mmap_cache_reset(mcc);
+        return EINVAL;
     }
+
     cur = MC_SLOT_TO_PTR(mcc->data_table, slot, struct sss_mc_rec);
     if (cur == rec) {
         /* rec->next can refer to record without matching hashes.
          * We need to skip this(those) records, because
          * mcc->hash_table[hash] have to refer to valid start of the chain.
          */
-        mcc->hash_table[hash] = sss_mc_get_next_slot_with_hash(mcc, rec, hash);
+        ret = sss_mc_get_next_slot_with_hash(mcc, rec, hash,
+                                             &mcc->hash_table[hash]);
+        if (ret != EOK) {
+            return ret;
+        }
This condition does not make sense.
You can use:
       either return ret;
       or return sss_mc_get_next_slot_with_hash

Well, I wanted the function to run to the end if nothing went wrong, but yes, in this case we return at several places in the function so it does not make much sense.


LS

Thanks for the review Lukas. New patch is attached.

Michal
>From 517ba35b01259133f9d6363409ce2ddc7a871419 Mon Sep 17 00:00:00 2001
From: Michal Zidek <[email protected]>
Date: Fri, 13 Sep 2013 17:41:28 +0200
Subject: [PATCH] Check slot validity before MC_SLOT_TO_PTR.

resolves:
https://fedorahosted.org/sssd/ticket/2049
---
 src/responder/nss/nsssrv_mmap_cache.c | 90 +++++++++++++++++++++++++++++------
 src/sss_client/nss_mc_common.c        |  4 ++
 2 files changed, 79 insertions(+), 15 deletions(-)

diff --git a/src/responder/nss/nsssrv_mmap_cache.c b/src/responder/nss/nsssrv_mmap_cache.c
index 4e45405..e46a7dc 100644
--- a/src/responder/nss/nsssrv_mmap_cache.c
+++ b/src/responder/nss/nsssrv_mmap_cache.c
@@ -191,6 +191,13 @@ static void sss_mc_add_rec_to_chain(struct sss_mc_ctx *mcc,
     }
 
     do {
+        if (!MC_SLOT_WITHIN_BOUNDS(slot, mcc->dt_size)) {
+            DEBUG(SSSDBG_FATAL_FAILURE,
+                  ("Corrupted fastcache. Slot number too big.\n"));
+            sss_mmap_cache_reset(mcc);
+            return;
+        }
+
         cur = MC_SLOT_TO_PTR(mcc->data_table, slot, struct sss_mc_rec);
         if (cur == rec) {
             /* rec already stored in hash chain */
@@ -205,16 +212,24 @@ static void sss_mc_add_rec_to_chain(struct sss_mc_ctx *mcc,
     cur->next = MC_PTR_TO_SLOT(mcc->data_table, rec);
 }
 
-static inline uint32_t
+static inline errno_t
 sss_mc_get_next_slot_with_hash(struct sss_mc_ctx *mcc,
                                struct sss_mc_rec *start_rec,
-                               uint32_t hash)
+                               uint32_t hash,
+                               uint32_t *_slot)
 {
     struct sss_mc_rec *rec;
     uint32_t slot;
 
     slot = start_rec->next;
     while (slot != MC_INVALID_VAL) {
+        if (!MC_SLOT_WITHIN_BOUNDS(slot, mcc->dt_size)) {
+            DEBUG(SSSDBG_FATAL_FAILURE,
+                  ("Corrupted fastcache. Slot number too big.\n"));
+            sss_mmap_cache_reset(mcc);
+            return EINVAL;
+        }
+
         rec = MC_SLOT_TO_PTR(mcc->data_table, slot, struct sss_mc_rec);
         if (rec->hash1 == hash || rec->hash2 == hash) {
             break;
@@ -223,23 +238,27 @@ sss_mc_get_next_slot_with_hash(struct sss_mc_ctx *mcc,
         slot = rec->next;
     }
 
-    return slot;
+    *_slot = slot;
+
+    return EOK;
 }
 
-static void sss_mc_rm_rec_from_chain(struct sss_mc_ctx *mcc,
-                                     struct sss_mc_rec *rec,
-                                     uint32_t hash)
+static errno_t sss_mc_rm_rec_from_chain(struct sss_mc_ctx *mcc,
+                                        struct sss_mc_rec *rec,
+                                        uint32_t hash)
 {
     struct sss_mc_rec *prev = NULL;
     struct sss_mc_rec *cur = NULL;
     uint32_t slot;
 
     if (hash > MC_HT_ELEMS(mcc->ht_size)) {
-        /* It can happen if rec->hash1 and rec->hash2 was the same.
-         * or it is invalid hash. It is better to return
-         * than trying to access out of bounds memory
-         */
-        return;
+        if (hash == MC_INVALID_VAL) {
+            /* This can happen if rec->hash1 and rec->hash2 was the same. */
+            return EOK;
+        }
+
+        /* The hash is invalid. */
+        return EINVAL;
     }
 
     slot = mcc->hash_table[hash];
@@ -247,18 +266,34 @@ static void sss_mc_rm_rec_from_chain(struct sss_mc_ctx *mcc,
         /* record has already been removed. It may happen if rec->hash1 and
          * rec->has2 are the same. (It is not very likely).
          */
-        return;
+        return EOK;
     }
+
+    if (!MC_SLOT_WITHIN_BOUNDS(slot, mcc->dt_size)) {
+        DEBUG(SSSDBG_FATAL_FAILURE,
+              ("Corrupted fastcache. Slot number too big.\n"));
+        sss_mmap_cache_reset(mcc);
+        return EINVAL;
+    }
+
     cur = MC_SLOT_TO_PTR(mcc->data_table, slot, struct sss_mc_rec);
     if (cur == rec) {
         /* rec->next can refer to record without matching hashes.
          * We need to skip this(those) records, because
          * mcc->hash_table[hash] have to refer to valid start of the chain.
          */
-        mcc->hash_table[hash] = sss_mc_get_next_slot_with_hash(mcc, rec, hash);
+        return sss_mc_get_next_slot_with_hash(mcc, rec, hash,
+                                              &mcc->hash_table[hash]);
     } else {
         slot = cur->next;
         while (slot != MC_INVALID_VAL) {
+            if (!MC_SLOT_WITHIN_BOUNDS(slot, mcc->dt_size)) {
+                DEBUG(SSSDBG_FATAL_FAILURE,
+                      ("Corrupted fastcache. Slot number too big.\n"));
+                sss_mmap_cache_reset(mcc);
+                return EINVAL;
+            }
+
             prev = cur;
             cur = MC_SLOT_TO_PTR(mcc->data_table, slot, struct sss_mc_rec);
             if (cur == rec) {
@@ -278,6 +313,8 @@ static void sss_mc_rm_rec_from_chain(struct sss_mc_ctx *mcc,
             }
         }
     }
+
+    return EOK;
 }
 
 static void sss_mc_free_slots(struct sss_mc_ctx *mcc, struct sss_mc_rec *rec)
@@ -296,6 +333,8 @@ static void sss_mc_free_slots(struct sss_mc_ctx *mcc, struct sss_mc_rec *rec)
 static void sss_mc_invalidate_rec(struct sss_mc_ctx *mcc,
                                   struct sss_mc_rec *rec)
 {
+    errno_t ret;
+
     if (rec->b1 == MC_INVALID_VAL) {
         /* record already invalid */
         return;
@@ -303,9 +342,16 @@ static void sss_mc_invalidate_rec(struct sss_mc_ctx *mcc,
 
     /* Remove from hash chains */
     /* hash chain 1 */
-    sss_mc_rm_rec_from_chain(mcc, rec, rec->hash1);
+    ret = sss_mc_rm_rec_from_chain(mcc, rec, rec->hash1);
+    if (ret != EOK) {
+        return;
+    }
+
     /* hash chain 2 */
-    sss_mc_rm_rec_from_chain(mcc, rec, rec->hash2);
+    ret = sss_mc_rm_rec_from_chain(mcc, rec, rec->hash2);
+    if (ret != EOK) {
+        return;
+    }
 
     /* Clear from free_table */
     sss_mc_free_slots(mcc, rec);
@@ -353,6 +399,13 @@ static bool sss_mc_is_valid_rec(struct sss_mc_ctx *mcc, struct sss_mc_rec *rec)
         self = NULL;
         slot = mcc->hash_table[rec->hash1];
         while (slot != MC_INVALID_VAL32 && self != rec) {
+            if (!MC_SLOT_WITHIN_BOUNDS(slot, mcc->dt_size)) {
+                DEBUG(SSSDBG_FATAL_FAILURE,
+                      ("Corrupted fastcache. Slot number too big.\n"));
+                sss_mmap_cache_reset(mcc);
+                return false;
+            }
+
             self = MC_SLOT_TO_PTR(mcc->data_table, slot, struct sss_mc_rec);
             slot = self->next;
         }
@@ -364,6 +417,13 @@ static bool sss_mc_is_valid_rec(struct sss_mc_ctx *mcc, struct sss_mc_rec *rec)
         self = NULL;
         slot = mcc->hash_table[rec->hash2];
         while (slot != MC_INVALID_VAL32 && self != rec) {
+            if (!MC_SLOT_WITHIN_BOUNDS(slot, mcc->dt_size)) {
+                DEBUG(SSSDBG_FATAL_FAILURE,
+                      ("Corrupted fastcache. Slot number too big.\n"));
+                sss_mmap_cache_reset(mcc);
+                return false;
+            }
+
             self = MC_SLOT_TO_PTR(mcc->data_table, slot, struct sss_mc_rec);
             slot = self->next;
         }
diff --git a/src/sss_client/nss_mc_common.c b/src/sss_client/nss_mc_common.c
index a0a70ab..6764d80 100644
--- a/src/sss_client/nss_mc_common.c
+++ b/src/sss_client/nss_mc_common.c
@@ -190,6 +190,10 @@ errno_t sss_nss_mc_get_record(struct sss_cli_mc_ctx *ctx,
     int count;
     int ret;
 
+    if (!MC_SLOT_WITHIN_BOUNDS(slot, ctx->dt_size)) {
+        return EINVAL;
+    }
+
     /* try max 5 times */
     for (count = 5; count > 0; count--) {
         rec = MC_SLOT_TO_PTR(ctx->data_table, slot, struct sss_mc_rec);
-- 
1.7.11.2

_______________________________________________
sssd-devel mailing list
[email protected]
https://lists.fedorahosted.org/mailman/listinfo/sssd-devel

Reply via email to