On 09/17/2013 06:15 PM, Lukas Slebodnik wrote:
On (17/09/13 17:15), Michal Židek wrote:
On 09/17/2013 04:41 PM, Lukas Slebodnik wrote:
On (16/09/13 18:16), Michal Židek wrote:
On 09/14/2013 12:35 AM, Lukas Slebodnik wrote:
On (13/09/13 19:17), Michal Židek wrote:
On 09/13/2013 05:58 PM, Michal Židek wrote:
Hello,

This patch should add another line of defence against memory cache
problems caused by accessing slot outside of bounds.

Thanks
Michal


After discussion with Lukas I am attaching alternative version
without the call to save the corrupted cache.

Thanks
Michal


I tested patch with latest sssd-1-9 on RHEL6.

Message was written to the sssd_nss.log:
(Sat Sep 14 00:19:24 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash]
(0x0010): Corrupted fastcache. Slot number too big.

But it seems that sssd_nss crashed.

Program received signal SIGILL, Illegal instruction.
0x00007fa63f8b5b03 in mabort () from /lib64/libc.so.6
#0  0x00007fa63f8b5b03 in mabort () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000000000000008 in ?? ()
No symbol table info available.
#2  0x0000000001086cf0 in ?? ()
No symbol table info available.
#3  0x00000000010842a0 in ?? ()
No symbol table info available.
#4  0x00000000004288de in sss_mc_find_free_slots (_mcc=0x1086cf0, rec_len=252, 
key=<value optimized out>, _rec=0x7fff1b6bb008) at 
src/responder/nss/nsssrv_mmap_cache.c:495
         tot_slots = <value optimized out>
         i = <value optimized out>
         rec = <value optimized out>
         cur = 0
         t = <value optimized out>
         used = true
#5  sss_mc_get_record (_mcc=0x1086cf0, rec_len=252, key=<value optimized out>, 
_rec=0x7fff1b6bb008) at src/responder/nss/nsssrv_mmap_cache.c:637
         mcc = 0x7fa632dfa038
         old_rec = <value optimized out>
         rec = <value optimized out>
         old_slots = <value optimized out>
         num_slots = 8
         base_slot = <value optimized out>
         ret = 853516376
         i = <value optimized out>
         __FUNCTION__ = "sss_mc_get_record"
#6  0x0000000000428eee in sss_mmap_cache_pw_store (_mcc=0x10842a0, 
name=0x7fff1b6bb140, pw=0x7fff1b6bb150, uid=126516, gid=16000, 
gecos=0x7fff1b6bb180, homedir=0x7fff1b6bb170, shell=0x7fff1b6bb160) at 
src/responder/nss/nsssrv_mmap_cache.c:731
         mcc = 0x1086cf0
         rec = <value optimized out>
         data = <value optimized out>
         uidkey = {str = 0x7fff1b6bb010 "126516", len = 7}
         uidstr = "126516\000\000\021)\213"
         data_len = 204
         rec_len = 252
         pos = <value optimized out>
         ret = <value optimized out>

LS

The crash was caused, because mmap_cache was invalidated
and memset was called for invalidated record.

memset(rec->data, MC_INVALID_VAL8, <snip> * MC_SIZE_TO_SLOTS(rec->len))
                                                             ^^^^^^^^^^
                                                     rec->len is 0xffffffff

Good catch.

In function sss_mc_get_next_slot_with_hash I returned MC_INVALID_VAL
in case of failure. This function is called in function
sss_mc_rm_rec_from_chain, which returns no error code. So the upper
layers worked with the wrong MC_INVALID_VAL, as if it was valid value
and the MC_PTR_TO_SLOT resulted in some big slot number (which
probably caused the crash). In this patch I populate the information
about failure to to sss_mc_invalidate_rec and terminate it if
necessary.

New patch is attached. Please run the testing script with your
configuration that failed to see if it is fixed.

Thanks
Michal

I ran a stress test for 20+ hours and sssd_nss didn't crash,
only mmap_cache was reseted or reinvalidated.

Here is sssd_nss.log:
=============================================================
(Mon Sep 16 18:49:56 2013) [sssd[nss]] [monitor_common_rotate_logs] (0x0010): 
Debug level changed to 0x0070
(Mon Sep 16 18:49:57 2013) [sssd[nss]] [monitor_common_rotate_logs] (0x0010): 
Debug level changed to 0x0030
(Mon Sep 16 18:53:31 2013) [sssd[nss]] [setpwent_result_timeout] (0x0020): 
setpwent result object has expired. Cleaning up.
(Mon Sep 16 19:37:17 2013) [sssd[nss]] [sss_mc_find_record] (0x0010): Corrupted 
fastcache. name_ptr value is 16.
--- 1st reset.
(Mon Sep 16 21:32:40 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
(0x0010): Corrupted fastcache. Slot number too big.
--- 2nd reset.
(Mon Sep 16 22:04:04 2013) [sssd[nss]] [sss_mc_find_record] (0x0010): Corrupted 
fastcache. name_ptr value is 909128497.
--- 3rd reset.
(Mon Sep 16 23:15:23 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
(0x0010): Corrupted fastcache. Slot number too big.
(Mon Sep 16 23:40:19 2013) [sssd[nss]] [sss_mc_is_valid_rec] (0x0010): 
Corrupted fastcache. Slot number too big.
(Mon Sep 16 23:40:19 2013) [sssd[nss]] [sss_mc_get_record] (0x0020): Fatal 
internal mmap cache error, invalidating cache!
(Mon Sep 16 23:40:19 2013) [sssd[nss]] [fill_pwent] (0x0020): Failed to store 
user testbigldap129485(default) in mmap cache!
--- 1st invalidation.
(Tue Sep 17 06:06:51 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
(0x0010): Corrupted fastcache. Slot number too big.
--- 4th reset.
(Tue Sep 17 06:25:04 2013) [sssd[nss]] [sss_mc_add_rec_to_chain] (0x0010): 
Corrupted fastcache. Slot number too big.
--- 5th reset.
(Tue Sep 17 07:48:23 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
(0x0010): Corrupted fastcache. Slot number too big.
--- 6th reset.
(Tue Sep 17 08:08:36 2013) [sssd[nss]] [sss_mc_is_valid_rec] (0x0010): 
Corrupted fastcache. Slot number too big.
(Tue Sep 17 08:08:36 2013) [sssd[nss]] [sss_mc_get_record] (0x0020): Fatal 
internal mmap cache error, invalidating cache!
(Tue Sep 17 08:08:36 2013) [sssd[nss]] [fill_pwent] (0x0020): Failed to store 
user testbigldap119808(default) in mmap cache!
--- 2nd invalidation.
(Tue Sep 17 12:56:02 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
(0x0010): Corrupted fastcache. Slot number too big.
--- 7th reset.
(Tue Sep 17 13:22:27 2013) [sssd[nss]] [sss_mc_is_valid_rec] (0x0010): 
Corrupted fastcache. Slot number too big.
(Tue Sep 17 13:22:27 2013) [sssd[nss]] [sss_mc_get_record] (0x0020): Fatal 
internal mmap cache error, invalidating cache!
(Tue Sep 17 13:22:27 2013) [sssd[nss]] [fill_pwent] (0x0020): Failed to store 
user testbigldap126987(default) in mmap cache!
--- 3rd invalidation.
(Tue Sep 17 14:30:56 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
(0x0010): Corrupted fastcache. Slot number too big.
--- 8th reset.
(Tue Sep 17 14:54:21 2013) [sssd[nss]] [sss_mc_add_rec_to_chain] (0x0010): 
Corrupted fastcache. Slot number too big.
--- 9th reset.
=============================================================

In older log sile, I also saw reseting caches in another functions.
BTW: I tested my patch with "two next pointers" and it did not crash and
log file is empty :-)

just few small nitpicks.

>From ba054788692f713eed0738f5b3f5e6bb48fe4c4b Mon Sep 17 00:00:00 2001
From: Michal Zidek <[email protected]>
Date: Fri, 13 Sep 2013 17:41:28 +0200
Subject: [PATCH] Check slot validity before MC_SLOT_TO_PTR.

resolves:
https://fedorahosted.org/sssd/ticket/2049
---
src/responder/nss/nsssrv_mmap_cache.c | 80 +++++++++++++++++++++++++++++++----
src/sss_client/nss_mc_common.c        |  4 ++
2 files changed, 75 insertions(+), 9 deletions(-)

diff --git a/src/responder/nss/nsssrv_mmap_cache.c 
b/src/responder/nss/nsssrv_mmap_cache.c
index 4e45405..09a4c2b 100644
--- a/src/responder/nss/nsssrv_mmap_cache.c
+++ b/src/responder/nss/nsssrv_mmap_cache.c
@@ -191,6 +191,13 @@ static void sss_mc_add_rec_to_chain(struct sss_mc_ctx *mcc,
     }

     do {
+        if (!MC_SLOT_WITHIN_BOUNDS(slot, mcc->dt_size)) {
+            DEBUG(SSSDBG_FATAL_FAILURE,
+                  ("Corrupted fastcache. Slot number too big.\n"));
+            sss_mmap_cache_reset(mcc);
+            return;
+        }
+
         cur = MC_SLOT_TO_PTR(mcc->data_table, slot, struct sss_mc_rec);
         if (cur == rec) {
             /* rec already stored in hash chain */
@@ -205,16 +212,24 @@ static void sss_mc_add_rec_to_chain(struct sss_mc_ctx 
*mcc,
     cur->next = MC_PTR_TO_SLOT(mcc->data_table, rec);
}

-static inline uint32_t
+static inline errno_t
sss_mc_get_next_slot_with_hash(struct sss_mc_ctx *mcc,
                                struct sss_mc_rec *start_rec,
-                               uint32_t hash)
+                               uint32_t hash,
+                               uint32_t *_slot)
{
     struct sss_mc_rec *rec;
     uint32_t slot;

     slot = start_rec->next;
     while (slot != MC_INVALID_VAL) {
+        if (!MC_SLOT_WITHIN_BOUNDS(slot, mcc->dt_size)) {
+            DEBUG(SSSDBG_FATAL_FAILURE,
+                  ("Corrupted fastcache. Slot number too big.\n"));
+            sss_mmap_cache_reset(mcc);
+            return EINVAL;
+        }
+
         rec = MC_SLOT_TO_PTR(mcc->data_table, slot, struct sss_mc_rec);
         if (rec->hash1 == hash || rec->hash2 == hash) {
             break;
@@ -223,23 +238,26 @@ sss_mc_get_next_slot_with_hash(struct sss_mc_ctx *mcc,
         slot = rec->next;
     }

-    return slot;
+    *_slot = slot;
+
+    return EOK;
}

-static void sss_mc_rm_rec_from_chain(struct sss_mc_ctx *mcc,
+static errno_t sss_mc_rm_rec_from_chain(struct sss_mc_ctx *mcc,
                                      struct sss_mc_rec *rec,
                                      uint32_t hash)
{
     struct sss_mc_rec *prev = NULL;
     struct sss_mc_rec *cur = NULL;
     uint32_t slot;
+    uint32_t ret;

     if (hash > MC_HT_ELEMS(mcc->ht_size)) {
         /* It can happen if rec->hash1 and rec->hash2 was the same.
          * or it is invalid hash. It is better to return
          * than trying to access out of bounds memory
          */
-        return;
+        return EOK;
                  ^^^
           EOK should be returned on if hash is equal to MC_INVALID_VAL

Ok. I also changed the comments (split the error and normal case).


     }

     slot = mcc->hash_table[hash];
@@ -247,18 +265,37 @@ static void sss_mc_rm_rec_from_chain(struct sss_mc_ctx 
*mcc,
         /* record has already been removed. It may happen if rec->hash1 and
          * rec->has2 are the same. (It is not very likely).
          */
-        return;
+        return EOK;
+    }
+
+    if (!MC_SLOT_WITHIN_BOUNDS(slot, mcc->dt_size)) {
+        DEBUG(SSSDBG_FATAL_FAILURE,
+              ("Corrupted fastcache. Slot number too big.\n"));
+        sss_mmap_cache_reset(mcc);
+        return EINVAL;
     }
+
     cur = MC_SLOT_TO_PTR(mcc->data_table, slot, struct sss_mc_rec);
     if (cur == rec) {
         /* rec->next can refer to record without matching hashes.
          * We need to skip this(those) records, because
          * mcc->hash_table[hash] have to refer to valid start of the chain.
          */
-        mcc->hash_table[hash] = sss_mc_get_next_slot_with_hash(mcc, rec, hash);
+        ret = sss_mc_get_next_slot_with_hash(mcc, rec, hash,
+                                             &mcc->hash_table[hash]);
+        if (ret != EOK) {
+            return ret;
+        }
This condition does not make sense.
You can use:
       either return ret;
       or return sss_mc_get_next_slot_with_hash

Well, I wanted the function to run to the end if nothing went wrong,
but yes, in this case we return at several places in the function so
it does not make much sense.


LS

Thanks for the review Lukas. New patch is attached.

Michal

I tested patch for sssd-1-9 branch on RHEL6
and it works fine. I think it is good alternative for sssd-1-9 branch.

ACK

LS

I would personally prefer to have this in master as well, even if the currently known issues will be fixed with your patch ([PATCH] mmap_cache: Use two chains for hash collision). I think it hurts nothing to be more defensive.

Michal

_______________________________________________
sssd-devel mailing list
[email protected]
https://lists.fedorahosted.org/mailman/listinfo/sssd-devel

Reply via email to