On (17/09/13 17:15), Michal Židek wrote:
>On 09/17/2013 04:41 PM, Lukas Slebodnik wrote:
>>On (16/09/13 18:16), Michal Židek wrote:
>>>On 09/14/2013 12:35 AM, Lukas Slebodnik wrote:
>>>>On (13/09/13 19:17), Michal Židek wrote:
>>>>>On 09/13/2013 05:58 PM, Michal Židek wrote:
>>>>>>Hello,
>>>>>>
>>>>>>This patch should add another line of defence against memory cache
>>>>>>problems caused by accessing slot outside of bounds.
>>>>>>
>>>>>>Thanks
>>>>>>Michal
>>>>>>
>>>>>
>>>>>After discussion with Lukas I am attaching alternative version
>>>>>without the call to save the corrupted cache.
>>>>>
>>>>>Thanks
>>>>>Michal
>>>>>
>>>>
>>>>I tested patch with latest sssd-1-9 on RHEL6.
>>>>
>>>>Message was written to the sssd_nss.log:
>>>>(Sat Sep 14 00:19:24 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash]
>>>>(0x0010): Corrupted fastcache. Slot number too big.
>>>>
>>>>But it seems that sssd_nss crashed.
>>>>
>>>>Program received signal SIGILL, Illegal instruction.
>>>>0x00007fa63f8b5b03 in mabort () from /lib64/libc.so.6
>>>>#0  0x00007fa63f8b5b03 in mabort () from /lib64/libc.so.6
>>>>No symbol table info available.
>>>>#1  0x0000000000000008 in ?? ()
>>>>No symbol table info available.
>>>>#2  0x0000000001086cf0 in ?? ()
>>>>No symbol table info available.
>>>>#3  0x00000000010842a0 in ?? ()
>>>>No symbol table info available.
>>>>#4  0x00000000004288de in sss_mc_find_free_slots (_mcc=0x1086cf0, 
>>>>rec_len=252, key=<value optimized out>, _rec=0x7fff1b6bb008) at 
>>>>src/responder/nss/nsssrv_mmap_cache.c:495
>>>>         tot_slots = <value optimized out>
>>>>         i = <value optimized out>
>>>>         rec = <value optimized out>
>>>>         cur = 0
>>>>         t = <value optimized out>
>>>>         used = true
>>>>#5  sss_mc_get_record (_mcc=0x1086cf0, rec_len=252, key=<value optimized 
>>>>out>, _rec=0x7fff1b6bb008) at src/responder/nss/nsssrv_mmap_cache.c:637
>>>>         mcc = 0x7fa632dfa038
>>>>         old_rec = <value optimized out>
>>>>         rec = <value optimized out>
>>>>         old_slots = <value optimized out>
>>>>         num_slots = 8
>>>>         base_slot = <value optimized out>
>>>>         ret = 853516376
>>>>         i = <value optimized out>
>>>>         __FUNCTION__ = "sss_mc_get_record"
>>>>#6  0x0000000000428eee in sss_mmap_cache_pw_store (_mcc=0x10842a0, 
>>>>name=0x7fff1b6bb140, pw=0x7fff1b6bb150, uid=126516, gid=16000, 
>>>>gecos=0x7fff1b6bb180, homedir=0x7fff1b6bb170, shell=0x7fff1b6bb160) at 
>>>>src/responder/nss/nsssrv_mmap_cache.c:731
>>>>         mcc = 0x1086cf0
>>>>         rec = <value optimized out>
>>>>         data = <value optimized out>
>>>>         uidkey = {str = 0x7fff1b6bb010 "126516", len = 7}
>>>>         uidstr = "126516\000\000\021)\213"
>>>>         data_len = 204
>>>>         rec_len = 252
>>>>         pos = <value optimized out>
>>>>         ret = <value optimized out>
>>>>
>>>>LS
>>>>
>>The crash was caused, because mmap_cache was invalidated
>>and memset was called for invalidated record.
>>
>>memset(rec->data, MC_INVALID_VAL8, <snip> * MC_SIZE_TO_SLOTS(rec->len))
>>                                                             ^^^^^^^^^^
>>                                                     rec->len is 0xffffffff
>>>
>>>Good catch.
>>>
>>>In function sss_mc_get_next_slot_with_hash I returned MC_INVALID_VAL
>>>in case of failure. This function is called in function
>>>sss_mc_rm_rec_from_chain, which returns no error code. So the upper
>>>layers worked with the wrong MC_INVALID_VAL, as if it was valid value
>>>and the MC_PTR_TO_SLOT resulted in some big slot number (which
>>>probably caused the crash). In this patch I populate the information
>>>about failure to to sss_mc_invalidate_rec and terminate it if
>>>necessary.
>>>
>>>New patch is attached. Please run the testing script with your
>>>configuration that failed to see if it is fixed.
>>>
>>>Thanks
>>>Michal
>>>
>>I ran a stress test for 20+ hours and sssd_nss didn't crash,
>>only mmap_cache was reseted or reinvalidated.
>>
>>Here is sssd_nss.log:
>>=============================================================
>>(Mon Sep 16 18:49:56 2013) [sssd[nss]] [monitor_common_rotate_logs] (0x0010): 
>>Debug level changed to 0x0070
>>(Mon Sep 16 18:49:57 2013) [sssd[nss]] [monitor_common_rotate_logs] (0x0010): 
>>Debug level changed to 0x0030
>>(Mon Sep 16 18:53:31 2013) [sssd[nss]] [setpwent_result_timeout] (0x0020): 
>>setpwent result object has expired. Cleaning up.
>>(Mon Sep 16 19:37:17 2013) [sssd[nss]] [sss_mc_find_record] (0x0010): 
>>Corrupted fastcache. name_ptr value is 16.
>>--- 1st reset.
>>(Mon Sep 16 21:32:40 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
>>(0x0010): Corrupted fastcache. Slot number too big.
>>--- 2nd reset.
>>(Mon Sep 16 22:04:04 2013) [sssd[nss]] [sss_mc_find_record] (0x0010): 
>>Corrupted fastcache. name_ptr value is 909128497.
>>--- 3rd reset.
>>(Mon Sep 16 23:15:23 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
>>(0x0010): Corrupted fastcache. Slot number too big.
>>(Mon Sep 16 23:40:19 2013) [sssd[nss]] [sss_mc_is_valid_rec] (0x0010): 
>>Corrupted fastcache. Slot number too big.
>>(Mon Sep 16 23:40:19 2013) [sssd[nss]] [sss_mc_get_record] (0x0020): Fatal 
>>internal mmap cache error, invalidating cache!
>>(Mon Sep 16 23:40:19 2013) [sssd[nss]] [fill_pwent] (0x0020): Failed to store 
>>user testbigldap129485(default) in mmap cache!
>>--- 1st invalidation.
>>(Tue Sep 17 06:06:51 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
>>(0x0010): Corrupted fastcache. Slot number too big.
>>--- 4th reset.
>>(Tue Sep 17 06:25:04 2013) [sssd[nss]] [sss_mc_add_rec_to_chain] (0x0010): 
>>Corrupted fastcache. Slot number too big.
>>--- 5th reset.
>>(Tue Sep 17 07:48:23 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
>>(0x0010): Corrupted fastcache. Slot number too big.
>>--- 6th reset.
>>(Tue Sep 17 08:08:36 2013) [sssd[nss]] [sss_mc_is_valid_rec] (0x0010): 
>>Corrupted fastcache. Slot number too big.
>>(Tue Sep 17 08:08:36 2013) [sssd[nss]] [sss_mc_get_record] (0x0020): Fatal 
>>internal mmap cache error, invalidating cache!
>>(Tue Sep 17 08:08:36 2013) [sssd[nss]] [fill_pwent] (0x0020): Failed to store 
>>user testbigldap119808(default) in mmap cache!
>>--- 2nd invalidation.
>>(Tue Sep 17 12:56:02 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
>>(0x0010): Corrupted fastcache. Slot number too big.
>>--- 7th reset.
>>(Tue Sep 17 13:22:27 2013) [sssd[nss]] [sss_mc_is_valid_rec] (0x0010): 
>>Corrupted fastcache. Slot number too big.
>>(Tue Sep 17 13:22:27 2013) [sssd[nss]] [sss_mc_get_record] (0x0020): Fatal 
>>internal mmap cache error, invalidating cache!
>>(Tue Sep 17 13:22:27 2013) [sssd[nss]] [fill_pwent] (0x0020): Failed to store 
>>user testbigldap126987(default) in mmap cache!
>>--- 3rd invalidation.
>>(Tue Sep 17 14:30:56 2013) [sssd[nss]] [sss_mc_get_next_slot_with_hash] 
>>(0x0010): Corrupted fastcache. Slot number too big.
>>--- 8th reset.
>>(Tue Sep 17 14:54:21 2013) [sssd[nss]] [sss_mc_add_rec_to_chain] (0x0010): 
>>Corrupted fastcache. Slot number too big.
>>--- 9th reset.
>>=============================================================
>>
>>In older log sile, I also saw reseting caches in another functions.
>>BTW: I tested my patch with "two next pointers" and it did not crash and
>>log file is empty :-)
>>
>>just few small nitpicks.
>>
>>>From ba054788692f713eed0738f5b3f5e6bb48fe4c4b Mon Sep 17 00:00:00 2001
>>>From: Michal Zidek <[email protected]>
>>>Date: Fri, 13 Sep 2013 17:41:28 +0200
>>>Subject: [PATCH] Check slot validity before MC_SLOT_TO_PTR.
>>>
>>>resolves:
>>>https://fedorahosted.org/sssd/ticket/2049
>>>---
>>>src/responder/nss/nsssrv_mmap_cache.c | 80 
>>>+++++++++++++++++++++++++++++++----
>>>src/sss_client/nss_mc_common.c        |  4 ++
>>>2 files changed, 75 insertions(+), 9 deletions(-)
>>>
>>>diff --git a/src/responder/nss/nsssrv_mmap_cache.c 
>>>b/src/responder/nss/nsssrv_mmap_cache.c
>>>index 4e45405..09a4c2b 100644
>>>--- a/src/responder/nss/nsssrv_mmap_cache.c
>>>+++ b/src/responder/nss/nsssrv_mmap_cache.c
>>>@@ -191,6 +191,13 @@ static void sss_mc_add_rec_to_chain(struct sss_mc_ctx 
>>>*mcc,
>>>     }
>>>
>>>     do {
>>>+        if (!MC_SLOT_WITHIN_BOUNDS(slot, mcc->dt_size)) {
>>>+            DEBUG(SSSDBG_FATAL_FAILURE,
>>>+                  ("Corrupted fastcache. Slot number too big.\n"));
>>>+            sss_mmap_cache_reset(mcc);
>>>+            return;
>>>+        }
>>>+
>>>         cur = MC_SLOT_TO_PTR(mcc->data_table, slot, struct sss_mc_rec);
>>>         if (cur == rec) {
>>>             /* rec already stored in hash chain */
>>>@@ -205,16 +212,24 @@ static void sss_mc_add_rec_to_chain(struct sss_mc_ctx 
>>>*mcc,
>>>     cur->next = MC_PTR_TO_SLOT(mcc->data_table, rec);
>>>}
>>>
>>>-static inline uint32_t
>>>+static inline errno_t
>>>sss_mc_get_next_slot_with_hash(struct sss_mc_ctx *mcc,
>>>                                struct sss_mc_rec *start_rec,
>>>-                               uint32_t hash)
>>>+                               uint32_t hash,
>>>+                               uint32_t *_slot)
>>>{
>>>     struct sss_mc_rec *rec;
>>>     uint32_t slot;
>>>
>>>     slot = start_rec->next;
>>>     while (slot != MC_INVALID_VAL) {
>>>+        if (!MC_SLOT_WITHIN_BOUNDS(slot, mcc->dt_size)) {
>>>+            DEBUG(SSSDBG_FATAL_FAILURE,
>>>+                  ("Corrupted fastcache. Slot number too big.\n"));
>>>+            sss_mmap_cache_reset(mcc);
>>>+            return EINVAL;
>>>+        }
>>>+
>>>         rec = MC_SLOT_TO_PTR(mcc->data_table, slot, struct sss_mc_rec);
>>>         if (rec->hash1 == hash || rec->hash2 == hash) {
>>>             break;
>>>@@ -223,23 +238,26 @@ sss_mc_get_next_slot_with_hash(struct sss_mc_ctx *mcc,
>>>         slot = rec->next;
>>>     }
>>>
>>>-    return slot;
>>>+    *_slot = slot;
>>>+
>>>+    return EOK;
>>>}
>>>
>>>-static void sss_mc_rm_rec_from_chain(struct sss_mc_ctx *mcc,
>>>+static errno_t sss_mc_rm_rec_from_chain(struct sss_mc_ctx *mcc,
>>>                                      struct sss_mc_rec *rec,
>>>                                      uint32_t hash)
>>>{
>>>     struct sss_mc_rec *prev = NULL;
>>>     struct sss_mc_rec *cur = NULL;
>>>     uint32_t slot;
>>>+    uint32_t ret;
>>>
>>>     if (hash > MC_HT_ELEMS(mcc->ht_size)) {
>>>         /* It can happen if rec->hash1 and rec->hash2 was the same.
>>>          * or it is invalid hash. It is better to return
>>>          * than trying to access out of bounds memory
>>>          */
>>>-        return;
>>>+        return EOK;
>>                  ^^^
>>           EOK should be returned on if hash is equal to MC_INVALID_VAL
>
>Ok. I also changed the comments (split the error and normal case).
>
>>
>>>     }
>>>
>>>     slot = mcc->hash_table[hash];
>>>@@ -247,18 +265,37 @@ static void sss_mc_rm_rec_from_chain(struct sss_mc_ctx 
>>>*mcc,
>>>         /* record has already been removed. It may happen if rec->hash1 and
>>>          * rec->has2 are the same. (It is not very likely).
>>>          */
>>>-        return;
>>>+        return EOK;
>>>+    }
>>>+
>>>+    if (!MC_SLOT_WITHIN_BOUNDS(slot, mcc->dt_size)) {
>>>+        DEBUG(SSSDBG_FATAL_FAILURE,
>>>+              ("Corrupted fastcache. Slot number too big.\n"));
>>>+        sss_mmap_cache_reset(mcc);
>>>+        return EINVAL;
>>>     }
>>>+
>>>     cur = MC_SLOT_TO_PTR(mcc->data_table, slot, struct sss_mc_rec);
>>>     if (cur == rec) {
>>>         /* rec->next can refer to record without matching hashes.
>>>          * We need to skip this(those) records, because
>>>          * mcc->hash_table[hash] have to refer to valid start of the chain.
>>>          */
>>>-        mcc->hash_table[hash] = sss_mc_get_next_slot_with_hash(mcc, rec, 
>>>hash);
>>>+        ret = sss_mc_get_next_slot_with_hash(mcc, rec, hash,
>>>+                                             &mcc->hash_table[hash]);
>>>+        if (ret != EOK) {
>>>+            return ret;
>>>+        }
>>This condition does not make sense.
>>You can use:
>>       either return ret;
>>       or return sss_mc_get_next_slot_with_hash
>
>Well, I wanted the function to run to the end if nothing went wrong,
>but yes, in this case we return at several places in the function so
>it does not make much sense.
>
>>
>>LS
>
>Thanks for the review Lukas. New patch is attached.
>
>Michal

I tested patch for sssd-1-9 branch on RHEL6
and it works fine. I think it is good alternative for sssd-1-9 branch.

ACK

LS
_______________________________________________
sssd-devel mailing list
[email protected]
https://lists.fedorahosted.org/mailman/listinfo/sssd-devel

Reply via email to