On Wed, Sep 04, 2013 at 11:23:54AM +0200, Sumit Bose wrote: > On Wed, Sep 04, 2013 at 08:50:47AM +0200, Jakub Hrozek wrote: > > On Tue, Aug 20, 2013 at 10:32:35PM +0200, Lukas Slebodnik wrote: > > > On (20/08/13 14:17), Jakub Hrozek wrote: > > > >On Tue, Aug 20, 2013 at 02:00:35PM +0200, Lukas Slebodnik wrote: > > > >> ehlo, > > > >> > > > >> I have a question about struct pam_auth_req and crash from BZ972699 > > > >> I thought It is a duplicate of upstream ticket > > > >> https://fedorahosted.org/sssd/ticket/2018 (use after free) > > > >> > > > >> But I anlysed the core dump and it doe not look like use after free. > > > >> I also checked talloc_magic for struct "pam_auth_req" and it is ok. > > > >> > > > >> Here is the definition of structure. > > > >> (gdb) ptype preq > > > >> type = struct pam_auth_req { > > > >> struct cli_ctx *cctx; > > > >> struct sss_domain_info *domain; > > > >> struct pam_data *pd; > > > >> pam_dp_callback_t *callback; > > > >> struct ldb_result *res; > > > >> _Bool check_provider; > > > >> void *data; > > > >> } * > > > >> > > > >> > > > >> And here is output from gdb > > > >> > > > >> (gdb) p *preq > > > >> $1 = {cctx = 0x9f90a0, domain = 0x9a3740, pd = 0xa178f0, callback = 0, > > > >> res = 0x0, check_provider = false, data = 0x0} > > > >> > > > >> #0 0x0000000000000000 in ?? () > > > >> #1 0x00000000004110d1 in pam_dp_process_reply (pending=0x9d3c00, > > > >> ptr=<value optimized out>) at src/responder/pam/pamsrv_dp.c:79 > > > >> #2 0x00007f1aeb0a661a in complete_pending_call_and_unlock ( > > > >> connection=0xa1e3a0, pending=0x9d3c00, message=<value optimized > > > >> out>) > > > >> at dbus-connection.c:2234 > > > >> #3 0x00007f1aeb0a886f in dbus_connection_dispatch > > > >> (connection=0xa1e3a0) > > > >> at dbus-connection.c:4397 > > > >> #4 0x000000000045425e in sbus_dispatch (ev=0x99d380, > > > >> te=<value optimized out>, tv=..., data=<value optimized out>) > > > >> at src/sbus/sssd_dbus_connection.c:104 > > > >> #5 0x00007f1aeb920bd9 in tevent_common_loop_timer_delay (ev=0x99d380) > > > >> at ../tevent_timed.c:254 > > > >> > > > >> 77 dbus_pending_call_unref(pending); > > > >> 78 dbus_message_unref(msg); > > > >> 79 preq->callback(preq); <<<<<<< CHASH HERE > > > >> 80 } > > > >> 81 > > > >> > > > >> sssd_pam crashed because preq->callback was NULL. > > > >> > > > >> I would like to describe struct pam_auth_req more deeply. > > > >> struct pam_auth_req is created in function pam_forwarder > > > >> form file "src/responder/pam/pamsrv_cmd.c". > > > >> > > > >> Most of structure members are initialized in this function, but > > > >> *calback* > > > >> is initialized in function pam_dom_forwarder. > > > >> 1173 if (!NEED_CHECK_PROVIDER(preq->domain->provider)) { > > > >> 1174 preq->callback = pam_reply; <<<<HERE > > > >> 1175 ret = LOCAL_pam_handler(preq); > > > >> 1176 } > > > >> 1177 else { > > > >> 1178 preq->callback = pam_reply; <<<<< OR HERE > > > > > > > >It was probably this one, the other call is only called for > > > >id_provider=local > > > > > > > preq->callback is NULL, pam_dom_forwarder could not be called. > > > Output from gdb is in this mail few lines before. > > > > > > >> 1179 ret = pam_dp_send_req(preq, SSS_CLI_SOCKET_TIMEOUT/2); > > > >> 1180 DEBUG(4, ("pam_dp_send_req returned %d\n", ret)); > > > >> 1181 } > > > >> 1182 > > > >> > > > >> There are 3 places where is pam_dom_forwarder called and it is called > > > >> only > > > >> in some condition (pam_check_user_search returned EOK) > > > >> Mostly it looks like following lines: > > > >> 852 ret = pam_check_user_search(preq); > > > >> 853 if (ret == EOK) { > > > >> 854 pam_dom_forwarder(preq); > > > >> 855 } > > > >> > > > >> > > > >> And my question are: > > > >> 1. Do we need to initialize pam_auth_req->callback only in function > > > >> pam_dom_forwarder? > > > > > > > >We could also move the callback to src/responder/pam/pamsrv_dp.c and > > > >either assign it from pam_dp_send_req() or call it directly from > > > >pam_dp_process_reply() because the way I read the code we can only ever > > > >call pam_dp_send_req with cb == pam_reply; > > > > > > > >> 2. Should we check preq->callback for NULL? > > > > > > > >Maybe, as a sanity check, but we still should fix the real bug. > > > > > > > >> 3. Do you have any idea how to reproduce it? > > > > > > > >Can we ask the reporter to run with valgrind? Also, did we rule out > > > >that sssd_be might be crashing or being restarted in between the requst > > > >is send and before it finishes? > > > > > > > There isn't a reproducer for this bug. > > > Reporter wrote: "Every now and then, sssd terminates" > > > > > > >> > > > >> I really don't know what is right solution. > > > >> > > > >> Thank you very much for any ideas. > > > >> > > > >> LS > > > > Hi, > > > > are there any other ideas on how to proceed with this problem? I haven't > > found any.. > > Maybe new logs with a higher debug level would help. Form the old logs > attached to the BZ it looks that the auth requests just needs a long > time and the client (sshd) start a new request before the old is > finished. This would indicate the use after free issue which does not > match the core dump. New logs would help to find out how SSSD is getting > into the state where the callback is NULL. > > bye, > Sumit
Lukas analyzed the core once again (the reporter uploaded new corefiles after I sent the mail to sssd-devel) and we're now reasonably sure he is hitting a known bug where, as you said, the request is taking a long time, the client times out and the responder accesses freed memory that was freed as a result of client timeout destructor. I will just build him a scratch build, maybe he can test it out. Looks like he's running CentOS anyway, so he might now worry about support :) Thanks for another look at the issue. _______________________________________________ sssd-devel mailing list [email protected] https://lists.fedorahosted.org/mailman/listinfo/sssd-devel
