Re: [Freeipa-users] SSSD Crash Causing Inaccessibility

2016-02-01 Thread Lukas Slebodnik
On (29/01/16 14:08), Jeff Hallyburton wrote:
>Lukas,
>
>Installed versions of sssd:
>
># rpm -qa | grep -i sssd
>sssd-common-1.13.0-40.el7_2.1.x86_64
>sssd-ipa-1.13.0-40.el7_2.1.x86_64
>sssd-1.13.0-40.el7_2.1.x86_64
>sssd-krb5-common-1.13.0-40.el7_2.1.x86_64
>sssd-ad-1.13.0-40.el7_2.1.x86_64
>sssd-ldap-1.13.0-40.el7_2.1.x86_64
>sssd-proxy-1.13.0-40.el7_2.1.x86_64
>python-sssdconfig-1.13.0-40.el7_2.1.noarch
>sssd-client-1.13.0-40.el7_2.1.x86_64
>sssd-common-pac-1.13.0-40.el7_2.1.x86_64
>sssd-krb5-1.13.0-40.el7_2.1.x86_64
>
>No core dumps unfortunately.
>
That's a stable version.
On the other hand, we do not test OOM conditions.
Is there anything interesting in /var/log/sssd/sssd.log?

LS

-- 
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project


Re: [Freeipa-users] SSSD Crash Causing Inaccessibility

2016-01-29 Thread Lukas Slebodnik
On (28/01/16 18:37), Jeff Hallyburton wrote:
>Application logs showed this to be due to an OOM error, so no need to chase
>this further.  Thanks for the quick response!
>
Even though it was OOM, I would still be interested in the version of sssd.
An "access after free" error is a bad error.

Do you have a coredump? It might be stored
by abrt or systemd-coredump (coredumpctl).

LS
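
The coredump lookup suggested above could be sketched roughly as follows. This is only a minimal sketch: the helper name find_sssd_cores is hypothetical, and it assumes that coredumpctl (systemd-coredump) and/or abrt-cli may be installed, which is not guaranteed on any given host. Whether either tool actually captured a core depends on local configuration.

```shell
# Hypothetical helper: look for stored core dumps that mention sssd.
# Assumption: coredumpctl and/or abrt-cli may be present; each tool is
# skipped when it is not installed.
find_sssd_cores() {
    found_tool=0
    if command -v coredumpctl >/dev/null 2>&1; then
        found_tool=1
        echo "== coredumpctl =="
        # "sssd" is matched against the command name of the crashed
        # process, so this finds the monitor only; repeat with e.g.
        # sssd_be to look for backend crashes.
        coredumpctl --no-pager list sssd 2>/dev/null || echo "no matching cores"
    fi
    if command -v abrt-cli >/dev/null 2>&1; then
        found_tool=1
        echo "== abrt =="
        abrt-cli list 2>/dev/null | grep -i sssd || echo "no matching problems"
    fi
    [ "$found_tool" -eq 1 ] || echo "neither coredumpctl nor abrt-cli is installed"
    return 0
}
```

Running find_sssd_cores as root gives the most complete listing, since core dumps of system services are usually not visible to unprivileged users.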



Re: [Freeipa-users] SSSD Crash Causing Inaccessibility

2016-01-29 Thread Jeff Hallyburton
Understood.  Unfortunately, this event has been diagnosed and mitigated, so
recurrence is unlikely.  We will respond to this thread if we see any
repeats, however; we fully understand the need for further information here.

Jeff

Jeff Hallyburton
Strategic Systems Engineer
Bloomip Inc.
Web: http://www.bloomip.com

Engineering Support: supp...@bloomip.com
Billing Support: bill...@bloomip.com
Customer Support Portal:  https://my.bloomip.com 


Re: [Freeipa-users] SSSD Crash Causing Inaccessibility

2016-01-29 Thread Jakub Hrozek
On Fri, Jan 29, 2016 at 01:49:15PM +0100, Lukas Slebodnik wrote:
> On (28/01/16 18:37), Jeff Hallyburton wrote:
> >Application logs showed this to be due to an OOM error, so no need to chase
> >this further.  Thanks for the quick response!
> >
> Even though it was OOM, I would still be interested in the version of sssd.
> An "access after free" error is a bad error.
>
> Do you have a coredump? It might be stored
> by abrt or systemd-coredump (coredumpctl).

This problem reminds me of:
https://fedorahosted.org/sssd/ticket/2886

Sadly, that one was also a one-time condition and we could never get to
the root cause from the corefile.

I agree with Lukas; the core would be nice to see.



Re: [Freeipa-users] SSSD Crash Causing Inaccessibility

2016-01-29 Thread Jeff Hallyburton
Lukas,

Installed versions of sssd:

# rpm -qa | grep -i sssd
sssd-common-1.13.0-40.el7_2.1.x86_64
sssd-ipa-1.13.0-40.el7_2.1.x86_64
sssd-1.13.0-40.el7_2.1.x86_64
sssd-krb5-common-1.13.0-40.el7_2.1.x86_64
sssd-ad-1.13.0-40.el7_2.1.x86_64
sssd-ldap-1.13.0-40.el7_2.1.x86_64
sssd-proxy-1.13.0-40.el7_2.1.x86_64
python-sssdconfig-1.13.0-40.el7_2.1.noarch
sssd-client-1.13.0-40.el7_2.1.x86_64
sssd-common-pac-1.13.0-40.el7_2.1.x86_64
sssd-krb5-1.13.0-40.el7_2.1.x86_64

No core dumps unfortunately.

Thanks,

Jeff



Re: [Freeipa-users] SSSD Crash Causing Inaccessibility

2016-01-28 Thread Lukas Slebodnik
On (28/01/16 16:25), Jeff Hallyburton wrote:
>We saw the following happen on a system today, and wanted to follow up:
>
>System became unresponsive to ssh logins with the error:
>
>ssh -v incentives01
>
//snip

># cat /var/log/sssd/sssd.log
>
>(Thu Jan 28 20:15:56 2016) [sssd] [mt_svc_sigkill] (0x0010): [enervee.com][620]
>is not responding to SIGTERM. Sending SIGKILL.
>
>(Thu Jan 28 20:16:27 2016) [sssd] [talloc_log_fn] (0x0010): talloc: access
>after free error - first free may be at src/monitor/monitor.c:2760
>
>
>(Thu Jan 28 20:16:27 2016) [sssd] [talloc_log_fn] (0x0010): Bad talloc
>magic value - access after free
>
There was a crash in sssd. It might explain why you cannot log in.
Which version of sssd do you have?

LS



[Freeipa-users] SSSD Crash Causing Inaccessibility

2016-01-28 Thread Jeff Hallyburton
We saw the following happen on a system today, and wanted to follow up:

System became unresponsive to ssh logins with the error:

ssh -v incentives01

OpenSSH_6.6.1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 4: Applying options for *
debug1: Connecting to incentives01 [172.31.9.16] port 22.
debug1: Connection established.
debug1: identity file /home/jeff.hallyburton/.ssh/id_rsa type -1
debug1: identity file /home/jeff.hallyburton/.ssh/id_rsa-cert type -1
debug1: identity file /home/jeff.hallyburton/.ssh/id_dsa type -1
debug1: identity file /home/jeff.hallyburton/.ssh/id_dsa-cert type -1
debug1: identity file /home/jeff.hallyburton/.ssh/id_ecdsa type -1
debug1: identity file /home/jeff.hallyburton/.ssh/id_ecdsa-cert type -1
debug1: identity file /home/jeff.hallyburton/.ssh/id_ed25519 type -1
debug1: identity file /home/jeff.hallyburton/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6.1
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.6.1
debug1: match: OpenSSH_6.6.1 pat OpenSSH_6.6.1* compat 0x0400
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-md5-...@openssh.com none
debug1: kex: client->server aes128-ctr hmac-md5-...@openssh.com none
debug1: kex: curve25519-sha...@libssh.org need=16 dh_need=16
debug1: kex: curve25519-sha...@libssh.org need=16 dh_need=16
debug1: sending SSH2_MSG_KEX_ECDH_INIT
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ECDSA 89:e0:f8:25:21:db:c9:46:67:14:38:0c:c1:f4:f7:51
debug1: Host 'incentives01' is known and matches the ECDSA host key.
debug1: Found key in /home/jeff.hallyburton/.ssh/known_hosts:7
debug1: ssh_ecdsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
This is a private computer system which is restricted to authorized individuals.

Actual or attempted unauthorized use of this computer system will result in
criminal and/or civil prosecution.

We reserve the right to view, monitor and record activity on the system without
notice or permission. Any information obtained by monitoring, reviewing or
recording is subject to review by law enforcement organizations in connection
with the investigation or prosecution of possible criminal activity on this system.

If you are not an authorized user of this system or do not consent to continued
monitoring, disconnect at this time.
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
debug1: Next authentication method: gssapi-keyex
debug1: No valid Key exchange context
debug1: Next authentication method: gssapi-with-mic
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
Received disconnect from 172.31.9.16: 2: Too many authentication failures for jeff.hallyburton

Ultimately we rebooted the node to restore connectivity.  Once we were
back in, we saw that sssd had crashed due to what looks like a memory
allocation error:

/var/log/sssd/sssd.log

# cat /var/log/sssd/sssd.log

(Thu Jan 28 20:15:56 2016) [sssd] [mt_svc_sigkill] (0x0010): [enervee.com][620] is not responding to SIGTERM. Sending SIGKILL.
(Thu Jan 28 20:16:27 2016) [sssd] [talloc_log_fn] (0x0010): talloc: access after free error - first free may be at src/monitor/monitor.c:2760
(Thu Jan 28 20:16:27 2016) [sssd] [talloc_log_fn] (0x0010): Bad talloc magic value - access after free

/var/log/secure

Jan 28 20:05:48 incentives01 sshd[26145]: Timeout, client not responding.
Jan 28 20:05:48 incentives01 sshd[26142]: pam_unix(sshd:session): session closed for user
Jan 28 20:16:28 incentives01 sshd[14504]: Timeout, client not responding.
Jan 28 20:16:28 incentives01 sshd[14501]: pam_systemd(sshd:session): Failed to release session: Connection timed out
Jan 28 20:16:28 incentives01 sshd[14501]: pam_unix(sshd:session): session closed for user
Jan 28 20:16:28 incentives01 sshd[14501]: pam_sss(sshd:session): Request to sssd failed. Bad address
Jan 28 20:16:29 incentives01 sshd[14501]: fatal: login_init_entry: Cannot find user
Jan 28 20:21:40 incentives01 sshd[26882]: Invalid user from 172.31.8.34

The system may simply have run out of RAM, but we wanted to check whether
there are any known or contributing issues.

Thanks,

Jeff
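
The three sssd.log entries quoted above all carry the debug level 0x0010, which SSSD uses for fatal failures. A small filter such as the following could pull the most severe entries out of a larger log. It is a minimal sketch, not part of SSSD: the helper name sssd_severe_entries is hypothetical, and it assumes each log entry sits on a single line, as it does in the log file itself (the wrapping seen in mail is not present there).

```shell
# Hypothetical helper: keep only the most severe SSSD log entries.
# SSSD tags every entry with a debug-level bitmask; 0x0010 marks fatal
# failures and 0x0020 marks critical failures.
# Reads stdin, or the files given as arguments.
sssd_severe_entries() {
    grep -E '\((0x0010|0x0020)\):' "$@"
}

# Example (usually needs root):
#   sssd_severe_entries /var/log/sssd/sssd.log
```

Matching the level together with the surrounding parentheses and colon avoids false hits on hexadecimal values that appear elsewhere in a message.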


Re: [Freeipa-users] SSSD Crash Causing Inaccessibility

2016-01-28 Thread Jeff Hallyburton
Application logs showed this to be due to an OOM error, so no need to chase
this further.  Thanks for the quick response!

Jeff
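
The thread does not say which application log showed the OOM error. If the kernel OOM killer was involved (an assumption, not something stated above), its messages could be picked out of the kernel log with a filter like this. The helper name oom_events is hypothetical, and the log sources in the comments vary by distribution.

```shell
# Hypothetical helper: pick OOM-killer messages out of kernel log text.
# Assumption: the kernel reports these events with the phrases
# "Out of memory" or "invoked oom-killer".
# Reads stdin, or the files given as arguments.
oom_events() {
    grep -i -E 'out of memory|oom-killer' "$@"
}

# Examples (availability of each source depends on the system):
#   dmesg | oom_events
#   journalctl -k | oom_events
#   oom_events /var/log/messages
```

Comparing the timestamps of any matches with the 20:15 to 20:16 window in sssd.log would show whether memory pressure preceded the SIGKILL of the sssd child process.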
