Re: Kernel regression in head, now unusable for package building

2019-04-08 Thread Antoine Brodin
On Mon, Apr 8, 2019 at 8:36 PM Charlie Li wrote:
> Konstantin Belousov wrote:
> > On Mon, Apr 08, 2019 at 12:10:23AM +0200, Antoine Brodin wrote:
> >> There seems to be a kernel regression in head,  that happened
> >> somewhere between r343921 and r345991.
> >> When launching "poudriere bulk -a",  the ssh session is terminated
> >> when poudriere attempts to clone/start builders (tmpfs mounts, file
> >> copying...),  the jails don't start and the consequence is that we
> >> can't build any package.
> > Are there any more details about the issue?  It is not clear whether the
> > machine survives the event: did the kernel panic, and what are the console
> > messages?  Please provide any more details that you can.
> >
> I just ran into this both on a remote machine and the laptop I'm typing
> on right now. At least the reference jail does start and run, as any
> subsequent poudriere-bulk(8) invocations detect it. The entire login
> session is killed in the process, and the only clue of anything I can
> find (at least in the syslog) is that ntpd exits with Hangup:
>
> Apr  8 14:12:27 ardmore ntpd[74109]: ntpd exiting on signal 1 (Hangup)
>
> This seems to happen randomly, but still quite often. Restarting ntpd
> can help, though ntpd can still exit and kill the login again. Without
> ntpd running, running poudriere-bulk(8) is guaranteed to kill the login.

It should be fixed in https://svnweb.freebsd.org/changeset/base/346029

Antoine
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Kernel regression in head, now unusable for package building

2019-04-08 Thread Charlie Li
Konstantin Belousov wrote:
> On Mon, Apr 08, 2019 at 12:10:23AM +0200, Antoine Brodin wrote:
>> There seems to be a kernel regression in head,  that happened
>> somewhere between r343921 and r345991.
>> When launching "poudriere bulk -a",  the ssh session is terminated
>> when poudriere attempts to clone/start builders (tmpfs mounts, file
>> copying...),  the jails don't start and the consequence is that we
>> can't build any package.
> Are there any more details about the issue?  It is not clear whether the
> machine survives the event: did the kernel panic, and what are the console
> messages?  Please provide any more details that you can.
> 
I just ran into this both on a remote machine and the laptop I'm typing
on right now. At least the reference jail does start and run, as any
subsequent poudriere-bulk(8) invocations detect it. The entire login
session is killed in the process, and the only clue of anything I can
find (at least in the syslog) is that ntpd exits with Hangup:

Apr  8 14:12:27 ardmore ntpd[74109]: ntpd exiting on signal 1 (Hangup)

This seems to happen randomly, but still quite often. Restarting ntpd
can help, though ntpd can still exit and kill the login again. Without
ntpd running, running poudriere-bulk(8) is guaranteed to kill the login.

-- 
Charlie Li
…nope, still don't have an exit line.

(This email address is for mailing list use; replace local-part with
vishwin for off-list communication if possible)





Re: Kernel regression in head, now unusable for package building

2019-04-08 Thread Konstantin Belousov
On Mon, Apr 08, 2019 at 12:10:23AM +0200, Antoine Brodin wrote:
> Hi,
> 
> There seems to be a kernel regression in head,  that happened
> somewhere between r343921 and r345991.
Can you bisect?

I looked over the stated range and I do not see a revision that would be
an obvious candidate for the regression. The most interesting revisions
to try first might be r345955, r345980, and r345982/r345983.
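
For reference, a bisection over the stated range (r343921 known good,
r345991 known bad) can be sketched as below. This is only an illustration
of the search, not a tool from the thread: the function name, the is_bad()
callback, and the placeholder culprit revision are all hypothetical. In
practice is_bad() would mean "update the source tree to that revision,
build and boot the kernel, run poudriere bulk -a, and see whether the
login session is killed". The sketch also assumes that the regression,
once introduced, is present in every later revision.

```python
def first_bad_revision(good, bad, is_bad):
    """Return the lowest revision in (good, bad] for which is_bad() is true.

    Assumes `good` is known good, `bad` is known bad, and every revision
    after the one that introduced the regression is also bad.
    """
    lo, hi = good, bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid   # regression already present at mid
        else:
            lo = mid   # regression introduced after mid
    return hi


# Hypothetical stand-in for a real test run. 345000 is an arbitrary
# placeholder, NOT the actual culprit revision.
PLACEHOLDER_CULPRIT = 345000
tested = []


def is_bad(rev):
    tested.append(rev)
    return rev >= PLACEHOLDER_CULPRIT


print(first_bad_revision(343921, 345991, is_bad), "found after",
      len(tested), "tests")
```

Because each step halves the remaining interval, the 2070 revisions in
this range need at most 12 kernel builds to narrow down, fewer if the
candidate revisions named above are tried first.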

> When launching "poudriere bulk -a",  the ssh session is terminated
> when poudriere attempts to clone/start builders (tmpfs mounts, file
> copying...),  the jails don't start and the consequence is that we
> can't build any package.
Are there any more details about the issue?  It is not clear whether the
machine survives the event: did the kernel panic, and what are the console
messages?  Please provide any more details that you can.

A long time ago there was a similar-sounding bug where mountd(8) signalled the
wrong process, which caused ssh sessions to terminate.

> 
> Cheers,
> 
> Antoine (with hat: portmgr)