Re: bisected: 4.18-rc* regression: x86-32 troubles (with timers?)
> >> Now this seems more relevant:
> >>
> >> mroos@rx100s2:~/linux$ nice git bisect good
> >> 24dea04767e6e5175f4750770281b0c17ac6a2fb is the first bad commit
> >> commit 24dea04767e6e5175f4750770281b0c17ac6a2fb
> >> Author: Daniel Borkmann
> >> Date: Fri May 4 01:08:23 2018 +0200
> >>
> >> bpf, x32: remove ld_abs/ld_ind
> >>
> >> Since LD_ABS/LD_IND instructions are now removed from the core and
> >> reimplemented through a combination of inlined BPF instructions and
> >> a slow-path helper, we can get rid of the complexity from x32 JIT.
> >
> > This does seem much more likely than the previous bisection, given
> > that you ended up in an x86-32 specific commit (the subject says x32,
> > but that is a mistake). I also checked that systemd indeed does
> > call into bpf in a number of places, possibly for the journald socket.
> >
> > OTOH, it's still hard to tell how that commit can have ended up
> > corrupting the clock read function in systemd. To cross-check,
> > could you try reverting that commit on the latest kernel and see
> > if it still works?
>
> I would be curious as well about that whether revert would make it
> work. What's the value of sysctl net.core.bpf_jit_enable ? Does it
> change anything if you set it to 0 (only interpreter) or 1 (JIT
> enabled). Seems a bit strange to me that bisect ended at this commit
> given the issue you have. The JIT itself was also new in this window
> fwiw. In any case some more debug info would be great to have.

net.core.bpf_jit_enable is 1. Since it breaks bootup, I can not easily
change the value at runtime (it would be postfactum). Do you mean
changing the CONFIG_BPF_JIT_ALWAYS_ON=y option?

Anyway, I started compile of v4.18-rc5 that was the latest I tested,
with the commit in question reverted. Will see if I can test tomorrow
morning. But I will leave tomorrow for a week and can only test further
things if they happen to boot fine (no manual reboot possible for a
week).

--
Meelis Roos (mr...@linux.ee)
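For readers who want to check the sysctl asked about above, here is a minimal sketch that reads it from procfs and prints what the value means. The function name `describe_bpf_jit` is made up for this illustration. The meanings of 0 and 1 are the ones given in the message; the meaning of 2 is taken from the kernel's sysctl documentation and is not mentioned in the thread.

```python
from pathlib import Path

# 0 and 1 are described in the thread; 2 (JIT plus debug output) comes
# from the kernel's sysctl documentation, not from the thread.
JIT_MODES = {
    0: "interpreter only",
    1: "JIT enabled",
    2: "JIT enabled, with debug output",
}


def describe_bpf_jit(path="/proc/sys/net/core/bpf_jit_enable"):
    """Return a human-readable description of the BPF JIT setting.

    describe_bpf_jit is a hypothetical helper written for this example.
    """
    try:
        raw = Path(path).read_text().strip()
    except OSError as exc:
        # The file can be missing (no BPF JIT in this kernel) or
        # unreadable for the current user.
        return f"unavailable ({exc.__class__.__name__})"
    try:
        value = int(raw)
    except ValueError:
        return f"unrecognised value {raw!r}"
    return f"{value}: {JIT_MODES.get(value, 'unknown mode')}"


if __name__ == "__main__":
    print("net.core.bpf_jit_enable =", describe_bpf_jit())
```

This only reports the current value. It does not solve the problem raised in the reply, namely that the setting would have to change before the failing services start.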
Re: 4.18-rc* regression: x86-32 troubles (with timers?)
> > > Everything below here is 'bad', which can be an indication that you
> > > misclassified one of the commits above as 'good' when it should have
> > > been 'bad'. The most likely explanations are that you either typed the
> > > 'git bisect good' by accident, or that the failure is not 100% reliable,
> > > and it sometimes works fine even on a broken kernel.
> > >
> > > 0bc5fe857274133ca0 follows directly after 3a443bd6dd7c, "net/9p: correct
> > > the variable name in v9fs_get_trans_by_name() comment", which is marked
> > > "good", and can't really be good if 0bc5fe85727413 is bad and you are not
> > > using the 'qed' driver.
> > >
> > > I'd retest 3a443bd6dd7c again to see if that should have been 'bad', and
> > > if it was, test v4.17-rc4, which is what the net-next tree was based on.
> >
> > Yes, the same prebuilt 3a443bd6dd7c appeared to be bad when retesting
> > it. Building v4.17-rc4 now.
>
> v4.17-rc4 seems good after 2 reboots.

The new bisect seems to have also led me to a strange commit. This time I
tried to be careful and tested most on two reboots before classifying as
good.

However, f4e3ec0d573e was suspicious - it failed to autoload e1000 but had
no other errors. On both boots with this kernel, modprobe e1000 and ifup -a
made the system work so I assumed it was good, while it might not have been.
Will try bisecting with f4e3ec0d573e marked bad.

mroos@rx100s2:~/linux$ nice git bisect bad
9816dd35ececc095f3e3be29d30d3adc755908d9 is the first bad commit
commit 9816dd35ececc095f3e3be29d30d3adc755908d9
Author: Jakub Kicinski
Date: Thu May 3 18:37:12 2018 -0700

    nfp: bpf: perf event output helpers support

    Add support for the perf_event_output family of helpers.

    The implementation on the NFP will not match the host code exactly.
    The state of the host map and rings is unknown to the device, hence
    device can't return errors when rings are not installed. The device
    simply packs the data into a firmware notification message and sends
    it over to the host, returning success to the program.

    There is no notion of a host CPU on the device when packets are being
    processed. Device will only offload programs which set
    BPF_F_CURRENT_CPU. Still, if map index doesn't match CPU no error
    will be returned (see above).

    Dropped/lost firmware notification messages will not cause "lost
    events" event on the perf ring, they are only visible via device
    error counters.

    Firmware notification messages may also get reordered in respect
    to the packets which caused their generation.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Signed-off-by: Daniel Borkmann

:04 04 00caca934fcbf1d5740a46d71e4d08e1f3ab8c7a 606c7bdd23e357f0902219630579c22a0ed0380c M drivers

mroos@rx100s2:~/linux$ nice git bisect log
git bisect start
# bad: [3a443bd6dd7c43bf5763779309514bf3e7c1c3eb] net/9p: correct the variable name in v9fs_get_trans_by_name() comment
git bisect bad 3a443bd6dd7c43bf5763779309514bf3e7c1c3eb
# good: [75bc37fefc4471e718ba8e651aa74673d4e0a9eb] Linux 4.17-rc4
git bisect good 75bc37fefc4471e718ba8e651aa74673d4e0a9eb
# good: [1504269814263c9676b4605a6a91e14dc6ceac21] Merge tag 'linux-kselftest-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
git bisect good 1504269814263c9676b4605a6a91e14dc6ceac21
# skip: [c7d28c9df292a49904446dca15b2037ee8f874af] net: dsa: b53: Add support for reading PHY statistics
git bisect skip c7d28c9df292a49904446dca15b2037ee8f874af
# good: [173965fbfba596c02fa128966c2a33cb88afcd7f] tools/bpf: add a test for bpf_get_stack with raw tracepoint prog
git bisect good 173965fbfba596c02fa128966c2a33cb88afcd7f
# good: [795d8098d32b6bef3d0821588cb6e4b1f369a7a4] liquidio VF: indicate that disabling rx vlan offload is not allowed
git bisect good 795d8098d32b6bef3d0821588cb6e4b1f369a7a4
# good: [90278871d4b0da39c84fc9aa4929b0809dc7cf3c] Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
git bisect good 90278871d4b0da39c84fc9aa4929b0809dc7cf3c
# good: [4e1ec56cdc59746943b2acfab3c171b930187bbe] bpf: add skb_load_bytes_relative helper
git bisect good 4e1ec56cdc59746943b2acfab3c171b930187bbe
# good: [f4e3ec0d573e238f383b3da365127002579a07d6] bpf: replace map pointer loads before calling into offloads
git bisect good f4e3ec0d573e238f383b3da365127002579a07d6
# bad: [e94fa1d93117e7f1eb783dc9cae6c7065099] bpf, xskmap: fix crash in xsk_map_alloc error path handling
git bisect bad e94fa1d93117e7f1eb783dc9cae6c7065099
# bad: [e64d52569f6e847495091db40ab58d2d379748ef] tools: bpftool: move get_possible_cpus() to common code
git bisect bad e64d52569f6e847495091db40ab58d2d379748ef
# bad: [b4264c96b5cbc00c4c07deb9fbab928d43dffcf9] nfp: bpf: rewrite map pointers with NFP TIDs
git bisect bad b4264c96b5cbc00c4c07deb9fbab928d43dffcf9
# bad:
Re: 4.18-rc* regression: x86-32 troubles (with timers?)
> > Everything below here is 'bad', which can be an indication that you
> > misclassified one of the commits above as 'good' when it should have
> > been 'bad'. The most likely explanations are that you either typed the
> > 'git bisect good' by accident, or that the failure is not 100% reliable,
> > and it sometimes works fine even on a broken kernel.
> >
> > 0bc5fe857274133ca0 follows directly after 3a443bd6dd7c, "net/9p: correct the
> > variable name in v9fs_get_trans_by_name() comment", which is marked "good",
> > and can't really be good if 0bc5fe85727413 is bad and you are not using the
> > 'qed' driver.
> >
> > I'd retest 3a443bd6dd7c again to see if that should have been 'bad', and
> > if it was, test v4.17-rc4, which is what the net-next tree was based on.
>
> Yes, the same prebuilt 3a443bd6dd7c appeared to be bad when retesting
> it. Building v4.17-rc4 now.

v4.17-rc4 seems good after 2 reboots.

--
Meelis Roos (mr...@ut.ee) http://www.cs.ut.ee/~mroos/
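How much confidence "good after 2 reboots" gives depends on how often a broken kernel actually fails, which the thread does not say. As a rough worked example, assuming each boot of a broken kernel fails independently with some probability p (a made-up parameter; both function names are hypothetical), the chance of wrongly calling it good after n clean boots is (1 - p)^n:

```python
def false_good_probability(p_fail_per_boot, boots):
    """Chance that a broken kernel passes every one of `boots` test boots.

    Assumes each boot fails independently with probability
    p_fail_per_boot; that rate is an assumption, not a measured value.
    """
    if not 0.0 <= p_fail_per_boot <= 1.0:
        raise ValueError("p_fail_per_boot must be between 0 and 1")
    if boots < 1:
        raise ValueError("boots must be at least 1")
    return (1.0 - p_fail_per_boot) ** boots


def boots_needed(p_fail_per_boot, max_false_good):
    """Smallest number of clean boots that keeps the false-good chance
    at or below max_false_good."""
    if not 0.0 < p_fail_per_boot <= 1.0:
        raise ValueError("p_fail_per_boot must be in (0, 1]")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be in (0, 1)")
    boots = 1
    while false_good_probability(p_fail_per_boot, boots) > max_false_good:
        boots += 1
    return boots


if __name__ == "__main__":
    # If a broken kernel failed on only half of its boots, two clean
    # boots would still misclassify it one time in four.
    print(false_good_probability(0.5, 2))   # 0.25
    print(boots_needed(0.5, 0.05))          # 5
```

If the failure turns out to be fully reproducible on the affected machines, one boot per commit is enough and this calculation does not matter.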
Re: 4.18-rc* regression: x86-32 troubles (with timers?)
> Everything below here is 'bad', which can be an indication that you
> misclassified one of the commits above as 'good' when it should have
> been 'bad'. The most likely explanations are that you either typed the
> 'git bisect good' by accident, or that the failure is not 100% reliable,
> and it sometimes works fine even on a broken kernel.
>
> 0bc5fe857274133ca0 follows directly after 3a443bd6dd7c, "net/9p: correct the
> variable name in v9fs_get_trans_by_name() comment", which is marked "good",
> and can't really be good if 0bc5fe85727413 is bad and you are not using the
> 'qed' driver.
>
> I'd retest 3a443bd6dd7c again to see if that should have been 'bad', and
> if it was, test v4.17-rc4, which is what the net-next tree was based on.

Yes, the same prebuilt 3a443bd6dd7c appeared to be bad when retesting
it. Building v4.17-rc4 now.

--
Meelis Roos (mr...@linux.ee)
Re: 4.18-rc* regression: x86-32 troubles (with timers?)
On Sun, Jul 15, 2018 at 5:05 PM, Meelis Roos wrote:
>> > > I then tried multiple other machines. All x86-64 machines seem
>> > > unaffected, some x86-32 machines are affected (Athlon with AMD750
>> > > chipset, Fujitsu RX100-S2 with P4-3.4, and P4 with Intel 865 chipset),
>> > > some very similar x86-32 machines are unaffected. I have different
>> > > customized kernel configuration on them, so far I have not pinpointed
>> > > any configuration option to be at fault.
>> > >
>> > > All machines run Debian unstable.
>> > >
>> > > 4.17.0 was working fine.
>> > >
>> > > Will continue with bisecting between 4.17.0 and
>> > > 4.18.0-rc1-00023-g9ffc59d57228.
>
> Bisection has been finished (I'm usually away from the problematic
> computers in summer), result is strange and seems unrelated:
>
> 0bc5fe857274133ca028ebb15ff2e8549a369916 is the first bad commit
> commit 0bc5fe857274133ca028ebb15ff2e8549a369916
> Author: Sudarsana Reddy Kalluru
> Date: Sat May 5 18:42:59 2018 -0700
>
> qed*: Refactor mf_mode to consist of bits.

Agreed, that isn't the one you were looking for.

> `mf_mode' field indicates the multi-partitioning mode the device is
> configured to. This method doesn't scale very well, adding a new MF mode
> requires going over all the existing conditions, and deciding whether
> those are needed for the new mode or not.
> The patch defines a set of bit-fields for modes which are derived
> according to the mode info shared by the MFW and all the configuration
> would be made according to those. To add a new mode, there would be a
> single place where we'll need to go and choose which bits apply and
> which don't.
>
> Signed-off-by: Sudarsana Reddy Kalluru
> Signed-off-by: Ariel Elior
> Signed-off-by: David S. Miller
>
> :04 04 a3572846e1afb9ccfa9c4a84b0135a0057ade66f bdb7b28725a4f1bffe79ee384a3603b3127d6fdb M drivers
> :04 04 f90c7f26fd8445afa48c6679ed68fed294b23d7f 52119c547a82b268b5c173d3df94e267cc1297a0 M include
> mroos@rx100s2:~/linux$ nice git bisect log
> git bisect start
> # good: [29dcea88779c856c7dc92040a0c01233263101d4] Linux 4.17
> git bisect good 29dcea88779c856c7dc92040a0c01233263101d4
> # good: [e27c49291a7fe9dc415c9fcab5bd781ec82dfe04] x86: Convert x86_platform_ops to timespec64
> git bisect good e27c49291a7fe9dc415c9fcab5bd781ec82dfe04
> # bad: [1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> git bisect bad 1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21
> # bad: [1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> git bisect bad 1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21
> # good: [135c5504a600ff9b06e321694fbcac78a9530cd4] Merge tag 'drm-next-2018-06-06-1' of git://anongit.freedesktop.org/drm/drm
> git bisect good 135c5504a600ff9b06e321694fbcac78a9530cd4
> # bad: [ffbc9197b4721634dc6c0fefa9b31e565fa89cee] wcn36xx: improve debug and error messages for SMD
> git bisect bad ffbc9197b4721634dc6c0fefa9b31e565fa89cee
> # good: [3a443bd6dd7c43bf5763779309514bf3e7c1c3eb] net/9p: correct the variable name in v9fs_get_trans_by_name() comment
> git bisect good 3a443bd6dd7c43bf5763779309514bf3e7c1c3eb
> # bad: [93c65d13d8a0b7c272868d4a9779f96fc973df26] vmxnet3: Replace msleep(1) with usleep_range()
> git bisect bad 93c65d13d8a0b7c272868d4a9779f96fc973df26
> # good: [4bc871984f7cb5b2dec3ae64b570cb02f9ce2227] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
> git bisect good 4bc871984f7cb5b2dec3ae64b570cb02f9ce2227

Everything below here is 'bad', which can be an indication that you
misclassified one of the commits above as 'good' when it should have
been 'bad'. The most likely explanations are that you either typed the
'git bisect good' by accident, or that the failure is not 100% reliable,
and it sometimes works fine even on a broken kernel.

0bc5fe857274133ca0 follows directly after 3a443bd6dd7c, "net/9p: correct the
variable name in v9fs_get_trans_by_name() comment", which is marked "good",
and can't really be good if 0bc5fe85727413 is bad and you are not using the
'qed' driver.

I'd retest 3a443bd6dd7c again to see if that should have been 'bad', and
if it was, test v4.17-rc4, which is what the net-next tree was based on.

      Arnd
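The pattern described above (a run of 'bad' verdicts ending at the commit right after one marked 'good') can be reproduced with a toy model. This is only a sketch: it assumes a linear history of 100 numbered commits, whereas real kernel history contains merges, and the positions 37 and 49 are invented for the illustration.

```python
def bisect(n_commits, tests_bad):
    """Return the index that bisection reports as the first bad commit.

    Commits are numbered 0..n_commits-1. Commit 0 is assumed good and
    the last commit is assumed bad. tests_bad(i) returns True when
    commit i is judged bad by the tester.
    """
    good, bad = 0, n_commits - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if tests_bad(mid):
            bad = mid
        else:
            good = mid
    return bad


N_COMMITS = 100
FIRST_BAD = 37       # hypothetical position of the real regression
MISJUDGED = 49       # hypothetical broken commit that booted fine once


def reliable(i):
    """Every broken commit is correctly judged bad."""
    return i >= FIRST_BAD


def one_wrong_good(i):
    """Same as reliable(), except one broken commit is marked good."""
    if i == MISJUDGED:
        return False
    return i >= FIRST_BAD


if __name__ == "__main__":
    print(bisect(N_COMMITS, reliable))        # 37
    print(bisect(N_COMMITS, one_wrong_good))  # 50
```

With the single wrong verdict, every later step comes back 'bad' and the search converges on commit 50, the one directly after the misjudged commit 49, even though the real regression sits at 37. That is the same shape as the log quoted above.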
Re: 4.18-rc* regression: x86-32 troubles (with timers?)
> > > I tried 4.18.0-rc1-00023-g9ffc59d57228 and now
> > > 4.18.0-rc3-00113-gfc36def997cf on a 32-bit server and then some other
> > > 32-bit machines, and got half-failed bootup - kernel and userspace come
> > > up but some services fail to start, including network and
> > > systemd-journald:
> > >
> > > systemd-journald[85]: Assertion 'clock_gettime(map_clock_id(clock_id),
> > > ) == 0' failed at ../src/basic/time-util.c:53, function now().
> > > Aborting.
> > >
> > > I then tried multiple other machines. All x86-64 machines seem
> > > unaffected, some x86-32 machines are affected (Athlon with AMD750
> > > chipset, Fujitsu RX100-S2 with P4-3.4, and P4 with Intel 865 chipset),
> > > some very similar x86-32 machines are unaffected. I have different
> > > customized kernel configuration on them, so far I have not pinpointed
> > > any configuration option to be at fault.
> > >
> > > All machines run Debian unstable.
> > >
> > > 4.17.0 was working fine.
> > >
> > > Will continue with bisecting between 4.17.0 and
> > > 4.18.0-rc1-00023-g9ffc59d57228.

Bisection has been finished (I'm usually away from the problematic
computers in summer), result is strange and seems unrelated:

0bc5fe857274133ca028ebb15ff2e8549a369916 is the first bad commit
commit 0bc5fe857274133ca028ebb15ff2e8549a369916
Author: Sudarsana Reddy Kalluru
Date: Sat May 5 18:42:59 2018 -0700

    qed*: Refactor mf_mode to consist of bits.

    `mf_mode' field indicates the multi-partitioning mode the device is
    configured to. This method doesn't scale very well, adding a new MF mode
    requires going over all the existing conditions, and deciding whether
    those are needed for the new mode or not.
    The patch defines a set of bit-fields for modes which are derived
    according to the mode info shared by the MFW and all the configuration
    would be made according to those. To add a new mode, there would be a
    single place where we'll need to go and choose which bits apply and
    which don't.

    Signed-off-by: Sudarsana Reddy Kalluru
    Signed-off-by: Ariel Elior
    Signed-off-by: David S. Miller

:04 04 a3572846e1afb9ccfa9c4a84b0135a0057ade66f bdb7b28725a4f1bffe79ee384a3603b3127d6fdb M drivers
:04 04 f90c7f26fd8445afa48c6679ed68fed294b23d7f 52119c547a82b268b5c173d3df94e267cc1297a0 M include

mroos@rx100s2:~/linux$ nice git bisect log
git bisect start
# good: [29dcea88779c856c7dc92040a0c01233263101d4] Linux 4.17
git bisect good 29dcea88779c856c7dc92040a0c01233263101d4
# good: [e27c49291a7fe9dc415c9fcab5bd781ec82dfe04] x86: Convert x86_platform_ops to timespec64
git bisect good e27c49291a7fe9dc415c9fcab5bd781ec82dfe04
# bad: [1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect bad 1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21
# bad: [1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect bad 1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21
# good: [135c5504a600ff9b06e321694fbcac78a9530cd4] Merge tag 'drm-next-2018-06-06-1' of git://anongit.freedesktop.org/drm/drm
git bisect good 135c5504a600ff9b06e321694fbcac78a9530cd4
# bad: [ffbc9197b4721634dc6c0fefa9b31e565fa89cee] wcn36xx: improve debug and error messages for SMD
git bisect bad ffbc9197b4721634dc6c0fefa9b31e565fa89cee
# good: [3a443bd6dd7c43bf5763779309514bf3e7c1c3eb] net/9p: correct the variable name in v9fs_get_trans_by_name() comment
git bisect good 3a443bd6dd7c43bf5763779309514bf3e7c1c3eb
# bad: [93c65d13d8a0b7c272868d4a9779f96fc973df26] vmxnet3: Replace msleep(1) with usleep_range()
git bisect bad 93c65d13d8a0b7c272868d4a9779f96fc973df26
# good: [4bc871984f7cb5b2dec3ae64b570cb02f9ce2227] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect good 4bc871984f7cb5b2dec3ae64b570cb02f9ce2227
# bad: [38aa51c134b56b7ea61bea79b428c5fbcd95f285] net/mlx5e: Support offloaded TC flows with no matches on headers
git bisect bad 38aa51c134b56b7ea61bea79b428c5fbcd95f285
# bad: [00483690552c5fb6aa30bf3acb75b0ee89b4c0fd] tcp: Add mark for TIMEWAIT sockets
git bisect bad 00483690552c5fb6aa30bf3acb75b0ee89b4c0fd
# bad: [3e50d2da5850dd126b3e6a6e4387620d55b71db4] microchip_t1: Add driver for Microchip LAN87XX T1 PHYs
git bisect bad 3e50d2da5850dd126b3e6a6e4387620d55b71db4
# bad: [dac0490718bd17df5e3995ffca14255e5f9ed22d] bnxt_en: Check unsupported speeds in bnxt_update_link() on PF only.
git bisect bad dac0490718bd17df5e3995ffca14255e5f9ed22d
# bad: [9d4927f0d3760d8f10727c3035121d2677108f44] Merge branch 'ipv6-misc'
git bisect bad 9d4927f0d3760d8f10727c3035121d2677108f44
# bad: [cac6f691546b9efd50c31c0db97fe50d0357104a] qed: Add support for Unified Fabric Port.
git bisect bad cac6f691546b9efd50c31c0db97fe50d0357104a
# bad: [27bf96e32c92599dc7523b36d6c761fc8312c8c0] qed: Remove unused data member 'is_mf_default'.
git bisect bad 27bf96e32c92599dc7523b36d6c761fc8312c8c0
# bad:
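The journald assertion quoted at the top of this message means a clock_gettime() call returned non-zero. A quick way to poke at the same call from userspace on a test kernel, without going through systemd, might look like the sketch below. The list of clocks is an assumption (the thread does not say which clock id journald requested), and `probe_clocks` is a hypothetical helper name.

```python
import time

# Assumed set of clocks to try; the thread does not say which clock id
# the failing call used.
CLOCK_NAMES = ["CLOCK_REALTIME", "CLOCK_MONOTONIC", "CLOCK_BOOTTIME"]


def probe_clocks(names=CLOCK_NAMES):
    """Call clock_gettime() for each named clock and collect the outcome.

    Returns a dict mapping the clock name either to the time read
    (a float, in seconds) or to a short string describing why no value
    could be read.
    """
    results = {}
    for name in names:
        clock_id = getattr(time, name, None)
        if clock_id is None or not hasattr(time, "clock_gettime"):
            results[name] = "not available on this platform"
            continue
        try:
            results[name] = time.clock_gettime(clock_id)
        except OSError as exc:
            # This is the case the systemd assertion guards against.
            results[name] = f"failed: {exc}"
    return results


if __name__ == "__main__":
    for name, outcome in probe_clocks().items():
        print(f"{name}: {outcome}")
```

Python's `time.clock_gettime` goes through the C library, so a clean result here would not by itself rule out a problem on whatever path journald took on the affected machines.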
Re: 4.18-rc* regression: x86-32 troubles (with timers?)
> > > I tried 4.18.0-rc1-00023-g9ffc59d57228 and now
> > > 4.18.0-rc3-00113-gfc36def997cf on a 32-bit server and then some other
> > > 32-bit machines, and got half-failed bootup - kernel and userspace come
> > > up but some services fail to start, including network and
> > > systemd-journald:
> > >
> > > systemd-journald[85]: Assertion 'clock_gettime(map_clock_id(clock_id),
> > > ) == 0' failed at ../src/basic/time-util.c:53, function now().
> > > Aborting.
> > >
> > > I then tried multiple other machines. All x86-64 machines seem
> > > unaffected, some x86-32 machines are affected (Athlon with AMD750
> > > chipset, Fujitsu RX100-S2 with P4-3.4, and P4 with Intel 865 chipset),
> > > some very similar x86-32 machines are unaffected. I have different
> > > customized kernel configuration on them, so far I have not pinpointed
> > > any configuration option to be at fault.
> > >
> > > All machines run Debian unstable.
> > >
> > > 4.17.0 was working fine.
> > >
> > > Will continue with bisecting between 4.17.0 and
> > > 4.18.0-rc1-00023-g9ffc59d57228.

Bisection has been finished (I'm usually away from the problematic
computers in summer), result is strange and seems unrelated:

0bc5fe857274133ca028ebb15ff2e8549a369916 is the first bad commit
commit 0bc5fe857274133ca028ebb15ff2e8549a369916
Author: Sudarsana Reddy Kalluru
Date: Sat May 5 18:42:59 2018 -0700

    qed*: Refactor mf_mode to consist of bits.

    `mf_mode' field indicates the multi-partitioning mode the device is
    configured to. This method doesn't scale very well, adding a new MF mode
    requires going over all the existing conditions, and deciding whether
    those are needed for the new mode or not.
    The patch defines a set of bit-fields for modes which are derived
    according to the mode info shared by the MFW and all the configuration
    would be made according to those. To add a new mode, there would be a
    single place where we'll need to go and choose which bits apply and
    which don't.

    Signed-off-by: Sudarsana Reddy Kalluru
    Signed-off-by: Ariel Elior
    Signed-off-by: David S. Miller

:04 04 a3572846e1afb9ccfa9c4a84b0135a0057ade66f bdb7b28725a4f1bffe79ee384a3603b3127d6fdb M drivers
:04 04 f90c7f26fd8445afa48c6679ed68fed294b23d7f 52119c547a82b268b5c173d3df94e267cc1297a0 M include

mroos@rx100s2:~/linux$ nice git bisect log
git bisect start
# good: [29dcea88779c856c7dc92040a0c01233263101d4] Linux 4.17
git bisect good 29dcea88779c856c7dc92040a0c01233263101d4
# good: [e27c49291a7fe9dc415c9fcab5bd781ec82dfe04] x86: Convert x86_platform_ops to timespec64
git bisect good e27c49291a7fe9dc415c9fcab5bd781ec82dfe04
# bad: [1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect bad 1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21
# bad: [1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect bad 1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21
# good: [135c5504a600ff9b06e321694fbcac78a9530cd4] Merge tag 'drm-next-2018-06-06-1' of git://anongit.freedesktop.org/drm/drm
git bisect good 135c5504a600ff9b06e321694fbcac78a9530cd4
# bad: [ffbc9197b4721634dc6c0fefa9b31e565fa89cee] wcn36xx: improve debug and error messages for SMD
git bisect bad ffbc9197b4721634dc6c0fefa9b31e565fa89cee
# good: [3a443bd6dd7c43bf5763779309514bf3e7c1c3eb] net/9p: correct the variable name in v9fs_get_trans_by_name() comment
git bisect good 3a443bd6dd7c43bf5763779309514bf3e7c1c3eb
# bad: [93c65d13d8a0b7c272868d4a9779f96fc973df26] vmxnet3: Replace msleep(1) with usleep_range()
git bisect bad 93c65d13d8a0b7c272868d4a9779f96fc973df26
# good: [4bc871984f7cb5b2dec3ae64b570cb02f9ce2227] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect good 4bc871984f7cb5b2dec3ae64b570cb02f9ce2227
# bad: [38aa51c134b56b7ea61bea79b428c5fbcd95f285] net/mlx5e: Support offloaded TC flows with no matches on headers
git bisect bad 38aa51c134b56b7ea61bea79b428c5fbcd95f285
# bad: [00483690552c5fb6aa30bf3acb75b0ee89b4c0fd] tcp: Add mark for TIMEWAIT sockets
git bisect bad 00483690552c5fb6aa30bf3acb75b0ee89b4c0fd
# bad: [3e50d2da5850dd126b3e6a6e4387620d55b71db4] microchip_t1: Add driver for Microchip LAN87XX T1 PHYs
git bisect bad 3e50d2da5850dd126b3e6a6e4387620d55b71db4
# bad: [dac0490718bd17df5e3995ffca14255e5f9ed22d] bnxt_en: Check unsupported speeds in bnxt_update_link() on PF only.
git bisect bad dac0490718bd17df5e3995ffca14255e5f9ed22d
# bad: [9d4927f0d3760d8f10727c3035121d2677108f44] Merge branch 'ipv6-misc'
git bisect bad 9d4927f0d3760d8f10727c3035121d2677108f44
# bad: [cac6f691546b9efd50c31c0db97fe50d0357104a] qed: Add support for Unified Fabric Port.
git bisect bad cac6f691546b9efd50c31c0db97fe50d0357104a
# bad: [27bf96e32c92599dc7523b36d6c761fc8312c8c0] qed: Remove unused data member 'is_mf_default'.
git bisect bad 27bf96e32c92599dc7523b36d6c761fc8312c8c0
# bad:
Re: 4.18-rc* regression: x86-32 troubles (with timers?)
On Wed 2018-07-04 14:41:08, Meelis Roos wrote:
> I tried 4.18.0-rc1-00023-g9ffc59d57228 and now
> 4.18.0-rc3-00113-gfc36def997cf on a 32-bit server and then some other
> 32-bit machines, and got half-failed bootup - kernel and userspace come
> up but some services fail to start, including network and
> systemd-journald:
>
> systemd-journald[85]: Assertion 'clock_gettime(map_clock_id(clock_id), )
> == 0' failed at ../src/basic/time-util.c:53, function now(). Aborting.
>
> I then tried multiple other machines. All x86-64 machines seem
> unaffected, some x86-32 machines are affected (Athlon with AMD750
> chipset, Fujitsu RX100-S2 with P4-3.4, and P4 with Intel 865 chipset),
> some very similar x86-32 machines are unaffected. I have different
> customized kernel configuration on them, so far I have not pinpointed
> any configuration option to be at fault.
>
> All machines run Debian unstable.
>
> 4.17.0 was working fine.
>
> Will continue with bisecting between 4.17.0 and
> 4.18.0-rc1-00023-g9ffc59d57228.

Details of my tests (.config, dmesg, versions) can be found in

https://github.com/pavelmachek/missy/tree/master/db/notebook/lenovo/thinkpad/x60/pavel/2018.3648830947643

(and nearby directories).

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: 4.18-rc* regression: x86-32 troubles (with timers?)
On Wed 2018-07-04 14:41:08, Meelis Roos wrote:
> I tried 4.18.0-rc1-00023-g9ffc59d57228 and now
> 4.18.0-rc3-00113-gfc36def997cf on a 32-bit server and then some other
> 32-bit machines, and got half-failed bootup - kernel and userspace come
> up but some services fail to start, including network and
> systemd-journald:
>
> systemd-journald[85]: Assertion 'clock_gettime(map_clock_id(clock_id), )
> == 0' failed at ../src/basic/time-util.c:53, function now(). Aborting.
>
> I then tried multiple other machines. All x86-64 machines seem
> unaffected, some x86-32 machines are affected (Athlon with AMD750
> chipset, Fujitsu RX100-S2 with P4-3.4, and P4 with Intel 865 chipset),
> some very similar x86-32 machines are unaffected. I have different
> customized kernel configuration on them, so far I have not pinpointed
> any configuration option to be at fault.
>
> All machines run Debian unstable.
>
> 4.17.0 was working fine.
>
> Will continue with bisecting between 4.17.0 and
> 4.18.0-rc1-00023-g9ffc59d57228.

I don't know if it helps you, but 4.18-rc4 seems to work okay for me
(and previous versions did, too) on a ThinkPad X60. But I'm using an
older Debian version.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: 4.18-rc* regression: x86-32 troubles (with timers?)
On Thu, Jul 5, 2018 at 11:54 AM, Meelis Roos wrote:
>> > I tried 4.18.0-rc1-00023-g9ffc59d57228 and now
>> > 4.18.0-rc3-00113-gfc36def997cf on a 32-bit server and then some other
>> > 32-bit machines, and got half-failed bootup - kernel and userspace come
>> > up but some services fail to start, including network and
>> > systemd-journald:
>> >
>> > systemd-journald[85]: Assertion 'clock_gettime(map_clock_id(clock_id),
>> > ) == 0' failed at ../src/basic/time-util.c:53, function now(). Aborting.
>> >
>> > I then tried multiple other machines. All x86-64 machines seem
>> > unaffected, some x86-32 machines are affected (Athlon with AMD750
>> > chipset, Fujitsu RX100-S2 with P4-3.4, and P4 with Intel 865 chipset),
>> > some very similar x86-32 machines are unaffected. I have different
>> > customized kernel configuration on them, so far I have not pinpointed
>> > any configuration option to be at fault.
>> >
>> > All machines run Debian unstable.
>> >
>> > 4.17.0 was working fine.
>> >
>> > Will continue with bisecting between 4.17.0 and
>> > 4.18.0-rc1-00023-g9ffc59d57228.
>>
>> That does sound like it is related to my patches indeed. If you are not
>> yet done bisecting, please checkout commit e27c49291a7f ("x86: Convert
>> x86_platform_ops to timespec64") before you try anything else, that
>> one is the top of the branch with my changes. If that fails, the bisection
>> will be much quicker.
>
> This commit was fine. So it's likely something else.

Ok, at least that's a relief for me, even if it didn't help you ;-)

I looked at the sources a bit and found that the assertion is triggered
in systemd whenever we try to read a clock that the kernel does not
provide. You have CONFIG_POSIX_TIMERS and CONFIG_RTC_CLASS set, so all
the normal clocks should be operational, and I don't see anything unusual
being passed into clock_gettime() from systemd.

If you are able to find out what clock_id is passed in here, and what
the return code is, that might still lead to a solution more quickly
than continuing the bisection.

Arnd
Re: 4.18-rc* regression: x86-32 troubles (with timers?)
> > I tried 4.18.0-rc1-00023-g9ffc59d57228 and now
> > 4.18.0-rc3-00113-gfc36def997cf on a 32-bit server and then some other
> > 32-bit machines, and got half-failed bootup - kernel and userspace come
> > up but some services fail to start, including network and
> > systemd-journald:
> >
> > systemd-journald[85]: Assertion 'clock_gettime(map_clock_id(clock_id), )
> > == 0' failed at ../src/basic/time-util.c:53, function now(). Aborting.
> >
> > I then tried multiple other machines. All x86-64 machines seem
> > unaffected, some x86-32 machines are affected (Athlon with AMD750
> > chipset, Fujitsu RX100-S2 with P4-3.4, and P4 with Intel 865 chipset),
> > some very similar x86-32 machines are unaffected. I have different
> > customized kernel configuration on them, so far I have not pinpointed
> > any configuration option to be at fault.
> >
> > All machines run Debian unstable.
> >
> > 4.17.0 was working fine.
> >
> > Will continue with bisecting between 4.17.0 and
> > 4.18.0-rc1-00023-g9ffc59d57228.
>
> That does sound like it is related to my patches indeed. If you are not
> yet done bisecting, please checkout commit e27c49291a7f ("x86: Convert
> x86_platform_ops to timespec64") before you try anything else, that
> one is the top of the branch with my changes. If that fails, the bisection
> will be much quicker.

This commit was fine. So it's likely something else.

--
Meelis Roos (mr...@linux.ee)
Re: 4.18-rc* regression: x86-32 troubles (with timers?)
On Wed, Jul 4, 2018 at 1:41 PM, Meelis Roos wrote:
> I tried 4.18.0-rc1-00023-g9ffc59d57228 and now
> 4.18.0-rc3-00113-gfc36def997cf on a 32-bit server and then some other
> 32-bit machines, and got half-failed bootup - kernel and userspace come
> up but some services fail to start, including network and
> systemd-journald:
>
> systemd-journald[85]: Assertion 'clock_gettime(map_clock_id(clock_id), )
> == 0' failed at ../src/basic/time-util.c:53, function now(). Aborting.
>
> I then tried multiple other machines. All x86-64 machines seem
> unaffected, some x86-32 machines are affected (Athlon with AMD750
> chipset, Fujitsu RX100-S2 with P4-3.4, and P4 with Intel 865 chipset),
> some very similar x86-32 machines are unaffected. I have different
> customized kernel configuration on them, so far I have not pinpointed
> any configuration option to be at fault.
>
> All machines run Debian unstable.
>
> 4.17.0 was working fine.
>
> Will continue with bisecting between 4.17.0 and
> 4.18.0-rc1-00023-g9ffc59d57228.

That does sound like it is related to my patches indeed. If you are not
yet done bisecting, please checkout commit e27c49291a7f ("x86: Convert
x86_platform_ops to timespec64") before you try anything else, that
one is the top of the branch with my changes. If that fails, the bisection
will be much quicker.

Unfortunately I don't see anything right away, and haven't come across
that bug in my own testing using Debian Stretch in an x86-32 qemu.

Arnd
4.18-rc* regression: x86-32 troubles (with timers?)
I tried 4.18.0-rc1-00023-g9ffc59d57228 and now
4.18.0-rc3-00113-gfc36def997cf on a 32-bit server and then some other
32-bit machines, and got half-failed bootup - kernel and userspace come
up but some services fail to start, including network and
systemd-journald:

systemd-journald[85]: Assertion 'clock_gettime(map_clock_id(clock_id), )
== 0' failed at ../src/basic/time-util.c:53, function now(). Aborting.

I then tried multiple other machines. All x86-64 machines seem
unaffected, some x86-32 machines are affected (Athlon with AMD750
chipset, Fujitsu RX100-S2 with P4-3.4, and P4 with Intel 865 chipset),
some very similar x86-32 machines are unaffected. I have different
customized kernel configuration on them, so far I have not pinpointed
any configuration option to be at fault.

All machines run Debian unstable.

4.17.0 was working fine.

Will continue with bisecting between 4.17.0 and
4.18.0-rc1-00023-g9ffc59d57228.

[0.00] Linux version 4.18.0-rc3-00113-gfc36def997cf (mroos@rx100s2) (gcc version 7.3.0 (Debian 7.3.0-23)) #27 SMP Wed Jul 4 13:06:34 EEST 2018
[0.00] x86/fpu: x87 FPU will use FXSAVE
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009afff] usable
[0.00] BIOS-e820: [mem 0x0009b000-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000ca000-0x000cbfff] reserved
[0.00] BIOS-e820: [mem 0x000dc000-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x3ff6] usable
[0.00] BIOS-e820: [mem 0x3ff7-0x3ff79fff] ACPI data
[0.00] BIOS-e820: [mem 0x3ff7a000-0x3ff7] ACPI NVS
[0.00] BIOS-e820: [mem 0x3ff8-0x3fff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec0] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff80-0xffbf] reserved
[0.00] BIOS-e820: [mem 0xfc00-0x] reserved
[0.00] Notice: NX (Execute Disable) protection missing in CPU!
[0.00] SMBIOS 2.3 present.
[0.00] DMI: FUJITSU SIEMENS PRIMERGY RX100S2/D1571/M71IXG, BIOS 6.0 Rev. C0F2.1571 04/27/2005
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] last_pfn = 0x3ff70 max_arch_pfn = 0x10
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00] 0-9 write-back
[0.00] A-B uncachable
[0.00] C-C7FFF write-protect
[0.00] C8000-D uncachable
[0.00] E-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00] 0 base 0 mask FC000 write-back
[0.00] 1 base 03FF8 mask 8 uncachable
[0.00] 2 disabled
[0.00] 3 disabled
[0.00] 4 disabled
[0.00] 5 disabled
[0.00] 6 disabled
[0.00] 7 disabled
[0.00] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WC UC- UC
[0.00] total RAM covered: 1023M
[0.00] Found optimal setting for mtrr clean up
[0.00] gran_size: 64K chunk_size: 1M num_reg: 2 lose cover RAM: 0G
[0.00] found SMP MP-table at [mem 0x000f6680-0x000f668f] mapped at [(ptrval)]
[0.00] initial memory mapped: [mem 0x-0x04ff]
[0.00] Base memory trampoline at [(ptrval)] 97000 size 16384
[0.00] BRK [0x04d97000, 0x04d97fff] PGTABLE
[0.00] ACPI: Early table checksum verification disabled
[0.00] ACPI: RSDP 0x000F66B0 14 (v00 PTLTD )
[0.00] ACPI: RSDT 0x3FF75B79 38 (v01 PTLTDRSDT 0604 LTP )
[0.00] ACPI: FACP 0x3FF79E69 74 (v01 INTEL CANTWOOD 0604 PTL 0003)
[0.00] ACPI: DSDT 0x3FF75BB1 0042B8 (v01 INTEL CANTWOOD 0604 MSFT 010B)
[0.00] ACPI: FACS 0x3FF7AFC0 40
[0.00] ACPI: SPCR 0x3FF79EDD 50 (v01 PTLTD $UCRTBL$ 0604 PTL 0001)
[0.00] ACPI: APIC 0x3FF79F2D 74 (v01 PTLTD ? APIC 0604 LTP )
[0.00] ACPI: BOOT 0x3FF79FA1 28 (v01 PTLTD $SBFTBL$ 0604 LTP 0001)
[0.00] ACPI: SSDT 0x3FF79FC9 37 (v01 PTLTD ACPIHT 0604 LTP 0001)
[0.00] ACPI: Local APIC address 0xfee0
[0.00] 135MB HIGHMEM available.
[0.00] 887MB LOWMEM available.
[0.00] mapped low ram: 0 - 377fe000
[0.00] low ram: 0 - 377fe000
[0.00] tsc: Fast TSC calibration using PIT
[0.00] BRK [0x04d98000, 0x04d98fff] PGTABLE
[0.00] Zone ranges:
[0.00] DMA [mem 0x1000-0x00ff]
[0.00]