Re: (Small) bias in generation of random passkeys for pairing
Hi Pavel, Am 19.06.19 um 18:24 schrieb Pavel Machek: > Hi! > > There's a (small) bias in passkey generation in bluetooth: > > get_random_bytes(, sizeof(passkey)); > passkey %= 100; > put_unaligned_le32(passkey, smp->tk); > > (there are at least two places doing this). > > All passkeys are not of same probability, passkey "00" is more > probable than "99", but difference is small. It is slightly different IMHO. Unsigned 32bits passkey assumed (and all users I found were u32), the passkeys "00" to "967295" are slightly more probable than "967296" to "99". If my math is right (which I doubt), the difference in probability for both entities is 4294:4293. > Do we care? I, personally, don't (yet). But then, I'm not a real security expert. Have fun, -- Stefan Seyfried "For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
[PATCH] dvb-usb-firmware: use DMA buffers for USB transfers
From: Stefan Seyfried <seife+ker...@b1-systems.com> The USB control messages require DMA to work. We cannot pass a stack-allocated buffer, as it is not warranted that the stack would be into a DMA enabled area. Signed-off-by: Stefan Seyfried <seife+ker...@b1-systems.com> --- This fixes at least dvb-usb-technisat-usb2 for me, but probably the other drivers that are using dvb_usb_download_firmware() with a Cypress chip are broken with CONFIG_VMAP_STACK=y right now. Patch attached additionally, because I don't think thunderbird will get this right :-( drivers/media/usb/dvb-usb/dvb-usb-firmware.c | 28 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/drivers/media/usb/dvb-usb/dvb-usb-firmware.c b/drivers/media/usb/dvb-usb/dvb-usb-firmware.c index f0023dbb7276..2f340621a786 100644 --- a/drivers/media/usb/dvb-usb/dvb-usb-firmware.c +++ b/drivers/media/usb/dvb-usb/dvb-usb-firmware.c @@ -35,41 +35,45 @@ static int usb_cypress_writemem(struct usb_device *udev,u16 addr,u8 *data, u8 le int usb_cypress_load_firmware(struct usb_device *udev, const struct firmware *fw, int type) { - struct hexline hx; - u8 reset; int ret,pos=0; + /* urb buffers must be malloc'ed, stack will not work with CONFIG_VMAP_STACK=y */ + u8 *reset = kmalloc(1, GFP_KERNEL); + struct hexline *hx = kmalloc(sizeof(struct hexline), GFP_KERNEL); /* stop the CPU */ - reset = 1; - if ((ret = usb_cypress_writemem(udev,cypress[type].cpu_cs_register,,1)) != 1) + *reset = 1; + if ((ret = usb_cypress_writemem(udev,cypress[type].cpu_cs_register,reset,1)) != 1) err("could not stop the USB controller CPU."); - while ((ret = dvb_usb_get_hexline(fw,,)) > 0) { - deb_fw("writing to address 0x%04x (buffer: 0x%02x %02x)\n",hx.addr,hx.len,hx.chk); - ret = usb_cypress_writemem(udev,hx.addr,hx.data,hx.len); + while ((ret = dvb_usb_get_hexline(fw,hx,)) > 0) { + deb_fw("writing to address 0x%04x (buffer: 0x%02x %02x)\n",hx->addr,hx->len,hx->chk); + ret = usb_cypress_writemem(udev,hx->addr,hx->data,hx->len); - if (ret != hx.len) { + if (ret != hx->len) { err("error while transferring firmware (transferred size: %d, block size: %d)", - ret,hx.len); + ret,hx->len); ret = -EINVAL; break; } } if (ret < 0) { err("firmware download failed at %d with %d",pos,ret); - return ret; + goto out_free; } if (ret == 0) { /* restart the CPU */ - reset = 0; - if (ret || usb_cypress_writemem(udev,cypress[type].cpu_cs_register,,1) != 1) { + *reset = 0; + if (ret || usb_cypress_writemem(udev,cypress[type].cpu_cs_register,reset,1) != 1) { err("could not restart the USB controller CPU."); ret = -EINVAL; } } else ret = -EIO; + out_free: + kfree(reset); + kfree(hx); return ret; } EXPORT_SYMBOL(usb_cypress_load_firmware); -- Stefan Seyfried "For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman From f582c0f19837890254d3c0d8a23a1142eb8ea673 Mon Sep 17 00:00:00 2001 From: Stefan Seyfried <seife+ker...@b1-systems.com> Date: Sat, 18 Feb 2017 22:52:31 +0100 Subject: [PATCH] dvb-usb-firmware: use DMA buffers for USB transfers The USB control messages require DMA to work. We cannot pass a stack-allocated buffer, as it is not warranted that the stack would be into a DMA enabled area. Signed-off-by: Stefan Seyfried <seife+ker...@b1-systems.com> --- drivers/media/usb/dvb-usb/dvb-usb-firmware.c | 28 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/drivers/media/usb/dvb-usb/dvb-usb-firmware.c b/drivers/media/usb/dvb-usb/dvb-usb-firmware.c index f0023dbb7276..2f340621a786 100644 --- a/drivers/media/usb/dvb-usb/dvb-usb-firmware.c +++ b/drivers/media/usb/dvb-usb/dvb-usb-firmware.c @@ -35,41 +35,45 @@ static int usb_cypress_writemem(struct usb_device *udev,u16 addr,u8 *data, u8 le int usb_cypress_load_firmware(struct usb_device *udev, const struct firmware *fw, int type) { - struct hexline hx; - u8 reset; int ret,pos=0; + /* urb buffers must be malloc'ed, stack will not work with CONFIG_VMAP_STACK=y */ + u8 *reset = kmalloc(1, GFP_KERNEL); + struct hexline *hx = kmalloc(sizeof(struct hexline), GFP_KERNEL); /* stop the CPU */ - reset = 1; - if ((ret = usb_cypress_writemem(udev,cypress[type].cpu_cs_regis
[PATCH] dvb-usb-firmware: use DMA buffers for USB transfers
From: Stefan Seyfried The USB control messages require DMA to work. We cannot pass a stack-allocated buffer, as it is not warranted that the stack would be into a DMA enabled area. Signed-off-by: Stefan Seyfried --- This fixes at least dvb-usb-technisat-usb2 for me, but probably the other drivers that are using dvb_usb_download_firmware() with a Cypress chip are broken with CONFIG_VMAP_STACK=y right now. Patch attached additionally, because I don't think thunderbird will get this right :-( drivers/media/usb/dvb-usb/dvb-usb-firmware.c | 28 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/drivers/media/usb/dvb-usb/dvb-usb-firmware.c b/drivers/media/usb/dvb-usb/dvb-usb-firmware.c index f0023dbb7276..2f340621a786 100644 --- a/drivers/media/usb/dvb-usb/dvb-usb-firmware.c +++ b/drivers/media/usb/dvb-usb/dvb-usb-firmware.c @@ -35,41 +35,45 @@ static int usb_cypress_writemem(struct usb_device *udev,u16 addr,u8 *data, u8 le int usb_cypress_load_firmware(struct usb_device *udev, const struct firmware *fw, int type) { - struct hexline hx; - u8 reset; int ret,pos=0; + /* urb buffers must be malloc'ed, stack will not work with CONFIG_VMAP_STACK=y */ + u8 *reset = kmalloc(1, GFP_KERNEL); + struct hexline *hx = kmalloc(sizeof(struct hexline), GFP_KERNEL); /* stop the CPU */ - reset = 1; - if ((ret = usb_cypress_writemem(udev,cypress[type].cpu_cs_register,,1)) != 1) + *reset = 1; + if ((ret = usb_cypress_writemem(udev,cypress[type].cpu_cs_register,reset,1)) != 1) err("could not stop the USB controller CPU."); - while ((ret = dvb_usb_get_hexline(fw,,)) > 0) { - deb_fw("writing to address 0x%04x (buffer: 0x%02x %02x)\n",hx.addr,hx.len,hx.chk); - ret = usb_cypress_writemem(udev,hx.addr,hx.data,hx.len); + while ((ret = dvb_usb_get_hexline(fw,hx,)) > 0) { + deb_fw("writing to address 0x%04x (buffer: 0x%02x %02x)\n",hx->addr,hx->len,hx->chk); + ret = usb_cypress_writemem(udev,hx->addr,hx->data,hx->len); - if (ret != hx.len) { + if (ret != hx->len) { err("error while transferring firmware (transferred size: %d, block size: %d)", - ret,hx.len); + ret,hx->len); ret = -EINVAL; break; } } if (ret < 0) { err("firmware download failed at %d with %d",pos,ret); - return ret; + goto out_free; } if (ret == 0) { /* restart the CPU */ - reset = 0; - if (ret || usb_cypress_writemem(udev,cypress[type].cpu_cs_register,,1) != 1) { + *reset = 0; + if (ret || usb_cypress_writemem(udev,cypress[type].cpu_cs_register,reset,1) != 1) { err("could not restart the USB controller CPU."); ret = -EINVAL; } } else ret = -EIO; + out_free: + kfree(reset); + kfree(hx); return ret; } EXPORT_SYMBOL(usb_cypress_load_firmware); -- Stefan Seyfried "For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman From f582c0f19837890254d3c0d8a23a1142eb8ea673 Mon Sep 17 00:00:00 2001 From: Stefan Seyfried Date: Sat, 18 Feb 2017 22:52:31 +0100 Subject: [PATCH] dvb-usb-firmware: use DMA buffers for USB transfers The USB control messages require DMA to work. We cannot pass a stack-allocated buffer, as it is not warranted that the stack would be into a DMA enabled area. Signed-off-by: Stefan Seyfried --- drivers/media/usb/dvb-usb/dvb-usb-firmware.c | 28 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/drivers/media/usb/dvb-usb/dvb-usb-firmware.c b/drivers/media/usb/dvb-usb/dvb-usb-firmware.c index f0023dbb7276..2f340621a786 100644 --- a/drivers/media/usb/dvb-usb/dvb-usb-firmware.c +++ b/drivers/media/usb/dvb-usb/dvb-usb-firmware.c @@ -35,41 +35,45 @@ static int usb_cypress_writemem(struct usb_device *udev,u16 addr,u8 *data, u8 le int usb_cypress_load_firmware(struct usb_device *udev, const struct firmware *fw, int type) { - struct hexline hx; - u8 reset; int ret,pos=0; + /* urb buffers must be malloc'ed, stack will not work with CONFIG_VMAP_STACK=y */ + u8 *reset = kmalloc(1, GFP_KERNEL); + struct hexline *hx = kmalloc(sizeof(struct hexline), GFP_KERNEL); /* stop the CPU */ - reset = 1; - if ((ret = usb_cypress_writemem(udev,cypress[type].cpu_cs_register,,1)) != 1) + *reset = 1; + if ((ret = usb_cypress_writemem(udev,cypress[type].cpu_cs_register,reset,1)) != 1) err("could not stop
Re: [PATCH] drm/i915: Avoid drm_atomic_state_put(NULL) in intel_display_resume
Hi Chris, this fixes the problem for me, thanks! Tested-by: Stefan Seyfried <stefan.seyfr...@googlemail.com> Am 15.01.2017 um 13:58 schrieb Chris Wilson: > intel_display_resume() may be called without a atomic state to restore, > i.e. dev_priv->modeset_reset_restore state is NULL. One such case is > following a lid open/close event and the forced modeset in > intel_lid_notiy(). > > Reported-by: Stefan Seyfried <stefan.seyfr...@googlemail.com> > Fixes: 0853695c3ba4 ("drm: Add reference counting to drm_atomic_state") > Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk> > Cc: Daniel Vetter <daniel.vet...@ffwll.ch> > Cc: Jani Nikula <jani.nik...@linux.intel.com> > Cc: <drm-intel-fi...@lists.freedesktop.org> # v4.10-rc1+ > --- > drivers/gpu/drm/i915/intel_display.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/intel_display.c > b/drivers/gpu/drm/i915/intel_display.c > index 3dc8724df400..260bbe8881e6 100644 > --- a/drivers/gpu/drm/i915/intel_display.c > +++ b/drivers/gpu/drm/i915/intel_display.c > @@ -17024,7 +17024,8 @@ void intel_display_resume(struct drm_device *dev) > > if (ret) > DRM_ERROR("Restoring old state failed with %i\n", ret); > - drm_atomic_state_put(state); > + if (state) > + drm_atomic_state_put(state); > } > > void intel_modeset_gem_init(struct drm_device *dev) -- Stefan Seyfried "For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
Re: [PATCH] drm/i915: Avoid drm_atomic_state_put(NULL) in intel_display_resume
Hi Chris, this fixes the problem for me, thanks! Tested-by: Stefan Seyfried Am 15.01.2017 um 13:58 schrieb Chris Wilson: > intel_display_resume() may be called without a atomic state to restore, > i.e. dev_priv->modeset_reset_restore state is NULL. One such case is > following a lid open/close event and the forced modeset in > intel_lid_notiy(). > > Reported-by: Stefan Seyfried > Fixes: 0853695c3ba4 ("drm: Add reference counting to drm_atomic_state") > Signed-off-by: Chris Wilson > Cc: Daniel Vetter > Cc: Jani Nikula > Cc: # v4.10-rc1+ > --- > drivers/gpu/drm/i915/intel_display.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/intel_display.c > b/drivers/gpu/drm/i915/intel_display.c > index 3dc8724df400..260bbe8881e6 100644 > --- a/drivers/gpu/drm/i915/intel_display.c > +++ b/drivers/gpu/drm/i915/intel_display.c > @@ -17024,7 +17024,8 @@ void intel_display_resume(struct drm_device *dev) > > if (ret) > DRM_ERROR("Restoring old state failed with %i\n", ret); > - drm_atomic_state_put(state); > + if (state) > + drm_atomic_state_put(state); > } > > void intel_modeset_gem_init(struct drm_device *dev) -- Stefan Seyfried "For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
4.10 regression drm/i915: BUG/oops on lid open
Hi all, Since 4.10-rc1 I'm getting this on lid close/open on my trusty old ThinkPad X200s: pci :00:1e.0: PCI bridge to [bus 0d] BUG: unable to handle kernel NULL pointer dereference at (null) IP: intel_display_resume+0xaf/0x120 [i915] PGD 22b99b067 PUD 22b99a067 PMD 0 Oops: 0002 [#1] PREEMPT SMP Modules linked in: ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet bnep msr xfs libcrc32c cdc_ether usbnet mii cdc_wdm cdc_acm dm_crypt algif_skcipher af_alg snd_hda_codec_conexant snd_hda_codec_generic arc4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss snd_pcm mei_wdt iTCO_wdt iTCO_vendor_support iwldvm snd_seq mac80211 snd_seq_device snd_timer coretemp kvm_intel kvm irqbypass btusb btrtl btbcm btintel iwlwifi pcspkr snd_mixer_oss bluetooth thinkpad_acpi battery ac fjes i915 cfg80211 snd wmi rfkill drm_kms_helper video drm i2c_i801 fb_sys_fops syscopyarea e1000e sysfillrect sysimgblt i2c_algo_bit acpi_cpufreq ptp soundcore tpm_tis mei_me pps_core shpchp tpm_tis_core lpc_ich mei mfd_core button tpm serio_raw thermal ehci_pci uhci_hcd ehci_hcd usbcore sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua loop CPU: 0 PID: 12922 Comm: kworker/0:0 Not tainted 4.10.0-rc3-1.gf1c24bb-default #1 Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011 Workqueue: kacpi_notify acpi_os_execute_deferred task: 9e2c22854240 task.stack: becbcc85c000 RIP: 0010:intel_display_resume+0xaf/0x120 [i915] RSP: 0018:becbcc85fc70 EFLAGS: 00010282 RAX: c027a670 RBX: becbcc85fc78 RCX: RDX: 9e2c22854240 RSI: 000d RDI: 9e2c2d738210 RBP: becbcc85fcd0 R08: 0010 R09: R10: 9e2c2d738380 R11: c0451d00 R12: 9e2c2d738000 R13: R14: 9e2c2d738210 R15: FS: () GS:9e2c3bc0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: CR3: 00022b998000 CR4: 000406f0 Call Trace: intel_lid_notify+0xca/0xd0 [i915] notifier_call_chain+0x4a/0x70 __blocking_notifier_call_chain+0x47/0x60 blocking_notifier_call_chain+0x16/0x20 acpi_lid_notify_state+0xee/0x142 [button] acpi_lid_update_state+0x24/0x27 [button] acpi_button_notify+0x3d/0x130 [button] acpi_device_notify+0x19/0x1b acpi_ev_notify_dispatch+0x49/0x61 acpi_os_execute_deferred+0x14/0x20 process_one_work+0x193/0x470 worker_thread+0x4e/0x490 kthread+0x101/0x140 ? process_one_work+0x470/0x470 ? kthread_create_on_node+0x40/0x40 ret_from_fork+0x25/0x30 Code: e8 d7 aa 2c d6 8b 45 a4 89 c1 31 f6 48 c7 c2 c0 11 50 c0 48 c7 c7 e5 10 51 c0 e8 6d a3 de ff 48 c7 c0 70 a6 27 c0 48 85 c0 74 56 41 83 6d 00 01 75 08 4c 89 ef e8 01 b9 df ff 48 83 c4 40 5b RIP: intel_display_resume+0xaf/0x120 [i915] RSP: becbcc85fc70 CR2: ---[ end trace d496ba830778c097 ]--- The machine is running fine afterwards but never again receiving a lid close / open event. 4.9 is good. I tried to bisect it and landed at 0853695c3ba46f97dfc0b5885f7b7e640ca212dd Author: Chris Wilson <ch...@chris-wilson.co.uk> Date: Fri Oct 14 13:18:18 2016 +0100 drm: Add reference counting to drm_atomic_state However, during bisecting the failure got worse (the machine locked up hard during lid close/open, with lots of recursive faults), so I doubt this is the commit to revert, but apparently it still needs some more fixes. Thanks, Stefan -- Stefan Seyfried "For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
4.10 regression drm/i915: BUG/oops on lid open
Hi all, Since 4.10-rc1 I'm getting this on lid close/open on my trusty old ThinkPad X200s: pci :00:1e.0: PCI bridge to [bus 0d] BUG: unable to handle kernel NULL pointer dereference at (null) IP: intel_display_resume+0xaf/0x120 [i915] PGD 22b99b067 PUD 22b99a067 PMD 0 Oops: 0002 [#1] PREEMPT SMP Modules linked in: ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet bnep msr xfs libcrc32c cdc_ether usbnet mii cdc_wdm cdc_acm dm_crypt algif_skcipher af_alg snd_hda_codec_conexant snd_hda_codec_generic arc4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss snd_pcm mei_wdt iTCO_wdt iTCO_vendor_support iwldvm snd_seq mac80211 snd_seq_device snd_timer coretemp kvm_intel kvm irqbypass btusb btrtl btbcm btintel iwlwifi pcspkr snd_mixer_oss bluetooth thinkpad_acpi battery ac fjes i915 cfg80211 snd wmi rfkill drm_kms_helper video drm i2c_i801 fb_sys_fops syscopyarea e1000e sysfillrect sysimgblt i2c_algo_bit acpi_cpufreq ptp soundcore tpm_tis mei_me pps_core shpchp tpm_tis_core lpc_ich mei mfd_core button tpm serio_raw thermal ehci_pci uhci_hcd ehci_hcd usbcore sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua loop CPU: 0 PID: 12922 Comm: kworker/0:0 Not tainted 4.10.0-rc3-1.gf1c24bb-default #1 Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011 Workqueue: kacpi_notify acpi_os_execute_deferred task: 9e2c22854240 task.stack: becbcc85c000 RIP: 0010:intel_display_resume+0xaf/0x120 [i915] RSP: 0018:becbcc85fc70 EFLAGS: 00010282 RAX: c027a670 RBX: becbcc85fc78 RCX: RDX: 9e2c22854240 RSI: 000d RDI: 9e2c2d738210 RBP: becbcc85fcd0 R08: 0010 R09: R10: 9e2c2d738380 R11: c0451d00 R12: 9e2c2d738000 R13: R14: 9e2c2d738210 R15: FS: () GS:9e2c3bc0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: CR3: 00022b998000 CR4: 000406f0 Call Trace: intel_lid_notify+0xca/0xd0 [i915] notifier_call_chain+0x4a/0x70 __blocking_notifier_call_chain+0x47/0x60 blocking_notifier_call_chain+0x16/0x20 acpi_lid_notify_state+0xee/0x142 [button] acpi_lid_update_state+0x24/0x27 [button] acpi_button_notify+0x3d/0x130 [button] acpi_device_notify+0x19/0x1b acpi_ev_notify_dispatch+0x49/0x61 acpi_os_execute_deferred+0x14/0x20 process_one_work+0x193/0x470 worker_thread+0x4e/0x490 kthread+0x101/0x140 ? process_one_work+0x470/0x470 ? kthread_create_on_node+0x40/0x40 ret_from_fork+0x25/0x30 Code: e8 d7 aa 2c d6 8b 45 a4 89 c1 31 f6 48 c7 c2 c0 11 50 c0 48 c7 c7 e5 10 51 c0 e8 6d a3 de ff 48 c7 c0 70 a6 27 c0 48 85 c0 74 56 41 83 6d 00 01 75 08 4c 89 ef e8 01 b9 df ff 48 83 c4 40 5b RIP: intel_display_resume+0xaf/0x120 [i915] RSP: becbcc85fc70 CR2: ---[ end trace d496ba830778c097 ]--- The machine is running fine afterwards but never again receiving a lid close / open event. 4.9 is good. I tried to bisect it and landed at 0853695c3ba46f97dfc0b5885f7b7e640ca212dd Author: Chris Wilson Date: Fri Oct 14 13:18:18 2016 +0100 drm: Add reference counting to drm_atomic_state However, during bisecting the failure got worse (the machine locked up hard during lid close/open, with lots of recursive faults), so I doubt this is the commit to revert, but apparently it still needs some more fixes. Thanks, Stefan -- Stefan Seyfried "For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
Re: PCI devices (buses?) and 3GB of RAM lost with 4.2rc1
Am 08.07.2015 um 22:09 schrieb Stefan Seyfried: > this is on a Thinkpad X200s, 5 years old and working fine, until 4.2rc1 > came along. > > With that booted, I do not have a WiFi card anymore, it doesn't even > appear in "lspci" output. > From diffing the dmesg's, it also looks like I lost some of my RAM: > > -Memory: 8050048K/8280176K available (6401K kernel code, 980K rwdata, > 4864K rodata, 1532K init, 1516K bss, 230128K reserved, 0K cma-reserved) > +Memory: 5104620K/8280176K available (6823K kernel code, 1096K rwdata, > 3220K rodata, 1556K init, 1520K bss, 227792K reserved, 0K cma-reserved) This was only a one-off thing, it looks like the hardware was confused when first booting 4.2-rc1 (I found out when I wanted to bisect it, all the kernels I built did just work, and then I finally booted the distro-kernel again and it also worked :-) So everything is fine, sorry for the noise. -- -- Stefan Seyfried Linux Consultant & Developer Mail: seyfr...@b1-systems.de GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PCI devices (buses?) and 3GB of RAM lost with 4.2rc1
Am 08.07.2015 um 22:09 schrieb Stefan Seyfried: this is on a Thinkpad X200s, 5 years old and working fine, until 4.2rc1 came along. With that booted, I do not have a WiFi card anymore, it doesn't even appear in lspci output. From diffing the dmesg's, it also looks like I lost some of my RAM: -Memory: 8050048K/8280176K available (6401K kernel code, 980K rwdata, 4864K rodata, 1532K init, 1516K bss, 230128K reserved, 0K cma-reserved) +Memory: 5104620K/8280176K available (6823K kernel code, 1096K rwdata, 3220K rodata, 1556K init, 1520K bss, 227792K reserved, 0K cma-reserved) This was only a one-off thing, it looks like the hardware was confused when first booting 4.2-rc1 (I found out when I wanted to bisect it, all the kernels I built did just work, and then I finally booted the distro-kernel again and it also worked :-) So everything is fine, sorry for the noise. -- -- Stefan Seyfried Linux Consultant Developer Mail: seyfr...@b1-systems.de GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [REGRESSION] 4.1-rc6 unloading loop OOPS
Hi Ming, Am 04.06.2015 um 12:24 schrieb Ming Lei: > On Thu, Jun 4, 2015 at 5:11 PM, Stefan Seyfried > wrote: >> I can reproduce the backtrace after a reboot once (subsequent modprobe/rmmod >> loop >> do not complain anymore), but not the OOPS. > > One fix[1] was just merged to linus tree, and could you test that to see if > your > issue can be addressed? > > [1] http://marc.info/?t=14320151831=1=2 I just tried current Linus' master v4.1-rc6-49-g8a7deb3 which contains this commit and do no longer get the Warning Unfortunately, due to this I cannot really test your patch for the OOPS (but the OOPS was only happening once for me, so it was not reliably triggered). Thanks, things work well for me, again :-) Best regards, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[REGRESSION] 4.1-rc6 unloading loop OOPS
ff [ 661.450217] Call Trace: [ 661.450217] [] blk_mq_unregister_hctx.part.0+0x3d/0x60 [ 661.450217] [] blk_mq_unregister_disk+0x51/0xe0 [ 661.450217] [] blk_unregister_queue+0x2c/0x90 [ 661.450217] [] del_gendisk+0x118/0x280 [ 661.450217] [] loop_remove+0x21/0x50 [loop] [ 661.450217] [] loop_exit_cb+0x11/0x20 [loop] [ 661.450217] [] idr_for_each+0xa3/0xf0 [ 661.450217] [] loop_exit+0x30/0xb1a [loop] [ 661.450217] [] SyS_delete_module+0x1ac/0x230 [ 661.450217] [] system_call_fastpath+0x16/0x75 [ 661.450217] [<7ff635777f37>] 0x7ff635777f37 [ 661.450217] Code: 48 83 c7 18 e9 54 ff ff ff 0f 1f 40 00 5b c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 53 48 89 fb 48 83 ec 08 48 85 ff 74 2e <48> 8b 6f 30 e8 09 cc ef ff 48 89 ef e8 a1 98 ef ff 80 63 3c fd [ 661.450217] RIP [] kobject_del+0xe/0x50 [ 661.450217] RSP [ 661.450217] CR2: 0108 [ 661.466690] ---[ end trace 7b8e0f39c45cf572 ]--- I can reproduce the backtrace after a reboot once (subsequent modprobe/rmmod loop do not complain anymore), but not the OOPS. Best regards, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [REGRESSION] 4.1-rc6 unloading loop OOPS
Hi Ming, Am 04.06.2015 um 12:24 schrieb Ming Lei: On Thu, Jun 4, 2015 at 5:11 PM, Stefan Seyfried stefan.seyfr...@googlemail.com wrote: I can reproduce the backtrace after a reboot once (subsequent modprobe/rmmod loop do not complain anymore), but not the OOPS. One fix[1] was just merged to linus tree, and could you test that to see if your issue can be addressed? [1] http://marc.info/?t=14320151831r=1w=2 I just tried current Linus' master v4.1-rc6-49-g8a7deb3 which contains this commit and do no longer get the Warning Unfortunately, due to this I cannot really test your patch for the OOPS (but the OOPS was only happening once for me, so it was not reliably triggered). Thanks, things work well for me, again :-) Best regards, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[REGRESSION] 4.1-rc6 unloading loop OOPS
: [ 661.450217] a380 0001 88023004f800 8133abed [ 661.450217] 8802319eba80 8802300577e0 88023004f800 8133ac61 [ 661.450217] 8802300577e0 88023004f400 [ 661.450217] Call Trace: [ 661.450217] [8133abed] blk_mq_unregister_hctx.part.0+0x3d/0x60 [ 661.450217] [8133ac61] blk_mq_unregister_disk+0x51/0xe0 [ 661.450217] [81330a2c] blk_unregister_queue+0x2c/0x90 [ 661.450217] [8133e048] del_gendisk+0x118/0x280 [ 661.450217] [a351] loop_remove+0x21/0x50 [loop] [ 661.450217] [a391] loop_exit_cb+0x11/0x20 [loop] [ 661.450217] [81359743] idr_for_each+0xa3/0xf0 [ 661.450217] [a0003516] loop_exit+0x30/0xb1a [loop] [ 661.450217] [810ece3c] SyS_delete_module+0x1ac/0x230 [ 661.450217] [816a1cb2] system_call_fastpath+0x16/0x75 [ 661.450217] [7ff635777f37] 0x7ff635777f37 [ 661.450217] Code: 48 83 c7 18 e9 54 ff ff ff 0f 1f 40 00 5b c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 53 48 89 fb 48 83 ec 08 48 85 ff 74 2e 48 8b 6f 30 e8 09 cc ef ff 48 89 ef e8 a1 98 ef ff 80 63 3c fd [ 661.450217] RIP [8135b3ce] kobject_del+0xe/0x50 [ 661.450217] RSP 8801d8d7bd78 [ 661.450217] CR2: 0108 [ 661.466690] ---[ end trace 7b8e0f39c45cf572 ]--- I can reproduce the backtrace after a reboot once (subsequent modprobe/rmmod loop do not complain anymore), but not the OOPS. Best regards, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 23.03.2015 um 19:38 schrieb Andy Lutomirski: > I bet I see it. I have the advantage of having stared at KVM code and > cursed at it more recently than you, I suspect. KVM does awful, awful > things to CPU state, and, as an optimization, it allows kernel code to > run with CPU state that would be totally invalid in user mode. This > happens through a bunch of hooks, including this bit in __switch_to: > > /* > * Now maybe reload the debug registers and handle I/O bitmaps > */ > if (unlikely(task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT || > task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV)) > __switch_to_xtra(prev_p, next_p, tss); > > IOW, we *change* tif during context switches. > > > The race looks like this: > > testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP) > jnz int_ret_from_sys_call_fixup/* Go the the slow path */ > > --- preempted here, switch to KVM guest --- > > KVM guest enters and screws up, say, MSR_SYSCALL_MASK. This wouldn't > happen to be a *32-bit* KVM guest, perhaps? not in my case (penryn CPU), there it was 64bit guests. > Now KVM schedules, calling __switch_to. __switch_to sets > _TIF_USER_RETURN_NOTIFY. We IRET back to the syscall exit code, turn > off interrupts, and do sysret. We are now screwed. > > I don't know why this manifests in this particular failure, but any > number of terrible things could happen now. > > FWIW, this will affect things other than KVM. For example, SIGKILL > sent while a process is sleeping in that two-instruction window won't > work. > > Takashi, can you re-send your patch so we can review it for real in > light of this race? -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 23.03.2015 um 19:38 schrieb Andy Lutomirski: I bet I see it. I have the advantage of having stared at KVM code and cursed at it more recently than you, I suspect. KVM does awful, awful things to CPU state, and, as an optimization, it allows kernel code to run with CPU state that would be totally invalid in user mode. This happens through a bunch of hooks, including this bit in __switch_to: /* * Now maybe reload the debug registers and handle I/O bitmaps */ if (unlikely(task_thread_info(next_p)-flags _TIF_WORK_CTXSW_NEXT || task_thread_info(prev_p)-flags _TIF_WORK_CTXSW_PREV)) __switch_to_xtra(prev_p, next_p, tss); IOW, we *change* tif during context switches. The race looks like this: testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP) jnz int_ret_from_sys_call_fixup/* Go the the slow path */ --- preempted here, switch to KVM guest --- KVM guest enters and screws up, say, MSR_SYSCALL_MASK. This wouldn't happen to be a *32-bit* KVM guest, perhaps? not in my case (penryn CPU), there it was 64bit guests. Now KVM schedules, calling __switch_to. __switch_to sets _TIF_USER_RETURN_NOTIFY. We IRET back to the syscall exit code, turn off interrupts, and do sysret. We are now screwed. I don't know why this manifests in this particular failure, but any number of terrible things could happen now. FWIW, this will affect things other than KVM. For example, SIGKILL sent while a process is sleeping in that two-instruction window won't work. Takashi, can you re-send your patch so we can review it for real in light of this race? -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Good Morning :-) Am 19.03.2015 um 01:57 schrieb Andy Lutomirski: > Stefan, do you happen to know whether your disassembly of page_fault > came from the instructions in memory or if they came from the vmlinux > file? Not that I have any relevant ideas there. I think they came from memory. At least, the disassemble in crash... crash> disassemble page_fault Dump of assembler code for function page_fault: 0x816834a0 <+0>: data32 xchg %ax,%ax 0x816834a3 <+3>: data32 xchg %ax,%ax 0x816834a6 <+6>: data32 xchg %ax,%ax 0x816834a9 <+9>: sub$0x78,%rsp 0x816834ad <+13>:callq 0x81683620 0x816834b2 <+18>:mov%rsp,%rdi 0x816834b5 <+21>:mov0x78(%rsp),%rsi 0x816834ba <+26>:movq $0x,0x78(%rsp) 0x816834c3 <+35>:callq 0x810504e0 0x816834c8 <+40>:jmpq 0x816836d0 End of assembler dump. ...is different than the one from loading vmlinux in gdb: Reading symbols from vmlinux-4.0.0-rc3-2.gd5c547f-desktop...done. Reading symbols from /usr/lib/debug/boot/vmlinux-4.0.0-rc3-2.gd5c547f-desktop.debug...done. (gdb) disassemble page_fault Dump of assembler code for function page_fault: 0x816834a0 <+0>: data16 xchg %ax,%ax 0x816834a3 <+3>: callq *0x7a5b07(%rip)# 0x81e28fb0 0x816834a9 <+9>: sub$0x78,%rsp 0x816834ad <+13>:callq 0x81683620 0x816834b2 <+18>:mov%rsp,%rdi 0x816834b5 <+21>:mov0x78(%rsp),%rsi 0x816834ba <+26>:movq $0x,0x78(%rsp) 0x816834c3 <+35>:callq 0x810504e0 0xffff816834c8 <+40>:jmpq 0x816836d0 End of assembler dump. Best regards, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Good Morning :-) Am 19.03.2015 um 01:57 schrieb Andy Lutomirski: Stefan, do you happen to know whether your disassembly of page_fault came from the instructions in memory or if they came from the vmlinux file? Not that I have any relevant ideas there. I think they came from memory. At least, the disassemble in crash... crash disassemble page_fault Dump of assembler code for function page_fault: 0x816834a0 +0: data32 xchg %ax,%ax 0x816834a3 +3: data32 xchg %ax,%ax 0x816834a6 +6: data32 xchg %ax,%ax 0x816834a9 +9: sub$0x78,%rsp 0x816834ad +13:callq 0x81683620 error_entry 0x816834b2 +18:mov%rsp,%rdi 0x816834b5 +21:mov0x78(%rsp),%rsi 0x816834ba +26:movq $0x,0x78(%rsp) 0x816834c3 +35:callq 0x810504e0 do_page_fault 0x816834c8 +40:jmpq 0x816836d0 error_exit End of assembler dump. ...is different than the one from loading vmlinux in gdb: Reading symbols from vmlinux-4.0.0-rc3-2.gd5c547f-desktop...done. Reading symbols from /usr/lib/debug/boot/vmlinux-4.0.0-rc3-2.gd5c547f-desktop.debug...done. (gdb) disassemble page_fault Dump of assembler code for function page_fault: 0x816834a0 +0: data16 xchg %ax,%ax 0x816834a3 +3: callq *0x7a5b07(%rip)# 0x81e28fb0 pv_irq_ops+48 0x816834a9 +9: sub$0x78,%rsp 0x816834ad +13:callq 0x81683620 error_entry 0x816834b2 +18:mov%rsp,%rdi 0x816834b5 +21:mov0x78(%rsp),%rsi 0x816834ba +26:movq $0x,0x78(%rsp) 0x816834c3 +35:callq 0x810504e0 do_page_fault 0x816834c8 +40:jmpq 0x816836d0 error_exit End of assembler dump. Best regards, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 19.03.2015 um 00:22 schrieb Andy Lutomirski: > On Wed, Mar 18, 2015 at 3:40 PM, Andy Lutomirski wrote: >> Yes, it's userspace. Thanks for checking, though. > > One more stupid hunch: > > Can you do: > x/21xg 8801013d4f58 > > If I counted right, that'll dump task_pt_regs(current). That's all zeroes: crash> x /21xg 0x8801013d4f58 0x8801013d4f58: 0x 0x 0x8801013d4f68: 0x 0x 0x8801013d4f78: 0x 0x 0x8801013d4f88: 0x 0x 0x8801013d4f98: 0x 0x 0x8801013d4fa8: 0x 0x 0x8801013d4fb8: 0x 0x 0x8801013d4fc8: 0x 0x 0x8801013d4fd8: 0x 0x 0x8801013d4fe8: 0x 0x 0x8801013d4ff8: 0x But maybe you counted wrong (or I'm reading arch/x86/include/asm/processor.h wrong, which is at least as likely...). #define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1) => I have the task_struct readily available decoded in the crash utility. crash> task, search for thread, in thread: sp0 = 18446612136629993472 crash> eval 18446612136629993472 hexadecimal: 8801013d8000 (18014269664677728KB) crash> print *(struct pt_regs *)(18446612136629993472 - sizeof(struct pt_regs)) $20 = { r15 = 18446744071585666077, r14 = 16, r13 = 582, r12 = 18446612136629993352, bp = 24, bx = 18446744071585666061, r11 = 582, r10 = 10760856, r9 = 140712613762160, r8 = 140735967861216, ax = 1, cx = 140712476030103, dx = 140712613782304, si = 1, di = 140712589295616, orig_ax = 209, ip = 140712571864823, cs = 51, flags = 582, sp = 140735967860552, ss = 43 } => r15 = 8168141d r12 = 8801013d7f88 bx = 8168140d r9 = 7ffa355bd470 ip = 7ffa32dc86f7 sp = 7fffa55f1748 looks somehow legit, to my totally untrained eye (ip and sp actually). I'm off to bed now (01:20 around here ;), will be back in about 7 hours. Best regards, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 18.03.2015 um 23:29 schrieb Andy Lutomirski: > On Wed, Mar 18, 2015 at 3:22 PM, Jiri Kosina wrote: >> On Wed, 18 Mar 2015, Andy Lutomirski wrote: >> >>> sysret64 can only fail with #GP, and we're totally screwed if that >>> happens, >> >> But what if the GPF handler pagefaults afterwards? It'd be operating on >> user stack already. > > Good point. > > Stefan, can you try changing the first "jne > opportunistic_sysret_failed" to "jmp opportunistic_sysret_failed" in > entry_64.S and seeing if you can reproduce this? (Is it easy enough > to reproduce that this would tell us anything?) I have no good way of reproducing the issue (happens once per week...) but apparently Takashi has, so I'd like to hand this task over to him. > It's a shame that double_fault doesn't record what gs was on entry. > If we did sysret -> general_protection -> page_fault -> double_fault, > then we'd enter double_fault with usergs, whereas syscall -> > page_fault -> double_fault would enter double_fault with kernelgs. > > Hmm. We may be able to answer this more directly. Stefan, can you > dump a couple hundred bytes starting at 0x7fffa55eafb8 (i.e. your > page_fault stack at the time of the failure)? That will tell us the > faulting address. If that fails, try starting at 7fffa55eb000 > instead. Unfortunately not, is this userspace memory? It's not in the dump I have. This issue is the first I have seen where having a full dump would be really helpful apart from cosmetic reasons... -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 18.03.2015 um 22:49 schrieb Denys Vlasenko: > Stefan, Takashi, can you post your /proc/cpuinfo > and dmesg after boot? susi:~ # cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz stepping: 10 microcode : 0xa0c cpu MHz : 1867.000 cache size : 6144 KB physical id : 0 siblings: 2 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida dtherm tpr_shadow vnmi flexpriority bugs: bogomips: 3723.96 clflush size: 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: (repeats for second core :) I'm running 3.19 now, but the dmesg extracted from the crash dump of 4.0-rc3 is at http://paste.opensuse.org/48196621 -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 18.03.2015 um 22:32 schrieb Linus Torvalds: > Is PARAVIRT enabled? The three nop's at the beginning of 'page_fault' > makes me suspect it is, and that that is some paravirt rewriting > area. What does paravirt go for that USERGS_SYSRET64 (or for > SWAPGS_UNSAFE_STACK, for that matter). This from the newer kernel package, but I doubt this configuration has been changed in the openSUSE kernel: susi:~ # grep PARAVIRT /boot/config-4.0.0-rc4-1.g126fc64-desktop CONFIG_PARAVIRT=y # CONFIG_PARAVIRT_DEBUG is not set # CONFIG_PARAVIRT_SPINLOCKS is not set # CONFIG_PARAVIRT_TIME_ACCOUNTING is not set CONFIG_PARAVIRT_CLOCK=y So yes, PARAVIRT is enabled. Best regards, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 18.03.2015 um 22:21 schrieb Andy Lutomirski: > On Wed, Mar 18, 2015 at 2:12 PM, Stefan Seyfried > wrote: >> Am 18.03.2015 um 21:51 schrieb Andy Lutomirski: >>> On Wed, Mar 18, 2015 at 1:05 PM, Stefan Seyfried >>> wrote: >> >>>>> The relevant thread's stack is here (see ti in the trace): >>>>> >>>>> 8801013d4000 >>>>> >>>>> It could be interesting to see what's there. >>>>> >>>>> I don't suppose you want to try to walk the paging structures to see >>>>> if 88023bc8 (i.e. gsbase) and, more specifically, >>>>> 88023bc8 + old_rsp and 88023bc8 + kernel_stack are >>>>> present? You'd only have to walk one level -- presumably, if the PGD >>>>> entry is there, the rest of the entries are okay, too. >>>> >>>> That's all greek to me :-) >>>> >>>> I see that there is something at 88023bc8: >>>> >>>> crash> x /64xg 0x88023bc8 >>>> 0x88023bc8: 0x 0x >>>> 0x88023bc80010: 0x 0x >>>> 0x88023bc80020: 0x 0x6686ada9 >>>> 0x88023bc80030: 0x 0x >>>> 0x88023bc80040: 0x 0x >>>> [all zeroes] >>>> 0x88023bc801f0: 0x 0x >>>> >>>> old_rsp and kernel_stack seem bogus: >>>> crash> print old_rsp >>>> Cannot access memory at address 0xa200 >>>> gdb: gdb request failed: print old_rsp >>>> crash> print kernel_stack >>>> Cannot access memory at address 0xaa48 >>>> gdb: gdb request failed: print kernel_stack >>>> >>>> kernel_stack is not a pointer? So 0x88023bc8 + 0xaa48 it is: >>> >>> Yup. old_rsp and kernel_stack are offsets relative to gsbase. >>> >>>> >>>> crash> x /64xg 0x88023bc8aa00 >>>> 0x88023bc8aa00: 0x 0x >>> >>> [...] >>> >>> I don't know enough about crashkernel to know whether the fact that >>> this worked means anything. >> >> AFAIK this just means that the memory at this location is included in >> the dump :-) >> >>> Can you dump the page of physical memory at 0x4779a067? That's the PGD. >> >> Unfortunately not, this is a partial dump (I think the default config in >> openSUSE, but I might have changed it some time ago) and the dump_level >> is 31 which means that the following are excluded: >> >> | |cache |cache | | >> dump | zero |without|with | user | free >>level | page |private|private| data | page >> ---+--+---+---+--+-- >> 31 | X | X | X | X | X >> >> so this: >> crash> x /64xg 0x4779a067 >> 0x4779a067: Cannot access memory at address 0x4779a067 >> gdb: gdb request failed: x /64xg >> >> probably just means, that the PGD falls in one of the above excluded >> categories. > > I suspect that it actually means that gdb sees virtual addresses, not > physical addresses. But I screwed up completely -- "PGD" in the dump > is the PGD *entry*, not the PGD pointer. in crash, usually physical addresses work (it's a sophisticated wrapper around gdb AFAICT) > > We could plausibly fish it out from current->mm, but that's a mess. I'll come to that later I > don't suppose that "info registers" or "p/x $cr3" will show the cr3 > value? No, that does not work from crash. But current->mm is easy: crash> task|grep mm start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" mm = 0x8800b8a9c040, active_mm = 0x8800b8a9c040, comm = "qemu-system-x86", and (guessing the type :-) crash> print *(struct mm_struct *)0x8800b8a9c040|grep pgd pgd = 0x880002d7e000, But if that's correct, pgd contains all zeroes: crash> print *(pgd_t *)0x880002d7e000 $15 = { pgd = 0 } crash> x /16xg 0x880002d7e000 0x880002d7e000: 0x 0x 0x880002d7e010: 0x 0x 0x880002d7e020: 0x0000 0x0000 0x880002d7e030: 0x 0x000
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 18.03.2015 um 21:51 schrieb Andy Lutomirski: > On Wed, Mar 18, 2015 at 1:05 PM, Stefan Seyfried > wrote: >>> The relevant thread's stack is here (see ti in the trace): >>> >>> 8801013d4000 >>> >>> It could be interesting to see what's there. >>> >>> I don't suppose you want to try to walk the paging structures to see >>> if 88023bc8 (i.e. gsbase) and, more specifically, >>> 88023bc8 + old_rsp and 88023bc8 + kernel_stack are >>> present? You'd only have to walk one level -- presumably, if the PGD >>> entry is there, the rest of the entries are okay, too. >> >> That's all greek to me :-) >> >> I see that there is something at 88023bc8: >> >> crash> x /64xg 0x88023bc8 >> 0x88023bc8: 0x 0x >> 0x88023bc80010: 0x 0x >> 0x88023bc80020: 0x 0x6686ada9 >> 0x88023bc80030: 0x 0x >> 0x88023bc80040: 0x 0x >> [all zeroes] >> 0x88023bc801f0: 0x 0x >> >> old_rsp and kernel_stack seem bogus: >> crash> print old_rsp >> Cannot access memory at address 0xa200 >> gdb: gdb request failed: print old_rsp >> crash> print kernel_stack >> Cannot access memory at address 0xaa48 >> gdb: gdb request failed: print kernel_stack >> >> kernel_stack is not a pointer? So 0x88023bc8 + 0xaa48 it is: > > Yup. old_rsp and kernel_stack are offsets relative to gsbase. > >> >> crash> x /64xg 0x88023bc8aa00 >> 0x88023bc8aa00: 0x 0x > > [...] > > I don't know enough about crashkernel to know whether the fact that > this worked means anything. AFAIK this just means that the memory at this location is included in the dump :-) > Can you dump the page of physical memory at 0x4779a067? That's the PGD. Unfortunately not, this is a partial dump (I think the default config in openSUSE, but I might have changed it some time ago) and the dump_level is 31 which means that the following are excluded: | |cache |cache | | dump | zero |without|with | user | free level | page |private|private| data | page ---+--+---+---+--+-- 31 | X | X | X | X | X so this: crash> x /64xg 0x4779a067 0x4779a067: Cannot access memory at address 0x4779a067 gdb: gdb request failed: x /64xg probably just means, that the PGD falls in one of the above excluded categories. Best regards, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Hi Andy, Am 18.03.2015 um 20:26 schrieb Andy Lutomirski: > Hi Linus- > > You seem to enjoy debugging these things. Want to give this a shot? > My guess is a vmalloc fault accessing either old_rsp or kernel_stack > right after swapgs in syscall entry. > > On Wed, Mar 18, 2015 at 12:03 PM, Stefan Seyfried > wrote: >> Hi all, >> >> first, I'm kind of happy that I'm not the only one seeing this, and >> thus my beloved Thinkpad can stay for a bit longer... :-) >> >> Then, I'm mostly an amateur when it comes to kernel debugging, so bear >> with me when I'm stumbling through the code... >> >> Am 18.03.2015 um 19:03 schrieb Andy Lutomirski: >>> On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai wrote: >>>> At Wed, 18 Mar 2015 18:43:52 +0100, >>>> Takashi Iwai wrote: >>>>> >>>>> At Wed, 18 Mar 2015 15:16:42 +0100, >>>>> Takashi Iwai wrote: >>>>>> >>>>>> At Sun, 15 Mar 2015 09:17:15 +0100, >>>>>> Stefan Seyfried wrote: >>>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> in 4.0-rc I have recently seen a few crashes, always when running >>>>>>> KVM guests (IIRC). Today I was able to capture a crash dump, this >>>>>>> is the backtrace from dmesg.txt: >>>>>>> >>>>>>> [242060.604870] PANIC: double fault, error_code: 0x0 >>> >>> OK, we double faulted. Too bad that x86 CPUs don't tell us why. >>> >>>>>>> [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G >>>>>>> W 4.0.0-rc3-2.gd5c547f-desktop #1 >>>>>>> [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW >>>>>>> (3.21 ) 12/13/2011 >>>>>>> [242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: >>>>>>> 8801013d4000 >>>>>>> [242060.604885] RIP: 0010:[] [] >>>>>>> page_fault+0xd/0x30 >>> >>> The double fault happened during page fault processing. Could you >>> disassemble your page_fault function to find the offending >>> instruction? >> >> This one is easy: >> >> crash> disassemble page_fault >> Dump of assembler code for function page_fault: >>0x816834a0 <+0>: data32 xchg %ax,%ax >>0x816834a3 <+3>: data32 xchg %ax,%ax >>0x816834a6 <+6>: data32 xchg %ax,%ax >>0x816834a9 <+9>: sub$0x78,%rsp >>0x816834ad <+13>:callq 0x81683620 > > The callq was the double-faulting instruction, and it is indeed the > first function in here that would have accessed the stack. (The sub > *changes* rsp but isn't a memory access.) So, since RSP is bogus, we > page fault, and the page fault is promoted to a double fault. The > surprising thing is that the page fault itself seems to have been > delivered okay, and RSP wasn't on a page boundary. > > You wouldn't happen to be using a Broadwell machine? No, this is a quite old Thinkpad X200s, Core2duo processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz stepping: 10 microcode : 0xa0c > The only way to get here with bogus RSP is if we interrupted something > that was previously running at CPL0 with similarly bogus RSP. > > I don't know if I trust CR2. It's 16 bytes lower than I'd expect. > >>0x816834b2 <+18>:mov%rsp,%rdi >>0x816834b5 <+21>:mov0x78(%rsp),%rsi >>0x816834ba <+26>:movq $0x,0x78(%rsp) >>0x816834c3 <+35>:callq 0x810504e0 >>0x816834c8 <+40>:jmpq 0x816836d0 >> End of assembler dump. >> >> >>>>>>> [242060.604893] RSP: 0018:7fffa55eafb8 EFLAGS: 00010016 >>> >>> Uh, what? That RSP is a user address. >>> >>>>>>> [242060.604895] RAX: aa40 RBX: 0001 RCX: >>>>>>> 81682237 >>>>>>> [242060.604896] RDX: aa40 RSI: RDI: >>>>>>> 7fffa55eb078 >>>>>>> [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09: >>>>>>> >>>>>>> [242060.604900] R10: R1
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Hi all, first, I'm kind of happy that I'm not the only one seeing this, and thus my beloved Thinkpad can stay for a bit longer... :-) Then, I'm mostly an amateur when it comes to kernel debugging, so bear with me when I'm stumbling through the code... Am 18.03.2015 um 19:03 schrieb Andy Lutomirski: > On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai wrote: >> At Wed, 18 Mar 2015 18:43:52 +0100, >> Takashi Iwai wrote: >>> >>> At Wed, 18 Mar 2015 15:16:42 +0100, >>> Takashi Iwai wrote: >>>> >>>> At Sun, 15 Mar 2015 09:17:15 +0100, >>>> Stefan Seyfried wrote: >>>>> >>>>> Hi all, >>>>> >>>>> in 4.0-rc I have recently seen a few crashes, always when running >>>>> KVM guests (IIRC). Today I was able to capture a crash dump, this >>>>> is the backtrace from dmesg.txt: >>>>> >>>>> [242060.604870] PANIC: double fault, error_code: 0x0 > > OK, we double faulted. Too bad that x86 CPUs don't tell us why. > >>>>> [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G >>>>> W 4.0.0-rc3-2.gd5c547f-desktop #1 >>>>> [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW >>>>> (3.21 ) 12/13/2011 >>>>> [242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: >>>>> 8801013d4000 >>>>> [242060.604885] RIP: 0010:[] [] >>>>> page_fault+0xd/0x30 > > The double fault happened during page fault processing. Could you > disassemble your page_fault function to find the offending > instruction? This one is easy: crash> disassemble page_fault Dump of assembler code for function page_fault: 0x816834a0 <+0>: data32 xchg %ax,%ax 0x816834a3 <+3>: data32 xchg %ax,%ax 0x816834a6 <+6>: data32 xchg %ax,%ax 0x816834a9 <+9>: sub$0x78,%rsp 0x816834ad <+13>:callq 0x81683620 0x816834b2 <+18>:mov%rsp,%rdi 0x816834b5 <+21>:mov0x78(%rsp),%rsi 0x816834ba <+26>:movq $0x,0x78(%rsp) 0x816834c3 <+35>:callq 0x810504e0 0x816834c8 <+40>:jmpq 0x816836d0 End of assembler dump. >>>>> [242060.604893] RSP: 0018:7fffa55eafb8 EFLAGS: 00010016 > > Uh, what? That RSP is a user address. > >>>>> [242060.604895] RAX: aa40 RBX: 0001 RCX: >>>>> 81682237 >>>>> [242060.604896] RDX: aa40 RSI: RDI: >>>>> 7fffa55eb078 >>>>> [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09: >>>>> >>>>> [242060.604900] R10: R11: 0293 R12: >>>>> 004a >>>>> [242060.604902] R13: 7ffa356b5d60 R14: 000f R15: >>>>> 7ffa3556cf20 >>>>> [242060.604904] FS: 7ffa33dbfa80() GS:88023bc8() >>>>> knlGS: >>>>> [242060.604906] CS: 0010 DS: ES: CR0: 80050033 >>>>> [242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: >>>>> 000427e0 >>>>> [242060.604909] Stack: >>>>> [242060.604942] BUG: unable to handle kernel paging request at >>>>> 7fffa55eafb8 >>>>> [242060.604995] IP: [] show_stack_log_lvl+0x124/0x190 > > This is suspicious. We need to have died, again, of a fatal page > fault while dumping the stack. I posted the same problem to the opensuse kernel list shortly before turning to LKML. There, Michal Kubecek noted: "I encountered a similar problem recently. The thing is, x86 specification says that on a double fault, RIP and RSP registers are undefined, i.e. you not only can't expect them to contain values corresponding to the first or second fault but you can't even expect them to have any usable values at all. Unfortunately the kernel double fault handler doesn't take this into account and does try to display usual crash related information so that it itself does usually crash when trying to show stack content (that's the show_stack_log_lvl() crash). The result is a double fault (which itself would be very hard to debug) followed by a crash in its handler so that analysing the outcome is extremely difficult." I cannot judge if this is true, but it sounded related to solving the problem to me. >>>>> [242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 19.03.2015 um 00:22 schrieb Andy Lutomirski: On Wed, Mar 18, 2015 at 3:40 PM, Andy Lutomirski l...@amacapital.net wrote: Yes, it's userspace. Thanks for checking, though. One more stupid hunch: Can you do: x/21xg 8801013d4f58 If I counted right, that'll dump task_pt_regs(current). That's all zeroes: crash x /21xg 0x8801013d4f58 0x8801013d4f58: 0x 0x 0x8801013d4f68: 0x 0x 0x8801013d4f78: 0x 0x 0x8801013d4f88: 0x 0x 0x8801013d4f98: 0x 0x 0x8801013d4fa8: 0x 0x 0x8801013d4fb8: 0x 0x 0x8801013d4fc8: 0x 0x 0x8801013d4fd8: 0x 0x 0x8801013d4fe8: 0x 0x 0x8801013d4ff8: 0x But maybe you counted wrong (or I'm reading arch/x86/include/asm/processor.h wrong, which is at least as likely...). #define task_pt_regs(tsk) ((struct pt_regs *)(tsk)-thread.sp0 - 1) = I have the task_struct readily available decoded in the crash utility. crash task, search for thread, in thread: sp0 = 18446612136629993472 crash eval 18446612136629993472 hexadecimal: 8801013d8000 (18014269664677728KB) crash print *(struct pt_regs *)(18446612136629993472 - sizeof(struct pt_regs)) $20 = { r15 = 18446744071585666077, r14 = 16, r13 = 582, r12 = 18446612136629993352, bp = 24, bx = 18446744071585666061, r11 = 582, r10 = 10760856, r9 = 140712613762160, r8 = 140735967861216, ax = 1, cx = 140712476030103, dx = 140712613782304, si = 1, di = 140712589295616, orig_ax = 209, ip = 140712571864823, cs = 51, flags = 582, sp = 140735967860552, ss = 43 } = r15 = 8168141d r12 = 8801013d7f88 bx = 8168140d r9 = 7ffa355bd470 ip = 7ffa32dc86f7 sp = 7fffa55f1748 looks somehow legit, to my totally untrained eye (ip and sp actually). I'm off to bed now (01:20 around here ;), will be back in about 7 hours. Best regards, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 18.03.2015 um 23:29 schrieb Andy Lutomirski: On Wed, Mar 18, 2015 at 3:22 PM, Jiri Kosina jkos...@suse.cz wrote: On Wed, 18 Mar 2015, Andy Lutomirski wrote: sysret64 can only fail with #GP, and we're totally screwed if that happens, But what if the GPF handler pagefaults afterwards? It'd be operating on user stack already. Good point. Stefan, can you try changing the first jne opportunistic_sysret_failed to jmp opportunistic_sysret_failed in entry_64.S and seeing if you can reproduce this? (Is it easy enough to reproduce that this would tell us anything?) I have no good way of reproducing the issue (happens once per week...) but apparently Takashi has, so I'd like to hand this task over to him. It's a shame that double_fault doesn't record what gs was on entry. If we did sysret - general_protection - page_fault - double_fault, then we'd enter double_fault with usergs, whereas syscall - page_fault - double_fault would enter double_fault with kernelgs. Hmm. We may be able to answer this more directly. Stefan, can you dump a couple hundred bytes starting at 0x7fffa55eafb8 (i.e. your page_fault stack at the time of the failure)? That will tell us the faulting address. If that fails, try starting at 7fffa55eb000 instead. Unfortunately not, is this userspace memory? It's not in the dump I have. This issue is the first I have seen where having a full dump would be really helpful apart from cosmetic reasons... -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Hi all, first, I'm kind of happy that I'm not the only one seeing this, and thus my beloved Thinkpad can stay for a bit longer... :-) Then, I'm mostly an amateur when it comes to kernel debugging, so bear with me when I'm stumbling through the code... Am 18.03.2015 um 19:03 schrieb Andy Lutomirski: On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai ti...@suse.de wrote: At Wed, 18 Mar 2015 18:43:52 +0100, Takashi Iwai wrote: At Wed, 18 Mar 2015 15:16:42 +0100, Takashi Iwai wrote: At Sun, 15 Mar 2015 09:17:15 +0100, Stefan Seyfried wrote: Hi all, in 4.0-rc I have recently seen a few crashes, always when running KVM guests (IIRC). Today I was able to capture a crash dump, this is the backtrace from dmesg.txt: [242060.604870] PANIC: double fault, error_code: 0x0 OK, we double faulted. Too bad that x86 CPUs don't tell us why. [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1 [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011 [242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: 8801013d4000 [242060.604885] RIP: 0010:[816834ad] [816834ad] page_fault+0xd/0x30 The double fault happened during page fault processing. Could you disassemble your page_fault function to find the offending instruction? This one is easy: crash disassemble page_fault Dump of assembler code for function page_fault: 0x816834a0 +0: data32 xchg %ax,%ax 0x816834a3 +3: data32 xchg %ax,%ax 0x816834a6 +6: data32 xchg %ax,%ax 0x816834a9 +9: sub$0x78,%rsp 0x816834ad +13:callq 0x81683620 error_entry 0x816834b2 +18:mov%rsp,%rdi 0x816834b5 +21:mov0x78(%rsp),%rsi 0x816834ba +26:movq $0x,0x78(%rsp) 0x816834c3 +35:callq 0x810504e0 do_page_fault 0x816834c8 +40:jmpq 0x816836d0 error_exit End of assembler dump. [242060.604893] RSP: 0018:7fffa55eafb8 EFLAGS: 00010016 Uh, what? That RSP is a user address. [242060.604895] RAX: aa40 RBX: 0001 RCX: 81682237 [242060.604896] RDX: aa40 RSI: RDI: 7fffa55eb078 [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09: [242060.604900] R10: R11: 0293 R12: 004a [242060.604902] R13: 7ffa356b5d60 R14: 000f R15: 7ffa3556cf20 [242060.604904] FS: 7ffa33dbfa80() GS:88023bc8() knlGS: [242060.604906] CS: 0010 DS: ES: CR0: 80050033 [242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: 000427e0 [242060.604909] Stack: [242060.604942] BUG: unable to handle kernel paging request at 7fffa55eafb8 [242060.604995] IP: [81005b44] show_stack_log_lvl+0x124/0x190 This is suspicious. We need to have died, again, of a fatal page fault while dumping the stack. I posted the same problem to the opensuse kernel list shortly before turning to LKML. There, Michal Kubecek noted: I encountered a similar problem recently. The thing is, x86 specification says that on a double fault, RIP and RSP registers are undefined, i.e. you not only can't expect them to contain values corresponding to the first or second fault but you can't even expect them to have any usable values at all. Unfortunately the kernel double fault handler doesn't take this into account and does try to display usual crash related information so that it itself does usually crash when trying to show stack content (that's the show_stack_log_lvl() crash). The result is a double fault (which itself would be very hard to debug) followed by a crash in its handler so that analysing the outcome is extremely difficult. I cannot judge if this is true, but it sounded related to solving the problem to me. [242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0 [242060.605078] Oops: [#1] PREEMPT SMP [242060.605106] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc ses enclosure uas usb_storage cmac algif_hash ctr ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet bnep dm_crypt ecb cbc algif_skcipher af_alg xfs libcrc32c snd_hda_codec_conexant snd_hda_codec_generic iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm_oss snd_pcm
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 18.03.2015 um 22:49 schrieb Denys Vlasenko: Stefan, Takashi, can you post your /proc/cpuinfo and dmesg after boot? susi:~ # cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz stepping: 10 microcode : 0xa0c cpu MHz : 1867.000 cache size : 6144 KB physical id : 0 siblings: 2 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida dtherm tpr_shadow vnmi flexpriority bugs: bogomips: 3723.96 clflush size: 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: (repeats for second core :) I'm running 3.19 now, but the dmesg extracted from the crash dump of 4.0-rc3 is at http://paste.opensuse.org/48196621 -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Hi Andy, Am 18.03.2015 um 20:26 schrieb Andy Lutomirski: Hi Linus- You seem to enjoy debugging these things. Want to give this a shot? My guess is a vmalloc fault accessing either old_rsp or kernel_stack right after swapgs in syscall entry. On Wed, Mar 18, 2015 at 12:03 PM, Stefan Seyfried stefan.seyfr...@googlemail.com wrote: Hi all, first, I'm kind of happy that I'm not the only one seeing this, and thus my beloved Thinkpad can stay for a bit longer... :-) Then, I'm mostly an amateur when it comes to kernel debugging, so bear with me when I'm stumbling through the code... Am 18.03.2015 um 19:03 schrieb Andy Lutomirski: On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai ti...@suse.de wrote: At Wed, 18 Mar 2015 18:43:52 +0100, Takashi Iwai wrote: At Wed, 18 Mar 2015 15:16:42 +0100, Takashi Iwai wrote: At Sun, 15 Mar 2015 09:17:15 +0100, Stefan Seyfried wrote: Hi all, in 4.0-rc I have recently seen a few crashes, always when running KVM guests (IIRC). Today I was able to capture a crash dump, this is the backtrace from dmesg.txt: [242060.604870] PANIC: double fault, error_code: 0x0 OK, we double faulted. Too bad that x86 CPUs don't tell us why. [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1 [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011 [242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: 8801013d4000 [242060.604885] RIP: 0010:[816834ad] [816834ad] page_fault+0xd/0x30 The double fault happened during page fault processing. Could you disassemble your page_fault function to find the offending instruction? This one is easy: crash disassemble page_fault Dump of assembler code for function page_fault: 0x816834a0 +0: data32 xchg %ax,%ax 0x816834a3 +3: data32 xchg %ax,%ax 0x816834a6 +6: data32 xchg %ax,%ax 0x816834a9 +9: sub$0x78,%rsp 0x816834ad +13:callq 0x81683620 error_entry The callq was the double-faulting instruction, and it is indeed the first function in here that would have accessed the stack. (The sub *changes* rsp but isn't a memory access.) So, since RSP is bogus, we page fault, and the page fault is promoted to a double fault. The surprising thing is that the page fault itself seems to have been delivered okay, and RSP wasn't on a page boundary. You wouldn't happen to be using a Broadwell machine? No, this is a quite old Thinkpad X200s, Core2duo processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz stepping: 10 microcode : 0xa0c The only way to get here with bogus RSP is if we interrupted something that was previously running at CPL0 with similarly bogus RSP. I don't know if I trust CR2. It's 16 bytes lower than I'd expect. 0x816834b2 +18:mov%rsp,%rdi 0x816834b5 +21:mov0x78(%rsp),%rsi 0x816834ba +26:movq $0x,0x78(%rsp) 0x816834c3 +35:callq 0x810504e0 do_page_fault 0x816834c8 +40:jmpq 0x816836d0 error_exit End of assembler dump. [242060.604893] RSP: 0018:7fffa55eafb8 EFLAGS: 00010016 Uh, what? That RSP is a user address. [242060.604895] RAX: aa40 RBX: 0001 RCX: 81682237 [242060.604896] RDX: aa40 RSI: RDI: 7fffa55eb078 [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09: [242060.604900] R10: R11: 0293 R12: 004a [242060.604902] R13: 7ffa356b5d60 R14: 000f R15: 7ffa3556cf20 [242060.604904] FS: 7ffa33dbfa80() GS:88023bc8() knlGS: [242060.604906] CS: 0010 DS: ES: CR0: 80050033 [242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: 000427e0 [242060.604909] Stack: [242060.604942] BUG: unable to handle kernel paging request at 7fffa55eafb8 [242060.604995] IP: [81005b44] show_stack_log_lvl+0x124/0x190 This is suspicious. We need to have died, again, of a fatal page fault while dumping the stack. I posted the same problem to the opensuse kernel list shortly before turning to LKML. There, Michal Kubecek noted: I encountered a similar problem recently. The thing is, x86 specification says that on a double fault, RIP and RSP registers are undefined, i.e. you not only can't expect them to contain values corresponding to the first or second fault but you can't even expect them to have any usable values at all. Unfortunately the kernel double fault handler doesn't take this into account and does try to display usual crash related information so that it itself does usually
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 18.03.2015 um 21:51 schrieb Andy Lutomirski: On Wed, Mar 18, 2015 at 1:05 PM, Stefan Seyfried stefan.seyfr...@googlemail.com wrote: The relevant thread's stack is here (see ti in the trace): 8801013d4000 It could be interesting to see what's there. I don't suppose you want to try to walk the paging structures to see if 88023bc8 (i.e. gsbase) and, more specifically, 88023bc8 + old_rsp and 88023bc8 + kernel_stack are present? You'd only have to walk one level -- presumably, if the PGD entry is there, the rest of the entries are okay, too. That's all greek to me :-) I see that there is something at 88023bc8: crash x /64xg 0x88023bc8 0x88023bc8: 0x 0x 0x88023bc80010: 0x 0x 0x88023bc80020: 0x 0x6686ada9 0x88023bc80030: 0x 0x 0x88023bc80040: 0x 0x [all zeroes] 0x88023bc801f0: 0x 0x old_rsp and kernel_stack seem bogus: crash print old_rsp Cannot access memory at address 0xa200 gdb: gdb request failed: print old_rsp crash print kernel_stack Cannot access memory at address 0xaa48 gdb: gdb request failed: print kernel_stack kernel_stack is not a pointer? So 0x88023bc8 + 0xaa48 it is: Yup. old_rsp and kernel_stack are offsets relative to gsbase. crash x /64xg 0x88023bc8aa00 0x88023bc8aa00: 0x 0x [...] I don't know enough about crashkernel to know whether the fact that this worked means anything. AFAIK this just means that the memory at this location is included in the dump :-) Can you dump the page of physical memory at 0x4779a067? That's the PGD. Unfortunately not, this is a partial dump (I think the default config in openSUSE, but I might have changed it some time ago) and the dump_level is 31 which means that the following are excluded: | |cache |cache | | dump | zero |without|with | user | free level | page |private|private| data | page ---+--+---+---+--+-- 31 | X | X | X | X | X so this: crash x /64xg 0x4779a067 0x4779a067: Cannot access memory at address 0x4779a067 gdb: gdb request failed: x /64xg probably just means, that the PGD falls in one of the above excluded categories. Best regards, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 18.03.2015 um 22:21 schrieb Andy Lutomirski: On Wed, Mar 18, 2015 at 2:12 PM, Stefan Seyfried stefan.seyfr...@googlemail.com wrote: Am 18.03.2015 um 21:51 schrieb Andy Lutomirski: On Wed, Mar 18, 2015 at 1:05 PM, Stefan Seyfried stefan.seyfr...@googlemail.com wrote: The relevant thread's stack is here (see ti in the trace): 8801013d4000 It could be interesting to see what's there. I don't suppose you want to try to walk the paging structures to see if 88023bc8 (i.e. gsbase) and, more specifically, 88023bc8 + old_rsp and 88023bc8 + kernel_stack are present? You'd only have to walk one level -- presumably, if the PGD entry is there, the rest of the entries are okay, too. That's all greek to me :-) I see that there is something at 88023bc8: crash x /64xg 0x88023bc8 0x88023bc8: 0x 0x 0x88023bc80010: 0x 0x 0x88023bc80020: 0x 0x6686ada9 0x88023bc80030: 0x 0x 0x88023bc80040: 0x 0x [all zeroes] 0x88023bc801f0: 0x 0x old_rsp and kernel_stack seem bogus: crash print old_rsp Cannot access memory at address 0xa200 gdb: gdb request failed: print old_rsp crash print kernel_stack Cannot access memory at address 0xaa48 gdb: gdb request failed: print kernel_stack kernel_stack is not a pointer? So 0x88023bc8 + 0xaa48 it is: Yup. old_rsp and kernel_stack are offsets relative to gsbase. crash x /64xg 0x88023bc8aa00 0x88023bc8aa00: 0x 0x [...] I don't know enough about crashkernel to know whether the fact that this worked means anything. AFAIK this just means that the memory at this location is included in the dump :-) Can you dump the page of physical memory at 0x4779a067? That's the PGD. Unfortunately not, this is a partial dump (I think the default config in openSUSE, but I might have changed it some time ago) and the dump_level is 31 which means that the following are excluded: | |cache |cache | | dump | zero |without|with | user | free level | page |private|private| data | page ---+--+---+---+--+-- 31 | X | X | X | X | X so this: crash x /64xg 0x4779a067 0x4779a067: Cannot access memory at address 0x4779a067 gdb: gdb request failed: x /64xg probably just means, that the PGD falls in one of the above excluded categories. I suspect that it actually means that gdb sees virtual addresses, not physical addresses. But I screwed up completely -- PGD in the dump is the PGD *entry*, not the PGD pointer. in crash, usually physical addresses work (it's a sophisticated wrapper around gdb AFAICT) We could plausibly fish it out from current-mm, but that's a mess. I'll come to that later I don't suppose that info registers or p/x $cr3 will show the cr3 value? No, that does not work from crash. But current-mm is easy: crash task|grep mm start_comm = \000\000\000\000\000\000\000\000\000\000\000\000\000\000\000 mm = 0x8800b8a9c040, active_mm = 0x8800b8a9c040, comm = qemu-system-x86, and (guessing the type :-) crash print *(struct mm_struct *)0x8800b8a9c040|grep pgd pgd = 0x880002d7e000, But if that's correct, pgd contains all zeroes: crash print *(pgd_t *)0x880002d7e000 $15 = { pgd = 0 } crash x /16xg 0x880002d7e000 0x880002d7e000: 0x 0x 0x880002d7e010: 0x 0x 0x880002d7e020: 0x 0x 0x880002d7e030: 0x 0x 0x880002d7e040: 0x 0x 0x880002d7e050: 0x 0x 0x880002d7e060: 0x 0x 0x880002d7e070: 0x 0x In any case, Denys is right -- my theory doesn't really hold water on non-SMAP systems. Mine is definitely not new enough for this feature :) Maybe it would be more helpful if Takashi who is able to reproduce this more reliably than me would do a crash dump, preferably with a lower dumplevel, to investigate on. I have seen the bug two or three times in a week or two, which makes waiting for it to happen a boring experience. Best regards, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body
Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Am 18.03.2015 um 22:32 schrieb Linus Torvalds: Is PARAVIRT enabled? The three nop's at the beginning of 'page_fault' makes me suspect it is, and that that is some paravirt rewriting area. What does paravirt go for that USERGS_SYSRET64 (or for SWAPGS_UNSAFE_STACK, for that matter). This from the newer kernel package, but I doubt this configuration has been changed in the openSUSE kernel: susi:~ # grep PARAVIRT /boot/config-4.0.0-rc4-1.g126fc64-desktop CONFIG_PARAVIRT=y # CONFIG_PARAVIRT_DEBUG is not set # CONFIG_PARAVIRT_SPINLOCKS is not set # CONFIG_PARAVIRT_TIME_ACCOUNTING is not set CONFIG_PARAVIRT_CLOCK=y So yes, PARAVIRT is enabled. Best regards, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
power button for 5 seconds. Unfortunately, I cannot load the crashdump with the crash version in openSUSE Tumbleweed, so the backtrace is all I have for now. Any hints? Best regards, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
: 7fffa55eafb8 I would not totally rule out a hardware problem, since this machine had another weird crash where it crashed and the bios beeper was constant on until I hit the power button for 5 seconds. Unfortunately, I cannot load the crashdump with the crash version in openSUSE Tumbleweed, so the backtrace is all I have for now. Any hints? Best regards, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] Revert "blk-mq: fix hctx/ctx kobject use-after-free"
Am 03.02.2015 um 22:50 schrieb Jens Axboe: > On 02/03/2015 12:14 PM, Jens Axboe wrote: >> On 02/03/2015 12:13 PM, Stefan Seyfried wrote: >>> Am 29.01.2015 um 13:17 schrieb Ming Lei: >>>> This reverts commit 76d697d10769048e5721510100bf3a9413a56385. >>> The revert is not yet in Linus' tree (but it should get there before >>> 3.19 is released, or all USB-stick users will be unhappy). >> >> It'll go out later today. > > It's in Linus' tree now. ...and works well for my trivial "plug and unplug an USB stick" testcase. (I did not want to push, just make sure it wasn't forgotten :) Thanks all, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] Revert "blk-mq: fix hctx/ctx kobject use-after-free"
Am 29.01.2015 um 13:17 schrieb Ming Lei: > This reverts commit 76d697d10769048e5721510100bf3a9413a56385. > > The commit 76d697d10769048 causes general protection fault > reported from Bart Van Assche: > > https://lkml.org/lkml/2015/1/28/334 I bisected the "unplugging my USB stick crashes the kernel" problem today and came to this very commit. The revert is not yet in Linus' tree (but it should get there before 3.19 is released, or all USB-stick users will be unhappy). Best regards, Stefan > Reported-by: Bart Van Assche > Signed-off-by: Ming Lei -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] Revert blk-mq: fix hctx/ctx kobject use-after-free
Am 03.02.2015 um 22:50 schrieb Jens Axboe: On 02/03/2015 12:14 PM, Jens Axboe wrote: On 02/03/2015 12:13 PM, Stefan Seyfried wrote: Am 29.01.2015 um 13:17 schrieb Ming Lei: This reverts commit 76d697d10769048e5721510100bf3a9413a56385. The revert is not yet in Linus' tree (but it should get there before 3.19 is released, or all USB-stick users will be unhappy). It'll go out later today. It's in Linus' tree now. ...and works well for my trivial plug and unplug an USB stick testcase. (I did not want to push, just make sure it wasn't forgotten :) Thanks all, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] Revert blk-mq: fix hctx/ctx kobject use-after-free
Am 29.01.2015 um 13:17 schrieb Ming Lei: This reverts commit 76d697d10769048e5721510100bf3a9413a56385. The commit 76d697d10769048 causes general protection fault reported from Bart Van Assche: https://lkml.org/lkml/2015/1/28/334 I bisected the unplugging my USB stick crashes the kernel problem today and came to this very commit. The revert is not yet in Linus' tree (but it should get there before 3.19 is released, or all USB-stick users will be unhappy). Best regards, Stefan Reported-by: Bart Van Assche bart.vanass...@sandisk.com Signed-off-by: Ming Lei ming@canonical.com -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [REGRESSION] in 3.18-rc1: ppp crashes kernel
Hi Takashi, yes, this no longer crashes. No real-world test yet, but the obvious crash is gone. Thanks! Am 07.11.2014 um 14:22 schrieb Takashi Iwai: > At Fri, 07 Nov 2014 12:10:46 +0100, > Stefan Seyfried wrote: >> >> Hi all, >> >> since 3.18-rc1, setting up a PPP interface kills my kernel with >> >> [ 163.433251] PPP generic driver version 2.4.2 >> [ 164.452474] [ cut here ] >> [ 164.453327] kernel BUG at ../mm/vmalloc.c:1316! >> [ 164.453327] invalid opcode: [#1] PREEMPT SMP >> [ 164.453327] Modules linked in: ppp_async crc_ccitt ppp_generic slhc >> af_packet xfs libcrc32c coretemp kvm_intel >> snd_hda_codec_conexant iTCO_wdt snd_hda_codec_generic iTCO_vendor_support >> uvcvideo snd_hda_intel snd_hda_controller mac80211 videobuf2_vmalloc >> snd_hda_codec kvm e1000e videobuf2_memops cfg80211 videobuf2_core >> v4l2_common snd_hwdep i2c_i801 videodev snd_pcm pcspkr thinkpad_acpi >> serio_raw wmi lpc_ich snd_timer thermal snd rfkill mfd_core tpm_tis shpchp >> mei_me soundcore ptp mei pps_core acpi_cpufreq tpm battery processor ac >> dm_mod btrfs xor raid6_pq i915 i2c_algo_bit drm_kms_helper drm video button >> sg >> [ 164.453327] CPU: 0 PID: 6927 Comm: pppd Not tainted >> 3.18.0-rc3-3.ge706e91-desktop #1 >> [ 164.453327] Hardware name: LENOVO 7470E36/7470E36, BIOS 6DET61WW (3.11 ) >> 11/10/2009 >> >> This is easy to reproduce with: >> >> linux:~ # cat bin/crashme.sh >> >> #!/bin/bash -x >> pppd local pty "netcat -l 1234" & >> sleep 1 >> pppd local pty "netcat localhost 1234" & >> sleep 1 >> >> >> 3.17 works fine. >> I bisected the issue multiple times and always arrived at >> >> # first bad commit: [d6dd50e07c5bec00db2005969b1a01f8ca3d25ef] Merge branch >> 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip >> >> which is a merge commit unfortunately. >> >> The BUG encountered above is in: >> >> 1309 static struct vm_struct *__get_vm_area_node(unsigned long size, >> 1310 unsigned long align, unsigned long flags, unsigned long >> start, >> 1311 unsigned long end, int node, gfp_t gfp_mask, const void >> *caller) >> 1312 { >> 1313 struct vmap_area *va; >> 1314 struct vm_struct *area; >> 1315 >> 1316 BUG_ON(in_interrupt()); >> 1317 if (flags & VM_IOREMAP) >> 1318 align = 1ul << clamp(fls(size), PAGE_SHIFT, >> IOREMAP_MAX_ORDER); >> 1319 >> >> the call trace is: >> [ 164.453327] Call Trace: >> [ 164.453327] [] __vmalloc_node_range+0x6d/0x290 >> [ 164.453327] [] __vmalloc+0x3e/0x50 >> [ 164.453327] [] bpf_prog_alloc+0x30/0xa0 >> [ 164.453327] [] bpf_prog_create+0x46/0xb0 >> [ 164.453327] [] ppp_ioctl+0x420/0xe9a [ppp_generic] >> [ 164.453327] [] do_vfs_ioctl+0x2e7/0x4c0 >> [ 164.453327] [] SyS_ioctl+0x81/0xa0 >> [ 164.453327] [] system_call_fastpath+0x16/0x1b >> [ 164.453327] [<7f4502d87397>] 0x7f4502d87397 > > bpf_prog_create() is called inside spin_lock_bh(), and the BUG_ON() > hits. Below is a quick fix. > > > Takashi > > -- 8< -- > From: Takashi Iwai > Subject: [PATCH] net: ppp: Don't call bpf_prog_create() in ppp_lock > > In ppp_ioctl(), bpf_prog_create() is called inside ppp_lock, which > eventually calls vmalloc() and hits BUG_ON() in vmalloc.c. This patch > works around the problem by moving the allocation outside the lock. > > Reported-by: Stefan Seyfried > Signed-off-by: Takashi Iwai FWIW :-) Tested-by: Stefan Seyfried > --- > drivers/net/ppp/ppp_generic.c | 40 > 1 file changed, 20 insertions(+), 20 deletions(-) > > diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c > index 68c3a3f4e0ab..794a47329368 100644 > --- a/drivers/net/ppp/ppp_generic.c > +++ b/drivers/net/ppp/ppp_generic.c > @@ -755,23 +755,23 @@ static long ppp_ioctl(struct file *file, unsigned int > cmd, unsigned long arg) > > err = get_filter(argp, ); > if (err >= 0) { > + struct bpf_prog *pass_filter = NULL; > struct sock_fprog_kern fprog = { > .len = err, > .filter = code, > }; > > - ppp_lock(ppp); > - if (ppp->pass_filter) { > - bpf_prog_destroy(ppp->
Re: [REGRESSION] in 3.18-rc1: ppp crashes kernel
Am 07.11.2014 um 12:56 schrieb Stefan Seyfried: > Hi Paul, > > Am 07.11.2014 um 12:53 schrieb Paul Bolle: >> Wild guess: is CONFIG_RCU_NOCB_CPU perhaps set in your >> v3.18-rc3 .config? > > Yes it is: > tux@linux:~> zgrep CONFIG_RCU_NOCB_CPU /proc/config.gz > CONFIG_RCU_NOCB_CPU=y > # CONFIG_RCU_NOCB_CPU_NONE is not set > # CONFIG_RCU_NOCB_CPU_ZERO is not set > CONFIG_RCU_NOCB_CPU_ALL=y > > And I'll try without it, but looking at the backtrace and the actual > BUG_ON() in the code, I cannot really believe it is the real problems. > > But I'll try with the config changed and with the above line removed. JFTR, this did not help: tux@linux:~/linux> zgrep CONFIG_RCU_NOCB_CPU /proc/config.gz # CONFIG_RCU_NOCB_CPU is not set neither did: --- a/init/main.c +++ b/init/main.c @@ -583,7 +583,7 @@ asmlinkage __visible void __init start_kernel(void) early_irq_init(); init_IRQ(); tick_init(); - rcu_init_nohz(); +// rcu_init_nohz(); init_timers(); hrtimers_init(); softirq_init(); -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [REGRESSION] in 3.18-rc1: ppp crashes kernel
Hi Paul, Am 07.11.2014 um 12:53 schrieb Paul Bolle: > On Fri, 2014-11-07 at 12:10 +0100, Stefan Seyfried wrote: >> I bisected the issue multiple times and always arrived at >> >> # first bad commit: [d6dd50e07c5bec00db2005969b1a01f8ca3d25ef] Merge branch >> 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip >> >> which is a merge commit unfortunately. > > That merge commit actually does add some code: > > git show d6dd50e07c5bec00db2005969b1a01f8ca3d25ef > [...] > diff --cc init/main.c > index 8af2f1abfe38,e3c4cdd94d5b..c5c11da6c4e1 > --- a/init/main.c > +++ b/init/main.c > @@@ -583,6 -585,6 +583,7 @@@ asmlinkage __visible void __init start_ > early_irq_init(); > init_IRQ(); > tick_init(); > ++rcu_init_nohz(); > init_timers(); > hrtimers_init(); > softirq_init(); > > Wild guess: is CONFIG_RCU_NOCB_CPU perhaps set in your > v3.18-rc3 .config? Yes it is: tux@linux:~> zgrep CONFIG_RCU_NOCB_CPU /proc/config.gz CONFIG_RCU_NOCB_CPU=y # CONFIG_RCU_NOCB_CPU_NONE is not set # CONFIG_RCU_NOCB_CPU_ZERO is not set CONFIG_RCU_NOCB_CPU_ALL=y And I'll try without it, but looking at the backtrace and the actual BUG_ON() in the code, I cannot really believe it is the real problems. But I'll try with the config changed and with the above line removed. Thanks, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[REGRESSION] in 3.18-rc1: ppp crashes kernel
Hi all, since 3.18-rc1, setting up a PPP interface kills my kernel with [ 163.433251] PPP generic driver version 2.4.2 [ 164.452474] [ cut here ] [ 164.453327] kernel BUG at ../mm/vmalloc.c:1316! [ 164.453327] invalid opcode: [#1] PREEMPT SMP [ 164.453327] Modules linked in: ppp_async crc_ccitt ppp_generic slhc af_packet xfs libcrc32c coretemp kvm_intel snd_hda_codec_conexant iTCO_wdt snd_hda_codec_generic iTCO_vendor_support uvcvideo snd_hda_intel snd_hda_controller mac80211 videobuf2_vmalloc snd_hda_codec kvm e1000e videobuf2_memops cfg80211 videobuf2_core v4l2_common snd_hwdep i2c_i801 videodev snd_pcm pcspkr thinkpad_acpi serio_raw wmi lpc_ich snd_timer thermal snd rfkill mfd_core tpm_tis shpchp mei_me soundcore ptp mei pps_core acpi_cpufreq tpm battery processor ac dm_mod btrfs xor raid6_pq i915 i2c_algo_bit drm_kms_helper drm video button sg [ 164.453327] CPU: 0 PID: 6927 Comm: pppd Not tainted 3.18.0-rc3-3.ge706e91-desktop #1 [ 164.453327] Hardware name: LENOVO 7470E36/7470E36, BIOS 6DET61WW (3.11 ) 11/10/2009 This is easy to reproduce with: linux:~ # cat bin/crashme.sh #!/bin/bash -x pppd local pty "netcat -l 1234" & sleep 1 pppd local pty "netcat localhost 1234" & sleep 1 3.17 works fine. I bisected the issue multiple times and always arrived at # first bad commit: [d6dd50e07c5bec00db2005969b1a01f8ca3d25ef] Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip which is a merge commit unfortunately. The BUG encountered above is in: 1309 static struct vm_struct *__get_vm_area_node(unsigned long size, 1310 unsigned long align, unsigned long flags, unsigned long start, 1311 unsigned long end, int node, gfp_t gfp_mask, const void *caller) 1312 { 1313 struct vmap_area *va; 1314 struct vm_struct *area; 1315 1316 BUG_ON(in_interrupt()); 1317 if (flags & VM_IOREMAP) 1318 align = 1ul << clamp(fls(size), PAGE_SHIFT, IOREMAP_MAX_ORDER); 1319 the call trace is: [ 164.453327] Call Trace: [ 164.453327] [] __vmalloc_node_range+0x6d/0x290 [ 164.453327] [] __vmalloc+0x3e/0x50 [ 164.453327] [] bpf_prog_alloc+0x30/0xa0 [ 164.453327] [] bpf_prog_create+0x46/0xb0 [ 164.453327] [] ppp_ioctl+0x420/0xe9a [ppp_generic] [ 164.453327] [] do_vfs_ioctl+0x2e7/0x4c0 [ 164.453327] [] SyS_ioctl+0x81/0xa0 [ 164.453327] [] system_call_fastpath+0x16/0x1b [ 164.453327] [<7f4502d87397>] 0x7f4502d87397 I have a crashdump of the kernel, but given this is easily reproducible, I doubt that I need to send this to anyone :-) Best regards, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[REGRESSION] in 3.18-rc1: ppp crashes kernel
Hi all, since 3.18-rc1, setting up a PPP interface kills my kernel with [ 163.433251] PPP generic driver version 2.4.2 [ 164.452474] [ cut here ] [ 164.453327] kernel BUG at ../mm/vmalloc.c:1316! [ 164.453327] invalid opcode: [#1] PREEMPT SMP [ 164.453327] Modules linked in: ppp_async crc_ccitt ppp_generic slhc af_packet xfs libcrc32c coretemp kvm_intel snd_hda_codec_conexant iTCO_wdt snd_hda_codec_generic iTCO_vendor_support uvcvideo snd_hda_intel snd_hda_controller mac80211 videobuf2_vmalloc snd_hda_codec kvm e1000e videobuf2_memops cfg80211 videobuf2_core v4l2_common snd_hwdep i2c_i801 videodev snd_pcm pcspkr thinkpad_acpi serio_raw wmi lpc_ich snd_timer thermal snd rfkill mfd_core tpm_tis shpchp mei_me soundcore ptp mei pps_core acpi_cpufreq tpm battery processor ac dm_mod btrfs xor raid6_pq i915 i2c_algo_bit drm_kms_helper drm video button sg [ 164.453327] CPU: 0 PID: 6927 Comm: pppd Not tainted 3.18.0-rc3-3.ge706e91-desktop #1 [ 164.453327] Hardware name: LENOVO 7470E36/7470E36, BIOS 6DET61WW (3.11 ) 11/10/2009 This is easy to reproduce with: linux:~ # cat bin/crashme.sh #!/bin/bash -x pppd local pty netcat -l 1234 sleep 1 pppd local pty netcat localhost 1234 sleep 1 3.17 works fine. I bisected the issue multiple times and always arrived at # first bad commit: [d6dd50e07c5bec00db2005969b1a01f8ca3d25ef] Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip which is a merge commit unfortunately. The BUG encountered above is in: 1309 static struct vm_struct *__get_vm_area_node(unsigned long size, 1310 unsigned long align, unsigned long flags, unsigned long start, 1311 unsigned long end, int node, gfp_t gfp_mask, const void *caller) 1312 { 1313 struct vmap_area *va; 1314 struct vm_struct *area; 1315 1316 BUG_ON(in_interrupt()); 1317 if (flags VM_IOREMAP) 1318 align = 1ul clamp(fls(size), PAGE_SHIFT, IOREMAP_MAX_ORDER); 1319 the call trace is: [ 164.453327] Call Trace: [ 164.453327] [811974bd] __vmalloc_node_range+0x6d/0x290 [ 164.453327] [8119771e] __vmalloc+0x3e/0x50 [ 164.453327] [81146950] bpf_prog_alloc+0x30/0xa0 [ 164.453327] [8157b716] bpf_prog_create+0x46/0xb0 [ 164.453327] [a07ecb90] ppp_ioctl+0x420/0xe9a [ppp_generic] [ 164.453327] [811df1c7] do_vfs_ioctl+0x2e7/0x4c0 [ 164.453327] [811df421] SyS_ioctl+0x81/0xa0 [ 164.453327] [8165ee2d] system_call_fastpath+0x16/0x1b [ 164.453327] [7f4502d87397] 0x7f4502d87397 I have a crashdump of the kernel, but given this is easily reproducible, I doubt that I need to send this to anyone :-) Best regards, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [REGRESSION] in 3.18-rc1: ppp crashes kernel
Hi Paul, Am 07.11.2014 um 12:53 schrieb Paul Bolle: On Fri, 2014-11-07 at 12:10 +0100, Stefan Seyfried wrote: I bisected the issue multiple times and always arrived at # first bad commit: [d6dd50e07c5bec00db2005969b1a01f8ca3d25ef] Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip which is a merge commit unfortunately. That merge commit actually does add some code: git show d6dd50e07c5bec00db2005969b1a01f8ca3d25ef [...] diff --cc init/main.c index 8af2f1abfe38,e3c4cdd94d5b..c5c11da6c4e1 --- a/init/main.c +++ b/init/main.c @@@ -583,6 -585,6 +583,7 @@@ asmlinkage __visible void __init start_ early_irq_init(); init_IRQ(); tick_init(); ++rcu_init_nohz(); init_timers(); hrtimers_init(); softirq_init(); Wild guess: is CONFIG_RCU_NOCB_CPU perhaps set in your v3.18-rc3 .config? Yes it is: tux@linux:~ zgrep CONFIG_RCU_NOCB_CPU /proc/config.gz CONFIG_RCU_NOCB_CPU=y # CONFIG_RCU_NOCB_CPU_NONE is not set # CONFIG_RCU_NOCB_CPU_ZERO is not set CONFIG_RCU_NOCB_CPU_ALL=y And I'll try without it, but looking at the backtrace and the actual BUG_ON() in the code, I cannot really believe it is the real problems. But I'll try with the config changed and with the above line removed. Thanks, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [REGRESSION] in 3.18-rc1: ppp crashes kernel
Am 07.11.2014 um 12:56 schrieb Stefan Seyfried: Hi Paul, Am 07.11.2014 um 12:53 schrieb Paul Bolle: Wild guess: is CONFIG_RCU_NOCB_CPU perhaps set in your v3.18-rc3 .config? Yes it is: tux@linux:~ zgrep CONFIG_RCU_NOCB_CPU /proc/config.gz CONFIG_RCU_NOCB_CPU=y # CONFIG_RCU_NOCB_CPU_NONE is not set # CONFIG_RCU_NOCB_CPU_ZERO is not set CONFIG_RCU_NOCB_CPU_ALL=y And I'll try without it, but looking at the backtrace and the actual BUG_ON() in the code, I cannot really believe it is the real problems. But I'll try with the config changed and with the above line removed. JFTR, this did not help: tux@linux:~/linux zgrep CONFIG_RCU_NOCB_CPU /proc/config.gz # CONFIG_RCU_NOCB_CPU is not set neither did: --- a/init/main.c +++ b/init/main.c @@ -583,7 +583,7 @@ asmlinkage __visible void __init start_kernel(void) early_irq_init(); init_IRQ(); tick_init(); - rcu_init_nohz(); +// rcu_init_nohz(); init_timers(); hrtimers_init(); softirq_init(); -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [REGRESSION] in 3.18-rc1: ppp crashes kernel
Hi Takashi, yes, this no longer crashes. No real-world test yet, but the obvious crash is gone. Thanks! Am 07.11.2014 um 14:22 schrieb Takashi Iwai: At Fri, 07 Nov 2014 12:10:46 +0100, Stefan Seyfried wrote: Hi all, since 3.18-rc1, setting up a PPP interface kills my kernel with [ 163.433251] PPP generic driver version 2.4.2 [ 164.452474] [ cut here ] [ 164.453327] kernel BUG at ../mm/vmalloc.c:1316! [ 164.453327] invalid opcode: [#1] PREEMPT SMP [ 164.453327] Modules linked in: ppp_async crc_ccitt ppp_generic slhc af_packet xfs libcrc32c coretemp kvm_intel snd_hda_codec_conexant iTCO_wdt snd_hda_codec_generic iTCO_vendor_support uvcvideo snd_hda_intel snd_hda_controller mac80211 videobuf2_vmalloc snd_hda_codec kvm e1000e videobuf2_memops cfg80211 videobuf2_core v4l2_common snd_hwdep i2c_i801 videodev snd_pcm pcspkr thinkpad_acpi serio_raw wmi lpc_ich snd_timer thermal snd rfkill mfd_core tpm_tis shpchp mei_me soundcore ptp mei pps_core acpi_cpufreq tpm battery processor ac dm_mod btrfs xor raid6_pq i915 i2c_algo_bit drm_kms_helper drm video button sg [ 164.453327] CPU: 0 PID: 6927 Comm: pppd Not tainted 3.18.0-rc3-3.ge706e91-desktop #1 [ 164.453327] Hardware name: LENOVO 7470E36/7470E36, BIOS 6DET61WW (3.11 ) 11/10/2009 This is easy to reproduce with: linux:~ # cat bin/crashme.sh #!/bin/bash -x pppd local pty netcat -l 1234 sleep 1 pppd local pty netcat localhost 1234 sleep 1 3.17 works fine. I bisected the issue multiple times and always arrived at # first bad commit: [d6dd50e07c5bec00db2005969b1a01f8ca3d25ef] Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip which is a merge commit unfortunately. The BUG encountered above is in: 1309 static struct vm_struct *__get_vm_area_node(unsigned long size, 1310 unsigned long align, unsigned long flags, unsigned long start, 1311 unsigned long end, int node, gfp_t gfp_mask, const void *caller) 1312 { 1313 struct vmap_area *va; 1314 struct vm_struct *area; 1315 1316 BUG_ON(in_interrupt()); 1317 if (flags VM_IOREMAP) 1318 align = 1ul clamp(fls(size), PAGE_SHIFT, IOREMAP_MAX_ORDER); 1319 the call trace is: [ 164.453327] Call Trace: [ 164.453327] [811974bd] __vmalloc_node_range+0x6d/0x290 [ 164.453327] [8119771e] __vmalloc+0x3e/0x50 [ 164.453327] [81146950] bpf_prog_alloc+0x30/0xa0 [ 164.453327] [8157b716] bpf_prog_create+0x46/0xb0 [ 164.453327] [a07ecb90] ppp_ioctl+0x420/0xe9a [ppp_generic] [ 164.453327] [811df1c7] do_vfs_ioctl+0x2e7/0x4c0 [ 164.453327] [811df421] SyS_ioctl+0x81/0xa0 [ 164.453327] [8165ee2d] system_call_fastpath+0x16/0x1b [ 164.453327] [7f4502d87397] 0x7f4502d87397 bpf_prog_create() is called inside spin_lock_bh(), and the BUG_ON() hits. Below is a quick fix. Takashi -- 8 -- From: Takashi Iwai ti...@suse.de Subject: [PATCH] net: ppp: Don't call bpf_prog_create() in ppp_lock In ppp_ioctl(), bpf_prog_create() is called inside ppp_lock, which eventually calls vmalloc() and hits BUG_ON() in vmalloc.c. This patch works around the problem by moving the allocation outside the lock. Reported-by: Stefan Seyfried stefan.seyfr...@googlemail.com Signed-off-by: Takashi Iwai ti...@suse.de FWIW :-) Tested-by: Stefan Seyfried stefan.seyfr...@googlemail.com --- drivers/net/ppp/ppp_generic.c | 40 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c index 68c3a3f4e0ab..794a47329368 100644 --- a/drivers/net/ppp/ppp_generic.c +++ b/drivers/net/ppp/ppp_generic.c @@ -755,23 +755,23 @@ static long ppp_ioctl(struct file *file, unsigned int cmd, unsigned long arg) err = get_filter(argp, code); if (err = 0) { + struct bpf_prog *pass_filter = NULL; struct sock_fprog_kern fprog = { .len = err, .filter = code, }; - ppp_lock(ppp); - if (ppp-pass_filter) { - bpf_prog_destroy(ppp-pass_filter); - ppp-pass_filter = NULL; + err = 0; + if (fprog.filter) + err = bpf_prog_create(pass_filter, fprog); + if (!err) { + ppp_lock(ppp); + if (ppp-pass_filter) + bpf_prog_destroy(ppp-pass_filter); + ppp-pass_filter = pass_filter; + ppp_unlock(ppp
[PATCH] Makefile: fix syntax error in warning message
From: Stefan Seyfried --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 893d6f0..eeaf3e7 100644 --- a/Makefile +++ b/Makefile @@ -606,7 +606,7 @@ ifdef CONFIG_CC_STACKPROTECTOR_REGULAR stackp-flag := -fstack-protector ifeq ($(call cc-option, $(stackp-flag)),) $(warning Cannot use CONFIG_CC_STACKPROTECTOR: \ - -fstack-protector not supported by compiler)) + -fstack-protector not supported by compiler) endif else ifdef CONFIG_CC_STACKPROTECTOR_STRONG stackp-flag := -fstack-protector-strong -- 1.8.5.2 Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Makefile: fix syntax error in warning message
From: Stefan Seyfried seife+ker...@b1-systems.com --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 893d6f0..eeaf3e7 100644 --- a/Makefile +++ b/Makefile @@ -606,7 +606,7 @@ ifdef CONFIG_CC_STACKPROTECTOR_REGULAR stackp-flag := -fstack-protector ifeq ($(call cc-option, $(stackp-flag)),) $(warning Cannot use CONFIG_CC_STACKPROTECTOR: \ - -fstack-protector not supported by compiler)) + -fstack-protector not supported by compiler) endif else ifdef CONFIG_CC_STACKPROTECTOR_STRONG stackp-flag := -fstack-protector-strong -- 1.8.5.2 Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
8250_pci: improve code comments and Kconfig help
Hi Greg, in order to avoid such regressions in the future, a comment in the source and a note in the Kconfig help text might be useful This patch is against git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git tty-next Best regards, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] 8250_pci: improve code comments and Kconfig help
From: Stefan Seyfried The recent regression about NetMos 9835 Multi-I/O boards indicates that comment pointing to the parport_serial driver could be helpful. Signed-off-by: Stefan Seyfried --- drivers/tty/serial/8250/8250_pci.c | 6 ++ drivers/tty/serial/8250/Kconfig| 2 ++ 2 files changed, 8 insertions(+) diff --git a/drivers/tty/serial/8250/8250_pci.c b/drivers/tty/serial/8250/8250_pci.c index c52948b..c626c4f 100644 --- a/drivers/tty/serial/8250/8250_pci.c +++ b/drivers/tty/serial/8250/8250_pci.c @@ -4797,6 +4797,12 @@ static struct pci_device_id serial_pci_tbl[] = { PCI_VENDOR_ID_IBM, 0x0299, 0, 0, pbn_b0_bt_2_115200 }, + /* +* other NetMos 9835 devices are most likely handled by the +* parport_serial driver, check drivers/parport/parport_serial.c +* before adding them here. +*/ + { PCI_VENDOR_ID_NETMOS, PCI_DEVICE_ID_NETMOS_9901, 0xA000, 0x1000, 0, 0, pbn_b0_1_115200 }, diff --git a/drivers/tty/serial/8250/Kconfig b/drivers/tty/serial/8250/Kconfig index a1ba94d..f3b306e 100644 --- a/drivers/tty/serial/8250/Kconfig +++ b/drivers/tty/serial/8250/Kconfig @@ -116,6 +116,8 @@ config SERIAL_8250_PCI This builds standard PCI serial support. You may be able to disable this feature if you only need legacy serial support. Saves about 9K. + Note that serial ports on NetMos 9835 Multi-I/O cards are handled + by the parport_serial driver, enabled with CONFIG_PARPORT_SERIAL. config SERIAL_8250_HP300 tristate -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
8250_pci: improve code comments and Kconfig help
Hi Greg, in order to avoid such regressions in the future, a comment in the source and a note in the Kconfig help text might be useful This patch is against git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git tty-next Best regards, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] 8250_pci: improve code comments and Kconfig help
From: Stefan Seyfried seife+ker...@b1-systems.com The recent regression about NetMos 9835 Multi-I/O boards indicates that comment pointing to the parport_serial driver could be helpful. Signed-off-by: Stefan Seyfried seife+ker...@b1-systems.com --- drivers/tty/serial/8250/8250_pci.c | 6 ++ drivers/tty/serial/8250/Kconfig| 2 ++ 2 files changed, 8 insertions(+) diff --git a/drivers/tty/serial/8250/8250_pci.c b/drivers/tty/serial/8250/8250_pci.c index c52948b..c626c4f 100644 --- a/drivers/tty/serial/8250/8250_pci.c +++ b/drivers/tty/serial/8250/8250_pci.c @@ -4797,6 +4797,12 @@ static struct pci_device_id serial_pci_tbl[] = { PCI_VENDOR_ID_IBM, 0x0299, 0, 0, pbn_b0_bt_2_115200 }, + /* +* other NetMos 9835 devices are most likely handled by the +* parport_serial driver, check drivers/parport/parport_serial.c +* before adding them here. +*/ + { PCI_VENDOR_ID_NETMOS, PCI_DEVICE_ID_NETMOS_9901, 0xA000, 0x1000, 0, 0, pbn_b0_1_115200 }, diff --git a/drivers/tty/serial/8250/Kconfig b/drivers/tty/serial/8250/Kconfig index a1ba94d..f3b306e 100644 --- a/drivers/tty/serial/8250/Kconfig +++ b/drivers/tty/serial/8250/Kconfig @@ -116,6 +116,8 @@ config SERIAL_8250_PCI This builds standard PCI serial support. You may be able to disable this feature if you only need legacy serial support. Saves about 9K. + Note that serial ports on NetMos 9835 Multi-I/O cards are handled + by the parport_serial driver, enabled with CONFIG_PARPORT_SERIAL. config SERIAL_8250_HP300 tristate -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
commit 8d2f8cd424 breaks parallel port, regression since 3.9-rc3 / backported to stable (3.4.37)
Hi all, the following commit: commit 8d2f8cd424ca0b99001f3ff4f5db87c4e525f366 Author: Wang YanQing Date: Fri Mar 1 11:47:20 2013 +0800 serial: 8250_pci: add support for another kind of NetMos Technology PCI 9835 Multi-I/O Controller 01:08.0 Communication controller: NetMos Technology PCI 9835 Multi-I/O Controller (rev 01) Subsystem: Device [1000:0012] Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- Cc: stable Signed-off-by: Greg Kroah-Hartman breaks my 05:05.0 Communication controller: NetMos Technology PCI 9835 Multi-I/O Controller (rev 01) 05:05.0 0780: 9710:9835 (rev 01) Subsystem: 1000:0012 which has two serial and one parallel port, driven by parport_serial. The reason is, that this commit adds the PCI ID to 8250_pci, when it was handled by parport_serial before. In my case (openSUSE kernel), 8250 is built in and parport_serial is built as a module. Unfortunately with the device occupied by 8250, parport_serial finds no device and thus does not drive the parport. I bisected this in the stable series after the openSUSE kernel update (which pulled in the stable kernel update) broke my printing. Actually the above commit is totally unnecessary: the serial ports work very well without it, they are just driven by another driver. Can this please be reverted? I can't see which problem it solves, but it definitely breaks the additional ports on my multi-i/o board. Best regards, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
commit 8d2f8cd424 breaks parallel port, regression since 3.9-rc3 / backported to stable (3.4.37)
Hi all, the following commit: commit 8d2f8cd424ca0b99001f3ff4f5db87c4e525f366 Author: Wang YanQing udkni...@gmail.com Date: Fri Mar 1 11:47:20 2013 +0800 serial: 8250_pci: add support for another kind of NetMos Technology PCI 9835 Multi-I/O Controller 01:08.0 Communication controller: NetMos Technology PCI 9835 Multi-I/O Controller (rev 01) Subsystem: Device [1000:0012] Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR- INTx- Interrupt: pin A routed to IRQ 20 Region 0: I/O ports at e050 [size=8] Region 1: I/O ports at e040 [size=8] Region 2: I/O ports at e030 [size=8] Region 3: I/O ports at e020 [size=8] Region 4: I/O ports at e010 [size=8] Region 5: I/O ports at e000 [size=16] Signed-off-by: Wang YanQing udkni...@gmail.com Cc: stable sta...@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gre...@linuxfoundation.org breaks my 05:05.0 Communication controller: NetMos Technology PCI 9835 Multi-I/O Controller (rev 01) 05:05.0 0780: 9710:9835 (rev 01) Subsystem: 1000:0012 which has two serial and one parallel port, driven by parport_serial. The reason is, that this commit adds the PCI ID to 8250_pci, when it was handled by parport_serial before. In my case (openSUSE kernel), 8250 is built in and parport_serial is built as a module. Unfortunately with the device occupied by 8250, parport_serial finds no device and thus does not drive the parport. I bisected this in the stable series after the openSUSE kernel update (which pulled in the stable kernel update) broke my printing. Actually the above commit is totally unnecessary: the serial ports work very well without it, they are just driven by another driver. Can this please be reverted? I can't see which problem it solves, but it definitely breaks the additional ports on my multi-i/o board. Best regards, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: INTEL_MEI_ME=y breaks suspend on 3.10-rc3
Hi all, I hate to say it, but this regression from 3.9 is still present in 3.10-rc7 :-( Am 19.06.2013 11:02, schrieb Stefan Seyfried: > The suspend/resume failure is easily reproduced by > > * booting with "init=/bin/bash no_console_suspend" > * mount /sys > * echo mem > /sys/power/state > * resume => lots of messages, finally kernel panic. Best regards, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: INTEL_MEI_ME=y breaks suspend on 3.10-rc3
Hi all, I hate to say it, but this regression from 3.9 is still present in 3.10-rc7 :-( Am 19.06.2013 11:02, schrieb Stefan Seyfried: The suspend/resume failure is easily reproduced by * booting with init=/bin/bash no_console_suspend * mount /sys * echo mem /sys/power/state * resume = lots of messages, finally kernel panic. Best regards, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: INTEL_MEI_ME=y breaks suspend on 3.10-rc3
Hi Tomas, Am 19.06.2013 10:52, schrieb Winkler, Tomas: >> So it is not yet fixed, unfortunately. > > Not sure I understand how to reproduce it. it is still falling on > suspend/resume or just unbind/bind? > Would you be so kind and send me the whole log. Both is still broken. I'm actually not really sure if the unbind / bind stuff is really related to the suspend / resume failure. The messages just looked similar to me, but that might not mean anything. Sending the whole log is not easy, since it overflows the dmesg buffer (I have CONFIG_LOG_BUF_SHIFT=18 which is "big enough" usually) and the journald just exits and restarts itself under such flooding, but I'll try. Since the resume from suspend to RAM hangs, it is hard to get any logs -- I never got the mei serial working before and a "real" serial port is not present on this Thinkpad -- since the resume does not seem to restart userspace before killing the machine, so nothing gets into the logs. The suspend/resume failure is easily reproduced by * booting with "init=/bin/bash no_console_suspend" * mount /sys * echo mem > /sys/power/state * resume => lots of messages, finally kernel panic. For the bind/unbind: the driver is built in (this is the openSUSE kernel-of-the-day), but unbinding / rebinding also reproducibly floods the logs. It does not seem to have additional side effects, but I cannot test if mei actually still works afterwards. I could try to take a picture of the panic, but it looked not really directly related, more like a stack overflow after too many errors or something like that (it also takes a few seconds after resume for the machine to panic). Best regards, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: INTEL_MEI_ME=y breaks suspend on 3.10-rc3
Hi Tomas, Am 19.06.2013 10:52, schrieb Winkler, Tomas: So it is not yet fixed, unfortunately. Not sure I understand how to reproduce it. it is still falling on suspend/resume or just unbind/bind? Would you be so kind and send me the whole log. Both is still broken. I'm actually not really sure if the unbind / bind stuff is really related to the suspend / resume failure. The messages just looked similar to me, but that might not mean anything. Sending the whole log is not easy, since it overflows the dmesg buffer (I have CONFIG_LOG_BUF_SHIFT=18 which is big enough usually) and the journald just exits and restarts itself under such flooding, but I'll try. Since the resume from suspend to RAM hangs, it is hard to get any logs -- I never got the mei serial working before and a real serial port is not present on this Thinkpad -- since the resume does not seem to restart userspace before killing the machine, so nothing gets into the logs. The suspend/resume failure is easily reproduced by * booting with init=/bin/bash no_console_suspend * mount /sys * echo mem /sys/power/state * resume = lots of messages, finally kernel panic. For the bind/unbind: the driver is built in (this is the openSUSE kernel-of-the-day), but unbinding / rebinding also reproducibly floods the logs. It does not seem to have additional side effects, but I cannot test if mei actually still works afterwards. I could try to take a picture of the panic, but it looked not really directly related, more like a stack overflow after too many errors or something like that (it also takes a few seconds after resume for the machine to panic). Best regards, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: INTEL_MEI_ME=y breaks suspend on 3.10-rc3
Hi Tomas, executive summary: it is not fixed in 3.10rc6 Am 03.06.2013 20:09, schrieb Tomas Winkler: >>> Or, to be more precise: it breaks resume. >>> >>> The machine seems to lock up hard after resume, then after a few seconds >>> it panics (caps lock blinking). >>> >>> Reproduced on ThinkPad X200s >>> >>> 00:03.0 0780: 8086:2a44 (rev 07) >>> Intel Corporation Mobile 4 Series Chipset MEI Controller >>> >>> Debugged with "init=/bin/bash no_console_suspend", I see lots of errors >>> from the mei_me driver, then finally the panic (some overflow maybe?). >>> >>> Unbinding the device before suspend fixes resume. >> >> I just noticed that I get the following message on unbinding: >> >> $ echo :00:03.0 > /sys/bus/pci/drivers/mei_me/unbind >> $ dmesg|tail -2 >> [ 1216.830034] mei_me :00:03.0: stop >> [ 1216.837018] mei_me :00:03.0: wait hw ready failed. status = 0x0 >> >> not sure if this is related. >> > Thanks for the report I'm looking into it. I looked at the git log of drivers/misc/mei and it looked promising. However, it still does not work, commit 42f132febff3b7b42c6c9dbfc151f29233be3132 does not seem to help enough on my hardware. Still just unbinding and rebinding with echo :00:03.0 > /sys/bus/pci/drivers/mei_me/unbind echo :00:03.0 > /sys/bus/pci/drivers/mei_me/bind triggers lots of [ 318.330981] mei_me :00:03.0: reset: wrong host start response [ 318.330984] mei_me :00:03.0: unexpected reset: dev_state = RESETTING [ 318.330990] mei_me :00:03.0: reset: unexpected enumeration response hbm. [ 318.330993] mei_me :00:03.0: unexpected reset: dev_state = RESETTING [ 318.331016] mei_me :00:03.0: reset: wrong host start response [ 318.331019] mei_me :00:03.0: unexpected reset: dev_state = RESETTING [ 346.571031] mei_me :00:03.0: reset: init clients timeout hbm_state = 1. [ 346.571047] mei_me :00:03.0: unexpected reset: dev_state = RESETTING [ 376.631030] mei_me :00:03.0: reset: init clients timeout hbm_state = 1. [ 376.631044] mei_me :00:03.0: unexpected reset: dev_state = RESETTING It does, however, calm down after a few seconds, only to spew a few lines once every 30 seconds: [ 406.691032] mei_me :00:03.0: reset: init clients timeout hbm_state = 1. [ 406.691048] mei_me :00:03.0: unexpected reset: dev_state = RESETTING [ 436.751033] mei_me :00:03.0: reset: init clients timeout hbm_state = 1. [ 436.751047] mei_me :00:03.0: unexpected reset: dev_state = RESETTING [ 466.811030] mei_me :00:03.0: reset: init clients timeout hbm_state = 1. [ 466.811044] mei_me :00:03.0: unexpected reset: dev_state = RESETTING So it is not yet fixed, unfortunately. Best regards, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: INTEL_MEI_ME=y breaks suspend on 3.10-rc3
Hi Tomas, executive summary: it is not fixed in 3.10rc6 Am 03.06.2013 20:09, schrieb Tomas Winkler: Or, to be more precise: it breaks resume. The machine seems to lock up hard after resume, then after a few seconds it panics (caps lock blinking). Reproduced on ThinkPad X200s 00:03.0 0780: 8086:2a44 (rev 07) Intel Corporation Mobile 4 Series Chipset MEI Controller Debugged with init=/bin/bash no_console_suspend, I see lots of errors from the mei_me driver, then finally the panic (some overflow maybe?). Unbinding the device before suspend fixes resume. I just noticed that I get the following message on unbinding: $ echo :00:03.0 /sys/bus/pci/drivers/mei_me/unbind $ dmesg|tail -2 [ 1216.830034] mei_me :00:03.0: stop [ 1216.837018] mei_me :00:03.0: wait hw ready failed. status = 0x0 not sure if this is related. Thanks for the report I'm looking into it. I looked at the git log of drivers/misc/mei and it looked promising. However, it still does not work, commit 42f132febff3b7b42c6c9dbfc151f29233be3132 does not seem to help enough on my hardware. Still just unbinding and rebinding with echo :00:03.0 /sys/bus/pci/drivers/mei_me/unbind echo :00:03.0 /sys/bus/pci/drivers/mei_me/bind triggers lots of [ 318.330981] mei_me :00:03.0: reset: wrong host start response [ 318.330984] mei_me :00:03.0: unexpected reset: dev_state = RESETTING [ 318.330990] mei_me :00:03.0: reset: unexpected enumeration response hbm. [ 318.330993] mei_me :00:03.0: unexpected reset: dev_state = RESETTING [ 318.331016] mei_me :00:03.0: reset: wrong host start response [ 318.331019] mei_me :00:03.0: unexpected reset: dev_state = RESETTING [ 346.571031] mei_me :00:03.0: reset: init clients timeout hbm_state = 1. [ 346.571047] mei_me :00:03.0: unexpected reset: dev_state = RESETTING [ 376.631030] mei_me :00:03.0: reset: init clients timeout hbm_state = 1. [ 376.631044] mei_me :00:03.0: unexpected reset: dev_state = RESETTING It does, however, calm down after a few seconds, only to spew a few lines once every 30 seconds: [ 406.691032] mei_me :00:03.0: reset: init clients timeout hbm_state = 1. [ 406.691048] mei_me :00:03.0: unexpected reset: dev_state = RESETTING [ 436.751033] mei_me :00:03.0: reset: init clients timeout hbm_state = 1. [ 436.751047] mei_me :00:03.0: unexpected reset: dev_state = RESETTING [ 466.811030] mei_me :00:03.0: reset: init clients timeout hbm_state = 1. [ 466.811044] mei_me :00:03.0: unexpected reset: dev_state = RESETTING So it is not yet fixed, unfortunately. Best regards, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/8] vtime: Use consistent clocks among nohz accounting
Am 03.06.2013 21:48, schrieb Frederic Weisbecker: > On Mon, Jun 03, 2013 at 11:47:17AM +0200, Stefan Seyfried wrote: >> FWIW: >> Tested-by: Stefan Seyfried >> >> This patch fixes the 0% CPU issue on openSUSE Factory kernels for me. > > Thanks! The patch has been committed already so I can't add your Tested-by: > but feedbacks on testing are always appeciated. But it did not end up in Linus' tree yet. That would be more important for me than the credits in the commit message :-) Thanks, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: INTEL_MEI_ME=y breaks suspend on 3.10-rc3
Am 03.06.2013 19:38, schrieb Stefan Seyfried: > Or, to be more precise: it breaks resume. > > The machine seems to lock up hard after resume, then after a few seconds > it panics (caps lock blinking). > > Reproduced on ThinkPad X200s > > 00:03.0 0780: 8086:2a44 (rev 07) > Intel Corporation Mobile 4 Series Chipset MEI Controller > > Debugged with "init=/bin/bash no_console_suspend", I see lots of errors > from the mei_me driver, then finally the panic (some overflow maybe?). > > Unbinding the device before suspend fixes resume. I just noticed that I get the following message on unbinding: $ echo :00:03.0 > /sys/bus/pci/drivers/mei_me/unbind $ dmesg|tail -2 [ 1216.830034] mei_me :00:03.0: stop [ 1216.837018] mei_me :00:03.0: wait hw ready failed. status = 0x0 not sure if this is related. Best regards, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
INTEL_MEI_ME=y breaks suspend on 3.10-rc3
Or, to be more precise: it breaks resume. The machine seems to lock up hard after resume, then after a few seconds it panics (caps lock blinking). Reproduced on ThinkPad X200s 00:03.0 0780: 8086:2a44 (rev 07) Intel Corporation Mobile 4 Series Chipset MEI Controller Debugged with "init=/bin/bash no_console_suspend", I see lots of errors from the mei_me driver, then finally the panic (some overflow maybe?). Unbinding the device before suspend fixes resume. This machine has suspended and resumed fine with 3.9. This machine has no serial port, so it is hard for me to capture output. I could try to take a picture of the panic message if that would be helpful. Best regards, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/8] vtime: Use consistent clocks among nohz accounting
Am 20.05.2013 18:01, schrieb Frederic Weisbecker: > While computing the cputime delta of dynticks CPUs, > we are mixing up clocks of differents natures: [...] > As a consequence, some strange behaviour with unstable tsc > has been observed such as non progressing constant zero cputime. > (The 'top' command showing no load). This happens for example on my trusty ThinkPad X200s (family 6 model 23 stepping 10 Core 2 duo), seriously confusing its user (me :-). > Fix this by only using local_clock(), or its irq safe/remote > equivalent, in vtime code. > > Reported-by: Mike Galbraith > Suggested-by: Mike Galbraith > Cc: Steven Rostedt > Cc: Paul E. McKenney > Cc: Ingo Molnar > Cc: Thomas Gleixner > Cc: Peter Zijlstra > Cc: Borislav Petkov > Cc: Li Zhong > Cc: Mike Galbraith > Signed-off-by: Frederic Weisbecker FWIW: Tested-by: Stefan Seyfried This patch fixes the 0% CPU issue on openSUSE Factory kernels for me. Best regards, Stefan -- Stefan Seyfried Linux Consultant & Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/8] vtime: Use consistent clocks among nohz accounting
Am 20.05.2013 18:01, schrieb Frederic Weisbecker: While computing the cputime delta of dynticks CPUs, we are mixing up clocks of differents natures: [...] As a consequence, some strange behaviour with unstable tsc has been observed such as non progressing constant zero cputime. (The 'top' command showing no load). This happens for example on my trusty ThinkPad X200s (family 6 model 23 stepping 10 Core 2 duo), seriously confusing its user (me :-). Fix this by only using local_clock(), or its irq safe/remote equivalent, in vtime code. Reported-by: Mike Galbraith efa...@gmx.de Suggested-by: Mike Galbraith efa...@gmx.de Cc: Steven Rostedt rost...@goodmis.org Cc: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Ingo Molnar mi...@kernel.org Cc: Thomas Gleixner t...@linutronix.de Cc: Peter Zijlstra pet...@infradead.org Cc: Borislav Petkov b...@alien8.de Cc: Li Zhong zh...@linux.vnet.ibm.com Cc: Mike Galbraith efa...@gmx.de Signed-off-by: Frederic Weisbecker fweis...@gmail.com FWIW: Tested-by: Stefan Seyfried seife+...@b1-systems.com This patch fixes the 0% CPU issue on openSUSE Factory kernels for me. Best regards, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
INTEL_MEI_ME=y breaks suspend on 3.10-rc3
Or, to be more precise: it breaks resume. The machine seems to lock up hard after resume, then after a few seconds it panics (caps lock blinking). Reproduced on ThinkPad X200s 00:03.0 0780: 8086:2a44 (rev 07) Intel Corporation Mobile 4 Series Chipset MEI Controller Debugged with init=/bin/bash no_console_suspend, I see lots of errors from the mei_me driver, then finally the panic (some overflow maybe?). Unbinding the device before suspend fixes resume. This machine has suspended and resumed fine with 3.9. This machine has no serial port, so it is hard for me to capture output. I could try to take a picture of the panic message if that would be helpful. Best regards, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: INTEL_MEI_ME=y breaks suspend on 3.10-rc3
Am 03.06.2013 19:38, schrieb Stefan Seyfried: Or, to be more precise: it breaks resume. The machine seems to lock up hard after resume, then after a few seconds it panics (caps lock blinking). Reproduced on ThinkPad X200s 00:03.0 0780: 8086:2a44 (rev 07) Intel Corporation Mobile 4 Series Chipset MEI Controller Debugged with init=/bin/bash no_console_suspend, I see lots of errors from the mei_me driver, then finally the panic (some overflow maybe?). Unbinding the device before suspend fixes resume. I just noticed that I get the following message on unbinding: $ echo :00:03.0 /sys/bus/pci/drivers/mei_me/unbind $ dmesg|tail -2 [ 1216.830034] mei_me :00:03.0: stop [ 1216.837018] mei_me :00:03.0: wait hw ready failed. status = 0x0 not sure if this is related. Best regards, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/8] vtime: Use consistent clocks among nohz accounting
Am 03.06.2013 21:48, schrieb Frederic Weisbecker: On Mon, Jun 03, 2013 at 11:47:17AM +0200, Stefan Seyfried wrote: FWIW: Tested-by: Stefan Seyfried seife+...@b1-systems.com This patch fixes the 0% CPU issue on openSUSE Factory kernels for me. Thanks! The patch has been committed already so I can't add your Tested-by: but feedbacks on testing are always appeciated. But it did not end up in Linus' tree yet. That would be more important for me than the credits in the commit message :-) Thanks, Stefan -- Stefan Seyfried Linux Consultant Developer -- GPG Key: 0x731B665B B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bluez-devel] [BUG] rfcomm
Dave Young schrieb: >> Feb 16 23:41:33 alon1 BUG: unable to handle kernel NULL pointer dereference >> at virtual address 0008 >> Feb 16 23:41:33 alon1 printing eip: c01b2db6 *pde = >> Feb 16 23:41:33 alon1 Oops: [#1] PREEMPT >> Feb 16 23:41:33 alon1 Modules linked in: ppp_deflate zlib_deflate >> zlib_inflate bsd_comp ppp_async rfcomm l2cap hci_usb vmnet(P) vmmon(P) tun >> radeon drm autofs4 ipv6 aes_generic crypto_algapi ieee80211_crypt_ccmp >> nf_nat_irc nf_nat_ftp nf_conntrack_irc nf_conntrack_ftp ipt_MASQUERADE >> iptable_nat nf_nat ipt_REJECT xt_tcpudp ipt_LOG xt_limit xt_state >> nf_conntrack_ipv4 nf_conntrack iptable_filter ip_tables x_tables snd_pcm_oss >> snd_mixer_oss snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq >> snd_seq_device bluetooth ppp_generic slhc ioatdma dca cfq_iosched >> cpufreq_powersave cpufreq_ondemand cpufreq_conservative acpi_cpufreq >> freq_table uinput fan af_packet nls_cp1255 nls_iso8859_1 nls_utf8 nls_base >> pcmcia snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm nsc_ircc snd_timer >> ipw2200 thinkpad_acpi irda snd ehci_hcd yenta_socket uhci_hcd psmouse >> ieee80211 soundcore intel_agp hwmon rsrc_nonstatic pcspkr e1000 crc_ccitt >> snd_page_alloc i2c_i801 ieee80211_crypt pcmcia_core agpgart thermal b a > ttery nvram rtc sr_mod ac sg firmware_class button processor cdrom unix > usbcore evdev ext3 jbd ext2 mbcache loop ata_piix libata sd_mod scsi_mod >> Feb 16 23:41:33 alon1 >> Feb 16 23:41:33 alon1 Pid: 4, comm: events/0 Tainted: P >> (2.6.24-gentoo-r2 #1) >> Feb 16 23:41:33 alon1 EIP: 0060:[] EFLAGS: 00010282 CPU: 0 >> Feb 16 23:41:33 alon1 EIP is at sysfs_get_dentry+0x26/0x80 >> Feb 16 23:41:33 alon1 EAX: EBX: ECX: EDX: f48a2210 >> Feb 16 23:41:33 alon1 ESI: f72eb900 EDI: f4803ae0 EBP: f4803ae0 ESP: f7c49efc >> Feb 16 23:41:33 alon1 hcid[7004]: HCI dev 0 registered >> Feb 16 23:41:33 alon1 DS: 007b ES: 007b FS: GS: SS: 0068 >> Feb 16 23:41:33 alon1 Process events/0 (pid: 4, ti=f7c48000 task=f7c3efc0 >> task.ti=f7c48000) >> Feb 16 23:41:33 alon1 Stack: f7cb6140 f4822668 f7e71e10 c01b304d >> fffe c030ba9c >> Feb 16 23:41:33 alon1 f7cb6140 f4822668 f6da6720 f7cb6140 f4822668 f6da6720 >> c030ba8e c01ce20b >> Feb 16 23:41:33 alon1 f6e9dd00 c030ba8e f6da6720 f6e9dd00 f6e9dd00 >> f4822600 >> Feb 16 23:41:33 alon1 Call Trace: >> Feb 16 23:41:33 alon1 [] sysfs_move_dir+0x3d/0x1f0 >> Feb 16 23:41:33 alon1 [] kobject_move+0x9b/0x120 >> Feb 16 23:41:33 alon1 [] device_move+0x51/0x110 >> Feb 16 23:41:33 alon1 [] del_conn+0x0/0x70 [bluetooth] >> Feb 16 23:41:33 alon1 [] del_conn+0x19/0x70 [bluetooth] >> Feb 16 23:41:33 alon1 [] run_workqueue+0x81/0x140 >> Feb 16 23:41:33 alon1 [] schedule+0x168/0x2e0 > Could you try patch below? Works fine for me. Thanks. Together with the other two patches already taken by davem, this fixes all my current BT problems :-) > Defer hci_unregister_sysfs because hci device could be destructed > while hci conn devices still there. > > Signed-off-by: Dave Young <[EMAIL PROTECTED]> > > --- > net/bluetooth/hci_core.c |4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff -upr linux/net/bluetooth/hci_core.c linux.new/net/bluetooth/hci_core.c > --- linux/net/bluetooth/hci_core.c2008-02-20 18:27:28.0 +0800 > +++ linux.new/net/bluetooth/hci_core.c2008-02-20 18:28:34.0 > +0800 > @@ -901,8 +901,6 @@ int hci_unregister_dev(struct hci_dev *h > > BT_DBG("%p name %s type %d", hdev, hdev->name, hdev->type); > > - hci_unregister_sysfs(hdev); > - > write_lock_bh(_dev_list_lock); > list_del(>list); > write_unlock_bh(_dev_list_lock); > @@ -914,6 +912,8 @@ int hci_unregister_dev(struct hci_dev *h > > hci_notify(hdev, HCI_DEV_UNREG); > > + hci_unregister_sysfs(hdev); > + > __hci_dev_put(hdev); > > return 0; -- Stefan Seyfried R Team Mobile Devices| "Any ideas, John?" SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out." This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bluez-devel] [BUG] rfcomm
Dave Young schrieb: Feb 16 23:41:33 alon1 BUG: unable to handle kernel NULL pointer dereference at virtual address 0008 Feb 16 23:41:33 alon1 printing eip: c01b2db6 *pde = Feb 16 23:41:33 alon1 Oops: [#1] PREEMPT Feb 16 23:41:33 alon1 Modules linked in: ppp_deflate zlib_deflate zlib_inflate bsd_comp ppp_async rfcomm l2cap hci_usb vmnet(P) vmmon(P) tun radeon drm autofs4 ipv6 aes_generic crypto_algapi ieee80211_crypt_ccmp nf_nat_irc nf_nat_ftp nf_conntrack_irc nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat ipt_REJECT xt_tcpudp ipt_LOG xt_limit xt_state nf_conntrack_ipv4 nf_conntrack iptable_filter ip_tables x_tables snd_pcm_oss snd_mixer_oss snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device bluetooth ppp_generic slhc ioatdma dca cfq_iosched cpufreq_powersave cpufreq_ondemand cpufreq_conservative acpi_cpufreq freq_table uinput fan af_packet nls_cp1255 nls_iso8859_1 nls_utf8 nls_base pcmcia snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm nsc_ircc snd_timer ipw2200 thinkpad_acpi irda snd ehci_hcd yenta_socket uhci_hcd psmouse ieee80211 soundcore intel_agp hwmon rsrc_nonstatic pcspkr e1000 crc_ccitt snd_page_alloc i2c_i801 ieee80211_crypt pcmcia_core agpgart thermal b a ttery nvram rtc sr_mod ac sg firmware_class button processor cdrom unix usbcore evdev ext3 jbd ext2 mbcache loop ata_piix libata sd_mod scsi_mod Feb 16 23:41:33 alon1 Feb 16 23:41:33 alon1 Pid: 4, comm: events/0 Tainted: P (2.6.24-gentoo-r2 #1) Feb 16 23:41:33 alon1 EIP: 0060:[c01b2db6] EFLAGS: 00010282 CPU: 0 Feb 16 23:41:33 alon1 EIP is at sysfs_get_dentry+0x26/0x80 Feb 16 23:41:33 alon1 EAX: EBX: ECX: EDX: f48a2210 Feb 16 23:41:33 alon1 ESI: f72eb900 EDI: f4803ae0 EBP: f4803ae0 ESP: f7c49efc Feb 16 23:41:33 alon1 hcid[7004]: HCI dev 0 registered Feb 16 23:41:33 alon1 DS: 007b ES: 007b FS: GS: SS: 0068 Feb 16 23:41:33 alon1 Process events/0 (pid: 4, ti=f7c48000 task=f7c3efc0 task.ti=f7c48000) Feb 16 23:41:33 alon1 Stack: f7cb6140 f4822668 f7e71e10 c01b304d fffe c030ba9c Feb 16 23:41:33 alon1 f7cb6140 f4822668 f6da6720 f7cb6140 f4822668 f6da6720 c030ba8e c01ce20b Feb 16 23:41:33 alon1 f6e9dd00 c030ba8e f6da6720 f6e9dd00 f6e9dd00 f4822600 Feb 16 23:41:33 alon1 Call Trace: Feb 16 23:41:33 alon1 [c01b304d] sysfs_move_dir+0x3d/0x1f0 Feb 16 23:41:33 alon1 [c01ce20b] kobject_move+0x9b/0x120 Feb 16 23:41:33 alon1 [c0241711] device_move+0x51/0x110 Feb 16 23:41:33 alon1 [f9aaed80] del_conn+0x0/0x70 [bluetooth] Feb 16 23:41:33 alon1 [f9aaed99] del_conn+0x19/0x70 [bluetooth] Feb 16 23:41:33 alon1 [c012c1a1] run_workqueue+0x81/0x140 Feb 16 23:41:33 alon1 [c02c0c88] schedule+0x168/0x2e0 Could you try patch below? Works fine for me. Thanks. Together with the other two patches already taken by davem, this fixes all my current BT problems :-) Defer hci_unregister_sysfs because hci device could be destructed while hci conn devices still there. Signed-off-by: Dave Young [EMAIL PROTECTED] --- net/bluetooth/hci_core.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff -upr linux/net/bluetooth/hci_core.c linux.new/net/bluetooth/hci_core.c --- linux/net/bluetooth/hci_core.c2008-02-20 18:27:28.0 +0800 +++ linux.new/net/bluetooth/hci_core.c2008-02-20 18:28:34.0 +0800 @@ -901,8 +901,6 @@ int hci_unregister_dev(struct hci_dev *h BT_DBG(%p name %s type %d, hdev, hdev-name, hdev-type); - hci_unregister_sysfs(hdev); - write_lock_bh(hci_dev_list_lock); list_del(hdev-list); write_unlock_bh(hci_dev_list_lock); @@ -914,6 +912,8 @@ int hci_unregister_dev(struct hci_dev *h hci_notify(hdev, HCI_DEV_UNREG); + hci_unregister_sysfs(hdev); + __hci_dev_put(hdev); return 0; -- Stefan Seyfried RD Team Mobile Devices| Any ideas, John? SUSE LINUX Products GmbH, Nürnberg | Well, surrounding them's out. This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [rft] Kill junk from s2ram resume paths
On Tue, Jul 31, 2007 at 04:43:34PM +0200, Stefan Seyfried wrote: > On Tue, Jul 31, 2007 at 04:01:40PM +0200, Pavel Machek wrote: > > Hi! > > > > > > > > # Running in *copy* of this code, somewhere in low 1MB. > > > > > > > > > > > > - movb$0xa1, %al ; outb %al, $0x80 > > > > > > > > > > Well, what was this for? > > > > > > > > Debugging leds on port 80. I still have that card somewhere > > > > :-). Interesting parties can reinsert it. > > > > > > Ah, I see. > > > > > > Hmm, can you please write about that in the chanelog more explicitly? > > > Or just comment it out with a "uncomment this to get ..." text? > > > > I still need someone with x86-64 to test it for me before I submit it > > properly ;-). Updated patch follows. > > Compiling right now. Worked well on my x86_64 testmachine (a 64bit Thinkpad), worked before and after the patch with 2.6.23-rc1. -- Stefan Seyfried QA / R Team Mobile Devices| "Any ideas, John?" SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out." This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Hibernation considerations
Hi, Sorry for joining late, just a small annotation: On Tue, Jul 17, 2007 at 01:18:13PM -0700, [EMAIL PROTECTED] wrote: > non-ACPI hibernate > > since the box powers off > it uses zero power while suspended > another OS could be run before a resume > hardware can be swapped, suspend image could be sent around the world to > be restored on another system. > restore makes no assumptions about the state of the hardware when it is > restored > restore is slower (full BIOS boot is required) > should be able to work on just about any hardware (the limit is the ability > to initialize the devices) > > > ACPI suspends > > since the box never completely powers off:A wrong > a complete power failure breaks the suspend wrong > the OS must remain in control so other uses must be prevented. > hardware must remain in the ACPI state from suspend until restore. > restore can be faster (some initialization may be able to be skipped) > requires ACPI hardware support > > under the catagory of ACPI suspends you have ACPI S4 turns off the machine completely and you can remove the battery (this is even required somewhere in the spec). Any state saving is done in CMOS RAM or flash. But for example many Notebooks resume much faster if they go through the ACPI S4 hooks during suspend (less than one second from "lid open" to "grub" while they need ~10 seconds through the BIOS on a "normal" boot. My Toughbook resumes on "Lid Opened" after S4, it doesn't after a shutdown. So there will be differences. I'm not saying that they are too important, but 20% faster resume still is a good saving for me. No need to restart this thread btw ;-) Have fun, Stefan -- Stefan Seyfried QA / R Team Mobile Devices| "Any ideas, John?" SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out." This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Hibernation considerations
Hi, Sorry for joining late, just a small annotation: On Tue, Jul 17, 2007 at 01:18:13PM -0700, [EMAIL PROTECTED] wrote: non-ACPI hibernate since the box powers off it uses zero power while suspended another OS could be run before a resume hardware can be swapped, suspend image could be sent around the world to be restored on another system. restore makes no assumptions about the state of the hardware when it is restored restore is slower (full BIOS boot is required) should be able to work on just about any hardware (the limit is the ability to initialize the devices) ACPI suspends since the box never completely powers off:A wrong a complete power failure breaks the suspend wrong the OS must remain in control so other uses must be prevented. hardware must remain in the ACPI state from suspend until restore. restore can be faster (some initialization may be able to be skipped) requires ACPI hardware support under the catagory of ACPI suspends you have ACPI S4 turns off the machine completely and you can remove the battery (this is even required somewhere in the spec). Any state saving is done in CMOS RAM or flash. But for example many Notebooks resume much faster if they go through the ACPI S4 hooks during suspend (less than one second from lid open to grub while they need ~10 seconds through the BIOS on a normal boot. My Toughbook resumes on Lid Opened after S4, it doesn't after a shutdown. So there will be differences. I'm not saying that they are too important, but 20% faster resume still is a good saving for me. No need to restart this thread btw ;-) Have fun, Stefan -- Stefan Seyfried QA / RD Team Mobile Devices| Any ideas, John? SUSE LINUX Products GmbH, Nürnberg | Well, surrounding them's out. This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [rft] Kill junk from s2ram resume paths
On Tue, Jul 31, 2007 at 04:43:34PM +0200, Stefan Seyfried wrote: On Tue, Jul 31, 2007 at 04:01:40PM +0200, Pavel Machek wrote: Hi! # Running in *copy* of this code, somewhere in low 1MB. - movb$0xa1, %al ; outb %al, $0x80 Well, what was this for? Debugging leds on port 80. I still have that card somewhere :-). Interesting parties can reinsert it. Ah, I see. Hmm, can you please write about that in the chanelog more explicitly? Or just comment it out with a uncomment this to get ... text? I still need someone with x86-64 to test it for me before I submit it properly ;-). Updated patch follows. Compiling right now. Worked well on my x86_64 testmachine (a 64bit Thinkpad), worked before and after the patch with 2.6.23-rc1. -- Stefan Seyfried QA / RD Team Mobile Devices| Any ideas, John? SUSE LINUX Products GmbH, Nürnberg | Well, surrounding them's out. This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [rft] Kill junk from s2ram resume paths
On Tue, Jul 31, 2007 at 04:01:40PM +0200, Pavel Machek wrote: > Hi! > > > > > > # Running in *copy* of this code, somewhere in low 1MB. > > > > > > > > > > - movb$0xa1, %al ; outb %al, $0x80 > > > > > > > > Well, what was this for? > > > > > > Debugging leds on port 80. I still have that card somewhere > > > :-). Interesting parties can reinsert it. > > > > Ah, I see. > > > > Hmm, can you please write about that in the chanelog more explicitly? > > Or just comment it out with a "uncomment this to get ..." text? > > I still need someone with x86-64 to test it for me before I submit it > properly ;-). Updated patch follows. Compiling right now. > Pavel > > diff --git a/arch/i386/kernel/acpi/wakeup.S b/arch/i386/kernel/acpi/wakeup.S > index 1415da1..9cebef7 100644 > --- a/arch/i386/kernel/acpi/wakeup.S > +++ b/arch/i386/kernel/acpi/wakeup.S > @@ -28,21 +28,6 @@ #define BEEP \ > movb$15, %al; \ > outb%al, $66; > > -#define BEEP \ > - inb $97, %al; \ > - outb%al, $0x80; \ > - movb$3, %al;\ > - outb%al, $97; \ > - outb%al, $0x80; \ > - movb$-74, %al; \ > - outb%al, $67; \ > - outb%al, $0x80; \ > - movb$-119, %al; \ > - outb%al, $66; \ > - outb%al, $0x80; \ > - movb $15, %al; \ > - outb%al, $66; > - > ALIGN > .align 4096 > ENTRY(wakeup_start) This hunk rejected for me (against 2.6.23-rc1), but i'm testing x86_64, so it did not matter ;-) -- Stefan Seyfried QA / R Team Mobile Devices| "Any ideas, John?" SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out." This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [rft] Kill junk from s2ram resume paths
On Tue, Jul 31, 2007 at 04:01:40PM +0200, Pavel Machek wrote: Hi! # Running in *copy* of this code, somewhere in low 1MB. - movb$0xa1, %al ; outb %al, $0x80 Well, what was this for? Debugging leds on port 80. I still have that card somewhere :-). Interesting parties can reinsert it. Ah, I see. Hmm, can you please write about that in the chanelog more explicitly? Or just comment it out with a uncomment this to get ... text? I still need someone with x86-64 to test it for me before I submit it properly ;-). Updated patch follows. Compiling right now. Pavel diff --git a/arch/i386/kernel/acpi/wakeup.S b/arch/i386/kernel/acpi/wakeup.S index 1415da1..9cebef7 100644 --- a/arch/i386/kernel/acpi/wakeup.S +++ b/arch/i386/kernel/acpi/wakeup.S @@ -28,21 +28,6 @@ #define BEEP \ movb$15, %al; \ outb%al, $66; -#define BEEP \ - inb $97, %al; \ - outb%al, $0x80; \ - movb$3, %al;\ - outb%al, $97; \ - outb%al, $0x80; \ - movb$-74, %al; \ - outb%al, $67; \ - outb%al, $0x80; \ - movb$-119, %al; \ - outb%al, $66; \ - outb%al, $0x80; \ - movb$15, %al; \ - outb%al, $66; - ALIGN .align 4096 ENTRY(wakeup_start) This hunk rejected for me (against 2.6.23-rc1), but i'm testing x86_64, so it did not matter ;-) -- Stefan Seyfried QA / RD Team Mobile Devices| Any ideas, John? SUSE LINUX Products GmbH, Nürnberg | Well, surrounding them's out. This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ACPI: update ACPI_PROCFS removal schedule
On Thu, Jul 12, 2007 at 06:20:34PM +0800, rzhang1 wrote: > From: Zhang Rui <[EMAIL PROTECTED]> > > ACPI sysfs conversion is not finished yet and > some user space tools still depend on the ACPI procfs I/F. > > The ACPI_PROCFS removal schedule is changed to Jan 08. I think that's too early. The conversion to sysfs is not even finished, so it will be less than 6 months. -- Stefan Seyfried QA / R Team Mobile Devices| "Any ideas, John?" SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out." This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][BUTTON] remove procfs-interface
On Fri, Jul 13, 2007 at 12:37:07AM +0530, Satyam Sharma wrote: > On 7/12/07, Zhang, Rui <[EMAIL PROTECTED]> wrote: > >Well, the ACPI sysfs conversion is not finished yet > >[...] > >I'm not sure if the button sysfs I/F is already finished. > >We'd better make a double check. :) > > Ok, this sounds reasonable. > > >and some user space tools still use the ACPI procfs. > > But this does *not*, IMHO. It quite defeats the whole concept of > feature-removal-schedule.txt. I think that file exists precisely > because we cannot gratuitously break userspace interfaces just > like that, but when something gets put up there with a removal date > that is a good one year in the future, and userspace tools _still_ > continue to use it ... then, I suspect something's seriously wrong. Holy sh*t. There is not even a functional replacement ready, but still everybody wants to remove /proc/acpi. (Maybe the replacement started to work recently, i have not looked into this area for the last months. This does not change my pint, though). This is not going to work. IMNSHO, we need the new interface available and usable for quite some time (i'd say for over one year), and then we can start to phase out the old interface. Starting with removing /proc/acpi is not the correct ordering of actions. > Either the feature-removal-schedule.txt file has become something > that users don't even bother checking, or else, they _know_ that > even if they don't bother keeping up with the pace in kernel-land, > that interface still won't go away (because they're still using it!). Or they look at the feature-removal document, find out that there is no replacement available and conclude "the writers of this document must have been on crack, or this document is unmaintained". I cannot disagree with them. -- Stefan Seyfried QA / R Team Mobile Devices| "Any ideas, John?" SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out." This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6 patch] the scheduled ACPI_PROCFS removal
On Thu, Jul 12, 2007 at 10:18:17AM +0100, Richard Hughes wrote: > On Thu, 2007-07-12 at 09:32 +0400, Alexey Starikovskiy wrote: > > >> [*] Does someone have an alternative for > > >> /proc/acpi/battery/BAT1/{state,info}? > > I'm working on it. Should have proto by the end of week. > > If you are using the power_supply class (i hope you are ;-) then a HAL > from freedesktop git should make userspace continue to just work. Having to update HAL is not my definition of "does not break userspace". And, BTW, there is more than just HAL out there using /proc/acpi, and this should continue to work. -- Stefan Seyfried QA / R Team Mobile Devices| "Any ideas, John?" SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out." This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6 patch] the scheduled ACPI_PROCFS removal
On Thu, Jul 12, 2007 at 10:18:17AM +0100, Richard Hughes wrote: On Thu, 2007-07-12 at 09:32 +0400, Alexey Starikovskiy wrote: [*] Does someone have an alternative for /proc/acpi/battery/BAT1/{state,info}? I'm working on it. Should have proto by the end of week. If you are using the power_supply class (i hope you are ;-) then a HAL from freedesktop git should make userspace continue to just work. Having to update HAL is not my definition of does not break userspace. And, BTW, there is more than just HAL out there using /proc/acpi, and this should continue to work. -- Stefan Seyfried QA / RD Team Mobile Devices| Any ideas, John? SUSE LINUX Products GmbH, Nürnberg | Well, surrounding them's out. This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][BUTTON] remove procfs-interface
On Fri, Jul 13, 2007 at 12:37:07AM +0530, Satyam Sharma wrote: On 7/12/07, Zhang, Rui [EMAIL PROTECTED] wrote: Well, the ACPI sysfs conversion is not finished yet [...] I'm not sure if the button sysfs I/F is already finished. We'd better make a double check. :) Ok, this sounds reasonable. and some user space tools still use the ACPI procfs. But this does *not*, IMHO. It quite defeats the whole concept of feature-removal-schedule.txt. I think that file exists precisely because we cannot gratuitously break userspace interfaces just like that, but when something gets put up there with a removal date that is a good one year in the future, and userspace tools _still_ continue to use it ... then, I suspect something's seriously wrong. Holy sh*t. There is not even a functional replacement ready, but still everybody wants to remove /proc/acpi. (Maybe the replacement started to work recently, i have not looked into this area for the last months. This does not change my pint, though). This is not going to work. IMNSHO, we need the new interface available and usable for quite some time (i'd say for over one year), and then we can start to phase out the old interface. Starting with removing /proc/acpi is not the correct ordering of actions. Either the feature-removal-schedule.txt file has become something that users don't even bother checking, or else, they _know_ that even if they don't bother keeping up with the pace in kernel-land, that interface still won't go away (because they're still using it!). Or they look at the feature-removal document, find out that there is no replacement available and conclude the writers of this document must have been on crack, or this document is unmaintained. I cannot disagree with them. -- Stefan Seyfried QA / RD Team Mobile Devices| Any ideas, John? SUSE LINUX Products GmbH, Nürnberg | Well, surrounding them's out. This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ACPI: update ACPI_PROCFS removal schedule
On Thu, Jul 12, 2007 at 06:20:34PM +0800, rzhang1 wrote: From: Zhang Rui [EMAIL PROTECTED] ACPI sysfs conversion is not finished yet and some user space tools still depend on the ACPI procfs I/F. The ACPI_PROCFS removal schedule is changed to Jan 08. I think that's too early. The conversion to sysfs is not even finished, so it will be less than 6 months. -- Stefan Seyfried QA / RD Team Mobile Devices| Any ideas, John? SUSE LINUX Products GmbH, Nürnberg | Well, surrounding them's out. This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Failure to properly reinit i8042 post suspend-to-ram
Hi, On Tue, Jul 10, 2007 at 10:59:57AM +1000, Nigel Cunningham wrote: > On Saturday 07 July 2007 01:04:51 Stefan Seyfried wrote: > > On Thu, Jul 05, 2007 at 09:04:27PM +1000, Nigel Cunningham wrote: > > > > > > Adding i8042.reset=1 to the commandline fixed it. > > > > Wasn't there a quirk list where workarounds for i8042 on known bad machines > > are stored? Maybe it would be a good idea to get your machine into it ;-) > > Unless I'm missing something, it looks like there's no such thing in the > i8042 > driver. That's okay. I can cope with adding i8042.reset=1 to my > commandline :) In drivers/input/serio/i8042-x86ia64io.h there are tables for various quirks, but apparently nothing for "reset=1". If we find another machine that needs reset=1, then it might be time for a table for this quirk. Best regards, Stefan -- Stefan Seyfried QA / R Team Mobile Devices| "Any ideas, John?" SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out." This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Failure to properly reinit i8042 post suspend-to-ram
Hi, On Tue, Jul 10, 2007 at 10:59:57AM +1000, Nigel Cunningham wrote: On Saturday 07 July 2007 01:04:51 Stefan Seyfried wrote: On Thu, Jul 05, 2007 at 09:04:27PM +1000, Nigel Cunningham wrote: Adding i8042.reset=1 to the commandline fixed it. Wasn't there a quirk list where workarounds for i8042 on known bad machines are stored? Maybe it would be a good idea to get your machine into it ;-) Unless I'm missing something, it looks like there's no such thing in the i8042 driver. That's okay. I can cope with adding i8042.reset=1 to my commandline :) In drivers/input/serio/i8042-x86ia64io.h there are tables for various quirks, but apparently nothing for reset=1. If we find another machine that needs reset=1, then it might be time for a table for this quirk. Best regards, Stefan -- Stefan Seyfried QA / RD Team Mobile Devices| Any ideas, John? SUSE LINUX Products GmbH, Nürnberg | Well, surrounding them's out. This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Failure to properly reinit i8042 post suspend-to-ram
On Thu, Jul 05, 2007 at 09:04:27PM +1000, Nigel Cunningham wrote: > > > > If confusion persist after 4 seconds hard power down... then you h ve > > hw/BIOS problem. Complain to whoever is manufacturing that beast. > > Adding i8042.reset=1 to the commandline fixed it. Wasn't there a quirk list where workarounds for i8042 on known bad machines are stored? Maybe it would be a good idea to get your machine into it ;-) -- Stefan Seyfried QA / R Team Mobile Devices| "Any ideas, John?" SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out." This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Failure to properly reinit i8042 post suspend-to-ram
On Thu, Jul 05, 2007 at 09:04:27PM +1000, Nigel Cunningham wrote: If confusion persist after 4 seconds hard power down... then you h ve hw/BIOS problem. Complain to whoever is manufacturing that beast. Adding i8042.reset=1 to the commandline fixed it. Wasn't there a quirk list where workarounds for i8042 on known bad machines are stored? Maybe it would be a good idea to get your machine into it ;-) -- Stefan Seyfried QA / RD Team Mobile Devices| Any ideas, John? SUSE LINUX Products GmbH, Nürnberg | Well, surrounding them's out. This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] get rid of CONFIG_DISABLE_CONSOLE_SUSPEND
Hi, On Thu, Jul 05, 2007 at 12:39:22AM +0200, Pavel Machek wrote: > Hi! > > > Yes, but I'm not sure if netconsole is the only one that we will want to > > have > > Well, netconsole is the only one we know of. AFAIR it is plain luck that serial console sometimes works. I repeat: "no bugreport" is not the same as "it works for everyone" wrt. suspend. It seems (i unfortunately have no numbers, since my machines always worked without suspending the consoles) as if suspending consoles generally helped reliability of suspend. > > disabled. Moreover, what if someone wants to use the netconsole regardless > > of the fact that it can crash the box? > > He'll have to edit the sources at that point. I'd prefer not to have > too many "please crash the box" options. So should we remove sysrq-C? This is a debugging option. Only root can set it. Its purpose is to make "machine hangs during suspend" (even before it goes to sleep) debuggable. It will only be set if the machine crashes anyways. (We can taint the kernel if this control is set, if that helps you). -- Stefan Seyfried QA / R Team Mobile Devices| "Any ideas, John?" SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out." This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] get rid of CONFIG_DISABLE_CONSOLE_SUSPEND
Hi, On Thu, Jul 05, 2007 at 12:39:22AM +0200, Pavel Machek wrote: Hi! Yes, but I'm not sure if netconsole is the only one that we will want to have Well, netconsole is the only one we know of. AFAIR it is plain luck that serial console sometimes works. I repeat: no bugreport is not the same as it works for everyone wrt. suspend. It seems (i unfortunately have no numbers, since my machines always worked without suspending the consoles) as if suspending consoles generally helped reliability of suspend. disabled. Moreover, what if someone wants to use the netconsole regardless of the fact that it can crash the box? He'll have to edit the sources at that point. I'd prefer not to have too many please crash the box options. So should we remove sysrq-C? This is a debugging option. Only root can set it. Its purpose is to make machine hangs during suspend (even before it goes to sleep) debuggable. It will only be set if the machine crashes anyways. (We can taint the kernel if this control is set, if that helps you). -- Stefan Seyfried QA / RD Team Mobile Devices| Any ideas, John? SUSE LINUX Products GmbH, Nürnberg | Well, surrounding them's out. This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Optional Beeping During Resume From Suspend To Ram.
On Fri, Jun 29, 2007 at 08:27:12AM +1000, Nigel Cunningham wrote: > > Can we rename/reuse existing flag variable? > > Sorry, but I can't resist the opportunity to say "Send a patch!" :) > > Seriously, though, I'd prefer not to. If we rename that acpi video flags > variable (I assume this is what you're thinking of), we only create cause for > confusion. A variable should for debugging or for controlling quirks, not for > both at the same time. I agree. And video_flags is something totally different :-) I just used that one in my ad-hoc hack (which actually was only to illustrate the idea) because a) it was enough to show the intent and b) i did not know how to do it better ;-) -- Stefan Seyfried QA / R Team Mobile Devices| "Any ideas, John?" SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out." This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] get rid of CONFIG_DISABLE_CONSOLE_SUSPEND
On Thu, Jun 28, 2007 at 09:12:44PM +0200, Rafael J. Wysocki wrote: > On Thursday, 28 June 2007 19:25, Stefan Seyfried wrote: > > > > However, we don't know which consoles are safe to stay alive during suspend. > > Generally, defaulting to suspending them all is not a bad idea IMHO. > > And IIRC it is plain luck if a serial console survives the suspend (or was > > the serial code fixed recently?) > > Well, I don't think so, but I'm not sure. > > The VGA/fb console also should be off during suspend (not necessarily during > hibernation, though). IIRC, that's what caused Linus to introduce the > suspending of consoles after all. > > > So i do not care too much, but my / Frank's patch was shorter :-) and safer. > > I'm not sure which way to go. On the one hand, I agree that we should rather > fix the consoles so that we know which one is suspend-safe and which is not > and disable the unsafe ones, but on the other hand we are not there yet and it > _sometimes_ is useful not to suspend a console even if we know that it will > break things. This is what my / Frank's patch was aimed at: give the user the ability to (painlessly, without rebuilding the kernel) debug suspend problems. Keep the default safe, like Linus likes it (consoles suspended), but give the user a switch to make it unsafe (consoles not suspended) for the sake of debugging. Of course, fixing up all console drivers is an option that i'd very much like to see. It is however debatable if it is really worth the effort. If it works with consoles suspended, the user does not care. If it doesn't, he turns on debugging (knowing, or being told that this will break using netconsole). I strongly oppose Pavel's approach to "declare all console drivers as nonbroken except netconsole". Even if he has not seen any failures apart from netconsole, in general i had the impression that suspending consoles did help. At least suspend works on many more machines than half a year ago, and i'd not be surprised if this was partly due to suspending the consoles. Remember that wrt. suspend "i did not get a bugreport" very often just means "people tried it, it did not work, but they expected that and just turned away". It does not mean "it just works for everyone". -- Stefan Seyfried QA / R Team Mobile Devices| "Any ideas, John?" SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out." This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] get rid of CONFIG_DISABLE_CONSOLE_SUSPEND
On Thu, Jun 28, 2007 at 09:12:44PM +0200, Rafael J. Wysocki wrote: On Thursday, 28 June 2007 19:25, Stefan Seyfried wrote: However, we don't know which consoles are safe to stay alive during suspend. Generally, defaulting to suspending them all is not a bad idea IMHO. And IIRC it is plain luck if a serial console survives the suspend (or was the serial code fixed recently?) Well, I don't think so, but I'm not sure. The VGA/fb console also should be off during suspend (not necessarily during hibernation, though). IIRC, that's what caused Linus to introduce the suspending of consoles after all. So i do not care too much, but my / Frank's patch was shorter :-) and safer. I'm not sure which way to go. On the one hand, I agree that we should rather fix the consoles so that we know which one is suspend-safe and which is not and disable the unsafe ones, but on the other hand we are not there yet and it _sometimes_ is useful not to suspend a console even if we know that it will break things. This is what my / Frank's patch was aimed at: give the user the ability to (painlessly, without rebuilding the kernel) debug suspend problems. Keep the default safe, like Linus likes it (consoles suspended), but give the user a switch to make it unsafe (consoles not suspended) for the sake of debugging. Of course, fixing up all console drivers is an option that i'd very much like to see. It is however debatable if it is really worth the effort. If it works with consoles suspended, the user does not care. If it doesn't, he turns on debugging (knowing, or being told that this will break using netconsole). I strongly oppose Pavel's approach to declare all console drivers as nonbroken except netconsole. Even if he has not seen any failures apart from netconsole, in general i had the impression that suspending consoles did help. At least suspend works on many more machines than half a year ago, and i'd not be surprised if this was partly due to suspending the consoles. Remember that wrt. suspend i did not get a bugreport very often just means people tried it, it did not work, but they expected that and just turned away. It does not mean it just works for everyone. -- Stefan Seyfried QA / RD Team Mobile Devices| Any ideas, John? SUSE LINUX Products GmbH, Nürnberg | Well, surrounding them's out. This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Optional Beeping During Resume From Suspend To Ram.
On Fri, Jun 29, 2007 at 08:27:12AM +1000, Nigel Cunningham wrote: Can we rename/reuse existing flag variable? Sorry, but I can't resist the opportunity to say Send a patch! :) Seriously, though, I'd prefer not to. If we rename that acpi video flags variable (I assume this is what you're thinking of), we only create cause for confusion. A variable should for debugging or for controlling quirks, not for both at the same time. I agree. And video_flags is something totally different :-) I just used that one in my ad-hoc hack (which actually was only to illustrate the idea) because a) it was enough to show the intent and b) i did not know how to do it better ;-) -- Stefan Seyfried QA / RD Team Mobile Devices| Any ideas, John? SUSE LINUX Products GmbH, Nürnberg | Well, surrounding them's out. This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] get rid of CONFIG_DISABLE_CONSOLE_SUSPEND
(CC'ing Linus, since disabling consoles during suspend was his idea IIRC) On Thu, Jun 28, 2007 at 05:34:54PM +0200, Rafael J. Wysocki wrote: > Hi, > > On Thursday, 28 June 2007 15:51, Pavel Machek wrote: > > Hi! > > > > What about this? (Only compile tested, but looks pretty obvious to > > me). Something like this should get us rid of ugly option, and still > > solve debugging problems... Hmmm? > > Pavel > > > > Kill CONFIG_DISABLE_CONSOLE_SUSPEND; it should not be configurable at > > all, instead, we should automatically keep console alive when > > possible. > > > > Signed-off-by: Pavel Machek <[EMAIL PROTECTED]> > > > > diff --git a/drivers/char/lp.c b/drivers/char/lp.c > > index 62051f8..8267ff8 100644 > > --- a/drivers/char/lp.c > > +++ b/drivers/char/lp.c > > @@ -144,7 +144,7 @@ static unsigned int lp_count = 0; > > static struct class *lp_class; > > > > #ifdef CONFIG_LP_CONSOLE > > -static struct parport *console_registered; // initially NULL > > +static struct parport *console_registered; > > #endif /* CONFIG_LP_CONSOLE */ > > Could you please avoid fixing things like this, white space etc. in this > patch? > It would be easier to read ... Yes. > I generally agree with the idea, but the patch needs a clean up, IMHO. However, we don't know which consoles are safe to stay alive during suspend. Generally, defaulting to suspending them all is not a bad idea IMHO. And IIRC it is plain luck if a serial console survives the suspend (or was the serial code fixed recently?) So i do not care too much, but my / Frank's patch was shorter :-) and safer. -- Stefan Seyfried QA / R Team Mobile Devices| "Any ideas, John?" SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out." This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] get rid of CONFIG_DISABLE_CONSOLE_SUSPEND
(CC'ing Linus, since disabling consoles during suspend was his idea IIRC) On Thu, Jun 28, 2007 at 05:34:54PM +0200, Rafael J. Wysocki wrote: Hi, On Thursday, 28 June 2007 15:51, Pavel Machek wrote: Hi! What about this? (Only compile tested, but looks pretty obvious to me). Something like this should get us rid of ugly option, and still solve debugging problems... Hmmm? Pavel Kill CONFIG_DISABLE_CONSOLE_SUSPEND; it should not be configurable at all, instead, we should automatically keep console alive when possible. Signed-off-by: Pavel Machek [EMAIL PROTECTED] diff --git a/drivers/char/lp.c b/drivers/char/lp.c index 62051f8..8267ff8 100644 --- a/drivers/char/lp.c +++ b/drivers/char/lp.c @@ -144,7 +144,7 @@ static unsigned int lp_count = 0; static struct class *lp_class; #ifdef CONFIG_LP_CONSOLE -static struct parport *console_registered; // initially NULL +static struct parport *console_registered; #endif /* CONFIG_LP_CONSOLE */ Could you please avoid fixing things like this, white space etc. in this patch? It would be easier to read ... Yes. I generally agree with the idea, but the patch needs a clean up, IMHO. However, we don't know which consoles are safe to stay alive during suspend. Generally, defaulting to suspending them all is not a bad idea IMHO. And IIRC it is plain luck if a serial console survives the suspend (or was the serial code fixed recently?) So i do not care too much, but my / Frank's patch was shorter :-) and safer. -- Stefan Seyfried QA / RD Team Mobile Devices| Any ideas, John? SUSE LINUX Products GmbH, Nürnberg | Well, surrounding them's out. This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Suspend-devel] [PATCH, 3rd try] make disable_console_suspend runtime configurable
On Thu, Jun 21, 2007 at 03:20:08PM +0200, Pavel Machek wrote: > Hi! > > No, i don't agree at all. > > > > In this case, "no config needed" == "not possible to debug suspend > > problems". > > No, sorry. > > My proposed solution is "figure out which console drivers can survive > being on while machines go down, and keep them on". > > So, "no config needed" == "kernel always does the right thing, keeping > console during suspend when possible" == "possible to debug suspend > problems without having to change CONFIG_ or /sys/*". Ok. Deal. Once you fixed all the console drivers, i'll gladly send a patch that reverts the patch we are discussing now. Note that this patch actually helps fixing those drivers, since you can test much easier if a given driver survives suspend ;-) -- Stefan Seyfried QA / R Team Mobile Devices| "Any ideas, John?" SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out." This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Suspend-devel] [PATCH, 3rd try] make disable_console_suspend runtime configurable
On Thu, Jun 21, 2007 at 03:20:08PM +0200, Pavel Machek wrote: Hi! No, i don't agree at all. In this case, no config needed == not possible to debug suspend problems. No, sorry. My proposed solution is figure out which console drivers can survive being on while machines go down, and keep them on. So, no config needed == kernel always does the right thing, keeping console during suspend when possible == possible to debug suspend problems without having to change CONFIG_ or /sys/*. Ok. Deal. Once you fixed all the console drivers, i'll gladly send a patch that reverts the patch we are discussing now. Note that this patch actually helps fixing those drivers, since you can test much easier if a given driver survives suspend ;-) -- Stefan Seyfried QA / RD Team Mobile Devices| Any ideas, John? SUSE LINUX Products GmbH, Nürnberg | Well, surrounding them's out. This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Suspend-devel] [PATCH, 3rd try] make disable_console_suspend runtime configurable
On Sun, Jun 17, 2007 at 11:49:40PM +0200, Pavel Machek wrote: > Hi! > > > > > I hate having to recompile the kernel, just to be able to debug suspend. > > > > Remove CONFIG_DISABLE_CONSOLE_SUSPEND, replace it by a tunable in > > > > /sys/power/disable_console_suspend. > > > > > > > > > > > > Signed-off-by: Stefan Seyfried <[EMAIL PROTECTED]> > > > > Signed-off-by: Frank Seidel <[EMAIL PROTECTED]> > > > > --- > > > > Third try, renamed sysfs interface to console_suspend > > > > reporting and expecting either "enabled" or "disabled" > > > > > > Thanks a lot for redoing it. > > > > > > I have no objections. Pavel? > > > > I still think that patch is bad. I should have screamed when > > CONFIG_DISABLE_CONSOLE_SUSPEND went into kernel. That beast should > > _not_ be configurable, it should just do the right thing. > > > > But I realized that too late. And this only makes it works, making > > that mistake part of user-kernel interface. > > > > Sorry for not screaming when CONFIG_DISABLE_CONSOLE_SUSPEND went in, > > but please lets solve this correctly > > Ouch and sorry for not screaming at "try 1" time. But it still does > not make the patch right, and I believe that even patch authors agree > that "no-config-needed" is superior solution. No, i don't agree at all. In this case, "no config needed" == "not possible to debug suspend problems". IMO this is the same as issue as with "sysrq-C". You can crash the machine by other means, but it sometimes is just handy to have a mechanism to do it. I do not understand what's the problem with this option. If you want to avoid that people use it for something else than debugging, i can add a patch that crashes the machine ten seconds after resume if this option is set. -- Stefan Seyfried QA / R Team Mobile Devices| "Any ideas, John?" SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out." This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Suspend-devel] [PATCH, 3rd try] make disable_console_suspend runtime configurable
On Sun, Jun 17, 2007 at 11:49:40PM +0200, Pavel Machek wrote: Hi! I hate having to recompile the kernel, just to be able to debug suspend. Remove CONFIG_DISABLE_CONSOLE_SUSPEND, replace it by a tunable in /sys/power/disable_console_suspend. Signed-off-by: Stefan Seyfried [EMAIL PROTECTED] Signed-off-by: Frank Seidel [EMAIL PROTECTED] --- Third try, renamed sysfs interface to console_suspend reporting and expecting either enabled or disabled Thanks a lot for redoing it. I have no objections. Pavel? I still think that patch is bad. I should have screamed when CONFIG_DISABLE_CONSOLE_SUSPEND went into kernel. That beast should _not_ be configurable, it should just do the right thing. But I realized that too late. And this only makes it works, making that mistake part of user-kernel interface. Sorry for not screaming when CONFIG_DISABLE_CONSOLE_SUSPEND went in, but please lets solve this correctly Ouch and sorry for not screaming at try 1 time. But it still does not make the patch right, and I believe that even patch authors agree that no-config-needed is superior solution. No, i don't agree at all. In this case, no config needed == not possible to debug suspend problems. IMO this is the same as issue as with sysrq-C. You can crash the machine by other means, but it sometimes is just handy to have a mechanism to do it. I do not understand what's the problem with this option. If you want to avoid that people use it for something else than debugging, i can add a patch that crashes the machine ten seconds after resume if this option is set. -- Stefan Seyfried QA / RD Team Mobile Devices| Any ideas, John? SUSE LINUX Products GmbH, Nürnberg | Well, surrounding them's out. This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH, 2nd try] make disable_console_suspend runtime configurable
On Thu, Jun 14, 2007 at 12:08:00AM +0200, Pavel Machek wrote: > Hi! > > > I hate having to recompile the kernel, just to be able to debug suspend. > > Remove CONFIG_DISABLE_CONSOLE_SUSPEND, replace it by a tunable in > > /sys/power/disable_console_suspend. > > > Signed-off-by: Stefan Seyfried <[EMAIL PROTECTED]> > > Signed-off-by: Frank Seidel <[EMAIL PROTECTED]> > > I wonder if there's a better name? Suggest one. > Or maybe this should not be /sys configurable, but just have value for > each console "this console can work while suspended"? > > (serial can, vesafb can, netconsole can't)? Go ahead, submit a patch. It won't be that trivial. And i wonder if it is actually worth the hassle. This is a debugging facility. > Exporting "crash-me" option to user does not seem that cool to me. We have "echo c > /proc/sysrq-trigger" also. This is a debugging option, and forcing users to recompile the kernel just to debug suspend problems (not resume problems, the "it does not even go to sleep" stuff is where this matters most) is IMO a bad idea. We can also make this a boot parameter, i don't care, but i want to disable console suspend without recompiling the kernel. -- Stefan Seyfried QA / R Team Mobile Devices| "Any ideas, John?" SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out." This footer brought to you by insane German lawmakers: SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/