Re: (Small) bias in generation of random passkeys for pairing

2019-06-20 Thread Stefan Seyfried
Hi Pavel,

On 19.06.19 at 18:24, Pavel Machek wrote:
> Hi!
> 
> There's a (small) bias in passkey generation in bluetooth:
> 
>   get_random_bytes(&passkey, sizeof(passkey));
>   passkey %= 1000000;
>   put_unaligned_le32(passkey, smp->tk);
> 
> (there are at least two places doing this).
> 
> Not all passkeys are equally probable: passkey "000000" is more
> probable than "999999", but the difference is small.

It is slightly different IMHO.

Assuming an unsigned 32-bit passkey (and all users I found were u32),
the passkeys "000000" to "967295" are slightly more probable than
"967296" to "999999".

If my math is right (which I doubt), the ratio of the probabilities
of the two ranges is 4295:4294.
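
The counts can be checked with a small user-space sketch (not kernel code; the helper name residue_count is made up for illustration). It counts how many of the 2^32 possible u32 values end up on a given passkey after the modulo:

```c
#include <stdint.h>

/* Number of 32-bit values v (0 <= v < 2^32) with v % modulus == residue.
 * Hypothetical helper for illustration; not part of the Bluetooth code. */
uint64_t residue_count(uint32_t residue, uint32_t modulus)
{
	const uint64_t space = (uint64_t)1 << 32;	/* 4294967296 inputs */
	uint64_t full = space / modulus;		/* complete cycles */
	uint64_t rest = space % modulus;		/* inputs in the last, partial cycle */

	if (residue >= modulus)
		return 0;
	/* residues below the leftover get one extra hit */
	return full + (residue < rest ? 1 : 0);
}
```

With a modulus of 1000000 this gives 4295 inputs for each passkey from 000000 to 967295 and 4294 inputs for each passkey from 967296 to 999999.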

> Do we care?

I, personally, don't (yet).
But then, I'm not a real security expert.
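
If one did care, the usual way to remove this kind of modulo bias is rejection sampling: throw away raw values that fall into the incomplete last cycle and draw again. A minimal user-space sketch, assuming a caller-supplied random source (rng_fn, unbiased_passkey and the seq_rng test source are hypothetical names; kernel code would draw from get_random_bytes() instead):

```c
#include <stddef.h>
#include <stdint.h>

#define PASSKEY_RANGE 1000000u

/* Caller-supplied source of uniformly distributed 32-bit values
 * (hypothetical interface, for illustration only). */
typedef uint32_t (*rng_fn)(void *ctx);

/* Return a passkey in [0, PASSKEY_RANGE) without modulo bias. */
uint32_t unbiased_passkey(rng_fn rng, void *ctx)
{
	/* Largest multiple of PASSKEY_RANGE that fits into 2^32:
	 * 4294 * 1000000 = 4294000000. Values at or above it belong
	 * to the incomplete last cycle and are rejected. */
	const uint32_t limit =
		(uint32_t)((((uint64_t)1 << 32) / PASSKEY_RANGE) * PASSKEY_RANGE);
	uint32_t v;

	do {
		v = rng(ctx);
	} while (v >= limit);

	return v % PASSKEY_RANGE;
}

/* Fixed-sequence source, only meant for exercising the function. */
struct seq_rng {
	const uint32_t *vals;
	size_t pos;
};

uint32_t seq_next(void *ctx)
{
	struct seq_rng *s = ctx;

	return s->vals[s->pos++];
}
```

The loop rejects 967296 out of 4294967296 raw values, so on average it needs barely more than one draw per passkey.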

Have fun,
-- 
Stefan Seyfried

"For a successful technology, reality must take precedence over
 public relations, for nature cannot be fooled." -- Richard Feynman


[PATCH] dvb-usb-firmware: use DMA buffers for USB transfers

2017-02-18 Thread Stefan Seyfried
From: Stefan Seyfried <seife+ker...@b1-systems.com>

The USB control messages require DMA to work. We cannot pass
a stack-allocated buffer, as it is not guaranteed that the
stack lies in a DMA-capable area.

Signed-off-by: Stefan Seyfried <seife+ker...@b1-systems.com>
---

This fixes at least dvb-usb-technisat-usb2 for me, but probably
the other drivers that are using dvb_usb_download_firmware()
with a Cypress chip are broken with CONFIG_VMAP_STACK=y right
now. Patch attached additionally, because I don't think
thunderbird will get this right :-(

 drivers/media/usb/dvb-usb/dvb-usb-firmware.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/media/usb/dvb-usb/dvb-usb-firmware.c b/drivers/media/usb/dvb-usb/dvb-usb-firmware.c
index f0023dbb7276..2f340621a786 100644
--- a/drivers/media/usb/dvb-usb/dvb-usb-firmware.c
+++ b/drivers/media/usb/dvb-usb/dvb-usb-firmware.c
@@ -35,41 +35,45 @@ static int usb_cypress_writemem(struct usb_device *udev,u16 addr,u8 *data, u8 le
 
 int usb_cypress_load_firmware(struct usb_device *udev, const struct firmware *fw, int type)
 {
-	struct hexline hx;
-	u8 reset;
 	int ret,pos=0;
+	/* urb buffers must be malloc'ed, stack will not work with CONFIG_VMAP_STACK=y */
+	u8 *reset = kmalloc(1, GFP_KERNEL);
+	struct hexline *hx = kmalloc(sizeof(struct hexline), GFP_KERNEL);
 
 	/* stop the CPU */
-	reset = 1;
-	if ((ret = usb_cypress_writemem(udev,cypress[type].cpu_cs_register,&reset,1)) != 1)
+	*reset = 1;
+	if ((ret = usb_cypress_writemem(udev,cypress[type].cpu_cs_register,reset,1)) != 1)
 		err("could not stop the USB controller CPU.");
 
-	while ((ret = dvb_usb_get_hexline(fw,&hx,&pos)) > 0) {
-		deb_fw("writing to address 0x%04x (buffer: 0x%02x %02x)\n",hx.addr,hx.len,hx.chk);
-		ret = usb_cypress_writemem(udev,hx.addr,hx.data,hx.len);
+	while ((ret = dvb_usb_get_hexline(fw,hx,&pos)) > 0) {
+		deb_fw("writing to address 0x%04x (buffer: 0x%02x %02x)\n",hx->addr,hx->len,hx->chk);
+		ret = usb_cypress_writemem(udev,hx->addr,hx->data,hx->len);
 
-		if (ret != hx.len) {
+		if (ret != hx->len) {
 			err("error while transferring firmware (transferred size: %d, block size: %d)",
-				ret,hx.len);
+				ret,hx->len);
 			ret = -EINVAL;
 			break;
 		}
 	}
 	if (ret < 0) {
 		err("firmware download failed at %d with %d",pos,ret);
-		return ret;
+		goto out_free;
 	}
 
 	if (ret == 0) {
 		/* restart the CPU */
-		reset = 0;
-		if (ret || usb_cypress_writemem(udev,cypress[type].cpu_cs_register,&reset,1) != 1) {
+		*reset = 0;
+		if (ret || usb_cypress_writemem(udev,cypress[type].cpu_cs_register,reset,1) != 1) {
 			err("could not restart the USB controller CPU.");
 			ret = -EINVAL;
 		}
 	} else
 		ret = -EIO;
 
+ out_free:
+	kfree(reset);
+	kfree(hx);
 	return ret;
 }
 EXPORT_SYMBOL(usb_cypress_load_firmware);
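
The pattern used by the patch (heap-allocate every buffer that is handed to the transfer routine, and release it on a single exit path) can be sketched in user-space C. This is an analogy under assumptions, not driver code: malloc()/free() stand in for kmalloc()/kfree(), and send_fn, struct chunk, load_chunks and the two stub senders are hypothetical names. Unlike the hunk above, the sketch also checks the allocation results.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a transfer routine that must be given a heap buffer
 * (hypothetical; returns the number of bytes written). */
typedef int (*send_fn)(uint16_t addr, const uint8_t *buf, uint8_t len);

struct chunk {
	uint16_t addr;
	uint8_t len;
	uint8_t data[255];
};

/* Send n chunks through heap buffers; every exit goes through out_free. */
int load_chunks(send_fn send, const struct chunk *src, size_t n)
{
	uint8_t *reset = malloc(1);
	struct chunk *c = malloc(sizeof(*c));
	int ret = 0;
	size_t i;

	if (!reset || !c) {
		ret = -ENOMEM;
		goto out_free;
	}

	for (i = 0; i < n; i++) {
		memcpy(c, &src[i], sizeof(*c));	/* stage the data in the heap buffer */
		if (send(c->addr, c->data, c->len) != c->len) {
			ret = -EINVAL;
			goto out_free;
		}
	}

	*reset = 0;				/* "restart" step, also from the heap */
	if (send(0, reset, 1) != 1)
		ret = -EINVAL;

out_free:
	free(reset);	/* free(NULL) is a no-op, as is kfree(NULL) */
	free(c);
	return ret;
}

/* Stub senders for trying the function out. */
int send_ok(uint16_t addr, const uint8_t *buf, uint8_t len)
{
	(void)addr;
	(void)buf;
	return len;
}

int send_fail(uint16_t addr, const uint8_t *buf, uint8_t len)
{
	(void)addr;
	(void)buf;
	(void)len;
	return -1;
}
```

The single label keeps the early error returns from leaking either buffer, which is the reason the patch turns "return ret" into "goto out_free".
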
-- 
Stefan Seyfried

"For a successful technology, reality must take precedence over
 public relations, for nature cannot be fooled." -- Richard Feynman

Re: [PATCH] drm/i915: Avoid drm_atomic_state_put(NULL) in intel_display_resume

2017-01-15 Thread Stefan Seyfried
Hi Chris,

this fixes the problem for me, thanks!

Tested-by: Stefan Seyfried <stefan.seyfr...@googlemail.com>

On 15.01.2017 at 13:58, Chris Wilson wrote:
> intel_display_resume() may be called without an atomic state to restore,
> i.e. dev_priv->modeset_reset_restore state is NULL. One such case is
> following a lid open/close event and the forced modeset in
> intel_lid_notify().
> 
> Reported-by: Stefan Seyfried <stefan.seyfr...@googlemail.com>
> Fixes: 0853695c3ba4 ("drm: Add reference counting to drm_atomic_state")
> Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
> Cc: Jani Nikula <jani.nik...@linux.intel.com>
> Cc: <drm-intel-fi...@lists.freedesktop.org> # v4.10-rc1+
> ---
>  drivers/gpu/drm/i915/intel_display.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 3dc8724df400..260bbe8881e6 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -17024,7 +17024,8 @@ void intel_display_resume(struct drm_device *dev)
>  
>  	if (ret)
>  		DRM_ERROR("Restoring old state failed with %i\n", ret);
> -	drm_atomic_state_put(state);
> +	if (state)
> +		drm_atomic_state_put(state);
>  }
>  
>  void intel_modeset_gem_init(struct drm_device *dev)
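
Why the guard matters can be shown with a user-space sketch of a reference-counted object. This is an analogy, not the DRM implementation: struct obj, obj_new, obj_put and release_saved_state are hypothetical names. A put helper that decrements the counter through the pointer writes to memory, so calling it with NULL faults; the caller has to skip the put when there is nothing to release.

```c
#include <stdlib.h>

struct obj {
	int refcount;
	int *freed;	/* set to 1 when the object is released */
};

/* Allocate an object holding one reference; NULL on allocation failure. */
struct obj *obj_new(int *freed)
{
	struct obj *o = malloc(sizeof(*o));

	if (o) {
		o->refcount = 1;
		o->freed = freed;
	}
	return o;
}

/* Drop one reference. Must not be called with NULL, because it
 * dereferences o in order to decrement the counter. */
void obj_put(struct obj *o)
{
	if (--o->refcount == 0) {
		*o->freed = 1;
		free(o);
	}
}

/* Caller-side guard, mirroring "if (state) drm_atomic_state_put(state);" */
void release_saved_state(struct obj *saved)
{
	if (saved)
		obj_put(saved);
}
```

Calling release_saved_state(NULL) is harmless, while obj_put(NULL) would write through a NULL pointer.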

-- 
Stefan Seyfried

"For a successful technology, reality must take precedence over
 public relations, for nature cannot be fooled." -- Richard Feynman


4.10 regression drm/i915: BUG/oops on lid open

2017-01-15 Thread Stefan Seyfried
Hi all,

Since 4.10-rc1 I'm getting this on lid close/open on my trusty old
ThinkPad X200s:

pci :00:1e.0: PCI bridge to [bus 0d]
BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: intel_display_resume+0xaf/0x120 [i915]
PGD 22b99b067
PUD 22b99a067
PMD 0

Oops: 0002 [#1] PREEMPT SMP
Modules linked in: ccm rfcomm fuse xt_CHECKSUM iptable_mangle
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT
nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables
ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet
bnep msr xfs libcrc32c cdc_ether usbnet mii cdc_wdm cdc_acm dm_crypt
algif_skcipher af_alg snd_hda_codec_conexant snd_hda_codec_generic arc4
snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss snd_pcm
mei_wdt iTCO_wdt iTCO_vendor_support iwldvm snd_seq mac80211
snd_seq_device snd_timer coretemp kvm_intel kvm irqbypass btusb btrtl
btbcm btintel iwlwifi pcspkr snd_mixer_oss bluetooth thinkpad_acpi
battery ac fjes i915 cfg80211 snd wmi rfkill
 drm_kms_helper video drm i2c_i801 fb_sys_fops syscopyarea e1000e
sysfillrect sysimgblt i2c_algo_bit acpi_cpufreq ptp soundcore tpm_tis
mei_me pps_core shpchp tpm_tis_core lpc_ich mei mfd_core button tpm
serio_raw thermal ehci_pci uhci_hcd ehci_hcd usbcore sg dm_multipath
dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua loop
CPU: 0 PID: 12922 Comm: kworker/0:0 Not tainted
4.10.0-rc3-1.gf1c24bb-default #1
Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
Workqueue: kacpi_notify acpi_os_execute_deferred
task: 9e2c22854240 task.stack: becbcc85c000
RIP: 0010:intel_display_resume+0xaf/0x120 [i915]
RSP: 0018:becbcc85fc70 EFLAGS: 00010282
RAX: c027a670 RBX: becbcc85fc78 RCX: 
RDX: 9e2c22854240 RSI: 000d RDI: 9e2c2d738210
RBP: becbcc85fcd0 R08: 0010 R09: 
R10: 9e2c2d738380 R11: c0451d00 R12: 9e2c2d738000
R13:  R14: 9e2c2d738210 R15: 
FS:  () GS:9e2c3bc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2:  CR3: 00022b998000 CR4: 000406f0
Call Trace:
 intel_lid_notify+0xca/0xd0 [i915]
 notifier_call_chain+0x4a/0x70
 __blocking_notifier_call_chain+0x47/0x60
 blocking_notifier_call_chain+0x16/0x20
 acpi_lid_notify_state+0xee/0x142 [button]
 acpi_lid_update_state+0x24/0x27 [button]
 acpi_button_notify+0x3d/0x130 [button]
 acpi_device_notify+0x19/0x1b
 acpi_ev_notify_dispatch+0x49/0x61
 acpi_os_execute_deferred+0x14/0x20
 process_one_work+0x193/0x470
 worker_thread+0x4e/0x490
 kthread+0x101/0x140
 ? process_one_work+0x470/0x470
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x25/0x30
Code: e8 d7 aa 2c d6 8b 45 a4 89 c1 31 f6 48 c7 c2 c0 11 50 c0 48 c7 c7
e5 10 51 c0 e8 6d a3 de ff 48 c7 c0 70 a6 27 c0 48 85 c0 74 56  41
83 6d 00 01 75 08 4c 89 ef e8 01 b9 df ff 48 83 c4 40 5b
RIP: intel_display_resume+0xaf/0x120 [i915] RSP: becbcc85fc70
CR2: 
---[ end trace d496ba830778c097 ]---
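
The "Oops: 0002" line in the trace above is the x86 page-fault error code. Its low bits are defined by the architecture, and a small user-space decoder makes them readable (a sketch; struct pf_info and decode_pf are names made up here, they are not kernel interfaces):

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits. */
#define PF_PROT  (1u << 0)	/* 0: page not present, 1: protection violation */
#define PF_WRITE (1u << 1)	/* 0: read access, 1: write access */
#define PF_USER  (1u << 2)	/* 0: kernel mode, 1: user mode */
#define PF_RSVD  (1u << 3)	/* reserved bit set in a page-table entry */
#define PF_INSTR (1u << 4)	/* fault on an instruction fetch */

struct pf_info {
	bool protection;
	bool write;
	bool user;
	bool reserved_bit;
	bool instr_fetch;
};

/* Split a page-fault error code into its flag bits. */
struct pf_info decode_pf(uint32_t code)
{
	struct pf_info info = {
		.protection   = (code & PF_PROT) != 0,
		.write        = (code & PF_WRITE) != 0,
		.user         = (code & PF_USER) != 0,
		.reserved_bit = (code & PF_RSVD) != 0,
		.instr_fetch  = (code & PF_INSTR) != 0,
	};

	return info;
}
```

For 0x0002 only the write bit is set: a write, in kernel mode, to a page that is not present. That is what one would expect if the code tried to decrement a reference counter through a NULL pointer.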

The machine keeps running fine afterwards, but it never receives a lid
close/open event again.
4.9 is good.
I tried to bisect it and landed at

0853695c3ba46f97dfc0b5885f7b7e640ca212dd
Author: Chris Wilson <ch...@chris-wilson.co.uk>
Date:   Fri Oct 14 13:18:18 2016 +0100

drm: Add reference counting to drm_atomic_state

However, during the bisection the failure got worse (the machine locked up
hard on lid close/open, with lots of recursive faults), so I doubt that
reverting this commit is the answer; apparently it just needs some more fixes.

Thanks,

Stefan
-- 
Stefan Seyfried

"For a successful technology, reality must take precedence over
 public relations, for nature cannot be fooled." -- Richard Feynman


Re: PCI devices (buses?) and 3GB of RAM lost with 4.2rc1

2015-07-11 Thread Stefan Seyfried
On 08.07.2015 at 22:09, Stefan Seyfried wrote:
> this is on a Thinkpad X200s, 5 years old and working fine, until 4.2rc1
> came along.
> 
> With that booted, I do not have a WiFi card anymore, it doesn't even
> appear in "lspci" output.

> From diffing the dmesg's, it also looks like I lost some of my RAM:
> 
> -Memory: 8050048K/8280176K available (6401K kernel code, 980K rwdata,
> 4864K rodata, 1532K init, 1516K bss, 230128K reserved, 0K cma-reserved)
> +Memory: 5104620K/8280176K available (6823K kernel code, 1096K rwdata,
> 3220K rodata, 1556K init, 1520K bss, 227792K reserved, 0K cma-reserved)

This was only a one-off; it looks like the hardware was confused
when first booting 4.2-rc1.
(I found out when I wanted to bisect it: all the kernels I built
just worked, and when I finally booted the distro kernel again, it also
worked :-)

So everything is fine, sorry for the noise.
-- 
Stefan Seyfried
Linux Consultant & Developer
Mail: seyfr...@b1-systems.de GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REGRESSION] 4.1-rc6 unloading loop OOPS

2015-06-04 Thread Stefan Seyfried
Hi Ming,

On 04.06.2015 at 12:24, Ming Lei wrote:
> On Thu, Jun 4, 2015 at 5:11 PM, Stefan Seyfried
> <stefan.seyfr...@googlemail.com> wrote:

>> I can reproduce the backtrace after a reboot once (subsequent
>> modprobe/rmmod loop do not complain anymore), but not the OOPS.
> 
> One fix[1] was just merged to linus tree, and could you test that to see
> if your issue can be addressed?
> 
> [1] http://marc.info/?t=143201518300001&r=1&w=2

I just tried current Linus' master, v4.1-rc6-49-g8a7deb3, which contains
this commit, and I no longer get the warning.

Unfortunately, because of this I cannot really test your patch for the OOPS
(but the OOPS happened only once for me, so it was never reliably
triggered).

Thanks, things work well for me, again :-)

Best regards,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


[REGRESSION] 4.1-rc6 unloading loop OOPS

2015-06-04 Thread Stefan Seyfried
ff
[  661.450217] Call Trace:
[  661.450217]  [<8133abed>] blk_mq_unregister_hctx.part.0+0x3d/0x60
[  661.450217]  [<8133ac61>] blk_mq_unregister_disk+0x51/0xe0
[  661.450217]  [<81330a2c>] blk_unregister_queue+0x2c/0x90
[  661.450217]  [<8133e048>] del_gendisk+0x118/0x280
[  661.450217]  [<a351>] loop_remove+0x21/0x50 [loop]
[  661.450217]  [<a391>] loop_exit_cb+0x11/0x20 [loop]
[  661.450217]  [<81359743>] idr_for_each+0xa3/0xf0
[  661.450217]  [<a0003516>] loop_exit+0x30/0xb1a [loop]
[  661.450217]  [<810ece3c>] SyS_delete_module+0x1ac/0x230
[  661.450217]  [<816a1cb2>] system_call_fastpath+0x16/0x75
[  661.450217]  [<7ff635777f37>] 0x7ff635777f37
[  661.450217] Code: 48 83 c7 18 e9 54 ff ff ff 0f 1f 40 00 5b c3 66 66 66 66 
66 2e 0f 1f 84 00 00 00 00 00 55 53 48 89 fb 48 83 ec 08 48 85 ff 74 2e <48> 8b 
6f 30 e8 09 cc ef ff 48 89 ef e8 a1 98 ef ff 80 63 3c fd 
[  661.450217] RIP  [<8135b3ce>] kobject_del+0xe/0x50
[  661.450217]  RSP <8801d8d7bd78>
[  661.450217] CR2: 0108
[  661.466690] ---[ end trace 7b8e0f39c45cf572 ]---
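
A note on reading the fault address, with assumptions stated: the CR2 value is taken to be 0x108 (the archive appears to have dropped the leading zeros), and the bytes around the marked instruction in the Code: line decode as "48 85 ff" (test %rdi,%rdi), "74 2e" (je) and "<48> 8b 6f 30" (mov 0x30(%rdi),%rbp). Under that reading the pointer passed the NULL check and was 0x108 - 0x30 = 0xd8. One possible explanation, and only a guess, is a pointer formed by taking the address of a member embedded at offset 0xd8 in a parent whose own pointer was NULL. The arithmetic as a user-space sketch (all names and the struct layout are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Fault address = base register + displacement, so the register value
 * is recovered by subtracting the displacement from CR2. */
uint64_t faulting_base(uint64_t cr2, uint64_t disp)
{
	return cr2 - disp;
}

/* Hypothetical layout: a member embedded at offset 0xd8 of its parent. */
struct inner {
	char payload[0x40];
};

struct outer {
	char pad[0xd8];
	struct inner member;
};

/* The address that "&parent->member" works out to for a given parent
 * address, computed with integer arithmetic so nothing is dereferenced. */
uintptr_t member_address(uintptr_t parent)
{
	return parent + offsetof(struct outer, member);
}
```

With a parent address of 0 the member address is 0xd8: small, invalid, and yet not NULL, so an "if (!ptr) return;" style check does not catch it.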

I can reproduce the backtrace once after a reboot (subsequent
modprobe/rmmod loop runs do not complain anymore), but not the OOPS.

Best regards,

    Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

2015-03-23 Thread Stefan Seyfried
On 23.03.2015 at 19:38, Andy Lutomirski wrote:
> I bet I see it.  I have the advantage of having stared at KVM code and
> cursed at it more recently than you, I suspect.  KVM does awful, awful
> things to CPU state, and, as an optimization, it allows kernel code to
> run with CPU state that would be totally invalid in user mode.  This
> happens through a bunch of hooks, including this bit in __switch_to:
> 
> 	/*
> 	 * Now maybe reload the debug registers and handle I/O bitmaps
> 	 */
> 	if (unlikely(task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT ||
> 		     task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV))
> 		__switch_to_xtra(prev_p, next_p, tss);
> 
> IOW, we *change* tif during context switches.
> 
> 
> The race looks like this:
> 
> 	testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
> 	jnz int_ret_from_sys_call_fixup	/* Go to the slow path */
> 
> --- preempted here, switch to KVM guest ---
> 
> KVM guest enters and screws up, say, MSR_SYSCALL_MASK.  This wouldn't
> happen to be a *32-bit* KVM guest, perhaps?

Not in my case (Penryn CPU); there it was 64-bit guests.

> Now KVM schedules, calling __switch_to.  __switch_to sets
> _TIF_USER_RETURN_NOTIFY.  We IRET back to the syscall exit code, turn
> off interrupts, and do sysret.  We are now screwed.
> 
> I don't know why this manifests in this particular failure, but any
> number of terrible things could happen now.
> 
> FWIW, this will affect things other than KVM.  For example, SIGKILL
> sent while a process is sleeping in that two-instruction window won't
> work.
> 
> Takashi, can you re-send your patch so we can review it for real in
> light of this race?
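
The window described above has the shape of a check-then-act race: the flags are tested, the task is preempted, the flags change, and the fast path then runs on the stale result. A deterministic user-space sketch of that shape, under loud assumptions: struct task, WORK_PENDING, exit_racy and exit_rechecked are hypothetical names, and the preempt callback merely simulates a context switch landing inside the window. It is not the entry code, and exit_rechecked only illustrates the general idea of testing after the last preemption point, not the patch under discussion.

```c
#include <stdbool.h>
#include <stddef.h>

#define WORK_PENDING 0x1u	/* stand-in for a _TIF_* work flag */

struct task {
	unsigned int flags;
};

enum exit_path { PATH_FAST, PATH_SLOW };

typedef void (*preempt_fn)(struct task *t);

/* Simulated context switch that sets a work flag behind our back. */
void set_work_flag(struct task *t)
{
	t->flags |= WORK_PENDING;
}

/* Test first, get preempted afterwards: the decision is stale. */
enum exit_path exit_racy(struct task *t, preempt_fn preempt)
{
	bool work = (t->flags & WORK_PENDING) != 0;	/* the "testl" */

	if (preempt)
		preempt(t);				/* window: flags may change */

	return work ? PATH_SLOW : PATH_FAST;
}

/* Test only after the last point where preemption can happen
 * (think: with interrupts already off). */
enum exit_path exit_rechecked(struct task *t, preempt_fn preempt)
{
	if (preempt)
		preempt(t);

	return (t->flags & WORK_PENDING) ? PATH_SLOW : PATH_FAST;
}
```

Starting from flags == 0 and using set_work_flag as the simulated preemption, exit_racy() returns PATH_FAST although WORK_PENDING is set by then, while exit_rechecked() returns PATH_SLOW.
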
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

2015-03-19 Thread Stefan Seyfried
Good Morning :-)

On 19.03.2015 at 01:57, Andy Lutomirski wrote:

> Stefan, do you happen to know whether your disassembly of page_fault
> came from the instructions in memory or if they came from the vmlinux
> file?  Not that I have any relevant ideas there.

I think they came from memory. At least, the disassemble in crash...
crash> disassemble page_fault
Dump of assembler code for function page_fault:
   0xffffffff816834a0 <+0>:     data32 xchg %ax,%ax
   0xffffffff816834a3 <+3>:     data32 xchg %ax,%ax
   0xffffffff816834a6 <+6>:     data32 xchg %ax,%ax
   0xffffffff816834a9 <+9>:     sub    $0x78,%rsp
   0xffffffff816834ad <+13>:    callq  0xffffffff81683620 <error_entry>
   0xffffffff816834b2 <+18>:    mov    %rsp,%rdi
   0xffffffff816834b5 <+21>:    mov    0x78(%rsp),%rsi
   0xffffffff816834ba <+26>:    movq   $0xffffffffffffffff,0x78(%rsp)
   0xffffffff816834c3 <+35>:    callq  0xffffffff810504e0 <do_page_fault>
   0xffffffff816834c8 <+40>:    jmpq   0xffffffff816836d0 <error_exit>
End of assembler dump.

...is different than the one from loading vmlinux in gdb:

Reading symbols from vmlinux-4.0.0-rc3-2.gd5c547f-desktop...done.
Reading symbols from /usr/lib/debug/boot/vmlinux-4.0.0-rc3-2.gd5c547f-desktop.debug...done.
(gdb) disassemble page_fault
Dump of assembler code for function page_fault:
   0xffffffff816834a0 <+0>:     data16 xchg %ax,%ax
   0xffffffff816834a3 <+3>:     callq  *0x7a5b07(%rip)        # 0xffffffff81e28fb0 <pv_irq_ops+48>
   0xffffffff816834a9 <+9>:     sub    $0x78,%rsp
   0xffffffff816834ad <+13>:    callq  0xffffffff81683620 <error_entry>
   0xffffffff816834b2 <+18>:    mov    %rsp,%rdi
   0xffffffff816834b5 <+21>:    mov    0x78(%rsp),%rsi
   0xffffffff816834ba <+26>:    movq   $0xffffffffffffffff,0x78(%rsp)
   0xffffffff816834c3 <+35>:    callq  0xffffffff810504e0 <do_page_fault>
   0xffffffff816834c8 <+40>:    jmpq   0xffffffff816836d0 <error_exit>
End of assembler dump.
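
For what it's worth, the address gdb prints after the "#" on the indirect
call can be reproduced by hand: a RIP-relative operand is resolved against
the address of the *next* instruction. A minimal sketch, assuming the call
uses the usual 6-byte "ff 15 disp32" encoding (which fits the gap between
<+3> and <+9>) and that the listed addresses carry the usual ffffffff
kernel-text prefix; the function name is made up for illustration:

```c
#include <stdint.h>

/*
 * Target of a RIP-relative operand: address of the next instruction
 * (insn_addr + insn_len) plus the sign-extended 32-bit displacement.
 */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                             int32_t disp32)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}
```

With insn_addr = 0xffffffff816834a3, insn_len = 6 and disp32 = 0x7a5b07 this
gives 0xffffffff81e28fb0, the pointer slot the on-disk code calls through. In
the in-memory copy those 6 bytes are two 3-byte nops instead, which would fit
the call having been patched out at boot, though I cannot say so for sure.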

Best regards,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

2015-03-18 Thread Stefan Seyfried
On 19.03.2015 at 00:22, Andy Lutomirski wrote:
> On Wed, Mar 18, 2015 at 3:40 PM, Andy Lutomirski <l...@amacapital.net> wrote:
>> Yes, it's userspace.  Thanks for checking, though.
> 
> One more stupid hunch:
> 
> Can you do:
> x/21xg ffff8801013d4f58
> 
> If I counted right, that'll dump task_pt_regs(current).

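That's all zeroes:
crash> x /21xg 0xffff8801013d4f58
0xffff8801013d4f58: 0x0000000000000000  0x0000000000000000
0xffff8801013d4f68: 0x0000000000000000  0x0000000000000000
0xffff8801013d4f78: 0x0000000000000000  0x0000000000000000
0xffff8801013d4f88: 0x0000000000000000  0x0000000000000000
0xffff8801013d4f98: 0x0000000000000000  0x0000000000000000
0xffff8801013d4fa8: 0x0000000000000000  0x0000000000000000
0xffff8801013d4fb8: 0x0000000000000000  0x0000000000000000
0xffff8801013d4fc8: 0x0000000000000000  0x0000000000000000
0xffff8801013d4fd8: 0x0000000000000000  0x0000000000000000
0xffff8801013d4fe8: 0x0000000000000000  0x0000000000000000
0xffff8801013d4ff8: 0x0000000000000000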

But maybe you counted wrong (or I'm reading arch/x86/include/asm/processor.h 
wrong, which is at least as likely...).

#define task_pt_regs(tsk)  ((struct pt_regs *)(tsk)->thread.sp0 - 1)

=> I have the task_struct readily available decoded in the crash utility.

crash> task, search for thread, in thread:
 sp0 = 18446612136629993472
crash> eval 18446612136629993472
hexadecimal: ffff8801013d8000  (18014269664677728KB)

crash> print *(struct pt_regs *)(18446612136629993472 - sizeof(struct pt_regs))
$20 = {
  r15 = 18446744071585666077, 
  r14 = 16, 
  r13 = 582, 
  r12 = 18446612136629993352, 
  bp = 24, 
  bx = 18446744071585666061, 
  r11 = 582, 
  r10 = 10760856, 
  r9 = 140712613762160, 
  r8 = 140735967861216, 
  ax = 1, 
  cx = 140712476030103, 
  dx = 140712613782304, 
  si = 1, 
  di = 140712589295616, 
  orig_ax = 209, 
  ip = 140712571864823, 
  cs = 51, 
  flags = 582, 
  sp = 140735967860552, 
  ss = 43
}

=>
r15 = ffffffff8168141d
r12 = ffff8801013d7f88
bx  = ffffffff8168140d
r9  = 7ffa355bd470
ip  = 7ffa32dc86f7
sp  = 7fffa55f1748

This looks reasonably legit to my totally untrained eye (ip and sp, at least).
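
To double-check the address I used above, here is the task_pt_regs()
arithmetic as a small standalone sketch. The struct is my own stand-in that
simply lists the 21 eight-byte fields in the order crash printed them; it is
not copied from the kernel headers, and the names are made up for
illustration.

```c
#include <stdint.h>

/* Stand-in for the x86_64 register frame: 21 slots of 8 bytes each. */
struct pt_regs_sketch {
    uint64_t r15, r14, r13, r12, bp, bx, r11, r10, r9, r8;
    uint64_t ax, cx, dx, si, di, orig_ax, ip, cs, flags, sp, ss;
};

/* Mirrors "(struct pt_regs *)sp0 - 1": the frame sits just below sp0. */
uint64_t task_pt_regs_addr(uint64_t sp0)
{
    return sp0 - sizeof(struct pt_regs_sketch);
}
```

With sp0 = 18446612136629993472 (ffff8801013d8000) this gives
ffff8801013d7f58, i.e. 168 bytes below sp0. The address I was asked to dump
ends in 4f58, which would correspond to a stack top at ...5000, so my guess
is that the difference is simply a 4 KiB versus 16 KiB stack size assumption.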

I'm off to bed now (01:20 around here ;), will be back in about 7 hours.

Best regards,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

2015-03-18 Thread Stefan Seyfried
On 18.03.2015 at 23:29, Andy Lutomirski wrote:
> On Wed, Mar 18, 2015 at 3:22 PM, Jiri Kosina <jkos...@suse.cz> wrote:
>> On Wed, 18 Mar 2015, Andy Lutomirski wrote:
>>
>>> sysret64 can only fail with #GP, and we're totally screwed if that
>>> happens,
>>
>> But what if the GPF handler pagefaults afterwards? It'd be operating on
>> user stack already.
> 
> Good point.
> 
> Stefan, can you try changing the first "jne
> opportunistic_sysret_failed" to "jmp opportunistic_sysret_failed" in
> entry_64.S and seeing if you can reproduce this?  (Is it easy enough
> to reproduce that this would tell us anything?)

I have no good way of reproducing the issue (happens once per week...)
but apparently Takashi has, so I'd like to hand this task over to him.

> It's a shame that double_fault doesn't record what gs was on entry.
> If we did sysret -> general_protection -> page_fault -> double_fault,
> then we'd enter double_fault with usergs, whereas syscall ->
> page_fault -> double_fault would enter double_fault with kernelgs.
> 
> Hmm.  We may be able to answer this more directly.  Stefan, can you
> dump a couple hundred bytes starting at 0x7fffa55eafb8 (i.e. your
> page_fault stack at the time of the failure)?  That will tell us the
> faulting address.  If that fails, try starting at 7fffa55eb000
> instead.

Unfortunately not. Is this userspace memory? It's not in the dump I have.
This is the first issue I have seen where a full dump would be really
helpful for more than cosmetic reasons...
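
My reasoning for "userspace", as a sketch: this CPU reports 48 bits of
virtual address, so everything below 2^47 is in the lower (user) half of
the canonical address space, and kernel addresses start with ffff. The
constant and function name below are my own, not kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* With 48-bit virtual addresses the lower canonical half ends at 2^47. */
#define LOWER_HALF_END (1ULL << 47)

/* True if addr lies in the lower (user) half of the address space. */
bool is_lower_half_address(uint64_t addr)
{
    return addr < LOWER_HALF_END;
}
```

0x7fffa55eafb8 is in the lower half, and a partial dump that excludes user
data would not contain it.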
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

2015-03-18 Thread Stefan Seyfried
On 18.03.2015 at 22:49, Denys Vlasenko wrote:
> Stefan, Takashi, can you post your /proc/cpuinfo
> and dmesg after boot?

susi:~ # cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 23
model name  : Intel(R) Core(TM)2 Duo CPU L9400  @ 1.86GHz
stepping: 10
microcode   : 0xa0c
cpu MHz : 1867.000
cache size  : 6144 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
constant_tsc arch_perfmon pebs bts nopl aperfmperf pni dtes64 monitor ds_cpl 
vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida dtherm tpr_shadow 
vnmi flexpriority
bugs:
bogomips: 3723.96
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

(repeats for second core :)

I'm running 3.19 now, but the dmesg extracted from the crash
dump of 4.0-rc3 is at http://paste.opensuse.org/48196621
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

2015-03-18 Thread Stefan Seyfried
On 18.03.2015 at 22:32, Linus Torvalds wrote:
> Is PARAVIRT enabled? The three nop's at the beginning of 'page_fault'
> makes me suspect it is,  and that that is some paravirt rewriting
> area. What does paravirt go for that USERGS_SYSRET64 (or for
> SWAPGS_UNSAFE_STACK, for that matter).

This is from the newer kernel package, but I doubt this configuration has
been changed in the openSUSE kernel:

susi:~ # grep PARAVIRT /boot/config-4.0.0-rc4-1.g126fc64-desktop
CONFIG_PARAVIRT=y
# CONFIG_PARAVIRT_DEBUG is not set
# CONFIG_PARAVIRT_SPINLOCKS is not set
# CONFIG_PARAVIRT_TIME_ACCOUNTING is not set
CONFIG_PARAVIRT_CLOCK=y

So yes, PARAVIRT is enabled.

Best regards,

    Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

2015-03-18 Thread Stefan Seyfried
On 18.03.2015 at 22:21, Andy Lutomirski wrote:
> On Wed, Mar 18, 2015 at 2:12 PM, Stefan Seyfried wrote:
>> On 18.03.2015 at 21:51, Andy Lutomirski wrote:
>>> On Wed, Mar 18, 2015 at 1:05 PM, Stefan Seyfried wrote:
>>
>>>>> The relevant thread's stack is here (see ti in the trace):
>>>>>
>>>>> 8801013d4000
>>>>>
>>>>> It could be interesting to see what's there.
>>>>>
>>>>> I don't suppose you want to try to walk the paging structures to see
>>>>> if 88023bc8 (i.e. gsbase) and, more specifically,
>>>>> 88023bc8 + old_rsp and 88023bc8 + kernel_stack are
>>>>> present?  You'd only have to walk one level -- presumably, if the PGD
>>>>> entry is there, the rest of the entries are okay, too.
>>>>
>>>> That's all greek to me :-)
>>>>
>>>> I see that there is something at 88023bc8:
>>>>
>>>> crash> x /64xg 0x88023bc8
>>>> 0x88023bc8: 0x  0x
>>>> 0x88023bc80010: 0x  0x
>>>> 0x88023bc80020: 0x  0x6686ada9
>>>> 0x88023bc80030: 0x  0x
>>>> 0x88023bc80040: 0x  0x
>>>> [all zeroes]
>>>> 0x88023bc801f0: 0x  0x
>>>>
>>>> old_rsp and kernel_stack seem bogus:
>>>> crash> print old_rsp
>>>> Cannot access memory at address 0xa200
>>>> gdb: gdb request failed: print old_rsp
>>>> crash> print kernel_stack
>>>> Cannot access memory at address 0xaa48
>>>> gdb: gdb request failed: print kernel_stack
>>>>
>>>> kernel_stack is not a pointer? So 0x88023bc8 + 0xaa48 it is:
>>>
>>> Yup.  old_rsp and kernel_stack are offsets relative to gsbase.
>>>
>>>>
>>>> crash> x /64xg 0x88023bc8aa00
>>>> 0x88023bc8aa00: 0x  0x
>>>
>>> [...]
>>>
>>> I don't know enough about crashkernel to know whether the fact that
>>> this worked means anything.
>>
>> AFAIK this just means that the memory at this location is included in
>> the dump :-)
>>
>>> Can you dump the page of physical memory at 0x4779a067?  That's the PGD.
>>
>> Unfortunately not, this is a partial dump (I think the default config in
>> openSUSE, but I might have changed it some time ago) and the dump_level
>> is 31 which means that the following are excluded:
>>
>>       |      |cache  |cache  |      |
>>  dump | zero |without|with   | user | free
>> level | page |private|private| data | page
>> ------+------+-------+-------+------+------
>>    31 |  X   |   X   |   X   |  X   |  X
>>
>> so this:
>> crash> x /64xg 0x4779a067
>> 0x4779a067: Cannot access memory at address 0x4779a067
>> gdb: gdb request failed: x /64xg
>>
>> probably just means, that the PGD falls in one of the above excluded
>> categories.
> 
> I suspect that it actually means that gdb sees virtual addresses, not
> physical addresses.  But I screwed up completely -- "PGD" in the dump
> is the PGD *entry*, not the PGD pointer.

In crash, physical addresses usually work (it's a sophisticated wrapper
around gdb, AFAICT).
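
That 4779a067 is an entry rather than a pointer also fits the value itself:
the low 12 bits of a page-table entry are flags, and 0x067 is a typical flag
combination. A small decoding sketch (the macro and function names are mine;
the bit positions are the architectural ones as far as I know):

```c
#include <stdint.h>

/* Architectural x86-64 page-table entry bits used here. */
#define SKETCH_PRESENT  (1ULL << 0)
#define SKETCH_RW       (1ULL << 1)
#define SKETCH_USER     (1ULL << 2)
#define SKETCH_ACCESSED (1ULL << 5)
#define SKETCH_DIRTY    (1ULL << 6)

/* Low 12 bits: flags. */
uint64_t entry_flags(uint64_t entry)
{
    return entry & 0xfffULL;
}

/* Bits 12..51: physical address of the next-level table. */
uint64_t entry_phys(uint64_t entry)
{
    return entry & 0x000ffffffffff000ULL;
}
```

For 0x4779a067 this yields flags 0x067 (present, writable, user, accessed,
dirty) and a next-level table at physical 0x4779a000.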
> 
> We could plausibly fish it out from current->mm, but that's a mess.

I'll come to that below.

> I
> don't suppose that "info registers" or "p/x $cr3" will show the cr3
> value?

No, that does not work from crash.

But current->mm is easy:
crash> task|grep mm
  start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
  mm = 0xffff8800b8a9c040,
  active_mm = 0xffff8800b8a9c040,
  comm = "qemu-system-x86",

and (guessing the type :-)
crash> print *(struct mm_struct *)0xffff8800b8a9c040|grep pgd
  pgd = 0xffff880002d7e000,
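
A quick sanity check on that pgd value: the panic output reports CR3 as
02d7e000, and CR3 holds the physical address of the top-level page table.
Assuming the direct mapping of physical memory starts at ffff880000000000
(my assumption for this kernel, not something I read from the dump), the two
should differ by exactly that base. The names below are invented for the
sketch.

```c
#include <stdint.h>

/* Assumed base of the kernel's direct mapping of physical memory. */
#define SKETCH_DIRECT_MAP_BASE 0xffff880000000000ULL

/* Physical address -> direct-map virtual address. */
uint64_t direct_map_virt(uint64_t phys)
{
    return SKETCH_DIRECT_MAP_BASE + phys;
}

/* Direct-map virtual address -> physical address. */
uint64_t direct_map_phys(uint64_t virt)
{
    return virt - SKETCH_DIRECT_MAP_BASE;
}
```

direct_map_virt(0x02d7e000) is ffff880002d7e000, which matches the pgd
pointer above, so at least I seem to be looking at the right page.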

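But if that's correct, pgd contains all zeroes:
crash> print *(pgd_t *)0xffff880002d7e000
$15 = {
  pgd = 0
}
crash> x /16xg 0xffff880002d7e000
0xffff880002d7e000: 0x0000000000000000  0x0000000000000000
0xffff880002d7e010: 0x0000000000000000  0x0000000000000000
0xffff880002d7e020: 0x0000000000000000  0x0000000000000000
0xffff880002d7e030: 0x0000000000000000  0x0000000000000000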

Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

2015-03-18 Thread Stefan Seyfried
On 18.03.2015 at 21:51, Andy Lutomirski wrote:
> On Wed, Mar 18, 2015 at 1:05 PM, Stefan Seyfried wrote:

>>> The relevant thread's stack is here (see ti in the trace):
>>>
>>> 8801013d4000
>>>
>>> It could be interesting to see what's there.
>>>
>>> I don't suppose you want to try to walk the paging structures to see
>>> if 88023bc8 (i.e. gsbase) and, more specifically,
>>> 88023bc8 + old_rsp and 88023bc8 + kernel_stack are
>>> present?  You'd only have to walk one level -- presumably, if the PGD
>>> entry is there, the rest of the entries are okay, too.
>>
>> That's all greek to me :-)
>>
>> I see that there is something at 88023bc8:
>>
>> crash> x /64xg 0x88023bc8
>> 0x88023bc8: 0x  0x
>> 0x88023bc80010: 0x  0x
>> 0x88023bc80020: 0x  0x6686ada9
>> 0x88023bc80030: 0x  0x
>> 0x88023bc80040: 0x  0x
>> [all zeroes]
>> 0x88023bc801f0: 0x  0x
>>
>> old_rsp and kernel_stack seem bogus:
>> crash> print old_rsp
>> Cannot access memory at address 0xa200
>> gdb: gdb request failed: print old_rsp
>> crash> print kernel_stack
>> Cannot access memory at address 0xaa48
>> gdb: gdb request failed: print kernel_stack
>>
>> kernel_stack is not a pointer? So 0x88023bc8 + 0xaa48 it is:
> 
> Yup.  old_rsp and kernel_stack are offsets relative to gsbase.
> 
>>
>> crash> x /64xg 0x88023bc8aa00
>> 0x88023bc8aa00: 0x  0x
> 
> [...]
> 
> I don't know enough about crashkernel to know whether the fact that
> this worked means anything.

AFAIK this just means that the memory at this location is included in
the dump :-)

> Can you dump the page of physical memory at 0x4779a067?  That's the PGD.

Unfortunately not, this is a partial dump (I think the default config in
openSUSE, but I might have changed it some time ago) and the dump_level
is 31 which means that the following are excluded:

      |      |cache  |cache  |      |
 dump | zero |without|with   | user | free
level | page |private|private| data | page
------+------+-------+-------+------+------
   31 |  X   |   X   |   X   |  X   |  X

so this:
crash> x /64xg 0x4779a067
0x4779a067: Cannot access memory at address 0x4779a067
gdb: gdb request failed: x /64xg

probably just means that the PGD falls into one of the excluded
categories above.
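
As far as I understand the dump level, it is a bitmask with one bit per
column of the table above, so 31 (binary 11111) excludes all five page
types. A small sketch of that reading; the enum names are illustrative and
the bit values are the ones I believe makedumpfile documents:

```c
#include <stdbool.h>

/* One bit per excluded page type, in the order of the table columns. */
enum dump_exclude {
    SKETCH_EXCLUDE_ZERO          = 1,
    SKETCH_EXCLUDE_CACHE         = 2,
    SKETCH_EXCLUDE_CACHE_PRIVATE = 4,
    SKETCH_EXCLUDE_USER_DATA     = 8,
    SKETCH_EXCLUDE_FREE          = 16
};

/* True if the given dump level leaves out pages of type "what". */
bool dump_level_excludes(int level, enum dump_exclude what)
{
    return (level & what) != 0;
}
```

With level 31 every one of these checks is true; a level of 1, for
comparison, would only drop zero pages and keep user data.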

Best regards,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

2015-03-18 Thread Stefan Seyfried
Hi Andy,

On 18.03.2015 at 20:26, Andy Lutomirski wrote:
> Hi Linus-
> 
> You seem to enjoy debugging these things.  Want to give this a shot?
> My guess is a vmalloc fault accessing either old_rsp or kernel_stack
> right after swapgs in syscall entry.
> 
> On Wed, Mar 18, 2015 at 12:03 PM, Stefan Seyfried wrote:
>> Hi all,
>>
>> first, I'm kind of happy that I'm not the only one seeing this, and
>> thus my beloved Thinkpad can stay for a bit longer... :-)
>>
>> Then, I'm mostly an amateur when it comes to kernel debugging, so bear
>> with me when I'm stumbling through the code...
>>
>> On 18.03.2015 at 19:03, Andy Lutomirski wrote:
>>> On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai <ti...@suse.de> wrote:
>>>> At Wed, 18 Mar 2015 18:43:52 +0100,
>>>> Takashi Iwai wrote:
>>>>>
>>>>> At Wed, 18 Mar 2015 15:16:42 +0100,
>>>>> Takashi Iwai wrote:
>>>>>>
>>>>>> At Sun, 15 Mar 2015 09:17:15 +0100,
>>>>>> Stefan Seyfried wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> in 4.0-rc I have recently seen a few crashes, always when running
>>>>>>> KVM guests (IIRC). Today I was able to capture a crash dump, this
>>>>>>> is the backtrace from dmesg.txt:
>>>>>>>
>>>>>>> [242060.604870] PANIC: double fault, error_code: 0x0
>>>
>>> OK, we double faulted.  Too bad that x86 CPUs don't tell us why.
>>>
>>>>>>> [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G   
>>>>>>>  W   4.0.0-rc3-2.gd5c547f-desktop #1
>>>>>>> [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW 
>>>>>>> (3.21 ) 12/13/2011
>>>>>>> [242060.604883] task: ffff880103f46150 ti: ffff8801013d4000 task.ti: 
>>>>>>> ffff8801013d4000
>>>>>>> [242060.604885] RIP: 0010:[<ffffffff816834ad>]  [<ffffffff816834ad>] 
>>>>>>> page_fault+0xd/0x30
>>>
>>> The double fault happened during page fault processing.  Could you
>>> disassemble your page_fault function to find the offending
>>> instruction?
>>
>> This one is easy:
>>
>> crash> disassemble page_fault
>> Dump of assembler code for function page_fault:
>>    0xffffffff816834a0 <+0>:     data32 xchg %ax,%ax
>>    0xffffffff816834a3 <+3>:     data32 xchg %ax,%ax
>>    0xffffffff816834a6 <+6>:     data32 xchg %ax,%ax
>>    0xffffffff816834a9 <+9>:     sub    $0x78,%rsp
>>    0xffffffff816834ad <+13>:    callq  0xffffffff81683620 <error_entry>
> 
> The callq was the double-faulting instruction, and it is indeed the
> first function in here that would have accessed the stack.  (The sub
> *changes* rsp but isn't a memory access.)  So, since RSP is bogus, we
> page fault, and the page fault is promoted to a double fault.  The
> surprising thing is that the page fault itself seems to have been
> delivered okay, and RSP wasn't on a page boundary.
> 
> You wouldn't happen to be using a Broadwell machine?

No, this is a fairly old ThinkPad X200s with a Core 2 Duo:
processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 23
model name  : Intel(R) Core(TM)2 Duo CPU L9400  @ 1.86GHz
stepping: 10
microcode   : 0xa0c

> The only way to get here with bogus RSP is if we interrupted something
> that was previously running at CPL0 with similarly bogus RSP.
> 
> I don't know if I trust CR2.  It's 16 bytes lower than I'd expect.
> 
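>>    0xffffffff816834b2 <+18>:    mov    %rsp,%rdi
>>    0xffffffff816834b5 <+21>:    mov    0x78(%rsp),%rsi
>>    0xffffffff816834ba <+26>:    movq   $0xffffffffffffffff,0x78(%rsp)
>>    0xffffffff816834c3 <+35>:    callq  0xffffffff810504e0 <do_page_fault>
>>    0xffffffff816834c8 <+40>:    jmpq   0xffffffff816836d0 <error_exit>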
>> End of assembler dump.
>>
>>
>>>>>>> [242060.604893] RSP: 0018:7fffa55eafb8  EFLAGS: 00010016
>>>
>>> Uh, what?  That RSP is a user address.
>>>
>>>>>>> [242060.604895] RAX: aa40 RBX: 0001 RCX: 
>>>>>>> 81682237
>>>>>>> [242060.604896] RDX: aa40 RSI:  RDI: 
>>>>>>> 7fffa55eb078
>>>>>>> [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09: 
>>>>>>> 
>>>>>>> [242060.604900] R10:  R1

Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

2015-03-18 Thread Stefan Seyfried
Hi all,

first, I'm kind of happy that I'm not the only one seeing this, and
thus my beloved Thinkpad can stay for a bit longer... :-)

Then, I'm mostly an amateur when it comes to kernel debugging, so bear
with me when I'm stumbling through the code...

On 18.03.2015 at 19:03, Andy Lutomirski wrote:
> On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai <ti...@suse.de> wrote:
>> At Wed, 18 Mar 2015 18:43:52 +0100,
>> Takashi Iwai wrote:
>>>
>>> At Wed, 18 Mar 2015 15:16:42 +0100,
>>> Takashi Iwai wrote:
>>>>
>>>> At Sun, 15 Mar 2015 09:17:15 +0100,
>>>> Stefan Seyfried wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> in 4.0-rc I have recently seen a few crashes, always when running
>>>>> KVM guests (IIRC). Today I was able to capture a crash dump, this
>>>>> is the backtrace from dmesg.txt:
>>>>>
>>>>> [242060.604870] PANIC: double fault, error_code: 0x0
> 
> OK, we double faulted.  Too bad that x86 CPUs don't tell us why.
> 
>>>>> [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G
>>>>> W   4.0.0-rc3-2.gd5c547f-desktop #1
>>>>> [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW 
>>>>> (3.21 ) 12/13/2011
>>>>> [242060.604883] task: ffff880103f46150 ti: ffff8801013d4000 task.ti: 
>>>>> ffff8801013d4000
>>>>> [242060.604885] RIP: 0010:[<ffffffff816834ad>]  [<ffffffff816834ad>] 
>>>>> page_fault+0xd/0x30
> 
> The double fault happened during page fault processing.  Could you
> disassemble your page_fault function to find the offending
> instruction?

This one is easy:

crash> disassemble page_fault
Dump of assembler code for function page_fault:
   0xffffffff816834a0 <+0>:     data32 xchg %ax,%ax
   0xffffffff816834a3 <+3>:     data32 xchg %ax,%ax
   0xffffffff816834a6 <+6>:     data32 xchg %ax,%ax
   0xffffffff816834a9 <+9>:     sub    $0x78,%rsp
   0xffffffff816834ad <+13>:    callq  0xffffffff81683620 <error_entry>
   0xffffffff816834b2 <+18>:    mov    %rsp,%rdi
   0xffffffff816834b5 <+21>:    mov    0x78(%rsp),%rsi
   0xffffffff816834ba <+26>:    movq   $0xffffffffffffffff,0x78(%rsp)
   0xffffffff816834c3 <+35>:    callq  0xffffffff810504e0 <do_page_fault>
   0xffffffff816834c8 <+40>:    jmpq   0xffffffff816836d0 <error_exit>
End of assembler dump.


>>>>> [242060.604893] RSP: 0018:7fffa55eafb8  EFLAGS: 00010016
> 
> Uh, what?  That RSP is a user address.
> 
>>>>> [242060.604895] RAX: aa40 RBX: 0001 RCX: 
>>>>> 81682237
>>>>> [242060.604896] RDX: aa40 RSI:  RDI: 
>>>>> 7fffa55eb078
>>>>> [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09: 
>>>>> 
>>>>> [242060.604900] R10:  R11: 0293 R12: 
>>>>> 004a
>>>>> [242060.604902] R13: 7ffa356b5d60 R14: 000f R15: 
>>>>> 7ffa3556cf20
>>>>> [242060.604904] FS:  7ffa33dbfa80() GS:88023bc8() 
>>>>> knlGS:
>>>>> [242060.604906] CS:  0010 DS:  ES:  CR0: 80050033
>>>>> [242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: 
>>>>> 000427e0
>>>>> [242060.604909] Stack:
>>>>> [242060.604942] BUG: unable to handle kernel paging request at 
>>>>> 7fffa55eafb8
>>>>> [242060.604995] IP: [] show_stack_log_lvl+0x124/0x190
> 
> This is suspicious.  We need to have died, again, of a fatal page
> fault while dumping the stack.

I posted the same problem to the opensuse kernel list shortly before turning
to LKML. There, Michal Kubecek noted:

"I encountered a similar problem recently. The thing is, x86
specification says that on a double fault, RIP and RSP registers are
undefined, i.e. you not only can't expect them to contain values
corresponding to the first or second fault but you can't even expect
them to have any usable values at all. Unfortunately the kernel double
fault handler doesn't take this into account and does try to display
usual crash related information so that it itself does usually crash
when trying to show stack content (that's the show_stack_log_lvl()
crash).

The result is a double fault (which itself would be very hard to debug)
followed by a crash in its handler so that analysing the outcome is
extremely difficult."

I cannot judge whether this is true, but it sounded relevant to the
problem.

>>>>> [242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e

Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

2015-03-18 Thread Stefan Seyfried
Hi all,

first, I'm kind of happy that I'm not the only one seeing this, and
thus my beloved Thinkpad can stay for a bit longer... :-)

Then, I'm mostly an amateur when it comes to kernel debugging, so bear
with me when I'm stumbling through the code...

Am 18.03.2015 um 19:03 schrieb Andy Lutomirski:
 On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai ti...@suse.de wrote:
 At Wed, 18 Mar 2015 18:43:52 +0100,
 Takashi Iwai wrote:

 At Wed, 18 Mar 2015 15:16:42 +0100,
 Takashi Iwai wrote:

 At Sun, 15 Mar 2015 09:17:15 +0100,
 Stefan Seyfried wrote:

 Hi all,

 in 4.0-rc I have recently seen a few crashes, always when running
 KVM guests (IIRC). Today I was able to capture a crash dump, this
 is the backtrace from dmesg.txt:

 [242060.604870] PANIC: double fault, error_code: 0x0
 
 OK, we double faulted.  Too bad that x86 CPUs don't tell us why.
 
 [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G
 W   4.0.0-rc3-2.gd5c547f-desktop #1
 [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW 
 (3.21 ) 12/13/2011
 [242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: 
 8801013d4000
 [242060.604885] RIP: 0010:[816834ad]  [816834ad] 
 page_fault+0xd/0x30
 
 The double fault happened during page fault processing.  Could you
 disassemble your page_fault function to find the offending
 instruction?

This one is easy:

crash> disassemble page_fault
Dump of assembler code for function page_fault:
   0x816834a0 <+0>:     data32 xchg %ax,%ax
   0x816834a3 <+3>:     data32 xchg %ax,%ax
   0x816834a6 <+6>:     data32 xchg %ax,%ax
   0x816834a9 <+9>:     sub    $0x78,%rsp
   0x816834ad <+13>:    callq  0x81683620 <error_entry>
   0x816834b2 <+18>:    mov    %rsp,%rdi
   0x816834b5 <+21>:    mov    0x78(%rsp),%rsi
   0x816834ba <+26>:    movq   $0x,0x78(%rsp)
   0x816834c3 <+35>:    callq  0x810504e0 <do_page_fault>
   0x816834c8 <+40>:    jmpq   0x816836d0 <error_exit>
End of assembler dump.


 [242060.604893] RSP: 0018:7fffa55eafb8  EFLAGS: 00010016
 
 Uh, what?  That RSP is a user address.
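
A small sketch to make the remark above concrete. It assumes the usual
x86-64 layout with 48-bit virtual addresses (the cpuinfo posted in this
thread reports "48 bits virtual"); the helper name classify() is made up
for illustration. It checks which half of the canonical address space the
reported RSP lies in, and how far it is from the CR2 value printed further
down in the same report.

```python
# Canonical address ranges for x86-64 with 48-bit virtual addresses
# (assumption: 4-level paging, as on the Core 2 Duo discussed here).
USER_MAX = 0x0000_7FFF_FFFF_FFFF
KERNEL_MIN = 0xFFFF_8000_0000_0000


def classify(addr: int) -> str:
    """Return which part of the canonical address space addr falls in."""
    if addr <= USER_MAX:
        return "user"
    if addr >= KERNEL_MIN:
        return "kernel"
    return "non-canonical"


rsp = 0x7FFFA55EAFB8  # RSP from the double fault report
cr2 = 0x7FFFA55EAFA8  # CR2 from the same report

print(classify(rsp))  # lower half, so a user address
print(rsp - cr2)      # distance in bytes between RSP and CR2
```

The subtraction only shows the distance between the two reported values;
it says nothing about which value is the trustworthy one.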
 
 [242060.604895] RAX: aa40 RBX: 0001 RCX: 
 81682237
 [242060.604896] RDX: aa40 RSI:  RDI: 
 7fffa55eb078
 [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09: 
 
 [242060.604900] R10:  R11: 0293 R12: 
 004a
 [242060.604902] R13: 7ffa356b5d60 R14: 000f R15: 
 7ffa3556cf20
 [242060.604904] FS:  7ffa33dbfa80() GS:88023bc8() 
 knlGS:
 [242060.604906] CS:  0010 DS:  ES:  CR0: 80050033
 [242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: 
 000427e0
 [242060.604909] Stack:
 [242060.604942] BUG: unable to handle kernel paging request at 
 7fffa55eafb8
 [242060.604995] IP: [81005b44] show_stack_log_lvl+0x124/0x190
 
 This is suspicious.  We need to have died, again, of a fatal page
 fault while dumping the stack.

I posted the same problem to the openSUSE kernel list shortly before turning
to LKML. There, Michal Kubecek noted:

I encountered a similar problem recently. The thing is, x86
specification says that on a double fault, RIP and RSP registers are
undefined, i.e. you not only can't expect them to contain values
corresponding to the first or second fault but you can't even expect
them to have any usable values at all. Unfortunately the kernel double
fault handler doesn't take this into account and does try to display
usual crash related information so that it itself does usually crash
when trying to show stack content (that's the show_stack_log_lvl()
crash).

The result is a double fault (which itself would be very hard to debug)
followed by a crash in its handler so that analysing the outcome is
extremely difficult.

I cannot judge whether this is true, but it sounded relevant to solving
the problem.

 [242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0
 [242060.605078] Oops:  [#1] PREEMPT SMP
 [242060.605106] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 
 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace 
 sunrpc fscache nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp 
 ppp_async crc_ccitt ppp_generic slhc ses enclosure uas usb_storage cmac 
 algif_hash ctr ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE 
 nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 
 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge 
 stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter 
 ip_tables x_tables af_packet bnep dm_crypt ecb cbc algif_skcipher af_alg 
 xfs libcrc32c snd_hda_codec_conexant snd_hda_codec_generic iTCO_wdt 
 iTCO_vendor_support snd_hda_intel snd_hda_controller snd_hda_codec 
 snd_hwdep snd_pcm_oss snd_pcm

Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

2015-03-18 Thread Stefan Seyfried
Am 18.03.2015 um 22:49 schrieb Denys Vlasenko:
> Stefan, Takashi, can you post your /proc/cpuinfo
> and dmesg after boot?

susi:~ # cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU L9400  @ 1.86GHz
stepping        : 10
microcode       : 0xa0c
cpu MHz         : 1867.000
cache size      : 6144 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts nopl aperfmperf pni dtes64 monitor ds_cpl
vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida dtherm tpr_shadow
vnmi flexpriority
bugs            :
bogomips        : 3723.96
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

(repeats for second core :)

I'm running 3.19 now, but the dmesg extracted from the crash
dump of 4.0-rc3 is at http://paste.opensuse.org/48196621
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

2015-03-18 Thread Stefan Seyfried
Hi Andy,

Am 18.03.2015 um 20:26 schrieb Andy Lutomirski:
 Hi Linus-
 
 You seem to enjoy debugging these things.  Want to give this a shot?
 My guess is a vmalloc fault accessing either old_rsp or kernel_stack
 right after swapgs in syscall entry.
 
 On Wed, Mar 18, 2015 at 12:03 PM, Stefan Seyfried
 stefan.seyfr...@googlemail.com wrote:
 Hi all,

 first, I'm kind of happy that I'm not the only one seeing this, and
 thus my beloved Thinkpad can stay for a bit longer... :-)

 Then, I'm mostly an amateur when it comes to kernel debugging, so bear
 with me when I'm stumbling through the code...

 Am 18.03.2015 um 19:03 schrieb Andy Lutomirski:
 On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai ti...@suse.de wrote:
 At Wed, 18 Mar 2015 18:43:52 +0100,
 Takashi Iwai wrote:

 At Wed, 18 Mar 2015 15:16:42 +0100,
 Takashi Iwai wrote:

 At Sun, 15 Mar 2015 09:17:15 +0100,
 Stefan Seyfried wrote:

 Hi all,

 in 4.0-rc I have recently seen a few crashes, always when running
 KVM guests (IIRC). Today I was able to capture a crash dump, this
 is the backtrace from dmesg.txt:

 [242060.604870] PANIC: double fault, error_code: 0x0

 OK, we double faulted.  Too bad that x86 CPUs don't tell us why.

 [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G   
  W   4.0.0-rc3-2.gd5c547f-desktop #1
 [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW 
 (3.21 ) 12/13/2011
 [242060.604883] task: 880103f46150 ti: 8801013d4000 task.ti: 
 8801013d4000
 [242060.604885] RIP: 0010:[816834ad]  [816834ad] 
 page_fault+0xd/0x30

 The double fault happened during page fault processing.  Could you
 disassemble your page_fault function to find the offending
 instruction?

 This one is easy:

 crash disassemble page_fault
 Dump of assembler code for function page_fault:
0x816834a0 +0: data32 xchg %ax,%ax
0x816834a3 +3: data32 xchg %ax,%ax
0x816834a6 +6: data32 xchg %ax,%ax
0x816834a9 +9: sub$0x78,%rsp
0x816834ad +13:callq  0x81683620 error_entry
 
 The callq was the double-faulting instruction, and it is indeed the
 first function in here that would have accessed the stack.  (The sub
 *changes* rsp but isn't a memory access.)  So, since RSP is bogus, we
 page fault, and the page fault is promoted to a double fault.  The
 surprising thing is that the page fault itself seems to have been
 delivered okay, and RSP wasn't on a page boundary.
 
 You wouldn't happen to be using a Broadwell machine?

No, this is a quite old Thinkpad X200s with a Core 2 Duo:
processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 23
model name  : Intel(R) Core(TM)2 Duo CPU L9400  @ 1.86GHz
stepping: 10
microcode   : 0xa0c

 The only way to get here with bogus RSP is if we interrupted something
 that was previously running at CPL0 with similarly bogus RSP.
 
 I don't know if I trust CR2.  It's 16 bytes lower than I'd expect.
 
0x816834b2 +18:mov%rsp,%rdi
0x816834b5 +21:mov0x78(%rsp),%rsi
0x816834ba +26:movq   $0x,0x78(%rsp)
0x816834c3 +35:callq  0x810504e0 do_page_fault
0x816834c8 +40:jmpq   0x816836d0 error_exit
 End of assembler dump.


 [242060.604893] RSP: 0018:7fffa55eafb8  EFLAGS: 00010016

 Uh, what?  That RSP is a user address.

 [242060.604895] RAX: aa40 RBX: 0001 RCX: 
 81682237
 [242060.604896] RDX: aa40 RSI:  RDI: 
 7fffa55eb078
 [242060.604898] RBP: 7fffa55f1c1c R08: 0008 R09: 
 
 [242060.604900] R10:  R11: 0293 R12: 
 004a
 [242060.604902] R13: 7ffa356b5d60 R14: 000f R15: 
 7ffa3556cf20
 [242060.604904] FS:  7ffa33dbfa80() GS:88023bc8() 
 knlGS:
 [242060.604906] CS:  0010 DS:  ES:  CR0: 80050033
 [242060.604908] CR2: 7fffa55eafa8 CR3: 02d7e000 CR4: 
 000427e0
 [242060.604909] Stack:
 [242060.604942] BUG: unable to handle kernel paging request at 
 7fffa55eafb8
 [242060.604995] IP: [81005b44] show_stack_log_lvl+0x124/0x190

 This is suspicious.  We need to have died, again, of a fatal page
 fault while dumping the stack.

 I posted the same problem to the opensuse kernel list shortly before turning
 to LKML. There, Michal Kubecek noted:

 I encountered a similar problem recently. The thing is, x86
 specification says that on a double fault, RIP and RSP registers are
 undefined, i.e. you not only can't expect them to contain values
 corresponding to the first or second fault but you can't even expect
 them to have any usable values at all. Unfortunately the kernel double
 fault handler doesn't take this into account and does try to display
 usual crash related information so that it itself does usually

Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

2015-03-18 Thread Stefan Seyfried
Am 18.03.2015 um 21:51 schrieb Andy Lutomirski:
 On Wed, Mar 18, 2015 at 1:05 PM, Stefan Seyfried
 stefan.seyfr...@googlemail.com wrote:

 The relevant thread's stack is here (see ti in the trace):

 8801013d4000

 It could be interesting to see what's there.

 I don't suppose you want to try to walk the paging structures to see
 if 88023bc8 (i.e. gsbase) and, more specifically,
 88023bc8 + old_rsp and 88023bc8 + kernel_stack are
 present?  You'd only have to walk one level -- presumably, if the PGD
 entry is there, the rest of the entries are okay, too.

 That's all greek to me :-)

 I see that there is something at 88023bc8:

 crash x /64xg 0x88023bc8
 0x88023bc8: 0x  0x
 0x88023bc80010: 0x  0x
 0x88023bc80020: 0x  0x6686ada9
 0x88023bc80030: 0x  0x
 0x88023bc80040: 0x  0x
 [all zeroes]
 0x88023bc801f0: 0x  0x

 old_rsp and kernel_stack seem bogus:
 crash print old_rsp
 Cannot access memory at address 0xa200
 gdb: gdb request failed: print old_rsp
 crash print kernel_stack
 Cannot access memory at address 0xaa48
 gdb: gdb request failed: print kernel_stack

 kernel_stack is not a pointer? So 0x88023bc8 + 0xaa48 it is:
 
 Yup.  old_rsp and kernel_stack are offsets relative to gsbase.
 

 crash x /64xg 0x88023bc8aa00
 0x88023bc8aa00: 0x  0x
 
 [...]
 
 I don't know enough about crashkernel to know whether the fact that
 this worked means anything.

AFAIK this just means that the memory at this location is included in
the dump :-)

 Can you dump the page of physical memory at 0x4779a067?  That's the PGD.

Unfortunately not. This is a partial dump (I think that is the default
config in openSUSE, but I might have changed it some time ago) and the
dump_level is 31, which means that the following are excluded:

      |      |cache  |cache  |      |
 dump | zero |without|with   | user | free
level | page |private|private| data | page
------+------+-------+-------+------+------
   31 |  X   |   X   |   X   |  X   |  X

so this:
crash> x /64xg 0x4779a067
0x4779a067: Cannot access memory at address 0x4779a067
gdb: gdb request failed: x /64xg

probably just means that the PGD falls into one of the excluded
categories above.
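
For reference, a minimal sketch of how a dump_level value breaks down into
excluded page types. The bit values are my reading of the table in the
makedumpfile(8) man page and should be treated as an assumption; the
function name is made up, and the mapping is simplified to one table column
per bit.

```python
# Bit values as I read them from the makedumpfile(8) man page (assumption).
# Simplified: each bit is mapped to exactly one column of the table.
EXCLUDE_BITS = [
    (1, "zero pages"),
    (2, "cache pages without private"),
    (4, "cache pages with private"),
    (8, "user data"),
    (16, "free pages"),
]


def excluded_page_types(dump_level: int) -> list:
    """Return the page types left out of the dump for this dump_level."""
    return [name for bit, name in EXCLUDE_BITS if dump_level & bit]


for name in excluded_page_types(31):
    print("excluded:", name)
```

With dump_level 31 all five bits are set, which matches the row of X marks
in the table above.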

Best regards,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

2015-03-18 Thread Stefan Seyfried
Am 18.03.2015 um 22:21 schrieb Andy Lutomirski:
 On Wed, Mar 18, 2015 at 2:12 PM, Stefan Seyfried
 stefan.seyfr...@googlemail.com wrote:
 Am 18.03.2015 um 21:51 schrieb Andy Lutomirski:
 On Wed, Mar 18, 2015 at 1:05 PM, Stefan Seyfried
 stefan.seyfr...@googlemail.com wrote:

 The relevant thread's stack is here (see ti in the trace):

 8801013d4000

 It could be interesting to see what's there.

 I don't suppose you want to try to walk the paging structures to see
 if 88023bc8 (i.e. gsbase) and, more specifically,
 88023bc8 + old_rsp and 88023bc8 + kernel_stack are
 present?  You'd only have to walk one level -- presumably, if the PGD
 entry is there, the rest of the entries are okay, too.

 That's all greek to me :-)

 I see that there is something at 88023bc8:

 crash x /64xg 0x88023bc8
 0x88023bc8: 0x  0x
 0x88023bc80010: 0x  0x
 0x88023bc80020: 0x  0x6686ada9
 0x88023bc80030: 0x  0x
 0x88023bc80040: 0x  0x
 [all zeroes]
 0x88023bc801f0: 0x  0x

 old_rsp and kernel_stack seem bogus:
 crash print old_rsp
 Cannot access memory at address 0xa200
 gdb: gdb request failed: print old_rsp
 crash print kernel_stack
 Cannot access memory at address 0xaa48
 gdb: gdb request failed: print kernel_stack

 kernel_stack is not a pointer? So 0x88023bc8 + 0xaa48 it is:

 Yup.  old_rsp and kernel_stack are offsets relative to gsbase.


 crash x /64xg 0x88023bc8aa00
 0x88023bc8aa00: 0x  0x

 [...]

 I don't know enough about crashkernel to know whether the fact that
 this worked means anything.

 AFAIK this just means that the memory at this location is included in
 the dump :-)

 Can you dump the page of physical memory at 0x4779a067?  That's the PGD.

 Unfortunately not, this is a partial dump (I think the default config in
 openSUSE, but I might have changed it some time ago) and the dump_level
 is 31 which means that the following are excluded:

  |  |cache  |cache  |  |
 dump | zero |without|with   | user | free
level | page |private|private| data | page
   ---+--+---+---+--+--
   31 |  X   |   X   |   X   |  X   |  X

 so this:
 crash x /64xg 0x4779a067
 0x4779a067: Cannot access memory at address 0x4779a067
 gdb: gdb request failed: x /64xg

 probably just means, that the PGD falls in one of the above excluded
 categories.
 
 I suspect that it actually means that gdb sees virtual addresses, not
 physical addresses.  But I screwed up completely -- PGD in the dump
 is the PGD *entry*, not the PGD pointer.

In crash, physical addresses usually work (it's a sophisticated wrapper
around gdb AFAICT).
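
To illustrate the entry-versus-pointer distinction, here is a rough sketch
that splits the "PGD 4779a067" value from the oops into a physical frame
address and flag bits. It assumes the standard x86-64 page table entry
layout (flags in the low 12 bits, physical address in bits 12 to 51); the
function name and the short flag names are only for illustration.

```python
# Low flag bits of an x86-64 page table entry (standard layout, assumed).
FLAG_BITS = [
    (0, "P"),    # present
    (1, "RW"),   # writable
    (2, "US"),   # user accessible
    (3, "PWT"),  # page-level write-through
    (4, "PCD"),  # page-level cache disable
    (5, "A"),    # accessed
    (6, "D"),    # dirty
    (7, "PS"),   # page size
]
ADDR_MASK = 0x000F_FFFF_FFFF_F000  # bits 12..51 hold the physical address


def decode_entry(entry: int):
    """Split a page table entry into (physical address, list of flag names)."""
    flags = [name for bit, name in FLAG_BITS if entry & (1 << bit)]
    return entry & ADDR_MASK, flags


phys, flags = decode_entry(0x4779A067)
print(hex(phys), flags)
```

Read this way, 0x4779a067 is not an address to dump directly: the low bits
0x067 are flags, and the next-level table would sit at physical 0x4779a000.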
 
 We could plausibly fish it out from current->mm, but that's a mess.

I'll come to that later.

 I don't suppose that info registers or p/x $cr3 will show the cr3
 value?

No, that does not work from crash.

But current->mm is easy:
crash> task|grep mm
  start_comm =
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
  mm = 0x8800b8a9c040,
  active_mm = 0x8800b8a9c040,
  comm = qemu-system-x86,

and (guessing the type :-)
crash> print *(struct mm_struct *)0x8800b8a9c040|grep pgd
  pgd = 0x880002d7e000,

But if that's correct, pgd contains all zeroes:
crash> print *(pgd_t *)0x880002d7e000
$15 = {
  pgd = 0
}
crash> x /16xg 0x880002d7e000
0x880002d7e000: 0x  0x
0x880002d7e010: 0x  0x
0x880002d7e020: 0x  0x
0x880002d7e030: 0x  0x
0x880002d7e040: 0x  0x
0x880002d7e050: 0x  0x
0x880002d7e060: 0x  0x
0x880002d7e070: 0x  0x
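
A hedged side calculation on the dump above: with 4-level paging (9 index
bits per level, 4 KiB pages, which I assume applies to this 48-bit
machine), the faulting address 7fffa55eafb8 from the oops does not use one
of the first 16 PGD slots shown here. The helper name is invented for the
example.

```python
def table_indices(vaddr: int):
    """Return (pgd, pud, pmd, pte) indices for a 4-level x86-64 page walk.

    Assumes 9 index bits per level and 4 KiB pages.
    """
    return tuple((vaddr >> shift) & 0x1FF for shift in (39, 30, 21, 12))


fault_addr = 0x7FFFA55EAFB8          # address from the oops message
pgd_index = table_indices(fault_addr)[0]
pgd_offset = pgd_index * 8           # each PGD entry is 8 bytes

shown_entries = 16                   # "x /16xg" prints 16 entries
print(pgd_index, hex(pgd_offset), pgd_index < shown_entries)
```

So the sixteen zero entries above cover only the lowest slots of the table;
the slot relevant for that address would be at byte offset 0x7f8 into the
same page, assuming the pgd pointer itself is correct.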

 In any case, Denys is right -- my theory doesn't really hold water on
 non-SMAP systems.

Mine is definitely not new enough for this feature :)

Maybe it would be more helpful if Takashi, who can reproduce this more
reliably than I can, captured a crash dump, preferably with a lower
dump level, for further investigation.
I have seen the bug only two or three times in a week or two, which makes
waiting for it to happen a boring experience.

Best regards,

Stefan

-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537

Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

2015-03-18 Thread Stefan Seyfried
Am 18.03.2015 um 22:32 schrieb Linus Torvalds:
> Is PARAVIRT enabled? The three nop's at the beginning of 'page_fault'
> makes me suspect it is,  and that that is some paravirt rewriting
> area. What does paravirt go for that USERGS_SYSRET64 (or for
> SWAPGS_UNSAFE_STACK, for that matter).

This is from the newer kernel package, but I doubt this configuration has
been changed in the openSUSE kernel:

susi:~ # grep PARAVIRT /boot/config-4.0.0-rc4-1.g126fc64-desktop
CONFIG_PARAVIRT=y
# CONFIG_PARAVIRT_DEBUG is not set
# CONFIG_PARAVIRT_SPINLOCKS is not set
# CONFIG_PARAVIRT_TIME_ACCOUNTING is not set
CONFIG_PARAVIRT_CLOCK=y

So yes, PARAVIRT is enabled.

Best regards,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

2015-03-15 Thread Stefan Seyfried
: 7fffa55eafb8

I would not totally rule out a hardware problem, since this machine had
another weird crash in which the BIOS beeper sounded continuously until
I held the power button for 5 seconds.

Unfortunately, I cannot load the crashdump with the crash version in
openSUSE Tumbleweed, so the backtrace is all I have for now.

Any hints?

Best regards,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: [PATCH 1/2] Revert "blk-mq: fix hctx/ctx kobject use-after-free"

2015-02-03 Thread Stefan Seyfried
Am 03.02.2015 um 22:50 schrieb Jens Axboe:
> On 02/03/2015 12:14 PM, Jens Axboe wrote:
>> On 02/03/2015 12:13 PM, Stefan Seyfried wrote:
>>> Am 29.01.2015 um 13:17 schrieb Ming Lei:
>>>> This reverts commit 76d697d10769048e5721510100bf3a9413a56385.
>>> The revert is not yet in Linus' tree (but it should get there before
>>> 3.19 is released, or all USB-stick users will be unhappy).
>>
>> It'll go out later today.
> 
> It's in Linus' tree now.

...and works well for my trivial "plug and unplug a USB stick" testcase.
(I did not want to push, just make sure it wasn't forgotten :)

Thanks all,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: [PATCH 1/2] Revert "blk-mq: fix hctx/ctx kobject use-after-free"

2015-02-03 Thread Stefan Seyfried
Am 29.01.2015 um 13:17 schrieb Ming Lei:
> This reverts commit 76d697d10769048e5721510100bf3a9413a56385.
> 
> The commit 76d697d10769048 causes general protection fault
> reported from Bart Van Assche:
> 
>   https://lkml.org/lkml/2015/1/28/334

I bisected the "unplugging my USB stick crashes the kernel" problem
today and came to this very commit.

The revert is not yet in Linus' tree (but it should get there before
3.19 is released, or all USB-stick users will be unhappy).

Best regards,

Stefan

> Reported-by: Bart Van Assche 
> Signed-off-by: Ming Lei 
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: [REGRESSION] in 3.18-rc1: ppp crashes kernel

2014-11-07 Thread Stefan Seyfried
Hi Takashi,

yes, this no longer crashes. No real-world test yet, but the obvious
crash is gone. Thanks!

Am 07.11.2014 um 14:22 schrieb Takashi Iwai:
> At Fri, 07 Nov 2014 12:10:46 +0100,
> Stefan Seyfried wrote:
>>
>> Hi all,
>>
>> since 3.18-rc1, setting up a PPP interface kills my kernel with
>>
>> [  163.433251] PPP generic driver version 2.4.2
>> [  164.452474] [ cut here ]
>> [  164.453327] kernel BUG at ../mm/vmalloc.c:1316!
>> [  164.453327] invalid opcode:  [#1] PREEMPT SMP 
>> [  164.453327] Modules linked in: ppp_async crc_ccitt ppp_generic slhc 
>> af_packet xfs libcrc32c coretemp kvm_intel 
>> snd_hda_codec_conexant iTCO_wdt snd_hda_codec_generic iTCO_vendor_support 
>> uvcvideo snd_hda_intel snd_hda_controller mac80211 videobuf2_vmalloc 
>> snd_hda_codec kvm e1000e videobuf2_memops cfg80211 videobuf2_core 
>> v4l2_common snd_hwdep i2c_i801 videodev snd_pcm pcspkr thinkpad_acpi 
>> serio_raw wmi lpc_ich snd_timer thermal snd rfkill mfd_core tpm_tis shpchp 
>> mei_me soundcore ptp mei pps_core acpi_cpufreq tpm battery processor ac 
>> dm_mod btrfs xor raid6_pq i915 i2c_algo_bit drm_kms_helper drm video button 
>> sg
>> [  164.453327] CPU: 0 PID: 6927 Comm: pppd Not tainted 
>> 3.18.0-rc3-3.ge706e91-desktop #1
>> [  164.453327] Hardware name: LENOVO 7470E36/7470E36, BIOS 6DET61WW (3.11 ) 
>> 11/10/2009
>>
>> This is easy to reproduce with:
>>
>> linux:~ # cat bin/crashme.sh 
>> 
>> #!/bin/bash -x
>> pppd local pty "netcat -l 1234" &
>> sleep 1
>> pppd local pty "netcat localhost 1234" &
>> sleep 1
>> 
>>
>> 3.17 works fine.
>> I bisected the issue multiple times and always arrived at
>>
>> # first bad commit: [d6dd50e07c5bec00db2005969b1a01f8ca3d25ef] Merge branch 
>> 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>
>> which is a merge commit unfortunately.
>>
>> The BUG encountered above is in:
>>
>> 1309 static struct vm_struct *__get_vm_area_node(unsigned long size,
>> 1310 unsigned long align, unsigned long flags, unsigned long 
>> start,
>> 1311 unsigned long end, int node, gfp_t gfp_mask, const void 
>> *caller)
>> 1312 {
>> 1313 struct vmap_area *va;
>> 1314 struct vm_struct *area;
>> 1315 
>> 1316 BUG_ON(in_interrupt());
>> 1317 if (flags & VM_IOREMAP)
>> 1318 align = 1ul << clamp(fls(size), PAGE_SHIFT, 
>> IOREMAP_MAX_ORDER);
>> 1319 
>>
>> the call trace is:
>> [  164.453327] Call Trace:
>> [  164.453327]  [] __vmalloc_node_range+0x6d/0x290
>> [  164.453327]  [] __vmalloc+0x3e/0x50
>> [  164.453327]  [] bpf_prog_alloc+0x30/0xa0
>> [  164.453327]  [] bpf_prog_create+0x46/0xb0
>> [  164.453327]  [] ppp_ioctl+0x420/0xe9a [ppp_generic]
>> [  164.453327]  [] do_vfs_ioctl+0x2e7/0x4c0
>> [  164.453327]  [] SyS_ioctl+0x81/0xa0
>> [  164.453327]  [] system_call_fastpath+0x16/0x1b
>> [  164.453327]  [<7f4502d87397>] 0x7f4502d87397
> 
> bpf_prog_create() is called inside spin_lock_bh(), and the BUG_ON()
> hits.  Below is a quick fix.
> 
> 
> Takashi
> 
> -- 8< --
> From: Takashi Iwai 
> Subject: [PATCH] net: ppp: Don't call bpf_prog_create() in ppp_lock
> 
> In ppp_ioctl(), bpf_prog_create() is called inside ppp_lock, which
> eventually calls vmalloc() and hits BUG_ON() in vmalloc.c.  This patch
> works around the problem by moving the allocation outside the lock.
> 
> Reported-by: Stefan Seyfried 
> Signed-off-by: Takashi Iwai 

FWIW :-)
Tested-by: Stefan Seyfried 
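
The patch body is cut off in this archive, so the following is only a
sketch of the general pattern the commit message describes, not the actual
kernel change: do the possibly sleeping allocation first, hold the lock
just for the pointer swap, and release the old object afterwards. All
names (Ppp, build_filter, set_pass_filter) are stand-ins invented for this
example, and a Python lock is of course only an analogy for ppp_lock.

```python
import threading


class Ppp:
    """Stand-in for the per-interface state protected by a lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.pass_filter = None


def build_filter(code):
    # Stand-in for the allocation that must not run under the lock
    # (hypothetical; the real code creates a BPF program).
    return tuple(code)


def set_pass_filter(ppp, code):
    new_filter = build_filter(code)   # allocate before taking the lock
    with ppp.lock:                    # critical section: pointer swap only
        old_filter, ppp.pass_filter = ppp.pass_filter, new_filter
    return old_filter                 # caller disposes of it outside the lock


ppp = Ppp()
first_old = set_pass_filter(ppp, [1, 2])
second_old = set_pass_filter(ppp, [3])
print(first_old, second_old, ppp.pass_filter)
```

The point of the ordering is that nothing that might sleep or allocate
happens while the lock is held.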

> ---
>  drivers/net/ppp/ppp_generic.c | 40 
>  1 file changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
> index 68c3a3f4e0ab..794a47329368 100644
> --- a/drivers/net/ppp/ppp_generic.c
> +++ b/drivers/net/ppp/ppp_generic.c
> @@ -755,23 +755,23 @@ static long ppp_ioctl(struct file *file, unsigned int 
> cmd, unsigned long arg)
>  
>   err = get_filter(argp, );
>   if (err >= 0) {
> + struct bpf_prog *pass_filter = NULL;
>   struct sock_fprog_kern fprog = {
>   .len = err,
>   .filter = code,
>   };
>  
> - ppp_lock(ppp);
> - if (ppp->pass_filter) {
> - bpf_prog_destroy(ppp->

Re: [REGRESSION] in 3.18-rc1: ppp crashes kernel

2014-11-07 Thread Stefan Seyfried
Am 07.11.2014 um 12:56 schrieb Stefan Seyfried:
> Hi Paul,
> 
> Am 07.11.2014 um 12:53 schrieb Paul Bolle:
>> Wild guess: is CONFIG_RCU_NOCB_CPU perhaps set in your
>> v3.18-rc3 .config?
> 
> Yes it is:
> tux@linux:~> zgrep CONFIG_RCU_NOCB_CPU /proc/config.gz
> CONFIG_RCU_NOCB_CPU=y
> # CONFIG_RCU_NOCB_CPU_NONE is not set
> # CONFIG_RCU_NOCB_CPU_ZERO is not set
> CONFIG_RCU_NOCB_CPU_ALL=y
> 
> And I'll try without it, but looking at the backtrace and the actual
> BUG_ON() in the code, I cannot really believe it is the real problems.
> 
> But I'll try with the config changed and with the above line removed.

JFTR, this did not help:
tux@linux:~/linux> zgrep CONFIG_RCU_NOCB_CPU /proc/config.gz
# CONFIG_RCU_NOCB_CPU is not set

neither did:

--- a/init/main.c
+++ b/init/main.c
@@ -583,7 +583,7 @@ asmlinkage __visible void __init start_kernel(void)
 	early_irq_init();
 	init_IRQ();
 	tick_init();
-	rcu_init_nohz();
+//	rcu_init_nohz();
 	init_timers();
 	hrtimers_init();
 	softirq_init();
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: [REGRESSION] in 3.18-rc1: ppp crashes kernel

2014-11-07 Thread Stefan Seyfried
Hi Paul,

Am 07.11.2014 um 12:53 schrieb Paul Bolle:
> On Fri, 2014-11-07 at 12:10 +0100, Stefan Seyfried wrote:
>> I bisected the issue multiple times and always arrived at
>>
>> # first bad commit: [d6dd50e07c5bec00db2005969b1a01f8ca3d25ef] Merge branch 
>> 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>
>> which is a merge commit unfortunately.
> 
> That merge commit actually does add some code:
> 
> git show d6dd50e07c5bec00db2005969b1a01f8ca3d25ef
> [...]
> diff --cc init/main.c
> index 8af2f1abfe38,e3c4cdd94d5b..c5c11da6c4e1
> --- a/init/main.c
> +++ b/init/main.c
> @@@ -583,6 -585,6 +583,7 @@@ asmlinkage __visible void __init start_
>   early_irq_init();
>   init_IRQ();
>   tick_init();
> ++rcu_init_nohz();
>   init_timers();
>   hrtimers_init();
>   softirq_init();
> 
> Wild guess: is CONFIG_RCU_NOCB_CPU perhaps set in your
> v3.18-rc3 .config?

Yes it is:
tux@linux:~> zgrep CONFIG_RCU_NOCB_CPU /proc/config.gz
CONFIG_RCU_NOCB_CPU=y
# CONFIG_RCU_NOCB_CPU_NONE is not set
# CONFIG_RCU_NOCB_CPU_ZERO is not set
CONFIG_RCU_NOCB_CPU_ALL=y

And I'll try without it, but looking at the backtrace and the actual
BUG_ON() in the code, I cannot really believe it is the real problem.

But I'll try with the config changed and with the above line removed.

Thanks,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


[REGRESSION] in 3.18-rc1: ppp crashes kernel

2014-11-07 Thread Stefan Seyfried
Hi all,

since 3.18-rc1, setting up a PPP interface kills my kernel with

[  163.433251] PPP generic driver version 2.4.2
[  164.452474] ------------[ cut here ]------------
[  164.453327] kernel BUG at ../mm/vmalloc.c:1316!
[  164.453327] invalid opcode: 0000 [#1] PREEMPT SMP 
[  164.453327] Modules linked in: ppp_async crc_ccitt ppp_generic slhc 
af_packet xfs libcrc32c coretemp kvm_intel 
snd_hda_codec_conexant iTCO_wdt snd_hda_codec_generic iTCO_vendor_support 
uvcvideo snd_hda_intel snd_hda_controller mac80211 videobuf2_vmalloc 
snd_hda_codec kvm e1000e videobuf2_memops cfg80211 videobuf2_core v4l2_common 
snd_hwdep i2c_i801 videodev snd_pcm pcspkr thinkpad_acpi serio_raw wmi lpc_ich 
snd_timer thermal snd rfkill mfd_core tpm_tis shpchp mei_me soundcore ptp mei 
pps_core acpi_cpufreq tpm battery processor ac dm_mod btrfs xor raid6_pq i915 
i2c_algo_bit drm_kms_helper drm video button sg
[  164.453327] CPU: 0 PID: 6927 Comm: pppd Not tainted 
3.18.0-rc3-3.ge706e91-desktop #1
[  164.453327] Hardware name: LENOVO 7470E36/7470E36, BIOS 6DET61WW (3.11 ) 
11/10/2009

This is easy to reproduce with:

linux:~ # cat bin/crashme.sh 

#!/bin/bash -x
pppd local pty "netcat -l 1234" &
sleep 1
pppd local pty "netcat localhost 1234" &
sleep 1


3.17 works fine.
I bisected the issue multiple times and always arrived at

# first bad commit: [d6dd50e07c5bec00db2005969b1a01f8ca3d25ef] Merge branch 
'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

which is a merge commit unfortunately.

The BUG encountered above is in:

1309 static struct vm_struct *__get_vm_area_node(unsigned long size,
1310 		unsigned long align, unsigned long flags, unsigned long start,
1311 		unsigned long end, int node, gfp_t gfp_mask, const void *caller)
1312 {
1313 	struct vmap_area *va;
1314 	struct vm_struct *area;
1315 
1316 	BUG_ON(in_interrupt());
1317 	if (flags & VM_IOREMAP)
1318 		align = 1ul << clamp(fls(size), PAGE_SHIFT, IOREMAP_MAX_ORDER);
1319 

the call trace is:
[  164.453327] Call Trace:
[  164.453327]  [<ffffffff811974bd>] __vmalloc_node_range+0x6d/0x290
[  164.453327]  [<ffffffff8119771e>] __vmalloc+0x3e/0x50
[  164.453327]  [<ffffffff81146950>] bpf_prog_alloc+0x30/0xa0
[  164.453327]  [<ffffffff8157b716>] bpf_prog_create+0x46/0xb0
[  164.453327]  [<ffffffffa07ecb90>] ppp_ioctl+0x420/0xe9a [ppp_generic]
[  164.453327]  [<ffffffff811df1c7>] do_vfs_ioctl+0x2e7/0x4c0
[  164.453327]  [<ffffffff811df421>] SyS_ioctl+0x81/0xa0
[  164.453327]  [<ffffffff8165ee2d>] system_call_fastpath+0x16/0x1b
[  164.453327]  [<7f4502d87397>] 0x7f4502d87397

I have a crashdump of the kernel, but given this is easily reproducible, I doubt
that I need to send this to anyone :-)
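
A side note on why BUG_ON(in_interrupt()) can fire on a plain ioctl path
like the one in the trace: in_interrupt() is not only true inside an
interrupt handler, it is also true while bottom halves are disabled, for
example between spin_lock_bh() and spin_unlock_bh(). The sketch below is a
userspace model of that check, not kernel code. The bit widths are an
assumption modelled on include/linux/preempt_mask.h of this kernel
generation, and every model_* name is made up for illustration.

```c
#include <assert.h>

/* Assumed layout of the preempt counter: 8 preempt bits, 8 softirq bits,
 * 4 hardirq bits, 1 NMI bit (illustrative, see the note above). */
#define PREEMPT_BITS	8
#define SOFTIRQ_BITS	8
#define HARDIRQ_BITS	4
#define NMI_BITS	1

#define SOFTIRQ_SHIFT	PREEMPT_BITS
#define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)
#define NMI_SHIFT	(HARDIRQ_SHIFT + HARDIRQ_BITS)

#define FIELD_MASK(bits, shift)	(((1u << (bits)) - 1u) << (shift))
#define SOFTIRQ_MASK	FIELD_MASK(SOFTIRQ_BITS, SOFTIRQ_SHIFT)
#define HARDIRQ_MASK	FIELD_MASK(HARDIRQ_BITS, HARDIRQ_SHIFT)
#define NMI_MASK	FIELD_MASK(NMI_BITS, NMI_SHIFT)

#define SOFTIRQ_OFFSET		(1u << SOFTIRQ_SHIFT)
/* Disabling bottom halves is assumed to add twice the softirq offset. */
#define SOFTIRQ_DISABLE_OFFSET	(2u * SOFTIRQ_OFFSET)

/* Hypothetical stand-in for the per-task preempt counter. */
unsigned int model_preempt_count;

/* Models the counter side effect of local_bh_disable() / spin_lock_bh(). */
void model_local_bh_disable(void)
{
	model_preempt_count += SOFTIRQ_DISABLE_OFFSET;
}

/* Models the counter side effect of local_bh_enable() / spin_unlock_bh(). */
void model_local_bh_enable(void)
{
	assert((model_preempt_count & SOFTIRQ_MASK) >= SOFTIRQ_DISABLE_OFFSET);
	model_preempt_count -= SOFTIRQ_DISABLE_OFFSET;
}

/* Nonzero if any hardirq, softirq or NMI bit is set in the counter. */
int model_in_interrupt(void)
{
	return (model_preempt_count &
		(HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) != 0;
}
```

In this model, model_in_interrupt() returns 0 in plain process context,
1 after model_local_bh_disable(), and 0 again after the matching
model_local_bh_enable(), even though no interrupt was ever serviced.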

Best regards,

Stefan

-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: [REGRESSION] in 3.18-rc1: ppp crashes kernel

2014-11-07 Thread Stefan Seyfried
Hi Takashi,

yes, this no longer crashes. No real-world test yet, but the obvious
crash is gone. Thanks!

Am 07.11.2014 um 14:22 schrieb Takashi Iwai:
> At Fri, 07 Nov 2014 12:10:46 +0100,
> Stefan Seyfried wrote:
>>
>> Hi all,
>>
>> since 3.18-rc1, setting up a PPP interface kills my kernel with
>>
>> [  163.433251] PPP generic driver version 2.4.2
>> [  164.452474] ------------[ cut here ]------------
>> [  164.453327] kernel BUG at ../mm/vmalloc.c:1316!
>> [  164.453327] invalid opcode: 0000 [#1] PREEMPT SMP 
>> [  164.453327] Modules linked in: ppp_async crc_ccitt ppp_generic slhc 
>> af_packet xfs libcrc32c coretemp kvm_intel 
>> snd_hda_codec_conexant iTCO_wdt snd_hda_codec_generic iTCO_vendor_support 
>> uvcvideo snd_hda_intel snd_hda_controller mac80211 videobuf2_vmalloc 
>> snd_hda_codec kvm e1000e videobuf2_memops cfg80211 videobuf2_core 
>> v4l2_common snd_hwdep i2c_i801 videodev snd_pcm pcspkr thinkpad_acpi 
>> serio_raw wmi lpc_ich snd_timer thermal snd rfkill mfd_core tpm_tis shpchp 
>> mei_me soundcore ptp mei pps_core acpi_cpufreq tpm battery processor ac 
>> dm_mod btrfs xor raid6_pq i915 i2c_algo_bit drm_kms_helper drm video button 
>> sg
>> [  164.453327] CPU: 0 PID: 6927 Comm: pppd Not tainted 
>> 3.18.0-rc3-3.ge706e91-desktop #1
>> [  164.453327] Hardware name: LENOVO 7470E36/7470E36, BIOS 6DET61WW (3.11 ) 
>> 11/10/2009
>>
>> This is easy to reproduce with:
>>
>> linux:~ # cat bin/crashme.sh 
>> 
>> #!/bin/bash -x
>> pppd local pty "netcat -l 1234" &
>> sleep 1
>> pppd local pty "netcat localhost 1234" &
>> sleep 1
>> 
>>
>> 3.17 works fine.
>> I bisected the issue multiple times and always arrived at
>>
>> # first bad commit: [d6dd50e07c5bec00db2005969b1a01f8ca3d25ef] Merge branch 
>> 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>
>> which is a merge commit unfortunately.
>>
>> The BUG encountered above is in:
>>
>> 1309 static struct vm_struct *__get_vm_area_node(unsigned long size,
>> 1310 		unsigned long align, unsigned long flags, unsigned long start,
>> 1311 		unsigned long end, int node, gfp_t gfp_mask, const void *caller)
>> 1312 {
>> 1313 	struct vmap_area *va;
>> 1314 	struct vm_struct *area;
>> 1315 
>> 1316 	BUG_ON(in_interrupt());
>> 1317 	if (flags & VM_IOREMAP)
>> 1318 		align = 1ul << clamp(fls(size), PAGE_SHIFT, IOREMAP_MAX_ORDER);
>> 1319 
>>
>> the call trace is:
>> [  164.453327] Call Trace:
>> [  164.453327]  [<ffffffff811974bd>] __vmalloc_node_range+0x6d/0x290
>> [  164.453327]  [<ffffffff8119771e>] __vmalloc+0x3e/0x50
>> [  164.453327]  [<ffffffff81146950>] bpf_prog_alloc+0x30/0xa0
>> [  164.453327]  [<ffffffff8157b716>] bpf_prog_create+0x46/0xb0
>> [  164.453327]  [<ffffffffa07ecb90>] ppp_ioctl+0x420/0xe9a [ppp_generic]
>> [  164.453327]  [<ffffffff811df1c7>] do_vfs_ioctl+0x2e7/0x4c0
>> [  164.453327]  [<ffffffff811df421>] SyS_ioctl+0x81/0xa0
>> [  164.453327]  [<ffffffff8165ee2d>] system_call_fastpath+0x16/0x1b
>> [  164.453327]  [<7f4502d87397>] 0x7f4502d87397
> 
> bpf_prog_create() is called inside spin_lock_bh(), and the BUG_ON()
> hits.  Below is a quick fix.
> 
> 
> Takashi
> 
> -- 8< --
> From: Takashi Iwai <ti...@suse.de>
> Subject: [PATCH] net: ppp: Don't call bpf_prog_create() in ppp_lock
> 
> In ppp_ioctl(), bpf_prog_create() is called inside ppp_lock, which
> eventually calls vmalloc() and hits BUG_ON() in vmalloc.c.  This patch
> works around the problem by moving the allocation outside the lock.
> 
> Reported-by: Stefan Seyfried <stefan.seyfr...@googlemail.com>
> Signed-off-by: Takashi Iwai <ti...@suse.de>

FWIW :-)
Tested-by: Stefan Seyfried <stefan.seyfr...@googlemail.com>
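
For readers who want the idea of the fix in isolation: build the new
object while no lock is held, then take the lock only to swap the pointer.
What follows is a minimal userspace sketch of that pattern, not the ppp
code itself. struct dev, struct filter, and all function names are
hypothetical, the "lock" is just a flag, and the assert() in
filter_create() stands in for the BUG_ON() in vmalloc.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the BPF program and the ppp unit. */
struct filter {
	int len;
};

struct dev {
	int locked;			/* models ppp_lock being held */
	struct filter *pass_filter;
};

void dev_lock(struct dev *d)
{
	assert(!d->locked);
	d->locked = 1;
}

void dev_unlock(struct dev *d)
{
	assert(d->locked);
	d->locked = 0;
}

/* Allocation that must not run with the lock held; the assert mirrors
 * the BUG_ON(in_interrupt()) that fired in the report. */
struct filter *filter_create(const struct dev *d, int len)
{
	struct filter *f;

	assert(!d->locked);
	f = malloc(sizeof(*f));
	if (f)
		f->len = len;
	return f;
}

/* Allocate first, then swap under the lock. len == 0 clears the filter. */
int dev_set_filter(struct dev *d, int len)
{
	struct filter *new_filter = NULL;

	if (len > 0) {
		new_filter = filter_create(d, len);
		if (!new_filter)
			return -1;
	}

	dev_lock(d);
	free(d->pass_filter);		/* free(NULL) is a no-op */
	d->pass_filter = new_filter;
	dev_unlock(d);
	return 0;
}
```

The point of the ordering is that the critical section shrinks to a
pointer swap, so nothing that may sleep or allocate runs while the lock
is held. If the allocation fails, the old filter stays in place.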

> ---
>  drivers/net/ppp/ppp_generic.c | 40 
>  1 file changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
> index 68c3a3f4e0ab..794a47329368 100644
> --- a/drivers/net/ppp/ppp_generic.c
> +++ b/drivers/net/ppp/ppp_generic.c
> @@ -755,23 +755,23 @@ static long ppp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>  
>  		err = get_filter(argp, &code);
>  		if (err >= 0) {
> +			struct bpf_prog *pass_filter = NULL;
>  			struct sock_fprog_kern fprog = {
>  				.len = err,
>  				.filter = code,
>  			};
>  
> -			ppp_lock(ppp);
> -			if (ppp->pass_filter) {
> -				bpf_prog_destroy(ppp->pass_filter);
> -				ppp->pass_filter = NULL;
> +			err = 0;
> +			if (fprog.filter)
> +				err = bpf_prog_create(&pass_filter, &fprog);
> +			if (!err) {
> +				ppp_lock(ppp);
> +				if (ppp->pass_filter)
> +					bpf_prog_destroy(ppp->pass_filter);
> +				ppp->pass_filter = pass_filter;
> +				ppp_unlock(ppp

[PATCH] Makefile: fix syntax error in warning message

2014-02-19 Thread stefan . seyfried
From: Stefan Seyfried <seife+ker...@b1-systems.com>

---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 893d6f0..eeaf3e7 100644
--- a/Makefile
+++ b/Makefile
@@ -606,7 +606,7 @@ ifdef CONFIG_CC_STACKPROTECTOR_REGULAR
   stackp-flag := -fstack-protector
   ifeq ($(call cc-option, $(stackp-flag)),)
 $(warning Cannot use CONFIG_CC_STACKPROTECTOR: \
- -fstack-protector not supported by compiler))
+ -fstack-protector not supported by compiler)
   endif
 else ifdef CONFIG_CC_STACKPROTECTOR_STRONG
   stackp-flag := -fstack-protector-strong
-- 
1.8.5.2


Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


8250_pci: improve code comments and Kconfig help

2013-07-01 Thread stefan . seyfried
Hi Greg,

in order to avoid such regressions in the future, a comment in
the source and a note in the Kconfig help text might be useful

This patch is against
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git tty-next

Best regards,

Stefan

-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


[PATCH] 8250_pci: improve code comments and Kconfig help

2013-07-01 Thread stefan . seyfried
From: Stefan Seyfried <seife+ker...@b1-systems.com>

The recent regression about NetMos 9835 Multi-I/O boards indicates
that a comment pointing to the parport_serial driver could be helpful.

Signed-off-by: Stefan Seyfried <seife+ker...@b1-systems.com>
---
 drivers/tty/serial/8250/8250_pci.c | 6 ++
 drivers/tty/serial/8250/Kconfig| 2 ++
 2 files changed, 8 insertions(+)

diff --git a/drivers/tty/serial/8250/8250_pci.c b/drivers/tty/serial/8250/8250_pci.c
index c52948b..c626c4f 100644
--- a/drivers/tty/serial/8250/8250_pci.c
+++ b/drivers/tty/serial/8250/8250_pci.c
@@ -4797,6 +4797,12 @@ static struct pci_device_id serial_pci_tbl[] = {
 		PCI_VENDOR_ID_IBM, 0x0299,
 		0, 0, pbn_b0_bt_2_115200 },
 
+	/*
+	 * other NetMos 9835 devices are most likely handled by the
+	 * parport_serial driver, check drivers/parport/parport_serial.c
+	 * before adding them here.
+	 */
+
 	{	PCI_VENDOR_ID_NETMOS, PCI_DEVICE_ID_NETMOS_9901,
 		0xA000, 0x1000,
 		0, 0, pbn_b0_1_115200 },
diff --git a/drivers/tty/serial/8250/Kconfig b/drivers/tty/serial/8250/Kconfig
index a1ba94d..f3b306e 100644
--- a/drivers/tty/serial/8250/Kconfig
+++ b/drivers/tty/serial/8250/Kconfig
@@ -116,6 +116,8 @@ config SERIAL_8250_PCI
 	  This builds standard PCI serial support. You may be able to
 	  disable this feature if you only need legacy serial support.
 	  Saves about 9K.
+	  Note that serial ports on NetMos 9835 Multi-I/O cards are handled
+	  by the parport_serial driver, enabled with CONFIG_PARPORT_SERIAL.
 
 config SERIAL_8250_HP300
 	tristate
-- 
1.8.3.1



commit 8d2f8cd424 breaks parallel port, regression since 3.9-rc3 / backported to stable (3.4.37)

2013-06-30 Thread Stefan Seyfried
Hi all,

the following commit:

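commit 8d2f8cd424ca0b99001f3ff4f5db87c4e525f366
Author: Wang YanQing <udkni...@gmail.com>
Date:   Fri Mar 1 11:47:20 2013 +0800

    serial: 8250_pci: add support for another kind of NetMos Technology PCI 9835 Multi-I/O Controller

    01:08.0 Communication controller: NetMos Technology PCI 9835 Multi-I/O Controller (rev 01)
            Subsystem: Device [1000:0012]
            Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
            Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
            Interrupt: pin A routed to IRQ 20
            Region 0: I/O ports at e050 [size=8]
            Region 1: I/O ports at e040 [size=8]
            Region 2: I/O ports at e030 [size=8]
            Region 3: I/O ports at e020 [size=8]
            Region 4: I/O ports at e010 [size=8]
            Region 5: I/O ports at e000 [size=16]

    Signed-off-by: Wang YanQing <udkni...@gmail.com>
    Cc: stable <sta...@vger.kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>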


breaks my
05:05.0 Communication controller: NetMos Technology PCI 9835 Multi-I/O 
Controller (rev 01)
05:05.0 0780: 9710:9835 (rev 01)
Subsystem: 1000:0012

which has two serial and one parallel port, driven by parport_serial.

The reason is that this commit adds the PCI ID to 8250_pci, although the
device was handled by parport_serial before.
In my case (openSUSE kernel), 8250 is built in and parport_serial is
built as a module. Unfortunately with the device occupied by 8250,
parport_serial finds no device and thus does not drive the parport.

I bisected this in the stable series after the openSUSE kernel update
(which pulled in the stable kernel update) broke my printing.

Actually the above commit is totally unnecessary: the serial ports
work very well without it, they are just driven by another driver.

Can this please be reverted? I can't see which problem it solves, but
it definitely breaks the additional ports on my multi-i/o board.
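
To illustrate the conflict: when two drivers list an ID that matches the
same device, whichever driver gets to probe first keeps it. Below is a
small userspace model of that "first match wins" behaviour, assuming the
built-in driver is tried before the module. It is only a sketch: the
structs, the ANY_ID wildcard, and the table contents are simplified
stand-ins, not the real tables from 8250_pci.c or parport_serial.c.

```c
#include <assert.h>
#include <stddef.h>

#define ANY_ID 0xffffffffu	/* wildcard, illustrative stand-in */

struct pci_id {
	unsigned int vendor, device, subvendor, subdevice;
};

struct driver {
	const char *name;
	const struct pci_id *ids;
	size_t n_ids;
};

static int field_matches(unsigned int want, unsigned int have)
{
	return want == ANY_ID || want == have;
}

int id_matches(const struct pci_id *want, const struct pci_id *dev)
{
	return field_matches(want->vendor, dev->vendor) &&
	       field_matches(want->device, dev->device) &&
	       field_matches(want->subvendor, dev->subvendor) &&
	       field_matches(want->subdevice, dev->subdevice);
}

/* Index of the first driver (in probe order) that matches, or -1. */
int first_claimant(const struct driver *drivers, size_t n,
		   const struct pci_id *dev)
{
	assert(drivers && dev);
	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < drivers[i].n_ids; j++)
			if (id_matches(&drivers[i].ids[j], dev))
				return (int)i;
	return -1;
}

/* The board from the report: 9710:9835, subsystem 1000:0012. */
const struct pci_id board = { 0x9710, 0x9835, 0x1000, 0x0012 };

/* Illustrative tables: serial driver before and after the commit. */
const struct pci_id serial_ids_before[] = {
	{ 0x9710, 0x9901, 0xA000, 0x1000 },
};
const struct pci_id serial_ids_after[] = {
	{ 0x9710, 0x9901, 0xA000, 0x1000 },
	{ 0x9710, 0x9835, 0x1000, 0x0012 },
};
const struct pci_id parport_serial_ids[] = {
	{ 0x9710, 0x9835, ANY_ID, ANY_ID },
};

/* Probe order: built-in serial driver first, module second. */
const struct driver drivers_before[] = {
	{ "8250_pci", serial_ids_before, 1 },
	{ "parport_serial", parport_serial_ids, 1 },
};
const struct driver drivers_after[] = {
	{ "8250_pci", serial_ids_after, 2 },
	{ "parport_serial", parport_serial_ids, 1 },
};
```

With drivers_before, first_claimant() picks index 1 (parport_serial) for
the board. With drivers_after it picks index 0 (8250_pci), so in this
model parport_serial never sees the device and the parallel port stays
undriven.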

Best regards,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: INTEL_MEI_ME=y breaks suspend on 3.10-rc3

2013-06-29 Thread Stefan Seyfried
Hi all,

I hate to say it, but this regression from 3.9 is still present in
3.10-rc7 :-(

Am 19.06.2013 11:02, schrieb Stefan Seyfried:
> The suspend/resume failure is easily reproduced by
> 
> * booting with "init=/bin/bash no_console_suspend"
> * mount /sys
> * echo mem > /sys/power/state
> * resume => lots of messages, finally kernel panic.

Best regards,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: INTEL_MEI_ME=y breaks suspend on 3.10-rc3

2013-06-19 Thread Stefan Seyfried
Hi Tomas,

Am 19.06.2013 10:52, schrieb Winkler, Tomas:

>> So it is not yet fixed, unfortunately.
> 
> Not sure I understand how to reproduce it.  it is still falling on 
> suspend/resume or just unbind/bind?
> Would you be so kind and send me the whole log.

Both are still broken. I'm actually not sure whether the unbind / bind
issue is related to the suspend / resume failure. The messages
just looked similar to me, but that might not mean anything.

Sending the whole log is not easy, since it overflows the dmesg buffer
(I have CONFIG_LOG_BUF_SHIFT=18 which is "big enough" usually) and the
journald just exits and restarts itself under such flooding, but I'll try.

Since the resume from suspend to RAM hangs, it is hard to get any logs:
the resume does not seem to restart userspace before killing the machine,
so nothing gets into the logs. I never got the mei serial working before,
and a "real" serial port is not present on this Thinkpad.

The suspend/resume failure is easily reproduced by

* booting with "init=/bin/bash no_console_suspend"
* mount /sys
* echo mem > /sys/power/state
* resume => lots of messages, finally kernel panic.

For the bind/unbind: the driver is built in (this is the openSUSE
kernel-of-the-day), but unbinding / rebinding also reproducibly floods
the logs. It does not seem to have additional side effects, but I cannot
test if mei actually still works afterwards.

I could try to take a picture of the panic, but it looked not really
directly related, more like a stack overflow after too many errors or
something like that (it also takes a few seconds after resume for the
machine to panic).

Best regards,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: INTEL_MEI_ME=y breaks suspend on 3.10-rc3

2013-06-18 Thread Stefan Seyfried
Hi Tomas,

Executive summary: it is not fixed in 3.10-rc6.

Am 03.06.2013 20:09, schrieb Tomas Winkler:
>>> Or, to be more precise: it breaks resume.
>>>
>>> The machine seems to lock up hard after resume, then after a few seconds
>>> it panics (caps lock blinking).
>>>
>>> Reproduced on ThinkPad X200s
>>>
>>> 00:03.0 0780: 8086:2a44 (rev 07)
>>> Intel Corporation Mobile 4 Series Chipset MEI Controller
>>>
>>> Debugged with "init=/bin/bash no_console_suspend", I see lots of errors
>>> from the mei_me driver, then finally the panic (some overflow maybe?).
>>>
>>> Unbinding the device before suspend fixes resume.
>>
>> I just noticed that I get the following message on unbinding:
>>
>> $ echo 0000:00:03.0 > /sys/bus/pci/drivers/mei_me/unbind
>> $ dmesg|tail -2
>> [ 1216.830034] mei_me 0000:00:03.0: stop
>> [ 1216.837018] mei_me 0000:00:03.0: wait hw ready failed. status = 0x0
>>
>> not sure if this is related.
>>
> Thanks for the report I'm looking into it.

I looked at the git log of drivers/misc/mei and it looked promising.

However, it still does not work; commit
42f132febff3b7b42c6c9dbfc151f29233be3132 does not seem to help enough on
my hardware.

Still just unbinding and rebinding with
echo 0000:00:03.0 > /sys/bus/pci/drivers/mei_me/unbind
echo 0000:00:03.0 > /sys/bus/pci/drivers/mei_me/bind

triggers lots of
[  318.330981] mei_me 0000:00:03.0: reset: wrong host start response
[  318.330984] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
[  318.330990] mei_me 0000:00:03.0: reset: unexpected enumeration response hbm.
[  318.330993] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
[  318.331016] mei_me 0000:00:03.0: reset: wrong host start response
[  318.331019] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
[  346.571031] mei_me 0000:00:03.0: reset: init clients timeout hbm_state = 1.
[  346.571047] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
[  376.631030] mei_me 0000:00:03.0: reset: init clients timeout hbm_state = 1.
[  376.631044] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING

It does, however, calm down after a few seconds, only to spew a few lines
once every 30 seconds:

[  406.691032] mei_me 0000:00:03.0: reset: init clients timeout hbm_state = 1.
[  406.691048] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
[  436.751033] mei_me 0000:00:03.0: reset: init clients timeout hbm_state = 1.
[  436.751047] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
[  466.811030] mei_me 0000:00:03.0: reset: init clients timeout hbm_state = 1.
[  466.811044] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING

So it is not yet fixed, unfortunately.

Best regards,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: [PATCH 2/8] vtime: Use consistent clocks among nohz accounting

2013-06-03 Thread Stefan Seyfried
Am 03.06.2013 21:48, schrieb Frederic Weisbecker:
> On Mon, Jun 03, 2013 at 11:47:17AM +0200, Stefan Seyfried wrote:
>> FWIW:
>> Tested-by: Stefan Seyfried <seife+...@b1-systems.com>
>>
>> This patch fixes the 0% CPU issue on openSUSE Factory kernels for me.
> 
> Thanks! The patch has been committed already so I can't add your Tested-by:
> but feedback on testing is always appreciated.

But it has not ended up in Linus' tree yet. That would be more important
to me than the credits in the commit message :-)

Thanks,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: INTEL_MEI_ME=y breaks suspend on 3.10-rc3

2013-06-03 Thread Stefan Seyfried
Am 03.06.2013 19:38, schrieb Stefan Seyfried:
> Or, to be more precise: it breaks resume.
> 
> The machine seems to lock up hard after resume, then after a few seconds
> it panics (caps lock blinking).
> 
> Reproduced on ThinkPad X200s
> 
> 00:03.0 0780: 8086:2a44 (rev 07)
> Intel Corporation Mobile 4 Series Chipset MEI Controller
> 
> Debugged with "init=/bin/bash no_console_suspend", I see lots of errors
> from the mei_me driver, then finally the panic (some overflow maybe?).
> 
> Unbinding the device before suspend fixes resume.

I just noticed that I get the following message on unbinding:

$ echo 0000:00:03.0 > /sys/bus/pci/drivers/mei_me/unbind
$ dmesg|tail -2
[ 1216.830034] mei_me 0000:00:03.0: stop
[ 1216.837018] mei_me 0000:00:03.0: wait hw ready failed. status = 0x0

not sure if this is related.

Best regards,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


INTEL_MEI_ME=y breaks suspend on 3.10-rc3

2013-06-03 Thread Stefan Seyfried
Or, to be more precise: it breaks resume.

The machine seems to lock up hard after resume, then after a few seconds
it panics (caps lock blinking).

Reproduced on ThinkPad X200s

00:03.0 0780: 8086:2a44 (rev 07)
Intel Corporation Mobile 4 Series Chipset MEI Controller

Debugged with "init=/bin/bash no_console_suspend", I see lots of errors
from the mei_me driver, then finally the panic (some overflow maybe?).

Unbinding the device before suspend fixes resume.
This machine has suspended and resumed fine with 3.9.
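
For anyone who wants to try the workaround, here is a minimal sketch of
"unbind, suspend, rebind". The function name, the default device address
0000:00:03.0 (PCI domain 0000 assumed for the 00:03.0 device listed above)
and the optional sysfs-root argument are assumptions made for illustration;
only the sysfs paths themselves are the standard ones.

```shell
# suspend_without_mei [DEVICE] [SYSFS_ROOT]
# Hypothetical helper: unbind the MEI device from mei_me, suspend to RAM,
# then bind the device again after resume. Needs root on a real system.
suspend_without_mei() {
    dev="${1:-0000:00:03.0}"      # assumed PCI address of the MEI controller
    root="${2:-/sys}"             # overridable so the logic can be dry-run
    drv="$root/bus/pci/drivers/mei_me"

    # Detach the driver first; give up if that already fails.
    echo "$dev" > "$drv/unbind" || return 1

    # Writing "mem" to /sys/power/state requests suspend-to-RAM.
    echo mem > "$root/power/state"
    rc=$?

    # Re-attach the driver after resume, whether or not suspend succeeded.
    echo "$dev" > "$drv/bind"
    return "$rc"
}
```

The rebind step is optional; leaving the device unbound after resume is
the more conservative variant.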

This machine has no serial port, so it is hard for me to capture output.
I could try to take a picture of the panic message if that would be helpful.

Best regards,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: [PATCH 2/8] vtime: Use consistent clocks among nohz accounting

2013-06-03 Thread Stefan Seyfried
Am 20.05.2013 18:01, schrieb Frederic Weisbecker:
> While computing the cputime delta of dynticks CPUs,
> we are mixing up clocks of different natures:

[...]

> As a consequence, some strange behaviour with unstable tsc
> has been observed such as non progressing constant zero cputime.
> (The 'top' command showing no load).

This happens, for example, on my trusty ThinkPad X200s (family 6 model 23
stepping 10 Core 2 Duo), seriously confusing its user (me :-).

> Fix this by only using local_clock(), or its irq safe/remote
> equivalent, in vtime code.
> 
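> Reported-by: Mike Galbraith <efa...@gmx.de>
> Suggested-by: Mike Galbraith <efa...@gmx.de>
> Cc: Steven Rostedt <rost...@goodmis.org>
> Cc: Paul E. McKenney <paul...@linux.vnet.ibm.com>
> Cc: Ingo Molnar <mi...@kernel.org>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Borislav Petkov <b...@alien8.de>
> Cc: Li Zhong <zh...@linux.vnet.ibm.com>
> Cc: Mike Galbraith <efa...@gmx.de>
> Signed-off-by: Frederic Weisbecker <fweis...@gmail.com>

FWIW:
Tested-by: Stefan Seyfried <seife+...@b1-systems.com>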

This patch fixes the 0% CPU issue on openSUSE Factory kernels for me.
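
A quick way to check for the symptom without 'top' is to compare two
samples of the aggregate "cpu" line from /proc/stat. This is only a
sketch: the function name is made up, and it assumes the usual column
order (user nice system idle iowait irq softirq steal).

```shell
# cpu_busy_pct LINE1 LINE2
# Hypothetical helper: print the busy percentage between two aggregate
# "cpu ..." lines taken from /proc/stat, or "n/a" if nothing progressed.
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum user..steal (fields 2-9); idle time is idle + iowait.
            total = 0
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            idle = $5 + $6
        }
        NR == 1 { t0 = total; i0 = idle; next }
        {
            dt = total - t0; di = idle - i0
            if (dt > 0) printf "%d\n", 100 * (dt - di) / dt
            else print "n/a"
        }'
}

# Example use on a live system (under load, a value stuck at 0 would
# match the symptom described above):
#   a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
#   cpu_busy_pct "$a" "$b"
```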

Best regards,

Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


Re: [Bluez-devel] [BUG] rfcomm

2008-02-20 Thread Stefan Seyfried
Dave Young schrieb:

>> Feb 16 23:41:33 alon1 BUG: unable to handle kernel NULL pointer dereference 
>> at virtual address 00000008
>> Feb 16 23:41:33 alon1 printing eip: c01b2db6 *pde =  
>> Feb 16 23:41:33 alon1 Oops:  [#1] PREEMPT 
>> Feb 16 23:41:33 alon1 Modules linked in: ppp_deflate zlib_deflate 
>> zlib_inflate bsd_comp ppp_async rfcomm l2cap hci_usb vmnet(P) vmmon(P) tun 
>> radeon drm autofs4 ipv6 aes_generic crypto_algapi ieee80211_crypt_ccmp 
>> nf_nat_irc nf_nat_ftp nf_conntrack_irc nf_conntrack_ftp ipt_MASQUERADE 
>> iptable_nat nf_nat ipt_REJECT xt_tcpudp ipt_LOG xt_limit xt_state 
>> nf_conntrack_ipv4 nf_conntrack iptable_filter ip_tables x_tables snd_pcm_oss 
>> snd_mixer_oss snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq 
>> snd_seq_device bluetooth ppp_generic slhc ioatdma dca cfq_iosched 
>> cpufreq_powersave cpufreq_ondemand cpufreq_conservative acpi_cpufreq 
>> freq_table uinput fan af_packet nls_cp1255 nls_iso8859_1 nls_utf8 nls_base 
>> pcmcia snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm nsc_ircc snd_timer 
>> ipw2200 thinkpad_acpi irda snd ehci_hcd yenta_socket uhci_hcd psmouse 
>> ieee80211 soundcore intel_agp hwmon rsrc_nonstatic pcspkr e1000 crc_ccitt 
>> snd_page_alloc i2c_i801 ieee80211_crypt pcmcia_core agpgart thermal battery
>> nvram rtc sr_mod ac sg firmware_class button processor cdrom unix
>> usbcore evdev ext3 jbd ext2 mbcache loop ata_piix libata sd_mod scsi_mod
>> Feb 16 23:41:33 alon1 
>> Feb 16 23:41:33 alon1 Pid: 4, comm: events/0 Tainted: P
>> (2.6.24-gentoo-r2 #1)
>> Feb 16 23:41:33 alon1 EIP: 0060:[<c01b2db6>] EFLAGS: 00010282 CPU: 0
>> Feb 16 23:41:33 alon1 EIP is at sysfs_get_dentry+0x26/0x80
>> Feb 16 23:41:33 alon1 EAX:  EBX:  ECX:  EDX: f48a2210
>> Feb 16 23:41:33 alon1 ESI: f72eb900 EDI: f4803ae0 EBP: f4803ae0 ESP: f7c49efc
>> Feb 16 23:41:33 alon1 hcid[7004]: HCI dev 0 registered
>> Feb 16 23:41:33 alon1 DS: 007b ES: 007b FS:  GS:  SS: 0068
>> Feb 16 23:41:33 alon1 Process events/0 (pid: 4, ti=f7c48000 task=f7c3efc0 
>> task.ti=f7c48000)
>> Feb 16 23:41:33 alon1 Stack: f7cb6140 f4822668 f7e71e10 c01b304d  
>>  fffe c030ba9c 
>> Feb 16 23:41:33 alon1 f7cb6140 f4822668 f6da6720 f7cb6140 f4822668 f6da6720 
>> c030ba8e c01ce20b 
>> Feb 16 23:41:33 alon1 f6e9dd00 c030ba8e f6da6720 f6e9dd00 f6e9dd00  
>> f4822600  
>> Feb 16 23:41:33 alon1 Call Trace:
>> Feb 16 23:41:33 alon1 [<c01b304d>] sysfs_move_dir+0x3d/0x1f0
>> Feb 16 23:41:33 alon1 [<c01ce20b>] kobject_move+0x9b/0x120
>> Feb 16 23:41:33 alon1 [<c0241711>] device_move+0x51/0x110
>> Feb 16 23:41:33 alon1 [<f9aaed80>] del_conn+0x0/0x70 [bluetooth]
>> Feb 16 23:41:33 alon1 [<f9aaed99>] del_conn+0x19/0x70 [bluetooth]
>> Feb 16 23:41:33 alon1 [<c012c1a1>] run_workqueue+0x81/0x140
>> Feb 16 23:41:33 alon1 [<c02c0c88>] schedule+0x168/0x2e0

> Could you try patch below?

Works fine for me. Thanks. Together with the other two patches already taken
by davem, this fixes all my current BT problems :-)

> Defer hci_unregister_sysfs because the hci device could be destructed
> while hci conn devices are still there.
> 
> Signed-off-by: Dave Young <[EMAIL PROTECTED]>
> 
> ---
> net/bluetooth/hci_core.c |4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff -upr linux/net/bluetooth/hci_core.c linux.new/net/bluetooth/hci_core.c
> --- linux/net/bluetooth/hci_core.c	2008-02-20 18:27:28.000000000 +0800
> +++ linux.new/net/bluetooth/hci_core.c	2008-02-20 18:28:34.000000000 +0800
> @@ -901,8 +901,6 @@ int hci_unregister_dev(struct hci_dev *h
>  
>   BT_DBG("%p name %s type %d", hdev, hdev->name, hdev->type);
>  
> - hci_unregister_sysfs(hdev);
> -
>   write_lock_bh(&hci_dev_list_lock);
>   list_del(&hdev->list);
>   write_unlock_bh(&hci_dev_list_lock);
> @@ -914,6 +912,8 @@ int hci_unregister_dev(struct hci_dev *h
>  
>   hci_notify(hdev, HCI_DEV_UNREG);
>  
> + hci_unregister_sysfs(hdev);
> +
>   __hci_dev_put(hdev);
>  
>   return 0;
-- 
Stefan Seyfried
R&D Team Mobile Devices            |              "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out."

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: [rft] Kill junk from s2ram resume paths

2007-08-01 Thread Stefan Seyfried
On Tue, Jul 31, 2007 at 04:43:34PM +0200, Stefan Seyfried wrote:
> On Tue, Jul 31, 2007 at 04:01:40PM +0200, Pavel Machek wrote:
> > Hi!
> > 
> > > > > >  # Running in *copy* of this code, somewhere in low 1MB.
> > > > > >  
> > > > > > -   movb $0xa1, %al  ;  outb %al, $0x80
> > > > > 
> > > > > Well, what was this for?
> > > > 
> > > > Debugging LEDs on port 80. I still have that card somewhere
> > > > :-). Interested parties can reinsert it.
> > > 
> > > Ah, I see.
> > > 
> > > Hmm, can you please write about that in the changelog more explicitly?
> > > Or just comment it out with an "uncomment this to get ..." text?
> > 
> > I still need someone with x86-64 to test it for me before I submit it
> > properly ;-). Updated patch follows.
> 
> Compiling right now.

Worked well on my x86_64 test machine (a 64-bit ThinkPad); it worked both
before and after the patch with 2.6.23-rc1.
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |              "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out."

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: Hibernation considerations

2007-08-01 Thread Stefan Seyfried
Hi,

Sorry for joining late, just a small annotation:

On Tue, Jul 17, 2007 at 01:18:13PM -0700, [EMAIL PROTECTED] wrote:
 
> non-ACPI hibernate
> 
>   since the box powers off
> it uses zero power while suspended
> another OS could be run before a resume
> hardware can be swapped, suspend image could be sent around the world to 
> be restored on another system.
> restore makes no assumptions about the state of the hardware when it is 
> restored
> restore is slower (full BIOS boot is required)
>   should be able to work on just about any hardware (the limit is the ability 
> to initialize the devices)
> 
> 
> ACPI suspends
> 
>   since the box never completely powers off:

wrong

> a complete power failure breaks the suspend

wrong

> the OS must remain in control so other uses must be prevented.
> hardware must remain in the ACPI state from suspend until restore.
> restore can be faster (some initialization may be able to be skipped)
>   requires ACPI hardware support
> 
> under the category of ACPI suspends you have

ACPI S4 turns off the machine completely and you can remove the battery (this
is even required somewhere in the spec). Any state saving is done in CMOS RAM
or flash.

But, for example, many notebooks resume much faster if they go through the
ACPI S4 hooks during suspend (less than one second from "lid open" to "grub",
while they need ~10 seconds through the BIOS on a "normal" boot).
My Toughbook resumes on "Lid Opened" after S4; it doesn't after a shutdown.

So there will be differences.
I'm not saying that they are too important, but 20% faster resume still is
a good saving for me.
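
To compare the two paths on a given machine, the hibernation mode can be
selected through /sys/power/disk before triggering hibernation. A minimal
sketch, assuming the kernel offers the "platform" (ACPI S4 hooks) and
"shutdown" (plain power-off) modes there; the function name and the
optional sysfs-root argument are made up for illustration:

```shell
# hibernate_with_mode MODE [SYSFS_ROOT]
# Hypothetical helper: choose how the machine powers down after the image
# is written, then start hibernation. Needs root on a real system.
hibernate_with_mode() {
    mode="$1"
    root="${2:-/sys}"             # overridable so the logic can be dry-run

    case "$mode" in
        platform|shutdown) ;;     # S4 hooks vs. plain power-off
        *) echo "unsupported mode: $mode" >&2; return 1 ;;
    esac

    # Select the power-down method, then request suspend-to-disk.
    echo "$mode" > "$root/power/disk" &&
    echo disk > "$root/power/state"
}
```

Timing "lid open" to "grub" once after each mode should show whether the
S4 path makes a difference on that hardware.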

No need to restart this thread btw ;-)

Have fun,

Stefan
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |              "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out."

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: Hibernation considerations

2007-08-01 Thread Stefan Seyfried
Hi,

Sorry for joining late, just a small annotation:

On Tue, Jul 17, 2007 at 01:18:13PM -0700, [EMAIL PROTECTED] wrote:
 
 non-ACPI hibernate
 
   since the box powers off
 it uses zero power while suspended
 another OS could be run before a resume
 hardware can be swapped, suspend image could be sent around the world to 
 be restored on another system.
 restore makes no assumptions about the state of the hardware when it is 
 restored
 restore is slower (full BIOS boot is required)
   should be able to work on just about any hardware (the limit is the ability 
 to initialize the devices)
 
 
 ACPI suspends
 
   since the box never completely powers off:A

wrong

 a complete power failure breaks the suspend

wrong

 the OS must remain in control so other uses must be prevented.
 hardware must remain in the ACPI state from suspend until restore.
 restore can be faster (some initialization may be able to be skipped)
   requires ACPI hardware support
 
 under the catagory of ACPI suspends you have

ACPI S4 turns off the machine completely and you can remove the battery (this
is even required somewhere in the spec). Any state saving is done in CMOS RAM
or flash.

But for example many Notebooks resume much faster if they go through the
ACPI S4 hooks during suspend (less than one second from lid open to grub
while they need ~10 seconds through the BIOS on a normal boot.
My Toughbook resumes on Lid Opened after S4, it doesn't after a shutdown.

So there will be differences.
I'm not saying that they are too important, but 20% faster resume still is
a good saving for me.

No need to restart this thread btw ;-)

Have fun,

Stefan
-- 
Stefan Seyfried
QA / RD Team Mobile Devices|  Any ideas, John?
SUSE LINUX Products GmbH, Nürnberg  | Well, surrounding them's out. 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [rft] Kill junk from s2ram resume paths

2007-08-01 Thread Stefan Seyfried
On Tue, Jul 31, 2007 at 04:43:34PM +0200, Stefan Seyfried wrote:
> On Tue, Jul 31, 2007 at 04:01:40PM +0200, Pavel Machek wrote:
> > Hi!
> > 
> > > > > >  # Running in *copy* of this code, somewhere in low 1MB.
> > > > > >  
> > > > > > -       movb    $0xa1, %al      ;  outb %al, $0x80
> > > > > 
> > > > > Well, what was this for?
> > > > 
> > > > Debugging leds on port 80. I still have that card somewhere
> > > > :-). Interested parties can reinsert it.
> > > 
> > > Ah, I see.
> > > 
> > > Hmm, can you please write about that in the changelog more explicitly?
> > > Or just comment it out with a "uncomment this to get ..." text?
> > 
> > I still need someone with x86-64 to test it for me before I submit it
> > properly ;-). Updated patch follows.
> 
> Compiling right now.

Worked well on my x86_64 testmachine (a 64bit Thinkpad), worked before and
after the patch with 2.6.23-rc1.
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |  "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: [rft] Kill junk from s2ram resume paths

2007-07-31 Thread Stefan Seyfried
On Tue, Jul 31, 2007 at 04:01:40PM +0200, Pavel Machek wrote:
> Hi!
> 
> > > > >  # Running in *copy* of this code, somewhere in low 1MB.
> > > > >  
> > > > > -       movb    $0xa1, %al      ;  outb %al, $0x80
> > > > 
> > > > Well, what was this for?
> > > 
> > > Debugging leds on port 80. I still have that card somewhere
> > > :-). Interested parties can reinsert it.
> > 
> > Ah, I see.
> > 
> > Hmm, can you please write about that in the changelog more explicitly?
> > Or just comment it out with a "uncomment this to get ..." text?
> 
> I still need someone with x86-64 to test it for me before I submit it
> properly ;-). Updated patch follows.

Compiling right now.

>   Pavel
> 
> diff --git a/arch/i386/kernel/acpi/wakeup.S b/arch/i386/kernel/acpi/wakeup.S
> index 1415da1..9cebef7 100644
> --- a/arch/i386/kernel/acpi/wakeup.S
> +++ b/arch/i386/kernel/acpi/wakeup.S
> @@ -28,21 +28,6 @@ #define BEEP \
>       movb    $15, %al;       \
>       outb    %al, $66;
>  
> -#define BEEP \
> -     inb     $97, %al;       \
> -     outb    %al, $0x80;     \
> -     movb    $3, %al;        \
> -     outb    %al, $97;       \
> -     outb    %al, $0x80;     \
> -     movb    $-74, %al;      \
> -     outb    %al, $67;       \
> -     outb    %al, $0x80;     \
> -     movb    $-119, %al;     \
> -     outb    %al, $66;       \
> -     outb    %al, $0x80;     \
> -     movb    $15, %al;       \
> -     outb    %al, $66;
> -
>  ALIGN
>       .align  4096
>  ENTRY(wakeup_start)

This hunk rejected for me (against 2.6.23-rc1), but i'm testing x86_64, so
it did not matter ;-)
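
For anyone decoding the removed BEEP macro: ports 97, 67 and 66 are decimal
for 0x61, 0x43 and 0x42, i.e. the speaker gate bits and the PIT command and
channel 2 data registers, and the negative immediates are just signed
spellings of the bytes actually written. A minimal sketch of the arithmetic
in plain C, assuming the usual 1193182 Hz PIT input clock; the helper names
are made up for illustration and are not part of wakeup.S:

```c
#include <stdint.h>

/* Input clock of the PC's 8253/8254 timer in Hz (assumed standard value). */
#define PIT_INPUT_HZ 1193182u

/* The assembler immediates are signed; the port only sees the low 8 bits. */
static uint8_t as_port_byte(int imm)
{
    return (uint8_t)imm;
}

/* Combine the low and high byte written to port 66 (0x42) into the divisor. */
static unsigned pit_divisor(int lo_imm, int hi_imm)
{
    return ((unsigned)as_port_byte(hi_imm) << 8) | as_port_byte(lo_imm);
}

/* Tone frequency in Hz (integer division) for a given divisor. */
static unsigned beep_hz(unsigned divisor)
{
    return divisor ? PIT_INPUT_HZ / divisor : 0;
}
```

With the values from the macro, as_port_byte(-74) is 0xB6 (channel 2, low
byte then high byte, square wave mode), pit_divisor(-119, 15) is 0x0F89, and
beep_hz() of that comes out at roughly 300 Hz.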
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |  "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: [PATCH] ACPI: update ACPI_PROCFS removal schedule

2007-07-16 Thread Stefan Seyfried
On Thu, Jul 12, 2007 at 06:20:34PM +0800, rzhang1 wrote:
> From: Zhang Rui <[EMAIL PROTECTED]>
> 
> ACPI sysfs conversion is not finished yet and
> some user space tools still depend on the ACPI procfs I/F.
> 
> The ACPI_PROCFS removal schedule is changed to Jan 08.

I think that's too early. The conversion to sysfs is not even
finished, so it will be less than 6 months.
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |  "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: [PATCH][BUTTON] remove procfs-interface

2007-07-16 Thread Stefan Seyfried
On Fri, Jul 13, 2007 at 12:37:07AM +0530, Satyam Sharma wrote:
> On 7/12/07, Zhang, Rui <[EMAIL PROTECTED]> wrote:
> >Well, the ACPI sysfs conversion is not finished yet
> >[...]
> >I'm not sure if the button sysfs I/F is already finished.
> >We'd better make a double check. :)
> 
> Ok, this sounds reasonable.
> 
> >and some user space tools still use the ACPI procfs.
> 
> But this does *not*, IMHO. It quite defeats the whole concept of
> feature-removal-schedule.txt. I think that file exists precisely
> because we cannot gratuitously break userspace interfaces just
> like that, but when something gets put up there with a removal date
> that is a good one year in the future, and userspace tools _still_
> continue to use it ... then, I suspect something's seriously wrong.

Holy sh*t. There is not even a functional replacement ready, but still
everybody wants to remove /proc/acpi. (Maybe the replacement started
to work recently, i have not looked into this area for the last months.
This does not change my point, though).
This is not going to work.
IMNSHO, we need the new interface available and usable for quite some time
(i'd say for over one year), and then we can start to phase out the old
interface.
Starting with removing /proc/acpi is not the correct ordering of actions.
 
> Either the feature-removal-schedule.txt file has become something
> that users don't even bother checking, or else, they _know_ that
> even if they don't bother keeping up with the pace in kernel-land,
> that interface still won't go away (because they're still using it!).

Or they look at the feature-removal document, find out that there is
no replacement available and conclude "the writers of this document
must have been on crack, or this document is unmaintained". I cannot
disagree with them.
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |  "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: [2.6 patch] the scheduled ACPI_PROCFS removal

2007-07-16 Thread Stefan Seyfried
On Thu, Jul 12, 2007 at 10:18:17AM +0100, Richard Hughes wrote:
> On Thu, 2007-07-12 at 09:32 +0400, Alexey Starikovskiy wrote:
> > >> [*] Does someone have an alternative for
> > >> /proc/acpi/battery/BAT1/{state,info}?
> > I'm working on it. Should have proto by the end of week.
> 
> If you are using the power_supply class (i hope you are ;-) then a HAL
> from freedesktop git should make userspace continue to just work.

Having to update HAL is not my definition of "does not break userspace".
And, BTW, there is more than just HAL out there using /proc/acpi, and
this should continue to work.
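
Until every tool has been converted, a userspace program could probe for
both interfaces instead of assuming one. A rough sketch in C, assuming the
power_supply class exposes the battery under /sys/class/power_supply/BAT1/
(the attribute name "status" and the helper names are assumptions; the
procfs path is the one quoted above):

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Return the first path in the NULL-terminated list that is readable,
 * or NULL if none of them is.
 */
static const char *first_readable(const char *const *candidates)
{
    for (; *candidates; candidates++)
        if (access(*candidates, R_OK) == 0)
            return *candidates;
    return NULL;
}

/* Prefer the sysfs power_supply class, fall back to the old procfs file. */
static const char *battery_source(void)
{
    static const char *const paths[] = {
        "/sys/class/power_supply/BAT1/status", /* assumed sysfs location */
        "/proc/acpi/battery/BAT1/state",       /* procfs file from the thread */
        NULL,
    };
    return first_readable(paths);
}
```

A tool written this way keeps working on kernels that only offer
/proc/acpi, which is the point being made here: the old interface has to
stay until such fallbacks are no longer needed.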
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |  "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: Failure to properly reinit i8042 post suspend-to-ram

2007-07-10 Thread Stefan Seyfried
Hi,

On Tue, Jul 10, 2007 at 10:59:57AM +1000, Nigel Cunningham wrote:
> On Saturday 07 July 2007 01:04:51 Stefan Seyfried wrote:
> > On Thu, Jul 05, 2007 at 09:04:27PM +1000, Nigel Cunningham wrote:
> > > 
> > > Adding i8042.reset=1 to the commandline fixed it.
> > 
> > Wasn't there a quirk list where workarounds for i8042 on known bad machines
> > are stored? Maybe it would be a good idea to get your machine into it ;-)
> 
> Unless I'm missing something, it looks like there's no such thing in the
> i8042 driver. That's okay. I can cope with adding i8042.reset=1 to my
> commandline :)

In drivers/input/serio/i8042-x86ia64io.h there are tables for various quirks,
but apparently nothing for "reset=1".
If we find another machine that needs reset=1, then it might be time for a
table for this quirk.
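
Such a table could look roughly like the sketch below. It is plain userspace
C with a simplified struct and placeholder machine names, not the DMI
matching that the existing tables in i8042-x86ia64io.h use, and
needs_i8042_reset() is a hypothetical helper:

```c
#include <stddef.h>
#include <string.h>

/* One hypothetical quirk entry: a machine identified by vendor and product. */
struct reset_quirk {
    const char *vendor;
    const char *product;
};

/* Placeholder entries; these are not real machines known to need reset=1. */
static const struct reset_quirk reset_quirks[] = {
    { "ExampleVendor", "ExampleBook 100" },
    { "OtherVendor",   "Model X" },
};

/* Return 1 if the machine is listed and should get the reset by default. */
static int needs_i8042_reset(const char *vendor, const char *product)
{
    size_t i;

    for (i = 0; i < sizeof(reset_quirks) / sizeof(reset_quirks[0]); i++)
        if (strcmp(reset_quirks[i].vendor, vendor) == 0 &&
            strcmp(reset_quirks[i].product, product) == 0)
            return 1;
    return 0;
}
```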

Best regards,

Stefan

-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |  "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: Failure to properly reinit i8042 post suspend-to-ram

2007-07-06 Thread Stefan Seyfried
On Thu, Jul 05, 2007 at 09:04:27PM +1000, Nigel Cunningham wrote:
> > 
> > If confusion persists after 4 seconds hard power down... then you have
> > hw/BIOS problem. Complain to whoever is manufacturing that beast.
> 
> Adding i8042.reset=1 to the commandline fixed it.

Wasn't there a quirk list where workarounds for i8042 on known bad machines
are stored? Maybe it would be a good idea to get your machine into it ;-)
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |  "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: [RFC] get rid of CONFIG_DISABLE_CONSOLE_SUSPEND

2007-07-05 Thread Stefan Seyfried
Hi,

On Thu, Jul 05, 2007 at 12:39:22AM +0200, Pavel Machek wrote:
> Hi!
> 
> > Yes, but I'm not sure if netconsole is the only one that we will want to 
> > have
> 
> Well, netconsole is the only one we know of.

AFAIR it is plain luck that serial console sometimes works.

I repeat: "no bugreport" is not the same as "it works for everyone" wrt.
suspend. It seems (i unfortunately have no numbers, since my machines always
worked without suspending the consoles) as if suspending consoles generally
helped reliability of suspend.
 
> > disabled.  Moreover, what if someone wants to use the netconsole regardless
> > of the fact that it can crash the box?
> 
> He'll have to edit the sources at that point. I'd prefer not to have
> too many "please crash the box" options.

So should we remove sysrq-C?
This is a debugging option. Only root can set it. Its purpose is to make
"machine hangs during suspend" (even before it goes to sleep) debuggable.
It will only be set if the machine crashes anyways.

(We can taint the kernel if this control is set, if that helps you).
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |  "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: [PATCH] Optional Beeping During Resume From Suspend To Ram.

2007-06-29 Thread Stefan Seyfried
On Fri, Jun 29, 2007 at 08:27:12AM +1000, Nigel Cunningham wrote:
> > Can we rename/reuse existing flag variable?
> 
> Sorry, but I can't resist the opportunity to say "Send a patch!" :)
> 
> Seriously, though, I'd prefer not to. If we rename that acpi video flags
> variable (I assume this is what you're thinking of), we only create cause for
> confusion. A variable should be for debugging or for controlling quirks, not
> for both at the same time.

I agree. And video_flags is something totally different :-)
I just used that one in my ad-hoc hack (which actually was only to illustrate
the idea) because a) it was enough to show the intent and b) i did not know
how to do it better ;-)
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |  "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: [RFC] get rid of CONFIG_DISABLE_CONSOLE_SUSPEND

2007-06-29 Thread Stefan Seyfried
On Thu, Jun 28, 2007 at 09:12:44PM +0200, Rafael J. Wysocki wrote:
> On Thursday, 28 June 2007 19:25, Stefan Seyfried wrote:
> > 
> > However, we don't know which consoles are safe to stay alive during suspend.
> > Generally, defaulting to suspending them all is not a bad idea IMHO.
> > And IIRC it is plain luck if a serial console survives the suspend (or was
> > the serial code fixed recently?)
> 
> Well, I don't think so, but I'm not sure.
> 
> The VGA/fb console also should be off during suspend (not necessarily during
> hibernation, though).  IIRC, that's what caused Linus to introduce the
> suspending of consoles after all.
> 
> > So i do not care too much, but my / Frank's patch was shorter :-) and safer.
> 
> I'm not sure which way to go.  On the one hand, I agree that we should rather
> fix the consoles so that we know which one is suspend-safe and which is not
> and disable the unsafe ones, but on the other hand we are not there yet and it
> _sometimes_ is useful not to suspend a console even if we know that it will
> break things.

This is what my / Frank's patch was aimed at: give the user the ability to
(painlessly, without rebuilding the kernel) debug suspend problems. Keep the
default safe, like Linus likes it (consoles suspended), but give the user a
switch to make it unsafe (consoles not suspended) for the sake of debugging.

Of course, fixing up all console drivers is an option that i'd very much like
to see. It is however debatable if it is really worth the effort. If it works
with consoles suspended, the user does not care. If it doesn't, he turns on
debugging (knowing, or being told that this will break using netconsole).

I strongly oppose Pavel's approach to "declare all console drivers as
nonbroken except netconsole". Even if he has not seen any failures apart
from netconsole, in general i had the impression that suspending consoles
did help. At least suspend works on many more machines than half a year ago,
and i'd not be surprised if this was partly due to suspending the consoles.

Remember that wrt. suspend "i did not get a bugreport" very often just means
"people tried it, it did not work, but they expected that and just turned
 away". It does not mean "it just works for everyone".
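
The runtime switch argued for above boils down to one flag checked on the
suspend path. A simplified userspace sketch in C; the flag and function
names are hypothetical and this is not the code from the patch:

```c
#include <stdbool.h>

/* Hypothetical runtime switch; false keeps the safe default. */
static bool keep_consoles_awake = false;

/* Stand-in for the real console state. */
static bool consoles_suspended = false;

/* Called on the way into suspend: suspend consoles unless told not to. */
static void suspend_consoles_for_sleep(void)
{
    if (keep_consoles_awake)
        return; /* debugging mode: messages keep flowing, the box may crash */
    consoles_suspended = true;
}

/* Called on resume: always bring the consoles back. */
static void resume_consoles_after_sleep(void)
{
    consoles_suspended = false;
}
```

The default path never changes behaviour; only someone who deliberately
flips the flag to chase a hang during suspend takes the unsafe branch.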
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |  "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: [RFC] get rid of CONFIG_DISABLE_CONSOLE_SUSPEND

2007-06-28 Thread Stefan Seyfried
(CC'ing Linus, since disabling consoles during suspend was his idea IIRC)

On Thu, Jun 28, 2007 at 05:34:54PM +0200, Rafael J. Wysocki wrote:
> Hi,
> 
> On Thursday, 28 June 2007 15:51, Pavel Machek wrote:
> > Hi!
> > 
> > What about this? (Only compile tested, but looks pretty obvious to
> > me). Something like this should get us rid of ugly option, and still
> > solve debugging problems... Hmmm?
> > Pavel
> > 
> > Kill CONFIG_DISABLE_CONSOLE_SUSPEND; it should not be configurable at
> > all, instead, we should automatically keep console alive when
> > possible.
> > 
> > Signed-off-by: Pavel Machek <[EMAIL PROTECTED]>
> > 
> > diff --git a/drivers/char/lp.c b/drivers/char/lp.c
> > index 62051f8..8267ff8 100644
> > --- a/drivers/char/lp.c
> > +++ b/drivers/char/lp.c
> > @@ -144,7 +144,7 @@ static unsigned int lp_count = 0;
> >  static struct class *lp_class;
> >  
> >  #ifdef CONFIG_LP_CONSOLE
> > -static struct parport *console_registered; // initially NULL
> > +static struct parport *console_registered;
> >  #endif /* CONFIG_LP_CONSOLE */
> 
> Could you please avoid fixing things like this, white space etc. in this
> patch? It would be easier to read ...

Yes.

> I generally agree with the idea, but the patch needs a clean up, IMHO.

However, we don't know which consoles are safe to stay alive during suspend.
Generally, defaulting to suspending them all is not a bad idea IMHO.
And IIRC it is plain luck if a serial console survives the suspend (or was
the serial code fixed recently?)

So I do not care too much, but my / Frank's patch was shorter :-) and safer.
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |  "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: [Suspend-devel] [PATCH, 3rd try] make disable_console_suspend runtime configurable

2007-06-21 Thread Stefan Seyfried
On Thu, Jun 21, 2007 at 03:20:08PM +0200, Pavel Machek wrote:
> Hi!
 
> > No, I don't agree at all.
> > 
> > In this case, "no config needed" == "not possible to debug suspend
> > problems".
> 
> No, sorry.
> 
> My proposed solution is "figure out which console drivers can survive
> being on while machines go down, and keep them on".
> 
> So, "no config needed" == "kernel always does the right thing, keeping
> console during suspend when possible" == "possible to debug suspend
> problems without having to change CONFIG_ or /sys/*".

Ok. Deal. Once you have fixed all the console drivers, I'll gladly send a patch
that reverts the patch we are discussing now.

Note that this patch actually helps with fixing those drivers, since you can
test much more easily whether a given driver survives suspend ;-)
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |  "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: [Suspend-devel] [PATCH, 3rd try] make disable_console_suspend runtime configurable

2007-06-18 Thread Stefan Seyfried
On Sun, Jun 17, 2007 at 11:49:40PM +0200, Pavel Machek wrote:
> Hi!
> 
> > > > I hate having to recompile the kernel, just to be able to debug suspend.
> > > > Remove CONFIG_DISABLE_CONSOLE_SUSPEND, replace it by a tunable in
> > > > /sys/power/disable_console_suspend.
> > > > 
> > > > 
> > > > Signed-off-by: Stefan Seyfried <[EMAIL PROTECTED]>
> > > > Signed-off-by: Frank Seidel <[EMAIL PROTECTED]>
> > > > ---
> > > > Third try, renamed sysfs interface to console_suspend 
> > > > reporting and expecting either "enabled" or "disabled"
> > > 
> > > Thanks a lot for redoing it.
> > > 
> > > I have no objections.  Pavel?
> > 
> > I still think that patch is bad. I should have screamed when
> > CONFIG_DISABLE_CONSOLE_SUSPEND went into kernel. That beast should
> > _not_ be configurable, it should just do the right thing.
> > 
> > But I realized that too late. And this only makes it worse, making
> > that mistake part of the user-kernel interface.
> > 
> > Sorry for not screaming when CONFIG_DISABLE_CONSOLE_SUSPEND went in,
> > but please let's solve this correctly
> 
> Ouch and sorry for not screaming at "try 1" time. But it still does
> not make the patch right, and I believe that even the patch authors agree
> that "no-config-needed" is the superior solution.

No, I don't agree at all.

In this case, "no config needed" == "not possible to debug suspend problems".

IMO this is the same issue as with "sysrq-C". You can crash the machine by
other means, but sometimes it is just handy to have a mechanism to do it.

I do not understand what the problem with this option is. If you want to keep
people from using it for anything other than debugging, I can add a patch that
crashes the machine ten seconds after resume if this option is set.
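
For readers following along, here is a minimal userspace sketch of what such
a tunable has to do when it reports and accepts "enabled" or "disabled", as
the third try of the patch is described above. This is not the code from the
patch: the names console_suspend_store(), console_suspend_show() and
console_suspend_enabled are made up for illustration, and a real sysfs
attribute would use the kernel's own show/store callback signatures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state: true means consoles are suspended (the safe default). */
static bool console_suspend_enabled = true;

/*
 * Parse a value written to the tunable. Accepts exactly "enabled" or
 * "disabled", with one optional trailing newline (as "echo" produces).
 * Returns 0 on success and -1 on anything else.
 */
static int console_suspend_store(const char *buf, size_t len)
{
	/* Drop a single trailing newline, if present. */
	if (len > 0 && buf[len - 1] == '\n')
		len--;

	if (len == strlen("enabled") && memcmp(buf, "enabled", len) == 0) {
		console_suspend_enabled = true;
		return 0;
	}
	if (len == strlen("disabled") && memcmp(buf, "disabled", len) == 0) {
		console_suspend_enabled = false;
		return 0;
	}
	return -1; /* unknown keyword: leave the state untouched */
}

/* Report the current state the way the tunable would read back. */
static const char *console_suspend_show(void)
{
	return console_suspend_enabled ? "enabled" : "disabled";
}
```

Rejecting unknown input without changing the state keeps a typo from
silently turning a debugging setting on or off.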
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |  "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: [PATCH, 2nd try] make disable_console_suspend runtime configurable

2007-06-13 Thread Stefan Seyfried
On Thu, Jun 14, 2007 at 12:08:00AM +0200, Pavel Machek wrote:
> Hi!
> 
> > I hate having to recompile the kernel, just to be able to debug suspend.
> > Remove CONFIG_DISABLE_CONSOLE_SUSPEND, replace it by a tunable in
> > /sys/power/disable_console_suspend.
> 
> > Signed-off-by: Stefan Seyfried <[EMAIL PROTECTED]>
> > Signed-off-by: Frank Seidel <[EMAIL PROTECTED]>
> 
> I wonder if there's a better name?

Suggest one.

> Or maybe this should not be /sys configurable, but just have a value for
> each console: "this console can work while suspended"?
> 
> (serial can, vesafb can, netconsole can't)?

Go ahead, submit a patch. It won't be that trivial. And I wonder
whether it is actually worth the hassle. This is a debugging facility.
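
To make the per-console idea concrete, here is a rough userspace sketch of
what a "this console can work while suspended" value could look like. It is
only an illustration: struct console_sketch, CON_SUSPEND_SAFE and
suspend_unsafe_consoles() are hypothetical names, not existing kernel
interfaces, and which drivers really survive suspend is exactly the open
question in this thread.

```c
#include <stddef.h>

/* Hypothetical flag: the driver keeps working while the system suspends. */
#define CON_SUSPEND_SAFE 0x1u

/* Simplified stand-in for a console driver entry (not the kernel struct). */
struct console_sketch {
	const char *name;
	unsigned int flags;
	int suspended;
};

/*
 * Suspend only the consoles that do not carry CON_SUSPEND_SAFE.
 * Returns the number of consoles that were suspended.
 */
static size_t suspend_unsafe_consoles(struct console_sketch *cons, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (cons[i].flags & CON_SUSPEND_SAFE)
			continue; /* leave it alive for debugging output */
		cons[i].suspended = 1;
		count++;
	}
	return count;
}
```

The hard part is not this loop but deciding, driver by driver, who may set
the flag.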

> Exporting a "crash-me" option to the user does not seem that cool to me.

We also have "echo c > /proc/sysrq-trigger".
This is a debugging option, and forcing users to recompile the kernel just
to debug suspend problems (not resume problems; the "it does not even go to
sleep" stuff is where this matters most) is IMO a bad idea.

We can also make this a boot parameter, I don't care, but I want to be able
to disable console suspend without recompiling the kernel.
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |  "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)

