Re: Hibernate resume bug around 3.18-rc2 - Full PAT support
On 11/23/2015 08:56 PM, Luis R. Rodriguez wrote:

It's not clear from the log who called this MTRR call for WC that failed, I hope we didn't attempt a WC write on a WB region. Who owns e000-efff ?

How can I answer that? Is there any utility to run? peek inside /proc?

Here is an idea:

$ dmesg | grep -i -5 e000
[0.220941] pci_bus :00: root bus resource [mem 0x000e4000-0x000e7fff window]
[0.220944] pci_bus :00: root bus resource [mem 0xdf20-0xfeaf window]
[0.220950] pci :00:00.0: [8086:0c00] type 00 class 0x06
[0.221012] pci :00:02.0: [8086:0412] type 00 class 0x03
[0.221021] pci :00:02.0: reg 0x10: [mem 0xf780-0xf7bf 64bit]
[0.221025] pci :00:02.0: reg 0x18: [mem 0xe000-0xefff 64bit pref]
[0.221028] pci :00:02.0: reg 0x20: [io 0xf000-0xf03f]
[0.221081] pci :00:03.0: [8086:0c0c] type 00 class 0x040300
[0.221089] pci :00:03.0: reg 0x10: [mem 0xf7c34000-0xf7c37fff 64bit]
[0.221163] pci :00:14.0: [8086:8cb1] type 00 class 0x0c0330
[0.221184] pci :00:14.0: reg 0x10: [mem 0xf7c2-0xf7c2 64bit]
--
[0.453765] calling ioapic_init_ops+0x0/0xf @ 1
[0.453767] initcall ioapic_init_ops+0x0/0xf returned 0 after 0 usecs
[0.453770] calling add_pcspkr+0x0/0x3b @ 1
[0.453781] initcall add_pcspkr+0x0/0x3b returned 0 after 8 usecs
[0.453783] calling sysfb_init+0x0/0x96 @ 1
[0.453811] simple-framebuffer simple-framebuffer.0: framebuffer at 0xe000, 0x6bb000 bytes, mapped to 0xc9000200
[0.453814] simple-framebuffer simple-framebuffer.0: format=a8r8g8b8, mode=1680x1050x32, linelength=6720
[0.557233] Console: switching to colour frame buffer device 210x65
[0.660632] simple-framebuffer simple-framebuffer.0: fb0: simplefb registered!
[0.661262] initcall sysfb_init+0x0/0x96 returned 0 after 202686 usecs
[0.661266] calling audit_classes_init+0x0/0xaa @ 1
--
[9.744397] input: gspca_zc3xx as /devices/pci:00/:00:14.0/usb3/3-3/input/input18
[9.744481] usbcore: registered new interface driver gspca_zc3xx
[9.744484] initcall sd_driver_init+0x0/0x1000 [gspca_zc3xx] returned 0 after 319 usecs
[9.745108] calling i915_init+0x0/0xa2 [i915] @ 403
[9.745542] [drm] Memory usable by graphics device = 2048M
[9.745544] checking generic (e000 6bb000) vs hw (e000 1000)
[9.745544] fb: switching to inteldrmfb from simple
[9.745831] calling alsa_seq_device_init+0x0/0x1000 [snd_seq_device] @ 384
[9.745842] initcall alsa_seq_device_init+0x0/0x1000 [snd_seq_device] returned 0 after 9 usecs
[9.746179] calling hmac_module_init+0x0/0x1000 [hmac] @ 471
[9.746180] initcall hmac_module_init+0x0/0x1000 [hmac] returned 0 after 0 usecs
--
[9.749840] calling usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] @ 384
[9.751163] usbcore: registered new interface driver snd-usb-audio
[9.751166] initcall usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] returned 0 after 1292 usecs
[9.943166] Console: switching to colour dummy device 80x25
[9.943240] [drm] Replacing VGA console driver
[9.943520] mtrr: type mismatch for e000,1000 old: write-back new: write-combining
[9.943526] Failed to add WC MTRR for [e000-efff]; performance may suffer.
[9.947147] Adding 31249404k swap on /dev/sdb1. Priority:-1 extents:1 across:31249404k FS
[9.949724] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[9.949728] [drm] Driver supports precise vblank timestamp query.
[9.949801] vgaarb: device changed decodes: PCI::00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[9.965787] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)

$ lspci | grep 00:02.0
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)

Looks like it is the graphics card or the graphics driver.

I don't know if this is relevant:

$ cat /proc/mtrr
reg00: base=0x0 (0MB), size=16384MB, count=1: write-back
reg01: base=0x4 (16384MB), size= 512MB, count=1: write-back
reg02: base=0x0e000 ( 3584MB), size= 512MB, count=1: uncachable
reg03: base=0x0d000 ( 3328MB), size= 256MB, count=1: uncachable
reg04: base=0x0cf00 ( 3312MB), size= 16MB, count=1: uncachable
reg05: base=0x41f00 (16880MB), size= 16MB, count=1: uncachable
reg06: base=0x41ee0 (16878MB), size=2MB, count=1: uncachable

What does your log show right before and after this? To find out try: dmesg | grep -5 -i mtrr

See full dmesg attached.

$ dmesg | grep -5 -i mtrr
[0.189333] initcall arch_kdebugfs_init+0x0/0x1f returned 0 after 0 usecs
[0.189336] calling pt_init+0x0/0x2a4 @ 1
[0.189349] initcall pt_init+0x0/0x2a4 returned -19 after 0 usecs
[0.189352] calling bts_init+0x0/0xa4 @ 1
[0.189354] initcall bts_init+0x0/0xa4
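For readers who want to check which MTRR registers cover a given range, here is a minimal Python sketch that parses /proc/mtrr-style text and lists the overlapping registers. The sample text is the /proc/mtrr output quoted in this message; its hex bases are truncated in the archive, so the sketch relies only on the MB figures. The function names and the 256 MB query size are assumptions made for illustration, not something taken from the kernel or from the thread.

```python
import re

# One /proc/mtrr line: "regNN: base=0x... (<base>MB), size=<size>MB, count=N: <type>".
# Registers whose size is reported in KB are skipped by this sketch.
MTRR_LINE = re.compile(
    r"^(reg\d+): base=0x[0-9a-fA-F]+ \(\s*(\d+)MB\), size=\s*(\d+)MB, count=\d+: (.+)$"
)

# Copied from the message above (hex bases are truncated there, MB values are intact).
SAMPLE = """\
reg00: base=0x0 (0MB), size=16384MB, count=1: write-back
reg01: base=0x4 (16384MB), size= 512MB, count=1: write-back
reg02: base=0x0e000 ( 3584MB), size= 512MB, count=1: uncachable
reg03: base=0x0d000 ( 3328MB), size= 256MB, count=1: uncachable
reg04: base=0x0cf00 ( 3312MB), size= 16MB, count=1: uncachable
reg05: base=0x41f00 (16880MB), size= 16MB, count=1: uncachable
reg06: base=0x41ee0 (16878MB), size=2MB, count=1: uncachable
"""


def parse_mtrr(text):
    """Return a list of (name, base_mb, size_mb, type) tuples."""
    regs = []
    for line in text.splitlines():
        m = MTRR_LINE.match(line.strip())
        if m:
            regs.append((m.group(1), int(m.group(2)), int(m.group(3)), m.group(4).strip()))
    return regs


def overlapping(regs, base_mb, size_mb):
    """Return the registers whose range intersects [base_mb, base_mb + size_mb)."""
    end = base_mb + size_mb
    return [r for r in regs if r[1] < end and base_mb < r[1] + r[2]]


if __name__ == "__main__":
    # Hypothetical query: a 256 MB range starting at 3584 MB.
    for name, base, size, kind in overlapping(parse_mtrr(SAMPLE), 3584, 256):
        print(f"{name}: {base}MB +{size}MB {kind}")
```

On a live system the same functions could be fed `open("/proc/mtrr").read()` instead of the sample text.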
Re: Hibernate resume bug around 3.18-rc2 - Full PAT support
On 11/20/2015 02:23 PM, Juergen Gross wrote:
On 20/11/15 11:04, vas...@iit.demokritos.gr wrote:

I've just found a potential issue: In case MTRR is disabled by the BIOS the PAT register of the boot processor won't be restored after resume. Can you check whether pr_info("MTRR: Disabled\n") has been executed in early boot? If yes, this might be a BIOS option.

I don't have access right now. I will test it later tonight (this is my home machine). Would $ dmesg | grep -i mtrr suffice, or do I need to look for the mtrr somewhere else, e.g. /proc, /sys etc.?

I think grepping for MTRR in dmesg should be enough.

Kernel 4.3 + nopat also died on the 4th or the 5th hibernate at the familiar (see previously attached image) "Calling lapic..." place.

$ dmesg | grep -i mtr for 4.3 kernel with nopat
[0.189113] calling mtrr_if_init+0x0/0x5f @ 1
[0.189116] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
[0.189222] pmd_set_huge: Cannot satisfy [mem 0xf800-0xf820] with a huge-page mapping due to MTRR override.
[0.189559] calling mtrr_init_finialize+0x0/0x3a @ 1
[0.189560] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
[8.994140] mtrr: type mismatch for e000,1000 old: write-back new: write-combining
[8.994154] Failed to add WC MTRR for [e000-efff]; performance may suffer.

$ dmesg | grep -i mtr for 4.3 kernel with default pat enabled
[0.189368] calling mtrr_if_init+0x0/0x5f @ 1
[0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
[0.189478] pmd_set_huge: Cannot satisfy [mem 0xf800-0xf820] with a huge-page mapping due to MTRR override.
[0.189814] calling mtrr_init_finialize+0x0/0x3a @ 1
[0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs

I also checked my BIOS. I found nothing about mtrr. My BIOS manual is ftp://europe.asrock.com/Manual/H97%20Pro4.pdf. Can you see any option about MTRR?

Question: If we assume your theory about mtrr/pat is correct, wouldn't the system lock up, hang or reboot every time it goes through hibernate/resume? Can this assumption explain why the first hibernation/resume cycles in rapid succession after system boot work and the long ones fail somewhat more consistently?

Note: With PAT enabled the system boots up significantly faster.

Over the weekend I will return to 3.18-rc2 and try to verify that my bisection is correct. Second-guessing yourself is a terrible thing... I will also try with nopat, run dmesg | grep -i mtr and post the results.

Unless you have any other suggestions...

Vassilis
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
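The dmesg check discussed in this message can also be scripted, which may help when comparing several kernels and boot options. The following is only a sketch: `scan_dmesg` is a hypothetical helper name, the "MTRR: Disabled" string is the one quoted from the pr_info() call above, and the sample line is copied from the 4.3 nopat log in this message.

```python
import re

# Matches e.g. "mtrr: type mismatch for e000,1000 old: write-back new: write-combining"
MISMATCH = re.compile(r"mtrr: type mismatch for (\S+),(\S+) old: (\S+) new: (\S+)")

# Copied from the 4.3 + nopat dmesg output in this message.
SAMPLE = """\
[0.189113] calling mtrr_if_init+0x0/0x5f @ 1
[8.994140] mtrr: type mismatch for e000,1000 old: write-back new: write-combining
[8.994154] Failed to add WC MTRR for [e000-efff]; performance may suffer.
"""


def scan_dmesg(text):
    """Summarise MTRR-related findings in a dmesg dump."""
    return {
        # True if the kernel reported MTRRs as disabled during early boot.
        "mtrr_disabled": "MTRR: Disabled" in text,
        # List of (base, size, old_type, new_type) tuples, as printed by the kernel.
        "mismatches": MISMATCH.findall(text),
    }


if __name__ == "__main__":
    print(scan_dmesg(SAMPLE))
```

To scan the running system instead of the sample, the text could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`.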
Re: Hibernate resume bug around 3.18-rc2 - Full PAT support
On 11/19/2015 10:35 PM, Vassilis Virvilis wrote:

I compiled and I am running 4.3 right now.

It failed this morning.

Last night I did 3 hibernate/resume cycles. In the last one I also turned off the PSU (this seems to push it over the edge, but it may be random behavior) and it worked.

This morning, 7h later, it failed to resume, but it didn't hang on _lapic_resume. This time it rebooted, and I seem to recall this behavior for 4.2+ kernels. I forgot to mention it because my testing with 4.x kernels was one month ago.

So: the 4.3 kernel reboots on resume after a long hibernation time.

I am testing with 4.3 and nopat right now.

Vassilis
Re: Hibernate resume bug around 3.18-rc2 - Full PAT support
On 11/19/2015 11:10 AM, Juergen Gross wrote:

So do you want me to test 4.3 or 4.4-pre/rc*/latest linus tree? I assume 4.3 for now.

I think 4.3 is okay.

I will do it later tonight. It will take 2 days at least to report back.

I compiled and I am running 4.3 right now. If it fails I will try with the nopat option. If it fails I will try 3.18-rc2+nopat to see if that fails.

Do you want me to run something on this like lspci, lsusb?

Yes, please post the output of both.

Here they are. See attachments.

I would like this to be fixed so I am willing to do the testing.

I appreciate this spirit. :-)

I appreciate the guidance. :-)

Vassilis

Bus 004 Device 002: ID 8087:8001 Intel Corp.
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 002: ID 8087:8009 Intel Corp.
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 046d:089d Logitech, Inc. QuickCam E2500 series
Bus 001 Device 003: ID 045e:0745 Microsoft Corp. Nano Transceiver v1.0 for Bluetooth
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 004 Device 002: ID 8087:8001 Intel Corp.
Device Descriptor:
  bLength 18
  bDescriptorType 1
  bcdUSB 2.00
  bDeviceClass 9 Hub
  bDeviceSubClass 0 Unused
  bDeviceProtocol 1 Single TT
  bMaxPacketSize0 64
  idVendor 0x8087 Intel Corp.
  idProduct 0x8001
  bcdDevice 0.00
  iManufacturer 0
  iProduct 0
  iSerial 0
  bNumConfigurations 1
  Configuration Descriptor:
    bLength 9
    bDescriptorType 2
    wTotalLength 25
    bNumInterfaces 1
    bConfigurationValue 1
    iConfiguration 0
    bmAttributes 0xe0
      Self Powered
      Remote Wakeup
    MaxPower 0mA
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 0
      bNumEndpoints 1
      bInterfaceClass 9 Hub
      bInterfaceSubClass 0 Unused
      bInterfaceProtocol 0 Full speed (or root) hub
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0002 1x 2 bytes
        bInterval 12
Hub Descriptor:
  bLength 11
  bDescriptorType 41
  nNbrPorts 8
  wHubCharacteristic 0x0009
    Per-port power switching
    Per-port overcurrent protection
    TT think time 8 FS bits
  bPwrOn2PwrGood 0 * 2 milli seconds
  bHubContrCurrent 0 milli Ampere
  DeviceRemovable 0x00 0x00
  PortPwrCtrlMask 0xff 0xff
 Hub Port Status:
   Port 1: .0100 power
   Port 2: .0100 power
   Port 3: .0100 power
   Port 4: .0100 power
   Port 5: .0100 power
   Port 6: .0100 power
   Port 7: .0100 power
   Port 8: .0100 power
Device Qualifier (for other device speed):
  bLength 10
  bDescriptorType 6
  bcdUSB 2.00
  bDeviceClass 9 Hub
  bDeviceSubClass 0 Unused
  bDeviceProtocol 0 Full speed (or root) hub
  bMaxPacketSize0 64
  bNumConfigurations 1
Device Status: 0x0001
  Self Powered

Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Device Descriptor:
  bLength 18
  bDescriptorType 1
  bcdUSB 2.00
  bDeviceClass 9 Hub
  bDeviceSubClass 0 Unused
  bDeviceProtocol 0 Full speed (or root) hub
  bMaxPacketSize0 64
  idVendor 0x1d6b Linux Foundation
  idProduct 0x0002 2.0 root hub
  bcdDevice 4.03
  iManufacturer 3 Linux 4.3.0+ ehci_hcd
  iProduct 2 EHCI Host Controller
  iSerial 1 :00:1d.0
  bNumConfigurations 1
  Configuration Descriptor:
    bLength 9
    bDescriptorType 2
    wTotalLength 25
    bNumInterfaces 1
    bConfigurationValue 1
    iConfiguration 0
    bmAttributes 0xe0
      Self Powered
      Remote Wakeup
    MaxPower 0mA
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 0
      bNumEndpoints 1
      bInterfaceClass 9 Hub
      bInterfaceSubClass 0 Unused
      bInterfaceProtocol 0 Full speed (or root) hub
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
Re: Debugging COW (copy on write) memory after fork: Is it possible to dump only the private anonymous memory of a process?
On 04/06/2013 09:11 PM, Bruno Prémont wrote:
On Fri, 05 April 2013 Vassilis Virvilis wrote:

Question: Is it possible to dump only the private anonymous memory of a process?

I don't know if that's possible, but from your background you could probably work around it by mmap()ing the memory you need and, once initialized, marking all of that memory read-only (if you mmap very large chunks you can even benefit from huge-pages). Any of the forked processes would then get a signal if they ever tried to write to the data (and thus unshare it).

I can't do that. We are talking about an existing system (in perl with C modules) that has been parallelized in a second step.

If you allocate and initialize all of your memory in little malloc()'ed chunks it's possibly glibc's memory housekeeping that unshares all those pages over time.

Yes, I suppose it is a series of mallocs. I could easily verify that with strace. However, if glibc's memory housekeeping undermines the COW behaviour that would be very bad.

I have unit tests in which I was able to work around the usual perl problems that cause memory unsharing, such as reference counting and hash accessing. The garbage collector shouldn't be a problem because there is nothing to collect from the shared memory, only private local variables that go out of scope.

The problem is that when I employ these workarounds in the live system (with considerable IO) I get massive unsharing. So I thought I would have a look and see what's going on in two or three consecutive private memory dumps.

The point is that I need to locate the source of the memory unsharing. Any ideas how this can be done?

At this point I could try in-house compiled kernels if I can enable some logging to track this behavior. Does any knob like this exist? Even as an #ifdef?

Vassilis
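One avenue that may be worth exploring for locating unshared pages is the kernel's page-map interface. The sketch below reads /proc/PID/pagemap to translate virtual pages into page frame numbers and then looks each frame up in /proc/kpagecount; a map count of 1 suggests the page is mapped by this process only, that is, it was either unshared by a write or allocated after the fork. It assumes a kernel that exposes both files, root privileges, and 4 KiB pages unless told otherwise. The function names are hypothetical, and the loop is written for clarity rather than speed.

```python
import struct

PAGE_SIZE = 4096  # assumption: 4 KiB pages; os.sysconf("SC_PAGE_SIZE") gives the real value


def decode_pagemap_entry(entry):
    """Split a 64-bit pagemap entry into (present, swapped, pfn)."""
    present = bool((entry >> 63) & 1)       # bit 63: page is present in RAM
    swapped = bool((entry >> 62) & 1)       # bit 62: page is swapped out
    # Bits 0-54 hold the page frame number, but only when the page is present.
    pfn = entry & ((1 << 55) - 1) if present else None
    return present, swapped, pfn


def privately_mapped_pages(pid, start, end, page_size=PAGE_SIZE):
    """Yield virtual addresses in [start, end) whose frame is mapped exactly once."""
    with open(f"/proc/{pid}/pagemap", "rb") as pagemap, \
         open("/proc/kpagecount", "rb") as kpagecount:
        for vaddr in range(start, end, page_size):
            pagemap.seek((vaddr // page_size) * 8)   # one 8-byte entry per virtual page
            raw = pagemap.read(8)
            if len(raw) < 8:
                break
            present, _swapped, pfn = decode_pagemap_entry(struct.unpack("=Q", raw)[0])
            if not present or not pfn:
                continue   # not resident, or PFN hidden from unprivileged readers
            kpagecount.seek(pfn * 8)                 # one 8-byte count per frame
            raw_count = kpagecount.read(8)
            if len(raw_count) == 8 and struct.unpack("=Q", raw_count)[0] == 1:
                yield vaddr
```

The address ranges to scan would come from /proc/PID/maps. Comparing the yielded addresses across two runs, and then reading just those pages from /proc/PID/mem, would give something close to the "consecutive private memory dumps" described above, without dumping the pages that are still shared.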
Debugging COW (copy on write) memory after fork: Is it possible to dump only the private anonymous memory of a process?
Hello,

Sorry if this is off topic. Just point me in the right direction. Please cc me also in the reply.

Question

Is it possible to dump only the private anonymous memory of a process?

Background

I have a process that reads and initializes a large portion of memory (around 2.3GB). This memory is effectively read-only from that point on. After the initialization I fork the process into several children in order to take advantage of the multicore architecture of modern cpus. The problem is that the program eventually ends up requiring number_of_process * 2.3GB of memory, effectively entering swap thrashing and destroying the performance.

Steps so far

The first thing I did was to monitor the memory. I found out about /proc/$pid/smaps and http://wingolog.org/pub/mem_usage.py. What happens is the following:

The program starts, reads from disk and has 2.3GB of private mappings.

The program forks. Immediately the 2.3GB become a shared mapping between the parent and the child. Excellent so far.

As time goes by and the child starts performing its tasks, the shared memory slowly migrates to the private mappings of each process, effectively blowing up the memory requirements.

I thought that if I could see (dump) the private mappings of each process I could see from the data why the shared mappings are being touched, so I tried to dump the core with gcore while playing with /proc/$pid/coredump_filter, like this:

echo 0x1 > /proc/$pid/coredump_filter
gcore $pid

Unfortunately it always dumps 2.3GB despite the setting in /proc/$pid/coredump_filter, which says private anonymous mappings.

I have researched the question on Google. I even posted it on Stack Overflow.

Any other ideas?

Thanks in advance
Vassilis Virvilis
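The smaps monitoring step described above can be narrowed down to individual mappings by diffing two snapshots. The following Python sketch parses two /proc/PID/smaps dumps and reports which mappings gained Private_Dirty memory between them. The helper names and the two tiny sample snapshots are made up for illustration, and the sketch assumes a mapping keeps the same address range between snapshots, which is not guaranteed because mappings can be split or merged.

```python
import re

# Mapping header, e.g. "01000000-03000000 rw-p 00000000 00:00 0 [heap]"
HEADER = re.compile(r"^([0-9a-f]+-[0-9a-f]+) \S+ \S+ \S+ \S+\s*(.*)$")
# Counter line, e.g. "Private_Dirty:      2768 kB"
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")

# Hypothetical snapshots of the same process, taken some time apart.
BEFORE = """\
01000000-03000000 rw-p 00000000 00:00 0 [heap]
Size:              32768 kB
Shared_Dirty:      32768 kB
Private_Dirty:         0 kB
"""
AFTER = """\
01000000-03000000 rw-p 00000000 00:00 0 [heap]
Size:              32768 kB
Shared_Dirty:      30000 kB
Private_Dirty:      2768 kB
"""


def parse_smaps(text):
    """Map 'range path' -> {field: kB} for every mapping in an smaps dump."""
    mappings = {}
    current = None
    for line in text.splitlines():
        h = HEADER.match(line)
        if h:
            key = (h.group(1) + " " + h.group(2)).strip()
            current = mappings.setdefault(key, {})
            continue
        f = FIELD.match(line)
        if f and current is not None:
            current[f.group(1)] = int(f.group(2))
    return mappings


def private_growth(before, after):
    """Return [(mapping, delta_kB)] for mappings whose Private_Dirty grew, largest first."""
    growth = []
    for key, fields in after.items():
        old = before.get(key, {}).get("Private_Dirty", 0)
        delta = fields.get("Private_Dirty", 0) - old
        if delta > 0:
            growth.append((key, delta))
    return sorted(growth, key=lambda g: g[1], reverse=True)


if __name__ == "__main__":
    for mapping, delta in private_growth(parse_smaps(BEFORE), parse_smaps(AFTER)):
        print(f"{delta:8d} kB  {mapping}")
```

In practice the two inputs would be the contents of /proc/$pid/smaps for one child, read a few minutes apart. A mapping that keeps growing points at the allocator arena or data structure being written to.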