Re: screen goes blank when loading gma500_gfx (atom D2500)
19.03.2015 14:56, One Thousand Gnomes wrote:
> On Thu, 19 Mar 2015 14:09:29 +0300
> Michael Tokarev wrote:
>
>> Half a year passed since my first email in this thread, and current kernels
>> (4.0-tobe) still does not work properly.  Meanwhile, I found this thread:
>> http://www.linuxquestions.org/questions/slackware-installation-40/black-screen-on-intel-desktopboard-d2500cc-4175503983/
>> which seems to help.  I wonder where they got these boot params from...
>
> Its one of the standard suggestions for dealing with wonky DRM I think.
>
> If that makes the difference on your box can you send me a dmidecode of
> it, and I'll see if we can at least teach the driver that the 2500CC
> needs LVDS enabled regardless of what the BIOS reports.

OK, actually this is not so simple.

Yes, video=LVDS-1:d makes a difference.  Namely, it enables the monitor
connected to VGA-0 to function.  But once I plug in a digital monitor
(DVI-0), the screen goes blank when loading the module again, and this time
it does not matter whether I specify any video= options (I tried disabling
every combination of the listed adaptors): the screen is always blank.

So basically the thing is still unusable, because the d-sub connection isn't
stable (the picture "trembles" depending on the cable and environment
conditions), while the digital option does not work.

In the BIOS, there's an option to enable LVDS (it is disabled by default)
and, once enabled, to make it primary or secondary (with the other
secondary/primary output, d-sub or dvi, chosen either automatically or
manually).  When I enable LVDS together with any other monitor in the BIOS,
the thing fails again in the same way (the screen goes blank once the module
is loaded), but now the d-sub/vga monitor does not work either.

Out of curiosity I tried to run windows7 on this machine.  Apparently it
works with the dvi monitor just fine and supports a configuration with 2
monitors.  Maybe they have some quirks in the drivers, I dunno...

Thanks,

/mjt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
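
Trying "any combination" of video= options by hand is easy to get wrong, so
here is a minimal sketch that enumerates the candidate command-line
fragments.  The helper name video_disable_params is made up for
illustration, and the connector names passed in are only examples; the
video=<connector>:d form is the one already used in this thread.

```python
from itertools import combinations


def video_disable_params(connectors):
    """Yield one kernel command-line fragment per non-empty subset of
    connectors, disabling every connector in that subset."""
    for size in range(1, len(connectors) + 1):
        for subset in combinations(connectors, size):
            yield " ".join(f"video={name}:d" for name in subset)


if __name__ == "__main__":
    # Example names only; substitute the names the kernel reports
    # on the machine being tested.
    for params in video_disable_params(["LVDS-1", "VGA-1"]):
        print(params)
```

Each printed line is meant to be appended to the kernel command line for one
boot attempt, which keeps a record of which combinations were actually tried.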
Re: screen goes blank when loading gma500_gfx (atom D2500)
19.03.2015 23:05, One Thousand Gnomes wrote:
>> Yes, with video=LVDS-1:d boot parameter, kernel boots fine and there is
>> graphics/video output on the screen, with the following message from kernel
>> when loading gma500_gfx:
>>
>> [    6.472859] [drm] forcing LVDS-1 connector OFF
>>
>> (and a few others).
>>
>> There's one funky thing still -- the screen size is not calculated correctly
>> for the text (vga, d-sub) console, last text line is placed at about 3/4 of
>> the screen size, with the rest - 1/4 - of the screen being blank.
>
> I've seen that in one other case, where what was in fact happening was
> that forcing the connector "off" was actually effectively leaving it as
> the BIOS set it.

When I use video=LVDS-1:d on the kernel command line, that connector is not
shown by utilities such as xrandr at all.  There is, however, another
connector named LVDS-0, and there are also DVI-0, DVI-1, DisplayPort-0 and
DisplayPort-1, while this mobo only has DVI & D-SUB (and LVDS soldered on
board too) and no DP.  At least as far as I can see.

So at least one LVDS connector is shown anyway (LVDS-0, not LVDS-1), and
that one is "not connected".  Besides, DisplayPort-1 is shown as "connected"
by xrandr, with the monitor set to 1024x768 mode -- I think this is why the
text VGA size is calculated wrong..  Lemme see...

..nope.  Adding video=DisplayPort-1:d to the kernel command line (in
addition to video=LVDS-1:d) makes no difference, DisplayPort-1 is still
shown by xrandr as connected @1024x768.

> What happens if you then use xrandr to change the
> display sizes ?

X11 works fine as far as I can see.  Xrandr works and changes video modes.
Once I switch from X back to the text console, the text occupies only 3/4 of
the screen, as if the monitor was smaller.

I wonder if it will work with more than one monitor... ;)  I'll try,
hopefully today.

Thanks,

/mjt
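
Since xrandr shows connector names that do not obviously match the board, it
may help to look at what the kernel itself exposes.  The sketch below reads
the per-connector status files under /sys/class/drm; the function name
connector_status is made up for illustration.  X drivers may name and number
outputs differently from the kernel, so the names printed here, rather than
the xrandr ones, are the ones to compare against video= options.

```python
from pathlib import Path


def connector_status(drm_root="/sys/class/drm"):
    """Return {connector_name: status} for each connector under drm_root.

    Connector directories are named like "card0-VGA-1"; each one holds a
    "status" file containing "connected", "disconnected" or "unknown".
    """
    result = {}
    for status_file in sorted(Path(drm_root).glob("card*-*/status")):
        # Strip the leading "cardN-" to get the bare connector name.
        name = status_file.parent.name.split("-", 1)[1]
        result[name] = status_file.read_text().strip()
    return result


if __name__ == "__main__":
    for name, status in connector_status().items():
        print(f"{name}: {status}")
```

The drm_root argument exists only so the function can be pointed at a copy
of the sysfs tree; on a live system the default is what you want.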
Re: screen goes blank when loading gma500_gfx (atom D2500)
19.03.2015 14:56, One Thousand Gnomes wrote:
> On Thu, 19 Mar 2015 14:09:29 +0300
> Michael Tokarev wrote:
>
>> Half a year passed since my first email in this thread, and current kernels
>> (4.0-tobe) still does not work properly.  Meanwhile, I found this thread:
>> http://www.linuxquestions.org/questions/slackware-installation-40/black-screen-on-intel-desktopboard-d2500cc-4175503983/
>> which seems to help.  I wonder where they got these boot params from...
>
> Its one of the standard suggestions for dealing with wonky DRM I think.
>
> If that makes the difference on your box can you send me a dmidecode of
> it, and I'll see if we can at least teach the driver that the 2500CC
> needs LVDS enabled regardless of what the BIOS reports.

I think you mean disable, not enable, since this is (again, I think) what
video=LVDS-1:d kernel boot parameter does.

Yes, with video=LVDS-1:d boot parameter, kernel boots fine and there is
graphics/video output on the screen, with the following message from kernel
when loading gma500_gfx:

[    6.472859] [drm] forcing LVDS-1 connector OFF

(and a few others).

There's one funky thing still -- the screen size is not calculated correctly
for the text (vga, d-sub) console, last text line is placed at about 3/4 of
the screen size, with the rest - 1/4 - of the screen being blank.

However, X seems to work fine, using generic modesetting driver.

Below is dmidecode output.

Thanks,

/mjt

===
# dmidecode 2.12
SMBIOS 2.7 present.
27 structures occupying 1491 bytes.
Table at 0x000EB920.

Handle 0x0000, DMI type 4, 42 bytes
Processor Information
        Socket Designation: CPU 1
        Type: Central Processor
        Family: Other
        Manufacturer: Intel(R) Corporation
        ID: 61 06 03 00 FF FB EB BF
        Version: Intel(R) Atom(TM) CPU D2500 @ 1.86GHz
        Voltage: 1.1 V
        External Clock: 133 MHz
        Max Speed: 4000 MHz
        Current Speed: 1868 MHz
        Status: Populated, Enabled
        Upgrade: None
        L1 Cache Handle: 0x0003
        L2 Cache Handle: 0x0001
        L3 Cache Handle: Not Provided
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: Not Specified
        Core Count: 2
        Core Enabled: 2
        Thread Count: 1
        Characteristics:
                64-bit capable
                Multi-Core
                Execute Protection

Handle 0x0001, DMI type 7, 19 bytes
Cache Information
        Socket Designation: Unknown
        Configuration: Enabled, Not Socketed, Level 2
        Operational Mode: Write Back
        Location: Internal
        Installed Size: 512 kB
        Maximum Size: 512 kB
        Supported SRAM Types:
                Asynchronous
        Installed SRAM Type: Asynchronous
        Speed: Unknown
        Error Correction Type: Single-bit ECC
        System Type: Data
        Associativity: 8-way Set-associative

Handle 0x0002, DMI type 7, 19 bytes
Cache Information
        Socket Designation: Unknown
        Configuration: Enabled, Not Socketed, Level 1
        Operational Mode: Write Back
        Location: Internal
        Installed Size: 32 kB
        Maximum Size: 32 kB
        Supported SRAM Types:
                Asynchronous
        Installed SRAM Type: Asynchronous
        Speed: Unknown
        Error Correction Type: Single-bit ECC
        System Type: Instruction
        Associativity: 8-way Set-associative

Handle 0x0003, DMI type 7, 19 bytes
Cache Information
        Socket Designation: Unknown
        Configuration: Enabled, Not Socketed, Level 1
        Operational Mode: Write Back
        Location: Internal
        Installed Size: 24 kB
        Maximum Size: 24 kB
        Supported SRAM Types:
                Asynchronous
        Installed SRAM Type: Asynchronous
        Speed: Unknown
        Error Correction Type: Single-bit ECC
        System Type: Data
        Associativity: 32-way Set-associative

Handle 0x0004, DMI type 0, 24 bytes
BIOS Information
        Vendor: Intel Corp.
        Version: CCCDT10N.86A.0039.2013.0425.1625
        Release Date: 04/25/2013
        Address: 0xF0000
        Runtime Size: 64 kB
        ROM Size: 2048 kB
        Characteristics:
                PCI is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                Boot from CD is supported
                Selectable boot is supported
                EDD is supported
                8042 keyboard services are supported (int 9h)
                Serial services are supported (int 14h)
                Printer services are supported (int 17h)
                CGA/mono video services are supported (int 10h)
                ACPI is supported
                USB legacy is supported
                ATAPI Zip drive boot is supported
                BIOS boot specification is supported
                Function key-initiated network
Re: screen goes blank when loading gma500_gfx (atom D2500)
19.03.2015 14:09, Michael Tokarev wrote:
> Half a year passed since my first email in this thread, and current kernels

Actually it was more than a year, since Feb-2014 ;)

> (4.0-tobe) still does not work properly.  Meanwhile, I found this thread:
> http://www.linuxquestions.org/questions/slackware-installation-40/black-screen-on-intel-desktopboard-d2500cc-4175503983/
> which seems to help.  I wonder where they got these boot params from...
>
> Thanks,
>
> /mjt
Re: screen goes blank when loading gma500_gfx (atom D2500)
Half a year passed since my first email in this thread, and current kernels
(4.0-tobe) still does not work properly.  Meanwhile, I found this thread:
http://www.linuxquestions.org/questions/slackware-installation-40/black-screen-on-intel-desktopboard-d2500cc-4175503983/
which seems to help.  I wonder where they got these boot params from...

Thanks,

/mjt

05.08.2014 20:15, Michael Tokarev wrote:
> 05.08.2014 20:11, Michael Tokarev wrote:
>> Hello again.
>>
>> It's been 4 more months since last message in this thread (which was mine).
>> Now kernel 3.16 has been released, and I decided to give it a try.  And it
>> behaves just like all previous kernels, -- once gma500_gfx module is loaded,
>> screen goes blank, monitor turns off ("no signal detected") and nothing to
>> be seen until reboot.
>>
>> Can we try to debug this somehow, after more than half a year?... :)
>
> Current debugging (by 3.16), after:
>
>   modprobe drm debug=6
>   modprobe gma500_gfx
>
> on a freshly booted system:
>
> [   46.463381] Linux agpgart interface v0.103
> [   46.491487] [drm] Initialized drm 1.1.0 20060810
> [   56.585520] [drm:psb_intel_opregion_setup] Public ACPI methods supported
> [   56.585528] [drm:psb_intel_opregion_setup] ASLE supported
> [   56.585563] gma500 0000:00:02.0: irq 50 for MSI/MSI-X
> [   56.585591] [drm:psb_intel_init_bios] Using VBT from OpRegion: $VBT CEDARVIEW d
> [   56.585604] [drm:drm_mode_debug_printmodeline] Modeline 0:"1920x1080" 0 144000 1920 2016 2080 2176 1080 1088 1092 1100 0x8 0xa
> [   56.585609] [drm:parse_sdvo_device_mapping] No SDVO device info is found in VBT
> [   56.585617] [drm:parse_edp] EDP timing in vbt t1_t3 2000 t8 10 t9 2000 t10 500 t11_t12 5000
> [   56.585621] [drm:parse_edp] VBT reports EDP: Lane_count 1, Lane_rate 6, Bpp 24
> [   56.585624] [drm:parse_edp] VBT reports EDP: VSwing 0, Preemph 0
> [   56.598203] ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no)
> [   56.598902] acpi device:28: registered as cooling_device2
> [   56.599109] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input11
> [   56.599326] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [   56.599366] [drm] No driver support for vblank timestamp query.
> [   56.650918] [drm:drm_do_probe_ddc_edid] drm: skipping non-existent adapter intel drm LVDSDDC_C
> [   56.651842] [drm:cdv_intel_dp_i2c_init] i2c_init DPDDC-B
> [   56.652352] [drm:cdv_intel_dp_aux_ch] dp_aux_ch timeout status 0x51440064
> [   56.652356] [drm:cdv_intel_dp_i2c_aux_ch] aux_ch failed -110
> [   56.652863] [drm:cdv_intel_dp_aux_ch] dp_aux_ch timeout status 0x51440064
> [   56.652866] [drm:cdv_intel_dp_i2c_aux_ch] aux_ch failed -110
> [   56.653706] [drm:cdv_intel_dp_i2c_init] i2c_init DPDDC-C
> [   56.654014] [drm:cdv_intel_dp_i2c_aux_ch] aux_i2c nack
> [   56.654223] [drm:cdv_intel_dp_i2c_aux_ch] aux_i2c nack
> [   56.714765] gma500 0000:00:02.0: trying to get vblank count for disabled pipe 1
> [   56.714812] gma500 0000:00:02.0: trying to get vblank count for disabled pipe 1
> [   56.775220] [drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:10:VGA-1]
> [   56.900606] [drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:10:VGA-1] probed modes :
> [   56.900617] [drm:drm_mode_debug_printmodeline] Modeline 26:"1280x1024" 60 108000 1280 1328 1440 1688 1024 1025 1028 1066 0x48 0x5
> [   56.900624] [drm:drm_mode_debug_printmodeline] Modeline 36:"1280x1024" 75 135000 1280 1296 1440 1688 1024 1025 1028 1066 0x40 0x5
> [   56.900630] [drm:drm_mode_debug_printmodeline] Modeline 29:"1280x1024" 72 132840 1280 1368 1504 1728 1024 1025 1028 1067 0x0 0x6
> [   56.900637] [drm:drm_mode_debug_printmodeline] Modeline 28:"1152x864" 75 108000 1152 1216 1344 1600 864 865 868 900 0x40 0x5
> [   56.900643] [drm:drm_mode_debug_printmodeline] Modeline 37:"1024x768" 75 78800 1024 1040 1136 1312 768 769 772 800 0x40 0x5
> [   56.900649] [drm:drm_mode_debug_printmodeline] Modeline 38:"1024x768" 70 75000 1024 1048 1184 1328 768 771 777 806 0x40 0xa
> [   56.900656] [drm:drm_mode_debug_printmodeline] Modeline 39:"1024x768" 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
> [   56.900662] [drm:drm_mode_debug_printmodeline] Modeline 40:"832x624" 75 57284 832 864 928 1152 624 625 628 667 0x40 0xa
> [   56.900669] [drm:drm_mode_debug_printmodeline] Modeline 41:"800x600" 75 49500 800 816 896 1056 600 601 604 625 0x40 0x5
> [   56.900675] [drm:drm_mode_debug_printmodeline] Modeline 42:"800x600" 72 50000 800 856 976 1040 600 637 643
Re: screen goes blank when loading gma500_gfx (atom D2500)
05.08.2014 20:11, Michael Tokarev wrote:
> Hello again.
>
> It's been 4 more months since last message in this thread (which was mine).
> Now kernel 3.16 has been released, and I decided to give it a try.  And it
> behaves just like all previous kernels, -- once gma500_gfx module is loaded,
> screen goes blank, monitor turns off ("no signal detected") and nothing to
> be seen until reboot.
>
> Can we try to debug this somehow, after more than half a year?... :)

Current debugging (by 3.16), after:

  modprobe drm debug=6
  modprobe gma500_gfx

on a freshly booted system:

[   46.463381] Linux agpgart interface v0.103
[   46.491487] [drm] Initialized drm 1.1.0 20060810
[   56.585520] [drm:psb_intel_opregion_setup] Public ACPI methods supported
[   56.585528] [drm:psb_intel_opregion_setup] ASLE supported
[   56.585563] gma500 0000:00:02.0: irq 50 for MSI/MSI-X
[   56.585591] [drm:psb_intel_init_bios] Using VBT from OpRegion: $VBT CEDARVIEW d
[   56.585604] [drm:drm_mode_debug_printmodeline] Modeline 0:"1920x1080" 0 144000 1920 2016 2080 2176 1080 1088 1092 1100 0x8 0xa
[   56.585609] [drm:parse_sdvo_device_mapping] No SDVO device info is found in VBT
[   56.585617] [drm:parse_edp] EDP timing in vbt t1_t3 2000 t8 10 t9 2000 t10 500 t11_t12 5000
[   56.585621] [drm:parse_edp] VBT reports EDP: Lane_count 1, Lane_rate 6, Bpp 24
[   56.585624] [drm:parse_edp] VBT reports EDP: VSwing 0, Preemph 0
[   56.598203] ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no)
[   56.598902] acpi device:28: registered as cooling_device2
[   56.599109] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input11
[   56.599326] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[   56.599366] [drm] No driver support for vblank timestamp query.
[   56.650918] [drm:drm_do_probe_ddc_edid] drm: skipping non-existent adapter intel drm LVDSDDC_C
[   56.651842] [drm:cdv_intel_dp_i2c_init] i2c_init DPDDC-B
[   56.652352] [drm:cdv_intel_dp_aux_ch] dp_aux_ch timeout status 0x51440064
[   56.652356] [drm:cdv_intel_dp_i2c_aux_ch] aux_ch failed -110
[   56.652863] [drm:cdv_intel_dp_aux_ch] dp_aux_ch timeout status 0x51440064
[   56.652866] [drm:cdv_intel_dp_i2c_aux_ch] aux_ch failed -110
[   56.653706] [drm:cdv_intel_dp_i2c_init] i2c_init DPDDC-C
[   56.654014] [drm:cdv_intel_dp_i2c_aux_ch] aux_i2c nack
[   56.654223] [drm:cdv_intel_dp_i2c_aux_ch] aux_i2c nack
[   56.714765] gma500 0000:00:02.0: trying to get vblank count for disabled pipe 1
[   56.714812] gma500 0000:00:02.0: trying to get vblank count for disabled pipe 1
[   56.775220] [drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:10:VGA-1]
[   56.900606] [drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:10:VGA-1] probed modes :
[   56.900617] [drm:drm_mode_debug_printmodeline] Modeline 26:"1280x1024" 60 108000 1280 1328 1440 1688 1024 1025 1028 1066 0x48 0x5
[   56.900624] [drm:drm_mode_debug_printmodeline] Modeline 36:"1280x1024" 75 135000 1280 1296 1440 1688 1024 1025 1028 1066 0x40 0x5
[   56.900630] [drm:drm_mode_debug_printmodeline] Modeline 29:"1280x1024" 72 132840 1280 1368 1504 1728 1024 1025 1028 1067 0x0 0x6
[   56.900637] [drm:drm_mode_debug_printmodeline] Modeline 28:"1152x864" 75 108000 1152 1216 1344 1600 864 865 868 900 0x40 0x5
[   56.900643] [drm:drm_mode_debug_printmodeline] Modeline 37:"1024x768" 75 78800 1024 1040 1136 1312 768 769 772 800 0x40 0x5
[   56.900649] [drm:drm_mode_debug_printmodeline] Modeline 38:"1024x768" 70 75000 1024 1048 1184 1328 768 771 777 806 0x40 0xa
[   56.900656] [drm:drm_mode_debug_printmodeline] Modeline 39:"1024x768" 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
[   56.900662] [drm:drm_mode_debug_printmodeline] Modeline 40:"832x624" 75 57284 832 864 928 1152 624 625 628 667 0x40 0xa
[   56.900669] [drm:drm_mode_debug_printmodeline] Modeline 41:"800x600" 75 49500 800 816 896 1056 600 601 604 625 0x40 0x5
[   56.900675] [drm:drm_mode_debug_printmodeline] Modeline 42:"800x600" 72 50000 800 856 976 1040 600 637 643 666 0x40 0x5
[   56.900681] [drm:drm_mode_debug_printmodeline] Modeline 30:"800x600" 60 40000 800 840 968 1056 600 601 605 628 0x40 0x5
[   56.900687] [drm:drm_mode_debug_printmodeline] Modeline 31:"640x480" 75 31500 640 656 720 840 480 481 484 500 0x40 0xa
[   56.900694] [drm:drm_mode_debug_printmodeline] Modeline 32:"640x480" 73 31500 640 664 704 832 480 489 491 520 0x40 0xa
[   56.900700] [drm:drm_mode_debug_printmodeline] Modeline 33:"640x480" 67 30240 640 704 768 864 480 483 486 525 0x40 0xa
[   56.900706] [drm:drm_mode_debug_printmodeline] Modeline 34:"640x480" 60 25200 640 656 752 800 480 490 492 525 0x40 0xa
[   56.900713] [drm:drm_mode_debug_printmodeline] Modeline 35:
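
The probed-mode lines above are dense, so here is a rough sketch of a parser
for them.  It assumes the field order that drm_mode_debug_printmodeline
prints in kernels of this era (object id, name, nominal refresh, pixel clock
in kHz, four horizontal timings, four vertical timings, type, flags); the
function name parse_modeline and the computed_refresh_hz key are made up for
illustration.

```python
import re

# One "Modeline" debug record as printed by the DRM core.
MODELINE_RE = re.compile(
    r'Modeline (?P<id>\d+):"(?P<name>[^"]+)" (?P<vrefresh>\d+) (?P<clock>\d+) '
    r'(?P<hdisplay>\d+) (?P<hsync_start>\d+) (?P<hsync_end>\d+) (?P<htotal>\d+) '
    r'(?P<vdisplay>\d+) (?P<vsync_start>\d+) (?P<vsync_end>\d+) (?P<vtotal>\d+) '
    r'0x(?P<type>[0-9a-f]+) 0x(?P<flags>[0-9a-f]+)'
)


def parse_modeline(line):
    """Parse one dmesg line; return a dict of mode fields, or None."""
    match = MODELINE_RE.search(line)
    if match is None:
        return None
    mode = {}
    for key, value in match.groupdict().items():
        if key == "name":
            mode[key] = value
        elif key in ("type", "flags"):
            mode[key] = int(value, 16)  # printed in hex
        else:
            mode[key] = int(value)
    # The clock is in kHz, so refresh = clock * 1000 / (htotal * vtotal).
    total = mode["htotal"] * mode["vtotal"]
    mode["computed_refresh_hz"] = mode["clock"] * 1000 / total if total else None
    return mode
```

Comparing computed_refresh_hz with the nominal vrefresh field is a quick
sanity check that a line in a pasted log has not been mangled in transit.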
Re: screen goes blank when loading gma500_gfx (atom D2500)
Hello again.

It's been 4 more months since last message in this thread (which was mine).
Now kernel 3.16 has been released, and I decided to give it a try. And it
behaves just like all previous kernels, -- once gma500_gfx module is loaded,
screen goes blank, monitor turns off ("no signal detected") and nothing to be
seen until reboot.

Can we try to debug this somehow, after more than half a year?... :)

Thank you,

/mjt

05.04.2014 12:15, Michael Tokarev wrote:
> Hello again
>
> It's been about 2 months since I sent the original debugging output. Today I tried
> out 3.14 kernel. And this one behaves quite similarly, screen goes blank right
> when loading gma500_gfx module. Here's the dmesg from a freshly booted system
> after doing
>
>   modprobe drm debug=6
>   modprobe gma500_gfx
>
> with a monitor connected to VGA port (before loading gma500_gfx, it displays the
> regular text console):
>
> [ 39.863330] Linux agpgart interface v0.103
> [ 39.900511] [drm] Initialized drm 1.1.0 20060810
> [ 45.012300] [drm:psb_intel_opregion_setup], Public ACPI methods supported
> [ 45.012308] [drm:psb_intel_opregion_setup], ASLE supported
> [ 45.012345] gma500 0000:00:02.0: irq 50 for MSI/MSI-X
> [ 45.012371] [drm:psb_intel_init_bios], Using VBT from OpRegion: $VBT CEDARVIEW d
> [ 45.012384] [drm:drm_mode_debug_printmodeline], Modeline 0:"1920x1080" 0 144000 1920 2016 2080 2176 1080 1088 1092 1100 0x8 0xa
> [ 45.012389] [drm:parse_sdvo_device_mapping], No SDVO device info is found in VBT
> [ 45.012397] [drm:parse_edp], EDP timing in vbt t1_t3 2000 t8 10 t9 2000 t10 500 t11_t12 5000
> [ 45.012401] [drm:parse_edp], VBT reports EDP: Lane_count 1, Lane_rate 6, Bpp 24
> [ 45.012405] [drm:parse_edp], VBT reports EDP: VSwing 0, Preemph 0
> [ 45.012478] gma500 0000:00:02.0: GPU: power management timed out.
> [ 45.026195] ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no)
> [ 45.026891] acpi device:29: registered as cooling_device2
> [ 45.027104] input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A08:00/LNXVIDEO:00/input/input11
> [ 45.027681] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [ 45.027726] [drm] No driver support for vblank timestamp query.
> [ 45.078928] [drm:drm_do_probe_ddc_edid], drm: skipping non-existent adapter intel drm LVDSDDC_C
> [ 45.079839] [drm:cdv_intel_dp_i2c_init], i2c_init DPDDC-B
> [ 45.080383] [drm:cdv_intel_dp_aux_ch], dp_aux_ch timeout status 0x51440064
> [ 45.080388] [drm:cdv_intel_dp_i2c_aux_ch], aux_ch failed -110
> [ 45.080896] [drm:cdv_intel_dp_aux_ch], dp_aux_ch timeout status 0x51440064
> [ 45.080899] [drm:cdv_intel_dp_i2c_aux_ch], aux_ch failed -110
> [ 45.081754] [drm:cdv_intel_dp_i2c_init], i2c_init DPDDC-C
> [ 45.082062] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
> [ 45.082272] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
> [ 45.122742] [drm:cdv_intel_single_pipe_active], pipe enabled 0
> [ 45.142780] gma500 0000:00:02.0: trying to get vblank count for disabled pipe 1
> [ 45.142826] gma500 0000:00:02.0: trying to get vblank count for disabled pipe 1
> [ 45.183207] [drm:cdv_intel_single_pipe_active], pipe enabled 0
> [ 45.203249] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:7:VGA-1]
> [ 45.332286] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:7:VGA-1] probed modes :
> [ 45.332297] [drm:drm_mode_debug_printmodeline], Modeline 23:"1280x1024" 60 108000 1280 1328 1440 1688 1024 1025 1028 1066 0x48 0x5
> [ 45.332304] [drm:drm_mode_debug_printmodeline], Modeline 33:"1280x1024" 75 135000 1280 1296 1440 1688 1024 1025 1028 1066 0x40 0x5
> [ 45.332311] [drm:drm_mode_debug_printmodeline], Modeline 26:"1280x1024" 72 132840 1280 1368 1504 1728 1024 1025 1028 1067 0x0 0x6
> [ 45.332318] [drm:drm_mode_debug_printmodeline], Modeline 25:"1152x864" 75 108000 1152 1216 1344 1600 864 865 868 900 0x40 0x5
> [ 45.332325] [drm:drm_mode_debug_printmodeline], Modeline 34:"1024x768" 75 78800 1024 1040 1136 1312 768 769 772 800 0x40 0x5
> [ 45.332332] [drm:drm_mode_debug_printmodeline], Modeline 35:"1024x768" 70 75000 1024 1048 1184 1328 768 771 777 806 0x40 0xa
> [ 45.332338] [drm:drm_mode_debug_printmodeline], Modeline 36:"1024x768" 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
> [ 45.332345] [drm:drm_mode_debug_printmodeline], Modeline 37:"832x624" 75 57284 832 864 928 1152 624 625 628 667 0x40 0xa
> [ 45.332352] [drm:drm_mode_debug_printmodeline], Modeline 38:"800x600" 75 49500 800 816 896 1056 600 601 604 625 0x40 0x5
> [ 45.332359] [drm:dr
Re: screen goes blank when loading gma500_gfx (atom D2500)
05.08.2014 20:11, Michael Tokarev wrote:
> Hello again.
>
> It's been 4 more months since last message in this thread (which was mine).
> Now kernel 3.16 has been released, and I decided to give it a try. And it
> behaves just like all previous kernels, -- once gma500_gfx module is loaded,
> screen goes blank, monitor turns off ("no signal detected") and nothing to be
> seen until reboot.
>
> Can we try to debug this somehow, after more than half a year?... :)

Current debugging (by 3.16), after:

  modprobe drm debug=6
  modprobe gma500_gfx

on a freshly booted system:

[ 46.463381] Linux agpgart interface v0.103
[ 46.491487] [drm] Initialized drm 1.1.0 20060810
[ 56.585520] [drm:psb_intel_opregion_setup] Public ACPI methods supported
[ 56.585528] [drm:psb_intel_opregion_setup] ASLE supported
[ 56.585563] gma500 0000:00:02.0: irq 50 for MSI/MSI-X
[ 56.585591] [drm:psb_intel_init_bios] Using VBT from OpRegion: $VBT CEDARVIEW d
[ 56.585604] [drm:drm_mode_debug_printmodeline] Modeline 0:"1920x1080" 0 144000 1920 2016 2080 2176 1080 1088 1092 1100 0x8 0xa
[ 56.585609] [drm:parse_sdvo_device_mapping] No SDVO device info is found in VBT
[ 56.585617] [drm:parse_edp] EDP timing in vbt t1_t3 2000 t8 10 t9 2000 t10 500 t11_t12 5000
[ 56.585621] [drm:parse_edp] VBT reports EDP: Lane_count 1, Lane_rate 6, Bpp 24
[ 56.585624] [drm:parse_edp] VBT reports EDP: VSwing 0, Preemph 0
[ 56.598203] ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no)
[ 56.598902] acpi device:28: registered as cooling_device2
[ 56.599109] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input11
[ 56.599326] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 56.599366] [drm] No driver support for vblank timestamp query.
[ 56.650918] [drm:drm_do_probe_ddc_edid] drm: skipping non-existent adapter intel drm LVDSDDC_C
[ 56.651842] [drm:cdv_intel_dp_i2c_init] i2c_init DPDDC-B
[ 56.652352] [drm:cdv_intel_dp_aux_ch] dp_aux_ch timeout status 0x51440064
[ 56.652356] [drm:cdv_intel_dp_i2c_aux_ch] aux_ch failed -110
[ 56.652863] [drm:cdv_intel_dp_aux_ch] dp_aux_ch timeout status 0x51440064
[ 56.652866] [drm:cdv_intel_dp_i2c_aux_ch] aux_ch failed -110
[ 56.653706] [drm:cdv_intel_dp_i2c_init] i2c_init DPDDC-C
[ 56.654014] [drm:cdv_intel_dp_i2c_aux_ch] aux_i2c nack
[ 56.654223] [drm:cdv_intel_dp_i2c_aux_ch] aux_i2c nack
[ 56.714765] gma500 0000:00:02.0: trying to get vblank count for disabled pipe 1
[ 56.714812] gma500 0000:00:02.0: trying to get vblank count for disabled pipe 1
[ 56.775220] [drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:10:VGA-1]
[ 56.900606] [drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:10:VGA-1] probed modes :
[ 56.900617] [drm:drm_mode_debug_printmodeline] Modeline 26:"1280x1024" 60 108000 1280 1328 1440 1688 1024 1025 1028 1066 0x48 0x5
[ 56.900624] [drm:drm_mode_debug_printmodeline] Modeline 36:"1280x1024" 75 135000 1280 1296 1440 1688 1024 1025 1028 1066 0x40 0x5
[ 56.900630] [drm:drm_mode_debug_printmodeline] Modeline 29:"1280x1024" 72 132840 1280 1368 1504 1728 1024 1025 1028 1067 0x0 0x6
[ 56.900637] [drm:drm_mode_debug_printmodeline] Modeline 28:"1152x864" 75 108000 1152 1216 1344 1600 864 865 868 900 0x40 0x5
[ 56.900643] [drm:drm_mode_debug_printmodeline] Modeline 37:"1024x768" 75 78800 1024 1040 1136 1312 768 769 772 800 0x40 0x5
[ 56.900649] [drm:drm_mode_debug_printmodeline] Modeline 38:"1024x768" 70 75000 1024 1048 1184 1328 768 771 777 806 0x40 0xa
[ 56.900656] [drm:drm_mode_debug_printmodeline] Modeline 39:"1024x768" 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
[ 56.900662] [drm:drm_mode_debug_printmodeline] Modeline 40:"832x624" 75 57284 832 864 928 1152 624 625 628 667 0x40 0xa
[ 56.900669] [drm:drm_mode_debug_printmodeline] Modeline 41:"800x600" 75 49500 800 816 896 1056 600 601 604 625 0x40 0x5
[ 56.900675] [drm:drm_mode_debug_printmodeline] Modeline 42:"800x600" 72 50000 800 856 976 1040 600 637 643 666 0x40 0x5
[ 56.900681] [drm:drm_mode_debug_printmodeline] Modeline 30:"800x600" 60 40000 800 840 968 1056 600 601 605 628 0x40 0x5
[ 56.900687] [drm:drm_mode_debug_printmodeline] Modeline 31:"640x480" 75 31500 640 656 720 840 480 481 484 500 0x40 0xa
[ 56.900694] [drm:drm_mode_debug_printmodeline] Modeline 32:"640x480" 73 31500 640 664 704 832 480 489 491 520 0x40 0xa
[ 56.900700] [drm:drm_mode_debug_printmodeline] Modeline 33:"640x480" 67 30240 640 704 768 864 480 483 486 525 0x40 0xa
[ 56.900706] [drm:drm_mode_debug_printmodeline] Modeline 34:"640x480" 60 25200 640 656 752 800 480 490 492 525 0x40 0xa
[ 56.900713] [drm:drm_mode_debug_printmodeline] Modeline 35:"720x400" 70 28320 720 738 846 900 400 412 414 449 0x40 0x6
[ 56.900719] [drm:drm_mode_debug_printmodeline] Modeline 27:"640x350" 70 25170 640 656 752 800 350 387 389 449 0x40 0x9
[ 56.900724] [drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:12:LVDS-1
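For readers checking the probed modes in the log above: each Modeline debug entry lists the mode id and name, the nominal refresh rate, the pixel clock in kHz, four horizontal timings ending in the total line width, four vertical timings ending in the total frame height, and two hex fields (type and flags). A minimal sketch of how the refresh rate follows from the clock and the two totals is below. The struct and function names are illustrative only and are not kernel API, and the calculation ignores interlace and doublescan flags.

```c
/* Fields of one "Modeline" debug entry that determine the refresh rate.
 * Names are hypothetical, chosen for this sketch. */
struct modeline_timing {
    unsigned int clock_khz; /* pixel clock in kHz, e.g. 108000 */
    unsigned int htotal;    /* pixels per scanline including blanking */
    unsigned int vtotal;    /* scanlines per frame including blanking */
};

/* Refresh rate in Hz: pixels per second divided by pixels per frame.
 * Returns 0.0 if either total is zero, to avoid dividing by zero. */
static double modeline_refresh_hz(const struct modeline_timing *m)
{
    if (m->htotal == 0 || m->vtotal == 0)
        return 0.0;
    return (double)m->clock_khz * 1000.0 /
           ((double)m->htotal * (double)m->vtotal);
}
```

Feeding it the first VGA-1 entry (clock 108000, htotal 1688, vtotal 1066) gives a value just above 60 Hz, which agrees with the "60" printed in that entry.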
Re: [PATCH] arch: x86: kvm: x86.c: Cleaning up uninitialized variables
03.06.2014 16:04, Paolo Bonzini wrote:
> On 01/06/2014 01:05, Rickard Strandqvist wrote:
>> There is a risk that the variable will be used without being initialized.
>>
>> This was largely found by using a static code analysis program called
>> cppcheck.
>>
>> Signed-off-by: Rickard Strandqvist
>
> No, there isn't. The full context looks like this:
>
>         longmode = is_long_mode(vcpu) && cs_l == 1;
>         if (!longmode) {
>                 param = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDX) << 32) |
>                         (kvm_register_read(vcpu, VCPU_REGS_RAX) & 0xffffffff);
>                 ingpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RBX) << 32) |
>                         (kvm_register_read(vcpu, VCPU_REGS_RCX) & 0xffffffff);
>                 outgpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDI) << 32) |
>                         (kvm_register_read(vcpu, VCPU_REGS_RSI) & 0xffffffff);
>         }
> #ifdef CONFIG_X86_64
>         else {
>                 param = kvm_register_read(vcpu, VCPU_REGS_RCX);
>                 ingpa = kvm_register_read(vcpu, VCPU_REGS_RDX);
>                 outgpa = kvm_register_read(vcpu, VCPU_REGS_R8);
>         }
> #endif
>
> and longmode must be zero if !CONFIG_X86_64:

This is not the first time someone has tried to change this code.
Maybe adding an additional #ifdef..#endif around the longmode assignment
and the "if" above will solve this for good? Or maybe something like
this:

#ifdef CONFIG_X86_64
        if (!(is_long_mode(vcpu) && cs_l == 1)) {
#else
        if (1) {
#endif
                param = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDX) << 32) |
                        (kvm_register_read(vcpu, VCPU_REGS_RAX) & 0xffffffff);
                ingpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RBX) << 32) |
                        (kvm_register_read(vcpu, VCPU_REGS_RCX) & 0xffffffff);
                outgpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDI) << 32) |
                        (kvm_register_read(vcpu, VCPU_REGS_RSI) & 0xffffffff);
        } else {
                param = kvm_register_read(vcpu, VCPU_REGS_RCX);
                ingpa = kvm_register_read(vcpu, VCPU_REGS_RDX);
                outgpa = kvm_register_read(vcpu, VCPU_REGS_R8);
        }

, to make it all explicit and obvious?

Thanks,

/mjt
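The shape of the suggestion above can be tried outside the kernel. The following is only a user-space sketch under stated assumptions: MODEL_X86_64 stands in for CONFIG_X86_64, a plain array stands in for kvm_register_read(), and every name in it (combine_halves, read_params, struct hv_params, the register enum) is hypothetical. It shows that with the condition itself under #ifdef, all three outputs are assigned on every path in both configurations, which is the property the static checker could not see.

```c
#include <stdint.h>

/* Hypothetical register indices for this model only. */
enum { RAX, RBX, RCX, RDX, RSI, RDI, R8, NREGS };

struct hv_params {
    uint64_t param, ingpa, outgpa;
};

/* Build a 64-bit value from two registers carrying 32 useful bits each:
 * the shift discards the upper bits of the high half, the mask discards
 * the upper bits of the low half. */
static uint64_t combine_halves(uint64_t hi_reg, uint64_t lo_reg)
{
    return (hi_reg << 32) | (lo_reg & 0xffffffffu);
}

/* Model of the proposed structure: the else branch always exists, so
 * every field of p is written regardless of configuration. */
static struct hv_params read_params(const uint64_t regs[NREGS], int long_mode)
{
    struct hv_params p;

#ifdef MODEL_X86_64
    if (!long_mode) {
#else
    (void)long_mode; /* never in long mode without 64-bit support */
    if (1) {
#endif
        p.param  = combine_halves(regs[RDX], regs[RAX]);
        p.ingpa  = combine_halves(regs[RBX], regs[RCX]);
        p.outgpa = combine_halves(regs[RDI], regs[RSI]);
    } else {
        p.param  = regs[RCX];
        p.ingpa  = regs[RDX];
        p.outgpa = regs[R8];
    }
    return p;
}
```

Built without MODEL_X86_64 the else branch is dead code that the compiler can drop; built with it, passing a non-zero long_mode selects the full-width registers.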
Re: screen goes blank when loading gma500_gfx (atom D2500)
51935] [drm:drm_target_preferred], looking for preferred mode on connector 9
[ 45.351938] [drm:drm_target_preferred], found mode 1920x1080
[ 45.351942] [drm:drm_target_preferred], looking for cmdline mode on connector 20
[ 45.351945] [drm:drm_target_preferred], looking for preferred mode on connector 20
[ 45.351949] [drm:drm_target_preferred], found mode 1024x768
[ 45.351953] [drm:drm_setup_crtcs], picking CRTCs for 4096x4096 config
[ 45.351962] [drm:drm_setup_crtcs], desired mode 1280x1024 set on crtc 3
[ 45.351967] [drm:drm_setup_crtcs], desired mode 1920x1080 set on crtc 4
[ 45.351987] [drm] Initialized gma500 1.0.0 2011-06-06 for 0000:00:02.0 on minor 0

Thank you!

/mjt

15.02.2014 22:28, Michael Tokarev wrote:
> 10.02.2014 14:44, One Thousand Gnomes wrote:
>>> fbcon is loaded so it isn't an issue.
>>>
>>> I tried 3.10 kernel initially (the above messages are from it), next
>>> I tried 3.13 kernel too, and that one behaves exactly the same.
>>>
>>> As far as I remember, this system never worked with graphics well.
>>> Previous kernel (from which I updated) was 3.2 which had no
>>> gma500 module (local build).
>>>
>>> What are the steps to debug this further?
>>
>> Check you have X86_SYSFB and SIMPLEFB disabled
>
> Neither of these options exists in 3.10 config. In 3.13 I had X86_SYSFB set
> to y initially (SIMPLEFB doesn't exist there too), but setting it to n does
> not make any difference.
>
>> Boot with drm.debug=6
>>
>> collect the logs
>
> I used `modprobe drm debug=6' (initially booting with gma500_gfx module
> disabled), followed with `modprobe gma500_gfx'. After loading module
> the screen goes blank as before, and monitor says 'no signal detected'.
>
> Here are the logs:
>
> [599286.739923] Linux agpgart interface v0.103
> [599286.765176] [drm] Initialized drm 1.1.0 20060810
> [599303.673734] gma500 0000:00:02.0: setting latency timer to 64
> [599303.673883] [drm:psb_intel_opregion_setup], Public ACPI methods supported
> [599303.673887] [drm:psb_intel_opregion_setup], ASLE supported
> [599303.673923] gma500 0000:00:02.0: irq 50 for MSI/MSI-X
> [599303.673950] [drm:psb_intel_init_bios], Using VBT from OpRegion: $VBT CEDARVIEW d
> [599303.673959] [drm:drm_mode_debug_printmodeline], Modeline 0:"1920x1080" 0 144000 1920 2016 2080 2176 1080 1088 1092 1100 0x8 0xa
> [599303.673969] [drm:parse_sdvo_device_mapping], No SDVO device info is found in VBT
> [599303.673975] [drm:parse_edp], EDP timing in vbt t1_t3 2000 t8 10 t9 2000 t10 500 t11_t12 5000
> [599303.673980] [drm:parse_edp], VBT reports EDP: Lane_count 1, Lane_rate 6, Bpp 24
> [599303.673984] [drm:parse_edp], VBT reports EDP: VSwing 0, Preemph 0
> [599303.688094] acpi device:29: registered as cooling_device2
> [599303.688446] ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no)
> [599303.688557] input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A08:00/LNXVIDEO:00/input/input11
> [599303.689160] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
> [599303.689188] [drm] No driver support for vblank timestamp query.
> [599303.740423] [drm:drm_do_probe_ddc_edid], drm: skipping non-existent adapter intel drm LVDSDDC_C
> [599303.741222] [drm:cdv_intel_dp_i2c_init], i2c_init DPDDC-B
> [599303.741732] [drm:cdv_intel_dp_aux_ch], dp_aux_ch timeout status 0x51440064
> [599303.741736] [drm:cdv_intel_dp_i2c_aux_ch], aux_ch failed -110
> [599303.742242] [drm:cdv_intel_dp_aux_ch], dp_aux_ch timeout status 0x51440064
> [599303.742246] [drm:cdv_intel_dp_i2c_aux_ch], aux_ch failed -110
> [599303.742997] [drm:cdv_intel_dp_i2c_init], i2c_init DPDDC-C
> [599303.743305] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
> [599303.743510] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
> [599303.783922] [drm:cdv_intel_single_pipe_active], pipe enabled 0
> [599303.803958] gma500 0000:00:02.0: trying to get vblank count for disabled pipe 1
> [599303.803996] gma500 0000:00:02.0: trying to get vblank count for disabled pipe 1
> [599303.844370] [drm:cdv_intel_single_pipe_active], pipe enabled 0
> [599303.864408] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:7:VGA-1]
> [599303.877172] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:7:VGA-1] disconnected
> [599303.877184] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:9:LVDS-1]
> [599303.881764] [drm:drm_do_probe_ddc_edid], drm: skipping non-existent adapter intel drm LVDSBLC_B
> [599303.881778] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:9:LVDS-1] probed modes :
> [599303.881783] [drm:drm_mode_debug_printmodeline], Modeline 22:"1920x1080" 60 144000 192
Re: [Qemu-devel] Massive read only kvm guests when backing file was missing
27.03.2014 20:14, Alejandro Comisario wrote:
> Seems like virtio (kvm 1.0) doesnt expose timeout on the guest side
> (ubuntu 12.04 on host and guest).
> So, how can i adjust the tinmeout on the guest ?

After some more discussion on IRC yesterday, it turned out that the
situation is _much_ more "interesting" than originally described. The OP
claims to have 10500 guests running off an NFS server, and that after NFS
server downtime the "backing files" had disappeared (whatever that means),
so they had to restore those files. Moreover, the OP didn't even bother to
look at the guests' dmesg, being busy rebooting all 10500 guests.

> This solution is the most logical one, but i cannot apply it!
> thanks for all the responses!

I suggested that the OP describe the _real_ situation instead of giving
random half-pictures, take a look at the actual problem as reported in
various places (most importantly the guest kernel log), and report _those_
hints to the list.

I also mentioned that, at least for some NFS servers, if a client has a
file open on the server and that file is deleted, the server will report
an error to the client when the client tries to access the file, and this
has nothing at all to do with timeouts of any kind.

Thanks,

/mjt
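To make the last point concrete, here is a small sketch of how a host-side tool could tell the two failure kinds apart by errno. It rests on an assumption not stated in the thread: that on a Linux NFS client, access to a file removed on the server typically fails with ESTALE ("Stale file handle"), while a genuine timeout shows up as ETIMEDOUT. The enum and function names are made up for this example.

```c
#include <errno.h>

/* Coarse classification of an I/O failure by its errno value.
 * Names are hypothetical, chosen for this sketch. */
enum io_failure {
    IO_FAILURE_TIMEOUT,      /* the operation timed out */
    IO_FAILURE_STALE_HANDLE, /* the file no longer exists on the server */
    IO_FAILURE_OTHER         /* anything else, e.g. EIO */
};

/* Map an errno value (positive, as found in errno after a failed
 * read or write) to one of the classes above. */
static enum io_failure classify_io_errno(int err)
{
    switch (err) {
    case ESTALE:
        return IO_FAILURE_STALE_HANDLE;
    case ETIMEDOUT:
        return IO_FAILURE_TIMEOUT;
    default:
        return IO_FAILURE_OTHER;
    }
}
```

A stale-handle result would point at missing files rather than at any timeout setting, which is why the guest kernel log and the host-side error matter more here than tuning timeouts.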
Re: screen goes blank when loading gma500_gfx (atom D2500)
10.02.2014 14:44, One Thousand Gnomes wrote:
>> fbcon is loaded so it isn't an issue.
>>
>> I tried 3.10 kernel initially (the above messages are from it), next
>> I tried 3.13 kernel too, and that one behaves exactly the same.
>>
>> As far as I remember, this system never worked with graphics well.
>> Previous kernel (from which I updated) was 3.2 which had no
>> gma500 module (local build).
>>
>> What are the steps to debug this further?
>
> Check you have X86_SYSFB and SIMPLEFB disabled

Neither of these options exists in the 3.10 config. In 3.13 I had
X86_SYSFB set to y initially (SIMPLEFB doesn't exist there either),
but setting it to n does not make any difference.

> Boot with drm.debug=6
>
> collect the logs

I used `modprobe drm debug=6' (initially booting with the gma500_gfx
module disabled), followed by `modprobe gma500_gfx'. After loading the
module the screen goes blank as before, and the monitor says 'no signal
detected'.

Here are the logs:

[599286.739923] Linux agpgart interface v0.103
[599286.765176] [drm] Initialized drm 1.1.0 20060810
[599303.673734] gma500 :00:02.0: setting latency timer to 64
[599303.673883] [drm:psb_intel_opregion_setup], Public ACPI methods supported
[599303.673887] [drm:psb_intel_opregion_setup], ASLE supported
[599303.673923] gma500 :00:02.0: irq 50 for MSI/MSI-X
[599303.673950] [drm:psb_intel_init_bios], Using VBT from OpRegion: $VBT CEDARVIEW d
[599303.673959] [drm:drm_mode_debug_printmodeline], Modeline 0:"1920x1080" 0 144000 1920 2016 2080 2176 1080 1088 1092 1100 0x8 0xa
[599303.673969] [drm:parse_sdvo_device_mapping], No SDVO device info is found in VBT
[599303.673975] [drm:parse_edp], EDP timing in vbt t1_t3 2000 t8 10 t9 2000 t10 500 t11_t12 5000
[599303.673980] [drm:parse_edp], VBT reports EDP: Lane_count 1, Lane_rate 6, Bpp 24
[599303.673984] [drm:parse_edp], VBT reports EDP: VSwing 0, Preemph 0
[599303.688094] acpi device:29: registered as cooling_device2
[599303.688446] ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no)
[599303.688557] input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A08:00/LNXVIDEO:00/input/input11
[599303.689160] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[599303.689188] [drm] No driver support for vblank timestamp query.
[599303.740423] [drm:drm_do_probe_ddc_edid], drm: skipping non-existent adapter intel drm LVDSDDC_C
[599303.741222] [drm:cdv_intel_dp_i2c_init], i2c_init DPDDC-B
[599303.741732] [drm:cdv_intel_dp_aux_ch], dp_aux_ch timeout status 0x51440064
[599303.741736] [drm:cdv_intel_dp_i2c_aux_ch], aux_ch failed -110
[599303.742242] [drm:cdv_intel_dp_aux_ch], dp_aux_ch timeout status 0x51440064
[599303.742246] [drm:cdv_intel_dp_i2c_aux_ch], aux_ch failed -110
[599303.742997] [drm:cdv_intel_dp_i2c_init], i2c_init DPDDC-C
[599303.743305] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
[599303.743510] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
[599303.783922] [drm:cdv_intel_single_pipe_active], pipe enabled 0
[599303.803958] gma500 :00:02.0: trying to get vblank count for disabled pipe 1
[599303.803996] gma500 :00:02.0: trying to get vblank count for disabled pipe 1
[599303.844370] [drm:cdv_intel_single_pipe_active], pipe enabled 0
[599303.864408] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:7:VGA-1]
[599303.877172] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:7:VGA-1] disconnected
[599303.877184] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:9:LVDS-1]
[599303.881764] [drm:drm_do_probe_ddc_edid], drm: skipping non-existent adapter intel drm LVDSBLC_B
[599303.881778] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:9:LVDS-1] probed modes :
[599303.881783] [drm:drm_mode_debug_printmodeline], Modeline 22:"1920x1080" 60 144000 1920 2016 2080 2176 1080 1088 1092 1100 0x8 0xa
[599303.881791] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:12:DVI-D-1]
[599303.886292] [drm:drm_do_probe_ddc_edid], drm: skipping non-existent adapter intel drm HDMIB
[599303.886298] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:12:DVI-D-1] disconnected
[599303.886304] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:14:DP-1]
[599303.886811] [drm:cdv_intel_dp_aux_ch], dp_aux_ch timeout status 0x51440064
[599303.886815] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:14:DP-1] disconnected
[599303.886820] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:18:DVI-D-2]
[599303.891350] [drm:drm_do_probe_ddc_edid], drm: skipping non-existent adapter intel drm HDMIC
[599303.891357] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:18:DVI-D-2] disconnected
[599303.891362] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:20:DP-2]
[599303.891569] [drm:cdv_dp_detect], DPCD: Rev=11 LN_Rate=a LN_CNT=82 LN_DOWNSP=41
[599303.891876] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
[599303.892082] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
[599303.892085] [drm:i2c_algo_dp_aux_xfer], dp_aux_xfer return -121
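A note for reading the log: the negative numbers, such as the -110 after "aux_ch failed", look like negated Linux errno values. The sketch below decodes a couple of them. It assumes a Linux system, where ETIMEDOUT is 110 and EREMOTEIO is 121; errno_name() is a hypothetical helper written for illustration and is not part of the gma500 driver.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Map a kernel-style return code (zero or a negated errno) to a symbolic
 * name. Only the values of interest here are listed; anything else is
 * reported as "unknown".
 */
const char *errno_name(int ret)
{
    assert(ret <= 0);           /* kernel-style codes are never positive */

    switch (-ret) {
    case 0:         return "OK";
    case ETIMEDOUT: return "ETIMEDOUT";   /* 110 on Linux */
    case EREMOTEIO: return "EREMOTEIO";   /* 121 on Linux */
    default:        return "unknown";
    }
}
```

Read this way, "aux_ch failed -110" would be a timeout on the DisplayPort AUX channel, which matches the "dp_aux_ch timeout" lines printed just before it. For a human-readable message, strerror(110) from <string.h> can be used as well.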
Re: [ANNOUNCE] s390 31 bit kernel support removal
12.02.2014 13:29, Heiko Carstens wrote:
> We want to remove s390 31 bit kernel support with Linux kernel 3.16.

Maybe you can send a patch for Documentation/feature-removal-schedule.txt
about this now?

Thanks,

/mjt
screen goes blank when loading gma500_gfx (atom D2500)
Hello.

Today I rebooted my router into a new kernel and noticed that the
screen goes blank after booting the system (initial bootup messages
are visible). After some debugging it turns out that the screen
goes blank when loading the gma500_gfx module.

This is an Intel D2500CC motherboard with a built-in Atom D2500, with
a monitor connected to the VGA port. The following VGA device is
reported by lspci:

00:02.0 VGA compatible controller: Intel Corporation Atom Processor D2xxx/N2xxx Integrated Graphics Controller (rev 09)

Here is the dmesg output after loading gma500_gfx:

[ 176.427071] Linux agpgart interface v0.103
[ 176.452914] [drm] Initialized drm 1.1.0 20060810
[ 176.476037] gma500 :00:02.0: setting latency timer to 64
[ 176.476216] gma500 :00:02.0: irq 50 for MSI/MSI-X
[ 176.491675] acpi device:29: registered as cooling_device2
[ 176.492041] ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no)
[ 176.492169] input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A08:00/LNXVIDEO:00/input/input11
[ 176.492357] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[ 176.492396] [drm] No driver support for vblank timestamp query.
[ 176.607485] gma500 :00:02.0: trying to get vblank count for disabled pipe 1
[ 176.607531] gma500 :00:02.0: trying to get vblank count for disabled pipe 1
[ 176.806078] [drm] Initialized gma500 1.0.0 2011-06-06 for :00:02.0 on minor 0

which does not look bad or suspicious to me.

fbcon is loaded so it isn't an issue.

I tried the 3.10 kernel initially (the above messages are from it), next
I tried the 3.13 kernel too, and that one behaves exactly the same.

As far as I remember, this system never worked well with graphics.
The previous kernel (from which I updated) was 3.2, which had no
gma500 module (local build).

What are the steps to debug this further?

Thanks,

/mjt
3.10.25 kernel behaves unstable as a qemu/kvm guest
Hello.

This is just an initial/preliminary heads-up, maybe misdirected, about
a possible issue.

I upgraded 2 machines today to 3.10.25, and both show some strangeness
within Linux guests, which are also running 3.10.25. Reverting to
3.10.24 in the guests (compiled by the same compiler with the same
options) or using an older qemu/kvm (running with 1.7 now) fixes it.
All guests are using virtio-net and virtio-blk.

On one machine (prod), one guest (also prod) loads okay, but the
networking is not functioning: no packets are received by the guest.
I wasn't able to debug this further at this time, so I reverted back
to an older qemu/kvm (1.1).

On another machine (my home workstation where I can experiment), the
same combination (3.10.25 on host & guest and qemu 1.7) shows rather
unstable behaviour: about every second boot it stalls somewhere in
the initial boot, either after initializing PNP, or initializing
networking, or sometimes after initializing virtio, and the other
half of the time it boots okay. When it stalls, it consumes no CPU,
the qemu process is responsive, the guest just does nothing. Like this:

...
NET: Registering protocol family 2
TCP: established hash table entries: 8192 (order: 5, 131072 bytes)
TCP bind hash table entries: 8192 (order: 5, 131072 bytes)
TCP: Hash tables configured (established 8192 bind 8192)
TCP: reno registered
[at this point it hung]

(after this it normally registers UDP hash tables and other net stuff)

I'm not sure yet what's going on. I understand that there are no
guest-related changes in 3.10.25 (compared with .24), so there should
be something else. The fact that it stalls randomly suggests there's
some uninitialized value somewhere.

I'll try to debug it further. Just a heads-up for now.

Thanks,

/mjt
Re: [PATCH 00/10] autofs4 - rename autofs4 to autofs
31.08.2013 15:42, Ian Kent wrote:
[...]
> By leaving a Kconfig and Makefile in fs/autofs4 (to build autofs4.ko)
> with a deprecation message sub-system maintainers and other users will
> make any needed changes before these are removed after two kernel versions.

> IMHO the presence of the warning is reason enough to leave a build stub
> rather than do a straight out rename.

Why do you want to continue building autofs4.ko (or allowing it to be
built)? What's actually wrong with a straight rename? If the new module
can be auto-loaded by both names (by providing an alias), there's no
need to keep the ability to build autofs4.ko, I think. Well, maybe
except for the case when autofs is needed in the initramfs (like for
systemd). For this, indeed, you can keep an autofs4.ko which is a dummy
depending on autofs.ko...

> Ian Kent (10):
>   autofs4 - coding style fixes
>   autofs4 - fix string.h include in auto_dev-ioctl.h
>   autofs4 - move linux/auto_dev-ioctl.h to uapi/linux
>   autofs - merge auto_fs.h and auto_fs4.h
>   autofs - use autofs instead of autofs4 everywhere
>   autofs - copy autofs4 to autofs
>   autofs - create autofs Kconfig and Makefile
>   autofs - update fs/autofs4/Kconfig
>   autofs - update fs/autofs4/Makefile
>   autofs - delete fs/autofs4

By doing it this way, you're losing all git history. If you perform
a straight rename and git detects it, you can use, e.g., git log --follow
to see the whole history across the rename. This way you create new
files without history.

So I strongly suggest actually renaming the subdirectory (together
with appropriate Kconfig/Makefile changes so things are bisectable),
and creating the stubs after this.

Thanks,

/mjt
Re: Very poor latency when using hard drive (raid1)
15.04.2013 13:59, l...@tigusoft.pl wrote:
> There are 2 hard drives (normal, magnetic) in software raid 1
> on 3.2.41 kernel.
>
> When I write into them e.g. using dd from /dev/zero to a local file
> (ext4 on default settings), running 2 dd at once (writing two files) it
> starves all other programs that try to use the disk.
>
> Running ls on any directory on same disk (same fs btw), takes over half
> minute to execute, same for any other disk touching action.
>
> Has anyone seen such a problem, where to look, what to test?

This is a typical issue, known for many years. Your dds are run against
the buffer cache, the same one used by all other regular accesses. So
once it fills up, cached directories and the like are thrown away to
make room for new cache space. So once you need something else, that
something needs to be read from the disk, which is busy, together with
the buffer cache.

> What could solve it (other than ionice on applications that I expect to
> use hard drive)?

Just don't mix these two workloads. Or, if you really need to transfer
a large amount of data, use direct I/O (O_DIRECT) -- for dd it is
iflag=direct or oflag=direct (depending on the I/O direction).

ionice won't help much.

Thanks,

/mjt
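For a program that does the bulk write itself instead of going through dd, the O_DIRECT suggestion can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not a tuned implementation: the 4096-byte BLOCK alignment is assumed rather than queried from the device, and round_up() and write_zeros_direct() are hypothetical names. O_DIRECT typically requires the buffer address and the transfer size to be suitably aligned, and some filesystems reject the flag at open() time.

```c
#define _GNU_SOURCE             /* glibc exposes O_DIRECT only with this */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK 4096              /* assumed alignment, not queried from the device */

/* Round n up to the next multiple of block. */
size_t round_up(size_t n, size_t block)
{
    assert(block != 0);
    return (n + block - 1) / block * block;
}

/*
 * Write at least `total` zero bytes to `path` with O_DIRECT, one BLOCK at
 * a time, so the data bypasses the page cache. Returns the number of
 * bytes written, or -1 on failure.
 */
ssize_t write_zeros_direct(const char *path, size_t total)
{
    size_t left = round_up(total, BLOCK);
    ssize_t done = 0;
    void *buf = NULL;
    int fd, rc;

    rc = posix_memalign(&buf, BLOCK, BLOCK);   /* aligned buffer for O_DIRECT */
    if (rc != 0) {
        errno = rc;             /* posix_memalign returns the error code */
        return -1;
    }
    memset(buf, 0, BLOCK);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0) {               /* some filesystems reject O_DIRECT */
        free(buf);
        return -1;
    }
    while (left > 0) {
        ssize_t n = write(fd, buf, BLOCK);
        if (n != BLOCK) {       /* treat errors and short writes as failure */
            done = -1;
            break;
        }
        left -= BLOCK;
        done += n;
    }
    close(fd);
    free(buf);
    return done;
}
```

A call such as write_zeros_direct("big.img", 1024 * 1024) is roughly what dd with oflag=direct does for its output file: the written data does not pile up in the cache that ls and other programs depend on.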
Re: [PATCH linux-next] autofs4: autofs4_catatonic_mode(): remove redundant null check on kfree()
13.02.2013 11:37, Ian Kent wrote:
[]
So, you would like me to forward this to Linus? I'd be inclined to wait
until the window for 3.9 opens since Linus probably has more than enough
to do finalizing 3.8 right now.

I guess this change is anything but urgent ;)

Thanks,

/mjt
Re: [PATCH linux-next] autofs4: autofs4_catatonic_mode(): remove redundant null check on kfree()
13.02.2013 11:20, Ian Kent wrote:
> On Tue, 2013-02-12 at 10:12 -0700, Tim Gardner wrote:
>> smatch analysis:
>>
>> fs/autofs4/waitq.c:46 autofs4_catatonic_mode() info: redundant null
>> check on wq->name.name calling kfree()
>
> I'm not sure about this change.
>
> autofs4_catatonic_mode() could be called when there are remaining
> entries in the wait queue, which is nulled, so autofs4_wait_release()
> won't see the discarded waits if it is called.

Ian, this is about something else really. The patch is about the NULL
check before calling kfree() -- it does the NULL check internally. It has
nothing to do with code flow or anything else; it is about calling
kfree() unconditionally, regardless of whether the argument is actually
NULL or non-NULL. It makes the code shorter and easier to read.

You can add my

Signed-off-by: Michael Tokarev <m...@tls.msk.ru>

if you want.

>> Cc: Ian Kent <ra...@themaw.net>
>> Cc: aut...@vger.kernel.org
>> Signed-off-by: Tim Gardner <tim.gard...@canonical.com>
>> ---
>>  fs/autofs4/waitq.c |    6 ++----
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
>> index 03bc1d3..3db70da 100644
>> --- a/fs/autofs4/waitq.c
>> +++ b/fs/autofs4/waitq.c
>> @@ -42,10 +42,8 @@ void autofs4_catatonic_mode(struct autofs_sb_info *sbi)
>>  	while (wq) {
>>  		nwq = wq->next;
>>  		wq->status = -ENOENT; /* Magic is gone - report failure */
>> -		if (wq->name.name) {
>> -			kfree(wq->name.name);
>> -			wq->name.name = NULL;
>> -		}
>> +		kfree(wq->name.name);
>> +		wq->name.name = NULL;
>>  		wq->wait_ctr--;
>>  		wake_up_interruptible(&wq->queue);
>>  		wq = nwq;
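The point about the redundant NULL check can be tried out in userspace, where free() gives the same guarantee that the message describes for kfree(): passing NULL is allowed and does nothing. The following is only an analogy under that assumption; struct wait_entry, fail_all_waiters() and demo_fail_all_waiters() are hypothetical names that mimic the shape of the loop in the patch and are not the autofs4 code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's wait-queue entry. */
struct wait_entry {
    struct wait_entry *next;
    char *name;                 /* may be NULL if the name is already gone */
    int status;
};

/* Fail every waiter and drop its name; returns the number of entries seen. */
int fail_all_waiters(struct wait_entry *wq)
{
    int seen = 0;

    while (wq) {
        struct wait_entry *nwq = wq->next;

        wq->status = -1;        /* report failure */
        free(wq->name);         /* free(NULL) is defined to do nothing */
        wq->name = NULL;
        seen++;
        wq = nwq;
    }
    return seen;
}

/* Usage: one entry has an allocated name, the other has none. */
int demo_fail_all_waiters(void)
{
    struct wait_entry second = { NULL, NULL, 0 };
    struct wait_entry first = { &second, malloc(8), 0 };
    int seen = fail_all_waiters(&first);

    assert(first.name == NULL && second.name == NULL);
    return seen;
}
```

Both entries go through the same two lines, with or without a name, so the if () wrapper adds nothing except length.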
Transparent Huge Pages
Hello.

I'm trying to understand how to use transparent huge pages (currently
on x86). Before, I used "explicit" huge pages a lot (mostly via
hugetlbfs), but it looked like THP should be easier so I gave it a try.

This tiny program:

- cut -
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <errno.h>
#include <string.h>

int main(int argc, char **argv) {
  void *ptr;
  size_t len = argv[1] ? atoi(argv[1]) : 1024*1024*1024;
  /* no error checking! */
  posix_memalign(&ptr, 2048*1024, len);
  madvise(ptr, len, MADV_HUGEPAGE);
  memset(ptr, 0, len);
  usleep(500); /* let khugepaged do its work */
  system("grep ^AnonHugePages: /proc/meminfo");
  return 0;
}
- cut -

which just tries to allocate some amount of RAM (1Gb by default)
aligned to 2Mb, uses madvise(HUGEPAGE) on it, and checks /proc/meminfo
for AnonHugePages.

The problem is: I've never seen any value for AnonHugePages larger
than about 16Mb. Usually it is around 10Mb or 8Mb, no matter how
large the requested memory size is, including the default 1Gb.

The question, obviously, is: why so small?

My system (which is a few years old now) has 6Gb of RAM, it uses an
AMD Athlon II X2 260 CPU, and is running a 3.2 kernel.

The original question comes from QEMU, which is supposed to use THP
for guest memory, but it also does not use more than these ~10Mb
when allocating 1Gb to the guest.

Thanks!

/mjt
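One caveat about the measurement itself: /proc/meminfo is system-wide, so the AnonHugePages figure there mixes this program with everything else running on the machine. A per-process view may be easier to interpret. The sketch below sums the per-mapping "AnonHugePages:" lines, assuming the running kernel exposes that field in /proc/self/smaps; anon_huge_kb_line() and sum_anon_huge_kb() are hypothetical helper names, and the 256-byte line buffer is an arbitrary choice.

```c
#include <assert.h>
#include <stdio.h>

/* Return the kB value if `line` is an "AnonHugePages:" line, else 0. */
long anon_huge_kb_line(const char *line)
{
    long kb = 0;

    assert(line != NULL);
    if (sscanf(line, "AnonHugePages: %ld kB", &kb) == 1)
        return kb;
    return 0;
}

/*
 * Sum AnonHugePages over every mapping listed in `path` (for example
 * "/proc/self/smaps"). Returns the total in kB, or -1 if the file
 * cannot be opened.
 */
long sum_anon_huge_kb(const char *path)
{
    char line[256];
    long total = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f))
        total += anon_huge_kb_line(line);
    fclose(f);
    return total;
}
```

In the test program, a call to sum_anon_huge_kb("/proc/self/smaps") placed after the memset() could replace the grep on /proc/meminfo, and checking the return values of posix_memalign() and madvise() would rule out a silently failed call as the cause of the small numbers.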
Re: [PATCH v4 00/11] x86/microcode: Early load microcode
On 20.12.2012 23:48, Fenghua Yu wrote:
> From: Fenghua Yu <fenghua...@intel.com>
>
> The problem in current microcode loading method is that we load a microcode
> way, way too late; ideally we should load it before turning paging on. This
> may only be practical on 32 bits since we can't get to 64-bit mode without
> paging on, but we should still do it as early as at all possible.

Why is loading microcode this early important? Why is it bad to load it at the end of (initial) boot?

Thanks,

/mjt
Re: [git patches] libata fixes for 3.7
On 02.10.2012 23:59, Jeff Garzik wrote:
> On 10/02/2012 03:44 PM, Michael Tokarev wrote:
>> On 02.10.2012 23:40, Jeff Garzik wrote:
>>
>>> Minor libata updates, nothing notable.
>>>
>>> 1) Apply -- and then revert -- the FUA feature. Caused
>>>    disk corruption in linux-next, proving it cannot be turned on by
>>>    default.
>>
>> Any details on that? Disk corruption is rather a nasty
>> side-effect indeed.
>
> One thread with reports is
>
> Storage related regression in linux-next 20120824

Eg, https://lkml.org/lkml/2012/8/27/66 (two reports).

Thank you!

/mjt
Re: [git patches] libata fixes for 3.7
On 02.10.2012 23:40, Jeff Garzik wrote:
> Minor libata updates, nothing notable.
>
> 1) Apply -- and then revert -- the FUA feature. Caused
>    disk corruption in linux-next, proving it cannot be turned on by
>    default.

Any details on that? Disk corruption is rather a nasty side-effect indeed.

Thank you!

/mjt
Re: tg3 driver upgrade (Linux 2.6.32 -> 3.2) breaks IBM Bladecenter SoL
On 02.10.2012 22:49, Ferenc Wagner wrote:
> "Michael Chan" <mc...@broadcom.com> writes:
>> These are the likely fixes:
>>
>> commit cf9ecf4b631f649a964fa611f1a5e8874f2a76db
>> Author: Matt Carlson <mcarl...@broadcom.com>
>> Date: Mon Nov 28 09:41:03 2011 +
>>
>> tg3: Fix TSO CAP for 5704 devs w / ASF enabled
>
> You are exactly right: cf9ecf4b fixed the permanent SoL breakage
> introduced by dabc5c67. Looks like ASF utilizes similar technology to
> that of the HS20 BMC. Thanks for the tip, it greatly reduced our CPU
> wear. :) It's a pity ethtool -k did not give a hint. Do you think it's
> possible to work around in 3.2 by eg. fiddling some ethtool setting?

Maybe it's better to push this commit to -stable instead? (The commit that broke things is part of the 3.0 kernel, so all current 3.x -stable kernels are affected.)

(Besides, that commit, whose description says "This patch fixes the problem by revisiting and reevaluating the decision after tg3_get_eeprom_hw_cfg() is called.", merely copies a somewhat "twisted" chunk of code into another place, which does not look optimal.)

Thanks,

/mjt
Re: lve module taint?
On 19.09.2012 06:02, Rusty Russell wrote:
> From: Matthew Garrett <mj...@srcf.ucam.org>
> Subject: module: taint kernel when lve module is loaded
> Date: Fri, 22 Jun 2012 13:49:31 -0400
>
> Cloudlinux have a product called lve that includes a kernel module. This
> was previously GPLed but is now under a proprietary license, but the
> module continues to declare MODULE_LICENSE("GPL") and makes use of some
> EXPORT_SYMBOL_GPL symbols. Forcibly taint it in order to avoid this.

> +	/* lve claims to be GPL but upstream won't provide source */
> +	if (strcmp(mod->name, "lve") == 0)
> +		add_taint_module(mod, TAINT_PROPRIETARY_MODULE);

This is setting, in my opinion, a rather bad precedent. Next we'll be adding various modules here for various reasons.

I think this case should be handled purely politically now, not technically. I.e., if some project declares itself as GPL, it is not the kernel's task to verify that the sources are available or to enforce that.

Thanks,

/mjt
Re: [Qemu-devel] x86, nops settings result in kernel crash
On 20.08.2012 21:13, Tomas Racek wrote:
[]

Can we trim the old, large and now not-so-relevant discussion please? ;)

> I can provide you with more different traces if it can help. But I thought
> that maybe it will be more useful for you to try it on your own. So I've
> prepared some minimal debian installation which you could download here
> (apx 163M bzipped):
>
> http://fi.muni.cz/~xracek/debian.img.bz2
>
> Password:
> root/asdfgh
>
> Here is my config for guest kernel:
>
> http://fi.muni.cz/~xracek/config
>
> I use
>
> qemu-kvm -m 1500 -hda debian.img -kernel linux/arch/x86/boot/bzImage -append "root=/dev/sda1"

Um. I'd expect the image to be self-contained, with no external kernel needed. I wanted to do a quick test to see if it fails on my machine too and downloaded debian.img.bz2, but there's no kernel. So.. no quick test for you ;)

> After logging in just run "sh runtest.sh". This leads to crash in my case
> (host: Intel Core i5-2540M, kernel 3.5.2-1.fc17.x86_64, qemu 1.0.1).

With all the above, this "runtest.sh" is informationally equal to your disk image.

/mjt
Re: root=PARTUUID for MBR/NT disk signatures?
On 21.08.2012 08:47, Will Drewry wrote:
[]
> Functionally, I suspect this will work fine, but I am concerned that
> it is a bad move from an efficiency perspective (not unfixable
> though). Right now, the user-supplied value is converted from
> string-uuid to packed-uuid. This is then memcmp'd across any and all
> partitions - be it 2 or 200 - across all attached storage. If we move
> to a pure string, then we end up needing to unpack every packed UUID
> at disk scan time (or search, depending on impl) rather than just the
> one user supplied value.
>
> Perhaps the cost is negligible on modern machines, but it seems like
> the wrong place to put the cost (per entry rather than per search
> value).

The amount of work needed to READ all the partition tables is probably quite a bit larger than strcmp'ing them all, I think.

/mjt
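The "pack once, memcmp many" approach described in the quoted text can be sketched roughly as below. This is an illustration only, not the kernel's code: uuid_pack() and find_partition() are hypothetical names, and the bytes are packed in the order they appear in the string, ignoring the mixed-endian on-disk layout that GPT uses for the first three fields.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Convert a canonical 36-character UUID string
 * (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) into 16 bytes, in string
 * order. Returns 0 on success, -1 on malformed input.
 * This is done ONCE, for the user-supplied search value. */
int uuid_pack(const char *s, unsigned char out[16])
{
    int i = 0, n = 0;

    assert(s != NULL && out != NULL);
    if (strlen(s) != 36)
        return -1;
    while (i < 36) {
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (s[i] != '-')
                return -1;
            i++;
            continue;
        }
        if (!isxdigit((unsigned char)s[i]) ||
            !isxdigit((unsigned char)s[i + 1]))
            return -1;
        out[n++] = (unsigned char)((hexval(s[i]) << 4) | hexval(s[i + 1]));
        i += 2;
    }
    return 0;
}

/* Compare the packed search value against each packed table entry.
 * The per-entry cost is a single 16-byte memcmp; nothing is unpacked.
 * Returns the index of the match, or -1 if there is none. */
int find_partition(const unsigned char (*table)[16], int count,
                   const unsigned char want[16])
{
    int i;

    for (i = 0; i < count; i++)
        if (memcmp(table[i], want, 16) == 0)
            return i;
    return -1;
}
```

Either way the search is linear in the number of partitions; the difference under discussion is only whether the conversion cost is paid once per search value or once per table entry.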
Re: 3.0+ NFS issues (bisected)
On 18.08.2012 15:13, J. Bruce Fields wrote:
> On Sat, Aug 18, 2012 at 10:49:31AM +0400, Michael Tokarev wrote:
[]
>> Well. What can I say? With the change below applied (to 3.2 kernel
>> at least), I don't see any stalls or high CPU usage on the server
>> anymore. It survived several multi-gigabyte transfers, for several
>> hours, without any problem. So it is a good step forward ;)
>>
>> But the whole thing seems to be quite a bit fragile. I tried to follow
>> the logic in there, and the thing is quite a bit, well, "twisted", and
>> somewhat difficult to follow. So I don't know if this is the right
>> fix or not. At least it works! :)
>
> Suggestions welcomed.

Ok...

Meanwhile, you can add my
Tested-By: Michael Tokarev <m...@tls.msk.ru>
to the patch.

>> And I really wonder why no one else reported this problem before.
>> Am I the only one in this world who uses linux nfsd? :)
>
> This, for example:
>
> http://marc.info/?l=linux-nfs&m=134131915612287&w=2
>
> may well describe the same problem It just needed some debugging
> persistence, thanks!

Ah. I tried to find something when I initially sent this report, but wasn't able to. Apparently I'm not alone with this problem after all!

Thank you for all the work!

/mjt
Re: 3.0+ NFS issues (bisected)
On 18.08.2012 02:32, J. Bruce Fields wrote:
> On Fri, Aug 17, 2012 at 04:08:07PM -0400, J. Bruce Fields wrote:
>> Wait a minute, that assumption's a problem because that calculation
>> depends in part on xpt_reserved, which is changed here
>>
>> In particular, svc_xprt_release() calls svc_reserve(rqstp, 0), which
>> subtracts rqstp->rq_reserved and then calls svc_xprt_enqueue, now with a
>> lower xpt_reserved value. That could well explain this.
>
> So, maybe something like this?

Well. What can I say? With the change below applied (to 3.2 kernel at least), I don't see any stalls or high CPU usage on the server anymore. It survived several multi-gigabyte transfers, for several hours, without any problem. So it is a good step forward ;)

But the whole thing seems to be quite a bit fragile. I tried to follow the logic in there, and the thing is quite a bit, well, "twisted", and somewhat difficult to follow. So I don't know if this is the right fix or not. At least it works! :)

And I really wonder why no one else reported this problem before. Am I the only one in this world who uses linux nfsd? :)

Thank you for all your patience and the proposed fix!

/mjt

> commit c8136c319ad85d0db870021fc3f9074d37f26d4a
> Author: J. Bruce Fields <bfie...@redhat.com>
> Date: Fri Aug 17 17:31:53 2012 -0400
>
> svcrpc: don't add to xpt_reserved till we receive
>
> The rpc server tries to ensure that there will be room to send a reply
> before it receives a request.
>
> It does this by tracking, in xpt_reserved, an upper bound on the total
> size of the replies that is has already committed to for the socket.
>
> Currently it is adding in the estimate for a new reply *before* it
> checks whether there is space available. If it finds that there is not
> space, it then subtracts the estimate back out.
>
> This may lead the subsequent svc_xprt_enqueue to decide that there is
> space after all.
>
> The results is a svc_recv() that will repeatedly return -EAGAIN, causing
> server threads to loop without doing any actual work.
>
> Reported-by: Michael Tokarev <m...@tls.msk.ru>
> Signed-off-by: J. Bruce Fields <bfie...@redhat.com>
>
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index ec99849a..59ff3a3 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -366,8 +366,6 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
>  			rqstp, rqstp->rq_xprt);
>  		rqstp->rq_xprt = xprt;
>  		svc_xprt_get(xprt);
> -		rqstp->rq_reserved = serv->sv_max_mesg;
> -		atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
>  		pool->sp_stats.threads_woken++;
>  		wake_up(&rqstp->rq_wait);
>  	} else {
> @@ -644,8 +642,6 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
>  	if (xprt) {
>  		rqstp->rq_xprt = xprt;
>  		svc_xprt_get(xprt);
> -		rqstp->rq_reserved = serv->sv_max_mesg;
> -		atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
>
>  		/* As there is a shortage of threads and this request
>  		 * had to be queued, don't allow the thread to wait so
> @@ -743,6 +739,10 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
>  		len = xprt->xpt_ops->xpo_recvfrom(rqstp);
>  		dprintk("svc: got len=%d\n", len);
>  	}
> +	if (len > 0) {
> +		rqstp->rq_reserved = serv->sv_max_mesg;
> +		atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
> +	}
>  	svc_xprt_received(xprt);
>
>  	/* No data, incomplete (TCP) read, or accept() */
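The accounting problem described in the commit message can be illustrated with a toy userspace model. Everything here is an assumption made for illustration: eagain_spins() and toy_has_wspace() are invented names, the space check is simplified to "write space >= reserved + one maximum-size reply", and no real kernel structures or locking are modelled. It only tries to show why reserving before the check can bounce between "there is room" and "there is no room" forever.

```c
#include <assert.h>

/* Toy version of the write-space test: is there room for everything
 * already reserved plus one more maximum-size reply? */
static int toy_has_wspace(long wspace, long reserved, long max_mesg)
{
    return wspace >= reserved + max_mesg;
}

/* Count how many times a toy svc_recv() would return -EAGAIN before a
 * request is finally received, giving up after `limit` attempts.
 *
 * reserve_early != 0 models the old behaviour: the estimate is added
 *                    when the thread is woken, before the space check.
 * reserve_early == 0 models the patched behaviour: the estimate is
 *                    added only after data has been received. */
int eagain_spins(long wspace, long reserved, long max_mesg,
                 int reserve_early, int limit)
{
    int spins = 0;

    assert(max_mesg > 0 && limit >= 0);
    while (spins < limit) {
        /* Enqueue step: a thread is only woken if there seems to be room. */
        if (!toy_has_wspace(wspace, reserved, max_mesg))
            break;                      /* nobody is woken, no busy loop */
        if (reserve_early)
            reserved += max_mesg;       /* old: reserve before checking */

        /* Receive step: check again before actually receiving. */
        if (toy_has_wspace(wspace, reserved, max_mesg)) {
            if (!reserve_early)
                reserved += max_mesg;   /* new: reserve after receiving */
            return spins;               /* request received */
        }

        /* No room: drop the reservation and report -EAGAIN. Dropping it
         * makes the enqueue step above succeed again on the next pass. */
        if (reserve_early)
            reserved -= max_mesg;
        spins++;
    }
    return spins;
}
```

With write space for one and a half replies, the "reserve early" variant spins until the limit while the "reserve late" variant receives immediately; with plenty of write space both behave the same.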
Re: 3.0+ NFS issues (bisected)
On 17.08.2012 21:26, Michael Tokarev wrote:
> On 17.08.2012 21:18, J. Bruce Fields wrote:
>> On Fri, Aug 17, 2012 at 09:12:38PM +0400, Michael Tokarev wrote:
> []
>>> So we're calling svc_recv in a tight loop, eating
>>> all available CPU. (The above is with just 2 nfsd
>>> threads).
>>>
>>> Something is definitely wrong here. And it happens much more
>>> often after the mentioned commit (f03d78db65085).
>>
>> Oh, neat. Hm. That commit doesn't really sound like the cause, then.
>> Is that busy-looping reproduceable on kernels before that commit?
>
> Note I bisected this issue to this commit. I haven't seen it
> happening before this commit, and reverting it from 3.0 or 3.2
> kernel makes the problem go away.
>
> I guess it is looping there:
>
>
> net/sunrpc/svc_xprt.c:svc_recv()
> ...
> 	len = 0;
> 	...
> 	if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
> 		...
> 	} else if (xprt->xpt_ops->xpo_has_wspace(xprt)) { <=== here -- has no wspace due to memory...
> 		... len =
> 	}
>
> 	/* No data, incomplete (TCP) read, or accept() */
> 	if (len == 0 || len == -EAGAIN)
> 		goto out;
> ...
> out:
> 	rqstp->rq_res.len = 0;
> 	svc_xprt_release(rqstp);
> 	return -EAGAIN;
> }
>
> I'm trying to verify this theory...

Yes. I inserted a printk there, and during all these millions of iterations while we're waiting in this EAGAIN loop, this printk is triggering:

[21052.533053] svc_recv: !has_wspace
[21052.533070] svc_recv: !has_wspace
[21052.533087] svc_recv: !has_wspace
[21052.533105] svc_recv: !has_wspace
[21052.533122] svc_recv: !has_wspace
[21052.533139] svc_recv: !has_wspace
[21052.533156] svc_recv: !has_wspace
[21052.533174] svc_recv: !has_wspace
[21052.533191] svc_recv: !has_wspace
[21052.533208] svc_recv: !has_wspace
[21052.533226] svc_recv: !has_wspace
[21052.533244] svc_recv: !has_wspace
[21052.533265] calling svc_recv: 1228163 times (err=-4)
[21052.533403] calling svc_recv: 1226616 times (err=-4)
[21052.534520] nfsd: last server has exited, flushing export cache

(I stopped nfsd since it was flooding the log.)

I can only guess that before that commit we always had space, and now we don't anymore, so we are looping like crazy.

Thanks,

/mjt
Re: 3.0+ NFS issues (bisected)
On 17.08.2012 21:18, J. Bruce Fields wrote:
> On Fri, Aug 17, 2012 at 09:12:38PM +0400, Michael Tokarev wrote:
[]
>> So we're calling svc_recv in a tight loop, eating
>> all available CPU. (The above is with just 2 nfsd
>> threads).
>>
>> Something is definitely wrong here. And it happens much more
>> often after the mentioned commit (f03d78db65085).
>
> Oh, neat. Hm. That commit doesn't really sound like the cause, then.
> Is that busy-looping reproduceable on kernels before that commit?

Note I bisected this issue to this commit. I haven't seen it happening before this commit, and reverting it from a 3.0 or 3.2 kernel makes the problem go away.

I guess it is looping there:

net/sunrpc/svc_xprt.c:svc_recv()
...
	len = 0;
	...
	if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
		...
	} else if (xprt->xpt_ops->xpo_has_wspace(xprt)) { <=== here -- has no wspace due to memory...
		... len =
	}

	/* No data, incomplete (TCP) read, or accept() */
	if (len == 0 || len == -EAGAIN)
		goto out;
...
out:
	rqstp->rq_res.len = 0;
	svc_xprt_release(rqstp);
	return -EAGAIN;
}

I'm trying to verify this theory...

/mjt
Re: 3.0+ NFS issues (bisected)
On 17.08.2012 20:00, J. Bruce Fields wrote:
[]
> Uh, if I grepped my way through this right: it looks like it's the
> "memory" column of the "TCP" row of /proc/net/protocols; might be
> interesting to see how that's changing over time.

This file does not look interesting.  Memory usage does not jump,
and there's no steep increase either.

But there's something else which is interesting here.

I noticed that in perf top, the top consumer of CPU is svc_recv()
(I mentioned this at the start of this thread).  So I looked at how
this routine is called from nfsd.  And here we go.

fs/nfsd/nfssvc.c:

/*
 * This is the NFS server kernel thread
 */
static int
nfsd(void *vrqstp)
{
...
        /*
         * The main request loop
         */
        for (;;) {
                /*
                 * Find a socket with data available and call its
                 * recvfrom routine.
                 */
                int i = 0;
                while ((err = svc_recv(rqstp, 60*60*HZ)) == -EAGAIN)
                        ++i;
                printk(KERN_ERR "calling svc_recv: %d times (err=%d)\n", i, err);
                if (err == -EINTR)
                        break;
...

(I added the "i" counter and the printk).  And here's the output:

[19626.401136] calling svc_recv: 0 times (err=212)
[19626.405059] calling svc_recv: 1478 times (err=212)
[19626.409512] calling svc_recv: 1106 times (err=212)
[19626.543020] calling svc_recv: 0 times (err=212)
[19626.543059] calling svc_recv: 0 times (err=212)
[19626.548074] calling svc_recv: 0 times (err=212)
[19626.549515] calling svc_recv: 0 times (err=212)
[19626.552320] calling svc_recv: 0 times (err=212)
[19626.553503] calling svc_recv: 0 times (err=212)
[19626.556007] calling svc_recv: 0 times (err=212)
[19626.557152] calling svc_recv: 0 times (err=212)
[19626.560109] calling svc_recv: 0 times (err=212)
[19626.560943] calling svc_recv: 0 times (err=212)
[19626.565315] calling svc_recv: 1067 times (err=212)
[19626.569735] calling svc_recv: 2571 times (err=212)
[19626.574150] calling svc_recv: 3842 times (err=212)
[19626.581914] calling svc_recv: 2891 times (err=212)
[19626.583072] calling svc_recv: 1247 times (err=212)
[19626.616885] calling svc_recv: 0 times (err=212)
[19626.616952] calling svc_recv: 0 times (err=212)
[19626.622889] calling svc_recv: 0 times (err=212)
[19626.624518] calling svc_recv: 0 times (err=212)
[19626.627118] calling svc_recv: 0 times (err=212)
[19626.629735] calling svc_recv: 0 times (err=212)
[19626.631777] calling svc_recv: 0 times (err=212)
[19626.633986] calling svc_recv: 0 times (err=212)
[19626.636746] calling svc_recv: 0 times (err=212)
[19626.637692] calling svc_recv: 0 times (err=212)
[19626.640769] calling svc_recv: 0 times (err=212)
[19626.657852] calling svc_recv: 0 times (err=212)
[19626.661602] calling svc_recv: 0 times (err=212)
[19626.670160] calling svc_recv: 0 times (err=212)
[19626.671917] calling svc_recv: 0 times (err=212)
[19626.684643] calling svc_recv: 0 times (err=212)
[19626.684680] calling svc_recv: 0 times (err=212)
[19626.812820] calling svc_recv: 0 times (err=212)
[19626.814697] calling svc_recv: 0 times (err=212)
[19626.817195] calling svc_recv: 0 times (err=212)
[19626.820324] calling svc_recv: 0 times (err=212)
[19626.822855] calling svc_recv: 0 times (err=212)
[19626.824823] calling svc_recv: 0 times (err=212)
[19626.828016] calling svc_recv: 0 times (err=212)
[19626.829021] calling svc_recv: 0 times (err=212)
[19626.831970] calling svc_recv: 0 times (err=212)
> the stall begins:
[19686.823135] calling svc_recv: 3670352 times (err=212)
[19686.823524] calling svc_recv: 3659205 times (err=212)
> transfer continues
[19686.854734] calling svc_recv: 0 times (err=212)
[19686.860023] calling svc_recv: 0 times (err=212)
[19686.887124] calling svc_recv: 0 times (err=212)
[19686.895532] calling svc_recv: 0 times (err=212)
[19686.903667] calling svc_recv: 0 times (err=212)
[19686.922780] calling svc_recv: 0 times (err=212)

So we're calling svc_recv in a tight loop, eating
all available CPU.  (The above is with just 2 nfsd
threads).

Something is definitely wrong here.  And it happens much more
often after the mentioned commit (f03d78db65085).

Thanks,

/mjt
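For readers who want to play with the instrumentation outside the kernel, here is a minimal user-space sketch of the same counting loop. It is not kernel code: fake_recv() and its "budget" argument are hypothetical stand-ins for svc_recv(), and the value 212 is simply borrowed from the log above as an example of a successful return.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for svc_recv(): it answers -EAGAIN until
 * *budget is used up, then returns a positive "length".
 */
static int fake_recv(int *budget)
{
    if (*budget > 0) {
        (*budget)--;
        return -EAGAIN;
    }
    return 212;
}

/*
 * Same shape as the instrumented nfsd loop: keep calling while the
 * receive routine says "try again", and count how often that happened.
 * The final return value of the receive call is stored in *err_out.
 */
static int count_eagain_spins(int *budget, int *err_out)
{
    int spins = 0;
    int err;

    while ((err = fake_recv(budget)) == -EAGAIN)
        ++spins;
    *err_out = err;
    return spins;
}
```

A count of 0 means the first call already returned data; a count in the millions, as in the stall above, means the caller did nothing but retry.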
Re: 3.0+ NFS issues (bisected)
On 17.08.2012 21:18, J. Bruce Fields wrote:
> On Fri, Aug 17, 2012 at 09:12:38PM +0400, Michael Tokarev wrote:
[]
>> So we're calling svc_recv in a tight loop, eating
>> all available CPU.  (The above is with just 2 nfsd
>> threads).
>>
>> Something is definitely wrong here.  And it happens much more
>> often after the mentioned commit (f03d78db65085).
>
> Oh, neat.  Hm.  That commit doesn't really sound like the cause, then.
> Is that busy-looping reproducible on kernels before that commit?

Note I bisected this issue to this commit.  I haven't seen it
happening before this commit, and reverting it from a 3.0 or 3.2
kernel makes the problem go away.

I guess it is looping there:

net/sunrpc/svc_xprt.c:svc_recv()
...
        len = 0;
...
        if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
...
        } else if (xprt->xpt_ops->xpo_has_wspace(xprt)) {  <=== here -- has no wspace due to memory...
...             len = something
        }

        /* No data, incomplete (TCP) read, or accept() */
        if (len == 0 || len == -EAGAIN)
                goto out;
...
out:
        rqstp->rq_res.len = 0;
        svc_xprt_release(rqstp);
        return -EAGAIN;
}

I'm trying to verify this theory...

/mjt
Re: 3.0+ NFS issues (bisected)
On 17.08.2012 21:26, Michael Tokarev wrote:
> On 17.08.2012 21:18, J. Bruce Fields wrote:
>> On Fri, Aug 17, 2012 at 09:12:38PM +0400, Michael Tokarev wrote:
> []
>>> So we're calling svc_recv in a tight loop, eating
>>> all available CPU.  (The above is with just 2 nfsd
>>> threads).
>>>
>>> Something is definitely wrong here.  And it happens much more
>>> often after the mentioned commit (f03d78db65085).
>>
>> Oh, neat.  Hm.  That commit doesn't really sound like the cause, then.
>> Is that busy-looping reproducible on kernels before that commit?
>
> Note I bisected this issue to this commit.  I haven't seen it
> happening before this commit, and reverting it from a 3.0 or 3.2
> kernel makes the problem go away.
>
> I guess it is looping there:
>
> net/sunrpc/svc_xprt.c:svc_recv()
> ...
>         len = 0;
> ...
>         if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
> ...
>         } else if (xprt->xpt_ops->xpo_has_wspace(xprt)) {  <=== here -- has no wspace due to memory...
> ...             len = something
>         }
>
>         /* No data, incomplete (TCP) read, or accept() */
>         if (len == 0 || len == -EAGAIN)
>                 goto out;
> ...
> out:
>         rqstp->rq_res.len = 0;
>         svc_xprt_release(rqstp);
>         return -EAGAIN;
> }
>
> I'm trying to verify this theory...

Yes.  I inserted a printk there, and all these million times while
we're waiting in this EAGAIN loop, this printk is triggering:

....
[21052.533053]  svc_recv: !has_wspace
[21052.533070]  svc_recv: !has_wspace
[21052.533087]  svc_recv: !has_wspace
[21052.533105]  svc_recv: !has_wspace
[21052.533122]  svc_recv: !has_wspace
[21052.533139]  svc_recv: !has_wspace
[21052.533156]  svc_recv: !has_wspace
[21052.533174]  svc_recv: !has_wspace
[21052.533191]  svc_recv: !has_wspace
[21052.533208]  svc_recv: !has_wspace
[21052.533226]  svc_recv: !has_wspace
[21052.533244]  svc_recv: !has_wspace
[21052.533265] calling svc_recv: 1228163 times (err=-4)
[21052.533403] calling svc_recv: 1226616 times (err=-4)
[21052.534520] nfsd: last server has exited, flushing export cache

(I stopped nfsd since it was flooding the log).

I can only guess that before that commit we always had space;
now we don't anymore, and we are looping like crazy.

Thanks,

/mjt
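To make the suspected path easier to follow, here is a small user-space model of the control flow quoted from svc_recv(). It is only a sketch under stated assumptions: model_svc_recv() and its three parameters are hypothetical names, the listener branch is reduced to "returns no request data", and nothing here touches real sockets or the sunrpc code.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Simplified model of the quoted fragment, not the kernel function.
 *   is_listener  models test_bit(XPT_LISTENER, ...)
 *   has_wspace   models xpo_has_wspace(xprt)
 *   pending_len  models the "len = something" a successful receive yields
 */
static int model_svc_recv(bool is_listener, bool has_wspace, int pending_len)
{
    int len = 0;

    if (is_listener) {
        /* accept() path: no request data comes back from this call */
    } else if (has_wspace) {
        len = pending_len;
    }
    /* When has_wspace is false, len is still 0 at this point. */

    /* No data, incomplete (TCP) read, or accept() */
    if (len == 0 || len == -EAGAIN)
        return -EAGAIN;
    return len;
}
```

In this model a transport without write space always yields -EAGAIN, whatever data is pending, which is the condition the printk above caught. Whether the real function also sleeps somewhere before that point is not something this sketch tries to capture.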
Re: 3.0+ NFS issues (bisected)
On 12.07.2012 16:53, J. Bruce Fields wrote:
> On Tue, Jul 10, 2012 at 04:52:03PM +0400, Michael Tokarev wrote:
>> I tried to debug this again, maybe to reproduce in a virtual machine,
>> and found out that it is only 32bit server code shows this issue:
>> after updating the kernel on the server to 64bit (the same version)
>> I can't reproduce this issue anymore.  Rebooting back to 32bit,
>> and voila, it is here again.
>>
>> Something apparently isn't right on 32bits... ;)
>>
>> (And yes, the prob is still present and is very annoying :)
>
> OK, that's very useful, thanks.  So probably a bug got introduced in the
> 32-bit case between 2.6.32 and 3.0.
>
> My personal upstream testing is normally all x86_64 only.  I'll kick off
> a 32-bit install and see if I can reproduce this quickly.

Actually it has nothing to do with 32 vs 64 bits as I
initially thought.  It happens on 64bits too, but takes
more time (or data to transfer) to trigger.

> Let me know if you're able to narrow this down any more.

I bisected this issue to the following commit:

commit f03d78db65085609938fdb686238867e65003181
Author: Eric Dumazet
Date:   Thu Jul 7 00:27:05 2011 -0700

    net: refine {udp|tcp|sctp}_mem limits

    Current tcp/udp/sctp global memory limits are not taking into account
    hugepages allocations, and allow 50% of ram to be used by buffers of a
    single protocol [ not counting space used by sockets / inodes ...]

    Lets use nr_free_buffer_pages() and allow a default of 1/8 of kernel ram
    per protocol, and a minimum of 128 pages.
    Heavy duty machines sysadmins probably need to tweak limits anyway.

Reverting this commit on top of 3.0 (or any later 3.x kernel) fixes
the behaviour here.

This machine has 4Gb of memory.  On 3.0, with this patch applied
(as it is part of 3.0), tcp_mem is like this:

  21228     28306   42456

with this patch reverted, tcp_mem shows:

  81216     108288  162432

and with these values, it works fine.

So it looks like something else goes wrong there,
which leads to all nfsds fighting with each other
for something and eating 100% of available CPU
instead of servicing clients.

For added fun, when setting tcp_mem to the "good" value
from the "bad" value (after booting into a kernel with
that patch applied), the problem is _not_ fixed.

Any further hints?

Thanks,

/mjt

>> On 31.05.2012 17:51, Michael Tokarev wrote:
>>> On 31.05.2012 17:46, Myklebust, Trond wrote:
>>>> On Thu, 2012-05-31 at 17:24 +0400, Michael Tokarev wrote:
>>> []
>>>>> I started tcpdump:
>>>>>
>>>>> tcpdump -npvi br0 -s 0 host 192.168.88.4 and \( proto ICMP or port 2049 \) -w nfsdump
>>>>>
>>>>> on the client (192.168.88.2).  Next I mounted a directory on the client,
>>>>> and started reading (tar'ing) a directory into /dev/null.  It captured a
>>>>> few stalls.  Tcpdump shows number of packets it got, the stalls are at
>>>>> packet counts 58090, 97069 and 97071.  I cancelled the capture after that.
>>>>>
>>>>> The resulting file is available at http://www.corpit.ru/mjt/tmp/nfsdump.xz ,
>>>>> it is 220Mb uncompressed and 1.3Mb compressed.  The source files are
>>>>> 10 files of 1Gb each, all made by using `truncate' utility, so does not
>>>>> take place on disk at all.  This also makes it obvious that the issue
>>>>> does not depend on the speed of disk on the server (since in this case,
>>>>> the server disk isn't even in use).
>>>>
>>>> OK. So from the above file it looks as if the traffic is mainly READ
>>>> requests.
>>>
>>> The issue here happens only with reads.
>>>
>>>> In 2 places the server stops responding. In both cases, the client seems
>>>> to be sending a single TCP frame containing several COMPOUNDS containing
>>>> READ requests (which should be legal) just prior to the hang. When the
>>>> server doesn't respond, the client pings it with a RENEW, before it ends
>>>> up severing the TCP connection and then retransmitting.
>>>
>>> And sometimes -- speaking only from the behaviour I've seen, not from the
>>> actual frames sent -- server does not respond to the RENEW too, in which
>>> case the client reports "nfs server no responding", and on the next
>>> renew it may actually respond.  This happens too, but much more rare.
>>>
>>> During these stalls, ie, when there's no network activity at all,
>>> the server NFSD threads are busy eating all available CPU.
>>>
>>> What does it all tell us? :)
>>>
>>> Thank you!
>>>
>>> /mjt
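A note on the two tcp_mem triples quoted in the bisection message: both follow the same shape, with the middle value acting as the base limit, the first value being three quarters of it (integer division first) and the last being twice the first. The sketch below reproduces that arithmetic. The derivation is an assumption inferred from the quoted numbers and from the "minimum of 128 pages" wording in the commit message, not a copy of kernel source; the struct and function names are made up for illustration, and the MiB conversion assumes 4 KiB pages.

```c
/* The three tcp_mem fields, in pages (names are illustrative). */
struct tcp_mem_triple {
    unsigned long low;
    unsigned long pressure;
    unsigned long high;
};

/*
 * Assumed derivation of the triple from a single per-protocol limit:
 *   low      = limit / 4 * 3   (integer division, then multiply)
 *   pressure = limit
 *   high     = 2 * low
 * with the limit clamped to at least 128 pages, as the commit message says.
 */
static struct tcp_mem_triple tcp_mem_from_limit(unsigned long limit)
{
    struct tcp_mem_triple m;

    if (limit < 128UL)
        limit = 128UL;
    m.low = limit / 4 * 3;
    m.pressure = limit;
    m.high = m.low * 2;
    return m;
}

/* Convert a page count to whole MiB, assuming 4 KiB pages. */
static unsigned long pages_to_mib(unsigned long pages)
{
    return pages * 4UL / 1024UL;
}
```

With a limit of 28306 pages this gives 21228 28306 42456, and with 108288 pages it gives 81216 108288 162432, matching both sets of values in the message. In MiB terms the "pressure" threshold drops from about 423 MiB to about 110 MiB on this machine.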
Re: [PATCH] block: Don't use static to define "void *p" in show_partition_start().
On 03.08.2012 12:41, Jens Axboe wrote:
> On 08/03/2012 07:07 AM, majianpeng wrote:
[]
>> diff --git a/block/genhd.c b/block/genhd.c
>> index cac7366..d839723 100644
>> --- a/block/genhd.c
>> +++ b/block/genhd.c
>> @@ -835,7 +835,7 @@ static void disk_seqf_stop(struct seq_file *seqf, void *v)
>>
>>  static void *show_partition_start(struct seq_file *seqf, loff_t *pos)
>>  {
>> -       static void *p;
>> +       void *p;
>>
>>         p = disk_seqf_start(seqf, pos);
>>         if (!IS_ERR_OR_NULL(p) && !*pos)
>
> Huh, that looks like a clear bug. I've applied it, thanks.

It also looks like -stable material, don't you think?

Thanks,

/mjt
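The thread does not spell out why the `static` is a bug. One plausible reading, offered here as an assumption, is that a function-local static is a single storage slot shared by every caller, so two readers running the function at the same time can see each other's pointer. The sketch below replays one such interleaving deterministically in a single thread; all names are hypothetical and nothing here is taken from block/genhd.c.

```c
#include <stddef.h>

/* Models `static void *p`: one storage slot shared by all callers. */
static const void *shared_slot;

/* Models the line "p = disk_seqf_start(seqf, pos);" for one caller. */
static void caller_assign(const void *result)
{
    shared_slot = result;
}

/* Models the later use of p in the same function. */
static const void *caller_use(void)
{
    return shared_slot;
}

/*
 * Replays one unlucky interleaving of two readers A and B.
 * Returns 1 if reader A ends up looking at reader B's result.
 */
static int interleaving_mixes_results(void)
{
    int a_result = 1;
    int b_result = 2;

    caller_assign(&a_result);            /* A assigns p            */
    caller_assign(&b_result);            /* B assigns p before ... */
    return caller_use() == &b_result;    /* ... A gets to read p   */
}
```

With an automatic (non-static) variable each caller has its own slot on its own stack, so this interleaving cannot mix results, which is what the one-line patch achieves.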
Re: 3.5-rcX : Big problem with root device returning
On 15.07.2012 23:12, werner wrote:
> Even if rdev isn't often used, it should kept working, as it's included in
> many other programs, and principally in the installers.

rdev doesn't _exist_ anymore in current software, including installers.

/mjt
Re: 3.5-rcX : Big problem with root device returning
On 12.07.2012 16:08, werner wrote:
> There is a big problem since 3.5-rc1 which potentially mess the installations
>
> rdev don't give longer back the root device like  /dev/sda1 ,  but in the
> bios form like  0x80010300

Note rdev returns information which is written to the kernel image,
not information about the actual device the system booted from.

> rdev is essential for the installation programs and for the installation
> f.ex. of lilo .  It's not convenient to rely on the bios numbers, because
> on some mainboards they change depending which boot order you select in BIOS,
> or only if you select another boot device in the bios boot menu with F12.
> Whilst /dev/sdXY is more reliable.
>
> rdev is an old basic function which always worked correctly, until now.

The rdev utility is obsolete; it is not present in current util-linux
anymore, because it makes just no sense nowadays.  Storing the root device
in the kernel image was obsoleted long ago by boot loaders
providing the kernel command line and the root= parameter.  Moreover, the root
device is often not mounted by the kernel itself, but by the initramfs
(which has become an integral part of the kernel image).

It is obsolete for 3 reasons:

 1) you have the kernel command line from the bootloader to store this
    and other info
 2) it is not guaranteed that on the next reboot the same device will
    be using the same /dev/sdX node, since they're discovered
    dynamically (in this sense, bios codes are more reliable, and
    filesystem UUIDs or labels are the right way to go)
 3) static device numbers are slowly going away too; very few tools
    are left which know about particular major,minor pairs.

> The error starts with 3.5-rc1 and is not corrected until 3.5-rc6 .  If I go
> back to an earlier kernel, 3.4 or older, then the same installation works
> correct (rdev gives /dev/sda1 ) and if I go back then again to 3.5-rcX it's
> again wrong (rdev gives 0x80010300).  Thus, this seems a wrong manner how
> the kernel gives back the root device, or interact with rdev.  It's also
> possible that this problem happens only under any kernel compilation option,
> so that below I give the differences in config between 3.4 and 3.5-rc1
>
> This problem should be fixed most quickly, rdev always have to work
> correctly.

There's no problem, so nothing to fix.

Thanks,

/mjt
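As background for point 3 in the reply above, this is roughly what "knowing about a particular major,minor pair" means in the legacy 16-bit device-number layout: major in the high byte, minor in the low byte. The helper names below are made up for illustration. The example pair (block major 8, minor 1) is the conventional static numbering for the first partition of the first SCSI/SATA disk. This sketch does not attempt to decode the 0x80010300 value reported in the thread.

```c
/* Legacy 16-bit device number: major in bits 15..8, minor in bits 7..0. */
static unsigned int old_style_encode_dev(unsigned int major, unsigned int minor)
{
    return ((major & 0xffu) << 8) | (minor & 0xffu);
}

/* Extract the major number from a legacy 16-bit device number. */
static unsigned int old_style_major(unsigned int dev)
{
    return (dev >> 8) & 0xffu;
}

/* Extract the minor number from a legacy 16-bit device number. */
static unsigned int old_style_minor(unsigned int dev)
{
    return dev & 0xffu;
}
```

Under that layout the pair 8,1 encodes to 0x0801. A tool that hard-codes such a mapping from number to /dev/sda1 depends on both the static numbering and the discovery order staying the same, which is the fragility the reply describes.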
Re: [PATCH 1/1] core-kernel: use multiply instead of shifts in hash_64
On 03.07.2012 00:25, Andrew Hunter wrote:
> diff --git a/include/linux/hash.h b/include/linux/hash.h
> index b80506b..daabc3d 100644
> --- a/include/linux/hash.h
> +++ b/include/linux/hash.h
> @@ -34,7 +34,9 @@
>  static inline u64 hash_64(u64 val, unsigned int bits)
>  {
>         u64 hash = val;
> -
> +#if BITS_PER_LONG == 64
> +       hash *= GOLDEN_RATIO_PRIME_64;
> +#else
>         /*  Sigh, gcc can't optimise this alone like it does for 32 bits. */

Hmm.  Does this comment make sense here now?

Thanks,

/mjt
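For context on what the patch swaps, here is a user-space sketch comparing a multiply by the 64-bit golden-ratio prime with the shift-and-add sequence the "Sigh, gcc can't optimise this" comment refers to. The constant and the shift sequence are reproduced from memory of include/linux/hash.h of that era and should be treated as assumptions, not as a quote of the patched file; the function names are made up. The two agree modulo 2^64 because the constant equals 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1.

```c
#include <stdint.h>

/* Assumed value of GOLDEN_RATIO_PRIME_64 (recalled, not quoted). */
#define SKETCH_GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001ULL

/* Multiply form, as in the "+" lines of the patch. bits must be 1..64. */
static uint64_t hash_64_mul(uint64_t val, unsigned int bits)
{
    uint64_t hash = val * SKETCH_GOLDEN_RATIO_PRIME_64;

    /* High bits are the most random, so keep the top `bits` bits. */
    return hash >> (64 - bits);
}

/* Shift-and-add form of the same multiplication. bits must be 1..64. */
static uint64_t hash_64_shift(uint64_t val, unsigned int bits)
{
    uint64_t hash = val;
    uint64_t n = hash;

    n <<= 18; hash -= n;   /* - val * 2^18 */
    n <<= 33; hash -= n;   /* - val * 2^51 */
    n <<= 3;  hash += n;   /* + val * 2^54 */
    n <<= 3;  hash -= n;   /* - val * 2^57 */
    n <<= 4;  hash += n;   /* + val * 2^61 */
    n <<= 2;  hash += n;   /* + val * 2^63 */

    return hash >> (64 - bits);
}
```

Since both forms compute the same product, the question raised in the reply is only about where the comment ends up: after the patch it sits in the `#else` branch, which is the non-64-bit case.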
Re: 3.0+ NFS issues
I tried to debug this again, maybe to reproduce in a virtual machine,
and found out that only 32bit server code shows this issue:
after updating the kernel on the server to 64bit (the same version)
I can't reproduce this issue anymore.  Rebooting back to 32bit,
and voila, it is here again.

Something apparently isn't right on 32bits... ;)

(And yes, the prob is still present and is very annoying :)

Thanks,

/mjt

On 31.05.2012 17:51, Michael Tokarev wrote:
> On 31.05.2012 17:46, Myklebust, Trond wrote:
>> On Thu, 2012-05-31 at 17:24 +0400, Michael Tokarev wrote:
> []
>>> I started tcpdump:
>>>
>>> tcpdump -npvi br0 -s 0 host 192.168.88.4 and \( proto ICMP or port 2049 \) -w nfsdump
>>>
>>> on the client (192.168.88.2).  Next I mounted a directory on the client,
>>> and started reading (tar'ing) a directory into /dev/null.  It captured a
>>> few stalls.  Tcpdump shows number of packets it got, the stalls are at
>>> packet counts 58090, 97069 and 97071.  I cancelled the capture after that.
>>>
>>> The resulting file is available at http://www.corpit.ru/mjt/tmp/nfsdump.xz ,
>>> it is 220Mb uncompressed and 1.3Mb compressed.  The source files are
>>> 10 files of 1Gb each, all made by using `truncate' utility, so does not
>>> take place on disk at all.  This also makes it obvious that the issue
>>> does not depend on the speed of disk on the server (since in this case,
>>> the server disk isn't even in use).
>>
>> OK. So from the above file it looks as if the traffic is mainly READ
>> requests.
>
> The issue here happens only with reads.
>
>> In 2 places the server stops responding. In both cases, the client seems
>> to be sending a single TCP frame containing several COMPOUNDS containing
>> READ requests (which should be legal) just prior to the hang. When the
>> server doesn't respond, the client pings it with a RENEW, before it ends
>> up severing the TCP connection and then retransmitting.
>
> And sometimes -- speaking only from the behaviour I've seen, not from the
> actual frames sent -- server does not respond to the RENEW too, in which
> case the client reports "nfs server no responding", and on the next
> renew it may actually respond.  This happens too, but much more rare.
>
> During these stalls, ie, when there's no network activity at all,
> the server NFSD threads are busy eating all available CPU.
>
> What does it all tell us? :)
>
> Thank you!
>
> /mjt
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
Jeremy Higdon wrote:
[]
> I'll put it even more strongly.  My experience is that disabling write
> cache plus disabling barriers is often much faster than enabling both
> barriers and write cache enabled, when doing metadata intensive
> operations, as long as you have a drive that is good at CTQ/NCQ.

Now, and this is VERY interesting, at least for me (and off-topic in this thread): which drive(s) are good at NCQ? I tried numerous SATA drives (NCQ is about SATA, right? :), but NCQ either does nothing in terms of performance or hurts. Yesterday we ordered another drive from Hitachi (their "raid edition" thing); I will try it tomorrow, but I have no hope here as it's some 5th or 6th model/brand already.

(Good old SCSI drives, even 10 years old, show a large difference when TCQ is enabled...)

Thanks!
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
Ric Wheeler wrote:
> Alasdair G Kergon wrote:
>> On Fri, Feb 15, 2008 at 03:20:10PM +0100, Andi Kleen wrote:
>>> On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote:
>>>> I wonder if it's worth the effort to try to implement this.
>>
>> My personal view (which seems to be in the minority) is that it's a
>> waste of our development time *except* in the (rare?) cases similar to
>> the ones Andi is talking about.
>
> Using working barriers is important for normal users when you really
> care about data loss and have normal drives in a box. We do power fail
> testing on boxes (with reiserfs and ext3) and can definitely see a lot
> of file system corruption eliminated over power failures when barriers
> are enabled properly.
>
> It is not unreasonable for some machines to disable barriers to get a
> performance boost, but I would not do that when you are storing things
> you really need back.

The talk here is about something different: about supporting barriers on md/dm devices, i.e., on pseudo-devices which use multiple real devices as components (software RAIDs etc). In this "world" it's nearly impossible to support barriers if there is more than one underlying component device; barriers only work if there's only one component. And the talk is about supporting barriers only in a "minority" of cases, mostly for the simplest device-mapper case only, NOT covering any raid1 or other "fancy" configurations.

> Of course, you don't need barriers when you either disable the write
> cache on the drives or use a battery backed RAID array which gives you a
> write cache that will survive power outages...

Two things here.

First, I still don't understand why, for God's sake, barriers are "working" while regular cache flushes are not. Almost no consumer-grade hard drive supports write barriers, but they all support regular cache flushes, and the latter should be enough (while not the most speed-optimal) to ensure data safety. Why require disabling the write cache (as the XFS FAQ does) instead of going the flush-cache-when-appropriate (as opposed to write-barrier-when-appropriate) way?

And second, "surprisingly", battery-backed RAID write caches tend to fail too, sometimes... ;) Usually, such a battery is enough to keep the data in memory for several hours only (since many RAID controllers use regular RAM for memory caches, which requires some power to keep its state). I came across this issue the hard way, and realized that only very few people around me who manage RAID systems even know about this problem: that the battery-backed cache is only good for some time... For example, power fails in the evening, and by the next morning the batteries are already empty. Or, with better batteries, think about a weekend... ;) (I've seen that some vendors now use a flash-based backing store for caches instead, which should ensure far better results here.)

/mjt
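Seen from user space, the flush-cache-when-appropriate idea comes down to an ordering rule: write the data, flush, and only then write the record that declares the data valid, and flush again. The sketch below illustrates that ordering with POSIX write() and fsync(). Whether fsync() actually makes the drive empty its volatile write cache depends on the kernel, the filesystem and the mount options, which is exactly the open question in this thread, so treat this as an illustration of ordering only. The function names and the "COMMIT" marker are hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes. Returns 0 or -1. */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical journal append: payload first, flush, then the commit
 * marker, flush again. The first fsync() is there so that the marker
 * is not made durable before the payload it vouches for.
 */
static int append_with_commit(const char *path, const char *payload)
{
	static const char commit[] = "COMMIT\n";
	int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
	int rc = -1;

	if (fd < 0)
		return -1;
	if (write_all(fd, payload, strlen(payload)) == 0 &&
	    fsync(fd) == 0 &&
	    write_all(fd, commit, sizeof(commit) - 1) == 0 &&
	    fsync(fd) == 0)
		rc = 0;
	if (close(fd) != 0)
		rc = -1;
	return rc;
}
```

A barrier would express the same constraint ("payload before marker") without forcing a full cache flush at each step, which is why flushes are described above as safe but not the most speed-optimal option.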
Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large)
Andrew Morton wrote:
> (suitable cc added)

Thanks. I meant to send it to linux-nfs originally, but it looks like I mistyped the address.

> (regression)

Now, after we did some more experiments with it, I don't think it's a regression. I'll post a few more details in a few hours when the ongoing testing finishes.

Thanks!

/mjt
Re: Spurious completions during NCQ
Hugo Mills wrote:
> On Fri, Feb 15, 2008 at 10:00:00AM -0500, Calvin Walton wrote:
>> On Fri, 2008-02-15 at 13:46 +, Hugo Mills wrote:
>>>    I'm getting these on my Dell Latitude D830:
>>>
>>> Feb 15 13:06:00 willow kernel: ata1.00: exception Emask 0x2 SAct 0x4 SErr 0x0 action 0x2 frozen
>>> Feb 15 13:06:00 willow kernel: ata1.00: spurious completions during NCQ issue=0x0 SAct=0x4 FIS=004040a1:0002
>
>>>    In some cases, there are several cmd/res lines listed. It's
>>> happening about once an hour or so (not correlated with any other
>>> event that I can see). It doesn't seem to be affecting operation of
>>> the machine, but it's making me nervous.

JFYI: Most probably it is correlated with smartd asking the device for its SMART status.

/mjt
Re: [PATCH] quota: Turn quotas off when remounting read-only
Jan Engelhardt wrote:
> On Feb 11 2008 13:39, Jan Kara wrote:
>>> But... I'm thinking about this scenario:
>>>
>>>  # mount /data
>>>  # quotaon /data
>>>   (some maintenance stuff to be planned)
>>>  # mount -o remount,ro /data
>>>   (do backup etc)
>>>  # mount -o remount,rw /data
>>>
>>> at this point, it's expected that quota on /data is enabled.
>>> After this patch, it's not anymore...
>>   Yes, it previously accidentally worked this way (for an year or so,
>> before that we refused to remount read-only). Hmm, but maybe we could
>> somehow tweak quotas to be turned on when remounting read-write again.
>> We have all the information we need at the time of remounting read-only
>> so we could store it and use it later when remounting read-write. I'll have
>> a look into that.
>
> Maybe it is possible to leave quota on all times, so that the
> reporting quota ioctls continue to work even in ro mode?

Well, that'd be the best approach imho (plus a check that all ioctls which try to modify quotas fail with EROFS as appropriate).

But the real problem is that it's unknown at this time where it fails in the first place. I can't reproduce my hang "on demand" (mount-ro followed by umount when quotas are turned on, with ext3fs: umount never finishes), yet it has bitten me several times already. So it must be something rare, some small race maybe, which is difficult to find... Yet it shows up at the most inappropriate moment. ;)

I have already learned to turn quota off before doing something with a filesystem, but sometimes I forget, and the result is always the same... ;)

Oh well.

/mjt
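Jan Kara's suggestion quoted above (remember at remount-ro time that quota was on, and switch it back on at remount-rw) amounts to a small state machine. The following is a toy model of that idea only, not kernel code and not the patch under discussion. Every name in it (struct fs_state, remount_ro, remount_rw) is invented for illustration.

```c
#include <stdbool.h>

/* Toy model of a mounted filesystem; all names are hypothetical. */
struct fs_state {
	bool read_only;
	bool quota_on;		/* quota accounting currently active */
	bool quota_suspended;	/* quota was on when we went read-only */
};

/* Remount read-only: stop quota so it never writes, but remember it. */
static void remount_ro(struct fs_state *fs)
{
	if (fs->read_only)
		return;
	fs->quota_suspended = fs->quota_on;
	fs->quota_on = false;
	fs->read_only = true;
}

/* Remount read-write: re-enable quota only if remount_ro() suspended it. */
static void remount_rw(struct fs_state *fs)
{
	if (!fs->read_only)
		return;
	fs->read_only = false;
	fs->quota_on = fs->quota_suspended;
	fs->quota_suspended = false;
}
```

In this model the backup scenario from the quoted message ends with quota enabled again after the final remount, while a filesystem that never had quota on stays without it.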
Re: [PATCH] Implement barrier support for single device DM devices
Alasdair G Kergon wrote:
> On Fri, Feb 15, 2008 at 01:08:21PM +0100, Andi Kleen wrote:
>> Implement barrier support for single device DM devices
>
> Thanks.  We've got some (more-invasive) dm patches in the works that
> attempt to use flushing to emulate barriers where we can't just
> pass them down like that.

I wonder if it's worth the effort to try to implement this.

As far as I understand (*), if a filesystem realizes that the underlying block device does not support barriers, it will switch to using regular flushes instead. Isn't that the same thing you're trying to do at the MD level?

Note that a filesystem must understand barriers/flushes on the underlying block device anyway, since many disk drives don't support barriers.

(*) This is, in fact, an interesting question. I still can't find complete information about it. For example, how safe is xfs if barriers are not supported or are turned off? Is it "less safe" than with barriers? Will it use regular cache flushes if barriers are not available? Ditto for ext3fs, but there barriers are not enabled by default.

/mjt
2.6.24: RPC: bad TCP reclen 0x00020090 (large)
Hello!

After upgrading to 2.6.24 (from .23), we're seeing A LOT of messages like the one in $subj in dmesg:

Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed.
Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed.
Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed.
...

with a Linux NFS server. The clients are all Linux too, mostly 2.6.23 and some 2.6.22.

I found the "offending" piece of code in net/sunrpc/svcsock.c, in routine svc_tcp_recvfrom(), with the condition being:

  if (svsk->sk_reclen > serv->sv_max_mesg)
     ...

This happens after a server reboot. At this point, the client(s) try to perform some NFS transaction and fail, and the server starts generating the above messages, until I do a umount followed by a mount on all clients.

Before, such a situation (NFS server reboot) was handled transparently, i.e. there was nothing to do: the mount continued working just fine when the server came back online.

Now, I'm not sure if it's really a 2.6.24-specific problem or a userspace problem. Some time ago we also upgraded the nfs-kernel-server (Debian) package, and the remount-after-nfs-server-reboot problem started to occur at THAT time (and it is something to worry about as well, I just had no time to deal with it); but the dmesg spamming only appeared with 2.6.24.

How can I debug the issue further from this point?

Thanks!

/mjt
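As background for the message text: on TCP, each RPC message is preceded by a 4-byte record marker whose high bit flags the last fragment and whose low 31 bits give the fragment length (the record marking standard of ONC RPC). Read as a length, 0x00020090 is 131216 bytes, i.e. 128 KiB plus 144 bytes. The sketch below decodes such a marker and mirrors the shape of the quoted check. It is an illustration, not the kernel code: the macro, struct and function names are invented, and the limits used in the usage note are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define RM_LAST_FRAGMENT 0x80000000u	/* high bit: last fragment of the record */
#define RM_LENGTH_MASK   0x7fffffffu	/* low 31 bits: fragment length in bytes */

struct record_marker {
	bool     last_fragment;
	uint32_t length;
};

/* Decode a record marker that is already in host byte order. */
static struct record_marker decode_record_marker(uint32_t marker)
{
	struct record_marker rm;

	rm.last_fragment = (marker & RM_LAST_FRAGMENT) != 0;
	rm.length = marker & RM_LENGTH_MASK;
	return rm;
}

/* Same shape as the quoted condition: reject lengths above the server limit. */
static bool reclen_too_large(uint32_t marker, uint32_t max_mesg)
{
	return decode_record_marker(marker).length > max_mesg;
}
```

With a made-up limit of 64 KiB, reclen_too_large(0x80020090, 65536) is true, while with a 1 MiB limit it is false. Comparing the server's actual sv_max_mesg before and after the upgrade against the request size the clients negotiated at mount time would be one way to narrow this down.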
Re: [PATCH] quota: Turn quotas off when remounting read-only
Andrew Morton wrote:
> On Thu, 7 Feb 2008 15:37:21 +0100 Jan Kara <[EMAIL PROTECTED]> wrote:
>
>> Turn off quotas before filesystem is remounted read only. Otherwise quota will
>> try to write to read-only filesystem which does no good... We could also just
>> refuse to remount ro when quota is enabled but turning quota off is consistent
>> with what we do on umount.
[a nice one-liner snipped]

> Cool. And this is applicable to 2.6.23, 2.6.22 and even earlier, isn't it?

Given how long this issue has existed, I don't think it's worth pushing it to -stable. It's an old issue, which happens quite rarely, and no one bothered to report it so far... But it's not my call... ;)

But... I'm thinking about this scenario:

 # mount /data
 # quotaon /data
  (some maintenance stuff to be planned)
 # mount -o remount,ro /data
  (do backup etc)
 # mount -o remount,rw /data

At this point, it's expected that quota on /data is enabled. After this patch, it's not anymore...

I think this is a more usual scenario than mine (umount instead of remount-rw). And this change will break it.

So I'm not sure what really to do here. Probably refusing remount-ro if quota is on is better... it's annoying for sure, but at least it's explicit, and it avoids the hang too.

/mjt
Re: remount-ro & umount & quota interaction
Jan Kara wrote:
[]
>> I mean, why it locks in the first place?  Quota subsystem trying
>> to write something into an read-only filesystem?  If so, WHY it
>> is trying to do that on umount instead on a remount-ro?
>   Actually, I couldn't reproduce the hang on my testing machine so I don't
> know exactly why it hangs. But my guess is that it's because we try to
> write to the filesystem...

I can't reproduce it here easily either. Yesterday I had a locked-up console and had to hard-reboot the machine due to this (it was far from the first time I've hit this issue), but "on-demand reproducing" doesn't work (the uptime on that host was about 100 days, and I had to do some repartitioning, hence the remount-ro to copy consistent data to another place; maybe during those 100 days there was something... ;)

And I wasn't able to reproduce it on 2.6.24 so far either (this one is only used on a test machine so far).

I'll keep trying ;)

Thanks for your support!

/mjt
Re: remount-ro & umount & quota interaction
Jan Kara wrote:
[deadlock after remount-ro followed by umount when quota is enabled]

>   Of course, thanks for report :). The problem is we allow remounting
> read only which we should refuse when quota is enabled. I'll fix that in
> a minute.

Hmm. While that will prevent the lockup, maybe it's better to perform an equivalent of quotaoff on mount-ro instead? Or even do something more useful, like flushing the quota data to disk the same way the rest of the filesystem is flushed, so that on umount quota will not get in the way...

I mean, why does it lock up in the first place? Is the quota subsystem trying to write something to a read-only filesystem? If so, WHY is it trying to do that on umount instead of on remount-ro?

Thanks!

/mjt
remount-ro & umount & quota interaction
For a long time I've been bitten by a bad interaction between mount -o remount,ro and quota operations. The sequence is as follows:

  mount /fs
  quotaon -ug /fs
  mount -o remount,ro /fs
  umount /fs

At this point, umount never returns. /proc/$pid/wchan shows vfs_quota_off:

Feb  6 20:53:25 linux kernel: umount        D e5183eb8     0  8646      1
Feb  6 20:53:25 linux kernel:        e5183ecc 0086 0002 e5183eb8 e5183eb0 c1db2540 c1db2684
Feb  6 20:53:25 linux kernel:        c1db2684 c1c0dd00 cfd9f1c0 c0367080 c0367080 f5849000 f7f06880
Feb  6 20:53:25 linux kernel:        f7e89d80 c0367080 b7c9795c 005f3997 00ff
Feb  6 20:53:25 linux kernel: Call Trace:
Feb  6 20:53:25 linux kernel:  [c01a2a65] vfs_quota_off+0x345/0x490
Feb  6 20:53:25 linux kernel:  [c013a3a0] autoremove_wake_function+0x0/0x50
Feb  6 20:53:25 linux kernel:  [c0174bf6] deactivate_super+0x46/0x80
Feb  6 20:53:25 linux kernel:  [c0188bba] sys_umount+0x4a/0x240
Feb  6 20:53:25 linux kernel:  [c017637f] sys_stat64+0xf/0x30
Feb  6 20:53:25 linux kernel:  [c0162069] remove_vma+0x39/0x50
Feb  6 20:53:25 linux kernel:  [c0162b67] do_munmap+0x197/0x1f0
Feb  6 20:53:25 linux kernel:  [c0188dc5] sys_oldumount+0x15/0x20
Feb  6 20:53:25 linux kernel:  [c010417e] sysenter_past_esp+0x5f/0x85

The filesystem is ext3.

The issue has been here for a long time, at least since before 2.6.20, and is still present in 2.6.23 (I'll try 2.6.24 later today).

Can it be fixed please? :)

Thanks!

/mjt
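When collecting this kind of report, the wait channel mentioned above can also be read programmatically. The helper below is a minimal sketch that reads /proc/<pid>/wchan for a given PID. The function name read_wchan is made up, and it is an assumption that the file is available on the kernel in question: on some kernels or configurations it may be missing or may only contain "0".

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Read /proc/<pid>/wchan into buf as a NUL-terminated string.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_wchan(pid_t pid, char *buf, size_t buflen)
{
	char path[64];
	FILE *f;
	size_t n;

	if (buflen == 0)
		return -1;
	snprintf(path, sizeof(path), "/proc/%ld/wchan", (long)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, buflen - 1, f);
	if (ferror(f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[n] = '\0';
	return 0;
}
```

For the hang described here, calling it with the PID of the stuck umount would be expected to yield the vfs_quota_off symbol shown in the trace.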
Re: swsusp on an AMD x2-64, 2.6.24: regression?
Michael Tokarev wrote:
> Rafael J. Wysocki wrote:
[]
>> I guess it's a special variation of
>> http://bugzilla.kernel.org/show_bug.cgi?id=9528
>>
>> Please try to hibernate in the shutdown mode (ie. echo
>> "shutdown" into /sys/power/disk before hibernation).

[yes it works with shutdown...]

> In any way, this is definitely progress, and that bug
> seems to be the same as I'm seeing here.
>
> Now... I see there's a new BIOS for this mobo available
> (it's ASUS M2NPV-VM motherboard, Geforce6150/Nforce430(?)),
> which is more recent compared with what I have here.  Trying
> it now (will try to reflash it without a floppy - it turns
> out to be quite.. challenging task ;)

OK, I updated the BIOS (using a FreeDOS virtual boot floppy provided by memdisk from syslinux), and... now it all works correctly!

I was definitely blaming Linux for the regression, obviously, since the "before" kernel worked while the "current" kernel does not anymore. But the problem seems to be due to some BIOS buglet.

Oh well... ;)

> Thanks!

!

Now I can go on with my other.. question, namely the UPS thingie... :)

/mjt
Re: swsusp on an AMD x2-64, 2.6.24: regression?
Rafael J. Wysocki wrote: > On Friday, 1 of February 2008, Michael Tokarev wrote: [] >> no_console_suspend it is. Tried that, the "S|" thing is still >> here, but instead of "Suspending console(s)" it now shows >> progress of suspending other devices. The end result is >> the same - finally it stops and sits here ad infinitum. > > I guess it's a special variation of > http://bugzilla.kernel.org/show_bug.cgi?id=9528 > > Please try to hibernate in the shutdown mode (ie. echo > "shutdown" into /sys/power/disk before hibernation). Hmm. A very obscure thing - that bug, that is. Tried "shutdown" - it works - even with all the other "fancy" stuff like highres timers, cpufreq et al. And it resumes correctly as well. After reading all the stuff attached to that bugreport, I also tried removing ohci_hcd - it also works just fine (had to do it in one line -- rmmod ohci-hcd; sleep 5; echo disk > /sys/power/state -- because I don't have non-USB keyboard handy :) What I also noticied is that at least twice while doing all the experiments, I've seen a message similar to (off memory): ohci_hcd: unlink after non-IRQ - controller is probably using the wrong IRQ this is done when no_console_suspend is enabled - during the final stage of suspend, when the kernel prints messages about disabling acpi devices. I can't reproduce it easily, but it happened at least twice with the same kernel configuration (i tried different options, many variations, recompiling and reinstalling kernel each time). In any way, this is definitely progress, and that bug seems to be the same as I'm seeing here. Now... I see there's a new BIOS for this mobo available (it's ASUS M2NPV-VM motherboard, Geforce6150/Nforce430(?)), which is more recent compared with what I have here. Trying it now (will try to reflash it without a floppy - it turns out to be quite.. challenging task ;) Thanks! 
/mjt
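For reference, the "shutdown mode" test discussed above comes down to two sysfs writes: "shutdown" into /sys/power/disk, then "disk" into /sys/power/state. Below is a minimal Python sketch of that sequence, assuming the standard /sys/power interface. The function name hibernate_shutdown_mode and the power_dir parameter are made up for illustration; power_dir exists only so the sequence can be tried against a scratch directory. Pointed at the real /sys/power as root, it would actually hibernate the machine.

```python
import os
import tempfile


def hibernate_shutdown_mode(power_dir="/sys/power"):
    """Select the 'shutdown' hibernation mode, then trigger hibernation.

    power_dir is a hypothetical parameter for this sketch; the real
    interface lives in /sys/power and needs root.
    """
    # Order matters: pick the mode first, then start the suspend-to-disk.
    steps = [("disk", "shutdown"), ("state", "disk")]
    for name, value in steps:
        with open(os.path.join(power_dir, name), "w") as f:
            f.write(value)
    return steps


if __name__ == "__main__":
    # Harmless demo against a scratch directory instead of the real sysfs.
    scratch = tempfile.mkdtemp()
    for name, value in hibernate_shutdown_mode(power_dir=scratch):
        print(f"wrote {value!r} to {os.path.join(scratch, name)}")
```

The ohci_hcd workaround from the message (rmmod, short sleep, then the write to /sys/power/state) is a separate experiment and is not covered by this sketch.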
Re: swsusp on an AMD x2-64, 2.6.24: regression?
Pavel Machek wrote:
> On Fri 2008-02-01 00:41:06, Michael Tokarev wrote:
[]
>> With 2.6.24, it tries to suspend, saves pages to disk,
>> when prints this:
>>
>> ..Saving pages... done.
>> Sl

It's actually "S|", not "Sl".

>> Suspending console(s)
>> _
>>
>> At this point, nothing more happens. It does not
>> react to keyboard or to any other external events,

..because the keyboard is USB-connected, and it shuts down
all USB devices. I'll try with a PS/2 keyboard (when I find
the one I had somewhere... ;)

[]
> no_console_suspend (sp?), nohz=off, highres=off, and try with minimum
> config.

no_console_suspend it is. Tried that, the "S|" thing is still
here, but instead of "Suspending console(s)" it now shows
progress of suspending other devices. The end result is
the same - finally it stops and sits here ad infinitum.

nohz and highres are useless now, as I recompiled the kernel
without support for those, and without CPU_IDLE and other
fancy stuff, and disabled cpufreq just in case.

What's minimum config? Should I turn off SMP (it's a dual-core
CPU by the way)? Something else? (I already removed most
driver modules when trying suspend - only the ones which are
absolutely necessary are left).

I've read Documentation/power/tricks.txt. From that list, I have
the following:

 o all drivers are unloaded except disk and usb (keyboard)
 o preempt is disabled (was never enabled)
 o APIC IS in use.
 o modules are in use. Is it worth trying module-less?
 o vga text console - not even "vga" per se, - no framebuffers
   and such, not even as modules. No "video mode switching
   support" is enabled.
 o only a few processes left, in like single-user mode.

One other difference between 2.6.23 and 2.6.24 as I see here
is: 2.6.24 tells me about TSC instability (when I load cpufreq
stuff), while 2.6.23 did not. This is about 64bit mode - with
32bits, both switch from tsc to hpet, so in this regard,
2.6.24 (with 32bits) is not different from 2.6.23 it seems
(i mean in relation to suspend issues, since 32bits .23
mentioned tsc instability yet it suspended fine).

So I'm.. stuck. :) Don't know where to go from here.

Thanks!

/mjt
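When juggling this many boot options, it can help to confirm that the suggested debugging parameters (no_console_suspend, nohz=off, highres=off) actually reached the running kernel. A small Python sketch along those lines follows; it only looks for exact tokens on the command line string, and the function name check_suspend_debug_params is invented here. Reading /proc/cmdline is the only system-specific part.

```python
def check_suspend_debug_params(cmdline):
    """Report which of the suggested debug parameters appear on a kernel command line."""
    wanted = ("no_console_suspend", "nohz=off", "highres=off")
    # Parameters are whitespace-separated; match whole tokens only.
    present = set(cmdline.split())
    return {param: param in present for param in wanted}


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    with open("/proc/cmdline") as f:
        for param, found in check_suspend_debug_params(f.read()).items():
            print(f"{param}: {'set' if found else 'missing'}")
```

Note that with a kernel built without nohz/highres support, as described in the message, the last two parameters make no difference whether present or not.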
Re: hibernate/suspend-to-disk: to turn power or not?
Pavel Machek wrote:
[]
>> I'm looking at the uswsusp source (while the kernel compiles),
>> and have a question here. Is it possible to call some external
>> application (typically a shell script) to do the final work after
>> when the image has been written? I mean in principle - I
>> understand there are some limitations here, but I don't know
>> which exactly.
>
> No, you can't exec() anything. That would write mtime back to disk and
> cause badness.

Now that's.. interesting. s2disk writes to a swap device/file,
which should update mtime of this device node/file. Isn't it
something which also causes the same badness?

Also, if the only concern is mtime update, what's really
wrong with it? I mean, regardless of whether such update
will finally hit the disk or not, there's not much difference
really - it updates just mtime field, and there should be no
harm in that. Unless such update first goes to a journal
(in a journalling filesystem) - which DOES modify some
on-disk structures.

>> it typically involves writing/reading something to/from
>> a given serial port (/dev/ttySxx), or to an USB device,

> Create libups.so, and link s2disk to it?

And what's the difference here again? We'll open a serial
port and write something to it - which, again, will update
mtime of that device node. Unless the said node is on a
tmpfs, exactly the same badness will happen. Not all the
world is udev, after all.

So I don't get the reason why we can't exec something here,
still. (And, for example, call splashy commands as external
processes, instead of linking all this cruft into s2disk and
resume.)

What I'm thinking about here is - s2ram mlock()s its memory.
If it will fork/exec something, that something will obviously
NOT be locked like that. Is it of some concern? Probably
not, because that something will be executed after we've
taken the snapshot.
/mjt
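The mtime argument is easy to see on a scratch file: a plain write bumps the inode's mtime with no exec() involved. Here is a minimal Python sketch of that, with a regular temporary file standing in for the swap file or device node. That stand-in is an assumption of the sketch; it does not check how a real device node or a journalling filesystem behaves. The function name write_updates_mtime is invented for illustration.

```python
import os
import tempfile


def write_updates_mtime():
    """Return (mtime_before, mtime_after) in nanoseconds around a plain write."""
    fd, path = tempfile.mkstemp()
    try:
        os.close(fd)
        # Backdate the timestamps so the change shows up regardless of
        # the filesystem's timestamp granularity.
        os.utime(path, (1_000_000, 1_000_000))
        before = os.stat(path).st_mtime_ns
        with open(path, "r+b") as f:
            f.write(b"x")
        after = os.stat(path).st_mtime_ns
        return before, after
    finally:
        os.unlink(path)


if __name__ == "__main__":
    before, after = write_updates_mtime()
    print("mtime changed by the write:", after != before)
```

Whether that updated timestamp ever reaches the disk after the snapshot is taken is exactly the open question in the message; the sketch only shows that the in-memory update happens on a write.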