Re: screen goes blank when loading gma500_gfx (atom D2500)

2015-04-02 Thread Michael Tokarev
19.03.2015 14:56, One Thousand Gnomes wrote:
> On Thu, 19 Mar 2015 14:09:29 +0300
> Michael Tokarev <m...@tls.msk.ru> wrote:
> 
>> Half a year passed since my first email in this thread, and current kernels
>> (4.0-tobe) still does not work properly.  Meanwhile, I found this thread:
>> http://www.linuxquestions.org/questions/slackware-installation-40/black-screen-on-intel-desktopboard-d2500cc-4175503983/
>> which seems to help.  I wonder where they got these boot params from...
>>
> 
> Its one of the standard suggestions for dealing with wonky DRM I think.
> 
> If that makes the difference on your box can you send me a dmidecode of
> it, and I'll see if we can at least teach the driver that the 2500CC
> needs LVDS enabled regardless of what the BIOS reports.

OK, actually this is not so simple.

Yes, LVDS:d makes a difference.  Namely, it enables the monitor connected to
VGA-0 to function.

But once I plug in a digital monitor (DVI-0), the screen goes blank again when
loading the module, and this time it does not matter whether I specify any
video= options (I tried disabling various combinations of the listed
connectors); the screen is always blank.

So basically the thing is still unusable, because the D-sub connection isn't
stable (the picture "trembles" depending on the cable and environmental
conditions), while the digital option does not work.

In the BIOS, there's an option to enable LVDS (it is disabled by default) and,
once enabled, to make it primary or secondary (with the other output, D-sub
or DVI, chosen either automatically or manually).  When I enable LVDS in the
BIOS together with any other monitor, the thing again fails in the same way
(the screen goes blank once the module is loaded), but now the D-sub/VGA
monitor does not work either.

Out of curiosity I tried running Windows 7 on this machine.  Apparently it
works with the DVI monitor just fine and supports a two-monitor configuration.
Maybe they have some quirks in the drivers, I dunno...

Thanks,

/mjt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: screen goes blank when loading gma500_gfx (atom D2500)

2015-03-20 Thread Michael Tokarev
19.03.2015 23:05, One Thousand Gnomes wrote:
>> Yes, with video=LVDS-1:d boot parameter, kernel boots fine and there is
>> graphics/video output on the screen, with the following message from kernel
>> when loading gma500_gfx:
>>
>> [6.472859] [drm] forcing LVDS-1 connector OFF
>>
>> (and a few others).
>>
>> There's one funky thing still -- the screen size is not calculated correctly
>> for the text (vga, d-sub) console, last text line is placed at about 3/4 of
>> the screen size, with the rest - 1/4 - of the screen being blank.
> 
> I've seen that in one other case, where what was in fact happening was
> that forcing the connector "off" was actually effectively leaving it as
> the BIOS set it.

When I use LVDS-1:d on the kernel command line, that connector is not shown
by utilities such as xrandr at all.  There is, however, another connector
named LVDS-0, and there are also DVI-0, DVI-1, DisplayPort-0 and
DisplayPort-1, while this mobo only has DVI & D-SUB (and LVDS soldered on
board too) and no DP, at least as far as I can see.  So at least one LVDS
connector is shown anyway (LVDS-0, not LVDS-1), and that one is "not connected".

Besides, DisplayPort-1 is shown as "connected" by xrandr, with the monitor set
to 1024x768 mode -- I think this is why the text VGA size is calculated
wrong... Lemme see...

..nope.  Adding video=DisplayPort-1:d to the kernel command line (in
addition to video=LVDS-1:d) makes no difference, DisplayPort-1 is still
shown by xrandr as connected @1024x768.
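
For cross-checking what xrandr reports against what the kernel itself thinks,
here is a minimal sketch that reads the per-connector "status" and "enabled"
files under /sys/class/drm.  The function name and the returned layout are
made up for illustration, and the kernel's connector names may not match the
ones xrandr prints (the kernel log in this thread says VGA-1 where xrandr
shows VGA-0).

```python
from pathlib import Path


def drm_connector_status(sysfs_root="/sys/class/drm"):
    """Return {connector_name: (status, enabled)} read from DRM sysfs entries.

    Hypothetical helper: only the sysfs file layout is taken from the kernel,
    everything else here is an illustration.
    """
    result = {}
    # Connector directories look like "card0-VGA-1"; plain "card0" is skipped
    # because it does not match the "card*-*" pattern.
    for entry in sorted(Path(sysfs_root).glob("card*-*")):
        status_file = entry / "status"
        if not status_file.is_file():
            continue
        # Drop the "cardN-" prefix to get the connector name.
        name = entry.name.split("-", 1)[1]
        status = status_file.read_text().strip()
        enabled_file = entry / "enabled"
        if enabled_file.is_file():
            enabled = enabled_file.read_text().strip()
        else:
            enabled = "unknown"
        result[name] = (status, enabled)
    return result
```

Calling drm_connector_status() with no arguments on the affected box and
printing the result should show whether DisplayPort-1 is "connected" from the
kernel's point of view too, independent of X.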

> What happens if you then use xrandr to change the
> display sizes ?

X11 works fine as far as I can see.  Xrandr works and changes video modes.
Once I switch from X back to the text console, the text occupies only 3/4 of
the screen, as if the monitor were smaller.

I wonder if it will work with more than one monitor... ;)  I'll try hopefully
today.

Thanks,

/mjt


Re: screen goes blank when loading gma500_gfx (atom D2500)

2015-03-19 Thread Michael Tokarev
19.03.2015 14:56, One Thousand Gnomes wrote:
> On Thu, 19 Mar 2015 14:09:29 +0300
> Michael Tokarev <m...@tls.msk.ru> wrote:
> 
>> Half a year passed since my first email in this thread, and current kernels
>> (4.0-tobe) still does not work properly.  Meanwhile, I found this thread:
>> http://www.linuxquestions.org/questions/slackware-installation-40/black-screen-on-intel-desktopboard-d2500cc-4175503983/
>> which seems to help.  I wonder where they got these boot params from...
> 
> Its one of the standard suggestions for dealing with wonky DRM I think.
> 
> If that makes the difference on your box can you send me a dmidecode of
> it, and I'll see if we can at least teach the driver that the 2500CC
> needs LVDS enabled regardless of what the BIOS reports.

I think you mean disable, not enable, since this is (again, I think) what
the video=LVDS-1:d kernel boot parameter does.

Yes, with the video=LVDS-1:d boot parameter, the kernel boots fine and there
is graphics/video output on the screen, with the following message from the
kernel when loading gma500_gfx:

[6.472859] [drm] forcing LVDS-1 connector OFF

(and a few others).
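
To make the meaning of the parameter explicit, here is a minimal sketch that
pulls video=<connector>:<spec> options out of a kernel command line and
decodes a trailing force flag.  It assumes the usual reading of that flag
(d = force off, e = force on, D = force on, digital); the function name and
the returned tuple layout are invented for illustration, and old fbdev-style
options such as video=vesafb:... would be misread as connector names.

```python
def parse_video_options(cmdline):
    """Map connector name -> (mode_spec, force) for each video= option.

    Hypothetical helper for illustration; it only understands the
    video=<connector>:<spec> form and ignores everything else.
    """
    # Assumed meaning of the trailing force flag.
    force_names = {"d": "off", "e": "on", "D": "on-digital"}
    options = {}
    for token in cmdline.split():
        if not token.startswith("video="):
            continue
        value = token[len("video="):]
        connector, sep, spec = value.partition(":")
        if not sep:
            # No connector given (e.g. video=1024x768); skipped in this sketch.
            continue
        force = None
        if spec and spec[-1] in force_names:
            force = force_names[spec[-1]]
            spec = spec[:-1]
        options[connector] = (spec or None, force)
    return options
```

For the command line used here, parse_video_options("... video=LVDS-1:d")
yields {"LVDS-1": (None, "off")}, which matches the "forcing LVDS-1 connector
OFF" message above.  On a running system the input can be taken from
open("/proc/cmdline").read().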

There's one funky thing still -- the screen size is not calculated correctly
for the text (VGA, D-sub) console: the last text line is placed at about 3/4
of the screen, with the remaining 1/4 of the screen being blank.

However, X seems to work fine, using the generic modesetting driver.

Below is dmidecode output.

Thanks,

/mjt

===
# dmidecode 2.12
SMBIOS 2.7 present.
27 structures occupying 1491 bytes.
Table at 0x000EB920.

Handle 0x0000, DMI type 4, 42 bytes
Processor Information
Socket Designation: CPU 1
Type: Central Processor
Family: Other
Manufacturer: Intel(R) Corporation
ID: 61 06 03 00 FF FB EB BF
Version: Intel(R) Atom(TM) CPU D2500   @ 1.86GHz
Voltage: 1.1 V
External Clock: 133 MHz
Max Speed: 4000 MHz
Current Speed: 1868 MHz
Status: Populated, Enabled
Upgrade: None
L1 Cache Handle: 0x0003
L2 Cache Handle: 0x0001
L3 Cache Handle: Not Provided
Serial Number: Not Specified
Asset Tag: Not Specified
Part Number: Not Specified
Core Count: 2
Core Enabled: 2
Thread Count: 1
Characteristics:
64-bit capable
Multi-Core
Execute Protection

Handle 0x0001, DMI type 7, 19 bytes
Cache Information
Socket Designation: Unknown
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Back
Location: Internal
Installed Size: 512 kB
Maximum Size: 512 kB
Supported SRAM Types:
Asynchronous
Installed SRAM Type: Asynchronous
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Data
Associativity: 8-way Set-associative

Handle 0x0002, DMI type 7, 19 bytes
Cache Information
Socket Designation: Unknown
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 32 kB
Maximum Size: 32 kB
Supported SRAM Types:
Asynchronous
Installed SRAM Type: Asynchronous
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Instruction
Associativity: 8-way Set-associative

Handle 0x0003, DMI type 7, 19 bytes
Cache Information
Socket Designation: Unknown
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 24 kB
Maximum Size: 24 kB
Supported SRAM Types:
Asynchronous
Installed SRAM Type: Asynchronous
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Data
Associativity: 32-way Set-associative

Handle 0x0004, DMI type 0, 24 bytes
BIOS Information
Vendor: Intel Corp.
Version: CCCDT10N.86A.0039.2013.0425.1625
Release Date: 04/25/2013
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 2048 kB
Characteristics:
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
EDD is supported
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
ATAPI Zip drive boot is supported
BIOS boot specification is supported
Function key-initiated network boot is supported

Re: screen goes blank when loading gma500_gfx (atom D2500)

2015-03-19 Thread Michael Tokarev
19.03.2015 14:09, Michael Tokarev wrote:
> Half a year passed since my first email in this thread, and current kernels

Actually it was more than a year, since Feb-2014 ;)

> (4.0-tobe) still does not work properly.  Meanwhile, I found this thread:
> http://www.linuxquestions.org/questions/slackware-installation-40/black-screen-on-intel-desktopboard-d2500cc-4175503983/
> which seems to help.  I wonder where they got these boot params from...
> 
> Thanks,
> 
> /mjt



Re: screen goes blank when loading gma500_gfx (atom D2500)

2015-03-19 Thread Michael Tokarev
Half a year passed since my first email in this thread, and current kernels
(4.0-tobe) still do not work properly.  Meanwhile, I found this thread:
http://www.linuxquestions.org/questions/slackware-installation-40/black-screen-on-intel-desktopboard-d2500cc-4175503983/
which seems to help.  I wonder where they got these boot params from...

Thanks,

/mjt

05.08.2014 20:15, Michael Tokarev wrote:
> 05.08.2014 20:11, Michael Tokarev wrote:
>> Hello again.
>>
>> It's been 4 more months since last message in this thread (which was mine).
>> Now kernel 3.16 has been released, and I decided to give it a try.  And it
>> behaves just like all previous kernels, -- once gma500_gfx module is loaded,
>> screen goes blank, monitor turns off ("no signal detected") and nothing to
>> be seen until reboot.
>>
>> Can we try to debug this somehow, after more than half a year?... :)
> 
> Current debugging (by 3.16), after:
> 
>  modprobe drm debug=6
>  modprobe gma500_gfx
> 
> on a freshly booted system:
> 
> [   46.463381] Linux agpgart interface v0.103
> [   46.491487] [drm] Initialized drm 1.1.0 20060810
> [   56.585520] [drm:psb_intel_opregion_setup] Public ACPI methods supported
> [   56.585528] [drm:psb_intel_opregion_setup] ASLE supported
> [   56.585563] gma500 0000:00:02.0: irq 50 for MSI/MSI-X
> [   56.585591] [drm:psb_intel_init_bios] Using VBT from OpRegion: $VBT 
> CEDARVIEW  d
> [   56.585604] [drm:drm_mode_debug_printmodeline] Modeline 0:"1920x1080" 0 
> 144000 1920 2016 2080 2176 1080 1088 1092 1100 0x8 0xa
> [   56.585609] [drm:parse_sdvo_device_mapping] No SDVO device info is found 
> in VBT
> [   56.585617] [drm:parse_edp] EDP timing in vbt t1_t3 2000 t8 10 t9 2000 t10 
> 500 t11_t12 5000
> [   56.585621] [drm:parse_edp] VBT reports EDP: Lane_count 1, Lane_rate 6, 
> Bpp 24
> [   56.585624] [drm:parse_edp] VBT reports EDP: VSwing  0, Preemph 0
> [   56.598203] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
> [   56.598902] acpi device:28: registered as cooling_device2
> [   56.599109] input: Video Bus as 
> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input11
> [   56.599326] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [   56.599366] [drm] No driver support for vblank timestamp query.
> [   56.650918] [drm:drm_do_probe_ddc_edid] drm: skipping non-existent adapter 
> intel drm LVDSDDC_C
> [   56.651842] [drm:cdv_intel_dp_i2c_init] i2c_init DPDDC-B
> [   56.652352] [drm:cdv_intel_dp_aux_ch] dp_aux_ch timeout status 0x51440064
> [   56.652356] [drm:cdv_intel_dp_i2c_aux_ch] aux_ch failed -110
> [   56.652863] [drm:cdv_intel_dp_aux_ch] dp_aux_ch timeout status 0x51440064
> [   56.652866] [drm:cdv_intel_dp_i2c_aux_ch] aux_ch failed -110
> [   56.653706] [drm:cdv_intel_dp_i2c_init] i2c_init DPDDC-C
> [   56.654014] [drm:cdv_intel_dp_i2c_aux_ch] aux_i2c nack
> [   56.654223] [drm:cdv_intel_dp_i2c_aux_ch] aux_i2c nack
> [   56.714765] gma500 0000:00:02.0: trying to get vblank count for disabled 
> pipe 1
> [   56.714812] gma500 0000:00:02.0: trying to get vblank count for disabled 
> pipe 1
> [   56.775220] [drm:drm_helper_probe_single_connector_modes_merge_bits] 
> [CONNECTOR:10:VGA-1]
> [   56.900606] [drm:drm_helper_probe_single_connector_modes_merge_bits] 
> [CONNECTOR:10:VGA-1] probed modes :
> [   56.900617] [drm:drm_mode_debug_printmodeline] Modeline 26:"1280x1024" 60 
> 108000 1280 1328 1440 1688 1024 1025 1028 1066 0x48 0x5
> [   56.900624] [drm:drm_mode_debug_printmodeline] Modeline 36:"1280x1024" 75 
> 135000 1280 1296 1440 1688 1024 1025 1028 1066 0x40 0x5
> [   56.900630] [drm:drm_mode_debug_printmodeline] Modeline 29:"1280x1024" 72 
> 132840 1280 1368 1504 1728 1024 1025 1028 1067 0x0 0x6
> [   56.900637] [drm:drm_mode_debug_printmodeline] Modeline 28:"1152x864" 75 
> 108000 1152 1216 1344 1600 864 865 868 900 0x40 0x5
> [   56.900643] [drm:drm_mode_debug_printmodeline] Modeline 37:"1024x768" 75 
> 78800 1024 1040 1136 1312 768 769 772 800 0x40 0x5
> [   56.900649] [drm:drm_mode_debug_printmodeline] Modeline 38:"1024x768" 70 
> 75000 1024 1048 1184 1328 768 771 777 806 0x40 0xa
> [   56.900656] [drm:drm_mode_debug_printmodeline] Modeline 39:"1024x768" 60 
> 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
> [   56.900662] [drm:drm_mode_debug_printmodeline] Modeline 40:"832x624" 75 
> 57284 832 864 928 1152 624 625 628 667 0x40 0xa
> [   56.900669] [drm:drm_mode_debug_printmodeline] Modeline 41:"800x600" 75 
> 49500 800 816 896 1056 600 601 604 625 0x40 0x5
> [   56.900675] [drm:drm_mode_debug_printmodeline] Modeline 42:"800x600" 72 
> 50000 800 856 976 1040 600 637 643 

Re: screen goes blank when loading gma500_gfx (atom D2500)

2014-08-05 Thread Michael Tokarev
05.08.2014 20:11, Michael Tokarev wrote:
> Hello again.
> 
> It's been 4 more months since last message in this thread (which was mine).
> Now kernel 3.16 has been released, and I decided to give it a try.  And it
> behaves just like all previous kernels, -- once gma500_gfx module is loaded,
> screen goes blank, monitor turns off ("no signal detected") and nothing to
> be seen until reboot.
> 
> Can we try to debug this somehow, after more than half a year?... :)

Current debugging output (from 3.16), after:

 modprobe drm debug=6
 modprobe gma500_gfx

on a freshly booted system:

[   46.463381] Linux agpgart interface v0.103
[   46.491487] [drm] Initialized drm 1.1.0 20060810
[   56.585520] [drm:psb_intel_opregion_setup] Public ACPI methods supported
[   56.585528] [drm:psb_intel_opregion_setup] ASLE supported
[   56.585563] gma500 0000:00:02.0: irq 50 for MSI/MSI-X
[   56.585591] [drm:psb_intel_init_bios] Using VBT from OpRegion: $VBT 
CEDARVIEW  d
[   56.585604] [drm:drm_mode_debug_printmodeline] Modeline 0:"1920x1080" 0 
144000 1920 2016 2080 2176 1080 1088 1092 1100 0x8 0xa
[   56.585609] [drm:parse_sdvo_device_mapping] No SDVO device info is found in 
VBT
[   56.585617] [drm:parse_edp] EDP timing in vbt t1_t3 2000 t8 10 t9 2000 t10 
500 t11_t12 5000
[   56.585621] [drm:parse_edp] VBT reports EDP: Lane_count 1, Lane_rate 6, Bpp 
24
[   56.585624] [drm:parse_edp] VBT reports EDP: VSwing  0, Preemph 0
[   56.598203] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[   56.598902] acpi device:28: registered as cooling_device2
[   56.599109] input: Video Bus as 
/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input11
[   56.599326] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[   56.599366] [drm] No driver support for vblank timestamp query.
[   56.650918] [drm:drm_do_probe_ddc_edid] drm: skipping non-existent adapter 
intel drm LVDSDDC_C
[   56.651842] [drm:cdv_intel_dp_i2c_init] i2c_init DPDDC-B
[   56.652352] [drm:cdv_intel_dp_aux_ch] dp_aux_ch timeout status 0x51440064
[   56.652356] [drm:cdv_intel_dp_i2c_aux_ch] aux_ch failed -110
[   56.652863] [drm:cdv_intel_dp_aux_ch] dp_aux_ch timeout status 0x51440064
[   56.652866] [drm:cdv_intel_dp_i2c_aux_ch] aux_ch failed -110
[   56.653706] [drm:cdv_intel_dp_i2c_init] i2c_init DPDDC-C
[   56.654014] [drm:cdv_intel_dp_i2c_aux_ch] aux_i2c nack
[   56.654223] [drm:cdv_intel_dp_i2c_aux_ch] aux_i2c nack
[   56.714765] gma500 0000:00:02.0: trying to get vblank count for disabled 
pipe 1
[   56.714812] gma500 0000:00:02.0: trying to get vblank count for disabled 
pipe 1
[   56.775220] [drm:drm_helper_probe_single_connector_modes_merge_bits] 
[CONNECTOR:10:VGA-1]
[   56.900606] [drm:drm_helper_probe_single_connector_modes_merge_bits] 
[CONNECTOR:10:VGA-1] probed modes :
[   56.900617] [drm:drm_mode_debug_printmodeline] Modeline 26:"1280x1024" 60 
108000 1280 1328 1440 1688 1024 1025 1028 1066 0x48 0x5
[   56.900624] [drm:drm_mode_debug_printmodeline] Modeline 36:"1280x1024" 75 
135000 1280 1296 1440 1688 1024 1025 1028 1066 0x40 0x5
[   56.900630] [drm:drm_mode_debug_printmodeline] Modeline 29:"1280x1024" 72 
132840 1280 1368 1504 1728 1024 1025 1028 1067 0x0 0x6
[   56.900637] [drm:drm_mode_debug_printmodeline] Modeline 28:"1152x864" 75 
108000 1152 1216 1344 1600 864 865 868 900 0x40 0x5
[   56.900643] [drm:drm_mode_debug_printmodeline] Modeline 37:"1024x768" 75 
78800 1024 1040 1136 1312 768 769 772 800 0x40 0x5
[   56.900649] [drm:drm_mode_debug_printmodeline] Modeline 38:"1024x768" 70 
75000 1024 1048 1184 1328 768 771 777 806 0x40 0xa
[   56.900656] [drm:drm_mode_debug_printmodeline] Modeline 39:"1024x768" 60 
65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
[   56.900662] [drm:drm_mode_debug_printmodeline] Modeline 40:"832x624" 75 
57284 832 864 928 1152 624 625 628 667 0x40 0xa
[   56.900669] [drm:drm_mode_debug_printmodeline] Modeline 41:"800x600" 75 
49500 800 816 896 1056 600 601 604 625 0x40 0x5
[   56.900675] [drm:drm_mode_debug_printmodeline] Modeline 42:"800x600" 72 
50000 800 856 976 1040 600 637 643 666 0x40 0x5
[   56.900681] [drm:drm_mode_debug_printmodeline] Modeline 30:"800x600" 60 
40000 800 840 968 1056 600 601 605 628 0x40 0x5
[   56.900687] [drm:drm_mode_debug_printmodeline] Modeline 31:"640x480" 75 
31500 640 656 720 840 480 481 484 500 0x40 0xa
[   56.900694] [drm:drm_mode_debug_printmodeline] Modeline 32:"640x480" 73 
31500 640 664 704 832 480 489 491 520 0x40 0xa
[   56.900700] [drm:drm_mode_debug_printmodeline] Modeline 33:"640x480" 67 
30240 640 704 768 864 480 483 486 525 0x40 0xa
[   56.900706] [drm:drm_mode_debug_printmodeline] Modeline 34:"640x480" 60 
25200 640 656 752 800 480 490 492 525 0x40 0xa
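[   56.900713] [drm:drm_mode_debug_printmodeline] Modeline 35:"720x400" 70 
28320 720 738 846 900 400 412 414 449 0x40 0x6
[   56.900719] [drm:drm_mode_debug_printmodeline] Modeline 27:"640x350" 70 
25170 640 656 752 800 350 387 389 449 0x40 0x9
[   56.900724] [drm:drm_helper_probe_single_connector_modes_merge_bits] 
[CONNECTOR:12:LVDS-1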
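
A note for reading the mode lists in these logs: each drm_mode_debug_printmodeline entry prints the mode id and name, the nominal refresh rate, the pixel clock in kHz, four horizontal timings (ending in htotal), four vertical timings (ending in vtotal), then the type and flags. The sketch below is only an illustration in Python, not part of the driver. `parse_modeline` is a made-up helper name, and it assumes the two mail-wrapped halves of a log line have already been joined into one string. It recomputes the refresh rate as clock / (htotal * vtotal), which is a quick way to sanity-check an entry.

```python
import re

# Matches the payload of one drm_mode_debug_printmodeline entry after the
# mail-wrapped halves of the log line have been joined into a single string.
MODELINE_RE = re.compile(
    r'Modeline \d+:"?(?P<name>\w+)"? (?P<vrefresh>\d+) (?P<clock>\d+) '
    r'(?P<h>\d+ \d+ \d+ \d+) (?P<v>\d+ \d+ \d+ \d+)'
)


def parse_modeline(line):
    """Return name, reported refresh, clock and recomputed refresh, or None."""
    m = MODELINE_RE.search(line)
    if m is None:
        return None
    # The last value of each timing group is the total (visible + blanking).
    htotal = int(m.group("h").split()[3])
    vtotal = int(m.group("v").split()[3])
    clock_khz = int(m.group("clock"))
    return {
        "name": m.group("name"),
        "reported_hz": int(m.group("vrefresh")),
        "clock_khz": clock_khz,
        # Pixel clock is in kHz, so scale to Hz before dividing.
        "computed_hz": clock_khz * 1000 / (htotal * vtotal),
    }


line = ('[   56.900617] [drm:drm_mode_debug_printmodeline] '
        'Modeline 26:"1280x1024" 60 '
        '108000 1280 1328 1440 1688 1024 1025 1028 1066 0x48 0x5')
mode = parse_modeline(line)
print(mode["name"], mode["reported_hz"], round(mode["computed_hz"], 2))
```

For the 1280x1024 entry the recomputed value is roughly 60.02 Hz, which agrees with the reported 60.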

Re: screen goes blank when loading gma500_gfx (atom D2500)

2014-08-05 Thread Michael Tokarev
Hello again.

It's been 4 more months since the last message in this thread (which was mine).
Now kernel 3.16 has been released, and I decided to give it a try.  It behaves
just like all previous kernels: once the gma500_gfx module is loaded, the
screen goes blank, the monitor turns off ("no signal detected") and nothing
can be seen until reboot.

Can we try to debug this somehow, after more than half a year?... :)

Thank you,

/mjt

05.04.2014 12:15, Michael Tokarev wrote:
> Hello again
> 
> It's been about 2 months since I sent the original debugging output.  Today
> I tried out the 3.14 kernel.  This one behaves quite similarly: the screen
> goes blank right when loading the gma500_gfx module.  Here's the dmesg from
> a freshly booted system after doing
> 
>   modprobe drm debug=6
>   modprobe gma500_gfx
> 
> with a monitor connected to the VGA port (before loading gma500_gfx, it
> displays the regular text console):
> 
> [   39.863330] Linux agpgart interface v0.103
> [   39.900511] [drm] Initialized drm 1.1.0 20060810
> [   45.012300] [drm:psb_intel_opregion_setup], Public ACPI methods supported
> [   45.012308] [drm:psb_intel_opregion_setup], ASLE supported
> [   45.012345] gma500 0000:00:02.0: irq 50 for MSI/MSI-X
> [   45.012371] [drm:psb_intel_init_bios], Using VBT from OpRegion: $VBT 
> CEDARVIEW  d
> [   45.012384] [drm:drm_mode_debug_printmodeline], Modeline 0:"1920x1080" 0 
> 144000 1920 2016 2080 2176 1080 1088 1092 1100 0x8 0xa
> [   45.012389] [drm:parse_sdvo_device_mapping], No SDVO device info is found 
> in VBT
> [   45.012397] [drm:parse_edp], EDP timing in vbt t1_t3 2000 t8 10 t9 2000 
> t10 500 t11_t12 5000
> [   45.012401] [drm:parse_edp], VBT reports EDP: Lane_count 1, Lane_rate 6, 
> Bpp 24
> [   45.012405] [drm:parse_edp], VBT reports EDP: VSwing  0, Preemph 0
> [   45.012478] gma500 0000:00:02.0: GPU: power management timed out.
> [   45.026195] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
> [   45.026891] acpi device:29: registered as cooling_device2
> [   45.027104] input: Video Bus as 
> /devices/LNXSYSTM:00/device:00/PNP0A08:00/LNXVIDEO:00/input/input11
> [   45.027681] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [   45.027726] [drm] No driver support for vblank timestamp query.
> [   45.078928] [drm:drm_do_probe_ddc_edid], drm: skipping non-existent 
> adapter intel drm LVDSDDC_C
> [   45.079839] [drm:cdv_intel_dp_i2c_init], i2c_init DPDDC-B
> [   45.080383] [drm:cdv_intel_dp_aux_ch], dp_aux_ch timeout status 0x51440064
> [   45.080388] [drm:cdv_intel_dp_i2c_aux_ch], aux_ch failed -110
> [   45.080896] [drm:cdv_intel_dp_aux_ch], dp_aux_ch timeout status 0x51440064
> [   45.080899] [drm:cdv_intel_dp_i2c_aux_ch], aux_ch failed -110
> [   45.081754] [drm:cdv_intel_dp_i2c_init], i2c_init DPDDC-C
> [   45.082062] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
> [   45.082272] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
> [   45.122742] [drm:cdv_intel_single_pipe_active], pipe enabled 0
> [   45.142780] gma500 0000:00:02.0: trying to get vblank count for disabled 
> pipe 1
> [   45.142826] gma500 0000:00:02.0: trying to get vblank count for disabled 
> pipe 1
> [   45.183207] [drm:cdv_intel_single_pipe_active], pipe enabled 0
> [   45.203249] [drm:drm_helper_probe_single_connector_modes], 
> [CONNECTOR:7:VGA-1]
> [   45.332286] [drm:drm_helper_probe_single_connector_modes], 
> [CONNECTOR:7:VGA-1] probed modes :
> [   45.332297] [drm:drm_mode_debug_printmodeline], Modeline 23:"1280x1024" 60 
> 108000 1280 1328 1440 1688 1024 1025 1028 1066 0x48 0x5
> [   45.332304] [drm:drm_mode_debug_printmodeline], Modeline 33:"1280x1024" 75 
> 135000 1280 1296 1440 1688 1024 1025 1028 1066 0x40 0x5
> [   45.332311] [drm:drm_mode_debug_printmodeline], Modeline 26:"1280x1024" 72 
> 132840 1280 1368 1504 1728 1024 1025 1028 1067 0x0 0x6
> [   45.332318] [drm:drm_mode_debug_printmodeline], Modeline 25:"1152x864" 75 
> 108000 1152 1216 1344 1600 864 865 868 900 0x40 0x5
> [   45.332325] [drm:drm_mode_debug_printmodeline], Modeline 34:"1024x768" 75 
> 78800 1024 1040 1136 1312 768 769 772 800 0x40 0x5
> [   45.332332] [drm:drm_mode_debug_printmodeline], Modeline 35:"1024x768" 70 
> 75000 1024 1048 1184 1328 768 771 777 806 0x40 0xa
> [   45.332338] [drm:drm_mode_debug_printmodeline], Modeline 36:"1024x768" 60 
> 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
> [   45.332345] [drm:drm_mode_debug_printmodeline], Modeline 37:"832x624" 75 
> 57284 832 864 928 1152 624 625 628 667 0x40 0xa
> [   45.332352] [drm:drm_mode_debug_printmodeline], Modeline 38:"800x600" 75 
> 49500 800 816 896 1056 600 601 604 625 0x40 0x5
> [   45.332359] [drm:drm_mode_debug_printmodeline], Modeline 39:"800x600" 72 
> 50000 800 856 976 1040 600 637 643 666 0x40 0x5
> [   45.332365] [drm:drm_mode_debug_printmodeline], Modeline 27:"800x600" 60 
> 40000 800 840 968 1056 600 601 605 628 0x40 0x5
> [   45.332372] [drm:drm_mode_debug_printmodeline], Modeline 28:"640x480" 75 
> 31500 640 656 720 840 480 481 484 500 0x40 0xa
> [   45.332379] [drm:drm_mode_debug_printmodeline], Modeline 29:"640x480" 73 
> 31500 640 664


Re: [PATCH] arch: x86: kvm: x86.c: Cleaning up uninitialized variables

2014-06-03 Thread Michael Tokarev
03.06.2014 16:04, Paolo Bonzini wrote:
> Il 01/06/2014 01:05, Rickard Strandqvist ha scritto:
>> There is a risk that the variable will be used without being initialized.
>>
>> This was largely found by using a static code analysis program called 
>> cppcheck.
>>
>> Signed-off-by: Rickard Strandqvist 
> 
> No, there isn't.  The full context looks like this:
> 
> 	longmode = is_long_mode(vcpu) && cs_l == 1;
> 	if (!longmode) {
> 		param = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDX) << 32) |
> 			(kvm_register_read(vcpu, VCPU_REGS_RAX) & 0xffffffff);
> 		ingpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RBX) << 32) |
> 			(kvm_register_read(vcpu, VCPU_REGS_RCX) & 0xffffffff);
> 		outgpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDI) << 32) |
> 			(kvm_register_read(vcpu, VCPU_REGS_RSI) & 0xffffffff);
> 	}
> #ifdef CONFIG_X86_64
> 	else {
> 		param = kvm_register_read(vcpu, VCPU_REGS_RCX);
> 		ingpa = kvm_register_read(vcpu, VCPU_REGS_RDX);
> 		outgpa = kvm_register_read(vcpu, VCPU_REGS_R8);
> 	}
> #endif
> 
> and longmode must be zero if !CONFIG_X86_64:

This is not the first attempt to change this code.

Maybe adding an additional #ifdef..#endif around the longmode
assignment and the "if" above would solve this for good?

Or maybe something like this:

 #ifdef CONFIG_X86_64
 	if (!(is_long_mode(vcpu) && cs_l == 1)) {
 #else
 	if (1) {
 #endif
 		param = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDX) << 32) |
 			(kvm_register_read(vcpu, VCPU_REGS_RAX) & 0xffffffff);
 		ingpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RBX) << 32) |
 			(kvm_register_read(vcpu, VCPU_REGS_RCX) & 0xffffffff);
 		outgpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDI) << 32) |
 			(kvm_register_read(vcpu, VCPU_REGS_RSI) & 0xffffffff);
 	}
 	else {
 		param = kvm_register_read(vcpu, VCPU_REGS_RCX);
 		ingpa = kvm_register_read(vcpu, VCPU_REGS_RDX);
 		outgpa = kvm_register_read(vcpu, VCPU_REGS_R8);
 	}

, to make it all explicit and obvious?
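
To make the point of the discussion concrete, here is a small Python model of the two branches. It is an illustrative sketch only, not kernel code: `combine_halves` and `read_hypercall_args` are invented names, and the registers are modelled as a plain dict. Outside long mode each argument is assembled from two 32-bit register halves; in long mode it is read from a single 64-bit register. Every path assigns all three values, which is why the reported uninitialized use cannot happen as long as longmode is zero whenever the 64-bit branch is compiled out.

```python
MASK32 = 0xffffffff


def combine_halves(high, low):
    """Build a 64-bit value from the low 32 bits of two registers."""
    # Python integers are unbounded, so the high half is masked explicitly
    # to mimic a 64-bit value being shifted left by 32 in C.
    return ((high & MASK32) << 32) | (low & MASK32)


def read_hypercall_args(regs, longmode):
    """Return (param, ingpa, outgpa) the way the quoted branches do."""
    if not longmode:
        # 32-bit convention: each argument is split across two registers.
        param = combine_halves(regs["rdx"], regs["rax"])
        ingpa = combine_halves(regs["rbx"], regs["rcx"])
        outgpa = combine_halves(regs["rdi"], regs["rsi"])
    else:
        # 64-bit convention: one full register per argument.
        param = regs["rcx"]
        ingpa = regs["rdx"]
        outgpa = regs["r8"]
    return param, ingpa, outgpa


regs = {"rax": 0x2, "rdx": 0x1, "rbx": 0x3, "rcx": 0x4,
        "rdi": 0x5, "rsi": 0x6, "r8": 0x7}
print([hex(v) for v in read_hypercall_args(regs, longmode=False)])
print([hex(v) for v in read_hypercall_args(regs, longmode=True)])
```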

Thanks,

/mjt

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: screen goes blank when loading gma500_gfx (atom D2500)

2014-04-05 Thread Michael Tokarev
51935] [drm:drm_target_preferred], looking for preferred mode on 
connector 9
[   45.351938] [drm:drm_target_preferred], found mode 1920x1080
[   45.351942] [drm:drm_target_preferred], looking for cmdline mode on 
connector 20
[   45.351945] [drm:drm_target_preferred], looking for preferred mode on 
connector 20
[   45.351949] [drm:drm_target_preferred], found mode 1024x768
[   45.351953] [drm:drm_setup_crtcs], picking CRTCs for 4096x4096 config
[   45.351962] [drm:drm_setup_crtcs], desired mode 1280x1024 set on crtc 3
[   45.351967] [drm:drm_setup_crtcs], desired mode 1920x1080 set on crtc 4
[   45.351987] [drm] Initialized gma500 1.0.0 2011-06-06 for 0000:00:02.0 on 
minor 0

Thank you!

/mjt

15.02.2014 22:28, Michael Tokarev wrote:
> 10.02.2014 14:44, One Thousand Gnomes wrote:
>>> fbcon is loaded so it isn't an issue.
>>>
>>> I tried 3.10 kernel initially (the above messages are from it), next
>>> I tried 3.13 kernel too, and that one behaves exactly the same.
>>>
>>> As far as I remember, this system never worked with graphics well.
>>> Previous kernel (from which I updated) was 3.2 which had no
>>> gma500 module (local build).
>>>
>>> What are the steps to debug this further?
>>
>> Check you have X86_SYSFB and SIMPLEFB disabled
> 
> Neither of these options exists in the 3.10 config.  In 3.13 I had X86_SYSFB
> set to y initially (SIMPLEFB doesn't exist there either), but setting it to
> n does not make any difference.
> 
>> Boot with drm.debug=6
>>
>> collect the logs
> 
> I used `modprobe drm debug=6' (initially booting with the gma500_gfx module
> disabled), followed by `modprobe gma500_gfx'.  After loading the module
> the screen goes blank as before, and the monitor says 'no signal detected'.
> 
> Here are the logs:
> 
> [599286.739923] Linux agpgart interface v0.103
> [599286.765176] [drm] Initialized drm 1.1.0 20060810
> [599303.673734] gma500 0000:00:02.0: setting latency timer to 64
> [599303.673883] [drm:psb_intel_opregion_setup], Public ACPI methods supported
> [599303.673887] [drm:psb_intel_opregion_setup], ASLE supported
> [599303.673923] gma500 0000:00:02.0: irq 50 for MSI/MSI-X
> [599303.673950] [drm:psb_intel_init_bios], Using VBT from OpRegion: $VBT 
> CEDARVIEW  d
> [599303.673959] [drm:drm_mode_debug_printmodeline], Modeline 0:"1920x1080" 0 
> 144000 1920 2016 2080 2176 1080 1088 1092 1100 0x8 0xa
> [599303.673969] [drm:parse_sdvo_device_mapping], No SDVO device info is found 
> in VBT
> [599303.673975] [drm:parse_edp], EDP timing in vbt t1_t3 2000 t8 10 t9 2000 
> t10 500 t11_t12 5000
> [599303.673980] [drm:parse_edp], VBT reports EDP: Lane_count 1, Lane_rate 6, 
> Bpp 24
> [599303.673984] [drm:parse_edp], VBT reports EDP: VSwing  0, Preemph 0
> [599303.688094] acpi device:29: registered as cooling_device2
> [599303.688446] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
> [599303.688557] input: Video Bus as 
> /devices/LNXSYSTM:00/device:00/PNP0A08:00/LNXVIDEO:00/input/input11
> [599303.689160] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
> [599303.689188] [drm] No driver support for vblank timestamp query.
> [599303.740423] [drm:drm_do_probe_ddc_edid], drm: skipping non-existent 
> adapter intel drm LVDSDDC_C
> [599303.741222] [drm:cdv_intel_dp_i2c_init], i2c_init DPDDC-B
> [599303.741732] [drm:cdv_intel_dp_aux_ch], dp_aux_ch timeout status 0x51440064
> [599303.741736] [drm:cdv_intel_dp_i2c_aux_ch], aux_ch failed -110
> [599303.742242] [drm:cdv_intel_dp_aux_ch], dp_aux_ch timeout status 0x51440064
> [599303.742246] [drm:cdv_intel_dp_i2c_aux_ch], aux_ch failed -110
> [599303.742997] [drm:cdv_intel_dp_i2c_init], i2c_init DPDDC-C
> [599303.743305] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
> [599303.743510] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
> [599303.783922] [drm:cdv_intel_single_pipe_active], pipe enabled 0
> [599303.803958] gma500 0000:00:02.0: trying to get vblank count for disabled 
> pipe 1
> [599303.803996] gma500 0000:00:02.0: trying to get vblank count for disabled 
> pipe 1
> [599303.844370] [drm:cdv_intel_single_pipe_active], pipe enabled 0
> [599303.864408] [drm:drm_helper_probe_single_connector_modes], 
> [CONNECTOR:7:VGA-1]
> [599303.877172] [drm:drm_helper_probe_single_connector_modes], 
> [CONNECTOR:7:VGA-1] disconnected
> [599303.877184] [drm:drm_helper_probe_single_connector_modes], 
> [CONNECTOR:9:LVDS-1]
> [599303.881764] [drm:drm_do_probe_ddc_edid], drm: skipping non-existent 
> adapter intel drm LVDSBLC_B
> [599303.881778] [drm:drm_helper_probe_single_connector_modes], 
> [CONNECTOR:9:LVDS-1] probed modes :
> [599303.881783] [drm:drm_mode_debug_printmodeline], Modeline 22:"1920x1080" 
> 60 144000 1920 2016 2080 2176 1080 1088 1092 1100 0x8 0xa
> [599303.881791] [drm:drm_helper_probe_single_connector_modes], 
> [CONNECTOR:12:DVI-D-1]
> [599303.886292] [drm:drm_do_probe_ddc_edid], drm: skipping non-existent 
> adapter intel drm HDMIB
> [599303.886298] [drm:drm_helper_probe_single_connector_modes], 
> [CONNECTOR:12:DVI-D-1] disconnected
> [599303.886304] [drm:drm_helper_probe_single_connector_modes], 
> [CONNECTOR:14:DP-1]
> [599303.886811] [drm:cdv_intel_dp_aux_ch], dp_aux_ch timeout status 0x51440064
> [599303.886815] [drm:drm_helper_probe_single_connector_modes], 
> [CONNECTOR:14:DP-1] disconnected
> [599303.886820] [drm:drm_helper_probe_single_connector_modes], 
> [CONNECTOR:18:DVI-D-2]
> [599303.891350] [drm:drm_do_probe_ddc_edid], drm

Re: [Qemu-devel] Massive read only kvm guests when backing file was missing

2014-03-28 Thread Michael Tokarev
27.03.2014 20:14, Alejandro Comisario wrote:
> Seems like virtio (kvm 1.0) doesn't expose a timeout on the guest side
> (ubuntu 12.04 on host and guest).
> So, how can I adjust the timeout on the guest?

After some more discussion on IRC yesterday, it turned out that the situation
is _much_ more "interesting" than originally described.  The OP claims to
have 10500 guests running off an NFS server, and that after NFS server
downtime, the "backing files" had disappeared (whatever that means), so
they had to restore those files.  Moreover, the OP didn't even bother to look
at the guests' dmesg, being busy rebooting all 10500 guests.

> This solution is the most logical one, but i cannot apply it!
> thanks for all the responses!

I suggested that the OP describe the _real_ situation, instead of
giving random half-pictures, and take a look at the actual problem
as reported in various places (most importantly the guest kernel log), and
report _those_ hints to the list.  I also mentioned that, at least for some
NFS servers, if a client has a file open on the server and this file is
deleted, the server will report an error to the client when the client tries
to access that file, and this has nothing at all to do with timeouts of any
kind.
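
As a rough illustration of that distinction, the Python sketch below separates "the file is gone on the server" from "the request timed out". It is a hedged example, not anything QEMU or the guest kernel does: `classify_io_error` and `read_block` are hypothetical helpers, and the mapping to ESTALE is an assumption (the message above does not name an error code; on Linux NFS clients a deleted-but-open file commonly surfaces as ESTALE, "Stale file handle").

```python
import errno


def classify_io_error(exc):
    """Tell a vanished file apart from a timeout (assumed errno mapping)."""
    if exc.errno in (errno.ESTALE, errno.ENOENT):
        # The server no longer knows the file: retrying or raising a
        # timeout will not help, the file has to be restored.
        return "file gone on server"
    if exc.errno == errno.ETIMEDOUT:
        return "timeout"
    return "other"


def read_block(path, offset, size):
    """Read size bytes at offset; return (data, None) or (None, reason)."""
    try:
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(size), None
    except OSError as exc:
        return None, classify_io_error(exc)
```

Checking the guest kernel log for which of these cases actually occurred is the kind of hint asked for above.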

Thanks,

/mjt


Re: screen goes blank when loading gma500_gfx (atom D2500)

2014-02-15 Thread Michael Tokarev
10.02.2014 14:44, One Thousand Gnomes wrote:
>> fbcon is loaded so it isn't an issue.
>>
>> I tried 3.10 kernel initially (the above messages are from it), next
>> I tried 3.13 kernel too, and that one behaves exactly the same.
>>
>> As far as I remember, this system never worked with graphics well.
>> Previous kernel (from which I updated) was 3.2 which had no
>> gma500 module (local build).
>>
>> What are the steps to debug this further?
> 
> Check you have X86_SYSFB and SIMPLEFB disabled

Neither of these options exists in the 3.10 config.  In 3.13 I had X86_SYSFB set
to y initially (SIMPLEFB doesn't exist there either), but setting it to n does
not make any difference.

> Boot with drm.debug=6
> 
> collect the logs

I used `modprobe drm debug=6' (initially booting with the gma500_gfx module
disabled), followed by `modprobe gma500_gfx'.  After loading the module
the screen goes blank as before, and the monitor says 'no signal detected'.
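
For readers wondering what the value 6 selects: as far as I know, the DRM core of that era treats the debug parameter as a bitmask (CORE=0x1, DRIVER=0x2, KMS=0x4, PRIME=0x8). The helper below is only an illustrative sketch; the function name drm_debug_categories is made up here and is not part of any tool.

```shell
#!/bin/sh
# Decode a drm debug bitmask into category names.
# Assumed bit values (3.x-era DRM core): CORE=0x1, DRIVER=0x2, KMS=0x4, PRIME=0x8.
# drm_debug_categories is a hypothetical helper name used for illustration.
drm_debug_categories() {
    mask=$1
    out=""
    if [ $((mask & 1)) -ne 0 ]; then out="$out core"; fi
    if [ $((mask & 2)) -ne 0 ]; then out="$out driver"; fi
    if [ $((mask & 4)) -ne 0 ]; then out="$out kms"; fi
    if [ $((mask & 8)) -ne 0 ]; then out="$out prime"; fi
    # Strip the leading space left by the first appended name.
    echo "${out# }"
}

drm_debug_categories 6    # prints "driver kms"
```

So debug=6 should enable driver and KMS messages but not the (much noisier) core ones. The same value can be given at boot time as drm.debug=6, which is what was suggested above.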

Here are the logs:

[599286.739923] Linux agpgart interface v0.103
[599286.765176] [drm] Initialized drm 1.1.0 20060810
[599303.673734] gma500 :00:02.0: setting latency timer to 64
[599303.673883] [drm:psb_intel_opregion_setup], Public ACPI methods supported
[599303.673887] [drm:psb_intel_opregion_setup], ASLE supported
[599303.673923] gma500 :00:02.0: irq 50 for MSI/MSI-X
[599303.673950] [drm:psb_intel_init_bios], Using VBT from OpRegion: $VBT CEDARVIEW  d
[599303.673959] [drm:drm_mode_debug_printmodeline], Modeline 0:"1920x1080" 0 144000 1920 2016 2080 2176 1080 1088 1092 1100 0x8 0xa
[599303.673969] [drm:parse_sdvo_device_mapping], No SDVO device info is found in VBT
[599303.673975] [drm:parse_edp], EDP timing in vbt t1_t3 2000 t8 10 t9 2000 t10 500 t11_t12 5000
[599303.673980] [drm:parse_edp], VBT reports EDP: Lane_count 1, Lane_rate 6, Bpp 24
[599303.673984] [drm:parse_edp], VBT reports EDP: VSwing  0, Preemph 0
[599303.688094] acpi device:29: registered as cooling_device2
[599303.688446] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[599303.688557] input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A08:00/LNXVIDEO:00/input/input11
[599303.689160] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[599303.689188] [drm] No driver support for vblank timestamp query.
[599303.740423] [drm:drm_do_probe_ddc_edid], drm: skipping non-existent adapter intel drm LVDSDDC_C
[599303.741222] [drm:cdv_intel_dp_i2c_init], i2c_init DPDDC-B
[599303.741732] [drm:cdv_intel_dp_aux_ch], dp_aux_ch timeout status 0x51440064
[599303.741736] [drm:cdv_intel_dp_i2c_aux_ch], aux_ch failed -110
[599303.742242] [drm:cdv_intel_dp_aux_ch], dp_aux_ch timeout status 0x51440064
[599303.742246] [drm:cdv_intel_dp_i2c_aux_ch], aux_ch failed -110
[599303.742997] [drm:cdv_intel_dp_i2c_init], i2c_init DPDDC-C
[599303.743305] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
[599303.743510] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
[599303.783922] [drm:cdv_intel_single_pipe_active], pipe enabled 0
[599303.803958] gma500 :00:02.0: trying to get vblank count for disabled pipe 1
[599303.803996] gma500 :00:02.0: trying to get vblank count for disabled pipe 1
[599303.844370] [drm:cdv_intel_single_pipe_active], pipe enabled 0
[599303.864408] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:7:VGA-1]
[599303.877172] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:7:VGA-1] disconnected
[599303.877184] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:9:LVDS-1]
[599303.881764] [drm:drm_do_probe_ddc_edid], drm: skipping non-existent adapter intel drm LVDSBLC_B
[599303.881778] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:9:LVDS-1] probed modes :
[599303.881783] [drm:drm_mode_debug_printmodeline], Modeline 22:"1920x1080" 60 144000 1920 2016 2080 2176 1080 1088 1092 1100 0x8 0xa
[599303.881791] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:12:DVI-D-1]
[599303.886292] [drm:drm_do_probe_ddc_edid], drm: skipping non-existent adapter intel drm HDMIB
[599303.886298] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:12:DVI-D-1] disconnected
[599303.886304] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:14:DP-1]
[599303.886811] [drm:cdv_intel_dp_aux_ch], dp_aux_ch timeout status 0x51440064
[599303.886815] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:14:DP-1] disconnected
[599303.886820] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:18:DVI-D-2]
[599303.891350] [drm:drm_do_probe_ddc_edid], drm: skipping non-existent adapter intel drm HDMIC
[599303.891357] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:18:DVI-D-2] disconnected
[599303.891362] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:20:DP-2]
[599303.891569] [drm:cdv_dp_detect], DPCD: Rev=11 LN_Rate=a LN_CNT=82 LN_DOWNSP=41
[599303.891876] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
[599303.892082] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
[599303.892085] [drm:i2c_algo_dp_aux_xfer], dp_aux_xfer return -121


Re: [ANNOUNCE] s390 31 bit kernel support removal

2014-02-13 Thread Michael Tokarev
12.02.2014 13:29, Heiko Carstens wrote:
> We want to remove s390 31 bit kernel support with Linux kernel 3.16.

Maybe you can send a patch for Documentation/feature-removal-schedule.txt
about this now?

Thanks,

/mjt


screen goes blank when loading gma500_gfx (atom D2500)

2014-02-08 Thread Michael Tokarev
Hello.

Today I rebooted my router into a new kernel and noticed that
the screen goes blank after booting the system (initial bootup
messages are visible).  After some debugging it turns out that
the screen goes blank when loading gma500_gfx module.

This is an Intel D2500CC motherboard with a built-in Atom D2500,
with a monitor connected to the VGA port.  The following VGA device
is reported by lspci:

00:02.0 VGA compatible controller: Intel Corporation Atom Processor D2xxx/N2xxx Integrated Graphics Controller (rev 09)

Here is the dmesg output after loading gma500_gfx:

[  176.427071] Linux agpgart interface v0.103
[  176.452914] [drm] Initialized drm 1.1.0 20060810
[  176.476037] gma500 :00:02.0: setting latency timer to 64
[  176.476216] gma500 :00:02.0: irq 50 for MSI/MSI-X
[  176.491675] acpi device:29: registered as cooling_device2
[  176.492041] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[  176.492169] input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A08:00/LNXVIDEO:00/input/input11
[  176.492357] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[  176.492396] [drm] No driver support for vblank timestamp query.
[  176.607485] gma500 :00:02.0: trying to get vblank count for disabled pipe 1
[  176.607531] gma500 :00:02.0: trying to get vblank count for disabled pipe 1
[  176.806078] [drm] Initialized gma500 1.0.0 2011-06-06 for :00:02.0 on minor 0

which does not look bad or suspicious to me.

fbcon is loaded so it isn't an issue.

I tried 3.10 kernel initially (the above messages are from it), next
I tried 3.13 kernel too, and that one behaves exactly the same.

As far as I remember, this system never worked with graphics well.
Previous kernel (from which I updated) was 3.2 which had no
gma500 module (local build).

What are the steps to debug this further?

Thanks,

/mjt


3.10.25 kernel behaves unstable as a qemu/kvm guest

2013-12-27 Thread Michael Tokarev
Hello.

This is just an initial/preliminary heads-up, maybe mis-directed, about
a possible issue.

I upgraded 2 machines today to 3.10.25, and both show some... strangeness
within linux guests, which are also running 3.10.25.  Reverting to 3.10.24
in guests (compiled by the same compiler with the same options) or
using older qemu/kvm (running with 1.7 now) fixes it.

All guests are using virtio-net and virtio-blk.

On one machine (prod), one guest (also prod) loads okay, but the networking
is not functioning: no packets are received by the guest.  I wasn't able
to debug this further at this time, so I reverted to an older qemu/kvm
(1.1).

On another machine (my home workstation where I can experiment), the same
combination (3.10.25 on host & guest and qemu 1.7) shows rather unstable
behaviour: about every second boot it stalls somewhere during the initial boot,
either after initializing PNP, or initializing networking, or sometimes
after initializing virtio, and the other half of the time it boots okay.

When it stalls, it consumes no CPU, qemu process is responsive, the guest
just does nothing.  Like this:

 ...
 NET: Registering protocol family 2
 TCP: established hash table entries: 8192 (order: 5, 131072 bytes)
 TCP bind hash table entries: 8192 (order: 5, 131072 bytes)
 TCP: Hash tables configured (established 8192 bind 8192)
 TCP: reno registered
 [at this point it hung]

(after this it normally registers UDP hash tables and other net stuff)

I'm not sure yet what's going on.  I understand that there are no guest-related
changes in 3.10.25 (compared with .24), so there should be something else.
The fact that it stalls randomly suggests there's some uninitialized value
somewhere.

I'll try to debug it further.  Just a heads-up for now.

Thanks,

/mjt


Re: [PATCH 00/10] autofs4 - rename autofs4 to autofs

2013-09-03 Thread Michael Tokarev
31.08.2013 15:42, Ian Kent wrote:
[...]
> By leaving a Kconfig and Makefile in fs/autofs4 (to build autofs4.ko)
> with a deprecation message, sub-system maintainers and other users will
> make any needed changes before these are removed after two kernel versions.
> IMHO the presence of the warning is reason enough to leave a build stub
> rather than do a straight out rename.

Why do you want to continue building autofs4.ko (or to allow building it)?
What's actually wrong with a straight rename?
If the new module can be auto-loaded by both names (by providing an
alias), there's no need to keep the ability to build autofs4.ko, I think.

Well, maybe except for the case when autofs is needed in initramfs (like
for systemd).  For this, indeed, you can keep autofs4.ko which is a
dummy depending on autofs.ko...

> Ian Kent (10):
>   autofs4 - coding style fixes
>   autofs4 - fix string.h include in auto_dev-ioctl.h
>   autofs4 - move linux/auto_dev-ioctl.h to uapi/linux
>   autofs - merge auto_fs.h and auto_fs4.h
>   autofs - use autofs instead of autofs4 everywhere
>   autofs - copy autofs4 to autofs
>   autofs - create autofs Kconfig and Makefile
>   autofs - update fs/autofs4/Kconfig
>   autofs - update fs/autofs4/Makefile
>   autofs - delete fs/autofs4

By doing it this way, you're losing all git history.
If you perform a straight rename and git detects it, you
can use, e.g., git log --follow to see the whole history
across the rename.  This way you create new files without
history.

So I strongly suggest actually renaming the subdirectory
(together with appropriate kconfig/makefile changes so
things are bisectable), and creating the stubs after this.
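
To illustrate the point about history across a rename, here is a small throwaway-repository sketch. The file contents, commit messages and the demo identity are invented for the example; only the directory names mirror the ones discussed in this thread.

```shell
#!/bin/sh
# Sketch only: throwaway repository with invented contents and commit messages.
repo=$(mktemp -d)
git init -q "$repo"

# Helper that runs git inside the demo repository with a dummy identity.
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid "$@"; }

mkdir "$repo/autofs4"
printf 'line %s\n' 1 2 3 4 5 6 7 8 > "$repo/autofs4/waitq.c"
g add autofs4
g commit -q -m "autofs4: add waitq.c"

# A straight rename; git detects it afterwards from content similarity.
g mv autofs4 autofs
g commit -q -m "autofs: rename autofs4 to autofs"

# Count the commits visible for the new path, without and with --follow.
plain=$(( $(g log --oneline -- autofs/waitq.c | wc -l) ))
followed=$(( $(g log --oneline --follow -- autofs/waitq.c | wc -l) ))
echo "without --follow: $plain, with --follow: $followed"

rm -rf "$repo"
```

Without --follow only the rename commit is listed for the new path; with --follow the commit that originally added the file under autofs4/ shows up as well.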

Thanks,

/mjt


Re: Very poor latency when using hard drive (raid1)

2013-04-16 Thread Michael Tokarev
15.04.2013 13:59, l...@tigusoft.pl wrote:
> There are 2 hard drives (normal, magnetic) in software raid 1
> on 3.2.41 kernel.
> 
> When I write into them e.g. using dd from /dev/zero to a local file
> (ext4 on default settings), running 2 dd at once (writing two files) it
> starves all other programs that try to use the disk.
> 
> Running ls on any directory on same disk (same fs btw), takes over half
> minute to execute, same for any other disk touching action.
> 
> Has anyone seen such a problem?  Where to look, what to test?

This is a typical issue, known for many years.

Your dds run against the buffer cache, the same one used by all other
regular accesses.  So once it fills up, cached directories and the
like are thrown away to make room for new cache space.  So once
you need something else, that something needs to be read from disk,
which is busy writing out the buffer cache.

> What could solve it (other than ionice on applications that I expect to
> use hard drive)?

Just don't mix these two workloads.  Or, if you really need to transfer
a large amount of data, use direct I/O (O_DIRECT) -- for dd it is
iflag=direct or oflag=direct (depending on the I/O direction).
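
A minimal sketch of the write case with GNU dd follows. The target path, block size and count are arbitrary values picked for the example, and the buffered fallback is only there because some filesystems (tmpfs on older kernels, for instance) refuse O_DIRECT.

```shell
#!/bin/sh
# Sketch only: target path and sizes are made up for illustration.
target="${TMPDIR:-/tmp}/dd-direct-demo.$$"

# oflag=direct makes dd open the output with O_DIRECT, so the written data
# bypasses the cache instead of pushing other cached data out of it.
if dd if=/dev/zero of="$target" bs=1M count=4 oflag=direct 2>/dev/null; then
    mode=direct
else
    # Fall back to an ordinary buffered write where O_DIRECT is rejected.
    dd if=/dev/zero of="$target" bs=1M count=4 2>/dev/null
    mode=buffered
fi

size=$(( $(wc -c < "$target") ))
echo "wrote $size bytes ($mode)"
rm -f "$target"
```

For the read direction the corresponding flag is iflag=direct on the input side.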

ionice won't help much.

Thanks,

/mjt


Re: [PATCH linux-next] autofs4: autofs4_catatonic_mode(): remove redundant null check on kfree()

2013-02-12 Thread Michael Tokarev

13.02.2013 11:37, Ian Kent wrote:
[]

> So, you would like me to forward this to Linus?
>
> I'd be inclined to wait until the window for 3.9 opens since Linus
> probably has more than enough to do finalizing 3.8 right now.

I guess this change is anything but urgent ;)

Thanks,

/mjt


Re: [PATCH linux-next] autofs4: autofs4_catatonic_mode(): remove redundant null check on kfree()

2013-02-12 Thread Michael Tokarev

13.02.2013 11:20, Ian Kent wrote:

> On Tue, 2013-02-12 at 10:12 -0700, Tim Gardner wrote:
>> smatch analysis:
>>
>> fs/autofs4/waitq.c:46 autofs4_catatonic_mode() info: redundant null
>>   check on wq->name.name calling kfree()
>
> I'm not sure about this change.
>
> autofs4_catatonic_mode() could be called when there are remaining
> entries in the wait queue, which is nulled, so autofs4_wait_release()
> won't see the discarded waits if it is called.


Ian, this is about something else really.  The patch is about the
NULL check before calling kfree() -- kfree() does the NULL check internally.
It has nothing to do with code flow or anything else; it is about calling
kfree() unconditionally, regardless of whether the argument is actually
NULL or non-NULL.  It makes the code shorter and easier to read.

You can add my

Signed-off-by: Michael Tokarev <m...@tls.msk.ru>

if you want.


>> Cc: Ian Kent <ra...@themaw.net>
>> Cc: aut...@vger.kernel.org
>> Signed-off-by: Tim Gardner <tim.gard...@canonical.com>
>> ---
>>  fs/autofs4/waitq.c |    6 ++----
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
>> index 03bc1d3..3db70da 100644
>> --- a/fs/autofs4/waitq.c
>> +++ b/fs/autofs4/waitq.c
>> @@ -42,10 +42,8 @@ void autofs4_catatonic_mode(struct autofs_sb_info *sbi)
>>  	while (wq) {
>>  		nwq = wq->next;
>>  		wq->status = -ENOENT; /* Magic is gone - report failure */
>> -		if (wq->name.name) {
>> -			kfree(wq->name.name);
>> -			wq->name.name = NULL;
>> -		}
>> +		kfree(wq->name.name);
>> +		wq->name.name = NULL;
>>  		wq->wait_ctr--;
>>  		wake_up_interruptible(&wq->queue);
>>  		wq = nwq;


Re: [PATCH linux-next] autofs4: autofs4_catatonic_mode(): remove redundant null check on kfree()

2013-02-12 Thread Michael Tokarev

13.02.2013 11:37, Ian Kent wrote:
[]

> So, you would like me to forward this to Linus?
>
> I'd be inclined to wait until the window for 3.9 opens since Linus
> probably has more than enough to do finalizing 3.8 right now.


I guess this change is anything but urgent ;)

Thanks,

/mjt


Transparent Huge Pages

2013-02-07 Thread Michael Tokarev

Hello.

I'm trying to understand how to use transparent huge pages
(currently in x86).  Before, I used "explicit" huge pages
a lot (mostly via hugetlbfs), but it looked like THP should
be easier so I gave it a try.

This tiny program:

- cut -
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <errno.h>
#include <string.h>

int main(int argc, char **argv) {
  void *ptr;
  size_t len = argv[1] ? atoi(argv[1]) : 1024*1024*1024;
  /* no error checking! */
  posix_memalign(&ptr, 2048*1024, len);
  madvise(ptr, len, MADV_HUGEPAGE);
  memset(ptr, 0, len);
  usleep(500); /* let khugepaged do its work */
  system("grep ^AnonHugePages: /proc/meminfo");
  return 0;
}
- cut -

which just tries to allocate some amount of RAM (1Gb by default)
aligned to 2Mb, uses madvise(HUGEPAGE) on it, and checks
/proc/meminfo for AnonHugePages.
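
The check done above via system("grep ...") can also be done in-process.
A minimal sketch, assuming the usual "AnonHugePages:   N kB" line format
of /proc/meminfo; the function names are hypothetical, and this only reads
the counter, it does not explain why the value stays small:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the AnonHugePages value (in kB) found in meminfo-style text,
 * or -1 if the field is absent or malformed.
 */
long anon_huge_kb(const char *text)
{
    static const char key[] = "AnonHugePages:";
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, sizeof(key) - 1) == 0) {
            long kb;

            /* %ld skips the padding spaces before the number. */
            if (sscanf(line + sizeof(key) - 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* Read /proc/meminfo (Linux only) and return AnonHugePages in kB, or -1. */
long read_anon_huge_kb(void)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return anon_huge_kb(buf);
}
```

Calling read_anon_huge_kb() before and after the memset() would show how
much of the counter is attributable to this one allocation rather than to
other processes.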

The problem is: I've never seen any value for AnonHugePages
larger than about 16Mb.  Usually it is around 10Mb or 8Mb,
no matter how large the requested memory size is, including
the default 1Gb.

The question, obviously, is: why so small?

My system (which is a few years old now) has 6Gb of RAM,
it uses AMD Athlon II X2 260 CPU, and is running 3.2
kernel.

The original question comes from QEMU, which is
supposed to use THP for guest memory, but it also does not
use more than these ~10Mb when allocating 1Gb to the guest.

Thanks!

/mjt


Re: [PATCH v4 00/11] x86/microcode: Early load microcode

2012-12-21 Thread Michael Tokarev
On 20.12.2012 23:48, Fenghua Yu wrote:
> From: Fenghua Yu 
> 
> The problem in current microcode loading method is that we load a microcode
> way, way too late; ideally we should load it before turning paging on.  This
> may only be practical on 32 bits since we can't get to 64-bit mode without
> paging on, but we should still do it as early as at all possible.

Why is loading microcode this early important?
Why is it bad to load it at the end of (initial) boot?

Thanks,

/mjt


Re: [git patches] libata fixes for 3.7

2012-10-02 Thread Michael Tokarev
On 02.10.2012 23:59, Jeff Garzik wrote:
> On 10/02/2012 03:44 PM, Michael Tokarev wrote:
>> On 02.10.2012 23:40, Jeff Garzik wrote:
>>
>>> Minor libata updates, nothing notable.
>>>
>>> 1) Apply -- and then revert -- the FUA feature.  Caused
>>> disk corruption in linux-next, proving it cannot be turned on by
>>> default.
>>
>> Any details on that?  Disk corruption is rather a nasty
>> side-effect indeed.
> 
> One thread with reports is
> 
> Storage related regression in linux-next 20120824

Eg, https://lkml.org/lkml/2012/8/27/66 (two reports).
Thank you!

/mjt


Re: [git patches] libata fixes for 3.7

2012-10-02 Thread Michael Tokarev
On 02.10.2012 23:40, Jeff Garzik wrote:

> Minor libata updates, nothing notable.
> 
> 1) Apply -- and then revert -- the FUA feature.  Caused
>    disk corruption in linux-next, proving it cannot be turned on by
>    default.

Any details on that?  Disk corruption is rather a nasty
side-effect indeed.

Thank you!

/mjt


Re: tg3 driver upgrade (Linux 2.6.32 -> 3.2) breaks IBM Bladecenter SoL

2012-10-02 Thread Michael Tokarev
On 02.10.2012 22:49, Ferenc Wagner wrote:
> "Michael Chan"  writes:
>> These are the likely fixes:
>>
>> commit cf9ecf4b631f649a964fa611f1a5e8874f2a76db 
>> Author: Matt Carlson 
>> Date: Mon Nov 28 09:41:03 2011 +
>>
>> tg3: Fix TSO CAP for 5704 devs w / ASF enabled
> 
> You are exactly right: cf9ecf4b fixed the permanent SoL breakage
> introduced by dabc5c67.  Looks like ASF utilizes similar technology to
> that of the HS20 BMC.  Thanks for the tip, it greatly reduced our CPU
> wear. :)  It's a pity ethtool -k did not give a hint.  Do you think it's
> possible to work around in 3.2 by eg. fiddling some ethtool setting?

Maybe it's better to push this commit to -stable instead? (the commit
that broke things is part of 3.0 kernel so all current 3.x -stable
kernels are affected)

(Besides, that commit "This patch fixes the problem by revisiting and
reevaluating the decision after tg3_get_eeprom_hw_cfg() is called." -
merely copies a somewhat "twisted" chunk of code into another place,
which does not look optimal)

Thanks,

/mjt


Re: lve module taint?

2012-09-18 Thread Michael Tokarev
On 19.09.2012 06:02, Rusty Russell wrote:

> From: Matthew Garrett 
> Subject: module: taint kernel when lve module is loaded
> Date: Fri, 22 Jun 2012 13:49:31 -0400
> 
> Cloudlinux have a product called lve that includes a kernel module. This
> was previously GPLed but is now under a proprietary license, but the
> module continues to declare MODULE_LICENSE("GPL") and makes use of some
> EXPORT_SYMBOL_GPL symbols. Forcibly taint it in order to avoid this.

> + /* lve claims to be GPL but upstream won't provide source */
> + if (strcmp(mod->name, "lve") == 0)
> + add_taint_module(mod, TAINT_PROPRIETARY_MODULE);

This is setting a, in my opinion, rather bad precedent.  Next we'll
be adding various modules here due to various reasons.

I think this case should be purely political now, not technical.  I.e.,
if some project declares itself as GPL, it is not the kernel's task to
verify that the sources are available or to enforce that.

Thanks,

/mjt


Re: [Qemu-devel] x86, nops settings result in kernel crash

2012-08-21 Thread Michael Tokarev
On 20.08.2012 21:13, Tomas Racek wrote:
[]
Can we trim the old, large and now not-so-relevant discussion please? ;)

> I can provide you with more different traces if it can help. But I thought 
> that maybe it will be more useful for you to try it on your own. So I've 
> prepared some minimal debian installation which you could download here (apx 
> 163M bzipped):
> 
> http://fi.muni.cz/~xracek/debian.img.bz2
> 
> Password:
> root/asdfgh
> 
> Here is my config for guest kernel:
> 
> http://fi.muni.cz/~xracek/config
> 
> I use
> 
> qemu-kvm -m 1500 -hda debian.img -kernel linux/arch/x86/boot/bzImage -append "root=/dev/sda1"

Um.  I'd expect the image to be self-contained, no external kernel.
I wanted to do a quick test to see if it fails on my machine too,
d/loaded debian.img.bz2 but there's no kernel.  So.. no quick test
for you ;)

> After logging in just run "sh runtest.sh". This leads to crash in my case 
> (host: Intel Core i5-2540M, kernel 3.5.2-1.fc17.x86_64, qemu 1.0.1).

With all the above, this "runtest.sh" is informationally equal to
your disk image.

/mjt


Re: root=PARTUUID for MBR/NT disk signatures?

2012-08-21 Thread Michael Tokarev
On 21.08.2012 08:47, Will Drewry wrote:
[]
> Functionally, I suspect this will work fine, but I am concerned that
> it is a bad move from an efficiency perspective (not unfixable
> though).  Right now, the user-supplied value is converted from
> string-uuid to packed-uuid.  This is then memcmp'd across any and all
> partitions - be it 2 or 200 - across all attached storage.  If we move
> to a pure string, then we end up needing to unpack every packed UUID
> at disk scan time (or search, depending on impl) rather than just the
> one user supplied value.
> 
> Perhaps the cost is negligible on modern machines, but it seems like
> the wrong place to put the cost (per entry rather than per search
> value).

Amount of work needed to READ all the partition tables might be
quite a bit larger than strcmp'ing it all.  I think.
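
To make the cost argument concrete, here is a rough userspace sketch of the
"pack once, memcmp per entry" approach Will describes.  Assumptions: the
struct and function names are invented for illustration, and the bytes are
stored in plain textual order, which is not necessarily how any real
partition table format lays them out:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical partition record holding an already-packed 16-byte UUID. */
struct part_entry {
    unsigned char uuid[16];
};

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int hex_val(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/*
 * Pack "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" into 16 bytes, in textual
 * order.  Returns 0 on success, -1 on malformed input.
 */
int uuid_pack(const char *s, unsigned char out[16])
{
    size_t pos = 0;
    int i = 0;

    if (strlen(s) != 36)
        return -1;
    while (pos < 36) {
        if (pos == 8 || pos == 13 || pos == 18 || pos == 23) {
            if (s[pos] != '-')
                return -1;
            pos++;
            continue;
        }
        int hi = hex_val((unsigned char)s[pos]);
        int lo = hex_val((unsigned char)s[pos + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[i++] = (unsigned char)(hi << 4 | lo);
        pos += 2;
    }
    return 0;
}

/*
 * Find the entry matching the user-supplied string: one conversion of the
 * search value, then one memcmp() per entry.  Returns the index or -1.
 */
int uuid_find(const struct part_entry *parts, size_t n, const char *wanted)
{
    unsigned char key[16];

    if (uuid_pack(wanted, key) != 0)
        return -1;
    for (size_t k = 0; k < n; k++)
        if (memcmp(parts[k].uuid, key, sizeof(key)) == 0)
            return (int)k;
    return -1;
}
```

The string-based alternative would instead format every entry's UUID as
text and strcmp() it; either way the per-entry work is a handful of
byte operations.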

/mjt


Re: 3.0+ NFS issues (bisected)

2012-08-18 Thread Michael Tokarev
On 18.08.2012 15:13, J. Bruce Fields wrote:
> On Sat, Aug 18, 2012 at 10:49:31AM +0400, Michael Tokarev wrote:
[]
>> Well.  What can I say?  With the change below applied (to 3.2 kernel
>> at least), I don't see any stalls or high CPU usage on the server
>> anymore.  It survived several multi-gigabyte transfers, for several
>> hours, without any problem.  So it is a good step forward ;)
>>
>> But the whole thing seems to be quite a bit fragile.  I tried to follow
>> the logic in there, and the thing is quite a bit, well, "twisted", and
>> somewhat difficult to follow.  So I don't know if this is the right
>> fix or not.  At least it works! :)
> 
> Suggestions welcomed.

Ok...

Meanwhile, you can add my
Tested-By: Michael Tokarev 

to the patch.

>> And I really wonder why no one else reported this problem before.
>> Am I the only one in this world who uses linux nfsd? :)
> 
> This, for example:
> 
>   http://marc.info/?l=linux-nfs&m=134131915612287&w=2
> 
> may well describe the same problem.  It just needed some debugging
> persistence, thanks!

Ah.  I tried to find something when I initially
sent this report, but wasn't able to.  Apparently
I'm not alone with this problem indeed!

Thank you for all the work!

/mjt


Re: 3.0+ NFS issues (bisected)

2012-08-18 Thread Michael Tokarev
On 18.08.2012 02:32, J. Bruce Fields wrote:
> On Fri, Aug 17, 2012 at 04:08:07PM -0400, J. Bruce Fields wrote:
>> Wait a minute, that assumption's a problem because that calculation
>> depends in part on xpt_reserved, which is changed here
>>
>> In particular, svc_xprt_release() calls svc_reserve(rqstp, 0), which
>> subtracts rqstp->rq_reserved and then calls svc_xprt_enqueue, now with a
>> lower xpt_reserved value.  That could well explain this.
> 
> So, maybe something like this?

Well.  What can I say?  With the change below applied (to 3.2 kernel
at least), I don't see any stalls or high CPU usage on the server
anymore.  It survived several multi-gigabyte transfers, for several
hours, without any problem.  So it is a good step forward ;)

But the whole thing seems to be quite a bit fragile.  I tried to follow
the logic in there, and the thing is quite a bit, well, "twisted", and
somewhat difficult to follow.  So I don't know if this is the right
fix or not.  At least it works! :)

And I really wonder why no one else reported this problem before.
Am I the only one in this world who uses linux nfsd? :)

Thank you for all your patience and the proposed fix!

/mjt

> commit c8136c319ad85d0db870021fc3f9074d37f26d4a
> Author: J. Bruce Fields 
> Date:   Fri Aug 17 17:31:53 2012 -0400
> 
> svcrpc: don't add to xpt_reserved till we receive
> 
> The rpc server tries to ensure that there will be room to send a reply
> before it receives a request.
> 
> It does this by tracking, in xpt_reserved, an upper bound on the total
> size of the replies that it has already committed to for the socket.
> 
> Currently it is adding in the estimate for a new reply *before* it
> checks whether there is space available.  If it finds that there is not
> space, it then subtracts the estimate back out.
> 
> This may lead the subsequent svc_xprt_enqueue to decide that there is
> space after all.
> 
> The result is a svc_recv() that will repeatedly return -EAGAIN, causing
> server threads to loop without doing any actual work.
> 
> Reported-by: Michael Tokarev 
> Signed-off-by: J. Bruce Fields 
> 
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index ec99849a..59ff3a3 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -366,8 +366,6 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
>   rqstp, rqstp->rq_xprt);
>   rqstp->rq_xprt = xprt;
>   svc_xprt_get(xprt);
> - rqstp->rq_reserved = serv->sv_max_mesg;
> - atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
>   pool->sp_stats.threads_woken++;
>   wake_up(&rqstp->rq_wait);
>   } else {
> @@ -644,8 +642,6 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
>   if (xprt) {
>   rqstp->rq_xprt = xprt;
>   svc_xprt_get(xprt);
> - rqstp->rq_reserved = serv->sv_max_mesg;
> - atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
>  
>   /* As there is a shortage of threads and this request
>* had to be queued, don't allow the thread to wait so
> @@ -743,6 +739,10 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
>   len = xprt->xpt_ops->xpo_recvfrom(rqstp);
>   dprintk("svc: got len=%d\n", len);
>   }
> + if (len > 0) {
> + rqstp->rq_reserved = serv->sv_max_mesg;
> + atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
> + }
>   svc_xprt_received(xprt);
>  
>   /* No data, incomplete (TCP) read, or accept() */
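
A toy userspace model of the ordering problem described in the commit
message above.  Everything here is an assumption made for illustration: the
struct, the function names and the space check (free send space must cover
what is already reserved plus one maximum-size reply) are simplified
stand-ins, not the actual sunrpc code:

```c
#include <assert.h>

/* Hypothetical transport state: free send-buffer bytes and bytes promised. */
struct toy_xprt {
    long wspace;
    long reserved;
};

/* Is there room for the replies already promised plus one more? */
static int has_wspace(const struct toy_xprt *x, long max_mesg)
{
    return x->wspace >= x->reserved + max_mesg;
}

/*
 * Count how many times a thread is woken only to give up again.
 * reserve_early != 0 models the old order (add the estimate, then check);
 * reserve_early == 0 models the patched order (add it only when proceeding).
 * The loop is cut off after `limit` wasted wakeups.
 */
int wasted_wakeups(struct toy_xprt *x, long max_mesg, int reserve_early,
                   int limit)
{
    int wasted = 0;

    while (wasted < limit) {
        if (!has_wspace(x, max_mesg))    /* enqueue side: nobody is woken */
            break;
        if (reserve_early)
            x->reserved += max_mesg;     /* old order: commit before re-check */
        if (has_wspace(x, max_mesg)) {   /* receive side: room for the reply? */
            if (!reserve_early)
                x->reserved += max_mesg; /* new order: commit when proceeding */
            break;
        }
        x->reserved -= max_mesg;         /* release gives the estimate back */
        wasted++;                        /* returned -EAGAIN, no work done */
    }
    return wasted;
}
```

With 150 bytes of free space and a 100-byte estimate, the early-reserve
variant in this model spins until the cutoff, because the enqueue-side check
sees 0 reserved while the receive-side check sees 100; the late-reserve
variant proceeds on the first wakeup.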


Re: 3.0+ NFS issues (bisected)

2012-08-17 Thread Michael Tokarev
On 17.08.2012 21:26, Michael Tokarev wrote:
> On 17.08.2012 21:18, J. Bruce Fields wrote:
>> On Fri, Aug 17, 2012 at 09:12:38PM +0400, Michael Tokarev wrote:
> []
>>> So we're calling svc_recv in a tight loop, eating
>>> all available CPU.  (The above is with just 2 nfsd
>>> threads).
>>>
>>> Something is definitely wrong here.  And it happens much more
>>> often after the mentioned commit (f03d78db65085).
>>
>> Oh, neat.  Hm.  That commit doesn't really sound like the cause, then.
>> Is that busy-looping reproduceable on kernels before that commit?
> 
> Note I bisected this issue to this commit.  I haven't seen it
> happening before this commit, and reverting it from 3.0 or 3.2
> kernel makes the problem to go away.
> 
> I guess it is looping there:
> 
> 
> net/sunrpc/svc_xprt.c:svc_recv()
> ...
>     len = 0;
> ...
>     if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
> ...
>     } else if (xprt->xpt_ops->xpo_has_wspace(xprt)) {  <=== here -- has no wspace due to memory...
> ...     len = <something>
>     }
> 
>     /* No data, incomplete (TCP) read, or accept() */
>     if (len == 0 || len == -EAGAIN)
>         goto out;
> ...
> out:
>     rqstp->rq_res.len = 0;
>     svc_xprt_release(rqstp);
>     return -EAGAIN;
> }
> 
> I'm trying to verify this theory...

Yes.  I inserted a printk there, and all these million times while
we're waiting in this EAGAIN loop, this printk is triggering:


[21052.533053]  svc_recv: !has_wspace
[21052.533070]  svc_recv: !has_wspace
[21052.533087]  svc_recv: !has_wspace
[21052.533105]  svc_recv: !has_wspace
[21052.533122]  svc_recv: !has_wspace
[21052.533139]  svc_recv: !has_wspace
[21052.533156]  svc_recv: !has_wspace
[21052.533174]  svc_recv: !has_wspace
[21052.533191]  svc_recv: !has_wspace
[21052.533208]  svc_recv: !has_wspace
[21052.533226]  svc_recv: !has_wspace
[21052.533244]  svc_recv: !has_wspace
[21052.533265] calling svc_recv: 1228163 times (err=-4)
[21052.533403] calling svc_recv: 1226616 times (err=-4)
[21052.534520] nfsd: last server has exited, flushing export cache

(I stopped nfsd since it was flooding the log).

I can only guess that before that commit, we always had space,
now we don't anymore, and are looping like crazy.

Thanks,

/mjt


Re: 3.0+ NFS issues (bisected)

2012-08-17 Thread Michael Tokarev
On 17.08.2012 21:18, J. Bruce Fields wrote:
> On Fri, Aug 17, 2012 at 09:12:38PM +0400, Michael Tokarev wrote:
[]
>> So we're calling svc_recv in a tight loop, eating
>> all available CPU.  (The above is with just 2 nfsd
>> threads).
>>
>> Something is definitely wrong here.  And it happens much more
>> often after the mentioned commit (f03d78db65085).
> 
> Oh, neat.  Hm.  That commit doesn't really sound like the cause, then.
> Is that busy-looping reproducible on kernels before that commit?

Note I bisected this issue to this commit.  I haven't seen it
happening before this commit, and reverting it from 3.0 or 3.2
kernel makes the problem go away.

I guess it is looping there:


net/sunrpc/svc_xprt.c:svc_recv()
...
    len = 0;
...
    if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
...
    } else if (xprt->xpt_ops->xpo_has_wspace(xprt)) {  <=== here -- has no wspace due to memory...
...     len = <something>
    }

    /* No data, incomplete (TCP) read, or accept() */
    if (len == 0 || len == -EAGAIN)
        goto out;
...
out:
    rqstp->rq_res.len = 0;
    svc_xprt_release(rqstp);
    return -EAGAIN;
}

I'm trying to verify this theory...

/mjt


Re: 3.0+ NFS issues (bisected)

2012-08-17 Thread Michael Tokarev
On 17.08.2012 20:00, J. Bruce Fields wrote:
[]
> Uh, if I grepped my way through this right: it looks like it's the
> "memory" column of the "TCP" row of /proc/net/protocols; might be
> interesting to see how that's changing over time.

This file does not look interesting.  Memory usage does not jump,
there's no high increase either.

But there's something else which is interesting here.

I noticed that in perf top, the top consumer of CPU is svc_recv()
(I mentioned this at the start of this thread).  So I looked at how
this routine is called from nfsd.  And here we go.

fs/nfsd/nfssvc.c:

/*
 * This is the NFS server kernel thread
 */
static int
nfsd(void *vrqstp)
{
...
    /*
     * The main request loop
     */
    for (;;) {
        /*
         * Find a socket with data available and call its
         * recvfrom routine.
         */
        int i = 0;
        while ((err = svc_recv(rqstp, 60*60*HZ)) == -EAGAIN)
            ++i;
        printk(KERN_ERR "calling svc_recv: %d times (err=%d)\n", i, err);
        if (err == -EINTR)
            break;
...

(I added the "i" counter and the printk).  And here's the output:

[19626.401136] calling svc_recv: 0 times (err=212)
[19626.405059] calling svc_recv: 1478 times (err=212)
[19626.409512] calling svc_recv: 1106 times (err=212)
[19626.543020] calling svc_recv: 0 times (err=212)
[19626.543059] calling svc_recv: 0 times (err=212)
[19626.548074] calling svc_recv: 0 times (err=212)
[19626.549515] calling svc_recv: 0 times (err=212)
[19626.552320] calling svc_recv: 0 times (err=212)
[19626.553503] calling svc_recv: 0 times (err=212)
[19626.556007] calling svc_recv: 0 times (err=212)
[19626.557152] calling svc_recv: 0 times (err=212)
[19626.560109] calling svc_recv: 0 times (err=212)
[19626.560943] calling svc_recv: 0 times (err=212)
[19626.565315] calling svc_recv: 1067 times (err=212)
[19626.569735] calling svc_recv: 2571 times (err=212)
[19626.574150] calling svc_recv: 3842 times (err=212)
[19626.581914] calling svc_recv: 2891 times (err=212)
[19626.583072] calling svc_recv: 1247 times (err=212)
[19626.616885] calling svc_recv: 0 times (err=212)
[19626.616952] calling svc_recv: 0 times (err=212)
[19626.622889] calling svc_recv: 0 times (err=212)
[19626.624518] calling svc_recv: 0 times (err=212)
[19626.627118] calling svc_recv: 0 times (err=212)
[19626.629735] calling svc_recv: 0 times (err=212)
[19626.631777] calling svc_recv: 0 times (err=212)
[19626.633986] calling svc_recv: 0 times (err=212)
[19626.636746] calling svc_recv: 0 times (err=212)
[19626.637692] calling svc_recv: 0 times (err=212)
[19626.640769] calling svc_recv: 0 times (err=212)
[19626.657852] calling svc_recv: 0 times (err=212)
[19626.661602] calling svc_recv: 0 times (err=212)
[19626.670160] calling svc_recv: 0 times (err=212)
[19626.671917] calling svc_recv: 0 times (err=212)
[19626.684643] calling svc_recv: 0 times (err=212)
[19626.684680] calling svc_recv: 0 times (err=212)
[19626.812820] calling svc_recv: 0 times (err=212)
[19626.814697] calling svc_recv: 0 times (err=212)
[19626.817195] calling svc_recv: 0 times (err=212)
[19626.820324] calling svc_recv: 0 times (err=212)
[19626.822855] calling svc_recv: 0 times (err=212)
[19626.824823] calling svc_recv: 0 times (err=212)
[19626.828016] calling svc_recv: 0 times (err=212)
[19626.829021] calling svc_recv: 0 times (err=212)
[19626.831970] calling svc_recv: 0 times (err=212)

> the stall begin:
[19686.823135] calling svc_recv: 3670352 times (err=212)
[19686.823524] calling svc_recv: 3659205 times (err=212)

> transfer continues
[19686.854734] calling svc_recv: 0 times (err=212)
[19686.860023] calling svc_recv: 0 times (err=212)
[19686.887124] calling svc_recv: 0 times (err=212)
[19686.895532] calling svc_recv: 0 times (err=212)
[19686.903667] calling svc_recv: 0 times (err=212)
[19686.922780] calling svc_recv: 0 times (err=212)

So we're calling svc_recv in a tight loop, eating
all available CPU.  (The above is with just 2 nfsd
threads).

Something is definitely wrong here.  And it happens much more
often after the mentioned commit (f03d78db65085).

Thanks,

/mjt


Re: 3.0+ NFS issues (bisected)

2012-08-16 Thread Michael Tokarev
On 12.07.2012 16:53, J. Bruce Fields wrote:
> On Tue, Jul 10, 2012 at 04:52:03PM +0400, Michael Tokarev wrote:
>> I tried to debug this again, maybe to reproduce in a virtual machine,
>> and found out that it is only 32bit server code shows this issue:
>> after updating the kernel on the server to 64bit (the same version)
>> I can't reproduce this issue anymore.  Rebooting back to 32bit,
>> and voila, it is here again.
>>
>> Something apparently isn't right on 32bits... ;)
>>
>> (And yes, the prob is still present and is very annoying :)
> 
> OK, that's very useful, thanks.  So probably a bug got introduced in the
> 32-bit case between 2.6.32 and 3.0.
> 
> My personal upstream testing is normally all x86_64 only.  I'll kick off
> a 32-bit install and see if I can reproduce this quickly.

Actually it has nothing to do with 32 vs 64 bits as I
initially thought.  It happens on 64bits too, but takes
more time (or data to transfer) to trigger.


> Let me know if you're able to narrow this down any more.

I bisected this issue to the following commit:

commit f03d78db65085609938fdb686238867e65003181
Author: Eric Dumazet <eric.duma...@gmail.com>
Date:   Thu Jul 7 00:27:05 2011 -0700

    net: refine {udp|tcp|sctp}_mem limits

    Current tcp/udp/sctp global memory limits are not taking into account
    hugepages allocations, and allow 50% of ram to be used by buffers of a
    single protocol [ not counting space used by sockets / inodes ...]

    Lets use nr_free_buffer_pages() and allow a default of 1/8 of kernel ram
    per protocol, and a minimum of 128 pages.
    Heavy duty machines sysadmins probably need to tweak limits anyway.


Reverting this commit on top of 3.0 (or any later 3.x kernel) fixes
the behaviour here.

This machine has 4Gb of memory.  On 3.0, with this patch applied
(as it is part of 3.0), tcp_mem is like this:

  21228 28306   42456

with this patch reverted, tcp_mem shows:

  81216 108288  162432

and with these values, it works fine.

So it looks like something else goes wrong there,
which leads to all nfsds fighting with each other
for something and eating 100% of available CPU
instead of servicing clients.

For added fun, when setting tcp_mem to the "good" value
from "bad" value (after booting into kernel with that
patch applied), the problem is _not_ fixed.

Any further hints?

Thanks,

/mjt

>> On 31.05.2012 17:51, Michael Tokarev wrote:
>>> On 31.05.2012 17:46, Myklebust, Trond wrote:
>>>> On Thu, 2012-05-31 at 17:24 +0400, Michael Tokarev wrote:
>>> []
>>>>> I started tcpdump:
>>>>>
>>>>>  tcpdump -npvi br0 -s 0 host 192.168.88.4 and \( proto ICMP or port 2049 \) -w nfsdump
>>>>>
>>>>> on the client (192.168.88.2).  Next I mounted a directory on the client,
>>>>> and started reading (tar'ing) a directory into /dev/null.  It captured a
>>>>> few stalls.  Tcpdump shows number of packets it got, the stalls are at
>>>>> packet counts 58090, 97069 and 97071.  I cancelled the capture after that.
>>>>>
>>>>> The resulting file is available at 
>>>>> http://www.corpit.ru/mjt/tmp/nfsdump.xz ,
>>>>> it is 220Mb uncompressed and 1.3Mb compressed.  The source files are
>>>>> 10 files of 1Gb each, all made by using `truncate' utility, so does not
>>>>> take place on disk at all.  This also makes it obvious that the issue
>>>>> does not depend on the speed of disk on the server (since in this case,
>>>>> the server disk isn't even in use).
>>>>
>>>> OK. So from the above file it looks as if the traffic is mainly READ
>>>> requests.
>>>
>>> The issue here happens only with reads.
>>>
>>>> In 2 places the server stops responding. In both cases, the client seems
>>>> to be sending a single TCP frame containing several COMPOUNDS containing
>>>> READ requests (which should be legal) just prior to the hang. When the
>>>> server doesn't respond, the client pings it with a RENEW, before it ends
>>>> up severing the TCP connection and then retransmitting.
>>>
>>> And sometimes -- speaking only from the behaviour I've seen, not from the
>>> actual frames sent -- server does not respond to the RENEW too, in which
>>> case the client reports "nfs server not responding", and on the next
>>> renew it may actually respond.  This happens too, but much more rarely.
>>>
>>> During these stalls, ie, when there's no network activity at all,
>>> the server NFSD threads are busy eating all available CPU.
>>>
>>> What does it all tell us? :)
>>>
>>> Thank you!
>>>
>>> /mjt


Re: [PATCH] block: Don't use static to define "void *p" in show_partition_start().

2012-08-12 Thread Michael Tokarev
On 03.08.2012 12:41, Jens Axboe wrote:
> On 08/03/2012 07:07 AM, majianpeng wrote:
[]
>> diff --git a/block/genhd.c b/block/genhd.c
>> index cac7366..d839723 100644
>> --- a/block/genhd.c
>> +++ b/block/genhd.c
>> @@ -835,7 +835,7 @@ static void disk_seqf_stop(struct seq_file *seqf, void *v)
>>  
>>  static void *show_partition_start(struct seq_file *seqf, loff_t *pos)
>>  {
>> -static void *p;
>> +void *p;
>>  
>>  p = disk_seqf_start(seqf, pos);
>>  if (!IS_ERR_OR_NULL(p) && !*pos)
> 
> Huh, that looks like a clear bug. I've applied it, thanks.

It also looks like a -stable material, don't you think?

Thanks,

/mjt



Re: 3.5-rcX : Big problem with root device returning

2012-07-15 Thread Michael Tokarev
On 15.07.2012 23:12, werner wrote:

> Even if rdev isn't often used, it should be kept working, as it's included in
> many other programs, and principally in the installers.

rdev doesn't _exist_ anymore in current software,
including installers.

/mjt


Re: 3.5-rcX : Big problem with root device returning

2012-07-12 Thread Michael Tokarev
On 12.07.2012 16:08, werner wrote:
> There is a big problem since 3.5-rc1 which potentially messes up installations
> 
> rdev no longer gives back the root device like /dev/sda1, but in the
> bios form like 0x80010300

Note rdev returns information which is written to kernel image, not
information about actual device the system booted from.

> rdev is essential for the installation programs and for the installation
> of e.g. lilo.  It's not convenient to rely on the bios numbers, because
> on some mainboards they change depending on which boot order you select in BIOS,
> or only if you select another boot device in the bios boot menu with F12.
> Whilst /dev/sdXY is more reliable.
> 
> rdev is an old basic function which always worked correctly, until now.

The rdev utility is obsolete; it is not present in current util-linux anymore,
because it just makes no sense nowadays.  Storing the root device in the
kernel image was obsoleted long ago by boot loaders providing the
kernel command line and the root= parameter.  Moreover, the root device is
often not mounted by the kernel itself, but by an initramfs (which has become
an integral part of the kernel image).

It is obsolete for 3 reasons:

1) you have the kernel command line from the bootloader to store this and
   other info
2) it is not guaranteed that on the next reboot the same device will be
   using the same /dev/sdX node, since they're discovered dynamically
   (in this sense, bios codes are more reliable, and filesystem UUIDs or
   labels are the right way to go)
3) static device numbers are slowly going away too; very few tools are
   left which know about particular major,minor pairs.

> The error starts with 3.5-rc1 and is not corrected as of 3.5-rc6.  If I go
> back to an earlier kernel, 3.4 or older, then the same installation works
> correctly (rdev gives /dev/sda1) and if I then go back again to 3.5-rcX it's
> again wrong (rdev gives 0x80010300).  Thus, this seems to be a wrong manner in how
> the kernel gives back the root device, or interacts with rdev.  It's also
> possible that this problem happens only under some kernel compilation option,
> so below I give the differences in config between 3.4 and 3.5-rc1
> 
> This problem should be fixed quickly; rdev always has to work
> correctly.

There's no problem, so nothing to fix.

Thanks,

/mjt


Re: [PATCH 1/1] core-kernel: use multiply instead of shifts in hash_64

2012-07-10 Thread Michael Tokarev
On 03.07.2012 00:25, Andrew Hunter wrote:
> diff --git a/include/linux/hash.h b/include/linux/hash.h
> index b80506b..daabc3d 100644
> --- a/include/linux/hash.h
> +++ b/include/linux/hash.h
> @@ -34,7 +34,9 @@
>  static inline u64 hash_64(u64 val, unsigned int bits)
>  {
>   u64 hash = val;
> -
> +#if BITS_PER_LONG == 64
> + hash *= GOLDEN_RATIO_PRIME_64;
> +#else
>   /*  Sigh, gcc can't optimise this alone like it does for 32 bits. */

Hmm.  Does this comment make sense here now?
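
For reference, a minimal userspace sketch of why the question comes up: the
shift-and-add sequence that stays in the #else branch is a hand-expanded
multiplication by the same constant, so both branches should produce the same
value.  The constant and the shift sequence below are reproduced from memory of
include/linux/hash.h of that era and are assumptions to be checked against the
tree; the names hash_64_mul and hash_64_shift are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of GOLDEN_RATIO_PRIME_64:
 * 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */
#define GOLDEN_RATIO_PRIME_64_SKETCH UINT64_C(0x9e37fffffffc0001)

/* Multiply form, as on the "+" side of the quoted patch. */
uint64_t hash_64_mul(uint64_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	/* The high bits are the best mixed ones, so keep the top "bits" bits. */
	return (val * GOLDEN_RATIO_PRIME_64_SKETCH) >> (64 - bits);
}

/* Shift-and-add form: the same product, expanded by hand. */
uint64_t hash_64_shift(uint64_t val, unsigned int bits)
{
	uint64_t hash = val;
	uint64_t n = hash;

	assert(bits >= 1 && bits <= 64);
	n <<= 18; hash -= n;	/* - val * 2^18 */
	n <<= 33; hash -= n;	/* - val * 2^51 */
	n <<= 3;  hash += n;	/* + val * 2^54 */
	n <<= 3;  hash -= n;	/* - val * 2^57 */
	n <<= 4;  hash += n;	/* + val * 2^61 */
	n <<= 2;  hash += n;	/* + val * 2^63 */
	return hash >> (64 - bits);
}
```

Both functions work modulo 2^64, so they agree for every input; which one is
faster depends on the compiler and the CPU.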

Thanks,

/mjt


Re: 3.0+ NFS issues

2012-07-10 Thread Michael Tokarev
I tried to debug this again, maybe to reproduce in a virtual machine,
and found out that only the 32bit server code shows this issue:
after updating the kernel on the server to 64bit (the same version)
I can't reproduce this issue anymore.  Rebooting back to 32bit,
and voila, it is here again.

Something apparently isn't right on 32bits... ;)

(And yes, the problem is still present and is very annoying :)

Thanks,

/mjt


On 31.05.2012 17:51, Michael Tokarev wrote:
> On 31.05.2012 17:46, Myklebust, Trond wrote:
>> On Thu, 2012-05-31 at 17:24 +0400, Michael Tokarev wrote:
> []
>>> I started tcpdump:
>>>
>>>  tcpdump -npvi br0 -s 0 host 192.168.88.4 and \( proto ICMP or port 2049 \) -w nfsdump
>>>
>>> on the client (192.168.88.2).  Next I mounted a directory on the client,
>>> and started reading (tar'ing) a directory into /dev/null.  It captured a
>>> few stalls.  Tcpdump shows number of packets it got, the stalls are at
>>> packet counts 58090, 97069 and 97071.  I cancelled the capture after that.
>>>
>>> The resulting file is available at http://www.corpit.ru/mjt/tmp/nfsdump.xz ,
>>> it is 220Mb uncompressed and 1.3Mb compressed.  The source files are
>>> 10 files of 1Gb each, all made by using `truncate' utility, so does not
>>> take place on disk at all.  This also makes it obvious that the issue
>>> does not depend on the speed of disk on the server (since in this case,
>>> the server disk isn't even in use).
>>
>> OK. So from the above file it looks as if the traffic is mainly READ
>> requests.
> 
> The issue here happens only with reads.
> 
>> In 2 places the server stops responding. In both cases, the client seems
>> to be sending a single TCP frame containing several COMPOUNDS containing
>> READ requests (which should be legal) just prior to the hang. When the
>> server doesn't respond, the client pings it with a RENEW, before it ends
>> up severing the TCP connection and then retransmitting.
> 
> And sometimes -- speaking only from the behaviour I've seen, not from the
> actual frames sent -- server does not respond to the RENEW too, in which
> case the client reports "nfs server not responding", and on the next
> renew it may actually respond.  This happens too, but much more rarely.
> 
> During these stalls, ie, when there's no network activity at all,
> the server NFSD threads are busy eating all available CPU.
> 
> What does it all tell us? :)
> 
> Thank you!
> 
> /mjt


Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

2008-02-18 Thread Michael Tokarev
Jeremy Higdon wrote:
[]
> I'll put it even more strongly.  My experience is that disabling write
> cache plus disabling barriers is often much faster than enabling both
> barriers and write cache enabled, when doing metadata intensive
> operations, as long as you have a drive that is good at CTQ/NCQ.

Now, and it's VERY interesting at least for me (and is off-topic in
this thread) -- which drive(s) are good at NCQ?  I tried numerous SATA
(NCQ is about sata, right? :) drives, but NCQ either does nothing in
terms of performance or hurts.  Yesterday we ordered another drive
from Hitachi (their "raid edition" thing), -- will try it tomorrow,
but I've no hope here as it's some 5th or 6th model/brand already.

(Good old SCSI drives, even 10-year-old ones, show a large difference when
TCQ is enabled...)

Thanks!


Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

2008-02-18 Thread Michael Tokarev
Ric Wheeler wrote:
> Alasdair G Kergon wrote:
>> On Fri, Feb 15, 2008 at 03:20:10PM +0100, Andi Kleen wrote:
>>> On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote:
>>>> I wonder if it's worth the effort to try to implement this.
>>
>> My personal view (which seems to be in the minority) is that it's a
>> waste of our development time *except* in the (rare?) cases similar to
>> the ones Andi is talking about.
> 
> Using working barriers is important for normal users when you really
> care about data loss and have normal drives in a box. We do power fail
> testing on boxes (with reiserfs and ext3) and can definitely see a lot
> of file system corruption eliminated over power failures when barriers
> are enabled properly.
> 
> It is not unreasonable for some machines to disable barriers to get a
> performance boost, but I would not do that when you are storing things
> you really need back.

The talk here is about something different - about supporting barriers
on md/dm devices, i.e., on pseudo-devices which use multiple real devices
as components (software RAIDs etc).  In this "world" it's nearly impossible
to support barriers if there is more than one underlying component device;
barriers only work if there's only one component.  And the talk is about
supporting barriers only in a "minority" of cases - mostly for the simplest
device-mapper case only, NOT covering any raid1 or other "fancy" configurations.

> Of course, you don't need barriers when you either disable the write
> cache on the drives or use a battery backed RAID array which gives you a
> write cache that will survive power outages...

Two things here.

First, I still don't understand why in God's sake barriers are "working"
while regular cache flushes are not.  Almost no consumer-grade hard drive
supports write barriers, but they all support regular cache flushes, and
the latter should be enough (while not the most speed-optimal) to ensure
data safety.  Why require disabling the write cache (as the XFS FAQ does)
instead of going the flush-cache-when-appropriate (as opposed to
write-barrier-when-appropriate) way?
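
A rough userspace analogy of the flush-cache-when-appropriate idea, as a sketch
only: ordering is enforced by waiting for a full flush between the payload
write and the commit record, instead of tagging a request as a barrier.  It
assumes that fsync() on the file really reaches stable storage, which depends
on the kernel, the filesystem and the drive; the function names are invented
for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes; 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n <= 0)
			return -1;
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Payload first, flush, then the commit record, flush again. */
int ordered_commit(int fd, const char *payload, const char *commit)
{
	assert(payload != NULL && commit != NULL);
	if (write_all(fd, payload, strlen(payload)) < 0)
		return -1;
	if (fsync(fd) < 0)	/* flush #1: payload is stable before the commit is issued */
		return -1;
	if (write_all(fd, commit, strlen(commit)) < 0)
		return -1;
	return fsync(fd) < 0 ? -1 : 0;	/* flush #2: the commit record itself */
}

/* Convenience wrapper: append one payload/commit pair to the file at path. */
int ordered_commit_path(const char *path, const char *payload, const char *commit)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
	int rc;

	if (fd < 0)
		return -1;
	rc = ordered_commit(fd, payload, commit);
	if (close(fd) < 0)
		rc = -1;
	return rc;
}
```

The cost is the one mentioned above: every commit waits for two full flushes,
so it is safe but not the most speed-optimal way.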

And second, "surprisingly", battery-backed RAID write caches tend to fail
too, sometimes... ;)  Usually, such a battery is enough to keep the data
in memory for several hours only (since many RAID controllers use regular
RAM for memory caches, which requires some power to keep its state), --
I came across this issue the hard way, and realized that very few of the
people around me who manage RAID systems even know about this problem -
that the battery-backed cache only lasts for some time...  For example,
the power fails in the evening, and by the next morning the batteries are
already empty.  Or, with better batteries, think about a weekend... ;)
(I've seen that some vendors now use a flash-based backing store for caches
instead, which should ensure far better results here).

/mjt


Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large)

2008-02-18 Thread Michael Tokarev
Andrew Morton wrote:
> (suitable cc added)

Thanks.  I meant to send it to linux-nfs originally, but
it looks like I mistyped the address.

> (regression)

Now, after we did some more experiments with it, I don't think it's
a regression.  I'll post a bit more details in a few hours when the
ongoing testing finishes.  Thanks!

/mjt


Re: Spurious completions during NCQ

2008-02-15 Thread Michael Tokarev
Hugo Mills wrote:
> On Fri, Feb 15, 2008 at 10:00:00AM -0500, Calvin Walton wrote:
>> On Fri, 2008-02-15 at 13:46 +, Hugo Mills wrote:
>>> I'm getting these on my Dell Latitude D830:
>>>
>>> Feb 15 13:06:00 willow kernel: ata1.00: exception Emask 0x2 SAct 0x4 SErr 0x0 action 0x2 frozen
>>> Feb 15 13:06:00 willow kernel: ata1.00: spurious completions during NCQ issue=0x0 SAct=0x4 FIS=004040a1:0002
>>>    In some cases, there are several cmd/res lines listed. It's
>>> happening about once an hour or so (not correlated with any other
>>> event that I can see). It doesn't seem to be affecting operation of
>>> the machine, but it's making me nervous.

JFYI: Most probably it is correlated with smartd asking
the device for its SMART status.

/mjt


Re: [PATCH] quota: Turn quotas off when remounting read-only

2008-02-15 Thread Michael Tokarev
Jan Engelhardt wrote:
> On Feb 11 2008 13:39, Jan Kara wrote:
>>> But...  I'm thinking about this scenario:
>>>
>>>  # mount /data
>>>  # quotaon /data
>>>  (some maintenance stuff to be planned)
>>>  # mount -o remount,ro /data
>>>  (do backup etc)
>>>  # mount -o remount,rw /data
>>>
>>> at this point, it's expected that quota on /data is enabled.
>>> After this patch, it's not anymore...
>>  Yes, it previously accidentally worked this way (for an year or so,
>> before that we refused to remount read-only). Hmm, but maybe we could
>> somehow tweak quotas to be turned on when remounting read-write again.
>> We have all the information we need at the time of remounting read-only
>> so we could store it and use it later when remounting read-write. I'll have
>> a look into that.
> 
> Maybe it is possible to leave quota on all times, so that the
> reporting quota ioctls continue to work even in ro mode?

Well, that'd be the best approach imho (plus check that all
ioctls which try to modify quotas fail with EROFS as
appropriate).

But the problem really is that it's unknown at this time
where it fails in the first place.  I can't reproduce my
hang "on demand" (mount-ro followed by umount when quotas
are turned on, with ext3fs - umount never finishes), yet
it has bitten me several times already.  So it must be
something rare, some small race maybe, which is difficult
to find...  Yet it shows up at the most inappropriate
moment. ;)  I have already learned to turn the quota off before
doing something with a filesystem, but sometimes I
forget this, and the result is always the same... ;)

/mjt


Re: [PATCH] Implement barrier support for single device DM devices

2008-02-15 Thread Michael Tokarev
Alasdair G Kergon wrote:
> On Fri, Feb 15, 2008 at 01:08:21PM +0100, Andi Kleen wrote:
>> Implement barrier support for single device DM devices
>  
> Thanks.  We've got some (more-invasive) dm patches in the works that
> attempt to use flushing to emulate barriers where we can't just
> pass them down like that.

I wonder if it's worth the effort to try to implement this.

As far as I understand (*), if a filesystem realizes that the
underlying block device does not support barriers, it will
switch to using regular flushes instead - isn't it the same
thing as you're trying to do on an MD level?

Note that a filesystem must understand barriers/flushes on
underlying block device, since many disk drives don't support
barriers anyway.

(*) this is, in fact, an interesting question.  I still can't
find complete information about this.  For example, how safe
is xfs if barriers are not supported or turned off?  Is it
"less safe" than with barriers?  Will it use regular cache
flushes if barriers are not available?  Ditto for ext3fs, but
there, barriers are not enabled by default.

/mjt


2.6.24: RPC: bad TCP reclen 0x00020090 (large)

2008-02-13 Thread Michael Tokarev
Hello!

After upgrading to 2.6.24 (from .23), we're seeing A LOT
of messages like the one in $subj in dmesg:

Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed.
Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed.
Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed.
...

on a Linux NFS server.  The clients are all Linux too, mostly 2.6.23
and some 2.6.22.

I found the "offending" piece of code in net/sunrpc/svcsock.c,
in routine svc_tcp_recvfrom() with condition being:

   if (svsk->sk_reclen > serv->sv_max_mesg) ...
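
To make the number in the message easier to read, here is a small sketch of
how a record marker in the ONC RPC record marking scheme decodes: the top bit
flags the last fragment and the low 31 bits carry the fragment length, so
0x00020090 is 131216 bytes (128 KiB plus 144 bytes).  The function names and
the limits passed in are hypothetical; only the final comparison mirrors the
condition quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define RPC_LAST_FRAGMENT      UINT32_C(0x80000000)	/* top bit: last fragment */
#define RPC_FRAGMENT_SIZE_MASK UINT32_C(0x7fffffff)	/* low 31 bits: length */

/* Length of the fragment announced by a record marker (host byte order). */
uint32_t rpc_fragment_len(uint32_t marker)
{
	return marker & RPC_FRAGMENT_SIZE_MASK;
}

/* Non-zero if the marker announces the last fragment of a record. */
int rpc_is_last_fragment(uint32_t marker)
{
	return (marker & RPC_LAST_FRAGMENT) != 0;
}

/* Same shape as the quoted check: announced length against the server limit. */
int rpc_reclen_too_large(uint32_t marker, uint32_t max_mesg)
{
	assert(max_mesg <= RPC_FRAGMENT_SIZE_MASK);
	return rpc_fragment_len(marker) > max_mesg;
}
```

With a limit below 131216 bytes the check fires for every such record, which
would fit the flood of identical messages; what limit the server actually uses
here is not something this sketch can tell.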

This happens after a server reboot.  At this point, client(s) are trying
to perform some NFS transaction and fail, and server starts generating
the above messages - till I do a umount followed by mount on all clients.
Previously, such a situation (NFS server reboot) was handled transparently,
i.e., there was nothing to do, the mount continued working just fine when
the server came back online.

Now, I'm not sure if it's really a 2.6.24-specific problem or a userspace
problem.  Some time ago we also upgraded nfs-kernel-server (Debian)
package, and the remount-after-nfs-server-reboot problem started to
occur at THAT time (and it is something to worry about as well, I just
had no time to deal with it); but the dmesg spamming only appeared
with 2.6.24.

How can I debug the issue further from this point?

Thanks!

/mjt


Re: [PATCH] quota: Turn quotas off when remounting read-only

2008-02-07 Thread Michael Tokarev
Andrew Morton wrote:
> On Thu, 7 Feb 2008 15:37:21 +0100 Jan Kara <[EMAIL PROTECTED]> wrote:
> 
>> Turn off quotas before filesystem is remounted read only. Otherwise quota
>> will try to write to read-only filesystem which does no good... We could
>> also just refuse to remount ro when quota is enabled but turning quota off
>> is consistent with what we do on umount.

[a nice one-liner snipped]

> Cool.  And this is applicable to 2.6.23, 2.6.22 and even earlier, isn't it?

Given how long this issue has existed, I don't think it's worth
pushing it to -stable.  It's an old issue, which happens quite
rarely, and no one has bothered to report it so far...  But it's not my
call... ;)

But...  I'm thinking about this scenario:

 # mount /data
 # quotaon /data
 (some maintenance stuff to be planned)
 # mount -o remount,ro /data
 (do backup etc)
 # mount -o remount,rw /data

at this point, it's expected that quota on /data is enabled.
After this patch, it's not anymore...

I think it's a more usual scenario than mine (umount instead of
remount-rw).  And this change will break it.  So I'm not sure
what really to do here.  Probably refusing remount-ro if quota
is on is better...  it's annoying for sure, but at least it's
explicit, and avoids the hang too.

/mjt


Re: remount-ro & umount & quota interaction

2008-02-07 Thread Michael Tokarev
Jan Kara wrote:
[]
>> I mean, why it locks in the first place?  Quota subsystem trying
>> to write something into an read-only filesystem?  If so, WHY it
>> is trying to do that on umount instead on a remount-ro?

>   Actually, I couldn't reproduce the hang on my testing machine so I don't
> know exactly why it hangs. But my guess is that it's because we try to
> write to the filesystem...

I can't reproduce it here easily either.  Yesterday I had a
locked-up console and had to hard-reboot the machine due to
this (it was far from the first time I've hit this issue),
but "on-demand reproducing" doesn't work (the uptime on that
host was about 100 days, and I had to do some repartitioning -
hence remount-ro to copy consistent data to another place -
maybe during those 100 days there was something... ;)

And I wasn't able to reproduce it on 2.6.24 so far, either
(this one is only used on a test machine so far).

I'll keep trying ;)

Thanks for your support!

/mjt


Re: remount-ro & umount & quota interaction

2008-02-07 Thread Michael Tokarev
Jan Kara wrote:
[deadlock after remount-ro followed by umount when
 quota is enabled]

>   Of course, thanks for report :). The problem is we allow remounting
> read only which we should refuse when quota is enabled. I'll fix that in
> a minute.

Hmm.  While that will prevent the lockup, maybe it's better to
perform an equivalent of quotaoff on remount-ro instead?  Or even
do something more useful, like flushing the quota data the way the
rest of the filesystem is flushed to disk, so that on umount,
quota will not get in the way...

I mean, why does it lock up in the first place?  Is the quota subsystem
trying to write something to a read-only filesystem?  If so, WHY is
it trying to do that on umount instead of on remount-ro?

Thanks!

/mjt


remount-ro & umount & quota interaction

2008-02-06 Thread Michael Tokarev
For a long time I've been bitten by a bad interaction
of mount -o remount,ro and quota operations.

The sequence is as follows:

 mount /fs
 quotaon -ug /fs
 mount -o remount,ro /fs
 umount /fs

At this point, umount never returns.  /proc/$pid/wchan
shows vfs_quota_off:

Feb  6 20:53:25 linux kernel: umount D e5183eb8 0  8646  1
Feb  6 20:53:25 linux kernel: e5183ecc 0086 0002 e5183eb8 e5183eb0  c1db2540 c1db2684
Feb  6 20:53:25 linux kernel: c1db2684 c1c0dd00  cfd9f1c0 c0367080 c0367080 f5849000 f7f06880
Feb  6 20:53:25 linux kernel: f7e89d80  c0367080 b7c9795c 005f3997  00ff 
Feb  6 20:53:25 linux kernel: Call Trace:
Feb  6 20:53:25 linux kernel:  [c01a2a65] vfs_quota_off+0x345/0x490
Feb  6 20:53:25 linux kernel:  [c013a3a0] autoremove_wake_function+0x0/0x50
Feb  6 20:53:25 linux kernel:  [c0174bf6] deactivate_super+0x46/0x80
Feb  6 20:53:25 linux kernel:  [c0188bba] sys_umount+0x4a/0x240
Feb  6 20:53:25 linux kernel:  [c017637f] sys_stat64+0xf/0x30
Feb  6 20:53:25 linux kernel:  [c0162069] remove_vma+0x39/0x50
Feb  6 20:53:25 linux kernel:  [c0162b67] do_munmap+0x197/0x1f0
Feb  6 20:53:25 linux kernel:  [c0188dc5] sys_oldumount+0x15/0x20
Feb  6 20:53:25 linux kernel:  [c010417e] sysenter_past_esp+0x5f/0x85
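
The wait channel mentioned above can be read directly from procfs, and tasks stuck in uninterruptible sleep (state D, as in the trace) can be listed with `ps`. A small sketch for collecting the same information on another machine; the function names `wchan_of` and `list_blocked` are hypothetical, not from the original report.

```shell
#!/bin/sh
# wchan_of PID: print the kernel symbol the task is currently sleeping in.
# Prints "unavailable" if /proc/PID/wchan cannot be read (no such task,
# no procfs, or insufficient permissions).
wchan_of() {
    w=$(cat "/proc/$1/wchan" 2>/dev/null) || w=
    echo "${w:-unavailable}"
}

# list_blocked: show PID, state, wait channel and command name of all
# tasks in uninterruptible sleep (state D), like the hung umount above.
list_blocked() {
    ps -eo pid=,stat=,wchan=,comm= | awk '$2 ~ /^D/'
}
```

With the PID from the trace, `wchan_of 8646` would be expected to print vfs_quota_off while the umount is hung.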

The filesystem is ext3.  The issue has been around for a long time,
at least since before 2.6.20, and is still present in 2.6.23
(I'll try 2.6.24 later today).

Can it be fixed please? :)

Thanks!

/mjt


Re: swsusp on an AMD x2-64, 2.6.24: regression?

2008-02-01 Thread Michael Tokarev
Michael Tokarev wrote:
> Rafael J. Wysocki wrote:
[]
>> I guess it's a special variation of
>> http://bugzilla.kernel.org/show_bug.cgi?id=9528
>>
>> Please try to hibernate in the shutdown mode (ie. echo
>> "shutdown" into /sys/power/disk before hibernation).

[yes it works with shutdown...]

> In any way, this is definitely progress, and that bug
> seems to be the same as I'm seeing here.
> 
> Now... I see there's a new BIOS for this mobo available
> (it's ASUS M2NPV-VM motherboard, Geforce6150/Nforce430(?)),
> which is more recent compared with what I have here.  Trying
> it now (will try to reflash it without a floppy - it turns
> out to be quite.. challenging task ;)

Ok, updated the bios (using freedos virtual boot floppy provided
by memdisk from syslinux), and... now it all works correctly!

I was definitely blaming linux for the regression - obviously,
as "before-kernel" worked, while "current-kernel" does not
anymore.  But the problem seems to be due to some bios buglet.
Oh well... ;)

> Thanks!
!

Now I can go on with my other.. question, namely the UPS thingie... :)

/mjt


Re: swsusp on an AMD x2-64, 2.6.24: regression?

2008-02-01 Thread Michael Tokarev
Rafael J. Wysocki wrote:
> On Friday, 1 of February 2008, Michael Tokarev wrote:
[]
>> no_console_suspend it is.  Tried that, the "S|" thing is still
>> here, but instead of "Suspending console(s)" it now shows
>> progress of suspending other devices.  The end result is
>> the same - finally it stops and sits here ad infinitum.
> 
> I guess it's a special variation of
> http://bugzilla.kernel.org/show_bug.cgi?id=9528
> 
> Please try to hibernate in the shutdown mode (ie. echo
> "shutdown" into /sys/power/disk before hibernation).

Hmm.  A very obscure thing - that bug, that is.

Tried "shutdown" - it works - even with all the other
"fancy" stuff like highres timers, cpufreq et al.  And
it resumes correctly as well.
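
For anyone repeating the test, the "shutdown mode" hibernation suggested above comes down to two sysfs writes. A minimal sketch: the function name `hibernate_with_mode` and the `POWER_DIR` override are hypothetical (the override exists only so the function can be tried against a scratch directory), and it relies on /sys/power/disk listing the supported modes with the current one in brackets.

```shell
#!/bin/sh
# On a real system this is /sys/power; overridable for dry experiments.
POWER_DIR=${POWER_DIR:-/sys/power}

# hibernate_with_mode [MODE]: select the hibernation MODE (default
# "shutdown") and then start hibernation.  Needs root on a real system.
hibernate_with_mode() {
    mode=${1:-shutdown}
    # /sys/power/disk reads like "[platform] shutdown reboot"
    if ! grep -qwF -- "$mode" "$POWER_DIR/disk"; then
        echo "mode '$mode' not offered by $POWER_DIR/disk" >&2
        return 1
    fi
    echo "$mode" > "$POWER_DIR/disk" &&
    echo disk > "$POWER_DIR/state"
}
```

Checking the mode list first avoids a confusing write error on kernels or platforms that do not offer the requested mode.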

After reading all the stuff attached to that bugreport,
I also tried removing ohci_hcd - it also works just fine
(had to do it in one line --
   rmmod ohci-hcd; sleep 5; echo disk > /sys/power/state
-- because I don't have non-USB keyboard handy :)

What I also noticed is that at least twice while doing
all the experiments, I've seen a message similar to (from
memory):

  ohci_hcd: unlink after non-IRQ - controller is probably using the wrong IRQ

This happened with no_console_suspend enabled - during
the final stage of suspend, when the kernel prints messages
about disabling acpi devices.  I can't reproduce it easily,
but it happened at least twice with the same kernel configuration
(I tried different options, many variations, recompiling and
reinstalling the kernel each time).

In any case, this is definitely progress, and that bug
seems to be the same as the one I'm seeing here.

Now... I see there's a new BIOS for this mobo available
(it's ASUS M2NPV-VM motherboard, Geforce6150/Nforce430(?)),
which is more recent compared with what I have here.  Trying
it now (will try to reflash it without a floppy - it turns
out to be quite.. challenging task ;)

Thanks!

/mjt


Re: swsusp on an AMD x2-64, 2.6.24: regression?

2008-02-01 Thread Michael Tokarev
Pavel Machek wrote:
> On Fri 2008-02-01 00:41:06, Michael Tokarev wrote:
[]
>> With 2.6.24, it tries to suspend, saves pages to disk,
>> when prints this:
>>
>> ..Saving pages... done.
>> Sl

It's actually "S|", not "Sl".

>> Suspending console(s)
>> _
>>
>> At this point, nothing more happens.  It does not
>> react to keyboard or to any other external events,

..because the keyboard is USB-connected, and it shuts
down all USB devices.  I'll try with a PS/2 keyboard
(when I find the one I had somewhere... ;)

[]
> no_console_suspend (sp?), nohz=off, highres=off, and try with minimum
> config.

no_console_suspend it is.  Tried that, the "S|" thing is still
there, but instead of "Suspending console(s)" it now shows
progress of suspending other devices.  The end result is
the same - finally it stops and sits there ad infinitum.

nohz and highres are useless now, as I recompiled the kernel
without support for those, and without CPU_IDLE and other
fancy stuff, and disabled cpufreq just in case.

What's minimum config?  Should I turn off SMP (it's a dual-core
CPU by the way)?  Something else?  (I already removed most
driver modules when trying suspend - only the ones which are
absolutely necessary are left).

I've read Documentation/power/tricks.txt. From that list,
I have the following:

 o all drivers are unloaded except disk and usb (keyboard)
 o preempt is disabled (was never enabled)
 o APIC IS in use.
 o modules are in use.  Is it worth trying module-less?
 o vga text console - not even "vga" per se, - no framebuffers
   and such, not even as modules.  No "video mode switching
   support" is enabled.
 o only a few processes left, in like single-user mode.

One other difference between 2.6.23 and 2.6.24 that I see here
is: 2.6.24 tells me about TSC instability (when I load the cpufreq
stuff), while 2.6.23 did not.  This is about 64bit mode - with
32bits, both switch from tsc to hpet, so in this regard,
2.6.24 (with 32bits) is not different from 2.6.23 it seems
(I mean in relation to the suspend issues, since 32bit .23
mentioned tsc instability yet it suspended fine).

So I'm.. stuck. :)  Don't know where to go from here.

Thanks!

/mjt


Re: hibernate/suspend-to-disk: to turn power or not?

2008-01-31 Thread Michael Tokarev
Pavel Machek wrote:
[]
>> I'm looking at the uswsusp source (while the kernel compiles),
>> and have a question here.  Is it possible to call some external
>> application (typically a shell script) to do the final work after
>> when the image has been written?  I mean in principle - I
>> understand there are some limitations here, but I don't know
>> which exactly.
> 
> No, you can't exec() anything. That would write mtime back to disk and
> cause badness.

Now that's.. interesting.

s2disk writes to a swap device/file, which should update
mtime of this device node/file.  Isn't it something which
also causes the same badness?

Also, if the only concern is mtime update, what's really
wrong with it?  I mean, regardless of whether such update
will finally hit the disk or not, there's not much difference
really - it updates just mtime field, and there should be
no harm in that.  Unless such update first goes to a
journal (in a journalling filesystem) - which DOES modify
some on-disk structures.
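
The premise shared by both sides, that a plain write bumps the mtime of the file written to, is easy to confirm for a regular file. A small sketch assuming GNU `stat`; the function name `write_bumps_mtime` is hypothetical. It only looks at the timestamp the kernel reports and says nothing about device nodes, or about whether and when the update reaches the disk, which is the actual question here.

```shell
#!/bin/sh
# write_bumps_mtime FILE: return 0 if appending to FILE advances its mtime.
# Uses GNU stat (-c %Y prints the mtime as seconds since the epoch).
write_bumps_mtime() {
    f=$1
    touch -t 200801310000 "$f"      # force a known, old mtime
    before=$(stat -c %Y "$f")
    echo x >> "$f"                  # any write to a regular file
    after=$(stat -c %Y "$f")
    [ "$after" -gt "$before" ]
}
```

Run as `write_bumps_mtime /tmp/scratch && echo "mtime updated"` on a scratch file, since the function overwrites the file's timestamp and appends to it.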

>> it typically involves writing/reading something to/from
>> a given serial port (/dev/ttySxx), or to an USB device,

> Create libups.so, and link s2disk to it?

And what's the difference here again?  We'll open a serial
port and write something to it - which, again, will update
mtime of that device node.  Unless the said node is on a
tmpfs, exactly the same badness will happen.  Not all the
world is udev, after all.

So I don't get the reason why we can't exec something here,
still.  (And, for example, call splashy commands as external
processes, instead of linking all this cruft into s2disk and
resume.)

What I'm thinking about here is - s2ram mlock()s its memory.
If it forks/execs something, that something will obviously
NOT be locked like that.  Is that of some concern?  Probably
not, because that something will be executed after we've
taken the snapshot.

/mjt