Re: Suspend code ordering (again)

2007-12-27 Thread Robert Hancock

Rafael J. Wysocki wrote:

On Wednesday, 26 of December 2007, Linus Torvalds wrote:

On Tue, 25 Dec 2007, Rafael J. Wysocki wrote:

the ACPI specification between versions 1.0x and 2.0.  Namely, while ACPI
2.0 and later wants us to put devices into low power states before calling
_PTS, ACPI 1.0x wants us to do that after calling _PTS.  Since we're following
the 2.0 and later specifications right now, we're not doing the right thing for
the (strictly) ACPI 1.0x-compliant systems.

We ought to be able to fix things on the high level, by calling _PTS earlier on
systems that claim to be ACPI 1.0x-compliant.  That will require us to modify
the generic suspend code quite a bit and will need to be tested for some time.

That's insane. Are you really saying that ACPI wants totally different 
orderings for different versions of the spec?


Yes, I am.


And does Windows really do that?


I don't know.

Please don't make lots of modifications to the generic suspend code. The 
only thing that is worth doing is to just have a firmware callback before 
the "device_suspend()" thing (and then on an ACPI-1.0 system, call _PTS 
*there*), and on an ACPI-2.0 system, call _PTS *after* device_suspend().


Yes, that's what I'm going to do, but I need to untangle some ACPI code for
this purpose.

Still, the fact is, some (most, I think) drivers *should* put themselves 
into D3 only in "late_suspend()", so if ACPI-2.0 really expects _PTS to be 
called after that, we're just screwed.


Well, section 9.1.6 of ACPI 2.0 specifies the suspend ordering directly and
says exactly that _PTS is to be executed after putting devices into respective
D states.


I would not take those sections as gospel, they're really an example 
only. It's quite possible that Windows does not follow that ordering.


Also, as was pointed out, pre-Vista versions of Windows follow ACPI 1.0 
and Vista follows 3.0, so 2.0 doesn't really matter since BIOS people 
won't test against it. 1.0 specifies that _PTS is to be called before 
suspending devices and 3.0 says that the AML must not depend on any 
specific device power state, so in both cases it should be safe to call 
_PTS before suspending, no?
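
The orderings being debated can be written down as a toy model. This is a Python sketch, not kernel code: `suspend_steps` and its `pts_first_everywhere` flag are hypothetical names, and it assumes the ACPI major revision is already known. With the flag clear it models the per-version callback placement proposed above; with the flag set it models calling _PTS before suspending devices on every system.

```python
PTS = "call _PTS"
DEVICES = "suspend devices (device_suspend)"


def suspend_steps(acpi_major, pts_first_everywhere=False):
    """Return the order of the two steps for a given ACPI major revision.

    acpi_major: 1 for ACPI 1.0x firmware, 2 or higher for later revisions.
    pts_first_everywhere: hypothetical switch modelling the suggestion to
    call _PTS before suspending devices regardless of revision.
    """
    if pts_first_everywhere or acpi_major < 2:
        return [PTS, DEVICES]
    return [DEVICES, PTS]


if __name__ == "__main__":
    for major in (1, 2, 3):
        print(major, suspend_steps(major), suspend_steps(major, True))
```

The only difference between the two policies shows up for revisions 2 and later, which is exactly where the question of what Windows really does matters.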


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in

2007-12-27 Thread Robert Hancock

Arjan van de Ven wrote:

2) [non-minor] h.

[EMAIL PROTECTED] ~]$ lspci -n | wc -l
23

So I would have to perform 23 sysfs twiddles, before I could obtain a 
full and unabridged 'lspci -vvvxxx'?


not you as human, but "lspci" ought to yes.

For the userspace interface, the most-often-used knob for diagnostic 
purposes will be the easiest one.  And that's


the easiest one is an option to lspci. Nothing more nothing less.

Making a global knob in kernel space is a lot more tricky, and in addition
really there's enough cases where userspace wants the one device anyway.
Doing the "for each device I'm about to dump" in lspci is pretty much as hard
as doing the global one (if not simpler).


So then if you have a system where MMCONFIG doesn't work and you're not 
using any devices that require extended config space, then doing lspci 
-vvvxxx will blow up the machine? Yuck.


Still don't like this approach. It seems like (partially) covering up 
problems rather than solving them.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [patch?] s2ram + P4 + tsc = annoyance

2007-12-27 Thread Robert Hancock

Mike Galbraith wrote:

Greetings,

s2ram recently became useful here, except for the kernel's annoying
habit of disabling my P4's perfectly good TSC.

[  107.894470] CPU 1 is now offline
[  107.894474] SMP alternatives: switching to UP code
[  107.895832] CPU0 attaching sched-domain:
[  107.895836]  domain 0: span 1
[  107.895838]   groups: 1
[  107.896097] CPU1 is down
[3.726156] Intel machine check architecture supported.
[3.726165] Intel machine check reporting enabled on CPU#0.
[3.726167] CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
[3.726170] CPU0: Thermal monitoring enabled
[3.726175] Back to C!
[3.726708] Force enabled HPET at resume
[3.726775] Enabling non-boot CPUs ...
[3.727049] CPU0 attaching NULL sched-domain.
[3.727165] SMP alternatives: switching to SMP code
[3.727858] Booting processor 1/1 eip 3000
[3.727862] CPU 1 irqstacks, hard=b042f000 soft=b042d000
[3.738173] Initializing CPU#1
[3.798912] Calibrating delay using timer specific routine.. 5986.12 BogoMIPS (lpj=2993061)
[3.798920] CPU: After generic identify, caps: bfebfbff   
 4400   
[3.798931] CPU: Trace cache: 12K uops, L1 D cache: 8K
[3.798934] CPU: L2 cache: 512K
[3.798936] CPU: Physical Processor ID: 0
[3.798938] CPU: After all inits, caps: bfebfbff   b080 
4400   
[3.798946] Intel machine check architecture supported.
[3.798952] Intel machine check reporting enabled on CPU#1.
[3.798955] CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
[3.798959] CPU1: Thermal monitoring enabled
[3.799161] CPU1: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 09
[3.799187] checking TSC synchronization [CPU#0 - CPU#1]:
[3.819181] Measured 63588552840 cycles TSC warp between CPUs, turning off TSC clock.
[3.819184] Marking TSC unstable due to: check_tsc_sync_source failed.

I wonder why I'm the only guy in the galaxy experiencing this.  Does
everybody else's clock continue to move forward across resume or
something?  Anyway, I asked it to please stop doing that, and it
complied without even exploding (unlike crabby APICs).


Are we missing some logic to resync the TSCs after resume, or something?
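
To put the number in the log into perspective, here is a small Python model of a warp check and the arithmetic on the reported value. It is a simplified sketch, not the kernel's check_tsc_sync code: `max_tsc_warp` and `warp_seconds` are hypothetical helpers, and the 3.0 GHz rate is only the nominal clock from the "Pentium(R) 4 CPU 3.00GHz" line, assumed here to be the TSC rate.

```python
def max_tsc_warp(samples):
    """Largest backwards step in a sequence of TSC readings.

    samples: readings listed in the order they were taken, regardless of
    which CPU took them. A reading lower than the previous one means time
    appeared to go backwards between CPUs.
    """
    worst = 0
    previous = None
    for now in samples:
        if previous is not None and now < previous:
            worst = max(worst, previous - now)
        previous = now
    return worst


def warp_seconds(cycles, tsc_hz):
    """Convert a warp measured in cycles to seconds at the given TSC rate."""
    return cycles / tsc_hz


if __name__ == "__main__":
    # Made-up readings: the third one is 2 cycles behind the second.
    print(max_tsc_warp([100, 105, 103, 110]))
    # The warp reported in the dmesg output above, at an assumed 3.0 GHz.
    print(round(warp_seconds(63588552840, 3.0e9), 1), "seconds")
```

Under that assumption the warp corresponds to roughly 21 seconds, which looks more like one CPU's counter having been reset across the suspend than like ordinary drift.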

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove nospam from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in

2007-12-27 Thread Robert Hancock

Linus Torvalds wrote:


On Thu, 27 Dec 2007, Jeff Garzik wrote:

2) [non-minor] h.

[EMAIL PROTECTED] ~]$ lspci -n | wc -l
23

So I would have to perform 23 sysfs twiddles, before I could obtain a full and
unabridged 'lspci -vvvxxx'?


Or you force it on with pci=mmconfig or something at boot-time.

But yes. The *fact* is that MMCONFIG has not just been globally broken, 
but broken on a per-device basis. I don't know why (and quite frankly, I 
doubt anybody does), but the PCI device ID corruption happened only for a 
specific set of devices.


Whether it was a timing issue with particular devices or whether it was a 
timing issue with some particular bridge (and could affect any devices 
behind that bridge), who knows... It almost certainly was brought on by a 
borderline (or broken) northbridge, but it apparently only affected 
specific devices - which makes me suspect that it wasn't *entirely* due to 
just the northbridge, and it was a combination of things.


Pointer to such a report? The only single-device problems I'm aware of
are with some devices within the K8 integrated northbridge, which we
already handle. Other than that, the only non-global problems I'm aware
of are devices behind host bridges which can't receive/handle MMCONFIG
requests, in which case the problem would be bus-wide.

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove nospam from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/





Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in

2007-12-27 Thread Robert Hancock

Linus Torvalds wrote:
But as mentioned, there were other reports too of the exact same bug (with 
different PCI devices, but the same vendor == 0001 bogosity).


Googling for

lspci Unknown device 0001: mmconfig

shows reports like these:

http://lkml.org/lkml/2007/10/29/500
http://madwifi.org/ticket/1587
http://www.nvnews.net/vbulletin/showthread.php?t=103271
http://naoya.g.hatena.ne.jp/naoya/20070529/1180436756
http://bbs.archlinux.org/viewtopic.php?id=34321
...

which all seem to be due to this same bug with different cards (but the 
common theme seems to be an ATI northbridge).


This isn't an example of a per-device breakage, though. It only shows up 
on some devices, but the cause is apparently the chipset. Those devices 
work fine on other boards.


As mentioned later, it appears that CRS stuff might be related to this 
problem, but if it couldn't be fixed, I think the only sane solution 
would be to blacklist MMCONFIG support on that chipset.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove nospam from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Suspend code ordering (again)

2007-12-27 Thread Robert Hancock

Rafael J. Wysocki wrote:
Also, as was pointed out, pre-Vista versions of Windows follow ACPI 1.0 
and Vista follows 3.0, so 2.0 doesn't really matter since BIOS people 
won't test against it. 1.0 specifies that _PTS is to be called before 
suspending devices and 3.0 says that the AML must not depend on any 
specific device power state, so in both cases it should be safe to call 
_PTS before suspending, no?


Well, IMO, if we take one option only (whichever that is) and there are systems
that follow the other one, they will likely break.

Apart from this, there are BIOSes that openly claim ACPI 2.0 support (for
example, the one in my HP nx6325 does that) and they may actually prefer the
post-ACPI-1.0 ordering even if they work with the pre-ACPI-2.0 one.


I doubt they would prefer the later ordering in any way that matters, if 
the Windows version they were designed for uses the earlier ordering.


It would be best if somebody could manage to find out what ordering 
Windows XP (and Windows Vista, for good measure) actually use, then we 
could just use that. Virtual machine trickery might be an option - the 
only complication being that it'll be using the DSDT for the fake 
machine and not the real one..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove nospam from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: HSM violation errors

2007-12-25 Thread Robert Hancock

Jeff Mitchell wrote:

I'm seeing errors in dmesg and the like.  It appears to be somewhat
similar to the issue reported here:
http://kerneltrap.org/mailarchive/linux-kernel/2007/8/25/164711 except
that my machine doesn't freeze, and everything seems normal --
hopefully nothing like silent corruption is going on.  Also it's on
brand new hardware...Intel ICH8 mobile chipset with AHCI.  Output from
dmesg, hdparm -I /dev/sda and hdparm --drq-hsm-error /dev/sda is
below...please let me know if there's anything else that would be of
use (and, of course, if this is something I should be worried about
:-)  ).

Thanks.
Jeff


dmesg:

ata1.00: exception Emask 0x2 SAct 0xfffd SErr 0x0 action 0x2 frozen
ata1.00: spurious completions during NCQ issue=0x1 SAct=0xfffd FIS=005040a1:0002


You didn't say what kernel you were using, but in the latest kernels 
this spurious completion check was removed since it was broken, so this 
error shouldn't happen anymore.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: x86: Increase PCIBIOS_MIN_IO to 0x1500 to fix nForce 4 suspend-to-RAM

2007-12-25 Thread Robert Hancock

Linus Torvalds wrote:

IMO, we should check which version of the specification we're supposed to
follow, on the basis of FADT contents, for example, and follow this one.


No, we should try to figure out what Windows does. *If* windows checks the 
version, we should do that too. But we should absolutely *not* just assume 
that the documentation is an accurate picture of reality.


Does anybody know how we could find out? 


Linus



Well, it seems that if one had a checked (debug) build of Windows (or at 
least the acpi.sys driver) installed, as well as a copy of the Microsoft 
ASL compiler, they could compile and temporarily override the DSDT with 
a hacked one that would output what the device power states were in some 
fashion (maybe through the kernel debugger). Some info about this here:


http://download.microsoft.com/download/1/8/f/18f8cee2-0b64-41f2-893d-a6f2295b40c8/TW04015_WINHEC2004.ppt

I suspect that might require more Windows hacking skill and/or 
motivation than one might be likely to find on this list, though :-)


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: x86: Increase PCIBIOS_MIN_IO to 0x1500 to fix nForce 4 suspend-to-RAM

2007-12-25 Thread Robert Hancock

Carlos Corbacho wrote:

On Tuesday 25 December 2007 13:26:12 Rafael J. Wysocki wrote:

Well, citing from the ACPI 2.0 specification, section 9.1.6 Transitioning
from the Working to the Sleeping State (which is what we're discussing
here):

3. OSPM places all device drivers into their respective Dx state. If the
device is enabled for wake, it enters the Dx state associated with the wake
capability. If the device is not enabled to wake the system, it enters the
D3 state.
4. OSPM executes the _PTS control method, passing an argument that
indicates the desired sleeping state (1, 2, 3, or 4 representing S1, S2,
S3, and S4).

My opinion is that we should follow this part of the specification and so
we do.


This is that same section from ACPI 1.0B:

3. The OS executes the Prepare To Sleep (_PTS) control method, passing an
argument that indicates the desired sleeping state (1, 2, 3, or 4 representing
S1, S2, S3, and S4).

4. The OS places all device drivers into their respective Dx state. If the
device is enabled for wakeup, it enters the Dx state associated with the 
wakeup capability. If the device is not enabled to wakeup the system, it 
enters the D3 state.


The DSDTs in question also claim ACPI 1.0 compatibility.


You're wrong, sorry.


No, I'm not entirely wrong - read the 1.0 spec, and read section 7.3.2 of the 
ACPI 2.0 spec.


* ACPI 1.0 is very clear - we are breaking the 1.0 spec

* ACPI 2.0 is contradictory - section 7.3.2 repeats 1.0 verbatim (which is 
what I quote in reply to Robert Hancock), but as you point out, 9.3.2 says 
the opposite.


So, 1.0 and 3.0 are very clear and rather different on this, and 2.0 is 
contradictory (and I presume this is one of the points ACPI 3.0 set out to 
clean up).


I will rescind my point on ACPI 2.0 - I don't know what we should or shouldn't 
be doing there, the spec is unclear.


But for ACPI 1.0, we are doing the wrong thing.


Correct me if I'm wrong, but it appears ACPI 1.0 wants _PTS called 
before any devices are suspended, ACPI 2.0 is contradictory, and ACPI 
3.0 says that you can't assume anything about device state. My guess is 
that unless Windows has different behavior depending on ACPI version, it 
probably has called _PTS before suspending devices all along. Therefore 
it would likely be safest to emulate that behavior, no?


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: x86: Increase PCIBIOS_MIN_IO to 0x1500 to fix nForce 4 suspend-to-RAM

2007-12-24 Thread Robert Hancock

Carlos Corbacho wrote:

On Monday 24 December 2007 18:34:21 Linus Torvalds wrote:

On Mon, 24 Dec 2007, Rafael J. Wysocki wrote:

Well, having considered that for a longer while, I think the AML code is
referring to a device that we have suspended already, and since it's in a
low power state, it just can't handle the reference.

If that is the case, we'll have to find the device (that should be
possible using some code instrumentation) and move the suspending of it
into the late stage.

Yes.


My own experimentation (in device_suspend(), calling _PTS() in the AML after 
each suspend_device() runs, until one device causes it to hang) points to 
ohci_hcd being the culprit here (with or without any devices attached). With 
the ohci_hcd module unloaded, the machine suspends just fine[1].
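
The instrumentation described here amounts to a linear search over the suspend order. A minimal Python sketch of the idea follows; it is not the actual debugging patch. `first_breaking_device`, `suspend` and `probe_ok` are hypothetical names, and the device list other than ohci_hcd is made up for illustration.

```python
def first_breaking_device(devices, suspend, probe_ok):
    """Suspend devices in order and probe after each one.

    Returns the first device after whose suspension the probe stops
    working, or None if the probe keeps working throughout.
    """
    for dev in devices:
        suspend(dev)
        if not probe_ok():
            return dev
    return None


if __name__ == "__main__":
    suspended = set()
    order = ["ehci_hcd", "ohci_hcd", "sata_nv"]  # illustrative order only
    culprit = first_breaking_device(
        order,
        suspended.add,
        # Stand-in for "_PTS still completes": fails once OHCI is suspended.
        lambda: "ohci_hcd" not in suspended,
    )
    print(culprit)
```

On real hardware the failing probe hangs the machine instead of returning, so the culprit is the last device logged before the hang.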


Of course, I'm at a complete loss as to why suspending OHCI would cause a 
problem for an IO port write.


The name of the operation region, SMIP, suggests that the BIOS has an 
SMI trap on that port. In that case, writing to that port will result in 
the BIOS taking control. We have little idea what it could be doing. 
Could be it's trying to access the OHCI controller which has been 
suspended already.


This sounds kind of like the Toshiba laptops that go nuts somewhere if 
the AHCI SATA controller gets put into suspend state before the system 
suspends..


The ACPI spec has the following to say about the _PTS method:

"The platform must not make any assumptions about the state of the 
machine when _PTS is called. For example, operation region accesses that 
require devices to be configured and enabled may not succeed, as these 
devices may be in a non-decoding state due to plug and play or power 
management operations."


I would guess some BIOS writers failed to heed this..




NOTE! This following patch is just for discussion, and while I think it's
conceptually a good thing to try, I don't think it will help Carlos'
problem. But removing the "pci_set_power_state()" in agp_nvidia_suspend()
might.


nvidia-agp cannot be built on x86-64, so it's not the culprit in this case.


Yeah, and this is a PCI Express system not AGP, so it wouldn't load anyway.

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

2007-12-24 Thread Robert Hancock

Loic Prylli wrote:

I just realized one thing: the bar sizing code in pci_read_bases() (that
writes 0x in the bars) does not seem to disable the
PCI_COMMAND_MEM/PCI_COMMAND_IO bits in the cmd register before
manipulating the BARs. And it seems nobody else ensures they are
disabled at this point either (or am I missing something?).


No you're not missing anything. This problem causes many machines to 
break horribly when MMCONFIG is enabled. There's a patch in -mm to fix 
this. (It special-cases the case of host bridges and doesn't disable the
decode bits for those, since some are known to do crazy things if you
do that.)


http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/broken-out/pci-disable-decoding-during-sizing-of-bars.patch



Touching the bars while they are enabled would be buggy behaviour from
our part, and something trivial to fix. And it might well fix that
particular problem (it's fair play from the machine to crash if we
create a decoding conflict, simply disabling the cmd bits in
pci_read_bases() should remove that conflict).

FWIW, to partially answer your last question, Windows does disable
mem-space and/or IO-space when sizing the bars of a device (I have some
traces of configuration-space-access taken on a Windows machine for one
of the PCI busses).


Good to know. There was some speculation that it did not.
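
The sizing sequence under discussion can be sketched against a toy device. This is a Python model under stated assumptions, not the patch linked above: `FakeDevice` and `size_bar` are hypothetical, only a single 32-bit memory BAR is modelled, and the host-bridge special case mentioned above is left out. The register offsets and command bits are the standard PCI ones.

```python
PCI_COMMAND = 0x04
PCI_COMMAND_IO = 0x1
PCI_COMMAND_MEMORY = 0x2
PCI_BASE_ADDRESS_0 = 0x10


class FakeDevice:
    """Toy config space with one 32-bit memory BAR (illustration only)."""

    def __init__(self, bar_size, bar_addr):
        # Address bits below the BAR size are hardwired to zero.
        self.size_mask = ~(bar_size - 1) & 0xFFFFFFF0
        self.regs = {PCI_COMMAND: PCI_COMMAND_MEMORY,
                     PCI_BASE_ADDRESS_0: bar_addr}
        self.bar_written_while_decoding = False

    def read(self, reg):
        return self.regs[reg]

    def write(self, reg, value):
        if reg == PCI_BASE_ADDRESS_0:
            if self.regs[PCI_COMMAND] & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY):
                self.bar_written_while_decoding = True
            value &= self.size_mask
        self.regs[reg] = value


def size_bar(dev):
    """Size BAR 0 with I/O and memory decoding switched off meanwhile."""
    cmd = dev.read(PCI_COMMAND)
    dev.write(PCI_COMMAND, cmd & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY))
    saved = dev.read(PCI_BASE_ADDRESS_0)
    dev.write(PCI_BASE_ADDRESS_0, 0xFFFFFFFF)         # all ones
    mask = dev.read(PCI_BASE_ADDRESS_0) & 0xFFFFFFF0  # drop the flag bits
    dev.write(PCI_BASE_ADDRESS_0, saved)              # restore the address
    dev.write(PCI_COMMAND, cmd)                       # restore decoding
    return (~mask + 1) & 0xFFFFFFFF


if __name__ == "__main__":
    dev = FakeDevice(bar_size=0x1000, bar_addr=0xFE000000)
    print(hex(size_bar(dev)), dev.bar_written_while_decoding)
```

Because the command register is cleared first, the BAR never holds the all-ones value while the device is decoding, so it cannot briefly claim addresses that belong to something else such as the MMCONFIG window.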

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [Bug 9528] x86: Increase PCIBIOS_MIN_IO to 0x1500 to fix nForce 4 suspend-to-RAM

2007-12-24 Thread Robert Hancock

Linus Torvalds wrote:


On Sun, 23 Dec 2007, Carlos Corbacho wrote:

Fix suspend-to-RAM on nForce 4 (CK804) boards by increasing
PCIBIOS_MIN_IO.

Fixes kernel bugzilla #9528

Problem:

Linus' patch (52ade9b3b97fd3bea42842a056fe0786c28d0555) to re-order
suspend (and fix fall out from Rafael's earlier suspend reordering work)
broke suspend-to-RAM on nForce 4 (CK804) boards.

Why:

After debugging _PTS() in the DSDT, it turns out these nVidia boards are
trying to write to an IO port > 0x1000 (0x142E) during suspend. Before the
re-ordering, we got away with this.


Very interesting.

HOWEVER.

I'd much rather figure out what the magic IO resource is that clashes. 

It's almost certainly some hidden and undocumented (or badly documented) 
ACPI IO area that the kernel doesn't know about, because it's not a 
regular PCI BAR resource, but some northbridge (or southbridge) magic 
register range.


Those ranges *should* be reserved by the BIOS in the ACPI tables, but this 
would definitely not be the first time that doesn't happen.


I'm having trouble sorting out which report is for which BIOS (and some 
of them don't have any dmesg posted), but I believe in these cases that 
memory region is indeed reported as reserved by the BIOS, and no PCI 
resources should end up allocated there. So I'm not sure why fiddling 
with PCIBIOS_MIN_IO would have any effect (other than by accident).


I wonder if this is the culprit (from Arthur Erhardt's dmesg):

pnpacpi: exceeded the max number of mem resources: 12
pnpacpi: exceeded the max number of mem resources: 12

which means we're ignoring some of the memory reservations. I wonder if 
some IO reservations are also being ignored?


Why do we have this silly hard limit of number of resources anyway? If 
we just ignore random reservations provided by the BIOS, we shouldn't be 
surprised if things break randomly. This warning at the very least 
should be much louder (i.e. "Warning: This problem may break your system")..
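
To make the failure mode concrete, here is a toy Python model of a
fixed-capacity reservation table. ReservationTable and its methods are
invented for illustration and are not the pnpacpi data structures; the
default capacity of 12 is simply the number printed in the warning above.

```python
class ReservationTable:
    """Toy fixed-capacity table of firmware-reserved ranges."""

    def __init__(self, capacity=12):
        self.capacity = capacity
        self.ranges = []    # (start, end) pairs that were recorded
        self.dropped = []   # ranges the firmware reported but we ignored

    def add(self, start, end):
        """Record a range; return False if it had to be thrown away."""
        if len(self.ranges) >= self.capacity:
            self.dropped.append((start, end))
            return False
        self.ranges.append((start, end))
        return True

    def is_reserved(self, addr):
        """True only for ranges that actually made it into the table."""
        return any(start <= addr <= end for start, end in self.ranges)
```

Anything that lands in dropped looks like free address space to the rest
of the system, which is why a quiet warning seems too weak.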




But the right fix would be for us to just figure out what the range is as 
a PCI quirk, and just know to avoid it on purpose, rather than just being 
lucky and happening to avoid it because PCIBIOS_MIN_IO just happens to be 
bigger than the particular address.


So can you:
 - show what your /proc/ioports contains (*with* the bug triggering, ie 
   non-working suspend, so we see what it is that actually ends up using 
   that area)
 - send out 'dmesg' for a boot (same deal)
 - add "lspci -xxxvv" output to the deal too.

and also make them part of the bugzilla history (I'm cc'ing bugzilla here, 
and added the bug number to the subject, so hopefully this thread ends up 
being archived there too).
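
Once the /proc/ioports dump is in hand, finding what sits on a given port
is mechanical. A small Python sketch, assuming the usual
"start-end : name" layout with two spaces of indent per nesting level;
owners_of_port is a made-up helper name:

```python
import re

# One /proc/ioports line: optional indent, hex range, " : ", owner name.
LINE_RE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def owners_of_port(ioports_text, port):
    """Return (depth, start, end, name) for every range that contains port."""
    hits = []
    for line in ioports_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        indent, start, end, name = m.groups()
        start, end = int(start, 16), int(end, 16)
        if start <= port <= end:
            hits.append((len(indent) // 2, start, end, name.strip()))
    return hits
```

On an affected machine this would be called as
owners_of_port(open("/proc/ioports").read(), 0x142E); an empty list means
nothing the kernel knows about has claimed the port.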



There was some previous work in the PCIBIOS_MIN_IO area over two years ago
(71db63acff69618b3d9d3114bd061938150e146b) which bumped this to 0x4000,
but this was reverted (2ba84684e8cf6f980e4e95a2300f53a505eb794e) after
causing new and entirely different problems on another nForce board.


The problem here is classic: these magic ranges tend to be *different* on 
different boards (because they don't tend to be fixed by hardware, they 
are programmed regions set up by firmware), so trying to change 
PCIBIOS_MIN_IO to avoid a problem on one board is almost certain to just 
introduce it on another board instead.


On *your* particular board, 0x142E is used for something, but on somebody 
else's board it might be 0x162E, and now changing PCIBIOS_MIN_IO to 0x1500 
might make that other board hang instead.
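
The whack-a-mole effect is easy to reproduce in a toy model. The Python
sketch below places I/O BARs back to back above a floor; place_bars and
hits are hypothetical helpers, the BAR sizes in the usage note are made
up, and the kernel's real allocator is considerably more involved.

```python
def place_bars(min_io, sizes):
    """Place I/O BARs back to back from min_io, each aligned to its own size."""
    placed = []
    cursor = min_io
    for size in sizes:
        base = (cursor + size - 1) // size * size  # BARs are size-aligned
        placed.append((base, base + size - 1))
        cursor = base + size
    return placed


def hits(placed, port):
    """True if any placed BAR covers the given port."""
    return any(start <= port <= end for start, end in placed)
```

With five 0x100-byte BARs, a floor of 0x1000 puts one of them on top of
0x142E. Raising the floor to 0x1500 clears 0x142E but now covers 0x162E,
so the board that happened to work before is the one that breaks.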


So you seem to have debugged this very successfully, and I'm wondering if 
you might be able to find out where that 0x142e comes from, and we could 
fix it for *all* boards using that chipset by just figuring out what the 
*hardware* rules (rather than the random firmware setup that will be 
different on different boards) for that chipset actually are!


I suspect it's board specific. Looking at the DSDT for my A8N-SLI 
Deluxe, that SMIP region is defined at 0x442E (and is reported as 
reserved). This BIOS doesn't write there in the _PTS method like the 
ones in the report apparently do though.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: x86: Increase PCIBIOS_MIN_IO to 0x1500 to fix nForce 4 suspend-to-RAM

2007-12-24 Thread Robert Hancock

Carlos Corbacho wrote:

On Monday 24 December 2007 18:34:21 Linus Torvalds wrote:

On Mon, 24 Dec 2007, Rafael J. Wysocki wrote:

Well, having considered that for a longer while, I think the AML code is
referring to a device that we have suspended already, and since it's in a
low power state, it just can't handle the reference.

If that is the case, we'll have to find the device (that should be
possible using some code instrumentation) and move the suspending of it
into the late stage.

Yes.


My own experimentation (in device_suspend(), calling _PTS() in the AML after 
each suspend_device() runs, until one device causes it to hang) points to 
ohci_hcd being the culprit here (with or without any devices attached). With 
the ohci_hcd module unloaded, the machine suspends just fine[1].
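
The instrumentation described above amounts to a linear search over the
suspend order. A Python sketch of the idea follows; find_breaking_device
and its two callbacks are hypothetical stand-ins for the real
suspend_device() call and the _PTS evaluation. On real hardware the
failing probe hangs the machine rather than returning, so the culprit is
whichever device was logged last.

```python
def find_breaking_device(devices, suspend, pts_survives):
    """Suspend devices in order; return the first one after which the
    probe fails, or None if the probe keeps working throughout."""
    for dev in devices:
        suspend(dev)
        if not pts_survives():
            return dev
    return None
```

A simulated run, with the device list invented apart from ohci_hcd:

```python
suspended = set()
culprit = find_breaking_device(
    ["ehci_hcd", "ohci_hcd", "forcedeth"],
    suspended.add,
    lambda: "ohci_hcd" not in suspended,
)
```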


Of course, I'm at a complete loss as to why suspending OHCI would cause a 
problem for an IO port write.


The name of the operation region, SMIP, suggests that the BIOS has an 
SMI trap on that port. In that case, writing to that port will result in 
the BIOS taking control. We have little idea what it could be doing. 
Could be it's trying to access the OHCI controller which has been 
suspended already.


This sounds kind of like the Toshiba laptops that go nuts somewhere if 
the AHCI SATA controller gets put into suspend state before the system 
suspends..


The ACPI spec has the following to say about the _PTS method:

The platform must not make any assumptions about the state of the 
machine when _PTS is called. For example, operation region accesses that 
require devices to be configured and enabled may not succeed, as these 
devices may be in a non-decoding state due to plug and play or power 
management operations.


I would guess some BIOS writers failed to heed this..




NOTE! This following patch is just for discussion, and while I think it's
conceptually a good thing to try, I don't think it will help Carlos'
problem. But removing the pci_set_power_state() in agp_nvidia_suspend()
might.


nvidia-agp cannot be built on x86-64, so it's not the culprit in this case.


Yeah, and this is a PCI Express system not AGP, so it wouldn't load anyway.

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

2007-12-22 Thread Robert Hancock

Loic Prylli wrote:

On 12/20/2007 6:21 PM, Tony Camuso wrote:

And the MMCONFIG problem with enterprise systems and workstations, where
we do control the BIOS (for the most part), is due to known bugs in
certain versions of certain chipsets, HT1000, AMD8132, among them, not
the BIOS.




The lack of MMCONFIG support is indeed because some hypertransport
chipsets lack that support. But there are some BIOSes out there that are
advertising support for all busses in their MCFG acpi attribute (even
the busses managed by some amd8131 in a mixed nvidia-ck804/amd8131
motherboard), and the BIOS seems at least faulty for advertising a
capability that does not exist.


This didn't really occur to me before for some reason. But yes, the MCFG 
table lists the buses to which each MMCONFIG region is applicable. If 
there are entire buses which MMCONFIG cannot access, it should not be 
indicating they are accessible via MMCONFIG in the ACPI MCFG table. If 
it is, then it's truly a BIOS bug.
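
As a concrete picture of what the table promises, here is a Python sketch
of the lookup. McfgEntry and mmconfig_address are illustrative names, not
the kernel's; the address arithmetic is the standard PCI Express layout
of 1 MB per bus, 32 KB per device and 4 KB per function, relative to the
base for bus 0.

```python
from collections import namedtuple

# One MCFG allocation: base address plus the segment and bus range it covers.
McfgEntry = namedtuple("McfgEntry", "base segment start_bus end_bus")


def mmconfig_address(entries, segment, bus, dev, fn, offset):
    """Return the physical address of a config register, or None if the
    table does not claim this bus (the caller must fall back to type 1)."""
    for e in entries:
        if e.segment == segment and e.start_bus <= bus <= e.end_bus:
            return e.base + ((bus << 20) | (dev << 15) | (fn << 12) | offset)
    return None
```

A BIOS that lists buses 0 to 255 when some of them sit behind a bridge
with no MMCONFIG support makes this lookup succeed for addresses that
will never respond correctly.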


Unless of course Linux isn't handling what the MCFG table is indicating 
properly. Then it's our bug. It would be good to verify this on one of 
the systems involved..


One of the things this patch (currently in -mm) does is dump out the 
segment and starting/ending buses for each MCFG configuration listed. 
The dmesg from this patch applied on such a system would tell you which 
is the case:


http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-x86.git;a=commit;h=e18c985289ee356f06dbc953281a3c140a02fbb3

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [patch] Make MMCONFIG space (extended PCI config space) a driver opt-in issue

2007-12-22 Thread Robert Hancock

Arjan van de Ven wrote:

Hi,

Linus really wants the extended (4 KB) PCI configuration space (using MCFG ACPI 
table etc) to be opt-in, since there's many issues with it and most drivers 
don't even use/need it. The idea behind opt-in is that if you don't use it, you 
don't get to suffer the bugs...

Booted on my 64 bit test machine; sadly it has a defunct BIOS that doesn't have 
a working MCFG.


From: Arjan van de Ven <[EMAIL PROTECTED]>
Subject: Make MMCONFIG space a driver opt-in

There are many issues with using the extended PCI configuration space
(CPU, Chipset and most of all BIOS bugs). This while the vast majority of
drivers and devices don't even use/need to use the memory mapped access
methods since they don't use the config space beyond the traditional 256
bytes.

This patch makes accessing the extended config space a driver choice, via the

pci_enable_ext_config(pdev)

API function; drivers that want/need the extended configuration space should
call this. (A separate patch will be posted to add this function call to the
driver that uses this.)


I don't really like this approach. Whether MMCONFIG works or not has 
nothing to do with the device itself, it's an attribute of the machine, 
and possibly the bus it's been plugged into. This patch might prevent 
problems in some cases, but it's equally likely to just delay problems 
until somebody plugs in a device that tries to use extended config 
space. Neither do I really like the approach of limiting MMCONFIG 
accesses to ones beyond a certain address in the config space, for a 
similar reason.
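
The objection can be put as a small decision table. The Python sketch
below is a toy comparison of the three policies mentioned in this thread,
not a description of what any posted patch actually does; access_method,
the policy names and the result strings are all invented for
illustration.

```python
def access_method(offset, driver_opted_in, bus_mmconfig_ok, policy):
    """Pick the outcome of one config access under a toy policy.

    Returns "type1", "mmconfig", "unreachable" (extended space with no
    usable MMCONFIG) or "broken" (access sent through a non-working
    MMCONFIG window).
    """
    if policy == "driver-opt-in":
        wants_mmconfig = driver_opted_in
    elif policy == "offset":
        wants_mmconfig = offset >= 256
    elif policy == "per-bus-detection":
        wants_mmconfig = bus_mmconfig_ok
    else:
        raise ValueError("unknown policy: %r" % (policy,))

    if wants_mmconfig:
        return "mmconfig" if bus_mmconfig_ok else "broken"
    return "type1" if offset < 256 else "unreachable"
```

Under the first two policies a machine with a bad bus still ends up at
"broken" as soon as a driver opts in or touches offset 0x100. Only the
per-bus check turns that into a clean "unreachable".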


The detection of whether MMCONFIG works or not has to work properly (and 
I think we're pretty close, or at least we know what we need to do to 
get there, like fixing the stupid MMCONFIG/PCI bar sizing overlap 
problem, and likely Tony Camuso's patch or something like it, to disable 
MMCONFIG accesses to devices behind certain broken host bridges). Once 
that works, then this patch really serves no purpose.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

2007-12-20 Thread Robert Hancock

Tony Camuso wrote:

Robert Hancock wrote:

First off, I would like to see confirmation from the horses' mouths 
here (namely AMD, ServerWorks/Broadcom, and whoever else) that there 
is no other way to get around this problem than disabling MMCONFIG for 
accesses behind those chips.




I happen to have this one stored in my desktop.

From the AMD-8132™ HyperTransport™ PCI-X® 2.0 Tunnel Revision Guide:

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/30801.pdf


79 AMD-8132™ Tunnel Lacks Extended Configuration Space Memory-Mapped I/O
Base Address Register

Description

Current AMD processors do not natively support PCI-defined extended
configuration space. A memory mapped I/O base address register (MMIO BAR)
is required in chipset devices to support extended configuration space.
The AMD-8132 does not have this MMIO BAR.

Potential Effect On System

The AMD-8132 is a PCI-X® Mode 2 capable device and requires the MMIO BAR
to support extended configuration space. Using a device which does have
this MMIO BAR and an AMD-8132 on the same HyperTransport™ link of the
processor may cause firmware/software problems.

The base configuration space of the AMD-8132 and PCI(-X) devices attached
to it are accessible using only the mechanism defined in PCI 2.3.
Registers of PCI-X Mode 2 devices attached to the AMD-8132 in the extended
configuration space are not accessible. The AMD-8132 has no registers in
the extended configuration space.

Suggested Workaround

It is strongly recommended that system designers do not connect the
AMD-8132 and devices that use extended configuration space MMIO BARs
(ex: HyperTransport-to-PCI Express® bridges) to the same processor
HyperTransport link.

Fix Planned

No

That does sound fairly definitive. I have to wonder why certain system 
designers then didn't follow their strong recommendation..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [Fwd: Re: [PATCH 0/5]PCI: x86 MMCONFIG]

2007-12-20 Thread Robert Hancock

Tony Camuso wrote:

Greg KH wrote:


Sure, I realize this, but it solves the problem in one way for broken
hardware, such that it at least allows it to work, right?  It also
provides a better incentive for the manufacturer to fix their bios,
which as you are on-site at HP, it would seem odd that they would just
not do that instead of trying to work around this in the kernel...

thanks,

greg k-h


I don't think that many OEMs have that much control over the BIOS in
their "value lines".
:)

And the MMCONFIG problem with enterprise systems and workstations, where
we do control the BIOS (for the most part), is due to known bugs in
certain versions of certain chipsets, HT1000, AMD8132, among them, not
the BIOS.

Anyway, we are devising better ways to deal with these anomalies
than blacklists and telling customers to use "pci=nommconf"

And we're bringing them to the community for discussion, improvement,
and, we hope, acceptance.


First off, I would like to see confirmation from the horses' mouths 
here (namely AMD, ServerWorks/Broadcom, and whoever else) that there is 
no other way to get around this problem than disabling MMCONFIG for 
accesses behind those chips.


The case of the device built into the K8 northbridge that's unreachable 
by MMCONFIG kind of makes sense, since the northbridge is what's 
translating the MMCONFIG memory access into config accesses. It seems 
bizarre to me that a bridge chip could possibly have such a problem. The 
MMCONFIG access should get translated into a configuration space access 
in the northbridge and from that point on there's no difference between 
an MMCONFIG and type1 access.
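
To spell out why the two should be equivalent past the northbridge: both
encodings carry the same bus/device/function/register tuple, just packed
differently. A Python sketch, with invented function names; the bit
layouts are the standard ones for configuration mechanism #1 (the value
written to port 0xCF8) and for an MMCONFIG window offset.

```python
def type1_address(bus, dev, fn, reg):
    """Value written to port 0xCF8 to select a dword for a type 1 access."""
    if reg >= 256:
        raise ValueError("type 1 reaches only the first 256 bytes")
    return 0x80000000 | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xFC)


def decode_type1(addr):
    """Recover (bus, dev, fn, reg) from a 0xCF8 address value."""
    return ((addr >> 16) & 0xFF, (addr >> 11) & 0x1F,
            (addr >> 8) & 0x7, addr & 0xFC)


def decode_mmconfig(offset):
    """Recover (bus, dev, fn, reg) from an offset into an MMCONFIG window."""
    return ((offset >> 20) & 0xFF, (offset >> 15) & 0x1F,
            (offset >> 12) & 0x7, offset & 0xFFF)
```

The only architectural difference is the register field: 8 bits for
type 1 versus 12 bits for MMCONFIG, which is where the extended space
comes from.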


Look at MSI for another example, we recently had a patch from NVIDIA 
posted to enable Hypertransport MSI mapping bits on some chipsets so 
that MSI would function, because the BIOS failed to set them up 
properly. Are we sure there's not a similar BIOS configuration issue 
that could ideally be fixed in the BIOS, or else fixed up in the kernel?


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [lm-sensors] 2.6.24-rc4 hwmon it87 probe fails

2007-12-19 Thread Robert Hancock

Carlos Corbacho wrote:

On Thursday 20 December 2007 00:20:21 Bjorn Helgaas wrote:

I suspect the manufacturers would say "Oh, the sensors?  The BIOS
isn't broken, you're just supposed to use WMI or some (undocumented)
ACPI device to get at those."


It's quite possible - can we have DSDTs for the boards in question so we can 
quickly check if this is a possibility? (Basically, to see if they have 
PNP0C14 devices - if they don't, then I'm afraid it's nothing to do with 
WMI).


-Carlos


It's quite possible that the BIOS accesses the device either from ACPI 
AML or possibly even from SMI. In that case it would be quite reasonable 
for the BIOS to reserve that region to prevent another driver from 
loading and trying to take conflicting control of the device. One has to 
be careful before assuming that any such reservation is bogus.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: [PATCH 0/5]PCI: x86 MMCONFIG

2007-12-19 Thread Robert Hancock

Greg KH wrote:

On Wed, Dec 19, 2007 at 05:17:46PM -0500, [EMAIL PROTECTED] wrote:

OVERVIEW
===

The patches should be applied in sequence to obviate any
possible build problems.

The patch-set was built against 2.6.24-rc5

Description
===

There exist devices that do not respond correctly to PCI
MMCONFIG accesses in x86 platforms.


What devices are these?  Do you have reports of them somewhere?


This patch-set detects the problem by comparing an MMCONFIG
read to a Legacy PCI config read of the vendor/device dword
of every device discovered during the PCI probing sequence.

A miscompare means that a device does not correctly respond
to MMCONFIG accesses. When the patch code detects this condition,
the bus that serves this device, and all subordinate buses, will
be programmed to use Legacy PCI Config accesses.

This patch-set DOES NOT detect devices that generate machine
checks against MMCONFIG accesses. For such systems,
"pci=nommconf" is required in the boot command.


That sounds like this patchset can cause bad side effects on hardware
that currently works just fine.  That is not a good thing to be adding
to the kernel, right?


I think we need more details on why this patch is needed. Also, we 
already have something like this in arch/x86/pci/mmconfig-shared.c, in 
the unreachable_devices function. This attempts to detect devices where 
MMCONFIG cannot access the configuration space (one of these would be at 
least one device in the AMD K8 built-in northbridge). If this is not 
sufficient, I would suggest expanding that mechanism instead of adding 
all this new code.
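
For reference, the two access mechanisms being compared address the same
register in different ways. The following is a minimal sketch in C, assuming
the standard legacy CONFIG_ADDRESS encoding (written to port 0xCF8) and the
standard MMCONFIG layout of 4 KiB per function. The helper names are
hypothetical; they are not taken from the patch set or from
mmconfig-shared.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: value written to port 0xCF8 for a legacy
 * config access. Only registers 0x00..0xff are reachable this way. */
uint32_t legacy_cf8_address(unsigned bus, unsigned dev, unsigned fn,
			    unsigned reg)
{
	assert(bus < 256 && dev < 32 && fn < 8 && reg < 256);
	return 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) |
	       (reg & 0xfcu);
}

/* Hypothetical helper: byte offset of the same register inside the
 * MMCONFIG aperture. Each function gets 4 KiB, so the extended
 * registers 0x100..0xfff are reachable as well. */
uint64_t mmconfig_offset(unsigned bus, unsigned dev, unsigned fn,
			 unsigned reg)
{
	assert(bus < 256 && dev < 32 && fn < 8 && reg < 4096);
	return ((uint64_t)bus << 20) | (dev << 15) | (fn << 12) | reg;
}
```

The vendor/device dword that the patch set compares is register 0, so for a
given bus/device/function both helpers are called with reg = 0 and the two
reads are expected to return the same value.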


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: [RFC/PATCH] 2.6.24-rcx: Make sys_poll() wait at least timeout ms

2007-12-18 Thread Robert Hancock

Karsten Wiese wrote:

On Wednesday, 19 December 2007, Robert Hancock wrote:
That seems fishy. What is your value of HZ and what is the timeout value 
that was passed in the bad case?


HZ set to 250, timeout to 4ms.
Time spent in poll(), as measured by clock_gettime(CLOCK_MONOTONIC, &time)
before and after the poll() call: i.e. 62us.
Time measured with hpet gave 166us once.


msecs_to_jiffies (kernel/time.c) has this:

#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ)
/*
 * HZ is equal to or smaller than 1000, and 1000 is a nice
 * round multiple of HZ, divide with the factor between them,
 * but round upwards:
 */
return (m + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ);

With HZ=250 and m=4 this gives 7/4, or only 1 jiffy. That is at most 
4ms, but if we are already near the end of the current jiffy it 
could be much less than that (potentially almost no time at all).
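
The arithmetic can be checked with a small standalone sketch of the round-up
branch quoted above. The function name is hypothetical, and hz is passed as a
runtime parameter purely for illustration; the kernel uses the compile-time
constant HZ.

```c
#include <assert.h>

#define MSEC_PER_SEC 1000u

/* Sketch of the round-up branch only: valid when hz divides 1000. */
unsigned long msecs_to_jiffies_sketch(unsigned int m, unsigned int hz)
{
	assert(hz > 0 && hz <= MSEC_PER_SEC && MSEC_PER_SEC % hz == 0);

	unsigned int ms_per_jiffy = MSEC_PER_SEC / hz;

	/* Integer division after adding (ms_per_jiffy - 1) rounds up. */
	return (m + ms_per_jiffy - 1) / ms_per_jiffy;
}
```

With hz = 250 a jiffy is 4ms, so m = 4 yields 1 jiffy and m = 5 yields 2.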


Maybe we could convert poll to use a hrtimer for this instead?

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: Out of memory and no killable processes: 2.6.22-2-686-bigmem

2007-12-18 Thread Robert Hancock

Nico Schottelius wrote:

Hello!

We are running Debian with 2.6.22-2-686-bigmem on Dell Blade 1955 hardware
and get a Kernel Panic with oom + message that there are no processes
left to kill:

http://home.schottelius.org/~nico/unix/linux/oom_no_killable-2.6.22-1.jpeg

Does anyone have an idea what the cause is? This error happened on two of
those machines.

What I can see in our analysis done with munin is that the number of
open inodes and inode table size decreased within some days from 40k
to next to zero. Munin uses

   awk '{print "used.value " $1-$2 "\nmax.value " $1}' < /proc/sys/fs/inode-nr

to log those values (happened on both machines).

Thanks for any hint and CC as usual, please.


How much RAM is in these machines? If you're running tons of memory, it 
really is better to run a 64-bit kernel if possible. I believe there are 
some cases where low memory can be pretty easily exhausted on machines 
with lots of high memory.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: PCI resource problems caused by improper address rounding

2007-12-18 Thread Robert Hancock

Linus Torvalds wrote:


On Mon, 17 Dec 2007, Chuck Ebbert wrote:

Looks like a commit that I can't find in git due to the arch merge
has broken PCI address assignment. This patch by Richard Henderson
against 2.6.23 fixes it for x86_64:

--- linux-2.6.23.x86_64/arch/x86_64/kernel/e820.c   2007-10-09 13:31:38.0 -0700
+++ linux-2.6.23.x86_64-rth/arch/x86_64/kernel/e820.c   2007-12-15 12:37:44.0 -0800
@@ -718,8 +718,8 @@ __init void e820_setup_gap(void)
 	while ((gapsize >> 4) > round)
 		round += round;
 	/* Fun with two's complement */
-	pci_mem_start = (gapstart + round) & -round;
+	pci_mem_start = (gapstart + round - 1) & -round;


No, it's very much meant to be that way.

We do *not* want to have the PCI memory abut the end of memory exactly. So 
it leaves a gap in between "gapstart" and the actual start of PCI memory 
addressing very much on purpose.


In fact, the very commit (it's f0eca9626c6becb6fc56106b2e4287c6c784af3d in 
the kernel tree) you mention actually explicitly *explains* that, although 
maybe it's a bit indirect: if you start allocating PCI resources directly 
after the end-of-RAM thing, you can easily end up using addresses that are 
actually inside the magic stolen system RAM that is being used for UMA 
video etc.


So you very much want to have a buffer in between the end-of-RAM and the 
actual start of the region we try to allocate in. 

So why do you want them to be close, anyway? 


Linus

PS. On a different topic: if you do

git log --follow arch/x86/kernel/e820_64.c

you'd see the history past the renames in git. Or just do a "git blame -C" 
which will also follow renames (and copies).


That patch is from the 2.6.14 era - I don't think we even did PnP ACPI 
resource reservation handling then? It could be that the BIOS was trying 
to tell us that the UMA memory region is reserved using PnP ACPI 
reservations, but we just ignored it.


It seems rather arbitrary in how much it leaves unused - and in this 
case, likely prevents us from using the nice big open gap that the BIOS 
presumably expected the graphics card to be mapped into.
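
To make the difference between the two lines in the quoted hunk concrete,
here is a minimal standalone sketch. The function names are hypothetical, and
the starting granule min_round is a parameter chosen for illustration because
the quoted hunk does not show the initial value of round.

```c
#include <assert.h>
#include <stdint.h>

/* Grow the rounding granule as in the quoted loop. min_round must be
 * a power of two so that -round works as an alignment mask. */
uint64_t gap_round(uint64_t gapsize, uint64_t min_round)
{
	assert(min_round != 0 && (min_round & (min_round - 1)) == 0);

	uint64_t round = min_round;

	while ((gapsize >> 4) > round)
		round += round;
	return round;
}

/* Existing line: always moves past gapstart, even if it is already
 * aligned, so a buffer of up to one granule is left unused. */
uint64_t pci_mem_start_current(uint64_t gapstart, uint64_t round)
{
	return (gapstart + round) & -round;
}

/* Patched line: plain round-up, so an aligned gapstart is kept. */
uint64_t pci_mem_start_patched(uint64_t gapstart, uint64_t round)
{
	return (gapstart + round - 1) & -round;
}
```

For an already aligned gapstart the existing line skips a whole granule while
the patched line does not; for an unaligned gapstart both give the same
result.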


I suspect this buffer space insertion is really not needed at this 
point. The patch description is likely technically correct in that the 
BIOS should have reserved it in E820, but (according to MS comments in a 
presentation I read) Windows doesn't use E820 for anything other than 
figuring out where RAM is; it uses PnP ACPI for figuring out areas it 
needs to avoid. Since BIOS writers test against that behavior, there are 
surely lots of systems where ignoring PnP ACPI reservations and relying 
only on E820 would result in things really going blammo (like mapping 
things over MMCONFIG tables, for instance). So disabling PnP ACPI 
reservation handling on modern machines is really not an option. And if 
it's enabled, you likely wouldn't hit the problem the buffer tries to fix.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: PCI resource problems caused by improper address rounding

2007-12-18 Thread Robert Hancock

Linus Torvalds wrote:


On Tue, 18 Dec 2007, Chuck Ebbert wrote:


On 12/18/2007 04:09 PM, Linus Torvalds wrote:
I wonder what the heck is the point of that pnp entry. Just for fun, can 
you try to just disable CONFIG_PNP, and see if it all works then?

pnpacpi=off should work.

PnP is also trying (and failing) to reserve all physical memory.


Yeah, that really is a pretty confused-looking pnp table thing. But I have 
absolutely zero idea how PnP is even supposed to work - the whole thing is 
just a total hack for Windows, afaik.


The sad part is that *normally* the right thing to do about almost any 
BIOS information is what we do right now: just avoid that magic address 
range like the plague, because we have no clue what the heck the BIOS is 
up to. But it looks like in this particular case, some of the problems 
may arise exactly *because* we avoid that range.


It would be good to know what Windows does. If ACPI is found, does it 
perhaps just ignore all the PnP entries these days?


Linus


ACPI is where those PnP entries are coming from (on any modern system 
anyway). They do show up in Device Manager as devices with resources 
(the one that reserves all of system RAM on my machine is labeled 
"System board", others like the one that reserves the MMCONFIG aperture 
are "Motherboard resources" - the name is based on the PNP device ID, I 
believe).


It could be that Windows is stupid enough that it will map things over 
top of physical RAM if the BIOS doesn't explicitly reserve it like that. 
I suspect, based on some comments in Microsoft documents, that Windows 
uses the E820 table to figure out where the RAM is, and ACPI/PnP 
information to figure out where IO mappings are, but may not really 
combine those two pieces of information into one overall map like Linux 
does, which would explain why the BIOS needs to reserve all physical RAM.


(As mentioned in another post, I would guess the BIOS is reserving that 
memory range since it's the MMCONFIG aperture.)


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: PCI resource problems caused by improper address rounding

2007-12-18 Thread Robert Hancock

Linus Torvalds wrote:


On Tue, 18 Dec 2007, Richard Henderson wrote:

I've added dmesg, /proc/iomem, and lspci -v output to that bug.

Basically, we have

c000-cfff : free
ddf0-dfef : PCI Bus #04
e000-efff : pnp 00:0b
f000-fedf : less than 256MB


Gaah. 

That really is very unlucky. That 256M only goes at one point in the low 
4GB, but the thing is, it fits perfectly well above it, and dammit, that 
resource is explicitly a 64-bit resource for a really good reason. 


However, I wonder about that

e000-efff : pnp 00:0b

thing. I actually suspect that that whole allocation is literally *meant* 
for that 256MB graphics aperture, but the kernel explicitly avoids it 
because it's listed in the PnP tables.


That is probably the MMCONFIG aperture; in that case any attempt to map 
the graphics BAR there will have disastrous results. (This BIOS has an 
MCFG table, though it looks like this Fedora kernel has MMCONFIG 
disabled, so we can't tell what it actually contains.)
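
The check that keeps a BAR out of such a reservation is a plain range-overlap
test. A minimal sketch follows, with hypothetical type and function names;
this is not the kernel's resource allocator, only an illustration of the
collision being described.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, in the style of /proc/iomem lines. */
struct range {
	uint64_t start;
	uint64_t end;
};

bool ranges_overlap(struct range a, struct range b)
{
	assert(a.start <= a.end && b.start <= b.end);
	return a.start <= b.end && b.start <= a.end;
}

/* True if a BAR of the given size placed at base would collide with
 * any of the n reserved ranges. */
bool bar_hits_reserved(uint64_t base, uint64_t size,
		       const struct range *reserved, int n)
{
	assert(size > 0);

	struct range bar = { base, base + size - 1 };

	for (int i = 0; i < n; i++)
		if (ranges_overlap(bar, reserved[i]))
			return true;
	return false;
}
```

As a usage example with assumed addresses: if a 256MB reservation covered
0xe0000000-0xefffffff, a 256MB BAR placed at 0xe0000000 or at 0xd8000000
would collide with it, while one placed at 0xd0000000 would not.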


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: Memory Read Error

2007-12-18 Thread Robert Hancock

shashi59 wrote:

I am a newbie to the Linux kernel. How can I read a memory area like the range
between  to ? When I read that area directly, it shows an error
like this: "unable to handle kernel paging request at virtual address
". I don't know how to solve this error. Please, can anyone help
me?


First off, why are you trying to do this, and how? Without such details 
it's impossible to answer this question.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: [RFC/PATCH] 2.6.24-rcx: Make sys_poll() wait at least timeout ms

2007-12-18 Thread Robert Hancock

Karsten Wiese wrote:

Hi,

while playing with jackd on 2.6.24-rcx, I found poll() timing out too early.
That is: earlier than its timeout argument specified.
Setting poll()'s timeout argument to "required timeout" + "1 jiffy in ms"
fixed it. Patch below should fix it too. Correct?
Untested.
Otherwise 2.6.24-rc5 ticks just fine here, thanks.

  Karsten
 
->

Make sys_poll() wait at least timeout ms

schedule_timeout(jiffies) waits for at least jiffies - 1.
Add 1 jiffy to the timeout_jiffies calculated in sys_poll() to wait at least
timeout_msecs, like the poll() manpage says.

Signed-off-by: Karsten Wiese <[EMAIL PROTECTED]>
---
 fs/select.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index 47f4792..5633fe9 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -739,7 +739,7 @@ asmlinkage long sys_poll(struct pollfd __user *ufds, unsigned int nfds,
 			timeout_jiffies = -1;
 		else
 #endif
-			timeout_jiffies = msecs_to_jiffies(timeout_msecs);
+			timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;
 	} else {
 		/* Infinite (< 0) or no (0) timeout */
 		timeout_jiffies = timeout_msecs;


That seems fishy. What is your value of HZ and what is the timeout value 
that was passed in the bad case?
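
The patch description's claim about schedule_timeout() can be put into
numbers under an idealized model: a periodic tick, a caller that may start
anywhere within the current tick, and no other latency. The sketch below
uses hypothetical names and is only an illustration of that model, not of
the kernel's timer code.

```c
#include <assert.h>

/* Bounds, in microseconds, on the real time covered by a timeout of
 * n jiffies under the idealized model described above. */
struct wait_bounds_us {
	unsigned long min_us;	/* caller started just before a tick */
	unsigned long max_us;	/* caller started just after a tick */
};

struct wait_bounds_us jiffies_wait_bounds(unsigned long n, unsigned int hz)
{
	assert(n > 0 && hz > 0);

	unsigned long tick_us = 1000000ul / hz;
	struct wait_bounds_us b = { (n - 1) * tick_us, n * tick_us };

	return b;
}
```

In this model a 1-jiffy timeout at hz = 250 covers anywhere from almost
nothing up to 4000us, while a 2-jiffy timeout covers at least 4000us.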


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: [PATCH] x86_64: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.

2007-12-16 Thread Robert Hancock

Ingo Molnar wrote:

* H. Peter Anvin <[EMAIL PROTECTED]> wrote:


Pavel Machek wrote:

this is also something for v2.6.24 merging.

As much as I like this patch, I do not think it is suitable for
.24. Too risky, I'd say.

No kidding!  We're talking about removing a hack that has been 
successful on thousands of pieces of hardware over 15 years because it 

 ^[*]

breaks ONE machine.


[*] "- none of which needs it anymore -"

there, fixed it for you ;-)

So let's keep this in perspective: this is a hack that only helps on a 
very low number of systems. (the PIT of one PII era chipset is known to 
be affected)


unfortunately this hack's side-effects are mis-used by an unknown number 
of drivers to mask PCI posting bugs. We want to figure out those bugs 
(safely and carefully) and we want to remove this hack from modern 
machines that don't need it. Doing anything else would be superstition.


Are there any known examples of such drivers? It doesn't seem to 
make much sense. PCI IO writes are not posted on any known system (the 
spec allows them to be posted in the host bus bridge, but if they were 
they could only be flushed by a read, not a write), and PCI MMIO writes 
are only guaranteed to be flushed by a read from that device, not by 
other random port accesses. I suppose using the _p versions of port 
accesses might happen to mask such problems on certain machines.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [PATCH] x86_64: fix problems due to use of "outb" to port 80 on some AMD64x2 laptops, etc.

2007-12-16 Thread Robert Hancock

David P. Reed wrote:
PS: If I have time, I may try to build Rene's port 80 test for Windows 
and run it under WinXP on this machine (I still have a crappy little 
partition that boots it).   If it freezes the same way, it's almost 
certain a design "feature", and if it doesn't freeze, we might suspect 
that there is compensating logic in either Windows ACPI code or some way 
that windows "sets up" the machine.


You'd have to replace the iopl call with an equivalent one for Windows 
(seems like NtSetInformationProcess(ProcessUserModeIOPL) might do what 
you need).


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: New question on that sata controller

2007-12-15 Thread Robert Hancock

Gene Heskett wrote:

Greetings;

When I asked about a sata controller earlier this week, I gave a link to it.  
Unforch (maybe) when it actually arrived, the card's box showed a Silicon 
Image chip, and the card had a VIA.  So much for getting what I ordered...


The required module then was sata_via, not sata_uli, and it seems to be 
working ok.  However, this one claims it's a RAID controller according to an 
lspci -v:


01:0a.0 RAID bus controller: VIA Technologies, Inc. VT6421 IDE RAID Controller (rev 50)

Subsystem: VIA Technologies, Inc. VT6421 IDE RAID Controller
Flags: bus master, medium devsel, latency 32, IRQ 19
I/O ports at 9400 [size=16]
I/O ports at 9800 [size=16]
I/O ports at 9c00 [size=16]
I/O ports at a000 [size=16]
I/O ports at a400 [size=32]
I/O ports at a800 [size=256]
[virtual] Expansion ROM at e900 [disabled] [size=64K]
Capabilities: [e0] Power Management version 2

I just noted that the Expansion ROM is disabled, but I didn't see any jumpers 
to enable it on the card prior to installing it.  Does anyone know how this 
is supposed to work?  I would like to make it directly bootable but I believe 
this has to be 'enabled' for that.


It's usually normal for it to be disabled after boot, I believe. Are you 
getting anything showing up on boot indicating its BIOS is active?




I cannot find any references to this particular chip in a 'make xconfig' for 
2.6.24-rc5.


Should this be a concern, or is this one a 'Just Works(TM)' chipset?  This 
card has 3 sata port connectors and one ide fitted.


Two rather pleasant side effects of going to the Biostar.tw site and finding a 
newer bios and installing it on an M7NCD Pro mobo are:


1: FSB now running at 400MHZ, was 333 before as it was not at all stable at 
400 and I have been told the XP-2800 Athlon only supports 333 and AMD's site 
agrees.


2: CPU temps are down around 13F.  CPU speed still the same at 2079MHZ 
according to dmesg.


The reduced temps at a higher FSB indicates better interface timing, and if it 
runs the rest of the night at 400 without a self reboot or crash, I'll leave 
it there.





--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Could not set non-blocking flag with 2.6.24-rc5

2007-12-14 Thread Robert Hancock

Tino Keitel wrote:

Hi folks,

I often build Debian packages inside a chroot. Today I discovered a
failure during an "aptitude update", which is a command to download new
package lists for the package management. In strace, the lines around
the failure look like this:

99% [Working]) = 14 14
[pid  5986] select(6, [3 4 5], [], NULL, {0, 50}) = 0 (Timeout)
[pid  5986] gettimeofday({1197576353, 670510}, NULL) = 0
[pid  5986] rt_sigprocmask(SIG_BLOCK, [WINCH], [], 8) = 0
[pid  5986] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
99% [Working]) = 14 14
[pid  5986] select(6, [3 4 5], [], NULL, {0, 50}) = 0 (Timeout)
[pid  5986] gettimeofday({1197576354, 173902}, NULL) = 0
[pid  5986] rt_sigprocmask(SIG_BLOCK, [WINCH], [], 8) = 0
[pid  5986] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
99% [Working]) = 14 14
[pid  5986] select(6, [3 4 5], [], NULL, {0, 50} <unfinished ...>
[pid  5988] <... select resumed> )  = 1 (in [3], left {105, 0})
[pid  5988] read(3, "", 56559)  = 0
[pid  5988] fcntl64(-1, F_GETFL)= -1 EBADF (Bad file descriptor)
[pid  5988] fcntl64(-1, F_SETFL, O_ACCMODE|O_CREAT|O_EXCL|O_NOCTTY|O_TRUNC|O_APPEND|O_SYNC|O_ASYNC|O_DIRECT|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW|O_NOATIME|0xfff8003c) = -1 EBADF (Bad file descriptor)
[pid  5988] write(2, ""..., 41FATAL -> Could not set non-blocking flag
) = 41
[pid  5988] write(2, ""..., 19Bad file descriptor) = 19
[pid  5988] write(2, ""..., 1
)  = 1
[pid  5988] exit_group(100) = ?
Process 5988 detached

This happened with a kernel after 2.6.24-rc5
(4af75653031c6d454b4ace47c1536f0d2e727e3e). I rebooted into 2.6.23.8
and it worked. Now I rebooted into 2.6.24-rc5 again and was able to
reproduce the failure, so it looks like a kernel issue to me.


Going by this part of the strace output, it looks like an obvious userspace 
bug (calling fcntl on a -1 file descriptor). It could be that some other 
change in behavior or a timing difference is triggering the bug, however.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Strange ATA problems

2007-12-14 Thread Robert Hancock

Tejun Heo wrote:

Dec 14 01:06:33 fermat kernel: ata1: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
Dec 14 01:06:33 fermat kernel: ata1: CPB 0: ctl_flags 0x1f, resp_flags 0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 1: ctl_flags 0x1f, resp_flags 0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 2: ctl_flags 0x1f, resp_flags 0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 3: ctl_flags 0x1f, resp_flags 0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 4: ctl_flags 0x1f, resp_flags 0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 5: ctl_flags 0x1f, resp_flags 0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 6: ctl_flags 0x1f, resp_flags 0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 7: ctl_flags 0x1f, resp_flags 0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 8: ctl_flags 0x1f, resp_flags 0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 9: ctl_flags 0x1f, resp_flags 0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 10: ctl_flags 0x1f, resp_flags 0x2


CPB flags stuck at 0x2 indicates that the controller issued the command 
to the drive and is waiting for completion. Usually seems to indicate 
some kind of SATA communication problem.



If your USB cdrom is bus powered and you yanked it, it could have caused
fluctuation in power which in turn can cause disruption on serial ATA
bus leading to transmission error and timeouts.  There are other
possibilities but this kind of thing does happen often with SATA.  Those
highspeed low-voltage serial links are very susceptible to interferences.

Well,.. it actually "worked" again when I unplugged it, but the errors
from the cdrom above are probably unrelated..



As long as EH recovered it properly, there's nothing to worry about.

What does that mean?


That means unless the problem continues to occur repeatedly, you don't
have to worry about it.



Yes, if it didn't recur, it was likely just a transient glitch.

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: ATA ACPI needs "Mr interpreter, would you please shut up?" flag

2007-12-13 Thread Robert Hancock

Tejun Heo wrote:

Hello, all.

During 2.6.24-rc1, libata enabled ATA-ACPI support by default and there
have been a lot of regression reports stemming from it.  I have patchset
ready to fix most of the problems.  With these patches applied, libata
should be able to cope with most failures pretty well.  There is one
remaining issue tho.

libata caches the result of _GTM during controller for later use.  The
primary use is to peek at how BIOS configured the controller.  Some
controllers (pata_via and pata_amd) lack proper cable detection and BIOS
configured values are used as reference.  This caching is done before
any other operation is performed on the port to avoid caching corrupted
data.

Problem is that _GTM implementation on certain BIOSen crap themselves if
invoked on empty channels.  However, as written above, because initial
_GTM caching is done before any actual operation is performed on the
port, libata can't determine whether the port is occupied or not when
trying to cache _GTM result.  Unfortunately, VIA PATA is on both
categories - it needs _GTM caching but can't cope with _GTM invocation
on empty ports.  Yay!


I seem to have lost the thread/bug report where we decided that one 
board always choked on an empty channel. Maybe it's not that and it's 
just another case of the same issue where our resetting default timing 
values on the controller before calling _GTM would choke the _GTM method?


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/




Re: PCI resource unavailable on mips

2007-12-13 Thread Robert Hancock

Jon Dufresne wrote:

Hi,

I've done a bit of linux driver development on x86 in the past.
Currently I am working on my first ever linux driver for a mips box. I
started by testing the device in an x86 box and got it reasonably stable
and am now testing it in the mips box. There appears to be a major
problem, one unlike I have ever seen before.

My PCI device has three BARS. This can be confirmed by the Technical
documentation and the x86 code. When the pci device is first probed, I
run a loop to printk out the bar information, this is just as a sanity
check. Here is the output on the x86:

Bar0:PHYS=e000 LEN=0400
Bar1:PHYS=efa0 LEN=0020
Bar2:PHYS=e800 LEN=0400


So, two 64MB BARs and a 2MB one?



but here is the output on the mips:
Bar0:PHYS=2000 LEN=0400
Bar1:PHYS=2400 LEN=0020
Bar2:PHYS= LEN=

notice, BAR2 has no valid information on the mips. I tried to run
"pci_enable_device" before printing this information, as suggested by
LDD but it did not help.

Has anyone seen a problem like this before and any idea how I can get
BAR2 a proper address?

If I examine the config space directly there is an address in BAR2's
register, however it isn't in the 0x2000 range like the other two,
instead it is 0x1c00. Also if I do a ``cat /proc/iomem'' I correctly
see BAR0 and BAR1 in the output, but not BAR2.


Any PCI resource allocation errors in dmesg during the boot process? 
Could be the kernel wasn't able to find a place to map all of the BARs.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Possible issue with dangling PCI BARs

2007-12-12 Thread Robert Hancock

Benjamin Herrenschmidt wrote:

On Thu, 2007-12-13 at 14:05 +1100, Benjamin Herrenschmidt wrote:

On Thu, 2007-12-13 at 14:00 +1100, Benjamin Herrenschmidt wrote:

 .../...

(oops, sent too fast)


So not only we can have a dangling BAR, but nothing prevent us to
actually go turn IO or MEM decoding on in case it wasn't already the
case on that device.

And I was about to say before I clicked "send".. can't we do something like
writing all ff's into the BAR at the same time as we clear res->start ? Isn't
that supposed to pretty much disable decoding on that BAR ? Or not... Probably
still better than leaving it to whatever dangling value it had no ?


Ok, reading some other threads, it seems that writing all ff's will not
be a very good alternative on x86 machines where MMCONFIG sits up
there...

I suppose there is nothing totally safe that can be done, thanks to
Intel not thinking about making BARs individually enable/disable'able
(or size-able without interrupting access, among other numerous fuckups
in the PCI spec).

So if a BAR is left dangling, I think we -must- disable MEM and IO
decoding on the whole device. In fact, the whole trick of passing a
bitmask of required BARs to pci_enable_device_bars() in the first place
doesn't fly.

Yuck.


We could do a bit better than that - a common use case with 
pci_enable_device_bars would be where the device has some IO space that 
we don't care about because we only want to use MMIO space. If we only 
want to enable MMIO BARs then we don't need to enable IO decoding, and 
in that case it doesn't matter if we failed to find space for the IO 
space and it overlaps something else.


It looks like we already handle the "not enabling IO decoding" part in 
this case, except that it doesn't look like we would ever disable the 
decoding if it was already enabled.


For the case where you say "I want to enable decoding for this MMIO BAR, 
but not that one", though, I don't see an obvious way to provide that 
guarantee with certainty. Normally, one would expect that if a BAR is 
mapped safely outside the decode window of a PCI bridge it's behind, 
that it won't ever see the requests and can't respond to them. However, 
the Intel chipset MMCONFIG overlap fiasco appears to show that this is 
not always the case and in some cases the device can see and respond to 
requests outside of the bridge's decode window (with higher decode 
priority than the MMCONFIG aperture, even).


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: mmaping an IO port device

2007-12-12 Thread Robert Hancock

Aras Vaichas wrote:

Hi,

Can I implement mmap with an io port connected device on an x86 based CPU?


Background:

I've got a device driver which can be compiled for either x86 or ARM. 
The driver provides an interface to an FPGA via either an IO port 
(0x180) on the x86 or as a memory mapped SRAM-like device (0x3000) 
on the ARM.


To get myself an "address" for ioread calls I use:

FPGA_base = (u32) ioremap_nocache(FPGA_REG_IO_BASE, SZ_4K) for both CPU 
types.


FPGA_REG_IO_BASE is set to either 0x180 or 0x3000 for x86 and ARM 
respectively.


I then call ioread16(FPGA_base + FPGA_register) for both x86 and ARM and 
it all works perfectly. No problems there.


My problem is that I am now moving from ioctl calls to a mmap interface. 
This isn't a problem with ARM as I can pass (0x3000 >> PAGE_SHIFT) 
to remap_pfn_range() in the .mmap fops function but I can't pass 0x180 
because ... well, it's obvious.


Is there a trick?

Aras


It's impossible to mmap an IO port area on x86 since IO ports are not 
accessible as part of the normal memory space. The only way to get 
access to IO ports in userspace is to use iopl (which requires root 
privileges) and then execute inl/outl, etc. instructions directly.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Possible issue with dangling PCI BARs

2007-12-12 Thread Robert Hancock

Benjamin Herrenschmidt wrote:

On Thu, 2007-12-13 at 14:05 +1100, Benjamin Herrenschmidt wrote:

On Thu, 2007-12-13 at 14:00 +1100, Benjamin Herrenschmidt wrote:

 .../...

(oops, sent too fast)


So not only can we have a dangling BAR, but nothing prevents us from
actually turning IO or MEM decoding on in case it wasn't already enabled
on that device.

And I was about to say before I clicked send... can't we do something like
writing all ff's into the BAR at the same time as we clear res->start? Isn't
that supposed to pretty much disable decoding on that BAR? Or not... Probably
still better than leaving it at whatever dangling value it had, no?


Ok, reading some other threads, it seems that writing all ff's will not
be a very good alternative on x86 machines where MMCONFIG sits up
there...

I suppose there is nothing totally safe that can be done, thanks to
Intel not thinking about making BARs individually enable/disable'able
(or size-able without interrupting access, among other numerous fuckups
in the PCI spec).

So if a BAR is left dangling, I think we -must- disable MEM and IO
decoding on the whole device. In fact, the whole trick of passing a
bitmask of required BARs to pci_enable_device_bars() in the first place
doesn't fly.

Yuck.


We could do a bit better than that - a common use case with 
pci_enable_device_bars would be where the device has some IO space that 
we don't care about because we only want to use MMIO space. If we only 
want to enable MMIO BARs then we don't need to enable IO decoding, and 
in that case it doesn't matter if we failed to find space for the IO 
space and it overlaps something else.


It looks like we already avoid enabling IO decoding in this case. What 
we don't appear to do is ever disable the decoding if it was already 
enabled.
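
The MMIO-only case described here can be modelled roughly as follows. This is a simplified sketch, not the kernel's implementation: the function name and the layout of the BAR list are invented for illustration. Only the two command register bit values (IO space enable = 0x1, memory space enable = 0x2) are taken from the PCI specification.

```python
# Simplified model; command_bits_for() and the BAR tuples are hypothetical.
PCI_COMMAND_IO = 0x1      # enable response to IO space accesses
PCI_COMMAND_MEMORY = 0x2  # enable response to memory space accesses

def command_bits_for(bars, mask):
    """Return the decode bits needed for the BARs selected by `mask`.

    `bars` is a list of (kind, assigned) tuples, kind being "io" or "mem".
    Raises ValueError if a selected BAR has no valid address assigned.
    """
    cmd = 0
    for i, (kind, assigned) in enumerate(bars):
        if not mask & (1 << i):
            continue  # the driver did not ask for this BAR
        if not assigned:
            raise ValueError(f"BAR {i} requested but not assigned")
        cmd |= PCI_COMMAND_IO if kind == "io" else PCI_COMMAND_MEMORY
    return cmd

# Device with a dangling IO BAR 0 and a properly assigned MMIO BAR 1.
bars = [("io", False), ("mem", True)]
print(command_bits_for(bars, 0b10))  # 2: memory decoding only
```

With only BAR 1 requested, IO decoding is never turned on, so the unassigned IO BAR does no harm. The model does not cover the remaining gap: if firmware already left IO decoding on, something would have to clear that bit explicitly.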


For the case where you say "I want to enable decoding for this MMIO BAR, 
but not that one", though, I don't see an obvious way to provide that 
guarantee with certainty. Normally, one would expect that if a BAR is 
mapped safely outside the decode window of the PCI bridge it's behind, 
it won't ever see the requests and can't respond to them. However, 
the Intel chipset MMCONFIG overlap fiasco appears to show that this is 
not always the case, and in some cases the device can see and respond to 
requests outside of the bridge's decode window (with higher decode 
priority than the MMCONFIG aperture, even).


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: Iomega ZIP-100 drive unsupported with jmicron JMB361 chip?

2007-12-10 Thread Robert Hancock

(linux-ide cc'ed)

trash can wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I have tolerated this problem for a year and do not post to this list in
haste. I have posted on forums and searched the community over the past
year. I have looked at the list archive on gossamer-threads.com for
solutions. With Fedora Core 6 unsupported (the last kernel for which my
zip drive worked), it is time for my last attempt at a solution. Please
CC: any response as I have not joined the list. I have compiled a
kernel-debug RPM and can run this if its output would help. Thank you
for any time you might devote to this problem.

motherboard: MSI P965 Platinum/Intel P965 Express Chipset Based (MS-7238
series)
Fedora 8 : kernel 2.6.23.1-42.fc8
Iomega Zip drive internal Model Z100ATAPI

lspci
03:00.0 SATA controller: JMicron Technologies, Inc. JMB361 AHCI/IDE (rev 02)
03:00.1 IDE interface: JMicron Technologies, Inc. JMB361 AHCI/IDE (rev 02)

# lsmod | grep ata
pata_jmicron            8257  0
ata_generic             8901  0
ata_piix               16709  0
libata                 99633  4 ahci,pata_jmicron,ata_generic,ata_piix
scsi_mod              119757  4 sr_mod,sg,libata,sd_mod

I have recently changed the BIOS setting for the SATA#1 Controller from
[IDE] to [AHCI] with no effect. I assume AHCI is correct?


AHCI is better, yes. It shouldn't be relevant to this problem, though.



Text below attached as text.txt for readability.
from dmesg:
libata version 2.21 loaded.
device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: [EMAIL PROTECTED]
PCI: Enabling device :03:00.1 ( -> 0001)
ACPI: PCI Interrupt :03:00.1[B] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device :03:00.1 to 64
scsi0 : pata_jmicron
scsi1 : pata_jmicron
ata1: PATA max UDMA/100 cmd 0x0001cc00 ctl 0x0001c882 bmdma 0x0001c400 irq 17
ata2: PATA max UDMA/100 cmd 0x0001c800 ctl 0x0001c482 bmdma 0x0001c408 irq 17
ata1.00: ATAPI: LITE-ON DVDRW SOHW-1693S, KS0B, max UDMA/66
ata1.01: ATAPI: IOMEGA  ZIP 100   ATAPI, 05.H, max MWDMA1, CDB intr
ata1.00: configured for UDMA/66
ata1.01: configured for MWDMA1
scsi 0:0:0:0: CD-ROM LITE-ON  DVDRW SOHW-1693S KS0B PQ: 0 ANSI: 5
scsi 0:0:1:0: Direct-Access IOMEGA   ZIP 100  05.H PQ: 0 ANSI: 5
sd 0:0:1:0: [sda] 196608 512-byte hardware sectors (101 MB)
sd 0:0:1:0: [sda] Write Protect is off
sd 0:0:1:0: [sda] Mode Sense: 00 40 00 00
sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:1:0: [sda] 196608 512-byte hardware sectors (101 MB)
sd 0:0:1:0: [sda] Write Protect is off
sd 0:0:1:0: [sda] Mode Sense: 00 40 00 00
sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda:<6>sd 0:0:1:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
sd 0:0:1:0: [sda] Sense Key : Hardware Error [current]
sd 0:0:1:0: [sda] Add. Sense: Scsi parity error
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0

If a disk is inserted into the drive (/var/log/messages)
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Spinning up disk.<5>sd 0:0:1:0: [sda] Spinning up diskready
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] 196608 512-byte hardware sectors (101 MB)
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Write Protect is off
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] 196608 512-byte hardware sectors (101 MB)
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Write Protect is off
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 10 14:22:53 localhost kernel:  sda:<6>sd 0:0:1:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Sense Key : Hardware Error [current]
Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Add. Sense: Scsi parity error
Dec 10 14:22:53 localhost kernel: end_request: I/O error, dev sda, sector 0
Dec 10 14:22:53 localhost kernel: printk: 42 messages suppressed.
Dec 10 14:22:53 localhost kernel: Buffer I/O error on device sda, logical block 0


That is rather curious. There's no sign of any libata error handling 
going on. Maybe the drive is actually returning that error code in its 
ATAPI sense data, or at least we think it is?
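
For reference, the "Hardware Error" / "Scsi parity error" pair in the log corresponds to sense key 0x4 with additional sense code 0x47/0x00. Below is a minimal sketch of how fixed-format sense data decodes to that. The sample buffer is constructed by hand to match the log text (it was not captured from this drive), and the lookup tables are deliberately tiny.

```python
# Sketch of fixed-format SCSI sense decoding. The sample buffer is built by
# hand to match the log text; it is an assumption, not data from the drive.
SENSE_KEYS = {0x0: "No Sense", 0x3: "Medium Error",
              0x4: "Hardware Error", 0x5: "Illegal Request"}
ASC_ASCQ = {(0x47, 0x00): "SCSI parity error"}

def decode_fixed_sense(buf):
    """Return (sense key, ASC, ASCQ) from fixed-format sense data."""
    response_code = buf[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = buf[2] & 0x0F           # sense key: low nibble of byte 2
    asc, ascq = buf[12], buf[13]  # additional sense code and qualifier
    return key, asc, ascq

sample = bytes([0x70, 0, 0x04, 0, 0, 0, 0, 10,
                0, 0, 0, 0, 0x47, 0x00, 0, 0, 0, 0])
key, asc, ascq = decode_fixed_sense(sample)
print(SENSE_KEYS[key], "/", ASC_ASCQ[(asc, ascq)])
```

If the sense bytes really do come from the drive in this form, the question becomes why the drive reports a parity error, rather than why libata did.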


You are sure that this drive still works with older kernels using 
drivers/ide, and that the hardware didn't break at some point, I assume?


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23

2007-12-09 Thread Robert Hancock

Tejun Heo wrote:

Robert Hancock wrote:

And you're quite right in your comment that we are often too quick to
blacklist hardware instead of looking into why it really is failing.
ACPI is one of those areas where we often just need to figure out how to
be bug-to-bug compatible with what Windows is doing.


In the spirit of not blacklisting without looking deep into ACPI code,
can somebody familiar with ASL take a look at comment 11 of bug 9320?

  http://bugzilla.kernel.org/show_bug.cgi?id=9320#c11

This is libata calling _GTM to find out how the BIOS configured the
device to determine cable type.

Thanks.


I suspect it's somewhat similar (though perhaps with a different cause): the 
code is trying to look up a value (presumably register contents) in a 
table using Match, gets a value that's not in the table (which makes 
Match return the Ones value, meaning "not found"), and so the 
lookup of the corresponding output value with that index fails. We'd 
need the full ASL dump to know exactly what's going on there.
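
A rough Python model of that suspected failure mode follows. It is neither ASL nor ACPICA code. Both tables and all function names are hypothetical, and ONES simply stands in for the ASL Ones constant.

```python
# Python model of the suspected failure mode; not ASL and not ACPICA code.
ONES = 0xFFFFFFFF  # stand-in for the ASL "Ones" constant (all bits set)

class PackageLimitError(IndexError):
    """Stand-in for the interpreter's AE_AML_PACKAGE_LIMIT exception."""

def match_equal(package, value):
    """Like ASL Match with an equality test: index of value, or ONES."""
    for index, element in enumerate(package):
        if element == value:
            return index
    return ONES

def package_index(package, index):
    """Like ASL Index on a package: fails when the index is out of range."""
    if index >= len(package):
        raise PackageLimitError(f"index {index:#x} is beyond end of object")
    return package[index]

register_values = [0x10, 0x21, 0x42]  # hypothetical lookup table
timings_ns = [120, 180, 240]          # hypothetical output table

def lookup_timing(raw):
    return package_index(timings_ns, match_equal(register_values, raw))

print(lookup_timing(0x21))  # 180
```

Calling lookup_timing() with a register value that is not in the first table raises PackageLimitError, the same shape of failure as the one in the bug report.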


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23

2007-12-09 Thread Robert Hancock

Andreas Mohr wrote:

On Mon, Dec 10, 2007 at 01:04:31AM +0100, Andreas Mohr wrote:

IOW, it seems very likely that _GTM on these BIOSes (VIA chipsets) isn't
actually wrongly implemented but simply expects IDE controller values
to have been set up "differently".


Or... one could possibly even infer from this that - maybe -
the _GTM invocation spot is wrong, it should be done somewhere
different during bootup. Or whatever.


"Whatever" indeed:

There's an ASL Match() for a "PMPT" (Primary Master PorT) PCI register,
and the possible register values are:

Package (0x04)
{
0x20,
0x31,
0x65,
0xA8
},

and from

OperationRegion (CFG2, PCI_Config, 0x40, 0x20)
Field (CFG2, DWordAcc, NoLock, Preserve)
{
Offset (0x08),
SSPT,   8,
SMPT,   8,
PSPT,   8,
PMPT,   8,
Offset (0x10),
...
we can infer that at PCI_Config offset 0x48 those values should be located.
However after bootup or resume there are:

# lspci -s 00:11.1 -xxx
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00: 06 11 71 05 07 00 90 02 06 8a 01 01 00 20 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 e4 00 00 00 00 00 00 00 00 00 00 06 11 71 05
30: 00 00 00 00 c0 00 00 00 00 00 00 00 ff 01 00 00
40: 0b 32 09 0a 18 1c c0 00 99 99 20 20 ff 00 a8 20
50: 07 07 f6 f1 14 03 00 00 a8 a8 a8 a8 00 00 00 00
60: 00 02 00 00 00 00 00 00 00 02 00 00 00 00 00 00
70: 02 01 00 00 00 00 00 00 82 01 00 00 00 00 00 00
80: 00 e0 a1 1f 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 06 00 71 05 06 11 71 05 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 07 00 00 00 00 00 00 00 00 00


As one can see, the relevant values for SSPT, SMPT, PSPT and PMPT are
99 99 20 20. Judging from the array above, 0x99 is not a valid value,
and this is because the secondary port is unused, as can also be seen
from my bootup log:

scsi0 : pata_via
scsi1 : pata_via
ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xe400 irq 14
ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xe408 irq 15
ata1.00: ATA-5: WDC WD1200JB-00CRA1, 17.07W17, max UDMA/100
ata1.00: 234441648 sectors, multi 16: LBA
ata1.01: ATAPI: TOSHIBA DVD-ROM SD-M1612, 1004, max UDMA/33
Switched to high resolution mode on CPU 0
ata1.00: configured for UDMA/100
ata1.01: configured for UDMA/33
ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0) is beyond end of object [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTM_] (Node df80b9a8), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN1._GTM] (Node df80b8d0), AE_AML_PACKAGE_LIMIT
ata2: ACPI get timing mode failed (AE 0x300d)
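
The register check described in this message can be reproduced with a few lines of Python. This is only a sketch with invented helper names. The data row, the field order and the table of valid values are the ones quoted in the message.

```python
# Sketch: decode the timing bytes from the config-space dump quoted above.
# Helper names are invented; the row, field layout and table are from the message.
ROW_40 = "0b 32 09 0a 18 1c c0 00 99 99 20 20 ff 00 a8 20"
VALID = {0x20, 0x31, 0x65, 0xA8}           # the Package (0x04) from the ASL
FIELDS = ["SSPT", "SMPT", "PSPT", "PMPT"]  # one byte each, starting at 0x48

def decode_timing_fields(row_text, row_base=0x40, first=0x48):
    """Map each field name to its byte in the given 16-byte dump row."""
    data = bytes.fromhex(row_text)
    start = first - row_base
    return dict(zip(FIELDS, data[start:start + len(FIELDS)]))

fields = decode_timing_fields(ROW_40)
for name, value in fields.items():
    status = "ok" if value in VALID else "not in table"
    print(f"{name} = {value:#04x} ({status})")
```

Only the two secondary-channel fields (SSPT and SMPT, both 0x99) fall outside the table, which fits the observation that the failure is reported for ata2.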


Manually tweaking the values to 20 20 20 20 truly does skip the _GTM failure
message on suspend, only for it to reappear right on resume due to the
99 99 20 20 combo happening again.
If I don't tweak, I get the _GTM failure at both suspend and resume.


As such one can conclude that this BIOS is rather very confused when being
called for _GTM on an entirely unused controller port. And this is either
because the BIOS is dumb or because ACPI doesn't really expect anyone to
call _GTM on an unused physical port. I'd bet on the latter...
(however I haven't found ACPI 3.0b explicitly mentioning this anywhere yet)

Andreas Mohr



Probably Windows doesn't call _GTM on a port with no devices connected, 
and so the BIOS people never tested that case. Likely we can just avoid 
doing this: if no devices are connected, the timing settings for that 
channel are irrelevant.


And you're quite right in your comment that we are often too quick to 
blacklist hardware instead of looking into why it really is failing. 
ACPI is one of those areas where we often just need to figure out how to 
be bug-to-bug compatible with what Windows is doing.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Bug: get EXT3-fs error Allocating block in system zone

2007-12-09 Thread Robert Hancock

Marco Gatti wrote:

Linus Torvalds schrieb:

Was there a dmesg out there somewhere?

With 4G of RAM, you probably have some of it above the 4GB mark 
(because of RAM remapping etc, and the PCI decode hole in the low 
4GB). It does sound like this is a DMA problem, and your controller 
cannot correctly DMA to the upper 4GB.


So what controller/driver, what's the dmesg, and let's see if we can 
fix it by adding a DMA mask to it to limit it to the low 32 bits.
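
As a toy illustration of what limiting DMA to the low 32 bits means, consider the following Python model. It is not the kernel DMA API. The function names are invented, and the memory layout (a 1 GiB PCI hole, with the displaced RAM remapped above 4 GiB) is an assumed example, not this board's actual map.

```python
# Toy model of a DMA addressing check; names and memory layout are assumptions.
GIB = 1 << 30

def dma_mask(bits):
    """Highest bus address a device with `bits` address bits can reach."""
    return (1 << bits) - 1

def device_can_reach(addr, size, mask):
    """True if the whole buffer [addr, addr + size) lies within the mask."""
    return addr + size - 1 <= mask

# Assumed layout: 4 GiB of RAM with a 1 GiB PCI hole below 4 GiB, so the
# last 1 GiB of RAM is remapped to the range 4 GiB .. 5 GiB.
low_buffer = 2 * GIB
high_buffer = 4 * GIB + 0x1000

for bits in (32, 64):
    mask = dma_mask(bits)
    print(bits, device_can_reach(low_buffer, 4096, mask),
          device_can_reach(high_buffer, 4096, mask))
# 32 True False
# 64 True True
```

With a 32-bit mask the buffer above 4 GiB is out of reach, so the kernel has to bounce or remap it. A controller that advertises 64-bit support but mishandles high addresses would corrupt exactly those transfers.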


Controller / drivers:
it's a board with intel Q35 chipset. The southbridge has an ICH9
Intel Gigabit 82566DM-2 => e1000
Intel matrix storage SATA => ahci.c
Intel graphics media accelerator => not added to kernel
Intel Audio => Intel HD Audio AC97

I only get "EXT3-fs error Allocating block in system zone" in dmesg with 
4 GB or more of RAM. I listed the boot-up dmesg to give an idea of the DMA 
config with different amounts of RAM.


Thanks for your help.


The obvious suspect with a filesystem problem would be the disk 
controller driver, AHCI here. However, the controller appears to set the 
flag to indicate that it supports 64-bit DMA, so it should be fine, 
unless it lies of course (which we know the ATI SB600 chipset does, but 
I don't believe Intel is known to).


Could still be a DMA mapping bug that only shows up when an IOMMU is used. 
However, AHCI is a pretty well tested driver.




dmesg with 2GB:


..



ahci :00:1f.2: version 2.3
ACPI: PCI Interrupt :00:1f.2[B] -> GSI 19 (level, low) -> IRQ 19
ahci :00:1f.2: nr_ports (6) and implemented port map (0xf) don't match
ahci :00:1f.2: AHCI 0001.0200 32 slots 6 ports 3 Gbps 0xf impl SATA mode
ahci :00:1f.2: flags: 64bit ncq sntf led clo pmp pio slum part
PCI: Setting latency timer of device :00:1f.2 to 64
scsi0 : ahci
scsi1 : ahci
scsi2 : ahci
scsi3 : ahci
ata1: SATA max UDMA/133 cmd 0xc2334100 ctl 0x bmdma 0x irq 316
ata2: SATA max UDMA/133 cmd 0xc2334180 ctl 0x bmdma 0x irq 316
ata3: SATA max UDMA/133 cmd 0xc2334200 ctl 0x bmdma 0x irq 316
ata4: SATA max UDMA/133 cmd 0xc2334280 ctl 0x bmdma 0x00000000 irq 316
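
For readers wondering where "64bit", "32 slots" and "6 ports" in the log come from: they are decoded from the AHCI host capabilities (CAP) register, while "0xf impl" is the ports-implemented (PI) bitmap. A small sketch follows. The CAP value is constructed to match the log (it was not read from this hardware), the helper names are invented, and only a handful of the register's fields are decoded.

```python
# Sketch of decoding AHCI HBA capability fields. The CAP value is constructed
# to match the log; it is an assumption, not a value read from this hardware.
def decode_cap(cap):
    return {
        "64bit": bool(cap & (1 << 31)),    # S64A: 64-bit addressing
        "ncq": bool(cap & (1 << 30)),      # SNCQ: native command queuing
        "speed_gen": (cap >> 20) & 0xF,    # ISS: 1 = 1.5 Gbps, 2 = 3 Gbps
        "slots": ((cap >> 8) & 0x1F) + 1,  # NCS is zero-based
        "nr_ports": (cap & 0x1F) + 1,      # NP is zero-based
    }

def implemented_ports(port_map):
    """Count the bits set in the ports-implemented (PI) register."""
    return bin(port_map).count("1")

cap = (1 << 31) | (1 << 30) | (2 << 20) | (31 << 8) | 5
info = decode_cap(cap)
print(info["64bit"], info["slots"], info["nr_ports"], implemented_ports(0xF))
# True 32 6 4
```

The mismatch between six ports in CAP and four bits set in PI is what triggers the "don't match" message. The "64bit" flag only records what the controller claims to support.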


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)

2007-12-08 Thread Robert Hancock

Jeff Garzik wrote:

Robert Hancock wrote:
This fixes some problems with ATAPI devices on nForce4 controllers in ADMA
mode on systems with memory located above 4GB. We need to delay setting the
64-bit DMA mask until the PRD table and padding buffer are allocated so that
they don't get allocated above 4GB and break legacy mode (which is needed
for ATAPI devices).

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>


This is a bit nasty :/

I would consider setting the consistent DMA mask to 32-bit, and setting 
the overall mask to 64-bit.


Seems like that would solve the problem?

Also, does this need to be rebased on top of what I just pushed upstream?

Jeff


Jeff, ping on this one? This (or, one like it) really should make it 
into 2.6.24..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23

2007-12-08 Thread Robert Hancock

Matthew Garrett wrote:

On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote:

On Sat, 8 Dec 2007 11:12:57 +0100 Andreas Mohr <[EMAIL PROTECTED]> wrote:

ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0) is 
beyond end of object [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] 
(Node c180b990), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed 
[\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
ata1.01: _GTF evaluation failed (AE 0x300d)


037f6bb79f753c014bc84bca0de9bf98bb5ab169 ought to have fixed this?



I should think it should have.

I think we're too aggressive about disabling the libata ACPI support, 
even. One of my laptop's _GTF commands on resume is a DEVICE 
CONFIGURATION FREEZE LOCK command, which gets rejected by the drive 
(maybe it worked on the original Hitachi disk, but I've upgraded it to a 
newer Samsung). I'd say if the drive returns command aborted on one of 
these, we should just ignore that command and continue to the next one 
without trying to retry or disabling the ACPI support entirely.
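
A minimal sketch of the policy suggested above: skip a taskfile the drive aborts and carry on with the rest. This is not libata's code. The types, apply_gtf and the demo_issue "drive" are hypothetical; only the opcodes are real ATA command codes (0xB1 is DEVICE CONFIGURATION, 0xEF is SET FEATURES, 0xF5 is SECURITY FREEZE LOCK).

```c
#include <stddef.h>

enum tf_result { TF_OK, TF_ABORTED, TF_FATAL };

struct gtf_cmd {
    unsigned char command;   /* ATA command opcode from the _GTF taskfile */
};

typedef enum tf_result (*issue_fn)(const struct gtf_cmd *cmd);

/* Issue every _GTF command in order. A command the drive aborts is counted
 * and skipped; only a fatal (transport-level) failure stops the loop.
 * Returns the number of commands applied, or -1 on a fatal failure. */
static int apply_gtf(const struct gtf_cmd *cmds, size_t n, issue_fn issue,
                     size_t *skipped)
{
    int applied = 0;

    *skipped = 0;
    for (size_t i = 0; i < n; i++) {
        switch (issue(&cmds[i])) {
        case TF_OK:
            applied++;
            break;
        case TF_ABORTED:
            (*skipped)++;    /* drive rejected it: ignore and move on */
            break;
        case TF_FATAL:
            return -1;
        }
    }
    return applied;
}

/* Hypothetical drive that rejects DEVICE CONFIGURATION commands (0xB1). */
static enum tf_result demo_issue(const struct gtf_cmd *cmd)
{
    return cmd->command == 0xB1 ? TF_ABORTED : TF_OK;
}
```

With this policy a replacement drive that lacks one optional feature still gets the remaining BIOS-requested settings applied on resume.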


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: 2.6.24-rc4-mm1 and Very Slow PCMCIA Compact Flash

2007-12-07 Thread Robert Hancock

Zan Lynx wrote:

On Fri, 2007-12-07 at 15:22 -0800, Andrew Morton wrote:

On Fri, 07 Dec 2007 23:09:43 +
Zan Lynx <[EMAIL PROTECTED]> wrote:


On Fri, 2007-12-07 at 15:02 -0800, Andrew Morton wrote:

On Fri, 07 Dec 2007 20:38:24 +
Zan Lynx <[EMAIL PROTECTED]> wrote:


While I'm reporting problems I'll get this one out there.

I normally use a USB-2 memory card reader but I also have a PCMCIA
CompactFlash adapter that I use occasionally.  During the MM series
kernels 2.6.22 and 23 (I am pretty sure) this didn't work at all.  I
don't know about vanilla since I don't run that.

Now with MM kernels 2.6.24 rc1-4 the PCMCIA adapter works again, but I
only get read rates of 1.6 MB/s.  When it used to work in 2.6.20 I got
at least 16 MB/s.  The card itself is capable of 30+ in the USB-2
reader.


[cut]

Oh, OK.  Hopefully the ata guys can help out with this.

I don't know if it's actually strictly a regression?  Did libata ever support
that device in any earlier kernels?


That could be why it didn't work for a few kernel versions.  I
reconfigured for a libata-only system a while back.  And, since I
usually use the USB-2 flash reader I didn't care much about the PCMCIA.

I will try reverting that patch later tonight, in a few hours.


It looks like pata_pcmcia is always PIO mode 0:

/**
 *  pcmcia_init_one -   attach a PCMCIA interface
 *  @pdev: pcmcia device
 *
 *  Register a PCMCIA IDE interface. Such interfaces are PIO 0 and
 *  shared IRQ.
 */

I assume that with old IDE this would use ide_cs.c, but I'm drawing a 
blank on what modes that supports..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: x86_64 dynticks not working prev: cpuidle, dynticks compatible or no?

2007-12-07 Thread Robert Hancock

Ed Sweetman wrote:
System is idle now, previously it was doing something i couldn't halt at 
the time.  I'm looking at "Local timer interrupts" in the "Loc:" section 
of /proc/interrupts.
Across 1 second while the system is pretty much idle, i still get 300 
interrupts. My HZ variable is set to 300 in the kernel config, so this 
is expected but I was under the assumption that dynticks/tickless being 
compiled in would cause that to be much lower.


Am I reading the wrong section of /proc/interrupts  to verify if 
dynticks is working or not? Again, i see no difference in cpu temp at all.


Try running powertop ( http://www.lesswatts.org/projects/powertop/ ) and 
see what it reports.


I don't think dynticks will generally save huge amounts of power on a 
typical desktop machine. The big gains come from being able to stay in 
deep sleep C-states (C2/C3) for longer periods of time, but most desktop 
machines only enable sleep states down to C1.
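
For anyone who wants to check the tick rate by hand as Ed describes, a small helper along these lines could sum the per-CPU counts on the local timer line of /proc/interrupts. It is only a sketch: the function names are made up, and it assumes the line starts with a "LOC:" label (matched case-insensitively) followed by one counter per CPU. Sample the line twice, N seconds apart, and pass both totals to ticks_per_cpu_per_sec.

```c
#include <ctype.h>
#include <stdlib.h>
#include <strings.h>

/* Sum the per-CPU counters on a "LOC:" line from /proc/interrupts.
 * Returns -1 if the line is not the local timer line. */
static long long loc_total(const char *line)
{
    long long total = 0;

    while (isspace((unsigned char)*line))
        line++;
    if (strncasecmp(line, "LOC:", 4) != 0)
        return -1;
    line += 4;

    for (;;) {
        char *end;
        unsigned long long v = strtoull(line, &end, 10);

        if (end == line)     /* reached the text description, stop */
            break;
        total += (long long)v;
        line = end;
    }
    return total;
}

/* Average local timer interrupts per CPU per second between two samples. */
static double ticks_per_cpu_per_sec(long long before, long long after,
                                    int ncpus, double seconds)
{
    return (double)(after - before) / ncpus / seconds;
}
```

A result close to HZ (300 here) on an idle machine would suggest the tick is not being stopped; a much lower value would suggest dynticks is working.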




In case it helps, this is an athlon64 x2 with apic functioning and both 
cores active in 64bit mode. dmesg is below.

not related :
Some additional notes:  it87 is my lm_sensor, it doesn't work in this 
kernel, yet it did in 2.6.22.  Perhaps enabling high precision timers 
changed something in acpi land.


I enabled tcp dma offloading in this kernel, i get debugging output 
related to it, error is at the last line.  No corruption or otherwise 
bad behavior.   Transferring via cifs at 9.7MB/sec "incoming" took about 
15% of one cpu...  I never bothered to check if that is the norm but i 
suspect i'll be removing that feature as it seems to not play nice with 
the kernel.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Possible EXT2 race

2007-12-07 Thread Robert Hancock

linux-os (Dick Johnson) wrote:

On Fri, 7 Dec 2007, Dave Jones wrote:


On Fri, Dec 07, 2007 at 08:15:42AM -0500, linux-os (Dick Johnson) wrote:


Dec  7 04:05:55 chaos kernel: sd 0:0:1:0: [sdb] Add. Sense: Peripheral device 
write fault

This sounds more like a hardware problem.

Dave



There was an attempt to write beyond the end of the device because
everything in the file-system was getting trashed. I can read/write
the 5 year-old SCSI physical drive with no errors from both within
linux and through the Adaptec BIOS. This problem only occurs
when I attempt to truncate a file that is being written by another
task.


That SCSI error code doesn't sound like a reasonable one for the drive 
getting a bad block address. The more typical one in that case would be 
"Logical block address out of range", or maybe the catch-all "Invalid 
field in CDB". "Peripheral device write fault", especially as a deferred 
error (i.e. after the drive already returned a normal completion for the 
data, and then is reporting the failure to actually write to the media 
on the next command), really sounds like a drive problem.
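
To make the distinction concrete, here is a small decoder for fixed-format SCSI sense data. It is a sketch, not the kernel's sense handling: the struct and function names are invented, and the text table covers only the three additional sense codes mentioned above. The byte offsets and codes come from the SCSI standard: response code 0x70 is a current error and 0x71 a deferred one, with ASC and ASCQ in bytes 12 and 13.

```c
#include <stddef.h>
#include <stdint.h>

struct sense_info {
    int valid;        /* nonzero if the buffer held fixed-format sense data */
    int deferred;     /* nonzero for response code 0x71 (deferred error) */
    uint8_t key;      /* sense key */
    uint8_t asc;      /* additional sense code */
    uint8_t ascq;     /* additional sense code qualifier */
};

static struct sense_info decode_fixed_sense(const uint8_t *buf, size_t len)
{
    struct sense_info s = {0};
    uint8_t code;

    if (len < 14)
        return s;
    code = buf[0] & 0x7f;
    if (code != 0x70 && code != 0x71)
        return s;               /* not fixed-format sense data */

    s.valid = 1;
    s.deferred = (code == 0x71);
    s.key = buf[2] & 0x0f;
    s.asc = buf[12];
    s.ascq = buf[13];
    return s;
}

/* Text for the additional sense codes discussed in this thread only. */
static const char *asc_text(uint8_t asc, uint8_t ascq)
{
    if (asc == 0x03 && ascq == 0x00)
        return "Peripheral device write fault";
    if (asc == 0x21 && ascq == 0x00)
        return "Logical block address out of range";
    if (asc == 0x24 && ascq == 0x00)
        return "Invalid field in CDB";
    return "(not in this short table)";
}
```

A bad block address from the host would be expected to show up as 0x21/0x00 or 0x24/0x00 on the command that carried it, whereas 0x03/0x00 reported as deferred points at the drive failing to commit data it had already accepted.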


And the kernel is supposed to trap those at the disk layer, like these 
are saying it is, _after_ that error occurs:


Dec  7 04:08:13 chaos kernel: attempt to access beyond end of device
Dec  7 04:08:13 chaos kernel: sdb1: rw=0, want=29687515944, limit=33736437
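
The "attempt to access beyond end of device" message comes from a bounds check done before the request ever reaches the drive. A simplified version of such a check, with a hypothetical function name and written to avoid overflow, might look like this; "want" in the log corresponds to sector + nr_sectors and "limit" to the partition size in 512-byte sectors.

```c
#include <stdint.h>

/* Nonzero if a request for nr_sectors starting at sector would run past
 * the end of a device that is limit sectors long. */
static int beyond_eod(uint64_t sector, uint64_t nr_sectors, uint64_t limit)
{
    return nr_sectors != 0 &&
           (limit < nr_sectors || sector > limit - nr_sectors);
}
```

With limit=33736437, a request ending at 29687515944 is rejected here in software, so the drive never sees that block address. The request length is not in the log; any nonzero length gives the same answer for an offset that far out.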

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops

2007-12-06 Thread Robert Hancock

David P. Reed wrote:
After much, much testing (months, off and on, pursuing hypotheses), I've 
discovered that the use of "outb al,0x80" instructions to "delay" after 
inb and outb instructions causes solid freezes on my HP dv9000z laptop, 
when ACPI is enabled.


It takes a fair number of out's to 0x80, but the hard freeze is reliably 
reproducible by writing a driver that solely does a loop of 50 outb's to 
0x80 and calling it in a loop 1000 times from user space.  !!!


The serious impact is that the /dev/rtc and /dev/nvram devices are very 
unreliable - thus "hwclock" freezes very reliably while looping waiting 
for a new second value and calling "cat /dev/nvram" in a loop freezes 
the machine if done a few times in a row.


This is reproducible, but requires a fair number of outb's to the 0x80 
diagnostic port, and seems to require ACPI to be on.


io_64.h is the source of these particular instructions, via the 
CMOS_READ and CMOS_WRITE macros, which are defined in mc146818_64.h.  (I 
wonder if the same problem occurs in 32-bit mode).


I'm happy to complete and test a patch, but I'm curious what the right 
approach ought to be.  I have to say I have no clue as to what ACPI is 
doing on this chipset  (nvidia MCP51) that would make port 80 do this.  
A raw random guess is that something is logging POST codes, but if so, 
not clear what is problematic in ACPI mode.


Any help/suggestions?

Changing the delay instruction sequence from the outb to short jumps 
might be the safe thing.  But Linus, et al. may have experience with 
that on other architectures like older Pentiums etc.


The fact that these "pausing" calls are needed in the first place seems 
rather cheesy. If there's hardware that's unable to respond to IO port 
writes as fast as possible, then surely there's a better solution than 
trying to stall the IOs by an arbitrary and hardware-dependent amount of 
time, like udelay calls, etc. Does any remotely recent hardware even 
need this?
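
For readers unfamiliar with the "_p" accessors, the pattern under discussion looks roughly like the following. This is a userspace simulation, not the kernel's io_64.h: fake_outb only counts writes, and the delay is passed in as a function pointer (an assumption made for illustration) so the port 0x80 variant and a timed variant can be compared.

```c
#include <stdint.h>

static unsigned total_writes;       /* all simulated port writes */
static unsigned writes_to_port80;   /* writes that hit the POST diagnostic port */
static unsigned timed_delays;       /* calls to the timed-delay stand-in */

/* Stand-in for the real outb(value, port); it only records the access. */
static void fake_outb(uint8_t value, uint16_t port)
{
    (void)value;
    total_writes++;
    if (port == 0x80)
        writes_to_port80++;
}

/* Traditional delay: a dummy write to port 0x80. */
static void delay_port80(void)
{
    fake_outb(0, 0x80);
}

/* Alternative: a fixed timed delay (stand-in for something like udelay). */
static void delay_timed(void)
{
    timed_delays++;
}

/* "Pausing" port write: the real write followed by the chosen delay. */
static void outb_p(uint8_t value, uint16_t port, void (*io_delay)(void))
{
    fake_outb(value, port);
    io_delay();
}
```

Every CMOS access through the pausing accessors (index port 0x70) therefore costs an extra write to port 0x80 with the traditional delay, which is how a loop over /dev/nvram ends up hammering that port.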


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [Timers SMP] can this machine be helped?

2007-12-05 Thread Robert Hancock

Guennadi Liakhovetski wrote:

On Tue, 4 Dec 2007, Robert Hancock wrote:


Guennadi Liakhovetski wrote:

I've got an old 2xP-II @ 400MHz Compaq AP400 system, which I'm still using.
It has many peculiarities, so, I wouldn't be surprised if the answer to my
questions would be "sorry, the patient is rather dead than alive".

How about disabling ACPI entirely, acpi=off on kernel command line? I wouldn't
be surprised to see a lot of ACPI stuff broken on an older machine like that..


See above - it's an SMP.

Thanks
Guennadi
---
Guennadi Liakhovetski



On a machine that old, you shouldn't need ACPI to detect both CPUs; it 
should be able to use the MPS (Intel MultiProcessor Specification) tables..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [Timers SMP] can this machine be helped?

2007-12-04 Thread Robert Hancock

Guennadi Liakhovetski wrote:

Hi,

I've got an old 2xP-II @ 400MHz Compaq AP400 system, which I'm still 
using. It has many peculiarities, so, I wouldn't be surprised if the 
answer to my questions would be "sorry, the patient is rather dead than 
alive".


Some of the problems lie in ACPI area, I tried some time ago to fix the 
ACPI tables for these machine, but never got enough time for that. So I'm 
still booting with acpi=noirq


Another problem is its battery is dead and it's hard soldered to the 
mainboard (Compaq)...


It might also have some problems with one of its 3 SCSI busses.

I compiled a .24-ish kernel for it with CONFIG_NO_HZ and 
CONFIG_HIGH_RES_TIMERS. To get the system boot at least sometimes I have 
to specify nohz=off. Then I get


* Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
* this clock source is slow. Consider trying other clock sources

Without this parameter it hangs usually between

Time: acpi_pm clocksource has been installed.

and

Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 0

Tried booting with clocksource=tsc then I've got

Marking TSC unstable due to: possible TSC halt in C2.

And then a few of these:

BUG: soft lockup - CPU#0 stuck for 13s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc2-g8c086340 #3)
EIP: 0060:[] EFLAGS: 0283 CPU: 0
EIP is at acpi_processor_idle+0x2ae/0x477
EAX:  EBX: feab ECX: 0001 EDX: 0001
ESI: c7c5f2d0 EDI: 00122d9f EBP: c03ddfa8 ESP: c03ddf90
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 081dcf88 CR3: 07e46000 CR4: 02d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [] show_trace_log_lvl+0x1a/0x30
 [] show_trace+0x12/0x20
 [] show_regs+0x1c/0x20
 [] softlockup_tick+0x11b/0x150
 [] run_local_timers+0x12/0x20
 [] update_process_times+0x2f/0x60
 [] tick_sched_timer+0x6a/0xe0
 [] hrtimer_interrupt+0x120/0x1a0
 [] smp_apic_timer_interrupt+0x55/0x90
 [] apic_timer_interrupt+0x28/0x30
 [] cpu_idle+0x84/0xf0
 [] rest_init+0x5d/0x60
 [] start_kernel+0x2af/0x2f0
 [<>] run_init_process+0x3feff000/0x20
 ===

so, is there any way I can still reasonably use this system? Which 
configuration / command-line parameters should I try?


If needed can provide complete dmesg (with nohz=off or with 
clocksource=tsc) and .config.


How about disabling ACPI entirely, acpi=off on kernel command line? I 
wouldn't be surprised to see a lot of ACPI stuff broken on an older 
machine like that..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)

2007-12-04 Thread Robert Hancock

Justin Piszcz wrote:
The badblocks did not do anything; however, when I built a software raid 
5 and then performed a dd:


/usr/bin/time dd if=/dev/zero of=fill_disk bs=1M

I saw this somewhere along the way:

[30189.967531] RAID5 conf printout:
[30189.967576]  --- rd:3 wd:3
[30189.967617]  disk 0, o:1, dev:sdc1
[30189.967660]  disk 1, o:1, dev:sdd1
[30189.967716]  disk 2, o:1, dev:sde1
[42332.936615] ata5.00: exception Emask 0x2 SAct 0x7000 SErr 0x0 action 0x2 frozen
[42332.936706] ata5.00: spurious completions during NCQ issue=0x0 SAct=0x7000 FIS=004040a1:0800
[42332.936804] ata5.00: cmd 61/08:60:6f:4d:2a/00:00:27:00:00/40 tag 12 cdb 0x0 data 4096 out
[42332.936805]  res 40/00:74:0f:49:2a/00:00:27:00:00/40 Emask 0x2 (HSM violation)
[42332.936977] ata5.00: cmd 61/08:68:77:4d:2a/00:00:27:00:00/40 tag 13 cdb 0x0 data 4096 out
[42332.936981]  res 40/00:74:0f:49:2a/00:00:27:00:00/40 Emask 0x2 (HSM violation)
[42332.937162] ata5.00: cmd 61/00:70:0f:49:2a/04:00:27:00:00/40 tag 14 cdb 0x0 data 524288 out
[42332.937163]  res 40/00:74:0f:49:2a/00:00:27:00:00/40 Emask 0x2 (HSM violation)
[42333.240054] ata5: soft resetting port
[42333.494462] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[42333.506592] ata5.00: configured for UDMA/133
[42333.506652] ata5: EH complete
[42333.506741] sd 4:0:0:0: [sde] 1465149168 512-byte hardware sectors (750156 MB)
[42333.506834] sd 4:0:0:0: [sde] Write Protect is off
[42333.506887] sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00
[42333.506905] sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA


Next test, I will turn off NCQ and try to make the problem re-occur.
If anyone else has any thoughts here..?
I ran long smart tests on all 3 disks, they all ran successfully.

Perhaps these drives need to be NCQ BLACKLISTED with the P35 chipset?


The problem won't recur with NCQ off, because spurious completions are 
impossible in that case.


It was originally thought that these AHCI spurious NCQ completions were 
caused by busted NCQ implementations on the drives, but I think the current 
theory is that it's some other timing problem or some such, given the number of 
drives across all makers which are reported to do this. I believe Tejun 
is investigating?
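
As a rough illustration of what "spurious completion" means here: with NCQ, each outstanding command has a tag, and the drive reports completions as a bitmask of tags. A completion for a tag that was never outstanding is spurious. The helper below is a sketch with a made-up name, not the check the ahci driver actually performs.

```c
#include <stdint.h>

/* Tags the device reported as complete that were not outstanding.
 * active_mask: bit n set if tag n is in flight (SAct in the log above).
 * done_mask:   bit n set if the device just reported tag n complete. */
static uint32_t spurious_tags(uint32_t active_mask, uint32_t done_mask)
{
    return done_mask & ~active_mask;
}
```

In the log above SAct is 0x7000, i.e. tags 12, 13 and 14 are in flight, matching the three failed commands. With NCQ off there is at most one command outstanding and no per-tag completion mask, so this class of error cannot be reported at all, whether or not the underlying cause is still present.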


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: solid state drive access and context switching

2007-12-04 Thread Robert Hancock

Chris Friesen wrote:


Over on comp.os.linux.development.system someone asked an interesting 
question, and I thought I'd mention it here.


Given a fast low-latency solid state drive, would it ever be beneficial 
to simply wait in the kernel for synchronous read/write calls to 
complete?  The idea is that you could avoid at least two task context 
switches, and if the data access can be completed at less cost than 
those context switches it could be an overall win.


Has anyone played with this concept?


I don't think most SSDs are fast enough that it would really be worth 
avoiding the context switch for.. I could be wrong though.
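
The trade-off Chris describes comes down to a simple break-even test: busy-waiting only wins if the device answers faster than the two context switches it would save. A sketch of that comparison, with a hypothetical function name and with both costs supplied by the caller rather than assumed:

```c
#include <stdint.h>

/* Nonzero if waiting in place is expected to be cheaper than sleeping:
 * the expected I/O latency is less than the cost of switching away and
 * switching back again. Both values are in nanoseconds. */
static int should_poll(uint64_t expected_io_ns, uint64_t ctx_switch_ns)
{
    return expected_io_ns < 2 * ctx_switch_ns;
}
```

This ignores everything else the CPU could have done while waiting, so it is at best an upper bound on when polling might pay off.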


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)

2007-12-04 Thread Robert Hancock

Jeff Garzik wrote:

Robert Hancock wrote:
This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode
on systems with memory located above 4GB. We need to delay setting the 64-bit
DMA mask until the PRD table and padding buffer are allocated so that they don't
get allocated above 4GB and break legacy mode (which is needed for ATAPI
devices).

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>


This is a bit nasty :/

I would consider setting the consistent DMA mask to 32-bit, and setting 
the overall mask to 64-bit.


Seems like that would solve the problem?


The issue with that is that it would also constrain the ADMA CPB/PRD 
table allocation to 32-bit, which I'd rather avoid having to do. There 
are dual-socket Opteron boxes like HP xw9300 that use this controller, 
and limiting the allocation to 32-bit could force a non-optimal node 
allocation for the table memory.


These type of devices really want a version of dma_alloc_coherent that 
allows overriding the DMA mask for specific allocations to make this 
cleaner. I'm sure this isn't the only device that has different DMA mask 
requirements for different consistent memory allocations..
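
As a rough illustration of why the mask matters per allocation, the sketch below checks whether a buffer is reachable under a given DMA mask. The helper names and the 5 GiB address are hypothetical; this is plain arithmetic, not the kernel's DMA API.

```python
GIB = 1 << 30


def dma_bit_mask(bits):
    """Highest address reachable by a device with 'bits' address bits."""
    return (1 << bits) - 1


def fits_dma_mask(phys_addr, size, mask):
    """True if every byte of the buffer lies at or below the mask."""
    return phys_addr + size - 1 <= mask
```

A 4 KiB buffer at 5 GiB passes fits_dma_mask(5 * GIB, 4096, dma_bit_mask(64)) but fails with dma_bit_mask(32), which is the situation the legacy-mode PRD table and padding buffer have to avoid, while the ADMA CPB/PRD tables could live anywhere.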


This patch does have the advantage of being confirmed to fix the 
reporter's problem (https://bugzilla.redhat.com/show_bug.cgi?id=351451), 
which is worth something this late in the .24-rc series.




Also, does this need to be rebased on top of what I just pushed upstream?


I don't think so.. this change is independent from the "sata_nv: don't 
use legacy DMA in ADMA mode (v3)" patch you just merged.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)

2007-12-04 Thread Robert Hancock

Justin Piszcz wrote:
The badblocks run did not turn up anything; however, when I built a software 
RAID 5 and then performed a dd:


/usr/bin/time dd if=/dev/zero of=fill_disk bs=1M

I saw this somewhere along the way:

[30189.967531] RAID5 conf printout:
[30189.967576]  --- rd:3 wd:3
[30189.967617]  disk 0, o:1, dev:sdc1
[30189.967660]  disk 1, o:1, dev:sdd1
[30189.967716]  disk 2, o:1, dev:sde1
[42332.936615] ata5.00: exception Emask 0x2 SAct 0x7000 SErr 0x0 action 0x2 frozen
[42332.936706] ata5.00: spurious completions during NCQ issue=0x0 SAct=0x7000 FIS=004040a1:0800
[42332.936804] ata5.00: cmd 61/08:60:6f:4d:2a/00:00:27:00:00/40 tag 12 cdb 0x0 data 4096 out
[42332.936805]  res 40/00:74:0f:49:2a/00:00:27:00:00/40 Emask 0x2 (HSM violation)
[42332.936977] ata5.00: cmd 61/08:68:77:4d:2a/00:00:27:00:00/40 tag 13 cdb 0x0 data 4096 out
[42332.936981]  res 40/00:74:0f:49:2a/00:00:27:00:00/40 Emask 0x2 (HSM violation)
[42332.937162] ata5.00: cmd 61/00:70:0f:49:2a/04:00:27:00:00/40 tag 14 cdb 0x0 data 524288 out
[42332.937163]  res 40/00:74:0f:49:2a/00:00:27:00:00/40 Emask 0x2 (HSM violation)
[42333.240054] ata5: soft resetting port
[42333.494462] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[42333.506592] ata5.00: configured for UDMA/133
[42333.506652] ata5: EH complete
[42333.506741] sd 4:0:0:0: [sde] 1465149168 512-byte hardware sectors (750156 MB)
[42333.506834] sd 4:0:0:0: [sde] Write Protect is off
[42333.506887] sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00
[42333.506905] sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA


Next test, I will turn off NCQ and try to make the problem re-occur.
If anyone else has any thoughts here..?
I ran long smart tests on all 3 disks, they all ran successfully.

Perhaps these drives need to be NCQ BLACKLISTED with the P35 chipset?


The problem won't recur with NCQ off, because spurious completions are 
impossible in that case.


It was originally thought that these AHCI spurious NCQ completions were 
caused by busted NCQ implementations on the drives, but I think the theory 
now is that it's some other timing problem or some such, given the number of 
drives across all makers which are reported to do this. I believe Tejun 
is investigating?
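
For anyone reading the log above: SAct is a bitmask of outstanding NCQ tags, one bit per tag. A minimal sketch of the decoding follows; the function name is a hypothetical helper, not something from libata.

```python
def active_tags(sact):
    """List the NCQ tag numbers whose bits are set in an SAct value."""
    return [tag for tag in range(32) if sact & (1 << tag)]
```

active_tags(0x7000) gives [12, 13, 14], which lines up with the three failed commands reported as tag 12, tag 13 and tag 14. With NCQ off there are no queued tags at all, so SAct stays 0x0 and active_tags(0x0) is empty.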


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove nospam from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [Timers SMP] can this machine be helped?

2007-12-04 Thread Robert Hancock

Guennadi Liakhovetski wrote:

Hi,

I've got an old 2xP-II @ 400MHz Compaq AP400 system, which I'm still 
using. It has many peculiarities, so I wouldn't be surprised if the 
answer to my questions were "sorry, the patient is rather dead than 
alive".


Some of the problems lie in the ACPI area. I tried some time ago to fix the 
ACPI tables for this machine, but never got enough time for that. So I'm 
still booting with "acpi=noirq".


Another problem is its battery is dead and it's hard soldered to the 
mainboard (Compaq)...


It might also have some problems with one of its 3 SCSI busses.

I compiled a .24-ish kernel for it with CONFIG_NO_HZ and 
CONFIG_HIGH_RES_TIMERS. To get the system to boot at least sometimes I have 
to specify "nohz=off". Then I get


* Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
* this clock source is slow. Consider trying other clock sources

Without this parameter it hangs usually between

Time: acpi_pm clocksource has been installed.

and

Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 0

I tried booting with "clocksource=tsc" and then got

Marking TSC unstable due to: possible TSC halt in C2.

And then a few of these:

BUG: soft lockup - CPU#0 stuck for 13s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc2-g8c086340 #3)
EIP: 0060:[c0233d33] EFLAGS: 0283 CPU: 0
EIP is at acpi_processor_idle+0x2ae/0x477
EAX:  EBX: feab ECX: 0001 EDX: 0001
ESI: c7c5f2d0 EDI: 00122d9f EBP: c03ddfa8 ESP: c03ddf90
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
CR0: 8005003b CR2: 081dcf88 CR3: 07e46000 CR4: 02d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [c01053fa] show_trace_log_lvl+0x1a/0x30
 [c0105f42] show_trace+0x12/0x20
 [c01024fc] show_regs+0x1c/0x20
 [c014fabb] softlockup_tick+0x11b/0x150
 [c01311f2] run_local_timers+0x12/0x20
 [c013168f] update_process_times+0x2f/0x60
 [c014597a] tick_sched_timer+0x6a/0xe0
 [c013fba0] hrtimer_interrupt+0x120/0x1a0
 [c0119ff5] smp_apic_timer_interrupt+0x55/0x90
 [c0104e70] apic_timer_interrupt+0x28/0x30
 [c0102624] cpu_idle+0x84/0xf0
 [c0316a7d] rest_init+0x5d/0x60
 [c03e1a7f] start_kernel+0x2af/0x2f0
 [] run_init_process+0x3feff000/0x20
 ===

so, is there any way I can still reasonably use this system? Which 
configuration / command-line parameters should I try?


If needed can provide complete dmesg (with nohz=off or with 
clocksource=tsc) and .config.


How about disabling ACPI entirely, with "acpi=off" on the kernel command 
line? I wouldn't be surprised to see a lot of ACPI stuff broken on an older 
machine like that..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove nospam from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)

2007-12-01 Thread Robert Hancock

Justin Piszcz wrote:
I am putting a new machine together and I have dual raptor raid 1 for 
the root, which works just fine under all stress tests.


Then I have the WD 750 GiB drives (not RE2, desktop ones for ~150-160 on 
sale nowadays):


I ran the following:

dd if=/dev/zero of=/dev/sdc
dd if=/dev/zero of=/dev/sdd
dd if=/dev/zero of=/dev/sde

(as it is always a very good idea to do this with any new disk)

And sometime along the way(?) (I had gone to sleep and let it run), this 
occurred:


[42880.680144] ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x401 action 0x2 frozen
[42880.680231] ata3.00: irq_stat 0x00400040, connection status changed
[42880.680290] ata3.00: cmd ec/00:00:00:00:00/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
[42880.680292]  res 40/00:ac:d8:64:54/00:00:57:00:00/40 Emask 0x10 (ATA bus error)
[42881.841899] ata3: soft resetting port
[42885.966320] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[42915.919042] ata3.00: qc timeout (cmd 0xec)
[42915.919094] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[42915.919149] ata3.00: revalidation failed (errno=-5)
[42915.919206] ata3: failed to recover some devices, retrying in 5 secs
[42920.912458] ata3: hard resetting port
[42926.411363] ata3: port is slow to respond, please be patient (Status 0x80)
[42930.943080] ata3: COMRESET failed (errno=-16)
[42930.943130] ata3: hard resetting port
[42931.399628] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[42931.413523] ata3.00: configured for UDMA/133
[42931.413586] ata3: EH pending after completion, repeating EH (cnt=4)
[42931.413655] ata3: EH complete
[42931.413719] sd 2:0:0:0: [sdc] 1465149168 512-byte hardware sectors (750156 MB)
[42931.413809] sd 2:0:0:0: [sdc] Write Protect is off
[42931.413856] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[42931.413867] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA


Usually when I saw this sort of thing on another box I have full of 
raptors, it was due to a bad raptor, and I never saw it again after I 
replaced the disk that it happened on, but that was using the Intel P965 
chipset.


This board is a Gigabyte GSP-P35-DS4 (Rev 2.0), and I have all of 
the drives (2 raptors, 3 750s) connected to the Intel ICH9 Southbridge.


I am going to do some further testing but does this indicate a bad 
drive? Bad cable?  Bad connector?


Could be any of the above.



As you can see above, /dev/sdc stopped responding for a little bit and 
then the kernel reset the port.


It looks like the first thing that happened is that the controller 
reported it lost the SATA link, and then the drive didn't respond until 
it was bashed with a few hard resets..
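
The "SStatus 123" value in the link-up lines can be picked apart field by field. The sketch below follows the SATA SStatus register layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8) and assumes the log prints the register in hexadecimal; the function and table names are hypothetical.

```python
DET = {0: "no device detected",
       1: "device detected, PHY communication not established",
       3: "device detected, PHY communication established",
       4: "PHY offline"}
SPD = {0: "no negotiated speed", 1: "1.5 Gbps", 2: "3.0 Gbps"}
IPM = {0: "not present", 1: "active", 2: "partial", 6: "slumber"}


def decode_sstatus(value):
    """Split an SStatus register value into its DET, SPD and IPM fields."""
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return {"det": DET.get(det, "reserved"),
            "spd": SPD.get(spd, "reserved"),
            "ipm": IPM.get(ipm, "reserved")}
```

decode_sstatus(0x123) reports an established link at 3.0 Gbps in the active power state, which matches the "SATA link up 3.0 Gbps" text, so after the resets the link itself came back in a normal state.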




Why is this though?  What is the likely root cause?  Should I replace 
the drive?  Obviously this is not normal and cannot be good at all. The 
idea is to put these drives in a RAID5, and if one is going to time out, 
that is going to cause the array to go degraded and thus be worthless in 
a RAID5 configuration.


Can anyone offer any insight here?

Thank you,

Justin.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Possibly SATA related freeze killed networking and RAID

2007-11-29 Thread Robert Hancock

Phillip Susi wrote:

Tejun Heo wrote:

Agreed.  "Nobody cared" on ATA controllers is usually very effective at
taking the whole machine down.  Is there any reason why we don't turn on
irqpoll on turned off IRQs automatically?


Why does a single spurious interrupt cause it to be shut down?  I can 
see if the interrupt is stuck on and keeps interrupting constantly, but 
if it's just the occasional spurious interrupt, why not just ignore it 
and move on?


I'm not certain offhand, but I think there may be such a threshold. 
However, an occasional spurious interrupt isn't likely. For a 
level-triggered interrupt, an unhandled interrupt will keep interrupting 
forever since nobody knows how to clear it (until we decide to disable 
the IRQ entirely).
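
For what it's worth, here is a small simulation of the kind of threshold I mean. As far as I recall, the generic IRQ code counts interrupts in windows of 100,000 and disables the line if more than 99,900 of them went unhandled, but treat both constants and the function name as assumptions rather than a quote from the kernel source.

```python
WINDOW = 100_000          # interrupts counted per window (assumed)
UNHANDLED_LIMIT = 99_900  # unhandled interrupts tolerated per window (assumed)


def irq_gets_disabled(handled_flags):
    """Feed an iterable of booleans (True = some handler claimed the IRQ).
    Return True if any full window exceeds the unhandled limit."""
    count = unhandled = 0
    for handled in handled_flags:
        count += 1
        if not handled:
            unhandled += 1
        if count < WINDOW:
            continue
        if unhandled > UNHANDLED_LIMIT:
            return True
        count = unhandled = 0  # start a fresh window
    return False
```

A stuck level-triggered line, modelled as [False] * 100_000, trips the check, while one stray interrupt in every thousand never does. That fits the point above: the occasional spurious interrupt is ignored, and only a line that keeps firing unhandled gets shut off.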


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: How to map user space's virtual memory into kernel logical address space

2007-11-28 Thread Robert Hancock

Maitre Bart wrote:

A given app is allocating a large amount of memory (~10M) with malloc().
It passes this pointer to the kernel (device driver) via a custom ioctl.
I would like the driver to work on that memory with a pointer (as if
it was allocated with vmalloc) as well as the user space too (upon
return of the syscall).
Is there a way to map a user space's virtual memory range into the
kernel logical address space?

As far as I learned from my readings, using the user-space pointer
directly in kernel space will not work.

Of course, copy_from_user() is out of the question for efficiency
reasons.

ioremap() is pretty close to what I wish to do except that it accepts
a physical address and I don't know how to get it from a user space
pointer. And since a physical address is required, I assume the range
is considered contiguous, which is not really the case for malloc().

mmap()/remap_pfn_range() are interesting but I don't know how to get a
kernel pointer out of them.

kmap() does the job for a single page (and anyway, I wouldn't know how
to feed it with a struct page from the userland pointer).

get_user_pages() looks promising but it seems I have to call kmap() on
each page, so it looks like I cannot operate on the buffer with a
single pointer.

Does anyone know if it is possible? And if so, how can I do it?


10MB is an awfully big mapping to put into kernel virtual memory space. 
I suspect it might be easier to allocate the memory in the kernel and 
map it in from userspace, but then you have the same problem (and 10MB 
is awfully big for vmalloc).


Is there a good reason why you have to be able to do this? There's 
likely a better way.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


