Re: 3D OpenGL applications eat CPU resources

2010-02-16 Thread Émeric Maschino
2010/2/3 Stephane Marchesin :
> Really if you have such lockups they may also happen on x86, did you
> try the card there?

Hello,

I had some free time, so I tried my FireGL X1 adapter in x86 hardware:
no problem there.

I don't know if it can provide valuable information, but I've also
tried an AGP Radeon 7500 graphics adapter in my ia64 system.

Without a xorg.conf file, the AGP rate was automatically set to 4x,
with SBA and without FW. XAA acceleration was enabled by default. I did
not experience any problem with tiny OpenGL applications like glxgears
(~380 fps on average). As a test, I ran quake2: textures on the walls
and the floor were quickly corrupted, as if a translucent rainbow
texture was blended with the wall/floor texture. And within seconds,
screen refresh froze, as if the application had locked the system hard.
But it had not: top revealed no abusive CPU usage, the quake2 process
could be killed, and X restarted without a problem.

I then tried EXA acceleration. I haven't managed to reproduce the
texture corruption since, but I've experienced two GPU lockups while
simply moving a terminal window in the GNOME desktop environment
(reducing the AGP rate didn't help). Running glxgears gave ~390 fps on
average. Under quake2, the floor and wall textures were OK, but the
screen froze just as with XAA acceleration.

As a last attempt, I also tried an AGP Radeon 9600 Pro graphics
adapter: with it, my ia64 system didn't POST at all.

Cheers,

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-11 Thread Émeric Maschino
2010/2/4 Jerome Glisse :
> IIRC the old radeon drm doesn't have anything to dump the GPU command stream.
> Look at http://www.x.org/docs/AMD/R5xx_Acceleration_v1.4.pdf to see
> what a radeon GPU command stream looks like (PM4 packet stuff)

An interesting read, at least the parts I can understand; a lot of this
documentation goes way beyond my knowledge.

Looking at the logs I've recorded, I can see R300_CMD_PACKET0,
R300_CMD_WAIT, R300_CMD_PACKET3_RAW and R300_CMD_END3 traces. If I'm
not mistaken, they come from r300_cmdbuf.c, but they don't seem to give
enough information to isolate which command triggers a GPU lockup,
right?
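
Trying to apply the little I could follow: if I read the PM4 chapter
correctly, a packet header can at least be split into fields. My
attempt below (untested, and the field layout is only my reading of
the spec):

#include <stdio.h>
#include <stdint.h>

/* Decode a PM4 packet header as I understand R5xx_Acceleration_v1.4.pdf:
 * type in bits 31:30, count in bits 29:16, type-3 opcode in bits 15:8,
 * type-0 register base index in bits 12:0. */
static void pm4_decode(uint32_t header)
{
    unsigned type  = header >> 30;
    unsigned count = (header >> 16) & 0x3fff; /* payload dwords minus one */

    if (type == 0)
        printf("PACKET0: reg 0x%04x, %u dword(s)\n",
               (header & 0x1fff) << 2, count + 1);
    else if (type == 2)
        printf("PACKET2: filler\n");
    else if (type == 3)
        printf("PACKET3: opcode 0x%02x, %u dword(s)\n",
               (header >> 8) & 0xff, count + 1);
    else
        printf("type 1: reserved?\n");
}

int main(void)
{
    pm4_decode(0xc0012300); /* made-up header, just to exercise the code */
    return 0;
}

If that's right, a raw dump could at least be split into packets, even
if interpreting them is another story.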

As an alternative approach, I was wondering whether a series of simple
OpenGL applications, each exercising a growing set of OpenGL calls,
would help diagnose the offending r300 command, or not at all?
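
For example, something as small as this could be step zero (untested
sketch; I picked GLUT only to keep it short), with depth test,
texturing, etc. enabled one at a time in later steps:

/* step0.c - draw a single shaded triangle, resubmitting frames forever.
 * Build (assuming freeglut is installed): gcc step0.c -o step0 -lglut -lGL */
#include <GL/glut.h>

static void draw(void)
{
    glClear(GL_COLOR_BUFFER_BIT);
    glBegin(GL_TRIANGLES);
    glColor3f(1.0f, 0.0f, 0.0f); glVertex2f(-0.5f, -0.5f);
    glColor3f(0.0f, 1.0f, 0.0f); glVertex2f( 0.5f, -0.5f);
    glColor3f(0.0f, 0.0f, 1.0f); glVertex2f( 0.0f,  0.5f);
    glEnd();
    glutSwapBuffers();
    glutPostRedisplay(); /* keep submitting frames so a lockup shows up */
}

int main(int argc, char **argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_RGB | GLUT_DOUBLE);
    glutCreateWindow("r300 test, step 0");
    glutDisplayFunc(draw);
    glutMainLoop();
    return 0;
}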

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-08 Thread Émeric Maschino
2010/2/8 Alex Deucher :
> Does AGP work at all on ia64?  I know on some alphas there were cache
> coherency issues or something similar that more or less prevented
> AGP from being usable at all.  It was mostly there to accommodate AGP
> form factor cards.

I would say that AGP works on ia64, or at least it used to ;-)

Indeed, the proprietary ATI fglrx driver ran nicely, but was limited
to XFree86 4.1.x (it checked the XFree86 version at runtime). This was
during the kernel 2.4 era.

And the proprietary NVIDIA driver ran fine during the kernel 2.4/early
2.6 era (I remember using it with kernel 2.6.10).

At that time, the zx1 driver was already there, and apart from API/ABI
adjustments, I don't think it has been massively rewritten since.
That's why I tend to think that the cause of the GPU lockup lies
somewhere else.

Looking again at the lspci -vv output, I can read "GART64-" and
"64bit-" in this capability block:

Capabilities: [58] AGP version 2.0
   Status: RQ=80 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit-
   FW+ AGP3- Rate=x1,x2,x4

Are these capabilities related to 64-bit architectures, or not at all?
If they are, should we expect GART64+ and 64bit+ on ia64 systems?

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-07 Thread Alex Deucher
On Sun, Feb 7, 2010 at 12:18 PM, Émeric Maschino
 wrote:
> 2010/2/7 Stephane Marchesin :
>> From what I recall, all the ia64 AGP chipsets (well, the zx1 and the
>> 460) have to be run:
>> - without sideband addressing
>> - without fast writes
>> - at 4x speed
>> otherwise they're unstable.
>>
>> I think by default agpgart puts them at AGP 1x with fast writes...
>
> Without /etc/X11/xorg.conf, AGP is configured as follows:
> - 2x rate
> - fast writes are disabled.
>
> Adding an /etc/X11/xorg.conf to manually set the AGP rate to 4x didn't
> help. Running glxgears triggers the GPU lockup slightly faster than at
> 2x or 1x (the lockup appears in less than 1 sec. vs. ~2-3 sec. at the
> slower rates).
>
> I've no idea about sideband addressing. Is there a way to check
> whether it's enabled or not? And is there a way to disable it?
>
> Just for completeness, Chapter 8.2.3 (AGP Registers) of the HP zx1 ioa
> External Reference Specification
> (http://ftp.parisc-linux.org/docs/chips/zx1-ioa-mercury_ers.pdf) says
> that the zx1 chipset supports:
> - AGP 1x, 2x and 4x data rates
> - fast writes for PIO transactions
> - sideband addressing.

Does AGP work at all on ia64?  I know on some alphas there were cache
coherency issues or something similar that more or less prevented
AGP from being usable at all.  It was mostly there to accommodate AGP
form factor cards.

Alex



Re: 3D OpenGL applications eat CPU resources

2010-02-07 Thread Émeric Maschino
2010/2/7 Émeric Maschino :
> I've no idea about sideband addressing. Is there a way to check
> whether it's enabled or not? And is there a way to disable it?

lspci -vv gives:

80:00.0 VGA compatible controller: ATI Technologies Inc Radeon R300 NG
[FireGL X1] (rev 80) (prog-if 00 [VGA controller])
Subsystem: ATI Technologies Inc Device 0152
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping+ SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
[remainder of the lspci output, including the AGP capability block with
the SBA flag, was mangled by the list archive]


Re: 3D OpenGL applications eat CPU resources

2010-02-07 Thread Émeric Maschino
2010/2/7 Dave Airlie :
>> This would thus narrow my investigation path to the AGP code
>> of the radeon driver, right?
>
> No, it narrows it down to the AGP hardware in your machine, along with
> the probable lack of info on it, and maybe some tweaks that we know
> nothing about.

By AGP hardware, do you mean the chipset or the graphics adapter?

I know nothing about driver development, but it seems to me that the
zx1 chipset is fairly well documented
(http://ftp.parisc-linux.org/docs/chips/zx1-ioa-mercury_ers.pdf). And
judging from the copyright in drivers/char/agp/hp-agp.c, the zx1 driver
was written by Bjorn Helgaas, who works at HP.

About the ATI FireGL X1 graphics adapter: since it's powered by an FGL
9700 GPU and people had to reverse-engineer this range of products,
maybe there are indeed "some tweaks we know nothing about" ;-)

> If it was as simple as a codepath in the radeon driver, I think we'd
> have fixed it by now.

Could something in the radeon driver codepath behave
differently/incorrectly on ia64 systems? As an example of "generic
code" that nevertheless triggers bad behaviour on ia64 systems (only?),
the patch "drm: Preserve SHMLBA bits in hash key for _DRM_SHM mappings"
prevents DRI from being enabled
(http://bugzilla.kernel.org/show_bug.cgi?id=15212).

Back to the radeon driver: would it help if I could get my hands on an
ATI Radeon 9700 graphics adapter? It was probably more widely used by
gamers, and thus better tested by the Linux community, than the
CAD-oriented FireGL X1.

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-07 Thread Émeric Maschino
2010/2/7 Stephane Marchesin :
> From what I recall, all the ia64 AGP chipsets (well, the zx1 and the
> 460) have to be run:
> - without sideband addressing
> - without fast writes
> - at 4x speed
> otherwise they're unstable.
>
> I think by default agpgart puts them at AGP 1x with fast writes...

Without /etc/X11/xorg.conf, AGP is configured as follows:
- 2x rate
- fast writes are disabled.

Adding an /etc/X11/xorg.conf to manually set the AGP rate to 4x didn't
help. Running glxgears triggers the GPU lockup slightly faster than at
2x or 1x (the lockup appears in less than 1 sec. vs. ~2-3 sec. at the
slower rates).

I've no idea about sideband addressing. Is there a way to check
whether it's enabled or not? And is there a way to disable it?

Just for completeness, Chapter 8.2.3 (AGP Registers) of the HP zx1 ioa
External Reference Specification
(http://ftp.parisc-linux.org/docs/chips/zx1-ioa-mercury_ers.pdf) says
that the zx1 chipset supports:
- AGP 1x, 2x and 4x data rates
- fast writes for PIO transactions
- sideband addressing.

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-07 Thread Stephane Marchesin
On Sat, Feb 6, 2010 at 11:47, Émeric Maschino  wrote:
> 2010/2/4 Jerome Glisse :
>> IIRC the old radeon drm doesn't have anything to dump the GPU command
>> stream. Look at http://www.x.org/docs/AMD/R5xx_Acceleration_v1.4.pdf to
>> see what a radeon GPU command stream looks like (PM4 packet stuff).
>> Note that dumping the GPU command stream can quickly eat gigs of data,
>> and finding what is causing the lockup is then very cumbersome,
>> especially as in your case it sounds like a timing issue. You might
>> want to force your card into PCI mode to see if it's AGP related.
>
> Yep, setting Option "BusType" "PCI" in /etc/X11/xorg.conf prevents
> the GPU lockup.
>
> As a side note, strace glxinfo and strace glxgears still give me read()
> errors on /tmp/.X11-unix/X0, so they're probably not related to the GPU
> lockup.
>
> Anyway, I don't know whether this is due to PCI mode or not, but
> OpenGL performance, although there's no more GPU lockup, is poor.
> And serious OpenGL applications, as simulated by the SPECviewperf test
> suite, have very irregular frame rates. If I'm not mistaken, the
> BusType option is specific to the radeon driver (or maybe a few other
> drivers too)? I mean, it's not an X.org-wide configuration option,
> is it? This would thus narrow my investigation path to the AGP code
> of the radeon driver, right?
>

From what I recall, all the ia64 AGP chipsets (well, the zx1 and the
460) have to be run:
- without sideband addressing
- without fast writes
- at 4x speed
otherwise they're unstable.

I think by default agpgart puts them at AGP 1x with fast writes...

Stephane



Re: 3D OpenGL applications eat CPU resources

2010-02-06 Thread Dave Airlie

> Anyway, I don't know whether this is due to PCI mode or not, but
> OpenGL performance, although there's no more GPU lockup, is poor.
> And serious OpenGL applications, as simulated by the SPECviewperf test
> suite, have very irregular frame rates. If I'm not mistaken, the
> BusType option is specific to the radeon driver (or maybe a few other
> drivers too)? I mean, it's not an X.org-wide configuration option,
> is it? This would thus narrow my investigation path to the AGP code
> of the radeon driver, right?

No, it narrows it down to the AGP hardware in your machine, along with
the probable lack of info on it, and maybe some tweaks that we know
nothing about.

If it was as simple as a codepath in the radeon driver, I think we'd
have fixed it by now.

Dave.



Re: 3D OpenGL applications eat CPU resources

2010-02-06 Thread Alex Deucher
On Sat, Feb 6, 2010 at 2:47 PM, Émeric Maschino
 wrote:
> 2010/2/4 Jerome Glisse :
>> IIRC the old radeon drm doesn't have anything to dump the GPU command
>> stream. Look at http://www.x.org/docs/AMD/R5xx_Acceleration_v1.4.pdf to
>> see what a radeon GPU command stream looks like (PM4 packet stuff).
>> Note that dumping the GPU command stream can quickly eat gigs of data,
>> and finding what is causing the lockup is then very cumbersome,
>> especially as in your case it sounds like a timing issue. You might
>> want to force your card into PCI mode to see if it's AGP related.
>
> Yep, setting Option "BusType" "PCI" in /etc/X11/xorg.conf prevents
> the GPU lockup.
>
> As a side note, strace glxinfo and strace glxgears still give me read()
> errors on /tmp/.X11-unix/X0, so they're probably not related to the GPU
> lockup.
>
> Anyway, I don't know whether this is due to PCI mode or not, but
> OpenGL performance, although there's no more GPU lockup, is poor.
> And serious OpenGL applications, as simulated by the SPECviewperf test
> suite, have very irregular frame rates. If I'm not mistaken, the
> BusType option is specific to the radeon driver (or maybe a few other
> drivers too)? I mean, it's not an X.org-wide configuration option,
> is it? This would thus narrow my investigation path to the AGP code
> of the radeon driver, right?

AGP is somewhat broken by design.  There are a lot of subtle
incompatibilities and quirks between different AGP and GPU
combinations.  Your best bet is to play with the AGP options in your
BIOS, or to try adjusting the AGPMode option:
Option "AGPMode" "x"
where x = 1, 2, 4 or 8.
If you find a mode that works, we can add a quirk for your chipset/GPU
combination.
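
For example (illustrative only; the identifier and values are made up,
adjust to your setup):

Section "Device"
    Identifier "FireGL X1"
    Driver     "radeon"
    Option     "AGPMode" "4"    # try 1, 2, 4 and 8 in turn
#   Option     "BusType" "PCI"  # fallback that avoids AGP entirely
EndSection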

Alex



Re: 3D OpenGL applications eat CPU resources

2010-02-06 Thread Émeric Maschino
2010/2/4 Jerome Glisse :
> IIRC the old radeon drm doesn't have anything to dump the GPU command
> stream. Look at http://www.x.org/docs/AMD/R5xx_Acceleration_v1.4.pdf to
> see what a radeon GPU command stream looks like (PM4 packet stuff).
> Note that dumping the GPU command stream can quickly eat gigs of data,
> and finding what is causing the lockup is then very cumbersome,
> especially as in your case it sounds like a timing issue. You might
> want to force your card into PCI mode to see if it's AGP related.

Yep, setting Option "BusType" "PCI" in /etc/X11/xorg.conf prevents
the GPU lockup.

As a side note, strace glxinfo and strace glxgears still give me read()
errors on /tmp/.X11-unix/X0, so they're probably not related to the GPU
lockup.

Anyway, I don't know whether this is due to PCI mode or not, but OpenGL
performance, although there's no more GPU lockup, is poor. And serious
OpenGL applications, as simulated by the SPECviewperf test suite, have
very irregular frame rates. If I'm not mistaken, the BusType option is
specific to the radeon driver (or maybe a few other drivers too)? I
mean, it's not an X.org-wide configuration option, is it? This would
thus narrow my investigation path to the AGP code of the radeon driver,
right?

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-04 Thread Jerome Glisse
On Thu, Feb 04, 2010 at 03:37:58PM +0100, Émeric Maschino wrote:
> 2010/2/3 Stephane Marchesin :
> > No, you are right, they don't trigger an MCA. Hmm, I didn't have any of
> > those back then; my lockups came mostly from the bus...
> 
> Thank you for clarifying this point.
> 
> > Really, if you have such lockups they may also happen on x86. Did you
> > try the card there?
> 
> Yes, I have no problem with this (AGP Pro 4x) graphics adapter (ATI
> FireGL X1) in x86 hardware.
> 
> > At this point your best bet is probably to replay the crashing sequence
> > until you can reduce it to the offending couple of commands.
> 
> OK. Are the commands you're talking about the arguments passed in the
> various ioctl() calls logged when stracing the offending OpenGL
> application? For example, strace glxgears gives lines like:
> ioctl(4, 0xc0106451, 0x6fd52d30) = 0
> ioctl(4, 0xc0186419, 0x6fd52d30) = 0
> ioctl(4, 0x40106459, 0x6fd52d58) = 0
> where 4 is the file descriptor of /dev/dri/card0. Are 0xc0106451,
> 0xc0186419 or 0x40106459 the commands passed to the GPU?
> 
> I don't know if it's related to the GPU lockup or not (I mean, whether
> it's a cause or a consequence), but I've also noticed in the strace
> glxgears logs (and even for a simple application like glxinfo) that
> most of the read() calls on /tmp/.X11-unix/X0 fail, whereas the
> writev() calls seem to succeed:
> poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
> writev(3, [{"\222\0\3\0\4\0\0\0\0\0\0\0", 12}, {NULL, 0}, {"", 0}], 3) = 12
> poll([{fd=3, events=POLLIN}], 1, -1)= 1 ([{fd=3, revents=POLLIN}])
> read(3, "\1\0*\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
> 4096) = 32
> read(3, 0x600093e4, 4096)   = -1 EAGAIN (Resource
> temporarily unavailable)
> where 3 is the file descriptor of /tmp/.X11-unix/X0
> 
> Émeric
> 

IIRC the old radeon drm doesn't have anything to dump the GPU command
stream. Look at http://www.x.org/docs/AMD/R5xx_Acceleration_v1.4.pdf to
see what a radeon GPU command stream looks like (PM4 packet stuff).
Note that dumping the GPU command stream can quickly eat gigs of data,
and finding what is causing the lockup is then very cumbersome,
especially as in your case it sounds like a timing issue. You might
want to force your card into PCI mode to see if it's AGP related.

Cheers,
Jerome



Re: 3D OpenGL applications eat CPU resources

2010-02-04 Thread Émeric Maschino
2010/2/3 Stephane Marchesin :
> No, you are right, they don't trigger an MCA. Hmm, I didn't have any of
> those back then; my lockups came mostly from the bus...

Thank you for clarifying this point.

> Really, if you have such lockups they may also happen on x86. Did you
> try the card there?

Yes, I have no problem with this (AGP Pro 4x) graphics adapter (ATI
FireGL X1) in x86 hardware.

> At this point your best bet is probably to replay the crashing sequence
> until you can reduce it to the offending couple of commands.

OK. Are the commands you're talking about the arguments passed in the
various ioctl() calls logged when stracing the offending OpenGL
application? For example, strace glxgears gives lines like:
ioctl(4, 0xc0106451, 0x6fd52d30) = 0
ioctl(4, 0xc0186419, 0x6fd52d30) = 0
ioctl(4, 0x40106459, 0x6fd52d58) = 0
where 4 is the file descriptor of /dev/dri/card0. Are 0xc0106451,
0xc0186419 or 0x40106459 the commands passed to the GPU?
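
Trying to decode the first one myself from include/asm-generic/ioctl.h
(my reading, so please correct me if the layout is different on ia64):

#include <stdio.h>

/* Split an ioctl request number into the fields defined in
 * include/asm-generic/ioctl.h: dir (2 bits), size (14), type (8), nr (8). */
int main(void)
{
    unsigned long req = 0xc0106451UL; /* the flooding ioctl from my strace log */

    unsigned dir  = (req >> 30) & 0x3;    /* 3 = _IOC_READ | _IOC_WRITE */
    unsigned size = (req >> 16) & 0x3fff; /* size of the argument structure */
    unsigned type = (req >> 8) & 0xff;    /* 0x64 = 'd', the DRM ioctl type */
    unsigned nr   = req & 0xff;           /* per-driver command number */

    printf("dir=%u size=%u type='%c' nr=0x%02x\n", dir, size, (char)type, nr);
    /* prints: dir=3 size=16 type='d' nr=0x51 */
    return 0;
}

If I got that right, 0xc0106451 is a 16-byte read/write ioctl of type
'd' (DRM) with nr 0x51, which would match the [drm:radeon_cp_getparam]
lines in my syslog. So these look more like driver parameter queries
than GPU commands proper?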

I don't know if it's related to the GPU lockup or not (I mean, whether
it's a cause or a consequence), but I've also noticed in the strace
glxgears logs (and even for a simple application like glxinfo) that
most of the read() calls on /tmp/.X11-unix/X0 fail, whereas the
writev() calls seem to succeed:
poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3, [{"\222\0\3\0\4\0\0\0\0\0\0\0", 12}, {NULL, 0}, {"", 0}], 3) = 12
poll([{fd=3, events=POLLIN}], 1, -1)= 1 ([{fd=3, revents=POLLIN}])
read(3, "\1\0*\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
4096) = 32
read(3, 0x600093e4, 4096)   = -1 EAGAIN (Resource
temporarily unavailable)
where 3 is the file descriptor of /tmp/.X11-unix/X0
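
One thought: if the X socket is opened non-blocking (as I believe
modern Xlib/XCB does), a read() with no reply pending is supposed to
fail with EAGAIN. A tiny test, nothing X-specific, that reproduces the
pattern:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>

/* read() on an O_NONBLOCK descriptor with nothing to read fails with
 * EAGAIN; that is expected polling behaviour, not an error. */
int main(void)
{
    int fds[2];
    char buf[16];

    if (pipe(fds) != 0)
        return 1;
    fcntl(fds[0], F_SETFL, O_NONBLOCK);

    if (read(fds[0], buf, sizeof(buf)) < 0 && errno == EAGAIN)
        printf("read: EAGAIN, no data yet\n");

    if (write(fds[1], "x", 1) == 1)
        printf("after write: read returned %zd byte(s)\n",
               read(fds[0], buf, sizeof(buf)));
    return 0;
}

So maybe those failing read() calls are just normal polling, and a red
herring?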

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-03 Thread Stephane Marchesin
On Tue, Feb 2, 2010 at 13:19, Émeric Maschino  wrote:
> 2010/2/1 Stephane Marchesin :
>> If an ia64 machine locks up, it will usually store an MCA telling you
>> why it locked and where in the code this happened.
>> This is how I got ia64 DRI going a bunch of years ago. For what it's
>> worth, most of the bugs were:
>> - PCI resources cast to 32 bits in the DRM
>> - some 32-bit addresses, but that got fixed as a side effect of
>> x86_64 now being supported
>> - large (32- or 64-bit) writes to I/O areas (they should all be 8-bit;
>> ia64 crashes otherwise), either from the kernel or from user space
>>
>> Really, to track those down the MCA errors proved extremely useful.
>> Usually they carry a PCI address and all...
>
> Just to understand: in the present case, I've been told that I'm
> experiencing GPU lockups. I can still log in to the station remotely
> and kill the offending application. So I imagine that's different from
> an ia64 lockup, isn't it? Will an MCA event be triggered at all?
>

No, you are right, they don't trigger an MCA. Hmm, I didn't have any of
those back then; my lockups came mostly from the bus...
Really, if you have such lockups they may also happen on x86. Did you
try the card there?

At this point your best bet is probably to replay the crashing sequence
until you can reduce it to the offending couple of commands.

Stephane



Re: 3D OpenGL applications eat CPU resources

2010-02-02 Thread Émeric Maschino
2010/2/1 Stephane Marchesin :
> If an ia64 machine locks up, it will usually store an MCA telling you
> why it locked and where in the code this happened.
> This is how I got ia64 DRI going a bunch of years ago. For what it's
> worth, most of the bugs were:
> - PCI resources cast to 32 bits in the DRM
> - some 32-bit addresses, but that got fixed as a side effect of
> x86_64 now being supported
> - large (32- or 64-bit) writes to I/O areas (they should all be 8-bit;
> ia64 crashes otherwise), either from the kernel or from user space
>
> Really, to track those down the MCA errors proved extremely useful.
> Usually they carry a PCI address and all...

Just to understand: in the present case, I've been told that I'm
experiencing GPU lockups. I can still log in to the station remotely
and kill the offending application. So I imagine that's different from
an ia64 lockup, isn't it? Will an MCA event be triggered at all?

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-01 Thread Stephane Marchesin
On Mon, Feb 1, 2010 at 13:17, Émeric Maschino  wrote:
> 2010/1/31 Jerome Glisse :
>>> 
>>> Eventually, the strace log is flooded with
>>> ioctl(4, 0xc0106451, 0x6fd530f8) = 0
>>> roughly at the time the CPU load increases. This is consistent with
>>> what is recorded in syslog:
>>> Jan 29 21:16:03 longspeak kernel: [  318.611783] [drm:drm_ioctl],
>>> pid=2426, cmd=0xc0106451, nr=0x51, dev 0xe200, auth=1
>>> Jan 29 21:16:03 longspeak kernel: [  318.611789]
>>> [drm:radeon_cp_getparam], pid=2426
>>> repeated several tens of thousands of times, where 2426 is the glxgears PID.
>>> 
>> You are hitting a GPU lockup, which translates into userspace retrying
>> the same ioctl over and over again, which completely eats all the CPU.
>
> Thank you for clarifying. Does GPU lockup mean that this problem is
> specific to my current hardware configuration? If I try another
> graphics adapter (choices are scarce on ia64), is it possible that I'd
> experience no GPU lockup at all, or a different one?
>
>> There is no easy way to debug a GPU lockup: no way at all other than
>> staring at a GPU command stream, or making wild guesses and testing
>> things randomly.
>
> Just to clarify: I imagine that a GPU command stream is specific to a
> given GPU/driver. Does it mean that the commands sent to the GPU are
> not the same on different Linux platforms (e.g. ia64/r300 vs.
> x86/r300)?
>
> About GPU commands, are they something I can read in the various
> logfiles? Is there some kind of command generator for sending a specific
> command or command stream to the GPU, in order to help determine which
> one is faulty?
>
> I don't know if these are the commands sent to the GPU but, looking
> again at the strace glxgears output I've recorded, I'm getting:
> futex(0x6fd53420,
> FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL,
> 2004d1e8) = -1 EAGAIN (Resource temporarily unavailable)
> and numerous
> read(3, 0x600093e4, 4096)       = -1 EAGAIN (Resource
> temporarily unavailable)
> Should the return value of read() be equal to the count (in bytes, I
> imagine) passed as the third argument? In this case, before getting the
> EAGAIN error, I'm getting the following sequence that seems to shift
> something:
> writev(3, [{"b\0\5\0\f\0\0\0BIG-REQUESTS", 20}], 1) = 20
> poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])
> read(3, "\1\0\1\0\0\0\0\0\1\216\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
> 4096) = 32
> poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
> writev(3, [{"\216\0\1\0", 4}], 1)       = 4
> poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])
> read(3, "\1\0\2\0\0\0\0\0\377\377?\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
> 4096) = 32
> read(3, 0x600093e4, 4096)       = -1 EAGAIN (Resource
> temporarily unavailable)
> poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
> From there, all subsequent pairs of read() calls fail.
> By contrast, in the (old) strace glxgears excerpt posted here
> (http://ubuntuforums.org/showthread.php?t=75007), the read() calls seem
> to always succeed.
>
> Could this be a starting point or not at all?
>

If an ia64 machine locks up, it will usually store an MCA telling you
why it locked and where in the code this happened.
This is how I got ia64 DRI going a bunch of years ago. For what it's
worth, most of the bugs were:
- PCI resources cast to 32 bits in the DRM
- some 32-bit addresses, but that got fixed as a side effect of x86_64
now being supported
- large (32- or 64-bit) writes to I/O areas (they should all be 8-bit;
ia64 crashes otherwise), either from the kernel or from user space

Really, to track those down the MCA errors proved extremely useful.
Usually they carry a PCI address and all...
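
To illustrate the first class of bug (the address here is made up): on
ia64 the PCI apertures can sit above 4 GB, so any 32-bit cast silently
maps the wrong place:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* made-up bus address above 4 GB, as seen on ia64 */
    uint64_t res_start = 0x800000000ULL;

    /* the bug: storing it in a 32-bit variable drops the high bits */
    uint32_t truncated = (uint32_t)res_start;

    printf("real resource: 0x%llx, after the cast: 0x%x\n",
           (unsigned long long)res_start, truncated);
    return 0;
}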

Stephane



Re: 3D OpenGL applications eat CPU resources

2010-02-01 Thread Émeric Maschino
2010/1/31 Jerome Glisse :
>> 
>> Eventually, the strace log is flooded with
>> ioctl(4, 0xc0106451, 0x6fd530f8) = 0
>> roughly at the time the CPU load increases. This is consistent with
>> what is recorded in syslog:
>> Jan 29 21:16:03 longspeak kernel: [  318.611783] [drm:drm_ioctl],
>> pid=2426, cmd=0xc0106451, nr=0x51, dev 0xe200, auth=1
>> Jan 29 21:16:03 longspeak kernel: [  318.611789]
>> [drm:radeon_cp_getparam], pid=2426
>> repeated several tens of thousands of times, where 2426 is the glxgears PID.
>> 
> You are hitting a GPU lockup, which translates into userspace retrying
> the same ioctl over and over again, which completely eats all the CPU.

Thank you for clarifying. Does GPU lockup mean that this problem is
specific to my current hardware configuration? If I try another
graphics adapter (choices are scarce on ia64), is it possible that I'd
experience no GPU lockup at all, or a different one?

> There is no easy way to debug a GPU lockup: no way at all other than
> staring at a GPU command stream, or making wild guesses and testing
> things randomly.

Just to clarify: I imagine that a GPU command stream is specific to a
given GPU/driver. Does it mean that the commands sent to the GPU are
not the same on different Linux platforms (e.g. ia64/r300 vs.
x86/r300)?

About GPU commands, are they something I can read in the various
logfiles? Is there some kind of command generator for sending a specific
command or command stream to the GPU, in order to help determine which
one is faulty?

I don't know if these are the commands sent to the GPU but, looking
again at the strace glxgears output I've recorded, I'm getting:
futex(0x6fd53420,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL,
2004d1e8) = -1 EAGAIN (Resource temporarily unavailable)
and numerous
read(3, 0x600093e4, 4096)   = -1 EAGAIN (Resource
temporarily unavailable)
Should the return value of read() be equal to the count (in bytes, I
imagine) passed as the third argument? In this case, before getting the
EAGAIN error, I'm getting the following sequence that seems to shift
something:
writev(3, [{"b\0\5\0\f\0\0\0BIG-REQUESTS", 20}], 1) = 20
poll([{fd=3, events=POLLIN}], 1, -1)= 1 ([{fd=3, revents=POLLIN}])
read(3, "\1\0\1\0\0\0\0\0\1\216\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
4096) = 32
poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3, [{"\216\0\1\0", 4}], 1)   = 4
poll([{fd=3, events=POLLIN}], 1, -1)= 1 ([{fd=3, revents=POLLIN}])
read(3, "\1\0\2\0\0\0\0\0\377\377?\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
4096) = 32
read(3, 0x600093e4, 4096)   = -1 EAGAIN (Resource
temporarily unavailable)
poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
From there, all subsequent pairs of read() calls fail.
By contrast, in the (old) strace glxgears excerpt posted here
(http://ubuntuforums.org/showthread.php?t=75007), the read() calls seem
to always succeed.

Could this be a starting point or not at all?

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-01-31 Thread Jerome Glisse
On Sun, Jan 31, 2010 at 02:28:39PM +0100, Émeric Maschino wrote:
> Hello,
> 
> I really don't know where to start, so feel free to redirect me to the
> right mailing list if this one is not the correct one.
> 
> [Summary]
> I'm trying to help revive 3D hardware acceleration on the ia64
> architecture. This is a very long story that started in 2006
> (http://bugs.freedesktop.org/show_bug.cgi?id=7770).
> 
> Currently, DRI can't be activated at all because of a regression
> introduced during the kernel 2.6.30 development cycle
> (http://marc.info/?l=linux-ia64&m=126419878611433&w=2). I've bisected
> the regression to commit f1a2a9b6189f9f5c27672d4d32fec9492c6486b2
> (drm: Preserve SHMLBA bits in hash key for _DRM_SHM mappings). Simply
> reverting it from the current kernel source enables DRI again on ia64.
> I've asked the author (David S. Miller) for help several times, through
> the linux-ia64 list and by contacting him directly, but have got no
> answer so far. So I really don't know what to do with this patch. I bet
> that asking for its removal from the kernel source is not an acceptable
> solution, is it?
> [End of summary]
> 
> Anyway, with DRI enabled, I'm now trying to make it work again. My
> ia64 workstation sports an ATI FireGL X1 AGP adapter. I'm using the
> r300 open source driver. As soon as a 3D OpenGL application is
> started (e.g. glxgears), it eats CPU resources within seconds.
> Switching between XAA/EXA acceleration makes no difference. Reducing
> the AGP speed from 2x (set by default when no xorg.conf file is
> present) to 1x has little impact (the offending application takes 3
> sec. rather than 1-2 sec. to eat CPU resources). The system isn't
> locked, as it can be remotely rebooted, but it is really unusable once
> a 3D OpenGL application has started eating CPU. Killing the offending
> application makes the X server eat CPU resources instead. This
> behaviour is consistent with what I noticed one year ago with an older
> X.org X server (http://bugs.freedesktop.org/show_bug.cgi?id=7770#c42),
> so I bet the problem is still there with the current X.org
> implementation (I'm using X.org X Server 1.7.4 on a Debian "Squeeze"
> Testing distribution).
> 
> I don't know what information is useful, so I simply straced glxgears
> with drm.debug=1 passed to the kernel, with my current hardware
> configuration. Eventually, the strace log is flooded with
> ioctl(4, 0xc0106451, 0x6fd530f8) = 0
> roughly at the time the CPU load increases. This is consistent with
> what is recorded in syslog:
> Jan 29 21:16:03 longspeak kernel: [  318.611783] [drm:drm_ioctl],
> pid=2426, cmd=0xc0106451, nr=0x51, dev 0xe200, auth=1
> Jan 29 21:16:03 longspeak kernel: [  318.611789]
> [drm:radeon_cp_getparam], pid=2426
> repeated several tens of thousands of times, where 2426 is the
> glxgears PID. Is this 0xc0106451 command valuable information?
> 
> I don't know if it's informative either, but enabling the sidebar in
> GNOME Shell eats CPU resources too, and syslog is flooded with:
> Jan 30 12:38:26 longspeak kernel: [  325.146380] [drm:radeon_do_cp_idle],
> Jan 30 12:38:26 longspeak kernel: [  325.332672]
> [drm:radeon_do_wait_for_idle], wait idle failed status : 0x84110140
> 0x9C000800
> Jan 30 12:38:26 longspeak kernel: [  325.332676]
> [drm:radeon_do_release], radeon_do_cp_idle -16
> Does this failed status provide a useful starting point?
> 
> Thanks for reading and any advice/suggestion welcome.
> 
> Émeric

You are hitting a GPU lockup, which translates into userspace retrying
the same ioctl over and over again, which completely eats all the CPU.

There is no easy way to debug a GPU lockup: no way at all other than
staring at a GPU command stream, or making wild guesses and testing
things randomly.
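
Roughly speaking, the client side ends up in a loop like this
(illustrative sketch only, using libdrm; the real loop is buried in
Mesa/the DDX, and -EBUSY is the -16 you saw next to radeon_do_cp_idle
in your syslog):

#include <errno.h>
#include <xf86drm.h>     /* libdrm */
#include <radeon_drm.h>  /* header location varies by distro */

/* Wait for the CP to go idle. While the GPU is locked up, the kernel
 * keeps answering -EBUSY, so this never exits and the process spins
 * at 100% CPU -- exactly the symptom you see. */
void wait_cp_idle(int fd)
{
    while (drmCommandNone(fd, DRM_RADEON_CP_IDLE) == -EBUSY)
        ; /* busy-wait, retrying the same ioctl */
}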

Cheers,
Jerome



3D OpenGL applications eat CPU resources

2010-01-31 Thread Émeric Maschino
Hello,

I really don't know where to start, so feel free to redirect me to the
right mailing list if this one is not the correct one.

[Summary]
I'm trying to help revive 3D hardware acceleration on the ia64
architecture. This is a very long story that started in 2006
(http://bugs.freedesktop.org/show_bug.cgi?id=7770).

Currently, DRI can't be activated at all because of a regression
introduced during the kernel 2.6.30 development cycle
(http://marc.info/?l=linux-ia64&m=126419878611433&w=2). I've bisected
the regression to commit f1a2a9b6189f9f5c27672d4d32fec9492c6486b2
(drm: Preserve SHMLBA bits in hash key for _DRM_SHM mappings). Simply
reverting it from the current kernel source enables DRI again on ia64.
I've asked the author (David S. Miller) for help several times, through
the linux-ia64 list and by contacting him directly, but have got no
answer so far. So I really don't know what to do with this patch. I bet
that asking for its removal from the kernel source is not an acceptable
solution, is it?
[End of summary]

Anyway, with DRI enabled, I'm now trying to make it work again. My
ia64 workstation sports an ATI FireGL X1 AGP adapter. I'm using the
r300 open source driver. As soon as a 3D OpenGL application is
started (e.g. glxgears), it eats CPU resources within seconds.
Switching between XAA/EXA acceleration makes no difference. Reducing
the AGP speed from 2x (set by default when no xorg.conf file is
present) to 1x has little impact (the offending application takes 3
sec. rather than 1-2 sec. to eat CPU resources). The system isn't
locked, as it can be remotely rebooted, but it is really unusable once
a 3D OpenGL application has started eating CPU. Killing the offending
application makes the X server eat CPU resources instead. This
behaviour is consistent with what I noticed one year ago with an older
X.org X server (http://bugs.freedesktop.org/show_bug.cgi?id=7770#c42),
so I bet the problem is still there with the current X.org
implementation (I'm using X.org X Server 1.7.4 on a Debian "Squeeze"
Testing distribution).

I don't know what information is useful, so I simply straced glxgears
with drm.debug=1 passed to the kernel, with my current hardware
configuration. Eventually, the strace log is flooded with
ioctl(4, 0xc0106451, 0x6fd530f8) = 0
roughly at the time the CPU load increases. This is consistent with
what is recorded in syslog:
Jan 29 21:16:03 longspeak kernel: [  318.611783] [drm:drm_ioctl],
pid=2426, cmd=0xc0106451, nr=0x51, dev 0xe200, auth=1
Jan 29 21:16:03 longspeak kernel: [  318.611789]
[drm:radeon_cp_getparam], pid=2426
repeated several tens of thousands of times, where 2426 is the
glxgears PID. Is this 0xc0106451 command valuable information?

I don't know if it's informative either, but enabling the sidebar in
GNOME Shell eats CPU resources too, and syslog is flooded with:
Jan 30 12:38:26 longspeak kernel: [  325.146380] [drm:radeon_do_cp_idle],
Jan 30 12:38:26 longspeak kernel: [  325.332672]
[drm:radeon_do_wait_for_idle], wait idle failed status : 0x84110140
0x9C000800
Jan 30 12:38:26 longspeak kernel: [  325.332676]
[drm:radeon_do_release], radeon_do_cp_idle -16
Does this failed status provide a useful starting point?

Thanks for reading; any advice/suggestion is welcome.

Émeric
