Re: PCI Express

2004-02-04 Thread John Dennis
On Tue, 2004-02-03 at 17:37, Marc Aurele La France wrote:
> PCI-Xpress is programmatically identical to PCI, so I don't foresee any
> problems in that regard.

Yes, it's identical to PCI in terms of the interface presented to the OS,
so configuration probably won't be an issue, but there is code in
hw/xfree86/os-support/bus that attempts to walk the bus topology with
explicit knowledge of current PCI bridges. I don't believe this code
will execute correctly, but I assume it can be easily defeated, yes?

But that's probably less of an issue than the fact that PCIE systems
demand new PCIE cards, and that means driver support. If we are fortunate
the new PCIE cards might be programmatically compatible with current
cards and XFree86 will only need to recognize the new cards, but I really
doubt that will be the case. ATI, nvidia, and 3DLabs have all announced
support for PCIE and pledged it will bring dramatic performance
increases. Some of that will be due to the faster bus, but I've got to
believe these new cards will feature architectural changes as well,
demanding new driver support.

Bottom line, I've been asked if XFree86 will be able to support PCIE
systems due to arrive in a few months or if these systems are going to
be dead in the water for open source for anything other than a server.

So I'm digging for what people know or believe the issues are.

John


___
Devel mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/devel


PCI Express

2004-02-03 Thread John Dennis
In a few months PCI Express (PCIE) will hit the streets. My
understanding is that some system vendors are building system boards
without any AGP slots. As far as I know that means only an old style PCI
graphics card will work (PCIE is fully compatible with PCI), or a new
PCIE card. Does anybody know to what extent PCIE will be supported by
XFree86 and in what timeframe? I found a preliminary release note for
the Radeon driver in 4.4 saying that in a PCIE system it will fall back
to PCI and should work. But I'm guessing that DRI and other components
that use AGP will be crippled in this environment, right?

I'm also aware the XFree86 server likes to roam around and touch things
like PCI bridges, I'm wondering how this code will play in a PCIE
system.

Have any of the XFree86 developers been seeded with PCIE systems and
graphics cards, and are they working on this? What is the extent of the
changes anticipated to support PCIE-only systems? And if so, what's the
timeframe?
-- 
John Dennis [EMAIL PROTECTED]



DRI proprietary modules

2003-10-16 Thread John Dennis
For DRI to work correctly there are several independent pieces that all
have to be in sync.

* XFree86 server which loads drm modules (via xfree86 driver module)

* The drm kernel module

* The agpgart kernel module

Does anybody know for the proprietary drivers (supplied by ATI and
Nvidia) which pieces they replace and which pieces they expect to be
there? The reason I'm asking is to understand the consequences of
changing an API. I'm curious about the answer in general, but in this
specific instance the API I'm worried about is the one between the
agpgart kernel module and the drm kernel module. If the agpgart kernel
module modifies its API, will that break things for someone who installs
a proprietary 3D driver? Do the proprietary drivers limit themselves to
the Mesa driver and retain the existing kernel services, assuming the
IOCTLs are the same? Or do they replace the kernel drm drivers as well?
If so, do they manage AGP themselves, or do they use the system's
agpgart driver? Do they replace the system's agpgart driver?

-- 
John Dennis [EMAIL PROTECTED]



Re: [Dri-devel] Deadlock with radeon DRI

2003-10-10 Thread John Dennis
The locking problem is solved; my original analysis was incorrect. The
problem was that DRM_CAS was not correctly implemented on IA64, so this
was an IA64-only issue. This is consistent with the others a Google
search turned up describing the problem: all were on IA64.

I have filed an XFree86 bug report on this. I could not find a DRI
specific bug reporting mechanism other than the dri-devel list.

The IA64 implementation of CAS was this:

#define DRM_CAS(lock,old,new,__ret)                                     \
        do {                                                            \
                unsigned int __result, __old = (old);                   \
                __asm__ __volatile__(                                   \
                        "mf\n"                                          \
                        "mov ar.ccv=%2\n"                               \
                        ";;\n"                                          \
                        "cmpxchg4.acq %0=%1,%3,ar.ccv"                  \
                        : "=r" (__result), "=m" (__drm_dummy_lock(lock))\
                        : "r" (__old), "r" (new)                        \
                        : "memory");                                    \
                __ret = (__result) != (__old);                          \
        } while (0)

The problem was with the data types given to the cmpxchg4
instruction. All of the lock types in DRM are ints, and on IA64 that's
4 bytes wide. The digit suffix in cmpxchg4 signifies that this
instruction operates on a 4 byte quantity. One might expect that since
this instruction operates on 4 byte values and the DRM locks are 4 bytes
everything is fine, but it isn't.

The cmpxchg4 instruction operates this way: 

cmpxchg4 r1=[r3],r2,ar.ccv

4 bytes are read at the address pointed to by r3, and that 32 bit value
is then zero extended to 64 bits. The 64 bit value is then compared to
the 64 bit value stored in application register CCV. If the two 64 bit
values are equal then the least significant 4 bytes in r2 are written
back to the address pointed to by r3. The original value pointed to by
r3 is stored in r1. The entire operation is atomic.
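The comparison semantics above can be modeled in portable C (this is an
illustrative sketch, not the real IA64 asm; the function name is made up):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative C model of cmpxchg4's comparison: the 4 bytes at mem
 * are zero-extended to 64 bits and compared against the full 64-bit
 * ar.ccv value before the store is allowed. */
static uint32_t model_cmpxchg4(uint32_t *mem, uint64_t ccv, uint32_t new_val)
{
    uint32_t old = *mem;            /* read 4 bytes, zero-extend */
    if ((uint64_t)old == ccv)       /* 64-bit wide comparison */
        *mem = new_val;             /* store only on a full match */
    return old;                     /* original value, as cmpxchg4 does */
}
```

With garbage in the upper 32 bits of ccv the compare fails even though
the low 32 bits match, which is exactly the failure mode described
below.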

The mistake in the DRM_CAS implementation is that the comparison is 64
bits wide; thus the value stored in ar.ccv (%2 in the asm) must be 64
bits wide, and for us that means zero extending the 32 bit old
parameter to 64 bits.

Because of the way GCC asm blocks work to tie C variables and data
types to asm values, the promotion of old from unsigned int to
unsigned long was not happening. Thus when old was stored into
ar.ccv its most significant 32 bits contained garbage. (Actually,
because of the way GCC generates constants, it turns out the upper 32
bits were 0xffffffff; this came from the OR of DRM_LOCK_HELD, which is
defined as 0x80000000, but the compiler generates a 64 bit OR
operation using the sign-extended immediate 0xffffffff80000000, which
is legal because the upper 32 bits are undefined on int (32 bit)
operations.)

The bottom line is that the test would fail when it shouldn't because
the high 32 bits in ar.ccv were not zero.

One might think that because old was assigned to __old in a local
block which was unsigned int, the compiler would know enough when using
this value in the asm to have zero extended it. But that's not true;
in asm blocks it's critical to define the asm operand correctly so the
compiler can translate between the C code variable and what the asm
code is referring to.

The line:

: "r" (__old), "r" (new)

says %2 is mapped by "r" (__old), in other words put __old in a
general 64 bit register. We've told the compiler to put 64 bits of
__old into a register, but __old is a 32 bit value with its high
order 32 bits undefined. We need to tell the compiler to widen the
type when assigning it to a general register; thus the asm template
type definition needs to be modified with a cast to unsigned long.

: "r" ((unsigned long)__old), "r" (new)

Only with this will the compiler know to widen the 32 bit __old value
to 64 bits inside the asm code.
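The difference between the two widenings can be shown in portable C
(an illustrative sketch; the helper names are made up, and the broken
path is modeled with an explicit signed cast):

```c
#include <assert.h>
#include <stdint.h>

/* Widening a 32-bit lock word through a signed type sign-extends,
 * filling the upper 32 bits with 0xffffffff -- the garbage that ended
 * up in ar.ccv -- while an explicit unsigned cast zero-extends, which
 * is what the (unsigned long) cast in the fixed constraint achieves. */
static uint64_t widen_signed(int32_t v)   { return (uint64_t)(int64_t)v; }
static uint64_t widen_unsigned(int32_t v) { return (uint64_t)(uint32_t)v; }
```

For a lock word with the DRM_LOCK_HELD bit set, only the zero-extended
form matches what cmpxchg4 produces from memory.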

Thanks to Jakub Jelinek who helped me understand the nuances of GCC
asm templates and type conversions.

As a minor side note, definitions of bit flags should be tagged as
unsigned. Thus things like:

#define DRM_LOCK_HELD  0x80000000
#define DRM_LOCK_CONT  0x40000000

should really be:

#define DRM_LOCK_HELD  0x80000000U
#define DRM_LOCK_CONT  0x40000000U

John




Deadlock with radeon DRI

2003-10-02 Thread John Dennis
[Note: this is cross posted between dri-devel and [EMAIL PROTECTED] ]

I'm trying to debug a hung X server problem with DRI using the radeon
driver. Sources are XFree86 4.3.0. This happens to be on ia64, but at
the moment I don't see anything architecture specific about the problem.

The symptom of the problem is the following message from the drm
radeon kernel driver:

[drm:radeon_lock_take] *ERROR* x holds heavyweight lock

where x is a context id. I've tracked the sequence of events down to
the following:

DRIFinishScreenInit is called during the radeon driver initialization,
inside DRIFinishScreenInit is the following code snippet:

/* Now that we have created the X server's context, we can grab the
 * hardware lock for the X server.
 */
DRILock(pScreen, 0);
pDRIPriv->grabbedDRILock = TRUE;

Slightly later on RADEONAdjustFrame is called and it does the following:

#ifdef XF86DRI
if (info->CPStarted) DRILock(pScrn->pScreen, 0);
#endif

It's this DRILock which is causing the *ERROR* x holds heavyweight
lock message. The reason is that both DRIFinishScreenInit and
RADEONAdjustFrame are executing in the server and using the server's
DRI lock. DRIFinishScreenInit never unlocks; it sets the
grabbedDRILock flag, but big deal, no one ever references this flag.
When RADEONAdjustFrame calls DRILock, the lock is already held because
DRIFinishScreenInit locked and never unlocked. On the second lock call
the dri kernel driver then suspends the X server process
(DRM(lock_take) returns zero to DRM(lock) because the context holding
the lock and the context requesting the lock are the same, and this
causes DRM(lock) to put the X server on the lock wait queue). Putting
the X server on the wait queue waiting for the lock to be released
then deadlocks the X server, because it's the process holding the lock
on its context.
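The sequence above can be captured in a toy model (illustrative only,
not the real DRM code; the type and function names are made up):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the lock_take behavior described above: taking a lock
 * that is already held fails and queues the caller -- even when the
 * requesting context is the one already holding it. */
typedef struct { bool held; int owner; } toy_lock;

static bool toy_lock_take(toy_lock *l, int ctx)
{
    if (!l->held) {
        l->held = true;
        l->owner = ctx;
        return true;        /* lock acquired */
    }
    return false;           /* held (even by ctx itself): caller waits */
}
```

The X server (context 1) takes the lock in DRIFinishScreenInit, then
RADEONAdjustFrame takes it again and is queued behind itself: deadlock.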

Questions:

The whole crux of the problem seems to me to be the taking and holding
of the lock in DRIFinishScreenInit. Why is this being done? I can't see
a reason for it. Why does it set a flag indicating it's holding the
lock if nobody examines that flag?

Is suspending a process that already holds a lock during a lock request
really the right behavior? Granted, a process that's trying to lock
twice without an intervening unlock is broken, but do we really want to
deadlock that process?

Any other insights to this issue?

FWIW, I googled for this error and came up with several folks who,
starting around last spring, began seeing the same problem, but none
of the mail threads had a follow-up solution.

Thanks,

John




Re: setjmp needs fixing again, here's the issues

2003-09-11 Thread John Dennis
On Wed, 2003-09-10 at 20:11, David Dawes wrote: 

John> wrapper. So as long as we've already lost module
John> independence by virtue of linking the system function why
John> not go all the way and use the system definition of the
John> system function's argument? It seems like

David> We haven't lost module independence by doing that.

Perhaps I'm misunderstanding module independence, can you confirm or
deny some of the assumptions I've been working with:

1) System specific functions are wrapped in an xf86* wrapper so that:

   a) system differences are isolated at a single point which provides
  a common interface to the rest of the server

   b) all system specific functions are wrapped and live in the
  executable XFree86. The XFree86 executable is linked against
  the system libraries and hence the XFree86 executable is not
  system independent.

2) module independence derives from the following:

   a) modules only link against (e.g. import) from the XFree86
  executable that loaded it. In other words all system
  specific elements are contained in the executable, not a
  loaded module.

   b) The executable and modules share the same linking,
      relocation, exception, and call conventions. This makes
      a module mostly system independent, but not 100%; e.g.
      it's possible, but not common, for different OSes on the same
      architecture to have different ABIs, though this is less common
      today as the industry converges on standardized ABIs.

Is the above basically correct?

If so, then by directly linking a reference to the system's setjmp call
in a module haven't we violated the notion that only the main
executable has dependencies on the system libraries? The module is now
making a call whose parameters and behavior are specific to the
system. Actually, more correctly, it's specific to how the main
executable was linked (which is system specific); see below for why
this is true.

David> All that matters is that the jmp_buf type be defined on
David> each architecture in such a way that it meets the
David> requirements of all the supported OSs on that architecture.
David> Newer architectures are more likely to have a
David> platform-independent ABI anyway than was necessarily true
David> in the past.  If the platform's setjmp.h can define the
David> type with the correct alignment, then so can we.  It
David> doesn't matter if the methods for doing this are
David> compiler-specific.  If the 128-bit alignment is an IA64 ABI
David> requirement, then I'd expect that all compilers for IA64
David> will have a method for defining jmp_buf with the correct
David> alignment.

If I follow your reasoning then you would be happy with this
definition in xf86_libc.h:

/* setjmp/longjmp */
#if defined(__ia64__)
typedef int xf86jmp_buf[1024] __attribute__ ((aligned (16))); /* guarantees 128-bit alignment! */
#else
typedef int xf86jmp_buf[1024];
#endif
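As a sanity check, GCC's __alignof__ extension can confirm that the
attribute actually takes effect (a minimal GCC-specific sketch; the
typedef names are illustrative):

```c
#include <assert.h>

/* GCC-specific: the aligned attribute on a typedef carries through
 * to every object declared with that type. */
typedef int jb_aligned[1024] __attribute__ ((aligned (16)));
typedef int jb_plain[1024];

static unsigned long align_of_aligned(void) { return __alignof__(jb_aligned); }
static unsigned long align_of_plain(void)   { return __alignof__(jb_plain); }
```

The attributed type gets 16-byte alignment, while the plain array only
inherits int's natural alignment.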

But if one is going to special case the definition of jmp_buf based on
architecture, why not use the system definition and typedef xf86jmp_buf
to be jmp_buf? I suspect the answer will be that the system libraries
could change, requiring a different size buffer. I'd buy that if it
weren't for the fact that we are directly linking the library specific
version of setjmp into the module.

Why a module referencing setjmp is tied to a specific system library:
-

As far as I can tell we've tied the module to a specific system
library. Here is why I believe this. I'm going to simplify the
discussion by assuming there is only one xf86set_jmp symbol and ignore
the type 0 and type 1 variants that select setjmp or sigsetjmp.

1) a module makes a reference to xf86set_jmp.

2) the xfree86 loader when it loads that module searches for that
   symbol in its hash table of symbols, that table was populated in
   part by the table in xf86sym.c which in a loadable build contains
   this definition in the main executable:

   #include <setjmp.h>
   SYMFUNCALIAS("xf86setjmp", setjmp)

3) the above definition means that when the symbol name xf86setjmp is
   looked up by the loader it will get the address of the setjmp that
   was linked into the main executable. This is the function address
   that the xfree86 loader will insert into the module during its
   relocation phase.

4) How does the address of setjmp get into the main executable? It
   depends on whether the main executable was statically or
   dynamically linked, but in either case the system will assure it
   comes from a specific version of the library defined at the time
   the main executable was linked.

5) Therefore, when a module that referenced setjmp is called, it's
   calling the system version of setjmp in the exact library against
   which the main executable was linked.

I don't see how the above satisfies the conception of module
independence and by extension the avoidance of using the

Re: setjmp needs fixing again, here's the issues

2003-09-11 Thread John Dennis
On Thu, 2003-09-11 at 14:35, David Dawes wrote:
> What's the difference between this in the core executable:
>
>   xf86A(pointer data)
>   {
>      return A(data);
>   }
>
>   SYMFUNC(xf86A)
>
> and this:
>
>   SYMFUNCALIAS("xf86A", A)

The difference is that xf86A may massage data in some manner specific
to the libraries the main executable is linked against. Examples might
include parameters to mmap, where the base address and size have to be
adjusted for page boundaries and the mapping flags passed, or anything
passed into an ioctl. I don't think the current set of wrapped
functions includes anything of this nature, but I thought it was the
principle behind it.

> Module independence comes from providing a more uniform ABI to the
> modules than you might get by linking directly with the functions provided
> by the system libraries.

Isn't that what I'm saying above? XFree86 imposes its own interface,
leaving the wrappers to translate when the underlying system differs.

Perhaps just as importantly, since the wrappers are in the main
executable, which is the ONLY place loadable object dependencies are
enforced, it's the only place you can be guaranteed that you'll be
linked to the version of the shared object you're expecting!

> > If I follow your reasoning then you would be happy with this
> > definition in xf86_libc.h
>
> Yes.

O.K. I'll make this patch and put it in the bugzilla. I'm not completely
happy with it, because it's not really addressing the root problem.
Setjmp may blow up on some other system in the future because we're
ignoring the system defined requirements. For example, in the non-ia64
case xf86jmp_buf will only be aligned on int boundaries; what if a
system needs long alignment, or some other requirement?

> I'll grant that what we're doing with setjmp is a little hairy, but I
> haven't seen anything so far that says the basic approach doesn't work
> or impacts the portability of modules.

Yes, you are right that setjmp is a true exception, because it can't be
wrapped to isolate system specifics outside the module. And you're
also right it's a bit hairy; it took me a while to fully understand
exactly what was going on.

As for impacting portability of modules, why isn't the following true?

libc version 1.0 requires 8  byte alignment, server XFree86 A is linked
against this library.

libc version 1.1 requires 16 byte alignment, server XFree86 B is linked
against this library.

The module in theory can be loaded into either XFree86 server A or B.
If the module was built against the libc 1.0 requirement it will fail
when loaded into XFree86 B. This is because, unlike the system loader,
which enforces the right version of the library to be loaded when
XFree86 is loaded by the system, the module loader does not deal with
version dependencies, in part because the version dependencies have in
theory been isolated in the main executable image via wrappers and
enforced by the system loader. The current setjmp implementation
violates all these assumptions, which is why I conclude it negatively
impacts module portability. It is also true that if setjmp were in a
wrapper the module could be loaded into XFree86 server A or B without
issue, but that's not the case; it will crash the server if loaded in
XFree86 server B.

Granted, if we are lucky this may never bite us; there is a reasonable
chance it won't. But on the other hand, to assert we haven't lost
module independence and portability is not being honest about the new
situation with modules that import xf86set_jmp.

I'm not saying I have an answer to the problem we've created for
ourselves, but I am saying we need to be honest about the impact: any
module that imports xf86set_jmp is no longer portable and is tied to
the libraries the main XFree86 is linked against, and as such we might
as well use the system definitions of jmp_buf, because that's the
robust solution and is no more, no less portable than inventing a
jmp_buf abstraction that inherently is not robust. So if you're not
getting portability and independence with the abstraction, why
sacrifice robustness?






setjmp needs fixing again, here's the issues

2003-09-10 Thread John Dennis
I have found the problem with XFree86's implementation of setjmp on
ia64. It occurs because we are ignoring the type definition of
setjmp's argument.

   int setjmp(jmp_buf env);

In xf86_libc.h we redefine jmp_buf thusly:

/* setjmp/longjmp */
typedef int xf86jmp_buf[1024];

#undef jmp_buf
#define jmp_buf xf86jmp_buf

Based on the discussions that occurred last February and March, I
believe this was done to preserve a loadable module's system
independence. The notion being that each system may have a different
jmp_buf size; we can't know a priori what that is, so make it bigger
than we think is necessary and it should be big enough no matter what
system the module is loaded on.

However, by ignoring the system defined type for jmp_buf we've ignored
the system specific alignment requirement of jmp_buf. On ia64 it must
be 16 byte aligned, because when setjmp saves the execution context on
ia64 it uses machine instructions that write the contents of the 16
byte floating point registers, and the destination address must be 16
byte aligned or a SIGBUS fault is signaled.

On ia64 you can see in /usr/include/bits/setjmp.h the following
jmp_buf definition:

/* the __jmp_buf element type should be __float80 per ABI... */
typedef long __jmp_buf[_JBLEN] __attribute__ ((aligned (16))); /* guarantees 128-bit alignment! */

I did an experiment where I defeated the use of xf86jmp_buf and
instead used the system's definition of jmp_buf, and the SIGBUS problem
went away. Earlier I had noted that a static server did not exhibit
the problem; that's because xf86jmp_buf is only used in a loadable
build.

How do we fix this?
---

1) It may be hard to know the alignment requirements for all OSes and
architectures (that's part of the reason we have system header
files). We could guess that 16 bytes is sufficient. But even if we got
the alignment requirement right, how do we specify it to the compiler
in a portable way such that it will build on a variety of systems and
compilers? My understanding is that compiler alignment directives are
not portable (e.g. not part of ANSI C). Are we comfortable picking one
or two common alignment directives and conditionally compiling with
the right one?

2) We cannot force alignment greater than that of the largest basic
type by creating artificial structs; that requires compiler
directives. This is because in C a struct is aligned to the alignment
of its widest basic member, which on ia64 is 8 (long, double, etc.),
half the required alignment. Plus such a scheme would put us back into
guessing type sizes, alignment requirements, etc. It would not be
robust.
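The cap on struct alignment can be demonstrated directly (a hedged
sketch using GCC's __alignof__; the struct layout is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* An artificial struct built only from basic types: its alignment is
 * capped at that of its widest member, 8 for long/double on ia64 (and
 * on x86-64), so it can never reach 16 without a compiler directive. */
struct fake_jmp_buf { double d[64]; long l[32]; };

static size_t fake_jmp_buf_align(void)
{
    return __alignof__(struct fake_jmp_buf);
}
```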

3) We could use the system's definition of jmp_buf. That means it
will always be correct on the system it was compiled on, but that
module may be loaded on another system with a potentially different
jmp_buf definition and cause problems. I realize we have a goal of
system independence for loadable modules, but do we really expect
modules built on one system to be loaded on a significantly different
system? They are, after all, binary modules. I also realize that
setjmp is currently the only thing in our loadable modules which is
not wrapped by a core server function, so it would be kind of a shame
to let this one item pollute module independence, but we have no
choice: the loader now links the system implementation of setjmp and
not a wrapper. So as long as we've already lost module independence by
virtue of linking the system function, why not go all the way and use
the system definition of the system function's argument? It seems we
would not have lost anything, and would have picked up the robustness
we've given up by trying to use a system neutral definition of the
jmp_buf argument.

Given the argument in 3 above, I'd recommend taking out xf86jmp_buf and
putting back in #include <setjmp.h>. It seems the simplest, most
robust, and in practical terms sacrifices little.

Comments?

I'm going to file a bugzilla on this; it's very definitely broken on
ia64 and causes the server to crash. I will put the text of this in
the bugzilla.

John





setjmp wrappers

2003-09-05 Thread John Dennis
Remember last February and March there was a big discussion about
xf86setjmp? Part of that discussion involved a SIGSEGV or SIGBUS in the
freetype2 code when a font was not found. Well, I'm seeing the same
thing (on ia64). At the time it was pointed out that one can never wrap
setjmp, because setjmp has bad behavior if it's called from within a
function that returns, which is exactly what a wrapper is.

The setjmp code was reworked a couple of months ago (in part to account
for libc differences). The current implementation has these code
fragments:

ftstdlib.h:
---

#define DONT_DEFINE_WRAPPERS
#define DEFINE_SETJMP_WRAPPERS
#include "xf86_ansic.h"
#undef DONT_DEFINE_WRAPPERS

xf86_libc.h:


#if defined(XFree86LOADER) && \
    (!defined(DONT_DEFINE_WRAPPERS) || defined(DEFINE_SETJMP_WRAPPERS))
#undef setjmp
#define setjmp(a)   xf86setjmp_macro(a)
#undef longjmp
#define longjmp(a,b)xf86longjmp(a,b) 
#undef jmp_buf
#define jmp_buf xf86jmp_buf
#endif


As far as I can tell, when one builds the freetype module for the
loadable server you're not going to get any wrappers except for setjmp,
and this is bad because you can't wrap setjmp.

As a trial I built both a static and a loadable server. The static
server ran fine, but the loadable server dies with a SIGBUS in the
freetype code when a setjmp/longjmp is executed. Pretty much what I
expected.

I'm wondering if I'm missing something, but it's a fact that you can't
wrap setjmp, right? And why would you turn off all wrappers except
setjmp? Is this really right?

I reread the original discussion, and part of it had to do with how to
implement setjmp in modules that are supposed to be system neutral
(i.e. must use wrappers) when one has this exception of a clib function
that can't be wrapped. Whatever came of that?
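Why a function wrapper around setjmp is broken can be sketched in a few
lines (an illustrative sketch; the function names are made up):

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;

/* BROKEN (illustration only, never call it):
 *   static int setjmp_wrapper(void) { return setjmp(env); }
 * Once setjmp_wrapper returns, the context it saved refers to a dead
 * stack frame, so a later longjmp(env, ...) is undefined behavior.
 * The only safe pattern is a direct call in a frame that is still
 * live when longjmp fires: */
static int round_trip(void)
{
    int rc = setjmp(env);   /* direct call: this frame stays live */
    if (rc == 0)
        longjmp(env, 42);   /* resumes at the setjmp above, returning 42 */
    return rc;
}
```

This is why the macro approach (`#define setjmp(a) xf86setjmp_macro(a)`)
has to expand in the caller's own frame rather than call through a
wrapper function.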

-- 
John Dennis [EMAIL PROTECTED]



Re: *** xf86BiosREAD, IA64, Multi-card Init ***

2003-08-28 Thread John Dennis
On Thu, 2003-08-28 at 12:46, Marc Aurele La France wrote:
> Secondly, EFI is already doing the wrong thing by marking PCI ROMs as
> non-cacheable.  This doesn't inspire confidence...

I believe there is a difference between ROMs being logically cacheable
and the way the ZX1 actually wires that memory into the memory system.
The ZX1's connections to PCI devices are always non-cached. It's a
simplifying assumption, correct in almost all cases, with little
penalty for the rare case of cacheable memory sitting out in MMIO
space. Therefore EFI is not doing the wrong thing by marking ROM as
non-cacheable; the ZX1 is going to treat any PCI address as non-cached
by design.




Re: *** xf86BiosREAD, IA64, Multi-card Init ***

2003-08-28 Thread John Dennis
On Thu, 2003-08-28 at 16:40, Marc Aurele La France wrote:
> > ... which basically means that framebuffers cannot benefit from CPU
> > caching.  I don't believe this to be the case.
>
> Further to this, it appears you don't realise that the framebuffers we're
> talking about here, _are_ in PCI space.

Yes, I realize framebuffers are in PCI space. All I can do is make the
following observations:

1) I was told by HP: "The ZX1 chipset doesn't support cacheable
access to any MMIO space, regardless of whether that space happens
to contain RAM, ROM, device CSRs, etc." This statement seems clear to
me; do you have a different interpretation?

2) Write combining is a typical memory attribute to apply to a
framebuffer, and on IA64 the write combining memory attribute is also
non-cacheable.

3) Would you really want caching on framebuffer memory in the presence
of a graphics co-processor that can alter the memory independently of
the cache? 

4) Considering the universe of PCI devices, it does not seem outlandish
to believe the memory regions on these devices either have side effects
or can be modified by the device; either case would demand non-caching.
It would be very hard, as has been pointed out, for the firmware to
know which memory regions on a device could be cached safely. Thus a
decision to treat every PCI memory region as non-cached sounds like a
plausible design decision.



Re: *** xf86BiosREAD, IA64, Multi-card Init ***

2003-08-27 Thread John Dennis
On Wed, 2003-08-27 at 06:15, Egbert Eich wrote:
> Apparently there are still issues with VGA framebuffer and emulated
> PIO register writes when saving and restoring fonts. These problems
> only affect certain cards (so far I've only heard of Nvidia cards).
> Mark Vojkovich is sure that these are related to caching problems,
> too.
>
> Have you made any progress investigating this issue?

I have looked at this, here is what I know (or think I know :-)

* The vga font save/restore seems unrelated to caching.

  - The vga framebuffer (where the fonts live) has always been mapped non-cached.
    The EFI on HP ZX1 would also have set this region up as non-cached.
    We have verified by inspecting the TLB that it is non-cached.
    The kernel patch for non-caching (which should not have mattered given the
    above) does not affect the font save/restore, nor did I expect it to.
    Everything seems right.

* The vga font save/restore seems related to timing, specifically too many
  back to back bus transactions to the VGA framebuffer.

  - Adding more delays into SlowBCopy makes the problem go away.

  - The symptom of the problem is that the bridge appears to be in a master
    abort error state. This would occur if the vga did not respond fast
    enough.

* So far we've only seen this problem with nvidia (but see next item).

  - This suggests the problem is specific to the nvidia vga implementation
and the speed at which it responds.

* SlowBCopy seems only to be used for VGA font/save restore.

  - The whole idea of SlowBCopy bothers me from a technical perspective.

  - SlowBCopy has been around a while, thus this is not the first time
folks have run into this problem.

  - No one seems to know anymore when or why it's needed, but it seems to have
    first appeared on DEC ALPHA systems and seems unnecessary on standard PC
    class machines. It is an interesting correlation that both ALPHA and IA64
    had fast bus transaction design goals.

Mark, is there a hardware engineer at nvidia who would be familiar with
the VGA timing? Is it possible your VGA is running near or beyond the
limits of the PCI timing requirements? My understanding is that the ZX1
is a very aggressive chipset tuned for high performance, so it's
possible that on a standard PC you may not have seen the timing
problems, even if you were close to the limits.

John




Re: *** xf86BiosREAD, IA64, Multi-card Init ***

2003-08-26 Thread John Dennis
I just wanted to follow up on this for those who are maintaining the
memory mapping code in the CVS tree; this is linux ia64 specific and
possibly HP ZX1 specific. As of today this is the best information I
have, and I wanted to share it.

Originally the MAP_NONCACHED flag passed to mmap caused non-cached
access. That was deprecated in favor of the O_SYNC flag passed to the
open of /dev/mem. As of linux kernels 2.5.xx, or those which have been
patched, these flags are both deprecated and ignored. At some point
os-support/linux/lnx_video.c should clean up the use of these flags.
There are a few other places in the tree where these flags are used as
well.

However, for the time being the use of these flags should remain, as
people may be running on kernels without the automatic detection of the
caching attribute.
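The historical mapping idiom looks roughly like this (an illustrative
sketch, not the actual lnx_video.c code; the helper names are made up):

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Round a physical address down to a page boundary, as any /dev/mem
 * mapping must. */
static off_t page_base(off_t phys, long pagesz)
{
    return phys & ~((off_t)pagesz - 1);
}

/* Sketch of the historical idiom: O_SYNC on /dev/mem asked the kernel
 * for a non-cached mapping; newer kernels ignore the hint and choose
 * the attribute matching the EFI memory map themselves. */
static void *map_phys(off_t phys, size_t len)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    off_t base = page_base(phys, pagesz);
    size_t pad = (size_t)(phys - base);

    int fd = open("/dev/mem", O_RDWR | O_SYNC); /* historical non-cached hint */
    if (fd < 0)
        return NULL;
    void *m = mmap(NULL, len + pad, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, base);
    close(fd);
    return m == MAP_FAILED ? NULL : (char *)m + pad;
}
```

Running map_phys requires root access to /dev/mem; the page rounding
helper is what matters for the offset arithmetic.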

The critical issue is that the caching attribute the kernel sets in the
TLB must match the EFI (firmware) setup. It has been a fortunate
consequence that when XFree86 passed the non-caching flags it was for
memory regions that EFI had configured to be non-caching, and no
problem was observed; the non-caching flag caused the kernel to set the
TLB correctly. Kernels that ignore the flag will still set the TLB
correctly via an automatic mechanism.

EFI will have created non-cached mappings with the HP ZX1 chipset for
ALL PCI memory regions since the ZX1 only supports non-cached on IO.

Any time MMIO was specified as a mapping flag in the X server, the
ia64 code would have requested non-cached; this is done for all register
mappings and the VGA framebuffer (because write combining was avoided on
banked memory). If the FRAMEBUFFER flag is passed then write combining
is selected instead of non-cached, but the framebuffer should also have
been set up by the firmware as non-cached. I would expect this to have
caused an inconsistency between the TLB entry and the ZX1 chipset,
possibly leading to MCAs, but this has not been observed, so I can't
explain that one.

The only other example I have heard of where XFree86 was requesting a
cached or non-cached mapping that was not consistent with EFI is the
mapping of the ROM BIOS when using the ROM base address register in the
PCI config. EFI treats that memory as non-cached because it's PCI, and
the non-cached flag was not passed in xf86ReadBIOS. Cached ROM BIOS
reads at the standard video ROM address 0xC0000 were fine because that
is a shadow image in RAM, which is cached. So far I can only find source
code that passes 0xC0000 to xf86ReadBIOS, but that does not mean there
isn't code lurking somewhere that uses the PCI ROM BAR.

The MAP_WRITECOMBINED flag passed to mmap is now also deprecated in
Linux as of 2.4.19, for two reasons: its lack of support on most ia64
platforms, and the memory attribute aliasing problems that originally
caused the MAP_NONCACHED and O_SYNC flags to be removed. The flag can
still be specified, but will be ignored. Note that all ia64 GARTs are
now run in coherent mode, so there is no need for write combining.

John






Re: *** xf86BiosREAD, IA64, Multi-card Init ***

2003-08-26 Thread John Dennis
On Tue, 2003-08-26 at 10:22, Marc Aurele La France wrote:
 Frankly, I don't see how this EFI MDT can be accurate given that, in
 general, whether or not a particular PCI memory assignment will tolerate
 caching and/or write-combining is highly device-specific.  That would be a
 horrific PCI device database for EFI to maintain.

My understanding (albeit limited) from correspondence with HP is that
it is the chipset that determines this. Current HP ia64 systems use the
ZX1 chipset, which only does non-cached access on PCI. Thus if the
firmware knows there is a ZX1 chipset, it knows how the memory region
will be handled. Apparently EFI also seems to know about GARTs. Intel
boxes use a different chipset but share EFI and the kernel code, so
they should behave the same.

I'm leery of misinterpreting some of the correspondence, so here is the
actual text from some of it; if you draw different conclusions let me
know.

John What had been confusing at this end is the notion that ROM by
John definition can never be incoherent, therefore cached
John vs. non-cached should be irrelevant.

HP Ah, I see that point.  The problem in this case is that the
HP chipset may support different types of access to different
HP regions.  (That's part of what the EFI MDT tells you.)  The ZX1
HP chipset doesn't support cacheable access to any MMIO space,
HP regardless of whether that space happens to contain RAM, ROM,
HP device CSRs, etc.

John The problem, if I understand correctly, is that the access was
John not consistent with the EFI table defining what the memory
John attribute needed to be. Bottom line, it's EFI that determines
John cacheability, not a priori knowledge of the memory characteristics
John being mapped.

HP Right.  And the underlying chipset determines what goes in the EFI
HP table.

-

John This made me think about the MAP_WRITECOMBINED flag passed to
John mmap. The current scheme has EFI responsible for determining the
John cached vs.  non-cached memory attribute, the user should not be
John specifying such a memory attribute. Write combining or write
John coalescing is one variant of a non-cached memory attribute on
John ia64 that is used for frame buffers. My understanding is that
John EFI will be ignorant of this memory attribute and the
John MAP_WRITECOMBINED remains a valid flag to pass. I also assume
John that passing this flag replaces an ma value of 4 (uncached) with
John 6 (uncached write coalescing) and that such a replacement,
John outside the scope of the EFI MDT is valid because both share the
John uncached attribute. Correct?

HP The EFI MDT does have a bit for WC, but I don't know of any ia64
HP platform that sets it.  I removed support for MAP_WRITECOMBINED in
HP the 2.4.19 patch because we don't have a good way of using it
HP safely.  (Even if there were an ia64 platform that claimed to
HP support it, we'd have to be very careful to avoid the attribute
HP aliasing problem.  The fact that kernel identity mappings don't
HP have page tables and are inserted in 64Mb (or maybe 16Mb) chunks
HP means that we'd have to somehow ensure that a 64Mb chunk that
HP contained a WC mapping could never also be mapped WB.)

HP From the user side, MAP_WRITECOMBINED can still be specified; we
HP just ignore it in the kernel.  This used to be used for AGP, but
HP all ia64 GARTs are now run in coherent mode, so there's no need
HP for WC.

HP It is possible for multiple attributes to be supported, so it's
HP conceivable that we could look at user requests for special
HP mappings.  For example, the i2000 supports both UC and WB mappings
HP of main memory.  But at the moment, I don't think there's an
HP actual need for such a feature, and the fact that we don't have
HP kernel page tables would complicate adding it.




Re: *** xf86BiosREAD, IA64, Multi-card Init ***

2003-08-26 Thread John Dennis
On Tue, 2003-08-26 at 13:40, Egbert Eich wrote:
 Marc Aurele La France writes:
   On Tue, 26 Aug 2003, Egbert Eich wrote:
   
   
   Frankly, I don't see how this EFI MDT can be accurate given that, in
   general, whether or not a particular PCI memory assignment will tolerate
   caching and/or write-combining is highly device-specific.  That would be a
   horrific PCI device database for EFI to maintain.
   
 
 How that is done is in fact an interesting question. Maybe someone
 with good contacts to HP could inquire on this.

I will confess my understanding is weak when it comes to low-level bus
interactions, but I'm learning more every time I have to tackle these
issues ;-)

Correct me if I'm wrong, but I thought things like caching and
write-combining are not properties of the PCI device, rather they are
properties of the memory system upstream of the PCI device, e.g.
bridges, memory controllers, and the MMU in the CPU.

The PCI configuration does provide various pieces of information which
help determine how the device can be accessed, e.g. prefetch, latency,
cache line size, etc. All of this is available to the firmware.
Wouldn't that be sufficient for the firmware to make the right
decisions?



Re: *** xf86BiosREAD, IA64, Multi-card Init ***

2003-08-25 Thread John Dennis
I have spent quite a bit of time investigating this and I think we now
understand the underlying problem.

The various places in XFree86 that mmap memory seem to be very careful
to specify the proper mapping attributes; e.g. when mapping registers
with ordering requirements and side effects (e.g. MMIO) the mapping is
forced to be non-cached and ordered. But ROM (e.g. a device BIOS) can
be read cached; it has no side effects. Thus when xf86ReadBIOS maps the
ROM BIOS it does not force a non-cached mapping. In theory this is
correct.

However, on IA64 the concept of caching is overloaded: not only does it
refer to memory coherence, but more importantly it selects between two
vastly different memory spaces, RAM and IO. A cached access is directed
to RAM and a non-cached access is directed to IO (e.g. a PCI device).

It was observed that ROM reads using the standard VGA ROM base of
0xC0000 were successful, but ROM reads using a base address computed
from the PCI TAG (e.g. using the ROM BAR) failed unless the caching
attribute was asserted.

Note that at boot time the VGA BIOS is copied from the card to RAM at
0xC0000 (the shadow copy), as required by the PCI spec. XFree86 uses
the symbol V_BIOS for this address.

This implies the following:

1) Reading the BIOS at 0xC0000 must be cached so that it is directed
to RAM.

2) Reading a BIOS on the card must be non-cached so that it becomes an
IO access.

This means that if there is a common routine (e.g. xf86ReadBIOS) it
must distinguish whether the base address is a shadow copy or not. This
also explains why cached reads of 0xC0000 were successful and why
cached reads off the card failed (and in most cases should have machine
checked). If our analysis is correct it means we can't just force a
non-cached mapping unless we know whether we are pointing at the shadow
copy or at the physical IO address of the BIOS.

This problem only seems to show up when there is a second graphics
card in the system. This also makes sense to me. The primary card has
its VGA BIOS shadowed at 0xC0000 in RAM. But if a driver or int10 code
wants to read the BIOS on the non-primary card, it can't use the VGA
ROM base because that belongs to the primary; it must get the address
from the PCI config and read the BIOS from the card.

I went looking for the place in the XFree86 code that reads the ROM
BIOS using the tag, as this seems to be a key element, but I haven't
found it yet. Can someone point me at it?

On the assumption this analysis is correct, should we ifdef
xf86ReadBIOS for __ia64__, test for anything in the range of 0xC0000 to
0xC0000+size, and modify the mapping based on that test?

Are there any places in the server which read BIOS that is not done
via the utility xf86ReadBIOS?

John




Re: *** xf86BiosREAD, IA64, Multi-card Init ***

2003-08-25 Thread John Dennis
On Mon, 2003-08-25 at 14:51, David Dawes wrote:
 However on IA64 the concept of caching is overloaded, not only does it
 refer to memory coherence but more importantly selects between two
 vastly different memory spaces, RAM and IO. A cached access is
 directed to RAM and a non-cached access is directed to IO (e.g. a pci
 device).
 
 Does this mean that the video memory aperture on a PCI device is
 classified as RAM, or is there something else that ensures that
 cached accesses get directed to the PCI device in this case?  Is
 the difference that this is a writeable region while the ROM is
 read-only?

I just received information from HP (thank you Bjorn Helgaas) that the
memory attribute does not direct access between RAM and IO; I was in
error. But what is critical is that the EFI MDT (Memory Descriptor
Table?), which is set up at boot time, is the arbiter of which memory
attributes are used for the various memory regions. User-space mappings
that try to force cached vs. non-cached access are inappropriate; it's
not their decision, rather the mapping must be consistent with the EFI
table. There is a kernel patch that ignores the user request for cached
vs. uncached mappings and instead (indirectly) consults the EFI MDT, so
that the memory attribute field in the TLB is consistent with how that
memory is referenced in the system, as determined by the firmware
(EFI). Beta Red Hat kernels were missing this patch, which had been
present in earlier kernels. Bjorn tells me that with the patch applied
no changes need to be made to X for proper operation, and that the
patch is in the upstream kernel sources.

FYI, the patch follows for those interested; note that O_SYNC is no
longer used as a trigger for non-cached access.

--- drivers/char/mem.c  2003-08-21 15:55:17.0 -0600
+++ /home/helgaas/linux/testing/drivers/char/mem.c  2003-08-13 10:54:25.0 -0600
@@ -180,6 +177,11 @@
 	  test_bit(X86_FEATURE_CYRIX_ARR, boot_cpu_data.x86_capability) ||
 	  test_bit(X86_FEATURE_CENTAUR_MCR, boot_cpu_data.x86_capability) )
 	  && addr >= __pa(high_memory);
+#elif defined(__ia64__)
+	struct page *page;
+
+	page = virt_to_page(__va(addr));
+	return !VALID_PAGE(page) || PageReserved(page);
 #else
 	return addr >= __pa(high_memory);
 #endif
@@ -194,7 +196,11 @@
 	 * through a file pointer that was marked O_SYNC will be
 	 * done non-cached.
 	 */
+#ifdef __ia64__
+	if (noncached_address(offset))
+#else
 	if (noncached_address(offset) || (file->f_flags & O_SYNC))
+#endif
 		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 
 	/* Don't try to swap out physical pages.. */

-- 
John Dennis [EMAIL PROTECTED]



cvsup problems

2003-08-14 Thread John Dennis
I had been using cvsup successfully to sync with the XFree86 CVS tree,
but after returning from vacation it has started to fail with what
appears to be an internal error in the client. 

***
*** runtime error:
***Segmentation violation - possible attempt to dereference NIL0
***
 
  use option @M3stackdump to get a stack trace
Aborted


I will include the stack dump below along with my config file; I didn't
see anything in the stack dump that meant anything to me. I did update
my cvsup client to the latest version and checked the cvsup web site
for possible causes/solutions. The only thing I found was that if DNS
could not reverse-map your IP address you could get a similar failure,
but my DNS server does the reverse mapping fine. So I'm a bit
perplexed; does anybody have a suggestion as to what the problem may
be? Like I said, this has all been working fine previously. The only
thing I can think of is that my host machine did have some libraries
updated, but cvsup is statically linked so I don't think that could
explain the sudden failure.

$ cvsup @M3stackdump cvsup.xfree86
 
 
***
*** runtime error:
***Segmentation violation - possible attempt to dereference NIL0
***
 
- STACK DUMP ---
PC  SP
 0x80c9c18  0xbfffe130  Crash + 0x58 in
/usr/local/src/m3/pm3-1.1.15/libs/m3core/src/runtime/common/RTProcess.m3
 0x80c8d6f  0xbfffe144  EndError + 0x3f in
/usr/local/src/m3/pm3-1.1.15/libs/m3core/src/runtime/common/RTMisc.m3
 0x80c8b74  0xbfffe168  FatalErrorI + 0x34 in
/usr/local/src/m3/pm3-1.1.15/libs/m3core/src/runtime/common/RTMisc.m3
 0x80cd46a  0xbfffe17c  SegV + 0x2a in
/usr/local/src/m3/pm3-1.1.15/libs/m3core/src/runtime/LINUXLIBC6/RTSignal.m3
 0x80eafb8  0xbfffe4f0
 0x8106e0c  0xbfffe538
 0x810694b  0xbfffe5fc
 0x8106f36  0xbfffe628
 0x8101ef9  0xbfffe634
 0x810694b  0xbfffe6f8
 0x8101772  0xbfffe758
 0x8101dff  0xbfffe774
 0x811b2d7  0xbfffe78c
 0x8102c9d  0xbfffe7e4
 0x810285c  0xbfffe83c
 0x80a7e8e  0xbfffea0c  CanGet + 0x16e in
/usr/local/src/m3/pm3-1.1.15/libs/libm3/src/uid/POSIX/MachineIDPosix.m3
 0x80a7238  0xbfffea28  Init + 0x58 in
/usr/local/src/m3/pm3-1.1.15/libs/libm3/src/uid/Common/TimeStamp.m3
 0x80a73b1  0xbfffea94  New + 0x81 in
/usr/local/src/m3/pm3-1.1.15/libs/libm3/src/uid/Common/TimeStamp.m3
 0x80a69a1  0xbfffead0  RandomSeed + 0x41 in
/usr/local/src/m3/pm3-1.1.15/libs/libm3/src/random/Common/Random.m3
 0x80a6876  0xbfffeae4  Init + 0x46 in
/usr/local/src/m3/pm3-1.1.15/libs/libm3/src/random/Common/Random.m3
 0x8066467  0xbfffeb1c  New + 0x87 in
/usr/local/src/cvsup/cvsup-snap-16.1d/client/src/BackoffTimer.m3
 0x806895b  0xb0bc
 0x80c869f  0xb0d4  RunMainBodies + 0x6f in
/usr/local/src/m3/pm3-1.1.15/libs/m3core/src/runtime/common/RTLinker.m3

-- EXCEPTION HANDLER STACK -
0xbfffe85c RAISES {}
0xbfffea20 RAISES {}
0xbfffea60 LOCK  mutex = 0x81842ec
0xbfffea6c RAISES {}
0xbfffeac8 RAISES {}
0xbfffec34 TRY-FINALLY  proc = 0x8068d73   frame = 0xb0bc
0xbfffecfc TRY-EXCEPT  {Main.Error}
0xbfffee68 TRY-EXCEPT  {Thread.Alerted}

Aborted

Here is my cvsup config file:

*default release=cvs host=anoncvs.xfree86.org
base=/home/boston/jdennis/.cvsup
*default prefix=/home/boston/jdennis/src/xfree86 delete use-rel-suffix
*default compress
*default tag=.
xc-all
doctools-all
contrib-all
xtest-all
utils-all
 

-- 
John Dennis [EMAIL PROTECTED]



SlowBCopy, IA64, PCI bus corruption

2003-08-14 Thread John Dennis
 as to
how much no-op is needed in the loop on a given system?

4) Is my general analysis correct? If not can you help explain where
I'm missing the mark and what the actual issues are?

John

-- 
John Dennis [EMAIL PROTECTED]



Re: Cyberpro 20x0 driver?

2003-01-27 Thread John Dennis
At a previous job I was responsible for XFree86 Cyberpro drivers. Tvia
(www.tvia.com) had supplied us with source code for their XFree86 4.x
Cyberpro driver, which worked reasonably well. I did have to make a few
fixes, and we added some enhancements.

The point of this is that Tvia does develop and maintain XFree86
drivers for their Cyberpro series. Why these are not available, at
least as binary downloads from their web site, eludes me. Tvia does try
to earn extra income from selling their SDK's. As of a year ago, when I
was working on this, the SDK's did not include the XFree86 driver
sources; we had to obtain those separately. I suspect the reason Tvia
has not open sourced their driver is that they want to derive income
from the sale of their SDK's. My personal opinion of their SDK's and
their documentation is that they were not worth the asking price.
However, having said that, I did find it essential to have the SDK's in
order to work on the driver, because they provided example code for
certain functions which at the time were not part of the driver.

I think Tvia suffers the same type of myopia that many small vendors
suffer from. They would generate more income via increased hardware
sales by opening up their source pool than they generate by selling
their marginal SDK's to a handful of partners. Maybe it would be
worthwhile for someone to press them on this issue.

John

