Re: PCI Express
On Tue, 2004-02-03 at 17:37, Marc Aurele La France wrote:

  PCI Express is programmatically identical to PCI, so I don't foresee
  any problems in that regard.

Yes, it's identical to PCI in terms of the interface presented to the OS, so configuration probably won't be an issue, but there is code in hw/xfree86/os-support/bus that attempts to walk the bus topology with explicit knowledge of current PCI bridges. I don't believe this code will execute correctly, but I assume it can be easily defeated, yes?

But that's probably less of an issue than the fact that PCIE systems demand new PCIE cards, and that means driver support. If we are fortunate, the new PCIE cards might be programmatically compatible with current cards and XFree86 will only need to recognize the new cards, but I really doubt that will be the case. ATI, Nvidia, and 3Dlabs have all announced support for PCIE and pledged it will bring dramatic performance increases. Some of that will be due to the faster bus, but I've got to believe these new cards will feature architectural changes as well, demanding new driver support.

Bottom line: I've been asked whether XFree86 will be able to support the PCIE systems due to arrive in a few months, or whether these systems are going to be dead in the water for open source for anything other than a server. So I'm digging for what people know or believe the issues are.

John

___
Devel mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/devel
PCI Express
In a few months PCI Express (PCIE) will hit the streets. My understanding is that some system vendors are building system boards without any AGP slots. As far as I know, that means only an old-style PCI graphics card will work (PCIE is fully compatible with PCI), or a new PCIE card.

Does anybody know to what extent PCIE will be supported by XFree86, and in what timeframe? I found a preliminary release note for the Radeon driver in 4.4 saying that in a PCIE system it will fall back to PCI and should work. But I'm guessing that DRI and other components that use AGP will be crippled in this environment, right?

I'm also aware the XFree86 server likes to roam around and touch things like PCI bridges; I'm wondering how this code will play in a PCIE system. Have any of the XFree86 developers been seeded with PCIE systems and graphics cards and are working on this? What is the extent of the changes anticipated to support PCIE-only systems? And if so, what's the timeframe?

--
John Dennis
[EMAIL PROTECTED]
DRI proprietary modules
For DRI to work correctly there are several independent pieces that all have to be in sync:

  * The XFree86 server, which loads drm modules (via the xfree86 driver module)
  * The drm kernel module
  * The agpgart kernel module

Does anybody know, for the proprietary drivers (supplied by ATI and Nvidia), which pieces they replace and which pieces they expect to be there? The reason I'm asking is to understand the consequences of changing an API. I'm curious about the answer in general, but in this specific instance the API I'm worried about is the one between the agpgart kernel module and the drm kernel module. If the agpgart kernel module modifies its API, will that break things for someone who installs a proprietary 3D driver?

Do the proprietary drivers limit themselves to the Mesa driver and retain the existing kernel services, assuming the IOCTLs are the same? Or do they replace the kernel drm drivers as well? If so, do they manage AGP themselves, or do they use the system's agpgart driver? Do they replace the system's agpgart driver?

--
John Dennis
[EMAIL PROTECTED]
Re: [Dri-devel] Deadlock with radeon DRI
The locking problem is solved; my original analysis was incorrect. The problem was that DRM_CAS was not correctly implemented on IA64. Thus this was an IA64-only issue, which is consistent with the others who showed up in a Google search describing the problem: all were on IA64. I have filed an XFree86 bug report on this. I could not find a DRI-specific bug reporting mechanism other than the dri-devel list.

The IA64 implementation of CAS was this:

    #define DRM_CAS(lock,old,new,__ret)                              \
        do {                                                         \
            unsigned int __result, __old = (old);                    \
            __asm__ __volatile__(                                    \
                "mf\n"                                               \
                "mov ar.ccv=%2\n"                                    \
                ";;\n"                                               \
                "cmpxchg4.acq %0=%1,%3,ar.ccv"                       \
                : "=r" (__result), "=m" (__drm_dummy_lock(lock))     \
                : "r" (__old), "r" (new)                             \
                : "memory");                                         \
            __ret = (__result) != (__old);                           \
        } while (0)

The problem was with the data types given to the cmpxchg4 instruction. All of the lock types in DRM are ints, and on IA64 that's 4 bytes wide. The digit suffix in cmpxchg4 signifies that this instruction operates on a 4-byte quantity. One might expect, then, that since this instruction operates on 4-byte values and in DRM the locks are 4 bytes, everything is fine, but it isn't. The cmpxchg4 instruction operates this way:

    cmpxchg4 r1=[r3],r2,ar.ccv

4 bytes are read at the address pointed to by r3; that 32-bit value is then zero-extended to 64 bits. The 64-bit value is then compared to the 64-bit value stored in application register CCV. If the two 64-bit values are equal, the least significant 4 bytes of r2 are written back to the address pointed to by r3. The original value pointed to by r3 is stored in r1. The entire operation is atomic.

The mistake in the DRM_CAS implementation is that the comparison is 64 bits wide, thus the value stored in ar.ccv (%2 in the asm) must be 64 bits wide, and for us that means zero-extending the 32-bit old parameter to 64 bits. Because of the way GCC asm blocks work to tie C variables and data types to asm values, the promotion of old from unsigned int to unsigned long was not happening.
Thus when old was stored into ar.ccv, its most significant 32 bits contained garbage. (Actually, because of the way GCC generates constants, it turns out the upper 32 bits were 0xffffffff. This was from the OR of DRM_LOCK_HELD, which is defined as 0x80000000; the compiler generates a 64-bit OR operation using the sign-extended immediate value 0x80000000, which is legal because the upper 32 bits are undefined on int (32-bit) operations.) The bottom line is that the test would fail when it shouldn't, because the high 32 bits in ar.ccv were not zero.

One might think that because old was assigned to __old in a local block, which was unsigned int, the compiler would know enough when using this value in the asm to have zero-extended it. But that's not true; in asm blocks it is critical to define the asm value correctly so the compiler can translate between the C code variable and what the asm code is referring to. The line:

    : "r" (__old), "r" (new)

says %2 is mapped by "r" (__old), in other words: put __old in a general 64-bit register. We've told the compiler to put 64 bits of __old into a register, but __old is a 32-bit value with its high-order 32 bits undefined. We need to tell the compiler to widen the type when assigning it to a general register, thus the asm template operand needs to be modified with a cast to unsigned long:

    : "r" ((unsigned long)__old), "r" (new)

Only with this will the compiler know to widen the 32-bit __old value to 64 bits inside the asm code. Thanks to Jakub Jelinek, who helped me understand the nuances of GCC asm templates and type conversions.

As a minor side note, definitions of bit flags should be tagged as unsigned. Thus things like:

    #define DRM_LOCK_HELD 0x80000000
    #define DRM_LOCK_CONT 0x40000000

should really be:

    #define DRM_LOCK_HELD 0x80000000U
    #define DRM_LOCK_CONT 0x40000000U

John
Deadlock with radeon DRI
[Note: this is cross-posted between dri-devel and [EMAIL PROTECTED]]

I'm trying to debug a hung X server problem with DRI using the radeon driver. Sources are XFree86 4.3.0. This happens to be on ia64, but at the moment I don't see anything architecture-specific about the problem. The symptom of the problem is the following message from the drm radeon kernel driver:

    [drm:radeon_lock_take] *ERROR* x holds heavyweight lock

where x is a context id. I've tracked the sequence of events down to the following:

DRIFinishScreenInit is called during the radeon driver initialization; inside DRIFinishScreenInit is the following code snippet:

    /* Now that we have created the X server's context, we can grab the
     * hardware lock for the X server.
     */
    DRILock(pScreen, 0);
    pDRIPriv->grabbedDRILock = TRUE;

Slightly later on, RADEONAdjustFrame is called and it does the following:

    #ifdef XF86DRI
        if (info->CPStarted) DRILock(pScrn->pScreen, 0);
    #endif

It's this DRILock which is causing the "*ERROR* x holds heavyweight lock" message. The reason is that both DRIFinishScreenInit and RADEONAdjustFrame are executing in the server and using the server's DRI lock. DRIFinishScreenInit never unlocks; it sets the grabbedDRILock flag (big deal, no one ever references this flag). When RADEONAdjustFrame calls DRILock, the lock is already held because DRIFinishScreenInit locked and never unlocked.

The drm kernel driver on the second lock call then suspends the X server process: DRM(lock_take) returns zero to DRM(lock) because the context holding the lock and the context requesting the lock are the same, and this causes DRM(lock) to put the X server on the lock wait queue. Putting the X server on the wait queue waiting for the lock to be released then deadlocks the X server, because it is the process holding the lock on its context.

Questions: The whole crux of the problem seems to me to be the taking and holding of the lock in DRIFinishScreenInit. Why is this being done? I can't see a reason for it.
Why does it set a flag indicating it's holding the lock if nobody examines that flag? Is suspending a process that already holds a lock during a lock request really the right behavior? Granted, a process that tries to lock twice without an intervening unlock is broken, but do we really want to deadlock that process? Any other insights into this issue?

FWIW, I googled for this error and came up with several folks who, starting around last spring, began seeing the same problem, but none of the mail threads had a follow-up solution.

Thanks,

John
Re: setjmp needs fixing again, here's the issues
On Wed, 2003-09-10 at 20:11, David Dawes wrote:

John> wrapper. So as long as we've already lost module
John> independence by virtue of linking the system function why
John> not go all the way and use the system definition of the
John> system function's argument? It seems like

David> We haven't lost module independence by doing that.

Perhaps I'm misunderstanding module independence; can you confirm or deny some of the assumptions I've been working with:

1) System-specific functions are wrapped in an xf86* wrapper so that:

   a) system differences are isolated at a single point which provides a common interface to the rest of the server;

   b) all system-specific functions are wrapped and live in the XFree86 executable. The XFree86 executable is linked against the system libraries, and hence the XFree86 executable is not system-independent.

2) Module independence derives from the following:

   a) modules only link against (i.e. import from) the XFree86 executable that loaded them. In other words, all system-specific elements are contained in the executable, not in a loaded module;

   b) the executable and modules share the same linking, relocation, exception, and call conventions. This makes a module mostly system-independent, but not 100%: it's possible, though not common, for different OSs on the same architecture to have a different ABI, and it is less common today as the industry converges on standardized ABIs.

Is the above basically correct? If so, then by directly linking a reference to the system's setjmp call in a module, haven't we violated the notion that only the main executable has dependencies on the system libraries? The module is now making a call whose parameters and behavior are specific to the system. Actually, more correctly, they are specific to how the main executable was linked (which is system-specific); see below for why this is true.
David> All that matters is that the jmp_buf type be defined on
David> each architecture in such a way that it meets the
David> requirements of all the supported OSs on that architecture.
David> Newer architectures are more likely to have a
David> platform-independent ABI anyway than was necessarily true
David> in the past. If the platform's setjmp.h can define the
David> type with the correct alignment, then so can we. It
David> doesn't matter if the methods for doing this are
David> compiler-specific. If the 128-bit alignment is an IA64 ABI
David> requirement, then I'd expect that all compilers for IA64
David> will have a method for defining jmp_buf with the correct
David> alignment.

If I follow your reasoning, then you would be happy with this definition in xf86_libc.h:

    /* setjmp/longjmp */
    #if defined(__ia64__)
    typedef int xf86jmp_buf[1024] __attribute__ ((aligned (16))); /* guarantees 128-bit alignment! */
    #else
    typedef int xf86jmp_buf[1024];
    #endif

But if one is going to special-case the definition of jmp_buf based on architecture, why not use the system definition and typedef xf86jmp_buf to be jmp_buf? I suspect the answer will be that the system libraries could change, requiring a different size buffer. I'd buy that if it weren't for the fact that we are directly linking the library-specific version of setjmp into the module.

Why a module referencing setjmp is tied to a specific system library:

As far as I can tell, we've tied the module to a specific system library. Here is why I believe this. I'm going to simplify the discussion by assuming there is only one xf86set_jmp symbol, and ignore the type 0 and type 1 variants that select setjmp or sigsetjmp.

1) A module makes a reference to xf86set_jmp.
2) The xfree86 loader, when it loads that module, searches for that symbol in its hash table of symbols. That table was populated in part by the table in xf86sym.c, which in a loadable build contains this definition in the main executable:

    #include <setjmp.h>
    SYMFUNCALIAS(xf86setjmp,setjmp)

3) The above definition means that when the symbol name xf86setjmp is looked up by the loader, it will get the address of the setjmp that was linked into the main executable. This is the function address that the xfree86 loader will insert into the module during its relocation phase.

4) How does the address of setjmp get into the main executable? It depends on whether the main executable was statically or dynamically linked, but in either case the system will assure it comes from a specific version of the library, determined at the time the main executable was linked.

5) Therefore, when a module that referenced setjmp makes the call, it is calling the system version of setjmp in the exact library in use when the main executable was linked.

I don't see how the above satisfies the conception of module independence, and by extension the avoidance of using the
Re: setjmp needs fixing again, here's the issues
On Thu, 2003-09-11 at 14:35, David Dawes wrote:

David> What's the difference between this in the core executable:
David>
David>     xf86A(pointer data) { return A(data); }
David>     SYMFUNC(xf86A)
David>
David> and this:
David>
David>     SYMFUNCALIAS(xf86A, A)

The difference is that xf86A may massage data in some system-specific manner, specific to the libraries the main executable is linked against. Examples might include parameters to mmap, where the base address and size have to be adjusted for page boundaries along with the mapping flags passed, or anything passed into an ioctl. I don't think the current set of wrapped functions includes anything of this nature, but I thought it was the principle behind it.

David> Module independence comes from providing a more uniform ABI to the
David> modules than you might get by linking directly with the functions
David> provided by the system libraries.

Isn't that what I'm saying above? XFree86 imposes its own interface, leaving the wrappers to translate when the underlying system differs. Perhaps just as importantly, since the wrappers are in the main executable, which is the ONLY place loadable object dependencies are enforced, it is the only place you can be guaranteed that you'll be linked to the version of the shared object you're expecting!

John> If I follow your reasoning then you would be happy with this
John> definition in xf86_libc.h

David> Yes.

O.K., I'll make this patch and put it in the bugzilla. I'm not completely happy with it because it's not really addressing the root problem. Setjmp may blow up on some other system in the future because we're ignoring the system-defined requirements. For example, in the non-ia64 case xf86jmp_buf will only be aligned on int boundaries; what if a system needs long alignment, or has some other requirement?

David> I'll grant that what we're doing with setjmp is a little hairy,
David> but I haven't seen anything so far that says the basic approach
David> doesn't work or impacts the portability of modules.
Yes, you are right that setjmp is a true exception, because it can't be wrapped and thus can't have its system specifics isolated outside the module. And you're also right it's a bit hairy; it took me a while to fully understand exactly what was going on.

As for impacting the portability of modules, why isn't the following true?

libc version 1.0 requires 8-byte alignment; server XFree86 A is linked against this library. libc version 1.1 requires 16-byte alignment; server XFree86 B is linked against this library. A module in theory can be loaded into either XFree86 server A or B. If the module was built against the libc 1.0 requirement, it will fail when loaded into XFree86 B. This is because, unlike the system loader, which enforces the right version of the library when XFree86 itself is loaded by the system, the module loader does not deal with version dependencies, in part because the version dependencies have in theory been isolated in the main executable image via wrappers and enforced by the system loader.

The current setjmp implementation violates all these assumptions, which is why I conclude it negatively impacts module portability. It is also true that if setjmp were in a wrapper, the module could be loaded into XFree86 server A or B without issue; but that's not the case, and it will crash the server if loaded into XFree86 server B. Granted, if we are lucky this may never bite us; there is a reasonable chance it won't. But on the other hand, to assert we haven't lost module independence and portability is not being honest about the new situation with modules that import xf86set_jmp.
I'm not saying I have an answer to the problem we've created for ourselves, but I am saying we need to be honest about the impact: any module that imports xf86set_jmp is no longer portable and is tied to the libraries the main XFree86 is linked against. As such, we might as well use the system definition of jmp_buf, because it is the robust solution and is no more and no less portable than inventing a jmp_buf abstraction that inherently is not robust. If you're not getting portability and independence from the abstraction, why sacrifice robustness?
setjmp needs fixing again, here's the issues
I have found the problem with XFree86's implementation of setjmp on ia64. It occurs because we are ignoring the type definition of setjmp's argument:

    int setjmp(jmp_buf env);

In xf86_libc.h we redefine jmp_buf thusly:

    /* setjmp/longjmp */
    typedef int xf86jmp_buf[1024];
    #undef jmp_buf
    #define jmp_buf xf86jmp_buf

Based on the discussions that occurred last Feb and March, I believe this was done to preserve a loadable module's system independence. The notion being that each system may have a different jmp_buf size, and we can't know a priori what that is, so make it bigger than we think is necessary and it should be big enough no matter what system the module is loaded on. However, by ignoring the system-defined type for jmp_buf, we've also ignored the system-specific alignment requirement of jmp_buf. On ia64 it must be 16-byte aligned, because setjmp, when it saves the execution context on ia64, uses machine instructions that write the contents of the 16-byte floating point registers; the destination address must be 16-byte aligned or a SIGBUS fault is signaled.

On ia64 you can see in /usr/include/bits/setjmp.h the following jmp_buf definition:

    /* the __jmp_buf element type should be __float80 per ABI... */
    typedef long __jmp_buf[_JBLEN] __attribute__ ((aligned (16))); /* guarantees 128-bit alignment! */

I did an experiment where I defeated the use of xf86jmp_buf and instead used the system's definition of jmp_buf, and the SIGBUS problem went away. Earlier I had noted that a static server did not exhibit the problem; that's because xf86jmp_buf is only used in a loadable build.

How do we fix this?
---

1) It may be hard to know the alignment requirements for all OSs and architectures (that's part of the reason we have system header files). We could guess that 16 bytes is sufficient. But even if we got the alignment requirement right, how do we specify it to the compiler in a portable way such that it will build on a variety of systems and compilers?
My understanding is that compiler alignment directives are not portable (e.g. not part of ANSI C). Are we comfortable picking one or two common alignment directives and conditionally compiling with the right one?

2) We cannot force alignment greater than that of the largest basic type without compiler directives, e.g. by creating artificial structs. This is because in C, structs are aligned to the alignment of their most strictly aligned basic member, which on ia64 is 8 bytes (long, double, etc.), half the alignment requirement. Plus, such a scheme would put us back into guessing type sizes, alignment requirements, etc. It would not be robust.

3) We could use the system's definition of jmp_buf. That means it will always be correct on the system it was compiled on, but the module may be loaded on another system with a potentially different jmp_buf definition and cause problems. I realize we have a goal of system independence for loadable modules, but do we really expect modules built on one system to be loaded on a significantly different system? They are, after all, binary modules. I also realize that setjmp is currently the only thing in our loadable modules which is not wrapped by a core server function, so it would kind of be a shame to let this one item pollute module independence, but we have no choice: the loader now links the system implementation of setjmp and not a wrapper. So as long as we've already lost module independence by virtue of linking the system function, why not go all the way and use the system definition of the system function's argument? It seems like we would not have lost anything, and we would have picked up the robustness we gave up by trying to use a system-neutral definition of the jmp_buf argument.

Given the argument in 3 above, I'd recommend taking out xf86jmp_buf and putting back in #include <setjmp.h>. It seems the simplest and most robust option, and in practical terms it sacrifices little. Comments?
I'm going to file a bugzilla on this; it is very definitely broken on ia64 and causes the server to crash. I will put the text of this in the bugzilla.

John
setjmp wrappers
Remember last February and March there was a big discussion about xf86setjmp? Part of that discussion involved a SIGSEGV or SIGBUS in the freetype2 code when a font was not found. Well, I'm seeing the same thing (on ia64). At the time it was pointed out that one can never wrap setjmp, because setjmp has undefined behavior if it is called from within a function that returns, which is exactly what a wrapper is.

The setjmp code was reworked a couple of months ago (in part to account for libc differences). The current implementation has these code fragments:

ftstdlib.h:
---
    #define DONT_DEFINE_WRAPPERS
    #define DEFINE_SETJMP_WRAPPERS
    #include <xf86_ansic.h>
    #undef DONT_DEFINE_WRAPPERS

xf86_libc.h:
---
    #if defined(XFree86LOADER) && \
        (!defined(DONT_DEFINE_WRAPPERS) || defined(DEFINE_SETJMP_WRAPPERS))
    #undef setjmp
    #define setjmp(a) xf86setjmp_macro(a)
    #undef longjmp
    #define longjmp(a,b) xf86longjmp(a,b)
    #undef jmp_buf
    #define jmp_buf xf86jmp_buf
    #endif

As far as I can tell, when one builds the freetype module for the loadable server you're not going to get any wrappers except for setjmp, and this is bad because you can't wrap setjmp. As a trial I built both a static and a loadable server. The static server ran fine, but the loadable server dies with a SIGBUS in the freetype code when a setjmp/longjmp is executed. Pretty much what I expected.

I'm wondering if I'm missing something, but it's a fact that you can't wrap setjmp, right? And why would you turn off all wrappers except setjmp? Is this really right? I reread the original discussion, and part of it had to do with how to implement setjmp in modules that are supposed to be system-neutral (i.e. must use wrappers) when one has this exception of a clib function that can't be wrapped. What ever came of that?

--
John Dennis
[EMAIL PROTECTED]
Re: *** xf86BiosREAD, IA64, Multi-card Init ***
On Thu, 2003-08-28 at 12:46, Marc Aurele La France wrote:

  Secondly, EFI is already doing the wrong thing by marking PCI ROMs as
  non-cacheable. This doesn't inspire confidence...

I believe there is a difference between ROMs being logically cacheable and the way the ZX1 actually wires that memory into the memory system. The ZX1's connections to PCI devices are always non-cached. It's a simplifying assumption, correct in almost all cases, with little penalty for the rare case of cacheable memory sitting out in MMIO space. Therefore EFI is not doing the wrong thing by marking ROM as non-cacheable; the ZX1 is going to treat any PCI address as non-cached by design.
Re: *** xf86BiosREAD, IA64, Multi-card Init ***
On Thu, 2003-08-28 at 16:40, Marc Aurele La France wrote:

  ... which basically means that framebuffers cannot benefit from CPU
  caching. I don't believe this to be the case. Further to this, it
  appears you don't realise that the framebuffers we're talking about
  here _are_ in PCI space.

Yes, I realize framebuffers are in PCI space. All I can do is make the following observations:

1) I was told by HP: "The ZX1 chipset doesn't support cacheable access to any MMIO space, regardless of whether that space happens to contain RAM, ROM, device CSRs, etc." This statement seems clear to me; do you have a different interpretation?

2) Write combining is a typical memory attribute to apply to a framebuffer; on IA64 the write-combining memory attribute is also non-cacheable.

3) Would you really want caching on framebuffer memory in the presence of a graphics co-processor that can alter the memory independently of the cache?

4) It does not seem outlandish, when considering the universe of PCI devices, to believe that the memory regions on these devices either have side effects or can be modified by the device; either case would demand non-caching. It would be very hard, as has been pointed out, for the firmware to know what memory regions on a device could be cached safely. Thus a decision to treat any PCI memory region as non-cached sounds like a plausible design decision.
Re: *** xf86BiosREAD, IA64, Multi-card Init ***
On Wed, 2003-08-27 at 06:15, Egbert Eich wrote:

  Apparently there are still issues with VGA framebuffer and emulated
  PIO register writes when saving and restoring fonts. These problems
  only affect certain cards (so far I've only heard of Nvidia cards).
  Mark Vojkovich is sure that these are related to caching problems,
  too. Have you made any progress investigating this issue?

I have looked at this; here is what I know (or think I know :-)

* The vga font save/restore seems unrelated to caching.
  - The vga framebuffer (where the fonts live) has always been mapped non-cached. The EFI on HP ZX1 would also have set this region up as non-cached. We have verified by inspecting the TLB that it is non-cached. The kernel patch for non-caching (which should not have mattered, given the above) does not affect the font save/restore, nor did I expect it to. Everything seems right.

* The vga font save/restore seems related to timing, specifically too many back-to-back bus transactions to the VGA framebuffer.
  - Adding more delays into SlowBCopy makes the problem go away.
  - The symptom of the problem is that the bridge appears to be in a master-abort error state. This would occur if the vga did not respond fast enough.

* So far we've only seen this problem with nvidia (but see the next item).
  - This suggests the problem is specific to the nvidia vga implementation and the speed at which it responds.

* SlowBCopy seems only to be used for VGA font save/restore.
  - The whole idea of SlowBCopy bothers me from a technical perspective.
  - SlowBCopy has been around a while, thus this is not the first time folks have run into this problem.
  - No one seems to know anymore when or why it's needed, but it seems to have first appeared on DEC Alpha systems and seems unnecessary on standard PC-class machines. It is an interesting correlation that both Alpha and IA64 had fast bus transaction design goals.

Mark, is there a hardware engineer at nvidia who would be familiar with the VGA timing?
Is it possible your VGA is running near or beyond the limits of the PCI timing requirements? My understanding is the ZX1 is a very aggressive chipset tuned for high performance, so it's possible that on a standard PC you would not have seen the timing problems, even if you were close to the limits.

John
Re: *** xf86BiosREAD, IA64, Multi-card Init ***
I just wanted to follow up on this for those who are maintaining the memory mapping code in the CVS tree; this is linux ia64 specific and possibly HP ZX1 specific. As of today this is the best information I have, and I wanted to share it.

Originally the MAP_NONCACHED flag passed to mmap caused non-cached access. That was deprecated in favor of the O_SYNC flag passed to the open of /dev/mem. As of linux kernels 2.5.xx, or those which have been patched, these flags are both deprecated and ignored. At some point os-support/linux/lnx_video.c should clean up the use of these flags. There are a few other places in the tree these flags are used as well. However, for the time being the use of these flags should remain, as people may be running on kernels without the automatic detection of the caching attribute.

The critical issue is that the caching attribute the kernel sets in the TLB must match the EFI (firmware) setup. It has been a fortunate consequence that when XFree86 passed the non-caching flags it was for memory regions that EFI had configured to be non-caching, so no problem was observed; the non-caching flag caused the kernel to set the TLB correctly. Kernels that ignore the flag will still set the TLB correctly via an automatic mechanism.

EFI will have created non-cached mappings with the HP ZX1 chipset for ALL PCI memory regions, since the ZX1 only supports non-cached on IO. Anytime in the X server when MMIO was specified as a mapping flag, the ia64 code would have requested non-cached; this is done for all register mappings and the VGA framebuffer (because write combining was avoided on banked memory). If the FRAMEBUFFER flag is passed then write combining is selected instead of non-cached, but the framebuffer should have also been set up by the firmware as non-cached. I would expect this to have caused an inconsistency between the TLB entry and the ZX1 chipset, possibly leading to MCAs, but this has not been observed, so I can't explain that one.
The only other example I have heard of where XFree86 was requesting a cached or non-cached mapping which was not consistent with EFI is the mapping of the ROM BIOS when using the ROM base address register in the PCI config space. EFI treats that memory as non-cached because it's PCI, and the non-cached flag was not passed in xf86ReadBIOS. ROM BIOS cached reads at the standard video ROM address 0xC0000 were fine, because that is a shadow image in RAM, which is cached. So far I can only find source code that passes 0xC0000 to xf86ReadBIOS, but that does not mean there isn't code lurking somewhere that uses the PCI ROM BAR.

The MAP_WRITECOMBINED flag passed to mmap is now also deprecated in Linux as of 2.4.19, for two reasons: its lack of support on most ia64 platforms, and the memory attribute aliasing problems that originally caused the MAP_NONCACHED and O_SYNC flags to be removed. The flag can still be specified, but will be ignored. Note that all ia64 GARTs are now run in coherent mode, so there is no need for write combining.

John
Re: *** xf86BiosREAD, IA64, Multi-card Init ***
On Tue, 2003-08-26 at 10:22, Marc Aurele La France wrote:

> Frankly, I don't see how this EFI MDT can be accurate given that, in general, whether or not a particular PCI memory assignment will tolerate caching and/or write-combining is highly device-specific. That would be a horrific PCI device database for EFI to maintain.

My understanding (albeit limited) from correspondence with HP is that it is the chipset that determines this. Current HP ia64 systems use the ZX1 chipset, which only does non-cached access on PCI. Thus if firmware knows there is a ZX1 chipset, it knows how the memory region will be handled. Apparently EFI also seems to know about GARTs. Intel boxes use a different chipset but share EFI and the kernel code, so they should behave the same. I'm leery of misinterpreting some of the correspondence, so here is the actual text from some of it; if you draw different conclusions let me know.

John

John> What had been confusing at this end is the notion that ROM by definition can never be incoherent, therefore cached vs. non-cached should be irrelevant.

HP> Ah, I see that point. The problem in this case is that the chipset may support different types of access to different regions. (That's part of what the EFI MDT tells you.) The ZX1 chipset doesn't support cacheable access to any MMIO space, regardless of whether that space happens to contain RAM, ROM, device CSRs, etc.

John> The problem, if I understand correctly, is that the access was not consistent with the EFI table defining what the memory attribute needed to be. Bottom line, it's EFI that determines cachability, not a priori knowledge of the memory characteristics being mapped.

HP> Right. And the underlying chipset determines what goes in the EFI table.

John> This made me think about the MAP_WRITECOMBINED flag passed to mmap. The current scheme has EFI responsible for determining the cached vs. non-cached memory attribute; the user should not be specifying such a memory attribute. Write combining or write coalescing is one variant of a non-cached memory attribute on ia64 that is used for frame buffers. My understanding is that EFI will be ignorant of this memory attribute and MAP_WRITECOMBINED remains a valid flag to pass. I also assume that passing this flag replaces an ma value of 4 (uncached) with 6 (uncached write coalescing) and that such a replacement, outside the scope of the EFI MDT, is valid because both share the uncached attribute. Correct?

HP> The EFI MDT does have a bit for WC, but I don't know of any ia64 platform that sets it. I removed support for MAP_WRITECOMBINED in the 2.4.19 patch because we don't have a good way of using it safely. (Even if there were an ia64 platform that claimed to support it, we'd have to be very careful to avoid the attribute aliasing problem. The fact that kernel identity mappings don't have page tables and are inserted in 64Mb (or maybe 16Mb) chunks means that we'd have to somehow ensure that a 64Mb chunk that contained a WC mapping could never also be mapped WB.)

HP> From the user side, MAP_WRITECOMBINED can still be specified; we just ignore it in the kernel. This used to be used for AGP, but all ia64 GARTs are now run in coherent mode, so there's no need for WC.

HP> It is possible for multiple attributes to be supported, so it's conceivable that we could look at user requests for special mappings. For example, the i2000 supports both UC and WB mappings of main memory. But at the moment, I don't think there's an actual need for such a feature, and the fact that we don't have kernel page tables would complicate adding it.
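For reference, the ma values being traded in the exchange above are the Itanium TLB memory-attribute field encodings. A sketch of the encodings discussed (the two cited in the mail are 4 and 6; the neighboring values are from the Itanium architecture, shown here for context):

```c
/* Itanium TLB memory-attribute (ma) field encodings, as discussed
 * above: the mail contrasts ma=4 (UC) with ma=6 (WC). */
enum ia64_mem_attr {
    IA64_MA_WB  = 0,  /* write-back cacheable           */
    IA64_MA_UC  = 4,  /* uncacheable                    */
    IA64_MA_UCE = 5,  /* uncacheable, exported          */
    IA64_MA_WC  = 6,  /* uncacheable, write-coalescing  */
};
```

Both 4 and 6 are non-cached variants, which is why the proposed substitution was thought to be safe with respect to the EFI MDT.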
Re: *** xf86BiosREAD, IA64, Multi-card Init ***
On Tue, 2003-08-26 at 13:40, Egbert Eich wrote:

> Marc Aurele La France writes:
> > On Tue, 26 Aug 2003, Egbert Eich wrote:
> > Frankly, I don't see how this EFI MDT can be accurate given that, in general, whether or not a particular PCI memory assignment will tolerate caching and/or write-combining is highly device-specific. That would be a horrific PCI device database for EFI to maintain.
>
> How that is done is in fact an interesting question. Maybe someone with good contacts to HP could inquire on this.

I will confess my understanding is weak when it comes to low-level bus interactions, but I'm learning more every time I have to tackle these issues ;-)

Correct me if I'm wrong, but I thought things like caching and write-combining are not properties of the PCI device; rather, they are properties of the memory system upstream of the PCI device, e.g. bridges, memory controllers, and the MMU in the CPU. The PCI configuration does provide various pieces of information which help determine how the device can be accessed, e.g. prefetch, latency, cache line size, etc. All of this is available to firmware. Wouldn't all this be sufficient for firmware to make the right decisions?
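One concrete example of the configuration-space hints mentioned above is the prefetchable bit in a memory BAR, which firmware can read without any device database. A minimal decode sketch of the PCI memory BAR attribute bits (not any particular XFree86 helper):

```c
#include <stdint.h>

/* PCI memory BAR attribute bits (per the PCI spec):
 *   bit 0    : 0 = memory space, 1 = I/O space
 *   bits 2:1 : address type (00 = 32-bit, 10 = 64-bit)
 *   bit 3    : prefetchable */
static int pci_mem_bar_prefetchable(uint32_t bar)
{
    return (bar & 0x1) == 0      /* memory BAR, not I/O */
        && (bar & 0x8) != 0;     /* prefetchable bit set */
}
```

A prefetchable region tolerates merged and speculative reads, which is exactly the kind of property firmware can fold into its memory-attribute decisions.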
Re: *** xf86BiosREAD, IA64, Multi-card Init ***
I have spent quite a bit of time investigating this issue and I think we now understand the underlying problem. The various places in XFree86 that mmap memory seem to be very careful to specify the proper mapping attributes; e.g. when mapping registers with ordering requirements and side effects (e.g. MMIO) the mapping is forced to be non-cached and ordered. But ROM (e.g. the device BIOS) can be read cached; it has no side effects. Thus when xf86ReadBIOS maps the ROM BIOS it does not force a non-cached mapping. In theory this is correct. However, on IA64 the concept of caching is overloaded: not only does it refer to memory coherence, but more importantly it selects between two vastly different memory spaces, RAM and IO. A cached access is directed to RAM and a non-cached access is directed to IO (e.g. a PCI device).

It was observed that ROM reads using the standard VGA ROM base of 0xC0000 were successful, but ROM reads using a base address computed from the PCI TAG (e.g. using the ROM BAR) failed unless the caching attribute was asserted. Note that at boot time the VGA BIOS is copied from the card to RAM at 0xC0000 (e.g. the shadow copy), as is required by the PCI spec. XFree86 uses the symbol V_BIOS for this address. This implies the following:

1) Reading the BIOS at 0xC0000 must be cached so that the access is directed to RAM.

2) Reading a BIOS on the card must be non-cached so that it becomes an IO access.

This means if there is a common routine (e.g. xf86ReadBIOS) it must distinguish whether the base address is a shadow copy or not. This also explains why cached reads of 0xC0000 were successful and why cached reads off the card failed (and in most cases should have machine checked). If our analysis is correct it means we can't just force a non-cached mapping unless we know whether we are pointing at the shadow or at the physical IO address of the BIOS. This problem only seems to show up when there is a second graphics card in the system. This also makes sense to me.
The primary card has its VGA BIOS shadowed at 0xC0000 in RAM. But if a driver or int10 code wants to read the BIOS on the non-primary card, it can't use the VGA ROM base because that belongs to the primary; it must get the address from the PCI config and read it from the card. I went looking for the place in the XFree86 code that reads the ROM BIOS using the tag, as this seems to be a key element, but I haven't found it yet. Can someone point me at it?

On the assumption this analysis is correct, should we ifdef xf86ReadBIOS for __ia64__ and test for anything in the range of 0xC0000 to 0xC0000+size, modifying the mapping based on that test? Are there any places in the server which read the BIOS other than via the utility xf86ReadBIOS?

John
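As a sketch of the ifdef being proposed, the shadow test could look like the following. V_BIOS is the symbol the earlier mail mentions; the 64K window size and the helper name are assumptions for illustration only:

```c
#define V_BIOS      0xC0000UL  /* VGA BIOS shadow base in RAM */
#define V_BIOS_SIZE 0x10000UL  /* assumed 64K window; illustrative only */

/* Does this BIOS base address point at the shadow copy in RAM?
 * If so the mapping must be cached (it's RAM); otherwise the read
 * targets the card's ROM BAR and must be non-cached on ia64. */
static int bios_base_is_shadow(unsigned long base)
{
    return base >= V_BIOS && base < V_BIOS + V_BIOS_SIZE;
}
```

xf86ReadBIOS (under an __ia64__ ifdef) could then select the mapping attribute from this test rather than from the caller's flags.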
Re: *** xf86BiosREAD, IA64, Multi-card Init ***
On Mon, 2003-08-25 at 14:51, David Dawes wrote:

> > However on IA64 the concept of caching is overloaded: not only does it refer to memory coherence, but more importantly it selects between two vastly different memory spaces, RAM and IO. A cached access is directed to RAM and a non-cached access is directed to IO (e.g. a PCI device).
>
> Does this mean that the video memory aperture on a PCI device is classified as RAM, or is there something else that ensures that cached accesses get directed to the PCI device in this case? Is the difference that this is a writeable region while the ROM is read-only?

I just received information from HP (thank you, Bjorn Helgaas) that the memory attribute does not direct access between RAM and IO; I was in error. What is critical is that the EFI MDT (Memory Descriptor Table?), which is set up at boot time, is the arbiter of which memory attributes are used for various memory regions. User-space mappings that try to force cached vs. non-cached accesses are inappropriate; it's not their decision. Rather, the mapping must be consistent with the EFI table set up at boot time.

There is a kernel patch that ignores the user request for cached vs. uncached mappings and instead (indirectly) consults the EFI MDT, so that the memory attribute field in the TLB is consistent with how that memory is referenced in the system, as determined by the firmware (EFI). Beta RH kernels were missing this patch, which had been present in earlier kernels. Bjorn tells me that with the patch applied no changes need to be made to X for proper operation, and that the patch is in the upstream kernel sources. FYI, the patch follows for those interested; note that O_SYNC is no longer used as a trigger for non-cached access.
--- drivers/char/mem.c	2003-08-21 15:55:17.0 -0600
+++ /home/helgaas/linux/testing/drivers/char/mem.c	2003-08-13 10:54:25.0 -0600
@@ -180,6 +177,11 @@
 		  test_bit(X86_FEATURE_CYRIX_ARR, boot_cpu_data.x86_capability) ||
 		  test_bit(X86_FEATURE_CENTAUR_MCR, boot_cpu_data.x86_capability) )
 	  && addr >= __pa(high_memory);
+#elif defined(__ia64__)
+	struct page *page;
+
+	page = virt_to_page(__va(addr));
+	return !VALID_PAGE(page) || PageReserved(page);
 #else
 	return addr >= __pa(high_memory);
 #endif
@@ -194,7 +196,11 @@
 	 * through a file pointer that was marked O_SYNC will be
 	 * done non-cached.
 	 */
+#ifdef __ia64__
+	if (noncached_address(offset))
+#else
 	if (noncached_address(offset) || (file->f_flags & O_SYNC))
+#endif
 		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

 	/* Don't try to swap out physical pages.. */

-- John Dennis [EMAIL PROTECTED]
cvsup problems
I had been using cvsup successfully to sync with the XFree86 CVS tree, but after returning from vacation it has started to fail with what appears to be an internal error in the client:

***
*** runtime error:
***    Segmentation violation - possible attempt to dereference NIL
***
*** use option @M3stackdump to get a stack trace

Aborted

I will include the stack dump below along with my config file; I didn't see anything in the stack dump that meant anything to me. I did update my cvsup client to the latest, and I checked the cvsup web site looking for possible causes/solutions. The only thing I found was that if your DNS could not reverse-map your IP address you could get a similar failure, but my DNS server can do the reverse mapping fine. So I'm a bit perplexed; does anybody have a suggestion as to what the problem may be? Like I said, this has all been working fine previously. The only other thing I can think of is that my host machine did have some libraries updated, but cvsup is statically linked so I don't think that could explain the sudden failure.
$ cvsup @M3stackdump cvsup.xfree86
***
*** runtime error:
***    Segmentation violation - possible attempt to dereference NIL
***

- STACK DUMP ---
PC        SP
0x80c9c18 0xbfffe130 Crash + 0x58 in /usr/local/src/m3/pm3-1.1.15/libs/m3core/src/runtime/common/RTProcess.m3
0x80c8d6f 0xbfffe144 EndError + 0x3f in /usr/local/src/m3/pm3-1.1.15/libs/m3core/src/runtime/common/RTMisc.m3
0x80c8b74 0xbfffe168 FatalErrorI + 0x34 in /usr/local/src/m3/pm3-1.1.15/libs/m3core/src/runtime/common/RTMisc.m3
0x80cd46a 0xbfffe17c SegV + 0x2a in /usr/local/src/m3/pm3-1.1.15/libs/m3core/src/runtime/LINUXLIBC6/RTSignal.m3
0x80eafb8 0xbfffe4f0
0x8106e0c 0xbfffe538
0x810694b 0xbfffe5fc
0x8106f36 0xbfffe628
0x8101ef9 0xbfffe634
0x810694b 0xbfffe6f8
0x8101772 0xbfffe758
0x8101dff 0xbfffe774
0x811b2d7 0xbfffe78c
0x8102c9d 0xbfffe7e4
0x810285c 0xbfffe83c
0x80a7e8e 0xbfffea0c CanGet + 0x16e in /usr/local/src/m3/pm3-1.1.15/libs/libm3/src/uid/POSIX/MachineIDPosix.m3
0x80a7238 0xbfffea28 Init + 0x58 in /usr/local/src/m3/pm3-1.1.15/libs/libm3/src/uid/Common/TimeStamp.m3
0x80a73b1 0xbfffea94 New + 0x81 in /usr/local/src/m3/pm3-1.1.15/libs/libm3/src/uid/Common/TimeStamp.m3
0x80a69a1 0xbfffead0 RandomSeed + 0x41 in /usr/local/src/m3/pm3-1.1.15/libs/libm3/src/random/Common/Random.m3
0x80a6876 0xbfffeae4 Init + 0x46 in /usr/local/src/m3/pm3-1.1.15/libs/libm3/src/random/Common/Random.m3
0x8066467 0xbfffeb1c New + 0x87 in /usr/local/src/cvsup/cvsup-snap-16.1d/client/src/BackoffTimer.m3
0x806895b 0xb0bc
0x80c869f 0xb0d4 RunMainBodies + 0x6f in /usr/local/src/m3/pm3-1.1.15/libs/m3core/src/runtime/common/RTLinker.m3

-- EXCEPTION HANDLER STACK -
0xbfffe85c RAISES {}
0xbfffea20 RAISES {}
0xbfffea60 LOCK mutex = 0x81842ec
0xbfffea6c RAISES {}
0xbfffeac8 RAISES {}
0xbfffec34 TRY-FINALLY proc = 0x8068d73 frame = 0xb0bc
0xbfffecfc TRY-EXCEPT {Main.Error}
0xbfffee68 TRY-EXCEPT {Thread.Alerted}

Aborted

Here is my cvsup config file:

*default release=cvs host=anoncvs.xfree86.org base=/home/boston/jdennis/.cvsup
*default prefix=/home/boston/jdennis/src/xfree86 delete use-rel-suffix
*default compress
*default tag=.
xc-all
doctools-all
contrib-all
xtest-all
utils-all

-- John Dennis [EMAIL PROTECTED]
SlowBCopy, IA64, PCI bus corruption
as to how much no-op is needed in the loop on a given system?

4) Is my general analysis correct? If not, can you help explain where I'm missing the mark and what the actual issues are?

John

-- John Dennis [EMAIL PROTECTED]
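For context on the truncated questions above: the routine under discussion, xf86SlowBcopy, copies a byte at a time with a delay between accesses to throttle bus traffic. This sketch abstracts the delay into a callback so the logic is portable; on real hardware the delay would be a dummy I/O port read or a no-op loop, and the open question above is exactly how long that delay needs to be. The function name and signature here are illustrative, not the actual xf86 code:

```c
#include <stddef.h>

/* A sketch in the spirit of xf86SlowBcopy: copy one byte at a time,
 * inserting a delay after each byte so back-to-back PCI accesses are
 * spaced out.  The delay hook is a stand-in for a dummy I/O read or
 * a calibrated no-op loop. */
static void slow_bcopy(const unsigned char *src, unsigned char *dst,
                       size_t len, void (*delay)(void))
{
    while (len--) {
        *dst++ = *src++;
        if (delay)
            delay();
    }
}
```

With delay == NULL this degenerates to a plain byte copy, which is the behavior the throttling is meant to avoid on the affected hardware.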
Re: Cyberpro 20x0 driver?
At a previous job I was responsible for XFree86 Cyberpro drivers. Tvia (www.tvia.com) had supplied us with source code for their XFree86 4.x Cyberpro driver that worked reasonably well. I did have to make a few fixes and we did add some enhancements. The point of this is that Tvia does develop and maintain XFree86 drivers for their Cyberpro series. Why these are not available, at least as binary downloads from their web site, eludes me.

Tvia does try to earn extra income from selling their SDKs. As of a year ago when I was working on this, the SDKs did not include the XFree86 driver sources; we had to obtain those separately. I suspect the reason Tvia has not open sourced their driver is a function of their wanting to derive income from the sale of their SDKs. My personal opinion of their SDKs and their documentation was that they were not worth the price being asked. However, having said that, I did find it essential to have the SDKs in order to work on the driver, because they provided example code to perform certain functions which at the time were not part of the driver.

I think Tvia suffers the same type of myopia that many small vendors suffer from. They would generate more income via increased hardware sales by opening up their source pool than they generate by selling their marginal SDKs to a handful of partners. Maybe it would be worthwhile for someone to press them on this issue.

John