Re: Question about your git habits
On Fri, 22 Feb 2008, Chase Venters wrote:

> I've been making myself more familiar with git lately and I'm curious what
> habits others have adopted. (I know there are a few documents in circulation
> that deal with using git to work on the kernel but I don't think this has
> been specifically covered).
>
> My question is: If you're working on multiple things at once, do you tend to
> clone the entire repository repeatedly into a series of separate working
> directories and do your work there, then pull that work (possibly comprising
> a series of "temporary" commits) back into a separate local master
> repository with --squash, either into "master" or into a branch containing
> the new feature?
>
> Or perhaps you create a temporary topical branch for each thing you are
> working on, and commit arbitrary changes then checkout another branch when
> you need to change gears, finally --squashing the intermediate commits when a
> particular piece of work is done?

I find that the sequence of changes I make is pretty much unrelated to the
sequence of changes that end up in the project's history, because my
changes as I make them involve writing a lot of stubs (so I can build) and
then filling them out. It's beneficial to have version control on this so
that, if I screw up filling out a stub, I can get back to where I was.

Having made a complete series, I then generate a new series of commits,
each of which does one thing, without any bugs that I've resolved, such
that the net result is the same as the end of the messy history, except
with any debugging or useless stuff skipped. It's this series that gets
merged into the project history, and I discard the other history.

The real trick is that the early patches in a lot of series often refactor
existing code in ways that are generally good and necessary for your
eventual outcome, but which you'd never think of until you've written more
of the series. Generating a new commit sequence is necessary to end up
with a history where it looks from the start like you know where you're
going and have everything done that needs to be done when you get to the
point of needing it. Furthermore, you want to be able to test these
commits in isolation, without the distraction of the changes that actually
prompted them, which means that you want to have your working tree in a
state that you never actually had it in as you were developing the end
result.

This means that you'll usually want to rewrite commits for any series that
isn't a single obvious patch, so it's not a big deal to commit any time you
want to work on some different branch.

	-Daniel
*This .sig left intentionally blank*
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
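The workflow in this message has one property that is easy to check: the rewritten series has to end at exactly the same tree as the messy history, even though the intermediate states differ. Here is a toy sketch of that check in Python. It models a tree as a dict of file contents and a commit as a set of file changes; every name and file in it is made up for illustration and none of it is git's own API. With real git, the equivalent check would be that `git diff` between the two branch tips prints nothing.

```python
from typing import Dict, List, Optional

Tree = Dict[str, str]                # path -> file contents
Commit = Dict[str, Optional[str]]    # path -> new contents, None deletes


def apply_series(base: Tree, series: List[Commit]) -> Tree:
    """Apply each commit in order and return the resulting tree."""
    tree = dict(base)
    for commit in series:
        for path, contents in commit.items():
            if contents is None:
                tree.pop(path, None)
            else:
                tree[path] = contents
    return tree


base = {"driver.c": "old code"}

# History as it actually happened: stubs, a bug, a fix, leftover debugging.
messy = [
    {"driver.c": "old code + stub"},
    {"helper.c": "buggy helper", "debug.log": "printk spam"},
    {"helper.c": "helper"},
    {"driver.c": "refactored code + feature"},
    {"debug.log": None},
]

# Rewritten series: the refactoring comes first, then one change per commit.
clean = [
    {"driver.c": "refactored code"},
    {"helper.c": "helper"},
    {"driver.c": "refactored code + feature"},
]

messy_tip = apply_series(base, messy)
clean_tip = apply_series(base, clean)
same_result = messy_tip == clean_tip
```

Note that the state after the first clean commit (refactored code, no helper, no feature) never existed while the messy history was being written, which is why each rewritten commit needs to be tested on its own.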
Re: [REGRESSION 2.6.23] no vga console and no messages
As far as I can tell, the only differences in either dmesg or lspci between
the broken one and the working one are the phrasing of messages, not what's
happening.

Out of curiosity, what do you see for the "Console: " line when you boot?
It's possible that the VGA console code somehow got broken for both of us
in 2.6.23, and this means that (a) my console doesn't work, since I'm
trying to use it, and (b) if your framebuffer console tries to match your
VGA console, it'll break, because your VGA console is broken.

	-Daniel
*This .sig left intentionally blank*
Re: [REGRESSION 2.6.23] no vga console and no messages
On Sun, 17 Feb 2008, Frans Pop wrote:

> On Sunday 17 February 2008, Daniel Barkalow wrote:
> > On Sun, 17 Feb 2008, Frans Pop wrote:
> > > Daniel Barkalow wrote:
> > > > For some reason I can't see and don't know how to debug, in 2.6.23 on
> > > > my server I don't get the vga console, but only get the dummy
> > > > console.
> > >
> > > Please check if this bug report matches the issue you are seeing:
> > > http://bugzilla.kernel.org/show_bug.cgi?id=9310
> >
> > I think mine might be different. I've got a vga parameter (vga=0x301),
> > and mine disappears very early, before when you usually get "Console:
> > colour VGA+ 80x25" (and I'm getting "Console: colour dummy 80x25"
> > instead). I've also got CONFIG_FB turned off entirely.
>
> The main question is: do you have FRAMEBUFFER_CONSOLE_DETECT_PRIMARY enabled
> in your kernel config? If you do, I'd try disabling it.
>
> > But if you've got any insight into how the console driver stuff works
> > from troubleshooting your problem, I could use the hints...
>
> Afraid not. Are you sure you have the correct framebuffer driver compiled
> into the kernel?

I'm sure I have none at all; I'm trying to use the vga console, not the
framebuffer console, or the framebuffer at all.

> Please post your kernel config and the output of 'lspci -nn', so people can
> have a look.
.config from the build where I disabled DUMMY_CONSOLE and it panics:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-gentoo-r8
# Sat Feb 16 21:54:06 2008
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_BLOCK=y
CONFIG_LBD=y
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set
# CONFIG_BLK_DEV_BSG is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
# CONFIG_TICK_ONESHOT is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MCORE2 is not set
CONFIG_MPENTIUM4=y
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_
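When comparing a working build against a broken one, it can help to pull just the console-related symbols out of a .config in the format shown above. Here is a minimal sketch. The helper name `parse_config` and the choice of symbols to inspect are assumptions for illustration, and because the posted config is cut off before its console section, the sample fragment is hypothetical rather than taken from the report.

```python
import re
from typing import Dict

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text: str) -> Dict[str, str]:
    """Map each CONFIG_ symbol to its value; unset symbols map to 'n'."""
    options: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            # Strip the quotes that string-valued options carry.
            options[m.group(1)] = m.group(2).strip('"')
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


# Hypothetical fragment; the values are illustrative only.
sample = """\
#
# Console display driver support
#
# CONFIG_FB is not set
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_DEFAULT_IOSCHED="cfq"
"""

opts = parse_config(sample)
console_view = {
    name: opts.get(name, "absent")
    for name in ("CONFIG_FB", "CONFIG_VGA_CONSOLE",
                 "CONFIG_DUMMY_CONSOLE", "CONFIG_FRAMEBUFFER_CONSOLE")
}
```

Running the same function over the configs of the working and broken kernels and comparing the two `console_view` dicts would show whether the console options themselves changed between builds.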
Re: [REGRESSION 2.6.23] no vga console and no messages
On Sun, 17 Feb 2008, Frans Pop wrote:

> Daniel Barkalow wrote:
> > For some reason I can't see and don't know how to debug, in 2.6.23 on my
> > server I don't get the vga console, but only get the dummy console.
>
> Please check if this bug report matches the issue you are seeing:
> http://bugzilla.kernel.org/show_bug.cgi?id=9310

I think mine might be different. I've got a vga parameter (vga=0x301), and
mine disappears very early, before the point where you usually get
"Console: colour VGA+ 80x25" (and I'm getting "Console: colour dummy 80x25"
instead). I've also got CONFIG_FB turned off entirely.

But if you've got any insight into how the console driver stuff works from
troubleshooting your problem, I could use the hints...

	-Daniel
*This .sig left intentionally blank*
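The symptom described here comes down to which driver is named on the "Console:" line of the boot log. Here is a small sketch for picking that out of saved dmesg output. The function name is made up, and the line format is assumed only from the two messages quoted in this mail, so the pattern may need adjusting for other kernels.

```python
import re
from typing import Optional

# "Console: <colour word> <driver name> <cols>x<rows>", as quoted in the mail.
_CONSOLE = re.compile(r"Console: \w+ (.+?) \d+x\d+")


def console_driver(dmesg_text: str) -> Optional[str]:
    """Return the driver named on the first 'Console:' line, if any."""
    for line in dmesg_text.splitlines():
        m = _CONSOLE.search(line)
        if m:
            return m.group(1)
    return None


working = console_driver("Console: colour VGA+ 80x25")
broken = console_driver("Console: colour dummy 80x25")
```

Comparing the result for a 2.6.22 boot with the result for a 2.6.23 boot on the same machine would confirm whether the VGA driver ever took over from the dummy console.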
[REGRESSION 2.6.23] no vga console and no messages
For some reason I can't see and don't know how to debug, in 2.6.23 on my
server I don't get the vga console, but only get the dummy console.

I also noticed that the documentation is wrong and the Kconfig file is
confused; it's impossible to not have DUMMY_CONSOLE set, because at least
one of PROM_CONSOLE and VGA_CONSOLE must not be y. Normally (maybe only due
to the fact that "dummycon" sorts before "promcon", "sticon", and
"vgacon"), it actually only stays active if your real console doesn't also
get initialized. This isn't my problem, AFAICT (my kernel panics if I
disable DUMMY_CONSOLE, presumably for lack of any console at all); it's
just misleading.

I'm not seeing anything in dmesg to indicate why VGA+ isn't getting
registered successfully, or anything to suggest it is trying to be, nor do
I see anything in a 2.6.22 boot about why that seems to work.

Any suggestions on further things to try? I haven't tested anything newer
than 2.6.23.x, but I looked through the git history and didn't find
anything that looked relevant, or even anyone who might know about it.

	-Daniel
*This .sig left intentionally blank*
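The Kconfig observation can be spelled out as a small truth table. This sketch models only the dependency as the message describes it (DUMMY_CONSOLE is on unless both PROM_CONSOLE and VGA_CONSOLE are y). It is not a transcription of the actual Kconfig file, and the function name is hypothetical.

```python
from itertools import product


def dummy_console_forced(prom_console: str, vga_console: str) -> bool:
    """Dependency as described in the message: on unless both are 'y'."""
    return prom_console != "y" or vga_console != "y"


table = {
    (prom, vga): dummy_console_forced(prom, vga)
    for prom, vga in product(("y", "n"), repeat=2)
}

# Combinations under which DUMMY_CONSOLE could actually be switched off.
can_disable = [combo for combo, forced in table.items() if not forced]
```

Only the (y, y) combination would let DUMMY_CONSOLE be disabled, and the message says that combination can't occur, which is why the option is effectively always set.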
Re: Problem with ata layer in 2.6.24
On Tue, 29 Jan 2008, Alan Cox wrote:

> > The SCSI error reporting really ought to include a simple interpretation
> > of the error for end users ("The drive doesn't support this command", "A
> > sector's data got lost", "The drive timed out", "The drive failed", "The
> > drive is entirely gone"). There's too much similarity between the message
> > you get when you try a SMART test that doesn't apply to the drive and what
> > you get when the drive is broken.
>
> That would be the SCSI verbose messages option. I think the Eric
> Youngdale consortium added it about Linux 1.2. Nowadays it's always built
> that way.

I've seen a lot of verbosity out of SCSI messages, but I haven't seen a
straightforward interpretation of the problem in there. It's all
information useful for debugging, not information useful for system
administration.

> > And it's possible that the error recovery is suboptimal in some cases. It
> > seems to like resetting drives too much; perhaps if it keeps seeing the
> > same problem and resetting the drive, it should decide that the drive's
> > error reporting is just bad and just ignore that error like the old IDE
> > did (but, in this case, after saying what it's doing).
>
> Nothing like casually praying the user's data hasn't gone for a walk, is
> there? If we don't act on them the users don't report them until
> something really bad occurs so that isn't an option.

On the other hand, bringing the system down because a device is
misbehaving is a poor idea. I've personally recovered most of the data off
of a dying drive because the system was willing to let me keep using the
drive anyway; IIRC, the drive didn't work at all after a reboot, so I would
have lost all the data instead of only a little had the system insisted on
a perfectly functioning drive in order to use it at all.

There ought to be some middle ground between doing nothing until the
computer really breaks and breaking the computer before then, but that's an
issue not specific to libata.

	-Daniel
*This .sig left intentionally blank*
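To make the "useful for system administration" idea concrete, here is a rough sketch of a layer that turns low-level status into the one-line summaries suggested in this thread. The sense key numbers are the standard SCSI ones (MEDIUM ERROR, HARDWARE ERROR, ILLEGAL REQUEST). Which sentence goes with which key is a guess, and the function with its parameters is hypothetical; nothing like it is claimed to exist in the kernel.

```python
from typing import Optional

# SCSI sense key values as defined by the SCSI Primary Commands standard.
MEDIUM_ERROR = 0x3
HARDWARE_ERROR = 0x4
ILLEGAL_REQUEST = 0x5

# Assumed pairing of sense keys with the plain-language messages.
PLAIN_TEXT = {
    ILLEGAL_REQUEST: "The drive doesn't support this command",
    MEDIUM_ERROR: "A sector's data got lost",
    HARDWARE_ERROR: "The drive failed",
}


def interpret(sense_key: Optional[int], timed_out: bool = False,
              device_present: bool = True) -> str:
    """Pick a one-line summary for an administrator (illustrative only)."""
    if not device_present:
        return "The drive is entirely gone"
    if timed_out:
        return "The drive timed out"
    if sense_key in PLAIN_TEXT:
        return PLAIN_TEXT[sense_key]
    return "Unrecognised error; see the detailed log"


smart_test_unsupported = interpret(ILLEGAL_REQUEST)
bad_sector = interpret(MEDIUM_ERROR)
```

The point of the two sample calls is the distinction drawn in the thread: an unsupported SMART test and a genuinely failing drive should not produce messages that look alike.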
Re: Problem with ata layer in 2.6.24
On Tue, 29 Jan 2008, Alan Cox wrote:

> > not one problem but lots---is sufficiently widespread that a Mini HOWTO,
> > say, would be really welcome and, I'm guessing, widely used.
>
> We don't see very many libata problems at the distro level and they for
> the most part boil down to
>
> - error messages looking different - Most bugs I get are things like
>   media errors (timeout looks different, UNC report looks different)

The SCSI error reporting really ought to include a simple interpretation
of the error for end users ("The drive doesn't support this command", "A
sector's data got lost", "The drive timed out", "The drive failed", "The
drive is entirely gone"). There's too much similarity between the message
you get when you try a SMART test that doesn't apply to the drive and what
you get when the drive is broken.

> - faulty hardware being picked up because we actually do real error
>   checking now. We now check for and give some devices more slack while
>   still doing error checking. Both IDE layers also added blacklists for
>   stuff like the TSScorp DVD drives. Qemu has now had its bugs patched.

I think this is the big source of unhappy users (and, of course, they all
look the same and the reports stay findable by Google, so it looks a lot
worse than it is). People getting this problem in distro kernels probably
really do want to have a way to report it with enough detail from logs to
get it dealt with and then switch back to old IDE until the fix propagates
through.

And it's possible that the error recovery is suboptimal in some cases. It
seems to like resetting drives too much; perhaps if it keeps seeing the
same problem and resetting the drive, it should decide that the drive's
error reporting is just bad and just ignore that error like the old IDE
did (but, in this case, after saying what it's doing).

	-Daniel
*This .sig left intentionally blank*
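The recovery idea floated at the end of this message (reset a few times, then stop resetting for that particular error, but say so first) could look roughly like the sketch below. It is a hypothetical policy object for illustration. The class name, the threshold and the action strings are all assumptions, and it makes no claim about how libata's error handling is actually structured.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RecoveryPolicy:
    """Hypothetical policy: reset a few times, then degrade instead of looping."""
    max_resets: int = 3
    seen: Dict[str, int] = field(default_factory=dict)

    def on_error(self, error: str) -> str:
        """Return the action to take for this occurrence of `error`."""
        count = self.seen.get(error, 0) + 1
        self.seen[error] = count
        if count <= self.max_resets:
            return "reset"
        if count == self.max_resets + 1:
            # Announce once that this error will be ignored from now on.
            return "warn-and-continue"
        return "continue"


policy = RecoveryPolicy(max_resets=2)
actions = [policy.on_error("timeout") for _ in range(4)]
```

Counting per error string means a different kind of failure on the same drive still gets its own resets, so giving up on one noisy error does not hide a new problem.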
Re: Problem with ata layer in 2.6.24
On Tue, 29 Jan 2008, Alan Cox wrote:

> > things in the kernel that refer to SCSI probably should say "storage" (or
> > "ATA", really, but that would make the acronyms confusing).
>
> SCSI is a command protocol. It is what your CD-ROM drive and USB storage
> devices talk (albeit with a bit of an accent).

Among other things, yes. But SCSI standards also specify electrical
interfaces that aren't at all related to the electrical interfaces used by
a lot of devices, and a lot of the places the kernel uses the term suggest
that it's also talking about the electrical interface (or, at least,
connector shape). For example, it's misleading to talk about "SCSI CDROM
support" meaning the command protocol when hardly anybody has ever seen a
CDROM drive that doesn't use the SCSI command protocol, but most people
know about both SCSI-connector and PATA-connector CDROM drives.

	-Daniel
*This .sig left intentionally blank*
Re: Problem with ata layer in 2.6.24
On Tue, 29 Jan 2008, Gene Heskett wrote:

> >For starters, enable CONFIG_BLK_DEV_SR.
>
> That could stand to be moved or renamed, it is well buried in the menu for
> the REAL scsi stuffs, which I don't have any of. Enabled & building now.

The "SCSI support type (disk, tape, CD-ROM)" section of that menu actually
applies to all ATA-command-set devices that don't use the old IDE code.
For example, usb-storage uses "SCSI disk" out of that section, and I've
only seen "Probe all LUNs on each SCSI device" be needed for a particular
USB card reader with two slots. At this point, most of the things in the
kernel that refer to SCSI probably should say "storage" (or "ATA", really,
but that would make the acronyms confusing).

Incidentally, you should be able to save debugging time for problems like
missing "sr" by building it as a module, which will build really quickly
and not require a reboot to test.

	-Daniel
*This .sig left intentionally blank*
Re: Problem with ata layer in 2.6.24
On Tue, 29 Jan 2008, Gene Heskett wrote: For starters, enable CONFIG_BLK_DEV_SR. That could stand to be moved or renamed, it is well buried in the menu for the REAL scsi stuffs, which I don't have any of. Enabled building now. The SCSI support type (disk, tape, CD-ROM) section of that menu actually applies to all ATA-command-set devices that don't use the old IDE code. For example, usb-storage uses SCSI disk out of that section, and I've only seen Probe all LUNs on each SCSI device be needed for a particular USB card reader with two slots. At this point, most of the things in the kernel that refer to SCSI probably should say storage (or ATA, really, but that would make the acronyms confusing). Incidentally, you should be able to save debugging time for problems like missing sr by building it as a module, which will build really quickly and not require a reboot to test. -Daniel *This .sig left intentionally blank* -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Problem with ata layer in 2.6.24
On Tue, 29 Jan 2008, Alan Cox wrote: things in the kernel that refer to SCSI probably should say storage (or ATA, really, but that would make the acronyms confusing). SCSI is a command protocol. It is what your CD-ROM drive and USB storage devices talk (albeit with a bit of an accent). Among other things, yes. But SCSI standards also specify electrical interfaces that aren't at all related to the electrical interfaces used by a lot of devices, and a lot of the places the kernel uses the term suggest that it's also talking about the electrical interface (or, at least, connector shape). For example, it's misleading to talk about SCSI CDROM support meaning the command protocol when hardly anybody has ever seen a CDROM drive that doesn't use the SCSI command protocol, but most people know about both SCSI-connector and PATA-connector CDROM drives. -Daniel *This .sig left intentionally blank* -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Problem with ata layer in 2.6.24
On Tue, 29 Jan 2008, Alan Cox wrote: not one problem but lots---is sufficiently widespread that a Mini HOWTO, say, would be really welcome and, I'm guessing, widely used. We don't see very many libata problems at the distro level and they for the most part boil down to - error messages looking different - Most bugs I get are things like media errors (timeout looks different, UNC report looks different) The SCSI error reporting really ought to include a simple interpretation of the error for end users (The drive doesn't support this command A sector's data got lost The drive timed out The drive failed The drive is entirely gone). There's too much similarity between the message you get when you try a SMART test that doesn't apply to the drive and what you get when the drive is broken. - faulty hardware being picked up because we actually do real error checking now. We now check for and give some devices more slack while still doing error checking. Both IDE layers also added blacklists for stuff like the TSScorp DVD drives. Qemu has now had its bugs patched. I think this is the big source of unhappy users (and, of course, they all look the same and the reports stay findable by Google, so it looks a lot worse than it is). People getting this problem in distro kernels probably really do want to have a way to report it with enough detail from logs to get it dealt with and then switch back to old IDE until the fix propagates through. And it's possible that the error recovery is suboptimal in some cases. It seems to like resetting drives too much; perhaps if it keeps seeing the same problem and resetting the drive, it should decide that the drive's error reporting is just bad and just ignore that error like the old IDE did (but, in this case, after saying what it's doing). 
-Daniel
*This .sig left intentionally blank*
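The "stop resetting after the same error keeps coming back, and say so" idea from the message above can be sketched roughly as follows. This is only an illustration, not libata's actual error handling: the `ResetPolicy` class, the error "signature" strings, and the `max_resets` threshold are all hypothetical names chosen for the example.

```python
from collections import Counter


class ResetPolicy:
    """Hypothetical policy: reset a few times, then ignore a recurring error."""

    def __init__(self, max_resets=3):
        self.max_resets = max_resets
        self.resets = Counter()   # error signature -> resets already tried
        self.ignored = set()      # signatures we have given up on
        self.log = []             # what we told the administrator

    def on_error(self, signature):
        """Return "reset" or "ignore" for one occurrence of an error."""
        if signature in self.ignored:
            return "ignore"
        if self.resets[signature] >= self.max_resets:
            # Resetting has not helped; assume the reporting is unreliable,
            # but announce the decision instead of going quiet.
            self.ignored.add(signature)
            self.log.append(
                f"{signature}: still reported after {self.max_resets} "
                "resets; ignoring it from now on"
            )
            return "ignore"
        self.resets[signature] += 1
        return "reset"


policy = ResetPolicy(max_resets=2)
actions = [policy.on_error("timeout") for _ in range(4)]
```

With a threshold of two, the first two occurrences trigger a reset and later ones are ignored, with a single log entry recording the switch.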
Re: Problem with ata layer in 2.6.24
On Tue, 29 Jan 2008, Alan Cox wrote:

> > The SCSI error reporting really ought to include a simple interpretation
> > of the error for end users ("The drive doesn't support this command",
> > "A sector's data got lost", "The drive timed out", "The drive failed",
> > "The drive is entirely gone"). There's too much similarity between the
> > message you get when you try a SMART test that doesn't apply to the
> > drive and what you get when the drive is broken.
>
> That would be the SCSI verbose messages option. I think the Eric Youngdale
> consortium added it about Linux 1.2. Nowdays its always built that way.

I've seen a lot of verbosity out of SCSI messages, but I haven't seen a straightforward interpretation of the problem in there. It's all information useful for debugging, not information useful for system administration.

> > And it's possible that the error recovery is suboptimal in some cases. It
> > seems to like resetting drives too much; perhaps if it keeps seeing the
> > same problem and resetting the drive, it should decide that the drive's
> > error reporting is just bad and just ignore that error like the old IDE
> > did (but, in this case, after saying what it's doing).
>
> Nothing like casually praying the users data hasn't gone for a walk is
> there. If we don't act on them the users don't report them until something
> really bad occurs so that isn't an option.

On the other hand, bringing the system down because a device is misbehaving is a poor idea. I've personally recovered most of the data off of a dying drive because the system was willing to let me keep using the drive anyway; IIRC, the drive didn't work at all after a reboot, so I would have lost all the data instead of only a little had the system insisted on a perfectly functioning drive in order to use it at all. There ought to be some middle ground between doing nothing until the computer really breaks and breaking the computer before then, but that's an issue not specific to libata.
-Daniel
*This .sig left intentionally blank*
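As a rough illustration of "a straightforward interpretation" sitting in front of the debugging detail, something like the following could work. The five message strings are the ones proposed in the thread; the category keys and the `describe` helper are hypothetical, and how a real driver would classify an error into one of these categories is not covered here.

```python
# Category keys are made up for this sketch; the messages come from the thread.
PLAIN_MESSAGES = {
    "unsupported_command": "The drive doesn't support this command",
    "media_error": "A sector's data got lost",
    "timeout": "The drive timed out",
    "device_fault": "The drive failed",
    "device_gone": "The drive is entirely gone",
}


def describe(category, detail):
    """Put the administrator-level summary first, keep the raw detail after."""
    summary = PLAIN_MESSAGES.get(category, "Unrecognized drive error")
    return f"{summary} ({detail})"


line = describe("unsupported_command", "SMART self-test rejected")
```

The point of the design is only ordering: the administrator reads the first clause, and whoever debugs the problem still gets the verbose part.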
Re: Problem with ata layer in 2.6.24
On Mon, 28 Jan 2008, Gene Heskett wrote:

> On Monday 28 January 2008, Daniel Barkalow wrote:
> >On Mon, 28 Jan 2008, Gene Heskett wrote:
> >> On Monday 28 January 2008, Daniel Barkalow wrote:
> >> >Building this and installing it along with the appropriate initrd (which
> >> >might be handled by Fedora's install scripts)
> >>
> >> Or mine, which I've been using for years.
> >
> >You're ahead of a surprising number of people, including me, if you
> >understand making initrds.
>
> In my script, its one line:
> mkinitrd -f initrd-$VER.img $VER && \
>
> where $VER is the shell variable I edit to = the version number, located at
> the top of the script.
>
> Unforch, its failing:
> No module pata_amd found for kernel 2.6.24, aborting.
>
> This is with pata_amd turned off and its counterpart under ATA/RLL/etc turned
> on. So something is still dependent on it.

That looks like something in the guts of the initrd; it probably thinks you need pata_amd and it's unhappy that you don't have it. Actually, another thing to try is making the ATA/etc one be "y" and pata_amd be "m". Most likely, this should lead to the ATA one claiming the drive before the module is loaded (but the module would be loaded later, to avoid upsetting the initrd); you should be able to tell from dmesg (or /dev, for that matter) which one got it, and I think built-in drivers will claim everything they can before an initrd gets loaded.

> I do have one sata drive, on an accessory card in the box, so I need the
> rest of the sata_sil and friends stuff.

Assuming it isn't picking up your hard drive, which it isn't, that shouldn't matter.

-Daniel
*This .sig left intentionally blank*
Re: Problem with ata layer in 2.6.24
On Mon, 28 Jan 2008, Gene Heskett wrote:

> On Monday 28 January 2008, Daniel Barkalow wrote:
> >Building this and installing it along with the appropriate initrd (which
> >might be handled by Fedora's install scripts)
>
> Or mine, which I've been using for years.

You're ahead of a surprising number of people, including me, if you understand making initrds.

> >will either get you back to
> >old IDE or will make your kernel panic on boot, depending on whether you
> >got it right (so make sure you can still boot the kernel you're sure of or
> >something from a boot disk). This will also cause your hard drives to show
> >up as different device nodes, so if your boot process doesn't mount by
> >disk uuid but by some other feature (and I don't know what Fedora does),
> >you'll also need to change it to something either stable across access
> >methods or which works for the one you're now using.
>
> It mounts by LABEL=. All of it.

That'll save a huge amount of hassle. So long as you manage to get the right drivers included and the wrong drivers not included, you should be pretty much set.

> Fedora is not the only people having trouble, name a distro, its probably
> someplace in that 14,800 hit google returns.

Yeah, but they each may need different instructions, particularly if they're not mounting by label in general, or not mounting the root partition by label. That was the big hassle going the opposite direction.

And the procedure is 4 lines to describe to somebody who knows how to build and install a new kernel for the distro, which is much shorter than the explanation of how you generally build and install a kernel. A real howto would have to explain where to get the distro's kernel sources and default configuration, for example.

-Daniel
*This .sig left intentionally blank*
Re: [rfc] exposing MMR's of on-chip peripherals for debugging purposes
On Mon, 28 Jan 2008, Mike Frysinger wrote:

> On Jan 28, 2008 7:08 PM, Daniel Barkalow <[EMAIL PROTECTED]> wrote:
> > Could you submit the XML files and the autogeneration code? The C file
> > isn't really source. Not only is it big, it'll probably change around a
> > whole lot when you make small changes to your process, be hard to review,
> > etc.
>
> that would require the build system to have xml tools installed ...
> that doesnt sound pleasant.

If they're only required for building blackfin debugging stuff, that shouldn't be a big deal. People building embedded kernels with debugging from source can probably handle the extra requirement. Setting up a cross-compilation toolchain for embedded processors is much trickier than getting xml tools.

> that said, the XML files in question are probably 10x+ the size of the
> C file. swapping 1 meg for 10+ megs ? :)

If it's a bunch of smaller files, and if changes tend to be localized, that would be a good tradeoff. Alternatively, have them packaged separately, which might be more appropriate anyway if people might want to use them for other purposes (on the host when using jtag, perhaps).

-Daniel
*This .sig left intentionally blank*
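To give a feel for how small the "XML plus generator" side of this can be, here is a minimal sketch of a generator that turns a register description into a C table. Everything specific in it is an assumption made for illustration: the XML schema, the register names and addresses in the sample, and the `mmr_entry` struct name are hypothetical and are not taken from the Blackfin tree or its actual tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical input format; the real Blackfin XML files may look nothing like this.
SAMPLE_XML = """\
<peripheral name="WDOG">
  <register name="WDOG_CTL" addr="0xFFC00200" width="16"/>
  <register name="WDOG_CNT" addr="0xFFC00204" width="32"/>
</peripheral>
"""


def generate_c_table(xml_text):
    """Emit one C array of { name, address, width } rows per peripheral."""
    root = ET.fromstring(xml_text)
    table = root.get("name").lower()
    lines = [f"static const struct mmr_entry {table}_mmrs[] = {{"]
    for reg in root.findall("register"):
        name = reg.get("name")
        addr = int(reg.get("addr"), 16)
        width = int(reg.get("width"))
        lines.append(f'\t{{ "{name}", 0x{addr:08X}, {width} }},')
    lines.append("};")
    return "\n".join(lines)


generated = generate_c_table(SAMPLE_XML)
```

A change to one register then shows up as a one-line change in a small XML file, which is the reviewability argument made above, rather than as churn in a generated multi-megabyte C file.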
Re: [rfc] exposing MMR's of on-chip peripherals for debugging purposes
On Mon, 28 Jan 2008, Mike Frysinger wrote:

> On Jan 28, 2008 8:04 AM, richard kennedy <[EMAIL PROTECTED]> wrote:
> > Mike Frysinger wrote:
> > > On Jan 28, 2008 5:40 AM, Bryan Wu <[EMAIL PROTECTED]> wrote:
> > >> On Mon, 2008-01-28 at 05:16 -0500, Mike Frysinger wrote:
> > >>> the trouble is that this file currently weighs in at ~1.8 megs. this
> > >>> is because it contains all the information for all Blackfin processors
> > >>> we support (which currently, is about ~23 variants). it's only going
> > >>> to get bigger as we support more. Bryan cringes at the thought of
> > >>> submitting it to LKML :). so i'm fishing around for alternatives ...
> > >>> the code was originally developed against 2.6.21, so UIO was not a
> > >>> possibility. i'm still not sure if it is ... i'd have to research it
> > >>> a bit more and play with things.
> > >> The main reason I am not willing to submit this to mainline is the file
> > >> size. It's almost the biggest file in the kernel source. And it will be
> > >> bigger and bigger when more and more new Blackfin processors supported
> > >> by Linux kernel.
> > >
> > > a quick check of current git shows it is significantly larger than any
> > > other ;)
> > >
> > >> My suggestion is:
> > >> Or more deeper thought:
> > >> - we don't need all the MMR setup at the same time for debugging. for
> > >> example, maybe for some developer, he/she only needs one driver MMR for
> > >> debugging such as watchdog/usb/spi/i2c
> > >
> > > splitting things up doesnt really address the original issue: there's
> > > a lot of info here to be kept in the kernel
> > >
> > >> - How about split the debug MMR table to each drivers or processors?
> > >> - watchdog driver implements a debug FS interface for debugging
> > >> watchdog MMR and other drivers implement their own things.
> > >
> > > this had been mentioned before as a possibility but shot down. you do
> > > not want to tie the creation of these debug files to anything as the
> > > prevents independent development of any other drivers/application that
> > > use the same peripheral.
> >
> > there is a lot of duplication in your file, but you could slim it down a
> > bit if thats the only objection.
>
> i imagine there's a ton of duplication ... the file is auto-generated
> from XML files, so i could take a look at the autogeneration producing
> unified code.

Could you submit the XML files and the autogeneration code? The C file isn't really source. Not only is it big, it'll probably change around a whole lot when you make small changes to your process, be hard to review, etc.

-Daniel
*This .sig left intentionally blank*
Re: Problem with ata layer in 2.6.24
On Mon, 28 Jan 2008, Richard Heck wrote:

> Daniel Barkalow wrote:
> > Can you switch back to old IDE to get your work done (and to make sure it's
> > not a hardware issue that's developed recently)?
> I think it'd be really, REALLY helpful to a lot of people if you, or someone,
> could explain in moderate detail how this might be done. I tried doing it
> myself, but I'm not sufficiently expert at configuring kernels that I was ever
> able to figure out how to do it.

As far as configuring the kernel, I can help: Go to Device Drivers, ATA/ATAPI/MFM/RLL support, and turn on anything that looks relevant; go to Device Drivers, Serial ATA and Parallel ATA drivers, and turn off anything that's PATA and looks relevant. (Whether a device uses IDE or PATA depends on which driver that supports the device is present and finds it first, not on any sort of global configuration, which is probably what tripped you up.)

Building this and installing it along with the appropriate initrd (which might be handled by Fedora's install scripts) will either get you back to old IDE or will make your kernel panic on boot, depending on whether you got it right (so make sure you can still boot the kernel you're sure of or something from a boot disk). This will also cause your hard drives to show up as different device nodes, so if your boot process doesn't mount by disk uuid but by some other feature (and I don't know what Fedora does), you'll also need to change it to something either stable across access methods or which works for the one you're now using.

> Obviously, the short version is: switch back to Fedora 6. But this kind of
> problem with libata---and yes, you're almost surely right that it's not one
> problem but lots---is sufficiently widespread that a Mini HOWTO, say, would be
> really welcome and, I'm guessing, widely used.

Fedora really ought to provide documentation, because there's some distro-specific stuff (like how you deal with the kernel's device node for the root partition changing), and they're using code by default that's at least somewhat documented as experimental (although it doesn't seem to be actually marked as experimental in all cases).

-Daniel
*This .sig left intentionally blank*
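After changing the options described above, it can help to double-check the resulting `.config` rather than trust the menus. The sketch below parses the usual `CONFIG_FOO=y` / `# CONFIG_FOO is not set` lines and lists which drivers with a given prefix are still enabled. The sample fragment is illustrative only: the option names shown are assumed to match an AMD PATA controller plus a sata_sil card, and a real configuration will contain different entries.

```python
# Illustrative fragment; check the option names against your own .config.
SAMPLE_CONFIG = """\
CONFIG_IDE=y
CONFIG_BLK_DEV_AMD74XX=y
CONFIG_ATA=y
# CONFIG_PATA_AMD is not set
CONFIG_SATA_SIL=m
"""


def parse_config(text):
    """Return {option: value} for every option that is actually set."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def enabled_with_prefix(options, prefix):
    """List built-in (y) or modular (m) options whose name starts with prefix."""
    return sorted(
        name
        for name, value in options.items()
        if name.startswith(prefix) and value in ("y", "m")
    )


options = parse_config(SAMPLE_CONFIG)
pata_left = enabled_with_prefix(options, "CONFIG_PATA_")
old_ide = enabled_with_prefix(options, "CONFIG_BLK_DEV_")
```

An empty `pata_left` list together with the expected old IDE driver in `old_ide` is the state the instructions above are aiming for; SATA drivers such as the one for an add-in card can stay enabled.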
Re: Problem with ata layer in 2.6.24
On Mon, 28 Jan 2008, Gene Heskett wrote:

> I believe at this point, its moot. I captured quite a few instances of that
> error message while rebooting the last time, all of which occurred long
> before I logged in and did a startx (I boot to runlevel 3 here), so the
> kernel was NOT tainted at that point. That dmesg has been posted and some
> questions asked.
>
> As this has gone on for a while, it seems to me that with 14,800 google hits
> on this problem, Linus should call a halt until this is found and fixed. But
> I'm not Linus. I'm also locking up for 30 at a time, & probably ready for
> reboot #7 today.

Can you switch back to old IDE to get your work done (and to make sure it's not a hardware issue that's developed recently)?

I believe libata is just a whole lot pickier about behavior than the IDE subsystem was, so it's more likely to complain about stuff, both for good reasons and when it shouldn't, and there are a slew of potential "we have to accept that old PATA hardware does this" bugs that all have the same symptom of "we go into error handling when nothing is actually wrong", hence the vast quantity of hits. I think it's not exactly that it's a common problem as that it's a lot of problems that aren't very distinguishable.

-Daniel
*This .sig left intentionally blank*
Re: PROBLEM: Celeron Core
On Sun, 20 Jan 2008, Matt Mackall wrote:

> Your usage of "overall power" here is wrong. Power is an instantaneous
> quantity (1/s) like velocity, and you are comparing it to energy which
> is not an instaneous quantity, more like distance.
>
> If we throttle the velocity of a car from 100km/h to 50km/h, it'll
> obviously take longer for it travel a given distance. Now what will it
> mean when we ask about its "overall velocity" when it reaches its
> destination? We surely don't mean the distance travelled - that's not a
> velocity! We can perhaps talk about its average velocity, which will
> obviously be smaller.

What people tend to care about is average power usage over a period, not instantaneous power usage. In fact, throttling obviously doesn't decrease instantaneous power usage while the machine is doing anything (since it runs full speed and full power when running, and does nothing and uses some but not as much power when halted). Throttling decreases the average power usage over the period of the throttling, but increases the average power usage in general over longer periods.

If we throttle a car's velocity by only driving 100km/h for 5 minutes out of every 10 instead of all of the time, it doesn't meaningfully have less velocity. And it's a particularly meaningless measure if the arrangement as a whole is that it will leave point A at some time, drive to point B, and sit there until some other time; in this case its average velocity is the distance from point A to point B divided by the duration between the two times, regardless of how you drive. But the distance travelled is longer if you have to pull over and park every 10 minutes, and so the average velocity must be higher for the TDMA throttling case.

-Daniel
*This .sig left intentionally blank*
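The argument above can be checked with a small worked calculation. The numbers are invented for illustration (30 W while running, 12 W while halted by throttling, 8 W in the idle state reached once the work is done), and the model assumes that the throttled halt state draws more than true idle; none of these figures come from the thread or from measurements.

```python
def energy_joules(window_s, work_s, duty, p_run, p_halt, p_idle):
    """Energy over a fixed window for a job needing work_s seconds at full speed.

    duty is the fraction of time the CPU actually runs while the job is active
    (1.0 means no throttling).
    """
    busy_s = work_s / duty           # wall-clock time until the job finishes
    if busy_s > window_s:
        raise ValueError("job does not fit in the window")
    halted_s = busy_s - work_s       # time spent stopped by throttling
    idle_s = window_s - busy_s       # time spent idle after the job is done
    return work_s * p_run + halted_s * p_halt + idle_s * p_idle


# All figures below are assumptions chosen to make the arithmetic easy.
WINDOW_S, WORK_S = 600.0, 100.0
P_RUN, P_HALT, P_IDLE = 30.0, 12.0, 8.0

unthrottled = energy_joules(WINDOW_S, WORK_S, 1.0, P_RUN, P_HALT, P_IDLE)
throttled = energy_joules(WINDOW_S, WORK_S, 0.5, P_RUN, P_HALT, P_IDLE)

# Average power while the job is active, throttled at 50% duty:
busy_avg_w = 0.5 * P_RUN + 0.5 * P_HALT
```

Under these assumptions the throttled machine averages 21 W instead of 30 W while the job is running, yet uses 7400 J instead of 7000 J over the whole ten minutes, which is the "lower during throttling, higher over longer periods" effect described above.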
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
On Thu, 27 Dec 2007, Linus Torvalds wrote:

> On Thu, 27 Dec 2007, Daniel Barkalow wrote:
> >
> > I'd actually bet that the hardware bug is actually that any device that
> > gives a CRS response the first time will have its Vendor ID appear as 0001
> > on subsequent mmconfig accesses, which means that it's actually a bus
> > quirk that probably only affects mmconfig access to something in the
> > conf1-visible space. The only per-device aspect would be that it uses CRS
> > (possibly correctly), and that doesn't mean that mmconfig won't be safe in
> > general for the device, or even that it won't be necessary. Actually, we
> > already know that per-driver enabling mmconfig is broken: sky2 is one that
> > wants to opt in but there are also reports of the Vendor ID 0001 bug with
> > it.
>
> Actually, having it be a per-device thing would have fixed this particular
> problem, if only because the device probing would have been done without
> MMCONFIG (thus avoiding the bug), and then after it has been probed, it
> wouldn't have mattered if the driver enabled MMCONFIG for the device,
> since it would now have the right ID in "struct pci_device".
>
> Sure, subsequent "lspci" users would still be confused, but the kernel
> itself would never have noticed anything strange.

A bug making lspci see something different from what the kernel sees
initially sounds to me like a sure way to drive maintainers insane. If
somebody had a northbridge that also screwed up the rest of the word, and
a device that a mmconfig-using driver recognized but had problems with,
the user would be reporting lspci info with 0001: as the device that
doesn't work.

> Of course, just doing *all* initial probing without MMCONFIG would also
> have fixed it, which is another thing I advocate (regardless of any
> per-device setting).

So would always using conf1 for the non-extended space (unless the
platform only uses mmconfig), or at least for the first 64 bytes. I'd bet
all the subtle bugs are in the first few words, anyway. (With blatant bugs
in the rest, of course, where we want to blacklist busses and devices)

	-Daniel
*This .sig left intentionally blank*
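To make the "conf1 for the non-extended space" idea concrete, here is a minimal sketch in C. Everything in it (the enum, `struct platform`, `pick_method`, and the `conf1_limit` parameter) is hypothetical and only models the decision being discussed; it is not the kernel's config-access code.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum cfg_method { CFG_NONE, CFG_CONF1, CFG_MMCONFIG };

#define CONF1_SPACE_SIZE 256u   /* conf1 can address offsets 0..255 */
#define CFG_SPACE_SIZE   4096u  /* extended config space ends here */

struct platform {
    bool has_conf1;
    bool has_mmconfig;
};

/*
 * Pick the access method for one config-space offset.
 * conf1_limit is the boundary below which conf1 is preferred:
 * 256 for "all of the non-extended space", 64 for "at least the
 * first 64 bytes".
 */
enum cfg_method pick_method(const struct platform *p, unsigned int offset,
                            unsigned int conf1_limit)
{
    if (offset >= CFG_SPACE_SIZE)
        return CFG_NONE;
    /* Preferred: conf1 for the low, quirk-prone part of the space. */
    if (p->has_conf1 && offset < conf1_limit && offset < CONF1_SPACE_SIZE)
        return CFG_CONF1;
    /* Otherwise mmconfig, which is the only way into extended space. */
    if (p->has_mmconfig)
        return CFG_MMCONFIG;
    /* No mmconfig at all: conf1 still covers the first 256 bytes. */
    if (p->has_conf1 && offset < CONF1_SPACE_SIZE)
        return CFG_CONF1;
    return CFG_NONE;
}
```

With both methods present and `conf1_limit` set to 256, the Vendor ID read at offset 0 always goes through conf1, so the 0001 quirk would never be seen during probing. A platform that only has mmconfig falls through to it for every offset.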
Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
On Thu, 27 Dec 2007, Kai Ruhnau wrote:

> Linus Torvalds wrote:
> > On Thu, 27 Dec 2007, Linus Torvalds wrote:
> >
> >> Kai, can you try that? Just remove the call to pci_enable_crs() in
> >> pci_scan_bridge() in drivers/pci/probe.c, and see if mmconfig starts
> >> working for you?
> >>
> >
> > We could also make the error handling more permissive, and just check for
> > the low 16 bits, which is the part that the CRS spec mentions the actual
> > value for. The whole vendor ID of 0x0001 is mentioned in the CRS spec as
> > being explicitly chosen exactly because it's invalid.
> >
> > That said, given that we don't actually reap any benefits from CRS support
> > right now *anyway*, I think the right thing to do is disable it by
> > default. But it would be interesting to know if this patch makes it work
> > on those ATI bridges..
> >
> > Linus
> >
> > ---
> >  drivers/pci/probe.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 2f75d69..94cd3a4 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -908,7 +908,7 @@ pci_scan_device(struct pci_bus *bus, int devfn)
> >  		return NULL;
> >
> >  	/* Configuration request Retry Status */
> > -	while (l == 0xffff0001) {
> > +	while ((l & 0xffff) == 0x0001) {
> >  		msleep(delay);
> >  		delay *= 2;
> >  		if (pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, &l))
>
> That one did not work out so well.
> I reenabled the call to pci_enable_crs() and changed the line as above.
> That resulted in two timeouts (from dmesg):
>
> [....]
> ACPI: Interpreter enabled
> ACPI: (supports S0 S3 S4 S5)
> ACPI: Using IOAPIC for interrupt routing
> ACPI: PCI Root Bridge [PCI0] (0000:00)
> Device 0000:01:00.0 not responding
> Device 0000:02:00.0 not responding
> [....]
>
> Then, the kernel boots up normally except for graphics and network card
> not showing up at all in lspci.

Uh, right. We already know that your northbridge, mmconfig, CRS, and this
device combine to always return 0001 for the Vendor ID. If we loop on
getting that, we must time out.

I'd actually bet that the hardware bug is actually that any device that
gives a CRS response the first time will have its Vendor ID appear as 0001
on subsequent mmconfig accesses, which means that it's actually a bus
quirk that probably only affects mmconfig access to something in the
conf1-visible space. The only per-device aspect would be that it uses CRS
(possibly correctly), and that doesn't mean that mmconfig won't be safe in
general for the device, or even that it won't be necessary. Actually, we
already know that per-driver enabling mmconfig is broken: sky2 is one that
wants to opt in but there are also reports of the Vendor ID 0001 bug with
it.

	-Daniel
*This .sig left intentionally blank*
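The timeout can be reproduced with a small user-space model of the retry loop. This is only a sketch under assumptions: `struct fake_dev`, `fake_read_id`, and `probe_id` are made up for illustration, the 60-second give-up threshold is an assumed value, and the "stuck" behaviour (device ID intact, vendor half reading 0001 forever) is the guess made in this thread, not a confirmed hardware description.

```c
#include <stdbool.h>
#include <stdint.h>

#define CRS_TIMEOUT_MS (60 * 1000)  /* assumed give-up threshold */

/* Hypothetical simulated device behind the buggy bridge. */
struct fake_dev {
    uint32_t real_id;   /* (device ID << 16) | vendor ID */
    int crs_reads;      /* reads answered with a proper CRS value first */
    bool stuck;         /* afterwards the vendor half reads 0001 forever */
};

uint32_t fake_read_id(struct fake_dev *d)
{
    if (d->crs_reads > 0) {
        d->crs_reads--;
        return 0xffff0001u;               /* CRS: retry later */
    }
    if (d->stuck)
        return (d->real_id & 0xffff0000u) | 0x0001u;
    return d->real_id;
}

/*
 * Model of the probe loop. mask_low16 selects the permissive check
 * from the patch; otherwise the whole dword must equal 0xffff0001.
 * Returns false when the loop gives up ("Device ... not responding").
 */
bool probe_id(struct fake_dev *d, bool mask_low16, uint32_t *id,
              int *waited_ms)
{
    int delay = 1;
    uint32_t l = fake_read_id(d);

    *waited_ms = 0;
    while (mask_low16 ? (l & 0xffff) == 0x0001 : l == 0xffff0001u) {
        *waited_ms += delay;              /* stands in for msleep(delay) */
        delay *= 2;
        l = fake_read_id(d);
        if (delay > CRS_TIMEOUT_MS)
            return false;
    }
    *id = l;
    return true;
}
```

For a stuck device, the strict check leaves the loop as soon as the upper half stops being ffff and hands back an ID with vendor 0001. The masked check keeps matching on every read, so the only way out is the timeout.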
Re: [BUG] New Kernel Bugs
On Fri, 16 Nov 2007, Romano Giannetti wrote:

>
> (Cc: trimmed a bit).
>
> On Thu, 2007-11-15 at 11:19 -0500, Daniel Barkalow wrote:
> > On Thu, 15 Nov 2007, Theodore Tso wrote:
> [...]
> > > A full kernel build with everything selected can take good 30 minutes or
> > > more, and that's on a fast dual-core machine with 4gigs of memory and
> > > 7200rpm disk drives. On a slower, memory limited laptop, doing a single
> > > kernel build can take more time than the user has patience for; multiply
> > > that by 7 or 8 build and test boots, and it starts to get tiresome.
> >
> > None of this is going to take as long,
>
> Well, the compile phase can. Especially if the first time you try to
> compile the kernel with EXTRAVERSION=`git describe` which forces almost a
> full rebuild every time...

Compared to getting useful suggestions from a mailing list, especially
before you've gotten anybody's attention? Hours or overnight isn't
particularly long, and doesn't take up much of your time if you've got a
working kernel to use while it's working.

> But the worst problem is that a full recompile, with a distro .config,
> will take hours on my 2.66GHz/CoreDuo/1G ram. Trimming down .config is
> fundamental to be able to bisect effectively, but it's not an easy thing
> to do for an inexperienced user (and a painful one for all the rest of
> us).
>
> What would be an invaluable help would be a tool that generates
> a .config with all the modules and subsystems I am using *now*. Should
> be possible in principle by parsing KConfig and Makefiles and using as
> input the current .config and lsmod... is it possible to map the kernel
> object name to the option enabling it?

I don't think there's anything set up for that, aside from the actual
build system generating it, and I don't know how hard that would be to
repurpose for generating a configuration.

	-Daniel
*This .sig left intentionally blank*
Re: [BUG] New Kernel Bugs
On Thu, 15 Nov 2007, Theodore Tso wrote:

> On Wed, Nov 14, 2007 at 06:23:34PM -0500, Daniel Barkalow wrote:
> > I don't see any reason that we couldn't have a tool accessible to Ubuntu
> > users that does a real "git bisect". Git is really good at being scripted
> > by fancy GUIs. It should be easy enough to have a drop down with all of
> > the Ubuntu kernel package releases, where the user selects what works and
> > what doesn't.
>
> It's possible users who haven't yet downloaded a git repository have
> to surmount some obstacles that might cause them to lose interest.
> First, they have to download some 190 megs of git repository, and if
> they have a slow link, that can take a while, and then they have to
> build each kernel, which can take a while.

It should be possible for it to clone only the portion that they actually
care about based on where the known-good version is. It should also (in
theory, anyway) be possible to put off some amount of the download until
it's actually going to be relevant.

> A full kernel build with everything selected can take good 30 minutes or
> more, and that's on a fast dual-core machine with 4gigs of memory and
> 7200rpm disk drives. On a slower, memory limited laptop, doing a single
> kernel build can take more time than the user has patience for; multiply
> that by 7 or 8 build and test boots, and it starts to get tiresome.

None of this is going to take as long, even on a slow link and a slow
computer, as waiting for a response to a mailing list post. It'd annoy
users who are specifically waiting for it. But suppose the interface is
that the user says "kernel package X didn't work but the current kernel
does", and it says "I'll let you know when I've got something to test".
The user watches a DVD, afterward finds a message saying there's
something to test, tries it, and reports how it went. If the process
repeats until it narrows things down to a single commit, after a couple
of days of the user getting occasional responses, it's not that different
from asking for help online.

> And then on top of that there are the issues about whether there is
> enough support for dealing with hitting kernel revisions that fail due
> to other bugs getting merged in during the -rc1 process, etc.

Could have a distro-provided mask of things that aren't worth testing and
possibly back-ported fixes for revisions in particular ranges.

> I agree that a tool that automated the bisection process and walked
> the user through it would be helpful, but I believe it would be
> possible for us to do better.

That would probably help for giving the user something to try right away.
I still think that the main cost to the user is the number of times that
the user has to stop doing stuff to reboot with a kernel to test, whether
the test kernels are available quickly from the distro site, slowly built
locally, or slowly as suggested by humans helping online.

	-Daniel
*This .sig left intentionally blank*
Re: OT: Does Linux have any "Perfect Code"
On Thu, 15 Nov 2007, Michael Gerdau wrote:

> > This code is far to be perfect, some part is outdated, bcopy() use instead
> > of memcpy() for example. More annoying are the comment, the file is 3306
> > lines while there is only 1640 line of code, nothing bad per se but looking
> > some comments:
> >
> > /*
> >  * Before we begin this operation, disable kernel preemption.
> >  */
> > kpreempt_disable();
>
> <disclaimer>
> I'm not a kernel developer.
> </disclaimer>
>
> That having said:
> I really do like such obvious (as in: for those knowing the stuff anyway)
> comments when looking at code and probably concepts I'm not familiar with.
>
> ...
>
> I mean, isn't the whole purpose of comments to help those not familiar
> with the code to understand it's purpose and possibly the intention of
> the author (just in case the author had coded a bug) ?

That's the problem with really obvious comments. In the example above,
that function had better disable kernel preemption with a name like that,
and, assuming it's before the code begins the operation in sequence, we
know when we're doing it. But the comment fails to explain why we need to
disable kernel preemption before beginning the operation, just that we are
doing so. Having the comment merely distracts the reader from the fact
that the purpose of the code and the intention of the author are
completely undocumented.

And there's a real chance that this comment, or ones like it, pushes this
statement and the place in the code where things would go wrong if
preemption weren't disabled onto different screens. Then it is not only
unclear what the author's intention was, it is also harder to figure out
from looking at the code than it would be without comments, because each
comment takes up screen space and fewer clues are visible at the same
time.

The code itself should be written to tell the reader everything there is
to know about what it does, and the comments in code should only tell the
reader why it does that.

	-Daniel
*This .sig left intentionally blank*
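As an illustration of the difference, here is a made-up example in C. It is not the code under discussion: `kpreempt_disable()` and `kpreempt_enable()` are stubbed so the sketch compiles in user space, and the per-CPU counter and the reason given in the second comment are invented for the purpose.

```c
/* Hypothetical stand-ins so the sketch compiles outside a kernel. */
static int preempt_depth;
void kpreempt_disable(void) { preempt_depth++; }
void kpreempt_enable(void)  { preempt_depth--; }

#define NCPU 4
static long per_cpu_hits[NCPU];
static int current_cpu;        /* pretend scheduler state */

/* The comments restate the code: the reader learns nothing new. */
void count_hit_what(void)
{
    /* Before we begin this operation, disable kernel preemption. */
    kpreempt_disable();
    per_cpu_hits[current_cpu]++;
    /* Enable kernel preemption. */
    kpreempt_enable();
}

/* The comment gives the reason: the code already says what happens. */
void count_hit_why(void)
{
    /*
     * The counter is per-CPU and updated without a lock. If we were
     * moved to another CPU between reading current_cpu and doing the
     * increment, we could race with the old CPU's own update.
     */
    kpreempt_disable();
    per_cpu_hits[current_cpu]++;
    kpreempt_enable();
}
```

Both functions do the same thing. Only the second one tells a later reader what would break if the preemption calls were removed, and it does so in fewer comment lines than the first.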
Re: [BUG] New Kernel Bugs
On Tue, 13 Nov 2007, Theodore Tso wrote:

> There are two parts to this. One is a Ubuntu development kernel which
> we can give to large numbers of people to expand our testing pool.
> But if we don't do a better job of responding to bug reports that
> would be generated by expanded testing this won't necessarily help us.
>
> The other is an automated set of standard pre-built bisection points so
> that testers can more easily localize a bug down to a few hundred
> commits without needing to learn how to use "git bisect" (think Ubuntu
> users).

I don't see any reason that we couldn't have a tool accessible to Ubuntu
users that does a real "git bisect". Git is really good at being scripted
by fancy GUIs. It should be easy enough to have a drop down with all of
the Ubuntu kernel package releases, where the user selects what works and
what doesn't. Then the tool clones a git repository with flags to only get
relevant parts, and then leads a bisect run, where it's also configuring,
building, and installing the kernels (as a different grub entry), and
providing instructions in general.

Fundamentally, "git bisect" is a really low-interaction process: you tell
it a couple of commits, and then it does stuff, and then you tell it "I
tested, and it worked" or "I tested, and it had the problem" or "Something
else went wrong", and it asks you something new. Other than that, it just
takes time (and a build system hook, which this tool would handle for the
kernel). Eventually, it tells you what to report, and you do so.

	-Daniel
*This .sig left intentionally blank*
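The interaction pattern can be sketched in a few lines of C. This is a deliberate simplification: it assumes a linear history numbered from a known-good commit to a known-bad one, whereas the real "git bisect" works on the commit graph and also supports skipping untestable revisions. The names `bisect_linear`, `test_fn`, and `bad_from` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical callback: the user's answer after booting one kernel. */
typedef bool (*test_fn)(int commit, void *ctx);

/*
 * Bisect a linear history. 'good' is known to work, 'bad' is known to
 * have the problem, and every commit after the first bad one is bad.
 * Returns the first bad commit and stores the number of test boots.
 */
int bisect_linear(int good, int bad, test_fn has_problem, void *ctx,
                  int *tests)
{
    *tests = 0;
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;

        (*tests)++;                 /* one build, one reboot, one answer */
        if (has_problem(mid, ctx))
            bad = mid;              /* "I tested, and it had the problem" */
        else
            good = mid;             /* "I tested, and it worked" */
    }
    return bad;
}

/* Example tester: everything from *ctx onward has the problem. */
bool bad_from(int commit, void *ctx)
{
    return commit >= *(int *)ctx;
}
```

Each pass through the loop is one question to the user, and the range halves every time. For a range of 200 commits that is at most 8 test boots, which is in line with the "7 or 8 build and test boots" figure mentioned elsewhere in this thread.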
Re: [PATCH 0/4]: Resolve MSI vs. INTX_DISABLE quirks.
On Tue, 23 Oct 2007, David Miller wrote:

> From: Daniel Barkalow <[EMAIL PROTECTED]>
> Date: Wed, 24 Oct 2007 00:58:45 -0400 (EDT)
>
> > I'm not sure all of the pci_intx() calls in msi.c should be skipped when
> > the quirk applies; I think some of them might be there so that the legacy
> > interrupt won't be delivered while MSI is turned off (since the handler
> > isn't listening for the legacy interrupts). I'd guess this would cause
> > people to have their MSI-capable device kill their non-MSI-capable device
> > when they restore their laptop (and the shared interrupt fires and gets
> > stuck at just the wrong time). No idea if this is a real concern, but I'm
> > pretty sure that not all of those calls are recent.
>
> I don't think it's a real concern.

Okay, good. As long as someone more clueful than me has thought about it,
because I couldn't tell off hand.

> > There's a couple of ATA drivers that look like they might be trying to
> > work around the same bug, but it's a bit hard to tell. It might be good to
> > have them use the quirk (or set the flag) because it's cleaner.
>
> I noticed these cases as well, and I would hope that Jeff would help
> out here using the infrastructure my patches created.

Or coordinate with someone with the quirky hardware, yes.

	-Daniel
*This .sig left intentionally blank*
Re: [PATCH 0/4]: Resolve MSI vs. INTX_DISABLE quirks.
On Tue, 23 Oct 2007, David Miller wrote:

>
> The forthcoming patches are also available from:
>
> kernel.org:/pub/scm/linux/kernel/git/davem/msiquirk-2.6.git
>
> and clean up the handling of the common quirk wherein setting
> INTX_DISABLE will mistakenly disable MSI generation for some
> devices.
>
> For devices without that problem, we want to keep the pci_intx() calls
> in drivers/pci/msi.c because those help protect against devices
> with the opposite problem. Such devices always generate INTX
> interrupts even when MSI is enabled, unless INTX_DISABLE is set.
>
> Michael, please pay special attention to patch #3. I think I
> picked the correct PCI device IDs to match for the quirk
> (5714* and 5780*) but it's possible we might need more elaborate
> checks here. It at least worked properly for the chips in my
> Niagara system.

I'm not sure all of the pci_intx() calls in msi.c should be skipped when
the quirk applies; I think some of them might be there so that the legacy
interrupt won't be delivered while MSI is turned off (since the handler
isn't listening for the legacy interrupts). I'd guess this would cause
people to have their MSI-capable device kill their non-MSI-capable device
when they restore their laptop (and the shared interrupt fires and gets
stuck at just the wrong time). No idea if this is a real concern, but I'm
pretty sure that not all of those calls are recent.

> In addition to the Tigon3 cases, I added quirk entries for the
> SB700/800 SATA chips and the IXP SB400 USB controllers.

There's a couple of ATA drivers that look like they might be trying to
work around the same bug, but it's a bit hard to tell. It might be good to
have them use the quirk (or set the flag) because it's cleaner.

	-Daniel
*This .sig left intentionally blank*
Re: [patch] PCI: disable MSI on more ATI NorthBridges
On Mon, 22 Oct 2007, David Miller wrote:

> My suggestion is:
>
> 1) Leave the pci_intx() twiddling code in drivers/pci/msi.c
>
> 2) Add quirks for "INTX_DISABLE turns off MSI too", this sets a flag in the pci_dev.
>
> 3) The pci_intx() calls in drivers/pci/msi.c are skipped if this flag from #2 is set.
>
> 4) Add quirk entries for drivers/net/tg3.c chips and these SATA devices we are learning about here, as well as any others we are aware of right now.
>
> 5) Remove the pci_intx() workaround code from drivers/net/tg3.c and elsewhere.

Seems right to me, and pretty straightforward, except that I don't really understand the pm-related logic in there to know how that should work and whether intx will need to be enabled somewhere in addition to not disabling it in the msi enable code.

-Daniel
*This .sig left intentionally blank*
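For anyone skimming the thread, here is a rough sketch of the shape of the five-step proposal. It is a toy model in Python, not kernel code: the class, the function names other than pci_intx(), the flag name and the device IDs are all made up for illustration, and the pm/resume paths I asked about are not modeled at all.

```python
from dataclasses import dataclass

# Hypothetical (vendor, device) pairs, for illustration only; not real PCI IDs.
QUIRK_TABLE = {
    (0x1111, 0x0001),  # pretend NIC whose MSI dies when INTX_DISABLE is set
    (0x2222, 0x0002),  # pretend SATA controller with the same bug
}


@dataclass
class PciDev:
    vendor: int
    device: int
    intx_disabled: bool = False            # models INTX_DISABLE in PCI COMMAND
    msi_enabled: bool = False
    intx_disable_breaks_msi: bool = False  # the per-device flag from step 2


def apply_quirks(dev):
    """Steps 2 and 4: set the flag from one central table, not in each driver."""
    if (dev.vendor, dev.device) in QUIRK_TABLE:
        dev.intx_disable_breaks_msi = True


def pci_intx(dev, enable):
    """Toy stand-in for the kernel helper: enable=False sets INTX_DISABLE."""
    dev.intx_disabled = not enable


def msi_enable(dev):
    """Steps 1 and 3: keep the INTx twiddling, but skip it for flagged devices."""
    if not dev.intx_disable_breaks_msi:
        pci_intx(dev, False)
    dev.msi_enabled = True


quirky = PciDev(0x1111, 0x0001)
normal = PciDev(0x3333, 0x0003)
for dev in (quirky, normal):
    apply_quirks(dev)
    msi_enable(dev)
```

The point of step 5 is visible here: once the table exists, a driver has nothing device-specific left to do around MSI setup.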
Re: [patch] PCI: disable MSI on more ATI NorthBridges
On Mon, 22 Oct 2007, Jeff Garzik wrote:

> Daniel Barkalow wrote:
> > On Fri, 19 Oct 2007, Jeff Garzik wrote:
> > > Linas Vepstas wrote:
> > > > On Fri, Oct 19, 2007 at 09:17:23PM +0800, Shane Huang wrote:
> > > > > Since we have little experience on PCI and MSI here, we had to try to
> > > >
> > > > As someone else pointed out, AMD should have *lots* of people with pci and msi experience on the payroll. (Folks here buy AMD-designed pci chips ...)
> > > >
> > > > > ONLY comment out the pci_intx() call in drivers/ata/ahci.c
> > > > > My system can boot up too with MSI enabled!
> > > > >
> > > > > So does it mean that the root cause is our SB700 SATA controller has a hardware bug where setting INTX_DISABLE in the PCI COMMAND register masks MSI interrupts too?
> > > >
> > > > That's what it sounds like, to me.
> > > >
> > > > > And what is the software solution or workaround?
> > > >
> > > > Not sure. Sounds like the device driver needs a quirk for this part.
> > >
> > > Take a look at tg3.c net driver change 2fbe43f6f631dd7ce19fb1499d6164a5bdb34568 which is a similar situation.
> > >
> > > However, it may turn out that removing the pci_intx() stuff as a general rule is easier than quirking these devices, if enough of them turn out to have this hardware bug.
> >
> > At a first approximation, ATI/AMD devices don't send any interrupts if intx is disabled, nVidia devices send legacy interrupts in addition to MSI ones if intx isn't disabled, and Intel devices actually work correctly. So we need at least one kind of device quirk for intx and msi. (And doing it in the drivers doesn't work, since everybody is making things driven by snd_hda_intel and would like msi, afaict)
>
> Note that INTX_DISABLE is a recent addition to PCI. Older PCI devices support neither MSI nor INTX-disable, so make sure such devices don't creep into your sample.

I have a device that supports MSI and INTX-disable, and, with MSI on (and delivering interrupts successfully) also sends legacy interrupts (on the IRQ that is no longer associated with the device) unless INTX is disabled. Without the intx_disable(), the kernel disables the IRQ entirely and breaks a random other device in my system. It's:

00:07.0 Bridge: nVidia Corporation MCP61 Ethernet (rev a2)

I haven't tried MSI with the other devices in the system, but I expect that this:

00:05.0 Audio device: nVidia Corporation MCP61 High Definition Audio (rev a2)

will have the same issue, and use a multi-vendor driver.

> In general it is documented that INTX_DISABLE should apply only to INTx# so devices that disable MSI based on that bit are out of spec. But unfortunately that is rather irrelevant, since we see these out-of-spec devices in the field today.

It's likewise documented (although maybe arguable in wording) that the device shouldn't send legacy interrupts if MSI is in use, regardless of INTX_DISABLE, but this also happens in the field. I think that the current Linux behavior with respect to INTX_DISABLE is simply due to which hardware bug was present in the device whose driver first got Linux support, but one or the other or both needs a quirk, since there's no behavior that works with everything. And it's still impossible to tell which bug is more common, since MSI isn't used most of the time, even if the hardware supports it, so it's pretty arbitrary which way Linux goes in the non-quirk case.

-Daniel
*This .sig left intentionally blank*
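To make the "no behavior works with everything" claim concrete, here is a small truth-table sketch in Python. It is only a model of the three device behaviours described in this thread; the enum, the function names and the three policies are invented for the example, and real hardware surely has more failure modes than this.

```python
from enum import Enum


class Behaviour(Enum):
    CORRECT = "follows the spec"
    INTX_DISABLE_MASKS_MSI = "sends nothing at all once INTX_DISABLE is set"
    LEGACY_ALONGSIDE_MSI = "also raises INTx unless INTX_DISABLE is set"


def works_with_msi(behaviour, intx_disabled):
    """True if MSI is delivered and the legacy line stays quiet."""
    msi_delivered = not (
        behaviour is Behaviour.INTX_DISABLE_MASKS_MSI and intx_disabled
    )
    stray_legacy = (
        behaviour is Behaviour.LEGACY_ALONGSIDE_MSI and not intx_disabled
    )
    return msi_delivered and not stray_legacy


# Each policy answers: should INTX_DISABLE be set when MSI is enabled?
def always_disable(behaviour):
    return True


def never_disable(behaviour):
    return False


def quirk_aware(behaviour):
    # Default to disabling, with a quirk for devices that lose MSI that way.
    return behaviour is not Behaviour.INTX_DISABLE_MASKS_MSI


def broken_devices(policy):
    """Behaviours that misbehave when MSI is enabled under this policy."""
    return [b for b in Behaviour if not works_with_msi(b, policy(b))]


for policy in (always_disable, never_disable, quirk_aware):
    print(policy.__name__, [b.name for b in broken_devices(policy)])
```

Each fixed policy leaves exactly one class of device broken, and which one you pick as the non-quirk default is the arbitrary part.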
Re: [patch] PCI: disable MSI on more ATI NorthBridges
On Fri, 19 Oct 2007, Jeff Garzik wrote:

> Linas Vepstas wrote:
> > On Fri, Oct 19, 2007 at 09:17:23PM +0800, Shane Huang wrote:
> > > Since we have little experience on PCI and MSI here, we had to try to
> >
> > As someone else pointed out, AMD should have *lots* of people with pci and msi experience on the payroll. (Folks here buy AMD-designed pci chips ...)
> >
> > > ONLY comment out the pci_intx() call in drivers/ata/ahci.c
> > > My system can boot up too with MSI enabled!
> > >
> > > So does it mean that the root cause is our SB700 SATA controller has a hardware bug where setting INTX_DISABLE in the PCI COMMAND register masks MSI interrupts too?
> >
> > That's what it sounds like, to me.
> >
> > > And what is the software solution or workaround?
> >
> > Not sure. Sounds like the device driver needs a quirk for this part.
>
> Take a look at tg3.c net driver change 2fbe43f6f631dd7ce19fb1499d6164a5bdb34568 which is a similar situation.
>
> However, it may turn out that removing the pci_intx() stuff as a general rule is easier than quirking these devices, if enough of them turn out to have this hardware bug.

At a first approximation, ATI/AMD devices don't send any interrupts if intx is disabled, nVidia devices send legacy interrupts in addition to MSI ones if intx isn't disabled, and Intel devices actually work correctly. So we need at least one kind of device quirk for intx and msi. (And doing it in the drivers doesn't work, since everybody is making things driven by snd_hda_intel and would like msi, afaict)

-Daniel
*This .sig left intentionally blank*
Re: [patch] PCI: disable MSI on more ATI NorthBridges
On Thu, 18 Oct 2007, David Miller wrote:

> From: "Shane Huang" <[EMAIL PROTECTED]>
> Date: Thu, 18 Oct 2007 18:37:59 +0800
>
> > Hi Miller:
> >
> > Thank you for your response.
> >
> > The reason why MSIs of these northbridges do not work is still under further debug, we are NOT able to tell its hardware issue or software issue at this time. But enablement of them will lead to the OS installation failure in many distributions like openSUSE, Ubuntu etc:
> > https://bugzilla.novell.com/show_bug.cgi?id=302016
> >
> > So we have to disable them firstly before we find out the root cause, maybe they are just workarounds.
>
> This logic seems backwards, to me. "shoot first, ask questions later" To me this is not how to approach this problem.
>
> Once you turn MSI off, there is next to no incentive to fix the problem because users aren't running into it any longer.
>
> The only two devices in that bug report which should be using MSI would be the SATA controller and the broadcom ethernet NIC. And by the failed bootup logs provided by the user the problem is clearly with the SATA controller.

And the same SATA controller could show up behind a different northbridge. It would be unfortunate to hit the same device bug independently on each system and work around it by doing something that won't help the next user.

> One common problem we're finding is that some devices have a hardware bug where setting INTX_DISABLE in the PCI COMMAND register masks MSI interrupts too.
>
> I mention this because the user in that report mentions that the kernel upgrade causes the failure, and one thing we started doing not too long ago was to set the INTX_DISABLE bit when MSI is enabled for a device.
>
> So maybe this SATA controller has this problem too. It is easy to test, simply comment out all of the pci_intx() function calls in drivers/pci/msi.c and perform a test boot with MSI enabled.

Have we gotten around to having a device quirk for this? I bet it won't be too long before we see a system where the SATA controller doesn't work with INTX disabled and the ethernet controller doesn't work with it enabled, since we've seen devices with each of these bugs.

-Daniel
*This .sig left intentionally blank*
Re: NDAs - ANY KNOWN RULES?
On Wed, 27 Jun 2007, hermann pitton wrote:

> Hi,
>
> such stuff causes a lot of troubles since long.
>
> Are there any rules, or can everybody go on as some sort of freelancer exclusively on such? I don't like it!

http://www.linux-foundation.org/en/NDA_program

In short, the Linux Foundation can negotiate a reasonable NDA for you to sign, and they may be able to show you relevant documents as a freelancer under a reasonable and standardized contract.

-Daniel
*This .sig left intentionally blank*
Re: 2.6.22-rc5 regression
On Mon, 18 Jun 2007, Linus Torvalds wrote:

> On Mon, 18 Jun 2007, Carlo Wood wrote:
>
> > diff --git a/scripts/package/Makefile b/scripts/package/Makefile
> > index 7c434e0..f758b75 100644
> > --- a/scripts/package/Makefile
> > +++ b/scripts/package/Makefile
>
> but this one has actually been modified. To this:
>
> > +# Dummy file
> > +help:
>
> And finally,
>
> > diff --git a/scripts/package/builddeb b/scripts/package/builddeb
> > deleted file mode 100644
> > index 6edb29f..000
>
> That one also has been actually deleted. And "make distclean" doesn't do that. You have something else going on.

Probably make-kpkg removing the in-tree instructions for building debian packages so that its own rules will be used instead or something like that.

-Daniel
*This .sig left intentionally blank*
Re: My kernel hangs again: Help with git please
On Sat, 16 Jun 2007, Carlo Wood wrote:

> On Sat, Jun 16, 2007 at 01:28:13AM -0400, Daniel Barkalow wrote:
> > That's not actually the right image. There's a graph of commits with a lot of splitting and joining lines. Each branch and each tag sits somewhere in this web. The difference between branches and tags is that you're expected to move branch pointers around, and tags stay mostly in place. There's no accounting of commits newer than the current spot in the web for a branch belonging to that branch, so if you move a branch back to an older tag (or other commit), the spot it's leaving is no longer "on the branch".
>
> Okay, it took me two hours before I understood this... but here's the picture that I have in mind now:
>
>          master->X(merge point)
>                 /|
>                / |
>   ^   branch->3  X
> Time  |       |  |
>       |       2  X
>               |  |
>               1  X
>               |  |
>                \ |
>                 \|
>                  X(branch point)
>                  |

Right, except that, in your repository, "master" has ended up pointing to "3" also. Or, in any case, all of your local branches ("master" is no different from other branches, except that it's the initial default name for a branch) are somewhere down the web from the latest stuff from Linus's repository.

> Then if I define a branch pointer to point to '3', then the branch is 3--2--1. If next I move the branch pointer to point to '2', node '3' is no longer on the branch because now the branch consists of 2--1, and HEAD moves to '2' as well.

Right, except that "HEAD" is really just a symlink, not a pointer directly to the history; the branch it points to is what you've got in your working directory currently. So in that case, HEAD moves to '2' simply because it's an indirection for the branch, which has moved to 2.

Side note: in more recent versions of git, there's the feature that you seem to be trying to use. It's called "detached HEAD", and means that you can have HEAD be some arbitrary commit, not a link to a branch. You'd do this with "git checkout <some revision>", and then your working directory would match that revision, and "git branch" would have no *, and you wouldn't have a current branch at all, and you wouldn't be moving branch pointers around. But I don't think you're using a version of git that supports this, and you need to get your branch pointers back to the present anyway.

> This seems to make most sense in the light of your last sentence. I don't understand how I'd have moved branch pointers however. I thought I would just change my working copy along the branch by specifying tag nodes. Ie, I have a branch '3'(--2--1) and I say: give me '2', then give me '1' - and when I do: git reset --hard HEAD - it moves me to 3 because the branch was never touched.

"git reset --hard <revision>" moves the current branch to that revision, as well as moving the working directory (and the index, which doesn't matter for your case). If you were thinking that it only changed the working directory, you probably moved some branches without realizing it.

> > So master is a point in the web, and bisect jumps around through the web according to some special rules (due to having git-bisect use the good/bad marks to determine which commit to try next, and jump there). git-bisect doesn't really even care that you started on any single branch. It's just operating on the web, and the branch you start on is treated as an arbitrary commit that has the problem.
>
> Ok - so it does something magical that I don't have to understand :P The only thing that matters is that I choose the begin and end point, the first two points, correctly: where one is bad and the other is good. It seems that git bisect can't deal with swapping good/bad (the 'bad' one always has to be the newest revision), so I had decided to call 'kernel hangs' good and 'kernel works' bad. The problem then is that I can't find any starting point anymore that is 'bad'.

Right; since the normal goal is to find regressions, not fixes, "bad" is the "after it changed" case, and "good" is the "before it changed". It is trying to find a commit which all of the "bad" commits are descended from, and which is descended from only "good" commits.

> > You may find "gitk --all" informative.
>
> The dates on the right side seem to make no sense. Even in a part where there are no branches/merges at all, the date goes in both direction (sometimes older, sometimes newer).
Re: My kernel hangs again: Help with git please
On Sat, 16 Jun 2007, Carlo Wood wrote:

> On Sat, Jun 16, 2007 at 01:28:13AM -0400, Daniel Barkalow wrote:
> > That's not actually the right image. There's a graph of commits with a lot of splitting and joining lines. Each branch and each tag sits somewhere in this web. The difference between branches and tags is that you're expected to move branch pointers around, and tags stay mostly in place. There's no accounting of commits newer than the current spot in the web for a branch belonging to that branch, so if you move a branch back to an older tag (or other commit), the spot it's leaving is no longer "on the branch".
>
> Okay, it took me two hours before I understood this... but here's the picture that I have in mind now:
>
>               master-X(merge point)
>                     /|
>                    / |
>     ^  branch-3   X  |
> Time |            |  |
>      |        2   X  |
>      |        1   X  |
>      |             \ |
>                     \|
>                      X(branch point)
>                      |

Right, except that, in your repository, master has ended up pointing to 3 also. Or, in any case, all of your local branches (master is no different from other branches, except that it's the initial default name for a branch) are somewhere down the web from the latest stuff from Linus's repository.

> Then if I define a branch pointer to point to '3', then the branch is 3--2--1. If next I move the branch pointer to point to '2', node '3' is no longer on the branch because now the branch consists of 2--1, and HEAD moves to '2' as well.

Right, except that HEAD is really just a symlink, not a pointer directly to the history; the branch it points to is what you've got in your working directory currently. So in that case, HEAD moves to '2' simply because it's an indirection for the branch, which has moved to 2.

Side note: in more recent versions of git, there's the feature that you seem to be trying to use. It's called detached HEAD, and means that you can have HEAD be some arbitrary commit, not a link to a branch. You'd do this with "git checkout <some revision>", and then your working directory would match that revision, and "git branch" would have no "*", and you wouldn't have a current branch at all, and you wouldn't be moving branch pointers around. But I don't think you're using a version of git that supports this, and you need to get your branch pointers back to the present anyway.

> This seems to make most sense in the light of your last sentence. I don't understand how I'd have moved branch pointers however. I thought I would just change my working copy along the branch by specifying tag nodes. Ie, I have a branch '3'(--2--1) and I say: give me '2', then give me '1' - and when I do: git reset --hard HEAD - it moves me to 3 because the branch was never touched.

"git reset --hard <revision>" moves the current branch to that revision, as well as moving the working directory (and the index, which doesn't matter for your case). If you were thinking that it only changed the working directory, you probably moved some branches without realizing it.

> > So master is a point in the web, and bisect jumps around through the web according to some special rules (due to having git-bisect use the good/bad marks to determine which commit to try next, and jump there). git-bisect doesn't really even care that you started on any single branch. It's just operating on the web, and the branch you start on is treated as an arbitrary commit that has the problem.
>
> Ok - so it does something magical that I don't have to understand :P The only thing that matters is that I choose the begin and end point, the first two points, correctly: where one is bad and the other is good. It seems that git bisect can't deal with swapping good/bad (the 'bad' one always has to be the newest revision), so I had decided to call 'kernel hangs' good and 'kernel works' bad. The problem then is that I can't find any starting point anymore that is 'bad'.

Right; since the normal goal is to find regressions, not fixes, "bad" is the "after it changed" case, and "good" is the "before it changed" case. It is trying to find a commit which all of the bad commits are descended from, and which is descended from only good commits.

> > You may find "gitk --all" informative.
>
> The dates on the right side seem to make no sense. Even in a part where there are no branches/merges at all, the date goes in both directions (sometimes older, sometimes newer). Roughly it seems that the newest date is at the top - but I see a lot of times things like:
>
> |||O|| Description Author1 2007-05-14 03:43:20
> |||O|| Description Author2 2007-05-15 15:10:34
> |||O|| Description Author3 2007-05-13 17:50:27
>
> Thus, there seems to be no time related ordering :/

Those dates are when the patches which became the commits were written. The ordering is the lineage of the revisions in the repository
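
The good/bad convention described in this message can be illustrated with a small model. This is only a sketch, not how git-bisect is implemented: it assumes a strictly linear history (real bisection walks the whole commit web), and the names first_bad, history and broken_since are made up for the example.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes a linear history listed oldest first, where commits[0] is
    known good, commits[-1] is known bad, and every commit after the
    first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid       # the change happened at mid or earlier
        else:
            good = mid      # the change happened after mid
    return commits[bad]


# Hypothetical history of ten commits; the behaviour changes at c6.
history = ["c%d" % i for i in range(10)]
broken_since = 6
culprit = first_bad(history, lambda c: int(c[1:]) >= broken_since)
print(culprit)
```

In this model "bad" simply means "after the change", so hunting for a fix instead of a regression only needs the predicate inverted, as long as the newest commit still counts as "bad".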
Re: My kernel hangs again: Help with git please
On Sat, 16 Jun 2007, Carlo Wood wrote:

> I don't understand - any branch that I am on has many tags. I can use 'git reset --hard sometag' to change the source tree to that tag (which works if I look at the version in the Makefile and pick tags that are far apart enough).

That's not actually the right image. There's a graph of commits with a lot of splitting and joining lines. Each branch and each tag sits somewhere in this web. The difference between branches and tags is that you're expected to move branch pointers around, and tags stay mostly in place. There's no accounting of commits newer than the current spot in the web for a branch belonging to that branch, so if you move a branch back to an older tag (or other commit), the spot it's leaving is no longer "on the branch".

So master is a point in the web, and bisect jumps around through the web according to some special rules (due to having git-bisect use the good/bad marks to determine which commit to try next, and jump there). git-bisect doesn't really even care that you started on any single branch. It's just operating on the web, and the branch you start on is treated as an arbitrary commit that has the problem.

You may find "gitk --all" informative.

> Anyway, I tried this:
>
> $ git checkout master
> $ git branch
>   bisect
> * master
>   origin
> $ BRANCH=$(git branch | grep "^\*" | sed -e "s/\* //")
> $ echo $BRANCH
> master
> $ git rev-list --max-count=1 $BRANCH
> 5ecd3100e695228ac5e0ce0e325e252c0f11806f
>
> Is it correct that this last command gives me the 'git id' (if that is the correct name for the hash) of the revision that my local working copy is at?

Yes.

> Can you tell me what is the latest git id that you see?

I'm seeing de7f928ca460005086a8296be07c217aac4b625d, but I just got the latest code, more recently than you probably did.

> Because, if I compile 5ecd3100e695228ac5e0ce0e325e252c0f11806f it still hangs at boot :(

It looks like you moved master back to 2.6.22-rc4 (with git reset --hard v2.6.22-rc4) at some point. What you should do now is:

  $ git checkout master
  $ git merge origin

Which should move master forward through the web to "origin", which is (unless you've moved it) what you got from upstream. Alternatively:

  $ git checkout master
  $ git pull

Should fetch the latest stuff and advance master to the fetched version.

-Daniel
*This .sig left intentionally blank*
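
The picture of branches as movable pointers into a web of commits can be sketched in a few lines. This is a toy model under stated assumptions, not git's data structures: the commit names, the parents and branches dictionaries, and the reachable and fast_forward helpers are all invented for illustration.

```python
# Each commit maps to its list of parents; this dictionary is the "web".
parents = {
    "base": [],
    "1": ["base"],
    "2": ["1"],
    "3": ["2"],
}

# Branches are just names pointing at one commit each.
branches = {"master": "3", "origin": "3"}


def reachable(tip):
    """All commits reachable from tip by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def fast_forward(branch, target):
    """Advance branch to target's commit if that only moves it forward."""
    if branches[branch] not in reachable(branches[target]):
        raise ValueError("not a fast-forward")
    branches[branch] = branches[target]


# Moving master back, which is what a hard reset does to the current branch:
branches["master"] = "1"
on_master_after_reset = reachable(branches["master"])

# Merging origin moves the pointer forward through the web again.
fast_forward("master", "origin")
on_master_after_merge = reachable(branches["master"])
print(sorted(on_master_after_reset), sorted(on_master_after_merge))
```

Nothing in the model records that commits 2 and 3 ever "belonged" to master; once the pointer moves back they are simply no longer reachable from it, which is the point made above.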
Re: My kernel hangs again: Help with git please
On Sat, 16 Jun 2007, Carlo Wood wrote:

> Therefore I have the following questions:
>
> 1) What git command will ASSURE that I get the LATEST
>    kernel tree checked out?
>
> I tried this:
>
> hikaru:/usr/src/kernel/git/linux-2.6>git branch -l
> * bisect
>   master
>   origin
> hikaru:/usr/src/kernel/git/linux-2.6>git reset --hard HEAD

HEAD doesn't mean what you think it means. It's the latest revision on the branch with the *. What you want is:

  $ git checkout master

This will move the * to "master", which shouldn't have been affected by any of this, and move your working directory to this point as well. At that point, you should be able to build a working kernel.

What "git reset --hard HEAD" does is discard any differences to tracked files between your working directory and the revision you're on. It's relevant if you want to discard local changes, not otherwise.

-Daniel
*This .sig left intentionally blank*
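
The difference between the two commands can be made concrete with a toy model in which HEAD names a branch rather than a commit. This is a sketch only: the Repo class, its fields and the commit labels "old" and "new" are hypothetical, and the real commands handle many cases (local modifications, the index) that are ignored here.

```python
class Repo:
    def __init__(self):
        self.branches = {"master": "new", "bisect": "old"}
        self.head = "bisect"                 # HEAD names the branch with the *
        self.worktree = "old + local edits"  # what is on disk

    def head_commit(self):
        return self.branches[self.head]

    def reset_hard(self, commit):
        # Moves the current branch and the working tree to commit.
        self.branches[self.head] = commit
        self.worktree = commit

    def checkout(self, branch):
        # Re-points HEAD and updates the working tree; no branch moves.
        self.head = branch
        self.worktree = self.branches[branch]


repo = Repo()
repo.reset_hard(repo.head_commit())   # models "git reset --hard HEAD"
after_reset = (repo.head, repo.worktree)
repo.checkout("master")               # models "git checkout master"
after_checkout = (repo.head, repo.worktree)
print(after_reset, after_checkout)
```

Resetting to HEAD only throws away the local edits and leaves the model on "bisect"; the checkout is what moves the * and the working tree to master.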
Re: Linux 2.6.21
On Thu, 26 Apr 2007, Adrian Bunk wrote:

> Linus said 2.6.20 was a stable kernel. My impression was that at least two of the regressions from my 2.6.20 regressions list should have been fixed before 2.6.20.
>
> They have both been fixed through -stable, but I also remember a quite experienced kernel maintainer running into one of them after 2.6.20 was released and spending half a day tracking it down - and my answer was "known unfixed regression, first reported more than a month ago".

I think there is an issue with two different things being conflated, and this causes real stability problems. 2.6.x is both the first kernel in a series that is judged to be "stable" and the kernel that is the split between 2.6.x.y and 2.6.x+1.

This is a fundamental problem, because it means that 2.6.x must have all of the problems that are being debugged by the people who understand the areas they are in, because 2.6.x+1 has to start so that people who are clueless about all of the areas with remaining bugs don't spend their time putting more regressions into their submissions for 2.6.x+1.

It is also a problem because it is easily possible for a problem to exist in 2.6.x-rcN which can only be correctly fixed by doing intrusive things, but can be papered over in an obviously-safe way. (E.g., the issue with legacy interrupt delivery when MSI is enabled.) The intrusive patch could easily break a bunch of unrelated stuff, so that's no good for 2.6.x-rcN, but papering over bugs is no good for mainline. These bugs have to be fixed after the split, which means that the version at the fork must contain the bug.

Furthermore, everybody (people reporting bugs, people fixing them, and people merging fixes) seems to doze off late in -rc kernels. Having an announcement of something with a qualitatively different version wakes them up.

I say have a target of no known regressions in 2.6.21.1, with 2.6.21 being pretty good, and don't count too much on the stability of 3-number kernel versions.

> And a serious delay of the next regression-merge window due to unfixed regressions might even have the positive side effect of more developers becoming interested in fixing the current regressions for getting their shiny new regressions^Wfeatures faster into Linus' tree.

I think the "stick" can't be delaying the window, because that's too broad. I think it has to be making people who are needed for fixing stuff miss the window. People aren't going to go learn a new area of the kernel to resolve regressions in it, but they're more likely to keep their own area clean so that they get to merge every 2 months instead of every 4.

> These are just my personal opinions, and other people consider the resulting 2.6.20 and 2.6.21 kernels OK.

I don't think 2.6.x can be OK, by policy. I think 2.6.20.y got to an OK state eventually, which is to say that there's no need now to use a 2.6.19.y kernel. I think that 2.6.21 isn't OK yet, but I think looking for an OK 2.6.21-derived kernel is premature still.

Ignoring the version scheme entirely, I think the success condition should be that the "latest stable version of the Linux kernel" link on www.kernel.org is always strictly better than all previous links in that spot, and new features get there eventually (ideally, within 4 months of hitting mainline). I think this is actually possible, although it would require changing the policy for this link. And I don't think it requires a change in what goes into Linus's git repository when.

Furthermore, I think we're a lot closer to an OK kernel derived from Linus's Apr 25 version than we would be if "2.6.21" had not been released at that point. It sounds like more items were resolved in the past few days than in the preceding week.

Incidentally, will you continue to track 2.6.21 regressions against 2.6.20? You said there was at least one that you haven't sent out, and there's been movement on several others since your last report.

-Daniel
*This .sig left intentionally blank*
Re: Linux 2.6.21
On Thu, 26 Apr 2007, Adrian Bunk wrote:

> Number of different known regressions compared to 2.6.20 at the time of the 2.6.21 release:
> 14

I count 13. (v2) had 15 items, of which 2 were subsequently fixed or found to be inapplicable.

> Number of different known regressions compared to 2.6.20 at the time of the 2.6.21 release with patches available at the time of the 2.6.21 release [1]:
> 3

The -stable team can presumably take care of these in 2.6.21.1, right?

That leaves 10 that need developer attention. John Stultz seems to be taking care of 3 of them. Oliver Neukum has 1. 2 are particular drivers (ali_pata and rtl8139, according to the reports). 2 seem to be ACPI-related; at least one has a candidate patch now. 1 seems to be an ALSA problem. 1 is STD and being debugged.

It looks like all of the known regressions are being worked on, and getting fixes in for them is -stable material at this point. Furthermore, it doesn't look to me like anyone who is needed for dealing with these regressions is trying to get stuff into the 2.6.22 merge window. I think it's clear that this is the right point for Linus to start the 2.6.22 cycle and leave the rest of the 2.6.21 work to the -stable team, who are the experts at taking care of this sort of stuff. Furthermore, it seems like -rc testers at this point have found everything in 2.6.21-rc they're going to, so, again, it's time for new regressions.

Personally, I'd vote for having Linus leave off at 2.6.X-final, and have 2.6.X be the first -stable release of the series, where the remaining known regressions get fixed, but that's an issue of nomenclature, not development process. I think you've allowed for a well-tested 2.6.21, and a good chance of a 2.6.21.1 or .2 with no known regressions against 2.6.20, which seems to me like you succeeded as far as everything except making Linus a release engineer.

-Daniel
*This .sig left intentionally blank*
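
The counts in this message can be checked with a short tally. The numbers are taken from the text above; the variable names and the grouping labels in the dictionary are just for illustration.

```python
listed_in_v2 = 15
fixed_or_inapplicable = 2
known = listed_in_v2 - fixed_or_inapplicable   # regressions still open

with_patches = 3
need_attention = known - with_patches          # left for developers

# Breakdown of the remaining items as given in the message.
owners = {
    "John Stultz": 3,
    "Oliver Neukum": 1,
    "particular drivers": 2,
    "ACPI-related": 2,
    "ALSA": 1,
    "STD": 1,
}
breakdown_total = sum(owners.values())
print(known, need_attention, breakdown_total)
```

The breakdown adds up to the same figure as the subtraction, so every item that still needs developer attention is accounted for in the list.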
Re: [RFC] New driver information
On Fri, 16 Feb 2007, Heikki Orsila wrote:

> I just read
>
> http://kerneltrap.org/node/7729
>
> and it occurred to me that it would be informative to have a new device driver macro. The motivation for the new macro would be 4 issues:
>
> * Is it possible to get specifications for the device?
> * If yes, under what terms? (nda, public)
> * Where to get public specs?
> * How many closed and open drivers in the Linux source tree?

This doesn't make any sense as a driver macro, because it's per device, not per driver. E.g., the sdhci driver drives a number of devices, including both well-documented devices and devices whose only documentation is that the PCI ID matches (and they work with only a few quirks).

On the other hand, a kconfig-readable table of PCI, USB, etc. IDs with this information isn't a bad idea, especially if the drivers actually depend on it (so that it has to be kept up to date, at least as far as the device/driver mapping).
Re: Is this bug too obvious?
On Tue, 13 Feb 2007, Chuck Ebbert wrote:

> drivers/usb/net/usbnet.c:
>
> int
> usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
> {
>         struct usbnet                   *dev;
>         struct net_device               *net;
>         struct usb_host_interface       *interface;
>         struct driver_info              *info;
>         struct usb_device               *xdev;
>         int                             status;
>
>         ...
>
>         net = alloc_etherdev(sizeof(*dev));
>
> *net ???

No, alloc_etherdev takes the size of the private data, which, in this case, is *dev.

-Daniel
*This .sig left intentionally blank*
Re: [PATCH] Re: NAK new drivers without proper power management?
On Sun, 11 Feb 2007, Rafael J. Wysocki wrote:

> The problem is it was made implicit long ago. The design is "optimistic", so to speak, and I think we have the following choices:
>
> 1) Change the design to make the kernel refuse to suspend if there are any drivers not explicitly flagged as "suspend/resume-safe". [This looks like a lot of work to me, but it is generally doable provided that someone has enough time to do it. Unfortunately it has to be done in one shot for all of the known good drivers to avoid user-observable regressions.]

The kernel wouldn't necessarily have to refuse to suspend. It could just warn (and list the drivers that aren't marked), or could require some extra insistence from the user. It would be good to have it log a message saying something like: "If you can read this, report that ne2000 seems to be safe for suspend/resume".

Having drivers explicitly marked as to whether they are safe is a good kernel feature; what to do if they're not is policy.

> 2) Require the authors of new drivers to _either_ ensure that their drivers will be suspend/resume-safe (and I mean both STR and STD here), _or_ explicitly flag the drivers as "suspend/resume-unsafe", for example by implementing .suspend() routines returning -ENOSYS. [The existing drivers can be modified to follow this convention gradually.]

I don't see any reason not to do (2) regardless of (1). That was (my idea of) the statement that started this thread: new drivers need to not mess up on suspend/resume, as a matter of suitability for inclusion.

Of course, we need some way for drivers to indicate that they work fine with the PCI-layer defaults. And it should probably be more machine-readable than the author telling reviewers that it works.

> - Problem what to do with drivers that work for some people and don't work for the others (ie. if we don't flag them as known good, we will break the setups in which they work)

I think the only interesting case here is when a device resumes fine with no driver support if the BIOS manages to deal effectively with it, but the BIOS generally doesn't. Otherwise, I think it's only going to work at all if the author put in the effort to make it work (so it should be "known good"), but there may be bugs (firmware, BIOS, driver, etc). But that's true of any functionality.

-Daniel
*This .sig left intentionally blank*
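
The split between marking (a kernel feature) and what to do about unmarked drivers (policy) could look roughly like the sketch below. Nothing here is kernel code: the Marking values, the suspend_decision function, the refuse_unmarked switch and the driver name "example_nic" are all hypothetical, chosen only to illustrate the idea.

```python
from enum import Enum


class Marking(Enum):
    SAFE = "safe"          # driver author says suspend/resume works
    UNSAFE = "unsafe"      # driver author says it does not
    UNMARKED = "unmarked"  # nobody has said either way


def suspend_decision(drivers, refuse_unmarked=False):
    """Return (allowed, messages) for a dict of driver name -> Marking."""
    unsafe = sorted(n for n, m in drivers.items() if m is Marking.UNSAFE)
    unmarked = sorted(n for n, m in drivers.items() if m is Marking.UNMARKED)
    if unsafe:
        # An explicit "unsafe" marking always blocks the suspend.
        return False, ["%s is marked suspend/resume-unsafe" % n for n in unsafe]
    if unmarked and refuse_unmarked:
        # Strict policy: treat missing information as a reason to refuse.
        return False, ["%s is not marked as suspend/resume-safe" % n
                       for n in unmarked]
    # Lenient policy: go ahead, and ask for a report if the system comes back.
    return True, [
        "If you can read this, report that %s seems to be safe "
        "for suspend/resume" % n
        for n in unmarked
    ]


loaded = {"ne2000": Marking.UNMARKED, "example_nic": Marking.SAFE}
allowed, messages = suspend_decision(loaded)
strict_allowed, strict_messages = suspend_decision(loaded, refuse_unmarked=True)
print(allowed, messages)
```

The markings stay the same under both policies; only the refuse_unmarked switch changes, which is the separation argued for above.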
Re: [PATCH] Re: NAK new drivers without proper power management?
On Sat, 10 Feb 2007, Rafael J. Wysocki wrote:

> On Saturday, 10 February 2007 11:02, Nigel Cunningham wrote:
>
> > Well, the original desire was to stop new drivers getting in without proper power management.
>
> I know, but I agree with the argument that having a driver without the suspend/resume support is better than not having the driver at all.

How about if "proper power management" is defined to include the driver explicitly preventing suspend? It seems to me like the current problem is that driver writers don't think about power management at all, and the result is that, after suspend/resume, the system doesn't come back. It would be better if driver writers had to think about power management just enough to realize that it's not going to work, and make this information available to the system. At that point, it's relatively easy for the system to do something useful about it.

> Also, I think there are quite some drivers already in the tree that don't support suspend/resume explicitly and honestly we should start from adding the suspend/resume routines to these drivers _before_ we ban new drivers like that.

It'd be relatively quick to modify all the current drivers that don't explicitly support suspend/resume to explicitly not support it. (Or to explicitly support it trivially; /dev/null obviously doesn't need anything.)

-Daniel
*This .sig left intentionally blank*
Re: forcedeth problems on 2.6.20-rc6-mm3
On Sun, 4 Feb 2007, Robert Hancock wrote:

> Something's busted with forcedeth in 2.6.20-rc6-mm3 for me relative to
> 2.6.20-rc6. There's no errors in dmesg, but it seems no packets ever get
> received and so the machine can't get an IP address. I tried reverting all the
> -mm changes to drivers/net/forcedeth.c, which didn't help. The network
> controller shares an IRQ with the USB OHCI controller which is receiving
> interrupts, so it doesn't seem like an interrupt routing problem, though I
> suppose something weird could be happening there.

IIRC, forcedeth tries to use MSI by default. Perhaps the hardware is using it, but the kernel thinks enabling it didn't work? I think there's a module option for forcedeth to disable MSI, which might be worth a try to see if it has any effect.

-Daniel
*This .sig left intentionally blank*
Re: 2.6.18-stable release plans?
On Tue, 23 Jan 2007, Jesper Juhl wrote:

> Now that 2.6.19 is out, most likely not. -stable releases are made
> for the latest stable 2.6.x kernel, once 2.6.x+1 is out that's the one
> -stable patches are made for (2.6.16 is an exception)..

There's generally a bit of overlap. 2.6.17.14 was about the same time as 2.6.18.1, and 2.6.18.6 was after 2.6.19.1. But 2.6.18.x must be over now, because the -stable team didn't release a 2.6.18.7 to match 2.6.19.2, and all of 2.6.x except for 2.6.19.2 has that weird file corruption bug (although rarely triggered).

-Daniel
*This .sig left intentionally blank*
Re: [PATCH] Unbreak MSI on ATI devices
On Fri, 5 Jan 2007, Petr Vandrovec wrote:

> Hi,
> unfortunately it is not everything :-(
>
> I cannot get MSI to work on IDE interface under any circumstances - in legacy
> mode it always uses IRQ14/15 regardless of whether MSI is enabled or not
> (that's probably correct), but in native mode as soon as I enable MSI it
> either does not deliver interrupts at all (definitely not through IRQ14/15,
> and, if I got routing right, also not through its INTA#), or it delivers them
> somewhere else than where programmed. As my boot device is connected to this
> adapter, and it is a notebook, it is not easy to debug what's really going on
> :-(

Are you doing this with INTx left on or turned off? Have you determined whether turning off INTx does anything useful on these devices when you're not using MSI? (There are only a few places in the kernel which disable INTx, mostly associated with enabling MSI.)

It might be easier to test if you boot off a USB storage device of some sort.

-Daniel
*This .sig left intentionally blank*
Re: [PATCH] Unbreak MSI on ATI devices
On Thu, 4 Jan 2007, Roland Dreier wrote:

> > So my question is - what is real reason for disabling INTX when in MSI mode?
> > According to PCI spec it should not be needed, and it hurts at least chips
> > listed below:
> >
> > 00:13.0 0c03: 1002:4374 USB Controller: ATI Technologies Inc IXP SB400 USB Host Controller
> > 00:13.1 0c03: 1002:4375 USB Controller: ATI Technologies Inc IXP SB400 USB Host Controller
> > 00:13.2 0c03: 1002:4373 USB Controller: ATI Technologies Inc IXP SB400 USB2 Host Controller
>
> heh... I'm not gloating or anything... but I am glad that some ASIC
> designer was careless enough to prove me right when I said going
> beyond what the PCI spec requires is dangerous.

No more dangerous than expecting exactly following the PCI spec to be sufficient; at least some nVidia devices misbehave if you don't disable INTx when using MSI, while at least some ATI devices misbehave if you do disable INTx. The only *safe* thing is to ignore the PCI spec and match the behavior of Windows. In this case, that's just don't use MSI yet.

Of course, this should be relatively easy to handle with quirks, especially if it's predictable which hardware bug you get from the vendor id.

-Daniel
*This .sig left intentionally blank*
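A vendor-keyed quirk of the kind suggested above might look roughly like the following user-space sketch. It is not the kernel's quirk mechanism: the table, the `intx_policy` enum and both functions are hypothetical names for illustration, and treating a whole vendor uniformly is an assumption (the thread only says "at least some" devices from each vendor misbehave). The ATI vendor ID 0x1002 comes from the device listing quoted above; 0x10de is nVidia's PCI vendor ID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_ATI    0x1002 /* matches the 1002:437x IDs quoted above */
#define VENDOR_NVIDIA 0x10de

/* Hypothetical policy values for this sketch; not kernel definitions. */
enum intx_policy {
    INTX_DEFAULT,      /* no known quirk: follow the generic policy */
    INTX_MUST_DISABLE, /* raises legacy interrupts unless INTx is disabled */
    INTX_LEAVE_ALONE   /* misbehaves if the INTx disable bit is set */
};

struct intx_quirk {
    uint16_t vendor;
    enum intx_policy policy;
};

/* Assumption: one entry per vendor is fine-grained enough. */
static const struct intx_quirk quirks[] = {
    { VENDOR_NVIDIA, INTX_MUST_DISABLE },
    { VENDOR_ATI,    INTX_LEAVE_ALONE },
};

/* Look up the policy for a vendor, defaulting when no quirk is listed. */
enum intx_policy intx_policy_for(uint16_t vendor)
{
    for (size_t i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++)
        if (quirks[i].vendor == vendor)
            return quirks[i].policy;
    return INTX_DEFAULT;
}

/*
 * Decide whether to set the INTx disable bit while enabling MSI.
 * generic_default is what the core would do for a device with no quirk.
 */
bool should_disable_intx_for_msi(uint16_t vendor, bool generic_default)
{
    switch (intx_policy_for(vendor)) {
    case INTX_MUST_DISABLE:
        return true;
    case INTX_LEAVE_ALONE:
        return false;
    default:
        return generic_default;
    }
}
```

Keeping the generic default as a parameter means the table only has to record the exceptions, whichever way the core policy ends up being decided.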
Re: 2.6.20-rc2: known unfixed regressions
On Fri, 29 Dec 2006, Adrian Bunk wrote:

> On Fri, Dec 29, 2006 at 01:14:13PM -0500, Daniel Barkalow wrote:
>
> > There's also http://lkml.org/lkml/2006/12/21/47; the included patch breaks
> > my nVidia devices and probably all PCIX devices, so it's not right, but
> > something has to be done to fix ATI. My guess is a quirk to say that
> > pci_intx doesn't work on certain devices and should just be skipped, but
> > I'm not sure if it's just in combination with MSI or not.
>
> This:
> - does not seem to be a regression and
> - missing MSI support is not such a big problem.
>
> Considering how many problems patches in this area tend to cause on
> different hardware, I'm even inclined to say that such patches should
> only be added during the 2 weeks merge window before -rc1.

(I was only talking about the first issue/patch as being a regression, obviously, and forgot that there was more to the email I cited.)

Ah, okay. I somehow missed that all of the devices that were reported to break with the MSI change in mainline don't support MSI in mainline.

Actually, I wouldn't be surprised if this issue applied to audio on ATI SB450 and later, which (I think) use the hda_intel driver, which supports MSI (although I guess it's still defaulting to disabled). If this is true, it would be a regression since 2.6.19.

The addition of a quirk to not use pci_intx with MSI on ATI PCI devices should be safe (until 2.6.20-rc1, this was the usual kernel behavior), but is clearly not critical if mainline doesn't use MSI with any such devices anyway.

-Daniel
*This .sig left intentionally blank*
Re: 2.6.20-rc2: known unfixed regressions
There's also http://lkml.org/lkml/2006/12/21/47; the included patch breaks my nVidia devices and probably all PCIX devices, so it's not right, but something has to be done to fix ATI. My guess is a quirk to say that pci_intx doesn't work on certain devices and should just be skipped, but I'm not sure if it's just in combination with MSI or not.

-Daniel
*This .sig left intentionally blank*
Re: [PATCH] Unbreak MSI on ATI devices
On Thu, 21 Dec 2006, Petr Vandrovec wrote:

> So my question is - what is real reason for disabling INTX when in MSI mode?
> According to PCI spec it should not be needed.

The PCI spec is at least not clear enough on the matter to keep nVidia from thinking that it's the OS's responsibility to make legacy interrupts not happen, by disabling INTX.

> None of devices in the box assert INTX while in MSI even if INTX is enabled.

I've got a forcedeth-driven ethernet card that does, and people have reported that nVidia "Intel HDA" sound does as well.

> So I'd like to see first patch below accepted. If there are some
> devices which require INTX disabling, then apparently decision whether
> to disable it or no has to be moved to device drivers, or some
> blacklist/whitelist must be created...

PCI Express (IIRC) had the pci_intx() calls already, so it's probably actually required by the spec (or at least common implementations) there. I'd guess that it's more common for hardware to be unhappy with intx enabled than to be unhappy with intx disabled, since the hardware is supposed to not send legacy interrupts.

> I'm not sure about second one - I have it in my tree for months, but I run
> that kernel only on hardware mentioned above, so it is probably too dangerous
> until pci_enable_msi() gets answer whether MSI works or no always right.

I think it'd be better to add a module parameter, like in the later drivers in your patch. Figuring out how to get MSI working whenever it's available isn't going to move forward unless there's an easy way to test it, especially since (according to rumor) Windows doesn't use it at all.

> /proc/interrupts after patch. Before patch *hci_hcd:usb* were at zero,
> IRQ21 was stuck with IRQ count at 1, and HCD complained about
> "Unlink after no-IRQ?".

Maybe the intx disable is just totally broken for your device? It certainly shouldn't cause the delivery of *more* legacy interrupts, and if it does with MSI enabled, I'd be surprised if it didn't without MSI. My guess is that that device should get a quirk to just leave the INTx disable bit alone (such that pci_intx doesn't do anything, regardless of context).

> diff -uprdN linux/sound/pci/atiixp.c linux/sound/pci/atiixp.c
> --- linux/sound/pci/atiixp.c 2006-12-16 13:35:47.0 -0800
> +++ linux/sound/pci/atiixp.c 2006-12-16 13:57:09.0 -0800
> @@ -1442,6 +1446,11 @@ static int snd_atiixp_suspend(struct pci
>  	snd_atiixp_aclink_down(chip);
>  	snd_atiixp_chip_stop(chip);
>
> +	if (chip->have_msi) {
> +		pci_disable_msi(pci);
> +	} else {
> +		pci_intx(pci, 0);
> +	}

This doesn't look right, at least for !chip->have_msi. Or is disabling intx desirable here for non-MSI? I'd guess that devices that freak out if you fiddle with intx are likely to be old, and therefore likely to not support MSI.

> @@ -1532,6 +1546,11 @@ static int snd_atiixp_free(struct atiixp
>  	if (chip->remap_addr)
>  		iounmap(chip->remap_addr);
>  	pci_release_regions(chip->pci);
> +	if (chip->have_msi) {
> +		pci_disable_msi(chip->pci);
> +	} else {
> +		pci_intx(chip->pci, 0);
> +	}

My playing with forcedeth trying to get my system working suggests that the expected intx state for a device with no driver is "not disabled". I think the else clause here would cause the device to not work if you used this driver, unloaded the module, and loaded a version without the patch (or kexeced an older kernel, or soft-rebooted into some operating system without MSI support).

-Daniel
*This .sig left intentionally blank*
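To make the objection to the else clause concrete, here is a user-space model of the INTx disable bit, written as a sketch rather than as kernel code. `PCI_COMMAND_INTX_DISABLE` (0x400, bit 10 of the PCI command register) is the real constant name and value; `model_dev`, `model_intx`, `model_msi_enable` and `model_teardown` are hypothetical stand-ins. The teardown here follows the reasoning in the reply (leave the device "not disabled" for whoever drives it next), which is the opposite of what the quoted patch does in its non-MSI branch.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x400 /* bit 10 of the PCI command register */

/* Hypothetical stand-in for a PCI device; not a kernel structure. */
struct model_dev {
    uint16_t command; /* models the config-space command register */
    bool have_msi;
    unsigned writes;  /* counts simulated config-space writes */
};

/* Models the idea of pci_intx(): a nonzero enable clears the disable bit. */
void model_intx(struct model_dev *dev, int enable)
{
    uint16_t new_cmd;

    if (enable)
        new_cmd = (uint16_t)(dev->command & ~PCI_COMMAND_INTX_DISABLE);
    else
        new_cmd = (uint16_t)(dev->command | PCI_COMMAND_INTX_DISABLE);

    if (new_cmd != dev->command) { /* skip the write if nothing changes */
        dev->command = new_cmd;
        dev->writes++;
    }
}

/* Enabling MSI in this model also masks legacy interrupts. */
void model_msi_enable(struct model_dev *dev)
{
    dev->have_msi = true;
    model_intx(dev, 0);
}

/*
 * Teardown as argued for in the reply: whatever mode was in use, hand the
 * device back with INTx not disabled, so a driver (or OS) that knows
 * nothing about MSI still receives legacy interrupts.
 */
void model_teardown(struct model_dev *dev)
{
    dev->have_msi = false;
    model_intx(dev, 1);
}
```

For a device that never used MSI the teardown performs no write at all, so hardware that reacts badly to any change of the bit is left untouched.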
Re: forcedeth trouble in 2.6.19(.1)
On Tue, 19 Dec 2006, John M Flinchbaugh wrote:

> I saw a mention of interrupt handling for forcedeth cards in the
> 2.6.19.1 changelog, but I still see this error in 2.6.19.1. It started
> in 2.6.19, and it didn't happen in 2.6.18.1.

Nope; the issue fixed in 2.6.19.1 has always existed (provided you had hardware suitable to trigger it). And it was an issue of getting bogus legacy interrupts when using MSI, which would lead to some other device on the same legacy interrupt getting disabled.

I'd suggest reverting 0a07bc645e818b88559d99f52ad45e35352e8228 (fixes a lockdep warning, stuff with interrupts, only build tested) as a first guess.

-Daniel
*This .sig left intentionally blank*
Re: GPL only modules
On Mon, 18 Dec 2006, Linus Torvalds wrote:

> Static vs dynamic matters for whether it's an AGGREGATE work. Clearly,
> static linking aggregates the library with the other program in the same
> binary. There's no question about that. And that _does_ have meaning from
> a copyright law angle, since if you don't have permission to ship
> aggregate works under the license, then you can't ship said binary. It's
> just a non-issue in the specific case of the GPLv2.

Under US law, the distinction is between works that are copyrightable themselves as "derivative works" and works that are derived from others, but aren't copyrightable. Provided you're allowed to ship aggregate works, the question is whether the output of "ld" is a copyrightable work distinct from the inputs. I'd agree that "ar", like "mkisofs", doesn't create a derived work, but I think that "objcopy" does create a derived work, and "ld" does too, by virtue of modifying the objects it takes to resolve symbols.

Now, you could distribute to somebody an ar archive of your program, and the recipient (given fair use rights to the copy of the program they received) could do "gcc program.a -o program" to link it. But I don't think you automatically get the right (under the "mere aggregation" permission) to distribute the result of relocating the symbols of gnutls around those of your program and vice versa, along with modifying the references to external symbols from each of these to point to specific locations.

-Daniel
*This .sig left intentionally blank*
Re: [git patch] improve INTx toggle for PCI MSI
On Thu, 7 Dec 2006, Jeff Garzik wrote:

> "it boots" on ICH7 at least.

It solves my problem (and doesn't break anything).

-Daniel
*This .sig left intentionally blank*
Re: Disable INTx when enabling MSI
On Thu, 7 Dec 2006, Greg KH wrote:

> Care to take Jeff's proposed patch, verify that it works and forward it
> on to me?

I'll test it tomorrow. Testing disables my network, and making sure the problem exists without the patch kills my disk controller, so I need to sit at the computer for a while. I assume that I've got the only known device that demonstrates the need for this?

Off topic: would it be wise as a general rule to somehow shut down devices whose interrupts get disabled by the "nobody cared!" code? Or maybe call their interrupt handlers periodically to keep them alive?

-Daniel
*This .sig left intentionally blank*
Disable INTx when enabling MSI
Some device manufacturers seem to think it's the OS's responsibility to disable legacy interrupt delivery when using MSI. If the driver doesn't handle it (which they generally don't), and the device isn't PCI-Express, a steady stream of legacy interrupts will be delivered in addition to the MSI ones, eventually leading to the legacy IRQ getting disabled, which kills any device that shares it.

Jeff proposed a patch in http://lkml.org/lkml/2006/11/21/332 when Linus wanted to do it in the PCI layer, but nobody seems to have told the actual PCI maintainer. I'm trying to get a patch into -stable to do pci_intx in exactly the same situations, but only for forcedeth (which is the device that's causing problems for me), but that requires that the real solution be merged in the mainline.

-Daniel
*This .sig left intentionally blank*
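The failure mode described above (unclaimed legacy interrupts piling up until the shared line is shut off) can be illustrated with a small user-space model. This is a sketch under stated assumptions: the window of 100000 interrupts and the limit of 99900 unhandled ones are assumed values modelled on the kernel's spurious-interrupt detector, and `model_irq_line` and `model_note_interrupt` are hypothetical names, not kernel interfaces.

```c
#include <stdbool.h>

/* Assumed thresholds; treat the exact numbers as an assumption. */
#define MODEL_WINDOW          100000u
#define MODEL_UNHANDLED_LIMIT  99900u

/* Hypothetical model of one shared legacy IRQ line. */
struct model_irq_line {
    unsigned count;     /* interrupts seen in the current window */
    unsigned unhandled; /* of those, how many no handler claimed */
    bool disabled;      /* set once the line has been shut off */
};

/*
 * Record one interrupt on the line. handled says whether any driver
 * sharing the line claimed it. Returns false once the line is disabled.
 */
bool model_note_interrupt(struct model_irq_line *line, bool handled)
{
    if (line->disabled)
        return false;

    if (!handled)
        line->unhandled++;

    if (++line->count < MODEL_WINDOW)
        return true;

    /* End of window: almost nothing was claimed, so "nobody cared". */
    if (line->unhandled > MODEL_UNHANDLED_LIMIT)
        line->disabled = true;

    line->count = 0;
    line->unhandled = 0;
    return !line->disabled;
}
```

A device in MSI mode that keeps asserting INTx corresponds to calling this with `handled` false over and over: its own driver is watching the MSI vector, the other drivers on the line find nothing pending, and once the line is disabled every device sharing it stops receiving interrupts.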