Re: Question about your git habits

2008-02-22 Thread Daniel Barkalow
On Fri, 22 Feb 2008, Chase Venters wrote:

> I've been making myself more familiar with git lately and I'm curious what 
> habits others have adopted. (I know there are a few documents in circulation 
> that deal with using git to work on the kernel but I don't think this has 
> been specifically covered).
> 
> My question is: If you're working on multiple things at once, do you tend to 
> clone the entire repository repeatedly into a series of separate working 
> directories and do your work there, then pull that work (possibly comprising 
> a series of "temporary" commits) back into a separate local master 
> repository with --squash, either into "master" or into a branch containing 
> the new feature?
> 
> Or perhaps you create a temporary topical branch for each thing you are 
> working on, and commit arbitrary changes then checkout another branch when 
> you need to change gears, finally --squashing the intermediate commits when a 
> particular piece of work is done?

I find that the sequence of changes I make is pretty much unrelated to the 
sequence of changes that end up in the project's history, because my 
changes as I make them involve writing a lot of stubs (so I can build) and 
then filling them out. It's beneficial to have version control on this so 
that, if I screw up filling out a stub, I can get back to where I was.

Having made a complete series, I then generate a new series of commits, 
each of which does one thing, without any bugs that I've resolved, such 
that the net result is the end of the messy history, except with any 
debugging or useless stuff skipped. It's this series that gets merged into 
the project history, and I discard the other history.

The real trick is that the early patches in a lot of series often refactor 
existing code in ways that are generally good and necessary for your 
eventual outcome, but which you'd never think of until you've written more 
of the series. Generating a new commit sequence is necessary to end up 
with a history where it looks from the start like you know where you're 
going and have everything done that needs to be done when you get to the 
point of needing it. Furthermore, you want to be able to test these 
commits in isolation, without the distraction of the changes that actually 
prompted them, which means that you want to have your working tree in a 
state that you never actually had it in as you were developing the end 
result.

This means that you'll usually want to rewrite commits for any series that 
isn't a single obvious patch, so it's not a big deal to commit any time 
you want to work on some different branch.
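
As a rough sketch of what that rewriting can look like (the throwaway 
repository, the file names, and the branch names "base", "messy" and 
"clean" are all made up for illustration, and this is only one of several 
ways to do it), you can build the clean series on a fresh branch by pulling 
finished files out of the end state of the messy branch:

```shell
# Throwaway repository; every name below is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > file.c
git add file.c
git commit -q -m "base"
git branch base                      # remember where the series starts

# The messy history, in the order the work actually happened.
git checkout -q -b messy
echo "stub" > feature.c
git add feature.c
git commit -q -m "wip: stub so it builds"
echo "refactored" > file.c
git commit -q -am "wip: refactoring I only thought of later"
echo "filled in" > feature.c
git commit -q -am "wip: fill out the stub"

# The clean series: start again from the base and take the finished
# content from the end of the messy branch, one logical change per commit.
git checkout -q -b clean base
git checkout messy -- file.c         # the refactoring goes first
git commit -q -m "Refactor file.c ahead of the new feature"
git checkout messy -- feature.c      # then the feature, with no stub stage
git commit -q -m "Add feature"

# The net result should match the end of the messy history.
git diff --quiet messy clean && echo "clean series ends in the same tree"
clean_count=$(git rev-list --count base..clean)
echo "commits in clean series: $clean_count"
```

After the first commit on "clean", the working tree holds the refactoring 
without the feature, which is a state that never existed on "messy", so it 
can be built and tested in isolation. When the commits only need reordering 
or squashing rather than rebuilding, "git rebase -i" covers similar ground.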

-Daniel
*This .sig left intentionally blank*
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REGRESSION 2.6.23] no vga console and no messages

2008-02-17 Thread Daniel Barkalow
As far as I can tell, the only differences in either dmesg or lspci 
between the broken one and the working one are the phrasing of messages, 
not what's happening.

Out of curiosity, what do you see for the "Console: " line when you boot? 
It's possible that the VGA console code somehow got broken for both of 
us in 2.6.23, and this means that (a) my console doesn't work, since I'm 
trying to use it, and (b) if your framebuffer console tries to match your 
VGA console, it'll break, because your VGA console is broken.

-Daniel
*This .sig left intentionally blank*


Re: [REGRESSION 2.6.23] no vga console and no messages

2008-02-17 Thread Daniel Barkalow
On Sun, 17 Feb 2008, Frans Pop wrote:

> Daniel Barkalow wrote:
> > For some reason I can't see and don't know how to debug, in 2.6.23 on my
> > server I don't get the vga console, but only get the dummy console.
> 
> Please check if this bug report matches the issue you are seeing:
> http://bugzilla.kernel.org/show_bug.cgi?id=9310

I think mine might be different. I've got a vga parameter (vga=0x301), and 
mine disappears very early, before the point where you usually get "Console: 
colour VGA+ 80x25" (and I'm getting "Console: colour dummy 80x25" 
instead). I've also got CONFIG_FB turned off entirely.
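
For anyone wanting to compare the same settings, here is a minimal sketch 
of checking them in a config file. The three-line fragment and the helper 
name config_state are made up for illustration; a real config would come 
from the build directory, or from /proc/config.gz on a kernel built with 
CONFIG_IKCONFIG_PROC=y:

```shell
# Illustrative config fragment, not a real kernel config.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# CONFIG_FB is not set
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
EOF

# Print the value of one option (name given without the CONFIG_ prefix),
# or "not set" when the file has no assignment for it.
config_state() {
    value=$(sed -n "s/^CONFIG_$1=//p" "$cfg")
    echo "${value:-not set}"
}

for opt in FB FRAMEBUFFER_CONSOLE_DETECT_PRIMARY VGA_CONSOLE DUMMY_CONSOLE; do
    echo "$opt: $(config_state "$opt")"
done
```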

But if you've got any insight into how the console driver stuff works from 
troubleshooting your problem, I could use the hints...

-Daniel
*This .sig left intentionally blank*


Re: [REGRESSION 2.6.23] no vga console and no messages

2008-02-17 Thread Daniel Barkalow
On Sun, 17 Feb 2008, Frans Pop wrote:

> On Sunday 17 February 2008, Daniel Barkalow wrote:
> > On Sun, 17 Feb 2008, Frans Pop wrote:
> > > Daniel Barkalow wrote:
> > > > For some reason I can't see and don't know how to debug, in 2.6.23 on
> > > > my server I don't get the vga console, but only get the dummy
> > > > console.
> > >
> > > Please check if this bug report matches the issue you are seeing:
> > > http://bugzilla.kernel.org/show_bug.cgi?id=9310
> >
> > I think mine might be different. I've got a vga parameter (vga=0x301),
> > and mine disappears very early, before when you usually get "Console:
> > colour VGA+ 80x25" (and I'm getting "Console: colour dummy 80x25"
> > instead). I've also got CONFIG_FB turned off entirely.
> 
> The main question is: do you have FRAMEBUFFER_CONSOLE_DETECT_PRIMARY enabled 
> in your kernel config? If you do, I'd try disabling it.
>
> > But if you've got any insight into how the console driver stuff works
> > from troubleshooting your problem, I could use the hints...
> 
> Afraid not. Are you sure you have the correct framebuffer driver compiled 
> into the kernel?

I'm sure I have none at all; I'm trying to use the vga console, not the 
framebuffer console, or the framebuffer at all.

> Please post your kernel config and the output of 'lspci -nn', so people can 
> have a look.

.config from the build where I disabled DUMMY_CONSOLE and it panics:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-gentoo-r8
# Sat Feb 16 21:54:06 2008
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_BLOCK=y
CONFIG_LBD=y
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set
# CONFIG_BLK_DEV_BSG is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
# CONFIG_TICK_ONESHOT is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MCORE2 is not set
CONFIG_MPENTIUM4=y
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK

[REGRESSION 2.6.23] no vga console and no messages

2008-02-16 Thread Daniel Barkalow
For some reason I can't see and don't know how to debug, in 2.6.23 on my 
server I don't get the vga console, but only get the dummy console.

I also noticed that the documentation is wrong and the Kconfig file is 
confused; it's impossible to not have DUMMY_CONSOLE set, because at least 
one of PROM_CONSOLE and VGA_CONSOLE must not be y. Normally (maybe only 
due to the fact that "dummycon" sorts before "promcon", "sticon", and 
"vgacon"), it actually only stays active if your real console doesn't also 
get initialized. This isn't my problem, AFAICT (my kernel panics if I 
disable DUMMY_CONSOLE, presumably for lack of any console at all); it's 
just misleading.

I'm not seeing anything in dmesg to indicate why VGA+ isn't getting 
registered successfully, or anything to suggest it is trying to be, nor do 
I see anything in a 2.6.22 boot about why that seems to work. Any 
suggestions on further things to try?

I haven't tested anything newer than 2.6.23.x, but I looked through the 
git history and didn't find anything that looked relevant, or even anyone 
who might know about it.

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Daniel Barkalow
On Tue, 29 Jan 2008, Alan Cox wrote:

> > The SCSI error reporting really ought to include a simple interpretation 
> > of the error for end users ("The drive doesn't support this command" "A 
> > sector's data got lost" "The drive timed out" "The drive failed" "The 
> > drive is entirely gone"). There's too much similarity between the message 
> > you get when you try a SMART test that doesn't apply to the drive and what 
> > you get when the drive is broken.
> 
> That would be the SCSI verbose messages option. I think the Eric
> Youngdale consortium added it about Linux 1.2. Nowadays it's always built
> that way.

I've seen a lot of verbosity out of SCSI messages, but I haven't seen a 
straightforward interpretation of the problem in there. It's all 
information useful for debugging, not information useful for system 
administration.

> > And it's possible that the error recovery is suboptimal in some cases. It 
> > seems to like resetting drives too much; perhaps if it keeps seeing the 
> > same problem and resetting the drive, it should decide that the drive's 
> > error reporting is just bad and just ignore that error like the old IDE 
> > did (but, in this case, after saying what it's doing).
> 
> Nothing like casually praying the user's data hasn't gone for a walk is
> there. If we don't act on them the users don't report them until
> something really bad occurs so that isn't an option.

On the other hand, bringing the system down because a device is 
misbehaving is a poor idea. I've personally recovered most of the data off 
of a dying drive because the system was willing to let me keep using the 
drive anyway; IIRC, the drive didn't work at all after a reboot, so I 
would have lost all the data instead of only a little had the system 
insisted on a perfectly functioning drive in order to use it at all.

There ought to be some middle ground between doing nothing until the 
computer really breaks and breaking the computer before then, but that's 
an issue not specific to libata.

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Daniel Barkalow
On Tue, 29 Jan 2008, Alan Cox wrote:

> > not one problem but lots---is sufficiently widespread that a Mini HOWTO, 
> > say, would be really welcome and, I'm guessing, widely used.
> 
> We don't see very many libata problems at the distro level and they for
> the most part boil down to
> 
> - error messages looking different - Most bugs I get are things like
> media errors (timeout looks different, UNC report looks different)

The SCSI error reporting really ought to include a simple interpretation 
of the error for end users ("The drive doesn't support this command" "A 
sector's data got lost" "The drive timed out" "The drive failed" "The 
drive is entirely gone"). There's too much similarity between the message 
you get when you try a SMART test that doesn't apply to the drive and what 
you get when the drive is broken.
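
To make the idea concrete, here is a minimal sketch of such an 
interpretation step. The function name explain_sense_key is made up, and 
the pairing of messages with keys is only a rough guess; the key numbers 
themselves are the standard SCSI sense keys (3 MEDIUM ERROR, 4 HARDWARE 
ERROR, 5 ILLEGAL REQUEST). As far as I know, timeouts and vanished drives 
are reported at the host level rather than through a sense key, so they 
are left out here:

```shell
# Map a numeric SCSI sense key to a plain-language message for end users.
# The wording and the mapping are illustrative, not what the kernel prints.
explain_sense_key() {
    case "$1" in
        3) echo "A sector's data got lost" ;;                 # MEDIUM ERROR
        4) echo "The drive failed" ;;                         # HARDWARE ERROR
        5) echo "The drive doesn't support this command" ;;   # ILLEGAL REQUEST
        *) echo "Unrecognized error (sense key $1)" ;;
    esac
}

explain_sense_key 5
explain_sense_key 3
```

With something like this, a SMART test the drive doesn't support and a 
genuinely unreadable sector would produce visibly different messages.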

> - faulty hardware being picked up because we actually do real error
> checking now. We now check for and give some devices more slack while
> still doing error checking. Both IDE layers also added blacklists for
> stuff like the TSScorp DVD drives. Qemu has now had its bugs patched.

I think this is the big source of unhappy users (and, of course, they all 
look the same and the reports stay findable by Google, so it looks a lot 
worse than it is). People getting this problem in distro kernels probably 
really do want to have a way to report it with enough detail from logs to 
get it dealt with and then switch back to old IDE until the fix propagates 
through.

And it's possible that the error recovery is suboptimal in some cases. It 
seems to like resetting drives too much; perhaps if it keeps seeing the 
same problem and resetting the drive, it should decide that the drive's 
error reporting is just bad and just ignore that error like the old IDE 
did (but, in this case, after saying what it's doing).

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Daniel Barkalow
On Tue, 29 Jan 2008, Alan Cox wrote:

> > things in the kernel that refer to SCSI probably should say "storage" (or 
> > "ATA", really, but that would make the acronyms confusing).
> 
> SCSI is a command protocol. It is what your CD-ROM drive and USB storage
> devices talk (albeit with a bit of an accent).

Among other things, yes. But SCSI standards also specify electrical 
interfaces that aren't at all related to the electrical interfaces used by 
a lot of devices, and a lot of the places the kernel uses the term suggest 
that it's also talking about the electrical interface (or, at least, 
connector shape). For example, it's misleading to talk about "SCSI CDROM 
support" meaning the command protocol when hardly anybody has ever seen a 
CDROM drive that doesn't use the SCSI command protocol, but most people 
know about both SCSI-connector and PATA-connector CDROM drives.

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Daniel Barkalow
On Tue, 29 Jan 2008, Gene Heskett wrote:

> >For starters, enable CONFIG_BLK_DEV_SR.
> 
> That could stand to be moved or renamed, it is well buried in the menu for 
> the 
> REAL scsi stuffs, which I don't have any of.  Enabled & building now.  

The "SCSI support type (disk, tape, CD-ROM)" section of that menu actually 
applies to all ATA-command-set devices that don't use the old IDE code. 
For example, usb-storage uses "SCSI disk" out of that section, and 
I've only seen "Probe all LUNs on each SCSI device" be needed for a 
particular USB card reader with two slots. At this point, most of the 
things in the kernel that refer to SCSI probably should say "storage" (or 
"ATA", really, but that would make the acronyms confusing).

Incidentally, you should be able to save debugging time for problems like 
missing "sr" by building it as a module, which will build really quickly 
and not require a reboot to test.

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Daniel Barkalow
On Tue, 29 Jan 2008, Alan Cox wrote:

  things in the kernel that refer to SCSI probably should say storage (or 
  ATA, really, but that would make the acronyms confusing).
 
 SCSI is a command protocol. It is what your CD-ROM drive and USB storage
 devices talk (albeit with a bit of an accent).

Among other things, yes. But SCSI standards also specify electrical 
interfaces that aren't at all related to the electrical interfaces used by 
a lot of devices, and a lot of the places the kernel uses the term suggest 
that it's also talking about the electrical interface (or, at least, 
connector shape). For example, it's misleading to talk about SCSI CDROM 
support meaning the command protocol when hardly anybody has ever seen a 
CDROM drive that doesn't use the SCSI command protocol, but most people 
know about both SCSI-connector and PATA-connector CDROM drives.

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Daniel Barkalow
On Tue, 29 Jan 2008, Alan Cox wrote:

> > not one problem but lots---is sufficiently widespread that a Mini HOWTO, 
> > say, would be really welcome and, I'm guessing, widely used.
> 
> We don't see very many libata problems at the distro level and they for
> the most part boil down to
> 
> - error messages looking different - Most bugs I get are things like
> media errors (timeout looks different, UNC report looks different)

The SCSI error reporting really ought to include a simple interpretation 
of the error for end users ("The drive doesn't support this command", "A 
sector's data got lost", "The drive timed out", "The drive failed", "The 
drive is entirely gone"). There's too much similarity between the message 
you get when you try a SMART test that doesn't apply to the drive and what 
you get when the drive is broken.
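
To make the suggestion concrete, here is a minimal sketch of such an interpretation layer. The three sense-key values are the standard SCSI ones; everything else (the function name, the `transport_status` strings used for timeouts and vanished devices, and the rough mapping from ILLEGAL REQUEST to "doesn't support this command") is an assumption made for illustration, not an existing kernel interface.

```python
# Standard SCSI sense keys; only the ones used below are listed.
MEDIUM_ERROR = 0x3
HARDWARE_ERROR = 0x4
ILLEGAL_REQUEST = 0x5

# Rough plain-language readings of those sense keys (a simplification).
SENSE_KEY_MESSAGES = {
    ILLEGAL_REQUEST: "The drive doesn't support this command",
    MEDIUM_ERROR: "A sector's data got lost",
    HARDWARE_ERROR: "The drive failed",
}

# Hypothetical transport-level outcomes that never produce sense data.
TRANSPORT_MESSAGES = {
    "timeout": "The drive timed out",
    "gone": "The drive is entirely gone",
}


def explain(sense_key=None, transport_status=None):
    """Return a one-line description aimed at an administrator, not a developer."""
    if transport_status in TRANSPORT_MESSAGES:
        return TRANSPORT_MESSAGES[transport_status]
    if sense_key in SENSE_KEY_MESSAGES:
        return SENSE_KEY_MESSAGES[sense_key]
    return "Unrecognized drive error (see the detailed log)"
```

With something like this, an unsupported SMART test and a dead drive would at least produce visibly different first lines, with the verbose output kept underneath for debugging.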

> - faulty hardware being picked up because we actually do real error
> checking now. We now check for and give some devices more slack while
> still doing error checking. Both IDE layers also added blacklists for
> stuff like the TSScorp DVD drives. Qemu has now had its bugs patched.

I think this is the big source of unhappy users (and, of course, they all 
look the same and the reports stay findable by Google, so it looks a lot 
worse than it is). People getting this problem in distro kernels probably 
really do want to have a way to report it with enough detail from logs to 
get it dealt with and then switch back to old IDE until the fix propagates 
through.

And it's possible that the error recovery is suboptimal in some cases. It 
seems to like resetting drives too much; perhaps if it keeps seeing the 
same problem and resetting the drive, it should decide that the drive's 
error reporting is just bad and just ignore that error like the old IDE 
did (but, in this case, after saying what it's doing).
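
The policy floated here could be sketched roughly as follows. This only illustrates the idea, under the assumption of a fixed reset budget per drive and error; the class name, the threshold, and the log wording are all hypothetical, and nothing here reflects how libata's error handler is actually structured.

```python
from collections import Counter


class ResetPolicy:
    """Reset on an error a few times, then announce it and stop resetting."""

    def __init__(self, max_resets=3):
        self.max_resets = max_resets  # hypothetical threshold
        self.resets = Counter()       # resets already issued per (drive, error)
        self.ignored = set()          # (drive, error) pairs we gave up on
        self.log = []                 # messages shown to the administrator

    def handle(self, drive, error):
        key = (drive, error)
        if key in self.ignored:
            return "ignore"
        if self.resets[key] >= self.max_resets:
            # Say what we are doing before we start ignoring the error.
            self.ignored.add(key)
            self.log.append(
                f"{drive}: '{error}' persists after {self.max_resets} resets; "
                "ignoring it from now on"
            )
            return "ignore"
        self.resets[key] += 1
        return "reset"
```

The difference from the old IDE behaviour is the log entry: the error is not silently dropped, so the administrator still learns that the drive is misreporting or misbehaving.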

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Daniel Barkalow
On Tue, 29 Jan 2008, Alan Cox wrote:

> > The SCSI error reporting really ought to include a simple interpretation 
> > of the error for end users ("The drive doesn't support this command", "A 
> > sector's data got lost", "The drive timed out", "The drive failed", "The 
> > drive is entirely gone"). There's too much similarity between the message 
> > you get when you try a SMART test that doesn't apply to the drive and what 
> > you get when the drive is broken.
> 
> That would be the SCSI verbose messages option. I think the Eric
> Youngdale consortium added it about Linux 1.2. Nowdays its always built
> that way.

I've seen a lot of verbosity out of SCSI messages, but I haven't seen a 
straightforward interpretation of the problem in there. It's all 
information useful for debugging, not information useful for system 
administration.

> > And it's possible that the error recovery is suboptimal in some cases. It 
> > seems to like resetting drives too much; perhaps if it keeps seeing the 
> > same problem and resetting the drive, it should decide that the drive's 
> > error reporting is just bad and just ignore that error like the old IDE 
> > did (but, in this case, after saying what it's doing).
> 
> Nothing like casually praying the users data hasn't gone for a walk is
> there. If we don't act on them the users don't report them until
> something really bad occurs so that isn't an option.

On the other hand, bringing the system down because a device is 
misbehaving is a poor idea. I've personally recovered most of the data off 
of a dying drive because the system was willing to let me keep using the 
drive anyway; IIRC, the drive didn't work at all after a reboot, so I 
would have lost all the data instead of only a little had the system 
insisted on a perfectly functioning drive in order to use it at all.

There ought to be some middle ground between doing nothing until the 
computer really breaks and breaking the computer before then, but that's 
an issue not specific to libata.

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Daniel Barkalow
On Mon, 28 Jan 2008, Gene Heskett wrote:

> On Monday 28 January 2008, Daniel Barkalow wrote:
> >On Mon, 28 Jan 2008, Gene Heskett wrote:
> >> On Monday 28 January 2008, Daniel Barkalow wrote:
> >> >Building this and installing it along with the appropriate initrd (which
> >> >might be handled by Fedora's install scripts)
> >>
> >> Or mine, which I've been using for years.
> >
> >You're ahead of a surprising number of people, including me, if you
> >understand making initrds.
> 
> In my script, its one line:
> mkinitrd -f initrd-$VER.img $VER && \
> 
> where $VER is the shell variable I edit to = the version number, located at 
> the top of the script.
> 
> Unforch, its failing:
> No module pata_amd found for kernel 2.6.24, aborting.
> 
> This is with pata_amd turned off and its counterpart under ATA/RLL/etc turned 
> on.  So something is still dependent on it. 

That looks like something in the guts of the initrd; it probably thinks 
you need pata_amd and it's unhappy that you don't have it.

Actually, another thing to try is making the ATA/etc one be "y" and 
pata_amd be "m". Most likely, this should lead to the ATA one claiming the 
drive before the module is loaded (but the module would be loaded later, 
to avoid upsetting the initrd); you should be able to tell from dmesg (or 
/dev, for that matter) which one got it, and I think built-in drivers will 
claim everything they can before an initrd gets loaded.
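
As a quick sanity check before rebuilding, something like the following could confirm that the .config really has the old-IDE driver built in and pata_amd as a module. It is only a sketch: it assumes the old-IDE counterpart of pata_amd is CONFIG_BLK_DEV_AMD74XX (worth verifying in your own menuconfig), and the helper names are made up for this example.

```python
def config_state(config_text, symbol):
    """Return 'y', 'm', or 'n' for one symbol in the text of a kernel .config."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(symbol + "="):
            return line.split("=", 1)[1]
        if line == f"# {symbol} is not set":
            return "n"
    return "n"  # symbol not mentioned at all


def ide_claims_first(config_text,
                     ide_symbol="CONFIG_BLK_DEV_AMD74XX",  # assumed old-IDE driver
                     pata_symbol="CONFIG_PATA_AMD"):
    """True if the old-IDE driver is built in and the libata one is a module."""
    return (config_state(config_text, ide_symbol) == "y"
            and config_state(config_text, pata_symbol) == "m")
```

Feeding it the contents of the .config you are about to build (for example `ide_claims_first(open(".config").read())`) should return True for the arrangement described above; dmesg after booting remains the real test of which driver claimed the drive.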

> I do have one sata drive, on an accessory card in the box, so I need the 
> rest of the sata_sil and friends stuff. 

Assuming it isn't picking up your hard drive, which it isn't, that 
shouldn't matter.

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Daniel Barkalow
On Mon, 28 Jan 2008, Gene Heskett wrote:

> On Monday 28 January 2008, Daniel Barkalow wrote:
> >Building this and installing it along with the appropriate initrd (which
> >might be handled by Fedora's install scripts)
> 
> Or mine, which I've been using for years.

You're ahead of a surprising number of people, including me, if you 
understand making initrds.

> >will either get you back to 
> >old IDE or will make your kernel panic on boot, depending on whether you
> >got it right (so make sure you can still boot the kernel you're sure of or
> >something from a boot disk). This will also cause your hard drives to show
> >up as different device nodes, so if your boot process doesn't mount by
> >disk uuid but by some other feature (and I don't know what Fedora does),
> >you'll also need to change it to something either stable across access
> >methods or which works for the one you're now using.
> 
> It mounts by LABEL=.  All of it.

That'll save a huge amount of hassle. So long as you manage to get the 
right drivers included and the wrong drivers not included, you should be 
pretty much set.

> Fedora is not the only people having trouble,  name a distro, its probably 
> someplace in that 14,800 hit google returns.

Yeah, but they each may need different instructions, particularly if 
they're not mounting by label in general, or not mounting the root 
partition by label. That was the big hassle going the opposite direction. 
And the procedure is 4 lines to describe to somebody who knows how to 
build and install a new kernel for the distro, which is much shorter than 
the explanation of how you generally build and install a kernel. A real 
howto would have to explain where to get the distro's kernel sources and 
default configuration, for example.

-Daniel
*This .sig left intentionally blank*


Re: [rfc] exposing MMR's of on-chip peripherals for debugging purposes

2008-01-28 Thread Daniel Barkalow
On Mon, 28 Jan 2008, Mike Frysinger wrote:

> On Jan 28, 2008 7:08 PM, Daniel Barkalow <[EMAIL PROTECTED]> wrote:
> > Could you submit the XML files and the autogeneration code? The C file
> > isn't really source. Not only is it big, it'll probably change around a
> > whole lot when you make small changes to your process, be hard to review,
> > etc.
> 
> that would require the build system to have xml tools installed ...
> that doesnt sound pleasant.

If they're only required for building blackfin debugging stuff, that 
shouldn't be a big deal. People building embedded kernels with debugging 
from source can probably handle the extra requirement. Setting up a 
cross-compilation toolchain for embedded processors is much trickier than 
getting xml tools.

> that said, the XML files in question are probably 10x+ the size of the
> C file.  swapping 1 meg for 10+ megs ? :)

If it's a bunch of smaller files, and if changes tend to be localized, 
that would be a good tradeoff. Alternatively, have them packaged 
separately, which might be more appropriate anyway if people might want to 
use them for other purposes (on the host when using jtag, perhaps).

-Daniel
*This .sig left intentionally blank*


Re: [rfc] exposing MMR's of on-chip peripherals for debugging purposes

2008-01-28 Thread Daniel Barkalow
On Mon, 28 Jan 2008, Mike Frysinger wrote:

> On Jan 28, 2008 8:04 AM, richard kennedy <[EMAIL PROTECTED]> wrote:
> > Mike Frysinger wrote:
> > > On Jan 28, 2008 5:40 AM, Bryan Wu <[EMAIL PROTECTED]> wrote:
> > >> On Mon, 2008-01-28 at 05:16 -0500, Mike Frysinger wrote:
> > >>> the trouble is that this file currently weighs in at ~1.8 megs.  this
> > >>> is because it contains all the information for all Blackfin processors
> > >>> we support (which currently, is about ~23 variants).  it's only going
> > >>> to get bigger as we support more.  Bryan cringes at the thought of
> > >>> submitting it to LKML :).  so i'm fishing around for alternatives ...
> > >>> the code was originally developed against 2.6.21, so UIO was not a
> > >>> possibility.  i'm still not sure if it is ... i'd have to research it
> > >>> a bit more and play with things.
> > >> The main reason I am not willing to submit this to mainline is the file
> > >> size. It's almost the biggest file in the kernel source. And it will be
> > >> bigger and bigger when more and more new Blackfin processors supported
> > >> by Linux kernel.
> > >
> > > a quick check of current git shows it is significantly larger than any 
> > > other ;)
> > >
> > >> My suggestion is:
> > >> Or more deeper thought:
> > >>  - we don't need all the MMR setup at the same time for debugging. for 
> > >> example, maybe for some developer, he/she only needs one driver MMR for 
> > >> debugging such as watchdog/usb/spi/i2c 
> > >
> > > splitting things up doesnt really address the original issue: there's
> > > a lot of info here to be kept in the kernel
> > >
> > >>  - How about split the debug MMR table to each drivers or processors?
> > >>  - watchdog driver implements a debug FS interface for debugging 
> > >> watchdog MMR and other drivers implement their own things.
> > >
> > > this had been mentioned before as a possibility but shot down.  you do
> > > not want to tie the creation of these debug files to anything as the
> > > prevents independent development of any other drivers/application that
> > > use the same peripheral.
> >
> > there is a lot of duplication in your file, but you could slim it down a
> > bit if thats the only objection.
> 
> i imagine there's a ton of duplication ... the file is auto-generated
> from XML files, so i could take a look at the autogeneration producing
> unified code.

Could you submit the XML files and the autogeneration code? The C file 
isn't really source. Not only is it big, it'll probably change around a 
whole lot when you make small changes to your process, be hard to review, 
etc.
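
For a sense of how small the generator could be, here is a rough sketch. The XML layout (a `register` element with `name`, `address`, and `bits` attributes), the `mmr_entry` struct name, and the sample values are all invented for illustration; I don't know what the real Blackfin XML files look like.

```python
import xml.etree.ElementTree as ET


def generate_c_table(xml_text):
    """Turn a (hypothetical) register description into a C array initializer."""
    root = ET.fromstring(xml_text)
    lines = ["static const struct mmr_entry mmr_table[] = {"]
    for reg in root.iter("register"):
        name = reg.get("name")
        addr = int(reg.get("address"), 16)  # accepts an optional 0x prefix
        width = int(reg.get("bits"))
        lines.append(f'\t{{ "{name}", 0x{addr:08X}, {width} }},')
    lines.append("};")
    return "\n".join(lines)


EXAMPLE_XML = """
<processor name="example">
  <register name="WDOG_CTL" address="0xFFC00200" bits="16"/>
  <register name="WDOG_CNT" address="0xFFC00204" bits="32"/>
</processor>
"""

if __name__ == "__main__":
    print(generate_c_table(EXAMPLE_XML))
```

With the XML and a script like this in the tree, a review only has to look at the few dozen lines of generator plus whichever XML entries changed, rather than a diff of the regenerated C file.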

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Daniel Barkalow
On Mon, 28 Jan 2008, Richard Heck wrote:

> Daniel Barkalow wrote:
> > Can you switch back to old IDE to get your work done (and to make sure it's
> > not a hardware issue that's developed recently)? 
> I think it'd be really, REALLY helpful to a lot of people if you, or someone,
> could explain in moderate detail how this might be done. I tried doing it
> myself, but I'm not sufficiently expert at configuring kernels that I was ever
> able to figure out how to do it.

As far as configuring the kernel, I can help:

Go to Device Drivers, ATA/ATAPI/MFM/RLL support, and turn on anything that 
looks relevant; go to Device Drivers, Serial ATA and Parallel ATA drivers, 
and turn off anything that's PATA and looks relevant.

(Whether a device uses IDE or PATA depends on which of the drivers that 
support the device is present and finds it first, not on any sort of global 
configuration, which is probably what tripped you up.)

Building this and installing it along with the appropriate initrd (which 
might be handled by Fedora's install scripts) will either get you back to 
old IDE or will make your kernel panic on boot, depending on whether you 
got it right (so make sure you can still boot the kernel you're sure of or 
something from a boot disk). This will also cause your hard drives to show 
up as different device nodes, so if your boot process doesn't mount by 
disk uuid but by some other feature (and I don't know what Fedora does), 
you'll also need to change it to something either stable across access 
methods or which works for the one you're now using.
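
To see whether a given system is exposed to the device-node change, a check along these lines could be run over /etc/fstab. It is a sketch only: the function name is made up, it looks just for plain /dev/hdX and /dev/sdX entries, and the bootloader's root= argument would need the same inspection by hand.

```python
import re

# Entries like /dev/hda2 or /dev/sdb1 change when the driver stack changes;
# LABEL= and UUID= entries do not.
DEVICE_NODE = re.compile(r"^/dev/(hd|sd)[a-z]+\d*$")


def fragile_fstab_entries(fstab_text):
    """Return (device, mount point) pairs that are mounted by raw device node."""
    fragile = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        if len(fields) >= 2 and DEVICE_NODE.match(fields[0]):
            fragile.append((fields[0], fields[1]))
    return fragile
```

An empty result means everything in fstab is mounted by label or UUID, which is the case where switching between libata and old IDE is least painful.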

> Obviously, the short version is: switch back to Fedora 6. But this kind of
> problem with libata---and yes, you're almost surely right that it's not one
> problem but lots---is sufficiently widespread that a Mini HOWTO, say, would be
> really welcome and, I'm guessing, widely used.

Fedora really ought to provide documentation, because there's some 
distro-specific stuff (like how you deal with the kernel's device node for 
the root partition changing), and they're using code by default that's at 
least somewhat documented as experimental (although it doesn't seem to be 
actually marked as experimental in all cases).

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Daniel Barkalow
On Mon, 28 Jan 2008, Gene Heskett wrote:

> I believe at this point, its moot.  I captured quite a few instances of that 
> error message while rebooting the last time, all of which occurred long 
> before I logged in and did a startx (I boot to runlevel 3 here), so the 
> kernel was NOT tainted at that point.  That dmesg has been posted and some 
> questions asked.
> 
> As this has gone on for a while, it seems to me that with 14,800 google hits 
> on this problem, Linus should call a halt until this is found and fixed.  But 
> I'm not Linus.  I'm also locking up for 30 at a time, & probably ready for 
> reboot #7 today.

Can you switch back to old IDE to get your work done (and to make sure 
it's not a hardware issue that's developed recently)? I believe libata is 
just a whole lot pickier about behavior than the IDE subsystem was, so 
it's more likely to complain about stuff, both for good reasons and when 
it shouldn't, and there are a slew of potential "we have to accept that 
old PATA hardware does this" bugs that all have the same symptom of "we go 
into error handling when nothing is actually wrong", hence the vast 
quantity of hits. I think it's not so much that it's a common problem as 
that it's a lot of problems that aren't very distinguishable.

-Daniel
*This .sig left intentionally blank*


Re: PROBLEM: Celeron Core

2008-01-20 Thread Daniel Barkalow
On Sun, 20 Jan 2008, Matt Mackall wrote:

> Your usage of "overall power" here is wrong. Power is an instantaneous
> quantity (1/s) like velocity, and you are comparing it to energy which
> is not an instaneous quantity, more like distance.
> 
> If we throttle the velocity of a car from 100km/h to 50km/h, it'll
> obviously take longer for it travel a given distance. Now what will it
> mean when we ask about its "overall velocity" when it reaches its
> destination? We surely don't mean the distance travelled - that's not a
> velocity! We can perhaps talk about its average velocity, which will
> obviously be smaller.

What people tend to care about is average power usage over a period, not 
instantaneous power usage. In fact, throttling obviously doesn't decrease 
instantaneous power usage while the machine is doing anything (since it 
runs full speed and full power when running, and does nothing and uses 
some but not as much power when halted). Throttling decreases the average 
power usage over the period of the throttling, but increases the average 
power usage in general over longer periods.

If we throttle a car's velocity by only driving 100km/h for 5 minutes out 
of every 10 instead of all of the time, it doesn't meaningfully have less 
velocity. And it's a particularly meaningless measure if the arrangement 
as a whole is that it will leave point A at some time, drive to point B, 
and sit there until some other time; in this case its average velocity is 
the distance from point A to point B divided by the duration between the 
two times, regardless of how you drive. But the distance travelled is 
longer if you have to pull over and park every 10 minutes, and so the 
average velocity must be higher for the TDMA throttling case.

-Daniel
*This .sig left intentionally blank*


Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in

2007-12-27 Thread Daniel Barkalow
On Thu, 27 Dec 2007, Linus Torvalds wrote:

> On Thu, 27 Dec 2007, Daniel Barkalow wrote:
> > 
> > I'd actually bet that the hardware bug is actually that any device that 
> > gives a CRS response the first time will have its Vendor ID appear as 0001 
> > on subsequent mmconfig accesses, which means that it's actually a bus 
> > quirk that probably only affects mmconfig access to something in the 
> > conf1-visible space. The only per-device aspect would be that it uses CRS 
> > (possibly correctly), and that doesn't mean that mmconfig won't be safe in 
> > general for the device, or even that it won't be necessary. Actually, we 
> > already know that per-driver enabling mmconfig is broken: sky2 is one that 
> > wants to opt in but there are also reports of the Vendor ID 0001 bug with 
> > it.
> 
> Actually, having it be a per-device thing would have fixed this particular 
> problem, if only because the device probing would have been done without 
> MMCONFIG (thus avoiding the bug), and then after it has been probed, it 
> wouldn't have mattered if the driver enabled MMCONFIG for the device, 
> since it would now have the right ID in "struct pci_device".
> 
> Sure, subsequent "lspci" users would still be confused, but the kernel 
> itself would never have noticed anything strange.

A bug making lspci see something different from what the kernel sees 
initially sounds to me like a sure way to drive maintainers insane. If 
somebody had a northbridge that also screwed up the rest of the word, and 
a device that a mmconfig-using driver recognized but had problems with, 
the user would be reporting lspci info with 0001: as the device that 
doesn't work.

> Of course, just doing *all* initial probing without MMCONFIG would also 
> have fixed it, which is another thing I advocate (regardless of any 
> per-device setting).

So would always using conf1 for the non-extended space (unless the 
platform only uses mmconfig), or at least for the first 64 bytes. I'd bet 
all the subtle bugs are in the first few words, anyway. (With blatant bugs 
in the rest, of course, where we want to blacklist busses and devices)
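
That rule can be sketched as a small selection function. This is an 
illustration only: the enum, the function name, and the flags are 
hypothetical, not kernel code. The one fact relied on is that conf1 can 
only address the first 256 bytes of config space, while the extended space 
(up to 4096 bytes) needs mmconfig.

```c
#include <stdbool.h>

enum cfg_access { CFG_CONF1, CFG_MMCONFIG, CFG_INVALID };

#define CONF1_LIMIT 256    /* bytes of config space reachable through conf1 */
#define MMCFG_LIMIT 4096   /* bytes of config space reachable through mmconfig */

/* Prefer conf1 wherever it can reach; use mmconfig only when it is needed. */
enum cfg_access pick_access(unsigned int offset, bool have_conf1,
                            bool have_mmconfig)
{
    if (offset >= MMCFG_LIMIT)
        return CFG_INVALID;
    if (offset < CONF1_LIMIT && have_conf1)
        return CFG_CONF1;
    if (have_mmconfig)
        return CFG_MMCONFIG;   /* extended space, or an mmconfig-only platform */
    return CFG_INVALID;        /* extended space requested without mmconfig */
}
```

Lowering CONF1_LIMIT to 64 would give the "at least the first 64 bytes" 
variant.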

-Daniel
*This .sig left intentionally blank*


Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in

2007-12-27 Thread Daniel Barkalow
On Thu, 27 Dec 2007, Kai Ruhnau wrote:

> Linus Torvalds wrote:
> > On Thu, 27 Dec 2007, Linus Torvalds wrote:
> >   
> >> Kai, can you try that? Just remove the call to pci_enable_crs() in 
> >> pci_scan_bridge() in drivers/pci/probe.c, and see if mmconfig starts 
> >> working for you?
> >> 
> >
> > We could also make the error handling more permissive, and just check for 
> > the low 16 bits, which is the part that the CRS spec mentions the actual 
> > > value for. The whole vendor ID of 0x0001 is mentioned in the CRS spec as 
> > being explicitly chosen exactly because it's invalid.
> >
> > That said, given that we don't actually reap any benefits from CRS support 
> > right now *anyway*, I think the right thing to do is disable it by 
> > default. But it would be interesting to know if this patch makes it work 
> > on those ATI bridges..
> >
> > Linus
> >
> > ---
> >  drivers/pci/probe.c |2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 2f75d69..94cd3a4 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -908,7 +908,7 @@ pci_scan_device(struct pci_bus *bus, int devfn)
> >          return NULL;
> >  
> >      /* Configuration request Retry Status */
> > -    while (l == 0xffff0001) {
> > +    while ((l & 0xffff) == 0x0001) {
> >          msleep(delay);
> >          delay *= 2;
> >          if (pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, &l))
> 
> That one did not work out so well.
> I reenabled the call to pci_enable_crs() and changed the line as above.
> That resulted in two timeouts (from dmesg):
> 
> [...]
> ACPI: Interpreter enabled
> ACPI: (supports S0 S3 S4 S5)
> ACPI: Using IOACPI for interrupt routing
> ACPI: PCI Root Bridge [PCI0] (:00)
> Device :01:00.0 not responding
> Device :02:00.0 not responding
> [...]
> 
> Then, the kernel boots up normally, except that the graphics and network
> cards do not show up at all in lspci.

Uh, right. We already know that your northbridge, mmconfig, CRS, and this 
device combine to always return 0001 for the Vendor ID. If we loop on 
getting that, we must time out.
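
A user-space simulation of that loop shows the effect. This is only a 
sketch: the callback type, the function names, and the made-up device ID 
are hypothetical, the sleep is simulated by adding up milliseconds, and 
the 60 second cutoff is an assumption about how the probe loop gives up.

```c
#include <stddef.h>
#include <stdint.h>

#define CRS_VENDOR_ID  0x0001u      /* Vendor ID value that signals CRS */
#define CRS_TIMEOUT_MS (60 * 1000)  /* assumed: give up once backoff exceeds this */

/* Hypothetical stand-in for a config read of the Vendor ID dword. */
typedef uint32_t (*read_id_fn)(void *ctx);

/*
 * Returns 0 and stores the ID once the device stops reporting CRS, or -1
 * if it never does. *slept_ms accumulates the time we would have slept.
 */
int wait_for_device(read_id_fn read_id, void *ctx, uint32_t *id,
                    unsigned long *slept_ms)
{
    unsigned long delay = 1;
    uint32_t l = read_id(ctx);

    *slept_ms = 0;
    while ((l & 0xffff) == CRS_VENDOR_ID) {
        *slept_ms += delay;          /* the kernel would msleep(delay) here */
        delay *= 2;
        l = read_id(ctx);
        if (delay > CRS_TIMEOUT_MS)
            return -1;               /* "Device ... not responding" */
    }
    *id = l;
    return 0;
}

/* Device behind the buggy bridge: the Vendor ID always reads as 0x0001. */
uint32_t stuck_read(void *ctx)
{
    (void)ctx;
    return 0x00000001u;
}

/* Healthy device: reports CRS for the first *ctx reads, then its real ID. */
uint32_t slow_read(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return 0xffff0001u;
    }
    return 0x12345678u;              /* made-up vendor/device ID */
}
```

With stuck_read() the loop can only end through the timeout, which is 
what the two "not responding" lines in the dmesg output are.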

I'd actually bet that the hardware bug is actually that any device that 
gives a CRS response the first time will have its Vendor ID appear as 0001 
on subsequent mmconfig accesses, which means that it's actually a bus 
quirk that probably only affects mmconfig access to something in the 
conf1-visible space. The only per-device aspect would be that it uses CRS 
(possibly correctly), and that doesn't mean that mmconfig won't be safe in 
general for the device, or even that it won't be necessary. Actually, we 
already know that per-driver enabling mmconfig is broken: sky2 is one that 
wants to opt in but there are also reports of the Vendor ID 0001 bug with 
it.

-Daniel
*This .sig left intentionally blank*


Re: [BUG] New Kernel Bugs

2007-11-16 Thread Daniel Barkalow
On Fri, 16 Nov 2007, Romano Giannetti wrote:

> 
> (Cc: trimmed a bit).
> 
> On Thu, 2007-11-15 at 11:19 -0500, Daniel Barkalow wrote:
> > On Thu, 15 Nov 2007, Theodore Tso wrote:
> [...]
> > > A full kernel build with everything selected can take a good 30 minutes or 
> > > more, and that's on a fast dual-core machine with 4gigs of memory and 
> > > 7200rpm disk drives. On a slower, memory limited laptop, doing a single 
> > > kernel build can take more time than the user has patience; multiply 
> > > that by 7 or 8 build and test boots, and it starts to get tiresome.
> > 
> > None of this is going to take as long, 
> 
> Well, the compile phase can. Especially if the first time you try to
> compile the kernel with EXTRAVERSION=`git describe` which forces almost a
> full rebuild every time...

Compared to getting useful suggestions from a mailing list, especially 
before you've gotten anybody's attention? Hours or overnight isn't 
particularly long, and doesn't take up much of your time if you've got a 
working kernel to use while the build runs.

> But the worst problem is that a full recompile, with a distro .config,
> will take hours on my 2.66GHz/CoreDuo/1G ram. Trimming down .config is
> fundamental to be able to bisect effectively, but it's not an easy thing
> to do for an inexperienced user (and a painful one for all the rest of
> us). 
> 
> What would be an invaluable help would be a tool that generates
> a .config with all the modules and subsystems I am using *now*. Should
> be possible in principle by parsing KConfig and Makefiles and using as
> input the current .config and lsmod... is it possible to map the kernel
> object name to the option enabling it?

I don't think there's anything set up for that, aside from the actual 
build system generating it, and I don't know how hard that would be to 
repurpose for generating a configuration.

-Daniel
*This .sig left intentionally blank*


Re: [BUG] New Kernel Bugs

2007-11-15 Thread Daniel Barkalow
On Thu, 15 Nov 2007, Theodore Tso wrote:

> On Wed, Nov 14, 2007 at 06:23:34PM -0500, Daniel Barkalow wrote:
> > I don't see any reason that we couldn't have a tool accessible to Ubuntu 
> > users that does a real "git bisect". Git is really good at being scripted 
> > by fancy GUIs. It should be easy enough to have a drop down with all of 
> > the Ubuntu kernel package releases, where the user selects what works and 
> > what doesn't.
> 
> It's possible users who haven't yet downloaded a git repository have
> to surmount some obstacles that might cause them to lose interest.
> First, they have to download some 190 megs of git repository, and if
> they have a slow link, that can take a while, and then they have to
> build each kernel, which can take a while.

It should be possible for it to clone only the portion that they actually 
care about based on where the known-good version is. It should also (in 
theory, anyway) be possible to put off some amount of the download until 
it's actually going to be relevant.

> A full kernel build with everything selected can take a good 30 minutes or 
> more, and that's on a fast dual-core machine with 4gigs of memory and 
> 7200rpm disk drives. On a slower, memory limited laptop, doing a single 
> kernel build can take more time than the user has patience; multiply 
> that by 7 or 8 build and test boots, and it starts to get tiresome.

None of this is going to take as long, even on a slow link and a slow 
computer, as waiting for a response to a mailing list post. It'd annoy 
users who are specifically waiting for it, but if the interface is that 
the user says "kernel package X didn't work but the current kernel does", 
and it says "I'll let you know when I've got something to test", and the 
user watches a DVD, and afterward finds a message saying there's something 
to test, and tries it, and reports how it went, and the process repeats 
until it narrows it down to a single commit after a couple of days of the 
user getting occasional responses, it's not that different from asking for 
help online.

> And then on top of that there are the issues about whether there is
> enough support for dealing with hitting kernel revisions that fail due
> to other bugs getting merged in during the -rc1 process, etc.

The tool could have a distro-provided mask of revisions that aren't worth 
testing, and possibly back-ported fixes for revisions in particular ranges.

> I agree that a tool that automated the bisection process and walked
> the user through it would be helpful, but I believe it would be
> possible for us to do better.

That would probably help for giving the user something to try right away. 
I still think that the main cost to the user is the number of times that 
the user has to stop doing stuff to reboot with a kernel to test, whether 
the test kernels are available quickly from the distro site, slowly built 
locally, or slowly as suggested by humans helping online.

-Daniel
*This .sig left intentionally blank*


Re: OT: Does Linux have any "Perfect Code"

2007-11-15 Thread Daniel Barkalow
On Thu, 15 Nov 2007, Michael Gerdau wrote:

> > This code is far from perfect; some parts are outdated, using bcopy() instead
> > of memcpy() for example. More annoying are the comments: the file is 3306
> > lines while there are only 1640 lines of code, nothing bad per se, but look at
> > some comments:
> > 
> > /*
> >  * Before we begin this operation, disable kernel preemption.
> >  */
> > kpreempt_disable();
> 
> <disclaimer>
> I'm not a kernel developer.
> </disclaimer>
> 
> That said:
> I really do like such obvious (as in: for those knowing the stuff anyway)
> comments when looking at code and probably concepts I'm not familiar with.
>
> ...
> 
> I mean, isn't the whole purpose of comments to help those not familiar
> with the code to understand its purpose and possibly the intention of
> the author (just in case the author had coded a bug) ?

That's the problem with really obvious comments. In the example above, 
that function had better disable kernel preemption with a name like that, 
and, assuming it's before the code begins the operation in sequence, we 
know when we're doing it. But the comment fails to explain why we need to 
disable kernel preemption before beginning the operation, just that we are 
doing so. Having the comment merely distracts the reader from the fact 
that the purpose of the code and the intention of the author are 
completely undocumented. And there's a real chance that this comment, or 
ones like it, push this statement and the place in the code where things 
would go wrong if preemption weren't disabled far enough apart that they 
no longer fit on the reader's screen together. Then it is not only unclear 
what the author's intention was, but harder to figure out from the code 
than it would be without comments, because each comment takes up screen 
space and fewer clues are visible at the same time.

The code itself should be written to tell the reader everything there is 
to know about what it does, and the comments in code should only tell the 
reader why it does that.
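
As an illustration of the difference, the sketch below pairs the same call 
with a comment that explains why. The per-CPU counter scenario, the stub 
functions, and the reason given are all invented for the example; the real 
reason the quoted code disables preemption is not known from this thread.

```c
/* Hypothetical stand-ins so the sketch compiles on its own. */
static int preempt_depth;
static void kpreempt_disable(void) { preempt_depth++; }
static void kpreempt_enable(void)  { preempt_depth--; }

#define NCPU 4
static unsigned long percpu_hits[NCPU];
static int current_cpu(void) { return 0; }  /* pretend we always run on CPU 0 */

unsigned long count_hit(void)
{
    unsigned long total;

    /*
     * The slot we update depends on which CPU we are running on. If we
     * were preempted and migrated between choosing the slot and updating
     * it, we would race with whoever runs on that CPU next.
     */
    kpreempt_disable();
    total = ++percpu_hits[current_cpu()];
    kpreempt_enable();

    return total;
}
```

The function names already say what happens; the comment only carries the 
part that cannot be read off the code.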

-Daniel
*This .sig left intentionally blank*


Re: [BUG] New Kernel Bugs

2007-11-14 Thread Daniel Barkalow
On Tue, 13 Nov 2007, Theodore Tso wrote:

> There are two parts to this.  One is a Ubuntu development kernel which
> we can give to large numbers of people to expand our testing pool.
> But if we don't do a better job of responding to bug reports that
> would be generated by expanded testing this won't necessarily help us.
> 
> The other is an automated set of standard pre-built bisection points so
> that testers can more easily localize a bug down to a few hundred
> commits without needing to learn how to use "git bisect" (think Ubuntu
> users).

I don't see any reason that we couldn't have a tool accessible to Ubuntu 
users that does a real "git bisect". Git is really good at being scripted 
by fancy GUIs. It should be easy enough to have a drop down with all of 
the Ubuntu kernel package releases, where the user selects what works and 
what doesn't. Then the tool clones a git repository with flags to only get 
relevant parts, and then leads a bisect run, where it's also 
configuring, building, and installing the kernels (as a different grub 
entry), and providing instructions in general. Fundamentally, "git bisect" 
is a really low-interaction process: you tell it a couple of commits, and 
then it does stuff, and then you tell it "I tested, and it worked" or "I 
tested, and it had the problem" or "Something else went wrong", and it 
asks you something new. Other than that, it just takes time (and a build 
system hook, which this tool would handle for the kernel). Eventually, it 
tells you what to report, and you do so.
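
The core of that interaction can be sketched in a few lines of C. This is 
a simplification, not how git implements it: it assumes a linear history 
numbered oldest to newest, leaves out the "something else went wrong" 
answer, and the callback type and function names are hypothetical.

```c
#include <stddef.h>

enum verdict { WORKED, HAD_PROBLEM };

/* Hypothetical callback: build, boot and test commit `index`, then report. */
typedef enum verdict (*test_fn)(size_t index, void *ctx);

/*
 * `good` is a commit known to work and `bad` (> good) one known to fail.
 * Returns the index of the first failing commit; *tests counts how many
 * times the user had to test something.
 */
size_t bisect(size_t good, size_t bad, test_fn test, void *ctx,
              unsigned *tests)
{
    *tests = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        (*tests)++;
        if (test(mid, ctx) == WORKED)
            good = mid;
        else
            bad = mid;
    }
    return bad;
}

/* Example tester: every commit from *ctx onward has the problem. */
enum verdict fails_from(size_t index, void *ctx)
{
    size_t first_bad = *(size_t *)ctx;

    return index >= first_bad ? HAD_PROBLEM : WORKED;
}
```

Each answer halves the remaining range, so a range of 200 commits needs 
at most 8 tests.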

-Daniel
*This .sig left intentionally blank*


Re: [PATCH 0/4]: Resolve MSI vs. INTX_DISABLE quirks.

2007-10-24 Thread Daniel Barkalow
On Tue, 23 Oct 2007, David Miller wrote:

> From: Daniel Barkalow <[EMAIL PROTECTED]>
> Date: Wed, 24 Oct 2007 00:58:45 -0400 (EDT)
> 
> > I'm not sure all of the pci_intx() calls in msi.c should be skipped when 
> > the quirk applies; I think some of them might be there so that the legacy 
> > interrupt won't be delivered while MSI is turned off (since the handler 
> > isn't listening for the legacy interrupts). I'd guess this would cause 
> > people to have their MSI-capable device kill their non-MSI-capable device 
> > when they restore their laptop (and the shared interrupt fires and gets 
> > stuck at just the wrong time). No idea if this is a real concern, but I'm 
> > pretty sure that not all of those calls are recent.
> 
> I don't think it's a real concern.

Okay, good. As long as someone more clueful than me has thought about it, 
because I couldn't tell off hand.

> > There's a couple of ATA drivers that look like they might be trying to 
> > work around the same bug, but it's a bit hard to tell. It might be good to 
> > have them use the quirk (or set the flag) because it's cleaner.
> 
> I noticed these cases as well, and I would hope that Jeff would help
> out here using the infrastructure my patches created.

Or coordinate with someone with the quirky hardware, yes.

-Daniel
*This .sig left intentionally blank*


Re: [PATCH 0/4]: Resolve MSI vs. INTX_DISABLE quirks.

2007-10-23 Thread Daniel Barkalow
On Tue, 23 Oct 2007, David Miller wrote:

> 
> The forthcoming patches are also available from:
> 
>   kernel.org:/pub/scm/linux/kernel/git/davem/msiquirk-2.6.git
> 
> and clean up the handling of the common quirk wherein setting
> INTX_DISABLE will mistakenly disable MSI generation for some
> devices.
> 
> For devices without that problem, we want to keep the pci_intx() calls
> in drivers/pci/msi.c because those help protect against devices
> with the opposite problem.  Such devices always generate INTX
> interrupts even when MSI is enabled, unless INTX_DISABLE is set.
> 
> Michael, please pay special attention to patch #3.  I think I
> picked the correct PCI device IDs to match for the quirk
> (5714* and 5780*) but it's possible we might need more elaborate
> checks here.  It at least worked properly for the chips in my
> Niagara system.

I'm not sure all of the pci_intx() calls in msi.c should be skipped when 
the quirk applies; I think some of them might be there so that the legacy 
interrupt won't be delivered while MSI is turned off (since the handler 
isn't listening for the legacy interrupts). I'd guess this would cause 
people to have their MSI-capable device kill their non-MSI-capable device 
when they restore their laptop (and the shared interrupt fires and gets 
stuck at just the wrong time). No idea if this is a real concern, but I'm 
pretty sure that not all of those calls are recent.

> In addition to the Tigon3 cases, I added quirk entries for the
> SB700/800 SATA chips and the IXP SB400 USB controllers.

There's a couple of ATA drivers that look like they might be trying to 
work around the same bug, but it's a bit hard to tell. It might be good to 
have them use the quirk (or set the flag) because it's cleaner.

-Daniel
*This .sig left intentionally blank*


Re: [patch] PCI: disable MSI on more ATI NorthBridges

2007-10-22 Thread Daniel Barkalow
On Mon, 22 Oct 2007, David Miller wrote:

> My suggestion is:
> 
> 1) Leave the pci_intx() twiddling code in drivers/pci/msi.c
> 
> 2) Add quirks for "INTX_DISABLE turns off MSI too", this sets
>    a flag in the pci_dev.
> 
> 3) The pci_intx() calls in drivers/pci/msi.c are skipped if this
>    flag from #2 is set.
> 
> 4) Add quirk entries for drivers/net/tg3.c chips and these SATA
>    devices we are learning about here, as well as any others we
>    are aware of right now.
> 
> 5) Remove the pci_intx() workaround code from drivers/net/tg3.c
>    and elsewhere.

Seems right to me, and pretty straightforward, except that I don't really 
understand the pm-related logic in there to know how that should work and 
whether intx will need to be enabled somewhere in addition to not 
disabling it in the msi enable code.
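
Steps 2 and 3 of the quoted plan can be modelled roughly as follows. This is a toy sketch and not kernel code: the `PciDev` class and the `intx_disable_breaks_msi` field are hypothetical stand-ins for a flag in the kernel's pci_dev, and it deliberately ignores the power-management question raised above.

```python
from dataclasses import dataclass


@dataclass
class PciDev:
    """Toy stand-in for a PCI device record (all field names are hypothetical)."""
    name: str
    intx_disable_breaks_msi: bool = False  # step 2: flag set by a device quirk
    intx_enabled: bool = True
    msi_enabled: bool = False


def enable_msi(dev: PciDev) -> None:
    """Step 3: enable MSI, and disable INTx only when the quirk flag is clear."""
    dev.msi_enabled = True
    if not dev.intx_disable_breaks_msi:
        # Models the pci_intx(dev, 0) call made from drivers/pci/msi.c.
        dev.intx_enabled = False
```

With this shape, drivers such as tg3 would not need their own workaround (step 5); the decision lives in one place and is driven by the per-device flag.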

-Daniel
*This .sig left intentionally blank*


Re: [patch] PCI: disable MSI on more ATI NorthBridges

2007-10-22 Thread Daniel Barkalow
On Mon, 22 Oct 2007, Jeff Garzik wrote:

> Daniel Barkalow wrote:
> > On Fri, 19 Oct 2007, Jeff Garzik wrote:
> > 
> > > Linas Vepstas wrote:
> > > > On Fri, Oct 19, 2007 at 09:17:23PM +0800, Shane Huang wrote:
> > > > > Since we have little experience on PCI and MSI here, we had to try to
> > > > As someone else pointed out, AMD should have *lots* of people with
> > > > pci and msi experience on the payroll.  (Folks here buy AMD-designed pci
> > > > chips ...)
> > > >
> > > > > ONLY
> > > > > comment out the pci_intx() call in drivers/ata/ahci.c
> > > > > My system can boot up too with MSI enabled!
> > > > >
> > > > > So does it mean that the root cause is our SB700 SATA controller
> > > > > has a hardware bug where setting INTX_DISABLE in the PCI COMMAND
> > > > > register masks MSI interrupts too? 
> > > > That's what it sounds like, to me.
> > > >
> > > > > And what is the software solution or workaround?
> > > > Not sure. Sounds like the device driver needs a quirk for this part.
> > >
> > > Take a look at tg3.c net driver change
> > > 2fbe43f6f631dd7ce19fb1499d6164a5bdb34568 which is a similar situation.
> > >
> > > However, it may turn out that removing the pci_intx() stuff as a general rule
> > > is easier than quirking these devices, if enough of them turn out to have this
> > > hardware bug.
> > 
> > At a first approximation, ATI/AMD devices don't send any interrupts if intx
> > is disabled, nVidia devices send legacy interrupts in addition to MSI ones
> > if intx isn't disabled, and Intel devices actually work correctly. So we
> > need at least one kind of device quirk for intx and msi. (And doing it in
> > the drivers doesn't work, since everybody is making things driven by
> > snd_hda_intel and would like msi, afaict)
> 
> Note that INTX_DISABLE is a recent addition to PCI.  Older PCI devices support
> neither MSI nor INTX-disable, so make sure such devices don't creep into your
> sample.

I have a device that supports MSI and INTX-disable, and, with MSI on (and 
delivering interrupts successfully) also sends legacy interrupts (on 
the IRQ that is no longer associated with the device) unless INTX is 
disabled. Without the intx_disable(), the kernel disables the IRQ 
entirely and breaks a random other device in my system.

It's:

00:07.0 Bridge: nVidia Corporation MCP61 Ethernet (rev a2)

I haven't tried MSI with the other devices in the system, but I expect 
that this:

00:05.0 Audio device: nVidia Corporation MCP61 High Definition Audio (rev a2)

will have the same issue, and use a multi-vendor driver.

> In general it is documented that INTX_DISABLE should apply only to INTx# so
> devices that disable MSI based on that bit are out of spec.  But unfortunately
> that is rather irrelevant, since we see these out-of-spec devices in the field
> today.

It's likewise documented (although maybe arguable in wording) that the 
device shouldn't send legacy interrupts if MSI is in use, regardless of 
INTX_DISABLE, but this also happens in the field.

I think that the current Linux behavior with respect to INTX_DISABLE is 
simply due to which hardware bug was present in the device whose driver 
first got Linux support, but one or the other or both needs a quirk, since 
there's no behavior that works with everything. And it's still impossible 
to tell which bug is more common, since MSI isn't used most of the time, 
even if the hardware supports it, so it's pretty arbitrary which way Linux 
goes in the non-quirk case.
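
The claim that no single behavior works with everything can be checked with a small truth table. This is a sketch under stated assumptions: the three labels below are invented names for the device behaviors described in this thread, and `works` encodes only what the thread reports, not any hardware documentation.

```python
# Hypothetical labels for the behaviours described in this thread.
DEVICES = (
    "masks_msi_when_intx_disabled",  # the ATI/AMD-style bug
    "sends_intx_unless_disabled",    # the nVidia-style bug
    "follows_spec",                  # works either way
)


def works(device: str, disable_intx: bool) -> bool:
    """True if MSI is delivered and no stray legacy interrupt is raised."""
    if device == "masks_msi_when_intx_disabled":
        return not disable_intx
    if device == "sends_intx_unless_disabled":
        return disable_intx
    return True


def uniform_policy_ok(disable_intx: bool) -> bool:
    """Does one fixed choice of disable_intx work for every device type?"""
    return all(works(d, disable_intx) for d in DEVICES)
```

Both `uniform_policy_ok(True)` and `uniform_policy_ok(False)` come out false, while every individual device has some setting that works, which is the argument for a per-device quirk rather than a global default.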

-Daniel
*This .sig left intentionally blank*


Re: [patch] PCI: disable MSI on more ATI NorthBridges

2007-10-22 Thread Daniel Barkalow
On Fri, 19 Oct 2007, Jeff Garzik wrote:

> Linas Vepstas wrote:
> > On Fri, Oct 19, 2007 at 09:17:23PM +0800, Shane Huang wrote:
> > > Since we have little experience on PCI and MSI here, we had to try to
> > 
> > As someone else pointed out, AMD should have *lots* of people with
> > pci and msi experience on the payroll.  (Folks here buy AMD-designed pci
> > chips ...)
> > 
> > > ONLY
> > > comment out the pci_intx() call in drivers/ata/ahci.c
> > > My system can boot up too with MSI enabled!
> > >
> > > So does it mean that the root cause is our SB700 SATA controller
> > > has a hardware bug where setting INTX_DISABLE in the PCI COMMAND
> > > register masks MSI interrupts too? 
> > 
> > That's what it sounds like, to me.
> > 
> > > And what is the software solution or workaround?
> > 
> > Not sure. Sounds like the device driver needs a quirk for this part.
> 
> 
> Take a look at tg3.c net driver change
> 2fbe43f6f631dd7ce19fb1499d6164a5bdb34568 which is a similar situation.
> 
> However, it may turn out that removing the pci_intx() stuff as a general rule
> is easier than quirking these devices, if enough of them turn out to have this
> hardware bug.

At a first approximation, ATI/AMD devices don't send any interrupts if 
intx is disabled, nVidia devices send legacy interrupts in addition to MSI 
ones if intx isn't disabled, and Intel devices actually work correctly. So 
we need at least one kind of device quirk for intx and msi. (And doing it 
in the drivers doesn't work, since everybody is making things driven by 
snd_hda_intel and would like msi, afaict)

-Daniel
*This .sig left intentionally blank*


Re: [patch] PCI: disable MSI on more ATI NorthBridges

2007-10-19 Thread Daniel Barkalow
On Thu, 18 Oct 2007, David Miller wrote:

> From: "Shane Huang" <[EMAIL PROTECTED]>
> Date: Thu, 18 Oct 2007 18:37:59 +0800
> 
> > Hi Miller:
> > 
> > Thank you for your response.
> > 
> > The reason why MSIs of these northbridges do not work is still under
> > further debug, we are NOT able to tell its hardware issue or software
> > issue at this  time. But enablement of them will lead to the OS
> > installation failure in many distributions like openSUSE, Ubuntu etc:
> > https://bugzilla.novell.com/show_bug.cgi?id=302016
> > 
> > So we have to disable them firstly before we find out the root cause,
> > maybe they are just workarounds.
> 
> This logic seems backwards, to me.  "shoot first, ask questions later"
> To me this is not how to approach this problem.
> 
> Once you turn MSI off, there is next to no incentive to fix the
> problem because users aren't running into it any longer.
> 
> The only two devices in that bug report which should be using MSI
> would be the SATA controller and the broadcom ethernet NIC.  And by
> the failed bootup logs provided by the user the problem is clearly
> with the SATA controller.

And the same SATA controller could show up behind a different northbridge. 
It would be unfortunate to hit the same device bug independently on each 
system and work around it by doing something that won't help the next 
user.

> One common problem we're finding is that some devices have a hardware
> bug where setting INTX_DISABLE in the PCI COMMAND register masks MSI
> interrupts too.
> 
> I mention this because the user in that report mentions that the
> kernel upgrade causes the failure, and one thing we started doing not
> too long ago was to set the INTX_DISABLE bit when MSI is enabled for a
> device.
> 
> So maybe this SATA controller has this problem too.  It is easy to
> test, simply comment out all of the pci_intx() function calls in
> drivers/pci/msi.c and perform a test boot with MSI enabled.

Have we gotten around to having a device quirk for this? I bet it won't be 
too long before we see a system where the SATA controller doesn't work 
with INTX disabled and the ethernet controller doesn't work with it 
enabled, since we've seen devices with each of these bugs.

-Daniel
*This .sig left intentionally blank*


Re: NDAs - ANY KNOWN RULES?

2007-06-26 Thread Daniel Barkalow
On Wed, 27 Jun 2007, hermann pitton wrote:

> Hi,
> 
> such stuff causes a lot of troubles since long.
> 
> Are there any rules, or can everybody go on as some sort of freelancer
> exclusively on such? I don't like it!

http://www.linux-foundation.org/en/NDA_program

In short, the Linux Foundation can negotiate a reasonable NDA for you to 
sign, and they may be able to show you relevant documents as a freelancer 
under a reasonable and standardized contract.

-Daniel
*This .sig left intentionally blank*


Re: 2.6.22-rc5 regression

2007-06-18 Thread Daniel Barkalow
On Mon, 18 Jun 2007, Linus Torvalds wrote:

> On Mon, 18 Jun 2007, Carlo Wood wrote:
>
> > diff --git a/scripts/package/Makefile b/scripts/package/Makefile
> > index 7c434e0..f758b75 100644
> > --- a/scripts/package/Makefile
> > +++ b/scripts/package/Makefile
> 
> but this one has actually been modified. To this:
> 
> > +# Dummy file 
> > +help:
> 
> And finally, 
> 
> > diff --git a/scripts/package/builddeb b/scripts/package/builddeb
> > deleted file mode 100644
> > index 6edb29f..000
> 
> That one also has been actually deleted. And "make distclean" doesn't do 
> that. You have something else going on.

Probably make-kpkg removing the in-tree instructions for building debian 
packages so that its own rules will be used instead or something like 
that.

-Daniel
*This .sig left intentionally blank*


Re: My kernel hangs again: Help with git please

2007-06-16 Thread Daniel Barkalow
On Sat, 16 Jun 2007, Carlo Wood wrote:

> On Sat, Jun 16, 2007 at 01:28:13AM -0400, Daniel Barkalow wrote:
> > That's not actually the right image. There's a graph of commits with a lot 
> > of splitting and joining lines. Each branch and each tag sits somewhere in 
> > this web. The difference between branches and tags is that you're expected 
> > to move branch pointers around, and tags stay mostly in place. There's no 
> > accounting of commits newer than the current spot in the web for a branch 
> > belonging to that branch, so if you move a branch back to an older tag (or 
> > other commit), the spot it's leaving is no longer "on the branch".
> 
> Okay, it took me two hours before I understood this... but here's the
> picture that I have in mind now:
> 
>            master->X(merge point)
>                   /|
>                  / |
>       ^ branch->3  X
>  Time |         |  |
>       |         2  X
>                 |  |
>                 1  X
>                 |  |
>                  \ |
>                   \|
>                    X(branch point)
>                    |

Right, except that, in your repository, "master" has ended up pointing to 
"3" also. Or, in any case, all of your local branches ("master" is no 
different from other branches, except that it's the initial default name 
for a branch) are somewhere down the web from the latest stuff from 
Linus's repository.

> Then if I define a branch pointer to point to '3', then the branch is
> 3--2--1. If next I move the branch pointer to point to '2', node '3' is
> no longer on the branch because now the branch exists of 2--1, and
> HEAD moves to '2' as well.

Right, except that "HEAD" is really just a symlink, not a pointer 
directly to the history; the branch it points to is what you've got in 
your working directory currently. So in that case, HEAD moves to '2' 
simply because it's indirection for branch, which has moved to 2.

Side note: in more recent versions of git, there's the feature that you 
seem to be trying to use. It's called "detached HEAD", and means that you 
can have HEAD be some arbitrary commit, not a link to a branch. You'd do 
this with "git checkout <some revision>", and then your working directory 
would match that revision, and "git branch" would have no *, and you 
wouldn't have a current branch at all, and you wouldn't be moving branch 
pointers around. But I don't think you're using a version of git that 
supports this, and you need to get your branch pointers back to the 
present anyway.
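
A minimal sketch of both points, assuming a reasonably current git. The throwaway repository, its three empty commits and the "demo" identity are all made up for illustration. It shows that HEAD normally names a branch, and that checking out a commit directly detaches HEAD without moving any branch pointer.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make the initial branch name explicit
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
for n in 1 2 3; do git commit -q --allow-empty -m "commit $n"; done

# HEAD is an indirection: it names a branch, not a commit.
head_ref=$(git symbolic-ref HEAD)
echo "HEAD -> $head_ref"

# Check out the middle commit by id: HEAD is now detached.
tip=$(git rev-parse master)
git checkout -q "$(git rev-parse master~1)"
if git symbolic-ref -q HEAD >/dev/null; then detached=no; else detached=yes; fi
echo "detached: $detached"

# The branch pointer itself did not move.
echo "master still at: $(git rev-parse master)"
```

"git checkout master" afterwards re-attaches HEAD to the branch.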

> This seems to make most sense in the light of your last sentence.
> I don't understand how I'd have moved branch pointers however. I thought
> I would just change my working copy along the branch by specifying
> tag nodes. Ie, I have a branch '3'(--2--1) and I say: give me '2',
> then give me '1' - and when I do: git reset --hard HEAD - it moves
> me to 3 because the branch was never touched.

git reset --hard <revision> moves the current branch to that revision, as 
well as moving the working directory (and the index, which doesn't matter 
for your case). If you were thinking that it only changed the working 
directory, you probably moved some branches without realizing it.
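
To see the branch pointer move, here is a small sketch in a scratch repository. The tag name "v2" and the three empty commits are hypothetical stand-ins for the kernel tags in the thread.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
for n in 1 2 3; do git commit -q --allow-empty -m "commit $n"; done
git tag v2 master~1              # hypothetical tag on commit 2

before=$(git rev-parse master)
git reset -q --hard v2           # moves master itself, not just the files
after=$(git rev-parse master)
echo "master before: $before"
echo "master after:  $after"

# Commit 3 is no longer "on the branch". Put the pointer back by id.
git reset -q --hard "$before"
echo "master restored: $(git rev-parse master)"
```

The sketch only gets back to commit 3 because it saved the id first; that is the part that is easy to miss when reset is used as if it were a read-only "show me this tag" command.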

> > So master is a point in the web, and bisect jumps around through the web 
> > according to some special rules (due to having git-bisect use the good/bad 
> > marks to determine which commit to try next, and jump there). git-bisect 
> > doesn't really even care that you started on any single branch. It's just 
> > operating on the web, and the branch you start on is treated as an 
> > arbitrary commit that has the problem.
> 
> Ok - so it does something magical that I don't have to understand :P
> The only thing that matters is that I choose the begin and end point,
> the first two points, correctly: where one is bad and the other is good.
> It seems that git bisect can't deal with swapping good/bad (the 'bad'
> one always has to be the newest revision), so I had decided to call
> 'kernel hangs' good and 'kernel works' bad. The problem then is that
> I can't find any starting point anymore that is 'bad'.

Right; since the normal goal is to find regressions, not fixes, "bad" is 
the "after it changed" case, and "good" is the "before it changed". It is 
trying to find a commit which all of the "bad" commits are descended from, 
and which is descended from only "good" commits.

> > You may find "gitk --all" informative.
> 
> The dates on the right side seem to make no sense. Even in a part
> where there are no branches/merges at all, the date goes in both
> direction (sometimes older, sometimes newer). Roughly it seems that
> the newest date is at the top - but I see a lot of times things like:
> 
> |||O||  Description   Author1  2007-05-14 03:43:20
> |||O||  Description   Author2  2007-05-15 15:10:34
> |||O||  Description   Author3  2007-05-13 17:50:27
> 
> Thus, there seems to be no time related ordering :/

Those dates are when the patches which became the commits were written. 
The ordering is the lineage of the revisions in the repository

Re: My kernel hangs again: Help with git please

2007-06-15 Thread Daniel Barkalow
On Sat, 16 Jun 2007, Carlo Wood wrote:

> I don't understand - any branch that I am on has many tags. I can use
> 'git reset --hard sometag' to change the source tree to that tag (which
> works if I look at the version in the Makefile and pick tags that are
> far apart enough).

That's not actually the right image. There's a graph of commits with a lot 
of splitting and joining lines. Each branch and each tag sits somewhere in 
this web. The difference between branches and tags is that you're expected 
to move branch pointers around, and tags stay mostly in place. There's no 
accounting of commits newer than the current spot in the web for a branch 
belonging to that branch, so if you move a branch back to an older tag (or 
other commit), the spot it's leaving is no longer "on the branch".

So master is a point in the web, and bisect jumps around through the web 
according to some special rules (due to having git-bisect use the good/bad 
marks to determine which commit to try next, and jump there). git-bisect 
doesn't really even care that you started on any single branch. It's just 
operating on the web, and the branch you start on is treated as an 
arbitrary commit that has the problem.

You may find "gitk --all" informative.

> Anyway, I tried this:
> 
> $ git checkout master
> $ git branch
>   bisect
> * master
>   origin
> $ BRANCH=$(git branch | grep "^\*" | sed -e "s/\* //")
> $ echo $BRANCH
> master
> $ git rev-list --max-count=1 $BRANCH
> 5ecd3100e695228ac5e0ce0e325e252c0f11806f
> 
> Is it correct that this last command gives me the 'git id' (if that
> is the correct name for the hash) of the revision that my local
> working copy is at?

Yes.

> Can you tell me what is the latest git id that you see?

I'm seeing de7f928ca460005086a8296be07c217aac4b625d, but I just got the 
latest code, more recently than you probably did.

> Because, if I compile 5ecd3100e695228ac5e0ce0e325e252c0f11806f it
> still hangs at boot :(

It looks like you moved master back to 2.6.22-rc4 (with git reset --hard 
v2.6.22-rc4) at some point.

What you should do now is:

$ git checkout master
$ git merge origin

Which should move master forward through the web to "origin", which is 
(unless you've moved it) what you got from upstream.

Alternatively:

$ git checkout master
$ git pull

Should fetch the latest stuff and advance master to the fetched version.
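
A sketch of the recovery, assuming a current git, where a clone keeps the upstream branch under the remote-tracking name "origin/master" rather than in a local branch called "origin" as in the listing above. The "upstream" and "clone" directories are scratch repositories invented for the demonstration.

```shell
set -e
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for Linus's tree with three commits.
git init -q upstream
(
    cd upstream
    git symbolic-ref HEAD refs/heads/master
    for n in 1 2 3; do git commit -q --allow-empty -m "commit $n"; done
)
git clone -q upstream clone
cd clone
upstream_tip=$(git rev-parse origin/master)

# Reproduce the accident: move master back two commits.
git reset -q --hard master~2
stale=$(git rev-parse master)

# Recover: fast-forward master to what was fetched from upstream.
git checkout -q master
git merge -q --ff-only origin/master
recovered=$(git rev-parse master)
echo "stale:     $stale"
echo "recovered: $recovered"
```

The --ff-only flag is an extra safety belt not mentioned above: it makes the merge refuse to do anything other than slide the branch pointer forward.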

-Daniel
*This .sig left intentionally blank*


Re: My kernel hangs again: Help with git please

2007-06-15 Thread Daniel Barkalow
On Sat, 16 Jun 2007, Carlo Wood wrote:

> Therefore I have the following questions:
> 
> 1) What git command will ASSURE that I get the LATEST
>kernel tree checked out?
>
> I tried this:
> 
> hikaru:/usr/src/kernel/git/linux-2.6>git branch -l
> * bisect
>   master
>   origin
> hikaru:/usr/src/kernel/git/linux-2.6>git reset --hard HEAD

HEAD doesn't mean what you think it means. It's the latest revision on the 
branch with the *. What you want is:

$ git checkout master

This will move the * to "master", which shouldn't have been affected by 
any of this, and move your working directory to this point as well. At 
that point, you should be able to build a working kernel.

What "git reset --hard HEAD" does is discard any differences to tracked 
files between your working directory and the revision you're on. It's 
relevant if you want to discard local changes, not otherwise.
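
For example, in a scratch repository (the file names below are made up), the reset throws away an edit to a tracked file but leaves both the branch and an untracked file alone:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo original > tracked.txt
git add tracked.txt
git commit -q -m "add tracked.txt"
tip=$(git rev-parse HEAD)

echo edited > tracked.txt         # local change to a tracked file
echo scratch > untracked.txt      # a file git does not know about

git reset -q --hard HEAD          # same revision, so the branch stays put
content=$(cat tracked.txt)
echo "tracked.txt now contains: $content"
echo "HEAD is still: $(git rev-parse HEAD)"
```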

-Daniel
*This .sig left intentionally blank*


Re: Linux 2.6.21

2007-04-27 Thread Daniel Barkalow
On Thu, 26 Apr 2007, Adrian Bunk wrote:

> Linus said 2.6.20 was a stable kernel. My impression was that at least 
> two of the regressions from my 2.6.20 regressions list should have been 
> fixed before 2.6.20.
> 
> They have both been fixed through -stable, but I also remember a quite 
> experienced kernel maintainer running into one of them after 2.6.20 was 
> released and spending half a day tracking it down - and my answer was
> "known unfixed regression, first reported more than a month ago".

I think there is an issue with two different things being conflated, and 
this causes real stability problems. 2.6.x is both the first kernel in a 
series that is judged to be "stable" and the kernel that is the split 
between 2.6.x.y and 2.6.x+1. This is a fundamental problem, because it 
means that 2.6.x must have all of the problems that are being debugged by 
the people who understand the areas they are in, because 2.6.x+1 has to 
start so that people who are clueless about all of the areas with 
remaining bugs don't spend their time putting more regressions into their 
submissions for 2.6.x+1.

It is also a problem because it is easily possible for a problem to exist 
in 2.6.x-rcN which can only be correctly fixed by doing intrusive things, 
but can be papered over in an obviously-safe way. (E.g., the issue with 
legacy interrupt delivery when MSI is enabled). The intrusive patch could 
easily break a bunch of unrelated stuff, so that's no good for 2.6.x-rcN, 
but papering over bugs is no good for mainline. These bugs have to be 
fixed after the split, which means that the version at the fork must 
contain the bug.

Furthermore, everybody (people reporting bugs, people fixing them, and 
people merging fixes) seems to doze off late in -rc kernels. Having an 
announcement of something with a qualitatively different version wakes 
them up.

I say have a target of no known regressions in 2.6.21.1, with 2.6.21 being 
pretty good, and don't count too much on the stability of 3-number kernel 
versions.

> And a serious delay of the next regression-merge window due to unfixed 
> regressions might even have the positive side effect of more developers 
> becoming interested in fixing the current regressions for getting their 
> shiny new regressions^Wfeatures faster into Linus' tree.

I think the "stick" can't be delaying the window, because that's too 
broad. I think it has to be making people who are needed for fixing stuff 
miss the window. People aren't going to go learn a new area of the kernel 
to resolve regressions in it, but they're more likely to keep their own 
area clean so that they get to merge every 2 months instead of every 4.

> These are just my personal opinions, and other people consider the 
> resulting 2.6.20 and 2.6.21 kernels OK.

I don't think 2.6.x can be OK, by policy. I think 2.6.20.y got to an OK 
state eventually, which is to say that there's no need now to use a 
2.6.19.y kernel. I think that 2.6.21 isn't OK yet, but I think looking for 
an OK 2.6.21-derived kernel is premature still. Ignoring the version 
scheme entirely, I think the success condition should be that the "latest 
stable version of the Linux kernel" link on www.kernel.org is 
always strictly better than all previous links in that spot, and new 
features get there eventually (ideally, within 4 months of hitting 
mainline). I think this is actually possible, although it would require 
changing the policy for this link. And I don't think it requires a change 
in what goes into Linus's git repository when.

Furthermore, I think we're a lot closer to an OK kernel derived from 
Linus's Apr 25 version than we would be if "2.6.21" had not been released 
at that point. It sounds like more items were resolved in the past few 
days than in the preceding week.

Incidentally, will you continue to track 2.6.21 regressions against 
2.6.20? You said there was at least one that you haven't sent out, and 
there's been movement on several others since your last report.

-Daniel
*This .sig left intentionally blank*


Re: Linux 2.6.21

2007-04-26 Thread Daniel Barkalow

On Thu, 26 Apr 2007, Adrian Bunk wrote:

> Number of different known regressions compared to 2.6.20 at the time
> of the 2.6.21 release:
> 14

I count 13. (v2) had 15 items, of which 2 were subsequently fixed or found 
to be inapplicable.

> Number of different known regressions compared to 2.6.20 at the time
> of the 2.6.21 release with patches available at the time of the 2.6.21 
> release [1]:
> 3

The -stable team can presumably take care of these in 2.6.21.1, right? 
That leaves 10 that need developer attention.

 John Stultz seems to be taking care of 3 of them.

 Oliver Neukum has 1.

 2 are particular drivers (ali_pata and rtl8139, according to the 
 reports).

 2 seem to be ACPI-related; at least one has a candidate patch now.

 1 seems to be an ALSA problem.

 1 is STD and being debugged.

It looks like all of the known regressions are being worked on, and 
getting fixes in for them is -stable material at this point. Furthermore, 
it doesn't look to me like anyone who is needed for dealing with these 
regressions is trying to get stuff into the 2.6.22 merge window.

I think it's clear that this is the right point for Linus to start the 
2.6.22 cycle and leave the rest of the 2.6.21 work to the -stable team, 
who are the experts at taking care of this sort of stuff. Furthermore, it 
seems like -rc testers at this point have found everything in 2.6.21-rc 
they're going to, so, again, it's time for new regressions. Personally, I'd 
vote for having Linus leave off at 2.6.X-final, and have 2.6.X be the 
first -stable release of the series, where the remaining known regressions 
get fixed, but that's an issue of nomenclature, not development process.

I think you've allowed for a well-tested 2.6.21, and a good chance of a 
2.6.21.1 or .2 with no known regressions against 2.6.20, which seems to me 
like you succeeded as far as everything except making Linus a release 
engineer.

-Daniel
*This .sig left intentionally blank*


Re: [RFC] New driver information

2007-02-16 Thread Daniel Barkalow
On Fri, 16 Feb 2007, Heikki Orsila wrote:

> I just read
> 
>   http://kerneltrap.org/node/7729
> 
> and it occurred to me that it would be informative to have a new device 
> driver macro. The motivation for the new macro would be 4 issues:
> 
>   * Is it possible to get specifications for the device?
>   * If yes, under what terms? (nda, public)
>   * Where to get public specs?
>   * How many closed and open drivers in the Linux source tree?

This doesn't make any sense as a driver macro, because it's per device, 
not per driver. E.g., the sdhci driver drives a number of devices, 
including both well-documented devices and devices whose only 
documentation is that the PCI ID matches (and they work with only a few 
quirks).

On the other hand, a kconfig-readable table of PCI, USB, etc IDs with this 
information isn't a bad idea, especially if the drivers actually depend on 
it (so that it has to be kept up to date, at least as far as the 
device/driver mapping).


Re: Is this bug too obvious?

2007-02-13 Thread Daniel Barkalow
On Tue, 13 Feb 2007, Chuck Ebbert wrote:

> drivers/usb/net/usbnet.c:
> 
> int
> usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
> {
> struct usbnet   *dev;
> struct net_device   *net;
> struct usb_host_interface   *interface;
> struct driver_info  *info;
> struct usb_device   *xdev;
> int status;
> 
>   ...
> 
> net = alloc_etherdev(sizeof(*dev));
> 
>   *net ???

No, alloc_etherdev takes the size of the private data, which, in this 
case, is *dev.
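
For anyone puzzled by the layout, here is a small userspace model of the idea. It is not the kernel's code: the sketch_ names and struct fields are made up, and the alignment constant is an assumption chosen for illustration. It only shows why the size passed in is that of the driver's private struct: the allocator adds room for the generic device itself.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct net_device; fields are illustrative. */
struct sketch_net_device {
    char name[16];
    size_t priv_size;
};

/* Stand-in for a driver's private state, like struct usbnet in the thread. */
struct sketch_usbnet {
    int msg_enable;
    struct sketch_net_device *net;
};

/* Round the header size up so the private area after it stays aligned. */
#define SKETCH_ALIGN 32
#define SKETCH_HDR_SIZE \
    ((sizeof(struct sketch_net_device) + SKETCH_ALIGN - 1) & \
     ~(size_t)(SKETCH_ALIGN - 1))

/*
 * Models alloc_etherdev(sizeof_priv): a single zeroed allocation holding
 * the generic device followed by sizeof_priv bytes of private data.
 */
static struct sketch_net_device *sketch_alloc_etherdev(size_t sizeof_priv)
{
    struct sketch_net_device *net = calloc(1, SKETCH_HDR_SIZE + sizeof_priv);

    if (net)
        net->priv_size = sizeof_priv;
    return net;
}

/* Models netdev_priv(): the private area starts right after the header. */
static void *sketch_netdev_priv(struct sketch_net_device *net)
{
    return (char *)net + SKETCH_HDR_SIZE;
}
```

So a probe routine would call sketch_alloc_etherdev(sizeof(*dev)) and then set dev = sketch_netdev_priv(net), which matches the shape of the usbnet code quoted above.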

-Daniel
*This .sig left intentionally blank*


Re: [PATCH] Re: NAK new drivers without proper power management?

2007-02-11 Thread Daniel Barkalow
On Sun, 11 Feb 2007, Rafael J. Wysocki wrote:

> The problem is it was made implicit long ago.  The design is "optimistic", so
> to speak, and I think we have the following choices:
> 
> 1) Change the design to make the kernel refuse to suspend if there are any
> drivers not explicitly flagged as "suspend/resume-safe".  [This looks like a
> lot of work to me, but it is generally doable provided that someone has enough
> time to do it.  Unfortunately it has to be done in one shot for all of the
> known good drivers to avoid user-observable regressions.]

The kernel wouldn't necessarily have to refuse to suspend. It could just 
warn (and list the drivers that aren't marked), or could require some 
extra insistence from the user. It would be good to have it log a message 
saying something like: "If you can read this, report that ne2000 seems to 
be safe for suspend/resume". Having drivers explicitly marked as to 
whether they are safe is a good kernel feature; what to do if they're not 
is policy.
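
Roughly, the split between mechanism and policy could look like the sketch below. All of the names are hypothetical (there is no such flag or policy enum in the kernel as far as this thread establishes); the point is only that the per-driver marking is one thing and the decision about unmarked drivers is a separate knob.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-driver marking. */
struct sketch_driver {
    const char *name;
    bool marked_safe;   /* explicitly flagged suspend/resume-safe */
};

/* Hypothetical policies for what to do about unmarked drivers. */
enum suspend_policy {
    POLICY_REFUSE,   /* never suspend with unmarked drivers loaded */
    POLICY_WARN,     /* list them, then suspend anyway */
    POLICY_INSIST,   /* suspend only if the user explicitly insists */
};

/*
 * Returns true if suspend may go ahead. Each unmarked driver is listed
 * on `log` (if non-NULL) whatever the policy is.
 */
static bool suspend_allowed(const struct sketch_driver *drv, size_t n,
                            enum suspend_policy policy, bool user_insists,
                            FILE *log)
{
    size_t unmarked = 0;

    for (size_t i = 0; i < n; i++) {
        if (drv[i].marked_safe)
            continue;
        unmarked++;
        if (log)
            fprintf(log, "suspend: %s is not marked suspend/resume-safe\n",
                    drv[i].name);
    }

    if (unmarked == 0 || policy == POLICY_WARN)
        return true;
    if (policy == POLICY_INSIST)
        return user_insists;
    return false;
}
```

The marking and the listing stay the same under every policy; only the last few lines change, which is why the policy choice does not have to be settled before the flags exist.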

> 2) Require the authors of new drivers to _either_ ensure that their drivers
> will be suspend/resume-safe (and I mean both STR and STD here), _or_ explicitly
> flag the drivers as "suspend/resume-unsafe", for example by implementing
> .suspend() routines returning -ENOSYS.  [The existing drivers can be modified
> to follow this convention gradually.]

I don't see any reason not to do (2) regardless of (1). That was (my idea 
of) the statement that started this thread: new drivers need to not mess 
up on suspend/resume, as a matter of suitability for inclusion. Of course, 
we need some way for drivers to indicate that they work fine with the 
PCI-layer defaults. And it should probably be more machine-readable than the 
author telling reviewers that it works.

> - Problem what to do with drivers that work for some people and don't work
> for the others (ie. if we don't flag them as known good, we will break the
> setups in which they work)

I think the only interesting case here is when a device resumes fine with 
no driver support if the BIOS manages to deal effectively with it, but the 
BIOS generally doesn't. Otherwise, I think it's only going to work at 
all if the author put in the effort to make it work (so it should be 
"known good"), but there may be bugs (firmware, BIOS, driver, etc). But 
that's true of any functionality.

-Daniel
*This .sig left intentionally blank*


Re: [PATCH] Re: NAK new drivers without proper power management?

2007-02-10 Thread Daniel Barkalow
On Sat, 10 Feb 2007, Rafael J. Wysocki wrote:

> On Saturday, 10 February 2007 11:02, Nigel Cunningham wrote:
>
> > Well, the original desire was to stop new drivers getting in without
> > proper power management.
> 
> I know, but I agree with the argument that having a driver without the
> suspend/resume support is better than not having the driver at all.

How about if "proper power management" is defined to include the driver 
explicitly preventing suspend? It seems to me like the current problem is 
that driver writers don't think about power management at all, and the 
result is that, after suspend/resume, the system doesn't come back. It 
would be better if driver writers had to think about power management just 
enough to realize that it's not going to work, and make this information 
available to the system. At that point, it's relatively easy for the 
system to do something useful about it.

> Also, I think there are quite some drivers already in the tree that don't
> support suspend/resume explicitly and honestly we should start from adding the
> suspend/resume routines to these drivers _before_ we ban new drivers like that.

It'd be relatively quick to modify all the current drivers that don't 
explicitly support suspend/resume to explicitly not support it. (Or to 
explicitly support it trivially; /dev/null obviously doesn't need 
anything.)
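
As a sketch of how cheap the explicit marking could be, assume a driver struct with optional suspend/resume hooks. The struct, the helper names, and the three-way status are all made up for illustration; -ENOSYS is the error code suggested elsewhere in this thread, not an established convention.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical driver ops; NULL hooks mean nobody thought about PM. */
struct sketch_pm_driver {
    const char *name;
    int (*suspend)(void);
    int (*resume)(void);
};

/* Shared stub for drivers that explicitly do not support suspend. */
static int pm_unsupported(void)
{
    return -ENOSYS;
}

/* Shared stub for drivers with no state to save (the /dev/null case). */
static int pm_trivial(void)
{
    return 0;
}

enum pm_status { PM_UNKNOWN, PM_REFUSES, PM_OK };

/* Classify what the system learns when it asks a driver to suspend. */
static enum pm_status try_suspend(const struct sketch_pm_driver *d)
{
    if (!d->suspend)
        return PM_UNKNOWN;          /* never addressed by the author */
    return d->suspend() == 0 ? PM_OK : PM_REFUSES;
}
```

Converting an existing driver would then be one line per hook: point it at pm_unsupported or pm_trivial. Either way the system gets an answer instead of PM_UNKNOWN.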

-Daniel
*This .sig left intentionally blank*


Re: forcedeth problems on 2.6.20-rc6-mm3

2007-02-04 Thread Daniel Barkalow
On Sun, 4 Feb 2007, Robert Hancock wrote:

> Something's busted with forcedeth in 2.6.20-rc6-mm3 for me relative to
> 2.6.20-rc6. There's no errors in dmesg, but it seems no packets ever get
> received and so the machine can't get an IP address. I tried reverting all the
> -mm changes to drivers/net/forcedeth.c, which didn't help. The network
> controller shares an IRQ with the USB OHCI controller which is receiving
> interrupts, so it doesn't seem like an interrupt routing problem, though I
> suppose something weird could be happening there.

IIRC, forcedeth tries to use MSI by default. Perhaps the hardware is using 
it, but the kernel thinks enabling it didn't work? I think there's a 
module option for forcedeth to disable MSI, which might be worth a try to 
see if it has any effect.

-Daniel
*This .sig left intentionally blank*


Re: 2.6.18-stable release plans?

2007-01-23 Thread Daniel Barkalow
On Tue, 23 Jan 2007, Jesper Juhl wrote:

> Now that 2.6.19 is out, most likely not.  -stable releases are made
> for the latest stable 2.6.x kernel, once 2.6.x+1 is out that's the one
> -stable patches are made for (2.6.16 is an exception)..

There's generally a bit of overlap. 2.6.17.14 was about the same time as 
2.6.18.1, and 2.6.18.6 was after 2.6.19.1. But 2.6.18.x must be over now, 
because the -stable team didn't release a 2.6.18.7 to match 2.6.19.2, and 
all of 2.6.x except for 2.6.19.2 has that weird file corruption bug 
(although rarely triggered).

-Daniel
*This .sig left intentionally blank*


Re: [PATCH] Unbreak MSI on ATI devices

2007-01-05 Thread Daniel Barkalow
On Fri, 5 Jan 2007, Petr Vandrovec wrote:

> Hi,
>   unfortunately it is not everything :-(
> 
> I cannot get MSI to work on IDE interface under any circumstances - in legacy
> mode it always uses IRQ14/15 regardless of whether MSI is enabled or not
> (that's probably correct), but in native mode as soon as I enable MSI it
> either does not deliver interrupts at all (definitely not through IRQ14/15,
> and, if I got routing right, also not through its INTA#), or it delivers them
> somewhere else than where programmed.  As my boot device is connected to this
> adapter, and it is a notebook, it is not easy to debug what's really going on
> :-(

Are you doing this with INTx left on or turned off? Have you determined 
whether turning off INTx does anything useful on these devices when you're 
not using MSI? (There are only a few places in the kernel which disable 
INTx, mostly associated with enabling MSI.)

It might be easier to test if you boot off a USB storage device of some 
sort.

-Daniel
*This .sig left intentionally blank*


Re: [PATCH] Unbreak MSI on ATI devices

2007-01-05 Thread Daniel Barkalow
On Thu, 4 Jan 2007, Roland Dreier wrote:

>  > So my question is - what is real reason for disabling INTX when in MSI mode?
>  > According to PCI spec it should not be needed, and it hurts at least chips
>  > listed below:
>  > 
>  > 00:13.0 0c03: 1002:4374 USB Controller: ATI Technologies Inc IXP SB400 USB Host Controller
>  > 00:13.1 0c03: 1002:4375 USB Controller: ATI Technologies Inc IXP SB400 USB Host Controller
>  > 00:13.2 0c03: 1002:4373 USB Controller: ATI Technologies Inc IXP SB400 USB2 Host Controller
> 
> heh... I'm not gloating or anything... but I am glad that some ASIC
> designer was careless enough to prove me right when I said going
> beyond what the PCI spec requires is dangerous.

No more dangerous than expecting that following the PCI spec exactly will 
be sufficient; at least some nVidia devices misbehave if you don't disable 
INTx when using MSI, while at least some ATI devices misbehave if you do 
disable INTx. The only *safe* thing is to ignore the PCI spec and match 
the behavior of Windows. In this case, that just means not using MSI yet.

Of course, this should be relatively easy to handle with quirks, 
especially if it's predictable which hardware bug you get from the vendor 
id.
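
A minimal sketch of the vendor-keyed version, assuming (and this is a big assumption) that the bug really is predictable from the vendor ID alone. The enum, table, and function names are hypothetical; 0x1002 is the ATI vendor ID from the lspci output quoted above, and 0x10de is nVidia's.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_ATI    0x1002
#define VENDOR_NVIDIA 0x10de

/* Hypothetical quirk classes for INTx handling when MSI is enabled. */
enum intx_quirk {
    INTX_DEFAULT,            /* do whatever the core normally does */
    INTX_DISABLE_WITH_MSI,   /* device misbehaves unless INTx is disabled */
    INTX_NEVER_DISABLE,      /* device misbehaves if INTx is disabled */
};

struct vendor_quirk {
    uint16_t vendor;
    enum intx_quirk quirk;
};

/* Assumed mapping, taken only from the reports in this thread. */
static const struct vendor_quirk intx_quirks[] = {
    { VENDOR_NVIDIA, INTX_DISABLE_WITH_MSI },
    { VENDOR_ATI,    INTX_NEVER_DISABLE },
};

static enum intx_quirk intx_quirk_for(uint16_t vendor)
{
    for (size_t i = 0; i < sizeof(intx_quirks) / sizeof(intx_quirks[0]); i++)
        if (intx_quirks[i].vendor == vendor)
            return intx_quirks[i].quirk;
    return INTX_DEFAULT;
}

/*
 * Decide whether to set the INTx disable bit while enabling MSI.
 * `core_default` is what the PCI core would do with no quirk at all.
 */
static bool disable_intx_for_msi(uint16_t vendor, bool core_default)
{
    switch (intx_quirk_for(vendor)) {
    case INTX_DISABLE_WITH_MSI:
        return true;
    case INTX_NEVER_DISABLE:
        return false;
    default:
        return core_default;
    }
}
```

If the breakage turns out to vary by device rather than by vendor, the table key would have to become the full vendor/device pair, but the lookup stays the same shape.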

-Daniel
*This .sig left intentionally blank*


Re: 2.6.20-rc2: known unfixed regressions

2006-12-29 Thread Daniel Barkalow
On Fri, 29 Dec 2006, Adrian Bunk wrote:

> On Fri, Dec 29, 2006 at 01:14:13PM -0500, Daniel Barkalow wrote:
> 
> > There's also http://lkml.org/lkml/2006/12/21/47; the included patch breaks 
> > my nVidia devices and probably all PCIX devices, so it's not right, but 
> > something has to be done to fix ATI. My guess is a quirk to say that 
> > pci_intx doesn't work on certain devices and should just be skipped, but 
> > I'm not sure if it's just in combination with MSI or not.
> 
> This:
> - does not seem to be a regression and
> - missing MSI support is not such a big problem.
> 
> Considering how many problems patches in this area tend to cause on 
> different hardware, I'm even inclined to say that such patches should 
> only be added during the 2 weeks merge window before -rc1.

(I was only talking about the first issue/patch as being a regression, 
obviously, and forgot that there was more to the email I cited.)

Ah, okay. I somehow missed that none of the devices that were reported 
to break with the MSI change in mainline support MSI in mainline. 
Actually, I wouldn't be surprised if this issue applied to audio on ATI 
SB450 and later, which (I think) use the hda_intel driver, which supports 
MSI (although I guess it's still defaulting to disabled). If this is true, 
it would be a regression since 2.6.19.

The addition of a quirk to not use pci_intx with MSI on ATI PCI devices 
should be safe (until 2.6.20-rc1, this was the usual kernel behavior), but 
is clearly not critical if mainline doesn't use MSI with any such devices 
anyway.

-Daniel
*This .sig left intentionally blank*


Re: 2.6.20-rc2: known unfixed regressions

2006-12-29 Thread Daniel Barkalow
There's also http://lkml.org/lkml/2006/12/21/47; the included patch breaks 
my nVidia devices and probably all PCIX devices, so it's not right, but 
something has to be done to fix ATI. My guess is a quirk to say that 
pci_intx doesn't work on certain devices and should just be skipped, but 
I'm not sure if it's just in combination with MSI or not.

-Daniel
*This .sig left intentionally blank*


Re: [PATCH] Unbreak MSI on ATI devices

2006-12-24 Thread Daniel Barkalow
On Thu, 21 Dec 2006, Petr Vandrovec wrote:

> So my question is - what is real reason for disabling INTX when in MSI mode?
> According to PCI spec it should not be needed.

The PCI spec is at least not clear enough on the matter to keep nVidia 
from thinking that it's the OS's responsibility to make legacy interrupts 
not happen, by disabling INTX.

> None of devices in the box assert INTX while in MSI even if INTX is enabled.

I've got a forcedeth-driven ethernet card that does, and people have 
reported that nVidia "Intel HDA" sound does as well.

> So I'd like to see first patch below accepted.  If there are some 
> devices which require INTX disabling, then apparently decision whether 
> to disable it or no has to be moved to device drivers, or some 
> blacklist/whitelist must be created...

PCI Express (IIRC) had the pci_intx() calls already, so it's probably 
actually required by the spec (or at least common implementations) there. 

I'd guess that it's more common for hardware to be unhappy with intx 
enabled than to be unhappy with intx disabled, since the hardware is 
supposed to not send legacy interrupts.

> I'm not sure about second one - I have it in my tree for months, but I run 
> that kernel only on hardware mentioned above, so it is probably too dangerous 
> until pci_enable_msi() gets answer whether MSI works or no always right.

I think it'd be better to add a module parameter, like in the later 
drivers in your patch. Figuring out how to get MSI working whenever it's 
available isn't going to move forward unless there's an easy way to test 
it, especially since (according to rumor) Windows doesn't use it at all.

> /proc/interrupts after patch.  Before patch *hci_hcd:usb* were at zero, 
> IRQ21 was stuck with IRQ count at 1, and HCD complained about 
> "Unlink after no-IRQ?".

Maybe the intx disable is just totally broken for your device? It 
certainly shouldn't cause the delivery of *more* legacy interrupts, and if 
it does with MSI enabled, I'd be surprised if it didn't without MSI. My 
guess is that that device should get a quirk to just leave the INTx 
disable bit alone (such that pci_intx doesn't do anything, regardless of 
context).
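
In other words, something with the shape of the sketch below. This is a userspace model, not the kernel's pci_intx(): the struct and the broken_intx flag are hypothetical, and config space is reduced to one 16-bit field. The only hard fact used is that the INTx disable bit is bit 10 (0x400) of the PCI command register.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x400   /* bit 10 of the command register */

/* Model of a PCI device; only the pieces this sketch needs. */
struct sketch_pci_dev {
    uint16_t command;        /* stands in for the config-space register */
    bool broken_intx;        /* hypothetical quirk: never touch the bit */
    unsigned int cfg_writes; /* counts modelled config-space writes */
};

/*
 * Enable (clear the disable bit) or disable (set it) legacy INTx.
 * With the quirk set this is a no-op in every context, MSI or not.
 */
static void sketch_pci_intx(struct sketch_pci_dev *pdev, bool enable)
{
    uint16_t new_cmd;

    if (pdev->broken_intx)
        return;

    if (enable)
        new_cmd = (uint16_t)(pdev->command & ~PCI_COMMAND_INTX_DISABLE);
    else
        new_cmd = (uint16_t)(pdev->command | PCI_COMMAND_INTX_DISABLE);

    /* Only write when the value actually changes. */
    if (new_cmd != pdev->command) {
        pdev->command = new_cmd;
        pdev->cfg_writes++;
    }
}
```

Putting the check inside the helper means the MSI code and the drivers don't each need to know about the quirk; every caller gets the "leave it alone" behavior for free.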

> diff -uprdN linux/sound/pci/atiixp.c linux/sound/pci/atiixp.c
> --- linux/sound/pci/atiixp.c  2006-12-16 13:35:47.0 -0800
> +++ linux/sound/pci/atiixp.c  2006-12-16 13:57:09.0 -0800
> @@ -1442,6 +1446,11 @@ static int snd_atiixp_suspend(struct pci
>   snd_atiixp_aclink_down(chip);
>   snd_atiixp_chip_stop(chip);
>  
> + if (chip->have_msi) {
> + pci_disable_msi(pci);
> + } else {
> + pci_intx(pci, 0);
> + }

This doesn't look right, at least for !chip->have_msi. Or is disabling 
intx desirable here for non-MSI? I'd guess that devices that freak out if 
you fiddle with intx are likely to be old, and therefore likely to not 
support MSI.

> @@ -1532,6 +1546,11 @@ static int snd_atiixp_free(struct atiixp
>   if (chip->remap_addr)
>   iounmap(chip->remap_addr);
>   pci_release_regions(chip->pci);
> + if (chip->have_msi) {
> + pci_disable_msi(chip->pci);
> + } else {
> + pci_intx(chip->pci, 0);
> + }

My playing with forcedeth trying to get my system working suggests that 
the expected intx state for a device with no driver is "not disabled". I 
think the else clause here would cause the device to not work if you used 
this driver, unloaded the module, and loaded a version without the patch 
(or kexeced an older kernel, or soft-rebooted into some operating system 
without MSI support).

-Daniel
*This .sig left intentionally blank*


Re: [PATCH] Unbreak MSI on ATI devices

2006-12-24 Thread Daniel Barkalow
On Thu, 21 Dec 2006, Petr Vandrovec wrote:

 So my question is - what is real reason for disabling INTX when in MSI mode?
 According to PCI spec it should not be needed.

The PCI spec is at least not clear enough on the matter to keep nVidia 
from thinking that it's the OS's responsibility to make legacy interrupts 
not happen, by disabling INTX.

 None of devices in the box assert INTX while in MSI even if INTX is enabled.

I've got a forcedeth-driven ethernet card that does, and people have 
reported that nVidia Intel HDA sound does as well.

 So I'd like to see first patch below accepted.  If there are some 
 devices which require INTX disabling, then apparently decision whether 
 to disable it or no has to be moved to device drivers, or some 
 blacklist/whitelist must be created...

PCI Express (IIRC) had the pci_intx() calls already, so it's probably 
actually required by the spec (or at least common implementations) there. 

I'd guess that it's more common for hardware to be unhappy with intx 
enabled than to be unhappy with intx disabled, since the hardware is 
supposed to not send legacy interrupts.

 I'm not sure about second one - I have it in my tree for months, but I run 
 that kernel only on hardware mentioned above, so it is probably too dangerous 
 until pci_enable_msi() gets answer whether MSI works or no always right.

I think it'd be better to add an module parameter, like in the later 
drivers in your patch. Figuring out how to get MSI working whenever it's 
available isn't going to move forward unless there's an easy way to test 
it, especially since (according to rumor) Windows doesn't use it at all.

> /proc/interrupts after patch.  Before patch *hci_hcd:usb* were at zero, 
> IRQ21 was stuck with IRQ count at 1, and HCD complained about 
> "Unlink after no-IRQ?".

Maybe the intx disable is just totally broken for your device? It 
certainly shouldn't cause the delivery of *more* legacy interrupts, and if 
it does with MSI enabled, I'd be surprised if it didn't without MSI. My 
guess is that that device should get a quirk to just leave the INTx 
disable bit alone (such that pci_intx doesn't do anything, regardless of 
context).
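
To make the quirk idea concrete, here is a minimal userspace sketch, not kernel code. It models the PCI command register as a plain 16-bit value and uses bit 10 (0x0400), the INTx Disable bit. The struct, the function name and the intx_untouchable flag are all hypothetical stand-ins; no such quirk flag is known to exist under that name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 10 of the PCI command register: INTx Disable. */
#define CMD_INTX_DISABLE 0x0400u

/* Hypothetical stand-in for a PCI device; not the kernel's struct pci_dev. */
struct fake_pci_dev {
    uint16_t command;          /* simulated PCI command register */
    bool     intx_untouchable; /* hypothetical quirk: never touch the bit */
};

/*
 * Simulated pci_intx(): enable != 0 clears the INTx Disable bit,
 * enable == 0 sets it. A quirked device is left alone in every context.
 */
static void fake_pci_intx(struct fake_pci_dev *dev, bool enable)
{
    uint16_t new_cmd;

    if (dev->intx_untouchable)
        return;

    if (enable)
        new_cmd = (uint16_t)(dev->command & ~CMD_INTX_DISABLE);
    else
        new_cmd = (uint16_t)(dev->command | CMD_INTX_DISABLE);

    /* Only write the register back when the value actually changes. */
    if (new_cmd != dev->command)
        dev->command = new_cmd;
}
```

With a check like this in one place, drivers and the MSI code could keep calling the helper unconditionally, and only the quirk table would need to know which devices misbehave.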

> diff -uprdN linux/sound/pci/atiixp.c linux/sound/pci/atiixp.c
> --- linux/sound/pci/atiixp.c  2006-12-16 13:35:47.0 -0800
> +++ linux/sound/pci/atiixp.c  2006-12-16 13:57:09.0 -0800
> @@ -1442,6 +1446,11 @@ static int snd_atiixp_suspend(struct pci
>   snd_atiixp_aclink_down(chip);
>   snd_atiixp_chip_stop(chip);
>  
> + if (chip->have_msi) {
> +     pci_disable_msi(pci);
> + } else {
> +     pci_intx(pci, 0);
> + }

This doesn't look right, at least for !chip->have_msi. Or is disabling 
intx desirable here for non-MSI? I'd guess that devices that freak out if 
you fiddle with intx are likely to be old, and therefore likely to not 
support MSI.

> @@ -1532,6 +1546,11 @@ static int snd_atiixp_free(struct atiixp
>   if (chip->remap_addr)
>       iounmap(chip->remap_addr);
>   pci_release_regions(chip->pci);
> + if (chip->have_msi) {
> +     pci_disable_msi(chip->pci);
> + } else {
> +     pci_intx(chip->pci, 0);
> + }

My playing with forcedeth trying to get my system working suggests that 
the expected intx state for a device with no driver is not disabled. I 
think the else clause here would cause the device to not work if you used 
this driver, unloaded the module, and loaded a version without the patch 
(or kexeced an older kernel, or soft-rebooted into some operating system 
without MSI support).
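
The failure mode can be sketched in userspace as follows. This is an illustration under stated assumptions, not the driver's code: struct fake_chip and both functions are hypothetical, the MSI branch is assumed to leave the command register untouched, and "usable" simply means that MSI is off and the INTx Disable bit (0x0400) is clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 10 of the PCI command register: INTx Disable. */
#define CMD_INTX_DISABLE 0x0400u

/* Hypothetical model of the chip state that matters here. */
struct fake_chip {
    uint16_t command;     /* simulated PCI command register */
    bool     have_msi;    /* driver was using MSI */
    bool     msi_enabled; /* simulated MSI enable state */
};

/* Teardown shaped like the quoted hunk. */
static void fake_free_as_patched(struct fake_chip *chip)
{
    if (chip->have_msi)
        chip->msi_enabled = false;         /* stands in for pci_disable_msi() */
    else
        chip->command |= CMD_INTX_DISABLE; /* stands in for pci_intx(pci, 0) */
}

/* Could a later driver that only knows legacy interrupts get any? */
static bool fake_legacy_irq_usable(const struct fake_chip *chip)
{
    return !chip->msi_enabled && !(chip->command & CMD_INTX_DISABLE);
}
```

In this model a chip that never used MSI comes out of teardown with INTx disabled, so whatever binds to it next and does not re-enable INTx sees no interrupts at all.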

-Daniel
*This .sig left intentionally blank*


Re: forcedeth trouble in 2.6.19(.1)

2006-12-19 Thread Daniel Barkalow
On Tue, 19 Dec 2006, John M Flinchbaugh wrote:

> I saw a mention of interrupt handling for forcedeth cards is the
> 2.6.19.1 changelog, but I still see this error in 2.6.19.1.  It started
> in 2.6.19, and it didn't happen in 2.6.18.1.

Nope; the issue fixed in 2.6.19.1 has always existed (provided you had 
hardware suitable to trigger it). And it was an issue of getting bogus 
legacy interrupts when using MSI, which would lead to some other device on 
the same legacy interrupt getting disabled.

I'd suggest reverting 0a07bc645e818b88559d99f52ad45e35352e8228 (fixes a 
lockdep warning, stuff with interrupts, only build tested) as a first 
guess.

-Daniel
*This .sig left intentionally blank*


Re: GPL only modules

2006-12-18 Thread Daniel Barkalow
On Mon, 18 Dec 2006, Linus Torvalds wrote:

> Static vs dynamic matters for whether it's an AGGREGATE work. Clearly, 
> static linking aggregates the library with the other program in the same 
> binary. There's no question about that. And that _does_ have meaning from 
> a copyright law angle, since if you don't have permission to ship 
> aggregate works under the license, then you can't ship said binary. It's 
> just a non-issue in the specific case of the GPLv2.

Under US law, the distinction is between works that are copyrightable 
themselves as "derivative works" and works that are derived from others, 
but aren't copyrightable. Provided you're allowed to ship aggregate works, 
the question is whether the output of "ld" is a copyrightable work 
distinct from the inputs.

I'd agree that "ar", like "mkisofs", doesn't create a derived work, but I 
think that "objcopy" does create a derived work, and "ld" does too, by 
virtue of modifying the objects it takes to resolve symbols. Now, you 
could distribute to somebody an ar archive of your program, and the 
recipient (given fair use rights to the copy of the program they received) 
could do "gcc program.a -o program" to link it. But I don't think you 
automatically get the right (under the "mere aggregation" permission) to 
distribute the result of relocating the symbols of gnutls around those of 
your program and vice versa, along with modifying the references to 
external symbols from each of these to point to specific locations.

-Daniel
*This .sig left intentionally blank*


Re: [git patch] improve INTx toggle for PCI MSI

2006-12-08 Thread Daniel Barkalow
On Thu, 7 Dec 2006, Jeff Garzik wrote:

> "it boots" on ICH7 at least.

It solves my problem (and doesn't break anything).

-Daniel
*This .sig left intentionally blank*


Re: Disable INTx when enabling MSI

2006-12-07 Thread Daniel Barkalow
On Thu, 7 Dec 2006, Greg KH wrote:

> Care to take Jeff's proposed patch, verify that it works and forward it
> on to me?

I'll test it tomorrow. Testing disables my network, and making sure the 
problem exists without the patch kills my disk controller, so I need to 
sit at the computer for a while. I assume that I've got the only known 
device that demonstrates the need for this?

Off topic: would it be wise as a general rule to somehow shut down devices 
whose interrupts get disabled by the "nobody cared!" code? Or maybe call 
their interrupt handlers periodically to keep them alive?

-Daniel
*This .sig left intentionally blank*


Disable INTx when enabling MSI

2006-12-07 Thread Daniel Barkalow
Some device manufacturers seem to think it's the OS's responsibility to 
disable legacy interrupt delivery when using MSI. If the driver doesn't 
handle it (which they generally don't), and the device isn't PCI-Express, 
a steady stream of legacy interrupts will be delivered in addition to the 
MSI ones, eventually leading to the legacy IRQ getting disabled, which 
kills any device that shares it.
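
How the legacy IRQ ends up disabled can be sketched in userspace like this. The code loosely models the kernel's spurious-interrupt accounting; the window of 100000 interrupts and the limit of 99900 unhandled ones are stated from memory of kernel/irq/spurious.c and should be treated as assumptions, and all names are hypothetical.

```c
#include <stdbool.h>

/* Assumed thresholds; see the note above. */
#define WINDOW_IRQS     100000u
#define UNHANDLED_LIMIT  99900u

/* Hypothetical model of one shared legacy IRQ line. */
struct fake_irq_line {
    unsigned int irq_count;      /* interrupts seen in the current window */
    unsigned int irqs_unhandled; /* of those, how many no handler claimed */
    bool         disabled;       /* set once "nobody cared" */
};

/* Account for one interrupt; handled means some handler claimed it. */
static void fake_note_interrupt(struct fake_irq_line *line, bool handled)
{
    if (line->disabled)
        return;

    if (!handled)
        line->irqs_unhandled++;
    line->irq_count++;

    if (line->irq_count < WINDOW_IRQS)
        return;

    /* End of window: too many unclaimed interrupts kills the line. */
    if (line->irqs_unhandled > UNHANDLED_LIMIT)
        line->disabled = true;

    line->irq_count = 0;
    line->irqs_unhandled = 0;
}

/* Deliver n interrupts that are all handled or all unhandled. */
static void fake_deliver(struct fake_irq_line *line, unsigned int n,
                         bool handled)
{
    for (unsigned int i = 0; i < n; i++)
        fake_note_interrupt(line, handled);
}
```

A device in MSI mode has its handler on the MSI vector, so every stray legacy interrupt it raises counts as unhandled on the shared line. Once the line is disabled, the other devices on it stop getting interrupts too.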

Jeff proposed a patch in http://lkml.org/lkml/2006/11/21/332 when Linus 
wanted to do it in the PCI layer, but nobody seems to have told the actual 
PCI maintainer.

I'm trying to get a patch into -stable to do pci_intx in exactly the same 
situations, but only for forcedeth (which is the device that's causing 
problems for me), but that requires that the real solution be merged in 
the mainline.

-Daniel
*This .sig left intentionally blank*

