Re: [PATCH] Force UNIX domain sockets to be built in

2007-12-31 Thread Theodore Tso
On Tue, Jan 01, 2008 at 04:45:21AM +0100, Bodo Eggert wrote:
> > udev-free != embedded.
> 
> But UNIX=m == waste RAM and have an effectively b0rken system until the 
> module is loaded. 

Well, the system isn't necessarily totally broken.  If you don't use
udev, then the system will be crippled, but not totally broken.  Then
again, besides udev, packages such as dbus, gdm, and acpid all use
Unix Domain Sockets --- not to mention cups, avahi-daemon, bluez,
emacsclient, and any X program when the DISPLAY is :0.0.
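
A quick way to see whether a running system can actually use Unix
Domain Sockets is to probe for them from user space.  The following is
only a minimal sketch, not anything from the patch under discussion;
the function name is made up for illustration.  Note that on a modular
kernel the probe itself may cause the module to be loaded.

```python
import socket


def unix_sockets_available():
    """Return True if an AF_UNIX stream socket pair can be created and used."""
    if not hasattr(socket, "AF_UNIX"):
        # This Python build does not expose AF_UNIX at all.
        return False
    try:
        a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    except OSError:
        # The kernel refused the address family (for example, no support).
        return False
    try:
        # Round-trip a few bytes to confirm the pair really works.
        a.sendall(b"ping")
        return b.recv(4) == b"ping"
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print("AF_UNIX usable:", unix_sockets_available())
```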

The question is whether the size of the Unix domain sockets support is
worth the complexity of yet another config option that we expose to
the user.  For the embedded world, OK, maybe they want to save 14k of
non-swappable memory.  But for the non-embedded world, given the 117k
mandatory memory usage of sysfs, or the 124k memory usage of the core
networking stack, never mind the 3 megabytes of memory used by objects
in the kernel subdirectory, it's not clear that it's worth worrying
over 14k of memory, especially when many Unix programs assume
that Unix Domain Sockets are present.
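
To put the 14k in proportion, the figures above can simply be added
up.  This is rough arithmetic only: it assumes "3 megabytes" means
3 * 1024k and that the three baseline figures do not overlap, neither
of which is stated here.

```python
# Sizes in k, taken from the figures quoted in the paragraph above.
UNIX_SOCKETS_K = 14
SYSFS_K = 117
CORE_NET_K = 124
KERNEL_DIR_K = 3 * 1024  # "3 megabytes", assuming 1 MB = 1024k


def share_percent(part_k, whole_k):
    """Return part as a percentage of whole."""
    return 100.0 * part_k / whole_k


baseline_k = SYSFS_K + CORE_NET_K + KERNEL_DIR_K
print(f"baseline: {baseline_k}k")
print(f"Unix sockets vs. baseline: {share_percent(UNIX_SOCKETS_K, baseline_k):.2f}%")
```

Under those assumptions the Unix domain socket code comes to well
under one percent of the memory that is already mandatory.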

- Ted
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] New Kernel Bugs

2007-11-15 Thread Theodore Tso
On Wed, Nov 14, 2007 at 06:23:34PM -0500, Daniel Barkalow wrote:
> I don't see any reason that we couldn't have a tool accessible to Ubuntu 
> users that does a real "git bisect". Git is really good at being scripted 
> by fancy GUIs. It should be easy enough to have a drop down with all of 
> the Ubuntu kernel package releases, where the user selects what works and 
> what doesn't.

Users who haven't yet downloaded a git repository have to surmount
some obstacles that might cause them to lose interest.  First, they
have to download some 190 megs of git repository, which can take a
while on a slow link.  Then they have to build each kernel, which
also takes a while.  A full kernel build with everything selected can
take a good 30 minutes or more, and that's on a fast dual-core machine
with 4 gigs of memory and 7200rpm disk drives.  On a slower,
memory-limited laptop, a single kernel build can take more time than
the user has patience for; multiply that by 7 or 8 build-and-test
boots, and it starts to get tiresome.
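
The cost can be estimated with a few lines of arithmetic.  This is a
sketch under stated assumptions: each round halves the candidate
range, so the round count is about ceil(log2(N)); 7 or 8 rounds then
correspond to roughly 128 to 256 candidate commits, which is an
inference and not a number given above.  The 30-minute default is the
build time quoted above, and the function names are made up.

```python
def bisect_steps(num_commits):
    """Approximate number of build-and-test rounds to isolate one bad commit.

    Each round halves the candidate range, so this is ceil(log2(num_commits)).
    """
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (num_commits - 1).bit_length()


def total_build_minutes(num_commits, minutes_per_build=30):
    """Total build time, ignoring download, reboot, and testing time."""
    return bisect_steps(num_commits) * minutes_per_build


for n in (128, 256):
    print(n, bisect_steps(n), total_build_minutes(n))
```

With 256 candidate commits that is 8 builds, or about four hours of
compiling on the fast machine before any testing is counted.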

And then on top of that there are the issues about whether there is
enough support for dealing with hitting kernel revisions that fail due
to other bugs getting merged in during the -rc1 process, etc.

I agree that a tool that automated the bisection process and walked
the user through it would be helpful, but I believe it would be
possible for us to do better.

- Ted


Re: [BUG] New Kernel Bugs

2007-11-13 Thread Theodore Tso
On Tue, Nov 13, 2007 at 11:33:44AM -0600, Larry Finger wrote:
> I'm very encouraged to read of your expanded testing efforts. As a
> bcm43xx developer, Ubuntu has been our problem distro, mostly
> because your standard kernels have debugging turned off for bcm43xx.
> When a Ubuntu user reports a problem and we ask for the relevant
> output from dmesg, they have no information. I ask two things of all
> distros: (1) Turn on debugging - we don't spam the logs that badly,
> and (2) forward any bugs found by your testing to the maintainer,
> and/or the bcm43xx mailing list.

Heh. I hadn't enabled CONFIG_BCM43XX_DEBUG myself, but I just changed
it for my next kernel build.  This is a slightly different issue,
which is that some _DEBUG options shouldn't be turned on by default
(because they really trash performance and bloat log size), while
others are painless to turn on and don't cost much.

If bcm43xx falls in the second category, I'd suggest removing the
option and just compiling the debugging code in by default, with a
run-time option to enable it.

- Ted


Re: [BUG] New Kernel Bugs

2007-11-13 Thread Theodore Tso
On Tue, Nov 13, 2007 at 04:52:32PM +0100, Benoit Boissinot wrote:
> Btw, I used to test every -mm kernel. But since I've switched distros
> (gentoo->ubuntu)
> and I have less time, I feel it's harder to test -rc or -mm kernels (I
> know this isn't a lkml problem
> but more a distro problem, but I would love having an ubuntu blessed
> repo with current dev kernel
> for the latest stable ubuntu release).

There are two parts to this.  One is an Ubuntu development kernel which
we can give to large numbers of people to expand our testing pool.
But if we don't do a better job of responding to the bug reports that
expanded testing would generate, this won't necessarily help us.

The other is an automated set of standard pre-built bisection points so
that testers can more easily localize a bug down to a few hundred
commits without needing to learn how to use "git bisect" (think Ubuntu
users).
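
A minimal sketch of how such pre-built points could be used, assuming
a linear list of commits and a tester who can only answer "does this
pre-built kernel show the bug?".  The spacing, the commit counts, and
both function names are hypothetical; nothing here describes an
existing tool.

```python
def prebuilt_points(num_commits, spacing):
    """Indices (0-based, oldest first) of commits that get a pre-built kernel.

    The first and last commit are always included so that the known-good
    and known-bad ends of the range are covered.
    """
    if num_commits < 1 or spacing < 1:
        raise ValueError("num_commits and spacing must be positive")
    points = list(range(0, num_commits, spacing))
    if points[-1] != num_commits - 1:
        points.append(num_commits - 1)
    return points


def localize(points, is_bad):
    """Binary-search the pre-built points for the first bad kernel.

    is_bad(index) reports whether the kernel built at that commit index
    shows the bug.  Assumes points[0] is good and points[-1] is bad.
    Returns (last_good, first_bad): the bug was introduced in between.
    """
    lo, hi = 0, len(points) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(points[mid]):
            hi = mid
        else:
            lo = mid
    return points[lo], points[hi]


# Example: 5000 commits, a pre-built kernel every 250, bug introduced at 3210.
points = prebuilt_points(5000, 250)
print(localize(points, lambda idx: idx >= 3210))  # -> (3000, 3250)
```

The tester boots four pre-built kernels and ends up with a window of
250 commits, without compiling anything or touching git.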

So for the first, I've actually been playing with some plans to put
together an unofficial kernel that is basically "what Ted is using on
his laptop".  It generally has emergency bug fixes that haven't made it
into mainline, plus some other trees where I've been more aggressive
since I want the latest in wireless and powersaving technology, etc.
It has the property that "if it breaks, you get to keep both pieces
--- and I've helpfully included the git ID in the package name so you
can do the bisection yourself".  If you want to try it, the first such
kernel is here:

   http://www.kernel.org/~tytso/tbek

I wasn't planning on talking about it until it was more fully baked,
but if people want something vaguely stable based on 2.6.24-rc2, this
might be interesting.

As for the second, I was just talking to Arjan over pizza and beer
last night, and we reached the same conclusion as Ingo, which is that
this really isn't that hard.  Setting up the infrastructure is just a
matter of getting the disk space and the network bandwidth together in
the right place, plus a relatively small amount of programming, at
least for the simplest iteration of the idea.  (As is quite common
when doing designs over beer, we talked about some more grandiose
web-based schemes to do custom-built kernels that were tied to the
kernel bugzilla, but first things first. :-)

- Ted


Re: Recent wireless breakage (ipw2200, iwconfig, NetworkManager)

2007-03-05 Thread Theodore Tso
On Mon, Mar 05, 2007 at 04:37:15PM -0800, Greg KH wrote:
> But I AM TRYING TO MAKE IT COMPATIBLE!!!
> 
> That's what that config option is there for.  If you happen to be
> running a newer userspace, a different distro than what is in Debian
> right now, or don't use HAL and Networkmanager, then disable that
> option.  Then all of sysfs looks just like it used to, no user visble
> changes at all.  It doesn't get any more compatible than that.

This is great, but I think the real problem isn't the config option,
but what is changing if the config option isn't enabled.  The claim
which some, including Matt and Bron, seem to be making is that if you
turn *off* CONFIG_SYSFS_DEPRECATED, you must be using at least hal
0.5.9-rc1, released ***yesterday***, or suffer breakages for at least
some system configurations.

So the problem with putting a date in Kconfig.txt help file, or in
Documentation/feature-removal-schedule.txt, is that if there are other
incompatible changes which are added to sysfs in say, December 2007 or
January 2008, but which are papered over with CONFIG_SYSFS_DEPRECATED,
and then come June 2008, CONFIG_SYSFS_DEPRECATED is unceremoniously
ripped out, then users will get screwed.  

So the real question is whether we are done making changes to sysfs.
If not, maybe what we should do is talk about major version numbers
for sysfs.  Call what we have currently not CONFIG_SYSFS_DEPRECATED, but
rather CONFIG_SYSFS_LAYOUT_1.  At the moment, CONFIG_SYSFS_LAYOUT_2 is
undergoing changes, but at some point we need to lock down and state
that Layout version 2 is never going to change, and then people who
want changes can go work on CONFIG_SYSFS_LAYOUT_3.  

The problem with calling it CONFIG_SYSFS_DEPRECATED is that people
think that since it's deprecated, it should be turned off.  But if we
have staged major version numbers, with guarantees of absolute
stability once a particular major version number is locked down, then
it becomes a lot easier to talk about which versions of hal, udev, and
NetworkManager are really needed for each layout version.
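
As a sketch of what locked-down layout versions would buy, a
requirements table could be as simple as the following.  The table is
hypothetical: only the hal entry for layout 2 reflects a claim made in
this thread (hal 0.5.9-rc1, simplified here to 0.5.9), and the
installed versions in the example are made up.

```python
# Hypothetical table: minimum userspace versions per locked-down sysfs layout.
MIN_USERSPACE = {
    1: {},                   # legacy layout: no minimum assumed
    2: {"hal": (0, 5, 9)},   # from the hal 0.5.9-rc1 claim in this thread
}


def missing_requirements(layout, installed):
    """Return {tool: required_version} for tools too old for this layout.

    installed maps tool name to a version tuple; a tool that is not
    installed at all is not counted as a problem.
    """
    required = MIN_USERSPACE[layout]
    return {
        tool: minimum
        for tool, minimum in required.items()
        if tool in installed and installed[tool] < minimum
    }


# Made-up example system with an older hal.
installed = {"hal": (0, 5, 8), "udev": (105,)}
print(missing_requirements(1, installed))
print(missing_requirements(2, installed))
```

Once a layout is frozen, its row in such a table never changes, which
is what makes it possible to state requirements with any confidence.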

- Ted


Re: Recent wireless breakage (ipw2200, iwconfig, NetworkManager)

2007-03-05 Thread Theodore Tso
On Sun, Mar 04, 2007 at 05:17:29PM -0800, Greg KH wrote:
> I should not have broken any userspace if CONFIG_SYSFS_DEPRECATED is
> enabled with that patch.  If that is enabled, and that patch still
> causes problems, please let me know.

But we still need to update the help text for CONFIG_SYSFS_DEPRECATED to
make it clear that its deprecation schedule still needs to be 2009 to
2011 (depending on whether we want to accommodate Debian's glacial
release schedule).  Certainly the 2006 date which is currently there
simply isn't accurate.

- Ted


Re: [ANNOUNCE] d80211 based driver for Intel PRO/Wireless 3945ABG

2007-02-10 Thread Theodore Tso
On Fri, Feb 09, 2007 at 01:12:42PM -0800, James Ketrenos wrote:
> Please hold all questions until I am done with this email.  Thank you.
> 
> We are pleased to announce the availability of a new driver for the 
> Intel PRO/Wireless 3945ABG Network Connection adapter.  This new driver 
> uses the new d80211 subsystem previously only available as part of the 
> wireless-dev tree.

Very cool!  Is it likely that d80211 and iwlwifi will be pushed into
mainline in time for 2.6.21?

Regards,

- Ted


Re: [PATCH -rt DO NOT APPLY] Fix for tg3 networking lockup

2006-08-05 Thread Theodore Tso
On Thu, Aug 03, 2006 at 08:45:31PM -0700, Michael Chan wrote:
> ASF is firmware that monitors the system and sends out alerts whenever
> certain events happen.  So it needs to run before the OS boots or after
> it has crashed.  When the driver is up and running, the driver and ASF
> run independently sending and receiving traffic on the same wire.  Of
> course, the bandwidth that is used by ASF is a very tiny fraction of the
> host traffic.  If the system crashes, the FIFO and other resources on
> the NIC will be backed up and ASF can no longer function without
> resetting the chip.

Thanks, that description was very helpful.  Would you accept a patch
adding a comment describing this?  I couldn't figure it out from
looking at the source, and googling "ASF" turned up lots of other uses
for that particular acronym.

It appears that there is no way of disabling ASF; is that a true
statement?

Thanks, regards,

- Ted


Re: [PATCH -rt DO NOT APPLY] Fix for tg3 networking lockup

2006-08-03 Thread Theodore Tso
On Thu, Aug 03, 2006 at 04:28:19PM -0700, Michael Chan wrote:
> True.  But they also have ASF enabled which requires tg3_timer() to send
> the heartbeat periodically.  If the heartbeat is late, ASF may reset the
> chip believing that the system has crashed.

Pardon me for asking a dumb question, but what's being accomplished by
resetting the chip if the system has crashed?  Why not reset the chip
when the system reboots and it sees the PCI bus reset?  I guess I'm
missing the purpose of the ASF heartbeat; why does the networking chip
need a chip-specific watchdog?

- Ted


Re: [PATCH -rt DO NOT APPLY] Fix for tg3 networking lockup

2006-08-03 Thread Theodore Tso
On Thu, Aug 03, 2006 at 04:43:11PM -0700, David Miller wrote:
> From: "Michael Chan" <[EMAIL PROTECTED]>
> Date: Thu, 03 Aug 2006 16:28:19 -0700
> 
> > > eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[0] 
> > > TSOcap[0]
> > 
> > We'll see if we can do away with the timer-based heartbeat.  That's
> > probably the best solution.
> 
> The tg3 driver is not the only device in the world that requires a
> timer based "ping" to work.  The watchdog drivers and the softlockup
> detector are other instances which require a timer to not be delayed
> an unreasonable amount of time.
> 
> Therefore TG3 is not unique in this regard, and I thus don't think
> it's worthwhile to change tg3 just to accomodate this broken behavior
> of the RT patches.

Removing the timer-based "ping" might be a good thing to do from the
point of view of reducing power utilization of laptops (but hey, I
don't have a tg3 in my laptop, so I won't worry about it a whole lot :-), 
but I agree that in general the RT patches need to be able to
call functions such as tg3_timer() reliably even when under a high
real-time process workload, without needing to use the blunt hammer of
"chrt -f 95 `pidof softirq-timer`".  (Since not all timer callbacks
need to be run at rt prio 95.)

- Ted


Re: [PATCH -rt DO NOT APPLY] Fix for tg3 networking lockup

2006-08-03 Thread Theodore Tso
On Thu, Aug 03, 2006 at 02:48:45PM -0700, David Miller wrote:
> > eth0: Tigon3 [partno(BCM95704s) rev 2100 PHY(serdes)] (PCIX:100MHz:64-bit) 
> > 10/100/1000BaseT Ethernet 00:14:5e:86:44:24
> 
> The 5704 chip will set TG3_FLAG_TAGGED_STATUS, and therefore
> doesn't need the periodic poking done by tg3_timer().

Hmm, all I can say is that I could reliably knock the box off the
network by running four processes that tied up all CPUs at high
real-time priorities, and after I applied the horrible hack that
guaranteed that tg3_timer() was run every 0.128 seconds, the system
stayed on the network.  I'm not sure why, but it did fix the problem.

Any suggestions on how I could figure out what was really going on and
what would be a better fix would be greatly appreciated.

- Ted


Re: [PATCH -rt DO NOT APPLY] Fix for tg3 networking lockup

2006-08-03 Thread Theodore Tso
On Thu, Aug 03, 2006 at 11:36:47AM -0700, Michael Chan wrote:
> On Thu, 2006-08-03 at 20:00 +1000, Herbert Xu wrote:
> > Theodore Tso <[EMAIL PROTECTED]> wrote:
> > > 
> > > I'm sending this on mostly because it was a bit of a pain to track down,
> > > and hopefully it will save time if anyone else hits this while playing
> > > with the -rt kernel.  It is NOT the right way to fix things, so please
> > > don't even think of applying this patch (unless you need it, in your own
> > > local tree :-).
> > > 
> > > One of these days when we have time to breath we'll look into fixing
> > > this the right way, if someone doesn't beat us to it first.  :-)
> > 
> Ted, what tg3 hardware is having this timer related problem?  Can you
> send me the tg3 probing output?

tg3.c:v3.49 (Feb 2, 2006)
ACPI: PCI Interrupt :02:01.0[A] -> GSI 24 (level, low) -> IRQ 17
eth0: Tigon3 [partno(BCM95704s) rev 2100 PHY(serdes)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:14:5e:86:44:24
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[0] TSOcap[0]
eth0: dma_rwctrl[769f4000] dma_mask[64-bit]

02:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet (rev 10)
Subsystem: IBM: Unknown device 0301
Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 17
Memory at efff (64-bit, non-prefetchable) [size=64K]
Capabilities: [40] PCI-X non-bridge device.
Capabilities: [48] Power Management version 2
Capabilities: [50] Vital Product Data
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-

The other interesting bit of information is that after the networking
card goes dead and I do the "ifdown eth0; ifup eth0", the following
printk shows up:

tg3: tg3_abort_hw timed out for eth0, TX_MODE_ENABLE will not clear MAC_TX_MODE=

This is from an IBM LS-20 blade.

Is this helpful?

- Ted


Re: [PATCH -rt DO NOT APPLY] Fix for tg3 networking lockup

2006-08-03 Thread Theodore Tso
On Thu, Aug 03, 2006 at 09:49:17AM -0700, Randy.Dunlap wrote:
> Interesting.  On my Dell D610 notebook with tg3 and vpn,
> I have to ping a server on the vpn to keep it alive, otherwise
> it disappears soon and I have to restart the vpn.  Of course,
> this could just be the vpn or some other software problem
> instead of a tg3 problem.

That sounds almost certainly like a VPN problem.  The tg3_timer() code
wakes up every second or tenth of a second (depending on which mode
you're in) and takes care of keeping the tg3 hardware mollified.  On a
standard kernel, this shouldn't ever be an issue.  For the -rt kernel,
this problem only shows up if you have enough tasks running at
rtprio's above the rtprio of the softirq-timer for long enough that
the tg3 chip gets angry.
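
The failure mode can be modelled as a watchdog that gives up when the
gap between two runs of the periodic callback grows too large.  This
is a toy sketch, not driver code: the 100 ms period matches the "tenth
of a second" mode mentioned above, while the 500 ms timeout and the
function name are assumptions chosen for illustration.

```python
def first_missed_deadline(heartbeat_ms, timeout_ms):
    """Return the time (ms) at which the watchdog would give up, or None.

    heartbeat_ms is a sorted list of times at which the periodic callback
    actually ran.  The watchdog trips once more than timeout_ms elapses
    after a heartbeat without the next one arriving.
    """
    for prev, cur in zip(heartbeat_ms, heartbeat_ms[1:]):
        if cur - prev > timeout_ms:
            return prev + timeout_ms
    return None


# Timer runs every 100 ms as intended: the hardware stays happy.
on_time = [0, 100, 200, 300, 400]
# Same timer, but starved by real-time tasks between 200 ms and 1500 ms.
starved = [0, 100, 200, 1500, 1600]

print(first_missed_deadline(on_time, timeout_ms=500))   # -> None
print(first_missed_deadline(starved, timeout_ms=500))   # -> 700
```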

- Ted


Re: [PATCH -rt DO NOT APPLY] Fix for tg3 networking lockup

2006-08-03 Thread Theodore Tso
On Thu, Aug 03, 2006 at 09:46:37AM -0700, Daniel Walker wrote:
> There is some form of priority inheritance on the timer softirq. It said
> in the patch header that the right fix was for the timer softirq to
> change priorities. Which Real Time patch are you using? Or is the
> current system not sufficient ?

We're using a somewhat older version of the CONFIG_PREEMPT_RT patches
(2.6.16-rt22, with various bug fixes pulled up to what we are
running.)

There is priority inheritance on the hrtimers, but not on normal
timers, and conversations with Thomas and Stephen at OLS indicated
this is on the wishlist, but it has not happened yet.  As I mentioned
in the patch comments, I looked at hacking hrtimers into the tg3
driver code, but (a) the hrtimers code assumes that the priority
inheritance happens when a process is associated with the hrtimer and
the process is a high priority process, and (b) the hrtimers code
isn't exported for use by modules.  So I went with a very quick hack,
since we have a hard code freeze for a customer deliverable.

In the long term, we're going to need something a bit more
sophisticated than what we have in the hrtimers code, since not all
code which requests timers is necessarily associated with a process.
The tg3_timer() code, for example, is triggered by the device driver but
isn't associated with a process for boosting purposes, and creating a
process just so that tg3_timer() can be boosted seems like the Wrong
Thing.

In addition, the timer wheel code has a *large* number of timers that
get added and then removed without ever getting expired by the TCP
networking code, and I'm not at all convinced that the technique used
for doing prio boosting for the hrtimers will scale to what is needed
for normal timers.
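
To illustrate why the add-then-cancel pattern matters, here is a
simplified single-level timer wheel.  It is a sketch of the general
technique only and not the kernel's implementation (which uses several
cascading levels); the class name, slot count, and the TCP-style usage
at the bottom are all made up for illustration.

```python
class TimerWheel:
    """Single-level timer wheel: O(1) add and cancel, per-tick expiry."""

    def __init__(self, slots=256):
        self.slots = [set() for _ in range(slots)]
        self.where = {}      # timer id -> slot index, for O(1) cancel
        self.now = 0         # current tick

    def add(self, timer_id, delay_ticks):
        if not 0 < delay_ticks < len(self.slots):
            raise ValueError("delay must fit within one wheel revolution")
        if timer_id in self.where:
            self.cancel(timer_id)
        slot = (self.now + delay_ticks) % len(self.slots)
        self.slots[slot].add(timer_id)
        self.where[timer_id] = slot

    def cancel(self, timer_id):
        """Remove a pending timer; returns False if it was not pending."""
        slot = self.where.pop(timer_id, None)
        if slot is None:
            return False
        self.slots[slot].discard(timer_id)
        return True

    def tick(self):
        """Advance one tick and return the ids of timers that expired."""
        self.now += 1
        slot = self.now % len(self.slots)
        expired = sorted(self.slots[slot])
        self.slots[slot].clear()
        for timer_id in expired:
            del self.where[timer_id]
        return expired


wheel = TimerWheel()
# TCP-style usage: arm many retransmit timers, then cancel almost all of them.
for conn in range(1000):
    wheel.add(("retransmit", conn), delay_ticks=200)
for conn in range(999):
    wheel.cancel(("retransmit", conn))
fired = [t for _ in range(200) for t in wheel.tick()]
print(fired)  # -> [('retransmit', 999)]
```

Adding and cancelling are cheap precisely because no per-timer
bookkeeping beyond a slot index is kept, so any boosting scheme that
does extra work on every add and cancel pays that cost a thousand
times here for a single timer that actually fires.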

- Ted


Re: [PATCH -rt DO NOT APPLY] Fix for tg3 networking lockup

2006-08-03 Thread Theodore Tso
On Thu, Aug 03, 2006 at 08:00:35PM +1000, Herbert Xu wrote:
> Theodore Tso <[EMAIL PROTECTED]> wrote:
> > 
> > I'm sending this on mostly because it was a bit of a pain to track down,
> > and hopefully it will save time if anyone else hits this while playing
> > with the -rt kernel.  It is NOT the right way to fix things, so please
> > don't even think of applying this patch (unless you need it, in your own
> > local tree :-).
> > 
> > One of these days when we have time to breath we'll look into fixing
> > this the right way, if someone doesn't beat us to it first.  :-)
> 
> You probably should resend the patch to netdev and Michael Chan
> <[EMAIL PROTECTED]>.  He might have ideas on how this could be
> avoided.

This only shows up with the real-time kernel, where timer softirqs run
in their own processes and a high priority process preempts the timer
softirq.  I don't really consider this a networking bug, or even a
driver bug, although it does seem unfortunate that Broadcom hardware
locks up and goes unresponsive if the OS doesn't tickle it every tenth
of a second or so.  (Definitely a bad idea if the tg3 gets used on any
laptops, from a power usage perspective.)  But that seems like a
(lame) hardware bug, not a driver bug.

- Ted


Re: e1000: "fix" it on thinkpad x60 / eeprom checksum read fails

2006-07-21 Thread Theodore Tso
On Fri, Jul 21, 2006 at 06:41:05AM -0700, Andrew Morton wrote:
> > It's completely not acceptable to run when the EEPROM checksum fails - you 
> > might even be running with the wrong MAC address, or worse. Lets fix this 
> > the 
> > right way instead.
> 
> A printk which helps the user to understand all this saga would be very nice.
> -

And if someone who understands all of these details could put a note
in the thinkwiki (say, here:
http://www.thinkwiki.org/wiki/Ethernet_Controllers#Intel_Gigabit_.2810.2F100.2F1000.29)
it would be greatly appreciated.

- Ted


Re: [RFC/PATCH 1/2] in-kernel sockets API

2006-06-14 Thread Theodore Tso
On Tue, Jun 13, 2006 at 07:53:19PM -0500, Chase Venters wrote:
> > It is the lack of an ABI that is most frustrating to these users.
> 
> And the presence of an ABI would be _very_ frustrating to core
> developers. Not only would these people suffer, everyone would --
> developer time would be wasted dealing with cruft, and forward
> progress would be slowed.

Note that just because an interface is EXPORT_SYMBOL doesn't mean that
the interface is guaranteed to be stable.  So folks who are arguing
that an interface shouldn't be usable by non-GPL applications because
we are therefore guaranteeing a stable API are making an unwarranted
assumption.

- Ted