date:20070913

Re: Network Namespace status

2007-09-13 Thread Oliver Hartkopp

Eric W. Biederman wrote:
> Looking into my patch queue I have:
> 5 patches for cleaning up and making a per network namespace loopback device.
> 4 patches for making rtnetlink message processing per network namespace
> 1 patch for making AF_UNIX per network namespace
> 1 patch for making AF_PACKET per network namespace
>
> The ipv4 part of my patchset is currently working but it needs some
> more cleanup and reordering of patches before it is ready to go anywhere.
> Nothing has been done for ipv6, but the changes should very much parallel
> ipv4.
>
> The other protocols I haven't even looked at yet.
>   

Hi Eric,

can you send me your current AF_PACKET patch? I just want to update our
recent post of the CAN (controller area network) subsystem (AF_CAN),
which is (in some parts) similar to AF_PACKET. That way I can take a look
at it and use the latest technique in the next post ...

Thanks,
Oliver


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 2.6.23-rc4-mm1 OOPS in forcedeth?

2007-09-13 Thread Andrew James Wade

I have an Oops that may be related:

BUG: unable to handle kernel NULL pointer dereference at virtual address 0025
printing eip: c037d81b *pde = 
Oops:  [#1]
last sysfs file: /devices/pci:00/:00:01.0/:01:00.0/class

Pid: 0, comm: swapper Not tainted (2.6.23-rc4-mm1-config2 #2)
EIP: 0060:[] EFLAGS: 00010246 CPU: 0
EIP is at tcp_rto_min+0xb/0x15
EAX: 0032 EBX: c4c98b68 ECX: fffe EDX: 
ESI: c4c98b68 EDI: c055f600 EBP: c4432e40 ESP: c0596dec
 DS: 007b ES: 007b FS:  GS:  SS: 0068
Process swapper (pid: 0, ti=c0596000 task=c052a340 task.ti=c0568000)
Stack: c037d8de c4c98b68 c4c98b68 c037e0ec 0001 c037f879 c052a8b4 c052a340
    0001 c25e1e60   0001 8c176265 8c17678a
    0001 0001  8c17678a 8600  007d8b21
Call Trace:
 [] tcp_rtt_estimator+0xb9/0xfe
 [] tcp_ack_saw_tstamp+0x14/0x43
 [] tcp_ack+0x6b8/0x17b8
 [] tcp_rcv_established+0x519/0x5f1
 [] tcp_v4_do_rcv+0x28/0x2f8
 [] tcp_v4_rcv+0x7df/0x83d
 [] ip_local_deliver+0xcc/0x148
 [] ip_rcv+0x3b7/0x3de
 [] netif_receive_skb+0x17a/0x1c2
 [] rtl8139_poll+0x2d9/0x425
 [] net_rx_action+0xa8/0xc8
 [] __do_softirq+0x40/0x90
 [] do_softirq+0x4d/0xb6
 ===
INFO: lockdep is turned off.
Code: 24 8b 82 88 03 00 00 89 82 40 05 00 00 a1 a0 23 53 c0 89 82 44 05 00 00 
83 c4 0c 5b 5e 5f 5d c3 8b 90 88 00 00 00 b8 32 00 00 00  42 25 20 74 03 8b 
42 54 c3 56
 85 d2 b9 01 00 00 00 0f 45 ca
EIP: [] tcp_rto_min+0xb/0x15 SS:ESP 0068:c0596dec
Kernel panic - not syncing: Fatal exception in interrupt

config:
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc4-mm1
# Wed Sep 12 19:53:26 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION="-config2"
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_AUDIT is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=18
# CONFIG_CONTAINERS is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_KPAGEMAP=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_MODULES is not set
CONFIG_BLOCK=y
CONFIG_LBD=y
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set
# CONFIG_BLK_DEV_BSG is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"

#
# Processor type and features
#
# CONFIG_TICK_ONESHOT is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MCORE2 is not set
# CONFIG_MK6 is not set
CONFIG_MK7=y
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT

e1000 driver and samba

2007-09-13 Thread L F

Folks,
I've been playing with multiple gigabit ethernet drivers to get samba
3.0.25+ to work reliably. The situation is as follows.
I have a network, one of the machines on the network is a
server/firewall. It contains an Intel PRO1000 dual port PCI Express
card and runs Debian-testing.
The machine is running shorewall 3.4.5 and at present, one port of the
PRO1000 is configured as the WAN port, the other is bridged to a tap
device for virtualbox and is running as the LAN port.
Samba 3.0.25+ will either lose connection or - more worrisomely -
corrupt data in files upon sustained traffic.
One of the tests that consistently fails is mounting a samba share
onto any WinXP client, then trying to unzip a file from the
mounted/mapped drive into the drive itself (i.e. unzipping
Z:\Stuff\qqq.zip to Z:\Stuff\qqq\* ).
If the zip file is of any significant size, one of two things will
happen. Either the client will complain about losing connection to the
share - with a corresponding error in the samba logs - or everything
will appear fine, except that the files will be corrupt.
The unusual thing is that going through the TAP interface from a
Virtualbox machine yields no problems even when transferring tens of
GBs of data.
Copying a large file (500MB+) also has the same effect.
Now, the machine worked when it was using an onboard Realtek 8169
chipset on a 945G board from ASUS, but it worked slowly. I upgraded to
a P965 chipset, started using the realtek driver for the 8110B on that
board.. and started getting consistent samba errors. I therefore
killed the onboard LAN, switched to the Intel board, tried both the
7.6.5 driver on the Intel website AND the driver in the 2.6.20+
kernels - 7.2.x IIRC - and it fails, less than it did with the
realtek, but it fails. Switching back and forth between 2.6.18,
2.6.20.x and 2.6.22.x yielded no improvements. I could use some help,
because I refuse to believe that there isn't a reliable PCIexpress
gigeth/samba combo available.
For further reference, the kernel versions are those mentioned above,
compiled with gcc-3.4.6 and gcc-4.1.2 (current on debian-testing),
with no improvement between the two.
Any and all indications appreciated.

Regards,
Luigi Fabio

Re: r8169: slow samba performance

2007-09-13 Thread David Madsen

> > I noticed a somewhat significant difference between patch #0002 and a
> > busy wait loop with ndelay(10). Write performance was equivalent in
> > both cases as should be the case.  Read performance for me maxed out
>
> Do you have some (gross) figure for the write performance ?

Write performance was quite high, around 650-700 megabit no doubt due
to caching behavior on the server, and it was similar in both cases.
I believe the machine with the r8169 was CPU bound at this point or it
probably would have been even higher.

Sorry for the slow response, your reply ended up buried in a deluge of
email and I just dug it out.  I'll give these other patches a spin in
the next couple days when I get a chance and see if things improve.

--David Madsen

[PATCH 1/1] myri10ge: Add support for PCI device id 9

2007-09-13 Thread Brice Goglin

Add support for new Myri-10G boards with PCI device id 9.

Signed-off-by: Brice Goglin <[EMAIL PROTECTED]>
---
 drivers/net/myri10ge/myri10ge.c |3 +++
 1 file changed, 3 insertions(+)

Index: linux-rc/drivers/net/myri10ge/myri10ge.c
===
--- linux-rc.orig/drivers/net/myri10ge/myri10ge.c	2007-09-11 20:27:17.0 +0200
+++ linux-rc/drivers/net/myri10ge/myri10ge.c	2007-09-14 00:36:36.0 +0200
@@ -3094,9 +3094,12 @@
 }
 
 #define PCI_DEVICE_ID_MYRICOM_MYRI10GE_Z8E 0x0008
+#define PCI_DEVICE_ID_MYRICOM_MYRI10GE_Z8E_9   0x0009
 
 static struct pci_device_id myri10ge_pci_tbl[] = {
 	{PCI_DEVICE(PCI_VENDOR_ID_MYRICOM, PCI_DEVICE_ID_MYRICOM_MYRI10GE_Z8E)},
+	{PCI_DEVICE
+	 (PCI_VENDOR_ID_MYRICOM, PCI_DEVICE_ID_MYRICOM_MYRI10GE_Z8E_9)},
 	{0},
 };
 



[PATCH 0/1] myri10ge update for 2.6.23

2007-09-13 Thread Brice Goglin

Hi Jeff,

The following patch adds support for a new PCI device id. Please apply
for 2.6.23.

Thanks,
Brice


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Michael Chan

On Thu, 2007-09-13 at 14:11 -0700, Roland Dreier wrote:

> 
> I've been meaning to track down the bnx2 iscsi offload patch to look
> and see if this issue is addressed, since the same problem seems to
> exist: it seems an iscsi connection and a main stack tcp connection
> might share the same 4-tuple unless something is done to avoid that
> happening.
> 

iSCSI does not do passive listens, only active connections to the
target.  But you're right, the port space is still shared between iSCSI
and the main stack.  We currently rely on user apps binding to the main
stack to reserve certain ephemeral ports, and telling the iSCSI driver
which ports to use.
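
The reservation scheme described above can be sketched from user space.
This is only an illustration under stated assumptions, not the bnx2 iSCSI
code: an application binds a TCP socket to port 0 so that the main stack
picks an ephemeral port, then keeps the socket open so that the main stack
should not hand that port to another socket. How the port number is then
passed to the iSCSI driver is not shown, and reserve_port() is a
hypothetical helper name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind a TCP socket to an ephemeral port chosen by
 * the main stack.  Returns the socket fd (keep it open to hold the port)
 * and stores the port in host byte order in *port.  Returns -1 on error.
 */
static int reserve_port(unsigned short *port)
{
	struct sockaddr_in sa;
	socklen_t len = sizeof(sa);
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_addr.s_addr = htonl(INADDR_ANY);
	sa.sin_port = 0;	/* 0 = let the stack pick an ephemeral port */

	/* Bind, then read back which port the stack actually assigned. */
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
	    getsockname(fd, (struct sockaddr *)&sa, &len) < 0) {
		close(fd);
		return -1;
	}

	*port = ntohs(sa.sin_port);
	return fd;
}
```

The port stays held only while the returned descriptor is open, so the
reserving application has to outlive the offloaded connection that uses it.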


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Roland Dreier

 > Well, if it involves /sharing/ port space with the native stack,
 > i.e. where port 1234 is IB but 1235 is Linux, pretty much all the
 > networking devs have NAK'd that approach AFAICS.

Just to be clear, InfiniBand has no problem; the issue is port
collisions involving iWARP connections.

 - R.

Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Roland Dreier

 > I was about to post v2 of my patch to avoid port space collisions with
 > the native stack.  Can we get that 2.6.24?  It is high priority
 > IMO. I've tried to solicit review on it, but I think folks are
 > reluctant... ;-)

I would like to get this in, but I'm still at least a little
reluctant, since we would be committing to a user interface that seems
a little awkward at best, so I'd like to try and find something
better.  Just to summarize my understanding:

 - your patch requires the administrator to configure an ethX:iwY
   alias address to use iwarp.  (By the way is there anything other
   than "don't do that" that avoids assigning the same address to the
   iwarp alias and a non-iwarp interface?)

 - it would be nicer to create the alias automatically, but an alias
   without an address doesn't make sense.  Creating a whole separate
   net device causes problems because the iwarp stuff still needs to
   use the main net device to do ARP etc.

 - so I'm out of better ideas but I still want to push back a little
   before we commit to something ugly.

I've been meaning to track down the bnx2 iscsi offload patch to look
and see if this issue is addressed, since the same problem seems to
exist: it seems an iscsi connection and a main stack tcp connection
might share the same 4-tuple unless something is done to avoid that
happening.

Also, I think it behooves us to get some agreement on this approach
with NetEffect and Kanoj (NetXen?) at least, since their iwarp drivers
seem to be imminent.

 - R.

[PATCH for 2.6.24] SCTP: Move sysctl_sctp_[rw]mem definitions to protocol.c

2007-09-13 Thread Vlad Yasevich

The sctp_[rw]mem definitions should really be in protocol.c
since that is where they are initialized.  This also allows
one to build a kernel without sysctl support.

Signed-off-by: Vlad Yasevich <[EMAIL PROTECTED]>
---
 net/sctp/protocol.c |    6 +++---
 net/sctp/sysctl.c   |   11 +++--------
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 193835d..c49eb99 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -84,9 +84,9 @@ static struct sctp_af *sctp_af_v6_specific;
 struct kmem_cache *sctp_chunk_cachep __read_mostly;
 struct kmem_cache *sctp_bucket_cachep __read_mostly;
 
-extern int sysctl_sctp_mem[3];
-extern int sysctl_sctp_rmem[3];
-extern int sysctl_sctp_wmem[3];
+int sysctl_sctp_mem[3];
+int sysctl_sctp_rmem[3];
+int sysctl_sctp_wmem[3];
 
 /* Return the address of the control sock. */
 struct sock *sctp_get_ctl_sock(void)
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index ba75ef4..39b10ee 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -52,14 +52,9 @@ static int int_max = INT_MAX;
 static long sack_timer_min = 1;
 static long sack_timer_max = 500;
 
-int sysctl_sctp_mem[3];
-int sysctl_sctp_rmem[3];
-int sysctl_sctp_wmem[3];
-
-/*
- * per assoc memory limitationf for sends
- */
-int sysctl_sctp_wmem[3];
+extern int sysctl_sctp_mem[3];
+extern int sysctl_sctp_rmem[3];
+extern int sysctl_sctp_wmem[3];
 
 static ctl_table sctp_table[] = {
{
-- 
1.5.2.4


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Roland Dreier

 > > - My user_mad P_Key index support patch.  I'll test the ioctl to
 > >   change to the new mode and merge this I guess, since Hal and Sean
 > >   have tested this out.
 > 
 > I can give this patch a reviewed-by: too, and I will also try to review a
 > couple of the pending ipoib patches.

Thanks!

 > > - Sean's QoS changes.  These look fine at first glance, and I just
 > >   plan to understand the backwards compatibility story (ie how this
 > >   works with an old SM) and merge.  Anyone who objects let me know.
 > 
 > The new QoS fields fall into fields that are currently reserved, which
 > should be ignored by an older SM.  I've only tested this against openSM
 > however.

That seems OK -- I'm OK with breaking things if an SM is clearly buggy
(and not ignoring fields that are defined to be ignored in the spec
would certainly be a clear bug to me).

 > This patch was generated in response to an Intel MPI issue.  We've seen
 > MPI take several minutes to respond to a connection request during the
 > middle of large application runs.  When this happens, the active side
 > times out the connection.  In OFED, we added module parameters to adjust
 > the rdma_cm connection timeout on the active side, but I believe that
 > sending an MRA from the passive side is a better solution.

OK -- just to make sure I'm understanding what you're saying: have you
confirmed that your proposed patches actually fix the issue?

 - R.

Re: InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Roland Dreier

 > Since ehca can support 4K MTU, we would like to see a patch in 
 > IPoIB to allow link MTU to be up to 4K instead of current 2K for 2.6.24 
 > kernel. The idea is IPoIB link MTU will pick up a return value from SM's 
 > default broadcast MTU. This patch should be a small patch, I hope you are 
 > OK with this.

It's actually not small, since it turns the skb allocation into a
4100-byte buffer, which ends up being more than 1 page usually, which
means it fails if memory is fragmented.
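
The arithmetic behind this can be sketched as follows. The sketch assumes
4096-byte pages and takes the 4100-byte figure from the paragraph above;
alloc_order() is a hypothetical user-space helper that mimics how a buddy
allocator rounds a request up to a power-of-two number of pages. It is not
the kernel's own helper.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/*
 * Hypothetical helper: smallest order such that
 * (SKETCH_PAGE_SIZE << order) >= size.
 */
static unsigned int alloc_order(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	/* Double the span until the request fits. */
	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}
```

Under that assumption a 2048-byte buffer is an order-0 request, while a
4100-byte buffer is an order-1 request, which needs two physically
contiguous pages and is therefore the kind of allocation that can fail
once memory is fragmented.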

Anyway given the backlog anything substantial that hasn't been posted
already is almost surely going to have to wait until 2.6.25.

Re: [Lksctp-developers] [RFC v3 PATCH 2/21] SCTP: Convert bind_addr_list locking to RCU

2007-09-13 Thread Vlad Yasevich

Sridhar Samudrala wrote:
> On Thu, 2007-09-13 at 15:33 -0400, Vlad Yasevich wrote:
>> Hi Sridhar
>>
>> Sridhar Samudrala wrote:
>>> On Wed, 2007-09-12 at 15:33 -0700, Paul E. McKenney wrote:
>>>> On Wed, Sep 12, 2007 at 05:03:42PM -0400, Vlad Yasevich wrote:
>>>>> [... and here is the updated version as promised ...]
>>>>>
>>>>> Since the sctp_sockaddr_entry is now RCU enabled as part of
>>>>> the patch to synchronize sctp_localaddr_list, it makes sense to
>>>>> change all handling of these entries to RCU.  This includes the
>>>>> sctp_bind_addrs structure and its list of bound addresses.
>>>>>
>>>>> This list is currently protected by an external rw_lock and that
>>>>> looks like an overkill.  There are only 2 writers to the list:
>>>>> bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
>>>>> These are already serialized via the socket lock, so they will
>>>>> not step on each other.  These are also relatively rare, so we
>>>>> should be good with RCU.
>>>>>
>>>>> The readers are varied and they are easily converted to RCU.
>>>> Looks good from an RCU viewpoint -- I must defer to others on
>>>> the networking aspects.
>>>>
>>>> Acked-by: Paul E. McKenney <[EMAIL PROTECTED]>
>>> looks good to me too. some minor typos and some comments on
>>> RCU usage comments inline.
>>>
>>> Also, I guess we can remove the sctp_[read/write]_[un]lock macros
>>> from sctp.h now that you removed all the users of rwlocks
>>> in SCTP
>>>
>> Looks like some of the hashing calls still use sctp_write_[un]lock
>> macros, but use normal read_lock() for the read side.
>>
>> I'll clean that up after these patches are accepted.
> 
> OK. You may also consider looking into the generic inet_hashtable
> infrastructure and see if we can use it for SCTP.
> 
> 

I've had a patch set brewing for a while.  I had everything done except the
association hash.  Have been trying to figure out how to plug that one in...

If you want to take a look, I can send you what I have so far. :)

-vlad

Re: [v3 PATCH 2/2] SCTP: Convert bind_addr_list locking to RCU

2007-09-13 Thread Sridhar Samudrala

On Thu, 2007-09-13 at 15:34 -0400, Vlad Yasevich wrote:
> Since the sctp_sockaddr_entry is now RCU enabled as part of
> the patch to synchronize sctp_localaddr_list, it makes sense to
> change all handling of these entries to RCU.  This includes the
> sctp_bind_addrs structure and its list of bound addresses.
> 
> This list is currently protected by an external rw_lock and that
> looks like an overkill.  There are only 2 writers to the list:
> bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
> These are already serialized via the socket lock, so they will
> not step on each other.  These are also relatively rare, so we
> should be good with RCU.
> 
> The readers are varied and they are easily converted to RCU.
> 
> Signed-off-by: Vlad Yasevich <[EMAIL PROTECTED]>
> Acked-by: Paul E. McKenney <[EMAIL PROTECTED]>

Acked-by: Sridhar Samudrala <[EMAIL PROTECTED]>

Thanks
Sridhar
> ---
>  include/net/sctp/structs.h |7 +--
>  net/sctp/associola.c   |   14 +-
>  net/sctp/bind_addr.c   |   68 --
>  net/sctp/endpointola.c |   27 +++-
>  net/sctp/ipv6.c|   12 ++---
>  net/sctp/protocol.c|   25 ---
>  net/sctp/sm_make_chunk.c   |   18 +++-
>  net/sctp/socket.c  |   98 ---
>  8 files changed, 106 insertions(+), 163 deletions(-)
> 
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index a89e361..c2fe2dc 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -1155,7 +1155,9 @@ int sctp_bind_addr_copy(struct sctp_bind_addr *dest,
>   int flags);
>  int sctp_add_bind_addr(struct sctp_bind_addr *, union sctp_addr *,
>  __u8 use_as_src, gfp_t gfp);
> -int sctp_del_bind_addr(struct sctp_bind_addr *, union sctp_addr *);
> +int sctp_del_bind_addr(struct sctp_bind_addr *, union sctp_addr *,
> + void (*rcu_call)(struct rcu_head *,
> +   void (*func)(struct rcu_head *)));
>  int sctp_bind_addr_match(struct sctp_bind_addr *, const union sctp_addr *,
>struct sctp_sock *);
>  union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr*bp,
> @@ -1226,9 +1228,6 @@ struct sctp_ep_common {
>* bind_addr.address_list is our set of local IP addresses.
>*/
>   struct sctp_bind_addr bind_addr;
> -
> - /* Protection during address list comparisons. */
> - rwlock_t   addr_lock;
>  };
> 
> 
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index 2ad1caf..9bad8ba 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -99,7 +99,6 @@ static struct sctp_association 
> *sctp_association_init(struct sctp_association *a
> 
>   /* Initialize the bind addr area.  */
>   sctp_bind_addr_init(&asoc->base.bind_addr, ep->base.bind_addr.port);
> - rwlock_init(&asoc->base.addr_lock);
> 
>   asoc->state = SCTP_STATE_CLOSED;
> 
> @@ -937,8 +936,6 @@ struct sctp_transport *sctp_assoc_is_match(struct 
> sctp_association *asoc,
>  {
>   struct sctp_transport *transport;
> 
> - sctp_read_lock(&asoc->base.addr_lock);
> -
>   if ((htons(asoc->base.bind_addr.port) == laddr->v4.sin_port) &&
>   (htons(asoc->peer.port) == paddr->v4.sin_port)) {
>   transport = sctp_assoc_lookup_paddr(asoc, paddr);
> @@ -952,7 +949,6 @@ struct sctp_transport *sctp_assoc_is_match(struct 
> sctp_association *asoc,
>   transport = NULL;
> 
>  out:
> - sctp_read_unlock(&asoc->base.addr_lock);
>   return transport;
>  }
> 
> @@ -1376,19 +1372,13 @@ int sctp_assoc_set_bind_addr_from_cookie(struct 
> sctp_association *asoc,
>  int sctp_assoc_lookup_laddr(struct sctp_association *asoc,
>   const union sctp_addr *laddr)
>  {
> - int found;
> + int found = 0;
> 
> - sctp_read_lock(&asoc->base.addr_lock);
>   if ((asoc->base.bind_addr.port == ntohs(laddr->v4.sin_port)) &&
>   sctp_bind_addr_match(&asoc->base.bind_addr, laddr,
> -  sctp_sk(asoc->base.sk))) {
> +  sctp_sk(asoc->base.sk)))
>   found = 1;
> - goto out;
> - }
> 
> - found = 0;
> -out:
> - sctp_read_unlock(&asoc->base.addr_lock);
>   return found;
>  }
> 
> diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
> index 7fc369f..d35cbf5 100644
> --- a/net/sctp/bind_addr.c
> +++ b/net/sctp/bind_addr.c
> @@ -167,7 +167,11 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union 
> sctp_addr *new,
> 
>   INIT_LIST_HEAD(&addr->list);
>   INIT_RCU_HEAD(&addr->rcu);
> - list_add_tail(&addr->list, &bp->address_list);
> +
> + /* We always hold a socket lock when calling this function,
> +  * and that acts as a writer synchronizing lock.
> +  */
> + list_add_tail_rcu(&addr->list, &bp->address_list);
>   SCTP_DBG_OBJCNT_INC(addr);
> 
>
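
The publish step in the last quoted hunk, where list_add_tail_rcu()
replaces list_add_tail(), can be illustrated in user space. What follows
is a sketch with C11 atomics, not kernel code and not real RCU: it shows
only that a single serialized writer fully initializes a node before
making it reachable, and that readers then walk the list without taking a
lock. Grace periods and freeing of removed entries are deliberately left
out, and every name in it (struct node, struct list, publish_tail(),
count_nodes()) is hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
	int value;
	_Atomic(struct node *) next;
};

struct list {
	_Atomic(struct node *) head;
	struct node *tail;	/* touched by the writer only */
};

/*
 * Writer side.  The caller must serialize writers; in the quoted patch
 * that role is played by the socket lock.
 */
static void publish_tail(struct list *l, struct node *n, int value)
{
	/* Initialize the node completely before it becomes reachable. */
	n->value = value;
	atomic_init(&n->next, NULL);

	/* The release store makes the initialized node visible to readers. */
	if (l->tail)
		atomic_store_explicit(&l->tail->next, n, memory_order_release);
	else
		atomic_store_explicit(&l->head, n, memory_order_release);
	l->tail = n;
}

/* Reader side: lock-free traversal that pairs with the release stores. */
static int count_nodes(struct list *l)
{
	int count = 0;
	struct node *p = atomic_load_explicit(&l->head, memory_order_acquire);

	while (p) {
		count++;
		p = atomic_load_explicit(&p->next, memory_order_acquire);
	}
	return count;
}
```

A reader that runs concurrently with publish_tail() sees either the list
without the new node or the list with a fully initialized node, which is
the property the rwlock was providing for additions.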

Re: [v3 PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list

2007-09-13 Thread Sridhar Samudrala

On Thu, 2007-09-13 at 15:34 -0400, Vlad Yasevich wrote:
> sctp_localaddr_list is modified dynamically via NETDEV_UP
> and NETDEV_DOWN events, but there is no synchronization
> between writer (event handler) and readers.  As a result,
> the readers can access an entry that has been freed and
> crash the system.
> 
> Signed-off-by: Vlad Yasevich <[EMAIL PROTECTED]>
> Acked-by: Paul E. McKenney <[EMAIL PROTECTED]>

Acked-by: Sridhar Samudrala <[EMAIL PROTECTED]>

Thanks
Sridhar
> ---
>  include/net/sctp/sctp.h|1 +
>  include/net/sctp/structs.h |6 +
>  net/sctp/bind_addr.c   |2 +
>  net/sctp/ipv6.c|   34 +++
>  net/sctp/protocol.c|   54 +++
>  net/sctp/socket.c  |   38 --
>  6 files changed, 97 insertions(+), 38 deletions(-)
> 
> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> index d529045..c9cc00c 100644
> --- a/include/net/sctp/sctp.h
> +++ b/include/net/sctp/sctp.h
> @@ -123,6 +123,7 @@
>   * sctp/protocol.c
>   */
>  extern struct sock *sctp_get_ctl_sock(void);
> +extern void sctp_local_addr_free(struct rcu_head *head);
>  extern int sctp_copy_local_addr_list(struct sctp_bind_addr *,
>sctp_scope_t, gfp_t gfp,
>int flags);
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index c0d5848..a89e361 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -207,6 +207,9 @@ extern struct sctp_globals {
>* It is a list of sctp_sockaddr_entry.
>*/
>   struct list_head local_addr_list;
> +
> + /* Lock that protects the local_addr_list writers */
> + spinlock_t addr_list_lock;
>   
>   /* Flag to indicate if addip is enabled. */
>   int addip_enable;
> @@ -242,6 +245,7 @@ extern struct sctp_globals {
>  #define sctp_port_alloc_lock (sctp_globals.port_alloc_lock)
>  #define sctp_port_hashtable  (sctp_globals.port_hashtable)
>  #define sctp_local_addr_list (sctp_globals.local_addr_list)
> +#define sctp_local_addr_lock (sctp_globals.addr_list_lock)
>  #define sctp_addip_enable(sctp_globals.addip_enable)
>  #define sctp_prsctp_enable   (sctp_globals.prsctp_enable)
> 
> @@ -737,8 +741,10 @@ const union sctp_addr *sctp_source(const struct 
> sctp_chunk *chunk);
>  /* This is a structure for holding either an IPv6 or an IPv4 address.  */
>  struct sctp_sockaddr_entry {
>   struct list_head list;
> + struct rcu_head rcu;
>   union sctp_addr a;
>   __u8 use_as_src;
> + __u8 valid;
>  };
> 
>  typedef struct sctp_chunk *(sctp_packet_phandler_t)(struct sctp_association 
> *);
> diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
> index fdb287a..7fc369f 100644
> --- a/net/sctp/bind_addr.c
> +++ b/net/sctp/bind_addr.c
> @@ -163,8 +163,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union 
> sctp_addr *new,
>   addr->a.v4.sin_port = htons(bp->port);
> 
>   addr->use_as_src = use_as_src;
> + addr->valid = 1;
> 
>   INIT_LIST_HEAD(&addr->list);
> + INIT_RCU_HEAD(&addr->rcu);
>   list_add_tail(&addr->list, &bp->address_list);
>   SCTP_DBG_OBJCNT_INC(addr);
> 
> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
> index f8aa23d..e12fa0a 100644
> --- a/net/sctp/ipv6.c
> +++ b/net/sctp/ipv6.c
> @@ -77,13 +77,18 @@
> 
>  #include 
> 
> -/* Event handler for inet6 address addition/deletion events.  */
> +/* Event handler for inet6 address addition/deletion events.
> + * The sctp_local_addr_list needs to be protocted by a spin lock since
> + * multiple notifiers (say IPv4 and IPv6) may be running at the same
> + * time and thus corrupt the list.
> + * The reader side is protected with RCU.
> + */
>  static int sctp_inet6addr_event(struct notifier_block *this, unsigned long 
> ev,
>   void *ptr)
>  {
>   struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr;
> - struct sctp_sockaddr_entry *addr;
> - struct list_head *pos, *temp;
> + struct sctp_sockaddr_entry *addr = NULL;
> + struct sctp_sockaddr_entry *temp;
> 
>   switch (ev) {
>   case NETDEV_UP:
> @@ -94,19 +99,26 @@ static int sctp_inet6addr_event(struct notifier_block 
> *this, unsigned long ev,
>   memcpy(&addr->a.v6.sin6_addr, &ifa->addr,
>sizeof(struct in6_addr));
>   addr->a.v6.sin6_scope_id = ifa->idev->dev->ifindex;
> - list_add_tail(&addr->list, &sctp_local_addr_list);
> + addr->valid = 1;
> + spin_lock_bh(&sctp_local_addr_lock);
> + list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
> + spin_unlock_bh(&sctp_local_addr_lock);
>   }
>   break;
>   case NETDEV_DOWN:
> - list_for_eac

Re: [Lksctp-developers] [RFC v3 PATCH 2/21] SCTP: Convert bind_addr_list locking to RCU

2007-09-13 Thread Sridhar Samudrala

On Thu, 2007-09-13 at 15:33 -0400, Vlad Yasevich wrote:
> Hi Sridhar
> 
> Sridhar Samudrala wrote:
> > On Wed, 2007-09-12 at 15:33 -0700, Paul E. McKenney wrote:
> >> On Wed, Sep 12, 2007 at 05:03:42PM -0400, Vlad Yasevich wrote:
> >>> [... and here is the updated version as promised ...]
> >>>
> >>> Since the sctp_sockaddr_entry is now RCU enabled as part of
> >>> the patch to synchronize sctp_localaddr_list, it makes sense to
> >>> change all handling of these entries to RCU.  This includes the
> >>> sctp_bind_addrs structure and its list of bound addresses.
> >>>
> >>> This list is currently protected by an external rw_lock and that
> >>> looks like an overkill.  There are only 2 writers to the list:
> >>> bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
> >>> These are already serialized via the socket lock, so they will
> >>> not step on each other.  These are also relatively rare, so we
> >>> should be good with RCU.
> >>>
> >>> The readers are varied and they are easily converted to RCU.
> >> Looks good from an RCU viewpoint -- I must defer to others on
> >> the networking aspects.
> >>
> >> Acked-by: Paul E. McKenney <[EMAIL PROTECTED]>
> > 
> > looks good to me too. some minor typos and some comments on
> > RCU usage comments inline.
> > 
> > Also, I guess we can remove the sctp_[read/write]_[un]lock macros
> > from sctp.h now that you removed all the users of rwlocks
> > in SCTP
> > 
> 
> Looks like some of the hashing calls still use sctp_write_[un]lock
> macros, but use normal read_lock() for the read side.
> 
> I'll clean that up after these patches are accepted.

OK. You may also consider looking into the generic inet_hashtable
infrastructure and see if we can use it for SCTP.

Thanks
Sridhar


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Jeff Garzik


Steve Wise wrote:
> Jeff Garzik wrote:
>> Steve Wise wrote:
>>> I was about to post v2 of my patch to avoid port space collisions
>>> with the native stack.  Can we get that 2.6.24?  It is high priority
>>> IMO. I've tried to solicit review on it, but I think folks are
>>> reluctant... ;-)
>>
>> Well, if it involves /sharing/ port space with the native stack, i.e.
>> where port 1234 is IB but 1235 is Linux, pretty much all the
>> networking devs have NAK'd that approach AFAICS.
>
> Jeff, I posted a fix that doesn't do this.  No port sharing.  The iwarp
> device will use its own ip address and subnet to avoid collisions.  You
> should review the patch when I post v2.

Sounds promising, then!  :)

	Jeff




Re: [ofa-general] [PATCH v2] iw_cxgb3: Support "iwarp-only" interfaces to avoid 4-tuple conflicts.

2007-09-13 Thread Sean Hefty


> The iWARP driver must translate all listens on address 0.0.0.0 to the
> set of rdma-only ip addresses for the device in question.  This prevents
> incoming connect requests to the TCP ipaddresses from going up the
> rdma stack.

I've only given this a high level review at this point, and while the
patch looks okay on first pass, is there a way to move some of this
functionality to either the rdma_cm or iw_cm?  I don't like the idea of
every iwarp driver having to implement address/listen list maintenance.
I may have some ideas after re-examining it.

> Implementation Details:

There are a couple of areas that I made a note to look at in more detail
(because I didn't understand everything that was happening), but I did
have one minor nit - most uses of list_del_init can just be list_del.


- Sean

Re: [BUG] tg3 cannot do PXE (loses MAC address) after soft reboot

2007-09-13 Thread Michael Chan

On Thu, 2007-09-13 at 21:28 +0200, Lucas Nussbaum wrote:

> Erm, Wouldn't it be possible to print a warning when the driver loads,
> saying that the firmware is outdated ?

It's possible, but would require the driver to parse the version string.
The driver currently reports the version string for information and for
the human to parse it.
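
For illustration only, here is a minimal userspace sketch of what such
parsing could look like.  It assumes the version string has the form
"<chip>-v<major>.<minor><suffix>", as in the "5703-v2.21a" reported
earlier in this thread; real tg3 firmware strings may use other layouts.
The names parse_fw_version() and fw_is_older() are hypothetical and are
not part of the tg3 driver, and any minimum version passed in would have
to come from the vendor.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for the numeric part of a firmware version. */
struct fw_version {
	unsigned int major;
	unsigned int minor;
};

/*
 * Parse a string such as "5703-v2.21a" (assumed layout:
 * <chip>-v<major>.<minor><suffix>).  Returns 0 on success and -1 if
 * the string does not match that layout.  A trailing suffix such as
 * the "a" above is ignored.
 */
static int parse_fw_version(const char *s, struct fw_version *out)
{
	unsigned int chip, major, minor;

	assert(s != NULL && out != NULL);
	if (sscanf(s, "%u-v%u.%u", &chip, &major, &minor) != 3)
		return -1;
	out->major = major;
	out->minor = minor;
	return 0;
}

/* Return 1 if v is strictly older than min_major.min_minor. */
static int fw_is_older(const struct fw_version *v,
		       unsigned int min_major, unsigned int min_minor)
{
	assert(v != NULL);
	return v->major < min_major ||
	       (v->major == min_major && v->minor < min_minor);
}
```

A caller would parse the reported string once and print a warning when
fw_is_older() returns 1.  Note that the sketch compares the minor number
numerically, so "2.21" is treated as newer than "2.3"; whether that
matches the vendor's numbering is an assumption.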


[v3 PATCH 2/2] SCTP: Convert bind_addr_list locking to RCU

2007-09-13 Thread Vlad Yasevich

Since the sctp_sockaddr_entry is now RCU enabled as part of
the patch to synchronize sctp_localaddr_list, it makes sense to
change all handling of these entries to RCU.  This includes the
sctp_bind_addrs structure and its list of bound addresses.

This list is currently protected by an external rw_lock and that
looks like overkill.  There are only 2 writers to the list:
bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
These are already serialized via the socket lock, so they will
not step on each other.  These are also relatively rare, so we
should be good with RCU.

The readers are varied and they are easily converted to RCU.

Signed-off-by: Vlad Yasevich <[EMAIL PROTECTED]>
Acked-by: Paul E. McKenney <[EMAIL PROTECTED]>
---
 include/net/sctp/structs.h |7 +--
 net/sctp/associola.c   |   14 +-
 net/sctp/bind_addr.c   |   68 --
 net/sctp/endpointola.c |   27 +++-
 net/sctp/ipv6.c|   12 ++---
 net/sctp/protocol.c|   25 ---
 net/sctp/sm_make_chunk.c   |   18 +++-
 net/sctp/socket.c  |   98 ---
 8 files changed, 106 insertions(+), 163 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index a89e361..c2fe2dc 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1155,7 +1155,9 @@ int sctp_bind_addr_copy(struct sctp_bind_addr *dest,
int flags);
 int sctp_add_bind_addr(struct sctp_bind_addr *, union sctp_addr *,
   __u8 use_as_src, gfp_t gfp);
-int sctp_del_bind_addr(struct sctp_bind_addr *, union sctp_addr *);
+int sctp_del_bind_addr(struct sctp_bind_addr *, union sctp_addr *,
+   void (*rcu_call)(struct rcu_head *,
+ void (*func)(struct rcu_head *)));
 int sctp_bind_addr_match(struct sctp_bind_addr *, const union sctp_addr *,
 struct sctp_sock *);
 union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr  *bp,
@@ -1226,9 +1228,6 @@ struct sctp_ep_common {
 * bind_addr.address_list is our set of local IP addresses.
 */
struct sctp_bind_addr bind_addr;
-
-   /* Protection during address list comparisons. */
-   rwlock_t   addr_lock;
 };
 
 
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 2ad1caf..9bad8ba 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -99,7 +99,6 @@ static struct sctp_association *sctp_association_init(struct 
sctp_association *a
 
/* Initialize the bind addr area.  */
sctp_bind_addr_init(&asoc->base.bind_addr, ep->base.bind_addr.port);
-   rwlock_init(&asoc->base.addr_lock);
 
asoc->state = SCTP_STATE_CLOSED;
 
@@ -937,8 +936,6 @@ struct sctp_transport *sctp_assoc_is_match(struct 
sctp_association *asoc,
 {
struct sctp_transport *transport;
 
-   sctp_read_lock(&asoc->base.addr_lock);
-
if ((htons(asoc->base.bind_addr.port) == laddr->v4.sin_port) &&
(htons(asoc->peer.port) == paddr->v4.sin_port)) {
transport = sctp_assoc_lookup_paddr(asoc, paddr);
@@ -952,7 +949,6 @@ struct sctp_transport *sctp_assoc_is_match(struct 
sctp_association *asoc,
transport = NULL;
 
 out:
-   sctp_read_unlock(&asoc->base.addr_lock);
return transport;
 }
 
@@ -1376,19 +1372,13 @@ int sctp_assoc_set_bind_addr_from_cookie(struct 
sctp_association *asoc,
 int sctp_assoc_lookup_laddr(struct sctp_association *asoc,
const union sctp_addr *laddr)
 {
-   int found;
+   int found = 0;
 
-   sctp_read_lock(&asoc->base.addr_lock);
if ((asoc->base.bind_addr.port == ntohs(laddr->v4.sin_port)) &&
sctp_bind_addr_match(&asoc->base.bind_addr, laddr,
-sctp_sk(asoc->base.sk))) {
+sctp_sk(asoc->base.sk)))
found = 1;
-   goto out;
-   }
 
-   found = 0;
-out:
-   sctp_read_unlock(&asoc->base.addr_lock);
return found;
 }
 
diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
index 7fc369f..d35cbf5 100644
--- a/net/sctp/bind_addr.c
+++ b/net/sctp/bind_addr.c
@@ -167,7 +167,11 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union 
sctp_addr *new,
 
INIT_LIST_HEAD(&addr->list);
INIT_RCU_HEAD(&addr->rcu);
-   list_add_tail(&addr->list, &bp->address_list);
+
+   /* We always hold a socket lock when calling this function,
+* and that acts as a writer synchronizing lock.
+*/
+   list_add_tail_rcu(&addr->list, &bp->address_list);
SCTP_DBG_OBJCNT_INC(addr);
 
return 0;
@@ -176,23 +180,35 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union 
sctp_addr *new,
 /* Delete an address from the bind address list in the SCTP_bind_addr
  * structure.
  */
-int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *del_

[v3 PATCH 0/2] Add RCU locking to SCTP address management

2007-09-13 Thread Vlad Yasevich

Hi All

Thanks to Sridhar Samudrala and Paul McKenney for all the help and comments.
I think this is the final version, unless someone else can spot more problems.
I've run this under heavy load and the patches behave well.

I think patch 1 is a candidate for 2.6.23 since it fixes a bug, but splitting
these seems a bit odd to me.  I'll leave it to DaveM to decide where to
put them.

Thanks
-vlad

[v3 PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list

2007-09-13 Thread Vlad Yasevich

sctp_localaddr_list is modified dynamically via NETDEV_UP
and NETDEV_DOWN events, but there is no synchronization
between the writer (event handler) and readers.  As a result,
the readers can access an entry that has been freed and
crash the system.

Signed-off-by: Vlad Yasevich <[EMAIL PROTECTED]>
Acked-by: Paul E. McKenney <[EMAIL PROTECTED]>
---
 include/net/sctp/sctp.h|1 +
 include/net/sctp/structs.h |6 +
 net/sctp/bind_addr.c   |2 +
 net/sctp/ipv6.c|   34 +++
 net/sctp/protocol.c|   54 +++
 net/sctp/socket.c  |   38 --
 6 files changed, 97 insertions(+), 38 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index d529045..c9cc00c 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -123,6 +123,7 @@
  * sctp/protocol.c
  */
 extern struct sock *sctp_get_ctl_sock(void);
+extern void sctp_local_addr_free(struct rcu_head *head);
 extern int sctp_copy_local_addr_list(struct sctp_bind_addr *,
 sctp_scope_t, gfp_t gfp,
 int flags);
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index c0d5848..a89e361 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -207,6 +207,9 @@ extern struct sctp_globals {
 * It is a list of sctp_sockaddr_entry.
 */
struct list_head local_addr_list;
+
+   /* Lock that protects the local_addr_list writers */
+   spinlock_t addr_list_lock;

/* Flag to indicate if addip is enabled. */
int addip_enable;
@@ -242,6 +245,7 @@ extern struct sctp_globals {
 #define sctp_port_alloc_lock   (sctp_globals.port_alloc_lock)
 #define sctp_port_hashtable(sctp_globals.port_hashtable)
 #define sctp_local_addr_list   (sctp_globals.local_addr_list)
+#define sctp_local_addr_lock   (sctp_globals.addr_list_lock)
 #define sctp_addip_enable  (sctp_globals.addip_enable)
 #define sctp_prsctp_enable (sctp_globals.prsctp_enable)
 
@@ -737,8 +741,10 @@ const union sctp_addr *sctp_source(const struct sctp_chunk 
*chunk);
 /* This is a structure for holding either an IPv6 or an IPv4 address.  */
 struct sctp_sockaddr_entry {
struct list_head list;
+   struct rcu_head rcu;
union sctp_addr a;
__u8 use_as_src;
+   __u8 valid;
 };
 
 typedef struct sctp_chunk *(sctp_packet_phandler_t)(struct sctp_association *);
diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
index fdb287a..7fc369f 100644
--- a/net/sctp/bind_addr.c
+++ b/net/sctp/bind_addr.c
@@ -163,8 +163,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union 
sctp_addr *new,
addr->a.v4.sin_port = htons(bp->port);
 
addr->use_as_src = use_as_src;
+   addr->valid = 1;
 
INIT_LIST_HEAD(&addr->list);
+   INIT_RCU_HEAD(&addr->rcu);
list_add_tail(&addr->list, &bp->address_list);
SCTP_DBG_OBJCNT_INC(addr);
 
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index f8aa23d..e12fa0a 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -77,13 +77,18 @@
 
 #include 
 
-/* Event handler for inet6 address addition/deletion events.  */
+/* Event handler for inet6 address addition/deletion events.
+ * The sctp_local_addr_list needs to be protocted by a spin lock since
+ * multiple notifiers (say IPv4 and IPv6) may be running at the same
+ * time and thus corrupt the list.
+ * The reader side is protected with RCU.
+ */
 static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev,
void *ptr)
 {
struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr;
-   struct sctp_sockaddr_entry *addr;
-   struct list_head *pos, *temp;
+   struct sctp_sockaddr_entry *addr = NULL;
+   struct sctp_sockaddr_entry *temp;
 
switch (ev) {
case NETDEV_UP:
@@ -94,19 +99,26 @@ static int sctp_inet6addr_event(struct notifier_block 
*this, unsigned long ev,
memcpy(&addr->a.v6.sin6_addr, &ifa->addr,
 sizeof(struct in6_addr));
addr->a.v6.sin6_scope_id = ifa->idev->dev->ifindex;
-   list_add_tail(&addr->list, &sctp_local_addr_list);
+   addr->valid = 1;
+   spin_lock_bh(&sctp_local_addr_lock);
+   list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
+   spin_unlock_bh(&sctp_local_addr_lock);
}
break;
case NETDEV_DOWN:
-   list_for_each_safe(pos, temp, &sctp_local_addr_list) {
-   addr = list_entry(pos, struct sctp_sockaddr_entry, 
list);
-   if (ipv6_addr_equal(&addr->a.v6.sin6_addr, &ifa->addr)) 
{
-   list_del(pos);
-

Re: [Lksctp-developers] [RFC v3 PATCH 2/21] SCTP: Convert bind_addr_list locking to RCU

2007-09-13 Thread Vlad Yasevich

Hi Sridhar

Sridhar Samudrala wrote:
> On Wed, 2007-09-12 at 15:33 -0700, Paul E. McKenney wrote:
>> On Wed, Sep 12, 2007 at 05:03:42PM -0400, Vlad Yasevich wrote:
>>> [... and here is the updated version as promised ...]
>>>
>>> Since the sctp_sockaddr_entry is now RCU enabled as part of
>>> the patch to synchronize sctp_localaddr_list, it makes sense to
>>> change all handling of these entries to RCU.  This includes the
>>> sctp_bind_addrs structure and its list of bound addresses.
>>>
>>> This list is currently protected by an external rw_lock and that
>>> looks like overkill.  There are only 2 writers to the list:
>>> bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
>>> These are already serialized via the socket lock, so they will
>>> not step on each other.  These are also relatively rare, so we
>>> should be good with RCU.
>>>
>>> The readers are varied and they are easily converted to RCU.
>> Looks good from an RCU viewpoint -- I must defer to others on
>> the networking aspects.
>>
>> Acked-by: Paul E. McKenney <[EMAIL PROTECTED]>
> 
> Looks good to me too. Some minor typos and some comments on
> RCU usage inline.
> 
> Also, I guess we can remove the sctp_[read/write]_[un]lock macros
> from sctp.h now that you have removed all the users of rwlocks
> in SCTP.
> 

Looks like some of the hashing calls still use sctp_write_[un]lock
macros, but use normal read_lock() for the read side.

I'll clean that up after these patches are accepted.

-vlad

Re: [BUG] tg3 cannot do PXE (loses MAC address) after soft reboot

2007-09-13 Thread Lucas Nussbaum

On 13/09/07 at 11:05 -0700, Michael Chan wrote:
> On Thu, 2007-09-13 at 17:41 +0200, Lucas Nussbaum wrote:
> 
> > # ethtool -i eth0
> > driver: tg3
> > version: 3.65
> > firmware-version: 5703-v2.21a
> > bus-info: :02:02.0
> 
> The firmware is quite old and needs to be upgraded to fix the problem.
> I'll have someone contact you to get it upgraded.

Erm, Wouldn't it be possible to print a warning when the driver loads,
saying that the firmware is outdated ?
-- 
| Lucas Nussbaum                             PhD student |
| [EMAIL PROTECTED]                  LIG / Projet MESCAL |
| jabber: [EMAIL PROTECTED]         +33 (0)6 64 71 41 65 |
| homepage:             http://www-id.imag.fr/~nussbaum/ |

[PATCH v2] iw_cxgb3: Support "iwarp-only" interfaces to avoid 4-tuple conflicts.

2007-09-13 Thread Steve Wise


iw_cxgb3: Support "iwarp-only" interfaces to avoid 4-tuple conflicts.

Version 2:

- added a per-device mutex for the address and listening endpoints lists.

- wait for all replies if sending multiple passive_open requests to rnic.

- log warning if no addresses are available when a listen is issued.

- tested

---

Design:

The sysadmin creates "for iwarp use only" alias interfaces of the form
"devname:iw*" where devname is the native interface name (eg eth0) for the
iwarp netdev device.  The alias label can be anything starting with "iw".
The "iw" immediately after the ':' is the key used by the iw_cxgb3 driver.

EG:
ifconfig eth0 192.168.70.123 up
ifconfig eth0:iw1 192.168.71.123 up
ifconfig eth0:iw2 192.168.72.123 up

In the above example, 192.168.70/24 is for TCP traffic, while
192.168.71/24 and 192.168.72/24 are for iWARP/RDMA use.

The rdma-only interface must be on its own IP subnet. This allows routing
all rdma traffic onto this interface.

The iWARP driver must translate all listens on address 0.0.0.0 to the
set of rdma-only ip addresses for the device in question.  This prevents
incoming connect requests to the TCP ipaddresses from going up the
rdma stack.
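
The label check described above can be tried in userspace.  The
following is a minimal sketch that mirrors the is_iwarp_label() helper
from the patch below, with a const-qualified argument and an assert
added as assumptions for standalone use.  It treats a label as
iwarp-only when the two characters right after the first ':' are "iw".

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if an interface label such as "eth0:iw1" names an
 * iwarp-only alias, 0 otherwise.  Only the two characters that
 * follow the first ':' are examined.
 */
static int is_iwarp_label(const char *label)
{
	const char *colon;

	assert(label != NULL);
	colon = strchr(label, ':');
	if (colon != NULL && strncmp(colon + 1, "iw", 2) == 0)
		return 1;
	return 0;
}
```

With the example configuration above, "eth0:iw1" and "eth0:iw2" match,
while plain "eth0" (no alias part) and an alias such as "eth0:1" do not.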

Implementation Details:

- The iw_cxgb3 driver registers for inetaddr events via
register_inetaddr_notifier().  This allows tracking the iwarp-only
addresses/subnets as they get added and deleted.  The iwarp driver
maintains a list of the current iwarp-only addresses.

- The iw_cxgb3 driver builds the list of iwarp-only addresses for its
devices at module insert time.  This is needed because the inetaddr
notifier callbacks don't "replay" address-add events when someone
registers.  So the driver must build the initial list at module load time.

- When a listen is done on address 0.0.0.0, then the iw_cxgb3 driver
must translate that into a set of listens on the iwarp-only addresses.
This is implemented by maintaining a list of stid/addr entries per
listening endpoint.

- When a new iwarp-only address is added or removed, the iw_cxgb3 driver
must traverse the set of listening endpoints and update them accordingly.
This allows an application to bind to 0.0.0.0 prior to the iwarp-only
interfaces being configured.  It also allows changing the iwarp-only set
of addresses and getting the expected behavior for apps already bound
to 0.0.0.0.  This is done by maintaining a list of listening endpoints
off the device struct.

- The address list, the listening endpoint list, and each list of
stid/addrs in use per listening endpoint are all protected via a mutex
per iw_cxgb3 device.
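
As a rough illustration of the wildcard translation described in the
list above, here is a userspace sketch.  It is not the driver code:
expand_listen() and ADDR_ANY are hypothetical names, addresses are
plain 32-bit values, and the sketch assumes that a listen on a specific
address is passed through unchanged, which the description above does
not spell out.  Locking and the per-endpoint stid bookkeeping are
omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_ANY 0u	/* 0.0.0.0 */

/*
 * Fill out[] with the addresses that one listen request maps to and
 * return how many were written (at most out_cap).
 *
 * A wildcard listen expands to every iwarp-only address currently
 * known for the device; an empty list yields 0, the case in which the
 * v2 notes say a warning is logged.  Any other address is returned
 * as-is (an assumption of this sketch).
 */
static size_t expand_listen(uint32_t listen_addr,
			    const uint32_t *iw_addrs, size_t num_iw,
			    uint32_t *out, size_t out_cap)
{
	size_t i, n = 0;

	assert(out != NULL);
	assert(iw_addrs != NULL || num_iw == 0);

	if (listen_addr != ADDR_ANY) {
		if (out_cap > 0)
			out[n++] = listen_addr;
		return n;
	}
	for (i = 0; i < num_iw && n < out_cap; i++)
		out[n++] = iw_addrs[i];
	return n;
}
```

When an iwarp-only address is later added or removed, the same
expansion would be rerun for each endpoint bound to 0.0.0.0 so that its
set of listens follows the address list.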

Signed-off-by: Steve Wise <[EMAIL PROTECTED]>
---

 drivers/infiniband/hw/cxgb3/iwch.c|  125 
 drivers/infiniband/hw/cxgb3/iwch.h|   11 +
 drivers/infiniband/hw/cxgb3/iwch_cm.c |  259 +++--
 drivers/infiniband/hw/cxgb3/iwch_cm.h |   15 ++
 4 files changed, 360 insertions(+), 50 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch.c 
b/drivers/infiniband/hw/cxgb3/iwch.c
index 0315c9d..296fb66 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.c
+++ b/drivers/infiniband/hw/cxgb3/iwch.c
@@ -63,6 +63,123 @@ struct cxgb3_client t3c_client = {
 static LIST_HEAD(dev_list);
 static DEFINE_MUTEX(dev_mutex);
 
+static void insert_ifa(struct iwch_dev *rnicp, struct in_ifaddr *ifa)
+{
+   struct iwch_addrlist *addr;
+
+   addr = kmalloc(sizeof *addr, GFP_KERNEL);
+   if (!addr) {
+   printk(KERN_ERR MOD "%s - failed to alloc memory!\n",
+  __FUNCTION__);
+   return;
+   }
+   addr->ifa = ifa;
+   mutex_lock(&rnicp->mutex);
+   list_add_tail(&addr->entry, &rnicp->addrlist);
+   mutex_unlock(&rnicp->mutex);
+}
+
+static void remove_ifa(struct iwch_dev *rnicp, struct in_ifaddr *ifa)
+{
+   struct iwch_addrlist *addr, *tmp;
+
+   mutex_lock(&rnicp->mutex);
+   list_for_each_entry_safe(addr, tmp, &rnicp->addrlist, entry) {
+   if (addr->ifa == ifa) {
+   list_del_init(&addr->entry);
+   kfree(addr);
+   goto out;
+   }
+   }
+out:
+   mutex_unlock(&rnicp->mutex);
+}
+
+static int netdev_is_ours(struct iwch_dev *rnicp, struct net_device *netdev)
+{
+   int i;
+
+   for (i = 0; i < rnicp->rdev.port_info.nports; i++)
+   if (netdev == rnicp->rdev.port_info.lldevs[i])
+   return 1;
+   return 0;
+}
+
+static inline int is_iwarp_label(char *label)
+{
+   char *colon;
+
+   colon = strchr(label, ':');
+   if (colon && !strncmp(colon+1, "iw", 2))
+   return 1;
+   return 0;
+}
+
+static int nb_callback(struct notifier_block *self, unsigned long event,
+  void *ctx)
+{
+   struct in_ifaddr *ifa = ctx;
+   struct iwch_dev *rnicp = container_of(self, struct iwch_dev, nb);
+
+   PDBG("%s rnicp %p event %lx\n", __FUNCTION__, rnicp, event);
+
+

Network Namespace status

2007-09-13 Thread Eric W. Biederman


Now that the network namespace work is partly merged I figure
a short status summary of where everything is at is in order.

David Miller has merged the core of the network namespace work
and that probably needs to sit just a little while to make certain
we don't have unexpected breakage.

Before enabling multiple instances of the network namespace
it is necessary to sort through a few last user interface issues.

In Greg KH's tree there is work from Tejun and myself that decouples
the sysfs dentry tree from the kobject tree, and Tejun is actively
working on completing that decoupling.  From the current sysfs state
it takes just a handful of patches to support multiple super_blocks
each displaying the network devices for a different network namespace.
And the last round of patches that did that Tejun and I almost agree
upon.  That support is needed before we can allow network devices
to exist in anything except the initial network namespace.

In Andrew's tree there is the start of my sysctl cleanup.  Basically
just an additional sanity check in register_sysctl_table and a bunch
of fixes to avoid the errors that sanity check has found.  Pending
I have a few more general cleanups and code to support multiple
network namespaces.  Last we talked Andrew said I have sent
him enough sysctl changes for now, and to wait until after the
merge window before sending more.

The proc support in the net-2.6.24 tree is reasonable from the
direction of the networking code.  Currently I am looking at
"current->net_ns" and resolving /proc/net based upon that.  Long term
we want to refactor that code so that "current->net_ns" is captured
when we mount /proc.  So the network namespace state can be monitored
from outside applications, and so that we aren't playing dangerous
games with the vfs dentry trees.

The final blocker to having multiple useful instances of network
namespaces is the loopback device.  We recognize the network namespace
of incoming packets by looking at dev->nd_net, which means that for
packets to loop back properly within a network namespace we need a
loopback device per network namespace.  Some concerns were expressed
a few weeks ago when we posted the cleanup part of the patches that
allowed for multiple loopback devices, so resolving this one
may be tricky.


Looking into my patch queue I have:
5 patches for cleaning up and making a per network namespace loopback device.
4 patches for making rtnetlink message processing per network namespace
1 patch for making AF_UNIX per network namespace
1 patch for making AF_PACKET per network namespace

The ipv4 part of my patchset is currently working but it needs some
more cleanup and reordering of patches before it is ready to go anywhere.
Nothing has been done for ipv6, but the changes should very much parallel
ipv4.

The other protocols I haven't even looked at yet.

Eric

Re: [PATCH] Add IP1000A Driver

2007-09-13 Thread Francois Romieu

?-Jesse <[EMAIL PROTECTED]> :
[...]
> I wish to list three people you, me and, my leader Sorbica in this file.

Yes.

-- 
Ueimor

Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Steve Wise




Jeff Garzik wrote:
> Steve Wise wrote:
>> I was about to post v2 of my patch to avoid port space collisions with
>> the native stack.  Can we get that 2.6.24?  It is high priority IMO.
>> I've tried to solicit review on it, but I think folks are reluctant...
>> ;-)
>
> Well, if it involves /sharing/ port space with the native stack, i.e.
> where port 1234 is IB but 1235 is Linux, pretty much all the networking
> devs have NAK'd that approach AFAICS.

Jeff, I posted a fix that doesn't do this.  No port sharing.  The iwarp
device will use its own ip address and subnet to avoid collisions.  You
should review the patch when I post v2.


Thanks,

Steve.

Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Jeff Garzik


Steve Wise wrote:
> I was about to post v2 of my patch to avoid port space collisions with
> the native stack.  Can we get that 2.6.24?  It is high priority IMO.
> I've tried to solicit review on it, but I think folks are reluctant... ;-)

Well, if it involves /sharing/ port space with the native stack, i.e.
where port 1234 is IB but 1235 is Linux, pretty much all the networking
devs have NAK'd that approach AFAICS.


Jeff




[v2 PATCH for 2.6.24] SCTP: Implement the Supported Extensions Parameter

2007-09-13 Thread Vlad Yasevich

[... i can't seem to spell to save my life lately...]

SCTP Supported Extensions parameter is specified in Section 4.2.7
of the ADD-IP draft (soon to be RFC).  The parameter is
encoded as:

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |     Parameter Type = 0x8008   |      Parameter Length         |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | CHUNK TYPE 1  |  CHUNK TYPE 2 |  CHUNK TYPE 3 |  CHUNK TYPE 4 |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | CHUNK TYPE N  |      PAD      |      PAD      |      PAD      |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

It contains a list of chunks that a particular SCTP extension
uses.  Current extensions supported are Partial Reliability
(FWD-TSN) and ADD-IP (ASCONF and ASCONF-ACK).

When implementing new extensions (AUTH, PKT-DROP, etc..), new
chunks need to be added to this parameter.  Parameter processing
would be modified to negotiate support for these new features.
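
To make the encoding concrete, here is a small userspace sketch that
serializes the parameter into a byte buffer.  It is not the kernel code
from this patch: build_supported_ext() and the CID_* names are
hypothetical, and the chunk type values (FWD-TSN 0xC0, ASCONF 0xC1,
ASCONF-ACK 0x80) are quoted from the PR-SCTP and ADD-IP specifications
as I understand them, so please check them against the documents.  The
sketch also assumes the usual SCTP rule that Parameter Length counts
the 4-byte header plus the chunk list but not the padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PARAM_SUPPORTED_EXT	0x8008u	/* parameter type from the diagram */
#define PARAM_HDR_LEN		4u	/* 2-byte type + 2-byte length */

/* Chunk type values as assumed in the paragraph above. */
enum {
	CID_ASCONF_ACK	= 0x80,
	CID_FWD_TSN	= 0xC0,
	CID_ASCONF	= 0xC1
};

/*
 * Write a Supported Extensions parameter into buf in network byte
 * order.  Returns the number of bytes written, including zero padding
 * up to a 4-byte boundary, or 0 if the result would not fit.
 */
static size_t build_supported_ext(uint8_t *buf, size_t buflen,
				  const uint8_t *chunks, size_t num_chunks)
{
	size_t len = PARAM_HDR_LEN + num_chunks;	/* unpadded length */
	size_t padded = (len + 3u) & ~(size_t)3u;

	assert(buf != NULL);
	assert(chunks != NULL || num_chunks == 0);
	if (len > 0xffffu || padded > buflen)
		return 0;

	buf[0] = (uint8_t)(PARAM_SUPPORTED_EXT >> 8);
	buf[1] = (uint8_t)(PARAM_SUPPORTED_EXT & 0xffu);
	buf[2] = (uint8_t)(len >> 8);
	buf[3] = (uint8_t)(len & 0xffu);
	if (num_chunks > 0)
		memcpy(buf + PARAM_HDR_LEN, chunks, num_chunks);
	memset(buf + len, 0, padded - len);	/* PAD bytes */
	return padded;
}
```

With PR-SCTP and ADD-IP both enabled the chunk list holds three
entries, so Parameter Length is 7 and one PAD byte brings the parameter
to 8 bytes on the wire.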

Signed-off-by: Vlad Yasevich <[EMAIL PROTECTED]>
---
 include/linux/sctp.h   |9 
 include/net/sctp/structs.h |1 +
 net/sctp/sm_make_chunk.c   |   91 +++-
 3 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index d70df61..f4d717b 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -180,6 +180,9 @@ typedef enum {
SCTP_PARAM_SUPPORTED_ADDRESS_TYPES  = __constant_htons(12),
SCTP_PARAM_ECN_CAPABLE  = __constant_htons(0x8000),
 
+   /* Add-IP: Supported Extensions, Section 4.2 */
+   SCTP_PARAM_SUPPORTED_EXT= __constant_htons(0x8008),
+
/* PR-SCTP Sec 3.1 */
SCTP_PARAM_FWD_TSN_SUPPORT  = __constant_htons(0xc000),
 
@@ -296,6 +299,12 @@ typedef struct sctp_adaptation_ind_param {
__be32 adaptation_ind;
 } __attribute__((packed)) sctp_adaptation_ind_param_t;
 
+/* ADDIP Section 4.2.7 Supported Extensions Parameter */
+typedef struct sctp_supported_ext_param {
+   struct sctp_paramhdr param_hdr;
+   __u8 chunks[0];
+} __attribute__((packed)) sctp_supported_ext_param_t;
+
 /* RFC 2960.  Section 3.3.3 Initiation Acknowledgement (INIT ACK) (2):
  *   The INIT ACK chunk is used to acknowledge the initiation of an SCTP
  *   association.
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index c0d5848..46d215b 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -435,6 +435,7 @@ union sctp_params {
struct sctp_ipv6addr_param *v6;
union sctp_addr_param *addr;
struct sctp_adaptation_ind_param *aind;
+   struct sctp_supported_ext_param *ext;
 };
 
 /* RFC 2960.  Section 3.3.5 Heartbeat.
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 79856c9..3d8f85f 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -179,6 +179,9 @@ struct sctp_chunk *sctp_make_init(const struct 
sctp_association *asoc,
sctp_supported_addrs_param_t sat;
__be16 types[2];
sctp_adaptation_ind_param_t aiparam;
+   sctp_supported_ext_param_t ext_param;
+   int num_ext = 0;
+   __u8 extensions[3];
 
/* RFC 2960 3.3.2 Initiation (INIT) (1)
 *
@@ -202,11 +205,31 @@ struct sctp_chunk *sctp_make_init(const struct 
sctp_association *asoc,
 
chunksize = sizeof(init) + addrs_len + SCTP_SAT_LEN(num_types);
chunksize += sizeof(ecap_param);
-   if (sctp_prsctp_enable)
+   if (sctp_prsctp_enable) {
chunksize += sizeof(prsctp_param);
+   extensions[num_ext] = SCTP_CID_FWD_TSN;
+   num_ext += 1;
+   }
+   /* ADDIP: Section 4.2.7:
+*  An implementation supporting this extension [ADDIP] MUST list
+*  the ASCONF,the ASCONF-ACK, and the AUTH  chunks in its INIT and
+*  INIT-ACK parameters.
+*  XXX: We don't support AUTH just yet, so don't list it.  AUTH
+*  support should add it.
+*/
+   if (sctp_addip_enable) {
+   extensions[num_ext] = SCTP_CID_ASCONF;
+   extensions[num_ext+1] = SCTP_CID_ASCONF_ACK;
+   num_ext += 2;
+   }
+
chunksize += sizeof(aiparam);
chunksize += vparam_len;
 
+   /* If we have any extensions to report, account for that */
+   if (num_ext)
+   chunksize += sizeof(sctp_supported_ext_param_t) + num_ext;
+
/* RFC 2960 3.3.2 Initiation (INIT) (1)
 *
 * Note 3: An INIT chunk MUST NOT contain more than one Host
@@ -241,12 +264,27 @@ struct sctp_chunk *sctp_make_init(con

[PATCH for 2.6.24] SCTP: Implete the Supported Extensions Parameter

2007-09-13 Thread Vlad Yasevich

SCTP Supported Extensions parameter is specified in Section 4.2.7
of the ADD-IP draft (soon to be RFC).  The parameter is
encoded as:

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |     Parameter Type = 0x8008   |      Parameter Length         |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | CHUNK TYPE 1  |  CHUNK TYPE 2 |  CHUNK TYPE 3 |  CHUNK TYPE 4 |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | CHUNK TYPE N  |      PAD      |      PAD      |      PAD      |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

It contains the list of chunks that a particular SCTP extension
uses.  Current extensions supported are Partial Reliability
(FWD-TSN) and ADD-IP (ASCONF and ASCONF-ACK).

When implementing new extensions (AUTH, PKT-DROP, etc..), new
chunks need to be added to this parameter.  Parameter processing
would be modified to negotiate support for these features.

Signed-off-by: Vlad Yasevich <[EMAIL PROTECTED]>
---
 include/linux/sctp.h   |9 
 include/net/sctp/structs.h |1 +
 net/sctp/sm_make_chunk.c   |   91 +++-
 3 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index d70df61..f4d717b 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -180,6 +180,9 @@ typedef enum {
SCTP_PARAM_SUPPORTED_ADDRESS_TYPES  = __constant_htons(12),
SCTP_PARAM_ECN_CAPABLE  = __constant_htons(0x8000),
 
+   /* Add-IP: Supported Extensions, Section 4.2 */
+   SCTP_PARAM_SUPPORTED_EXT= __constant_htons(0x8008),
+
/* PR-SCTP Sec 3.1 */
SCTP_PARAM_FWD_TSN_SUPPORT  = __constant_htons(0xc000),
 
@@ -296,6 +299,12 @@ typedef struct sctp_adaptation_ind_param {
__be32 adaptation_ind;
 } __attribute__((packed)) sctp_adaptation_ind_param_t;
 
+/* ADDIP Section 4.2.7 Supported Extensions Parameter */
+typedef struct sctp_supported_ext_param {
+   struct sctp_paramhdr param_hdr;
+   __u8 chunks[0];
+} __attribute__((packed)) sctp_supported_ext_param_t;
+
 /* RFC 2960.  Section 3.3.3 Initiation Acknowledgement (INIT ACK) (2):
  *   The INIT ACK chunk is used to acknowledge the initiation of an SCTP
  *   association.
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index c0d5848..46d215b 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -435,6 +435,7 @@ union sctp_params {
struct sctp_ipv6addr_param *v6;
union sctp_addr_param *addr;
struct sctp_adaptation_ind_param *aind;
+   struct sctp_supported_ext_param *ext;
 };
 
 /* RFC 2960.  Section 3.3.5 Heartbeat.
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 79856c9..3d8f85f 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -179,6 +179,9 @@ struct sctp_chunk *sctp_make_init(const struct 
sctp_association *asoc,
sctp_supported_addrs_param_t sat;
__be16 types[2];
sctp_adaptation_ind_param_t aiparam;
+   sctp_supported_ext_param_t ext_param;
+   int num_ext = 0;
+   __u8 extensions[3];
 
/* RFC 2960 3.3.2 Initiation (INIT) (1)
 *
@@ -202,11 +205,31 @@ struct sctp_chunk *sctp_make_init(const struct 
sctp_association *asoc,
 
chunksize = sizeof(init) + addrs_len + SCTP_SAT_LEN(num_types);
chunksize += sizeof(ecap_param);
-   if (sctp_prsctp_enable)
+   if (sctp_prsctp_enable) {
chunksize += sizeof(prsctp_param);
+   extensions[num_ext] = SCTP_CID_FWD_TSN;
+   num_ext += 1;
+   }
+   /* ADDIP: Section 4.2.7:
+*  An implementation supporting this extension [ADDIP] MUST list
+*  the ASCONF,the ASCONF-ACK, and the AUTH  chunks in its INIT and
+*  INIT-ACK parameters.
+*  XXX: We don't support AUTH just yet, so don't list it.  AUTH
+*  support should add it.
+*/
+   if (sctp_addip_enable) {
+   extensions[num_ext] = SCTP_CID_ASCONF;
+   extensions[num_ext+1] = SCTP_CID_ASCONF_ACK;
+   num_ext += 2;
+   }
+
chunksize += sizeof(aiparam);
chunksize += vparam_len;
 
+   /* If we have any extensions to report, account for that */
+   if (num_ext)
+   chunksize += sizeof(sctp_supported_ext_param_t) + num_ext;
+
/* RFC 2960 3.3.2 Initiation (INIT) (1)
 *
 * Note 3: An INIT chunk MUST NOT contain more than one Host
@@ -241,12 +264,27 @@ struct sctp_chunk *sctp_make_init(const struct 
sctp_association *asoc,
sctp_addto_ch
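
As an illustration of the parameter this patch describes, here is a minimal
userspace sketch of how a Supported Extensions parameter could be laid out:
a 2-byte type (0x8008, as in the patch), a 2-byte length, then one byte per
supported chunk type.  The demo_* names are hypothetical and are not the
kernel's helpers; the chunk type values (0xC0, 0xC1, 0x80) are assumed to
match the SCTP_CID_* constants the patch refers to.  The sketch does not add
the 4-byte padding a real chunk builder would apply after the parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, mirroring the constants named in the patch. */
#define DEMO_PARAM_SUPPORTED_EXT 0x8008u
#define DEMO_CID_FWD_TSN         0xC0u
#define DEMO_CID_ASCONF          0xC1u
#define DEMO_CID_ASCONF_ACK      0x80u

/*
 * Write a Supported Extensions parameter into out[] and return its length
 * (header plus chunk list, without padding).  Returns 0 when there is
 * nothing to advertise or the buffer is too small.
 */
static size_t demo_build_supported_ext(uint8_t *out, size_t out_len,
                                       int prsctp_enable, int addip_enable)
{
    uint8_t ext[3];
    size_t num_ext = 0;
    size_t len, i;

    assert(out != NULL);

    if (prsctp_enable)
        ext[num_ext++] = DEMO_CID_FWD_TSN;
    if (addip_enable) {
        ext[num_ext++] = DEMO_CID_ASCONF;
        ext[num_ext++] = DEMO_CID_ASCONF_ACK;
    }
    if (num_ext == 0)
        return 0;

    len = 4 + num_ext;              /* parameter header is 4 bytes */
    if (out_len < len)
        return 0;

    /* Type and length are carried in network byte order. */
    out[0] = (uint8_t)(DEMO_PARAM_SUPPORTED_EXT >> 8);
    out[1] = (uint8_t)(DEMO_PARAM_SUPPORTED_EXT & 0xff);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)(len & 0xff);
    for (i = 0; i < num_ext; i++)
        out[4 + i] = ext[i];

    return len;
}
```

With both features enabled the sketch emits 7 bytes; with neither, it emits
nothing, which matches the "if (num_ext)" accounting in the patch.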

incorrect cksum with tcp/udp on lo with 2.6.20/2.6.21/2.6.22

2007-09-13 Thread Krzysztof Oledzki


Hello,

It seems that after some not very recent changes, UDP and TCP packets 
carrying data sent over loopback have an incorrect checksum:


UDP:
# echo test|nc -u 127.0.0.1 

# tcpdump -i lo -n -v -v port 
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
19:43:39.340576 IP (tos 0x0, ttl  64, id 15179, offset 0, flags [DF], proto: UDP 
(17), length: 33) 127.0.0.1.49512 > 127.0.0.1.: [bad udp cksum 174c!] UDP, 
length 5

TCP:
# echo test|nc 127.0.0.1 

tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
*Correct:
19:44:27.692614 IP (tos 0x0, ttl  64, id 32100, offset 0, flags [DF], proto: TCP (6), 
length: 60) 127.0.0.1.53804 > 127.0.0.1.: S, cksum 0xfd54 (correct), 
3426125135:3426125135(0) win 32792 
19:44:27.692674 IP (tos 0x0, ttl  64, id 0, offset 0, flags [DF], proto: TCP (6), 
length: 60) 127.0.0.1. > 127.0.0.1.53804: S, cksum 0xea3f (correct), 
3427916955:3427916955(0) ack 3426125136 win 32768 
19:44:27.692711 IP (tos 0x0, ttl  64, id 32101, offset 0, flags [DF], proto: TCP (6), 
length: 52) 127.0.0.1.53804 > 127.0.0.1.: ., cksum 0xd263 (correct), 1:1(0) ack 1 
win 257 

*Incorrect:
19:44:27.692831 IP (tos 0x0, ttl  64, id 32102, offset 0, flags [DF], proto: TCP (6), 
length: 57) 127.0.0.1.53804 > 127.0.0.1.: P, cksum 0xfe2d (incorrect (-> 0xe07c), 
1:6(5) ack 1 win 257 

*Correct:
19:44:27.692859 IP (tos 0x0, ttl  64, id 9399, offset 0, flags [DF], proto: TCP (6), 
length: 52) 127.0.0.1. > 127.0.0.1.53804: ., cksum 0xd25f (correct), 1:1(0) ack 6 
win 256 

Tested on:
 - 2.6.22.6
 - 2.6.21.7
 - 2.6.20.11

Best regards,


Krzysztof Olędzki
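
For readers unfamiliar with what tcpdump is checking in the captures above,
here is a minimal sketch of the Internet checksum (the ones' complement sum
described in RFC 1071).  It only shows what "correct" and "incorrect" mean
for the cksum field; it does not diagnose why the loopback packets carry the
value they do.  The function name is hypothetical, and a real TCP/UDP
checksum also covers a pseudo-header, which this sketch leaves to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Ones' complement checksum over len bytes, treating the data as a
 * sequence of big-endian 16-bit words (odd trailing byte padded with 0).
 */
static uint16_t demo_inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL || len == 0);

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A receiver verifies a packet by running the same sum over the data with the
transmitted checksum included; a result of zero means the checksum matches.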

Re: InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Shirley Ma

Hello Roland,
 
Since ehca can support a 4K MTU, we would like to see a patch in 
IPoIB that allows the link MTU to be up to 4K instead of the current 2K 
for the 2.6.24 kernel. The idea is that the IPoIB link MTU will be taken 
from the SM's default broadcast MTU. This should be a small patch; I hope 
you are OK with this.

Thanks
Shirley

RE: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Sean Hefty

> - My user_mad P_Key index support patch.  I'll test the ioctl to
>   change to the new mode and merge this I guess, since Hal and Sean
>   have tested this out.

I can give this patch a reviewed-by: too, and I will also try to review a couple
of the pending ipoib patches.

> - Sean's QoS changes.  These look fine at first glance, and I just
>   plan to understand the backwards compatibility story (ie how this
>   works with an old SM) and merge.  Anyone who objects let me know.

The new QoS fields fall into fields that are currently reserved, which should be
ignored by an older SM.  I've only tested this against openSM however.

> - Sean's IB CM MRA interface changes.  Don't know at this point.  It
>   seems OK but I'm not clear on what if any real-world improvement
>   this gives us.

This patch was generated in response to an Intel MPI issue.  We've seen MPI take
several minutes to respond to a connection request during the middle of large
application runs.  When this happens, the active side times out the connection.
In OFED, we added module parameters to adjust the rdma_cm connection timeout on
the active side, but I believe that sending an MRA from the passive side is a
better solution.

- Sean

Re: [RFC v3 PATCH 2/21] SCTP: Convert bind_addr_list locking to RCU

2007-09-13 Thread Vlad Yasevich

Hi Sridhar

Sridhar Samudrala wrote:
> 
> looks good to me too. some minor typos and some comments on
> RCU usage comments inline.
> 
> Also, I guess we can remove the sctp_[read/write]_[un]lock macros
> from sctp.h now that you removed the all the users of rwlocks
> in SCTP

OK.  I guess I'll pull them.

> 
> Thanks
> Sridhar
>>> Signed-off-by: Vlad Yasevich <[EMAIL PROTECTED]>
>>> ---
>>>  include/net/sctp/structs.h |7 +--
>>>  net/sctp/associola.c   |   14 +-
>>>  net/sctp/bind_addr.c   |   68 --
>>>  net/sctp/endpointola.c |   27 +++-
>>>  net/sctp/ipv6.c|   12 ++---
>>>  net/sctp/protocol.c|   25 ---
>>>  net/sctp/sm_make_chunk.c   |   18 +++-
>>>  net/sctp/socket.c  |   98 
>>> ---
>>>  8 files changed, 106 insertions(+), 163 deletions(-)
>>>
>>>
>>> diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
>>> index 7fc369f..d16055f 100644
>>> --- a/net/sctp/bind_addr.c
>>> +++ b/net/sctp/bind_addr.c
>>> @@ -167,7 +167,11 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, 
>>> union sctp_addr *new,
>>>
>>> INIT_LIST_HEAD(&addr->list);
>>> INIT_RCU_HEAD(&addr->rcu);
>>> -   list_add_tail(&addr->list, &bp->address_list);
>>> +
>>> +   /* We always hold a socket lock when calling this function,
>>> +* so rcu_read_lock is not needed.
>>> +*/
>>> +   list_add_tail_rcu(&addr->list, &bp->address_list);
> 
> I am little confused with the comment above.
> Isn't this an update-side of RCU. If so, this should be protected
> by a spin-lock or a mutex rather than rcu_read_lock().
> 

Yes, the comment is confusing.  I put it there because I removed the
rcu_read_lock() that was taken in the prior version of the patch.  The
comment should really say that, since the socket lock is held, we don't
need another synchronizing spin lock in this case.

>>> SCTP_DBG_OBJCNT_INC(addr);
>>>
>>> return 0;
>>> @@ -176,23 +180,35 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, 
>>> union sctp_addr *new,
>>>  /* Delete an address from the bind address list in the SCTP_bind_addr
>>>   * structure.
>>>   */
>>> -int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr 
>>> *del_addr)
>>> +int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr 
>>> *del_addr,
>>> +   void (*rcu_call)(struct rcu_head *head,
>>> +void (*func)(struct rcu_head *head)))
>>>  {
>>> -   struct list_head *pos, *temp;
>>> -   struct sctp_sockaddr_entry *addr;
>>> +   struct sctp_sockaddr_entry *addr, *temp;
>>>
>>> -   list_for_each_safe(pos, temp, &bp->address_list) {
>>> -   addr = list_entry(pos, struct sctp_sockaddr_entry, list);
>>> +   /* We hold the socket lock when calling this function, so
>>> +* rcu_read_lock is not needed.
>>> +*/
> 
> Same as above. This is also an update-side of RCU protected 
> by socket lock.

Same reason.  Prior versions used rcu_spin_lock and I was just making a
note that it is no longer needed.  I'll remove the comment.

> 
>>> +   list_for_each_entry_safe(addr, temp, &bp->address_list, list) {
>>> if (sctp_cmp_addr_exact(&addr->a, del_addr)) {
>>> /* Found the exact match. */
>>> -   list_del(pos);
>>> -   kfree(addr);
>>> -   SCTP_DBG_OBJCNT_DEC(addr);
>>> -
>>> -   return 0;
>>> +   addr->valid = 0;
>>> +   list_del_rcu(&addr->list);
>>> +   break;
>>> }
>>> }
>>>
>>> +   /* Call the rcu callback provided in the args.  This function is
>>> +* called by both BH packet processing and user side socket option
>>> +* processing, but it works on different lists in those 2 contexts.
>>> +* Each context provides it's own callback, whether call_rc_bh()
> s/call_rc_bh/call_rcu_bh

yep.

> 
>>> +* or call_rcu(), to make sure that we wait an for appropriate time.
> s/an for/for an

yep.  fat fingered...

>>> @@ -295,20 +285,17 @@ struct sctp_association *sctp_endpoint_lookup_assoc(
>>>  int sctp_endpoint_is_peeled_off(struct sctp_endpoint *ep,
>>> const union sctp_addr *paddr)
>>>  {
>>> -   struct list_head *pos;
>>> struct sctp_sockaddr_entry *addr;
>>> struct sctp_bind_addr *bp;
>>>
>>> -   sctp_read_lock(&ep->base.addr_lock);
>>> bp = &ep->base.bind_addr;
>>> -   list_for_each(pos, &bp->address_list) {
>>> -   addr = list_entry(pos, struct sctp_sockaddr_entry, list);
>>> -   if (sctp_has_association(&addr->a, paddr)) {
>>> -   sctp_read_unlock(&ep->base.addr_lock);
>>> +   /* This function is called whith the socket lock held,
> s/whith/with

ok


Thanks
-vlad
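
The update-side point discussed in this thread can be sketched in userspace
C.  This is not the kernel RCU API: C11 atomics stand in for
list_add_tail_rcu() and the reader's dereference, all demo_* names are
hypothetical, and writer serialization is simply assumed (the caller plays
the role of the code that already holds the socket lock).  Deletion, which
needs a grace period before freeing, is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_addr_entry {
    int addr;
    _Atomic(struct demo_addr_entry *) next;
};

struct demo_addr_list {
    _Atomic(struct demo_addr_entry *) head;
};

static void demo_addr_list_init(struct demo_addr_list *l)
{
    atomic_init(&l->head, NULL);
}

/*
 * Update side.  The caller must already serialize writers (the role the
 * socket lock plays in the patch), so no extra lock is taken here.
 */
static void demo_addr_list_add_tail(struct demo_addr_list *l,
                                    struct demo_addr_entry *e)
{
    _Atomic(struct demo_addr_entry *) *link = &l->head;
    struct demo_addr_entry *cur;

    assert(e != NULL);

    /* Fully initialize the entry before it becomes reachable. */
    atomic_store_explicit(&e->next, NULL, memory_order_relaxed);

    while ((cur = atomic_load_explicit(link, memory_order_relaxed)) != NULL)
        link = &cur->next;

    /* Release store publishes the initialized entry to readers. */
    atomic_store_explicit(link, e, memory_order_release);
}

/* Read side: lockless traversal, safe against concurrent tail adds. */
static int demo_addr_list_contains(struct demo_addr_list *l, int addr)
{
    struct demo_addr_entry *cur =
        atomic_load_explicit(&l->head, memory_order_acquire);

    while (cur != NULL) {
        if (cur->addr == addr)
            return 1;
        cur = atomic_load_explicit(&cur->next, memory_order_acquire);
    }
    return 0;
}
```

The design point is the same one raised in the review: readers need no lock,
but writers still need mutual exclusion from each other, and that exclusion
comes from a lock the callers already hold rather than from anything on the
read side.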


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Steve Wise


Hey Roland,

I was about to post v2 of my patch to avoid port space collisions with 
the native stack.  Can we get that into 2.6.24?  It is high priority IMO. 
I've tried to solicit review on it, but I think folks are reluctant... ;-)


Steve.




Re: [RFC v3 PATCH 2/21] SCTP: Convert bind_addr_list locking to RCU

2007-09-13 Thread Sridhar Samudrala

On Wed, 2007-09-12 at 15:33 -0700, Paul E. McKenney wrote:
> On Wed, Sep 12, 2007 at 05:03:42PM -0400, Vlad Yasevich wrote:
> > [... and here is the updated version as promissed ...]
> > 
> > Since the sctp_sockaddr_entry is now RCU enabled as part of
> > the patch to synchronize sctp_localaddr_list, it makes sense to
> > change all handling of these entries to RCU.  This includes the
> > sctp_bind_addrs structure and it's list of bound addresses.
> > 
> > This list is currently protected by an external rw_lock and that
> > looks like an overkill.  There are only 2 writers to the list:
> > bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
> > These are already seriealized via the socket lock, so they will
> > not step on each other.  These are also relatively rare, so we
> > should be good with RCU.
> > 
> > The readers are varied and they are easily converted to RCU.
> 
> Looks good from an RCU viewpoint -- I must defer to others on
> the networking aspects.
> 
> Acked-by: Paul E. McKenney <[EMAIL PROTECTED]>

Looks good to me too.  Some minor typos and some comments on the
RCU usage comments inline.

Also, I guess we can remove the sctp_[read/write]_[un]lock macros
from sctp.h now that you have removed all the users of rwlocks
in SCTP.

Thanks
Sridhar
> 
> > Signed-off-by: Vlad Yasevich <[EMAIL PROTECTED]>
> > ---
> >  include/net/sctp/structs.h |7 +--
> >  net/sctp/associola.c   |   14 +-
> >  net/sctp/bind_addr.c   |   68 --
> >  net/sctp/endpointola.c |   27 +++-
> >  net/sctp/ipv6.c|   12 ++---
> >  net/sctp/protocol.c|   25 ---
> >  net/sctp/sm_make_chunk.c   |   18 +++-
> >  net/sctp/socket.c  |   98 
> > ---
> >  8 files changed, 106 insertions(+), 163 deletions(-)
> > 
> > diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> > index a89e361..c2fe2dc 100644
> > --- a/include/net/sctp/structs.h
> > +++ b/include/net/sctp/structs.h
> > @@ -1155,7 +1155,9 @@ int sctp_bind_addr_copy(struct sctp_bind_addr *dest,
> > int flags);
> >  int sctp_add_bind_addr(struct sctp_bind_addr *, union sctp_addr *,
> >__u8 use_as_src, gfp_t gfp);
> > -int sctp_del_bind_addr(struct sctp_bind_addr *, union sctp_addr *);
> > +int sctp_del_bind_addr(struct sctp_bind_addr *, union sctp_addr *,
> > +   void (*rcu_call)(struct rcu_head *,
> > + void (*func)(struct rcu_head *)));
> >  int sctp_bind_addr_match(struct sctp_bind_addr *, const union sctp_addr *,
> >  struct sctp_sock *);
> >  union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr  *bp,
> > @@ -1226,9 +1228,6 @@ struct sctp_ep_common {
> >  * bind_addr.address_list is our set of local IP addresses.
> >  */
> > struct sctp_bind_addr bind_addr;
> > -
> > -   /* Protection during address list comparisons. */
> > -   rwlock_t   addr_lock;
> >  };
> > 
> > 
> > diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> > index 2ad1caf..9bad8ba 100644
> > --- a/net/sctp/associola.c
> > +++ b/net/sctp/associola.c
> > @@ -99,7 +99,6 @@ static struct sctp_association 
> > *sctp_association_init(struct sctp_association *a
> > 
> > /* Initialize the bind addr area.  */
> > sctp_bind_addr_init(&asoc->base.bind_addr, ep->base.bind_addr.port);
> > -   rwlock_init(&asoc->base.addr_lock);
> > 
> > asoc->state = SCTP_STATE_CLOSED;
> > 
> > @@ -937,8 +936,6 @@ struct sctp_transport *sctp_assoc_is_match(struct 
> > sctp_association *asoc,
> >  {
> > struct sctp_transport *transport;
> > 
> > -   sctp_read_lock(&asoc->base.addr_lock);
> > -
> > if ((htons(asoc->base.bind_addr.port) == laddr->v4.sin_port) &&
> > (htons(asoc->peer.port) == paddr->v4.sin_port)) {
> > transport = sctp_assoc_lookup_paddr(asoc, paddr);
> > @@ -952,7 +949,6 @@ struct sctp_transport *sctp_assoc_is_match(struct 
> > sctp_association *asoc,
> > transport = NULL;
> > 
> >  out:
> > -   sctp_read_unlock(&asoc->base.addr_lock);
> > return transport;
> >  }
> > 
> > @@ -1376,19 +1372,13 @@ int sctp_assoc_set_bind_addr_from_cookie(struct 
> > sctp_association *asoc,
> >  int sctp_assoc_lookup_laddr(struct sctp_association *asoc,
> > const union sctp_addr *laddr)
> >  {
> > -   int found;
> > +   int found = 0;
> > 
> > -   sctp_read_lock(&asoc->base.addr_lock);
> > if ((asoc->base.bind_addr.port == ntohs(laddr->v4.sin_port)) &&
> > sctp_bind_addr_match(&asoc->base.bind_addr, laddr,
> > -sctp_sk(asoc->base.sk))) {
> > +sctp_sk(asoc->base.sk)))
> > found = 1;
> > -   goto out;
> > -   }
> > 
> > -   found = 0;
> > -out:
> > -   sctp_read_unlock(&asoc->base.addr_lock);
> > return found;
> >  }
> > 
> > diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
> > index 7

InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Roland Dreier

With 2.6.24 probably opening in the not-too-distant future, it's
probably a good time to review what my plans are for when the merge
window opens.

At the kernel summit, we discussed patch review (doing a web search
for "kernel summit" "reviewed-by:" should turn up lots of info on
this).  Due to an unfortunate combination of vacation and conference
travel, summer colds, and other inconveniences, I am very backed up on
reviewing.  And in any case, I've allowed too much code review to be
dumped on me -- when there are dozens of people working on IB and RDMA
stuff, it obviously doesn't work to expect me to do all the reviewing.

Unfortunately, due to the length of the backlog and the fact that
2.6.23 seems fairly close, some of the things listed below are going
to miss the 2.6.24 merge window.  So, although the plan is to phase in
requiring "Reviewed-by:" gently, for this merge, if you can get
someone other than me to review your work, then the chances of it
being merged increase dramatically.  I'm talking about a real review--
ideally, someone independent (from another company would be good) who
is willing to provide a "Reviewed-by:" line that means the reviewer
has really looked at and thought about the patch.  There should be a
mailing list thread you can point me at where the reviewer comments on
the patch and a new version of that patch addressing all comments is
posted (or in exceptional cases, where the patch is perfect to start
with, where the reviewer says the patch is great).

For example, given the number of IPoIB changes pending, it might be a
good idea for the people submitting them to get together and trade
reviews (ie "If you review my patch, I'll review your patch").  There
are a few cases where getting a review may not be necessary.  First of
all, trivial and obvious patches don't need a review.  It's a
judgement call what is trivial or obvious, and it's always a good idea
to provide a changelog that makes it clear why a patch is trivial and
obviously correct.  Second, hardware driver patches may not make sense
to anyone outside of the company whose hardware the driver is for.
Still, in this case, an internal Reviewed-by: would be nice, and also
a changelog that explains the reason for the change always helps
(don't just tell me what your patch does, but also explain what the
patch fixes and what the impact of the current situation is).

Anyway, here are all the pending things that I'm aware of.  As usual,
if something isn't already in my tree and isn't listed below, I
probably missed it or dropped it by mistake.  Please remind me again
in that case.

Core:

 - My user_mad P_Key index support patch.  I'll test the ioctl to
   change to the new mode and merge this I guess, since Hal and Sean
   have tested this out.

 - A fix to the user_mad 32-bit big-endian userspace 64/32 problem
   with the method_mask when registering agents.  I'll write a patch
   to handle this in a way that doesn't change the ABI for anything
   other than the broken case and hope to get someone to review this
   so it can be merged.

 - Sean's QoS changes.  These look fine at first glance, and I just
   plan to understand the backwards compatibility story (ie how this
   works with an old SM) and merge.  Anyone who objects let me know.

 - Sean's IB CM MRA interface changes.  Don't know at this point.  It
   seems OK but I'm not clear on what if any real-world improvement
   this gives us.

ULPs:

 - Pradeep's IPoIB CM support for devices that don't have SRQs.  I
   think the basic approach makes sense (I don't think faking SRQs at
   some other layer is really feasible) and I need to find time to
   look at the details to see if the current patch looks workable.  I'm
   likely to merge this; getting an independent Reviewed-by: would
   certainly be appreciated too.

 - Moni's IPoIB bonding support.  This seems mostly an issue of
   getting the core bonding maintainer's attention.  However getting a
   Reviewed-by: for the IPoIB changes wouldn't hurt too.

 - Rolf's IPoIB MGID scope changes.  Certainly we want to fix this
   issue but the specific changes need review.

 - Eli and Michael's IPoIB stateless offload (checksum offload, LSO,
   LRO, etc).  It's a big series that makes quite a few core changes.
   I think it needs some careful review and is probably at risk of
   missing this merge window.  Sorting in order of invasiveness so we
   can merge at least some of it (if splitting it makes sense) might
   be a good idea.

HW specific:

 - I already merged patches to enable MSI-X by default for mthca and
   mlx4.  I hope there aren't too many systems that get hosed if a
   MSI-X interrupt is generated.

 - Jack and Michael's mlx4 FMR support.  Will merge I guess, although
   I do hope to have time to address the DMA API abuse that is being
   copied from mthca, so that mlx4 and mthca work in Xen domU.

 - ehca patch queue.  Will merge, pending fixes for the few minor
   issues I commented on.

 - Steve's mthca rou

Re: [BUG] tg3 cannot do PXE (loses MAC address) after soft reboot

2007-09-13 Thread Michael Chan

On Thu, 2007-09-13 at 17:41 +0200, Lucas Nussbaum wrote:

> # ethtool -i eth0
> driver: tg3
> version: 3.65
> firmware-version: 5703-v2.21a
> bus-info: :02:02.0

The firmware is quite old and needs to be upgraded to fix the problem.
I'll have someone contact you to get it upgraded.

> 
> What do you mean by "what machine" ? The systems are Dell PowerEdge
> 1600SC, but the NICs were bought separately AFAIK.

I assumed the device was on-board which would normally require a BIOS
upgrade.  For NICs, it's easier.


Re: [PATCH] [RFC] allow admin/users to specify rto_min in milliseconds rather than jiffies

2007-09-13 Thread Rick Jones


Your observations are correct. rtnetlink can't/shouldn't be doing conversions
itself.  The 'ip' command should use a consistent unit for all values and
do conversions if necessary.


That being the case I'll start looking to see what is involved in 
"leveraging" the time conversion stuff in tc for use in ip.


rick jones
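
To make the unit problem concrete, here is a small sketch of a
milliseconds-to-ticks conversion of the kind a userspace tool would need if
the kernel interface takes jiffies.  The function names and the explicit hz
parameter are hypothetical; this is not the conversion code in tc or ip.

```c
#include <assert.h>
#include <stdint.h>

/* Convert milliseconds to clock ticks at the given tick rate, rounding up. */
static uint32_t demo_ms_to_ticks(uint32_t ms, uint32_t hz)
{
    assert(hz > 0);
    return (uint32_t)(((uint64_t)ms * hz + 999) / 1000);
}

/* Convert clock ticks back to milliseconds, rounding down. */
static uint32_t demo_ticks_to_ms(uint32_t ticks, uint32_t hz)
{
    assert(hz > 0);
    return (uint32_t)(((uint64_t)ticks * 1000) / hz);
}
```

A 200 ms rto_min is 20 ticks at HZ=100, 50 ticks at HZ=250 and 200 ticks at
HZ=1000, which is why a raw jiffies value is not a portable unit for an
admin to type.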

[PATCH] net: Fix the prototype of call_netdevice_notifiers

2007-09-13 Thread Eric W. Biederman


This replaces the void * parameter with a struct net_device * which
is what is actually required.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 include/linux/netdevice.h |2 +-
 net/core/dev.c|4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0106fa6..90aecc3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -779,7 +779,7 @@ extern void free_netdev(struct net_device *dev);
 extern voidsynchronize_net(void);
 extern int register_netdevice_notifier(struct notifier_block *nb);
 extern int unregister_netdevice_notifier(struct notifier_block 
*nb);
-extern int call_netdevice_notifiers(unsigned long val, void *v);
+extern int call_netdevice_notifiers(unsigned long val, struct net_device *dev);
 extern struct net_device   *dev_get_by_index(struct net *net, int ifindex);
 extern struct net_device   *__dev_get_by_index(struct net *net, int 
ifindex);
 extern int dev_restart(struct net_device *dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index f119dc0..cc343dd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1206,9 +1206,9 @@ int unregister_netdevice_notifier(struct notifier_block 
*nb)
  * are as for raw_notifier_call_chain().
  */
 
-int call_netdevice_notifiers(unsigned long val, void *v)
+int call_netdevice_notifiers(unsigned long val, struct net_device *dev)
 {
-   return raw_notifier_call_chain(&netdev_chain, val, v);
+   return raw_notifier_call_chain(&netdev_chain, val, dev);
 }
 
 /* When > 0 there are consumers of rx skb time stamps */
-- 
1.5.3.rc6.17.g1911
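
The benefit of the prototype change can be sketched outside the kernel.  In
this hypothetical userspace model (all demo_* names are invented for
illustration), the generic chain still passes a void pointer to handlers,
but the wrapper that callers use only accepts a device pointer, so passing
some unrelated object becomes a compile-time diagnostic instead of a silent
bug.

```c
#include <assert.h>
#include <stddef.h>

struct demo_net_device {
    const char *name;
    int events_seen;
};

typedef int (*demo_notifier_fn)(unsigned long event, void *data);

/* A handler that, like real netdevice notifiers, assumes data is a device. */
static int demo_count_handler(unsigned long event, void *data)
{
    struct demo_net_device *dev = data;

    (void)event;
    dev->events_seen++;
    return 0;
}

static demo_notifier_fn demo_chain[] = { demo_count_handler };

/* Generic chain walker: type information is lost here by design. */
static int demo_raw_call_chain(unsigned long event, void *data)
{
    int ret = 0;
    size_t i;

    for (i = 0; i < sizeof(demo_chain) / sizeof(demo_chain[0]); i++)
        ret = demo_chain[i](event, data);
    return ret;
}

/* Typed wrapper: callers can only hand in a struct demo_net_device *. */
static int demo_call_netdevice_notifiers(unsigned long event,
                                         struct demo_net_device *dev)
{
    assert(dev != NULL);
    return demo_raw_call_chain(event, dev);
}
```

With a void * parameter, a call such as passing the address of an int would
compile cleanly; with the typed wrapper the compiler flags it.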


Re: [PATCH] sb1250-mac.c: De-typedef, de-volatile, de-etc...

2007-09-13 Thread Jeff Garzik


Ralf Baechle wrote:

On Thu, Sep 13, 2007 at 03:13:06PM +0100, Maciej W. Rozycki wrote:

 Hmm, works fine with linux-2.6.git#master.  I do not recall any recent 
activity with this driver -- I wonder what the difference is.  Let me 
see...


Hmm...  HEAD du jour has no differences for the sb1250-mac between lmo
and kernel.org.


Net driver patches should apply on top of netdev-2.6.git#upstream, which 
is where changes to net drivers are queued for the next release.


The closer we get to the merge window, the greater the diff between 
netdev-2.6.git#upstream and linux-2.6.git#master, so 
linux-2.6.git#master is not a useful comparison.


Jeff




Re: [BUG] tg3 cannot do PXE (loses MAC address) after soft reboot

2007-09-13 Thread Lucas Nussbaum

On 13/09/07 at 08:15 -0700, Michael Chan wrote:
> Lucas Nussbaum wrote:
> 
> > This used to work, and broke between 2.6.16 and 2.6.17. Using 
> > git bissect,
> > I could trace this back to that commit:
> > commit bc1c756741b065cfebf850e4164c0e2aae9d527f
> > Author: Michael Chan <[EMAIL PROTECTED]>
> > Date:   Mon Mar 20 17:48:03 2006 -0800
> > [TG3]: Support shutdown WoL.
> 
> This may be caused by bugs in early versions of bootcode or
> PXE code.  When tg3 powers down the PHY during shutdown, the
> MAC address will become zero in the MAC address register.  The
> PXE code or bootcode needs to fetch the MAC address again from
> The NVRAM.
> 
> Can you also send me ethtool -i eth0 which will provide the
> bootcode version?  What machine are you using?  Thanks.

# ethtool -i eth0
driver: tg3
version: 3.65
firmware-version: 5703-v2.21a
bus-info: :02:02.0

What do you mean by "what machine" ? The systems are Dell PowerEdge
1600SC, but the NICs were bought separately AFAIK.
-- 
| Lucas Nussbaum              PhD student |
| [EMAIL PROTECTED]           LIG / Projet MESCAL |
| jabber: [EMAIL PROTECTED]   +33 (0)6 64 71 41 65 |
| homepage: http://www-id.imag.fr/~nussbaum/ |

[PATCH] ucc_geth: fix compilation

2007-09-13 Thread Anton Vorontsov

Currently qe_bd_t is used in a call to the dma_unmap_single macro,
which is a no-op on PPC32, so the error is hidden today. Starting
with 2.6.24, the macro will be replaced by an empty static function,
and the erroneous use of qe_bd_t will trigger a compilation error.

Signed-off-by: Anton Vorontsov <[EMAIL PROTECTED]>
---

Reposting this to include netdev in Cc.

 drivers/net/ucc_geth.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 12e01b2..9a38dfe 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -2148,7 +2148,7 @@ static void ucc_geth_memclean(struct ucc_geth_private 
*ugeth)
for (j = 0; j < ugeth->ug_info->bdRingLenTx[i]; j++) {
if (ugeth->tx_skbuff[i][j]) {
dma_unmap_single(NULL,
-((qe_bd_t *)bd)->buf,
+((struct qe_bd *)bd)->buf,
 (in_be32((u32 *)bd) &
  BD_LENGTH_MASK),
 DMA_TO_DEVICE);
-- 
1.5.0.6
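
The reason the error stays hidden can be shown with a small standalone
sketch.  A macro that expands to nothing discards its argument tokens before
the compiler ever sees them, so even a cast to a type that does not exist
compiles; an empty static inline function has its arguments parsed and
type-checked.  The names here are hypothetical stand-ins, not the PPC32 DMA
API.

```c
#include <assert.h>
#include <stddef.h>

struct demo_qe_bd {
    unsigned int status_len;
    unsigned int buf;
};

/* No-op as a macro: the arguments are thrown away by the preprocessor. */
#define demo_unmap_as_macro(dev, addr, size) do { } while (0)

/* No-op as a function: the arguments must still be valid expressions. */
static inline void demo_unmap_as_function(void *dev, unsigned int addr,
                                          size_t size)
{
    (void)dev;
    (void)addr;
    (void)size;
}

static int demo_memclean(void *bd)
{
    assert(bd != NULL);

    /*
     * demo_no_such_type_t is not declared anywhere, yet this line
     * compiles because the macro never lets the compiler see it.
     */
    demo_unmap_as_macro(NULL, ((demo_no_such_type_t *)bd)->buf, 4);

    /*
     * The same mistake here would be a compile error, so the cast has
     * to name a real type.
     */
    demo_unmap_as_function(NULL, ((struct demo_qe_bd *)bd)->buf, 4);

    return 1;
}
```

Swapping the undeclared type into the second call is enough to reproduce the
class of build failure the patch is heading off.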

Re: [BUG] tg3 cannot do PXE (loses MAC address) after soft reboot

2007-09-13 Thread Michael Chan

Lucas Nussbaum wrote:

> This used to work, and broke between 2.6.16 and 2.6.17. Using 
> git bissect,
> I could trace this back to that commit:
> commit bc1c756741b065cfebf850e4164c0e2aae9d527f
> Author: Michael Chan <[EMAIL PROTECTED]>
> Date:   Mon Mar 20 17:48:03 2006 -0800
> [TG3]: Support shutdown WoL.

This may be caused by bugs in early versions of bootcode or
PXE code.  When tg3 powers down the PHY during shutdown, the
MAC address will become zero in the MAC address register.  The
PXE code or bootcode needs to fetch the MAC address again from
the NVRAM.

Can you also send me ethtool -i eth0 which will provide the
bootcode version?  What machine are you using?  Thanks.

> 
> During boot, the following messages are displayed:
> Broadcom NetXtreme Gigabit Ethernet Boot Agent v2.2.8
> [...]
> Broadcom UNDI, PXE-2.1 (build 082) v2.2.8
> [...]
> CLIENT MAC ADDR: 00 10 18 01 E5 2F GUID: 44454C4C 4800 1052 8032
> B9C04F53304A
> 
> After a soft reboot, the last line is changed to:
> CLIENT MAC ADDR: 00 00 00 00 00 00 GUID: 44454C4C 4800 1052 8032
> B9C04F53304A
> 
> lspci -v for the card:
> 02:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
> Subsystem: Broadcom Corporation NetXtreme BCM5703 1000Base-T
> Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 177
> Memory at fcf0 (64-bit, non-prefetchable) [size=64K]
> Capabilities: [40] PCI-X non-bridge device
> Capabilities: [48] Power Management version 2
> Capabilities: [50] Vital Product Data
> Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable-
> 
> Thank you,
> -- 
> | Lucas Nussbaum             PhD student |
> | [EMAIL PROTECTED]          LIG / Projet MESCAL |
> | jabber: [EMAIL PROTECTED]  +33 (0)6 64 71 41 65 |
> | homepage: http://www-id.imag.fr/~nussbaum/ |
> 
> 


Re: [PATCH] sb1250-mac.c: De-typedef, de-volatile, de-etc...

2007-09-13 Thread Ralf Baechle

On Thu, Sep 13, 2007 at 03:13:06PM +0100, Maciej W. Rozycki wrote:

>  Hmm, works fine with linux-2.6.git#master.  I do not recall any recent 
> activity with this driver -- I wonder what the difference is.  Let me 
> see...

Hmm...  HEAD du jour has no differences for the sb1250-mac between lmo
and kernel.org.

  Ralf

Re: Distributed storage. Security attributes and ducumentation update.

2007-09-13 Thread Paul E. McKenney

On Thu, Sep 13, 2007 at 04:22:59PM +0400, Evgeniy Polyakov wrote:
> Hi Paul.
> 
> On Mon, Sep 10, 2007 at 03:14:45PM -0700, Paul E. McKenney ([EMAIL PROTECTED]) wrote:
> > > Further TODO list includes:
> > > * implement optional saving of mirroring/linear information on the remote
> > >   nodes (simple)
> > > * implement netlink based setup (simple)
> > > * new redundancy algorithm (complex)
> > > 
> > > Homepage:
> > > http://tservice.net.ru/~s0mbre/old/?section=projects&item=dst
> > 
> > A couple questions below, but otherwise looks good from an RCU viewpoint.
> > 
> > Thanx, Paul
> 
> Thanks for your comments, and sorry for the late reply; I was on the
> KS/London trip.
> > > + if (--num) {
> > > + list_for_each_entry_rcu(n, &node->shared, shared) {
> > 
> > This function is called under rcu_read_lock() or similar, right?
> > (Can't tell from this patch.)  It is also OK to call it from under the
> > update-side mutex, of course.
> 
> Actually not, but it does not require it: the entry cannot be removed
> during this operation, because the appropriate reference counter for the
> given node is being held. It should not be RCU at all.

Ah!  Yes, it is OK to use _rcu in this case, but should be avoided
unless doing so eliminates duplicate code or some such.  So, agree
with dropping _rcu in this case.

> > > +static int dst_mirror_read(struct dst_request *req)
> > > +{
> > > + struct dst_node *node = req->node, *n, *min_dist_node;
> > > + struct dst_mirror_priv *priv = node->priv;
> > > + u64 dist, d;
> > > + int err;
> > > +
> > > + req->bio_endio = &dst_mirror_read_endio;
> > > +
> > > + do {
> > > + err = -ENODEV;
> > > + min_dist_node = NULL;
> > > + dist = -1ULL;
> > > + 
> > > + /*
> > > +  * Reading is never performed from the node under resync.
> > > +  * If this will cause any troubles (like all nodes must be
> > > +  * resynced between each other), this check can be removed
> > > +  * and per-chunk dirty bit can be tested instead.
> > > +  */
> > > +
> > > + if (!test_bit(DST_NODE_NOTSYNC, &node->flags)) {
> > > + priv = node->priv;
> > > + if (req->start > priv->last_start)
> > > + dist = req->start - priv->last_start;
> > > + else
> > > + dist = priv->last_start - req->start;
> > > + min_dist_node = req->node;
> > > + }
> > > +
> > > + list_for_each_entry_rcu(n, &node->shared, shared) {
> > 
> > I see one call to this function that appears to be under the update-side
> > mutex, but I cannot tell if the other calls are safe.  (Safe as in either
> > under the update-side mutex or under rcu_read_lock() and friends.)
> 
> The same here - those processing functions are called from
> generic_make_request() without any lock on top of them. Each node is linked
> into the list of the first added node, whose reference counter is
> increased in a higher layer. Right now there is no way to add or remove
> nodes after the array has been started; such functionality requires the
> storage tree lock to be taken, and RCU cannot be used (since it requires
> sleeping and I did not investigate sleepable RCU for this purpose).
> 
> So, essentially RCU is not used in DST :)

Works for me!  "Use the right tool for the job!"

> Thanks for review, Paul.

Thanx, Paul

RE: [PATCH v3] Make the pr_*() family of macros in kernel.h complete

2007-09-13 Thread Medve Emilian-EMMEDVE1

Hello Joe,


> I expect all the kernel logging functions to be
> overhauled eventually.
> 
> I'd prefer a mechanism that somehow supports
> identifying complete messages.  I think the new
> pr_ functions are not particularly useful
> without a mechanism to avoid or identify multiple
> processors or threads interleaving partial in-progress
> multiple statement messages.

I agree with you that one can think of and propose an improved kernel
logging system, but that might be an incremental effort. For now,
patches like the ones you or I sent are a step in the general direction
of improving kernel logging: they fix an inconsistency and increase the
probability of people logging kernel messages as intended (i.e. at a
minimum, with a loglevel). I don't think that this hurts or delays the
perceived urgency of getting a sub-optimal kernel logging mechanism...

> At some point, sooner or later, the logging functions
> will be improved.  Apparently, more likely later.

I'm not sure why it must be later, or why there is resistance to a little
better and sooner.


Cheerios,
Emil.

Re: [CORRECTION][PATCH] Fix a potential NULL pointer dereference in uli526x_interrupt() in drivers/net/tulip/uli526x.c

2007-09-13 Thread Jeff Garzik


Andrew Morton wrote:

> --- a/drivers/net/tulip/uli526x.c~fix-a-potential-null-pointer-dereference-in-uli526x_interrupt
> +++ a/drivers/net/tulip/uli526x.c
> @@ -666,11 +666,6 @@ static irqreturn_t uli526x_interrupt(int
> unsigned long ioaddr = dev->base_addr;
> unsigned long flags;
>  
> -   if (!dev) {
> -   ULI526X_DBUG(1, "uli526x_interrupt() without DEVICE arg", 0);
> -   return IRQ_NONE;
> -   }
> -

correct / ACK


Re: [PATCH] sb1250-mac.c: De-typedef, de-volatile, de-etc...

2007-09-13 Thread Maciej W. Rozycki

On Wed, 12 Sep 2007, Jeff Garzik wrote:

> > Remove typedefs, volatiles and convert kmalloc()/memset() pairs to
> > kcalloc().  Also reformat the surrounding clutter.
> > 
> > Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]>
> > ---
> 
> ACK, but patch does not apply cleanly to netdev-2.6.git#upstream (nor -mm)

 Hmm, works fine with linux-2.6.git#master.  I do not recall any recent 
activity with this driver -- I wonder what the difference is.  Let me 
see...

  Maciej

Re: [RFC v2 PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list

2007-09-13 Thread Vlad Yasevich

Hi Sridhar

Sridhar Samudrala wrote:
> Vlad,
> 
> few minor comments inline.
> otherwise, looks good.
> 
> Thanks
> Sridhar
> 
>> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
>> index f8aa23d..54ff472 100644
>> --- a/net/sctp/ipv6.c
>> +++ b/net/sctp/ipv6.c
>> @@ -77,13 +77,18 @@
>>
>>  #include 
>>
>> -/* Event handler for inet6 address addition/deletion events.  */
>> +/* Event handler for inet6 address addition/deletion events.
>> + * This even is part of the atomic notifier call chain
>> + * and thus happens atomically and can NOT sleep.  As a result
>> + * we can't and really don't need to add any locks to guard the
>> + * RCU.
>> + */
> 
> Now that we are adding a spin_lock, the above comment is not valid.
> It should be fixed saying that we still need a lock because we use the 
> same list for both inet and inet6 address events and they can happen in
> parallel.

Yes, I forgot to fix this comment.  Will do.


>> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
>> index e98579b..4688559 100644
>> --- a/net/sctp/protocol.c
>> +++ b/net/sctp/protocol.c
>> @@ -153,6 +153,8 @@ static void sctp_v4_copy_addrlist(struct list_head *addrlist,
>>  addr->a.v4.sin_family = AF_INET;
>>  addr->a.v4.sin_port = 0;
>>  addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
>> +addr->valid = 1;
>> +INIT_RCU_HEAD(&addr->rcu);
> 
> This has nothing to do with this patch, but i noticed that
> INIT_LIST_HEAD(&addr->list) is missing here when comparing with
> earlier v6 version of this routine.

Hmm...  I thought it looked a little different, but didn't pay too much
attention to it.  I'll add a follow-on patch to fix this.

Thanks
-vlad

Re: [RFC] af_packet: allow disabling timestamps

2007-09-13 Thread Eric Dumazet

On Thu, 13 Sep 2007 12:42:53 +0200
Stephen Hemminger <[EMAIL PROTECTED]> wrote:

> Currently, af_packet does not allow disabling timestamps. This patch changes
> that but doesn't force global timestamps on.
> 
> This shows up in bugzilla as:
>   http://bugzilla.kernel.org/show_bug.cgi?id=4809
> 
> Patch against net-2.6.24 tree.
> 

I am not sure I understood this patch.

This means that tcpdump/ethereal won't get precise timestamps
(gathered when the packet is received), but imprecise ones (gathered when
the sniffer reads the packet).

Some time ago I added the ktime infrastructure so that libpcap could
eventually get nanosecond precision, so I would prefer a step in the right
direction :)

Shouldn't we use something like:

[PATCH] af_packet: allow disabling timestamps, or requesting nanosecond precision.

Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>

diff --git a/net/core/sock.c b/net/core/sock.c
index 5a16e38..1c10b9d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -563,6 +563,7 @@ set_rcvbuf:
} else {
sock_reset_flag(sk, SOCK_RCVTSTAMP);
sock_reset_flag(sk, SOCK_RCVTSTAMPNS);
+   sock_disable_timestamp(sk);
}
break;
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 745e2cb..409de44 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -650,12 +650,27 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, struct packe
h->tp_snaplen = snaplen;
h->tp_mac = macoff;
h->tp_net = netoff;
-   if (skb->tstamp.tv64)
-   tv = ktime_to_timeval(skb->tstamp);
-   else
-   do_gettimeofday(&tv);
-   h->tp_sec = tv.tv_sec;
-   h->tp_usec = tv.tv_usec;
+   h->tp_sec = 0;
+   h->tp_usec = 0;
+   if ((sock_flag(sk, SOCK_TIMESTAMP))) {
+   if (sock_flag(sk, SOCK_RCVTSTAMPNS)) {
+   struct timespec ts;
+   if (skb->tstamp.tv64)
+   ts = ktime_to_timespec(skb->tstamp);
+   else
+   getnstimeofday(&ts);
+   h->tp_sec = ts.tv_sec;
+   h->tp_usec = ts.tv_nsec; /* cheat a litle bit */
+   }
+   else {
+   if (skb->tstamp.tv64)
+   tv = ktime_to_timeval(skb->tstamp);
+   else
+   do_gettimeofday(&tv);
+   h->tp_sec = tv.tv_sec;
+   h->tp_usec = tv.tv_usec;
+   }
+   }
 
sll = (struct sockaddr_ll*)((u8*)h + TPACKET_ALIGN(sizeof(*h)));
sll->sll_halen = 0;
@@ -1014,6 +1029,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol)
sock->ops = &packet_ops_spkt;
 
sock_init_data(sock, sk);
+   sock_enable_timestamp(sk);
 
po = pkt_sk(sk);
sk->sk_family = PF_PACKET;
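
To make the "cheat" in the patch above concrete, here is a small userspace
sketch of the two conversions. It assumes the timestamp is available as a
plain count of nanoseconds; the struct and function names ending in _sketch
are hypothetical and are not kernel or libpcap API. It shows what a reader
of the ring gains when the second field carries nanoseconds rather than
microseconds, and why the reader must know which unit was requested.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH 1000000000LL

/* Seconds plus microseconds, as in the existing tp_sec/tp_usec pair. */
struct tv_sketch {
	int64_t sec;
	int64_t usec;
};

/* Seconds plus nanoseconds, as requested with SOCK_RCVTSTAMPNS. */
struct ts_sketch {
	int64_t sec;
	int64_t nsec;
};

/* Microsecond resolution: the last three decimal digits are dropped. */
static struct tv_sketch ns_to_tv_sketch(int64_t ns)
{
	struct tv_sketch tv;

	assert(ns >= 0);
	tv.sec = ns / NSEC_PER_SEC_SKETCH;
	tv.usec = (ns % NSEC_PER_SEC_SKETCH) / 1000;
	return tv;
}

/* Nanosecond resolution: nothing is lost. */
static struct ts_sketch ns_to_ts_sketch(int64_t ns)
{
	struct ts_sketch ts;

	assert(ns >= 0);
	ts.sec = ns / NSEC_PER_SEC_SKETCH;
	ts.nsec = ns % NSEC_PER_SEC_SKETCH;
	return ts;
}
```

Both results have the same seconds value; only the sub-second field
differs, by a factor of 1000 plus the truncated remainder. A consumer that
interprets a nanosecond value as microseconds would therefore compute a
wrong time.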

Re: Distributed storage. Security attributes and ducumentation update.

2007-09-13 Thread Evgeniy Polyakov

Hi Paul.

On Mon, Sep 10, 2007 at 03:14:45PM -0700, Paul E. McKenney ([EMAIL PROTECTED]) wrote:
> > Further TODO list includes:
> > * implement optional saving of mirroring/linear information on the remote
> > nodes (simple)
> > * implement netlink based setup (simple)
> > * new redundancy algorithm (complex)
> > 
> > Homepage:
> > http://tservice.net.ru/~s0mbre/old/?section=projects&item=dst
> 
> A couple questions below, but otherwise looks good from an RCU viewpoint.
> 
>   Thanx, Paul

Thanks for your comments, and sorry for the late reply; I was on the
KS/London trip.
> > +   if (--num) {
> > +   list_for_each_entry_rcu(n, &node->shared, shared) {
> 
> This function is called under rcu_read_lock() or similar, right?
> (Can't tell from this patch.)  It is also OK to call it from under the
> update-side mutex, of course.

Actually not, but it does not require it: the entry cannot be removed
during this operation, because the appropriate reference counter for the
given node is being held. It should not be RCU at all.

> > +static int dst_mirror_read(struct dst_request *req)
> > +{
> > +   struct dst_node *node = req->node, *n, *min_dist_node;
> > +   struct dst_mirror_priv *priv = node->priv;
> > +   u64 dist, d;
> > +   int err;
> > +
> > +   req->bio_endio = &dst_mirror_read_endio;
> > +
> > +   do {
> > +   err = -ENODEV;
> > +   min_dist_node = NULL;
> > +   dist = -1ULL;
> > + 
> > +   /*
> > +* Reading is never performed from the node under resync.
> > +* If this will cause any troubles (like all nodes must be
> > +* resynced between each other), this check can be removed
> > +* and per-chunk dirty bit can be tested instead.
> > +*/
> > +
> > +   if (!test_bit(DST_NODE_NOTSYNC, &node->flags)) {
> > +   priv = node->priv;
> > +   if (req->start > priv->last_start)
> > +   dist = req->start - priv->last_start;
> > +   else
> > +   dist = priv->last_start - req->start;
> > +   min_dist_node = req->node;
> > +   }
> > +
> > +   list_for_each_entry_rcu(n, &node->shared, shared) {
> 
> I see one call to this function that appears to be under the update-side
> mutex, but I cannot tell if the other calls are safe.  (Safe as in either
> under the update-side mutex or under rcu_read_lock() and friends.)

The same here - those processing functions are called from
generic_make_request() without any lock on top of them. Each node is linked
into the list of the first added node, whose reference counter is
increased in a higher layer. Right now there is no way to add or remove
nodes after the array has been started; such functionality requires the
storage tree lock to be taken, and RCU cannot be used (since it requires
sleeping and I did not investigate sleepable RCU for this purpose).

So, essentially RCU is not used in DST :)

Thanks for review, Paul.

-- 
Evgeniy Polyakov
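
The read path quoted above can be sketched in userspace as follows. This is
a simplified model, not the DST code: the loop body is not shown in the
quoted patch, so the rule "pick the synced node whose last request was
closest to the new one" is an assumption, and every name ending in _sketch
is hypothetical. An array stands in for the shared list, which per the
discussion does not change while the storage is running.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mirror node. */
struct mirror_sketch {
	uint64_t last_start;	/* start of the last request served */
	int not_synced;		/* non-zero while the node is under resync */
};

/* Absolute distance between two unsigned positions without wrap-around. */
static uint64_t abs_dist_sketch(uint64_t a, uint64_t b)
{
	return a > b ? a - b : b - a;
}

/* Returns the index of the synced node closest to 'start', or -1 if every
 * node is under resync. */
static int pick_mirror_sketch(const struct mirror_sketch *nodes, size_t count,
			      uint64_t start)
{
	uint64_t best = UINT64_MAX;	/* plays the role of dist = -1ULL */
	int best_idx = -1;
	size_t i;

	assert(nodes != NULL || count == 0);
	for (i = 0; i < count; i++) {
		uint64_t d;

		if (nodes[i].not_synced)
			continue;	/* never read from a node under resync */
		d = abs_dist_sketch(start, nodes[i].last_start);
		if (best_idx < 0 || d < best) {
			best = d;
			best_idx = (int)i;
		}
	}
	return best_idx;
}
```

The comparison in abs_dist_sketch() mirrors the if/else in the patch:
subtracting two u64 values in the wrong order would wrap around to a huge
distance instead of going negative.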

Re: [PATCH][MIPS][7/7] AR7: ethernet

2007-09-13 Thread Ralf Baechle

On Thu, Sep 13, 2007 at 02:42:46AM +0100, Thiemo Seufer wrote:

> > All struct members here are sized such that there is no padding needed, so
> > the packed attribute doesn't buy you anything - unless of course the
> > entire structure is misaligned, but I don't see how that would be possible
> > in this driver, so the __attribute__ ((packed)) should go - it results in
> > somewhat larger and slower code.
> 
> FWIW, a modern gcc will warn about such superfluous packed attributes,
> that's another reason to remove those.

I doubt it will in this case; the packed structure is dereferenced by a
pointer so no way for gcc to know the alignment.

  Ralf
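
A small sketch of the point under discussion, assuming GCC or Clang (the
packed attribute is a compiler extension) and a hypothetical descriptor made
of four 32-bit members; it is not the AR7 driver's actual structure. Packing
changes neither the size nor the member offsets here, but it does lower the
alignment the compiler may assume to 1, which is what leads to byte-wise
accesses on strict-alignment targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: four naturally aligned 32-bit members. */
struct desc_plain_sketch {
	uint32_t status;
	uint32_t length;
	uint32_t buffer;
	uint32_t next;
};

/* The same members with the packed attribute applied. */
struct desc_packed_sketch {
	uint32_t status;
	uint32_t length;
	uint32_t buffer;
	uint32_t next;
} __attribute__((packed));

/* Four 4-byte members need no padding, packed or not. */
static_assert(sizeof(struct desc_plain_sketch) == 16, "no padding expected");

/* Returns 1 when packing changes neither the size nor the last offset. */
static int packed_is_redundant_sketch(void)
{
	return sizeof(struct desc_plain_sketch) ==
		       sizeof(struct desc_packed_sketch) &&
	       offsetof(struct desc_plain_sketch, next) ==
		       offsetof(struct desc_packed_sketch, next);
}

/* Alignment the compiler may assume for the packed variant. */
static size_t packed_alignment_sketch(void)
{
	return _Alignof(struct desc_packed_sketch);
}
```

Because the packed variant is accessed through a pointer, the compiler has
to honour the reduced alignment even though the layout is identical.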

[RFC] af_packet: allow disabling timestamps

2007-09-13 Thread Stephen Hemminger

Currently, af_packet does not allow disabling timestamps. This patch changes
that but doesn't force global timestamps on.

This shows up in bugzilla as:
http://bugzilla.kernel.org/show_bug.cgi?id=4809

Patch against net-2.6.24 tree.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

--- a/net/core/sock.c   2007-09-12 15:08:43.0 +0200
+++ b/net/core/sock.c   2007-09-13 12:10:19.0 +0200
@@ -259,7 +259,8 @@ static void sock_disable_timestamp(struc
 {
if (sock_flag(sk, SOCK_TIMESTAMP)) {
sock_reset_flag(sk, SOCK_TIMESTAMP);
-   net_disable_timestamp();
+   if (sk->sk_family != PF_PACKET)
+   net_disable_timestamp();
}
 }
 
@@ -1645,7 +1646,8 @@ void sock_enable_timestamp(struct sock *
 {
if (!sock_flag(sk, SOCK_TIMESTAMP)) {
sock_set_flag(sk, SOCK_TIMESTAMP);
-   net_enable_timestamp();
+   if (sk->sk_family != PF_PACKET)
+   net_enable_timestamp();
}
 }
 EXPORT_SYMBOL(sock_enable_timestamp);
--- a/net/packet/af_packet.c   2007-09-12 17:07:00.0 +0200
+++ b/net/packet/af_packet.c   2007-09-13 12:09:10.0 +0200
@@ -572,7 +572,6 @@ static int tpacket_rcv(struct sk_buff *s
unsigned long status = TP_STATUS_LOSING|TP_STATUS_USER;
unsigned short macoff, netoff;
struct sk_buff *copy_skb = NULL;
-   struct timeval tv;
 
if (dev->nd_net != &init_net)
goto drop;
@@ -650,12 +649,19 @@ static int tpacket_rcv(struct sk_buff *s
h->tp_snaplen = snaplen;
h->tp_mac = macoff;
h->tp_net = netoff;
-   if (skb->tstamp.tv64)
-   tv = ktime_to_timeval(skb->tstamp);
-   else
-   do_gettimeofday(&tv);
-   h->tp_sec = tv.tv_sec;
-   h->tp_usec = tv.tv_usec;
+
+   if (sock_flag(sk, SOCK_TIMESTAMP)) {
+   struct timeval tv;
+   if (skb->tstamp.tv64)
+   tv = ktime_to_timeval(skb->tstamp);
+   else
+   do_gettimeofday(&tv);
+   h->tp_sec = tv.tv_sec;
+   h->tp_usec = tv.tv_usec;
+   } else {
+   h->tp_sec = 0;
+   h->tp_usec = 0;
+   }
 
sll = (struct sockaddr_ll*)((u8*)h + TPACKET_ALIGN(sizeof(*h)));
sll->sll_halen = 0;
@@ -1014,6 +1020,7 @@ static int packet_create(struct net *net
sock->ops = &packet_ops_spkt;
 
sock_init_data(sock, sk);
+   sock_set_flag(sk, SOCK_TIMESTAMP);
 
po = pkt_sk(sk);
sk->sk_family = PF_PACKET;
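
The bookkeeping in the sock.c hunks above can be modelled in a few lines of
userspace C. This sketch assumes that net_enable_timestamp() and
net_disable_timestamp() adjust a global count of timestamp users; the
counter, the struct and the function names ending in _sketch are
hypothetical. It shows the two properties the patch relies on: the
per-socket flag makes enable and disable idempotent, and packet sockets set
the flag without touching the global count.

```c
#include <assert.h>
#include <stddef.h>

/* Models the global count of sockets that want timestamps. */
static int global_users_sketch;

/* Hypothetical, simplified socket. */
struct sock_sketch {
	int timestamp_flag;	/* models SOCK_TIMESTAMP */
	int is_packet;		/* models sk_family == PF_PACKET */
};

static void enable_ts_sketch(struct sock_sketch *sk)
{
	assert(sk != NULL);
	if (!sk->timestamp_flag) {
		sk->timestamp_flag = 1;
		if (!sk->is_packet)
			global_users_sketch++;
	}
}

static void disable_ts_sketch(struct sock_sketch *sk)
{
	assert(sk != NULL);
	if (sk->timestamp_flag) {
		sk->timestamp_flag = 0;
		if (!sk->is_packet)
			global_users_sketch--;
	}
}
```

Enabling the same socket twice raises the count only once, and a packet
socket never changes it, so opening a sniffer does not by itself turn
global timestamping on.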

Re: [PATCH 4/4] [IPROUTE2] iproute2: link_veth support bug fixes.

2007-09-13 Thread Stephen Hemminger

On Wed, 12 Sep 2007 09:13:02 -0600
[EMAIL PROTECTED] (Eric W. Biederman) wrote:

> Pavel Emelyanov <[EMAIL PROTECTED]> writes:
> 
> > [snip]
> >
> >> @@ -25,6 +26,3 @@ clean:
> >>  
> >>  LDLIBS+= -ldl
> >>  LDFLAGS   += -Wl,-export-dynamic
> >> -
> >> -%.so: %.c
> >> -  $(CC) $(CFLAGS) -shared $< -o $@
> >
> > %) How do we get the .so file then?
> 
> The code was built into iproute2, so we don't need the .so file.
> That rule does not work on arch/x86_64 so I had to do something
> and the easiest was to simply compile the code in.  Like Patrick
> did with his recent VLAN support.
> 
> The usefulness of a .so file seems to be distributing the code
> outside of /bin/ip.  Although I think we currently have some issues with
> mixed 32bit and 64bit systems because we have "/usr/lib/ip/link_*.so"
> hard coded.
> 
> A .so file always seems to override the compiled in version so I don't
> think we lose any flexibility on that front.
> 
> Eric

Fixing the 64-bit library path is on my to-fix list.

[PATCH 1/1] ipv6: remove redundant RTM_DELLINK message

2007-09-13 Thread Milan Kocian

Remove a useless message. We already get the right message from another subsystem.
---

--- a/net/ipv6/addrconf.c   2007-09-13 11:22:31.087494976 +0200
+++ b/net/ipv6/addrconf.c   2007-09-13 11:25:56.056225711 +0200
@@ -2486,9 +2486,7 @@ static int addrconf_ifdown(struct net_de
else
ipv6_mc_down(idev);
 
-   /* Step 5: netlink notification of this interface */
idev->tstamp = jiffies;
-   inet6_ifinfo_notify(RTM_DELLINK, idev);
 
/* Shot the device (if unregistered) */
 
Signed-off-by: Milan Kocian <[EMAIL PROTECTED]>


Re: [PATCH] Fix a lock problem in generic phy code

2007-09-13 Thread Hans-Jürgen Koch

Am Donnerstag 13 September 2007 schrieb Jeff Garzik:
> Hans-Jürgen Koch wrote:
> > Lock debugging finds a problem in phy.c and phy_device.c,
> > this patch fixes it. Tested on an AT91SAM9263-EK board,
> > kernel 2.6.23-rc4.
> >
> > Signed-off-by: Hans J. Koch <[EMAIL PROTECTED]>
>
> applied

Thanks! Andrew applied it to -mm, too.

Hans


Re: [CORRECTION][PATCH] Fix a potential NULL pointer dereference in uli526x_interrupt() in drivers/net/tulip/uli526x.c

2007-09-13 Thread Kyle McMartin

On Thu, Sep 13, 2007 at 02:03:46AM -0700, Andrew Morton wrote:
> I suspect the fix we want is:
> 

ack. The trend seems to be to avoid this redundant check in the
interrupt handler.

Re: bridge problem

2007-09-13 Thread Stephen Hemminger

On Thu, 13 Sep 2007 15:03:24 +0800
"潘炳宇" <[EMAIL PROTECTED]> wrote:

>  i get a bridge problem when patch my kernel 2.4.32

You need to describe the problem more fully to get assistance.

Re: [PATCH] veth: Cleanly handle a missing peer_tb argument on creation.

2007-09-13 Thread Pavel Emelyanov

Eric W. Biederman wrote:
> Pavel Emelyanov <[EMAIL PROTECTED]> writes:
> 
>> Eric W. Biederman wrote:
>>> Pavel Emelyanov <[EMAIL PROTECTED]> writes:
>>>
>>>>> + }
>>>>>  
>>>>> - tbp = peer_tb;
>>>>> - } else
>>>>> - tbp = tb;
>>>> The intention of this part was to get the same parameters for
>>>> peer as for the first device if no "peer" argument was specified
>>>> for ip utility. Does it still work?
>>> I know it is problematic because we try to assign the same name
>>> to both network devices, if we assign a name to the primary
>>> network device.  That can't work.
>> This can - as you can see I reallocate the name lower.
> 
> Hmm. I just see:
>   if (tbp[IFLA_IFNAME])
>   nla_strlcpy(ifname, tbp[IFLA_IFNAME], IFNAMSIZ);
> 
> Then lower I see:
>   if (tb[IFLA_IFNAME])
>   nla_strlcpy(dev->name, tb[IFLA_IFNAME], IFNAMSIZ);
> 
> If (tb == tbp) then dev->name == ifname
> Unless I'm completely misreading that code.
> 
>>> Beyond that I had some really weird crashes while testing this
>>> piece of code, especially when I did not specify a peer parameter.
>> Can you please give me the exact command that caused an oops.
>> I try simple ip link add type veth and everything is just fine.
> 
> It might have been 64bit specific. 
> 
> What I have in my history is:
> ./ip/ip link add veth23 type veth
> 
> I forget exactly how it failed but as I recall it wasn't as
> nice as an oops.  My memory may be a bit foggy though.
> 
> If I haven't provided a big enough clue I guess I can go back
> and remove the patch and try to reproduce the failure again.

Neither ip link add type veth nor your command fails on my x86_64 box.

However, maybe what you didn't like is that your command didn't produce
any devices. I can explain this. You request two *equal* devices with
the same name veth23. This has to fail. However, if you request
devices with the generic name veth%d or with different names, everything
is good.

So could you please give more clues on what's bad with veth driver.

>>> So it was just easier to avoid the problem with this patch then
>>> to completely root cause it.
>> Let me handle this problem. AFAIR this was one of wishes from 
>> Patrick that we make two equal devices in case peer is not given, 
>> not just the default peer.
> 
> Ok.  If we can track down the weird cases I have no problem
> if we handle this.  I think it still might be simpler if we just
> copy tb onto peer_tb instead of using tbp.
> 
> Eric
> 
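
The name collision discussed above can be reproduced with a small userspace
model. It is a sketch under stated assumptions, not the veth driver: plain
string arrays stand in for the netlink attribute tables, an empty name
stands for the veth%d template, and all identifiers ending in _sketch are
hypothetical. It only shows that when tbp falls back to tb, an explicitly
given name ends up on both devices.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define IFNAMSIZ_SKETCH 16

enum { ATTR_IFNAME_SKETCH, ATTR_MAX_SKETCH };

/* Copies the names the way the quoted lines do: the peer's name from tbp,
 * the device's name from tb. Returns 1 when an explicit name was given and
 * both devices ended up with it. */
static int names_collide_sketch(const char **tb, const char **peer_tb)
{
	/* The fallback under discussion: no peer attributes, reuse tb. */
	const char **tbp = peer_tb ? peer_tb : tb;
	char peer_name[IFNAMSIZ_SKETCH] = "";
	char dev_name[IFNAMSIZ_SKETCH] = "";

	assert(tb != NULL);
	if (tbp[ATTR_IFNAME_SKETCH])
		snprintf(peer_name, sizeof(peer_name), "%s",
			 tbp[ATTR_IFNAME_SKETCH]);
	if (tb[ATTR_IFNAME_SKETCH])
		snprintf(dev_name, sizeof(dev_name), "%s",
			 tb[ATTR_IFNAME_SKETCH]);

	/* An empty name stands for the veth%d template and never collides. */
	return dev_name[0] != '\0' && strcmp(peer_name, dev_name) == 0;
}
```

With a name and no peer attributes (the "ip link add veth23 type veth"
case) the function reports a collision; with no name at all, or with a
differently named peer, it does not.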


Re: [CORRECTION][PATCH] Fix a potential NULL pointer dereference in uli526x_interrupt() in drivers/net/tulip/uli526x.c

2007-09-13 Thread Andrew Morton

On Tue, 04 Sep 2007 16:14:06 +0800 Micah Gruber <[EMAIL PROTECTED]> wrote:

> This patch fixes a potential null dereference bug where we dereference dev 
> before a null check. This patch simply moves the dereferencing after the null 
> check.
> 
> Signed-off-by: Micah Gruber <[EMAIL PROTECTED]>
> ---
> 
> --- a/drivers/net/tulip/uli526x.c
> +++ b/drivers/net/tulip/uli526x.c
> @@ -663,7 +663,7 @@
>  {
>   struct net_device *dev = dev_id;
>   struct uli526x_board_info *db = netdev_priv(dev);
> - unsigned long ioaddr = dev->base_addr;
> + unsigned long ioaddr;
>   unsigned long flags;
>  
>   if (!dev) {
> @@ -671,6 +671,8 @@
>   return IRQ_NONE;
>   }
>  
> + ioaddr = dev->base_addr;
> +
>   spin_lock_irqsave(&db->lock, flags);
>   outl(0, ioaddr + DCR7);
> 

I suspect the fix we want is:


--- a/drivers/net/tulip/uli526x.c~fix-a-potential-null-pointer-dereference-in-uli526x_interrupt
+++ a/drivers/net/tulip/uli526x.c
@@ -666,11 +666,6 @@ static irqreturn_t uli526x_interrupt(int
unsigned long ioaddr = dev->base_addr;
unsigned long flags;
 
-   if (!dev) {
-   ULI526X_DBUG(1, "uli526x_interrupt() without DEVICE arg", 0);
-   return IRQ_NONE;
-   }
-
spin_lock_irqsave(&db->lock, flags);
outl(0, ioaddr + DCR7);
 
_
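
The two fixes in this thread can be contrasted in a userspace sketch. The
struct and function names ending in _sketch are hypothetical, and returning
0 stands in for IRQ_NONE. The first variant follows the original submission
(test the pointer before any dereference); the second follows the fix
suggested above (drop the test because the caller is expected never to pass
NULL), with the assumption spelled out as an assert for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified device. */
struct dev_sketch {
	unsigned long base_addr;
};

/* Variant 1: check first, dereference afterwards. A check placed after
 * the dereference would come too late to prevent a fault. */
static unsigned long ioaddr_checked_sketch(const struct dev_sketch *dev)
{
	unsigned long ioaddr;

	if (!dev)
		return 0;
	ioaddr = dev->base_addr;
	return ioaddr;
}

/* Variant 2: no run-time check; the caller guarantees a valid device. */
static unsigned long ioaddr_trusting_sketch(const struct dev_sketch *dev)
{
	assert(dev != NULL);	/* documents the assumption in this sketch */
	return dev->base_addr;
}
```

Either variant is consistent; what the thread objects to is the mixture of
dereferencing dev in the initializers and testing it only afterwards.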


[BUG] tg3 cannot do PXE (loses MAC address) after soft reboot

2007-09-13 Thread Lucas Nussbaum

Hi,

We use PXE with Broadcom NetXtreme cards. After a soft reboot (using the
"reboot" command), the system cannot get an IP address using DHCP. On
the console, a MAC address of 00 00 00 00 00 00 is shown. When rebooting
with "reboot -f" or with the reset button, everything works as expected.

This used to work, and broke between 2.6.16 and 2.6.17. Using git bisect,
I could trace this back to that commit:
commit bc1c756741b065cfebf850e4164c0e2aae9d527f
Author: Michael Chan <[EMAIL PROTECTED]>
Date:   Mon Mar 20 17:48:03 2006 -0800
[TG3]: Support shutdown WoL.

During boot, the following messages are displayed:
Broadcom NetXtreme Gigabit Ethernet Boot Agent v2.2.8
[...]
Broadcom UNDI, PXE-2.1 (build 082) v2.2.8
[...]
CLIENT MAC ADDR: 00 10 18 01 E5 2F GUID: 44454C4C 4800 1052 8032
B9C04F53304A

After a soft reboot, the last line is changed to:
CLIENT MAC ADDR: 00 00 00 00 00 00 GUID: 44454C4C 4800 1052 8032
B9C04F53304A

lspci -v for the card:
02:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
Subsystem: Broadcom Corporation NetXtreme BCM5703 1000Base-T
Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 177
Memory at fcf0 (64-bit, non-prefetchable) [size=64K]
Capabilities: [40] PCI-X non-bridge device
Capabilities: [48] Power Management version 2
Capabilities: [50] Vital Product Data
Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable-

Thank you,
-- 
| Lucas Nussbaum             PhD student |
| [EMAIL PROTECTED]          LIG / Projet MESCAL |
| jabber: [EMAIL PROTECTED]  +33 (0)6 64 71 41 65 |
| homepage: http://www-id.imag.fr/~nussbaum/ |

Re: [PATCH] [RFC] allow admin/users to specify rto_min in milliseconds rather than jiffies

2007-09-13 Thread Stephen Hemminger

On Wed, 12 Sep 2007 13:28:42 -0700
Rick Jones <[EMAIL PROTECTED]> wrote:

> >> The api in netlink should be in milliseconds rather than compensating
> >> in the application (iproute2).
> > 
> > 
> > My understanding of the in-kernel rtnetlink code is far from complete, 
> > but it doesn't seem to have much in the way of provisions for unit 
> > conversion, which would suggest no nice suffix-based ui as in tc, and ip 
> > is already doing some massaging of units on the display side for a 
> > couple of the other parameters, so I'm at something of a loss.
> 
> So, I used the source and looked and saw that tc seems to convert 
> everything to nanoseconds and passes that to the kernel.  The user can 
> give it seconds, milliseconds, microseconds or nanoseconds by using a 
> suffix. It then does something ostensibly intelligent to display those 
> to the user.
> 
> Ip converts nothing when passing things to the kernel (rtt, rttvar or rto
> - when/if at least the initial rto changes are included - were they?),
> but when they come out of the kernel ip converts them to milliseconds.
> So the units in != the units out.
> 
> Tc seems much more friendly and less prone to user error.
> 
> I'm still not sure how "easily" rtnetlink can do conversions itself - 
> feedback there would be _very_ welcome - but at the very least, having 
> ip provide at least the illusion of what tc does would seem to be a good 
> thing.
> 
> rick jones

Your observations are correct. rtnetlink can't/shouldn't be doing conversions
itself.  The 'ip' command should use a consistent unit for all values and
do conversions if necessary.
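
A minimal sketch of what "consistent unit, conversions in the tool" could
look like. This is not the iproute2 parsing code: the function names, the
choice of nanoseconds as the internal unit, and the rule that a bare number
means milliseconds are all assumptions made for illustration. Overflow of
very large inputs is not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parses "<digits>[s|ms|us|ns]" into nanoseconds. A bare number is taken
 * as milliseconds (an assumption of this sketch). Returns 0 on success and
 * -1 on a malformed value or unknown suffix. */
static int parse_time_ns_sketch(const char *str, uint64_t *ns)
{
	unsigned long long val;
	uint64_t mult;
	char *end;

	assert(str != NULL && ns != NULL);
	if (str[0] < '0' || str[0] > '9')
		return -1;	/* rejects signs, blanks and empty input */
	val = strtoull(str, &end, 10);
	if (*end == '\0' || strcmp(end, "ms") == 0)
		mult = 1000000ULL;
	else if (strcmp(end, "s") == 0)
		mult = 1000000000ULL;
	else if (strcmp(end, "us") == 0)
		mult = 1000ULL;
	else if (strcmp(end, "ns") == 0)
		mult = 1ULL;
	else
		return -1;
	*ns = (uint64_t)val * mult;
	return 0;
}

/* Display side: always report the same unit that a bare input means. */
static uint64_t ns_to_ms_sketch(uint64_t ns)
{
	return ns / 1000000ULL;
}
```

With both directions going through one unit, "200ms" and "200" parse to the
same value and are displayed as 200 again, so the units in match the units
out.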

Re: [PATCH 1/4] [IPV6]: Fix unbalanced socket reference with MSG_CONFIRM.

2007-09-13 Thread David Miller

From: YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]>
Date: Thu, 13 Sep 2007 09:35:28 +0900 (JST)

> | [PATCH 1/4] [IPV6]: Fix unbalanced socket reference with MSG_CONFIRM.
> 
> Ah, I should say, socket locking, probably...
> Anyway, lock_sock() and release_sock() are not paired appropriately.

Thanks for these fixes, I'll merge these in as soon as I
can, which might take a little bit since I'm traveling today.

Re: [PATCH 3/3 v4] rfkill: Add rfkill documentation

2007-09-13 Thread David Miller

From: Ivo van Doorn <[EMAIL PROTECTED]>
Date: Wed, 12 Sep 2007 20:14:39 +0200

> Add a documentation file which contains
> a short description about rfkill with some
> notes about drivers and the userspace interface.
> 
> Changes since v1 and v2:
>  - Spellchecking
> 
> Signed-off-by: Ivo van Doorn <[EMAIL PROTECTED]>
> Acked-by: Dmitry Torokhov <[EMAIL PROTECTED]>
> Acked-by: Randy Dunlap <[EMAIL PROTECTED]>

Applied to net-2.6.24, thanks.

Re: [PATCH 1/3 v4] rfkill: Remove IRDA

2007-09-13 Thread David Miller

From: Ivo van Doorn <[EMAIL PROTECTED]>
Date: Wed, 12 Sep 2007 20:14:26 +0200

> As Dmitry pointed out earlier, rfkill-input.c
> doesn't support irda because there are no users
> and we shouldn't add unrequired KEY_ defines.
> 
> However, RFKILL_TYPE_IRDA was defined in the
> rfkill.h header file and would confuse people
> about whether it is implemented or not.
> 
> This patch removes IRDA support completely,
> so it can be added whenever a driver wants the
> feature.
> 
> Signed-off-by: Ivo van Doorn <[EMAIL PROTECTED]>
> CC: Dmitry Torokhov <[EMAIL PROTECTED]>
> CC: Inaky Perez-Gonzalez <[EMAIL PROTECTED]>

Applied to net-2.6.24

Re: [PATCH 2/3 v4] rfkill: Add support for ultrawideband

2007-09-13 Thread David Miller

From: Ivo van Doorn <[EMAIL PROTECTED]>
Date: Wed, 12 Sep 2007 20:14:29 +0200

> This patch will add support for UWB keys to rfkill,
> support for this has been requested by Inaky.
> 
> Signed-off-by: Ivo van Doorn <[EMAIL PROTECTED]>
> CC: Dmitry Torokhov <[EMAIL PROTECTED]>
> CC: Inaky Perez-Gonzalez <[EMAIL PROTECTED]>

Applied to net-2.6.24

Re: [PATCH] net: Fix race when opening a proc file while a network namespace is exiting.

2007-09-13 Thread David Miller

From: "Paul E. McKenney" <[EMAIL PROTECTED]>
Date: Wed, 12 Sep 2007 15:46:53 -0700

> Looks much better!
> 
> Acked-by: Paul E. McKenney <[EMAIL PROTECTED]>

Applied, thanks everyone.

Re: [net-2.6.24][NETNS][patch 1/1] fix allnoconfig compilation error

2007-09-13 Thread David Miller

From: [EMAIL PROTECTED]
Date: Thu, 13 Sep 2007 08:01:52 +0200

> From: Daniel Lezcano <[EMAIL PROTECTED]>
> 
> When CONFIG_NET=no, init_net is unresolved because net_namespace.c
> is not compiled while the include still pulls in the init_net definition.
> 
> This problem is very similar to the one with the ipc namespace, where the
> kernel can be compiled with SYSV ipc out.
> 
> This patch fixes that by defining a macro which simply removes the init_net
> initialization from the nsproxy namespace aggregator.
> 
> Compiled and booted on qemu-i386 with CONFIG_NET=no and CONFIG_NET=yes.
> 
> Signed-off-by: Daniel Lezcano <[EMAIL PROTECTED]>
> Acked-by: "Eric W. Biederman" <[EMAIL PROTECTED]>

Applied to net-2.6.24, thanks.
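
For readers following along, the technique described in the changelog can be
sketched in plain C as below. This is an illustration under assumptions, not
the applied patch: the struct layouts are stand-ins, and the macro name
INIT_NET_NS is assumed here rather than taken from the patch text.

```c
#include <stddef.h>

/* Stand-ins for the kernel types; only their shape matters here. */
struct net {
	int dummy;
};

struct nsproxy {
	int count;
	struct net *net_ns;
};

#ifdef CONFIG_NET
/* In the kernel this object lives in net_namespace.c, built only with CONFIG_NET. */
struct net init_net;
#define INIT_NET_NS(field) .field = &init_net,
#else
/*
 * Networking compiled out: the initializer expands to nothing, so no
 * reference to the missing init_net symbol is ever emitted.
 */
#define INIT_NET_NS(field)
#endif

/* The "namespace aggregator": one initializer that works in both configs. */
struct nsproxy init_nsproxy = {
	.count = 1,
	INIT_NET_NS(net_ns)
};
```

Built without -DCONFIG_NET, the net_ns member is simply left
zero-initialized; built with it, the member points at init_net.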

Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

2007-09-13 Thread Johannes Berg

On Wed, 2007-09-12 at 05:34 -0700, David Miller wrote:
> From: Johannes Berg <[EMAIL PROTECTED]>
> Date: Thu, 06 Sep 2007 17:19:55 +0200
> 
> > 
> > Oh btw. Can we stick a might_sleep() into dev_close() *before* the test
> > whether the device is up? That way, we'd have seen the bug, but
> > apparently nobody before Florian ever did a 'ip link set wmaster0 down'
> > while the other interfaces were still open.
> 
> I've added this to net-2.6.24

Great, thanks. Now I just hope John gets around to merging all the
accumulated fixes :)

johannes
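
As a rough userspace illustration of why the placement matters, the sketch
below puts the sleep check ahead of the "is the device up?" test. Everything
in it is a stand-in: atomic_ctx, might_sleep_stub() and fake_dev_close() are
hypothetical names, not kernel symbols, and the real dev_close() does much
more.

```c
#include <stdbool.h>
#include <stdio.h>

bool atomic_ctx;      /* stand-in for "we are in atomic context" */
int sleep_warnings;   /* how many times the check fired */

/* Stand-in for might_sleep(): complain if called where sleeping is illegal. */
void might_sleep_stub(const char *where)
{
	if (atomic_ctx) {
		sleep_warnings++;
		fprintf(stderr, "BUG: %s called from atomic context\n", where);
	}
}

struct fake_netdev {
	bool up;
};

int fake_dev_close(struct fake_netdev *dev)
{
	/* Check first, so the misuse is reported even on the early return. */
	might_sleep_stub("fake_dev_close");

	if (!dev->up)
		return 0;   /* already down: nothing that could sleep runs */

	dev->up = false;    /* the real close path may sleep here */
	return 0;
}
```

With the check after the early return, a caller in atomic context would go
unnoticed whenever the device happened to be down already, which matches the
way this bug stayed hidden.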



Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates

2007-09-13 Thread David Miller

From: "Mandeep Baines" <[EMAIL PROTECTED]>
Date: Wed, 12 Sep 2007 09:47:46 -0700

> Why would disabling IRQs be expensive on non-MSI PCI devices?
> Wouldn't it just require a single MMIO write to clear the interrupt
> mask of the device?

MMIO's are the most expensive part of the whole interrupt
servicing routines and minimizing them is absolutely
crucial.

This is why many devices do things like report status purely
in memory data structures, automatically disable interrupts
on either MSI delivery or status register read, etc.

Often you will see the first MMIO access in the interrupt
handler at the top of the profiles.
