date:20071115

Re: [PATCH 2/7] CAN: Add PF_CAN core module

2007-11-15 Thread Joe Perches

On Thu, 2007-11-15 at 08:40 +0100, Oliver Hartkopp wrote:
 Stephen Hemminger wrote:
  +#ifdef CONFIG_CAN_DEBUG_CORE
  +extern void can_debug_skb(struct sk_buff *skb);
  +extern void can_debug_cframe(const char *msg, struct can_frame *cframe);
  +#define DBG(fmt, args...)  (DBG_VAR & 1 ? printk( \
  +  KERN_DEBUG DBG_PREFIX ": %s: " fmt, \
  +  __func__, ##args) : 0)
  +#define DBG_FRAME(fmt, cf) (DBG_VAR & 2 ? can_debug_cframe(fmt, cf) : 0)
  +#define DBG_SKB(skb)   (DBG_VAR & 4 ? can_debug_skb(skb) : 0)
  +#else
  +#define DBG(fmt, args...)
  +#define DBG_FRAME(fmt, cf)
  +#define DBG_SKB(skb)
  +#endif

I would prefer the more frequently used macro style:

#define DBG(fmt, args...) \
do { if (DBG_VAR & 1) printk(KERN_DEBUG DBG_PREFIX ": %s: " fmt, \
 __func__, ##args); } while (0)

#define DBG_FRAME(fmt, cf) \
do { if (DBG_VAR & 2) can_debug_cframe(fmt, cf); } while (0)

#define DBG_SKB(skb) \
do { if (DBG_VAR & 4) can_debug_skb(skb); } while (0)
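
As a rough illustration of why the do { } while (0) form is the more common style, here is a minimal userspace sketch. Everything prefixed demo_ or DEMO_ is a hypothetical stand-in (fprintf instead of printk, demo_debug instead of DBG_VAR), and it uses C99 __VA_ARGS__ instead of the GNU named-variadic form from the patch, purely for portability:

```c
#include <stdio.h>

/* Hypothetical stand-ins: demo_debug plays the role of DBG_VAR. */
static unsigned int demo_debug = 1;
static int demo_emitted;	/* number of messages actually printed */

/*
 * Statement-style debug macro. The do { } while (0) wrapper makes the
 * expansion a single statement, so it can sit in an unbraced if/else
 * without swallowing the else branch.
 */
#define DEMO_DBG(...) \
do { \
	if (demo_debug & 1) { \
		demo_emitted++; \
		fprintf(stderr, "demo: %s: ", __func__); \
		fprintf(stderr, __VA_ARGS__); \
	} \
} while (0)

/* Returns 1 for non-negative input, 0 (after logging) otherwise. */
static int demo_classify(int value)
{
	if (value < 0)
		DEMO_DBG("negative value %d\n", value);
	else
		return 1;
	return 0;
}
```

With the debug bit cleared, each DEMO_DBG() call costs a single test and prints nothing, and the else branch in demo_classify() still pairs with the outer if.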


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] NET : rt_check_expire() can take a long time, add a cond_resched()

2007-11-15 Thread Eric Dumazet


Andi Kleen wrote:

Eric Dumazet [EMAIL PROTECTED] writes:

Using a if (need_resched()) test before calling cond_resched(); is
necessary to avoid spending too much time doing the resched check.


The only difference between cond_resched() and if (need_resched())
cond_resched() is one function call less and one might_sleep less. If
the might_sleep or the function call are really problems (did you
measure it? -- i doubt it somewhat) then it would be better to fix the
generic code to either inline that or supply a __cond_resched()
without might_sleep.


Please note that:

	if (need_resched())
		cond_resched();

will re-test need_resched() once cond_resched() is called.

So it may sound unnecessary, but in the rt_check_expire() case, with a loop 
potentially doing XXX.XXX iterations, being able to bypass the function call 
is a clear win (in my bench case, 25 ms instead of 88 ms). Impact on I-cache 
is irrelevant here, as rt_check_expire() runs once every 60 sec.
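
The guard pattern described above can be sketched in userspace as follows. This is only an illustration under stated assumptions: the demo_ functions are hypothetical stand-ins for need_resched() and cond_resched(), and the "reschedule request" is simulated with a plain flag.

```c
/* Hypothetical userspace stand-ins for the kernel primitives. */
static volatile int demo_resched_pending;	/* the "need resched" flag */
static long demo_slow_calls;			/* calls into the out-of-line helper */

/* Cheap test: just reads a flag. */
static inline int demo_need_resched(void)
{
	return demo_resched_pending;
}

/* Out-of-line helper: re-tests the flag itself, as cond_resched() does. */
static void demo_cond_resched(void)
{
	demo_slow_calls++;
	if (demo_resched_pending)
		demo_resched_pending = 0;	/* pretend we yielded the CPU */
}

/*
 * Scan `buckets` entries; a reschedule request arrives every `period`
 * iterations (0 means never). Returns how many times the helper ran.
 */
static long demo_scan(long buckets, long period)
{
	long i;

	demo_slow_calls = 0;
	for (i = 0; i < buckets; i++) {
		if (period && (i % period) == 0)
			demo_resched_pending = 1;
		if (demo_need_resched())	/* guard skips the call when idle */
			demo_cond_resched();
	}
	return demo_slow_calls;
}
```

In this sketch the helper runs only on the iterations where a request is pending, so a long scan with no pending request never pays for the function call.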


I think the current cond_resched() is fine for its other uses in the kernel, 
which are not inside a loop: in the general case, kernel text size should be as 
small as possible to reduce I-cache pressure, so a function call is better 
than an inline.




A cheaper change might have been to just limit the number of buckets
scanned.



Well, not in some particular cases, for example when there are 3 million 
routes in the cache. We really want to scan/free them eventually :)


An admin can already tune /proc/sys/net/ipv4/route/gc_interval and 
/proc/sys/net/ipv4/route/gc_timeout, so on a big cache they will probably 
set gc_interval to 1 instead of 60.


The next step will be to move "ip route flush cache" and rt_secret_rebuild() 
handling from softirq to process context too, since they can still kill a machine.



[PATCH 1/2][INET] (resend) Fix potential kfree on vmalloc-ed area of request_sock_queue

2007-11-15 Thread Pavel Emelyanov

The request_sock_queue's listen_opt is either vmalloc-ed or
kmalloc-ed depending on the number of table entries. Thus it 
is expected to be handled properly on free, which is done in 
the reqsk_queue_destroy().

However the error path in inet_csk_listen_start() calls 
the lite version of reqsk_queue_destroy, called 
__reqsk_queue_destroy, which calls the kfree unconditionally. 

Fix this and move the __reqsk_queue_destroy into a .c file as 
it looks too big to be inline.

As David also noticed, this is an error recovery path only,
so no locking is required and the lopt is known to be not NULL.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 7aed02c..0a954ee 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -136,11 +136,7 @@ static inline struct listen_sock *reqsk_queue_yank_listen_sk(struct request_sock
return lopt;
 }
 
-static inline void __reqsk_queue_destroy(struct request_sock_queue *queue)
-{
-   kfree(reqsk_queue_yank_listen_sk(queue));
-}
-
+extern void __reqsk_queue_destroy(struct request_sock_queue *queue);
 extern void reqsk_queue_destroy(struct request_sock_queue *queue);
 
 static inline struct request_sock *
diff --git a/net/core/request_sock.c b/net/core/request_sock.c
index 5f0818d..dd78b85 100644
--- a/net/core/request_sock.c
+++ b/net/core/request_sock.c
@@ -71,6 +71,28 @@ int reqsk_queue_alloc(struct request_sock_queue *queue,
 
 EXPORT_SYMBOL(reqsk_queue_alloc);
 
+void __reqsk_queue_destroy(struct request_sock_queue *queue)
+{
+   struct listen_sock *lopt;
+   size_t lopt_size;
+
+   /*
+* this is an error recovery path only
+* no locking needed and the lopt is not NULL
+*/
+
+   lopt = queue->listen_opt;
+   lopt_size = sizeof(struct listen_sock) +
+   lopt->nr_table_entries * sizeof(struct request_sock *);
+
+   if (lopt_size > PAGE_SIZE)
+   vfree(lopt);
+   else
+   kfree(lopt);
+}
+
+EXPORT_SYMBOL(__reqsk_queue_destroy);
+
 void reqsk_queue_destroy(struct request_sock_queue *queue)
 {
/* make all the listen_opt local to us */
-- 
1.5.3.4
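
The size rule that decides between vfree() and kfree() in the patch above can be sketched in userspace like this. The struct layout, the demo_ names and the 4 KiB DEMO_PAGE_SIZE are assumptions made for illustration; the real struct listen_sock has more fields.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical, simplified layout: a header followed by a pointer table. */
struct demo_listen_sock {
	unsigned int nr_table_entries;
	void *syn_table[];	/* flexible array of request pointers */
};

/* Total allocation size for a table with `entries` slots. */
static size_t demo_lopt_size(unsigned int entries)
{
	return sizeof(struct demo_listen_sock) + entries * sizeof(void *);
}

/* Nonzero when the size-based rule from the patch selects vfree(). */
static int demo_needs_vfree(unsigned int entries)
{
	return demo_lopt_size(entries) > DEMO_PAGE_SIZE;
}
```

The point of the fix is that the free path recomputes the same size as the allocation path, so both sides agree on which allocator owns the memory.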


[PATCH 2/2][INET] (resend) Move the reqsk_queue_yank_listen_sk from header

2007-11-15 Thread Pavel Emelyanov

This function is used in the net/core/request_sock.c only.
No need in keeping it in the header file.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 0a954ee..cff4608 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -124,18 +124,6 @@ struct request_sock_queue {
 extern int reqsk_queue_alloc(struct request_sock_queue *queue,
 unsigned int nr_table_entries);
 
-static inline struct listen_sock *reqsk_queue_yank_listen_sk(struct request_sock_queue *queue)
-{
-   struct listen_sock *lopt;
-
-   write_lock_bh(&queue->syn_wait_lock);
-   lopt = queue->listen_opt;
-   queue->listen_opt = NULL;
-   write_unlock_bh(&queue->syn_wait_lock);
-
-   return lopt;
-}
-
 extern void __reqsk_queue_destroy(struct request_sock_queue *queue);
 extern void reqsk_queue_destroy(struct request_sock_queue *queue);
 
diff --git a/net/core/request_sock.c b/net/core/request_sock.c
index dd78b85..45aed75 100644
--- a/net/core/request_sock.c
+++ b/net/core/request_sock.c
@@ -93,6 +93,19 @@ void __reqsk_queue_destroy(struct request_sock_queue *queue)
 
 EXPORT_SYMBOL(__reqsk_queue_destroy);
 
+static inline struct listen_sock *reqsk_queue_yank_listen_sk(
+   struct request_sock_queue *queue)
+{
+   struct listen_sock *lopt;
+
+   write_lock_bh(&queue->syn_wait_lock);
+   lopt = queue->listen_opt;
+   queue->listen_opt = NULL;
+   write_unlock_bh(&queue->syn_wait_lock);
+
+   return lopt;
+}
+
 void reqsk_queue_destroy(struct request_sock_queue *queue)
 {
/* make all the listen_opt local to us */

Re: Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks()

2007-11-15 Thread Ilpo Järvinen

On Wed, 14 Nov 2007, David Miller wrote:

 From: Ilpo_Järvinen [EMAIL PROTECTED]
 Date: Wed, 14 Nov 2007 15:32:58 +0200 (EET)

  [PATCH] [TCP] FRTO: Clear frto_highmark only after process_frto that uses it

  I broke this in commit 3de96471bd7fb76406e975ef6387abe3a0698149.
  tcp_process_frto should always see a valid frto_highmark. An
  invalid frto_highmark (zero) is very likely what ultimately
  caused a seqno compare in tcp_frto_enter_loss to do the wrong
  thing, leading to the LOST-bit leak.

  Having LOST-bits integrity ensured like done after commit
  23aeeec365dcf8bc87fae44c533e50d0bb4f23cc won't hurt. It may
  still be useful in some other, possibly legitimate, scenario.

  Reported by Chazarain Guillaume [EMAIL PROTECTED].

  Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

 Applied.

 Thanks for making such an incredibly thorough investigation
 into this bug!

I suppose this bug also caused all those spurious RTOs I used to see with 
my home connection (~10% of all RTOs during a 10M scp transfer). They seemed 
a bit out of place because it's all wired and low RTT. There are bandwidth 
limits enforced by my ISP, which I first suspected could cause them (besides 
suspecting a bug in my own code, of course :-)). ...It seems I can drop 
investigating them now, since last evening's test run gave 0 spurious
RTOs :-).

Thanks, Chazarain, for your report.

-- 
 i.

Re: [PATCH 2/2][INET] Move the reqsk_queue_yank_listen_sk from header

2007-11-15 Thread Pavel Emelyanov

Simon Horman wrote:
 On Wed, Nov 14, 2007 at 09:11:06PM +0300, Pavel Emelyanov wrote:
 This function is used in the net/core/request_sock.c only.
 No need in keeping it in the header file.
 
 I feel like I am missing something here, but 
 doesn't __reqsk_queue_destroy() in include/net/request_sock.h use
 reqsk_queue_yank_listen_sk()?

It does, but this is patch number 2. Patch number 1 moved
__reqsk_queue_destroy() into request_sock.c.

 static inline void __reqsk_queue_destroy(struct request_sock_queue
 *queue)
 {
 kfree(reqsk_queue_yank_listen_sk(queue));
 }

Thanks,
Pavel

Re: [PATCH] via-velocity: don't oops on MTU change.

2007-11-15 Thread Jarek Poplawski

On 15-11-2007 04:38, Stephen Hemminger wrote:
 Simple mtu change when device is down.
 Fix http://bugzilla.kernel.org/show_bug.cgi?id=9382.
 
 Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
 
 
 --- a/drivers/net/via-velocity.c  2007-10-22 09:38:11.0 -0700
 +++ b/drivers/net/via-velocity.c  2007-11-14 19:34:30.0 -0800
 @@ -1963,6 +1963,11 @@ static int velocity_change_mtu(struct ne
   return -EINVAL;
   }
  
 + if (!netif_running(dev)) {
 + dev->mtu = new_mtu;
 + return 0;
 + }
 +
   if (new_mtu != oldmtu) {
   spin_lock_irqsave(&vptr->lock, flags);

Shouldn't this latter 'if' be removed now, btw?

Regards,
Jarek P.

Re: [PATCH] NET : rt_check_expire() can take a long time, add a cond_resched()

2007-11-15 Thread David Miller

From: Andi Kleen [EMAIL PROTECTED]
Date: Thu, 15 Nov 2007 08:30:16 +0100

 Eric Dumazet [EMAIL PROTECTED] writes:

  Using a if (need_resched()) test before calling cond_resched(); is
  necessary to avoid spending too much time doing the resched check.

 The only difference between cond_resched() and if (need_resched())
 cond_resched() is one function call less and one might_sleep less. If
 the might_sleep or the function call are really problems (did you
 measure it? -- i doubt it somewhat) then it would be better to fix the
 generic code to either inline that or supply a __cond_resched()
 without might_sleep.

 A cheaper change might have been to just limit the number of buckets
 scanned.

Fix up unmap_vmas() too if this is done as it does a similar
need_resched() check too.

Re: [PATCH] NET : rt_check_expire() can take a long time, add a cond_resched()

2007-11-15 Thread David Miller

From: Eric Dumazet [EMAIL PROTECTED]
Date: Thu, 15 Nov 2007 09:25:59 +0100

 Please note that :

 if (need_resched())
  cond_resched();

 will re-test need_resched() once cond_resched() is called.

 So it may sound unnecessary but in the rt_check_expire() case, with a loop 
 potentially doing XXX.XXX iterations, being able to bypass the function call 
 is a clear win (in my bench case, 25 ms instead of 88 ms). Impact on I-cache 
 is irrelevant here as this rt_check_expire() runs once every 60 sec.

BTW, Eric, initially I was going to recommend that you do
something like:

	if ((goal % SOME_POWER_OF_2) == 0)
		cond_resched();

to mitigate this cost but decided it wasn't worth the bother.
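
For comparison, the "only every Nth iteration" alternative mentioned above could look roughly like this in userspace. DEMO_CHECK_INTERVAL and the demo_ function are hypothetical; the counter simply records where the kernel loop would call cond_resched().

```c
#define DEMO_CHECK_INTERVAL 64	/* hypothetical power of two */

/* Counts how often the periodic check fires over `iterations` loop passes. */
static long demo_periodic_checks(long iterations)
{
	long goal, fired = 0;

	for (goal = iterations; goal > 0; goal--) {
		if ((goal % DEMO_CHECK_INTERVAL) == 0)
			fired++;	/* the kernel loop would call cond_resched() here */
	}
	return fired;
}
```

This bounds the number of calls by the loop length divided by the interval, at the price of reacting to a reschedule request up to one interval late.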

Re: [PATCH 1/2][INET] (resend) Fix potential kfree on vmalloc-ed area of request_sock_queue

2007-11-15 Thread Eric Dumazet

On Thu, 15 Nov 2007 11:41:37 +0300
Pavel Emelyanov [EMAIL PROTECTED] wrote:

 The request_sock_queue's listen_opt is either vmalloc-ed or
 kmalloc-ed depending on the number of table entries. Thus it 
 is expected to be handled properly on free, which is done in 
 the reqsk_queue_destroy().
 
 However the error path in inet_csk_listen_start() calls 
 the lite version of reqsk_queue_destroy, called 
 __reqsk_queue_destroy, which calls the kfree unconditionally. 
 
 Fix this and move the __reqsk_queue_destroy into a .c file as 
 it looks too big to be inline.
 
 As David also noticed, this is an error recovery path only,
 so no locking is required and the lopt is known to be not NULL.
 
 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]
 

Acked-by: Eric Dumazet [EMAIL PROTECTED]

Thank you for finding this bug Pavel


Re: tg3: strange errors and non-working-ness

2007-11-15 Thread Jarek Poplawski

On 13-11-2007 19:57, Jon Nelson wrote:
 I'm not sure if this is the right place,

Me neither. It looks more like an ACPI or PCI problem. Did you try to
experiment with something like the pci=noacpi or acpi=off boot parameters?
Some pointer to your .config and dmesg would probably be useful too, so
taking it to bugzilla and sending the bug number as a follow-up to this
thread should be reasonable.

Btw, I added the main kernel list to cc.

Regards,
Jarek P.


 but I've got a pair of GiG-E
 cards that do not work correctly. Everything appears to come up just
 fine, but sooner or later (typically fairly quickly) the cards weird
 out and never really come back.
 
 The best info I've got is this:
 
 Nov 10 22:21:19 frank kernel: tg3.c:v3.65 (August 07, 2006)
 Nov 10 22:21:19 frank kernel: ACPI: PCI Interrupt :00:0b.0[A] -
 Link [LNKB] - GSI 3 (level, low) - IRQ 3
 Nov 10 22:21:19 frank kernel: eth0: Tigon3 [partno(AC91002A1) rev 0105
 PHY(5701)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet
 00:09:5b:09:b1:69
 Nov 10 22:21:19 frank kernel: eth0: RXcsums[1] LinkChgREG[0] MIirq[0]
 ASF[0] Split[0] WireSpeed[1] TSOcap[0]
 Nov 10 22:21:19 frank kernel: eth0: dma_rwctrl[76ff000f] dma_mask[64-bit]
 Nov 10 22:21:19 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset b (was 164514e4, writing 302a1385)
 Nov 10 22:21:19 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 3 (was 0, writing 4008)
 Nov 10 22:21:19 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 2 (was 200, writing 215)
 Nov 10 22:21:19 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 1 (was 2b0, writing 2b00106)
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 0 (was 164514e4, writing 3ea173b)
 Nov 10 22:21:20 frank kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
 Nov 10 22:21:20 frank kernel: tg3: eth0: Flow control is on for TX and
 on for RX.
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset b (was 164514e4, writing 302a1385)
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 3 (was 0, writing 4008)
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 2 (was 200, writing 215)
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 1 (was 2b0, writing 2b00106)
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 0 (was 164514e4, writing 3ea173b)
 Nov 10 22:21:20 frank kernel: ACPI: PCI interrupt for device
 :00:0b.0 disabled
 Nov 10 22:21:20 frank kernel: PCI: Enabling device :00:0b.0 (0100 - 0102)
 Nov 10 22:21:20 frank kernel: ACPI: PCI Interrupt :00:0b.0[A] -
 Link [LNKB] - GSI 3 (level, low) - IRQ 3
 Nov 10 22:21:20 frank kernel: eth0: Tigon3 [partno(AC91002A1) rev 0105
 PHY(5701)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet
 00:09:5b:09:b1:69
 Nov 10 22:21:20 frank kernel: eth0: RXcsums[1] LinkChgREG[0] MIirq[0]
 ASF[0] Split[0] WireSpeed[1] TSOcap[0]
 Nov 10 22:21:20 frank kernel: eth0: dma_rwctrl[76ff000f] dma_mask[64-bit]
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset b (was 164514e4, writing 302a1385)
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 3 (was 0, writing 4008)
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 2 (was 200, writing 215)
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 1 (was 2b0, writing 2b00106)
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 0 (was 164514e4, writing 3ea173b)
 Nov 10 22:21:20 frank kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
 Nov 10 22:21:20 frank kernel: tg3: eth0: Flow control is on for TX and
 on for RX.
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset b (was 164514e4, writing 302a1385)
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 3 (was 0, writing 4008)
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 2 (was 200, writing 215)
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 1 (was 2b0, writing 2b00106)
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 0 (was 164514e4, writing 3ea173b)
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset b (was 164514e4, writing 302a1385)
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 3 (was 0, writing 4008)
 Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
 :00:0b.0 at offset 2

[PATCH][NET-2.6.25] Move sock_valbool_flag to socket.c

2007-11-15 Thread Pavel Emelyanov

The sock_valbool_flag() helper is used in setsockopt to 
set or reset some flag on the sock. This helper is required
in the net/socket.c only, so move it there.

Besides, patch two places in sys_setsockopt() that repeat
this helper functionality manually.

Since this is not a bugfix, but a trivial cleanup, I
prepared this patch against net-2.6.25, but it also
applies (with a single offset) to the latest net-2.6.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/include/net/sock.h b/include/net/sock.h
index cfb946a..80ca671 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1393,14 +1393,6 @@ extern int net_msg_warn;
lock_sock(sk); \
}
 
-static inline void sock_valbool_flag(struct sock *sk, int bit, int valbool)
-{
-   if (valbool)
-   sock_set_flag(sk, bit);
-   else
-   sock_reset_flag(sk, bit);
-}
-
 extern __u32 sysctl_wmem_max;
 extern __u32 sysctl_rmem_max;
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 2029d09..98b243a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -419,6 +419,14 @@ out:
return ret;
 }
 
+static inline void sock_valbool_flag(struct sock *sk, int bit, int valbool)
+{
+   if (valbool)
+   sock_set_flag(sk, bit);
+   else
+   sock_reset_flag(sk, bit);
+}
+
 /*
  * This is meant for all protocols to use and covers goings on
  * at the socket level. Everything here is generic.
@@ -463,11 +471,8 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
case SO_DEBUG:
if (val && !capable(CAP_NET_ADMIN)) {
ret = -EACCES;
-   }
-   else if (valbool)
-   sock_set_flag(sk, SOCK_DBG);
-   else
-   sock_reset_flag(sk, SOCK_DBG);
+   } else
+   sock_valbool_flag(sk, SOCK_DBG, valbool);
break;
case SO_REUSEADDR:
sk->sk_reuse = valbool;
@@ -477,10 +482,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
ret = -ENOPROTOOPT;
break;
case SO_DONTROUTE:
-   if (valbool)
-   sock_set_flag(sk, SOCK_LOCALROUTE);
-   else
-   sock_reset_flag(sk, SOCK_LOCALROUTE);
+   sock_valbool_flag(sk, SOCK_LOCALROUTE, valbool);
break;
case SO_BROADCAST:
sock_valbool_flag(sk, SOCK_BROADCAST, valbool);
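
A minimal userspace sketch of the helper this patch consolidates, assuming a simplified socket with a single flag word. struct demo_sock and the demo_ functions are hypothetical stand-ins for struct sock, sock_set_flag() and sock_reset_flag().

```c
/* Hypothetical, simplified socket with a flag word. */
struct demo_sock {
	unsigned long flags;
};

static void demo_set_flag(struct demo_sock *sk, int bit)
{
	sk->flags |= 1UL << bit;
}

static void demo_reset_flag(struct demo_sock *sk, int bit)
{
	sk->flags &= ~(1UL << bit);
}

static int demo_test_flag(const struct demo_sock *sk, int bit)
{
	return (sk->flags >> bit) & 1UL;
}

/* Set the flag when valbool is nonzero, clear it otherwise. */
static void demo_valbool_flag(struct demo_sock *sk, int bit, int valbool)
{
	if (valbool)
		demo_set_flag(sk, bit);
	else
		demo_reset_flag(sk, bit);
}
```

Each open-coded if/else pair in the setsockopt switch then collapses to one call, which is what the two replaced hunks above do.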

Re: 2.6.24-rc2-mm1 -- strange apparent network failures

2007-11-15 Thread Andy Whitcroft

When testing some of the later 2.6.24-rc2-mm1+hotfix combinations on three
of our test systems one job from each batch (1/4) failed.  In each case the
machine appears to have booted normally all the way to a login: prompt.
However in the failed boots the networking though apparently initialised
completely and correctly (as far as I can tell from the console output), is
reported as not responding to ssh connections.  The network interface seems
to have been initialised on the right port, and the ssh daemons started.

Two of the machines are powerpc boxes, the other an older x86_64.
One machine is 4/4 in testing, just one.  Most of the other machines are
still not able to compile this stack so do not contribute to our knowledge.

Any ideas?

-apw

Re: Re : Re : Bug in using inet_lookup ()

2007-11-15 Thread Evgeniy Polyakov

On Wed, Nov 14, 2007 at 04:47:22PM +, Nj A ([EMAIL PROTECTED]) wrote:
 By setting the ID of the ingress device to the inet_lookup() to 0, the 
 machine reboots automatically.
 Setting proc/sys/kernel/panic* to non zero values dosn't help more..

Sorry, I did not understand.
Do you mean that after you passed zero to inet_lookup() instead of the
device id, it started to reboot?

-- 
Evgeniy Polyakov

Re: [Bugme-new] [Bug 9384] New: Appletalk packets are delivered to the last interface FD_SET

2007-11-15 Thread Andrew Morton


(switching to email for netdev - please respond via emailed reply-to-all, not
via the bugzilla UI)

On Thu, 15 Nov 2007 01:56:07 -0800 (PST) [EMAIL PROTECTED] wrote:

 http://bugzilla.kernel.org/show_bug.cgi?id=9384
 
Summary: Appletalk packets are delivered to the last interface
 FD_SET
Product: Networking
Version: 2.5
  KernelVersion: 2.6.21.3
   Platform: All
 OS/Version: Linux
   Tree: Mainline
 Status: NEW
   Severity: normal
   Priority: P1
  Component: Other
 AssignedTo: [EMAIL PROTECTED]
 ReportedBy: [EMAIL PROTECTED]
 
 
 Most recent kernel where this bug did not occur: 2.6.10. Maybe 2.6.15? It was
 in 2.6.18 along with bug 7421 which caused me to disable netatalk until now.
 Distribution: Debian etch (4.0)
 Hardware Environment: Pentium 4 2.8GHz, HT off, Intel D865GLC motherboard,
 256MB RAM, onboard Intel GigE, PCI Intel e100.
 Software Environment: Netatalk 2.0.3, ipset patch for iptables and kernel
 Problem Description: Appletalk packets appear to come from the wrong 
 interface,
 specifically the last one FD_SET. Using wireshark I see Appletalk rtmp packets
 arrive from the upstream router on eth1 (the e100). Netatalk then reports the
 packet as having arrived on eth0.3, which is the only other appletalk enabled
 interface, and prints rtmp_packet interface mismatch because the packet
 appears to come from the wrong interface.
 
 I'm fairly sure it's the kernel doing it, because wireshark is listening on
 eth1 and shows the packet from the upstream router's MAC address and DDP
 address, then the debug code in atalkd immediately after the recvfrom prints
 the ifr_name which is eth0.3. Also netatalk 2.0.3 was released over 2 years
 ago, so the only code that's changed is the kernel.
 
 Enabling appletalk on eth0.2 clarifies the problem - packets are delivered to
 fds belonging to the last interface FD_SET. Reordering the interfaces also
 shows this, as in the config file changing the order of the interfaces changes
 the order they're looped through for FD_SET.
 
 Steps to reproduce: Set up a multi-interface netatalk config and watch for
 rtmp_packet interface mismatch messages. I added a bunch of log statements to
 debug this, the most useful places to put them are at the end of setaddr() and
 after the select() in main().
 
 The machine is a router, so I have to minimise the downtime of testing
 different kernel versions. I am happy to instrument atalkd or provide packet
 captures.
 


Re: Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks()

2007-11-15 Thread Guillaume Chazarain

David Miller [EMAIL PROTECTED] wrote:

 Chazarain please let us know if it does indeed cure your
 problem.

Unfortunately, I couldn't manage to reproduce the problem with an
unpatched kernel. But your investigation Ilpo was really impressive.

BTW, even though I messed up the yahoo webmail configuration, you can
call me by my first name: Guillaume ;-)

Thanks again for such an awesome bug fixing attitude!

-- 
Guillaume

Re: [PATCH 1/2][INET] (resend) Fix potential kfree on vmalloc-ed area of request_sock_queue

2007-11-15 Thread David Miller

From: Eric Dumazet [EMAIL PROTECTED]
Date: Thu, 15 Nov 2007 10:21:01 +0100

 On Thu, 15 Nov 2007 11:41:37 +0300
 Pavel Emelyanov [EMAIL PROTECTED] wrote:

  The request_sock_queue's listen_opt is either vmalloc-ed or
  kmalloc-ed depending on the number of table entries. Thus it 
  is expected to be handled properly on free, which is done in 
  the reqsk_queue_destroy().

  However the error path in inet_csk_listen_start() calls 
  the lite version of reqsk_queue_destroy, called 
  __reqsk_queue_destroy, which calls the kfree unconditionally. 

  Fix this and move the __reqsk_queue_destroy into a .c file as 
  it looks too big to be inline.

  As David also noticed, this is an error recovery path only,
  so no locking is required and the lopt is known to be not NULL.

  Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

 Acked-by: Eric Dumazet [EMAIL PROTECTED]

 Thank you for finding this bug Pavel

Indeed.

I applied this, but what I did was I combined both changes
into one because to me they logically belong together.

Thanks again Pavel!

Re: [PATCH][NET-2.6.25] Move sock_valbool_flag to socket.c

2007-11-15 Thread David Miller

From: Pavel Emelyanov [EMAIL PROTECTED]
Date: Thu, 15 Nov 2007 12:43:51 +0300

 The sock_valbool_flag() helper is used in setsockopt to 
 set or reset some flag on the sock. This helper is required
 in the net/socket.c only, so move it there.

 Besides, patch two places in sys_setsockopt() that repeat
 this helper functionality manually.

 Since this is not a bugfix, but a trivial cleanup, I
 prepared this patch against net-2.6.25, but it also
 applies (with a single offset) to the latest net-2.6.

 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

Applied to net-2.6.25, thanks Pavel.

[PATCH] net/ipv4/arp.c: Fix arp reply when sender ip 0 (was: Strange behavior in arp probe reply, bug or feature?)

2007-11-15 Thread Jonas Danielsson

Fix arp reply when an arp probe with sender ip 0 is received.

I can't find any grounds in RFC2131 for sending a non-valid arp reply in
the special case of the sender ip being set to 0.

- Bug fix for arp handling when sender ip is set to 0.
Send a correct arp reply instead of one with sender ip and sender
hardware address in target fields.

Now sends target ip and target hw as received in arp probe.

Signed-off-by: Jonas Danielsson [EMAIL PROTECTED]

---
arp.c |3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

Index: arp.c
===
RCS file: /usr/local/cvs/linux/os/linux-2.6/net/ipv4/arp.c,v
retrieving revision 1.22
diff -u -w -r1.22 arp.c
--- arp.c   13 Oct 2006 12:45:47 -  1.22
+++ arp.c   15 Nov 2007 10:34:44 -
@@ -827,7 +827,8 @@
if (arp->ar_op == htons(ARPOP_REQUEST) &&
inet_addr_type(tip) == RTN_LOCAL &&
!arp_ignore(in_dev,dev,sip,tip))
-   arp_send(ARPOP_REPLY,ETH_P_ARP,tip,dev,tip,sha,dev->dev_addr,dev->dev_addr);
+   arp_send(ARPOP_REPLY, ETH_P_ARP, sip, dev, tip, sha,
+dev->dev_addr, sha);
goto out;
}

Re: [PATCH 2/7] CAN: Add PF_CAN core module

2007-11-15 Thread Urs Thuermann

Stephen Hemminger [EMAIL PROTECTED] writes:

  +#ifdef CONFIG_CAN_DEBUG_CORE
  +extern void can_debug_skb(struct sk_buff *skb);
  +extern void can_debug_cframe(const char *msg, struct can_frame *cframe);
  +#define DBG(fmt, args...)  (DBG_VAR & 1 ? printk( \
  +   KERN_DEBUG DBG_PREFIX ": %s: " fmt, \
  +   __func__, ##args) : 0)
  +#define DBG_FRAME(fmt, cf) (DBG_VAR & 2 ? can_debug_cframe(fmt, cf) : 0)
  +#define DBG_SKB(skb)   (DBG_VAR & 4 ? can_debug_skb(skb) : 0)
  +#else
  +#define DBG(fmt, args...)
  +#define DBG_FRAME(fmt, cf)
  +#define DBG_SKB(skb)
  +#endif
 
 
 This non-standard debugging seems like it needs a better interface.
 Also, need parens around (DBG_VAR & 1) and don't use UPPERCASE for
 variable names.

No additional parenthesis is needed here.  ?: is the lowest precedence
operator above assignment and ,.  Also, DBG_VAR is no variable name.
It's a macro that expands to a variable name like can_debug, raw_debug
or bcm_debug.
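
The precedence point can be checked with a small userspace sketch (the demo_ functions are illustrative only, not part of the patch). If the expression parsed as v & (1 ? 10 : 20), demo_pick(3) would yield 2 instead of 10.

```c
/* `v & 1 ? 10 : 20` parses as `(v & 1) ? 10 : 20`: & binds tighter than ?: */
static int demo_pick(unsigned int v)
{
	return v & 1 ? 10 : 20;
}

/* Explicitly parenthesised twin, for comparison. */
static int demo_pick_paren(unsigned int v)
{
	return (v & 1) ? 10 : 20;
}
```

Both functions return the same value for every input, so extra parentheses here are a readability choice and not a correctness fix.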

  +HLIST_HEAD(rx_dev_list);
 
 Please either make rx_dev_list static or call it can_rx_dev_list
 to avoid name conflicts.
 
 
  +static struct dev_rcv_lists rx_alldev_list;
  +static DEFINE_SPINLOCK(rcv_lists_lock);
  +
  +static struct kmem_cache *rcv_cache __read_mostly;
  +
  +/* table of registered CAN protocols */
  +static struct can_proto *proto_tab[CAN_NPROTO] __read_mostly;
  +static DEFINE_SPINLOCK(proto_tab_lock);
  +
  +struct timer_list stattimer; /* timer for statistics update */
  +struct s_stats  stats;   /* packet statistics */
  +struct s_pstats pstats;  /* receive list statistics */
 
 More global variables without prefix.

These variables are not exported with EXPORT_SYMBOL, so there should
be no name conflict.  They cannot be made static because they are used
in af_can.c and proc.c.  Nevertheless we can prefix them with can_ if
you still think it's necessary.

  +static int can_proc_read_stats(char *page, char **start, off_t off,
  +  int count, int *eof, void *data)
  +{

  +}
 
 The read interface should use seq_file interface rather than
 formatting into page buffer.

Why?  For this simple function a page buffer is enough space and the
seq_file API would require more effort.  IMHO, seq_files offer
advantages if the proc file shows some sequence of data generated in
an iteration through some loop (see below).

  +static int can_proc_read_reset_stats(char *page, char **start, off_t off,
  +int count, int *eof, void *data)
  +{

  +}
 
 Why not have a write interface to do the reset?

I haven't looked into writable proc files yet.  Will do so.

  +static int can_proc_read_rcvlist(char *page, char **start, off_t off,
  +int count, int *eof, void *data)
  +{
  +   /* double cast to prevent GCC warning */
  +   int idx = (int)(long)data;

  +}

This is where I would prefer sequence files.  However, the seq_file
interface doesn't allow me to pass additional info like the `data'
argument.  This means I would have to write separate functions
instead.

 Output from checkpatch:
 
 WARNING: do not add new typedefs
 #116: FILE: include/linux/can.h:41:
 +typedef __u32 canid_t;
 
 WARNING: do not add new typedefs
 #124: FILE: include/linux/can.h:49:
 +typedef __u32 can_err_mask_t;

These typedefs were considered OK in previous discussions on the list.

 ERROR: use tabs not spaces
 #498: FILE: net/can/af_can.c:159:
 +^I^I^I^I" not implemented.\n", module_name);$

Fixed.

 WARNING: braces {} are not necessary for single statement blocks
 #1080: FILE: net/can/af_can.c:741:
 + if (!proto_tab[proto]) {
 + printk(KERN_ERR "BUG: can: protocol %d is not registered\n",
 +proto);
 + }

Hm, isn't it common to use braces for single statements if they span
more than one line?

Thanks for your review.

urs

Re: [PATCH 01/01] ipv6: RFC4214 Support (v2.2)

2007-11-15 Thread YOSHIFUJI Hideaki / 吉藤英明

In article [EMAIL PROTECTED] (at Wed, 14 Nov 2007 22:44:17 -0800), Templin, 
Fred L [EMAIL PROTECTED] says:

 --- linux-2.6.24-rc2/net/ipv6/addrconf.c.orig 2007-11-08 11:59:35.0 -0800
 +++ linux-2.6.24-rc2/net/ipv6/addrconf.c  2007-11-14 22:17:28.0 -0800
 @@ -1424,6 +1424,21 @@ static int addrconf_ifid_infiniband(u8 *
   return 0;
  }
  
 +static int addrconf_ifid_isatap(u8 *eui, __be32 addr)
 +{
 +
 + eui[0] = 0x02; eui[1] = 0; eui[2] = 0x5E; eui[3] = 0xFE;
 + memcpy (eui+4, &addr, 4);
 +
 + if (ZERONET(addr) || PRIVATE_10(addr) || LOOPBACK(addr) ||
 + LINKLOCAL_169(addr) || PRIVATE_172(addr) || TEST_192(addr) ||
 + ANYCAST_6TO4(addr) || PRIVATE_192(addr) || TEST_198(addr) ||
 + MULTICAST(addr) || BADCLASS(addr))
 + eui[0] &= ~0x02;
 +
 + return 0;
 +}
 +
  static int ipv6_generate_eui64(u8 *eui, struct net_device *dev)
  {
   switch (dev->type) {

{
  eui[0] = (ZERONET(addr) || PRIVATE_10(addr) || LOOPBACK(addr) ||
LINKLOCAL_169(addr) || PRIVATE_172(addr) || TEST_192(addr) ||
ANYCAST_6TO4(addr) || PRIVATE_192(addr) || TEST_198(addr) ||
MULTICAST(addr) || BADCLASS(addr)) ? 0 : 2;
  eui[1] = 0;
  eui[2] = 0x5E;
  eui[3] = 0xFE;
  memcpy (eui+4, &addr, 4);
}
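
As a rough userspace illustration of the single-assignment form
suggested above, the sketch below builds the ISATAP interface identifier
and clears the "u" bit for addresses that are not globally unique.
addr_is_global() is a hypothetical, simplified stand-in for the
ZERONET()/PRIVATE_10()/... macro chain and checks only a few well-known
ranges; isatap_ifid() and isatap_u_byte() are likewise names invented
for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/*
 * Hypothetical, simplified replacement for the macro chain in the patch.
 * Takes an IPv4 address in network byte order and returns 1 if it is
 * assumed to be globally unique.  Only some special ranges are covered.
 */
static int addr_is_global(uint32_t addr_be)
{
	uint32_t a = ntohl(addr_be);

	if ((a >> 24) == 0 || (a >> 24) == 10 || (a >> 24) == 127)
		return 0;		/* 0/8, 10/8, 127/8 */
	if ((a >> 16) == 0xA9FE || (a >> 16) == 0xC0A8)
		return 0;		/* 169.254/16, 192.168/16 */
	if ((a >> 20) == 0xAC1)
		return 0;		/* 172.16/12 */
	if ((a >> 28) >= 0xE)
		return 0;		/* multicast and reserved space */
	return 1;
}

/* Fill eui[0..7] with 0x00 or 0x02, then 00:5E:FE, then the address. */
static void isatap_ifid(uint8_t eui[8], uint32_t addr_be)
{
	eui[0] = addr_is_global(addr_be) ? 0x02 : 0x00;
	eui[1] = 0x00;
	eui[2] = 0x5E;
	eui[3] = 0xFE;
	memcpy(eui + 4, &addr_be, 4);	/* copy the address bytes themselves */
}

/* Convenience for experiments: first identifier byte for a dotted quad. */
static int isatap_u_byte(const char *dotted)
{
	uint8_t eui[8];
	struct in_addr in;
	int ok = inet_pton(AF_INET, dotted, &in);

	assert(ok == 1);
	(void)ok;
	isatap_ifid(eui, in.s_addr);
	return eui[0];
}
```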


 @@ -2167,7 +2185,8 @@ static void addrconf_dev_config(struct n
   (dev->type != ARPHRD_FDDI) &&
   (dev->type != ARPHRD_IEEE802_TR) &&
   (dev->type != ARPHRD_ARCNET) &&
 - (dev->type != ARPHRD_INFINIBAND)) {
 + (dev->type != ARPHRD_INFINIBAND) &&
 + !(dev->priv_flags & IFF_ISATAP)) {
   /* Alas, we support only Ethernet autoconfiguration. */
   return;
   }

Because priv_flags are local to the device type, you need to check dev->type:
(dev->type == ARPHRD_SIT && !(dev->priv_flags & IFF_ISATAP))
or something like this.


 + struct ip_tunnel *t  = netdev_priv(ifp->idev->dev);
 + if (t->parms.i_key != INADDR_NONE) {
 + spin_lock(&ifp->lock);

I guess INADDR_ANY.

--yoshfuji

Re: reproducible cxgb kernel panic in FC8 kernel 2.6.23.1-49

2007-11-15 Thread Divy Le Ray


Ben Greear wrote:
This panic happens (almost?) immediately after starting TCP traffic between
the cxgb nic on this system and another.  We also got at least one crash
on a custom/tainted 2.6.20.12 kernel, but it would run for at least
a few minutes at ~1Gbps first.

I think my serial console chomped some of this..but it's very reproducible,
so if you need more info I can make the terminal wider and do it again.


Hi Ben,

I just posted a patch fixing this T2 crash. It appeared in 2.6.22, when
eth_type_trans() was modified to set skb->dev. cxgb3 got fixed at the time,
but I obviously forgot the chelsio driver. I'm a bit behind on T2 updates.
I will get to it in a few days.


Cheers,
Divy


Re: [PATCH 01/01] ipv6: RFC4214 Support (v2.2)

2007-11-15 Thread YOSHIFUJI Hideaki / 吉藤英明

In article [EMAIL PROTECTED] (at Wed, 14 Nov 2007 22:44:17 -0800), Templin, 
Fred L [EMAIL PROTECTED] says:

 From: Fred L. Templin [EMAIL PROTECTED]
 
 This patch includes support for the Intra-Site Automatic Tunnel
 Addressing Protocol (ISATAP) per RFC4214. It uses the SIT
 module, and is configured using extensions to the iproute2
 utility.
 
 The following diffs are specific to the Linux 2.6.24-rc2 kernel
 distribution. This message includes the full and patchable diff text;
 please use this version to apply patches.
 
 Signed-off-by: Fred L. Templin [EMAIL PROTECTED]

BTW, how will we handle DNS names (and TTL) and/or multiple PRL entries
in RFC4214?

I doubt that we really need to handle PRL refresh in the kernel.

--yoshfuji

Re: Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks()

2007-11-15 Thread Ilpo Järvinen

On Thu, 15 Nov 2007, Guillaume Chazarain wrote:

 David Miller [EMAIL PROTECTED] wrote:
 
  Chazarain please let us know if it does indeed cure your
  problem.
 
 Unfortunately, I couldn't manage to reproduce the problem with an
 unpatched kernel. But your investigation Ilpo was really impressive.

These are usually very sensitive to other traffic, because even a simple
change in packet pattern changes behavior enough for it to disappear.
The same thing occurred with the fackets_out miscount a month ago as
well: on a different weekday it just wasn't reproducible. ...Anyway, I'm
pretty sure it's now fixed, because there's a simple explanation for it
in the frto_highmark premature clearing bug. But if you still end up
seeing them after that, make sure to report it... :-)

 BTW, even though I messed up the yahoo webmail configuration, you can
 call me by my first name: Guillaume ;-)

Fair enough. :-)

 Thanks again for such an awesome bug fixing attitude!

The best thing is that usually, when forced to really think about what could
go wrong, other, unrelated bugs seem to come up too, though only up to 10%
of the initial oh-nos end up being genuine bugs. ...Thus I still have
a couple of miscount-due-to-GSOhints fixes to do as a result of this
venture, besides the problems already fixed.

-- 
 i.

Re: [PATCH 2/7] CAN: Add PF_CAN core module

2007-11-15 Thread Urs Thuermann

Joe Perches [EMAIL PROTECTED] writes:

 On Thu, 2007-11-15 at 08:40 +0100, Oliver Hartkopp wrote:
  Stephen Hemminger wrote:
   +#ifdef CONFIG_CAN_DEBUG_CORE
   +extern void can_debug_skb(struct sk_buff *skb);
   +extern void can_debug_cframe(const char *msg, struct can_frame *cframe);
   +#define DBG(fmt, args...)  (DBG_VAR & 1 ? printk( \
   +KERN_DEBUG DBG_PREFIX ": %s: " fmt, \
   +__func__, ##args) : 0)
   +#define DBG_FRAME(fmt, cf) (DBG_VAR & 2 ? can_debug_cframe(fmt, cf) : 0)
   +#define DBG_SKB(skb)   (DBG_VAR & 4 ? can_debug_skb(skb) : 0)
   +#else
   +#define DBG(fmt, args...)
   +#define DBG_FRAME(fmt, cf)
   +#define DBG_SKB(skb)
   +#endif
 
 I would prefer the more frequently used macro style:
 
 #define DBG(fmt, args...) \
   do { if (DBG_VAR & 1) printk(KERN_DEBUG DBG_PREFIX ": %s: " fmt, \
        __func__, ##args); } while (0)
 
 #define DBG_FRAME(fmt, cf) \
   do { if (DBG_VAR & 2) can_debug_cframe(fmt, cf); } while (0)
 
 #define DBG_SKB(skb) \
   do { if (DBG_VAR & 4) can_debug_skb(skb); } while (0)

I prefer our code because it is shorter (fits into one line) and can
be used anywhere an expression is allowed, rather than only where
a statement is allowed.  Actually, I first had

#define DBG( ... )   ((debug & 1) && printk( ... ))

and so on, but that didn't work with can_debug_{cframe,skb} since they
return void.

Admittedly, the benefit of expr vs. statement is really negligible, and
since this issue has come up several times I will change these macros
to use do-while.
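
For comparison, a small userspace sketch of the two macro styles under
discussion.  debug_mask, debug_hook, DBG_STMT, DBG_EXPR and run_demo are
invented for this example; debug_hook stands in for a void-returning
helper such as can_debug_skb().

```c
#include <assert.h>

static int debug_mask = 1;	/* hypothetical stand-in for DBG_VAR */
static int calls;		/* counts how often the debug hook ran */

/* Returns void, like the can_debug_*() helpers in the thread. */
static void debug_hook(const char *msg)
{
	(void)msg;
	calls++;
}

/* Statement style: behaves like one statement, even before 'else'. */
#define DBG_STMT(msg) \
	do { if (debug_mask & 1) debug_hook(msg); } while (0)

/* Expression style: the comma operator allows a void left operand. */
#define DBG_EXPR(msg) \
	((debug_mask & 1) ? (debug_hook(msg), 1) : 0)

/* Runs both styles once and returns how many times the hook fired. */
static int run_demo(int mask)
{
	int fired;

	debug_mask = mask;
	calls = 0;

	if (mask >= 0)
		DBG_STMT("statement form");	/* no dangling-else problem */
	else
		calls = -1;

	fired = DBG_EXPR("expression form");	/* usable as a value */
	assert(fired == (mask & 1));
	return calls;
}
```

The expression form here uses the comma operator instead of '&&', which
is one way around the void return type mentioned above.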

urs

Re: [alsa-devel] [BUG] New Kernel Bugs

2007-11-15 Thread Bron Gondwana

On Thu, Nov 15, 2007 at 06:59:34AM +0100, Rene Herman wrote:
 On 15-11-07 05:16, Bron Gondwana wrote:

 Totally unrelated - I sent something to the kolab mailing list a couple

 [ ... ]

 I'm sure if I had something that I considered worth informing the ALSA 
 project of, I'd be wary of spending the same effort writing a good post
 knowing it may be dropped in between the by a list moderator just selecting 
 all and bouncing them.

 Totally unrelated indeed so why are spouting crap? If the kohab list has a 
 problem take it up with them but keep ALSA out of it. alsa-devel has only 
 ever moderated out spam -- nothing else.

As an outsider to the list, how do I know what your policy will be
other than I've been rejected out of hand by someone else's list, 
so my experience is that member only lists aren't willing to listen
to something I have to say unless I make the effort to sign up and
have yet another folder accumulating unread messages.  I don't.

Well, ok - maybe I do here since I've let myself be dragged in to
the debate.  Oops.

I get the same information from both project websites: moderated
for non-members, public archives - no way of knowing that ALSA
will accept me informing them of something they would be interested
without committing to reading or bit-bucketing their list.

The alternative is to subscribe just long enough to send something
and then unsubscribe again or cold-email a member and ask them
to pass a message along.  Or post and hope it doesn't get rejected,
not even knowing for a day or so.

Bron.

Re: [PATCH 2/7] CAN: Add PF_CAN core module

2007-11-15 Thread David Miller

From: Urs Thuermann [EMAIL PROTECTED]
Date: 15 Nov 2007 12:51:34 +0100

 I prefer our code because it is shorter (fits into one line) and can
 be used anywhere where an expression is allowed compared to only where
 a statement is allowed.  Actually, I first had

 #define DBG( ... )   ((debug & 1) && printk( ... ))

 and so on, but that didn't work with can_debug_{cframe,skb} since they
 return void.

 Admitted, the benefit of expr vs. statement is really negligible and
 since this issue has come up several times I will change these macros
 using do-while.

I really frown upon these local debugging macros people tend to want
to submit with their changes.

It really craps up the tree, even though it might be useful to you.

So please remove this stuff or replace the debugging statements
with some generic kernel debugging facility, there are several.

Thank you.

Re: Null pointer dereference in nf_nat_move_storage(), kernel 2.6.23.1

2007-11-15 Thread Evgeniy Polyakov

Hi Chuck.

On Wed, Nov 14, 2007 at 06:25:15PM -0500, Chuck Ebbert ([EMAIL PROTECTED]) 
wrote:
  https://bugzilla.redhat.com/show_bug.cgi?id=259501#c14

   [<f8b61643>] __nf_ct_ext_add+0x12f/0x1c4 [nf_conntrack] 

  nf_nat_move_storage():
  /usr/src/debug/kernel-2.6.23/linux-2.6.23.i686/net/ipv4/netfilter/nf_nat_core.c:612
87:   f7 47 64 80 01 00 00    testl  $0x180,0x64(%edi)
8e:   74 39                   je c9 <nf_nat_move_storage+0x65>
  
  line 612:
  if (!(ct->status & IPS_NAT_DONE_MASK))
  return;

Please test attached patch.

This routine is called each time the hash should be replaced. nf_conn has
an extension list which contains pointers to connection tracking users
(like nat, which is right now the only such user), so when a replace takes
place it should copy its own extensions. The loop above checks for its own
extension, but tries to move a higher-layer one, which can lead to the above
oops.

Not tested, derived from code observation only.

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

diff --git a/net/netfilter/nf_conntrack_extend.c b/net/netfilter/nf_conntrack_extend.c
index a1a65a1..cf6ba66 100644
--- a/net/netfilter/nf_conntrack_extend.c
+++ b/net/netfilter/nf_conntrack_extend.c
@@ -109,7 +109,7 @@ void *__nf_ct_ext_add(struct nf_conn *ct, enum nf_ct_ext_id id, gfp_t gfp)
 rcu_read_lock();
 t = rcu_dereference(nf_ct_ext_types[i]);
 if (t && t->move)
-   t->move(ct, ct->ext + ct->ext->offset[id]);
+   t->move(ct, ct->ext + ct->ext->offset[i]);
 rcu_read_unlock();
 }
 kfree(ct->ext);

-- 
Evgeniy Polyakov
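
The index mix-up the patch addresses can be modelled in userspace as
below.  This is only a toy under stated assumptions: struct ext_area,
move_handler(), notify_moves() and handler_saw_own_area() are invented
for the illustration and are not the nf_conntrack_extend code.

```c
#include <assert.h>
#include <stddef.h>

#define EXT_NUM 3	/* hypothetical number of extension types */

struct ext_area {
	size_t offset[EXT_NUM];	/* per-type offset into data[], 0 = absent */
	char data[64];
};

/* Records which address each type's move handler was handed. */
static char *moved_to[EXT_NUM];

static void move_handler(int type, char *area)
{
	moved_to[type] = area;
}

/*
 * After the area was reallocated while adding extension 'id', notify
 * every type that is already present.  use_fix == 0 mimics the reported
 * bug (offset[id] inside the loop); use_fix == 1 uses offset[i].
 */
static void notify_moves(struct ext_area *ext, int id, int use_fix)
{
	int i;

	for (i = 0; i < EXT_NUM; i++) {
		if (!ext->offset[i])
			continue;
		move_handler(i, ext->data + ext->offset[use_fix ? i : id]);
	}
}

/* Returns 1 if type 0's handler saw its own area while adding 'id'. */
static int handler_saw_own_area(int id, int use_fix)
{
	struct ext_area ext = { .offset = { 8, 0, 0 } };

	assert(id > 0 && id < EXT_NUM);
	moved_to[0] = NULL;
	notify_moves(&ext, id, use_fix);
	return moved_to[0] == ext.data + ext.offset[0];
}
```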

Re: [PATCH] netfilter : struct xt_table_info diet

2007-11-15 Thread Patrick McHardy

Eric Dumazet wrote:
 On Wed, 14 Nov 2007 18:19:41 +0100
 Patrick McHardy [EMAIL PROTECTED] wrote:
 
diff --git a/net/ipv4/netfilter/arp_tables.c 
b/net/ipv4/netfilter/arp_tables.c
index 2909c92..ed3bd0b 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -811,7 +811,7 @@ static int do_replace(void __user *user, unsigned int len)
 return -ENOPROTOOPT;
 
 /* overflow check */
-if (tmp.size >= (INT_MAX - sizeof(struct xt_table_info)) / NR_CPUS -
+if (tmp.size >= (INT_MAX - XT_TABLE_INFO_SZ) / NR_CPUS -
 SMP_CACHE_BYTES)


Shouldn't NR_CPUs be replaced by nr_cpu_ids here? I'm wondering
why we still include NR_CPUs in the calculation at all though,
unlike in 2.4, we don't allocate one huge area of memory anymore
but do one allocation per CPU. IIRC it even was you who changed
that.
 
 
 Yes, doing an allocation per possible cpu was better than one giant 
 allocation (memory savings and NUMA aware)
 
 Well, technically speaking you are right, we may also replace these 
 divides per NR_CPUS by nr_cpu_ids (or even better : num_possible_cpus())
 
 Because with NR_CPUS=4096, we actually limit tmp.size to about 524000,
  what a shame ! :)


We actually had complaints about number of rule limitations, but that
was more likely caused by vmalloc limits :) But of course we do need
to include the number of CPUs in the check, I misread the code.
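
A minimal sketch of why the CPU count belongs in the check, assuming one
copy of the table per CPU.  table_size_ok() is a made-up helper, and it
ignores the header size and SMP_CACHE_BYTES terms of the real check.
The "about 524000" figure quoted above matches INT_MAX / 4096.

```c
#include <assert.h>
#include <limits.h>

/*
 * Returns 1 if a table of 'size' bytes can be replicated on 'ncpus'
 * CPUs without the total reaching INT_MAX, else 0.  Dividing the limit
 * avoids overflowing the multiplication itself.
 */
static int table_size_ok(unsigned int size, unsigned int ncpus)
{
	assert(ncpus > 0);
	return size < INT_MAX / ncpus;
}
```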

[PATCH] chelsio - Fix skb-dev setting

2007-11-15 Thread Divy Le Ray

From: Divy Le Ray [EMAIL PROTECTED]

eth_type_trans() now sets skb->dev.
Access skb->dev after it gets set.

Signed-off-by: Divy Le Ray [EMAIL PROTECTED]
---

 drivers/net/chelsio/sge.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/chelsio/sge.c b/drivers/net/chelsio/sge.c
index ffa7e64..4436662 100644
--- a/drivers/net/chelsio/sge.c
+++ b/drivers/net/chelsio/sge.c
@@ -1379,11 +1379,11 @@ static void sge_rx(struct sge *sge, struct freelQ *fl, unsigned int len)
 }
 __skb_pull(skb, sizeof(*p));
 
-   skb->dev->last_rx = jiffies;
 st = per_cpu_ptr(sge->port_stats[p->iff], smp_processor_id());
 st->rx_packets++;
 
 skb->protocol = eth_type_trans(skb, adapter->port[p->iff].dev);
+   skb->dev->last_rx = jiffies;
 if ((adapter->flags & RX_CSUM_ENABLED) && p->csum == 0x &&
 skb->protocol == htons(ETH_P_IP) &&
 (skb->data[9] == IPPROTO_TCP || skb->data[9] == IPPROTO_UDP)) {
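
The ordering issue can be shown with toy types in userspace.  All
toy_* names are invented stand-ins, not the kernel structures;
toy_type_trans() only imitates the one property that matters here,
namely that it is the function which sets the skb's device pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; not the real definitions. */
struct toy_netdev { unsigned long last_rx; };
struct toy_skb { struct toy_netdev *dev; unsigned short protocol; };

/* Imitates eth_type_trans() in one respect: it sets skb->dev. */
static unsigned short toy_type_trans(struct toy_skb *skb,
				     struct toy_netdev *dev)
{
	skb->dev = dev;
	return 0x0800;	/* pretend every frame is IPv4 */
}

/* Receive path in the fixed order; returns the recorded timestamp. */
static unsigned long toy_rx(struct toy_netdev *dev, unsigned long now)
{
	struct toy_skb skb = { NULL, 0 };

	/* Touching skb.dev->last_rx here would dereference NULL. */
	skb.protocol = toy_type_trans(&skb, dev);
	assert(skb.dev != NULL);
	skb.dev->last_rx = now;		/* safe: dev was just set */
	return dev->last_rx;
}

/* Convenience wrapper for experiments. */
static unsigned long toy_rx_demo(unsigned long now)
{
	struct toy_netdev dev = { 0 };

	return toy_rx(&dev, now);
}
```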

Re: [alsa-devel] [BUG] New Kernel Bugs

2007-11-15 Thread Rene Herman


On 15-11-07 13:02, Bron Gondwana wrote:


I get the same information from both project websites: moderated for
non-members, public archives - no way of knowing that ALSA will accept
me informing them of something they would be interested without
committing to reading or bit-bucketing their list.


Can you please just shelve this crap? You have a way of knowing that ALSA 
will accept you and that is knowing or assuming that the ALSA project 
doesn't consist of drooling retards.


When a project list goes to the difficulty of moderating non-subscribers it 
has made the explicit choice to _not_ become subscriber only. Then refusing 
valid non-subscribers after all makes no sense whatsoever. I'm sorry you got 
your feelings hurt by that other list but it was no doubt an accident; take 
it up with them.


Rene.

Re: [PATCH, take2] netfilter : struct xt_table_info diet

2007-11-15 Thread Patrick McHardy

Eric Dumazet wrote:
 [PATCH] netfilter : struct xt_table_info diet
 
 Instead of using a big array of NR_CPUS entries, we can compute the size
 needed at runtime, using nr_cpu_ids
 
 This should save some ram (especially on David's machines where
 NR_CPUS=4096 :
 32 KB can be saved per table, and 64KB for dynamically allocated ones
 (because of slab/slub alignments) )
 
 In particular, the 'bootstrap' tables are not any more static (in data
 section) but on stack as their size is now very small.
 
 This also should reduce the size used on stack in compat functions
 (get_info() declares an automatic variable, that could be bigger than
 kernel stack size for big NR_CPUS)


I fixed a compilation error with CONFIG_COMPAT and applied it, thanks
Eric. One question though:

 +#define XT_TABLE_INFO_SZ (offsetof(struct xt_table_info, entries) \
 +   + nr_cpu_ids * sizeof(char *))


   /* overflow check */
 - if (tmp.size >= (INT_MAX - sizeof(struct xt_table_info)) / NR_CPUS -
 - SMP_CACHE_BYTES)
 + if (tmp.size >= INT_MAX / num_possible_cpus())
   return -ENOMEM;

We need to make sure offsetof(struct xt_table_info, entries) +
nr_cpu_ids * sizeof(char *) doesn't overflow, so why doesn't it
use nr_cpu_ids here as well?

Re: [alsa-devel] [BUG] New Kernel Bugs

2007-11-15 Thread Jörn Engel

On Thu, 15 November 2007 13:26:51 +0100, Rene Herman wrote:
 
 Can you please just shelve this crap? You have a way of knowing that ALSA 
 will accept you and that is knowing or assuming that the ALSA project 
 doesn't consist of drooling retards.

Well, my experience with moderation has been that moderated mails are
stuck in some queue for weeks.  Two separate lists, neither of them was
alsa.  If alsa is doing a better job, great.  But it still has to live
with the general reputation of non-subscriber moderation.

 When a project list goes to the difficulty of moderating non-subscribers it 
 has made the explicit choice to _not_ become subscriber only. Then refusing 
 valid non-subscribers after all makes no sense whatsoever. I'm sorry you 
 got your feelings hurt by that other list but it was no doubt an accident; 
 take it up with them.

Been there, done that.  In spite of people not being drooling retards,
the amount of time and effort they invest into either moderation or
improving the ruleset is quite limited.  Problems persist.

And even without mails being held hostage for weeks, every single
moderation mail is annoying.  Like the one I'm sure to receive after
sending this out.

Jörn

-- 
Joern's library part 5:
http://www.faqs.org/faqs/compression-faq/part2/section-9.html

Re: [alsa-devel] [BUG] New Kernel Bugs

2007-11-15 Thread Olivier Galibert

On Thu, Nov 15, 2007 at 06:59:34AM +0100, Rene Herman wrote:
 Totally unrelated indeed so why are spouting crap? If the kohab list has a 
 problem take it up with them but keep ALSA out of it. alsa-devel has only 
 ever moderated out spam -- nothing else.

That is incorrect.  Hopefully it is the case now though, since my
experience of the subject was years ago.

  OG.

Re: [alsa-devel] [BUG] New Kernel Bugs

2007-11-15 Thread Takashi Iwai

At Thu, 15 Nov 2007 14:17:27 +0100,
Olivier Galibert wrote:
 
 On Thu, Nov 15, 2007 at 06:59:34AM +0100, Rene Herman wrote:
  Totally unrelated indeed so why are spouting crap? If the kohab list has a 
  problem take it up with them but keep ALSA out of it. alsa-devel has only 
  ever moderated out spam -- nothing else.
 
 That is incorrect.  Hopefully it is the case now though, since my
 experience of the subject was years ago.

Yeah, it was really years ago that we once switched to the open list.
Funny that people never forget such a thing :)


Takashi

Re: network interface state

2007-11-15 Thread jamal

On Thu, 2007-15-11 at 10:11 +0800, Herbert Xu wrote:

 We don't make use of that on recvmsg() though although theoretically
 user-space is supposed to be ready to handle that too.

iproute2 handles that well. Anyone writing netlink apps should program
with the thought that a single received datagram will include many
netlink messages.
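
A minimal sketch of that receive-side loop, using the NLMSG_OK() and
NLMSG_NEXT() macros from <linux/netlink.h>.  count_nlmsgs() and
demo_count() are names made up for this example; the datagram is faked
in a local buffer rather than read from a socket, and the 4-byte payload
is arbitrary.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Walk one received datagram and count the netlink messages in it. */
static int count_nlmsgs(void *buf, int len)
{
	struct nlmsghdr *nlh;
	int n = 0;

	for (nlh = buf; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
		if (nlh->nlmsg_type == NLMSG_DONE)
			break;
		n++;	/* a real application would dispatch on nlmsg_type */
	}
	return n;
}

/* Build a fake datagram with 'count' RTM_NEWLINK headers, count them. */
static int demo_count(int count)
{
	union { struct nlmsghdr align; char bytes[512]; } buf;
	int used = 0, i;

	memset(&buf, 0, sizeof(buf));
	for (i = 0; i < count; i++) {
		struct nlmsghdr *nlh = (struct nlmsghdr *)(buf.bytes + used);
		/* header plus an arbitrary 4-byte payload, aligned */
		int step = (int)NLMSG_ALIGN(NLMSG_LENGTH(4));

		assert(used + step <= (int)sizeof(buf.bytes));
		nlh->nlmsg_len = NLMSG_LENGTH(4);
		nlh->nlmsg_type = RTM_NEWLINK;
		nlh->nlmsg_seq = (unsigned int)i;
		used += step;
	}
	return count_nlmsgs(buf.bytes, used);
}
```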

On the concept of putting some generation marker/counter:
It is one of those things that have bothered me as well for some time,
but I can't think of a clean way to solve it for every user of netlink.
One way to transport this from the kernel is to stash it in the netlink
sequence, but that would violate things when a user expects a specific
sequence.

For the ifla/iflink, it should be trivial to solve by adding a marker in
the kernel that gets set to jiffies (or some incremental counter) every
time an event happens. You then transport this to user space as an
attribute anytime someone does a GET. 
Clearly the best way to solve it is to be generic, but we would need to
revamp netlink totally.
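
A toy userspace model of the marker idea, assuming a plain incrementing
counter rather than jiffies.  struct link_state, struct link_reply and
all functions below are hypothetical; no such attribute is defined by
the messages in this thread.

```c
#include <assert.h>

/* Hypothetical kernel-side state: link flags plus a change counter. */
struct link_state {
	unsigned int generation;	/* bumped on every link event */
	int ifflags;
};

/* What a GET would report: the data plus the generation attribute. */
struct link_reply {
	unsigned int generation;
	int ifflags;
};

static void link_event(struct link_state *st, int new_flags)
{
	st->ifflags = new_flags;
	st->generation++;
}

static struct link_reply link_get(const struct link_state *st)
{
	struct link_reply r = { st->generation, st->ifflags };

	return r;
}

/* Returns 1 if something changed between two GET replies. */
static int reply_is_stale(struct link_reply old, struct link_reply cur)
{
	return old.generation != cur.generation;
}

/* Do a GET, apply 'events' changes, GET again, compare generations. */
static int demo_stale(int events)
{
	struct link_state st = { 0, 0 };
	struct link_reply first, second;
	int i;

	assert(events >= 0);
	first = link_get(&st);
	for (i = 0; i < events; i++)
		link_event(&st, i + 1);
	second = link_get(&st);
	return reply_is_stale(first, second);
}
```

A polling application without a daemon could compare the marker from two
GETs this way to notice that it missed an intermediate state.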

Note, we do today signal to user space that a message was lost because
of buffer overrun. So a hack (not applicable to the poster given they
don't have a daemon) would be to listen to events and set the rx socket
buffer to be very small so you lose every message.

cheers,
jamal


[PATCH 05/10] [TCP]: Make lost retrans detection more self-contained

2007-11-15 Thread Ilpo Järvinen

Highest_sack_end_seq is no longer calculated in the loop,
thus it can be pushed to the worker function altogether
making that function independent of the sacktag.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   20 +++-
 1 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c25704f..b7af304 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1115,16 +1115,23 @@ static int tcp_is_sackblock_valid(struct tcp_sock *tp, int is_dsack,
  *
  * Search retransmitted skbs from write_queue that were sent when snd_nxt was
  * less than what is now known to be received by the other end (derived from
- * SACK blocks by the caller). Also calculate the lowest snd_nxt among the
- * remaining retransmitted skbs to avoid some costly processing per ACKs.
+ * highest SACK block). Also calculate the lowest snd_nxt among the remaining
+ * retransmitted skbs to avoid some costly processing per ACKs.
  */
-static int tcp_mark_lost_retrans(struct sock *sk, u32 received_upto)
+static int tcp_mark_lost_retrans(struct sock *sk)
 {
+   const struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
struct sk_buff *skb;
int flag = 0;
int cnt = 0;
u32 new_low_seq = tp->snd_nxt;
+   u32 received_upto = TCP_SKB_CB(tp->highest_sack)->end_seq;
+
+   if (!tcp_is_fack(tp) || !tp->retrans_out ||
+   !after(received_upto, tp->lost_retrans_low) ||
+   icsk->icsk_ca_state != TCP_CA_Recovery)
+   return flag;
 
tcp_for_write_queue(skb, sk) {
u32 ack_seq = TCP_SKB_CB(skb)->ack_seq;
@@ -1245,7 +1252,6 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
int reord = tp->packets_out;
int prior_fackets;
-   u32 highest_sack_end_seq;
int flag = 0;
int found_dup_sack = 0;
int cached_fack_count;
@@ -1513,11 +1519,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
flag &= ~FLAG_ONLY_ORIG_SACKED;
}
 
-   highest_sack_end_seq = TCP_SKB_CB(tp->highest_sack)->end_seq;
-   if (tcp_is_fack(tp) && tp->retrans_out &&
-   after(highest_sack_end_seq, tp->lost_retrans_low) &&
-   icsk->icsk_ca_state == TCP_CA_Recovery)
-   flag |= tcp_mark_lost_retrans(sk, highest_sack_end_seq);
+   flag |= tcp_mark_lost_retrans(sk);
 
tcp_verify_left_out(tp);
 
-- 
1.5.0.6


[RFC PATCH 09/10] [TCP]: Rewrite SACK block processing sack_recv_cache use

2007-11-15 Thread Ilpo Järvinen

Key points of this patch are:

  - In case new SACK information is advance only type, no skb
processing below previously discovered highest point is done
  - Optimize cases below highest point too since there's no need
to always go up to highest point (which is very likely still
present in that SACK), this is not entirely true though
because I'm dropping the fastpath_skb_hint which could
previously optimize those cases even better. Whether that's
significant, I'm not too sure.

Currently it will provide skipping by walking. Combined with
an RB-tree, all skipping would become fast too, regardless of window
size (can be done incrementally later).

Previously a number of cases in TCP SACK processing failed to
take advantage of the costly stored information in sack_recv_cache,
most importantly expected events such as cumulative ACKs and new
hole ACKs. Processing such ACKs results in rather long walks,
building up latencies (which easily gets nasty when the window is
huge). Those latencies are often completely unnecessary
compared with the amount of _new_ information received; usually
for a cumulative ACK there's no new information at all, yet TCP
walks the whole queue unnecessarily, potentially taking a number of
costly cache misses on the way, etc.!

Since the inclusion of highest_sack, there's a lot of information
that is very likely redundant (SACK fastpath hint stuff,
fackets_out, highest_sack), though there's no ultimate guarantee
that they'll remain the same the whole time (in all unearthly
scenarios). Take advantage of this knowledge here and drop
the fastpath hint, using direct access to the highest SACKed skb as
a replacement.

Effectively the special-cased fastpath is dropped. This change
adds some complexity to introduce a fastpath with better coverage,
though the added complexity should make TCP behave in a more
cache-friendly way.

The current ACK's SACK blocks are compared against each cached
block individually, and only ranges that are new are then scanned
by the high-constant walk. For other parts of the write queue, even
when in a previously known part of the SACK blocks, a faster skip
function is used (if necessary at all). In addition, whenever
possible, TCP fast-forwards to the highest_sack skb that was made
available by an earlier patch. In the typical case, nothing
but this fast-forward and the mandatory markings after it occur,
making the access pattern quite similar to the former fastpath
special case.

DSACKs are special case that must always be walked.

The local to recv_sack_cache copying could be more intelligent
w.r.t DSACKs which are likely to be there only once but that
is left to a separate patch.
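
The block comparison described above might be pictured with the
following userspace sketch.  It is a simplification under stated
assumptions: it compares one current block against a single cached
block, ignores DSACKs, and every name in it (struct sack_block,
seq_before, seq_after, sack_new_ranges, sack_new_bytes) is invented for
the illustration rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

struct sack_block { uint32_t start_seq, end_seq; };

/* Wraparound-safe sequence number comparisons. */
static int seq_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static int seq_after(uint32_t a, uint32_t b)  { return seq_before(b, a); }

/*
 * Write to out[] the parts of 'cur' that 'cached' does not cover and
 * return how many ranges (0..2) were written.  Only these ranges would
 * need the expensive per-skb walk; the rest could be skipped.
 */
static int sack_new_ranges(struct sack_block cur, struct sack_block cached,
			   struct sack_block out[2])
{
	int n = 0;

	/* No overlap at all: the whole block is new. */
	if (!seq_before(cur.start_seq, cached.end_seq) ||
	    !seq_after(cur.end_seq, cached.start_seq)) {
		out[n++] = cur;
		return n;
	}
	if (seq_before(cur.start_seq, cached.start_seq)) {	/* new head */
		out[n].start_seq = cur.start_seq;
		out[n].end_seq = cached.start_seq;
		n++;
	}
	if (seq_after(cur.end_seq, cached.end_seq)) {		/* new tail */
		out[n].start_seq = cached.end_seq;
		out[n].end_seq = cur.end_seq;
		n++;
	}
	return n;
}

/* Total sequence space of 'cur' that is not covered by 'cached'. */
static uint32_t sack_new_bytes(uint32_t cur_start, uint32_t cur_end,
			       uint32_t cached_start, uint32_t cached_end)
{
	struct sack_block cur = { cur_start, cur_end };
	struct sack_block cached = { cached_start, cached_end };
	struct sack_block out[2];
	uint32_t total = 0;
	int i, n = sack_new_ranges(cur, cached, out);

	assert(n >= 0 && n <= 2);
	for (i = 0; i < n; i++)
		total += out[i].end_seq - out[i].start_seq;
	return total;
}
```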

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/linux/tcp.h   |3 -
 include/net/tcp.h |1 -
 net/ipv4/tcp_input.c  |  277 +++--
 net/ipv4/tcp_output.c |   14 +---
 4 files changed, 175 insertions(+), 120 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 794497c..08027f1 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -343,10 +343,7 @@ struct tcp_sock {
struct sk_buff *scoreboard_skb_hint;
struct sk_buff *retransmit_skb_hint;
struct sk_buff *forward_skb_hint;
-   struct sk_buff *fastpath_skb_hint;
 
-   int fastpath_cnt_hint;  /* Lags behind by current skb's pcount
-* compared to respective fackets_out */
int lost_cnt_hint;
int retransmit_cnt_hint;
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 3444647..0844261 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1078,7 +1078,6 @@ static inline void tcp_clear_retrans_hints_partial(struct tcp_sock *tp)
 static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp)
 {
tcp_clear_retrans_hints_partial(tp);
-   tp->fastpath_skb_hint = NULL;
 }
 
 /* MD5 Signature */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 69f2f79..5833b01 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1333,6 +1333,88 @@ static int tcp_sacktag_one(struct sk_buff *skb, struct tcp_sock *tp,
return flag;
 }
 
+static struct sk_buff *tcp_sacktag_walk(struct sk_buff *skb, struct sock *sk,
+   struct tcp_sack_block *next_dup,
+   u32 start_seq, u32 end_seq,
+   int dup_sack_in, int *fack_count,
+   int *reord, int *flag)
+{
+   struct tcp_sock *tp = tcp_sk(sk);
+
+   tcp_for_write_queue_from(skb, sk) {
+   int in_sack = 0;
+   int dup_sack = dup_sack_in;
+
+   if (skb == tcp_send_head(sk))
+   break;
+
+   /* queue is in-order => we can short-circuit the walk early */
+   if (!before(TCP_SKB_CB(skb)->seq, end_seq))
+   break;
+
+   if ((next_dup != NULL) &&
+

[PATCH 06/10] [TCP]: Prior_fackets can be replaced by highest_sack seq

2007-11-15 Thread Ilpo Järvinen

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b7af304..29fff81 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1251,7 +1251,6 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
struct sk_buff *cached_skb;
int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
int reord = tp->packets_out;
-   int prior_fackets;
int flag = 0;
int found_dup_sack = 0;
int cached_fack_count;
@@ -1264,7 +1263,6 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
tp->fackets_out = 0;
tp->highest_sack = tcp_write_queue_head(sk);
}
-   prior_fackets = tp->fackets_out;
 
found_dup_sack = tcp_check_dsack(tp, ack_skb, sp,
 num_sacks, prior_snd_una);
@@ -1457,7 +1455,8 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
/* New sack for not 
retransmitted frame,
 * which was in hole. It is 
reordering.
 */
-   if (fack_count  prior_fackets)
+   if (before(TCP_SKB_CB(skb)-seq,
+  
tcp_highest_sack_seq(tp)))
reord = min(fack_count, 
reord);
 
/* SACK enhanced F-RTO 
(RFC4138; Appendix B) */
-- 
1.5.0.6


[PATCH 04/10] [TCP]: Convert highest_sack to sk_buff to allow direct access

2007-11-15 Thread Ilpo Järvinen

It is going to replace the sack fastpath hint quite soon... :-)
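
For readers following along, the accessor this patch introduces can be modelled in user space. This is only a sketch under stated assumptions: `struct model_skb`, `struct model_tp`, `model_highest_sack_seq()` and `seq_before()` are hypothetical stand-ins written for illustration, not the kernel's types; only the fallback to snd_una when nothing is SACKed and the wrap-safe sequence comparison are taken from the patch and from the usual before()/after() idiom.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space models; the kernel uses struct sk_buff and
 * struct tcp_sock, which carry far more state than shown here. */
struct model_skb {
	uint32_t seq;      /* start sequence of the segment */
	uint32_t end_seq;  /* sequence just past the segment */
};

struct model_tp {
	uint32_t snd_una;                     /* first unacknowledged sequence */
	int sacked_out;                       /* number of SACKed segments */
	const struct model_skb *highest_sack; /* only meaningful if sacked_out > 0 */
};

/* Wrap-safe "a is before b" on 32-bit sequence space. */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Same shape as tcp_highest_sack_seq() in the patch: fall back to
 * snd_una while nothing has been SACKed, so the pointer is never
 * dereferenced in that state. */
static uint32_t model_highest_sack_seq(const struct model_tp *tp)
{
	if (!tp->sacked_out)
		return tp->snd_una;
	return tp->highest_sack->seq;
}
```

Keeping a pointer rather than a bare sequence number means the end_seq of the highest SACKed skb can be read on demand, which is what the tcp_input.c hunk of this patch does.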

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/linux/tcp.h   |6 --
 include/net/tcp.h |   10 ++
 net/ipv4/tcp_input.c  |   12 ++--
 net/ipv4/tcp_output.c |   19 ++-
 4 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index bac17c5..34acee6 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -332,8 +332,10 @@ struct tcp_sock {
 
struct tcp_sack_block_wire recv_sack_cache[4];
 
-   u32 highest_sack;   /* Start seq of globally highest revd SACK
-* (validity guaranteed only if sacked_out > 0) */
+   struct sk_buff *highest_sack;   /* highest skb with SACK received
+* (validity guaranteed only if
+* sacked_out > 0)
+*/
 
/* from STCP, retrans queue hinting */
struct sk_buff* lost_skb_hint;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index d695cea..3444647 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1306,6 +1306,16 @@ static inline int tcp_write_queue_empty(struct sock *sk)
return skb_queue_empty(&sk->sk_write_queue);
 }
 
+/* Start sequence of the highest skb with SACKed bit, valid only if
+ * sacked > 0 or when the caller has ensured validity by itself.
+ */
+static inline u32 tcp_highest_sack_seq(struct tcp_sock *tp)
+{
+   if (!tp->sacked_out)
+   return tp->snd_una;
+   return TCP_SKB_CB(tp->highest_sack)->seq;
+}
+
 /* /proc */
 enum tcp_seq_states {
TCP_SEQ_STATE_LISTENING,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ef8187b..c25704f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1245,7 +1245,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
int reord = tp->packets_out;
int prior_fackets;
-   u32 highest_sack_end_seq = tp->lost_retrans_low;
+   u32 highest_sack_end_seq;
int flag = 0;
int found_dup_sack = 0;
int cached_fack_count;
@@ -1256,7 +1256,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
if (!tp->sacked_out) {
if (WARN_ON(tp->fackets_out))
tp->fackets_out = 0;
-   tp->highest_sack = tp->snd_una;
+   tp->highest_sack = tcp_write_queue_head(sk);
}
prior_fackets = tp->fackets_out;
 
@@ -1483,10 +1483,9 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
if (fack_count > tp->fackets_out)
tp->fackets_out = fack_count;
 
-   if (after(TCP_SKB_CB(skb)->seq, tp->highest_sack)) {
-   tp->highest_sack = TCP_SKB_CB(skb)->seq;
-   highest_sack_end_seq = TCP_SKB_CB(skb)->end_seq;
-   }
+   if (after(TCP_SKB_CB(skb)->seq, tcp_highest_sack_seq(tp)))
+   tp->highest_sack = skb;
+
} else {
if (dup_sack && (sacked&TCPCB_RETRANS))
reord = min(fack_count, reord);
@@ -1514,6 +1513,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
flag &= ~FLAG_ONLY_ORIG_SACKED;
}
 
+   highest_sack_end_seq = TCP_SKB_CB(tp->highest_sack)->end_seq;
if (tcp_is_fack(tp) && tp->retrans_out &&
after(highest_sack_end_seq, tp->lost_retrans_low) &&
icsk->icsk_ca_state == TCP_CA_Recovery)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 324b420..a5863f9 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -657,13 +657,15 @@ static void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb, unsigned
  * tweak SACK fastpath hint too as it would overwrite all changes unless
  * hint is also changed.
  */
-static void tcp_adjust_fackets_out(struct tcp_sock *tp, struct sk_buff *skb,
+static void tcp_adjust_fackets_out(struct sock *sk, struct sk_buff *skb,
   int decr)
 {
+   struct tcp_sock *tp = tcp_sk(sk);
+
if (!tp->sacked_out || tcp_is_reno(tp))
return;
 
-   if (!before(tp->highest_sack, TCP_SKB_CB(skb)->seq))
+   if (!before(tcp_highest_sack_seq(tp), TCP_SKB_CB(skb)->seq))
tp->fackets_out -= decr;
 
/* cnt_hint is off-by-one compared with fackets_out (see sacktag) */
@@ -712,9 +714,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
TCP_SKB_CB(buff)->end_seq = TCP_SKB_CB(skb)->end_seq;

[RFC PATCH net-2.6.25 0/10] [TCP]: Cleanups, tweaks & sacktag recode

2007-11-15 Thread Ilpo Järvinen

Hi Dave,

Here's the sacktag recv_sack_cache usage rewrite which you were
interested in looking at earlier; due to other fixes it has dragged
on this long... Besides that, a couple of new bugs^W^Wcleanups &
tweaks are there as well :-). I'll probably have to bring the "create
tcp_sacktag_state" patch back to avoid all that pointer passing
all around. But those won't be earth-shattering changes. The first
two are probably trivial enough to be accepted as is.

Boot & simple transfer tested, minor fixes after that. I'll try
to arrange time at some point to do more verification of
the new sacktag and RFC3517 code, and to compare old and new sacktag
to get some numbers on accessed skbs per sacktag operation.

--
 i.



[PATCH 01/10] [TCP]: Move !in_sack test earlier in sacktag & reorganize if()s

2007-11-15 Thread Ilpo Järvinen

All intermediate conditions include it already, make them
simpler as well.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   31 ++-
 1 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0f0c1c9..c470b5a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1406,28 +1406,25 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
if (unlikely(in_sack < 0))
break;
 
+   if (!in_sack) {
+   fack_count += tcp_skb_pcount(skb);
+   continue;
+   }
+
sacked = TCP_SKB_CB(skb)->sacked;
 
/* Account D-SACK for retransmitted packet. */
-   if ((dup_sack && in_sack) &&
-   (sacked & TCPCB_RETRANS) &&
-   after(TCP_SKB_CB(skb)->end_seq, tp->undo_marker))
-   tp->undo_retrans--;
-
-   /* The frame is ACKed. */
-   if (!after(TCP_SKB_CB(skb)->end_seq, tp->snd_una)) {
-   if (sacked&TCPCB_RETRANS) {
-   if ((dup_sack && in_sack) &&
-   (sacked&TCPCB_SACKED_ACKED))
-   reord = min(fack_count, reord);
-   }
-
-   /* Nothing to do; acked frame is about to be dropped. */
-   fack_count += tcp_skb_pcount(skb);
-   continue;
+   if (dup_sack && (sacked & TCPCB_RETRANS)) {
+   if (after(TCP_SKB_CB(skb)->end_seq, tp->undo_marker))
+   tp->undo_retrans--;
+   if (!after(TCP_SKB_CB(skb)->end_seq, tp->snd_una) &&
+   (sacked & TCPCB_SACKED_ACKED))
+   reord = min(fack_count, reord);
}
 
-   if (!in_sack) {
+
+   /* Nothing to do; acked frame is about to be dropped (was ACKed). */
+   if (!after(TCP_SKB_CB(skb)->end_seq, tp->snd_una)) {
fack_count += tcp_skb_pcount(skb);
continue;
}
-- 
1.5.0.6


[PATCH 3/10] [TCP]: non-FACK SACK follows conservative SACK loss recovery

2007-11-15 Thread Ilpo Järvinen

Many assumptions that are true when no reordering or other
strange events happen are not a part of the RFC3517. FACK
implementation is based on such assumptions. Previously (before
the rewrite) the non-FACK SACK was basically doing fast rexmit
and then it times out all skbs when first cumulative ACK arrives,
which cannot really be called SACK based recovery :-).

RFC3517 SACK disables these things:
- Per SKB timeouts & head timeout entry to recovery
- Marking at least one skb while in recovery (RFC3517 does this
  only for the fast retransmission but not for the other skbs
  when cumulative ACKs arrive in the recovery)
- Sacktag's loss detection flavors B and C (see comment before
  tcp_sacktag_write_queue)

This does not implement the last resort rule 3 of NextSeg, which
allows retransmissions also when not enough SACK blocks have yet
arrived above a segment for IsLost to return true [RFC3517].

The implementation differs from RFC3517 in these points:
- Rate-halving is used instead of FlightSize / 2
- Instead of using dupACKs to trigger the recovery, the number
  of SACK blocks is used as FACK does with SACK blocks+holes
  (which provides more accurate number). It seems that the
  difference can affect negatively only if the receiver does not
  generate SACK blocks at all even though it claimed to be
  SACK-capable.
- Dupthresh is not a constant. Dynamic adjustments include
  both holes and sacked segments (equal to what FACK has) due to the
  complexity involved in determining the number of sacked blocks
  between highest_sack and the reordered segment. Thus it will
  be an over-estimate.

Implementation note:

tcp_clean_rtx_queue doesn't need a lost_cnt tweak because head
skb at that point cannot be SACKED_ACKED (nor would such
situation last for long enough to cause problems).
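
The difference between the two dupACK estimates described above can be made concrete with a toy scoreboard. This is a rough user-space sketch, not kernel code: `struct scoreboard` and the `model_*` functions are hypothetical names invented here; only the final selection mirrors the shape of tcp_dupack_heurestics() in the diff below.

```c
#include <stddef.h>

/* Simplified scoreboard: one flag per outstanding segment, lowest
 * sequence first. Illustrative model only. */
struct scoreboard {
	const unsigned char *sacked; /* 1 = segment was SACKed */
	size_t len;
};

/* Total number of SACKed segments (models tp->sacked_out). */
static int model_sacked_out(const struct scoreboard *sb)
{
	int n = 0;

	for (size_t i = 0; i < sb->len; i++)
		n += sb->sacked[i] ? 1 : 0;
	return n;
}

/* Segments up to and including the highest SACKed one, holes
 * included (models tp->fackets_out). */
static int model_fackets_out(const struct scoreboard *sb)
{
	int n = 0;

	for (size_t i = 0; i < sb->len; i++)
		if (sb->sacked[i])
			n = (int)i + 1;
	return n;
}

/* FACK counts holes as lost; the RFC3517-style estimate does not,
 * because with reordering the holes may still be in flight. */
static int model_dupack_estimate(const struct scoreboard *sb, int is_fack)
{
	return is_fack ? model_fackets_out(sb) : model_sacked_out(sb) + 1;
}
```

With the flags {0, 1, 0, 0, 1, 1, 0}, three segments are SACKed and the highest SACKed one is the sixth, so the FACK estimate is 6 while the non-FACK estimate is 3 + 1 = 4.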

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   80 ++---
 1 files changed, 62 insertions(+), 18 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 48c059d..c1b5339 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -863,6 +863,9 @@ void tcp_enter_cwr(struct sock *sk, const int set_ssthresh)
  */
 static void tcp_disable_fack(struct tcp_sock *tp)
 {
+   /* RFC3517 uses different metric in lost marker => reset on change */
+   if (tcp_is_fack(tp))
+   tp->lost_skb_hint = NULL;
tp->rx_opt.sack_ok &= ~2;
 }
 
@@ -1470,6 +1473,13 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
tp->sacked_out += tcp_skb_pcount(skb);
 
fack_count += tcp_skb_pcount(skb);
+
+   /* Lost marker hint past SACKed? Tweak RFC3517 cnt */
+   if (!tcp_is_fack(tp) && (tp->lost_skb_hint != NULL) &&
+   before(TCP_SKB_CB(skb)->seq,
+  TCP_SKB_CB(tp->lost_skb_hint)->seq))
+   tp->lost_cnt_hint += tcp_skb_pcount(skb);
+
if (fack_count > tp->fackets_out)
tp->fackets_out = fack_count;
 
@@ -1504,7 +1514,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
flag &= ~FLAG_ONLY_ORIG_SACKED;
}
 
-   if (tp->retrans_out &&
+   if (tcp_is_fack(tp) && tp->retrans_out &&
after(highest_sack_end_seq, tp->lost_retrans_low) &&
icsk->icsk_ca_state == TCP_CA_Recovery)
flag |= tcp_mark_lost_retrans(sk, highest_sack_end_seq);
@@ -1858,6 +1868,26 @@ static inline int tcp_fackets_out(struct tcp_sock *tp)
return tcp_is_reno(tp) ? tp->sacked_out+1 : tp->fackets_out;
 }
 
+/* Heurestics to calculate number of duplicate ACKs. There's no dupACKs
+ * counter when SACK is enabled (without SACK, sacked_out is used for
+ * that purpose).
+ *
+ * Instead, with FACK TCP uses fackets_out that includes both SACKed
+ * segments up to the highest received SACK block so far and holes in
+ * between them.
+ *
+ * With reordering, holes may still be in flight, so RFC3517 recovery
+ * uses pure sacked_out (total number of SACKed segments) even though
+ * it violates the RFC that uses duplicate ACKs, often these are equal
+ * but when e.g. out-of-window ACKs or packet duplication occurs,
+ * they differ. Since neither occurs due to loss, TCP should really
+ * ignore them.
+ */
+static inline int tcp_dupack_heurestics(struct tcp_sock *tp)
+{
+   return tcp_is_fack(tp) ? tp->fackets_out : tp->sacked_out + 1;
+}
+
 static inline int tcp_skb_timedout(struct sock *sk, struct sk_buff *skb)
 {
return (tcp_time_stamp - TCP_SKB_CB(skb)->when > inet_csk(sk)->icsk_rto);
@@ -1978,13 +2008,13 @@ static int tcp_time_to_recover(struct sock *sk)
return 1;
 
/* Not-A-Trick#2 : Classic rule... */
-   if (tcp_fackets_out(tp)

[PATCH 02/10] [TCP]: Extend reordering detection to cover CA_Loss partially

2007-11-15 Thread Ilpo Järvinen

This implements more accurately what is stated in sacktag's
overall comment:

  Both of these heuristics are not used in Loss state, when
   we cannot account for retransmits accurately.

When CA_Loss state is entered, the state changer ensures that
undo_marker is only set if no TCPCB_RETRANS skbs were found,
thus having non-zero undo_marker in CA_Loss basically tells
that the R-bits still accurately reflect the current state
of TCP.
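
The gating condition this patch changes can be read as a small predicate. The following is a hedged user-space sketch: `enum model_ca_state`, `seq_after()` and `may_update_reordering()` are hypothetical names introduced for illustration, and the plain integer arguments stand in for the tcp_sock and icsk fields tested in the hunk below.

```c
#include <stdint.h>

/* Hypothetical stand-in for the congestion-avoidance states. */
enum model_ca_state { MODEL_CA_OPEN, MODEL_CA_RECOVERY, MODEL_CA_LOSS };

/* Wrap-safe "a is after b" on 32-bit sequence space. */
static int seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Reordering may be recorded when a SACKed hole was found below
 * fackets_out, and either we are not in Loss or a non-zero
 * undo_marker says the R-bits are still trustworthy, and F-RTO
 * is not in the way. */
static int may_update_reordering(int reord, int fackets_out,
				 enum model_ca_state state,
				 uint32_t undo_marker,
				 uint32_t frto_highmark, uint32_t snd_una)
{
	return reord < fackets_out &&
	       (state != MODEL_CA_LOSS || undo_marker != 0) &&
	       (frto_highmark == 0 || seq_after(snd_una, frto_highmark));
}
```

Before the patch, the middle clause was simply "not in Loss"; the only behavioural change is that Loss with a non-zero undo_marker now passes.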

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c470b5a..48c059d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1511,7 +1511,8 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 
tcp_verify_left_out(tp);
 
-   if ((reord < tp->fackets_out) && icsk->icsk_ca_state != TCP_CA_Loss &&
+   if ((reord < tp->fackets_out) &&
+   ((icsk->icsk_ca_state != TCP_CA_Loss) || tp->undo_marker) &&
(!tp->frto_highmark || after(tp->snd_una, tp->frto_highmark)))
tcp_update_reordering(sk, tp->fackets_out - reord, 0);
 
-- 
1.5.0.6


[PATCH 07/10] [TCP]: Create tcp_sacktag_one().

2007-11-15 Thread Ilpo Järvinen

Worker function that implements the main logic of
the inner-most loop of tcp_sacktag_write_queue().

Idea was originally presented by David S. Miller.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |  192 +-
 1 files changed, 96 insertions(+), 96 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 29fff81..b301abb 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1240,6 +1240,99 @@ static int tcp_match_skb_to_sack(struct sock *sk, struct sk_buff *skb,
return in_sack;
 }
 
+static int tcp_sacktag_one(struct sk_buff *skb, struct tcp_sock *tp,
+  int *reord, int dup_sack, int fack_count)
+{
+   u8 sacked = TCP_SKB_CB(skb)->sacked;
+   int flag = 0;
+
+   /* Account D-SACK for retransmitted packet. */
+   if (dup_sack && (sacked & TCPCB_RETRANS)) {
+   if (after(TCP_SKB_CB(skb)->end_seq, tp->undo_marker))
+   tp->undo_retrans--;
+   if (!after(TCP_SKB_CB(skb)->end_seq, tp->snd_una) &&
+   (sacked & TCPCB_SACKED_ACKED))
+   *reord = min(fack_count, *reord);
+   }
+
+   /* Nothing to do; acked frame is about to be dropped (was ACKed). */
+   if (!after(TCP_SKB_CB(skb)->end_seq, tp->snd_una))
+   return flag;
+
+   if (!(sacked & TCPCB_SACKED_ACKED)) {
+   if (sacked & TCPCB_SACKED_RETRANS) {
+   /* If the segment is not tagged as lost,
+* we do not clear RETRANS, believing
+* that retransmission is still in flight.
+*/
+   if (sacked & TCPCB_LOST) {
+   TCP_SKB_CB(skb)->sacked &=
+   ~(TCPCB_LOST|TCPCB_SACKED_RETRANS);
+   tp->lost_out -= tcp_skb_pcount(skb);
+   tp->retrans_out -= tcp_skb_pcount(skb);
+
+   /* clear lost hint */
+   tp->retransmit_skb_hint = NULL;
+   }
+   } else {
+   if (!(sacked & TCPCB_RETRANS)) {
+   /* New sack for not retransmitted frame,
+* which was in hole. It is reordering.
+*/
+   if (before(TCP_SKB_CB(skb)->seq,
+  tcp_highest_sack_seq(tp)))
+   *reord = min(fack_count, *reord);
+
+   /* SACK enhanced F-RTO (RFC4138; Appendix B) */
+   if (!after(TCP_SKB_CB(skb)->end_seq, tp->frto_highmark))
+   flag |= FLAG_ONLY_ORIG_SACKED;
+   }
+
+   if (sacked & TCPCB_LOST) {
+   TCP_SKB_CB(skb)->sacked &= ~TCPCB_LOST;
+   tp->lost_out -= tcp_skb_pcount(skb);
+
+   /* clear lost hint */
+   tp->retransmit_skb_hint = NULL;
+   }
+   }
+
+   TCP_SKB_CB(skb)->sacked |= TCPCB_SACKED_ACKED;
+   flag |= FLAG_DATA_SACKED;
+   tp->sacked_out += tcp_skb_pcount(skb);
+
+   fack_count += tcp_skb_pcount(skb);
+
+   /* Lost marker hint past SACKed? Tweak RFC3517 cnt */
+   if (!tcp_is_fack(tp) && (tp->lost_skb_hint != NULL) &&
+   before(TCP_SKB_CB(skb)->seq,
+  TCP_SKB_CB(tp->lost_skb_hint)->seq))
+   tp->lost_cnt_hint += tcp_skb_pcount(skb);
+
+   if (fack_count > tp->fackets_out)
+   tp->fackets_out = fack_count;
+
+   if (after(TCP_SKB_CB(skb)->seq, tcp_highest_sack_seq(tp)))
+   tp->highest_sack = skb;
+
+   } else {
+   if (dup_sack && (sacked & TCPCB_RETRANS))
+   *reord = min(fack_count, *reord);
+   }
+
+   /* D-SACK. We can detect redundant retransmission in S|R and plain R
+* frames and clear it. undo_retrans is decreased above, L|R frames
+* are accounted above as well.
+*/
+   if (dup_sack && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS)) {
+   TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
+   tp->retrans_out -= tcp_skb_pcount(skb);
+   tp->retransmit_skb_hint = NULL;
+   }
+
+   return flag;
+}
+
 static int
 tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_una)
 {
@@ -1375,7 +1468,6 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 
tcp_for_write_queue_from(skb, sk) {
int in_sack = 0;
-   u8 sacked;
 
if (skb

[PATCH 08/10] [TCP]: Earlier SACK block verification & simplify access to them

2007-11-15 Thread Ilpo Järvinen

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/linux/tcp.h  |2 +-
 net/ipv4/tcp_input.c |   85 ++
 2 files changed, 52 insertions(+), 35 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 34acee6..794497c 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -330,7 +330,7 @@ struct tcp_sock {
struct tcp_sack_block duplicate_sack[1]; /* D-SACK block */
struct tcp_sack_block selective_acks[4]; /* The SACKS themselves*/
 
-   struct tcp_sack_block_wire recv_sack_cache[4];
+   struct tcp_sack_block recv_sack_cache[4];
 
struct sk_buff *highest_sack;   /* highest skb with SACK received
 * (validity guaranteed only if
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b301abb..69f2f79 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1340,9 +1340,11 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
struct tcp_sock *tp = tcp_sk(sk);
unsigned char *ptr = (skb_transport_header(ack_skb) +
  TCP_SKB_CB(ack_skb)->sacked);
-   struct tcp_sack_block_wire *sp = (struct tcp_sack_block_wire *)(ptr+2);
+   struct tcp_sack_block_wire *sp_wire = (struct tcp_sack_block_wire *)(ptr+2);
+   struct tcp_sack_block sp[4];
struct sk_buff *cached_skb;
int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
+   int used_sacks;
int reord = tp->packets_out;
int flag = 0;
int found_dup_sack = 0;
@@ -1357,7 +1359,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
tp->highest_sack = tcp_write_queue_head(sk);
}
 
-   found_dup_sack = tcp_check_dsack(tp, ack_skb, sp,
+   found_dup_sack = tcp_check_dsack(tp, ack_skb, sp_wire,
 num_sacks, prior_snd_una);
if (found_dup_sack)
flag |= FLAG_DSACKING_ACK;
@@ -1372,14 +1374,49 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
if (!tp->packets_out)
goto out;
 
+   used_sacks = 0;
+   first_sack_index = 0;
+   for (i = 0; i < num_sacks; i++) {
+   int dup_sack = !i && found_dup_sack;
+
+   sp[used_sacks].start_seq = ntohl(get_unaligned(&sp_wire[i].start_seq));
+   sp[used_sacks].end_seq = ntohl(get_unaligned(&sp_wire[i].end_seq));
+
+   if (!tcp_is_sackblock_valid(tp, dup_sack,
+   sp[used_sacks].start_seq,
+   sp[used_sacks].end_seq)) {
+   if (dup_sack) {
+   if (!tp->undo_marker)
+   NET_INC_STATS_BH(LINUX_MIB_TCPDSACKIGNOREDNOUNDO);
+   else
+   NET_INC_STATS_BH(LINUX_MIB_TCPDSACKIGNOREDOLD);
+   } else {
+   /* Don't count olds caused by ACK reordering */
+   if ((TCP_SKB_CB(ack_skb)->ack_seq != tp->snd_una) &&
+   !after(sp[used_sacks].end_seq, tp->snd_una))
+   continue;
+   NET_INC_STATS_BH(LINUX_MIB_TCPSACKDISCARD);
+   }
+   if (i == 0)
+   first_sack_index = -1;
+   continue;
+   }
+
+   /* Ignore very old stuff early */
+   if (!after(sp[used_sacks].end_seq, prior_snd_una))
+   continue;
+
+   used_sacks++;
+   }
+
/* SACK fastpath:
 * if the only SACK change is the increase of the end_seq of
 * the first block then only apply that SACK block
 * and use retrans queue hinting otherwise slowpath */
force_one_sack = 1;
-   for (i = 0; i < num_sacks; i++) {
-   __be32 start_seq = sp[i].start_seq;
-   __be32 end_seq = sp[i].end_seq;
+   for (i = 0; i < used_sacks; i++) {
+   u32 start_seq = sp[i].start_seq;
+   u32 end_seq = sp[i].end_seq;
 
if (i == 0) {
if (tp->recv_sack_cache[i].start_seq != start_seq)
@@ -1398,19 +1435,17 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
tp->recv_sack_cache[i].end_seq = 0;
}
 
-   first_sack_index = 0;
if (force_one_sack)
-   num_sacks = 1;
+   used_sacks = 1;
else {
int j;
tp->fastpath_skb_hint = NULL;
 
/* order SACK blocks to allow in order walk of the retrans queue */
-   for (i = num_sacks-1; i > 0; i--) {
+   for (i = used_sacks - 1; i > 0; i--) {

[PATCH 10/10] [TCP]: Track sacktag (DEVEL PATCH)

2007-11-15 Thread Ilpo Järvinen

This is not intended to go to mainline; it is provided just for those
who are interested enough in the algorithm internals during
a test.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/linux/snmp.h |   19 +++
 net/ipv4/proc.c  |   19 +++
 net/ipv4/tcp_input.c |   50 --
 3 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/include/linux/snmp.h b/include/linux/snmp.h
index 89f0c2b..fbcd62d 100644
--- a/include/linux/snmp.h
+++ b/include/linux/snmp.h
@@ -214,6 +214,25 @@ enum
LINUX_MIB_TCPDSACKIGNOREDOLD,   /* TCPSACKIgnoredOld */
LINUX_MIB_TCPDSACKIGNOREDNOUNDO,/* TCPSACKIgnoredNoUndo */
LINUX_MIB_TCPSPURIOUSRTOS,  /* TCPSpuriousRTOs */
+   LINUX_MIB_TCP_SACK0,
+   LINUX_MIB_TCP_SACK1,
+   LINUX_MIB_TCP_SACK2,
+   LINUX_MIB_TCP_SACK3,
+   LINUX_MIB_TCP_SACK4,
+   LINUX_MIB_TCP_WALKEDSKBS,
+   LINUX_MIB_TCP_WALKEDDSACKS,
+   LINUX_MIB_TCP_SKIPPEDSKBS,
+   LINUX_MIB_TCP_NOCACHE,
+   LINUX_MIB_TCP_HEADWALK,
+   LINUX_MIB_TCP_FULLSKIP,
+   LINUX_MIB_TCP_TAILSKIP,
+   LINUX_MIB_TCP_HEADSKIP_TOHIGH,
+   LINUX_MIB_TCP_TAIL_TOHIGH,
+   LINUX_MIB_TCP_HEADSKIP,
+   LINUX_MIB_TCP_NEWSKIP,
+   LINUX_MIB_TCP_FULLWALK,
+   LINUX_MIB_TCP_TAILWALK,
+   LINUX_MIB_TCP_CACHEREMAINING,
__LINUX_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index ce34b28..a5e842d 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -227,6 +227,25 @@ static const struct snmp_mib snmp4_net_list[] = {
SNMP_MIB_ITEM(TCPDSACKIgnoredOld, LINUX_MIB_TCPDSACKIGNOREDOLD),
SNMP_MIB_ITEM(TCPDSACKIgnoredNoUndo, LINUX_MIB_TCPDSACKIGNOREDNOUNDO),
SNMP_MIB_ITEM(TCPSpuriousRTOs, LINUX_MIB_TCPSPURIOUSRTOS),
+   SNMP_MIB_ITEM(TCP_SACK0, LINUX_MIB_TCP_SACK0),
+   SNMP_MIB_ITEM(TCP_SACK1, LINUX_MIB_TCP_SACK1),
+   SNMP_MIB_ITEM(TCP_SACK2, LINUX_MIB_TCP_SACK2),
+   SNMP_MIB_ITEM(TCP_SACK3, LINUX_MIB_TCP_SACK3),
+   SNMP_MIB_ITEM(TCP_SACK4, LINUX_MIB_TCP_SACK4),
+   SNMP_MIB_ITEM(TCP_WALKEDSKBS, LINUX_MIB_TCP_WALKEDSKBS),
+   SNMP_MIB_ITEM(TCP_WALKEDDSACKS, LINUX_MIB_TCP_WALKEDDSACKS),
+   SNMP_MIB_ITEM(TCP_SKIPPEDSKBS, LINUX_MIB_TCP_SKIPPEDSKBS),
+   SNMP_MIB_ITEM(TCP_NOCACHE, LINUX_MIB_TCP_NOCACHE),
+   SNMP_MIB_ITEM(TCP_FULLWALK, LINUX_MIB_TCP_FULLWALK),
+   SNMP_MIB_ITEM(TCP_HEADWALK, LINUX_MIB_TCP_HEADWALK),
+   SNMP_MIB_ITEM(TCP_TAILWALK, LINUX_MIB_TCP_TAILWALK),
+   SNMP_MIB_ITEM(TCP_FULLSKIP, LINUX_MIB_TCP_FULLSKIP),
+   SNMP_MIB_ITEM(TCP_TAILSKIP, LINUX_MIB_TCP_TAILSKIP),
+   SNMP_MIB_ITEM(TCP_HEADSKIP, LINUX_MIB_TCP_HEADSKIP),
+   SNMP_MIB_ITEM(TCP_HEADSKIP_TOHIGH, LINUX_MIB_TCP_HEADSKIP_TOHIGH),
+   SNMP_MIB_ITEM(TCP_TAIL_TOHIGH, LINUX_MIB_TCP_TAIL_TOHIGH),
+   SNMP_MIB_ITEM(TCP_NEWSKIP, LINUX_MIB_TCP_NEWSKIP),
+   SNMP_MIB_ITEM(TCP_CACHEREMAINING, LINUX_MIB_TCP_CACHEREMAINING),
SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 5833b01..87ab327 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1370,6 +1370,10 @@ static struct sk_buff *tcp_sacktag_walk(struct sk_buff *skb, struct sock *sk,
*flag |= tcp_sacktag_one(skb, tp, reord, dup_sack, *fack_count);
 
*fack_count += tcp_skb_pcount(skb);
+
+   NET_INC_STATS_BH(LINUX_MIB_TCP_WALKEDSKBS);
+   if (dup_sack)
+   NET_INC_STATS_BH(LINUX_MIB_TCP_WALKEDDSACKS);
}
return skb;
 }
@@ -1386,6 +1390,8 @@ static struct sk_buff *tcp_sacktag_skip(struct sk_buff *skb, struct sock *sk,
 
if (before(TCP_SKB_CB(skb)->end_seq, skip_to_seq))
break;
+
+   NET_INC_STATS_BH(LINUX_MIB_TCP_SKIPPEDSKBS);
}
return skb;
 }
@@ -1434,6 +1440,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
int fack_count;
int i, j;
int first_sack_index;
+   int fullwalk = 1;
 
if (!tp->sacked_out) {
if (WARN_ON(tp->fackets_out))
@@ -1523,6 +1530,17 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
cache++;
}
 
+   switch (used_sacks) {
+   case 0: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK0); break;
+   case 1: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK1); break;
+   case 2: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK2); break;
+   case 3: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK3); break;
+   case 4: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK4); break;
+   }
+
+   if (!tcp_sack_cache_ok(tp, cache))
+   NET_INC_STATS_BH(LINUX_MIB_TCP_NOCACHE);
+
while (i < used_sacks) {
u32 start_seq = sp[i].start_seq;
u32 end_seq = sp[i].end_seq;
@@

Re: [PATCH 1/2] cleanup pernet operation without CONFIG_NET_NS

2007-11-15 Thread Eric W. Biederman

Denis V. Lunev [EMAIL PROTECTED] writes:

> If CONFIG_NET_NS is not set, the only namespace is possible.
>
> This patch removes list of pernet_operations and cleanups code a bit.
> This list is not needed if there are no namespaces. We should just call
> ->init method.
>
> Additionally, the ->exit will be called on module unloading only. This
> case is safe - the code is not discarded. For the in/kernel code, ->exit
> should never be called.

This patch looks sane, and reasonable in the !CONFIG_NET_NS case.

Eric

Re: [PATCH 2/2] move unneeded data to initdata section

2007-11-15 Thread Denis V. Lunev

Eric W. Biederman wrote:
> Denis V. Lunev [EMAIL PROTECTED] writes:
>
> > This patch reverts Eric's commit 2b008b0a8e96b726c603c5e1a5a7a509b5f61e35
> >
> > It diets .text & .data section of the kernel if CONFIG_NET_NS is not set.
> > This is safe after list operations cleanup.
>
> Ok.  This patch is technically safe because none of the touched
> code can live in a module and so we never touch the exit code path.
>
> However in the general case and as a code idiom this __net_initdata
> on struct pernet_operations is fundamentally horribly broken.
>
> Look at what happens if we use this idiom in module.  There
> is only one definition of __initdata .init.data.  The module
> loader places all sections that begin with .init in a region of
> memory that will be discarded after module initialization.

Nothing is discarded after module load. Though, I can be wrong. Could
you point me to the exact place?

Regards,
Den

Re: [PATCH 2/2] move unneeded data to initdata section

2007-11-15 Thread Eric W. Biederman

Denis V. Lunev [EMAIL PROTECTED] writes:

> This patch reverts Eric's commit 2b008b0a8e96b726c603c5e1a5a7a509b5f61e35
>
> It diets .text & .data section of the kernel if CONFIG_NET_NS is not set.
> This is safe after list operations cleanup.

Ok.  This patch is technically safe because none of the touched
code can live in a module and so we never touch the exit code path.

However in the general case and as a code idiom this __net_initdata
on struct pernet_operations is fundamentally horribly broken.

Look at what happens if we use this idiom in a module.  There
is only one definition of __initdata .init.data.  The module
loader places all sections that begin with .init in a region of
memory that will be discarded after module initialization.

So in register_pernet_operations we pass in a pointer to struct
pernet_operations and call the init method.  Later when we remove the
module we again pass in the pointer to struct pernet_operations which
lived in an init section so it has been discarded.  We dereference
that pointer to find the exit method and KABOOM.

So I'm still opposed to __net_initdata on the grounds that at best
it is like putting our head under a guillotine and reaching up and
sawing at the rope that holds the blade up with a pocket knife.  It is
a thick rope and a puny knife so you are safe for a while.

Eric
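
The failure mode described in this message can be imitated in user space without actually touching freed memory. This is only a sketch under the assumption that init sections really are discarded after module initialization, which is exactly the point disputed elsewhere in the thread. All names here (`struct model_ops`, `init_section_ops`, `model_discard_init_section()`, `model_unregister()`) are hypothetical, and "discarding" is modelled by clearing the function pointers rather than freeing anything.

```c
#include <stddef.h>

/* Hypothetical miniature of struct pernet_operations. */
struct model_ops {
	int (*init)(void);
	void (*exit)(void);
};

static int exit_runs; /* how many times an exit method really ran */

static int demo_init(void) { return 0; }
static void demo_exit(void) { exit_runs++; }

/* Plays the role of an ops struct tagged __net_initdata ... */
static struct model_ops init_section_ops = { demo_init, demo_exit };
/* ... and of one left in ordinary data. */
static struct model_ops regular_ops = { demo_init, demo_exit };

/* Model of the loader throwing away .init.* after initialization.
 * Real discarded memory would hold garbage, not tidy NULLs. */
static void model_discard_init_section(void)
{
	init_section_ops.init = NULL;
	init_section_ops.exit = NULL;
}

/* Unregistration has to find ->exit through the same pointer that
 * was passed at registration time. Returns -1 if it is gone. */
static int model_unregister(const struct model_ops *ops)
{
	if (!ops->exit)
		return -1;
	ops->exit();
	return 0;
}
```

In the model the lost exit method is detected and reported; in a real kernel there would be no such check, and the stale pointer would simply be followed.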

Re: [alsa-devel] [BUG] New Kernel Bugs

2007-11-15 Thread Rene Herman


On 15-11-07 14:00, Jörn Engel wrote:

> And even without mails being held hostage for weeks, every single
> moderation mail is annoying.  Like the one I'm sure to receive after
> sending this out.

Certainly. Up to this thread I wasn't actually aware the list was doing that.
While it might be informative once, getting it each time quickly gets old.
I don't know if mailman can do anything like it, but I'd suggest anyone running
a non-subscriber-moderation list configure it to send such messages at most
once per time period per address or some such. And just disable the message
if it cannot do that.


Fortunately, alsa-devel is (almost) no longer such a list anyway as it's 
moving to vger. Hurrah. David -- thanks.


Rene.


Re: [PATCH 2/7] CAN: Add PF_CAN core module

2007-11-15 Thread Sam Ravnborg

> > > +
> > > +struct timer_list stattimer; /* timer for statistics update */
> > > +struct s_stats  stats;   /* packet statistics */
> > > +struct s_pstats pstats;  /* receive list statistics */
> >
> > More global variables without prefix.
>
> These variables are not exported with EXPORT_SYMBOL, so there should
> be no name conflict.  They cannot be made static because they are used
> in af_can.c and proc.c.  Nevertheless we can prefix them with can_ if
> you still think it's necessary.

When this is built-in they will be in the global kernel namespace.
So please add can_ prefix.

Sam

Re: [PATCH 2/7] CAN: Add PF_CAN core module

2007-11-15 Thread Sam Ravnborg

On Thu, Nov 15, 2007 at 04:05:30AM -0800, David Miller wrote:
 From: Urs Thuermann [EMAIL PROTECTED]
 Date: 15 Nov 2007 12:51:34 +0100

  I prefer our code because it is shorter (fits into one line) and can
  be used anywhere where an expression is allowed compared to only where
  a statement is allowed.  Actually, I first had

  #define DBG( ... )   ((debug  1)  printk( ... ))

  and so on, but that didn't work with can_debug_{cframe,sbk} since they
  return void.

  Admitted, the benefit of expr vs. statement is really negligible and
  since this issue has come up several times I will change these macros
  using do-while.

 I really frown upon these local debugging macros people tend to want
 to submit with their changes.

 It really craps up the tree, even though it might be useful to you.

 So please remove this stuff or replace the debugging statements
 with some generic kernel debugging facility, there are several.

It would be useful if someone could write a short intro to the preferred
ones, and we could put it in Documentation/*

I had the same comment but had nowhere to point the CAN guys at.
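
A minimal user-space sketch of the do-while macro style discussed in this
thread. It is not the CAN code: debug_mask stands in for DBG_VAR,
debug_frame() for a void helper such as can_debug_cframe(), and fprintf()
for printk(); all of these names are illustrative assumptions. It shows why
the statement form is safe in an unbraced if/else.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins: debug_mask plays the role of DBG_VAR, and
 * dbg_calls counts how often the helper actually ran. */
static int debug_mask;
static int dbg_calls;

/* Returns void, like can_debug_cframe(), so it cannot be chained
 * with && inside an expression-style macro. */
static void debug_frame(const char *msg)
{
	assert(msg != NULL);
	dbg_calls++;
	fprintf(stderr, "frame: %s\n", msg);
}

/* Statement form: expands to exactly one statement. */
#define DBG_FRAME(msg) \
	do { if (debug_mask & 2) debug_frame(msg); } while (0)

/* Returns 1 if the else branch ran. The macro's inner "if" does not
 * capture the else, because it is wrapped in do { } while (0). */
static int classify(int ok)
{
	if (ok)
		DBG_FRAME("ok");
	else
		return 1;
	return 0;
}
```

With a bare "if (debug_mask & 2) debug_frame(msg)" as the macro body, the
else in classify() would bind to the macro's inner if, which is the usual
argument for the do-while wrapper.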

Sam

Re: [PATCH 2/2] move unneeded data to initdata section

2007-11-15 Thread Sam Ravnborg

On Thu, Nov 15, 2007 at 05:42:04PM +0300, Denis V. Lunev wrote:
 Eric W. Biederman wrote:
  Denis V. Lunev [EMAIL PROTECTED] writes:
  
  This patch reverts Eric's commit 2b008b0a8e96b726c603c5e1a5a7a509b5f61e35
 
  It shrinks the .text and .data sections of the kernel if CONFIG_NET_NS is not set.
  This is safe after list operations cleanup.
  
  Ok.  This patch is technically safe because none of the touched
  code can live in a module and so we never touch the exit code path.
  
  However in the general case and as a code idiom this __net_initdata
  on struct pernet_operations is fundamentally horribly broken.
  
  Look at what happens if we use this idiom in a module.  There
  is only one definition of __initdata: .init.data.  The module
  loader places all sections that begin with .init in a region of
  memory that will be discarded after module initialization.
 
 nothing is discarded after module load. Though, I can be wrong. Could
 you point me to the exact place?
If __initdata is not discarded after module load then we should do it.
There is no reason to waste __initdata RAM when the module is loaded.

Sam

[PATCH 2.6.25 4/6] net: Make AF_PACKET handle multiple network namespaces

2007-11-15 Thread Denis V. Lunev

This is done by making packet_sklist_lock and packet_sklist per
network namespace and adding an additional filter condition on
received packets to ensure they came from the proper network
namespace.
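
A minimal user-space sketch of the receive-side filter described above. The
structs are cut-down stand-ins for the kernel's struct net, struct
net_device and struct sock, and packet_wanted() and packet_sock_add() are
hypothetical helper names; only the comparison of the device's namespace
with the socket's namespace mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct net { int nr_packet_socks; };       /* per-namespace socket count */
struct net_device { struct net *nd_net; }; /* namespace the device lives in */
struct sock { struct net *sk_net; };       /* namespace the socket lives in */

/* Returns 1 if the packet should be delivered to sk, 0 if dropped. */
static int packet_wanted(const struct net_device *dev, const struct sock *sk)
{
	assert(dev != NULL && sk != NULL);
	return dev->nd_net == sk->sk_net;
}

/* Registers a socket with its own namespace instead of a global list. */
static void packet_sock_add(struct sock *sk, struct net *net)
{
	sk->sk_net = net;
	net->nr_packet_socks++;
}
```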

Changes from v1:
- prohibit calling inet_dgram_ops.ioctl outside init_net

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 include/net/net_namespace.h |4 +
 net/packet/af_packet.c  |  131 ---
 2 files changed, 89 insertions(+), 46 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 90802a6..4d0d634 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -32,6 +32,10 @@ struct net {
struct hlist_head   *dev_index_head;
 
struct sock *rtnl;  /* rtnetlink socket */
+
+   /* List of all packet sockets. */
+   rwlock_t    packet_sklist_lock;
+   struct hlist_head   packet_sklist;
 };
 
 #ifdef CONFIG_NET
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8a7807d..45e3cbc 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -135,10 +135,6 @@ dev->hard_header == NULL (ll header is added by device, we cannot control it)
packet classifier depends on it.
  */
 
-/* List of all packet sockets. */
-static HLIST_HEAD(packet_sklist);
-static DEFINE_RWLOCK(packet_sklist_lock);
-
 /* Private packet socket structures. */
 
 struct packet_mclist
@@ -246,9 +242,6 @@ static int packet_rcv_spkt(struct sk_buff *skb, struct net_device *dev,  struct
struct sock *sk;
struct sockaddr_pkt *spkt;
 
-   if (dev->nd_net != &init_net)
-   goto out;
-
/*
 *  When we registered the protocol we saved the socket in the data
 *  field for just this event.
@@ -270,6 +263,9 @@ static int packet_rcv_spkt(struct sk_buff *skb, struct net_device *dev,  struct
if (skb->pkt_type == PACKET_LOOPBACK)
goto out;
 
+   if (dev->nd_net != sk->sk_net)
+   goto out;
+
if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL)
goto oom;
 
@@ -341,7 +337,7 @@ static int packet_sendmsg_spkt(struct kiocb *iocb, struct socket *sock,
 */
 
saddr->spkt_device[13] = 0;
-   dev = dev_get_by_name(&init_net, saddr->spkt_device);
+   dev = dev_get_by_name(sk->sk_net, saddr->spkt_device);
err = -ENODEV;
if (dev == NULL)
goto out_unlock;
@@ -449,15 +445,15 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev, struct packet
int skb_len = skb->len;
unsigned int snaplen, res;
 
-   if (dev->nd_net != &init_net)
-   goto drop;
-
if (skb->pkt_type == PACKET_LOOPBACK)
goto drop;
 
sk = pt->af_packet_priv;
po = pkt_sk(sk);
 
+   if (dev->nd_net != sk->sk_net)
+   goto drop;
+
skb->dev = dev;
 
if (dev->header_ops) {
@@ -566,15 +562,15 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, struct packe
struct sk_buff *copy_skb = NULL;
struct timeval tv;
 
-   if (dev->nd_net != &init_net)
-   goto drop;
-
if (skb->pkt_type == PACKET_LOOPBACK)
goto drop;
 
sk = pt->af_packet_priv;
po = pkt_sk(sk);
 
+   if (dev->nd_net != sk->sk_net)
+   goto drop;
+
if (dev->header_ops) {
if (sk->sk_type != SOCK_DGRAM)
skb_push(skb, skb->data - skb_mac_header(skb));
@@ -732,7 +728,7 @@ static int packet_sendmsg(struct kiocb *iocb, struct socket *sock,
}
 
 
-   dev = dev_get_by_index(&init_net, ifindex);
+   dev = dev_get_by_index(sk->sk_net, ifindex);
err = -ENXIO;
if (dev == NULL)
goto out_unlock;
@@ -799,15 +795,17 @@ static int packet_release(struct socket *sock)
 {
struct sock *sk = sock->sk;
struct packet_sock *po;
+   struct net *net;
 
if (!sk)
return 0;
 
+   net = sk->sk_net;
po = pkt_sk(sk);
 
-   write_lock_bh(&packet_sklist_lock);
+   write_lock_bh(&net->packet_sklist_lock);
sk_del_node_init(sk);
-   write_unlock_bh(&packet_sklist_lock);
+   write_unlock_bh(&net->packet_sklist_lock);
 
/*
 *  Unhook packet receive handler.
@@ -916,7 +914,7 @@ static int packet_bind_spkt(struct socket *sock, struct sockaddr *uaddr, int add
return -EINVAL;
strlcpy(name,uaddr->sa_data,sizeof(name));
 
-   dev = dev_get_by_name(&init_net, name);
+   dev = dev_get_by_name(sk->sk_net, name);
if (dev) {
err = packet_do_bind(sk, dev, pkt_sk(sk)->num);
dev_put(dev);
@@ -943,7 +941,7 @@ static int packet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len
 
if (sll->sll_ifindex) {

[PATCH 2.6.25 5/6] net: Make AF_UNIX per network namespace safe (v2)

2007-11-15 Thread Denis V. Lunev

From 337f0867c81ab93a1bc645e62896a798d0c864ac Mon Sep 17 00:00:00 2001
From: Denis V. Lunev [EMAIL PROTECTED]
Date: Thu, 15 Nov 2007 15:04:12 +0300
Subject: [PATCH] net: Make AF_UNIX per network namespace safe [v2]

Because of the global nature of garbage collection, and because of the
cost of per-namespace hash tables, unix_socket_table has been kept
global, with a filter added on lookups so we don't see sockets from
the wrong namespace.

Currently I don't fold the namespace into the hash, so multiple
namespaces using the same socket name will be guaranteed a hash
collision.
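
A minimal user-space sketch of the "global table, filtered lookup" idea
described above. It is not the af_unix.c code: TABLE_SIZE, struct usock,
name_hash(), usock_insert() and usock_find() are illustrative names, and the
hash function is an arbitrary choice. Because the hash depends only on the
name, two namespaces binding the same name always land in the same chain,
and the namespace comparison is what keeps them apart.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 8   /* hypothetical; the kernel uses UNIX_HASH_SIZE */

struct net { int id; };

struct usock {
	struct net *net;        /* owning namespace */
	const char *name;       /* bound socket name */
	struct usock *next;     /* hash chain */
};

static struct usock *table[TABLE_SIZE];   /* shared by all namespaces */

/* The hash depends on the name only, so equal names in different
 * namespaces always share a chain. */
static unsigned int name_hash(const char *name)
{
	unsigned int h = 0;

	while (*name)
		h = h * 31 + (unsigned char)*name++;
	return h % TABLE_SIZE;
}

static void usock_insert(struct usock *s)
{
	unsigned int h = name_hash(s->name);

	s->next = table[h];
	table[h] = s;
}

static struct usock *usock_find(struct net *net, const char *name)
{
	struct usock *s;

	assert(net != NULL);
	for (s = table[name_hash(name)]; s; s = s->next) {
		if (s->net != net)
			continue;       /* wrong namespace: skip */
		if (strcmp(s->name, name) == 0)
			return s;
	}
	return NULL;
}
```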

Changes from v1:
- fixed unix_seq_open

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 net/unix/af_unix.c |  118 ---
 1 files changed, 92 insertions(+), 26 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e835da8..93d7e55 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -270,7 +270,8 @@ static inline void unix_insert_socket(struct hlist_head *list, struct sock *sk)
spin_unlock(&unix_table_lock);
 }
 
-static struct sock *__unix_find_socket_byname(struct sockaddr_un *sunname,
+static struct sock *__unix_find_socket_byname(struct net *net,
+ struct sockaddr_un *sunname,
  int len, int type, unsigned hash)
 {
struct sock *s;
@@ -279,6 +280,9 @@ static struct sock *__unix_find_socket_byname(struct sockaddr_un *sunname,
sk_for_each(s, node, &unix_socket_table[hash ^ type]) {
struct unix_sock *u = unix_sk(s);
 
+   if (s->sk_net != net)
+   continue;
+
if (u->addr->len == len &&
!memcmp(u->addr->name, sunname, len))
goto found;
@@ -288,21 +292,22 @@ found:
return s;
 }
 
-static inline struct sock *unix_find_socket_byname(struct sockaddr_un *sunname,
+static inline struct sock *unix_find_socket_byname(struct net *net,
+  struct sockaddr_un *sunname,
   int len, int type,
   unsigned hash)
 {
struct sock *s;
 
spin_lock(&unix_table_lock);
-   s = __unix_find_socket_byname(sunname, len, type, hash);
+   s = __unix_find_socket_byname(net, sunname, len, type, hash);
if (s)
sock_hold(s);
spin_unlock(&unix_table_lock);
return s;
 }
 
-static struct sock *unix_find_socket_byinode(struct inode *i)
+static struct sock *unix_find_socket_byinode(struct net *net, struct inode *i)
 {
struct sock *s;
struct hlist_node *node;
@@ -312,6 +317,9 @@ static struct sock *unix_find_socket_byinode(struct inode *i)
&unix_socket_table[i->i_ino & (UNIX_HASH_SIZE - 1)]) {
struct dentry *dentry = unix_sk(s)->dentry;
 
+   if (s->sk_net != net)
+   continue;
+
if(dentry && dentry->d_inode == i)
{
sock_hold(s);
@@ -631,9 +639,6 @@ out:
 
 static int unix_create(struct net *net, struct socket *sock, int protocol)
 {
-   if (net != &init_net)
-   return -EAFNOSUPPORT;
-
if (protocol && protocol != PF_UNIX)
return -EPROTONOSUPPORT;
 
@@ -677,6 +682,7 @@ static int unix_release(struct socket *sock)
 static int unix_autobind(struct socket *sock)
 {
struct sock *sk = sock->sk;
+   struct net *net = sk->sk_net;
struct unix_sock *u = unix_sk(sk);
static u32 ordernum = 1;
struct unix_address * addr;
@@ -703,7 +709,7 @@ retry:
spin_lock(&unix_table_lock);
ordernum = (ordernum+1)0xF;
 
-   if (__unix_find_socket_byname(addr->name, addr->len, sock->type,
+   if (__unix_find_socket_byname(net, addr->name, addr->len, sock->type,
  addr->hash)) {
spin_unlock(&unix_table_lock);
/* Sanity yield. It is unusual case, but yet... */
@@ -723,7 +729,8 @@ out: mutex_unlock(&u->readlock);
return err;
 }
 
-static struct sock *unix_find_other(struct sockaddr_un *sunname, int len,
+static struct sock *unix_find_other(struct net *net,
+   struct sockaddr_un *sunname, int len,
int type, unsigned hash, int *error)
 {
struct sock *u;
@@ -741,7 +748,7 @@ static struct sock *unix_find_other(struct sockaddr_un *sunname, int len,
err = -ECONNREFUSED;
if (!S_ISSOCK(nd.dentry->d_inode->i_mode))
goto put_fail;
-   u=unix_find_socket_byinode(nd.dentry->d_inode);
+   u=unix_find_socket_byinode(net, nd.dentry->d_inode);
if (!u)
goto put_fail;
 
@@ -757,7 +764,7 @@

[PATCH 2.6.25 6/6] net: consolidate net namespace related proc files creation

2007-11-15 Thread Denis V. Lunev

net: consolidate net namespace related proc files creation

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]
---
 fs/proc/proc_net.c   |   38 ++
 include/linux/seq_file.h |   13 +
 net/core/dev.c   |   28 +---
 net/core/dev_mcast.c |   26 --
 net/netlink/af_netlink.c |   33 +++--
 net/packet/af_packet.c   |   26 --
 net/unix/af_unix.c   |   31 ++-
 net/wireless/wext.c  |   24 +++-
 8 files changed, 80 insertions(+), 139 deletions(-)

diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index 131f9c6..421ea28 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -22,10 +22,48 @@
 #include <linux/mount.h>
 #include <linux/nsproxy.h>
 #include <net/net_namespace.h>
+#include <linux/seq_file.h>
 
 #include "internal.h"
 
 
+int seq_open_net(struct inode *ino, struct file *f,
+const struct seq_operations *ops, int size)
+{
+   struct net *net;
+   struct seq_net_private *p;
+
+   BUG_ON(size < sizeof(*p));
+
+   net = get_proc_net(ino);
+   if (net == NULL)
+   return -ENXIO;
+
+   p = __seq_open_private(f, ops, size);
+   if (p == NULL) {
+   put_net(net);
+   return -ENOMEM;
+   }
+   p->net = net;
+   return 0;
+}
+EXPORT_SYMBOL_GPL(seq_open_net);
+
+int seq_release_net(struct inode *ino, struct file *f)
+{
+   struct seq_file *seq;
+   struct seq_net_private *p;
+
+   seq = f->private_data;
+   p = seq->private;
+
+   put_net(p->net);
+   seq_release_private(ino, f);
+   return 0;
+}
+EXPORT_SYMBOL_GPL(seq_release_net);
+
+
 struct proc_dir_entry *proc_net_fops_create(struct net *net,
const char *name, mode_t mode, const struct file_operations *fops)
 {
diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
index ebbc02b..648dfeb 100644
--- a/include/linux/seq_file.h
+++ b/include/linux/seq_file.h
@@ -63,5 +63,18 @@ extern struct list_head *seq_list_start_head(struct list_head *head,
 extern struct list_head *seq_list_next(void *v, struct list_head *head,
loff_t *ppos);
 
+struct net;
+struct seq_net_private {
+   struct net *net;
+};
+
+int seq_open_net(struct inode *, struct file *,
+const struct seq_operations *, int);
+int seq_release_net(struct inode *, struct file *);
+static inline struct net *seq_file_net(struct seq_file *seq)
+{
+   return ((struct seq_net_private *)seq->private)->net;
+}
+
 #endif
 #endif
diff --git a/net/core/dev.c b/net/core/dev.c
index 86d6261..043e2f8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2360,7 +2360,7 @@ static int dev_ifconf(struct net *net, char __user *arg)
  */
 void *dev_seq_start(struct seq_file *seq, loff_t *pos)
 {
-   struct net *net = seq->private;
+   struct net *net = seq_file_net(seq);
loff_t off;
struct net_device *dev;
 
@@ -2378,7 +2378,7 @@ void *dev_seq_start(struct seq_file *seq, loff_t *pos)
 
 void *dev_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-   struct net *net = seq->private;
+   struct net *net = seq_file_net(seq);
++*pos;
return v == SEQ_START_TOKEN ?
first_net_device(net) : next_net_device((struct net_device *)v);
@@ -2477,26 +2477,8 @@ static const struct seq_operations dev_seq_ops = {
 
 static int dev_seq_open(struct inode *inode, struct file *file)
 {
-   struct seq_file *seq;
-   int res;
-   res =  seq_open(file, &dev_seq_ops);
-   if (!res) {
-   seq = file->private_data;
-   seq->private = get_proc_net(inode);
-   if (!seq->private) {
-   seq_release(inode, file);
-   res = -ENXIO;
-   }
-   }
-   return res;
-}
-
-static int dev_seq_release(struct inode *inode, struct file *file)
-{
-   struct seq_file *seq = file->private_data;
-   struct net *net = seq->private;
-   put_net(net);
-   return seq_release(inode, file);
+   return seq_open_net(inode, file, &dev_seq_ops,
+   sizeof(struct seq_net_private));
 }
 
 static const struct file_operations dev_seq_fops = {
@@ -2504,7 +2486,7 @@ static const struct file_operations dev_seq_fops = {
.open= dev_seq_open,
.read= seq_read,
.llseek  = seq_lseek,
-   .release = dev_seq_release,
+   .release = seq_release_net,
 };
 
 static const struct seq_operations softnet_seq_ops = {
diff --git a/net/core/dev_mcast.c b/net/core/dev_mcast.c
index 69fff16..63f0b33 100644
--- a/net/core/dev_mcast.c
+++ b/net/core/dev_mcast.c
@@ -187,7 +187,7 @@ EXPORT_SYMBOL(dev_mc_unsync);
 #ifdef CONFIG_PROC_FS
 static void *dev_mc_seq_start(struct seq_file *seq, loff_t *pos)
 {
-   struct net *net = seq->private;
+

Re: [PATCH] net/ipv4/arp.c: Fix arp reply when sender ip 0 (was: Strange behavior in arp probe reply, bug or feature?)

2007-11-15 Thread Alexey Kuznetsov

Hello!

 Send a correct arp reply instead of one with sender ip and sender
 hardware address in target fields.

I do not see anything more legal in setting target address to 0.


Actually, the semantics of the target address in an ARP reply are ambiguous.
If it is a reply to some real request, it is set to the address of the
requestor, and the protocol requires the recipient of this ARP reply to test
that the address matches its own address before creating a new entry
triggered by an unsolicited ARP reply. That's all.

In the case of duplicate address detection, the requestor does not have
any address, so it is absolutely not essential what we use as the target
address. The only place that could depend on this is the tool that
tests for a duplicate address. At least arping, written by me, should
work with any variant.

So, please, could you explain what made you think that using 0
is better?

Alexey

Re: [PATCH, take2] netfilter : struct xt_table_info diet

2007-11-15 Thread Eric Dumazet

On Thu, 15 Nov 2007 13:41:54 +0100
Patrick McHardy [EMAIL PROTECTED] wrote:

 Eric Dumazet wrote:
  [PATCH] netfilter : struct xt_table_info diet
  
  Instead of using a big array of NR_CPUS entries, we can compute the size
  needed at runtime, using nr_cpu_ids.
  
  This should save some RAM (especially on David's machines where
  NR_CPUS=4096: 32 KB can be saved per table, and 64 KB for dynamically
  allocated ones, because of slab/slub alignments).
  
  In particular, the 'bootstrap' tables are no longer static (in the data
  section) but on the stack, as their size is now very small.
  
  This should also reduce the stack usage in compat functions
  (get_info() declares an automatic variable that could be bigger than
  the kernel stack size for big NR_CPUS).
 
 
 I fixed a compilation error with CONFIG_COMPAT and applied it, thanks
 Eric. One question though:
 
  +#define XT_TABLE_INFO_SZ (offsetof(struct xt_table_info, entries) \
  + + nr_cpu_ids * sizeof(char *))
 
 
  /* overflow check */
  -   if (tmp.size >= (INT_MAX - sizeof(struct xt_table_info)) / NR_CPUS -
  -   SMP_CACHE_BYTES)
  +   if (tmp.size >= INT_MAX / num_possible_cpus())
  return -ENOMEM;
 
 We need to make sure offsetof(struct xt_table_info, entries) +
 nr_cpu_ids * sizeof(char *) doesn't overflow, so why doesn't it
 use nr_cpu_ids here as well?
 

nr_cpu_ids is <= NR_CPUS, so XT_TABLE_INFO_SZ cannot overflow.

The 'overflow check' we do here is in fact not very useful now
that we don't need to multiply tmp.size by NR_CPUS and potentially
overflow the result.

We can delete the test, because kmalloc()/vmalloc() will probably
fail gracefully if we ask for too much memory.

We could imagine a dual Opteron machine, with a total of 32GB of ram, and
it could be possible to load a 3GB iptable  (that would consume 2*3GB of ram), 
but the 'overflow check' test actually forbids such a scenario.
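
A minimal user-space sketch of the sizing idea behind XT_TABLE_INFO_SZ. The
two structs are cut-down, hypothetical stand-ins for struct xt_table_info
(only a size field and the per-CPU pointer array), and table_info_size() is
an illustrative helper name. With 8-byte pointers and NR_CPUS=4096, the
fixed array alone is 4096 * 8 = 32768 bytes, which matches the 32 KB figure
quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define OLD_NR_CPUS 4096   /* compile-time maximum, as in the NR_CPUS=4096 example */

/* Old layout: one pointer per configured CPU, whether present or not. */
struct table_info_old {
	unsigned int size;
	char *entries[OLD_NR_CPUS];
};

/* New layout: flexible array member, sized at run time. */
struct table_info_new {
	unsigned int size;
	char *entries[];
};

/* Header plus one pointer per possible CPU id. */
static size_t table_info_size(unsigned int nr_cpu_ids)
{
	assert(nr_cpu_ids <= OLD_NR_CPUS);
	return offsetof(struct table_info_new, entries)
	       + (size_t)nr_cpu_ids * sizeof(char *);
}
```

On a 4-CPU machine the run-time size is the header plus four pointers, so
almost the whole fixed array is saved.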




[PATCH 2.6.25 1/6] net: Modify all rtnetlink methods to only work in the initial namespace (v2)

2007-11-15 Thread Denis V. Lunev

Before I can enable rtnetlink to work in all network namespaces
I need to be certain that something won't break.  So this
patch deliberately disables all of the rtnetlink methods in everything
except the initial network namespace.  After the methods have been
audited this extra check can be disabled.
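
A minimal user-space sketch of the guard this patch adds to each method.
struct net and init_net are stand-ins for the kernel objects, and
demo_setlink() and demo_dump() are hypothetical handlers. In the hunks shown
below, the set/delete handlers return -EINVAL, while br_dump_ifinfo()
returns 0 (an empty dump) for a non-initial namespace.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct net { int id; };
static struct net init_net;    /* stand-in for the initial namespace */

/* Hypothetical "doit"-style handler: refuses other namespaces. */
static int demo_setlink(struct net *net)
{
	assert(net != NULL);
	if (net != &init_net)
		return -EINVAL;
	/* ... the real work would happen here ... */
	return 0;
}

/* Hypothetical dump-style handler: reports nothing outside init_net.
 * The return value is the number of entries "dumped". */
static int demo_dump(struct net *net)
{
	assert(net != NULL);
	if (net != &init_net)
		return 0;
	return 1;   /* pretend one entry was dumped */
}
```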

Changes from v1:
- added IPv6 addrlabel protection

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 net/bridge/br_netlink.c |9 +
 net/core/fib_rules.c|   11 +++
 net/core/neighbour.c|   18 ++
 net/core/rtnetlink.c|   19 +++
 net/decnet/dn_dev.c |   12 
 net/decnet/dn_fib.c |8 
 net/decnet/dn_route.c   |8 
 net/decnet/dn_table.c   |4 
 net/ipv4/devinet.c  |   12 
 net/ipv4/fib_frontend.c |   12 
 net/ipv4/route.c|4 
 net/ipv6/addrconf.c |   31 +++
 net/ipv6/addrlabel.c|   12 
 net/ipv6/ip6_fib.c  |4 
 net/ipv6/route.c|   12 
 net/sched/act_api.c |   10 ++
 net/sched/cls_api.c |   10 ++
 net/sched/sch_api.c |   21 +
 18 files changed, 217 insertions(+), 0 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 53ab8e0..a4ffa2b 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -13,6 +13,7 @@
 #include <linux/kernel.h>
 #include <net/rtnetlink.h>
 #include <net/net_namespace.h>
+#include <net/sock.h>
 #include "br_private.h"
 
 static inline size_t br_nlmsg_size(void)
@@ -107,9 +108,13 @@ errout:
  */
 static int br_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 {
+   struct net *net = skb->sk->sk_net;
struct net_device *dev;
int idx;
 
+   if (net != &init_net)
+   return 0;
+
idx = 0;
for_each_netdev(&init_net, dev) {
/* not a bridge port */
@@ -135,12 +140,16 @@ skip:
  */
 static int br_rtm_setlink(struct sk_buff *skb,  struct nlmsghdr *nlh, void *arg)
 {
+   struct net *net = skb->sk->sk_net;
struct ifinfomsg *ifm;
struct nlattr *protinfo;
struct net_device *dev;
struct net_bridge_port *p;
u8 new_state;
 
+   if (net != &init_net)
+   return -EINVAL;
+
if (nlmsg_len(nlh) < sizeof(*ifm))
return -EINVAL;
 
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 848132b..3b20b6f 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -228,6 +228,9 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
struct nlattr *tb[FRA_MAX+1];
int err = -EINVAL, unresolved = 0;
 
+   if (net != &init_net)
+   return -EINVAL;
+
if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*frh)))
goto errout;
 
@@ -358,12 +361,16 @@ errout:
 
 static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
 {
+   struct net *net = skb->sk->sk_net;
struct fib_rule_hdr *frh = nlmsg_data(nlh);
struct fib_rules_ops *ops = NULL;
struct fib_rule *rule, *tmp;
struct nlattr *tb[FRA_MAX+1];
int err = -EINVAL;
 
+   if (net != &init_net)
+   return -EINVAL;
+
if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*frh)))
goto errout;
 
@@ -539,9 +546,13 @@ skip:
 
 static int fib_nl_dumprule(struct sk_buff *skb, struct netlink_callback *cb)
 {
+   struct net *net = skb->sk->sk_net;
struct fib_rules_ops *ops;
int idx = 0, family;
 
+   if (net != &init_net)
+   return -EINVAL;
+
family = rtnl_msg_family(cb->nlh);
if (family != AF_UNSPEC) {
/* Protocol specific dump request */
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 175bbc0..29f0a4d 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1449,6 +1449,9 @@ static int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
struct net_device *dev = NULL;
int err = -EINVAL;
 
+   if (net != &init_net)
+   return -EINVAL;
+
if (nlmsg_len(nlh) < sizeof(*ndm))
goto out;
 
@@ -1515,6 +1518,9 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
struct net_device *dev = NULL;
int err;
 
+   if (net != &init_net)
+   return -EINVAL;
+
err = nlmsg_parse(nlh, sizeof(*ndm), tb, NDA_MAX, NULL);
if (err < 0)
goto out;
@@ -1789,11 +1795,15 @@ static const struct nla_policy nl_ntbl_parm_policy[NDTPA_MAX+1] = {
 
 static int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 {
+   struct net *net = skb->sk->sk_net;
struct neigh_table *tbl;
struct ndtmsg *ndtmsg;
struct nlattr *tb[NDTA_MAX+1];
int err;
 
+   if

[PATCH 2.6.25 2/6] net: Make rtnetlink infrastructure network namespace aware (v3)

2007-11-15 Thread Denis V. Lunev

After this patch none of the netlink callbacks support anything
except the initial network namespace, but the rtnetlink infrastructure
now handles multiple network namespaces.

Changes from v2:
- IPv6 addrlabel processing

Changes from v1:
- no need for special rtnl_unlock handling
- fixed IPv6 ndisc

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 include/linux/rtnetlink.h   |8 +++---
 include/net/net_namespace.h |3 ++
 net/bridge/br_netlink.c |4 +-
 net/core/fib_rules.c|4 +-
 net/core/neighbour.c|4 +-
 net/core/rtnetlink.c|   63 +++---
 net/decnet/dn_dev.c |4 +-
 net/decnet/dn_route.c   |2 +-
 net/decnet/dn_table.c   |4 +-
 net/ipv4/devinet.c  |4 +-
 net/ipv4/fib_semantics.c|4 +-
 net/ipv4/ipmr.c |4 +-
 net/ipv4/route.c|2 +-
 net/ipv6/addrconf.c |   14 +-
 net/ipv6/addrlabel.c|2 +-
 net/ipv6/ndisc.c|5 ++-
 net/ipv6/route.c|6 ++--
 net/sched/act_api.c |8 +++---
 net/sched/cls_api.c |2 +-
 net/sched/sch_api.c |4 +-
 net/wireless/wext.c |5 +++-
 21 files changed, 102 insertions(+), 54 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index e20dcc8..b014f6b 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -620,11 +620,11 @@ extern int __rtattr_parse_nested_compat(struct rtattr *tb[], int maxattr,
 ({ data = RTA_PAYLOAD(rta) >= len ? RTA_DATA(rta) : NULL; \
__rtattr_parse_nested_compat(tb, max, rta, len); })
 
-extern int rtnetlink_send(struct sk_buff *skb, u32 pid, u32 group, int echo);
-extern int rtnl_unicast(struct sk_buff *skb, u32 pid);
-extern int rtnl_notify(struct sk_buff *skb, u32 pid, u32 group,
+extern int rtnetlink_send(struct sk_buff *skb, struct net *net, u32 pid, u32 group, int echo);
+extern int rtnl_unicast(struct sk_buff *skb, struct net *net, u32 pid);
+extern int rtnl_notify(struct sk_buff *skb, struct net *net, u32 pid, u32 group,
   struct nlmsghdr *nlh, gfp_t flags);
-extern void rtnl_set_sk_err(u32 group, int error);
+extern void rtnl_set_sk_err(struct net *net, u32 group, int error);
 extern int rtnetlink_put_metrics(struct sk_buff *skb, u32 *metrics);
 extern int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst,
  u32 id, u32 ts, u32 tsage, long expires,
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 5dd6d90..90802a6 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -10,6 +10,7 @@
 
 struct proc_dir_entry;
 struct net_device;
+struct sock;
 struct net {
atomic_t    count;  /* To decided when the network
 *  namespace should be freed.
@@ -29,6 +30,8 @@ struct net {
struct list_head    dev_base_head;
struct hlist_head   *dev_name_head;
struct hlist_head   *dev_index_head;
+
+   struct sock *rtnl;  /* rtnetlink socket */
 };
 
 #ifdef CONFIG_NET
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index a4ffa2b..f5d6933 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -97,10 +97,10 @@ void br_ifinfo_notify(int event, struct net_bridge_port *port)
kfree_skb(skb);
goto errout;
}
-   err = rtnl_notify(skb, 0, RTNLGRP_LINK, NULL, GFP_ATOMIC);
+   err = rtnl_notify(skb, &init_net,0, RTNLGRP_LINK, NULL, GFP_ATOMIC);
 errout:
if (err < 0)
-   rtnl_set_sk_err(RTNLGRP_LINK, err);
+   rtnl_set_sk_err(&init_net, RTNLGRP_LINK, err);
 }
 
 /*
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 3b20b6f..0af0538 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -599,10 +599,10 @@ static void notify_rule_change(int event, struct fib_rule *rule,
kfree_skb(skb);
goto errout;
}
-   err = rtnl_notify(skb, pid, ops->nlgroup, nlh, GFP_KERNEL);
+   err = rtnl_notify(skb, &init_net, pid, ops->nlgroup, nlh, GFP_KERNEL);
 errout:
if (err < 0)
-   rtnl_set_sk_err(ops->nlgroup, err);
+   rtnl_set_sk_err(&init_net, ops->nlgroup, err);
 }
 
 static void attach_rules(struct list_head *rules, struct net_device *dev)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 29f0a4d..a8b72c1 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2467,10 +2467,10 @@ static void __neigh_notify(struct neighbour *n, int type, int flags)
kfree_skb(skb);
goto errout;
}
-   err = rtnl_notify(skb, 0, RTNLGRP_NEIGH, NULL, GFP_ATOMIC);
+   err = rtnl_notify(skb, &init_net, 0, RTNLGRP_NEIGH, NULL, GFP_ATOMIC);

[PATCH 2.6.25 3/6] net: Make the netlink methods in rtnetlink handle multiple network namespaces

2007-11-15 Thread Denis V. Lunev

From: Eric W. Biederman [EMAIL PROTECTED]

After the previous prep work this just consists of removing checks
limiting the code to work in the initial network namespace, and
updating rtmsg_ifinfo so we can generate events for devices in
something other than the initial network namespace.

Referring to other network devices, as the IFLA_LINK and IFLA_MASTER
attributes do, gets interesting if those network devices happen to be
in other network namespaces.  Currently ifindex numbers are allocated
globally, so I have taken the path of least resistance and still
report the information even though the devices they are talking about
are invisible.

If applications start getting confused, or when ifindex
numbers become local to the network namespace, we may need
to do something different in the future.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/core/rtnetlink.c |   27 +++
 1 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index be8e10c..4a07e83 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -705,9 +705,6 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
int s_idx = cb->args[0];
struct net_device *dev;
 
-   if (net != &init_net)
-   return 0;
-
idx = 0;
for_each_netdev(net, dev) {
if (idx < s_idx)
@@ -910,9 +907,6 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
struct nlattr *tb[IFLA_MAX+1];
char ifname[IFNAMSIZ];
 
-   if (net != &init_net)
-   return -EINVAL;
-
err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
if (err < 0)
goto errout;
@@ -961,9 +955,6 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
struct nlattr *tb[IFLA_MAX+1];
int err;
 
-   if (net != &init_net)
-   return -EINVAL;
-
err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
if (err < 0)
return err;
@@ -1045,9 +1036,6 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
struct nlattr *linkinfo[IFLA_INFO_MAX+1];
int err;
 
-   if (net != &init_net)
-   return -EINVAL;
-
 #ifdef CONFIG_KMOD
 replay:
 #endif
@@ -1174,9 +1162,6 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
struct sk_buff *nskb;
int err;
 
-   if (net != &init_net)
-   return -EINVAL;
-
err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
if (err < 0)
return err;
@@ -1212,13 +1197,9 @@ errout:
 
 static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 {
-   struct net *net = skb->sk->sk_net;
int idx;
int s_idx = cb->family;
 
-   if (net != &init_net)
-   return 0;
-
if (s_idx == 0)
s_idx = 1;
for (idx=1; idx<NPROTO; idx++) {
@@ -1240,6 +1221,7 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 
 void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change)
 {
+   struct net *net = dev->nd_net;
struct sk_buff *skb;
int err = -ENOBUFS;
 
@@ -1254,10 +1236,10 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change)
kfree_skb(skb);
goto errout;
}
-   err = rtnl_notify(skb, &init_net, 0, RTNLGRP_LINK, NULL, GFP_KERNEL);
+   err = rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, GFP_KERNEL);
 errout:
if (err < 0)
-   rtnl_set_sk_err(&init_net, RTNLGRP_LINK, err);
+   rtnl_set_sk_err(net, RTNLGRP_LINK, err);
 }
 
 /* Protected by RTNL sempahore.  */
@@ -1350,9 +1332,6 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi
 {
struct net_device *dev = ptr;
 
-   if (dev->nd_net != &init_net)
-   return NOTIFY_DONE;
-
switch (event) {
case NETDEV_UNREGISTER:
rtmsg_ifinfo(RTM_DELLINK, dev, ~0U);
-- 
1.5.3.rc5


[MACVLAN 00/02]: Macvlan update

2007-11-15 Thread Patrick McHardy

These two patches remove an unnecessary check in macvlan_broadcast() and
add the ability to change the mac address while the device is up.

Please apply, thanks.


 drivers/net/macvlan.c |   26 --
 1 files changed, 24 insertions(+), 2 deletions(-)

Patrick McHardy (2):
  [MACVLAN]: Remove unnecessary IFF_UP check
  [MACVLAN]: Allow setting mac address while device is up

[MACVLAN 01/02]: Remove unnecessary IFF_UP check

2007-11-15 Thread Patrick McHardy

[MACVLAN]: Remove unnecessary IFF_UP check

Only devices that are UP are in the hash, so macvlan_broadcast() doesn't
need to check for IFF_UP.
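
A minimal user-space sketch of the invariant this change relies on. It
assumes, as the description says, that a device is in the hash only while
it is up; the sketch models that by hashing on open and unhashing on stop.
struct vdev, HASH_SIZE and the vdev_*() helpers are hypothetical names, not
the macvlan driver's code.

```c
#include <assert.h>
#include <stddef.h>

#define HASH_SIZE 4   /* hypothetical; stands in for MACVLAN_HASH_SIZE */

struct vdev {
	int up;                  /* models IFF_UP */
	unsigned char key;       /* stands in for the MAC-derived hash key */
	int rx_count;            /* broadcasts delivered to this device */
	struct vdev *next;       /* hash chain */
};

static struct vdev *vhash[HASH_SIZE];

/* Bringing the device up is what puts it into the hash. */
static void vdev_open(struct vdev *d)
{
	d->up = 1;
	d->next = vhash[d->key % HASH_SIZE];
	vhash[d->key % HASH_SIZE] = d;
}

/* Taking it down removes it again. */
static void vdev_stop(struct vdev *d)
{
	struct vdev **pp = &vhash[d->key % HASH_SIZE];

	while (*pp && *pp != d)
		pp = &(*pp)->next;
	if (*pp)
		*pp = d->next;
	d->up = 0;
}

/* No per-device "is it up?" test: membership already implies it. */
static int vdev_broadcast(void)
{
	int i, delivered = 0;
	struct vdev *d;

	for (i = 0; i < HASH_SIZE; i++)
		for (d = vhash[i]; d; d = d->next) {
			assert(d->up);   /* the invariant, checked here only for the demo */
			d->rx_count++;
			delivered++;
		}
	return delivered;
}
```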

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit e2d06a34b52a999e8c539d1cdef51ff523e2f2c2
tree f95a5eef37c421950ddc7318797909c0031ee948
parent 86aa441a13a474e66d484af38575609d9a0ff8ec
author Patrick McHardy [EMAIL PROTECTED] Thu, 15 Nov 2007 16:33:24 +0100
committer Patrick McHardy [EMAIL PROTECTED] Thu, 15 Nov 2007 16:33:24 +0100

 drivers/net/macvlan.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 2e4bcd5..461149c 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -73,8 +73,6 @@ static void macvlan_broadcast(struct sk_buff *skb,
for (i = 0; i < MACVLAN_HASH_SIZE; i++) {
hlist_for_each_entry_rcu(vlan, n, &port->vlan_hash[i], hlist) {
dev = vlan->dev;
-   if (unlikely(!(dev->flags & IFF_UP)))
-   continue;
 
nskb = skb_clone(skb, GFP_ATOMIC);
if (nskb == NULL) {

Re: [PATCH, take2] netfilter : struct xt_table_info diet

2007-11-15 Thread Patrick McHardy


Eric Dumazet wrote:

On Thu, 15 Nov 2007 13:41:54 +0100
Patrick McHardy [EMAIL PROTECTED] wrote:


+#define XT_TABLE_INFO_SZ (offsetof(struct xt_table_info, entries) \
+ + nr_cpu_ids * sizeof(char *))



/* overflow check */
-   if (tmp.size >= (INT_MAX - sizeof(struct xt_table_info)) / NR_CPUS -
-   SMP_CACHE_BYTES)
+   if (tmp.size >= INT_MAX / num_possible_cpus())
return -ENOMEM;

We need to make sure offsetof(struct xt_table_info, entries) +
nr_cpu_ids * sizeof(char *) doesn't overflow, so why doesn't it
use nr_cpu_ids here as well?



nr_cpu_ids is <= NR_CPUS, so XT_TABLE_INFO_SZ cannot overflow



Yes, but nr_cpu_ids is >= num_possible_cpus, which is what we're
using with your patch.


The 'overflow check' we do here is in fact not very useful now
that we don't need to multiply tmp.size by NR_CPUS and potentially
overflow the result.

We can delete the test, because kmalloc()/vmalloc() will probably
fail gracefully if we ask for too much memory.



You're right, I'll remove it. Thanks.
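
A minimal userspace sketch of the size calculation discussed above, assuming a simplified stand-in for struct xt_table_info (the struct layout and the helper name table_info_size are hypothetical, not taken from the patch). It shows one way to guard "header + nr_ids * sizeof(char *)" against wrap-around with a division instead of a multiplication. For any realistic nr_cpu_ids the guard never triggers, which is consistent with the conclusion that the explicit test can go.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xt_table_info: a fixed header
 * followed by one pointer slot per possible CPU id. */
struct table_info {
    unsigned int size;
    unsigned int number;
    char *entries[];    /* flexible array member */
};

/* Store header + nr_ids * sizeof(char *) in *out and return 0,
 * or return -1 if that sum would not fit in a size_t. */
static int table_info_size(size_t nr_ids, size_t *out)
{
    size_t hdr = offsetof(struct table_info, entries);

    /* Divide instead of multiplying so the check itself cannot wrap. */
    if (nr_ids > (SIZE_MAX - hdr) / sizeof(char *))
        return -1;

    *out = hdr + nr_ids * sizeof(char *);
    return 0;
}
```

Calling table_info_size(4, &sz) yields the header size plus four pointer slots, while an absurd count such as SIZE_MAX is rejected.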

Re: Re : Re : Re : Bug in using inet_lookup ()

2007-11-15 Thread Evgeniy Polyakov

On Thu, Nov 15, 2007 at 05:29:52PM +0100, Nj A ([EMAIL PROTECTED]) wrote:
 Hello all,
 No bugs are due to the inet_lookup call now using the following:
   if ((s_skb = alloc_skb (MAX_TCP_HEADER + 15, GFP_ATOMIC)) == NULL)
   {
  printk ("%s: Unable to allocate memory \n", __FUNCTION__);
  err = -ENOMEM;
   }
   dev = s_skb->dev;
 
   if (!dev)
  printk ("%s: no device attached to s_skb\n", __FUNCTION__);
  goto process_dev;
 
   sk = inet_lookup (&tcp_hashinfo, src, p_src, dst, p_dst, inet_iif (s_skb));
 
   bh_lock_sock (sk);
 process_dev:
   spin_lock (tmp_lock);
   new_dev = list_entry (tmp, struct net_device, todo_list);
   spin_unlock (tmp_lock);
   if (!new_dev)
  printk ("%s: no device attached to new_dev \n", __FUNCTION__);
   s_skb->dev = new_dev;
 
   ...
   bh_unlock_sock (sk);
   ...
 
 However, I am not having the right results. I checked with an established 
 socket and expected to see that the socket is established (which is the case) 
 but got the wrong state when testing on (sk->sk_state) and the socket seems 
 in the TIME_WAIT / CLOSE state.
 
 Maybe I am corrupting the search by manually attaching a device to the skb?
 Any idea please?

Well, your code will oops just like before - you provide an empty skb to
inet_iif(), which is wrong. Actually you will not even reach that
point, since your code will exit after the skb->dev check.

Try a simple inet_lookup(&tcp_hashinfo, src, p_src, dst, p_dst, 0).
It does work.
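
A small userspace sketch of why the lookup is never reached, assuming the posted fragment really has no braces around the if body, as its layout suggests. The function names and the warnings counter are hypothetical stand-ins: the counter replaces the printk() and the "reached" flag replaces the inet_lookup() call. In C only the first statement after an unbraced if belongs to it, so the goto runs whether or not dev is NULL.

```c
#include <stddef.h>

/* Same control flow as the posted code: the goto is not part of the if. */
static int lookup_reached_unbraced(const void *dev, int *warnings)
{
    int reached = 0;

    if (!dev)
        (*warnings)++;      /* stands in for the printk() */
    goto process_dev;       /* runs unconditionally */

    reached = 1;            /* stands in for inet_lookup(); never executed */
process_dev:
    return reached;
}

/* With braces, the goto is taken only when no device is attached. */
static int lookup_reached_braced(const void *dev, int *warnings)
{
    int reached = 0;

    if (!dev) {
        (*warnings)++;
        goto process_dev;
    }

    reached = 1;            /* stands in for inet_lookup() */
process_dev:
    return reached;
}
```

In the posted code the skipped region also contains bh_lock_sock(sk), while bh_unlock_sock(sk) is still called later, so sk would be used without ever being assigned.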

-- 
Evgeniy Polyakov

Re : Re : Re : Re : Bug in using inet_lookup ()

2007-11-15 Thread Nj A

 Well, your code will oops just like before - you provide an empty skb to
 inet_iif(), which is wrong. Actually you will not even reach that
 point, since your code will exit after the skb->dev check.
 
 Try a simple inet_lookup(&tcp_hashinfo, src, p_src, dst, p_dst, 0).

But when I try inet_lookup(&tcp_hashinfo, src, p_src, dst, p_dst, 0), the
machine either hangs or panics.

Is there any clean way to work around this issue?


Cheers,


  

Re: [BUG] New Kernel Bugs

2007-11-15 Thread Theodore Tso

On Wed, Nov 14, 2007 at 06:23:34PM -0500, Daniel Barkalow wrote:
 I don't see any reason that we couldn't have a tool accessible to Ubuntu 
 users that does a real git bisect. Git is really good at being scripted 
 by fancy GUIs. It should be easy enough to have a drop down with all of 
 the Ubuntu kernel package releases, where the user selects what works and 
 what doesn't.

It's possible users who haven't yet downloaded a git repository have
to surmount some obstacles that might cause them to lose interest.
First, they have to download some 190 megs of git repository, and if
they have a slow link, that can take a while, and then they have to
build each kernel, which can take a while.  A full kernel build with
everything selected can take a good 30 minutes or more, and that's on a
fast dual-core machine with 4 gigs of memory and 7200rpm disk drives.
On a slower, memory-limited laptop, doing a single kernel build can
take more time than the user has patience for; multiply that by 7 or 8
build and test boots, and it starts to get tiresome.

And then on top of that there are the issues about whether there is
enough support for dealing with hitting kernel revisions that fail due
to other bugs getting merged in during the -rc1 process, etc.

I agree that a tool that automated the bisection process and walked
the user through it would be helpful, but I believe it would be
possible for us to do better.
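
As a rough illustration of where a figure like "7 or 8 build and test boots" comes from, here is a sketch that counts worst-case bisection rounds under a simplified model: a linear range of n candidate commits that is halved each round. The function name bisect_rounds is hypothetical, and real git bisect on a history with merges can need a somewhat different number of steps.

```c
#include <assert.h>

/* Worst-case number of build-and-test rounds needed to isolate one
 * bad commit among n candidates when each round halves the range. */
static unsigned int bisect_rounds(unsigned long n)
{
    unsigned int rounds = 0;

    assert(n > 0);
    while (n > 1) {
        n = (n + 1) / 2;    /* worst case: the larger half remains */
        rounds++;
    }
    return rounds;
}
```

Under this model, 7 rounds cover up to 128 candidate commits and 8 rounds cover up to 256, so each extra round doubles the range that can be searched.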

- Ted

Re: Re : Re : Re : Re : Bug in using inet_lookup ()

2007-11-15 Thread Evgeniy Polyakov

On Thu, Nov 15, 2007 at 04:57:17PM +, Nj A ([EMAIL PROTECTED]) wrote:
  Well, your code will oops just like before - you provide an empty skb to
  inet_iif(), which is wrong. Actually you will not even reach that
  point, since your code will exit after the skb->dev check.
  
  Try a simple inet_lookup(&tcp_hashinfo, src, p_src, dst, p_dst, 0).
 
 But when I try inet_lookup(&tcp_hashinfo, src, p_src, dst, p_dst, 0), the
 machine either hangs or panics.

Hmmm, it does not.
Please show at least one bug trace where
inet_lookup(&tcp_hashinfo, 0, 0, 0, 0, 0) fails :)

 Is there any clean way to work around this issue?

Yes: show the code you are using.
Sorry, all the mind readers are on vacation.

-- 
Evgeniy Polyakov

Re: [BUG] New Kernel Bugs

2007-11-15 Thread Daniel Barkalow

On Thu, 15 Nov 2007, Theodore Tso wrote:

 On Wed, Nov 14, 2007 at 06:23:34PM -0500, Daniel Barkalow wrote:
  I don't see any reason that we couldn't have a tool accessible to Ubuntu 
  users that does a real git bisect. Git is really good at being scripted 
  by fancy GUIs. It should be easy enough to have a drop down with all of 
  the Ubuntu kernel package releases, where the user selects what works and 
  what doesn't.
 
 It's possible users who haven't yet downloaded a git repository have
 to surmount some obstacles that might cause them to lose interest.
 First, they have to download some 190 megs of git repository, and if
 they have a slow link, that can take a while, and then they have to
 build each kernel, which can take a while.

It should be possible for it to clone only the portion that they actually 
care about based on where the known-good version is. It should also (in 
theory, anyway) be possible to put off some amount of the download until 
it's actually going to be relevant.

 A full kernel build with everything selected can take a good 30 minutes or 
 more, and that's on a fast dual-core machine with 4 gigs of memory and 
 7200rpm disk drives. On a slower, memory-limited laptop, doing a single 
 kernel build can take more time than the user has patience for; multiply 
 that by 7 or 8 build and test boots, and it starts to get tiresome.

None of this is going to take as long, even on a slow link and a slow 
computer, as waiting for a response to a mailing list post. It'd annoy 
users who are specifically waiting for it. But suppose the interface is 
that the user says "kernel package X didn't work but the current kernel 
does", and it says "I'll let you know when I've got something to test". 
The user watches a DVD, afterward finds a message saying there's something 
to test, tries it, and reports how it went. If that repeats until it has 
narrowed the problem down to a single commit, after a couple of days of 
the user getting occasional responses, it's not that different from asking 
for help online.

 And then on top of that there are the issues about whether there is
 enough support for dealing with hitting kernel revisions that fail due
 to other bugs getting merged in during the -rc1 process, etc.

We could have a distro-provided mask of things that aren't worth testing, 
and possibly back-ported fixes for revisions in particular ranges.

 I agree that a tool that automated the bisection process and walked
 the user through it would be helpful, but I believe it would be
 possible for us to do better.

That would probably help for giving the user something to try right away. 
I still think that the main cost to the user is the number of times that 
the user has to stop doing stuff to reboot with a kernel to test, whether 
the test kernels are available quickly from the distro site, slowly built 
locally, or slowly as suggested by humans helping online.

-Daniel
*This .sig left intentionally blank*

Re : Re : Re : Bug in using inet_lookup ()

2007-11-15 Thread Nj A

Hello all,
No bugs are due to the inet_lookup call now using the following:
  if ((s_skb = alloc_skb (MAX_TCP_HEADER + 15, GFP_ATOMIC)) == NULL)
  {
 printk ("%s: Unable to allocate memory \n", __FUNCTION__);
 err = -ENOMEM;
  }
  dev = s_skb->dev;

  if (!dev)
 printk ("%s: no device attached to s_skb\n", __FUNCTION__);
 goto process_dev;

  sk = inet_lookup (&tcp_hashinfo, src, p_src, dst, p_dst, inet_iif (s_skb));

  bh_lock_sock (sk);
process_dev:
  spin_lock (tmp_lock);
  new_dev = list_entry (tmp, struct net_device, todo_list);
  spin_unlock (tmp_lock);
  if (!new_dev)
 printk ("%s: no device attached to new_dev \n", __FUNCTION__);
  s_skb->dev = new_dev;

  ...
  bh_unlock_sock (sk);
  ...

However, I am not having the right results. I checked with an established 
socket and expected to see that the socket is established (which is the case) 
but got the wrong state when testing on (sk->sk_state) and the socket seems in 
the TIME_WAIT / CLOSE state.

Maybe I am corrupting the search by manually attaching a device to the skb?
Any idea please?

Cheers,

----- Original Message -----
 From: Evgeniy Polyakov [EMAIL PROTECTED]
 To: Nj A [EMAIL PROTECTED]
 Cc: netdev@vger.kernel.org
 Sent: Thursday, 15 November 2007, 11:12:28
 Subject: Re: Re : Re : Bug in using inet_lookup ()
 
 On Wed, Nov 14, 2007 at 04:47:22PM +, Nj A ([EMAIL PROTECTED]) wrote:
  By setting the ID of the ingress device passed to inet_lookup() to 0, the 
  machine reboots automatically.
  Setting proc/sys/kernel/panic* to non-zero values doesn't help either.
 
 Sorry, I did not understand.
 You mean that after you passed zero to inet_lookup() instead of the device
 id, it started to reboot?
 
 -- 
 Evgeniy Polyakov

[MACVLAN 02/02]: Allow setting mac address while device is up

2007-11-15 Thread Patrick McHardy

[MACVLAN]: Allow setting mac address while device is up

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 3c50588260810d735231220f9a8ebaa6a6e8fb1e
tree 48ee2625502caf2454263a05b5a9869648de3aed
parent e2d06a34b52a999e8c539d1cdef51ff523e2f2c2
author Patrick McHardy [EMAIL PROTECTED] Thu, 15 Nov 2007 16:38:06 +0100
committer Patrick McHardy [EMAIL PROTECTED] Thu, 15 Nov 2007 16:38:06 +0100

 drivers/net/macvlan.c |   24 
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 461149c..3acf8cd 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -213,6 +213,29 @@ static int macvlan_stop(struct net_device *dev)
return 0;
 }
 
+static int macvlan_set_mac_address(struct net_device *dev, void *p)
+{
+   struct macvlan_dev *vlan = netdev_priv(dev);
+   struct net_device *lowerdev = vlan->lowerdev;
+   struct sockaddr *addr = p;
+   int err;
+
+   if (!is_valid_ether_addr(addr->sa_data))
+   return -EADDRNOTAVAIL;
+
+   if (!(dev->flags & IFF_UP))
+   goto out;
+
+   err = dev_unicast_add(lowerdev, addr->sa_data, ETH_ALEN);
+   if (err < 0)
+   return err;
+   dev_unicast_delete(lowerdev, dev->dev_addr, ETH_ALEN);
+
+out:
+   memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
+   return 0;
+}
+
 static void macvlan_change_rx_flags(struct net_device *dev, int change)
 {
struct macvlan_dev *vlan = netdev_priv(dev);
@@ -300,6 +323,7 @@ static void macvlan_setup(struct net_device *dev)
dev->stop   = macvlan_stop;
dev->change_mtu = macvlan_change_mtu;
dev->change_rx_flags= macvlan_change_rx_flags;
+   dev->set_mac_address= macvlan_set_mac_address;
dev->set_multicast_list = macvlan_set_multicast_list;
dev->hard_start_xmit= macvlan_hard_start_xmit;
dev->destructor = free_netdev;

RE: [PATCH 01/01] ipv6: RFC4214 Support (v2.2)

2007-11-15 Thread Templin, Fred L

Yoshifuji,

See below for follow-up:

 -Original Message-
 From: YOSHIFUJI Hideaki / 吉藤英明 [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, November 15, 2007 3:22 AM
 To: Templin, Fred L
 Cc: netdev@vger.kernel.org; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]
 Subject: Re: [PATCH 01/01] ipv6: RFC4214 Support (v2.2)
 
 In article 
 [EMAIL PROTECTED]
eing.com (at Wed, 14 Nov 2007 22:44:17 -0800), Templin, Fred L  [EMAIL 
PROTECTED] says:
 
  --- linux-2.6.24-rc2/net/ipv6/addrconf.c.orig   
 2007-11-08 11:59:35.0 -0800
  +++ linux-2.6.24-rc2/net/ipv6/addrconf.c2007-11-14 
 22:17:28.0 -0800
  @@ -1424,6 +1424,21 @@ static int addrconf_ifid_infiniband(u8 *
  return 0;
   }
   
  +static int addrconf_ifid_isatap(u8 *eui, __be32 addr)
  +{
  +
  +   eui[0] = 0x02; eui[1] = 0; eui[2] = 0x5E; eui[3] = 0xFE;
  +   memcpy (eui+4, &addr, 4);
  +
  +   if (ZERONET(addr) || PRIVATE_10(addr) || LOOPBACK(addr) ||
  +   LINKLOCAL_169(addr) || PRIVATE_172(addr) || TEST_192(addr) ||
  +   ANYCAST_6TO4(addr) || PRIVATE_192(addr) || TEST_198(addr) ||
  +   MULTICAST(addr) || BADCLASS(addr))
  +   eui[0] &= ~0x02;
  +
  +   return 0;
  +}
  +
   static int ipv6_generate_eui64(u8 *eui, struct net_device *dev)
   {
  switch (dev->type) {
 
 {
   eui[0] = (ZERONET(addr) || PRIVATE_10(addr) || LOOPBACK(addr) ||
 LINKLOCAL_169(addr) || PRIVATE_172(addr) || TEST_192(addr) ||
 ANYCAST_6TO4(addr) || PRIVATE_192(addr) || TEST_198(addr) ||
 MULTICAST(addr) || BADCLASS(addr)) ? 0 : 2;
   eui[1] = 0;
   eui[2] = 0x5E;
   eui[3] = 0xFE;
   memcpy (eui+4, &addr, 4);
 }

OK; I'll make this change.
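
A self-contained userspace sketch of the interface-identifier construction being discussed: the first byte is 0x02 when the IPv4 address is taken to be globally unique and 0x00 otherwise, followed by 00-5E-FE and the four IPv4 address bytes. The helper ipv4_is_not_global is a hypothetical, reduced substitute for the patch's ZERONET/PRIVATE_10/... macros and covers only some of the ranges they test; the address is passed as four bytes in network order rather than as a __be32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: a reduced version of the patch's address tests.
 * a[] holds the IPv4 address in network byte order. */
static int ipv4_is_not_global(const uint8_t a[4])
{
    return a[0] == 0 ||                             /* 0.0.0.0/8 */
           a[0] == 10 ||                            /* 10.0.0.0/8 */
           a[0] == 127 ||                           /* 127.0.0.0/8 */
           (a[0] == 169 && a[1] == 254) ||          /* 169.254.0.0/16 */
           (a[0] == 172 && (a[1] & 0xf0) == 16) ||  /* 172.16.0.0/12 */
           (a[0] == 192 && a[1] == 168) ||          /* 192.168.0.0/16 */
           a[0] >= 224;                             /* multicast and above */
}

/* Fill eui[0..7] with the ISATAP-style interface identifier. */
static void isatap_ifid(uint8_t eui[8], const uint8_t addr[4])
{
    eui[0] = ipv4_is_not_global(addr) ? 0x00 : 0x02;
    eui[1] = 0x00;
    eui[2] = 0x5E;
    eui[3] = 0xFE;
    memcpy(eui + 4, addr, 4);
}
```

For 192.168.1.1 this produces 00:00:5E:FE:C0:A8:01:01, and for an address outside the listed ranges, such as 8.8.8.8, it produces 02:00:5E:FE:08:08:08:08.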

  @@ -2167,7 +2185,8 @@ static void addrconf_dev_config(struct n
  (dev->type != ARPHRD_FDDI) &&
  (dev->type != ARPHRD_IEEE802_TR) &&
  (dev->type != ARPHRD_ARCNET) &&
  -   (dev->type != ARPHRD_INFINIBAND)) {
  +   (dev->type != ARPHRD_INFINIBAND) &&
  +   !(dev->priv_flags & IFF_ISATAP)) {
  /* Alas, we support only Ethernet autoconfiguration. */
  return;
  }
 
 Because priv_flags are local to device type, you need to 
 check dev->type:
   (dev->type == ARPHRD_SIT && !(dev->priv_flags & IFF_ISATAP))
 or something like this.

OK.

  +   struct ip_tunnel *t  = netdev_priv(ifp->idev->dev);
  +   if (t->parms.i_key != INADDR_NONE) {
  +   spin_lock(&ifp->lock);
 
 I guess INADDR_ANY.

No; INADDR_NONE is correct. Non-zero router value is the way
'ip' tells the kernel that the interface is ISATAP. INADDR_NONE
means ISATAP, but no router. The ISATAP router will never be
INADDR_ANY.

Thanks - Fred
[EMAIL PROTECTED]

 --yoshfuji
 

RE: [PATCH 01/01] ipv6: RFC4214 Support (v2.2)

2007-11-15 Thread Templin, Fred L

Yoshifuji, 

 -Original Message-
 From: YOSHIFUJI Hideaki / 吉藤英明 [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, November 15, 2007 3:48 AM
 To: Templin, Fred L
 Cc: netdev@vger.kernel.org; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]
 Subject: Re: [PATCH 01/01] ipv6: RFC4214 Support (v2.2)

 In article 
 [EMAIL PROTECTED]
 eing.com (at Wed, 14 Nov 2007 22:44:17 -0800), Templin, 
 Fred L [EMAIL PROTECTED] says:

  From: Fred L. Templin [EMAIL PROTECTED]

  This patch includes support for the Intra-Site Automatic Tunnel
  Addressing Protocol (ISATAP) per RFC4214. It uses the SIT
  module, and is configured using extensions to the iproute2
  utility.

  The following diffs are specific to the Linux 2.6.24-rc2 kernel
  distribution. This message includes the full and patchable 
 diff text;
  please use this version to apply patches.

  Signed-off-by: Fred L. Templin [EMAIL PROTECTED]

 BTW, how will we handle DNS name (and TTL) and/or multiple PRL entries 
 in RFC4214?

 I'm doubting if we really need to handle PRL refresh in kernel.

DNS name and PRL refresh are done in a daemon that either exec's
'ip' or issues the device ioctl's directly. When there are multiple default
router IPv4 addresses, the daemon picks one as the primary and writes
it to the kernel. It can then change to a different primary later if it wants
to. Also possible is something like VRRP to allow several routers for
fault tolerance even though there is only a single default router address. 

Thanks - Fred
[EMAIL PROTECTED]

 --yoshfuji

Re: [PATCH 2/2] move unneeded data to initdata section

2007-11-15 Thread Eric W. Biederman

Sam Ravnborg [EMAIL PROTECTED] writes:

 On Thu, Nov 15, 2007 at 05:42:04PM +0300, Denis V. Lunev wrote:
 
 nothing is discarded after module load. Though, I can be wrong. Could
 you point me to the exact place?
 If __initdata is not discarded after module load then we should do it.
 There is no reason to waste __initdata RAM when the module is loaded.

Down at the bottom of sys_init_module we have:

/* Drop initial reference. */
module_put(mod);
unwind_remove_table(mod->unwind_info, 1);

module_free(mod, mod->module_init);
^
mod->module_init = NULL;
mod->init_size = 0;
mod->init_text_size = 0;
mutex_unlock(&module_mutex);

return 0;

Which frees the memory for the .init sections.

Eric

Re: [PATCH 2.6.25] add bnx2x driver for BCM57710 - bnx2x_init.h

2007-11-15 Thread Eliezer Tamir

posting individual files for comments.
---

/* bnx2x_init.h: Broadcom Everest network driver.
 *
 * Copyright (c) 2007 Broadcom Corporation
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation.
 *
 * Written by: Eliezer Tamir [EMAIL PROTECTED]
 */

#ifndef BNX2X_INIT_H
#define BNX2X_INIT_H

#define COMMON  0x1
#define PORT0   0x2
#define PORT1   0x4

#define INIT_EMULATION  0x1
#define INIT_FPGA   0x2
#define INIT_ASIC   0x4
#define INIT_HARDWARE   0x7

#define STORM_INTMEM_SIZE   (0x5800 / 4)
#define TSTORM_INTMEM_ADDR  0x1a
#define CSTORM_INTMEM_ADDR  0x22
#define XSTORM_INTMEM_ADDR  0x2a
#define USTORM_INTMEM_ADDR  0x32


/* Init operation types and structures */

#define OP_RD   0x1 /* read single register */
#define OP_WR   0x2 /* write single register */
#define OP_IW   0x3 /* write single register using mailbox */
#define OP_SW   0x4 /* copy a string to the device */
#define OP_SI   0x5 /* copy a string using mailbox */
#define OP_ZR   0x6 /* clear memory */
#define OP_ZP   0x7 /* unzip then copy with DMAE */
#define OP_WB   0x8 /* copy a string using DMAE */

struct raw_op {
u32 op  :8;
u32 offset  :24;
u32 raw_data;
};

struct op_read {
u32 op  :8;
u32 offset  :24;
u32 pad;
};

struct op_write {
u32 op  :8;
u32 offset  :24;
u32 val;
};

struct op_string_write {
u32 op  :8;
u32 offset  :24;
#ifdef __LITTLE_ENDIAN
u16 data_off;
u16 data_len;
#else /* __BIG_ENDIAN */
u16 data_len;
u16 data_off;
#endif
};

struct op_zero {
u32 op  :8;
u32 offset  :24;
u32 len;
};

union init_op {
struct op_read  read;
struct op_write write;
struct op_string_write  str_wr;
struct op_zero  zero;
struct raw_op   raw;
};

#include "bnx2x_init_values.h"

static void bnx2x_reg_wr_ind(struct bnx2x *bp, u32 addr, u32 val);

static void bnx2x_write_dmae(struct bnx2x *bp, dma_addr_t dma_addr,
 u32 dst_addr, u32 len32);

static int bnx2x_gunzip(struct bnx2x *bp, u8 *zbuf, int len);

static void bnx2x_init_str_wr(struct bnx2x *bp, u32 addr, const u32 *data,
  u32 len)
{
int i;

for (i = 0; i < len; i++) {
REG_WR(bp, addr + i*4, data[i]);
if (!(i % 1)) {
touch_softlockup_watchdog();
cpu_relax();
}
}
}

#define INIT_MEM_WR(reg, data, reg_off, len) \
bnx2x_init_str_wr(bp, reg + reg_off*4, data, len)

static void bnx2x_init_ind_wr(struct bnx2x *bp, u32 addr, const u32 *data,
  u16 len)
{
int i;

for (i = 0; i < len; i++) {
REG_WR_IND(bp, addr + i*4, data[i]);
if (!(i % 1)) {
touch_softlockup_watchdog();
cpu_relax();
}
}
}

static void bnx2x_init_wr_wb(struct bnx2x *bp, u32 addr, const u32 *data,
 u32 len, int gunzip)
{
int offset = 0;

if (gunzip) {
int rc;
#ifdef __BIG_ENDIAN
int i, size;
u32 *temp;

temp = kmalloc(len, GFP_KERNEL);
size = (len / 4) + ((len % 4) ? 1 : 0);
for (i = 0; i < size; i++)
temp[i] = swab32(data[i]);
data = temp;
#endif
rc = bnx2x_gunzip(bp, (u8 *)data, len);
if (rc) {
DP(NETIF_MSG_HW, "gunzip failed ! rc %d\n", rc);
return;
}
len = bp->gunzip_outlen;
#ifdef __BIG_ENDIAN
kfree(temp);
for (i = 0; i < len; i++)
 ((u32 *)bp->gunzip_buf)[i] =
swab32(((u32 *)bp->gunzip_buf)[i]);
#endif
} else {
if ((len * 4) > FW_BUF_SIZE) {
BNX2X_ERR("LARGE DMAE OPERATION ! len 0x%x\n", len*4);
return;
}
memcpy(bp->gunzip_buf, data, len * 4);
}

while (len > DMAE_LEN32_MAX) {
bnx2x_write_dmae(bp, bp->gunzip_mapping + offset,
 addr + offset, DMAE_LEN32_MAX);
offset += DMAE_LEN32_MAX * 4;
len -=

Re: tg3: strange errors and non-working-ness

2007-11-15 Thread Michael Chan

On Thu, 2007-11-15 at 13:17 -0600, Jon Nelson wrote:

 Is this what you mean? I pulled this from the quoted text:
 
 Nov 10 22:45:52 frank kernel: NETDEV WATCHDOG: eth0: transmit timed out
 

Right.  This explains the reset at 22:45:52, but not the earlier reset
at 22:24:40.  Link never came up after that earlier reset.

Is this a new problem introduced by a new driver?  I notice you are
using tg3 3.65.  Have you used newer versions or older versions?


Re: [PATCH 2/2] move unneeded data to initdata section

2007-11-15 Thread Sam Ravnborg

On Thu, Nov 15, 2007 at 10:17:14PM +0300, Denis V. Lunev wrote:
 
 Would you object to this?

 diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
 index 5dd6d90..d136707 100644
 --- a/include/net/net_namespace.h
 +++ b/include/net/net_namespace.h
 @@ -119,10 +119,14 @@ static inline struct net *maybe_get_net(struct net *net)
  #ifdef CONFIG_NET_NS
  #define __net_init
  #define __net_exit
 -#define __net_initdata
  #else
  #define __net_init   __init
  #define __net_exit   __exit_refok
 +#endif
 +
 +#if defined(CONFIG_NET_NS) || defined(MODULE)
 +#define __net_initdata
 +#else
  #define __net_initdata   __initdata
  #endif
  
In principle I am against this approach.
__initdata is far too overloaded with different stuff.

A much better approach would be to create new sections
named, for example, .init.data.net and .init.data.net.module

And then in include/asm-generic/vmlinux.lds.h decide the
location of these sections.

On top of this we would have to teach modpost about these new sections.
But the advantage of this approach is that the section mismatch
checks are *independent* of the module being a MODULE or built-in.
The check will still happen.

In this way we avoid the situation where a warning only pops up
in certain configurations.

Doing so will obviously require a bit more linker script
consolidation, but if you or someone else could step in and do this
it would be great!

Sam

Re: tg3: strange errors and non-working-ness

2007-11-15 Thread Jon Nelson

On 11/15/07, Michael Chan [EMAIL PROTECTED] wrote:
 On Thu, 2007-11-15 at 10:47 +0100, Jarek Poplawski wrote:
  On 13-11-2007 19:57, Jon Nelson wrote:
   The best info I've got is this:

 It looks like the card is being reset periodically.  Every time the card
 gets reset, you'll see those PM messages in the version of the driver
 you're using.  Do you see NETDEV WATCHDOG message as well in the dmesg
 log?

Is this what you mean? I pulled this from the quoted text:

Nov 10 22:45:52 frank kernel: NETDEV WATCHDOG: eth0: transmit timed out



-- 
Jon

Re: [PATCH] r6040 various bugfixes

2007-11-15 Thread Florian Fainelli

Hello Stephen,

On Thursday 15 November 2007, Stephen Hemminger wrote:
 Looks good, thanks:

 There is a function to make this easier:
  @@ -756,10 +803,8 @@ r6040_open(struct net_device *dev)
  if (lp->switch_sig != ICPLUS_PHY_ID) {
  /* set and active a timer process */
  init_timer(&lp->timer);
  -   lp->timer.expires = TIMER_WUT;
  lp->timer.data = (unsigned long)dev;
  lp->timer.function = r6040_timer;
  -   add_timer(&lp->timer);

 Could be:
   setup_timer(&lp->timer, r6040_timer, dev);
   if (lp->switch_sig != ICPLUS_PHY_ID)
   mod_timer(&lp->timer, jiffies + HZ);

I will send a fix later when I have tested your suggestion to use slightly 
larger buffer and skb_reserve(skb, NET_IP_ALIGN).

Thank you.
-- 
Florian

Re: [PATCH 01/01] ipv6: RFC4214 Support (v2.2)

2007-11-15 Thread YOSHIFUJI Hideaki / 吉藤英明

In article [EMAIL PROTECTED] (at Thu, 15 Nov 2007 10:11:16 -0800), Templin, 
Fred L [EMAIL PROTECTED] says:

 Yoshifuji, 

  -Original Message-
  From: YOSHIFUJI Hideaki / 吉藤英明 [mailto:[EMAIL PROTECTED] 
  Sent: Thursday, November 15, 2007 3:48 AM
  To: Templin, Fred L
  Cc: netdev@vger.kernel.org; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]
  Subject: Re: [PATCH 01/01] ipv6: RFC4214 Support (v2.2)

  In article 
  [EMAIL PROTECTED]
  eing.com (at Wed, 14 Nov 2007 22:44:17 -0800), Templin, 
  Fred L [EMAIL PROTECTED] says:

   From: Fred L. Templin [EMAIL PROTECTED]

   This patch includes support for the Intra-Site Automatic Tunnel
   Addressing Protocol (ISATAP) per RFC4214. It uses the SIT
   module, and is configured using extensions to the iproute2
   utility.

   The following diffs are specific to the Linux 2.6.24-rc2 kernel
   distribution. This message includes the full and patchable 
  diff text;
   please use this version to apply patches.

   Signed-off-by: Fred L. Templin [EMAIL PROTECTED]

  BTW, how will we handle DNS name (and TTL) and/or multiple PRL entries 
  in RFC4214?

  I'm doubting if we really need to handle PRL refresh in kernel.

 DNS name and PRL refresh are done in a daemon that either exec's
 'ip' or issues the device ioctl's directly. When there are multiple default
 router IPv4 addresses, the daemon picks one as the primary and writes
 it to the kernel. It can then change to a different primary later if it wants
 to. Also possible is something like VRRP to allow several routers for
 fault tolerance even though there is only a single default router address. 

Why?  All PRLs should be installed in the kernel so that standard router 
selection can be used.  For this, I think we should have just one
isatap interface per set of PRLs providing a virtual link,
especially if each of them provides the same prefix.

--yoshfuji

Re: [PATCH 2/2] move unneeded data to initdata section

2007-11-15 Thread Sam Ravnborg

On Thu, Nov 15, 2007 at 11:19:26AM -0700, Eric W. Biederman wrote:
 Sam Ravnborg [EMAIL PROTECTED] writes:
 
  On Thu, Nov 15, 2007 at 05:42:04PM +0300, Denis V. Lunev wrote:
  
  nothing is discarded after module load. Though, I can be wrong. Could
  you point me to the exact place?
  If __initdata is not discarded after module load then we should do it.
  There is no reason to waste __initdata RAM when the module is loaded.
 
 Down at the bottom of sys_init_module we have:
 
   /* Drop initial reference. */
   module_put(mod);
   unwind_remove_table(mod->unwind_info, 1);
 
   module_free(mod, mod->module_init);
 ^
   mod->module_init = NULL;
   mod->init_size = 0;
   mod->init_text_size = 0;
   mutex_unlock(&module_mutex);
 
   return 0;
 
 Which frees the memory for the .init sections.

Thanks for clarifying this Eric - should have looked myself..

Sam

Re: tg3: strange errors and non-working-ness

2007-11-15 Thread Michael Chan

On Thu, 2007-11-15 at 10:47 +0100, Jarek Poplawski wrote:
 On 13-11-2007 19:57, Jon Nelson wrote:
  The best info I've got is this:

It looks like the card is being reset periodically.  Every time the card
gets reset, you'll see those PM messages in the version of the driver
you're using.  Do you see NETDEV WATCHDOG message as well in the dmesg
log?

  
  Nov 10 22:21:19 frank kernel: tg3.c:v3.65 (August 07, 2006)
  Nov 10 22:21:19 frank kernel: ACPI: PCI Interrupt :00:0b.0[A] -
  Link [LNKB] - GSI 3 (level, low) - IRQ 3
  Nov 10 22:21:19 frank kernel: eth0: Tigon3 [partno(AC91002A1) rev 0105
  PHY(5701)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet
  00:09:5b:09:b1:69
  Nov 10 22:21:19 frank kernel: eth0: RXcsums[1] LinkChgREG[0] MIirq[0]
  ASF[0] Split[0] WireSpeed[1] TSOcap[0]
  Nov 10 22:21:19 frank kernel: eth0: dma_rwctrl[76ff000f] dma_mask[64-bit]
  Nov 10 22:21:19 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset b (was 164514e4, writing 302a1385)
  Nov 10 22:21:19 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 3 (was 0, writing 4008)
  Nov 10 22:21:19 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 2 (was 200, writing 215)
  Nov 10 22:21:19 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 1 (was 2b0, writing 2b00106)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 0 (was 164514e4, writing 3ea173b)
  Nov 10 22:21:20 frank kernel: tg3: eth0: Link is up at 1000 Mbps, full 
  duplex.
  Nov 10 22:21:20 frank kernel: tg3: eth0: Flow control is on for TX and
  on for RX.
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset b (was 164514e4, writing 302a1385)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 3 (was 0, writing 4008)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 2 (was 200, writing 215)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 1 (was 2b0, writing 2b00106)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 0 (was 164514e4, writing 3ea173b)
  Nov 10 22:21:20 frank kernel: ACPI: PCI interrupt for device
  :00:0b.0 disabled
  Nov 10 22:21:20 frank kernel: PCI: Enabling device :00:0b.0 (0100 - 
  0102)
  Nov 10 22:21:20 frank kernel: ACPI: PCI Interrupt :00:0b.0[A] -
  Link [LNKB] - GSI 3 (level, low) - IRQ 3
  Nov 10 22:21:20 frank kernel: eth0: Tigon3 [partno(AC91002A1) rev 0105
  PHY(5701)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet
  00:09:5b:09:b1:69
  Nov 10 22:21:20 frank kernel: eth0: RXcsums[1] LinkChgREG[0] MIirq[0]
  ASF[0] Split[0] WireSpeed[1] TSOcap[0]
  Nov 10 22:21:20 frank kernel: eth0: dma_rwctrl[76ff000f] dma_mask[64-bit]
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset b (was 164514e4, writing 302a1385)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 3 (was 0, writing 4008)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 2 (was 200, writing 215)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 1 (was 2b0, writing 2b00106)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 0 (was 164514e4, writing 3ea173b)
  Nov 10 22:21:20 frank kernel: tg3: eth0: Link is up at 1000 Mbps, full 
  duplex.
  Nov 10 22:21:20 frank kernel: tg3: eth0: Flow control is on for TX and
  on for RX.
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset b (was 164514e4, writing 302a1385)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 3 (was 0, writing 4008)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 2 (was 200, writing 215)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 1 (was 2b0, writing 2b00106)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 0 (was 164514e4, writing 3ea173b)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset b (was 164514e4, writing 302a1385)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 3 (was 0, writing 4008)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 2 (was 200, writing 215)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
  :00:0b.0 at offset 1 (was 2b0, writing 2b00106)
  Nov 10 22:21:20 frank kernel: PM: Writing back config space on device

Re: [PATCH 2.6.25] add bnx2x driver for BCM57710 - bnx2x_fw_defs.h

2007-11-15 Thread Eliezer Tamir

posting individual files for comments.
---

/* bnx2x_fw_defs.h: Broadcom Everest network driver.
 *
 * Copyright (c) 2007 Broadcom Corporation
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation.
 */


#define CSTORM_DEF_SB_HC_DISABLE_OFFSET(port, index)\
(0x1922 + (port * 0x40) + (index * 0x4))
#define CSTORM_DEF_SB_HOST_SB_ADDR_OFFSET(port)\
(0x1900 + (port * 0x40))
#define CSTORM_HC_BTR_OFFSET(port)\
(0x1984 + (port * 0xc0))
#define CSTORM_SB_HC_DISABLE_OFFSET(port, cpu_id, index)\
(0x141a + (port * 0x280) + (cpu_id * 0x28) + (index * 0x4))
#define CSTORM_SB_HC_TIMEOUT_OFFSET(port, cpu_id, index)\
(0x1418 + (port * 0x280) + (cpu_id * 0x28) + (index * 0x4))
#define CSTORM_SB_HOST_SB_ADDR_OFFSET(port, cpu_id)\
(0x1400 + (port * 0x280) + (cpu_id * 0x28))
#define CSTORM_STATS_FLAGS_OFFSET(port) (0x5108 + (port * 0x8))
#define TSTORM_CLIENT_CONFIG_OFFSET(port, client_id)\
(0x1510 + (port * 0x240) + (client_id * 0x20))
#define TSTORM_DEF_SB_HC_DISABLE_OFFSET(port, index)\
(0x138a + (port * 0x28) + (index * 0x4))
#define TSTORM_DEF_SB_HOST_SB_ADDR_OFFSET(port)\
(0x1370 + (port * 0x28))
#define TSTORM_ETH_STATS_QUERY_ADDR_OFFSET(port)\
(0x4b70 + (port * 0x8))
#define TSTORM_FUNCTION_COMMON_CONFIG_OFFSET(function)\
(0x1418 + (function * 0x30))
#define TSTORM_HC_BTR_OFFSET(port)\
(0x13c4 + (port * 0x18))
#define TSTORM_INDIRECTION_TABLE_OFFSET(port)\
(0x22c8 + (port * 0x80))
#define TSTORM_INDIRECTION_TABLE_SIZE   0x80
#define TSTORM_MAC_FILTER_CONFIG_OFFSET(port)\
(0x1420 + (port * 0x30))
#define TSTORM_RCQ_PROD_OFFSET(port, client_id)\
(0x1508 + (port * 0x240) + (client_id * 0x20))
#define TSTORM_STATS_FLAGS_OFFSET(port) (0x4b90 + (port * 0x8))
#define USTORM_DEF_SB_HC_DISABLE_OFFSET(port, index)\
(0x191a + (port * 0x28) + (index * 0x4))
#define USTORM_DEF_SB_HOST_SB_ADDR_OFFSET(port)\
(0x1900 + (port * 0x28))
#define USTORM_HC_BTR_OFFSET(port)\
(0x1954 + (port * 0xb8))
#define USTORM_MEM_WORKAROUND_ADDRESS_OFFSET(port)\
(0x5408 + (port * 0x8))
#define USTORM_SB_HC_DISABLE_OFFSET(port, cpu_id, index)\
(0x141a + (port * 0x280) + (cpu_id * 0x28) + (index * 0x4))
#define USTORM_SB_HC_TIMEOUT_OFFSET(port, cpu_id, index)\
(0x1418 + (port * 0x280) + (cpu_id * 0x28) + (index * 0x4))
#define USTORM_SB_HOST_SB_ADDR_OFFSET(port, cpu_id)\
(0x1400 + (port * 0x280) + (cpu_id * 0x28))
#define XSTORM_ASSERT_LIST_INDEX_OFFSET 0x1000
#define XSTORM_ASSERT_LIST_OFFSET(idx)  (0x1020 + (idx * 0x10))
#define XSTORM_DEF_SB_HC_DISABLE_OFFSET(port, index)\
(0x141a + (port * 0x28) + (index * 0x4))
#define XSTORM_DEF_SB_HOST_SB_ADDR_OFFSET(port)\
(0x1400 + (port * 0x28))
#define XSTORM_ETH_STATS_QUERY_ADDR_OFFSET(port)\
(0x5408 + (port * 0x8))
#define XSTORM_HC_BTR_OFFSET(port)\
(0x1454 + (port * 0x18))
#define XSTORM_SPQ_PAGE_BASE_OFFSET(port)\
(0x5328 + (port * 0x18))
#define XSTORM_SPQ_PROD_OFFSET(port)\
(0x5330 + (port * 0x18))
#define XSTORM_STATS_FLAGS_OFFSET(port) (0x53f8 + (port * 0x8))
#define COMMON_ASM_INVALID_ASSERT_OPCODE 0x0

/**
* This file defines HSI constants for the ETH flow
*/

/* hash types */
#define DEFAULT_HASH_TYPE   0
#define IPV4_HASH_TYPE  1
#define TCP_IPV4_HASH_TYPE  2
#define IPV6_HASH_TYPE  3
#define TCP_IPV6_HASH_TYPE  4

/* values of command IDs in the ramrod message */
#define RAMROD_CMD_ID_ETH_PORT_SETUP    (80)
#define RAMROD_CMD_ID_ETH_CLIENT_SETUP  (85)
#define RAMROD_CMD_ID_ETH_STAT_QUERY    (90)
#define RAMROD_CMD_ID_ETH_UPDATE        (100)
#define RAMROD_CMD_ID_ETH_HALT          (105)
#define RAMROD_CMD_ID_ETH_SET_MAC       (110)
#define RAMROD_CMD_ID_ETH_CFC_DEL       (115)
#define RAMROD_CMD_ID_ETH_PORT_DEL      (120)
#define RAMROD_CMD_ID_ETH_FORWARD_SETUP (125)


/* command values for set mac command */
#define T_ETH_MAC_COMMAND_SET           0
#define T_ETH_MAC_COMMAND_INVALIDATE    1

#define T_ETH_INDIRECTION_TABLE_SIZE    128

/* Maximal L2 clients supported */
#define ETH_MAX_RX_CLIENTS  (18)

/**
* This file defines HSI constants common to all microcode flows
*/

/* Connection types */
#define ETH_CONNECTION_TYPE 0

#define PROTOCOL_STATE_BIT_OFFSET   6

#define ETH_STATE   (ETH_CONNECTION_TYPE << PROTOCOL_STATE_BIT_OFFSET)

/* microcode fixed page page size 4K (chains and ring segments)

[PATCH] r6040 various bugfixes

2007-11-15 Thread Florian Fainelli

This patch fixes various bugs spotted by Stephen, thanks!

- add functions to allocate/free TX and RX buffers
- recover from transmit timeout and use the 4 helpers defined below
- use netdev_alloc_skb instead of dev_alloc_skb
- do not use a private stats structure to store statistics
- break each TX/RX error to a separate line for better reading
- suppress volatiles and make checkpatch happy
- better control of the timer
- fix spin_unlock_irq typo in netdev_get_settings
- fix various typos and spelling in the driver

Signed-off-by: Florian Fainelli [EMAIL PROTECTED]
-- 
diff --git a/drivers/net/r6040.c b/drivers/net/r6040.c
index edce5a4..529c903 100644
--- a/drivers/net/r6040.c
+++ b/drivers/net/r6040.c
@@ -172,7 +172,6 @@ struct r6040_private {
 	struct net_device *dev;
 	struct mii_if_info mii_if;
 	struct napi_struct napi;
-	struct net_device_stats stats;
 	u16 napi_rx_running;
 	void __iomem *base;
 };
@@ -233,18 +232,121 @@ static void mdio_write(struct net_device *dev, int mii_id, int reg, int val)
 	phy_write(ioaddr, lp->phy_addr, reg, val);
 }
 
+static void r6040_free_txbufs(struct net_device *dev)
+{
+	struct r6040_private *lp = netdev_priv(dev);
+	int i;
+
+	for (i = 0; i < TX_DCNT; i++) {
+		if (lp->tx_insert_ptr->skb_ptr) {
+			pci_unmap_single(lp->pdev, lp->tx_insert_ptr->buf,
+				MAX_BUF_SIZE, PCI_DMA_TODEVICE);
+			dev_kfree_skb(lp->tx_insert_ptr->skb_ptr);
+			lp->rx_insert_ptr->skb_ptr = NULL;
+		}
+		lp->tx_insert_ptr = lp->tx_insert_ptr->vndescp;
+	}
+}
+
+static void r6040_free_rxbufs(struct net_device *dev)
+{
+	struct r6040_private *lp = netdev_priv(dev);
+	int i;
+
+	for (i = 0; i < RX_DCNT; i++) {
+		if (lp->rx_insert_ptr->skb_ptr) {
+			pci_unmap_single(lp->pdev, lp->rx_insert_ptr->buf,
+				MAX_BUF_SIZE, PCI_DMA_FROMDEVICE);
+			dev_kfree_skb(lp->rx_insert_ptr->skb_ptr);
+			lp->rx_insert_ptr->skb_ptr = NULL;
+		}
+		lp->rx_insert_ptr = lp->rx_insert_ptr->vndescp;
+	}
+}
+
+static void r6040_alloc_txbufs(struct net_device *dev)
+{
+	struct r6040_private *lp = netdev_priv(dev);
+	struct r6040_descriptor *descptr;
+	int i;
+	dma_addr_t desc_dma, start_dma;
+
+	lp->tx_free_desc = TX_DCNT;
+	/* Zero all descriptors */
+	memset(lp->desc_pool, 0, ALLOC_DESC_SIZE);
+	lp->tx_insert_ptr = (struct r6040_descriptor *)lp->desc_pool;
+	lp->tx_remove_ptr = lp->tx_insert_ptr;
+
+	/* Init TX descriptor */
+	descptr = lp->tx_insert_ptr;
+	desc_dma = lp->desc_dma;
+	start_dma = desc_dma;
+	for (i = 0; i < TX_DCNT; i++) {
+		descptr->ndesc = cpu_to_le32(desc_dma +
+			sizeof(struct r6040_descriptor));
+		descptr->vndescp = (descptr + 1);
+		descptr = (descptr + 1);
+		desc_dma += sizeof(struct r6040_descriptor);
+	}
+	(descptr - 1)->ndesc = cpu_to_le32(start_dma);
+	(descptr - 1)->vndescp = lp->tx_insert_ptr;
+}
+
+static void r6040_alloc_rxbufs(struct net_device *dev)
+{
+	struct r6040_private *lp = netdev_priv(dev);
+	struct r6040_descriptor *descptr;
+	int i;
+	dma_addr_t desc_dma, start_dma;
+
+	lp->rx_free_desc = 0;
+	/* Zero all descriptors */
+	memset(lp->desc_pool, 0, ALLOC_DESC_SIZE);
+	lp->rx_insert_ptr = (struct r6040_descriptor *)lp->tx_insert_ptr +
+		TX_DCNT;
+	lp->rx_remove_ptr = lp->rx_insert_ptr;
+
+	/* Init RX descriptor */
+	start_dma = desc_dma;
+	descptr = lp->rx_insert_ptr;
+	for (i = 0; i < RX_DCNT; i++) {
+		descptr->ndesc = cpu_to_le32(desc_dma +
+			sizeof(struct r6040_descriptor));
+		descptr->vndescp = (descptr + 1);
+		descptr = (descptr + 1);
+		desc_dma += sizeof(struct r6040_descriptor);
+	}
+	(descptr - 1)->ndesc = cpu_to_le32(start_dma);
+	(descptr - 1)->vndescp = lp->rx_insert_ptr;
+}
+
 static void
 r6040_tx_timeout(struct net_device *dev)
 {
 	struct r6040_private *priv = netdev_priv(dev);
+	void __iomem *ioaddr = priv->base;
 
+	printk(KERN_WARNING "%s: transmit timed out, status %4.4x, PHY status "
+		"%4.4x\n",
+		dev->name, ioread16(ioaddr + MIER),
+		mdio_read(dev, priv->mii_if.phy_id, MII_BMSR));
 	disable_irq(dev->irq);
 	napi_disable(&priv->napi);
+
 	spin_lock(&priv->lock);
-	dev->stats.tx_errors++;
+	/* Clear all descriptors */
+	r6040_free_txbufs(dev);
+	r6040_free_rxbufs(dev);
+	r6040_alloc_txbufs(dev);
+	r6040_alloc_rxbufs(dev);
+
+	/* Reset MAC */
+	iowrite16(MAC_RST, ioaddr + MCR1);
Re: [PATCH] r6040 various bugfixes

2007-11-15 Thread Stephen Hemminger

On Thu, 15 Nov 2007 19:37:43 +0100
Florian Fainelli [EMAIL PROTECTED] wrote:

 This patch fixes various bugfixes spotted by Stephen, thanks !
 
 - add functions to allocate/free TX and RX buffers
 - recover from transmit timeout and use the 4 helpers defined below
 - use netdev_alloc_skb instead of dev_alloc_skb
 - do not use a private stats structure to store statistics
 - break each TX/RX error to a separate line for better reading
 - suppress volatiles and make checkpatch happy
 - better control of the timer
 - fix spin_unlock_irq typo in netdev_get_settings
 - fix various typos and spelling in the driver
 
 Signed-off-by: Florian Fainelli [EMAIL PROTECTED]

Looks good, thanks:

There is a function to make this easier:
 @@ -756,10 +803,8 @@ r6040_open(struct net_device *dev)
   	if (lp->switch_sig != ICPLUS_PHY_ID) {
   		/* set and active a timer process */
   		init_timer(&lp->timer);
 -		lp->timer.expires = TIMER_WUT;
   		lp->timer.data = (unsigned long)dev;
   		lp->timer.function = r6040_timer;
 -		add_timer(&lp->timer);

Could be:
	setup_timer(&lp->timer, r6040_timer, dev);
	if (lp->switch_sig != ICPLUS_PHY_ID)
		mod_timer(&lp->timer, jiffies + HZ);


-- 
Stephen Hemminger [EMAIL PROTECTED]

[PATCH 2.6.25] add bnx2x driver for BCM57710

2007-11-15 Thread Eliezer Tamir

Dave,

Here is the latest version for bnx2x.
Please consider applying to 2.6.25.

This patch also applies cleanly to net-2.6 for anyone that would like to
test it.

Major changes from last post.

* parts of the slowpath have been re-factored.
* slowpath task now runs in work queue context,
  which allowed us to replace the mdelays with msleeps.

ftp link
ftp://[EMAIL PROTECTED]/0001-add-bnx2x-driver-for-BCM57710.patch

gzipped 
ftp://[EMAIL PROTECTED]/0001-add-bnx2x-driver-for-BCM57710.patch.gz

I will also post individual files for review as replies to this post.

Thanks,
Eliezer




Re: [PATCH] via-velocity: don't oops on MTU change.

2007-11-15 Thread Stephen Hemminger

On Thu, 15 Nov 2007 09:26:00 +0100
Jarek Poplawski [EMAIL PROTECTED] wrote:

 On 15-11-2007 04:38, Stephen Hemminger wrote:
  Simple mtu change when device is down.
  Fix http://bugzilla.kernel.org/show_bug.cgi?id=9382.
  
  Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
  
  
  --- a/drivers/net/via-velocity.c2007-10-22 09:38:11.0 -0700
  +++ b/drivers/net/via-velocity.c2007-11-14 19:34:30.0 -0800
  @@ -1963,6 +1963,11 @@ static int velocity_change_mtu(struct ne
  		return -EINVAL;
  	}
   
  +	if (!netif_running(dev)) {
  +		dev->mtu = new_mtu;
  +		return 0;
  +	}
  +
  	if (new_mtu != oldmtu) {
  		spin_lock_irqsave(&vptr->lock, flags);
 
 Shouldn't this latter 'if' be removed now, btw?

No, it makes sense that if mtu is same, no action need be taken.

Actually, it would make sense to push the same check up into
the netdevice core management.
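
A minimal userspace sketch of that idea, under stated assumptions: the names sketch_netdev, sketch_driver_change_mtu and sketch_core_set_mtu are hypothetical stand-ins, not kernel APIs. The core-level helper returns early when the MTU is unchanged, so the driver callback only runs for a real change, and the driver itself only rebuilds its rings when the device is running.

```c
#include <assert.h>

/* Hypothetical stand-in for a network device; not struct net_device. */
struct sketch_netdev {
	int mtu;
	int running;      /* non-zero while the interface is up */
	int reinit_count; /* how many times the rings were rebuilt */
	int (*change_mtu)(struct sketch_netdev *dev, int new_mtu);
};

/* Driver side: a device that is down only needs the new value recorded. */
static int sketch_driver_change_mtu(struct sketch_netdev *dev, int new_mtu)
{
	if (!dev->running) {
		dev->mtu = new_mtu;
		return 0;
	}
	dev->reinit_count++; /* stands in for the costly ring rebuild */
	dev->mtu = new_mtu;
	return 0;
}

/* Core side: skip the driver entirely when nothing changes. */
static int sketch_core_set_mtu(struct sketch_netdev *dev, int new_mtu)
{
	if (new_mtu < 0)
		return -1;
	if (new_mtu == dev->mtu)
		return 0;
	return dev->change_mtu(dev, new_mtu);
}
```

With the equality test in the common helper, each driver callback could assume it is only called for an actual change.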

-- 
Stephen Hemminger [EMAIL PROTECTED]

Re: [PATCH 2.6.25] add bnx2x driver for BCM57710 - bnx2x.h

2007-11-15 Thread Eliezer Tamir

posting individual files for comments.
---

/* bnx2x.h: Broadcom Everest network driver.
 *
 * Copyright (c) 2007 Broadcom Corporation
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation.
 *
 * Written by: Eliezer Tamir [EMAIL PROTECTED]
 * Based on code from Michael Chan's bnx2 driver
 */

#ifndef BNX2X_H
#define BNX2X_H

/* error/debug prints */

#define DRV_MODULE_NAME "bnx2x"
#define PFX DRV_MODULE_NAME ": "

/* for messages that are currently off */
#define BNX2X_MSG_OFF   0
#define BNX2X_MSG_MCP   0x1 /* was: NETIF_MSG_HW */
#define BNX2X_MSG_STATS 0x2 /* was: NETIF_MSG_TIMER */
#define NETIF_MSG_NVM   0x4 /* was: NETIF_MSG_HW */
#define NETIF_MSG_DMAE  0x8 /* was: NETIF_MSG_HW */

#define DP_LEVEL        KERN_NOTICE /* was: KERN_DEBUG */

/* regular debug print */
#define DP(__mask, __fmt, __args...) do { \
	if (bp->msglevel & (__mask)) \
		printk(DP_LEVEL "[%s:%d(%s)]" __fmt, __FUNCTION__, \
		__LINE__, bp->dev?(bp->dev->name):"?", ##__args); \
	} while (0)

/* for errors (never masked) */
#define BNX2X_ERR(__fmt, __args...) do { \
	printk(KERN_ERR "[%s:%d(%s)]" __fmt, __FUNCTION__, \
		__LINE__, bp->dev?(bp->dev->name):"?", ##__args); \
	} while (0)

/* before we have a dev->name use dev_info() */
#define BNX2X_DEV_INFO(__fmt, __args...) do { \
	if (bp->msglevel & NETIF_MSG_PROBE) \
		dev_info(&bp->pdev->dev, __fmt, ##__args); \
	} while (0)


#ifdef BNX2X_STOP_ON_ERROR
#define bnx2x_panic() do { \
		bp->panic = 1; \
		BNX2X_ERR("driver assert\n"); \
		bnx2x_disable_int(bp); \
		bnx2x_panic_dump(bp); \
	} while (0)
#else
#define bnx2x_panic() do { \
		BNX2X_ERR("driver assert\n"); \
		bnx2x_panic_dump(bp); \
	} while (0)
#endif


#define U64_LO(x)           (((u64)x) & 0xffffffff)
#define U64_HI(x)           (((u64)x) >> 32)
#define HILO_U64(hi, lo)    (((u64)hi << 32) + lo)


#define REG_ADDR(bp, offset)        (bp->regview + offset)

#define REG_RD(bp, offset)          readl(REG_ADDR(bp, offset))
#define REG_RD8(bp, offset)         readb(REG_ADDR(bp, offset))
#define REG_RD64(bp, offset)        readq(REG_ADDR(bp, offset))

#define REG_WR(bp, offset, val)     writel((u32)val, REG_ADDR(bp, offset))
#define REG_WR8(bp, offset, val)    writeb((u8)val, REG_ADDR(bp, offset))
#define REG_WR16(bp, offset, val)   writew((u16)val, REG_ADDR(bp, offset))
#define REG_WR32(bp, offset, val)   REG_WR(bp, offset, val)

#define REG_RD_IND(bp, offset)      bnx2x_reg_rd_ind(bp, offset)
#define REG_WR_IND(bp, offset, val) bnx2x_reg_wr_ind(bp, offset, val)

#define REG_WR_DMAE(bp, offset, val, len32) \
	do { \
		memcpy(bnx2x_sp(bp, wb_data[0]), val, len32 * 4); \
		bnx2x_write_dmae(bp, bnx2x_sp_mapping(bp, wb_data), \
				 offset, len32); \
	} while (0)

#define SHMEM_RD(bp, type) \
	REG_RD(bp, bp->shmem_base + offsetof(struct shmem_region, type))
#define SHMEM_WR(bp, type, val) \
	REG_WR(bp, bp->shmem_base + offsetof(struct shmem_region, type), val)

#define NIG_WR(reg, val)            REG_WR(bp, reg, val)
#define EMAC_WR(reg, val)           REG_WR(bp, emac_base + reg, val)
#define BMAC_WR(reg, val)           REG_WR(bp, GRCBASE_NIG + bmac_addr + reg, val)


#define for_each_queue(bp, var) for (var = 0; var < bp->num_queues; var++)

#define for_each_nondefault_queue(bp, var) \
	for (var = 1; var < bp->num_queues; var++)
#define is_multi(bp)                (bp->num_queues > 1)


struct regp {
u32 lo;
u32 hi;
};

struct bmac_stats {
struct regp tx_gtpkt;
struct regp tx_gtxpf;
struct regp tx_gtfcs;
struct regp tx_gtmca;
struct regp tx_gtgca;
struct regp tx_gtfrg;
struct regp tx_gtovr;
struct regp tx_gt64;
struct regp tx_gt127;
struct regp tx_gt255;   /* 10 */
struct regp tx_gt511;
struct regp tx_gt1023;
struct regp tx_gt1518;
struct regp tx_gt2047;
struct regp tx_gt4095;
struct regp tx_gt9216;
struct regp tx_gt16383;
struct regp tx_gtmax;
struct regp tx_gtufl;
struct regp tx_gterr;   /* 20 */
struct regp tx_gtbyt;

struct regp rx_gr64;
struct regp rx_gr127;
struct regp rx_gr255;
struct regp rx_gr511;
struct regp rx_gr1023;
struct regp rx_gr1518;
struct regp rx_gr2047;
struct regp rx_gr4095;
struct regp rx_gr9216;  /* 30 */
struct regp rx_gr16383;
struct regp rx_grmax;
struct

Re: [Bugme-new] [Bug 9386] New: sis190 network driver crash

2007-11-15 Thread Andrew Morton

On Thu, 15 Nov 2007 07:30:53 -0800 (PST) [EMAIL PROTECTED] wrote:

 http://bugzilla.kernel.org/show_bug.cgi?id=9386
 
Summary: sis190 network driver crash
Product: Drivers
Version: 2.5
  KernelVersion: 2.6.23.1
   Platform: All
 OS/Version: Linux
   Tree: Mainline
 Status: NEW
   Severity: high
   Priority: P1
  Component: Network
 AssignedTo: [EMAIL PROTECTED]
 ReportedBy: [EMAIL PROTECTED]
 CC: [EMAIL PROTECTED]
 
 
 I have a problem where I can lock up a number of machines by 
 changing the link state on a sis190 Ethernet port. For example, during 
 a data transfer such as FTP if I unplug the Ethernet cable and plug it 
 back in, the Ethernet interface will stop responding and the machine 
 will lock up after a minute or so. This behaviour is repeatable. I have the
 sis190 driver loaded as a module.
 
 I haven't found a kernel version where this doesn't happen. It happens with
 kernel 2.6.20.15, for example.
 

Re: tg3: strange errors and non-working-ness

2007-11-15 Thread Jon Nelson

On 11/15/07, Michael Chan [EMAIL PROTECTED] wrote:
 On Thu, 2007-11-15 at 13:17 -0600, Jon Nelson wrote:

  Is this what you mean? I pulled this from the quoted text:
 
  Nov 10 22:45:52 frank kernel: NETDEV WATCHDOG: eth0: transmit timed out
 

 Right.  This explains the reset at 22:45:52, but not the earlier reset
 at 22:24:40.  Link never came up after that earlier reset.

 Is this a new problem introduced by a new driver?  I notice you are
 using tg3 3.65.  Have you used newer versions or older versions?

This is not a new problem - these cards have done this or something
like it for as long as I've had them*. They work just fine in 100 MBit
mode but not in all of my machines, and in none of them at gig-e.
I've tried every version of the driver since SUSE 9.1 without much
luck (at least as far back as 2.6.9). I'd try a newer driver, esp. if
I could make it compile on 2.6.22.12 (I prefer but do not require to
stay with the stock distro kernel, modules notwithstanding).

NOTE: to avoid list noise, I can make a bug out of this on
bugzilla.kernel.org and we can proceed from there if that is
preferred.

[*] Actually, they worked OK in 2.4.something way-back-when but only
for short durations at gig-e speeds.

-- 
Jon

Re: [BUG] New Kernel Bugs

2007-11-15 Thread Ben Dooks

On Tue, Nov 13, 2007 at 10:34:37PM +, Russell King wrote:
 On Tue, Nov 13, 2007 at 06:25:16PM +, Alan Cox wrote:
   Given the wide range of ARM platforms today, it is utterly idiotic to
   expect a single person to be able to provide responses for all ARM bugs.
   I for one wish I'd never *VOLUNTEERED* to be a part of the kernel
   bugzilla, and really *WISH* I could pull out of that function.
  
  You can. Perhaps that bugzilla needs to point to some kind of
  [EMAIL PROTECTED] list for the various ARM platform
  maintainers ?
 
 That might work - though it would be hard to get all the platform
 maintainers to be signed up to yet another mailing list, I'm sure
 sufficient would do.

As long as it would just be bug reports, I'm sure that most of us
could be persuaded to subscribe. Adding another list for general
discussions is probably not going to be read, the current list
provides more than enough to keep us busy.

-- 
Ben

Q:  What's a light-year?
A:  One-third less calories than a regular year.


RE: [PATCH] PATCH 1/2 [SCHED 2.6.24]: Check subqueue status before calling hard_start_xmit

2007-11-15 Thread Waskiewicz Jr, Peter P

 You could optimize this by getting HARD_TX_LOCK after the 
 check. I assume that netif_stop_subqueue (from another CPU) 
 would always be called by the driver xmit, and that is not 
 possible since we hold the __LINK_STATE_QDISC_RUNNING bit. 
 Does that sound correct?

Sorry for not responding sooner; Dave hit it on the head though with his
response.  I agree with your changes, and I'll incorporate them in the
lockless stack patches I've been working on (in the software queuing
mode).  Now if I could just find some time to finish them up and get
them out for review...

Thanks,

-PJ Waskiewicz

Re: [PATCH] net/ipv4/arp.c: Fix arp reply when sender ip 0 (was: Strange behavior in arp probe reply, bug or feature?)

2007-11-15 Thread Jonas Danielsson

Hi,

I started to look at this code when I was working on a project of
rewriting a dhcp-client.
I wanted to make the client use arp to determine if the offered
address was free or in use.
That's when I noticed that Linux machines responded in this, to me, odd way.

The problem is not really the target ip address in the reply, it is
the fact that the target hardware address is set to the hardware
address of the machine that is sending the reply.
The target hardware address should be the same as the destination
address in the ethernet frame.

The dhcp clients I examined, and the implementation of the arpcheck
that I use will compare the target hardware field of the arp-reply and
match it against its own mac, to verify the reply. And this fails with
the current implementation in the kernel.
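
To make the expected reply concrete, here is a small userspace sketch. It is only an illustration under my own assumptions: sk_arp_body, sk_build_probe_reply and sk_reply_is_for are hypothetical names, not kernel code and not taken from any dhcp client. The reply puts the prober's hardware address into the target hardware field, which is what the verifying side compares against its own mac.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_ETH_ALEN 6
#define SK_IP4_ALEN 4

/* Address fields of an Ethernet/IPv4 ARP packet (hypothetical layout). */
struct sk_arp_body {
	uint8_t sha[SK_ETH_ALEN]; /* sender hardware address */
	uint8_t spa[SK_IP4_ALEN]; /* sender protocol address */
	uint8_t tha[SK_ETH_ALEN]; /* target hardware address */
	uint8_t tpa[SK_IP4_ALEN]; /* target protocol address */
};

/* Build the reply of the address owner to a probe (probe->spa is 0.0.0.0). */
static void sk_build_probe_reply(struct sk_arp_body *reply,
				 const struct sk_arp_body *probe,
				 const uint8_t *own_mac)
{
	memcpy(reply->sha, own_mac, SK_ETH_ALEN);    /* who owns the address */
	memcpy(reply->spa, probe->tpa, SK_IP4_ALEN); /* the address in use */
	memcpy(reply->tha, probe->sha, SK_ETH_ALEN); /* the prober's mac */
	memcpy(reply->tpa, probe->spa, SK_IP4_ALEN); /* stays 0.0.0.0 */
}

/* The check a probing client performs on the reply. */
static int sk_reply_is_for(const struct sk_arp_body *reply,
			   const uint8_t *my_mac)
{
	return memcmp(reply->tha, my_mac, SK_ETH_ALEN) == 0;
}
```

A reply that carries the sender's own mac in the target hardware field would fail sk_reply_is_for() on the probing side, which is the mismatch described above.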

As for the target ip set to 0, that is the behavior I saw in
Windows and OpenBSD machines and figured it was a valid approach. The
main thing, however, is that the target hardware address in the arp reply
will in this case confuse dhcp-clients trying to verify the reply.

And even if your arping implementation works with any variant,
other implementations of this approach to duplicate ip detection
expect a different behavior.

Is there a reason that the target hardware address isn't the target
hardware address?

-Jonas

2007/11/15, Alexey Kuznetsov [EMAIL PROTECTED]:
 Hello!

  Send a correct arp reply instead of one with sender ip and sender
  hardware adress in target fields.

 I do not see anything more legal in setting target address to 0.


 Actually, semantics of target address in ARP reply is ambiguous.
 If it is a reply to some real request, it is set to address of requestor
 and protocol requires recipient of this arp reply to test that the address
 matches its own address before creating new entry triggered by unsolicited
 arp reply. That's all.

 In the case of duplicate address detection, requestor does not have
 any address, so that it is absolutely not essential what we use as target
 address. The only place, which could depend on this is the tool, which
 tests for duplicate address. At least, arping written by me, should
 work with any variant.

 So, please, could you explain what did force you to think that use of 0
 is better?

 Alexey


Re: [PATCH 2/5] accounting unit and variable

2007-11-15 Thread Hideo AOKI

Herbert Xu wrote:
 On Wed, Nov 14, 2007 at 06:30:51PM -0500, Hideo AOKI wrote:
 +#define SK_DATAGRAM_MEM_QUANTUM ((unsigned int)PAGE_SIZE)
 +
 +static inline int sk_datagram_pages(int amt)
 +{
 +	/* Cast to unsigned as an optimization, since amt is always positive. */
 +	return DIV_ROUND_UP((unsigned int)amt, SK_DATAGRAM_MEM_QUANTUM);
 +}
 +
 
 Thanks, this looks OK to me.

Hello,

Thank you for reviewing. Then, I'll send take 8 patch set later.

Regards,
Hideo

-- 
Hitachi Computer Products (America) Inc.



[PATCH 0/5] UDP memory accounting and limitation (take 8)

2007-11-15 Thread Hideo AOKI

Hello,

This is the latest patch set of UDP memory accounting and limitation.

I modified sk_datagram_pages() to avoid using a divide instruction.
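
One way to round a byte count up to whole pages without a divide is sketched below. This is only an illustration: the SKETCH_ names and the 4 KiB page size are assumptions for a userspace build, and the helper is not the code from the patch set.

```c
#include <assert.h>

/* Assumed page geometry for the sketch; the kernel supplies PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/*
 * Round amt bytes up to whole pages. Adding (page size - 1) and then
 * shifting right replaces the division, which works because the page
 * size is a power of two.
 */
static unsigned int sketch_datagram_pages(unsigned int amt)
{
	return (amt + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}
```

For example, 1 byte and 4096 bytes both account for one page, while 4097 bytes account for two.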

In addition, I fixed the memory accounting code in udp_recvmsg():
in previous takes, the accounting code read the truesize of an sk_buff
that had already been released. The fix is in the 3rd patch of the set.

The patch set is for net-2.6. Please apply.


Changelog take 7 -> take 8:
 * sk_datagram_pages(): avoided using divide instruction
 * udp_recvmsg(): fixed referring released truesize in accounting

Changelog take 6 -> take 7:
 * renamed /proc/sys/net/ipv4/udp_rmem to
   /proc/sys/net/ipv4/udp_rmem_min
 * renamed /proc/sys/net/ipv4/udp_wmem to
   /proc/sys/net/ipv4/udp_wmem_min
 * rebased to net-2.6


Changelog take 5 -> take 6:

 * removed minimal limit of /proc/sys/net/ipv4/udp_mem
 * added udp_init() for default value calculation of parameters
 * added /proc/sys/net/ipv4/udp_rmem and
   /proc/sys/net/ipv4/udp_wmem
 * added limitation code to ip_ufo_append_data()
 * improved accounting for receiving packet
 * fixed typos
 * rebased to 2.6.24-rc1


Changelog take 4 -> take 5:

 * removing unnecessary EXPORT_SYMBOLs
 * adding minimal limit of /proc/sys/net/ipv4/udp_mem
 * bugfix of UDP limit affecting protocol other than UDP
 * introducing __ip_check_max_skb_pages()
 * using CTL_UNNUMBERED
 * adding udp_mem usage to Documentation/networking/ip_sysctl.txt


Best regards,
Hideo Aoki

--
Hitachi Computer Products (America) Inc.

[PATCH 4/5] udp: memory limitation by using udp_mem

2007-11-15 Thread Hideo AOKI

This patch introduces memory limitation for UDP.

signed-off-by: Satoshi Oshima [EMAIL PROTECTED]
signed-off-by: Hideo Aoki [EMAIL PROTECTED]
---

 Documentation/networking/ip-sysctl.txt |6 
 include/net/udp.h  |3 ++
 net/ipv4/af_inet.c |3 ++
 net/ipv4/ip_output.c   |   47 ++---
 net/ipv4/sysctl_net_ipv4.c |   11 +++
 net/ipv4/udp.c |   24 
 6 files changed, 91 insertions(+), 3 deletions(-)

diff -pruN net-2.6-udp-p3/Documentation/networking/ip-sysctl.txt 
net-2.6-udp-p4/Documentation/networking/ip-sysctl.txt
--- net-2.6-udp-p3/Documentation/networking/ip-sysctl.txt   2007-11-14 
10:48:49.0 -0500
+++ net-2.6-udp-p4/Documentation/networking/ip-sysctl.txt   2007-11-15 
14:44:21.0 -0500
@@ -446,6 +446,12 @@ tcp_dma_copybreak - INTEGER
and CONFIG_NET_DMA is enabled.
Default: 4096

+UDP variables:
+
+udp_mem - INTEGER
+   Number of pages allowed for queueing by all UDP sockets.
+   Default is calculated at boot time from amount of available memory.
+
 CIPSOv4 Variables:

 cipso_cache_enable - BOOLEAN
diff -pruN net-2.6-udp-p3/include/net/udp.h net-2.6-udp-p4/include/net/udp.h
--- net-2.6-udp-p3/include/net/udp.h2007-11-15 14:44:13.0 -0500
+++ net-2.6-udp-p4/include/net/udp.h2007-11-15 14:44:21.0 -0500
@@ -66,6 +66,7 @@ extern rwlock_t udp_hash_lock;
 extern struct proto udp_prot;

 extern atomic_t udp_memory_allocated;
+extern int sysctl_udp_mem;

 struct sk_buff;

@@ -175,4 +176,6 @@ extern void udp_proc_unregister(struct u
 extern int  udp4_proc_init(void);
 extern void udp4_proc_exit(void);
 #endif
+
+extern void udp_init(void);
 #endif /* _UDP_H */
diff -pruN net-2.6-udp-p3/net/ipv4/af_inet.c net-2.6-udp-p4/net/ipv4/af_inet.c
--- net-2.6-udp-p3/net/ipv4/af_inet.c   2007-11-15 14:44:18.0 -0500
+++ net-2.6-udp-p4/net/ipv4/af_inet.c   2007-11-15 14:44:21.0 -0500
@@ -1446,6 +1446,9 @@ static int __init inet_init(void)
/* Setup TCP slab cache for open requests. */
tcp_init();

+   /* Setup UDP memory threshold */
+   udp_init();
+
/* Add UDP-Lite (RFC 3828) */
udplite4_register();

diff -pruN net-2.6-udp-p3/net/ipv4/ip_output.c net-2.6-udp-p4/net/ipv4/ip_output.c
--- net-2.6-udp-p3/net/ipv4/ip_output.c 2007-11-15 14:44:18.0 -0500
+++ net-2.6-udp-p4/net/ipv4/ip_output.c 2007-11-15 14:44:21.0 -0500
@@ -75,6 +75,7 @@
 #include <net/icmp.h>
 #include <net/checksum.h>
 #include <net/inetpeer.h>
+#include <net/udp.h>
 #include <linux/igmp.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/netfilter_bridge.h>
@@ -699,6 +700,20 @@ csum_page(struct page *page, int offset,
return csum;
 }

+static inline int __ip_check_max_skb_pages(struct sock *sk, int size)
+{
+   switch(sk->sk_protocol) {
+   case IPPROTO_UDP:
+       if (atomic_read(sk->sk_prot->memory_allocated) + size
+           > sk->sk_prot->sysctl_mem[0])
+           return -ENOBUFS;
+       /* Fall through */
+   default:
+       break;
+   }
+   return 0;
+}
+
 static inline int ip_ufo_append_data(struct sock *sk,
int getfrag(void *from, char *to, int offset, int len,
   int odd, struct sk_buff *skb),
@@ -707,16 +722,20 @@ static inline int ip_ufo_append_data(str
 {
struct sk_buff *skb;
int err;
+   int size = 0;

/* There is support for UDP fragmentation offload by network
 * device, so create one single skb packet containing complete
 * udp datagram
 */
    if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL) {
-       skb = sock_alloc_send_skb(sk,
-           hh_len + fragheaderlen + transhdrlen + 20,
-           (flags & MSG_DONTWAIT), &err);
+       size = hh_len + fragheaderlen + transhdrlen + 20;
+       err = __ip_check_max_skb_pages(sk, sk_datagram_pages(size));
+       if (err)
+           return err;

+       skb = sock_alloc_send_skb(sk, size, (flags & MSG_DONTWAIT),
+                     &err);
        if (skb == NULL)
            return err;

@@ -737,6 +756,10 @@ static inline int ip_ufo_append_data(str
        sk->sk_sndmsg_off = 0;
}

+   err = __ip_check_max_skb_pages(sk, sk_datagram_pages(size + length -
+transhdrlen));
+   if (err)
+   goto fail;
err = skb_append_datato_frags(sk,skb, getfrag, from,
   (length - transhdrlen));
if (!err) {
@@ -752,6 +775,7 @@ static inline int ip_ufo_append_data(str
/* There is not enough support do UFO ,
 * so follow normal path
 */
+fail:
kfree_skb(skb);

[PATCH 2/5] udp: accounting unit and variable

2007-11-15 Thread Hideo AOKI

This patch introduces a global variable for UDP memory accounting.
The accounting unit is a page.

signed-off-by: Satoshi Oshima [EMAIL PROTECTED]
signed-off-by: Hideo Aoki [EMAIL PROTECTED]
---
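
Note (illustration only, not part of the patch): a user-space sketch of
the rounding that sk_datagram_pages() performs, assuming a 4096-byte
page. SKETCH_PAGE_SIZE and sketch_datagram_pages are hypothetical names;
the kernel helper uses PAGE_SIZE and DIV_ROUND_UP.

```c
#include <assert.h>

/* Assumption: 4096-byte pages. The kernel uses the architecture's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for sk_datagram_pages(): bytes rounded up to pages. */
static unsigned int sketch_datagram_pages(int amt)
{
    assert(amt >= 0);   /* the patch relies on amt never being negative */

    /* Same arithmetic as DIV_ROUND_UP(amt, page size). */
    return ((unsigned int)amt + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

Because every buffer is rounded up on its own, a 1-byte buffer and a
4096-byte buffer each count as one page, and a 4097-byte buffer counts
as two.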

 include/net/sock.h |8 
 include/net/udp.h  |2 ++
 net/ipv4/proc.c|3 ++-
 net/ipv4/udp.c |2 ++
 4 files changed, 14 insertions(+), 1 deletion(-)

diff -pruN net-2.6-udp-p1/include/net/sock.h net-2.6-udp-p2/include/net/sock.h
--- net-2.6-udp-p1/include/net/sock.h   2007-11-15 12:42:04.0 -0500
+++ net-2.6-udp-p2/include/net/sock.h   2007-11-15 14:44:13.0 -0500
@@ -778,6 +778,14 @@ static inline int sk_stream_wmem_schedul
   sk_stream_mem_schedule(sk, size, 0);
 }

+#define SK_DATAGRAM_MEM_QUANTUM ((unsigned int)PAGE_SIZE)
+
+static inline int sk_datagram_pages(int amt)
+{
+   /* Cast to unsigned as an optimization, since amt is always positive. */
+   return DIV_ROUND_UP((unsigned int)amt, SK_DATAGRAM_MEM_QUANTUM);
+}
+
 /* Used by processes to lock a socket state, so that
  * interrupts and bottom half handlers won't change it
  * from under us. It essentially blocks any incoming
diff -pruN net-2.6-udp-p1/include/net/udp.h net-2.6-udp-p2/include/net/udp.h
--- net-2.6-udp-p1/include/net/udp.h2007-11-14 10:49:05.0 -0500
+++ net-2.6-udp-p2/include/net/udp.h2007-11-15 14:44:13.0 -0500
@@ -65,6 +65,8 @@ extern rwlock_t udp_hash_lock;

 extern struct proto udp_prot;

+extern atomic_t udp_memory_allocated;
+
 struct sk_buff;

 /*
diff -pruN net-2.6-udp-p1/net/ipv4/proc.c net-2.6-udp-p2/net/ipv4/proc.c
--- net-2.6-udp-p1/net/ipv4/proc.c  2007-11-14 10:49:07.0 -0500
+++ net-2.6-udp-p2/net/ipv4/proc.c  2007-11-15 14:44:13.0 -0500
@@ -56,7 +56,8 @@ static int sockstat_seq_show(struct seq_
           sock_prot_inuse(&tcp_prot), atomic_read(&tcp_orphan_count),
           tcp_death_row.tw_count, atomic_read(&tcp_sockets_allocated),
           atomic_read(&tcp_memory_allocated));
-   seq_printf(seq, "UDP: inuse %d\n", sock_prot_inuse(&udp_prot));
+   seq_printf(seq, "UDP: inuse %d mem %d\n", sock_prot_inuse(&udp_prot),
+          atomic_read(&udp_memory_allocated));
    seq_printf(seq, "UDPLITE: inuse %d\n", sock_prot_inuse(&udplite_prot));
    seq_printf(seq, "RAW: inuse %d\n", sock_prot_inuse(&raw_prot));
    seq_printf(seq,  "FRAG: inuse %d memory %d\n",
diff -pruN net-2.6-udp-p1/net/ipv4/udp.c net-2.6-udp-p2/net/ipv4/udp.c
--- net-2.6-udp-p1/net/ipv4/udp.c   2007-11-14 10:49:07.0 -0500
+++ net-2.6-udp-p2/net/ipv4/udp.c   2007-11-15 14:44:13.0 -0500
@@ -114,6 +114,8 @@ DEFINE_SNMP_STAT(struct udp_mib, udp_sta
 struct hlist_head udp_hash[UDP_HTABLE_SIZE];
 DEFINE_RWLOCK(udp_hash_lock);

+atomic_t udp_memory_allocated;
+
 static inline int __udp_lib_lport_inuse(__u16 num,
const struct hlist_head udptable[])
 {
-- 
Hitachi Computer Products (America) Inc.

[PATCH 3/5] udp: memory accounting

2007-11-15 Thread Hideo AOKI

This patch adds UDP memory usage accounting in IPv4.

signed-off-by: Satoshi Oshima [EMAIL PROTECTED]
signed-off-by: Hideo Aoki [EMAIL PROTECTED]
---
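
Note (illustration only, not part of the patch): a user-space sketch of
the accounting pattern used here, in which each buffer is charged in
pages when it is queued and a whole queue is uncharged in one step when
it is purged. All sketch_* names are hypothetical, a plain int stands in
for the kernel's atomic_t counter, and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4096-byte pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Pages accounted so far; a plain int stands in for the kernel's atomic_t. */
static int sketch_memory_allocated;

/* Round a buffer's truesize up to whole pages. */
static int sketch_pages(unsigned int truesize)
{
    return (int)((truesize + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}

/* Charge one buffer when it is queued. */
static void sketch_charge(unsigned int truesize)
{
    sketch_memory_allocated += sketch_pages(truesize);
}

/* Uncharge every buffer of a queue with a single subtraction. */
static void sketch_purge(const unsigned int *truesizes, size_t n)
{
    int purged = 0;
    size_t i;

    for (i = 0; i < n; i++)
        purged += sketch_pages(truesizes[i]);

    assert(purged <= sketch_memory_allocated);  /* never uncharge more than was charged */
    sketch_memory_allocated -= purged;
}
```

Charging buffers of 100, 5000 and 4096 bytes accounts 1 + 2 + 1 = 4
pages, and purging the same three buffers brings the counter back to 0.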

 af_inet.c   |   30 +-
 ip_output.c |   25 ++---
 udp.c   |   10 ++
 3 files changed, 61 insertions(+), 4 deletions(-)

diff -pruN net-2.6-udp-p2/net/ipv4/af_inet.c net-2.6-udp-p3/net/ipv4/af_inet.c
--- net-2.6-udp-p2/net/ipv4/af_inet.c   2007-11-14 10:49:06.0 -0500
+++ net-2.6-udp-p3/net/ipv4/af_inet.c   2007-11-15 14:44:18.0 -0500
@@ -126,13 +126,41 @@ extern void ip_mc_drop_socket(struct soc
 static struct list_head inetsw[SOCK_MAX];
 static DEFINE_SPINLOCK(inetsw_lock);

+/**
+ * __skb_queue_purge_and_sub_memory_allocated
+ * - empty a list and subtract memory allocation counter
+ * @sk:   sk
+ * @list: list to empty
+ * Delete all buffers on an sk_buff list and subtract the
+ * truesize of the sk_buff for memory accounting. Each buffer
+ * is removed from the list and one reference dropped. This
+ * function does not take the list lock and the caller must
+ * hold the relevant locks to use it.
+ */
+static inline void __skb_queue_purge_and_sub_memory_allocated(struct sock *sk,
+                   struct sk_buff_head *list)
+{
+   struct sk_buff *skb;
+   int purged_skb_size = 0;
+   while ((skb = __skb_dequeue(list)) != NULL) {
+       purged_skb_size += sk_datagram_pages(skb->truesize);
+       kfree_skb(skb);
+   }
+   atomic_sub(purged_skb_size, sk->sk_prot->memory_allocated);
+}
+
 /* New destruction routine */

 void inet_sock_destruct(struct sock *sk)
 {
struct inet_sock *inet = inet_sk(sk);

-   __skb_queue_purge(&sk->sk_receive_queue);
+   if (sk->sk_prot->memory_allocated && sk->sk_type != SOCK_STREAM)
+       __skb_queue_purge_and_sub_memory_allocated(sk,
+               &sk->sk_receive_queue);
+   else
+       __skb_queue_purge(&sk->sk_receive_queue);
+
    __skb_queue_purge(&sk->sk_error_queue);

    if (sk->sk_type == SOCK_STREAM && sk->sk_state != TCP_CLOSE) {
diff -pruN net-2.6-udp-p2/net/ipv4/ip_output.c net-2.6-udp-p3/net/ipv4/ip_output.c
--- net-2.6-udp-p2/net/ipv4/ip_output.c 2007-11-15 14:44:11.0 -0500
+++ net-2.6-udp-p3/net/ipv4/ip_output.c 2007-11-15 14:44:18.0 -0500
@@ -743,6 +743,8 @@ static inline int ip_ufo_append_data(str
/* specify the length of each IP datagram fragment*/
    skb_shinfo(skb)->gso_size = mtu - fragheaderlen;
    skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
+   atomic_add(sk_datagram_pages(skb->truesize),
+          sk->sk_prot->memory_allocated);
    __skb_queue_tail(&sk->sk_write_queue, skb);

return 0;
@@ -924,6 +926,9 @@ alloc_new_skb:
}
if (skb == NULL)
goto error;
+           if (sk->sk_prot->memory_allocated)
+               atomic_add(sk_datagram_pages(skb->truesize),
+                      sk->sk_prot->memory_allocated);

/*
 *  Fill in the control structures
@@ -1023,6 +1028,8 @@ alloc_new_skb:
                frag = &skb_shinfo(skb)->frags[i];
                skb->truesize += PAGE_SIZE;
                atomic_add(PAGE_SIZE, &sk->sk_wmem_alloc);
+               if (sk->sk_prot->memory_allocated)
+                   atomic_inc(sk->sk_prot->memory_allocated);
} else {
err = -EMSGSIZE;
goto error;
@@ -1123,7 +1130,9 @@ ssize_t   ip_append_page(struct sock *sk,
if (unlikely(!skb)) {
err = -ENOBUFS;
goto error;
-   }
+       } else if (sk->sk_prot->memory_allocated)
+           atomic_add(sk_datagram_pages(skb->truesize),
+                  sk->sk_prot->memory_allocated);

/*
 *  Fill in the control structures
@@ -1213,13 +1222,14 @@ int ip_push_pending_frames(struct sock *
struct iphdr *iph;
__be16 df = 0;
__u8 ttl;
-   int err = 0;
+   int err = 0, send_page_size;

    if ((skb = __skb_dequeue(&sk->sk_write_queue)) == NULL)
        goto out;
    tail_skb = &(skb_shinfo(skb)->frag_list);

    /* move skb->data to ip header from ext header */
+   send_page_size = sk_datagram_pages(skb->truesize);
    if (skb->data < skb_network_header(skb))
        __skb_pull(skb, skb_network_offset(skb));
    while ((tmp_skb = __skb_dequeue(&sk->sk_write_queue)) != NULL) {
@@ -1229,6 +1239,7 @@ int

[PATCH 1/5] udp: fix send buffer check

2007-11-15 Thread Hideo AOKI

This patch introduces a sndbuf size check before memory is allocated for
the send buffer.

signed-off-by: Satoshi Oshima [EMAIL PROTECTED]
signed-off-by: Hideo Aoki [EMAIL PROTECTED]
---
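
Note (illustration only, not part of the patch): a user-space sketch of
the check added below, assuming a new page is refused only when the
write memory already charged plus one page would exceed twice the send
buffer size. sketch_sndbuf_check and SKETCH_PAGE_SIZE are hypothetical
names, and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: 4096-byte pages. */
#define SKETCH_PAGE_SIZE 4096L

/*
 * Hypothetical stand-in for the new check in the page-allocation path.
 * wmem_alloc: bytes of write memory already charged to the socket
 * sndbuf:     the socket's send buffer size in bytes
 */
static int sketch_sndbuf_check(long wmem_alloc, long sndbuf)
{
    assert(wmem_alloc >= 0 && sndbuf >= 0);

    if (wmem_alloc + SKETCH_PAGE_SIZE > 2 * sndbuf)
        return -ENOBUFS;    /* one more page would pass 2 * sndbuf */
    return 0;               /* still room for another page */
}
```

With sndbuf = 8192 the threshold is 16384 bytes: a socket holding 12288
bytes may take one more page, while one holding 12289 bytes gets
-ENOBUFS.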

 ip_output.c |5 +
 1 file changed, 5 insertions(+)

diff -pruN net-2.6/net/ipv4/ip_output.c net-2.6-udp-p1/net/ipv4/ip_output.c
--- net-2.6/net/ipv4/ip_output.c2007-11-14 10:49:06.0 -0500
+++ net-2.6-udp-p1/net/ipv4/ip_output.c 2007-11-15 14:44:11.0 -0500
@@ -1004,6 +1004,11 @@ alloc_new_skb:
                frag = &skb_shinfo(skb)->frags[i];
            }
        } else if (i < MAX_SKB_FRAGS) {
+           if (atomic_read(&sk->sk_wmem_alloc) + PAGE_SIZE
+               > 2 * sk->sk_sndbuf) {
+               err = -ENOBUFS;
+               goto error;
+           }
            if (copy > PAGE_SIZE)
                copy = PAGE_SIZE;
            page = alloc_pages(sk->sk_allocation, 0);
-- 
Hitachi Computer Products (America) Inc.