date:20060918

Kernel panic occurs with ucdsnmp

2006-09-18 Thread Chinmaya Mishra




Hi all,

I am trying to compile the uClinux distribution with the 'ucdsnmp'
package enabled. It compiles successfully, and the image and romfs.img
are created. But when I try to boot with this image, I get the
following error messages:

.
Other stuff added by David S. Miller davem@redhat.com
VFS: Mounted root (romfs filesystem) readonly.
Freeing init memory: 44K
Warning: unable to open an initial console.
Kernel panic: Attempted to kill init!
Panic reset

After this it halts. If I do not enable the 'ucdsnmp' package, it
works fine. What is the reason for this?


WARNING: UNABLE TO OPEN AN INITIAL CONSOLE.
KERNEL PANIC: ATTEMPTED TO KILL INIT!
PANIC RESET

Please tell me where the mistake is.

Thanks in advance.
Chinmaya


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Kernel panic occurs with ucdsnmp

2006-09-18 Thread Chinmaya Mishra



Hi all,

I am trying to compile the uClinux distribution with the 'ucdsnmp'
package enabled.

It compiles successfully, and the image and romfs.img are created.
But when I try to boot with this image, I get the following
error messages:

.
Other stuff added by David S. Miller davem@redhat.com
VFS: Mounted root (romfs filesystem) readonly.
Freeing init memory: 44K
Warning: unable to open an initial console.
Kernel panic: Attempted to kill init!
Panic reset

After this it halts. If I do not enable the 'ucdsnmp' package, it
works fine. What is the reason for this?

WARNING: UNABLE TO OPEN AN INITIAL CONSOLE.
KERNEL PANIC: ATTEMPTED TO KILL INIT!
PANIC RESET

Please tell me where the mistake is.

Thanks in advance.
Chinmaya




Re: 2.6.18-rc6 memory mapped pcap truncates outgoing TCP packets, but not icmp

2006-09-18 Thread David Miller

From: Patrick McHardy [EMAIL PROTECTED]
Date: Fri, 15 Sep 2006 22:16:17 +0200

 bert hubert wrote:
 It appears to be intentionally, but I don't see a reason for it.
 Can you try if this patch makes it work as expected?

 [PACKET]: Don't truncate non-linear skbs with mmaped IO

 Non-linear skbs are truncated to their linear part with mmaped IO.
 Fix by using skb_copy_bits instead of memcpy.

  Works very well for me! I hope this can make it into 2.6.18.

 That would be fine with me, lets see what Dave thinks.

Applied to net-2.6, thanks a lot.
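
For readers who have not followed the bug, the following is a minimal
user-space sketch of why a plain memcpy() from the linear area truncates
a non-linear buffer, while a copy that walks the fragments does not. The
toy_skb structure and the toy_copy_linear_only() and toy_copy_bits()
helpers are hypothetical names invented for this illustration; they only
mimic the idea behind skb_copy_bits() and are not the kernel's data
structures or the applied patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_FRAGS 4

/* Hypothetical stand-in for a paged fragment of a packet. */
struct toy_frag {
	const unsigned char *data;
	size_t len;
};

/* Hypothetical stand-in for an skb: a linear head plus optional fragments. */
struct toy_skb {
	const unsigned char *head;
	size_t headlen;
	struct toy_frag frags[TOY_MAX_FRAGS];
	size_t nr_frags;
};

/* Total packet length: linear part plus all fragments. */
static size_t toy_skb_len(const struct toy_skb *skb)
{
	size_t i, len = skb->headlen;

	for (i = 0; i < skb->nr_frags; i++)
		len += skb->frags[i].len;
	return len;
}

/* The buggy pattern: a memcpy() from the head sees only the linear part. */
static size_t toy_copy_linear_only(const struct toy_skb *skb,
				   unsigned char *to, size_t len)
{
	size_t n = len < skb->headlen ? len : skb->headlen;

	memcpy(to, skb->head, n);
	return n;
}

/*
 * Copy len bytes starting at offset, walking the fragments as needed.
 * Returns 0 on success, -1 if the range lies outside the packet.
 */
static int toy_copy_bits(const struct toy_skb *skb, size_t offset,
			 unsigned char *to, size_t len)
{
	size_t i, n, total = toy_skb_len(skb);

	assert(to != NULL);
	if (offset > total || len > total - offset)
		return -1;

	if (offset < skb->headlen) {
		/* Part (or all) of the range lives in the linear head. */
		n = skb->headlen - offset;
		if (n > len)
			n = len;
		memcpy(to, skb->head + offset, n);
		to += n;
		len -= n;
		offset = 0;
	} else {
		offset -= skb->headlen;
	}

	for (i = 0; i < skb->nr_frags && len > 0; i++) {
		const struct toy_frag *f = &skb->frags[i];

		if (offset >= f->len) {
			/* Range starts after this fragment: skip it. */
			offset -= f->len;
			continue;
		}
		n = f->len - offset;
		if (n > len)
			n = len;
		memcpy(to, f->data + offset, n);
		to += n;
		len -= n;
		offset = 0;
	}
	return 0;
}
```

With a 4-byte head and a 7-byte fragment, toy_copy_linear_only() stops
after 4 bytes, which matches the truncation described in the report,
whereas toy_copy_bits() returns all 11 bytes.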

Re: [PATCH 3/8] address: Convert address lookup to new netlink api

2006-09-18 Thread David Miller

From: Thomas Graf [EMAIL PROTECTED]
Date: Fri, 01 Sep 2006 23:40:00 +0200

 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied.


Re: [PATCH 5/8] address: Add put_ifaddrmsg() and rt_scope()

2006-09-18 Thread David Miller

From: Thomas Graf [EMAIL PROTECTED]
Date: Fri, 01 Sep 2006 23:40:02 +0200

 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied.

Re: [PATCH 8/8] address: Support NLM_F_EXCL when adding addresses

2006-09-18 Thread David Miller

From: Thomas Graf [EMAIL PROTECTED]
Date: Fri, 01 Sep 2006 23:40:05 +0200

 iproute2 doesn't provide the NLM_F_CREATE flag when adding addresses;
 it is assumed to be implied. The existing code checks for said flag
 when the modify operation fails (likely due to ENOENT) before
 continuing to create the address. This leads to a hard-to-predict
 result, so the NLM_F_CREATE check is removed.

 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

I hope this doesn't break any existing stuff, but it is certainly
the logically correct thing to do.  If things break I'm reverting
this though.

But for now, applied, thanks.
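
The flag handling described above can be sketched as a toy user-space
model. Everything here is an assumption made for illustration: the
toy_addr_table type and toy_newaddr() are hypothetical, the two flag
constants are local copies of the NLM_F_EXCL and NLM_F_CREATE values,
and the behaviour for an address that already exists without NLM_F_EXCL
(the sketch just reports success) is a guess, since the patch itself is
not quoted in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the netlink flag values, so the sketch builds anywhere. */
#define TOY_NLM_F_EXCL   0x200
#define TOY_NLM_F_CREATE 0x400

#define TOY_MAX_ADDRS 8

/* Hypothetical per-interface address table. */
struct toy_addr_table {
	uint32_t addr[TOY_MAX_ADDRS];
	size_t count;
};

/* Return the index of addr in the table, or -1 if it is not present. */
static int toy_find(const struct toy_addr_table *t, uint32_t addr)
{
	size_t i;

	for (i = 0; i < t->count; i++)
		if (t->addr[i] == addr)
			return (int)i;
	return -1;
}

/*
 * Add an address. NLM_F_CREATE is treated as implied, so a missing
 * address is always created. NLM_F_EXCL turns "already exists" into
 * an error instead of a silent success.
 */
static int toy_newaddr(struct toy_addr_table *t, uint32_t addr,
		       unsigned int flags)
{
	assert(t != NULL);

	if (toy_find(t, addr) >= 0) {
		if (flags & TOY_NLM_F_EXCL)
			return -EEXIST;
		return 0;	/* assumed: existing entry is left/modified */
	}

	/* Not present: create it whether or not NLM_F_CREATE was passed. */
	if (t->count == TOY_MAX_ADDRS)
		return -ENOBUFS;
	t->addr[t->count++] = addr;
	return 0;
}
```

The point of the model is that the outcome no longer depends on whether
userspace happened to set NLM_F_CREATE, which is the unpredictability
the patch description complains about.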


Re: [PATCH]:[XFRM] BEET mode

2006-09-18 Thread Miika Komu


On Sat, 16 Sep 2006, Diego Beltrami wrote:


The patch which introduces the BEET mode, previously sent to this mailing
list, is also valid for the
http://www.kernel.org/git/?p=linux/kernel/git/davem/net-2.6.19.git;a=summary
branch.
However, there were probably some errors when attaching the patch inline to
the mail, so I am attaching it again. In any case, should there be errors,
the same patch can be found at the following URL and it works just fine:

..

For those who haven't been following this discussion, the patch introduces the
BEET mode (Bound End-to-End Tunnel) as specified by the IETF draft at the
following link:

http://www.ietf.org/internet-drafts/draft-nikander-esp-beet-mode-06.txt

Signed-off-by: Diego Beltrami [EMAIL PROTECTED]
Signed-off-by: Miika Komu [EMAIL PROTECTED]
Signed-off-by: Herbert Xu [EMAIL PROTECTED]
Signed-off-by: Abhinav Pathak [EMAIL PROTECTED]
Signed-off-by: Jeff Ahrenholz [EMAIL PROTECTED]


Is the patch on the web fine? Diego said that the patch applies fine to
Dave's branch, but the problem is the email formatting. The patch on the
web is the same as the one forwarded to the mailing list.


I put the patch into a more permanent location:

http://infrahip.hiit.fi/beet/2.6.18/simple-beet-ph-patch-2.6.18
http://infrahip.hiit.fi/beet/2.6.18/simple-beet-ph-patch-2.6.18.md5sum
5cd131d2f15f04d3dc26e360ce3ae38e  simple-beet-ph-patch-2.6.18

--
Miika Komu   http://www.iki.fi/miika/

Re: [XFRM]: Fix wildcard as tunnel source

2006-09-18 Thread David Miller

From: Patrick McHardy [EMAIL PROTECTED]
Date: Sat, 02 Sep 2006 16:46:44 +0200

 [XFRM]: Fix wildcard as tunnel source

 Hashing SAs by source address breaks templates with wildcards as tunnel
 source. Remove saddr from the hash key.

 Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

Unfortunately, this breaks scalability of the xfrm state layer when the
source varies as much as the destination.  In such setups you
have an enormous number of entries with the destination being the local
system and only the source address changing.

BTW, how can the source be specified as wildcard?  There is no prefix
component, it is simply an xfrm_address_t.  And there are several
macros which check for x-props.saddr equality directly with no
special prefixing or wildcard logic.

I really don't want to remove this as it's fairly critical performance
wise for the scalability problems all my changes were meant to address.
I hope I really don't have to do something like what was needed for
the policy layer, having a linked list and a hash table to handle the
two cases.

Re: [PATCH][RFC] Re: high latency with TCP connections

2006-09-18 Thread David Miller

From: Alexey Kuznetsov [EMAIL PROTECTED]
Date: Mon, 4 Sep 2006 20:00:45 +0400

 Try enclosed patch. I have no idea why 9.997 sec is so magic, but I
 get exactly this number on my notebook. :-)

 =

 This patch enables sending ACKs for every 2nd received segment.
 It does not affect either mss-sized connections (obviously) or connections
 controlled by Nagle (because there is only one small segment in flight).

 The idea is to record the fact that a small segment arrives
 on a connection, where one small segment has already been received
 and still not-ACKed. In this case ACK is forced after tcp_recvmsg()
 drains receive buffer.

 In other words, it is a soft every-2nd-segment ACK, which is enough
 to preserve the ACK clock even when ABC is enabled.

 Signed-off-by: Alexey Kuznetsov [EMAIL PROTECTED]

This looks exactly like the kind of patch I tried to formulate,
very unsuccessfully, last time this topic came up a year or
so ago.

It looks perfectly fine to me, would you like me to apply it
Alexey?
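
The bookkeeping described in the patch summary can be pictured with a
small state machine. This is only a sketch of the idea as stated in the
text, not the patch: struct toy_rcv, toy_segment_in() and
toy_buffer_drained() are hypothetical names, full-sized segments are
simply ignored, and the "drained" hook stands in for the point where
tcp_recvmsg() has emptied the receive buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical receiver state for the sketch. */
struct toy_rcv {
	unsigned int unacked_small;	/* small segments received, not yet ACKed */
	int force_ack;			/* set once a 2nd small segment arrives */
};

/* Called for each arriving segment of len bytes on a connection with mss. */
static void toy_segment_in(struct toy_rcv *r, unsigned int len,
			   unsigned int mss)
{
	assert(r != NULL);

	if (len >= mss)
		return;		/* full-sized segments are not modeled here */

	/* A small segment arrived while another is still un-ACKed. */
	if (r->unacked_small > 0)
		r->force_ack = 1;
	r->unacked_small++;
}

/*
 * Called when the application has drained the receive buffer.
 * Returns 1 if an ACK is forced now, 0 if the normal delayed-ACK
 * machinery (not modeled) remains in charge.
 */
static int toy_buffer_drained(struct toy_rcv *r)
{
	assert(r != NULL);

	if (!r->force_ack)
		return 0;
	r->force_ack = 0;
	r->unacked_small = 0;
	return 1;
}
```

In this model a single small segment never forces an ACK, but the second
one does as soon as the reader has consumed the data, which is the "soft"
every-2nd-segment behaviour the description refers to.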

Re: [PATCH][RFC] Re: high latency with TCP connections

2006-09-18 Thread David Miller

From: Rick Jones [EMAIL PROTECTED]
Date: Tue, 05 Sep 2006 10:55:16 -0700

 Is this really necessary?  I thought that the problems with ABC were in
 trying to apply byte-based heuristics from the RFC(s) to a
 packet-oriented cwnd in the stack?

This is the receiver side, and it helps a sender who does congestion
control based upon packet counting like Linux does.   It really
is less related to ABC than Alexey implies; we've always had
this kind of problem, as I mentioned in previous discussions
of this issue.

Re: [XFRM]: Fix wildcard as tunnel source

2006-09-18 Thread Patrick McHardy

David Miller wrote:
 Unfortunately, this breaks scalability of the xfrm state layer when the
 source varies as much as the destination.  In such setups you
 have an enormous number of entries with the destination being the local
 system and only the source address changing.
 
 BTW, how can the source be specified as wildcard?  There is no prefix
 component, it is simply an xfrm_address_t.  And there are several
 macros which check for x-props.saddr equality directly with no
 special prefixing or wildcard logic.

The tunnel endpoint in the template (either source or destination,
depending on the direction) is set to 0.0.0.0. For outbound SAs,
the address is compared using xfrm_state_addr_check(), which interprets
0.0.0.0 as wildcard. When no matching SA is present, the address
is resolved using routing and filled in the ACQ SA. The keying daemon
will then install SAs with the proper source. For inbound SAs the
tunnel destination from the template is ignored.

 I really don't want to remove this as it's fairly critical performance
 wise for the scalability problems all my changes were meant to address.
 I hope I really don't have to do something like what was needed for
 the policy layer, having a linked list and a hash table to handle the
 two cases.

We could query the address before the SA lookup. It will cost an
additional route lookup in case a matching SA is already present,
but I guess that's still better than removing the source from the
hash. I'll check whether it works and send a new patch.
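
The trade-off discussed here can be illustrated with a toy hash table
keyed on (daddr, saddr). All names below (toy_table, toy_lookup(),
toy_route_saddr() and so on) are hypothetical and the hash is
deliberately trivial; the sketch only shows why a lookup with a 0.0.0.0
source lands in the wrong bucket, and how resolving the source before
the lookup avoids that without dropping saddr from the key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BUCKETS    8
#define TOY_PER_BUCKET 4

/* Hypothetical SA entry: only the two addresses matter for the sketch. */
struct toy_sa {
	uint32_t daddr;
	uint32_t saddr;
};

struct toy_table {
	struct toy_sa sa[TOY_BUCKETS][TOY_PER_BUCKET];
	unsigned int n[TOY_BUCKETS];
};

static void toy_table_init(struct toy_table *t)
{
	memset(t, 0, sizeof(*t));
}

/* Deliberately simple hash over both addresses. */
static unsigned int toy_hash(uint32_t daddr, uint32_t saddr)
{
	return (daddr ^ saddr) % TOY_BUCKETS;
}

static int toy_insert(struct toy_table *t, uint32_t daddr, uint32_t saddr)
{
	unsigned int b = toy_hash(daddr, saddr);

	if (t->n[b] == TOY_PER_BUCKET)
		return -1;
	t->sa[b][t->n[b]].daddr = daddr;
	t->sa[b][t->n[b]].saddr = saddr;
	t->n[b]++;
	return 0;
}

/*
 * Search only the bucket selected by the key. Inside the bucket a zero
 * saddr is treated as a wildcard, but that does not help if the key
 * hashed to a different bucket than the stored entry.
 */
static const struct toy_sa *toy_lookup(const struct toy_table *t,
				       uint32_t daddr, uint32_t saddr)
{
	unsigned int i, b = toy_hash(daddr, saddr);

	for (i = 0; i < t->n[b]; i++) {
		const struct toy_sa *sa = &t->sa[b][i];

		if (sa->daddr == daddr && (saddr == 0 || sa->saddr == saddr))
			return sa;
	}
	return NULL;
}

/* Stand-in for a route lookup; the fixed answer is purely illustrative. */
static uint32_t toy_route_saddr(uint32_t daddr)
{
	(void)daddr;
	return 0x0A000002u;	/* 10.0.0.2 */
}

/* Resolve a wildcard source first, then do the normal hashed lookup. */
static const struct toy_sa *toy_lookup_resolved(const struct toy_table *t,
						uint32_t daddr, uint32_t saddr)
{
	if (saddr == 0)
		saddr = toy_route_saddr(daddr);
	return toy_lookup(t, daddr, saddr);
}
```

The cost is visible in toy_lookup_resolved(): the route lookup happens on
every wildcard lookup, even when a matching SA already exists.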


Re: tcp congestion policy selection link order fragile

2006-09-18 Thread David Miller

From: bert hubert [EMAIL PROTECTED]
Date: Sun, 17 Sep 2006 14:21:53 +0200

 Operators, distributors and even people who've been doing kernel stuff for
 more than a decade expect to be able to compile in (experimental) policies,
 and not have a *random* one of them enabled by default!

We created TCP_CONG_ADVANCED for a purpose.  If you turn that
thing on, you get full control but if something breaks you get
to keep the pieces.

Quite frankly, just about no one should enable TCP_CONG_ADVANCED
at all.  And quite likely this applies even to distribution vendors.

Re: [PATCH 1/4] IP100A: Fix TX Pause bug (reset_tx, intr_handler)

2006-09-18 Thread Philippe De Muyter

On Mon, Sep 18, 2006 at 11:41:09AM +0800, Jesse Huang wrote:
 Dear Philippe:
 
 (1) We are not allow to support register TxStartThresh and, RxEarlyThresh,
 so
 we remove it.

Could you elaborate?
- What do you mean by `We are not allow'?
- Is it specific to the IP100A chip?

Those registers are documented in the Sundance Technology ST201 Data Sheet
and when modified with fine-tuned values, they can have a real positive
effect on the overall throughput on a loaded system.

 
 (2) Your consideration is right. But reset_tx is workaround for customer's
 embedded system, I don't have this
 enviroment now. I can't sure it will work fine if I removed this.

On DFE-580TX boards, the reset_tx way did not work.  The ports remained
blocked until a power-cycle.  I do not know if the TxUnderrun problem ever
happened with earlier (one port) boards, so I doubt that the reset_tx way
ever worked.  It was even commented as not being tested.  On DFE-580TX
boards, the current way has been verified by me and others to work, so
please do not break it.

Best regards

Philippe

 
 Thanks you very mutch.
 
 Best Regards,
 Jesse Huang.
 
 - Original Message - 
 From: Philippe De Muyter [EMAIL PROTECTED]
 To: Jesse Huang [EMAIL PROTECTED]
 Cc: netdev@vger.kernel.org
 Sent: Friday, September 15, 2006 7:44 PM
 Subject: Re: [PATCH 1/4] IP100A: Fix TX Pause bug (reset_tx, intr_handler)
 
 
 On Thu, Sep 14, 2006 at 12:58:30AM +, Jesse Huang wrote:
 [...]
  @@ -262,8 +262,6 @@ enum alta_offsets {
   ASICCtrl = 0x30,
   EEData = 0x34,
   EECtrl = 0x36,
  - TxStartThresh = 0x3c,
  - RxEarlyThresh = 0x3e,
 
 Why ?
 
   FlashAddr = 0x40,
   FlashData = 0x44,
   TxStatus = 0x46,
 [...]
  @@ -1156,29 +1160,29 @@ static irqreturn_t intr_handler(int irq,
   np->stats.tx_fifo_errors++;
   if (tx_status & 0x02)
   np->stats.tx_window_errors++;
  - /*
  - ** This reset has been verified on
  - ** DFE-580TX boards ! [EMAIL PROTECTED]
  - */
  - if (tx_status & 0x10) { /* TxUnderrun */
  - unsigned short txthreshold;
  -
  - txthreshold = ioread16 (ioaddr + TxStartThresh);
  - /* Restart Tx FIFO and transmitter */
  - sundance_reset(dev, (NetworkReset|FIFOReset|TxReset) << 16);
  - iowrite16 (txthreshold, ioaddr + TxStartThresh);
  - /* No need to reset the Tx pointer here */
  +
  + /* FIFO ERROR need to be reset tx */
  + if (tx_status & 0x10) { /* Reset the Tx. */
  + spin_lock(&np->lock);
  + reset_tx(dev);
  + spin_unlock(&np->lock);
  + }
 
 Just as the comments say, on DFE-580TX 4 port boards, where it is easy to
 reproduce TxUnderrun problems, just resetting on the chip the Tx FIFO and
 transmitter is enough.
 There is no need to call reset_tx, which discards all pending messages and
 frees all the skb's.  It is also not necessary to reload the Tx pointer.
 
 Is it different with newer versions of the chip ?
 
 Philippe
 


Re: [XFRM]: Fix wildcard as tunnel source

2006-09-18 Thread Patrick McHardy

Patrick McHardy wrote:
 David Miller wrote:
 
I really don't want to remove this as it's fairly critical performance
wise for the scalability problems all my changes were meant to address.
I hope I really don't have to do something like what was needed for
the policy layer, having a linked list and a hash table to handle the
two cases.
 
 
 We could query the address before the SA lookup. It will cost an
 additional route lookup in case a matching SA is already present,
 but I guess that's still better than removing the source from the
 hash. I'll check whether it works and send a new patch.

I've tested this patch and it works fine. I'm wondering whether something
else might be affected by the hash change, though: xfrm_state_addr_check
treated 0.0.0.0 as a wildcard even before the introduction of wildcards
in tunnel templates, but I can't see in which other case it would be
zero.

[XFRM]: Fix wildcard as tunnel source

Hashing SAs by source address breaks templates with wildcards as tunnel
source since the source address used for hashing/lookup is still 0/0.
Move source address lookup to xfrm_tmpl_resolve_one() so we can use the
real address in the lookup.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit f3307c3183e50959247f28c773590b5d7902097f
tree 78ddff768dc25145110767f182408ed6993828c5
parent c2cb1937e1054380c49699188810b9c6e04c8e21
author Patrick McHardy [EMAIL PROTECTED] Mon, 18 Sep 2006 11:34:25 +0200
committer Patrick McHardy [EMAIL PROTECTED] Mon, 18 Sep 2006 11:34:25 +0200

 include/net/xfrm.h  |   13 +
 net/ipv4/xfrm4_policy.c |   20 
 net/ipv4/xfrm4_state.c  |   15 ---
 net/ipv6/xfrm6_policy.c |   21 +
 net/ipv6/xfrm6_state.c  |   16 
 net/xfrm/xfrm_policy.c  |   21 +
 6 files changed, 75 insertions(+), 31 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index bf8e2df..c6fac69 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -223,6 +223,7 @@ struct xfrm_policy_afinfo {
struct dst_ops  *dst_ops;
void(*garbage_collect)(void);
int (*dst_lookup)(struct xfrm_dst **dst, struct 
flowi *fl);
+   int (*get_saddr)(xfrm_address_t *saddr, 
xfrm_address_t *daddr);
struct dst_entry*(*find_bundle)(struct flowi *fl, struct 
xfrm_policy *policy);
int (*bundle_create)(struct xfrm_policy *policy, 
 struct xfrm_state **xfrm, 
@@ -632,6 +633,18 @@ #endif
 }
 
 static inline int
+xfrm_addr_any(xfrm_address_t *addr, unsigned short family)
+{
+	switch (family) {
+	case AF_INET:
+		return addr->a4 == 0;
+	case AF_INET6:
+		return ipv6_addr_any((struct in6_addr*)addr->a6);
+	}
+	return 0;
+}
+
+static inline int
 __xfrm4_state_addr_cmp(struct xfrm_tmpl *tmpl, struct xfrm_state *x)
 {
 	return	(tmpl->saddr.a4 &&
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 4795985..eabcd27 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -21,6 +21,25 @@ static int xfrm4_dst_lookup(struct xfrm_
return __ip_route_output_key((struct rtable**)dst, fl);
 }
 
+static int xfrm4_get_saddr(xfrm_address_t *saddr, xfrm_address_t *daddr)
+{
+	struct rtable *rt;
+	struct flowi fl_tunnel = {
+		.nl_u = {
+			.ip4_u = {
+				.daddr = daddr->a4,
+			},
+		},
+	};
+
+	if (!xfrm4_dst_lookup((struct xfrm_dst **)&rt, &fl_tunnel)) {
+		saddr->a4 = rt->rt_src;
+		dst_release(&rt->u.dst);
+		return 0;
+	}
+	return -EHOSTUNREACH;
+}
+
 static struct dst_entry *
 __xfrm4_find_bundle(struct flowi *fl, struct xfrm_policy *policy)
 {
@@ -298,6 +317,7 @@ static struct xfrm_policy_afinfo xfrm4_p
 	.family =		AF_INET,
 	.dst_ops =		&xfrm4_dst_ops,
 	.dst_lookup =		xfrm4_dst_lookup,
+	.get_saddr =		xfrm4_get_saddr,
 	.find_bundle =		__xfrm4_find_bundle,
 	.bundle_create =	__xfrm4_bundle_create,
 	.decode_session =	_decode_session4,
diff --git a/net/ipv4/xfrm4_state.c b/net/ipv4/xfrm4_state.c
index 6a2a4ab..fe20344 100644
--- a/net/ipv4/xfrm4_state.c
+++ b/net/ipv4/xfrm4_state.c
@@ -42,21 +42,6 @@ __xfrm4_init_tempsel(struct xfrm_state *
 	x->props.saddr = tmpl->saddr;
 	if (x->props.saddr.a4 == 0)
 		x->props.saddr.a4 = saddr->a4;
-	if (tmpl->mode == XFRM_MODE_TUNNEL && x->props.saddr.a4 == 0) {
-		struct rtable *rt;
-		struct flowi fl_tunnel = {
-			.nl_u = {
-				.ip4_u = {
-					.daddr = x->id.daddr.a4,
-				}
-

Re: [XFRM]: Fix wildcard as tunnel source

2006-09-18 Thread Patrick McHardy

Patrick McHardy wrote:
 [XFRM]: Fix wildcard as tunnel source
 
 Hashing SAs by source address breaks templates with wildcards as tunnel
 source since the source address used for hashing/lookup is still 0/0.
 Move source address lookup to xfrm_tmpl_resolve_one() so we can use the
 real address in the lookup.
 
  
  static inline int
 +xfrm_addr_any(xfrm_address_t *addr, unsigned short family)
 +{
 + switch (family) {
 + case AF_INET:
 + return addr-a4 == 0;
 + case AF_INET6:
 + return ipv6_addr_any((struct in6_addr*)addr-a6);

D'oh. Fixed patch attached.

[XFRM]: Fix wildcard as tunnel source

Hashing SAs by source address breaks templates with wildcards as tunnel
source since the source address used for hashing/lookup is still 0/0.
Move source address lookup to xfrm_tmpl_resolve_one() so we can use the
real address in the lookup.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit f3307c3183e50959247f28c773590b5d7902097f
tree 78ddff768dc25145110767f182408ed6993828c5
parent c2cb1937e1054380c49699188810b9c6e04c8e21
author Patrick McHardy [EMAIL PROTECTED] Mon, 18 Sep 2006 11:34:25 +0200
committer Patrick McHardy [EMAIL PROTECTED] Mon, 18 Sep 2006 11:34:25 +0200

 include/net/xfrm.h  |   13 +
 net/ipv4/xfrm4_policy.c |   20 
 net/ipv4/xfrm4_state.c  |   15 ---
 net/ipv6/xfrm6_policy.c |   21 +
 net/ipv6/xfrm6_state.c  |   16 
 net/xfrm/xfrm_policy.c  |   21 +
 6 files changed, 75 insertions(+), 31 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index bf8e2df..c6fac69 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -223,6 +223,7 @@ struct xfrm_policy_afinfo {
struct dst_ops  *dst_ops;
void(*garbage_collect)(void);
int (*dst_lookup)(struct xfrm_dst **dst, struct 
flowi *fl);
+   int (*get_saddr)(xfrm_address_t *saddr, 
xfrm_address_t *daddr);
struct dst_entry*(*find_bundle)(struct flowi *fl, struct 
xfrm_policy *policy);
int (*bundle_create)(struct xfrm_policy *policy, 
 struct xfrm_state **xfrm, 
@@ -632,6 +633,18 @@ #endif
 }
 
 static inline int
+xfrm_addr_any(xfrm_address_t *addr, unsigned short family)
+{
+	switch (family) {
+	case AF_INET:
+		return addr->a4 == 0;
+	case AF_INET6:
+		return ipv6_addr_any((struct in6_addr *)addr->a6);
+	}
+	return 0;
+}
+
+static inline int
 __xfrm4_state_addr_cmp(struct xfrm_tmpl *tmpl, struct xfrm_state *x)
 {
 	return	(tmpl->saddr.a4 &&
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 4795985..eabcd27 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -21,6 +21,25 @@ static int xfrm4_dst_lookup(struct xfrm_
return __ip_route_output_key((struct rtable**)dst, fl);
 }
 
+static int xfrm4_get_saddr(xfrm_address_t *saddr, xfrm_address_t *daddr)
+{
+	struct rtable *rt;
+	struct flowi fl_tunnel = {
+		.nl_u = {
+			.ip4_u = {
+				.daddr = daddr->a4,
+			},
+		},
+	};
+
+	if (!xfrm4_dst_lookup((struct xfrm_dst **)&rt, &fl_tunnel)) {
+		saddr->a4 = rt->rt_src;
+		dst_release(&rt->u.dst);
+		return 0;
+	}
+	return -EHOSTUNREACH;
+}
+
 static struct dst_entry *
 __xfrm4_find_bundle(struct flowi *fl, struct xfrm_policy *policy)
 {
@@ -298,6 +317,7 @@ static struct xfrm_policy_afinfo xfrm4_p
 	.family =		AF_INET,
 	.dst_ops =		&xfrm4_dst_ops,
 	.dst_lookup =		xfrm4_dst_lookup,
+	.get_saddr =		xfrm4_get_saddr,
 	.find_bundle =		__xfrm4_find_bundle,
 	.bundle_create =	__xfrm4_bundle_create,
 	.decode_session =	_decode_session4,
diff --git a/net/ipv4/xfrm4_state.c b/net/ipv4/xfrm4_state.c
index 6a2a4ab..fe20344 100644
--- a/net/ipv4/xfrm4_state.c
+++ b/net/ipv4/xfrm4_state.c
@@ -42,21 +42,6 @@ __xfrm4_init_tempsel(struct xfrm_state *
 	x->props.saddr = tmpl->saddr;
 	if (x->props.saddr.a4 == 0)
 		x->props.saddr.a4 = saddr->a4;
-	if (tmpl->mode == XFRM_MODE_TUNNEL && x->props.saddr.a4 == 0) {
-		struct rtable *rt;
-		struct flowi fl_tunnel = {
-			.nl_u = {
-				.ip4_u = {
-					.daddr = x->id.daddr.a4,
-				}
-			}
-		};
-		if (!xfrm_dst_lookup((struct xfrm_dst **)&rt,
-				     &fl_tunnel, AF_INET)) {
-			x->props.saddr.a4 = rt->rt_src;
-			dst_release(&rt->u.dst);
-		}
-	}

Re: tcp congestion policy selection link order fragile

2006-09-18 Thread bert hubert

On Mon, Sep 18, 2006 at 01:51:30AM -0700, David Miller wrote:
 We created TCP_CONG_ADVANCED for a purpose.  If you turn that
 thing on, you get full control but if something breaks you get
 to keep the pieces.

But we should not try to break stuff on purpose, no matter how advanced. It
makes zero sense. To reiterate, when compiling in multiple TCP policies, a
*random* one gets enabled. This is not something we want to offer even
advanced users. It is a kernel, not an adventure course.

Please consider this near-oneliner patch which makes stuff behave more like
people expect: loading a module, or compiling in a congestion avoidance
policy only makes it available, but does not turn it on by default. 

It also cleans up two notices a bit.

I've tested this patch and it does the job for me, reno is now the default,
even when more advanced options are compiled in, but the rest is still
available.

When in doubt, consider that I discovered this because my kernel was
crashing, and that this is bound to generate heaps of annoying email
otherwise. 

Thanks.

Signed-off-by: bert hubert [EMAIL PROTECTED]

--- linux-2.6.18-rc7/net/ipv4/tcp_cong.c.org	2006-09-18 11:42:25.0 +0200
+++ linux-2.6.18-rc7/net/ipv4/tcp_cong.c	2006-09-18 11:43:45.0 +0200
@@ -45,11 +45,11 @@
 
 	spin_lock(&tcp_cong_list_lock);
 	if (tcp_ca_find(ca->name)) {
-		printk(KERN_NOTICE "TCP %s already registered\n", ca->name);
+		printk(KERN_NOTICE "TCP congestion control '%s' already registered\n", ca->name);
 		ret = -EEXIST;
 	} else {
-		list_add_rcu(&ca->list, &tcp_cong_list);
-		printk(KERN_INFO "TCP %s registered\n", ca->name);
+		list_add_tail_rcu(&ca->list, &tcp_cong_list);
+		printk(KERN_INFO "TCP congestion control '%s' registered\n", ca->name);
 	}
 	spin_unlock(&tcp_cong_list_lock);
 
-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Vladimir B. Savkin

On Mon, Sep 18, 2006 at 10:35:38AM +0200, Andi Kleen wrote:
  I just found out that TSC clocksource is not implemented on x86-64.
  Kernel version 2.6.18-rc7, is it true?
 
 The x86-64 timer subsystems currently doesn't have clocksources
 at all, but it supports TSC and some other timers.

Hm. On my box, TSC did not work until I hacked arch/i386/kernel/tsc.c
into it.
Neither clock=tsc nor clocksource=tsc had any effect.

  I've also had experience of unsynchronized TSC on a dual-core Athlon,
  but it was cured by idle=poll.
 
 You can use that, but it will make your system run quite hot 
 and cost you a lot of powe^wmoney.

Here in Russia electric power is cheap compared with a hardware upgrade.

  It seems that dhcpd3 makes the box timestamp incoming packets,
  killing the performance. I think that combining a router and a DHCP
  server on the same box is a legitimate situation, isn't it?
 
 Yes.  Good point. DHCP is broken and needs to be fixed. Can you
 send a bug report to the DHCP maintainers? 
 
 iirc the problem used to be that RAW sockets didn't do something
 they need them to do. Maybe we can fix that now.

I will try in a few days.

Oh, and pppoe-server uses some kind of packet socket too, doesn't it?

 
 If that's not possible we can probably add a ioctl or similar
 to disable time stamping for packet sockets (DHCP shouldn't really
 need a fine grained time stamp). dhcpcd would need to use that then.

I would like some sysctl very much, too. Let tcpdump show imprecise
timestamps when forwarding performance is more important.
After all, Ciscos don't have any tcpdump analog at all, and they are 
very popular :)

 
 Keep me updated what they say.
 
 -Andi
 
~
:wq
With best regards, 
   Vladimir Savkin. 


Re: kernel panic on T60 by e1000 driver

2006-09-18 Thread Alexey Dobriyan


Only the authors of the proprietary modules you loaded can debug this.
Please redirect this and all further oopses to them.

On 9/18/06, Joe Jin [EMAIL PROTECTED] wrote:

While trying to transmit 8k of data with send() on my T60 laptop, a
kernel panic occurred:



Modules linked in: rds cisco_ipsec parport_pc lp parport autofs4
pcmcia opw3945 ieee80211 ie80211_crypt ipt_REJECT xt_tcpudp x_tables
vfat fat dm_mirror dm_mod ibm-acpi button battery ac yenta_socket
rsrc_nonstatic pcmcia_core uhci_hcd ehci_hcd i2c_i801 i2c_core e1000
ext3 jbd ahci libata sd_mod scsi_mod
CPU:0
EIP:0060:[f8e02261]   Tainted:PF  VLI


Re: [BNEP] Fix compat BNEPGETCONNLIST ioctl.

2006-09-18 Thread Marcel Holtmann

Hi David,

 We were making no attempt to deal with the fact that a structure with a
 uint32_t followed by a pointer is going to be _different_ for 32-bit and
 64-bit userspace. Any 32-bit process trying to use BNEPGETCONNLIST will
 be failing with -EFAULT if it's lucky; suffering from having the
 connection list dumped at a random address if it's not.

It seems that HIDP and CMTP will have the same problem.

Regards

Marcel
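
The layout mismatch behind this bug is easy to reproduce in user space.
The sketch below assumes a request structure made of a 32-bit count
followed by a pointer, as the quoted description says; the struct and
function names are hypothetical, and the "compat" variant models how a
32-bit process would lay out the same request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Native layout: the pointer is 4 or 8 bytes depending on the ABI. */
struct toy_connlist_req {
	uint32_t cnum;
	void *ci;
};

/* What a 32-bit process passes: the pointer is always a 32-bit value. */
struct toy_compat_connlist_req {
	uint32_t cnum;
	uint32_t ci;
};

/* Returns 1 only if both layouts agree in size and field offset. */
static int toy_layouts_match(void)
{
	return sizeof(struct toy_connlist_req) ==
		       sizeof(struct toy_compat_connlist_req) &&
	       offsetof(struct toy_connlist_req, ci) ==
		       offsetof(struct toy_compat_connlist_req, ci);
}

/* Translate the 32-bit layout into the native one, field by field. */
static struct toy_connlist_req
toy_from_compat(const struct toy_compat_connlist_req *c)
{
	struct toy_connlist_req r;

	assert(c != NULL);
	r.cnum = c->cnum;
	r.ci = (void *)(uintptr_t)c->ci;	/* widen the user pointer */
	return r;
}
```

On a typical 64-bit build toy_layouts_match() returns 0, because the
pointer is wider and aligned to 8 bytes. Reading a 32-bit request
through the native structure would therefore pick up the pointer from
the wrong bytes, which is why an explicit translation step along the
lines of toy_from_compat() is needed.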



Re: [PATCH][RFC] Re: high latency with TCP connections

2006-09-18 Thread Alexey Kuznetsov

Hello!

 It looks perfectly fine to me, would you like me to apply it
 Alexey?

Yes, I think it is safe.


Theoretically, there is one case where it may not be so good.
A well-behaved nagling TCP connection, which makes lots of small write()s,
will send MSS-sized frames due to delayed ACKs. But if we ACK
every other segment, more segments will come out incomplete,
which could result in some decrease of throughput.

But the trap for this case was set 6 years ago. For unidirectional sessions
ACKs were sent not just for every second segment, but for every small
segment. :-) This did not show any problems for those 6 years. I guess it
means that the problem does not exist.

Alexey

Re: [PATCH 1/4] IP100A: Fix TX Pause bug (reset_tx, intr_handler)

2006-09-18 Thread Jesse Huang

Dear Philippe:
(1) Because of a patent issue, we are no longer allowed to use these
registers, even though they are in the Data Sheet.

(2) OK, sorry about this, I will add it back.

Should I resend those 4 patches, or send this as a new patch?

Thanks very much!

Best Regards,
Jesse Huang.

- Original Message - 
From: Philippe De Muyter [EMAIL PROTECTED]
To: Jesse Huang [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
Sent: Monday, September 18, 2006 5:41 PM
Subject: Re: [PATCH 1/4] IP100A: Fix TX Pause bug (reset_tx, intr_handler)


On Mon, Sep 18, 2006 at 11:41:09AM +0800, Jesse Huang wrote:
 Dear Philippe:

 (1) We are not allow to support register TxStartThresh and, RxEarlyThresh,
 so
 we remove it.

Could you develop ?
- What do you mean by `We are not allow'
- Is it specific to the IP100A chip ?

Those register are documented in the Sundance Technology ST201 Data Sheet
and when modified with fine-tuned values, they can have a real positive
effect on the overall throughput on a loaded system.


 (2) Your consideration is right. But reset_tx is a workaround for a customer's
 embedded system, and I don't have this
 environment now. I can't be sure it will work fine if I remove this.

On DFE-580TX boards, the reset_tx way did not work.  The ports remained
blocked until a power-cycle.  I do not know if the TxUnderrun problem ever
happened with earlier (one port) boards, so I doubt that the reset_tx way
ever worked.  It was even commented as not being tested.  On DFE-580TX
boards, the current way has been verified by me and others to work, so
please do not break it.

Best regards

Philippe


 Thank you very much.

 Best Regards,
 Jesse Huang.

 - Original Message - 
 From: Philippe De Muyter [EMAIL PROTECTED]
 To: Jesse Huang [EMAIL PROTECTED]
 Cc: netdev@vger.kernel.org
 Sent: Friday, September 15, 2006 7:44 PM
 Subject: Re: [PATCH 1/4] IP100A: Fix TX Pause bug (reset_tx, intr_handler)


 On Thu, Sep 14, 2006 at 12:58:30AM +, Jesse Huang wrote:
 [...]
  @@ -262,8 +262,6 @@ enum alta_offsets {
   ASICCtrl = 0x30,
   EEData = 0x34,
   EECtrl = 0x36,
  - TxStartThresh = 0x3c,
  - RxEarlyThresh = 0x3e,

 Why ?

   FlashAddr = 0x40,
   FlashData = 0x44,
   TxStatus = 0x46,
 [...]
  @@ -1156,29 +1160,29 @@ static irqreturn_t intr_handler(int irq,
   np->stats.tx_fifo_errors++;
   if (tx_status & 0x02)
   np->stats.tx_window_errors++;
  - /*
  - ** This reset has been verified on
  - ** DFE-580TX boards ! [EMAIL PROTECTED]
  - */
  - if (tx_status & 0x10) { /* TxUnderrun */
  - unsigned short txthreshold;
  -
  - txthreshold = ioread16 (ioaddr + TxStartThresh);
  - /* Restart Tx FIFO and transmitter */
  - sundance_reset(dev, (NetworkReset|FIFOReset|TxReset) << 16);
  - iowrite16 (txthreshold, ioaddr + TxStartThresh);
  - /* No need to reset the Tx pointer here */
  +
  + /* FIFO ERROR need to be reset tx */
  + if (tx_status & 0x10) { /* Reset the Tx. */
  + spin_lock(&np->lock);
  + reset_tx(dev);
  + spin_unlock(&np->lock);
  + }

 Just as the comments say, on DFE-580TX 4 port boards, where it is easy to
 reproduce TxUnderrun problems, just resetting on the chip the Tx FIFO and
 transmitter is enough.
 There is no need to call reset_tx, which discards all pending messages and
 frees all the skb's.  It is also not necessary to reload the Tx pointer.

 Is it different with newer versions of the chip ?

 Philippe


--
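
The lighter recovery path argued for in the quoted review can be sketched in plain C. This is only a model, not driver code: `struct fake_nic`, `fake_tx_reset()` and the assumption that a Tx reset clears the threshold register are hypothetical; only the TxStartThresh offset (0x3c) and the TxUnderrun status bit (0x10) come from the quoted diff. The tuned threshold is saved and restored around the reset, and frames already queued are left untouched, which is what distinguishes this path from reset_tx.

```c
#include <stdint.h>

#define TX_START_THRESH 0x3c  /* register offset, from the quoted diff */
#define TX_UNDERRUN     0x10  /* tx_status bit, from the quoted diff */

/* Hypothetical stand-in for the chip: a small register file plus a count
 * of frames that software has queued but the chip has not yet sent. */
struct fake_nic {
    uint16_t regs[0x40];
    int pending_frames;
};

/* Assumed hardware behaviour: resetting the Tx FIFO and transmitter puts
 * TxStartThresh back to its power-on value (0 in this model). */
void fake_tx_reset(struct fake_nic *nic)
{
    nic->regs[TX_START_THRESH / 2] = 0;
}

/* Recover from TxUnderrun without dropping queued frames.
 * Returns 1 if a recovery was performed, 0 otherwise. */
int recover_tx_underrun(struct fake_nic *nic, uint8_t tx_status)
{
    uint16_t thresh;

    if (!(tx_status & TX_UNDERRUN))
        return 0;

    thresh = nic->regs[TX_START_THRESH / 2];  /* keep the tuned value */
    fake_tx_reset(nic);
    nic->regs[TX_START_THRESH / 2] = thresh;  /* put it back */
    /* pending_frames is deliberately left alone */
    return 1;
}
```

Whether newer chip revisions need the heavier reset is exactly the open question in the thread; the sketch takes no position on that.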



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen

Vladimir B. Savkin [EMAIL PROTECTED] writes:

[you seem to send your emails in a strange way that doesn't keep me in cc.
Please stop doing that.]

 On Mon, Sep 18, 2006 at 11:58:21AM +0200, Andi Kleen wrote:
The x86-64 timer subsystem currently doesn't have clocksources
at all, but it supports TSC and some other timers.
   
  
   until I hacked arch/i386/kernel/tsc.c
  
  Then you don't use x86-64. 
  
 Oh. I mean I made arch/i386/kernel/tsc.c compile on x86-64
 by hacking some Makefiles and headers. 

The codebase for timing (and lots of other things) is quite different
between 32bit and 64bit. You're really surprised it doesn't work if you do such 
things?

 But the question is, why stock 2.6.18-rc7 could not use TSC on its own?

x86-64 doesn't use the TSC when it deems it to not be reliable, which
is the case on your system.
 
 I've also had experience of unsynchronized TSC on dual-core Athlon,
 but it was cured by idle=poll.

You can use that, but it will make your system run quite hot 
and cost you a lot of powe^wmoney.
   
   Here in Russia electric power is cheap compared with hardware upgrade.
  
  It's not just electrical power - the hardware is more stressed and will
  likely fail earlier too.  As a rule of thumb the hotter your hardware runs
  the earlier it will fail.
 
 What hardware exactly? Doesn't it affect only the CPU? And CPUs are not
 known to fail before any other components.

All hardware. It's basic physics.

-Andi

Re: [PATCH 1/4] IP100A: Fix TX Pause bug (reset_tx, intr_handler)

2006-09-18 Thread Philippe De Muyter

Hi Jesse,

On Mon, Sep 18, 2006 at 07:11:29PM +0800, Jesse Huang wrote:
 Dear Philippe:
 (1) Because this is a patent issue, we are not allowed to use it again, even though it
 is in the Data Sheet.

I surmise this is only a concern for icplus as a hardware company.

The sundance driver in Linux is meant to work also with the previous versions
of the chip (Sundance, Kendin, D-Link).  If you wish you can make it clear
that those registers have disappeared or have no effect in the icplus 100A
version.

 
 (2)Ok, sorry for this, I will add it back.

Thanks

 
 Should I resend those 4 patches? Or generate this as a new patch?

I do not know about the other patches, but for this one of course you should

Philippe

 

Re: [BNEP] Fix compat BNEPGETCONNLIST ioctl.

2006-09-18 Thread David Woodhouse

On Mon, 2006-09-18 at 12:38 +0200, Marcel Holtmann wrote:
 Hi David,
 
  We were making no attempt to deal with the fact that a structure with a
  uint32_t followed by a pointer is going to be _different_ for 32-bit and
  64-bit userspace. Any 32-bit process trying to use BNEPGETCONNLIST will
  be failing with -EFAULT if it's lucky; suffering from having the
  connection list dumped at a random address if it's not.
 
 it seems that HIDP and CMTP will have the same problem.

Indeed they do. This patch fixes 'hidd -l'... although HIDP mouse
movement doesn't seem to be appearing in /dev/input/mice on my G5, while
the 'hcidump' output looks sane enough while I move it.

-
[HIDP] Fix compat HIDPGETCONNLIST ioctl.

Signed-off-by: David Woodhouse [EMAIL PROTECTED]

diff --git a/net/bluetooth/hidp/sock.c b/net/bluetooth/hidp/sock.c
index 099646e..af5a21c 100644
--- a/net/bluetooth/hidp/sock.c
+++ b/net/bluetooth/hidp/sock.c
@@ -35,6 +35,7 @@ #include <linux/socket.h>
 #include <linux/ioctl.h>
 #include <linux/file.h>
 #include <linux/init.h>
+#include <linux/compat.h>
 #include <net/sock.h>
 
 #include "hidp.h"
@@ -143,11 +144,42 @@ static int hidp_sock_ioctl(struct socket
return -EINVAL;
 }
 
+#ifdef CONFIG_COMPAT
+static int hidp_sock_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
+{
+   if (cmd == HIDPGETCONNLIST) {
+   struct hidp_connlist_req cl;
+   uint32_t uci;
+   int err;
+
+   if (get_user(cl.cnum, (uint32_t __user *)arg) ||
+   get_user(uci, (u32 __user *)(arg+4)))
+   return -EFAULT;
+
+   cl.ci = compat_ptr(uci);
+
+   if (cl.cnum <= 0)
+   return -EINVAL;
+
+   err = hidp_get_connlist(&cl);
+
+   if (!err && put_user(cl.cnum, (uint32_t __user *)arg))
+   err = -EFAULT;
+
+   return err;
+   }
+   return hidp_sock_ioctl(sock, cmd, arg);
+}
+#endif
+
 static const struct proto_ops hidp_sock_ops = {
.family = PF_BLUETOOTH,
.owner  = THIS_MODULE,
.release= hidp_sock_release,
.ioctl  = hidp_sock_ioctl,
+#ifdef CONFIG_COMPAT
+   .compat_ioctl   = hidp_sock_compat_ioctl,
+#endif
.bind   = sock_no_bind,
.getname= sock_no_getname,
.sendmsg= sock_no_sendmsg,

-- 
dwmw2
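
The layout mismatch behind this fix can be illustrated in userspace. A minimal sketch, assuming an LP64 kernel and a 32-bit caller; the struct and function names are made up for illustration and are not the kernel's definitions. A `memcpy` from a raw buffer stands in for get_user(), and the cast through `uintptr_t` stands in for compat_ptr().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Native layout: a 32-bit count followed by a pointer. On LP64 the
 * pointer is 8 bytes wide and therefore sits at offset 8, not 4. */
struct connlist_req {
    uint32_t  cnum;
    void     *ci;
};

/* What a 32-bit process actually passes: the pointer is a 32-bit value
 * immediately after the count, at offset 4. */
struct compat_connlist_req {
    uint32_t cnum;
    uint32_t ci;
};

/* Read the 32-bit layout from a raw buffer and widen the pointer. */
void connlist_from_compat(const unsigned char *buf, struct connlist_req *out)
{
    struct compat_connlist_req c;

    memcpy(&c, buf, sizeof(c));
    out->cnum = c.cnum;
    out->ci = (void *)(uintptr_t)c.ci;
}
```

Interpreting the same eight bytes with the native struct would read the pointer from the wrong offset, which matches the -EFAULT or stray-write behaviour described earlier in the thread.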


Re: tcp congestion policy selection link order fragile

2006-09-18 Thread David Miller

From: bert hubert [EMAIL PROTECTED]
Date: Mon, 18 Sep 2006 11:59:36 +0200

 I've tested this patch and it does the job for me, reno is now the default,
 even when more advanced options are compiled in, but the rest is still
 available.

This breaks our intention that when TCP_CONG_ADVANCED is not set, BIC
is the default since that is the default congestion control algorithm
we want users to get.

When TCP_CONG_ADVANCED is disabled, we turn on TCP_CONG_BIC,
and your changes cause reno to be the default algorithm in
that build case.  That's not what we want.

Any ordering scheme is wrong or unexpected for _somebody_.  Look how
easy it was for you to break the BIC default we had in place.  To make
things sensible for you, your patch causes everyone else to get the wrong
default.  Therefore any ordering scheme is by definition arbitrary and
no ordering is better than any other one.

Re: [BNEP] Fix compat BNEPGETCONNLIST ioctl.

2006-09-18 Thread David Woodhouse

On Mon, 2006-09-18 at 14:25 +0100, David Woodhouse wrote:
 although HIDP mouse movement doesn't seem to be appearing
 in /dev/input/mice on my G5, while the 'hcidump' output looks sane
 enough while I move it. 

Ew, that's because struct hidp_connadd_req is similarly buggered for
compat. Replacement HIDP patch to fix both at once... I didn't miss
anywhere where we actually change the hidp_connadd_req structure during
the call, did I?

-
[HIDP] Fix compat HIDPGETCONNLIST and HIDPCONNADD ioctls.

Signed-off-by: David Woodhouse [EMAIL PROTECTED]

diff --git a/net/bluetooth/hidp/sock.c b/net/bluetooth/hidp/sock.c
index 099646e..e01fdc5 100644
--- a/net/bluetooth/hidp/sock.c
+++ b/net/bluetooth/hidp/sock.c
@@ -35,6 +35,7 @@ #include <linux/socket.h>
 #include <linux/ioctl.h>
 #include <linux/file.h>
 #include <linux/init.h>
+#include <linux/compat.h>
 #include <net/sock.h>
 
 #include "hidp.h"
@@ -143,11 +144,87 @@ static int hidp_sock_ioctl(struct socket
return -EINVAL;
 }
 
+#ifdef CONFIG_COMPAT
+struct compat_hidp_connadd_req {
+   int   ctrl_sock; // Connected control socket
+   int   intr_sock; // Connected interrupt socket
+   __u16 parser;
+   __u16 rd_size;
+   compat_uptr_t rd_data;
+   __u8  country;
+   __u8  subclass;
+   __u16 vendor;
+   __u16 product;
+   __u16 version;
+   __u32 flags;
+   __u32 idle_to;
+   char  name[128];
+};
+
+static int hidp_sock_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
+{
+   if (cmd == HIDPGETCONNLIST) {
+   struct hidp_connlist_req cl;
+   uint32_t uci;
+   int err;
+
+   if (get_user(cl.cnum, (uint32_t __user *)arg) ||
+   get_user(uci, (u32 __user *)(arg+4)))
+   return -EFAULT;
+
+   cl.ci = compat_ptr(uci);
+
+   if (cl.cnum <= 0)
+   return -EINVAL;
+
+   err = hidp_get_connlist(&cl);
+
+   if (!err && put_user(cl.cnum, (uint32_t __user *)arg))
+   err = -EFAULT;
+
+   return err;
+   } else if (cmd == HIDPCONNADD) {
+   struct compat_hidp_connadd_req ca;
+   struct hidp_connadd_req __user *uca;
+
+   uca = compat_alloc_user_space(sizeof(*uca));
+
+   if (copy_from_user(&ca, (void *)arg, sizeof(ca)))
+   return -EFAULT;
+
+   if (put_user(ca.ctrl_sock, &uca->ctrl_sock)
+   || put_user(ca.intr_sock, &uca->intr_sock)
+   || put_user(ca.parser, &uca->parser)
+   || put_user(ca.rd_size, &uca->rd_size)
+   || put_user(compat_ptr(ca.rd_data), &uca->rd_data)
+   || put_user(ca.country, &uca->country)
+   || put_user(ca.subclass, &uca->subclass)
+   || put_user(ca.vendor, &uca->vendor)
+   || put_user(ca.product, &uca->product)
+   || put_user(ca.version, &uca->version)
+   || put_user(ca.flags, &uca->flags)
+   || put_user(ca.idle_to, &uca->idle_to)
+   || copy_to_user(&uca->name[0], &ca.name[0], 128))
+   return -EFAULT;
+   
+   arg = (unsigned long)uca;
+   /* Fall through. We don't actually write back any _changes_
+  to the structure anyway, so there's no need to copy back
+  into the original compat version */
+   }
+
+   return hidp_sock_ioctl(sock, cmd, arg);
+}
+#endif
+
 static const struct proto_ops hidp_sock_ops = {
.family = PF_BLUETOOTH,
.owner  = THIS_MODULE,
.release= hidp_sock_release,
.ioctl  = hidp_sock_ioctl,
+#ifdef CONFIG_COMPAT
+   .compat_ioctl   = hidp_sock_compat_ioctl,
+#endif
.bind   = sock_no_bind,
.getname= sock_no_getname,
.sendmsg= sock_no_sendmsg,


-- 
dwmw2
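
The HIDPCONNADD conversion boils down to a field-by-field copy in which only the pointer member changes width. A userspace sketch of that idea, using fixed-width types in place of the kernel's `__u16` and `compat_uptr_t`; `connadd_from_compat()` is a hypothetical helper, and plain assignments stand in for put_user(). The field list follows the compat struct in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout as seen by a 32-bit process (pointer stored as 32 bits). */
struct compat_connadd_req {
    int32_t  ctrl_sock;
    int32_t  intr_sock;
    uint16_t parser;
    uint16_t rd_size;
    uint32_t rd_data;    /* 32-bit user pointer */
    uint8_t  country;
    uint8_t  subclass;
    uint16_t vendor;
    uint16_t product;
    uint16_t version;
    uint32_t flags;
    uint32_t idle_to;
    char     name[128];
};

/* Native layout: identical except for the width of rd_data. */
struct connadd_req {
    int      ctrl_sock;
    int      intr_sock;
    uint16_t parser;
    uint16_t rd_size;
    uint8_t *rd_data;
    uint8_t  country;
    uint8_t  subclass;
    uint16_t vendor;
    uint16_t product;
    uint16_t version;
    uint32_t flags;
    uint32_t idle_to;
    char     name[128];
};

/* Copy every member to the member of the same name, widening rd_data. */
void connadd_from_compat(const struct compat_connadd_req *c,
                         struct connadd_req *n)
{
    n->ctrl_sock = c->ctrl_sock;
    n->intr_sock = c->intr_sock;
    n->parser    = c->parser;
    n->rd_size   = c->rd_size;
    n->rd_data   = (uint8_t *)(uintptr_t)c->rd_data;
    n->country   = c->country;
    n->subclass  = c->subclass;
    n->vendor    = c->vendor;
    n->product   = c->product;
    n->version   = c->version;
    n->flags     = c->flags;
    n->idle_to   = c->idle_to;
    memcpy(n->name, c->name, sizeof(n->name));
}
```

With this many near-identical lines, each source member is easy to pair with the wrong destination, so a check that every field round-trips is worth having.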


Re: [BNEP] Fix compat BNEPGETCONNLIST ioctl.

2006-09-18 Thread David Woodhouse

On Mon, 2006-09-18 at 12:38 +0200, Marcel Holtmann wrote:
 it seems that HIDP and CMTP will have the same problem.

Finally, the CMTP version... this one is untested.


[CMTP] Fix compat CMTPGETCONNLIST ioctl

Signed-off-by: David Woodhouse [EMAIL PROTECTED]

diff --git a/net/bluetooth/cmtp/sock.c b/net/bluetooth/cmtp/sock.c
index 10ad7fd..68e1290 100644
--- a/net/bluetooth/cmtp/sock.c
+++ b/net/bluetooth/cmtp/sock.c
@@ -34,6 +34,7 @@ #include <linux/skbuff.h>
 #include <linux/socket.h>
 #include <linux/ioctl.h>
 #include <linux/file.h>
+#include <linux/compat.h>
 #include <net/sock.h>
 
 #include <linux/isdn/capilli.h>
@@ -137,11 +138,44 @@ static int cmtp_sock_ioctl(struct socket
return -EINVAL;
 }
 
+#ifdef CONFIG_COMPAT
+static int cmtp_sock_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
+{
+
+   if (cmd == CMTPGETCONNLIST) {
+   struct cmtp_connlist_req cl;
+   uint32_t uci;
+   int err;
+
+   if (get_user(cl.cnum, (uint32_t __user *)arg) ||
+   get_user(uci, (u32 __user *)(arg+4)))
+   return -EFAULT;
+
+   cl.ci = compat_ptr(uci);
+
+   if (cl.cnum <= 0)
+   return -EINVAL;
+   
+   err = cmtp_get_connlist(&cl);
+
+   if (!err && put_user(cl.cnum, (uint32_t __user *)arg))
+   err = -EFAULT;
+
+   return err;
+   }
+
+   return cmtp_sock_ioctl(sock, cmd, arg);
+}
+#endif
+
 static const struct proto_ops cmtp_sock_ops = {
.family = PF_BLUETOOTH,
.owner  = THIS_MODULE,
.release= cmtp_sock_release,
.ioctl  = cmtp_sock_ioctl,
+#ifdef CONFIG_COMPAT
+   .compat_ioctl   = cmtp_sock_compat_ioctl,
+#endif
.bind   = sock_no_bind,
.getname= sock_no_getname,
.sendmsg= sock_no_sendmsg,

-- 
dwmw2


[GIT PATCH] NET: Fixes for net-2.6.19

2006-09-18 Thread YOSHIFUJI Hideaki / 吉藤英明

Hello.

Please pull the following changesets available at:
git://git.skbuff.net/gitroot/yoshfuji/net-2.6.19-20060918-net/

HEADLINES
-

[XFRM]: Do not add a state whose SPI is zero to the SPI hash.
[NET]: Move netlink interface bits to linux/if_link.h.
[NET]: Include linux/if_link.h directly from the source file.
[NET]: Include new rtnetlink headers for userspace backward compatibility.
[NET]: Put {IFLA,IFA,NDA,NDTA}_{RTA,PAYLOAD}() macro back.
[NET] KBUILD: Add missing entries for new net headers.

DIFFSTAT


 include/linux/Kbuild  |   10 ++-
 include/linux/if.h|  130 --
 include/linux/if_addr.h   |3 +
 include/linux/if_link.h   |  139 +
 include/linux/neighbour.h |7 ++
 include/linux/rtnetlink.h |7 ++
 net/bridge/br_netlink.c   |1 
 net/core/rtnetlink.c  |1 
 net/core/wireless.c   |1 
 net/ipv6/addrconf.c   |1 
 net/xfrm/xfrm_state.c |   11 ++--
 11 files changed, 172 insertions(+), 139 deletions(-)

CHANGESETS
--

commit 04b3eac83cccb7da663bd11a2b569f197bb3170e
Author: Masahide NAKAMURA [EMAIL PROTECTED]
Date:   Sun Sep 17 13:54:53 2006 +0900

[XFRM]: Do not add a state whose SPI is zero to the SPI hash.

SPI=0 is used for acquired IPsec SA and MIPv6 RO state.
Such state should not be added to the SPI hash
because we do not care about it on deleting path.

Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED]
Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 9f63edd..5f4a50e 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -96,9 +96,12 @@ static void xfrm_hash_transfer(struct hl
nhashmask);
hlist_add_head(&x->bysrc, nsrctable+h);
 
-   h = __xfrm_spi_hash(&x->id.daddr, x->id.spi, x->id.proto,
-   x->props.family, nhashmask);
-   hlist_add_head(&x->byspi, nspitable+h);
+   if (x->id.spi) {
+   h = __xfrm_spi_hash(&x->id.daddr, x->id.spi,
+   x->id.proto, x->props.family,
+   nhashmask);
+   hlist_add_head(&x->byspi, nspitable+h);
+   }
}
 }
 
@@ -622,7 +625,7 @@ static void __xfrm_state_insert(struct x
h = xfrm_src_hash(&x->props.saddr, x->props.family);
hlist_add_head(&x->bysrc, xfrm_state_bysrc+h);
 
-   if (xfrm_id_proto_match(x->id.proto, IPSEC_PROTO_ANY)) {
+   if (x->id.spi) {
h = xfrm_spi_hash(&x->id.daddr, x->id.spi, x->id.proto,
  x->props.family);
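
The rule this changeset introduces (a state with SPI 0 never enters the by-SPI hash) can be shown with a toy table. A minimal sketch under stated assumptions: `struct state`, `struct spi_table` and the modulo hash are hypothetical simplifications; the kernel's hash also mixes in the destination address, protocol and family.

```c
#include <stddef.h>
#include <stdint.h>

#define SPI_BUCKETS 8

struct state {
    uint32_t spi;              /* 0 means no SPI has been assigned */
    struct state *next_byspi;
};

struct spi_table {
    struct state *bucket[SPI_BUCKETS];
};

/* Placeholder hash, for illustration only. */
static unsigned int spi_hash(uint32_t spi)
{
    return spi % SPI_BUCKETS;
}

/* Insert x into the by-SPI table unless its SPI is zero.
 * Returns 1 if the state was hashed, 0 if it was skipped. */
int spi_table_insert(struct spi_table *t, struct state *x)
{
    unsigned int h;

    if (!x->spi)
        return 0;

    h = spi_hash(x->spi);
    x->next_byspi = t->bucket[h];
    t->bucket[h] = x;
    return 1;
}

/* Look up a state by SPI; SPI 0 is never present in the table. */
struct state *spi_table_lookup(const struct spi_table *t, uint32_t spi)
{
    struct state *x;

    if (!spi)
        return NULL;

    for (x = t->bucket[spi_hash(spi)]; x; x = x->next_byspi)
        if (x->spi == spi)
            return x;
    return NULL;
}
```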
 

---
commit 44ad787528719604896754d1d05895d2dcfff88b
Author: YOSHIFUJI Hideaki [EMAIL PROTECTED]
Date:   Sun Sep 17 13:54:55 2006 +0900

[NET]: Move netlink interface bits to linux/if_link.h.

Moving netlink interface bits to linux/if.h is rather troublesome for
applications including both linux/if.h (which was changed to be included
from linux/rtnetlink.h automatically) and net/if.h.

Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]

diff --git a/include/linux/if.h b/include/linux/if.h
index cd080d7..ab85ed0 100644
--- a/include/linux/if.h
+++ b/include/linux/if.h
@@ -212,134 +212,4 @@ struct ifconf 
 #define ifc_buf ifc_ifcu.ifcu_buf   /* buffer address   */
 #define ifc_req ifc_ifcu.ifcu_req   /* array of structures  */
 
-/* The struct should be in sync with struct net_device_stats */
-struct rtnl_link_stats
-{
-   __u32   rx_packets; /* total packets received   */
-   __u32   tx_packets; /* total packets transmitted*/
-   __u32   rx_bytes;   /* total bytes received */
-   __u32   tx_bytes;   /* total bytes transmitted  */
-   __u32   rx_errors;  /* bad packets received */
-   __u32   tx_errors;  /* packet transmit problems */
-   __u32   rx_dropped; /* no space in linux buffers*/
-   __u32   tx_dropped; /* no space available in linux  */
-   __u32   multicast;  /* multicast packets received   */
-   __u32   collisions;
-
-   /* detailed rx_errors: */
-   __u32   rx_length_errors;
-   __u32   rx_over_errors; /* receiver ring buff overflow  */
-   __u32   rx_crc_errors;  /* recved pkt with crc error*/
-   __u32   rx_frame_errors;/* recv'd frame alignment error */
-   __u32   rx_fifo_errors; /* recv'r fifo overrun  */
-   __u32   rx_missed_errors;   /* receiver missed packet   */
-
-   /* detailed tx_errors */
-   __u32   tx_aborted_errors;
-   __u32   tx_carrier_errors;
-   __u32   tx_fifo_errors

[GIT PATCH] IPV6: Updates for net-2.6.19

2006-09-18 Thread YOSHIFUJI Hideaki / 吉藤英明

Hello.

Please pull the following changesets available at:
git://git.skbuff.net/gitroot/yoshfuji/net-2.6.19-20060918-inet6/

HEADLINES
-

[IPV6] NDISC: Handle NDP messages to proxied addresses.
[IPV6]: Don't forward packets to proxied link-local address.
[IPV6] NDISC: Avoid updating neighbor cache for proxied address in receiving NA.
[IPV6] NDISC: Set per-entry is_router flag in Proxy NA.
[IPV6] NDISC: Add proxy_ndp sysctl.
[IPV6] ADDRCONF: Convert addrconf_lock to RCU.

DIFFSTAT


 Documentation/networking/ip-sysctl.txt |3 ++
 include/linux/ipv6.h   |2 +
 include/linux/sysctl.h |1 +
 include/net/addrconf.h |   10 ++---
 include/net/if_inet6.h |1 +
 include/net/neighbour.h|1 +
 net/core/neighbour.c   |   11 --
 net/core/pktgen.c  |4 +-
 net/ipv6/addrconf.c|   57 ++---
 net/ipv6/anycast.c |4 +-
 net/ipv6/ip6_output.c  |   62 
 net/ipv6/ipv6_syms.c   |1 -
 net/ipv6/ndisc.c   |   29 +--
 net/sctp/ipv6.c|6 ++-
 14 files changed, 150 insertions(+), 42 deletions(-)

CHANGESETS
--

commit 9b06d4f4593cb15872e4351e3b1bdbf69c279f68
Author: Masahide NAKAMURA [EMAIL PROTECTED]
Date:   Sun Sep 17 13:55:07 2006 +0900

[IPV6] NDISC: Handle NDP messages to proxied addresses.

It is required to respond to NDP messages sent directly to the target
unicast address.  Proxying node (router) is required to handle such
messages.  To achieve this, check if the packet in the forwarding path is
an NDP message.

With this patch, the proxy neighbor entries are always looked up in
forwarding path.  We may want to optimize further.

Based on MIPL2 kernel patch.

Signed-off-by: Ville Nuorvala [EMAIL PROTECTED]
Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED]
Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index c14ea1e..0f56e9e 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -308,6 +308,46 @@ static int ip6_call_ra_chain(struct sk_b
return 0;
 }
 
+static int ip6_forward_proxy_check(struct sk_buff *skb)
+{
+   struct ipv6hdr *hdr = skb->nh.ipv6h;
+   u8 nexthdr = hdr->nexthdr;
+   int offset;
+
+   if (ipv6_ext_hdr(nexthdr)) {
+   offset = ipv6_skip_exthdr(skb, sizeof(*hdr), &nexthdr);
+   if (offset < 0)
+   return 0;
+   } else
+   offset = sizeof(struct ipv6hdr);
+
+   if (nexthdr == IPPROTO_ICMPV6) {
+   struct icmp6hdr *icmp6;
+
+   if (!pskb_may_pull(skb, skb->nh.raw + offset + 1 - skb->data))
+   return 0;
+
+   icmp6 = (struct icmp6hdr *)(skb->nh.raw + offset);
+
+   switch (icmp6->icmp6_type) {
+   case NDISC_ROUTER_SOLICITATION:
+   case NDISC_ROUTER_ADVERTISEMENT:
+   case NDISC_NEIGHBOUR_SOLICITATION:
+   case NDISC_NEIGHBOUR_ADVERTISEMENT:
+   case NDISC_REDIRECT:
+   /* For reaction involving unicast neighbor discovery
+* message destined to the proxied address, pass it to
+* input function.
+*/
+   return 1;
+   default:
+   break;
+   }
+   }
+
+   return 0;
+}
+
 static inline int ip6_forward_finish(struct sk_buff *skb)
 {
return dst_output(skb);
@@ -362,6 +402,11 @@ int ip6_forward(struct sk_buff *skb)
return -ETIMEDOUT;
}
 
+   if (pneigh_lookup(&nd_tbl, &hdr->daddr, skb->dev, 0)) {
+   if (ip6_forward_proxy_check(skb))
+   return ip6_input(skb);
+   }
+
if (!xfrm6_route_forward(skb)) {
IP6_INC_STATS(IPSTATS_MIB_INDISCARDS);
goto drop;
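
The test performed by ip6_forward_proxy_check() reduces to two questions: is the upper-layer protocol ICMPv6, and is the first byte of the ICMPv6 header one of the five Neighbor Discovery types? A standalone sketch with extension-header skipping left out; `is_ndp_message()` and the enum names are hypothetical, and the type values 133 to 137 are those assigned to Neighbor Discovery in RFC 4861.

```c
#include <stddef.h>
#include <stdint.h>

#define NEXTHDR_ICMPV6 58  /* IPv6 next-header value for ICMPv6 */

enum {
    ND_ROUTER_SOLICIT   = 133,
    ND_ROUTER_ADVERT    = 134,
    ND_NEIGHBOR_SOLICIT = 135,
    ND_NEIGHBOR_ADVERT  = 136,
    ND_REDIRECT         = 137
};

/* Return 1 if the upper-layer payload l4 (len bytes) is an NDP message
 * that a proxying node should hand to local input, 0 otherwise. */
int is_ndp_message(uint8_t nexthdr, const uint8_t *l4, size_t len)
{
    if (nexthdr != NEXTHDR_ICMPV6 || len < 1)
        return 0;

    switch (l4[0]) {          /* ICMPv6 type is the first header byte */
    case ND_ROUTER_SOLICIT:
    case ND_ROUTER_ADVERT:
    case ND_NEIGHBOR_SOLICIT:
    case ND_NEIGHBOR_ADVERT:
    case ND_REDIRECT:
        return 1;
    default:
        return 0;
    }
}
```

The length check plays the role of the pskb_may_pull() call in the patch: only one byte past the offset has to be readable.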

---
commit 6d57c4f060b4d327a40bed8cd5053ba812cb0cb6
Author: Masahide NAKAMURA [EMAIL PROTECTED]
Date:   Sun Sep 17 13:55:09 2006 +0900

[IPV6]: Don't forward packets to proxied link-local address.

Proxying router can't forward traffic sent to link-local address, so signal
the sender and discard the packet. This behavior is clarified by Mobile IPv6
specification (RFC3775) but might be required for all proxying routers.
Based on MIPL2 kernel patch.

Signed-off-by: Ville Nuorvala [EMAIL PROTECTED]
Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED]
Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 0f56e9e..b2be749 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -345,6

Re: [BNEP] Fix compat BNEPGETCONNLIST ioctl.

2006-09-18 Thread Marcel Holtmann

Hi David,

  although HIDP mouse movement doesn't seem to be appearing
  in /dev/input/mice on my G5, while the 'hcidump' output looks sane
  enough while I move it. 
 
 Ew, that's because struct hidp_connadd_req is similarly buggered for
 compat. Replacement HIDP patch to fix both at once... I didn't miss
 anywhere where we actually change the hidp_connadd_req structure during
 the call, did I?

that looks ugly, but I assume there is no other way to solve this
problem. I will go over all three patches and wrap them up nicely.

Linus, will you accept these for inclusion before 2.6.18 final?

Regards

Marcel



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread David Miller

From: Andi Kleen [EMAIL PROTECTED]
Date: 18 Sep 2006 11:58:21 +0200

 For netdev: I'm more and more thinking we should just avoid the
 problem completely and switch to true end2end timestamps. This
 means don't time stamp when a packet is received, but only when it
 is delivered to a socket. The timestamp at receiving is a lie
 anyways because the network hardware can add an arbitrarily long delay
 before the driver interrupt handler runs. Then the problem above
 would completely disappear.

I don't think this is wise.

People who run tcpdump want wire timestamps as close as possible.
Yes, things get delayed with the IRQ path, DMA delays, IRQ
mitigation and whatnot, but it's an order of magnitude worse if
you delay to user read() since that introduces also the delay of
the packet copies to userspace which are significantly larger than
these hardware level delays.  If tcpdump gets swapped out, the
timestamp delay can be on the order of several seconds making it
totally useless.

Andi, you will need to find another solution to this problem :-)


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen


 
 People who run tcpdump want wire timestamps as close as possible.
 Yes, things get delayed with the IRQ path, DMA delays, IRQ
 mitigation and whatnot, but it's an order of magnitude worse if
 you delay to user read() since that introduces also the delay of
 the packet copies to userspace which are significantly larger than
 these hardware level delays.  If tcpdump gets swapped out, the
 timestamp delay can be on the order of several seconds making it
 totally useless.

My proposal wasn't to delay until the user read(), just to take the timestamp in
socket context. That is, as soon as packet or RAW/UDP has looked up the socket
and can check a per-socket flag, it takes the timestamp.

The only delay this would add would be the queueing time from the NIC
to the softirq. Do you really think that is that bad?

-Andi

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alan Cox

On Mon, 2006-09-18 at 16:29 +0200, Andi Kleen wrote:
 The only delay this would add would be the queueing time from the NIC
 to the softirq. Do you really think that is that bad?

If you are trying to do things like network record/playback then you
want the minimal delay. There's a reason the original timestamp code
supported the hardware setting the timestamp itself - we actually had a
separate set of logic on a board that was doing the timestamping by
watching the IRQ line of the NIC chip.

Alan


Re: [PATCH 4/9] network namespaces: socket hashes

2006-09-18 Thread Daniel Lezcano


Andrey Savochkin wrote:

Socket hash lookups are made within namespace.
Hash tables are common for all namespaces, with
additional permutation of indexes.


Hi Andrey,

why is the hash table common and not instantiated multiple times, once for
each namespace, like the routes?



Re: tcp congestion policy selection link order fragile

2006-09-18 Thread bert hubert

On Mon, Sep 18, 2006 at 07:06:00AM -0700, David Miller wrote:

 Any ordering scheme is wrong or unexpected for _somebody_.  Look how

I agree violently. Would you agree that it would be best to have a mechanism
that explicitly sets a sane default, and does not rely on ordering?

My implementation indeed broke your intentions, but would you be open to
revamping things so the default policy is not dependent on load order?

What would the desired default be, 'BIC' in all cases?

Thanks. 

-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services
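
The difference between an order-dependent default and an explicit one can be shown with a toy registry. Sketch only: `struct ca_registry` and its helpers are hypothetical and do not mirror the kernel's registration list, and the fallback to the first registered entry when the wanted name is missing is likewise an assumption.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CA 8

struct ca_registry {
    const char *names[MAX_CA];
    int count;
};

/* Append an algorithm; registration order is link/load order. */
int ca_register(struct ca_registry *r, const char *name)
{
    if (r->count >= MAX_CA)
        return -1;
    r->names[r->count++] = name;
    return 0;
}

/* Order-dependent policy: whoever registered first is the default. */
const char *ca_default_by_order(const struct ca_registry *r)
{
    return r->count ? r->names[0] : NULL;
}

/* Explicit policy: pick the named algorithm if it is registered,
 * otherwise fall back to the first registered one. */
const char *ca_default_explicit(const struct ca_registry *r,
                                const char *wanted)
{
    int i;

    for (i = 0; i < r->count; i++)
        if (strcmp(r->names[i], wanted) == 0)
            return r->names[i];
    return ca_default_by_order(r);
}
```

With "reno" and "bic" registered in either order, the by-order default flips while the explicit lookup stays the same, which is the property being asked for.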

Re: [PATCH 1/4] IP100A: Fix TX Pause bug (reset_tx, intr_handler)

2006-09-18 Thread Francois Romieu

Philippe De Muyter [EMAIL PROTECTED] :
[...]
 On Mon, Sep 18, 2006 at 07:11:29PM +0800, Jesse Huang wrote:
  Dear Philippe:
  (1) Because this is a patent issue, we are not allowed to use it again, even though it
  is in the Data Sheet.
 
 I surmise this is only a concern for icplus as a hardware company.

I'd rather not have any Linux user of the old sundance driver with
a new ip100a chipset instantly run into a problem with said patent.

Who would be responsible for it ? :o(

-- 
Ueimor

Re: [PATCH][RFC] Re: high latency with TCP connections

2006-09-18 Thread Rick Jones

David Miller wrote:

From: Rick Jones [EMAIL PROTECTED]
Date: Tue, 05 Sep 2006 10:55:16 -0700

Is this really necessary?  I thought that the problems with ABC were in 
trying to apply byte-based heuristics from the RFC(s) to a 
packet-oriented cwnd in the stack?

This is receiver side, and helps a sender who does congestion
control based upon packet counting like Linux does.   It really
is less related to ABC than Alexey implies, we've always had
this kind of problem as I mentioned in previous talks in the
past on this issue.

For a connection receiving nothing but sub-MSS segments this is going to 
non-trivially increase the number of ACKs sent no?  I would expect an 
unpleasant increase in service demands on something like a burst 
enabled (./configure --enable-burst) netperf TCP_RR test:

netperf -t TCP_RR -H foo -- -b N   # N > 1

to increase as a result.   Pipelined HTTP would be like that, some NFS 
over TCP stuff too, maybe X traffic, other transactional workloads as 
well - maybe Tuxedo.

rick jones

Re: [PATCH 3/7] secid reconciliation-v02: Invoke LSM hook for inbound traffic

2006-09-18 Thread James Morris

On Fri, 8 Sep 2006, Venkat Yekkirala wrote:

 -static inline int xfrm6_policy_check(struct sock *sk, int dir, struct sk_buff
 *skb)
 -{
 - return xfrm_policy_check(sk, dir, skb, AF_INET6);
 + if (sk && sk->sk_policy[XFRM_POLICY_IN])
 + ret = __xfrm_policy_check(sk, dir, skb, family);
 + else
 + ret = (!xfrm_policy_count[dir] && !skb->sp) ||
 +   (skb->dst->flags & DST_NOPOLICY) ||
 +   __xfrm_policy_check(sk, dir, skb, family);
 +
 +#ifdef CONFIG_SECURITY_NETWORK
 + if (ret)
 + ret = security_skb_policy_check(skb, family);
 +#endif /* CONFIG_SECURITY_NETWORK */

Why is this code ifdef'd when the function is conditionally compiled?

 {
 +#ifdef CONFIG_SECURITY_NETWORK
 + return security_skb_policy_check(skb, family);
 +#else
   return 1;
 +#endif /* CONFIG_SECURITY_NETWORK */

Ditto.
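
The review point above can be illustrated with a minimal user-space sketch,
assuming the usual header pattern in which a hook has a real declaration when
the option is enabled and an inline stub otherwise. The body of the stub and
the helper policy_check() are illustrative assumptions, not code from the
patch; only the hook name comes from the quoted diff.

```c
struct sk_buff;	/* opaque here; only a pointer is passed around */

/* In the header: a real declaration when the option is on, and an
 * always-permit inline stub when it is off (assumed layout). */
#ifdef CONFIG_SECURITY_NETWORK
int security_skb_policy_check(struct sk_buff *skb, unsigned short family);
#else
static inline int security_skb_policy_check(struct sk_buff *skb,
					    unsigned short family)
{
	(void)skb;
	(void)family;
	return 1;	/* no security module configured: always permit */
}
#endif

/* At the call site (hypothetical helper): no #ifdef is needed, because
 * the stub above is always available and compiles away when disabled. */
static int policy_check(struct sk_buff *skb, unsigned short family, int ret)
{
	if (ret)
		ret = security_skb_policy_check(skb, family);
	return ret;
}
```

With the stub in place, the caller reads the same whether or not
CONFIG_SECURITY_NETWORK is set, which is presumably why the extra #ifdef at
the call site looked redundant.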



-- 
James Morris
[EMAIL PROTECTED]

Re: [PATCH 6/7] secid reconciliation-v02: Label locally generated IPv4 traffic

2006-09-18 Thread James Morris

On Fri, 8 Sep 2006, Venkat Yekkirala wrote:

 diff --git a/include/net/ip.h b/include/net/ip.h
 index 98f9084..4646c13 100644
 --- a/include/net/ip.h
 +++ b/include/net/ip.h
 @@ -48,6 +48,9 @@ struct ipcm_cookie
   u32 addr;
   int oif;
   struct ip_options   *opt;
 +#ifdef CONFIG_SECURITY_NETWORK
 + __u32   secid;
 +#endif /* CONFIG_SECURITY_NETWORK */
 };

This field should be 'u32'.


-- 
James Morris
[EMAIL PROTECTED]

Re: [PATCH 4/7] secid reconciliation-v02: Invoke LSM hook for outbound traffic

2006-09-18 Thread James Morris

On Fri, 8 Sep 2006, Venkat Yekkirala wrote:

 @@ -114,6 +128,9 @@ static struct xt_target xt_connsecmark_t
   .target = target,
   .targetsize = sizeof(struct xt_connsecmark_target_info),
   .table  = "mangle",
 + .hooks  = (1 << NF_IP_LOCAL_IN) |
 +   (1 << NF_IP_FORWARD) |
 +   (1 << NF_IP_POST_ROUTING),

Why have you added constraints on the hooks?

This breaks a bunch of things.


 @@ -123,6 +140,9 @@ static struct xt_target xt_connsecmark_t
   .target = target,
   .targetsize = sizeof(struct xt_connsecmark_target_info),
   .table  = "mangle",
 + .hooks  = (1 << NF_IP6_LOCAL_IN) |
 +   (1 << NF_IP6_FORWARD) |
 +   (1 << NF_IP6_POST_ROUTING),
   .me = THIS_MODULE,

Ditto...

 @@ -119,6 +129,9 @@ static struct xt_target xt_secmark_targe
   .target = target,
   .targetsize = sizeof(struct xt_secmark_target_info),
   .table  = "mangle",
 + .hooks  = (1 << NF_IP_LOCAL_IN) |
 +   (1 << NF_IP_FORWARD) |
 +   (1 << NF_IP_POST_ROUTING),
   .me = THIS_MODULE,
   },
   {
 @@ -128,6 +141,9 @@ static struct xt_target xt_secmark_targe
   .target = target,
   .targetsize = sizeof(struct xt_secmark_target_info),
   .table  = "mangle",
 + .hooks  = (1 << NF_IP6_LOCAL_IN) |
 +   (1 << NF_IP6_FORWARD) |
 +   (1 << NF_IP6_POST_ROUTING),
   .me = THIS_MODULE,
   },
 };
 

-- 
James Morris
[EMAIL PROTECTED]

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen

On Monday 18 September 2006 17:19, Alan Cox wrote:
 On Mon, 2006-09-18 at 16:29 +0200, Andi Kleen wrote:
  The only delay this would add would be the queueing time from the NIC
  to the softirq. Do you really think that is that bad?
 
 If you are trying to do things like network record/playback then you
 want the minimal delay. 

But it's not minimal. Maybe it was long ago when the code was designed
on a 3c509 but not with modern hardware: Think interrupt mitigation and NAPI. 

And with NAPI we tend to process the packets directly after they
are fetched out of the RX queue, so there is practically no delay
between driver seeing the packet and softirq seeing it.  All the queuing
is done either at hardware level or later at socket level.

 There's a reason the original timestamp code 
 supported the hardware setting the timestamp itself - we actually had a
 separate set of logic on a board that was doing the timestamping by
 watching the IRQ line of the NIC chip.

That would be fine too (because it will be likely fast), but unfortunately
I don't know of any driver that does that.

-Andi

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov

Hello!

 For netdev: I'm more and more thinking we should just avoid the problem
 completely and switch to true end2end timestamps. This means don't
 time stamp when a packet is received, but only when it is delivered
 to a socket.

This will work.

From viewpoint of existing uses of timestamp by packet socket
this time is not worse. The only danger is violation of causality
(when forwarded packet or reply packet gets timestamp earlier than
original packet). This pathology was main reason why timestamp
is recorded early, before packet is demultiplexed in netif_receive_skb().
But it is not a practical problem: delivery to packet/raw sockets
is occasionally placed _before_ delivery to real protocol handlers.


 handler runs. Then the problem above would completely disappear. 

Well, not completely. Too slow clock source remains too slow clock source.
If it is so slow, that it results in performance degradation, it just
should not be used at all, even such pariah as tcpdump wants to be fast.

Actually, I have a question. Why the subject is
"Network performance degradation from 2.6.11.12 to 2.6.16.20"?
I do not see beginning of the thread and cannot guess
why clock source degraded. :-)

Alexey

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen

On Monday 18 September 2006 17:38, Alexey Kuznetsov wrote:
 Hello!
 
  For netdev: I'm more and more thinking we should just avoid the problem
  completely and switch to true end2end timestamps. This means don't
  time stamp when a packet is received, but only when it is delivered
  to a socket.
 
 This will work.
 
 From viewpoint of existing uses of timestamp by packet socket
 this time is not worse. The only danger is violation of causality
 (when forwarded packet or reply packet gets timestamp earlier than
 original packet). 

Hmm, not sure how that could happen. Also is it a real problem
even if it could?

  handler runs. Then the problem above would completely disappear. 
 
 Well, not completely. Too slow clock source remains too slow clock source.
 If it is so slow, that it results in performance degradation, it just
 should not be used at all, even such pariah as tcpdump wants to be fast.
 
 Actually, I have a question. Why the subject is
 "Network performance degradation from 2.6.11.12 to 2.6.16.20"?
 I do not see beginning of the thread and cannot guess
 why clock source degraded. :-)

It's a long and sad story.

Old kernels didn't disable the TSC on those boxes (multi core K8) and assumed
they were synchronized for timing purposes. 

This initially mostly worked  if you don't use cpufreq, 
but over a longer uptime the TSCs would drift against each other and timing
would jump more and more between CPUs.

On older versions of K8 this drift happened much slower (more
aggressive power saving in HLT in newer steppings made it worse; that is why
idle=poll helps) and could be often ignored. But technically it was still a 
bug there because it could break timing after long uptimes.

New multi socket K8 boxes are generally 
totally unusable with TSC because they use cpufreq and the TSCs can run
at completely differently frequencies, which obviously doesn't give very 
good timing information if you assume the TSC is globally synchronized.

That is why later kernels default to TSC off.  The original plan 
was to use HPET then, which is slower than TSC, but still not that bad.
But while most modern systems have a HPET timer somewhere in the chipset 
nearly all BIOS vendors forgot to describe it in the BIOS because Windows
didn't use it and Linux can't find it because of that. 

Then it has to use the ACPI pmtmr which is really really slow.
The overhead of that thing is so large that you can clearly see it in
the network benchmark.

The real fix long term is to change the timer subsystem to keep all TSC
state per CPU, then it'll work on the K8s too. Unfortunately it's a moderately 
hard problem  to make the result still fully monotonic. But people are working 
on it.
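
To make the monotonicity problem concrete, here is a deliberately naive
user-space sketch, assuming each CPU has its own counter rate and base and
that readings are clamped against the last value handed out. Every name in it
(struct cpu_clock, read_clock_ns, the shared last_ns) is hypothetical, it is
single-threaded, and it is not a description of what the kernel timer code
does or will do.

```c
#include <stdint.h>

#define NR_CPUS 4	/* arbitrary for the sketch */

/* Hypothetical per-CPU state: the raw counter value and the time it
 * corresponded to at the last sync, plus this CPU's counter rate. */
struct cpu_clock {
	uint64_t base_cycles;
	uint64_t base_ns;
	uint64_t cycles_per_us;	/* must be non-zero */
};

static struct cpu_clock clocks[NR_CPUS];
static uint64_t last_ns;	/* last value returned, shared by all CPUs */

/* Convert a raw per-CPU counter reading (assumed >= base_cycles) to
 * nanoseconds, then clamp so the result never goes backwards even when
 * CPUs run at different rates. No locking: single-threaded sketch only. */
static uint64_t read_clock_ns(int cpu, uint64_t raw_cycles)
{
	const struct cpu_clock *c = &clocks[cpu];
	uint64_t ns = c->base_ns +
		      (raw_cycles - c->base_cycles) * 1000 / c->cycles_per_us;

	if (ns < last_ns)
		ns = last_ns;
	last_ns = ns;
	return ns;
}
```

The clamp hides backward jumps but needs state shared between CPUs, which
hints at why a scalable and fully monotonic result is the hard part.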

-Andi

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov

Hello!

 Hmm, not sure how that could happen. Also is it a real problem
 even if it could?

As I said, the problem is _occasionally_ theoretical.

This would happen f.e. if packet socket handler was installed
after IP handler. Then tcpdump would get packet after it is processed
(acked/replied/forwarded). This would be disastrous, the results
are unparsable.

I recall, the issue was discussed, and that time it looked more
reasonable to solve problems of this kind taking timestamp once
before it is seen by all the rest of stack. Who could expect that
PIT nightmare is going to return? :-)


 Then it has to use the ACPI pmtmr which is really really slow.
 The overhead of that thing is so large that you can clearly see it in
 the network benchmark.

I see. Thank you.

Alexey

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen

On Monday 18 September 2006 18:28, Alexey Kuznetsov wrote:
 Hello!
 
  Hmm, not sure how that could happen. Also is it a real problem
  even if it could?
 
 As I said, the problem is _occasionally_ theoretical.
 
 This would happen f.e. if packet socket handler was installed
 after IP handler. Then tcpdump would get packet after it is processed
 (acked/replied/forwarded). This would be disastrous, the results
 are unparsable.

But that never happens right? 

And do you have some other preferred way to solve this? Even if the timer
was fast it would be still good to avoid it in the fast path when DHCPD
is running.

I suppose in the worst case a sysctl like Vladimir asked for could be added,
but it would seem somewhat lame.

-Andi

Re: [PATCH 7/7] secid reconciliation-v02: Enforcement for SELinux

2006-09-18 Thread James Morris

On Fri, 8 Sep 2006, Venkat Yekkirala wrote:

 + if (selinux_compat_net) {
 + err = selinux_xfrm_decode_session(skb, &peersid, 0);
 + BUG_ON(err);

I'm pretty sure this should not be a BUG_ON.  IIUC, you want to panic the 
kernel because one of the nested SAs has a different security context.

 + err = selinux_xfrm_decode_session(skb, &xfrm_sid, 0);
 + BUG_ON(err);

Same.


-- 
James Morris
[EMAIL PROTECTED]

Re: tcp congestion policy selection link order fragile

2006-09-18 Thread David Miller

From: bert hubert [EMAIL PROTECTED]
Date: Mon, 18 Sep 2006 17:40:48 +0200

 What would the desired default be, 'BIC' in all cases?

And if BIC is not enabled in the configuration, then what?

Re: [PATCH 7/7] secid reconciliation-v02: Enforcement for SELinux

2006-09-18 Thread James Morris

On Fri, 8 Sep 2006, Venkat Yekkirala wrote:

 This defines SELinux enforcement of the 2 new LSM hooks.
 

I think this looks ok in general (I have a couple more technical issues), 
although I believe that Stephen has some question about policy 
construction.

Please rename these hooks:

+ * @skb_policy_check:
+ * @skb_netfilter_check

to:

skb_flow_in
skb_flow_out



- James
-- 
James Morris
[EMAIL PROTECTED]

Re: [GIT PATCH] NET: Fixes for net-2.6.19

2006-09-18 Thread Thomas Graf

* YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED] 2006-09-19 00:08
 [NET]: Move netlink interface bits to linux/if_link.h.
 
 Moving netlink interface bits to linux/if.h is rather troublesome for
 applications including both linux/if.h (which was changed to be included
 from linux/rtnetlink.h automatically) and net/if.h.

Agreed.

 [NET]: Include new rtnetlink headers for userspace backward compatibility.
 
 Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]
 
 diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
 index 3a18add..8ec375c 100644
 --- a/include/linux/rtnetlink.h
 +++ b/include/linux/rtnetlink.h
 @@ -2,7 +2,12 @@ #ifndef __LINUX_RTNETLINK_H
  #define __LINUX_RTNETLINK_H
  
  #include <linux/netlink.h>
 +#ifndef __KERNEL__
 +/* Backward compatibility */
  #include <linux/if_link.h>
 +#include <linux/if_addr.h>
 +#include <linux/neighbour.h>
 +#endif
  
  /
   *   Routing/neighbour discovery messages.

Still acceptable but this gets ugly at some point. Applications using
the interface should start making copies of the header version they
use.

 commit 55a08a9078b243a06223222735580df9e11a5fa6
 Author: YOSHIFUJI Hideaki [EMAIL PROTECTED]
 Date:   Sun Sep 17 13:55:02 2006 +0900
 
 [NET]: Put {IFLA,IFA,NDA,NDTA}_{RTA,PAYLOAD}() macro back.
 
 These macros are still used by userspace applications.

Same here, it doesn't make sense to export macros only of functional
value and used by userspace only. The same issue will pop up once
all users have been converted to use the new netlink interface.
Keeping the old interface around just so userspace doesn't have to
make copies doesn't make sense. I think it's better to start fixing
userspace than to try and keep headers source compatible.

RE: [PATCH 7/7] secid reconciliation-v02: Enforcement for SELinux

2006-09-18 Thread Venkat Yekkirala

 On Fri, 8 Sep 2006, Venkat Yekkirala wrote:
 
  +   if (selinux_compat_net) {
  +   err = selinux_xfrm_decode_session(skb, &peersid, 0);
  +   BUG_ON(err);
 
 I'm pretty sure this should not be a BUG_ON.  IIUC, you want 
 to panic the 
 kernel because one of the nested SAs has a different security context.

No, we are sending in 0 for the ckall param by which we are telling
the function NOT to do any checks, but to simply set the return param
peersid to the secid on the first xfrm if any and succeed by returning 0.
Must not fail.

RE: [PATCH 4/7] secid reconciliation-v02: Invoke LSM hook for out bound traffic

2006-09-18 Thread Venkat Yekkirala

 On Fri, 8 Sep 2006, Venkat Yekkirala wrote:
 
  @@ -114,6 +128,9 @@ static struct xt_target xt_connsecmark_t
  .target = target,
  .targetsize = sizeof(struct 
 xt_connsecmark_target_info),
  .table  = "mangle",
  +   .hooks  = (1 << NF_IP_LOCAL_IN) |
  + (1 << NF_IP_FORWARD) |
  + (1 << NF_IP_POST_ROUTING),
 
 Why have you added constraints on the hooks?
 
 This breaks a bunch of things.

I was trying to restrict the module usage to these, but later realized
I really needn't. Will take these out.
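
For readers following the exchange, the effect of such a hook mask can be
sketched in plain C. The enum below mirrors the IPv4 netfilter hook numbering
(PRE_ROUTING = 0 through POST_ROUTING = 4); hook_allowed() is a hypothetical
helper, and treating a mask of 0 as "no restriction" is an assumption of this
sketch rather than a statement about the x_tables core.

```c
enum hook {	/* values mirror the IPv4 netfilter hook numbering */
	HOOK_PRE_ROUTING,
	HOOK_LOCAL_IN,
	HOOK_FORWARD,
	HOOK_LOCAL_OUT,
	HOOK_POST_ROUTING
};

/* The mask from the quoted patch: LOCAL_IN, FORWARD and POST_ROUTING. */
static const unsigned int patch_hooks = (1u << HOOK_LOCAL_IN) |
					(1u << HOOK_FORWARD) |
					(1u << HOOK_POST_ROUTING);

/* A mask of 0 is treated as "usable from any hook" in this sketch. */
static int hook_allowed(unsigned int allowed_mask, enum hook h)
{
	return allowed_mask == 0 || (allowed_mask & (1u << h)) != 0;
}
```

Under this mask, rules attached to PRE_ROUTING or LOCAL_OUT would be refused,
which is one way existing rule sets could break.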

Re: [PATCH 4/7] secid reconciliation-v02: Invoke LSM hook for outbound traffic

2006-09-18 Thread James Morris

On Fri, 8 Sep 2006, Venkat Yekkirala wrote:

 -static void secmark_restore(struct sk_buff *skb)
 +static unsigned int secmark_restore(struct sk_buff *skb, unsigned int
 hooknum,
 +const struct xt_target *target)
 {
 - if (!skb->secmark) {
 - u32 *connsecmark;
 - enum ip_conntrack_info ctinfo;
 + u32 *psecmark;
 + u32 secmark = 0;
 + enum ip_conntrack_info ctinfo;
 
 - connsecmark = nf_ct_get_secmark(skb, &ctinfo);
 - if (connsecmark && *connsecmark)
 - if (skb->secmark != *connsecmark)
 - skb->secmark = *connsecmark;
 - }
 + psecmark = nf_ct_get_secmark(skb, &ctinfo);
 + if (psecmark)
 + secmark = *psecmark;
 +
 + if (!secmark)
 + return XT_CONTINUE;
 +
 + /* Set secmark on inbound and filter it on outbound */
 + if (hooknum == NF_IP_POST_ROUTING || hooknum == NF_IP6_POST_ROUTING) {
 + if (!security_skb_netfilter_check(skb, secmark))
 + return NF_DROP;
 + } else
 + if (skb->secmark != secmark)
 + skb->secmark = secmark;
 +
 + return XT_CONTINUE;
 }

Quite a lot of logic has changed here.

With the original code, we only restored a secmark once for the lifetime 
of a packet or connection (to make behavior deterministic and security 
marks immutable in the face of arbitrarily complex iptables rules).

With your patch, secmarks are always writable.

What about packets on the OUTPUT hook?

Also, we did not restore a 'null' (zero) secmark to the skb (while this 
should never happen with the current SECMARK target, there may be 
non-SELinux extensions later which set a null marking).

Why not just do something like:


psecmark = nf_ct_get_secmark(skb, &ctinfo);
if (psecmark && *psecmark) {

... core of function ...

}

return XT_CONTINUE;

I don't think you need the new secmark variable.
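
The two properties described above for the original code (a mark is restored
at most once, and a zero mark is never restored) can be sketched in user-space
C. struct pkt and this secmark_restore() signature are stand-ins invented for
the sketch; the real function operates on struct sk_buff and conntrack state.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for the one sk_buff field the sketch needs. */
struct pkt {
	uint32_t secmark;	/* 0 means unlabeled */
};

/* conn_secmark points at the connection's saved mark, or is NULL when
 * there is no tracking entry. Returns 1 if a mark was copied. */
static int secmark_restore(struct pkt *p, const uint32_t *conn_secmark)
{
	if (p->secmark)		/* already labeled: marks stay immutable */
		return 0;
	if (conn_secmark && *conn_secmark) {
		p->secmark = *conn_secmark;
		return 1;
	}
	return 0;		/* never copy a null (zero) mark */
}
```

Keeping both checks in one place makes it harder for later iptables rules to
overwrite a label that has already been set.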

You've also changed the logic for the dummy case of 
security_skb_netfilter_check()


+static inline int security_skb_netfilter_check(struct sk_buff *skb,
+   u32 nf_secid)
+{
+   return 1;
+}
+

This code does not now behave as it did originally.  Keep in mind that 
SELinux is not the only user of SECMARK.

(The documentation of the hook in security.h doesn't match the behavior, 
either -- it's (re-)labeling, not just filtering).

I really don't know if connection tracking is the right place to be doing 
policy enforcement, either.  Perhaps you should just do the relabeling here 
and enforcement later.

The xt_SECMARK.c case has similar issues to all of the above.



- James
-- 
James Morris
[EMAIL PROTECTED]

Re: tcp congestion policy selection link order fragile

2006-09-18 Thread bert hubert

On Mon, Sep 18, 2006 at 11:53:09AM -0700, David Miller wrote:
  What would the desired default be, 'BIC' in all cases?
 
 And if BIC is not enabled in the configuration, then what?

As the source notes /* we'll always have reno */ . This would make the
policy: the default is bic if available, otherwise it is reno, which is
*always* available.
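
The policy proposed above (bic if available, otherwise reno) can be sketched
as a lookup by name that does not depend on registration order. This is only
an illustration: struct cong_algo, find_algo() and pick_default() are
hypothetical names, and the kernel's real registry (struct
tcp_congestion_ops and its list) is not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry; only the name matters for the sketch. */
struct cong_algo {
	const char *name;
};

/* Return the entry called `name`, or NULL if it was not built in. */
static const struct cong_algo *find_algo(const struct cong_algo *list,
					 size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(list[i].name, name) == 0)
			return &list[i];
	return NULL;
}

/* Pick the default by explicit preference, independent of the order in
 * which algorithms were registered: "bic" if present, else "reno". */
static const struct cong_algo *pick_default(const struct cong_algo *list,
					    size_t n)
{
	const struct cong_algo *ca = find_algo(list, n, "bic");

	return ca ? ca : find_algo(list, n, "reno");
}
```

Because reno is always present, the fallback lookup cannot fail in practice,
and the result is the same whatever the link or load order.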

But it is all up to you. I'm willing to do the leg work.

Bert

-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services

Re: [PATCH][RFC] Re: high latency with TCP connections

2006-09-18 Thread Alexey Kuznetsov

Hello!

Of course, number of ACK increases. It is the goal. :-)


 unpleasant increase in service demands on something like a burst 
 enabled (./configure --enable-burst) netperf TCP_RR test:
 
 netperf -t TCP_RR -H foo -- -b N   # N > 1

foo=localhost

b   patched orig
2   105874.83   105143.71
3   114208.53   114023.07
4   120493.99   120851.27
5   128087.48   128573.33
10  151328.48   151056.00

Probably, the test is done wrong. But I see no difference.


 to increase as a result.   Pipelined HTTP would be like that, some NFS 
 over TCP stuff too, maybe X traffic,

X will be excited about better latency.

What's about protocols not interested in latency, they will be a little
happier, if transactions are processed asynchronously.

But actually, it is not about increasing/decreasing number of ACKs.
It is about killing that pain in ass which we used to have because
we pretended to be too smart.

Alexey

Re: [PATCH][RFC] Re: high latency with TCP connections

2006-09-18 Thread Rick Jones


Alexey Kuznetsov wrote:

Hello!

Of course, number of ACK increases. It is the goal. :-)

unpleasant increase in service demands on something like a burst 
enabled (./configure --enable-burst) netperf TCP_RR test:


netperf -t TCP_RR -H foo -- -b N   # N > 1


foo=localhost


There isn't any sort of clever short-circuiting in loopback is there?  I 
do like the convenience of testing things over loopback, but always fret 
about not including drivers and actual hardware interrupts etc.



b   patched orig
2   105874.83   105143.71
3   114208.53   114023.07
4   120493.99   120851.27
5   128087.48   128573.33
10  151328.48   151056.00



Probably, the test is done wrong. But I see no difference.


Regardless, kudos for running the test.  The only thing missing is the 
-c and -C options to enable the CPU utilization measurements which will 
then give the service demand on a CPU time per transaction basis.  Or 
was this a UP system that was taken to CPU saturation?


to increase as a result.   Pipelined HTTP would be like that, some NFS 
over TCP stuff too, maybe X traffic,



X will be excited about better latency.

What's about protocols not interested in latency, they will be a little
happier, if transactions are processed asynchronously.


What i'm thinking about isn't so much about the latency as it is the 
aggregate throughput a system can do with lots of these 
protocols/connections going at the same time.  Hence the concern about 
increases in service demand.



But actually, it is not about increasing/decreasing number of ACKs.
It is about killing that pain in ass which we used to have because
we pretended to be too smart.


:)

rick jones

Re: [PATCH 0/7] secid reconciliation-v02: Repost patchset with updates

2006-09-18 Thread Paul Moore

On Friday 08 September 2006 12:50 pm, Venkat Yekkirala wrote:
 UPCOMING WORK:

 The following per the discussion at:
   http://marc.theaimsgroup.com/?l=selinux&m=115755980516072&w=2

 - Create IPSec SAs to be acquired with the creating sock's context as
 opposed to that of the matching SPD rule, resulting in a simpler SPD as
 well as policy. - Set peer_sid on tcp sockets to the reconciled secmark so
 trusted applications can retrieve and service the data at the appropriate
 context.

Considering the discussions that have taken place on the SELinux list I think 
doing the work to set the peer_sid value on TCP sockets is an important part 
of the secid work and should be included in this patchset.  I don't believe 
it would be that difficult, and it would make some of the code much 
cleaner/simpler I think.

-- 
paul moore
linux security @ hp

Re: [PATCH] please include in 2.6.18: e100 disable device on PCI error

2006-09-18 Thread Andrew Morton

On Mon, 18 Sep 2006 15:01:22 -0500
[EMAIL PROTECTED] (Linas Vepstas) wrote:

 
 Hi,
 
 Please apply the following one-liner patch to  
 what will become the stable 2.6.18.  This patch is 
 low-risk because it affects only the PCI error 
 recovery code, which doesn't run on most platforms
 (in particular, isn't invoked on current x86/ia64).
 
 This patch was originally sent on 29 June 2006
 to fix a bug that showed up in an -mm build.
 The code from -mm made it into mainline, but 
 this patch did not, and so we're unhappy. :-(
 
 Here's the original patch description:
 
 A recent patch in -mm3 titled 
 gregkh-pci-pci-don-t-enable-device-if-already-enabled.patch
 causes pci_enable_device() to be a no-op if the kernel thinks
 that the device is already enabled.  This change breaks the
 PCI error recovery mechanism in the e100 device driver, since, 
 after PCI slot reset, the card is no longer enabled. This is 
 a trivial fix for this problem. Tested.
 
 Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
 Signed-off-by: Andrew Morton [EMAIL PROTECTED]
 Signed-off-by: Auke Kok [EMAIL PROTECTED]
 
 
  drivers/net/e100.c |1 +
  1 file changed, 1 insertion(+)
 
 Index: linux-2.6.18-rc7-git1/drivers/net/e100.c
 ===
 --- linux-2.6.18-rc7-git1.orig/drivers/net/e100.c 2006-09-18 
 14:21:49.0 -0500
 +++ linux-2.6.18-rc7-git1/drivers/net/e100.c  2006-09-18 14:24:50.0 
 -0500
 @@ -2799,6 +2799,7 @@ static pci_ers_result_t e100_io_error_de
   /* Detach; put netif into state similar to hotplug unplug. */
   netif_poll_enable(netdev);
   netif_device_detach(netdev);
 + pci_disable_device(pdev);
  
   /* Request a slot reset. */
   return PCI_ERS_RESULT_NEED_RESET;

hm.  I don't have this patch queued, but I _do_ have an equivalent patch
for e1000 queued; what's up with that?  Nobody seems to have paid much
attention to the e1000 fix.

If we can gather the appropriate acks quickly then I expect we can get both
of these into 2.6.18.

Re: [PATCH] EtherIP tunnel driver (RFC 3378)

2006-09-18 Thread Lennert Buytenhek

On Mon, Sep 11, 2006 at 10:41:29PM +0200, Joerg Roedel wrote:

 This driver implements the tunneling of Ethernet packets over IPv4
 networks for Linux. It uses the protocol defined in RFC 3378.

Check out the thread "[PATCH][RFC] etherip: Ethernet-in-IPv4 tunneling"
that was on netdev in January of 2005 -- a number of arguments against
etherip (and for tunneling ethernet in GRE) were raised back then.

One of the most significant ones, IMHO:

 Another argument against etherip would be that OpenBSD apparently
 mis-implemented etherip by putting the etherip version nibble in the
 second nibble of the etherip header instead of the first, which would
 probably prevent the linux and OpenBSD versions from interoperating,
 negating the advantage of using etherip in the first place.
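
The interoperability concern can be made concrete with a small sketch. RFC
3378 defines a 16-bit EtherIP header whose first 4 bits carry the version (3)
and whose remaining 12 bits are reserved; the "swapped" check below models the
variant as the quoted message describes it, and all function names are
hypothetical.

```c
#include <stdint.h>

#define ETHERIP_VERSION 3	/* version value defined by RFC 3378 */

/* Build the header as RFC 3378 lays it out: version in the top 4 bits
 * of the first byte, reserved bits zero. */
static void etherip_build(uint8_t hdr[2])
{
	hdr[0] = ETHERIP_VERSION << 4;	/* 0x30 */
	hdr[1] = 0;			/* reserved */
}

/* Check that follows the RFC layout. */
static int etherip_version_ok(const uint8_t hdr[2])
{
	return (hdr[0] >> 4) == ETHERIP_VERSION;
}

/* Check for the swapped-nibble variant described in the quoted message. */
static int etherip_version_ok_swapped(const uint8_t hdr[2])
{
	return (hdr[0] & 0x0f) == ETHERIP_VERSION;
}
```

A strict receiver of either kind rejects the other's packets, so the two
implementations only interoperate if one side accepts both encodings.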


cheers,
Lennert

Re: [PATCH] please include in 2.6.18: e100 disable device on PCI error

2006-09-18 Thread Auke Kok


Andrew Morton wrote:

On Mon, 18 Sep 2006 15:01:22 -0500
[EMAIL PROTECTED] (Linas Vepstas) wrote:


Hi,

Please apply the following one-liner patch to  
what will become the stable 2.6.18.  This patch is 
low-risk because it affects only the PCI error 
recovery code, which doesn't run on most platforms

(in particular, isn't invoked on current x86/ia64).

This patch was originally sent on 29 June 2006
to fix a bug that showed up in an -mm build.
The code from -mm made it into mainline, but 
this patch did not, and so we're unhappy. :-(


Here's the original patch description:

A recent patch in -mm3 titled 
gregkh-pci-pci-don-t-enable-device-if-already-enabled.patch

causes pci_enable_device() to be a no-op if the kernel thinks
that the device is already enabled.  This change breaks the
PCI error recovery mechanism in the e100 device driver, since, 
after PCI slot reset, the card is no longer enabled. This is 
a trivial fix for this problem. Tested.


Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
Signed-off-by: Auke Kok [EMAIL PROTECTED]


 drivers/net/e100.c |1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.18-rc7-git1/drivers/net/e100.c
===
--- linux-2.6.18-rc7-git1.orig/drivers/net/e100.c   2006-09-18 
14:21:49.0 -0500
+++ linux-2.6.18-rc7-git1/drivers/net/e100.c2006-09-18 14:24:50.0 
-0500
@@ -2799,6 +2799,7 @@ static pci_ers_result_t e100_io_error_de
/* Detach; put netif into state similar to hotplug unplug. */
netif_poll_enable(netdev);
netif_device_detach(netdev);
+   pci_disable_device(pdev);
 
 	/* Request a slot reset. */

return PCI_ERS_RESULT_NEED_RESET;


hm.  I don't have this patch queued, but I _do_ have an equivalent patch
for e1000 queued; what's up with that?  Nobody seems to have paid much
attention to the e1000 fix.

If we can gather the appropriate acks quickly then I expect we can get both
of these into 2.6.18.


Ack! for both, of course.

I'm unsure what happened here, as I have this patch in my local tree. I suspect 
that it got merged into jeff's #upstream only somehow.


Auke


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov

Hello!

 But that never happens right? 

Right.

Well, not right. It happens. Simply because you get packet
with newer timestamp after previous handler saw this packet
and did some actions. I just do not see any bad consequences.


 And do you have some other preferred way to solve this? Even if the timer
 was fast it would be still good to avoid it in the fast path when DHCPD
 is running.

No. The way, which you suggested, seems to be the best.


1. It even does not disable possibility to record timestamp inside
   driver, which Alan was afraid of. The sequence is:

if (!skb->tstamp.off_sec)
net_timestamp(skb);

2. Maybe, netif_rx() should continue to get timestamp in netif_rx().

3. NAPI already introduced almost the same inaccuracy. And it is really
   silly to waste time getting timestamp in netif_receive_skb() a few
   moments before the packet is delivered to a socket.

4. ...but clock source, which takes one of top lines in profiles
   must be repaired yet. :-)
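
The sequence in point 1 amounts to lazy timestamping: read the clock only at
delivery, and only if nobody (for example the driver) has stamped the packet
already. A user-space sketch of that idea follows; struct pkt, slow_clock()
and deliver_timestamp() are invented for illustration, and the clock is faked
so the number of expensive reads can be counted.

```c
#include <stdint.h>

/* Stand-in for the timestamp fields of a packet. */
struct pkt {
	uint32_t tstamp_sec;	/* 0 means "not stamped yet" */
	uint32_t tstamp_usec;
};

static unsigned long clock_reads;	/* counts expensive clock accesses */

/* Hypothetical slow clock; a real one would read a hardware timer. */
static void slow_clock(uint32_t *sec, uint32_t *usec)
{
	clock_reads++;
	*sec = 1158600000u + (uint32_t)clock_reads;
	*usec = 0;
}

/* Called only when a socket actually wants the timestamp. A stamp that
 * was set earlier, for instance by the driver, is left untouched. */
static void deliver_timestamp(struct pkt *p)
{
	if (!p->tstamp_sec)
		slow_clock(&p->tstamp_sec, &p->tstamp_usec);
}
```

Packets that no socket asks about never touch the clock at all, which is the
saving being discussed for the fast path.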

Alexey

Re: [PATCH] please include in 2.6.18: e100 disable device on PCI error

2006-09-18 Thread Linas Vepstas


It seems our mails crossed.

On Mon, Sep 18, 2006 at 01:21:02PM -0700, Andrew Morton wrote:
 
> hm.  I don't have this patch queued, but I _do_ have an equivalent patch
> for e1000 queued; what's up with that?  Nobody seems to have paid much
> attention to the e1000 fix.

I spotted the e100 patch in your broken-out patches earlier today,
as part of git-netdev-all.patch (where it had the right changelog
and old acks).

> If we can gather the appropriate acks quickly then I expect we can get both
> of these into 2.6.18.

That would be great! Thanks.

--linas

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Vladimir B. Savkin

On Mon, Sep 18, 2006 at 01:27:57PM +0200, Andi Kleen wrote:
> The codebase for timing (and lots of other things) is quite different
> between 32bit and 64bit. You're really surprised it doesn't work if you do
> such things?

It works, and after your remark above, I'm surprised.
Dunno about slow TSC drift though, not enough time has passed to
detect it, and I hope we will have this problem solved in a better way
before the drift becomes visible :)

> > But the question is, why stock 2.6.18-rc7 could not use TSC on its own?
>
> x86-64 doesn't use the TSC when it deems it to not be reliable, which
> is the case on your system.

Could it at least print something so that I know that using TSC was
considered, but rejected?

> > What hardware exactly. Doesn't it affect only CPU? And they are not
> > known to fail before any other components.
>
> All hardware. It's basic physics.

Hm, what other hardware is affected by idle=poll? Does this option wear
out HDDs?
~
:wq
With best regards, 
   Vladimir Savkin. 


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Vladimir B. Savkin

On Mon, Sep 18, 2006 at 06:50:22PM +0200, Andi Kleen wrote:
 
> I suppose in the worst case a sysctl like Vladimir asked for could be added,
> but it would seem somewhat lame.

Please think about it this way:
suppose you have a heavily loaded router and some network problem is to
be diagnosed. You run tcpdump and suddenly the router becomes overloaded (by
switching to timestamp-it-all mode), drops OSPF adjacencies etc. Users
are angry, and you can't diagnose anything. But with imprecise
timestamps and maybe even with reordered packets you still have some
traces to analyze.
So, in this particular corner case it's not that lame.

Or maybe patching tcpdump will do better?
~
:wq
With best regards, 
   Vladimir Savkin. 


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread David Miller

From: Alexey Kuznetsov [EMAIL PROTECTED]
Date: Tue, 19 Sep 2006 01:03:21 +0400

> 1. It does not even rule out recording the timestamp inside the
>    driver, which Alan was afraid of. The sequence is:
>
>	if (!skb->tstamp.off_sec)
>		net_timestamp(skb);
>
> 2. Maybe netif_rx() should continue to take the timestamp itself.
>
> 3. NAPI already introduced almost the same inaccuracy. And it is really
>    silly to waste time getting a timestamp in netif_receive_skb() a few
>    moments before the packet is delivered to a socket.
>
> 4. ...but the clock source, which takes one of the top lines in
>    profiles, must still be repaired. :-)

Ok, ok, but don't we have queueing disciplines that need the timestamp
even on ingress?

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov

Hello!

> Please think about it this way:
> suppose you have a heavily loaded router and some network problem is to
> be diagnosed. You run tcpdump and suddenly the router becomes overloaded (by
> switching to timestamp-it-all mode

I am sorry. I cannot think that way. :-)

Instead of attempts to scare, better resend the original report
where you said how much performance degraded; I cannot find it.

* I do see get_offset_pmtmr() in top lines of profile. That's scary enough.
* I do not understand what the hell dhcp needs timestamps for.
* I will not listen to any suggestions to screw up tcpdump with a sysctl.
  The kernel already implements a much better thing than a sysctl.
  Do not want timestamps? Fix tcpdump, add an option, submit the
  patch to tcpdump maintainers. Not a big deal.

Alexey

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Vladimir B. Savkin

On Tue, Sep 19, 2006 at 02:00:38AM +0400, Alexey Kuznetsov wrote:
> Hello!
>
> > Please think about it this way:
> > suppose you have a heavily loaded router and some network problem is to
> > be diagnosed. You run tcpdump and suddenly the router becomes overloaded (by
> > switching to timestamp-it-all mode
>
> I am sorry. I cannot think that way. :-)
>
> Instead of attempts to scare, better resend the original report
> where you said how much performance degraded; I cannot find it.
>
> * I do see get_offset_pmtmr() in top lines of profile. That's scary enough.

I had it at the very top line.

> * I do not understand what the hell dhcp needs timestamps for.
> * I will not listen to any suggestions to screw up tcpdump with a sysctl.
>   The kernel already implements a much better thing than a sysctl.
>   Do not want timestamps? Fix tcpdump, add an option, submit the
>   patch to tcpdump maintainers. Not a big deal.

OK, point taken.
It's better to patch tcpdump.

 
> Alexey
 
~
:wq
With best regards, 
   Vladimir Savkin. 


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread David Lang


On Tue, 19 Sep 2006, Alexey Kuznetsov wrote:


> Hello!
>
> > Please think about it this way:
> > suppose you have a heavily loaded router and some network problem is to
> > be diagnosed. You run tcpdump and suddenly the router becomes overloaded (by
> > switching to timestamp-it-all mode
>
> I am sorry. I cannot think that way. :-)
>
> Instead of attempts to scare, better resend the original report
> where you said how much performance degraded; I cannot find it.
>
> * I do see get_offset_pmtmr() in top lines of profile. That's scary enough.
> * I do not understand what the hell dhcp needs timestamps for.
> * I will not listen to any suggestions to screw up tcpdump with a sysctl.
>   The kernel already implements a much better thing than a sysctl.
>   Do not want timestamps? Fix tcpdump, add an option, submit the
>   patch to tcpdump maintainers. Not a big deal.


If firing up one program (however minor) can cause network performance to
drop by 50% (based on the numbers reported earlier in this thread), that is
a significant problem for sysadmins.

Yes, tcpdump may be wrong in requesting timestamps (in most cases it probably
is, but in some cases it's doing exactly what the sysadmin wants it to do),
but I don't think that many sysadmins would expect this much of a performance
hit. There should be some way to tell the system to ignore requests for
timestamps so that a badly behaved program cannot cripple the system this way
(and preferably something that doesn't require a full SELinux/capabilities
implementation).


David Lang

Re: [PATCH 7/7] secid reconciliation-v02: Enforcement for SELinux

2006-09-18 Thread Paul Moore

On Friday 08 September 2006 12:50 pm, Venkat Yekkirala wrote:
> This defines SELinux enforcement of the 2 new LSM hooks.

{snip}

> +static int selinux_skb_policy_check(struct sk_buff *skb,
> +				unsigned short family)
> +{
> +	u32 xfrm_sid, trans_sid;
> +	int err;
> +
> +	if (selinux_compat_net)
> +		return 1;
> +
> +	err = selinux_xfrm_decode_session(skb, &xfrm_sid, 0);
> +	BUG_ON(err);

First, any reason against including the struct sock * in the LSM hook?  At a 
quick glance it looks like it is available at each place 
security_skb_policy_check() is invoked?  If there are no objections I would 
like to see it included in the hook.

Second, I wonder if it would be better to do a NetLabel/CIPSO query here using 
the xfrm_sid as the NetLabel base_sid instead of at the end of the function 
(see your comment)?  This way we wouldn't have to duplicate the 
avc_has_perm() and security_transition_sid() calls for both xfrm and 
NetLabel.  It just seems to be more in line with the whole secid
reconciliation concept.

I don't feel too strongly either way, I just thought it was worth exploring - 
thoughts?

> +	err = avc_has_perm(xfrm_sid, skb->secmark, SECCLASS_PACKET,
> +				PACKET__FLOW_IN, NULL);
> +	if (err)
> +		goto out;
> +
> +	if (xfrm_sid) {
> +		err = security_transition_sid(xfrm_sid, skb->secmark,
> +						SECCLASS_PACKET, &trans_sid);
> +		if (err)
> +			goto out;
> +
> +		skb->secmark = trans_sid;
> +	}
> +
> +	/* See if CIPSO can flow in thru the current secmark here */
> +
> +out:
> +	return err ? 0 : 1;
> +};

-- 
paul moore
linux security @ hp

Re: [PATCH][RFC] Re: high latency with TCP connections

2006-09-18 Thread Alexey Kuznetsov

Hello!

> There isn't any sort of clever short-circuiting in loopback is there?

No, not as far as I know.


> I do like the convenience of testing things over loopback, but always fret
> about not including drivers and actual hardware interrupts etc.

Well, if the test is right, it should show the cost of redundant ACKs.


> Regardless, kudos for running the test.  The only thing missing is the
> -c and -C options to enable the CPU utilization measurements which will
> then give the service demand on a CPU time per transaction basis.  Or
> was this a UP system that was taken to CPU saturation?

It is my notebook. :-) Of course, cpu consumption is 100%.
(Actually, netperf shows 100.10 :-))

I will redo the test on a real network. What range of -b should I test?


> What i'm thinking about isn't so much about the latency

I understand.

Actually, I did those tests ages ago for a pure throughput case,
when nothing goes in the opposite direction. I did not find a difference
that time. And nobody even noticed that Linux sends an ACK for _each_ small
segment on unidirectional connections for all those years. :-)

Alexey

Re: tcp congestion policy selection link order fragile

2006-09-18 Thread Stephen Hemminger

On Sun, 17 Sep 2006 16:51:50 +0200
bert hubert [EMAIL PROTECTED] wrote:

> The original message Stephen reacts to below apparently never made it to the
> list, it can be found here: http://ds9a.nl/tmp/module-policy.txt
>
> > Anybody who builds in random stuff without thinking is being foolish.
> > But, if you can think of a better configuration method that isn't too
> > grotty, then go for it.
>
> The method I'm proposing is simple enough:
>
> 1) reno is always built-in
> 2) it is the default tcp congestion policy

No, Reno is unstable in high BDP

> 3) loading/compiling-in additional tcp congestion policies only make them
>    available
> 4) userspace is free to select a non-default tcp congestion policy at will
>
> The implementation might be as simple as making the *first* registered
> congestion policy the default (instead of the last one) which would be reno,
> as it is in tcp_cong.o, which is probably always loaded first (as the other
> .o's need symbols that are in tcp_cong.o).
>
> Despite what you allege about my foolishness, I maintain that a kernel that
> enables a *random policy* from the ones you compiled in, is not a sane
> kernel.
>
> The default kernel should be as sane as possible, allowing the userspace
> people (ie, me) to mess things up to their heart's desire.
>
>	Bert
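
The difference between making the first or the last registered policy the
default comes down to where registration inserts into the list. A minimal
sketch, assuming the default is simply whatever sits at the head of the
list; struct cong_ops, struct registry and the helper names are
hypothetical stand-ins, not the kernel's tcp_cong.c code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a congestion-control registry. */
struct cong_ops {
	const char *name;
	struct cong_ops *next;
};

struct registry {
	struct cong_ops *head;	/* the entry at the head is the default */
};

/* Head insertion: the most recently registered policy becomes the default. */
static void register_at_head(struct registry *r, struct cong_ops *ca)
{
	assert(r != NULL && ca != NULL);
	ca->next = r->head;
	r->head = ca;
}

/* Tail insertion: the first registered policy stays the default. */
static void register_at_tail(struct registry *r, struct cong_ops *ca)
{
	struct cong_ops **pp;

	assert(r != NULL && ca != NULL);
	pp = &r->head;
	while (*pp)
		pp = &(*pp)->next;
	ca->next = NULL;
	*pp = ca;
}

/* Name of the current default, or NULL if nothing is registered. */
static const char *default_policy(const struct registry *r)
{
	return r->head ? r->head->name : NULL;
}
```

Registering reno, bic and vegas in that order leaves vegas as the default
with head insertion and reno with tail insertion, which is the
order-dependence the proposal is about.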
 

Re: [PATCH][RFC] Re: high latency with TCP connections

2006-09-18 Thread Rick Jones

> > Regardless, kudos for running the test.  The only thing missing is the
> > -c and -C options to enable the CPU utilization measurements which will
> > then give the service demand on a CPU time per transaction basis.  Or
> > was this a UP system that was taken to CPU saturation?
>
> It is my notebook. :-) Of course, cpu consumption is 100%.
> (Actually, netperf shows 100.10 :-))

Gotta love the accuracy. :)

> I will redo the test on a real network. What range of -b should I test?



I suppose that depends on your patience :) In theory, as you increase 
(eg double) the -b setting you should reach a point of diminishing 
returns wrt transaction rate.  If you see that, and see the service 
demand flattening-out I'd say it is probably time to stop.
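
One way to turn that stopping rule into something mechanical is sketched
below. It assumes you have already collected the transaction rate and the
service demand for each successive -b value; the function names, the array
layout and the tolerance are all assumptions of this sketch, not anything
netperf provides.

```c
#include <assert.h>
#include <stddef.h>

/* Relative change between two positive measurements, as a fraction. */
static double rel_change(double prev, double cur)
{
	double d;

	if (prev == 0.0)
		return cur == 0.0 ? 0.0 : 1.0;	/* treat a jump from zero as large */
	d = (cur - prev) / prev;
	return d < 0.0 ? -d : d;
}

/*
 * rate[i] and demand[i] are the results for the i-th -b setting tried.
 * Returns the index of the first step at which both the transaction
 * rate and the service demand changed by less than tol relative to the
 * previous step, or n if that never happens.
 */
static size_t first_flat_step(const double *rate, const double *demand,
			      size_t n, double tol)
{
	size_t i;

	assert(rate != NULL && demand != NULL);
	for (i = 1; i < n; i++) {
		if (rel_change(rate[i - 1], rate[i]) < tol &&
		    rel_change(demand[i - 1], demand[i]) < tol)
			return i;
	}
	return n;
}
```

A return value of n simply means the results are still moving and a larger
-b is worth trying.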


I'm also not quite sure if abc needs to be disabled or not.

I do know that I left-out one very important netperf option.  The 
command line should be:


netperf -t TCP_RR -H foo -- -b N -D

where -D is added to set TCP_NODELAY.  Otherwise, the ratio of 
transactions to data segments is fubar.  That issue is also why I wonder 
about the setting of tcp_abc.


[I have this quixotic pipe dream about being able to --enable-burst, set
-D and say that the number of TCP segments exchanged on the network is
2X the transaction count when request and response size are < MSS.  The
raison d'etre for this pipe dream is maximizing PPS with TCP_RR tests
without _having_ to have hundreds if not thousands of simultaneous
netperfs/connections - say with just as many netperfs/connections as
there are CPUs or threads/strands in the system. It was while trying to
make this pipe dream a reality I first noticed that HP-UX 11i, which
normally has a very nice ACK avoidance heuristic, would send an
immediate ACK if it received back-to-back sub-MSS segments - thus
ruining my pipe dream when it came to HP-UX testing.  Happily, I noticed
that Linux didn't seem to be doing the same thing. Hence my tweaking
when seeing this patch come along...]



> > What i'm thinking about isn't so much about the latency
>
> I understand.
>
> Actually, I did those tests ages ago for a pure throughput case,
> when nothing goes in the opposite direction. I did not find a difference
> that time. And nobody even noticed that Linux sends an ACK for _each_ small
> segment on unidirectional connections for all those years. :-)


Not everyone looks very closely (alas, sometimes myself included).

If all anyone does is look at throughput, until they CPU saturate they 
wouldn't notice.  Heck, before netperf and TCP_RR tests, and sadly even 
still today, most people just look at how fast a single-connection, 
unidirectional data transfer goes and leave it at that :(


Thankfully, the set of most people and netdev aren't completely 
overlapping.


rick jones

[PATCH] xt_policy: remove dups in .family

2006-09-18 Thread Alexey Dobriyan

Fix sparse "defined twice" warning: .family is initialized twice in both
match structures.

Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
---

 net/netfilter/xt_policy.c |2 --
 1 file changed, 2 deletions(-)

--- a/net/netfilter/xt_policy.c
+++ b/net/netfilter/xt_policy.c
@@ -171,7 +171,6 @@ static struct xt_match policy_match = {
 	.match		= match,
 	.matchsize	= sizeof(struct xt_policy_info),
 	.checkentry	= checkentry,
-	.family		= AF_INET,
 	.me		= THIS_MODULE,
 };
 
@@ -181,7 +180,6 @@ static struct xt_match policy6_match = {
 	.match		= match,
 	.matchsize	= sizeof(struct xt_policy_info),
 	.checkentry	= checkentry,
-	.family		= AF_INET6,
 	.me		= THIS_MODULE,
 };
 


[PATCH] Remove powerpc specific parts of 3c509 driver

2006-09-18 Thread Stephen Rothwell

On powerpc and ppc, insl_ns and insl are identical as are outsl_ns and
outsl, so remove the conditional use of insl_ns and outsl_ns.

Signed-off-by: Stephen Rothwell [EMAIL PROTECTED]
---
 drivers/net/3c509.c |9 -
 1 files changed, 0 insertions(+), 9 deletions(-)

This is in anticipation of removing the insl_ns and outsl_ns definitions
which are powerpc specific patches.

-- 
Cheers,
Stephen Rothwell[EMAIL PROTECTED]

diff --git a/drivers/net/3c509.c b/drivers/net/3c509.c
index cbdae54..add6381 100644
--- a/drivers/net/3c509.c
+++ b/drivers/net/3c509.c
@@ -879,11 +879,7 @@ #endif
 	outw(skb->len, ioaddr + TX_FIFO);
 	outw(0x00, ioaddr + TX_FIFO);
 	/* ... and the packet rounded to a doubleword. */
-#ifdef  __powerpc__
-	outsl_ns(ioaddr + TX_FIFO, skb->data, (skb->len + 3) >> 2);
-#else
 	outsl(ioaddr + TX_FIFO, skb->data, (skb->len + 3) >> 2);
-#endif
 
 	dev->trans_start = jiffies;
 	if (inw(ioaddr + TX_FREE) > 1536)
@@ -1103,13 +1099,8 @@ el3_rx(struct net_device *dev)
 				skb_reserve(skb, 2); /* Align IP on 16 byte */
 
 				/* 'skb->data' points to the start of sk_buff data area. */
-#ifdef  __powerpc__
-				insl_ns(ioaddr+RX_FIFO, skb_put(skb,pkt_len),
-					(pkt_len + 3) >> 2);
-#else
 				insl(ioaddr + RX_FIFO, skb_put(skb,pkt_len),
 					 (pkt_len + 3) >> 2);
-#endif
 
 				outw(RxDiscard, ioaddr + EL3_CMD); /* Pop top Rx packet. */
 				skb->protocol = eth_type_trans(skb,dev);
-- 
1.4.2.1
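
For reference, the (len + 3) >> 2 expression used for the FIFO transfers
converts a byte count into a count of 32-bit words, rounding up. A small
standalone sketch (the helper name dwords_for_len() is made up for
illustration and does not appear in the driver):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of 32-bit words needed to hold len bytes.  Adding 3 before
 * shifting right by 2 (dividing by 4) rounds any partial word up.
 */
static size_t dwords_for_len(size_t len)
{
	return (len + 3) >> 2;
}
```

So a 60-byte frame is moved as 15 words and a 61-byte frame as 16, with the
last word carrying up to three bytes of padding.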


Re: [PATCH] Add Broadcom PHY support

2006-09-18 Thread Jeff Garzik


Amy Fong wrote:

> [PATCH] Add Broadcom PHY support
>
> This patch adds a driver to support the bcm5421s and bcm5461s PHY
>
> Kernel version:  linux-2.6.18-rc6
>
> Signed-off-by: Amy Fong [EMAIL PROTECTED]


And... where are the users of this phy driver?

Jeff




[PATCH] tcp: set congestion default through Kconfig

2006-09-18 Thread Stephen Hemminger

Bert's attempt was noble
It showed your desire for the truth
A simple path exists

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 net/ipv4/Kconfig   |   39 +--
 net/ipv4/sysctl_net_ipv4.c |7 +++
 net/ipv4/tcp_cong.c|2 +-
 3 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 90f9136..e922c3a 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -573,12 +573,47 @@ config TCP_CONG_VENO
 	  loss packets.
 	  See http://www.ntu.edu.sg/home5/ZHOU0022/papers/CPFu03a.pdf
 
+choice
+	prompt "Default TCP congestion control"
+	default DEFAULT_BIC
+	help
+	  Select the TCP congestion control that will be used by default
+	  for all connections.
+
+	config DEFAULT_BIC
+		bool "Bic" if TCP_CONG_BIC=y
+
+	config DEFAULT_CUBIC
+		bool "Cubic" if TCP_CONG_CUBIC=y
+
+	config DEFAULT_HTCP
+		bool "Htcp" if TCP_CONG_HTCP=y
+
+	config DEFAULT_VEGAS
+		bool "Vegas" if TCP_CONG_VEGAS=y
+
+	config DEFAULT_WESTWOOD
+		bool "Westwood" if TCP_CONG_WESTWOOD=y
+
+	config DEFAULT_RENO
+		bool "Reno"
+
+endchoice
+
 endmenu
 
-config TCP_CONG_BIC
-	tristate
+config DEFAULT_BIC
 	depends on !TCP_CONG_ADVANCED
 	default y
 
+config DEFAULT_TCP_CONG
+	string
+	default "bic" if DEFAULT_BIC
+	default "cubic" if DEFAULT_CUBIC
+	default "htcp" if DEFAULT_HTCP
+	default "vegas" if DEFAULT_VEGAS
+	default "westwood" if DEFAULT_WESTWOOD
+	default "reno" if DEFAULT_RENO
+
 source net/ipv4/ipvs/Kconfig
 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 19b2071..52b6481 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -129,6 +129,13 @@ static int sysctl_tcp_congestion_control
 	return ret;
 }
 
+static __init void tcp_congestion_default(void)
+{
+	tcp_set_default_congestion_control(CONFIG_DEFAULT_TCP_CONG);
+}
+
+late_initcall(tcp_congestion_default);
+
 
 ctl_table ipv4_table[] = {
 {
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 7ff2e42..af0aca1 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -48,7 +48,7 @@ int tcp_register_congestion_control(stru
 		printk(KERN_NOTICE "TCP %s already registered\n", ca->name);
 		ret = -EEXIST;
 	} else {
-		list_add_rcu(&ca->list, &tcp_cong_list);
+		list_add_tail_rcu(&ca->list, &tcp_cong_list);
 		printk(KERN_INFO "TCP %s registered\n", ca->name);
 	}
 	spin_unlock(&tcp_cong_list_lock);
-- 
1.4.1
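
How a configured name might be turned into the default can be sketched as
follows. This assumes, as discussed in the thread, that the default policy
is the one at the head of the registration list, so setting a default by
name amounts to moving that entry to the front. struct policy,
set_default() and the fallback #define are hypothetical; only the
CONFIG_DEFAULT_TCP_CONG name comes from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the Kconfig-generated string; the value is an assumption. */
#ifndef CONFIG_DEFAULT_TCP_CONG
#define CONFIG_DEFAULT_TCP_CONG "reno"
#endif

/* Hypothetical singly linked registry entry. */
struct policy {
	const char *name;
	struct policy *next;
};

/*
 * Move the entry called name to the head of the list so that it becomes
 * the default.  Returns 0 on success, -1 if no such entry is registered.
 */
static int set_default(struct policy **head, const char *name)
{
	struct policy **pp = head;
	struct policy *ca;

	assert(head != NULL && name != NULL);
	while (*pp && strcmp((*pp)->name, name) != 0)
		pp = &(*pp)->next;
	if (!*pp)
		return -1;

	ca = *pp;
	*pp = ca->next;		/* unlink from its current position */
	ca->next = *head;	/* relink in front of the old head */
	*head = ca;
	return 0;
}
```

Because policies are appended in registration order, a late init step along
the lines of set_default(&list, CONFIG_DEFAULT_TCP_CONG) would pick the
configured one regardless of link order.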


Re: [PATCH] tcp: set congestion default through Kconfig

2006-09-18 Thread Ian McDonald


> Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
> ---
>  net/ipv4/Kconfig   |   39 +--
>  net/ipv4/sysctl_net_ipv4.c |7 +++
>  net/ipv4/tcp_cong.c|2 +-
>  3 files changed, 45 insertions(+), 3 deletions(-)


Nice solution.

Signed-off-by: Ian McDonald [EMAIL PROTECTED]
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Andi Kleen

On Monday 18 September 2006 23:03, Alexey Kuznetsov wrote:

 
> > And do you have some other preferred way to solve this? Even if the timer
> > was fast it would be still good to avoid it in the fast path when DHCPD
> > is running.
>
> No. The way you suggested seems to be the best.

Ok. I also checked my desktop and for some reason I got a timestamp counter
of 7 (and it doesn't even run client dhcp). Haven't investigated why yet,
and I am still hoping it's not a leak.

But that hints that trying to fix all of user space to not use the ioctl 
would have been probably too much work.


> 1. It does not even rule out recording the timestamp inside the
>    driver, which Alan was afraid of. The sequence is:
>
>	if (!skb->tstamp.off_sec)
>		net_timestamp(skb);
>
> 2. Maybe netif_rx() should continue to take the timestamp itself.

Hmm, there are still quite a lot of users and even with netif_rx() you
can have long delays from interrupt mitigation etc.

% grep -rw netif_rx drivers/net/*  | wc -l
253

> 3. NAPI already introduced almost the same inaccuracy. And it is really
>    silly to waste time getting a timestamp in netif_receive_skb() a few
>    moments before the packet is delivered to a socket.
>
> 4. ...but the clock source, which takes one of the top lines in
>    profiles, must still be repaired. :-)

It's being worked on, but it'll take some time. But even when TSC 
can be used it's still a good idea to not call gtod unnecessarily 
because it can be still relatively slow (e.g. on P4 RDTSC takes
hundreds of cycles because it synchronizes the CPU). Also on some 
other non x86 platforms it is also relatively slow because they have 
to reach out to the chipset and every time you do that things get slow.

-Andi

