date:20071203

[PATCH] SMC911X: Fix using of dereferenced skb after netif_rx

2007-12-03 Thread Wang Chen

After netif_rx(), skb will be freed. So get the
skb-len before netif_rx().

Signed-off-by: Wang Chen [EMAIL PROTECTED]
---
 smc911x.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

--- linux-2.6.24.rc3.org/drivers/net/smc911x.c  2007-11-19 12:38:05.0 
+0800
+++ linux-2.6.24.rc3/drivers/net/smc911x.c  2007-11-30 15:00:53.0 
+0800
@@ -1287,7 +1287,7 @@ smc911x_rx_dma_irq(int dma, void *data)
struct smc911x_local *lp = netdev_priv(dev);
struct sk_buff *skb = lp-current_rx_skb;
unsigned long flags;
-   unsigned int pkts;
+   unsigned int pkts, len;
 
DBG(SMC_DEBUG_FUNC, %s: -- %s\n, dev-name, __FUNCTION__);
DBG(SMC_DEBUG_RX | SMC_DEBUG_DMA, %s: RX DMA irq handler\n, 
dev-name);
@@ -1299,9 +1299,10 @@ smc911x_rx_dma_irq(int dma, void *data)
PRINT_PKT(skb-data, skb-len);
dev-last_rx = jiffies;
skb-protocol = eth_type_trans(skb, dev);
+   len = skb-len;
netif_rx(skb);
dev-stats.rx_packets++;
-   dev-stats.rx_bytes += skb-len;
+   dev-stats.rx_bytes += len;
 
spin_lock_irqsave(lp-lock, flags);
pkts = (SMC_GET_RX_FIFO_INF()  RX_FIFO_INF_RXSUSED_)  16;

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches

2007-12-03 Thread Ian McDonald

On 12/3/07, Arnaldo Carvalho de Melo [EMAIL PROTECTED] wrote:
 WARNING: After reading some messages from Ingo Molnar on lkml I think we 
 should really
  trim the number of lists we use for kernel development. And since I 
 moved
  back to using mutt for reading e-mails, something I should have 
 never, ever
  stopped doing, I guess we should move the DCCP discussions to netdev,
  where we hopefully can get more people interested and reviewing the 
 work we
  do, so please consider moving DCCP discussion to 
 netdev@vger.kernel.org,
  where lots of smart networking folks are present and can help our 
 efforts
  on turning RFCs to code.

I (and others too) don't necessarily have time to read netdev so would
vote on keeping dccp. I would totally agree to making sure that
cross-post to netdev as well as dccp.

Ian
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: namespace support requires network modules to say GPL

2007-12-03 Thread Denis V. Lunev

Patrick McHardy wrote:
 Adrian Bunk wrote:
 On Sun, Dec 02, 2007 at 09:03:56PM +0100, Patrick McHardy wrote:

 For all I care binary modules can break, but frankly I don't see
 how encapsulating a couple of structures and pointers in a new
 structure and adding a new argument to existing functions shifts
 the decision about how a function should be usable to the namespace
 guys. IMO all functions should continue to be usable as before,
 as decided by whoever actually wrote them.
 ...

 Even ignoring the fact that it's unclear whether distributing modules
 with not GPLv2 compatible licences is legal at all or might bring you
 in jail,
 
 Agreed, lets ignore that :)
 
 your statement has an interesting implication:

 Stuff like e.g. the EXPORT_SYMBOL(sk_alloc) predates the
 EXPORT_SYMBOL_GPL stuff.

 Who is considered the author of this code?

 And when should he state whether he prefers to use EXPORT_SYMBOL_GPL
 but wasn't able to use it at that when he wrote it since his code
 predates it and is glad to be able to decide this now?
 
 
 He can state it when he feels like it, I don't see the point.
 Authors generally get to decide whether they use EXPORT_SYMBOL
 or EXPORT_SYMBOL_GPL unless in cases where its really clear-cut
 that EXPORT_SYMBOL is inapproriate. But thats a different matter.
 
 If a symbol was OK to be used previously and something using it
 would not automatically be considered a derived work, how does
 passing init_net to the function just to make the compiler
 happy, avoid BUG_ONs and generally keep things working as before
 make it more of a derived work?

We, namely, Pavel Emelyanov and me, if we have some rights as a
committers to this staff :), do not mind against change
EXPORT_SYMBOL_GPL to EXPORT_SYMBOL.

Regards,
Den
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH][RESEND] PHY: Add the phy_device_release device method.

2007-12-03 Thread Thierry Reding

In cases where more than a single PHY is found on the MDIO bus, the kernel
will print a warning that this method is missing for each PHY device that
is not attached to a networking device.

Signed-off-by: Thierry Reding [EMAIL PROTECTED]
---
 drivers/net/phy/mdio_bus.c |   19 ++-
 1 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index fc2f0e6..cb7fb47 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -36,6 +36,23 @@
 #include asm/uaccess.h
 
 /**
+ * phy_device_release - free a phy_device structure when all users of it are
+ * finished.
+ *
+ * @dev: device that's been disconnected
+ *
+ * Will be called only by the device core when all users of this phy_device
+ * are done.
+ */
+static void phy_device_release(struct device *dev)
+{
+   struct phy_device *phy;
+
+   phy = to_phy_device(dev);
+   kfree(phy);
+}
+
+/**
  * mdiobus_register - bring up all the PHYs on a given bus and attach them to 
bus
  * @bus: target mii_bus
  *
@@ -83,6 +100,7 @@ int mdiobus_register(struct mii_bus *bus)
if (phydev) {
phydev-irq = bus-irq[i];
 
+   phydev-dev.release = phy_device_release;
phydev-dev.parent = bus-dev;
phydev-dev.bus = mdio_bus_type;
snprintf(phydev-dev.bus_id, BUS_ID_SIZE, PHY_ID_FMT, 
bus-id, i);
@@ -112,7 +130,6 @@ void mdiobus_unregister(struct mii_bus *bus)
for (i = 0; i  PHY_MAX_ADDR; i++) {
if (bus-phy_map[i]) {
device_unregister(bus-phy_map[i]-dev);
-   kfree(bus-phy_map[i]);
}
}
 }


signature.asc
Description: Digital signature

Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches

2007-12-03 Thread Gerrit Renker

Hi Arnaldo,

hank you for going through this. I have just backported your recent patches of 
2.6.25
to the DCCP/CCID4/Faster Restart test tree at 
git://eden-feed.erg.abdn.ac.uk/dccp_exp {dccp,ccid4,dccp_fr}
as per subsequent message.
|do, so please consider moving DCCP discussion to 
netdev@vger.kernel.org,
|where lots of smart networking folks are present and can help our 
efforts
|on turning RFCs to code.
Are you suggesting using netdev exclusively or in addition to [EMAIL PROTECTED]
  

|   Please take a look at this patch series where I reorganized your work 
on the new
| TFRC rx history handling code. I'll wait for your considerations and then do 
as many
| interactions as reasonable to get your work merged.
| 
|   It should be completely equivalent, plus some fixes and optimizations, 
such as:
It will be necessary to address these points one-by-one. Before diving into 
making
fixes and `optimisations', have you tested your code? The patches you are 
referring to
have been posted and re-posted for a period of over 9 months on [EMAIL 
PROTECTED], and
there are regression tests which show that this code improves on the existing 
Linux
implementation. These are labelled as `test tree' on 
http://www.linux-foundation.org/en/Net:DCCP_Testing#Regression_testing
So if you are making changes to the code, I would like to ask if you have run 
similar
regression tests, to avoid having to step back later.


 
| . The code that allocates the RX ring deals with failures when one of the 
entries in
|   the ring buffer is not successfully allocated, the original code was 
leaking the
|   successfully allocated entries.
| 
| . We do just one allocation for the ring buffer, as the number of entries is 
fixed we
|   should just do one allocation and not TFRC_NDUPACK times.
Will look at the first point in the patch; with regard to the second point I 
agree, this
will make the code simpler, which is good. 

| . I haven't checked if all the code was commited, as I tried to introduce 
just what was
|   immediatelly used, probably we'll need to do some changes when working on 
the merge
|   of your loss intervals code.
Sorry I don't understand this point.

| . I changed the ccid3_hc_rx_packet_recv code to set hcrx-ccid3hcrx_s for the 
first
|   non-data packet instead of calling ccid3_hc_rx_set_state, that would use 0 
as the
|   initial value in the EWMA calculation.
This is a misunderstanding. Non-data packets are not considered in the moving 
average
for the data packet size `s'; and it would be an error to do (consider 40byte 
Acks vs.
1460byte data packets, also it is in RFC 4342). 
Where would the zero initial value come from? I think this is also a 
misunderstanding.
Please have a look below:
static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff 
*skb)
{
// ...
u32 sample, payload_size = skb-len - dccp_hdr(skb)-dccph_doff 
* 4;

if (unlikely(hcrx-ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) {
if (is_data_packet) {
do_feedback = FBACK_INITIAL;
ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
ccid3_hc_rx_update_s(hcrx, payload_size);
}
goto update_records;
}

== Non-data packets are ignored for the purposes of computing s (this is in 
the RFC),
consequently update_s() is only called for data packets; using the two 
following
functions:


static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 
weight)
{
return avg ? (weight * avg + (10 - weight) * newval) / 10 : 
newval;
}

static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, 
int len)
{
if (likely(len  0))/* don't update on empty packets (e.g. 
ACKs) */
hcrx-ccid3hcrx_s = tfrc_ewma(hcrx-ccid3hcrx_s, len, 
9);
}

== Hence I can't see where a zero value should come from: ccid3hrx_s is 
initially 
initialised with zero (memset(...,0,...)); when first called, update_s() 
will
feed a non-zero payload size to tfrc_ewma(), which will return  `newval' = 
payload_size,
hence the first data packet will contribute a non-zero payload_size.
Zero-sized DCCP-Data packets are pathological and are ignored by the CCID 
calculations
(not by the receiver); a corresponding counterpart for zero-sized

| 
|   It is available at:
| 
| master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25
| 
Need to do this separately. As said, the code has been developed and tested 
over a long time,
it took a long while until it acted predictably, so being careful is very 
important.

I would rather not have my patches merged and continue to run a test tree if 
the current
changes alter the behaviour to the worse.
--
To unsubscribe

Re: [PATCH (resubmit)] Fix inet_diag.ko register vs rcv race

2007-12-03 Thread Pavel Emelyanov

Herbert Xu wrote:
 On Thu, Nov 29, 2007 at 04:01:25PM +0300, Pavel Emelyanov wrote:
 @@ -863,13 +861,13 @@ int inet_diag_register(const struct inet_diag_handler 
 *h)
  if (type = INET_DIAG_GETSOCK_MAX)
  goto out;
  
 -spin_lock(inet_diag_register_lock);
 +mutex_lock(inet_diag_mutex);
  err = -EEXIST;
  if (inet_diag_table[type] == NULL) {
  inet_diag_table[type] = h;
  err = 0;
  }
 -spin_unlock(inet_diag_register_lock);
 +mutex_unlock(inet_diag_mutex);
 
 Actually this causes a dead-lock when the handlers are built as modules
 because we try to load them with that mutex held.

Ouch! Sorry, I didn't notice this :(

 I've fixed it with this patch on top.

Thanks!

 Cheers,

Pavel

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-2.6.25 1/6][CORE] Remove unneeded ifdefs from sysctl_net_core.c

2007-12-03 Thread Pavel Emelyanov

Eric W. Biederman wrote:
 Pavel Emelyanov [EMAIL PROTECTED] writes:
 
 They include the whole file, but it is already compiled
 out when SYSCTL=n, since it is obj-$(CONFIG_SYSCTL) target
 in the Makefile.
 
 Pavel thanks for sending these patches.  Might I ask
 for some level of acknowledgement when you rework one of
 my patches and send it off.

Sure. I though I've been doing exactly this thing. I've said
that unix ctls were from your tree and so on.

 I suppose this could be a case of duplicate thinking but
 this patch looks very familiar.

Hm... Do you mean that this one is from your tree? Sorry then,
I didn't know this. I can resend it with From: you if you wish.

 Eric

Thanks,
Pavel

 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

 ---

 diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
 index 113cc72..277c8fa 100644
 --- a/net/core/sysctl_net_core.c
 +++ b/net/core/sysctl_net_core.c
 @@ -13,8 +13,6 @@
  #include net/sock.h
  #include net/xfrm.h
  
 -#ifdef CONFIG_SYSCTL
 -
  ctl_table core_table[] = {
  #ifdef CONFIG_NET
  {
 @@ -151,5 +149,3 @@ ctl_table core_table[] = {
  },
  { .ctl_name = 0 }
  };
 -
 -#endif
 -- 
 1.5.3.4

 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: namespace support requires network modules to say GPL

2007-12-03 Thread Romano Giannetti



On Sat, 2007-12-01 at 22:34 -0500, Mark Lord wrote:
 Stephen Hemminger wrote:.
 
  I spoke too soon earlier, ndiswrapper builds and loads against current
  2.6.24-rc3. Vmware and proprietary VPN software probably do not. Once again 
  I don't
  give a damn, but the enterprise distro vendors certainly care.
 ...
 
 Naw, enterprise (or any other) distro vendors shouldn't have any issues here,
 since they can just patch their kernels around any issues.

Please pardon me for jumping in; I am not a kernel developer, but I try
to help with debugging whenever I can (and it's not just hand-waving, I
helped to track down a couple of nasty bugs on MMC or ACPI EC,
recently). And I am an engineer and IANAL, so I wouldn't speak about
laws here. But I think it's not just a distribution's problem.

Unfortunately, I need VMware and ndiswrapper to get work done with my
laptop. It's not the perfect world, but the only alternative is to boot
in XP. So I normally stick with vendors kernels and, when I have time to
play with new kernel, I go for it. If ndiswrapper and VMware work,
perfect, I can test extensively the new kernel; if I find problems, I
*know* I have to restart without proprietary modules, try to reproduce,
report back. I did it a lot of times.

What I think is that every time VMware or (worst) ndiswrapper breaks,
the kernel loose an awful lot of testers. In the span of time before
Giri and the VMware team post a patch (-rc1 and -rc4, tipically), my
testing activity is just occasional. And I guess a lot of people is in
the same situation. 

These are just my 2cents. I will continue to test new kernels every time
I can, and to use native solutions as often as I can (go, ath5k, go!;
and LabWindows/CVI for Linux, anyone?). But maybe a bit of tolerance can
help everyone...

Back in my hole,

Romano 


-- 
Sorry for the disclaimer --- ¡I cannot stop it!



--
La presente comunicación tiene carácter confidencial y es para el exclusivo uso 
del destinatario indicado en la misma. Si Ud. no es el destinatario indicado, 
le informamos que cualquier forma de distribución, reproducción o uso de esta 
comunicación y/o de la información contenida en la misma están estrictamente 
prohibidos por la ley. Si Ud. ha recibido esta comunicación por error, por 
favor, notifíquelo inmediatamente al remitente contestando a este mensaje y 
proceda a continuación a destruirlo. Gracias por su colaboración.

This communication contains confidential information. It is for the exclusive 
use of the intended addressee. If you are not the intended addressee, please 
note that any form of distribution, copying or use of this communication or the 
information in it is strictly prohibited by law. If you have received this 
communication in error, please immediately notify the sender by reply e-mail 
and destroy this message. Thank you for your cooperation. 
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: SSB: No is not an answer

2007-12-03 Thread Jarek Poplawski

On 01-12-2007 21:27, Michael Buesch wrote:
 On Saturday 01 December 2007 20:00:23 Arnaldo Carvalho de Melo wrote:
 Em Sat, Dec 01, 2007 at 12:45:32PM -0500, John W. Linville escreveu:
 On Sat, Dec 01, 2007 at 03:17:44PM -0200, Arnaldo Carvalho de Melo wrote:
 Sonics Silicon Backplane support (SSB) [M/y/?] (NEW) n

 Support for the Sonics Silicon Backplane bus.
 You only need to enable this option, if you are
 configuring a kernel for an embedded system with
 this bus.
 It will be auto-selected if needed in other
 environments.

 The module will be called ssb.

 If unsure, say N.

 Sonics Silicon Backplane support (SSB) [M/y/?] (NEW)
 I think this is OK -- it isn't really offering the choice to say
 no anyway.  You must have turned-on B44 or B43(LEGACY) already?

 So, your choice is merely whether to have it built-in or as a module.
 Ok, so the comment on being unsure is wrong as we can't say N as
 suggested :-)
 
 Oh, come on... Read the _whole_ comment.

Why? IMHO If unsure... are kind of defaults just to let you not read
or understand the whole comment. But here even reading the whole
doesn't make this less amusing, because it suggests you still have the
choice (eg. to revert auto-selection).

Regards,
Jarek P.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH]: fix lro_gen_skb() alignment

2007-12-03 Thread Herbert Xu

On Sun, Dec 02, 2007 at 07:07:26PM -0500, Andrew Gallatin wrote:
 
 That was my first thought as well, but it turns out that
 when lro_gen_skb() is called via the out1 label, mac_hdr_len
 may not be known.  It seemed simplest and cleanest to just
 make it a field in lro_mgr.

Fair enough.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/2] [IPSEC]: Reinject packet instead of calling netfilter directly on input

2007-12-03 Thread Herbert Xu

On Thu, Nov 29, 2007 at 03:49:34PM -0500, jamal wrote:
 Herbert,
 
 This is a simplified version of one of your earlier patches that never
 made it in. I liked it so much that i reduced it to this and infact
 given the cycles today, tested it (with transport and tunnel mode
 only;-).

Sorry for the late response Jamal.  I've been too busy to give
this issue proper thought.  It's still in my mailbox so I will
respond to it once things quiten down a little.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/2] netem: trace enhancement

2007-12-03 Thread Ariane Keller




Patrick McHardy wrote:

Ariane Keller wrote:

Thanks for your comments!

I'd like to better understand your dislike of the current 
implementation  of the data transfer from user space to kernel space.

Is it the fact that we use configfs?
I think, we had already a discussion about this (and we changed from 
procfs to configfs).
Or don't you like that we need a user space daemon which is 
responsible for feeding the data to the kernel module?
I think we do not have another option, since the trace file may be of 
arbitrary length.

Or anything else?



I dislike using anything besides rtnetlink for qdisc configuration.
The only way to transfer arbitary amounts of data over netlink would
be to spread the data over multiple messages. But then again, you're
using kmalloc and only seem to allocate 4k, so how large are these
traces in practice?


For each packet to be processed there is 32bit of data, which encodes 
delay and drop, duplicate etc. The size of the actual trace file can 
therefore reach any length, depending on for how many packets the 
information is encoded (up to several GB).
Therefore we send the trace file in chunks of 4000bytes to the kernel. 
In order to have always a packet-delay-value ready, we maintain two 
delay queues in the kernel (each of 4k). In a first step, both queues 
are filled, and the values are read from the first queue, if this queue 
is finished, we read values from the second queue and fill the first 
queue with new values from the trace file etc. Therefore we have a user 
space process running, which reads the values from the trace file, and 
sends them to the kernel.


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2.6.25] net: removes unnecessary dependencies for net_namespace.h

2007-12-03 Thread Denis V. Lunev

This patch removes some unneeded includes for net_namespace.h to speed up
compilation.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index f285de6..28b7f25 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -2,7 +2,6 @@
 #define __NET_PKT_CLS_H
 
 #include linux/pkt_cls.h
-#include net/net_namespace.h
 #include net/sch_generic.h
 #include net/act_api.h
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 43e3cd9..a04e361 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -57,7 +57,6 @@
 #include asm/atomic.h
 #include net/dst.h
 #include net/checksum.h
-#include net/net_namespace.h
 
 /*
  * This structure really needs to be cleaned up.
@@ -95,6 +94,7 @@ typedef struct {
 
 struct sock;
 struct proto;
+struct net;
 
 /**
  * struct sock_common - minimal network layer representation of sockets
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: e1000 driver problems

2007-12-03 Thread Lukas Hejtmanek

On Tue, Nov 27, 2007 at 10:23:00AM -0800, Kok, Auke wrote:
 can you see if your problem goes away with this patch?

I cannot test it right now but friend of mine has the same card with 2.6.23.1
kernel. it does not. he also tried module 7.6.12 from source fourge, your patch
did not help. Moreover, it just hangs network connections after while. No
message in dmesg about hangup.

-- 
Lukáš Hejtmánek
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH][IPVS] Don't leak sysctl tables if the scheduler registration fails

2007-12-03 Thread Pavel Emelyanov

In case we load lblc or lblcr module we can leak some sysctl
tables if the call to register_ip_vs_scheduler() fails.

I've looked at the register_ip_vs_scheduler() code and saw, that
the only reason to fail is the name collision, so I think that
with some 3rd party schedulers this becomes a relevant issue. No?

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/ipv4/ipvs/ip_vs_lblc.c b/net/ipv4/ipvs/ip_vs_lblc.c
index b843a11..ad89644 100644
--- a/net/ipv4/ipvs/ip_vs_lblc.c
+++ b/net/ipv4/ipvs/ip_vs_lblc.c
@@ -580,9 +580,14 @@ static struct ip_vs_scheduler ip_vs_lblc_scheduler =
 
 static int __init ip_vs_lblc_init(void)
 {
+   int ret;
+
INIT_LIST_HEAD(ip_vs_lblc_scheduler.n_list);
sysctl_header = register_sysctl_table(lblc_root_table);
-   return register_ip_vs_scheduler(ip_vs_lblc_scheduler);
+   ret = register_ip_vs_scheduler(ip_vs_lblc_scheduler);
+   if (ret)
+   unregister_sysctl_table(sysctl_header);
+   return ret;
 }
 
 
diff --git a/net/ipv4/ipvs/ip_vs_lblcr.c b/net/ipv4/ipvs/ip_vs_lblcr.c
index e5b323a..2a5ed85 100644
--- a/net/ipv4/ipvs/ip_vs_lblcr.c
+++ b/net/ipv4/ipvs/ip_vs_lblcr.c
@@ -769,9 +769,14 @@ static struct ip_vs_scheduler ip_vs_lblcr_scheduler =
 
 static int __init ip_vs_lblcr_init(void)
 {
+   int ret;
+
INIT_LIST_HEAD(ip_vs_lblcr_scheduler.n_list);
sysctl_header = register_sysctl_table(lblcr_root_table);
-   return register_ip_vs_scheduler(ip_vs_lblcr_scheduler);
+   ret = register_ip_vs_scheduler(ip_vs_lblcr_scheduler);
+   if (ret)
+   unregister_sysctl_table(sysctl_header);
+   return ret;
 }
 
 
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] SMC911X: Fix using of dereferenced skb after netif_rx

2007-12-03 Thread Wang Chen

Herbert Xu said the following on 2007-12-3 18:11:
 Please send driver patches to Jeff Garzik [EMAIL PROTECTED].
 

Sorry. Resend the patch.

After netif_rx(), skb will be freed. So get the
skb-len before netif_rx().

Signed-off-by: Wang Chen [EMAIL PROTECTED]
---
 smc911x.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

--- linux-2.6.24.rc3.org/drivers/net/smc911x.c  2007-11-19 12:38:05.0 
+0800
+++ linux-2.6.24.rc3/drivers/net/smc911x.c  2007-11-30 15:00:53.0 
+0800
@@ -1287,7 +1287,7 @@ smc911x_rx_dma_irq(int dma, void *data)
struct smc911x_local *lp = netdev_priv(dev);
struct sk_buff *skb = lp-current_rx_skb;
unsigned long flags;
-   unsigned int pkts;
+   unsigned int pkts, len;
 
DBG(SMC_DEBUG_FUNC, %s: -- %s\n, dev-name, __FUNCTION__);
DBG(SMC_DEBUG_RX | SMC_DEBUG_DMA, %s: RX DMA irq handler\n, 
dev-name);
@@ -1299,9 +1299,10 @@ smc911x_rx_dma_irq(int dma, void *data)
PRINT_PKT(skb-data, skb-len);
dev-last_rx = jiffies;
skb-protocol = eth_type_trans(skb, dev);
+   len = skb-len;
netif_rx(skb);
dev-stats.rx_packets++;
-   dev-stats.rx_bytes += skb-len;
+   dev-stats.rx_bytes += len;
 
spin_lock_irqsave(lp-lock, flags);
pkts = (SMC_GET_RX_FIFO_INF()  RX_FIFO_INF_RXSUSED_)  16;

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] SMC911X: Fix using of dereferenced skb after netif_rx

2007-12-03 Thread Herbert Xu

On Mon, Dec 03, 2007 at 03:59:09PM +0800, Wang Chen wrote:
 After netif_rx(), skb will be freed. So get the
 skb-len before netif_rx().
 
 Signed-off-by: Wang Chen [EMAIL PROTECTED]

Please send driver patches to Jeff Garzik [EMAIL PROTECTED].

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH][IPVS] Fix sched registration race when checking for name collision

2007-12-03 Thread Pavel Emelyanov

The register_ip_vs_scheduler() checks for the scheduler with the
same name under the read-locked __ip_vs_sched_lock, then drops,
takes it for writing and puts the scheduler in list.

This is racy, since we can have a race window between the lock
being re-locked for writing.

The fix is to search the scheduler with the given name right under
the write-locked __ip_vs_sched_lock.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/ipv4/ipvs/ip_vs_sched.c b/net/ipv4/ipvs/ip_vs_sched.c
index 1602304..4322358 100644
--- a/net/ipv4/ipvs/ip_vs_sched.c
+++ b/net/ipv4/ipvs/ip_vs_sched.c
@@ -183,19 +183,6 @@ int register_ip_vs_scheduler(struct ip_vs_scheduler 
*scheduler)
/* increase the module use count */
ip_vs_use_count_inc();
 
-   /*
-*  Make sure that the scheduler with this name doesn't exist
-*  in the scheduler list.
-*/
-   sched = ip_vs_sched_getbyname(scheduler-name);
-   if (sched) {
-   ip_vs_scheduler_put(sched);
-   ip_vs_use_count_dec();
-   IP_VS_ERR(register_ip_vs_scheduler(): [%s] scheduler 
- already existed in the system\n, scheduler-name);
-   return -EINVAL;
-   }
-
write_lock_bh(__ip_vs_sched_lock);
 
if (scheduler-n_list.next != scheduler-n_list) {
@@ -207,6 +194,20 @@ int register_ip_vs_scheduler(struct ip_vs_scheduler 
*scheduler)
}
 
/*
+*  Make sure that the scheduler with this name doesn't exist
+*  in the scheduler list.
+*/
+   list_for_each_entry(sched, ip_vs_schedulers, n_list) {
+   if (strcmp(scheduler-name, sched-name) == 0) {
+   write_unlock_bh(__ip_vs_sched_lock);
+   ip_vs_use_count_dec();
+   IP_VS_ERR(register_ip_vs_scheduler(): [%s] scheduler 
+   already existed in the system\n,
+   scheduler-name);
+   return -EINVAL;
+   }
+   }
+   /*
 *  Add it into the d-linked scheduler list
 */
list_add(scheduler-n_list, ip_vs_schedulers);
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] sungem: fix napi regression with reset work

2007-12-03 Thread Johannes Berg


On Thu, 2007-11-29 at 21:57 +1100, Herbert Xu wrote:
 On Mon, Nov 26, 2007 at 09:02:08PM +0100, Johannes Berg wrote:
  sungem's gem_reset_task() will unconditionally try to disable NAPI even
  when it's called while the interface is not operating and hence the NAPI
  struct isn't enabled. Make napi_disable() depend on gp-running.
  
  Also removes a superfluous test of gp-running in the same function.
  
  Signed-off-by: Johannes Berg [EMAIL PROTECTED]
 
 Patch applied to net-2.6.  Thanks Johannes!

Thanks. I just noticed that I made a mistake in the patch description,
it should read Make napi_disable() depend on gp-opened. I don't think
it matters much but wanted to at least mention it.

johannes


signature.asc
Description: This is a digitally signed message part

Re: [PATCH net-2.6.25 1/6][CORE] Remove unneeded ifdefs from sysctl_net_core.c

2007-12-03 Thread Eric W. Biederman

Pavel Emelyanov [EMAIL PROTECTED] writes:

 Eric W. Biederman wrote:
 Pavel Emelyanov [EMAIL PROTECTED] writes:
 
 They include the whole file, but it is already compiled
 out when SYSCTL=n, since it is obj-$(CONFIG_SYSCTL) target
 in the Makefile.
 
 Pavel thanks for sending these patches.  Might I ask
 for some level of acknowledgement when you rework one of
 my patches and send it off.

 Sure. I though I've been doing exactly this thing. I've said
 that unix ctls were from your tree and so on.

Ok.  I guess I must have overlooked that attribution.

 I suppose this could be a case of duplicate thinking but
 this patch looks very familiar.

 Hm... Do you mean that this one is from your tree? Sorry then,
 I didn't know this. I can resend it with From: you if you wish.

It is.  But I don't see any of this as needing a resend.  Please
just don't forget me in the future is all I ask.

Eric
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/5] [IrDA] irlmp_unregister_link needs to free lsaps

2007-12-03 Thread Samuel Ortiz

While testing the mcs7780 based IrDA USB dongle I've stumbled upon
memory leak in irlmp_unregister_link(). Hashbin for lsaps is created in
irlmp_register_link and should probably be freed in irlmp_unregister_link().

Signed-off-by: Hinko Kocevar [EMAIL PROTECTED]
Signed-off-by: Samuel Ortiz [EMAIL PROTECTED]

---
 net/irda/irlmp.c |1 +
 1 file changed, 1 insertion(+)

Index: net-2.6/net/irda/irlmp.c
===
--- net-2.6.orig/net/irda/irlmp.c   2007-11-25 05:54:02.0 +0100
+++ net-2.6/net/irda/irlmp.c2007-11-25 07:12:13.0 +0100
@@ -353,6 +353,7 @@
/* Final cleanup */
del_timer(link-idle_timer);
link-magic = 0;
+   hashbin_delete(link-lsaps, (FREE_FUNC) __irlmp_close_lsap);
kfree(link);
}
 }

-- 

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [1/4] dst: Distributed storage documentation.

2007-12-03 Thread Evgeniy Polyakov

Hi Matt.

On Sun, Dec 02, 2007 at 10:50:59PM -0600, Matt Mackall ([EMAIL PROTECTED]) 
wrote:
  Distributed storage documentation.
  
  Algorithms used in the system, userspace interfaces
  (sysfs dirs and files), design and implementation details
  are described here.
 
 Can you give us a summary of how this differs from using device mapper
 with NBD?

From the higher point ov view it does not, but it operates quite differently:
it has async processing of the requests, thus not blocking, it has
different protocol with smaller overhead, supports strong checksums, has
in-kernel export server, which supports simple security attributes (i.e.
allow to connect, to read or write). It uses smaller amount of memory
(zero additional allocations in the common path for linear mapping,
not including network allocations, it uses smaller amount of additional
allocations for mirroring case).
DST supports failure recovery in case of dropped connection (core will
reconnect to the remote node when it is ready), thus it is possible to
turn off and on remote nodes without special administration steps. DST
has simple autoconfiguration at the startup time (support checksums and
storage size autonegotiation). It is possible to turn one of the mirror
nodes off and use it as a offline backup, since dst mirror node stores
data at the end of the storage, so it can be mounted locally.

-- 
Evgeniy Polyakov
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-2.6.25] TCP: Fix copy-paste (or code move) error

2007-12-03 Thread Herbert Xu

On Sun, Dec 02, 2007 at 06:45:44PM +0200, Ilpo Järvinen wrote:
 Should get the skb from the same queue. I had it first elsewhere
 and missed this change while moving.
 
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied to net-2.6.25.  Thanks Ilpo!
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-2.6.25 1/2][IPV6] Make the ipv6/sysctl_net_ipv6.c compilation cleaner

2007-12-03 Thread Pavel Emelyanov

Since this file is enclosed with the #ifdef CONFIG_SYSCTL/#endif
pair, it's OK to move this CONFIG_ into a Makefile.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index 5ffa980..24f3aa0 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -8,9 +8,9 @@ ipv6-objs :=af_inet6.o anycast.o ip6_output.o ip6_input.o 
addrconf.o \
addrlabel.o \
route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
raw.o protocol.o icmp.o mcast.o reassembly.o tcp_ipv6.o \
-   exthdrs.o sysctl_net_ipv6.o datagram.o \
-   ip6_flowlabel.o inet6_connection_sock.o
+   exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o
 
+ipv6-$(CONFIG_SYSCTL) = sysctl_net_ipv6.o
 ipv6-$(CONFIG_XFRM) += xfrm6_policy.o xfrm6_state.o xfrm6_input.o \
xfrm6_output.o
 ipv6-$(CONFIG_NETFILTER) += netfilter.o
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 68bb254..227efa7 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -14,8 +14,6 @@
 #include net/addrconf.h
 #include net/inet_frag.h
 
-#ifdef CONFIG_SYSCTL
-
 static ctl_table ipv6_table[] = {
{
.ctl_name   = NET_IPV6_ROUTE,
@@ -115,8 +113,3 @@ void ipv6_sysctl_unregister(void)
 {
unregister_sysctl_table(ipv6_sysctl_header);
 }
-
-#endif /* CONFIG_SYSCTL */
-
-
-
-- 
1.5.3.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-2.6.25 2/2][IPV6] Use sysctl paths to register ipv6 sysctl tables

2007-12-03 Thread Pavel Emelyanov

This makes the ipv6.ko smaller and creates the ground needed
for net namespaces support in ipv6.ko ssctls.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 227efa7..0b5bec3 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -82,31 +82,17 @@ static ctl_table ipv6_table[] = {
{ .ctl_name = 0 }
 };
 
-static struct ctl_table_header *ipv6_sysctl_header;
-
-static ctl_table ipv6_net_table[] = {
-   {
-   .ctl_name   = NET_IPV6,
-   .procname   = ipv6,
-   .mode   = 0555,
-   .child  = ipv6_table
-   },
-   { .ctl_name = 0 }
+static struct ctl_path ipv6_ctl_path[] = {
+   { .procname = net, .ctl_name = CTL_NET, },
+   { .procname = ipv6, .ctl_name = NET_IPV6, },
+   { },
 };
 
-static ctl_table ipv6_root_table[] = {
-   {
-   .ctl_name   = CTL_NET,
-   .procname   = net,
-   .mode   = 0555,
-   .child  = ipv6_net_table
-   },
-   { .ctl_name = 0 }
-};
+static struct ctl_table_header *ipv6_sysctl_header;
 
 void ipv6_sysctl_register(void)
 {
-   ipv6_sysctl_header = register_sysctl_table(ipv6_root_table);
+   ipv6_sysctl_header = register_sysctl_paths(ipv6_ctl_path, ipv6_table);
 }
 
 void ipv6_sysctl_unregister(void)
-- 
1.5.3.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC][Resend][PATCH 0/7]: CCID3/TFRC RX history and loss intervals implementation

2007-12-03 Thread Gerrit Renker

I am re-submitting the original patch set which Arnaldo referred to, due to:

 1/  It has been reworked to patch/compile against the 2.6.25 tree, which lead
 to a few changes (not conceptual in nature).

 2/  Patch #7 in Arnaldo's set says that this patch set introduced some bugs.

 Please can you point out if and what those are, so that they can be fixed 
(just
 saying that there are bugs without saying where they are is not very 
helpful).

Overview:
-
Patch #1: Provides a central module source file for dccp_tfrc_lib.
Patch #2: Implements a new (reduced) RX history interface.
Patch #3: Hooks CCID3 up with the new RX history interface.
Patch #4: Removes the old/previous RX history interface.
Patch #5: Ringbuffer-based loss intervals database.
Patch #6: Interface CCID3 with loss intervals database. 

The algorithms that are implemented here have been documented on 
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/6] [TFRC]: Provide central source file and debug facility

2007-12-03 Thread Gerrit Renker

This patch changes the tfrc_lib module in the following manner:

 (1) a dedicated tfrc_module source file  (this is later populated
 with TX/RX hist and LI Database routines);
 (2) a dedicated tfrc_pr_debug macro with toggle switch `tfrc_debug'.

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
Signed-off-by: Ian McDonald [EMAIL PROTECTED]
---
 net/dccp/ccids/Kconfig  |   12 ++---
 net/dccp/ccids/lib/Makefile |3 +-
 net/dccp/ccids/lib/loss_interval.c  |2 +-
 net/dccp/ccids/lib/packet_history.c |   28 ++---
 net/dccp/ccids/lib/packet_history.h |3 +-
 net/dccp/ccids/lib/tfrc.h   |   17 ++---
 net/dccp/ccids/lib/tfrc_module.c|   45 +++
 7 files changed, 73 insertions(+), 37 deletions(-)
 create mode 100644 net/dccp/ccids/lib/tfrc_module.c

diff --git a/net/dccp/ccids/Kconfig b/net/dccp/ccids/Kconfig
index 3d7d867..a0c42d7 100644
--- a/net/dccp/ccids/Kconfig
+++ b/net/dccp/ccids/Kconfig
@@ -63,10 +63,6 @@ config IP_DCCP_CCID3
 
  If in doubt, say M.
 
-config IP_DCCP_TFRC_LIB
-   depends on IP_DCCP_CCID3
-   def_tristate IP_DCCP_CCID3
-
 config IP_DCCP_CCID3_DEBUG
  bool CCID3 debugging messages
  depends on IP_DCCP_CCID3
@@ -110,5 +106,13 @@ config IP_DCCP_CCID3_RTO
is serious network congestion: experimenting with larger values 
should
therefore not be performed on WANs.
 
+# The TFRC Library: currently only has CCID 3 as customer
+config IP_DCCP_TFRC_LIB
+   depends on IP_DCCP_CCID3
+   def_tristate IP_DCCP_CCID3
+
+config IP_DCCP_TFRC_DEBUG
+   bool
+   default y if IP_DCCP_CCID3_DEBUG
 
 endmenu
diff --git a/net/dccp/ccids/lib/Makefile b/net/dccp/ccids/lib/Makefile
index 5f940a6..1295635 100644
--- a/net/dccp/ccids/lib/Makefile
+++ b/net/dccp/ccids/lib/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_IP_DCCP_TFRC_LIB) += dccp_tfrc_lib.o
 
-dccp_tfrc_lib-y := loss_interval.o packet_history.o tfrc_equation.o
+dccp_tfrc_lib-y := tfrc_module.otfrc_equation.o \
+   packet_history.o loss_interval.o
diff --git a/net/dccp/ccids/lib/loss_interval.c 
b/net/dccp/ccids/lib/loss_interval.c
index f2ca4eb..4fe3c1c 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -285,7 +285,7 @@ int __init dccp_li_init(void)
return dccp_li_cachep == NULL ? -ENOBUFS : 0;
 }
 
-void dccp_li_exit(void)
+void __exit dccp_li_exit(void)
 {
if (dccp_li_cachep != NULL) {
kmem_cache_destroy(dccp_li_cachep);
diff --git a/net/dccp/ccids/lib/packet_history.c 
b/net/dccp/ccids/lib/packet_history.c
index 4805de9..ebcb6c0 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -35,7 +35,6 @@
  *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
 
-#include linux/module.h
 #include linux/string.h
 #include packet_history.h
 
@@ -277,39 +276,18 @@ void dccp_rx_hist_purge(struct dccp_rx_hist *hist, struct 
list_head *list)
 
 EXPORT_SYMBOL_GPL(dccp_rx_hist_purge);
 
-extern int __init dccp_li_init(void);
-extern void dccp_li_exit(void);
-
-static __init int packet_history_init(void)
+int __init packet_history_init(void)
 {
-   if (dccp_li_init() != 0)
-   goto out;
-
tfrc_tx_hist = kmem_cache_create(tfrc_tx_hist,
 sizeof(struct tfrc_tx_hist_entry), 0,
 SLAB_HWCACHE_ALIGN, NULL);
-   if (tfrc_tx_hist == NULL)
-   goto out_li_exit;
-
-   return 0;
-out_li_exit:
-   dccp_li_exit();
-out:
-   return -ENOBUFS;
+   return tfrc_tx_hist == NULL ? -ENOBUFS : 0;
 }
-module_init(packet_history_init);
 
-static __exit void packet_history_exit(void)
+void __exit packet_history_exit(void)
 {
if (tfrc_tx_hist != NULL) {
kmem_cache_destroy(tfrc_tx_hist);
tfrc_tx_hist = NULL;
}
-   dccp_li_exit();
 }
-module_exit(packet_history_exit);
-
-MODULE_AUTHOR(Ian McDonald [EMAIL PROTECTED], 
- Arnaldo Carvalho de Melo [EMAIL PROTECTED]);
-MODULE_DESCRIPTION(DCCP TFRC library);
-MODULE_LICENSE(GPL);
diff --git a/net/dccp/ccids/lib/packet_history.h 
b/net/dccp/ccids/lib/packet_history.h
index 0670f46..9a2642e 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -39,8 +39,7 @@
 #include linux/ktime.h
 #include linux/list.h
 #include linux/slab.h
-
-#include ../../dccp.h
+#include tfrc.h
 
 /* Number of later packets received before one is considered lost */
 #define TFRC_RECV_NUM_LATE_LOSS 3
diff --git a/net/dccp/ccids/lib/tfrc.h b/net/dccp/ccids/lib/tfrc.h
index 5a0ba86..ab8848c 100644
--- a/net/dccp/ccids/lib/tfrc.h
+++ b/net/dccp/ccids/lib/tfrc.h
@@ -3,10 +3,11 @@
 /*
  *  net/dccp/ccids/lib/tfrc.h
  *
- *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
- *  Copyright (c) 2005 Ian McDonald [EMAIL

[PATCH 6/6] [CCID]: Interface CCID3 code with newer Loss Intervals Database

2007-12-03 Thread Gerrit Renker

This hooks up the TFRC Loss Interval database with CCID 3 packet reception.
In addition, it makes the CCID-specific computation of the first loss
interval (which requires access to all the guts of CCID3) local to ccid3.c.

The patch also fixes an omission in the DCCP code, that of a default /
fallback RTT value (defined in section 3.4 of RFC 4340 as 0.2 sec); while
at it, the  upper bound of 4 seconds for an RTT sample has  been reduced to
match the initial TCP RTO value of 3 seconds from[RFC 1122, 4.2.3.1].

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
Signed-off-by: Ian McDonald [EMAIL PROTECTED]
---
 net/dccp/ccids/ccid3.c |   76 +++
 net/dccp/ccids/ccid3.h |8 ++--
 net/dccp/dccp.h|7 +++-
 3 files changed, 72 insertions(+), 19 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index ef243dc..bf6d043 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -34,11 +34,7 @@
  *  along with this program; if not, write to the Free Software
  *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
-#include ../ccid.h
 #include ../dccp.h
-#include lib/packet_history.h
-#include lib/loss_interval.h
-#include lib/tfrc.h
 #include ccid3.h
 
 #include asm/unaligned.h
@@ -751,6 +747,46 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, 
struct sk_buff *skb)
return 0;
 }
 
+/** ccid3_first_li  -  Implements [RFC 3448, 6.3.1]
+ *
+ * Determine the length of the first loss interval via inverse lookup.
+ * Assume that X_recv can be computed by the throughput equation
+ * s
+ * X_recv = 
+ *  R * fval
+ * Find some p such that f(p) = fval; return 1/p (scaled).
+ */
+static u32 ccid3_first_li(struct sock *sk)
+{
+   struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
+   u32 x_recv, p, delta;
+   u64 fval;
+
+   if (hcrx-ccid3hcrx_rtt == 0) {
+   DCCP_WARN(No RTT estimate available, using fallback RTT\n);
+   hcrx-ccid3hcrx_rtt = DCCP_FALLBACK_RTT;
+   }
+
+   delta = ktime_to_us(net_timedelta(hcrx-ccid3hcrx_last_feedback));
+   x_recv = scaled_div32(hcrx-ccid3hcrx_bytes_recv, delta);
+   if (x_recv == 0) {  /* would also trigger divide-by-zero */
+   DCCP_WARN(X_recv==0\n);
+   if ((x_recv = hcrx-ccid3hcrx_x_recv) == 0) {
+   DCCP_BUG(stored value of X_recv is zero);
+   return ~0U;
+   }
+   }
+
+   fval = scaled_div(hcrx-ccid3hcrx_s, hcrx-ccid3hcrx_rtt);
+   fval = scaled_div32(fval, x_recv);
+   p = tfrc_calc_x_reverse_lookup(fval);
+
+   ccid3_pr_debug(%s(%p), receive rate=%u bytes/s, implied 
+  loss rate=%u\n, dccp_role(sk), sk, x_recv, p);
+
+   return p == 0 ? ~0U : scaled_div(1, p);
+}
+
 static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
 {
struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
@@ -779,6 +815,14 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, 
struct sk_buff *skb)
/*
 *  Handle pending losses and otherwise check for new loss
 */
+   if (tfrc_rx_loss_pending(hcrx-ccid3hcrx_hist) 
+   tfrc_rx_handle_loss(hcrx-ccid3hcrx_hist,
+   hcrx-ccid3hcrx_li_hist,
+   skb, ndp, ccid3_first_li, sk) ) {
+   do_feedback = FBACK_PARAM_CHANGE;
+   goto done_receiving;
+   }
+
if (tfrc_rx_new_loss_indicated(hcrx-ccid3hcrx_hist, skb, ndp))
goto update_records;
 
@@ -788,13 +832,23 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, 
struct sk_buff *skb)
if (unlikely(!is_data_packet))
goto update_records;
 
-   if (list_empty(hcrx-ccid3hcrx_li_hist)) {  /* no loss so far: p = 0 */
-
+   if (! tfrc_lh_is_initialised(hcrx-ccid3hcrx_li_hist)) {
+   /*
+* Empty loss history: no loss so far, hence p stays 0.
+* Sample RTT values, since an RTT estimate is required for the
+* computation of p when the first loss occurs; RFC 3448, 6.3.1.
+*/
sample = tfrc_rx_sample_rtt(hcrx-ccid3hcrx_hist, skb);
if (sample != 0)
hcrx-ccid3hcrx_rtt =
tfrc_ewma(hcrx-ccid3hcrx_rtt, sample, 9);
 
+   } else if (tfrc_lh_update_i_mean(hcrx-ccid3hcrx_li_hist, skb)) {
+   /*
+* Step (3) of [RFC 3448, 6.1]: Recompute I_mean and, if I_mean
+* has decreased (resp. p has increased), send feedback now.
+*/
+   do_feedback = FBACK_PARAM_CHANGE;
}
 
/* check if the periodic once-per-RTT feedback is due; RFC 4342, 10.3 */
@@ -816,10 +870,8 @@ static int ccid3_hc_rx_init(struct ccid *ccid, struct sock 
*sk)
ccid3_pr_debug(entry\n);

[PATCH 4/6] [TFRC]: Remove old RX history interface

2007-12-03 Thread Gerrit Renker

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
Signed-off-by: Ian McDonald [EMAIL PROTECTED]
---
 net/dccp/ccids/lib/loss_interval.c  |9 ++
 net/dccp/ccids/lib/packet_history.c |  162 ---
 net/dccp/ccids/lib/packet_history.h |   84 --
 3 files changed, 9 insertions(+), 246 deletions(-)

diff --git a/net/dccp/ccids/lib/loss_interval.c 
b/net/dccp/ccids/lib/loss_interval.c
index 4fe3c1c..0ebdc67 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -129,6 +129,12 @@ static u32 dccp_li_calc_first_li(struct sock *sk,
 u16 s, u32 bytes_recv,
 u32 previous_x_recv)
 {
+   /*
+* XXX This function still relies on the old RX interface and thus can 
not be
+* kept. But it can also not be removed since the loss interval code 
calls it.
+* Since it will be replaced anyway, comment it out for this moment.
+*/
+#ifdef __THIS_IS_RESOLVED_IN_NEXT_PATCH__
struct dccp_rx_hist_entry *entry, *next, *tail = NULL;
u32 x_recv, p;
suseconds_t rtt, delta;
@@ -220,6 +226,9 @@ found:
return ~0;
else
return 100 / p;
+#else
+   return ~0;
+#endif
 }
 
 void dccp_li_update_li(struct sock *sk,
diff --git a/net/dccp/ccids/lib/packet_history.c 
b/net/dccp/ccids/lib/packet_history.c
index 2f9e0a0..e87e445 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -129,156 +129,6 @@ EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt);
 /*
  * Receiver History Routines
  */
-struct dccp_rx_hist *dccp_rx_hist_new(const char *name)
-{
-   struct dccp_rx_hist *hist = kmalloc(sizeof(*hist), GFP_ATOMIC);
-   static const char dccp_rx_hist_mask[] = rx_hist_%s;
-   char *slab_name;
-
-   if (hist == NULL)
-   goto out;
-
-   slab_name = kmalloc(strlen(name) + sizeof(dccp_rx_hist_mask) - 1,
-   GFP_ATOMIC);
-   if (slab_name == NULL)
-   goto out_free_hist;
-
-   sprintf(slab_name, dccp_rx_hist_mask, name);
-   hist-dccprxh_slab = kmem_cache_create(slab_name,
-sizeof(struct dccp_rx_hist_entry),
-0, SLAB_HWCACHE_ALIGN, NULL);
-   if (hist-dccprxh_slab == NULL)
-   goto out_free_slab_name;
-out:
-   return hist;
-out_free_slab_name:
-   kfree(slab_name);
-out_free_hist:
-   kfree(hist);
-   hist = NULL;
-   goto out;
-}
-
-EXPORT_SYMBOL_GPL(dccp_rx_hist_new);
-
-void dccp_rx_hist_delete(struct dccp_rx_hist *hist)
-{
-   const char* name = kmem_cache_name(hist-dccprxh_slab);
-
-   kmem_cache_destroy(hist-dccprxh_slab);
-   kfree(name);
-   kfree(hist);
-}
-
-EXPORT_SYMBOL_GPL(dccp_rx_hist_delete);
-
-int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq,
-   u8 *ccval)
-{
-   struct dccp_rx_hist_entry *packet = NULL, *entry;
-
-   list_for_each_entry(entry, list, dccphrx_node)
-   if (entry-dccphrx_seqno == seq) {
-   packet = entry;
-   break;
-   }
-
-   if (packet)
-   *ccval = packet-dccphrx_ccval;
-
-   return packet != NULL;
-}
-
-EXPORT_SYMBOL_GPL(dccp_rx_hist_find_entry);
-struct dccp_rx_hist_entry *
-   dccp_rx_hist_find_data_packet(const struct list_head *list)
-{
-   struct dccp_rx_hist_entry *entry, *packet = NULL;
-
-   list_for_each_entry(entry, list, dccphrx_node)
-   if (entry-dccphrx_type == DCCP_PKT_DATA ||
-   entry-dccphrx_type == DCCP_PKT_DATAACK) {
-   packet = entry;
-   break;
-   }
-
-   return packet;
-}
-
-EXPORT_SYMBOL_GPL(dccp_rx_hist_find_data_packet);
-
-void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist,
-   struct list_head *rx_list,
-   struct list_head *li_list,
-   struct dccp_rx_hist_entry *packet,
-   u64 nonloss_seqno)
-{
-   struct dccp_rx_hist_entry *entry, *next;
-   u8 num_later = 0;
-
-   list_add(packet-dccphrx_node, rx_list);
-
-   num_later = TFRC_RECV_NUM_LATE_LOSS + 1;
-
-   if (!list_empty(li_list)) {
-   list_for_each_entry_safe(entry, next, rx_list, dccphrx_node) {
-   if (num_later == 0) {
-   if (after48(nonloss_seqno,
-  entry-dccphrx_seqno)) {
-   list_del_init(entry-dccphrx_node);
-   dccp_rx_hist_entry_delete(hist, entry);
-   }
-   } else if (dccp_rx_hist_entry_data_packet(entry))
-   --num_later;
-

[PATCH 2/6] [TFRC]: New RX history implementation

2007-12-03 Thread Gerrit Renker

This provides a new, self-contained and generic RX history service for TFRC
based protocols.

Details:
 * new data structure, initialisation and cleanup routines;
 * allocation of dccp_rx_hist entries local to packet_history.c,
   as a service exported by the dccp_tfrc_lib module.
 * interface to automatically track highest-received seqno;
 * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1);
 * a generic function to test for `data packets' as per  RFC 4340, sec. 7.7.


Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
Signed-off-by: Ian McDonald [EMAIL PROTECTED]
---
 net/dccp/ccids/lib/packet_history.c |  120 ++---
 net/dccp/ccids/lib/packet_history.h |  143 ++-
 net/dccp/ccids/lib/tfrc_module.c|   27 ++-
 net/dccp/dccp.h |   12 +++
 4 files changed, 281 insertions(+), 21 deletions(-)

diff --git a/net/dccp/ccids/lib/packet_history.c 
b/net/dccp/ccids/lib/packet_history.c
index ebcb6c0..2f9e0a0 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -34,7 +34,6 @@
  *  along with this program; if not, write to the Free Software
  *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
-
 #include linux/string.h
 #include packet_history.h
 
@@ -55,6 +54,22 @@ struct tfrc_tx_hist_entry {
  */
 static struct kmem_cache *tfrc_tx_hist;
 
+int __init tx_packet_history_init(void)
+{
+   tfrc_tx_hist = kmem_cache_create(tfrc_tx_hist,
+sizeof(struct tfrc_tx_hist_entry), 0,
+SLAB_HWCACHE_ALIGN, NULL);
+   return tfrc_tx_hist == NULL ? -ENOBUFS : 0;
+}
+
+void tx_packet_history_cleanup(void)
+{
+   if (tfrc_tx_hist != NULL) {
+   kmem_cache_destroy(tfrc_tx_hist);
+   tfrc_tx_hist = NULL;
+   }
+}
+
 static struct tfrc_tx_hist_entry *
tfrc_tx_hist_find_entry(struct tfrc_tx_hist_entry *head, u64 seqno)
 {
@@ -264,6 +279,49 @@ void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist,
 
 EXPORT_SYMBOL_GPL(dccp_rx_hist_add_packet);
 
+static struct kmem_cache *tfrc_rxh_cache;
+
+int __init rx_packet_history_init(void)
+{
+   tfrc_rxh_cache = kmem_cache_create(tfrc_rxh_cache,
+  sizeof(struct tfrc_rx_hist_entry),
+  0, SLAB_HWCACHE_ALIGN, NULL);
+   return tfrc_rxh_cache == NULL ? -ENOBUFS : 0;
+}
+
+void rx_packet_history_cleanup(void)
+{
+   if (tfrc_rxh_cache != NULL) {
+   kmem_cache_destroy(tfrc_rxh_cache);
+   tfrc_rxh_cache = NULL;
+   }
+}
+
+int tfrc_rx_hist_init(struct tfrc_rx_hist *h)
+{
+   int i;
+
+   for (i = 0; i = NDUPACK; i++) {
+   h-ring[i] = kmem_cache_alloc(tfrc_rxh_cache, GFP_ATOMIC);
+   if (h-ring[i] == NULL)
+   return 1;
+   }
+   h-loss_count = 0;
+   h-loss_start = 0;
+   return 0;
+}
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_init);
+
+void tfrc_rx_hist_cleanup(struct tfrc_rx_hist *h)
+{
+   int i;
+
+   for (i=0; i = NDUPACK; i++)
+   if (h-ring[i] != NULL)
+   kmem_cache_free(tfrc_rxh_cache, h-ring[i]);
+}
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_cleanup);
+
 void dccp_rx_hist_purge(struct dccp_rx_hist *hist, struct list_head *list)
 {
struct dccp_rx_hist_entry *entry, *next;
@@ -276,18 +334,56 @@ void dccp_rx_hist_purge(struct dccp_rx_hist *hist, struct 
list_head *list)
 
 EXPORT_SYMBOL_GPL(dccp_rx_hist_purge);
 
-int __init packet_history_init(void)
+/**
+ * tfrc_rx_sample_rtt  -  Sample RTT from timestamp / CCVal
+ * Based on ideas presented in RFC 4342, 8.1. Returns 0 if it was not able
+ * to compute a sample with given data - calling function should check this.
+ */
+u32 tfrc_rx_sample_rtt(struct tfrc_rx_hist *h, struct sk_buff *skb)
 {
-   tfrc_tx_hist = kmem_cache_create(tfrc_tx_hist,
-sizeof(struct tfrc_tx_hist_entry), 0,
-SLAB_HWCACHE_ALIGN, NULL);
-   return tfrc_tx_hist == NULL ? -ENOBUFS : 0;
-}
+   u32 sample = 0,
+   delta_v = SUB16(dccp_hdr(skb)-dccph_ccval, rtt_last_s(h)-ccval);
+
+   if (delta_v  1 || delta_v  4) {   /* unsuitable CCVal delta */
+
+   if (h-rtt_sample_prev == 2) {  /* previous candidate stored */
+   sample = SUB16(rtt_prev_s(h)-ccval,
+  rtt_last_s(h)-ccval);
+   if (sample)
+   sample = 4 / sample
+  * ktime_us_delta(rtt_prev_s(h)-stamp,
+   rtt_last_s(h)-stamp);
+   else/*
+* FIXME: This condition is in principle not
+* possible but occurs when CCID is used for
+

[PATCH 3/6] [CCID3]: Hook up with new RX history interface

2007-12-03 Thread Gerrit Renker

In addition, it makes two corrections too the code:

 1. The receiver of a half-connection does not set window counter values;
only the sender sets window counters [RFC 4342, sections 5 and 8.1].
 2. The computation of X_recv does currently not conform to TFRC/RFC 3448,
since this specification requires that X_recv be computed over the last
R_m seconds (sec. 6.2). The patch tackles this problem as it
  - explicitly distinguishes the types of feedback (using an enum);
  - uses previous value of X_recv when sending feedback due to a parameter 
change;
  - makes all state changes local to ccid3_hc_tx_packet_recv;
  - assigns feedback type according to incident (previously only used flag 
`do_feedback').
   Further and detailed information is at
   
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/#8._Computing_X_recv_

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
Signed-off-by: Ian McDonald [EMAIL PROTECTED]
---
 net/dccp/ccids/ccid3.c |  292 ++--
 net/dccp/ccids/ccid3.h |   38 ---
 2 files changed, 102 insertions(+), 228 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 5dea690..ef243dc 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -1,6 +1,7 @@
 /*
  *  net/dccp/ccids/ccid3.c
  *
+ *  Copyright (c) 2007   The University of Aberdeen, Scotland, UK
  *  Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand.
  *  Copyright (c) 2005-7 Ian McDonald [EMAIL PROTECTED]
  *
@@ -49,8 +50,6 @@ static int ccid3_debug;
 #define ccid3_pr_debug(format, a...)
 #endif
 
-static struct dccp_rx_hist *ccid3_rx_hist;
-
 /*
  * Transmitter Half-Connection Routines
  */
@@ -675,52 +674,53 @@ static inline void ccid3_hc_rx_update_s(struct 
ccid3_hc_rx_sock *hcrx, int len)
hcrx-ccid3hcrx_s = tfrc_ewma(hcrx-ccid3hcrx_s, len, 9);
 }
 
-static void ccid3_hc_rx_send_feedback(struct sock *sk)
+static void ccid3_hc_rx_send_feedback(struct sock *sk, struct sk_buff *skb,
+ enum ccid3_fback_type fbtype)
 {
struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
struct dccp_sock *dp = dccp_sk(sk);
-   struct dccp_rx_hist_entry *packet;
-   ktime_t now;
-   suseconds_t delta;
-
-   ccid3_pr_debug(%s(%p) - entry \n, dccp_role(sk), sk);
+   ktime_t now = ktime_get_real();
+   s64 delta = 0;
 
-   now = ktime_get_real();
+   if (unlikely(hcrx-ccid3hcrx_state == TFRC_RSTATE_TERM))
+   return;
 
-   switch (hcrx-ccid3hcrx_state) {
-   case TFRC_RSTATE_NO_DATA:
+   switch (fbtype) {
+   case FBACK_INITIAL:
hcrx-ccid3hcrx_x_recv = 0;
+   hcrx-ccid3hcrx_pinv   = ~0U;   /* see RFC 4342, 8.5 */
break;
-   case TFRC_RSTATE_DATA:
-   delta = ktime_us_delta(now,
-  hcrx-ccid3hcrx_tstamp_last_feedback);
-   DCCP_BUG_ON(delta  0);
-   hcrx-ccid3hcrx_x_recv =
-   scaled_div32(hcrx-ccid3hcrx_bytes_recv, delta);
+   case FBACK_PARAM_CHANGE:
+   /*
+* When parameters change (new loss or p  p_prev), we do not
+* have a reliable estimate for R_m of [RFC 3448, 6.2] and so
+* need to  reuse the previous value of X_recv. However, when
+* X_recv was 0 (due to early loss), this would kill X down to
+* s/t_mbi (i.e. one packet in 64 seconds).
+* To avoid such drastic reduction, we approximate X_recv as
+* the number of bytes since last feedback.
+* This is a safe fallback, since X is bounded above by X_calc.
+*/
+   if (hcrx-ccid3hcrx_x_recv  0)
+   break;
+   /* fall through */
+   case FBACK_PERIODIC:
+   delta = ktime_us_delta(now, hcrx-ccid3hcrx_last_feedback);
+   if (delta = 0)
+   DCCP_BUG(delta (%ld) = 0, (long)delta);
+   else
+   hcrx-ccid3hcrx_x_recv =
+   scaled_div32(hcrx-ccid3hcrx_bytes_recv, delta);
break;
-   case TFRC_RSTATE_TERM:
-   DCCP_BUG(%s(%p) - Illegal state TERM, dccp_role(sk), sk);
-   return;
-   }
-
-   packet = dccp_rx_hist_find_data_packet(hcrx-ccid3hcrx_hist);
-   if (unlikely(packet == NULL)) {
-   DCCP_WARN(%s(%p), no data packet in history!\n,
- dccp_role(sk), sk);
+   default:
return;
}
+   ccid3_pr_debug(Interval %ldusec, X_recv=%u, 1/p=%u\n, (long)delta,
+  hcrx-ccid3hcrx_x_recv, hcrx-ccid3hcrx_pinv);
 
-   hcrx-ccid3hcrx_tstamp_last_feedback = now;
-   hcrx-ccid3hcrx_ccval_last_counter   = packet-dccphrx_ccval;
-   hcrx-ccid3hcrx_bytes_recv

[PATCH 5/6] [TFRC]: Ringbuffer to track loss interval history

2007-12-03 Thread Gerrit Renker

A ringbuffer-based implementation of loss interval history is easier to
maintain, allocate, and update.

Details:
 * access to the Loss Interval Records via macro wrappers (with safety checks);
 * simplified, on-demand allocation of entries (no extra memory consumption on
   lossless links); cache allocation is local to the module / exported as 
service;
 * provision of RFC-compliant algorithm to re-compute average loss interval;
 * provision of comprehensive, new loss detection algorithm
- support for all cases of loss, including re-ordered/duplicate packets;
- waiting for NDUPACK=3 packets to fill the hole;
- updating loss records when a late-arriving packet fills a hole.

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
Signed-off-by: Ian McDonald [EMAIL PROTECTED]
---
 net/dccp/ccids/lib/loss_interval.c  |  158 +-
 net/dccp/ccids/lib/loss_interval.h  |   58 +++-
 net/dccp/ccids/lib/packet_history.c |  183 +++
 net/dccp/ccids/lib/packet_history.h |4 +
 net/dccp/ccids/lib/tfrc.h   |3 +
 5 files changed, 400 insertions(+), 6 deletions(-)

diff --git a/net/dccp/ccids/lib/loss_interval.c 
b/net/dccp/ccids/lib/loss_interval.c
index 0ebdc67..cde5c7f 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -1,6 +1,7 @@
 /*
  *  net/dccp/ccids/lib/loss_interval.c
  *
+ *  Copyright (c) 2007   The University of Aberdeen, Scotland, UK
  *  Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand.
  *  Copyright (c) 2005-7 Ian McDonald [EMAIL PROTECTED]
  *  Copyright (c) 2005 Arnaldo Carvalho de Melo [EMAIL PROTECTED]
@@ -10,12 +11,7 @@
  *  the Free Software Foundation; either version 2 of the License, or
  *  (at your option) any later version.
  */
-
-#include linux/module.h
 #include net/sock.h
-#include ../../dccp.h
-#include loss_interval.h
-#include packet_history.h
 #include tfrc.h
 
 #define DCCP_LI_HIST_IVAL_F_LENGTH  8
@@ -27,6 +23,51 @@ struct dccp_li_hist_entry {
u32  dccplih_interval;
 };
 
+static struct kmem_cache  *tfrc_lh_slab  __read_mostly;
+/* Loss Interval weights from [RFC 3448, 5.4], scaled by 10 */
+static const int tfrc_lh_weights[NINTERVAL] = { 10, 10, 10, 10, 8, 6, 4, 2 };
+
+/*
+ * Access macros: These require that at least one entry is present in lh,
+ * and implement array semantics (0 is first, n-1 is the last of n entries).
+ */
+#define __lih_index(lh, n) LIH_INDEX((lh)-counter - (n) - 1)
+#define __lih_entry(lh, n) (lh)-ring[__lih_index(lh, n)]
+#define __curr_entry(lh)   (lh)-ring[LIH_INDEX((lh)-counter - 1)]
+#define __next_entry(lh)   (lh)-ring[LIH_INDEX((lh)-counter)]
+
+/* given i with 0 = i = k, return I_i as per the rfc3448bis notation */
+static inline u32 tfrc_lh_get_interval(struct tfrc_loss_hist *lh, u8 i)
+{
+   BUG_ON(i = lh-counter);
+   return __lih_entry(lh, i)-li_length;
+}
+
+static inline struct tfrc_loss_interval *tfrc_lh_peek(struct tfrc_loss_hist 
*lh)
+{
+   return lh-counter ? __curr_entry(lh) : NULL;
+}
+
+/*
+ * On-demand allocation and de-allocation of entries
+ */
+static struct tfrc_loss_interval *tfrc_lh_demand_next(struct tfrc_loss_hist 
*lh)
+{
+   if (__next_entry(lh) == NULL)
+   __next_entry(lh) = kmem_cache_alloc(tfrc_lh_slab, GFP_ATOMIC);
+
+   return __next_entry(lh);
+}
+
+void tfrc_lh_cleanup(struct tfrc_loss_hist *lh)
+{
+   if (tfrc_lh_is_initialised(lh))
+   for (lh-counter = 0; lh-counter  LIH_SIZE; lh-counter++)
+   if (__next_entry(lh) != NULL)
+   kmem_cache_free(tfrc_lh_slab, __next_entry(lh));
+}
+EXPORT_SYMBOL_GPL(tfrc_lh_cleanup);
+
 static struct kmem_cache *dccp_li_cachep __read_mostly;
 
 static inline struct dccp_li_hist_entry *dccp_li_hist_entry_new(const gfp_t 
prio)
@@ -98,6 +139,65 @@ u32 dccp_li_hist_calc_i_mean(struct list_head *list)
 
 EXPORT_SYMBOL_GPL(dccp_li_hist_calc_i_mean);
 
+static void tfrc_lh_calc_i_mean(struct tfrc_loss_hist *lh)
+{
+   u32 i_i, i_tot0 = 0, i_tot1 = 0, w_tot = 0;
+   int i, k = tfrc_lh_length(lh) - 1; /* k is as in rfc3448bis, 5.4 */
+
+   for (i=0; i = k; i++) {
+   i_i = tfrc_lh_get_interval(lh, i);
+
+   if (i  k) {
+   i_tot0 += i_i * tfrc_lh_weights[i];
+   w_tot  += tfrc_lh_weights[i];
+   }
+   if (i  0)
+   i_tot1 += i_i * tfrc_lh_weights[i-1];
+   }
+
+   BUG_ON(w_tot == 0);
+   lh-i_mean = max(i_tot0, i_tot1) / w_tot;
+}
+
+/**
+ * tfrc_lh_update_i_mean  -  Update the `open' loss interval I_0
+ * For recomputing p: returns `true' if p  p_prev  =  1/p  1/p_prev
+ */
+u8 tfrc_lh_update_i_mean(struct tfrc_loss_hist *lh, struct sk_buff *skb)
+{
+   struct tfrc_loss_interval *cur = tfrc_lh_peek(lh);
+   u32 old_i_mean = lh-i_mean;
+   s64 length;
+
+

Re: [PATCHv7 1/5] Remove unnecessary locks from rtnetlink (in do_setlink)

2007-12-03 Thread Laszlo Attila Toth


Jarek Poplawski írta:

Laszlo Attila Toth wrote, On 11/29/2007 05:11 PM:


The do_setlink function is protected by rtnl, additional locks are unnecessary,
and the set_operstate() function is called from protected parts. Locks removed
from both functions.


It doesn't look like in accordance with a comment to dev_base_lock in dev.c.
And it makes eg. rfc2863_policy() locking from link_watch.c looking strange.
Isn't there needed some additional comment to this?


I modified do_setlink(), but set_operstate() is also called from
rtnl_create_link() and from no other places.  In rtnl_create_link() none 
of the changes is protected by set_lock_bh() except inside 
set_operstate(), different locking scheme is not necessary for the 
operstate.


Also two solution can be made, one is locking everything and one is 
locking nothing (to unify the changes made by these parts). The second 
one is better if it is protected.


I tried to figure out how it is protected but I couldn't. But Patrick 
said it is protected by rtnl. And he suggested this patch.



Attila
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode

2007-12-03 Thread Alexey Kuznetsov

On Mon, Dec 03, 2007 at 10:39:35PM +1100, Herbert Xu wrote:
 So we need to fix this, and whatever the fix is will probably render
 the BH/USER distinction obsolete.

Hmm, I would think opposite. USER (or generic) is expensive variant,
BH is lite. No?

Alexey
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2.6.25] net: removes unnecessary dependencies for net_namespace.h

2007-12-03 Thread Denis V. Lunev

you right, how about this?

Eric W. Biederman wrote:
 Denis V. Lunev [EMAIL PROTECTED] writes:
 
 This patch removes some unneeded includes for net_namespace.h to speed up
 compilation.

 Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

 diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
 index f285de6..28b7f25 100644
 --- a/include/net/pkt_cls.h
 +++ b/include/net/pkt_cls.h
 @@ -2,7 +2,6 @@
  #define __NET_PKT_CLS_H
  
  #include linux/pkt_cls.h
 -#include net/net_namespace.h
  #include net/sch_generic.h
  #include net/act_api.h
 
 What of tcf_match_indev?  I guess linux/netdevice.h brings
 it in for us but still.
 
 diff --git a/include/net/sock.h b/include/net/sock.h
 index 43e3cd9..a04e361 100644
 --- a/include/net/sock.h
 +++ b/include/net/sock.h
 @@ -57,7 +57,6 @@
  #include asm/atomic.h
  #include net/dst.h
  #include net/checksum.h
 -#include net/net_namespace.h
  
  /*
   * This structure really needs to be cleaned up.
 @@ -95,6 +94,7 @@ typedef struct {
  
  struct sock;
  struct proto;
 +struct net;
  
  /**
   *  struct sock_common - minimal network layer representation of sockets
 
 This hunk definitely looks sane.
 
 Eric
 
 

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index f285de6..2eaf204 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -2,7 +2,6 @@
 #define __NET_PKT_CLS_H
 
 #include linux/pkt_cls.h
-#include net/net_namespace.h
 #include net/sch_generic.h
 #include net/act_api.h
 
@@ -336,6 +335,8 @@ static inline int tcf_valid_offset(const struct sk_buff 
*skb,
 }
 
 #ifdef CONFIG_NET_CLS_IND
+#include net/net_namespace.h
+
 static inline int
 tcf_change_indev(struct tcf_proto *tp, char *indev, struct rtattr *indev_tlv)
 {
diff --git a/include/net/sock.h b/include/net/sock.h
index 43e3cd9..a04e361 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -57,7 +57,6 @@
 #include asm/atomic.h
 #include net/dst.h
 #include net/checksum.h
-#include net/net_namespace.h
 
 /*
  * This structure really needs to be cleaned up.
@@ -95,6 +94,7 @@ typedef struct {
 
 struct sock;
 struct proto;
+struct net;
 
 /**
  * struct sock_common - minimal network layer representation of sockets

Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode

2007-12-03 Thread Herbert Xu

On Mon, Dec 03, 2007 at 02:49:12PM +0300, Alexey Kuznetsov wrote:
 On Mon, Dec 03, 2007 at 10:39:35PM +1100, Herbert Xu wrote:
  So we need to fix this, and whatever the fix is will probably render
  the BH/USER distinction obsolete.
 
 Hmm, I would think opposite. USER (or generic) is expensive variant,
 BH is lite. No?

Yes that would certainly be the obvious fix.  In other words, just use
smp_processor_id instead of the raw version.  I suppose the only issue
would be that disabling/enabling preemption isn't exactly cost-free and
we're going to be doing that for every single increment.

Hmm, wasn't someone else talking about a non-atomic version of atomic
ops lately (i.e., atomic with respect to the local CPU only)? Perhaps
this is the killer app for it :)

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2.6.25] net: removes unnecessary dependencies for net_namespace.h

2007-12-03 Thread Eric W. Biederman

Denis V. Lunev [EMAIL PROTECTED] writes:

 This patch removes some unneeded includes for net_namespace.h to speed up
 compilation.

 Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

 diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
 index f285de6..28b7f25 100644
 --- a/include/net/pkt_cls.h
 +++ b/include/net/pkt_cls.h
 @@ -2,7 +2,6 @@
  #define __NET_PKT_CLS_H
  
  #include linux/pkt_cls.h
 -#include net/net_namespace.h
  #include net/sch_generic.h
  #include net/act_api.h

What of tcf_match_indev?  I guess linux/netdevice.h brings
it in for us but still.

 diff --git a/include/net/sock.h b/include/net/sock.h
 index 43e3cd9..a04e361 100644
 --- a/include/net/sock.h
 +++ b/include/net/sock.h
 @@ -57,7 +57,6 @@
  #include asm/atomic.h
  #include net/dst.h
  #include net/checksum.h
 -#include net/net_namespace.h
  
  /*
   * This structure really needs to be cleaned up.
 @@ -95,6 +94,7 @@ typedef struct {
  
  struct sock;
  struct proto;
 +struct net;
  
  /**
   *   struct sock_common - minimal network layer representation of sockets

This hunk definitely looks sane.

Eric

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode

2007-12-03 Thread Herbert Xu

On Mon, Dec 03, 2007 at 03:19:35PM +0800, Wang Chen wrote:
 
 System calls should be USER. So change the BH to USER for
 UDP*_INC_STATS_BH().
 
 Signed-off-by: Wang Chen [EMAIL PROTECTED]

I've rearragned it so that we the new INC_STATS call is USER in the
first patch.  Otherwise it's all in net-2.6.25 now.

BTW, thanks to Gerrit's note I took a look at the underlying code
for the _BH/_USER stuff.  It's in fact totally broken when PREEMPT
is on.  It relies on the fact that process-context kernel code does
not get preempted by other process-context kernel code or for that
matter migrate to other CPUs, neither of which is true with PREEMPT
on.

So we need to fix this, and whatever the fix is will probably render
the BH/USER distinction obsolete.

Any takers?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCHv7 2/5] rtnetlink: send a single notification on device state changes

2007-12-03 Thread Laszlo Attila Toth


Jarek Poplawski írta:

Laszlo Attila Toth wrote, On 11/29/2007 05:11 PM:


In do_setlink() a single ntification is sent at the end of the function
if any modification occured. If the address has been changed, another
notification is sent.



...


@@ -858,6 +859,7 @@ static int do_setlink(struct net_device *dev, struct 
ifinfomsg *ifm,
if (tb[IFLA_BROADCAST]) {
nla_memcpy(dev-broadcast, tb[IFLA_BROADCAST], dev-addr_len);
send_addr_notify = 1;
+   modified = 1;
}


..


if (send_addr_notify)
call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);
+
+   if (modified)
+   netdev_state_change(dev);
+


The subject suggests there might be less notifications. The patch actually
adds a little. Any additional comment why they are necessary?


The actual state of a device contains its address(es), also address 
change implies state change, but these are different netlink messages 
also the NETDEV_CHANGEADDR cannot be dropped because the other one is used.


Attila
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/2] [IPSEC]: Reinject packet instead of calling netfilter directly on input

2007-12-03 Thread jamal

On Mon, 2007-03-12 at 20:21 +1100, Herbert Xu wrote:

 Sorry for the late response Jamal.  I've been too busy to give
 this issue proper thought.  It's still in my mailbox so I will
 respond to it once things quiten down a little.

I totaly empathize - take your time.

The point brought up on v6 extensions needs to be addressed. I thought
about it a little - and it is valid as well for ipv4 options; they will
be processed twice.
To build up on what Patrick said, I noticed a bit still available in the
bag right after skb-nf_trace that i could use to signal
options/extensions already processed. 
If people think think this is a sane use of that very lonely bit, I will
post patches.
CCing Yoshfuji.

cheers,
jamal

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches

2007-12-03 Thread Arnaldo Carvalho de Melo

Em Mon, Dec 03, 2007 at 08:35:12AM +, Gerrit Renker escreveu:
 Hi Arnaldo,
 
 hank you for going through this. I have just backported your recent patches 
 of 2.6.25
 to the DCCP/CCID4/Faster Restart test tree at 
   git://eden-feed.erg.abdn.ac.uk/dccp_exp {dccp,ccid4,dccp_fr}
 as per subsequent message.
 |  do, so please consider moving DCCP discussion to 
 netdev@vger.kernel.org,
 |  where lots of smart networking folks are present and can help our 
 efforts
 |  on turning RFCs to code.
 Are you suggesting using netdev exclusively or in addition to [EMAIL 
 PROTECTED]

Well, since at least one person that has contributed significantly in
the past has said he can't cope with traffic on netdev, we can CC
[EMAIL PROTECTED]

 
 | Please take a look at this patch series where I reorganized your work 
 on the new
 | TFRC rx history handling code. I'll wait for your considerations and then 
 do as many
 | interactions as reasonable to get your work merged.
 | 
 | It should be completely equivalent, plus some fixes and optimizations, 
 such as:
 It will be necessary to address these points one-by-one. Before diving into 
 making
 fixes and `optimisations', have you tested your code? The patches you are 
 referring to

I have performed basic tests, and when doing so noticed a bug in
inet_diag, that I commented with Herbert Xu and he has since provided a
fix.

 have been posted and re-posted for a period of over 9 months on [EMAIL 
 PROTECTED], and

Being posted and re-posted does not guarantee that the patch is OK or
that is in a form that is acceptable to all tree maintainers. DaveM is
subscribed to [EMAIL PROTECTED] and could have picked this if he had the time to
do the review. I didn't, but now I'm trying to spend my time on
reviewing your patches and in the process doing modifications I find
appropriate while trying hard not to introduce bugs in your hard work
to get it merged.

 there are regression tests which show that this code improves on the existing 
 Linux
 implementation. These are labelled as `test tree' on 
   http://www.linux-foundation.org/en/Net:DCCP_Testing#Regression_testing
 So if you are making changes to the code, I would like to ask if you have run 
 similar
 regression tests, to avoid having to step back later.

Fair enough, I will do that before asking Herbert or Dave to pull from
my tree.
 
 | . The code that allocates the RX ring deals with failures when one of the 
 entries in
 |   the ring buffer is not successfully allocated, the original code was 
 leaking the
 |   successfully allocated entries.

Sorry for not point out exactly this, here it goes:

Your original patch:

+int tfrc_rx_hist_init(struct tfrc_rx_hist *h)
+{
+   int i;
+
+   for (i = 0; i = NDUPACK; i++) {
+   h-ring[i] = kmem_cache_alloc(tfrc_rxh_cache, GFP_ATOMIC);
+   if (h-ring[i] == NULL)
+   return 1;
+   }
+   h-loss_count = 0;
+   h-loss_start = 0;
+   return 0;
+}
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_init);

Then, in ccid3_hc_rx_init you do:

static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk)
{
struct ccid3_hc_rx_sock *hcrx = ccid_priv(ccid);

ccid3_pr_debug(entry\n);

hcrx-ccid3hcrx_state = TFRC_RSTATE_NO_DATA;
tfrc_lh_init(hcrx-ccid3hcrx_li_hist);
return tfrc_rx_hist_init(hcrx-ccid3hcrx_hist);
}

So if tfrc_rx_hist_init fail to allocate, say, the last entry in the
ring, it will return 1, and from looking at how you initialize
h-loss_{count,start} to zero I assumed that the whole tfrc_rx_hist
is not zeroed when tfrc_rx_hist_init is called, so one can also assume
that the ring entries are not initialized to NULL and that if the
error recovery is to assume that later on tfrc_rx_hist_cleanup is called
we would not have a leak, but an OOPS when tfrc_rx_hist_cleanup tries
to call kfree_cache_free on the uninitialized ring entries.

But if you look at ccid_new(), that calls ccid3_hc_rx_init(), you'll see
that when the ccid rx or hx routine fails, it just frees the
struct ccid with the area where h-ring is, so, what was not cleaned up
by the ccid init routine is effectively forgot, leaked.

I first did the cleanup at tfrc_rx_hist_init (that I renamed to
tfrc_rx_hist_alloc, since it doesn't just initializes things, but
allocates entries from slab), but then I just made the rx history slab
have arrays of tfrc_rx_hist_entry objects, not individual ones as your
code always allocates them as arrays.

 | . We do just one allocation for the ring buffer, as the number of entries 
 is fixed we
 |   should just do one allocation and not TFRC_NDUPACK times.
 Will look at the first point in the patch; with regard to the second point I 
 agree, this
 will make the code simpler, which is good. 

good

 | . I haven't checked if all the code was commited, as I tried to introduce 
 just what was
 |   immediatelly used, probably we'll need to do some changes when working on 
 the merge

Re: [PATCH net-2.6.25] Add packet filtering based on process's securitycontext.

2007-12-03 Thread Tetsuo Handa

Hello.

Patrick McHardy wrote:
 No news on that. I'm also a bit sceptical if adding all this complexity
 and overhead would really be worth it (considering only netfilter) just
 to use the owner match and UID/GID matching. It wouldn't even be
 accurate because there is not 1:1 mapping of sockets and processes.
Considering only LSM, socket_post_accept()/socket_post_recv_datagram() hooks are
not complicated at all.
A socket may be mapped to multiple processes, but at the moment of picking up
(i.e. accept()/recvmsg()), I think it is accurate 1:1 mapping.
I'm more interested in Who picks this connection/datagram up? than
Which socket enqueues this connection/datagram?
It may be indifferent for netfilter, but it is region of interest for me.

 I actually like Samir Bellabes' approach, which doesn't suffer from
 these problems IIRC.
Oh, I found him at http://nfws.inl.fr/en/?p=50 . (Sorry, I didn't know.)
He is the person who was discussing with me a few days ago.

 From memory, one approach under discussion was to add netfilter hooks to 
  the transport layer, which could be invoked correctly by each type of 
  protocol when the target process is selected.
 
 We can only invoke the hooks after the socket lookup, but we don't
 know which process is going to call recvmsg() for that socket.

Right. Thus, I'm proposing LSM hooks at accept()/recvmsg() time.

Regards.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCHv7 2/5] rtnetlink: send a single notification on device state changes

2007-12-03 Thread Jarek Poplawski

On 03-12-2007 12:40, Laszlo Attila Toth wrote:
 Jarek Poplawski írta:
 Laszlo Attila Toth wrote, On 11/29/2007 05:11 PM:

 In do_setlink() a single ntification is sent at the end of the function
 if any modification occured. If the address has been changed, another
 notification is sent.


 ...

 @@ -858,6 +859,7 @@ static int do_setlink(struct net_device *dev, 
 struct ifinfomsg *ifm,
  if (tb[IFLA_BROADCAST]) {
  nla_memcpy(dev-broadcast, tb[IFLA_BROADCAST], dev-addr_len);
  send_addr_notify = 1;
 +modified = 1;
  }

 ..

  if (send_addr_notify)
  call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);
 +
 +if (modified)
 +netdev_state_change(dev);
 +

 The subject suggests there might be less notifications. The patch 
 actually
 adds a little. Any additional comment why they are necessary?
 
 The actual state of a device contains its address(es), also address 
 change implies state change, but these are different netlink messages 
 also the NETDEV_CHANGEADDR cannot be dropped because the other one is used.

OK. But, since until this patch it seemed to be enough, it would be
nice to know from the changelog why exactly it's nececessary to add
this now, because it doesn't look like it was omitted here by mistake.
(Or to say that it was omitted by mistake.)

Regards,
Jarek P.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode

2007-12-03 Thread Herbert Xu

On Mon, Dec 03, 2007 at 10:54:35PM +1100, Herbert Xu wrote:

 Hmm, wasn't someone else talking about a non-atomic version of atomic
 ops lately (i.e., atomic with respect to the local CPU only)? Perhaps
 this is the killer app for it :)

Never mind, we already have that in local_t and as Alexey correctly
points out, USER is still going to be the expensive variant with the
preempt_disable (well until BH gets threaded).  So how about this patch?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/net/snmp.h b/include/net/snmp.h
index ea206bf..9444b54 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -23,6 +23,7 @@
 
 #include linux/cache.h
 #include linux/snmp.h
+#include linux/smp.h
 
 /*
  * Mibs are stored in array of unsigned long.
@@ -137,7 +138,10 @@ struct linux_mib {
 #define SNMP_INC_STATS_OFFSET_BH(mib, field, offset)   \
(per_cpu_ptr(mib[0], raw_smp_processor_id())-mibs[field + (offset)]++)
 #define SNMP_INC_STATS_USER(mib, field) \
-   (per_cpu_ptr(mib[1], raw_smp_processor_id())-mibs[field]++)
+   do { \
+   per_cpu_ptr(mib[1], get_cpu())-mibs[field]++; \
+   put_cpu(); \
+   } while (0)
 #define SNMP_INC_STATS(mib, field) \
(per_cpu_ptr(mib[!in_softirq()], raw_smp_processor_id())-mibs[field]++)
 #define SNMP_DEC_STATS(mib, field) \
@@ -145,6 +149,9 @@ struct linux_mib {
 #define SNMP_ADD_STATS_BH(mib, field, addend)  \
(per_cpu_ptr(mib[0], raw_smp_processor_id())-mibs[field] += addend)
 #define SNMP_ADD_STATS_USER(mib, field, addend)\
-   (per_cpu_ptr(mib[1], raw_smp_processor_id())-mibs[field] += addend)
+   do { \
+   per_cpu_ptr(mib[1], get_cpu())-mibs[field] += addend; \
+   put_cpu(); \
+   } while (0)
 
 #endif
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: kernel 2.6.23.8: KERNEL: assertion in net/ipv4/tcp_input.c

2007-12-03 Thread Ilpo Järvinen

On Mon, 3 Dec 2007, Wolfgang Walter wrote:

 with kernel 2.6.23.8 we saw a
 
 KERNEL: assertion ((int)tcp_packets_in_flight(tp) = 0) failed at 
 net/ipv4/tcp_input.c (1292)

Is this the only message? Are there any Leak printouts?
Any tweaking done to TCP related sysctls?
And for completeness, is GSO enabled (ethtool -k)?

Most likely I broke the manual synchronization for left_out in sacktag by 
skipping over it when packets_out == 0 but so far I haven't been able to 
figure out how such state could develop in the first place... Ie., I 
couldn't find a case where tcp_fastretrans_alert wouldn't be called if 
left_out was non-zero (and it did the sync_left_out after modifying
either sacked_out or lost_out, IIRC).

...If you can reproduce it, you could try if this patch below changes 
anything (should silence the assert and trigger earlier a WARN_ON or
two :-)). ...If this triggers, then I'm sure we can pollute TCP code
by a larger number of more costly checks to catch it in early.

This might reveal a long-standing inconsistency of left_out in some
case I just couldn't come up with by code review. Left_out will be
(is) anyway dropped as unnecessary in 2.6.24. In 2.6.23 sync for
left_out occurs quite soon after that BUG_TRAP anyway so the effect
won't be too dramatic, prior_in_flight would be once stale, won't
lead to big problems (either missed cnwd or cwnd_cnt increment, or
failure to do application limited check at that particular ACK).

Thanks anyway for the report. ...If I figure something out here, I'll
let you know.

--

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c9298a7..0c5194d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1012,8 +1012,12 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
if (before(TCP_SKB_CB(ack_skb)-ack_seq, prior_snd_una - 
tp-max_window))
return 0;
 
-   if (!tp-packets_out)
+   if (!tp-packets_out) {
+   WARN_ON(tp-sacked_out);
+   WARN_ON(tp-lost_out);
+   WARN_ON(tp-left_out);
goto out;
+   }
 
/* SACK fastpath:
 * if the only SACK change is the increase of the end_seq of
@@ -1277,14 +1281,14 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
}
}
 
+out:
+
tp-left_out = tp-sacked_out + tp-lost_out;
 
if ((reord  tp-fackets_out)  icsk-icsk_ca_state != TCP_CA_Loss 
(!tp-frto_highmark || after(tp-snd_una, tp-frto_highmark)))
tcp_update_reordering(sk, ((tp-fackets_out + 1) - reord), 0);
 
-out:
-
 #if FASTRETRANS_DEBUG  0
BUG_TRAP((int)tp-sacked_out = 0);
BUG_TRAP((int)tp-lost_out = 0);
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] SMC911X: Fix using of dereferenced skb after netif_rx

2007-12-03 Thread Peter Korsgaard

 Wang == Wang Chen [EMAIL PROTECTED] writes:

Hi,

 Wang +len = skb-len;
 Wang  netif_rx(skb);
 dev- stats.rx_packets++;
 Wang -dev-stats.rx_bytes += skb-len;
 Wang +dev-stats.rx_bytes += len;

Why not simply update the stats before calling netif_rx as the return
value isn't checked anyway?

-- 
Bye, Peter Korsgaard
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches

2007-12-03 Thread Gerrit Renker

|  Are you suggesting using netdev exclusively or in addition to [EMAIL 
PROTECTED]
| 
| Well, since at least one person that has contributed significantly in
| the past has said he can't cope with traffic on netdev, we can CC
| [EMAIL PROTECTED]
I have a similar problem with the traffic but agree and will copy as well.


|  have been posted and re-posted for a period of over 9 months on [EMAIL 
PROTECTED], and
| 
| Being posted and re-posted does not guarantee that the patch is OK or
| that is in a form that is acceptable to all tree maintainers.
With the first point I couldn't agree more, but this is not really what I meant 
- the point
was that despite posting and re-posting there was often silence. And now there 
is feedback,
in form of a patchset made by you; and all that I am asking for is just to be 
given the time to
look that through. Last time a RFC patch appeared at 3pm and was checked in the 
same evening
(only found out next morning).
With your experience and long background as a maintainer maybe you expect 
quicker turn-around
times, but I also had waited patiently until you had had a chance to review the 
stuff I sent.


|  | . The code that allocates the RX ring deals with failures when one of the 
entries in
|  |   the ring buffer is not successfully allocated, the original code was 
leaking the
|  |   successfully allocated entries.
| 
| Sorry for not point out exactly this, here it goes:
| 
| Your original patch:
| 
| +int tfrc_rx_hist_init(struct tfrc_rx_hist *h)
| +{
| + int i;
| +
| + for (i = 0; i = NDUPACK; i++) {
| + h-ring[i] = kmem_cache_alloc(tfrc_rxh_cache, GFP_ATOMIC);
| + if (h-ring[i] == NULL)
| + return 1;
| + }
| + h-loss_count = 0;
| + h-loss_start = 0;
| + return 0;
| +}
| +EXPORT_SYMBOL_GPL(tfrc_rx_hist_init);
| 
I expected this and it actually was very clear from your original message. I 
fully back up
your solution in this point, see below. But my question above was rather: are 
there any
other bugs rather than the above leakage, which is what the previous email 
seemed to indicate?

With regard to your solution - you are using

int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h)
{
h-ring = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC);
h-loss_count = h-loss_start = 0;
return h-ring == NULL;
}

which is better not only for one but for two reasons. It solves the leakage and 
in addition makes
the entire code simpler. Fully agreed.
  

| 
|  | . I haven't checked if all the code was commited, as I tried to introduce 
just what was
|  |   immediatelly used, probably we'll need to do some changes when working 
on the merge
|  |   of your loss intervals code.
|  Sorry I don't understand this point.
| 
| Let me check now and tell you for sure:
| 
| tfrc_rx_hist_delta_seqno and tfrc_rx_hist_swap were not included, as
| they were not used, we should introduce them later, when getting to the
| working on the loss interval code.
Ah thanks, that was really not clear. Just beginning to work through the set.

|  static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff 
*skb)
|  {
|  // ...
|  u32 sample, payload_size = skb-len - dccp_hdr(skb)-dccph_doff 
* 4;
|  
|  if (unlikely(hcrx-ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) {
|  if (is_data_packet) {
|  do_feedback = FBACK_INITIAL;
|  ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
|  ccid3_hc_rx_update_s(hcrx, payload_size);
|  }
|  goto update_records;
|  }
|  
|  == Non-data packets are ignored for the purposes of computing s (this is 
in the RFC),
|  consequently update_s() is only called for data packets; using the two 
following
|  functions:
|  
|  
|  static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 
weight)
|  {
|  return avg ? (weight * avg + (10 - weight) * newval) / 10 : 
newval;
|  }
| 
| I hadn't considered that tfrc_ewma would for every packet check if the
| avg was 0 and I find it suboptimal now that I look at it, we are just
| feeding data packets, no? 
Yes exactly, only data packets are used for s.

|  static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, 
int len)
|  {
|  if (likely(len  0))/* don't update on empty packets (e.g. 
ACKs) */
|  hcrx-ccid3hcrx_s = tfrc_ewma(hcrx-ccid3hcrx_s, len, 
9);
|  }
| 
|   And we also just do test for len  0 in update_s, that looks like
| also excessive, no?
Hm, I think we need to make it robust against API bugs and/or zero-sized
data packets. The check `len  0' may seem redundant but it catches such
a condition. For a moving average an accidental zero value will have
quite an impact on the overall value. In CCID3 it

[PATCH net-2.6.25 1/2][INET] Merge sys.net.ipv4.ip_forward and sys.net.ipv4.conf.all.forwarding

2007-12-03 Thread Pavel Emelyanov

AFAIS these two entries should do the same thing - change the
forwarding state on ipv4_devconf and on all the devices.

I propose to merge the handlers together using ctl paths.

The inet_forward_change() is static after this and I move
it higher to make code organized like
helpers (including this)
proc/sysctl handlers
tables
rather than mixing it altogether.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index d83fee2..dd093ea 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -135,7 +135,6 @@ extern struct in_device *inetdev_by_index(int);
 extern __be32  inet_select_addr(const struct net_device *dev, __be32 
dst, int scope);
 extern __be32  inet_confirm_addr(const struct net_device *dev, __be32 
dst, __be32 local, int scope);
 extern struct in_ifaddr *inet_ifa_byprefix(struct in_device *in_dev, __be32 
prefix, __be32 mask);
-extern voidinet_forward_change(void);
 
 static __inline__ int inet_ifa_match(__be32 addr, struct in_ifaddr *ifa)
 {
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index c19c8db..0b5f042 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1264,6 +1264,28 @@ static void devinet_copy_dflt_conf(int i)
read_unlock(dev_base_lock);
 }
 
+static void inet_forward_change(void)
+{
+   struct net_device *dev;
+   int on = IPV4_DEVCONF_ALL(FORWARDING);
+
+   IPV4_DEVCONF_ALL(ACCEPT_REDIRECTS) = !on;
+   IPV4_DEVCONF_DFLT(FORWARDING) = on;
+
+   read_lock(dev_base_lock);
+   for_each_netdev(init_net, dev) {
+   struct in_device *in_dev;
+   rcu_read_lock();
+   in_dev = __in_dev_get_rcu(dev);
+   if (in_dev)
+   IN_DEV_CONF_SET(in_dev, FORWARDING, on);
+   rcu_read_unlock();
+   }
+   read_unlock(dev_base_lock);
+
+   rt_cache_flush(0);
+}
+
 static int devinet_conf_proc(ctl_table *ctl, int write,
 struct file* filp, void __user *buffer,
 size_t *lenp, loff_t *ppos)
@@ -1333,28 +1355,6 @@ static int devinet_conf_sysctl(ctl_table *table, int 
__user *name, int nlen,
return 1;
 }
 
-void inet_forward_change(void)
-{
-   struct net_device *dev;
-   int on = IPV4_DEVCONF_ALL(FORWARDING);
-
-   IPV4_DEVCONF_ALL(ACCEPT_REDIRECTS) = !on;
-   IPV4_DEVCONF_DFLT(FORWARDING) = on;
-
-   read_lock(dev_base_lock);
-   for_each_netdev(init_net, dev) {
-   struct in_device *in_dev;
-   rcu_read_lock();
-   in_dev = __in_dev_get_rcu(dev);
-   if (in_dev)
-   IN_DEV_CONF_SET(in_dev, FORWARDING, on);
-   rcu_read_unlock();
-   }
-   read_unlock(dev_base_lock);
-
-   rt_cache_flush(0);
-}
-
 static int devinet_sysctl_forward(ctl_table *ctl, int write,
  struct file* filp, void __user *buffer,
  size_t *lenp, loff_t *ppos)
@@ -1537,6 +1537,27 @@ static void devinet_sysctl_unregister(struct 
ipv4_devconf *p)
 }
 #endif
 
+static struct ctl_table ctl_forward_entry[] = {
+   {
+   .ctl_name   = NET_IPV4_FORWARD,
+   .procname   = ip_forward,
+   .data   = ipv4_devconf.data[
+   NET_IPV4_CONF_FORWARDING - 1],
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = devinet_sysctl_forward,
+   .strategy   = devinet_conf_sysctl,
+   .extra1 = ipv4_devconf,
+   },
+   { },
+};
+
+static __initdata struct ctl_path net_ipv4_path[] = {
+   { .procname = net, .ctl_name = CTL_NET, },
+   { .procname = ipv4, .ctl_name = NET_IPV4, },
+   { },
+};
+
 void __init devinet_init(void)
 {
register_gifconf(PF_INET, inet_gifconf);
@@ -1550,6 +1571,7 @@ void __init devinet_init(void)
ipv4_devconf);
__devinet_sysctl_register(default, NET_PROTO_CONF_DEFAULT,
ipv4_devconf_dflt);
+   register_sysctl_paths(net_ipv4_path, ctl_forward_entry);
 #endif
 }
 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index bec6fe8..8c16077 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -35,62 +35,6 @@ struct ipv4_config ipv4_config;
 
 #ifdef CONFIG_SYSCTL
 
-static
-int ipv4_sysctl_forward(ctl_table *ctl, int write, struct file * filp,
-   void __user *buffer, size_t *lenp, loff_t *ppos)
-{
-   int val = IPV4_DEVCONF_ALL(FORWARDING);
-   int ret;
-
-   ret = proc_dointvec(ctl, write, filp, buffer, lenp, ppos);
-
-   if (write  IPV4_DEVCONF_ALL(FORWARDING) != val)
-   inet_forward_change();
-
-   return ret;
-}
-
-static int ipv4_sysctl_forward_strategy(ctl_table *table,
-

[PATCH net-2.6.25 2/2][INET] Eliminate difference in actions of sysctl and proc handler for conf.all.forwarding

2007-12-03 Thread Pavel Emelyanov

AFAIS the net.ipv4.conf. dev, all and default sysctls should 
work like this when changed (besides changing the value itself):

dev   : optionally do smth else
all : walk devices
default : walk devices

The proc handler for net.ipv4.conf.all works like this:

dev   : flush rt cache
all : walk devices and flush rt cache
default : nothing

while the sysctl handler works like this:

dev   : nothing
all : nothing
default : walk devices but don't flush the cache

All this looks strange. Am I right that regardless of whatever
handler (proc or syscall) is called the behavior should be:

dev   : flush rt cache
all : walk the devices and flush the cache
default : walk the devices and flush the cache

?

If yes, then this patch should fix it up.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 0b5f042..1934a06 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1282,6 +1282,17 @@ static void inet_forward_change(void)
rcu_read_unlock();
}
read_unlock(dev_base_lock);
+}
+
+static void fixup_forward_change(struct ctl_table *table)
+{
+   struct ipv4_devconf *conf;
+
+   conf = table-extra1;
+   if (conf == ipv4_devconf)
+   inet_forward_change();
+   else if (conf == ipv4_devconf_dflt)
+   devinet_copy_dflt_conf(NET_IPV4_CONF_FORWARDING - 1);
 
rt_cache_flush(0);
 }
@@ -1305,9 +1316,9 @@ static int devinet_conf_proc(ctl_table *ctl, int write,
return ret;
 }
 
-static int devinet_conf_sysctl(ctl_table *table, int __user *name, int nlen,
+static int __devinet_conf_sysctl(ctl_table *table, int __user *name, int nlen,
   void __user *oldval, size_t __user *oldlenp,
-  void __user *newval, size_t newlen)
+  void __user *newval, size_t newlen, int *idx)
 {
struct ipv4_devconf *cnf;
int *valp = table-data;
@@ -1346,16 +1357,27 @@ static int devinet_conf_sysctl(ctl_table *table, int 
__user *name, int nlen,
 
cnf = table-extra1;
i = (int *)table-data - cnf-data;
-
set_bit(i, cnf-state);
+   *idx = i;
+   return 1;
+}
+
+static int devinet_conf_sysctl(ctl_table *table, int __user *name, int nlen,
+  void __user *oldval, size_t __user *oldlenp,
+  void __user *newval, size_t newlen)
+{
+   int ret, i;
 
-   if (cnf == ipv4_devconf_dflt)
+   ret = __devinet_conf_sysctl(table, name, nlen, oldval, oldlenp,
+   newval, newlen, i);
+
+   if (ret == 1  table-extra1 == ipv4_devconf_dflt)
devinet_copy_dflt_conf(i);
 
-   return 1;
+   return ret;
 }
 
-static int devinet_sysctl_forward(ctl_table *ctl, int write,
+static int devinet_forward_proc(ctl_table *ctl, int write,
  struct file* filp, void __user *buffer,
  size_t *lenp, loff_t *ppos)
 {
@@ -1363,16 +1385,25 @@ static int devinet_sysctl_forward(ctl_table *ctl, int 
write,
int val = *valp;
int ret = proc_dointvec(ctl, write, filp, buffer, lenp, ppos);
 
-   if (write  *valp != val) {
-   if (valp == IPV4_DEVCONF_ALL(FORWARDING))
-   inet_forward_change();
-   else if (valp != IPV4_DEVCONF_DFLT(FORWARDING))
-   rt_cache_flush(0);
-   }
+   if (write  *valp != val)
+   fixup_forward_change(ctl);
 
return ret;
 }
 
+static int devinet_forward_sysctl(ctl_table *table, int __user *name, int nlen,
+  void __user *oldval, size_t __user *oldlenp,
+  void __user *newval, size_t newlen)
+{
+   int ret, i;
+
+   ret = __devinet_conf_sysctl(table, name, nlen, oldval, oldlenp,
+   newval, newlen, i);
+   if (ret == 1)
+   fixup_forward_change(table);
+   return ret;
+}
+
 int ipv4_doint_and_flush(ctl_table *ctl, int write,
 struct file* filp, void __user *buffer,
 size_t *lenp, loff_t *ppos)
@@ -1436,8 +1467,8 @@ static struct devinet_sysctl_table {
 } devinet_sysctl = {
.devinet_vars = {
DEVINET_SYSCTL_COMPLEX_ENTRY(FORWARDING, forwarding,
-devinet_sysctl_forward,
-devinet_conf_sysctl),
+devinet_forward_proc,
+devinet_forward_sysctl),
DEVINET_SYSCTL_RO_ENTRY(MC_FORWARDING, mc_forwarding),
 
DEVINET_SYSCTL_RW_ENTRY(ACCEPT_REDIRECTS, accept_redirects),
@@ -1545,8 +1576,8 @@ static struct ctl_table ctl_forward_entry[] = {
NET_IPV4_CONF_FORWARDING - 1],
.maxlen =

Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches

2007-12-03 Thread Arnaldo Carvalho de Melo

Em Mon, Dec 03, 2007 at 01:49:47PM +, Gerrit Renker escreveu:
 |  Are you suggesting using netdev exclusively or in addition to [EMAIL 
 PROTECTED]
 | 
 | Well, since at least one person that has contributed significantly in
 | the past has said he can't cope with traffic on netdev, we can CC
 | [EMAIL PROTECTED]
 I have a similar problem with the traffic but agree and will copy as well.
 
 
 |  have been posted and re-posted for a period of over 9 months on [EMAIL 
 PROTECTED], and
 | 
 | Being posted and re-posted does not guarantee that the patch is OK or
 | that is in a form that is acceptable to all tree maintainers.

 With the first point I couldn't agree more, but this is not really what I 
 meant - the point
 was that despite posting and re-posting there was often silence. And now 
 there is feedback,
 in form of a patchset made by you; and all that I am asking for is just to be 
 given the time to
 look that through. Last time a RFC patch appeared at 3pm and was checked in 
 the same evening
 (only found out next morning).

I was too optimistic about that one, feeling that it was safe, sorry
about that, will avoid doing that in the future.

 With your experience and long background as a maintainer maybe you expect 
 quicker turn-around
 times, but I also had waited patiently until you had had a chance to review 
 the stuff I sent.

Agreed, my bad, will be more patient with my side as you've been with
yours :-)

 |  | . The code that allocates the RX ring deals with failures when one of 
 the entries in
 |  |   the ring buffer is not successfully allocated, the original code was 
 leaking the
 |  |   successfully allocated entries.
 | 
 | Sorry for not point out exactly this, here it goes:
 | 
 | Your original patch:
 | 
 | +int tfrc_rx_hist_init(struct tfrc_rx_hist *h)
 | +{
 | +   int i;
 | +
 | +   for (i = 0; i = NDUPACK; i++) {
 | +   h-ring[i] = kmem_cache_alloc(tfrc_rxh_cache, GFP_ATOMIC);
 | +   if (h-ring[i] == NULL)
 | +   return 1;
 | +   }
 | +   h-loss_count = 0;
 | +   h-loss_start = 0;
 | +   return 0;
 | +}
 | +EXPORT_SYMBOL_GPL(tfrc_rx_hist_init);
 | 
 I expected this and it actually was very clear from your original message. I 
 fully back up
 your solution in this point, see below. But my question above was rather: are 
 there any
 other bugs rather than the above leakage, which is what the previous email 
 seemed to indicate?
 
 With regard to your solution - you are using
 
   int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h)
   {
   h-ring = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC);
   h-loss_count = h-loss_start = 0;
   return h-ring == NULL;
   }
 
 which is better not only for one but for two reasons. It solves the leakage 
 and in addition makes
 the entire code simpler. Fully agreed.

good
   
 
 | 
 |  | . I haven't checked if all the code was commited, as I tried to 
 introduce just what was
 |  |   immediatelly used, probably we'll need to do some changes when 
 working on the merge
 |  |   of your loss intervals code.
 |  Sorry I don't understand this point.
 | 
 | Let me check now and tell you for sure:
 | 
 | tfrc_rx_hist_delta_seqno and tfrc_rx_hist_swap were not included, as
 | they were not used, we should introduce them later, when getting to the
 | working on the loss interval code.
 Ah thanks, that was really not clear. Just beginning to work through the set.

great
 
 |static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff 
 *skb)
 |{
 |// ...
 |u32 sample, payload_size = skb-len - dccp_hdr(skb)-dccph_doff 
 * 4;
 |  
 |if (unlikely(hcrx-ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) {
 |if (is_data_packet) {
 |do_feedback = FBACK_INITIAL;
 |ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
 |ccid3_hc_rx_update_s(hcrx, payload_size);
 |}
 |goto update_records;
 |}
 |  
 |  == Non-data packets are ignored for the purposes of computing s (this is 
 in the RFC),
 |  consequently update_s() is only called for data packets; using the 
 two following
 |  functions:
 |  
 |  
 |static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 
 weight)
 |{
 |return avg ? (weight * avg + (10 - weight) * newval) / 10 : 
 newval;
 |}
 | 
 | I hadn't considered that tfrc_ewma would for every packet check if the
 | avg was 0 and I find it suboptimal now that I look at it, we are just
 | feeding data packets, no? 
 Yes exactly, only data packets are used for s.
 
 |static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, 
 int len)
 |{
 |if (likely(len  0))/* don't update on empty packets (e.g. 
 ACKs) */
 |hcrx-ccid3hcrx_s = tfrc_ewma(hcrx-ccid3hcrx_s, len, 
 9);
 |}
 | 
 | And we also

Re: namespace support requires network modules to say GPL

2007-12-03 Thread Arjan van de Ven

On Mon, 03 Dec 2007 09:24:15 +0100
Romano Giannetti [EMAIL PROTECTED] wrote:

 
 
 On Sat, 2007-12-01 at 22:34 -0500, Mark Lord wrote:
  Stephen Hemminger wrote:.
  
   I spoke too soon earlier, ndiswrapper builds and loads against
   current 2.6.24-rc3. Vmware and proprietary VPN software probably
   do not. Once again I don't give a damn, but the enterprise distro
   vendors certainly care.
  ...
  
  Naw, enterprise (or any other) distro vendors shouldn't have any
  issues here, since they can just patch their kernels around any
  issues.
 
 Please pardon me for jumping in; 

 
 What I think is that every time VMware or (worst) ndiswrapper breaks,

if you had read the thread... ndiswrapper doesn't break, and vmware
driver had some bugs that, once fixed, no no longer break either


-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: e1000 driver problems

2007-12-03 Thread Kok, Auke

Lukas Hejtmanek wrote:
 On Tue, Nov 27, 2007 at 10:23:00AM -0800, Kok, Auke wrote:
 can you see if your problem goes away with this patch?
 
 I cannot test it right now but friend of mine has the same card with 2.6.23.1
 kernel. it does not. he also tried module 7.6.12 from source fourge, your 
 patch
 did not help. Moreover, it just hangs network connections after while. No
 message in dmesg about hangup.

can you open up a ticket on e1000.sf.net and fill in a full bugreport including
output of all of these? :

- ethttool -i eth0
- ethtool -e eth0
- lspci -vv
- full dmesg (not just the driver parts)
- dmidecode
- cat /proc/interrupts

I'd like to see if maybe we can reproduce this in our lab.

Thanks,

Auke
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: kernel 2.6.23.8: KERNEL: assertion in net/ipv4/tcp_input.c

2007-12-03 Thread Wolfgang Walter

Am Montag, 3. Dezember 2007 14:34 schrieb Ilpo Järvinen:
 On Mon, 3 Dec 2007, Wolfgang Walter wrote:
  with kernel 2.6.23.8 we saw a
 
  KERNEL: assertion ((int)tcp_packets_in_flight(tp) = 0) failed at
  net/ipv4/tcp_input.c (1292)

 Is this the only message? Are there any Leak printouts?

No.

4 days earlier there were 3 messages: TCP: Treason uncloaked! Peer 
a.b.c.d:80/56532 shrinks window 3535507131:3535513869. Repaired.

 Any tweaking done to TCP related sysctls?
 And for completeness, is GSO enabled (ethtool -k)?

rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off


 Most likely I broke the manual synchronization for left_out in sacktag by
 skipping over it when packets_out == 0 but so far I haven't been able to
 figure out how such state could develop in the first place... Ie., I
 couldn't find a case where tcp_fastretrans_alert wouldn't be called if
 left_out was non-zero (and it did the sync_left_out after modifying
 either sacked_out or lost_out, IIRC).

 ...If you can reproduce it, you could try if this patch below changes

I don't know how to reproduce it - we never saw the message before. I'll aply 
the patch. Let see if the WARN_ON triggers before we update to a newer 
kernel :-).

 anything (should silence the assert and trigger earlier a WARN_ON or
 two :-)). ...If this triggers, then I'm sure we can pollute TCP code
 by a larger number of more costly checks to catch it in early.

 This might reveal a long-standing inconsistency of left_out in some
 case I just couldn't come up with by code review. Left_out will be
 (is) anyway dropped as unnecessary in 2.6.24. In 2.6.23 sync for
 left_out occurs quite soon after that BUG_TRAP anyway so the effect
 won't be too dramatic, prior_in_flight would be once stale, won't
 lead to big problems (either missed cnwd or cwnd_cnt increment, or
 failure to do application limited check at that particular ACK).

 Thanks anyway for the report. ...If I figure something out here, I'll
 let you know.

 --

 diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
 index c9298a7..0c5194d 100644
 --- a/net/ipv4/tcp_input.c
 +++ b/net/ipv4/tcp_input.c
 @@ -1012,8 +1012,12 @@ tcp_sacktag_write_queue(struct sock *sk, struct
 sk_buff *ack_skb, u32 prior_snd_ if (before(TCP_SKB_CB(ack_skb)-ack_seq,
 prior_snd_una - tp-max_window)) return 0;

 - if (!tp-packets_out)
 + if (!tp-packets_out) {
 + WARN_ON(tp-sacked_out);
 + WARN_ON(tp-lost_out);
 + WARN_ON(tp-left_out);
   goto out;
 + }

   /* SACK fastpath:
* if the only SACK change is the increase of the end_seq of
 @@ -1277,14 +1281,14 @@ tcp_sacktag_write_queue(struct sock *sk, struct
 sk_buff *ack_skb, u32 prior_snd_ }
   }

 +out:
 +
   tp-left_out = tp-sacked_out + tp-lost_out;

   if ((reord  tp-fackets_out)  icsk-icsk_ca_state != TCP_CA_Loss 
   (!tp-frto_highmark || after(tp-snd_una, tp-frto_highmark)))
   tcp_update_reordering(sk, ((tp-fackets_out + 1) - reord), 0);

 -out:
 -
  #if FASTRETRANS_DEBUG  0
   BUG_TRAP((int)tp-sacked_out = 0);
   BUG_TRAP((int)tp-lost_out = 0);

Thanks and regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches

2007-12-03 Thread Gerrit Renker

|  |  static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock 
*hcrx, int len)
|  |  {
|  |  if (likely(len  0))/* don't update on empty 
packets (e.g. ACKs) */
|  |  hcrx-ccid3hcrx_s = 
tfrc_ewma(hcrx-ccid3hcrx_s, len, 9);
|  |  }
|  | 
|  |   And we also just do test for len  0 in update_s, that looks like
|  | also excessive, no?
|  Hm, I think we need to make it robust against API bugs and/or zero-sized
|  data packets. The check `len  0' may seem redundant but it catches such
|  a condition. For a moving average an accidental zero value will have
|  quite an impact on the overall value. In CCID3 it is
|  
|  x_new = 0.9 * x_old + 0.1 * len
|  
|  So if len is accidentally 0 where it should not be, then each time the
|  moving average is reduced to 90%.
| 
| So we must make it a BUG_ON, not something that is to be always present.
|  
I think it should be a warning condition since it can be triggered when
the remote party sends zero-sized packets. It may be good to log this
into the syslog to warn about possibly misbehaving apps/peers/remote
stacks.


|  As a comparison - the entire patch set took about a full month to do.
|  But that meant I am reasonably sure the algorithm is sound and can cope
|  with problematic conditions.
| 
| And from what I saw so far that is my impression too, if you look at
| what I'm doing it is:
| 
| 1. go thru the whole patch trying to understand hunk by hunk
You are doing a great job - in particular as it really is a lot of material.

| 2. do consistency changes (add namespace prefixes)
| 3. reorganize the code to look more like what is already there, we
|both have different backgrounds and tastes about how code should
|be written, so its only normal that if we want to keep the code
|consistent, and I want that, I jump into things I think should be
|reworded, while trying to keep the algorithm expressed by you.
|
Agree, that is not always easy to get right. I try to stick as close as
possible to existing conventions but of course that is my
interpretation, so I am already anticipating such changes/comments here.

| think about further automatization on regression testing.
| 
If it is of any use, some scripts and setups are at the bottom of the page at
http://www.erg.abdn.ac.uk/users/gerrit/dccp/testing_dccp/
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2.6.25] net: removes unnecessary dependencies for net_namespace.h

2007-12-03 Thread Eric W. Biederman


Looks good to me.

Acked-by: Eric W. Biederman [EMAIL PROTECTED]


Eric

Denis V. Lunev [EMAIL PROTECTED] writes:

 you right, how about this?

 Eric W. Biederman wrote:
 Denis V. Lunev [EMAIL PROTECTED] writes:
 
 This patch removes some unneeded includes for net_namespace.h to speed up
 compilation.

 Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

 diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
 index f285de6..28b7f25 100644
 --- a/include/net/pkt_cls.h
 +++ b/include/net/pkt_cls.h
 @@ -2,7 +2,6 @@
  #define __NET_PKT_CLS_H
  
  #include linux/pkt_cls.h
 -#include net/net_namespace.h
  #include net/sch_generic.h
  #include net/act_api.h
 
 What of tcf_match_indev?  I guess linux/netdevice.h brings
 it in for us but still.
 
 diff --git a/include/net/sock.h b/include/net/sock.h
 index 43e3cd9..a04e361 100644
 --- a/include/net/sock.h
 +++ b/include/net/sock.h
 @@ -57,7 +57,6 @@
  #include asm/atomic.h
  #include net/dst.h
  #include net/checksum.h
 -#include net/net_namespace.h
  
  /*
   * This structure really needs to be cleaned up.
 @@ -95,6 +94,7 @@ typedef struct {
  
  struct sock;
  struct proto;
 +struct net;
  
  /**
   * struct sock_common - minimal network layer representation of sockets
 
 This hunk definitely looks sane.
 
 Eric
 
 

 diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
 index f285de6..2eaf204 100644
 --- a/include/net/pkt_cls.h
 +++ b/include/net/pkt_cls.h
 @@ -2,7 +2,6 @@
  #define __NET_PKT_CLS_H
  
  #include linux/pkt_cls.h
 -#include net/net_namespace.h
  #include net/sch_generic.h
  #include net/act_api.h
  
 @@ -336,6 +335,8 @@ static inline int tcf_valid_offset(const struct sk_buff
 *skb,
  }
  
  #ifdef CONFIG_NET_CLS_IND
 +#include net/net_namespace.h
 +
  static inline int
  tcf_change_indev(struct tcf_proto *tp, char *indev, struct rtattr *indev_tlv)
  {
 diff --git a/include/net/sock.h b/include/net/sock.h
 index 43e3cd9..a04e361 100644
 --- a/include/net/sock.h
 +++ b/include/net/sock.h
 @@ -57,7 +57,6 @@
  #include asm/atomic.h
  #include net/dst.h
  #include net/checksum.h
 -#include net/net_namespace.h
  
  /*
   * This structure really needs to be cleaned up.
 @@ -95,6 +94,7 @@ typedef struct {
  
  struct sock;
  struct proto;
 +struct net;
  
  /**
   *   struct sock_common - minimal network layer representation of sockets
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/2] netem: trace enhancement

2007-12-03 Thread Patrick McHardy


Ariane Keller wrote:

Patrick McHardy wrote:


I dislike using anything besides rtnetlink for qdisc configuration.
The only way to transfer arbitary amounts of data over netlink would
be to spread the data over multiple messages. But then again, you're
using kmalloc and only seem to allocate 4k, so how large are these
traces in practice?


For each packet to be processed there is 32bit of data, which encodes 
delay and drop, duplicate etc. The size of the actual trace file can 
therefore reach any length, depending on for how many packets the 
information is encoded (up to several GB).
Therefore we send the trace file in chunks of 4000bytes to the kernel. 
In order to have always a packet-delay-value ready, we maintain two 
delay queues in the kernel (each of 4k). In a first step, both queues 
are filled, and the values are read from the first queue, if this queue 
is finished, we read values from the second queue and fill the first 
queue with new values from the trace file etc. Therefore we have a user 
space process running, which reads the values from the trace file, and 
sends them to the kernel.



That sounds like it would also be possible using rtnetlink. You could
send out a notification whenever you switch the active buffer and have
userspace listen to these and replace the inactive one.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: namespace support requires network modules to say GPL

2007-12-03 Thread Eric W. Biederman

Patrick McHardy [EMAIL PROTECTED] writes:

 Ben Greear wrote:
 I have a binary module that uses dev_get_by_name...it's sort of a bridge-like
 thing and
 needs user-space to tell it which device to listen for packets on...

 This code doesn't need or care about name-spaces, so I don't see how it could
 really
 be infringing on the author's code (any worse than loading a binary driver
 into the kernel
 ever does).

Regardless of infringement it is incompatible with a complete network
namespace implementation.  Further it sounds like the module you are
describing defines a kernel ABI without being merged and hopes that
ABI will still be supportable in the future.  Honestly I think doing so
is horrible code maintenance policy.

 I would certainly prefer to not have to patch around any problems with 
 calling
 dev_get_by_name
 from a non-gpl module, but if required, I can probably figure something 
 out...


 For all I care binary modules can break, but frankly I don't see
 how encapsulating a couple of structures and pointers in a new
 structure and adding a new argument to existing functions shifts
 the decision about how a function should be usable to the namespace
 guys. IMO all functions should continue to be usable as before,
 as decided by whoever actually wrote them. The only exception
 might be stuff where an existing EXPORT_SYMBOL is clearly wrong,
 but that would be a seperate discussion.

I don't think we have actually shifted the decision.

Further from a namespace perspective if I had to support out of tree
modules and the current in kernel API the implementation would be
impossible short of loading kernel modules multiple times once
for each namespace.  I totally refuse to give out of tree modules
that power whatever their license.

Right now the network namespace code that has been merged isn't that
interesting as it does not include ipv4 and ipv6 support which everyone
uses.

One of the tests for completion of the network namespace work is
grepping for init_net and making certain we have cleanly removed
all references to except in a handful of cases like the boot code.

Once things are largely complete it makes sense to argue with out of
tree module authors that because they don't have network namespace
support in their modules, their modules are broken.   

Right now I suspect to many developers even of in-tree modules
I have just shifted code around in an annoying looking way.  I can
completely see other developers not getting the point.

Eric
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: namespace support requires network modules to say GPL

2007-12-03 Thread Eric W. Biederman

Romano Giannetti [EMAIL PROTECTED] writes:

 Please pardon me for jumping in; I am not a kernel developer, but I try
 to help with debugging whenever I can (and it's not just hand-waving, I
 helped to track down a couple of nasty bugs on MMC or ACPI EC,
 recently). And I am an engineer and IANAL, so I wouldn't speak about
 laws here. But I think it's not just a distribution's problem.

 Unfortunately, I need VMware and ndiswrapper to get work done with my
 laptop. It's not the perfect world, but the only alternative is to boot
 in XP. So I normally stick with vendors kernels and, when I have time to
 play with new kernel, I go for it. If ndiswrapper and VMware work,
 perfect, I can test extensively the new kernel; if I find problems, I
 *know* I have to restart without proprietary modules, try to reproduce,
 report back. I did it a lot of times.

 What I think is that every time VMware or (worst) ndiswrapper breaks,
 the kernel loose an awful lot of testers. In the span of time before
 Giri and the VMware team post a patch (-rc1 and -rc4, tipically), my
 testing activity is just occasional. And I guess a lot of people is in
 the same situation. 

 These are just my 2cents. I will continue to test new kernels every time
 I can, and to use native solutions as often as I can (go, ath5k, go!;
 and LabWindows/CVI for Linux, anyone?). But maybe a bit of tolerance can
 help everyone...

As a kernel developer let me say thank you for doing what testing you can.

I think a bit of tolerance for others can help the conversation.  At the
same time since out of tree modules (even GPL'd ones) have not chosen
to play with us we have to move forward as best we can without their
input.  It isn't possible to do anything else.

Right now I have made some changes for good technical reasons, and
some out of tree modules have broken.  Regardless of the flavor of
EXPORT_SYMBOL they would have broken.

Based on my experience with in-tree code and the few glimpses I
have gotten of out of tree code the reason the out of tree code broke
is because it is doing very questionable things.

So the best I can say at this point, is my apologies that we have not
served you better and made it possible to do what you need to do
without relying on code of questionable character.  Hopefully this
situation will be better in the future.

Eric
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: namespace support requires network modules to say GPL

2007-12-03 Thread Ben Greear


Eric W. Biederman wrote:

Patrick McHardy [EMAIL PROTECTED] writes:

  

Ben Greear wrote:


I have a binary module that uses dev_get_by_name...it's sort of a bridge-like
thing and
needs user-space to tell it which device to listen for packets on...

This code doesn't need or care about name-spaces, so I don't see how it could
really
be infringing on the author's code (any worse than loading a binary driver
into the kernel
ever does).
  


Regardless of infringement it is incompatible with a complete network
namespace implementation.  Further it sounds like the module you are
describing defines a kernel ABI without being merged and hopes that
ABI will still be supportable in the future.  Honestly I think doing so
is horrible code maintenance policy.
  
I don't mind if the ABI changes, so long as I can still use something 
similar.


The namespace logic is interesting to me in general, but at this point I 
can't think of a way that
it actually helps this particular module.  All I really need is a way to 
grab every frame
from eth0 and then transmit it to eth1.  I'm currently doing this by 
finding the netdevice
and registering a raw-packet protocol (ie, like tcpdump would do).  At 
least up to 2.6.23,
this does not require any hacks to the kernel and uses only non GPL 
exported symbols.


Based on my understanding of the namespace logic, if I never add any 
namespaces,
the general network layout should look similar to how it does today, so 
I should have

no logical problem with my module.


Once things are largely complete it makes sense to argue with out of
tree module authors that because they don't have network namespace
support in their modules, their modules are broken.   
  
Does this imply that every module that accesses the network code *must* 
become
GPL simply because it must interact with namespace logic that is 
exported as GPL only symbols?


Thanks,
Ben

--
Ben Greear [EMAIL PROTECTED] 
Candela Technologies Inc  http://www.candelatech.com



--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/2] netem: trace enhancement

2007-12-03 Thread Ben Greear


Patrick McHardy wrote:


That sounds like it would also be possible using rtnetlink. You could
send out a notification whenever you switch the active buffer and have
userspace listen to these and replace the inactive one.


Also, I think you will need a larger cache than 4-8k if you are running 
higher speeds (100,000 pps, etc),
as you probably can't rely on user-space responding reliably every 10ms 
(or even less time for faster

speeds.)

Thanks,
Ben



--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
Ben Greear [EMAIL PROTECTED] 
Candela Technologies Inc  http://www.candelatech.com



--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: namespace support requires network modules to say GPL

2007-12-03 Thread David Miller

From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Mon, 03 Dec 2007 11:03:38 -0700

 Based on my experience with in-tree code and the few glimpses I
 have gotten of out of tree code the reason the out of tree code broke
 is because it is doing very questionable things.

Calling dev_get_by_foo() was never ever a very questionable thing.
Stop saying bullshit, because that's all that is coming out of your
mouth in this thread.

The fact is, these modules called perfectly fine interfaces and by
adding namespaces YOU BROKE THEM.

That by itself is OK, they can make the code changes to adapt and use
the init namespace.

Enforcing new licensing restrictions on them for existing interfaces
just because you add a new freaking argument that is practically
speaking a constant and always the same right now, on the other hand,
IS NOT FINE and you must fix this now.

I don't care how you do it.

If you don't want them to get at the init namespace symbol, fine,
revert all the dev_get_by_*() interfaces to not take the namespace
symbol and make them internally use the init namespace albeit
invisibly to the caller.

Then you make all the existing call sites invoke new dev_get_by_*_ns()
interfaces that take an explicit argument.  But only do this where it
is truly necessary, everything uses the init namespace practically
speaking and it's clearer if you conver to the *_ns() variant when the
code itself is converted.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: namespace support requires network modules to say GPL

2007-12-03 Thread Daniel Lezcano


Ben Greear wrote:

Eric W. Biederman wrote:

Patrick McHardy [EMAIL PROTECTED] writes:

 

Ben Greear wrote:
   
I have a binary module that uses dev_get_by_name...it's sort of a 
bridge-like

thing and
needs user-space to tell it which device to listen for packets on...

This code doesn't need or care about name-spaces, so I don't see how 
it could

really
be infringing on the author's code (any worse than loading a binary 
driver

into the kernel
ever does).
  


Regardless of infringement it is incompatible with a complete network
namespace implementation.  Further it sounds like the module you are
describing defines a kernel ABI without being merged and hopes that
ABI will still be supportable in the future.  Honestly I think doing so
is horrible code maintenance policy.
  
I don't mind if the ABI changes, so long as I can still use something 
similar.


The namespace logic is interesting to me in general, but at this point I 
can't think of a way that
it actually helps this particular module.  All I really need is a way to 
grab every frame
from eth0 and then transmit it to eth1.  I'm currently doing this by 
finding the netdevice
and registering a raw-packet protocol (ie, like tcpdump would do).  At 
least up to 2.6.23,
this does not require any hacks to the kernel and uses only non GPL 
exported symbols.


Based on my understanding of the namespace logic, if I never add any 
namespaces,
the general network layout should look similar to how it does today, so 
I should have

no logical problem with my module.


Once things are largely complete it makes sense to argue with out of
tree module authors that because they don't have network namespace
support in their modules, their modules are broken. 
Does this imply that every module that accesses the network code *must* 
become
GPL simply because it must interact with namespace logic that is 
exported as GPL only symbols?


That's right, with init_net's EXPORT_SYMBOL_GPL and dev_get_xx, we 
enforce people to be GPL whatever they didn't asked to have the 
namespaces in their code.


Eric, why can we simply change EXPORT_SYMBOL_GPL to EXPORT_SYMBOL for 
init_net ?



--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] ucc_geth: use rx-clock-name and tx-clock-name device tree properties

2007-12-03 Thread Timur Tabi

Updates the ucc_geth device driver to check the new rx-clock-name and
tx-clock-name properties first.  If present, it uses the new function
qe_clock_source() to obtain the clock source.  Otherwise, it checks the
deprecated rx-clock and tx-clock properties.

Update the device trees for 832x, 836x, and 8568 to contain the new property
names only.

Signed-off-by: Timur Tabi [EMAIL PROTECTED]
---

This patch applies to Kumar's for-2.6.25 branch.  ucc_geth will compile but not
run if my other patch, qe: add function qe_clock_source has not also been
applied.

 arch/powerpc/boot/dts/mpc832x_mds.dts |8 ++--
 arch/powerpc/boot/dts/mpc832x_rdb.dts |8 ++--
 arch/powerpc/boot/dts/mpc836x_mds.dts |8 ++--
 arch/powerpc/boot/dts/mpc8568mds.dts  |8 ++--
 drivers/net/ucc_geth.c|   55 ++--
 5 files changed, 67 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/boot/dts/mpc832x_mds.dts 
b/arch/powerpc/boot/dts/mpc832x_mds.dts
index c64f303..fe54489 100644
--- a/arch/powerpc/boot/dts/mpc832x_mds.dts
+++ b/arch/powerpc/boot/dts/mpc832x_mds.dts
@@ -223,8 +223,8 @@
 */
mac-address = [ 00 00 00 00 00 00 ];
local-mac-address = [ 00 00 00 00 00 00 ];
-   rx-clock = 19;
-   tx-clock = 1a;
+   rx-clock-name = clk9;
+   tx-clock-name = clk10;
phy-handle =  phy3 ;
pio-handle =  pio3 ;
};
@@ -244,8 +244,8 @@
 */
mac-address = [ 00 00 00 00 00 00 ];
local-mac-address = [ 00 00 00 00 00 00 ];
-   rx-clock = 17;
-   tx-clock = 18;
+   rx-clock-name = clk7;
+   tx-clock-name = clk8;
phy-handle =  phy4 ;
pio-handle =  pio4 ;
};
diff --git a/arch/powerpc/boot/dts/mpc832x_rdb.dts 
b/arch/powerpc/boot/dts/mpc832x_rdb.dts
index 388c8a7..e68a08b 100644
--- a/arch/powerpc/boot/dts/mpc832x_rdb.dts
+++ b/arch/powerpc/boot/dts/mpc832x_rdb.dts
@@ -202,8 +202,8 @@
 */
mac-address = [ 00 00 00 00 00 00 ];
local-mac-address = [ 00 00 00 00 00 00 ];
-   rx-clock = 20;
-   tx-clock = 13;
+   rx-clock-name = clk16;
+   tx-clock-name = clk3;
phy-handle = phy00;
pio-handle = ucc2pio;
};
@@ -223,8 +223,8 @@
 */
mac-address = [ 00 00 00 00 00 00 ];
local-mac-address = [ 00 00 00 00 00 00 ];
-   rx-clock = 19;
-   tx-clock = 1a;
+   rx-clock-name = clk9;
+   tx-clock-name = clk10;
phy-handle = phy04;
pio-handle = ucc3pio;
};
diff --git a/arch/powerpc/boot/dts/mpc836x_mds.dts 
b/arch/powerpc/boot/dts/mpc836x_mds.dts
index 0b2d2b5..bfd48d0 100644
--- a/arch/powerpc/boot/dts/mpc836x_mds.dts
+++ b/arch/powerpc/boot/dts/mpc836x_mds.dts
@@ -254,8 +254,8 @@
 */
mac-address = [ 00 00 00 00 00 00 ];
local-mac-address = [ 00 00 00 00 00 00 ];
-   rx-clock = 0;
-   tx-clock = 19;
+   rx-clock-name = none;
+   tx-clock-name = clk9;
phy-handle =  phy0 ;
phy-connection-type = rgmii-id;
pio-handle =  pio1 ;
@@ -276,8 +276,8 @@
 */
mac-address = [ 00 00 00 00 00 00 ];
local-mac-address = [ 00 00 00 00 00 00 ];
-   rx-clock = 0;
-   tx-clock = 14;
+   rx-clock-name = none;
+   tx-clock-name = clk4;
phy-handle =  phy1 ;
phy-connection-type = rgmii-id;
pio-handle =  pio2 ;
diff --git a/arch/powerpc/boot/dts/mpc8568mds.dts 
b/arch/powerpc/boot/dts/mpc8568mds.dts
index 5439437..cf45aab 100644
--- a/arch/powerpc/boot/dts/mpc8568mds.dts
+++ b/arch/powerpc/boot/dts/mpc8568mds.dts
@@ -333,8 +333,8 @@
 */
mac-address = [ 00 00 00 00 00 00 ];
local-mac-address = [ 00 00 00 00 00 00 ];
-   rx-clock = 0;
-   tx-clock = 20;
+   rx-clock-name = none;
+   tx-clock-name = clk16;
pio-handle = pio1;
phy-handle = phy0;

[PATCH 0/2] QE clock source improvements

2007-12-03 Thread Timur Tabi


This patch set adds a new property to make specifying QE clock sources
easier, adds a function to help parse the property, and updates the ucc_geth
driver to take advantage of all this.

Patch #1 is an arch/powerpc patch meant for Kumar's for-2.6.25 branch.
Patch #2 is a netdev patch, so it's either for Jeff G and/or Kumar.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] qe: add function qe_clock_source()

2007-12-03 Thread Timur Tabi

Add function qe_clock_source() which takes a string containing the name of a
QE clock source (as is typically found in device trees) and returns the
matching enum qe_clock value.

Update booting-without-of.txt to indicate that the UCC properties rx-clock
and tx-clock are deprecated and replaced with rx-clock-name and tx-clock-name,
which use strings instead of numbers to indicate QE clock sources.

Signed-off-by: Timur Tabi [EMAIL PROTECTED]
---

This patch applies to Kumar's for-2.6.25 branch.  You may need to apply
my other pending patch, qe: add ability to upload QE firmware, first.

 Documentation/powerpc/booting-without-of.txt |   13 ++
 arch/powerpc/sysdev/qe_lib/qe.c  |   32 ++
 include/asm-powerpc/qe.h |1 +
 3 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/Documentation/powerpc/booting-without-of.txt 
b/Documentation/powerpc/booting-without-of.txt
index e9a3cb1..c4342d3 100644
--- a/Documentation/powerpc/booting-without-of.txt
+++ b/Documentation/powerpc/booting-without-of.txt
@@ -1626,6 +1626,19 @@ platforms are moved over to use the 
flattened-device-tree model.
- interrupt-parent : the phandle for the interrupt controller that
  services interrupts for this device.
- pio-handle : The phandle for the Parallel I/O port configuration.
+   - rx-clock-name: the UCC receive clock source
+ none: clock source is disabled
+ brg1 through brg16: clock source is BRG1-BRG16, respectively
+ clk1 through clk24: clock source is CLK1-CLK24, respectively
+   - tx-clock-name: the UCC transmit clock source
+ none: clock source is disabled
+ brg1 through brg16: clock source is BRG1-BRG16, respectively
+ clk1 through clk24: clock source is CLK1-CLK24, respectively
+   The following two properties are deprecated.  rx-clock has been replaced
+   with rx-clock-name, and tx-clock has been replaced with tx-clock-name.
+   Drivers that currently use the deprecated properties should continue to
+   do so, in order to support older device trees, but they should be updated
+   to check for the new properties first.
- rx-clock : represents the UCC receive clock source.
  0x00 : clock source is disabled;
  0x1~0x10 : clock source is BRG1~BRG16 respectively;
diff --git a/arch/powerpc/sysdev/qe_lib/qe.c b/arch/powerpc/sysdev/qe_lib/qe.c
index 865277b..5df8530 100644
--- a/arch/powerpc/sysdev/qe_lib/qe.c
+++ b/arch/powerpc/sysdev/qe_lib/qe.c
@@ -204,6 +204,38 @@ int qe_setbrg(enum qe_clock brg, unsigned int rate, 
unsigned int multiplier)
 }
 EXPORT_SYMBOL(qe_setbrg);
 
+/* Convert a string to a QE clock source enum
+ *
+ * This function takes a string, typically from a property in the device
+ * tree, and returns the corresponding enum qe_clock value.
+*/
+enum qe_clock qe_clock_source(const char *source)
+{
+   unsigned int i;
+
+   if (strcasecmp(source, none) == 0)
+   return QE_CLK_NONE;
+
+   if (strncasecmp(source, brg, 3) == 0) {
+   i = simple_strtoul(source + 3, NULL, 10);
+   if ((i = 1)  (i = 16))
+   return (QE_BRG1 - 1) + i;
+   else
+   return QE_CLK_DUMMY;
+   }
+
+   if (strncasecmp(source, clk, 3) == 0) {
+   i = simple_strtoul(source + 3, NULL, 10);
+   if ((i = 1)  (i = 24))
+   return (QE_CLK1 - 1) + i;
+   else
+   return QE_CLK_DUMMY;
+   }
+
+   return QE_CLK_DUMMY;
+}
+EXPORT_SYMBOL(qe_clock_source);
+
 /* Initialize SNUMs (thread serial numbers) according to
  * QE Module Control chapter, SNUM table
  */
diff --git a/include/asm-powerpc/qe.h b/include/asm-powerpc/qe.h
index 35c7b8d..430dc77 100644
--- a/include/asm-powerpc/qe.h
+++ b/include/asm-powerpc/qe.h
@@ -84,6 +84,7 @@ extern int par_io_data_set(u8 port, u8 pin, u8 val);
 
 /* QE internal API */
 int qe_issue_cmd(u32 cmd, u32 device, u8 mcn_protocol, u32 cmd_input);
+enum qe_clock qe_clock_source(const char *source);
 int qe_setbrg(enum qe_clock brg, unsigned int rate, unsigned int multiplier);
 int qe_get_snum(void);
 void qe_put_snum(u8 snum);
-- 
1.5.2.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [README] away until Dec 3rd

2007-12-03 Thread Herbert Xu

On Thu, Nov 22, 2007 at 12:27:47PM +0800, Herbert Xu wrote:
 On Tue, Nov 20, 2007 at 08:29:21PM -0800, David Miller wrote:
  
  During this time Herbert Xu (CC:'d) will take care of both the net-2.6
  stable tree and the net-2.6.25 devel tree.
 
 For this duration please use the net-2.6.25 tree at this location
 for basing your patches:

OK, David is back now.  If I haven't taken your patches yet, then please
resend them to [EMAIL PROTECTED] because all previous submissions may
have been deleted.

Andrew, you can stop pulling my tree and start pulling David's now.

git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25.git/

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Fix memory corruption in fec_mpc52xx

2007-12-03 Thread David Woodhouse

From: Jon Smirl [EMAIL PROTECTED]

The mpc5200 fec driver is corrupting memory. This patch fixes two bugs
where the wrong skb was being referenced.

Signed-off-by: Jon Smirl [EMAIL PROTECTED]
Acked-by: Domen Puncer [EMAIL PROTECTED]
Signed-off-by: Grant Likely [EMAIL PROTECTED]
Signed-off-by: David Woodhouse [EMAIL PROTECTED]

--- a/drivers/net/fec_mpc52xx.c
+++ b/drivers/net/fec_mpc52xx.c
@@ -422,7 +422,7 @@ static irqreturn_t mpc52xx_fec_rx_interrupt(int irq, void 
*dev_id)

rskb = bcom_retrieve_buffer(priv-rx_dmatsk, status,
(struct bcom_bd **)bd);
-   dma_unmap_single(dev-dev, bd-skb_pa, skb-len, 
DMA_FROM_DEVICE);
+   dma_unmap_single(dev-dev, bd-skb_pa, rskb-len, 
DMA_FROM_DEVICE);

/* Test for errors in received frame */
if (status  BCOM_FEC_RX_BD_ERRORS) {
@@ -467,7 +467,7 @@ static irqreturn_t mpc52xx_fec_rx_interrupt(int irq, void 
*dev_id)
bcom_prepare_next_buffer(priv-rx_dmatsk);

bd-status = FEC_RX_BUFFER_SIZE;
-   bd-skb_pa = dma_map_single(dev-dev, rskb-data,
+   bd-skb_pa = dma_map_single(dev-dev, skb-data,
FEC_RX_BUFFER_SIZE, DMA_FROM_DEVICE);

bcom_submit_next_buffer(priv-rx_dmatsk, skb);


-- 
dwmw2

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Fix memory corruption in fec_mpc52xx

2007-12-03 Thread Jon Smirl

On 12/3/07, David Woodhouse [EMAIL PROTECTED] wrote:
 From: Jon Smirl [EMAIL PROTECTED]

 The mpc5200 fec driver is corrupting memory. This patch fixes two bugs
 where the wrong skb was being referenced.

What does it take to get this patch into the kernel? I've been posting
it since Nov 9.
It is an obvious bug. I can't even boot one of my boxes without it.

-- 
Jon Smirl
[EMAIL PROTECTED]
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] TCP illinois max rtt aging

2007-12-03 Thread Lachlan Andrew

Greetings Stephen,

Thanks.  We'll have to play with the rate of ageing.  I used the slower ageing

if (ca-cnt_rtt  3) {
u64 mean_rtt = ca-sum_rtt;
do_div (mean_rtt, ca-cnt_rtt);

if (ca-max_rtt  mean_rtt)
ca-max_rtt -= (ca-max_rtt - mean_rtt)  9;
}

and still found that the max_rtt drops considerably within a congestion epoch.

What would also really help would be getting rid of RTT outliers
somehow.  I ignore RTT measurements when SACK is active:
   if (ca-max_rtt  rtt) {
struct tcp_sock *tp = tcp_sk(sk);
if (! tp-sacked_out )  // SACKs cause hi-CPU/hi-RTT. ignore
ca-max_rtt = rtt;
}
which helps a lot, but still gets some outliers.  Would it be possible
to time-stamp packets in the hardware interrupt handler, instead of
waiting for the post-processing stage?

Cheers,
Lachlan

On 03/12/2007, Stephen Hemminger [EMAIL PROTECTED] wrote:
 On Wed, 28 Nov 2007 21:26:12 -0800
 Shao Liu [EMAIL PROTECTED] wrote:

  Hi Stephen and Lachlan,
 
  Thanks for pointing out and fixing this bug.
 
  For the max RTT problem, I have considered it also and I have some idea on
  improve it. I also have some other places to improve. I will summarize all
  my new ideas and send you an update. For me to change it, could you please
  give me a link to download to latest source codes for the whole congestion
  control module in Linux implementation, including the general module for all
  algorithms, and the implementation for specific algorithms like TCP-Illinois
  and H-TCP?
 
  Thanks for the help!
  -Shao
 
 
 
  -Original Message-
  From: Stephen Hemminger [mailto:[EMAIL PROTECTED]
  Sent: Wednesday, November 28, 2007 4:44 PM
  To: Lachlan Andrew
  Cc: David S. Miller; Herbert Xu; [EMAIL PROTECTED]; Douglas Leith;
  Robert Shorten; netdev@vger.kernel.org
  Subject: Re: [PATCH] tcp-illinois: incorrect beta usage
 
  Lachlan Andrew wrote:
   Thanks Stephen.
  
   A related problem (largely due to the published algorithm itself) is
   that Illinois is very aggressive when it over-estimates the maximum
   RTT.
  
   At high load (say 200Mbps and 200ms RTT), a backlog of packets builds
   up just after a loss, causing the RTT estimate to become large.  This
   makes Illinois think that *all* losses are due to corruption not
   congestion, and so only back off by 1/8 instead of 1/2.
  
   I can't think how to fix this except by better RTT estimation, or
   changes to Illinois itself.  Currently, I ignore RTT measurements when
  sacked_out != 0and have a heuristic RTT aging mechanism, but
   that's pretty ugly.
  
   Cheers,
   Lachlan
  
  
  Ageing the RTT estimates needs to be done anyway.
  Maybe something can be reused from H-TCP. The two are closely related.
 

 The following adds gradual aging of max RTT.

 --- a/net/ipv4/tcp_illinois.c   2007-11-29 08:58:35.0 -0800
 +++ b/net/ipv4/tcp_illinois.c   2007-11-29 09:37:33.0 -0800
 @@ -63,7 +63,10 @@ static void rtt_reset(struct sock *sk)
 ca-cnt_rtt = 0;
 ca-sum_rtt = 0;

 -   /* TODO: age max_rtt? */
 +   /* add slowly fading memory for maxRTT to accommodate routing changes 
 */
 +   if (ca-max_rtt  ca-base_rtt)
 +   ca-max_rtt = ca-base_rtt
 +   + (((ca-max_rtt - ca-base_rtt) * 31)  5);
  }

  static void tcp_illinois_init(struct sock *sk)



-- 
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Ph: +1 (626) 395-8820Fax: +1 (626) 568-3603
http://netlab.caltech.edu/~lachlan
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC] TCP illinois max rtt aging

2007-12-03 Thread Stephen Hemminger

On Wed, 28 Nov 2007 21:26:12 -0800
Shao Liu [EMAIL PROTECTED] wrote:

 Hi Stephen and Lachlan,
 
 Thanks for pointing out and fixing this bug.
 
 For the max RTT problem, I have considered it also and I have some idea on
 improve it. I also have some other places to improve. I will summarize all
 my new ideas and send you an update. For me to change it, could you please
 give me a link to download to latest source codes for the whole congestion
 control module in Linux implementation, including the general module for all
 algorithms, and the implementation for specific algorithms like TCP-Illinois
 and H-TCP? 
 
 Thanks for the help!
 -Shao
 
 
 
 -Original Message-
 From: Stephen Hemminger [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, November 28, 2007 4:44 PM
 To: Lachlan Andrew
 Cc: David S. Miller; Herbert Xu; [EMAIL PROTECTED]; Douglas Leith;
 Robert Shorten; netdev@vger.kernel.org
 Subject: Re: [PATCH] tcp-illinois: incorrect beta usage
 
 Lachlan Andrew wrote:
  Thanks Stephen.
 
  A related problem (largely due to the published algorithm itself) is
  that Illinois is very aggressive when it over-estimates the maximum
  RTT.
 
  At high load (say 200Mbps and 200ms RTT), a backlog of packets builds
  up just after a loss, causing the RTT estimate to become large.  This
  makes Illinois think that *all* losses are due to corruption not
  congestion, and so only back off by 1/8 instead of 1/2.
 
  I can't think how to fix this except by better RTT estimation, or
  changes to Illinois itself.  Currently, I ignore RTT measurements when
 sacked_out != 0and have a heuristic RTT aging mechanism, but
  that's pretty ugly.
 
  Cheers,
  Lachlan
 

 Ageing the RTT estimates needs to be done anyway.
 Maybe something can be reused from H-TCP. The two are closely related.


The following adds gradual aging of max RTT.

--- a/net/ipv4/tcp_illinois.c   2007-11-29 08:58:35.0 -0800
+++ b/net/ipv4/tcp_illinois.c   2007-11-29 09:37:33.0 -0800
@@ -63,7 +63,10 @@ static void rtt_reset(struct sock *sk)
ca-cnt_rtt = 0;
ca-sum_rtt = 0;
 
-   /* TODO: age max_rtt? */
+   /* add slowly fading memory for maxRTT to accommodate routing changes */
+   if (ca-max_rtt  ca-base_rtt)
+   ca-max_rtt = ca-base_rtt
+   + (((ca-max_rtt - ca-base_rtt) * 31)  5);
 }
 
 static void tcp_illinois_init(struct sock *sk)
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Fw: [Bug 9493] New: routing table bug or feature adding/replacing/deleting routes

2007-12-03 Thread Stephen Hemminger



Begin forwarded message:

Date: Mon,  3 Dec 2007 11:23:27 -0800 (PST)
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: [Bug 9493] New: routing table bug or feature adding/replacing/deleting 
routes


http://bugzilla.kernel.org/show_bug.cgi?id=9493

   Summary: routing table bug or feature adding/replacing/deleting
routes
   Product: Networking
   Version: 2.5
 KernelVersion: 2.6.23
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: IPV4
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]


Most recent kernel where this bug did not occur: unknown
Distribution: Ubuntu (kernel 2.6.22-14), Gentoo (kernel 2.6.23-gentoo-r3)
Hardware Environment: x86
Software Environment:
Steps to reproduce:
(eth0 is 172.16.32.0/22):
$ ip route add 172.17.0.1 via 172.16.32.1
$ ip route append 172.17.0.1 via 172.16.35.1
$ ip ro
...
172.17.0.1 via 172.16.32.1 dev eth0
172.17.0.1 via 172.16.35.1 dev eth0
...
$ ip ro replace 172.17.0.1 via 172.16.35.1
$ ip ro
...
172.17.0.1 via 172.16.35.1 dev eth0
172.17.0.1 via 172.16.35.1 dev eth0
...
$ ip ro del 172.17.0.1
$ ip ro
...
172.17.0.1 via 172.16.35.1 dev eth0
...

Problem Description:
Is it OK:
- only 1 route is deleted?
- only 1 route is replaced?
- replacing route results in dublicate routing table entry?
Is this a bug or feature?


-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.


-- 
Stephen Hemminger [EMAIL PROTECTED]
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [RFC] TCP illinois max rtt aging

2007-12-03 Thread Shao Liu

Hi Stephen and Lachlan,

Thanks for the discussion. The gradual aging is surely an option. And
another possibility is that, we compute the RTT just before congestion
notification, which ideally, represent the full queueing delay + propagation
delay. We can compute the average of the last M such values, and either use
the average as maxRTT, or use it as a benchmark to judge whether a sample is
outlier. How do you think of this idea?

And also a question, why the samples when SACK is active are outliers? 

For the accuracy of time stamping, I am not very familiar with the
implementation details. But I can think of two ways, 1) do time stamp in as
low layer as possible; 2) use as high priority thread to do it as possible.
For 2), we can use separate threads to do time stamp and to process packets.

Thanks and I will let you know more of my thoughts after I go over the
entire code space!
-Shao 


-Original Message-
From: Lachlan Andrew [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 03, 2007 3:06 PM
To: Stephen Hemminger
Cc: [EMAIL PROTECTED]; David S. Miller; Herbert Xu; Douglas Leith;
Robert Shorten; netdev@vger.kernel.org
Subject: Re: [RFC] TCP illinois max rtt aging

Greetings Stephen,

Thanks.  We'll have to play with the rate of ageing.  I used the slower
ageing

if (ca-cnt_rtt  3) {
u64 mean_rtt = ca-sum_rtt;
do_div (mean_rtt, ca-cnt_rtt);

if (ca-max_rtt  mean_rtt)
ca-max_rtt -= (ca-max_rtt - mean_rtt)  9;
}

and still found that the max_rtt drops considerably within a congestion
epoch.

What would also really help would be getting rid of RTT outliers
somehow.  I ignore RTT measurements when SACK is active:
   if (ca-max_rtt  rtt) {
struct tcp_sock *tp = tcp_sk(sk);
if (! tp-sacked_out )  // SACKs cause hi-CPU/hi-RTT. ignore
ca-max_rtt = rtt;
}
which helps a lot, but still gets some outliers.  Would it be possible
to time-stamp packets in the hardware interrupt handler, instead of
waiting for the post-processing stage?

Cheers,
Lachlan

On 03/12/2007, Stephen Hemminger [EMAIL PROTECTED] wrote:
 On Wed, 28 Nov 2007 21:26:12 -0800
 Shao Liu [EMAIL PROTECTED] wrote:

  Hi Stephen and Lachlan,
 
  Thanks for pointing out and fixing this bug.
 
  For the max RTT problem, I have considered it also and I have some idea
on
  improve it. I also have some other places to improve. I will summarize
all
  my new ideas and send you an update. For me to change it, could you
please
  give me a link to download to latest source codes for the whole
congestion
  control module in Linux implementation, including the general module for
all
  algorithms, and the implementation for specific algorithms like
TCP-Illinois
  and H-TCP?
 
  Thanks for the help!
  -Shao
 
 
 
  -Original Message-
  From: Stephen Hemminger [mailto:[EMAIL PROTECTED]
  Sent: Wednesday, November 28, 2007 4:44 PM
  To: Lachlan Andrew
  Cc: David S. Miller; Herbert Xu; [EMAIL PROTECTED]; Douglas Leith;
  Robert Shorten; netdev@vger.kernel.org
  Subject: Re: [PATCH] tcp-illinois: incorrect beta usage
 
  Lachlan Andrew wrote:
   Thanks Stephen.
  
   A related problem (largely due to the published algorithm itself) is
   that Illinois is very aggressive when it over-estimates the maximum
   RTT.
  
   At high load (say 200Mbps and 200ms RTT), a backlog of packets builds
   up just after a loss, causing the RTT estimate to become large.  This
   makes Illinois think that *all* losses are due to corruption not
   congestion, and so only back off by 1/8 instead of 1/2.
  
   I can't think how to fix this except by better RTT estimation, or
   changes to Illinois itself.  Currently, I ignore RTT measurements when
  sacked_out != 0and have a heuristic RTT aging mechanism, but
   that's pretty ugly.
  
   Cheers,
   Lachlan
  
  
  Ageing the RTT estimates needs to be done anyway.
  Maybe something can be reused from H-TCP. The two are closely related.
 

 The following adds gradual aging of max RTT.

 --- a/net/ipv4/tcp_illinois.c   2007-11-29 08:58:35.0 -0800
 +++ b/net/ipv4/tcp_illinois.c   2007-11-29 09:37:33.0 -0800
 @@ -63,7 +63,10 @@ static void rtt_reset(struct sock *sk)
 ca-cnt_rtt = 0;
 ca-sum_rtt = 0;

 -   /* TODO: age max_rtt? */
 +   /* add slowly fading memory for maxRTT to accommodate routing
changes */
 +   if (ca-max_rtt  ca-base_rtt)
 +   ca-max_rtt = ca-base_rtt
 +   + (((ca-max_rtt - ca-base_rtt) * 31)  5);
  }

  static void tcp_illinois_init(struct sock *sk)



-- 
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Ph: +1 (626) 395-8820Fax: +1 (626) 568-3603
http://netlab.caltech.edu/~lachlan

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at

Re: [PATCH 2/4] datagram: mem_scheudle functions

2007-12-03 Thread Hideo AOKI

Herbert Xu wrote:
 On Wed, Nov 28, 2007 at 01:52:59PM -0500, Hideo AOKI wrote:
 +static inline int sk_wmem_schedule(struct sock *sk, int size)
 +{
 +if (sk-sk_type == SOCK_STREAM)
 +return sk_stream_wmem_schedule(sk, size);
 +else if (sk-sk_type == SOCK_DGRAM)
 +return sk_datagram_wmem_schedule(sk, size);
 +else
 +return 1;
 +}
 
 Why do we need this function? As far as I can see we always know
 whether it's a stream or datagram socket at compile time so doing
 a run-time test is pointless.

Because we have to call wmem_schedule function in ip_append_data()
which is used by several protocols both stream and datagram.
I just thought adding the sk_wmem_schedule() was only way to call
proper function from ip_append_data().

Please let me know if I misunderstand or there is better way to
call wmem_schedule functions.

Best regards,
Hideo

-- 
Hitachi Computer Products (America) Inc.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/4] udp: memory accounting in IPv4

2007-12-03 Thread Hideo AOKI

Eric Dumazet wrote:
 Herbert Xu a écrit :
 However, I'm still a little concerned about the effect of two more
 atomic op's per packet that we're adding here.  Hang on a sec, that
 should've been Dave's line since atomic ops are cheap on x86 :)

 But seriously, it's not so much that we have two more atomic op's
 per packet, but we have two more writes to a single global counter
 for each packet.  This is going to really suck on SMP.

 So what I'd like to see is a scheme that's similar to sk_forward_alloc.
 The idea is that each socket allocates memory using mem_schedule and
 then stores it in sk_forward_alloc.  Each packet then only has to
 add to/subtract from sk_forward_alloc.

 There is one big problem with this though, UDP is not serialised like
 TCP.  So you can't just use sk_forward_alloc since it's not an atomic_t.

 We'll need to think about this one a bit more.
 
 I agree adding yet another atomics ops is a big problem.
 
 Another idea, coupled with recent work on percpu storage done by
 Christoph Lameter, would be to use kind of a percpu_counter :
 
 We dont really need strong and precise memory accounting (UDP , but TCP
 as well), just some kind of limit to avoid memory to be too much used.
 
 That is, updating a percpu variable, and doing some updates to a global
 counter only when this percpu variable escapes from a given range.
 
 Lot of contended cache lines could benefit from this relaxing (count of
 sockets...)
 
 I would wait first that Christoph work is done, so that we dont need
 atomic ops on local cpu storage (and no need to disable preemption too).

Thank you for your comments.
I understood your concern of atomic operations.

Let me try to use sk_forward_alloc at first, while percpu storage
is an interesting idea.

Many thanks,
Hideo

-- 
Hitachi Computer Products (America) Inc.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/4] udp: memory accounting in IPv4

2007-12-03 Thread Herbert Xu

On Mon, Dec 03, 2007 at 07:14:26PM -0500, Hideo AOKI wrote:

 Let me try to use sk_forward_alloc at first, while percpu storage
 is an interesting idea.

Actually I don't think sk_forward_alloc would work for UDP because
it runs lockless (unlike TCP which is run under a the socket lock).

So it's either going to be the atomic op or per-cpu counters.  For
me the atomic op isn't the issue, it's the SMP cache-line bouncing
that's more important so having something that did atomic ops on a
socket counter which then feeds into the global counter would solve
my concerns.

But let's wait and see what Dave has to say about this too.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] TCP illinois max rtt aging

2007-12-03 Thread Stephen Hemminger

On Mon, 3 Dec 2007 15:59:23 -0800
Shao Liu [EMAIL PROTECTED] wrote:

 Hi Stephen and Lachlan,
 
 Thanks for the discussion. The gradual aging is surely an option. And
 another possibility is that, we compute the RTT just before congestion
 notification, which ideally, represent the full queueing delay + propagation
 delay. We can compute the average of the last M such values, and either use
 the average as maxRTT, or use it as a benchmark to judge whether a sample is
 outlier. How do you think of this idea?

The problem with an average like that would be storing enough values
to be useful and choosing how many to store. Perhaps some form of
weighted sliding average which favors recent values heavily would work
best. Remember that RTT's have a huge noise component and you are
fighting against the long tail distribution trying to see the queue
effects.

 
 And also a question, why the samples when SACK is active are outliers?

Any sample with SACK is going to mean a loss or reordering has occurred.
So shouldn't the SACK values be useful, but RTT values from retransmits
are not useful.

 
 For the accuracy of time stamping, I am not very familiar with the
 implementation details. But I can think of two ways, 1) do time stamp in as
 low layer as possible; 2) use as high priority thread to do it as possible.
 For 2), we can use separate threads to do time stamp and to process packets.

Right now the resolution is in microseconds using the hardware clock.
The clock usage costs a little bit, but makes the math more accurate.
It would be worth exploring sensitivity by taking out RTT_STAMP from
the flags field and varying HZ.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] sky2: recovery deadlock fix

2007-12-03 Thread Stephen Hemminger

Prevent deadlock in sky2 recovery logic. sky2_down calls napi_synchronize
which gets stuck if napi was already disabled. 

Fix by rearranging slightly and not calling napi_disable until after
both ports are stopped. The napi_disable probably is being overly
paranoid, but it is safe now.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
Please apply for 2.6.24 (upstream-fixes)

--- a/drivers/net/sky2.c2007-11-30 16:51:50.0 -0800
+++ b/drivers/net/sky2.c2007-11-30 16:54:52.0 -0800
@@ -2906,16 +2906,14 @@ static void sky2_restart(struct work_str
int i, err;
 
rtnl_lock();
-   sky2_write32(hw, B0_IMSK, 0);
-   sky2_read32(hw, B0_IMSK);
-   napi_disable(hw-napi);
-
for (i = 0; i  hw-ports; i++) {
dev = hw-dev[i];
if (netif_running(dev))
sky2_down(dev);
}
 
+   napi_disable(hw-napi);
+   sky2_write32(hw, B0_IMSK, 0);
sky2_reset(hw);
sky2_write32(hw, B0_IMSK, Y2_IS_BASE);
napi_enable(hw-napi);
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] TCP illinois max rtt aging

2007-12-03 Thread Lachlan Andrew

Greetings,

On 03/12/2007, Stephen Hemminger [EMAIL PROTECTED] wrote:
 On Mon, 3 Dec 2007 15:59:23 -0800
 Shao Liu [EMAIL PROTECTED] wrote:
  And also a question, why the samples when SACK is active are outliers?

 Any sample with SACK is going to mean a loss or reordering has occurred.
 So shouldn't the SACK values be useful, but RTT values from retransmits
 are not useful.

When SACK is active, the per-packet processing becomes more involved,
tracking the list of lost/SACKed packets.  This causes a CPU spike
just after a loss, which increases the RTTs, at least in my
experience.  This is a separate issue from the fact that it is hard to
get RTT measurements from lost/retransmitted packets themselves.

Cheers,
Lachlan

-- 
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Ph: +1 (626) 395-8820Fax: +1 (626) 568-3603
http://netlab.caltech.edu/~lachlan
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][IPVS] Don't leak sysctl tables if the scheduler registration fails

2007-12-03 Thread Simon Horman

On Mon, Dec 03, 2007 at 01:04:53PM +0300, Pavel Emelyanov wrote:
 In case we load lblc or lblcr module we can leak some sysctl
 tables if the call to register_ip_vs_scheduler() fails.

This looks correct to me.

 I've looked at the register_ip_vs_scheduler() code and saw, that
 the only reason to fail is the name collision, so I think that
 with some 3rd party schedulers this becomes a relevant issue. No?

I guess so. Though presumably register_ip_vs_scheduler() could
have other modes of failure in the future.

 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

Acked-by: Simon Horman [EMAIL PROTECTED]

 ---
 
 diff --git a/net/ipv4/ipvs/ip_vs_lblc.c b/net/ipv4/ipvs/ip_vs_lblc.c
 index b843a11..ad89644 100644
 --- a/net/ipv4/ipvs/ip_vs_lblc.c
 +++ b/net/ipv4/ipvs/ip_vs_lblc.c
 @@ -580,9 +580,14 @@ static struct ip_vs_scheduler ip_vs_lblc_scheduler =
  
  static int __init ip_vs_lblc_init(void)
  {
 + int ret;
 +
   INIT_LIST_HEAD(ip_vs_lblc_scheduler.n_list);
   sysctl_header = register_sysctl_table(lblc_root_table);
 - return register_ip_vs_scheduler(ip_vs_lblc_scheduler);
 + ret = register_ip_vs_scheduler(ip_vs_lblc_scheduler);
 + if (ret)
 + unregister_sysctl_table(sysctl_header);
 + return ret;
  }
  
  
 diff --git a/net/ipv4/ipvs/ip_vs_lblcr.c b/net/ipv4/ipvs/ip_vs_lblcr.c
 index e5b323a..2a5ed85 100644
 --- a/net/ipv4/ipvs/ip_vs_lblcr.c
 +++ b/net/ipv4/ipvs/ip_vs_lblcr.c
 @@ -769,9 +769,14 @@ static struct ip_vs_scheduler ip_vs_lblcr_scheduler =
  
  static int __init ip_vs_lblcr_init(void)
  {
 + int ret;
 +
   INIT_LIST_HEAD(ip_vs_lblcr_scheduler.n_list);
   sysctl_header = register_sysctl_table(lblcr_root_table);
 - return register_ip_vs_scheduler(ip_vs_lblcr_scheduler);
 + ret = register_ip_vs_scheduler(ip_vs_lblcr_scheduler);
 + if (ret)
 + unregister_sysctl_table(sysctl_header);
 + return ret;
  }
  
  

-- 
Horms

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] SMC911X: Fix using of dereferenced skb after netif_rx

2007-12-03 Thread Wang Chen

Peter Korsgaard said the following on 2007-12-3 21:47:
 Wang == Wang Chen [EMAIL PROTECTED] writes:
 
 Hi,
 
  Wang +  len = skb-len;
  Wangnetif_rx(skb);
  dev- stats.rx_packets++;
  Wang -  dev-stats.rx_bytes += skb-len;
  Wang +  dev-stats.rx_bytes += len;
 
 Why not simply update the stats before calling netif_rx as the return
 value isn't checked anyway?
 

Even the return value of netif_rx isn't checked, dev-stats maybe
changed in netif_rx. But fortunately dev-stats isn't changed in
netif_rx.
So, I agree. 
Here is the new patch.

Signed-off-by: Wang Chen [EMAIL PROTECTED]
---
 smc911x.c |2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.24.rc3.org/drivers/net/smc911x.c  2007-11-19 12:38:05.0 
+0800
+++ linux-2.6.24.rc3/drivers/net/smc911x.c  2007-12-04 09:59:06.0 
+0800
@@ -1299,9 +1299,9 @@ smc911x_rx_dma_irq(int dma, void *data)
PRINT_PKT(skb-data, skb-len);
dev-last_rx = jiffies;
skb-protocol = eth_type_trans(skb, dev);
-   netif_rx(skb);
dev-stats.rx_packets++;
dev-stats.rx_bytes += skb-len;
+   netif_rx(skb);
 
spin_lock_irqsave(lp-lock, flags);
pkts = (SMC_GET_RX_FIFO_INF()  RX_FIFO_INF_RXSUSED_)  16;

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[BUG] open network driver bugs (100)

2007-12-03 Thread Stephen Hemminger


ID  AssigneeSummary 
2283[EMAIL PROTECTED]   (net forcedeth) NETDEV WATCHDOG: eth1: transmit 
timed out... 
2776[EMAIL PROTECTED]   (net dmfe) Davicom 9102AF only works in 10 Mbps 
3156[EMAIL PROTECTED]   (net de2104x) Kernel panic with de2104x tulip 
driver on boot 
3526[EMAIL PROTECTED]   (net 8139too) CardBus NIC locks up system with 
PIO 
3661[EMAIL PROTECTED]   (net 8139too) gives strange errors, drops and 
overruns in... 
3765[EMAIL PROTECTED]   (net b44) Network link down, when writting to 
sata disk A... 
3801[EMAIL PROTECTED]   (net 8139too) does not set PME-Enable upon 
setting WOL 
3938[EMAIL PROTECTED]   (net tun) device driver doesn't fill in 
interface name on... 
4186[EMAIL PROTECTED]   (wireless airo) Aironet 340 PCMCIA does not 
support WPA 
4420[EMAIL PROTECTED]   (net tulip) Problem with 6 bit addressing in 
tulip_read_e... 
4434[EMAIL PROTECTED]   (net Tulip) NIC card causes hard lock up of PC 
4451[EMAIL PROTECTED]   (net via-rhine) VIA Rhine II: media detection 
fails on re... 
4566[EMAIL PROTECTED]   (net B44) Randomly driver starts sending 
garbage and stop... 
4701[EMAIL PROTECTED]   (net tulip) No driver works with Asantï¿½FAST 
10/100 PWA ... 
4755[EMAIL PROTECTED]   (net b44) At system startup b44 doesn't report 
correct mi... 
4849[EMAIL PROTECTED]   (net via-rhine) WAKE ON LAN not working. 
4883[EMAIL PROTECTED]   (net tg3) doesn't send ARP reply on 8021q VLAN 
interfaces 
5149[EMAIL PROTECTED]   Wake-on-lan broken using Intel e100 driver 
5569[EMAIL PROTECTED]   (net xirc2ps_cs) stopped working in 2.6.14 
5624[EMAIL PROTECTED]   (net b44) Lockup on 'mii-tool ethX' when 
'ifconfig ethX u... 
5839[EMAIL PROTECTED]   uli526x partially recognizing interface 
5870[EMAIL PROTECTED]   SiS 190 doesn't download files 
5979[EMAIL PROTECTED]   (net dmfe) Davicom DM9102 Network Card cuts out 
every 60 ... 
6108[EMAIL PROTECTED]   Failure of r8169 ethernet i/f after resume from 
S3 sleep 
6171[EMAIL PROTECTED]   3c59x: netlink only updates online status each 
60s 
6444[EMAIL PROTECTED]   (net 3c59x) transmit timed out 
6690[EMAIL PROTECTED]   (net forcedeth) 0.57 problems at gigabit speeds 
6807[EMAIL PROTECTED]   r8169: freeze at higher speeds 
6986[EMAIL PROTECTED]   e1000 doesn't update trafic counters frequently 
enough 
7051[EMAIL PROTECTED]   prism54 does not respect carrier 
7071[EMAIL PROTECTED]   (net cassini) Can't bring Sun Quad GigaSwift 
Ethernet int... 
7085[EMAIL PROTECTED]   System freezes when load the module sis190 
7133[EMAIL PROTECTED]   (net ibmtr_cs) seams working as fine with 64bit 
kernel th... 
7226[EMAIL PROTECTED]   (net 8139too) Problem in forcing RTL8139 into 
100Mbps ful... 
7440[EMAIL PROTECTED]   (net 3c59x) suddenly receives no more packets 
7443[EMAIL PROTECTED]   (net 8139too) transmit timeouts with edge 
triggered PCI irq 
7487[EMAIL PROTECTED]   (net sundance) driver fails to recognize 
carrier status 
7588[EMAIL PROTECTED]   (net wan z85230) Race: Lock is not acquired 
before callin... 
7617[EMAIL PROTECTED]   sky2 driver crashes 
7633[EMAIL PROTECTED]   (net bonding) Race: Caller of function 
alb_swap_mac_addr(... 
7659[EMAIL PROTECTED]   (net 3c589_cs) Race: a lock must be held before 
entering ... 
7660[EMAIL PROTECTED]   (net wan z85230) Race: lock must be held before 
entering ... 
7682[EMAIL PROTECTED]   bcm43xx: iwlist scan: no scan results with 
2.6.19.1 
7696[EMAIL PROTECTED]   (net b44) driver doesn't work under heavy load 
7718[EMAIL PROTECTED]   (net xircom_cb, xircom_tulip_cb) When card 
inserted, dmes... 
7752[EMAIL PROTECTED]   drivers/net/wireless/zd1211rw/zd_chip.c:1461 
ASSERT r = ... 
7821[EMAIL PROTECTED]   sundance driver not activating D-Link DFE-580TX 
adapter 
7853[EMAIL PROTECTED]   (net ppp) Get mppe_decompress: osize too 
small while fo... 
7856[EMAIL PROTECTED]   (net b44) WOL problem 
7946[EMAIL PROTECTED]   ipw2200 driver + wpa_supplicant (wpa-psk) = 
fail to send ... 
8007[EMAIL PROTECTED]   (net 3c59x) 3c905 Tornado dosen't work (eeprom 
mac addres... 
8009[EMAIL PROTECTED]   (net ppp) PPPoE+mppe Server fail with Win 
Client 
8043[EMAIL PROTECTED]   curious communication breakage with e1000 and 
NBT 
8061[EMAIL PROTECTED]   is VIA Network device statistics calculation 
wrong? missi... 
8084[EMAIL PROTECTED]   phy_mii_ioctl(...) forgets to return 
phy_device's speed s... 
8106[EMAIL PROTECTED]   tx_errors and tx_fifo_errors not updated 
consistantly 
8132[EMAIL PROTECTED]   pptp server lockup in ppp_asynctty_receive() 
8252[EMAIL PROTECTED]

Re: [PATCH][IPVS] Fix sched registration race when checking for name collision

2007-12-03 Thread Simon Horman

On Mon, Dec 03, 2007 at 01:10:57PM +0300, Pavel Emelyanov wrote:
 The register_ip_vs_scheduler() checks for the scheduler with the
 same name under the read-locked __ip_vs_sched_lock, then drops,
 takes it for writing and puts the scheduler in list.
 
 This is racy, since we can have a race window between the lock
 being re-locked for writing.
 
 The fix is to search the scheduler with the given name right under
 the write-locked __ip_vs_sched_lock.

This looks correct to me.

 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

Acked-by: Simon Horman [EMAIL PROTECTED]

 ---
 
 diff --git a/net/ipv4/ipvs/ip_vs_sched.c b/net/ipv4/ipvs/ip_vs_sched.c
 index 1602304..4322358 100644
 --- a/net/ipv4/ipvs/ip_vs_sched.c
 +++ b/net/ipv4/ipvs/ip_vs_sched.c
 @@ -183,19 +183,6 @@ int register_ip_vs_scheduler(struct ip_vs_scheduler 
 *scheduler)
   /* increase the module use count */
   ip_vs_use_count_inc();
  
 - /*
 -  *  Make sure that the scheduler with this name doesn't exist
 -  *  in the scheduler list.
 -  */
 - sched = ip_vs_sched_getbyname(scheduler-name);
 - if (sched) {
 - ip_vs_scheduler_put(sched);
 - ip_vs_use_count_dec();
 - IP_VS_ERR(register_ip_vs_scheduler(): [%s] scheduler 
 -   already existed in the system\n, scheduler-name);
 - return -EINVAL;
 - }
 -
   write_lock_bh(__ip_vs_sched_lock);
  
   if (scheduler-n_list.next != scheduler-n_list) {
 @@ -207,6 +194,20 @@ int register_ip_vs_scheduler(struct ip_vs_scheduler 
 *scheduler)
   }
  
   /*
 +  *  Make sure that the scheduler with this name doesn't exist
 +  *  in the scheduler list.
 +  */
 + list_for_each_entry(sched, ip_vs_schedulers, n_list) {
 + if (strcmp(scheduler-name, sched-name) == 0) {
 + write_unlock_bh(__ip_vs_sched_lock);
 + ip_vs_use_count_dec();
 + IP_VS_ERR(register_ip_vs_scheduler(): [%s] scheduler 
 + already existed in the system\n,
 + scheduler-name);
 + return -EINVAL;
 + }
 + }
 + /*
*  Add it into the d-linked scheduler list
*/
   list_add(scheduler-n_list, ip_vs_schedulers);

-- 
Horms

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[BUG] Open networking bugs (73)

2007-12-03 Thread Stephen Hemminger

Some of these are old and maybe already fixed?

ID  Owner   Description
2803[EMAIL PROTECTED]   isa modem detected but does not initialize 
4206[EMAIL PROTECTED]   NAT/Masquerade not working 
4595[EMAIL PROTECTED]   Error inserting ebt_ulog 
4809[EMAIL PROTECTED]   AF_PACKET sockets ignore the SO_TIMESTAMP 
sockopt 
4885[EMAIL PROTECTED]   IPv6 Route entry being wrongly removed 
5088[EMAIL PROTECTED]   disable ECN per connection 
5091[EMAIL PROTECTED]   connection tracking can give abnormal 
throughput result 
5131[EMAIL PROTECTED]   Computer hangs when default-gw becomes 
unreachable 
5248[EMAIL PROTECTED]   Rapid loading and unloading of iptables modules 
gives oop... 
5999[EMAIL PROTECTED]   Iptables modules fail to load on Alpha arch 
6036[EMAIL PROTECTED]   mmap'ed write to socket hangs when connection 
remote end ... 
6161[EMAIL PROTECTED]   Modem Slow down speed 50% after kernel upgrade 
6187[EMAIL PROTECTED]   netlink: possible use after free in 
netlink_recvmsg 
6197[EMAIL PROTECTED]   unregister_netdevice: waiting for ppp9 to 
become free. Us... 
6319[EMAIL PROTECTED]   ipsec tunnel asymmetrical mtu 
6322[EMAIL PROTECTED]   Kernel Panic (tc filter delete panic) 
6339[EMAIL PROTECTED]   Wish: /proc or /sys-access to counters 
6548[EMAIL PROTECTED]   MPPE Encrypt Decrypt module bug. 
6681[EMAIL PROTECTED]   TC crash and rule freeze 
6682[EMAIL PROTECTED]   BUG: soft lockup detected on CPU#0! / ksoftirqd 
takse 100... 
6830[EMAIL PROTECTED]   Need a /proc to view IF-MIB counters not in 
/proc/net/dev 
6917[EMAIL PROTECTED]   BUG: warning at 
net/core/dev.c:1171/skb_checksum_help() 
6998[EMAIL PROTECTED]   rp_filter missing for ipv6 
7058[EMAIL PROTECTED]   CONFIG_IP_ROUTE_FWMARK breaks rp_filter checks 
7198[EMAIL PROTECTED]   balance-alb bonding oops when disconnecting 
primary slave... 
7202[EMAIL PROTECTED]   sun happy meal ethernet driver problem 
7512[EMAIL PROTECTED]   irda: sock_error in 
af_irda.c:irda_recvmsg_stream 
7554[EMAIL PROTECTED]   oops after insmod rfcomm and rmmod rfcomm 
7709[EMAIL PROTECTED]   Exposing the string field lengths of the 
ethtool_drvinfo ... 
7732[EMAIL PROTECTED]   System freeze every 10 days 
7846[EMAIL PROTECTED]   Strange trouble with samba and 2.6.19 kernel 
7952[EMAIL PROTECTED]   slattach only works every other time 
7974[EMAIL PROTECTED]   (bonding): scheduling while atomic 
7983[EMAIL PROTECTED]   kernel BUG at kernel/workqueue.c:150! 
8054[EMAIL PROTECTED]   tipc_ref_discard tipc_deleteport locking 
dependency 
8085networking_netfilter-iptabl...  performance drop in 2.6.20 
(CONFIG_NF_CONNTRACK_SUPPORT) 
8107[EMAIL PROTECTED]   dev-header_cache_update has a random value 
8203[EMAIL PROTECTED]   Race: a lock is expected before calling 
llc_conn_state_pr... 
8215[EMAIL PROTECTED]   A lock is expected before calling 
zero_fw_chain, but it ... 
8218[EMAIL PROTECTED]   8021q - Vlan - Tag lost/missing on base 
interface when sn... 
8253[EMAIL PROTECTED]   BUG: unable to handle kernel paging request at 
virtual ad... 
8325networking_netfilter-iptabl...  -j REDIRECT --to-ports 1000-1009, 
always first choosen 
8338networking_netfilter-iptabl...  NAT of TCP connections broken 
8382[EMAIL PROTECTED]   2.6.20 cannot route packets outside tunnel 
8474[EMAIL PROTECTED]   regression failure, can't even ping modem 
8536[EMAIL PROTECTED]   Kernel drops UDP packets silently when reading 
from certa... 
8561[EMAIL PROTECTED]   list_add corruption. prev-next should be next 
(f7d28794)... 
8654[EMAIL PROTECTED]   possible connect() bug 
8659[EMAIL PROTECTED]   Patch for bug #8635 breaks TCPv6 
8726[EMAIL PROTECTED]   MSG_TRUNC not regarded in unix_dgram_recvmsg() 
8732[EMAIL PROTECTED]   Samba - very slow -one way- speed on AMD 690 
8736[EMAIL PROTECTED]   New TC deadlock scenario 
8754[EMAIL PROTECTED]   Kernel addrconf modifies MTU of non-kernel 
routes 
8755[EMAIL PROTECTED]   ip -6 route change  behaves like ip -6 route 
add 
8766[EMAIL PROTECTED]   802.1q VLAN stacking + REORDER_HDR is broken 
8786[EMAIL PROTECTED]   Dell 350 Bluetooth device (usb id 413c:8103) 
not found 
8891[EMAIL PROTECTED]   in-kernel rpc generates broken 
RPCBPROC_GETVERSADDR v4 re... 
8895[EMAIL PROTECTED]   An ioctl to delete an ipv6 tunnel leads to a 
kernel panic 
8961[EMAIL PROTECTED]   BUG triggered by oidentd in netlink code 
8971[EMAIL PROTECTED]   htb class delete causes kernelpanic and other 
htb bugs. 
8975[EMAIL PROTECTED]   Broadcasting fails when default gateway can't 
be reached. 
9079[EMAIL

[PATCH 2.6.24] pasemi_mac: Fix reuse of free'd skb

2007-12-03 Thread Olof Johansson

Turns out we're freeing the skb when we detect CRC error, but we're
not clearing out info-skb. We could either clear it and have the stack
reallocate it, or just leave it and the rx ring refill code will reuse
the one that was allocated.

Reusing a freed skb obviously caused some nasty crashes of various kind,
as reported by Brent Baude and David Woodhouse.


Signed-off-by: Olof Johansson [EMAIL PROTECTED]

---

Jeff, I'd like to see this in 2.6.24, it's causing some real problems
out there. It's not needed in the 2.6.25 queue since the other changes
there have already covered these cases.

My test network at home is quiet enough to not cause CRC errors, we
mainly get those during interface bringup before speed is configured.

diff --git a/drivers/net/pasemi_mac.c b/drivers/net/pasemi_mac.c
index 09b4fde..6617e24 100644
--- a/drivers/net/pasemi_mac.c
+++ b/drivers/net/pasemi_mac.c
@@ -586,7 +586,7 @@ static int pasemi_mac_clean_rx(struct pasemi_mac *mac, int 
limit)
/* CRC error flagged */
mac-netdev-stats.rx_errors++;
mac-netdev-stats.rx_crc_errors++;
-   dev_kfree_skb_irq(skb);
+   /* No need to free skb, it'll be reused */
goto next;
}
 
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

IPv6 IPsec input

2007-12-03 Thread Kazunori MIYAZAWA

Hello Herbert,

We have a trouble of IPv6 IPsec input process with your net-2.6.25 tree.
We can not receive IPsec packets.

I found that you directly pass the address family (AF_INET) to xfrm_state_lookup
in xfrm_input.

I guess it may result the bug.

Is it correct?

Best regards,

--
Kazunori Miyazawa


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode

2007-12-03 Thread Wang Chen

Herbert Xu said the following on 2007-12-3 21:17:
 On Mon, Dec 03, 2007 at 10:54:35PM +1100, Herbert Xu wrote:
 diff --git a/include/net/snmp.h b/include/net/snmp.h
 index ea206bf..9444b54 100644
 --- a/include/net/snmp.h
 +++ b/include/net/snmp.h
 @@ -23,6 +23,7 @@
  
  #include linux/cache.h
  #include linux/snmp.h
 +#include linux/smp.h

It's no need to include smp.h?

--
WCN

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode

2007-12-03 Thread Herbert Xu

On Tue, Dec 04, 2007 at 11:50:55AM +0800, Wang Chen wrote:

   #include linux/cache.h
   #include linux/snmp.h
  +#include linux/smp.h
 
 It's no need to include smp.h?

We need it for smp_processor_id.  Well we needed it before too but
there's probably some implicit inclusion which has made it work.
It's better to declare these inclusions explicitly as otherwise
they may break on less common architectures later.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH][IPv6][IPsec] fix the address family for xfrm_state_lookup in xfrm_input

2007-12-03 Thread Kazunori MIYAZAWA

Hi Herbert, 

This is the patch to fix IPv6 IPsec input.

Signed-off-by: Kazunori MIYAZAWA [EMAIL PROTECTED]
---
 net/xfrm/xfrm_input.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 96f42c1..da3c963 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -136,7 +136,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
spi, int encap_type)
if (skb-sp-len == XFRM_MAX_DEPTH)
goto drop;
 
-   x = xfrm_state_lookup(daddr, spi, nexthdr, AF_INET);
+   x = xfrm_state_lookup(daddr, spi, nexthdr, 
skb-dst-ops-family);
if (x == NULL)
goto drop;
 
-- 
1.4.4.2

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][RESEND] PHY: Add the phy_device_release device method.

2007-12-03 Thread Andrew Morton

On Mon, 3 Dec 2007 09:35:11 +0100 Thierry Reding [EMAIL PROTECTED] wrote:

 In cases where more than a single PHY is found on the MDIO bus, the kernel
 will print a warning that this method is missing for each PHY device that
 is not attached to a networking device.
 
 Signed-off-by: Thierry Reding [EMAIL PROTECTED]
 ---
  drivers/net/phy/mdio_bus.c |   19 ++-
  1 files changed, 18 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
 index fc2f0e6..cb7fb47 100644
 --- a/drivers/net/phy/mdio_bus.c
 +++ b/drivers/net/phy/mdio_bus.c
 @@ -36,6 +36,23 @@
  #include asm/uaccess.h
  
  /**
 + * phy_device_release - free a phy_device structure when all users of it are
 + *   finished.
 + *
 + * @dev: device that's been disconnected
 + *
 + * Will be called only by the device core when all users of this phy_device
 + * are done.
 + */
 +static void phy_device_release(struct device *dev)
 +{
 + struct phy_device *phy;
 +
 + phy = to_phy_device(dev);
 + kfree(phy);
 +}
 +
 +/**
   * mdiobus_register - bring up all the PHYs on a given bus and attach them 
 to bus
   * @bus: target mii_bus
   *
 @@ -83,6 +100,7 @@ int mdiobus_register(struct mii_bus *bus)
   if (phydev) {
   phydev-irq = bus-irq[i];
  
 + phydev-dev.release = phy_device_release;
   phydev-dev.parent = bus-dev;
   phydev-dev.bus = mdio_bus_type;
   snprintf(phydev-dev.bus_id, BUS_ID_SIZE, PHY_ID_FMT, 
 bus-id, i);
 @@ -112,7 +130,6 @@ void mdiobus_unregister(struct mii_bus *bus)
   for (i = 0; i  PHY_MAX_ADDR; i++) {
   if (bus-phy_map[i]) {
   device_unregister(bus-phy_map[i]-dev);
 - kfree(bus-phy_map[i]);
   }
   }
  }
 

I've been sitting on
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm2/broken-out/phy-implement-release-function.patch
for a few weeks.  For some reason I have it in my nacked netdev patches
section but I think that was a mistake and it has not (yet ;)) been nacked.

Anyway, Anton's patch looks somewhat different from yours.  Please compare
notes.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][IPv6][IPsec] fix the address family for xfrm_state_lookup in xfrm_input

2007-12-03 Thread Herbert Xu

On Tue, Dec 04, 2007 at 02:28:36PM +0900, Kazunori MIYAZAWA wrote:
 Hi Herbert, 
 
 This is the patch to fix IPv6 IPsec input.
 
 Signed-off-by: Kazunori MIYAZAWA [EMAIL PROTECTED]

Good catch! Thanks.

 diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
 index 96f42c1..da3c963 100644
 --- a/net/xfrm/xfrm_input.c
 +++ b/net/xfrm/xfrm_input.c
 @@ -136,7 +136,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
 spi, int encap_type)
   if (skb-sp-len == XFRM_MAX_DEPTH)
   goto drop;
  
 - x = xfrm_state_lookup(daddr, spi, nexthdr, AF_INET);
 + x = xfrm_state_lookup(daddr, spi, nexthdr, 
 skb-dst-ops-family);

I'd prefer to put this in XFRM_SPI_SKB_CB instead because the xfrm
layer shouldn't really rely on dst-ops structures except on entry
and exit, like this:

[IPSEC]: Use the correct family for input state lookup

When merging the input paths of IPsec I accidentally left a hard-coded
AF_INET for the state lookup call.  This broke IPv6 obviously.  This
patch fixes by getting the input callers to specify the family through
skb-cb.

Credit goes to Kazunori Miyazawa for diagnosing this and providing an
initial patch.

Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 99251b7..3f86cd8 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -534,6 +534,7 @@ struct xfrm_spi_skb_cb {
} header;
 
unsigned int daddroff;
+   unsigned int family;
 };
 
 #define XFRM_SPI_SKB_CB(__skb) ((struct xfrm_spi_skb_cb *)((__skb)-cb[0]))
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index 0c377a6..33f990d 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -39,6 +39,7 @@ drop:
 int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
int encap_type)
 {
+   XFRM_SPI_SKB_CB(skb)-family = AF_INET;
XFRM_SPI_SKB_CB(skb)-daddroff = offsetof(struct iphdr, daddr);
return xfrm_input(skb, nexthdr, spi, encap_type);
 }
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index e2c3efd..74f3aac 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -23,6 +23,7 @@ int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff 
*skb)
 
 int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
 {
+   XFRM_SPI_SKB_CB(skb)-family = AF_INET6;
XFRM_SPI_SKB_CB(skb)-daddroff = offsetof(struct ipv6hdr, daddr);
return xfrm_input(skb, nexthdr, spi, 0);
 }
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 96f42c1..8b2b1b5 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -102,6 +102,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
spi, int encap_type)
__be32 seq;
struct xfrm_state *x;
xfrm_address_t *daddr;
+   unsigned int family;
int decaps = 0;
int async = 0;
 
@@ -127,6 +128,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
spi, int encap_type)
 
daddr = (xfrm_address_t *)(skb_network_header(skb) +
   XFRM_SPI_SKB_CB(skb)-daddroff);
+   family = XFRM_SPI_SKB_CB(skb)-family;
 
seq = 0;
if (!spi  (err = xfrm_parse_spi(skb, nexthdr, spi, seq)) != 0)
@@ -136,7 +138,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
spi, int encap_type)
if (skb-sp-len == XFRM_MAX_DEPTH)
goto drop;
 
-   x = xfrm_state_lookup(daddr, spi, nexthdr, AF_INET);
+   x = xfrm_state_lookup(daddr, spi, nexthdr, family);
if (x == NULL)
goto drop;
 
@@ -198,6 +200,7 @@ resume:
 * transport mode so the outer address is identical.
 */
daddr = x-id.daddr;
+   family = x-outer_mode-afinfo-family;
 
err = xfrm_parse_spi(skb, nexthdr, spi, seq);
if (err  0)
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: IPv6 IPsec input

2007-12-03 Thread Herbert Xu

On Tue, Dec 04, 2007 at 12:39:43PM +0900, Kazunori MIYAZAWA wrote:

 I found that you directly pass the address family (AF_INET) to 
 xfrm_state_lookup
 in xfrm_input.
 
 I guess it may result the bug.
 
 Is it correct?

Yes sorry, I didn't test IPv6.  Ironic because all this work is for
the benefit of IPv6 :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][IPv6][IPsec] fix the address family for xfrm_state_lookup in xfrm_input

2007-12-03 Thread Kazunori MIYAZAWA


Herbert Xu wrote:

On Tue, Dec 04, 2007 at 02:28:36PM +0900, Kazunori MIYAZAWA wrote:
Hi Herbert, 


This is the patch to fix IPv6 IPsec input.

Signed-off-by: Kazunori MIYAZAWA [EMAIL PROTECTED]


Good catch! Thanks.


diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 96f42c1..da3c963 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -136,7 +136,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
spi, int encap_type)
if (skb-sp-len == XFRM_MAX_DEPTH)
goto drop;
 
-		x = xfrm_state_lookup(daddr, spi, nexthdr, AF_INET);

+   x = xfrm_state_lookup(daddr, spi, nexthdr, 
skb-dst-ops-family);


I'd prefer to put this in XFRM_SPI_SKB_CB instead because the xfrm
layer shouldn't really rely on dst-ops structures except on entry
and exit, like this:

[IPSEC]: Use the correct family for input state lookup

When merging the input paths of IPsec I accidentally left a hard-coded
AF_INET for the state lookup call.  This broke IPv6 obviously.  This
patch fixes by getting the input callers to specify the family through
skb-cb.

Credit goes to Kazunori Miyazawa for diagnosing this and providing an
initial patch.

Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Cheers,


It works fine.

Thank you!!

--
Kazunori Miyazawa
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][IPv6][IPsec] fix the address family for xfrm_state_lookup in xfrm_input

2007-12-03 Thread David Miller

From: Kazunori MIYAZAWA [EMAIL PROTECTED]
Date: Tue, 04 Dec 2007 15:23:01 +0900

 Herbert Xu wrote:
  I'd prefer to put this in XFRM_SPI_SKB_CB instead because the xfrm
  layer shouldn't really rely on dst-ops structures except on entry
  and exit, like this:

  [IPSEC]: Use the correct family for input state lookup

  When merging the input paths of IPsec I accidentally left a hard-coded
  AF_INET for the state lookup call.  This broke IPv6 obviously.  This
  patch fixes by getting the input callers to specify the family through
  skb-cb.

  Credit goes to Kazunori Miyazawa for diagnosing this and providing an
  initial patch.

  Signed-off-by: Herbert Xu [EMAIL PROTECTED]

  Cheers,

 It works fine.

Patch applied, thanks everyone.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 7/7] [TFRC] New rx history code

2007-12-03 Thread Gerrit Renker

NAK. You have changed the control flow of the algorithm and the underlying
data structure. Originally it had been an array of pointers, and this had
been a design decision right from the first submissions in March. From I
of
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt

1. 'ring' is an array of pointers rather than a contiguous area in
memory. The reason is that, when having to swap adjacent entries
to sort the history, it is easier to swap pointers than to copy
(heavy-weight) history-entry data structs.

But in your algorithm it becomes a normal linear array.

As a consequence all subsequent code needs to be rewritten to use
copying instead of swapping pointers. And there is a lot of that, since
the loss detection code makes heavy use of swapping entries in order to
cope with reordered and/or delayed packets.

This is not what I would call entirely equivalent as you claim. Your
algorithm is fundamentally different, the control flow is not the same,
nor are the underlying data structures.

| Credit here goes to Gerrit Renker, that provided the initial implementation
for
| this new codebase.
|
| I modified it just to try to make it closer to the existing API, hide details
from
| the CCIDs and fix a couple bugs found in the initial implementation.
What is a couple bugs? So far you have pointed out only one problem and that
was
agreed, it can be fixed. But it remains a side issue, it is not a failure of the
algorithm per se. What is worse here is that you take this single occurrence as
a
kind of carte blanche to mess around with the code as you please. Where please
are
the other bugs you are referring to?

I am not buying into this and am not convinced that you are understanding
what you are doing. It is the third time that you take patches which
have been submitted on [EMAIL PROTECTED] for over half a year, for which you
have offered no more than a sentence of feedback, release them under
your name, and introduce fundamental changes which alter the behaviour.

The first instance was the set of ktime patches which I had developed
for the test tree and which you extended to DCCP. In this case it were
in fact three bugs that you added in migrating my patches. I know this
because it messed up the way CCID3 behaved and since I spent several
hours chasing these. And, in contrast to the above, it is not a mere
claim: that is recorded in the netdev mail archives.

The second instance was when you released the TX history patches under
your name. Pro forma there was an RFC patch at 3pm, de facto it was
checked in a few hours later: input not welcome.

The third instance is now where you change in essence the floor
underneath this work. Since you are using a different basis, the
algorithm is (in addition to the changes in control flow) necessarily
different.

I have provided documentation, written test modules, and am able to prove
that the algorithm as submitted works reasonably correct. In addition, the
behaviour has been verified using regression tests.

I am not prepared to take your claims and expressions of deepest
respect at face value since your actions - not your words - show that
you are, in fact, not dealing respectfully with work which has taken
months to complete and verify.

During 9 months on [EMAIL PROTECTED] you did not provide so much as a paragraph
of feedback. Instead you mopped up code from the test tree, modified it
according to your own whims and now, in the circle of your
invitation-only buddies, start to talk about having discussions and
iterations. The only iteration I can see is in fixing up the things you
introduced, so it is not you who has to pay the price.

Sorry, unless you can offer a more mature model of collaboration,
consider me out of this and the patches summarily withdrawn. I am not
prepared to throw away several months of work just because you feel
inspired to do as you please with the work of others.

Gerrit

| Original changeset comment from Gerrit:
| ---
| This provides a new, self-contained and generic RX history service for TFRC
| based protocols.
|
| Details:
| * new data structure, initialisation and cleanup routines;
| * allocation of dccp_rx_hist entries local to packet_history.c,
|as a service exported by the dccp_tfrc_lib module.
| * interface to automatically track highest-received seqno;
| * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1);
| * a generic function to test for `data packets' as per RFC 4340, sec. 7.7.
| ---
|
| Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
| ---
| net/dccp/ccids/ccid3.c | 255
| net/dccp/ccids/ccid3.h | 14 +-
| net/dccp/ccids/lib/loss_interval.c | 14 +-
| net/dccp/ccids/lib/packet_history.c | 277
+++---
| net/dccp/ccids/lib/packet_history.h

Re: [PATCH][RESEND] PHY: Add the phy_device_release device method.

2007-12-03 Thread Thierry Reding

* Andrew Morton wrote:
 On Mon, 3 Dec 2007 09:35:11 +0100 Thierry Reding [EMAIL PROTECTED] wrote:
 
  In cases where more than a single PHY is found on the MDIO bus, the kernel
  will print a warning that this method is missing for each PHY device that
  is not attached to a networking device.
  
  Signed-off-by: Thierry Reding [EMAIL PROTECTED]
  ---
   drivers/net/phy/mdio_bus.c |   19 ++-
   1 files changed, 18 insertions(+), 1 deletions(-)
  
  diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
  index fc2f0e6..cb7fb47 100644
  --- a/drivers/net/phy/mdio_bus.c
  +++ b/drivers/net/phy/mdio_bus.c
  @@ -36,6 +36,23 @@
   #include asm/uaccess.h
   
   /**
  + * phy_device_release - free a phy_device structure when all users of it 
  are
  + * finished.
  + *
  + * @dev: device that's been disconnected
  + *
  + * Will be called only by the device core when all users of this phy_device
  + * are done.
  + */
  +static void phy_device_release(struct device *dev)
  +{
  +   struct phy_device *phy;
  +
  +   phy = to_phy_device(dev);
  +   kfree(phy);
  +}
  +
  +/**
* mdiobus_register - bring up all the PHYs on a given bus and attach them 
  to bus
* @bus: target mii_bus
*
  @@ -83,6 +100,7 @@ int mdiobus_register(struct mii_bus *bus)
  if (phydev) {
  phydev-irq = bus-irq[i];
   
  +   phydev-dev.release = phy_device_release;
  phydev-dev.parent = bus-dev;
  phydev-dev.bus = mdio_bus_type;
  snprintf(phydev-dev.bus_id, BUS_ID_SIZE, PHY_ID_FMT, 
  bus-id, i);
  @@ -112,7 +130,6 @@ void mdiobus_unregister(struct mii_bus *bus)
  for (i = 0; i  PHY_MAX_ADDR; i++) {
  if (bus-phy_map[i]) {
  device_unregister(bus-phy_map[i]-dev);
  -   kfree(bus-phy_map[i]);
  }
  }
   }
  
 
 I've been sitting on
 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm2/broken-out/phy-implement-release-function.patch
 for a few weeks.  For some reason I have it in my nacked netdev patches
 section but I think that was a mistake and it has not (yet ;)) been nacked.
 
 Anyway, Anton's patch looks somewhat different from yours.  Please compare
 notes.

FWIW, I like Anton's patch better, especially since it plugs a possible
memory leak. I'm not sure it's useful or necessary to export the
phy_device_free symbol, though.

The only other difference that I can see is that Anton's patch sets the
release function in a different place, when the device is created, whereas my
patch only sets it before registering the device. Both should be identical
since the release function won't be needed unless the device is registered.

Thierry



signature.asc
Description: Digital signature

95 matches

Mail list logo