[PATCH] [PKT_SCHED]: PSCHED_TADD() and PSCHED_TADD2() can result in tv_usec >= 1000000

2006-06-19 Thread Shuya MAEDA
I found two problems in PSCHED_TADD() and PSCHED_TADD2().

1) These functions increment tv_sec if tv_usec > 1000000.
   But I think they should do so if tv_usec >= 1000000.

2) tv_usec became 1200000 or more when I experimented with CBQ.
   It must stay below 1000000
   because tv_usec counts microseconds.
   To fix 2), I think the macros should compute delta / 1000000,
   add the quotient to tv_sec, and add the remainder to
   tv_usec.

In both cases, the time at which transmission is restarted
becomes an illegal value, so it is not possible to communicate at
the configured rate.

To fix these problems I created the following patch.
Are there any comments?
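
As a rough sketch of the quotient/remainder idea in 2) (this is not the patch itself; struct tv_sketch and tv_add_usec() are made-up names used only for illustration), the addition could look like this in plain C:

```c
#include <assert.h>

#define USEC_PER_SEC 1000000L

/* Hypothetical stand-in for the kernel's struct timeval. */
struct tv_sketch {
    long tv_sec;
    long tv_usec;
};

/* Add delta_usec (>= 0) to tv, keeping 0 <= tv_usec < USEC_PER_SEC. */
static struct tv_sketch tv_add_usec(struct tv_sketch tv, long delta_usec)
{
    assert(delta_usec >= 0);
    assert(tv.tv_usec >= 0 && tv.tv_usec < USEC_PER_SEC);

    /* Whole seconds in delta go to tv_sec, the remainder to tv_usec. */
    tv.tv_sec  += delta_usec / USEC_PER_SEC;
    tv.tv_usec += delta_usec % USEC_PER_SEC;

    /* Both addends were below USEC_PER_SEC, so one carry is enough.
     * Note ">=": tv_usec == USEC_PER_SEC must carry as well. */
    if (tv.tv_usec >= USEC_PER_SEC) {
        tv.tv_sec++;
        tv.tv_usec -= USEC_PER_SEC;
    }
    return tv;
}
```

With the q->now and delay values from the [Result] section, this yields { tv_sec = 1150368541, tv_usec = 208301 } instead of a tv_usec above one second.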

[Experiment]
  * kernel: linux-2.6.15.5
  * CBQ settings
   --
   tc qdisc add dev $IF root handle 1:0 cbq bandwidth 100Mbit \
avpkt 1000 mpu 64 ewma 5 cell 8
   tc class add dev $IF parent 1:0 classid 1:10 cbq rate 32Kbit \
prio 1 ewma 5 cell 8 avpkt 138 mpu 64 bandwidth 100Mbit \
minburst 25 maxburst 50 bounded isolated
   tc filter add dev $IF parent 1:0 protocol ip prio 16 u32 match \
ip dport 4952 0x flowid 1:10
   ---
  * Traffic
dst port 4952: 138byte per 20msec.

[Result]
  * In cbq_ovl_classic():
cl->undertime = { tv_sec = 1150368540, tv_usec = 1208301 }
                                       ~~~~~~~~~~~~~~~~~
q->now        = { tv_sec = 1150368539, tv_usec = 878917 }
delay         = 1329384
cl->avgidle   = -14781
cl->offtime   = 1295394
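
The numbers above are self-consistent: delay is the undertime minus the current time, expressed in microseconds. A small check, using a hypothetical helper tv_diff_usec() that is not part of the kernel:

```c
#include <assert.h>

/* Difference a - b in microseconds for (sec, usec) pairs.
 * The usec parts are only required to be non-negative, so an
 * un-normalized value such as 1208301 is accepted. */
static long long tv_diff_usec(long long a_sec, long long a_usec,
                              long long b_sec, long long b_usec)
{
    assert(a_usec >= 0 && b_usec >= 0);
    return (a_sec - b_sec) * 1000000LL + (a_usec - b_usec);
}
```

Calling tv_diff_usec(1150368540, 1208301, 1150368539, 878917) gives 1329384, which matches the reported delay. The undertime therefore still holds one whole second inside tv_usec that should have been carried into tv_sec.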

[Patch]
diff -Nur linux-2.6.17-rc6.orig/include/net/pkt_sched.h linux-2.6.17-rc6.mypatch/include/net/pkt_sched.h
--- linux-2.6.17-rc6.orig/include/net/pkt_sched.h	2006-06-06 09:57:02.0 +0900
+++ linux-2.6.17-rc6.mypatch/include/net/pkt_sched.h	2006-06-16 11:29:08.0 +0900
@@ -169,17 +169,31 @@

 #define PSCHED_TADD2(tv, delta, tv_res) \
 ({ \
-  int __delta = (tv).tv_usec + (delta); \
-  (tv_res).tv_sec = (tv).tv_sec; \
-  if (__delta > USEC_PER_SEC) { (tv_res).tv_sec++; __delta -= USEC_PER_SEC; } \
-  (tv_res).tv_usec = __delta; \
+  int __delta = (delta); \
+  (tv_res) = (tv); \
+  if((delta) > USEC_PER_SEC) { \
+    (tv_res).tv_sec += (delta) / USEC_PER_SEC; \
+    __delta -= (delta) % USEC_PER_SEC; \
+  } \
+  (tv_res).tv_usec += __delta; \
+  if((tv_res).tv_usec >= USEC_PER_SEC) { \
+    (tv_res).tv_sec++; \
+    (tv_res).tv_usec -= USEC_PER_SEC; \
+  } \
 })

 #define PSCHED_TADD(tv, delta) \
 ({ \
-  (tv).tv_usec += (delta); \
-  if ((tv).tv_usec > USEC_PER_SEC) { (tv).tv_sec++; \
-    (tv).tv_usec -= USEC_PER_SEC; } \
+  int __delta = (delta); \
+  if((delta) > USEC_PER_SEC) { \
+    (tv).tv_sec += (delta) / USEC_PER_SEC; \
+    __delta -= (delta) % USEC_PER_SEC; \
+  } \
+  (tv).tv_usec += __delta; \
+  if((tv).tv_usec >= USEC_PER_SEC) { \
+    (tv).tv_sec++; \
+    (tv).tv_usec -= USEC_PER_SEC; \
+  } \
 })

 /* Set/check that time is in the past perfect;
-- 
Shuya Maeda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.17: networking bug??

2006-06-19 Thread Helge Hafting

Mark Lord wrote:

> Unilaterally following the standard is all well and good
> for those who know how to get around it when a site becomes
> inaccessible, but not for Joe User.

So let's enable it in the kernel, and let the distros turn it off.
The Joe User who isn't a kernel hacker won't be running 2.6.17
for a long time.  He'll be running whatever his distro packages for him,
and they will know how to disable (or patch out) window scaling.

Someone who compiles his own kernel runs into all sorts of
issues; this is just one more of them.

> If it always fails, or always works, that's not such a big problem.
> I would never have complained if I had never been able to access
> the web sites in question.  But since it IS working in 2.6.16,
> and got broken in 2.6.17, I'm bloody well going to complain.

Yes.  And make sure you complain to those running the bad
box as well.

Helge Hafting


[PATCH, RFT] bcm43xx: AccessPoint mode

2006-06-19 Thread Michael Buesch
Hi,

This patch enables the usage of a bcm43xx card as AP with
the Devicescape 802.11 stack.

Well, it does not work 100%, but at least it's very promising.
We are able to create a bssid and correctly send beacon frames out.

This patch is tested on BE and LE machines.

There seem to be issues with Devicescape and/or hostap.
Trying to authenticate from a STA to the AP does not work. The
packet is simply not processed. I was able to catch the auth
request on the AP (using the wonderful dscape virtual interfaces).
So the AP receives the packet, but loses it somewhere in the
stack or hostapd.

Well, thanks to Alexander Tsvyashchenko and the OpenWRT team for
the hard work to figure out how this all works.
My part of this patch is mainly endianness fixes.

Please give it a testrun.
Final note about hostapd:
hostapd snapshot 0.5-2006-06-10 seems to work in the sense
that it is able to bring up the device.
hostapd snapshot 0.5-2006-06-11 seems to fail.

I have not looked into this more closely yet.



Important notes from Alexander Tsvyashchenko's initial mail follow:
--

1) This version deals with TIM in cleaner way (though, PS mode is still
not supported) - instead of patching dscape stack to skip TIM
generation, it strips TIM when writing probe response template and
leaves it when writing beacon template.

2) As in current dscape stack management interface seems to be no longer
passed to the driver, all interface handling is left as it is, no
changes there should be made anymore.

...

Known limitations:

1) PS mode is not supported.

Testing instructions:

Although my previous patch to hostapd to make it interoperable with
bcm43xx & dscape has been merged already in their CVS version, due to
the subsequent changes in dscape stack current hostapd is again
incompatible :-( So, to test this patch, the patch to hostapd should be
applied.
I used hostapd snapshot 0.5-2006-06-10, patch for it is attached.
The patch is very hacky and requires tricky way to bring everything up,
but as dscape stack is changed quite constantly, I just do not want to
waste time fixing it in proper way only to find a week later that
dscape handling of master interface was changed completely once more and
everything is broken again ;-)

The patch for dscape stack that is attached is not 100% necessary, but it
seems to allow operating clients that request PS mode to be enabled at
AP (verified with PDA client), the only thing it contains is disabling
actual PS handling in dscape.

So, the following sequence should be used to test AP mode:

1) take hostapd snapshot 0.5-2006-06-10 (other recent versions should
work OK also, though), apply the hostapd patch attached.

2) Insert modules (80211, rate_control and bcm43xx-d80211)

3) iwconfig wlan0 mode master

4) ifconfig wlan0 up (this should be done by hostapd actually, but
its operation with current dscape stack seems to be broken)

5) Start hostapd (f.e. hostapd -B /etc/hostapd.conf), config file can
look like:
=
interface=wlan0
driver=devicescape
ssid=OpenWrt
channel=1
send_probe_response=0
logger_syslog=-1
logger_syslog_level=2
logger_stdout=-1
logger_stdout_level=2
debug=4
=

6) iwconfig wlan0 essid your-SSID-name (this also should not be
required, but current combination of hostapd + dscape doesn't seem to
generate config_interface callback when setting beacon, so this is
required just to force call of config_interface).



Index: wireless-dev-dscapeports/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c
===================================================================
--- wireless-dev-dscapeports.orig/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c	2006-06-17 21:26:10.0 +0200
+++ wireless-dev-dscapeports/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c	2006-06-18 23:36:31.0 +0200
@@ -152,7 +152,7 @@
u32 status;
 
status = bcm43xx_read32(bcm, BCM43xx_MMIO_STATUS_BITFIELD);
-   if (!(status & BCM43xx_SBF_XFER_REG_BYTESWAP))
+   if (status & BCM43xx_SBF_XFER_REG_BYTESWAP)
val = swab32(val);
 
bcm43xx_write32(bcm, BCM43xx_MMIO_RAM_CONTROL, offset);
@@ -312,7 +312,7 @@
}
 }
 
-void bcm43xx_tsf_write(struct bcm43xx_private *bcm, u64 tsf)
+static void bcm43xx_time_lock(struct bcm43xx_private *bcm)
 {
u32 status;
 
@@ -320,7 +320,19 @@
status |= BCM43xx_SBF_TIME_UPDATE;
bcm43xx_write32(bcm, BCM43xx_MMIO_STATUS_BITFIELD, status);
mmiowb();
+}
+
+static void bcm43xx_time_unlock(struct bcm43xx_private *bcm)
+{
+   u32 status;
+
+   status = bcm43xx_read32(bcm, BCM43xx_MMIO_STATUS_BITFIELD);
+   status &= ~BCM43xx_SBF_TIME_UPDATE;
+   bcm43xx_write32(bcm, BCM43xx_MMIO_STATUS_BITFIELD, status);
+}
 
+static void bcm43xx_tsf_write_locked(struct bcm43xx_private *bcm, u64 tsf)
+{
/* Be careful with the in-progress timer.
 * First zero out the low register, so we have a full
 * register-overflow duration to complete the operation.
@@ -350,10 

Re: [PATCH] AP (master) mode fixed (resubmit)

2006-06-19 Thread Michael Buesch
On Monday 19 June 2006 11:37, Francois Barre wrote:
> 2006/6/18, Michael Buesch [EMAIL PROTECTED]:
> > Ok, I got my Airport to generate Beacons on this BE machine.
>
> Hurray, I'm not alone running BE stuff here...
>
> > There was a bug hiding in bcm43xx_ram_write().
> [..]
> Could you provide a small patch just for this issue please ? It's not
> that I'm too lazy to re-apply your whole patches again, but... Well,
> if you have it...

It is not an issue without the AP mode patch, because all callers
of bcm43xx_ram_write() are buggy, too.
So, caller buggy, callee buggy, result OK. ;)
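
A minimal sketch of why the two bugs cancel, and of assembling a word byte by byte so that host endianness stops mattering (swab32_sketch() and pack_le32() are illustrative names, not driver functions):

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value, like the kernel's swab32(). */
static uint32_t swab32_sketch(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Build a u32 from four bytes with b[0] in the low byte.
 * The result does not depend on the endianness of the host. */
static uint32_t pack_le32(const uint8_t *b)
{
    assert(b != 0);
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

Swapping twice is the identity, so a wrong swap in the caller plus a wrong swap in the callee goes unnoticed. Reading the buffer through a (u32 *) cast, by contrast, gives a host-dependent value, which is what the byte-wise form avoids.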

> > But I can not associate to the bcm43xx-AP.
> > But it seems like a dscape problem. The authentication packet
> > arrives at the machine (I can capture it with the new cool virtual
> > monitor interface), but it is not processed. So the STA does
> > not receive a response.
>
> Funny, I did have no problem associating with the AP. What happens
> exactly on the STA ? Did you manage to trace anything on ?
> What exactly is your hardware, Michael ?

I think I did something wrong while bringing the device up.
It works now.

But attached is a fixed patch, already.
We had an off-by-two bug in common template write.

> Also, Alexander, did you have the opportunity to heavily test your AP
> code ? I mean, finding the maximum bandwidth a bcm43xx could provide
> while being an AP, the way it behaves with multiple STA associated,

I was able to associate now, but could not transmit ping packets, yet.
Dunno what the problem is. Maybe the STA is broken, too. Let's see.


Index: wireless-dev-dscapeports/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c
===================================================================
--- wireless-dev-dscapeports.orig/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c	2006-06-17 21:26:10.0 +0200
+++ wireless-dev-dscapeports/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c	2006-06-19 11:25:02.0 +0200
@@ -151,8 +151,10 @@
 {
u32 status;
 
+   assert(offset % 4 == 0);
+
status = bcm43xx_read32(bcm, BCM43xx_MMIO_STATUS_BITFIELD);
-   if (!(status & BCM43xx_SBF_XFER_REG_BYTESWAP))
+   if (status & BCM43xx_SBF_XFER_REG_BYTESWAP)
val = swab32(val);
 
bcm43xx_write32(bcm, BCM43xx_MMIO_RAM_CONTROL, offset);
@@ -312,7 +314,7 @@
}
 }
 
-void bcm43xx_tsf_write(struct bcm43xx_private *bcm, u64 tsf)
+static void bcm43xx_time_lock(struct bcm43xx_private *bcm)
 {
u32 status;
 
@@ -320,7 +322,19 @@
status |= BCM43xx_SBF_TIME_UPDATE;
bcm43xx_write32(bcm, BCM43xx_MMIO_STATUS_BITFIELD, status);
mmiowb();
+}
+
+static void bcm43xx_time_unlock(struct bcm43xx_private *bcm)
+{
+   u32 status;
+
+   status = bcm43xx_read32(bcm, BCM43xx_MMIO_STATUS_BITFIELD);
+   status &= ~BCM43xx_SBF_TIME_UPDATE;
+   bcm43xx_write32(bcm, BCM43xx_MMIO_STATUS_BITFIELD, status);
+}
 
+static void bcm43xx_tsf_write_locked(struct bcm43xx_private *bcm, u64 tsf)
+{
/* Be careful with the in-progress timer.
 * First zero out the low register, so we have a full
 * register-overflow duration to complete the operation.
@@ -350,10 +364,13 @@
mmiowb();
bcm43xx_write16(bcm, BCM43xx_MMIO_TSF_0, v0);
}
+}
 
-   status = bcm43xx_read32(bcm, BCM43xx_MMIO_STATUS_BITFIELD);
-   status &= ~BCM43xx_SBF_TIME_UPDATE;
-   bcm43xx_write32(bcm, BCM43xx_MMIO_STATUS_BITFIELD, status);
+void bcm43xx_tsf_write(struct bcm43xx_private *bcm, u64 tsf)
+{
+   bcm43xx_time_lock(bcm);
+   bcm43xx_tsf_write_locked(bcm, tsf);
+   bcm43xx_time_unlock(bcm);
 }
 
 static void bcm43xx_measure_channel_change_time(struct bcm43xx_private *bcm)
@@ -415,10 +432,11 @@
 static void bcm43xx_write_mac_bssid_templates(struct bcm43xx_private *bcm)
 {
static const u8 zero_addr[ETH_ALEN] = { 0 };
-   const u8 *mac = NULL;
-   const u8 *bssid = NULL;
+   const u8 *mac;
+   const u8 *bssid;
u8 mac_bssid[ETH_ALEN * 2];
int i;
+   u32 tmp;
 
bssid = bcm->interface.bssid;
if (!bssid)
@@ -431,12 +449,13 @@
memcpy(mac_bssid + ETH_ALEN, bssid, ETH_ALEN);
 
/* Write our MAC address and BSSID to template ram */
-   for (i = 0; i < ARRAY_SIZE(mac_bssid); i += sizeof(u32))
-   bcm43xx_ram_write(bcm, 0x20 + i, *((u32 *)(mac_bssid + i)));
-   for (i = 0; i < ARRAY_SIZE(mac_bssid); i += sizeof(u32))
-   bcm43xx_ram_write(bcm, 0x78 + i, *((u32 *)(mac_bssid + i)));
-   for (i = 0; i < ARRAY_SIZE(mac_bssid); i += sizeof(u32))
-   bcm43xx_ram_write(bcm, 0x478 + i, *((u32 *)(mac_bssid + i)));
+   for (i = 0; i < ARRAY_SIZE(mac_bssid); i += sizeof(u32)) {
+   tmp =  (u32)(mac_bssid[i + 0]);
+   tmp |= (u32)(mac_bssid[i + 1]) << 8;
+   tmp |= (u32)(mac_bssid[i + 2]) << 16;
+   tmp |= (u32)(mac_bssid[i + 3]) << 24;
+   bcm43xx_ram_write(bcm, 0x20 + i, tmp);
+  

[NET]: Prevent multiple qdisc runs

2006-06-19 Thread Herbert Xu
Hi Dave:

I'm nearly done with the generic segmentation offload stuff (although
only TCPv4 is implemented for now), and I encountered this problem.

[NET]: Prevent multiple qdisc runs

Having two or more qdisc_run's contend against each other is bad because
it can induce packet reordering if the packets have to be requeued.  It
appears that this is an unintended consequence of relinquishing the queue
lock while transmitting.  That in turn is needed for devices that spend a
lot of time in their transmit routine.

There are no advantages to be had as devices with queues are inherently
single-threaded (the loopback device is not but then it doesn't have a
queue).

Even if you were to add a queue to a parallel virtual device (e.g., bolt
a tbf filter in front of an ipip tunnel device), you would still want to
process the queue in sequence to ensure that the packets are ordered
correctly.

The solution here is to steal a bit from net_device to prevent this.

BTW, as qdisc_restart is no longer used by anyone as a module inside the
kernel (IIRC it used to with netif_wake_queue), I have not exported the
new __qdisc_run function.
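
For readers who want to see the shape of the idea outside the kernel, here is a user-space sketch with C11 atomics. It is only an analogy: struct dev_sketch and qdisc_run_sketch() are invented names, and the counter stands in for actually draining the queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with a "queue run in progress" bit. */
struct dev_sketch {
    atomic_flag qdisc_running;
    int runs;   /* how many times the queue was actually processed */
};

/* Returns true if this caller processed the queue, false if another
 * run was already in progress and this call backed off. */
static bool qdisc_run_sketch(struct dev_sketch *dev)
{
    assert(dev != NULL);

    if (atomic_flag_test_and_set(&dev->qdisc_running))
        return false;               /* somebody else owns the run */

    dev->runs++;                    /* stand-in for draining the queue */

    atomic_flag_clear(&dev->qdisc_running);
    return true;
}
```

A caller that loses the test-and-set simply returns; its packet is already on the queue and will be picked up by the run that is in progress.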

Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e432b74..39919c8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -233,6 +233,7 @@ enum netdev_state_t
__LINK_STATE_RX_SCHED,
__LINK_STATE_LINKWATCH_PENDING,
__LINK_STATE_DORMANT,
+   __LINK_STATE_QDISC_RUNNING,
 };
 
 
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index b94d1ad..75b5b93 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -218,12 +218,13 @@ extern struct qdisc_rate_table *qdisc_ge
struct rtattr *tab);
 extern void qdisc_put_rtab(struct qdisc_rate_table *tab);
 
-extern int qdisc_restart(struct net_device *dev);
+extern void __qdisc_run(struct net_device *dev);
 
 static inline void qdisc_run(struct net_device *dev)
 {
-   while (!netif_queue_stopped(dev) && qdisc_restart(dev) < 0)
-   /* NOTHING */;
+   if (!netif_queue_stopped(dev) &&
+   !test_and_set_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
+   __qdisc_run(dev);
 }
 
 extern int tc_classify(struct sk_buff *skb, struct tcf_proto *tp,
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index b1e4c5e..d7aca8e 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -90,7 +90,7 @@ void qdisc_unlock_tree(struct net_device
NOTE: Called under dev->queue_lock with locally disabled BH.
 */
 
-int qdisc_restart(struct net_device *dev)
+static inline int qdisc_restart(struct net_device *dev)
 {
struct Qdisc *q = dev->qdisc;
struct sk_buff *skb;
@@ -179,6 +179,14 @@ requeue:
return q->q.qlen;
 }
 
+void __qdisc_run(struct net_device *dev)
+{
+   while (qdisc_restart(dev) < 0 && !netif_queue_stopped(dev))
+   /* NOTHING */;
+
+   clear_bit(__LINK_STATE_QDISC_RUNNING, &dev->state);
+}
+
 static void dev_watchdog(unsigned long arg)
 {
struct net_device *dev = (struct net_device *)arg;
@@ -620,6 +628,5 @@ EXPORT_SYMBOL(qdisc_create_dflt);
 EXPORT_SYMBOL(qdisc_alloc);
 EXPORT_SYMBOL(qdisc_destroy);
 EXPORT_SYMBOL(qdisc_reset);
-EXPORT_SYMBOL(qdisc_restart);
 EXPORT_SYMBOL(qdisc_lock_tree);
 EXPORT_SYMBOL(qdisc_unlock_tree);


Re: [NET]: Prevent multiple qdisc runs

2006-06-19 Thread jamal
Herbert,

I take it you saw a lot of requeues happening that prompted this? What
were the circumstances? The _only_ times i have seen it happen is when
the (PCI) bus couldnt handle the incoming rate or there was a bug in the
driver. 
Also: what happens to the packet that comes in from either local or is
being forwarded and finds the qdisc_is_running flag is set? I couldnt
tell if the intent was to drop it or not. The answer for TCP is probably
simpler than for packets being forwarded.

cheers,
jamal


On Mon, 2006-19-06 at 22:15 +1000, Herbert Xu wrote:
> Hi Dave:
>
> I'm nearly done with the generic segmentation offload stuff (although
> only TCPv4 is implemented for now), and I encountered this problem.
>
> [NET]: Prevent multiple qdisc runs
 




[DOC]: generic netlink

2006-06-19 Thread jamal

Folks,

Attached is a document that should help people wishing to use generic
netlink interface. It is a WIP so a lot more to go if i see interest.
The doc has been around for a while, i spent part of yesterday and this
morning cleaning it up. If you have sent me comments before, please
forgive me for having misplaced them - just send again. 

cheers,
jamal

PS:- I dont have a good place to put this doc and point to, hence the
17K attachment

1.0 Problem Statement
---

Netlink is a robust wire-format IPC typically used for kernel-user
communication although could also be used to be a communication
carrier between user-user and kernel-kernel.

A typical netlink connection setup is of the form:

netlink_socket = socket(PF_NETLINK, socket_type, netlink_family);

where netlink_family selects the netlink bus to communicate
on. Example of a family would be NETLINK_ROUTE which is 0x0 or
NETLINK_XFRM which is 0x6. [Refer to RFC 3549 for a high level view
and look at include/linux/netlink.h for some of the allocated families].
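
A minimal sketch of that setup on Linux, assuming SOCK_RAW as the socket type (open_netlink() is a helper name invented here, not an existing API):

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Open a netlink socket on the given family ("bus") and bind it.
 * Returns the file descriptor, or -1 on error (errno is set). */
static int open_netlink(int netlink_family)
{
    struct sockaddr_nl addr;
    int fd;

    assert(netlink_family >= 0);

    fd = socket(PF_NETLINK, SOCK_RAW, netlink_family);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;    /* nl_pid = 0: the kernel assigns our address */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

For example, open_netlink(NETLINK_ROUTE) attaches to the routing bus; passing NETLINK_GENERIC instead attaches to the generic bus discussed in the rest of this document.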

Over the years, due to its robust design, netlink has become very popular.
This has resulted in the danger of running out of family numbers to issue.

In netconf 2005 in Montreal it was decided to find ways to work around
the allocation challenge and as a result NETLINK_GENERIC bus was born.

This document gives a mid-level view of NETLINK_GENERIC and how to use it.
The reader does not necessarily have to know what netlink is, but needs
to know at least the encapsulation used - which is described in the next
section. There are some implicit assumptions about what netlink is
or what structures like TLVs are etc. I apologize i dont have much
time to give a tutorial - invite me to some odd conference and i will
be forced to do better than this doc. Better send patches to this doc.

2.0 High Level view


In order to illustrate the way different components talk to each
other, the diagram below is used to provide an abstraction on
how the operations happen. There are two (three depending on your
perspective) components:

1) The generic netlink connection, which for illustration is referred
to as a bus. The generic netlink bus is shown as split between user
and kernel domains: this means programs can connect to the bus from either
kernel or user space.

2) Components that talk to each other after attaching to the bus:
a) Two users are shown in user space.
b) Three are shown in the kernel.

All boxes have kernel-wide unique identifiers that can be used to
address them.
Typically, user space boxes exist to control one or more kernel level
boxes, i.e. they update some attributes that exist in a kernel level
box.
Any of these boxes can communicate with each other by first
connecting to the bus and then sending messages addressed to any
box.

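         +----------+      +----------+
         |  user1   |  ..  |  user-n  |
         +----+-----+      +-----+----+
              |                  |
              |                  |
              |                  |                  User
       +------+------------------+------+       Space/domain
 user  |                                |
-------+      Generic Netlink Bus       +------------------
 kernel|                                |          Kernel
       +----+-----------+----------+----+       Space/domain
            |           |          |
            |           |          |
       +----+-----+ +---+-----+ +--+-----+
       |controller| | foobar  | | googah |
       +----------+ +---------+ +--------+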

The controller is a special built-in user of the bus. It is the repository
of info on kernel components that have attached to the bus. It has
a reserved address identifier of 0x10. By querying the controller,
one could find out that both foobar and googah are registered and
what their IDs are etc. Essentially it's a namespace translator,
not unlike what DNS is for IP addresses. More on this later.
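
To make the DNS analogy concrete, here is a toy name-to-ID lookup in C. It is purely illustrative: the table, the helper resolve_family() and the IDs given to foobar and googah are made up; only the controller's 0x10 comes from the text above. A real query goes over the bus to the controller rather than through a local table.

```c
#include <assert.h>
#include <string.h>

#define CTRL_ID_SKETCH 0x10   /* reserved controller identifier from the text */

struct family_sketch {
    const char *name;
    int id;
};

/* Hypothetical registrations; the IDs for foobar and googah are invented. */
static const struct family_sketch registry[] = {
    { "controller", CTRL_ID_SKETCH },
    { "foobar",     0x11 },
    { "googah",     0x12 },
};

/* Look up the ID registered under name; returns -1 if it is unknown. */
static int resolve_family(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(registry) / sizeof(registry[0]); i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].id;
    return -1;
}
```

A user program would resolve "foobar" once and then address its messages to the returned ID.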

To get to the point of the most common usage of netlink
(user space control of a kernel component), the diagram below breaks
things down for a single user program that controls a kernel module
called foobar. The example is simple for illustration purposes; as an
example, user space could control a lot more kernel modules.


 +--+
 |  |
 |user program  |
  gnl events  ; ---|  |
(2),-/   +--^-+--^--+
 ,'  gnl| ^ foobar   ^ foobar
,'discovery ^ | events   | config/query 
   ,'   (1) | ^  (4) ^  (3)
   +--/-- 

Re: [NET]: Prevent multiple qdisc runs

2006-06-19 Thread Herbert Xu
Hi Jamal:

On Mon, Jun 19, 2006 at 09:33:51AM -0400, jamal wrote:
 
> I take it you saw a lot of requeues happening that prompted this? What
> were the circumstances? The _only_ times i have seen it happen is when
> the (PCI) bus couldnt handle the incoming rate or there was a bug in the
> driver.

Actually I discovered the problem only because the generic segmentation
offload stuff that I'm working on needs to deal with the situation where
a super-packet is partially transmitted.  Requeueing causes all sorts of
nasty problems so I chose to keep it within the net_device structure.

To do so requires qdisc_run to be serialised against each other.  I then
found out that we want this anyway because otherwise the requeued packets
could be reordered.

> Also: what happens to the packet that comes in from either local or is
> being forwarded and finds the qdisc_is_running flag is set? I couldnt
> tell if the intent was to drop it or not. The answer for TCP is probably
> simpler than for packets being forwarded.

The qdisc_is_running flag only prevents qdisc_run from occurring (because it's
already running); it has no impact on the queueing of the packet.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [NET]: Prevent multiple qdisc runs

2006-06-19 Thread jamal
Herbert,

On Mon, 2006-19-06 at 23:42 +1000, Herbert Xu wrote:
> Hi Jamal:
>
> On Mon, Jun 19, 2006 at 09:33:51AM -0400, jamal wrote:
[..]
>
> Actually I discovered the problem only because the generic segmentation
> offload stuff that I'm working on needs to deal with the situation where
> a super-packet is partially transmitted.  Requeueing causes all sorts of
> nasty problems so I chose to keep it within the net_device structure.
>
> To do so requires qdisc_run to be serialised against each other.  I then
> found out that we want this anyway because otherwise the requeued packets
> could be reordered.
 

Ok, I am trying to visualize but having a hard time:
Re-queueing is done at the front of the queue to maintain ordering
whereas queueing is done at the front (i.e it is a FIFO). i,e
even if p2 comes in and gets queued while p1 is being processed,
requeueing of p1 will put it infront of p2.
Your super-packet issue may be different though ..

> > Also: what happens to the packet that comes in from either local or is
> > being forwarded and finds the qdisc_is_running flag is set? I couldnt
> > tell if the intent was to drop it or not. The answer for TCP is probably
> > simpler than for packets being forwarded.
>
> The qdisc_is_running only prevents qdisc_run from occuring (because it's
> already running), it does not impact on the queueing of the packet.
 

I will wait for your answer on the other part before responding to this.

cheers,
jamal




Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Harry Edmon

Stephen Hemminger wrote:

> Does this fix it?
>    # sysctl -w net.ipv4.tcp_abc=0


That did not help.  I have 1 minute outputs from tcpdump under both 2.6.11.12 
and 2.6.16.20.  You will see a large size difference between the files.  Since 
the 2.6.11.12 one is 2 MBytes, I thought I would post them via the web instead 
of via attachments.   Look at:


http://www.atmos.washington.edu/~harry/linux/2.6.11.12.out.1min
http://www.atmos.washington.edu/~harry/linux/2.6.16.20.out.1min

And again, thanks to all of you for looking into this.

--
 Dr. Harry Edmon                E-MAIL: [EMAIL PROTECTED]
 206-543-0547                           [EMAIL PROTECTED]
 Dept of Atmospheric Sciences   FAX:    206-543-0308
 University of Washington, Box 351640, Seattle, WA 98195-1640


Re: [NET]: Prevent multiple qdisc runs

2006-06-19 Thread Herbert Xu
On Mon, Jun 19, 2006 at 10:23:29AM -0400, jamal wrote:
 
> Ok, I am trying to visualize but having a hard time:
> Re-queueing is done at the front of the queue to maintain ordering
> whereas queueing is done at the front (i.e it is a FIFO). i,e
> even if p2 comes in and gets queued while p1 is being processed,
> requeueing of p1 will put it infront of p2.

Correct.  When qdisc_run happens we take an skb off the head of the
queue.  If it can't be transmitted right away, we try to put it back
in the same spot.

If you have two qdisc_run's happening at the same time then that spot
could be different.
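
A toy, single-threaded replay of that interleaving may help. The queue type and helpers below are invented for illustration and the two "runs" are simply executed in the unlucky order; nothing here is kernel code.

```c
#include <assert.h>
#include <string.h>

#define QCAP 8

/* Tiny queue of packet IDs; index 0 is the head. */
struct queue_sketch {
    int pkt[QCAP];
    int len;
};

static void enqueue_tail(struct queue_sketch *q, int id)
{
    assert(q->len < QCAP);
    q->pkt[q->len++] = id;
}

static int dequeue_head(struct queue_sketch *q)
{
    int id;

    assert(q->len > 0);
    id = q->pkt[0];
    memmove(&q->pkt[0], &q->pkt[1], (size_t)(q->len - 1) * sizeof(int));
    q->len--;
    return id;
}

static void requeue_head(struct queue_sketch *q, int id)
{
    assert(q->len < QCAP);
    memmove(&q->pkt[1], &q->pkt[0], (size_t)q->len * sizeof(int));
    q->pkt[0] = id;
    q->len++;
}

/* Two overlapping runs: A takes p1, B takes p2, neither can transmit,
 * A requeues first, then B. Returns the ID that ends up at the head. */
static int overlapping_runs_head(void)
{
    struct queue_sketch q = { {0}, 0 };
    int a, b;

    enqueue_tail(&q, 1);
    enqueue_tail(&q, 2);
    enqueue_tail(&q, 3);

    a = dequeue_head(&q);   /* run A holds p1 */
    b = dequeue_head(&q);   /* run B holds p2 */
    requeue_head(&q, a);    /* queue is now: 1 3 */
    requeue_head(&q, b);    /* queue is now: 2 1 3 */
    return q.pkt[0];
}
```

The head ends up being p2, so p2 would go out before p1 even though each run put its packet back "in the same spot" from its own point of view.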

> Your super-packet issue may be different though ..

The reordering issue is not related to super-packets.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [NET]: Prevent multiple qdisc runs

2006-06-19 Thread jamal
On Tue, 2006-20-06 at 00:29 +1000, Herbert Xu wrote:

> Correct.  When qdisc_run happens we take an skb off the head of the
> queue.  If it can't be transmitted right away, we try to put it back
> in the same spot.
>
> If you have two qdisc_run's happening at the same time then that spot
> could be different.
 

Ok, but:
The queue lock will ensure only one of the qdisc runs (assuming
different CPUs) will be able to dequeue at any one iota in time, no?
And if you assume that the cpu that manages to get the tx lock as well
is going to be contending for the qlock in order to requeue, then the
only scenario i can see the race happening is when you have one CPU
faster than the other.
Did i miss something?

cheers,
jamal



Re: [RFT] pcnet32 NAPI changes

2006-06-19 Thread Lennart Sorensen
On Fri, Jun 16, 2006 at 12:11:54PM -0700, Don Fry wrote:
> This patch is a collection of changes to pcnet32 which does the
> following:
>
> - Fix section mismatch warning.
> - fix set_ringparam to correctly handle memory allocation failures
> - fix off-by-one in get_ringparam.
> - cleanup at end of loopback_test when not up.
> - Add NAPI to driver, fixing set_ringparam and loopback_test to work
>   correctly with poll.
> - for multicast, do not reset the chip unless cannot enter suspend mode
>   to avoid race with poll.
>
> The set_ringparam code is larger than I would prefer, but it will not
> leave null pointers around for the code to stumble over when memory
> allocation fails.  If anyone has a better idea, please let me know.
>
> Some complexity could be avoided by allocating memory for the maximum
> number of tx and rx buffers at probe time.  Requiring 14k for the tx
> ring and arrays, and another 14k for rx; instead of about 10k total for
> the default sizes.

So 28k vs 10k?  Why are these adjustable if it makes that little
difference?  Is there any advantage to making them smaller?

> It is NAPI only, unlike Len Sorensen's version which allows for compile
> time selection.  Some drivers are NAPI only, others have compile
> options.  Which is preferred?

I just figured making it an option was less intrusive, although I can't
imagine a good reason for not wanting to use the NAPI version at all
times.  I certainly know I intend to use it that way.

> I have tested these changes with a 79C971, 973, 976, and 978 on a ppc64
> machine, and 970A, 972, 973, 975, and 976 on an x86 machine.
>
> I have not tested these changes with VMware or Xen.

I will give it a try with our system and see how it runs.

Len Sorensen


Re: [DOC]: generic netlink

2006-06-19 Thread James Morris
On Mon, 19 Jun 2006, jamal wrote:

> Attached is a document that should help people wishing to use generic
> netlink interface. It is a WIP so a lot more to go if i see interest.

Thanks for writing this up.

It seems that TIPC is multiplexing all of its commands through
TIPC_GENL_CMD.

I wonder, if this is how other protocols are likely to utilize genl, then 
we could possibly drop the command registration code completely and one 
command op can be registered by the protocol during 
genl_register_family().

This would both simplify the genl code and API, and help ensure 
consistency of users.



- James
-- 
James Morris
[EMAIL PROTECTED]


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Jesper Dangaard Brouer



Harry Edmon [EMAIL PROTECTED] wrote:

> I have a system with a strange network performance degradation from
> 2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6.
> The system has dual single-core Xeons with hyperthreading on.

<cut>

Hi Harry

Can you check which high-res timesource you are using?

In the kernel log look for:
 kernel: Using tsc for high-res timesource
 kernel: Using pmtmr for high-res timesource

I have experienced some network performance degradation when using the
pmtmr timesource on an Opteron AMD system.  It seems that the default
timesource changed between 2.6.15 and 2.6.16.


If you use pmtmr try to reboot with kernel option clock=tsc.

On my Opteron AMD system i normally can route 400 kpps, but with 
timesource pmtmr i could only route around 83 kpps.  (I found the timer 
to be the issue by using oprofile).



Cheers,
  Jesper Brouer

--
---
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
---


Re: [PATCH, RFT] bcm43xx: Busting the 1G limit

2006-06-19 Thread Daniel Gryniewicz
On Sat, 2006-06-17 at 19:28 +0200, Michael Buesch wrote:
 Hi,
 
 This patch adds full 32-bit and 64-bit DMA support
 to the bcm43xx driver. Well, it _should_ do this. I can
 not test it, as I don't have a machine to trigger the 1G
 limit.
 The 1G limit should be exploitable on an AMD64 machine
 with more than 1G RAM.
 
 Please test and report, if it works or not. In the
 case of works not, please provide full dmesg log.
 
 Note that I am not sure which cards actually support
 full 32-bit or even 64-bit mode. Older cards might still
 only support 30-bit DMA.
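
To make the "1G limit" concrete: a device restricted to 30-bit DMA can only address the first 2^30 bytes (1 GiB) of bus addresses. The following is a small userspace sketch of that check, not code from the bcm43xx driver; the helper names dma_mask() and dma_range_ok() are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a mask with the low `bits` address bits set (bits in 1..64). */
static uint64_t dma_mask(unsigned int bits)
{
    return bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/* True if the whole buffer [addr, addr + len) is reachable under the mask. */
static bool dma_range_ok(uint64_t addr, uint64_t len, unsigned int bits)
{
    uint64_t mask = dma_mask(bits);

    if (len == 0 || addr > mask)
        return false;
    /* Written this way to avoid overflow in addr + len. */
    return len - 1 <= mask - addr;
}
```

With a 30-bit mask, a buffer that ends exactly at the 1 GiB boundary passes, while anything starting at or crossing that boundary does not; the same buffer passes under a 32-bit mask.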

Hi.

I tried this on both 2.6.17-rc6 and on wireless-dev, and got pretty much
the same panic on both (modulo locking).  My box is a turion with 2 GB
of ram and a BCM4318.  Here's the panic from wireless-dev:

Unable to handle kernel NULL pointer dereference at 0020
RIP:
88104f24{:bcm43xx:bcm43xx_dma_handle_xmitstatus+436}
PGD 0
Oops:  [1] PREEMPT
CPU 0
Modules linked in: uhci_hdc ieee80211_crypt_wep cryptoloop loop
snd_atiixp_modem
snd_atiixp snd_ac97_codec snd_ac97_bus bcm43xx snd_pcm snd_timer
ieee80211softmac ehci_hcd snd ohci1394 ieee80211 ohci_hdc sdhci ieee1394
yenta_socket usbcore mmc_core soundcore rsrc_nonstatic ieee80211_crypt
8139too
snd_page_alloc pcmcia_core
Pid: 6139, comm: iwconfig Not tainted 2.6.17-rc6-dfg1-g57aab842-dirty #1
RIP: 0010:[88104f24]
88104f24{:bcm43xx:bcm43xx_dma_handle_xmitstatus+436}
RSP: 0018:81445df8  EFLAGS: 00010002
RAX: 0063 RBX: 0001 RCX: 
RDX:  RSI: 0082 RDI: 0001
RBP: 81445e28 R08: 0002e8c7 R09: 
R10:  R11: fffa R12: 
R13: 30d1 R14: 81445eb8 R15: 00d0
FS:  2b8b5dc68d20() GS:81445eb8()
knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 0020 CR3: 76314000 CR4: 06e0
Process iwconfig (pid: 6139, threadinfo 8100748ae000, task
810075840890)
Stack: 81007510f050  81007510f050
8800
   814453b8 2800 814453f8
880f13cb
   81445e78 81007541e740
Call Trace: IRQ 880f13cb{:bcm43xx:bcm43xx_interrupt_tasklet
+2379}
   81092b78{tasklet_action+72}
810126d0{__do_softirq+80}
   8106872a{call_softirq+30} 81075e04{do_softirq
+52}
   81092cf4{irq_exit+63} 81075e51{do_IRQ+65}
   81067dae{ret_from_intr+0} EOI
810078c8{_raw_spin_lock+296}
   8106dc9e{_spin_lock+30}
810202dd{unlink_file_vma+61}
   810206a8{free_pagetables+152}
8103ee97{exit_mmap+135}
   810416b6{mmput+54} 81047b63{exit_mmap+243}
   81016a9a{do_exit+602} 81012e1c{__fput+428}
   810506e0{debug_mutex_init+0}
81054862{sys_exit_group+18}
   81067892{system_call+126}

Code: 45 3b 7c 24 20 7c 28 49 c7 c0 be 9e 10 88 b9 59 03 00 00 48
RIP 88104f24{:bcm43xx:bcm43xx_dma_handle_xmitstatus+436} RSP
81445df8
CR2: 0020
 0Kernel panic - not syncing: Aiee, killing interrupt handler!

Daniel



Re: [DOC]: generic netlink

2006-06-19 Thread jamal
On Mon, 2006-19-06 at 11:13 -0400, James Morris wrote:

 
 It seems that TIPC is multiplexing all of its commands through
 TIPC_GENL_CMD.


TIPC is a deviation; they had the 100 ioctls and therefore did a direct
one-to-one mapping.

 I wonder, if this is how other protocols are likely to utilize genl, then 
 we could possibly drop the command registration code completely and one 
 command op can be registered by the protocol during 
 genl_register_family().
 

The intent is to have a handful of commands as in classical netlink
(e.g. route or qdisc) where you are controlling data that sits in the
kernel; i.e. when you have an attribute or a vector of attributes, then
the commands will have the semantics ADD/DEL/GET/DUMP only.
Other than TIPC, the two other users I have seen use it in this manner.
But you are right: if usage tends to lean some other way, we could get
rid of it (I think TIPC is a bad example).

 This would both simplify the genl code and API, and help ensure 
 consistency of users.
 

You are talking from an SELinux perspective, I take it?
My view: if you want to have ACLs against such commands,
then it becomes easier to say "can only do ADD but not DEL", for example.
(We need to resolve the genl_rcv_msg() check on commands to be in sync with
SELinux, as was pointed out by Thomas.)

cheers,
jamal



Re: [DOC]: generic netlink

2006-06-19 Thread James Morris
On Mon, 19 Jun 2006, jamal wrote:

 Other than TIPC the two other users i have seen use it in this manner.
 But, you are right if usage tends to lean in some other way we could get
 rid of it (I think TIPC is a bad example).

Ok, perhaps make a note in the docs about this and keep an eye out when 
new code is submitted, and encourage people not to do this.

  This would both simplify the genl code and API, and help ensure 
  consistency of users.
  
 
 You are talking from an SELinux perspective i take it?

Actually, what would help SELinux is the opposite, forcing everyone to use 
separate commands and assigning security attributes to each one.  But 
because TIPC is already multiplexing, it's not feasible.

Instead, I think the way to go for SELinux is to have each nl family 
provide a permission callback, so SELinux can pass the skb back to the nl 
module which then returns a type of permission ('read', 'write', 
'readpriv').  This way, the nl module can create and manage its own 
internal table of command permissions and also know exactly where in the 
message to dig for the command specifier.
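
A rough sketch of what such a per-family permission callback could look like, written as plain userspace C. Everything here is hypothetical: the enum values, the table and the function name ex_family_perm() are invented for illustration and are not an existing netlink or SELinux interface.

```c
#include <stddef.h>

/* Hypothetical permission classes, mirroring the 'read', 'write' and
 * 'readpriv' types mentioned above. */
enum nl_perm { NL_PERM_NONE, NL_PERM_READ, NL_PERM_WRITE, NL_PERM_READPRIV };

/* Hypothetical command numbers for an example family. */
enum { EX_CMD_GET = 1, EX_CMD_DUMP, EX_CMD_ADD, EX_CMD_DEL };

struct cmd_perm {
    int cmd;
    enum nl_perm perm;
};

/* The table lives in the nl module, not in the security module. */
static const struct cmd_perm ex_perm_table[] = {
    { EX_CMD_GET,  NL_PERM_READ },
    { EX_CMD_DUMP, NL_PERM_READPRIV },
    { EX_CMD_ADD,  NL_PERM_WRITE },
    { EX_CMD_DEL,  NL_PERM_WRITE },
};

/* Callback the family would provide: map a command to a permission type. */
static enum nl_perm ex_family_perm(int cmd)
{
    size_t i;

    for (i = 0; i < sizeof(ex_perm_table) / sizeof(ex_perm_table[0]); i++)
        if (ex_perm_table[i].cmd == cmd)
            return ex_perm_table[i].perm;
    return NL_PERM_NONE; /* unknown command: grant nothing */
}
```

In a real implementation the callback would receive the skb and extract the command itself; the sketch takes the command number directly to stay self-contained.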

 My view: If you want to have ACLs against such commands then it becomes 
 easier to say can only do ADD but not DEL for example (We need to 
 resolve genl_rcv_msg() check on commands to be in sync with SELinux as 
 was pointed by Thomas)

This already exists, to some extent, but only for some protocols. You can 
see examples of existing permission tables managed by SELinux in:
 security/selinux/nlmsgtab.c

The hope is to move this out of SELinux and into each nl module, which is
much more manageable and scalable.


- James
-- 
James Morris
[EMAIL PROTECTED]


Re: [DOC]: generic netlink

2006-06-19 Thread Shailabh Nagar
jamal wrote:
 On Mon, 2006-19-06 at 11:13 -0400, James Morris wrote:
 
 
It seems that TIPC is multiplexing all of its commands through
TIPC_GENL_CMD.
 
 
 
 TIPC is a deviation; they had the 100 ioctls and therefore did a direct
 one-to-one mapping.
 
 
I wonder, if this is how other protocols are likely to utilize genl, then 
we could possibly drop the command registration code completely and one 
command op can be registered by the protocol during 
genl_register_family().

 
 
 The intent is to have a handful of commands as in classical netlink
 (eg route or qdisc etc) where you are controlling data that sits in the
 kernel; i.e when you have an attribute or a vector of attributes, then
 the commands will be of the semantics: ADD/DEL/GET/DUMP only. 
 Other than TIPC the two other users i have seen use it in this manner.
 But, you are right if usage tends to lean in some other way we could get
 rid of it (I think TIPC is a bad example).

The taskstats interface, currently in -mm, is one user of genetlink
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17-rc6/2.6.17-rc6-mm2/broken-out/per-task-delay-accounting-taskstats-interface.patch

Based on Jamal's suggestions, we found it useful to have the limited
set of commands model and ended up with having to register just one GET
command. And in subsequent discussions, a SET command would also be handy.

But I'm not too clear about the advantages of trying to limit the number
of commands registered by a given exploiter of genetlink (say TIPC or
taskstats), other than following the conventional usage of netlink.

E.g. in the taskstats code, userspace needs to GET data on a per-pid and
per-tgid basis from the kernel and supplies the specific pid or tgid. We
could have registered two commands (say GET_PID and GET_TGID), and then
the parsing of the supplied uint32 would be implicit in the command. But
we went with the model where we have only one GET command and the type of
the parameter is specified via netlink attributes.

In our case it didn't matter: since the type of data returned is very
similar and so is the parameter supplied (pid/tgid), one GET suffices. But
I'm wondering whether userspace should consciously try to limit the
commands, or whether it would be better, from a performance standpoint, to
permit a reasonably larger fan-out to happen at the genetlink command
level (for each exploiter). I guess this introduces more overhead for
in-kernel structures (the linked list of command structures that needs to
be kept around) while saving time on doing a second level of parsing
within the exploiter-defined function that services the GET command.

The small set model looks like a good compromise. Reducing the number of
commands to one is not a good idea IMHO, for reasons similar to why ioctl
type syscalls aren't encouraged: since the genetlink layer has code for
demultiplexing anyway, we might as well use it and avoid an extra level of
indirection.
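
The two levels of demultiplexing discussed above can be sketched in userspace C as follows. This is only an illustration of the "one GET command, parameter type in an attribute" model; the command and attribute numbers and all function names are hypothetical and are not taken from the taskstats patch.

```c
#include <stdint.h>

/* Hypothetical command and attribute numbers, for illustration only. */
enum { EX_CMD_GET = 1 };
enum { EX_ATTR_PID = 1, EX_ATTR_TGID = 2 };

struct ex_request {
    int      cmd;       /* first level: selected by the genetlink layer */
    int      attr_type; /* second level: which kind of id was supplied */
    uint32_t id;
};

enum ex_result { EX_ERR = -1, EX_BY_PID = 0, EX_BY_TGID = 1 };

/* Second-level parsing inside the single GET handler. */
static enum ex_result ex_get_handler(const struct ex_request *req)
{
    switch (req->attr_type) {
    case EX_ATTR_PID:
        return EX_BY_PID;   /* would look up per-pid data for req->id */
    case EX_ATTR_TGID:
        return EX_BY_TGID;  /* would look up per-tgid data for req->id */
    default:
        return EX_ERR;
    }
}

/* First-level dispatch, standing in for the registered command table. */
static enum ex_result ex_dispatch(const struct ex_request *req)
{
    if (req->cmd != EX_CMD_GET)
        return EX_ERR;
    return ex_get_handler(req);
}
```

Registering GET_PID and GET_TGID as separate commands would move the switch in ex_get_handler() into the command table, at the cost of one more registered command structure per variant.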

--Shailabh


This would both simplify the genl code and API, and help ensure 
consistency of users.

 
 
 You are talking from an SELinux perspective i take it?
 My view: If you want to have ACLs against such commands
 then it becomes easier to say can only do ADD but not DEL for example
 (We need to resolve genl_rcv_msg() check on commands to be in sync with
 SELinux as was pointed by Thomas)
 
 cheers,
 jamal
 



Re: [Patch 1/1] AF_UNIX Datagram getpeersec (with minor fix)

2006-06-19 Thread Xiaolan Zhang
James Morris [EMAIL PROTECTED] wrote on 06/18/2006 04:04:06 AM:

 On Sun, 18 Jun 2006, Catherine Zhang wrote:
 
 I'd also mention here that this is to complement the SO_PEERSEC option for
 stream sockets.
 

OK.


 There's an implementation issue, which I'm sure has been mentioned 
 previously.  This code should not be calling SELinux API functions.
 
  @@ -62,6 +70,12 @@ static __inline__ void scm_recv(struct s
   	if (test_bit(SOCK_PASSCRED, &sock->flags))
   		put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS,
   			 sizeof(scm->creds), &scm->creds);
  
  +	if (test_bit(SOCK_PASSSEC, &sock->flags)) {
  +		err = selinux_ctxid_to_string(scm->sid, &scontext,
  +					      &scontext_len);
 
 

I remember this issue being discussed, but no conclusion was reached.  The
reason that we cannot use socket_getpeersec_dgram directly is that it
takes an skb as its argument instead of a socket.  If we want to reuse the
same hook for UNIX, then we have to change the interface.  I was
debating whether I should add another hook for the UNIX domain...

Let me check whether it'll be possible to reuse socket_getpeersec_dgram
without too much disruption/complication, and I will repost.

thanks,
Catherine



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Harry Edmon



Jesper Dangaard Brouer wrote:



Harry Edmon [EMAIL PROTECTED] wrote:

I have a system with a strange network performance degradation from 
2.6.11.12 to most recent kernels including 2.6.16.20 and 
2.6.17-rc6. The system has dual single-core Xeons with
hyperthreading on.

cut

Hi Harry

Can you check which high-res timesource you are using?

In the kernel log look for:
 kernel: Using tsc for high-res timesource
 kernel: Using pmtmr for high-res timesource

I have experienced some network performance degradation when using the
pmtmr timesource on an Opteron AMD system.  It seems that the
default timesource changed between 2.6.15 and 2.6.16.


If you use pmtmr, try rebooting with the kernel option clock=tsc.

On my Opteron AMD system I can normally route 400 kpps, but with
timesource pmtmr I could only route around 83 kpps.  (I found the
timer to be the issue by using oprofile.)




We have CONFIG_HPET_TIMER=y, so we do not see these messages.




Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Chris Friesen

Andi Kleen wrote:


Incoming packets are only time stamped
when someone asks for the timestamps.


Doesn't that add scheduling latency to the timestamps?  Or is it a flag
that gets set to trigger timestamping at packet arrival?


Chris


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Jesper Dangaard Brouer


On Mon, 19 Jun 2006, Andi Kleen wrote:


If you use pmtmr try to reboot with kernel option clock=tsc.


That's dangerous advice - when the system chooses not to use
TSC it often has a reason.


Sorry, it was not general advice, just something to try out.  It really
solved my network performance issue...




On my Opteron AMD system I can normally route 400 kpps, but with
timesource pmtmr I could only route around 83 kpps.  (I found the timer
to be the issue by using oprofile.)


Unless you're using packet sniffing or any other application
that requests time stamps on a socket then the timer shouldn't
make much difference. Incoming packets are only time stamped
when someone asks for the timestamps.
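
For reference, one way a userspace application asks for receive timestamps is the SO_TIMESTAMP socket option; packet sniffers on packet sockets are another case. The sketch below only shows the request itself. The helper name enable_rx_timestamps() is made up, and reading the timestamp back from the control message on recvmsg() is left out.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Ask the kernel to attach a receive timestamp to each datagram on fd.
 * Returns 0 on success and -1 on error, like setsockopt() itself. */
static int enable_rx_timestamps(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof(on));
}
```

Whether this particular path explains the slowdown reported here is not established in this thread; the sketch only illustrates what "requesting time stamps on a socket" can mean.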


I do not know what caused the issue on my machine, but I can look into it
if you would like to know.


I do have VLAN interfaces on the machine and it seems that eth1 runs in 
PROMISC mode (eth1.xxx does not).  Could it be caused by that?


Hilsen
  Jesper Brouer

--
---
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
---


[PATCH 2/2] update sunrpc to use in-kernel sockets API - ver2

2006-06-19 Thread Sridhar Samudrala
This patch updates sunrpc to use in-kernel sockets API.

Signed-off-by: Sridhar Samudrala [EMAIL PROTECTED]
Acked-by: James Morris [EMAIL PROTECTED]

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -388,7 +388,7 @@ svc_sendto(struct svc_rqst *rqstp, struc
 	/* send head */
 	if (slen == xdr->head[0].iov_len)
 		flags = 0;
-	len = sock->ops->sendpage(sock, rqstp->rq_respages[0], 0, xdr->head[0].iov_len, flags);
+	len = kernel_sendpage(sock, rqstp->rq_respages[0], 0, xdr->head[0].iov_len, flags);
 	if (len != xdr->head[0].iov_len)
 		goto out;
 	slen -= xdr->head[0].iov_len;
@@ -400,7 +400,7 @@ svc_sendto(struct svc_rqst *rqstp, struc
 	while (pglen > 0) {
 		if (slen == size)
 			flags = 0;
-		result = sock->ops->sendpage(sock, *ppage, base, size, flags);
+		result = kernel_sendpage(sock, *ppage, base, size, flags);
 		if (result > 0)
 			len += result;
 		if (result != size)
@@ -413,7 +413,7 @@ svc_sendto(struct svc_rqst *rqstp, struc
 	}
 	/* send tail */
 	if (xdr->tail[0].iov_len) {
-		result = sock->ops->sendpage(sock, rqstp->rq_respages[rqstp->rq_restailpage], 
+		result = kernel_sendpage(sock, rqstp->rq_respages[rqstp->rq_restailpage],
 					     ((unsigned long)xdr->tail[0].iov_base)& (PAGE_SIZE-1),
 					     xdr->tail[0].iov_len, 0);
 
@@ -434,13 +434,10 @@ out:
 static int
 svc_recv_available(struct svc_sock *svsk)
 {
-	mm_segment_t	oldfs;
 	struct socket	*sock = svsk->sk_sock;
 	int		avail, err;
 
-	oldfs = get_fs(); set_fs(KERNEL_DS);
-	err = sock->ops->ioctl(sock, TIOCINQ, (unsigned long) &avail);
-	set_fs(oldfs);
+	err = kernel_sock_ioctl(sock, TIOCINQ, (unsigned long) &avail);
 
 	return (err >= 0)? avail : err;
 }
@@ -472,7 +469,7 @@ svc_recvfrom(struct svc_rqst *rqstp, str
 	 * at accept time. FIXME
 	 */
 	alen = sizeof(rqstp->rq_addr);
-	sock->ops->getname(sock, (struct sockaddr *)&rqstp->rq_addr, &alen, 1);
+	kernel_getpeername(sock, (struct sockaddr *)&rqstp->rq_addr, &alen);
 
 	dprintk("svc: socket %p recvfrom(%p, %Zu) = %d\n",
 		rqstp->rq_sock, iov[0].iov_base, iov[0].iov_len, len);
@@ -758,7 +755,6 @@ svc_tcp_accept(struct svc_sock *svsk)
 	struct svc_serv	*serv = svsk->sk_server;
 	struct socket	*sock = svsk->sk_sock;
 	struct socket	*newsock;
-	const struct proto_ops *ops;
 	struct svc_sock	*newsvsk;
 	int		err, slen;
 
@@ -766,29 +762,23 @@ svc_tcp_accept(struct svc_sock *svsk)
 	if (!sock)
 		return;
 
-	err = sock_create_lite(PF_INET, SOCK_STREAM, IPPROTO_TCP, &newsock);
-	if (err) {
+	clear_bit(SK_CONN, &svsk->sk_flags);
+	err = kernel_accept(sock, &newsock, O_NONBLOCK);
+	if (err < 0) {
 		if (err == -ENOMEM)
 			printk(KERN_WARNING "%s: no more sockets!\n",
 			       serv->sv_name);
-		return;
-	}
-
-	dprintk("svc: tcp_accept %p allocated\n", newsock);
-	newsock->ops = ops = sock->ops;
-
-	clear_bit(SK_CONN, &svsk->sk_flags);
-	if ((err = ops->accept(sock, newsock, O_NONBLOCK)) < 0) {
-		if (err != -EAGAIN && net_ratelimit())
+		else if (err != -EAGAIN && net_ratelimit())
 			printk(KERN_WARNING "%s: accept failed (err %d)!\n",
 				   serv->sv_name, -err);
-		goto failed;	/* aborted connection or whatever */
+		return;
 	}
+
 	set_bit(SK_CONN, &svsk->sk_flags);
 	svc_sock_enqueue(svsk);
 
 	slen = sizeof(sin);
-	err = ops->getname(newsock, (struct sockaddr *) &sin, &slen, 1);
+	err = kernel_getpeername(newsock, (struct sockaddr *) &sin, &slen);
 	if (err < 0) {
 		if (net_ratelimit())
 			printk(KERN_WARNING "%s: peername failed (err %d)!\n",
@@ -1407,14 +1397,14 @@ svc_create_socket(struct svc_serv *serv,
 	if (sin != NULL) {
 		if (type == SOCK_STREAM)
 			sock->sk->sk_reuse = 1; /* allow address reuse */
-		error = sock->ops->bind(sock, (struct sockaddr *) sin,
+		error = kernel_bind(sock, (struct sockaddr *) sin,
 						sizeof(*sin));
 		if (error < 0)
 			goto bummer;
 	}
 
 	if (protocol == IPPROTO_TCP) {
-		if ((error = sock->ops->listen(sock, 64)) < 0)
+		if ((error = kernel_listen(sock, 64)) < 0)
 			goto bummer;
 	}
 
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
--- a/net/sunrpc/xprtsock.c
+++ 

[PATCH 1/2] in-kernel sockets API - ver2

2006-06-19 Thread Sridhar Samudrala
This patch implements wrapper functions that provide a convenient way to
access the sockets API for in-kernel users like sunrpc, cifs & ocfs2 etc.
and any future users.

The only change from the version I submitted last week is the renaming of
kernel_ioctl to kernel_sock_ioctl.

I left the exports to use EXPORT_SYMBOL() to match the existing
interfaces sock_create_kern(), kernel_sendmsg(), kernel_recvmsg() etc.
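
The general shape of these wrappers (a thin call through the socket's ops table, with a fallback where an operation is optional) can be mocked in userspace roughly like this. All mock_* and ex_* names are invented stand-ins for struct socket, struct proto_ops and the kernel_*() functions, and the fallback simply returns an error here rather than modelling the real sock_no_sendpage().

```c
#include <stddef.h>

struct mock_socket;

/* Simplified stand-in for the per-protocol operations table. */
struct mock_ops {
    int (*listen)(struct mock_socket *sock, int backlog);
    int (*sendpage)(struct mock_socket *sock, size_t size); /* may be NULL */
};

struct mock_socket {
    const struct mock_ops *ops;
};

/* Example protocol operations; they just echo their argument. */
static int ex_listen(struct mock_socket *sock, int backlog)
{
    (void)sock;
    return backlog;
}

static int ex_sendpage(struct mock_socket *sock, size_t size)
{
    (void)sock;
    return (int)size;
}

static const struct mock_ops ops_full    = { ex_listen, ex_sendpage };
static const struct mock_ops ops_minimal = { ex_listen, NULL };

/* Stand-in fallback for protocols without their own sendpage. */
static int mock_no_sendpage(struct mock_socket *sock, size_t size)
{
    (void)sock;
    (void)size;
    return -1;
}

/* Callers use the wrappers instead of touching sock->ops directly. */
static int mock_kernel_listen(struct mock_socket *sock, int backlog)
{
    return sock->ops->listen(sock, backlog);
}

static int mock_kernel_sendpage(struct mock_socket *sock, size_t size)
{
    if (sock->ops->sendpage)
        return sock->ops->sendpage(sock, size);
    return mock_no_sendpage(sock, size);
}
```

The benefit for callers such as sunrpc is that the NULL check and any address-space juggling sit in one place instead of being repeated at every call site.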

Thanks
Sridhar

Signed-off-by: Sridhar Samudrala [EMAIL PROTECTED]
Acked-by: James Morris [EMAIL PROTECTED]

diff --git a/include/linux/net.h b/include/linux/net.h
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -208,6 +208,25 @@ extern int  kernel_recvmsg(struct
struct kvec *vec, size_t num,
size_t len, int flags);
 
+extern int kernel_bind(struct socket *sock, struct sockaddr *addr,
+  int addrlen);
+extern int kernel_listen(struct socket *sock, int backlog);
+extern int kernel_accept(struct socket *sock, struct socket **newsock,
+int flags);
+extern int kernel_connect(struct socket *sock, struct sockaddr *addr,
+ int addrlen, int flags);
+extern int kernel_getsockname(struct socket *sock, struct sockaddr *addr,
+ int *addrlen);
+extern int kernel_getpeername(struct socket *sock, struct sockaddr *addr,
+ int *addrlen);
+extern int kernel_getsockopt(struct socket *sock, int level, int optname,
+char *optval, int *optlen);
+extern int kernel_setsockopt(struct socket *sock, int level, int optname,
+char *optval, int optlen);
+extern int kernel_sendpage(struct socket *sock, struct page *page, int offset,
+  size_t size, int flags);
+extern int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg);
+
 #ifndef CONFIG_SMP
 #define SOCKOPS_WRAPPED(name) name
 #define SOCKOPS_WRAP(name, fam)
diff --git a/net/socket.c b/net/socket.c
--- a/net/socket.c
+++ b/net/socket.c
@@ -2160,6 +2160,109 @@ static long compat_sock_ioctl(struct fil
 }
 #endif
 
+int kernel_bind(struct socket *sock, struct sockaddr *addr, int addrlen)
+{
+	return sock->ops->bind(sock, addr, addrlen);
+}
+
+int kernel_listen(struct socket *sock, int backlog)
+{
+	return sock->ops->listen(sock, backlog);
+}
+
+int kernel_accept(struct socket *sock, struct socket **newsock, int flags)
+{
+	struct sock *sk = sock->sk;
+	int err;
+
+	err = sock_create_lite(sk->sk_family, sk->sk_type, sk->sk_protocol,
+			       newsock);
+	if (err < 0)
+		goto done;
+
+	err = sock->ops->accept(sock, *newsock, flags);
+	if (err < 0) {
+		sock_release(*newsock);
+		goto done;
+	}
+
+	(*newsock)->ops = sock->ops;
+
+done:
+	return err;
+}
+
+int kernel_connect(struct socket *sock, struct sockaddr *addr, int addrlen,
+		   int flags)
+{
+	return sock->ops->connect(sock, addr, addrlen, flags);
+}
+
+int kernel_getsockname(struct socket *sock, struct sockaddr *addr,
+			 int *addrlen)
+{
+	return sock->ops->getname(sock, addr, addrlen, 0);
+}
+
+int kernel_getpeername(struct socket *sock, struct sockaddr *addr,
+			 int *addrlen)
+{
+	return sock->ops->getname(sock, addr, addrlen, 1);
+}
+
+int kernel_getsockopt(struct socket *sock, int level, int optname,
+			char *optval, int *optlen)
+{
+	mm_segment_t oldfs = get_fs();
+	int err;
+
+	set_fs(KERNEL_DS);
+	if (level == SOL_SOCKET)
+		err = sock_getsockopt(sock, level, optname, optval, optlen);
+	else
+		err = sock->ops->getsockopt(sock, level, optname, optval,
+					    optlen);
+	set_fs(oldfs);
+	return err;
+}
+
+int kernel_setsockopt(struct socket *sock, int level, int optname,
+			char *optval, int optlen)
+{
+	mm_segment_t oldfs = get_fs();
+	int err;
+
+	set_fs(KERNEL_DS);
+	if (level == SOL_SOCKET)
+		err = sock_setsockopt(sock, level, optname, optval, optlen);
+	else
+		err = sock->ops->setsockopt(sock, level, optname, optval,
+					    optlen);
+	set_fs(oldfs);
+	return err;
+}
+
+int kernel_sendpage(struct socket *sock, struct page *page, int offset,
+		    size_t size, int flags)
+{
+	if (sock->ops->sendpage)
+		return sock->ops->sendpage(sock, page, offset, size, flags);
+
+	return sock_no_sendpage(sock, page, offset, size, flags);
+}
+
+int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg)
+{
+	mm_segment_t oldfs = get_fs();
+	int err;
+
+	set_fs(KERNEL_DS);
+	err = sock->ops->ioctl(sock, cmd, arg);
+   

Re: [PATCH 2/2] NET: Accurate packet scheduling for ATM/ADSL (userspace)

2006-06-19 Thread Jesper Dangaard Brouer



On Thu, 15 Jun 2006, jamal wrote:


On Thu, 2006-15-06 at 10:47 +1000, Russell Stuart wrote:

On Wed, 2006-06-14 at 11:57 +0100, Alan Cox wrote:

The other problem I see with this code is it is very tightly tied to ATM
cell sizes, not to solving the generic question of packetisation.


Others have made this point also.  I can't speak for Jesper,
but I did consider making it generic.


I also considered making it generic, but chose to make my patch as
non-intrusive as possible to the kernel (and to handle as much as
possible in userspace).


Actually I do think that the kernel patch part is very generic.
The patch simply allows us to align the rate table/array.

With the kernel patch in place, we can work on the userspace TC program to 
support more and more types of exotic link layer modeling.




The issue was that
doing so would add more code, but I don't personally know
of any real world situation that would use the generic
solution.  I didn't fancy the thought of arguing on these
lists for code that no one would actually use.


;-)



If someone could put up their hand and say "Hey, I need
this", then expanding the patch to accommodate them would
be a pleasure.  I like generic code too.



It is probably doable by just looking at netdevice-type and figuring
the link layer technology. Totally in user space and building the
compensated for tables there before telling the kernel (advantage is no
kernel changes and therefore it would work with older kernels as well).


I think you have got the setup all wrong.

The linux middlebox/router has two ethernet interfaces, one of the 
ethernet interfaces is connected to the ADSL modem.  Thus, the linux 
ethernet card cannot determine that it is connected to an ADSL line.



The patch is the solution to the classical problem people
have when trying to configure traffic control on an ADSL link:


Q: The packet scheduling does not work all the time?
A: Try to decrease the bandwidth.

The issue here is that ATM does not have a fixed overhead (due to alignment
and padding).  This means that a fixed reduction of the bandwidth is not
the solution.  We could reduce the bandwidth to the worst-case overhead,
which is 62%, but I do not think that is a good solution...
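
To illustrate why the overhead is not fixed: each ATM cell is 53 bytes on the wire and carries 48 bytes of payload, and the last cell of a packet is padded. The sketch below computes the wire size for a given packet length. It is a simplified illustration, not code from the patch; the function name atm_wire_size() is made up, and the per-packet encapsulation overhead is left as a parameter because it depends on the link setup.

```c
#include <stddef.h>

#define ATM_CELL_SIZE    53 /* bytes on the wire per cell */
#define ATM_CELL_PAYLOAD 48 /* bytes of payload per cell */

/* Bytes on the wire for a packet of `len` bytes plus `overhead` bytes of
 * per-packet encapsulation; the last cell is padded to full size. */
static size_t atm_wire_size(size_t len, size_t overhead)
{
    size_t payload = len + overhead;
    size_t cells = (payload + ATM_CELL_PAYLOAD - 1) / ATM_CELL_PAYLOAD;

    return cells * ATM_CELL_SIZE;
}
```

A 48-byte payload fits in one cell (53 bytes on the wire), while a 49-byte payload needs two cells (106 bytes), so the relative overhead depends strongly on the packet size distribution.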


With the patch, you can now simply configure HTB to use the rate that was 
specified by the ISP.


Please read chapter 6 (Achieving Queue Control), pages 55-65, where I
demonstrate that the naive approach of reducing bandwidth does not work
when the packet distribution changes on the link.


 http://www.adsl-optimizer.dk/thesis/

Cheers,
  Jesper Brouer

--
---
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
---


Re: [PATCH, RFT] bcm43xx: Busting the 1G limit

2006-06-19 Thread Michael Buesch
On Monday 19 June 2006 17:23, Daniel Gryniewicz wrote:
 On Sat, 2006-06-17 at 19:28 +0200, Michael Buesch wrote:
  Hi,
  
  This patch adds full 32-bit and 64-bit DMA support
  to the bcm43xx driver. Well, it _should_ do this. I can
  not test it, as I don't have a machine to trigger the 1G
  limit.
  The 1G limit should be exploitable on an AMD64 machine
  with more than 1G RAM.
  
  Please test and report, if it works or not. In the
  case of works not, please provide full dmesg log.
  
  Note that I am not sure which cards actually support
  full 32-bit or even 64-bit mode. Older cards might still
  only support 30-bit DMA.
 
 Hi.
 
 I tried this on both 2.6.17-rc6 and on wireless-dev, and got pretty much
 the same panic on both (modulo locking).  My box is a turion with 2 GB
 of ram and a BCM4318.  Here's the panic from wireless-dev:
 
 Unable to handle kernel NULL pointer dereference at 0020
 RIP:
 88104f24{:bcm43xx:bcm43xx_dma_handle_xmitstatus+436}

I am still not absolutely sure where this oops comes from.
Could you remove at least 1G of your RAM and retry?

-- 
Greetings Michael.


Re: [RFT] pcnet32 NAPI changes

2006-06-19 Thread Jon Mason
On Fri, Jun 16, 2006 at 12:11:54PM -0700, Don Fry wrote:
 This patch is a collection of changes to pcnet32 which does the
 following: 
 
 - Fix section mismatch warning.
 - fix set_ringparam to correctly handle memory allocation failures
 - fix off-by-one in get_ringparam.
 - cleanup at end of loopback_test when not up.
 - Add NAPI to driver, fixing set_ringparam and loopback_test to work
   correctly with poll.
 - for multicast, do not reset the chip unless cannot enter suspend mode
   to avoid race with poll.
 
 The set_ringparam code is larger than I would prefer, but it will not
 leave null pointers around for the code to stumble over when memory
 allocation fails.  If anyone has a better idea, please let me know.
 
 Some complexity could be avoided by allocating memory for the maximum
 number of tx and rx buffers at probe time.  Requiring 14k for the tx
 ring and arrays, and another 14k for rx; instead of about 10k total for
 the default sizes.
 
 It is NAPI only, unlike Len Sorensen's version which allows for compile
 time selection.  Some drivers are NAPI only, others have compile
 options.  Which is preferred?

I believe it is preferred to be a compile option for non-gigabit
drivers, given that it will be eating a lot of cycles for infrequent
packets (especially for the 10Mb).  I believe there was a thread about
this last year when e100 was having NAPI problems.

A general nit.  There are a lot of magic numbers in the code, most
existing prior to this patch.  The driver would benefit from a little
clean-up.

Also nothing to do with this patch, but I noticed it when the code was
moved.  A comment about why the following is necessary might be nice:
lp->rx_ring[i].buf_length = le16_to_cpu(2 - PKT_BUF_SZ);
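
One plausible reading of that expression, not confirmed anywhere in this thread: the descriptor's buffer length field holds the two's complement (negative) of the usable buffer size, and the 2 accounts for bytes reserved at the start of the buffer for alignment. The userspace sketch below shows the encoding under those assumptions; the EX_* constants and function names are made up, and 1544 is only an assumed value for PKT_BUF_SZ.

```c
#include <stdint.h>

#define EX_PKT_BUF_SZ 1544 /* assumed value of PKT_BUF_SZ */
#define EX_RX_OFFSET  2    /* assumed: bytes reserved at the buffer start */

/* Encode a buffer length as a 16-bit two's complement negative value. */
static uint16_t neg_buf_length(unsigned int len)
{
    return (uint16_t)(0u - len);
}

/* Recover the positive length from the encoded field. */
static unsigned int buf_length_from_field(uint16_t field)
{
    return (uint16_t)(0u - field);
}
```

Under these assumptions, neg_buf_length(EX_PKT_BUF_SZ - EX_RX_OFFSET) gives the same 16-bit value as (2 - PKT_BUF_SZ) truncated to 16 bits, which is the kind of detail a named constant or a one-line comment in the driver would make obvious.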

Thanks,
Jon

 
 I have tested these changes with a 79C971, 973, 976, and 978 on a ppc64
 machine, and 970A, 972, 973, 975, and 976 on an x86 machine.
 
 I have not tested these changes with VMware or Xen.
 
 
 
 --- linux-2.6.17-rc6/drivers/net/orig.pcnet32.c   2006-06-15 
 11:49:39.0 -0700
 +++ linux-2.6.17-rc6/drivers/net/pcnet32.c2006-06-16 11:30:45.0 
 -0700
 @@ -22,8 +22,8 @@
   */
  
  #define DRV_NAME "pcnet32"
 -#define DRV_VERSION  "1.32"
 -#define DRV_RELDATE  "18.Mar.2006"
 +#define DRV_VERSION  "1.33-NAPI"
 +#define DRV_RELDATE  "16.Jun.2006"
  #define PFX  DRV_NAME ": "
  
  static const char *const version =
 @@ -277,13 +277,12 @@ struct pcnet32_private {
   u32 phymask;
  };
  
 -static void pcnet32_probe_vlbus(void);
  static int pcnet32_probe_pci(struct pci_dev *, const struct pci_device_id *);
  static int pcnet32_probe1(unsigned long, int, struct pci_dev *);
  static int pcnet32_open(struct net_device *);
  static int pcnet32_init_ring(struct net_device *);
  static int pcnet32_start_xmit(struct sk_buff *, struct net_device *);
 -static int pcnet32_rx(struct net_device *);
 +static int pcnet32_poll(struct net_device *dev, int *budget);
  static void pcnet32_tx_timeout(struct net_device *dev);
  static irqreturn_t pcnet32_interrupt(int, void *, struct pt_regs *);
  static int pcnet32_close(struct net_device *);
 @@ -425,6 +424,215 @@ static struct pcnet32_access pcnet32_dwi
   .reset = pcnet32_dwio_reset
  };
  
 +static void pcnet32_netif_stop(struct net_device *dev)
 +{
 + dev-trans_start = jiffies;
 + netif_poll_disable(dev);
 + netif_tx_disable(dev);
 +}
 +
 +static void pcnet32_netif_start(struct net_device *dev)
 +{
 + netif_wake_queue(dev);
 + netif_poll_enable(dev);
 +}
 +
 +/*
 + * Allocate space for the new sized tx ring.
 + * Free old resources
 + * Save new resources.
 + * Any failure keeps old resources.
 + * Must be called with lp-lock held.
 + */
 +static void pcnet32_realloc_tx_ring(struct net_device *dev,
 + struct pcnet32_private *lp,
 + unsigned int size)
 +{
 + dma_addr_t new_ring_dma_addr;
 + dma_addr_t *new_dma_addr_list;
 + struct pcnet32_tx_head *new_tx_ring;
 + struct sk_buff **new_skb_list;
 +
 + pcnet32_purge_tx_ring(dev);
 +
 + new_tx_ring = pci_alloc_consistent(lp->pci_dev,
 +sizeof(struct pcnet32_tx_head) *
 +(1 << size),
 +&new_ring_dma_addr);
 + if (new_tx_ring == NULL) {
 + if (pcnet32_debug & NETIF_MSG_DRV)
 + printk("\n" KERN_ERR PFX
 +"%s: Consistent memory allocation failed.\n",
 +dev->name);
 + return;
 + }
 + memset(new_tx_ring, 0, sizeof(struct pcnet32_tx_head) * (1 << size));
 +
 + new_dma_addr_list = kcalloc(sizeof(dma_addr_t), (1 << size), GFP_ATOMIC);
 + if (!new_dma_addr_list) {
 + if (pcnet32_debug & NETIF_MSG_DRV)
 + printk("\n" KERN_ERR PFX
 +  

Re: [RFT] pcnet32 NAPI changes

2006-06-19 Thread Lennart Sorensen
On Mon, Jun 19, 2006 at 03:41:40PM -0500, Jon Mason wrote:
 I believe it is preferred to be a compile option for non-gigabit
 drivers, given that it will be eating a lot of cycles for infrequent
 packets (especially for the 10Mb).  I believe there was a thread about
 this last year when e100 was having NAPI problems.

How does NAPI eat cycles?  It goes back to interrupt mode when the queue
is empty, and only on RX interrupt does it turn on polling again.
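
The behaviour described above can be sketched as a small userspace model. This is only an illustration, not the pcnet32 code: the struct and the model_* function names are made up for the example. An RX interrupt masks further RX interrupts and schedules polling; the poll routine handles up to a budget of packets and unmasks the interrupt once the ring is empty.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a NAPI-style receive path; not kernel code. */
struct nic_model {
	int  rx_pending;   /* packets waiting in the RX ring */
	bool irq_enabled;  /* RX interrupt unmasked? */
	bool poll_sched;   /* device currently on the poll list? */
};

/* RX interrupt: mask further RX interrupts and switch to polling. */
static void model_rx_interrupt(struct nic_model *nic)
{
	if (!nic->irq_enabled)
		return;            /* already polling, nothing to do */
	nic->irq_enabled = false;
	nic->poll_sched = true;
}

/*
 * Poll: process at most `budget` packets.  If the ring is drained,
 * leave polling mode and unmask the interrupt again.
 * Returns the number of packets processed.
 */
static int model_poll(struct nic_model *nic, int budget)
{
	int done = 0;

	assert(nic->poll_sched);
	while (nic->rx_pending > 0 && done < budget) {
		nic->rx_pending--;
		done++;
	}
	if (nic->rx_pending == 0) {
		nic->poll_sched = false;
		nic->irq_enabled = true;
	}
	return done;
}
```

In this model a device with infrequent packets takes one interrupt and one short poll per packet, and nothing polls while the ring is empty.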

It is certainly possible that there are bugs in a NAPI conversion, which
I guess could be a reason to have the option to stick with the old
method, although then again not having the option ensures the bugs get
found sooner.

 A general nit.  There are a lot of magic numbers in the code, most
 existing prior to this patch.  The driver would benefit from a little
 clean-up.
 
 Also nothing to do with this patch, but I noticed it when the code was
 moved.  A comment about why the following is necessary might be nice:
 lp->rx_ring[i].buf_length = le16_to_cpu(2 - PKT_BUF_SZ);

I suspect many drivers are in need of some cleanup.
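
On the buf_length line quoted above: as far as I know, PCnet-style descriptors take the buffer byte count as a two's-complement negative number, and the 2 presumably accounts for two bytes reserved at the start of the buffer for alignment. A minimal sketch of that encoding follows; the constant and both function names are assumptions for illustration, not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PKT_BUF_SZ 1544	/* assumed buffer size, for illustration */

/* Encode a buffer length as a 16-bit two's-complement negative count. */
static uint16_t encode_buf_length(int usable_len)
{
	assert(usable_len > 0 && usable_len < 0x10000);
	return (uint16_t)(-usable_len);
}

/* Recover the positive length from the encoded (non-zero) field. */
static int decode_buf_length(uint16_t field)
{
	assert(field != 0);
	return 0x10000 - (int)field;
}
```

With these helpers, encode_buf_length(MODEL_PKT_BUF_SZ - 2) yields the same 16-bit pattern as the expression 2 - PKT_BUF_SZ truncated to 16 bits.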

Len Sorensen
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Andi Kleen
On Monday 19 June 2006 19:34, Chris Friesen wrote:
 Andi Kleen wrote:
  Incoming packets are only time stamped
  when someone asks for the timestamps.

 Doesn't that add scheduling latency to the timestamps?  Or is it a flag
 that gets set to trigger timestamping at packet arrival?

It's a flag (or, more precisely, a global counter).
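
A minimal userspace sketch of the counter idea, with made-up names (this is not the kernel's code): consumers that want timestamps bump a global counter, and the receive path stamps a packet at arrival only while that counter is non-zero. The stamp is therefore taken when the packet arrives, not when someone later asks for it.

```c
#include <assert.h>

/* Hypothetical model; names are illustrative, not the kernel's. */
static int stamp_users;		/* how many consumers want timestamps */

struct pkt_model {
	long stamp;		/* 0 means "not stamped" */
};

static void stamp_enable(void)
{
	stamp_users++;
}

static void stamp_disable(void)
{
	assert(stamp_users > 0);
	stamp_users--;
}

/* Called at packet arrival: stamp only if somebody asked for it. */
static void pkt_arrive(struct pkt_model *p, long now)
{
	p->stamp = stamp_users > 0 ? now : 0;
}
```

A real implementation would need the counter to be atomic; the plain int here keeps the sketch short.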

-Andi


Re: [Bugme-new] [Bug 6681] New: TC crash and rule freeze

2006-06-19 Thread Andrew Morton
[EMAIL PROTECTED] wrote:

 http://bugzilla.kernel.org/show_bug.cgi?id=6681
 
Summary: TC crash and rule freeze
 Kernel Version: 2.6.16-gentoo-r6
 Status: NEW
   Severity: normal
  Owner: [EMAIL PROTECTED]
  Submitter: [EMAIL PROTECTED]
 
 
 Most recent kernel where this bug did not occur:
 2.6.16-gentoo-r6
 Distribution:
 Gentoo
 Hardware Environment:
 00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
 00:02.0 VGA compatible controller: Intel Corporation 82865G Integrated Graphics Controller (rev 02)
 00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
 00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
 00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
 00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
 00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
 00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
 00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
 01:09.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 05)
 01:0a.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 05)
 01:0b.0 SCSI storage controller: Adaptec ASC-39320 U320 (rev 03)
 01:0b.1 SCSI storage controller: Adaptec ASC-39320 U320 (rev 03)
 Software Environment:
 sys-apps/iproute2-2.6.16.20060323
 Problem Description:
 #cat dmesg
 Unable to handle kernel NULL pointer dereference at virtual address 000c
 printing eip:
 c0217c26
 *pde = 
 Oops:  [#1]
 SMP
 Modules linked in: sch_sfq cls_u32 sch_red sch_htb iptable_filter ip_tables
 x_tables uhci_hcd ehci_hcd usbcore
 CPU:0
 EIP:0060:[c0217c26]Not tainted VLI
 EFLAGS: 00010287   (2.6.16-gentoo-r6 #2)
 EIP is at __rb_erase_color+0x94/0x1ad
 eax: f55f4954   ebx: f724bb54   ecx: f724bb54   edx: 
 esi:    edi: f7679468   ebp: f7679468   esp: f676bbbc
 ds: 007b   es: 007b   ss: 0068
 Process tc (pid: 24294, threadinfo=f676a000 task=f7d7da90)
 Stack: 0f724bb54 f7679468  e6bbf154  c0217e36  
 f724bb54
 f7679468 e6bbf000 e6bbf06c f7679000 f7679080 f8903366 e6bbf154 f7679468
 0004 00d0  00010006 000103c9 f7679000 c0311ffa f7679000
 Call Trace:
 [c0217e36] rb_erase+0xf7/0x12d
 [f8903366] htb_destroy_class+0xec/0x15d [sch_htb]
 [c0311ffa] tc_ctl_tclass+0x1b1/0x288
 [c030d69e] rtnetlink_dump_ifinfo+0x6c/0x89
 [c030dcef] rtnetlink_rcv_msg+0x171/0x233
 [c031759f] netlink_dump+0x94/0x1e2
 [c030db7e] rtnetlink_rcv_msg+0x0/0x233
 [c0317a45] netlink_rcv_skb+0x46/0xad
 [c030db7e] rtnetlink_rcv_msg+0x0/0x233
 [c0317aec] netlink_run_queue+0x40/0xd0
 [c030db7e] rtnetlink_rcv_msg+0x0/0x233
 [c030db5e] rtnetlink_rcv+0x2e/0x4e
 [c030db7e] rtnetlink_rcv_msg+0x0/0x233
 [c031735c] netlink_data_ready+0x60/0x62
 [c03164ed] netlink_sendskb+0x32/0x61
 [c031704d] netlink_sendmsg+0x291/0x304
 [c02f9b0d] sock_sendmsg+0xeb/0x10d
 [c02f9b0d] sock_sendmsg+0xeb/0x10d
 [c0131fa6] autoremove_wake_function+0x0/0x57
 [c021a084] copy_from_user+0x46/0x7e
 [c0300ae4] verify_iovec+0x44/0x9e
 [c02fb525] sys_sendmsg+0x15a/0x272
 [c0140ca6] filemap_nopage+0x30d/0x38a
 [c0152e83] page_add_file_rmap+0x2a/0x2e
 [c014e014] do_no_page+0x219/0x278
 [c021a084] copy_from_user+0x46/0x7e
 [c02fbaf7] sys_socketcall+0x28d/0x294
 [c0102ca7] sysenter_past_esp+0x54/0x75
 Code: 04 01 00 00 00 c7 43 04 00 00 00 00 89 7c 24 04 89 1c 24 e8 6b fe ff ff 
 8b
 53 0c eb 8e 8b 53 08 8b 72 04 85 f6 0f 84 82 00 00 00  8b 4a 0c 85 c9 74 0a 
 83
 79 04 01 0f 85 00 01 00 00 8b 72 08 85
 

It crashed in net/sched/somewhere.


Re: [Bugme-new] [Bug 6682] New: BUG: soft lockup detected on CPU#0! / ksoftirqd takse 100% CPU

2006-06-19 Thread Andrew Morton
[EMAIL PROTECTED] wrote:

 http://bugzilla.kernel.org/show_bug.cgi?id=6682
 
Summary: BUG: soft lockup detected on CPU#0! / ksoftirqd takse
 100% CPU
 Kernel Version: 2.6.15.6
 Status: NEW
   Severity: normal
  Owner: [EMAIL PROTECTED]
  Submitter: [EMAIL PROTECTED]
 
 
 Most recent kernel where this bug did not occur: (unknown)
 Distribution: Gentoo
 Hardware Environment: 2x Xeon 2.66, 1 GB RAM, NICS: 2 x e1000, and one double
 port e100. Based on Intel E7501 architecture (2U rack Intel chassis).
 Software Environment: quagga 0.98.6
 Problem Description: ksoftirqd/0 takes 100% of CPU. Further investigation
 shows no sign of a network flood or anything similar (and 2 of the 3 NICs
 are e1000 with NAPI). Occasionally there are "BUG: soft lockup detected on
 CPU#0!" messages.
 
 
 Steps to reproduce:
 
 There is no simple way to reproduce. I think that everything started when we
 attached a second provider with BGP support. We are using quagga, which injects
 about 186 000 routes into the kernel. After running for a while (at least a few
 hours, sometimes a day) we get 100% usage on ksoftirqd/0 and the following
 messages in the logs:
 
 BUG: soft lockup detected on CPU#0!
 
 Pid: 6506, comm:zebra
 EIP: 0060:[c027f6fd] CPU: 0
 EIP is at _spin_lock+0x7/0xf
  EFLAGS: 0286Not tainted  (2.6.15.6)
 EAX: f6203180 EBX: e6fbf000 ECX:  EDX: f6bec000
 ESI: f6203000 EDI: eddb4b80 EBP: fff4 DS: 007b ES: 007b
 CR0: 8005003b CR2: aca6dff0 CR3: 361ad000 CR4: 06d0
  [c02396f9] dev_queue_xmit+0xe0/0x203
  [c0250de8] ip_output+0x1e1/0x237
  [c024f3f5] ip_forward+0x181/0x1df
  [c024e21a] ip_rcv+0x40c/0x485
  [c0239bd0] netif_receive_skb+0x12f/0x165
  [f885aa4c] e1000_clean_rx_irq+0x389/0x410 [e1000]
  [f885a1ca] e1000_clean+0x94/0x12f [e1000]
  [c0239d5a] net_rx_action+0x69/0xf0
  [c011a305] __do_softirq+0x55/0xbd
  [c011a39a] do_softirq+0x2d/0x31
  [c011a3f8] local_bh_enable+0x5a/0x65
  [c024a0a1] rt_run_flush+0x5f/0x80
  [c027623f] fn_hash_insert+0x352/0x39f
  [c027364c] inet_rtm_newroute+0x57/0x62
  [c02413ed] rtnetlink_rcv_msg+0x1a8/0x1cb
  [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
  [c0247c1e] netlink_rcv_skb+0x3a/0x8b
  [c0247cb1] netlink_run_queue+0x42/0xc3
  [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
  [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
  [c0241227] rtnetlink_rcv+0x22/0x40
  [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
  [c024764c] netlink_data_ready+0x17/0x54
  [c0246a99] netlink_sendskb+0x1f/0x39
  [c0247449] netlink_sendmsg+0x27b/0x28c
  [c0231467] sock_sendmsg+0xce/0xe9
  [c0112b36] __wake_up+0x27/0x3b
  [c01a6216] copy_to_user+0x38/0x42
  [c01a625a] copy_from_user+0x3a/0x60
  [c01a625a] copy_from_user+0x3a/0x60
  [c0126be2] autoremove_wake_function+0x0/0x3a
  [c0236bcd] verify_iovec+0x49/0x7f
  [c02327f2] sys_sendmsg+0x152/0x1a8
  [c0147a62] do_sync_read+0xb8/0xeb
  [c01a6216] copy_to_user+0x38/0x42
  [c0126be2] autoremove_wake_function+0x0/0x3a
  [c0122b7a] getrusage+0x34/0x43
  [c0168504] inotify_dentry_parent_queue_event+0x29/0x7c
  [c01a625a] copy_from_user+0x3a/0x60
  [c0232b6b] sys_socketcall+0x167/0x180
  [c0102433] sysenter_past_esp+0x54/0x75
 
 BUG: soft lockup detected on CPU#0!
 
 Pid: 6506, comm:zebra
 EIP: 0060:[f8952052] CPU: 0
 EIP is at u32_classify+0x52/0x170 [cls_u32]
  EFLAGS: 0206Not tainted  (2.6.15.6)
 EAX: e2fbd020 EBX: f48649c0 ECX: 0010 EDX: 29b09d5a
 ESI: f48649ec EDI: 0001 EBP: e2fbd020 DS: 007b ES: 007b
 CR0: 8005003b CR2: 08154004 CR3: 361ad000 CR4: 06d0
  [f88462fa] ipt_do_table+0x2de/0x2fd [ip_tables]
  [f883b523] ip_nat_fn+0x177/0x185 [iptable_nat]
  [f88e159f] ip_refrag+0x23/0x5f [ip_conntrack]
  [c0244d82] tc_classify+0x2c/0x3f
  [f895514b] htb_classify+0x14b/0x1dd [sch_htb]
  [f8955638] htb_enqueue+0x1d/0x13a [sch_htb]
  [c02396fd] dev_queue_xmit+0xe4/0x203
  [c0250de8] ip_output+0x1e1/0x237
  [c024f3f5] ip_forward+0x181/0x1df
  [c024e21a] ip_rcv+0x40c/0x485
  [c0239bd0] netif_receive_skb+0x12f/0x165
  [f885aa4c] e1000_clean_rx_irq+0x389/0x410 [e1000]
  [f885a1ca] e1000_clean+0x94/0x12f [e1000]
  [c0239d5a] net_rx_action+0x69/0xf0
  [c011a305] __do_softirq+0x55/0xbd
  [c011a39a] do_softirq+0x2d/0x31
  [c011a3f8] local_bh_enable+0x5a/0x65
  [c024a0a1] rt_run_flush+0x5f/0x80
  [c027623f] fn_hash_insert+0x352/0x39f
  [c027364c] inet_rtm_newroute+0x57/0x62
  [c02413ed] rtnetlink_rcv_msg+0x1a8/0x1cb
  [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
  [c0247c1e] netlink_rcv_skb+0x3a/0x8b
  [c0247cb1] netlink_run_queue+0x42/0xc3
  [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
  [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
  [c0241227] rtnetlink_rcv+0x22/0x40
  [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
  [c024764c] netlink_data_ready+0x17/0x54
  [c0246a99] netlink_sendskb+0x1f/0x39
  [c0247449] netlink_sendmsg+0x27b/0x28c
  [c0231467] sock_sendmsg+0xce/0xe9
  [c0112b36] __wake_up+0x27/0x3b
  [c01a625a] copy_from_user+0x3a/0x60
  [c01a625a] copy_from_user+0x3a/0x60
  [c0126be2] 

Re: [NET]: Prevent multiple qdisc runs

2006-06-19 Thread Herbert Xu
On Mon, Jun 19, 2006 at 10:36:50AM -0400, jamal wrote:
 
 Ok, but:
 The queue lock will ensure only one of the qdisc runs (assuming
 different CPUs) will be able to dequeue at any one iota in time, no?
 And if you assume that the cpu that manages to get the tx lock as well
 is going to be contending for the qlock in order to requeue, then the
 only scenario i can see the race happening is when you have one CPU
 faster than the other.
 Did i miss something?

First of all you could receive an IRQ in between dropping xmit_lock
and regaining the queue lock.  Secondly we now have lockless drivers
where this assumption also does not hold.
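
One way to rule out concurrent runs regardless of which locks happen to be held is an explicit "running" bit taken with an atomic test-and-set. The sketch below is a userspace model using C11 atomics; the struct and function names are hypothetical, and whether this matches the patch under discussion is an assumption based on the thread subject.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a per-device "qdisc is running" bit. */
struct dev_model {
	atomic_flag qdisc_running;
	int runs;		/* how many times the queue was run */
};

/*
 * Run the queue only if nobody else is already running it.
 * Returns true if this caller did the run.
 */
static bool model_qdisc_run(struct dev_model *dev)
{
	assert(dev != NULL);
	if (atomic_flag_test_and_set(&dev->qdisc_running))
		return false;	/* another context owns the run */
	dev->runs++;		/* stand-in for dequeue + transmit */
	atomic_flag_clear(&dev->qdisc_running);
	return true;
}
```

A caller that loses the race simply returns; in this model the context that owns the bit is the only one that dequeues, so an IRQ arriving between the two locks cannot start a second run.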

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [DOC]: generic netlink

2006-06-19 Thread Shailabh Nagar
jamal wrote:
 Folks,
 
 Attached is a document that should help people wishing to use generic
 netlink interface. It is a WIP so a lot more to go if i see interest.
 The doc has been around for a while, i spent part of yesterday and this
 morning cleaning it up. If you have sent me comments before, please
 forgive me for having misplaced them - just send again. 

Jamal,

Completing the documentation on generic netlink usage will definitely be
useful. I'd be happy to help out with this since I've recently gone through
trying to understand and use genetlink for the taskstats interface. Hopefully
this will help other users like me who aren't netlink experts to begin with !

I've sent you a patch to the document that attempts to cover the following
TODOs (I didn't see any point in sending it to the whole list since it's harder
to read patches to documentation). Please use as you see fit.

 TODO:
 a) Add a more complete compiling kernel module with events.
 Have Thomas put his Mashimaro example and point to it.
(not the Mashimaro example, nor a completely compiled module, but snippets
of pseudo code taken from the user space program used in taskstats development,
modified to fit the foobar example you've used)
 b) Describe some details on how user space <-> kernel works
 probably using libnl??
 c) Describe discovery using the controller..

I'll provide another patch that will cover d) and e) in the set below, again
in the context of the foobar example, which might need to be modified a bit.

 d) talk about policies etc
 e) talk about how something coming from user space eventually
 gets to you.
 f) Talk about the TLV manipulation stuff from Thomas.
 g) submit controller patch to iproute2

One point... do d), f) etc. belong in a separate doc describing usage
of netlink attributes? It's useful here too, but perhaps not directly
related to genetlink.

 PS:- I don't have a good place to put this doc and point to, hence the
 17K attachment


http://www.kernel.org/pub/linux/kernel/people/hadi/ ?

(unless your permissions have been revoked for lack of use ! :-)

Having the current document will be useful to see what edits have been accepted
and work on that instead of the original.

--Shailabh


[PATCH] bcm43xx-d80211: AccessPoint mode related fixes

2006-06-19 Thread Michael Buesch
Hi John,

Please apply this to wireless-dev.
There is no real reason to delay it, even _if_ there might
be still bugs in it. It's a development tree. That's what it is for. ;)

--

Get AccessPoint mode working in bcm43xx-d80211.
This patch is derived from Alexander Tsvyashchenko's original
patch. I (mb) extended it with endianness fixes and other bugfixes.

From: Alexander Tsvyashchenko [EMAIL PROTECTED]
Signed-off-by: Michael Buesch [EMAIL PROTECTED]


Index: wireless-dev-dscapeports/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c
===
--- wireless-dev-dscapeports.orig/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c 2006-06-17 21:26:10.0 +0200
+++ wireless-dev-dscapeports/drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c 2006-06-19 11:25:02.0 +0200
@@ -151,8 +151,10 @@
 {
u32 status;
 
+   assert(offset % 4 == 0);
+
status = bcm43xx_read32(bcm, BCM43xx_MMIO_STATUS_BITFIELD);
-   if (!(status & BCM43xx_SBF_XFER_REG_BYTESWAP))
+   if (status & BCM43xx_SBF_XFER_REG_BYTESWAP)
val = swab32(val);
 
bcm43xx_write32(bcm, BCM43xx_MMIO_RAM_CONTROL, offset);
@@ -312,7 +314,7 @@
}
 }
 
-void bcm43xx_tsf_write(struct bcm43xx_private *bcm, u64 tsf)
+static void bcm43xx_time_lock(struct bcm43xx_private *bcm)
 {
u32 status;
 
@@ -320,7 +322,19 @@
status |= BCM43xx_SBF_TIME_UPDATE;
bcm43xx_write32(bcm, BCM43xx_MMIO_STATUS_BITFIELD, status);
mmiowb();
+}
+
+static void bcm43xx_time_unlock(struct bcm43xx_private *bcm)
+{
+   u32 status;
+
+   status = bcm43xx_read32(bcm, BCM43xx_MMIO_STATUS_BITFIELD);
+   status &= ~BCM43xx_SBF_TIME_UPDATE;
+   bcm43xx_write32(bcm, BCM43xx_MMIO_STATUS_BITFIELD, status);
+}
 
+static void bcm43xx_tsf_write_locked(struct bcm43xx_private *bcm, u64 tsf)
+{
/* Be careful with the in-progress timer.
 * First zero out the low register, so we have a full
 * register-overflow duration to complete the operation.
@@ -350,10 +364,13 @@
mmiowb();
bcm43xx_write16(bcm, BCM43xx_MMIO_TSF_0, v0);
}
+}
 
-   status = bcm43xx_read32(bcm, BCM43xx_MMIO_STATUS_BITFIELD);
-   status &= ~BCM43xx_SBF_TIME_UPDATE;
-   bcm43xx_write32(bcm, BCM43xx_MMIO_STATUS_BITFIELD, status);
+void bcm43xx_tsf_write(struct bcm43xx_private *bcm, u64 tsf)
+{
+   bcm43xx_time_lock(bcm);
+   bcm43xx_tsf_write_locked(bcm, tsf);
+   bcm43xx_time_unlock(bcm);
 }
 
 static void bcm43xx_measure_channel_change_time(struct bcm43xx_private *bcm)
@@ -415,10 +432,11 @@
 static void bcm43xx_write_mac_bssid_templates(struct bcm43xx_private *bcm)
 {
static const u8 zero_addr[ETH_ALEN] = { 0 };
-   const u8 *mac = NULL;
-   const u8 *bssid = NULL;
+   const u8 *mac;
+   const u8 *bssid;
u8 mac_bssid[ETH_ALEN * 2];
int i;
+   u32 tmp;
 
bssid = bcm->interface.bssid;
if (!bssid)
@@ -431,12 +449,13 @@
memcpy(mac_bssid + ETH_ALEN, bssid, ETH_ALEN);
 
/* Write our MAC address and BSSID to template ram */
-   for (i = 0; i < ARRAY_SIZE(mac_bssid); i += sizeof(u32))
-   bcm43xx_ram_write(bcm, 0x20 + i, *((u32 *)(mac_bssid + i)));
-   for (i = 0; i < ARRAY_SIZE(mac_bssid); i += sizeof(u32))
-   bcm43xx_ram_write(bcm, 0x78 + i, *((u32 *)(mac_bssid + i)));
-   for (i = 0; i < ARRAY_SIZE(mac_bssid); i += sizeof(u32))
-   bcm43xx_ram_write(bcm, 0x478 + i, *((u32 *)(mac_bssid + i)));
+   for (i = 0; i < ARRAY_SIZE(mac_bssid); i += sizeof(u32)) {
+   tmp =  (u32)(mac_bssid[i + 0]);
+   tmp |= (u32)(mac_bssid[i + 1]) << 8;
+   tmp |= (u32)(mac_bssid[i + 2]) << 16;
+   tmp |= (u32)(mac_bssid[i + 3]) << 24;
+   bcm43xx_ram_write(bcm, 0x20 + i, tmp);
+   }
 }
 
 static void bcm43xx_set_slot_time(struct bcm43xx_private *bcm, u16 slot_time)
@@ -460,49 +479,6 @@
bcm->short_slot = 0;
 }
 
-/* FIXME: To get the MAC-filter working, we need to implement the
- *following functions (and rename them :)
- */
-#if 0
-static void bcm43xx_disassociate(struct bcm43xx_private *bcm)
-{
-   bcm43xx_mac_suspend(bcm);
-   bcm43xx_macfilter_clear(bcm, BCM43xx_MACFILTER_ASSOC);
-
-   bcm43xx_ram_write(bcm, 0x0026, 0x);
-   bcm43xx_ram_write(bcm, 0x0028, 0x);
-   bcm43xx_ram_write(bcm, 0x007E, 0x);
-   bcm43xx_ram_write(bcm, 0x0080, 0x);
-   bcm43xx_ram_write(bcm, 0x047E, 0x);
-   bcm43xx_ram_write(bcm, 0x0480, 0x);
-
-   if (bcm->current_core->rev < 3) {
-   bcm43xx_write16(bcm, 0x0610, 0x8000);
-   bcm43xx_write16(bcm, 0x060E, 0x);
-   } else
-   bcm43xx_write32(bcm, 0x0188, 0x8000);
-
-   bcm43xx_shm_write32(bcm, BCM43xx_SHM_WIRELESS, 0x0004, 0x03ff);
-
-#if 0
-   if (bcm43xx_current_phy(bcm)->type == 

Re: [Bugme-new] [Bug 6698] New: unregister_netdevice hangs indefinitely from /proc/sys/net/ipv6/conf/all/forwarding

2006-06-19 Thread Andrew Morton
[EMAIL PROTECTED] wrote:

 http://bugzilla.kernel.org/show_bug.cgi?id=6698
 
Summary: unregister_netdevice hangs indefinitely from
 /proc/sys/net/ipv6/conf/all/forwarding
 Kernel Version: 2.6.17-rc6
 Status: NEW
   Severity: normal
  Owner: [EMAIL PROTECTED]
  Submitter: [EMAIL PROTECTED]
 
 
 Most recent kernel where this bug did not occur: none known (yet)
 Distribution: reproduced on Debian/stable, SuSE/10.0, SuSE/10.1
 Hardware Environment: reproduced on UML, i386, x86/64
 Software Environment: reproduced with openvpn and UML tap devices
 Problem Description: after adding IPv6 to my previously working openvpn
 tunneling setup, a (really old) IPv6-related bug started to occur:
 http://lkml.org/lkml/2003/8/21/1
 I also reproduced this bug with kernel 2.6.15.1(vanilla,uml) and
 2.6.16.13(SuSE-version,x86/64) and linux-2.6.13 (SuSE-version,i386)
 
 Steps to reproduce:
 echo 0 > /proc/sys/net/ipv6/conf/all/forwarding # this is important initialization
 
 Have (any version of) openvpn open a tunnel using a tap (virtual ethernet)
 device. In the up script do:
 echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
 this can be easily tested with these lines:
 apt-get install openvpn
 modprobe tun
 mknod /dev/net/tun c 10 200
 echo 0 > /proc/sys/net/ipv6/conf/all/forwarding
 echo "echo 1 > /proc/sys/net/ipv6/conf/all/forwarding" > /tmp/up ; chmod a+x /tmp/up
 openvpn --dev-type tap --remote tunnel.lsmod.de 5003 --ifconfig 10.9.0.2 255.255.255.0 --dev-node /dev/net/tun --up /tmp/up
 # at this point you can verify your tunnel setup by ping 10.9.0.1
 # on the server I have this: openvpn --dev-type tap --ifconfig 10.9.0.1 255.255.255.0 --port 5003 --dev-node /dev/net/tun --float
 # you need UDP port 5003 to pass through your firewall for this
 
 
 Alternatively get a user-mode-linux (UML) binary and do something along the
 lines of:
 apt-get install uml-utilities
 TAP=`tunctl -b`
 ifconfig $TAP 192.168.121.1 netmask 255.255.255.252
 echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
 /path/to/linux eth0=tuntap,$TAP ... # booting up to the point where the tap dev is really bound (at ifconfig eth0 192.168.121.2 within the UML)
 tunctl -d $TAP
 
 
 After 20 seconds kill the openvpn or linux process.
 This hangs indefinitely, leaving the openvpn process in D state.
 syslog states every 10 secs:
 unregister_netdevice: waiting for tap0 to become free.  Usage count = 1
 
 The kernel will then hang ifconfig and ip commands, probably because the
 waiting-for-tap0 still holds a mutex.
 
 After a dozen reboots of trying I found a work-around: replacing the critical
 line with
 (sleep 2 ; echo 1 > /proc/sys/net/ipv6/conf/all/forwarding )
 
 A sleep 1 does not suffice.
 Doing the echo before calling openvpn also works fine, so there seems to be a
 timing problem or race condition during initialization of IPv6 on the newly
 created tap0 device.
 

Thought to be an ipv6 refcount leak.


Re: [PATCH, RFT] bcm43xx: Busting the 1G limit

2006-06-19 Thread Daniel Gryniewicz
On Mon, 2006-06-19 at 22:43 +0200, Michael Buesch wrote:
 On Monday 19 June 2006 17:23, Daniel Gryniewicz wrote:
  On Sat, 2006-06-17 at 19:28 +0200, Michael Buesch wrote:
   Hi,
   
   This patch adds full 32-bit and 64-bit DMA support
   to the bcm43xx driver. Well, it _should_ do this. I can
   not test it, as I don't have a machine to trigger the 1G
   limit.
   The 1G limit should be exploitable on an AMD64 machine
   with more than 1G RAM.
   
   Please test and report, if it works or not. In the
   case of works not, please provide full dmesg log.
   
   Note that I am not sure which cards actually support
   full 32-bit or even 64-bit mode. Older cards might still
   only support 30-bit DMA.
  
  Hi.
  
  I tried this on both 2.6.17-rc6 and on wireless-dev, and got pretty much
  the same panic on both (modulo locking).  My box is a turion with 2 GB
  of ram and a BCM4318.  Here's the panic from wireless-dev:
  
  Unable to handle kernel NULL pointer dereference at 0020
  RIP:
  88104f24{:bcm43xx:bcm43xx_dma_handle_xmitstatus+436}
 
 I am still not absolutely sure where this oops comes from.
 Could you remove at least 1G of your RAM and retry?
 

I took out 1G of RAM (2 1G sticks), and there was no more panic.  It
still didn't work (no output from iwlist scan), but also no panic.

dmesg output was:
Jun 19 18:00:54 athena bcm43xx: Radio turned on
Jun 19 18:00:54 athena bcm43xx: ASSERTION FAILED (radio_attenuation < 10) at: drivers/net/wireless/bcm43xx/bcm43xx_phy.c:1485:bcm43xx_find_lopair()
Jun 19 18:00:54 athena bcm43xx: ASSERTION FAILED (radio_attenuation < 10) at: drivers/net/wireless/bcm43xx/bcm43xx_phy.c:1485:bcm43xx_find_lopair()
Jun 19 18:00:54 athena bcm43xx: Chip initialized
Jun 19 18:00:54 athena bcm43xx: 32-bit DMA initialized
Jun 19 18:00:54 athena bcm43xx: 80211 cores initialized
Jun 19 18:00:54 athena bcm43xx: Keys cleared
Jun 19 18:00:54 athena SoftMAC: Associate: Scanning for networks first.
Jun 19 18:00:54 athena SoftMAC: Associate: failed to initiate scan. Is device up?

followed by a bunch of:
Jun 19 18:01:15 athena SoftMAC: Start scanning with channel: 1
Jun 19 18:01:15 athena SoftMAC: Scanning 14 channels
Jun 19 18:01:15 athena SoftMAC: Scanning finished

followed by:
Jun 19 18:02:03 athena SoftMAC: Associate: Scanning for networks first.
Jun 19 18:02:03 athena SoftMAC: Start scanning with channel: 1
Jun 19 18:02:03 athena SoftMAC: Scanning 14 channels
Jun 19 18:02:03 athena bcm43xx: set security called
Jun 19 18:02:03 athena bcm43xx:.level = 0
Jun 19 18:02:03 athena bcm43xx:.enabled = 0
Jun 19 18:02:03 athena bcm43xx:.encrypt = 0
Jun 19 18:02:03 athena SoftMAC: Scanning finished
Jun 19 18:02:03 athena SoftMAC: Associate: Scanning for networks first.
Jun 19 18:02:03 athena SoftMAC: Start scanning with channel: 1
Jun 19 18:02:03 athena SoftMAC: Scanning 14 channels
Jun 19 18:02:03 athena SoftMAC: Scanning finished
Jun 19 18:02:03 athena SoftMAC: Associate: Scanning for networks first.
Jun 19 18:02:03 athena SoftMAC: Start scanning with channel: 1
Jun 19 18:02:03 athena SoftMAC: Scanning 14 channels
Jun 19 18:02:04 athena SoftMAC: Scanning finished
Jun 19 18:02:04 athena SoftMAC: Unable to find matching network after scan!

and finally:

Jun 19 18:02:44 athena bcm43xx: Radio turned off
Jun 19 18:02:44 athena bcm43xx: DMA-32 0x0200 (RX) max used slots: 0/64
Jun 19 18:02:44 athena bcm43xx: DMA-32 0x02A0 (TX) max used slots: 0/512
Jun 19 18:02:44 athena bcm43xx: DMA-32 0x0280 (TX) max used slots: 0/512
Jun 19 18:02:44 athena bcm43xx: DMA-32 0x0260 (TX) max used slots: 0/512
Jun 19 18:02:44 athena bcm43xx: DMA-32 0x0240 (TX) max used slots: 0/512
Jun 19 18:02:44 athena bcm43xx: DMA-32 0x0220 (TX) max used slots: 2/512
Jun 19 18:02:44 athena bcm43xx: DMA-32 0x0200 (TX) max used slots: 0/512

At that point, I removed the bcm43xx module, and switched over to my
prism54 card in order to get net access.

This was all on wireless-dev as of yesterday with the 1G limit patch
from this thread.

Let me know if there's anything I can try, I'd love to get this working
properly.

Daniel



Re: [Bugme-new] [Bug 6682] New: BUG: soft lockup detected on CPU#0! / ksoftirqd takse 100% CPU

2006-06-19 Thread Paul E. McKenney
On Mon, Jun 19, 2006 at 03:20:10PM -0700, Andrew Morton wrote:
 [EMAIL PROTECTED] wrote:
 
  http://bugzilla.kernel.org/show_bug.cgi?id=6682
  
 Summary: BUG: soft lockup detected on CPU#0! / ksoftirqd takse
  100% CPU
  Kernel Version: 2.6.15.6
  Status: NEW
Severity: normal
   Owner: [EMAIL PROTECTED]
   Submitter: [EMAIL PROTECTED]
  
  
  Most recent kernel where this bug did not occur: (unknown)
  Distribution: Gentoo
  Hardware Environment: 2x Xeon 2.66, 1 GB RAM, NICS: 2 x e1000, and one double
  port e100. Based on Intel E7501 architecture (2U rack Intel chassis).
  Software Environment: quagga 0.98.6
  Problem Description: ksoftirqd/0 takes 100% of CPU. Further investigation
  shows no sign of a network flood or anything similar (and 2 of the 3 NICs
  are e1000 with NAPI). Occasionally there are "BUG: soft lockup detected on
  CPU#0!" messages.
  
  Steps to reproduce:
  
  There is no simple way to reproduce. I think that everything started when we
  attached a second provider with BGP support. We are using quagga, which injects
  about 186 000 routes into the kernel. After running for a while (at least a few
  hours, sometimes a day) we get 100% usage on ksoftirqd/0 and the following
  messages in the logs:

Is it possible that there is a routing loop, either in the overall
configuration or in some intermediate point in the route injection?
Both CPUs seem to be receiving ethernet packets at the time of the oops.

Thanx, Paul

  BUG: soft lockup detected on CPU#0!
  
  Pid: 6506, comm:zebra
  EIP: 0060:[c027f6fd] CPU: 0
  EIP is at _spin_lock+0x7/0xf
   EFLAGS: 0286Not tainted  (2.6.15.6)
  EAX: f6203180 EBX: e6fbf000 ECX:  EDX: f6bec000
  ESI: f6203000 EDI: eddb4b80 EBP: fff4 DS: 007b ES: 007b
  CR0: 8005003b CR2: aca6dff0 CR3: 361ad000 CR4: 06d0
   [c02396f9] dev_queue_xmit+0xe0/0x203
   [c0250de8] ip_output+0x1e1/0x237
   [c024f3f5] ip_forward+0x181/0x1df
   [c024e21a] ip_rcv+0x40c/0x485
   [c0239bd0] netif_receive_skb+0x12f/0x165
   [f885aa4c] e1000_clean_rx_irq+0x389/0x410 [e1000]
   [f885a1ca] e1000_clean+0x94/0x12f [e1000]
   [c0239d5a] net_rx_action+0x69/0xf0
   [c011a305] __do_softirq+0x55/0xbd
   [c011a39a] do_softirq+0x2d/0x31
   [c011a3f8] local_bh_enable+0x5a/0x65
   [c024a0a1] rt_run_flush+0x5f/0x80
   [c027623f] fn_hash_insert+0x352/0x39f
   [c027364c] inet_rtm_newroute+0x57/0x62
   [c02413ed] rtnetlink_rcv_msg+0x1a8/0x1cb
   [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
   [c0247c1e] netlink_rcv_skb+0x3a/0x8b
   [c0247cb1] netlink_run_queue+0x42/0xc3
   [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
   [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
   [c0241227] rtnetlink_rcv+0x22/0x40
   [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
   [c024764c] netlink_data_ready+0x17/0x54
   [c0246a99] netlink_sendskb+0x1f/0x39
   [c0247449] netlink_sendmsg+0x27b/0x28c
   [c0231467] sock_sendmsg+0xce/0xe9
   [c0112b36] __wake_up+0x27/0x3b
   [c01a6216] copy_to_user+0x38/0x42
   [c01a625a] copy_from_user+0x3a/0x60
   [c01a625a] copy_from_user+0x3a/0x60
   [c0126be2] autoremove_wake_function+0x0/0x3a
   [c0236bcd] verify_iovec+0x49/0x7f
   [c02327f2] sys_sendmsg+0x152/0x1a8
   [c0147a62] do_sync_read+0xb8/0xeb
   [c01a6216] copy_to_user+0x38/0x42
   [c0126be2] autoremove_wake_function+0x0/0x3a
   [c0122b7a] getrusage+0x34/0x43
   [c0168504] inotify_dentry_parent_queue_event+0x29/0x7c
   [c01a625a] copy_from_user+0x3a/0x60
   [c0232b6b] sys_socketcall+0x167/0x180
   [c0102433] sysenter_past_esp+0x54/0x75
  
  BUG: soft lockup detected on CPU#0!
  
  Pid: 6506, comm:zebra
  EIP: 0060:[f8952052] CPU: 0
  EIP is at u32_classify+0x52/0x170 [cls_u32]
   EFLAGS: 0206Not tainted  (2.6.15.6)
  EAX: e2fbd020 EBX: f48649c0 ECX: 0010 EDX: 29b09d5a
  ESI: f48649ec EDI: 0001 EBP: e2fbd020 DS: 007b ES: 007b
  CR0: 8005003b CR2: 08154004 CR3: 361ad000 CR4: 06d0
   [f88462fa] ipt_do_table+0x2de/0x2fd [ip_tables]
   [f883b523] ip_nat_fn+0x177/0x185 [iptable_nat]
   [f88e159f] ip_refrag+0x23/0x5f [ip_conntrack]
   [c0244d82] tc_classify+0x2c/0x3f
   [f895514b] htb_classify+0x14b/0x1dd [sch_htb]
   [f8955638] htb_enqueue+0x1d/0x13a [sch_htb]
   [c02396fd] dev_queue_xmit+0xe4/0x203
   [c0250de8] ip_output+0x1e1/0x237
   [c024f3f5] ip_forward+0x181/0x1df
   [c024e21a] ip_rcv+0x40c/0x485
   [c0239bd0] netif_receive_skb+0x12f/0x165
   [f885aa4c] e1000_clean_rx_irq+0x389/0x410 [e1000]
   [f885a1ca] e1000_clean+0x94/0x12f [e1000]
   [c0239d5a] net_rx_action+0x69/0xf0
   [c011a305] __do_softirq+0x55/0xbd
   [c011a39a] do_softirq+0x2d/0x31
   [c011a3f8] local_bh_enable+0x5a/0x65
   [c024a0a1] rt_run_flush+0x5f/0x80
   [c027623f] fn_hash_insert+0x352/0x39f
   [c027364c] inet_rtm_newroute+0x57/0x62
   [c02413ed] rtnetlink_rcv_msg+0x1a8/0x1cb
   [c0241245] rtnetlink_rcv_msg+0x0/0x1cb
   [c0247c1e] netlink_rcv_skb+0x3a/0x8b
   [c0247cb1] netlink_run_queue+0x42/0xc3
   [c0241245] 

Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-19 Thread Patrick McHardy
jamal wrote:
> - For further reflection: Have you considered the case where the rate
> table has already been considered on some link speed in user space and
> then somewhere post-config the physical link speed changes? This would
> happen in the case where ethernet AN is involved and the partner makes
> some changes (use ethtool).
>
> I would say the last bullet is a more interesting problem than a corner
> case of some link layer technology that has high overhead.
> Your work would be more interesting if it was generic for many link
> layers instead of just ATM.

I've thought about this a couple of times. Scaling the virtual clock
rate should be enough for simple qdiscs like TBF or HTB, which have
a linear relation between time and bandwidth. I haven't really thought
about the effects on HFSC yet; on a small scale the relation is
non-linear. But this is a different problem from trying to account
for link-layer overhead.
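
As an illustration of the linear case only, here is a minimal sketch of
why scaling the clock is equivalent to scaling the rate for a token
bucket. It is not kernel code: the struct, the function names and the
speed-ratio parameters are all hypothetical, and it assumes tokens are
counted in bytes and time in microseconds.

```c
#include <stdint.h>

/* Hypothetical token bucket: tokens are bytes, rate is bytes per second. */
struct tbucket {
	uint64_t rate_bps;	/* configured rate, bytes per second */
	uint64_t tokens;	/* bytes currently available */
	uint64_t burst;		/* maximum bytes the bucket may hold */
};

/*
 * Convert a real elapsed interval into virtual time. cur_speed and
 * cfg_speed (assumed inputs, same unit) are the current link speed and
 * the speed the rates were configured for.
 */
static uint64_t scale_elapsed(uint64_t elapsed_us, uint64_t cur_speed,
			      uint64_t cfg_speed)
{
	return elapsed_us * cur_speed / cfg_speed;
}

/*
 * Refill from virtual time. Tokens grow linearly with time, so scaling
 * the elapsed time scales the effective bandwidth by the same factor.
 */
static void tb_refill(struct tbucket *tb, uint64_t virt_elapsed_us)
{
	tb->tokens += tb->rate_bps * virt_elapsed_us / 1000000;
	if (tb->tokens > tb->burst)
		tb->tokens = tb->burst;
}
```

With a class configured at 4000 bytes/s and a link that renegotiates
from 100 to 10 (same unit), one real second becomes 100000 virtual
microseconds and the bucket gains 400 bytes instead of 4000. A qdisc
whose service curve is not linear in time would not reduce to this
single multiplication.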
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-19 Thread Patrick McHardy
jamal wrote:
> You are still speaking ATM (and the above may still be valid), but:
> Could you for example look at the netdevice-type and from that figure
> out the link layer overhead and compensate for it.
> Obviously a lot more useful if such activity is doable in user space
> without any knowledge of the kernel? and therefore zero change to the
> kernel and everything then becomes forward and backward compatible.

It would be nice to have support for HFSC as well, which unfortunately
needs to be done in the kernel since it doesn't use rate tables.
What about qdiscs like SFQ (which uses the packet size in quantum
calculations)? I guess it would make sense to use the wire-length
there as well.



Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Herbert Xu
Harry Edmon [EMAIL PROTECTED] wrote:
>
> That did not help.  I have 1 minute outputs from tcpdump under both
> 2.6.11.12 and 2.6.16.20.  You will see a large size difference between
> the files.  Since the 2.6.11.12 one is 2 MBytes, I thought I would post
> them via the web instead of via attachments.  Look at:
>
> http://www.atmos.washington.edu/~harry/linux/2.6.11.12.out.1min
> http://www.atmos.washington.edu/~harry/linux/2.6.16.20.out.1min

The latter shows that it took 40ms to generate an ACK.  What does
'vmstat 1' show while this is happening?
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[patch] ieee80211: fix not allocating IV+ICV space when using encryption in ieee80211_tx_frame

2006-06-19 Thread Hong Liu
We should preallocate IV+ICV space when encrypting the frame.
Currently no problem shows up, but only because dev_alloc_skb aligns
the data length to SMP_CACHE_BYTES and the resulting padding happens
to be usable for the ICV.

Thanks,
Hong
diff -urp a/net/ieee80211/ieee80211_tx.c b/net/ieee80211/ieee80211_tx.c
--- a/net/ieee80211/ieee80211_tx.c	2006-06-20 09:36:13.0 +0800
+++ b/net/ieee80211/ieee80211_tx.c	2006-06-20 09:32:39.0 +0800
@@ -562,10 +562,12 @@ int ieee80211_tx_frame(struct ieee80211_
 	struct net_device_stats *stats = &ieee->stats;
 	struct sk_buff *skb_frag;
 	int priority = -1;
+	int fraglen = total_len;
+	struct ieee80211_crypt_data *crypt = ieee->crypt[ieee->tx_keyidx];
 
 	spin_lock_irqsave(&ieee->lock, flags);
 
-	if (encrypt_mpdu && !ieee->sec.encrypt)
+	if (encrypt_mpdu && (!ieee->sec.encrypt || !crypt))
 		encrypt_mpdu = 0;
 
 	/* If there is no driver handler to take the TXB, dont' bother
@@ -581,20 +583,25 @@ int ieee80211_tx_frame(struct ieee80211_
 		goto success;
 	}
 
-	if (encrypt_mpdu)
+	if (encrypt_mpdu) {
 		frame->frame_ctl |= cpu_to_le16(IEEE80211_FCTL_PROTECTED);
+		/* mpdu_prefix_len will be add to the headroom */
+		fraglen += crypt->ops->extra_mpdu_postfix_len;
+	}
 
 	/* When we allocate the TXB we allocate enough space for the reserve
 	 * and full fragment bytes (bytes_per_frag doesn't include prefix,
 	 * postfix, header, FCS, etc.) */
-	txb = ieee80211_alloc_txb(1, total_len, ieee->tx_headroom, GFP_ATOMIC);
+	txb = ieee80211_alloc_txb(1, fraglen, ieee->tx_headroom +
+				  crypt->ops->extra_mpdu_prefix_len,
+				  GFP_ATOMIC);
 	if (unlikely(!txb)) {
 		printk(KERN_WARNING "%s: Could not allocate TXB\n",
 		       ieee->dev->name);
 		goto failed;
 	}
 	txb->encrypted = 0;
-	txb->payload_size = total_len;
+	txb->payload_size = fraglen;
 
 	skb_frag = txb->fragments[0];
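
The sizing rule behind the patch above can be sketched in isolation.
This is a stand-alone illustration, not the ieee80211 code: the struct
and function names are hypothetical, and the two length fields merely
stand in for extra_mpdu_prefix_len and extra_mpdu_postfix_len. Unlike
the posted diff, which adds the prefix length to the headroom
unconditionally, this sketch adds both lengths only when encryption is
active and a crypt entry exists.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-MPDU lengths a cipher reports. */
struct crypt_lens {
	size_t prefix_len;	/* bytes prepended to the payload, e.g. the IV */
	size_t postfix_len;	/* bytes appended to the payload, e.g. the ICV */
};

/* Headroom and data length to request for one fragment. */
struct frag_alloc {
	size_t headroom;
	size_t len;
};

/*
 * The prefix goes into the headroom (it sits in front of the payload);
 * the postfix enlarges the fragment itself (it sits behind the payload).
 */
static struct frag_alloc frag_alloc_size(size_t total_len, size_t tx_headroom,
					 const struct crypt_lens *c, int encrypt)
{
	struct frag_alloc a = { tx_headroom, total_len };

	if (encrypt && c) {
		a.headroom += c->prefix_len;
		a.len += c->postfix_len;
	}
	return a;
}
```

For WEP, where the IV and the ICV are 4 bytes each, a 100-byte frame
with 8 bytes of driver headroom needs 12 bytes of headroom and a
104-byte fragment.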
 


Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL

2006-06-19 Thread Chris Wedgwood
On Wed, Jun 14, 2006 at 11:40:04AM +0200, Jesper Dangaard Brouer wrote:

> The Linux traffic's control engine inaccurately calculates
> transmission times for packets sent over ADSL links.  For some
> packet sizes the error rises to over 50%.  This occurs because ADSL
> uses ATM as its link layer transport, and ATM transmits packets in
> fixed sized 53 byte cells.

What if AAL5 is used?  The cell-alignment math is going to be wrong
there surely?
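
For reference, the cell arithmetic in question can be sketched as
follows. This is not taken from the patch under discussion; it assumes
AAL5 framing (an 8-byte trailer, padding up to a whole number of
48-byte cell payloads, 53 bytes on the wire per cell). The function
name and the encap_overhead parameter are hypothetical.

```c
#define ATM_CELL_SIZE		53	/* bytes on the wire per cell */
#define ATM_CELL_PAYLOAD	48	/* payload bytes carried per cell */
#define AAL5_TRAILER_LEN	8	/* AAL5 CPCS-PDU trailer */

/*
 * Wire length of one packet sent as an AAL5 PDU.
 * encap_overhead is an assumed per-packet encapsulation cost in bytes
 * (for example LLC/SNAP or PPP headers); pass 0 to ignore it.
 */
static unsigned int atm_aal5_wire_len(unsigned int pkt_len,
				      unsigned int encap_overhead)
{
	unsigned int pdu = pkt_len + encap_overhead + AAL5_TRAILER_LEN;
	/* Round up: a partly filled last cell is still sent in full. */
	unsigned int cells = (pdu + ATM_CELL_PAYLOAD - 1) / ATM_CELL_PAYLOAD;

	return cells * ATM_CELL_SIZE;
}
```

A 40-byte packet with no extra encapsulation fits in one cell
(53 bytes on the wire), while a 41-byte packet needs two cells
(106 bytes), which is the kind of step that produces large relative
errors for small packets.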
