Re: [Cluster-devel] [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

2018-03-29 Thread Steven Whitehouse

Hi,

Can we solve the problem another way, by not taking refs on the glocks 
when we are iterating over them for the debugfs files? I assume that is 
the main issue here.


We didn't used to take refs since the rcu locking was enough during the 
walk itself. We used to only keep track of the hash bucket and offset 
within the bucket when we dropped the rcu lock between calls to the 
iterator. I may have lost track of why that approach did not work?


Steve.


On 29/03/18 13:06, Andreas Gruenbacher wrote:

Here's a second version of the patch (now a patch set) to eliminate
rhashtable_walk_peek in gfs2.

The first patch introduces lockref_put_not_zero, the inverse of
lockref_get_not_zero.

The second patch eliminates rhashtable_walk_peek in gfs2.  In
gfs2_glock_iter_next, the new lockref function from patch one is used to
drop a lockref count as long as the count doesn't drop to zero.  This is
almost always the case; if there is a risk of dropping the last
reference, we must defer that to a work queue because dropping the last
reference may sleep.

Thanks,
Andreas

Andreas Gruenbacher (2):
   lockref: Add lockref_put_not_zero
   gfs2: Stop using rhashtable_walk_peek

  fs/gfs2/glock.c | 47 ---
  include/linux/lockref.h |  1 +
  lib/lockref.c   | 28 
  3 files changed, 57 insertions(+), 19 deletions(-)





Re: Question about SOCK_SEQPACKET

2017-05-05 Thread Steven Whitehouse

Hi,


On 05/05/17 11:09, Sowmini Varadhan wrote:

On (05/05/17 10:45), Steven Whitehouse wrote:

I do wonder if the man page for recvmsg is wrong, or at least a bit
confusing. SOCK_SEQPACKET is stream based not message based - it just
happens to have EOR markers in the stream. There is no reason that the whole
message needs to be returned in a single read, and in fact that would be
impossible if the sender didn't insert any EOR markers but kept sending data
beyond the size that the socket could buffer.

I notice that man 7 socket says SOCK_SEQPACKET is for datagrams of fixed
maximum length which is definitely wrong, as is the statement that a
consumer has to read an entire packet with each system call.

Which man page do you think is wrong here? The POSIX definition is here

http://pubs.opengroup.org/onlinepubs/009695399/functions/recvmsg.html

The description in

http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_10.html

says, "It is protocol-specific whether a maximum record size is imposed."
In my machine (Ubuntu 4.4.0-72, and it is in socket(2), not socket(7), btw)
doesnt have any references to max length, but I'm not sure I'd boldly assert
"definitely wrong" about the requirement of having to read entire
packet in a system call (see POSIX man page)

--Sowmini

Just before the part that you've quoted, the description for 
SOCK_SEQPACKET says:
"The SOCK_SEQPACKET socket type is similar to the SOCK_STREAM type, and 
is also connection-oriented. The only difference between these types is 
that record boundaries are maintained using the SOCK_SEQPACKET type. A 
record can be sent using one or more output operations and received 
using one or more input operations, but a single operation never 
transfers parts of more than one record."


The man page for socket says SOCK_SEQPACKET "Provides  a sequenced,  
reliable,  two-way connection-based data transmission path  for  
datagrams  of  fixed maximum  length" which is not true, because while 
there may be a length restriction, it is quite possible that there is 
not a length restriction (as per DECnet). It also says "a  consumer  is  
required  to read an entire packet with each input system call" which is 
also contradicted by POSIX which says that a record can be "received 
using one or more input operations". So both statements in the man page 
are wrong, I think.


I have to say that I'd not spotted the POSIX recvmsg wording before, 
which says "For message-based sockets, such as SOCK_DGRAM and 
SOCK_SEQPACKET, the entire message shall be read in a single operation" 
however that does contradict the earlier wording, where it explicitly 
says that multiple receive operations per record are ok for 
SOCK_SEQPACKET - at least if we assume that record == message in this 
case. Also, if this restriction was true (one message per recvmsg call) 
then MSG_EOR would never be needed on receive, since every recvmsg would 
be a single message/record only, and that same document does say that 
MSG_EOR can be set on receive for protocols which support it,


Steve.




Re: Question about SOCK_SEQPACKET

2017-05-05 Thread Steven Whitehouse

Hi,


On 05/05/17 06:18, Sam Kumar wrote:

Hello,
I have recently had occasion to use SOCK_SEQPACKET sockets on Linux,
and noticed some odd behavior. When using sendmsg and recvmsg with
these sockets, it seems that the "end-of-record" flag (MSG_EOR) is not
being propagated correctly.
It depends which protocol you are using as to whether that is true. 
SOCK_SEQPACKET is supposed to be identical to SOCK_STREAM except for the 
record separators. That is true for DECnet (but whether DECnet is still 
functional is another thing!) and not true for ax.25 which uses 
SOCK_SEQPACKET incorrectly. For AF_UNIX that you are using I'm not quite 
sure what would be expected.



The man page for recvmsg(2) states:

The  msg_flags  field  in the msghdr is set on return of recvmsg().  It
can contain several flags:

MSG_EOR
   indicates end-of-record; the data returned  completed  a  record
   (generally used with sockets of type SOCK_SEQPACKET).


The man page for recvmsg(3) states:

For
   message-based  sockets,  such as SOCK_DGRAM and SOCK_SEQPACKET, the 
entire
   message shall be read in a single operation.



This leads me to believe that MSG_EOR should be set in the msghdr
struct whenever recvmsg() returns data. However, I am not observing
this flag ever being set, whether or not I set the MSG_EOR when
sending the messages.

If it helps you can take a look at the code I'm using. It is at
https://github.com/samkumar/seqpacket-test/, commit
2a7dbc1f94bafce6950ee726bdd54da96945d083 (HEAD of master at the time
of writing). Look at server.c and client.c (don't bother with
goclient.go).

The reason that I need to check MSG_EOR is that I need to distinguish
between EOF and messages of length 0. For SOCK_STREAM sockets, a
return value of 0 unambiguously means EOF, and for SOCK_DGRAM sockets
a return value of 0 unambiguously means that a datagram of length 0
was received.

Because SOCK_SEQPACKET is both connection-based and message-oriented,
a return value of 0 is ambiguous. Based on my reading of the man
pages, reading the MSG_EOR bit would let me disambiguate between EOF
and a zero-length datagram, because MSG_EOR would be set for a
zero-length datagram, but would not be set for EOF.

If someone could please help me understand MSG_EOR, and how to
distinguish between EOF and zero-length messages in a SOCK_SEQPACKET
connection, I would definitely appreciate it!

Thanks,
Sam Kumar
That would be my expectation of how it should work - if you ignore 
MSG_EOR on recvmsg then what you get is identical to a SOCK_STREAM, and 
that every call to recvmsg will return data from (at most) a single 
message, with MSG_EOR set if the end of that message has been reached. 
That is what POSIX says should happen anyway.


I do wonder if the man page for recvmsg is wrong, or at least a bit 
confusing. SOCK_SEQPACKET is stream based not message based - it just 
happens to have EOR markers in the stream. There is no reason that the 
whole message needs to be returned in a single read, and in fact that 
would be impossible if the sender didn't insert any EOR markers but kept 
sending data beyond the size that the socket could buffer.


I notice that man 7 socket says SOCK_SEQPACKET is for datagrams of fixed 
maximum length which is definitely wrong, as is the statement that a 
consumer has to read an entire packet with each system call.


Also it is probably better to ask this question on netdev where it is 
likely to get more attention from the net developers, so I'm copying my 
reply there too,


Steve.



Re: [Cluster-devel] [PATCH 4/6] dlm: use sctp 1-to-1 API

2015-08-13 Thread Steven Whitehouse

Hi,

On 12/08/15 17:42, Marcelo Ricardo Leitner wrote:

Em 12-08-2015 12:33, David Laight escreveu:

From: Marcelo Ricardo Leitner

Sent: 12 August 2015 14:16
Em 12-08-2015 07:23, David Laight escreveu:

From: Marcelo Ricardo Leitner

Sent: 11 August 2015 23:22
DLM is using 1-to-many API but in a 1-to-1 fashion. That is, it's not
needed but this causes it to use sctp_do_peeloff() to mimic an
kernel_accept() and this causes a symbol dependency on sctp module.

By switching it to 1-to-1 API we can avoid this dependency and also
reduce quite a lot of SCTP-specific code in lowcomms.c.

...

You still need to enable sctp notifications (I think the patch deleted
that code).
Otherwise you don't get any kind of indication if the remote system
'resets' (ie sends an new INIT chunk) on an existing connection.


Right, it would miss the restart event and could generate a corrupted
tx/rx buffers by glueing parts of old messages with new ones.


Except that it is SCTP so you'd expect DATA chunks to contain entire
messages and so get unexpected message sequences rather than corrupt
messages.


I was thinking on cases where the buf for recvmsg is not enough to 
hold the chunk, so that the remaining is left for another attempt 
(sctp_recvmsg, around line 2130), but sounds like we won't purge rx 
buffer when the reset happens so that doesn't matter. The association 
is replaced, but the buffers are kept.


Out of order messages aren't a problem for dlm. It can recover from 
that just fine, as it doesn't have a specific handshake at beginning 
or something like that and upper layers are agnostic to that state 
transition (disconnect/reconnect/...), this should be fine.


I'm not sure thats true - DLM does rely on message ordering in some 
cases in order to ensure correct functioning. So depending on how SCTP 
is interfaced to DLM, it might potentially be an issue,


Steve.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: My 802.3ad is my bond

2008-01-24 Thread Steven Whitehouse
Hi,

On Wed, 2008-01-23 at 09:13 -0800, Jay Vosburgh wrote:
 Steven Whitehouse [EMAIL PROTECTED] wrote:
 [...]
 This commit: ece95f7fefe3afae19e641e1b3f5e64b00d5b948 seems to have
 caused a problem with parsing bond arguments as now only the numeric
 arguments seem to work (in modprobe.conf) and specifying 802.3ad fails.
 When I revert that patch in my local tree all seems ok.
 
   Thanks for the report; I know what the problem here is.  I'll
 get a fix out.
 
 Also I notice that one of my two NICs now reports this:
 
 bonding: bond0: link status definitely down for interface eth0,
 disabling it
 bonding: bond0: Interface eth0 is already enslaved!
 bond0.5: no IPv6 routers present
 
 which I think is also new with this set of bonding updates, before it
 used to use both interfaces ok. I've not worked out which of the other
 patches causes this so far, but I can if its helpful,
 
   That would be helpful, as would some more details: e.g., the
 various options passed to bonding, the complete dmesg log, contents of
 /proc/net/bonding/bond0 [or whatever your interface is called], and
 anything else you think would be helpful.
 
   -J
 
I've been through all the patches now, reverting each in turn and it
seems that I still have the problem. Thats rather odd as I'm sure that I
didn't have only one interface working before. When I get a moment I'll
try and work out whats going on here,

Steve.


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


My 802.3ad is my bond

2008-01-23 Thread Steven Whitehouse
Hi,

This commit: ece95f7fefe3afae19e641e1b3f5e64b00d5b948 seems to have
caused a problem with parsing bond arguments as now only the numeric
arguments seem to work (in modprobe.conf) and specifying 802.3ad fails.
When I revert that patch in my local tree all seems ok.

Also I notice that one of my two NICs now reports this:

bonding: bond0: link status definitely down for interface eth0,
disabling it
bonding: bond0: Interface eth0 is already enslaved!
bond0.5: no IPv6 routers present

which I think is also new with this set of bonding updates, before it
used to use both interfaces ok. I've not worked out which of the other
patches causes this so far, but I can if its helpful,

Steve.


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: My 802.3ad is my bond

2008-01-23 Thread Steven Whitehouse
Hi,

On Wed, 2008-01-23 at 09:13 -0800, Jay Vosburgh wrote:
 Steven Whitehouse [EMAIL PROTECTED] wrote:
 [...]
 This commit: ece95f7fefe3afae19e641e1b3f5e64b00d5b948 seems to have
 caused a problem with parsing bond arguments as now only the numeric
 arguments seem to work (in modprobe.conf) and specifying 802.3ad fails.
 When I revert that patch in my local tree all seems ok.
 
   Thanks for the report; I know what the problem here is.  I'll
 get a fix out.
 
 Also I notice that one of my two NICs now reports this:
 
 bonding: bond0: link status definitely down for interface eth0,
 disabling it
 bonding: bond0: Interface eth0 is already enslaved!
 bond0.5: no IPv6 routers present
 
 which I think is also new with this set of bonding updates, before it
 used to use both interfaces ok. I've not worked out which of the other
 patches causes this so far, but I can if its helpful,
 
   That would be helpful, as would some more details: e.g., the
 various options passed to bonding, the complete dmesg log, contents of
 /proc/net/bonding/bond0 [or whatever your interface is called], and
 anything else you think would be helpful.
 
   -J
 
 ---
   -Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
Ok. I'll try and work out which patch it is, but in the mean time:

Ethernet Channel Bonding Driver: v3.2.3 (December 6, 2007)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 1
Actor Key: 17
Partner Key: 4
Partner Mac Address: 00:12:a9:13:3f:67

Slave Interface: eth0
MII Status: down   -- this one should be up as well
Link Failure Count: 2
Permanent HW addr: 00:11:43:d7:75:74
Aggregator ID: 2

Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:11:43:d7:75:75
Aggregator ID: 1

Bonding options:
options bond0 miimon=100 mode=802.3ad

I'll send the dmesg by private email as its quite large,

Steve.



--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dn_neigh_table vs pneigh_lookup/pneigh_delete

2007-12-20 Thread Steven Whitehouse
Hi,

On Wed, Dec 19, 2007 at 05:11:34PM +0300, Pavel Emelyanov wrote:
 Hi
 
 The pneigh_lookup/delete silently concerns, that the 
 key_len of the table is more that 4 bytes. Look:
 
  u32 hash_val = *(u32 *)(pkey + key_len - 4);
 
 The hash_val for the proxy neighbor entry is four last bytes
 from the pkey.
 
 But the dn_neigh_tables' key_len is sizeof(__le16), that is 2,
 so setting (via netlink) the proxy neighbor entry for decnet 
 will cause this entry to reside in arbitrary hash chain.
 
 Is this too bad for decnet?
 
 Thanks,
 Pavel
The pneigh code is never used in DECnet, we only use the normal
part of the neigh code where the hash function was changed so that
it can be defined for each protocol (and thus doesn't suffer from
this problem)

Steve.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][DECNET] dn_nl_deladdr() almost always returns no error

2007-11-29 Thread Steven Whitehouse
Hi,

On Thu, Nov 29, 2007 at 07:29:20PM +0300, Pavel Emelyanov wrote:
 As far as I see from the err variable initialization
 the dn_nl_deladdr() routine was designed to report errors
 like EADDRNOTAVAIL and probaby ENODEV.
 
 But the code sets this err to 0 after the first nlmsg_parse
 and goes on, returning this 0 in any case.
 
 Is this made deliberately, or the patch below is correct?

The patch looks good to me.
 
 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]
Acked-by: Steven Whitehouse [EMAIL PROTECTED]

Steve.

 
 ---
 
 diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
 index 66e266f..3bc82dc 100644
 --- a/net/decnet/dn_dev.c
 +++ b/net/decnet/dn_dev.c
 @@ -651,16 +651,18 @@ static int dn_nl_deladdr(struct sk_buff *skb, struct 
 nlmsghdr *nlh, void *arg)
   struct dn_dev *dn_db;
   struct ifaddrmsg *ifm;
   struct dn_ifaddr *ifa, **ifap;
 - int err = -EADDRNOTAVAIL;
 + int err;
  
   err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, dn_ifa_policy);
   if (err  0)
   goto errout;
  
 + err = -ENODEV;
   ifm = nlmsg_data(nlh);
   if ((dn_db = dn_dev_by_index(ifm-ifa_index)) == NULL)
   goto errout;
  
 + err = -EADDRNOTAVAIL;
   for (ifap = dn_db-ifa_list; (ifa = *ifap); ifap = ifa-ifa_next) {
   if (tb[IFA_LOCAL] 
   nla_memcmp(tb[IFA_LOCAL], ifa-ifa_local, 2))
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] decnet: addr module param can't be __initdata

2007-11-01 Thread Steven Whitehouse
Hi,

Looks good, Feel free to add:

Acked-by: Steven Whitehouse [EMAIL PROTECTED]

Steve.

On Thu, Nov 01, 2007 at 06:36:29PM +0300, Alexey Dobriyan wrote:
 sysfs keeps references to module parameters via /sys/module/*/parameters,
 so marking them as __initdata can't work.
 
 Steps to reproduce:
 
   modprobe decnet
   cat /sys/module/decnet/parameters/addr
 
 BUG: unable to handle kernel paging request at virtual address f88cd410
 printing eip: c043dfd1 *pdpt = 4001 *pde = 04408067 *pte 
 =  
 Oops:  [#1] PREEMPT SMP 
 Modules linked in: decnet sunrpc af_packet ipv6 binfmt_misc dm_mirror 
 dm_multipath dm_mod sbs sbshc fan dock battery backlight ac power_supply 
 parport loop rtc_cmos serio_raw rtc_core rtc_lib button amd_rng sr_mod cdrom 
 shpchp pci_hotplug ehci_hcd ohci_hcd uhci_hcd usbcore
 Pid: 2099, comm: cat Not tainted 
 (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56-bloat #6)
 EIP: 0060:[c043dfd1] EFLAGS: 00210286 CPU: 1
 EIP is at param_get_int+0x6/0x20
 EAX: c5c87000 EBX:  ECX: 80d0 EDX: f88cd410
 ESI: f8a108f8 EDI: c5c87000 EBP:  ESP: c5c97f00
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
 Process cat (pid: 2099, ti=c5c97000 task=c641ee10 task.ti=c5c97000)
 Stack:  f8a108f8 c5c87000 c043db6b f8a108f1 0124 c043de1a 
 c043db2f 
f88cd410  c5c87000 f8a16bc8 f8a16bc8 c043dd69 c043dd54 
 c5dd5078 
c043dbc8 c5cc7580 c06ee64c c5d679f8 c04c431f c641f480 c641f484 
 1000 
 Call Trace:
  [c043db6b] param_array_get+0x3c/0x62
  [c043de1a] param_array_set+0x0/0xdf
  [c043db2f] param_array_get+0x0/0x62
  [c043dd69] param_attr_show+0x15/0x2d
  [c043dd54] param_attr_show+0x0/0x2d
  [c043dbc8] module_attr_show+0x1a/0x1e
  [c04c431f] sysfs_read_file+0x7c/0xd9
  [c04c42a3] sysfs_read_file+0x0/0xd9
  [c048d4b2] vfs_read+0x88/0x134
  [c042090b] do_page_fault+0x0/0x7d5
  [c048d920] sys_read+0x41/0x67
  [c04080fa] sysenter_past_esp+0x6b/0xc1
  ===
 Code: 00 83 c4 0c c3 83 ec 0c 8b 52 10 8b 12 c7 44 24 04 27 dd 6c c0 89 04 24 
 89 54 24 08 e8 ea 01 0c 00 83 c4 0c c3 83 ec 0c 8b 52 10 8b 12 c7 44 24 04 
 58 8c 6a c0 89 04 24 89 54 24 08 e8 ca 01 0c 
 EIP: [c043dfd1] param_get_int+0x6/0x20 SS:ESP 0068:c5c97f00
 
 Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
 ---
 
  net/decnet/dn_dev.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 --- a/net/decnet/dn_dev.c
 +++ b/net/decnet/dn_dev.c
 @@ -1439,7 +1439,7 @@ static const struct file_operations dn_dev_seq_fops = {
  
  #endif /* CONFIG_PROC_FS */
  
 -static int __initdata addr[2];
 +static int addr[2];
  module_param_array(addr, int, NULL, 0444);
  MODULE_PARM_DESC(addr, The DECnet address of this machine: area,node);
  
 
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


DECnet routing rule resolution

2007-03-29 Thread Steven Whitehouse
Hi,

One of the effects of the recent tidy up of the DECnet routing rules
code is that we are no longer able to see the difference between reading
a rule of type FR_ACT_UNREACHABLE returning -ENETUNREACH and simply
running out of rules to look at, which also returns the same thing.

The DECnet code used to return -ESRCH if it ran out of rules in which
case the test in dn_route.c (which resulted in DECnet falling back to
endnode routing in the -ESRCH case) no longer works.

So there seems to be several options to try and solve this: one is to
change the error return for running out of rules in
fib_rules.c:fib_rules_lookup() to something else (but then that has a
knock on effect in the ipv4 code). Another is to add the not found
error return as a parameter in the struct fib_rules_ops so that both
protocols can have their preferred error return. Both solutions seem a
bit messy, so I thought I'd ask for some guidance on this before writing
a patch,


Steve.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/12] [DECNet]: Use rtnl registration interface

2007-03-22 Thread Steven Whitehouse
Hi,

On Thu, Mar 22, 2007 at 02:00:04PM +0100, Thomas Graf wrote:
 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Acked-by: Steven Whitehouse [EMAIL PROTECTED]

for all the DECnet bits  also the DECnet changes in the other patch I saw
from you relating to the routing rules,

Steve.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: decnet handle a failure in neigh_parms_alloc (take 2)

2007-01-25 Thread Steven Whitehouse
Hi,

On Wed, Jan 24, 2007 at 09:55:45PM -0700, Eric W. Biederman wrote:
 
 While enhancing the neighbour code to handle multiple network
 namespaces I noticed that decnet is assuming neigh_parms_alloc
 will allways succeed, which is clearly wrong.  So handle the
 failure.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
Acked-by: Steven Whitehouse [EMAIL PROTECTED]

Also you should cc Patrick as he is now the maintainer,

Steve.


 ---
  net/decnet/dn_dev.c |   11 +--
  1 files changed, 9 insertions(+), 2 deletions(-)
 
 diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
 index 324eb47..913e25a 100644
 --- a/net/decnet/dn_dev.c
 +++ b/net/decnet/dn_dev.c
 @@ -1140,16 +1140,23 @@ struct dn_dev *dn_dev_create(struct net_device *dev, 
 int *err)
   init_timer(dn_db-timer);
  
   dn_db-uptime = jiffies;
 +
 + dn_db-neigh_parms = neigh_parms_alloc(dev, dn_neigh_table);
 + if (!dn_db-neigh_parms) {
 + dev-dn_ptr = NULL;
 + kfree(dn_db);
 + return NULL;
 + }
 +
   if (dn_db-parms.up) {
   if (dn_db-parms.up(dev)  0) {
 + neigh_parms_release(dn_neigh_table, 
 dn_db-neigh_parms);
   dev-dn_ptr = NULL;
   kfree(dn_db);
   return NULL;
   }
   }
  
 - dn_db-neigh_parms = neigh_parms_alloc(dev, dn_neigh_table);
 -
   dn_dev_sysctl_register(dev, dn_db-parms);
  
   dn_dev_set_timer(dev);
 -- 
 1.4.4.1.g278f
 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHSET] packet mark fib rules work

2006-11-09 Thread Steven Whitehouse
Hi,

On Thu, Nov 09, 2006 at 12:27:35PM +0100, Thomas Graf wrote:
 Renames nfmark to mark and remove the dependency on netfilter
 to ease usage by all subsystems. Also removes all the unneeded
 config options to enable routing by fwmark, it can be safely
 enabled by default.
 
 Moves mark selector code from per protocol part into the generic
 part and adds support for inverting selectors.
 

Acked-by: Steven Whitehouse [EMAIL PROTECTED]

so far as all the DECnet bits go. One question though... will you
be adding later (as your slide #5 and #11 from your netconf presentation
appear to imply) a way to set the mark from the routing table (presumably
included in the nexthop info) ?

Steve.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[DECNET] Endian bug fixes

2006-11-06 Thread Steven Whitehouse
Hi,

Here is a patch which fixes some endianess problems. Patrick: since you
have both big  little endian machines at your disposal, can you test to
ensure this is ok? Thanks,

Steve.

From ed3de950e89f8b02302308a2bedd59123ff3b88e Mon Sep 17 00:00:00 2001
From: Steven Whitehouse [EMAIL PROTECTED]
Date: Mon, 6 Nov 2006 10:30:30 -0500
Subject: [PATCH] [DECNET] Endianess fixes

Here are some fixes to endianess problems spotted by Al Viro.

Cc: Al Viro [EMAIL PROTECTED]
Cc: Patrick Caulfield [EMAIL PROTECTED]
Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]
---
 net/decnet/af_decnet.c |   21 ++---
 net/decnet/dn_rules.c  |4 ++--
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index 3456cd3..37b4720 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -166,7 +166,7 @@ static struct hlist_head *dn_find_list(s
if (scp-addr.sdn_flags  SDF_WILD)
return hlist_empty(dn_wild_sk) ? dn_wild_sk : NULL;
 
-   return dn_sk_hash[scp-addrloc  DN_SK_HASH_MASK];
+   return dn_sk_hash[dn_ntohs(scp-addrloc)  DN_SK_HASH_MASK];
 }
 
 /* 
@@ -180,7 +180,7 @@ static int check_port(__le16 port)
if (port == 0)
return -1;
 
-   sk_for_each(sk, node, dn_sk_hash[port  DN_SK_HASH_MASK]) {
+   sk_for_each(sk, node, dn_sk_hash[dn_ntohs(port)  DN_SK_HASH_MASK]) {
struct dn_scp *scp = DN_SK(sk);
if (scp-addrloc == port)
return -1;
@@ -194,12 +194,12 @@ static unsigned short port_alloc(struct 
 static unsigned short port = 0x2000;
unsigned short i_port = port;
 
-   while(check_port(++port) != 0) {
+   while(check_port(dn_htons(++port)) != 0) {
if (port == i_port)
return 0;
}
 
-   scp-addrloc = port;
+   scp-addrloc = dn_htons(port);
 
return 1;
 }
@@ -418,7 +418,7 @@ struct sock *dn_find_by_skb(struct sk_bu
struct dn_scp *scp;
 
read_lock(dn_hash_lock);
-   sk_for_each(sk, node, dn_sk_hash[cb-dst_port  DN_SK_HASH_MASK]) {
+   sk_for_each(sk, node, dn_sk_hash[dn_ntohs(cb-dst_port)  
DN_SK_HASH_MASK]) {
scp = DN_SK(sk);
if (cb-src != dn_saddr2dn(scp-peer))
continue;
@@ -1016,13 +1016,12 @@ static void dn_access_copy(struct sk_buf
 
 static void dn_user_copy(struct sk_buff *skb, struct optdata_dn *opt)
 {
-unsigned char *ptr = skb-data;
-
-opt-opt_optl   = *ptr++;
-opt-opt_status = 0;
-memcpy(opt-opt_data, ptr, opt-opt_optl);
-skb_pull(skb, dn_ntohs(opt-opt_optl) + 1);
+   unsigned char *ptr = skb-data;
 
+   opt-opt_optl   = dn_htons((__u16)*ptr++);
+   opt-opt_status = 0;
+   memcpy(opt-opt_data, ptr, dn_ntohs(opt-opt_optl));
+   skb_pull(skb, dn_ntohs(opt-opt_optl) + 1);
 }
 
 static struct sk_buff *dn_wait_for_connect(struct sock *sk, long *timeo)
diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
index 3e0c882..590e0a7 100644
--- a/net/decnet/dn_rules.c
+++ b/net/decnet/dn_rules.c
@@ -124,8 +124,8 @@ static struct nla_policy dn_fib_rule_pol
 static int dn_fib_rule_match(struct fib_rule *rule, struct flowi *fl, int 
flags)
 {
struct dn_fib_rule *r = (struct dn_fib_rule *)rule;
-   u16 daddr = fl-fld_dst;
-   u16 saddr = fl-fld_src;
+   __le16 daddr = fl-fld_dst;
+   __le16 saddr = fl-fld_src;
 
if (((saddr ^ r-src)  r-srcmask) ||
((daddr ^ r-dst)  r-dstmask))
-- 
1.4.1



Re: [DECNET] Endian bug fixes

2006-11-06 Thread Steven Whitehouse
Hi,

On Mon, 2006-11-06 at 10:32 +, Al Viro wrote:
 On Mon, Nov 06, 2006 at 10:31:02AM +, Steven Whitehouse wrote:
  +   opt-opt_optl   = dn_htons((__u16)*ptr++);
 
 Lose that cast; it's only confusing the things...
 
  +   memcpy(opt-opt_data, ptr, dn_ntohs(opt-opt_optl));
  +   skb_pull(skb, dn_ntohs(opt-opt_optl) + 1);
 
 ... and I'd actually do
 
   u16 len = *ptr++; /* yes, it's 8bit on the wire */
   opt-opt_optl   = dn_htons(len);
   BUG_ON(len  16); /* we've checked the contents earlier */
   memcpy(opt-opt_data, ptr, len);
   skb_pull(skb, len + 1);

Ok, and I've also made the same change in the other places too, so far
as its relevant in those cases. New patch attached,

Steve.

From a184f89a13fa292589f309057cc0775a8256a89e Mon Sep 17 00:00:00 2001
From: Steven Whitehouse [EMAIL PROTECTED]
Date: Mon, 6 Nov 2006 11:51:00 -0500
Subject: [DECNET] Endianess fixes (try #2)

Here are some fixes to endianess problems spotted by Al Viro.

Cc: Al Viro [EMAIL PROTECTED]
Cc: Patrick Caulfield [EMAIL PROTECTED]
Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]
---
 net/decnet/af_decnet.c  |   25 +
 net/decnet/dn_nsp_in.c  |8 
 net/decnet/dn_nsp_out.c |2 +-
 net/decnet/dn_rules.c   |4 ++--
 4 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index 3456cd3..21f20f2 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -166,7 +166,7 @@ static struct hlist_head *dn_find_list(s
if (scp-addr.sdn_flags  SDF_WILD)
return hlist_empty(dn_wild_sk) ? dn_wild_sk : NULL;
 
-   return dn_sk_hash[scp-addrloc  DN_SK_HASH_MASK];
+   return dn_sk_hash[dn_ntohs(scp-addrloc)  DN_SK_HASH_MASK];
 }
 
 /* 
@@ -180,7 +180,7 @@ static int check_port(__le16 port)
if (port == 0)
return -1;
 
-   sk_for_each(sk, node, dn_sk_hash[port  DN_SK_HASH_MASK]) {
+   sk_for_each(sk, node, dn_sk_hash[dn_ntohs(port)  DN_SK_HASH_MASK]) {
struct dn_scp *scp = DN_SK(sk);
if (scp-addrloc == port)
return -1;
@@ -194,12 +194,12 @@ static unsigned short port_alloc(struct 
 static unsigned short port = 0x2000;
unsigned short i_port = port;
 
-   while(check_port(++port) != 0) {
+   while(check_port(dn_htons(++port)) != 0) {
if (port == i_port)
return 0;
}
 
-   scp-addrloc = port;
+   scp-addrloc = dn_htons(port);
 
return 1;
 }
@@ -418,7 +418,7 @@ struct sock *dn_find_by_skb(struct sk_bu
struct dn_scp *scp;
 
read_lock(dn_hash_lock);
-   sk_for_each(sk, node, dn_sk_hash[cb-dst_port  DN_SK_HASH_MASK]) {
+   sk_for_each(sk, node, dn_sk_hash[dn_ntohs(cb-dst_port)  
DN_SK_HASH_MASK]) {
scp = DN_SK(sk);
if (cb-src != dn_saddr2dn(scp-peer))
continue;
@@ -1016,13 +1016,14 @@ static void dn_access_copy(struct sk_buf
 
 static void dn_user_copy(struct sk_buff *skb, struct optdata_dn *opt)
 {
-unsigned char *ptr = skb-data;
-
-opt-opt_optl   = *ptr++;
-opt-opt_status = 0;
-memcpy(opt-opt_data, ptr, opt-opt_optl);
-skb_pull(skb, dn_ntohs(opt-opt_optl) + 1);
-
+   unsigned char *ptr = skb-data;
+   u16 len = *ptr++; /* yes, it's 8bit on the wire */
+
+   BUG_ON(len  16); /* we've checked the contents earlier */
+   opt-opt_optl   = dn_htons(len);
+   opt-opt_status = 0;
+   memcpy(opt-opt_data, ptr, len);
+   skb_pull(skb, len + 1);
 }
 
 static struct sk_buff *dn_wait_for_connect(struct sock *sk, long *timeo)
diff --git a/net/decnet/dn_nsp_in.c b/net/decnet/dn_nsp_in.c
index 72ecc6e..7683d4f 100644
--- a/net/decnet/dn_nsp_in.c
+++ b/net/decnet/dn_nsp_in.c
@@ -360,9 +360,9 @@ static void dn_nsp_conn_conf(struct sock
scp-max_window = decnet_no_fc_max_cwnd;
 
if (skb-len  0) {
-   unsigned char dlen = *skb-data;
+   u16 dlen = *skb-data;
if ((dlen = 16)  (dlen = skb-len)) {
-   scp-conndata_in.opt_optl = 
dn_htons((__u16)dlen);
+   scp-conndata_in.opt_optl = dn_htons(dlen);
memcpy(scp-conndata_in.opt_data, skb-data + 
1, dlen);
}
}
@@ -404,9 +404,9 @@ static void dn_nsp_disc_init(struct sock
memset(scp-discdata_in.opt_data, 0, 16);
 
if (skb-len  0) {
-   unsigned char dlen = *skb-data;
+   u16 dlen = *skb-data;
if ((dlen = 16)  (dlen = skb-len)) {
-   scp-discdata_in.opt_optl = dn_htons((__u16)dlen);
+   scp-discdata_in.opt_optl = dn_htons(dlen);
memcpy(scp-discdata_in.opt_data, skb-data + 1, dlen

Re: [DECNET] Endian bug fixes

2006-11-06 Thread Steven Whitehouse
Hi,

On Mon, 2006-11-06 at 10:34 +, Al Viro wrote:
 On Mon, Nov 06, 2006 at 10:32:43AM +, Al Viro wrote:
  On Mon, Nov 06, 2006 at 10:31:02AM +, Steven Whitehouse wrote:
   + opt-opt_optl   = dn_htons((__u16)*ptr++);
  
  Lose that cast; it's only confusing the things...
  
   + memcpy(opt-opt_data, ptr, dn_ntohs(opt-opt_optl));
   + skb_pull(skb, dn_ntohs(opt-opt_optl) + 1);
  
  ... and I'd actually do
  
  u16 len = *ptr++; /* yes, it's 8bit on the wire */
  opt-opt_optl   = dn_htons(len);
  BUG_ON(len  16); /* we've checked the contents earlier */
  memcpy(opt-opt_data, ptr, len);
  skb_pull(skb, len + 1);
 
 BTW, why the hell do we keep -opt_optl __le16 internally?  If we ever
 pass it to userland, fine, but let's convert to __le16 *then*...

Really the only thing that we do with this data is verify it and pass to
userland. It does mean that getsockopt() is simpler for just being able
to use copy_to_user() with a ptr  len depending on which of the
structures the user has requested rather than having to convert each
field of each structure for example.

I'm not sure its worth changing it now, for saving just one byte per
socket in this case,

Steve.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [NET] reduce sizeof(struct flow)

2006-10-18 Thread Steven Whitehouse
Hi,

On Tue, Oct 17, 2006 at 11:53:36PM -0700, David Miller wrote:
 From: Eric Dumazet [EMAIL PROTECTED]
 Date: Wed, 18 Oct 2006 07:42:17 +0200
 
  How many people are using DECNET and want to pay the price of this
  20 bytes dnports structure ?
 
Point taken :-) Eric, you also need to add a || defined(CONFIG_DECNET_MODULE)
I think in your patch, if you want to make this optional.

 I bet you could make that cost get hidden by careful rearrangement
 of the struct flow, or adjustment of the implementation.
 
 BTW, I see assignments to these fields, and as a specific example
 uli_u.dnports.objnamel but no use of it.  Perhaps much of dnports can
 even be deleted outright. :-)
 -

Its not used at the moment[*], but would be required for any kind of flow
tracking. The objnum field, could be folded into the objname field I
guess on the basis that objnamel == 0 means objname[0] represents the objnum,
but that doesn't really buy much.

Looking at the rearrangement option, and the relative lengths of
ipv6 and DECnet node addresses, dn_u is a lot smaller than ip6_u and thus
the obj[num|name|namel] fields could be moved into that structure.
Even after doing this, dn_u would still be shorter than ip6_u, although
12 bytes longer than ip4_u (if my counting is correct). Is that an
acceptable solution?

[*] By which I mean, as you've already alluded to, that its set up correctly
but not currently read/tested by anything yet.

Steve.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[DECNET] Fix input routing bug

2006-10-18 Thread Steven Whitehouse

This patch fixes a silly bug that has been in the input routing code
for some time. It results in trying to send to a node directly when
the origin of the packet is via the default router.

Its been tested by Alan Kemmerer [EMAIL PROTECTED] who
reported the bug and its a fairly obvious fix for a typo.

Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]
Signed-off-by: Patrick Caulfield [EMAIL PROTECTED]
Cc: Alan Kemmerer [EMAIL PROTECTED]

diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index dd0761e..810ab9c 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1270,7 +1270,6 @@ #endif
goto e_inval;
 
res.type = RTN_LOCAL;
-   flags |= RTCF_DIRECTSRC;
} else {
__le16 src_map = fl.fld_src;
free_res = 1;
@@ -1341,7 +1340,7 @@ #endif
goto make_route;
 
/* Packet was intra-ethernet, so we know its on-link */
-   if (cb-rt_flags | DN_RT_F_IE) {
+   if (cb-rt_flags  DN_RT_F_IE) {
gateway = cb-src;
flags |= RTCF_DIRECTSRC;
goto make_route;
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [NET] reduce sizeof(struct flow)

2006-10-18 Thread Steven Whitehouse
Hi,

 
  Its not used at the moment[*], but would be required for any kind of flow
  tracking. The objnum field, could be folded into the objname field I
  guess on the basis that objnamel == 0 means objname[0] represents the
  objnum, but that doesn't really buy much.
 
 Well, as I privately said to David, objnamel is used, at least on 2.6.18 tree
 ( net/decnet/dn_route.c function compare_keys() 
 memcmp() will read this field and check before comparing objname[]
 
 So it is set by dn_sk_ports_copy(), and used by compare_keys()

It is set by dn_sk_ports_copy() as you say, but the obj[num|name|namel]
fields are not used as a key in compare_keys() at the moment, although
all the other fields are used there.

Since the recent bug fix to this area of the code (memcmp was
comparing uninitialised padding) its easier to see what is
being compared.

In fact I suspect the reason that the obj fields were not put
in the dn_u where they'd take up a lot less room was
because the memcmp covered only dn_u and not the rest of the
structure. It was a while ago that I wrote this and my memory
is failing me as to the exact reason I did it like that.
 
 
  Looking at the rearrangement option, and the relative lengths of
  ipv6 and DECnet node addresses, dn_u is a lot smaller than ip6_u and thus
  the obj[num|name|namel] fields could be moved into that structure.
  Even after doing this, dn_u would still be shorter than ip6_u, although
  12 bytes longer than ip4_u (if my counting is correct). Is that an
  acceptable solution?
 
 Well, most of my machines dont use IPV6 nor DECNET :)
 
Its still an improvement over the current situation, even though
it might be (probably is) possible to improve it further,

Steve.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Steven Whitehouse
Hi,

On Wed, Oct 11, 2006 at 11:05:04AM +0200, Michael S. Tsirkin wrote:
 Quoting r. David Miller [EMAIL PROTECTED]:
  Subject: Re: Dropping NETIF_F_SG since no checksum feature.
  
  From: Michael S. Tsirkin [EMAIL PROTECTED]
  Date: Wed, 11 Oct 2006 02:13:38 +0200
  
   Maybe I can patch linux to allow SG without checksum?
   Dave, maybe you could drop a hint or two on whether this is worthwhile
   and what are the issues that need addressing to make this work?
   
   I imagine it's not just the matter of changing net/core/dev.c :).
  
  You can't, it's a quality of implementation issue.  We sendfile()
  pages directly out of the filesystem page cache without any
  blocking of modifications to the page contents, and the only way
  that works is if the card computes the checksum for us.
  
  If we sendfile() a page directly, we must compute a correct checksum
  no matter what the contents.  We can't do this on the cpu before the
  data hits the device because another thread of execution can go in and
  modify the page contents which would invalidate the checksum and thus
  invalidating the packet.  We cannot allow this.
  
  Blocking modifications is too expensive, so that's not an option
  either.
  


I would argue that SG does make sense without checksum for protocols that
don't need/use a checksum. DECnet for example could do zero-copy without
caring about the checksum since it doesn't have one. One of these days
I'll get around to writing that bit of code :-)
 
 But copying still works fine, does it not?
 Dave, could you clarify this please?
 
 ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
  size_t size, int flags)
 {
 ssize_t res;
 struct sock *sk = sock-sk;
 
 if (!(sk-sk_route_caps  NETIF_F_SG) ||
 !(sk-sk_route_caps  NETIF_F_ALL_CSUM))
 return sock_no_sendpage(sock, page, offset, size, flags);
 
 
 So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
 data will be copied over rather than sent directly.
 So why does dev.c have to force set NETIF_F_SG to off then?

I agree with that analysis,

Steve.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Steven Whitehouse
Hi,

On Wed, Oct 11, 2006 at 05:01:03PM +0200, Michael S. Tsirkin wrote:
 Quoting Steven Whitehouse [EMAIL PROTECTED]:
   ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
size_t size, int flags)
   {
   ssize_t res;
   struct sock *sk = sock-sk;
   
   if (!(sk-sk_route_caps  NETIF_F_SG) ||
   !(sk-sk_route_caps  NETIF_F_ALL_CSUM))
   return sock_no_sendpage(sock, page, offset, size, flags);
   
   
   So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
   data will be copied over rather than sent directly.
   So why does dev.c have to force set NETIF_F_SG to off then?
  
  I agree with that analysis,
 
 So, would you Ack something like the following then?


In so far as I'm able to ack it, then yes, but with the following
caveats: that you also need to look at the tcp code's checks for
NETIF_F_SG (aside from the interface to tcp_sendpage which I think
we've agreed is ok) and ensure that this patch will not change their
behaviour, and here I'm thinking of the test in net/ipv4/tcp.c:select_size()
in particular - there may be others but thats the only one I can think
of off the top of my head. I think this is what davem was getting at
with his comment about copy  sum for smaller packets.

Also all subject to approval by davem and shemminger of course :-)

My general feeling is that devices should advertise the features that
they actually have and that the protocols should make the decision
as to which ones to use or not depending on the combinations available
(which I think is pretty much your argument).

Steve.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [DECNET]: Add support for fwmark masks in routing rules

2006-08-25 Thread Steven Whitehouse
Hi,

On Fri, Aug 25, 2006 at 02:14:12PM +0200, Patrick McHardy wrote:

 [DECNET]: Add support for fwmark masks in routing rules
 
 Add support for fwmark masks. For compatibility a mask of 0x is used
 when a mark value != 0 is sent without a mask.
 
 Signed-off-by: Patrick McHardy [EMAIL PROTECTED]
Acked-by: Steven Whitehouse [EMAIL PROTECTED]

Looks good,

Steve.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[DECNET] Fix to multiple tables routing

2006-08-11 Thread Steven Whitehouse

Here is a fix to Patrick McHardy's increase number of routing tables patch
for DECnet. I did just test this and it appears to be working fine with
this patch.

Cc: Patrick McHardy [EMAIL PROTECTED]
Cc: Patrick Caulfield [EMAIL PROTECTED]
Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]

diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
index 878312f..c8d9411 100644
--- a/net/decnet/dn_rules.c
+++ b/net/decnet/dn_rules.c
@@ -116,6 +116,7 @@ static struct nla_policy dn_fib_rule_pol
[FRA_SRC]   = { .type = NLA_U16 },
[FRA_DST]   = { .type = NLA_U16 },
[FRA_FWMARK]= { .type = NLA_U32 },
+   [FRA_TABLE] = { .type = NLA_U32 },
 };
 
 static int dn_fib_rule_match(struct fib_rule *rule, struct flowi *fl, int 
flags)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [DECNET] Fix to multiple tables routing

2006-08-11 Thread Steven Whitehouse
Hi,

On Fri, Aug 11, 2006 at 05:22:17PM +0200, Patrick McHardy wrote:
 Steven Whitehouse wrote:
  Here is a fix to Patrick McHardy's increase number of routing tables patch
  for DECnet. I did just test this and it appears to be working fine with
  this patch.
  
  Cc: Patrick McHardy [EMAIL PROTECTED]
  Cc: Patrick Caulfield [EMAIL PROTECTED]
  Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]
  
  diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
  index 878312f..c8d9411 100644
  --- a/net/decnet/dn_rules.c
  +++ b/net/decnet/dn_rules.c
  @@ -116,6 +116,7 @@ static struct nla_policy dn_fib_rule_pol
  [FRA_SRC]   = { .type = NLA_U16 },
  [FRA_DST]   = { .type = NLA_U16 },
  [FRA_FWMARK]= { .type = NLA_U32 },
  +   [FRA_TABLE] = { .type = NLA_U32 },
   };
 
 Looks good. BTW, I noticed something in the DecNET fib_rule conversion
 that looks like a bug:
 
 The policy includes this for FRA_SRC/FRA_DST:
 
 [FRA_SRC]   = { .type = NLA_U16 },
 [FRA_DST]   = { .type = NLA_U16 },
 
 But in dn_fib_rule_compare it is used like this:
 
 if (tb[FRA_SRC]  (r-src != nla_get_u32(tb[FRA_SRC])))
 return 0;
 
 if (tb[FRA_DST]  (r-dst != nla_get_u32(tb[FRA_DST])))
 return 0;
 
 I think this might create problems depending on the endianness.

Yes, good spotting :-) I'll send a patch shortly,

Steve.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[DECNET] Fix to decnet rules compare function

2006-08-11 Thread Steven Whitehouse


Here is a fix to the DECnet rules compare function where we used
32bit values rather than 16bit values. Spotted by Patrick McHardy.

Cc: Patrick McHardy [EMAIL PROTECTED]
Cc: Patrick Caulfield [EMAIL PROTECTED]
Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]


diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
index 878312f..977bb56 100644
--- a/net/decnet/dn_rules.c
+++ b/net/decnet/dn_rules.c
@@ -196,10 +197,10 @@ static int dn_fib_rule_compare(struct fi
return 0;
 #endif
 
-   if (tb[FRA_SRC]  (r-src != nla_get_u32(tb[FRA_SRC])))
+   if (tb[FRA_SRC]  (r-src != nla_get_u16(tb[FRA_SRC])))
return 0;
 
-   if (tb[FRA_DST]  (r-dst != nla_get_u32(tb[FRA_DST])))
+   if (tb[FRA_DST]  (r-dst != nla_get_u16(tb[FRA_DST])))
return 0;
 
return 1;
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Rules and groups

2006-08-09 Thread Steven Whitehouse
Hi,

With your new protocol independent rules code, I see that there is an
entry in struct fib_rules_ops for a netlink group for notification of
rule changes. For whatever reason (historical I guess) DECnet has never
had a nl group assigned for this, so I'm left with two choices when moving
the code over to ues your new scheme.

Either I could send a patch to put the content of notify_rule_change()
conditional on nlgroup being non-zero, or I could assign a suitable
group for DECnet rule changes. I'm leaning towards the latter as the
best solution, so that at least it matches up with the other protocols.

Is there anything to stop me using RTNLGRP_NOP3 for that? (suitable renamed
of course!) So far as I can see its never been assigned to anything else...

Steve.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Rules and groups

2006-08-09 Thread Steven Whitehouse
Hi,

On Wed, Aug 09, 2006 at 01:01:25AM -0700, David Miller wrote:
 From: Steven Whitehouse [EMAIL PROTECTED]
 Date: Wed, 9 Aug 2006 09:01:39 +0100
 
  Is there anything to stop me using RTNLGRP_NOP3 for that? (suitable
  renamed of course!) So far as I can see its never been assigned to
  anything else...
 
 I have no objection to you using it.

Excellent. Please find attached my attempt at doing the conversion to the
generic rules code. It appears to work ok for me. You can also pull this
patch from the decnet git tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/steve/decnet-2.6.19.git

which was cloned from net-2.6.19.git quite recently. In addition the
git tree has another patch which is in a following email.

Patrick: does this patch get your blessing?

Steve.

-

[DECnet] Covert rules to use generic code

This patch converts the DECnet rules code to use the generic
rules system created by Thomas Graf [EMAIL PROTECTED].

Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]

---

 include/linux/rtnetlink.h |3 
 include/net/dn_fib.h  |8 -
 net/decnet/Kconfig|1 
 net/decnet/af_decnet.c|1 
 net/decnet/dn_dev.c   |3 
 net/decnet/dn_fib.c   |1 
 net/decnet/dn_route.c |3 
 net/decnet/dn_rules.c |  492 +
 net/decnet/dn_table.c |1 
 9 files changed, 195 insertions(+), 318 deletions(-)

0ff88cb56e1674234a8aaacb7869bc004252c0c2
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index bf35353..ea02e7d 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -885,7 +885,8 @@ enum rtnetlink_groups {
RTNLGRP_NOP2,
RTNLGRP_DECnet_ROUTE,
 #define RTNLGRP_DECnet_ROUTE   RTNLGRP_DECnet_ROUTE
-   RTNLGRP_NOP3,
+   RTNLGRP_DECnet_RULE,
+#define RTNLGRP_DECnet_RULERTNLGRP_DECnet_RULE
RTNLGRP_NOP4,
RTNLGRP_IPV6_PREFIX,
 #define RTNLGRP_IPV6_PREFIXRTNLGRP_IPV6_PREFIX
diff --git a/include/net/dn_fib.h b/include/net/dn_fib.h
index a15dcf0..32bc8ce 100644
--- a/include/net/dn_fib.h
+++ b/include/net/dn_fib.h
@@ -22,7 +22,7 @@ struct dn_kern_rta
 };
 
 struct dn_fib_res {
-   struct dn_fib_rule *r;
+   struct fib_rule *r;
struct dn_fib_info *fi;
unsigned char prefixlen;
unsigned char nh_sel;
@@ -147,10 +147,8 @@ extern void dn_fib_table_cleanup(void);
  */
 extern void dn_fib_rules_init(void);
 extern void dn_fib_rules_cleanup(void);
-extern void dn_fib_rule_put(struct dn_fib_rule *);
-extern __le16 dn_fib_rules_policy(__le16 saddr, struct dn_fib_res *res, 
unsigned *flags);
 extern unsigned dnet_addr_type(__le16 addr);
-extern int dn_fib_lookup(const struct flowi *fl, struct dn_fib_res *res);
+extern int dn_fib_lookup(struct flowi *fl, struct dn_fib_res *res);
 
 /*
  * rtnetlink interface
@@ -176,7 +174,7 @@ static inline void dn_fib_res_put(struct
if (res-fi)
dn_fib_info_put(res-fi);
if (res-r)
-   dn_fib_rule_put(res-r);
+   fib_rule_put(res-r);
 }
 
 extern struct dn_fib_table *dn_fib_tables[];
diff --git a/net/decnet/Kconfig b/net/decnet/Kconfig
index 92f2ec4..36e72cb 100644
--- a/net/decnet/Kconfig
+++ b/net/decnet/Kconfig
@@ -27,6 +27,7 @@ config DECNET
 config DECNET_ROUTER
bool DECnet: router support (EXPERIMENTAL)
depends on DECNET  EXPERIMENTAL
+   select FIB_RULES
---help---
  Add support for turning your DECnet Endnode into a level 1 or 2
  router.  This is an experimental, but functional option.  If you
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index 5486247..70e0273 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -130,6 +130,7 @@ Version 0.0.62.1.110   07-aug-98   E
 #include linux/poll.h
 #include net/neighbour.h
 #include net/dst.h
+#include net/fib_rules.h
 #include net/dn.h
 #include net/dn_nsp.h
 #include net/dn_dev.h
diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index 476455f..9ab2e5b 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -45,6 +45,7 @@
 #include net/neighbour.h
 #include net/dst.h
 #include net/flow.h
+#include net/fib_rules.h
 #include net/dn.h
 #include net/dn_dev.h
 #include net/dn_route.h
@@ -1417,8 +1418,6 @@ static struct rtnetlink_link dnet_rtnetl
[RTM_DELROUTE - RTM_BASE] = { .doit = dn_fib_rtm_delroute,  },
[RTM_GETROUTE - RTM_BASE] = { .doit = dn_cache_getroute,
  .dumpit   = dn_fib_dump,  },
-   [RTM_NEWRULE  - RTM_BASE] = { .doit = dn_fib_rtm_newrule,   },
-   [RTM_DELRULE  - RTM_BASE] = { .doit = dn_fib_rtm_delrule,   },
[RTM_GETRULE  - RTM_BASE] = { .dumpit   = dn_fib_dump_rules,},
 #else
[RTM_GETROUTE - RTM_BASE] = { .doit = dn_cache_getroute,
diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c
index fa20e2e

[DECnet] Convert rwlock to spinlock

2006-08-09 Thread Steven Whitehouse
Hi,

[This is also in the DECnet git tree]

As per Stephen Hemminger's recent patch to ipv4/fib_semantics.c this
is the same change but for DECnet.

Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]

---

 net/decnet/dn_fib.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

ec32d8512a0878f0f48155bdd01f439c6bb80883
diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c
index 846df39..ed5fb5c 100644
--- a/net/decnet/dn_fib.c
+++ b/net/decnet/dn_fib.c
@@ -59,7 +59,7 @@ extern int dn_cache_dump(struct sk_buff 
 
 static DEFINE_SPINLOCK(dn_fib_multipath_lock);
 static struct dn_fib_info *dn_fib_info_list;
-static DEFINE_RWLOCK(dn_fib_info_lock);
+static DEFINE_SPINLOCK(dn_fib_info_lock);
 
 static struct
 {
@@ -97,7 +97,7 @@ void dn_fib_free_info(struct dn_fib_info
 
 void dn_fib_release_info(struct dn_fib_info *fi)
 {
-   write_lock(dn_fib_info_lock);
+   spin_lock(dn_fib_info_lock);
if (fi  --fi-fib_treeref == 0) {
if (fi-fib_next)
fi-fib_next-fib_prev = fi-fib_prev;
@@ -108,7 +108,7 @@ void dn_fib_release_info(struct dn_fib_i
fi-fib_dead = 1;
dn_fib_info_put(fi);
}
-   write_unlock(dn_fib_info_lock);
+   spin_unlock(dn_fib_info_lock);
 }
 
 static inline int dn_fib_nh_comp(const struct dn_fib_info *fi, const struct 
dn_fib_info *ofi)
@@ -379,13 +379,13 @@ link_it:
 
fi-fib_treeref++;
atomic_inc(fi-fib_clntref);
-   write_lock(dn_fib_info_lock);
+   spin_lock(dn_fib_info_lock);
fi-fib_next = dn_fib_info_list;
fi-fib_prev = NULL;
if (dn_fib_info_list)
dn_fib_info_list-fib_prev = fi;
dn_fib_info_list = fi;
-   write_unlock(dn_fib_info_lock);
+   spin_unlock(dn_fib_info_lock);
return fi;
 
 err_inval:
-- 
1.2.2

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC DECNET 04/04]: Increase number of possible routing tables to 2^32

2006-07-03 Thread Steven Whitehouse
Hi,

On Mon, Jul 03, 2006 at 09:53:05AM +0200, Patrick McHardy wrote:
 [DECNET]: Increase number of possible routing tables to 2^32
 
 Increase the nubmer of possible routing tables to 2^32 by replacing the
 fixed sized array of tables by a hash table.
 
 Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

I've had a quick look though the DECnet parts of this and it looks good,
atthough I've not had a chance to test it at all. Please cc Patrick Caulfield
on DECnet changes,

Steve.
 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: MPLS extension for pktgen

2006-03-22 Thread Steven Whitehouse
Hi,

On Tue, Mar 21, 2006 at 07:38:06PM +0100, Robert Olsson wrote:
 
 Steven Whitehouse writes:
 
   I've been looking into MPLS recently and so one of the first things that
   would be useful is a testbed to generate test traffic, and hence the
   attached patch to pktgen.
   
   If you have a moment to look over it, then please let me know if you
   would give it your blessing. The patch is against davem's current
   net-2.6.17 tree,
 
  Nice. Well never thought about mpls but it seems possible too. With mpls 
  enabled it seems send something my tcpdump does not understand so I trust 
  you there. I and it does not seem to brake standard ipv4 sending. So it 
  should be OK. 

My tcpdump (3.9.4-2 according to RPM) shows:

09:05:28.751494 MPLS (label 16, exp 0, ttl 10)
(label 32, exp 0, ttl 10)
(label 0 (IPv4 explicit NULL), exp 0, [S], ttl 10)
IP (tos 0x0, ttl  32, id 257, offset 0, flags [none], proto: UDP (17), 
length: 64) men-an-tol.chygwyn.com.discard  10.10.10.2.discard: [no cksum] 
UDP, length 36[|MPLS]
0x:  0001 000a 0002 000a  010a 4500 0040
0x0010:  0101  2011 7a6e 0a2c 0107 0a0a 0a02
0x0020:  0009 0009 002c  be9b e955  00c7
0x0030:  4421 0f0f 0001 f03d  0310  
0x0040:       

for the example label stack. So it looks correct to me, though I'm slightly
worried by the [|MPLS] I counted up the bytes manually and unless I've
made a mistake somewhere it appears ok.
 
  But I'll guess the mpls result code is not what you expected...
 
  echo mpls 0001000a,0002000a,000a/proc/net/pktgen/eth1
  cat /proc/net/pktgen/eth1 | grep Res
  Result: 000a
 
   sprintf(pg_result, OK: mpls=);
   for(n = 0; n  pkt_dev-nr_labels; n++)
sprintf(pg_result, %08x%s, ntohl(pkt_dev-labels[n]),
 n == pkt_dev-nr_labels-1 ?  : ,);
 
  Cheers.
 
   --ro

Yes, I'll send another patch with that fixed shortly :-)

Steve.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: MPLS extension for pktgen

2006-03-22 Thread Steven Whitehouse
Hi,

Here is the updated patch. If you are happy with this, then will you send
it on to Dave, or should I do that?

Steve.

---


Signed-off-by: Steven Whitehouse [EMAIL PROTECTED]

diff --git a/Documentation/networking/pktgen.txt 
b/Documentation/networking/pktgen.txt
--- a/Documentation/networking/pktgen.txt
+++ b/Documentation/networking/pktgen.txt
@@ -109,6 +109,22 @@ Examples:
  cycle through the port range.
  pgset udp_dst_max 9   set UDP destination port max.
 
+ pgset mpls 0001000a,0002000a,000a set MPLS labels (in this example
+ outer label=16,middle label=32,
+inner label=0 (IPv4 NULL)) Note that
+there must be no spaces between the
+arguments. Leading zeros are required.
+Do not set the bottom of stack bit,
+thats done automatically. If you do
+set the bottom of stack bit, that
+indicates that you want to randomly
+generate that address and the flag
+MPLS_RND will be turned on. You
+can have any mix of random and fixed
+labels in the label stack.
+
+ pgset mpls 0  turn off mpls (or any invalid argument works 
too!)
+
  pgset stop  aborts injection. Also, ^C aborts generator.
 
 
@@ -167,6 +183,8 @@ pkt_size 
 min_pkt_size
 max_pkt_size
 
+mpls
+
 udp_src_min
 udp_src_max
 
@@ -211,4 +229,4 @@ Grant Grundler for testing on IA-64 and 
 Stephen Hemminger, Andi Kleen, Dave Miller and many others.
 
 
-Good luck with the linux net-development.
\ No newline at end of file
+Good luck with the linux net-development.
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -106,6 +106,9 @@
  *
  * interruptible_sleep_on_timeout() replaced Nishanth Aravamudan [EMAIL 
PROTECTED] 
  * 050103
+ *
+ * MPLS support by Steven Whitehouse [EMAIL PROTECTED]
+ *
  */
 #include linux/sys.h
 #include linux/types.h
@@ -154,7 +157,7 @@
 #include asm/div64.h /* do_div */
 #include asm/timex.h
 
-#define VERSION  pktgen v2.66: Packet Generator for packet performance 
testing.\n
+#define VERSION  pktgen v2.67: Packet Generator for packet performance 
testing.\n
 
 /* #define PG_DEBUG(a) a */
 #define PG_DEBUG(a)
@@ -162,6 +165,8 @@
 /* The buckets are exponential in 'width' */
 #define LAT_BUCKETS_MAX 32
 #define IP_NAME_SZ 32
+#define MAX_MPLS_LABELS 16 /* This is the max label stack depth */
+#define MPLS_STACK_BOTTOM __constant_htonl(0x0100)
 
 /* Device flag bits */
 #define F_IPSRC_RND   (10)   /* IP-Src Random  */
@@ -172,6 +177,7 @@
 #define F_MACDST_RND  (15)   /* MAC-Dst Random */
 #define F_TXSIZE_RND  (16)   /* Transmit size is random */
 #define F_IPV6(17)   /* Interface in IPV6 Mode */
+#define F_MPLS_RND(18)   /* Random MPLS labels */
 
 /* Thread control flag bits */
 #define T_TERMINATE   (10)
@@ -278,6 +284,10 @@ struct pktgen_dev {
__u16 udp_dst_min;  /* inclusive, dest UDP port */
__u16 udp_dst_max;  /* exclusive, dest UDP port */
 
+   /* MPLS */
+   unsigned nr_labels; /* Depth of stack, 0 = no MPLS */
+   __be32 labels[MAX_MPLS_LABELS];
+
__u32 src_mac_count;/* How many MACs to iterate through */
__u32 dst_mac_count;/* How many MACs to iterate through */
 
@@ -623,9 +633,19 @@ static int pktgen_if_show(struct seq_fil
   pkt_dev-udp_dst_min, pkt_dev-udp_dst_max);
 
seq_printf(seq,
-   src_mac_count: %d  dst_mac_count: %d \n Flags: ,
+   src_mac_count: %d  dst_mac_count: %d\n,
   pkt_dev-src_mac_count, pkt_dev-dst_mac_count);
 
+   if (pkt_dev-nr_labels) {
+   unsigned i;
+   seq_printf(seq,  mpls: );
+   for(i = 0; i  pkt_dev-nr_labels; i++)
+   seq_printf(seq, %08x%s, ntohl(pkt_dev-labels[i]),
+  i == pkt_dev-nr_labels-1 ? \n : , );
+   }
+
+   seq_printf(seq,  Flags: );
+
if (pkt_dev-flags  F_IPV6)
seq_printf(seq, IPV6  );
 
@@ -644,6 +664,9 @@ static int pktgen_if_show(struct seq_fil
if (pkt_dev-flags  F_UDPDST_RND)
seq_printf(seq, UDPDST_RND  );
 
+   if (pkt_dev-flags  F_MPLS_RND)
+   seq_printf(seq,  MPLS_RND  );
+
if (pkt_dev-flags  F_MACSRC_RND)
seq_printf(seq, MACSRC_RND  );
 
@@ -691,6 +714,29 @@ static int pktgen_if_show(struct seq_fil
return 0;
 }
 
+
+static int

MPLS extension for pktgen

2006-03-21 Thread Steven Whitehouse
Hi,

I've been looking into MPLS recently and so one of the first things that
would be useful is a testbed to generate test traffic, and hence the
attached patch to pktgen.

If you have a moment to look over it, then please let me know if you
would give it your blessing. The patch is against davem's current
net-2.6.17 tree,


Steve.


--

diff --git a/Documentation/networking/pktgen.txt 
b/Documentation/networking/pktgen.txt
--- a/Documentation/networking/pktgen.txt
+++ b/Documentation/networking/pktgen.txt
@@ -109,6 +109,22 @@ Examples:
  cycle through the port range.
  pgset udp_dst_max 9   set UDP destination port max.
 
+ pgset mpls 0001000a,0002000a,000a set MPLS labels (in this example
+ outer label=16,middle label=32,
+inner label=0 (IPv4 NULL)) Note that
+there must be no spaces between the
+arguments. Leading zeros are required.
+Do not set the bottom of stack bit,
+thats done automatically. If you do
+set the bottom of stack bit, that
+indicates that you want to randomly
+generate that address and the flag
+MPLS_RND will be turned on. You
+can have any mix of random and fixed
+labels in the label stack.
+
+ pgset mpls 0  turn off mpls (or any invalid argument works 
too!)
+
  pgset stop  aborts injection. Also, ^C aborts generator.
 
 
@@ -167,6 +183,8 @@ pkt_size 
 min_pkt_size
 max_pkt_size
 
+mpls
+
 udp_src_min
 udp_src_max
 
@@ -211,4 +229,4 @@ Grant Grundler for testing on IA-64 and 
 Stephen Hemminger, Andi Kleen, Dave Miller and many others.
 
 
-Good luck with the linux net-development.
\ No newline at end of file
+Good luck with the linux net-development.
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -106,6 +106,9 @@
  *
  * interruptible_sleep_on_timeout() replaced Nishanth Aravamudan [EMAIL 
PROTECTED] 
  * 050103
+ *
+ * MPLS support by Steven Whitehouse [EMAIL PROTECTED]
+ *
  */
 #include linux/sys.h
 #include linux/types.h
@@ -154,7 +157,7 @@
 #include asm/div64.h /* do_div */
 #include asm/timex.h
 
-#define VERSION  pktgen v2.66: Packet Generator for packet performance 
testing.\n
+#define VERSION  pktgen v2.67: Packet Generator for packet performance 
testing.\n
 
 /* #define PG_DEBUG(a) a */
 #define PG_DEBUG(a)
@@ -162,6 +165,8 @@
 /* The buckets are exponential in 'width' */
 #define LAT_BUCKETS_MAX 32
 #define IP_NAME_SZ 32
+#define MAX_MPLS_LABELS 16 /* This is the max label stack depth */
+#define MPLS_STACK_BOTTOM __constant_htonl(0x0100)
 
 /* Device flag bits */
 #define F_IPSRC_RND   (10)   /* IP-Src Random  */
@@ -172,6 +177,7 @@
 #define F_MACDST_RND  (15)   /* MAC-Dst Random */
 #define F_TXSIZE_RND  (16)   /* Transmit size is random */
 #define F_IPV6(17)   /* Interface in IPV6 Mode */
+#define F_MPLS_RND(18)   /* Random MPLS labels */
 
 /* Thread control flag bits */
 #define T_TERMINATE   (10)
@@ -278,6 +284,10 @@ struct pktgen_dev {
__u16 udp_dst_min;  /* inclusive, dest UDP port */
__u16 udp_dst_max;  /* exclusive, dest UDP port */
 
+   /* MPLS */
+   unsigned nr_labels; /* Depth of stack, 0 = no MPLS */
+   __be32 labels[MAX_MPLS_LABELS];
+
__u32 src_mac_count;/* How many MACs to iterate through */
__u32 dst_mac_count;/* How many MACs to iterate through */
 
@@ -623,9 +633,19 @@ static int pktgen_if_show(struct seq_fil
   pkt_dev-udp_dst_min, pkt_dev-udp_dst_max);
 
seq_printf(seq,
-   src_mac_count: %d  dst_mac_count: %d \n Flags: ,
+   src_mac_count: %d  dst_mac_count: %d\n,
   pkt_dev-src_mac_count, pkt_dev-dst_mac_count);
 
+   if (pkt_dev-nr_labels) {
+   unsigned i;
+   seq_printf(seq,  mpls: );
+   for(i = 0; i  pkt_dev-nr_labels; i++)
+   seq_printf(seq, %08x%s, ntohl(pkt_dev-labels[i]),
+  i == pkt_dev-nr_labels-1 ? \n : , );
+   }
+
+   seq_printf(seq,  Flags: );
+
if (pkt_dev-flags  F_IPV6)
seq_printf(seq, IPV6  );
 
@@ -644,6 +664,9 @@ static int pktgen_if_show(struct seq_fil
if (pkt_dev-flags  F_UDPDST_RND)
seq_printf(seq, UDPDST_RND  );
 
+   if (pkt_dev-flags  F_MPLS_RND)
+   seq_printf(seq,  MPLS_RND  );
+
if (pkt_dev-flags