Re: [Cluster-devel] [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek
Hi, Can we solve the problem another way, by not taking refs on the glocks when we are iterating over them for the debugfs files? I assume that is the main issue here. We didn't used to take refs since the rcu locking was enough during the walk itself. We used to only keep track of the hash bucket and offset within the bucket when we dropped the rcu lock between calls to the iterator. I may have lost track of why that approach did not work? Steve. On 29/03/18 13:06, Andreas Gruenbacher wrote: Here's a second version of the patch (now a patch set) to eliminate rhashtable_walk_peek in gfs2. The first patch introduces lockref_put_not_zero, the inverse of lockref_get_not_zero. The second patch eliminates rhashtable_walk_peek in gfs2. In gfs2_glock_iter_next, the new lockref function from patch one is used to drop a lockref count as long as the count doesn't drop to zero. This is almost always the case; if there is a risk of dropping the last reference, we must defer that to a work queue because dropping the last reference may sleep. Thanks, Andreas Andreas Gruenbacher (2): lockref: Add lockref_put_not_zero gfs2: Stop using rhashtable_walk_peek fs/gfs2/glock.c | 47 --- include/linux/lockref.h | 1 + lib/lockref.c | 28 3 files changed, 57 insertions(+), 19 deletions(-)
Re: Question about SOCK_SEQPACKET
Hi, On 05/05/17 11:09, Sowmini Varadhan wrote: On (05/05/17 10:45), Steven Whitehouse wrote: I do wonder if the man page for recvmsg is wrong, or at least a bit confusing. SOCK_SEQPACKET is stream based not message based - it just happens to have EOR markers in the stream. There is no reason that the whole message needs to be returned in a single read, and in fact that would be impossible if the sender didn't insert any EOR markers but kept sending data beyond the size that the socket could buffer. I notice that man 7 socket says SOCK_SEQPACKET is for datagrams of fixed maximum length which is definitely wrong, as is the statement that a consumer has to read an entire packet with each system call. Which man page do you think is wrong here? The POSIX definition is here http://pubs.opengroup.org/onlinepubs/009695399/functions/recvmsg.html The description in http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_10.html says, "It is protocol-specific whether a maximum record size is imposed." In my machine (Ubuntu 4.4.0-72, and it is in socket(2), not socket(7), btw) doesnt have any references to max length, but I'm not sure I'd boldly assert "definitely wrong" about the requirement of having to read entire packet in a system call (see POSIX man page) --Sowmini Just before the part that you've quoted, the description for SOCK_SEQPACKET says: "The SOCK_SEQPACKET socket type is similar to the SOCK_STREAM type, and is also connection-oriented. The only difference between these types is that record boundaries are maintained using the SOCK_SEQPACKET type. A record can be sent using one or more output operations and received using one or more input operations, but a single operation never transfers parts of more than one record." The man page for socket says SOCK_SEQPACKET "Provides a sequenced, reliable, two-way connection-based data transmission path for datagrams of fixed maximum length" which is not true, because while there may be a length restriction, it is quite possible that there is not a length restriction (as per DECnet). It also says "a consumer is required to read an entire packet with each input system call" which is also contradicted by POSIX which says that a record can be "received using one or more input operations". So both statements in the man page are wrong, I think. I have to say that I'd not spotted the POSIX recvmsg wording before, which says "For message-based sockets, such as SOCK_DGRAM and SOCK_SEQPACKET, the entire message shall be read in a single operation" however that does contradict the earlier wording, where it explicitly says that multiple receive operations per record are ok for SOCK_SEQPACKET - at least if we assume that record == message in this case. Also, if this restriction was true (one message per recvmsg call) then MSG_EOR would never be needed on receive, since every recvmsg would be a single message/record only, and that same document does say that MSG_EOR can be set on receive for protocols which support it, Steve.
Re: Question about SOCK_SEQPACKET
Hi, On 05/05/17 06:18, Sam Kumar wrote: Hello, I have recently had occasion to use SOCK_SEQPACKET sockets on Linux, and noticed some odd behavior. When using sendmsg and recvmsg with these sockets, it seems that the "end-of-record" flag (MSG_EOR) is not being propagated correctly. It depends which protocol you are using as to whether that is true. SOCK_SEQPACKET is supposed to be identical to SOCK_STREAM except for the record separators. That is true for DECnet (but whether DECnet is still functional is another thing!) and not true for ax.25 which uses SOCK_SEQPACKET incorrectly. For AF_UNIX that you are using I'm not quite sure what would be expected. The man page for recvmsg(2) states: The msg_flags field in the msghdr is set on return of recvmsg(). It can contain several flags: MSG_EOR indicates end-of-record; the data returned completed a record (generally used with sockets of type SOCK_SEQPACKET). The man page for recvmsg(3) states: For message-based sockets, such as SOCK_DGRAM and SOCK_SEQPACKET, the entire message shall be read in a single operation. This leads me to believe that MSG_EOR should be set in the msghdr struct whenever recvmsg() returns data. However, I am not observing this flag ever being set, whether or not I set the MSG_EOR when sending the messages. If it helps you can take a look at the code I'm using. It is at https://github.com/samkumar/seqpacket-test/, commit 2a7dbc1f94bafce6950ee726bdd54da96945d083 (HEAD of master at the time of writing). Look at server.c and client.c (don't bother with goclient.go). The reason that I need to check MSG_EOR is that I need to distinguish between EOF and messages of length 0. For SOCK_STREAM sockets, a return value of 0 unambiguously means EOF, and for SOCK_DGRAM sockets a return value of 0 unambiguously means that a datagram of length 0 was received. Because SOCK_SEQPACKET is both connection-based and message-oriented, a return value of 0 is ambiguous. Based on my reading of the man pages, reading the MSG_EOR bit would let me disambiguate between EOF and a zero-length datagram, because MSG_EOR would be set for a zero-length datagram, but would not be set for EOF. If someone could please help me understand MSG_EOR, and how to distinguish between EOF and zero-length messages in a SOCK_SEQPACKET connection, I would definitely appreciate it! Thanks, Sam Kumar That would be my expectation of how it should work - if you ignore MSG_EOR on recvmsg then what you get is identical to a SOCK_STREAM, and that every call to recvmsg will return data from (at most) a single message, with MSG_EOR set if the end of that message has been reached. That is what POSIX says should happen anyway. I do wonder if the man page for recvmsg is wrong, or at least a bit confusing. SOCK_SEQPACKET is stream based not message based - it just happens to have EOR markers in the stream. There is no reason that the whole message needs to be returned in a single read, and in fact that would be impossible if the sender didn't insert any EOR markers but kept sending data beyond the size that the socket could buffer. I notice that man 7 socket says SOCK_SEQPACKET is for datagrams of fixed maximum length which is definitely wrong, as is the statement that a consumer has to read an entire packet with each system call. Also it is probably better to ask this question on netdev where it is likely to get more attention from the net developers, so I'm copying my reply there too, Steve.
Re: [Cluster-devel] [PATCH 4/6] dlm: use sctp 1-to-1 API
Hi, On 12/08/15 17:42, Marcelo Ricardo Leitner wrote: Em 12-08-2015 12:33, David Laight escreveu: From: Marcelo Ricardo Leitner Sent: 12 August 2015 14:16 Em 12-08-2015 07:23, David Laight escreveu: From: Marcelo Ricardo Leitner Sent: 11 August 2015 23:22 DLM is using 1-to-many API but in a 1-to-1 fashion. That is, it's not needed but this causes it to use sctp_do_peeloff() to mimic an kernel_accept() and this causes a symbol dependency on sctp module. By switching it to 1-to-1 API we can avoid this dependency and also reduce quite a lot of SCTP-specific code in lowcomms.c. ... You still need to enable sctp notifications (I think the patch deleted that code). Otherwise you don't get any kind of indication if the remote system 'resets' (ie sends an new INIT chunk) on an existing connection. Right, it would miss the restart event and could generate a corrupted tx/rx buffers by glueing parts of old messages with new ones. Except that it is SCTP so you'd expect DATA chunks to contain entire messages and so get unexpected message sequences rather than corrupt messages. I was thinking on cases where the buf for recvmsg is not enough to hold the chunk, so that the remaining is left for another attempt (sctp_recvmsg, around line 2130), but sounds like we won't purge rx buffer when the reset happens so that doesn't matter. The association is replaced, but the buffers are kept. Out of order messages aren't a problem for dlm. It can recover from that just fine, as it doesn't have a specific handshake at beginning or something like that and upper layers are agnostic to that state transition (disconnect/reconnect/...), this should be fine. I'm not sure thats true - DLM does rely on message ordering in some cases in order to ensure correct functioning. So depending on how SCTP is interfaced to DLM, it might potentially be an issue, Steve. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: My 802.3ad is my bond
Hi, On Wed, 2008-01-23 at 09:13 -0800, Jay Vosburgh wrote: Steven Whitehouse [EMAIL PROTECTED] wrote: [...] This commit: ece95f7fefe3afae19e641e1b3f5e64b00d5b948 seems to have caused a problem with parsing bond arguments as now only the numeric arguments seem to work (in modprobe.conf) and specifying 802.3ad fails. When I revert that patch in my local tree all seems ok. Thanks for the report; I know what the problem here is. I'll get a fix out. Also I notice that one of my two NICs now reports this: bonding: bond0: link status definitely down for interface eth0, disabling it bonding: bond0: Interface eth0 is already enslaved! bond0.5: no IPv6 routers present which I think is also new with this set of bonding updates, before it used to use both interfaces ok. I've not worked out which of the other patches causes this so far, but I can if its helpful, That would be helpful, as would some more details: e.g., the various options passed to bonding, the complete dmesg log, contents of /proc/net/bonding/bond0 [or whatever your interface is called], and anything else you think would be helpful. -J I've been through all the patches now, reverting each in turn and it seems that I still have the problem. Thats rather odd as I'm sure that I didn't have only one interface working before. When I get a moment I'll try and work out whats going on here, Steve. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
My 802.3ad is my bond
Hi, This commit: ece95f7fefe3afae19e641e1b3f5e64b00d5b948 seems to have caused a problem with parsing bond arguments as now only the numeric arguments seem to work (in modprobe.conf) and specifying 802.3ad fails. When I revert that patch in my local tree all seems ok. Also I notice that one of my two NICs now reports this: bonding: bond0: link status definitely down for interface eth0, disabling it bonding: bond0: Interface eth0 is already enslaved! bond0.5: no IPv6 routers present which I think is also new with this set of bonding updates, before it used to use both interfaces ok. I've not worked out which of the other patches causes this so far, but I can if its helpful, Steve. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: My 802.3ad is my bond
Hi, On Wed, 2008-01-23 at 09:13 -0800, Jay Vosburgh wrote: Steven Whitehouse [EMAIL PROTECTED] wrote: [...] This commit: ece95f7fefe3afae19e641e1b3f5e64b00d5b948 seems to have caused a problem with parsing bond arguments as now only the numeric arguments seem to work (in modprobe.conf) and specifying 802.3ad fails. When I revert that patch in my local tree all seems ok. Thanks for the report; I know what the problem here is. I'll get a fix out. Also I notice that one of my two NICs now reports this: bonding: bond0: link status definitely down for interface eth0, disabling it bonding: bond0: Interface eth0 is already enslaved! bond0.5: no IPv6 routers present which I think is also new with this set of bonding updates, before it used to use both interfaces ok. I've not worked out which of the other patches causes this so far, but I can if its helpful, That would be helpful, as would some more details: e.g., the various options passed to bonding, the complete dmesg log, contents of /proc/net/bonding/bond0 [or whatever your interface is called], and anything else you think would be helpful. -J --- -Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED] Ok. I'll try and work out which patch it is, but in the mean time: Ethernet Channel Bonding Driver: v3.2.3 (December 6, 2007) Bonding Mode: IEEE 802.3ad Dynamic link aggregation Transmit Hash Policy: layer2 (0) MII Status: up MII Polling Interval (ms): 100 Up Delay (ms): 0 Down Delay (ms): 0 802.3ad info LACP rate: slow Active Aggregator Info: Aggregator ID: 1 Number of ports: 1 Actor Key: 17 Partner Key: 4 Partner Mac Address: 00:12:a9:13:3f:67 Slave Interface: eth0 MII Status: down -- this one should be up as well Link Failure Count: 2 Permanent HW addr: 00:11:43:d7:75:74 Aggregator ID: 2 Slave Interface: eth1 MII Status: up Link Failure Count: 1 Permanent HW addr: 00:11:43:d7:75:75 Aggregator ID: 1 Bonding options: options bond0 miimon=100 mode=802.3ad I'll send the dmesg by private email as its quite large, Steve. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dn_neigh_table vs pneigh_lookup/pneigh_delete
Hi, On Wed, Dec 19, 2007 at 05:11:34PM +0300, Pavel Emelyanov wrote: Hi The pneigh_lookup/delete silently concerns, that the key_len of the table is more that 4 bytes. Look: u32 hash_val = *(u32 *)(pkey + key_len - 4); The hash_val for the proxy neighbor entry is four last bytes from the pkey. But the dn_neigh_tables' key_len is sizeof(__le16), that is 2, so setting (via netlink) the proxy neighbor entry for decnet will cause this entry to reside in arbitrary hash chain. Is this too bad for decnet? Thanks, Pavel The pneigh code is never used in DECnet, we only use the normal part of the neigh code where the hash function was changed so that it can be defined for each protocol (and thus doesn't suffer from this problem) Steve. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][DECNET] dn_nl_deladdr() almost always returns no error
Hi, On Thu, Nov 29, 2007 at 07:29:20PM +0300, Pavel Emelyanov wrote: As far as I see from the err variable initialization the dn_nl_deladdr() routine was designed to report errors like EADDRNOTAVAIL and probaby ENODEV. But the code sets this err to 0 after the first nlmsg_parse and goes on, returning this 0 in any case. Is this made deliberately, or the patch below is correct? The patch looks good to me. Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] Acked-by: Steven Whitehouse [EMAIL PROTECTED] Steve. --- diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c index 66e266f..3bc82dc 100644 --- a/net/decnet/dn_dev.c +++ b/net/decnet/dn_dev.c @@ -651,16 +651,18 @@ static int dn_nl_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg) struct dn_dev *dn_db; struct ifaddrmsg *ifm; struct dn_ifaddr *ifa, **ifap; - int err = -EADDRNOTAVAIL; + int err; err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, dn_ifa_policy); if (err 0) goto errout; + err = -ENODEV; ifm = nlmsg_data(nlh); if ((dn_db = dn_dev_by_index(ifm-ifa_index)) == NULL) goto errout; + err = -EADDRNOTAVAIL; for (ifap = dn_db-ifa_list; (ifa = *ifap); ifap = ifa-ifa_next) { if (tb[IFA_LOCAL] nla_memcmp(tb[IFA_LOCAL], ifa-ifa_local, 2)) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] decnet: addr module param can't be __initdata
Hi, Looks good, Feel free to add: Acked-by: Steven Whitehouse [EMAIL PROTECTED] Steve. On Thu, Nov 01, 2007 at 06:36:29PM +0300, Alexey Dobriyan wrote: sysfs keeps references to module parameters via /sys/module/*/parameters, so marking them as __initdata can't work. Steps to reproduce: modprobe decnet cat /sys/module/decnet/parameters/addr BUG: unable to handle kernel paging request at virtual address f88cd410 printing eip: c043dfd1 *pdpt = 4001 *pde = 04408067 *pte = Oops: [#1] PREEMPT SMP Modules linked in: decnet sunrpc af_packet ipv6 binfmt_misc dm_mirror dm_multipath dm_mod sbs sbshc fan dock battery backlight ac power_supply parport loop rtc_cmos serio_raw rtc_core rtc_lib button amd_rng sr_mod cdrom shpchp pci_hotplug ehci_hcd ohci_hcd uhci_hcd usbcore Pid: 2099, comm: cat Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56-bloat #6) EIP: 0060:[c043dfd1] EFLAGS: 00210286 CPU: 1 EIP is at param_get_int+0x6/0x20 EAX: c5c87000 EBX: ECX: 80d0 EDX: f88cd410 ESI: f8a108f8 EDI: c5c87000 EBP: ESP: c5c97f00 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process cat (pid: 2099, ti=c5c97000 task=c641ee10 task.ti=c5c97000) Stack: f8a108f8 c5c87000 c043db6b f8a108f1 0124 c043de1a c043db2f f88cd410 c5c87000 f8a16bc8 f8a16bc8 c043dd69 c043dd54 c5dd5078 c043dbc8 c5cc7580 c06ee64c c5d679f8 c04c431f c641f480 c641f484 1000 Call Trace: [c043db6b] param_array_get+0x3c/0x62 [c043de1a] param_array_set+0x0/0xdf [c043db2f] param_array_get+0x0/0x62 [c043dd69] param_attr_show+0x15/0x2d [c043dd54] param_attr_show+0x0/0x2d [c043dbc8] module_attr_show+0x1a/0x1e [c04c431f] sysfs_read_file+0x7c/0xd9 [c04c42a3] sysfs_read_file+0x0/0xd9 [c048d4b2] vfs_read+0x88/0x134 [c042090b] do_page_fault+0x0/0x7d5 [c048d920] sys_read+0x41/0x67 [c04080fa] sysenter_past_esp+0x6b/0xc1 === Code: 00 83 c4 0c c3 83 ec 0c 8b 52 10 8b 12 c7 44 24 04 27 dd 6c c0 89 04 24 89 54 24 08 e8 ea 01 0c 00 83 c4 0c c3 83 ec 0c 8b 52 10 8b 12 c7 44 24 04 58 8c 6a c0 89 04 24 89 54 24 08 e8 ca 01 0c EIP: [c043dfd1] param_get_int+0x6/0x20 SS:ESP 0068:c5c97f00 Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED] --- net/decnet/dn_dev.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/net/decnet/dn_dev.c +++ b/net/decnet/dn_dev.c @@ -1439,7 +1439,7 @@ static const struct file_operations dn_dev_seq_fops = { #endif /* CONFIG_PROC_FS */ -static int __initdata addr[2]; +static int addr[2]; module_param_array(addr, int, NULL, 0444); MODULE_PARM_DESC(addr, The DECnet address of this machine: area,node); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
DECnet routing rule resolution
Hi, One of the effects of the recent tidy up of the DECnet routing rules code is that we are no longer able to see the difference between reading a rule of type FR_ACT_UNREACHABLE returning -ENETUNREACH and simply running out of rules to look at, which also returns the same thing. The DECnet code used to return -ESRCH if it ran out of rules in which case the test in dn_route.c (which resulted in DECnet falling back to endnode routing in the -ESRCH case) no longer works. So there seems to be several options to try and solve this: one is to change the error return for running out of rules in fib_rules.c:fib_rules_lookup() to something else (but then that has a knock on effect in the ipv4 code). Another is to add the not found error return as a parameter in the struct fib_rules_ops so that both protocols can have their preferred error return. Both solutions seem a bit messy, so I thought I'd ask for some guidance on this before writing a patch, Steve. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/12] [DECNet]: Use rtnl registration interface
Hi, On Thu, Mar 22, 2007 at 02:00:04PM +0100, Thomas Graf wrote: Signed-off-by: Thomas Graf [EMAIL PROTECTED] Acked-by: Steven Whitehouse [EMAIL PROTECTED] for all the DECnet bits also the DECnet changes in the other patch I saw from you relating to the routing rules, Steve. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net: decnet handle a failure in neigh_parms_alloc (take 2)
Hi, On Wed, Jan 24, 2007 at 09:55:45PM -0700, Eric W. Biederman wrote: While enhancing the neighbour code to handle multiple network namespaces I noticed that decnet is assuming neigh_parms_alloc will allways succeed, which is clearly wrong. So handle the failure. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] Acked-by: Steven Whitehouse [EMAIL PROTECTED] Also you should cc Patrick as he is now the maintainer, Steve. --- net/decnet/dn_dev.c | 11 +-- 1 files changed, 9 insertions(+), 2 deletions(-) diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c index 324eb47..913e25a 100644 --- a/net/decnet/dn_dev.c +++ b/net/decnet/dn_dev.c @@ -1140,16 +1140,23 @@ struct dn_dev *dn_dev_create(struct net_device *dev, int *err) init_timer(dn_db-timer); dn_db-uptime = jiffies; + + dn_db-neigh_parms = neigh_parms_alloc(dev, dn_neigh_table); + if (!dn_db-neigh_parms) { + dev-dn_ptr = NULL; + kfree(dn_db); + return NULL; + } + if (dn_db-parms.up) { if (dn_db-parms.up(dev) 0) { + neigh_parms_release(dn_neigh_table, dn_db-neigh_parms); dev-dn_ptr = NULL; kfree(dn_db); return NULL; } } - dn_db-neigh_parms = neigh_parms_alloc(dev, dn_neigh_table); - dn_dev_sysctl_register(dev, dn_db-parms); dn_dev_set_timer(dev); -- 1.4.4.1.g278f - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHSET] packet mark fib rules work
Hi, On Thu, Nov 09, 2006 at 12:27:35PM +0100, Thomas Graf wrote: Renames nfmark to mark and remove the dependency on netfilter to ease usage by all subsystems. Also removes all the unneeded config options to enable routing by fwmark, it can be safely enabled by default. Moves mark selector code from per protocol part into the generic part and adds support for inverting selectors. Acked-by: Steven Whitehouse [EMAIL PROTECTED] so far as all the DECnet bits go. One question though... will you be adding later (as your slide #5 and #11 from your netconf presentation appear to imply) a way to set the mark from the routing table (presumably included in the nexthop info) ? Steve. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[DECNET] Endian bug fixes
Hi, Here is a patch which fixes some endianess problems. Patrick: since you have both big little endian machines at your disposal, can you test to ensure this is ok? Thanks, Steve. From ed3de950e89f8b02302308a2bedd59123ff3b88e Mon Sep 17 00:00:00 2001 From: Steven Whitehouse [EMAIL PROTECTED] Date: Mon, 6 Nov 2006 10:30:30 -0500 Subject: [PATCH] [DECNET] Endianess fixes Here are some fixes to endianess problems spotted by Al Viro. Cc: Al Viro [EMAIL PROTECTED] Cc: Patrick Caulfield [EMAIL PROTECTED] Signed-off-by: Steven Whitehouse [EMAIL PROTECTED] --- net/decnet/af_decnet.c | 21 ++--- net/decnet/dn_rules.c |4 ++-- 2 files changed, 12 insertions(+), 13 deletions(-) diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c index 3456cd3..37b4720 100644 --- a/net/decnet/af_decnet.c +++ b/net/decnet/af_decnet.c @@ -166,7 +166,7 @@ static struct hlist_head *dn_find_list(s if (scp-addr.sdn_flags SDF_WILD) return hlist_empty(dn_wild_sk) ? dn_wild_sk : NULL; - return dn_sk_hash[scp-addrloc DN_SK_HASH_MASK]; + return dn_sk_hash[dn_ntohs(scp-addrloc) DN_SK_HASH_MASK]; } /* @@ -180,7 +180,7 @@ static int check_port(__le16 port) if (port == 0) return -1; - sk_for_each(sk, node, dn_sk_hash[port DN_SK_HASH_MASK]) { + sk_for_each(sk, node, dn_sk_hash[dn_ntohs(port) DN_SK_HASH_MASK]) { struct dn_scp *scp = DN_SK(sk); if (scp-addrloc == port) return -1; @@ -194,12 +194,12 @@ static unsigned short port_alloc(struct static unsigned short port = 0x2000; unsigned short i_port = port; - while(check_port(++port) != 0) { + while(check_port(dn_htons(++port)) != 0) { if (port == i_port) return 0; } - scp-addrloc = port; + scp-addrloc = dn_htons(port); return 1; } @@ -418,7 +418,7 @@ struct sock *dn_find_by_skb(struct sk_bu struct dn_scp *scp; read_lock(dn_hash_lock); - sk_for_each(sk, node, dn_sk_hash[cb-dst_port DN_SK_HASH_MASK]) { + sk_for_each(sk, node, dn_sk_hash[dn_ntohs(cb-dst_port) DN_SK_HASH_MASK]) { scp = DN_SK(sk); if (cb-src != dn_saddr2dn(scp-peer)) continue; @@ -1016,13 +1016,12 @@ static void dn_access_copy(struct sk_buf static void dn_user_copy(struct sk_buff *skb, struct optdata_dn *opt) { -unsigned char *ptr = skb-data; - -opt-opt_optl = *ptr++; -opt-opt_status = 0; -memcpy(opt-opt_data, ptr, opt-opt_optl); -skb_pull(skb, dn_ntohs(opt-opt_optl) + 1); + unsigned char *ptr = skb-data; + opt-opt_optl = dn_htons((__u16)*ptr++); + opt-opt_status = 0; + memcpy(opt-opt_data, ptr, dn_ntohs(opt-opt_optl)); + skb_pull(skb, dn_ntohs(opt-opt_optl) + 1); } static struct sk_buff *dn_wait_for_connect(struct sock *sk, long *timeo) diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c index 3e0c882..590e0a7 100644 --- a/net/decnet/dn_rules.c +++ b/net/decnet/dn_rules.c @@ -124,8 +124,8 @@ static struct nla_policy dn_fib_rule_pol static int dn_fib_rule_match(struct fib_rule *rule, struct flowi *fl, int flags) { struct dn_fib_rule *r = (struct dn_fib_rule *)rule; - u16 daddr = fl-fld_dst; - u16 saddr = fl-fld_src; + __le16 daddr = fl-fld_dst; + __le16 saddr = fl-fld_src; if (((saddr ^ r-src) r-srcmask) || ((daddr ^ r-dst) r-dstmask)) -- 1.4.1
Re: [DECNET] Endian bug fixes
Hi, On Mon, 2006-11-06 at 10:32 +, Al Viro wrote: On Mon, Nov 06, 2006 at 10:31:02AM +, Steven Whitehouse wrote: + opt-opt_optl = dn_htons((__u16)*ptr++); Lose that cast; it's only confusing the things... + memcpy(opt-opt_data, ptr, dn_ntohs(opt-opt_optl)); + skb_pull(skb, dn_ntohs(opt-opt_optl) + 1); ... and I'd actually do u16 len = *ptr++; /* yes, it's 8bit on the wire */ opt-opt_optl = dn_htons(len); BUG_ON(len 16); /* we've checked the contents earlier */ memcpy(opt-opt_data, ptr, len); skb_pull(skb, len + 1); Ok, and I've also made the same change in the other places too, so far as its relevant in those cases. New patch attached, Steve. From a184f89a13fa292589f309057cc0775a8256a89e Mon Sep 17 00:00:00 2001 From: Steven Whitehouse [EMAIL PROTECTED] Date: Mon, 6 Nov 2006 11:51:00 -0500 Subject: [DECNET] Endianess fixes (try #2) Here are some fixes to endianess problems spotted by Al Viro. Cc: Al Viro [EMAIL PROTECTED] Cc: Patrick Caulfield [EMAIL PROTECTED] Signed-off-by: Steven Whitehouse [EMAIL PROTECTED] --- net/decnet/af_decnet.c | 25 + net/decnet/dn_nsp_in.c |8 net/decnet/dn_nsp_out.c |2 +- net/decnet/dn_rules.c |4 ++-- 4 files changed, 20 insertions(+), 19 deletions(-) diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c index 3456cd3..21f20f2 100644 --- a/net/decnet/af_decnet.c +++ b/net/decnet/af_decnet.c @@ -166,7 +166,7 @@ static struct hlist_head *dn_find_list(s if (scp-addr.sdn_flags SDF_WILD) return hlist_empty(dn_wild_sk) ? dn_wild_sk : NULL; - return dn_sk_hash[scp-addrloc DN_SK_HASH_MASK]; + return dn_sk_hash[dn_ntohs(scp-addrloc) DN_SK_HASH_MASK]; } /* @@ -180,7 +180,7 @@ static int check_port(__le16 port) if (port == 0) return -1; - sk_for_each(sk, node, dn_sk_hash[port DN_SK_HASH_MASK]) { + sk_for_each(sk, node, dn_sk_hash[dn_ntohs(port) DN_SK_HASH_MASK]) { struct dn_scp *scp = DN_SK(sk); if (scp-addrloc == port) return -1; @@ -194,12 +194,12 @@ static unsigned short port_alloc(struct static unsigned short port = 0x2000; unsigned short i_port = port; - while(check_port(++port) != 0) { + while(check_port(dn_htons(++port)) != 0) { if (port == i_port) return 0; } - scp-addrloc = port; + scp-addrloc = dn_htons(port); return 1; } @@ -418,7 +418,7 @@ struct sock *dn_find_by_skb(struct sk_bu struct dn_scp *scp; read_lock(dn_hash_lock); - sk_for_each(sk, node, dn_sk_hash[cb-dst_port DN_SK_HASH_MASK]) { + sk_for_each(sk, node, dn_sk_hash[dn_ntohs(cb-dst_port) DN_SK_HASH_MASK]) { scp = DN_SK(sk); if (cb-src != dn_saddr2dn(scp-peer)) continue; @@ -1016,13 +1016,14 @@ static void dn_access_copy(struct sk_buf static void dn_user_copy(struct sk_buff *skb, struct optdata_dn *opt) { -unsigned char *ptr = skb-data; - -opt-opt_optl = *ptr++; -opt-opt_status = 0; -memcpy(opt-opt_data, ptr, opt-opt_optl); -skb_pull(skb, dn_ntohs(opt-opt_optl) + 1); - + unsigned char *ptr = skb-data; + u16 len = *ptr++; /* yes, it's 8bit on the wire */ + + BUG_ON(len 16); /* we've checked the contents earlier */ + opt-opt_optl = dn_htons(len); + opt-opt_status = 0; + memcpy(opt-opt_data, ptr, len); + skb_pull(skb, len + 1); } static struct sk_buff *dn_wait_for_connect(struct sock *sk, long *timeo) diff --git a/net/decnet/dn_nsp_in.c b/net/decnet/dn_nsp_in.c index 72ecc6e..7683d4f 100644 --- a/net/decnet/dn_nsp_in.c +++ b/net/decnet/dn_nsp_in.c @@ -360,9 +360,9 @@ static void dn_nsp_conn_conf(struct sock scp-max_window = decnet_no_fc_max_cwnd; if (skb-len 0) { - unsigned char dlen = *skb-data; + u16 dlen = *skb-data; if ((dlen = 16) (dlen = skb-len)) { - scp-conndata_in.opt_optl = dn_htons((__u16)dlen); + scp-conndata_in.opt_optl = dn_htons(dlen); memcpy(scp-conndata_in.opt_data, skb-data + 1, dlen); } } @@ -404,9 +404,9 @@ static void dn_nsp_disc_init(struct sock memset(scp-discdata_in.opt_data, 0, 16); if (skb-len 0) { - unsigned char dlen = *skb-data; + u16 dlen = *skb-data; if ((dlen = 16) (dlen = skb-len)) { - scp-discdata_in.opt_optl = dn_htons((__u16)dlen); + scp-discdata_in.opt_optl = dn_htons(dlen); memcpy(scp-discdata_in.opt_data, skb-data + 1, dlen
Re: [DECNET] Endian bug fixes
Hi, On Mon, 2006-11-06 at 10:34 +, Al Viro wrote: On Mon, Nov 06, 2006 at 10:32:43AM +, Al Viro wrote: On Mon, Nov 06, 2006 at 10:31:02AM +, Steven Whitehouse wrote: + opt-opt_optl = dn_htons((__u16)*ptr++); Lose that cast; it's only confusing the things... + memcpy(opt-opt_data, ptr, dn_ntohs(opt-opt_optl)); + skb_pull(skb, dn_ntohs(opt-opt_optl) + 1); ... and I'd actually do u16 len = *ptr++; /* yes, it's 8bit on the wire */ opt-opt_optl = dn_htons(len); BUG_ON(len 16); /* we've checked the contents earlier */ memcpy(opt-opt_data, ptr, len); skb_pull(skb, len + 1); BTW, why the hell do we keep -opt_optl __le16 internally? If we ever pass it to userland, fine, but let's convert to __le16 *then*... Really the only thing that we do with this data is verify it and pass to userland. It does mean that getsockopt() is simpler for just being able to use copy_to_user() with a ptr len depending on which of the structures the user has requested rather than having to convert each field of each structure for example. I'm not sure its worth changing it now, for saving just one byte per socket in this case, Steve. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [NET] reduce sizeof(struct flow)
Hi, On Tue, Oct 17, 2006 at 11:53:36PM -0700, David Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Wed, 18 Oct 2006 07:42:17 +0200 How many people are using DECNET and want to pay the price of this 20 bytes dnports structure ? Point taken :-) Eric, you also need to add a || defined(CONFIG_DECNET_MODULE) I think in your patch, if you want to make this optional. I bet you could make that cost get hidden by careful rearrangement of the struct flow, or adjustment of the implementation. BTW, I see assignments to these fields, and as a specific example uli_u.dnports.objnamel but no use of it. Perhaps much of dnports can even be deleted outright. :-) - Its not used at the moment[*], but would be required for any kind of flow tracking. The objnum field, could be folded into the objname field I guess on the basis that objnamel == 0 means objname[0] represents the objnum, but that doesn't really buy much. Looking at the rearrangement option, and the relative lengths of ipv6 and DECnet node addresses, dn_u is a lot smaller than ip6_u and thus the obj[num|name|namel] fields could be moved into that structure. Even after doing this, dn_u would still be shorter than ip6_u, although 12 bytes longer than ip4_u (if my counting is correct). Is that an acceptable solution? [*] By which I mean, as you've already alluded to, that its set up correctly but not currently read/tested by anything yet. Steve. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[DECNET] Fix input routing bug
This patch fixes a silly bug that has been in the input routing code for some time. It results in trying to send to a node directly when the origin of the packet is via the default router. Its been tested by Alan Kemmerer [EMAIL PROTECTED] who reported the bug and its a fairly obvious fix for a typo. Signed-off-by: Steven Whitehouse [EMAIL PROTECTED] Signed-off-by: Patrick Caulfield [EMAIL PROTECTED] Cc: Alan Kemmerer [EMAIL PROTECTED] diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c index dd0761e..810ab9c 100644 --- a/net/decnet/dn_route.c +++ b/net/decnet/dn_route.c @@ -1270,7 +1270,6 @@ #endif goto e_inval; res.type = RTN_LOCAL; - flags |= RTCF_DIRECTSRC; } else { __le16 src_map = fl.fld_src; free_res = 1; @@ -1341,7 +1340,7 @@ #endif goto make_route; /* Packet was intra-ethernet, so we know its on-link */ - if (cb-rt_flags | DN_RT_F_IE) { + if (cb-rt_flags DN_RT_F_IE) { gateway = cb-src; flags |= RTCF_DIRECTSRC; goto make_route; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [NET] reduce sizeof(struct flow)
Hi, Its not used at the moment[*], but would be required for any kind of flow tracking. The objnum field, could be folded into the objname field I guess on the basis that objnamel == 0 means objname[0] represents the objnum, but that doesn't really buy much. Well, as I privately said to David, objnamel is used, at least on 2.6.18 tree ( net/decnet/dn_route.c function compare_keys() memcmp() will read this field and check before comparing objname[] So it is set by dn_sk_ports_copy(), and used by compare_keys() It is set by dn_sk_ports_copy() as you say, but the obj[num|name|namel] fields are not used as a key in compare_keys() at the moment, although all the other fields are used there. Since the recent bug fix to this area of the code (memcmp was comparing uninitialised padding) its easier to see what is being compared. In fact I suspect the reason that the obj fields were not put in the dn_u where they'd take up a lot less room was because the memcmp covered only dn_u and not the rest of the structure. It was a while ago that I wrote this and my memory is failing me as to the exact reason I did it like that. Looking at the rearrangement option, and the relative lengths of ipv6 and DECnet node addresses, dn_u is a lot smaller than ip6_u and thus the obj[num|name|namel] fields could be moved into that structure. Even after doing this, dn_u would still be shorter than ip6_u, although 12 bytes longer than ip4_u (if my counting is correct). Is that an acceptable solution? Well, most of my machines dont use IPV6 nor DECNET :) Its still an improvement over the current situation, even though it might be (probably is) possible to improve it further, Steve. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Dropping NETIF_F_SG since no checksum feature.
Hi, On Wed, Oct 11, 2006 at 11:05:04AM +0200, Michael S. Tsirkin wrote: Quoting r. David Miller [EMAIL PROTECTED]: Subject: Re: Dropping NETIF_F_SG since no checksum feature. From: Michael S. Tsirkin [EMAIL PROTECTED] Date: Wed, 11 Oct 2006 02:13:38 +0200 Maybe I can patch linux to allow SG without checksum? Dave, maybe you could drop a hint or two on whether this is worthwhile and what are the issues that need addressing to make this work? I imagine it's not just the matter of changing net/core/dev.c :). You can't, it's a quality of implementation issue. We sendfile() pages directly out of the filesystem page cache without any blocking of modifications to the page contents, and the only way that works is if the card computes the checksum for us. If we sendfile() a page directly, we must compute a correct checksum no matter what the contents. We can't do this on the cpu before the data hits the device because another thread of execution can go in and modify the page contents which would invalidate the checksum and thus invalidating the packet. We cannot allow this. Blocking modifications is too expensive, so that's not an option either. I would argue that SG does make sense without checksum for protocols that don't need/use a checksum. DECnet for example could do zero-copy without caring about the checksum since it doesn't have one. One of these days I'll get around to writing that bit of code :-) But copying still works fine, does it not? Dave, could you clarify this please? ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset, size_t size, int flags) { ssize_t res; struct sock *sk = sock-sk; if (!(sk-sk_route_caps NETIF_F_SG) || !(sk-sk_route_caps NETIF_F_ALL_CSUM)) return sock_no_sendpage(sock, page, offset, size, flags); So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM, data will be copied over rather than sent directly. So why does dev.c have to force set NETIF_F_SG to off then? I agree with that analysis, Steve. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Dropping NETIF_F_SG since no checksum feature.
Hi, On Wed, Oct 11, 2006 at 05:01:03PM +0200, Michael S. Tsirkin wrote: Quoting Steven Whitehouse [EMAIL PROTECTED]: ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset, size_t size, int flags) { ssize_t res; struct sock *sk = sock-sk; if (!(sk-sk_route_caps NETIF_F_SG) || !(sk-sk_route_caps NETIF_F_ALL_CSUM)) return sock_no_sendpage(sock, page, offset, size, flags); So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM, data will be copied over rather than sent directly. So why does dev.c have to force set NETIF_F_SG to off then? I agree with that analysis, So, would you Ack something like the following then? In so far as I'm able to ack it, then yes, but with the following caveats: that you also need to look at the tcp code's checks for NETIF_F_SG (aside from the interface to tcp_sendpage which I think we've agreed is ok) and ensure that this patch will not change their behaviour, and here I'm thinking of the test in net/ipv4/tcp.c:select_size() in particular - there may be others but thats the only one I can think of off the top of my head. I think this is what davem was getting at with his comment about copy sum for smaller packets. Also all subject to approval by davem and shemminger of course :-) My general feeling is that devices should advertise the features that they actually have and that the protocols should make the decision as to which ones to use or not depending on the combinations available (which I think is pretty much your argument). Steve. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [DECNET]: Add support for fwmark masks in routing rules
Hi, On Fri, Aug 25, 2006 at 02:14:12PM +0200, Patrick McHardy wrote: [DECNET]: Add support for fwmark masks in routing rules Add support for fwmark masks. For compatibility a mask of 0x is used when a mark value != 0 is sent without a mask. Signed-off-by: Patrick McHardy [EMAIL PROTECTED] Acked-by: Steven Whitehouse [EMAIL PROTECTED] Looks good, Steve. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[DECNET] Fix to multiple tables routing
Here is a fix to Patrick McHardy's increase number of routing tables patch for DECnet. I did just test this and it appears to be working fine with this patch. Cc: Patrick McHardy [EMAIL PROTECTED] Cc: Patrick Caulfield [EMAIL PROTECTED] Signed-off-by: Steven Whitehouse [EMAIL PROTECTED] diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c index 878312f..c8d9411 100644 --- a/net/decnet/dn_rules.c +++ b/net/decnet/dn_rules.c @@ -116,6 +116,7 @@ static struct nla_policy dn_fib_rule_pol [FRA_SRC] = { .type = NLA_U16 }, [FRA_DST] = { .type = NLA_U16 }, [FRA_FWMARK]= { .type = NLA_U32 }, + [FRA_TABLE] = { .type = NLA_U32 }, }; static int dn_fib_rule_match(struct fib_rule *rule, struct flowi *fl, int flags) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [DECNET] Fix to multiple tables routing
Hi, On Fri, Aug 11, 2006 at 05:22:17PM +0200, Patrick McHardy wrote: Steven Whitehouse wrote: Here is a fix to Patrick McHardy's increase number of routing tables patch for DECnet. I did just test this and it appears to be working fine with this patch. Cc: Patrick McHardy [EMAIL PROTECTED] Cc: Patrick Caulfield [EMAIL PROTECTED] Signed-off-by: Steven Whitehouse [EMAIL PROTECTED] diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c index 878312f..c8d9411 100644 --- a/net/decnet/dn_rules.c +++ b/net/decnet/dn_rules.c @@ -116,6 +116,7 @@ static struct nla_policy dn_fib_rule_pol [FRA_SRC] = { .type = NLA_U16 }, [FRA_DST] = { .type = NLA_U16 }, [FRA_FWMARK]= { .type = NLA_U32 }, + [FRA_TABLE] = { .type = NLA_U32 }, }; Looks good. BTW, I noticed something in the DecNET fib_rule conversion that looks like a bug: The policy includes this for FRA_SRC/FRA_DST: [FRA_SRC] = { .type = NLA_U16 }, [FRA_DST] = { .type = NLA_U16 }, But in dn_fib_rule_compare it is used like this: if (tb[FRA_SRC] (r-src != nla_get_u32(tb[FRA_SRC]))) return 0; if (tb[FRA_DST] (r-dst != nla_get_u32(tb[FRA_DST]))) return 0; I think this might create problems depending on the endianness. Yes, good spotting :-) I'll send a patch shortly, Steve. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[DECNET] Fix to decnet rules compare function
Here is a fix to the DECnet rules compare function where we used 32bit values rather than 16bit values. Spotted by Patrick McHardy. Cc: Patrick McHardy [EMAIL PROTECTED] Cc: Patrick Caulfield [EMAIL PROTECTED] Signed-off-by: Steven Whitehouse [EMAIL PROTECTED] diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c index 878312f..977bb56 100644 --- a/net/decnet/dn_rules.c +++ b/net/decnet/dn_rules.c @@ -196,10 +197,10 @@ static int dn_fib_rule_compare(struct fi return 0; #endif - if (tb[FRA_SRC] (r-src != nla_get_u32(tb[FRA_SRC]))) + if (tb[FRA_SRC] (r-src != nla_get_u16(tb[FRA_SRC]))) return 0; - if (tb[FRA_DST] (r-dst != nla_get_u32(tb[FRA_DST]))) + if (tb[FRA_DST] (r-dst != nla_get_u16(tb[FRA_DST]))) return 0; return 1; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Rules and groups
Hi, With your new protocol independent rules code, I see that there is an entry in struct fib_rules_ops for a netlink group for notification of rule changes. For whatever reason (historical I guess) DECnet has never had a nl group assigned for this, so I'm left with two choices when moving the code over to ues your new scheme. Either I could send a patch to put the content of notify_rule_change() conditional on nlgroup being non-zero, or I could assign a suitable group for DECnet rule changes. I'm leaning towards the latter as the best solution, so that at least it matches up with the other protocols. Is there anything to stop me using RTNLGRP_NOP3 for that? (suitable renamed of course!) So far as I can see its never been assigned to anything else... Steve. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Rules and groups
Hi, On Wed, Aug 09, 2006 at 01:01:25AM -0700, David Miller wrote: From: Steven Whitehouse [EMAIL PROTECTED] Date: Wed, 9 Aug 2006 09:01:39 +0100 Is there anything to stop me using RTNLGRP_NOP3 for that? (suitable renamed of course!) So far as I can see its never been assigned to anything else... I have no objection to you using it. Excellent. Please find attached my attempt at doing the conversion to the generic rules code. It appears to work ok for me. You can also pull this patch from the decnet git tree at: git://git.kernel.org/pub/scm/linux/kernel/git/steve/decnet-2.6.19.git which was cloned from net-2.6.19.git quite recently. In addition the git tree has another patch which is in a following email. Patrick: does this patch get your blessing? Steve. - [DECnet] Covert rules to use generic code This patch converts the DECnet rules code to use the generic rules system created by Thomas Graf [EMAIL PROTECTED]. Signed-off-by: Steven Whitehouse [EMAIL PROTECTED] --- include/linux/rtnetlink.h |3 include/net/dn_fib.h |8 - net/decnet/Kconfig|1 net/decnet/af_decnet.c|1 net/decnet/dn_dev.c |3 net/decnet/dn_fib.c |1 net/decnet/dn_route.c |3 net/decnet/dn_rules.c | 492 + net/decnet/dn_table.c |1 9 files changed, 195 insertions(+), 318 deletions(-) 0ff88cb56e1674234a8aaacb7869bc004252c0c2 diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index bf35353..ea02e7d 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -885,7 +885,8 @@ enum rtnetlink_groups { RTNLGRP_NOP2, RTNLGRP_DECnet_ROUTE, #define RTNLGRP_DECnet_ROUTE RTNLGRP_DECnet_ROUTE - RTNLGRP_NOP3, + RTNLGRP_DECnet_RULE, +#define RTNLGRP_DECnet_RULERTNLGRP_DECnet_RULE RTNLGRP_NOP4, RTNLGRP_IPV6_PREFIX, #define RTNLGRP_IPV6_PREFIXRTNLGRP_IPV6_PREFIX diff --git a/include/net/dn_fib.h b/include/net/dn_fib.h index a15dcf0..32bc8ce 100644 --- a/include/net/dn_fib.h +++ b/include/net/dn_fib.h @@ -22,7 +22,7 @@ struct dn_kern_rta }; struct dn_fib_res { - struct dn_fib_rule *r; + struct fib_rule *r; struct dn_fib_info *fi; unsigned char prefixlen; unsigned char nh_sel; @@ -147,10 +147,8 @@ extern void dn_fib_table_cleanup(void); */ extern void dn_fib_rules_init(void); extern void dn_fib_rules_cleanup(void); -extern void dn_fib_rule_put(struct dn_fib_rule *); -extern __le16 dn_fib_rules_policy(__le16 saddr, struct dn_fib_res *res, unsigned *flags); extern unsigned dnet_addr_type(__le16 addr); -extern int dn_fib_lookup(const struct flowi *fl, struct dn_fib_res *res); +extern int dn_fib_lookup(struct flowi *fl, struct dn_fib_res *res); /* * rtnetlink interface @@ -176,7 +174,7 @@ static inline void dn_fib_res_put(struct if (res-fi) dn_fib_info_put(res-fi); if (res-r) - dn_fib_rule_put(res-r); + fib_rule_put(res-r); } extern struct dn_fib_table *dn_fib_tables[]; diff --git a/net/decnet/Kconfig b/net/decnet/Kconfig index 92f2ec4..36e72cb 100644 --- a/net/decnet/Kconfig +++ b/net/decnet/Kconfig @@ -27,6 +27,7 @@ config DECNET config DECNET_ROUTER bool DECnet: router support (EXPERIMENTAL) depends on DECNET EXPERIMENTAL + select FIB_RULES ---help--- Add support for turning your DECnet Endnode into a level 1 or 2 router. This is an experimental, but functional option. If you diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c index 5486247..70e0273 100644 --- a/net/decnet/af_decnet.c +++ b/net/decnet/af_decnet.c @@ -130,6 +130,7 @@ Version 0.0.62.1.110 07-aug-98 E #include linux/poll.h #include net/neighbour.h #include net/dst.h +#include net/fib_rules.h #include net/dn.h #include net/dn_nsp.h #include net/dn_dev.h diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c index 476455f..9ab2e5b 100644 --- a/net/decnet/dn_dev.c +++ b/net/decnet/dn_dev.c @@ -45,6 +45,7 @@ #include net/neighbour.h #include net/dst.h #include net/flow.h +#include net/fib_rules.h #include net/dn.h #include net/dn_dev.h #include net/dn_route.h @@ -1417,8 +1418,6 @@ static struct rtnetlink_link dnet_rtnetl [RTM_DELROUTE - RTM_BASE] = { .doit = dn_fib_rtm_delroute, }, [RTM_GETROUTE - RTM_BASE] = { .doit = dn_cache_getroute, .dumpit = dn_fib_dump, }, - [RTM_NEWRULE - RTM_BASE] = { .doit = dn_fib_rtm_newrule, }, - [RTM_DELRULE - RTM_BASE] = { .doit = dn_fib_rtm_delrule, }, [RTM_GETRULE - RTM_BASE] = { .dumpit = dn_fib_dump_rules,}, #else [RTM_GETROUTE - RTM_BASE] = { .doit = dn_cache_getroute, diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c index fa20e2e
[DECnet] Convert rwlock to spinlock
Hi, [This is also in the DECnet git tree] As per Stephen Hemminger's recent patch to ipv4/fib_semantics.c this is the same change but for DECnet. Signed-off-by: Steven Whitehouse [EMAIL PROTECTED] --- net/decnet/dn_fib.c | 10 +- 1 files changed, 5 insertions(+), 5 deletions(-) ec32d8512a0878f0f48155bdd01f439c6bb80883 diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c index 846df39..ed5fb5c 100644 --- a/net/decnet/dn_fib.c +++ b/net/decnet/dn_fib.c @@ -59,7 +59,7 @@ extern int dn_cache_dump(struct sk_buff static DEFINE_SPINLOCK(dn_fib_multipath_lock); static struct dn_fib_info *dn_fib_info_list; -static DEFINE_RWLOCK(dn_fib_info_lock); +static DEFINE_SPINLOCK(dn_fib_info_lock); static struct { @@ -97,7 +97,7 @@ void dn_fib_free_info(struct dn_fib_info void dn_fib_release_info(struct dn_fib_info *fi) { - write_lock(dn_fib_info_lock); + spin_lock(dn_fib_info_lock); if (fi --fi-fib_treeref == 0) { if (fi-fib_next) fi-fib_next-fib_prev = fi-fib_prev; @@ -108,7 +108,7 @@ void dn_fib_release_info(struct dn_fib_i fi-fib_dead = 1; dn_fib_info_put(fi); } - write_unlock(dn_fib_info_lock); + spin_unlock(dn_fib_info_lock); } static inline int dn_fib_nh_comp(const struct dn_fib_info *fi, const struct dn_fib_info *ofi) @@ -379,13 +379,13 @@ link_it: fi-fib_treeref++; atomic_inc(fi-fib_clntref); - write_lock(dn_fib_info_lock); + spin_lock(dn_fib_info_lock); fi-fib_next = dn_fib_info_list; fi-fib_prev = NULL; if (dn_fib_info_list) dn_fib_info_list-fib_prev = fi; dn_fib_info_list = fi; - write_unlock(dn_fib_info_lock); + spin_unlock(dn_fib_info_lock); return fi; err_inval: -- 1.2.2 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC DECNET 04/04]: Increase number of possible routing tables to 2^32
Hi, On Mon, Jul 03, 2006 at 09:53:05AM +0200, Patrick McHardy wrote: [DECNET]: Increase number of possible routing tables to 2^32 Increase the nubmer of possible routing tables to 2^32 by replacing the fixed sized array of tables by a hash table. Signed-off-by: Patrick McHardy [EMAIL PROTECTED] I've had a quick look though the DECnet parts of this and it looks good, atthough I've not had a chance to test it at all. Please cc Patrick Caulfield on DECnet changes, Steve. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: MPLS extension for pktgen
Hi, On Tue, Mar 21, 2006 at 07:38:06PM +0100, Robert Olsson wrote: Steven Whitehouse writes: I've been looking into MPLS recently and so one of the first things that would be useful is a testbed to generate test traffic, and hence the attached patch to pktgen. If you have a moment to look over it, then please let me know if you would give it your blessing. The patch is against davem's current net-2.6.17 tree, Nice. Well never thought about mpls but it seems possible too. With mpls enabled it seems send something my tcpdump does not understand so I trust you there. I and it does not seem to brake standard ipv4 sending. So it should be OK. My tcpdump (3.9.4-2 according to RPM) shows: 09:05:28.751494 MPLS (label 16, exp 0, ttl 10) (label 32, exp 0, ttl 10) (label 0 (IPv4 explicit NULL), exp 0, [S], ttl 10) IP (tos 0x0, ttl 32, id 257, offset 0, flags [none], proto: UDP (17), length: 64) men-an-tol.chygwyn.com.discard 10.10.10.2.discard: [no cksum] UDP, length 36[|MPLS] 0x: 0001 000a 0002 000a 010a 4500 0040 0x0010: 0101 2011 7a6e 0a2c 0107 0a0a 0a02 0x0020: 0009 0009 002c be9b e955 00c7 0x0030: 4421 0f0f 0001 f03d 0310 0x0040: for the example label stack. So it looks correct to me, though I'm slightly worried by the [|MPLS] I counted up the bytes manually and unless I've made a mistake somewhere it appears ok. But I'll guess the mpls result code is not what you expected... echo mpls 0001000a,0002000a,000a/proc/net/pktgen/eth1 cat /proc/net/pktgen/eth1 | grep Res Result: 000a sprintf(pg_result, OK: mpls=); for(n = 0; n pkt_dev-nr_labels; n++) sprintf(pg_result, %08x%s, ntohl(pkt_dev-labels[n]), n == pkt_dev-nr_labels-1 ? : ,); Cheers. --ro Yes, I'll send another patch with that fixed shortly :-) Steve. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: MPLS extension for pktgen
Hi, Here is the updated patch. If you are happy with this, then will you send it on to Dave, or should I do that? Steve. --- Signed-off-by: Steven Whitehouse [EMAIL PROTECTED] diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt --- a/Documentation/networking/pktgen.txt +++ b/Documentation/networking/pktgen.txt @@ -109,6 +109,22 @@ Examples: cycle through the port range. pgset udp_dst_max 9 set UDP destination port max. + pgset mpls 0001000a,0002000a,000a set MPLS labels (in this example + outer label=16,middle label=32, +inner label=0 (IPv4 NULL)) Note that +there must be no spaces between the +arguments. Leading zeros are required. +Do not set the bottom of stack bit, +thats done automatically. If you do +set the bottom of stack bit, that +indicates that you want to randomly +generate that address and the flag +MPLS_RND will be turned on. You +can have any mix of random and fixed +labels in the label stack. + + pgset mpls 0 turn off mpls (or any invalid argument works too!) + pgset stop aborts injection. Also, ^C aborts generator. @@ -167,6 +183,8 @@ pkt_size min_pkt_size max_pkt_size +mpls + udp_src_min udp_src_max @@ -211,4 +229,4 @@ Grant Grundler for testing on IA-64 and Stephen Hemminger, Andi Kleen, Dave Miller and many others. -Good luck with the linux net-development. \ No newline at end of file +Good luck with the linux net-development. diff --git a/net/core/pktgen.c b/net/core/pktgen.c --- a/net/core/pktgen.c +++ b/net/core/pktgen.c @@ -106,6 +106,9 @@ * * interruptible_sleep_on_timeout() replaced Nishanth Aravamudan [EMAIL PROTECTED] * 050103 + * + * MPLS support by Steven Whitehouse [EMAIL PROTECTED] + * */ #include linux/sys.h #include linux/types.h @@ -154,7 +157,7 @@ #include asm/div64.h /* do_div */ #include asm/timex.h -#define VERSION pktgen v2.66: Packet Generator for packet performance testing.\n +#define VERSION pktgen v2.67: Packet Generator for packet performance testing.\n /* #define PG_DEBUG(a) a */ #define PG_DEBUG(a) @@ -162,6 +165,8 @@ /* The buckets are exponential in 'width' */ #define LAT_BUCKETS_MAX 32 #define IP_NAME_SZ 32 +#define MAX_MPLS_LABELS 16 /* This is the max label stack depth */ +#define MPLS_STACK_BOTTOM __constant_htonl(0x0100) /* Device flag bits */ #define F_IPSRC_RND (10) /* IP-Src Random */ @@ -172,6 +177,7 @@ #define F_MACDST_RND (15) /* MAC-Dst Random */ #define F_TXSIZE_RND (16) /* Transmit size is random */ #define F_IPV6(17) /* Interface in IPV6 Mode */ +#define F_MPLS_RND(18) /* Random MPLS labels */ /* Thread control flag bits */ #define T_TERMINATE (10) @@ -278,6 +284,10 @@ struct pktgen_dev { __u16 udp_dst_min; /* inclusive, dest UDP port */ __u16 udp_dst_max; /* exclusive, dest UDP port */ + /* MPLS */ + unsigned nr_labels; /* Depth of stack, 0 = no MPLS */ + __be32 labels[MAX_MPLS_LABELS]; + __u32 src_mac_count;/* How many MACs to iterate through */ __u32 dst_mac_count;/* How many MACs to iterate through */ @@ -623,9 +633,19 @@ static int pktgen_if_show(struct seq_fil pkt_dev-udp_dst_min, pkt_dev-udp_dst_max); seq_printf(seq, - src_mac_count: %d dst_mac_count: %d \n Flags: , + src_mac_count: %d dst_mac_count: %d\n, pkt_dev-src_mac_count, pkt_dev-dst_mac_count); + if (pkt_dev-nr_labels) { + unsigned i; + seq_printf(seq, mpls: ); + for(i = 0; i pkt_dev-nr_labels; i++) + seq_printf(seq, %08x%s, ntohl(pkt_dev-labels[i]), + i == pkt_dev-nr_labels-1 ? \n : , ); + } + + seq_printf(seq, Flags: ); + if (pkt_dev-flags F_IPV6) seq_printf(seq, IPV6 ); @@ -644,6 +664,9 @@ static int pktgen_if_show(struct seq_fil if (pkt_dev-flags F_UDPDST_RND) seq_printf(seq, UDPDST_RND ); + if (pkt_dev-flags F_MPLS_RND) + seq_printf(seq, MPLS_RND ); + if (pkt_dev-flags F_MACSRC_RND) seq_printf(seq, MACSRC_RND ); @@ -691,6 +714,29 @@ static int pktgen_if_show(struct seq_fil return 0; } + +static int
MPLS extension for pktgen
Hi, I've been looking into MPLS recently and so one of the first things that would be useful is a testbed to generate test traffic, and hence the attached patch to pktgen. If you have a moment to look over it, then please let me know if you would give it your blessing. The patch is against davem's current net-2.6.17 tree, Steve. -- diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt --- a/Documentation/networking/pktgen.txt +++ b/Documentation/networking/pktgen.txt @@ -109,6 +109,22 @@ Examples: cycle through the port range. pgset udp_dst_max 9 set UDP destination port max. + pgset mpls 0001000a,0002000a,000a set MPLS labels (in this example + outer label=16,middle label=32, +inner label=0 (IPv4 NULL)) Note that +there must be no spaces between the +arguments. Leading zeros are required. +Do not set the bottom of stack bit, +thats done automatically. If you do +set the bottom of stack bit, that +indicates that you want to randomly +generate that address and the flag +MPLS_RND will be turned on. You +can have any mix of random and fixed +labels in the label stack. + + pgset mpls 0 turn off mpls (or any invalid argument works too!) + pgset stop aborts injection. Also, ^C aborts generator. @@ -167,6 +183,8 @@ pkt_size min_pkt_size max_pkt_size +mpls + udp_src_min udp_src_max @@ -211,4 +229,4 @@ Grant Grundler for testing on IA-64 and Stephen Hemminger, Andi Kleen, Dave Miller and many others. -Good luck with the linux net-development. \ No newline at end of file +Good luck with the linux net-development. diff --git a/net/core/pktgen.c b/net/core/pktgen.c --- a/net/core/pktgen.c +++ b/net/core/pktgen.c @@ -106,6 +106,9 @@ * * interruptible_sleep_on_timeout() replaced Nishanth Aravamudan [EMAIL PROTECTED] * 050103 + * + * MPLS support by Steven Whitehouse [EMAIL PROTECTED] + * */ #include linux/sys.h #include linux/types.h @@ -154,7 +157,7 @@ #include asm/div64.h /* do_div */ #include asm/timex.h -#define VERSION pktgen v2.66: Packet Generator for packet performance testing.\n +#define VERSION pktgen v2.67: Packet Generator for packet performance testing.\n /* #define PG_DEBUG(a) a */ #define PG_DEBUG(a) @@ -162,6 +165,8 @@ /* The buckets are exponential in 'width' */ #define LAT_BUCKETS_MAX 32 #define IP_NAME_SZ 32 +#define MAX_MPLS_LABELS 16 /* This is the max label stack depth */ +#define MPLS_STACK_BOTTOM __constant_htonl(0x0100) /* Device flag bits */ #define F_IPSRC_RND (10) /* IP-Src Random */ @@ -172,6 +177,7 @@ #define F_MACDST_RND (15) /* MAC-Dst Random */ #define F_TXSIZE_RND (16) /* Transmit size is random */ #define F_IPV6(17) /* Interface in IPV6 Mode */ +#define F_MPLS_RND(18) /* Random MPLS labels */ /* Thread control flag bits */ #define T_TERMINATE (10) @@ -278,6 +284,10 @@ struct pktgen_dev { __u16 udp_dst_min; /* inclusive, dest UDP port */ __u16 udp_dst_max; /* exclusive, dest UDP port */ + /* MPLS */ + unsigned nr_labels; /* Depth of stack, 0 = no MPLS */ + __be32 labels[MAX_MPLS_LABELS]; + __u32 src_mac_count;/* How many MACs to iterate through */ __u32 dst_mac_count;/* How many MACs to iterate through */ @@ -623,9 +633,19 @@ static int pktgen_if_show(struct seq_fil pkt_dev-udp_dst_min, pkt_dev-udp_dst_max); seq_printf(seq, - src_mac_count: %d dst_mac_count: %d \n Flags: , + src_mac_count: %d dst_mac_count: %d\n, pkt_dev-src_mac_count, pkt_dev-dst_mac_count); + if (pkt_dev-nr_labels) { + unsigned i; + seq_printf(seq, mpls: ); + for(i = 0; i pkt_dev-nr_labels; i++) + seq_printf(seq, %08x%s, ntohl(pkt_dev-labels[i]), + i == pkt_dev-nr_labels-1 ? \n : , ); + } + + seq_printf(seq, Flags: ); + if (pkt_dev-flags F_IPV6) seq_printf(seq, IPV6 ); @@ -644,6 +664,9 @@ static int pktgen_if_show(struct seq_fil if (pkt_dev-flags F_UDPDST_RND) seq_printf(seq, UDPDST_RND ); + if (pkt_dev-flags F_MPLS_RND) + seq_printf(seq, MPLS_RND ); + if (pkt_dev-flags