Re: [PATCH] bonding: Allow tun-interfaces as slaves
Ping. Not sure if we reached some conclusion yet. If we did, I must have missed it. Jörn -- If you managed to confuse the compiler, you lost your human readers a long time ago. -- hummassa
Re: [PATCH] bonding: Allow tun-interfaces as slaves
On Wed, Aug 10, 2016 at 05:58:38PM -0700, Jay Vosburgh wrote: > Jörn Engel <jo...@purestorage.com> wrote: > >On Wed, Aug 10, 2016 at 02:26:49PM -0700, Jörn Engel wrote: > >> > >> Having to set one more parameter is a bit annoying. It would have to be > >> documented in a prominent place and people would still often miss it. > >> So I wonder if we can make the interface a little nicer. > >> > >> Options: > >> - If there are no slaves yet and the first slave added is tun, we trust > >> the users to know what they are doing. Automatically set > >> bond->params.fail_over_mac = BOND_FOM_KEEPMAC > >> Maybe do a printk to inform the user in case of a mistake. > > I don't think this is feasible, as I don't see a reliable way to > test for a slave being a tun device (ARPHRD_NONE is not just tun, and we > cannot check the ops as they are not statically built into the kernel). > I'm also not sure that heuristics are the proper way to enable this > functionality in general. I was looking for a slightly more generic thing than "is this device tun?". Something along the lines of "is this device L3 only?". We can always introduce a new flag and have tun set the flag. Naïve me thought ARPHRD_NONE might already match what I was looking for. But if such an approach causes problems for others, it is a non-starter. > >> - If we get an error and the slave device is tun, do a printk giving the > >> user enough information to find this parameter. > > This could probably be done as a change the existing logic, e.g., > > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c > index 1f276fa30ba6..019c1a689aae 100644 > --- a/drivers/net/bonding/bond_main.c > +++ b/drivers/net/bonding/bond_main.c > @@ -1443,6 +1443,9 @@ int bond_enslave(struct net_device *bond_dev, struct > net_device *slave_dev) > res = -EOPNOTSUPP; > goto err_undo_flags; > } > + } else if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP && > +bond->params.fail_over_mac != BOND_FOM_KEEPMAC) { > + netdev_err(bond_dev, "The slave device > specified does not support setting the MAC address, but fail_over_mac is not > set to keepmac\n"); > } > } > > I haven't tested this, and I'm not sure it will get all corner > cases correct, but this should basically cover it. Nit: Indentation is wrong (two tabs instead of one). It should provide enough information for anyone that reads kernel messages. Works for me. [588380.721349] bond1: Adding slave tun0 [588380.721402] bond1: The slave device specified does not support setting the MAC address [588380.721404] bond1: The slave device specified does not support setting the MAC address, but fail_over_mac is not set to keepmac Jörn -- It is easier to lead men to combat, stirring up their passion, than to restrain them and direct them toward the patient labours of peace. -- Andre Gide
Re: [PATCH] bonding: Allow tun-interfaces as slaves
On Wed, Aug 10, 2016 at 02:26:49PM -0700, Jörn Engel wrote: > > Having to set one more parameter is a bit annoying. It would have to be > documented in a prominent place and people would still often miss it. > So I wonder if we can make the interface a little nicer. > > Options: > - If there are no slaves yet and the first slave added is tun, we trust > the users to know what they are doing. Automatically set > bond->params.fail_over_mac = BOND_FOM_KEEPMAC > Maybe do a printk to inform the user in case of a mistake. > - If we get an error and the slave device is tun, do a printk giving the > user enough information to find this parameter. > > I'm leaning towards the former, but you probably know a reason why I am > wrong again. Patch below is an implementation of the former. Not sure if something like this is worth considering. Jörn -- To announce that there must be no criticism of the President, or that we are to stand by the President, right or wrong, is not only unpatriotic and servile, but is morally treasonable to the American public. -- Theodore Roosevelt, Kansas City Star, 1918 diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 1f276fa30ba6..306909a44fab 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1482,8 +1482,9 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) */ ether_addr_copy(new_slave->perm_hwaddr, slave_dev->dev_addr); - if (!bond->params.fail_over_mac || - BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) { + if (bond_dev->type != ARPHRD_NONE && + (!bond->params.fail_over_mac || +BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)) { /* Set slave to master's mac address. The application already * set the master's mac address to that of the first slave */ -- 2.1.4
Re: [PATCH] bonding: Allow tun-interfaces as slaves
On Tue, Aug 09, 2016 at 04:51:04PM -0700, Jay Vosburgh wrote: > > This will cause balance-rr to add the slave to the bond if any > device's dev_set_mac_address call fails. > > If a bond of regular Ethernet devices is connected to a static > link aggregation (Etherchannel channel group), a set_mac failure would > result in that slave having a different MAC address than the bond, which > in turn would cause traffic inbound from the switch to that slave to be > dropped (as the destination MAC would not pass the device MAC filters). > > The failure check for the set_mac call serves a legitimate > purpose, and I don't believe we should bypass it without making the > bypass an option that is explicitly enabled for those special cases that > need it. > > E.g., something like the following (which I have not tested); > this would also need documentation and iproute2 updates to go with it. > This would be enabled with "fail_over_mac=keepmac". Thank you! Tested-by: Jörn Engel <jo...@purestorage.com> Having to set one more parameter is a bit annoying. It would have to be documented in a prominent place and people would still often miss it. So I wonder if we can make the interface a little nicer. Options: - If there are no slaves yet and the first slave added is tun, we trust the users to know what they are doing. Automatically set bond->params.fail_over_mac = BOND_FOM_KEEPMAC Maybe do a printk to inform the user in case of a mistake. - If we get an error and the slave device is tun, do a printk giving the user enough information to find this parameter. I'm leaning towards the former, but you probably know a reason why I am wrong again. Jörn -- For a successful technology, reality must take precedence over public relations, for nature cannot be fooled. -- Richard Feynman
Re: [PATCH] bonding: Allow tun-interfaces as slaves
On Tue, Aug 09, 2016 at 12:06:36PM -0700, David Miller wrote: > > On Tue, Aug 09, 2016 at 09:28:45PM +0800, Ding Tianhong wrote: > > > > Simply not checking errors when setting the mac address solves the > > problem for me. No new features needed. > > But it only works in certain modes. > > So the best we can do is enforce the MAC address setting in the > modes that absolutely require it. We cannot ignore the MAC > address setting unilaterally. Something like this? [PATCH] bonding: Allow tun-interfaces as slaves in balance-rr mode Up until 00503b6f702e (part of 3.14-rc1), the bonding driver could be used to enslave tun-interfaces. 00503b6f702e broke that behaviour, afaics as an unintended side-effect. For the purpose of bond-over-tun in balance-rr mode, simply ignoring the error from dev_set_mac_address() is good enough. Signed-off-by: Joern Engel--- drivers/net/bonding/bond_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 1f276fa30ba6..2f686bfe4304 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1490,7 +1490,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len); addr.sa_family = slave_dev->type; res = dev_set_mac_address(slave_dev, ); - if (res) { + /* round-robin mode works fine without a mac address */ + if (res && BOND_MODE(bond) != BOND_MODE_ROUNDROBIN) { netdev_dbg(bond_dev, "Error %d calling set_mac_address\n", res); goto err_restore_mtu; } -- 2.1.4
Re: [PATCH] bonding: Allow tun-interfaces as slaves
On Tue, Aug 09, 2016 at 11:21:31AM -0700, Jay Vosburgh wrote: > Jörn Engel <jo...@purestorage.com> wrote: > >On Tue, Aug 09, 2016 at 10:18:41AM +0800, Ding Tianhong wrote: > >> > >> I don't understand your problem clearly, can you explain more about how > >> the 00503b6f702e break tun-interfaces > >> and we will try to fix it. > > > >Here is a trivial testcase: > >openvpn --mktun --dev tun0 > >echo +tun0 > /sys/class/net/bond0/bonding/slaves > > > >Worked fine before your patch, no longer works after your patch. Works > >again after my patch. > > Could you describe your use case a bit further? Are you bonding > together multiple VPN tunnels? Yes. Specificaly I use "ssh -w" to create tunnels. Ssh is single-threaded, so the tunnel is too slow. Aggregate a bunch and you get closer to link speed. Alternative would be pfSense. Afaics that easily beats anything Linux can offer. I'm just more familiar with Linux and trust ssh security more than most alternatives. > This may be a regression, but since the patch that nominally > introduced it was 2 years ago, the impact appears to be very narrow. Did you check the dates on the other two bug reports? Anyone experiencing the problem and checking google will come to the conclusion that you don't care and not bother sending yet another bug report. You then come to the conclusion that users don't care. > >> and more, dev_set_mac_address will change the salver's mac address, some > >> nic don't support to change the mac address and > >> could not work as bond slave, so we need to check the return value, I > >> don't think this patch has any effective improvement. > > > >Using bonding in balance-rr mode, there doesn't seem to be a need to > >change the mac address. I suppose you might care in other modes, but I > >don't. > > The balance-rr mode (as well as the -xor mode) is designed to > interoperate with a Cisco Etherchannel-style static link aggregation, > which requires all members to have the same MAC address for proper > function. Linux was designed to be a terminal for dialup to a university in Helsinki, if memory serves. Sometimes it is a good thing to work in ways the design never intended. Jörn -- A defeated army first battles and then seeks victory. -- Sun Tzu
Re: [PATCH] bonding: Allow tun-interfaces as slaves
On Tue, Aug 09, 2016 at 09:28:45PM +0800, Ding Tianhong wrote: > > This patch is a simple solution for this problem, but I don't think it is the > right solution, the bond is a virtual device base on L2, > if the slave has no mac address, it will break the design principle, so we > need to think more about it. The important point is: it worked. It solved a problem that at least three people cared enough about to send a bug report. Now it doesn't work anymore. That is a regression. Whether or not L2 has always been a design principle for bonding can be argued as well. But in the face of a regression, I suggest we fix the regression. > I think if the bonding dev has to support L3 virtual device, we need to add > new bond features to distinguish the dev and make the > bond xmit and transfer without the mac address. Simply not checking errors when setting the mac address solves the problem for me. No new features needed. If you want to retain error handling, you can make those checks conditional on the mode. In balance-rr or broadcast mode, ignore the error. I don't need and haven't tested broadcast mode, but it doesn't seem to depend on any L2 attributes either. Jörn -- Do not stop an army on its way home. -- Sun Tzu
Re: [PATCH] bonding: Allow tun-interfaces as slaves
Hello Tianhong! On Tue, Aug 09, 2016 at 10:18:41AM +0800, Ding Tianhong wrote: > > I don't understand your problem clearly, can you explain more about how the > 00503b6f702e break tun-interfaces > and we will try to fix it. Here is a trivial testcase: openvpn --mktun --dev tun0 echo +tun0 > /sys/class/net/bond0/bonding/slaves Worked fine before your patch, no longer works after your patch. Works again after my patch. > and more, dev_set_mac_address will change the salver's mac address, some nic > don't support to change the mac address and > could not work as bond slave, so we need to check the return value, I don't > think this patch has any effective improvement. Using bonding in balance-rr mode, there doesn't seem to be a need to change the mac address. I suppose you might care in other modes, but I don't. Jörn -- Time? What's that? Time is only worth what you do with it. -- Theo de Raadt
[PATCH] bonding: Allow tun-interfaces as slaves
Up until 00503b6f702e (part of 3.14-rc1), the bonding driver could be used to enslave tun-interfaces. 00503b6f702e broke that behaviour, afaics as an unintended side-effect. For the purpose of bond-over-tun in balance-rr mode, simply ignoring the error from dev_set_mac_address() is good enough. I am not familiar enough with the code to judge what new problems this patch might introduce. Signed-off-by: Joern Engel--- drivers/net/bonding/bond_main.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 1f276fa30ba6..bc5dba847f50 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1489,11 +1489,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) */ memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len); addr.sa_family = slave_dev->type; - res = dev_set_mac_address(slave_dev, ); - if (res) { - netdev_dbg(bond_dev, "Error %d calling set_mac_address\n", res); - goto err_restore_mtu; - } + dev_set_mac_address(slave_dev, ); } /* set slave flag before open to prevent IPv6 addrconf */ @@ -1777,7 +1773,6 @@ err_restore_mac: dev_set_mac_address(slave_dev, ); } -err_restore_mtu: dev_set_mtu(slave_dev, new_slave->original_mtu); err_free: -- 2.1.4
Re: [Regression] Bonding no longer support tun-interfaces
Redirected by Davem. Is there a mailing list or a maintainer for regressions? There used to be, but I've been out of the loop for a while. On Mon, Aug 08, 2016 at 02:15:30PM -0700, Jörn Engel wrote: > This has been reported (and ignored) before: > http://lkml.iu.edu/hypermail/linux/kernel/1407.2/03790.html > https://bugzilla.kernel.org/show_bug.cgi?id=89161 > > Regression was introduced by: > > commit 00503b6f702e (refs/bisect/bad) > Author: dingtianhong <dingtianh...@huawei.com> > Date: Sat Jan 25 13:00:29 2014 +0800 > > bonding: fail_over_mac should only affect AB mode at enslave and removal > processing > > According to bonding.txt, the fail_over_ma should only affect > active-backup mode, > but I found that the fail_over_mac could be set to active or follow in all > modes, this will cause new slave could not be set to bond's MAC address at > enslave processing and restore its own MAC address at removal processing. > > The correct way to fix the problem is that we should not add restrictions > when > setting options, just need to modify the bond enslave and removal > processing > to check the mode in addition to fail_over_mac when setting a slave's MAC > during > enslavement. The change active slave processing already only calls the > fail_over_mac > function when in active-backup mode. > > Thanks for Jay's suggestion. > > The patch also modify the pr_warning() to pr_warn(). > > Cc: Jay Vosburgh <fu...@us.ibm.com> > Cc: Veaceslav Falico <vfal...@redhat.com> > Cc: Andy Gospodarek <a...@greyhouse.net> > Signed-off-by: Ding Tianhong <dingtianh...@huawei.com> > Signed-off-by: David S. Miller <da...@davemloft.net> > > Since I never needed bonding or tun-interfaces before, I come late to > the party. Some 6k lines have changed in the bonding driver since the > regression got in two years ago. So a simple revert is unlikely to lead > to happiness. > > But I absolutely need that functionality and would rather run a 3.13 > kernel than live with the regression. dingtianhong, any suggestions? > > Jörn > > -- > It is a cliché that most clichés are true, but then, like most clichés, > that cliché is untrue. > -- Stephen Fry Jörn -- I can say that I spend most of my time fixing bugs even if I have lots of new features to implement in mind, but I give bugs more priority. -- Andrea Arcangeli, 2000
Re: [alsa-devel] [BUG] New Kernel Bugs
On Thu, 15 November 2007 13:26:51 +0100, Rene Herman wrote: Can you please just shelve this crap? You have a way of knowing that ALSA will accept you and that is knowing or assuming that the ALSA project doesn't consist of drooling retards. Well, my experience with moderation has been that moderated mails are stuck in some queue for weeks. Two seperate lists, neither of them was alsa. If also is doing a better job, great. But it still has to live with the general reputation of non-subscriber moderation. When a project list goes to the difficulty of moderating non-subscribers it has made the explicit choice to _not_ become subscriber only. Then refusing valid non-subscribers after all makes no sense whatsoever. I'm sorry you got your feelings hurt by that other list but it was no doubt an accident; take it up with them. Been there, done that. In spite of people not being drooling retards, the amount of time and effort they invest into either moderation or improving the ruleset is quite limited. Problems persist. And even without mails being held hostage for weeks, every single moderation mail is annoying. Like the one I'm sure to receive after sending this out. Jörn -- Joern's library part 5: http://www.faqs.org/faqs/compression-faq/part2/section-9.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] New Kernel Bugs
On Tue, 13 November 2007 15:18:07 -0500, Mark Lord wrote: I just find it weird that something can be known broken for several -rc* kernels before I happen to install it, discover it's broken on my own machine, and then I track it down, fix it, and submit the patch, generally all within a couple of hours. Where the heck was the dude(ess) that broke it ?? AWOL. And when I receive hostility from the maintainers of said code for fixing their bugs, well.. that really motivates me to continue reporting new ones.. Given a decent bug report, I agree that having the bug not looked at is shameful. But what can a developer do if a bug report effectively reads there is some bug somewhere in recent kernels? How can I know that in this particular case it is my bug that I introduced? It could just as easily be 50 other people and none of them are eager to debug it unless they suspect it to be their bug. This is a common problem and fairly unrelated to linux in general or the kernel in particular. Who is going to be the sucker that figures out which developer the bug belongs to? And I have yet to find a project, commercial or opensource, where volunteers flock to become such a sucker. One option is to push this role to the bug reporter. Another is to strong-arm some developers into this role, by whatever means. A third would be for $LARGE_COMPANY to hire some people. If you have a better idea or would volunteer your time, I'd be grateful. Simply blaming one side, whether bug reporter or a random developer, for not being the sucker doesn't help anyone. Jörn -- Joern's library part 2: http://www.art.net/~hopkins/Don/unix-haters/tirix/embarrassing-memo.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] New Kernel Bugs
On Tue, 13 November 2007 13:56:58 -0800, Andrew Morton wrote: It's relatively common that a regression in subsystem A will manifest as a failure in subsystem B, and the report initially lands on the desk of the subsystem B developers. But that's OK. The subsystem B people are the ones with the expertise to be able to work out where the bug resides and to help the subsystem A people understand what went wrong. Alas, sometimes the B people will just roll eyes and do nothing because they know the problem wasn't in their code. Sometimes. And sometimes the A people will ignore the B people after the root cause has been worked out. Do you have a good idea how to shame A into action? Should I put you on Cc:? Right now I'm in the eye-rolling phase. Jörn -- The cost of changing business rules is much more expensive for software than for a secretaty. -- unknown - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] lro: Generic Large Receive Offload for TCP traffic
On Fri, 3 August 2007 14:41:19 +0200, Jan-Bernd Themann wrote: This patch provides generic Large Receive Offload (LRO) functionality for IPv4/TCP traffic. LRO combines received tcp packets to a single larger tcp packet and passes them then to the network stack in order to increase performance (throughput). The interface supports two modes: Drivers can either pass SKBs or fragment lists to the LRO engine. Maybe this is a stupid question, but why is LRO done at the device driver level? If it is a unversal performance benefit, I would have expected it to be done generically, i.e. have all packets moved into network layer pass through LRO instead. +void lro_flush_pkt(struct net_lro_mgr *lro_mgr, +struct iphdr *iph, struct tcphdr *tcph); In particular this bit looks like it should be driven by a timeout, which would be settable via /proc/sys/net/core/lro_timeout or similar. Jörn -- Rules of Optimization: Rule 1: Don't do it. Rule 2 (for experts only): Don't do it yet. -- M.A. Jackson - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6.19 PATCH 1/7] ehea: interface to network stack
On Mon, 21 August 2006 14:23:53 +0200, Jan-Bernd Themann wrote: Is it valid (common in the kernel environment) to treat NULL as 0 after a memset and thus to forget about initialization? Yes. According to C99, An implementation might conveivably have codes for floating zero and/or null pointer other than all bits zero. But as long as you don't use floating point (you shouldn't) and don't redefine NULL to something other than (void*)0, this is rather inconceivable for the kernel. Jörn -- Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6.19 PATCH 3/7] ehea: queue management
On Fri, 18 August 2006 15:25:11 +0200, Thomas Klein wrote: Arjan van de Ven wrote: + queue-queue_length = nr_of_pages * pagesize; + queue-queue_pages = vmalloc(nr_of_pages * sizeof(void *)); wow... is this really so large that it warrants a vmalloc()??? Agreed: Replaced with kmalloc() kzalloc() and you can remove the memset() as well. Jörn -- Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming. -- Rob Pike - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6.19 PATCH 3/7] ehea: queue management
On Fri, 18 August 2006 13:31:19 +0200, Jan-Bernd Themann wrote: + if (queue-current_q_offset queue-queue_length) { + queue-current_q_offset -= queue-pagesize; + retvalue = NULL; + } + else if u64) retvalue) (EHEA_PAGESIZE-1)) != 0) { } else if (((u64) retvalue) (EHEA_PAGESIZE-1)) { + if (hret H_SUCCESS) { Do you have a reason to keep H_SUCCESS? + if(qp_attr-rq_count 1) + hw_queue_dtor(qp-hw_rqueue2); + if(qp_attr-rq_count 2) Small amount of whitespace damage. Jörn -- Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface. -- Doug MacIlroy - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6] ehea: interface to network stack
On Sat, 12 August 2006 06:56:24 +1000, Anton Blanchard wrote: + + skb_index = ((index - i + + port_res-skb_arr_sq_len) +% port_res-skb_arr_sq_len); This is going to force an expensive divide. Its much better to change this to the simpler and quicker: i++; if (i max) i = 0; Is a conditional cheaper than a divide? In case of a misprediction I would assume it to be significantly slower and I don't know the ratio of mispredictions for this branch. Jörn -- Victory in war is not repetitious. -- Sun Tzu - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/6] ehea: queue management
On Fri, 11 August 2006 09:28:26 +0200, Thomas Klein wrote: Michael Neuling wrote: +static inline u32 map_swqe_size(u8 swqe_enc_size) +static inline u32|map_rwqe_size(u8 rwqe_enc_size) Agreed. Functions were replaced by a single map_wqe_size() function. Just a general thing, try to avoid having two identifiers that are near-100% identical. As seen in this thread, they are _very_ easy to confuse. Ime, there are two methods to avoid this. One is to make the identifiers longer, something like map_seek_wqe_size and map_read_wqe_size. The other is to make them shorter, just s and r is less confusing than the above. Which method works best depends on many things, including personal taste. Jörn -- And spam is a useful source of entropy for /dev/random too! -- Jasmine Strong - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Netchannles: first stage has been completed. Further ideas.
On Tue, 18 July 2006 23:08:01 +0400, Evgeniy Polyakov wrote: On Tue, Jul 18, 2006 at 02:15:17PM +0200, J?rn Engel ([EMAIL PROTECTED]) wrote: Your description makes it sound as if you would take a huge leap, changing all in-kernel code _and_ the userspace interface in a single patch. Am I wrong? Or am I right and would it make sense to extract small incremental steps from your patch similar to those Van did in his non-published work? My first implementation used existing kernel code and showed small performance win - there was binding of the socket to netchannel and all protocol processing was moved into process context. Iirc, Van didn't show performance numbers but rather cpu utilization numbers. And those went down significantly without changing the userspace interface. Did you look at cpu utilization as well? If you did and your numbers are worse than Vans, he either did something smarter than you or forged his numbers (quite unlikely). Jörn -- Sometimes, asking the right question is already the answer. -- Unknown - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Netchannles: first stage has been completed. Further ideas.
On Tue, 18 July 2006 12:16:26 +0400, Evgeniy Polyakov wrote: Current tests with the latest netchannel patch show that netchannels outperforms sockets in any type of bulk transfer (big-sized, small-sized, sending, receiving) over 1gb wire. I omit graphs and numbers here, since I posted it already several times. I also plan to proceed some negotiations which would allow to test netchannel support in 10gbit environment, but it can also happen after second development stage completed. [ I don't have enough time for a deeper look. So if my questions are stupid, please just tell me so and don't take it personal. ] After having seen Van Jacobson's presentation at LCA twice, it appeared to me that Van could get astonishing speedups with small incremental steps, only changing kernel code and leaving the kernel-userspace interface as is. Changing (or rather adding a new) the userspace interface was just the last step, which also gave some performance benefits but is also a change to the userspace interface and therefore easy to get wrong and hard to fix later. Your description makes it sound as if you would take a huge leap, changing all in-kernel code _and_ the userspace interface in a single patch. Am I wrong? Or am I right and would it make sense to extract small incremental steps from your patch similar to those Van did in his non-published work? Jörn -- When people work hard for you for a pat on the back, you've got to give them that pat. -- Robert Heinlein - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Van Jacobson's net channels and real-time
On Sat, 22 April 2006 15:29:58 +0200, Ingo Oeser wrote: On Saturday, 22. April 2006 13:48, Jörn Engel wrote: Unless I completely misunderstand something, one of the main points of the netchannels if to have *zero* fields written to by both producer and consumer. Hmm, for me the main point was to keep the complete processing of a single packet within one CPU/Core where this is a non-issue. That was another main point, yes. And the endpoints should be as little burden on the bottlenecks as possible. One bottleneck is the receive interrupt, which shouldn't wait for cachelines from other cpus too much. Jörn -- Why do musicians compose symphonies and poets write poems? They do it because life wouldn't have any meaning for them if they didn't. That's why I draw cartoons. It's my life. -- Charles Shultz - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Van Jacobson's net channels and real-time
On Fri, 21 April 2006 18:52:47 +0200, Ingo Oeser wrote: What about sth. like struct netchannel { /* This is only read/written by the writer (producer) */ unsigned long write_ptr; struct netchannel_buftrailer *netchan_queue[NET_CHANNEL_ENTRIES]; /* This is modified by both */ atomic_t filled_entries; /* cache_line_align this? */ /* This is only read/written by the reader (consumer) */ unsigned long read_ptr; } This would prevent this bug from the beginning and let us still use the full queue size. If cacheline bouncing because of the shared filled_entries becomes an issue, you are receiving or sending a lot. Unless I completely misunderstand something, one of the main points of the netchannels if to have *zero* fields written to by both producer and consumer. Receiving and sending a lot can be expected to be the common case, so taking a performance hit in this case is hardly a good idea. I haven't looked at Davem's implementation at all, but Van simply seperated fields in consumer-written and producer-written, with proper alignment between them. Some consumer-written fields are also read by the producer and vice versa. But none of this results in cacheline pingpong. If your description of the problem is correct, it should only mean that the implementation has a problem, not the concept. Jörn -- Time? What's that? Time is only worth what you do with it. -- Theo de Raadt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC: 2.6 patch] net/core/: possible cleanups
On Thu, 13 April 2006 13:58:28 +0200, Adrian Bunk wrote: - make the following trivial wrapper function a static inline: - neighbour.c: neigh_parms_destroy() Am I missing something or is there just a single caller? In that case, neigh_parms_put() could call kfree() directly and neigh_parms_destroy() is free for termination. Jörn -- [One] doesn't need to know [...] how to cause a headache in order to take an aspirin. -- Scott Culp, Manager of the Microsoft Security Response Center, 2001 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Uninline kfree_skb and allow NULL argument
On Thu, 23 February 2006 12:48:01 -0800, David S. Miller wrote: If you wish to contribute to a software project, you should adhere to the coding style and conventions of that project when submitting changes. It doesn't matter what the reasons are for those conventions, you should follow them until the projects decides to change them. Agreed. If you wish to discuss the merits of putting extern there or not in function declarations, you can start a thread about that and make proposals on linux-kernel. Not really. According to my limited understanding of C, the presence or absence of extern shouldn't make a difference either way. If this understanding if correct and people simply prefer to have the extern, that's perfectly fine with me. But if it does indeed make a difference, I'd like to learn something new. So place add extern here, thanks a lot. Done. Sorry about that. Jörn -- The cost of changing business rules is much more expensive for software than for a secretaty. -- unknown o Uninline kfree_skb, which saves some 15k of object code on my notebook. o Allow kfree_skb to be called with a NULL argument. Subsequent patches can remove conditional from drivers and further reduce source and object size. Signed-off-by: Jörn Engel [EMAIL PROTECTED] --- include/linux/skbuff.h | 17 + net/core/skbuff.c | 18 ++ 2 files changed, 19 insertions(+), 16 deletions(-) --- kfree_skb/include/linux/skbuff.h~kfree_skb_uninline_null2006-02-23 13:35:05.0 +0100 +++ kfree_skb/include/linux/skbuff.h2006-02-23 13:36:23.0 +0100 @@ -306,6 +306,7 @@ struct sk_buff { #include asm/system.h +extern void kfree_skb(struct sk_buff *skb); extern void __kfree_skb(struct sk_buff *skb); extern struct sk_buff *__alloc_skb(unsigned int size, gfp_t priority, int fclone); @@ -406,22 +407,6 @@ static inline struct sk_buff *skb_get(st */ /** - * kfree_skb - free an sk_buff - * @skb: buffer to free - * - * Drop a reference to the buffer and free it if the usage count has - * hit zero. - */ -static inline void kfree_skb(struct sk_buff *skb) -{ - if (likely(atomic_read(skb-users) == 1)) - smp_rmb(); - else if (likely(!atomic_dec_and_test(skb-users))) - return; - __kfree_skb(skb); -} - -/** * skb_cloned - is the buffer a clone * @skb: buffer to check * --- kfree_skb/net/core/skbuff.c~kfree_skb_uninline_null 2006-02-23 13:35:05.0 +0100 +++ kfree_skb/net/core/skbuff.c 2006-02-23 13:37:01.0 +0100 @@ -355,6 +355,24 @@ void __kfree_skb(struct sk_buff *skb) } /** + * kfree_skb - free an sk_buff + * @skb: buffer to free + * + * Drop a reference to the buffer and free it if the usage count has + * hit zero. + */ +void kfree_skb(struct sk_buff *skb) +{ + if (unlikely(!skb)) + return; + if (likely(atomic_read(skb-users) == 1)) + smp_rmb(); + else if (likely(!atomic_dec_and_test(skb-users))) + return; + __kfree_skb(skb); +} + +/** * skb_clone - duplicate an sk_buff * @skb: buffer to clone * @gfp_mask: allocation priority - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Uninline kfree_skb and allow NULL argument
On Fri, 24 February 2006 00:32:04 -0800, David S. Miller wrote: One more problem :-) Please export the kfree_skb() symbol so networking modules don't break. Doh! Done. Any more reasons to wear a brown paperbag? Jörn -- When in doubt, punt. When somebody actually complains, go back and fix it... The 90% solution is a good thing. -- Rob Landley o Uninline kfree_skb, which saves some 15k of object code on my notebook. o Allow kfree_skb to be called with a NULL argument. Subsequent patches can remove conditional from drivers and further reduce source and object size. Signed-off-by: Jörn Engel [EMAIL PROTECTED] --- include/linux/skbuff.h | 17 + net/core/skbuff.c | 19 +++ 2 files changed, 20 insertions(+), 16 deletions(-) --- kfree_skb/include/linux/skbuff.h~kfree_skb_uninline_null2006-02-23 13:35:05.0 +0100 +++ kfree_skb/include/linux/skbuff.h2006-02-24 09:35:47.0 +0100 @@ -306,6 +306,7 @@ struct sk_buff { #include asm/system.h +extern void kfree_skb(struct sk_buff *skb); extern void __kfree_skb(struct sk_buff *skb); extern struct sk_buff *__alloc_skb(unsigned int size, gfp_t priority, int fclone); @@ -406,22 +407,6 @@ static inline struct sk_buff *skb_get(st */ /** - * kfree_skb - free an sk_buff - * @skb: buffer to free - * - * Drop a reference to the buffer and free it if the usage count has - * hit zero. - */ -static inline void kfree_skb(struct sk_buff *skb) -{ - if (likely(atomic_read(skb-users) == 1)) - smp_rmb(); - else if (likely(!atomic_dec_and_test(skb-users))) - return; - __kfree_skb(skb); -} - -/** * skb_cloned - is the buffer a clone * @skb: buffer to check * --- kfree_skb/net/core/skbuff.c~kfree_skb_uninline_null 2006-02-23 13:35:05.0 +0100 +++ kfree_skb/net/core/skbuff.c 2006-02-24 09:36:15.0 +0100 @@ -355,6 +355,24 @@ void __kfree_skb(struct sk_buff *skb) } /** + * kfree_skb - free an sk_buff + * @skb: buffer to free + * + * Drop a reference to the buffer and free it if the usage count has + * hit zero. + */ +void kfree_skb(struct sk_buff *skb) +{ + if (unlikely(!skb)) + return; + if (likely(atomic_read(skb-users) == 1)) + smp_rmb(); + else if (likely(!atomic_dec_and_test(skb-users))) + return; + __kfree_skb(skb); +} + +/** * skb_clone - duplicate an sk_buff * @skb: buffer to clone * @gfp_mask: allocation priority @@ -1807,6 +1825,7 @@ void __init skb_init(void) EXPORT_SYMBOL(___pskb_trim); EXPORT_SYMBOL(__kfree_skb); +EXPORT_SYMBOL(kfree_skb); EXPORT_SYMBOL(__pskb_pull_tail); EXPORT_SYMBOL(__alloc_skb); EXPORT_SYMBOL(pskb_copy); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Allow kfree_skb to be called with a NULL argument
On Thu, 23 February 2006 19:28:49 +1100, Herbert Xu wrote: On Thu, Feb 23, 2006 at 07:53:36AM +0100, J?rn Engel wrote: How is that argument special for kfree_skb? Both libc free and kfree ignore NULL arguments and do so for good reasons. Well with kfree there is actually a slight gain in that you are doing the check in one place. kfree_skb on the other hand is inlined so the you're actually adding bloat to many places that simply don't need it. Wrt. the binary, you have a point. For source code, my patch does not any new bloat and allows removal of the existing. Lemme do a quick measurement for the kernel I run on my machine: -rwxr-xr-x 1 joern src 4836592 Feb 23 10:43 vmlinux -rwxr-xr-x 1 joern src 4836727 Feb 23 10:19 vmlinux.kfree_null 135 bytes added by my patch. Not that much. Jörn -- He who knows others is wise. He who knows himself is enlightened. -- Lao Tsu - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Allow kfree_skb to be called with a NULL argument
On Thu, 23 February 2006 03:11:12 -0800, David S. Miller wrote: Now there's a good idea. After all, the great majority of callers of kfree_skb expect to free the skb. Dave, what do you think? Absolutely. Should I merge the two patches into one and resend? Jörn -- If you're willing to restrict the flexibility of your approach, you can almost always do something better. -- John Carmack - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Uninline kfree_skb and allow NULL argument
On Thu, 23 February 2006 22:26:01 +1100, Herbert Xu wrote: On Thu, Feb 23, 2006 at 12:22:31PM +0100, J?rn Engel wrote: Should I merge the two patches into one and resend? Sounds good. Here it is. Jörn -- Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures. -- Rob Pike o Uninline kfree_skb, which saves some 15k of object code on my notebook. o Allow kfree_skb to be called with a NULL argument. Subsequent patches can remove conditional from drivers and further reduce source and object size. Signed-off-by: Jörn Engel [EMAIL PROTECTED] --- include/linux/skbuff.h | 17 + net/core/skbuff.c | 18 ++ 2 files changed, 19 insertions(+), 16 deletions(-) --- kfree_skb/include/linux/skbuff.h~kfree_skb_uninline_null2006-02-23 13:35:05.0 +0100 +++ kfree_skb/include/linux/skbuff.h2006-02-23 13:36:23.0 +0100 @@ -306,6 +306,7 @@ struct sk_buff { #include asm/system.h +void kfree_skb(struct sk_buff *skb); extern void __kfree_skb(struct sk_buff *skb); extern struct sk_buff *__alloc_skb(unsigned int size, gfp_t priority, int fclone); @@ -406,22 +407,6 @@ static inline struct sk_buff *skb_get(st */ /** - * kfree_skb - free an sk_buff - * @skb: buffer to free - * - * Drop a reference to the buffer and free it if the usage count has - * hit zero. - */ -static inline void kfree_skb(struct sk_buff *skb) -{ - if (likely(atomic_read(skb-users) == 1)) - smp_rmb(); - else if (likely(!atomic_dec_and_test(skb-users))) - return; - __kfree_skb(skb); -} - -/** * skb_cloned - is the buffer a clone * @skb: buffer to check * --- kfree_skb/net/core/skbuff.c~kfree_skb_uninline_null 2006-02-23 13:35:05.0 +0100 +++ kfree_skb/net/core/skbuff.c 2006-02-23 13:37:01.0 +0100 @@ -355,6 +355,24 @@ void __kfree_skb(struct sk_buff *skb) } /** + * kfree_skb - free an sk_buff + * @skb: buffer to free + * + * Drop a reference to the buffer and free it if the usage count has + * hit zero. + */ +void kfree_skb(struct sk_buff *skb) +{ + if (unlikely(!skb)) + return; + if (likely(atomic_read(skb-users) == 1)) + smp_rmb(); + else if (likely(!atomic_dec_and_test(skb-users))) + return; + __kfree_skb(skb); +} + +/** * skb_clone - duplicate an sk_buff * @skb: buffer to clone * @gfp_mask: allocation priority - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Allow kfree_skb to be called with a NULL argument
Fairly trivial. The extra conditional should get optimized away with current code. But it also allows to walk through network drivers and get rid of the permanent if (skb) kfree(skb); conditionals. Jörn -- Unless something dramatically changes, by 2015 we'll be largely wondering what all the fuss surrounding Linux was really about. -- Rob Enderle Allow kfree_skb to be called with a NULL argument. This allows slight simplification of many drivers. Signed-off-by: Jörn Engel [EMAIL PROTECTED] --- include/linux/skbuff.h |2 ++ 1 file changed, 2 insertions(+) --- kfree_skb/include/linux/skbuff.h~kfree_skb_null 2006-02-22 08:42:22.0 +0100 +++ kfree_skb/include/linux/skbuff.h2006-02-22 08:36:29.0 +0100 @@ -414,6 +414,8 @@ static inline struct sk_buff *skb_get(st */ static inline void kfree_skb(struct sk_buff *skb) { + if (unlikely(!skb)) + return; if (likely(atomic_read(skb-users) == 1)) smp_rmb(); else if (likely(!atomic_dec_and_test(skb-users))) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Some infrastructure for interrupt-less TX
On Wed, 22 February 2006 12:37:48 -0800, Caitlin Bestler wrote: [EMAIL PROTECTED] wrote: Below patch wasn't even compile tested. I'm not involved with network drivers anymore, so my personal interest is fairly low. But since I firmly believe in the advantages and feasibility of interrupt-less TX, there should at least be an ugly broken patch to flame about. Go for it, tell me how stupid I am! Jörn I am assuming the real goal is avoiding interrupts when transmit completions can be reported without them on a reasonably periodic basis. Not necessarily on a periodic basis. For some network driver I once worked on, the hardware simply had a ring buffer of n frames. Whenever a n+1th frame was transmitted, the first would be checked for completion. If it was completed, it was freed, else the new frame was dropped (and freed). So for this driver, the hardware permanently owned n memory areas which would never get freed. Nice performance at the cost of a little wasted memory. Alternatively you could set a timer as well, sure. Wouldn't that goal be achievable by the type of transmit buffer ring implied for net channels? Possibly. I don't really understand the transmit side of net channels yet. But the principle should be the same. Whatever data structures the kernel need on top of the raw packet is freed early, the raw packet is handed over to the hardware and freed late. Jörn -- There's nothing better for promoting creativity in a medium than making an audience feel Hmm I could do better than that! -- Douglas Adams in a slashdot interview - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/4] PHY Abstraction Layer III (now with more splitiness)
On Wed, 27 July 2005 14:34:19 -0700, Randy Dunlap wrote: Ok, here I won't agree to disagree with you. !foo as a check for NULL is a reasonable idea, but not my style. If that's the preferred style for the kernel, I will do that. But (var == constant) is a style that asks for errors. By putting the constant first in these checks, you never run the risk of leaving a bug like this: if (dev = NULL) ... This kind of error is quite frustrating to detect, and the eye will often miss it when scanning for errors. If you follow constant == var, though, then the bug looks like this: if (NULL = dev) which is instantly caught by the compiler. Just my 32 cents Yes, we know about that argument. :) The counter-argument basically goes like this: 1. All relevant compilers (GCC) warn on if (dev = NULL), so you will only miss the bug if you ignore compiler warnings. Ignoring compiler warnings is not generally endorsed by the kernel crowd. 2. Very hard to read, if (NULL = dev) is. Reversing the order is a fun thing to do for small green characters in fantasy and scifi stories and fairly popular in peotry as well. But understanding the meaning of reverse order sentences takes more time. In the kernel, peer review is an important aspect and making the code hard to read hurts peer review. And maybe you can add another one: 3. Im my personal experience, reverse order comparisons were a good indicator of buggy code. Jörn -- Schrödinger's cat is BLINKnot/BLINK dead. -- Illiad - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html