Re: tc-ipt v0.2: Extension does not know id 1504083504
On 2017-10-01 07:25, Sergey K. wrote: Hi Corey. I did it on your recommendation, replaced xtables.h file from my version of iptables 1.6.0, and replaced the file netfilter.h. Now it's works, but new construction doesn't: # tc filter add dev eth0 parent : u32 match u32 0 0 action xt -j SET --map-set WORLD_QoS dst xt: unrecognized option '--map-set' failed to find target (null) I looked into this for a bit, and it seems to be due to the SET extension not supporting the x6-style options (I don't know the proper name for it). From the iptables source: $ grep -c x6 extensions/libxt_MARK.c 9 $ grep -c x6 extensions/libxt_SET.c 0 I got lost in gdb for a while and didn't come up with an easy answer. The options in are in the struct, just not where tc is looking. (gdb) print *m->extra_opts $3 = {name = 0x76f1dad8 "add-set", has_arg = 1, flag = 0x0, val = 49} (gdb) print *(m->extra_opts+4) $7 = {name = 0x76f1db03 "map-set", has_arg = 1, flag = 0x0, val = 53} I don't know if the problem is in iptables or in tc. Somebody familiar with the code and its history could probably answer it easily, but I'm not at all familiar with it. Sorry. Hopefully somebody else will jump in and help you out. -Corey
Re: tc-ipt v0.2: Extension does not know id 1504083504
On 2017-09-29 23:34, Sergey K. wrote: Hello. I have to apply this patch https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git/commit/?h=v4.10.0=97a02cabefb2e2dcfe27f89943709afa84be5525 to my version of iproute2. But now I have a new information message, when I'm using construction like this: tc filter add dev eth0 parent : u32 match u32 0 0 action xt -j MARK --set-mark 0 message text: tc-ipt v0.2: Extension does not know id 1504083504 I'm using Debian Stretch, kernel 4.9.0-3-amd64, iptables 1.6.0 and patched iproute 4.9.0 How to solve? Funny, I just ran into this too and subscribed here to report it. The error occurs during parsing of any options to the jump target; if the target has no options, there is no error. The problem seems to be an outdated version of struct xtables_target in include/xtables.h. The version in iptables has an additional member "udata" that makes the offsets in the struct different for anything following. A quick fix for this particular problem is to copy include/xtables.h from: git://git.netfilter.org/iptables ...into include/ in the iproute2 source, then recompile after a 'make clean'. As for a comprehensive fix, I don't know--presumably other headers in include/ may be out of date, but I don't want to just blindly send a patch unless someone who knows the ramifications says it's ok. This seems like it would need maintainer oversight. If there's something I can do, though, let me know. -Corey
Re: [NET_SCHED 00/04]: External SFQ classifiers/flow classifier
Patrick McHardy wrote: You're missing protocol, handle etc. Try something like this: tc filter add dev eth0 protocol ip pref 1 parent 1: handle 1 \ flow hash keys dst divisor 1024 Thanks, the kernel accepts that. I guess I understand tc filter usage less than I thought I did Time to teach myself better. -Corey -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel panic on 2.6.24 with esfq patch applied
Denys Fedoryshchenko wrote: Hi Probably bug related to ESFQ, now i will unload module and will test more. But probably not related, so if not difficult, please take a look. Feb 1 09:08:50 SERVER [12380.067104] BUG: unable to handle kernel NULL pointer dereference Feb 1 09:08:50 SERVER at virtual address 0008 Feb 1 09:08:50 SERVER [12380.067140] printing eip: c01f10ed Feb 1 09:08:50 SERVER *pde = Feb 1 09:08:50 SERVER Feb 1 09:08:50 SERVER [12380.067162] Oops: [#1] Feb 1 09:08:50 SERVER SMP Feb 1 09:08:50 SERVER Feb 1 09:08:50 SERVER [12380.067181] Modules linked in: Feb 1 09:08:50 SERVER netconsole Feb 1 09:08:50 SERVER configfs Feb 1 09:08:50 SERVER iTCO_wdt Feb 1 09:08:50 SERVER nf_nat_pptp Feb 1 09:08:50 SERVER nf_conntrack_pptp Feb 1 09:08:50 SERVER nf_conntrack_proto_gre Feb 1 09:08:50 SERVER nf_nat_proto_gre Feb 1 09:08:50 SERVER sch_esfq I'd rather you were using my recent patches to SFQ instead of ESFQ. I was able to crash a 2.6.24 user-mode Linux with ESFQ as well; I don't know if that's what you encountered, but the SFQ patches should be better anyway. http://fatooh.org/esfq-2.6/sfq-2.6.24.tar.bz2 http://fatooh.org/esfq-2.6/ -Corey -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel panic on 2.6.24 with esfq patch applied
Jarek Poplawski wrote: Hi Corey! I've just had a look at your site, and see you're a bit disappointed about the reception of your patches here. I don't use nor know about your work (maybe some time...), and wish you better public here, but maybe you'll find this story interesting: a few years ago, when I didn't even think of reading netdev, I quite often visited pages of such projects as HTB (not in every distro yet), IMQ or Julian Anastasov's. They were quite popular subjects on admins lists, and I really admired people around these projects, while the main kernel was something big and anonymous (no idea e.g. about D. Miller). For the same reason I knew more about people from iptables project - only because there was a need to visit this for some non standard modules, and about A. Kuznetsov - only because of patching iproute for HTB!!! (Only later I found that Alexey did more than needed for several such HTBs.) So, it seems there could be good side of such unofficial status too - if you only have something really useful for others... Thank you for the words of encouragement, but don't worry: I'm not going to run away and hide just yet. :) I was a bit discouraged about not getting reviews for the more recent patch submissions, though, and I hope I don't come across as bitter about that. Patch review is a lot of work, and I definitely appreciate the comments I received earlier--I was able to make many improvements as a result. In any case, kernel development progresses, and I now have to update my patches so they can apply to current git. The ball has rolled back into my court of its own accord; perhaps the playing field isn't level. ;) -Corey -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [NET_SCHED 00/04]: External SFQ classifiers/flow classifier
Patrick McHardy wrote: These patches add support for external classifiers to SFQ and add a new flow classifier, which can do hashing based on user-specified keys or deterministic mapping of keys to classes. Additionally there is a patch to make the SFQ queues visisble as classes to verify that the hash is indeed doing something useful and a patch to consifiy struct tcf_ext_map, which I had queued in the same tree. Excellent! I'm glad this is applied. I'm having trouble figuring out how it works, though. As a test, I'm trying to set up SFQ equivalent to ESFQ's hash dst. Here's what I do, and this is what I get: # ./tc qdisc add dev eth0 root handle 1: sfq # ./tc filter add dev eth0 parent 1: flow hash keys dst RTNETLINK answers: Invalid argument We have an error talking to the kernel I've tried a few different keys with the same results. I don't know what I'm doing wrong, or even where to start figuring it out. Can you point me in the right direction? Here are some details that may be pertinent: - current net-2.6 git (I double-checked) - CONFIG_NET_CLS_FLOW=y - current iproute2 git - running on amd64 user-mode Linux Thanks, Corey -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SFQ: backport some features from ESFQ (try 5)
Corey Hickey wrote: Patchset try 2 addresses the review by Michael Buesch. Patchset try 3 addresses the review by Patrick McHardy. Patchset try 4 has a few cosmetic improvements. Patchset try 5 addresses further review by Patrick McHardy. This set of patches is substantially the same as my previous try, with changes made according to Patrick's recommendations. Any comments? Patrick? I know you're busy, but you've been sending reviews for my patches so far and yet another one would be much appreciated. Thanks, Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/8] Preparatory refactoring part 2.
Factor code out of sfq_init() so that the new function can be used by sfq_change() later. Actually, as the diff itself shows, most of the sfq_q_init() code comes from the original sfq_change(), but sfq_change() is only called by sfq_init() right now. Thus, it is safe to remove sfq_change(); tc qdisc change doesn't yet work for sfq anyway. Setting default parameters is moved into a separate function for clarity. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 88 +++--- 1 files changed, 47 insertions(+), 41 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 10e2f3d..8ea816a 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -413,43 +413,41 @@ static void sfq_perturbation(unsigned long arg) mod_timer(q-perturb_timer, jiffies + q-perturb_period); } -static int sfq_change(struct Qdisc *sch, struct rtattr *opt) +static void +sfq_default_parameters(struct Qdisc *sch) { struct sfq_sched_data *q = qdisc_priv(sch); - struct tc_sfq_qopt *ctl = RTA_DATA(opt); - unsigned int qlen; - - if (opt-rta_len RTA_LENGTH(sizeof(*ctl))) - return -EINVAL; - - sch_tree_lock(sch); - q-quantum = ctl-quantum ? : psched_mtu(sch-dev); - q-perturb_period = ctl-perturb_period*HZ; - if (ctl-limit) - q-limit = min_t(u32, ctl-limit, SFQ_DEPTH - 1); - qlen = sch-q.qlen; - while (sch-q.qlen q-limit) - sfq_drop(sch); - qdisc_tree_decrease_qlen(sch, qlen - sch-q.qlen); - - del_timer(q-perturb_timer); - if (q-perturb_period) { - mod_timer(q-perturb_timer, jiffies + q-perturb_period); - get_random_bytes(q-perturbation, 4); - } - sch_tree_unlock(sch); - return 0; + q-quantum= psched_mtu(sch-dev); + q-perturbation = 0; + q-perturb_period = 0; + q-limit = SFQ_DEPTH - 1; } -static int sfq_init(struct Qdisc *sch, struct rtattr *opt) +static int +sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) { - struct sfq_sched_data *q = qdisc_priv(sch); int i; - init_timer(q-perturb_timer); - q-perturb_timer.data = (unsigned long)sch; - q-perturb_timer.function = sfq_perturbation; + /* At this point, parameters are set to either defaults (sfq_init) or +* the previous values (sfq_change). So, overwrite the parameters as +* specified. */ + if (opt) { + struct tc_sfq_qopt *ctl = RTA_DATA(opt); + + if (opt-rta_len RTA_LENGTH(sizeof(*ctl))) + return -EINVAL; + + if (ctl-quantum) + q-quantum = ctl-quantum; + if (ctl-perturb_period) + q-perturb_period = ctl-perturb_period * HZ; + if (ctl-limit) + q-limit = ctl-limit; + } + q-limit = min_t(u32, q-limit, SFQ_DEPTH - 1); + q-tail = SFQ_DEPTH; + q-max_depth = 0; for (i=0; iSFQ_HASH_DIVISOR; i++) q-ht[i] = SFQ_DEPTH; @@ -458,23 +456,31 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) q-dep[i+SFQ_DEPTH].next = i+SFQ_DEPTH; q-dep[i+SFQ_DEPTH].prev = i+SFQ_DEPTH; } - q-limit = SFQ_DEPTH - 1; - q-max_depth = 0; - q-tail = SFQ_DEPTH; - if (opt == NULL) { - q-quantum = psched_mtu(sch-dev); - q-perturb_period = 0; - get_random_bytes(q-perturbation, 4); - } else { - int err = sfq_change(sch, opt); - if (err) - return err; - } for (i=0; iSFQ_DEPTH; i++) sfq_link(q, i); return 0; } +static int sfq_init(struct Qdisc *sch, struct rtattr *opt) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + int err; + + sfq_default_parameters(sch); + if ((err = sfq_q_init(q, opt))) + return err; + + init_timer(q-perturb_timer); + q-perturb_timer.data = (unsigned long)sch; + q-perturb_timer.function = sfq_perturbation; + if (q-perturb_period) { + q-perturb_timer.expires = jiffies + q-perturb_period; + add_timer(q-perturb_timer); + } + + return 0; +} + static void sfq_destroy(struct Qdisc *sch) { struct sfq_sched_data *q = qdisc_priv(sch); -- 1.5.3.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
SFQ: backport some features from ESFQ (try 5)
Patchset try 2 addresses the review by Michael Buesch. Patchset try 3 addresses the review by Patrick McHardy. Patchset try 4 has a few cosmetic improvements. Patchset try 5 addresses further review by Patrick McHardy. This set of patches is substantially the same as my previous try, with changes made according to Patrick's recommendations. Iproute2 patches will follow shortly. The following is the original patch text. This set of patches adds some of ESFQ's modifications to the original SFQ. Thus far, I have received support for this approach rather than for trying to get ESFQ included as a separate qdisc. http://mailman.ds9a.nl/pipermail/lartc/2007q2/021056.html My patches here implement tc qdisc change, user-configurable depth (number of flows), and user-configurable divisor (for setting hash table size). I've left out the remaining ESFQ features (usage of jhash and different hashing methods) because Patrick McHardy intends to submit a patch that will supersede that functionality; see the URL above. Default values remain the same, and SFQ's default behavior remains the same, so there should be no user disruption. Thanks for your consideration, Corey include/linux/pkt_sched.h | 23 ++- net/sched/sch_sfq.c | 434 +++-- 2 files changed, 319 insertions(+), 138 deletions(-) [PATCH 1/8] Preparatory refactoring part 1. [PATCH 2/8] Preparatory refactoring part 2. [PATCH 3/8] Make depth (number of queues) user-configurable [PATCH 4/8] Add divisor. [PATCH 5/8] Make qdisc changeable. [PATCH 6/8] Remove comments about hardcoded values. [PATCH 7/8] Rework perturb_period. [PATCH 8/8] Use nested compat attributes to pass parameters. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/8] Remove comments about hardcoded values.
None of these are true anymore (hooray!). Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- include/linux/pkt_sched.h |8 net/sched/sch_sfq.c | 13 + 2 files changed, 1 insertions(+), 20 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 919af93..d754a3d 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -148,14 +148,6 @@ struct tc_sfq_qopt unsignedflows; /* Maximal number of flows */ }; -/* - * NOTE: limit, divisor and flows are hardwired to code at the moment. - * - * limit=flows=128, divisor=1024; - * - * The only reason for this is efficiency, it is possible - * to change these parameters in compile time. - */ /* RED section */ diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index ca8716f..7b11086 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -62,18 +62,7 @@ We still need true WFQ for top level CSZ, but using WFQ for the best effort traffic is absolutely pointless: - SFQ is superior for this purpose. - - IMPLEMENTATION: - This implementation limits maximal queue length to 128; - maximal mtu to 2^15-1; number of hash buckets to 1024. - The only goal of this restrictions was that all data - fit into one 4K page :-). Struct sfq_sched_data is - organized in anti-cache manner: all the data for a bucket - are scattered over different locations. This is not good, - but it allowed me to put it into 4K. - - It is easy to increase these values, but not in flight. */ + SFQ is superior for this purpose. */ #define SFQ_DEPTH_DEFAULT 128 #define SFQ_DIVISOR_DEFAULT10 -- 1.5.3.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/8] Add divisor.
Make hash divisor user-configurable. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 27 +-- 1 files changed, 21 insertions(+), 6 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 1c1bf08..c74d5ce 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -76,7 +76,7 @@ It is easy to increase these values, but not in flight. */ #define SFQ_DEPTH_DEFAULT 128 -#define SFQ_HASH_DIVISOR 1024 +#define SFQ_DIVISOR_DEFAULT10 #define SFQ_HEAD 0 #define SFQ_TAIL 1 @@ -86,6 +86,10 @@ typedef unsigned int sfq_index; #define SFQ_MAX_DEPTH (UINT_MAX / 2 - 1) +/* In practice, the actual divisor size is limited by kcalloc, but we still + * don't want to left shift by more than 31. */ +#define SFQ_MAX_DIVISOR 31 + struct sfq_head { sfq_index next; @@ -99,6 +103,7 @@ struct sfq_sched_data unsignedquantum;/* Allotment per round: MUST BE = MTU */ int limit; unsigneddepth; + unsignedhash_divisor; /* Variables */ struct timer_list perturb_timer; @@ -106,7 +111,7 @@ struct sfq_sched_data sfq_index tail; /* Index of current slot in round */ sfq_index max_depth; /* Maximal depth */ - sfq_index ht[SFQ_HASH_DIVISOR]; /* Hash table */ + sfq_index *ht;/* Hash table */ sfq_index *next; /* Active slots link */ short *allot; /* Current allotment per slot */ unsigned short *hash; /* Hash value indexed by slots */ @@ -116,7 +121,9 @@ struct sfq_sched_data static __inline__ unsigned sfq_fold_hash(struct sfq_sched_data *q, u32 h, u32 h1) { - return jhash_2words(h, h1, q-perturbation) (SFQ_HASH_DIVISOR - 1); + unsigned mask = (1q-hash_divisor) - 1; + + return jhash_2words(h, h1, q-perturbation) mask; } static unsigned sfq_hash(struct sfq_sched_data *q, struct sk_buff *skb) @@ -418,6 +425,7 @@ static void sfq_perturbation(unsigned long arg) static void sfq_q_destroy(struct sfq_sched_data *q) { + kfree(q-ht); kfree(q-dep); kfree(q-next); kfree(q-allot); @@ -441,6 +449,7 @@ sfq_default_parameters(struct Qdisc *sch) q-quantum= psched_mtu(sch-dev); q-perturbation = 0; q-perturb_period = 0; + q-hash_divisor = SFQ_DIVISOR_DEFAULT; q-depth = SFQ_DEPTH_DEFAULT; q-limit = SFQ_DEPTH_DEFAULT - 1; } @@ -463,18 +472,24 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) q-quantum = ctl-quantum; if (ctl-perturb_period) q-perturb_period = ctl-perturb_period * HZ; + if (ctl-divisor) + q-hash_divisor = ctl-divisor; if (ctl-flows) q-depth = ctl-flows; if (ctl-limit) q-limit = ctl-limit; - if (q-depth SFQ_MAX_DEPTH) + if (q-depth SFQ_MAX_DEPTH || + q-hash_divisor SFQ_MAX_DIVISOR) return -EINVAL; } q-limit = min_t(u32, q-limit, q-depth - 1); q-tail = q-depth; q-max_depth = 0; + q-ht = kcalloc(1q-hash_divisor, sizeof(sfq_index), GFP_KERNEL); + if (!q-ht) + goto err_case; q-dep = kcalloc(1 + q-depth*2, sizeof(struct sfq_head), GFP_KERNEL); if (!q-dep) goto err_case; @@ -491,7 +506,7 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) if (!q-qs) goto err_case; - for (i=0; iSFQ_HASH_DIVISOR; i++) + for (i=0; i 1q-hash_divisor; i++) q-ht[i] = q-depth; for (i=0; i q-depth; i++) { skb_queue_head_init(q-qs[i]); @@ -537,7 +552,7 @@ static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) opt.perturb_period = q-perturb_period/HZ; opt.limit = q-limit; - opt.divisor = SFQ_HASH_DIVISOR; + opt.divisor = q-hash_divisor; opt.flows = q-depth; RTA_PUT(skb, TCA_OPTIONS, sizeof(opt), opt); -- 1.5.3.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/8] Make qdisc changeable.
Re-implement sfq_change() and enable Qdisc_opts.change so tc qdisc change will work. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 67 ++- 1 files changed, 66 insertions(+), 1 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index c74d5ce..ca8716f 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -454,6 +454,17 @@ sfq_default_parameters(struct Qdisc *sch) q-limit = SFQ_DEPTH_DEFAULT - 1; } +static void +sfq_copy_parameters(struct sfq_sched_data *dst, struct sfq_sched_data *src) +{ + dst-quantum= src-quantum; + dst-perturbation = src-perturbation; + dst-perturb_period = src-perturb_period; + dst-hash_divisor = src-hash_divisor; + dst-limit = src-limit; + dst-depth = src-depth; +} + static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) { @@ -542,6 +553,60 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) return 0; } +static int sfq_change(struct Qdisc *sch, struct rtattr *opt) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + struct sfq_sched_data tmp; + struct sk_buff *skb; + unsigned int qlen; + int err; + + /* set up tmp queue */ + memset(tmp, 0, sizeof(struct sfq_sched_data)); + sfq_copy_parameters(tmp, q); + if ((err = sfq_q_init(tmp, opt))) + return err; + + /* handle perturbation */ + /* This code avoids resetting the perturb_timer unless perturb_period +* is changed. Note that the rest of this function leaves +* q-perturb_timer alone, whereas all other members of q get +* overwritten from tmp. */ + if (!tmp.perturb_period) { + tmp.perturbation = 0; + del_timer(q-perturb_timer); + } else if (tmp.perturb_period != q-perturb_period) { + mod_timer(q-perturb_timer, jiffies + tmp.perturb_period); + } + + /* move packets from the old queue to the tmp queue */ + sch_tree_lock(sch); + qlen = sch-q.qlen; + while (sch-q.qlen = tmp.limit - 1) + sfq_drop(sch); + qdisc_tree_decrease_qlen(sch, qlen - sch-q.qlen); + while ((skb = sfq_q_dequeue(q)) != NULL) + sfq_q_enqueue(skb, tmp, SFQ_TAIL); + + /* clean up the old queue */ + sfq_q_destroy(q); + + /* copy elements of the tmp queue into the old queue */ + sfq_copy_parameters(q, tmp); + q-tail = tmp.tail; + q-max_depth = tmp.max_depth; + q-ht= tmp.ht; + q-dep = tmp.dep; + q-next = tmp.next; + q-allot = tmp.allot; + q-hash = tmp.hash; + q-qs= tmp.qs; + + /* finish up */ + sch_tree_unlock(sch); + return 0; +} + static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) { struct sfq_sched_data *q = qdisc_priv(sch); @@ -576,7 +641,7 @@ static struct Qdisc_ops sfq_qdisc_ops = { .init = sfq_init, .reset = sfq_reset, .destroy= sfq_destroy, - .change = NULL, + .change = sfq_change, .dump = sfq_dump, .owner = THIS_MODULE, }; -- 1.5.3.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 8/8] Use nested compat attributes to pass parameters.
This fixes the ambiguity between, for example: tc qdisc change ... perturb 0 tc qdisc change ... Without this patch, there is no way for SFQ to differentiate between a parameter specified to be 0 and a parameter that was omitted. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- include/linux/pkt_sched.h | 13 net/sched/sch_sfq.c | 69 ++-- 2 files changed, 66 insertions(+), 16 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 14a08ad..b1a1a52 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -148,6 +148,19 @@ struct tc_sfq_qopt unsignedflows; /* Maximal number of flows */ }; +enum +{ + TCA_SFQ_UNSPEC, + TCA_SFQ_COMPAT, + TCA_SFQ_QUANTUM, + TCA_SFQ_PERTURB, + TCA_SFQ_LIMIT, + TCA_SFQ_DIVISOR, + TCA_SFQ_FLOWS, + __TCA_SFQ_MAX, +}; + +#define TCA_SFQ_MAX (__TCA_SFQ_MAX - 1) /* RED section */ diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 2764a54..5c25b05 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -456,6 +456,29 @@ sfq_copy_parameters(struct sfq_sched_data *dst, struct sfq_sched_data *src) dst-depth = src-depth; } +/* SFQ parameters exist as individual rtattr attributes, with a nested + * struct tc_sfq_qopt for compatibility with older userspace tools. If an + * individual attribute is set, we want to use it; otherwise, fall back to the + * nested struct. + * There is one caveat: if a member of the nested struct is 0, we cannot + * determine if that parameter is supposed to be 0 or if it is merely unset. + * So, only set a parameter if the corresponding struct member (u32 compat) is + * nonzero. When setting a parameter to 0, it is necessary to use the + * individual attribute. */ +static inline int +sfq_get_parameter(u32 *dst, struct rtattr *tb[TCA_SFQ_MAX], int attr, + u32 compat) +{ + struct rtattr *rta = tb[(attr - 1)]; + if (rta) + *dst = RTA_GET_U32(rta); + else if (compat) + *dst = compat; + + rtattr_failure: + return -EINVAL; +} + static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) { @@ -465,21 +488,24 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) * the previous values (sfq_change). So, overwrite the parameters as * specified. */ if (opt) { - struct tc_sfq_qopt *ctl = RTA_DATA(opt); - - if (opt-rta_len RTA_LENGTH(sizeof(*ctl))) - return -EINVAL; - - if (ctl-quantum) - q-quantum = ctl-quantum; - if (ctl-perturb_period) - q-perturb_period = ctl-perturb_period; - if (ctl-divisor) - q-hash_divisor = ctl-divisor; - if (ctl-flows) - q-depth = ctl-flows; - if (ctl-limit) - q-limit = ctl-limit; + struct tc_sfq_qopt *ctl; + struct rtattr *tb[TCA_SFQ_MAX]; + + if (rtattr_parse_nested_compat(tb, TCA_SFQ_MAX, opt, ctl, + sizeof(*ctl))) + goto rtattr_failure; + + if (sfq_get_parameter((q-quantum),tb, TCA_SFQ_QUANTUM, + ctl-quantum)|| + sfq_get_parameter((q-perturb_period), tb, TCA_SFQ_PERTURB, + ctl-perturb_period) || + sfq_get_parameter((q-hash_divisor), tb, TCA_SFQ_DIVISOR, + ctl-divisor)|| + sfq_get_parameter((q-depth), tb, TCA_SFQ_FLOWS, + ctl-flows) || + sfq_get_parameter((q-limit), tb, TCA_SFQ_LIMIT, + ctl-limit)) + goto rtattr_failure; if (q-depth SFQ_MAX_DEPTH || q-hash_divisor SFQ_MAX_DIVISOR) @@ -519,6 +545,8 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) for (i=0; i q-depth; i++) sfq_link(q, i); return 0; +rtattr_failure: + return -EINVAL; err_case: sfq_q_destroy(q); return -ENOBUFS; @@ -602,17 +630,26 @@ static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) { struct sfq_sched_data *q = qdisc_priv(sch); unsigned char *b = skb_tail_pointer(skb); + struct rtattr *nest; struct tc_sfq_qopt opt; opt.quantum = q-quantum; opt.perturb_period = q-perturb_period; - opt.limit = q-limit; opt.divisor = q-hash_divisor; opt.flows = q-depth; + nest = RTA_NEST_COMPAT(skb, TCA_OPTIONS, sizeof(opt), opt); + + RTA_PUT_U32(skb, TCA_SFQ_QUANTUM, q-quantum); + RTA_PUT_U32(skb
[PATCH 7/8] Rework perturb_period.
perturb_period is the only parameter that doesn't match 1:1 with the value from userspace. Multiplying perturb_period by HZ when used rather than when assigned makes it easy and clean to use a small function for setting parameters (in a subsequent patch). perturb_period is currently a signed integer, but I can't see any good reason why this is so--a negative perturbation period will add a timer that expires in the past, causing constant perturbation, which makes hashing useless. Strictly speaking, this will break binary compatibility with older versions of tc, but that ought not to be a problem because (a) there's no valid use for a negative perturb_period, and (b) negative values will be seen as high values ( INT_MAX), which don't work anyway. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- include/linux/pkt_sched.h |2 +- net/sched/sch_sfq.c | 14 -- 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index d754a3d..14a08ad 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -142,7 +142,7 @@ enum struct tc_sfq_qopt { unsignedquantum;/* Bytes per round allocated to flow */ - int perturb_period; /* Period of hash perturbation */ + unsignedperturb_period; /* Period of hash perturbation */ __u32 limit; /* Maximal packets in queue */ unsigneddivisor;/* Hash divisor */ unsignedflows; /* Maximal number of flows */ diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 7b11086..2764a54 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -70,6 +70,8 @@ #define SFQ_HEAD 0 #define SFQ_TAIL 1 +#define SFQ_PERTURB(period) (jiffies + (unsigned long)period * HZ) + /* This type must contain greater than depth*2 values, so depth is constrained * accordingly. */ typedef unsigned int sfq_index; @@ -88,7 +90,7 @@ struct sfq_head struct sfq_sched_data { /* Parameters */ - int perturb_period; + unsignedperturb_period; unsignedquantum;/* Allotment per round: MUST BE = MTU */ int limit; unsigneddepth; @@ -409,7 +411,7 @@ static void sfq_perturbation(unsigned long arg) get_random_bytes(q-perturbation, 4); if (q-perturb_period) - mod_timer(q-perturb_timer, jiffies + q-perturb_period); + mod_timer(q-perturb_timer, SFQ_PERTURB(q-perturb_period)); } static void sfq_q_destroy(struct sfq_sched_data *q) @@ -471,7 +473,7 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) if (ctl-quantum) q-quantum = ctl-quantum; if (ctl-perturb_period) - q-perturb_period = ctl-perturb_period * HZ; + q-perturb_period = ctl-perturb_period; if (ctl-divisor) q-hash_divisor = ctl-divisor; if (ctl-flows) @@ -535,7 +537,7 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) q-perturb_timer.data = (unsigned long)sch; q-perturb_timer.function = sfq_perturbation; if (q-perturb_period) { - q-perturb_timer.expires = jiffies + q-perturb_period; + q-perturb_timer.expires = SFQ_PERTURB(q-perturb_period); add_timer(q-perturb_timer); } @@ -565,7 +567,7 @@ static int sfq_change(struct Qdisc *sch, struct rtattr *opt) tmp.perturbation = 0; del_timer(q-perturb_timer); } else if (tmp.perturb_period != q-perturb_period) { - mod_timer(q-perturb_timer, jiffies + tmp.perturb_period); + mod_timer(q-perturb_timer, SFQ_PERTURB(tmp.perturb_period)); } /* move packets from the old queue to the tmp queue */ @@ -603,7 +605,7 @@ static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) struct tc_sfq_qopt opt; opt.quantum = q-quantum; - opt.perturb_period = q-perturb_period/HZ; + opt.perturb_period = q-perturb_period; opt.limit = q-limit; opt.divisor = q-hash_divisor; -- 1.5.3.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/8] Make depth (number of queues) user-configurable
* replace #define with a parameter * use old hardcoded value as a default * kcalloc() arrays in sfq_q_init() * free() arrays in new function sfq_q_destroy() * move sfq_destroy() to near sfq_q_destroy(), for clarity Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 104 +++ 1 files changed, 72 insertions(+), 32 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 8ea816a..1c1bf08 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -75,14 +75,16 @@ It is easy to increase these values, but not in flight. */ -#define SFQ_DEPTH 128 +#define SFQ_DEPTH_DEFAULT 128 #define SFQ_HASH_DIVISOR 1024 #define SFQ_HEAD 0 #define SFQ_TAIL 1 -/* This type should contain at least SFQ_DEPTH*2 values */ -typedef unsigned char sfq_index; +/* This type must contain greater than depth*2 values, so depth is constrained + * accordingly. */ +typedef unsigned int sfq_index; +#define SFQ_MAX_DEPTH (UINT_MAX / 2 - 1) struct sfq_head { @@ -96,6 +98,7 @@ struct sfq_sched_data int perturb_period; unsignedquantum;/* Allotment per round: MUST BE = MTU */ int limit; + unsigneddepth; /* Variables */ struct timer_list perturb_timer; @@ -104,11 +107,11 @@ struct sfq_sched_data sfq_index max_depth; /* Maximal depth */ sfq_index ht[SFQ_HASH_DIVISOR]; /* Hash table */ - sfq_index next[SFQ_DEPTH];/* Active slots link */ - short allot[SFQ_DEPTH]; /* Current allotment per slot */ - unsigned short hash[SFQ_DEPTH];/* Hash value indexed by slots */ - struct sk_buff_head qs[SFQ_DEPTH]; /* Slot queue */ - struct sfq_head dep[SFQ_DEPTH*2]; /* Linked list of slots, indexed by depth */ + sfq_index *next; /* Active slots link */ + short *allot; /* Current allotment per slot */ + unsigned short *hash; /* Hash value indexed by slots */ + struct sk_buff_head *qs;/* Slot queue */ + struct sfq_head *dep; /* Linked list of slots, indexed by depth */ }; static __inline__ unsigned sfq_fold_hash(struct sfq_sched_data *q, u32 h, u32 h1) @@ -160,7 +163,7 @@ static unsigned sfq_hash(struct sfq_sched_data *q, struct sk_buff *skb) static inline void sfq_link(struct sfq_sched_data *q, sfq_index x) { sfq_index p, n; - int d = q-qs[x].qlen + SFQ_DEPTH; + int d = q-qs[x].qlen + q-depth; p = d; n = q-dep[d].next; @@ -211,7 +214,7 @@ static unsigned int sfq_drop(struct Qdisc *sch) drop a packet from it */ if (d 1) { - sfq_index x = q-dep[d+SFQ_DEPTH].next; + sfq_index x = q-dep[d + q-depth].next; skb = q-qs[x].prev; len = skb-len; __skb_unlink(skb, q-qs[x]); @@ -234,7 +237,7 @@ static unsigned int sfq_drop(struct Qdisc *sch) kfree_skb(skb); sfq_dec(q, d); sch-q.qlen--; - q-ht[q-hash[d]] = SFQ_DEPTH; + q-ht[q-hash[d]] = q-depth; sch-qstats.drops++; sch-qstats.backlog -= len; return len; @@ -250,8 +253,8 @@ sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, int end) sfq_index x; x = q-ht[hash]; - if (x == SFQ_DEPTH) { - q-ht[hash] = x = q-dep[SFQ_DEPTH].next; + if (x == q-depth) { + q-ht[hash] = x = q-dep[q-depth].next; q-hash[x] = hash; } @@ -287,7 +290,7 @@ sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, int end) sfq_inc(q, x); if (q-qs[x].qlen == 1) { /* The flow is new */ - if (q-tail == SFQ_DEPTH) { /* It is the first flow */ + if (q-tail == q-depth) { /* It is the first flow */ q-tail = x; q-next[x] = x; q-allot[x] = q-quantum; @@ -351,7 +354,7 @@ sk_buff *sfq_q_dequeue(struct sfq_sched_data *q) sfq_index a, old_a; /* No active slots */ - if (q-tail == SFQ_DEPTH) + if (q-tail == q-depth) return NULL; a = old_a = q-next[q-tail]; @@ -362,10 +365,10 @@ sk_buff *sfq_q_dequeue(struct sfq_sched_data *q) /* Is the slot empty? */ if (q-qs[a].qlen == 0) { - q-ht[q-hash[a]] = SFQ_DEPTH; + q-ht[q-hash[a]] = q-depth; a = q-next[a]; if (a == old_a) { - q-tail = SFQ_DEPTH; + q-tail = q-depth; return skb; } q-next[q-tail] = a; @@ -413,6 +416,23 @@ static
[PATCH 1/8] Preparatory refactoring part 1.
Make a new function sfq_q_enqueue() that operates directly on the queue data. This will be useful for implementing sfq_change() in a later patch. A pleasant side-effect is reducing most of the duplicate code in sfq_enqueue() and sfq_requeue(). Similarly, make a new function sfq_q_dequeue(). Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 119 ++- 1 files changed, 70 insertions(+), 49 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index b542c87..10e2f3d 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -78,6 +78,9 @@ #define SFQ_DEPTH 128 #define SFQ_HASH_DIVISOR 1024 +#define SFQ_HEAD 0 +#define SFQ_TAIL 1 + /* This type should contain at least SFQ_DEPTH*2 values */ typedef unsigned char sfq_index; @@ -241,9 +244,8 @@ static unsigned int sfq_drop(struct Qdisc *sch) } static int -sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) +sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, int end) { - struct sfq_sched_data *q = qdisc_priv(sch); unsigned hash = sfq_hash(q, skb); sfq_index x; @@ -252,15 +254,37 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) q-ht[hash] = x = q-dep[SFQ_DEPTH].next; q-hash[x] = hash; } - /* If selected queue has length q-limit, this means that -* all another queues are empty and that we do simple tail drop, -* i.e. drop _this_ packet. -*/ - if (q-qs[x].qlen = q-limit) - return qdisc_drop(skb, sch); - sch-qstats.backlog += skb-len; - __skb_queue_tail(q-qs[x], skb); + if (end == SFQ_TAIL) { + /* If selected queue has length q-limit, this means that +* all other queues are empty and that we do simple tail drop, +* i.e. drop _this_ packet. +*/ + if (q-qs[x].qlen = q-limit) { + unsigned int drop_len = skb-len; + + kfree_skb(skb); + return drop_len; + } + __skb_queue_tail(q-qs[x], skb); + } else { /* end == SFQ_HEAD */ + __skb_queue_head(q-qs[x], skb); + /* If selected queue has length q-limit+1, this means that +* all other queues are empty and we do simple tail drop. +* This packet is still requeued at head of queue, tail packet +* is dropped. +*/ + if (q-qs[x].qlen q-limit) { + unsigned int drop_len; + + skb = q-qs[x].prev; + drop_len = skb-len; + __skb_unlink(skb, q-qs[x]); + kfree_skb(skb); + return drop_len; + } + } + sfq_inc(q, x); if (q-qs[x].qlen == 1) { /* The flow is new */ if (q-tail == SFQ_DEPTH) { /* It is the first flow */ @@ -273,6 +297,21 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) q-tail = x; } } + + return 0; +} + +static int +sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + + if (sfq_q_enqueue(skb, q, SFQ_TAIL)) { + sch-qstats.drops++; + return NET_XMIT_DROP; + } + + sch-qstats.backlog += skb-len; if (++sch-q.qlen = q-limit) { sch-bstats.bytes += skb-len; sch-bstats.packets++; @@ -287,58 +326,27 @@ static int sfq_requeue(struct sk_buff *skb, struct Qdisc* sch) { struct sfq_sched_data *q = qdisc_priv(sch); - unsigned hash = sfq_hash(q, skb); - sfq_index x; + unsigned int drop_len; - x = q-ht[hash]; - if (x == SFQ_DEPTH) { - q-ht[hash] = x = q-dep[SFQ_DEPTH].next; - q-hash[x] = hash; - } sch-qstats.backlog += skb-len; - __skb_queue_head(q-qs[x], skb); - /* If selected queue has length q-limit+1, this means that -* all another queues are empty and we do simple tail drop. -* This packet is still requeued at head of queue, tail packet -* is dropped. -*/ - if (q-qs[x].qlen q-limit) { - skb = q-qs[x].prev; - __skb_unlink(skb, q-qs[x]); + if ((drop_len = sfq_q_enqueue(skb, q, SFQ_HEAD))) { + sch-qstats.backlog -= drop_len; sch-qstats.drops++; - sch-qstats.backlog -= skb-len; - kfree_skb(skb); return NET_XMIT_CN; } - sfq_inc(q, x); - if (q-qs[x].qlen == 1) { /* The flow is new */ - if (q-tail == SFQ_DEPTH) { /* It is the first flow */ - q-tail = x; - q-next[x] = x
[iproute2] SFQ: backport some features from ESFQ (try 5)
These patches follow the ESFQ--SFQ kernel patches. See the kernel patch summary for general information. Thanks, Corey include/linux/pkt_sched.h | 23 ++- tc/q_sfq.c| 42 +- 2 files changed, 51 insertions(+), 14 deletions(-) [PATCH 1/3] SFQ: Support changing depth and divisor. [PATCH 2/3] Change perturb_period to unsigned. [PATCH 3/3] Use nested compat attributes for passing parameters to the kernel. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] Change perturb_period to unsigned.
This corresponds to the kernel patch doing the same. Here, too, this will technically break binary compatibility with older kernels, but that shouldn't be a problem because negative perturb_period values aren't usable anyway. --- include/linux/pkt_sched.h |2 +- tc/q_sfq.c|4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 9d41f63..fb04a89 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -142,7 +142,7 @@ enum struct tc_sfq_qopt { unsignedquantum;/* Bytes per round allocated to flow */ - int perturb_period; /* Period of hash perturbation */ + unsignedperturb_period; /* Period of hash perturbation */ __u32 limit; /* Maximal packets in queue */ unsigneddivisor;/* Hash divisor */ unsignedflows; /* Maximal number of flows */ diff --git a/tc/q_sfq.c b/tc/q_sfq.c index 19e76ba..83c8a54 100644 --- a/tc/q_sfq.c +++ b/tc/q_sfq.c @@ -47,7 +47,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl ok++; } else if (strcmp(*argv, perturb) == 0) { NEXT_ARG(); - if (get_integer(opt.perturb_period, *argv, 0)) { + if (get_u32(opt.perturb_period, *argv, 0)) { fprintf(stderr, Illegal \perturb\\n); return -1; } @@ -114,7 +114,7 @@ static int sfq_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt) fprintf(f, flows %u/%u , qopt-flows, qopt-divisor); } if (qopt-perturb_period) - fprintf(f, perturb %dsec , qopt-perturb_period); + fprintf(f, perturb %usec , qopt-perturb_period); return 0; } -- 1.5.3.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] SFQ: Support changing depth and divisor.
This can safely be applied either before or after the kernel patches because the tc_sfq_qopt struct is unchanged: - old kernels will ignore the new parameters from new iproute2 - new kernels will use the same defaults for the new parameters --- include/linux/pkt_sched.h |9 - tc/q_sfq.c| 20 +++- 2 files changed, 19 insertions(+), 10 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 268c515..9d41f63 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -148,15 +148,6 @@ struct tc_sfq_qopt unsignedflows; /* Maximal number of flows */ }; -/* - * NOTE: limit, divisor and flows are hardwired to code at the moment. - * - * limit=flows=128, divisor=1024; - * - * The only reason for this is efficiency, it is possible - * to change these parameters in compile time. - */ - /* RED section */ enum diff --git a/tc/q_sfq.c b/tc/q_sfq.c index 05385cf..19e76ba 100644 --- a/tc/q_sfq.c +++ b/tc/q_sfq.c @@ -25,7 +25,7 @@ static void explain(void) { - fprintf(stderr, Usage: ... sfq [ limit NUMBER ] [ perturb SECS ] [ quantum BYTES ]\n); + fprintf(stderr, Usage: ... sfq [ limit NUMBER ] [ depth FLOWS ] [ divisor HASHBITS ] [ perturb SECS ] [ quantum BYTES ]\n); } #define usage() return(-1) @@ -63,6 +63,24 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl return -1; } ok++; + } else if (strcmp(*argv, depth) == 0) { + NEXT_ARG(); + if (get_unsigned(opt.flows, *argv, 0)) { + fprintf(stderr, Illegal \depth\\n); + return -1; + } + ok++; + } else if (strcmp(*argv, divisor) == 0) { + NEXT_ARG(); + if (get_unsigned(opt.divisor, *argv, 0)) { + fprintf(stderr, Illegal \divisor\\n); + return -1; + } + if (opt.divisor = 15) { + fprintf(stderr, Illegal \divisor\, must be 15\n); + return -1; + } + ok++; } else if (strcmp(*argv, help) == 0) { explain(); return -1; -- 1.5.3.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] Use nested compat attributes for passing parameters to the kernel.
Note that I have left sfq_print_opt() alone. At this point, there can be no difference between the data in the nested rtattrs and the data in the compat rtattr, and I didn't want to add clutter that isn't useful. Let me know if I should do differently. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- include/linux/pkt_sched.h | 14 ++ tc/q_sfq.c| 18 -- 2 files changed, 30 insertions(+), 2 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index fb04a89..aad04eb 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -148,6 +148,20 @@ struct tc_sfq_qopt unsignedflows; /* Maximal number of flows */ }; +enum +{ + TCA_SFQ_UNSPEC, + TCA_SFQ_COMPAT, + TCA_SFQ_QUANTUM, + TCA_SFQ_PERTURB, + TCA_SFQ_LIMIT, + TCA_SFQ_DIVISOR, + TCA_SFQ_FLOWS, + __TCA_SFQ_MAX, +}; + +#define TCA_SFQ_MAX (__TCA_SFQ_MAX - 1) + /* RED section */ enum diff --git a/tc/q_sfq.c b/tc/q_sfq.c index 83c8a54..69f17c8 100644 --- a/tc/q_sfq.c +++ b/tc/q_sfq.c @@ -34,9 +34,13 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl { int ok=0; struct tc_sfq_qopt opt; + struct rtattr *nest; memset(opt, 0, sizeof(opt)); + /* put blank data in rtattr so there is a hole to fill later */ + nest = addattr_nest_compat(n, 1024, TCA_OPTIONS, opt, sizeof(opt)); + while (argc 0) { if (strcmp(*argv, quantum) == 0) { NEXT_ARG(); @@ -44,6 +48,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl fprintf(stderr, Illegal \limit\\n); return -1; } + addattr32(n, 1024, TCA_SFQ_QUANTUM, opt.quantum); ok++; } else if (strcmp(*argv, perturb) == 0) { NEXT_ARG(); @@ -51,6 +56,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl fprintf(stderr, Illegal \perturb\\n); return -1; } + addattr32(n, 1024, TCA_SFQ_PERTURB, opt.perturb_period); ok++; } else if (strcmp(*argv, limit) == 0) { NEXT_ARG(); @@ -62,6 +68,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl fprintf(stderr, Illegal \limit\, must be 1\n); return -1; } + addattr32(n, 1024, TCA_SFQ_LIMIT, opt.limit); ok++; } else if (strcmp(*argv, depth) == 0) { NEXT_ARG(); @@ -69,6 +76,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl fprintf(stderr, Illegal \depth\\n); return -1; } + addattr32(n, 1024, TCA_SFQ_FLOWS, opt.flows); ok++; } else if (strcmp(*argv, divisor) == 0) { NEXT_ARG(); @@ -80,6 +88,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl fprintf(stderr, Illegal \divisor\, must be 15\n); return -1; } + addattr32(n, 1024, TCA_SFQ_DIVISOR, opt.divisor); ok++; } else if (strcmp(*argv, help) == 0) { explain(); @@ -92,8 +101,13 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl argc--; argv++; } - if (ok) - addattr_l(n, 1024, TCA_OPTIONS, opt, sizeof(opt)); + if (ok) { + /* fill the hole we left earlier with real compat data */ + memcpy(RTA_DATA(nest), opt, sizeof(opt)); + addattr_nest_compat_end(n, nest); + } + else + nest-rta_len = 0; return 0; } -- 1.5.3.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 8/8] Use nested compat attributes to pass parameters.
Corey Hickey wrote: +/* SFQ parameters exist as individual rtattr attributes, with a nested + * struct tc_sfq_qopt for compatibility with older userspace tools. If an + * individual attribute is set, we want to use it; otherwise, fall back to the + * nested struct. + * There is one caveat: if a member of the nested struct is 0, we cannot + * determine if that parameter is supposed to be 0 or if it is merely unset. + * So, only set a parameter if the corresponding struct member (u32 compat) is + * nonzero. When setting a parameter to 0, it is necessary to use the + * individual attribute. */ +static inline int +sfq_get_parameter(u32 *dst, struct rtattr *tb[TCA_SFQ_MAX], int attr, + u32 compat) +{ + struct rtattr *rta = tb[(attr - 1)]; + if (rta) + *dst = RTA_GET_U32(rta); + else if (compat) + *dst = compat; + + rtattr_failure: + return -EINVAL; +} + static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) { @@ -465,21 +488,24 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) * the previous values (sfq_change). So, overwrite the parameters as * specified. */ if (opt) { - struct tc_sfq_qopt *ctl = RTA_DATA(opt); - - if (opt-rta_len RTA_LENGTH(sizeof(*ctl))) - return -EINVAL; - - if (ctl-quantum) - q-quantum = ctl-quantum; - if (ctl-perturb_period) - q-perturb_period = ctl-perturb_period; - if (ctl-divisor) - q-hash_divisor = ctl-divisor; - if (ctl-flows) - q-depth = ctl-flows; - if (ctl-limit) - q-limit = ctl-limit; + struct tc_sfq_qopt *ctl; + struct rtattr *tb[TCA_SFQ_MAX]; + + if (rtattr_parse_nested_compat(tb, TCA_SFQ_MAX, opt, ctl, + sizeof(*ctl))) + goto rtattr_failure; + + if (sfq_get_parameter((q-quantum),tb, TCA_SFQ_QUANTUM, + ctl-quantum)|| + sfq_get_parameter((q-perturb_period), tb, TCA_SFQ_PERTURB, + ctl-perturb_period) || + sfq_get_parameter((q-hash_divisor), tb, TCA_SFQ_DIVISOR, + ctl-divisor)|| + sfq_get_parameter((q-depth), tb, TCA_SFQ_FLOWS, + ctl-flows) || + sfq_get_parameter((q-limit), tb, TCA_SFQ_LIMIT, + ctl-limit)) + goto rtattr_failure; You may note that this part ended up being rather ugly, and I wouldn't blame anyone for wanting it to be improved. I don't see any good solutions; the alternatives I can provide are: 1. Use a macro, as I had originally written for this patch: http://marc.info/?l=linux-netdevm=119102007907626w=2 (search for GET_PARAM) The macro itself doesn't look pretty, but the usage is fairly clean. 2. Use neither macro nor function. There would be five repetitions of very similar code, with five lines per repetition: not very nice, but the formatting would still look better. 3. Only use a separate rtattr for perturb, since the other parameters work fine already. From the start, I had wanted to keep parameter parsing consistent, but it may not be worth it. This would definitely be the cleanest approach for readability, and the most simple. Sorry to bother you with such a superficial problem, but I really don't know what would be preferable. Thanks, Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/10] Preparatory refactoring part 1.
Patrick McHardy wrote: Corey Hickey wrote: Make a new function sfq_q_enqueue() that operates directly on the queue data. This will be useful for implementing sfq_change() in a later patch. A pleasant side-effect is reducing most of the duplicate code in sfq_enqueue() and sfq_requeue(). Similarly, make a new function sfq_q_dequeue(). Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 72 +++ 1 files changed, 38 insertions(+), 34 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 3a23e30..57485ef 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c The sfq_q_enqueue part looks fine. - sch-qstats.drops++; A line in the changelog explaining that this was increased twice would have been nice. Certainly; I think I didn't realize, when you originally pointed out the duplicate incrementing, that it was a bug in the original version and not in my patch. Otherwise, I would have sent it as a separate patch. If a note in this patch will suffice, though, I'll definitely do so. sfq_drop(sch); return NET_XMIT_CN; } - - - -static struct sk_buff * -sfq_dequeue(struct Qdisc* sch) +static struct +sk_buff *sfq_q_dequeue(struct sfq_sched_data *q) What is this function needed for? It gets used in sfq_change for moving packets from the old queue into the new one. In this case, we don't want to modify sch-q.qlen or sch-qstats.backlog, since those don't actually change. while ((skb = sfq_q_dequeue(q)) != NULL) sfq_q_enqueue(skb, tmp, SFQ_TAIL); I'll improve the description of this patch to make that more clear. -Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 03/10] Move two functions.
Patrick McHardy wrote: Corey Hickey wrote: Move sfq_q_destroy() to above sfq_q_init() so that it can be used by an error case in a later patch. Move sfq_destroy() as well, for clarity. This patch looks pointless, just put them where you need them in the patch introducing them. As you wish. I thought having a separate patch would ease reviewing. -Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/10] Add divisor.
Patrick McHardy wrote: Corey Hickey wrote: Make hash divisor user-configurable. @@ -120,7 +121,7 @@ static __inline__ unsigned sfq_fold_hash(struct sfq_sched_data *q, u32 h, u32 h1 /* Have we any rotation primitives? If not, WHY? */ h ^= (h1pert) ^ (h1(0x1F - pert)); h ^= h10; - return h 0x3FF; + return h (q-hash_divisor-1); This assumes that hash_divisor is a power of two, but this is not enforced anywhere. Ok. I'll move that part from userspace to the kernel. That should be better anyway. -Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/10] Make qdisc changeable.
Patrick McHardy wrote: Corey Hickey wrote: Re-implement sfq_change() and enable Qdisc_opts.change so tc qdisc change will work. +static int sfq_change(struct Qdisc *sch, struct rtattr *opt) +{ + ... + + /* finish up */ + if (q-perturb_period) { + q-perturb_timer.expires = jiffies + q-perturb_period; + add_timer(q-perturb_timer); + } else { + q-perturbation = 0; Seems counter-productive to explicitly set it to zero since it was still used during tranfering the packets with the old value. So I'd suggest to remove this or alternatively set it to the final value *before* transfering the packets. I suppose so; you're right. I'll adapt that part to fit before transferring the packets. -Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/10] Change perturb_period to unsigned.
Patrick McHardy wrote: Corey Hickey wrote: perturb_period is currently a signed integer, but I can't see any good reason why this is so--a negative perturbation period will add a timer that expires in the past, causing constant perturbation, which makes hashing useless. if (q-perturb_period) { q-perturb_timer.expires = jiffies + q-perturb_period; add_timer(q-perturb_timer); } Strictly speaking, this will break binary compatibility with older versions of tc, but that ought not to be a problem because (a) there's no valid use for a negative perturb_period, and (b) negative values will be seen as high values ( INT_MAX), which don't work anyway. If perturb_period is too large, (perturb_period * HZ) will overflow the size of an unsigned int and wrap around. So, check for thet and reject values that are too high. Sounds reasonable. --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -74,6 +74,9 @@ typedef unsigned int sfq_index; #define SFQ_MAX_DEPTH (UINT_MAX / 2 - 1) +/* We don't want perturb_period * HZ to overflow an unsigned int. */ +#define SFQ_MAX_PERTURB (UINT_MAX / HZ) jiffies are unsigned long. Hmm. You're right. It looks like my previous patch obviated the need for this part. I'll remove it. -Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/10] Use nested compat attributes to pass parameters.
Patrick McHardy wrote: Corey Hickey wrote: + +#define GET_PARAM(dst, nest, compat) do { \ + struct rtattr *rta = tb[(nest) - 1]; \ + if (rta) \ + (dst) = RTA_GET_U32(rta); \ + else if ((compat)) \ + (dst) = (compat); \ +} while (0) An inline function and a comment why this is done would increase readability. Well, I had a reason for making a macro, but it probably wasn't a good reason. Looking now, I don't see why not to make a function. I'll see what I can do. + nest = RTA_NEST_COMPAT(skb, TCA_OPTIONS, sizeof(opt), opt); + + RTA_PUT_U32(skb, TCA_SFQ_QUANTUM, q-quantum); + RTA_PUT_U32(skb, TCA_SFQ_PERTURB, q-perturb_period); + RTA_PUT_U32(skb, TCA_SFQ_LIMIT, q-limit); + RTA_PUT_U32(skb, TCA_SFQ_DIVISOR, q-hash_divisor); + RTA_PUT_U32(skb, TCA_SFQ_FLOWS, q-depth); RTA_PUT(skb, TCA_OPTIONS, sizeof(opt), opt); This is wrong, RTA_NEST_COMPAT already dumps the structure. You mean that last line (RTA_PUT) is superfluous, right? I can't see a reason for it to be there, so I must have just forgotten to delete it from the original code. If I'm wrong, I might need a little hand-holding here. My understanding of all the RTA stuff is a bit shaky. Much thanks for the review. I'll make a new set of patches soon. -Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/10] Preparatory refactoring part 2.
Patrick McHardy wrote: Corey Hickey wrote: The sfq_destroy() -- sfq_q_destroy() change looks pointless here, but it's cleaner to split now and add code to sfq_q_destroy() in a later patch. +static void sfq_destroy(struct Qdisc *sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + sfq_q_destroy(q); +} It does look pointless, after applying all patches sfq_destroy still remains a simply wrapper around sfq_q_destroy. It does remain a wrapper, but both functions are used. It doesn't have to be this way, but I wanted to avoid duplicating code and I didn't see a better layout. sfq_q_destroy is used in sfq_q_init if a kcalloc fails. sfq_q_init knows nothing about struct Qdisc *sch, so it can't call sfq_destroy. sfq_destroy is still marked as the destroy function in sfq_qdisc_ops. -Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SFQ: backport some features from ESFQ (try 4)
Patrick McHardy wrote: Corey Hickey wrote: Patchset try 2 addresses the review by Michael Buesch. Patchset try 3 addresses the review by Patrick McHardy. Patchset try 4 has a few cosmetic improvements. Nobody reviewed my last set of patches, and I wasn't pushy about asking. Since it's been a while, I ported the kernel and userspace patchsets to current git of net-2.6 and iproute2, respectively. Thanks for posting these patches, I'll have a closer look tommorrow. Many thanks for reviewing. -Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
SFQ: backport some features from ESFQ (try 4)
Patchset try 2 addresses the review by Michael Buesch. Patchset try 3 addresses the review by Patrick McHardy. Patchset try 4 has a few cosmetic improvements. Nobody reviewed my last set of patches, and I wasn't pushy about asking. Since it's been a while, I ported the kernel and userspace patchsets to current git of net-2.6 and iproute2, respectively. The first 7 patches in this series resemble the corresponding 7 patches in patchset try 2. There aren't any major changes--just modifications to address errors noticed in review and slight reorganizations to make the next patches easier. Patches 8-10 implement parameter passing via nested compat attributes. This is necessary for using 'tc qdisc change' to disable perturbation. The rest of the parameters were added for consistency. Iproute2 patches will follow shortly. The following is the original patch text. This set of patches adds some of ESFQ's modifications to the original SFQ. Thus far, I have received support for this approach rather than for trying to get ESFQ included as a separate qdisc. http://mailman.ds9a.nl/pipermail/lartc/2007q2/021056.html My patches here implement tc qdisc change, user-configurable depth (number of flows), and user-configurable divisor (for setting hash table size). I've left out the remaining ESFQ features (usage of jhash and different hashing methods) because Patrick McHardy intends to submit a patch that will supersede that functionality; see the URL above. Default values remain the same, and SFQ's default behavior remains the same, so there should be no user disruption. Thanks for your consideration, Corey include/linux/pkt_sched.h | 23 ++-- net/sched/sch_sfq.c | 353 ++--- 2 files changed, 254 insertions(+), 122 deletions(-) [PATCH 01/10] Preparatory refactoring part 1. [PATCH 02/10] Preparatory refactoring part 2. [PATCH 03/10] Move two functions. [PATCH 04/10] Make depth (number of queues) user-configurable: [PATCH 05/10] Add divisor. [PATCH 06/10] Make qdisc changeable. [PATCH 07/10] Remove comments about hardcoded values. [PATCH 08/10] Multiply perturb_period by HZ when used rather than when assigned. [PATCH 09/10] Change perturb_period to unsigned. [PATCH 10/10] Use nested compat attributes to pass parameters. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 03/10] Move two functions.
Move sfq_q_destroy() to above sfq_q_init() so that it can be used by an error case in a later patch. Move sfq_destroy() as well, for clarity. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 22 +++--- 1 files changed, 11 insertions(+), 11 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 1ba3d1a..ca22cb7 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -382,6 +382,17 @@ static void sfq_perturbation(unsigned long arg) } } +static void sfq_q_destroy(struct sfq_sched_data *q) +{ + del_timer(q-perturb_timer); +} + +static void sfq_destroy(struct Qdisc *sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + sfq_q_destroy(q); +} + static void sfq_default_parameters(struct Qdisc *sch) { @@ -451,17 +462,6 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) return 0; } -static void sfq_q_destroy(struct sfq_sched_data *q) -{ - del_timer(q-perturb_timer); -} - -static void sfq_destroy(struct Qdisc *sch) -{ - struct sfq_sched_data *q = qdisc_priv(sch); - sfq_q_destroy(q); -} - static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) { struct sfq_sched_data *q = qdisc_priv(sch); -- 1.5.3 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 01/10] Preparatory refactoring part 1.
Make a new function sfq_q_enqueue() that operates directly on the queue data. This will be useful for implementing sfq_change() in a later patch. A pleasant side-effect is reducing most of the duplicate code in sfq_enqueue() and sfq_requeue(). Similarly, make a new function sfq_q_dequeue(). Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 72 +++ 1 files changed, 38 insertions(+), 34 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 3a23e30..57485ef 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -77,6 +77,9 @@ #define SFQ_DEPTH 128 #define SFQ_HASH_DIVISOR 1024 +#define SFQ_HEAD 0 +#define SFQ_TAIL 1 + /* This type should contain at least SFQ_DEPTH*2 values */ typedef unsigned char sfq_index; @@ -244,10 +247,9 @@ static unsigned int sfq_drop(struct Qdisc *sch) return 0; } -static int -sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) +static void +sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, unsigned int end) { - struct sfq_sched_data *q = qdisc_priv(sch); unsigned hash = sfq_hash(q, skb); sfq_index x; @@ -256,8 +258,12 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) q-ht[hash] = x = q-dep[SFQ_DEPTH].next; q-hash[x] = hash; } - sch-qstats.backlog += skb-len; - __skb_queue_tail(q-qs[x], skb); + + if (end == SFQ_TAIL) + __skb_queue_tail(q-qs[x], skb); + else + __skb_queue_head(q-qs[x], skb); + sfq_inc(q, x); if (q-qs[x].qlen == 1) { /* The flow is new */ if (q-tail == SFQ_DEPTH) { /* It is the first flow */ @@ -270,6 +276,15 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) q-tail = x; } } +} + +static int +sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + + sfq_q_enqueue(skb, q, SFQ_TAIL); + sch-qstats.backlog += skb-len; if (++sch-q.qlen = q-limit) { sch-bstats.bytes += skb-len; sch-bstats.packets++; @@ -284,45 +299,21 @@ static int sfq_requeue(struct sk_buff *skb, struct Qdisc* sch) { struct sfq_sched_data *q = qdisc_priv(sch); - unsigned hash = sfq_hash(q, skb); - sfq_index x; - x = q-ht[hash]; - if (x == SFQ_DEPTH) { - q-ht[hash] = x = q-dep[SFQ_DEPTH].next; - q-hash[x] = hash; - } + sfq_q_enqueue(skb, q, SFQ_HEAD); sch-qstats.backlog += skb-len; - __skb_queue_head(q-qs[x], skb); - sfq_inc(q, x); - if (q-qs[x].qlen == 1) { /* The flow is new */ - if (q-tail == SFQ_DEPTH) { /* It is the first flow */ - q-tail = x; - q-next[x] = x; - q-allot[x] = q-quantum; - } else { - q-next[x] = q-next[q-tail]; - q-next[q-tail] = x; - q-tail = x; - } - } if (++sch-q.qlen = q-limit) { sch-qstats.requeues++; return 0; } - sch-qstats.drops++; sfq_drop(sch); return NET_XMIT_CN; } - - - -static struct sk_buff * -sfq_dequeue(struct Qdisc* sch) +static struct +sk_buff *sfq_q_dequeue(struct sfq_sched_data *q) { - struct sfq_sched_data *q = qdisc_priv(sch); struct sk_buff *skb; sfq_index a, old_a; @@ -335,8 +326,6 @@ sfq_dequeue(struct Qdisc* sch) /* Grab packet */ skb = __skb_dequeue(q-qs[a]); sfq_dec(q, a); - sch-q.qlen--; - sch-qstats.backlog -= skb-len; /* Is the slot empty? */ if (q-qs[a].qlen == 0) { @@ -353,6 +342,21 @@ sfq_dequeue(struct Qdisc* sch) a = q-next[a]; q-allot[a] += q-quantum; } + + return skb; +} + +static struct sk_buff +*sfq_dequeue(struct Qdisc* sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + struct sk_buff *skb; + + skb = sfq_q_dequeue(q); + if (skb == NULL) + return NULL; + sch-q.qlen--; + sch-qstats.backlog -= skb-len; return skb; } -- 1.5.3 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 02/10] Preparatory refactoring part 2.
Factor code out of sfq_init() and sfq_destroy(), again so that the new functions can be used by sfq_change() later. Actually, as the diff itself shows, most of the sfq_q_init() code comes from the original sfq_change(), but sfq_change() is only called by sfq_init() right now. Thus, it is safe to remove sfq_change(); tc qdisc change doesn't yet work for sfq anyway. Setting default parameters is moved into a separate function for clarity. The sfq_destroy() -- sfq_q_destroy() change looks pointless here, but it's cleaner to split now and add code to sfq_q_destroy() in a later patch. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 95 +-- 1 files changed, 54 insertions(+), 41 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 57485ef..1ba3d1a 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -382,43 +382,41 @@ static void sfq_perturbation(unsigned long arg) } } -static int sfq_change(struct Qdisc *sch, struct rtattr *opt) +static void +sfq_default_parameters(struct Qdisc *sch) { struct sfq_sched_data *q = qdisc_priv(sch); - struct tc_sfq_qopt *ctl = RTA_DATA(opt); - unsigned int qlen; - - if (opt-rta_len RTA_LENGTH(sizeof(*ctl))) - return -EINVAL; - - sch_tree_lock(sch); - q-quantum = ctl-quantum ? : psched_mtu(sch-dev); - q-perturb_period = ctl-perturb_period*HZ; - if (ctl-limit) - q-limit = min_t(u32, ctl-limit, SFQ_DEPTH - 2); - qlen = sch-q.qlen; - while (sch-q.qlen q-limit) - sfq_drop(sch); - qdisc_tree_decrease_qlen(sch, qlen - sch-q.qlen); - - del_timer(q-perturb_timer); - if (q-perturb_period) { - q-perturb_timer.expires = jiffies + q-perturb_period; - add_timer(q-perturb_timer); - } - sch_tree_unlock(sch); - return 0; + q-quantum= psched_mtu(sch-dev); + q-perturbation = 0; + q-perturb_period = 0; + q-limit = SFQ_DEPTH - 2; } -static int sfq_init(struct Qdisc *sch, struct rtattr *opt) +static int +sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) { - struct sfq_sched_data *q = qdisc_priv(sch); int i; - init_timer(q-perturb_timer); - q-perturb_timer.data = (unsigned long)sch; - q-perturb_timer.function = sfq_perturbation; + /* At this point, parameters are set to either defaults (sfq_init) or +* the previous values (sfq_change). So, overwrite the parameters as +* specified. */ + if (opt) { + struct tc_sfq_qopt *ctl = RTA_DATA(opt); + + if (opt-rta_len RTA_LENGTH(sizeof(*ctl))) + return -EINVAL; + + if (ctl-quantum) + q-quantum = ctl-quantum; + if (ctl-perturb_period) + q-perturb_period = ctl-perturb_period * HZ; + if (ctl-limit) + q-limit = ctl-limit; + } + q-limit = min_t(u32, q-limit, SFQ_DEPTH - 2); + q-tail = SFQ_DEPTH; + q-max_depth = 0; for (i=0; iSFQ_HASH_DIVISOR; i++) q-ht[i] = SFQ_DEPTH; @@ -427,28 +425,43 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) q-dep[i+SFQ_DEPTH].next = i+SFQ_DEPTH; q-dep[i+SFQ_DEPTH].prev = i+SFQ_DEPTH; } - q-limit = SFQ_DEPTH - 2; - q-max_depth = 0; - q-tail = SFQ_DEPTH; - if (opt == NULL) { - q-quantum = psched_mtu(sch-dev); - q-perturb_period = 0; - } else { - int err = sfq_change(sch, opt); - if (err) - return err; - } + for (i=0; iSFQ_DEPTH; i++) sfq_link(q, i); return 0; } -static void sfq_destroy(struct Qdisc *sch) +static int sfq_init(struct Qdisc *sch, struct rtattr *opt) { struct sfq_sched_data *q = qdisc_priv(sch); + int err; + + sfq_default_parameters(sch); + if ((err = sfq_q_init(q, opt))) + return err; + + init_timer(q-perturb_timer); + q-perturb_timer.data = (unsigned long)sch; + q-perturb_timer.function = sfq_perturbation; + if (q-perturb_period) { + q-perturb_timer.expires = jiffies + q-perturb_period; + add_timer(q-perturb_timer); + } + + return 0; +} + +static void sfq_q_destroy(struct sfq_sched_data *q) +{ del_timer(q-perturb_timer); } +static void sfq_destroy(struct Qdisc *sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + sfq_q_destroy(q); +} + static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) { struct sfq_sched_data *q = qdisc_priv(sch); -- 1.5.3 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info
[PATCH 04/10] Make depth (number of queues) user-configurable:
* replace #define with a parameter * use old hardcoded value as a default * kcalloc() arrays in sfq_q_init() * free() arrays in sfq_q_destroy() Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 85 +++--- 1 files changed, 59 insertions(+), 26 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index ca22cb7..34a299d 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -74,14 +74,16 @@ It is easy to increase these values, but not in flight. */ -#define SFQ_DEPTH 128 +#define SFQ_DEPTH_DEFAULT 128 #define SFQ_HASH_DIVISOR 1024 #define SFQ_HEAD 0 #define SFQ_TAIL 1 -/* This type should contain at least SFQ_DEPTH*2 values */ -typedef unsigned char sfq_index; +/* This type must contain greater than depth*2 values, so depth is constrained + * accordingly. */ +typedef unsigned int sfq_index; +#define SFQ_MAX_DEPTH (UINT_MAX / 2 - 1) struct sfq_head { @@ -95,6 +97,7 @@ struct sfq_sched_data int perturb_period; unsignedquantum;/* Allotment per round: MUST BE = MTU */ int limit; + unsigneddepth; /* Variables */ struct timer_list perturb_timer; @@ -103,11 +106,11 @@ struct sfq_sched_data sfq_index max_depth; /* Maximal depth */ sfq_index ht[SFQ_HASH_DIVISOR]; /* Hash table */ - sfq_index next[SFQ_DEPTH];/* Active slots link */ - short allot[SFQ_DEPTH]; /* Current allotment per slot */ - unsigned short hash[SFQ_DEPTH];/* Hash value indexed by slots */ - struct sk_buff_head qs[SFQ_DEPTH]; /* Slot queue */ - struct sfq_head dep[SFQ_DEPTH*2]; /* Linked list of slots, indexed by depth */ + sfq_index *next; /* Active slots link */ + short *allot; /* Current allotment per slot */ + unsigned short *hash; /* Hash value indexed by slots */ + struct sk_buff_head *qs;/* Slot queue */ + struct sfq_head *dep; /* Linked list of slots, indexed by depth */ }; static __inline__ unsigned sfq_fold_hash(struct sfq_sched_data *q, u32 h, u32 h1) @@ -164,7 +167,7 @@ static unsigned sfq_hash(struct sfq_sched_data *q, struct sk_buff *skb) static inline void sfq_link(struct sfq_sched_data *q, sfq_index x) { sfq_index p, n; - int d = q-qs[x].qlen + SFQ_DEPTH; + int d = q-qs[x].qlen + q-depth; p = d; n = q-dep[d].next; @@ -215,7 +218,7 @@ static unsigned int sfq_drop(struct Qdisc *sch) drop a packet from it */ if (d 1) { - sfq_index x = q-dep[d+SFQ_DEPTH].next; + sfq_index x = q-dep[d + q-depth].next; skb = q-qs[x].prev; len = skb-len; __skb_unlink(skb, q-qs[x]); @@ -238,7 +241,7 @@ static unsigned int sfq_drop(struct Qdisc *sch) kfree_skb(skb); sfq_dec(q, d); sch-q.qlen--; - q-ht[q-hash[d]] = SFQ_DEPTH; + q-ht[q-hash[d]] = q-depth; sch-qstats.drops++; sch-qstats.backlog -= len; return len; @@ -254,8 +257,8 @@ sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, unsigned int end) sfq_index x; x = q-ht[hash]; - if (x == SFQ_DEPTH) { - q-ht[hash] = x = q-dep[SFQ_DEPTH].next; + if (x == q-depth) { + q-ht[hash] = x = q-dep[q-depth].next; q-hash[x] = hash; } @@ -266,7 +269,7 @@ sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, unsigned int end) sfq_inc(q, x); if (q-qs[x].qlen == 1) { /* The flow is new */ - if (q-tail == SFQ_DEPTH) { /* It is the first flow */ + if (q-tail == q-depth) { /* It is the first flow */ q-tail = x; q-next[x] = x; q-allot[x] = q-quantum; @@ -318,7 +321,7 @@ sk_buff *sfq_q_dequeue(struct sfq_sched_data *q) sfq_index a, old_a; /* No active slots */ - if (q-tail == SFQ_DEPTH) + if (q-tail == q-depth) return NULL; a = old_a = q-next[q-tail]; @@ -329,10 +332,10 @@ sk_buff *sfq_q_dequeue(struct sfq_sched_data *q) /* Is the slot empty? */ if (q-qs[a].qlen == 0) { - q-ht[q-hash[a]] = SFQ_DEPTH; + q-ht[q-hash[a]] = q-depth; a = q-next[a]; if (a == old_a) { - q-tail = SFQ_DEPTH; + q-tail = q-depth; return skb; } q-next[q-tail] = a; @@ -385,6 +388,11 @@ static void sfq_perturbation(unsigned long arg) static void
[PATCH 06/10] Make qdisc changeable.
Re-implement sfq_change() and enable Qdisc_opts.change so tc qdisc change will work. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 61 ++- 1 files changed, 60 insertions(+), 1 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index d72ea7c..b8e8fa5 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -416,6 +416,17 @@ sfq_default_parameters(struct Qdisc *sch) q-limit = SFQ_DEPTH_DEFAULT - 2; } +static void +sfq_copy_parameters(struct sfq_sched_data *dst, struct sfq_sched_data *src) +{ + dst-quantum= src-quantum; + dst-perturbation = src-perturbation; + dst-perturb_period = src-perturb_period; + dst-hash_divisor = src-hash_divisor; + dst-limit = src-limit; + dst-depth = src-depth; +} + static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) { @@ -503,6 +514,54 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) return 0; } +static int sfq_change(struct Qdisc *sch, struct rtattr *opt) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + struct sfq_sched_data tmp; + struct sk_buff *skb; + unsigned int qlen; + int err; + + /* set up tmp queue */ + memset(tmp, 0, sizeof(struct sfq_sched_data)); + sfq_copy_parameters(tmp, q); + if ((err = sfq_q_init(tmp, opt))) + return err; + + /* copy packets from the old queue to the tmp queue */ + sch_tree_lock(sch); + qlen = sch-q.qlen; + while (sch-q.qlen = tmp.limit - 1) + sfq_drop(sch); + qdisc_tree_decrease_qlen(sch, qlen - sch-q.qlen); + while ((skb = sfq_q_dequeue(q)) != NULL) + sfq_q_enqueue(skb, tmp, SFQ_TAIL); + + /* clean up the old queue */ + sfq_q_destroy(q); + + /* copy elements of the tmp queue into the old queue */ + sfq_copy_parameters(q, tmp); + q-tail = tmp.tail; + q-max_depth = tmp.max_depth; + q-ht= tmp.ht; + q-dep = tmp.dep; + q-next = tmp.next; + q-allot = tmp.allot; + q-hash = tmp.hash; + q-qs= tmp.qs; + + /* finish up */ + if (q-perturb_period) { + q-perturb_timer.expires = jiffies + q-perturb_period; + add_timer(q-perturb_timer); + } else { + q-perturbation = 0; + } + sch_tree_unlock(sch); + return 0; +} + static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) { struct sfq_sched_data *q = qdisc_priv(sch); @@ -537,7 +596,7 @@ static struct Qdisc_ops sfq_qdisc_ops = { .init = sfq_init, .reset = sfq_reset, .destroy= sfq_destroy, - .change = NULL, + .change = sfq_change, .dump = sfq_dump, .owner = THIS_MODULE, }; -- 1.5.3 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 08/10] Multiply perturb_period by HZ when used rather than when assigned.
perturb_period is the only parameter that doesn't match 1:1 with the value from userspace. This change makes it easy and clean to use a small macro for setting parameters (in a subsequent patch). Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 10 +- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 341a9a1..2d3cc38 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -370,7 +370,7 @@ static void sfq_perturbation(unsigned long arg) q-perturbation = net_random()0x1F; if (q-perturb_period) { - q-perturb_timer.expires = jiffies + q-perturb_period; + q-perturb_timer.expires = jiffies + q-perturb_period * HZ; add_timer(q-perturb_timer); } } @@ -433,7 +433,7 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) if (ctl-quantum) q-quantum = ctl-quantum; if (ctl-perturb_period) - q-perturb_period = ctl-perturb_period * HZ; + q-perturb_period = ctl-perturb_period; if (ctl-divisor) q-hash_divisor = ctl-divisor; if (ctl-flows) @@ -496,7 +496,7 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) q-perturb_timer.data = (unsigned long)sch; q-perturb_timer.function = sfq_perturbation; if (q-perturb_period) { - q-perturb_timer.expires = jiffies + q-perturb_period; + q-perturb_timer.expires = jiffies + q-perturb_period * HZ; add_timer(q-perturb_timer); } @@ -542,7 +542,7 @@ static int sfq_change(struct Qdisc *sch, struct rtattr *opt) /* finish up */ if (q-perturb_period) { - q-perturb_timer.expires = jiffies + q-perturb_period; + q-perturb_timer.expires = jiffies + q-perturb_period * HZ; add_timer(q-perturb_timer); } else { q-perturbation = 0; @@ -558,7 +558,7 @@ static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) struct tc_sfq_qopt opt; opt.quantum = q-quantum; - opt.perturb_period = q-perturb_period/HZ; + opt.perturb_period = q-perturb_period; opt.limit = q-limit; opt.divisor = q-hash_divisor; -- 1.5.3 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 07/10] Remove comments about hardcoded values.
None of these are true anymore (hooray!). Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- include/linux/pkt_sched.h |8 net/sched/sch_sfq.c | 17 +++-- 2 files changed, 3 insertions(+), 22 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 268c515..58a0ea6 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -148,14 +148,6 @@ struct tc_sfq_qopt unsignedflows; /* Maximal number of flows */ }; -/* - * NOTE: limit, divisor and flows are hardwired to code at the moment. - * - * limit=flows=128, divisor=1024; - * - * The only reason for this is efficiency, it is possible - * to change these parameters in compile time. - */ /* RED section */ diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index b8e8fa5..341a9a1 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -61,18 +61,7 @@ We still need true WFQ for top level CSZ, but using WFQ for the best effort traffic is absolutely pointless: - SFQ is superior for this purpose. - - IMPLEMENTATION: - This implementation limits maximal queue length to 128; - maximal mtu to 2^15-1; number of hash buckets to 1024. - The only goal of this restrictions was that all data - fit into one 4K page :-). Struct sfq_sched_data is - organized in anti-cache manner: all the data for a bucket - are scattered over different locations. This is not good, - but it allowed me to put it into 4K. - - It is easy to increase these values, but not in flight. */ + SFQ is superior for this purpose. */ #define SFQ_DEPTH_DEFAULT 128 #define SFQ_DIVISOR_DEFAULT1024 @@ -521,7 +510,7 @@ static int sfq_change(struct Qdisc *sch, struct rtattr *opt) struct sk_buff *skb; unsigned int qlen; int err; - + /* set up tmp queue */ memset(tmp, 0, sizeof(struct sfq_sched_data)); sfq_copy_parameters(tmp, q); @@ -536,7 +525,7 @@ static int sfq_change(struct Qdisc *sch, struct rtattr *opt) qdisc_tree_decrease_qlen(sch, qlen - sch-q.qlen); while ((skb = sfq_q_dequeue(q)) != NULL) sfq_q_enqueue(skb, tmp, SFQ_TAIL); - + /* clean up the old queue */ sfq_q_destroy(q); -- 1.5.3 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 09/10] Change perturb_period to unsigned.
perturb_period is currently a signed integer, but I can't see any good reason why this is so--a negative perturbation period will add a timer that expires in the past, causing constant perturbation, which makes hashing useless. if (q-perturb_period) { q-perturb_timer.expires = jiffies + q-perturb_period; add_timer(q-perturb_timer); } Strictly speaking, this will break binary compatibility with older versions of tc, but that ought not to be a problem because (a) there's no valid use for a negative perturb_period, and (b) negative values will be seen as high values ( INT_MAX), which don't work anyway. If perturb_period is too large, (perturb_period * HZ) will overflow the size of an unsigned int and wrap around. So, check for thet and reject values that are too high. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- include/linux/pkt_sched.h |2 +- net/sched/sch_sfq.c |8 ++-- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 58a0ea6..8559974 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -142,7 +142,7 @@ enum struct tc_sfq_qopt { unsignedquantum;/* Bytes per round allocated to flow */ - int perturb_period; /* Period of hash perturbation */ + unsignedperturb_period; /* Period of hash perturbation */ __u32 limit; /* Maximal packets in queue */ unsigneddivisor;/* Hash divisor */ unsignedflows; /* Maximal number of flows */ diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 2d3cc38..170fd37 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -74,6 +74,9 @@ typedef unsigned int sfq_index; #define SFQ_MAX_DEPTH (UINT_MAX / 2 - 1) +/* We don't want perturb_period * HZ to overflow an unsigned int. */ +#define SFQ_MAX_PERTURB (UINT_MAX / HZ) + struct sfq_head { sfq_index next; @@ -83,7 +86,7 @@ struct sfq_head struct sfq_sched_data { /* Parameters */ - int perturb_period; + unsignedperturb_period; unsignedquantum;/* Allotment per round: MUST BE = MTU */ int limit; unsigneddepth; @@ -441,7 +444,8 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) if (ctl-limit) q-limit = ctl-limit; - if (q-depth SFQ_MAX_DEPTH) + if (q-perturb_period SFQ_MAX_PERTURB || + q-depth SFQ_MAX_DEPTH) return -EINVAL; } q-limit = min_t(u32, q-limit, q-depth - 2); -- 1.5.3 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 05/10] Add divisor.
Make hash divisor user-configurable. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 18 +- 1 files changed, 13 insertions(+), 5 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 34a299d..d72ea7c 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -75,7 +75,7 @@ It is easy to increase these values, but not in flight. */ #define SFQ_DEPTH_DEFAULT 128 -#define SFQ_HASH_DIVISOR 1024 +#define SFQ_DIVISOR_DEFAULT1024 #define SFQ_HEAD 0 #define SFQ_TAIL 1 @@ -98,6 +98,7 @@ struct sfq_sched_data unsignedquantum;/* Allotment per round: MUST BE = MTU */ int limit; unsigneddepth; + unsignedhash_divisor; /* Variables */ struct timer_list perturb_timer; @@ -105,7 +106,7 @@ struct sfq_sched_data sfq_index tail; /* Index of current slot in round */ sfq_index max_depth; /* Maximal depth */ - sfq_index ht[SFQ_HASH_DIVISOR]; /* Hash table */ + sfq_index *ht;/* Hash table */ sfq_index *next; /* Active slots link */ short *allot; /* Current allotment per slot */ unsigned short *hash; /* Hash value indexed by slots */ @@ -120,7 +121,7 @@ static __inline__ unsigned sfq_fold_hash(struct sfq_sched_data *q, u32 h, u32 h1 /* Have we any rotation primitives? If not, WHY? */ h ^= (h1pert) ^ (h1(0x1F - pert)); h ^= h10; - return h 0x3FF; + return h (q-hash_divisor-1); } static unsigned sfq_hash(struct sfq_sched_data *q, struct sk_buff *skb) @@ -388,6 +389,7 @@ static void sfq_perturbation(unsigned long arg) static void sfq_q_destroy(struct sfq_sched_data *q) { del_timer(q-perturb_timer); + kfree(q-ht); kfree(q-dep); kfree(q-next); kfree(q-allot); @@ -409,6 +411,7 @@ sfq_default_parameters(struct Qdisc *sch) q-quantum= psched_mtu(sch-dev); q-perturbation = 0; q-perturb_period = 0; + q-hash_divisor = SFQ_DIVISOR_DEFAULT; q-depth = SFQ_DEPTH_DEFAULT; q-limit = SFQ_DEPTH_DEFAULT - 2; } @@ -431,6 +434,8 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) q-quantum = ctl-quantum; if (ctl-perturb_period) q-perturb_period = ctl-perturb_period * HZ; + if (ctl-divisor) + q-hash_divisor = ctl-divisor; if (ctl-flows) q-depth = ctl-flows; if (ctl-limit) @@ -443,6 +448,9 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) q-tail = q-depth; q-max_depth = 0; + q-ht = kcalloc(q-hash_divisor, sizeof(sfq_index), GFP_KERNEL); + if (!q-ht) + goto err_case; q-dep = kcalloc(1 + q-depth*2, sizeof(struct sfq_head), GFP_KERNEL); if (!q-dep) goto err_case; @@ -459,7 +467,7 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) if (!q-qs) goto err_case; - for (i=0; iSFQ_HASH_DIVISOR; i++) + for (i=0; iq-hash_divisor; i++) q-ht[i] = q-depth; for (i=0; i q-depth; i++) { skb_queue_head_init(q-qs[i]); @@ -505,7 +513,7 @@ static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) opt.perturb_period = q-perturb_period/HZ; opt.limit = q-limit; - opt.divisor = SFQ_HASH_DIVISOR; + opt.divisor = q-hash_divisor; opt.flows = q-depth; RTA_PUT(skb, TCA_OPTIONS, sizeof(opt), opt); -- 1.5.3 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 10/10] Use nested compat attributes to pass parameters.
This fixes the ambiguity between, for example: tc qdisc change ... perturb 0 tc qdisc change ... Without this patch, there is no way for SFQ to differentiate between a parameter specified to be 0 and a parameter that was omitted. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- include/linux/pkt_sched.h | 13 +++ net/sched/sch_sfq.c | 53 +--- 2 files changed, 48 insertions(+), 18 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 8559974..aad04eb 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -148,6 +148,19 @@ struct tc_sfq_qopt unsignedflows; /* Maximal number of flows */ }; +enum +{ + TCA_SFQ_UNSPEC, + TCA_SFQ_COMPAT, + TCA_SFQ_QUANTUM, + TCA_SFQ_PERTURB, + TCA_SFQ_LIMIT, + TCA_SFQ_DIVISOR, + TCA_SFQ_FLOWS, + __TCA_SFQ_MAX, +}; + +#define TCA_SFQ_MAX (__TCA_SFQ_MAX - 1) /* RED section */ diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 170fd37..36197f6 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -428,25 +428,31 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) * the previous values (sfq_change). So, overwrite the parameters as * specified. */ if (opt) { - struct tc_sfq_qopt *ctl = RTA_DATA(opt); - - if (opt-rta_len RTA_LENGTH(sizeof(*ctl))) - return -EINVAL; - - if (ctl-quantum) - q-quantum = ctl-quantum; - if (ctl-perturb_period) - q-perturb_period = ctl-perturb_period; - if (ctl-divisor) - q-hash_divisor = ctl-divisor; - if (ctl-flows) - q-depth = ctl-flows; - if (ctl-limit) - q-limit = ctl-limit; - + struct tc_sfq_qopt *ctl; + struct rtattr *tb[TCA_SFQ_MAX]; + + if (rtattr_parse_nested_compat(tb, TCA_SFQ_MAX, opt, ctl, + sizeof(*ctl))) + goto rtattr_failure; + +#define GET_PARAM(dst, nest, compat) do { \ + struct rtattr *rta = tb[(nest) - 1]; \ + if (rta) \ + (dst) = RTA_GET_U32(rta); \ + else if ((compat)) \ + (dst) = (compat); \ +} while (0) + + GET_PARAM(q-quantum,TCA_SFQ_QUANTUM, ctl-quantum); + GET_PARAM(q-perturb_period, TCA_SFQ_PERTURB, + ctl-perturb_period); + GET_PARAM(q-hash_divisor, TCA_SFQ_DIVISOR, ctl-divisor); + GET_PARAM(q-depth, TCA_SFQ_FLOWS, ctl-flows); + GET_PARAM(q-limit, TCA_SFQ_LIMIT, ctl-limit); + if (q-perturb_period SFQ_MAX_PERTURB || q-depth SFQ_MAX_DEPTH) - return -EINVAL; + goto rtattr_failure; } q-limit = min_t(u32, q-limit, q-depth - 2); q-tail = q-depth; @@ -482,6 +488,8 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) for (i=0; i q-depth; i++) sfq_link(q, i); return 0; +rtattr_failure: + return -EINVAL; err_case: sfq_q_destroy(q); return -ENOBUFS; @@ -559,17 +567,26 @@ static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) { struct sfq_sched_data *q = qdisc_priv(sch); unsigned char *b = skb_tail_pointer(skb); + struct rtattr *nest; struct tc_sfq_qopt opt; opt.quantum = q-quantum; opt.perturb_period = q-perturb_period; - opt.limit = q-limit; opt.divisor = q-hash_divisor; opt.flows = q-depth; + nest = RTA_NEST_COMPAT(skb, TCA_OPTIONS, sizeof(opt), opt); + + RTA_PUT_U32(skb, TCA_SFQ_QUANTUM, q-quantum); + RTA_PUT_U32(skb, TCA_SFQ_PERTURB, q-perturb_period); + RTA_PUT_U32(skb, TCA_SFQ_LIMIT, q-limit); + RTA_PUT_U32(skb, TCA_SFQ_DIVISOR, q-hash_divisor); + RTA_PUT_U32(skb, TCA_SFQ_FLOWS, q-depth); RTA_PUT(skb, TCA_OPTIONS, sizeof(opt), opt); + RTA_NEST_COMPAT_END(skb, nest); + return skb-len; rtattr_failure: -- 1.5.3 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
sfq
These patches follow the ESFQ--SFQ kernel patches. See the kernel patch summary for general information. Thanks, Corey include/linux/pkt_sched.h | 23 ++- tc/q_sfq.c| 43 ++- 2 files changed, 52 insertions(+), 14 deletions(-) [PATCH 1/3] SFQ: Support changing depth and divisor. [PATCH 2/3] Change perturb_period to unsigned. [PATCH 3/3] Use nested compat attributes for passing parameters to the kernel. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] Change perturb_period to unsigned.
This corresponds to the kernel patch doing the same. Here, too, this will technically break binary compatibility with older kernels, but that shouldn't be a problem because negative perturb_period values aren't usable anyway. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- include/linux/pkt_sched.h |2 +- tc/q_sfq.c|4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 9d41f63..fb04a89 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -142,7 +142,7 @@ enum struct tc_sfq_qopt { unsignedquantum;/* Bytes per round allocated to flow */ - int perturb_period; /* Period of hash perturbation */ + unsignedperturb_period; /* Period of hash perturbation */ __u32 limit; /* Maximal packets in queue */ unsigneddivisor;/* Hash divisor */ unsignedflows; /* Maximal number of flows */ diff --git a/tc/q_sfq.c b/tc/q_sfq.c index 7754db7..c9fcc53 100644 --- a/tc/q_sfq.c +++ b/tc/q_sfq.c @@ -47,7 +47,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl ok++; } else if (strcmp(*argv, perturb) == 0) { NEXT_ARG(); - if (get_integer(opt.perturb_period, *argv, 0)) { + if (get_u32(opt.perturb_period, *argv, 0)) { fprintf(stderr, Illegal \perturb\\n); return -1; } @@ -115,7 +115,7 @@ static int sfq_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt) fprintf(f, flows %u/%u , qopt-flows, qopt-divisor); } if (qopt-perturb_period) - fprintf(f, perturb %dsec , qopt-perturb_period); + fprintf(f, perturb %usec , qopt-perturb_period); return 0; } -- 1.5.3 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] Use nested compat attributes for passing parameters to the kernel.
Note that I have left sfq_print_opt() alone. At this point, there can be no difference between the data in the nested rtattrs and the data in the compat rtattr, and I didn't want to add clutter that isn't useful. Let me know if I should do differently. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- include/linux/pkt_sched.h | 14 ++ tc/q_sfq.c| 18 -- 2 files changed, 30 insertions(+), 2 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index fb04a89..aad04eb 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -148,6 +148,20 @@ struct tc_sfq_qopt unsignedflows; /* Maximal number of flows */ }; +enum +{ + TCA_SFQ_UNSPEC, + TCA_SFQ_COMPAT, + TCA_SFQ_QUANTUM, + TCA_SFQ_PERTURB, + TCA_SFQ_LIMIT, + TCA_SFQ_DIVISOR, + TCA_SFQ_FLOWS, + __TCA_SFQ_MAX, +}; + +#define TCA_SFQ_MAX (__TCA_SFQ_MAX - 1) + /* RED section */ enum diff --git a/tc/q_sfq.c b/tc/q_sfq.c index c9fcc53..5bb3eb7 100644 --- a/tc/q_sfq.c +++ b/tc/q_sfq.c @@ -34,9 +34,13 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl { int ok=0; struct tc_sfq_qopt opt; + struct rtattr *nest; memset(opt, 0, sizeof(opt)); + /* put blank data in rtattr so there is a hole to fill later */ + nest = addattr_nest_compat(n, 1024, TCA_OPTIONS, opt, sizeof(opt)); + while (argc 0) { if (strcmp(*argv, quantum) == 0) { NEXT_ARG(); @@ -44,6 +48,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl fprintf(stderr, Illegal \limit\\n); return -1; } + addattr32(n, 1024, TCA_SFQ_QUANTUM, opt.quantum); ok++; } else if (strcmp(*argv, perturb) == 0) { NEXT_ARG(); @@ -51,6 +56,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl fprintf(stderr, Illegal \perturb\\n); return -1; } + addattr32(n, 1024, TCA_SFQ_PERTURB, opt.perturb_period); ok++; } else if (strcmp(*argv, limit) == 0) { NEXT_ARG(); @@ -62,6 +68,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl fprintf(stderr, Illegal \limit\, must be 1\n); return -1; } + addattr32(n, 1024, TCA_SFQ_LIMIT, opt.limit); ok++; } else if (strcmp(*argv, depth) == 0) { NEXT_ARG(); @@ -69,6 +76,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl fprintf(stderr, Illegal \depth\\n); return -1; } + addattr32(n, 1024, TCA_SFQ_FLOWS, opt.flows); ok++; } else if (strcmp(*argv, divisor) == 0) { NEXT_ARG(); @@ -81,6 +89,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl return -1; } opt.divisor = 1opt.divisor; + addattr32(n, 1024, TCA_SFQ_DIVISOR, opt.divisor); ok++; } else if (strcmp(*argv, help) == 0) { explain(); @@ -93,8 +102,13 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl argc--; argv++; } - if (ok) - addattr_l(n, 1024, TCA_OPTIONS, opt, sizeof(opt)); + if (ok) { + /* fill the hole we left earlier with real compat data */ + memcpy(RTA_DATA(nest), opt, sizeof(opt)); + addattr_nest_compat_end(n, nest); + } + else + nest-rta_len = 0; return 0; } -- 1.5.3 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] SFQ: Support changing depth and divisor.
This can safely be applied either before or after the kernel patches because the tc_sfq_qopt struct is unchanged: - old kernels will ignore the parameters from new iproute2 - new kernels will use the same default parameters Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- include/linux/pkt_sched.h |9 - tc/q_sfq.c| 21 - 2 files changed, 20 insertions(+), 10 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 268c515..9d41f63 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -148,15 +148,6 @@ struct tc_sfq_qopt unsignedflows; /* Maximal number of flows */ }; -/* - * NOTE: limit, divisor and flows are hardwired to code at the moment. - * - * limit=flows=128, divisor=1024; - * - * The only reason for this is efficiency, it is possible - * to change these parameters in compile time. - */ - /* RED section */ enum diff --git a/tc/q_sfq.c b/tc/q_sfq.c index 05385cf..7754db7 100644 --- a/tc/q_sfq.c +++ b/tc/q_sfq.c @@ -25,7 +25,7 @@ static void explain(void) { - fprintf(stderr, Usage: ... sfq [ limit NUMBER ] [ perturb SECS ] [ quantum BYTES ]\n); + fprintf(stderr, Usage: ... sfq [ limit NUMBER ] [ depth FLOWS ] [ divisor HASHBITS ] [ perturb SECS ] [ quantum BYTES ]\n); } #define usage() return(-1) @@ -63,6 +63,25 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl return -1; } ok++; + } else if (strcmp(*argv, depth) == 0) { + NEXT_ARG(); + if (get_unsigned(opt.flows, *argv, 0)) { + fprintf(stderr, Illegal \depth\\n); + return -1; + } + ok++; + } else if (strcmp(*argv, divisor) == 0) { + NEXT_ARG(); + if (get_unsigned(opt.divisor, *argv, 0)) { + fprintf(stderr, Illegal \divisor\\n); + return -1; + } + if (opt.divisor = 15) { + fprintf(stderr, Illegal \divisor\, must be 15\n); + return -1; + } + opt.divisor = 1opt.divisor; + ok++; } else if (strcmp(*argv, help) == 0) { explain(); return -1; -- 1.5.3 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sfq (iproute2 patches)
Corey Hickey wrote: These patches follow the ESFQ--SFQ kernel patches. See the kernel patch summary for general information. Dang, I forgot to set the subject; these are the iproute2 patches. -Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 00/10] SFQ: backport some features from ESFQ (try 3)
Patchset try 2 addresses the review by Michael Buesch. Patchset try 3 addresses the review by Patrick McHardy. The first 7 patches in this series resemble the corresponding 7 patches I sent previously. There aren't any major changes--just modifications to address errors noticed in review and slight reorganizations to make the next patches easier. Patches 8-10 implement parameter passing via nested compat attributes. This is necessary for using 'tc qdisc change' to disable perturbation. The rest of the parameters were added for consistency. Iproute2 patches will follow shortly. The following is the original patch text. This set of patches adds some of ESFQ's modifications to the original SFQ. Thus far, I have received support for this approach rather than for trying to get ESFQ included as a separate qdisc. http://mailman.ds9a.nl/pipermail/lartc/2007q2/021056.html My patches here implement tc qdisc change, user-configurable depth (number of flows), and user-configurable divisor (for setting hash table size). I've left out the remaining ESFQ features (usage of jhash and different hashing methods) because Patrick McHardy intends to submit a patch that will supersede that functionality; see the URL above. Default values remain the same, and SFQ's default behavior remains the same, so there should be no user disruption. Thanks for your consideration, Corey include/linux/pkt_sched.h | 23 ++- net/sched/sch_sfq.c | 356 ++-- 2 files changed, 257 insertions(+), 122 deletions(-) [PATCH 01/10] Preparatory refactoring part 1. [PATCH 02/10] Preparatory refactoring part 2. [PATCH 03/10] Move two functions. [PATCH 04/10] Make depth (number of queues) user-configurable: [PATCH 05/10] Add divisor. [PATCH 06/10] Make qdisc changeable. [PATCH 07/10] Remove comments about hardcoded values. [PATCH 08/10] Multiply perturb_period by HZ when used rather than when assigned. [PATCH 09/10] Change perturb_period to unsigned. [PATCH 10/10] Use nested compat attributes to pass parameters. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 01/10] Preparatory refactoring part 1.
Make a new function sfq_q_enqueue() that operates directly on the queue data. This will be useful for implementing sfq_change() in a later patch. A pleasant side-effect is reducing most of the duplicate code in sfq_enqueue() and sfq_requeue(). Similarly, make a new function sfq_q_dequeue(). Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 72 +++ 1 files changed, 38 insertions(+), 34 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 9579573..346e966 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -77,6 +77,9 @@ #define SFQ_DEPTH 128 #define SFQ_HASH_DIVISOR 1024 +#define SFQ_HEAD 0 +#define SFQ_TAIL 1 + /* This type should contain at least SFQ_DEPTH*2 values */ typedef unsigned char sfq_index; @@ -244,10 +247,9 @@ static unsigned int sfq_drop(struct Qdisc *sch) return 0; } -static int -sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) +static void +sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, unsigned int end) { - struct sfq_sched_data *q = qdisc_priv(sch); unsigned hash = sfq_hash(q, skb); sfq_index x; @@ -256,8 +258,12 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) q-ht[hash] = x = q-dep[SFQ_DEPTH].next; q-hash[x] = hash; } - sch-qstats.backlog += skb-len; - __skb_queue_tail(q-qs[x], skb); + + if (end == SFQ_TAIL) + __skb_queue_tail(q-qs[x], skb); + else + __skb_queue_head(q-qs[x], skb); + sfq_inc(q, x); if (q-qs[x].qlen == 1) { /* The flow is new */ if (q-tail == SFQ_DEPTH) { /* It is the first flow */ @@ -270,6 +276,15 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) q-tail = x; } } +} + +static int +sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + + sfq_q_enqueue(skb, q, SFQ_TAIL); + sch-qstats.backlog += skb-len; if (++sch-q.qlen q-limit-1) { sch-bstats.bytes += skb-len; sch-bstats.packets++; @@ -284,45 +299,21 @@ static int sfq_requeue(struct sk_buff *skb, struct Qdisc* sch) { struct sfq_sched_data *q = qdisc_priv(sch); - unsigned hash = sfq_hash(q, skb); - sfq_index x; - x = q-ht[hash]; - if (x == SFQ_DEPTH) { - q-ht[hash] = x = q-dep[SFQ_DEPTH].next; - q-hash[x] = hash; - } + sfq_q_enqueue(skb, q, SFQ_HEAD); sch-qstats.backlog += skb-len; - __skb_queue_head(q-qs[x], skb); - sfq_inc(q, x); - if (q-qs[x].qlen == 1) { /* The flow is new */ - if (q-tail == SFQ_DEPTH) { /* It is the first flow */ - q-tail = x; - q-next[x] = x; - q-allot[x] = q-quantum; - } else { - q-next[x] = q-next[q-tail]; - q-next[q-tail] = x; - q-tail = x; - } - } if (++sch-q.qlen q-limit - 1) { sch-qstats.requeues++; return 0; } - sch-qstats.drops++; sfq_drop(sch); return NET_XMIT_CN; } - - - -static struct sk_buff * -sfq_dequeue(struct Qdisc* sch) +static struct +sk_buff *sfq_q_dequeue(struct sfq_sched_data *q) { - struct sfq_sched_data *q = qdisc_priv(sch); struct sk_buff *skb; sfq_index a, old_a; @@ -335,8 +326,6 @@ sfq_dequeue(struct Qdisc* sch) /* Grab packet */ skb = __skb_dequeue(q-qs[a]); sfq_dec(q, a); - sch-q.qlen--; - sch-qstats.backlog -= skb-len; /* Is the slot empty? */ if (q-qs[a].qlen == 0) { @@ -353,6 +342,21 @@ sfq_dequeue(struct Qdisc* sch) a = q-next[a]; q-allot[a] += q-quantum; } + + return skb; +} + +static struct sk_buff +*sfq_dequeue(struct Qdisc* sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + struct sk_buff *skb; + + skb = sfq_q_dequeue(q); + if (skb == NULL) + return NULL; + sch-q.qlen--; + sch-qstats.backlog -= skb-len; return skb; } -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 02/10] Preparatory refactoring part 2.
Factor code out of sfq_init() and sfq_destroy(), again so that the new functions can be used by sfq_change() later. Actually, as the diff itself shows, most of the sfq_q_init() code comes from the original sfq_change(), but sfq_change() is only called by sfq_init() right now. Thus, it is safe to remove sfq_change(); tc qdisc change doesn't yet work for sfq anyway. Setting default parameters is moved into a separate function for clarity. The sfq_destroy() -- sfq_q_destroy() change looks pointless here, but it's cleaner to split now and add code to sfq_q_destroy() in a later patch. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 96 +-- 1 files changed, 55 insertions(+), 41 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 346e966..f95a0dc 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -382,43 +382,42 @@ static void sfq_perturbation(unsigned long arg) } } -static int sfq_change(struct Qdisc *sch, struct rtattr *opt) +static void +sfq_default_parameters(struct Qdisc *sch) { struct sfq_sched_data *q = qdisc_priv(sch); - struct tc_sfq_qopt *ctl = RTA_DATA(opt); - unsigned int qlen; - - if (opt-rta_len RTA_LENGTH(sizeof(*ctl))) - return -EINVAL; - - sch_tree_lock(sch); - q-quantum = ctl-quantum ? : psched_mtu(sch-dev); - q-perturb_period = ctl-perturb_period*HZ; - if (ctl-limit) - q-limit = min_t(u32, ctl-limit, SFQ_DEPTH); - qlen = sch-q.qlen; - while (sch-q.qlen = q-limit-1) - sfq_drop(sch); - qdisc_tree_decrease_qlen(sch, qlen - sch-q.qlen); - - del_timer(q-perturb_timer); - if (q-perturb_period) { - q-perturb_timer.expires = jiffies + q-perturb_period; - add_timer(q-perturb_timer); - } - sch_tree_unlock(sch); - return 0; + q-quantum = psched_mtu(sch-dev); + q-perturbation = 0; + q-perturb_period = 0; + q-limit = SFQ_DEPTH; } -static int sfq_init(struct Qdisc *sch, struct rtattr *opt) + +static int +sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) { - struct sfq_sched_data *q = qdisc_priv(sch); int i; - init_timer(q-perturb_timer); - q-perturb_timer.data = (unsigned long)sch; - q-perturb_timer.function = sfq_perturbation; + /* At this point, parameters are set to either defaults (sfq_init) or +* the previous values (sfq_change). So, overwrite the parameters as +* specified. */ + if (opt) { + struct tc_sfq_qopt *ctl = RTA_DATA(opt); + + if (opt-rta_len RTA_LENGTH(sizeof(*ctl))) + return -EINVAL; + + if (ctl-quantum) + q-quantum = ctl-quantum; + if (ctl-perturb_period) + q-perturb_period = ctl-perturb_period * HZ; + if (ctl-limit) + q-limit = ctl-limit; + } + q-limit = min_t(u32, q-limit, SFQ_DEPTH); + q-tail = SFQ_DEPTH; + q-max_depth = 0; for (i=0; iSFQ_HASH_DIVISOR; i++) q-ht[i] = SFQ_DEPTH; @@ -427,28 +426,43 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) q-dep[i+SFQ_DEPTH].next = i+SFQ_DEPTH; q-dep[i+SFQ_DEPTH].prev = i+SFQ_DEPTH; } - q-limit = SFQ_DEPTH; - q-max_depth = 0; - q-tail = SFQ_DEPTH; - if (opt == NULL) { - q-quantum = psched_mtu(sch-dev); - q-perturb_period = 0; - } else { - int err = sfq_change(sch, opt); - if (err) - return err; - } + for (i=0; iSFQ_DEPTH; i++) sfq_link(q, i); return 0; } -static void sfq_destroy(struct Qdisc *sch) +static int sfq_init(struct Qdisc *sch, struct rtattr *opt) { struct sfq_sched_data *q = qdisc_priv(sch); + int err; + + sfq_default_parameters(sch); + if ((err = sfq_q_init(q, opt))) + return err; + + init_timer(q-perturb_timer); + q-perturb_timer.data = (unsigned long)sch; + q-perturb_timer.function = sfq_perturbation; + if (q-perturb_period) { + q-perturb_timer.expires = jiffies + q-perturb_period; + add_timer(q-perturb_timer); + } + + return 0; +} + +static void sfq_q_destroy(struct sfq_sched_data *q) +{ del_timer(q-perturb_timer); } +static void sfq_destroy(struct Qdisc *sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + sfq_q_destroy(q); +} + static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) { struct sfq_sched_data *q = qdisc_priv(sch); -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org
[PATCH 03/10] Move two functions.
Move sfq_q_destroy() to above sfq_q_init() so that it can be used by an error case in a later patch. Move sfq_destroy() as well, for clarity. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 22 +++--- 1 files changed, 11 insertions(+), 11 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index f95a0dc..676195d 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -382,6 +382,17 @@ static void sfq_perturbation(unsigned long arg) } } +static void sfq_q_destroy(struct sfq_sched_data *q) +{ + del_timer(q-perturb_timer); +} + +static void sfq_destroy(struct Qdisc *sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + sfq_q_destroy(q); +} + static void sfq_default_parameters(struct Qdisc *sch) { @@ -452,17 +463,6 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) return 0; } -static void sfq_q_destroy(struct sfq_sched_data *q) -{ - del_timer(q-perturb_timer); -} - -static void sfq_destroy(struct Qdisc *sch) -{ - struct sfq_sched_data *q = qdisc_priv(sch); - sfq_q_destroy(q); -} - static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) { struct sfq_sched_data *q = qdisc_priv(sch); -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 04/10] Make depth (number of queues) user-configurable:
* replace #define with a parameter * use old hardcoded value as a default * kcalloc() arrays in sfq_q_init() * free() arrays in sfq_q_destroy() Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 84 +++ 1 files changed, 58 insertions(+), 26 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 676195d..2e6d607 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -74,14 +74,16 @@ It is easy to increase these values, but not in flight. */ -#define SFQ_DEPTH 128 +#define SFQ_DEPTH_DEFAULT 128 #define SFQ_HASH_DIVISOR 1024 #define SFQ_HEAD 0 #define SFQ_TAIL 1 -/* This type should contain at least SFQ_DEPTH*2 values */ -typedef unsigned char sfq_index; +/* This type must contain greater than depth*2 values, so depth is constrained + * accordingly. */ +typedef unsigned int sfq_index; +#define SFQ_MAX_DEPTH (UINT_MAX / 2 - 1) struct sfq_head { @@ -95,6 +97,7 @@ struct sfq_sched_data int perturb_period; unsignedquantum;/* Allotment per round: MUST BE = MTU */ int limit; + unsigneddepth; /* Variables */ struct timer_list perturb_timer; @@ -103,11 +106,11 @@ struct sfq_sched_data sfq_index max_depth; /* Maximal depth */ sfq_index ht[SFQ_HASH_DIVISOR]; /* Hash table */ - sfq_index next[SFQ_DEPTH];/* Active slots link */ - short allot[SFQ_DEPTH]; /* Current allotment per slot */ - unsigned short hash[SFQ_DEPTH];/* Hash value indexed by slots */ - struct sk_buff_head qs[SFQ_DEPTH]; /* Slot queue */ - struct sfq_head dep[SFQ_DEPTH*2]; /* Linked list of slots, indexed by depth */ + sfq_index *next; /* Active slots link */ + short *allot; /* Current allotment per slot */ + unsigned short *hash; /* Hash value indexed by slots */ + struct sk_buff_head *qs;/* Slot queue */ + struct sfq_head *dep; /* Linked list of slots, indexed by depth */ }; static __inline__ unsigned sfq_fold_hash(struct sfq_sched_data *q, u32 h, u32 h1) @@ -164,7 +167,7 @@ static unsigned sfq_hash(struct sfq_sched_data *q, struct sk_buff *skb) static inline void sfq_link(struct sfq_sched_data *q, sfq_index x) { sfq_index p, n; - int d = q-qs[x].qlen + SFQ_DEPTH; + int d = q-qs[x].qlen + q-depth; p = d; n = q-dep[d].next; @@ -215,7 +218,7 @@ static unsigned int sfq_drop(struct Qdisc *sch) drop a packet from it */ if (d 1) { - sfq_index x = q-dep[d+SFQ_DEPTH].next; + sfq_index x = q-dep[d + q-depth].next; skb = q-qs[x].prev; len = skb-len; __skb_unlink(skb, q-qs[x]); @@ -238,7 +241,7 @@ static unsigned int sfq_drop(struct Qdisc *sch) kfree_skb(skb); sfq_dec(q, d); sch-q.qlen--; - q-ht[q-hash[d]] = SFQ_DEPTH; + q-ht[q-hash[d]] = q-depth; sch-qstats.drops++; sch-qstats.backlog -= len; return len; @@ -254,8 +257,8 @@ sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, unsigned int end) sfq_index x; x = q-ht[hash]; - if (x == SFQ_DEPTH) { - q-ht[hash] = x = q-dep[SFQ_DEPTH].next; + if (x == q-depth) { + q-ht[hash] = x = q-dep[q-depth].next; q-hash[x] = hash; } @@ -266,7 +269,7 @@ sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, unsigned int end) sfq_inc(q, x); if (q-qs[x].qlen == 1) { /* The flow is new */ - if (q-tail == SFQ_DEPTH) { /* It is the first flow */ + if (q-tail == q-depth) { /* It is the first flow */ q-tail = x; q-next[x] = x; q-allot[x] = q-quantum; @@ -318,7 +321,7 @@ sk_buff *sfq_q_dequeue(struct sfq_sched_data *q) sfq_index a, old_a; /* No active slots */ - if (q-tail == SFQ_DEPTH) + if (q-tail == q-depth) return NULL; a = old_a = q-next[q-tail]; @@ -329,10 +332,10 @@ sk_buff *sfq_q_dequeue(struct sfq_sched_data *q) /* Is the slot empty? */ if (q-qs[a].qlen == 0) { - q-ht[q-hash[a]] = SFQ_DEPTH; + q-ht[q-hash[a]] = q-depth; a = q-next[a]; if (a == old_a) { - q-tail = SFQ_DEPTH; + q-tail = q-depth; return skb; } q-next[q-tail] = a; @@ -385,6 +388,11 @@ static void sfq_perturbation(unsigned long arg) static void
[PATCH 08/10] Multiply perturb_period by HZ when used rather than when assigned.
perturb_period is the only parameter that doesn't match 1:1 with the value from userspace. This change makes it easy and clean to use a small macro for setting parameters (in a subsequent patch). Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 10 +- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 77ffce3..f9f4460 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -370,7 +370,7 @@ static void sfq_perturbation(unsigned long arg) q-perturbation = net_random()0x1F; if (q-perturb_period) { - q-perturb_timer.expires = jiffies + q-perturb_period; + q-perturb_timer.expires = jiffies + q-perturb_period * HZ; add_timer(q-perturb_timer); } } @@ -432,7 +432,7 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) if (ctl-quantum) q-quantum = ctl-quantum; if (ctl-perturb_period) - q-perturb_period = ctl-perturb_period * HZ; + q-perturb_period = ctl-perturb_period; if (ctl-divisor) q-hash_divisor = ctl-divisor; if (ctl-flows) @@ -495,7 +495,7 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) q-perturb_timer.data = (unsigned long)sch; q-perturb_timer.function = sfq_perturbation; if (q-perturb_period) { - q-perturb_timer.expires = jiffies + q-perturb_period; + q-perturb_timer.expires = jiffies + q-perturb_period * HZ; add_timer(q-perturb_timer); } @@ -545,7 +545,7 @@ static int sfq_change(struct Qdisc *sch, struct rtattr *opt) /* finish up */ if (q-perturb_period) { - q-perturb_timer.expires = jiffies + q-perturb_period; + q-perturb_timer.expires = jiffies + q-perturb_period * HZ; add_timer(q-perturb_timer); } else { q-perturbation = 0; @@ -561,7 +561,7 @@ static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) struct tc_sfq_qopt opt; opt.quantum = q-quantum; - opt.perturb_period = q-perturb_period/HZ; + opt.perturb_period = q-perturb_period; opt.limit = q-limit; opt.divisor = q-hash_divisor; -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 06/10] Make qdisc changeable.
Re-implement sfq_change() and enable Qdisc_opts.change so tc qdisc change will work. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 64 ++- 1 files changed, 63 insertions(+), 1 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 827b885..08e6862 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -415,6 +415,16 @@ sfq_default_parameters(struct Qdisc *sch) q-limit = q-depth = SFQ_DEPTH_DEFAULT; } +static void +sfq_copy_parameters(struct sfq_sched_data *dst, struct sfq_sched_data *src) +{ + dst-quantum= src-quantum; + dst-perturbation = src-perturbation; + dst-perturb_period = src-perturb_period; + dst-hash_divisor = src-hash_divisor; + dst-limit = src-limit; + dst-depth = src-depth; +} static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) @@ -503,6 +513,58 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) return 0; } +static int sfq_change(struct Qdisc *sch, struct rtattr *opt) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + struct sfq_sched_data tmp; + struct sk_buff *skb; + unsigned int qlen; + int err; + + /* set up tmp queue */ + memset(tmp, 0, sizeof(struct sfq_sched_data)); + sfq_copy_parameters(tmp, q); + if ((err = sfq_q_init(tmp, opt))) + return err; + + /* copy packets from the old queue to the tmp queue */ + sch_tree_lock(sch); + qlen = sch-q.qlen; + while (sch-q.qlen = tmp.limit - 1) + sfq_drop(sch); + qdisc_tree_decrease_qlen(sch, qlen - sch-q.qlen); + while ((skb = sfq_q_dequeue(q)) != NULL) + sfq_q_enqueue(skb, tmp, SFQ_TAIL); + + /* clean up the old queue */ + sfq_q_destroy(q); + + /* copy elements of the tmp queue into the old queue */ + q-perturb_period = tmp.perturb_period; + q-quantum= tmp.quantum; + q-limit = tmp.limit; + q-depth = tmp.depth; + q-hash_divisor = tmp.hash_divisor; + q-tail = tmp.tail; + q-max_depth = tmp.max_depth; + q-ht= tmp.ht; + q-dep = tmp.dep; + q-next = tmp.next; + q-allot = tmp.allot; + q-hash = tmp.hash; + q-qs= tmp.qs; + + /* finish up */ + if (q-perturb_period) { + q-perturb_timer.expires = jiffies + q-perturb_period; + add_timer(q-perturb_timer); + } else { + q-perturbation = 0; + } + sch_tree_unlock(sch); + return 0; +} + static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) { struct sfq_sched_data *q = qdisc_priv(sch); @@ -537,7 +599,7 @@ static struct Qdisc_ops sfq_qdisc_ops = { .init = sfq_init, .reset = sfq_reset, .destroy= sfq_destroy, - .change = NULL, + .change = sfq_change, .dump = sfq_dump, .owner = THIS_MODULE, }; -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 07/10] Remove comments about hardcoded values.
None of these are true anymore (hooray!). Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- include/linux/pkt_sched.h |8 net/sched/sch_sfq.c | 17 +++-- 2 files changed, 3 insertions(+), 22 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 268c515..58a0ea6 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -148,14 +148,6 @@ struct tc_sfq_qopt unsignedflows; /* Maximal number of flows */ }; -/* - * NOTE: limit, divisor and flows are hardwired to code at the moment. - * - * limit=flows=128, divisor=1024; - * - * The only reason for this is efficiency, it is possible - * to change these parameters in compile time. - */ /* RED section */ diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 08e6862..77ffce3 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -61,18 +61,7 @@ We still need true WFQ for top level CSZ, but using WFQ for the best effort traffic is absolutely pointless: - SFQ is superior for this purpose. - - IMPLEMENTATION: - This implementation limits maximal queue length to 128; - maximal mtu to 2^15-1; number of hash buckets to 1024. - The only goal of this restrictions was that all data - fit into one 4K page :-). Struct sfq_sched_data is - organized in anti-cache manner: all the data for a bucket - are scattered over different locations. This is not good, - but it allowed me to put it into 4K. - - It is easy to increase these values, but not in flight. */ + SFQ is superior for this purpose. */ #define SFQ_DEPTH_DEFAULT 128 #define SFQ_DIVISOR_DEFAULT1024 @@ -520,7 +509,7 @@ static int sfq_change(struct Qdisc *sch, struct rtattr *opt) struct sk_buff *skb; unsigned int qlen; int err; - + /* set up tmp queue */ memset(tmp, 0, sizeof(struct sfq_sched_data)); sfq_copy_parameters(tmp, q); @@ -535,7 +524,7 @@ static int sfq_change(struct Qdisc *sch, struct rtattr *opt) qdisc_tree_decrease_qlen(sch, qlen - sch-q.qlen); while ((skb = sfq_q_dequeue(q)) != NULL) sfq_q_enqueue(skb, tmp, SFQ_TAIL); - + /* clean up the old queue */ sfq_q_destroy(q); -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 05/10] Add divisor.
Make hash divisor user-configurable. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 18 +- 1 files changed, 13 insertions(+), 5 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 2e6d607..827b885 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -75,7 +75,7 @@ It is easy to increase these values, but not in flight. */ #define SFQ_DEPTH_DEFAULT 128 -#define SFQ_HASH_DIVISOR 1024 +#define SFQ_DIVISOR_DEFAULT1024 #define SFQ_HEAD 0 #define SFQ_TAIL 1 @@ -98,6 +98,7 @@ struct sfq_sched_data unsignedquantum;/* Allotment per round: MUST BE = MTU */ int limit; unsigneddepth; + unsignedhash_divisor; /* Variables */ struct timer_list perturb_timer; @@ -105,7 +106,7 @@ struct sfq_sched_data sfq_index tail; /* Index of current slot in round */ sfq_index max_depth; /* Maximal depth */ - sfq_index ht[SFQ_HASH_DIVISOR]; /* Hash table */ + sfq_index *ht;/* Hash table */ sfq_index *next; /* Active slots link */ short *allot; /* Current allotment per slot */ unsigned short *hash; /* Hash value indexed by slots */ @@ -120,7 +121,7 @@ static __inline__ unsigned sfq_fold_hash(struct sfq_sched_data *q, u32 h, u32 h1 /* Have we any rotation primitives? If not, WHY? */ h ^= (h1pert) ^ (h1(0x1F - pert)); h ^= h10; - return h 0x3FF; + return h (q-hash_divisor-1); } static unsigned sfq_hash(struct sfq_sched_data *q, struct sk_buff *skb) @@ -388,6 +389,7 @@ static void sfq_perturbation(unsigned long arg) static void sfq_q_destroy(struct sfq_sched_data *q) { del_timer(q-perturb_timer); + kfree(q-ht); kfree(q-dep); kfree(q-next); kfree(q-allot); @@ -409,6 +411,7 @@ sfq_default_parameters(struct Qdisc *sch) q-quantum = psched_mtu(sch-dev); q-perturbation = 0; q-perturb_period = 0; + q-hash_divisor = SFQ_DIVISOR_DEFAULT; q-limit = q-depth = SFQ_DEPTH_DEFAULT; } @@ -431,6 +434,8 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) q-quantum = ctl-quantum; if (ctl-perturb_period) q-perturb_period = ctl-perturb_period * HZ; + if (ctl-divisor) + q-hash_divisor = ctl-divisor; if (ctl-flows) q-depth = ctl-flows; if (ctl-limit) @@ -443,6 +448,9 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) q-tail = q-depth; q-max_depth = 0; + q-ht = kcalloc(q-hash_divisor, sizeof(sfq_index), GFP_KERNEL); + if (!q-ht) + goto err_case; q-dep = kcalloc(1 + q-depth*2, sizeof(struct sfq_head), GFP_KERNEL); if (!q-dep) goto err_case; @@ -459,7 +467,7 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) if (!q-qs) goto err_case; - for (i=0; iSFQ_HASH_DIVISOR; i++) + for (i=0; iq-hash_divisor; i++) q-ht[i] = q-depth; for (i=0; i q-depth; i++) { skb_queue_head_init(q-qs[i]); @@ -505,7 +513,7 @@ static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) opt.perturb_period = q-perturb_period/HZ; opt.limit = q-limit; - opt.divisor = SFQ_HASH_DIVISOR; + opt.divisor = q-hash_divisor; opt.flows = q-depth; RTA_PUT(skb, TCA_OPTIONS, sizeof(opt), opt); -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 09/10] Change perturb_period to unsigned.
perturb_period is currently a signed integer, but I can't see any good reason why this is so--a negative perturbation period will add a timer that expires in the past, causing constant perturbation, which makes hashing useless. if (q-perturb_period) { q-perturb_timer.expires = jiffies + q-perturb_period; add_timer(q-perturb_timer); } Strictly speaking, this will break binary compatibility with older versions of tc, but that ought not to be a problem because (a) there's no valid use for a negative perturb_period, and (b) negative values will be seen as high values ( INT_MAX), which don't work anyway. If perturb_period is too large, (perturb_period * HZ) will overflow the size of an unsigned int and wrap around. So, check for thet and reject values that are too high. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- include/linux/pkt_sched.h |2 +- net/sched/sch_sfq.c |8 ++-- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 58a0ea6..8559974 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -142,7 +142,7 @@ enum struct tc_sfq_qopt { unsignedquantum;/* Bytes per round allocated to flow */ - int perturb_period; /* Period of hash perturbation */ + unsignedperturb_period; /* Period of hash perturbation */ __u32 limit; /* Maximal packets in queue */ unsigneddivisor;/* Hash divisor */ unsignedflows; /* Maximal number of flows */ diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index f9f4460..157adc8 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -74,6 +74,9 @@ typedef unsigned int sfq_index; #define SFQ_MAX_DEPTH (UINT_MAX / 2 - 1) +/* We don't want perturb_period * HZ to overflow an unsigned int. */ +#define SFQ_MAX_PERTURB (UINT_MAX / HZ) + struct sfq_head { sfq_index next; @@ -83,7 +86,7 @@ struct sfq_head struct sfq_sched_data { /* Parameters */ - int perturb_period; + unsignedperturb_period; unsignedquantum;/* Allotment per round: MUST BE = MTU */ int limit; unsigneddepth; @@ -440,7 +443,8 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) if (ctl-limit) q-limit = ctl-limit; - if (q-depth SFQ_MAX_DEPTH) + if (q-perturb_period SFQ_MAX_PERTURB || + q-depth SFQ_MAX_DEPTH) return -EINVAL; } q-limit = min_t(u32, q-limit, q-depth); -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 10/10] Use nested compat attributes to pass parameters.
This fixes the ambiguity between, for example: tc qdisc change ... perturb 0 tc qdisc change ... Without this patch, there is no way for SFQ to differentiate between a parameter specified to be 0 and a parameter that was omitted. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- include/linux/pkt_sched.h | 13 +++ net/sched/sch_sfq.c | 53 +--- 2 files changed, 48 insertions(+), 18 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 8559974..aad04eb 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -148,6 +148,19 @@ struct tc_sfq_qopt unsignedflows; /* Maximal number of flows */ }; +enum +{ + TCA_SFQ_UNSPEC, + TCA_SFQ_COMPAT, + TCA_SFQ_QUANTUM, + TCA_SFQ_PERTURB, + TCA_SFQ_LIMIT, + TCA_SFQ_DIVISOR, + TCA_SFQ_FLOWS, + __TCA_SFQ_MAX, +}; + +#define TCA_SFQ_MAX (__TCA_SFQ_MAX - 1) /* RED section */ diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 157adc8..62232c9 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -427,25 +427,31 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) * the previous values (sfq_change). So, overwrite the parameters as * specified. */ if (opt) { - struct tc_sfq_qopt *ctl = RTA_DATA(opt); - - if (opt-rta_len RTA_LENGTH(sizeof(*ctl))) - return -EINVAL; - - if (ctl-quantum) - q-quantum = ctl-quantum; - if (ctl-perturb_period) - q-perturb_period = ctl-perturb_period; - if (ctl-divisor) - q-hash_divisor = ctl-divisor; - if (ctl-flows) - q-depth = ctl-flows; - if (ctl-limit) - q-limit = ctl-limit; - + struct tc_sfq_qopt *ctl; + struct rtattr *tb[TCA_SFQ_MAX]; + + if (rtattr_parse_nested_compat(tb, TCA_SFQ_MAX, opt, ctl, + sizeof(*ctl))) + goto rtattr_failure; + +#define GET_PARAM(dst, nest, compat) do { \ + struct rtattr *rta = tb[(nest) - 1]; \ + if (rta) \ + (dst) = RTA_GET_U32(rta); \ + else if ((compat)) \ + (dst) = (compat); \ +} while (0) + + GET_PARAM(q-quantum,TCA_SFQ_QUANTUM, ctl-quantum); + GET_PARAM(q-perturb_period, TCA_SFQ_PERTURB, + ctl-perturb_period); + GET_PARAM(q-hash_divisor, TCA_SFQ_DIVISOR, ctl-divisor); + GET_PARAM(q-depth, TCA_SFQ_FLOWS, ctl-flows); + GET_PARAM(q-limit, TCA_SFQ_LIMIT, ctl-limit); + if (q-perturb_period SFQ_MAX_PERTURB || q-depth SFQ_MAX_DEPTH) - return -EINVAL; + goto rtattr_failure; } q-limit = min_t(u32, q-limit, q-depth); q-tail = q-depth; @@ -481,6 +487,8 @@ sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) for (i=0; i q-depth; i++) sfq_link(q, i); return 0; +rtattr_failure: + return -EINVAL; err_case: sfq_q_destroy(q); return -ENOBUFS; @@ -562,17 +570,26 @@ static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) { struct sfq_sched_data *q = qdisc_priv(sch); unsigned char *b = skb_tail_pointer(skb); + struct rtattr *nest; struct tc_sfq_qopt opt; opt.quantum = q-quantum; opt.perturb_period = q-perturb_period; - opt.limit = q-limit; opt.divisor = q-hash_divisor; opt.flows = q-depth; + nest = RTA_NEST_COMPAT(skb, TCA_OPTIONS, sizeof(opt), opt); + + RTA_PUT_U32(skb, TCA_SFQ_QUANTUM, q-quantum); + RTA_PUT_U32(skb, TCA_SFQ_PERTURB, q-perturb_period); + RTA_PUT_U32(skb, TCA_SFQ_LIMIT, q-limit); + RTA_PUT_U32(skb, TCA_SFQ_DIVISOR, q-hash_divisor); + RTA_PUT_U32(skb, TCA_SFQ_FLOWS, q-depth); RTA_PUT(skb, TCA_OPTIONS, sizeof(opt), opt); + RTA_NEST_COMPAT_END(skb, nest); + return skb-len; rtattr_failure: -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3][iproute2] SFQ: backport some features from ESFQ
These patches follow the ESFQ--SFQ kernel patches. See the kernel patch summary for general information. Thanks, Corey include/linux/pkt_sched.h | 23 ++- tc/q_sfq.c| 43 ++- 2 files changed, 52 insertions(+), 14 deletions(-) [PATCH 1/3] SFQ: Support changing depth and divisor. [PATCH 2/3] Change perturb_period to unsigned. [PATCH 3/3] Use nested compat attributes for passing parameters to the kernel. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] SFQ: Support changing depth and divisor.
This can safely be applied either before or after the kernel patches because the tc_sfq_qopt struct is unchanged: - old kernels will ignore the parameters from new iproute2 - new kernels will use the same default parameters --- include/linux/pkt_sched.h |9 - tc/q_sfq.c| 21 - 2 files changed, 20 insertions(+), 10 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index d10f353..37946d4 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -139,15 +139,6 @@ struct tc_sfq_qopt unsignedflows; /* Maximal number of flows */ }; -/* - * NOTE: limit, divisor and flows are hardwired to code at the moment. - * - * limit=flows=128, divisor=1024; - * - * The only reason for this is efficiency, it is possible - * to change these parameters in compile time. - */ - /* RED section */ enum diff --git a/tc/q_sfq.c b/tc/q_sfq.c index 05385cf..7754db7 100644 --- a/tc/q_sfq.c +++ b/tc/q_sfq.c @@ -25,7 +25,7 @@ static void explain(void) { - fprintf(stderr, Usage: ... sfq [ limit NUMBER ] [ perturb SECS ] [ quantum BYTES ]\n); + fprintf(stderr, Usage: ... sfq [ limit NUMBER ] [ depth FLOWS ] [ divisor HASHBITS ] [ perturb SECS ] [ quantum BYTES ]\n); } #define usage() return(-1) @@ -63,6 +63,25 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl return -1; } ok++; + } else if (strcmp(*argv, depth) == 0) { + NEXT_ARG(); + if (get_unsigned(opt.flows, *argv, 0)) { + fprintf(stderr, Illegal \depth\\n); + return -1; + } + ok++; + } else if (strcmp(*argv, divisor) == 0) { + NEXT_ARG(); + if (get_unsigned(opt.divisor, *argv, 0)) { + fprintf(stderr, Illegal \divisor\\n); + return -1; + } + if (opt.divisor = 15) { + fprintf(stderr, Illegal \divisor\, must be 15\n); + return -1; + } + opt.divisor = 1opt.divisor; + ok++; } else if (strcmp(*argv, help) == 0) { explain(); return -1; -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] Change perturb_period to unsigned.
This corresponds to the kernel patch doing the same. Here, too, this will technically break binary compatibility with older kernels, but that shouldn't be a problem because negative perturb_period values aren't usable anyway. --- include/linux/pkt_sched.h |2 +- tc/q_sfq.c|4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 37946d4..0252bb7 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -133,7 +133,7 @@ enum struct tc_sfq_qopt { unsignedquantum;/* Bytes per round allocated to flow */ - int perturb_period; /* Period of hash perturbation */ + unsignedperturb_period; /* Period of hash perturbation */ __u32 limit; /* Maximal packets in queue */ unsigneddivisor;/* Hash divisor */ unsignedflows; /* Maximal number of flows */ diff --git a/tc/q_sfq.c b/tc/q_sfq.c index 7754db7..c9fcc53 100644 --- a/tc/q_sfq.c +++ b/tc/q_sfq.c @@ -47,7 +47,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl ok++; } else if (strcmp(*argv, perturb) == 0) { NEXT_ARG(); - if (get_integer(opt.perturb_period, *argv, 0)) { + if (get_u32(opt.perturb_period, *argv, 0)) { fprintf(stderr, Illegal \perturb\\n); return -1; } @@ -115,7 +115,7 @@ static int sfq_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt) fprintf(f, flows %u/%u , qopt-flows, qopt-divisor); } if (qopt-perturb_period) - fprintf(f, perturb %dsec , qopt-perturb_period); + fprintf(f, perturb %usec , qopt-perturb_period); return 0; } -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] Use nested compat attributes for passing parameters to the kernel.
Note that I have left sfq_print_opt() alone. At this point, there can be no difference between the data in the nested rtattrs and the data in the compat rtattr, and I didn't want to add clutter that isn't useful. Let me know if I should do differently. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- include/linux/pkt_sched.h | 14 ++ tc/q_sfq.c| 18 -- 2 files changed, 30 insertions(+), 2 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 0252bb7..0388094 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -139,6 +139,20 @@ struct tc_sfq_qopt unsignedflows; /* Maximal number of flows */ }; +enum +{ + TCA_SFQ_UNSPEC, + TCA_SFQ_COMPAT, + TCA_SFQ_QUANTUM, + TCA_SFQ_PERTURB, + TCA_SFQ_LIMIT, + TCA_SFQ_DIVISOR, + TCA_SFQ_FLOWS, + __TCA_SFQ_MAX, +}; + +#define TCA_SFQ_MAX (__TCA_SFQ_MAX - 1) + /* RED section */ enum diff --git a/tc/q_sfq.c b/tc/q_sfq.c index c9fcc53..5bb3eb7 100644 --- a/tc/q_sfq.c +++ b/tc/q_sfq.c @@ -34,9 +34,13 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl { int ok=0; struct tc_sfq_qopt opt; + struct rtattr *nest; memset(opt, 0, sizeof(opt)); + /* put blank data in rtattr so there is a hole to fill later */ + nest = addattr_nest_compat(n, 1024, TCA_OPTIONS, opt, sizeof(opt)); + while (argc 0) { if (strcmp(*argv, quantum) == 0) { NEXT_ARG(); @@ -44,6 +48,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl fprintf(stderr, Illegal \limit\\n); return -1; } + addattr32(n, 1024, TCA_SFQ_QUANTUM, opt.quantum); ok++; } else if (strcmp(*argv, perturb) == 0) { NEXT_ARG(); @@ -51,6 +56,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl fprintf(stderr, Illegal \perturb\\n); return -1; } + addattr32(n, 1024, TCA_SFQ_PERTURB, opt.perturb_period); ok++; } else if (strcmp(*argv, limit) == 0) { NEXT_ARG(); @@ -62,6 +68,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl fprintf(stderr, Illegal \limit\, must be 1\n); return -1; } + addattr32(n, 1024, TCA_SFQ_LIMIT, opt.limit); ok++; } else if (strcmp(*argv, depth) == 0) { NEXT_ARG(); @@ -69,6 +76,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl fprintf(stderr, Illegal \depth\\n); return -1; } + addattr32(n, 1024, TCA_SFQ_FLOWS, opt.flows); ok++; } else if (strcmp(*argv, divisor) == 0) { NEXT_ARG(); @@ -81,6 +89,7 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl return -1; } opt.divisor = 1opt.divisor; + addattr32(n, 1024, TCA_SFQ_DIVISOR, opt.divisor); ok++; } else if (strcmp(*argv, help) == 0) { explain(); @@ -93,8 +102,13 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl argc--; argv++; } - if (ok) - addattr_l(n, 1024, TCA_OPTIONS, opt, sizeof(opt)); + if (ok) { + /* fill the hole we left earlier with real compat data */ + memcpy(RTA_DATA(nest), opt, sizeof(opt)); + addattr_nest_compat_end(n, nest); + } + else + nest-rta_len = 0; return 0; } -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] Make qdisc changeable.
Patrick McHardy wrote: + if ((err = sfq_q_init(tmp, opt))) + return err; This will also use defaults for all unspecified values. It would be more consistent with other qdiscs to only change those values that are actually specified, so something like tc qdisc change ... perturb 10 will *only* change the perturbation parameter. I'm fixed this for all the parameters except one--your example above. Since 0 is a valid value for perturb, I can't see any clever way to differentiate between the user specifying perturb 0 or leaving perturb unspecified. Either way, opt-perturb_period is 0. The only way I can see would be to add another member, say, opt-perturb_specified, and use that accordingly. Unfortunately, this would break usage of sfq with older versions of tc, so I'm hoping there's a better approach. Do you have any suggestions? I've looked at the other qdisc files and I don't see any other instances like this. Thanks, Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/7] Preparatory refactoring part 2.
Patrick McHardy wrote: Corey Hickey wrote: diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 8ae077f..0c46938 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -380,71 +380,71 @@ static void sfq_perturbation(unsigned long arg) } } -static int sfq_change(struct Qdisc *sch, struct rtattr *opt) +static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) { - struct sfq_sched_data *q = qdisc_priv(sch); struct tc_sfq_qopt *ctl = RTA_DATA(opt); - unsigned int qlen; + int i; - if (opt-rta_len RTA_LENGTH(sizeof(*ctl))) + if (opt opt-rta_len RTA_LENGTH(sizeof(*ctl))) opt is dereferenced above (RTA_DATA), so if it is NULL we've already crashed. I think that test made ESFQ not crash when I did a qdisc change without giving any parameters, but that was a while ago and I might be mistaken. I'll need to rewrite much of this function anyway, and I'll pay attention to what happens when I get there. return -EINVAL; - sch_tree_lock(sch); - q-quantum = ctl-quantum ? : psched_mtu(sch-dev); - q-perturb_period = ctl-perturb_period*HZ; - if (ctl-limit) - q-limit = min_t(u32, ctl-limit, SFQ_DEPTH); + q-perturbation = 0; + q-max_depth = 0; + q-tail = q-limit = SFQ_DEPTH; + if (opt == NULL) { + q-perturb_period = 0; + } else { + struct tc_sfq_qopt *ctl = RTA_DATA(opt); + if (ctl-quantum) + q-quantum = ctl-quantum; + q-perturb_period = ctl-perturb_period*HZ; - qlen = sch-q.qlen; - while (sch-q.qlen = q-limit-1) - sfq_drop(sch); - qdisc_tree_decrease_qlen(sch, qlen - sch-q.qlen); I hope that patch that makes changing possible brings this back .. checking .. it doesn't. Please either keep this or fix up 6/7 to bring it back. It got lost in translation; I will add it to 6/7. Thanks, Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] Make qdisc changeable.
Patrick McHardy wrote: Corey Hickey wrote: Re-implement sfq_change() and enable Qdisc_opts.change so tc qdisc change will work. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 51 ++- 1 files changed, 50 insertions(+), 1 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index e6a6a21..e042cd0 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -485,6 +485,55 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) return 0; } +static int sfq_change(struct Qdisc *sch, struct rtattr *opt) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + struct sfq_sched_data tmp; + struct sk_buff *skb; + int err; + + /* set up tmp queue */ + memset(tmp, 0, sizeof(struct sfq_sched_data)); + tmp.quantum = psched_mtu(sch-dev); /* default */ If no value is given it should use the old value instead of reinitializing to the default. + if ((err = sfq_q_init(tmp, opt))) + return err; This will also use defaults for all unspecified values. It would be more consistent with other qdiscs to only change those values that are actually specified, so something like tc qdisc change ... perturb 10 will *only* change the perturbation parameter. I somehow had it in my head that a qdisc change was supposed to work that way. I'll change my patch. I'm not sure reusing the initialization function and copying the parameters is the best way to do this. I'm not sure either, but I do like that it's conceptually simple and it keeps the parameter handling in one place. I've thought and stared for a while, and I don't see a better way, but that could of course be due to my limited understanding and general newbiehood in that regard. Thanks again for the review. I'll try to get a new batch of patches sent off tomorrow. -Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/7] Preparatory refactoring part 1.
Patrick McHardy wrote: -static int -sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) +static void sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, unsigned int end) Please make sure to break at 80 chars and to keep the style in this file consistent (newline before function name). Ok. For what it's worth, though, most of the original functions in the file don't have a newline before the function name. Omitting the newline would thus make the new/changed functions more consistent with the rest of the file. I don't have a preference either way, so unless you change your mind I'll put the newline back in.. { - struct sfq_sched_data *q = qdisc_priv(sch); unsigned hash = sfq_hash(q, skb); sfq_index x; @@ -256,8 +257,12 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) q-ht[hash] = x = q-dep[SFQ_DEPTH].next; q-hash[x] = hash; } - sch-qstats.backlog += skb-len; Why not keep this instead of having both callers do it? My idea was to have all the sfq_q_* functions operate on struct sfq_sched_data and have no knowledge of the struct Qdisc. I did this in order to be able to use the new functions in sfq_change() when the temporary sfq_sched_data doesn't have a parent Qdisc. There's probably a better way, and I am of course open to suggestions, but what I did made sense to me. - __skb_queue_tail(q-qs[x], skb); + + if (end == SFQ_TAIL) + __skb_queue_tail(q-qs[x], skb); + else + __skb_queue_head(q-qs[x], skb); + sfq_inc(q, x); if (q-qs[x].qlen == 1) {/* The flow is new */ if (q-tail == SFQ_DEPTH) { /* It is the first flow */ @@ -270,12 +275,21 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) q-tail = x; } } +} + +static int +sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); newline please. Ok. + sfq_q_enqueue(skb, q, SFQ_TAIL); + sch-qstats.backlog += skb-len; if (++sch-q.qlen q-limit-1) { sch-bstats.bytes += skb-len; sch-bstats.packets++; return 0; } + sch-qstats.drops++; sfq_drop already increments this. You're right, of course. When I look at the original sfq_requeue(), though, I see that same line right before sfq_drop(). That's probably why it ended up in my patch. Is that a bug? The original sfq_enqueue() doesn't have that line. http://git.kernel.org/?p=linux/kernel/git/jgarzik/netdev-2.6.git;a=blob;f=net/sched/sch_sfq.c line 314 sfq_drop(sch); return NET_XMIT_CN; } @@ -284,28 +298,8 @@ static int sfq_requeue(struct sk_buff *skb, struct Qdisc* sch) { struct sfq_sched_data *q = qdisc_priv(sch); newline please Ok. -static struct sk_buff * -sfq_dequeue(struct Qdisc* sch) +static struct sk_buff *sfq_q_dequeue(struct sfq_sched_data *q) Keep style consistent please. Ok. Thank you for the review. I'll address the rest of your comments when I have time later. -Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/7] SFQ: backport some features from ESFQ
Hello, This set of patches adds some of ESFQ's modifications to the original SFQ. Thus far, I have received support for this approach rather than for trying to get ESFQ included as a separate qdisc. http://mailman.ds9a.nl/pipermail/lartc/2007q2/021056.html My patches here implement tc qdisc change, user-configurable depth (number of flows), and user-configurable divisor (for setting hash table size). I've left out the remaining ESFQ features (usage of jhash and different hashing methods) because Patrick McHardy intends to submit a patch that will supersede that functionality; see the URL above. Default values remain the same, and SFQ's default behavior remains the same, so there should be no user disruption. A patch for iproute2 is included after the end of the kernel patch series. Thanks for your consideration, Corey include/linux/pkt_sched.h |8 -- net/sched/sch_sfq.c | 301 + 2 files changed, 192 insertions(+), 117 deletions(-) [PATCH 1/7] Preparatory refactoring part 1. [PATCH 2/7] Preparatory refactoring part 2. [PATCH 3/7] Move two functions. [PATCH 4/7] Add depth. [PATCH 5/7] Add divisor. [PATCH 6/7] Make qdisc changeable. [PATCH 7/7] Remove comments about hardcoded values. [PATCH] [iproute2] SFQ: Support changing depth and divisor. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/7] Preparatory refactoring part 1.
Make a new function sfq_q_enqueue() that operates directly on the queue data. This will be useful for implementing sfq_change() in a later patch. A pleasant side-effect is reducing most of the duplicate code in sfq_enqueue() and sfq_requeue(). Similarly, make a new function sfq_q_dequeue(). Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 70 ++ 1 files changed, 36 insertions(+), 34 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 9579573..8ae077f 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -77,6 +77,9 @@ #define SFQ_DEPTH 128 #define SFQ_HASH_DIVISOR 1024 +#define SFQ_HEAD 0 +#define SFQ_TAIL 1 + /* This type should contain at least SFQ_DEPTH*2 values */ typedef unsigned char sfq_index; @@ -244,10 +247,8 @@ static unsigned int sfq_drop(struct Qdisc *sch) return 0; } -static int -sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) +static void sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, unsigned int end) { - struct sfq_sched_data *q = qdisc_priv(sch); unsigned hash = sfq_hash(q, skb); sfq_index x; @@ -256,8 +257,12 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) q-ht[hash] = x = q-dep[SFQ_DEPTH].next; q-hash[x] = hash; } - sch-qstats.backlog += skb-len; - __skb_queue_tail(q-qs[x], skb); + + if (end == SFQ_TAIL) + __skb_queue_tail(q-qs[x], skb); + else + __skb_queue_head(q-qs[x], skb); + sfq_inc(q, x); if (q-qs[x].qlen == 1) { /* The flow is new */ if (q-tail == SFQ_DEPTH) { /* It is the first flow */ @@ -270,12 +275,21 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) q-tail = x; } } +} + +static int +sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + sfq_q_enqueue(skb, q, SFQ_TAIL); + sch-qstats.backlog += skb-len; if (++sch-q.qlen q-limit-1) { sch-bstats.bytes += skb-len; sch-bstats.packets++; return 0; } + sch-qstats.drops++; sfq_drop(sch); return NET_XMIT_CN; } @@ -284,28 +298,8 @@ static int sfq_requeue(struct sk_buff *skb, struct Qdisc* sch) { struct sfq_sched_data *q = qdisc_priv(sch); - unsigned hash = sfq_hash(q, skb); - sfq_index x; - - x = q-ht[hash]; - if (x == SFQ_DEPTH) { - q-ht[hash] = x = q-dep[SFQ_DEPTH].next; - q-hash[x] = hash; - } + sfq_q_enqueue(skb, q, SFQ_HEAD); sch-qstats.backlog += skb-len; - __skb_queue_head(q-qs[x], skb); - sfq_inc(q, x); - if (q-qs[x].qlen == 1) { /* The flow is new */ - if (q-tail == SFQ_DEPTH) { /* It is the first flow */ - q-tail = x; - q-next[x] = x; - q-allot[x] = q-quantum; - } else { - q-next[x] = q-next[q-tail]; - q-next[q-tail] = x; - q-tail = x; - } - } if (++sch-q.qlen q-limit - 1) { sch-qstats.requeues++; return 0; @@ -316,13 +310,8 @@ sfq_requeue(struct sk_buff *skb, struct Qdisc* sch) return NET_XMIT_CN; } - - - -static struct sk_buff * -sfq_dequeue(struct Qdisc* sch) +static struct sk_buff *sfq_q_dequeue(struct sfq_sched_data *q) { - struct sfq_sched_data *q = qdisc_priv(sch); struct sk_buff *skb; sfq_index a, old_a; @@ -335,8 +324,6 @@ sfq_dequeue(struct Qdisc* sch) /* Grab packet */ skb = __skb_dequeue(q-qs[a]); sfq_dec(q, a); - sch-q.qlen--; - sch-qstats.backlog -= skb-len; /* Is the slot empty? */ if (q-qs[a].qlen == 0) { @@ -353,6 +340,21 @@ sfq_dequeue(struct Qdisc* sch) a = q-next[a]; q-allot[a] += q-quantum; } + + return skb; +} + +static struct sk_buff +*sfq_dequeue(struct Qdisc* sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + struct sk_buff *skb; + + skb = sfq_q_dequeue(q); + if (skb == NULL) + return NULL; + sch-q.qlen--; + sch-qstats.backlog -= skb-len; return skb; } -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/7] Preparatory refactoring part 2.
Factor code out of sfq_init() and sfq_destroy(), again so that the new functions can be used by sfq_change() later. Actually, as the diff itself shows, most of the sfq_q_init() code comes from the original sfq_change(), but sfq_change() is only called by sfq_init() right now. Thus, it is safe to remove sfq_change(); tc qdisc change doesn't yet work for sfq anyway. The sfq_destroy() -- sfq_q_destroy() change looks pointless here, but it's cleaner to split now and add code to sfq_q_destroy() in a later patch. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 80 +- 1 files changed, 40 insertions(+), 40 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 8ae077f..0c46938 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -380,71 +380,71 @@ static void sfq_perturbation(unsigned long arg) } } -static int sfq_change(struct Qdisc *sch, struct rtattr *opt) +static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) { - struct sfq_sched_data *q = qdisc_priv(sch); struct tc_sfq_qopt *ctl = RTA_DATA(opt); - unsigned int qlen; + int i; - if (opt-rta_len RTA_LENGTH(sizeof(*ctl))) + if (opt opt-rta_len RTA_LENGTH(sizeof(*ctl))) return -EINVAL; - sch_tree_lock(sch); - q-quantum = ctl-quantum ? : psched_mtu(sch-dev); - q-perturb_period = ctl-perturb_period*HZ; - if (ctl-limit) - q-limit = min_t(u32, ctl-limit, SFQ_DEPTH); + q-perturbation = 0; + q-max_depth = 0; + q-tail = q-limit = SFQ_DEPTH; + if (opt == NULL) { + q-perturb_period = 0; + } else { + struct tc_sfq_qopt *ctl = RTA_DATA(opt); + if (ctl-quantum) + q-quantum = ctl-quantum; + q-perturb_period = ctl-perturb_period*HZ; - qlen = sch-q.qlen; - while (sch-q.qlen = q-limit-1) - sfq_drop(sch); - qdisc_tree_decrease_qlen(sch, qlen - sch-q.qlen); + if (ctl-limit) + q-limit = min_t(u32, ctl-limit, SFQ_DEPTH); + } - del_timer(q-perturb_timer); - if (q-perturb_period) { - q-perturb_timer.expires = jiffies + q-perturb_period; - add_timer(q-perturb_timer); + for (i=0; iSFQ_HASH_DIVISOR; i++) + q-ht[i] = SFQ_DEPTH; + for (i=0; iSFQ_DEPTH; i++) { + skb_queue_head_init(q-qs[i]); + q-dep[i+SFQ_DEPTH].next = i+SFQ_DEPTH; + q-dep[i+SFQ_DEPTH].prev = i+SFQ_DEPTH; } - sch_tree_unlock(sch); + + for (i=0; iSFQ_DEPTH; i++) + sfq_link(q, i); return 0; } static int sfq_init(struct Qdisc *sch, struct rtattr *opt) { struct sfq_sched_data *q = qdisc_priv(sch); - int i; + int err; + + q-quantum = psched_mtu(sch-dev); /* default */ + if ((err = sfq_q_init(q, opt))) + return err; init_timer(q-perturb_timer); q-perturb_timer.data = (unsigned long)sch; q-perturb_timer.function = sfq_perturbation; - - for (i=0; iSFQ_HASH_DIVISOR; i++) - q-ht[i] = SFQ_DEPTH; - for (i=0; iSFQ_DEPTH; i++) { - skb_queue_head_init(q-qs[i]); - q-dep[i+SFQ_DEPTH].next = i+SFQ_DEPTH; - q-dep[i+SFQ_DEPTH].prev = i+SFQ_DEPTH; - } - q-limit = SFQ_DEPTH; - q-max_depth = 0; - q-tail = SFQ_DEPTH; - if (opt == NULL) { - q-quantum = psched_mtu(sch-dev); - q-perturb_period = 0; - } else { - int err = sfq_change(sch, opt); - if (err) - return err; + if (q-perturb_period) { + q-perturb_timer.expires = jiffies + q-perturb_period; + add_timer(q-perturb_timer); } - for (i=0; iSFQ_DEPTH; i++) - sfq_link(q, i); + return 0; } +static void sfq_q_destroy(struct sfq_sched_data *q) +{ + del_timer(q-perturb_timer); +} + static void sfq_destroy(struct Qdisc *sch) { struct sfq_sched_data *q = qdisc_priv(sch); - del_timer(q-perturb_timer); + sfq_q_destroy(q); } static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/7] Move two functions.
Move sfq_q_destroy() to above sfq_q_init() so that it can be used by an error case in a later patch. Move sfq_destroy() as well, for clarity. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 24 1 files changed, 12 insertions(+), 12 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 0c46938..583f925 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -380,6 +380,17 @@ static void sfq_perturbation(unsigned long arg) } } +static void sfq_q_destroy(struct sfq_sched_data *q) +{ + del_timer(q-perturb_timer); +} + +static void sfq_destroy(struct Qdisc *sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + sfq_q_destroy(q); +} + static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) { struct tc_sfq_qopt *ctl = RTA_DATA(opt); @@ -420,7 +431,7 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) { struct sfq_sched_data *q = qdisc_priv(sch); int err; - + q-quantum = psched_mtu(sch-dev); /* default */ if ((err = sfq_q_init(q, opt))) return err; @@ -436,17 +447,6 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) return 0; } -static void sfq_q_destroy(struct sfq_sched_data *q) -{ - del_timer(q-perturb_timer); -} - -static void sfq_destroy(struct Qdisc *sch) -{ - struct sfq_sched_data *q = qdisc_priv(sch); - sfq_q_destroy(q); -} - static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) { struct sfq_sched_data *q = qdisc_priv(sch); -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/7] Add depth.
Make depth (number of queues) user-configurable: * replace #define with a parameter * use old hardcoded value as a default * kmalloc() arrays in sfq_q_init() * free() arrays in sfq_q_destroy() Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 85 --- 1 files changed, 60 insertions(+), 25 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 583f925..2ff6a27 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -74,14 +74,14 @@ It is easy to increase these values, but not in flight. */ -#define SFQ_DEPTH 128 +#define SFQ_DEPTH_DEFAULT 128 #define SFQ_HASH_DIVISOR 1024 #define SFQ_HEAD 0 #define SFQ_TAIL 1 -/* This type should contain at least SFQ_DEPTH*2 values */ -typedef unsigned char sfq_index; +/* This type should contain at least depth*2 values */ +typedef unsigned int sfq_index; struct sfq_head { @@ -95,6 +95,7 @@ struct sfq_sched_data int perturb_period; unsignedquantum;/* Allotment per round: MUST BE = MTU */ int limit; + unsigneddepth; /* Variables */ struct timer_list perturb_timer; @@ -103,11 +104,11 @@ struct sfq_sched_data sfq_index max_depth; /* Maximal depth */ sfq_index ht[SFQ_HASH_DIVISOR]; /* Hash table */ - sfq_index next[SFQ_DEPTH];/* Active slots link */ - short allot[SFQ_DEPTH]; /* Current allotment per slot */ - unsigned short hash[SFQ_DEPTH];/* Hash value indexed by slots */ - struct sk_buff_head qs[SFQ_DEPTH]; /* Slot queue */ - struct sfq_head dep[SFQ_DEPTH*2]; /* Linked list of slots, indexed by depth */ + sfq_index *next; /* Active slots link */ + short *allot; /* Current allotment per slot */ + unsigned short *hash; /* Hash value indexed by slots */ + struct sk_buff_head *qs;/* Slot queue */ + struct sfq_head *dep; /* Linked list of slots, indexed by depth */ }; static __inline__ unsigned sfq_fold_hash(struct sfq_sched_data *q, u32 h, u32 h1) @@ -164,7 +165,7 @@ static unsigned sfq_hash(struct sfq_sched_data *q, struct sk_buff *skb) static inline void sfq_link(struct sfq_sched_data *q, sfq_index x) { sfq_index p, n; - int d = q-qs[x].qlen + SFQ_DEPTH; + int d = q-qs[x].qlen + q-depth; p = d; n = q-dep[d].next; @@ -215,7 +216,7 @@ static unsigned int sfq_drop(struct Qdisc *sch) drop a packet from it */ if (d 1) { - sfq_index x = q-dep[d+SFQ_DEPTH].next; + sfq_index x = q-dep[d+q-depth].next; skb = q-qs[x].prev; len = skb-len; __skb_unlink(skb, q-qs[x]); @@ -238,7 +239,7 @@ static unsigned int sfq_drop(struct Qdisc *sch) kfree_skb(skb); sfq_dec(q, d); sch-q.qlen--; - q-ht[q-hash[d]] = SFQ_DEPTH; + q-ht[q-hash[d]] = q-depth; sch-qstats.drops++; sch-qstats.backlog -= len; return len; @@ -253,8 +254,8 @@ static void sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, unsigne sfq_index x; x = q-ht[hash]; - if (x == SFQ_DEPTH) { - q-ht[hash] = x = q-dep[SFQ_DEPTH].next; + if (x == q-depth) { + q-ht[hash] = x = q-dep[q-depth].next; q-hash[x] = hash; } @@ -265,7 +266,7 @@ static void sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, unsigne sfq_inc(q, x); if (q-qs[x].qlen == 1) { /* The flow is new */ - if (q-tail == SFQ_DEPTH) { /* It is the first flow */ + if (q-tail == q-depth) { /* It is the first flow */ q-tail = x; q-next[x] = x; q-allot[x] = q-quantum; @@ -316,7 +317,7 @@ static struct sk_buff *sfq_q_dequeue(struct sfq_sched_data *q) sfq_index a, old_a; /* No active slots */ - if (q-tail == SFQ_DEPTH) + if (q-tail == q-depth) return NULL; a = old_a = q-next[q-tail]; @@ -327,10 +328,10 @@ static struct sk_buff *sfq_q_dequeue(struct sfq_sched_data *q) /* Is the slot empty? */ if (q-qs[a].qlen == 0) { - q-ht[q-hash[a]] = SFQ_DEPTH; + q-ht[q-hash[a]] = q-depth; a = q-next[a]; if (a == old_a) { - q-tail = SFQ_DEPTH; + q-tail = q-depth; return skb; } q-next[q-tail] = a; @@ -383,6 +384,16 @@ static void sfq_perturbation(unsigned long arg) static void
[PATCH 5/7] Add divisor.
Make hash divisor user-configurable. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 18 +- 1 files changed, 13 insertions(+), 5 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 2ff6a27..3e67a68 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -75,7 +75,7 @@ It is easy to increase these values, but not in flight. */ #define SFQ_DEPTH_DEFAULT 128 -#define SFQ_HASH_DIVISOR 1024 +#define SFQ_DIVISOR_DEFAULT1024 #define SFQ_HEAD 0 #define SFQ_TAIL 1 @@ -96,6 +96,7 @@ struct sfq_sched_data unsignedquantum;/* Allotment per round: MUST BE = MTU */ int limit; unsigneddepth; + unsignedhash_divisor; /* Variables */ struct timer_list perturb_timer; @@ -103,7 +104,7 @@ struct sfq_sched_data sfq_index tail; /* Index of current slot in round */ sfq_index max_depth; /* Maximal depth */ - sfq_index ht[SFQ_HASH_DIVISOR]; /* Hash table */ + sfq_index *ht;/* Hash table */ sfq_index *next; /* Active slots link */ short *allot; /* Current allotment per slot */ unsigned short *hash; /* Hash value indexed by slots */ @@ -118,7 +119,7 @@ static __inline__ unsigned sfq_fold_hash(struct sfq_sched_data *q, u32 h, u32 h1 /* Have we any rotation primitives? If not, WHY? */ h ^= (h1pert) ^ (h1(0x1F - pert)); h ^= h10; - return h 0x3FF; + return h (q-hash_divisor-1); } static unsigned sfq_hash(struct sfq_sched_data *q, struct sk_buff *skb) @@ -384,6 +385,8 @@ static void sfq_perturbation(unsigned long arg) static void sfq_q_destroy(struct sfq_sched_data *q) { del_timer(q-perturb_timer); + if(q-ht) + kfree(q-ht); if(q-dep) kfree(q-dep); if(q-next) @@ -415,12 +418,14 @@ static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) q-max_depth = 0; if (opt == NULL) { q-perturb_period = 0; + q-hash_divisor = SFQ_DIVISOR_DEFAULT; q-tail = q-limit = q-depth = SFQ_DEPTH_DEFAULT; } else { struct tc_sfq_qopt *ctl = RTA_DATA(opt); if (ctl-quantum) q-quantum = ctl-quantum; q-perturb_period = ctl-perturb_period*HZ; + q-hash_divisor = ctl-divisor ? : SFQ_DIVISOR_DEFAULT; q-tail = q-limit = q-depth = ctl-flows ? : SFQ_DEPTH_DEFAULT; if (q-depth p - 1) @@ -430,6 +435,9 @@ static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) q-limit = min_t(u32, ctl-limit, q-depth); } + q-ht = kmalloc(q-hash_divisor*sizeof(sfq_index), GFP_KERNEL); + if (!q-ht) + goto err_case; q-dep = kmalloc((1+q-depth*2)*sizeof(struct sfq_head), GFP_KERNEL); if (!q-dep) goto err_case; @@ -446,7 +454,7 @@ static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) if (!q-qs) goto err_case; - for (i=0; iSFQ_HASH_DIVISOR; i++) + for (i=0; iq-hash_divisor; i++) q-ht[i] = q-depth; for (i=0; iq-depth; i++) { skb_queue_head_init(q-qs[i]); @@ -492,7 +500,7 @@ static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) opt.perturb_period = q-perturb_period/HZ; opt.limit = q-limit; - opt.divisor = SFQ_HASH_DIVISOR; + opt.divisor = q-hash_divisor; opt.flows = q-depth; RTA_PUT(skb, TCA_OPTIONS, sizeof(opt), opt); -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/7] Make qdisc changeable.
Re-implement sfq_change() and enable Qdisc_opts.change so tc qdisc change will work. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 51 ++- 1 files changed, 50 insertions(+), 1 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 3e67a68..4cd523f 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -490,6 +490,55 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) return 0; } +static int sfq_change(struct Qdisc *sch, struct rtattr *opt) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + struct sfq_sched_data tmp; + struct sk_buff *skb; + int err; + + /* set up tmp queue */ + memset(tmp, 0, sizeof(struct sfq_sched_data)); + tmp.quantum = psched_mtu(sch-dev); /* default */ + if ((err = sfq_q_init(tmp, opt))) + return err; + + /* copy packets from the old queue to the tmp queue */ + sch_tree_lock(sch); + while (sch-q.qlen = tmp.limit - 1) + sfq_drop(sch); + while ((skb = sfq_q_dequeue(q)) != NULL) + sfq_q_enqueue(skb, tmp, SFQ_TAIL); + + /* clean up the old queue */ + sfq_q_destroy(q); + + /* copy elements of the tmp queue into the old queue */ + q-perturb_period = tmp.perturb_period; + q-quantum= tmp.quantum; + q-limit = tmp.limit; + q-depth = tmp.depth; + q-hash_divisor = tmp.hash_divisor; + q-tail = tmp.tail; + q-max_depth = tmp.max_depth; + q-ht= tmp.ht; + q-dep = tmp.dep; + q-next = tmp.next; + q-allot = tmp.allot; + q-hash = tmp.hash; + q-qs= tmp.qs; + + /* finish up */ + if (q-perturb_period) { + q-perturb_timer.expires = jiffies + q-perturb_period; + add_timer(q-perturb_timer); + } else { + q-perturbation = 0; + } + sch_tree_unlock(sch); + return 0; +} + static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) { struct sfq_sched_data *q = qdisc_priv(sch); @@ -524,7 +573,7 @@ static struct Qdisc_ops sfq_qdisc_ops = { .init = sfq_init, .reset = sfq_reset, .destroy= sfq_destroy, - .change = NULL, + .change = sfq_change, .dump = sfq_dump, .owner = THIS_MODULE, }; -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [iproute2] SFQ: Support changing depth and divisor.
This can safely be applied either before or after the kernel patches because the tc_sfq_qopt struct is unchanged: - old kernels will ignore the parameters from new iproute2 - new kernels will use the same default parameters --- include/linux/pkt_sched.h |9 - tc/q_sfq.c| 21 - 2 files changed, 20 insertions(+), 10 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index d10f353..37946d4 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -139,15 +139,6 @@ struct tc_sfq_qopt unsignedflows; /* Maximal number of flows */ }; -/* - * NOTE: limit, divisor and flows are hardwired to code at the moment. - * - * limit=flows=128, divisor=1024; - * - * The only reason for this is efficiency, it is possible - * to change these parameters in compile time. - */ - /* RED section */ enum diff --git a/tc/q_sfq.c b/tc/q_sfq.c index 05385cf..7754db7 100644 --- a/tc/q_sfq.c +++ b/tc/q_sfq.c @@ -25,7 +25,7 @@ static void explain(void) { - fprintf(stderr, Usage: ... sfq [ limit NUMBER ] [ perturb SECS ] [ quantum BYTES ]\n); + fprintf(stderr, Usage: ... sfq [ limit NUMBER ] [ depth FLOWS ] [ divisor HASHBITS ] [ perturb SECS ] [ quantum BYTES ]\n); } #define usage() return(-1) @@ -63,6 +63,25 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl return -1; } ok++; + } else if (strcmp(*argv, depth) == 0) { + NEXT_ARG(); + if (get_unsigned(opt.flows, *argv, 0)) { + fprintf(stderr, Illegal \depth\\n); + return -1; + } + ok++; + } else if (strcmp(*argv, divisor) == 0) { + NEXT_ARG(); + if (get_unsigned(opt.divisor, *argv, 0)) { + fprintf(stderr, Illegal \divisor\\n); + return -1; + } + if (opt.divisor = 15) { + fprintf(stderr, Illegal \divisor\, must be 15\n); + return -1; + } + opt.divisor = 1opt.divisor; + ok++; } else if (strcmp(*argv, help) == 0) { explain(); return -1; -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 7/7] Remove comments about hardcoded values.
None of these are true anymore (hooray!). Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- include/linux/pkt_sched.h |8 net/sched/sch_sfq.c | 17 +++-- 2 files changed, 3 insertions(+), 22 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 268c515..58a0ea6 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -148,14 +148,6 @@ struct tc_sfq_qopt unsignedflows; /* Maximal number of flows */ }; -/* - * NOTE: limit, divisor and flows are hardwired to code at the moment. - * - * limit=flows=128, divisor=1024; - * - * The only reason for this is efficiency, it is possible - * to change these parameters in compile time. - */ /* RED section */ diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 4cd523f..8e84881 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -61,18 +61,7 @@ We still need true WFQ for top level CSZ, but using WFQ for the best effort traffic is absolutely pointless: - SFQ is superior for this purpose. - - IMPLEMENTATION: - This implementation limits maximal queue length to 128; - maximal mtu to 2^15-1; number of hash buckets to 1024. - The only goal of this restrictions was that all data - fit into one 4K page :-). Struct sfq_sched_data is - organized in anti-cache manner: all the data for a bucket - are scattered over different locations. This is not good, - but it allowed me to put it into 4K. - - It is easy to increase these values, but not in flight. */ + SFQ is superior for this purpose. */ #define SFQ_DEPTH_DEFAULT 128 #define SFQ_DIVISOR_DEFAULT1024 @@ -496,7 +485,7 @@ static int sfq_change(struct Qdisc *sch, struct rtattr *opt) struct sfq_sched_data tmp; struct sk_buff *skb; int err; - + /* set up tmp queue */ memset(tmp, 0, sizeof(struct sfq_sched_data)); tmp.quantum = psched_mtu(sch-dev); /* default */ @@ -509,7 +498,7 @@ static int sfq_change(struct Qdisc *sch, struct rtattr *opt) sfq_drop(sch); while ((skb = sfq_q_dequeue(q)) != NULL) sfq_q_enqueue(skb, tmp, SFQ_TAIL); - + /* clean up the old queue */ sfq_q_destroy(q); -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] SFQ: backport some features from ESFQ
Corey Hickey wrote: Hello, This set of patches adds some of ESFQ's modifications to the original SFQ. Thus far, I have received support for this approach rather than for trying to get ESFQ included as a separate qdisc. Crud; I wasn't expecting git-send-email to thread the messages that way. I guess I should have used --no-chain-reply-to. If I ought re-send this series with that option, somebody please say the word and I'll do so. -Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/7] Add depth.
Michael Buesch wrote: On Sunday 29 July 2007 09:08:51 Corey Hickey wrote: p = d; n = q-dep[d].next; @@ -215,7 +216,7 @@ static unsigned int sfq_drop(struct Qdisc *sch) drop a packet from it */ if (d 1) { - sfq_index x = q-dep[d+SFQ_DEPTH].next; + sfq_index x = q-dep[d+q-depth].next; Please q-dep[d + q-depth] Makes it _much_ more readable. And doesn't confuse my brain with a minus and a BiggerThan sign ;) Ok. @@ -383,6 +384,16 @@ static void sfq_perturbation(unsigned long arg) static void sfq_q_destroy(struct sfq_sched_data *q) { del_timer(q-perturb_timer); + if(q-dep) + kfree(q-dep); + if(q-next) + kfree(q-next); + if(q-allot) + kfree(q-allot); + if(q-hash) + kfree(q-hash); + if(q-qs) + kfree(q-qs); No need to check for !=NULL. kfree handles NULL. Ok. Thanks. } static void sfq_destroy(struct Qdisc *sch) @@ -394,6 +405,7 @@ static void sfq_destroy(struct Qdisc *sch) static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) { struct tc_sfq_qopt *ctl = RTA_DATA(opt); + sfq_index p = ~0U/2; int i; if (opt opt-rta_len RTA_LENGTH(sizeof(*ctl))) @@ -401,30 +413,53 @@ static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) q-perturbation = 0; q-max_depth = 0; - q-tail = q-limit = SFQ_DEPTH; if (opt == NULL) { q-perturb_period = 0; + q-tail = q-limit = q-depth = SFQ_DEPTH_DEFAULT; } else { struct tc_sfq_qopt *ctl = RTA_DATA(opt); if (ctl-quantum) q-quantum = ctl-quantum; q-perturb_period = ctl-perturb_period*HZ; + q-tail = q-limit = q-depth = ctl-flows ? : SFQ_DEPTH_DEFAULT; + + if (q-depth p - 1) + return -EINVAL; Compare depth against (~0U/2)-1? What's that doing? Should probably add a comment. ~0U/2 - 1 is the maximum value depth can be, based on how it is used in indexing q-dep. I agree, though, that deserves a comment. Actually, I'll also change it to '#define SFQ_DEPTH_MAX (~0U/2 - 1)' and put it near the top of the file next to the 'typedef unsigned int sfq_index;'. I could also include limits.h and use UINT_MAX instead of ~0U; would that be preferable? if (ctl-limit) - q-limit = min_t(u32, ctl-limit, SFQ_DEPTH); + q-limit = min_t(u32, ctl-limit, q-depth); } + q-dep = kmalloc((1+q-depth*2)*sizeof(struct sfq_head), GFP_KERNEL); + if (!q-dep) + goto err_case; + q-next = kmalloc(q-depth*sizeof(sfq_index), GFP_KERNEL); + if (!q-next) + goto err_case; + q-allot = kmalloc(q-depth*sizeof(short), GFP_KERNEL); + if (!q-allot) + goto err_case; + q-hash = kmalloc(q-depth*sizeof(unsigned short), GFP_KERNEL); + if (!q-hash) + goto err_case; + q-qs = kmalloc(q-depth*sizeof(struct sk_buff_head), GFP_KERNEL); + if (!q-qs) + goto err_case; You may chose to use kcalloc for array allocations. The arrays in the original code don't get zeroed either, so that shouldn't be necessary (and I haven't heard of any problems so far). Do you suggest I use kcalloc() anyway, just as a good practice? for (i=0; iSFQ_HASH_DIVISOR; i++) - q-ht[i] = SFQ_DEPTH; - for (i=0; iSFQ_DEPTH; i++) { + q-ht[i] = q-depth; + for (i=0; iq-depth; i++) { skb_queue_head_init(q-qs[i]); - q-dep[i+SFQ_DEPTH].next = i+SFQ_DEPTH; - q-dep[i+SFQ_DEPTH].prev = i+SFQ_DEPTH; + q-dep[i+q-depth].next = i+q-depth; + q-dep[i+q-depth].prev = i+q-depth; } - for (i=0; iSFQ_DEPTH; i++) + for (i=0; iq-depth; i++) sfq_link(q, i); return 0; +err_case: This leaks a few kmallocs. Are you saying that the 'err_case:' leaks kmallocs? It calls sfq_q_destroy(q), which kfrees each of the arrays: dep, next, allot, hash, and qs. Is that sufficient, or am I missing something or misunderstanding you? + sfq_q_destroy(q); + return -ENOBUFS; } Thank you for your review. Could you please clarify the questions I have? I'll make, test, and submit a revision of this patch after that. -Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/7] Move two functions.
Move sfq_q_destroy() to above sfq_q_init() so that it can be used by an error case in a later patch. Move sfq_destroy() as well, for clarity. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 24 1 files changed, 12 insertions(+), 12 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 0c46938..583f925 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -380,6 +380,17 @@ static void sfq_perturbation(unsigned long arg) } } +static void sfq_q_destroy(struct sfq_sched_data *q) +{ + del_timer(q-perturb_timer); +} + +static void sfq_destroy(struct Qdisc *sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + sfq_q_destroy(q); +} + static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) { struct tc_sfq_qopt *ctl = RTA_DATA(opt); @@ -420,7 +431,7 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) { struct sfq_sched_data *q = qdisc_priv(sch); int err; - + q-quantum = psched_mtu(sch-dev); /* default */ if ((err = sfq_q_init(q, opt))) return err; @@ -436,17 +447,6 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) return 0; } -static void sfq_q_destroy(struct sfq_sched_data *q) -{ - del_timer(q-perturb_timer); -} - -static void sfq_destroy(struct Qdisc *sch) -{ - struct sfq_sched_data *q = qdisc_priv(sch); - sfq_q_destroy(q); -} - static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) { struct sfq_sched_data *q = qdisc_priv(sch); -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/7] Preparatory refactoring part 2.
Factor code out of sfq_init() and sfq_destroy(), again so that the new functions can be used by sfq_change() later. Actually, as the diff itself shows, most of the sfq_q_init() code comes from the original sfq_change(), but sfq_change() is only called by sfq_init() right now. Thus, it is safe to remove sfq_change(); tc qdisc change doesn't yet work for sfq anyway. The sfq_destroy() -- sfq_q_destroy() change looks pointless here, but it's cleaner to split now and add code to sfq_q_destroy() in a later patch. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 80 +- 1 files changed, 40 insertions(+), 40 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 8ae077f..0c46938 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -380,71 +380,71 @@ static void sfq_perturbation(unsigned long arg) } } -static int sfq_change(struct Qdisc *sch, struct rtattr *opt) +static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) { - struct sfq_sched_data *q = qdisc_priv(sch); struct tc_sfq_qopt *ctl = RTA_DATA(opt); - unsigned int qlen; + int i; - if (opt-rta_len RTA_LENGTH(sizeof(*ctl))) + if (opt opt-rta_len RTA_LENGTH(sizeof(*ctl))) return -EINVAL; - sch_tree_lock(sch); - q-quantum = ctl-quantum ? : psched_mtu(sch-dev); - q-perturb_period = ctl-perturb_period*HZ; - if (ctl-limit) - q-limit = min_t(u32, ctl-limit, SFQ_DEPTH); + q-perturbation = 0; + q-max_depth = 0; + q-tail = q-limit = SFQ_DEPTH; + if (opt == NULL) { + q-perturb_period = 0; + } else { + struct tc_sfq_qopt *ctl = RTA_DATA(opt); + if (ctl-quantum) + q-quantum = ctl-quantum; + q-perturb_period = ctl-perturb_period*HZ; - qlen = sch-q.qlen; - while (sch-q.qlen = q-limit-1) - sfq_drop(sch); - qdisc_tree_decrease_qlen(sch, qlen - sch-q.qlen); + if (ctl-limit) + q-limit = min_t(u32, ctl-limit, SFQ_DEPTH); + } - del_timer(q-perturb_timer); - if (q-perturb_period) { - q-perturb_timer.expires = jiffies + q-perturb_period; - add_timer(q-perturb_timer); + for (i=0; iSFQ_HASH_DIVISOR; i++) + q-ht[i] = SFQ_DEPTH; + for (i=0; iSFQ_DEPTH; i++) { + skb_queue_head_init(q-qs[i]); + q-dep[i+SFQ_DEPTH].next = i+SFQ_DEPTH; + q-dep[i+SFQ_DEPTH].prev = i+SFQ_DEPTH; } - sch_tree_unlock(sch); + + for (i=0; iSFQ_DEPTH; i++) + sfq_link(q, i); return 0; } static int sfq_init(struct Qdisc *sch, struct rtattr *opt) { struct sfq_sched_data *q = qdisc_priv(sch); - int i; + int err; + + q-quantum = psched_mtu(sch-dev); /* default */ + if ((err = sfq_q_init(q, opt))) + return err; init_timer(q-perturb_timer); q-perturb_timer.data = (unsigned long)sch; q-perturb_timer.function = sfq_perturbation; - - for (i=0; iSFQ_HASH_DIVISOR; i++) - q-ht[i] = SFQ_DEPTH; - for (i=0; iSFQ_DEPTH; i++) { - skb_queue_head_init(q-qs[i]); - q-dep[i+SFQ_DEPTH].next = i+SFQ_DEPTH; - q-dep[i+SFQ_DEPTH].prev = i+SFQ_DEPTH; - } - q-limit = SFQ_DEPTH; - q-max_depth = 0; - q-tail = SFQ_DEPTH; - if (opt == NULL) { - q-quantum = psched_mtu(sch-dev); - q-perturb_period = 0; - } else { - int err = sfq_change(sch, opt); - if (err) - return err; + if (q-perturb_period) { + q-perturb_timer.expires = jiffies + q-perturb_period; + add_timer(q-perturb_timer); } - for (i=0; iSFQ_DEPTH; i++) - sfq_link(q, i); + return 0; } +static void sfq_q_destroy(struct sfq_sched_data *q) +{ + del_timer(q-perturb_timer); +} + static void sfq_destroy(struct Qdisc *sch) { struct sfq_sched_data *q = qdisc_priv(sch); - del_timer(q-perturb_timer); + sfq_q_destroy(q); } static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
SFQ: backport some features from ESFQ (try 2)
Hello, Patchset try 2 addresses the review by Michael Buesch. This set of patches adds some of ESFQ's modifications to the original SFQ. Thus far, I have received support for this approach rather than for trying to get ESFQ included as a separate qdisc. http://mailman.ds9a.nl/pipermail/lartc/2007q2/021056.html My patches here implement tc qdisc change, user-configurable depth (number of flows), and user-configurable divisor (for setting hash table size). I've left out the remaining ESFQ features (usage of jhash and different hashing methods) because Patrick McHardy intends to submit a patch that will supersede that functionality; see the URL above. Default values remain the same, and SFQ's default behavior remains the same, so there should be no user disruption. A patch for iproute2 is included after the end of the kernel patch series. Thanks for your consideration, Corey include/linux/pkt_sched.h |8 -- net/sched/sch_sfq.c | 296 - 2 files changed, 187 insertions(+), 117 deletions(-) [PATCH 1/7] Preparatory refactoring part 1. [PATCH 2/7] Preparatory refactoring part 2. [PATCH 3/7] Move two functions. [PATCH 4/7] Add depth. [PATCH 5/7] Add divisor. [PATCH 6/7] Make qdisc changeable. [PATCH 7/7] Remove comments about hardcoded values. [PATCH] [iproute2] SFQ: Support changing depth and divisor. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/7] Add divisor.
Make hash divisor user-configurable. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 17 - 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 70124ac..e6a6a21 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -75,7 +75,7 @@ It is easy to increase these values, but not in flight. */ #define SFQ_DEPTH_DEFAULT 128 -#define SFQ_HASH_DIVISOR 1024 +#define SFQ_DIVISOR_DEFAULT1024 #define SFQ_HEAD 0 #define SFQ_TAIL 1 @@ -98,6 +98,7 @@ struct sfq_sched_data unsignedquantum;/* Allotment per round: MUST BE = MTU */ int limit; unsigneddepth; + unsignedhash_divisor; /* Variables */ struct timer_list perturb_timer; @@ -105,7 +106,7 @@ struct sfq_sched_data sfq_index tail; /* Index of current slot in round */ sfq_index max_depth; /* Maximal depth */ - sfq_index ht[SFQ_HASH_DIVISOR]; /* Hash table */ + sfq_index *ht;/* Hash table */ sfq_index *next; /* Active slots link */ short *allot; /* Current allotment per slot */ unsigned short *hash; /* Hash value indexed by slots */ @@ -120,7 +121,7 @@ static __inline__ unsigned sfq_fold_hash(struct sfq_sched_data *q, u32 h, u32 h1 /* Have we any rotation primitives? If not, WHY? */ h ^= (h1pert) ^ (h1(0x1F - pert)); h ^= h10; - return h 0x3FF; + return h (q-hash_divisor-1); } static unsigned sfq_hash(struct sfq_sched_data *q, struct sk_buff *skb) @@ -386,6 +387,7 @@ static void sfq_perturbation(unsigned long arg) static void sfq_q_destroy(struct sfq_sched_data *q) { del_timer(q-perturb_timer); + kfree(q-ht); kfree(q-dep); kfree(q-next); kfree(q-allot); @@ -411,12 +413,14 @@ static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) q-max_depth = 0; if (opt == NULL) { q-perturb_period = 0; + q-hash_divisor = SFQ_DIVISOR_DEFAULT; q-tail = q-limit = q-depth = SFQ_DEPTH_DEFAULT; } else { struct tc_sfq_qopt *ctl = RTA_DATA(opt); if (ctl-quantum) q-quantum = ctl-quantum; q-perturb_period = ctl-perturb_period*HZ; + q-hash_divisor = ctl-divisor ? : SFQ_DIVISOR_DEFAULT; q-tail = q-limit = q-depth = ctl-flows ? : SFQ_DEPTH_DEFAULT; if (q-depth SFQ_MAX_DEPTH) @@ -426,6 +430,9 @@ static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) q-limit = min_t(u32, ctl-limit, q-depth); } + q-ht = kcalloc(q-hash_divisor, sizeof(sfq_index), GFP_KERNEL); + if (!q-ht) + goto err_case; q-dep = kcalloc(1 + q-depth*2, sizeof(struct sfq_head), GFP_KERNEL); if (!q-dep) goto err_case; @@ -442,7 +449,7 @@ static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) if (!q-qs) goto err_case; - for (i=0; iSFQ_HASH_DIVISOR; i++) + for (i=0; iq-hash_divisor; i++) q-ht[i] = q-depth; for (i=0; i q-depth; i++) { skb_queue_head_init(q-qs[i]); @@ -488,7 +495,7 @@ static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) opt.perturb_period = q-perturb_period/HZ; opt.limit = q-limit; - opt.divisor = SFQ_HASH_DIVISOR; + opt.divisor = q-hash_divisor; opt.flows = q-depth; RTA_PUT(skb, TCA_OPTIONS, sizeof(opt), opt); -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/7] Preparatory refactoring part 1.
Make a new function sfq_q_enqueue() that operates directly on the queue data. This will be useful for implementing sfq_change() in a later patch. A pleasant side-effect is reducing most of the duplicate code in sfq_enqueue() and sfq_requeue(). Similarly, make a new function sfq_q_dequeue(). Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 70 ++ 1 files changed, 36 insertions(+), 34 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 9579573..8ae077f 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -77,6 +77,9 @@ #define SFQ_DEPTH 128 #define SFQ_HASH_DIVISOR 1024 +#define SFQ_HEAD 0 +#define SFQ_TAIL 1 + /* This type should contain at least SFQ_DEPTH*2 values */ typedef unsigned char sfq_index; @@ -244,10 +247,8 @@ static unsigned int sfq_drop(struct Qdisc *sch) return 0; } -static int -sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) +static void sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, unsigned int end) { - struct sfq_sched_data *q = qdisc_priv(sch); unsigned hash = sfq_hash(q, skb); sfq_index x; @@ -256,8 +257,12 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) q-ht[hash] = x = q-dep[SFQ_DEPTH].next; q-hash[x] = hash; } - sch-qstats.backlog += skb-len; - __skb_queue_tail(q-qs[x], skb); + + if (end == SFQ_TAIL) + __skb_queue_tail(q-qs[x], skb); + else + __skb_queue_head(q-qs[x], skb); + sfq_inc(q, x); if (q-qs[x].qlen == 1) { /* The flow is new */ if (q-tail == SFQ_DEPTH) { /* It is the first flow */ @@ -270,12 +275,21 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) q-tail = x; } } +} + +static int +sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + sfq_q_enqueue(skb, q, SFQ_TAIL); + sch-qstats.backlog += skb-len; if (++sch-q.qlen q-limit-1) { sch-bstats.bytes += skb-len; sch-bstats.packets++; return 0; } + sch-qstats.drops++; sfq_drop(sch); return NET_XMIT_CN; } @@ -284,28 +298,8 @@ static int sfq_requeue(struct sk_buff *skb, struct Qdisc* sch) { struct sfq_sched_data *q = qdisc_priv(sch); - unsigned hash = sfq_hash(q, skb); - sfq_index x; - - x = q-ht[hash]; - if (x == SFQ_DEPTH) { - q-ht[hash] = x = q-dep[SFQ_DEPTH].next; - q-hash[x] = hash; - } + sfq_q_enqueue(skb, q, SFQ_HEAD); sch-qstats.backlog += skb-len; - __skb_queue_head(q-qs[x], skb); - sfq_inc(q, x); - if (q-qs[x].qlen == 1) { /* The flow is new */ - if (q-tail == SFQ_DEPTH) { /* It is the first flow */ - q-tail = x; - q-next[x] = x; - q-allot[x] = q-quantum; - } else { - q-next[x] = q-next[q-tail]; - q-next[q-tail] = x; - q-tail = x; - } - } if (++sch-q.qlen q-limit - 1) { sch-qstats.requeues++; return 0; @@ -316,13 +310,8 @@ sfq_requeue(struct sk_buff *skb, struct Qdisc* sch) return NET_XMIT_CN; } - - - -static struct sk_buff * -sfq_dequeue(struct Qdisc* sch) +static struct sk_buff *sfq_q_dequeue(struct sfq_sched_data *q) { - struct sfq_sched_data *q = qdisc_priv(sch); struct sk_buff *skb; sfq_index a, old_a; @@ -335,8 +324,6 @@ sfq_dequeue(struct Qdisc* sch) /* Grab packet */ skb = __skb_dequeue(q-qs[a]); sfq_dec(q, a); - sch-q.qlen--; - sch-qstats.backlog -= skb-len; /* Is the slot empty? */ if (q-qs[a].qlen == 0) { @@ -353,6 +340,21 @@ sfq_dequeue(struct Qdisc* sch) a = q-next[a]; q-allot[a] += q-quantum; } + + return skb; +} + +static struct sk_buff +*sfq_dequeue(struct Qdisc* sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + struct sk_buff *skb; + + skb = sfq_q_dequeue(q); + if (skb == NULL) + return NULL; + sch-q.qlen--; + sch-qstats.backlog -= skb-len; return skb; } -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/7] Add depth.
Make depth (number of queues) user-configurable: * replace #define with a parameter * use old hardcoded value as a default * kcalloc() arrays in sfq_q_init() * free() arrays in sfq_q_destroy() Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 81 +++ 1 files changed, 56 insertions(+), 25 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 583f925..70124ac 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -74,14 +74,16 @@ It is easy to increase these values, but not in flight. */ -#define SFQ_DEPTH 128 +#define SFQ_DEPTH_DEFAULT 128 #define SFQ_HASH_DIVISOR 1024 #define SFQ_HEAD 0 #define SFQ_TAIL 1 -/* This type should contain at least SFQ_DEPTH*2 values */ -typedef unsigned char sfq_index; +/* This type must contain greater than depth*2 values, so depth is constrained + * accordingly. */ +typedef unsigned int sfq_index; +#define SFQ_MAX_DEPTH (UINT_MAX / 2 - 1) struct sfq_head { @@ -95,6 +97,7 @@ struct sfq_sched_data int perturb_period; unsignedquantum;/* Allotment per round: MUST BE = MTU */ int limit; + unsigneddepth; /* Variables */ struct timer_list perturb_timer; @@ -103,11 +106,11 @@ struct sfq_sched_data sfq_index max_depth; /* Maximal depth */ sfq_index ht[SFQ_HASH_DIVISOR]; /* Hash table */ - sfq_index next[SFQ_DEPTH];/* Active slots link */ - short allot[SFQ_DEPTH]; /* Current allotment per slot */ - unsigned short hash[SFQ_DEPTH];/* Hash value indexed by slots */ - struct sk_buff_head qs[SFQ_DEPTH]; /* Slot queue */ - struct sfq_head dep[SFQ_DEPTH*2]; /* Linked list of slots, indexed by depth */ + sfq_index *next; /* Active slots link */ + short *allot; /* Current allotment per slot */ + unsigned short *hash; /* Hash value indexed by slots */ + struct sk_buff_head *qs;/* Slot queue */ + struct sfq_head *dep; /* Linked list of slots, indexed by depth */ }; static __inline__ unsigned sfq_fold_hash(struct sfq_sched_data *q, u32 h, u32 h1) @@ -164,7 +167,7 @@ static unsigned sfq_hash(struct sfq_sched_data *q, struct sk_buff *skb) static inline void sfq_link(struct sfq_sched_data *q, sfq_index x) { sfq_index p, n; - int d = q-qs[x].qlen + SFQ_DEPTH; + int d = q-qs[x].qlen + q-depth; p = d; n = q-dep[d].next; @@ -215,7 +218,7 @@ static unsigned int sfq_drop(struct Qdisc *sch) drop a packet from it */ if (d 1) { - sfq_index x = q-dep[d+SFQ_DEPTH].next; + sfq_index x = q-dep[d + q-depth].next; skb = q-qs[x].prev; len = skb-len; __skb_unlink(skb, q-qs[x]); @@ -238,7 +241,7 @@ static unsigned int sfq_drop(struct Qdisc *sch) kfree_skb(skb); sfq_dec(q, d); sch-q.qlen--; - q-ht[q-hash[d]] = SFQ_DEPTH; + q-ht[q-hash[d]] = q-depth; sch-qstats.drops++; sch-qstats.backlog -= len; return len; @@ -253,8 +256,8 @@ static void sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, unsigne sfq_index x; x = q-ht[hash]; - if (x == SFQ_DEPTH) { - q-ht[hash] = x = q-dep[SFQ_DEPTH].next; + if (x == q-depth) { + q-ht[hash] = x = q-dep[q-depth].next; q-hash[x] = hash; } @@ -265,7 +268,7 @@ static void sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, unsigne sfq_inc(q, x); if (q-qs[x].qlen == 1) { /* The flow is new */ - if (q-tail == SFQ_DEPTH) { /* It is the first flow */ + if (q-tail == q-depth) { /* It is the first flow */ q-tail = x; q-next[x] = x; q-allot[x] = q-quantum; @@ -316,7 +319,7 @@ static struct sk_buff *sfq_q_dequeue(struct sfq_sched_data *q) sfq_index a, old_a; /* No active slots */ - if (q-tail == SFQ_DEPTH) + if (q-tail == q-depth) return NULL; a = old_a = q-next[q-tail]; @@ -327,10 +330,10 @@ static struct sk_buff *sfq_q_dequeue(struct sfq_sched_data *q) /* Is the slot empty? */ if (q-qs[a].qlen == 0) { - q-ht[q-hash[a]] = SFQ_DEPTH; + q-ht[q-hash[a]] = q-depth; a = q-next[a]; if (a == old_a) { - q-tail = SFQ_DEPTH; + q-tail = q-depth; return skb; } q-next[q-tail
[PATCH 6/7] Make qdisc changeable.
Re-implement sfq_change() and enable Qdisc_opts.change so tc qdisc change will work. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 51 ++- 1 files changed, 50 insertions(+), 1 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index e6a6a21..e042cd0 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -485,6 +485,55 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) return 0; } +static int sfq_change(struct Qdisc *sch, struct rtattr *opt) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + struct sfq_sched_data tmp; + struct sk_buff *skb; + int err; + + /* set up tmp queue */ + memset(tmp, 0, sizeof(struct sfq_sched_data)); + tmp.quantum = psched_mtu(sch-dev); /* default */ + if ((err = sfq_q_init(tmp, opt))) + return err; + + /* copy packets from the old queue to the tmp queue */ + sch_tree_lock(sch); + while (sch-q.qlen = tmp.limit - 1) + sfq_drop(sch); + while ((skb = sfq_q_dequeue(q)) != NULL) + sfq_q_enqueue(skb, tmp, SFQ_TAIL); + + /* clean up the old queue */ + sfq_q_destroy(q); + + /* copy elements of the tmp queue into the old queue */ + q-perturb_period = tmp.perturb_period; + q-quantum= tmp.quantum; + q-limit = tmp.limit; + q-depth = tmp.depth; + q-hash_divisor = tmp.hash_divisor; + q-tail = tmp.tail; + q-max_depth = tmp.max_depth; + q-ht= tmp.ht; + q-dep = tmp.dep; + q-next = tmp.next; + q-allot = tmp.allot; + q-hash = tmp.hash; + q-qs= tmp.qs; + + /* finish up */ + if (q-perturb_period) { + q-perturb_timer.expires = jiffies + q-perturb_period; + add_timer(q-perturb_timer); + } else { + q-perturbation = 0; + } + sch_tree_unlock(sch); + return 0; +} + static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) { struct sfq_sched_data *q = qdisc_priv(sch); @@ -519,7 +568,7 @@ static struct Qdisc_ops sfq_qdisc_ops = { .init = sfq_init, .reset = sfq_reset, .destroy= sfq_destroy, - .change = NULL, + .change = sfq_change, .dump = sfq_dump, .owner = THIS_MODULE, }; -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 7/7] Remove comments about hardcoded values.
None of these are true anymore (hooray!). Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- include/linux/pkt_sched.h |8 net/sched/sch_sfq.c | 17 +++-- 2 files changed, 3 insertions(+), 22 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 268c515..58a0ea6 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -148,14 +148,6 @@ struct tc_sfq_qopt unsignedflows; /* Maximal number of flows */ }; -/* - * NOTE: limit, divisor and flows are hardwired to code at the moment. - * - * limit=flows=128, divisor=1024; - * - * The only reason for this is efficiency, it is possible - * to change these parameters in compile time. - */ /* RED section */ diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index e042cd0..3890452 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -61,18 +61,7 @@ We still need true WFQ for top level CSZ, but using WFQ for the best effort traffic is absolutely pointless: - SFQ is superior for this purpose. - - IMPLEMENTATION: - This implementation limits maximal queue length to 128; - maximal mtu to 2^15-1; number of hash buckets to 1024. - The only goal of this restrictions was that all data - fit into one 4K page :-). Struct sfq_sched_data is - organized in anti-cache manner: all the data for a bucket - are scattered over different locations. This is not good, - but it allowed me to put it into 4K. - - It is easy to increase these values, but not in flight. */ + SFQ is superior for this purpose. */ #define SFQ_DEPTH_DEFAULT 128 #define SFQ_DIVISOR_DEFAULT1024 @@ -491,7 +480,7 @@ static int sfq_change(struct Qdisc *sch, struct rtattr *opt) struct sfq_sched_data tmp; struct sk_buff *skb; int err; - + /* set up tmp queue */ memset(tmp, 0, sizeof(struct sfq_sched_data)); tmp.quantum = psched_mtu(sch-dev); /* default */ @@ -504,7 +493,7 @@ static int sfq_change(struct Qdisc *sch, struct rtattr *opt) sfq_drop(sch); while ((skb = sfq_q_dequeue(q)) != NULL) sfq_q_enqueue(skb, tmp, SFQ_TAIL); - + /* clean up the old queue */ sfq_q_destroy(q); -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [LARTC] ESFQ: request for user input
Patrick McHardy wrote: Corey Hickey wrote: Patrick McHardy wrote: Should ESFQ be merged into SFQ or remain as a separate qdisc? I've CCed netdev. I think merging parts of ESFQ (dynamic depth and flow number) would make a lot of sense, but I'm intending to submit an alternative to the ESFQ hashing scheme for 2.6.23: http://www.mail-archive.com/netdev@vger.kernel.org/msg39156.html Nice. I wasn't aware of that. Your patch looks like it supersedes ESFQ's hashing, so, if it gets applied, that already removes a large chunk of the differences between SFQ and ESFQ. If I don't hear any opposition, then I'll keep an eye out for when your patch gets accepted (assuming it does) and then submit patch(es) porting the rest of ESFQ's features to SFQ. I think it would be best if you would start posting patches to add the missing features (without the hash changes) to SFQ, if you're quick this may already go in during the 2.6.23 merge window. My changes are mostly independant of yours, if there are any clashes the one who goes last will just have to rediff their patches :) Since you need to pass additional parameters to SFQ for your changes, have a look at my rtnetlink compat attribute patch: http://article.gmane.org/gmane.linux.network/64851 Ok, I'll work on it later. Thanks. -Corey - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html