Re: [PATCH 0/2] netem: trace enhancement
I have added the possibility to configure the number of buffers used to
store the trace data for packet delays. The complete command to start
netem with a trace file is:

    tc qdisc add dev eth1 root netem trace path/to/trace/file.bin buf 3 loops 1 0

with
    buf:   the number of buffers to be used
    loops: how many times to loop through the trace file
The last argument is optional and specifies whether the default is to
drop packets or 0-delay them.

The patches are available at:
http://www.tcn.hypert.net/tcn_kernel_2_6_23_confbuf
http://www.tcn.hypert.net/tcn_iproute2_2_6_23_confbuf

I'm looking forward to your comments! Thanks!

Ben Greear wrote:
> Ariane Keller wrote:
>> Yes, for short-term starvation it certainly helps. But I'm still not
>> convinced that it is really necessary to add more buffers, because I'm
>> not sure whether the bottleneck really is the loading of data from user
>> space to kernel space. Some basic tests have shown that the kernel
>> starts losing packets at approximately the same packet rate regardless
>> of whether we use netem, or netem with the trace extension. But if you
>> have contrary experience I'm happy to add a parameter which defines the
>> number of buffers.
>
> I have no numbers, so if you think it works, then that is fine with me.
> If you actually run out of the trace buffers, do you just continue to
> run with the last settings? If so, that would keep up throughput even if
> you are out of trace buffers... What rates do you see, btw? (pps, bps).
>
> Thanks,
> Ben

--
Ariane Keller
Communication Systems Research Group, ETH Zurich
Web: http://www.csg.ethz.ch/people/arkeller
Office: ETZ G 60.1, Gloriastrasse 35, 8092 Zurich
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] netem: trace enhancement: kernel
This patch applies to kernel 2.6.23. It enhances the network emulator
netem with the possibility to read all delay/drop/duplicate etc. values
from a trace file. This trace file contains one value for each packet to
be processed. The values are read from the file by a user space process
called flowseed and sent to the netem module with the help of rtnetlink
sockets. In the netem module the values are cached in buffers; the number
of buffers is configurable when netem is started. If a buffer is empty
the netem module sends an rtnetlink notification to the flowseed process.
Upon receiving such a notification this process sends the next 1000
values to the netem module.

Signed-off-by: Ariane Keller [EMAIL PROTECTED]
---
diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/include/linux/pkt_sched.h linux-2.6.23.8_mod/include/linux/pkt_sched.h
--- linux-2.6.23.8/include/linux/pkt_sched.h	2007-11-16 19:14:27.0 +0100
+++ linux-2.6.23.8_mod/include/linux/pkt_sched.h	2007-12-21 19:42:49.0 +0100
@@ -439,6 +439,9 @@ enum
 	TCA_NETEM_DELAY_DIST,
 	TCA_NETEM_REORDER,
 	TCA_NETEM_CORRUPT,
+	TCA_NETEM_TRACE,
+	TCA_NETEM_TRACE_DATA,
+	TCA_NETEM_STATS,
 	__TCA_NETEM_MAX,
 };
@@ -454,6 +457,26 @@ struct tc_netem_qopt
 	__u32	jitter;		/* random jitter in latency (us) */
 };
+struct tc_netem_stats
+{
+	int	packetcount;
+	int	packetok;
+	int	normaldelay;
+	int	drops;
+	int	dupl;
+	int	corrupt;
+	int	novaliddata;
+	int	reloadbuffer;
+};
+
+struct tc_netem_trace
+{
+	__u32	fid;		/* flowid */
+	__u32	def;		/* default action 0 = no delay, 1 = drop */
+	__u32	ticks;		/* number of ticks corresponding to 1ms */
+	__u32	nr_bufs;	/* number of buffers to save trace data */
+};
+
 struct tc_netem_corr
 {
 	__u32	delay_corr;	/* delay correlation */
diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/include/net/flowseed.h linux-2.6.23.8_mod/include/net/flowseed.h
--- linux-2.6.23.8/include/net/flowseed.h	1970-01-01 01:00:00.0 +0100
+++ linux-2.6.23.8_mod/include/net/flowseed.h	2007-12-21 19:43:24.0 +0100
@@ -0,0 +1,34 @@
+/* flowseed.h	header file for the netem trace enhancement
+ */
+
+#ifndef _FLOWSEED_H
+#define _FLOWSEED_H
+#include <net/sch_generic.h>
+
+/* must be divisible by 4 (= #pkts) */
+#define DATA_PACKAGE 4000
+#define DATA_PACKAGE_ID 4008
+
+/* struct per flow - kernel */
+struct tcn_control
+{
+	struct list_head full_buffer_list;
+	struct list_head empty_buffer_list;
+	struct buflist *buffer_in_use;
+	int *offsetpos;	/* pointer to actual pos in the buffer in use */
+	int flowid;
+};
+
+struct tcn_statistic
+{
+	int packetcount;
+	int packetok;
+	int normaldelay;
+	int drops;
+	int dupl;
+	int corrupt;
+	int novaliddata;
+	int reloadbuffer;
+};
+
+#endif
diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/include/net/pkt_sched.h linux-2.6.23.8_mod/include/net/pkt_sched.h
--- linux-2.6.23.8/include/net/pkt_sched.h	2007-11-16 19:14:27.0 +0100
+++ linux-2.6.23.8_mod/include/net/pkt_sched.h	2007-12-21 19:42:49.0 +0100
@@ -72,6 +72,9 @@ extern void qdisc_watchdog_cancel(struct
 extern struct Qdisc_ops pfifo_qdisc_ops;
 extern struct Qdisc_ops bfifo_qdisc_ops;
+extern int qdisc_notify_pid(int pid, struct nlmsghdr *n, u32 clid,
+			    struct Qdisc *old, struct Qdisc *new);
+
 extern int register_qdisc(struct Qdisc_ops *qops);
 extern int unregister_qdisc(struct Qdisc_ops *qops);
 extern struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/net/core/rtnetlink.c linux-2.6.23.8_mod/net/core/rtnetlink.c
--- linux-2.6.23.8/net/core/rtnetlink.c	2007-11-16 19:14:27.0 +0100
+++ linux-2.6.23.8_mod/net/core/rtnetlink.c	2007-12-21 19:42:49.0 +0100
@@ -460,7 +460,7 @@ int rtnetlink_send(struct sk_buff *skb,
 	NETLINK_CB(skb).dst_group = group;
 	if (echo)
 		atomic_inc(&skb->users);
-	netlink_broadcast(rtnl, skb, pid, group, GFP_KERNEL);
+	netlink_broadcast(rtnl, skb, pid, group, gfp_any());
 	if (echo)
 		err = netlink_unicast(rtnl, skb, pid, MSG_DONTWAIT);
 	return err;
diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/net/sched/sch_api.c linux-2.6.23.8_mod/net/sched/sch_api.c
--- linux-2.6.23.8/net/sched/sch_api.c	2007-11-16 19:14:27.0 +0100
+++ linux-2.6.23.8_mod/net/sched/sch_api.c	2007-12-21 19:42:49.0 +0100
@@ -28,6 +28,7 @@
 #include <linux/list.h>
 #include <linux/hrtimer.h>
+#include <net/sock.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
@@ -841,6 +842,62 @@
 rtattr_failure:
 	nlmsg_trim(skb, b);
 	return -1;
 }
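For readers following the description above, the kernel-side buffer
handling can be sketched in plain user-space C. This is only an
illustrative model under stated assumptions, not the patch's code: the
names (trace_buf, next_trace_value) are made up, and setting a flag
stands in for the rtnetlink notification sent to the flowseed process.

```c
#include <stddef.h>

#define VALUES_PER_BUF 1000   /* one refill message carries 1000 values */

struct trace_buf {
    unsigned int vals[VALUES_PER_BUF];
    size_t pos;               /* next value to consume */
    int full;                 /* set once userspace has refilled it */
};

/* Fetch the next per-packet value. When the active buffer drains, clear
 * it, rotate to the next cached buffer, and flag that userspace should
 * be asked for a refill (the real code sends an rtnetlink notification).
 * Returns -1 when no filled buffer is available, in which case the
 * caller would apply the configured default action (drop or 0-delay). */
static int next_trace_value(struct trace_buf *bufs, size_t nbufs,
                            size_t *active, unsigned int *out,
                            int *need_refill)
{
    struct trace_buf *b = &bufs[*active];

    if (!b->full)
        return -1;            /* starved: apply default action */

    *out = b->vals[b->pos++];
    if (b->pos == VALUES_PER_BUF) {      /* active buffer drained */
        b->pos = 0;
        b->full = 0;
        *need_refill = 1;                /* would notify userspace here */
        *active = (*active + 1) % nbufs; /* rotate to next cached buffer */
    }
    return 0;
}
```

The point of the rotation is that the consumer never waits: as long as
userspace refills drained buffers before the remaining ones are used up,
every packet finds a value ready.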
Re: [PATCH 0/2] netem: trace enhancement: iproute
The iproute patch is too big to send to the mailing list, since the
distribution data have changed directory. For ease of discussion I add
the important changes in this mail.

Signed-off-by: Ariane Keller [EMAIL PROTECTED]
---
diff -uprN iproute2-2.6.23/netem/trace/flowseed.c iproute2-2.6.23_buf/netem/trace/flowseed.c
--- iproute2-2.6.23/netem/trace/flowseed.c	1970-01-01 01:00:00.0 +0100
+++ iproute2-2.6.23_buf/netem/trace/flowseed.c	2007-12-12 08:43:01.0 +0100
@@ -0,0 +1,209 @@
+/* flowseed.c	flowseed process to deliver values for packet delay,
+ * duplication, loss and corruption from userspace to netem
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Ariane Keller [EMAIL PROTECTED] ETH Zurich
+ *		Rainer Baumann [EMAIL PROTECTED] ETH Zurich
+ */
+
+#include <ctype.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <sys/ipc.h>
+#include <sys/sem.h>
+#include <signal.h>
+
+#include "utils.h"
+#include <linux/pkt_sched.h>
+
+#define DATA_PACKAGE 4000
+#define DATA_PACKAGE_ID DATA_PACKAGE + sizeof(unsigned int) + sizeof(int)
+#define TCA_BUF_MAX (64*1024)
+/* maximal amount of parallel flows */
+struct rtnl_handle rth;
+unsigned int loop;
+int infinity = 0;
+int fdflowseed;
+char *sendpkg;
+int fid;
+int initialized = 0;
+int semid;
+int moreData = 1, r = 0, rold = 0;
+FILE *file;
+
+
+int printfct(const struct sockaddr_nl *who,
+	     struct nlmsghdr *n,
+	     void *arg)
+{
+	struct {
+		struct nlmsghdr	n;
+		struct tcmsg	t;
+		char		buf[TCA_BUF_MAX];
+	} req;
+	struct tcmsg *t = NLMSG_DATA(n);
+	struct rtattr *tail = NULL;
+	struct tc_netem_qopt opt;
+	memset(&opt, 0, sizeof(opt));
+
+	if (n->nlmsg_type == RTM_DELQDISC) {
+		goto outerr;
+	}
+	else if (n->nlmsg_type == RTM_NEWQDISC) {
+		initialized = 1;
+
+		memset(&req, 0, sizeof(req));
+		req.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
+		req.n.nlmsg_flags = NLM_F_REQUEST;
+		req.n.nlmsg_type = RTM_NEWQDISC;
+		req.t.tcm_family = AF_UNSPEC;
+		req.t.tcm_handle = t->tcm_handle;
+		req.t.tcm_parent = t->tcm_parent;
+		req.t.tcm_ifindex = t->tcm_ifindex;
+
+		tail = NLMSG_TAIL(&req.n);
+again:
+		if (loop <= 0 && !infinity) {
+			goto out;
+		}
+		if ((r = read(fdflowseed, sendpkg + rold, DATA_PACKAGE - rold)) >= 0) {
+			if (r + rold < DATA_PACKAGE) {
+				/* Tail of input file reached,
+				   set rest at start from next iteration */
+				rold = r;
+				fprintf(file, "flowseed: at end of file.\n");
+
+				if (lseek(fdflowseed, 0L, SEEK_SET) < 0) {
+					perror("lseek reset");
+					goto out;
+				}
+				goto again;
+			}
+			r = 0;
+			rold = 0;
+			memcpy(sendpkg + DATA_PACKAGE, &fid, sizeof(int));
+			memcpy(sendpkg + DATA_PACKAGE + sizeof(int), &moreData, sizeof(int));
+
+			/* opt has to be added for each netem request */
+			if (addattr_l(&req.n, TCA_BUF_MAX, TCA_OPTIONS, &opt, sizeof(opt)) < 0) {
+				perror("add options");
+				return -1;
+			}
+
+			if (addattr_l(&req.n, TCA_BUF_MAX, TCA_NETEM_TRACE_DATA, sendpkg, DATA_PACKAGE_ID) < 0) {
+				perror("add data");
+				return -1;
+			}
+
+			tail->rta_len = (void *)NLMSG_TAIL(&req.n) - (void *)tail;
+
+			if (rtnl_send(&rth, (char *)&req, req.n.nlmsg_len) < 0) {
+				perror("send data");
+				return -1;
+			}
+			return 0;
+		}
+	}
+/* no more data, what to do? we send a notification to the kernel module */
+out:
+	fprintf(stderr, "flowseed: Tail of input file reached. Exit.\n");
+	fprintf(file, "flowseed: Tail of input file reached. Exit.\n");
+	moreData = 0;
+	memcpy(sendpkg + DATA_PACKAGE, &fid, sizeof(int));
Re: [PATCH 0/2] netem: trace enhancement
On Mon, 10 Dec 2007 15:32:14 +0100
Ariane Keller [EMAIL PROTECTED] wrote:

> I finally managed to rewrite the netem trace extension to use rtnetlink
> communication for the data transfer from user space to kernel space.
> The kernel patch is available here:
> http://www.tcn.hypert.net/tcn_kernel_2_6_23_rtnetlink
> and the iproute patch is here:
> http://www.tcn.hypert.net/tcn_iproute2_2_6_23_rtnetlink
>
> Whenever new data is needed the kernel module sends a notification to
> the user space process. Thereupon the user space process sends a data
> package to the kernel module.

I wonder if it wouldn't be possible to enhance/extend netlink to use
sendfile/splice to get the data. It is rather more work than needed for
just this, but it would be useful for large configurations.

> I had to write a new qdisc_notify function (qdisc_notify_pid) since the
> other was acquiring a lock which we already hold in this situation.
>
> I hope everything works as expected and I'm looking forward to your
> comments.
>
> Thanks!
> Ariane

--
Stephen Hemminger [EMAIL PROTECTED]
Re: [PATCH 0/2] netem: trace enhancement
I finally managed to rewrite the netem trace extension to use rtnetlink
communication for the data transfer from user space to kernel space.

The kernel patch is available here:
http://www.tcn.hypert.net/tcn_kernel_2_6_23_rtnetlink
and the iproute patch is here:
http://www.tcn.hypert.net/tcn_iproute2_2_6_23_rtnetlink

Whenever new data is needed the kernel module sends a notification to the
user space process. Thereupon the user space process sends a data package
to the kernel module.

I had to write a new qdisc_notify function (qdisc_notify_pid) since the
other was acquiring a lock which we already hold in this situation.

I hope everything works as expected and I'm looking forward to your
comments.

Thanks!
Ariane
Re: [PATCH 0/2] netem: trace enhancement
Thanks for your comments!

Patrick McHardy wrote:
> Ariane Keller wrote:
>>> That sounds like it would also be possible using rtnetlink. You could
>>> send out a notification whenever you switch the active buffer and
>>> have userspace listen to these and replace the inactive one.
>>
>> I guess using rtnetlink is possible. However I'm not sure about how to
>> implement it: The first thought was to use RTM_NEWQDISC to send the
>> data to the netem_change() function (similar to tc_qdisc_modify()).
>
> That sounds reasonable.
>
>> But with this we would need the tcm_handle, tcm_parent arguments etc.
>> which are not known in q_netem.c. Therefore we would have to change
>> the parse_qopt() function prototype in order to pass the whole req and
>> not only the nlmsghdr.
>
> I assume you mean netem_init, parse_qopt is userspace. But I don't see
> how that is related, emptying the buffer happens during packet
> processing, right?

Actually I meant parse_qopt from user space. If we changed that function
prototype we would have the whole message header available in
netem_parse_opt() and could pass it to the process which is responsible
for sending the data to the kernel. This process would use this header
every time it has to send new values to the netem_change() function in
the kernel module. I thought about this because I was not aware of the
qdisc_notify function.

Anyway I've got some trouble with calling qdisc_notify:
1. I have to do an EXPORT_SYMBOL(qdisc_notify) (currently it is declared
   static in sch_api.c).
2. I'd like to call it from netem_enqueue(), which leads to a "sleeping
   function called from invalid context", since we are still in
   interrupt context. Therefore I think I have to put it in a workqueue.
I hope this is ok.

> I guess I would simply change the qdisc_notify function to not require
> a struct nlmsghdr * (simply pass nlmsg_seq directly) and use that to
> send notifications. The netem dump function would add the buffer state.
>
> BTW, the parent class id is available in sch->parent, the handle in
> sch->handle, but qdisc_notify should take care of everything you need.
Re: [PATCH 0/2] netem: trace enhancement
Ariane Keller wrote:
> Thanks for your comments!
>
> Patrick McHardy wrote:
>>> But with this we would need the tcm_handle, tcm_parent arguments etc.
>>> which are not known in q_netem.c. Therefore we would have to change
>>> the parse_qopt() function prototype in order to pass the whole req
>>> and not only the nlmsghdr.
>>
>> I assume you mean netem_init, parse_qopt is userspace. But I don't see
>> how that is related, emptying the buffer happens during packet
>> processing, right?
>
> Actually I meant parse_qopt from user space. If we changed that
> function prototype we would have the whole message header available in
> netem_parse_opt() and could pass it to the process which is responsible
> for sending the data to the kernel. This process would use this header
> every time it has to send new values to the netem_change() function in
> the kernel module.

You don't actually want to parse tc output in your program? Just open a
netlink socket and do the necessary processing yourself; libnl makes this
really easy.

> I thought about this because I was not aware of the qdisc_notify
> function. Anyway I've got some trouble with calling qdisc_notify:
> 1. I have to do an EXPORT_SYMBOL(qdisc_notify) (currently it is
>    declared static in sch_api.c).

This is fine.

> 2. I'd like to call it from netem_enqueue(), which leads to a "sleeping
>    function called from invalid context", since we are still in
>    interrupt context. Therefore I think I have to put it in a
>    workqueue.

Just change it to use gfp_any().
Re: [PATCH 0/2] netem: trace enhancement
> That sounds like it would also be possible using rtnetlink. You could
> send out a notification whenever you switch the active buffer and have
> userspace listen to these and replace the inactive one.

I guess using rtnetlink is possible. However I'm not sure about how to
implement it: The first thought was to use RTM_NEWQDISC to send the data
to the netem_change() function (similar to tc_qdisc_modify()). But with
this we would need the tcm_handle, tcm_parent arguments etc. which are
not known in q_netem.c. Therefore we would have to change the
parse_qopt() function prototype in order to pass the whole req and not
only the nlmsghdr.

The second possibility would be to add a new message type, e.g.
RTM_NETEMDATA. This type would be registered in the netem kernel module
with a callback function netem_recv_data(). If this function receives
some data it searches for the correct flow and saves the data in the
corresponding buffer.

However, I'm not convinced of either of these options. Do you have an
alternative suggestion?

> Also, I think you will need a larger cache than 4-8k if you are running
> higher speeds (100,000 pps, etc), as you probably can't rely on
> user-space responding reliably every 10ms (or even less time for faster
> speeds.)

Increasing the cache size to say 32k for each buffer would be no problem.
Is this enough?
Re: [PATCH 0/2] netem: trace enhancement
Ariane Keller wrote:
>> That sounds like it would also be possible using rtnetlink. You could
>> send out a notification whenever you switch the active buffer and have
>> userspace listen to these and replace the inactive one.
>
> I guess using rtnetlink is possible. However I'm not sure about how to
> implement it: The first thought was to use RTM_NEWQDISC to send the
> data to the netem_change() function (similar to tc_qdisc_modify()).

That sounds reasonable.

> But with this we would need the tcm_handle, tcm_parent arguments etc.
> which are not known in q_netem.c. Therefore we would have to change the
> parse_qopt() function prototype in order to pass the whole req and not
> only the nlmsghdr.

I assume you mean netem_init, parse_qopt is userspace. But I don't see
how that is related, emptying the buffer happens during packet
processing, right?

I guess I would simply change the qdisc_notify function to not require a
struct nlmsghdr * (simply pass nlmsg_seq directly) and use that to send
notifications. The netem dump function would add the buffer state.

BTW, the parent class id is available in sch->parent, the handle in
sch->handle, but qdisc_notify should take care of everything you need.
Re: [PATCH 0/2] netem: trace enhancement
Ariane Keller wrote:
> Increasing the cache size to say 32k for each buffer would be no
> problem. Is this enough?

Maybe just a variable-length list of 4k buffers chained together? It's
usually easier to get 4k chunks of memory than 32k chunks, especially
under high network load, and if you go ahead and make it arbitrary
length, then each user can determine how many they want to have queued...

Thanks,
Ben

--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com
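Ben's suggestion amounts to a FIFO of fixed-size chunks. A minimal
user-space sketch follows; chunk_node and chunk_fifo are made-up names,
and a real kernel implementation would use list_head and kmalloc rather
than a hand-rolled list with malloc:

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4096

struct chunk_node {
    unsigned char data[CHUNK_SIZE];
    struct chunk_node *next;
};

struct chunk_fifo {
    struct chunk_node *head, *tail;   /* consume at head, refill at tail */
    int count;                        /* user-configurable queue depth */
};

/* Queue one refilled 4k chunk. Many small allocations are easier to
 * satisfy under high network load than one large buffer. */
static int fifo_push(struct chunk_fifo *f, const unsigned char *data)
{
    struct chunk_node *n = malloc(sizeof(*n));
    if (!n)
        return -1;
    memcpy(n->data, data, CHUNK_SIZE);
    n->next = NULL;
    if (f->tail)
        f->tail->next = n;
    else
        f->head = n;
    f->tail = n;
    f->count++;
    return 0;
}

/* Hand the oldest chunk to the consumer; the caller frees it once the
 * values in it are used up. */
static struct chunk_node *fifo_pop(struct chunk_fifo *f)
{
    struct chunk_node *n = f->head;
    if (!n)
        return NULL;
    f->head = n->next;
    if (!f->head)
        f->tail = NULL;
    f->count--;
    return n;
}
```

With an arbitrary-length chain, userspace can queue 50 or 100 chunks
ahead, which is exactly the cushion against short-term starvation
discussed later in the thread.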
Re: [PATCH 0/2] netem: trace enhancement
I thought about that as well, but in my opinion this does not help much.
It's the same as before: on average a new buffer needs to be filled every
10ms.

Ben Greear wrote:
> Ariane Keller wrote:
>> Increasing the cache size to say 32k for each buffer would be no
>> problem. Is this enough?
>
> Maybe just a variable-length list of 4k buffers chained together? It's
> usually easier to get 4k chunks of memory than 32k chunks, especially
> under high network load, and if you go ahead and make it arbitrary
> length, then each user can determine how many they want to have
> queued...
>
> Thanks,
> Ben
Re: [PATCH 0/2] netem: trace enhancement
Ariane Keller wrote:
> I thought about that as well, but in my opinion this does not help
> much. It's the same as before: on average a new buffer needs to be
> filled every 10ms.

But you can fill 50 or 100 at a time, so if user-space is delayed for a
few ms, the kernel still has plenty of buffers to work with until
user-space gets another chance. I'm not worried about average throughput
from user-space to kernel, just random short-term starvation.

Thanks,
Ben

--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com
Re: [PATCH 0/2] netem: trace enhancement
Ben Greear wrote:
> Ariane Keller wrote:
>> I thought about that as well, but in my opinion this does not help
>> much. It's the same as before: on average a new buffer needs to be
>> filled every 10ms.
>
> But you can fill 50 or 100 at a time, so if user-space is delayed for a
> few ms, the kernel still has plenty of buffers to work with until
> user-space gets another chance. I'm not worried about average
> throughput from user-space to kernel, just random short-term
> starvation.

Yes, for short-term starvation it certainly helps. But I'm still not
convinced that it is really necessary to add more buffers, because I'm
not sure whether the bottleneck really is the loading of data from user
space to kernel space. Some basic tests have shown that the kernel starts
losing packets at approximately the same packet rate regardless of
whether we use netem, or netem with the trace extension. But if you have
contrary experience I'm happy to add a parameter which defines the number
of buffers.

Thanks!
Re: [PATCH 0/2] netem: trace enhancement
Ariane Keller wrote:
> Yes, for short-term starvation it certainly helps. But I'm still not
> convinced that it is really necessary to add more buffers, because I'm
> not sure whether the bottleneck really is the loading of data from user
> space to kernel space. Some basic tests have shown that the kernel
> starts losing packets at approximately the same packet rate regardless
> of whether we use netem, or netem with the trace extension. But if you
> have contrary experience I'm happy to add a parameter which defines the
> number of buffers.

I have no numbers, so if you think it works, then that is fine with me.
If you actually run out of the trace buffers, do you just continue to run
with the last settings? If so, that would keep up throughput even if you
are out of trace buffers...

What rates do you see, btw? (pps, bps).

Thanks,
Ben

--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com
Re: [PATCH 0/2] netem: trace enhancement
> If you actually run out of the trace buffers, do you just continue to
> run with the last settings? If so, that would keep up throughput even
> if you are out of trace buffers...

Upon configuring the qdisc you can specify a default value, which is
taken when the buffers are empty: either drop the packet or just forward
it with no delay.

> What rates do you see, btw? (pps, bps).

My machine was an AMD Athlon 2083MHz, with a default installation of
Debian with kernel 2.6.16 and HZ set to 1000. Up to 80'000 pps (with
small udp packets) everything (without netem, with netem, and with netem
trace) worked fine (tested with up to 10ms delay). At 90'000 pps the
kernel dropped some packets even with no netem running, some more with
netem, and almost all with netem trace.

As soon as I have changed the mechanism for the data transfer to
rtnetlink I'll do some new tests, trying to reach a higher packet rate.
Then I'll see whether it is necessary to add more buffers, or whether the
system collapses first.

Thanks again!
Ariane
Re: [PATCH 0/2] netem: trace enhancement
Patrick McHardy wrote:
> Ariane Keller wrote:
>> Thanks for your comments! I'd like to better understand your dislike
>> of the current implementation of the data transfer from user space to
>> kernel space. Is it the fact that we use configfs? I think we already
>> had a discussion about this (and we changed from procfs to configfs).
>> Or don't you like that we need a user space daemon which is
>> responsible for feeding the data to the kernel module? I think we do
>> not have another option, since the trace file may be of arbitrary
>> length. Or anything else?
>
> I dislike using anything besides rtnetlink for qdisc configuration. The
> only way to transfer arbitrary amounts of data over netlink would be to
> spread the data over multiple messages. But then again, you're using
> kmalloc and only seem to allocate 4k, so how large are these traces in
> practice?

For each packet to be processed there are 32 bits of data, which encode
delay, drop, duplication etc. The actual trace file can therefore reach
any length, depending on for how many packets the information is encoded
(up to several GB). Therefore we send the trace file in chunks of 4000
bytes to the kernel.

In order to always have a packet-delay value ready, we maintain two delay
queues in the kernel (each of 4k). In a first step both queues are
filled, and the values are read from the first queue; when this queue is
finished, we read values from the second queue and fill the first queue
with new values from the trace file, etc. Therefore we have a user space
process running which reads the values from the trace file and sends them
to the kernel.
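The user-space side of the scheme just described is a chunked read loop
over the trace file. The sketch below models it against an in-memory
trace to stay self-contained; fill_chunk is a made-up name, and the real
code (flowseed.c) reads from a file descriptor and wraps with
lseek(fd, 0, SEEK_SET) instead of resetting an offset:

```c
#include <stddef.h>
#include <string.h>

#define DATA_PACKAGE 4000     /* chunk size sent to the kernel */

/* Copy the next 4000-byte chunk of 32-bit per-packet values out of the
 * trace. *off tracks the read position; *loops is how many passes over
 * the trace are still allowed. Returns 0 on success and -1 once the
 * trace is exhausted (no passes left). */
static int fill_chunk(const unsigned char *trace, size_t trace_len,
                      size_t *off, int *loops, unsigned char *chunk)
{
    size_t done = 0;

    while (done < DATA_PACKAGE) {
        if (*off == trace_len) {      /* end of the trace reached */
            if (*loops <= 1)
                return -1;            /* no passes left: trace exhausted */
            (*loops)--;
            *off = 0;                 /* wrap to the start of the trace */
        }
        size_t avail = trace_len - *off;
        size_t want = DATA_PACKAGE - done;
        size_t take = avail < want ? avail : want;

        memcpy(chunk + done, trace + *off, take);
        done += take;
        *off += take;
    }
    return 0;
}
```

Each successful fill_chunk corresponds to one rtnetlink message refilling
an empty kernel-side queue; when it returns -1 the daemon would tell the
kernel that no more data is coming, so the configured default action
takes over.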
Re: [PATCH 0/2] netem: trace enhancement
Ariane Keller wrote:
> Patrick McHardy wrote:
>> I dislike using anything besides rtnetlink for qdisc configuration.
>> The only way to transfer arbitrary amounts of data over netlink would
>> be to spread the data over multiple messages. But then again, you're
>> using kmalloc and only seem to allocate 4k, so how large are these
>> traces in practice?
>
> For each packet to be processed there are 32 bits of data, which encode
> delay, drop, duplication etc. The actual trace file can therefore reach
> any length, depending on for how many packets the information is
> encoded (up to several GB). Therefore we send the trace file in chunks
> of 4000 bytes to the kernel.
>
> In order to always have a packet-delay value ready, we maintain two
> delay queues in the kernel (each of 4k). In a first step both queues
> are filled, and the values are read from the first queue; when this
> queue is finished, we read values from the second queue and fill the
> first queue with new values from the trace file, etc. Therefore we have
> a user space process running which reads the values from the trace file
> and sends them to the kernel.

That sounds like it would also be possible using rtnetlink. You could
send out a notification whenever you switch the active buffer and have
userspace listen to these and replace the inactive one.
Re: [PATCH 0/2] netem: trace enhancement
Patrick McHardy wrote:
> That sounds like it would also be possible using rtnetlink. You could
> send out a notification whenever you switch the active buffer and have
> userspace listen to these and replace the inactive one.

Also, I think you will need a larger cache than 4-8k if you are running
higher speeds (100,000 pps, etc), as you probably can't rely on
user-space responding reliably every 10ms (or even less time for faster
speeds.)

Thanks,
Ben

--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com
Re: [PATCH 0/2] netem: trace enhancement
Ariane Keller wrote:
> Thanks for your comments! I'd like to better understand your dislike of
> the current implementation of the data transfer from user space to
> kernel space. Is it the fact that we use configfs? I think we already
> had a discussion about this (and we changed from procfs to configfs).
> Or don't you like that we need a user space daemon which is responsible
> for feeding the data to the kernel module? I think we do not have
> another option, since the trace file may be of arbitrary length. Or
> anything else?

I dislike using anything besides rtnetlink for qdisc configuration. The
only way to transfer arbitrary amounts of data over netlink would be to
spread the data over multiple messages. But then again, you're using
kmalloc and only seem to allocate 4k, so how large are these traces in
practice?
Re: [PATCH 0/2] netem: trace enhancement
Thanks for your comments! I'd like to better understand your dislike of
the current implementation of the data transfer from user space to kernel
space. Is it the fact that we use configfs? I think we already had a
discussion about this (and we changed from procfs to configfs). Or don't
you like that we need a user space daemon which is responsible for
feeding the data to the kernel module? I think we do not have another
option, since the trace file may be of arbitrary length. Or anything
else?

Patrick McHardy wrote:
> Stephen Hemminger wrote:
>> Still interested in this. I got part way through integrating it but
>> had concerns about the API from the application to netem for getting
>> the data. It seemed like there ought to be a better way to do it that
>> could handle large data sets better, but I never really got a good
>> solution worked out (that is why I never said anything).
>
> Would spreading them over multiple netlink messages work? A different,
> slightly ugly possibility would be to simply use copy_from_user;
> netlink is synchronous now (still better than using configfs IMO).
Re: [PATCH 0/2] netem: trace enhancement
Stephen Hemminger wrote:
> Still interested in this. I got part way through integrating it but had
> concerns about the API from the application to netem for getting the
> data. It seemed like there ought to be a better way to do it that could
> handle large data sets better, but I never really got a good solution
> worked out (that is why I never said anything).

Would spreading them over multiple netlink messages work? A different,
slightly ugly possibility would be to simply use copy_from_user; netlink
is synchronous now (still better than using configfs IMO).
Re: [PATCH 0/2] netem: trace enhancement
On Tue, 27 Nov 2007 14:57:26 +0100
Ariane Keller [EMAIL PROTECTED] wrote:

> I just wanted to ask whether there is a general interest in this patch.
> If yes: great, how to proceed? Otherwise: please let me know why.
> Thanks!
>
> Ariane Keller wrote:
>> Hi Stephen
>>
>> Approximately a year ago we discussed an enhancement to netem which we
>> called "trace control for netem". We obtain the values for packet
>> delay, drop, duplication and corruption from a so-called trace file.
>> The trace file may be obtained by monitoring network traffic and thus
>> enables us to emulate real-world network behavior. Traces can either
>> be generated individually (we supply a set of tools to do this) or
>> downloaded from our homepage: http://tcn.hypert.net .
>>
>> Since our last submission on 2006-12-15 we have done some code cleanup
>> and have created two new patches, one against kernel 2.6.23.8 and one
>> against iproute2-2.6.23. To refer to our discussion from last year
>> please have a look at the messages with subject "LARTC: trace control
>> for netem". We are looking forward to any comments, suggestions and
>> instructions to bring the trace enhancement into the kernel and
>> iproute2.
>>
>> Thanks,
>> Ariane

Still interested in this. I got part way through integrating it but had
concerns about the API from the application to netem for getting the
data. It seemed like there ought to be a better way to do it that could
handle large data sets better, but never really got a good solution
worked out (that is why I never said anything).

The 2.6.23.8 patch seems to be unavailable right now.

--
Stephen Hemminger [EMAIL PROTECTED]