Re: [PATCH 0/2] netem: trace enhancement

2007-12-23 Thread Ariane Keller

I have added the possibility to configure the number
of buffers used to store the trace data for packet delays.
The complete command to start netem with a trace file is:
tc qdisc add dev eth1 root netem trace path/to/trace/file.bin buf 3 loops 1 0

with buf: the number of buffers to be used
loops: how many times to loop through the tracefile
the last argument is optional and specifies whether the default is to 
drop packets or 0-delay them.


The patches are available at:
http://www.tcn.hypert.net/tcn_kernel_2_6_23_confbuf
http://www.tcn.hypert.net/tcn_iproute2_2_6_23_confbuf

I'm looking forward to your comments!
Thanks!
Ariane


Ben Greear wrote:

Ariane Keller wrote:


Yes, for short-term starvation it certainly helps.
But I'm still not convinced that it is really necessary to add more 
buffers, because I'm not sure whether the bottleneck is really the 
loading of data from user space to kernel space.
Some basic tests have shown that the kernel starts losing packets at 
approximately the same packet rate regardless of whether we use netem or 
netem with the trace extension.
But if you have contrary experience I'm happy to add a parameter which 
defines the number of buffers.


I have no numbers, so if you think it works, then that is fine with me.

If you actually run out of the trace buffers, do you just continue to
run with the last settings?  If so, that would keep up throughput
even if you are out of trace buffers...

What rates do you see, btw?  (pps, bps).

Thanks,
Ben



--
Ariane Keller
Communication Systems Research Group, ETH Zurich
Web: http://www.csg.ethz.ch/people/arkeller
Office: ETZ G 60.1, Gloriastrasse 35, 8092 Zurich
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] netem: trace enhancement: kernel

2007-12-23 Thread Ariane Keller

This patch applies to kernel 2.6.23.
It enhances the network emulator netem with the possibility
to read all delay/drop/duplicate etc. values from a trace file.
This trace file contains one value for each packet to be processed.
The values are read from the file in a user space process called
flowseed. These values are sent to the netem module with the help of
rtnetlink sockets.
In the netem module the values are cached in buffers.
The number of buffers is configurable upon start of netem.
If a buffer is empty the netem module sends a rtnetlink notification
to the flowseed process.
Upon receiving such a notification this process sends
the next 1000 values to the netem module.

Signed-off-by: Ariane Keller [EMAIL PROTECTED]

---
diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/include/linux/pkt_sched.h linux-2.6.23.8_mod/include/linux/pkt_sched.h
--- linux-2.6.23.8/include/linux/pkt_sched.h	2007-11-16 19:14:27.000000000 +0100
+++ linux-2.6.23.8_mod/include/linux/pkt_sched.h	2007-12-21 19:42:49.000000000 +0100

@@ -439,6 +439,9 @@ enum
TCA_NETEM_DELAY_DIST,
TCA_NETEM_REORDER,
TCA_NETEM_CORRUPT,
+   TCA_NETEM_TRACE,
+   TCA_NETEM_TRACE_DATA,
+   TCA_NETEM_STATS,
__TCA_NETEM_MAX,
 };

@@ -454,6 +457,26 @@ struct tc_netem_qopt
__u32   jitter; /* random jitter in latency (us) */
 };

+struct tc_netem_stats
+{
+   int packetcount;
+   int packetok;
+   int normaldelay;
+   int drops;
+   int dupl;
+   int corrupt;
+   int novaliddata;
+   int reloadbuffer;
+};
+
+struct tc_netem_trace
+{
+   __u32   fid;     /* flow id */
+   __u32   def;     /* default action: 0 = no delay, 1 = drop */
+   __u32   ticks;   /* number of ticks corresponding to 1ms */
+   __u32   nr_bufs; /* number of buffers to save trace data */
+};
+
 struct tc_netem_corr
 {
__u32   delay_corr; /* delay correlation */
diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/include/net/flowseed.h linux-2.6.23.8_mod/include/net/flowseed.h
--- linux-2.6.23.8/include/net/flowseed.h	1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.23.8_mod/include/net/flowseed.h	2007-12-21 19:43:24.000000000 +0100

@@ -0,0 +1,34 @@
+/* flowseed.h header file for the netem trace enhancement
+ */
+
+#ifndef _FLOWSEED_H
+#define _FLOWSEED_H
+#include <net/sch_generic.h>
+
+/* must be divisible by 4 (= #pkts) */
+#define DATA_PACKAGE 4000
+#define DATA_PACKAGE_ID 4008
+
+/* struct per flow - kernel */
+struct tcn_control
+{
+	struct list_head full_buffer_list;
+	struct list_head empty_buffer_list;
+	struct buflist *buffer_in_use;
+	int *offsetpos;   /* pointer to actual pos in the buffer in use */
+	int flowid;
+};
+
+struct tcn_statistic
+{
+	int packetcount;
+	int packetok;
+	int normaldelay;
+	int drops;
+	int dupl;
+	int corrupt;
+	int novaliddata;
+	int reloadbuffer;
+};
+
+#endif
diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/include/net/pkt_sched.h linux-2.6.23.8_mod/include/net/pkt_sched.h
--- linux-2.6.23.8/include/net/pkt_sched.h	2007-11-16 19:14:27.000000000 +0100
+++ linux-2.6.23.8_mod/include/net/pkt_sched.h	2007-12-21 19:42:49.000000000 +0100

@@ -72,6 +72,9 @@ extern void qdisc_watchdog_cancel(struct
 extern struct Qdisc_ops pfifo_qdisc_ops;
 extern struct Qdisc_ops bfifo_qdisc_ops;

+extern int qdisc_notify_pid(int pid, struct nlmsghdr *n, u32 clid,
+   struct Qdisc *old, struct Qdisc *new);
+
 extern int register_qdisc(struct Qdisc_ops *qops);
 extern int unregister_qdisc(struct Qdisc_ops *qops);
 extern struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/net/core/rtnetlink.c linux-2.6.23.8_mod/net/core/rtnetlink.c
--- linux-2.6.23.8/net/core/rtnetlink.c	2007-11-16 19:14:27.000000000 +0100
+++ linux-2.6.23.8_mod/net/core/rtnetlink.c	2007-12-21 19:42:49.000000000 +0100

@@ -460,7 +460,7 @@ int rtnetlink_send(struct sk_buff *skb,
 	NETLINK_CB(skb).dst_group = group;
 	if (echo)
 		atomic_inc(&skb->users);
-	netlink_broadcast(rtnl, skb, pid, group, GFP_KERNEL);
+	netlink_broadcast(rtnl, skb, pid, group, gfp_any());
 	if (echo)
 		err = netlink_unicast(rtnl, skb, pid, MSG_DONTWAIT);
 	return err;
diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/net/sched/sch_api.c linux-2.6.23.8_mod/net/sched/sch_api.c
--- linux-2.6.23.8/net/sched/sch_api.c	2007-11-16 19:14:27.000000000 +0100
+++ linux-2.6.23.8_mod/net/sched/sch_api.c	2007-12-21 19:42:49.000000000 +0100

@@ -28,6 +28,7 @@
 #include <linux/list.h>
 #include <linux/hrtimer.h>
 
+#include <net/sock.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>

@@ -841,6 +842,62 @@ rtattr_failure:
nlmsg_trim(skb, b);
return -1;
 }

Re: [PATCH 0/2] netem: trace enhancement: iproute

2007-12-23 Thread Ariane Keller

The iproute patch is too big to send to the mailing list,
since the distribution data files have changed directories.
For ease of discussion I include the important changes in this mail.

Signed-off-by: Ariane Keller [EMAIL PROTECTED]

---

diff -uprN iproute2-2.6.23/netem/trace/flowseed.c iproute2-2.6.23_buf/netem/trace/flowseed.c
--- iproute2-2.6.23/netem/trace/flowseed.c	1970-01-01 01:00:00.000000000 +0100
+++ iproute2-2.6.23_buf/netem/trace/flowseed.c	2007-12-12 08:43:01.000000000 +0100

@@ -0,0 +1,209 @@
+/* flowseed.c	flowseed process to deliver values for packet delay,
+ *		duplication, loss and corruption from userspace to netem
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Ariane Keller [EMAIL PROTECTED] ETH Zurich
+ *		Rainer Baumann [EMAIL PROTECTED] ETH Zurich
+ */
+
+#include <ctype.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <sys/ipc.h>
+#include <sys/sem.h>
+#include <signal.h>
+
+#include "utils.h"
+#include <linux/pkt_sched.h>
+
+#define DATA_PACKAGE 4000
+#define DATA_PACKAGE_ID (DATA_PACKAGE + sizeof(unsigned int) + sizeof(int))
+#define TCA_BUF_MAX	(64*1024)
+/* maximal amount of parallel flows */
+struct rtnl_handle rth;
+unsigned int loop;
+int infinity = 0;
+int fdflowseed;
+char *sendpkg;
+int fid;
+int initialized = 0;
+int semid;
+int moreData = 1, r = 0, rold = 0;
+FILE *file;
+
+
+int printfct(const struct sockaddr_nl *who,
+	     struct nlmsghdr *n,
+	     void *arg)
+{
+	struct {
+		struct nlmsghdr	n;
+		struct tcmsg	t;
+		char		buf[TCA_BUF_MAX];
+	} req;
+	struct tcmsg *t = NLMSG_DATA(n);
+	struct rtattr *tail = NULL;
+	struct tc_netem_qopt opt;
+	memset(&opt, 0, sizeof(opt));
+
+	if (n->nlmsg_type == RTM_DELQDISC) {
+		goto outerr;
+	}
+	else if (n->nlmsg_type == RTM_NEWQDISC) {
+		initialized = 1;
+
+		memset(&req, 0, sizeof(req));
+		req.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
+		req.n.nlmsg_flags = NLM_F_REQUEST;
+		req.n.nlmsg_type = RTM_NEWQDISC;
+		req.t.tcm_family = AF_UNSPEC;
+		req.t.tcm_handle = t->tcm_handle;
+		req.t.tcm_parent = t->tcm_parent;
+		req.t.tcm_ifindex = t->tcm_ifindex;
+
+		tail = NLMSG_TAIL(&req.n);
+again:
+		if (loop <= 0 && !infinity) {
+			goto out;
+		}
+		if ((r = read(fdflowseed, sendpkg + rold, DATA_PACKAGE - rold)) >= 0) {
+			if (r + rold < DATA_PACKAGE) {
+				/* Tail of input file reached,
+				   set rest at start from next iteration */
+				rold = r;
+				fprintf(file, "flowseed: at end of file.\n");
+
+				if (lseek(fdflowseed, 0L, SEEK_SET) < 0) {
+					perror("lseek reset");
+					goto out;
+				}
+				goto again;
+			}
+			r = 0;
+			rold = 0;
+			memcpy(sendpkg + DATA_PACKAGE, &fid, sizeof(int));
+			memcpy(sendpkg + DATA_PACKAGE + sizeof(int), &moreData, sizeof(int));
+
+			/* opt has to be added for each netem request */
+			if (addattr_l(&req.n, TCA_BUF_MAX, TCA_OPTIONS, &opt, sizeof(opt)) < 0) {
+				perror("add options");
+				return -1;
+			}
+
+			if (addattr_l(&req.n, TCA_BUF_MAX, TCA_NETEM_TRACE_DATA, sendpkg, DATA_PACKAGE_ID) < 0) {
+				perror("add data");
+				return -1;
+			}
+
+			tail->rta_len = (void *)NLMSG_TAIL(&req.n) - (void *)tail;
+
+			if (rtnl_send(&rth, (char *)&req, req.n.nlmsg_len) < 0) {
+				perror("send data");
+				return -1;
+			}
+			return 0;
+		}
+	}
+/* no more data, what to do? we send a notification to the kernel module */
+out:
+	fprintf(stderr, "flowseed: Tail of input file reached. Exit.\n");
+	fprintf(file, "flowseed: Tail of input file reached. Exit.\n");
+	moreData = 0;
+	memcpy(sendpkg + DATA_PACKAGE, &fid, sizeof(int));
+

Re: [PATCH 0/2] netem: trace enhancement

2007-12-12 Thread Stephen Hemminger
On Mon, 10 Dec 2007 15:32:14 +0100
Ariane Keller [EMAIL PROTECTED] wrote:

 I finally managed to rewrite the netem trace extension to use rtnetlink 
 communication for the data transfer from user space to kernel space.
 
 The kernel patch is available here:
 http://www.tcn.hypert.net/tcn_kernel_2_6_23_rtnetlink
 
 and the iproute patch is here:
 http://www.tcn.hypert.net/tcn_iproute2_2_6_23_rtnetlink
 
 Whenever new data is needed the kernel module sends a notification to 
 the user space process. Thereupon the user space process sends a data 
 package to the kernel module.

I wonder if it wouldn't be possible to enhance/extend netlink
to use sendfile/splice to get the data.  It is rather more work than
needed for just this, but it would be useful for large configurations.

 I had to write a new qdisc_notify function (qdisc_notify_pid) since the 
 other was acquiring a lock, which we already hold in this situation.



 I hope everything works as expected and I'm looking forward to your 
 comments.
 
 Thanks!
 Ariane


-- 
Stephen Hemminger [EMAIL PROTECTED]


Re: [PATCH 0/2] netem: trace enhancement

2007-12-10 Thread Ariane Keller
I finally managed to rewrite the netem trace extension to use rtnetlink 
communication for the data transfer from user space to kernel space.


The kernel patch is available here:
http://www.tcn.hypert.net/tcn_kernel_2_6_23_rtnetlink

and the iproute patch is here:
http://www.tcn.hypert.net/tcn_iproute2_2_6_23_rtnetlink

Whenever new data is needed the kernel module sends a notification to 
the user space process. Thereupon the user space process sends a data 
package to the kernel module.
I had to write a new qdisc_notify function (qdisc_notify_pid) since the 
other was acquiring a lock, which we already hold in this situation.


I hope everything works as expected and I'm looking forward to your 
comments.


Thanks!
Ariane


Re: [PATCH 0/2] netem: trace enhancement

2007-12-05 Thread Ariane Keller

Thanks for your comments!

Patrick McHardy wrote:

Ariane Keller wrote:



That sounds like it would also be possible using rtnetlink. You could
send out a notification whenever you switch the active buffer and have
userspace listen to these and replace the inactive one.


I guess using rtnetlink is possible. However I'm not sure about how to 
implement it:
The first thought was to use RTM_NEWQDISC to send the data to the 
netem_change() function (similar to tc_qdisc_modify() ).


That sounds reasonable.

But with this we would need the tcm_handle, tcm_parent arguments etc. 
which are not known in q_netem.c
Therefore we would have to change the parse_qopt() function prototype 
in order to pass the whole req and not only the nlmsghdr.


I assume you mean netem_init, parse_qopt is userspace. But I don't
see how that is related, emptying the buffer happens during packet
processing, right?

Actually I meant parse_qopt from user space.
If we would change that function prototype we would have the whole 
message header available in netem_parse_opt() and could pass this to the 
process which is responsible for sending the data to the kernel. This 
process would use this header every time it has to send new values to 
the netem_change() function in the kernel module.


I thought about this because I was not aware of the qdisc_notify function.
Anyway I've got some trouble with calling qdisc_notify:
1. I have to do an EXPORT_SYMBOL(qdisc_notify) (currently it is declared 
static in sch_api.c)
2. I'd like to call it from netem_enqueue(), which leads to a sleeping 
function called from invalid context, since we are still in interrupt 
context. Therefore I think I have to put it in a workqueue.


I hope, this is ok.




I guess I would simply change the qdisc_notify function to not
require a struct nlmsghdr * (simply pass nlmsg_seq directly) and
use that to send notifications. The netem dump function would
add the buffer state. BTW, the parent class id is available in
sch-parent, the handle in sch-handle, but qdisc_notify should
take care of everything you need.




Re: [PATCH 0/2] netem: trace enhancement

2007-12-05 Thread Patrick McHardy

Ariane Keller wrote:

Thanks for your comments!

Patrick McHardy wrote:

But with this we would need the tcm_handle, tcm_parent arguments etc. 
which are not known in q_netem.c
Therefore we would have to change the parse_qopt() function prototype 
in order to pass the whole req and not only the nlmsghdr.


I assume you mean netem_init, parse_qopt is userspace. But I don't
see how that is related, emptying the buffer happens during packet
processing, right?



Actually I meant parse_qopt from user space.
If we would change that function prototype we would have the whole 
message header available in netem_parse_opt() and could pass this to the 
process which is responsible for sending the data to the kernel. This 
process would use this header every time it has to send new values to 
the netem_change() function in the kernel module.



You don't actually want to parse tc output in your program?
Just open a netlink socket and do the necessary processing
yourself, libnl makes this really easy.


I thought about this because I was not aware of the qdisc_notify function.
Anyway I've got some trouble with calling qdisc_notify:
1. I have to do an EXPORT_SYMBOL(qdisc_notify) (currently it is declared 
static in sch_api.c)


This is fine.

2. I'd like to call it from netem_enqueue(), which leads to a sleeping 
function called from invalid context, since we are still in interrupt 
context. Therefore I think I have to put it in a workqueue.


Just change it to use gfp_any().


Re: [PATCH 0/2] netem: trace enhancement

2007-12-04 Thread Ariane Keller



That sounds like it would also be possible using rtnetlink. You could
send out a notification whenever you switch the active buffer and have
userspace listen to these and replace the inactive one.


I guess using rtnetlink is possible. However I'm not sure about how to 
implement it:
The first thought was to use RTM_NEWQDISC to send the data to the 
netem_change() function (similar to tc_qdisc_modify() ). But with this 
we would need the tcm_handle, tcm_parent arguments etc. which are not 
known in q_netem.c
Therefore we would have to change the parse_qopt() function prototype 
in order to pass the whole req and not only the nlmsghdr.


The second possibility would be to add a new message type, e.g 
RTM_NETEMDATA. This type would be registered in the netem kernel module 
with a callback function netem_recv_data(). If this function receives 
some data it searches for the correct flow, and saves the data in the 
corresponding buffer.


However, I'm not convinced of any of these options. Do you have an 
alternative suggestion?



Also, I think you will need a larger cache than 4-8k if you are 
running higher speeds (100,000 pps, etc), as you probably can't rely on 
user-space responding reliably every 10ms (or even less time for faster 
speeds.)


Increasing the cache size to say 32k for each buffer would be no problem.
Is this enough?





Re: [PATCH 0/2] netem: trace enhancement

2007-12-04 Thread Patrick McHardy

Ariane Keller wrote:



That sounds like it would also be possible using rtnetlink. You could
send out a notification whenever you switch the active buffer and have
userspace listen to these and replace the inactive one.


I guess using rtnetlink is possible. However I'm not sure about how to 
implement it:
The first thought was to use RTM_NEWQDISC to send the data to the 
netem_change() function (similar to tc_qdisc_modify() ).


That sounds reasonable.

But with this 
we would need the tcm_handle, tcm_parent arguments etc. which are not 
known in q_netem.c
Therefore we would have to change the parse_qopt() function prototype in 
order to pass the whole req and not only the nlmsghdr.


I assume you mean netem_init, parse_qopt is userspace. But I don't
see how that is related, emptying the buffer happens during packet
processing, right?

I guess I would simply change the qdisc_notify function to not
require a struct nlmsghdr * (simply pass nlmsg_seq directly) and
use that to send notifications. The netem dump function would
add the buffer state. BTW, the parent class id is available in
sch-parent, the handle in sch-handle, but qdisc_notify should
take care of everything you need.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] netem: trace enhancement

2007-12-04 Thread Ben Greear

Ariane Keller wrote:


Increasing the cache size to say 32k for each buffer would be no problem.
Is this enough?
Maybe just a variable-length list of 4k buffers chained together?  It's 
usually easier to get 4k chunks of memory than 32k chunks, especially 
under high network load, and if you go ahead and make it arbitrary 
length, then each user can determine how many they want to have 
queued...

Thanks,
Ben


--
Ben Greear [EMAIL PROTECTED] 
Candela Technologies Inc  http://www.candelatech.com





Re: [PATCH 0/2] netem: trace enhancement

2007-12-04 Thread Ariane Keller

I thought about that as well, but in my opinion this does not help much.
It's the same as before: on average every 10ms a new buffer needs to be 
filled.



Ben Greear wrote:

Ariane Keller wrote:


Increasing the cache size to say 32k for each buffer would be no problem.
Is this enough?
Maybe just a variable-length list of 4k buffers chained together?  It's 
usually easier to get 4k chunks of memory than 32k chunks, especially 
under high network load, and if you go ahead and make it arbitrary 
length, then each user can determine how many they want to have 
queued...

Thanks,
Ben





Re: [PATCH 0/2] netem: trace enhancement

2007-12-04 Thread Ben Greear

Ariane Keller wrote:

I thought about that as well, but in my opinion this does not help much.
It's the same as before: on average every 10ms a new buffer needs to 
be filled.
But, you can fill 50 or 100 at a time, so if user-space is delayed for 
a few ms, the kernel still has plenty of buffers to work with until 
user-space gets another chance.

I'm not worried about average throughput of user-space to kernel, just random
short-term starvation.

Thanks,
Ben

--
Ben Greear [EMAIL PROTECTED] 
Candela Technologies Inc  http://www.candelatech.com





Re: [PATCH 0/2] netem: trace enhancement

2007-12-04 Thread Ariane Keller



Ben Greear wrote:

Ariane Keller wrote:

I thought about that as well, but in my opinion this does not help much.
It's the same as before: on average every 10ms a new buffer needs to 
be filled.
But, you can fill 50 or 100 at a time, so if user-space is delayed for 
a few ms, the kernel still has plenty of buffers to work with until 
user-space gets another chance.
I'm not worried about average throughput of user-space to kernel, just 
random

short-term starvation.


Yes, for short-term starvation it certainly helps.
But I'm still not convinced that it is really necessary to add more 
buffers, because I'm not sure whether the bottleneck is really the 
loading of data from user space to kernel space.
Some basic tests have shown that the kernel starts losing packets at 
approximately the same packet rate regardless of whether we use netem or 
netem with the trace extension.
But if you have contrary experience I'm happy to add a parameter which 
defines the number of buffers.


Thanks!


Re: [PATCH 0/2] netem: trace enhancement

2007-12-04 Thread Ben Greear

Ariane Keller wrote:


Yes, for short-term starvation it certainly helps.
But I'm still not convinced that it is really necessary to add more 
buffers, because I'm not sure whether the bottleneck is really the 
loading of data from user space to kernel space.
Some basic tests have shown that the kernel starts losing packets at 
approximately the same packet rate regardless of whether we use netem or 
netem with the trace extension.
But if you have contrary experience I'm happy to add a parameter which 
defines the number of buffers.


I have no numbers, so if you think it works, then that is fine with me.

If you actually run out of the trace buffers, do you just continue to
run with the last settings?  If so, that would keep up throughput
even if you are out of trace buffers...

What rates do you see, btw?  (pps, bps).

Thanks,
Ben

--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH 0/2] netem: trace enhancement

2007-12-04 Thread Ariane Keller



If you actually run out of the trace buffers, do you just continue to
run with the last settings?  If so, that would keep up throughput
even if you are out of trace buffers...


Upon configuring the qdisc you can specify a default action, which is 
taken when the buffers are empty: either drop the packet or just 
forward it with no delay.



What rates do you see, btw?  (pps, bps).
My machine was an AMD Athlon 2083MHz, with a default installation of 
Debian with kernel 2.6.16 and HZ set to 1000.
Up to 80'000 pps (with small UDP packets) everything (without netem, 
with netem, and with netem trace) worked fine (tested with up to 10ms delay).
At 90'000 pps the kernel dropped some packets even with no netem 
running, some more with netem, and almost all with netem trace.


As soon as I have changed the mechanism for the data transfer to 
rtnetlink I'll do some new tests, trying to reach a higher packet rate. 
Then I'll see whether it is necessary to add more buffers, or whether 
the system collapses before.


Thanks again!
Ariane




Re: [PATCH 0/2] netem: trace enhancement

2007-12-03 Thread Ariane Keller



Patrick McHardy wrote:

Ariane Keller wrote:

Thanks for your comments!

I'd like to better understand your dislike of the current 
implementation  of the data transfer from user space to kernel space.

Is it the fact that we use configfs?
I think, we had already a discussion about this (and we changed from 
procfs to configfs).
Or don't you like that we need a user space daemon which is 
responsible for feeding the data to the kernel module?
I think we do not have another option, since the trace file may be of 
arbitrary length.

Or anything else?



I dislike using anything besides rtnetlink for qdisc configuration.
The only way to transfer arbitrary amounts of data over netlink would
be to spread the data over multiple messages. But then again, you're
using kmalloc and only seem to allocate 4k, so how large are these
traces in practice?


For each packet to be processed there are 32 bits of data, which encode 
delay, drop, duplication, etc. The size of the actual trace file can 
therefore reach any length, depending on for how many packets the 
information is encoded (up to several GB).
Therefore we send the trace file to the kernel in chunks of 4000 bytes. 
In order to always have a packet-delay value ready, we maintain two 
delay queues in the kernel (each of 4k). In a first step both queues 
are filled, and the values are read from the first queue; when this 
queue is exhausted, we read values from the second queue and refill the 
first queue with new values from the trace file, and so on. Therefore 
we have a user space process running which reads the values from the 
trace file and sends them to the kernel.




Re: [PATCH 0/2] netem: trace enhancement

2007-12-03 Thread Patrick McHardy

Ariane Keller wrote:

Patrick McHardy wrote:


I dislike using anything besides rtnetlink for qdisc configuration.
The only way to transfer arbitrary amounts of data over netlink would
be to spread the data over multiple messages. But then again, you're
using kmalloc and only seem to allocate 4k, so how large are these
traces in practice?


For each packet to be processed there are 32 bits of data, which encode 
delay, drop, duplication, etc. The size of the actual trace file can 
therefore reach any length, depending on for how many packets the 
information is encoded (up to several GB).
Therefore we send the trace file to the kernel in chunks of 4000 bytes. 
In order to always have a packet-delay value ready, we maintain two 
delay queues in the kernel (each of 4k). In a first step both queues 
are filled, and the values are read from the first queue; when this 
queue is exhausted, we read values from the second queue and refill the 
first queue with new values from the trace file, and so on. Therefore 
we have a user space process running which reads the values from the 
trace file and sends them to the kernel.



That sounds like it would also be possible using rtnetlink. You could
send out a notification whenever you switch the active buffer and have
userspace listen to these and replace the inactive one.



Re: [PATCH 0/2] netem: trace enhancement

2007-12-03 Thread Ben Greear

Patrick McHardy wrote:


That sounds like it would also be possible using rtnetlink. You could
send out a notification whenever you switch the active buffer and have
userspace listen to these and replace the inactive one.


Also, I think you will need a larger cache than 4-8k if you are running 
higher speeds (100,000 pps, etc), as you probably can't rely on 
user-space responding reliably every 10ms (or even less time for faster 
speeds.)

Thanks,
Ben






--
Ben Greear [EMAIL PROTECTED] 
Candela Technologies Inc  http://www.candelatech.com





Re: [PATCH 0/2] netem: trace enhancement

2007-12-02 Thread Patrick McHardy

Ariane Keller wrote:

Thanks for your comments!

I'd like to better understand your dislike of the current implementation 
 of the data transfer from user space to kernel space.

Is it the fact that we use configfs?
I think, we had already a discussion about this (and we changed from 
procfs to configfs).
Or don't you like that we need a user space daemon which is responsible 
for feeding the data to the kernel module?
I think we do not have another option, since the trace file may be of 
arbitrary length.

Or anything else?



I dislike using anything besides rtnetlink for qdisc configuration.
The only way to transfer arbitrary amounts of data over netlink would
be to spread the data over multiple messages. But then again, you're
using kmalloc and only seem to allocate 4k, so how large are these
traces in practice?



Re: [PATCH 0/2] netem: trace enhancement

2007-11-30 Thread Ariane Keller

Thanks for your comments!

I'd like to better understand your dislike of the current implementation 
 of the data transfer from user space to kernel space.

Is it the fact that we use configfs?
I think, we had already a discussion about this (and we changed from 
procfs to configfs).
Or don't you like that we need a user space daemon which is responsible 
for feeding the data to the kernel module?
I think we do not have another option, since the trace file may be of 
arbitrary length.

Or anything else?




Patrick McHardy wrote:

Stephen Hemminger wrote:

Still interested in this. I got part way through integrating it but had
concerns about the API from the application to netem for getting the 
data.
It seemed like there ought to be a better way to do it that could 
handle large
data sets better, but never really got a good solution worked out 
(that is why

I never said anything).


Would spreading them over multiple netlink messages work? A different,
slightly ugly possibility would be to simply use copy_from_user, netlink
is synchronous now (still better than using configfs IMO).




Re: [PATCH 0/2] netem: trace enhancement

2007-11-29 Thread Patrick McHardy

Stephen Hemminger wrote:

Still interested in this. I got part way through integrating it but had
concerns about the API from the application to netem for getting the data.
It seemed like there ought to be a better way to do it that could handle large
data sets better, but never really got a good solution worked out (that is why
I never said anything).


Would spreading them over multiple netlink messages work? A different,
slightly ugly possibility would be to simply use copy_from_user, netlink
is synchronous now (still better than using configfs IMO).




Re: [PATCH 0/2] netem: trace enhancement

2007-11-29 Thread Stephen Hemminger
On Tue, 27 Nov 2007 14:57:26 +0100
Ariane Keller [EMAIL PROTECTED] wrote:

 I just wanted to ask whether there is a general interest in this patch.
 If yes: great, how to proceed?
 otherwise: please let me know why.
 
 Thanks!
 
 
 
 
 Ariane Keller wrote:
  Hi Stephen
  
  Approximately a year ago we discussed an enhancement to netem,
  which we called trace control for netem.
  
  We obtain the value for the packet delay, drop, duplication and 
  corruption from a so called trace file. The trace file may be obtained 
  by monitoring network traffic and thus enables us to emulate real 
  world network behavior.
  
  Traces can either be generated individually (we supply a set of tools to 
  do this) or can be downloaded from our homepage: http://tcn.hypert.net .
  
  Since our last submission on 2006-12-15 we did some code clean up and 
  have created two new patches one against kernel 2.6.23.8 and one against 
  iproute2-2.6.23.
  To refer to our discussion from last year please have a look at messages 
  with subject LARTC: trace control for netem.
  
  We are looking forward to any comments, suggestions and instructions to 
  bring the trace enhancement to the kernel and to iproute2.
  
  Thanks,
  Ariane

Still interested in this. I got part way through integrating it but had
concerns about the API from the application to netem for getting the data.
It seemed like there ought to be a better way to do it that could handle large
data sets better, but never really got a good solution worked out (that is why
I never said anything).

The 2.6.23.8 patch seems to be unavailable right now.
-- 
Stephen Hemminger [EMAIL PROTECTED]