Re: [ovs-discuss] Flow miss/Packet order question

2013-10-06 Thread Dmitry Fleytman

On Oct 3, 2013, at 22:41 PM, Jesse Gross  wrote:

> On Thu, Oct 3, 2013 at 6:26 AM, Dmitry Fleytman  wrote:
>> On Oct 3, 2013, at 04:20 AM, Jesse Gross  wrote:
>>> On Wed, Oct 2, 2013 at 4:49 AM, Dmitry Fleytman  wrote:
>>>> 
>>>> On Apr 30, 2012, at 20:15 PM, Ben Pfaff  wrote:
>>>> 
>>>>> I think that your explanation stems from a misunderstanding.  Yes, if
>>>>> an OpenFlow controller uses a reactive model, then it cannot avoid the
>>>>> problem.  However, I think that Joji is raising a different issue, one
>>>>> that is an implementation detail within Open vSwitch and that
>>>>> controllers have no power to avoid.
>>>>> 
>>>>> Let me explain in detail.  When a packet arrives for which there is no
>>>>> kernel flow, the kernel sends it to userspace.  Userspace sends the
>>>>> packet and sets up a kernel flow.  In the meantime, more packets might
>>>>> have arrived and been queued to userspace.  Userspace will send these
>>>>> packets, but any packets that arrive after the kernel flow is set up
>>>>> will be forwarded directly by the kernel before those queued to
>>>>> userspace go out.
>>>>> 
>>>> 
>>>> 
>>>> This is exactly the problem we face while going for certification of the
>>>> KVM paravirtualized network driver for Windows (NetKVM).
>>>> There are a few automated tests that send bursts of packets and expect
>>>> the same packets in the same order on the other side.
>>>> 
>>>> We have POC patches (pretty dirty) that solve the problem (below). The
>>>> idea is simple: when the datapath makes an upcall, it queues packets in
>>>> the kernel until user mode completes processing and installs the new
>>>> flow. It looks like overkill to queue packets per datapath, and queueing
>>>> per vport would be enough, but it was easier to implement this way and
>>>> it proves the concept as well. Still, there is an obvious performance
>>>> and scaling impact, so other ideas are highly welcome.
>>>> 
>>>> What do you think? Should we go for this solution and prepare clean
>>>> patches for submission?
>>> 
>>> I think in order to fully solve the problem you actually need to queue
>>> per flow, rather than per port or per datapath. Otherwise, you end up
>>> serializing to one flow setup at a time, which is probably a bigger
>>> problem in practice than the one this is trying to solve.
>>> 
>> 
>> The problem is that we are talking about moments when there is no flow for
>> a specific packet and user mode is working on its creation. How can we
>> serialise per flow in this case?
> 
> You would have to track pending flows in some form, either through the
> flow table or a new data structure and release packets when a real
> flow is installed.
> 
> I hope that there is a simpler way of solving this problem because it
> would add quite a bit of complexity (not just managing all of these
> queues but also enforcing limits like a socket buffer). However, there
> is no way that serializing at a per-datapath or per-port level is
> going to perform well enough to make that a viable solution.


Ok, got it. Thanks for the idea.
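
Jesse's per-flow tracking idea could look roughly like the sketch below. It
is a userspace toy model with hypothetical names (`pending_flow`,
`flow_miss`, `flow_installed`), not actual OVS datapath code; a real patch
would also need locking and socket-buffer-style limits as Jesse notes above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of per-flow queueing during an upcall: only packets
 * of a flow whose miss is still being processed by userspace are parked,
 * and they are released in FIFO order once the real flow is installed.
 * Other flows are never serialized behind this one. Not OVS code. */

#define MAX_FLOWS 8
#define MAX_QUEUE 32

struct pending_flow {
    int key;              /* stands in for struct sw_flow_key */
    int pkts[MAX_QUEUE];  /* queued packet ids, in arrival order */
    int n;
    int used;
};

static struct pending_flow pending[MAX_FLOWS];
static int wire[64];      /* packets in the order they hit the wire */
static int n_wire;

static struct pending_flow *pending_find(int key)
{
    for (int i = 0; i < MAX_FLOWS; i++)
        if (pending[i].used && pending[i].key == key)
            return &pending[i];
    return NULL;
}

static void forward(int pkt)
{
    wire[n_wire++] = pkt;
}

/* Flow-table miss: the first packet creates a pending entry (and would be
 * sent to userspace); later packets of the same flow are parked behind it.
 * Returns 1 if this packet triggered the upcall. */
int flow_miss(int key, int pkt)
{
    struct pending_flow *pf = pending_find(key);

    if (pf) {
        pf->pkts[pf->n++] = pkt;  /* park behind the in-flight upcall */
        return 0;
    }
    for (int i = 0; i < MAX_FLOWS; i++) {
        if (!pending[i].used) {
            pending[i] = (struct pending_flow){ .key = key, .used = 1 };
            pending[i].pkts[pending[i].n++] = pkt;
            return 1;             /* caller performs the upcall */
        }
    }
    return 0;                     /* table full; a real patch must handle this */
}

/* Userspace installed the real flow: drain the queue in FIFO order and
 * drop the pending entry, so later packets take the fast path. */
void flow_installed(int key)
{
    struct pending_flow *pf = pending_find(key);

    if (!pf)
        return;
    for (int i = 0; i < pf->n; i++)
        forward(pf->pkts[i]);
    pf->used = 0;
    pf->n = 0;
}
```

With this shape, a miss on one flow does not delay packets of other flows
that already have kernel flows; only the missing flow's packets wait.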

___
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss


Re: [ovs-discuss] Flow miss/Packet order question

2013-10-03 Thread Dmitry Fleytman

On Oct 3, 2013, at 04:20 AM, Jesse Gross  wrote:

> On Wed, Oct 2, 2013 at 4:49 AM, Dmitry Fleytman  wrote:
>> 
>> On Apr 30, 2012, at 20:15 PM, Ben Pfaff  wrote:
>> 
>>> I think that your explanation stems from a misunderstanding.  Yes, if
>>> an OpenFlow controller uses a reactive model, then it cannot avoid the
>>> problem.  However, I think that Joji is raising a different issue, one
>>> that is an implementation detail within Open vSwitch and that
>>> controllers have no power to avoid.
>>> 
>>> Let me explain in detail.  When a packet arrives for which there is no
>>> kernel flow, the kernel sends it to userspace.  Userspace sends the
>>> packet and sets up a kernel flow.  In the meantime, more packets might
>>> have arrived and been queued to userspace.  Userspace will send these
>>> packets, but any packets that arrive after the kernel flow is set up
>>> will be forwarded directly by the kernel before those queued to
>>> userspace go out.
>>> 
>> 
>> 
>> This is exactly the problem we face while going for certification of the
>> KVM paravirtualized network driver for Windows (NetKVM).
>> There are a few automated tests that send bursts of packets and expect the
>> same packets in the same order on the other side.
>> 
>> We have POC patches (pretty dirty) that solve the problem (below). The
>> idea is simple: when the datapath makes an upcall, it queues packets in
>> the kernel until user mode completes processing and installs the new flow.
>> It looks like overkill to queue packets per datapath, and queueing per
>> vport would be enough, but it was easier to implement this way and it
>> proves the concept as well. Still, there is an obvious performance and
>> scaling impact, so other ideas are highly welcome.
>> 
>> What do you think? Should we go for this solution and prepare clean
>> patches for submission?
> 
> I think in order to fully solve the problem you actually need to queue
> per flow, rather than per port or per datapath. Otherwise, you end up
> serializing to one flow setup at a time, which is probably a bigger
> problem in practice than the one this is trying to solve.
> 

The problem is that we are talking about moments when there is no flow for a
specific packet and user mode is working on its creation. How can we
serialise per flow in this case?

> It's not entirely clear to me that the general solution is really
> worth the extra complexity for situations beyond the WHQL test so
> there might be a bandaid for that particular problem. Do you know
> exactly what it is doing?

The issue is that there are a few tests that verify packet ordering. They
all do different things, but generally speaking it is synthetic
unidirectional traffic, i.e. a bunch of packets sent without waiting for
responses, which is exactly the problematic case.

Unfortunately, strict packet order is a must for MS certification.



Re: [ovs-discuss] Flow miss/Packet order question

2013-10-02 Thread Dmitry Fleytman

On Apr 30, 2012, at 20:15 PM, Ben Pfaff  wrote:

> I think that your explanation stems from a misunderstanding.  Yes, if
> an OpenFlow controller uses a reactive model, then it cannot avoid the
> problem.  However, I think that Joji is raising a different issue, one
> that is an implementation detail within Open vSwitch and that
> controllers have no power to avoid.
> 
> Let me explain in detail.  When a packet arrives for which there is no
> kernel flow, the kernel sends it to userspace.  Userspace sends the
> packet and sets up a kernel flow.  In the meantime, more packets might
> have arrived and been queued to userspace.  Userspace will send these
> packets, but any packets that arrive after the kernel flow is set up
> will be forwarded directly by the kernel before those queued to
> userspace go out.
> 
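
The reordering Ben describes above can be modeled in a few lines
(illustrative C, not OVS code): three packets miss and sit in the userspace
queue, the flow is then installed, and a fourth packet overtakes them on the
kernel fast path.

```c
#include <assert.h>

/* Toy model of the upcall reordering race.  Packets that miss the kernel
 * flow table are queued to "userspace"; once userspace installs the flow,
 * later packets are forwarded directly by the kernel and overtake the
 * still-queued ones.  All names and numbers are illustrative. */

#define N 16

static int upcall_q[N], n_upcall;   /* packets waiting in userspace */
static int wire[N], n_wire;         /* order in which packets hit the wire */
static int flow_present;            /* kernel flow installed yet? */

static void rx(int pkt)
{
    if (!flow_present)
        upcall_q[n_upcall++] = pkt; /* miss: queued to userspace */
    else
        wire[n_wire++] = pkt;       /* hit: kernel fast path */
}

/* Userspace finishes with its queued packets only after the flow is
 * already installed, so anything received in between jumps the queue. */
static void flush_userspace_queue(void)
{
    for (int i = 0; i < n_upcall; i++)
        wire[n_wire++] = upcall_q[i];
    n_upcall = 0;
}
```

Receiving packets 1, 2, 3 (all misses), then installing the flow, then
receiving packet 4 before the userspace queue is flushed puts 4 on the wire
first: the receiver sees 4, 1, 2, 3 instead of 1, 2, 3, 4.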


This is exactly the problem we face while going for certification of the KVM
paravirtualized network driver for Windows (NetKVM).
There are a few automated tests that send bursts of packets and expect the
same packets in the same order on the other side.

We have POC patches (pretty dirty) that solve the problem (below). The idea
is simple: when the datapath makes an upcall, it queues packets in the
kernel until user mode completes processing and installs the new flow. It
looks like overkill to queue packets per datapath, and queueing per vport
would be enough, but it was easier to implement this way and it proves the
concept as well. Still, there is an obvious performance and scaling impact,
so other ideas are highly welcome.

What do you think? Should we go for this solution and prepare clean patches
for submission?

Regards,
Dmitry Fleytman

=
Patch for kernel part:

---
 datapath.c |  149 +++-
 datapath.h |8 +++
 2 files changed, 155 insertions(+), 2 deletions(-)

diff --git a/datapath.c b/datapath.c
index 22e30ef..19bb7c4 100644
--- a/datapath.c
+++ b/datapath.c
@@ -210,11 +210,27 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
int error;
int key_len;
 
+#ifdef REORDER_HACK
+   unsigned long flags;
+   spin_lock_irqsave(&dp->pending_list_lock, flags);
+
+   if(dp->has_pending_upcalls) {
+   *((struct vport **)&skb->cb) = p;
+   skb_queue_head(&dp->pending_skb_list, skb);
+   spin_unlock_irqrestore(&dp->pending_list_lock, flags);
+   return;
+   }
+
+#endif //REORDER_HACK
+
stats = this_cpu_ptr(dp->stats_percpu);
 
/* Extract flow from 'skb' into 'key'. */
error = ovs_flow_extract(skb, p->port_no, &key, &key_len);
if (unlikely(error)) {
+#ifdef REORDER_HACK
+   spin_unlock_irqrestore(&dp->pending_list_lock, flags);
+#endif //REORDER_HACK
kfree_skb(skb);
return;
}
@@ -224,6 +240,11 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
if (unlikely(!flow)) {
struct dp_upcall_info upcall;
 
+#ifdef REORDER_HACK
+   dp->has_pending_upcalls = 1;
+   spin_unlock_irqrestore(&dp->pending_list_lock, flags);
+#endif //REORDER_HACK
+
upcall.cmd = OVS_PACKET_CMD_MISS;
upcall.key = &key;
upcall.userdata = NULL;
@@ -238,15 +259,77 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
 
stats_counter = &stats->n_hit;
ovs_flow_used(OVS_CB(skb)->flow, skb);
+
+#ifdef REORDER_HACK
+   spin_unlock_irqrestore(&dp->pending_list_lock, flags);
+#endif //REORDER_HACK
+
ovs_execute_actions(dp, skb);
 
 out:
+
/* Update datapath statistics. */
u64_stats_update_begin(&stats->sync);
(*stats_counter)++;
u64_stats_update_end(&stats->sync);
 }
 
+#ifdef REORDER_HACK
+
+static
+int __ovs_dp_process_pending_packet(struct vport *p, struct sk_buff *skb)
+{
+   struct datapath *dp = p->dp;
+   struct sw_flow *flow;
+   struct dp_stats_percpu *stats;
+   struct sw_flow_key key;
+   u64 *stats_counter;
+   int error;
+   int key_len;
+   int res = 1;
+
+   stats = this_cpu_ptr(dp->stats_percpu);
+
+   /* Extract flow from 'skb' into 'key'. */
+   error = ovs_flow_extract(skb, p->port_no, &key, &key_len);
+   if (unlikely(error)) {
+   kfree_skb(skb);
+   return res;
+   }
+
+   /* Look up flow. */
+   flow = ovs_flow_tbl_lookup(rcu_dereference(dp->table), &key, key_len);
+   if (unlikely(!flow)) {
+   struct dp_upcall_info upcall;
+
+   upcall.cmd = OVS_PACKET_CMD_MISS;
+   upcall.key = &key