On Mar 22, 2013, at 6:32 AM, Tmusic wrote:
> I've been trying some things over the last couple days...
>
> The pypy problem was indeed due to some external modules. debug-pox.py does
> not provide much helpful information. Can I suggest adding a
> traceback.print_exc() when an import fails (around line 80 in boot.py, after:
> print("Module not found:", base_name))? In my case it really showed which
> import failed.
This actually should happen. I thought it was there with debug-pox.py, but can
you try running pox.py --verbose and see if it gives a useful stack trace?
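For reference, the kind of change being suggested is roughly this (just a
sketch; try_import is a hypothetical stand-in for the actual import logic
around that point in boot.py):

import traceback

def try_import (name, base_name):
  # Hypothetical stand-in for boot.py's import logic around line ~80
  try:
    __import__(name, level=0)
    return True
  except ImportError:
    traceback.print_exc()   # shows which nested import actually failed
    print("Module not found:", base_name)
    return False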
> For profiling I tried yappi (https://code.google.com/p/yappi/). Not as handy
> as cProfile with RunSnakeRun, but it works with the threading model and
> provides #calls, total time,... per function. It requires some changes in the
> code, but it's possible to create some wrappers and load it as a POX module.
> Let me know if you're interested in the code :)
Sounds interesting. Do you have it in a github fork or anything?
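In case it's useful to others in the meantime, here's the rough shape of such
a wrapper (a sketch only; it assumes yappi's start()/get_func_stats() API and
a hypothetical ext/yappi_profile.py module name):

# ext/yappi_profile.py -- load with something like: ./pox.py yappi_profile <other components>
import atexit
import yappi

def _dump_stats ():
  yappi.stop()
  # Newer yappi exposes get_func_stats(); older versions use
  # yappi.print_stats() instead, so adjust for your version.
  stats = yappi.get_func_stats()
  stats.sort("ttot")
  stats.print_all()

def launch ():
  # builtins=True also profiles C-level builtins; drop it if it's too noisy
  yappi.start(builtins=True)
  atexit.register(_dump_stats)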
> The second issue is the parsing of flow stats. When I'm getting a
> flow_stat_reply from a switch I'm parsing the statistics for each flow
> contained in the reply. It works for up to about 300 flows, but beyond that
> point links start disconnecting again. I tried to split the calculation into
> different parts (don't process them all in one loop, but fire an event for
> each flow that processes only that flow). So far this has had no measurable
> impact. I'm guessing these events are processed right away, which basically
> goes back to the "one big for loop" scenario. Can this be the case?
> Pypy offers an improvement of going up to about 550 flows, but then the same
> issues arise again.
>
> Further, I was looking at the recoco and revent libraries. What I'd like to
> do is submit the "processing events" with a lower priority, so that the
> packet-in events are processed first. I guess this could resolve the problem?
> Are there features in recoco or revent that could help in implementing this?
> When I print the length of the schedule queue (the cycle function in recoco),
> not all fired events seem to be scheduled as separate tasks. Where does the
> processing queue for the events live?
Right, this would be my suggestion: the OpenFlow event handlers are producers
that fill a work queue, and a consumer in the form of a recoco Task tries to
drain it.
I think recoco could make this somewhat simpler than it is with just a little
work, but I've so rarely hit performance problems that I've never fleshed it
out. In theory the recoco.events module might be a nice way to do this, but I
think it's not as general purpose as it should be (and it has been a long time
since I've used it at all). I've thrown together a quick producer/consumer
example using recoco:
https://gist.github.com/MurphyMc/939fccd335fb3920f993
On my machine, run under CPython, the consumer sometimes gets backlogged but
eventually catches up; under PyPy it pretty much stays caught up all the time.
In general, you'll need some application-specific logic for if/when the
consumer gets really backed up (e.g., throw away really old events, don't
yield and just churn through the backlog while stalling the other Tasks,
temporarily raise the consumer Task's priority, etc.). Or, if the problem is
just that your event production is really bursty but not actually more than
you can handle amortized over time, you can ignore it as in the example.
Some of the things you can play with to tune the example (both knobs are
marked in the sketch after this list) are:
1. Adjust the consumer's priority.
2. Adjust the minimum number of items to consume (batch size) in one scheduling
period (the min(10, ...) in run()).
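To make that concrete, here is a minimal sketch of what the consumer side can
look like. This is not the gist, just an illustration of where the two knobs
live; it assumes recoco's Task/Sleep API and uses a plain deque as the work
queue that your OpenFlow handlers (the producers) append to.

from collections import deque
from pox.lib.recoco import Task, Sleep

work_queue = deque()   # producers: flow_stats handlers append entries here

class FlowStatsConsumer (Task):
  def __init__ (self):
    Task.__init__(self)
    self.priority = 1   # knob #1: raise this to favor the consumer

  def run (self):
    while True:
      # knob #2: batch size -- handle at most 10 items per scheduling period
      for _ in range(min(10, len(work_queue))):
        entry = work_queue.popleft()
        # ... process one flow entry here ...
      # An application-specific backlog policy could also go here, e.g.
      # dropping entries that are too old when len(work_queue) gets huge.
      yield Sleep(0.01)   # give other Tasks (discovery, OpenFlow IO) a turn

def launch ():
  FlowStatsConsumer().start()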
In your case, #1 might not do as much as you'd expect, since priority only
matters when another Task -- e.g., the OpenFlow IO Task -- is actually waiting
to run. I'd expect the OpenFlow task to be mostly idle until a flow_stats
reply arrives and you suddenly have a lot of work to do. #2 (or the
equivalent) is probably more useful. You want to set the batch size high
enough that you're not wasting time rescheduling constantly, but low enough
that discovery doesn't get starved.
A lot of the time I get away with a much simpler approximation: the event
handlers just update some state (e.g., counters, a list of expired flows), and
a pending callDelayed tries to process it, then callDelayed()s itself again,
after a shorter delay if there is still work left to do and a longer one if
there isn't.
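As a rough sketch of that pattern (assuming core.callDelayed() and a
hypothetical 'pending' list that your event handlers append to):

from pox.core import core

pending = []   # hypothetical: entries appended by your event handlers

def _process_pending ():
  # Chew through a bounded chunk of the backlog
  for _ in range(min(50, len(pending))):
    entry = pending.pop(0)
    # ... update counters, expire flows, etc. ...
  # Come back quickly if there's still work, otherwise check in again later
  core.callDelayed(0.1 if pending else 1.0, _process_pending)

# Kick it off once, e.g. from your component's launch()
core.callDelayed(1.0, _process_pending)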
Another possibility may just be to adjust discovery's timeouts. There's
nothing magic about the defaults.
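If I remember right, the discovery component's launch() exposes the link
timeout, so you can set it from the command line; check the launch() signature
in pox/openflow/discovery.py for the exact parameter name, but it's roughly:

./pox.py openflow.discovery --link_timeout=30 <your components...>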
> And finally, I noticed strongly varying performance (about 100 flows more or
> less in the reply before it starts crashing) with exactly the same traffic
> patterns. Could this have something to do with the random generator in the
> recoco scheduler's cycle() function?
Doubtful -- you probably don't have any Tasks now that have a priority other
than 1, so the randomization shouldn't kick in. My first guess is that this is
nondeterminism caused by how Python 2.x is switching between the IO thread and
the cooperative thread. If you used the version of recoco from the debugger
branch (which combines these into a single thread), you might find that it
evens out.
-- Murphy