Hi Holger,

On Wed, Jan 10, 2018 at 12:24:47AM +0000, Holger Freyther wrote:

> the lua binding code was added to be able to automate OpenBSC tests.
> In theory we should be able to do this for SMS and UpdateLocation
> (call handling with MNCC exposing is left as a todo) but in practice
> we miss a piece of software to coordinate this and run the test. We
> miss it because it is an interesting problem but also I lost time on
> switching countries, learning new tricks at a project...
Sure, I understand. However, it is definitely a part that we're very much
looking forward to having :)

> The basic testing structure looks easy as well. We want to define the
> number of concurrent subscribers (0, 10, 100, 1000, n) and to make it
> simple a single test (UL, send SMS, t) and execute the same test for
> each subscriber and call it a success if y% of tests succeed within
> time T. The way to measure this is easy as well. The lua script would
> print some data (e.g. the name of the ms) when it starts and
> completes.

One might also think of a more structured format to return the data, but
that could always be added later. One could e.g. print an XML or JSON
snippet that's easier to parse/consume by whoever processes it.

What I also believe is very important is some kind of rate limiting /
staggering when starting up. We know a single-BTS setup will for sure
fail lots of LUs if you start 1k MS at the same time. So there should be
some kind of provision to say something like "start 1000 MS at a rate of
10 per second". I wouldn't go for more elaborate schemes, but simply a
single linear rate/slope.

> I am not sure if I should spawn, configure, add subscribers, a flavor
> of Osmocom cellular? I look into having some set of templates for the
> config, the stack to launch and in concept it looks awfully similar to
> something the GSM tester is doing. Shall we leave virtbts/cellular to
> the Osmocom tester and just focus on coordinating mobile? My feeling
> is to leave this to the Osmo GSM tester.

Yes, I think it's ok to focus on the "tester" side and not on the IUT
(implementation under test) side. So we assume that the user will somehow
bring up the [virtual] cellular network before executing the load test.
One preferred way of doing this is - I agree - by reusing those parts
from osmo-gsm-tester.

> If we have n subscribers I would launch m copies of "mobile" (but run
> multiple MS in a single binary).
I would argue the number of MS per 'mobile' should be configurable from 1-N.

> So with 4 MS per mobile process and 10k subs we would end with 2.5k
> processes + many log messages coming from each.

The question is how many of those log messages we need/want. In order to
avoid the risk of 'mobile' blocking on writing to stdout/stderr, I think
it would be best not to pipe that into other processes but to write to
files (could even be tmpfs!) and process the files after the run?

> Would that scale with python? Should we look into doing this one in
> Go?
> Or can some of GSM tester be used (the template part)?

I'm not sufficiently familiar with osmo-gsm-tester to say if we can use
it. On an abstract level, I would think the "defining resources and
generating configuration files" part should be reusable, but then it also
just uses (jinja2?) templates that anyone can use in python. And whether
it's sufficiently scalable to generate thousands of config files, I don't
know either.

> I would probably design this concurrently with Go (besides being the
> first).

I would suggest we not further increase the number of programming
languages one needs to understand. But then, it's "just" a tool for load
testing, so probably not that critical after all.

My naive assumption would be that starting 2.5k processes (and processing
the SIGCHLD) from python should be possible without causing a
performance/scalability problem? As indicated, log file processing could
be handled later, or one could configure stdio logging to be absolutely
minimal (with verbose logs going to files)?
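Writing each child's output to its own file instead of a pipe could look
roughly like the sketch below. The log directory, file naming and the
'mobile' invocation in the comment are made-up placeholders, not the real
tool's interface:

```python
import os
import subprocess

def spawn_logged(cmd, logdir, idx):
    """Spawn one child whose stdout/stderr go straight to a per-process
    log file, so it can never block on a full pipe to the parent."""
    os.makedirs(logdir, exist_ok=True)
    path = os.path.join(logdir, "proc_%04d.log" % idx)
    with open(path, "wb") as logfile:
        # Popen duplicates the descriptor for the child, so the parent
        # can close its copy right after the spawn.
        return subprocess.Popen(cmd, stdout=logfile,
                                stderr=subprocess.STDOUT)

# Hypothetical usage; the log directory could live on a tmpfs:
# procs = [spawn_logged(["mobile", "-c", cfg(i)], "/dev/shm/ms-logs", i)
#          for i in range(2500)]
```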
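As for the linear start-up rate mentioned earlier ("start 1000 MS at a
rate of 10 per second"): that really is just a fixed sleep between
consecutive starts. A minimal sketch, where start_fn stands in for
whatever actually spawns one MS:

```python
import time

def start_at_rate(total, rate_per_sec, start_fn):
    """Start 'total' items at a fixed linear rate by sleeping a constant
    interval between consecutive starts."""
    interval = 1.0 / rate_per_sec
    started = []
    for i in range(total):
        start_fn(i)           # e.g. fork one 'mobile' process here
        started.append(i)
        time.sleep(interval)  # 10/s -> one start every 100 ms
    return started

# "start 1000 MS at a rate of 10 per second" would then be:
# start_at_rate(1000, 10, launch_mobile)
# where launch_mobile is the (hypothetical) per-MS spawn function.
```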
My attached test program (not using python 'subprocess' as I couldn't
find a way to make it do a non-blocking wait for the child to terminate)
runs perfectly fine here. Even without any rate limiting I get the
following on my laptop:

$ time ./subproc.py
2018-01-10 12:44:14,811 INFO     Beginning starting of processes
2018-01-10 12:44:15,603 INFO     Started 2500 processes
2018-01-10 12:44:18,607 INFO     Waited for all processes
./subproc.py  2.74s user 1.46s system 108% cpu 3.881 total

So 2500 processes could be forked in less than one second, and the
starting/reaping in python needed only very few seconds of system time -
compared with the amount of resources required to run the 'mobile'
programs including the GSMTAP socket traffic etc., for sure negligible?

Now of course '/bin/sleep' is a much simpler program to start, but the
overhead of the python "orchestration" doesn't change with the resource
footprint of the program started.

Just my thoughts, as usual. The decision is yours...

-- 
- Harald Welte <lafo...@gnumonks.org>          http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)
#!/usr/bin/env python3

import os
import logging

NUM_PROCS = 2500
PROG = "/bin/sleep"

logger = logging.getLogger('')
logger.setLevel(logging.INFO)
console = logging.StreamHandler()
console.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s %(levelname)-8s %(message)s')
console.setFormatter(formatter)
logger.addHandler(console)

def start_proc(path, args):
    return os.spawnv(os.P_NOWAIT, path, [path] + args)

logger.info("Beginning starting of processes")
p_list = []
for i in range(NUM_PROCS):
    p = start_proc(PROG, ['3'])
    if p < 0:
        logger.error("Failed to start process: %d" % p)
    else:
        p_list.append(p)

num_started = len(p_list)
logger.info("Started %u processes" % num_started)

for i in range(num_started):
    (pid, status) = os.wait()
    if pid not in p_list:
        logger.error("Unknown child pid %d ?!?" % pid)
    if status != 0:
        logger.error("Process %d exit error %d" % (pid, status))

logger.info("Waited for all processes")
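For what it's worth, the standard 'subprocess' module does seem to offer
a non-blocking check via Popen.poll(), which returns None while the child
is still running (and os.waitpid() with os.WNOHANG exists, too); whether
either is more convenient than plain os.wait() at this scale is
debatable. A small illustration:

```python
import subprocess
import time

def reap_finished(procs):
    """Return (pid, returncode) for children that have already exited;
    Popen.poll() returns None while the child is still running."""
    finished = []
    for p in procs:
        rc = p.poll()
        if rc is not None:
            finished.append((p.pid, rc))
    return finished

procs = [subprocess.Popen(["/bin/sleep", "1"]) for _ in range(5)]
while len(reap_finished(procs)) < len(procs):
    time.sleep(0.1)  # polling loop just for illustration
```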