This reminds me I haven't been able to parse your logs to check your data. If what you say is true, that the failure is presented when the system (kernel + userspace) booted in less than 90 secs, then I can make 2 observations:
1. your data and Randy's diverge, Randy's data has every systemd failure tied to a timeout message and a boot time greater than the unit's timeout. I would happily investigate further, if you send me your logs. 2. my test would not be really as effective as I think to catch the condition. Things to take into account: A. I'm using Randy's data to compare against yours because you both used the same method. I have my data that I obtained with a different method that supports my point too. B. (Randy/Brian)'s method and Benjamin's method threw data that they 3 assume is the condition that made the autobuilder's fail, while I don't say they are wrong, this is not data coming out of the actual failures of the autobuilder. Having that said, I would not disprove your nice/ionice change nor adding this test to check if it can be a single trap for the systemd errors which right now have too many faces. Benjamin On Thu, 2015-10-22 at 23:10 +0000, Avery, Brian wrote: > The problem with this is one of the runs I did had a lot of systemd > timeout failures but the stats were like: > > Fails: > min startup time: 54 sec > avg startup time: 114 sec > max startup time: 260 sec > > Successes: > min startup time: 46 sec > avg startup time: 110 sec > max startup time: 266 sec > > so I¹m not sure that the time is a good indicator. > > These numbers are pulled from the jounalctl line in the > ssh_target_log > file that looks like: > > Oct 21 11:19:40 qemux86 systemd[1]: Startup finished in 42.255s > (kernel) + > 3min 10.075s (userspace) = 3min 52.331s. > > > -b > > > > On 10/22/15, 3:45 PM, "Benjamin Esquivel" > <[email protected]> wrote: > > > the new test: > > oeqa.runtime.systemd.SystemdJournalTests.test_systemd_boot_time > > will query the target boot time from journactl and will > > compare it against the default systemd's unit start timeout: > > (TimeoutStartSec) > > > > if the boot time is greater than the timeout, it will fail > > > > the reason is because once the boot time exceeds the unit start > > timeout then it is uncertain that the systemd units are able to > > properly come up and function > > > > when this test fails, the rest of the systemd units (services) > > should be > > skipped > > > > this test also prints the startup time in the test log like: > > ... > > test_systemd_boot_time (oeqa.runtime.systemd.SystemdJournalTests) > > ... > > Startup finished in 6.922s (kernel) + 52.089s (userspace) = > > 59.011s. > > ... > > > > [YOCTO#8141] > > > > Signed-off-by: Benjamin Esquivel <[email protected] > > > > > --- > > meta/lib/oeqa/runtime/systemd.py | 73 > > ++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 73 insertions(+) > > > > diff --git a/meta/lib/oeqa/runtime/systemd.py > > b/meta/lib/oeqa/runtime/systemd.py > > index c74394c..251d06e 100644 > > --- a/meta/lib/oeqa/runtime/systemd.py > > +++ b/meta/lib/oeqa/runtime/systemd.py > > @@ -21,6 +21,33 @@ class SystemdTest(oeRuntimeTest): > > self.assertEqual(status, expected, message) > > return output > > > > + def journalctl(self, args='',l_match_units=[]): > > + """ > > + Request for the journalctl output to the current target > > system > > + > > + Arguments: > > + -args, an optional argument pass through argument > > + -l_match_units, an optional list of units to filter the > > output > > + Returns: > > + -string output of the journalctl command > > + Raises: > > + -AssertionError, on remote commands that fail > > + -ValueError, on a journalctl call with filtering by > > l_match_units that > > + returned no entries > > + """ > > + query_units="" > > + if len(l_match_units): > > + query_units = ['_SYSTEMD_UNIT='+unit for unit in > > l_match_units] > > + query_units = " ".join(query_units) > > + command = 'journalctl %s %s' %(args, query_units) > > + status, output = self.target.run(command) > > + if status: > > + raise AssertionError("Command '%s' returned non-zero > > exit \ > > + code %d:\n%s" % (command, status, output)) > > + if len(output) == 1 and "-- No entries --" in output: > > + raise ValueError("List of units to match: %s, returned > > no > > entries" > > + % l_match_units) > > + return output > > > > class SystemdBasicTests(SystemdTest): > > > > @@ -99,3 +126,49 @@ class SystemdJournalTests(SystemdTest): > > def test_systemd_journal(self): > > (status, output) = self.target.run('journalctl') > > self.assertEqual(status, 0, output) > > + > > + @skipUnlessPassed('test_systemd_basic') > > + def test_systemd_boot_time(self, systemd_TimeoutStartSec=90): > > + """ > > + Compare the target boot time from journalctl against > > TimeoutStartSec > > + > > + Arguments: > > + -systemd_TimeoutStartSec, an optional argument containing > > systemd's > > + unit start timeout > > + """ > > + > > + expr_items=["Startup finished","kernel", > > "userspace","\.$"] > > + try: > > + output = self.journalctl(args="-o cat --reverse") > > + except AssertionError: > > + self.fail("Error occurred while calling journalctl") > > + if not len(output): > > + self.fail("Error: unable to obtain the startup time > > from\ > > + systemd journal") > > + > > + # check for the regular expression items that match the > > startup > > time > > + for line in output.split('\n'): > > + check_match = > > "".join(re.findall(".*".join(expr_items), > > line)) > > + if check_match: break > > + # put the startup time in the test log > > + if check_match: > > + print "\n%s" % check_match > > + else: > > + self.fail("Error while obtaining the boot time from > > journalctl") > > + boot_time_sec = 0 > > + > > + # get the numeric values from the string and convert them > > to > > seconds > > + # same data will be placed in list and string for > > manipulation > > + l_boot_time = check_match.split(" ")[-2:] > > + s_boot_time = " ".join(l_boot_time) > > + # Obtain the minutes it took to boot > > + if l_boot_time[0].endswith('min') and > > l_boot_time[0][0].isdigit(): > > + boot_time_min = s_boot_time.split("min")[0] > > + # convert to seconds and accumulate it > > + boot_time_sec += int(boot_time_min) * 60 > > + # Obtain the seconds it took to boot and accumulate > > + boot_time_sec += float(l_boot_time[1].split("s")[0]) > > + #Assert the target boot time against systemd's unit start > > timeout > > + msg = "Target boot time %s exceeds systemd's > > TimeoutStartSec %s"\ > > + %(boot_time_sec, systemd_TimeoutStartSec) > > + self.assertTrue(boot_time_sec < systemd_TimeoutStartSec, > > msg = > > msg) -- _______________________________________________ Openembedded-core mailing list [email protected] http://lists.openembedded.org/mailman/listinfo/openembedded-core
