Hello community,

here is the log from the commit of package python-logreduce for openSUSE:Factory checked in at 2018-10-25 09:13:04
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-logreduce (Old)
 and      /work/SRC/openSUSE:Factory/.python-logreduce.new (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-logreduce"

Thu Oct 25 09:13:04 2018 rev:4 rq:644414 version:0.2.0

Changes:
--------
--- /work/SRC/openSUSE:Factory/python-logreduce/python-logreduce.changes 2018-08-10 09:49:53.282280583 +0200
+++ /work/SRC/openSUSE:Factory/.python-logreduce.new/python-logreduce.changes 2018-10-25 09:13:04.970263204 +0200
@@ -1,0 +2,16 @@
+Wed Oct 24 18:45:49 UTC 2018 - Dirk Mueller <[email protected]>
+
+- update to 0.2.0:
+  * Use ara[-\_]\*.\*/ in the default ignore paths list
+  * Fix download asyncio loop and logger names
+  * Record test command used to train models
+  * Add a uuid to model object
+  * Remove chunk grouping in the process function
+  * Rewrite html output using patternfly
+  * Collect ZuulBuild in anomaly report
+  * Add --cacheonly argument to skip file download
+  * Add ara-.\* to the default ignore list
+  * Rewrite ZuulBuilds download module to discover base log\_url
+  * common: small fixes for automated process
+
+-------------------------------------------------------------------

Old:
----
  logreduce-0.1.3.tar.gz

New:
----
  logreduce-0.2.0.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ python-logreduce.spec ++++++
--- /var/tmp/diff_new_pack.TLN4CW/_old 2018-10-25 09:13:05.902262646 +0200
+++ /var/tmp/diff_new_pack.TLN4CW/_new 2018-10-25 09:13:05.906262644 +0200
@@ -12,40 +12,40 @@
 # license that conforms to the Open Source Definition (Version 1.9)
 # published by the Open Source Initiative.
 
-# Please submit bugfixes or comments via http://bugs.opensuse.org/
+# Please submit bugfixes or comments via https://bugs.opensuse.org/
 #
 
 
 %{?!python_module:%define python_module() python-%{**} python3-%{**}}
 %define skip_python2 1
 Name: python-logreduce
-Version: 0.1.3
+Version: 0.2.0
 Release: 0
 Summary: Log file anomaly extractor
 License: Apache-2.0
 Group: Development/Languages/Python
-Url: https://logreduce.softwarefactory-project.io/
+URL: https://logreduce.softwarefactory-project.io/
 Source: https://files.pythonhosted.org/packages/source/l/logreduce/logreduce-%{version}.tar.gz
 BuildRequires: %{python_module devel}
 BuildRequires: %{python_module pbr}
 BuildRequires: %{python_module setuptools}
+BuildRequires: fdupes
 BuildRequires: python-rpm-macros
+Requires: python-PyYAML
+Requires: python-aiohttp
+Requires: python-numpy
+Requires: python-scikit-learn
+Requires: python-scipy
+BuildArch: noarch
 # SECTION test requirements
 BuildRequires: %{python_module PyYAML}
 BuildRequires: %{python_module aiohttp}
-BuildRequires: %{python_module nose}
+BuildRequires: %{python_module mock}
 BuildRequires: %{python_module numpy}
+BuildRequires: %{python_module pytest}
 BuildRequires: %{python_module scikit-learn}
 BuildRequires: %{python_module scipy}
 # /SECTION
-BuildRequires: fdupes
-Requires: python-PyYAML
-Requires: python-aiohttp
-Requires: python-numpy
-Requires: python-scikit-learn
-Requires: python-scipy
-BuildArch: noarch
-
 %python_subpackages
 
 %description
@@ -74,6 +74,7 @@
 %install
 %python_install
 %python_expand %fdupes %{buildroot}%{$python_sitelib}
+
 %check
 %python_exec setup.py test
 

++++++ logreduce-0.1.3.tar.gz -> logreduce-0.2.0.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/ChangeLog new/logreduce-0.2.0/ChangeLog
--- old/logreduce-0.1.3/ChangeLog 2018-07-04 08:41:29.000000000 +0200
+++ new/logreduce-0.2.0/ChangeLog 2018-08-27 04:25:49.000000000 +0200
@@ -1,6 +1,21 @@
 CHANGES
 =======
 
+0.2.0
+-----
+
+* Use ara[-\_]\*.\*/ in the default ignore paths list
+* Fix download asyncio loop and logger names
+* Record test command used to train models
+* Add a uuid to model object
+* Remove chunk grouping in the process function
+* Rewrite html output using patternfly
+* Collect ZuulBuild in anomaly report
+* Add --cacheonly argument to skip file download
+* Add ara-.\* to the default ignore list
+* Rewrite ZuulBuilds download module to discover base log\_url
+* common: small fixes for automated process
+
 0.1.3
 -----
 
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/PKG-INFO new/logreduce-0.2.0/PKG-INFO
--- old/logreduce-0.1.3/PKG-INFO 2018-07-04 08:41:29.000000000 +0200
+++ new/logreduce-0.2.0/PKG-INFO 2018-08-27 04:25:49.000000000 +0200
@@ -1,6 +1,6 @@
 Metadata-Version: 1.1
 Name: logreduce
-Version: 0.1.3
+Version: 0.2.0
 Summary: Extract anomalies from log files
 Home-page: https://logreduce.softwarefactory-project.io/
 Author: Tristan Cacqueray
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/doc/index.rst new/logreduce-0.2.0/doc/index.rst
--- old/logreduce-0.1.3/doc/index.rst 2018-10-25 09:13:05.966262609 +0200
+++ new/logreduce-0.2.0/doc/index.rst 2018-08-27 04:25:34.000000000 +0200
@@ -1 +1,219 @@
-symbolic link to ../README.rst
+logreduce - extract anomaly from log files
+==========================================
+
+Based on success logs, logreduce highlights useful text in failed logs.
+The goal is to save time in finding a failure's root cause.
+
+On average, learning run at 2000 lines per second, and
+testing run at 1300 lines per seconds.
+ + +How it works +------------ + +logreduce uses a *model* to learn successful logs and detect novelties in +failed logs: + +* Random words are manually removed using regular expression +* Then lines are converted to a matrix of token occurrences + (using **HashingVectorizer**), +* An unsupervised learner implements neighbor searches + (using **NearestNeighbors**). + + +Caveats +------- + +This method doesn't work when debug content is only included in failed logs. +To successfully detect anomalies, failed and success logs needs to be similar, +otherwise the extra informations in failed logs will be considered anomalous. + +For example this happens with testr where success logs only contains 'SUCCESS'. + + +Install +------- + +* Fedora: + +.. code-block:: console + + sudo dnf install -y python3-scikit-learn + git clone https://softwarefactory-project.io/r/logreduce + pushd logreduce + python3 setup.py develop --user + popd + +* Pip: + +.. code-block:: console + + pip install --user logreduce + + +Usage +----- + +Logreduce needs a **baseline** for success log training, and a **target** +for the log to reduce. + +Logreduce prints anomalies on the console, the log files are not modified: + +.. code-block:: console + + "%(distance)f | %(log_path)s:%(line_number)d: %(log_line)s" + +Local file usage +................ + +* Compare two files or directories without building a model: + +.. code-block:: console + + $ logreduce diff testr-nodepool-01/output.good testr-nodepool-01/output.fail + 0.232 | testr-nodepool-01/output.fail:0677: File "voluptuous/schema_builder.py", line 370, in validate_mapping + 0.462 | testr-nodepool-01/output.fail:0678: raise er.MultipleInvalid(errors) + 0.650 | testr-nodepool-01/output.fail:0679: voluptuous.error.MultipleInvalid: required key not provided @ data['providers'][2]['cloud'] + +* Compare two files or directories: + +.. 
code-block:: console + + $ logreduce dir preprod-logs/ /var/log/ + + +* Or build a model first and run it separately: + +.. code-block:: console + + $ logreduce dir-train sosreport.clf old-sosreport/ good-sosreport/ + $ logreduce dir-run sosreport.clf new-sosreport/ + + +Zuul job usage +.............. + +Logreduce can query Zuul build database to train a model. + +* Extract novelty from a job logs: + +.. code-block:: console + + $ logreduce job http://logs.openstack.org/... + + # Reduce comparaison to a single project (e.g. for tox jobs) + $ logreduce job --project openstack/nova http://logs.openstack.org/... + + # Compare using many baselines + $ logreduce job --count 10 http://logs.openstack.org/... + + # Include job artifacts + $ logreduce job --include-path logs/ http:/logs.openstack.org/... + +* Or build a model first and run it separately: + +.. code-block:: console + + $ logreduce job-train --job job_name job_name.clf + $ logreduce job-run job_name.clf http://logs.openstack.org/.../ + + +Journald usage +.............. + +Logreduce can look for anomaly in journald, comparing the last day/week/month +to the previous one: + +* Extract novelty from last day journal: + +.. code-block:: console + + $ logreduce journal --range day + +* Build a model using journal of last month and look for novelty in last week: + +.. code-block:: console + + $ logreduce journal-train --range month good-journal.clf + $ logreduce journal-run --range week good-journal.clf + + +logreduce-tests +--------------- + +This package contains tests data for different type of log such as testr +or syslog. Each tests includes a pre-computed list of the anomalies in log +failures. + +This package also includes a command line utility to run logreduce against all +tests data and print a summary of its performance. + + +Test format +........... 
+ +Each tests case is composed of: + +* A *.good* file (or directory) that holds the baseline +* A *.fail* file (or directory) +* A *info.yaml* file that describe expected output: + +.. code-block:: yaml + + threshold: float # set the distance threshold for the test + anomalies: + - optional: bool # to define minor anomalies not considered false positive + lines: | # the expected lines to be highlighted + Traceback... + RuntimeError... + + +Evaluate +........ + +To run the evaluation, first install logreduce-tests: + +.. code-block:: console + + git clone https://softwarefactory-project.io/r/logreduce-tests + pushd logreduce-tests + python3 setup.py develop --user + +logreduce-tests expect tests directories as argument: + +.. code-block:: console + + $ logreduce-tests tests/testr-zuul-[0-9]* + [testr-zuul-01]: 100.00% accuracy, 5.00% false-positive + [testr-zuul-02]: 80.00% accuracy, 0.00% false-positive + ... + Summary: 90.00% accuracy, 2.50% false-positive + +Add --debug to display false positive and missing chunks. + + +TODOs +----- + +* Add terminal colors output +* Add progress bar +* Better differentiate training debug from testing debug +* Add a starting log line and report written +* Add tarball traversal in utils.files_iterator +* Add logstash filter module +* Improve tokenization tests + + +Roadmap +------- +* Add daemon worker mode with MQTT event listener +* Discard files that are 100% anomalous +* Report mean diviation instead of absolute distances +* Investigate second stage model + + +Contribute +---------- + +Contribution are most welcome, use **git-review** to propose a change. 
+Setup your ssh keys after sign in https://softwarefactory-project.io/auth/login diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/cmd.py new/logreduce-0.2.0/logreduce/cmd.py --- old/logreduce-0.1.3/logreduce/cmd.py 2018-07-04 08:41:06.000000000 +0200 +++ new/logreduce-0.2.0/logreduce/cmd.py 2018-08-27 04:25:34.000000000 +0200 @@ -17,6 +17,7 @@ import logging import os import time +import yaml import logreduce.download import logreduce.utils @@ -77,6 +78,8 @@ parser.set_defaults(func=None) parser.add_argument("--debug", action="store_true", help="Print debug") parser.add_argument("--tmp-dir", default=os.getcwd()) + parser.add_argument("--cacheonly", action="store_true", + help="Do not download any logs") # Common arguments def path_filters(s): @@ -193,6 +196,8 @@ s = sub.add_parser("job-run", help="Run a model against CI logs") s.set_defaults(func=self.job_run) report_filters(s) + s.add_argument("--zuul-web", default=DEFAULT_ZUUL_WEB, + help="The zuul-web url (including the tenant name)") path_filters(s) s.add_argument("model_file") s.add_argument("logs_url", help="The CI logs url or a local dir") @@ -312,6 +317,7 @@ for pipeline in pipelines: for baseline in logreduce.download.ZuulBuilds(self.zuul_web).get( job=self.job, + result='SUCCESS', branch=self.branch, pipeline=pipeline, project=self.project, @@ -323,22 +329,21 @@ print("%s: couldn't find success in pipeline %s" % ( self.job, " ".join(pipelines))) exit(4) - baselines_paths = [] - url_prefixes = {} for baseline in baselines: - if baseline[-1] != "/": - baseline += "/" + if baseline['log_url'][-1] != "/": + baseline['log_url'] += "/" dest = os.path.join( - self.tmp_dir, "_baselines", self.job, baseline.split('/')[-2]) - self.download_logs(baseline, dest) - baselines_paths.append(dest) - url_prefixes["%s/" % dest] = baseline + self.tmp_dir, "_baselines", self.job, + baseline['log_url'].split('/')[-2]) + 
self.download_logs(baseline['log_url'], dest) + baseline['local_path'] = dest # Train model clf = self._get_classifier() - clf.train(baselines_paths, url_prefixes) + clf.train(baselines) clf.save(model_file) - print("%s: built with %s" % (model_file, " ".join(baselines))) + print("%s: built with %s" % ( + model_file, " ".join(map(str, baselines)))) return clf def job_run(self, model_file, logs_url): @@ -347,7 +352,8 @@ target = logs_url else: target = self.download_logs(logs_url) - self._report(clf, target) + build = self._get_build(target) + self._report(clf, build) def job_allinone(self, logs_url): if self.job is None: @@ -366,7 +372,33 @@ clf = self.job_train(model_file) target = self.download_logs(logs_url) - self._report(clf, target) + build = self._get_build(target) + self._report(clf, build) + + def _get_build(self, target): + build_cache = os.path.join(target, "zuul-info/build.json") + if os.path.exists(build_cache): + return logreduce.download.ZuulBuild(json.load(open(build_cache))) + inv_path = os.path.join(target, "zuul-info/inventory.yaml") + try: + inv = yaml.safe_load(open(inv_path)) + except FileNotFoundError: + self.log.info("%s: couldn't find file", inv_path) + return None + try: + build_uuid = inv['all']['vars']['zuul']['build'] + except KeyError: + self.log.info("%s: couldn't find build id", inv_path) + return None + try: + build = logreduce.download.ZuulBuilds(self.zuul_web).get( + uuid=build_uuid)[0] + except IndexError: + self.log.warning("%s: couldn't find build", build_uuid) + return None + build['local_path'] = target + json.dump(build, open(build_cache, "w")) + return build # Jounrald usage def journal_train(self, model_file): @@ -410,9 +442,12 @@ self.job = logs_url.split('/')[-3] target_dir = os.path.join( self.tmp_dir, "_targets", self.job, logs_url.split('/')[-2]) + if self.cacheonly: + return target_dir + os.makedirs(target_dir, exist_ok=True) - logs_path = ["job-output.txt.gz"] + logs_path = ["job-output.txt.gz", 
"zuul-info/inventory.yaml"] if self.include_path: logs_path.append(self.include_path) @@ -448,7 +483,6 @@ console_output = True if json_file or self.html: console_output = False - start_time = time.monotonic() output = clf.process(path=target_dirs, path_source=target_source, threshold=float(self.threshold), @@ -458,7 +492,6 @@ console_output=console_output) if not output.get("anomalies_count"): exit(4) - output["total_time"] = time.monotonic() - start_time if self.html: open(self.html, "w").write( render_html(output, self.static_location)) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/download.py new/logreduce-0.2.0/logreduce/download.py --- old/logreduce-0.1.3/logreduce/download.py 2018-07-04 08:41:06.000000000 +0200 +++ new/logreduce-0.2.0/logreduce/download.py 2018-08-27 04:25:34.000000000 +0200 @@ -19,6 +19,7 @@ import re import os from urllib.parse import urlparse +import urllib.request import aiohttp @@ -55,7 +56,7 @@ class RecursiveDownload: - log = logging.getLogger("RecursiveDownload") + log = logging.getLogger("logreduce.RecursiveDownload") def __init__(self, url, dest, threads=4, trim=None, exclude_files=[], exclude_paths=[], exclude_extensions=[]): @@ -67,9 +68,14 @@ self.active_worker = 0 self.trim = trim + try: + loop = asyncio.get_event_loop() + except RuntimeError: + loop = asyncio.new_event_loop() + asyncio.set_event_loop(loop) + self.queue = asyncio.Queue() self.queue.put_nowait(url) - loop = asyncio.get_event_loop() self.tasks = [loop.create_task(self.handle_task(idx)) for idx in range(threads)] @@ -155,38 +161,72 @@ self.active_worker -= 1 +class ZuulBuild(dict): + def __repr__(self): + inf = "id=%s ref=%s" % (self['uuid'][:7], self['ref']) + if self.get("project"): + inf += " project=%s" % self['project'] + if self.get('local_path'): + inf += " local_path=%s" % self['local_path'] + if self.get("log_url"): + inf += " log_url=%s" % self['log_url'] + return "<ZuulBuild %s>" % 
inf + + def __str__(self): + return self.__repr__() + + def __unicode__(self): + return self.__repr__() + + class ZuulBuilds: - log = logging.getLogger("ZuulBuilds") + log = logging.getLogger("logreduce.ZuulBuilds") def __init__(self, zuul_url): self.zuul_url = zuul_url - def get(self, **kwarg): - loop = asyncio.get_event_loop() - urls = loop.run_until_complete(self._get(**kwarg)) - return urls - - async def _get(self, job, project=None, pipeline=None, - branch=None, - count=3, result="SUCCESS"): - url = "%s/builds?job_name=%s" % (self.zuul_url, job) + def get(self, job=None, project=None, pipeline=None, + branch=None, uuid=None, + count=3, result=None): + url = "%s/builds" % self.zuul_url + args = "" + if job: + args += "&job_name=%s" % job if project: - url += "&project=%s" % project + args += "&project=%s" % project if branch: - url += "&branch=%s" % branch + args += "&branch=%s" % branch if pipeline: - url += "&pipeline=%s" % pipeline + args += "&pipeline=%s" % pipeline + if uuid: + args += "&uuid=%s" % uuid if result: - url += "&result=%s" % result - async with aiohttp.ClientSession() as session: - self.log.debug('Getting %s' % url) - async with session.get(url, timeout=30) as response: - assert response.status == 200 - data = await response.read() - urls = [] - for build in json.loads(data.decode('utf-8'))[:count]: - urls.append(build["log_url"]) - return urls + args += "&result=%s" % result + if args: + url = "%s?%s" % (url, args[1:]) + self.log.debug('Getting %s' % url) + resp = urllib.request.urlopen(url) + builds_data = json.loads(resp.read().decode('utf-8')) + builds = [] + for build in builds_data[:count]: + # Discover true log_url when success-url is nested + log_url = build["log_url"].rstrip('/') + attempts = 5 + while attempts > 0: + inf_url = os.path.join(log_url, "zuul-info/inventory.yaml") + self.log.debug('Checking %s' % inf_url) + req = urllib.request.Request(inf_url, method='HEAD') + try: + resp = urllib.request.urlopen(req) + if resp.status == 
200: + build["log_url"] = "%s/" % log_url + break + except urllib.error.HTTPError: + pass + attempts -= 1 + log_url = os.path.dirname(log_url) + builds.append(ZuulBuild(build)) + return builds def main(): @@ -195,12 +235,12 @@ if args.url: urls.append(args.url) else: - for url in ZuulBuilds(args.zuul_url).get( + for build in ZuulBuilds(args.zuul_url).get( job=args.job, pipeline=args.pipeline, project=args.project, count=args.count): - urls.append(url) + urls.append(build['log_url']) for url in urls: print(RecursiveDownload(url, args.dest, args.threads, exclude_files=args.exclude_file, diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/html_output.py new/logreduce-0.2.0/logreduce/html_output.py --- old/logreduce-0.1.3/logreduce/html_output.py 2018-07-04 08:41:06.000000000 +0200 +++ new/logreduce-0.2.0/logreduce/html_output.py 2018-08-27 04:25:34.000000000 +0200 @@ -12,91 +12,269 @@ import html import os.path -import sys +import pkg_resources + +LOGO = """ +iVBORw0KGgoAAAANSUhEUgAAABcAAAAXBAMAAAASBMmTAAAAFVBMVEU6feU9geVjl+WMsOUicOXA +z+X///9aF/8vAAAAAXRSTlMAQObYZgAAAAFiS0dEAIgFHUgAAAAJcEhZcwAACxMAAAsTAQCanBgA +AAAHdElNRQfiCAsFFSh04lDsAAAAp0lEQVQY012QXQ7CQAiE158LyKYH6BDf7WAPYEkP4BLvfxVx +WxPjhge+sAwDpfy9oxgfX5hQhbctt0VV4Z1OjTVC+OowtqeNrekH7rO7qq/3/HcAZoIzOZZy5uDt +IrHKlEN0Yg9D9kugQkVbTaDxMkAM9lOJuvWAhinwkR782dVS+iAwjpqzsDlYr7uDcsKPt3TtEc7o ++8Siqeb7doQI9jwFjcv/YcobPpYhOB4CZRcAAAAASUVORK5CYII= +""" + +HTML_DOM = """ +<!DOCTYPE html> +<html class='layout-pf'> + <head> + <title>Logreduce of {target}</title> + <meta charset='UTF-8'> + <link rel='stylesheet' type='text/css' href='{ptnfly_css_loc}'> + <link rel='stylesheet' type='text/css' href='{ptnfly_cssa_loc}'> + <style> +.loglines {{max-height: 800px; overflow-y: scroll;}} +.list-group-item-container {{overflow: hidden;}} +.ls {{margin-top: 0px; margin-bottom: 10px; border-color: black;}} +#debuginfo {{display: none;}} + </style> + </head> + <body> + 
<nav class="navbar navbar-default navbar-pf" role="navigation"> + <div class="navbar-header"> + <img src="data:image/jpeg;base64,{logo}" alt="LogReduce" /> + </div> + <div class="collapse navbar-collapse navbar-collapse-1"> + <ul class="nav navbar-nav navbar-utility"> + <li><a href="#" id='debugbtn'>Show Debug</a></li> + <li><a href="https://pypi.org/project/logreduce/" target="_blank"> + Documentation + </a></li> + <li><a href="#"><strong>Version</strong> {version}</a></li> + </ul> + <ul class="nav navbar-nav navbar-primary"> + <li class="active"><a href="log-classify.html">Report</a></li> + <li><a href="ara-report/">ARA Records Ansible</a></li> + <li><a href="./">Job Artifacts</a></li> + </ul> + </div> + </nav> + <div class="container" style='width: 100%'> + {body} + </div> + <script src='{jquery_loc}'></script> + <script src='{bootst_loc}/js/bootstrap.min.js'></script> + <script src='{ptnfly_loc}/js/patternfly.min.js'></script> + <script>{js}</script> + </body> +</html> +""" + +JS = """ +$(document).ready(function(){ +$('#debugbtn').on('click', function(event) {$('[id=debuginfo]').toggle();}); +}); +$(".list-group-item-header").click(function(event){ + if(!$(event.target).is("button, a, input, .fa-ellipsis-v")){ + $(this).find(".fa-angle-right").toggleClass("fa-angle-down") + .end().parent().toggleClass("list-view-pf-expand-active") + .find(".list-group-item-container").toggleClass("hidden"); + } +}) +$(".list-group-item-container .close").on("click", function (){ + $(this).parent().addClass("hidden") + .parent().removeClass("list-view-pf-expand-active") + .find(".fa-angle-right").removeClass("fa-angle-down"); +}) +""" + + +def render_unmatch_list(dom, output): + if output.get("unknown_files"): + dom.append("<br /><h2>Unmatched file in previous success logs</h2>") + dom.append("<ul>") + for fname in output["unknown_files"]: + dom.append("<li><a href='%s'>%s</a></li>" % (fname[1], fname[0])) + dom.append("</ul>") + + +def table(dom, columns, rows): + dom.append( 
+ "<div id='debuginfo' style='overflow-x: auto'>" + "<table style='white-space: nowrap; margin: 0px' " + "class='table table-condensed table-responsive table-bordered'>" + ) + if columns: + dom.append("<thead><tr>") + for col in columns: + dom.append("<th>%s</th>" % col) + dom.append("</tr></thead>") + dom.append("<tbody>") + for row in rows: + if columns and len(row) > len(columns): + dom.append("<tr id='%s'>" % row.pop()) + else: + dom.append("<tr>") + for col in row: + dom.append("<td>%s</td>" % col) + dom.append("</tr>") + dom.append("</tbody></table><br /></div>") + + +def render_result_info(dom, output): + rows = [] + if output.get("train_command"): + rows.append(("Test command", output["train_command"])) + rows.append(("Command", output["test_command"])) + rows.append(("Targets", "%s" % " ".join( + map(html.escape, map(str, output["targets"]))))) + rows.append(("Baselines", "%s" % " ".join( + map(html.escape, map(str, output["baselines"]))))) + rows.append(("Anomalies count", output["anomalies_count"])) + rows.append(("Run time", "%.2f seconds" % output["total_time"])) + rows.append(("Reduction", "%02.2f%% (from %d lines to %d)" % ( + output["reduction"], + output["testing_lines_count"], + output["outlier_lines_count"]))) + table(dom, columns=[], rows=rows) + + +def render_result_table(dom, files_sorted): + columns = [ + "Anomaly count", + "Filename", + "Test time", + "Model" + ] + rows = [] + for filename, data in files_sorted: + if not data["scores"]: + continue + rows.append(( + len(data["scores"]), + "<a href='#%s'>%s</a> (<a href='%s'>log link</a>)" % ( + filename.replace('/', '_'), filename, data["file_url"]), + "%.2f sec" % data["test_time"], + "<a href='#model_%s'>%s</a>" % (data["model"], data["model"]))) + table(dom, columns, rows) + + +def render_model_table(dom, model_sorted, links): + columns = [ + "Model", "Train time", "Infos", "Baseline files" + ] + rows = [] + for model_name, data in model_sorted: + rows.append([ + model_name, + "%.2f sec" % 
data["train_time"], + data["info"], + " ".join(links[model_name]), + "model_%s" % model_name, + ]) + table(dom, columns, rows) + + +def render_logfile(dom, filename, data, source_links, expanded=False): + lines_dom = [] + last_pos = None + for idx in range(len(data["scores"])): + pos, dist = data["scores"][idx] + line = data["lines"][idx] + lines_dom.append( + "<font color='#%02x0000'>%1.3f | %04d: %s</font><br />" % ( + int(255 * dist), dist, pos + 1, html.escape(line))) + if last_pos and last_pos != pos and pos - last_pos != 1: + lines_dom.append("<hr class='ls' />") + last_pos = pos + + expand = " hidden" + list_expand = "" + angle = "" + if expanded: + expand = "" + angle = " fa-angle-down" + list_expand = " list-view-pf-expand-active" + dom.append(""" + <div class="list-group-item{list_expand}" id='{anchor}'> + <div class="list-group-item-header"> + <div class="list-view-pf-expand"> + <span class="fa fa-angle-right{angle}"></span> + </div> + <div class="list-view-pf-main-info"> + <div class="list-view-pf-left"> + <span class="fa pficon-degraded list-view-pf-icon-sm"></span> + </div> + <div class="list-view-pf-body"> + <div class="list-view-pf-description"> + <div class="list-group-item-heading"> + {filename} + </div> + <div class="list-group-item-text"> + (<a href="{loglink}">log link</a>) + </div> + </div> + <div class="list-view-pf-additional-info-item" id='debuginfo'> + <span class="pficon pficon-registry"></span> + <a href="{model_link}">{model_name}</a> model + </div> + <div class="list-view-pf-additional-info-item"> + <span class="fa fa-bug"></span> + <strong>{anomaly_count}</strong> + </div> + </div> + </div> + </div> + <div class="list-group-item-container container-fluid{expand}"> + <div class="close"><span class="pficon pficon-close"></span></div> + <div id='debuginfo'>baseline samples:<ul>{baselines}</ul></div> + <div class="loglines"> + {lines} + </div> + </div> + </div> + """.format( + lines="\n".join(lines_dom), + baselines="".join(map(lambda x: 
"<li>%s</li>" % x, source_links)), + list_expand=list_expand, + expand=expand, + angle=angle, + anchor=filename.replace('/', '_'), + model_name=data['model'], + model_link="#model_%s" % data['model'], + anomaly_count=len(data["scores"]), + filename=filename, + loglink=data['file_url'],)) + return def render_html(output, static_location=None): if static_location: jquery_loc = static_location + "/js/jquery.min.js" - bootst_loc = static_location + "/bootstrap/" + bootst_loc = static_location + "/bootstrap" + ptnfly_loc = static_location + "/patternfly" else: jquery_loc = "https://code.jquery.com/jquery-3.3.1.min.js" bootst_loc = "https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7" - dom = [ - "<html><head>" - "<title>Logreduce of {target}</title>" - "<link rel='stylesheet' href='{bootst_loc}/css/bootstrap.min.css'>" - "<script src='{jquery_loc}'></script>" - "<script src='{bootst_loc}/js/bootstrap.min.js'>".format( - target=" ".join(output["target"]), - jquery_loc=jquery_loc, - bootst_loc=bootst_loc)] - dom.append( - "</script><script>$(document).ready(function(){" - "$('#debugbtn').on('click', function(event) {\n" - "$('[id=debuginfo]').toggle();\n" - "});});" - "</script>" - "<style>.panel-body {max-height: 800; overflow-y: scroll;}\n" - "#debuginfo {display: none;}</style>" - "</head><body style='margin-left: 20px'>" - "<h2 id='debuginfo'>Logreduce</h2>" - "<button type='button' id='debugbtn' " - "class='pull-right btn-xs btn-primary btn'>Show Debug</button>" - "<h4>--> <a href='./'>Full logs</a> // " - "<a href='ara-report'>ARA Records Ansible</a> <--</h4>") - # Results info - dom.append("<ul id='debuginfo'>") - dom.append(" <li>Command: %s</li>" % " ".join(sys.argv)) - dom.append(" <li>Target: %s</li>" % " ".join(output["target"])) - dom.append(" <li>Baseline: %s</li>" % " ".join(output["baseline"])) - dom.append(" <li>Anomalies count: %d</li>" % output["anomalies_count"]) - dom.append(" <li>Run time: %.2f seconds</li>" % output["total_time"]) - dom.append(" 
<li>%02.2f%% reduction (from %d lines to %d)</li>" % ( - output["reduction"], - output["testing_lines_count"], - output["outlier_lines_count"])) - dom.append("</ul>") - # Results table of content - dom.append("<div id='debuginfo' style='overflow-x: scroll'>" - "<table style='white-space: nowrap; margin: 0px' " - "class='table table-condensed table-responsive'>" - "<thead><tr>" - "<th>Anomaly count</th><th>Filename</th>" - "<th>Test time</th><th>Model</th>" - "</tr></thead><tbody>") + ptnfly_loc = "https://cdnjs.cloudflare.com/ajax/libs/patternfly/3.24.0" + + ptnfly_css_loc = "%s/css/patternfly.min.css" % ptnfly_loc + ptnfly_cssa_loc = "%s/css/patternfly-additions.min.css" % ptnfly_loc + + body = [] + + render_result_info(body, output) + files_sorted = sorted( output['files'].items(), key=lambda x: (x[0].startswith("job-output.txt") or x[1]['mean_distance']), reverse=True) - for filename, data in files_sorted: - if not data["chunks"]: - continue - dom.append(" <tr>" - "<td>%d</td>" % len(data["scores"]) + - "<td><a href='#%s'>%s</a> (<a href='%s'>log link</a>)</td>" - % (filename.replace('/', '_'), filename, - data["file_url"]) + - "<td>%.2f sec</td>" % data["test_time"] + - "<td><a href='#model_%s'>%s</a></td>" % ( - data["model"], data["model"]) + - "</tr>") - dom.append("</tbody></table></div><br />") - - # Model table - model_dom = [ - "<div id='debuginfo' style='overflow-x: scroll'>" - "<table style='white-space: nowrap; margin: 0px' " - "class='table table-condensed table-responsive'>" - "<thead><tr>" - "<th>Model</th><th>Train time</th>" - "<th>Infos</th><th>Baseline files</th>" - "</tr></thead><tbody>"] - models_sorted = sorted(output['models'].items(), - key=lambda x: x[1]['train_time'], - reverse=True) + models_sorted = sorted( + output['models'].items(), + key=lambda x: x[1]['train_time'], + reverse=True) + links = {} for model_name, data in models_sorted: source_links = [] for source_file in data["source_files"]: @@ -110,62 +288,31 @@ )) else: 
source_links.append(source_file) - data["source_links"] = source_links - model_dom.append(" <tr id='model_%s'>" % model_name + - "<td>%s</td>" % model_name + - "<td>%.2f sec</td>" % data["train_time"] + - "<td>%s</td>" % data["info"] + - "<td>%s</td>" % " ".join(data["source_links"]) + - "</tr>") - model_dom.append("</tbody></table></div><br />") + links[model_name] = source_links - # Anomalies result table + render_result_table(body, files_sorted) + + body.append('<div class="list-group list-view-pf list-view-pf-view">') + first = True for filename, data in files_sorted: - if not data["chunks"]: + if not data["scores"]: continue - heading_dom = ( - "<div class='panel-heading'>" - "%s (<a href='%s'>log link</a>)" - "<span class='pull-right' id='debuginfo'>model: " - "<a href='#model_%s'>%s</a> (%s)" - "</span></div>" % ( - filename, data['file_url'], - data['model'], data['model'], - output["models"][data["model"]]["info"])) - - dom.append( - "<div class='panel panel-default' id='%s'>" % ( - filename.replace('/', '_')) + - heading_dom + - "<div class='panel-body'>") - # Link sample baseline - dom.append("<div id='debuginfo'>baseline samples:<ul>") - for source_link in output["models"][data["model"]]["source_links"]: - dom.append("<li>%s</li>" % source_link) - dom.append("</ul></div>") - for idx in range(len(data["chunks"])): - lines = data["chunks"][idx].split('\n') - for line_pos in range(len(lines)): - line_score = data["scores"][idx][line_pos] - dom.append( - "<font color='#%02x0000'>%1.3f | %04d: %s</font><br />" % ( - int(255 * line_score), - line_score, - data["line_pos"][idx][line_pos], - html.escape(lines[line_pos]))) - if idx < len(data["chunks"]) - 1: - dom.append("<hr style='margin-top: 0px; margin-bottom: 10px; " - "border-color: black;' />") - dom.append("</div></div>") + render_logfile(body, filename, data, links[data["model"]], first) + first = False + body.append('</div>') - dom.extend(model_dom) - if output.get("unknown_files"): - dom.append("<br 
/><h2>Unmatched file in previous success logs</h2>") - dom.append("<ul>") - for fname in output["unknown_files"]: - dom.append("<li><a href='%s'>%s</a></li>" % (fname[1], fname[0])) - dom.append("</ul>") - dom.append("<h4>--> <a href='./'>Full logs</a> // " - "<a href='ara-report'>ARA Records Ansible</a> <--</h4>") - dom.append("</body></html>") - return "\n".join(dom) + render_model_table(body, models_sorted, links) + + render_unmatch_list(body, output) + + return HTML_DOM.format( + target=" ".join(map(html.escape, map(str, output["targets"]))), + js=JS, + logo=LOGO.replace('\n', ''), + version=pkg_resources.get_distribution("logreduce").version, + body="\n".join(body), + jquery_loc=jquery_loc, + bootst_loc=bootst_loc, + ptnfly_loc=ptnfly_loc, + ptnfly_css_loc=ptnfly_css_loc, + ptnfly_cssa_loc=ptnfly_cssa_loc) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/process.py new/logreduce-0.2.0/logreduce/process.py --- old/logreduce-0.1.3/logreduce/process.py 2018-07-04 08:41:06.000000000 +0200 +++ new/logreduce-0.2.0/logreduce/process.py 2018-08-27 04:25:34.000000000 +0200 @@ -15,7 +15,9 @@ import os import re import struct +import sys import time +import uuid import numpy as np import sklearn.utils.validation @@ -31,8 +33,8 @@ class Classifier: - log = logging.getLogger("Classifier") - version = 3 + log = logging.getLogger("logreduce.Classifier") + version = 4 def __init__(self, model='bag-of-words_nn', exclude_paths=[], exclude_files=[]): @@ -41,6 +43,11 @@ self.exclude_paths = exclude_paths self.exclude_files = exclude_files self.test_prefix = None + # Default + self.threshold = 0.2 + self.merge_distance = 5 + self.before_context = 2 + self.after_context = 2 def get(self, model_name): return self.models.setdefault(model_name, @@ -105,16 +112,22 @@ # Remove numbers and symbols return re.subn(r'[^a-zA-Z\/\._-]*', '', shortfilename)[0] - def train(self, path, url_prefixes={}): - """Train the model""" 
+ def train(self, baselines, command=sys.argv): + """Train the model, baselines can be path(s) or build dict(s)""" start_time = time.monotonic() + self.train_command = " ".join(command) self.training_lines_count = 0 self.training_size = 0 - self.baseline = path + if not isinstance(baselines, list): + baselines = [baselines] + if not len(baselines): + raise RuntimeError("Empty training baselines") + + self.baselines = baselines # Group similar files for the same model to_train = {} - for filename, filename_rel in files_iterator(path, + for filename, filename_rel in files_iterator(baselines, self.exclude_files, self.exclude_paths): if filename_rel: @@ -133,6 +146,7 @@ model = self.get(model_name) model.size = 0 model.count = 0 + model.uuid = str(uuid.uuid4()) # Tokenize and store all lines in train_data train_data = [] for filename in filenames: @@ -171,13 +185,16 @@ finally: if fobj: fobj.close() - # Set forig for report.html absolute url + # Check for remote file source location forig = filename - for prefix, url in url_prefixes.items(): - if filename.startswith(prefix): - forig = os.path.join(url, - filename[len(prefix):]) - break + for build in self.baselines: + if isinstance(build, dict): + build_prefix = "%s/" % build.get( + 'local_path', '').rstrip('/') + if filename.startswith(build_prefix): + forig = os.path.join(build.get('log_url'), + filename[len(build_prefix):]) + break model.sources.append(forig) if not train_data: @@ -214,14 +231,20 @@ # @profile - def test(self, path): - """Return outliers""" + def test(self, targets): + """Return outliers, target can be path(s) or build dict(s)""" start_time = time.monotonic() self.testing_lines_count = 0 self.testing_size = 0 self.outlier_lines_count = 0 + if not isinstance(targets, list): + targets = [targets] + if not len(targets): + raise RuntimeError("Empty testing targets") + + self.targets = targets - for filename, filename_rel in files_iterator(path, + for filename, filename_rel in files_iterator(targets, 
self.exclude_files, self.exclude_paths): if filename_rel: @@ -312,6 +335,9 @@ # Transform and compute distance from the model model = self.models[model_name] try: + # Distances are a list of float list. + # The HashingNeighbors vectorizer uses n_neighbors=1 to only + # return the closest distance to a known baseline vector. distances = model.test(test_data) except (sklearn.utils.validation.NotFittedError, sklearn.exceptions.NotFittedError): @@ -321,11 +347,12 @@ def get_line_info(line_pos): line = data[line_pos] try: - distance = distances[test_data_pos.index(line_pos)] + # Only keep the first distance + distance = distances[test_data_pos.index(line_pos)][0] except ValueError: # Line wasn't in test data try: - distance = distances[dup_pos[line_pos]] + distance = distances[dup_pos[line_pos]][0] except KeyError: # Line wasn't a duplicate distance = 0.0 @@ -369,14 +396,17 @@ raise RuntimeError("No test lines found") def process(self, path, path_source=None, threshold=0.2, merge_distance=5, - before_context=3, after_context=1, console_output=False): + before_context=3, after_context=1, console_output=False, + command=sys.argv): """Process target and create a report""" + start_time = time.monotonic() self.threshold = threshold self.merge_distance = merge_distance self.before_context = before_context self.after_context = after_context - output = {'files': {}, 'unknown_files': [], 'models': {}, - 'anomalies_count': 0} + output = {'files': {}, 'unknown_files': [], + 'models': {}, 'anomalies_count': 0, + 'baselines': self.baselines} for file_result in self.test(path): filename, filename_orig, model, outliers, test_time = file_result if model is None: @@ -390,68 +420,52 @@ 'source_files': list(map(str, model.sources)), 'train_time': model.train_time, 'info': model.info, + 'uuid': model.uuid, }) file_info = output['files'].setdefault(filename, { 'file_url': filename_orig, 'test_time': test_time, 'model': model.name, - 'chunks': [], 'scores': [], - 'line_pos': [], - 
'lines_count': 0, + 'lines': [], }) - current_chunk = [] - current_score = [] - current_pos = [] last_pos = None self.log.debug("%s: compared with %s" % ( filename, " ".join(list(map(str, model.sources))))) for pos, distance, outlier in outliers: - distance = abs(float(distance)) - if last_pos and pos - last_pos != 1: - # New chunk - file_info["chunks"].append("\n".join(current_chunk)) - file_info["scores"].append(current_score) - file_info["line_pos"].append(current_pos) - file_info["lines_count"] += len(current_chunk) - current_chunk = [] - current_score = [] - current_pos = [] - if last_pos and console_output: - print() - - # Clean ansible one-liner outputs + # Expand one-liner outputs (e.g. ansible) for line in outlier[:-1].split(r'\n'): line = line.replace(r'\t', '\t') - current_score.append(distance) - current_chunk.append(line) - current_pos.append(pos) + file_info['scores'].append((pos, distance)) + file_info['lines'].append(line) if console_output: + if last_pos and last_pos != pos and \ + pos - last_pos != 1: + print() print("%1.3f | %s:%04d:\t%s" % (distance, filename, pos + 1, line)) - last_pos = pos - if current_chunk: - file_info["chunks"].append("\n".join(current_chunk)) - file_info["scores"].append(current_score) - file_info["line_pos"].append(current_pos) - file_info["lines_count"] += len(current_chunk) - # Compute mean distances of outliers mean_distance = 0 if file_info["scores"]: - mean_distance = np.mean(np.hstack(file_info["scores"])) + # [:, 1] returns an 1d array with the distances only + mean_distance = np.mean(np.array(file_info['scores'])[:, 1]) + # TODO: do not cound sequential lines, only blocks output["anomalies_count"] += len(file_info["scores"]) file_info["mean_distance"] = mean_distance + output['targets'] = self.targets output["training_lines_count"] = self.training_lines_count output["testing_lines_count"] = self.testing_lines_count output["outlier_lines_count"] = self.outlier_lines_count output["reduction"] = 100 - 
(output["outlier_lines_count"] / output["testing_lines_count"]) * 100 - output["baseline"] = self.baseline - output["target"] = [path] if isinstance(path, str) else path + test_command = " ".join(command) + if test_command != self.train_command: + output["train_command"] = self.train_command + output["test_command"] = test_command + output["total_time"] = time.monotonic() - start_time return output diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/tests/test_download.py new/logreduce-0.2.0/logreduce/tests/test_download.py --- old/logreduce-0.1.3/logreduce/tests/test_download.py 1970-01-01 01:00:00.000000000 +0100 +++ new/logreduce-0.2.0/logreduce/tests/test_download.py 2018-08-27 04:25:34.000000000 +0200 @@ -0,0 +1,47 @@ +# Licensed under the Apache License, Version 2.0 (the "License"); you may +# not use this file except in compliance with the License. You may obtain +# a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT +# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the +# License for the specific language governing permissions and limitations +# under the License. 
+ +import unittest +import json +import uuid +from mock import patch + +import logreduce.download + + +class MockResponse(object): + def __init__(self, resp_data, code=200, msg='OK'): + self.resp_data = resp_data + self.status = code + self.msg = msg + self.headers = {'content-type': 'text/plain; charset=utf-8'} + + def read(self): + return self.resp_data.encode('utf-8') + + +class DownloadTests(unittest.TestCase): + @patch('urllib.request.urlopen') + def test_zuul_builds(self, mock_request): + fake_builds = [] + for i in range(3): + build_uuid = str(uuid.uuid4()) + fake_builds.append({ + "uuid": build_uuid, + "branch": "master", + "results": "SUCCESS", + "ref_url": "http://zuul.example.com/change/42", + "log_url": "http://zuul.example.com/logs/%s" % build_uuid, + }) + mock_request.return_value = MockResponse(json.dumps(fake_builds)) + zb = logreduce.download.ZuulBuilds("http://zuul.example.com/api") + self.assertEquals(3, len(zb.get(result="SUCCESS"))) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/tests/test_html_output.py new/logreduce-0.2.0/logreduce/tests/test_html_output.py --- old/logreduce-0.1.3/logreduce/tests/test_html_output.py 1970-01-01 01:00:00.000000000 +0100 +++ new/logreduce-0.2.0/logreduce/tests/test_html_output.py 2018-08-27 04:25:34.000000000 +0200 @@ -0,0 +1,23 @@ +# Licensed under the Apache License, Version 2.0 (the "License"); you may +# not use this file except in compliance with the License. You may obtain +# a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT +# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the +# License for the specific language governing permissions and limitations +# under the License. + +import unittest + +import logreduce.html_output + +from . 
utils import fake_result + + +class ProcessTests(unittest.TestCase): + def test_html_output(self): + html = logreduce.html_output.render_html(fake_result) + assert 'This is an anomaly' in html diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/tests/test_process.py new/logreduce-0.2.0/logreduce/tests/test_process.py --- old/logreduce-0.1.3/logreduce/tests/test_process.py 1970-01-01 01:00:00.000000000 +0100 +++ new/logreduce-0.2.0/logreduce/tests/test_process.py 2018-08-27 04:25:34.000000000 +0200 @@ -0,0 +1,75 @@ +# Licensed under the Apache License, Version 2.0 (the "License"); you may +# not use this file except in compliance with the License. You may obtain +# a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT +# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the +# License for the specific language governing permissions and limitations +# under the License. 
+ +import io +import unittest +import os + +import logreduce.process + + +class ProcessTests(unittest.TestCase): + def test_process_diff(self): + # Compare two python test files + clf = logreduce.process.Classifier() + baseline = __file__ + target = os.path.join(os.path.dirname(baseline), "test_download.py") + clf.train(baseline) + for file_result in clf.test(target): + filename, filename_orig, model, outliers, test_time = file_result + assert os.path.basename(model.sources[0]) == "test_process.py" + assert filename == "test_download.py" + assert test_time > 0 + assert len(outliers) > 0 + assert isinstance(outliers[0][0], int), 'line number wrong type' + assert isinstance(outliers[0][1], float), 'distance wrong type' + assert isinstance(outliers[0][2], str), 'line wrong type' + assert outliers[0][0] > 0, 'license matched as anomaly' + + # Save model and reload the model + model = io.BytesIO() + model.name = ":memory:" + clf.save(model) + model.seek(0) + logreduce.process.Classifier.check(model) + # joblib load reset the seek for io bytes, bypass model check in test + model = io.BytesIO(model.read()) + import sklearn + clf = sklearn.externals.joblib.load(model) + + # Re-use the model with another test file + target = os.path.join(os.path.dirname(baseline), "test_units.py") + for file_result in clf.test(target): + filename, filename_orig, model, outliers, test_time = file_result + assert os.path.basename(model.sources[0]) == "test_process.py" + assert filename == "test_units.py" + assert test_time > 0 + assert len(outliers) > 0 + assert isinstance(outliers[0][0], int), 'line number wrong type' + assert isinstance(outliers[0][1], float), 'distance wrong type' + assert isinstance(outliers[0][2], str), 'line wrong type' + assert outliers[0][0] > 0, 'license matched as anomaly' + + # Test the process method + result = clf.process(target) + assert result['baselines'] == [__file__] + assert result['targets'] == [target] + assert 'test_units.py' in result['files'] + 
file_info = result['files']['test_units.py'] + assert result['models']['test_process.py'].get('uuid') != '' + assert file_info['mean_distance'] > 0.0 + assert file_info['mean_distance'] < 1.0 + assert isinstance(file_info['lines'][0], str), 'line wrong type' + scores = file_info['scores'] + assert isinstance(scores[0][0], int), 'line number wrong type' + assert isinstance(scores[0][1], float), 'distance wrong type' + assert scores[0][0] > 0, 'license matched as anomaly' diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/tests/utils.py new/logreduce-0.2.0/logreduce/tests/utils.py --- old/logreduce-0.1.3/logreduce/tests/utils.py 1970-01-01 01:00:00.000000000 +0100 +++ new/logreduce-0.2.0/logreduce/tests/utils.py 2018-08-27 04:25:34.000000000 +0200 @@ -0,0 +1,45 @@ +# Licensed under the Apache License, Version 2.0 (the "License"); you may +# not use this file except in compliance with the License. You may obtain +# a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT +# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the +# License for the specific language governing permissions and limitations +# under the License. 
+ +fake_result = { + 'anomalies_count': 18, + 'baselines': ['test_process.py'], + 'files': { + 'test_units.py': { + 'file_url': 'test_units.py', + 'lines': [ + 'This is an anomaly...', + ], + 'scores': [ + (1, 0.8), + ], + 'mean_distance': 0.8, + 'model': 'test_process.py', + 'test_time': 0.005851114000506641 + } + }, + 'models': { + 'test_process.py': { + 'info': '65 samples, 108 features', + 'source_files': ['test_process.py'], + 'train_time': 0.012661808999837376 + } + }, + 'outlier_lines_count': 1, + 'reduction': 61.76470588235294, + 'targets': ['test_units.py'], + 'testing_lines_count': 34, + 'training_lines_count': 74, + 'total_time': 42, + 'unknown_files': [], + "test_command": "logreduce dir test_process.py test_units.py" +} diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/utils.py new/logreduce-0.2.0/logreduce/utils.py --- old/logreduce-0.1.3/logreduce/utils.py 2018-07-04 08:41:06.000000000 +0200 +++ new/logreduce-0.2.0/logreduce/utils.py 2018-08-27 04:25:34.000000000 +0200 @@ -28,9 +28,7 @@ DEFAULT_IGNORE_PATHS = [ "zuul-info/", '_zuul_ansible/', - 'ara-report/', - 'ara-sf/', - 'ara/', + 'ara[_-]*.*/', 'etc/hostname', 'etc/nodepool/provider', # sf-ci useless static files @@ -230,6 +228,9 @@ # Copy path list paths = list(paths) for path in paths: + if isinstance(path, dict) and path.get('local_path'): + # This is a build object, return the log's local path + path = path['local_path'] if isinstance(path, Journal): yield (path, "") elif os.path.isfile(path): diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce.egg-info/PKG-INFO new/logreduce-0.2.0/logreduce.egg-info/PKG-INFO --- old/logreduce-0.1.3/logreduce.egg-info/PKG-INFO 2018-07-04 08:41:29.000000000 +0200 +++ new/logreduce-0.2.0/logreduce.egg-info/PKG-INFO 2018-08-27 04:25:49.000000000 +0200 @@ -1,6 +1,6 @@ Metadata-Version: 1.1 Name: logreduce -Version: 0.1.3 
+Version: 0.2.0 Summary: Extract anomalies from log files Home-page: https://logreduce.softwarefactory-project.io/ Author: Tristan Cacqueray diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce.egg-info/SOURCES.txt new/logreduce-0.2.0/logreduce.egg-info/SOURCES.txt --- old/logreduce-0.1.3/logreduce.egg-info/SOURCES.txt 2018-07-04 08:41:29.000000000 +0200 +++ new/logreduce-0.2.0/logreduce.egg-info/SOURCES.txt 2018-08-27 04:25:49.000000000 +0200 @@ -27,7 +27,12 @@ logreduce.egg-info/pbr.json logreduce.egg-info/requires.txt logreduce.egg-info/top_level.txt +logreduce/tests/__init__.py +logreduce/tests/test_download.py +logreduce/tests/test_html_output.py +logreduce/tests/test_process.py logreduce/tests/test_units.py +logreduce/tests/utils.py playbooks/logreduce-tests.yaml roles/emit-job-report/README.rst roles/emit-job-report/defaults/main.yaml diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce.egg-info/pbr.json new/logreduce-0.2.0/logreduce.egg-info/pbr.json --- old/logreduce-0.1.3/logreduce.egg-info/pbr.json 2018-07-04 08:41:29.000000000 +0200 +++ new/logreduce-0.2.0/logreduce.egg-info/pbr.json 2018-08-27 04:25:49.000000000 +0200 @@ -1 +1 @@ -{"git_version": "f071111", "is_release": true} \ No newline at end of file +{"git_version": "2cc0ffd", "is_release": true} \ No newline at end of file diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/setup.cfg new/logreduce-0.2.0/setup.cfg --- old/logreduce-0.1.3/setup.cfg 2018-07-04 08:41:29.000000000 +0200 +++ new/logreduce-0.2.0/setup.cfg 2018-08-27 04:25:49.000000000 +0200 @@ -15,6 +15,10 @@ Topic :: Scientific/Engineering keywords = machine learning, ci, anomaly detection +[tool:pytest] +addopts = --verbose +python_files = logreduce/tests/*.py + [files] packages = logreduce @@ -38,5 +42,4 @@ [egg_info] tag_build = tag_date = 0 
-tag_svn_revision = 0 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/test-requirements.txt new/logreduce-0.2.0/test-requirements.txt --- old/logreduce-0.1.3/test-requirements.txt 2018-07-04 08:41:06.000000000 +0200 +++ new/logreduce-0.2.0/test-requirements.txt 2018-08-27 04:25:34.000000000 +0200 @@ -1 +1,2 @@ -nose +pytest +mock diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/tox.ini new/logreduce-0.2.0/tox.ini --- old/logreduce-0.1.3/tox.ini 2018-07-04 08:41:06.000000000 +0200 +++ new/logreduce-0.2.0/tox.ini 2018-08-27 04:25:34.000000000 +0200 @@ -8,7 +8,7 @@ sitepackages = True usedevelop = True deps = -rtest-requirements.txt -commands = nosetests -v --cover-package=logreduce +commands = py.test -v [testenv:pep8] deps = flake8
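Note on the process.py change above: the per-file report no longer stores grouped "chunks". It now keeps a flat 'scores' list of (pos, distance) tuples next to a parallel 'lines' list, and the mean distance is taken over the second element of each tuple. The sketch below shows how a consumer of that layout might compute the mean and count blocks of consecutive lines, which is what the TODO in the diff asks for. The helper name summarize is hypothetical and not part of logreduce. The block rule is an assumption modelled on the console-output condition in the diff (a new block starts when the position neither repeats nor follows the previous one).

```python
from statistics import mean


def summarize(scores):
    """Return (mean_distance, block_count) for a list of (pos, distance).

    Hypothetical helper, not part of logreduce: it only illustrates the
    flat 'scores' layout introduced in 0.2.0.
    """
    if not scores:
        return 0.0, 0
    # Mean over the distance column only, like np.array(scores)[:, 1].
    mean_distance = mean(distance for _, distance in scores)
    blocks = 1
    for (prev, _), (cur, _) in zip(scores, scores[1:]):
        # A repeated pos comes from one log line expanded into several
        # lines; pos + 1 is the next line of the same block.
        if cur != prev and cur - prev != 1:
            blocks += 1
    return mean_distance, blocks


if __name__ == "__main__":
    # Lines 12 and 13 form one block, line 40 starts a second one.
    print(summarize([(12, 0.8), (13, 0.4), (40, 0.6)]))
```

Counting blocks rather than individual lines would keep a long multi-line traceback from inflating anomalies_count, at the cost of one extra pass over the scores.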
