commit python-logreduce for openSUSE:Factory

root Thu, 25 Oct 2018 00:13:24 -0700

Hello community,

here is the log from the commit of package python-logreduce for 
openSUSE:Factory checked in at 2018-10-25 09:13:04
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-logreduce (Old)
 and      /work/SRC/openSUSE:Factory/.python-logreduce.new (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Package is "python-logreduce"

Thu Oct 25 09:13:04 2018 rev:4 rq:644414 version:0.2.0

Changes:
--------
--- /work/SRC/openSUSE:Factory/python-logreduce/python-logreduce.changes        2018-08-10 09:49:53.282280583 +0200
+++ /work/SRC/openSUSE:Factory/.python-logreduce.new/python-logreduce.changes   2018-10-25 09:13:04.970263204 +0200
@@ -1,0 +2,16 @@
+Wed Oct 24 18:45:49 UTC 2018 - Dirk Mueller <[email protected]>
+
+- update to 0.2.0:
+  * Use ara[-\_]\*.\*/ in the default ignore paths list
+  * Fix download asyncio loop and logger names
+  * Record test command used to train models
+  * Add a uuid to model object
+  * Remove chunk grouping in the process function
+  * Rewrite html output using patternfly
+  * Collect ZuulBuild in anomaly report
+  * Add --cacheonly argument to skip file download
+  * Add ara-.\* to the default ignore list
+  * Rewrite ZuulBuilds download module to discover base log\_url
+  * common: small fixes for automated process
+
+-------------------------------------------------------------------

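The changelog entry "Rewrite ZuulBuilds download module to discover base
log_url" can be illustrated with a minimal sketch. It assumes, going by the
download.py hunk in this commit, that the base URL is the nearest ancestor
directory that serves zuul-info/inventory.yaml. The name discover_log_url and
the exists callable are hypothetical and not part of logreduce; exists stands
in for the HTTP HEAD request so the example runs offline. Unlike the packaged
code, which keeps the reported log_url when nothing matches, this sketch
returns None.

```python
import posixpath


def discover_log_url(log_url, exists, attempts=5):
    """Walk up from log_url until <candidate>/zuul-info/inventory.yaml exists.

    `exists` is a hypothetical callable(url) -> bool replacing the HEAD
    request. Returns the matching base URL with a trailing slash, or None.
    """
    candidate = log_url.rstrip("/")
    for _ in range(attempts):
        marker = posixpath.join(candidate, "zuul-info/inventory.yaml")
        if exists(marker):
            return candidate + "/"
        # Nothing here: retry one directory level higher.
        candidate = posixpath.dirname(candidate)
    return None


# Usage with a fake "server" that only knows a single inventory file.
known = {"http://logs.example.org/42/job/abc123/zuul-info/inventory.yaml"}
print(discover_log_url("http://logs.example.org/42/job/abc123/html/",
                       known.__contains__))
```

Passing the existence check in as a callable keeps the walk-up logic separate
from the network access, which makes it easy to exercise without a Zuul
server.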
Old:
----
  logreduce-0.1.3.tar.gz

New:
----
  logreduce-0.2.0.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ python-logreduce.spec ++++++
--- /var/tmp/diff_new_pack.TLN4CW/_old  2018-10-25 09:13:05.902262646 +0200
+++ /var/tmp/diff_new_pack.TLN4CW/_new  2018-10-25 09:13:05.906262644 +0200
@@ -12,40 +12,40 @@
 # license that conforms to the Open Source Definition (Version 1.9)
 # published by the Open Source Initiative.
 
-# Please submit bugfixes or comments via http://bugs.opensuse.org/
+# Please submit bugfixes or comments via https://bugs.opensuse.org/
 #
 
 
 %{?!python_module:%define python_module() python-%{**} python3-%{**}}
 %define skip_python2 1
 Name:           python-logreduce
-Version:        0.1.3
+Version:        0.2.0
 Release:        0
 Summary:        Log file anomaly extractor
 License:        Apache-2.0
 Group:          Development/Languages/Python
-Url:            https://logreduce.softwarefactory-project.io/
+URL:            https://logreduce.softwarefactory-project.io/
 Source:         https://files.pythonhosted.org/packages/source/l/logreduce/logreduce-%{version}.tar.gz
 BuildRequires:  %{python_module devel}
 BuildRequires:  %{python_module pbr}
 BuildRequires:  %{python_module setuptools}
+BuildRequires:  fdupes
 BuildRequires:  python-rpm-macros
+Requires:       python-PyYAML
+Requires:       python-aiohttp
+Requires:       python-numpy
+Requires:       python-scikit-learn
+Requires:       python-scipy
+BuildArch:      noarch
 # SECTION test requirements
 BuildRequires:  %{python_module PyYAML}
 BuildRequires:  %{python_module aiohttp}
-BuildRequires:  %{python_module nose}
+BuildRequires:  %{python_module mock}
 BuildRequires:  %{python_module numpy}
+BuildRequires:  %{python_module pytest}
 BuildRequires:  %{python_module scikit-learn}
 BuildRequires:  %{python_module scipy}
 # /SECTION
-BuildRequires:  fdupes
-Requires:       python-PyYAML
-Requires:       python-aiohttp
-Requires:       python-numpy
-Requires:       python-scikit-learn
-Requires:       python-scipy
-BuildArch:      noarch
-
 %python_subpackages
 
 %description
@@ -74,6 +74,7 @@
 %install
 %python_install
 %python_expand %fdupes %{buildroot}%{$python_sitelib}
+
 %check
 %python_exec setup.py test
 

++++++ logreduce-0.1.3.tar.gz -> logreduce-0.2.0.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/ChangeLog new/logreduce-0.2.0/ChangeLog
--- old/logreduce-0.1.3/ChangeLog       2018-07-04 08:41:29.000000000 +0200
+++ new/logreduce-0.2.0/ChangeLog       2018-08-27 04:25:49.000000000 +0200
@@ -1,6 +1,21 @@
 CHANGES
 =======
 
+0.2.0
+-----
+
+* Use ara[-\_]\*.\*/ in the default ignore paths list
+* Fix download asyncio loop and logger names
+* Record test command used to train models
+* Add a uuid to model object
+* Remove chunk grouping in the process function
+* Rewrite html output using patternfly
+* Collect ZuulBuild in anomaly report
+* Add --cacheonly argument to skip file download
+* Add ara-.\* to the default ignore list
+* Rewrite ZuulBuilds download module to discover base log\_url
+* common: small fixes for automated process
+
 0.1.3
 -----
 
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/PKG-INFO new/logreduce-0.2.0/PKG-INFO
--- old/logreduce-0.1.3/PKG-INFO        2018-07-04 08:41:29.000000000 +0200
+++ new/logreduce-0.2.0/PKG-INFO        2018-08-27 04:25:49.000000000 +0200
@@ -1,6 +1,6 @@
 Metadata-Version: 1.1
 Name: logreduce
-Version: 0.1.3
+Version: 0.2.0
 Summary: Extract anomalies from log files
 Home-page: https://logreduce.softwarefactory-project.io/
 Author: Tristan Cacqueray
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/doc/index.rst new/logreduce-0.2.0/doc/index.rst
--- old/logreduce-0.1.3/doc/index.rst   2018-10-25 09:13:05.966262609 +0200
+++ new/logreduce-0.2.0/doc/index.rst   2018-08-27 04:25:34.000000000 +0200
@@ -1 +1,219 @@
-symbolic link to ../README.rst
+logreduce - extract anomaly from log files
+==========================================
+
+Based on success logs, logreduce highlights useful text in failed logs.
+The goal is to save time in finding a failure's root cause.
+
+On average, learning run at 2000 lines per second, and
+testing run at 1300 lines per seconds.
+
+
+How it works
+------------
+
+logreduce uses a *model* to learn successful logs and detect novelties in
+failed logs:
+
+* Random words are manually removed using regular expression
+* Then lines are converted to a matrix of token occurrences
+  (using **HashingVectorizer**),
+* An unsupervised learner implements neighbor searches
+  (using **NearestNeighbors**).
+
+
+Caveats
+-------
+
+This method doesn't work when debug content is only included in failed logs.
+To successfully detect anomalies, failed and success logs needs to be similar,
+otherwise the extra informations in failed logs will be considered anomalous.
+
+For example this happens with testr where success logs only contains 'SUCCESS'.
+
+
+Install
+-------
+
+* Fedora:
+
+.. code-block:: console
+
+  sudo dnf install -y python3-scikit-learn
+  git clone https://softwarefactory-project.io/r/logreduce
+  pushd logreduce
+  python3 setup.py develop --user
+  popd
+
+* Pip:
+
+.. code-block:: console
+
+  pip install --user logreduce
+
+
+Usage
+-----
+
+Logreduce needs a **baseline** for success log training, and a **target**
+for the log to reduce.
+
+Logreduce prints anomalies on the console, the log files are not modified:
+
+.. code-block:: console
+
+  "%(distance)f | %(log_path)s:%(line_number)d: %(log_line)s"
+
+Local file usage
+................
+
+* Compare two files or directories without building a model:
+
+.. code-block:: console
+
+  $ logreduce diff testr-nodepool-01/output.good testr-nodepool-01/output.fail
+  0.232 | testr-nodepool-01/output.fail:0677:  File "voluptuous/schema_builder.py", line 370, in validate_mapping
+  0.462 | testr-nodepool-01/output.fail:0678:    raise er.MultipleInvalid(errors)
+  0.650 | testr-nodepool-01/output.fail:0679:  voluptuous.error.MultipleInvalid: required key not provided @ data['providers'][2]['cloud']
+
+* Compare two files or directories:
+
+.. code-block:: console
+
+  $ logreduce dir preprod-logs/ /var/log/
+
+
+* Or build a model first and run it separately:
+
+.. code-block:: console
+
+  $ logreduce dir-train sosreport.clf old-sosreport/ good-sosreport/
+  $ logreduce dir-run sosreport.clf new-sosreport/
+
+
+Zuul job usage
+..............
+
+Logreduce can query Zuul build database to train a model.
+
+* Extract novelty from a job logs:
+
+.. code-block:: console
+
+  $ logreduce job http://logs.openstack.org/...
+
+  # Reduce comparaison to a single project (e.g. for tox jobs)
+  $ logreduce job --project openstack/nova http://logs.openstack.org/...
+
+  # Compare using many baselines
+  $ logreduce job --count 10 http://logs.openstack.org/...
+
+  # Include job artifacts
+  $ logreduce job --include-path logs/ http:/logs.openstack.org/...
+
+* Or build a model first and run it separately:
+
+.. code-block:: console
+
+  $ logreduce job-train --job job_name job_name.clf
+  $ logreduce job-run job_name.clf http://logs.openstack.org/.../
+
+
+Journald usage
+..............
+
+Logreduce can look for anomaly in journald, comparing the last day/week/month
+to the previous one:
+
+* Extract novelty from last day journal:
+
+.. code-block:: console
+
+  $ logreduce journal --range day
+
+* Build a model using journal of last month and look for novelty in last week:
+
+.. code-block:: console
+
+  $ logreduce journal-train --range month good-journal.clf
+  $ logreduce journal-run --range week good-journal.clf
+
+
+logreduce-tests
+---------------
+
+This package contains tests data for different type of log such as testr
+or syslog. Each tests includes a pre-computed list of the anomalies in log
+failures.
+
+This package also includes a command line utility to run logreduce against all
+tests data and print a summary of its performance.
+
+
+Test format
+...........
+
+Each tests case is composed of:
+
+* A *.good* file (or directory) that holds the baseline
+* A *.fail* file (or directory)
+* A *info.yaml* file that describe expected output:
+
+.. code-block:: yaml
+
+  threshold: float # set the distance threshold for the test
+  anomalies:
+    - optional: bool  # to define minor anomalies not considered false positive
+      lines: |        # the expected lines to be highlighted
+        Traceback...
+        RuntimeError...
+
+
+Evaluate
+........
+
+To run the evaluation, first install logreduce-tests:
+
+.. code-block:: console
+
+  git clone https://softwarefactory-project.io/r/logreduce-tests
+  pushd logreduce-tests
+  python3 setup.py develop --user
+
+logreduce-tests expect tests directories as argument:
+
+.. code-block:: console
+
+  $ logreduce-tests tests/testr-zuul-[0-9]*
+  [testr-zuul-01]: 100.00% accuracy,  5.00% false-positive
+  [testr-zuul-02]:  80.00% accuracy,  0.00% false-positive
+  ...
+  Summary:  90.00% accuracy,  2.50% false-positive
+
+Add --debug to display false positive and missing chunks.
+
+
+TODOs
+-----
+
+* Add terminal colors output
+* Add progress bar
+* Better differentiate training debug from testing debug
+* Add a starting log line and report written
+* Add tarball traversal in utils.files_iterator
+* Add logstash filter module
+* Improve tokenization tests
+
+
+Roadmap
+-------
+* Add daemon worker mode with MQTT event listener
+* Discard files that are 100% anomalous
+* Report mean diviation instead of absolute distances
+* Investigate second stage model
+
+
+Contribute
+----------
+
+Contribution are most welcome, use **git-review** to propose a change.
+Setup your ssh keys after sign in https://softwarefactory-project.io/auth/login
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/cmd.py new/logreduce-0.2.0/logreduce/cmd.py
--- old/logreduce-0.1.3/logreduce/cmd.py        2018-07-04 08:41:06.000000000 +0200
+++ new/logreduce-0.2.0/logreduce/cmd.py        2018-08-27 04:25:34.000000000 +0200
@@ -17,6 +17,7 @@
 import logging
 import os
 import time
+import yaml
 
 import logreduce.download
 import logreduce.utils
@@ -77,6 +78,8 @@
         parser.set_defaults(func=None)
         parser.add_argument("--debug", action="store_true", help="Print debug")
         parser.add_argument("--tmp-dir", default=os.getcwd())
+        parser.add_argument("--cacheonly", action="store_true",
+                            help="Do not download any logs")
 
         # Common arguments
         def path_filters(s):
@@ -193,6 +196,8 @@
             s = sub.add_parser("job-run", help="Run a model against CI logs")
             s.set_defaults(func=self.job_run)
             report_filters(s)
+            s.add_argument("--zuul-web", default=DEFAULT_ZUUL_WEB,
+                           help="The zuul-web url (including the tenant name)")
             path_filters(s)
             s.add_argument("model_file")
             s.add_argument("logs_url", help="The CI logs url or a local dir")
@@ -312,6 +317,7 @@
         for pipeline in pipelines:
             for baseline in logreduce.download.ZuulBuilds(self.zuul_web).get(
                     job=self.job,
+                    result='SUCCESS',
                     branch=self.branch,
                     pipeline=pipeline,
                     project=self.project,
@@ -323,22 +329,21 @@
             print("%s: couldn't find success in pipeline %s" % (
                 self.job, " ".join(pipelines)))
             exit(4)
-        baselines_paths = []
-        url_prefixes = {}
         for baseline in baselines:
-            if baseline[-1] != "/":
-                baseline += "/"
+            if baseline['log_url'][-1] != "/":
+                baseline['log_url'] += "/"
             dest = os.path.join(
-                self.tmp_dir, "_baselines", self.job, baseline.split('/')[-2])
-            self.download_logs(baseline, dest)
-            baselines_paths.append(dest)
-            url_prefixes["%s/" % dest] = baseline
+                self.tmp_dir, "_baselines", self.job,
+                baseline['log_url'].split('/')[-2])
+            self.download_logs(baseline['log_url'], dest)
+            baseline['local_path'] = dest
 
         # Train model
         clf = self._get_classifier()
-        clf.train(baselines_paths, url_prefixes)
+        clf.train(baselines)
         clf.save(model_file)
-        print("%s: built with %s" % (model_file, " ".join(baselines)))
+        print("%s: built with %s" % (
+            model_file, " ".join(map(str, baselines))))
         return clf
 
     def job_run(self, model_file, logs_url):
@@ -347,7 +352,8 @@
             target = logs_url
         else:
             target = self.download_logs(logs_url)
-        self._report(clf, target)
+        build = self._get_build(target)
+        self._report(clf, build)
 
     def job_allinone(self, logs_url):
         if self.job is None:
@@ -366,7 +372,33 @@
             clf = self.job_train(model_file)
 
         target = self.download_logs(logs_url)
-        self._report(clf, target)
+        build = self._get_build(target)
+        self._report(clf, build)
+
+    def _get_build(self, target):
+        build_cache = os.path.join(target, "zuul-info/build.json")
+        if os.path.exists(build_cache):
+            return logreduce.download.ZuulBuild(json.load(open(build_cache)))
+        inv_path = os.path.join(target, "zuul-info/inventory.yaml")
+        try:
+            inv = yaml.safe_load(open(inv_path))
+        except FileNotFoundError:
+            self.log.info("%s: couldn't find file", inv_path)
+            return None
+        try:
+            build_uuid = inv['all']['vars']['zuul']['build']
+        except KeyError:
+            self.log.info("%s: couldn't find build id", inv_path)
+            return None
+        try:
+            build = logreduce.download.ZuulBuilds(self.zuul_web).get(
+                uuid=build_uuid)[0]
+        except IndexError:
+            self.log.warning("%s: couldn't find build", build_uuid)
+            return None
+        build['local_path'] = target
+        json.dump(build, open(build_cache, "w"))
+        return build
 
     # Jounrald usage
     def journal_train(self, model_file):
@@ -410,9 +442,12 @@
                 self.job = logs_url.split('/')[-3]
             target_dir = os.path.join(
                 self.tmp_dir, "_targets", self.job, logs_url.split('/')[-2])
+        if self.cacheonly:
+            return target_dir
+
         os.makedirs(target_dir, exist_ok=True)
 
-        logs_path = ["job-output.txt.gz"]
+        logs_path = ["job-output.txt.gz", "zuul-info/inventory.yaml"]
         if self.include_path:
             logs_path.append(self.include_path)
 
@@ -448,7 +483,6 @@
         console_output = True
         if json_file or self.html:
             console_output = False
-        start_time = time.monotonic()
         output = clf.process(path=target_dirs,
                              path_source=target_source,
                              threshold=float(self.threshold),
@@ -458,7 +492,6 @@
                              console_output=console_output)
         if not output.get("anomalies_count"):
             exit(4)
-        output["total_time"] = time.monotonic() - start_time
         if self.html:
             open(self.html, "w").write(
                 render_html(output, self.static_location))
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/download.py new/logreduce-0.2.0/logreduce/download.py
--- old/logreduce-0.1.3/logreduce/download.py   2018-07-04 08:41:06.000000000 +0200
+++ new/logreduce-0.2.0/logreduce/download.py   2018-08-27 04:25:34.000000000 +0200
@@ -19,6 +19,7 @@
 import re
 import os
 from urllib.parse import urlparse
+import urllib.request
 
 import aiohttp
 
@@ -55,7 +56,7 @@
 
 
 class RecursiveDownload:
-    log = logging.getLogger("RecursiveDownload")
+    log = logging.getLogger("logreduce.RecursiveDownload")
 
     def __init__(self, url, dest, threads=4, trim=None,
                  exclude_files=[], exclude_paths=[], exclude_extensions=[]):
@@ -67,9 +68,14 @@
         self.active_worker = 0
         self.trim = trim
 
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            loop = asyncio.new_event_loop()
+            asyncio.set_event_loop(loop)
+
         self.queue = asyncio.Queue()
         self.queue.put_nowait(url)
-        loop = asyncio.get_event_loop()
         self.tasks = [loop.create_task(self.handle_task(idx))
                       for idx in range(threads)]
 
@@ -155,38 +161,72 @@
             self.active_worker -= 1
 
 
+class ZuulBuild(dict):
+    def __repr__(self):
+        inf = "id=%s ref=%s" % (self['uuid'][:7], self['ref'])
+        if self.get("project"):
+            inf += " project=%s" % self['project']
+        if self.get('local_path'):
+            inf += " local_path=%s" % self['local_path']
+        if self.get("log_url"):
+            inf += " log_url=%s" % self['log_url']
+        return "<ZuulBuild %s>" % inf
+
+    def __str__(self):
+        return self.__repr__()
+
+    def __unicode__(self):
+        return self.__repr__()
+
+
 class ZuulBuilds:
-    log = logging.getLogger("ZuulBuilds")
+    log = logging.getLogger("logreduce.ZuulBuilds")
 
     def __init__(self, zuul_url):
         self.zuul_url = zuul_url
 
-    def get(self, **kwarg):
-        loop = asyncio.get_event_loop()
-        urls = loop.run_until_complete(self._get(**kwarg))
-        return urls
-
-    async def _get(self, job, project=None, pipeline=None,
-                   branch=None,
-                   count=3, result="SUCCESS"):
-        url = "%s/builds?job_name=%s" % (self.zuul_url, job)
+    def get(self, job=None, project=None, pipeline=None,
+            branch=None, uuid=None,
+            count=3, result=None):
+        url = "%s/builds" % self.zuul_url
+        args = ""
+        if job:
+            args += "&job_name=%s" % job
         if project:
-            url += "&project=%s" % project
+            args += "&project=%s" % project
         if branch:
-            url += "&branch=%s" % branch
+            args += "&branch=%s" % branch
         if pipeline:
-            url += "&pipeline=%s" % pipeline
+            args += "&pipeline=%s" % pipeline
+        if uuid:
+            args += "&uuid=%s" % uuid
         if result:
-            url += "&result=%s" % result
-        async with aiohttp.ClientSession() as session:
-            self.log.debug('Getting %s' % url)
-            async with session.get(url, timeout=30) as response:
-                assert response.status == 200
-                data = await response.read()
-                urls = []
-                for build in json.loads(data.decode('utf-8'))[:count]:
-                    urls.append(build["log_url"])
-                return urls
+            args += "&result=%s" % result
+        if args:
+            url = "%s?%s" % (url, args[1:])
+        self.log.debug('Getting %s' % url)
+        resp = urllib.request.urlopen(url)
+        builds_data = json.loads(resp.read().decode('utf-8'))
+        builds = []
+        for build in builds_data[:count]:
+            # Discover true log_url when success-url is nested
+            log_url = build["log_url"].rstrip('/')
+            attempts = 5
+            while attempts > 0:
+                inf_url = os.path.join(log_url, "zuul-info/inventory.yaml")
+                self.log.debug('Checking %s' % inf_url)
+                req = urllib.request.Request(inf_url, method='HEAD')
+                try:
+                    resp = urllib.request.urlopen(req)
+                    if resp.status == 200:
+                        build["log_url"] = "%s/" % log_url
+                        break
+                except urllib.error.HTTPError:
+                    pass
+                attempts -= 1
+                log_url = os.path.dirname(log_url)
+            builds.append(ZuulBuild(build))
+        return builds
 
 
 def main():
@@ -195,12 +235,12 @@
     if args.url:
         urls.append(args.url)
     else:
-        for url in ZuulBuilds(args.zuul_url).get(
+        for build in ZuulBuilds(args.zuul_url).get(
                 job=args.job,
                 pipeline=args.pipeline,
                 project=args.project,
                 count=args.count):
-            urls.append(url)
+            urls.append(build['log_url'])
     for url in urls:
         print(RecursiveDownload(url, args.dest, args.threads,
                                 exclude_files=args.exclude_file,
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/html_output.py new/logreduce-0.2.0/logreduce/html_output.py
--- old/logreduce-0.1.3/logreduce/html_output.py        2018-07-04 08:41:06.000000000 +0200
+++ new/logreduce-0.2.0/logreduce/html_output.py        2018-08-27 04:25:34.000000000 +0200
@@ -12,91 +12,269 @@
 
 import html
 import os.path
-import sys
+import pkg_resources
+
+LOGO = """
+iVBORw0KGgoAAAANSUhEUgAAABcAAAAXBAMAAAASBMmTAAAAFVBMVEU6feU9geVjl+WMsOUicOXA
+z+X///9aF/8vAAAAAXRSTlMAQObYZgAAAAFiS0dEAIgFHUgAAAAJcEhZcwAACxMAAAsTAQCanBgA
+AAAHdElNRQfiCAsFFSh04lDsAAAAp0lEQVQY012QXQ7CQAiE158LyKYH6BDf7WAPYEkP4BLvfxVx
+WxPjhge+sAwDpfy9oxgfX5hQhbctt0VV4Z1OjTVC+OowtqeNrekH7rO7qq/3/HcAZoIzOZZy5uDt
+IrHKlEN0Yg9D9kugQkVbTaDxMkAM9lOJuvWAhinwkR782dVS+iAwjpqzsDlYr7uDcsKPt3TtEc7o
++8Siqeb7doQI9jwFjcv/YcobPpYhOB4CZRcAAAAASUVORK5CYII=
+"""
+
+HTML_DOM = """
+<!DOCTYPE html>
+<html class='layout-pf'>
+  <head>
+    <title>Logreduce of {target}</title>
+    <meta charset='UTF-8'>
+    <link rel='stylesheet' type='text/css' href='{ptnfly_css_loc}'>
+    <link rel='stylesheet' type='text/css' href='{ptnfly_cssa_loc}'>
+    <style>
+.loglines {{max-height: 800px; overflow-y: scroll;}}
+.list-group-item-container {{overflow: hidden;}}
+.ls {{margin-top: 0px; margin-bottom: 10px; border-color: black;}}
+#debuginfo {{display: none;}}
+    </style>
+  </head>
+  <body>
+    <nav class="navbar navbar-default navbar-pf" role="navigation">
+      <div class="navbar-header">
+        <img src="data:image/jpeg;base64,{logo}" alt="LogReduce" />
+      </div>
+      <div class="collapse navbar-collapse navbar-collapse-1">
+        <ul class="nav navbar-nav navbar-utility">
+          <li><a href="#" id='debugbtn'>Show Debug</a></li>
+          <li><a href="https://pypi.org/project/logreduce/" target="_blank">
+            Documentation
+          </a></li>
+          <li><a href="#"><strong>Version</strong> {version}</a></li>
+        </ul>
+        <ul class="nav navbar-nav navbar-primary">
+            <li class="active"><a href="log-classify.html">Report</a></li>
+            <li><a href="ara-report/">ARA Records Ansible</a></li>
+            <li><a href="./">Job Artifacts</a></li>
+        </ul>
+      </div>
+    </nav>
+    <div class="container" style='width: 100%'>
+      {body}
+    </div>
+    <script src='{jquery_loc}'></script>
+    <script src='{bootst_loc}/js/bootstrap.min.js'></script>
+    <script src='{ptnfly_loc}/js/patternfly.min.js'></script>
+    <script>{js}</script>
+  </body>
+</html>
+"""
+
+JS = """
+$(document).ready(function(){
+$('#debugbtn').on('click', function(event) {$('[id=debuginfo]').toggle();});
+});
+$(".list-group-item-header").click(function(event){
+  if(!$(event.target).is("button, a, input, .fa-ellipsis-v")){
+    $(this).find(".fa-angle-right").toggleClass("fa-angle-down")
+      .end().parent().toggleClass("list-view-pf-expand-active")
+      .find(".list-group-item-container").toggleClass("hidden");
+    }
+})
+$(".list-group-item-container .close").on("click", function (){
+  $(this).parent().addClass("hidden")
+         .parent().removeClass("list-view-pf-expand-active")
+         .find(".fa-angle-right").removeClass("fa-angle-down");
+})
+"""
+
+
+def render_unmatch_list(dom, output):
+    if output.get("unknown_files"):
+        dom.append("<br /><h2>Unmatched file in previous success logs</h2>")
+        dom.append("<ul>")
+        for fname in output["unknown_files"]:
+            dom.append("<li><a href='%s'>%s</a></li>" % (fname[1], fname[0]))
+        dom.append("</ul>")
+
+
+def table(dom, columns, rows):
+    dom.append(
+        "<div id='debuginfo' style='overflow-x: auto'>"
+        "<table style='white-space: nowrap; margin: 0px' "
+        "class='table table-condensed table-responsive table-bordered'>"
+    )
+    if columns:
+        dom.append("<thead><tr>")
+        for col in columns:
+            dom.append("<th>%s</th>" % col)
+        dom.append("</tr></thead>")
+    dom.append("<tbody>")
+    for row in rows:
+        if columns and len(row) > len(columns):
+            dom.append("<tr id='%s'>" % row.pop())
+        else:
+            dom.append("<tr>")
+        for col in row:
+            dom.append("<td>%s</td>" % col)
+        dom.append("</tr>")
+    dom.append("</tbody></table><br /></div>")
+
+
+def render_result_info(dom, output):
+    rows = []
+    if output.get("train_command"):
+        rows.append(("Test command", output["train_command"]))
+    rows.append(("Command", output["test_command"]))
+    rows.append(("Targets", "%s" % " ".join(
+        map(html.escape, map(str, output["targets"])))))
+    rows.append(("Baselines", "%s" % " ".join(
+        map(html.escape, map(str, output["baselines"])))))
+    rows.append(("Anomalies count", output["anomalies_count"]))
+    rows.append(("Run time", "%.2f seconds" % output["total_time"]))
+    rows.append(("Reduction", "%02.2f%% (from %d lines to %d)" % (
+        output["reduction"],
+        output["testing_lines_count"],
+        output["outlier_lines_count"])))
+    table(dom, columns=[], rows=rows)
+
+
+def render_result_table(dom, files_sorted):
+    columns = [
+        "Anomaly count",
+        "Filename",
+        "Test time",
+        "Model"
+    ]
+    rows = []
+    for filename, data in files_sorted:
+        if not data["scores"]:
+            continue
+        rows.append((
+            len(data["scores"]),
+            "<a href='#%s'>%s</a> (<a href='%s'>log link</a>)" % (
+                filename.replace('/', '_'), filename, data["file_url"]),
+            "%.2f sec" % data["test_time"],
+            "<a href='#model_%s'>%s</a>" % (data["model"], data["model"])))
+    table(dom, columns, rows)
+
+
+def render_model_table(dom, model_sorted, links):
+    columns = [
+        "Model", "Train time", "Infos", "Baseline files"
+    ]
+    rows = []
+    for model_name, data in model_sorted:
+        rows.append([
+            model_name,
+            "%.2f sec" % data["train_time"],
+            data["info"],
+            " ".join(links[model_name]),
+            "model_%s" % model_name,
+        ])
+    table(dom, columns, rows)
+
+
+def render_logfile(dom, filename, data, source_links, expanded=False):
+    lines_dom = []
+    last_pos = None
+    for idx in range(len(data["scores"])):
+        pos, dist = data["scores"][idx]
+        line = data["lines"][idx]
+        lines_dom.append(
+            "<font color='#%02x0000'>%1.3f | %04d: %s</font><br />" % (
+                int(255 * dist), dist, pos + 1, html.escape(line)))
+        if last_pos and last_pos != pos and pos - last_pos != 1:
+            lines_dom.append("<hr class='ls' />")
+        last_pos = pos
+
+    expand = " hidden"
+    list_expand = ""
+    angle = ""
+    if expanded:
+        expand = ""
+        angle = " fa-angle-down"
+        list_expand = " list-view-pf-expand-active"
+    dom.append("""
+    <div class="list-group-item{list_expand}" id='{anchor}'>
+      <div class="list-group-item-header">
+        <div class="list-view-pf-expand">
+          <span class="fa fa-angle-right{angle}"></span>
+        </div>
+        <div class="list-view-pf-main-info">
+          <div class="list-view-pf-left">
+            <span class="fa pficon-degraded list-view-pf-icon-sm"></span>
+          </div>
+          <div class="list-view-pf-body">
+            <div class="list-view-pf-description">
+              <div class="list-group-item-heading">
+                {filename}
+              </div>
+              <div class="list-group-item-text">
+                (<a href="{loglink}">log link</a>)
+              </div>
+            </div>
+            <div class="list-view-pf-additional-info-item" id='debuginfo'>
+              <span class="pficon pficon-registry"></span>
+              <a href="{model_link}">{model_name}</a> model
+            </div>
+            <div class="list-view-pf-additional-info-item">
+              <span class="fa fa-bug"></span>
+              <strong>{anomaly_count}</strong>
+            </div>
+          </div>
+        </div>
+      </div>
+      <div class="list-group-item-container container-fluid{expand}">
+        <div class="close"><span class="pficon pficon-close"></span></div>
+        <div id='debuginfo'>baseline samples:<ul>{baselines}</ul></div>
+        <div class="loglines">
+          {lines}
+        </div>
+      </div>
+    </div>
+    """.format(
+        lines="\n".join(lines_dom),
+        baselines="".join(map(lambda x: "<li>%s</li>" % x, source_links)),
+        list_expand=list_expand,
+        expand=expand,
+        angle=angle,
+        anchor=filename.replace('/', '_'),
+        model_name=data['model'],
+        model_link="#model_%s" % data['model'],
+        anomaly_count=len(data["scores"]),
+        filename=filename,
+        loglink=data['file_url'],))
+    return
 
 
 def render_html(output, static_location=None):
     if static_location:
         jquery_loc = static_location + "/js/jquery.min.js"
-        bootst_loc = static_location + "/bootstrap/"
+        bootst_loc = static_location + "/bootstrap"
+        ptnfly_loc = static_location + "/patternfly"
     else:
         jquery_loc = "https://code.jquery.com/jquery-3.3.1.min.js"
         bootst_loc = "https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7"
-    dom = [
-        "<html><head>"
-        "<title>Logreduce of {target}</title>"
-        "<link rel='stylesheet' href='{bootst_loc}/css/bootstrap.min.css'>"
-        "<script src='{jquery_loc}'></script>"
-        "<script src='{bootst_loc}/js/bootstrap.min.js'>".format(
-            target=" ".join(output["target"]),
-            jquery_loc=jquery_loc,
-            bootst_loc=bootst_loc)]
-    dom.append(
-        "</script><script>$(document).ready(function(){"
-        "$('#debugbtn').on('click', function(event) {\n"
-        "$('[id=debuginfo]').toggle();\n"
-        "});});"
-        "</script>"
-        "<style>.panel-body {max-height: 800; overflow-y: scroll;}\n"
-        "#debuginfo {display: none;}</style>"
-        "</head><body style='margin-left: 20px'>"
-        "<h2 id='debuginfo'>Logreduce</h2>"
-        "<button type='button' id='debugbtn' "
-        "class='pull-right btn-xs btn-primary btn'>Show Debug</button>"
-        "<h4>--&gt; <a href='./'>Full logs</a> // "
-        "<a href='ara-report'>ARA Records Ansible</a> &lt;--</h4>")
-    # Results info
-    dom.append("<ul id='debuginfo'>")
-    dom.append("  <li>Command: %s</li>" % " ".join(sys.argv))
-    dom.append("  <li>Target: %s</li>" % " ".join(output["target"]))
-    dom.append("  <li>Baseline: %s</li>" % " ".join(output["baseline"]))
-    dom.append("  <li>Anomalies count: %d</li>" % output["anomalies_count"])
-    dom.append("  <li>Run time: %.2f seconds</li>" % output["total_time"])
-    dom.append("  <li>%02.2f%% reduction (from %d lines to %d)</li>" % (
-        output["reduction"],
-        output["testing_lines_count"],
-        output["outlier_lines_count"]))
-    dom.append("</ul>")
-    # Results table of content
-    dom.append("<div id='debuginfo' style='overflow-x: scroll'>"
-               "<table style='white-space: nowrap; margin: 0px' "
-               "class='table table-condensed table-responsive'>"
-               "<thead><tr>"
-               "<th>Anomaly count</th><th>Filename</th>"
-               "<th>Test time</th><th>Model</th>"
-               "</tr></thead><tbody>")
+        ptnfly_loc = "https://cdnjs.cloudflare.com/ajax/libs/patternfly/3.24.0"
+
+    ptnfly_css_loc = "%s/css/patternfly.min.css" % ptnfly_loc
+    ptnfly_cssa_loc = "%s/css/patternfly-additions.min.css" % ptnfly_loc
+
+    body = []
+
+    render_result_info(body, output)
+
     files_sorted = sorted(
         output['files'].items(),
         key=lambda x: (x[0].startswith("job-output.txt") or
                        x[1]['mean_distance']),
         reverse=True)
 
-    for filename, data in files_sorted:
-        if not data["chunks"]:
-            continue
-        dom.append("  <tr>"
-                   "<td>%d</td>" % len(data["scores"]) +
-                   "<td><a href='#%s'>%s</a> (<a href='%s'>log link</a>)</td>"
-                   % (filename.replace('/', '_'), filename,
-                      data["file_url"]) +
-                   "<td>%.2f sec</td>" % data["test_time"] +
-                   "<td><a href='#model_%s'>%s</a></td>" % (
-                       data["model"], data["model"]) +
-                   "</tr>")
-    dom.append("</tbody></table></div><br />")
-
-    # Model table
-    model_dom = [
-        "<div id='debuginfo' style='overflow-x: scroll'>"
-        "<table style='white-space: nowrap; margin: 0px' "
-        "class='table table-condensed table-responsive'>"
-        "<thead><tr>"
-        "<th>Model</th><th>Train time</th>"
-        "<th>Infos</th><th>Baseline files</th>"
-        "</tr></thead><tbody>"]
-    models_sorted = sorted(output['models'].items(),
-                           key=lambda x: x[1]['train_time'],
-                           reverse=True)
+    models_sorted = sorted(
+        output['models'].items(),
+        key=lambda x: x[1]['train_time'],
+        reverse=True)
+    links = {}
     for model_name, data in models_sorted:
         source_links = []
         for source_file in data["source_files"]:
@@ -110,62 +288,31 @@
                 ))
             else:
                 source_links.append(source_file)
-        data["source_links"] = source_links
-        model_dom.append("  <tr id='model_%s'>" % model_name +
-                         "<td>%s</td>" % model_name +
-                         "<td>%.2f sec</td>" % data["train_time"] +
-                         "<td>%s</td>" % data["info"] +
-                         "<td>%s</td>" % " ".join(data["source_links"]) +
-                         "</tr>")
-    model_dom.append("</tbody></table></div><br />")
+        links[model_name] = source_links
 
-    # Anomalies result table
+    render_result_table(body, files_sorted)
+
+    body.append('<div class="list-group list-view-pf list-view-pf-view">')
+    first = True
     for filename, data in files_sorted:
-        if not data["chunks"]:
+        if not data["scores"]:
             continue
-        heading_dom = (
-            "<div class='panel-heading'>"
-            "%s (<a href='%s'>log link</a>)"
-            "<span class='pull-right' id='debuginfo'>model: "
-            "<a href='#model_%s'>%s</a> (%s)"
-            "</span></div>" % (
-                filename, data['file_url'],
-                data['model'], data['model'],
-                output["models"][data["model"]]["info"]))
-
-        dom.append(
-            "<div class='panel panel-default' id='%s'>" % (
-                filename.replace('/', '_')) +
-            heading_dom +
-            "<div class='panel-body'>")
-        # Link sample baseline
-        dom.append("<div id='debuginfo'>baseline samples:<ul>")
-        for source_link in output["models"][data["model"]]["source_links"]:
-            dom.append("<li>%s</li>" % source_link)
-        dom.append("</ul></div>")
-        for idx in range(len(data["chunks"])):
-            lines = data["chunks"][idx].split('\n')
-            for line_pos in range(len(lines)):
-                line_score = data["scores"][idx][line_pos]
-                dom.append(
-                    "<font color='#%02x0000'>%1.3f | %04d: %s</font><br />" % (
-                        int(255 * line_score),
-                        line_score,
-                        data["line_pos"][idx][line_pos],
-                        html.escape(lines[line_pos])))
-            if idx < len(data["chunks"]) - 1:
-                dom.append("<hr style='margin-top: 0px; margin-bottom: 10px; "
-                           "border-color: black;' />")
-        dom.append("</div></div>")
+        render_logfile(body, filename, data, links[data["model"]], first)
+        first = False
+    body.append('</div>')
 
-    dom.extend(model_dom)
-    if output.get("unknown_files"):
-        dom.append("<br /><h2>Unmatched file in previous success logs</h2>")
-        dom.append("<ul>")
-        for fname in output["unknown_files"]:
-            dom.append("<li><a href='%s'>%s</a></li>" % (fname[1], fname[0]))
-        dom.append("</ul>")
-    dom.append("<h4>--&gt; <a href='./'>Full logs</a> // "
-               "<a href='ara-report'>ARA Records Ansible</a> &lt;--</h4>")
-    dom.append("</body></html>")
-    return "\n".join(dom)
+    render_model_table(body, models_sorted, links)
+
+    render_unmatch_list(body, output)
+
+    return HTML_DOM.format(
+        target=" ".join(map(html.escape, map(str, output["targets"]))),
+        js=JS,
+        logo=LOGO.replace('\n', ''),
+        version=pkg_resources.get_distribution("logreduce").version,
+        body="\n".join(body),
+        jquery_loc=jquery_loc,
+        bootst_loc=bootst_loc,
+        ptnfly_loc=ptnfly_loc,
+        ptnfly_css_loc=ptnfly_css_loc,
+        ptnfly_cssa_loc=ptnfly_cssa_loc)
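
Note on the `files_sorted` key kept in `render_html`: it relies on `or` short-circuiting. A file whose name starts with `job-output.txt` yields `True`, which sorts as 1, and any other file falls back to its `mean_distance`. The sketch below uses made-up file names and distances and assumes distances stay below 1, as the tests in this commit assert; it is an illustration, not code from the package.

```python
# Hypothetical report entries; only 'mean_distance' matters for the ordering.
files = {
    "logs/syslog.txt": {"mean_distance": 0.35},
    "job-output.txt.gz": {"mean_distance": 0.10},
    "logs/nova.log": {"mean_distance": 0.72},
}

files_sorted = sorted(
    files.items(),
    # True (which compares as 1) for the job output, else the mean distance
    key=lambda x: (x[0].startswith("job-output.txt") or
                   x[1]["mean_distance"]),
    reverse=True)

ordered_names = [name for name, _ in files_sorted]
print(ordered_names)
```

With `reverse=True` the job output comes first, followed by the remaining files from the highest mean distance to the lowest.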
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/process.py new/logreduce-0.2.0/logreduce/process.py
--- old/logreduce-0.1.3/logreduce/process.py    2018-07-04 08:41:06.000000000 +0200
+++ new/logreduce-0.2.0/logreduce/process.py    2018-08-27 04:25:34.000000000 +0200
@@ -15,7 +15,9 @@
 import os
 import re
 import struct
+import sys
 import time
+import uuid
 
 import numpy as np
 import sklearn.utils.validation
@@ -31,8 +33,8 @@
 
 
 class Classifier:
-    log = logging.getLogger("Classifier")
-    version = 3
+    log = logging.getLogger("logreduce.Classifier")
+    version = 4
 
     def __init__(self,
                  model='bag-of-words_nn', exclude_paths=[], exclude_files=[]):
@@ -41,6 +43,11 @@
         self.exclude_paths = exclude_paths
         self.exclude_files = exclude_files
         self.test_prefix = None
+        # Default
+        self.threshold = 0.2
+        self.merge_distance = 5
+        self.before_context = 2
+        self.after_context = 2
 
     def get(self, model_name):
         return self.models.setdefault(model_name,
@@ -105,16 +112,22 @@
         # Remove numbers and symbols
         return re.subn(r'[^a-zA-Z\/\._-]*', '', shortfilename)[0]
 
-    def train(self, path, url_prefixes={}):
-        """Train the model"""
+    def train(self, baselines, command=sys.argv):
+        """Train the model, baselines can be path(s) or build dict(s)"""
         start_time = time.monotonic()
+        self.train_command = " ".join(command)
         self.training_lines_count = 0
         self.training_size = 0
-        self.baseline = path
+        if not isinstance(baselines, list):
+            baselines = [baselines]
+        if not len(baselines):
+            raise RuntimeError("Empty training baselines")
+
+        self.baselines = baselines
 
         # Group similar files for the same model
         to_train = {}
-        for filename, filename_rel in files_iterator(path,
+        for filename, filename_rel in files_iterator(baselines,
                                                      self.exclude_files,
                                                      self.exclude_paths):
             if filename_rel:
@@ -133,6 +146,7 @@
             model = self.get(model_name)
             model.size = 0
             model.count = 0
+            model.uuid = str(uuid.uuid4())
             # Tokenize and store all lines in train_data
             train_data = []
             for filename in filenames:
@@ -171,13 +185,16 @@
                 finally:
                     if fobj:
                         fobj.close()
-                # Set forig for report.html absolute url
+                # Check for remote file source location
                 forig = filename
-                for prefix, url in url_prefixes.items():
-                    if filename.startswith(prefix):
-                        forig = os.path.join(url,
-                                             filename[len(prefix):])
-                        break
+                for build in self.baselines:
+                    if isinstance(build, dict):
+                        build_prefix = "%s/" % build.get(
+                            'local_path', '').rstrip('/')
+                        if filename.startswith(build_prefix):
+                            forig = os.path.join(build.get('log_url'),
+                                                 filename[len(build_prefix):])
+                            break
                 model.sources.append(forig)
 
             if not train_data:
@@ -214,14 +231,20 @@
 
 
 #    @profile
-    def test(self, path):
-        """Return outliers"""
+    def test(self, targets):
+        """Return outliers, target can be path(s) or build dict(s)"""
         start_time = time.monotonic()
         self.testing_lines_count = 0
         self.testing_size = 0
         self.outlier_lines_count = 0
+        if not isinstance(targets, list):
+            targets = [targets]
+        if not len(targets):
+            raise RuntimeError("Empty testing targets")
+
+        self.targets = targets
 
-        for filename, filename_rel in files_iterator(path,
+        for filename, filename_rel in files_iterator(targets,
                                                      self.exclude_files,
                                                      self.exclude_paths):
             if filename_rel:
@@ -312,6 +335,9 @@
             # Transform and compute distance from the model
             model = self.models[model_name]
             try:
+                # Distances are a list of float list.
+                # The HashingNeighbors vectorizer uses n_neighbors=1 to only
+                # return the closest distance to a known baseline vector.
                 distances = model.test(test_data)
             except (sklearn.utils.validation.NotFittedError,
                     sklearn.exceptions.NotFittedError):
@@ -321,11 +347,12 @@
             def get_line_info(line_pos):
                 line = data[line_pos]
                 try:
-                    distance = distances[test_data_pos.index(line_pos)]
+                    # Only keep the first distance
+                    distance = distances[test_data_pos.index(line_pos)][0]
                 except ValueError:
                     # Line wasn't in test data
                     try:
-                        distance = distances[dup_pos[line_pos]]
+                        distance = distances[dup_pos[line_pos]][0]
                     except KeyError:
                         # Line wasn't a duplicate
                         distance = 0.0
@@ -369,14 +396,17 @@
             raise RuntimeError("No test lines found")
 
     def process(self, path, path_source=None, threshold=0.2, merge_distance=5,
-                before_context=3, after_context=1, console_output=False):
+                before_context=3, after_context=1, console_output=False,
+                command=sys.argv):
         """Process target and create a report"""
+        start_time = time.monotonic()
         self.threshold = threshold
         self.merge_distance = merge_distance
         self.before_context = before_context
         self.after_context = after_context
-        output = {'files': {}, 'unknown_files': [], 'models': {},
-                  'anomalies_count': 0}
+        output = {'files': {}, 'unknown_files': [],
+                  'models': {}, 'anomalies_count': 0,
+                  'baselines': self.baselines}
         for file_result in self.test(path):
             filename, filename_orig, model, outliers, test_time = file_result
             if model is None:
@@ -390,68 +420,52 @@
                 'source_files': list(map(str, model.sources)),
                 'train_time': model.train_time,
                 'info': model.info,
+                'uuid': model.uuid,
             })
             file_info = output['files'].setdefault(filename, {
                 'file_url': filename_orig,
                 'test_time': test_time,
                 'model': model.name,
-                'chunks': [],
                 'scores': [],
-                'line_pos': [],
-                'lines_count': 0,
+                'lines': [],
             })
-            current_chunk = []
-            current_score = []
-            current_pos = []
             last_pos = None
             self.log.debug("%s: compared with %s" % (
                 filename, " ".join(list(map(str, model.sources)))))
 
             for pos, distance, outlier in outliers:
-                distance = abs(float(distance))
-                if last_pos and pos - last_pos != 1:
-                    # New chunk
-                    file_info["chunks"].append("\n".join(current_chunk))
-                    file_info["scores"].append(current_score)
-                    file_info["line_pos"].append(current_pos)
-                    file_info["lines_count"] += len(current_chunk)
-                    current_chunk = []
-                    current_score = []
-                    current_pos = []
-                    if last_pos and console_output:
-                        print()
-
-                # Clean ansible one-liner outputs
+                # Expand one-liner outputs (e.g. ansible)
                 for line in outlier[:-1].split(r'\n'):
                     line = line.replace(r'\t', '\t')
-                    current_score.append(distance)
-                    current_chunk.append(line)
-                    current_pos.append(pos)
+                    file_info['scores'].append((pos, distance))
+                    file_info['lines'].append(line)
                     if console_output:
+                        if last_pos and last_pos != pos and \
+                                pos - last_pos != 1:
+                            print()
                         print("%1.3f | %s:%04d:\t%s" % (distance,
                                                         filename,
                                                         pos + 1,
                                                         line))
 
-                last_pos = pos
-            if current_chunk:
-                file_info["chunks"].append("\n".join(current_chunk))
-                file_info["scores"].append(current_score)
-                file_info["line_pos"].append(current_pos)
-                file_info["lines_count"] += len(current_chunk)
-
             # Compute mean distances of outliers
             mean_distance = 0
             if file_info["scores"]:
-                mean_distance = np.mean(np.hstack(file_info["scores"]))
+                # [:, 1] returns a 1d array with the distances only
+                mean_distance = np.mean(np.array(file_info['scores'])[:, 1])
+                # TODO: do not count sequential lines, only blocks
                 output["anomalies_count"] += len(file_info["scores"])
             file_info["mean_distance"] = mean_distance
 
+        output['targets'] = self.targets
         output["training_lines_count"] = self.training_lines_count
         output["testing_lines_count"] = self.testing_lines_count
         output["outlier_lines_count"] = self.outlier_lines_count
         output["reduction"] = 100 - (output["outlier_lines_count"] /
                                      output["testing_lines_count"]) * 100
-        output["baseline"] = self.baseline
-        output["target"] = [path] if isinstance(path, str) else path
+        test_command = " ".join(command)
+        if test_command != self.train_command:
+            output["train_command"] = self.train_command
+        output["test_command"] = test_command
+        output["total_time"] = time.monotonic() - start_time
         return output
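
Note on the remote source lookup added to `train`: when a baseline is a build dict, a file under its `local_path` is reported with the matching `log_url` instead of the local path. The sketch below isolates that mapping in a hypothetical helper, `remote_source`, which does not exist in logreduce; the build dict and paths are invented, and the joined URL assumes a POSIX `os.path.join`.

```python
import os


def remote_source(filename, baselines):
    """Return the remote URL for filename if a build dict owns it."""
    for build in baselines:
        if isinstance(build, dict):
            # Normalise the prefix so it always ends with a single '/'
            build_prefix = "%s/" % build.get("local_path", "").rstrip("/")
            if filename.startswith(build_prefix):
                return os.path.join(build.get("log_url"),
                                    filename[len(build_prefix):])
    # Plain paths (or files outside any build) are returned unchanged
    return filename


# Hypothetical baselines: one build dict and one plain path
baselines = [
    {"local_path": "/tmp/cache/abc/",
     "log_url": "http://zuul.example.com/logs/abc"},
    "/var/log/plain",
]
remote = remote_source("/tmp/cache/abc/job-output.txt", baselines)
local = remote_source("/var/log/plain/syslog", baselines)
print(remote)
print(local)
```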
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/tests/test_download.py new/logreduce-0.2.0/logreduce/tests/test_download.py
--- old/logreduce-0.1.3/logreduce/tests/test_download.py        1970-01-01 01:00:00.000000000 +0100
+++ new/logreduce-0.2.0/logreduce/tests/test_download.py        2018-08-27 04:25:34.000000000 +0200
@@ -0,0 +1,47 @@
+# Licensed under the Apache License, Version 2.0 (the "License"); you may
+# not use this file except in compliance with the License. You may obtain
+# a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+# License for the specific language governing permissions and limitations
+# under the License.
+
+import unittest
+import json
+import uuid
+from mock import patch
+
+import logreduce.download
+
+
+class MockResponse(object):
+    def __init__(self, resp_data, code=200, msg='OK'):
+        self.resp_data = resp_data
+        self.status = code
+        self.msg = msg
+        self.headers = {'content-type': 'text/plain; charset=utf-8'}
+
+    def read(self):
+        return self.resp_data.encode('utf-8')
+
+
+class DownloadTests(unittest.TestCase):
+    @patch('urllib.request.urlopen')
+    def test_zuul_builds(self, mock_request):
+        fake_builds = []
+        for i in range(3):
+            build_uuid = str(uuid.uuid4())
+            fake_builds.append({
+                "uuid": build_uuid,
+                "branch": "master",
+                "results": "SUCCESS",
+                "ref_url": "http://zuul.example.com/change/42",
+                "log_url": "http://zuul.example.com/logs/%s" % build_uuid,
+            })
+        mock_request.return_value = MockResponse(json.dumps(fake_builds))
+        zb = logreduce.download.ZuulBuilds("http://zuul.example.com/api")
+        self.assertEquals(3, len(zb.get(result="SUCCESS")))
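
Note on the mocking pattern in `test_zuul_builds`: patching `urllib.request.urlopen` lets the test feed canned JSON to the client without network access. The sketch below reproduces the idea with the standard library's `unittest.mock` (the test itself imports the third-party `mock` package) and a hypothetical `fetch_builds` function standing in for the real client; it is not the `logreduce.download` API.

```python
import json
import urllib.request
from unittest.mock import patch


class MockResponse:
    """Minimal response object exposing the read() method the client uses."""
    def __init__(self, resp_data):
        self.resp_data = resp_data
        self.status = 200

    def read(self):
        return self.resp_data.encode("utf-8")


def fetch_builds(api_url):
    # Hypothetical client: looks up urlopen at call time, so the patch applies
    resp = urllib.request.urlopen(api_url + "/builds")
    return json.loads(resp.read().decode("utf-8"))


with patch("urllib.request.urlopen") as mock_request:
    fake_builds = [{"result": "SUCCESS"} for _ in range(3)]
    mock_request.return_value = MockResponse(json.dumps(fake_builds))
    builds = fetch_builds("http://zuul.example.com/api")

print(len(builds))
```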
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/tests/test_html_output.py new/logreduce-0.2.0/logreduce/tests/test_html_output.py
--- old/logreduce-0.1.3/logreduce/tests/test_html_output.py     1970-01-01 01:00:00.000000000 +0100
+++ new/logreduce-0.2.0/logreduce/tests/test_html_output.py     2018-08-27 04:25:34.000000000 +0200
@@ -0,0 +1,23 @@
+# Licensed under the Apache License, Version 2.0 (the "License"); you may
+# not use this file except in compliance with the License. You may obtain
+# a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+# License for the specific language governing permissions and limitations
+# under the License.
+
+import unittest
+
+import logreduce.html_output
+
+from . utils import fake_result
+
+
+class ProcessTests(unittest.TestCase):
+    def test_html_output(self):
+        html = logreduce.html_output.render_html(fake_result)
+        assert 'This is an anomaly' in html
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/tests/test_process.py new/logreduce-0.2.0/logreduce/tests/test_process.py
--- old/logreduce-0.1.3/logreduce/tests/test_process.py 1970-01-01 01:00:00.000000000 +0100
+++ new/logreduce-0.2.0/logreduce/tests/test_process.py 2018-08-27 04:25:34.000000000 +0200
@@ -0,0 +1,75 @@
+# Licensed under the Apache License, Version 2.0 (the "License"); you may
+# not use this file except in compliance with the License. You may obtain
+# a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+# License for the specific language governing permissions and limitations
+# under the License.
+
+import io
+import unittest
+import os
+
+import logreduce.process
+
+
+class ProcessTests(unittest.TestCase):
+    def test_process_diff(self):
+        # Compare two python test files
+        clf = logreduce.process.Classifier()
+        baseline = __file__
+        target = os.path.join(os.path.dirname(baseline), "test_download.py")
+        clf.train(baseline)
+        for file_result in clf.test(target):
+            filename, filename_orig, model, outliers, test_time = file_result
+            assert os.path.basename(model.sources[0]) == "test_process.py"
+            assert filename == "test_download.py"
+            assert test_time > 0
+            assert len(outliers) > 0
+            assert isinstance(outliers[0][0], int), 'line number wrong type'
+            assert isinstance(outliers[0][1], float), 'distance wrong type'
+            assert isinstance(outliers[0][2], str), 'line wrong type'
+            assert outliers[0][0] > 0, 'license matched as anomaly'
+
+        # Save model and reload the model
+        model = io.BytesIO()
+        model.name = ":memory:"
+        clf.save(model)
+        model.seek(0)
+        logreduce.process.Classifier.check(model)
+        # joblib load reset the seek for io bytes, bypass model check in test
+        model = io.BytesIO(model.read())
+        import sklearn
+        clf = sklearn.externals.joblib.load(model)
+
+        # Re-use the model with another test file
+        target = os.path.join(os.path.dirname(baseline), "test_units.py")
+        for file_result in clf.test(target):
+            filename, filename_orig, model, outliers, test_time = file_result
+            assert os.path.basename(model.sources[0]) == "test_process.py"
+            assert filename == "test_units.py"
+            assert test_time > 0
+            assert len(outliers) > 0
+            assert isinstance(outliers[0][0], int), 'line number wrong type'
+            assert isinstance(outliers[0][1], float), 'distance wrong type'
+            assert isinstance(outliers[0][2], str), 'line wrong type'
+            assert outliers[0][0] > 0, 'license matched as anomaly'
+
+        # Test the process method
+        result = clf.process(target)
+        assert result['baselines'] == [__file__]
+        assert result['targets'] == [target]
+        assert 'test_units.py' in result['files']
+        file_info = result['files']['test_units.py']
+        assert result['models']['test_process.py'].get('uuid') != ''
+        assert file_info['mean_distance'] > 0.0
+        assert file_info['mean_distance'] < 1.0
+        assert isinstance(file_info['lines'][0], str), 'line wrong type'
+        scores = file_info['scores']
+        assert isinstance(scores[0][0], int), 'line number wrong type'
+        assert isinstance(scores[0][1], float), 'distance wrong type'
+        assert scores[0][0] > 0, 'license matched as anomaly'
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/tests/utils.py new/logreduce-0.2.0/logreduce/tests/utils.py
--- old/logreduce-0.1.3/logreduce/tests/utils.py        1970-01-01 01:00:00.000000000 +0100
+++ new/logreduce-0.2.0/logreduce/tests/utils.py        2018-08-27 04:25:34.000000000 +0200
@@ -0,0 +1,45 @@
+# Licensed under the Apache License, Version 2.0 (the "License"); you may
+# not use this file except in compliance with the License. You may obtain
+# a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+# License for the specific language governing permissions and limitations
+# under the License.
+
+fake_result = {
+    'anomalies_count': 18,
+    'baselines': ['test_process.py'],
+    'files': {
+        'test_units.py': {
+            'file_url': 'test_units.py',
+            'lines': [
+                'This is an anomaly...',
+            ],
+            'scores': [
+                (1, 0.8),
+            ],
+            'mean_distance': 0.8,
+            'model': 'test_process.py',
+            'test_time': 0.005851114000506641
+        }
+    },
+    'models': {
+        'test_process.py': {
+            'info': '65 samples, 108 features',
+            'source_files': ['test_process.py'],
+            'train_time': 0.012661808999837376
+        }
+    },
+    'outlier_lines_count': 1,
+    'reduction': 61.76470588235294,
+    'targets': ['test_units.py'],
+    'testing_lines_count': 34,
+    'training_lines_count': 74,
+    'total_time': 42,
+    'unknown_files': [],
+    "test_command": "logreduce dir test_process.py test_units.py"
+}
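
Note on the new per-file layout shown in `fake_result`: `lines` and `scores` are parallel lists, and each score is a `(position, distance)` tuple rather than a per-chunk list. The sketch below walks such an entry and computes the mean distance with `statistics.mean` as a stand-in for the `np.mean(...[:, 1])` call in `process.py`; the second line and score are invented for illustration.

```python
from statistics import mean

# Hypothetical file entry following the 0.2.0 layout
file_info = {
    "lines": ["This is an anomaly...", "Another odd line"],
    "scores": [(1, 0.8), (7, 0.4)],
}

# Keep only the distance column, like [:, 1] on the numpy array
distances = [distance for _pos, distance in file_info["scores"]]
mean_distance = mean(distances) if distances else 0

# Pair each line with its position and distance
report = [(pos, distance, line)
          for (pos, distance), line in zip(file_info["scores"],
                                           file_info["lines"])]
print(mean_distance)
print(report)
```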
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce/utils.py new/logreduce-0.2.0/logreduce/utils.py
--- old/logreduce-0.1.3/logreduce/utils.py      2018-07-04 08:41:06.000000000 +0200
+++ new/logreduce-0.2.0/logreduce/utils.py      2018-08-27 04:25:34.000000000 +0200
@@ -28,9 +28,7 @@
 DEFAULT_IGNORE_PATHS = [
     "zuul-info/",
     '_zuul_ansible/',
-    'ara-report/',
-    'ara-sf/',
-    'ara/',
+    'ara[_-]*.*/',
     'etc/hostname',
     'etc/nodepool/provider',
     # sf-ci useless static files
@@ -230,6 +228,9 @@
         # Copy path list
         paths = list(paths)
     for path in paths:
+        if isinstance(path, dict) and path.get('local_path'):
+            # This is a build object, return the log's local path
+            path = path['local_path']
         if isinstance(path, Journal):
             yield (path, "")
         elif os.path.isfile(path):
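
Note on the `'ara[_-]*.*/'` entry that replaces the three literal `ara` paths: the sketch below assumes the entries of `DEFAULT_IGNORE_PATHS` are treated as regular expressions matched from the start of a relative path, which this diff does not show. Under that assumption the new pattern covers the old names but is also broader than them. The sample paths are invented.

```python
import re

# New default ignore entry from this commit
new_pattern = r"ara[_-]*.*/"

# Hypothetical relative paths
paths = [
    "ara-report/index.html",
    "ara/db.sqlite",
    "ara_sf/x",
    "arabic/notes.txt",
    "etc/hostname",
]

# Assumption: patterns are anchored at the start of the path (re.match)
matched = [p for p in paths if re.match(new_pattern, p)]
print(matched)
```

Because `.*` accepts any characters before the slash, a directory such as `arabic/` would also be ignored under this assumption.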
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce.egg-info/PKG-INFO new/logreduce-0.2.0/logreduce.egg-info/PKG-INFO
--- old/logreduce-0.1.3/logreduce.egg-info/PKG-INFO     2018-07-04 08:41:29.000000000 +0200
+++ new/logreduce-0.2.0/logreduce.egg-info/PKG-INFO     2018-08-27 04:25:49.000000000 +0200
@@ -1,6 +1,6 @@
 Metadata-Version: 1.1
 Name: logreduce
-Version: 0.1.3
+Version: 0.2.0
 Summary: Extract anomalies from log files
 Home-page: https://logreduce.softwarefactory-project.io/
 Author: Tristan Cacqueray
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce.egg-info/SOURCES.txt new/logreduce-0.2.0/logreduce.egg-info/SOURCES.txt
--- old/logreduce-0.1.3/logreduce.egg-info/SOURCES.txt  2018-07-04 08:41:29.000000000 +0200
+++ new/logreduce-0.2.0/logreduce.egg-info/SOURCES.txt  2018-08-27 04:25:49.000000000 +0200
@@ -27,7 +27,12 @@
 logreduce.egg-info/pbr.json
 logreduce.egg-info/requires.txt
 logreduce.egg-info/top_level.txt
+logreduce/tests/__init__.py
+logreduce/tests/test_download.py
+logreduce/tests/test_html_output.py
+logreduce/tests/test_process.py
 logreduce/tests/test_units.py
+logreduce/tests/utils.py
 playbooks/logreduce-tests.yaml
 roles/emit-job-report/README.rst
 roles/emit-job-report/defaults/main.yaml
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/logreduce.egg-info/pbr.json new/logreduce-0.2.0/logreduce.egg-info/pbr.json
--- old/logreduce-0.1.3/logreduce.egg-info/pbr.json     2018-07-04 08:41:29.000000000 +0200
+++ new/logreduce-0.2.0/logreduce.egg-info/pbr.json     2018-08-27 04:25:49.000000000 +0200
@@ -1 +1 @@
-{"git_version": "f071111", "is_release": true}
\ No newline at end of file
+{"git_version": "2cc0ffd", "is_release": true}
\ No newline at end of file
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/setup.cfg new/logreduce-0.2.0/setup.cfg
--- old/logreduce-0.1.3/setup.cfg       2018-07-04 08:41:29.000000000 +0200
+++ new/logreduce-0.2.0/setup.cfg       2018-08-27 04:25:49.000000000 +0200
@@ -15,6 +15,10 @@
        Topic :: Scientific/Engineering
 keywords = machine learning, ci, anomaly detection
 
+[tool:pytest]
+addopts = --verbose
+python_files = logreduce/tests/*.py
+
 [files]
 packages = logreduce
 
@@ -38,5 +42,4 @@
 [egg_info]
 tag_build = 
 tag_date = 0
-tag_svn_revision = 0
 
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/test-requirements.txt new/logreduce-0.2.0/test-requirements.txt
--- old/logreduce-0.1.3/test-requirements.txt   2018-07-04 08:41:06.000000000 +0200
+++ new/logreduce-0.2.0/test-requirements.txt   2018-08-27 04:25:34.000000000 +0200
@@ -1 +1,2 @@
-nose
+pytest
+mock
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/logreduce-0.1.3/tox.ini new/logreduce-0.2.0/tox.ini
--- old/logreduce-0.1.3/tox.ini 2018-07-04 08:41:06.000000000 +0200
+++ new/logreduce-0.2.0/tox.ini 2018-08-27 04:25:34.000000000 +0200
@@ -8,7 +8,7 @@
 sitepackages = True
 usedevelop = True
 deps = -rtest-requirements.txt
-commands = nosetests -v --cover-package=logreduce
+commands = py.test -v
 
 [testenv:pep8]
 deps = flake8
