On 10/10/2013 01:42 PM, James E. Blair wrote:
<snip>
Okay, let me try to summarize current thinking:
* We want to try to avoid writing a tool that receives logs because
swift provides most/all of the needed functionality.
* The swift tempurl middleware will allow us to have the client
directly PUT files in swift using a HMAC signed token.
* This means any pre-processing of logs would need to happen with the
log-uploading-client or via some unspecified event trigger.
* We may or may not want a log-serving app.
* We're doing neat things like filtering on level and html-ifying logs
as we serve them with our current log-serving app.
* We could probably do that processing pre-upload (including embedding
javascript in html pages to do the visual filtering) and then we
could serve static pages instead.
* A log serving app may be required to provide some kinds of indexes.
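As a concrete illustration of the tempurl point: the middleware validates an HMAC-SHA1 over the method, expiry time, and object path against the account's temp-url key, so the upload client only needs a pre-signed URL, not credentials. A minimal sketch (the account/container names are made up; the signing recipe is the one the tempurl middleware documents):

```python
import hmac
import time
from hashlib import sha1

def temp_url(method, path, key, ttl=3600):
    """Build a Swift tempurl query string for a direct client PUT.

    `path` is the object path, e.g. "/v1/AUTH_account/container/object"
    (hypothetical names); `key` is the account's temp-url key
    (X-Account-Meta-Temp-URL-Key).
    """
    expires = int(time.time()) + ttl
    body = "%s\n%d\n%s" % (method, expires, path)
    sig = hmac.new(key.encode(), body.encode(), sha1).hexdigest()
    return "%s?temp_url_sig=%s&temp_url_expires=%d" % (path, sig, expires)
```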
So to decide on the log-serving app, we need to figure out:
1) What do we want out of indexes?
Let's take a current example log path:
http://logs.openstack.org/95/50795/4/check/check-grenade-devstack-vm/3c17e3c/console.html
Ignoring the change[-2:] prefix at the beginning since it's an
implementation artifact, that's basically:
/change/patchset/pipeline/job/run[random]/
The upload script can easily handle creating index pages below that
point. But since it runs in the context of a job run, it can't create
index pages above that (besides the technical difficulty, we don't want
to give it permission outside of its run anyway). So I believe that
without a log-receiving app, our only options are:
a) Use the static web swift middleware to provide indexes. Due to the
intersection of this feature, CDN, and container sizes with our
current providers, this is complicated and we end up at a dead end
every time we talk through it.
b) Use a log-serving application to generate index pages where we need
them. We could do this by querying swift. If we eliminate the
ability to list ridiculously large indexes (like all changes, etc) and
restrict it down to the level of, say, a single change, then this
might be manageable. However, swift may still have to perform a large
query to get us down to that level.
c) Reduce the discoverability of test runs. We could actually just
collapse the whole path into a random string and leave that as a
comment in Gerrit. Users would effectively never be able to discover
any runs other than the final ones that are reported in Gerrit, and
even comparing runs for different patchsets would involve looking up
the URL for each in the respective Gerrit comments. Openstack-infra
tools, such as elastic-recheck, could still discover other runs by
watching for ZMQ or Gearman events.
This would make little difference to most end-users or to existing
project tooling, but it would make it a little harder to develop new
project tooling without access to that event stream.
Honestly, option C is growing on me, but I'd like some more feedback on
that.
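For option b, the heavy lifting could lean on swift's own prefix/delimiter listing (a GET on the container with prefix= and delimiter=/ returns just the immediate children of a path). A local sketch of that grouping, to show the shape of the result a serving app would render as an index page (object names are illustrative):

```python
def index_entries(object_names, prefix, delimiter="/"):
    """Emulate swift's prefix/delimiter listing for one index level.

    A real log-serving app would issue
    GET /v1/AUTH_acct/container?prefix=<prefix>&delimiter=/
    (account/container names hypothetical) and let swift do this grouping
    server-side; this local version just shows the resulting entries.
    """
    entries = set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        # Keep only the first path component: "subdir/" or a leaf object.
        head, sep, _ = rest.partition(delimiter)
        entries.add(head + sep)
    return sorted(entries)
```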
2) What do we want out of processing?
Currently we HTMLify and filter logs by log level at run-time when
serving them. I think our choices are:
a) Continue doing this -- this requires a log-serving app that will
fetch logs from swift, process them, and serve them.
b) Pre-process logs before uploading them. HTMLify and add
client-side javascript line-level filtering. The logstash script may
need to do its own filtering since it won't be running a javascript
interpreter, but it could probably still do so based on metadata
encoded into the HTML by the pre-processor. Old logs won't benefit
from new features in the pre-processor though (unless we really feel
like batch-reprocessing).
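A pre-processor along the lines of 2b might tag each line with its level so that both client-side javascript and the logstash script can filter off the same metadata. A rough sketch, assuming a simple level-in-line log format (the regex, class, and attribute names here are made up, not the os-loganalyze ones):

```python
import html
import re

# Illustrative level matcher for OpenStack-style log lines.
LEVEL_RE = re.compile(r"\b(DEBUG|INFO|AUDIT|TRACE|WARNING|ERROR)\b")

def htmlify_line(line):
    """Wrap a raw log line in a span carrying its level as metadata,
    so filtering can happen client-side without re-parsing the line."""
    m = LEVEL_RE.search(line)
    level = m.group(1) if m else "NONE"
    return '<span class="line" data-level="%s">%s</span>' % (
        level, html.escape(line.rstrip("\n")))
```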
I think the combination of 1c and 2b gets us out of the business of
running log servers altogether and moves all the logic and processing to
the edges. I'm leaning toward them for that reason.
I'm completely indifferent to how storage and upload happen. Filesystem
/ swift / all is good to me.
However, my experience writing htmlify-screen-log.py, and its maturation
into openstack-infra/os-loganalyze, and the fact that I probably spend
more time staring at devstack/tempest logs than just about anyone, have
given me a couple of thoughts on log-serving.
Our logs are kind of interesting beasts. We have a few different
formats, and we've got a number of different consumers. There are some
real niceties of being able to put a dynamic layer between the raw logs
and the consumer:
1) HTTP negotiation - thanks both to our wsgi app and to mod_deflate, we
are able to do content negotiation with the client to serve them the
appropriate data. This means today you get the text/html version if your
client supports it; if it doesn't, you get the text/plain version. The
content is also compressed on the wire, automatically, based on your
client's ability to handle the compression.
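The negotiation step itself is simple; a sketch of the Accept-header check (real negotiation would also honor q-values and wildcards, and this helper is hypothetical, not the actual os-loganalyze code):

```python
def pick_content_type(accept_header):
    """Serve text/html only to clients that explicitly accept it,
    text/plain to everything else (curl, scripts, logstash)."""
    accepted = [part.split(";")[0].strip()
                for part in accept_header.split(",")]
    if "text/html" in accepted:
        return "text/html"
    return "text/plain"
```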
2) Dynamic Filtering - we added the level= parameter to the wsgi script
to speed up logstash indexing, as it turns out that python is vastly
faster at throwing away DEBUG lines than logstash was. It turns out
people love it too, because
http://logs.openstack.org/70/44670/3/gate/gate-tempest-devstack-vm-neutron/02d68e3/logs/screen-q-svc.txt.gz?level=TRACE
loads super quickly and lets you see where the top issues are.
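The level= filter boils down to a per-line severity comparison; a sketch, with an ordering that mirrors common OpenStack levels rather than the exact os-loganalyze table (continuation lines with no level are a wrinkle the real app handles by inheriting the previous line's level, which this sketch skips):

```python
import re

# Illustrative severity order, lowest to highest; level=TRACE keeps
# TRACE and everything above it.
LEVELS = ["NONE", "DEBUG", "INFO", "AUDIT", "TRACE", "WARNING", "ERROR"]
LEVEL_PAT = re.compile(r"\b(%s)\b" % "|".join(LEVELS[1:]))

def filter_by_level(lines, minimum):
    """Yield only the lines at or above the requested severity."""
    floor = LEVELS.index(minimum)
    for line in lines:
        m = LEVEL_PAT.search(line)
        level = m.group(1) if m else "NONE"
        if LEVELS.index(level) >= floor:
            yield line
```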
There are a few other interesting facts that we discovered in this
process - n-cpu on a nova-network run comes in at about 5 MB gzipped (40
MB uncompressed) of html once we do our filtering on it. If you are
running a browser other than Chrome on a nice Intel chip, life isn't
good. A future enhancement here is to be nicer to people and disable
DEBUG by default if the file size is too big.
A 40 MB html file means that client side filtering would be problematic.
First off, you take a huge network hit anyway; secondly, I expect DOM
manipulation at that level of complexity would give even Chrome a run
for its money.
And then there is just the nice idea of keeping the raw artifact and the
presentation layer separate. The fact that we can update our
presentation filter and make logs from last week, which we are still
using to debug issues, easier to read is a good thing.
So regardless of the eventual solution here, I *really* want the ability
to have a presentation layer filter between the raw logs and the
clients. HTTP has so many nice negotiation features worked into the
spec, which we're actually using today and which make life easier for
folks. And I'd really like not to lose that.
So 2a has a strong vote from me.
-Sean
--
Sean Dague
http://dague.net
_______________________________________________
OpenStack-Infra mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra