On 10/10/2013 01:42 PM, James E. Blair wrote:
<snip>
Okay, let me try to summarize current thinking:
* We want to try to avoid writing a tool that receives logs because
swift provides most/all of the needed functionality.
* The swift tempurl middleware will allow us to have the client
directly PUT files in swift using a HMAC signed token.
* This means any pre-processing of logs would need to happen with the
log-uploading-client or via some unspecified event trigger.
* We may or may not want a log-serving app.
* We're doing neat things like filtering on level and html-ifying logs
as we serve them with our current log-serving app.
* We could probably do that processing pre-upload (including embedding
javascript in html pages to do the visual filtering) and then we
could serve static pages instead.
* A log serving app may be required to provide some kinds of indexes.
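As a concrete illustration of the tempurl point: the middleware validates an HMAC-SHA1 over the method, expiry time, and object path against the account's temp-url key, so the upload client only needs a pre-signed URL, not credentials. A minimal sketch (the account/container names are made up; the signing recipe is the one the tempurl middleware documents):

```python
import hmac
import time
from hashlib import sha1

def temp_url(method, path, key, ttl=3600):
    """Build a Swift tempurl query string for a direct client PUT.

    `path` is the object path, e.g. "/v1/AUTH_account/container/object"
    (hypothetical names); `key` is the account's temp-url key
    (X-Account-Meta-Temp-URL-Key).
    """
    expires = int(time.time()) + ttl
    body = "%s\n%d\n%s" % (method, expires, path)
    sig = hmac.new(key.encode(), body.encode(), sha1).hexdigest()
    return "%s?temp_url_sig=%s&temp_url_expires=%d" % (path, sig, expires)
```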
So to decide on the log-serving app, we need to figure out:
1) What do we want out of indexes?
Let's take a current example log path:
http://logs.openstack.org/95/50795/4/check/check-grenade-devstack-vm/3c17e3c/console.html
Ignoring the change[-2:] prefix at the beginning since it's an
implementation artifact, that's basically:
/change/patchset/pipeline/job/run[random]/
The upload script can easily handle creating index pages below that
point. But since it runs in the context of a job run, it can't create
index pages above that (besides the technical difficulty, we don't want
to give it permission outside of its run anyway). So I believe that
without a log-receiving app, our only options are:
a) Use the static web swift middleware to provide indexes. Due to the
intersection of this feature, CDN, and container sizes with our
current providers, this is complicated and we end up at a dead end
every time we talk through it.
b) Use a log-serving application to generate index pages where we need
them. We could do this by querying swift. If we eliminate the
ability to list ridiculously large indexes (like all changes, etc) and
restrict it down to the level of, say, a single change, then this
might be manageable. However, swift may still have to perform a large
query to get us down to that level.
c) Reduce the discoverability of test runs. We could actually just
collapse the whole path into a random string and leave that as a
comment in Gerrit. Users would effectively never be able to discover
any runs other than the final ones that are reported in Gerrit, and
even comparing runs for different patchsets would involve looking up
the URL for each in the respective Gerrit comments. Openstack-infra
tools, such as elastic-recheck, could still discover other runs by
watching for ZMQ or Gearman events.
This would make little difference to most end-users or to existing
project tooling, but it would make it a little harder to develop new
project tooling without access to that event stream.
Honestly, option C is growing on me, but I'd like some more feedback on
that.
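For option b, the heavy lifting could lean on swift's own prefix/delimiter listing (a GET on the container with prefix= and delimiter=/ returns just the immediate children of a path). A local sketch of that grouping, to show the shape of the result a serving app would render as an index page (object names are illustrative):

```python
def index_entries(object_names, prefix, delimiter="/"):
    """Emulate swift's prefix/delimiter listing for one index level.

    A real log-serving app would issue
    GET /v1/AUTH_acct/container?prefix=<prefix>&delimiter=/
    (account/container names hypothetical) and let swift do this grouping
    server-side; this local version just shows the resulting entries.
    """
    entries = set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        # Keep only the first path component: "subdir/" or a leaf object.
        head, sep, _ = rest.partition(delimiter)
        entries.add(head + sep)
    return sorted(entries)
```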
2) What do we want out of processing?
Currently we HTMLify and filter logs by log level at run-time when
serving them. I think our choices are:
a) Continue doing this -- this requires a log-serving app that will
fetch logs from swift, process them, and serve them.
b) Pre-process logs before uploading them. HTMLify and add
client-side javascript line-level filtering. The logstash script may
need to do its own filtering since it won't be running a javascript
interpreter, but it could probably still do so based on metadata
encoded into the HTML by the pre-processor. Old logs won't benefit
from new features in the pre-processor though (unless we really feel
like batch-reprocessing).
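A pre-processor along the lines of 2b might tag each line with its level so that both client-side javascript and the logstash script can filter off the same metadata. A rough sketch, assuming a simple level-in-line log format (the regex, class, and attribute names here are made up, not the os-loganalyze ones):

```python
import html
import re

# Illustrative level matcher for OpenStack-style log lines.
LEVEL_RE = re.compile(r"\b(DEBUG|INFO|AUDIT|TRACE|WARNING|ERROR)\b")

def htmlify_line(line):
    """Wrap a raw log line in a span carrying its level as metadata,
    so filtering can happen client-side without re-parsing the line."""
    m = LEVEL_RE.search(line)
    level = m.group(1) if m else "NONE"
    return '<span class="line" data-level="%s">%s</span>' % (
        level, html.escape(line.rstrip("\n")))
```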
I think the combination of 1c and 2b gets us out of the business of
running log servers altogether and moves all the logic and processing to
the edges. I'm leaning toward them for that reason.
I'm completely indifferent to how storage and upload happen. Filesystem
/ swift / all is good to me.
However, my experience writing htmlify-screen-log.py, and its maturation
into openstack-infra/os-loganalyze, and the fact that I probably spend
more time staring at devstack/tempest logs than just about anyone, have
given me a couple of thoughts on log-serving.
Our logs are kind of interesting beasts. We have a few different
formats, and we've got a number of different consumers. There are some
real niceties of being able to put a dynamic layer between the raw logs
and the consumer:
1) HTTP negotiation - thanks both to our wsgi app and to mod_deflate, we
are able to do content negotiation with the client to serve them the
appropriate data. This means today you get the text/html version if your
client supports it; if it doesn't, you get the text/plain version. The
content is also compressed on the wire, automatically, based on your
client's ability to handle the compression.
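The negotiation step itself is simple; a sketch of the Accept-header check (real negotiation would also honor q-values and wildcards, and this helper is hypothetical, not the actual os-loganalyze code):

```python
def pick_content_type(accept_header):
    """Serve text/html only to clients that explicitly accept it,
    text/plain to everything else (curl, scripts, logstash)."""
    accepted = [part.split(";")[0].strip()
                for part in accept_header.split(",")]
    if "text/html" in accepted:
        return "text/html"
    return "text/plain"
```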
2) Dynamic Filtering - we added the level= parameter to the wsgi script
to speed up logstash indexing, as it turns out that python is vastly
faster at throwing away DEBUG lines than logstash was. It turns out
people love it too, because
http://logs.openstack.org/70/44670/3/gate/gate-tempest-devstack-vm-neutron/02d68e3/logs/screen-q-svc.txt.gz?level=TRACE
loads super quickly and lets you see where the top issues are.
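The level= filter boils down to a per-line severity comparison; a sketch, with an ordering that mirrors common OpenStack levels rather than the exact os-loganalyze table (continuation lines with no level are a wrinkle the real app handles by inheriting the previous line's level, which this sketch skips):

```python
import re

# Illustrative severity order, lowest to highest; level=TRACE keeps
# TRACE and everything above it.
LEVELS = ["NONE", "DEBUG", "INFO", "AUDIT", "TRACE", "WARNING", "ERROR"]
LEVEL_PAT = re.compile(r"\b(%s)\b" % "|".join(LEVELS[1:]))

def filter_by_level(lines, minimum):
    """Yield only the lines at or above the requested severity."""
    floor = LEVELS.index(minimum)
    for line in lines:
        m = LEVEL_PAT.search(line)
        level = m.group(1) if m else "NONE"
        if LEVELS.index(level) >= floor:
            yield line
```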
There are a few other interesting facts that we discovered in this
process - n-cpu on a nova-network run comes in at about 5 MB gzipped (40
MB uncompressed) of html once we do our filtering on it. If you are
running a browser other than Chrome on a nice Intel chip, life isn't
good. A future enhancement here is to be nicer to people and disable
DEBUG by default if the file size is too big.
A 40 MB html file means that client side filtering would be problematic.
First off, you take a huge network hit anyway; secondly, I expect DOM
manipulation at that level of complexity would give even Chrome a run
for its money.
And then there is just the nice idea of keeping the raw artifact and the
presentation layer separate. The fact that we can update our
presentation filter and make logs from last week, which we are still
using to debug issues, easier to read is a good thing.
So regardless of the eventual solution here, I *really* want the ability
to have a presentation layer filter between the raw logs and the
clients. HTTP has so many nice negotiation features worked into the
spec, which we're actually using today and which make life easier for
folks. And I'd really like not to lose that.
So 2a has a strong vote from me.
-Sean
--
Sean Dague
http://dague.net
_______________________________________________
OpenStack-Infra mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra