Hey,

Great overview and plan James, thanks for that :-).

So it seems to me that we're duplicating the job of swift a little bit by writing a program to accept an object over HTTP and store it on disk. If our end-game is logs stored in swift, then why not make Jenkins (and other workers) push the logs straight to swift?

This not only saves writing #2 and #6 but it also reduces the workload of managing log receiving servers. We would still likely need some kind of log serving point though*.

So as long as a worker can determine the endpoint of a log, it can report that back to Zuul as the log's URL. For example, if the worker knows that the serving application is at http://logs.openstack.org it might report back http://logs.openstack.org/obj/abc123.

We could then use either pseudo folders[0] or have the worker generate an index. For example, why not create an index object with links to the other objects (with the known serving application URL prepended)? In fact, the reporter can choose whether to generate an index file or just send the pseudo folder to be served up.
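
As a rough sketch of the index idea, a worker could build the index object along these lines before uploading it next to its logs. The object names and the helper functions are made up for illustration; only the serving URL comes from the example above.

```python
from html import escape
from urllib.parse import quote

# Serving application URL from the example above; treat it as configuration.
SERVING_URL = "http://logs.openstack.org"


def object_url(object_name, base=SERVING_URL):
    """Return the public URL for a stored object name."""
    return base.rstrip("/") + "/" + quote(object_name)


def build_index(prefix, object_names, base=SERVING_URL):
    """Return HTML for an index object linking to every log under prefix."""
    items = []
    for name in sorted(object_names):
        # Show the name relative to the pseudo folder when possible.
        label = name[len(prefix):] if name.startswith(prefix) else name
        href = escape(object_url(name, base), quote=True)
        items.append('<li><a href="%s">%s</a></li>' % (href, escape(label)))
    return "<html><body><ul>\n%s\n</ul></body></html>" % "\n".join(items)


# Hypothetical object names for one job run.
index_html = build_index(
    "obj/abc123/",
    ["obj/abc123/console.txt", "obj/abc123/mysqllog.txt"],
)
```

The worker would then upload index_html as one more object and report its URL back to zuul.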

I like the idea of zuul handing out swift keys for the destination to workers, but to me this seems optional. A worker can still upload its logs elsewhere (swift or otherwise) and send back a link, whether or not it uses the swift serving application. So long as the link provides some kind of index or complete log it shouldn't matter. This also solves the matter of post-processing, as it becomes the worker's decision.

*For example, depending on the size of the logs (and therefore the job/worker) we could actually use JavaScript to serve up the swift object with CORS, further reducing the infrastructure requirement and utilising the powerful CPUs and JavaScript engines we all have. I already started doing that as a crude way to serve and add formatting to my logs[1]. Basically JavaScript grabs a file and runs some regex for highlighting. So my worker only reports http://laughing-spice/logviewer/?q=http://worker/job/index.html where the index.html contains links to logs like http://laughing-spice/logviewer/?q=http://worker/job/mysqllog.txt etc.

In terms of the cutover/downtime, I'm not sure where that would come in? If we are still just reporting URLs back to zuul, we can change the different workers over one at a time. Eventually all new reports will use the new URLs, and then all we have to worry about is archiving the old logs. I don't imagine it would be difficult to place those in swift and set up some kind of application to redirect with 301s.
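
To give an idea of how small that redirect application could be, here is a minimal WSGI sketch. The new base URL is a placeholder, and it assumes old paths keep their layout in the new location, which may well not be how the archive ends up being organised.

```python
# Placeholder for wherever the archived logs end up; not a real endpoint.
NEW_BASE = "http://logs.example.org/archive"


def redirect_app(environ, start_response):
    """WSGI app that answers every old log URL with a 301 to its new home."""
    path = environ.get("PATH_INFO", "/")
    # Assumption: the old path is reused unchanged under the new base URL.
    location = NEW_BASE + path
    start_response(
        "301 Moved Permanently",
        [("Location", location), ("Content-Length", "0")],
    )
    return [b""]
```

For a quick local try-out this can be run with wsgiref.simple_server.make_server from the standard library.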

Cheers,
Josh

[0] http://docs.openstack.org/trunk/openstack-object-storage/developer/content/pseudo-hierarchical-folders-directories.html
[1] https://github.com/rcbau/laughing-spice

--
Rackspace Australia

On 9/11/13 7:54 AM, [email protected] wrote:
Hi,

We've had a few conversations recently in various fora about log
storage, so I thought it'd be a good idea to write down some ideas.

The current state is that Jenkins uses SCP to copy files to
static.openstack.org, which has an Apache vhost for logs.openstack.org.
There's a really big filesystem, and we use Apache mod_autoindex to
automatically serve directory indexes.  The destination log paths are
calculated in advance by Zuul (actually in a custom parameter function
defined by our configuration -- Zuul itself knows nothing about this),
they are passed to Jenkins as a parameter, and the same paths are used
to build the URL left in the review text in Gerrit.

This causes us to need to maintain a very large filesystem (we use
Cinder volumes with LVM, so it's not so bad), but it's still not very
cloudy, and does require occasional manual work.  Swift is an obvious
candidate for storing this sort of thing.

The reason it was built this way instead of using swift is simply time:
SCP and mod_autoindex already existed.  Swift (at least the two
implementations we have access to) is not great at calculating and
serving indexes to information -- so _something_ needs to be written in
order to use Swift (either index pages for log files we generate, or an
application that stores logs in swift and retrieves them and serves them
over the web).

I like the approach of having an application store and retrieve log
data.  It would accomplish a number of goals:

* By using something other than SCP, we can reduce the access needed by
   the worker.  Currently Jenkins can write to anywhere in the log
   filesystem, and we just count on the integrity of the Jenkins master
   to prevent abuse of that privilege.

* A log-receiving mechanism with tighter access controls means that we
   could use a different kind of worker (something without the
   master/slave separation that Jenkins has) so that the job itself could
   upload its own logs.

* A log-receiver could pre-process logs (compression, highlighting,
   shipping to logstash, etc).

* The log-receiving and log-serving application(s) would be horizontally
   scalable (static.o.o has been and could again be a bottleneck).

* The log-serving application could also do any processing before
   serving.

* Finally, all of this is actually fairly generalizable to artifact
   processing, such as tarballs, so we should probably switch to calling
   it artifact storage and retrieval.

Sean Dague recently wrote a mod_python script that turns some OpenStack
log files into HTML with syntax highlighting and links:

   http://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/logs/htmlify-screen-log.py

This seems like it could be a good starting point, as it actually
addresses one of the points in the above list.

Here's how I think we could get from where we are to where we want to be:

1) Have Zuul generate a token (suggestion: HMAC signature using a shared
secret) that can later be used to determine what kind of artifacts a job
should be permitted to store, and where they can be stored.  Eg, a token
might say that this run of gate-tempest can store artifacts to the logs
container at '.../gate-tempest/1234' for the next 6 hours.  Another job
might get a token (or multiple tokens) that say it can store logs as
well as a tarball.

This way even a completely untrusted worker can store artifacts because
the token (which is effectively public) is scoped to only what the job
needs.  This could be done entirely with a custom parameter function
(just as the log paths are currently calculated) without any changes to
Zuul itself, or we could extend Zuul to natively support this concept.

2) Write a program (or extend the mod_python script) that accepts
artifacts over HTTP with a token.  It would then write them to the
filesystem as we do now.  It can offline-validate the token with the
shared secret (between it and Zuul).  It could also invalidate the token
after its use.

3) Write a script that we can invoke from within our Jenkins jobs to use
the token to upload artifacts.  Other non-Jenkins workers can use the
same protocol to upload their artifacts.

4) Write a program (or extend the mod_python script) that accepts
requests (using the same URL format) and reads the files from the
filesystem and serves them.

5) Extend the artifact serving program in #4 so that it first checks a
mysql database (we can use trove to provide the db) for each request; if
it finds the item, then it serves it from swift.  If it is for a
directory instead of a file, it uses the database to calculate the index
and generates an index page and serves it.  If the item is not found in
the DB, it fetches it from the disk.  If it's a directory that isn't in
the DB, it generates an index based on the filesystem directory
contents.

6) Extend the artifact storing program in #2 to optionally store the
artifacts in swift instead of the filesystem.
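
As a rough illustration of the token in steps 1 and 2, the sketch below signs a scope (container, path prefix, expiry) with HMAC-SHA256 and validates it offline with the same shared secret.  The token layout, the function names and the secret are assumptions made up for this example, not an agreed format, and invalidating a token after use is not shown because it needs state on the receiver.

```python
import hashlib
import hmac
import time

# Assumption: a secret known only to Zuul and the artifact receiver.
SHARED_SECRET = b"example-secret"


def sign_token(container, path, expires, secret=SHARED_SECRET):
    """Return 'container:path:expires:signature' describing a job's scope."""
    message = "%s:%s:%d" % (container, path, expires)
    digest = hmac.new(secret, message.encode("utf-8"),
                      hashlib.sha256).hexdigest()
    return "%s:%s" % (message, digest)


def verify_token(token, container, path, now=None, secret=SHARED_SECRET):
    """Offline check: valid signature, upload inside scope, not expired."""
    now = time.time() if now is None else now
    try:
        tok_container, tok_path, expires, _digest = token.rsplit(":", 3)
        expires = int(expires)
    except ValueError:
        return False  # malformed token
    # Recompute the whole token and compare in constant time.
    expected = sign_token(tok_container, tok_path, expires, secret)
    if not hmac.compare_digest(expected.encode("utf-8"),
                               token.encode("utf-8")):
        return False
    in_scope = (tok_container == container
                and path.startswith(tok_path.rstrip("/") + "/"))
    return in_scope and now < expires
```

Zuul would call sign_token() when it launches the job (for example with an expiry six hours out), and the receiver would call verify_token() on each upload before writing anything.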

I think that approach gives us a reasonably secure system, and the
stepwise nature means that we can test each component in turn, and
provide a smooth transition.
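
The lookup order described in step 5 could be sketched roughly as follows.  A plain dict stands in for the MySQL table (mapping artifact paths to swift object names), and the function only decides where an item should come from; actually fetching from swift and rendering the index page are left out.  All names here are hypothetical.

```python
import os


def resolve(path, db_index, fs_root):
    """Return (source, kind, payload) describing how to serve path.

    db_index is a dict standing in for the database table; fs_root is the
    directory that holds the artifacts still stored on disk.
    """
    path = path.strip("/")
    if ".." in path.split("/"):
        return ("missing", None, None)  # refuse to leave fs_root
    # 1. A file recorded in the DB is served from swift.
    if path in db_index:
        return ("swift", "file", db_index[path])
    # 2. A directory recorded in the DB gets an index computed from the DB.
    prefix = path + "/" if path else ""
    children = sorted(k for k in db_index if k.startswith(prefix))
    if children:
        return ("swift", "index", children)
    # 3. Otherwise fall back to the filesystem.
    full = os.path.join(fs_root, path)
    if os.path.isdir(full):
        return ("disk", "index", sorted(os.listdir(full)))
    if os.path.isfile(full):
        return ("disk", "file", full)
    return ("missing", None, None)
```

Because the DB is consulted first, artifacts can be moved to swift a few at a time without the URLs changing.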

Some variants to consider:

   * The token system doesn't have to be HMAC-based; there's lots of
     stuff out there.  We could do online validation with Zuul instead of
     a shared secret, for instance.

   * Not trying to do the phased implementation, and just doing a cutover
     with downtime (and bulk import old data).

   * Also, it would be nice to make pre and post processing easily
     pluggable and configurable early on; there's no telling what we may
     want to do in the future.

I think that about encompasses the ideas and conversations I've had
around the subject.  Any thoughts?

-Jim

_______________________________________________
OpenStack-Infra mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

