On 12 Jan 2016 at 13:24:26, Kwasniewska, Alicja (alicja.kwasniew...@intel.com) 
wrote:

Unfortunately I do not have any experience in working with or testing Heka, so it's 
hard for me to compare its performance against Logstash's. However, I've read that 
Heka has a lot of advantages over Logstash in this area.



But which version of Logstash did you test? One guy from the Logstash community 
said: “The next release of logstash (1.2.0 is in beta) has a 3.5x 
improvement in event throughput. For numbers: on my workstation at home (6 vcpu 
on virtualbox, host OS windows, 8 GB ram, host cpu is FX-8150) - with logstash 
1.1.13, I can process roughly 31,000 events/sec parsing apache logs. With 
logstash 1.2.0.beta1, I can process 102,000 events/sec.”



You also said that Heka is a unified data processing tool, but do we need this? Heka 
seems to address stream-processing needs, while Logstash focuses mainly on 
processing logs. We want to create a central logging service, and Logstash was 
created especially for that purpose and seems to work well for this application.


I think you are touching on a key point here. Our thinking is that Heka does 
at least as well as Logstash at collecting and parsing logs, with a smaller 
footprint and higher performance, but it can do more, as you noticed. This is 
exactly why we came to use that tool in the first place and like it, hence the 
motivation for proposing it here. It's not a handicap but an asset, because you 
can choose to do more if you want to and so avoid a sprawl of tools doing 
different things. Consider the prospect of transforming logs matching a 
particular pattern into metric messages (e.g. average HTTP response time, HTTP 
5xx error count, error rate, ...) that you could send to a time-series database like 
InfluxDB… Wouldn't that be cool? I am not saying that you couldn't do it with 
Logstash, but with Heka this processing can be distributed across the hosts and is much 
easier to implement because of the stream-processing design. That's a big plus.
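To make this concrete, here is a rough sketch of what such a Lua sandbox filter could 
look like. It is only an illustration: the field and metric names are hypothetical, and 
it assumes an upstream decoder has already parsed the HTTP status code into a message 
field.

    -- Sketch of a Lua sandbox filter counting HTTP 5xx responses.
    -- Assumes an upstream decoder populated Fields[status]; names are hypothetical.
    local count_5xx = 0

    function process_message()
        local status = tonumber(read_message("Fields[status]"))
        if status and status >= 500 then
            count_5xx = count_5xx + 1
        end
        return 0
    end

    -- Called on each ticker interval: emit the current count as a new "metric"
    -- message that an output plugin could forward to InfluxDB, then reset it.
    function timer_event(ns)
        inject_message({
            Type = "metric",
            Fields = {name = "http_5xx_count", value = count_5xx}
        })
        count_5xx = 0
    end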

One thing that is obvious is the fact that Logstash is better known, more 
popular and better tested. Maybe it has some performance disadvantages, but at least 
we know what we can expect from it. Also, it has more pre-built plugins and a lot 
of usage examples, while Heka doesn't have many yet and is 
nowhere near the range of plugins and integrations provided by Logstash.

I tend to disagree with that. You may think that Heka has fewer plugins 
out-of-the-box, but in practice it has all the plugins needed to cover a variety 
of use cases, I would say even beyond Logstash, thanks to Heka's approach of 
decoupling protocol (input and output) plugins from 
deserialisation/serialisation (decoder/encoder) plugins. You can slice and dice 
combinations of those plugins, and if you need to support a new message format 
it suffices to implement a decoder or an encoder in Lua and use it with any combination 
of protocols including http, tcp, udp, amqp, kafka, statsd, … What more would 
you need?
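For illustration only, a hekad configuration fragment reusing one Lua decoder behind 
two different protocol inputs could look like the sketch below. The section names, the 
decoder script and the broker address are made up; the option names are as I remember 
them from the Heka docs.

    # Hypothetical example: one Lua decoder reused behind two protocol inputs.
    [my_format_decoder]
    type = "SandboxDecoder"
    filename = "lua_decoders/my_format_decoder.lua"  # hypothetical decoder script

    [TcpInput]
    address = ":5565"
    decoder = "my_format_decoder"

    [AMQPInput]
    url = "amqp://guest:guest@rabbitmq/"             # assumed broker URL
    exchange = "logs"
    exchange_type = "fanout"
    decoder = "my_format_decoder"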





In the case of adding plugins, I've read that in order to add Go plugins, the 
binary has to be recompiled, which is a little bit frustrating (static linking - 
to wire in new plugins, you have to recompile). On the other hand, Lua plugins 
do not require it, but the question is whether Lua plugins are sufficient? Or 
maybe adding Go plugins is not so bad?


We are using Heka to address a much broader spectrum of use cases and 
functionalities (some being very sophisticated), but as that is not the subject of 
this conversation I will not expand on it. The point is that we have never found the need to write 
a plugin in Go. Lua and the associated libraries have always been sufficient to 
address our needs.

You also said that you didn't test Heka with Docker, right? But do you have 
any experience in setting up Heka in a Docker container? I saw that Heka 
0.8.0 introduced new Docker features (including Dockerfiles to generate 
Heka Docker containers for both development and deployment), but did you test 
them? If you didn't, we can't be sure whether there are any issues with them.



Moreover, you will have to write your own Dockerfile for Heka that inherits from 
the Kolla base image (as we discussed during the last meeting, we would like to have 
our own images); you won't be able to inherit from ianneub/heka:0.10 as 
specified in the link that you sent: 
http://www.ianneubert.com/wp/2015/03/03/how-to-use-heka-docker-and-tutum/.



There are also some issues with the DockerInput module which you want to use. For 
example, splitters are not available in DockerInput 
(https://github.com/mozilla-services/heka/issues/1643). I can't say that it 
will affect us, but we also don't know which new issues may arise during the first 
tests, as none of us has ever tried Heka in and with Docker.



I am not attached to any specific solution; I'm just not sure whether Heka 
won't surprise us with something hard to solve, configure, etc.

Well, I guess that's a fact of life we (especially in the IT industry) have to live 
with no matter what.


 


Alicja Kwaśniewska

 

From: Sam Yaple [mailto:sam...@yaple.net]
Sent: Monday, January 11, 2016 11:37 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

 

Here is why I am on board with this. As we have discovered, the logging with 
the syslog plugin leaves a lot to be desired. It (to my understanding) still 
can't save tracebacks/stacktraces to the log files for whatever reason. 
stdout/stderr, however, works perfectly fine. That said, the Docker log stuff has 
been a source of pain in the past, but it has gotten better. It does have the 
limitation of only being able to log one output at a time. This means, as an 
example, the neutron-dhcp-agent could send its logs to stdout/err, but the 
dnsmasq process that it launches (which also has logs) would have to mix its logs 
in with the neutron logs in stdout/err. Can Heka handle this and separate them 
efficiently? Otherwise I see no choice but to stick with something that can 
handle multiple logs from a single container.



Sam Yaple

 

On Mon, Jan 11, 2016 at 10:16 PM, Eric LEMOINE <elemo...@mirantis.com> wrote:


On 11 Jan 2016 at 18:45, "Michał Jastrzębski" <inc...@gmail.com> wrote:
>
> On 11 January 2016 at 10:55, Eric LEMOINE <elemo...@mirantis.com> wrote:
> > Currently the services running in containers send their logs to
> > rsyslog. And rsyslog stores the logs in local files, located in the
> > host's /var/log directory.
>
> Yeah, however the plan was to teach rsyslog to forward logs to the central
> logging stack once this thing is implemented.

Yes. With the current ELK Change Request, Rsyslog sends logs to the central 
Logstash instance. If you read my design doc you'll understand that it's 
precisely what we're proposing changing.

> > I know. Our plan is to rely on Docker. Basically: containers write
> > their logs to stdout. The logs are collected by Docker Engine, which
> > makes them available through the unix:///var/run/docker.sock socket.
> > The socket is mounted into the Heka container, which uses the Docker
> > Log Input plugin [*] to read the logs from that socket.
> >
> > [*] <http://hekad.readthedocs.org/en/latest/config/inputs/docker_log.html>
>
> So docker logs isn't the best thing there is, however I'd suspect that's
> mostly the console output's fault. If you can tap into stdout efficiently,
> I'd say that's a pretty good option.

I'm not following you. Could you please be more specific?

> >> Seems to me we need an additional comparison of Heka vs rsyslog ;) Also
> >> this would have to be hands down better because rsyslog is already
> >> implemented, working, and most operators know how to use it.
> >
> >
> > We don't need to remove Rsyslog. Services running in containers can
> > write their logs to both Rsyslog and stdout, which is even what they
> > do today (at least for the OpenStack services).
> >
>
> There is no point in that imho. I don't want to have several systems
> doing the same thing. Let's decide on one, optimal toolset.
> Could you please describe, bottom up, what your logging stack would
> look like? Heka listening on stdout, transferring stuff to
> Elasticsearch, and Kibana on top of it?

My plan is to provide details in the blueprint document, which I'll continue 
working on if the core developers agree with the principles of the proposed 
architecture and change.

But here's our plan, as already described in my previous email: the Kolla 
services, which run in containers, write their logs to stdout. The logs are 
collected by the Docker engine. Heka's Docker Log Input plugin is used to read 
the container logs from the Docker endpoint (Unix socket). Since Heka will run 
in a container, a volume is necessary for accessing the Docker endpoint. The 
Docker Log Input plugin inserts the logs into the Heka pipeline, at the end of 
which an Elasticsearch Output plugin sends the log messages to 
Elasticsearch. Here's a blog post reporting on that approach: 
<http://www.ianneubert.com/wp/2015/03/03/how-to-use-heka-docker-and-tutum/>. We 
haven't tested that approach yet, but we plan to experiment with it as we work 
on the specs.
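As a rough, untested sketch (since, again, we haven't tried this yet), the hekad 
configuration for that pipeline could look something like the following; the 
Elasticsearch address is an assumption.

    # Untested sketch of the pipeline described above.
    [DockerLogInput]
    endpoint = "unix:///var/run/docker.sock"  # the socket mounted into the Heka container

    [ESJsonEncoder]
    es_index_from_timestamp = true

    [ElasticSearchOutput]
    message_matcher = "TRUE"              # forward everything for this sketch
    server = "http://elasticsearch:9200"  # assumed address of the Elasticsearch service
    encoder = "ESJsonEncoder"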


__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

 

