Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

2016-01-13 Thread Kwasniewska, Alicja
Eric, Patrick, Simon, Clark thanks for your comments.

I don't know Heka, which is why I am asking a lot of questions. I hope you are
fine with that :) I am not against Heka; I was just curious how reliable it is
and how much experience you have with setting it up in a Docker environment, so
that we know both the advantages and disadvantages of this solution.

@Eric, it's great that you are going to create a PoC; it will explain a lot and
show us possible problems.

Kind regards,
Alicja Kwaśniewska


-----Original Message-----
From: Eric LEMOINE [mailto:elemo...@mirantis.com] 
Sent: Wednesday, January 13, 2016 10:55 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

Hi Alicja


Thank you for your comments.  Answers and comments below.



On Tue, Jan 12, 2016 at 1:19 PM, Kwasniewska, Alicja 
<alicja.kwasniew...@intel.com> wrote:
> Unfortunately I do not have any experience in working or testing Heka, 
> so it’s hard for me to compare its performance vs Logstash 
> performance. However I’ve read that Heka possesses a lot of advantages over
> Logstash in this scope.
>
>
> But which version of Logstash did you test? One guy from the Logstash 
> community said that: “The next release of logstash (1.2.0 is in beta) 
> has a 3.5x improvement in event throughput. For numbers: on my 
> workstation at home
> (6 vcpu on virtualbox, host OS windows, 8 GB ram, host cpu is FX-8150) 
> - with logstash 1.1.13, I can process roughly 31,000 events/sec 
> parsing apache logs. With logstash 1.2.0.beta1, I can process 102,000 
> events/sec.”


I've used the latest Docker image:
<https://hub.docker.com/r/library/logstash/>.  It uses Logstash 2.1.1, which is 
the most recent stable version.



> You also said that Heka is a unified data processing, but do we need this?


Heka, as a unified data processing tool, makes it possible to derive metrics
from logs, HTTP response times for example.  Alerts can also be triggered on
specific log patterns.


> Heka seems to address stream processing needs, while Logstash focuses 
> mainly on processing logs. We want to create a central logging 
> service, and Logstash was created especially for it and seems to work 
> well for this application.
>
>
> One thing that is obvious is the fact that the Logstash is better 
> known, more popular and tested. Maybe it has some performance 
> disadvantages, but at least we know what we can expect from it. Also, 
> it has more pre-built plugins and has a lot examples of usage, while 
> Heka doesn’t have many of them yet and is nowhere near the range of 
> plugins and integrations provided by Logstash.


As Simon said Heka already includes quite a lot of plugins.  See the Heka 
documentation [*] for an exhaustive list.  It may indeed be the case that 
Logstash includes even more plugins, but Heka has taken us pretty far already.



> In the case of adding plugins, I’ve read that in order to add Go 
> plugins, the binary has to be recompiled, what is a little bit 
> frustrating (static linking - to wire in new plugins, have to 
> recompile). On the other hand, the Lua plugins do not require it, but 
> the question is whether Lua plugins are sufficient? Or maybe adding Go 
> plugins is not so bad?


See Simon's answer.



> You also said that you didn’t test the Heka with Docker, right?


I did test Heka with Docker.  In my performance tests both Heka and Logstash 
ran in Docker containers.  What I haven't tested yet is the Docker Log Input 
plugin.  We'll do more tests as part of the work on specifications.



> But do you
> have any experience in setting up Heka in Docker container? I saw that 
> with Heka 0.8.0 new Docker features were implemented (included 
> Dockerfiles to generate Heka Docker containers for both development 
> and deployment), but did you test it? If you didn’t, we could not be 
> sure whether there are any issues with it.
>
>
> Moreover you will have to write your own Dockerfile for Heka that 
> inherits from Kolla base image (as we discussed during last meeting, 
> we would like to have our own images), you won’t be able to inherit 
> from ianneub/heka:0.10 as specified in the link that you sent 
> http://www.ianneubert.com/wp/2015/03/03/how-to-use-heka-docker-and-tutum/.



As I said in my first email Heka has no dependencies, so creating a Dockerfile 
for Heka is quite easy.  See 
<https://github.com/elemoine/heka-logstash-comparison/blob/master/Dockerfile>
for the super-simple Dockerfile I've used so far.


> There are also some issues with DockerInput Module which you want to use.
> For example splitters are not available in DockerInput 
> (https://github.com/mozilla-services/heka/issues/1643). I can’t say 
> that it will affect us, but we also don’t know which new issues may 
> arise du

Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

2016-01-13 Thread Eric LEMOINE
Hi Alicja


Thank you for your comments.  Answers and comments below.



On Tue, Jan 12, 2016 at 1:19 PM, Kwasniewska, Alicja
<alicja.kwasniew...@intel.com> wrote:
> Unfortunately I do not have any experience in working or testing Heka, so
> it’s hard for me to compare its performance vs Logstash performance. However
> I’ve read that Heka possesses a lot of advantages over Logstash in this scope.
>
>
> But which version of Logstash did you test? One guy from the Logstash
> community said that: “The next release of logstash (1.2.0 is in beta) has a
> 3.5x improvement in event throughput. For numbers: on my workstation at home
> (6 vcpu on virtualbox, host OS windows, 8 GB ram, host cpu is FX-8150) -
> with logstash 1.1.13, I can process roughly 31,000 events/sec parsing apache
> logs. With logstash 1.2.0.beta1, I can process 102,000 events/sec.”


I've used the latest Docker image:
<https://hub.docker.com/r/library/logstash/>.  It uses Logstash 2.1.1,
which is the most recent stable version.



> You also said that Heka is a unified data processing, but do we need this?


Heka, as a unified data processing tool, makes it possible to derive
metrics from logs, HTTP response times for example.  Alerts can also be
triggered on specific log patterns.


> Heka seems to address stream processing needs, while Logstash focuses mainly
> on processing logs. We want to create a central logging service, and
> Logstash was created especially for it and seems to work well for this
> application.
>
>
> One thing that is obvious is the fact that the Logstash is better known,
> more popular and tested. Maybe it has some performance disadvantages, but at
> least we know what we can expect from it. Also, it has more pre-built
> plugins and has a lot examples of usage, while Heka doesn’t have many of
> them yet and is nowhere near the range of plugins and integrations provided
> by Logstash.


As Simon said Heka already includes quite a lot of plugins.  See the
Heka documentation [*] for an exhaustive list.  It may indeed be the
case that Logstash includes even more plugins, but Heka has taken us
pretty far already.



> In the case of adding plugins, I’ve read that in order to add Go plugins,
> the binary has to be recompiled, what is a little bit frustrating (static
> linking - to wire in new plugins, have to recompile). On the other hand, the
> Lua plugins do not require it, but the question is whether Lua plugins are
> sufficient? Or maybe adding Go plugins is not so bad?


See Simon's answer.



> You also said that you didn’t test the Heka with Docker, right?


I did test Heka with Docker.  In my performance tests both Heka and
Logstash ran in Docker containers.  What I haven't tested yet is the
Docker Log Input plugin.  We'll do more tests as part of the work on
specifications.



> But do you
> have any experience in setting up Heka in Docker container? I saw that with
> Heka 0.8.0 new Docker features were implemented (included Dockerfiles to
> generate Heka Docker containers for both development and deployment), but
> did you test it? If you didn’t, we could not be sure whether there are any
> issues with it.
>
>
> Moreover you will have to write your own Dockerfile for Heka that inherits
> from Kolla base image (as we discussed during last meeting, we would like to
> have our own images), you won’t be able to inherit from ianneub/heka:0.10 as
> specified in the link that you sent
> http://www.ianneubert.com/wp/2015/03/03/how-to-use-heka-docker-and-tutum/.



As I said in my first email Heka has no dependencies, so creating a
Dockerfile for Heka is quite easy.  See
<https://github.com/elemoine/heka-logstash-comparison/blob/master/Dockerfile>
for the super-simple Dockerfile I've used so far.


> There are also some issues with DockerInput Module which you want to use.
> For example splitters are not available in DockerInput
> (https://github.com/mozilla-services/heka/issues/1643). I can’t say that it
> will affect us, but we also don’t know which new issues may arise during
> first tests, as none of us has ever tried Heka in and with Docker.



Yes, we're aware of that limitation.  But, we're not sure this is a
problem, as the decoder can be the component coalescing log lines.  We
already have a Lua decoder that does that, accumulating lines of
Python Tracebacks.  I am going to look at this in more detail when
working on the blueprint.
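
For illustration only, the coalescing idea can be sketched in a few lines
of Python.  This is not the Lua decoder mentioned above, which is not shown
in this thread.  The sketch assumes that every new log record starts with a
timestamp in the oslo.log style and that any other line continues the
previous record; the regular expression and the sample lines are made up for
the example.

```python
import re

# Assumption: a new record starts with "YYYY-MM-DD HH:MM:SS"; anything else
# (for example the lines of a Python traceback) continues the previous record.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def coalesce(lines):
    """Yield one string per log record, folding continuation lines in."""
    current = []
    for line in lines:
        line = line.rstrip("\n")
        # A new record begins: emit whatever was accumulated so far.
        if RECORD_START.match(line) and current:
            yield "\n".join(current)
            current = []
        current.append(line)
    # Emit the last record once the input is exhausted.
    if current:
        yield "\n".join(current)


# Hypothetical sample input, not captured from a real service.
sample = [
    "2016-01-13 10:55:01.123 42 ERROR nova.compute.manager [-] Boom",
    "Traceback (most recent call last):",
    '  File "manager.py", line 10, in run',
    "    do_work()",
    "ValueError: bad value",
    "2016-01-13 10:55:02.456 42 INFO nova.compute.manager [-] Recovered",
]
records = list(coalesce(sample))
```

Note that a streaming decoder only knows a record is complete when the next
record starts (or after some timeout), which is one of the details to look
at when working on the blueprint.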



> I am not attached to any specific solution; I am just not sure whether Heka
> won’t surprise us with something hard to solve, configure, etc.


We chose Heka because it's lightweight and fast, while providing us
with the flexibility we need for processing different types of data
streams.  The distributed architecture we think is necessary for large
environments requires running the logs processing component on each
cluster node, and we did not want to run a JVM on each node,
especially on compute nodes where user VMs run.



Thanks,


Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

2016-01-13 Thread Patrick Petit
 in IT industry) have to live 
with no matter what.


 


Alicja Kwaśniewska

 

From: Sam Yaple [mailto:sam...@yaple.net]
Sent: Monday, January 11, 2016 11:37 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

 

Here is why I am on board with this. As we have discovered, the logging with
the syslog plugin leaves a lot to be desired. It (to my understanding) still
can't save tracebacks/stacktraces to the log files for whatever reason.
stdout/stderr, however, works perfectly fine. That said, the Docker log stuff
has been a source of pain in the past, but it has gotten better. It does have
the limitation of only being able to log one output at a time. This means, as
an example, the neutron-dhcp-agent could send its logs to stdout/err, but the
dnsmasq process that it launches (which also has logs) would have to mix its
logs in with the neutron logs in stdout/err. Can Heka handle this and separate
them efficiently? Otherwise I see no choice but to stick with something that
can handle multiple logs from a single container.
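
To make the question concrete, here is a rough Python sketch of one way any
log processor could split such a mixed stream: classify each line by its
shape. Both line formats below are assumptions made up for the example (an
oslo.log-style line and a dnsmasq-style line), and the sketch says nothing
about how efficiently Heka itself would do this.

```python
import re
from collections import defaultdict

# Assumed line shapes (hypothetical samples, not captured output):
#   oslo.log: "2016-01-11 22:16:01.123 7 INFO neutron.agent.dhcp.agent [-] ..."
#   dnsmasq:  "dnsmasq-dhcp[12]: DHCPACK(tap0) 10.0.0.5 fa:16:3e:00:00:01"
OSLO = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \d+ [A-Z]+ (?P<logger>\S+)"
)
DNSMASQ = re.compile(r"^dnsmasq(-dhcp)?\[\d+\]: ")


def classify(line):
    """Return a source name for a single line of the mixed stream."""
    if DNSMASQ.match(line):
        return "dnsmasq"
    match = OSLO.match(line)
    if match:
        # Use the top-level Python package of the logger, e.g. "neutron".
        return match.group("logger").split(".")[0]
    return "unknown"


def demux(lines):
    """Group the lines of one stdout/stderr stream by their source."""
    streams = defaultdict(list)
    for line in lines:
        streams[classify(line)].append(line)
    return dict(streams)


mixed = [
    "2016-01-11 22:16:01.123 7 INFO neutron.agent.dhcp.agent [-] sync",
    "dnsmasq-dhcp[12]: DHCPACK(tap0) 10.0.0.5 fa:16:3e:00:00:01",
    "something else entirely",
]
by_source = demux(mixed)
```

Continuation lines such as tracebacks would land in "unknown" here, so a
real solution would also have to remember which source the previous line
belonged to.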



Sam Yaple

 

On Mon, Jan 11, 2016 at 10:16 PM, Eric LEMOINE <elemo...@mirantis.com> wrote:


On 11 Jan 2016 18:45, "Michał Jastrzębski" <inc...@gmail.com> wrote:
>
> On 11 January 2016 at 10:55, Eric LEMOINE <elemo...@mirantis.com> wrote:
> > Currently the services running in containers send their logs to
> > rsyslog. And rsyslog stores the logs in local files, located in the
> > host's /var/log directory.
>
> Yeah, however plan was to teach rsyslog to forward logs to central
> logging stack once this thing is implemented.

Yes. With the current ELK Change Request, Rsyslog sends logs to the central 
Logstash instance. If you read my design doc you'll understand that it's 
precisely what we're proposing changing.

> > I know. Our plan is to rely on Docker. Basically: containers write
> > their logs to stdout. The logs are collected by Docker Engine, which
> > makes them available through the unix:///var/run/docker.sock socket.
> > The socket is mounted into the Heka container, which uses the Docker
> > Log Input plugin [*] to read the logs from that socket.
> >
> > [*] <http://hekad.readthedocs.org/en/latest/config/inputs/docker_log.html>
>
> So docker logs isn't the best thing there is; however, I'd suspect that's
> mostly the console output's fault. If you can tap into stdout efficiently,
> I'd say that's a pretty good option.

I'm not following you. Could you please be more specific?

> >> Seems to me we need an additional comparison of heka vs rsyslog ;) Also
> >> this would have to be hands down better because rsyslog is already
> >> implemented, working and most operators know how to use it.
> >
> >
> > We don't need to remove Rsyslog. Services running in containers can
> > write their logs to both Rsyslog and stdout, which even is what they
> > do today (at least for the OpenStack services).
> >
>
> There is no point in that imho. I don't want to have several systems
> doing the same thing. Let's decide on one, optimal toolset.
> Could you please describe, bottom up, what your logging stack would
> look like? Heka listening on stdout, transferring stuff to
> elasticsearch and kibana on top of it?

My plan is to provide details in the blueprint document, that I'll continue 
working on if the core developers agree with the principles of the proposed 
architecture and change.

But here's our plan—as already described in my previous email: the Kolla 
services, which run in containers, write their logs to stdout. Logs are 
collected by the Docker engine. Heka's Docker Log Input plugin is used to read 
the container logs from the Docker endpoint (Unix socket). Since Heka will run 
in a container a volume is necessary for accessing the Docker endpoint. The 
Docker Log Input plugin inserts the logs into the Heka pipeline, at the end of 
which an Elasticsearch Output plugin will send the log messages to 
Elasticsearch. Here's a blog post reporting on that approach: 
<http://www.ianneubert.com/wp/2015/03/03/how-to-use-heka-docker-and-tutum/>. We 
haven't tested that approach yet, but we plan to experiment with it as we work 
on the specs.
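
As a rough illustration of the last step of that pipeline, the Python sketch
below encodes a batch of log records as a body for Elasticsearch's bulk
endpoint (one action line followed by one document line per record). It
assumes the output talks to the bulk API; the index name, document type,
field names and sample records are all made up for the example and are not
taken from Heka or from the design doc.

```python
import json


def to_bulk_body(records, index="logs-2016.01.13", doc_type="log"):
    """Encode records as a newline-delimited Elasticsearch bulk body.

    The index and doc_type defaults are hypothetical placeholders.
    """
    out = []
    for record in records:
        # Action line: tells Elasticsearch to index the next line's document.
        out.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the log record itself.
        out.append(json.dumps(record, sort_keys=True))
    # The bulk API requires the body to end with a newline.
    return "\n".join(out) + "\n"


# Hypothetical records, as they might look after decoding container output.
records = [
    {"container": "nova_api", "stream": "stdout", "message": "GET /v2 200"},
    {"container": "neutron_server", "stream": "stderr", "message": "oops"},
]
body = to_bulk_body(records)
```

Batching several records into one request like this is what keeps the number
of HTTP round trips to Elasticsearch low.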


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

 


Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

2016-01-13 Thread Eric LEMOINE
On Tue, Jan 12, 2016 at 5:36 PM, Michał Jastrzębski  wrote:
> So tracebacks sort of work; they're there, just ugly. That's why I'm
> also happy if we change rsyslog to heka.
>
> Eric, I hope I won't ask too much, but could you please prepare a PoC of
> kolla+heka? For all I care, heka can log to a local log file, same as
> rsyslog, for now. Would that be a big problem?


I'll certainly do that POC while working on the blueprint.



Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

2016-01-13 Thread Eric LEMOINE
On Tue, Jan 12, 2016 at 8:26 PM, Steven Dake (stdake)  wrote:
> Eric,
>
> Thanks for using the mailing list for this discussion.  I like to see more
> mailing list conversations on big changes related to kolla, which this one
> is :)
> Responses inline.
>
> Please put document #3 (the design document) in gerrit rather than google
> docs in the main kolla repo as a spec using our special snowflake (read
> less cumbersome) spec template.


Sure.  The Google doc is a temporary thing.


> On 1/11/16, 8:16 AM, "Eric LEMOINE"  wrote:
>>In the proposed architecture each cluster node runs an instance of
>>Heka for collecting and processing logs.  And instead of sending the
>>processed logs to a centralized Logstash instance, logs are directly
>>sent to Elasticsearch, which itself can be distributed across multiple
>>nodes for high-availability and scaling.  The proposed architecture is
>>based on Heka, and it doesn't use Logstash.
>
> How are the elasticsearch network addresses discovered by Heka here?


Initially one Elasticsearch instance will be used, as in Alicja's work
().  In the future, HAProxy,
which is already included in Kolla, could be used between Heka and
Elasticsearch.  Another option would be to extend Heka's Elasticsearch
Output plugin to work with a list of Elasticsearch hosts instead of just
one host.
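
To give an idea of what the second option would involve, here is a small
Python sketch of client-side host selection: round-robin over a list of
hosts, skipping those marked as down.  It is only a model of the idea; the
class, its methods and the host names are hypothetical, and nothing like
this exists in Heka's Elasticsearch output today.

```python
class HostPool:
    """Round-robin over hosts, skipping the ones marked as down."""

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("at least one host is required")
        self._hosts = list(hosts)
        self._down = set()
        self._next = 0

    def mark_down(self, host):
        """Record that a request to this host failed."""
        self._down.add(host)

    def mark_up(self, host):
        """Record that this host is reachable again."""
        self._down.discard(host)

    def pick(self):
        """Return the next healthy host, or None if all hosts are down."""
        for _ in range(len(self._hosts)):
            host = self._hosts[self._next]
            self._next = (self._next + 1) % len(self._hosts)
            if host not in self._down:
                return host
        return None


# Hypothetical host names.
pool = HostPool(["es1:9200", "es2:9200", "es3:9200"])
pool.mark_down("es2:9200")
picks = [pool.pick() for _ in range(4)]
```

With HAProxy in front of Elasticsearch this logic (plus health checks) lives
in the proxy instead, and Heka keeps talking to a single address.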



>>That being said, it is important to note that the intent of this
>>proposal is not strictly directed at replacing Logstash by Heka.  The
>>intent is to propose a distributed architecture with Heka running on
>>each cluster node rather than having Logstash run as a centralized
>>logs processing component.  For such a distributed architecture we
>>think that Heka is more appropriate, with a smaller memory footprint
>>and better performances in general.  In addition, Heka is also more
>>than a logs processing tool, as it's designed to process streams of
>>any type of data, including events, logs and metrics.
>
> I think the followup was that the intent of this proposal was to replace
> both logstash and rsyslog.  Could you comment on that?



Yes, it may make sense to remove Rsyslog entirely and only rely on
Heka.  This is something we want/need to assess in the specifications.



> It may be that
> this work has to be punted to the N cycle if that's the case - we are
> super short on time, and need updates done.



Yes, sure, that makes sense.


>
> Will you be making it to the Kolla Midcycle Feb 9th and 10th to discuss
> this system face to face?


I won't be able to go myself unfortunately.  But there will probably
be someone from Mirantis (from the kolla-mesos team) whom you can talk
to.




>>* Heka is written in Go and has no dependencies.  Go programs are
>>compiled to native code.  This in contrast to Logstash which uses
>>JRuby and as such requires running a Java Virtual Machine.  Besides
>>this native versus interpreted code aspect, this also can raise the
>>question of which JVM to use (Oracle, OpenJDK?) and which version
>>(6,7,8?).
>
> This I did not know.  I was aware kibana (visualization) was implemented
> in Java.
>
> I would prefer to avoid any Java dependencies in the Kolla project.  The
> reason being there is basically a fork of the virtual machines, the Oracle
> version and the openjdk version.  This creates licensing problems for our
> downstreams.  If Kibana and Elasticsearch are developed in Java, I guess
> we will just have to live with that but the less Java dependencies the
> better.


Confirming that Elasticsearch and Logstash run in a JVM.  However,
Kibana is written in JavaScript, and Kibana version 4 includes a
server-side component that runs in NodeJS.



>>* There are six types of Heka plugins: Inputs, Splitters, Decoders,
>>Filters, Encoders, and Outputs.  Heka plugins are written in Go or
>>Lua.  When written in Lua their executions are sandbox'ed, where
>>misbehaving plugins may be shut down by Heka.  Lua plugins may also be
>>dynamically added to Heka with no config changes or Heka restart. This
>>is an important property on container environments such as Mesos,
>>where workloads are changed dynamically.
>
> For any update to the Kolla environment, I expect a full pull, stop, start
> of the service. This preserves immutability which is a magical property of
> a container.  For more details on my opinions on this matter, please take
> 10 minutes and read:
>
> http://sdake.io/2015/11/11/the-tldr-on-immutable-infrastructure/


Thanks for the link.


>>* Heka is faster than Logstash for processing logs, and its memory
>>footprint is smaller.  I ran tests, where 3,400,000 log messages were
>>read from 500 input files and then written to a single output file.
>>Heka processed the 3,400,000 log messages in 12 seconds, consuming
>>500M of RAM.  Logstash processed the 3,400,000 log messages in 1mn
>>35s, consuming 1.1G of RAM.  Adding a grok filter to parse and
>>structure logs, Logstash 

Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

2016-01-12 Thread Simon Pasquier
Hello Alicja,

Comments inline.

On Tue, Jan 12, 2016 at 1:19 PM, Kwasniewska, Alicja <
alicja.kwasniew...@intel.com> wrote:

> Unfortunately I do not have any experience in working or testing Heka, so
> it’s hard for me to compare its performance vs Logstash performance.
> However I’ve read that Heka possesses a lot of advantages over Logstash in this
> scope.
>
>
> But which version of Logstash did you test? One guy from the Logstash
> community said that: *“The next release of logstash (1.2.0 is in beta)
> has a 3.5x improvement in event throughput. For numbers: on my workstation
> at home (6 vcpu on virtualbox, host OS windows, 8 GB ram, host cpu is
> FX-8150) - with logstash 1.1.13, I can process roughly 31,000 events/sec
> parsing apache logs. With logstash 1.2.0.beta1, I can process 102,000
> events/sec.”*
>
>
> You also said that Heka is a unified data processing, but do we need this?
> Heka seems to address stream processing needs, while Logstash focuses
> mainly on processing logs. We want to create a central logging service, and
> Logstash was created especially for it and seems to work well for this
> application.
>
>
> One thing that is obvious is the fact that the Logstash is better known,
> more popular and tested. Maybe it has some performance disadvantages, but
> at least we know what we can expect from it. Also, it has more pre-built
> plugins and has a lot examples of usage, while Heka doesn’t have many of
> them yet and is nowhere near the range of plugins and integrations provided
> by Logstash.
>

From my experience, Heka has already a large number of plugins that cover
most of the use cases. But I understand your concerns regarding the
adoption of Heka vs Logstash.


>
>
> In the case of adding plugins, I’ve read that in order to add Go plugins,
> the binary has to be recompiled, what is a little bit frustrating (static
> linking - to wire in new plugins, have to recompile). On the other hand,
> the Lua plugins do not require it, but the question is whether Lua plugins
> are sufficient? Or maybe adding Go plugins is not so bad?
>

For the reason you pointed out, Lua plugins are first-class citizens and
the Heka developers encourage their use over writing custom Go plugins. In
terms of performance, Lua and Go plugins are usually equivalent.


>
>
> You also said that you didn’t test the Heka with Docker, right? But do you
> have any experience in setting up Heka in Docker container? I saw that with
> Heka 0.8.0 new Docker features were implemented (included Dockerfiles to
> generate Heka Docker containers for both development and deployment), but
> did you test it? If you didn’t, we could not be sure whether there are any
> issues with it.
>

From my experience, Heka runs in Docker without problem.


>
>
> Moreover you will have to write your own Dockerfile for Heka that inherits
> from Kolla base image (as we discussed during last meeting, we would like
> to have our own images), you won’t be able to inherit from
> ianneub/heka:0.10 as specified in the link that you sent
> http://www.ianneubert.com/wp/2015/03/03/how-to-use-heka-docker-and-tutum/.
>

Since the Heka binary embeds all its dependencies, writing the Dockerfile
shouldn't be hard.


>
>
> There are also some issues with DockerInput Module which you want to use.
> For example splitters are not available in DockerInput (
> https://github.com/mozilla-services/heka/issues/1643). I can’t say that
> it will affect us, but we also don’t know which new issues may arise during
> first tests, as none of us has ever tried Heka in and with Docker.
>

Good point. This should be investigated by Eric in his specification.


>
>
> I am not attached to any specific solution; I am just not sure whether
> Heka won’t surprise us with something hard to solve, configure, etc.
>

I just wanted to mention that Heka powers the Firefox Telemetry Data
Pipeline [1], which collects and processes a large amount of data.

Simon

[1] https://people.mozilla.org/~rmiller/heka-monitorama-2015-06/#/41


>
>
>
> * Alicja Kwaśniewska*
>
>
>
> *From:* Sam Yaple [mailto:sam...@yaple.net]
> *Sent:* Monday, January 11, 2016 11:37 PM
> *To:* OpenStack Development Mailing List (not for usage questions)
> *Subject:* Re: [openstack-dev] [kolla] Introduction of Heka in Kolla
>
>
>
> Here is why I am on board with this. As we have discovered, the logging
> with the syslog plugin leaves a lot to be desired. It (to my understanding)
> still can't save tracebacks/stacktraces to the log files for whatever
> reason. stdout/stderr however works perfectly fine. That said the Docker
> log stuff has been a source of pain in the past, but it has gotten better.
> It does have the limitation of being only able to log one ou

Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

2016-01-12 Thread Clark Boylan
On Tue, Jan 12, 2016, at 04:19 AM, Kwasniewska, Alicja wrote:
> Unfortunately I do not have any experience in working or testing Heka, so
> it’s hard for me to compare its performance vs Logstash performance.
> However I’ve read that Heka possesses a lot of advantages over Logstash in
> this scope.
> 
> 
> But which version of Logstash did you test? One guy from the Logstash
> community said that: “The next release of logstash (1.2.0 is in beta) has
> a 3.5x improvement in event throughput. For numbers: on my workstation at
> home (6 vcpu on virtualbox, host OS windows, 8 GB ram, host cpu is
> FX-8150) - with logstash 1.1.13, I can process roughly 31,000 events/sec
> parsing apache logs. With logstash 1.2.0.beta1, I can process 102,000
> events/sec.”
In addition to the version of Logstash that is used, the specific grok
rules and file inputs can make a big difference with performance. I
would make sure that your 500 input files are representative of the log
files you will generate running Kolla/OpenStack and that you write grok
rules that would actually be useful. If you need to you can always grab
them from the CI logs.

For the Elasticsearch indexing that we expose at
http://logstash.openstack.org we ended up going with Logstash because
many of the alternate tools (Heka and Fluentd and friends) seemed more
appropriate for moving logs from point A to point B with little to no
modification. With Jenkins build logs we needed to be able to accept
many different log formats (libvirt, oslo, swift, apache, devstack
console logs, and so on) and collapse them into a mostly common event
format. That said if you can get structured logs and keep the number of
formats to a minimum using a tool like Heka or Fluentd makes a lot of
sense.
> 
> 
> You also said that Heka is a unified data processing, but do we need
> this? Heka seems to address stream processing needs, while Logstash
> focuses mainly on processing logs. We want to create a central logging
> service, and Logstash was created especially for it and seems to work
> well for this application.
> 
> 
> One thing that is obvious is the fact that the Logstash is better known,
> more popular and tested. Maybe it has some performance disadvantages, but
> at least we know what we can expect from it. Also, it has more pre-built
> plugins and has a lot examples of usage, while Heka doesn’t have many of
> them yet and is nowhere near the range of plugins and integrations
> provided by Logstash.
Not only that but the OpenStack Infra team and others have written rules
for handling OpenStack logs. In theory you can just drop them into place
and it will all work.
> 
> 
> In the case of adding plugins, I’ve read that in order to add Go plugins,
> the binary has to be recompiled, what is a little bit frustrating (static
> linking - to wire in new plugins, have to recompile). On the other hand,
> the Lua plugins do not require it, but the question is whether Lua
> plugins are sufficient? Or maybe adding Go plugins is not so bad?
> 
> 
> You also said that you didn’t test the Heka with Docker, right? But do
> you have any experience in setting up Heka in Docker container? I saw
> that with Heka 0.8.0 new Docker features were implemented (included
> Dockerfiles to generate Heka Docker containers for both development and
> deployment), but did you test it? If you didn’t, we could not be sure
> whether there are any issues with it.
> 
> 
> Moreover you will have to write your own Dockerfile for Heka that
> inherits from Kolla base image (as we discussed during last meeting, we
> would like to have our own images), you won’t be able to inherit from
> ianneub/heka:0.10 as specified in the link that you sent
> http://www.ianneubert.com/wp/2015/03/03/how-to-use-heka-docker-and-tutum/.
> 
> 
> There are also some issues with DockerInput Module which you want to use.
> For example splitters are not available in DockerInput
> (https://github.com/mozilla-services/heka/issues/1643). I can’t say that
> it will affect us, but we also don’t know which new issues may arise
> during first tests, as none of us has ever tried Heka in and with Docker.
> 
> 
> I am not attached to any specific solution; I am just not sure whether
> Heka won’t surprise us with something hard to solve, configure, etc.
> 
> 
> Alicja Kwaśniewska
> 

Happy to answer any other questions about how the Infra team runs
Logstash (we don't centralize it for example).

Clark



Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

2016-01-12 Thread Steven Dake (stdake)
Eric,

Thanks for using the mailing list for this discussion.  I like to see more
mailing list conversations on big changes related to kolla, which this one
is :)
Responses inline.

Please put document #3 (the design document) in gerrit rather than google
docs in the main kolla repo as a spec using our special snowflake (read
less cumbersome) spec template.

On 1/11/16, 8:16 AM, "Eric LEMOINE"  wrote:

>Hi
>
>As discussed on IRC the other day [1] we want to propose a distributed
>logs processing architecture based on Heka [2], built on Alicja
>Kwasniewska's ELK work with
>.  Please take a look at the
>design document I've started working on [3].  The document is still
>work-in-progress, but the "Problem statement" and "Proposed change"
>sections should provide you with a good overview of the architecture
>we have in mind.
>
>In the proposed architecture each cluster node runs an instance of
>Heka for collecting and processing logs.  And instead of sending the
>processed logs to a centralized Logstash instance, logs are directly
>sent to Elasticsearch, which itself can be distributed across multiple
>nodes for high-availability and scaling.  The proposed architecture is
>based on Heka, and it doesn't use Logstash.

How are the elasticsearch network addresses discovered by Heka here?

>
>That being said, it is important to note that the intent of this
>proposal is not strictly directed at replacing Logstash by Heka.  The
>intent is to propose a distributed architecture with Heka running on
>each cluster node rather than having Logstash run as a centralized
>logs processing component.  For such a distributed architecture we
>think that Heka is more appropriate, with a smaller memory footprint
>and better performances in general.  In addition, Heka is also more
>than a logs processing tool, as it's designed to process streams of
>any type of data, including events, logs and metrics.

I think the followup was that the intent of this proposal was to replace
both logstash and rsyslog.  Could you comment on that?  It may be that
this work has to be punted to the N cycle if that's the case - we are
super short on time, and need updates done.

Will you be making it to the Kolla Midcycle Feb 9th and 10th to discuss
this system face to face?

>
>Some elements of comparison between Heka and Logstash:
>
>* Logstash was designed for logs processing.  Heka is a "unified data
>processing" software, designed to process streams of any type of data.
>So Heka is about running one service on each box instead of many.
>Using a single service for processing different types of data also
>makes it possible to do correlations, and derive metrics from logs and
>events.  See Rob Miller's presentation [4] for more details.
>
>* The virtual size of the Logstash Docker image is 447 MB, while the
>virtual size of an Heka image built from the same base image
>(debian:jessie) is 177 MB.  For comparison the virtual size of the
>Elasticsearch image is 345 MB.

Just a heads up, I don't think there is much concern over size of images.

>
>* Heka is written in Go and has no dependencies.  Go programs are
>compiled to native code.  This in contrast to Logstash which uses
>JRuby and as such requires running a Java Virtual Machine.  Besides
>this native versus interpreted code aspect, this also can raise the
>question of which JVM to use (Oracle, OpenJDK?) and which version
>(6,7,8?).

This I did not know.  I was aware kibana (visualization) was implemented
in Java.

I would prefer to avoid any Java dependencies in the Kolla project.  The
reason is that there is basically a fork of the virtual machines: the Oracle
version and the OpenJDK version.  This creates licensing problems for our
downstreams.  If Kibana and Elasticsearch are developed in Java, I guess
we will just have to live with that, but the fewer Java dependencies the
better.

>
>* There are six types of Heka plugins: Inputs, Splitters, Decoders,
>Filters, Encoders, and Outputs.  Heka plugins are written in Go or
>Lua.  When written in Lua their executions are sandbox'ed, where
>misbehaving plugins may be shut down by Heka.  Lua plugins may also be
>dynamically added to Heka with no config changes or Heka restart. This
>is an important property on container environments such as Mesos,
>where workloads are changed dynamically.

For any update to the Kolla environment, I expect a full pull, stop, start
of the service. This preserves immutability which is a magical property of
a container.  For more details on my opinions on this matter, please take
10 minutes and read:

http://sdake.io/2015/11/11/the-tldr-on-immutable-infrastructure/


>
>* To avoid losing logs under high load it is often recommended to use
>Logstash together with Redis [5].  Redis plays the role of a buffer,
>where logs are queued when Logstash or Elasticsearch cannot keep up
>with the load.  Heka, as a "unified data processing" software,
>includes its own 

Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

2016-01-12 Thread Michał Jastrzębski
So tracebacks sort of work: they're there, just ugly. That's why I'm
also happy if we change rsyslog to Heka.

Eric, I hope I won't ask too much, but could you please prepare a PoC of
kolla+heka? For all I care, Heka can log to a local log file, same as
rsyslog, for now. Would that be a big problem?

On 11 January 2016 at 16:37, Sam Yaple  wrote:
> Here is why I am on board with this. As we have discovered, the logging with
> the syslog plugin leaves a lot to be desired. It (to my understanding) still
> can't save tracebacks/stacktraces to the log files for whatever reason.
> stdout/stderr however works perfectly fine. That said the Docker log stuff
> has been a source of pain in the past, but it has gotten better. It does
> have the limitation of being only able to log one output at a time. This
> means, as an example, the neutron-dhcp-agent could send its logs to
> stdout/err but the dnsmasq process that it launches (that also has logs) would
> have to mix its logs in with the neutron logs in stdout/err. Can Heka handle
> this and separate them efficiently? Otherwise I see no choice but to stick
> with something that can handle multiple logs from a single container.
>
> Sam Yaple
>
> On Mon, Jan 11, 2016 at 10:16 PM, Eric LEMOINE 
> wrote:
>>
>>
>> On 11 Jan 2016 at 18:45, "Michał Jastrzębski"  wrote:
>> >
>> > On 11 January 2016 at 10:55, Eric LEMOINE  wrote:
>> > > Currently the services running in containers send their logs to
>> > > rsyslog. And rsyslog stores the logs in local files, located in the
>> > > host's /var/log directory.
>> >
>> > Yeah, however plan was to teach rsyslog to forward logs to central
>> > logging stack once this thing is implemented.
>>
>> Yes. With the current ELK Change Request, Rsyslog sends logs to the
>> central Logstash instance. If you read my design doc you'll understand that
>> it's precisely what we're proposing changing.
>>
>> > > I know. Our plan is to rely on Docker. Basically: containers write
>> > > their logs to stdout. The logs are collected by Docker Engine, which
>> > > makes them available through the unix:///var/run/docker.sock socket.
>> > > The socket is mounted into the Heka container, which uses the Docker
>> > > Log Input plugin [*] to read the logs from that socket.
>> > >
>> > > [*]
>> > > 
>> >
>> > So docker logs isn't best thing there is, however I'd suspect that's
>> > mostly console output fault. If you can tap into stdout efficiently,
>> > I'd say that's pretty good option.
>>
>> I'm not following you. Could you please be more specific?
>>
>> > >> Seems to me we need additional comparason of heka vs rsyslog;) Also
>> > >> this would have to be hands down better because rsyslog is already
>> > >> implemented, working and most of operators knows how to use it.
>> > >
>> > >
>> > > We don't need to remove Rsyslog. Services running in containers can
>> > > write their logs to both Rsyslog and stdout, which even is what they
>> > > do today (at least for the OpenStack services).
>> > >
>> >
>> > There is no point for that imho. I don't want to have several systems
>> > doing the same thing. Let's make decision of one, but optimal toolset.
>> > Could you please describe bottoms up what would your logging stack
>> > look like? Heka listening on stdout, transfering stuff to
>> > elasticsearch and kibana on top of it?
>>
>> My plan is to provide details in the blueprint document, that I'll
>> continue working on if the core developers agree with the principles of the
>> proposed architecture and change.
>>
>> But here's our plan—as already described in my previous email: the Kolla
>> services, which run in containers, write their logs to stdout. Logs are
>> collected by the Docker engine. Heka's Docker Log Input plugin is used to
>> read the container logs from the Docker endpoint (Unix socket). Since Heka
>> will run in a container a volume is necessary for accessing the Docker
>> endpoint. The Docker Log Input plugin inserts the logs into the Heka
>> pipeline, at the end of which an Elasticsearch Output plugin will send the
>> log messages to Elasticsearch. Here's a blog post reporting on that
>> approach:
>> .
>> We haven't tested that approach yet, but we plan to experiment with it as we
>> work on the specs.
>>
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>

Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

2016-01-11 Thread Sam Yaple
Here is why I am on board with this. As we have discovered, logging
with the syslog plugin leaves a lot to be desired. It (to my understanding)
still can't save tracebacks/stacktraces to the log files for whatever
reason. stdout/stderr, however, works perfectly fine. That said, the Docker
log stuff has been a source of pain in the past, but it has gotten better.
It does have the limitation of only being able to log one output at a time.
This means, as an example, the neutron-dhcp-agent could send its logs to
stdout/err but the dnsmasq process that it launches (that also has logs)
would have to mix its logs in with the neutron logs in stdout/err. Can Heka
handle this and separate them efficiently? Otherwise I see no choice but to
stick with something that can handle multiple logs from a single container.
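
To make the mixing problem concrete, here is a minimal sketch of how a
downstream processor could separate interleaved lines. It assumes each
process prefixes its lines with a program tag such as "dnsmasq[34]: ",
which is a hypothetical format and not something established in this
thread; it says nothing about what Heka itself can or cannot do.

```python
import re
from collections import defaultdict

# Hypothetical line format: "<program>[<pid>]: <message>".
TAGGED_LINE = re.compile(r"^(?P<program>[\w.-]+)(?:\[\d+\])?:\s(?P<message>.*)$")


def split_by_program(lines):
    """Group interleaved log lines by the program tag at the start of each line."""
    groups = defaultdict(list)
    for line in lines:
        match = TAGGED_LINE.match(line)
        if match:
            groups[match.group("program")].append(match.group("message"))
        else:
            # Untagged lines (e.g. traceback continuation lines) are kept apart.
            groups["untagged"].append(line)
    return dict(groups)


# Fabricated sample of two processes sharing one stdout stream.
mixed = [
    "neutron-dhcp-agent[12]: DHCP agent started",
    "dnsmasq[34]: started, version 2.72 cachesize 150",
    "neutron-dhcp-agent[12]: Synchronizing state",
    "  File \"agent.py\", line 10, in run",
]
by_program = split_by_program(mixed)
```

The untagged bucket shows the weak spot of this approach: once two
processes share a stream, multi-line output such as a traceback can no
longer be attributed reliably.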

Sam Yaple

On Mon, Jan 11, 2016 at 10:16 PM, Eric LEMOINE 
wrote:

>
> On 11 Jan 2016 at 18:45, "Michał Jastrzębski"  wrote:
> >
> > On 11 January 2016 at 10:55, Eric LEMOINE  wrote:
> > > Currently the services running in containers send their logs to
> > > rsyslog. And rsyslog stores the logs in local files, located in the
> > > host's /var/log directory.
> >
> > Yeah, however plan was to teach rsyslog to forward logs to central
> > logging stack once this thing is implemented.
>
> Yes. With the current ELK Change Request, Rsyslog sends logs to the
> central Logstash instance. If you read my design doc you'll understand that
> it's precisely what we're proposing changing.
>
> > > I know. Our plan is to rely on Docker. Basically: containers write
> > > their logs to stdout. The logs are collected by Docker Engine, which
> > > makes them available through the unix:///var/run/docker.sock socket.
> > > The socket is mounted into the Heka container, which uses the Docker
> > > Log Input plugin [*] to read the logs from that socket.
> > >
> > > [*] <
> http://hekad.readthedocs.org/en/latest/config/inputs/docker_log.html>
> >
> > So docker logs isn't best thing there is, however I'd suspect that's
> > mostly console output fault. If you can tap into stdout efficiently,
> > I'd say that's pretty good option.
>
> I'm not following you. Could you please be more specific?
>
> > >> Seems to me we need additional comparason of heka vs rsyslog;) Also
> > >> this would have to be hands down better because rsyslog is already
> > >> implemented, working and most of operators knows how to use it.
> > >
> > >
> > > We don't need to remove Rsyslog. Services running in containers can
> > > write their logs to both Rsyslog and stdout, which even is what they
> > > do today (at least for the OpenStack services).
> > >
> >
> > There is no point for that imho. I don't want to have several systems
> > doing the same thing. Let's make decision of one, but optimal toolset.
> > Could you please describe bottoms up what would your logging stack
> > look like? Heka listening on stdout, transfering stuff to
> > elasticsearch and kibana on top of it?
>
> My plan is to provide details in the blueprint document, that I'll
> continue working on if the core developers agree with the principles of the
> proposed architecture and change.
>
> But here's our plan—as already described in my previous email: the Kolla
> services, which run in containers, write their logs to stdout. Logs are
> collected by the Docker engine. Heka's Docker Log Input plugin is used to
> read the container logs from the Docker endpoint (Unix socket). Since Heka
> will run in a container a volume is necessary for accessing the Docker
> endpoint. The Docker Log Input plugin inserts the logs into the Heka
> pipeline, at the end of which an Elasticsearch Output plugin will send the
> log messages to Elasticsearch. Here's a blog post reporting on that
> approach: <
> http://www.ianneubert.com/wp/2015/03/03/how-to-use-heka-docker-and-tutum/>.
> We haven't tested that approach yet, but we plan to experiment with it as
> we work on the specs.
>


Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

2016-01-11 Thread Eric LEMOINE
On 11 Jan 2016 at 18:45, "Michał Jastrzębski"  wrote:
>
> On 11 January 2016 at 10:55, Eric LEMOINE  wrote:
> > Currently the services running in containers send their logs to
> > rsyslog. And rsyslog stores the logs in local files, located in the
> > host's /var/log directory.
>
> Yeah, however plan was to teach rsyslog to forward logs to central
> logging stack once this thing is implemented.

Yes. With the current ELK Change Request, Rsyslog sends logs to the central
Logstash instance. If you read my design doc you'll see that this is
precisely what we're proposing to change.

> > I know. Our plan is to rely on Docker. Basically: containers write
> > their logs to stdout. The logs are collected by Docker Engine, which
> > makes them available through the unix:///var/run/docker.sock socket.
> > The socket is mounted into the Heka container, which uses the Docker
> > Log Input plugin [*] to read the logs from that socket.
> >
> > [*] <
http://hekad.readthedocs.org/en/latest/config/inputs/docker_log.html>
>
> So docker logs isn't best thing there is, however I'd suspect that's
> mostly console output fault. If you can tap into stdout efficiently,
> I'd say that's pretty good option.

I'm not following you. Could you please be more specific?

> >> Seems to me we need additional comparason of heka vs rsyslog;) Also
> >> this would have to be hands down better because rsyslog is already
> >> implemented, working and most of operators knows how to use it.
> >
> >
> > We don't need to remove Rsyslog. Services running in containers can
> > write their logs to both Rsyslog and stdout, which even is what they
> > do today (at least for the OpenStack services).
> >
>
> There is no point for that imho. I don't want to have several systems
> doing the same thing. Let's make decision of one, but optimal toolset.
> Could you please describe bottoms up what would your logging stack
> look like? Heka listening on stdout, transfering stuff to
> elasticsearch and kibana on top of it?

My plan is to provide details in the blueprint document, which I'll continue
working on if the core developers agree with the principles of the proposed
architecture and change.

But here's our plan, as already described in my previous email: the Kolla
services, which run in containers, write their logs to stdout. Logs are
collected by the Docker engine. Heka's Docker Log Input plugin is used to
read the container logs from the Docker endpoint (Unix socket). Since Heka
will run in a container, a volume is necessary for accessing the Docker
endpoint. The Docker Log Input plugin inserts the logs into the Heka
pipeline, at the end of which an Elasticsearch Output plugin will send the
log messages to Elasticsearch. Here's a blog post reporting on that
approach: <
http://www.ianneubert.com/wp/2015/03/03/how-to-use-heka-docker-and-tutum/>.
We haven't tested that approach yet, but we plan to experiment with it as
we work on the specs.
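
As a rough illustration of what any reader of the Docker endpoint has to
deal with, the sketch below splits Docker's multiplexed log stream into
stdout and stderr frames. It assumes the framing Docker documents for
containers started without a TTY (an 8-byte header holding the stream type
and a big-endian payload length); the helper names and sample data are
made up for the example, and this is not how Heka's plugin is implemented.

```python
import struct

# Stream identifiers used in the 8-byte frame header of Docker's
# multiplexed log/attach stream (containers started without a TTY).
STREAMS = {0: "stdin", 1: "stdout", 2: "stderr"}


def demux_docker_stream(data: bytes):
    """Yield (stream_name, payload) pairs from a multiplexed byte string."""
    offset = 0
    while offset + 8 <= len(data):
        # Header: 1 byte stream type, 3 padding bytes, 4-byte big-endian size.
        stream_type, size = struct.unpack_from(">BxxxI", data, offset)
        offset += 8
        payload = data[offset:offset + size]
        if len(payload) < size:
            break  # incomplete frame; a real reader would wait for more bytes
        offset += size
        yield STREAMS.get(stream_type, "unknown"), payload


def make_frame(stream_type: int, payload: bytes) -> bytes:
    """Build one frame; only used here to fabricate sample input."""
    return struct.pack(">BxxxI", stream_type, len(payload)) + payload


sample = make_frame(1, b"INFO neutron started\n") + make_frame(2, b"WARNING slow\n")
frames = list(demux_docker_stream(sample))
```

Note that the framing only distinguishes stdout from stderr per container;
it carries no information about which process inside the container wrote
a given line.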


Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

2016-01-11 Thread Eric LEMOINE
On Mon, Jan 11, 2016 at 5:01 PM, Michał Jastrzębski  wrote:
> On 11 January 2016 at 09:16, Eric LEMOINE  wrote:

>> * Logstash was designed for logs processing.  Heka is a "unified data
>> processing" software, designed to process streams of any type of data.
>> So Heka is about running one service on each box instead of many.
>> Using a single service for processing different types of data also
>> makes it possible to do correlations, and derive metrics from logs and
>> events.  See Rob Miller's presentation [4] for more details.
>
> Right now we use rsyslog for that.


Currently the services running in containers send their logs to
rsyslog. And rsyslog stores the logs in local files, located in the
host's /var/log directory.


> As I understand Heka right now
> would be actually alternative to rsyslog, and that is already
> implemented. Also with Heka case, we might run into same problem we've
> seen with rsyslog - transporting logs from service to heka. Keep in
> mind we're in dockers and heka will be in different container than
> service it's supposed to listen to. We do that by sharing faked
> /dev/log across containers and rsyslog can handle that.


I know. Our plan is to rely on Docker. Basically: containers write
their logs to stdout. The logs are collected by Docker Engine, which
makes them available through the unix:///var/run/docker.sock socket.
The socket is mounted into the Heka container, which uses the Docker
Log Input plugin [*] to read the logs from that socket.

[*] <http://hekad.readthedocs.org/en/latest/config/inputs/docker_log.html>


> Also Heka seems to be just processing mechanism while logstash is
> well...stash, so it saves and persists logs, so it seems to me they're
> different layers of log processing.

No. Logstash typically stores the logs in Elasticsearch. And we'd do
the same with Heka.


>
> Seems to me we need additional comparason of heka vs rsyslog;) Also
> this would have to be hands down better because rsyslog is already
> implemented, working and most of operators knows how to use it.


We don't need to remove Rsyslog. Services running in containers can
write their logs to both Rsyslog and stdout, which is in fact what they
do today (at least for the OpenStack services).


Hope that makes sense!



Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

2016-01-11 Thread Michał Jastrzębski
On 11 January 2016 at 09:16, Eric LEMOINE  wrote:
> Hi
>
> As discussed on IRC the other day [1] we want to propose a distributed
> logs processing architecture based on Heka [2], built on Alicja
> Kwasniewska's ELK work with
> .  Please take a look at the
> design document I've started working on [3].  The document is still
> work-in-progress, but the "Problem statement" and "Proposed change"
> sections should provide you with a good overview of the architecture
> we have in mind.
>
> In the proposed architecture each cluster node runs an instance of
> Heka for collecting and processing logs.  And instead of sending the
> processed logs to a centralized Logstash instance, logs are directly
> sent to Elasticsearch, which itself can be distributed across multiple
> nodes for high-availability and scaling.  The proposed architecture is
> based on Heka, and it doesn't use Logstash.
>
> That being said, it is important to note that the intent of this
> proposal is not strictly directed at replacing Logstash by Heka.  The
> intent is to propose a distributed architecture with Heka running on
> each cluster node rather than having Logstash run as a centralized
> logs processing component.  For such a distributed architecture we
> think that Heka is more appropriate, with a smaller memory footprint
> and better performances in general.  In addition, Heka is also more
> than a logs processing tool, as it's designed to process streams of
> any type of data, including events, logs and metrics.
>
> Some elements of comparison between Heka and Logstash:
>
> * Logstash was designed for logs processing.  Heka is a "unified data
> processing" software, designed to process streams of any type of data.
> So Heka is about running one service on each box instead of many.
> Using a single service for processing different types of data also
> makes it possible to do correlations, and derive metrics from logs and
> events.  See Rob Miller's presentation [4] for more details.

Right now we use rsyslog for that. As I understand it, Heka would
actually be an alternative to rsyslog, which is already
implemented. Also, with Heka we might run into the same problem we've
seen with rsyslog: transporting logs from the service to Heka. Keep in
mind we're in Docker containers, and Heka will be in a different container
than the service it's supposed to listen to. We do that by sharing a faked
/dev/log across containers, and rsyslog can handle that.
Also, Heka seems to be just a processing mechanism while Logstash is,
well... a stash, so it saves and persists logs; it seems to me they're
different layers of log processing.

Seems to me we need an additional comparison of Heka vs rsyslog ;) Also,
this would have to be hands-down better, because rsyslog is already
implemented and working, and most operators know how to use it.

> * The virtual size of the Logstash Docker image is 447 MB, while the
> virtual size of an Heka image built from the same base image
> (debian:jessie) is 177 MB.  For comparison the virtual size of the
> Elasticsearch image is 345 MB.
>
> * Heka is written in Go and has no dependencies.  Go programs are
> compiled to native code.  This in contrast to Logstash which uses
> JRuby and as such requires running a Java Virtual Machine.  Besides
> this native versus interpreted code aspect, this also can raise the
> question of which JVM to use (Oracle, OpenJDK?) and which version
> (6,7,8?).
>
> * There are six types of Heka plugins: Inputs, Splitters, Decoders,
> Filters, Encoders, and Outputs.  Heka plugins are written in Go or
> Lua.  When written in Lua their executions are sandbox'ed, where
> misbehaving plugins may be shut down by Heka.  Lua plugins may also be
> dynamically added to Heka with no config changes or Heka restart. This
> is an important property on container environments such as Mesos,
> where workloads are changed dynamically.
>
> * To avoid losing logs under high load it is often recommended to use
> Logstash together with Redis [5].  Redis plays the role of a buffer,
> where logs are queued when Logstash or Elasticsearch cannot keep up
> with the load.  Heka, as a "unified data processing" software,
> includes its own resilient message queue, making it unnecessary to use
> an external queue (Redis for example).
>
> * Heka is faster than Logstash for processing logs, and its memory
> footprint is smaller.  I ran tests, where 3,400,000 log messages were
> read from 500 input files and then written to a single output file.
> Heka processed the 3,400,000 log messages in 12 seconds, consuming
> 500M of RAM.  Logstash processed the 3,400,000 log messages in 1mn
> 35s, consuming 1.1G of RAM.  Adding a grok filter to parse and
> structure logs, Logstash processed the 3,400,000 log messages in 2mn
> 15s, consuming 1.5G of RAM. Using an equivalent filtering plugin, Heka
> processed the 3,400,000 log messages in 27s, consuming 730M of RAM.
> See my GitHub repo [6] for more 

[openstack-dev] [kolla] Introduction of Heka in Kolla

2016-01-11 Thread Eric LEMOINE
Hi

As discussed on IRC the other day [1] we want to propose a distributed
logs processing architecture based on Heka [2], built on Alicja
Kwasniewska's ELK work with
.  Please take a look at the
design document I've started working on [3].  The document is still
work-in-progress, but the "Problem statement" and "Proposed change"
sections should provide you with a good overview of the architecture
we have in mind.

In the proposed architecture each cluster node runs an instance of
Heka for collecting and processing logs.  And instead of sending the
processed logs to a centralized Logstash instance, logs are directly
sent to Elasticsearch, which itself can be distributed across multiple
nodes for high-availability and scaling.  The proposed architecture is
based on Heka, and it doesn't use Logstash.
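
For readers unfamiliar with what "sending logs directly to Elasticsearch"
involves, here is a small sketch of a request body for Elasticsearch's
_bulk endpoint, which takes newline-delimited JSON: one action line
followed by one document line per message. The index name and message
fields are hypothetical, and this is only an illustration of the wire
format, not of Heka's Elasticsearch output plugin.

```python
import json


def build_bulk_body(index_name, messages):
    """Serialize log messages as newline-delimited JSON for the _bulk endpoint."""
    lines = []
    for message in messages:
        # Action line telling Elasticsearch to index the document that follows.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(message, sort_keys=True))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "logs-2016.01.11",  # hypothetical index name
    [{"hostname": "node-1", "severity": "INFO", "payload": "service started"}],
)
```

Depending on the Elasticsearch version, the action line or the request URL
may also need to name a document type; that detail is left out here.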

That being said, it is important to note that the intent of this
proposal is not strictly directed at replacing Logstash by Heka.  The
intent is to propose a distributed architecture with Heka running on
each cluster node rather than having Logstash run as a centralized
logs processing component.  For such a distributed architecture we
think that Heka is more appropriate, with a smaller memory footprint
and better performance in general.  In addition, Heka is also more
than a logs processing tool, as it's designed to process streams of
any type of data, including events, logs and metrics.

Some elements of comparison between Heka and Logstash:

* Logstash was designed for logs processing.  Heka is a "unified data
processing" software, designed to process streams of any type of data.
So Heka is about running one service on each box instead of many.
Using a single service for processing different types of data also
makes it possible to do correlations, and derive metrics from logs and
events.  See Rob Miller's presentation [4] for more details.

* The virtual size of the Logstash Docker image is 447 MB, while the
virtual size of a Heka image built from the same base image
(debian:jessie) is 177 MB.  For comparison the virtual size of the
Elasticsearch image is 345 MB.

* Heka is written in Go and has no dependencies.  Go programs are
compiled to native code.  This is in contrast to Logstash, which uses
JRuby and as such requires running a Java Virtual Machine.  Besides
this native versus interpreted code aspect, this also raises the
question of which JVM to use (Oracle, OpenJDK?) and which version
(6, 7, 8?).

* There are six types of Heka plugins: Inputs, Splitters, Decoders,
Filters, Encoders, and Outputs.  Heka plugins are written in Go or
Lua.  When written in Lua they run in a sandbox, and
misbehaving plugins may be shut down by Heka.  Lua plugins may also be
dynamically added to Heka with no config changes or Heka restart. This
is an important property in container environments such as Mesos,
where workloads are changed dynamically.

* To avoid losing logs under high load it is often recommended to use
Logstash together with Redis [5].  Redis plays the role of a buffer,
where logs are queued when Logstash or Elasticsearch cannot keep up
with the load.  Heka, as a "unified data processing" software,
includes its own resilient message queue, making it unnecessary to use
an external queue (Redis for example).

* Heka is faster than Logstash for processing logs, and its memory
footprint is smaller.  I ran tests where 3,400,000 log messages were
read from 500 input files and then written to a single output file.
Heka processed the 3,400,000 log messages in 12 s, consuming
500 MB of RAM.  Logstash processed the 3,400,000 log messages in
1 min 35 s, consuming 1.1 GB of RAM.  With a grok filter added to parse
and structure logs, Logstash processed the 3,400,000 log messages in
2 min 15 s, consuming 1.5 GB of RAM. Using an equivalent filtering plugin,
Heka processed the 3,400,000 log messages in 27 s, consuming 730 MB of RAM.
See my GitHub repo [6] for more information about the test
environment.
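
To put the timings above in perspective, the short calculation below
converts them to throughput and speedup. It uses only the message count
and durations quoted in this message; the variable names are made up for
the example.

```python
MESSAGES = 3_400_000

# (tool, scenario) -> elapsed seconds, as reported in the message above.
ELAPSED_SECONDS = {
    ("Heka", "no filter"): 12,
    ("Logstash", "no filter"): 95,     # 1 min 35 s
    ("Heka", "with filter"): 27,
    ("Logstash", "with filter"): 135,  # 2 min 15 s
}

# Messages processed per second for each run.
throughput = {key: MESSAGES / seconds for key, seconds in ELAPSED_SECONDS.items()}

for (tool, scenario), rate in throughput.items():
    print(f"{tool:9s} {scenario:12s} {rate:10,.0f} messages/s")

# How many times faster Heka finished than Logstash in each scenario.
speedup_plain = (
    ELAPSED_SECONDS[("Logstash", "no filter")] / ELAPSED_SECONDS[("Heka", "no filter")]
)
speedup_filter = (
    ELAPSED_SECONDS[("Logstash", "with filter")] / ELAPSED_SECONDS[("Heka", "with filter")]
)
print(f"speedup without filter: {speedup_plain:.1f}x, with filter: {speedup_filter:.1f}x")
```

In these runs Heka handled roughly 283,000 messages/s against about 36,000
for Logstash without filtering (about 7.9x), and roughly 126,000 against
about 25,000 with filtering (5x).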

Also, I want to say that our team has been using Heka in production
for about a year, in clusters of up to 200 nodes.  Heka has proven to
be very robust, efficient and flexible enough to address our logs
processing and monitoring use-cases.  We've also acquired solid
experience with it.

Any comments are welcome!

Thanks.


[1] <http://eavesdrop.openstack.org/meetings/kolla/2016/kolla.2016-01-06-16.32.html>

[2] 
[3] 

[4] 
[5] 
[6] 


Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

2016-01-11 Thread Sam Yaple
I like the idea of using Heka. You and I have discussed this on IRC before.
So my vote for this is +1. I can't think of any downside. I would like to
hear Alicja Kwasniewska's view on this as she has done the majority of work
with Logstash up until this point.

Sam Yaple

On Mon, Jan 11, 2016 at 3:16 PM, Eric LEMOINE  wrote:

> Hi
>
> As discussed on IRC the other day [1] we want to propose a distributed
> logs processing architecture based on Heka [2], built on Alicja
> Kwasniewska's ELK work with
> .  Please take a look at the
> design document I've started working on [3].  The document is still
> work-in-progress, but the "Problem statement" and "Proposed change"
> sections should provide you with a good overview of the architecture
> we have in mind.
>
> In the proposed architecture each cluster node runs an instance of
> Heka for collecting and processing logs.  And instead of sending the
> processed logs to a centralized Logstash instance, logs are directly
> sent to Elasticsearch, which itself can be distributed across multiple
> nodes for high-availability and scaling.  The proposed architecture is
> based on Heka, and it doesn't use Logstash.
>
> That being said, it is important to note that the intent of this
> proposal is not strictly directed at replacing Logstash by Heka.  The
> intent is to propose a distributed architecture with Heka running on
> each cluster node rather than having Logstash run as a centralized
> logs processing component.  For such a distributed architecture we
> think that Heka is more appropriate, with a smaller memory footprint
> and better performances in general.  In addition, Heka is also more
> than a logs processing tool, as it's designed to process streams of
> any type of data, including events, logs and metrics.
>
> Some elements of comparison between Heka and Logstash:
>
> * Logstash was designed for logs processing.  Heka is a "unified data
> processing" software, designed to process streams of any type of data.
> So Heka is about running one service on each box instead of many.
> Using a single service for processing different types of data also
> makes it possible to do correlations, and derive metrics from logs and
> events.  See Rob Miller's presentation [4] for more details.
>
> * The virtual size of the Logstash Docker image is 447 MB, while the
> virtual size of an Heka image built from the same base image
> (debian:jessie) is 177 MB.  For comparison the virtual size of the
> Elasticsearch image is 345 MB.
>
> * Heka is written in Go and has no dependencies.  Go programs are
> compiled to native code.  This in contrast to Logstash which uses
> JRuby and as such requires running a Java Virtual Machine.  Besides
> this native versus interpreted code aspect, this also can raise the
> question of which JVM to use (Oracle, OpenJDK?) and which version
> (6,7,8?).
>
> * There are six types of Heka plugins: Inputs, Splitters, Decoders,
> Filters, Encoders, and Outputs.  Heka plugins are written in Go or
> Lua.  When written in Lua their executions are sandbox'ed, where
> misbehaving plugins may be shut down by Heka.  Lua plugins may also be
> dynamically added to Heka with no config changes or Heka restart. This
> is an important property on container environments such as Mesos,
> where workloads are changed dynamically.
>
> * To avoid losing logs under high load it is often recommended to use
> Logstash together with Redis [5].  Redis plays the role of a buffer,
> where logs are queued when Logstash or Elasticsearch cannot keep up
> with the load.  Heka, as a "unified data processing" software,
> includes its own resilient message queue, making it unnecessary to use
> an external queue (Redis for example).
>
> * Heka is faster than Logstash for processing logs, and its memory
> footprint is smaller.  I ran tests, where 3,400,000 log messages were
> read from 500 input files and then written to a single output file.
> Heka processed the 3,400,000 log messages in 12 seconds, consuming
> 500M of RAM.  Logstash processed the 3,400,000 log messages in 1mn
> 35s, consuming 1.1G of RAM.  Adding a grok filter to parse and
> structure logs, Logstash processed the 3,400,000 log messages in 2mn
> 15s, consuming 1.5G of RAM. Using an equivalent filtering plugin, Heka
> processed the 3,400,000 log messages in 27s, consuming 730M of RAM.
> See my GitHub repo [6] for more information about the test
> environment.
>
> Also, I want to say that our team has been using Heka in production
> for about a year, in clusters of up to 200 nodes.  Heka has proven to
> be very robust, efficient and flexible enough to address our log
> processing and monitoring use-cases.  We've also acquired solid
> experience with it.
>
> Any comments are welcome!
>
> Thanks.
>
>
> [1] <http://eavesdrop.openstack.org/meetings/kolla/2016/kolla.2016-01-06-16.32.html>
> [2] 
> [3] <
> 
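
The benchmark figures quoted above are easier to compare as throughput. The following is only a back-of-the-envelope sketch: the elapsed times and RAM figures are taken from the message as stated, while the names `MESSAGES`, `RUNS` and `throughput` are made up for this illustration and are not part of the original test setup.

```python
# Figures quoted in the message above: 3,400,000 messages per run.
MESSAGES = 3_400_000

# (tool, scenario) -> (elapsed seconds, RAM as reported in the message)
RUNS = {
    ("Heka", "no filter"): (12, "500M"),
    ("Logstash", "no filter"): (95, "1.1G"),        # 1mn 35s
    ("Logstash", "grok filter"): (135, "1.5G"),     # 2mn 15s
    ("Heka", "equivalent filter"): (27, "730M"),
}


def throughput(seconds):
    """Messages per second, rounded to the nearest integer."""
    return round(MESSAGES / seconds)


for (tool, scenario), (seconds, ram) in RUNS.items():
    print(f"{tool:8} {scenario:17} {throughput(seconds):>7} msg/s  RAM {ram}")

# Ratio of elapsed times (Logstash / Heka) for the two comparable pairs.
speedup_plain = 95 / 12
speedup_filtered = 135 / 27
print(f"no filter: {speedup_plain:.1f}x, with filter: {speedup_filtered:.1f}x")
```

On these numbers Heka handles roughly 283,000 messages per second without filtering against roughly 36,000 for Logstash, that is about 7.9x faster, and 5x faster once a parsing filter is added.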

Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

2016-01-11 Thread Michał Jastrzębski
On 11 January 2016 at 10:55, Eric LEMOINE  wrote:
> On Mon, Jan 11, 2016 at 5:01 PM, Michał Jastrzębski  wrote:
>> On 11 January 2016 at 09:16, Eric LEMOINE  wrote:
>
>>> * Logstash was designed for logs processing.  Heka is a "unified data
>>> processing" software, designed to process streams of any type of data.
>>> So Heka is about running one service on each box instead of many.
>>> Using a single service for processing different types of data also
>>> makes it possible to do correlations, and derive metrics from logs and
>>> events.  See Rob Miller's presentation [4] for more details.
>>
>> Right now we use rsyslog for that.
>
>
> Currently the services running in containers send their logs to
> rsyslog. And rsyslog stores the logs in local files, located in the
> host's /var/log directory.

Yeah, however the plan was to teach rsyslog to forward logs to the
central logging stack once this thing is implemented.

>> As I understand it, Heka right now
>> would actually be an alternative to rsyslog, and that is already
>> implemented. Also, in the Heka case, we might run into the same problem
>> we've seen with rsyslog: transporting logs from the service to Heka.
>> Keep in mind we're in Docker containers, and Heka will be in a different
>> container than the service it's supposed to listen to. We do that by
>> sharing a faked /dev/log across containers, and rsyslog can handle that.
>
>
> I know. Our plan is to rely on Docker. Basically: containers write
> their logs to stdout. The logs are collected by Docker Engine, which
> makes them available through the unix:///var/run/docker.sock socket.
> The socket is mounted into the Heka container, which uses the Docker
> Log Input plugin [*] to read the logs from that socket.
>
> [*] 
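
For containers started without a TTY, the Docker Engine API returns stdout and stderr multiplexed into one stream: each frame has an 8-byte header (one byte for the stream type, three padding bytes, a 4-byte big-endian payload length) followed by the payload. The sketch below shows how a reader on that socket could split such a stream. It is an illustration only, not how Heka's Docker Log Input plugin is implemented; `demux` and `frame` are hypothetical names and the sample log lines are made up.

```python
import struct

# Stream type byte used in the frame header.
STREAMS = {0: "stdin", 1: "stdout", 2: "stderr"}


def demux(data: bytes):
    """Yield (stream_name, payload) for each complete frame in data."""
    offset = 0
    while offset + 8 <= len(data):
        # Header: 1 byte stream type, 3 padding bytes, 4-byte big-endian size.
        stream_type, size = struct.unpack_from(">BxxxI", data, offset)
        start = offset + 8
        end = start + size
        if end > len(data):
            break  # incomplete frame; a real reader would wait for more bytes
        yield STREAMS.get(stream_type, "unknown"), data[start:end]
        offset = end


def frame(stream_type: int, payload: bytes) -> bytes:
    """Build one frame; used here only to fabricate sample input."""
    return struct.pack(">BxxxI", stream_type, len(payload)) + payload


# Made-up sample data standing in for what the socket would deliver.
sample = frame(1, b"INFO nova started\n") + frame(2, b"ERROR boom\n")
for name, payload in demux(sample):
    print(name, payload.decode().rstrip())
```

Keeping the stream name with each payload lets a collector treat stderr differently from stdout, for example by assigning it a higher severity.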

So docker logs isn't the best thing there is, however I'd suspect that's
mostly the console output's fault. If you can tap into stdout efficiently,
I'd say that's a pretty good option.

>
>> Also Heka seems to be just a processing mechanism while logstash is
>> well...stash, so it saves and persists logs, so it seems to me they're
>> different layers of log processing.
>
> No. Logstash typically stores the logs in Elasticsearch. And we'd do
> the same with Heka.
>
>
>>
>> Seems to me we need an additional comparison of Heka vs rsyslog ;) Also
>> this would have to be hands down better, because rsyslog is already
>> implemented, working, and most operators know how to use it.
>
>
> We don't need to remove Rsyslog. Services running in containers can
> write their logs to both Rsyslog and stdout, which is even what they
> do today (at least for the OpenStack services).
>

There is no point in that, imho. I don't want to have several systems
doing the same thing. Let's decide on one, but optimal, toolset.
Could you please describe, bottom up, what your logging stack would
look like? Heka listening on stdout, transferring stuff to
Elasticsearch, and Kibana on top of it?

> Hope that makes sense!
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
