Hi all,

Ali, I share your opinion concerning Heka's strengths. I also think that
Heka stands out because of the flexibility of its filters. There are few,
if any, lightweight data collectors/shippers that let you process events
with that many decoders/filters/encoders, with the possibility of
chaining them. The numerous filtering possibilities were what made us
choose Heka.

Concerning the alternative to Heka, i.e. Elastic's Beats: there is obviously
a lack of outputs. However, things might take a turn, and you should look at
(and might even participate in) this recent ticket about having
community-maintained outputs:
https://github.com/elastic/beats/pull/1681

Vincent

On 2 June 2016 at 22:22, Ali <h...@alijnabavi.info> wrote:

> Thanks, Rob!
>
> I have to say, I'm EXTREMELY DISAPPOINTED to hear this.
>
> I have been away from Heka for a while (working on other projects at work)
> and am now able to refocus on designing our new data
> collection/analysis/reporting system.  Once I read this e-mail, I started
> looking around to see what else was out there and what has changed over the
> last several months.  Elastic's Beats
> <https://www.elastic.co/products/beats> project, particularly Filebeat
> <https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-overview.html>,
> seemed like a really interesting and welcome development.  However,
> compared to the flexibility of Heka's ins and outs, Filebeat seems to be
> badly wanting.
>
> Suffice it to say, Heka still seems to stand alone in this space.  Its
> flexibility is amazing.  (Again, mostly talking about inputs and outputs
> here.)  The closest I can come to it is nxlog
> <http://nxlog-ce.sourceforge.net/about>, and I just really dislike that
> it's not more transparent and open-source.
>
> Anyway, I understand the rationale behind this decision and am hopeful
> that another org will continue work on this project.  Thanks for all of
> your efforts, Rob et al!
>
> -Ali
>
> P.S.  If anyone's interested, here's my situation right now:
> https://www.reddit.com/r/bigdata/comments/4m81vo/which_log_collectors_to_use_for_robust_handling/
>  and
> https://discuss.elastic.co/t/how-can-i-get-data-from-filebeat-to-flume/51734
>
>
> On Fri, May 6, 2016 at 12:51 PM Rob Miller <rmil...@mozilla.com> wrote:
>
>> Hi everyone,
>>
>> I'm loooong overdue in sending out an update about the current state of
>> and plans for Heka. Unfortunately, what I have to share here will
>> probably be disappointing for many of you, and it might impact whether
>> or not you want to continue using it, as all signs point to Heka getting
>> less support and fewer updates moving forward.
>>
>> The short version is that Heka has some design flaws that make it hard
>> to incrementally improve it enough to meet the high throughput and
>> reliability goals that we were hoping to achieve. While it would be
>> possible to do a major overhaul of the code to resolve most of these
>> issues, I don't have the personal bandwidth to do that work, since most
>> of my time is consumed working on Mozilla's immediate data processing
>> needs rather than general purpose tools these days. Hindsight
>> (https://github.com/trink/hindsight), built around the same Lua sandbox
>> technology as Heka, doesn't have these issues, and internally we're
>> using it more and more instead of Heka, so there's no organizational
>> imperative for me (or anyone else) to spend the time required to
>> overhaul the Go code base.
>>
>> Heka is still in use here, though, especially on our edge nodes, so it
>> will see a bit more improvement and at least a couple more releases.
>> Most notably, it's on my list to switch to using the most recent Lua
>> sandbox code, which will move most of the protobuf processing to custom
>> C code, and will likely improve performance as well as remove a lot of
>> the problematic cgo code, which is what's currently keeping us from
>> being able to upgrade to a recent Go version.
>>
>> Beyond that, however, Heka's future is uncertain. The code that's there
>> will still work, of course, but I may not be doing any further
>> improvements, and my ability to keep up with support requests and PRs,
>> already on the decline, will likely continue to wane.
>>
>> So what are the options? If you're using a significant amount of Lua
>> based functionality, you might consider transitioning to Hindsight. Any
>> Lua code that works in Heka will work in Hindsight, and Hindsight is a
>> much leaner and more solid foundation. It has far fewer i/o plugins than
>> Heka, though, so for many it won't be a simple transition.
>>
>> Also, if there's someone out there (an organization, most likely) that
>> has a strong interest in keeping Heka's codebase alive, through funding
>> or coding contributions, I'd be happy to support that endeavor. Some
>> restrictions apply, however; the work that needs to be done to improve
>> Heka's foundation is not beginner level work, and my time to help is
>> very limited, so I'm only willing to support folks who demonstrate that
>> they are up to the task. Please contact me off-list if you or your
>> organization is interested.
>>
>> Anyone casually following along can probably stop reading here. Those of
>> you interested in the gory details can read on to hear more about what
>> the issues are and how they might be resolved.
>>
>> First, I'll say that I think there's a lot that Heka got right. The
>> basic composition of the pipeline (input -> split -> decode -> route ->
>> process -> encode -> output) seems to hit a sweet spot for composability
>> and reuse. The Lua sandbox, and especially the use of LPEG for text
>> parsing and transformation, has proven to be extremely efficient and
>> powerful; it's the most important and valuable part of the Heka stack.
>> The routing infrastructure is efficient and solid. And, perhaps most
>> importantly, Heka is useful; there are a lot of you out there using it
>> to get work done.
>>
>> There was one fundamental mistake made, however, which is that we
>> shouldn't have used channels. There are many competing opinions about Go
>> channels. I'm not going to get in to whether or not they're *ever* a
>> good idea, but I will say unequivocally that their use as the means of
>> pushing messages through the Heka pipeline was a mistake, for a number
>> of reasons.
>>
>> First, they don't perform well enough. While Heka performs many tasks
>> faster than some other popular tools, we've consistently hit a
>> throughput ceiling thanks to all of the synchronization that channels
>> require. And this ceiling, sadly, is generally lower than is acceptable
>> for the amount of data that we at Mozilla want to push through a single
>> aggregator system.
>>
>> Second, they make it very hard to prevent message loss. If unbuffered
>> channels are used everywhere, performance plummets unacceptably due to
>> context-switching costs. But using buffered channels means that many
>> messages are in flight at a time, most of which are sitting in channels
>> waiting to be processed. Keeping track of which messages have made it
>> all the way through the pipeline requires complicated coordination
>> between chunks of code that are conceptually quite far away from each
>> other.
>>
>> Third, the buffered channels mean that Heka consumes much more RAM than
>> would be otherwise needed, since we have to pre-allocate a pool of
>> messages. If the pool size is too small, then Heka becomes susceptible
>> to deadlocks, with all of the available packs sitting in channel queues,
>> unable to be processed because some plugin is blocked on waiting for an
>> available pack. But cranking up the pool size causes Heka to use more
>> memory, even when it's idle.
>>
>> Hindsight avoids all of these problems by using disk queues instead of
>> RAM buffers between all of the processing stages. It's a bit
>> counterintuitive, but at high throughput performance is actually better
>> than with RAM buffers, because a) there's no need for synchronization
>> locks and b) the data is typically read quickly enough after it's
>> written that it stays in the disk cache.
>>
>> There's much less chance of message loss, because every plugin is
>> holding on to only one message in memory at a time, while using a
>> written-to-disk cursor file to track the current position in the disk
>> buffer. If the plug is pulled mid-process, some messages that were
>> already processed might be processed again, but nothing will be lost,
>> and there's no need for complex coordination between different stages of
>> the pipeline.
>>
>> Finally, there's no need for a pool of messages. Each plugin is holding
>> some small number of packs (possibly as few as one) in its own memory
>> space, and those packs never escape that plugin's ownership. RAM usage
>> doesn't grow, and pool exhaustion related deadlocks are a thing of the
>> past.
>>
>> For Heka to have a viable future, it would basically need to be updated
>> to work almost exactly like Hindsight. First, all of the APIs would need
>> to be changed to no longer refer to channels. (The fact that we exposed
>> channels to the APIs is another mistake we made... it's now generally
>> frowned upon in Go land to expose channels as part of your public APIs.)
>> There's already a non-channel based API for filters and outputs, but
>> most of the plugins haven't yet been updated to use the new API, which
>> would need to happen.
>>
>> Then the hard work would start; a major overhaul of Heka's internals, to
>> switch from channel based message passing to disk queue based message
>> passing. The work that's been done to support disk buffering for filters
>> and outputs is useful, but not quite enough, because it's not scalable
>> for each plugin to have its own queue; the number of open file
>> descriptors would grow very quickly. Instead it would need to work like
>> Hindsight, where there's one queue that all of the inputs write to, and
>> another that filters write to. Each plugin reads through its specified
>> input queue, looking for messages that match its message matcher,
>> writing its location in the queue back to the shared cursors file.
>>
>> There would also be some complexity in reconciling Heka's breakdown of
>> the input stage into input/splitter/decoder with Hindsight's
>> encapsulation of all of these stages into a single sandbox.
>>
>> Ultimately I think this would be at least 2-3 months full time work for
>> me. I'm not the fastest coder around, but I know where the bodies are
>> buried, so I'd guess it would take anyone else at least as long,
>> possibly longer if they're not already familiar with how everything is
>> put together.
>>
>> And that's about it. If you've gotten this far, thanks for reading.
>> Also, thanks to everyone who's contributed to Heka in any way, be it by
>> code, doc fixes, bug reports, or even just appreciation. I'm sorry for
>> those of you using it regularly that there's not a more stable future.
>>
>> Regards,
>>
>> -r
>> _______________________________________________
>> Heka mailing list
>> Heka@mozilla.org
>> https://mail.mozilla.org/listinfo/heka
>>
>
> _______________________________________________
> Heka mailing list
> Heka@mozilla.org
> https://mail.mozilla.org/listinfo/heka
>
>