[heka] Heka test suite

2016-05-06 Thread Timur Batyrshin
Hi all,

Half a year ago I felt the need to write unit tests for my Lua modules for Heka 
but didn’t find an easy way to do that with the standard Heka/LuaSandbox toolchain.

For that I’ve created a pure Lua mock for Heka: 
https://github.com/timurb/heka_mock

That is, you can write unit tests with standard Lua testing libraries (like 
busted), make sure some basic logic is sound, and move on to real testing 
against Heka afterwards.

Some usage examples can be found in the test/spec folder.
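To give a flavor, a busted spec might look roughly like this. The require path 
and the mock API shown here are hypothetical; consult the real examples in 
test/spec for the actual interface:

```lua
-- Hypothetical sketch only: module name, constructor, and the
-- `injected` field are assumptions, not the real heka_mock API.
local mock = require "heka_mock"   -- assumed require path

describe("my_decoder", function()
  it("injects one message per input line", function()
    local sandbox = mock.new("lua_decoders/my_decoder.lua")  -- assumed call
    sandbox:process_message("foo 123")
    assert.are.equal(1, #sandbox.injected)                   -- assumed field
  end)
end)
```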

It’s far from complete: I never intended to implement binary functions 
(like a correct decode_message()), and some others may be missing too. It is 
also by no means a replacement for real testing against Heka.

Nevertheless, it helped me catch a couple of bugs which I’d have had a hard 
time catching with Heka itself (like a table being passed by reference rather 
than copied).
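The reference-vs-copy pitfall mentioned above can be reproduced in a few lines 
of plain Lua:

```lua
-- Tables are passed and assigned by reference in Lua, so two messages
-- built from the same default table silently share state.
local default_fields = {}
local msg1 = { Fields = default_fields }
local msg2 = { Fields = default_fields }

msg1.Fields.host = "a"
print(msg2.Fields.host)  -- prints "a": msg2 was mutated through msg1
```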

I’m no longer an active Heka user, but I figured I’d better send the 
announcement anyway than keep the tool hidden from the community :-)

If you need help using it I’d be happy to assist, and if you have pull 
requests I’d be happy to merge them.

Also any feedback is very welcome.

Timur
Heka mailing list
Heka@mozilla.org
https://mail.mozilla.org/listinfo/heka


Re: [heka] regexp in the left part of the message_matcher?

2016-03-13 Thread Timur Batyrshin
My use case is simple: I want to send out only a subset of metrics.
My list of metrics is too large to be typed out as a message matcher (hundreds 
to thousands in total, with dozens to hundreds of matches).

Looks like I’ll iterate over the metrics as you suggest — I’m doing something 
like that anyway for conversion purposes.
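For anyone else with the same problem, the iteration Trink suggests could be 
sketched like this inside a sandbox filter or decoder (the "^foo_" pattern is 
illustrative):

```lua
-- Sketch: match field names by Lua pattern, since the message matcher
-- treats field names as literals. Runs inside a Heka Lua sandbox.
local msg = decode_message(read_message("raw"))
for _, field in ipairs(msg.Fields or {}) do
  if string.match(field.name, "^foo_") then
    -- field.value holds this field's value(s);
    -- forward/convert the matching metric here
  end
end
```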


Thanks,
Timur

On 13 Mar 2016 at 21:34:03, Michael Trinkala (mtrink...@mozilla.com) wrote:

All field names are treated as literals, so no, it is not possible.  Within the 
plugin you could iterate over all fields and implement your own matching 
function, but that is still suboptimal.  I would probably look into altering 
your message structure to better match the consumption use case.

Trink

On Sun, Mar 13, 2016 at 7:08 AM, Timur Batyrshin <ert...@gmail.com> wrote:
Hi all,

I’ve been looking for a way to specify a regexp in the left part of the message 
matcher (example: Fields[foo_.*] != NIL).
I could not find a way to do that by checking the docs.

Could you confirm that I can’t do that, or did I just miss the doc page?


Thanks,
Timur



Re: [heka] How to parse multi-line node.js stack traces?

2016-02-03 Thread Timur Batyrshin
Hi Rodion,


Is it correct that multi-line log entries always end with an empty line, and 
that no empty lines occur inside a multi-line entry?

If that’s correct, use a TokenSplitter or RegexSplitter to split the stream of 
log lines into individual messages.

Then use a MultiDecoder with cascade_strategy set to “first-wins” to decode 
the different log line formats.
The sub-decoders would be plain LPEG-based Lua decoders, one for each format 
of your log messages
(or a PayloadRegexDecoder, but it is said to be slower and is not recommended).
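A minimal config along those lines might look like this (the plugin names and 
the blank-line delimiter are illustrative, not taken from a real setup):

```toml
[entry_splitter]
type = "RegexSplitter"
delimiter = '\n\n'          # assumes entries are terminated by an empty line

[multi_decoder]
type = "MultiDecoder"
subs = ["format_a_decoder", "format_b_decoder"]
cascade_strategy = "first-wins"

[format_a_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/format_a.lua"
```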

For testing LPEG grammars there is a great online tool (http://lpeg.trink.com/) 
which is really helpful for debugging.


Regards,
Timur
On 2 Feb 2016 at 21:05:17, Rodion Vynnychenko (rodd...@gmail.com) wrote:

Hello,

is it possible to make heka understand multi-line stack traces node.js 
produces? For example this script:

var e = new (require('events').EventEmitter)();

for (var i = 0; i < 20; i++) {
    e.on('test', function() {});
}

process.stderr.write("Something normal written to stderr\n");

console.log(a());

would produce stderr output along the lines of:

(node) warning: possible EventEmitter memory leak detected. 11 listeners added. 
Use emitter.setMaxListeners() to increase limit.
Trace
    at EventEmitter.addListener (events.js:160:15)
    at Object.<anonymous> (/tmp/test.js:4:4)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)
    at node.js:902:3
Something normal written to stderr

/tmp/test.js:7
console.log(a());
    ^
ReferenceError: a is not defined
    at Object.<anonymous> (/tmp/test.js:7:13)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)
    at node.js:902:3

What would be the best way to configure heka to capture the above output as 3 
messages instead of one per line?

--
Cheers,
Rodion


Re: [heka] 2 consecutive inject_payload() calls in a single process_message() of encoder

2016-01-19 Thread Timur Batyrshin
Which param configures that? plugin_chansize, or something else?

Hopefully I will not hit that case, as there will be no more than 10 messages 
emitted by a single decoder at a time.
But if several inputs run the same decoder, it becomes not so unlikely...


Timur
On 19 Jan 2016 at 23:07:20, Rob Miller (rmil...@mozilla.com) wrote:

Yes, decoders do, but keep in mind that none of the messages will be injected 
until they're *all* injected. This isn't usually a problem if you're only 
emitting a few output messages per input message, but if you're emitting a lot 
of messages you can easily exhaust the message pool.  

Filters don't have this problem.  

-r  



On 01/19/2016 12:04 PM, Timur Batyrshin wrote:  
> Ok, thanks for clarification!  
>  
> I think I’ll have to split the messages in decode then.  
> Do decoders support injecting several messages?  
>  
> Timur  
>  
> On 19 Jan 2016 at 23:00:53, Rob Miller (rmil...@mozilla.com  
> <mailto:rmil...@mozilla.com>) wrote:  
>  
> > No, I mean that SandboxEncoders don't at this point support multiple  
> > inject calls for each input message. You can only call inject once. If  
> > you want smaller input, you'll have to do the work in a filter.  
> >  
> > -r  
> >  
> >  
> > On 01/19/2016 11:53 AM, Timur Batyrshin wrote:  
> > > My point was not accumulating strings in output buffer but injecting  
> > > several messages instead to make them smaller.  
> > > Or do you mean that inject_message() run several times in a single run  
> > > should work?  
> > > I’ll try that — haven’t done that yet.  
> > >  
> > >  
> > > Timur  
> > >  
> > > On 19 Jan 2016 at 22:50:28, Rob Miller (rmil...@mozilla.com  
> > > <mailto:rmil...@mozilla.com>) wrote:  
> > >  
> > > > Yes, that's right. A SandboxEncoder can only make a single inject  
> > > > call. If you call `inject_message`, then you will be emitting a  
> > > > protobuf encoded Heka message. If you call `inject_payload`, then you  
> > > > will be emitting a UTF8 encoded set of bytes. If you want to  
> > > > accumulate multiple string values in the output buffer, you can use  
> > > > `add_to_payload` before a final `inject_payload` call.  
> > > >  
> > > > -r  
> > > >  
> > > >  
> > > > On 01/16/2016 12:08 PM, Timur Batyrshin wrote:  
> > > > > An encoder I was using for reference:  
> > > > >  
> > > > > function process_message()  
> > > > >   inject_payload("msg1", "", "msg1")  
> > > > >   inject_payload("msg2", "", "msg2")  
> > > > >   return 0  
> > > > > end  
> > > > >  
> > > > > This produces only “msg2”.  
> > > > >  
> > > > >  
> > > > > Timur  
> > > > >  
> > > > >  
> > > > > On 16 Jan 2016 at 22:09:35, Timur Batyrshin 
> > > > > (timur.batyrs...@cinarra.com  
> > > > > <mailto:timur.batyrs...@cinarra.com>) wrote:  
> > > > >  
> > > > > > Hi,  
> > > > > >  
> > > > > >  
> > > > > > I’ve been trying to do some buffering and splitting of messages  
> > > > > > in an encoder and I’ve run into the following issue.  
> > > > > >  
> > > > > > It looks like calling inject_payload() several times in a single  
> > > > > > run of the encoder’s process_message() doesn’t work and only the  
> > > > > > last message is injected.  
> > > > > > Is that so?  
> > > > > >  
> > > > > > If yes, any ideas how I can solve the following task?  
> > > > > > * I have incoming messages consisting of up to several hundred  
> > > > > > custom fields  
> > > > > > * I’m encoding the fields into a JSON array like [{"metric":  
> > > > > > "foobar.count", "value": 1000}, {"metric": "bazqux.rate",  
> > > > > > "value": 10}] and sending them out via HTTP to a different service  
> > > > > > * I’d like to keep outgoing HTTP messages relatively small (no  
> > > > > > more than a few dozen fiel

[heka] 2 consecutive inject_payload() calls in a single process_message() of encoder

2016-01-16 Thread Timur Batyrshin


Hi,


I’ve been trying to do some buffering and splitting of messages in an encoder 
and I’ve run into the following issue.

It looks like calling inject_payload() several times in a single run of the 
encoder’s process_message() doesn’t work and only the last message is injected.
Is that so?

If yes, any ideas how I can solve the following task?
* I have incoming messages consisting of up to several hundred custom fields
* I’m encoding the fields into a JSON array like [{"metric": "foobar.count", 
"value": 1000}, {"metric": "bazqux.rate", "value": 10}] and sending them out 
via HTTP to a different service
* I’d like to keep outgoing HTTP messages relatively small (no more than a few 
dozen fields), and for that reason I’m going to split messages with too many 
fields into several smaller ones

For that I was going to do the splitting inside the encoder attached to 
HTTPOutput, calling inject_payload() several times, but it seems that only the 
last message is sent out.
I’ve tried setting max_process_inject = 1000 in the [hekad] config section but 
this didn’t help either.

I could try using a filter to split messages and reinject them back into Heka 
(haven’t tested yet whether that works in this scenario) but it looks a bit 
heavyweight to me.
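For reference, the filter route could be sketched like this. This is wholly 
illustrative: the chunk size, the "metrics.chunk" Type, and the idea of copying 
fields verbatim are my assumptions, and a real filter's output matcher would 
need to pick up the new Type:

```lua
-- Sketch of a sandbox filter that re-injects one message per chunk of
-- at most max_fields fields.
local max_fields = 30

function process_message()
  local src = decode_message(read_message("raw"))
  local out = { Type = "metrics.chunk", Fields = {} }
  for _, field in ipairs(src.Fields or {}) do
    out.Fields[#out.Fields + 1] = field
    if #out.Fields == max_fields then
      inject_message(out)
      out = { Type = "metrics.chunk", Fields = {} }
    end
  end
  if #out.Fields > 0 then inject_message(out) end
  return 0
end
```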

Any other ideas how to handle the task?


Thanks,
Timur


Re: [heka] 2 consecutive inject_payload() calls in a single process_message() of encoder

2016-01-16 Thread Timur Batyrshin
An encoder I was using for reference:

function process_message()
  inject_payload("msg1", "", "msg1")
  inject_payload("msg2", "", "msg2")
  return 0
end

This produces only “msg2”.


Timur


On 16 Jan 2016 at 22:09:35, Timur Batyrshin (timur.batyrs...@cinarra.com) wrote:

Hi,


I’ve been trying to do some buffering and splitting of messages in an encoder 
and I’ve run into the following issue.

It looks like calling inject_payload() several times in a single run of the 
encoder’s process_message() doesn’t work and only the last message is injected.
Is that so?

If yes, any ideas how I can solve the following task?
* I have incoming messages consisting of up to several hundred custom fields
* I’m encoding the fields into a JSON array like [{"metric": "foobar.count", 
"value": 1000}, {"metric": "bazqux.rate", "value": 10}] and sending them out 
via HTTP to a different service
* I’d like to keep outgoing HTTP messages relatively small (no more than a few 
dozen fields), and for that reason I’m going to split messages with too many 
fields into several smaller ones

For that I was going to do the splitting inside the encoder attached to 
HTTPOutput, calling inject_payload() several times, but it seems that only the 
last message is sent out.
I’ve tried setting max_process_inject = 1000 in the [hekad] config section but 
this didn’t help either.

I could try using a filter to split messages and reinject them back into Heka 
(haven’t tested yet whether that works in this scenario) but it looks a bit 
heavyweight to me.

Any other ideas how to handle the task?


Thanks,
Timur


Re: [heka] TcpOutput plugin error

2016-01-10 Thread Timur Batyrshin
Hi Emily,

Not sure what your exact use case is for sending data out to Heka.
I usually find it much easier to use JSON or a similar plain-text format
for sending messages to Heka, unless you have tight throughput
requirements.

In my tests I've seen throughput of ~1K messages/second (~10 Mbit/s) on a
c4.large instance on AWS using the stock Lua JSON decoder/encoder and HTTP
output/input.
If you are expecting throughput below that, you should probably look into
this approach -- at least until you get used to Heka and how it works.

Best regards,
Timur

On Fri, Jan 8, 2016 at 3:22 AM, Emily Gu <77.e...@gmail.com> wrote:

> This is working. Thanks!
>
> I'm confused about the two-instance setup and some other parts.
>
> Yes, I need to send our custom data into Heka. I want to see whether I need
> to write my own custom Heka plugin or can leverage existing plugins. My
> custom data is a slice of metrics that can be sent into Heka through TCP.
>
> Your suggestion is very much appreciated.
>
> Thanks,
> Emily
>
> On Thu, Jan 7, 2016 at 4:10 PM, Rob Miller  wrote:
>
>> From what I can tell (and it's not very clear), it looks like you've got
>> one Heka instance running that has only a TcpInput, nothing else. That will
>> accept data, but it's not going to do anything with that data.
>>
>> Then you've got a separate Heka config that contains no inputs, but only
>> a TcpOutput (pointing at the input that's specified in the other config)
>> and a FileOutput. These outputs might conceivably send data somewhere, but
>> there are no inputs, so it's not clear where that data would come from.
>>
>> Drop the TcpOutput altogether, and combine the TcpInput and the
>> FileOutput into a single config:
>>
>> [hekad]
>> maxprocs = 1
>> share_dir = "/Users/egu/heka/share/heka"
>>
>> [tcp_in:3242]
>> type = "TcpInput"
>> splitter = "HekaFramingSplitter"
>> decoder = "ProtobufDecoder"
>> address = ":3242"
>>
>> [tcp_heka_output_log]
>> type = "FileOutput"
>> message_matcher = "TRUE"
>> path = "/tmp/output.log"
>> perm = "664"
>> encoder = "tcp_heka_output_encoder"
>>
>> [tcp_heka_output_encoder]
>> type = "PayloadEncoder"
>> append_newlines = false
>>
>>
>> Once you've done that, you should be able to use `heka-inject` to send a
>> message into your running Heka:
>>
>> $ heka-inject -heka 127.0.0.1:3242 -payload "1212 this is just a test"
>>
>> If you want to send custom data in through that TcpInput, then you'll
>> have to switch to using a different splitter and a different decoder, the
>> default setup you're using will only know how to handle Heka protobuf
>> streams.
>>
>> -r
>>
>>
>>
>>
>> On 01/07/2016 03:48 PM, Emily Gu wrote:
>>
>>> Thanks you both Rob and David very much!
>>>
>>> Not sure where I need to define "base_dir"?
>>>
>>> I'm going to write a Heka plugin to pass our metrics data into Heka.
>>>
>>> For now, I'm having a hard time seeing the data I send in
>>> programmatically through the TcpInput in the output.log file.
>>> I don't see any output.  The configs are:
>>>
>>> tcp_input.toml
>>> 
>>>
>>> [hekad]
>>>
>>> maxprocs = 1
>>>
>>> share_dir = "/Users/egu/heka/share/heka"
>>>
>>>
>>> [tcp_in:3242]
>>>
>>> type = "TcpInput"
>>>
>>> splitter = "HekaFramingSplitter"
>>>
>>> decoder = "ProtobufDecoder"
>>>
>>> address = ":3242"
>>>
>>>
>>> tcp_output.toml
>>>
>>> ==
>>>
>>> [hekad]
>>>
>>> maxprocs = 1
>>>
>>> share_dir = "/Users/egu/heka/share/heka"
>>>
>>>
>>> [tcp_out:3242]
>>>
>>> type = "TcpOutput"
>>>
>>> message_matcher = "TRUE"
>>>
>>> address = "127.0.0.1:3242 "
>>>
>>>
>>> [tcp_heka_output_log]
>>>
>>> type = "FileOutput"
>>>
>>> message_matcher = "TRUE"
>>>
>>> path = "/tmp/output.log"
>>>
>>> perm = "664"
>>>
>>> encoder = "tcp_heka_output_encoder"
>>>
>>>
>>> [tcp_heka_output_encoder]
>>>
>>> type = "PayloadEncoder"
>>>
>>> append_newlines = false
>>>
>>>
>>> The client:
>>>
>>> package main
>>>
>>> import (
>>>     "fmt"
>>>
>>>     "github.com/mozilla-services/heka/client"
>>> )
>>>
>>> func main() {
>>>     message_bytes := []byte{100}
>>>
>>>     sender, err := client.NewNetworkSender("tcp", "127.0.0.1:3242")
>>>     if err != nil {
>>>         fmt.Println("Could not connect to", "127.0.0.1:3242")
>>>         return
>>>     }
>>>     fmt.Println("Connected")
>>>     var i int
>>>     for i = 0; i < 10; i++ {
>>>         fmt.Println("message byte:", string(message_bytes))
>>>         err = sender.SendMessage(message_bytes)
>>>         if err != nil {
>>>             break
>>>         }
>>>     }
>>>     fmt.Println("sent", i, "messages")
>>> }
>>>
>>>
>>>
>>> Please let me know what else I need to change.
>>>
>>> Thanks,
>>>
>>> Emily
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jan 7, 2016 at 3:28 PM, David Birdsong >> 

[heka] Sending protobuf Heka messages over HTTP

2015-12-14 Thread Timur Batyrshin
Hi,

Is it possible to send Heka messages in protobuf over HTTP?

I’ve tried to do so using the HTTPOutput and HTTPListenInput plugins, each set 
up with ProtobufEncoder/ProtobufDecoder, but the receiving Heka only 
shows messages like the following (which I assume means that protobuf decoding 
has failed):
2015/12/14 10:00:36
:Timestamp: 1970-01-01 00:00:00 +0000 UTC
:Type:
:Hostname:
:Pid: 0
:Uuid:
:Logger:
:Payload:
:EnvVersion:
:Severity: 7

If I change HTTPOutput+HTTPListenInput to TcpOutput/TcpInput (with no other 
config changes) everything starts working fine.
I need to use HTTP here, not plain TCP.
I could use a custom JSON encoder instead of ProtobufEncoder, but I’d first 
like to avoid extra conversion steps if possible.
(PayloadEncoder would not work here as it doesn’t encode Hostname and the rest 
of the standard Heka fields.)


Here are my configs for reference:

# Sender
[hekad]
maxprocs = 2
poolsize = 100
plugin_chansize = 30

[DashboardOutput]
ticker_interval = 5
message_matcher = "Hostname == 'ip-10-10-9-99' && (Type == 'heka.all-report' || 
Type == 'heka.sandbox-output' || Type == 'heka.sandbox-terminated')"

[http_output]
type = "HttpOutput"
address = "http://54.229.73.31:8325"
encoder = "ProtobufEncoder"
message_matcher = "TRUE"

# Receiver
[hekad]
maxprocs = 2
poolsize = 100
plugin_chansize = 30
[DashboardOutput]
ticker_interval = 5
message_matcher = "TRUE"
[HttpListenInput]
address = "0.0.0.0:8325"
decoder = "ProtobufDecoder"
[debug_encoder]
type="RstEncoder"

[LogOutput]
encoder = "debug_encoder"
message_matcher = "Hostname != 'ip-10-10-9-156'"

Both Hekas are 0.10.0b2 running on Ubuntu 12.04 on AWS.


Thanks,
Timur


[heka] measure Heka load

2015-11-16 Thread Timur Batyrshin
Hi,

Is there a way to measure Heka’s load other than by parsing the output of 
DashboardOutput in a special way?

I’ve just hit a case where one of my router Hekas had an HttpOutput pointing 
at a non-existent host, which severely decreased its performance (to 
~10-20 KB/sec, which is no more than 1k messages/sec).
I’d like to catch such cases in the future, as well as the cases where I need 
to think about increasing capacity in other ways.

I see only memory statistics in the standard Heka report messages, while I 
think I’d want to watch the load on the router and inputRecycleChan.
Is there a standard way to do so?

Thanks,
Timur


Re: [heka] locality in lua sandboxes

2015-11-02 Thread Timur Batyrshin
Actually I was applying a function to k (just omitted here for clarity), so 
iterating is necessary.

This could otherwise be solved by allocating an empty table inside 
process_message() and later assigning it to msg.Fields, but 
that will trigger extra garbage collection as well.
Do you know how to measure how GC is affected in the different cases?
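In plain Lua, collectgarbage() gives at least a rough way to compare the 
allocation cost of the two approaches (this is only a coarse sketch; a real 
comparison would also watch GC pause counts inside the running sandbox):

```lua
-- Rough sketch: compare memory churn of allocating a fresh table per
-- call vs. reusing one. collectgarbage("count") returns KB in use.
local function churn(make)
  collectgarbage("collect")
  local before = collectgarbage("count")
  for i = 1, 100000 do make() end
  return collectgarbage("count") - before
end

local reused = {}
print("fresh tables:", churn(function() local t = { Fields = {} } end))
print("reused table:", churn(function() reused.Fields = nil end))
```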

Thanks,
Timur
On 2 Nov 2015 at 21:06:06, Rob Miller (rmil...@mozilla.com) wrote:

Forgot to mention, in the sample code you included, your problem will go away 
if instead of iterating through the fields returned from the grammar you just 
set the msg.Fields value every time:  


function process_message()  
  local data = read_message("Payload")  
  msg.Fields = grammar:match(data)  
  inject_message(msg)  
  return 0  
end  


-r  

On 11/02/2015 10:03 AM, Rob Miller wrote:  
> If you're not careful to zero out the values, or to explicitly set each  
> value every time, then yes, you'll end up leaking data from one  
> process_message call to the next.  
>  
> Even so, however, it's often a good idea to define the msg table outside  
> of the process_message call because then the same block of memory will  
> be reused each time. If you define the table inside of process_message,  
> then a new chunk of memory will be allocated with every call, which will  
> go out of scope when the call exits. This will cause a great deal of  
> garbage collection churn, likely impacting performance greatly.  
>  
> So, yes, you should be careful to not let values leak through, but it's  
> generally worth taking the extra care.  
>  
> -r  
>  
>  
> On 11/01/2015 06:10 AM, Timur Batyrshin wrote:  
> > Hi,  
> >  
> > In many stock decoders I see the construct like this:  
> >  
> > local msg = {  
> > Timestamp = nil,  
> > EnvVersion = nil,  
> > Hostname = nil,  
> > Type = msg_type,  
> > Payload = nil,  
> > Fields = nil,  
> > Severity = nil  
> > }  
> >  
> > function process_message()  
> >  
> >  
> > Here the local variable is defined outside of main functions.  
> >  
> > From the docs here  
> > (http://hekad.readthedocs.org/en/v0.10.0b1/sandbox/index.html#how-to-create-a-simple-sandbox-filter)
> >   
> >  
> > I understand that this variable is initialized once at Heka start and  
> > after that it is reused.  
> > This would mean that previous decodes could affect the subsequent  
> > decodes.  
> >  
> > Does this sound like a bug or I'm missing something?  
> >  
> > I'm asking about this because I'm using a similar approach in my code  
> > and I've seen old data leak into new messages (some non-relevant  
> > parts were skipped):  
> >  
> > local msg = {  
> > Type = msg_type,  
> > Payload = nil,  
> > Hostname = read_config("Hostname"),  
> > Fields = {},  
> > }  
> >  
> > function process_message()  
> > local data = read_message("Payload")  
> > fields = grammar:match(data)  
> > for k,v in pairs(fields) do  
> > msg.Fields[k] = v  
> > end  
> >  
> > inject_message(msg)  
> > return 0  
> > end  
> >  
> > In this case the fields set in the first message appeared in the  
> > subsequent message.  
> > After I moved the local msg = {} declaration inside process_message(),  
> > everything seemed to start working fine.  
> >  
> > The reason I'm writing here is that this behaviour could be subtly  
> > affecting many other decoders in Heka.  
> >  
> >  
> > Thanks and regards,  
> > Timur  
> >  
> >  
> >  
>  



Re: [heka] question on output buffering

2015-11-01 Thread Timur Batyrshin
Sure, I’ll try that. It’s just not so easy to test, as the queues grow quite 
slowly.
My issue is also a bit different from the one described: in my case Heka works 
totally fine but doesn’t truncate/delete the cache file on disk until restart.

Regards,
Timur

On 2 Nov 2015 at 08:59:46, Mathieu Parent (math.par...@gmail.com) wrote:

2015-10-31 18:58 GMT+01:00 Timur Batyrshin <ert...@gmail.com>:  
> Hi,  
>  
> I’m using TCPOutput to relay data to another Heka instance and I quite often  
> see that a on-disk  
> buffer grows too much.  

Rob Miller has proposed a patch at:  
https://github.com/mozilla-services/heka/issues/1738#issuecomment-150404452  

I've not had time to test it yet. Can you test it?  

> Today’s case:  
>  
> /var/cache/hekad/output_queue/relay_output# ls -l  
> total 9784788  
> -rw-r--r-- 1 root root 10019573004 Oct 31 17:37 30.log  
> -rw-r--r-- 1 root root 14 Oct 31 17:37 checkpoint.txt  
> root@r-jp-cms:/var/cache/hekad/output_queue/relay_output# cat checkpoint.txt  
> 30 10019808619  
> (the difference in numbers could have been caused by me running cat command  
> after some short delay)  
>  
> At the same time I see the metrics emitted from this host by this output as  
> I actually get alerted on them which helped me to find this.  
> After I restart Heka the file shrinks and starts to grow from the very low  
> size.  
>  
>  
> Here is a section from my config file for relay_output:  
>  
> [relay_output]  
> type = "TcpOutput"  
> address = "my.secret.domain:9123"  
> message_matcher = "TRUE"  
>  
> I have no section for buffering for this plugin (nor for any other plugin)  
> and have no other TcpOutputs.  
>  
> The docs at http://hekad.readthedocs.org/en/v0.10.0b1/buffering.html say  
> that the file should grow  
> no larger than 128 MB by default, but in the above case it is already 10 GB  
> and growing.  

This may be a different problem. I had it too, but couldn't reproduce  
it currently.  

> Is the default different from what the docs specify, or am I missing  
> something else in the configuration?  
>  
>  
> I’m running Heka 0.10.0b from GH releases page on Ubuntu 12.04  
>  
>  
> Thanks,  
> Timur  
>  
>  



--  
Mathieu  


[heka] question on output buffering

2015-10-31 Thread Timur Batyrshin
Hi,

I’m using TcpOutput to relay data to another Heka instance, and I quite often 
see that the on-disk buffer grows too much.

Today’s case:

/var/cache/hekad/output_queue/relay_output# ls -l
total 9784788
-rw-r--r-- 1 root root 10019573004 Oct 31 17:37 30.log
-rw-r--r-- 1 root root          14 Oct 31 17:37 checkpoint.txt
root@r-jp-cms:/var/cache/hekad/output_queue/relay_output# cat checkpoint.txt
30 10019808619
(the difference in numbers could have been caused by me running the cat 
command after a short delay)

At the same time I see the metrics emitted from this host by this output (I 
actually get alerted on them, which is how I found this).
After I restart Heka the file shrinks and starts growing again from a very 
small size.


Here is a section from my config file for relay_output:

[relay_output]
type = "TcpOutput"
address = "my.secret.domain:9123"
message_matcher = "TRUE"

I have no section for buffering for this plugin (nor for any other plugin) and 
have no other TcpOutputs.

The docs at http://hekad.readthedocs.org/en/v0.10.0b1/buffering.html say that 
the file should grow
no larger than 128 MB by default, but in the above case it is already 10 GB 
and growing.

Is the default different from what the docs specify, or am I missing something 
else in the configuration?
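For anyone else who hits this, buffering can also be configured explicitly per 
output. A sketch of such a section (the values shown are my reading of the 
documented defaults and are worth double-checking against your Heka version):

```toml
[relay_output.buffering]
max_file_size = 134217728     # 128 MiB per queue file (documented default)
max_buffer_size = 0           # 0 = no limit on total queue size
full_action = "shutdown"      # behaviour when the buffer fills
```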


I’m running Heka 0.10.0b from GH releases page on Ubuntu 12.04


Thanks,
Timur


[heka] bad argument #0 to 'decode_message' (must have one string argument)

2015-10-12 Thread Timur Batyrshin
Hi,

I’ve been writing a decoder for myself and have hit the following issue
which I can’t understand.

When I start Heka it produces the following error message in logs:

2015/10/12 13:54:47 SubDecoder
‘zerogw-zerogw_decoder-stdout-zerogw_rotate_fields’ error: FATAL:
process_message() /usr/share/heka/lua_decoders/rotate_fields.lua:30: bad
argument #0 to ‘decode_message’ (must have one string argument)

At the same time the code for decoder is the following:

-- the only lines above are comments which are skipped
metric_field = read_config("metric_field") or "metric"
value_field = read_config("value_field") or "value"

function process_message()
  local fields = {}

  raw = read_message("raw")   -- line 29
  msg = decode_message(raw)   -- line 30

-- other part of code is probably irrelevant as crash is seen in the above line

(I’ve tried writing that as decode_message(read_message("raw")) with the
same effect)

What’s really weird is that exactly the same decoder works fine on other
hosts.

I’m using the following Heka config:

[zerogw]
type = "ProcessInput"
ticker_interval = 0
splitter = "on_newline"
decoder = "zerogw_decoder"
stdout = true
stderr = false

[zerogw.command.0]
bin = "/usr/local/bin/zerogw_collector.py"
args = ["-s", "tcp://127.0.0.1:5111"]

[on_newline]
type = "TokenSplitter"
delimiter = "\n"

[estp_decoder]
type = "PayloadRegexDecoder"
match_regex = '^(?P<Name>[^\s]+) (?P<Timestamp>\d+) (?P<Value>\d+)'
timestamp_layout = "Epoch"

[estp_decoder.message_fields]
Service = "Zerogw"
Metric = "%Name%"
Value = "%Value%"

[zerogw_decoder]
type = "MultiDecoder"
subs = ["estp_decoder", "zerogw_rotate_fields"]
cascade_strategy = "all"

[zerogw_rotate_fields]
type = "SandboxDecoder"
filename = "lua_decoders/rotate_fields.lua"

[zerogw_rotate_fields.config]
metric_field = "Metric"
value_field = "Value"

zerogw_collector.py produces about a dozen lines to stdout every 5
seconds, in the format seen in the message payload (see below).

Since the MultiDecoder has cascade_strategy = "all", Heka dumps the messages
processed by the first decoder in the chain, which look like the following:

2015/10/12 14:04:22
:Timestamp: 2015-10-12 14:04:22 +0000 UTC
:Type: ProcessInput
:Hostname: t-eu-zgw
:Pid: 5212
:Uuid: 2c7deb23-7961-49dc-8f57-da716d851439
:Logger: zerogw
:Payload: zerogw.connections.total 1444658662 4

:EnvVersion:
:Severity: 7
:Fields:
| name:"ProcessInputName" type:string value:"zerogw.stdout"
| name:"ExitStatus" type:integer value:0
| name:"Value" type:string value:"4"
| name:"Service" type:string value:"Zerogw"
| name:"Metric" type:string value:"zerogw.connections.total"

In plain Lua I’d dump the result of read_message("raw") to stdout, add some
prints everywhere and see what happens inside, but I don’t know how to do
that here.

Any clues on how I should debug such cases?
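One low-tech option is to guard the call and surface the diagnostic through 
the sandbox's error return instead of letting decode_message() throw (a 
sketch; the error strings are illustrative):

```lua
-- Sketch: report what read_message actually returned, since the
-- "must have one string argument" error means it was not a string.
local raw = read_message("raw")
if type(raw) ~= "string" then
  return -1, "read_message('raw') returned " .. type(raw)
end
local ok, msg = pcall(decode_message, raw)
if not ok then
  return -1, "decode_message failed: " .. tostring(msg)
end
```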

Thanks,
Timur


[heka] a question on heka message format

2015-10-07 Thread Timur Batyrshin
Hi,

I’ve got a question about decoding of Heka messages.
Suppose the following call:
msg = decode_message(read_message("raw"))

Here msg.Fields will hold a table of values like the following:
{
  "name": "foobar",
  "type": "string",
  "value": 123
}

What is very much unclear to me is why some of the fields here are produced not 
as plain values but as a table holding a single value, like
{
  "name": "foobar",
  "value": [123]
}

This means I need to check every time whether the value is an array or not; an 
example from your own code: 
https://github.com/mozilla-services/heka/blob/versions/0.10/sandbox/lua/modules/ts_line_protocol.lua#L161-L162
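A small helper keeps that check in one place (plain Lua, mirroring the check 
in the linked ts_line_protocol code; the function name is my own):

```lua
-- Normalize a Heka field value that may be either a scalar or an
-- array-like table holding the value(s).
local function first_value(v)
  if type(v) == "table" then
    return v[1]
  end
  return v
end

assert(first_value(123) == 123)
assert(first_value({123}) == 123)
```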


When does this condition occur?

Thanks,
Timur


Re: [heka] Grok based parsing

2015-09-24 Thread Timur Batyrshin
Hi Andre,

Have you checked out LPEG? It is really cool too.
Here is a tutorial for it: http://lua-users.org/wiki/LpegTutorial
Some real usage examples in Heka can be found at 
https://github.com/mozilla-services/heka/tree/dev/sandbox/lua/decoders
(for example 
https://github.com/mozilla-services/heka/blob/dev/sandbox/lua/decoders/linux_loadavg.lua#L48-L58
 which is fairly easy to
understand even without reading the docs on LPEG).
There is also an online LPEG testing tool, with a few examples, to play with 
it easily: http://lpeg.trink.com/
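As a tiny taste of LPEG (plain Lua, assuming the lpeg rock is installed; the 
grammar is a simplified loadavg-style example, not code from Heka):

```lua
-- Parse three whitespace-separated numbers, loadavg-style.
local lpeg = require "lpeg"

local number  = lpeg.C(lpeg.R("09")^1 * ("." * lpeg.R("09")^1)^-1) / tonumber
local space   = lpeg.P(" ")^1
local grammar = lpeg.Ct(number * space * number * space * number)

local t = grammar:match("0.31 0.28 0.24")
-- t[1], t[2], t[3] now hold the three numbers
```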

Regards
Timur

On 24 Sep 2015 at 13:38:54, Andre (andre-li...@fucs.org) wrote:

Hi there,  

Grok is perhaps one of the coolest features of logstash (once you get  
used to debugging it...)  

Do you think heka should have similar capability?  

If yes, what would you reckon, Go or Lua?*  


Cheers  


* - I could find pre-existing implementations of Grok in both languages  


[heka] splitting multiline logs without delays

2015-09-20 Thread Timur Batyrshin
Hi,

I’m splitting logs which partially consist of multiline messages (like stack 
traces).
For this I’m using RegexSplitter with a delimiter like “\n([^\s])”, since all 
stack trace continuation lines start with a space.
The problem with this approach is a lag of one log message (you can’t match the 
delimiter until you receive the next line), which is quite noticeable on logs 
with infrequently arriving messages.

Any ideas on how to deal with this?

The only solution that comes to my mind is to create a custom splitter with a 
timeout.
This way slow logs will be processed with no lag (but with a delay of at most 
the timeout) and fast logs will be processed as usual.

Maybe I’m missing a simpler solution?
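For concreteness, the record-boundary rule described above (a new record starts whenever a line does not begin with whitespace) can be sketched in plain Lua — illustrative only, not the Heka splitter API, and `split_records` is a made-up name:

```lua
-- Split a buffer of log text into multiline records: lines starting
-- with whitespace (e.g. stack trace frames) are appended to the
-- previous record; any other line starts a new record.
local function split_records(buf)
  local records = {}
  for line in buf:gmatch("[^\n]+") do
    if line:match("^%s") and #records > 0 then
      records[#records] = records[#records] .. "\n" .. line
    else
      records[#records + 1] = line
    end
  end
  return records
end

local log = "ERROR boom\n  at foo()\n  at bar()\nINFO ok"
local recs = split_records(log)
-- recs[1] == "ERROR boom\n  at foo()\n  at bar()"
-- recs[2] == "INFO ok"
```

The one-message lag in the real splitter comes from exactly this rule: the record at the tail of the buffer cannot be emitted until a non-whitespace line (or a timeout) proves it is complete.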


Thanks,
Timur


Re: [heka] unit testing lua sandboxes

2015-09-10 Thread Timur Batyrshin
Hi Rob,

Thanks, I’ll take a look!

What is the supposed way to run specs?
Is it “go test” as described in gospec’s docs or should they be run in some 
other way?


Thanks,
Timur

On 8 Sep 2015 at 20:09:47, Rob Miller (rmil...@mozilla.com) wrote:

You can see examples of sandbox plugin unit tests we've written here:  

- 
https://github.com/mozilla-services/heka/blob/versions/0.10/sandbox/plugins/sandbox_decoders_test.go
  
- 
https://github.com/mozilla-services/heka/blob/versions/0.10/sandbox/plugins/sandbox_encoders_test.go
  
- 
https://github.com/mozilla-services/heka/blob/versions/0.10/sandbox/plugins/sandbox_filters_test.go
  

The first tests in each of those files are exercising the core plugin 
functionality, but if you scroll down you'll find tests for specific plugin 
implementations, e.g.:  

- 
https://github.com/mozilla-services/heka/blob/versions/0.10/sandbox/plugins/sandbox_decoders_test.go#L301
  
- 
https://github.com/mozilla-services/heka/blob/versions/0.10/sandbox/plugins/sandbox_decoders_test.go#L373
  

Hope this helps,  

-r  


On 09/08/2015 06:53 AM, Timur Batyrshin wrote:  
> Hi,  
>  
> Is there a recommended way to write unit tests for lua sandboxes?  
> I think I haven’t seen any references to that in docs and google told me  
> nothing about that too.  
>  
>  
> Thanks,  
> Timur  
>  
>  
>  


Re: [heka] unit testing lua sandboxes

2015-09-10 Thread Timur Batyrshin
Thanks!


On 10 Sep 2015 at 19:36:38, Rob Miller (rmil...@mozilla.com) wrote:

Once the GOPATH environment is activated (by sourcing `env.sh`, possibly via 
`build.sh`) then `go test` will work as normal to run the tests from an 
individual go package. To run the tests for all of the Heka-related packages 
(again, only after the GOPATH is correctly set), you can use `make test` from 
within the build folder.  

-r  


On 09/10/2015 07:35 AM, Timur Batyrshin wrote:  
> Hi Rob,  
>  
> Thanks, I’ll take a look!  
>  
> What is the supposed way to run specs?  
> Is it “go test” as described in gospec’s docs or should they be run in  
> some other way?  
>  
>  
> Thanks,  
> Timur  
>  
> On 8 Sep 2015 at 20:09:47, Rob Miller (rmil...@mozilla.com  
> <mailto:rmil...@mozilla.com>) wrote:  
>  
> > You can see examples of sandbox plugin unit tests we've written here:  
> >  
> > -  
> > https://github.com/mozilla-services/heka/blob/versions/0.10/sandbox/plugins/sandbox_decoders_test.go
> >   
> >  
> > -  
> > https://github.com/mozilla-services/heka/blob/versions/0.10/sandbox/plugins/sandbox_encoders_test.go
> >   
> >  
> > -  
> > https://github.com/mozilla-services/heka/blob/versions/0.10/sandbox/plugins/sandbox_filters_test.go
> >   
> >  
> >  
> > The first tests in each of those files are exercising the core plugin  
> > functionality, but if you scroll down you'll find tests for specific  
> > plugin implementations, e.g.:  
> >  
> > -  
> > https://github.com/mozilla-services/heka/blob/versions/0.10/sandbox/plugins/sandbox_decoders_test.go#L301
> >   
> >  
> > -  
> > https://github.com/mozilla-services/heka/blob/versions/0.10/sandbox/plugins/sandbox_decoders_test.go#L373
> >   
> >  
> >  
> > Hope this helps,  
> >  
> > -r  
> >  
> >  
> > On 09/08/2015 06:53 AM, Timur Batyrshin wrote:  
> > > Hi,  
> > >  
> > > Is there a recommended way to write unit tests for lua sandboxes?  
> > > I think I haven’t seen any references to that in docs and google told me  
> > > nothing about that too.  
> > >  
> > >  
> > > Thanks,  
> > > Timur  
> > >  
> > >  
> > >  


Re: [heka] Regarding Hekad with InfluxDB v0.9

2015-08-02 Thread Timur Batyrshin
Hi Saravanakumar,

I think Heka 0.9.2 works only with InfluxDB 0.8.
If you need to send metrics to InfluxDB 0.9, try Heka 0.10beta.

Regards, 
Timur
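For what it's worth, InfluxDB 0.9 replaced the 0.8 `/db/<name>/series` endpoint with `/write?db=<name>`, and Heka 0.10 ships a line-protocol encoder for it. A hedged sketch of the relevant config — the encoder filename is from memory and should be verified against the 0.10 docs:

```toml
# Sketch only: assumes Heka 0.10beta; check the docs for the exact
# encoder filename and its options.
[influxdb_line]
type = "SandboxEncoder"
filename = "lua_encoders/schema_influx_line.lua"

[InfluxOutput]
message_matcher = "Type == 'influxdb'"
encoder = "influxdb_line"
type = "HttpOutput"
# InfluxDB 0.9 takes the database as a query parameter on /write
address = "http://localhost:8086/write?db=mydb"
username = "admin"
password = "admin"
```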


 On 02 Aug 2015 04:54:18, Saravanakumar S sarsi...@outlook.com wrote: 
 Dear Mozilla Team,
 
 Can you help me to get solution for the below question:
 
 I installed InfluxDB v0.9 and the hekad tool on the same machine. I need help 
 with the hekad configuration so that it can send output to InfluxDB v0.9.
 
 Here are the steps done so far using the influx CLI:
 (1) CREATE DATABASE mydb
 (2) USE mydb
 (3) CREATE USER admin WITH PASSWORD 'admin' WITH ALL PRIVILEGES
 
 Here is my hekad configuration (/etc/hekad.toml): 
 [LogstreamerInput]
 log_directory = "/var/log"
 file_match = 'auth.log'
 
 [influxdb]
 type = "SandboxEncoder"
 filename = "lua_encoders/schema_influx.lua"
 [influxdb.config]
 series = "heka.%{Logger}"
 skip_fields = "Pid EnvVersion"
 
 [InfluxOutput]
 message_matcher = "Type == 'influxdb'"
 encoder = "influxdb"
 type = "HttpOutput"
 address = "http://localhost:8086/db/mydb/series"
 username = "admin"
 password = "admin"
 
 [LogOutput] 
 message_matcher = "TRUE" 
 encoder = "influxdb"
 
 The problem I am facing is:
 If I run `echo hello >> /var/log/auth.log` I can see the logs on hekad 
 standard output, but I cannot see the same logs in InfluxDB v0.9.
 From googling, the configuration above in /etc/hekad.toml is for InfluxDB v0.8 
 (not for InfluxDB v0.9). I tried the URI mentioned in the InfluxDB v0.9 docs 
 as the address, but it did not work. Can you please provide the right 
 hekad.toml configuration for InfluxDB v0.9?
 address = "http://localhost:8086/db/mydb/series"
 
 Thanks,
 Saravanakumar S,
 Chennai, India


[heka] dependencies/links between different hosts

2015-07-09 Thread Timur Batyrshin
Hi all,

I've just started to learn Heka and it looks really great.
I have one question which isn't yet clear to me.

How do we make dependencies/links between hosts?
I need to do some calculations with different metrics (find delta, compare,
etc) of both hostA and hostB at the same time.
For example how do we compare number of messages passed through balancer to
a sum of messages received by all backends (to route that to alert if delta
is high)?

It looks like we need to do that using filters, and it looks like I need to
write one for my needs, but how should I approach that?
Can Lua sandboxes exchange data, or do I have to keep several buffers inside
my filter implementation and route different metrics to different buffers
in Lua, or is there another good practice for this?
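To make the single-filter idea concrete (buffers keyed by host inside one sandbox), here is a minimal pure-Lua sketch of the bookkeeping — the function names are illustrative, not Heka API, and a real filter would feed `record` from `read_message` and call `delta` from `timer_event`:

```lua
-- Per-host message counters kept inside one filter's state.
local counts = {}

-- accumulate a count for one host (e.g. on each incoming metric)
local function record(host, n)
  counts[host] = (counts[host] or 0) + n
end

-- compare balancer traffic against the sum over all backend hosts;
-- a large delta would be routed to an alert
local function delta(balancer, backends)
  local sum = 0
  for _, h in ipairs(backends) do
    sum = sum + (counts[h] or 0)
  end
  return (counts[balancer] or 0) - sum
end

record("lb1", 100)
record("web1", 60)
record("web2", 38)
-- two messages passed the balancer but never reached a backend
assert(delta("lb1", {"web1", "web2"}) == 2)
```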

I'm just scratching the surface of how things work, so a general direction to
look in, without going too much into detail, will be perfectly fine for me.

Thanks,
Timur