Makes sense about not wanting to change the logging configurations. Thanks for taking the time to capture that issue in JIRA. I would say you are already following the process by discussing the change with the community and putting in a very descriptive JIRA :)

On a side note, I've been working on NIFI-1273 for the past week [1], and as part of that ticket I've refactored some of the internals of ListenSyslog and moved a lot of the inner classes out into their own top-level classes. While doing this I was also considering your point about pattern matching for the end of messages, and I tried to create an extension point that would let us support different message delimiters in the future.
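To give a rough idea of the shape it takes (the names below are made up for illustration, they are not the actual classes in the ticket), the extension point boils down to a small strategy interface that decides where one message ends and the next begins:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch only -- placeholder names, not the NIFI-1273 API.
    interface MessageDemarcator {
        // Pull any complete messages out of the buffer, leaving a partial tail behind.
        List<String> extract(StringBuilder buffer);
    }

    // Equivalent of today's behavior: every new-line ends a message.
    class NewLineDemarcator implements MessageDemarcator {
        @Override
        public List<String> extract(StringBuilder buffer) {
            final List<String> messages = new ArrayList<>();
            int newline;
            while ((newline = buffer.indexOf("\n")) >= 0) {
                messages.add(buffer.substring(0, newline));
                buffer.delete(0, newline + 1);
            }
            return messages;
        }
    }

A pattern-based implementation could then plug in at the same point without touching any of the socket handling.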
It may not be perfect, but I think it will make it slightly easier to make some of the changes you are looking for. It will probably take a few more days before I can get a pull request submitted, but I just wanted to point this out so we can coordinate.

Thanks,

Bryan

[1] https://issues.apache.org/jira/browse/NIFI-1273

On Thu, Jan 14, 2016 at 12:27 PM, Louis-Étienne Dorval <[email protected]> wrote:

> Hi,
>
> Thanks for the reply, Bryan.
>
> I'd rather not update the logback/log4j configuration because the service is already in place, and for now I'm just trying to fit around the current system. In any case, according to the RFC, a syslog message must not be longer than 1024 bytes, so a single "event" might be split anyway.
>
> I've created NIFI-1392 for that feature. I'm not sure of the process for a feature request, but I'll try to find some time to create a pull request or a patch for this.
>
> Best regards,
> Louis-Etienne
>
> On 8 January 2016 at 12:15, Bryan Bende <[email protected]> wrote:
>
>> Hello,
>>
>> Glad to hear you are getting started using ListenSyslog!
>>
>> You are definitely running into something that we should consider supporting. The current implementation treats each new-line as the message delimiter and places each message onto a queue.
>>
>> When the processor is triggered, it grabs messages from the queue up to the "Max Batch Size". So in the default case it grabs a single message from the queue, which in your case is a single line from one of the multi-line messages, and produces a FlowFile. When "Max Batch Size" is set higher, say to 100, it grabs up to 100 messages and produces a FlowFile containing all 100 messages.
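>>
>> In rough pseudo-Java, the drain loop is something like this (illustrative only, not the actual processor code):
>>
>> BlockingQueue<String> queue = new LinkedBlockingQueue<>(); // filled by the channel readers
>> int maxBatchSize = 100;
>> List<String> batch = new ArrayList<>();
>> String message;
>> while (batch.size() < maxBatchSize && (message = queue.poll()) != null) {
>>     batch.add(message);
>> }
>> // everything in 'batch' is written out as the content of one FlowFile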
>>
>> The messages in the queue are coming in simultaneously from all of the incoming connections, which is why you don't see all of the lines from one server grouped together in order. Imagine the queue containing something like:
>>
>> java-server-1 message1 line1
>> java-server-2 message1 line1
>> java-server-1 message1 line2
>> java-server-3 message1 line1
>> java-server-2 message1 line2
>> ...
>>
>> I would need to dig into that Splunk documentation a little more, but I think you are right that we could possibly expose some kind of message delimiter pattern on the processor, which would be applied when reading the messages, before they even make it into the queue, so that by the time a message is queued it contains all of its lines.
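>>
>> For example, something along these lines, where a configurable pattern marks the first line of a new message, e.g. <\d+>.* to break before each syslog priority header (again, just a sketch, not anything committed):
>>
>> // sketch of a method that could run over the lines read from one connection
>> List<String> group(List<String> lines, Pattern startOfMessage) {
>>     List<String> messages = new ArrayList<>();
>>     StringBuilder current = new StringBuilder();
>>     for (String line : lines) {
>>         // a line matching the pattern starts a new message, so flush what was accumulated
>>         if (startOfMessage.matcher(line).matches() && current.length() > 0) {
>>             messages.add(current.toString());
>>             current.setLength(0);
>>         }
>>         current.append(line).append('\n');
>>     }
>>     if (current.length() > 0) {
>>         messages.add(current.toString()); // the last message
>>     }
>>     return messages;
>> }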
>>
>> Given the current situation, there might be one other option for you. Are you able to control/change the logback/log4j configuration for the servers sending the logs?
>>
>> If so, a JSON layout might solve the problem. These configuration files show how to do that:
>>
>> https://github.com/bbende/jsonevent-producer/tree/master/src/main/resources
>>
>> I know this worked well with the ListenUDP processor to ensure that an entire stack trace was sent as a single JSON document, but I have not had a chance to try it with ListenSyslog and the SyslogAppender.
>>
>> If you are using ListenSyslog with TCP, then it will probably come down to whether logback/log4j puts new-lines inside the JSON document, or only a single new-line at the end.
>>
>> -Bryan
>>
>> On Fri, Jan 8, 2016 at 11:36 AM, Louis-Étienne Dorval <[email protected]> wrote:
>>
>>> Hi everyone!
>>>
>>> I'm looking to use the new ListenSyslog processor in a proof-of-concept project, but I've run into a problem that I can't find a suitable solution for (yet!).
>>> I'm receiving logs from multiple Java-based servers using a logback/log4j SyslogAppender. The messages are received successfully, but when a stack trace occurs, each of its lines is broken out into its own FlowFile.
>>>
>>> I'm trying to achieve something like the following:
>>> http://docs.splunk.com/Documentation/Splunk/6.2.2/Data/Indexmulti-lineevents
>>>
>>> I tried:
>>> - Increasing the "Max Batch Size", but I end up merging lines that should not be merged, and there's no way to know the length of the stack trace...
>>> - Using MergeContent with the host as the "Correlation Attribute Name", but as before I merge lines that should not be merged.
>>> - Using MergeContent followed by SplitContent; that might work, but SplitContent is pretty restrictive and I can't find a "Byte Sequence" that is distinct from the stack trace content.
>>>
>>> Even if I find a magic "Byte Sequence" for my last attempt (MergeContent + SplitContent), I would most probably lose part of the stack trace, as the MergeContent is limited by the "Max Batch Size".
>>>
>>> The only solution I see is to modify ListenSyslog to add a parameter similar to the ones the Splunk documentation describes, and use that rather than a fixed "Max Batch Size".
>>>
>>> Am I missing another option?
>>> Would that be a suitable feature? (Maybe I should ask that question on the dev mailing list.)
>>>
>>> Best regards!