Makes sense about not wanting to change the logging configurations. Thanks for taking the time to capture that issue in JIRA. I would say you are already following the process by discussing the change with the community and putting in a very descriptive JIRA :)

On a side note, I've been working on NIFI-1273 for the past week [1], and as part of that ticket I've refactored some of the internals of ListenSyslog and moved a lot of the inner classes out into their own top-level classes. While doing this I was also considering your point about pattern matching for the end of messages, and I tried to create an extension point that would let us support different message delimiters in the future.
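To give a rough idea of the shape it takes (the names below are made up for illustration, they are not the actual classes in the ticket), the extension point boils down to a small strategy interface that decides where one message ends and the next begins:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch only -- placeholder names, not the NIFI-1273 API.
    interface MessageDemarcator {
        // Pull any complete messages out of the buffer, leaving a partial tail behind.
        List<String> extract(StringBuilder buffer);
    }

    // Equivalent of today's behavior: every new-line ends a message.
    class NewLineDemarcator implements MessageDemarcator {
        @Override
        public List<String> extract(StringBuilder buffer) {
            final List<String> messages = new ArrayList<>();
            int newline;
            while ((newline = buffer.indexOf("\n")) >= 0) {
                messages.add(buffer.substring(0, newline));
                buffer.delete(0, newline + 1);
            }
            return messages;
        }
    }

A pattern-based implementation could then plug in at the same point without touching any of the socket handling.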
It may not be perfect, but I think it will make it slightly easier to make some of the changes you are looking for. It will probably take a few more days before I can get a pull request submitted, but I just wanted to point this out so we can coordinate.

Thanks,

Bryan

[1] https://issues.apache.org/jira/browse/NIFI-1273

On Thu, Jan 14, 2016 at 12:27 PM, Louis-Étienne Dorval <[email protected]> wrote:

> Hi,
>
> Thanks for the reply, Bryan.
>
> I'd rather not update the logback/log4j configuration because the service is already in place, and for now I'm just trying to fit around the current system. In any case, according to the RFC, a syslog message must not be longer than 1024 bytes, so a single "event" might be split anyway.
>
> I've created NIFI-1392 for that feature. I'm not sure of the process for a feature request, but I'll try to find some time to create a pull request or a patch for this.
>
> Best regards,
> Louis-Etienne
>
> On 8 January 2016 at 12:15, Bryan Bende <[email protected]> wrote:
>
>> Hello,
>>
>> Glad to hear you are getting started using ListenSyslog!
>>
>> You are definitely running into something that we should consider supporting. The current implementation treats each new-line as the message delimiter and places each message onto a queue.
>>
>> When the processor is triggered, it grabs messages from the queue up to the "Max Batch Size". So in the default case it grabs a single message from the queue, which in your case is a single line from one of the multi-line messages, and produces a FlowFile. When "Max Batch Size" is set higher, say to 100, it grabs up to 100 messages and produces a FlowFile containing all 100 messages.
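>>
>> In rough pseudo-Java, the drain loop is something like this (illustrative only, not the actual processor code):
>>
>> BlockingQueue<String> queue = new LinkedBlockingQueue<>(); // filled by the channel readers
>> int maxBatchSize = 100;
>> List<String> batch = new ArrayList<>();
>> String message;
>> while (batch.size() < maxBatchSize && (message = queue.poll()) != null) {
>>     batch.add(message);
>> }
>> // everything in 'batch' is written out as the content of one FlowFile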
>>
>> The messages in the queue are coming in simultaneously from all of the incoming connections, which is why you don't see all of the lines from one server grouped together in order. Imagine the queue containing something like:
>>
>> java-server-1 message1 line1
>> java-server-2 message1 line1
>> java-server-1 message1 line2
>> java-server-3 message1 line1
>> java-server-2 message1 line2
>> ...
>>
>> I would need to dig into that Splunk documentation a little more, but I think you are right that we could possibly expose some kind of message delimiter pattern on the processor, which would be applied when reading the messages, before they even make it into the queue, so that by the time a message is queued it contains all of its lines.
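>>
>> For example, something along these lines, where a configurable pattern marks the first line of a new message, e.g. <\d+>.* to break before each syslog priority header (again, just a sketch, not anything committed):
>>
>> // sketch of a method that could run over the lines read from one connection
>> List<String> group(List<String> lines, Pattern startOfMessage) {
>>     List<String> messages = new ArrayList<>();
>>     StringBuilder current = new StringBuilder();
>>     for (String line : lines) {
>>         // a line matching the pattern starts a new message, so flush what was accumulated
>>         if (startOfMessage.matcher(line).matches() && current.length() > 0) {
>>             messages.add(current.toString());
>>             current.setLength(0);
>>         }
>>         current.append(line).append('\n');
>>     }
>>     if (current.length() > 0) {
>>         messages.add(current.toString()); // the last message
>>     }
>>     return messages;
>> }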
>>
>> Given the current situation, there might be one other option for you. Are you able to control/change the logback/log4j configuration for the servers sending the logs?
>>
>> If so, a JSON layout might solve the problem. These configuration files show how to do that:
>>
>> https://github.com/bbende/jsonevent-producer/tree/master/src/main/resources
>>
>> I know this worked well with the ListenUDP processor to ensure that an entire stack trace was sent as a single JSON document, but I have not had a chance to try it with ListenSyslog and the SyslogAppender.
>>
>> If you are using ListenSyslog with TCP, then it will probably come down to whether logback/log4j puts new-lines inside the JSON document, or only a single new-line at the end.
>>
>> -Bryan
>>
>> On Fri, Jan 8, 2016 at 11:36 AM, Louis-Étienne Dorval <[email protected]> wrote:
>>
>>> Hi everyone!
>>>
>>> I'm looking to use the new ListenSyslog processor in a proof-of-concept project, but I've run into a problem that I can't find a suitable solution for (yet!).
>>> I'm receiving logs from multiple Java-based servers using a logback/log4j SyslogAppender. The messages are received successfully, but when a stack trace occurs, each of its lines is broken out into its own FlowFile.
>>>
>>> I'm trying to achieve something like the following:
>>> http://docs.splunk.com/Documentation/Splunk/6.2.2/Data/Indexmulti-lineevents
>>>
>>> I tried:
>>> - Increasing the "Max Batch Size", but I end up merging lines that should not be merged, and there's no way to know the length of the stack trace...
>>> - Using MergeContent with the host as the "Correlation Attribute Name", but as before I merge lines that should not be merged.
>>> - Using MergeContent followed by SplitContent; that might work, but SplitContent is pretty restrictive and I can't find a "Byte Sequence" that is distinct from the stack trace content.
>>>
>>> Even if I find a magic "Byte Sequence" for my last attempt (MergeContent + SplitContent), I would most probably lose part of the stack trace, as the MergeContent is limited by the "Max Batch Size".
>>>
>>> The only solution I see is to modify ListenSyslog to add a parameter similar to the ones the Splunk documentation describes, and use that rather than a fixed "Max Batch Size".
>>>
>>> Am I missing another option?
>>> Would that be a suitable feature? (Maybe I should ask that question on the dev mailing list.)
>>>
>>> Best regards!