OK, that makes sense. There are basically two options to make it efficient:
A) You can use ListenSyslog with batching, followed by ValidateRecord
with one of the syslog record readers [1][2].
B) You can use ListenTCPRecord with a syslog record reader.
A will probably work better for a larger number of TCP connections, B
would work better for a smaller number of connections.
One challenge with both of them is that there isn't a syslog record
writer, so you would probably have to use the
FreeFormTextRecordSetWriter [3] with an expression that rewrites the
message using the record fields, like "${hostname} ${body}" if you
wanted to rewrite each message with the hostname and body.
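To make the idea concrete, here is a rough standalone sketch in Python of
what the validate-then-rewrite step amounts to. It is not NiFi code and not
how the record reader or writer is implemented. The regex is a simplified
RFC 3164 pattern I am assuming for illustration, and the function names and
the "priority" and "timestamp" field names are hypothetical; only "hostname"
and "body" come from the expression above.

```python
import re
from string import Template

# Assumption: a simplified RFC 3164 layout, <PRI>TIMESTAMP HOSTNAME BODY.
# A real syslog reader handles more variations than this pattern does.
SYSLOG_3164 = re.compile(
    r"^<(?P<priority>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<body>.*)$"
)


def parse(line):
    """Return the record fields as a dict, or None if the line is not valid."""
    match = SYSLOG_3164.match(line)
    return match.groupdict() if match else None


def rewrite_batch(lines, template="${hostname} ${body}"):
    """Split a batch into rewritten valid messages and untouched invalid ones."""
    tmpl = Template(template)
    valid, invalid = [], []
    for line in lines:
        record = parse(line)
        if record is None:
            invalid.append(line)  # analogous to routing to an "invalid" path
        else:
            valid.append(tmpl.substitute(record))  # fill in the record fields
    return valid, invalid


batch = [
    "<34>Oct 11 22:14:15 mymachine su: 'su root' failed",
    "this line is not syslog",
]
valid, invalid = rewrite_batch(batch)
```

The point is that a whole batch of messages is handled per call, with each
message parsed into fields and written back out through a text template,
rather than one message per flow file with the fields stored as attributes.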
[1]
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-record-serialization-services-nar/1.9.2/org.apache.nifi.syslog.SyslogReader/index.html
[2]
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-record-serialization-services-nar/1.9.2/org.apache.nifi.syslog.Syslog5424Reader/index.html
[3]
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-record-serialization-services-nar/1.9.2/org.apache.nifi.text.FreeFormTextRecordSetWriter/index.html
On Tue, Aug 6, 2019 at 10:08 AM Clay Teahouse <[email protected]> wrote:
>
> Hello Bryan,
>
> I am ingesting millions of syslog records from various data sources. I need
> to make sure the format is valid and then prefix each message with the host
> name (from syslog header) and some other meta data and push the records to
> various consumers.
>
> thanks
> Clay
>
> On Tue, Aug 6, 2019 at 6:26 AM Bryan Bende <[email protected]> wrote:
>>
>> Can you describe what you want to do with each message?
>>
>> Right now I’m not following why you need to parse them.
>>
>> On Tue, Aug 6, 2019 at 6:40 AM Clay Teahouse <[email protected]> wrote:
>>>
>>> Bryan,
>>> Understood, but wouldn't then this processor be inefficient if you are
>>> dealing with a very large number of syslog messages, if you don't have the
>>> batching option? I suppose we could have had the option of parsing each
>>> syslog record in a batch and then writing the syslog message along with the
>>> syslog headers to the flowfile content.
>>> thanks
>>> Clay
>>>
>>> On Mon, Aug 5, 2019 at 12:12 PM Bryan Bende <[email protected]> wrote:
>>>>
>>>> Clay,
>>>>
>>>> You can only parse when it's 1 message per flow file because parsing
>>>> adds all the field/value pairs as flow file attributes, which wouldn't
>>>> really make sense when you have say 1k messages with all different
>>>> values for those fields.
>>>>
>>>> -Bryan
>>>>
>>>> On Mon, Aug 5, 2019 at 11:25 AM Clay Teahouse <[email protected]>
>>>> wrote:
>>>> >
>>>> > Hi Edward, Bryan
>>>> > One more question regarding ListenSyslog. Is it possible to set batch
>>>> > size > 1 with parse set to true? I am ingesting a very high volume of
>>>> > syslog records and want to avoid flowfiles containing only one record
>>>> > but at the same time, I want to be able to parse the records. Is there a
>>>> > way around this?
>>>> >
>>>> > thanks
>>>> > Clay
>>>> >
>>>> > On Fri, Aug 2, 2019 at 8:50 AM Edward Armes <[email protected]>
>>>> > wrote:
>>>> >>
>>>> >> Hi Clay,
>>>> >>
>>>> >> So, as Bryan has said, the actual connections are managed by a selector.
>>>> >> All the selector does is go through each connection and, once a
>>>> >> connection has data to receive, hand it over to a thread in the TCP
>>>> >> receiving thread pool. That thread does some basic TCP processing and
>>>> >> puts the data into a buffer for the associated ListenSyslog processor
>>>> >> instance to process when the framework executes that instance.
>>>> >>
>>>> >> Just so you're aware, setting the maximum number of connections to
>>>> >> 4,000 does create a thread pool sized for 4,000 threads, but in reality
>>>> >> these threads don't exist until the selector creates one to run on the
>>>> >> pool. So, in short, unless a single NiFi server gets 4,000 syslog
>>>> >> messages in a very short space of time (< 1 microsecond), I can't see
>>>> >> it being an issue.
>>>> >>
>>>> >> Edward
>>>> >>
>>>> >> On Fri, Aug 2, 2019 at 2:06 PM Bryan Bende <[email protected]> wrote:
>>>> >>>
>>>> >>> The actual connections themselves are managed with a selector, so if
>>>> >>> all the connections are idle there should only be one thread for the
>>>> >>> socket.
>>>> >>>
>>>> >>> As soon as a connection has something available to read, a thread
>>>> >>> is spawned to read from the connection until either no more data is
>>>> >>> available or it is closed.
>>>> >>>
>>>> >>> On Fri, Aug 2, 2019 at 7:18 AM Clay Teahouse <[email protected]>
>>>> >>> wrote:
>>>> >>> >
>>>> >>> > Hello Edward,
>>>> >>> > So, if I have to listen to 32,000 TCP connections and I have only
>>>> >>> > 80 cores, and I configure each ListenSyslog instance for 4,000
>>>> >>> > connections, doesn't each spawn 4,000 threads behind the scenes? The
>>>> >>> > TCP connections will be idle most of the time.
>>>> >>> >
>>>> >>> > thanks
>>>> >>> > Clay
>>>> >>> >
>>>> >>> >
>>>> >>> > On Fri, Aug 2, 2019 at 6:10 AM Edward Armes <[email protected]>
>>>> >>> > wrote:
>>>> >>> >>
>>>> >>> >> Hi Clay,
>>>> >>> >>
>>>> >>> >> Because NiFi uses a thread pool for its own threading underneath,
>>>> >>> >> and each running processor instance does so in its own thread, I
>>>> >>> >> don't see any reason why not. One thing to note is that the
>>>> >>> >> ListenTCP processor appears to have been written such that it gets
>>>> >>> >> all the requests that have been received on that socket and
>>>> >>> >> processes them until either it has no more requests left to process
>>>> >>> >> or that instance of the processor is no longer scheduled to run.
>>>> >>> >>
>>>> >>> >> Hope that helps
>>>> >>> >>
>>>> >>> >> Edward
>>>> >>> >>
>>>> >>> >> On Fri, Aug 2, 2019 at 11:28 AM Clay Teahouse
>>>> >>> >> <[email protected]> wrote:
>>>> >>> >>>
>>>> >>> >>> Hello All,
>>>> >>> >>>
>>>> >>> >>> I need to listen to and process thousands of persistent TCP
>>>> >>> >>> connections. I have 10 nodes, each having 8 cores.
>>>> >>> >>> My understanding is that with existing NiFi listening processors,
>>>> >>> >>> such as ListenSyslog, a thread is utilized for each TCP connection.
>>>> >>> >>> Does this scale? Do I need to write a custom processor that
>>>> >>> >>> utilizes a thread pool for reading the data from the socket and
>>>> >>> >>> processing them?
>>>> >>> >>>
>>>> >>> >>> thanks
>>>> >>> >>> Clay