Clay,

You can only parse when there is one message per flow file, because parsing
adds all the field/value pairs as flow file attributes. That wouldn't
really make sense when you have, say, 1k messages with different
values for those fields.
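
To illustrate the point, here is a minimal sketch in plain Java (no NiFi classes). It assumes a simplified "<PRI>TIMESTAMP HOST BODY" message layout, and the class and method names are hypothetical. The attribute keys mirror the syslog.* style but are used here only for illustration. It shows that a single attribute map can hold only one value per field, so a batch of messages would leave just the last message's values.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SyslogAttributeSketch {

    // Simplified "<PRI>TIMESTAMP HOST BODY" pattern; an assumption, not a full RFC 3164 parser.
    private static final Pattern SIMPLE_SYSLOG = Pattern.compile(
            "^<(\\d{1,3})>(\\w{3}\\s+\\d{1,2} \\d{2}:\\d{2}:\\d{2}) (\\S+) (.*)$");

    // Parse one message into field/value pairs (a stand-in for flow file attributes).
    static Map<String, String> parse(String message) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher m = SIMPLE_SYSLOG.matcher(message);
        if (!m.matches()) {
            attributes.put("syslog.valid", "false");
            return attributes;
        }
        int priority = Integer.parseInt(m.group(1));
        attributes.put("syslog.valid", "true");
        attributes.put("syslog.priority", String.valueOf(priority));
        attributes.put("syslog.facility", String.valueOf(priority / 8));
        attributes.put("syslog.severity", String.valueOf(priority % 8));
        attributes.put("syslog.timestamp", m.group(2));
        attributes.put("syslog.hostname", m.group(3));
        attributes.put("syslog.body", m.group(4));
        return attributes;
    }

    // One flow file means one attribute map: each key keeps only the last value written.
    static Map<String, String> attributesForBatch(List<String> messages) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (String message : messages) {
            attributes.putAll(parse(message));
        }
        return attributes;
    }

    public static void main(String[] args) {
        List<String> batch = List.of(
                "<34>Aug  5 11:25:00 host-a su: failed for root",
                "<13>Aug  5 11:25:01 host-b cron: job started");
        // Only the second message's hostname survives in the shared map.
        System.out.println(attributesForBatch(batch).get("syslog.hostname"));
    }
}
```

With two messages in the batch, the hostname from the first message is overwritten by the second, which is why attributes only make sense with one message per flow file.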

-Bryan

On Mon, Aug 5, 2019 at 11:25 AM Clay Teahouse <[email protected]> wrote:
>
> Hi Edward, Bryan
> One more question regarding ListenSyslog. Is it possible to set batch size > 
> 1 with parse set to true? I am ingesting a very high volume of syslog records 
> and want to avoid flowfiles containing only one record but at the same time, 
> I want to be able to parse the records. Is there a way around this?
>
> thanks
> Clay
>
> On Fri, Aug 2, 2019 at 8:50 AM Edward Armes <[email protected]> wrote:
>>
>> Hi Clay,
>>
>> So, as Bryan has said, the actual connections are managed by a selector. All 
>> the selector does is go through each connection and, once a connection has 
>> data to receive, hand it over to a thread in the TCP receiving thread pool. 
>> That thread does some basic TCP processing and puts the data into a buffer 
>> for the associated ListenSyslog processor instance to process when the 
>> framework executes that instance.
>>
>> Just so you're aware, while setting the maximum number of connections does 
>> create a thread pool of 4,000 threads, in reality these threads don't 
>> exist until the selector creates one to run on the pool. So, in short, 
>> unless a single NiFi server gets 4,000 syslog messages in a very short space 
>> of time (< 1 microsecond), I can't see it being an issue.
>>
>> Edward
>>
>> On Fri, Aug 2, 2019 at 2:06 PM Bryan Bende <[email protected]> wrote:
>>>
>>> The actual connections themselves are managed with a selector, so if
>>> all the connections are idle there should only be one thread for the
>>> socket.
>>>
>>> As soon as a connection has something available to read, a thread
>>> is spawned to start reading the connection until either no more data is
>>> available or the connection is closed.
>>>
>>> On Fri, Aug 2, 2019 at 7:18 AM Clay Teahouse <[email protected]> wrote:
>>> >
>>> > Hello Edward,
>>> > So, if I have to listen to 32,000 TCP connections and I have only 80 
>>> > cores, and I configure each ListenSyslog instance for 4,000 connections, 
>>> > doesn't each spawn 4,000 threads behind the scenes? The TCP connections 
>>> > will be idle most of the time.
>>> >
>>> > thanks
>>> > Clay
>>> >
>>> >
>>> > On Fri, Aug 2, 2019 at 6:10 AM Edward Armes <[email protected]> 
>>> > wrote:
>>> >>
>>> >> Hi Clay,
>>> >>
>>> >> Because NiFi uses a thread pool for its own threading underneath, and 
>>> >> each processor instance runs in its own thread, I don't see any reason 
>>> >> why not. One thing to note is that the ListenTCP processor appears to 
>>> >> have been written such that it gets all the requests that have been 
>>> >> received on that socket and processes them until either it has no more 
>>> >> requests left to process or that instance of the processor is no longer 
>>> >> scheduled to run.
>>> >>
>>> >> Hope that helps
>>> >>
>>> >> Edward
>>> >>
>>> >> On Fri, Aug 2, 2019 at 11:28 AM Clay Teahouse <[email protected]> 
>>> >> wrote:
>>> >>>
>>> >>> Hello All,
>>> >>>
>>> >>> I need to listen to and process thousands of persistent TCP 
>>> >>> connections. I have 10 nodes, each having 8 cores.
>>> >>> My understanding is that with existing NiFi listening processors, such 
>>> >>> as ListenSyslog, a thread is utilized for each TCP connection. Does this 
>>> >>> scale? Do I need to write a custom processor that utilizes a thread 
>>> >>> pool for reading the data from the socket and processing them?
>>> >>>
>>> >>> thanks
>>> >>> Clay
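
The selector-plus-thread-pool model described in this thread can be sketched with plain java.nio. This is an illustration under stated assumptions, not NiFi's actual code: the class name, the `received` queue, and the pool sizing are all hypothetical. One thread watches every connection through a selector, and a pool thread is used only while a connection has data to read.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class SelectorListenerSketch implements Runnable, AutoCloseable {
    private final Selector selector = Selector.open();
    private final ServerSocketChannel server = ServerSocketChannel.open();
    private final ExecutorService readers;
    // Raw chunks as read from sockets (ASCII assumed); no message framing is attempted.
    final BlockingQueue<String> received = new LinkedBlockingQueue<>();

    SelectorListenerSketch(int port, int maxConnections) throws IOException {
        server.configureBlocking(false);
        server.bind(new InetSocketAddress("127.0.0.1", port));
        server.register(selector, SelectionKey.OP_ACCEPT);
        // A fixed pool starts its threads lazily, only as tasks are submitted.
        readers = Executors.newFixedThreadPool(maxConnections);
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    @Override
    public void run() { // the single selector thread; idle connections cost no extra threads
        try {
            while (selector.isOpen()) {
                selector.select();
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        key.interestOps(0); // stop selecting this key while a worker reads it
                        readers.submit(() -> drain(key));
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // selector was closed or failed: stop listening
        }
    }

    // Worker task: read until no more data is available or the connection is closed.
    private void drain(SelectionKey key) {
        SocketChannel client = (SocketChannel) key.channel();
        ByteBuffer buffer = ByteBuffer.allocate(4096);
        try {
            int n;
            while ((n = client.read(buffer)) > 0) {
                buffer.flip();
                received.add(StandardCharsets.US_ASCII.decode(buffer).toString());
                buffer.clear();
            }
            if (n < 0) {
                client.close(); // peer closed; closing the channel cancels its key
            } else {
                key.interestOps(SelectionKey.OP_READ); // hand the connection back
                selector.wakeup();
            }
        } catch (IOException e) {
            try { client.close(); } catch (IOException ignored) { }
        }
    }

    @Override
    public void close() throws IOException {
        selector.close();
        server.close();
        readers.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        try (SelectorListenerSketch listener = new SelectorListenerSketch(0, 4)) {
            Thread selectorThread = new Thread(listener, "selector");
            selectorThread.setDaemon(true);
            selectorThread.start();
            try (Socket socket = new Socket("127.0.0.1", listener.port())) {
                socket.getOutputStream().write(
                        "<13>Aug  5 11:25:01 host-b cron: job started\n".getBytes(StandardCharsets.US_ASCII));
                socket.getOutputStream().flush();
                System.out.print(listener.received.poll(5, TimeUnit.SECONDS));
            }
        }
    }
}
```

In this sketch thousands of idle connections would occupy only the selector thread, and the pool size caps how many connections can be read at the same moment.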
