I was looking into doing this in a custom version of ListenTCP, but I can try 
changing this to use a property.

I am a developer and have been working in the industry for 36+ years; I have 
just never used Java or any of its tools.  I'm currently trying to use the 
JetBrains IntelliJ IDE to build NiFi and modify the ListenTCP processor (and to 
have a look under the covers).  There are several other custom processors we 
would like to create, so I will be learning this eventually.

Are there any good resources to get started building custom processors?  I'm 
already looking at Hortonworks for some help.


Raymond Rogers
Senior Embedded Software Engineer

15301 N. Dallas Pkwy Suite 500
Dallas, TX 75001
D: +1 972 744 3928
rmgnetworks.com<http://www.rmgnetworks.com>


From: Bryan Bende [mailto:[email protected]]
Sent: Wednesday, January 25, 2017 5:13 PM
To: [email protected]
Subject: Re: ListenTCP to receive CSV stream.

Regarding what Joe pointed out, the existing property for "Batching Message 
Delimiter" is the delimiter between messages when written to a flow file, aka 
the outbound delimiter. The delimiter when reading off the channel is 
hard-coded here and here:

https://github.com/apache/nifi/blob/master/nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/listen/handler/socket/StandardSocketChannelHandler.java#L155
https://github.com/apache/nifi/blob/master/nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/listen/handler/socket/SSLSocketChannelHandler.java#L150

I'm not really sure why it would be breaking up a 50-character line across 
multiple flow files... are there definitely no '\n' characters within those 50 
characters?
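
For illustration, here is a minimal sketch of reading with a configurable inbound delimiter instead of a hard-coded '\n'. It is not the code in the NiFi channel handlers linked above; the class name DelimitedMessageSplitter and its accept method are hypothetical. Bytes without a delimiter yet are held until the next read, so a line that arrives split across two reads is still emitted once, as a whole.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of NiFi): splits a byte stream on a configurable delimiter. */
final class DelimitedMessageSplitter {
    private final byte delimiter;
    // Bytes of a message whose delimiter has not arrived yet.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    DelimitedMessageSplitter(byte delimiter) {
        this.delimiter = delimiter;
    }

    /** Feeds one chunk as read from the socket; returns every message it completes. */
    List<String> accept(byte[] chunk, int length) {
        List<String> complete = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            if (chunk[i] == delimiter) {
                // Decode only whole messages, so multi-byte characters split
                // across chunks are not corrupted.
                complete.add(new String(pending.toByteArray(), StandardCharsets.UTF_8));
                pending.reset();
            } else {
                pending.write(chunk[i]);
            }
        }
        return complete;
    }
}
```

In a custom processor the delimiter byte would presumably come from a processor property rather than a constant; that wiring is left out here.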



On Wed, Jan 25, 2017 at 6:00 PM, Raymond Rogers 
<[email protected]> wrote:
Bryan,
It looked like I should be getting single-line or complete-line messages in 
the flow-files, which under a light load I do get.  When I increase the message 
rate to what the production world would look like, I see lines get chopped 
into little pieces (one 50-character line may end up in 3-4 flow-files).


Raymond Rogers

From: Bryan Bende [mailto:[email protected]]
Sent: Wednesday, January 25, 2017 4:48 PM
To: [email protected]
Subject: Re: ListenTCP to receive CSV stream.

Raymond,

Currently ListenTCP uses new line characters to determine logical message 
boundaries, and coming out of the processor you can either have 1 logical 
message per flow file, or batch together a configurable number of logical 
messages into 1 flow file which would be more performant.
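
As a rough sketch of the batching idea above (not ListenTCP's actual implementation), the following collects a configurable number of logical messages and joins them with an outbound delimiter. MessageBatcher, batchSize, and outboundDelimiter are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper (not part of NiFi): groups logical messages into batches. */
final class MessageBatcher {
    private final int batchSize;
    private final String outboundDelimiter;
    private final List<String> current = new ArrayList<>();

    MessageBatcher(int batchSize, String outboundDelimiter) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
        this.outboundDelimiter = outboundDelimiter;
    }

    /** Adds one logical message; returns the joined batch once it is full. */
    Optional<String> add(String message) {
        current.add(message);
        if (current.size() < batchSize) {
            return Optional.empty();
        }
        return Optional.of(flush());
    }

    /** Returns whatever has accumulated so far (an empty string if nothing) and resets. */
    String flush() {
        String content = String.join(outboundDelimiter, current);
        current.clear();
        return content;
    }
}
```

With a batch size of 1 every message becomes its own output; larger sizes trade latency for fewer, bigger outputs.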

In your case it sounds like you would want to read data until seeing the "end 
of data" marker and treat the whole CSV as one logical message. There is a JIRA 
to add this capability: https://issues.apache.org/jira/browse/NIFI-1985
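
To make the idea concrete, here is a small sketch of accumulating lines until an "end of data" marker is seen. It is not the change proposed in NIFI-1985. It assumes the marker arrives as a line of its own, and the class name EndMarkerAccumulator and the marker string passed to it are hypothetical, since the thread does not say what the real marker looks like.

```java
import java.util.Optional;

/** Hypothetical helper (not part of NiFi): gathers lines into one message per end marker. */
final class EndMarkerAccumulator {
    private final String endMarker;
    private final StringBuilder buffer = new StringBuilder();

    EndMarkerAccumulator(String endMarker) {
        this.endMarker = endMarker;
    }

    /**
     * Adds one line (without its trailing newline). Returns the accumulated
     * CSV, excluding the marker line, once the marker arrives.
     */
    Optional<String> addLine(String line) {
        if (line.equals(endMarker)) {
            String csv = buffer.toString();
            buffer.setLength(0); // start fresh for the next CSV
            return Optional.of(csv);
        }
        buffer.append(line).append('\n');
        return Optional.empty();
    }
}
```

Each returned value would then become the content of a single flow file. A real implementation would also want a size or time limit in case the marker never arrives.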

I think the best you can do currently is to use a MergeContent processor 
somewhere after ListenTCP to merge together the individual lines from the CSV, 
but since there is no other information available to tell it how many total 
lines there are, it can't guarantee that they are all merged together in one 
flow file. You might be able to make some assumptions about the timing and size 
of the data and configure MergeContent in such a way that it should usually get 
you the whole CSV as one file.

Hope this helps.

-Bryan

On Wed, Jan 25, 2017 at 5:18 PM, Raymond Rogers 
<[email protected]> wrote:
I'm still new to NiFi and I'm trying to receive a text stream containing a CSV 
file of an unknown length (anything from ~100 bytes to almost 300 KB) over a 
TCP socket.  The CSV does have an "end of data" marker that I can look for, but 
I am unsure of how to accumulate the text until I receive the marker and create 
a flow-file that contains all of the data up to that point.

The data is being sent from an application that cannot be changed to use a 
different format.

Any suggestions?


Raymond Rogers

