I was looking into doing this in a custom version of ListenTCP, but I can try changing this to use a property.
I am a developer and have been working in the industry for 36+ years; I have just never used Java or any of the tools. I'm currently trying to use the JetBrains IntelliJ IDE to build NiFi and modify the ListenTCP processor (and to have a look under the covers). There are several other custom processors we would like to create, so I will be learning this eventually. Are there any good resources to get started building custom processors? I'm already looking at Hortonworks for some help.

Raymond Rogers
Senior Embedded Software Engineer
15301 N. Dallas Pkwy Suite 500
Dallas, TX 75001
D: +1 972 744 3928
rmgnetworks.com <http://www.rmgnetworks.com>

From: Bryan Bende [mailto:[email protected]]
Sent: Wednesday, January 25, 2017 5:13 PM
To: [email protected]
Subject: Re: ListenTCP to receive CSV stream.

Regarding what Joe pointed out, the existing property for "Batching Message Delimiter" is the delimiter between messages when written to a flow file, aka the outbound delimiter.

The delimiter when reading off the channel is hard-coded here and here:

https://github.com/apache/nifi/blob/master/nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/listen/handler/socket/StandardSocketChannelHandler.java#L155
https://github.com/apache/nifi/blob/master/nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/listen/handler/socket/SSLSocketChannelHandler.java#L150

I'm not really sure why it would be breaking up a 50-character line across multiple flow files... are there definitely no '\n' characters within those 50 characters?

On Wed, Jan 25, 2017 at 6:00 PM, Raymond Rogers <[email protected]> wrote:

Bryan,

It looked like I should be getting a single line, or complete-line messages, in the flow-files, which under a light load I do get.
When I increase the message rate to what the production world would look like, I am seeing lines get chopped into little pieces (one 50-character line may end up in 3-4 flow-files).

Raymond Rogers

From: Bryan Bende [mailto:[email protected]]
Sent: Wednesday, January 25, 2017 4:48 PM
To: [email protected]
Subject: Re: ListenTCP to receive CSV stream.

Raymond,

Currently ListenTCP uses newline characters to determine logical message boundaries, and coming out of the processor you can either have 1 logical message per flow file, or batch together a configurable number of logical messages into 1 flow file, which would be more performant.

In your case it sounds like you would want to read data until seeing the "end of data" marker and treat the whole CSV as one logical message. There is a JIRA to add this capability:

https://issues.apache.org/jira/browse/NIFI-1985

I think the best you can do currently is to use a MergeContent processor somewhere after ListenTCP to merge together the individual lines from the CSV, but since there is no other information available to tell it how many total lines there are, it can't guarantee that they are all merged together in one flow file. You might be able to make some assumptions about the timing and size of the data and configure MergeContent in such a way that it should usually get you the whole CSV as one file.

Hope this helps.

-Bryan

On Wed, Jan 25, 2017 at 5:18 PM, Raymond Rogers <[email protected]> wrote:

I'm still new to NiFi and I'm trying to receive a text stream containing a CSV file of unknown length (anything from ~100 bytes to almost 300 KB) over a TCP socket.
The CSV does have an "end of data" marker that I can look for, but I am unsure how to accumulate the text until I receive the marker and then create a flow-file that contains all of the data up to that point. The data is being sent from an application that cannot be changed to use a different format.

Any suggestions?

Raymond Rogers

Notice of Confidentiality: This transmission contains information that may be confidential and that may also be privileged. Unless you are the intended recipient of the message (or authorized to receive it for the intended recipient) you may not copy, forward, or otherwise use it, or disclose its contents to anyone else. If you have received this transmission in error, please notify us immediately and delete it from your system.
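
For illustration only, the "accumulate until the end-of-data marker" idea from the original question can be sketched in plain Java, independent of NiFi. This is a rough sketch under stated assumptions, not how ListenTCP or NIFI-1985 implements it: the class name MarkerFramer, its feed method, and the marker string "END\n" are all hypothetical, since the thread never says what the real marker looks like. The point it shows is that a TCP read can end anywhere, even in the middle of a line or of the marker itself, so leftover bytes have to be carried over to the next read.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a NiFi class): reassembles marker-terminated messages from TCP chunks. */
final class MarkerFramer {
    private final byte[] marker;
    // Bytes received so far that do not yet form a complete message.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    MarkerFramer(String marker) {
        if (marker.isEmpty()) {
            throw new IllegalArgumentException("marker must not be empty");
        }
        this.marker = marker.getBytes(StandardCharsets.UTF_8);
    }

    /** Feeds one chunk as read from the socket; returns each message it completes, marker stripped. */
    List<String> feed(byte[] chunk, int length) {
        pending.write(chunk, 0, length);
        byte[] buf = pending.toByteArray();
        List<String> complete = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = indexOf(buf, marker, start)) >= 0) {
            complete.add(new String(buf, start, hit - start, StandardCharsets.UTF_8));
            start = hit + marker.length;
        }
        // Keep the unconsumed tail (possibly a partial line or partial marker) for the next call.
        pending.reset();
        pending.write(buf, start, buf.length - start);
        return complete;
    }

    /** Naive byte search; returns the index of the first match at or after 'from', or -1. */
    private static int indexOf(byte[] haystack, byte[] needle, int from) {
        outer:
        for (int i = from; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        MarkerFramer framer = new MarkerFramer("END\n"); // assumed marker, for illustration
        // Chunk boundaries deliberately split a line and the marker itself.
        String[] chunks = {"id,name\n1,fo", "o\n2,bar\nEN", "D\n"};
        for (String s : chunks) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            for (String message : framer.feed(bytes, bytes.length)) {
                System.out.println("complete CSV (" + message.length() + " chars)");
            }
        }
    }
}
```

The sketch rescans the whole pending buffer on every call, which is simple but wasteful for large payloads; a production version would probably remember where the last search stopped and cap the buffer size so that a sender that never emits the marker cannot exhaust memory.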
