Re: SplitRecord behaviour

2019-03-01 Thread Kumara M S, Hemantha (Nokia - IN/Bangalore)
To: dev@nifi.apache.org Subject: Re: SplitRecord behaviour If you increase the concurrent tasks on PublishKafka then you are right that you could publish multiple records at the same time, but I suspect that the overhead of doing the split will cancel out any gains from publishing in parallel

RE: SplitRecord behaviour

2019-03-01 Thread Kumara M S, Hemantha (Nokia - IN/Bangalore)
) at one shot instead of sending one record at a time. Thanks, Hemantha -Original Message- From: Bryan Bende Sent: Friday, March 1, 2019 7:52 PM To: dev@nifi.apache.org Subject: Re: SplitRecord behaviour Hello, Flow files are not transferred until the session they came from is committed

Re: SplitRecord behaviour

2019-03-01 Thread Bryan Bende
You can call transfer for each segment while processing the incoming stream; it's just that the real transfer won't actually happen until commit is called. Most processors extend AbstractProcessor so commit is called for you at the end, but you could choose to manage the session yourself and call
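The commit semantics Bryan describes can be sketched with a toy model (this is NOT the real NiFi `ProcessSession` API, just a minimal illustration of the staging behaviour): `transfer()` only stages a flow file inside the session, and nothing becomes visible downstream until `commit()` runs, which `AbstractProcessor` does automatically after `onTrigger()` returns.

```python
# Toy model of NiFi session semantics (hypothetical names, not the real API):
# transfer() stages a flow file; downstream only sees it after commit().

class ToySession:
    def __init__(self):
        self._staged = []      # flow files transferred but not yet committed
        self.committed = []    # flow files visible to downstream processors

    def transfer(self, flow_file, relationship):
        # Staging only -- nothing leaves the "processor" yet.
        self._staged.append((flow_file, relationship))

    def commit(self):
        # All staged transfers become visible atomically.
        self.committed.extend(self._staged)
        self._staged.clear()

session = ToySession()
for segment in ["split-1", "split-2", "split-3"]:
    session.transfer(segment, "splits")   # staged while streaming the input

assert session.committed == []            # nothing downstream before commit
session.commit()                          # AbstractProcessor does this for you
assert [f for f, _ in session.committed] == ["split-1", "split-2", "split-3"]
```

This is why calling `transfer()` per segment while streaming is fine: the splits accumulate in the session and all appear downstream together at commit time.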

Re: SplitRecord behaviour

2019-03-01 Thread Otto Fowler
Bryan, So the best practice when segmenting is to - build your segments as a list while processing the incoming stream - then afterwards send them all to the relationship, right? On March 1, 2019 at 09:21:46, Bryan Bende (bbe...@gmail.com) wrote: Hello, Flow files are not transferred until the

Re: SplitRecord behaviour

2019-03-01 Thread Bryan Bende
Hello, Flow files are not transferred until the session they came from is committed. So imagine we periodically commit and some of the splits are transferred, then halfway through a failure is encountered: the entire original flow file will be reprocessed, producing some of the same splits that
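The failure scenario Bryan warns about can be simulated with a short sketch (hypothetical helper names; this is not NiFi code): if each split is committed as it is produced and the processor then fails, the original flow file is re-queued and reprocessed from the start, so the retry re-emits splits that were already delivered.

```python
# Toy illustration of duplicate splits under mid-stream commits.
# If the session had instead been committed once at the end, the failed
# attempt would deliver nothing and the retry would deliver each split once.

def split_with_periodic_commit(records, downstream, fail_after=None):
    """Emit (commit) each split immediately; optionally fail partway through."""
    for i, record in enumerate(records):
        if i == fail_after:
            raise RuntimeError("simulated failure mid-split")
        downstream.append(record)  # this split is already committed downstream

downstream = []
records = ["r1", "r2", "r3", "r4"]

try:
    # First attempt: r1 and r2 are committed, then the processor fails.
    split_with_periodic_commit(records, downstream, fail_after=2)
except RuntimeError:
    pass  # the original flow file stays in the queue and is retried

# Retry reprocesses the whole original flow file from the beginning,
# re-emitting r1 and r2 even though they already went out.
split_with_periodic_commit(records, downstream)
assert downstream == ["r1", "r2", "r1", "r2", "r3", "r4"]
```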

SplitRecord behaviour

2019-03-01 Thread Kumara M S, Hemantha (Nokia - IN/Bangalore)
Hi All, We have a use case where we receive huge JSON (file size might vary from 1 GB to 50 GB) via HTTP, convert it into XML (the XML format is not fixed; any other format is fine) and send it out using Kafka. - The restriction here is the CPU & RAM usage requirement (once it is fixed, it should handle all