You might also try to replace ExtractText with a series of ExecuteStreamCommand
processors that perform system calls (sed/awk/grep or the Windows equivalents)
on the flowfile's contents. You can even write the result directly to a
I suspect there are wildcards in your ExtractText regex that are taking a while
to buffer and compare.
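
To illustrate the wildcard concern, here is a minimal Python sketch. ExtractText uses Java regular expressions, but both engines backtrack in a similar way. The sample row, the comma delimiter, and both patterns are assumptions, since the actual ExtractText regex has not been shared in this thread.

```python
import re

# Hypothetical comma-delimited row; the real .dat layout is not shown in the thread.
line = "1001,Alice,Chennai,2016-10-17"

# Greedy wildcards: each ".*" first consumes the rest of the line and then
# backtracks character by character until the following comma can match.
greedy = re.compile(r"^(.*),(.*),(.*),(.*)$")

# Negated character classes: each group stops at the next comma, so a
# well-formed row is matched with little or no backtracking.
bounded = re.compile(r"^([^,]*),([^,]*),([^,]*),([^,]*)$")


def extract(pattern, text):
    """Return the captured fields as a tuple, or None if the row does not match."""
    match = pattern.match(text)
    return match.groups() if match else None


greedy_fields = extract(greedy, line)
bounded_fields = extract(bounded, line)
```

Both patterns capture the same four fields on this row. The difference is in how much work the engine does to get there, which can add up over several hundred thousand rows.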
On Oct 18, 2016, at 2:31 PM, prabhu Mahendran <prabhuu161...@gmail.com> wrote:
> Thanks for your response.
> Please find the response for your questions.
> ==>The first processor that you see that exhibits poor performance is
> ExtractText, correct?
> Yes, ExtractText exhibits poor performance.
> ==>How big is your Java heap?
> I have set the Java heap to 1 GB.
> ==>Do you have back pressure configured on the connection between ExtractText
> and ReplaceText?
> There is no back pressure configured between ExtractText and
> ReplaceText.
> ==>when you say that you specify concurrent tasks, what are you configuring
> the concurrent tasks
> to be?
> I have set concurrent tasks to 2 for the ExtractText
> processor because of its slow processing rate. This
> is set in the Concurrent Tasks text box.
> ==>Have you changed the maximum number of concurrent tasks available to your
> dataflow?
> No, I haven't changed it.
> ==>How many CPU's are available on this machine?
> Only a single CPU is available on this machine, a
> Core i5 processor @ 2.20 GHz.
> ==> Are these the only processors in your flow, or do you have other
> dataflows going on in the
> same instance as NiFi?
> Yes, this is the only flow that is running, and no
> other instances are running.
>> On Mon, Oct 17, 2016 at 6:08 PM, Mark Payne <marka...@hotmail.com> wrote:
>> Certainly, the performance that you are seeing, taking 4-5 hours to move 3M
>> rows into SQLServer is far from
>> ideal, but the good news is that it is also far from typical. You should be
>> able to see far better results.
>> To help us understand what is limiting the performance, and to make sure
>> that we understand what you are seeing,
>> I have a series of questions that would help us to understand what is going
>> on.
>> The first processor that you see that exhibits poor performance is
>> ExtractText, correct?
>> Can you share the configuration that you have for that processor?
>> How big is your Java heap? This is configured in conf/bootstrap.conf; by
>> default it is configured as:
>> Do you have backpressure configured on the connection between ExtractText
>> and ReplaceText?
>> Also, when you say that you specify concurrent tasks, what are you
>> configuring the concurrent tasks
>> to be? Have you changed the maximum number of concurrent tasks available to
>> your dataflow? By default, NiFi will
>> use only 10 threads max. How many CPU's are available on this machine?
>> And finally, are these the only processors in your flow, or do you have
>> other dataflows going on in the
>> same instance as NiFi?
>>> On Oct 17, 2016, at 3:35 AM, prabhu Mahendran <prabhuu161...@gmail.com>
>>> wrote:
>>> Hi All,
>>> I have tried to perform the below operation.
>>> dat file(input)-->JSON-->SQL-->SQLServer
>>> My Input File(.dat)-->3,00,000 rows.
>>> Objective: Move the data from '.dat' file into SQLServer.
>>> I am able to store the data in SQL Server using a combination of the above
>>> processors, but it takes almost 4-5 hrs to move the complete data into
>>> SQL Server.
>>> The combination of SplitText processors reads the data quickly, but
>>> ExtractText takes a long time to match the given data against the
>>> user-defined expression. Even when the input is 107 MB, it sends output in
>>> KB sizes only, and the ReplaceText processor also processes data in KB
>>> sizes only.
>>> This slow processing increases the time taken to move the data into
>>> SQL Server.
>>> The ExtractText, ReplaceText, and ConvertJsonToSQL processors send outgoing
>>> flowfiles in kilobytes only.
>>> If I specify concurrent tasks for ExtractText, ReplaceText, and
>>> ConvertJsonToSQL, they occupy 100% of the CPU and
>>> disk.
>>> It is just 30 MB of data, but the processors take 6 hrs for data movement into
>>> The problems faced are:
>>> Almost 6 hrs are taken to move the 3 lakh rows into SQL Server.
>>> ExtractText and ReplaceText take a long time to process data (they send
>>> output flowfiles in KB sizes only).
>>> Can anyone help me with the requirement below?
>>> I need to reduce the time taken by the processors to move
>>> lakhs of rows into SQL Server.
>>> If I have done anything wrong, please help me do it right.
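
As a rough illustration of the ExecuteStreamCommand suggestion at the top of this thread, the sketch below converts delimited rows to JSON in a single pass over the file, rather than one flowfile per row. The column names in FIELDS and the comma delimiter are hypothetical, since the .dat schema is not given here. Treat it as a starting point under those assumptions, not as the recommended flow.

```python
import json

# Hypothetical column names; the real .dat schema is not given in the thread.
FIELDS = ["id", "name", "city"]


def line_to_json(line, delimiter=","):
    """Convert one delimited row to a JSON object string, or None if malformed."""
    values = line.rstrip("\r\n").split(delimiter)
    if len(values) != len(FIELDS):
        return None  # skip rows with an unexpected number of columns
    return json.dumps(dict(zip(FIELDS, values)))


def convert(lines):
    """Yield one JSON string per well-formed input row, streaming line by line."""
    for line in lines:
        record = line_to_json(line)
        if record is not None:
            yield record
```

To use this as a filter, a script could iterate over `convert(sys.stdin)` and print each record. ExecuteStreamCommand passes the flowfile content to the command's standard input and captures its standard output.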