Prabhu,

You might also try replacing ExtractText with a series of ExecuteStreamCommand processors that perform system calls (sed/awk/grep or the Windows equivalents) on the flowfile's contents. You can even write the result directly to a flowfile attribute.
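As a rough sketch of the kind of command ExecuteStreamCommand could call, here is a small Python filter that does the split-and-convert work in one pass. Everything specific in it is an assumption for illustration: I don't know your .dat layout, so the pipe delimiter and the column names (id, name, amount) are made up, and Python is only standing in for sed/awk.

```python
import io
import json

DELIMITER = "|"                      # hypothetical; use the real delimiter of the .dat file
COLUMNS = ["id", "name", "amount"]   # hypothetical column names


def row_to_json(line):
    """Split one delimited row and return it as a JSON object string."""
    values = line.rstrip("\r\n").split(DELIMITER)
    return json.dumps(dict(zip(COLUMNS, values)))


def convert(src, dst):
    """Read delimited rows from src and write one JSON object per line to dst."""
    for line in src:
        if line.strip():             # skip blank lines
            dst.write(row_to_json(line) + "\n")


# Demo with in-memory streams so the sketch can be run on its own.
sample = io.StringIO("1|abc|9.5\n2|def|7.0\n")
out = io.StringIO()
convert(sample, out)
print(out.getvalue(), end="")
```

When run as a command, the script would call convert(sys.stdin, sys.stdout) instead of the demo lines, since the processor pipes the flowfile content to the command's standard input and captures its standard output. A plain string split like this avoids regex backtracking entirely, which is the main reason it may be cheaper than a wildcard-heavy ExtractText pattern.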
I suspect there are wildcards in your ExtractText regex that are taking a while to buffer and compare.

Lee

On Oct 18, 2016, at 2:31 PM, prabhu Mahendran <[email protected]> wrote:

> Mark,
>
> Thanks for your response.
>
> Please find the response for your questions.
>
> ==>The first processor that you see that exhibits poor performance is
> ExtractText, correct?
> Yes,Extract Text exhibits poor performance.
>
> ==>How big is your Java heap?
> I have set 1 GB for java heap.
>
> ==>Do you have back pressure configured on the connection between ExtractText
> and ReplaceText?
> There is no back pressure between extract and
> replace text.
>
> ==>when you say that you specify concurrent tasks, what are you configuring
> the concurrent tasks
> to be?
> I have specify concurrent tasks to be 2 for the
> extract text processor due to slower processing rate.Which
> is specified in Concurrent Task Text box.
>
> ==>Have you changed the maximum number of concurrent tasks available to your
> dataflow?
> No i haven't changed.
>
> ==>How many CPU's are available on this machine?
> Only single cpu are available in this machine with
> core i5 processor CPU @2.20Ghz.
>
> ==> Are these the only processors in your flow, or do you have other
> dataflows going on in the
> same instance as NiFi?
> Yes this is the only processor in work flow which is
> running and no other instances are running.
>
> Thanks
>
>> On Mon, Oct 17, 2016 at 6:08 PM, Mark Payne <[email protected]> wrote:
>> Prabhu,
>>
>> Certainly, the performance that you are seeing, taking 4-5 hours to move 3M
>> rows into SQLServer is far from
>> ideal, but the good news is that it is also far from typical. You should be
>> able to see far better results.
>>
>> To help us understand what is limiting the performance, and to make sure
>> that we understand what you are seeing,
>> I have a series of questions that would help us to understand what is going
>> on.
>>
>> The first processor that you see that exhibits poor performance is
>> ExtractText, correct?
>> Can you share the configuration that you have for that processor?
>>
>> How big is your Java heap? This is configured in conf/bootstrap.conf; by
>> default it is configured as:
>> java.arg.2=-Xms512m
>> java.arg.3=-Xmx512m
>>
>> Do you have backpressure configured on the connection between ExtractText
>> and ReplaceText?
>>
>> Also, when you say that you specify concurrent tasks, what are you
>> configuring the concurrent tasks
>> to be? Have you changed the maximum number of concurrent tasks available to
>> your dataflow? By default, NiFi will
>> use only 10 threads max. How many CPU's are available on this machine?
>>
>> And finally, are these the only processors in your flow, or do you have
>> other dataflows going on in the
>> same instance as NiFi?
>>
>> Thanks
>> -Mark
>>
>>
>>> On Oct 17, 2016, at 3:35 AM, prabhu Mahendran <[email protected]>
>>> wrote:
>>>
>>> Hi All,
>>>
>>> I have tried to perform the below operation.
>>>
>>> dat file(input)-->JSON-->SQL-->SQLServer
>>>
>>>
>>> GetFile-->SplitText-->SplitText-->ExtractText-->ReplaceText-->ConvertJsonToSQL-->PutSQL.
>>>
>>> My Input File(.dat)-->3,00,000 rows.
>>>
>>> Objective: Move the data from '.dat' file into SQLServer.
>>>
>>> I can able to Store the data in SQL Server by using combination of above
>>> processors.But it takes almost 4-5 hrs to move complete data into SQLServer.
>>>
>>> Combination of SplitText's perform data read quickly.But Extract Text takes
>>> long time to pass given data matches with user defined expression.If input
>>> comes 107 MB but it send outputs in KB size only even ReplaceText processor
>>> also processing data in KB Size only.
>>>
>>> In accordance with above slow processing leads the more time taken for data
>>> into SQLsever.
>>>
>>>
>>> Extract Text,ReplaceText,ConvertJsonToSQL processors send's outgoing flow
>>> file in Kilobytes only.
>>>
>>> If i have specify concurrent tasks for those
>>> ExtractText,ReplaceText,ConvertJsonToSQL then it occupy the 100% cpu and
>>> disk usage.
>>>
>>> It just 30 MB data ,But processors takes 6 hrs for data movement into
>>> SQLServer.
>>>
>>> Faced Problem is..,
>>>
>>> Almost 6 hrs taken for move the 3lakhs data into SQL Server.
>>> ExtractText,ReplaceText take long time for processing data(it send
>>> output flowfile kb size only).
>>> Can anyone help me to solve below requirement?
>>>
>>> Need to reduce the number of time taken by the processors for move the
>>> lakhs of data into SQL Server.
>>>
>>>
>>>
>>> If anything i'm done wrong,please help me to done it right.
>
