Lee, Thanks for your idea.
I have one question about ExecuteStreamCommand, which needs a Command Path
and an Argument Delimiter. I am using this regex in the ExtractText
processor:

(.+)[|](.+)[|](.+)[|](.+)

How can I pass this regex to the ExecuteStreamCommand processor? Or is
there another processor with the same functionality as ExtractText?

Thanks

On Tue, Oct 18, 2016 at 11:42 AM, Lee Laim <[email protected]> wrote:
>
> Prabhu,
>
> You might also try to replace ExtractText with a series of
> ExecuteStreamCommand processors that perform system calls (sed/awk/grep or
> the Windows equivalents) on the flowfiles contents. You can even write the
> result directly to a flowfile attribute.
>
> I suspect there are wildcards in your ExtractText regex that are taking a
> while to buffer and compare.
>
> Lee
>
> On Oct 18, 2016, at 2:31 PM, prabhu Mahendran <[email protected]>
> wrote:
>
> Mark,
>
> Thanks for your response.
>
> Please find my answers to your questions below.
>
> ==>The first processor that you see that exhibits poor performance is
> ExtractText, correct?
> Yes, ExtractText exhibits poor performance.
>
> ==>How big is your Java heap?
> I have set the Java heap to 1 GB.
>
> ==>Do you have back pressure configured on the connection between
> ExtractText and ReplaceText?
> There is no back pressure between ExtractText and ReplaceText.
>
> ==>when you say that you specify concurrent tasks, what are you
> configuring the concurrent tasks
> to be?
> I have set Concurrent Tasks to 2 for the ExtractText processor because of
> its slow processing rate. This is set in the Concurrent Tasks text box.
>
> ==>Have you changed the maximum number of concurrent tasks available to
> your dataflow?
> No, I haven't changed it.
>
> ==>How many CPU's are available on this machine?
> Only a single CPU is available on this machine, a Core i5 processor
> @ 2.20 GHz.
>
> ==> Are these the only processors in your flow, or do you have other
> dataflows going on in the
> same instance as NiFi?
> Yes, these are the only processors in the workflow that are running, and
> no other instances are running.
>
> Thanks
>
> On Mon, Oct 17, 2016 at 6:08 PM, Mark Payne <[email protected]> wrote:
>
>> Prabhu,
>>
>> Certainly, the performance that you are seeing, taking 4-5 hours to move
>> 3M rows into SQLServer is far from
>> ideal, but the good news is that it is also far from typical. You should
>> be able to see far better results.
>>
>> To help us understand what is limiting the performance, and to make sure
>> that we understand what you are seeing,
>> I have a series of questions that would help us to understand what is
>> going on.
>>
>> The first processor that you see that exhibits poor performance is
>> ExtractText, correct?
>> Can you share the configuration that you have for that processor?
>>
>> How big is your Java heap? This is configured in conf/bootstrap.conf; by
>> default it is configured as:
>> java.arg.2=-Xms512m
>> java.arg.3=-Xmx512m
>>
>> Do you have backpressure configured on the connection between ExtractText
>> and ReplaceText?
>>
>> Also, when you say that you specify concurrent tasks, what are you
>> configuring the concurrent tasks
>> to be? Have you changed the maximum number of concurrent tasks available
>> to your dataflow? By default, NiFi will
>> use only 10 threads max. How many CPU's are available on this machine?
>>
>> And finally, are these the only processors in your flow, or do you have
>> other dataflows going on in the
>> same instance as NiFi?
>>
>> Thanks
>> -Mark
>>
>>
>> On Oct 17, 2016, at 3:35 AM, prabhu Mahendran <[email protected]>
>> wrote:
>>
>> Hi All,
>>
>> I have tried to perform the below operation.
>>
>> dat file(input)-->JSON-->SQL-->SQLServer
>>
>>
>> GetFile-->SplitText-->SplitText-->ExtractText-->ReplaceText-->ConvertJsonToSQL-->PutSQL.
>>
>> My Input File(.dat)-->3,00,000 rows.
>>
>> *Objective:* Move the data from '.dat' file into SQLServer.
>>
>> I am able to store the data in SQL Server using the combination of
>> processors above, but it takes almost 4-5 hrs to move the complete data
>> into SQL Server.
>>
>> The two SplitText processors read the data quickly, but ExtractText
>> takes a long time to match the data against the user-defined expression.
>> Even when the input is 107 MB, it sends output only in KB sizes, and the
>> ReplaceText processor also processes data only in KB sizes.
>>
>> This slow processing is why it takes so long to get the data into
>> SQL Server.
>>
>>
>> The ExtractText, ReplaceText, and ConvertJsonToSQL processors send
>> outgoing flow files in kilobytes only.
>>
>> If I specify concurrent tasks for ExtractText, ReplaceText, and
>> ConvertJsonToSQL, they occupy 100% of CPU and disk usage.
>>
>> It is just 30 MB of data, but the processors take 6 hrs to move it into
>> SQL Server.
>>
>> The problems I face are:
>>
>>
>> 1. Almost 6 hrs are taken to move the 3 lakh rows into SQL Server.
>> 2. ExtractText and ReplaceText take a long time to process the data
>> (they send output flow files of KB size only).
>>
>> Can anyone help me with the *requirement* below?
>>
>> I need to reduce the time the processors take to move lakhs of rows
>> into SQL Server.
>>
>>
>>
>> If I have done anything wrong, please help me do it right.
>>
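
On the regex question at the top of this message: below is a minimal Python sketch, run outside NiFi, of what the ExtractText pattern captures from one pipe-delimited row. The sample row and the names BOUNDED and split_fields are made up for illustration. The delimiter-excluding pattern is only a guess at what Lee's remark about wildcards might point to, and it has not been measured in this flow.

```python
import re

# Pattern currently configured in ExtractText (copied from this message).
GREEDY = re.compile(r"(.+)[|](.+)[|](.+)[|](.+)")

# Hypothetical alternative: each field excludes the delimiter, so the
# regex engine never has to back up across a '|' to find the field ends.
BOUNDED = re.compile(r"([^|]+)[|]([^|]+)[|]([^|]+)[|]([^|]+)")


def split_fields(line, delimiter="|"):
    """Split one row on the delimiter without using a regex at all."""
    return line.split(delimiter)


# Made-up sample row with exactly four fields.
sample = "101|alpha|beta|gamma"

greedy_fields = GREEDY.fullmatch(sample).groups()
bounded_fields = BOUNDED.fullmatch(sample).groups()
plain_fields = tuple(split_fields(sample))

print(greedy_fields)
print(bounded_fields)
print(plain_fields)
```

For a row with exactly four fields, all three approaches yield the same four values. They can differ on rows with extra delimiters, so this would need checking against the real .dat file before changing the flow.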
