Dnyaneshwar,

In the upcoming NiFi 1.8.0 release, ExecuteSQL will have Max Rows Per Flow File [1]. In the meantime, you might try GenerateTableFetch: it accepts incoming flow files and generates SQL statements for X rows per flow file (the property is called Partition Size in that processor). The limitation is that you can't provide your own SQL; it generates the SQL from the columns to return, any max-value columns specified, and an optional custom WHERE clause. If you have complex SQL this won't be a viable workaround, but if not it should do the trick for now.
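To make the partitioning idea concrete, here is a rough sketch (not NiFi's actual implementation; the table and column names are made up, and the OFFSET ... FETCH syntax assumes a reasonably recent DB2/standard-SQL dialect) of how one large query becomes many bounded queries, each of which would back one flow file:

```python
def partition_queries(table, columns, partition_size, row_count, where=None):
    """Generate one bounded SELECT per partition, roughly in the spirit of
    GenerateTableFetch's Partition Size property (illustrative only)."""
    queries = []
    for offset in range(0, row_count, partition_size):
        sql = f"SELECT {', '.join(columns)} FROM {table}"
        if where:
            sql += f" WHERE {where}"
        sql += f" OFFSET {offset} ROWS FETCH NEXT {partition_size} ROWS ONLY"
        queries.append(sql)
    return queries

# 1,000,000 rows in partitions of 10,000 -> 100 bounded queries,
# instead of one unbounded result set held in memory at once.
qs = partition_queries("MY_TABLE", ["ID", "NAME"], 10_000, 1_000_000)
```

Each generated statement returns at most Partition Size rows, which is what keeps any single fetch (and flow file) bounded.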
Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-1251

On Mon, Oct 15, 2018 at 1:37 AM Dnyaneshwar Pawar <[email protected]> wrote:
>
> Hi Koji,
>
> As suggested, the "Max Rows Per Flow File" property is not available for the
> ExecuteSQL processor; it is available with the QueryDatabaseTable processor.
> But we cannot use QueryDatabaseTable because it does not accept upstream
> connections, and we have a requirement to accept upstream connections from
> other processors (e.g. the HandleHTTPRequest processor). Please suggest how
> we can use ExecuteSQL to process high-volume data.
>
> -----Original Message-----
> From: Koji Kawamura <[email protected]>
> Sent: Tuesday, September 25, 2018 5:59 AM
> To: [email protected]
> Subject: Re: High volume data with ExecuteSQL processor
>
> Hello,
>
> Did you try setting 'Max Rows Per Flow File' on the ExecuteSQL processor?
> If the OOM happened when NiFi wrote all results into a single FlowFile, that
> property can help break the result set into several FlowFiles to avoid it.
>
> Thanks,
> Koji
>
> On Fri, Sep 21, 2018 at 3:56 PM Dnyaneshwar Pawar
> <[email protected]> wrote:
> >
> > Hi,
> >
> > How to execute/process high-volume data with the ExecuteSQL processor:
> >
> > We tried to execute a query against a DB2 database with around 10 lakh
> > (one million) records. While executing this query we get an OutOfMemory
> > error and the request (FlowFile) is stuck in the queue. When we restart
> > NiFi it is still stuck in the queue, and as soon as NiFi starts we get
> > the same error again. Is there any way to configure a retry for the
> > queue (the connection between the two processors)?
> >
> > We also tried changing the Flow File repository property in
> > nifi.properties (nifi.flowfile.repository.implementation) to
> > 'org.apache.nifi.controller.repository.VolatileFlowFileRepository'.
> > This removes the queued FlowFile on restart, but it carries a risk of
> > data loss for other flows in the event of a power/machine failure.
> >
> > So please suggest how to execute a high-volume query, or any retry
> > mechanism available for queued FlowFiles.
> >
> > Regards,
> > Dnyaneshwar Pawar
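For anyone finding this thread later, the nifi.properties change discussed above looks like the following (to my knowledge the write-ahead implementation is the default; verify against the admin guide for your NiFi version):

```
# Default: durable, queued FlowFiles survive a restart
nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository

# Volatile alternative mentioned above: queued FlowFiles are dropped on
# restart, with the corresponding risk of data loss for all flows in the
# event of a power/machine failure
nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.VolatileFlowFileRepository
```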
