Currently ExecuteSQL will put all available rows into a single flow file.
There is a Jira case (https://issues.apache.org/jira/browse/NIFI-1251) to
allow the user to break up the result set into flow files containing a
specified number of records.
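
In the meantime, one possible workaround (my suggestion, so treat the
exact settings as a sketch): put a SplitAvro processor after ExecuteSQL.
With the Record split strategy and an Output Size of 1, it should break
the single result-set flow file into one flow file per row:

    ExecuteSQL -> SplitAvro (Output Size = 1) -> ConvertAvroToJSON -> PutFile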

I'm not sure why you get 26 flow files, although if you let the flow run
for 26 seconds you should see 26 flow files, each with the full contents
of the "users" table. This is because the processor will run every second
(per your cron expression) and execute the same query ("SELECT * FROM
users") every time.
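
As an aside: in Quartz cron syntax (which NiFi uses for CRON-driven
scheduling), the first field is seconds, so "* * * * * ?" fires every
second. If you intended once per minute, an expression like the following
should work:

    0 * * * * ?
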
There is a new processor in the works
(https://issues.apache.org/jira/browse/NIFI-1575) that will allow the user
to specify "maximum value columns": the processor will keep track of the
maximum observed value for each specified column, so that each subsequent
execution only retrieves rows whose values for those columns are greater
than the currently held maximums. An example would be a users table with a
primary key user_id, which is strictly increasing. The processor would run
once, fetching all available records; then, unless a new row is added
(with a higher user_id value), no flow files will be output. If rows are
added in the meantime, then upon the next execution of the processor, only
those "new" rows will be output.
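
To illustrate the idea (this query is just a sketch of the effective
behavior, not the processor's actual implementation): if the highest
user_id seen so far is 2, the next execution would effectively issue:

    SELECT * FROM users WHERE user_id > 2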

I'm happy to help you work through this if you'd like to provide more
details about your table setup (columns, rows) and flow.

Regards,
Matt

On Fri, Mar 4, 2016 at 3:04 PM, Ralf Meier <n...@cht3.com> wrote:

> Hi,
>
> I tried to understand the ExecuteSQL processor.
> I created a database with a table "users". This table has two entries.
>
> The problem with the processor is that it selected the entries from the
> table multiple times and created 26 flow files altogether, even though
> only two entries were available. In addition, each flow file consists of
> both entries.
>
> I configured the ExecuteSQL processor the following way:
> Settings: didn't change anything here except auto-terminating on failure.
> Scheduling:
>         Cron based: * * * * * ? (Run every minute)
>         Concurrent tasks: 1
> Properties:
>         Database Connection Pooling Service: DBmysql
>         SQL select query: Select * from user
>         Max Wait Time: 0 seconds
>
> Then I used a ConvertAvroToJSON processor and a PutFile processor.
>
> If I run the flow, it creates 26 flow files, and each of them includes
> all entries of the table as JSON.
>
> My goal is to extract the table once, so that the entries are only
> created once as JSON, not 26 times.
> My understanding was that each row of the table would become one flow
> file, and therefore each line of the table would result in one JSON file
> on disk (using PutFile).
>
> But it seems that this is not right. What happens if I have millions of
> entries in such a table? Will this all be done with one flow file?
>
> How would I configure NiFi so that it extracts the table once?
>
> It would be great if somebody could help me with this.
>
>
>  BR
> Ralf
