This is not currently possible, but there is a Jira case about adding
Hive support to QueryDatabaseTable [1].

The comments mention that paging of results in Hive is not possible,
but I've seen some examples using ROW_NUMBER() OVER () and such, and
although complicated and messy (like the PL/SQL statement to do
pagination), it may indeed be possible. Please feel free to write an
Improvement Jira to add Hive support to GenerateTableFetch if you
like, and we can continue the discussion there.
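
As a rough illustration of the ROW_NUMBER() idea, here is a small Python
sketch that only builds the HiveQL text for one page of results. The
function name, the table and column names, and the "rn" alias are all
made up for the example; this is not what any NiFi processor generates
today, and I have not tested it against a real Hive instance.

```python
def paged_hive_query(table, order_col, page, page_size):
    """Build a HiveQL query that returns one page of rows.

    Rows are numbered with ROW_NUMBER() over a stable ordering column,
    then filtered to the requested window. All names are hypothetical.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    first = page * page_size + 1      # row numbers start at 1
    last = first + page_size - 1
    return (
        "SELECT * FROM ("
        f"SELECT t.*, ROW_NUMBER() OVER (ORDER BY {order_col}) AS rn "
        f"FROM {table} t"
        f") paged WHERE rn BETWEEN {first} AND {last}"
    )


if __name__ == "__main__":
    # Third page (zero-based page 2) of 100 rows each.
    print(paged_hive_query("events", "event_id", 2, 100))
```

Note that each page recomputes the numbering over the whole table, which
is part of why this approach is messy and potentially slow.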

Also, in an upcoming release, it will be possible to use
UpdateAttribute to keep track of certain variables in state, such that
you could loop using SelectHiveQL, keeping track of a timestamp value,
for example, and using that in the HiveQL query, thereby emulating the
"max value column" capability.
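
To make the looping idea concrete, here is a minimal Python sketch of
the logic only: build a query that filters on the last value seen, then
advance that value from the returned rows. The function names, the
"events" table, and the "updated_at" column are hypothetical, and the
sketch assumes the timestamps are strings in a sortable format such as
"yyyy-MM-dd HH:mm:ss". In NiFi the stored value would live in processor
state rather than in a Python variable.

```python
def incremental_query(table, max_value_column, last_max=None):
    """Build a HiveQL query that fetches only rows newer than last_max."""
    base = f"SELECT * FROM {table}"
    if last_max is None:
        # First run: nothing seen yet, so fetch everything.
        return base
    return f"{base} WHERE {max_value_column} > '{last_max}'"


def advance_state(rows, max_value_column, last_max=None):
    """Return the largest value seen so far, including the previous maximum."""
    values = [row[max_value_column] for row in rows]
    if last_max is not None:
        values.append(last_max)
    return max(values) if values else None


if __name__ == "__main__":
    state = None
    print(incremental_query("events", "updated_at", state))
    # Pretend these rows came back from the first run.
    rows = [
        {"id": 1, "updated_at": "2017-01-09 10:00:00"},
        {"id": 2, "updated_at": "2017-01-09 12:23:00"},
    ]
    state = advance_state(rows, "updated_at", state)
    print(incremental_query("events", "updated_at", state))
```

If a run returns no rows, the stored value is left unchanged, so the
next run asks for the same window again.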

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-3093

On Mon, Jan 9, 2017 at 12:23 PM, Provenzano Nicolas
<[email protected]> wrote:
> Hi all,
>
>
>
> The GenerateTableFetch processor allows defining a « max value column » to
> get only recent rows (for example).
>
>
>
> Is there any way of doing the same with SelectHiveQL? Currently, it
> seems the SelectHiveQL processor always gets all the rows each time it is
> run, while I would like to get only the rows added or updated since the
> last run.
>
>
>
> Thanks in advance
>
>
>
> Nicolas
