Just curious: Why is it necessary to identify that global state using catalog, 
schema, table or column names? Those are database concepts. As a user, I'd 
prefer to have some flexibility here. Surely the component needs to know how to 
identify the database object, but beyond that, we should be free to choose how 
to identify that "global"state ourselves. Would that be feasible, maybe as an 
optional attribute that by default would be set with the database.table.column 
I have another example, this time regarding the "while" attribute of 
QueryDatabaseTable. Simply put, it's not enough. I have data that for some 
reason was originally inserted out of order into a source table (or the 
database just decided to show it like that), and without an "order by" clause I 
can't avoid having gaps in the data I'm trying to extract from it now. Not a 
total blocker since I could use a little SQL injection to work around the 
limitation, but it should not be like that. What if one needs a more elaborate 
"select", for example? I would be far preferable to trust the user and let them 
inform the whole SQL command, of course with placeholders so NiFi could 
properly set the values under its control. An optional attribute that would 
take precedence over the "while" when informed would be enough. I'd expect that 
kind of thing to be not so hard to implement, and it wold make the processor 
much more powerful for more advanced users without complicating the lives of 
those who are happy with a simpler configuration.
In general, when dealing with SQL and databases, the less assumptions we make 
about how the component is going to be used, the better. SQL is very powerful, 
and its power should be fully available to this kind of processor, IMHO.
What do you think?
Thank you,


    On Tuesday, March 13, 2018 8:56 PM, Matt Burgess <> 

Originally, we stored state only based on column names, which could obviously 
cause problems when you have two different tables with the same column name. 
However this was because the original DB Fetch processors (QueryDatabaseTable, 
GenerateTableFetch, e.g.) did not accept incoming flow files and thus acted on 
a single table.  Then we added the ability for GenerateTableFetch to accept 
incoming flow files, and realized we should be storing state at least based on 
table name + column name, since GTF might get multiple tables in.  I believe 
you are running into the issue where we'd need to qualify the state based on 
database name + table name + column name, and please feel free to write up a 
Jira for that improvement.  The cutoff at table name was a tradeoff against 
complexity, as a database might not be fully-qualified by its name either 
(imagine multiple hosts with replicated DBs inside, then how do we know that 
two hosts don't point at the same place?).
For your use case, I think we'd need to store state by the aforementioned 
"better-qualified" key, but that might be the limit to our name qualification. 
We will have to deal with backwards-compatibility as we did before we added 
table names, but since we have precedence I wouldn't think it would be too 
difficult to implement.
As a workaround, you might try swapping QueryDatabaseTable with 
GenerateTableFetch, and trying to distribute the flow files for a particular DB 
to a particular instance of ExecuteSQL to actually fetch the rows. You should 
be able to use RouteOnAttribute for this, assuming your table name is an 
attribute on the flow file.

On Tue, Mar 13, 2018 at 8:30 PM, Ramaninder Singh Jhajj 
<> wrote:

Hello Everyone,
I am facing an issue with QueryDatabaseTable processor.
I have 4 identical mySQL database tables on 4 different AWS instances. The 
structure is same but data is different. What I am trying to do is, have 4 
QueryDatabaseTable processors and fetch the data from all the 4 instances to 
process it further and store in elasticsearch. 

This is the structure of the flow, now my issue is:When any one of the 
processors run, it stores the "Maximum-value Columns" value in the state as 
shown below and this state is global in the cluster. 

Now when second QueryDatabaseTable processor runs, it overwrites the state 
value written by the first. I am facing issue with maintaining state for all 4 
processors. Processors runs fine without any issue but obviously, the data 
being fetches is not consistent as "id" column values gets overwritten in the 
Is there any solution to this problem. I need to do incremental fetch to all 4 
identical tables on 4 instances in a single flow and single cluster.
Please let me know if anyone faced similar problem and if there is any solution 
for this.
Kind Regards,Raman


Reply via email to