Raman, Originally, we stored state only based on column names, which could obviously cause problems when you have two different tables with the same column name. However this was because the original DB Fetch processors (QueryDatabaseTable, GenerateTableFetch, e.g.) did not accept incoming flow files and thus acted on a single table. Then we added the ability for GenerateTableFetch to accept incoming flow files, and realized we should be storing state at least based on table name + column name, since GTF might get multiple tables in. I believe you are running into the issue where we'd need to qualify the state based on database name + table name + column name, and please feel free to write up a Jira for that improvement. The cutoff at table name was a tradeoff against complexity, as a database might not be fully-qualified by its name either (imagine multiple hosts with replicated DBs inside, then how do we know that two hosts don't point at the same place?).
For your use case, I think we'd need to store state by the aforementioned "better-qualified" key, but that might be the limit to our name qualification. We will have to deal with backwards-compatibility as we did before we added table names, but since we have precedence I wouldn't think it would be too difficult to implement. As a workaround, you might try swapping QueryDatabaseTable with GenerateTableFetch, and trying to distribute the flow files for a particular DB to a particular instance of ExecuteSQL to actually fetch the rows. You should be able to use RouteOnAttribute for this, assuming your table name is an attribute on the flow file. Regards, Matt On Tue, Mar 13, 2018 at 8:30 PM, Ramaninder Singh Jhajj < [email protected]> wrote: > Hello Everyone, > > I am facing an issue with QueryDatabaseTable processor. > > I have 4 identical mySQL database tables on 4 different AWS instances. The > structure is same but data is different. What I am trying to do is, have 4 > QueryDatabaseTable processors and fetch the data from all the 4 instances > to process it further and store in elasticsearch. > > > This is the structure of the flow, now my issue is: > *When any one of the processors run, it stores the "Maximum-value Columns" > value in the state as shown below and this state is global in the cluster. * > > > > ** > > *Now when second QueryDatabaseTable processor runs, it overwrites the > state value written by the first. I am facing issue with maintaining state > for all 4 processors. Processors runs fine without any issue but obviously, > the data being fetches is not consistent as "id" column values gets > overwritten in the state.* > > Is there any solution to this problem. I need to do incremental fetch to > all 4 identical tables on 4 instances in a single flow and single cluster. > > Please let me know if anyone faced similar problem and if there is any > solution for this. > > Kind Regards, > Raman >
