Re: Issue with GenerateTableFetch Processor
Thank you, this is very helpful!

John
Issue with GenerateTableFetch Processor
I’m having an issue with the GenerateTableFetch processor, and I wanted to ask for some insight into whether this is a bug or expected behavior. I’m using NiFi 1.12.1 against a MySQL table with 1M+ rows, with a GenerateTableFetch processor configured with a `maximum-value column`, a `partition-size` of 25000, and a `run schedule` of 9 minutes.

When the ETL starts up, I get a sequence of queries covering the existing 1M+ rows, like `SELECT … ORDER BY maxvalcolumn LIMIT 25000 OFFSET 375000`. Then, at 9-minute intervals, I get queries like `SELECT … FROM … WHERE maxvalcolumn > … AND maxvalcolumn <= … ORDER BY maxvalcolumn LIMIT 25000`.

The issue is that I see only one query per 9-minute interval, always with `LIMIT 25000`, so if my table accumulates more than 25000 rows in 9 minutes, the `LIMIT 25000` clause simply drops the additional rows and they are skipped. Does the GenerateTableFetch delta copy generate any additional queries with an `OFFSET` clause? I’m not sure if there’s a configuration that would give me multiple queries using `OFFSET` within the 9-minute interval, or let me generate the query without the `LIMIT 25000` clause. The paging I was expecting is sketched below.

Thanks,
John
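For concreteness, here is a minimal Python sketch (my own illustration, not NiFi's actual behavior) of the delta paging I was expecting: everything between the stored max value and the new max value, chunked by the partition size rather than capped by a single `LIMIT`. The table name `mytable` and the row counts are hypothetical.

```python
# Hand-rolled illustration of paged delta queries between two max values.
# "mytable" and the numbers below are hypothetical, not from my actual flow.
PARTITION_SIZE = 25000

def delta_queries(last_max, new_max, new_row_count):
    """Yield one query per partition-size page of the new rows."""
    for offset in range(0, new_row_count, PARTITION_SIZE):
        yield (
            "SELECT * FROM mytable "
            f"WHERE maxvalcolumn > {last_max} AND maxvalcolumn <= {new_max} "
            "ORDER BY maxvalcolumn "
            f"LIMIT {PARTITION_SIZE} OFFSET {offset}"
        )

# e.g. 60000 new rows accumulated in one 9-minute interval -> 3 queries, not 1
for q in delta_queries(1000000, 1060000, 60000):
    print(q)
```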
Re: Is there a way to set Processor state from the cli/api?
To be clear, my desired deployment topology is a 'stateless container', i.e., a single container with no persistent volumes of any kind. I can stick with NiFi's Local State Provider if, upon redeployment of the container, I can read the index from the target and write it into the QueryDatabaseTable processor before starting the processor. When doing delta copies from datastore A to datastore B, a common pattern is to use the table index in datastore B (the target) as the source of truth when restarting the system. I understand my deployment options with Docker persistent volumes, the Redis State Provider, etc., but they seem unnecessary when the target is available to read. All of this depends on having the ability to programmatically set processor state, which seems to be unsupported. The startup step I have in mind is sketched below.
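For what it's worth, here is a minimal sketch of that startup read, assuming the target is MySQL reachable via PyMySQL; the connection details and the table/column names (`replicated_table`, `maxvalcolumn`) are hypothetical placeholders. The missing piece remains pushing the value into processor state.

```python
# Minimal sketch of the startup step described above: read the current
# high-water mark from the target (datastore B) before starting the flow.
# Host, credentials, and table/column names are hypothetical.
import pymysql

def read_target_high_water_mark() -> int:
    conn = pymysql.connect(host="target-db", user="etl",
                           password="changeme", database="warehouse")
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT MAX(maxvalcolumn) FROM replicated_table")
            (max_value,) = cur.fetchone()
            return max_value or 0
    finally:
        conn.close()

# The missing piece: NiFi's REST API exposes no endpoint for *writing*
# processor state, so this value cannot currently be pushed into
# QueryDatabaseTable -- state can only be read or cleared.
```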
Is there a way to set Processor state from the cli/api?
I'm using a QueryDatabaseTable processor to do delta copies every 10 minutes from MySQL into Avro files. I'm running NiFi in a container and want to make it stateless, so upon startup I'd like to read the current pointer from the destination and set the QueryDatabaseTable processor's state so the delta copies begin from that index. Looking through the CLI/API docs, I can't find a way to set a processor's state. Does anyone have a way to do this that they can share? The closest I've found in the REST API is sketched below.
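Here is what I could find, as a hedged Python sketch; the base URL, the lack of auth, and the processor id are assumptions. State can be read and cleared, but I see no endpoint to set it to an arbitrary value.

```python
# Sketch of the processor-state endpoints the REST API does expose.
# Assumes an unsecured dev instance and a hypothetical processor id.
import requests

NIFI = "http://localhost:8080/nifi-api"
PROC = "01234567-89ab-cdef-0123-456789abcdef"  # hypothetical processor id

# Read current state (supported): QueryDatabaseTable's max-value
# entries show up in the returned state map.
state = requests.get(f"{NIFI}/processors/{PROC}/state").json()
print(state)

# Clear state (supported): the next run re-reads everything from the source.
requests.post(f"{NIFI}/processors/{PROC}/state/clear-requests")

# Set state: no endpoint accepts a new state map, which is exactly the gap.
```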
Run Schedule via Environment Variable?
Does anyone have a pattern for injecting the run schedule for a set of QueryDatabaseTable processors via an environment variable? Run Schedule is a scheduling setting rather than a processor property, so the processor does not accept a variable for this field. One workaround I'm considering is sketched below.
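Since Expression Language won't reach the scheduling settings, the only workaround I can think of is to set the schedule through the REST API at container startup. A minimal Python sketch, where the NiFi URL, environment variable name, and processor ids are assumptions:

```python
# Read the desired run schedule from an environment variable at startup
# and PUT it onto each QueryDatabaseTable processor's scheduling config.
# URL, env var name, and processor ids below are hypothetical.
import os
import requests

NIFI = "http://localhost:8080/nifi-api"      # assumption: unsecured instance
PROCESSOR_IDS = ["qdt-processor-id-1", "qdt-processor-id-2"]
schedule = os.environ.get("QDT_RUN_SCHEDULE", "10 min")

for pid in PROCESSOR_IDS:
    # Fetch the current entity to get the optimistic-locking revision.
    entity = requests.get(f"{NIFI}/processors/{pid}").json()
    update = {
        "revision": entity["revision"],
        "component": {
            "id": pid,
            "config": {"schedulingPeriod": schedule},
        },
    }
    requests.put(f"{NIFI}/processors/{pid}", json=update).raise_for_status()
```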
NiFi PutBigQueryBatch and AVRO logical types
I'm using NiFi 1.9.2 with QueryDatabaseTable -> PutBigQueryBatch to attempt to replicate a MySQL table to BigQuery. In QueryDatabaseTable I've configured 'Use Avro Logical Types=true', so a MySQL DATETIME is encoded in Avro as a long with logical type timestamp-millis. PutBigQueryBatch does not support BigQuery's use_avro_logical_types option, so I have an explicit schema where I cast the long to a BigQuery TIMESTAMP. The issue, though, is that BigQuery interprets the long as a timestamp in microseconds, so the resulting TIMESTAMP is off by a factor of 1000. Does anyone have a suggestion for a workaround? I've tried an UpdateRecord processor with :multiply(1000), but it throws a type conversion exception when writing the new Avro file. An out-of-band sketch of the scaling I'm after is below.
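For the record, here is the scaling I mean as a minimal Python sketch using fastavro, outside NiFi (the file paths and the `updated_at` column name are assumptions). The idea: fastavro decodes timestamp-millis longs into datetimes on read, so re-encoding the same records under a timestamp-micros schema rewrites the stored long x1000, matching BigQuery's interpretation.

```python
# Rewrite an Avro file so a timestamp-millis column becomes timestamp-micros.
# Paths and the "updated_at" field name are hypothetical.
import copy
import fastavro

with open("in.avro", "rb") as src:
    reader = fastavro.reader(src)
    schema = copy.deepcopy(reader.writer_schema)
    records = list(reader)  # logical types decoded to datetime here

# Switch the DATETIME column's logical type from millis to micros,
# handling both plain and nullable (union) field types.
for field in schema["fields"]:
    if field["name"] == "updated_at":
        branches = field["type"] if isinstance(field["type"], list) else [field["type"]]
        for branch in branches:
            if isinstance(branch, dict) and branch.get("logicalType") == "timestamp-millis":
                branch["logicalType"] = "timestamp-micros"

with open("out.avro", "wb") as dst:
    fastavro.writer(dst, schema, records)  # longs re-encoded as microseconds
```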