Re: Issue with GenerateTableFetch Processor

2021-02-11 Thread John W. Phillips
Thank you, this is very helpful!  John



--
Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/


Issue with GenerateTableFetch Processor

2021-02-10 Thread John W. Phillips
I’m having an issue with the GenerateTableFetch Processor, and I wanted to
ask for some insight into whether this is a bug or expected behavior.  Using
NiFi 1.12.1 I have a MySQL table with 1M+ rows, and I have a
GenerateTableFetch processor with a `maximum-value column` and
`partition-size` set to 25000 and a `run schedule` of 9 minutes.  When the
ETL starts up I get a sequence of queries for the existing 1M+ rows like
this example:
`SELECT … ORDER BY maxvalcolumn LIMIT 25000 OFFSET 375000`.

Then, at 9-minute intervals, I get queries like:
`SELECT … FROM ... WHERE maxvalcolumn > … AND maxvalcolumn <= … ORDER BY
maxvalcolumn LIMIT 25000`

The issue is that I see only 1 query per 9 minutes with a `LIMIT 25000`, so
if my table accumulates more than 25000 rows in 9 minutes the `LIMIT 25000`
term simply drops the additional rows and they are skipped.  Does the
GenerateTableFetch delta copy generate any additional queries with the
`OFFSET` term?  I’m not sure if there’s a configuration where I can get
multiple queries using the `OFFSET` term in the 9 minute interval, or if I
can have the query generated without the `LIMIT 25000` term.
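
To make the concern concrete, here is a minimal Python sketch of the paging
that would be needed to cover a delta window larger than the partition size.
It is not NiFi's implementation: the function name `page_queries`, the table
name `mytable`, the `:lo`/`:hi` placeholders, and the row counts are all
hypothetical and only illustrate the arithmetic.

```python
import math

def page_queries(new_rows, partition_size, table="mytable", column="maxvalcolumn"):
    """Return one LIMIT/OFFSET query per page needed to cover new_rows rows.

    Hypothetical illustration only; the WHERE bounds are left as placeholders.
    """
    pages = math.ceil(new_rows / partition_size)
    return [
        f"SELECT * FROM {table} WHERE {column} > :lo AND {column} <= :hi "
        f"ORDER BY {column} LIMIT {partition_size} OFFSET {i * partition_size}"
        for i in range(pages)
    ]

# 60,000 new rows with a partition size of 25,000 need three pages,
# whereas a single LIMIT 25000 query covers only the first 25,000 rows.
queries = page_queries(60_000, 25_000)
for q in queries:
    print(q)
```

With one query per interval, only the first of these pages would be issued.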

Thanks,
John





Re: Is there a way to set Processor state from the cli/api?

2019-09-06 Thread John W. Phillips
To be clear, my desired deployment topology is a 'stateless container', i.e. a
single container with no persistent volumes of any kind.  I can stick with
NiFi's Local State Provider if I can, upon redeployment of the container,
read the index from the target and write it into the QueryDatabaseTable
processor before I start the processor.
When doing delta copies from datastore A to datastore B a common pattern is
to use the table index in datastore B (the target) as the source of truth
when restarting the system.  I understand my deployment options with Docker
persistent volumes, the Redis State Provider, etc. but it seems unnecessary
when the target is available to read.  All of this depends on having the
ability to programmatically set Processor state, which seems to be
unsupported.







Is there a way to set Processor state from the cli/api?

2019-09-05 Thread John W. Phillips


I'm using a QueryDatabaseTable processor to do delta copies every 10 minutes
from MySQL into an Avro file.  I'm running NiFi in a container and want to
make it stateless, so what I'd like to do is upon startup read the current
pointer from the destination and set the QueryDatabaseTable processor's
state so the delta copies begin from that index.
Looking through the CLI/API docs I can't find a way to set a processor's
state.  Does anyone have a way to do this that they can share?





Run Schedule via Environment Variable?

2019-07-31 Thread John W. Phillips
Does anyone have a pattern for injecting the run schedule for a set of
QueryDatabaseTable processors via an environment variable?  The processor
does not accept a variable for this field.
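
One pattern that might work is to have a startup script read the environment
variable and push the value through the REST API instead of the processor
field. The sketch below only builds the request body. It assumes the
processor update endpoint (`PUT /nifi-api/processors/{id}`) accepts a
`schedulingPeriod` under `component.config`, which should be checked against
the REST API docs for the NiFi version in use. The variable name
`NIFI_RUN_SCHEDULE`, the processor id, and the revision value are
hypothetical.

```python
import json
import os

def schedule_update_payload(processor_id, revision_version,
                            env_var="NIFI_RUN_SCHEDULE", default="10 min"):
    """Build a processor-update body whose run schedule comes from the environment.

    Assumption: the target API expects the schedule at
    component.config.schedulingPeriod; verify before relying on it.
    """
    period = os.environ.get(env_var, default)
    return {
        "revision": {"version": revision_version},
        "component": {
            "id": processor_id,
            "config": {"schedulingPeriod": period},
        },
    }

# Set here only for demonstration; a container would supply this value.
os.environ["NIFI_RUN_SCHEDULE"] = "9 min"
payload = schedule_update_payload("hypothetical-processor-id", 0)
print(json.dumps(payload, indent=2))
```

The same body could be sent once per QueryDatabaseTable processor, using each
processor's current revision version.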





NiFi PutBigQueryBatch and AVRO logical types

2019-07-15 Thread John W. Phillips


I'm using NiFi 1.9.2 QueryDatabaseTable->PutBigQueryBatch to attempt to
replicate a MySQL table to BigQuery.  In QueryDatabaseTable I've configured
'Use Avro Logical Types=true', so I have a MySQL DATETIME which is encoded
in Avro as a Long with logical type timestamp-millis.  The PutBigQueryBatch
does not support the BigQuery use_avro_logical_types option, so I have an
explicit schema where I cast the Long to a BigQuery TIMESTAMP.

The issue, though, is that the Long is being interpreted by BQ as a Timestamp
in microseconds, and so the resulting Timestamp is off by a factor of 1000.  Does
anyone have a suggestion for a workaround?  I've tried an UpdateRecord
processor with :multiply(1000), but it has a type conversion exception when
writing the new Avro file.
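
The size of the discrepancy can be shown with a short Python sketch. It does
not involve NiFi or BigQuery; the helper names `from_millis` and
`from_micros` and the sample value are hypothetical, chosen only to show what
happens when a timestamp-millis Long is read as microseconds.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_millis(value):
    """Interpret value as milliseconds since the Unix epoch."""
    return EPOCH + timedelta(milliseconds=value)

def from_micros(value):
    """Interpret value as microseconds since the Unix epoch."""
    return EPOCH + timedelta(microseconds=value)

millis = 1_563_148_800_000  # 2019-07-15 00:00:00 UTC as timestamp-millis

print(from_millis(millis))         # the intended value
print(from_micros(millis))         # the same Long read as microseconds
print(from_micros(millis * 1000))  # scaled by 1000, then read as microseconds
```

Read as microseconds, the unscaled value lands in January 1970; multiplying
by 1000 first recovers the intended instant.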


