Re: Best practice for RDBMS to JSON

Matt Burgess Thu, 24 May 2018 10:09:47 -0700

For large result sets you’ll want to parallelize the fetch across a multi-node 
NiFi cluster. You can do this with GenerateTableFetch -> RemoteProcessGroup (to 
the same cluster) -> InputPort -> ExecuteSQL. GenerateTableFetch is like 
QueryDatabaseTable, but it only generates the SQL to fetch “pages”; it doesn’t 
execute them. The RPG->InputPort distributes the SQL statements among the 
cluster, and ExecuteSQL will do the fetching (in parallel at that point).
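To make the "generate pages, then distribute them" idea concrete, here is a minimal, hypothetical Python sketch. It is not NiFi code and does not reproduce GenerateTableFetch's exact output, which depends on the processor's configuration and database type. The table `orders`, the key column `id`, the page size, and the round-robin assignment to named nodes are all illustrative assumptions. Site-to-site through the RPG spreads flow files across the nodes, and round-robin is used here only as a simplification of that.

```python
# Hypothetical illustration only: NOT NiFi's implementation.
# Table name, key column, page size and node names are assumptions.


def generate_page_queries(table, key_column, min_key, max_key, page_size):
    """Return one SQL statement per "page" of key values.

    Each page is a half-open range (lo, hi] over key_column. This shows the
    concept of paging; it does not show the SQL GenerateTableFetch emits.
    """
    queries = []
    lo = min_key - 1  # exclusive lower bound of the first page
    while lo < max_key:
        hi = min(lo + page_size, max_key)
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {key_column} > {lo} AND {key_column} <= {hi} "
            f"ORDER BY {key_column}"
        )
        lo = hi
    return queries


def distribute_round_robin(statements, nodes):
    """Assign statements to nodes in turn (a stand-in for RPG -> InputPort)."""
    assignment = {node: [] for node in nodes}
    for i, stmt in enumerate(statements):
        assignment[nodes[i % len(nodes)]].append(stmt)
    return assignment


if __name__ == "__main__":
    pages = generate_page_queries("orders", "id", 1, 25, 10)
    for node, stmts in distribute_round_robin(pages, ["node1", "node2"]).items():
        # Each node would run its statements independently (ExecuteSQL).
        print(node, stmts)
```

Because each statement covers a disjoint key range, the nodes can run their queries independently without fetching the same rows twice. That independence is what makes the parallel fetch safe.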
Converting to JSON should be done with ConvertRecord, so you don’t need a Split 
processor unless you want each record as an individual file.
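The trade-off between converting a whole result set at once and splitting it can be pictured with the plain-Python analogy below. It is not NiFi code. The sample rows are made up, and the two functions are hypothetical stand-ins. The first writes every record into one JSON array, which is roughly what ConvertRecord produces when configured with an AvroReader and a JsonRecordSetWriter. The second writes one JSON document per record, which is what a Split step followed by conversion would give you.

```python
import json


def records_to_json_array(rows):
    """One output document containing all records as a JSON array
    (analogous to converting a whole result set in one pass)."""
    return json.dumps(rows)


def records_to_individual_json(rows):
    """One JSON document per record (analogous to splitting first)."""
    return [json.dumps(row) for row in rows]


if __name__ == "__main__":
    # Hypothetical rows standing in for one page of query results.
    rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    print(records_to_json_array(rows))
    for doc in records_to_individual_json(rows):
        print(doc)
```

Keeping a page of results together as one document means far fewer flow files to track, which is why splitting is only worth it when downstream really needs one file per record.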


Regards,
Matt

> On May 24, 2018, at 12:52 PM, Anthony Roach <[email protected]> 
> wrote:
> 
> Looking at some examples online for moving data from a relational table to a 
> document store or JSON files, it is unclear how well QueryDatabaseTable will 
> scale to large resultsets.
>  
> Is the QueryDatabaseTable -> SplitAvro -> ConvertAvroToJSON -> PutFile the 
> best way to do this?
>  
> Examples:
> https://blog.couchbase.com/nifi-processing-flow-couchbase-server/
> https://www.batchiq.com/database-extract-with-nifi.html
>
