Bob,
  Even if you managed to create the NiFi flows for all 200 tables by hand, 
you will probably want to change those flows down the road.  Each such change 
would then have to be repeated manually 200 times.

This can be mitigated somewhat by breaking each flow into two pieces: 
Acquisition and Ingestion.  You would have 200 unique Acquisition flows and a 
single shared Ingestion flow.  To accomplish this, you will have to rely on 
Process Group variables and flow file attributes to carry the per-table details.
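
As a rough illustration of that split (this is a plain Python model of the 
idea, not NiFi code): each Acquisition flow stamps its flow files with 
attributes, and the shared Ingestion flow looks only at those attributes.  The 
attribute names "tablename" and "schema.name", the FlowFile class, and both 
helper functions are assumptions made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy stand-in for a NiFi flow file: content plus string attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def acquire(table: str, rows_json: bytes) -> FlowFile:
    """Model of one per-table Acquisition flow: tag the data with attributes."""
    return FlowFile(
        content=rows_json,
        attributes={"tablename": table, "schema.name": table},
    )


def ingest_target(flow_file: FlowFile, target_schema: str = "target") -> str:
    """Model of the shared Ingestion flow: derive the target table name
    from the flow file's attributes alone, with no per-table logic."""
    return f"{target_schema}.{flow_file.attributes['tablename']}"


flow_file = acquire("orders", b"[]")
print(ingest_target(flow_file))
```

In NiFi itself the equivalent would be to reference the attribute through 
Expression Language, for example ${tablename}, in a processor property that 
supports it.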

I think your answer is automation.  NiFi provides a REST 
API<https://nifi.apache.org/docs/nifi-docs/rest-api/index.html> for any action 
you can take in the UI.  This means you can write automation software that 
analyzes the data sources and builds the NiFi flows dynamically.
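
To give a feel for what that could look like, here is a minimal sketch that 
prepares one "create process group" request per table using only the Python 
standard library.  The base URL, the parent group id, the table names, and the 
"acquire_" naming scheme are all placeholders I made up; nothing is sent until 
you pass a request to urllib.request.urlopen, and a secured NiFi instance 
would also need authentication, which is not shown.

```python
import json
import urllib.request

NIFI_API = "http://localhost:8080/nifi-api"  # assumed base URL of the NiFi REST API


def process_group_payload(name: str, x: float = 0.0, y: float = 0.0) -> dict:
    """Build the JSON entity for a new process group (new components start at revision 0)."""
    return {
        "revision": {"version": 0},
        "component": {"name": name, "position": {"x": x, "y": y}},
    }


def build_create_request(parent_id: str, table: str, index: int) -> urllib.request.Request:
    """Prepare (but do not send) a POST that creates one per-table process group."""
    payload = process_group_payload(f"acquire_{table}", x=0.0, y=index * 200.0)
    return urllib.request.Request(
        url=f"{NIFI_API}/process-groups/{parent_id}/process-groups",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical table list; in practice this would come from the source
# database's metadata.
tables = ["customers", "orders", "invoices"]
pending = [build_create_request("PARENT_GROUP_ID", t, i) for i, t in enumerate(tables)]

for req in pending:
    print(req.get_method(), req.full_url)
    # To actually create the group: urllib.request.urlopen(req)
```

The same loop would work for 200 tables, and a later change to the flow 
becomes a change to the script rather than 200 edits in the UI.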

You do not have to start from scratch: you can make use of this Python API for 
managing NiFi<https://pypi.org/project/nipyapi/>.  Or you could go with a more 
fully developed automation approach provided by Kylo<http://kylo.io/>.

Good luck,
Paul Gibeault

From: Kuhfahl, Bob [mailto:[email protected]]
Sent: Friday, August 17, 2018 7:55 AM
To: [email protected]
Subject: [EXT] Design pattern advice needed

Problem:

  *   Source database with over 200 tables.
  *   The NiFi ‘system’ we are developing can extract data from those 200 
tables into NiFi flows of JSON-formatted data: essentially a separate flow for 
each table, with an attribute that indicates the table name and other useful 
attributes, but NOT the schema.
  *   Do some data transforms, and prepare it for target database load.  This 
is where I am struggling.
  *   Large volume of data so we need to batch load using PutDatabaseRecord.
  *   PutDatabaseRecord record readers such as JsonPathReader need attributes 
defined for each element in the data – I’d need to define over 200 instances of 
PutDatabaseRecord and route based on the tablename.  Not.
  *   AvroReader seems almost a natural fit: I can InferAvroSchema from the 
JSON, but I’m not finding an easy way to convert the JSON to Avro…
  *   CSVReader seems like the only other choice but the manual conversion of 
formats might also be a pain…

Thoughts on solutions?
