Hi guys, I could really use some advice. We need to replicate 300 tables from 3 Oracle DBs into Apache Kudu. I'm thinking about doing this:
OracleDB1 -->
OracleDB2 --> Oracle GoldenGate --> Kafka --> NiFi (3-node cluster) --> Kudu
OracleDB3 -->

GoldenGate will stream changes to the 300 tables in near real-time to Kafka (I am not sure yet whether I want a single topic or one topic per table). NiFi will process the messages from Kafka and upsert the data into Kudu.

Now to my question/challenge. Every table has a bunch of columns, and I need to apply certain rules before I move records into Kudu:

1) adjust the timezone for certain columns
2) convert data types
3) merge records from the 3 Oracle DBs into one single table and order the final table's columns in a specific way
4) drop certain columns

Since there are 900 tables total involved, I do not want to hardcode any of these rules, so I already built a little tool that generates rules for table columns based on some logic, and I store these rules in a MySQL database. (There is a sketch of the kind of rule application I mean in the P.S. below.)

Technically I can now have NiFi listen to the Kafka topic(s), fetch these rules from the MySQL tables, build the transformation logic dynamically (either with the newer Record processors or with Groovy scripting), and form a Kudu upsert statement (or use the PutKudu processor).

My concern: since there are 900 tables and the pipeline runs in real time, I would be making tons of requests to our MySQL database. Would that hold up, or do I need a caching solution with NiFi (Redis?) to hold these rules? (The second sketch in the P.S. shows the kind of in-process cache I'm considering.)

Appreciate any guidance!

Boris
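P.S. To make the rules part concrete, here is a minimal Groovy sketch of the per-column rule application I have in mind. The rule fields and the TZ_SHIFT/CAST/DROP/KEEP names are made up for illustration (my actual tool writes a different format to MySQL), and it assumes timestamps arrive as ISO-8601 strings:

import java.time.Instant
import java.time.temporal.ChronoUnit

// One row from the (hypothetical) MySQL rules table.
class ColumnRule {
    String column         // source column name
    String ruleType       // 'TZ_SHIFT', 'CAST', 'DROP', or 'KEEP'
    String targetType     // for CAST, e.g. 'LONG'
    Integer tzOffsetHours // for TZ_SHIFT
    Integer sortOrder     // final column position in Kudu
}

// Applies the rules to one record (a column-name -> value map) and
// returns a new map whose key order already matches the Kudu layout.
Map<String, Object> applyRules(Map<String, Object> record, List<ColumnRule> rules) {
    def out = new LinkedHashMap<String, Object>()
    rules.sort(false) { it.sortOrder }.each { rule ->
        if (rule.ruleType == 'DROP') return    // column is not forwarded
        def value = record[rule.column]
        if (rule.ruleType == 'TZ_SHIFT' && value != null) {
            // Assumes an ISO-8601 instant, e.g. '2024-01-01T00:00:00Z'.
            value = Instant.parse(value as String)
                           .plus(rule.tzOffsetHours, ChronoUnit.HOURS)
                           .toString()
        } else if (rule.ruleType == 'CAST' && value != null) {
            value = (rule.targetType == 'LONG') ? (value as Long) : (value as String)
        }
        out[rule.column] = value
    }
    return out
}

// Example: shift CREATED_AT by +3 hours, cast AMOUNT to long, drop TEMP_COL.
def rules = [
    new ColumnRule(column: 'ID',         ruleType: 'KEEP',     sortOrder: 1),
    new ColumnRule(column: 'CREATED_AT', ruleType: 'TZ_SHIFT', tzOffsetHours: 3, sortOrder: 2),
    new ColumnRule(column: 'AMOUNT',     ruleType: 'CAST',     targetType: 'LONG', sortOrder: 3),
    new ColumnRule(column: 'TEMP_COL',   ruleType: 'DROP',     sortOrder: 4)
]
def record = [ID: 42, CREATED_AT: '2024-01-01T00:00:00Z', AMOUNT: '199', TEMP_COL: 'x']
assert applyRules(record, rules) ==
       [ID: 42, CREATED_AT: '2024-01-01T03:00:00Z', AMOUNT: 199L]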

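And this is roughly the in-process cache I'm weighing against Redis, assuming a hypothetical rules table named column_rules(table_name, column_name, rule_type, target_type, tz_offset_hours, sort_order):

import groovy.sql.Sql
import java.util.concurrent.ConcurrentHashMap

class RulesCache {
    static final long TTL_MS = 5 * 60 * 1000L     // refresh window

    private static class Entry {
        long loadedAt
        List<Map> rules
    }

    private static final Map<String, Entry> cache = new ConcurrentHashMap<>()

    // One MySQL query per table per TTL window instead of one per message.
    static List<Map> rulesFor(String tableName, Sql sql) {
        long now = System.currentTimeMillis()
        def entry = cache[tableName]
        if (entry != null && now - entry.loadedAt < TTL_MS) {
            return entry.rules                    // served from memory
        }
        List<Map> rules = sql.rows(
            'SELECT column_name, rule_type, target_type, tz_offset_hours, sort_order ' +
            'FROM column_rules WHERE table_name = ? ORDER BY sort_order',
            [tableName])
        cache[tableName] = new Entry(loadedAt: now, rules: rules)
        return rules
    }
}

// Usage inside a script body (connection details are placeholders):
def sql = Sql.newInstance('jdbc:mysql://mysql-host:3306/rules_db',
                          'nifi', 'secret', 'com.mysql.cj.jdbc.Driver')
def rules = RulesCache.rulesFor('DB1.ORDERS', sql)

My understanding is that ExecuteScript compiles the script once and reuses it, so the static cache would persist across flow files on each node; each of the 3 NiFi nodes would then keep its own in-heap copy of the rules, which for ~900 tables should be tiny. Redis would mainly buy me a cache shared across the nodes. Does that reasoning hold?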