Thanks Mike. ExecuteSQL looks good and I am trying it. Also, I wanted to understand how we can control triggering NiFi jobs from DevOps tools like CloudBees/ElectricFlow?
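
The rough idea I had so far (not tried yet, so treat the details as guesses for our environment) was to wrap the flow in a process group and have the CloudBees/ElectricFlow step call the NiFi REST API to start and stop it, along these lines:

import requests

# Minimal sketch, untested: start/stop a NiFi process group from a CI/CD step.
# nifi-host, the port and the process-group id are placeholders for our setup;
# a secured cluster would also need an Authorization: Bearer <token> header.
NIFI_API = "http://nifi-host:8080/nifi-api"
PG_ID = "<process-group-id>"

def set_process_group_state(state):
    # PUT /flow/process-groups/{id} schedules every component in the group
    # ("RUNNING" starts them, "STOPPED" stops them)
    resp = requests.put(
        f"{NIFI_API}/flow/process-groups/{PG_ID}",
        json={"id": PG_ID, "state": state},
    )
    resp.raise_for_status()

set_process_group_state("RUNNING")   # kick off the ingest flow
# ...the CI job could poll the queues before shutting things down again...
set_process_group_state("STOPPED")

Is that a reasonable pattern for kicking off flows from an external scheduler, or is there a more standard approach?
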
On Tue, Aug 13, 2019 at 7:35 AM Mike Thomsen <[email protected]> wrote:

> Bimal,
>
> 1. Take a look at ExecuteSQLRecord and see if that works for you. I don't
> use SQL databases that much, but it works like a charm for me and others
> for querying and getting an inferred Avro schema based on the schema of
> the database table (you can massage it into another format with
> ConvertRecord).
> 2. Take a look at QueryRecord and PartitionRecord with them configured to
> use Avro readers and writers.
>
> Mike
>
> On Tue, Aug 13, 2019 at 12:25 AM Bimal Mehta <[email protected]> wrote:
>
>> Hi NiFi users,
>>
>> We had been using the Kylo data ingest template to read data from our
>> Oracle and DB2 databases and move it into HDFS and Hive.
>> The Kylo data ingest template also provided some features to validate,
>> profile and split the data based on validation rules. We also built some
>> custom processors and added them to the template.
>> We recently migrated to NiFi 1.9.0 (CDF), and a lot of the Kylo
>> processors don't work there. We were able to make our custom processors
>> work in 1.9.0, but the Kylo NAR files don't. I don't know if any
>> workaround exists for that.
>>
>> However, given that the Kylo project is dead, I don't want to depend on
>> those Kylo NAR files and processors. What I wanted to understand is how
>> I can replicate that functionality using the standard processors
>> available in NiFi.
>>
>> Essentially, are there processors that allow me to do the below?
>> 1. Read data from a database - I know QueryDatabaseTable. Any other? How
>> do I parameterize it so that I don't need to create one flow per table?
>> How can we pass the table name while running the job?
>> 2. Partition and convert to Avro - I know SplitAvro, but does it
>> partition as well, and how do I pass the partition parameters?
>> 3. Write data to HDFS and Hive - I know PutHDFS works for writing to
>> HDFS, but should I use PutSQL for Hive by converting the Avro from step
>> 2 to SQL? Or is there a better option? Does this support upserts as
>> well?
>> 4. Apply validation rules to the data before it is written into Hive,
>> e.g. calling a custom Spark job that executes the validation rules and
>> splits the data. Any processor that can help achieve this?
>>
>> I know a few users in this group have used Kylo on top of NiFi. It would
>> be great if some of you could provide your perspective as well.
>>
>> Thanks in advance.
>>
>> Bimal Mehta
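
Following up on my own question 1 below: based on the ExecuteSQLRecord suggestion, the direction I'm testing is to drive the table name from a flowfile attribute instead of hard-coding one flow per table. If I'm reading the docs right, QueryDatabaseTable doesn't accept incoming flowfiles, which is why I'm leaning towards ExecuteSQLRecord here. Rough configuration sketch (the attribute name comes from ListDatabaseTables; a GenerateFlowFile or REST call setting the same attribute could run a single table on demand):

ListDatabaseTables -> ExecuteSQLRecord -> downstream (ConvertRecord, PutHDFS, ...)

ExecuteSQLRecord
  SQL select query : SELECT * FROM ${db.table.name}
  Record Writer    : AvroRecordSetWriter

If the Expression Language doesn't resolve the way I expect there, corrections welcome.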

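Also, for the validation/split part (question 4), I plan to try QueryRecord with an Avro reader and writer as Mike suggested. If I understand it correctly, each user-defined property becomes an outgoing relationship, so a simple column-level rule could look like this (the column name is made up for illustration):

valid   : SELECT * FROM FLOWFILE WHERE customer_id IS NOT NULL
invalid : SELECT * FROM FLOWFILE WHERE customer_id IS NULL

The 'invalid' relationship could then be routed to a reject location while 'valid' continues on to HDFS/Hive. Anything more involved, like the Spark-based rules we had in Kylo, would presumably still need ExecuteStreamCommand (spark-submit) or the Livy-backed ExecuteSparkInteractive, but the simple checks look doable in QueryRecord itself.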