Thanks Mike. ExecuteSQL looks good and I am trying it. Also, I wanted to understand how we can control triggering NiFi jobs from DevOps tools like CloudBees/ElectricFlow?
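
The rough idea I had so far (not tried yet, so treat the details as guesses for our environment) was to wrap the flow in a process group and have the CloudBees/ElectricFlow step call the NiFi REST API to start and stop it, along these lines:

import requests

# Minimal sketch, untested: start/stop a NiFi process group from a CI/CD step.
# nifi-host, the port and the process-group id are placeholders for our setup;
# a secured cluster would also need an Authorization: Bearer <token> header.
NIFI_API = "http://nifi-host:8080/nifi-api"
PG_ID = "<process-group-id>"

def set_process_group_state(state):
    # PUT /flow/process-groups/{id} schedules every component in the group
    # ("RUNNING" starts them, "STOPPED" stops them)
    resp = requests.put(
        f"{NIFI_API}/flow/process-groups/{PG_ID}",
        json={"id": PG_ID, "state": state},
    )
    resp.raise_for_status()

set_process_group_state("RUNNING")   # kick off the ingest flow
# ...the CI job could poll the queues before shutting things down again...
set_process_group_state("STOPPED")

Is that a reasonable pattern for kicking off flows from an external scheduler, or is there a more standard approach?
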
On Tue, Aug 13, 2019 at 7:35 AM Mike Thomsen <[email protected]> wrote:

> Bimal,
>
> 1. Take a look at ExecuteSQLRecord and see if that works for you. I don't
> use SQL databases that much, but it works like a charm for me and others
> for querying and getting an inferred Avro schema based on the schema of
> the database table (you can massage it into another format with
> ConvertRecord).
> 2. Take a look at QueryRecord and PartitionRecord with them configured to
> use Avro readers and writers.
>
> Mike
>
> On Tue, Aug 13, 2019 at 12:25 AM Bimal Mehta <[email protected]> wrote:
>
>> Hi NiFi users,
>>
>> We had been using the Kylo data ingest template to read data from our
>> Oracle and DB2 databases and move it into HDFS and Hive.
>> The Kylo data ingest template also provided some features to validate,
>> profile and split the data based on validation rules. We also built some
>> custom processors and added them to the template.
>> We recently migrated to NiFi 1.9.0 (CDF), and a lot of the Kylo
>> processors don't work there. We were able to make our custom processors
>> work in 1.9.0, but the Kylo NAR files don't. I don't know if any
>> workaround exists for that.
>>
>> However, given that the Kylo project is dead, I don't want to depend on
>> those Kylo NAR files and processors. What I wanted to understand is how
>> I can replicate that functionality using the standard processors
>> available in NiFi.
>>
>> Essentially, are there processors that allow me to do the below?
>> 1. Read data from a database - I know QueryDatabaseTable. Any other? How
>> do I parameterize it so that I don't need to create one flow per table?
>> How can we pass the table name while running the job?
>> 2. Partition and convert to Avro - I know SplitAvro, but does it
>> partition as well, and how do I pass the partition parameters?
>> 3. Write data to HDFS and Hive - I know PutHDFS works for writing to
>> HDFS, but should I use PutSQL for Hive by converting the Avro from step
>> 2 to SQL? Or is there a better option? Does this support upserts as
>> well?
>> 4. Apply validation rules to the data before it is written into Hive,
>> e.g. calling a custom Spark job that executes the validation rules and
>> splits the data. Any processor that can help achieve this?
>>
>> I know a few users in this group have used Kylo on top of NiFi. It would
>> be great if some of you could provide your perspective as well.
>>
>> Thanks in advance.
>>
>> Bimal Mehta
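
Following up on my own question 1 below: based on the ExecuteSQLRecord suggestion, the direction I'm testing is to drive the table name from a flowfile attribute instead of hard-coding one flow per table. If I'm reading the docs right, QueryDatabaseTable doesn't accept incoming flowfiles, which is why I'm leaning towards ExecuteSQLRecord here. Rough configuration sketch (the attribute name comes from ListDatabaseTables; a GenerateFlowFile or REST call setting the same attribute could run a single table on demand):

ListDatabaseTables -> ExecuteSQLRecord -> downstream (ConvertRecord, PutHDFS, ...)

ExecuteSQLRecord
  SQL select query : SELECT * FROM ${db.table.name}
  Record Writer    : AvroRecordSetWriter

If the Expression Language doesn't resolve the way I expect there, corrections welcome.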

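Also, for the validation/split part (question 4), I plan to try QueryRecord with an Avro reader and writer as Mike suggested. If I understand it correctly, each user-defined property becomes an outgoing relationship, so a simple column-level rule could look like this (the column name is made up for illustration):

valid   : SELECT * FROM FLOWFILE WHERE customer_id IS NOT NULL
invalid : SELECT * FROM FLOWFILE WHERE customer_id IS NULL

The 'invalid' relationship could then be routed to a reject location while 'valid' continues on to HDFS/Hive. Anything more involved, like the Spark-based rules we had in Kylo, would presumably still need ExecuteStreamCommand (spark-submit) or the Livy-backed ExecuteSparkInteractive, but the simple checks look doable in QueryRecord itself.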