One of the easiest ways to trigger a flow in NiFi is to set up a message
queue processor listening on a queue and post an event to that queue
whenever you want the flow to run.
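
For example, something along these lines (a rough sketch, assuming a
ConsumeKafka processor subscribed to the topic and the kafka-python client
installed; the broker address and topic name are just placeholders):

    # Post a JSON trigger event to a Kafka topic; a ConsumeKafka processor
    # in NiFi listening on the same topic kicks off the flow when it arrives.
    import json
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="broker-host:9092",  # placeholder broker address
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("nifi-trigger-events",
                  {"table": "CUSTOMERS", "action": "ingest"})
    producer.flush()

The same idea works from CloudBees/ElectricFlow: have the pipeline step
publish a message to the queue rather than calling NiFi directly.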

On Tue, Aug 13, 2019 at 11:45 AM Bimal Mehta <bimal...@gmail.com> wrote:

> Thanks Mike.
> ExecuteSQL looks good and I am trying it.
>
> Also, I wanted to understand how we can trigger NiFi jobs from DevOps
> tools like CloudBees/ElectricFlow.
>
> On Tue, Aug 13, 2019 at 7:35 AM Mike Thomsen <mikerthom...@gmail.com>
> wrote:
>
>> Bimal,
>>
>> 1. Take a look at ExecuteSQLRecord and see if that works for you. I don't
>> use SQL databases that much, but it works like a charm for me and others
>> for querying and getting an inferred Avro schema based on the schema of the
>> database table (you can massage it into another format with ConvertRecord).
>> 2. Take a look at QueryRecord and PartitionRecord, configured to use Avro
>> readers and writers.
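>>
>> For instance, once ExecuteSQLRecord has run, you can dump a flowfile's
>> content to a local file and inspect the inferred schema with a few lines
>> of Python (a rough sketch, assuming fastavro is installed; the file name
>> is just a placeholder):
>>
>>     # Read the Avro that ExecuteSQLRecord produced (flowfile content saved
>>     # locally as "result.avro") and print the schema inferred from the table.
>>     from fastavro import reader
>>
>>     with open("result.avro", "rb") as f:
>>         avro_reader = reader(f)
>>         print(avro_reader.writer_schema)  # inferred Avro schema
>>         for record in avro_reader:        # each row comes back as a dict
>>             print(record)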
>>
>> Mike
>>
>> On Tue, Aug 13, 2019 at 12:25 AM Bimal Mehta <bimal...@gmail.com> wrote:
>>
>>> Hi NiFi users,
>>>
>>> We had been using the Kylo data ingest template to read data from
>>> our Oracle and DB2 databases and move it into HDFS and Hive.
>>> The Kylo data ingest template also provided features to validate,
>>> profile, and split the data based on validation rules. We also built some
>>> custom processors and added them to the template.
>>> We recently migrated to NiFi 1.9.0 (CDF), and a lot of the Kylo processors
>>> don't work there. We were able to make our custom processors work in 1.9.0,
>>> but the Kylo NAR files don't work. I don't know if any workaround exists
>>> for that.
>>>
>>> However, given that the Kylo project is dead, I don't want to depend on
>>> those Kylo NAR files and processors. What I wanted to understand is how to
>>> replicate that functionality using the standard processors available in
>>> NiFi.
>>>
>>> Essentially, are there processors that allow me to do the below:
>>> 1. Read data from a database - I know QueryDatabaseTable. Any others? How
>>> do I parameterize it so that I don't need to create one flow per table?
>>> How can we pass the table name while running the job?
>>> 2. Partition and convert to Avro - I know SplitAvro, but does it also
>>> partition, and how do I pass the partition parameters?
>>> 3. Write data to HDFS and Hive - I know PutHDFS works for writing to
>>> HDFS, but should I use PutSQL for Hive by converting the Avro from step 2
>>> to SQL? Or is there a better option? Does this support upserts as well?
>>> 4. Apply validation rules to the data before it is written into Hive,
>>> e.g., by calling a custom Spark job that will execute the validation rules
>>> and split the data. Is there a processor that can help achieve this?
>>>
>>> I know a few users in this group have used Kylo on top of NiFi. It would
>>> be great if some of you could provide your perspective as well.
>>>
>>> Thanks in advance.
>>>
>>> Bimal Mehta
>>>
>>
