Re: Data Ingestion using NiFi

2019-08-13 Thread Mike Thomsen
One of the easiest ways to trigger flows in NiFi is to set up a message
queue processor listening to a queue, and then post an event to that queue
whenever you want to trigger the flow.
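
As a concrete sketch (the processor choice, broker address, topic name, and
payload here are illustrative assumptions, not details from the thread): if
the flow begins with a consumer such as ConsumeKafka listening on a trigger
topic, a CI/CD step in CloudBees or ElectricFlow could post the event with a
few lines of Python using kafka-python:

    # Post a trigger event to the topic the NiFi consumer processor is
    # listening on. Broker, topic, and payload are example values only.
    import json
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="kafka-broker:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    # The payload can carry parameters the flow reads downstream,
    # e.g. which table to ingest on this run.
    producer.send("nifi-triggers", {"table": "MY_SCHEMA.MY_TABLE"})
    producer.flush()

If a different broker is already in place, a processor such as ConsumeJMS
plays the same role; the point is simply that the flow sits idle until a
message arrives on the queue.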

On Tue, Aug 13, 2019 at 11:45 AM Bimal Mehta  wrote:

> Thanks Mike.
> ExecuteSQL looks good, and I am trying it.
>
> Also, I wanted to understand how we can trigger NiFi jobs from DevOps
> tools like CloudBees/ElectricFlow.
>
> On Tue, Aug 13, 2019 at 7:35 AM Mike Thomsen 
> wrote:
>
>> Bimal,
>>
>> 1. Take a look at ExecuteSQLRecord and see if that works for you. I don't
>> use SQL databases that much, but it works like a charm for me and others
>> for querying a table and getting output with an Avro schema inferred from
>> the schema of the database table (you can massage it into another format
>> with ConvertRecord).
>> 2. Take a look at QueryRecord and PartitionRecord, configured to use Avro
>> readers and writers.
>>
>> Mike
>>
>> On Tue, Aug 13, 2019 at 12:25 AM Bimal Mehta  wrote:
>>
>>> Hi NiFi users,
>>>
>>> We had been using the Kylo data ingest template to read data from our
>>> Oracle and DB2 databases and move it into HDFS and Hive.
>>> The Kylo data ingest template also provided some features to validate,
>>> profile, and split the data based on validation rules. We also built
>>> some custom processors and added them to the template.
>>> We recently migrated to NiFi 1.9.0 (CDF), and a lot of the Kylo
>>> processors don't work there. We were able to make our custom processors
>>> work in 1.9.0, but the Kylo NAR files don't work, and I don't know if
>>> any workaround exists for that.
>>>
>>> However, given that the Kylo project is dead, I don't want to depend on
>>> those Kylo NAR files and processors. What I want to understand is how to
>>> replicate that functionality using the standard processors available in
>>> NiFi.
>>>
>>> Essentially, are there processors that allow me to do the following:
>>> 1. Read data from a database - I know QueryDatabaseTable. Are there
>>> others? How do I parameterize it so that I don't need to create one
>>> flow per table? How can we pass the table name when running the job?
>>> 2. Partition and convert to Avro - I know SplitAvro, but does it also
>>> partition, and how do I pass the partition parameters?
>>> 3. Write data to HDFS and Hive - I know PutHDFS works for writing to
>>> HDFS, but should I use PutSQL for Hive by converting the Avro from step
>>> 2 to SQL? Or is there a better option? Does this support upserts as
>>> well?
>>> 4. Apply validation rules to the data before it is written into Hive,
>>> for example by calling a custom Spark job that executes the validation
>>> rules and splits the data. Is there a processor that can help achieve
>>> this?
>>>
>>> I know a few users in this group have used Kylo on top of NiFi. It
>>> would be great if some of you could share your perspective as well.
>>>
>>> Thanks in advance.
>>>
>>> Bimal Mehta
>>>
>>
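
To connect Mike's suggestions above to the first two questions, here is a
rough sketch of one way the pieces could fit together. All property values,
attribute names, and field names below are illustrative assumptions rather
than settings from this thread; check the processor documentation for your
NiFi version:

    UpdateAttribute (or the trigger message)  sets  db.table.name = MY_TABLE
    ExecuteSQLRecord
        Database Connection Pooling Service : your DBCPConnectionPool
        SQL select query                    : SELECT * FROM ${db.table.name}
        Record Writer                       : AvroRecordSetWriter
    QueryRecord              (route or filter records with SQL)
        Record Reader             : AvroReader
        Record Writer             : AvroRecordSetWriter
        valid (dynamic property)  : SELECT * FROM FLOWFILE WHERE ID IS NOT NULL
    PartitionRecord          (group records by a field value)
        Record Reader             : AvroReader
        Record Writer             : AvroRecordSetWriter
        region (dynamic property) : /REGION

Because the SQL select query in ExecuteSQLRecord supports Expression
Language against incoming flowfile attributes, a single flow can serve many
tables by passing the table name in as an attribute, for example from the
trigger message shown earlier.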

