Re: NiFi for backup solution

2016-10-13 Thread Joe Witt
You'd only need to do that if you have strict ordering requirements, like
reading directly from a transaction log and replicating it.  If so, I'd
skip NiFi unless you're also using it for other use cases.

Sounds like Matt's path gets you going, though, so that might work out just
fine.

Thanks
Joe


Re: NiFi for backup solution

2016-10-13 Thread Gop Krr
Thanks Joe and Matt.
@Joe, based on your comment, I need to use NiFi as a producer that puts
the data on a Kafka queue, and then have a NiFi consumer that writes the data
to the destination. Is my understanding correct?

@Matt, my use case is for DynamoDB. I will look into whether
incremental copy is supported for DynamoDB.
Thanks again; it felt so good to see such a vibrant community. I got my
questions answered within five minutes. Kudos to the NiFi community.


Re: NiFi for backup solution

2016-10-13 Thread Matt Burgess
Rai,

NiFi has processors for incremental data movement, depending on
your source/target. For example, if your sources are files, you can
use ListFile in combination with FetchFile. ListFile keeps track
of which files it has found thus far, so if you put new files into the
location (or update existing ones), only those new/updated files will
be processed the next time.
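
To illustrate the idea of tracking which files have already been seen, here
is a minimal Python sketch. It is not how ListFile is implemented; the
function name list_new_files and the plain dict used as state are
assumptions made purely for illustration.

```python
import os

def list_new_files(directory, state):
    """Return files that are new or modified since the previous call.

    `state` is a hypothetical dict mapping path -> last seen modification
    time; it stands in for whatever state a real listing component keeps.
    """
    new_or_updated = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories in this simple sketch
        mtime = os.stat(path).st_mtime_ns
        # A file qualifies if it was never seen or its mtime has changed.
        if state.get(path) != mtime:
            new_or_updated.append(path)
            state[path] = mtime
    return new_or_updated
```

Calling the function repeatedly with the same state dict returns only the
files added or touched since the previous call, which is the behaviour the
paragraph above describes.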

For database (RDBMS) sources, there are the QueryDatabaseTable and
GenerateTableFetch processors, which support the idea of "maximum
value columns". For each such column, the processor keeps track of
the maximum value observed so far. On future executions, only rows
whose values in those columns exceed the stored maximum are retrieved,
and then the maximum is updated, and so forth.
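
The maximum-value-column idea can be sketched in a few lines of Python
against SQLite. This is only an illustration of the technique, not the
processors' actual code; fetch_new_rows, the state dict, and the table and
column names in the usage note are all hypothetical, and the sketch handles
a single column only.

```python
import sqlite3

def fetch_new_rows(conn, table, max_value_column, state):
    """Fetch rows whose max_value_column exceeds the stored maximum.

    `state` is a hypothetical dict remembering the largest value seen so
    far. `table` and `max_value_column` are interpolated into the SQL, so
    they must be trusted identifiers, never user input.
    """
    last_max = state.get(max_value_column)
    if last_max is None:
        # First run: nothing recorded yet, so fetch everything.
        query = f"SELECT * FROM {table} ORDER BY {max_value_column}"
        params = ()
    else:
        query = (f"SELECT * FROM {table} "
                 f"WHERE {max_value_column} > ? ORDER BY {max_value_column}")
        params = (last_max,)
    cur = conn.execute(query, params)
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    if rows:
        # Remember the new maximum for the next execution.
        idx = columns.index(max_value_column)
        state[max_value_column] = max(row[idx] for row in rows)
    return rows
```

For example, with a table events(id INTEGER, payload TEXT) and "id" as the
maximum-value column, the first call returns every row and each later call
returns only rows inserted with a larger id.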

The Usage documentation for these processors can be found at
https://nifi.apache.org/docs.html (left-hand side under Processors).

Regards,
Matt


Re: NiFi for backup solution

2016-10-13 Thread Joe Witt
Rai,

NiFi can certainly be used for some data replication scenarios and
quite often is.  If you can treat the source like a continuous data
source, there is some way to keep state about what has been pulled
already and what has changed or still needs to be pulled, and the flow
can just keep running, then generally speaking it will work out well.
Depending on how the flow is set up, the error conditions that can occur
in remote delivery, and the cluster topology, NiFi won't be able to
ensure that the order in which data is received is the order in which it
is delivered.  So, if you need to ensure data is copied in precisely the
same order (like log replication) and each object/message/event is on the
order of KBs in size, then I'd recommend looking at Apache Kafka and
Kafka Connect's support for keeping things ordered within the same
partition of the same topic.
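
As a rough picture of why ordering holds within a partition, here is a
small Python simulation. It does not use a Kafka client; partition_for and
produce are hypothetical helpers, and CRC32 is only a stand-in for whatever
hash a real partitioner uses. The point is that records with the same key
land in the same partition and are appended in arrival order.

```python
import zlib
from collections import defaultdict

def partition_for(key, num_partitions):
    """Map a key to a partition index (CRC32 is an illustrative stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def produce(events, num_partitions):
    """Append (key, value) events to per-partition lists in arrival order."""
    partitions = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions
```

Within any one partition list, the values for a given key appear in the
order they were produced; no ordering is implied across different
partitions.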

Thanks
Joe

On Thu, Oct 13, 2016 at 11:05 AM, Gop Krr  wrote:
> Hi All,
> I am learning NiFi and trying to deploy it in production for a few use
> cases. One of the use cases is ETL; another is using NiFi as a
> backup solution, where it takes the data from one source and moves it to
> another database or file. Is anyone using NiFi for this purpose? Does NiFi
> support incremental data movement?
> It would be awesome if someone could point me to the right documentation.
> Thanks
> Rai