Re: NiFi for backup solution
You'd only need to do that if you have strict ordering requirements, like reading directly from a transaction log and replicating it. If so, I'd skip NiFi unless you're also using it for other cases. It sounds like Matt's path gets you going, though, so that might work out just fine.

Thanks
Joe

On Oct 13, 2016 11:25 AM, "Gop Krr" wrote:
> @Joe, based on your comment, I need to use NiFi as a producer which puts
> the data on Kafka queue and then have NiFi consumer, which writes the data
> back to the destination. Is my understanding correct?
Re: NiFi for backup solution
Thanks Joe and Matt.

@Joe, based on your comment, I need to use NiFi as a producer that puts the data on a Kafka queue, and then a NiFi consumer that writes the data to the destination. Is my understanding correct?

@Matt, my use case is for DynamoDB. I will look into whether incremental copy is supported for DynamoDB.

Thanks again. It felt so good to see the vibrant community; I got my questions answered within five minutes. Kudos to the NiFi community.
Re: NiFi for backup solution
Rai,

There are incremental data movement processors in NiFi, depending on your source/target. For example, if your sources are files, you can use ListFile in combination with FetchFile. ListFile keeps track of which files it has found thus far, so if you put new files into the location (or update existing ones), only those new/updated files will be processed the next time.

For database (RDBMS) sources, there are the QueryDatabaseTable and GenerateTableFetch processors, which support the idea of "maximum value columns". For each such column, the processor keeps track of the maximum value observed in that column. On future executions, only rows whose values in those columns exceed the currently observed maximum are retrieved, then the maximum is updated, and so forth.

The Usage documentation for these processors can be found at https://nifi.apache.org/docs.html (left-hand side under Processors).

Regards,
Matt
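For readers who want to see the "maximum value column" idea in isolation, here is a minimal sketch in plain Python with SQLite. It is not NiFi's implementation and does not use any NiFi API; the function name `fetch_new_rows`, the `state` dictionary, and the `orders` table are hypothetical names chosen only for illustration.

```python
import sqlite3


def fetch_new_rows(conn, table, max_value_column, state):
    """Return rows whose max_value_column exceeds the last observed maximum.

    `state` is a plain dict standing in for whatever persistent state a real
    tool would keep between runs (hypothetical, for illustration only).
    `table` and `max_value_column` are assumed to come from trusted
    configuration, since they are interpolated into the SQL text.
    """
    last_max = state.get(max_value_column)
    if last_max is None:
        # First run: nothing has been observed yet, so fetch everything.
        cur = conn.execute(
            f"SELECT * FROM {table} ORDER BY {max_value_column}"
        )
    else:
        # Later runs: fetch only rows beyond the stored maximum.
        cur = conn.execute(
            f"SELECT * FROM {table} WHERE {max_value_column} > ? "
            f"ORDER BY {max_value_column}",
            (last_max,),
        )
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    if rows:
        # Remember the new maximum for the next execution.
        state[max_value_column] = max(r[max_value_column] for r in rows)
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.executemany(
        "INSERT INTO orders (id, item) VALUES (?, ?)", [(1, "a"), (2, "b")]
    )
    state = {}
    print(fetch_new_rows(conn, "orders", "id", state))  # first run: all rows
    conn.execute("INSERT INTO orders (id, item) VALUES (3, 'c')")
    print(fetch_new_rows(conn, "orders", "id", state))  # only the new row
```

Note that this approach only picks up rows whose tracked column grows, such as an auto-increment id or a last-modified timestamp. Updates that leave the column unchanged, and deletes, are not detected.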
Re: NiFi for backup solution
Rai,

NiFi can certainly be used for some data replication scenarios and quite often is. If you can treat the source like a continuous data source, there is some way to keep state about what has been pulled already and what has changed or still needs to be pulled, and the flow can just keep running, then generally speaking it will work out well.

Depending on how the flow is set up, the error conditions that can occur in remote delivery, and the cluster topology, NiFi may not be able to ensure that the order in which data is received is the order in which it is delivered. So, if you need to ensure data is copied in precisely the same order (like log replication) and each object/message/event is on the order of KBs in size, then I'd recommend looking at Apache Kafka and Kafka Connect's support for keeping things ordered within the same partition of the same topic.

Thanks
Joe

On Thu, Oct 13, 2016 at 11:05 AM, Gop Krr wrote:
> Hi All,
> I am learning NiFi as well as trying to deploy it in production for few use
> cases. One of the use case is ETL and another use case is, using NiFi as a
> backup solution, where it takes the data from one source and moves to
> another database|file. Is anyone using NiFi for this purpose? Does NiFi
> support incremental data move?
> It would be awesome if someone can point me to right documentation.
> Thanks
> Rai
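The point about ordering "within the same partition of the same topic" can be illustrated with a toy model in plain Python. This is only a sketch of the concept and does not use a Kafka client. `ToyTopic` and `partition_for` are hypothetical names, and the CRC32-based partition choice is an assumption made for simplicity, not Kafka's actual default partitioner.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition deterministically (simplified stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class ToyTopic:
    """In-memory model of a topic: each partition is an append-only list."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key: str, value) -> int:
        # Records with the same key always go to the same partition,
        # so their relative order is preserved by the append-only list.
        p = partition_for(key, self.num_partitions)
        self.partitions[p].append((key, value))
        return p

    def read(self, partition: int):
        # A reader sees one partition's records in append order.
        return list(self.partitions[partition])


if __name__ == "__main__":
    topic = ToyTopic(num_partitions=3)
    for seq in range(5):
        # Interleave change events from two hypothetical source tables.
        topic.send("table-A", seq)
        topic.send("table-B", seq)
    p = partition_for("table-A", topic.num_partitions)
    print([v for k, v in topic.read(p) if k == "table-A"])
```

In this model, order is guaranteed only among records that share a partition. Records sent with different keys may land in different partitions, and nothing here defines an order across partitions.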