Re: NiFi for backup solution

2016-10-13 Thread Matt Burgess
Rai, There are incremental data movement processors in NiFi depending on your source/target. For example, if your sources are files, you can use ListFile in combination with FetchFile; the former will keep track of which files it has found thus far, so if you put new files into the location (or
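
The "keep track of which files it has found thus far" idea can be illustrated with a minimal sketch. This is not NiFi's ListFile implementation; the class name IncrementalLister, the method listNew, and the choice of a single "newest modification time" as the remembered state are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: NOT NiFi's ListFile code. It mimics the idea of
// remembering how far a previous listing got so later runs return only new files.
class IncrementalLister {
    // Hypothetical state: newest modification time (in ms) seen so far.
    private long lastSeenMillis = -1L;

    // Returns regular files modified strictly after the remembered timestamp,
    // then advances the timestamp to the newest file returned.
    // Caveat: a file whose timestamp equals the remembered one is skipped.
    List<Path> listNew(Path dir) throws IOException {
        List<Path> fresh = new ArrayList<>();
        long newest = lastSeenMillis;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (!Files.isRegularFile(p)) {
                    continue;
                }
                long modified = Files.getLastModifiedTime(p).toMillis();
                if (modified > lastSeenMillis) {
                    fresh.add(p);
                    newest = Math.max(newest, modified);
                }
            }
        }
        lastSeenMillis = newest;
        return fresh;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("incremental-demo");
        IncrementalLister lister = new IncrementalLister();
        Files.createFile(dir.resolve("a.txt"));
        // First call reports the new file; second call reports nothing new.
        System.out.println(lister.listNew(dir).size());
        System.out.println(lister.listNew(dir).size());
    }
}
```

In a real flow the remembered state would have to survive restarts, which this in-memory sketch does not attempt.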

Re: NiFi for backup solution

2016-10-13 Thread Joe Witt
Rai, NiFi can certainly be used for some data replication scenarios and quite often is. If you can treat the source like a continuous data source then there is some way to keep state about what has been pulled already, what has changed or needs yet to be pulled, and it can just keep running then

Re: NiFi for backup solution

2016-10-13 Thread Joe Witt
You'd only need to do that if you have strict ordering requirements like reading directly from a transaction log and replicating it. If yes, I'd skip NiFi unless you're also doing other cases with it. Sounds like Matt's path gets you going though, so that might work out just fine. Thanks Joe On Oct

Re: Book/Training for NiFi

2016-10-13 Thread Andy LoPresto
Hi Rai, There are some excellent documents on the Apache NiFi site [1] to help you learn. There is an Administrator Guide [2], a User Guide [3], a Developer Guide [4], a NiFi In-Depth document [5], an Expression Language Guide [6] and processor and component documentation [7] as well.

Re: Book/Training for NiFi

2016-10-13 Thread Gop Krr
Thanks Andy. Appreciate your guidance. On Thu, Oct 13, 2016 at 10:39 AM, Andy LoPresto wrote: > Hi Rai, > > There are some excellent documents on the Apache NiFi site [1] to help you > learn. There is an Administrator Guide [2], a User Guide [3], a Developer > Guide [4], a

Book/Training for NiFi

2016-10-13 Thread Gop Krr
Hi All, Is there any book on Apache NiFi? Also, does Hortonworks conduct training for NiFi? Thanks Rai

GetKafka maximum fetch size

2016-10-13 Thread Igor Kravzov
Hi, I am getting the following exception in nifi-0.6.1: kafka.common.MessageSizeTooLargeException: Found a message larger than the maximum fetch size of this consumer. Increase the fetch size, or decrease the maximum message size the broker will allow. What is the max size? How can I increase

PutDynamoDB processor

2016-10-13 Thread Gop Krr
Hi All, I have been trying to use the get and load processors for DynamoDB and I am almost there. I am able to run the get processor and I see data is flowing :) But I see the following error in my nifi-app.log file: 2016-10-13 18:02:38,823 ERROR [Timer-Driven Process Thread-9]

Re: GetKafka maximum fetch size

2016-10-13 Thread Jeremy Farbota
Igor, Kafka consumer properties can be found here: http://kafka.apache.org/documentation.html#consumerconfigs GetKafka uses the old consumer so the consumer property is: fetch.message.max.bytes The default for that property is ~1M. If possible, you should limit the replica.fetch.max.bytes on
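
The property named above can be shown as a small sketch of old-consumer configuration. The class name GetKafkaFetchSize, the helper withFetchSize, and the 2 MiB value are hypothetical; how the property reaches GetKafka in a given NiFi version is not shown here and should be checked against that processor's documentation.

```java
import java.util.Properties;

// Illustrative only: builds the consumer property discussed in the thread.
class GetKafkaFetchSize {
    // Old-consumer property name, as given in the message above.
    static final String FETCH_SIZE_KEY = "fetch.message.max.bytes";

    // Builds consumer properties with a caller-chosen fetch size in bytes.
    static Properties withFetchSize(int bytes) {
        if (bytes <= 0) {
            throw new IllegalArgumentException("fetch size must be positive");
        }
        Properties props = new Properties();
        props.setProperty(FETCH_SIZE_KEY, Integer.toString(bytes));
        return props;
    }

    public static void main(String[] args) {
        // 2 MiB is an arbitrary example value, not a recommendation.
        Properties props = withFetchSize(2 * 1024 * 1024);
        System.out.println(props.getProperty(FETCH_SIZE_KEY));
    }
}
```

Whatever value is chosen needs to be at least as large as the biggest message the broker will accept, which is the condition the exception message describes.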

Re: PutDynamoDB processor

2016-10-13 Thread James Wing
Rai, The GetDynamoDB processor requires a hash key value to look up an item in the table. The default setting is an Expression Language statement that reads the hash key value from a flowfile attribute, dynamodb.item.hash.key.value. But this is not required. You can change it to any attribute

Re: PutDynamoDB processor

2016-10-13 Thread Gop Krr
Thanks James. I am looking to iterate through the table so that it takes hash key values one by one. Can I achieve that through the Expression Language? If I write a script to do that, how do I pass it to my processor? Thanks Niraj On Thu, Oct 13, 2016 at 1:42 PM, James Wing

Re: Rest API Client swagger.json

2016-10-13 Thread Matt Gilman
Thanks for submitting the PR Stephane! I see that Andy has already stated that he's reviewing. Thanks Andy! On Thu, Oct 13, 2016 at 7:42 PM, Stéphane Maarek wrote: > Investigated some more, opened a JIRA issue, closed it via > https://github.com/apache/nifi/pull/1135 >

Re: Best practices for pushing to production

2016-10-13 Thread Andy LoPresto
Hi Stéphane, This is a request that has grown popular recently. NiFi was not initially designed with environment promotion in mind, so it is something we are currently investigating and trying to address. The development/QA/production environment promotion process [1] (sometimes referred to

Re: Rest API Client swagger.json

2016-10-13 Thread Stéphane Maarek
Investigated some more, opened a JIRA issue, closed it via https://github.com/apache/nifi/pull/1135 On Fri, Oct 14, 2016 at 9:47 AM Stéphane Maarek wrote: > Hi, > > Thanks, it helps! Good to know there is already a Java client I could use. > Nonetheless I think it would

Re: Rest API Client swagger.json

2016-10-13 Thread Andy LoPresto
Stéphane asked a question on the PR but as it was already closed, I wanted to reproduce it here for visibility and to see if other community members had something to add: Stéphane: good stuff. Quick question, what do you think of NiFi automating the build and release of API clients in various

Re: Push x Pull ETL

2016-10-13 Thread Jeff
Great to hear, Marcio! On Thu, Oct 13, 2016 at 9:26 PM Márcio Faria wrote: > Jeff, > > Many thanks. I'm now more confident NiFi could be a good fit for us. > > Marcio > > > On Wednesday, October 12, 2016 9:06 PM, Jeff wrote: > > > Hello Marcio, > >

Re: Rest API Client swagger.json

2016-10-13 Thread Stéphane Maarek
Thanks, FYI, I've started to host my own swagger-codegen generated Java Client on my github: https://github.com/simplesteph/nifi-api-client-java . Check out the docs! If you want to start playing and get a feel for it: public static void main(String[] args) { ApiClient apiClient = new

Re: Push x Pull ETL

2016-10-13 Thread Márcio Faria
Jeff, Many thanks. I'm now more confident NiFi could be a good fit for us. Marcio On Wednesday, October 12, 2016 9:06 PM, Jeff wrote: Hello Marcio, You're asking on the right list! Based on the scenario you described, I think NiFi would suit your needs.  To address

Nifi hardware recommendation

2016-10-13 Thread Ali Nazemian
Dear NiFi users/developers, Hi, I was wondering whether there is any benchmark on whether it is better to give NiFi direct control of the disks or to use RAID for this purpose. For example, which of these scenarios is recommended from a performance point of view? Scenario 1: 24 disks in total 2

Re: Nifi hardware recommendation

2016-10-13 Thread Joe Witt
Ali, You have a lot of nice resources to work with there. Personally, I'd recommend the series of RAID-1 configurations, provided you keep in mind that this means you can only lose a single disk for any one partition. As long as they're being monitored and would be quickly replaced, this in practice

Re: Nifi hardware recommendation

2016-10-13 Thread Joe Witt
Ali, I agree with your assumption. It would be great to test that out and provide some numbers but intuitively I agree. I could envision certain scatter/gather data flows that could challenge that sequential access assumption but honestly with how awesome disk caching is in Linux these days in

Re: Rest API Client swagger.json

2016-10-13 Thread Matt Gilman
Stephane, Yes, you are correct that Apache NiFi uses swagger. However, we are only using it for keeping the documentation in sync. We use a maven plugin that inspects the swagger annotations and generates a swagger.json. The swagger.json is generated to nifi-web-api/target/swagger-ui/swagger.json

Re: Nifi hardware recommendation

2016-10-13 Thread Ali Nazemian
Dear Joe, Thank you very much. That was a really great explanation. I investigated the NiFi architecture, and it seems that most of the read/write operations for the flow file repo and provenance repo are random. However, for the content repo most of the read/write operations are sequential. Let's say

Re: Nifi hardware recommendation

2016-10-13 Thread Ali Nazemian
Thank you very much. I would be more than happy to provide some benchmark results after the implementation. Sincerely yours, Ali On Thu, Oct 13, 2016 at 11:32 PM, Joe Witt wrote: > Ali, > > I agree with your assumption. It would be great to test that out and > provide some