Re: Remote Processor Group error: Unable to communicate with remote NiFi cluster
Hi Kevin, I hope the latest version will solve your problems. Just a thought: I wonder if there's any round-robin behavior going on behind the DNS. It could produce intermittent failures. Please let us know if the issue persists after the update. Thanks, Koji On Fri, Dec 23, 2016 at 3:14 AM, Kevin Verhoeven wrote: > Thanks Koji, > > The VM IP addresses have not changed and the target URL is accessible. > However, I am using a DNS alias which allows me to change the name if the IP > changes. One thing that is odd is that flowfiles have moved through the > RemoteProcessGroup successfully, but intermittently it will stop. > > I will try the 1.1.0 update to test, or wait for the 1.1.1 update. This might > just solve my problem. > > Thanks for your advice, > > Kevin > > -Original Message- > From: Koji Kawamura [mailto:ijokaruma...@gmail.com] > Sent: Wednesday, December 21, 2016 10:53 PM > To: users@nifi.apache.org > Subject: Re: Remote Processor Group error: Unable to communicate with remote > NiFi cluster > > Hello Kevin, > > Sorry to hear that you got bitten by the issue. I think the 'Unable to find > remote process group with id ...' is the same one I encountered with NiFi > 1.0.0. > The issue has been solved since 1.1.0. > https://issues.apache.org/jira/browse/NIFI-2687 > > So, if you're able to upgrade the cluster to 1.1.0, you'll be able to update > the RemoteProcessGroup. > Please also note that the bug fix release 1.1.1 is under its voting process right > now. It should be released in the next few days. > I'd recommend waiting for 1.1.1 if you're going to update your cluster again. > > However, I remember that although I couldn't update the RemoteProcessGroup due to > NIFI-2687, NiFi was able to transfer data using the RemoteProcessGroup. So, it > seems there's another issue. > > Have the VMs' IP addresses (or hostnames) changed? Is the target URL used > by the RemoteProcessGroup still accessible? 
> > Thanks, > Koji > > On Thu, Dec 22, 2016 at 8:18 AM, Kevin Verhoeven > wrote: >> I am running a cluster of twelve nodes on four VMs, each node runs on >> a different port. This configuration worked great on version 0.7. >> However, after an update from 0.7 to 1.0 we started seeing errors and >> flowfiles stopped moving into the Remote Processor Group. We are using >> the Remote Processor Group to send flowfiles, generated on the Primary >> Node, back into the same cluster but distributed among the other nodes. >> >> >> >> The error we see on the Remote Processor Group: >> >> >> >> Unable to refresh Remote Group’s peers due to Unable to communicate >> with remote NiFi cluster in order to determine which nodes exist in >> the remote cluster. >> >> >> >> When I attempt to change the Remote Processor I see the following error: >> >> >> >> Unable to find remote process group with id >> '87f96bb0-0158-1000--c65b55ce'. >> >> >> >> Do you have any suggestions? >> >> >> >> Thanks, >> >> >> >> Kevin
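Koji's round-robin hypothesis is straightforward to check from the NiFi host. A minimal Python sketch (my addition, not something from the thread; the hostname is a placeholder for the DNS alias Kevin describes): resolve the name several times and see whether more than one address comes back.

```python
# Diagnostic sketch: collect every IPv4 address the resolver returns for a
# hostname. Seeing more than one address suggests round-robin DNS behind the
# alias, which could explain intermittent Site-to-Site failures.
import socket

def resolved_addresses(host, attempts=5):
    """Resolve `host` several times and return the set of IPv4 addresses seen."""
    seen = set()
    for _ in range(attempts):
        for info in socket.getaddrinfo(host, None, socket.AF_INET):
            seen.add(info[4][0])  # sockaddr tuple is (address, port)
    return seen

# Replace "localhost" with the DNS alias used by the Remote Process Group;
# a result set with more than one address hints at round-robin resolution.
print(resolved_addresses("localhost"))
```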
Re: Wait for child file to process successfully
Brian, You can use MergeContent in Defragment mode. Just be sure to set the number of bins used by MergeContent equal to or greater than the number of concurrent merges you expect to have going on in your flow, and to route successfully processed and failed flowfiles (after they've been gracefully handled, however it suits your use case) to the MergeContent processor. If a fragment (one of the child flowfiles) is not sent to MergeContent, it will never be able to complete the defragmentation since MergeContent would not have received all the fragments. UnpackContent keeps track of the "batch" of files that are unpacked from the original archive by assigning to each child flowfile a set of fragment attributes that provide an ID to correlate merging (defragmenting in this case), the total number of fragments, and the fragment index. After the merge is complete, you'll have a recreation of the original zip file, and it signifies that all the child flowfiles have completed processing. - Jeff On Thu, Dec 22, 2016 at 12:29 PM BD International < b.deep.internatio...@gmail.com> wrote: > Hello, > > I've got a data flow which picks up a zip file and uses UnpackContent to > extract the contents. The subsequent files are then converted to JSON and > stored in a database. > > I would like to store the original zip file and only delete the file once > all the extracted files have been stored correctly. Has anyone else come > across a way to do this? > > Thanks in advance, > > Brian >
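The correlation Jeff describes can be modeled in a few lines. This is an illustrative Python sketch of the Defragment bookkeeping, not NiFi's implementation; it assumes the fragment attribute names (`fragment.identifier`, `fragment.count`, `fragment.index`) that UnpackContent assigns to each child flowfile.

```python
from collections import defaultdict

class Defragmenter:
    """Toy model of MergeContent's Defragment mode: hold fragments in a bin
    keyed by their correlation ID and release the ordered batch only when
    every fragment has arrived."""

    def __init__(self):
        self.bins = defaultdict(dict)  # fragment.identifier -> {index: content}

    def offer(self, attrs, content):
        ident = attrs["fragment.identifier"]
        count = int(attrs["fragment.count"])
        self.bins[ident][int(attrs["fragment.index"])] = content
        if len(self.bins[ident]) == count:
            done = self.bins.pop(ident)
            return [done[i] for i in sorted(done)]  # original order restored
        return None  # still waiting on sibling fragments
```

This is also why every child must eventually reach MergeContent: as long as one index is missing for an identifier, `offer` keeps returning `None` and the bin never completes.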
Re: Flow seized, cannot list flow files for full queue
You have the scenario correct. You identified the problem with my configuration correctly as well. The nifi.queue.swap.threshold was the culprit. I don't recall changing that setting, but I must have. I wonder if any value under the size of the separate swap queue (10k) causes this problem. Thanks very much for your assistance Bryan and Merry Christmas! - Nick On Thu, Dec 22, 2016 at 9:59 AM, Bryan Bende wrote: > Nick, > > Thanks for reporting back. > > Just to confirm the scenario, you ran overnight without any stalling > happening, and then while nothing was stalled you stopped and started the > GeoEnrichIP processor, which then didn't consume anything from the incoming > queue? > Or were things already stalled from overnight, and you stopped and started > the processor to see if it would start processing again? > > I noticed in your nifi.properties you lowered the swap threshold to 1k, > the default is 20k. Was there a specific reason for lowering it so much? > Would you be able to do another test putting that back to 20k? > > The way swapping works is that when the active queue for a processor > reaches the threshold (1k in your case), it starts putting any additional > flow files on to a separate swap queue, and when the swap queue reaches 10k > it starts writing these swapped flowfiles to disk in batches of 10k. > > I wouldn't expect setting the threshold to 1k to cause no processing to > happen, but it will definitely cause a lot of extra work because as soon as > 10k flowfiles are swapped back in, you are already over the 1k threshold > again. > > One other thing to check would be to see if any crazy garbage collection > is happening during these stalls. You could probably connect JVisualVM to > one of your NiFi JVM processes and see if the GC activity graph is spiking > up. > > -Bryan > > > On Thu, Dec 22, 2016 at 11:36 AM, Nick Carenza < > nick.care...@thecontrolgroup.com> wrote: > >> I replaced the Kafka processor with PublishKafka_0_10. 
It didn't start >> consuming from the stalled queue. I cleared all the queues again and it ran >> overnight without stalling, longer than it has before. I stopped and >> started the GeoEnrichIP processor just now to see if it would stall and it >> did. I should be able to restart a processor like that, right, and it should >> start consuming the queue again? As soon as I clear the stalled queue, >> whether or not it's full, it starts flowing again. >> >> Thanks, >> Nick >> >> On Wed, Dec 21, 2016 at 11:34 AM, Bryan Bende wrote: >> >>> Thanks for the info. >>> >>> Since your Kafka broker is 0.10.1, I would be curious if you experience >>> the same behavior switching to PublishKafka_0_10. >>> >>> The Kafka processors line up like this... >>> >>> GetKafka/PutKafka use the 0.8.x Kafka client >>> ConsumeKafka/PublishKafka use the 0.9.x Kafka client >>> ConsumeKafka_0_10/PublishKafka_0_10 use the 0.10.x Kafka client >>> >>> In some cases it is possible to use a version of the client with a >>> different version of the broker, but it usually works best to use the >>> client that goes with the broker. >>> >>> I'm wondering if your PutKafka processor is getting stuck somehow, which >>> then causes back-pressure to build up all the way back to your TCP >>> processor, since it looked like all your queues were filled up. >>> >>> It is entirely possible that there is something else going on, but maybe >>> we can eliminate the Kafka processor from the list of possible problems by >>> testing with PublishKafka_0_10. >>> >>> -Bryan >>> >>> On Wed, Dec 21, 2016 at 2:25 PM, Nick Carenza < >>> nick.care...@thecontrolgroup.com> wrote: >>> Hey Brian, Thanks for taking the time! - This is NiFi 1.1.0. I had the same troubles on 1.0.0 and upgraded recently with the hope there was a fix for the issue. - Kafka is version 2.11-0.10.1.0 - I am using the PutKafka processor. - Nick On Wed, Dec 21, 2016 at 11:19 AM, Bryan Bende wrote: > Hey Nick, > > Sorry to hear about these troubles. 
A couple of questions... > > - What version of NiFi is this? > - What version of Kafka are you using? > - Which Kafka processor in NiFi are you using? It looks like PutKafka, > but just confirming. > > Thanks, > > Bryan > > > On Wed, Dec 21, 2016 at 2:00 PM, Nick Carenza < > nick.care...@thecontrolgroup.com> wrote: > >> I am running into an issue where a processor will stop receiving flow >> files from its queue. >> >> flow: tcp --(100,000)--> evaljsonpath --(100,000)--> geoip >> --(100,000)--> putkafka >> >> This time, putkafka is the processor that has stopped receiving >> flowfiles >> >> I will try to list the queue and I'll get a message that says the >> queue has no flow files in it. I checked the http request and the >> response >> says there are 100,000 flow
Re: Flow seized, cannot list flow files for full queue
Nick, Good news: I was able to reproduce this and I am fairly confident that as long as you increase the swap threshold above 10k you shouldn't see this problem anymore. I created this JIRA which further describes what is happening: https://issues.apache.org/jira/browse/NIFI-3250 Thanks, Bryan On Thu, Dec 22, 2016 at 12:59 PM, Bryan Bende wrote: > Nick, > > Thanks for reporting back. > > Just to confirm the scenario, you ran overnight without any stalling > happening, and then while nothing was stalled you stopped and started the > GeoEnrichIP processor, which then didn't consume anything from the incoming > queue? > Or were things already stalled from overnight, and you stopped and started > the processor to see if it would start processing again? > > I noticed in your nifi.properties you lowered the swap threshold to 1k, > the default is 20k. Was there a specific reason for lowering it so much? > Would you be able to do another test putting that back to 20k? > > The way swapping works is that when the active queue for a processor > reaches the threshold (1k in your case), it starts putting any additional > flow files on to a separate swap queue, and when the swap queue reaches 10k > it starts writing these swapped flowfiles to disk in batches of 10k. > > I wouldn't expect setting the threshold to 1k to cause no processing to > happen, but it will definitely cause a lot of extra work because as soon as > 10k flowfiles are swapped back in, you are already over the 1k threshold > again. > > One other thing to check would be to see if any crazy garbage collection > is happening during these stalls. You could probably connect JVisualVM to > one of your NiFi JVM processes and see if the GC activity graph is spiking > up. > > -Bryan > > > On Thu, Dec 22, 2016 at 11:36 AM, Nick Carenza < > nick.care...@thecontrolgroup.com> wrote: > >> I replaced the Kafka processor with PublishKafka_0_10. It didn't start >> consuming from the stalled queue. 
I cleared all the queues again and it ran >> overnight without stalling, longer than it has before. I stopped and >> started the GeoEnrichIP processor just now to see if it would stall and it >> did. I should be able to restart a processor like that, right, and it should >> start consuming the queue again? As soon as I clear the stalled queue, >> whether or not it's full, it starts flowing again. >> >> Thanks, >> Nick >> >> On Wed, Dec 21, 2016 at 11:34 AM, Bryan Bende wrote: >> >>> Thanks for the info. >>> >>> Since your Kafka broker is 0.10.1, I would be curious if you experience >>> the same behavior switching to PublishKafka_0_10. >>> >>> The Kafka processors line up like this... >>> >>> GetKafka/PutKafka use the 0.8.x Kafka client >>> ConsumeKafka/PublishKafka use the 0.9.x Kafka client >>> ConsumeKafka_0_10/PublishKafka_0_10 use the 0.10.x Kafka client >>> >>> In some cases it is possible to use a version of the client with a >>> different version of the broker, but it usually works best to use the >>> client that goes with the broker. >>> >>> I'm wondering if your PutKafka processor is getting stuck somehow, which >>> then causes back-pressure to build up all the way back to your TCP >>> processor, since it looked like all your queues were filled up. >>> >>> It is entirely possible that there is something else going on, but maybe >>> we can eliminate the Kafka processor from the list of possible problems by >>> testing with PublishKafka_0_10. >>> >>> -Bryan >>> >>> On Wed, Dec 21, 2016 at 2:25 PM, Nick Carenza < >>> nick.care...@thecontrolgroup.com> wrote: >>> Hey Brian, Thanks for taking the time! - This is NiFi 1.1.0. I had the same troubles on 1.0.0 and upgraded recently with the hope there was a fix for the issue. - Kafka is version 2.11-0.10.1.0 - I am using the PutKafka processor. - Nick On Wed, Dec 21, 2016 at 11:19 AM, Bryan Bende wrote: > Hey Nick, > > Sorry to hear about these troubles. A couple of questions... > > - What version of NiFi is this? 
> - What version of Kafka are you using? > - Which Kafka processor in NiFi are you using? It looks like PutKafka, > but just confirming. > > Thanks, > > Bryan > > > On Wed, Dec 21, 2016 at 2:00 PM, Nick Carenza < > nick.care...@thecontrolgroup.com> wrote: > >> I am running into an issue where a processor will stop receiving flow >> files from its queue. >> >> flow: tcp --(100,000)--> evaljsonpath --(100,000)--> geoip >> --(100,000)--> putkafka >> >> This time, putkafka is the processor that has stopped receiving >> flowfiles >> >> I will try to list the queue and I'll get a message that says the >> queue has no flow files in it. I checked the http request and the >> response >> says there are 100,000 flow files in the queue but the flowFileSummaries >> array is
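Bryan's explanation of why a 1k threshold thrashes can be reduced to a toy model. The numbers (10k disk batch, 1k vs 20k thresholds) come from his description; the code is illustrative Python, not NiFi's swap manager.

```python
def swap_in_cycles(backlog, threshold, batch=10_000):
    """Count how many swap-in batches immediately put the active queue back
    over the swap threshold while draining a backlog of swapped flowfiles.
    Each cycle means another round of swap-out/swap-in work for NiFi."""
    churn = 0
    while backlog > 0:
        restored = min(batch, backlog)  # swap manager restores one disk batch
        backlog -= restored
        if restored > threshold:        # active queue instantly re-crosses it
            churn += 1
    return churn

print(swap_in_cycles(100_000, threshold=1_000))   # lowered threshold: every batch re-triggers swapping
print(swap_in_cycles(100_000, threshold=20_000))  # default threshold: a 10k batch fits comfortably
```

With the threshold at 1k, every 10k batch swapped back in is immediately over the limit again, which matches Nick's observation that anything under the 10k swap-queue size causes the problem.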
RE: Remote Processor Group error: Unable to communicate with remote NiFi cluster
Thanks Koji, The VM IP addresses have not changed and the target URL is accessible. However, I am using a DNS alias which allows me to change the name if the IP changes. One thing that is odd is that flowfiles have moved through the RemoteProcessGroup successfully, but intermittently it will stop. I will try the 1.1.0 update to test, or wait for the 1.1.1 update. This might just solve my problem. Thanks for your advice, Kevin -Original Message- From: Koji Kawamura [mailto:ijokaruma...@gmail.com] Sent: Wednesday, December 21, 2016 10:53 PM To: users@nifi.apache.org Subject: Re: Remote Processor Group error: Unable to communicate with remote NiFi cluster Hello Kevin, Sorry to hear that you got bitten by the issue. I think the 'Unable to find remote process group with id ...' is the same one I encountered with NiFi 1.0.0. The issue has been solved since 1.1.0. https://issues.apache.org/jira/browse/NIFI-2687 So, if you're able to upgrade the cluster to 1.1.0, you'll be able to update the RemoteProcessGroup. Please also note that the bug fix release 1.1.1 is under its voting process right now. It should be released in the next few days. I'd recommend waiting for 1.1.1 if you're going to update your cluster again. However, I remember that although I couldn't update the RemoteProcessGroup due to NIFI-2687, NiFi was able to transfer data using the RemoteProcessGroup. So, it seems there's another issue. Have the VMs' IP addresses (or hostnames) changed? Is the target URL used by the RemoteProcessGroup still accessible? Thanks, Koji On Thu, Dec 22, 2016 at 8:18 AM, Kevin Verhoeven wrote: > I am running a cluster of twelve nodes on four VMs, each node runs on > a different port. This configuration worked great on version 0.7. > However, after an update from 0.7 to 1.0 we started seeing errors and > flowfiles stopped moving into the Remote Processor Group. 
We are using > the Remote Processor Group to send flowfiles, generated on the Primary > Node, back into the same cluster but distributed among the other nodes. > > > > The error we see on the Remote Processor Group: > > > > Unable to refresh Remote Group’s peers due to Unable to communicate > with remote NiFi cluster in order to determine which nodes exist in > the remote cluster. > > > > When I attempt to change the Remote Processor I see the following error: > > > > Unable to find remote process group with id > '87f96bb0-0158-1000--c65b55ce'. > > > > Do you have any suggestions? > > > > Thanks, > > > > Kevin
Wait for child file to process successfully
Hello, I've got a data flow which picks up a zip file and uses UnpackContent to extract the contents. The subsequent files are then converted to JSON and stored in a database. I would like to store the original zip file and only delete the file once all the extracted files have been stored correctly. Has anyone else come across a way to do this? Thanks in advance, Brian
Wait for child file to process successfully
On 30 Nov 2016 14:28, "Aldrin Piri" wrote: Hi Andreas, 1) There is nothing from a framework perspective that provides this. However, a typical option is to make use of an attribute from an upstream processor to help categorize and handle the data. Attributes written vary from processor to processor or can be explicitly set/updated using the UpdateAttribute processor. 2) This is also not something that is universally handled across the framework through processors. Some processors, such as InvokeHTTP and, I believe, those for AWS, do set such properties when a failure happens. What you are attempting to do though seems like it might be a good enhancement to add to the processor and, frankly, a reasonable request to also work toward providing more universally across components in the application. For the time being, however, your UpdateAttribute approach is the best option at this juncture. Would you mind opening up a JIRA issue so we can discuss this a bit more and evaluate trying to extend such functionality in a standardized way? On Wed, Nov 30, 2016 at 9:08 AM, Andreas Petter (External) < andreas.petter.exter...@telefonica.com> wrote: > Hello everybody, > > > > I have 2 questions: > > 1. Is there some way to find out through which relationship/queue a > FlowFile walked into a processor, in the onTrigger-Method? > > 2. Is there a generic way how errors (e.g. Exceptions) are > propagated with FlowFiles to subsequent processors? 
> > > > Background Story: > > I am writing a failure processor which handles failure events from > FetchSFTP outgoing relationships, writing some flowfile attributes into a > database and performing some further tasks to cope with the error. Now I > would like to know through which of the three failure-reporting-relationships > the FlowFile came along and get some generic failure information (e.g. the > Exception). Right now I am adding 3 UpdateAttribute processors which each > add an attribute identifying the relationship (and thereby the type of > error). Maybe there is a better way to do this? I am using NiFi 1.0. > > > > Thank you very much for any help you might provide. > > Kind regards, > > Andreas Petter >
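The UpdateAttribute workaround Andreas describes (and Aldrin endorses) amounts to stamping each failure relationship with its own marker before the flowfiles converge on a shared handler. A plain-Python sketch of that pattern follows; the attribute name `failure.relationship` is my own invention for illustration, and in NiFi the stamping would be done by the three UpdateAttribute processors, not by code.

```python
def tag_relationship(attrs, relationship):
    """Model of an UpdateAttribute step: stamp the flowfile's attributes
    with the relationship it traversed (e.g. FetchSFTP's comms.failure)."""
    tagged = dict(attrs)
    tagged["failure.relationship"] = relationship
    return tagged

def handle_failure(attrs):
    """Model of the shared downstream failure handler: it can now branch
    on the stamped attribute instead of needing framework support."""
    return "failed via " + attrs.get("failure.relationship", "unknown")

# Each failure relationship routes through its own tagging step first.
ff = tag_relationship({"filename": "report.csv"}, "comms.failure")
print(handle_failure(ff))  # prints: failed via comms.failure
```

This mirrors the three-UpdateAttribute setup: one tagging step per failure relationship, one common processor that reads the attribute.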
Re: Flow seized, cannot list flow files for full queue
I replaced the Kafka processor with PublishKafka_0_10. It didn't start consuming from the stalled queue. I cleared all the queues again and it ran overnight without stalling, longer than it has before. I stopped and started the GeoEnrichIP processor just now to see if it would stall and it did. I should be able to restart a processor like that, right, and it should start consuming the queue again? As soon as I clear the stalled queue, whether or not it's full, it starts flowing again. Thanks, Nick On Wed, Dec 21, 2016 at 11:34 AM, Bryan Bende wrote: > Thanks for the info. > > Since your Kafka broker is 0.10.1, I would be curious if you experience > the same behavior switching to PublishKafka_0_10. > > The Kafka processors line up like this... > > GetKafka/PutKafka use the 0.8.x Kafka client > ConsumeKafka/PublishKafka use the 0.9.x Kafka client > ConsumeKafka_0_10/PublishKafka_0_10 use the 0.10.x Kafka client > > In some cases it is possible to use a version of the client with a > different version of the broker, but it usually works best to use the > client that goes with the broker. > > I'm wondering if your PutKafka processor is getting stuck somehow, which > then causes back-pressure to build up all the way back to your TCP > processor, since it looked like all your queues were filled up. > > It is entirely possible that there is something else going on, but maybe > we can eliminate the Kafka processor from the list of possible problems by > testing with PublishKafka_0_10. > > -Bryan > > On Wed, Dec 21, 2016 at 2:25 PM, Nick Carenza < > nick.care...@thecontrolgroup.com> wrote: > >> Hey Brian, >> >> Thanks for taking the time! >> >> - This is NiFi 1.1.0. I had the same troubles on 1.0.0 and upgraded >> recently with the hope there was a fix for the issue. >> - Kafka is version 2.11-0.10.1.0 >> - I am using the PutKafka processor. >> >> - Nick >> >> On Wed, Dec 21, 2016 at 11:19 AM, Bryan Bende wrote: >> >>> Hey Nick, >>> >>> Sorry to hear about these troubles. 
A couple of questions... >>> >>> - What version of NiFi is this? >>> - What version of Kafka are you using? >>> - Which Kafka processor in NiFi are you using? It looks like PutKafka, >>> but just confirming. >>> >>> Thanks, >>> >>> Bryan >>> >>> >>> On Wed, Dec 21, 2016 at 2:00 PM, Nick Carenza < >>> nick.care...@thecontrolgroup.com> wrote: >>> I am running into an issue where a processor will stop receiving flow files from its queue. flow: tcp --(100,000)--> evaljsonpath --(100,000)--> geoip --(100,000)--> putkafka This time, putkafka is the processor that has stopped receiving flowfiles I will try to list the queue and I'll get a message that says the queue has no flow files in it. I checked the http request and the response says there are 100,000 flow files in the queue but the flowFileSummaries array is empty.

GET /nifi-api/flowfile-queues/1d72b81f-0159-1000-d09b-dc33e81b35c2/listing-requests/22754339-0159-1000-2dc9-07db09366132 HTTP/1.1

{
  "listingRequest": {
    "id": "22754339-0159-1000-2dc9-07db09366132",
    "uri": "http://ipaddress:8080/nifi-api/flowfile-queues/1d72b81f-0159-1000-d09b-dc33e81b35c2/listing-requests/22754339-0159-1000-2dc9-07db09366132",
    "submissionTime": "12/21/2016 17:37:07.385 UTC",
    "lastUpdated": "17:37:07 UTC",
    "percentCompleted": 100,
    "finished": true,
    "maxResults": 100,
    "state": "Completed successfully",
    "queueSize": {
      "byteCount": 288609476,
      "objectCount": 10
    },
    "flowFileSummaries": [],
    "sourceRunning": true,
    "destinationRunning": true
  }
}

I tried stopping and starting all the processors, replacing the putkafka with a new duplicate putkafka processor and moving the queue over to it, restarting kafka itself. I ran a dump with all the processors "running". Since this is not running in a production environment, as a last resort I cleared the queue and then everything started flowing again. I have experienced this issue many times since I have begun evaluating NiFi. 
I have heard others having great success with it so I am convinced I have misconfigured something. I have tried to provide any relevant configuration information here:

# nifi.properties
nifi.version=1.1.0
nifi.flowcontroller.autoResumeState=true
nifi.flowcontroller.graceful.shutdown.period=10 sec
nifi.flowservice.writedelay.interval=500 ms
nifi.administrative.yield.duration=30 sec
nifi.bored.yield.duration=10 millis
nifi.state.management.provider.local=local-provider
nifi.swap.manager.implementation=org.apache.nifi.controller.FileSystemSwapManager
nifi.queue.swap.threshold=1000
nifi.swap.in.period=5 sec
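The symptom Nick captured in his HTTP trace (a finished listing request whose queue reports objects while `flowFileSummaries` is empty) is easy to detect programmatically. A small illustrative helper, assuming the response shape from his trace; this is a monitoring sketch, not part of NiFi's API client:

```python
import json

def listing_is_inconsistent(body):
    """Return True when a completed listing-request response claims queued
    objects but returns no flowfile summaries -- the stall symptom above."""
    req = json.loads(body)["listingRequest"]
    return bool(req["finished"]
                and req["queueSize"]["objectCount"] > 0
                and not req["flowFileSummaries"])

# Abbreviated version of the response from the trace above.
response = '''{"listingRequest": {"finished": true,
  "queueSize": {"byteCount": 288609476, "objectCount": 10},
  "flowFileSummaries": []}}'''
print(listing_is_inconsistent(response))  # prints: True
```

Polling the listing-request endpoint with a check like this would flag a stalled queue long before someone notices flowfiles piling up.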
Re: NiFi Cron scheduling
I think the problem was worse on my Mac due to Date/Time settings and automatic NTP updates. When turning off the automatic NTP sync I didn't see the problem occurring anymore. Anyway, our target server is CentOS and we haven't seen the issue there. We're also running NiFi in a Docker container now (even for local dev). So for the moment we're covered. But good to know you were able to find the potential culprit and were able to log an issue for it. Thx! -- View this message in context: http://apache-nifi-users-list.2361937.n4.nabble.com/NiFi-Cron-scheduling-tp481p522.html Sent from the Apache NiFi Users List mailing list archive at Nabble.com.