Re: MergeRecord
Hello. Thanks for the answer. The 20k is just the latest test; I've also tested with 100 and 1,000, with an input queue of 10k, and it doesn't change anything. I will try to simplify the test case and stop using the inferred schema.

Regards

> On 13 Apr 2018 at 04:50, Koji Kawamura wrote:
>
> Hello,
>
> I checked your template. I haven't run the flow since I don't have
> sample input XML files.
> However, when I looked at the MergeRecord processor configuration, I found
> that:
> Minimum Number of Records = 2
> Max Bin Age = 10 sec
>
> From a brief look at the MergeRecord source code, it expires a bin that is
> not complete after Max Bin Age.
> Do you always have 20,000 records to merge within a 10-second window?
> If not, I recommend lowering the minimum number of records.
>
> I haven't checked the actual MergeRecord behavior, so I may be wrong, but
> it's worth changing the configuration.
>
> Hope this helps,
> Koji
>
>
> On Fri, Apr 13, 2018 at 12:26 AM, DEHAY Aurelien wrote:
>> Hello.
>>
>> Please see the template attached. The problem we have is that, whatever
>> configuration we set in MergeRecord, we can't get it to actually
>> merge records.
>>
>> All the records have the same format; we use an inferred schema so we don't
>> have to write it ourselves. The only difference between the schemas is that
>> the doc="" fields differ. Could that prevent the merging?
>>
>> Thanks for any pointer or info.
>>
>>
>> Aurélien DEHAY
>>
>>
>> This electronic transmission (and any attachments thereto) is intended
>> solely for the use of the addressee(s). It may contain confidential or
>> legally privileged information. If you are not the intended recipient of
>> this message, you must delete it immediately and notify the sender. Any
>> unauthorized use or disclosure of this message is strictly prohibited.
>> Faurecia does not guarantee the integrity of this transmission and shall
>> therefore never be liable if the message is altered or falsified nor for any
>> virus, interception or damage to your system.
Fine-grained control over when a NiFi processor can run
Hello,

I have a custom NiFi processor that invokes an external HTTP endpoint. That endpoint will be hosted by services running at customer sites, and those customer sites require the ability to define when the service can be called by my processor. Their goal is to prevent calls from coming in during their peak hours, so that they only have to process my requests during a configurable set of off-peak hours.

Additionally, we have a goal of keeping the code for making the HTTP request separate from the code that checks whether the current time falls within a window where requests are allowed. This is not a strict requirement, but we have many different scenarios that would like to use the HTTP request processor without any schedule restrictions, and still other scenarios that would like to check schedule restrictions before running other processors.

My first idea was to implement two different custom processors: one to make the HTTP request and another to check the current time against the configured schedule restrictions. Flow files would first enter the schedule restriction processor and transfer to a "success" relationship only if the request is currently permitted by the schedule. That success relationship would then be connected to the HTTP request processor.

The potential problem I see with this is that flow files could back up for some reason between the schedule restriction check processor and the HTTP request processor. A flow file could pass the schedule restriction check, wait a while until the HTTP request processor picks up the work, and then end up sending an HTTP request outside of the permitted schedule window. I could avoid this problem completely by combining the logic into a single processor, but that makes it more difficult to reuse these processors in different combinations for the other scenarios mentioned above.

I'm looking for other options to consider for addressing this workflow.
I have a couple of thoughts:

1. Implement the HTTP processor independently, and then a second processor that subclasses the first to add schedule restrictions. This keeps the two bits of code separate but doesn't give as much flexibility as I'd like.
2. Just implement this as two separate processors and try to figure out some way in the flow to prevent flow files from backing up between these two processors (not sure if this is possible).
3. Implement the schedule restriction as a particular implementation of a controller service interface, and have the HTTP request processor depend on an instance of that controller service. Alternate implementations of that interface could be created that skip the schedule restriction check.

Any thoughts on these approaches? Do any alternatives come to mind that I am missing?

Thanks in advance,
-Tim
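For what it's worth, the schedule check behind option 3 could be sketched in plain Java along these lines. This is only a hypothetical sketch (the class name, the inclusive-start/exclusive-end window semantics, and the midnight-crossing handling are my assumptions, not an existing NiFi API); a real controller service implementation would wrap logic like this behind a `ControllerService` interface:

```java
import java.time.LocalTime;

// Hypothetical schedule-window check that a controller service
// implementation could expose; not an existing NiFi interface.
public class ScheduleWindow {
    private final LocalTime start; // inclusive start of the permitted window
    private final LocalTime end;   // exclusive end of the permitted window

    public ScheduleWindow(LocalTime start, LocalTime end) {
        this.start = start;
        this.end = end;
    }

    // Returns true if 'now' falls inside the permitted window.
    public boolean permits(LocalTime now) {
        if (start.isBefore(end)) {
            // Normal same-day window, e.g. 01:00-05:00.
            return !now.isBefore(start) && now.isBefore(end);
        }
        // Window that crosses midnight, e.g. 22:00-06:00.
        return !now.isBefore(start) || now.isBefore(end);
    }
}
```

If the HTTP request processor calls `permits(LocalTime.now())` immediately before issuing each request, the check happens at send time rather than at queue time, which would close the race described above even when flow files back up between processors.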
Re: MergeRecord
Hello,

I checked your template. I haven't run the flow since I don't have sample input XML files. However, when I looked at the MergeRecord processor configuration, I found that:

Minimum Number of Records = 2
Max Bin Age = 10 sec

From a brief look at the MergeRecord source code, it expires a bin that is not complete after Max Bin Age. Do you always have 20,000 records to merge within a 10-second window? If not, I recommend lowering the minimum number of records.

I haven't checked the actual MergeRecord behavior, so I may be wrong, but it's worth changing the configuration.

Hope this helps,
Koji

On Fri, Apr 13, 2018 at 12:26 AM, DEHAY Aurelien wrote:
> Hello.
>
> Please see the template attached. The problem we have is that, whatever
> configuration we set in MergeRecord, we can't get it to actually
> merge records.
>
> All the records have the same format; we use an inferred schema so we don't
> have to write it ourselves. The only difference between the schemas is that
> the doc="" fields differ. Could that prevent the merging?
>
> Thanks for any pointer or info.
>
>
> Aurélien DEHAY
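As an illustration of that suggestion, the revised configuration might look something like this (the values are only an example of lowering the minimum; the 20,000 figure comes from the earlier message in this thread, and the right maximum depends on the actual flow):

```
Minimum Number of Records = 1
Maximum Number of Records = 20000
Max Bin Age = 10 sec
```

With a minimum of 1, a bin that hits the 10-second age limit can still be merged rather than sitting below the minimum threshold.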
Re: Multi Domains Nifi connection and UI acces
Hello,

NiFi 1.6.0 has been released, and it adds a new nifi.property to whitelist multiple hosts so that NiFi can be accessed by different hostnames. Please see NIFI-4761 for details. I recommend updating to 1.6.0.

https://issues.apache.org/jira/browse/NIFI-4761

Thanks,
Koji

On Wed, Apr 11, 2018 at 4:07 PM, Abdou B wrote:
>
> Hello,
>
> I use NiFi 1.5 and I would like to use the functionality that enables NiFi
> to use multiple network interfaces.
> So, as stated in the official documentation, I created the following
> properties:
>
> nifi.web.https.network.interface.default=0.0.0.0
>
> nifi.web.https.network.interface.eth0=eth0
> nifi.web.https.network.interface.eth1=eth1
> nifi.web.https.network.interface.eth2=eth2
> nifi.web.https.network.interface.eth3=eth3
> nifi.web.https.network.interface.eth4=eth4
>
> I use a certificate with a SAN; the SAN includes the FQDNs of eth0, eth1,
> eth2, eth3 and eth4.
>
> I have to use eth1 to connect to the UI,
> so I set up:
> nifi.web.https.host=the FQDN of eth0
> but I get an error stating that
> "Cluster is still in the process of voting on the appropriate Data Flow."
>
> If I use eth0, I can't connect anymore (I get a message about the Host
> header not being allowed), but the cluster seems to work.
> I saw that Jetty seems to be more restrictive than before and only allows
> hosts that use nifi.web.https.host to connect to the UI.
>
> Would you have any advice?
>
> Best Regards
>
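For reference, the whitelist added in 1.6.0 is set in nifi.properties. If I'm reading NIFI-4761 correctly the property is `nifi.web.whitelisted.hosts`, but please verify the exact name against the 1.6.0 Administration Guide; the hostnames below are placeholders:

```
# Comma-separated list of hostnames clients may use in the Host header
nifi.web.whitelisted.hosts=nifi-eth0.example.com,nifi-eth1.example.com
```

Each hostname present in the certificate's SAN that should reach the UI would need to appear in this list.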
Re: ExecuteScript Javascript flowFile.getAttributes()
Great, thank you. That does work.

On Thu, Apr 12, 2018 at 11:53 AM, Matt Burgess wrote:
> Nick,
>
> Here's an example slightly modified from my cookbook example (I'm not
> sure that works with the array brackets, might need to use .get()
> instead); this one is a full working script to log each attribute:
>
> var flowFile = session.get();
> if (flowFile != null) {
>     var attrs = flowFile.getAttributes();
>     for each (var attr in attrs.entrySet()) {
>         log.info("Attribute " + attr.getKey() + " = " + attr.getValue());
>     }
>     session.transfer(flowFile, REL_SUCCESS);
> }
>
> Regards,
> Matt
>
>
> On Thu, Apr 12, 2018 at 2:18 PM, Nick Carenza wrote:
> > Hey users,
> >
> > I am having trouble getting attributes from my flowfiles as described here:
> > https://community.hortonworks.com/articles/75032/executescript-cookbook-part-1.html
> >
> > Does anyone have a working example they could share of getting and iterating
> > through all of a flowfile's attributes in JavaScript?
> >
> > Thanks.
> > Nick
Re: ExecuteScript Javascript flowFile.getAttributes()
Nick,

Here's an example slightly modified from my cookbook example (I'm not sure that works with the array brackets, might need to use .get() instead); this one is a full working script to log each attribute:

var flowFile = session.get();
if (flowFile != null) {
    var attrs = flowFile.getAttributes();
    for each (var attr in attrs.entrySet()) {
        log.info("Attribute " + attr.getKey() + " = " + attr.getValue());
    }
    session.transfer(flowFile, REL_SUCCESS);
}

Regards,
Matt

On Thu, Apr 12, 2018 at 2:18 PM, Nick Carenza wrote:
> Hey users,
>
> I am having trouble getting attributes from my flowfiles as described here:
> https://community.hortonworks.com/articles/75032/executescript-cookbook-part-1.html
>
> Does anyone have a working example they could share of getting and iterating
> through all of a flowfile's attributes in JavaScript?
>
> Thanks.
> Nick
ExecuteScript Javascript flowFile.getAttributes()
Hey users,

I am having trouble getting attributes from my flowfiles as described here: https://community.hortonworks.com/articles/75032/executescript-cookbook-part-1.html

Does anyone have a working example they could share of getting and iterating through all of a flowfile's attributes in JavaScript?

Thanks.
Nick
Restarting NiFi causing SiteToSiteBulletinReportingTask to fail
I am running HDF 3.0.1.1, which comes with NiFi 1.2.0.3.0.1.1-5. We are using SiteToSiteBulletinReportingTask to monitor bulletins (for things like disk usage and memory usage).

When we restart NiFi via Ambari (either with a Restart or a Stop and then Start), the SiteToSiteBulletinReportingTask no longer works once NiFi comes back up. It throws the following error when it is first trying to start:

SiteToSiteBulletinReportingTask[id=ba6b4499-0162-1000--3ccd7573] org.apache.nifi.remote.client.PeerSelector@34e976af Unable to refresh Remote Group's peers due to response code 409:Conflict with explanation: null

No matter how long we wait, it never works. The ways I have been able to get it working again are as follows:

* Stop and then Start the Remote Input Port the SiteToSiteBulletinReportingTask is using
* Delete the SiteToSiteBulletinReportingTask and create a new one
* Wait a while and stop and start the SiteToSiteBulletinReportingTask (however, this doesn't work consistently)

I have tested the same flow steps using a process that uses a Remote Process Group and a different Remote Input Port; that RPG throws the same error when first coming up, but then starts working after a period of time. So maybe the SiteToSiteBulletinReportingTask isn't retrying enough times to connect to the Remote Input Port?

Sincerely,
Chad Woodhead