Re: MergeRecord

2018-04-12 Thread DEHAY Aurelien

Hello. 

Thanks for the answer. 

The 20k is just the last test; I've also tested with 100 and 1,000, with an input
queue of 10k, and it doesn't change anything.

I will try to simplify the test case and to avoid using the inferred schema.

Regards

> On 13 Apr 2018, at 04:50, Koji Kawamura wrote:
> 
> Hello,
> 
> I checked your template. Haven't run the flow since I don't have
> sample input XML files.
> However, when I looked at the MergeRecord processor configuration, I found 
> that:
> Minimum Number of Records = 2
> Max Bin Age = 10 sec
> 
> From a brief look at the MergeRecord source code, it expires a bin that is
> not complete after Max Bin Age.
> Do you always have 20,000 records to merge within the 10 sec window?
> If not, I recommend lowering the minimum number of records.
> 
> I haven't checked the actual MergeRecord behavior, so I may be wrong, but
> it may be worth changing the configuration.
> 
> Hope this helps,
> Koji
> 
> 
> On Fri, Apr 13, 2018 at 12:26 AM, DEHAY Aurelien
>  wrote:
>> Hello.
>> 
>> Please see the template attached. The problem we have is that, whatever 
>> configuration we set in MergeRecord, we can't get it to actually merge 
>> records.
>> 
>> All the records have the same format; we use an inferred schema so we don't 
>> have to write it ourselves. The only difference between the schemas is that 
>> the doc="" fields differ. Is it possible that this prevents the merging?
>> 
>> Thanks for any pointer or info.
>> 
>> 
>> Aurélien DEHAY
>> 
>> 
>> 
>> This electronic transmission (and any attachments thereto) is intended 
>> solely for the use of the addressee(s). It may contain confidential or 
>> legally privileged information. If you are not the intended recipient of 
>> this message, you must delete it immediately and notify the sender. Any 
>> unauthorized use or disclosure of this message is strictly prohibited.  
>> Faurecia does not guarantee the integrity of this transmission and shall 
>> therefore never be liable if the message is altered or falsified nor for any 
>> virus, interception or damage to your system.




Fine-grained control over when a NiFi processor can run

2018-04-12 Thread Tim Dean
Hello,

I have a custom NiFi processor that invokes an external HTTP endpoint. That 
endpoint will be hosted by services running at customer sites, and those 
customer sites require the ability to define when the service can be called by 
my processor. Their goal is to prevent calls from coming in during their peak 
hours so that they only have to process my requests during a configurable set 
of off-peak hours.

Additionally, we have a goal of keeping the code for making the HTTP request 
separate from the code for checking whether or not it is currently in a time 
window that requests are allowed. This is not a strict requirement, but we have 
many different scenarios that would like to use the HTTP request processor 
without any schedule restrictions and still other scenarios that would like to 
check schedule restrictions before running other processors.

My first idea for this was to implement 2 different custom processors, one to 
make the HTTP request and another to check the current time against the 
configured schedule restrictions. Flow files would first enter the schedule 
restriction processor, and transfer to a “success” relationship only if the 
request is currently permitted against the schedule. That success relationship 
would then be connected to the HTTP request processor.

The potential problem I see with this is that flow files could back up for some 
reason between the schedule restriction check processor and the HTTP requests. 
So a flow file could pass the schedule restriction check, wait for a while 
until the HTTP request processor picks up the work, and then end up sending an 
HTTP request outside of the permitted schedule window.

I could avoid this problem completely by combining the logic into a single 
processor, but that makes it more difficult to reuse these processors in 
different combinations for the other scenarios mentioned above.

I'm looking for other options to consider for addressing this workflow. I have 
a few thoughts:

1. Implement the HTTP processor independently, and then a second processor 
that subclasses the first to add schedule restrictions. This keeps the two 
bits of code separate but doesn't give as much flexibility as I'd like.
2. Implement this as 2 separate processors and try to figure out some way in 
the flow to prevent flow files from backing up between them (not sure if this 
is possible).
3. Implement the schedule restriction as a particular implementation of a 
controller service interface, and have the HTTP request processor depend on 
an instance of that controller service. Alternate implementations of that 
interface could be created that exclude the schedule restriction check.
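If the controller-service approach is pursued, the window check itself is plain time arithmetic and can live behind the service interface. Here is a minimal sketch in plain Java (no NiFi APIs; the class and method names are invented for illustration), including handling of windows that cross midnight:

```java
import java.time.LocalTime;

// Hypothetical schedule-window check a controller service could expose.
// Names are illustrative, not from any NiFi API.
public class ScheduleWindow {
    private final LocalTime start;
    private final LocalTime end;

    public ScheduleWindow(LocalTime start, LocalTime end) {
        this.start = start;
        this.end = end;
    }

    // True if 'now' falls inside [start, end); handles windows that
    // cross midnight, e.g. 22:00-06:00.
    public boolean isOpen(LocalTime now) {
        if (start.isBefore(end)) {
            return !now.isBefore(start) && now.isBefore(end);
        }
        return !now.isBefore(start) || now.isBefore(end);
    }
}
```

To close the race described above, the HTTP request processor could call the same service method again immediately before sending, and route to a retry or deferred relationship if the window closed while the flow file sat in the queue.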

Any thoughts on these approaches? Do any alternatives come to mind that I am 
missing?

Thanks in advance

-Tim

Re: MergeRecord

2018-04-12 Thread Koji Kawamura
Hello,

I checked your template. Haven't run the flow since I don't have
sample input XML files.
However, when I looked at the MergeRecord processor configuration, I found that:
Minimum Number of Records = 2
Max Bin Age = 10 sec

From a brief look at the MergeRecord source code, it expires a bin that is
not complete after Max Bin Age.
Do you always have 20,000 records to merge within the 10 sec window?
If not, I recommend lowering the minimum number of records.

I haven't checked the actual MergeRecord behavior, so I may be wrong, but
it may be worth changing the configuration.
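The interaction between the two properties can be illustrated with a toy model (this is not NiFi's code; names and logic are simplified for illustration): a bin that never reaches the minimum record count is still flushed, incomplete, once it outlives Max Bin Age.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Toy model of the binning rule described above; not NiFi source code.
public class BinExpiryDemo {
    static class Bin {
        final Instant created;
        final List<String> records = new ArrayList<>();
        Bin(Instant created) { this.created = created; }

        // A bin is ready to merge once it holds the minimum number of records...
        boolean isComplete(int minRecords) {
            return records.size() >= minRecords;
        }

        // ...but it is also flushed (incomplete) once it is older than Max Bin Age.
        boolean isExpired(Instant now, Duration maxBinAge) {
            return Duration.between(created, now).compareTo(maxBinAge) >= 0;
        }
    }

    public static void main(String[] args) {
        // A bin created 11 seconds ago, holding far fewer than 20,000 records:
        Bin bin = new Bin(Instant.now().minusSeconds(11));
        bin.records.add("record-1");

        System.out.println("complete: " + bin.isComplete(20000)); // false
        System.out.println("expired:  " + bin.isExpired(Instant.now(), Duration.ofSeconds(10))); // true
    }
}
```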

Hope this helps,
Koji


On Fri, Apr 13, 2018 at 12:26 AM, DEHAY Aurelien
 wrote:
> Hello.
>
> Please see the template attached. The problem we have is that, whatever 
> configuration we set in MergeRecord, we can't get it to actually merge 
> records.
>
> All the records have the same format; we use an inferred schema so we don't 
> have to write it ourselves. The only difference between the schemas is that 
> the doc="" fields differ. Is it possible that this prevents the merging?
>
> Thanks for any pointer or info.
>
>
> Aurélien DEHAY


Re: Multi Domains Nifi connection and UI acces

2018-04-12 Thread Koji Kawamura
Hello,

NiFi 1.6.0 has been released, and it adds a new nifi.properties setting to
whitelist multiple hosts so that NiFi can be accessed via different
hostnames.
Please see NIFI-4761 for details. I recommend updating to 1.6.0.
https://issues.apache.org/jira/browse/NIFI-4761
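For reference, here is a sketch of the whitelist entry in nifi.properties. Note that the property name nifi.web.whitelisted.hosts is my recollection of what NIFI-4761 added in 1.6.0, and the hostnames are placeholders; please verify the name against the 1.6.0 Administration Guide before relying on it:

```properties
# Hypothetical example; verify the property name against the 1.6.0 docs.
# Comma-separated list of hostnames NiFi will accept in the Host header.
nifi.web.whitelisted.hosts=host-eth0.example.com,host-eth1.example.com
```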

Thanks,
Koji

On Wed, Apr 11, 2018 at 4:07 PM, Abdou B  wrote:
>
> Hello,
>
>  I use NiFi 1.5 and I would like to use the functionality that enables NiFi
> to use multiple network interfaces.
> So, as stated in the official documentation, I created the following
> properties:
>
> nifi.web.https.network.interface.default=0.0.0.0
>
> nifi.web.https.network.interface.eth0=eth0
> nifi.web.https.network.interface.eth1=eth1
> nifi.web.https.network.interface.eth2=eth2
> nifi.web.https.network.interface.eth3=eth3
> nifi.web.https.network.interface.eth4=eth4
>
> I use a certificate with SAN, the SAN includes : FQDN eth0 +  FQDN eth1 +
> FQDN eth2  + FQDN eth3 +  FQDN eth4
>
> I have to use eth1 to connect to the UI,
> so I set:
> nifi.web.https.host=the FQDN of eth0
> but I get an error stating that
> "Cluster is still in the process of voting on the appropriate Data Flow."
>
> If I use eth0, I can't connect anymore (I get a message about the host
> header not being allowed), but the cluster seems to work.
> I saw that Jetty seems to be more restrictive than before and only allows
> hosts that match nifi.web.https.host to connect to the UI.
>
>
> Would you have any advice?
>
> Best Regards
>


Re: ExecuteScript Javascript flowFile.getAttributes()

2018-04-12 Thread Nick Carenza
Great, thank you. That does work.

On Thu, Apr 12, 2018 at 11:53 AM, Matt Burgess  wrote:

> Nick,
>
> Here's an example, slightly modified from my cookbook example (I'm not
> sure that one works with the array brackets; you might need to use .get()
> instead). This one is a full working script to log each attribute:
>
> var flowFile = session.get();
> if (flowFile != null) {
>   var attrs = flowFile.getAttributes();
>   // Nashorn's "for each" iterates over the Java Map's entry set
>   for each (var attr in attrs.entrySet()) {
>     log.info("Attribute " + attr.getKey() + " = " + attr.getValue());
>   }
>   session.transfer(flowFile, REL_SUCCESS);
> }
>
> Regards,
> Matt
>
>
> On Thu, Apr 12, 2018 at 2:18 PM, Nick Carenza
>  wrote:
> > Hey users,
> >
> > I am having trouble getting attributes from my flowfiles as described here:
> > https://community.hortonworks.com/articles/75032/executescript-cookbook-part-1.html
> >
> > Does anyone have a working example they could share of getting and
> > iterating through all of a flowfile's attributes in JavaScript?
> >
> > Thanks.
> > Nick
>


Re: ExecuteScript Javascript flowFile.getAttributes()

2018-04-12 Thread Matt Burgess
Nick,

Here's an example, slightly modified from my cookbook example (I'm not
sure that one works with the array brackets; you might need to use .get()
instead). This one is a full working script to log each attribute:

var flowFile = session.get();
if (flowFile != null) {
  var attrs = flowFile.getAttributes();
  // Nashorn's "for each" iterates over the Java Map's entry set
  for each (var attr in attrs.entrySet()) {
    log.info("Attribute " + attr.getKey() + " = " + attr.getValue());
  }
  session.transfer(flowFile, REL_SUCCESS);
}

Regards,
Matt


On Thu, Apr 12, 2018 at 2:18 PM, Nick Carenza
 wrote:
> Hey users,
>
> I am having trouble getting attributes from my flowfiles as described here
> https://community.hortonworks.com/articles/75032/executescript-cookbook-part-1.html.
>
> Does anyone have a working example they could share of getting and iterating
> through all of a flowfile's attributes in JavaScript?
>
> Thanks.
> Nick


ExecuteScript Javascript flowFile.getAttributes()

2018-04-12 Thread Nick Carenza
Hey users,

I am having trouble getting attributes from my flowfiles as described here:
https://community.hortonworks.com/articles/75032/executescript-cookbook-part-1.html

Does anyone have a working example they could share of getting and
iterating through all of a flowfile's attributes in JavaScript?

Thanks.
Nick


Restarting NiFi causing SiteToSiteBulletinReportingTask to fail

2018-04-12 Thread Woodhead, Chad
I am running HDF 3.0.1.1 which comes with NiFi 1.2.0.3.0.1.1-5. We are using 
SiteToSiteBulletinReportingTask to monitor bulletins (for things like Disk 
Usage and Memory Usage). When we restart NiFi via Ambari (either with a Restart 
or Stop and then Start), when NiFi comes back up the 
SiteToSiteBulletinReportingTask no longer works. It throws the following error 
when it is first trying to start up:

SiteToSiteBulletinReportingTask[id=ba6b4499-0162-1000--3ccd7573] 
org.apache.nifi.remote.client.PeerSelector@34e976af Unable to refresh Remote 
Group's peers due to response code 409:Conflict with explanation: null

No matter how long we wait, it never works. The ways I have been able to get it 
to start working again are as follows:

  *   Stop and then Start the Remote Input Port the 
SiteToSiteBulletinReportingTask is using
  *   Delete the SiteToSiteBulletinReportingTask and create a new one
  *   Wait a while and stop and start the SiteToSiteBulletinReportingTask 
(however this doesn't work consistently)

I have tested the same flow steps using a process that uses a Remote Process 
Group and a different Remote Input Port, and that RPG throws the same error 
when first coming up but then starts working after a period of time. So maybe 
the SiteToSiteBulletinReportingTask isn't trying enough times to connect to the 
Remote Input Port?
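If the task really does give up after a fixed number of refresh attempts, the usual remedy is retry with exponential backoff. Here is a generic sketch of the delay calculation (nothing NiFi-specific; the class and method names are invented for illustration):

```java
// Generic exponential backoff with a cap; illustrative only, not NiFi code.
public class Backoff {
    // attempt 0 -> base, attempt 1 -> 2*base, ..., capped at capMillis.
    public static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long delay = baseMillis << Math.min(attempt, 20); // clamp shift to avoid overflow
        return Math.min(delay, capMillis);
    }
}
```

A caller would sleep for delayMillis(attempt, ...) between refresh attempts instead of failing permanently after the first 409 response.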

Sincerely,
Chad Woodhead