Re: NPE in ListenSyslog processor

2017-04-27 Thread Conrad Crampton
F5 was sending zero length messages, now it’s not and not seeing NPE any more.
Despite this, excellent response to the issue from the community yet again – 
less than 12 hours from report to fix!
Thank you

From: Andrew Grande <apere...@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Tuesday, 25 April 2017 at 23:08
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: NPE in ListenSyslog processor


I wonder if the cause of zero length messages is the health check from the f5 
balancer. Worth verifying with your team.

Andrew

On Tue, Apr 25, 2017, 3:15 PM Andy LoPresto 
<alopre...@apache.org<mailto:alopre...@apache.org>> wrote:
PR 1694 [1] is available for this issue.

[1] https://github.com/apache/nifi/pull/1694

Andy LoPresto
alopre...@apache.org<mailto:alopre...@apache.org>
alopresto.apa...@gmail.com<mailto:alopresto.apa...@gmail.com>
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On Apr 25, 2017, at 10:07 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:

Hi,
Thanks for the swift reply (as usual).
NIFI-3738 created [1].

I have passed over to infrastructure to try and establish cause of the zero 
length datagrams, but at least I now know there isn’t anything fundamentally 
wrong here and can (safely) ignore the errors.

Thanks
Conrad


[1] https://issues.apache.org/jira/browse/NIFI-3738

On 25/04/2017, 17:46, "Bryan Bende" <bbe...@gmail.com<mailto:bbe...@gmail.com>> 
wrote:

   Hi Conrad,

   Line 431 of ListenSyslog has the following code:

   if (!valid || !event.isValid())

   So to get an NPE there means event must be null, and event comes from this 
code:

    boolean valid = true;
    try {
        event = parser.parseEvent(rawSyslogEvent.getData(), sender);
    } catch (final ProcessException pe) {
        getLogger().warn("Failed to parse Syslog event; routing to invalid");
        valid = false;
    }

   The parser returns null if the bytes sent in are null or length 0.

   We should be checking if (!valid || event == null || !event.isValid())
   to avoid this case, and I think a similar situation exists in the
   ParseSyslog processor. It appears this would only happen if parsing
   messages is enabled in ListenSyslog.
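
    For illustration only, here is a minimal, self-contained sketch of that guarded
    check (SyslogEvent below is a stand-in interface, not the actual NiFi type, and
    this is not the code from the actual fix):

    final class ParseGuardSketch {

        interface SyslogEvent {
            boolean isValid();
        }

        // Route to "invalid" on a parse failure, a null event (what the parser
        // returns for null or zero-length bytes), or an event that fails validation.
        static boolean routeToInvalid(final boolean valid, final SyslogEvent event) {
            return !valid || event == null || !event.isValid();
        }

        public static void main(final String[] args) {
            // A null event no longer triggers an NPE; it is simply routed to invalid.
            System.out.println(routeToInvalid(true, null)); // prints: true
        }
    }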

   Do you want to create a JIRA for this?

    The other question is why you are ending up with these 0 length
    messages, but that one I am not sure about. In the case of UDP, it's
    just reading from a datagram channel into a byte buffer and passing
    those bytes along, so I think it means it's receiving a 0-byte
    datagram from the sender.
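
    If it helps to confirm that theory, a zero-length datagram is easy to reproduce
    by hand. This is just a throwaway test sketch (localhost and port 5140 are
    placeholder values for a test ListenSyslog instance, not anything from your setup):

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;

    public class EmptyDatagramSender {
        public static void main(String[] args) throws Exception {
            // Send a single empty UDP datagram, mimicking what a load-balancer
            // health check might emit.
            try (DatagramSocket socket = new DatagramSocket()) {
                byte[] empty = new byte[0];
                DatagramPacket packet = new DatagramPacket(
                        empty, 0, InetAddress.getByName("localhost"), 5140);
                socket.send(packet);
            }
        }
    }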

   Thanks,

   Bryan


   On Tue, Apr 25, 2017 at 12:31 PM, Conrad Crampton
   <conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:

Hi,

Been away for a bit from this community due to other work pressures, but
picking up Nifi again and successfully upgraded to 1.1.2 (apart from
screwing up one of the nodes temporarily).

So, with the renewed interest in log processing our infrastructure team has
put in an F5 load balancer to distribute the syslog traffic I am collecting
to my 6 node cluster. This is to stop one node being the only workhorse for
receiving syslog traffic. I had previously used the ‘standard’ pattern of
having the ListenSyslog processor connect to a RPG and then the rest of my
data processing flow receive via a local port – to effectively distribute
the processing load. I was finding though that the single node was getting
too many warnings about buffer, sockets being full etc. – hence the external
load balancing.



I am no load balancing expert, but what I believe happens is the F5 load
balancer receives syslog traffic (over UDP) then distributes this load to
all Nifi nodes (gives a bit of syslog traffic to each I believe). All
appears fine, but then I start getting NPE in my node logs thus:



2017-04-25 17:16:34,832 ERROR [Timer-Driven Process Thread-7]
o.a.n.processors.standard.ListenSyslog
ListenSyslog[id=0a932c37-0158-1000--656754bf]
ListenSyslog[id=0a932c37-0158-1000--656754bf] failed to process due
to java.lang.NullPointerException; rolling back session:
java.lang.NullPointerException

2017-04-25 17:16:34,833 ERROR [Timer-Driven Process Thread-7]
o.a.n.processors.standard.ListenSyslog

java.lang.NullPointerException: null

    at org.apache.nifi.processors.standard.ListenSyslog.onTrigger(ListenSyslog.java:431) ~[nifi-standard-processors-1.1.2.jar:1.1.2]
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) ~[nifi-api-1.1.2.jar:1.1.2]
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1099) [nifi-framework-core-1.1.2.jar:1.1.2]
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-1.1.2.jar:1

Re: NPE in ListenSyslog processor

2017-04-25 Thread Conrad Crampton
Hi,
Thanks for the swift reply (as usual).
NIFI-3738 created [1].

I have passed over to infrastructure to try and establish cause of the zero 
length datagrams, but at least I now know there isn’t anything fundamentally 
wrong here and can (safely) ignore the errors.

Thanks
Conrad


[1] https://issues.apache.org/jira/browse/NIFI-3738

On 25/04/2017, 17:46, "Bryan Bende" <bbe...@gmail.com> wrote:

Hi Conrad,

Line 431 of ListenSyslog has the following code:

if (!valid || !event.isValid())

So to get an NPE there means event must be null, and event comes from this 
code:

boolean valid = true;
try {
    event = parser.parseEvent(rawSyslogEvent.getData(), sender);
} catch (final ProcessException pe) {
    getLogger().warn("Failed to parse Syslog event; routing to invalid");
    valid = false;
}

The parser returns null if the bytes sent in are null or length 0.

We should be checking if (!valid || event == null || !event.isValid())
to avoid this case, and I think a similar situation exists in the
ParseSyslog processor. It appears this would only happen if parsing
messages is enabled in ListenSyslog.

Do you want to create a JIRA for this?

The other question is why you are ending up with these 0 length
messages, but that one I am not sure about. In the case of UDP, it's
just reading from a datagram channel into a byte buffer and passing
those bytes along, so I think it means it's receiving a 0-byte
datagram from the sender.

Thanks,

Bryan


On Tue, Apr 25, 2017 at 12:31 PM, Conrad Crampton
<conrad.cramp...@secdata.com> wrote:
> Hi,
>
> Been away for a bit from this community due to other work pressures, but
> picking up Nifi again and successfully upgraded to 1.1.2 (apart from
> screwing up one of the nodes temporarily).
>
> So, with the renewed interest in log processing our infrastructure team has
> put in an F5 load balancer to distribute the syslog traffic I am collecting
> to my 6 node cluster. This is to stop one node being the only workhorse for
> receiving syslog traffic. I had previously used the ‘standard’ pattern of
> having the ListenSyslog processor connect to a RPG and then the rest of my
> data processing flow receive via a local port – to effectively distribute
> the processing load. I was finding though that the single node was getting
> too many warnings about buffer, sockets being full etc. – hence the external
> load balancing.
>
>
>
> I am no load balancing expert, but what I believe happens is the F5 load
> balancer receives syslog traffic (over UDP) then distributes this load to
> all Nifi nodes (gives a bit of syslog traffic to each I believe). All
> appears fine, but then I start getting NPE in my node logs thus:
>
>
>
> 2017-04-25 17:16:34,832 ERROR [Timer-Driven Process Thread-7]
> o.a.n.processors.standard.ListenSyslog
> ListenSyslog[id=0a932c37-0158-1000--656754bf]
> ListenSyslog[id=0a932c37-0158-1000--656754bf] failed to process due
> to java.lang.NullPointerException; rolling back session:
> java.lang.NullPointerException
>
> 2017-04-25 17:16:34,833 ERROR [Timer-Driven Process Thread-7]
> o.a.n.processors.standard.ListenSyslog
>
> java.lang.NullPointerException: null
>
> at org.apache.nifi.processors.standard.ListenSyslog.onTrigger(ListenSyslog.java:431)
> ~[nifi-standard-processors-1.1.2.jar:1.1.2]
> at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> ~[nifi-api-1.1.2.jar:1.1.2]
> at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1099)
> [nifi-framework-core-1.1.2.jar:1.1.2]
> at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136)
> [nifi-framework-core-1.1.2.jar:1.1.2]
> at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
> [nifi-framework-core-1.1.2.jar:1.1.2]
> at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
> [nifi-framework-core-1.1.2.jar:1.1.2]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [na:1.8.0_51]
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> [na:1.8.0_51]
> at
>

NPE in ListenSyslog processor

2017-04-25 Thread Conrad Crampton
Hi,
Been away for a bit from this community due to other work pressures, but 
picking up Nifi again and successfully upgraded to 1.1.2 (apart from screwing 
up one of the nodes temporarily).
So, with the renewed interest in log processing our infrastructure team has put 
in an F5 load balancer to distribute the syslog traffic I am collecting to my 6 
node cluster. This is to stop one node being the only workhorse for receiving 
syslog traffic. I had previously used the ‘standard’ pattern of having the 
ListenSyslog processor connect to a RPG and then the rest of my data processing 
flow receive via a local port – to effectively distribute the processing load. 
I was finding though that the single node was getting too many warnings about 
buffer, sockets being full etc. – hence the external load balancing.

I am no load balancing expert, but what I believe happens is the F5 load 
balancer receives syslog traffic (over UDP) then distributes this load to all 
Nifi nodes (gives a bit of syslog traffic to each I believe). All appears fine, 
but then I start getting NPE in my node logs thus:

2017-04-25 17:16:34,832 ERROR [Timer-Driven Process Thread-7] 
o.a.n.processors.standard.ListenSyslog 
ListenSyslog[id=0a932c37-0158-1000--656754bf] 
ListenSyslog[id=0a932c37-0158-1000--656754bf] failed to process due to 
java.lang.NullPointerException; rolling back session: 
java.lang.NullPointerException
2017-04-25 17:16:34,833 ERROR [Timer-Driven Process Thread-7] 
o.a.n.processors.standard.ListenSyslog
java.lang.NullPointerException: null
at 
org.apache.nifi.processors.standard.ListenSyslog.onTrigger(ListenSyslog.java:431)
 ~[nifi-standard-processors-1.1.2.jar:1.1.2]
at 
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
 ~[nifi-api-1.1.2.jar:1.1.2]
at 
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1099)
 [nifi-framework-core-1.1.2.jar:1.1.2]
at 
org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136)
 [nifi-framework-core-1.1.2.jar:1.1.2]
at 
org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
 [nifi-framework-core-1.1.2.jar:1.1.2]
at 
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
 [nifi-framework-core-1.1.2.jar:1.1.2]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_51]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
[na:1.8.0_51]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 [na:1.8.0_51]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 [na:1.8.0_51]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_51]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_51]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
2017-04-25 17:16:34,833 ERROR [Timer-Driven Process Thread-7] 
o.a.n.processors.standard.ListenSyslog 
ListenSyslog[id=0a932c37-0158-1000--656754bf] 
ListenSyslog[id=0a932c37-0158-1000--656754bf] failed to process session 
due to java.lang.NullPointerException: java.lang.NullPointerException
2017-04-25 17:16:34,833 ERROR [Timer-Driven Process Thread-7] 
o.a.n.processors.standard.ListenSyslog
java.lang.NullPointerException: null
at 
org.apache.nifi.processors.standard.ListenSyslog.onTrigger(ListenSyslog.java:431)
 ~[na:na]
at 
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
 ~[nifi-api-1.1.2.jar:1.1.2]
at 
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1099)
 ~[nifi-framework-core-1.1.2.jar:1.1.2]
at 
org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136)
 [nifi-framework-core-1.1.2.jar:1.1.2]
   at 
org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
 [nifi-framework-core-1.1.2.jar:1.1.2]
at 
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
 [nifi-framework-core-1.1.2.jar:1.1.2]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_51]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
[na:1.8.0_51]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 [na:1.8.0_51]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 [na:1.8.0_51]
at 

Re: NPE MergeContent processor

2016-11-15 Thread Conrad Crampton
Hi Mark,
That is really good news. As to whether it sounds familiar – I tried so many 
things when I was upgrading from 0.6.1 to 1.0.0 that I couldn’t say whether this 
was indeed the cause. It may have been – I think, though, that whilst the outcome 
is probably the same, the exact cause may have been due to the aforementioned 
‘titting about’ ☺
Anyway, I was thinking of splitting my cluster in two anyway to make better use 
of the syslog ingestion (I think that will give me better throughput as I am 
seeing the syslog ingestion as a bottleneck with repeated warnings over full 
buffer), at which point I will delete the provenance repository anyway, which 
will get rid of this, won’t it? I’m assuming I can delete all repositories and 
just leave the flow.xml.gz to have the same starting point for the workflows?
Anyway, thanks again for pursuing this and once again I am incredibly impressed 
with the reaction to issues/ bugs etc. in this community.
Regards
Conrad

From: Mark Payne <marka...@hotmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Tuesday, 15 November 2016 at 18:09
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: NPE MergeContent processor


Conrad,



Good news - I have been able to replicate the issue and track down the problem. 
I created a JIRA to address it - 
https://issues.apache.org/jira/browse/NIFI-3040.

I have a PR up to address the issue. It looks like the problem is due to 
replaying a FlowFile from Provenance and then restarting NiFi before the 
replayed FlowFile has completed processing. Does that sound familiar?



In the case of MergeContent you'd see a NullPointerException. In other cases it 
will generally just complain because the UUID is null.



The issue has to do with the FlowFile not being properly persisted when a 
REPLAY event occurs. So if you still have the FlowFile that is causing this, 
you'd have to manually remove it from its queue to address the issue, but the 
issue shouldn't happen any more after this fix makes its way in.



Sorry that this has happened to you, but thanks for working with us to give us all we 
needed to investigate. And thanks for being patient as we've diagnosed and dug 
in.



Cheers

-Mark


From: Oleg Zhurakousky <ozhurakou...@hortonworks.com>
Sent: Friday, November 11, 2016 2:07 PM
To: users@nifi.apache.org
Subject: Re: NPE MergeContent processor

Sorry, I should have been more clear.
I’ve spent a considerable amount of time slicing and dicing this thing and, while 
I am still validating a few possibilities, this is most likely due to a FlowFile 
being rehydrated from the corrupted repo with a missing UUID; when such a file’s 
ID ends up as a parent/child of a ProvenanceEventRecord we get this issue.
Basically, a FlowFile must never exist without a UUID, similar to a provenance 
event record, where the existence of the UUID is validated during the call to 
build(). We should definitely do the same in a builder for FlowFile; even though 
it will not eliminate the issue, it may help to pinpoint its origin.
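
As a rough illustration of that builder-side validation (the class and field names
here are made up for the example, not the actual NiFi API):

final class FlowFileSketch {
    final String uuid;

    private FlowFileSketch(final String uuid) {
        this.uuid = uuid;
    }

    static final class Builder {
        private String uuid;

        Builder uuid(final String uuid) {
            this.uuid = uuid;
            return this;
        }

        FlowFileSketch build() {
            // Fail fast at build() time, mirroring the provenance event record's
            // UUID check, so a FlowFile can never be created without a UUID.
            if (uuid == null || uuid.isEmpty()) {
                throw new IllegalStateException("FlowFile must have a UUID");
            }
            return new FlowFileSketch(uuid);
        }
    }

    public static void main(final String[] args) {
        // Building without a UUID now fails loudly instead of producing a
        // corrupt record further downstream.
        FlowFileSketch ok = new Builder().uuid("1ef3e5a0-f8db-49eb-935d-ed3c991fd631").build();
        System.out.println(ok.uuid);
    }
}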

I’ll raise  the corresponding JIRA to improve FlowFile validation.

Cheers
Oleg

> On Nov 11, 2016, at 3:00 PM, Joe Witt <joe.w...@gmail.com> wrote:
>
> that said even if it is due to crashes or even disk full cases we
> should figure out what happened and make it not possible.  We must
> always work to eliminate the possibility of corruption causing events
> and work to recover well in the face of corruption...
>
> On Fri, Nov 11, 2016 at 2:57 PM, Oleg Zhurakousky
> <ozhurakou...@hortonworks.com> wrote:
>> Conrad
>>
>> Is it possible that you may be dealing with corrupted repositories (swap,
>> flow file etc.) due to your upgrades or may be even possible crashes?
>>
>> Cheers
>> Oleg
>>
>> On Nov 11, 2016, at 3:11 AM, Conrad Crampton <conrad.cramp...@secdata.com>
>> wrote:
>>
>> Hi,
>> This is the flow. The incoming flow is basically a syslog message which is
>> parsed, enriched then saved to HDFS
>> 1.   Parse (extracttext)
>> 2.   Assign matching parts to attributes
>> 3.   Enrich ip address location
>> 4.   Assign attributes with geoenrichment
>> 5.   Execute python script to parse useragent
>> 6.   Create json from attributes
>> 7.   Convert to avro (all strings)
>> 8.   Convert to target avro schema (had to do 7 & 8 due to bug(?) where
>> couldn’t go directly from json to avro with integers/longs)
>> 9.   Merge into bins (see props below)
>> 10.   Append ‘.avro’ to filenames (for reading in Spark subsequently)
>> 11.   Save to HDFS
>>
>> Does this help at all?
>> If you need anything else just shout.
>> Regards
>> Conrad
>>

Re: Nifi- PutEmail processor issue

2016-11-14 Thread Conrad Crampton
Hi,
Can’t you just loop the failure outcome back to the PutEmail processor – that 
way it will keep looping until sent. (or am I missing something?)
Regards
Conrad

From: "Gadiputi, Sravani" 
Reply-To: "users@nifi.apache.org" 
Date: Tuesday, 15 November 2016 at 04:57
To: "users@nifi.apache.org" 
Subject: Nifi- PutEmail processor issue


Hi,

I have used the PutEmail processor in my project to send email notifications for 
successful/failed copying of files.
Each file flow has a corresponding PutEmail to send an email notification to the 
respective recipients.

The issue is that sometimes the email notification is sent to the respective 
recipients successfully for a successful/failed job.
But sometimes, for one specific job, the email notification is not sent to the 
recipients even though the job is successful, due to the error below.

Error:

Could not connect to SMTP host
Java.net.ConnectException: Connection timed out

Could you please suggest how we can overcome this error?


Thanks,
Sravani



Re: NPE MergeContent processor

2016-11-10 Thread Conrad Crampton
Hi,
The processor continues to write (to HDFS – the next processor in flow) and 
doesn’t block any others coming into this processor (MergeContent), so not 
quite the same observed behaviour as NIFI-2015.
If there is anything else you would like me to do to help with this more than 
happy to help.
Regards
Conrad

From: Bryan Bende <bbe...@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Thursday, 10 November 2016 at 14:59
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: NPE MergeContent processor

Conrad,

Thanks for reporting this. I wonder if this is also related to:

https://issues.apache.org/jira/browse/NIFI-2015

Seems like there is some case where the UUID is ending up as null.

-Bryan


On Wed, Nov 9, 2016 at 11:57 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
Hi,
I saw this error after I upgraded to 1.0.0 but thought it was maybe due to the 
issues I had with that upgrade (entirely my fault it turns out!), but I have 
seen it a number of times since so I turned debugging on to get a better 
stacktrace. Relevant log section as below.
Nothing out of the ordinary, and I never saw this in v0.6.1 or below.
I would have raised a Jira issue, but after logging in to Jira it only let me 
create a service desk request (which didn’t sound right).
Regards
Conrad

2016-11-09 16:43:46,413 DEBUG [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=12c0bec7-68b7-3b60-a020-afcc7b4599e7] has chosen to yield its 
resources; will not be scheduled to run again for 1000 milliseconds
2016-11-09 16:43:46,414 DEBUG [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] Binned 42 FlowFiles
2016-11-09 16:43:46,418 INFO [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] Merged 
[StandardFlowFileRecord[uuid=5e846136-0a7a-46fb-be96-8200d5cdd33d,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=567158, 
length=2337],offset=0,name=17453303363322987,size=2337], 
StandardFlowFileRecord[uuid=a5f4bd55-82e3-40cb-9fa9-86b9e6816f67,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=573643, 
length=2279],offset=0,name=17453303351196175,size=2279], 
StandardFlowFileRecord[uuid=c1ca745b-660a-49cd-82e5-fa8b9a2f4165,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=583957, 
length=2223],offset=0,name=17453303531879367,size=2223], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=595617, 
length=2356],offset=0,name=,size=2356], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=705637, 
length=2317],offset=0,name=,size=2317], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=725376, 
length=2333],offset=0,name=,size=2333], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=728703, 
length=2377],offset=0,name=,size=2377]] into 
StandardFlowFileRecord[uuid=1ef3e5a0-f8db-49eb-935d-ed3c991fd631,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1478709819819-416, container=default, 
section=416], offset=982498, 
length=4576],offset=0,name=3649103647775837,size=4576]
2016-11-09 16:43:46,418 ERROR [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] failed to process session 
due to java.lang.NullPointerException: java.lang.NullPointerException
2016-11-09 16:43:46,422 ERROR [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent
java.lang.NullPointerException: null
at 
org.apache.nifi.stream.io.DataOutputStream.writeUTF(DataOutputStream.java:300)
 ~[nifi-utils-1.0.0.jar:1.0.0]
at 
org.apache.nifi.stream.io.DataOutputStream.writeUTF(DataOutputStream.java:281)
 ~[nifi-utils-1.0.0.jar:1.0.0]
at 
org.apache.nifi.provenance.StandardRecordWriter.writeUUID(StandardRecordWriter.java:257)
 ~[na:na]
at 
org.apache.nifi.provenance.StandardRecordWriter.writeUUIDs(StandardRecordWriter.java:266)
 ~[na:na]
at 
org.apache.nifi.provenance.StandardRe

NPE MergeContent processor

2016-11-09 Thread Conrad Crampton
Hi,
I saw this error after I upgraded to 1.0.0 but thought it was maybe due to the 
issues I had with that upgrade (entirely my fault it turns out!), but I have 
seen it a number of times since so I turned debugging on to get a better 
stacktrace. Relevant log section as below.
Nothing out of the ordinary, and I never saw this in v0.6.1 or below.
I would have raised a Jira issue, but after logging in to Jira it only let me 
create a service desk request (which didn’t sound right).
Regards
Conrad

2016-11-09 16:43:46,413 DEBUG [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=12c0bec7-68b7-3b60-a020-afcc7b4599e7] has chosen to yield its 
resources; will not be scheduled to run again for 1000 milliseconds
2016-11-09 16:43:46,414 DEBUG [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] Binned 42 FlowFiles
2016-11-09 16:43:46,418 INFO [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] Merged 
[StandardFlowFileRecord[uuid=5e846136-0a7a-46fb-be96-8200d5cdd33d,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=567158, 
length=2337],offset=0,name=17453303363322987,size=2337], 
StandardFlowFileRecord[uuid=a5f4bd55-82e3-40cb-9fa9-86b9e6816f67,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=573643, 
length=2279],offset=0,name=17453303351196175,size=2279], 
StandardFlowFileRecord[uuid=c1ca745b-660a-49cd-82e5-fa8b9a2f4165,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=583957, 
length=2223],offset=0,name=17453303531879367,size=2223], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=595617, 
length=2356],offset=0,name=,size=2356], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=705637, 
length=2317],offset=0,name=,size=2317], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=725376, 
length=2333],offset=0,name=,size=2333], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=728703, 
length=2377],offset=0,name=,size=2377]] into 
StandardFlowFileRecord[uuid=1ef3e5a0-f8db-49eb-935d-ed3c991fd631,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1478709819819-416, container=default, 
section=416], offset=982498, 
length=4576],offset=0,name=3649103647775837,size=4576]
2016-11-09 16:43:46,418 ERROR [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] failed to process session 
due to java.lang.NullPointerException: java.lang.NullPointerException
2016-11-09 16:43:46,422 ERROR [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent
java.lang.NullPointerException: null
at 
org.apache.nifi.stream.io.DataOutputStream.writeUTF(DataOutputStream.java:300) 
~[nifi-utils-1.0.0.jar:1.0.0]
at 
org.apache.nifi.stream.io.DataOutputStream.writeUTF(DataOutputStream.java:281) 
~[nifi-utils-1.0.0.jar:1.0.0]
at 
org.apache.nifi.provenance.StandardRecordWriter.writeUUID(StandardRecordWriter.java:257)
 ~[na:na]
at 
org.apache.nifi.provenance.StandardRecordWriter.writeUUIDs(StandardRecordWriter.java:266)
 ~[na:na]
at 
org.apache.nifi.provenance.StandardRecordWriter.writeRecord(StandardRecordWriter.java:232)
 ~[na:na]
at 
org.apache.nifi.provenance.PersistentProvenanceRepository.persistRecord(PersistentProvenanceRepository.java:766)
 ~[na:na]
at 
org.apache.nifi.provenance.PersistentProvenanceRepository.registerEvents(PersistentProvenanceRepository.java:432)
 ~[na:na]
at 
org.apache.nifi.controller.repository.StandardProcessSession.updateProvenanceRepo(StandardProcessSession.java:713)
 ~[nifi-framework-core-1.0.0.jar:1.0.0]
at 
org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:311)
 ~[nifi-framework-core-1.0.0.jar:1.0.0]
at 
org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:299)
 ~[nifi-framework-core-1.0.0.jar:1.0.0]
at 
org.apache.nifi.processor.util.bin.BinFiles.processBins(BinFiles.java:256) 
~[nifi-processor-utils-1.0.0.jar:1.0.0]
at 
org.apache.nifi.processor.util.bin.BinFiles.onTrigger(BinFiles.java:190) 

Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

2016-10-21 Thread Conrad Crampton
I’ve just tried to replicate but without success. Using a RPG with RAW transfer 
protocol.

If you want, I can try removing the additional lines in my nifi.properties (the 
ones for http.port and host that I added, which appeared to get this working).

Regards
Conrad


On 21/10/2016, 15:24, "Joe Witt" <joe.w...@gmail.com> wrote:

Whoa there can you turn on debug logging in conf/logback.xml by adding
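
(The logger element itself appears to have been stripped out of the archived 
message; a typical logback entry would look something like 
<logger name="org.apache.nifi.remote" level="DEBUG"/>, where the logger name is 
only an assumption about which package you would want to trace.)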

   

Can you submit a JIRA for the log output with the stacktrace for those
NPEs please.

Thanks
Joe

On Fri, Oct 21, 2016 at 10:21 AM, Conrad Crampton
<conrad.cramp...@secdata.com> wrote:
> Hi,
>
> I realise this may be getting boring for most but…
>
> I didn’t find any resolution to the upgrade so I have cleanly installed
> v1.0.0 and oddly experienced the same issue with RPGs in that although the
> RPGs didn’t show any errors (in so much as they had no warning on them and
> reported that S2S was secure) the errors reported were about not being able
> to determine other nodes in cluster.
>
> This is a section of log that showed an error that I don’t think I saw
> before but only including here in case one of you fine folks need it for any
> reason…
>
>
>
> ERROR [Site-to-Site Worker Thread-194]
> o.a.nifi.remote.SocketRemoteSiteListener Unable to communicate with remote
> instance Peer[url=nifi://nifinode3-cm1.mis.local:57289]
> 
(SocketFlowFileServerProtocol[CommsID=04674c10-7351-443f-99c8-bffc59d650a7])
> due to java.lang.NullPointerException; closing connection
>
> 2016-10-21 12:19:11,401 ERROR [Site-to-Site Worker Thread-195]
> o.a.nifi.remote.SocketRemoteSiteListener Unable to communicate with remote
> instance Peer[url=nifi://nifinode1-cm1.mis.local:35831]
> 
(SocketFlowFileServerProtocol[CommsID=97eb2f1c-f3dd-4924-89c9-d294bb4037f5])
> due to java.lang.NullPointerException; closing connection
>
> 2016-10-21 12:19:11,455 ERROR [Site-to-Site Worker Thread-196]
> o.a.nifi.remote.SocketRemoteSiteListener Unable to communicate with remote
> instance Peer[url=nifi://nifinode5-cm1.mis.local:59093]
> 
(SocketFlowFileServerProtocol[CommsID=61e129ca-2e21-477a-8201-16b905e5beb6])
> due to java.lang.NullPointerException; closing connection
>
> 2016-10-21 12:19:11,462 ERROR [Site-to-Site Worker Thread-197]
> o.a.nifi.remote.SocketRemoteSiteListener Unable to communicate with remote
> instance Peer[url=nifi://nifinode6-cm1.mis.local:37275]
> 
(SocketFlowFileServerProtocol[CommsID=48ec62f4-a9ba-45a7-a149-4892d0193819])
> due to java.lang.NullPointerException; closing connection
>
> 2016-10-21 12:19:11,470 ERROR [Site-to-Site Worker Thread-198]
> o.a.nifi.remote.SocketRemoteSiteListener Unable to communicate with remote
> instance Peer[url=nifi://nifinode4-cm1.mis.local:57745]
> 
(SocketFlowFileServerProtocol[CommsID=266459a8-7a9b-44bd-8047-b5ba4d9b67ec])
> due to java.lang.NullPointerException; closing connection
>
>
>
> in my naivety this suggests a problem with the socket (RAW) connection
> protocol. The relevant section for S2S connection in my nifi.properties is
>
> nifi.remote.input.socket.host=nifinode6-cm1.mis.local // the number
> different for each node obviously
>
> nifi.remote.input.socket.port=10443
>
> nifi.remote.input.secure=true
>
> nifi.remote.input.http.enabled=false
>
> nifi.remote.input.http.transaction.ttl=30 sec
>
>
>
> the errors suggest that the port specified here isn't being used, but some
> random ports are being used instead. Of course this is complete supposition
> and probably a red herring.
>
>
>
> Anyway, I updated my nifi.properties to
>
> nifi.remote.input.socket.host=nifinode6-cm1.mis.local
>
> nifi.remote.input.http.host=nifinode6-cm1.mis.local
>
> nifi.remote.input.socket.port=10443
>
> nifi.remote.input.http.port=11443
>
> nifi.remote.input.secure=true
>
> nifi.remote.input.http.enabled=true
>
> nifi.remote.input.http.transaction.ttl=30 sec
>
>
>
> and used HTTP for my RPG and is now working ok.
>
>
>
> The test harness for this is
>
>
>
> GenerateFlowFile->RPG(test input port)
>
> InputPort(test input)->LogAttribute
>
>
>
> Regards,
>
> Conrad
>
>
>
>
>
>
>
> From: Conrad Crampton <conrad.cramp...@secdata.com>

RE: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

2016-10-21 Thread Conrad Crampton
Hi,
I realise this may be getting boring for most but…
I didn’t find any resolution to the upgrade so I have cleanly installed v1.0.0 
and oddly experienced the same issue with RPGs in that although the RPGs didn’t 
show any errors (in so much as they had no warning on them and reported that 
S2S was secure) the errors reported were about not being able to determine 
other nodes in cluster.
This is a section of log that showed an error that I don’t think I saw before, 
but I am only including it here in case one of you fine folks need it for any reason…

ERROR [Site-to-Site Worker Thread-194] o.a.nifi.remote.SocketRemoteSiteListener 
Unable to communicate with remote instance 
Peer[url=nifi://nifinode3-cm1.mis.local:57289] 
(SocketFlowFileServerProtocol[CommsID=04674c10-7351-443f-99c8-bffc59d650a7]) 
due to java.lang.NullPointerException; closing connection
2016-10-21 12:19:11,401 ERROR [Site-to-Site Worker Thread-195] 
o.a.nifi.remote.SocketRemoteSiteListener Unable to communicate with remote 
instance Peer[url=nifi://nifinode1-cm1.mis.local:35831] 
(SocketFlowFileServerProtocol[CommsID=97eb2f1c-f3dd-4924-89c9-d294bb4037f5]) 
due to java.lang.NullPointerException; closing connection
2016-10-21 12:19:11,455 ERROR [Site-to-Site Worker Thread-196] 
o.a.nifi.remote.SocketRemoteSiteListener Unable to communicate with remote 
instance Peer[url=nifi://nifinode5-cm1.mis.local:59093] 
(SocketFlowFileServerProtocol[CommsID=61e129ca-2e21-477a-8201-16b905e5beb6]) 
due to java.lang.NullPointerException; closing connection
2016-10-21 12:19:11,462 ERROR [Site-to-Site Worker Thread-197] 
o.a.nifi.remote.SocketRemoteSiteListener Unable to communicate with remote 
instance Peer[url=nifi://nifinode6-cm1.mis.local:37275] 
(SocketFlowFileServerProtocol[CommsID=48ec62f4-a9ba-45a7-a149-4892d0193819]) 
due to java.lang.NullPointerException; closing connection
2016-10-21 12:19:11,470 ERROR [Site-to-Site Worker Thread-198] 
o.a.nifi.remote.SocketRemoteSiteListener Unable to communicate with remote 
instance Peer[url=nifi://nifinode4-cm1.mis.local:57745] 
(SocketFlowFileServerProtocol[CommsID=266459a8-7a9b-44bd-8047-b5ba4d9b67ec]) 
due to java.lang.NullPointerException; closing connection

In my naivety this suggests a problem with the socket (RAW) connection 
protocol. The relevant section for S2S connection in my nifi.properties is
nifi.remote.input.socket.host=nifinode6-cm1.mis.local // the number is different 
for each node, obviously
nifi.remote.input.socket.port=10443
nifi.remote.input.secure=true
nifi.remote.input.http.enabled=false
nifi.remote.input.http.transaction.ttl=30 sec

The errors suggest that the port specified here isn't being used, but some 
random ports are being used instead. Of course this is complete supposition and 
probably a red herring.

Anyway, I updated my nifi.properties to
nifi.remote.input.socket.host=nifinode6-cm1.mis.local
nifi.remote.input.http.host=nifinode6-cm1.mis.local
nifi.remote.input.socket.port=10443
nifi.remote.input.http.port=11443
nifi.remote.input.secure=true
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec

and used HTTP for my RPG and is now working ok.

The test harness for this is

GenerateFlowFile->RPG(test input port)
InputPort(test input)->LogAttribute

Regards,
Conrad



From: Conrad Crampton <conrad.cramp...@secdata.com>
Date: Friday, 21 October 2016 at 08:18
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

Hi,
Yes, the access policy for the input port at the root level of my workspace has 
S2S access policy (receive data via site to site) for all server nodes (I have 
all my nodes in one group).
For the next level down in my workspace (I have other process groups from my 
main ‘root’ page in the UI space to organise the flows), I have input ports on 
the next level down of flows which when I check the access policies on that for 
S2S, the receive and send data via site-to-site options are greyed out and if I 
try to override the policy, they still are. I don’t know if this is important, 
but from reading the post below, the issue that the access policies looks to 
address is different from my issue. The RPG has a locked padlock and says site 
to site is secure. I don’t have any warning triangles on the RPG itself, but I 
have the aforementioned warnings in my logs.

I’m going to abandon this now as I can’t get it to work.

I’m going to try a fresh install and go with that – and have to recreate all my 
flows where necessary. I’m moving to a new model of data ingestion anyway so 
isn’t as catastrophic as it might be.

Thanks for the help.
Conrad

From: Andy LoPresto <alopre...@apache.org>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Thursday, 20 October 2016 at 17:31
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

* 

Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

2016-10-21 Thread Conrad Crampton
Hi,
Yes, the access policy for the input port at the root level of my workspace has 
S2S access policy (receive data via site to site) for all server nodes (I have 
all my nodes in one group).
For the next level down in my workspace (I have other process groups from my 
main ‘root’ page in the UI space to organise the flows), I have input ports on 
the next level down of flows which when I check the access policies on that for 
S2S, the receive and send data via site-to-site options are greyed out and if I 
try to override the policy, they still are. I don’t know if this is important, 
but from reading the post below, the issue that the access policies looks to 
address is different from my issue. The RPG has a locked padlock and says site 
to site is secure. I don’t have any warning triangles on the RPG itself, but I 
have the aforementioned warnings in my logs.

I’m going to abandon this now as I can’t get it to work.

I’m going to try a fresh install and go with that – and have to recreate all my 
flows where necessary. I’m moving to a new model of data ingestion anyway so 
isn’t as catastrophic as it might be.

Thanks for the help.
Conrad

From: Andy LoPresto <alopre...@apache.org>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Thursday, 20 October 2016 at 17:31
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

* PGP Signed by an unknown key
Conrad,

For the site-to-site did you follow the instructions here [1]? Each node needs 
to be added as a user in order to make the connections.

[1] 
http://bryanbende.com/development/2016/08/30/apache-nifi-1.0.0-secure-site-to-site

Andy LoPresto
alopre...@apache.org<mailto:alopre...@apache.org>
alopresto.apa...@gmail.com<mailto:alopresto.apa...@gmail.com>
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On Oct 20, 2016, at 7:36 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:

Ok,
So I tried everything suggested so far to no avail unfortunately.

So what I have done is to create all new certs etc. using the toolkit, and updated 
my existing authorized-users.xml to match the full cert distinguished 
names CN=server.name, OU=NIFI etc.

Recreated all my remote process groups to not reference the original NCM as 
that still wouldn’t work – after a complete new install (upgrade).

So now what I have is a six node cluster using the original data/worker nodes and 
they are part of the cluster – all appears to be working, i.e. I can log into the 
UI (nice by the way ;-) on each server. There are no SSL handshake errors etc., 
BUT the RPGs (newly created) still don't appear to be working. I am getting

11:34:24 GMT  WARNING  e19ccf8e-0157-1000--63bfd9c0  nifi6-cm1.local:9443
org.apache.nifi.remote.client.PeerSelector@782af623 Unable to refresh Remote 
Group's peers due to Unable to communicate with remote NiFi cluster in order 
to determine which nodes exist in the remote cluster
11:34:25 GMT  WARNING  e19ccf8e-0157-1000--63bfd9c0  nifi1-cm1.local:9443
org.apache.nifi.remote.client.PeerSelector@3a547274 Unable to refresh Remote 
Group's peers due to Unable to communicate with remote NiFi cluster in order 
to determine which nodes exist in the remote cluster
11:34:25 GMT  WARNING  e1990203-0157-1000--9ff40dc0  nifi2-cm1.local:9443
org.apache.nifi.remote.client.PeerSelector@54c2df1 Unable to refresh Remote 
Group's peers due to Unable to communicate with remote NiFi cluster in order 
to determine which nodes exist in the remote cluster
11:34:25 GMT  WARNING  e1990203-0157-1000--9ff40dc0  nifi5-cm1.local:9443
org.apache.nifi.remote.client.PeerSelector@50d59f3c Unable to refresh Remote 
Group's peers due to Unable to communicate with remote NiFi cluster in order 
to determine which nodes exist in the remote cluster
11:34:26 GMT  WARNING  e19ccf8e-0157-1000--63bfd9c0  nifi2-cm1.local:9443
org.apache.nifi.remote.client.PeerSelector@97c92ef Unable to refresh Remote 
Group's peers due to Unable to communicate with remote NiFi cluster in order 
to determine which nodes exist in the remote cluster
11:34:26 GMT  WARNING  e1990203-0157-1000--9ff40dc0  nifi6-cm1.local:9443
org.apache.nifi.remote.client.PeerSelector@70663037 Unable to refresh Remote 
Group's peers due to Unable to communicate with remote NiFi cluster in order 
to determine which nodes exist in the remote cluster
11:34:27 GMT  WARNING  e1990203-0157-1000--9ff40dc0  nifi4-cm1.local:9443
org.apache.nifi.remote.client.PeerSelector@3c040426 Unable to refresh Remote 
Group's peers due to Unable to communicate with remote NiFi cluster in order 
to determine which nodes exist in the remote cluster

I can telnet from server to server on both https (UI) port and S2S port.
I am really at a loss as to what to do now.

Data is queuing up in my input processors with nowhere to go.
Do I have to do someth

Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

2016-10-20 Thread Conrad Crampton
l.com<mailto:alopresto.apa...@gmail.com>
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On Oct 19, 2016, at 1:03 PM, Bryan Bende 
<bbe...@gmail.com<mailto:bbe...@gmail.com>> wrote:

That is definitely weird that it is only an issue on the node that used to be 
the NCM.
Might be worth double checking the keystore and truststore of that one node and 
make sure it has what you would expect, and also double check nifi.properties 
compared to the others to see if anything seems different.

Changing all of the keystores, truststores, etc should be fine from a data 
perspective...

If you decide to go that route it would probably be easiest to start back over 
from a security perspective, meaning:
- Stop all the nodes and delete the users.xml and authorizations.xml from all 
nodes
- Configure authorizers.xml with the appropriate initial admin (or legacy file) 
and node identities based on the new certs
- Ensure authorizers.xml is the same on all nodes
- Then restart everything

Alternatively, you might be able to manually add users for all of the new certs 
before shutting everything down and give them the appropriate policies, then 
restart everything, but requires more manual work to get everything lined up.


On Wed, Oct 19, 2016 at 11:52 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
Hi,
As a plan for tomorrow – I have generated new keystores, truststores, client 
certs etc. for all nodes in my cluster using the

From: Bryan Bende <bbe...@gmail.com<mailto:bbe...@gmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Wednesday, 19 October 2016 at 15:33

To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups


Trying to think of things to check here...

Does every node have nifi.remote.input.secure=true in nifi.properties and the 
URL in the RPG is an https URL?

On Wed, Oct 19, 2016 at 10:25 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
One other thing…
The RPGs have an unlocked padlock on them saying S2S is not secure.
Conrad

From: Bryan Bende <bbe...@gmail.com<mailto:bbe...@gmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Wednesday, 19 October 2016 at 15:20
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

Ok that does seem like a TLS/SSL issue...

Is this a single cluster doing site-to-site to itself?

On Wed, Oct 19, 2016 at 10:06 AM, Joe Witt 
<joe.w...@gmail.com<mailto:joe.w...@gmail.com>> wrote:
thanks conrad - did get it.  Bryan is being more helpful than I so I
went silent :-)

On Wed, Oct 19, 2016 at 10:02 AM, Conrad Crampton
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
> Hi Joe,
> Yep,
> Tried removing the RPG that referenced the NCM and adding new one with 
> one of the nifis as url.
> That sort of worked, but kept getting errors about the NCM not being 
> available for the ports and therefore couldn’t actually enable the port I 
> needed to for that RPG.
> Thanks
> Conrad
>
> (sending again as don’t know if the stupid header ‘spoofed’ is stopping 
> getting through – apologies if already sent)
>
> On 19/10/2016, 14:12, "Joe Witt" 
> <joe.w...@gmail.com<mailto:joe.w...@gmail.com>> wrote:
>
> Conrad,
>
> For s2s now you can just point at any of the nodes in the cluster.
> Have you tried changing the URL or removing and adding new RPG
> entries?
>
> Thanks
> Joe
>
> On Wed, Oct 19, 2016 at 8:38 AM, Conrad Crampton
> <conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> 
> wrote:
> > Hi,
> >
> > I have finally taken the plunge to upgrade my cluster from 0.6.1 to 
> 1.0.0.
> >
> > 6 nodes with a NCM.
> >
> > With the removal of NCM in 1.0.0 I believe I now have an issue 
> where none of
> > my Remote Process Groups work as they previously did because they 
> were
> > configured to connect to the NCM (as the RPG url) which now doesn’t 
> exist.
> >
> > I have tried converting my NCM to a node but whilst I can get it 
> running
> &

Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

2016-10-19 Thread Conrad Crampton
Hi,
As a plan for tomorrow – I have generated new keystores, truststores, client 
certs etc. for all nodes in my cluster using the

From: Bryan Bende <bbe...@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Wednesday, 19 October 2016 at 15:33
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

Trying to think of things to check here...

Does every node have nifi.remote.input.secure=true in nifi.properties and the 
URL in the RPG is an https URL?

On Wed, Oct 19, 2016 at 10:25 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
One other thing…
The RPGs have an unlocked padlock on them saying S2S is not secure.
Conrad

From: Bryan Bende <bbe...@gmail.com<mailto:bbe...@gmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Wednesday, 19 October 2016 at 15:20
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

Ok that does seem like a TLS/SSL issue...

Is this a single cluster doing site-to-site to itself?

On Wed, Oct 19, 2016 at 10:06 AM, Joe Witt 
<joe.w...@gmail.com<mailto:joe.w...@gmail.com>> wrote:
thanks conrad - did get it.  Bryan is being more helpful than I so I
went silent :-)

On Wed, Oct 19, 2016 at 10:02 AM, Conrad Crampton
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
> Hi Joe,
> Yep,
> Tried removing the RPG that referenced the NCM and adding new one with 
> one of the datanodes as url.
> That sort of worked, but kept getting errors about the NCM not being 
> available for the ports and therefore couldn’t actually enable the port I 
> needed to for that RPG.
> Thanks
> Conrad
>
> (sending again as don’t know if the stupid header ‘spoofed’ is stopping 
> getting through – apologies if already sent)
>
> On 19/10/2016, 14:12, "Joe Witt" 
> <joe.w...@gmail.com<mailto:joe.w...@gmail.com>> wrote:
>
> Conrad,
>
> For s2s now you can just point at any of the nodes in the cluster.
> Have you tried changing the URL or removing and adding new RPG
> entries?
>
> Thanks
> Joe
>
> On Wed, Oct 19, 2016 at 8:38 AM, Conrad Crampton
> <conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> 
> wrote:
> > Hi,
> >
> > I have finally taken the plunge to upgrade my cluster from 0.6.1 to 
> 1.0.0.
> >
> > 6 nodes with a NCM.
> >
> > With the removal of NCM in 1.0.0 I believe I now have an issue 
> where none of
> > my Remote Process Groups work as they previously did because they 
> were
> > configured to connect to the NCM (as the RPG url) which now doesn’t 
> exist.
> >
> > I have tried converting my NCM to a node but whilst I can get it 
> running
> > (sort of) when I try and connect to the cluster I get something 
> like this in
> > my logs…
> >
> >
> >
> > 2016-10-19 13:14:44,109 ERROR [main] 
> o.a.nifi.controller.StandardFlowService
> > Failed to load flow from cluster due to:
> > org.apache.nifi.controller.UninheritableFlowException: Failed to 
> connect
> > node to cluster because local flow is different than cluster flow.
> >
> > org.apache.nifi.controller.UninheritableFlowException: Failed to 
> connect
> > node to cluster because local flow is different than cluster flow.
> >
> > at
> > 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:879)
> > ~[nifi-framework-core-1.0.0.jar:1.0.0]
> >
> > at
> > 
> org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:493)
> > ~[nifi-framework-core-1.0.0.jar:1.0.0]
> >
> > at
> > org.apache.nifi.web.server.JettyServer.start(JettyServer.java:746)
> > [nifi-jetty-1.0.0.jar:1.0.0]
> >
> > at org.apache.nifi.NiFi.(NiFi.java:152)
> > [nifi-runtime-1.0.0.jar:1.0.0]
> >
> > at org.apache.nifi.NiFi.m

Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

2016-10-19 Thread Conrad Crampton
Yes.
All 6 nodes and 1 NCM were the original set up.
After the upgrade I was not fussed whether I lost the NCM and went to just the 6 nodes or 
re-introduced the redundant NCM as a new node (for actually doing stuff). I had 
to go the latter route in the end as the RPGs were complaining about it not 
being there.
So the S2S stuff and RPG are only there to split the data after being received 
by primary node.
Thanks
Conrad

From: Bryan Bende <bbe...@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Wednesday, 19 October 2016 at 15:20
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

Ok that does seem like a TLS/SSL issue...

Is this a single cluster doing site-to-site to itself?

On Wed, Oct 19, 2016 at 10:06 AM, Joe Witt 
<joe.w...@gmail.com<mailto:joe.w...@gmail.com>> wrote:
thanks conrad - did get it.  Bryan is being more helpful than I so I
went silent :-)

On Wed, Oct 19, 2016 at 10:02 AM, Conrad Crampton
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
> Hi Joe,
> Yep,
> Tried removing the RPG that referenced the NCM and adding new one with 
> one of the datanodes as url.
> That sort of worked, but kept getting errors about the NCM not being 
> available for the ports and therefore couldn’t actually enable the port I 
> needed to for that RPG.
> Thanks
> Conrad
>
> (sending again as don’t know if the stupid header ‘spoofed’ is stopping 
> getting through – apologies if already sent)
>
> On 19/10/2016, 14:12, "Joe Witt" 
> <joe.w...@gmail.com<mailto:joe.w...@gmail.com>> wrote:
>
> Conrad,
>
> For s2s now you can just point at any of the nodes in the cluster.
> Have you tried changing the URL or removing and adding new RPG
> entries?
>
> Thanks
> Joe
>
> On Wed, Oct 19, 2016 at 8:38 AM, Conrad Crampton
> <conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> 
> wrote:
> > Hi,
> >
> > I have finally taken the plunge to upgrade my cluster from 0.6.1 to 
> 1.0.0.
> >
> > 6 nodes with a NCM.
> >
> > With the removal of NCM in 1.0.0 I believe I now have an issue 
> where none of
> > my Remote Process Groups work as they previously did because they 
> were
> > configured to connect to the NCM (as the RPG url) which now doesn’t 
> exist.
> >
> > I have tried converting my NCM to a node but whilst I can get it 
> running
> > (sort of) when I try and connect to the cluster I get something 
> like this in
> > my logs…
> >
> >
> >
> > 2016-10-19 13:14:44,109 ERROR [main] 
> o.a.nifi.controller.StandardFlowService
> > Failed to load flow from cluster due to:
> > org.apache.nifi.controller.UninheritableFlowException: Failed to 
> connect
> > node to cluster because local flow is different than cluster flow.
> >
> > org.apache.nifi.controller.UninheritableFlowException: Failed to 
> connect
> > node to cluster because local flow is different than cluster flow.
> >
> > at
> > 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:879)
> > ~[nifi-framework-core-1.0.0.jar:1.0.0]
> >
> > at
> > 
> org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:493)
> > ~[nifi-framework-core-1.0.0.jar:1.0.0]
> >
> > at
> > org.apache.nifi.web.server.JettyServer.start(JettyServer.java:746)
> > [nifi-jetty-1.0.0.jar:1.0.0]
> >
> > at org.apache.nifi.NiFi.(NiFi.java:152)
> > [nifi-runtime-1.0.0.jar:1.0.0]
> >
> > at org.apache.nifi.NiFi.main(NiFi.java:243)
> > [nifi-runtime-1.0.0.jar:1.0.0]
> >
> > Caused by: org.apache.nifi.controller.UninheritableFlowException: 
> Proposed
> > Authorizer is not inheritable by the flow controller because of 
> Authorizer
> > differences: Proposed Authorizations do not match current 
> Authorizations
> >
> > at
> > 
> org.apache.nifi.controller.StandardFlowSynchronizer.sync(StandardFlowSynchronizer.java:252)
> > ~[nifi-framework-core-1.0.0.jar:1.0.0]
> >
>  

Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

2016-10-19 Thread Conrad Crampton
Hi,
Ok, so I have now connected my (old NCM) server to the cluster, but now I am 
getting reports in the other servers’ logs that they can’t connect due to an SSL 
handshake exception…

2016-10-19 14:57:02,363 WARN [NiFi Site-to-Site Connection Pool Maintenance] 
o.apache.nifi.remote.client.PeerSelector 
org.apache.nifi.remote.client.PeerSelector@590db21a Unable to refresh Remote 
Group's peers due to Remote host closed connection during handshake
2016-10-19 14:57:02,379 WARN [NiFi Site-to-Site Connection Pool Maintenance] 
o.apache.nifi.remote.client.PeerSelector 
org.apache.nifi.remote.client.PeerSelector@4db0b1a7 Unable to refresh Remote 
Group's peers due to Remote host closed connection during handshake
2016-10-19 14:57:02,379 WARN [NiFi Site-to-Site Connection Pool Maintenance] 
o.apache.nifi.remote.client.PeerSelector 
org.apache.nifi.remote.client.PeerSelector@52ab2525 Unable to refresh Remote 
Group's peers due to Remote host closed connection during handshake

etc.etc.etc….

Prior to the upgrade all servers ‘talked’ to each other with no problem. This would 
suggest a problem with certs, but I can’t see why this would have been introduced as an 
issue by the upgrade.

For what it’s worth, I have followed the wiki article from way back about 
separating off new installs from a common root of directories (for repositories 
etc.).

Any other suggestions??
Thanks
Conrad

From: Bryan Bende <bbe...@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Wednesday, 19 October 2016 at 14:16
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

Regarding the error about the uninheritable flow caused by the "proposed 
authorizations do not match current authorizations"... that basically means 
that the node trying to connect has a different set of authorizations 
(users/groups/policies) from what the other nodes in the cluster have.
This most likely means that something in the authorizers.xml on the new node is 
different from the authorizers.xml on the other nodes, and thus generated 
different users/groups/policies on that node during start up.

The process to add a new node to an existing cluster would be the following...

- From the UI, add a user for the DN of the new node
- Go to the new node and configure authorizers.xml so that it does not have an 
initial admin, legacy file, or node identities; with none of these set it will 
inherit everything from the cluster
- Start the new node

Since you already attempted to start this node, you will want to stop it and 
delete users.xml and authorizations.xml before attempting the above process.
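
As a rough illustration (this assumes the default file-provider entry from a stock 1.0.0 conf/authorizers.xml – adjust identifiers and paths to your install), the joining node is left with the identity-seeding properties empty:

<authorizers>
    <authorizer>
        <identifier>file-provider</identifier>
        <class>org.apache.nifi.authorization.FileAuthorizer</class>
        <property name="Authorizations File">./conf/authorizations.xml</property>
        <property name="Users File">./conf/users.xml</property>
        <property name="Initial Admin Identity"></property>
        <property name="Legacy Authorized Users File"></property>
        <property name="Node Identity 1"></property>
    </authorizer>
</authorizers>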

Sorry if the documentation is not clear on this.


On Wed, Oct 19, 2016 at 9:12 AM, Joe Witt 
<joe.w...@gmail.com<mailto:joe.w...@gmail.com>> wrote:
Conrad,

For s2s now you can just point at any of the nodes in the cluster.
Have you tried changing the URL or removing and adding new RPG
entries?

Thanks
Joe

On Wed, Oct 19, 2016 at 8:38 AM, Conrad Crampton
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
> Hi,
>
> I have finally taken the plunge to upgrade my cluster from 0.6.1 to 1.0.0.
>
> 6 nodes with a NCM.
>
> With the removal of NCM in 1.0.0 I believe I now have an issue where none of
> my Remote Process Groups work as they previously did because they were
> configured to connect to the NCM (as the RPG url) which now doesn’t exist.
>
> I have tried converting my NCM to a node but whilst I can get it running
> (sort of) when I try and connect to the cluster I get something like this in
> my logs…
>
>
>
> 2016-10-19 13:14:44,109 ERROR [main] o.a.nifi.controller.StandardFlowService
> Failed to load flow from cluster due to:
> org.apache.nifi.controller.UninheritableFlowException: Failed to connect
> node to cluster because local flow is different than cluster flow.
>
> org.apache.nifi.controller.UninheritableFlowException: Failed to connect
> node to cluster because local flow is different than cluster flow.
>
> at
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:879)
> ~[nifi-framework-core-1.0.0.jar:1.0.0]
>
> at
> org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:493)
> ~[nifi-framework-core-1.0.0.jar:1.0.0]
>
> at
> org.apache.nifi.web.server.JettyServer.start(JettyServer.java:746)
> [nifi-jetty-1.0.0.jar:1.0.0]
>
> at org.apache.nifi.NiFi.(NiFi.java:152)
> [nifi-runtime-1.0.0.jar:1.0.0]
>
> at org.apache.nifi.NiFi.main(NiFi.java:243)
> [nifi-runtime-1.0.0.jar:1.0.0]
>
> Caused by: org.apache.nifi.controller.UninheritableFlowException: Proposed
> Authorizer is not inheritable by the flow controller because of Authorizer
> differences: P

Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

2016-10-19 Thread Conrad Crampton
Hi Joe,
Yep,
Tried removing the RPG that referenced the NCM and adding a new one with one 
of the data nodes as the URL. 
That sort of worked, but kept getting errors about the NCM not being 
available for the ports and therefore couldn’t actually enable the port I 
needed to for that RPG.
Thanks
Conrad

(sending again as I don’t know if the stupid ‘spoofed’ header is stopping it getting 
through – apologies if already sent)

On 19/10/2016, 14:12, "Joe Witt" <joe.w...@gmail.com> wrote:

Conrad,

For s2s now you can just point at any of the nodes in the cluster.
Have you tried changing the URL or removing and adding new RPG
entries?

Thanks
Joe

On Wed, Oct 19, 2016 at 8:38 AM, Conrad Crampton
<conrad.cramp...@secdata.com> wrote:
> Hi,
>
> I have finally taken the plunge to upgrade my cluster from 0.6.1 to 
1.0.0.
>
> 6 nodes with a NCM.
>
> With the removal of NCM in 1.0.0 I believe I now have an issue where 
none of
> my Remote Process Groups work as they previously did because they were
> configured to connect to the NCM (as the RPG url) which now doesn’t 
exist.
>
> I have tried converting my NCM to a node but whilst I can get it 
running
> (sort of) when I try and connect to the cluster I get something like 
this in
> my logs…
>
>
>
> 2016-10-19 13:14:44,109 ERROR [main] 
o.a.nifi.controller.StandardFlowService
> Failed to load flow from cluster due to:
> org.apache.nifi.controller.UninheritableFlowException: Failed to 
connect
> node to cluster because local flow is different than cluster flow.
>
> org.apache.nifi.controller.UninheritableFlowException: Failed to 
connect
> node to cluster because local flow is different than cluster flow.
>
> at
> 
org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:879)
> ~[nifi-framework-core-1.0.0.jar:1.0.0]
>
> at
> 
org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:493)
> ~[nifi-framework-core-1.0.0.jar:1.0.0]
>
> at
> org.apache.nifi.web.server.JettyServer.start(JettyServer.java:746)
> [nifi-jetty-1.0.0.jar:1.0.0]
>
> at org.apache.nifi.NiFi.(NiFi.java:152)
> [nifi-runtime-1.0.0.jar:1.0.0]
>
> at org.apache.nifi.NiFi.main(NiFi.java:243)
> [nifi-runtime-1.0.0.jar:1.0.0]
>
> Caused by: org.apache.nifi.controller.UninheritableFlowException: 
Proposed
> Authorizer is not inheritable by the flow controller because of 
Authorizer
> differences: Proposed Authorizations do not match current 
Authorizations
>
> at
> 
org.apache.nifi.controller.StandardFlowSynchronizer.sync(StandardFlowSynchronizer.java:252)
> ~[nifi-framework-core-1.0.0.jar:1.0.0]
>
> at
> 
org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1435)
> ~[nifi-framework-core-1.0.0.jar:1.0.0]
>
> at
> 
org.apache.nifi.persistence.StandardXMLFlowConfigurationDAO.load(StandardXMLFlowConfigurationDAO.java:83)
> ~[nifi-framework-core-1.0.0.jar:1.0.0]
>
> at
> 
org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:671)
> ~[nifi-framework-core-1.0.0.jar:1.0.0]
>
> at
> 
org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:857)
> ~[nifi-framework-core-1.0.0.jar:1.0.0]
>
> ... 4 common frames omitted
>
> 2016-10-19 13:14:44,414 ERROR [main] 
o.a.n.c.c.node.NodeClusterCoordinator
> Event Reported for ncm-cm1.mis-cds.local:9090 -- Node disconnected 
from
> cluster due to org.apache.nifi.controller.UninheritableFlowException: 
Failed
> to connect node to cluster because local flow is different than 
cluster
> flow.
>
> 2016-10-19 13:14:44,420 ERROR [Shutdown Cluster Coordinator]
> org.apache.nifi.NiFi An Unknown Error Occurred in Thread 
Thread[Shutdown
> Cluster Coordinator,5,main]: java.lang.NullPointerException
>
> 2016-10-19 13:14:44,423 ERROR [Shutdown Cluster Coordinator]
> org.apache.nifi.NiFi
>
> java.lang.NullPointerException: null
  

Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

2016-10-19 Thread Conrad Crampton
Hi Joe,
Yep,
Tried removing the RPG that referenced the NCM and adding a new one with one of 
the data nodes as the URL. 
That sort of worked, but kept getting errors about the NCM not being available 
for the ports and therefore couldn’t actually enable the port I needed to for 
that RPG.
Thanks
Conrad

On 19/10/2016, 14:12, "Joe Witt" <joe.w...@gmail.com> wrote:

Conrad,

For s2s now you can just point at any of the nodes in the cluster.
Have you tried changing the URL or removing and adding new RPG
entries?

Thanks
Joe

On Wed, Oct 19, 2016 at 8:38 AM, Conrad Crampton
<conrad.cramp...@secdata.com> wrote:
> Hi,
>
> I have finally taken the plunge to upgrade my cluster from 0.6.1 to 1.0.0.
>
> 6 nodes with a NCM.
>
> With the removal of NCM in 1.0.0 I believe I now have an issue where none 
of
> my Remote Process Groups work as they previously did because they were
> configured to connect to the NCM (as the RPG url) which now doesn’t exist.
>
> I have tried converting my NCM to a node but whilst I can get it running
> (sort of) when I try and connect to the cluster I get something like this 
in
> my logs…
>
>
>
> 2016-10-19 13:14:44,109 ERROR [main] 
o.a.nifi.controller.StandardFlowService
> Failed to load flow from cluster due to:
> org.apache.nifi.controller.UninheritableFlowException: Failed to connect
> node to cluster because local flow is different than cluster flow.
>
> org.apache.nifi.controller.UninheritableFlowException: Failed to connect
> node to cluster because local flow is different than cluster flow.
>
> at
> 
org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:879)
> ~[nifi-framework-core-1.0.0.jar:1.0.0]
>
> at
> 
org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:493)
> ~[nifi-framework-core-1.0.0.jar:1.0.0]
>
> at
> org.apache.nifi.web.server.JettyServer.start(JettyServer.java:746)
> [nifi-jetty-1.0.0.jar:1.0.0]
>
> at org.apache.nifi.NiFi.(NiFi.java:152)
> [nifi-runtime-1.0.0.jar:1.0.0]
>
> at org.apache.nifi.NiFi.main(NiFi.java:243)
> [nifi-runtime-1.0.0.jar:1.0.0]
>
> Caused by: org.apache.nifi.controller.UninheritableFlowException: Proposed
> Authorizer is not inheritable by the flow controller because of Authorizer
> differences: Proposed Authorizations do not match current Authorizations
>
> at
> 
org.apache.nifi.controller.StandardFlowSynchronizer.sync(StandardFlowSynchronizer.java:252)
> ~[nifi-framework-core-1.0.0.jar:1.0.0]
>
> at
> 
org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1435)
> ~[nifi-framework-core-1.0.0.jar:1.0.0]
>
> at
> 
org.apache.nifi.persistence.StandardXMLFlowConfigurationDAO.load(StandardXMLFlowConfigurationDAO.java:83)
> ~[nifi-framework-core-1.0.0.jar:1.0.0]
>
> at
> 
org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:671)
> ~[nifi-framework-core-1.0.0.jar:1.0.0]
>
> at
> 
org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:857)
> ~[nifi-framework-core-1.0.0.jar:1.0.0]
>
> ... 4 common frames omitted
>
> 2016-10-19 13:14:44,414 ERROR [main] o.a.n.c.c.node.NodeClusterCoordinator
> Event Reported for ncm-cm1.mis-cds.local:9090 -- Node disconnected from
> cluster due to org.apache.nifi.controller.UninheritableFlowException: 
Failed
> to connect node to cluster because local flow is different than cluster
> flow.
>
> 2016-10-19 13:14:44,420 ERROR [Shutdown Cluster Coordinator]
> org.apache.nifi.NiFi An Unknown Error Occurred in Thread Thread[Shutdown
> Cluster Coordinator,5,main]: java.lang.NullPointerException
>
> 2016-10-19 13:14:44,423 ERROR [Shutdown Cluster Coordinator]
> org.apache.nifi.NiFi
>
> java.lang.NullPointerException: null
>
> at
> java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
> ~[na:1.8.0_51]
>
> at
> java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
> ~[na:1.8.0_51]
>
> at
> 
org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.updateNodeStatus(NodeClusterCoordinator.java:570)
> ~[nifi-framework-cluster-1.0.0.jar:1.0.0]
>
> at
>

Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

2016-10-19 Thread Conrad Crampton
Hi,
I have finally taken the plunge to upgrade my cluster from 0.6.1 to 1.0.0.
6 nodes with a NCM.
With the removal of NCM in 1.0.0 I believe I now have an issue where none of my 
Remote Process Groups work as they previously did because they were configured 
to connect to the NCM (as the RPG url) which now doesn’t exist.
I have tried converting my NCM to a node but whilst I can get it running (sort 
of) when I try and connect to the cluster I get something like this in my logs…

2016-10-19 13:14:44,109 ERROR [main] o.a.nifi.controller.StandardFlowService 
Failed to load flow from cluster due to: 
org.apache.nifi.controller.UninheritableFlowException: Failed to connect node 
to cluster because local flow is different than cluster flow.
org.apache.nifi.controller.UninheritableFlowException: Failed to connect node 
to cluster because local flow is different than cluster flow.
at 
org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:879)
 ~[nifi-framework-core-1.0.0.jar:1.0.0]
at 
org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:493)
 ~[nifi-framework-core-1.0.0.jar:1.0.0]
at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:746) 
[nifi-jetty-1.0.0.jar:1.0.0]
at org.apache.nifi.NiFi.(NiFi.java:152) 
[nifi-runtime-1.0.0.jar:1.0.0]
at org.apache.nifi.NiFi.main(NiFi.java:243) 
[nifi-runtime-1.0.0.jar:1.0.0]
Caused by: org.apache.nifi.controller.UninheritableFlowException: Proposed 
Authorizer is not inheritable by the flow controller because of Authorizer 
differences: Proposed Authorizations do not match current Authorizations
at 
org.apache.nifi.controller.StandardFlowSynchronizer.sync(StandardFlowSynchronizer.java:252)
 ~[nifi-framework-core-1.0.0.jar:1.0.0]
at 
org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1435) 
~[nifi-framework-core-1.0.0.jar:1.0.0]
at 
org.apache.nifi.persistence.StandardXMLFlowConfigurationDAO.load(StandardXMLFlowConfigurationDAO.java:83)
 ~[nifi-framework-core-1.0.0.jar:1.0.0]
at 
org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:671)
 ~[nifi-framework-core-1.0.0.jar:1.0.0]
at 
org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:857)
 ~[nifi-framework-core-1.0.0.jar:1.0.0]
... 4 common frames omitted
2016-10-19 13:14:44,414 ERROR [main] o.a.n.c.c.node.NodeClusterCoordinator 
Event Reported for ncm-cm1.mis-cds.local:9090 -- Node disconnected from cluster 
due to org.apache.nifi.controller.UninheritableFlowException: Failed to connect 
node to cluster because local flow is different than cluster flow.
2016-10-19 13:14:44,420 ERROR [Shutdown Cluster Coordinator] 
org.apache.nifi.NiFi An Unknown Error Occurred in Thread Thread[Shutdown 
Cluster Coordinator,5,main]: java.lang.NullPointerException
2016-10-19 13:14:44,423 ERROR [Shutdown Cluster Coordinator] 
org.apache.nifi.NiFi
java.lang.NullPointerException: null
at 
java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011) 
~[na:1.8.0_51]
at 
java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006) 
~[na:1.8.0_51]
at 
org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.updateNodeStatus(NodeClusterCoordinator.java:570)
 ~[nifi-framework-cluster-1.0.0.jar:1.0.0]
at 
org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.shutdown(NodeClusterCoordinator.java:119)
 ~[nifi-framework-cluster-1.0.0.jar:1.0.0]
at 
org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:330)
 ~[nifi-framework-core-1.0.0.jar:1.0.0]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_51]
2016-10-19 13:14:44,448 WARN [main] o.a.n.c.l.e.CuratorLeaderElectionManager 
Failed to close Leader Selector for Cluster Coordinator
java.lang.IllegalStateException: Already closed or has not been started
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:173) 
~[guava-18.0.jar:na]
at 
org.apache.curator.framework.recipes.leader.LeaderSelector.close(LeaderSelector.java:270)
 ~[curator-recipes-2.11.0.jar:na]
at 
org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.stop(CuratorLeaderElectionManager.java:159)
 ~[nifi-framework-core-1.0.0.jar:1.0.0]
at 
org.apache.nifi.controller.FlowController.shutdown(FlowController.java:1303) 
[nifi-framework-core-1.0.0.jar:1.0.0]
at 
org.apache.nifi.controller.StandardFlowService.stop(StandardFlowService.java:339)
 [nifi-framework-core-1.0.0.jar:1.0.0]
at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:753) 
[nifi-jetty-1.0.0.jar:1.0.0]
at org.apache.nifi.NiFi.(NiFi.java:152) 
[nifi-runtime-1.0.0.jar:1.0.0]
at org.apache.nifi.NiFi.main(NiFi.java:243) 
[nifi-runtime-1.0.0.jar:1.0.0]
2016-10-19 13:14:45,062 WARN [Cluster Socket 

Re: How to deal with decimals while they're not supported?

2016-08-12 Thread Conrad Crampton
Excellent.
Again, I haven’t tried it, but you could possibly calculate the scaling factor by 
taking the length of the string after the decimal point. I haven’t really worked 
this through and it may get very complicated.
Alternatively you could do another substring on the decimal portion and limit 
it to just two decimal places for example.
Regards
Conrad
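
For example (untested, and assuming a non-negative attribute tested against 0.1), scaling to two decimal places might look like:

${myAttr:substringBefore('.'):toNumber():equals(0):and(
    ${myAttr:substringAfter('.'):append('00'):substring(0, 2):toNumber():lt(10)}
)}

i.e. require the integer part to be 0 and the first two fractional digits, read as a number scaled by 100, to be less than 10.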

From: Stéphane Maarek <stephane.maa...@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Friday, 12 August 2016 at 08:04
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: How to deal with decimals while they're not supported?

That works :) That implies knowing a bit about the scale of numbers, but that's 
a decent workaround. It'd be so great if decimals were supported natively 
though. Or at least the casting of decimal to int without having to remove the 
decimal, 
i.e. (1.2):toNumber() instead of (1.2):toString():substringBefore('.'):toNumber()

On Fri, Aug 12, 2016 at 4:59 PM Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
Hi,
Can you coerce to a string, take the numbers after the decimal point and use 
that? So it’s similar logic to multiplying by 10 and then doing the comparison. 
Obviously you have to revert back to a number to do the actual comparison though.
Haven’t tested but something like…
myAttr:toString():substringAfter(‘.’):toNumber():lt(1)
Conrad

From: Stéphane Maarek 
<stephane.maa...@gmail.com<mailto:stephane.maa...@gmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Friday, 12 August 2016 at 01:16
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: How to deal with decimals while they're not supported?

Hi,

I have a flow in which I extract an attribute from json using jsonpath. That 
attribute happens to be a decimal number (0.123). I wanted to do a simple 
operation such as myAttr:lt(0.1) but obviously that won't work. What also won't 
work is myAttr:multiply(10):lt(1). I'm kinda stuck and I really need this logic 
to be working. What do you advise as a workaround?

Also, I've seen there is a JIRA for this: 
https://issues.apache.org/jira/browse/NIFI-1662 but stuff hasn't moved much 
since it first appeared. Not sure if it got de-prioritized or something

Congrats on the 1.0.0 beta, it looks great !!

Cheers,
Stephane




Re: How to deal with decimals while they're not supported?

2016-08-12 Thread Conrad Crampton
Hi,
Can you coerce to a string, take the numbers after the decimal point and use 
that? So it’s similar logic to multiplying by 10 and then doing the comparison. 
Obviously you have to revert back to a number to do the actual comparison though.
Haven’t tested but something like…
myAttr:toString():substringAfter(‘.’):toNumber():lt(1)
Conrad

From: Stéphane Maarek 
Reply-To: "users@nifi.apache.org" 
Date: Friday, 12 August 2016 at 01:16
To: "users@nifi.apache.org" 
Subject: How to deal with decimals while they're not supported?

Hi,

I have a flow in which I extract an attribute from json using jsonpath. That 
attribute happens to be a decimal number (0.123). I wanted to do a simple 
operation such as myAttr:lt(0.1) but obviously that won't work. What also won't 
work is myAttr:multiply(10):lt(1). I'm kinda stuck and I really need this logic 
to be working. What do you advise as a workaround?

Also, I've seen there is a JIRA for this: 
https://issues.apache.org/jira/browse/NIFI-1662 but stuff hasn't moved much 
since it first appeared. Not sure if it got de-prioritized or something

Congrats on the 1.0.0 beta, it looks great !!

Cheers,
Stephane




Re: SPOOFED: Re: ExecuteSQL question

2016-08-08 Thread Conrad Crampton
And a final update…
It turns out that ExecuteSQL doesn’t distribute processing across the nodes in 
the cluster – not surprising really when I think about it – so the refinement to 
the below was to just have the text file on one node so that the flow effectively 
only runs on that node. Of course I could have set the ExecuteSQL processor 
to run on the Primary Node only, but I fear that node would become overloaded 
as I have a bunch of other stuff that runs there exclusively. 

I don’t know if it is on the roadmap, but it would be useful to be able to indicate 
that a processor should run on a specific node in the cluster other than just the 
primary (so manual load balancing could be under the control of the user). I think 
this has been discussed before though.

Conrad

On 05/08/2016, 15:35, "Conrad Crampton" <conrad.cramp...@secdata.com> wrote:

Thanks for the input on this.
As a follow-up this is what I have done…

Created a text file which contains a single value – the lowest ID of the 
data I am retrieving – saved on each node of the cluster.
Then
GetFile (to read file)
ExtractText (to get the value – and set to attribute)
UpdateAttribute (to create another attribute which is the upper bound of 
the ID to use in the SQL, i.e. lower <= ID < upper)
ExecuteSql (using the attribute as above as parameters) – using DBPool for 
connection
Branch1
MergeContent
PutHDFS
Branch2
ReplaceText (replace flowfile content with upper attribute from above)
PutFile (same filename as came in)

Rinse…Repeat
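
A rough sketch of how the attributes drive the query (the table and attribute names here are invented, not the real ones):

SELECT *
FROM my_business_view_join
WHERE id >= ${lower.id} AND id < ${upper.id}

with lower.id set by ExtractText from the file contents and upper.id set in UpdateAttribute, e.g. ${lower.id:plus(10000)}.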

It appears to be working OK. The issue is that if ExecuteSQL fails then, as I have 
it now, the original file is deleted – probably better to have it moved instead so 
I always have a backup of the last ID used.

Conrad


On 03/08/2016, 15:34, "Yohann Lepage" <yoh...@lepage.info> wrote:

Hi,

I have exactly the same use case to periodically get rows from some
security appliances with just a read only access.

Currently (without NiFi), we use an SQL query  to track the maximum
value, depending on the DB/appliance/vendor, it could be a simple
"SELECT getdate()" or  "select max(SW_TIME) from log_table" or  a more
complex query with INNER JOIN.

So, it would be great to have an option to customize the tracking query.

    Regards
    
2016-08-03 16:02 GMT+02:00 Conrad Crampton 
<conrad.cramp...@secdata.com>:
> Hi,
> Thanks for this.
> I did think about a MV but unfortunately I haven’t access to create 
views – just read access. That would have been my simplest option ;-) Life’s 
never that easy though is it?
> The only part of the sql I need to be dynamic is the date parameter 
(I could even use the id column). Instead of using the MapCache (if that isn’t 
a good idea), could I use the GetFile to just pull a single txt file with the 
parameter (i.e. last run or max id value from the last run), which creates 
flowfile, read that value and pass that into ExecuteSql (using the 
aforementioned value as the parameter in the sql) as the select query can be 
dynamically constructed from attributes/ flowfile content (as per docs)? And 
then finally write text file back to file system to be picked up next time?
> Thanks
> Conrad
>
> On 03/08/2016, 14:02, "Matt Burgess" <mattyb...@gmail.com> wrote:
>
> Conrad,
>
> Is it possible to add a view (materialized or not) to the RDBMS? 
That
> view could take care of the denormalization and then
> QueryDatabaseTable could point at the view. The DB would take 
care of
> the push-down filters, which functionally is like if you had a
> QueryDatabaseTable for each table then did the joins.
>
> In NiFi 1.0 there is a GenerateTableFetch processor which is like
> QueryDatabaseTable except it generates SQL instead of executing 
SQL.
> That might be used in your a) option above but you'd have to 
reconcile
> the SQL statements into a JOIN. A possible improvement to 
either/both
> processors would be to add attributes for the maximum value 
columns
> whose values are the maximum observed values. Then you wouldn't 
have
> to parse or manipulate the SQL if you really just want the max 
values.
>
> I have been thinking about how QueryDatabaseTable and
> GenerateTableFetch would work if they accepted incoming flow 
files (to
> allow dynamic table names for example). It's a bit more tricky 
because
> those processors run without input to get max valu

Re: ExecuteSQL question

2016-08-03 Thread Conrad Crampton
Hi, 
Thanks for this.
I did think about a MV but unfortunately I haven’t access to create views – 
just read access. That would have been my simplest option ;-) Life’s never that 
easy though is it?
The only part of the sql I need to be dynamic is the date parameter (I could 
even use the id column). Instead of using the MapCache (if that isn’t a good 
idea), could I use the GetFile to just pull a single txt file with the 
parameter (i.e. last run or max id value from the last run), which creates 
flowfile, read that value and pass that into ExecuteSql (using the 
aforementioned value as the parameter in the sql) as the select query can be 
dynamically constructed from attributes/ flowfile content (as per docs)? And 
then finally write text file back to file system to be picked up next time?
Thanks
Conrad

On 03/08/2016, 14:02, "Matt Burgess" <mattyb...@gmail.com> wrote:

Conrad,

Is it possible to add a view (materialized or not) to the RDBMS? That
view could take care of the denormalization and then
QueryDatabaseTable could point at the view. The DB would take care of
the push-down filters, which functionally is like if you had a
QueryDatabaseTable for each table then did the joins.

In NiFi 1.0 there is a GenerateTableFetch processor which is like
QueryDatabaseTable except it generates SQL instead of executing SQL.
That might be used in your a) option above but you'd have to reconcile
the SQL statements into a JOIN. A possible improvement to either/both
processors would be to add attributes for the maximum value columns
whose values are the maximum observed values. Then you wouldn't have
to parse or manipulate the SQL if you really just want the max values.

I have been thinking about how QueryDatabaseTable and
GenerateTableFetch would work if they accepted incoming flow files (to
allow dynamic table names for example). It's a bit more tricky because
those processors run without input to get max values, so their
behavior would change when a flow file is present but would return to
the original behavior if no flow file is present. Since the
ListDatabaseTables processor is also in 1.0, it would be nice to use
that as input to the other two processors.

I'm definitely interested in any thoughts or discussion around these things 
:)

Regards,
Matt

On Wed, Aug 3, 2016 at 8:37 AM, Conrad Crampton
<conrad.cramp...@secdata.com> wrote:
> Hi,
>
> My use case is that I want to ship a load of rows from an RDBMS 
periodically
> and put in HDFS as Avro.
>
> QueryTable processor has functionality that would be great i.e. maxcolumn
> value (there are couple of columns I could use for this from the data) and
> it is this functionality I am looking for, BUT the data is not from one
> single table. The nature of the RDBMS is that the business view on the 
data
> requires a bunch of joins from other tables/schemas to get the correct 
Avro
> file so the options I appear to have are
>
> a)   Use QueryTable for each table that make up the business view and 
do
> the joins etc. in HDFS (Spark or something) – or potentially do the
> reconciliation within NiFi???
>
> b)   Use ExecuteSQL to run the complete SQL to get the rows which can
> easily be put into HDFS as Avro given that the line will be the business
> (denormalised) data that is required.
>
> The problem with a) is the reconciliation (denormalisation) of the data 
and
> the problem with b) is how to maintain the maxcolumn value so I only get 
the
> data since the last run.
>
>
>
> In order to address b) can I use the DistributedMapCacheServer & Client to
> hold a key/value pair of last run date and extract from this date as a
> parameter?
>
>
>
> Thanks for any suggestions.
>
>
>
> Conrad
>
>
>

ExecuteSQL question

2016-08-03 Thread Conrad Crampton
Hi,
My use case is that I want to ship a load of rows from an RDBMS periodically and 
put in HDFS as Avro.
The QueryDatabaseTable processor has functionality that would be great, i.e. max column 
value (there are a couple of columns I could use for this from the data), and it is this 
functionality I am looking for, BUT the data is not from one single table. The 
nature of the RDBMS is that the business view on the data requires a bunch of 
joins across other tables/schemas to get the correct Avro file, so the options I 
appear to have are

a)   Use QueryDatabaseTable for each table that makes up the business view and do 
the joins etc. in HDFS (Spark or something) – or potentially do the 
reconciliation within NiFi???

b)   Use ExecuteSQL to run the complete SQL to get the rows which can 
easily be put into HDFS as Avro given that the line will be the business 
(denormalised) data that is required.
The problem with a) is the reconciliation (denormalisation) of the data and the 
problem with b) is how to maintain the maxcolumn value so I only get the data 
since the last run.

In order to address b) can I use the DistributedMapCacheServer & Client to hold 
a key/value pair of last run date and extract from this date as a parameter?

Thanks for any suggestions.

Conrad




Re: Syslog to avro format

2016-07-26 Thread Conrad Crampton
Why not convert to JSON?
I do exactly this: parse the syslog (into attributes), convert the attributes to 
JSON, then JSON -> Avro.
I had to have an intermediate Avro schema that was all strings, due to a 
problem converting JSON integers into the equivalent Avro types, then convert from 
that schema to the final one (which included ints).
HTH,
Conrad
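
For illustration only (field names invented), the intermediate all-strings schema looks something like:

{ "type": "record", "name": "SyslogEvent", "fields": [
    { "name": "syslog_timestamp", "type": "string" },
    { "name": "syslog_hostname",  "type": "string" },
    { "name": "syslog_priority",  "type": "string" }
] }

and the final schema is identical except that fields like syslog_priority become "int".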

From: Madhukar Thota 
Reply-To: "users@nifi.apache.org" 
Date: Tuesday, 26 July 2016 at 09:52
To: "users@nifi.apache.org" 
Subject: Syslog to avro format

Friends,

What is the best way to get Syslog data into avro format without converting to 
JSON?

Any suggestions?





Re: Nifi & Parsey McParseface! RegEx in a Processor...

2016-06-06 Thread Conrad Crampton
Hi,
This may be a long shot as I don’t know how many combinations of the column 
lengths with | and + there are, but you could try using the ReplaceTextWithMapping 
processor, where you have all the combinations of +--, | etc. in a text file with what 
they represent in terms of counts, e.g.
+--   [0]
|  +--   [1]
|  +--   [3]

etc. (tab separated)

Also, I’m not particularly experienced in the area of sed, awk etc., but I’m 
guessing some bash guru would be able to come up with some sort of script that 
does this that could be called from the ExecuteScript processor.

Regards
Conrad

From: Pat Trainor 
Reply-To: "users@nifi.apache.org" 
Date: Sunday, 5 June 2016 at 18:33
To: "users@nifi.apache.org" 
Subject: Nifi & Parsey McParseface! RegEx in a Processor...

I have had success with using the ReplaceText processor out of the box to modify 
the output of a NiFi-called script. I'm applying NiFi to running the Parsey 
McParseface system (SyntaxNet) from Google. The output of the application looks 
like this:

---
Input: It is to two English scholars , father and son , Edward Pococke , senior 
and junior , that the world is indebted for the knowledge of one of the most 
charming productions Arabian philosophy can boast of .
Parse:
is VBZ ROOT
+-- It PRP nsubj
+-- to IN prep
|   +-- scholars NNS pobj
|   +-- two CD num
|   +-- English JJ amod
|   +-- , , punct
|   +-- father NN conj
|   |   +-- and CC cc
|   |   +-- son NN conj
|   +-- Pococke NNP appos
[...]
---

As you can see, my ExecuteProcessorStream is working fine. But there is an important 
piece of information that needs to be taken from this text. The ReplaceText processor 
I used (the first one) is shown in the attached. It only removes characters.

How many 'spaces' precede each of the '+' signs is important. Simply removing 
leading spaces, '+' and '|' characters moves the first word in each line to the 
first column, without telling you how many columns over the words started at in 
the original input.

What is needed is a way to count the number of columns at the beginning of each 
line that precede the first alphanumeric character. It doesn't matter if the same 
processor also cleans things up, as in my present efforts:

Input: It is to two English scholars , father and son , Edward Pococke , senior 
and junior , that the world is indebted for the knowledge of one of the most 
charming productions Arabian philosophy can boast of .
Parse:
is VBZ ROOT
It PRP nsubj
to IN prep
[...]

I am hoping to somehow use the expressions (a la ${line:blah...) in Nifi, or 
another mechanism I'm not aware of, to gather the column count, making it 
available for later processing/storage.

[0]is VBZ ROOT
[1]It PRP nsubj
[1]to IN prep
[2] ...

With the [X] being the # of columns over from the left that the alpha-numeric 
character was.

The reasoning for this is that the position signifies how 'important' that 
attribute is in the sentence. It looks like a tree, but the number (indentation) 
is the length of the branch the word is on.

Is there a clever way to accomplish most/all of this, either with () regex or 
named attributes, in Nifi?

Thanks!
pat
( ͡° ͜ʖ ͡°)

"A wise man can learn more from a foolish question than a fool can learn from a 
wise answer". ~ Bruce Lee.




Re: Spark or custom processor?

2016-06-03 Thread Conrad Crampton
Andre, helpful comments – I did consider the logstash part you describe a few 
weeks ago for another use case, but I can see why you are using it for this. 
When MiNiFi comes along, I may consider using that to mimic your topology.
Thanks
Conrad

From: Andre <andre-li...@fucs.org>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Friday, 3 June 2016 at 00:26
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Spark or custom processor?


Conrad,

Your work stream is very similar to mine. NiFi will work OK by itself, without 
the need for Spark (keep the Spark option there, but for other types of processing).

What we do is:

Syslog -> local disk -> logstash-forwarder (tail) -> ListenLumberjack processor
(PR290 -  experimental and not yet merged) -> ParseSyslog -> BlackMagicStuff

The reason we do it this way is to decouple the data flow from blocking 
mechanisms such as RELP and Lumberjack; no matter what happens with NiFi 
cluster, you still have a copy of the data for replay.
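
From memory, the logstash-forwarder side is just a small JSON config pointing the tail at the NiFi listener (the host, port, cert path and log paths below are made up):

{
  "network": {
    "servers": [ "nifi-node-1:5043" ],
    "ssl ca": "/etc/pki/tls/certs/lumberjack-ca.crt",
    "timeout": 15
  },
  "files": [
    { "paths": [ "/var/log/syslog-archive/*.log" ] }
  ]
}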

This is particularly relevant in environments where you would use TCP syslog or 
any other protocol that can block if unable to push log messages (search for 
tcp syslog causing an outage to Atlassian cloud a few years ago).

We are not an Internet scale shop but still have enough logs to make a SIEM 
suffer and in our opinion NiFi is able to perform well.

For load balancing, any session-based TCP load balancer will help you utilise all 
your NiFi nodes.

Cheers
On 2 Jun 2016 23:28, "Conrad Crampton" 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
Hi,
ListenSyslog (using the approach that is being discussed currently in another 
thread – ListenSyslog running on primary node as an RPG, all other nodes 
connecting to the port that the RPG exposes).
Various enrichment, routing on attributes etc. and finally into HDFS as Avro.
I want to branch off at an appropriate point in the flow and do some further 
realtime analysis – got the output to port feeding to Spark process working 
fine (notwithstanding the issue that you have been so kind to help with 
previously with the SSLContext), just thinking about if this is most 
appropriate solution.

I have dabbled with a custom processor (for enriching url splitting/ enriching 
etc. – probably could have done with ExecuteScript processor in hindsight) so 
am comfortable with going this route if that is deemed more appropriate.

Thanks
Conrad

From: Bryan Bende <bbe...@gmail.com<mailto:bbe...@gmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Thursday, 2 June 2016 at 13:12
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: Spark or custom processor?

Conrad,

I would think that you could do this all in NiFi.

How do the log files come into NiFi? TailFile, ListenUDP/ListenTCP, 
List+FetchFile?

-Bryan


On Thu, Jun 2, 2016 at 6:41 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
Hi,
Any advice on ‘best’ architectural approach whereby some processing function 
has to be applied to every flow file in a dataflow with some (possible) output 
based on flowfile content.
e.g. inspect log files for specific ip then send message to syslog

approach 1
Spark
Output port from NiFi -> Spark listens to that stream -> processes and outputs 
accordingly
Advantages – scale spark job on Yarn, decoupled (reusable) from NiFi
Disadvantages – adds complexity, decoupled from NiFi.

Approach 2
NiFi
Custom processor -> PutSyslog
Advantages – reuse existing NiFi processors/ capability, obvious flow (design 
intent)
Disadvantages – scale??

Any comments/ advice/ experience of either approaches?

Thanks
Conrad








Re: Spark or custom processor?

2016-06-02 Thread Conrad Crampton
Ryan, great tip.
Thanks, Conrad

From: Ryan Ward <ryan.wa...@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Thursday, 2 June 2016 at 15:11
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Spark or custom processor?

Conrad,
Depending on the number of clients and endpoints you have, you can load balance 
the TCP connections with HAProxy; it wouldn't load balance the data, just the 
connections. If you are using rsyslog you can tell it to rebind every x messages 
to better load balance the data.

http://blog.afkham.org/2014/10/tcp-load-balancing-with-haproxy.html
Ryan
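
A minimal sketch of the idea (hostnames and ports invented):

haproxy.cfg:
    frontend syslog_in
        bind *:5140
        mode tcp
        default_backend nifi_syslog
    backend nifi_syslog
        mode tcp
        balance roundrobin
        server nifi1 nifi1:5140 check
        server nifi2 nifi2:5140 check

rsyslog forwarding with a rebind every 10,000 messages so connections get re-spread:
    action(type="omfwd" target="haproxy-host" port="5140" protocol="tcp" rebindInterval="10000")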

On Thu, Jun 2, 2016 at 10:09 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
Mark,
Ah, never considered the ScanAttribute processor before – looks like I could 
coerce it to work as-is for my use case – with a few chained together (and more 
likely a RouteOnAttribute processor) for all the criteria.

Quick follow-up question on PutSyslog though – as the flows run on all nodes in 
the cluster, do I also have to run PutSyslog on a single node, otherwise doesn’t 
the putting of the message get executed on all nodes (and therefore I get duplicate 
syslog messages x the number of nodes)?

Thanks
Conrad

From: Mark Payne <marka...@hotmail.com<mailto:marka...@hotmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Thursday, 2 June 2016 at 14:58

To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: Spark or custom processor?

Conrad,

Excellent - I think this is a great use case, as well. This is similar to the 
enrichment case, as you are operating
on each piece of data in conjunction with some 'reference dataset' (bad 
domains, etc.) which would likely be
some file, etc. that is configured in the Processor. This is actually similar 
to the ScanContent / ScanAttribute
Processors I think. You may want to review those for 'inspiration' for your 
processor.

At a high level, the way that they work is that they are configured with a file 
that is a dictionary of terms to look
for in the FlowFile. As each FlowFile comes through, it checks if its 
attributes (or content, depending on the processor)
match any of the terms in the dictionary and routes each FlowFile to either 
'matched' or 'unmatched'. The Processor
will periodically check the dataset file and reload the dictionary if the file 
has changed. Typically, GetSFTP or GetHTTP
or something like that would be used to fetch new versions of the dictionary 
and then PutFile would be used to write
the file to a directory. This allows the Scan* processors not to have to worry 
about fetching the data and allows the
data to come from wherever.
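
For example, the Dictionary File for the bad-domains case is just a newline-delimited list of terms (values invented):

badsite.example.com
malware-c2.example.net
phish.example.org

and ScanAttribute routes any FlowFile whose configured attribute matches one of those lines to 'matched', everything else to 'unmatched'.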

Hope this is helpful!

Thanks
-Mark


On Jun 2, 2016, at 9:51 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:

Mark,
A very helpful explanation and distinction on appropriate use for NiFi. I think 
my particular use case currently (probably) falls into the Simple Event 
Processing. I say ‘probably’ because I am bringing in some other data to 
compare the data against (bad domains and maybe others), but certainly isn’t 
doing anything clever at the moment in terms of windowing/ aggregation with 
previously seen data etc.

Thanks for the advice, very helpful.
Conrad

From: Mark Payne <marka...@hotmail.com<mailto:marka...@hotmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Thursday, 2 June 2016 at 14:42
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: Spark or custom processor?

Conrad,

Typically, the way that we like to think about using NiFi vs. something like 
Spark or Storm is whether
the processing is Simple Event Processing or Complex Event Processing. Simple 
Event Processing
encapsulates those tasks where you are able to operate on a single piece of 
data by itself (or in correlation
with an Enrichment Dataset). So tasks like enrichment, splitting, and 
transformation are squarely within
the wheelhouse of NiFi.

When we talk about doing Complex Event Processing, we are generally talking 
about either processing data
from multiple streams together (think JOIN operations) or analyzing data across 
time windows (think calculating
norms, standard deviation, etc. over the last 30 minutes). The idea here is to 
derive a single new "insight" from
windows of data or joined streams of data - not to transform or enrich 
individual pieces of data. For this, we would
recommend something like Spark, Storm, Flink, etc.

In te

Re: Spark or custom processor?

2016-06-02 Thread Conrad Crampton
Mark,
Ah, never considered the ScanAttribute processor before – looks like I could 
coerce it to work as-is for my use case – with a few chained together (and more 
likely a RouteOnAttribute processor) for all the criteria.

Quick follow-up question on PutSyslog though – as the flows run on all nodes in 
the cluster, do I also have to run PutSyslog on a single node, otherwise doesn’t 
the putting of the message get executed on all nodes (and therefore I get duplicate 
syslog messages x the number of nodes)?

Thanks
Conrad

From: Mark Payne <marka...@hotmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Thursday, 2 June 2016 at 14:58
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Spark or custom processor?

Conrad,

Excellent - I think this is a great use case, as well. This is similar to the 
enrichment case, as you are operating
on each piece of data in conjunction with some 'reference dataset' (bad 
domains, etc.) which would likely be
some file, etc. that is configured in the Processor. This is actually similar 
to the ScanContent / ScanAttribute
Processors I think. You may want to review those for 'inspiration' for your 
processor.

At a high level, the way that they work is that they are configured with a file 
that is a dictionary of terms to look
for in the FlowFile. As each FlowFile comes through, it checks if its 
attributes (or content, depending on the processor)
match any of the terms in the dictionary and routes each FlowFile to either 
'matched' or 'unmatched'. The Processor
will periodically check the dataset file and reload the dictionary if the file 
has changed. Typically, GetSFTP or GetHTTP
or something like that would be used to fetch new versions of the dictionary 
and then PutFile would be used to write
the file to a directory. This allows the Scan* processors not to have to worry 
about fetching the data and allows the
data to come from wherever.

Hope this is helpful!

Thanks
-Mark


On Jun 2, 2016, at 9:51 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:

Mark,
A very helpful explanation and distinction on appropriate use for NiFi. I think 
my particular use case currently (probably) falls into the Simple Event 
Processing. I say ‘probably’ because I am bringing in some other data to 
compare the data against (bad domains and maybe others), but certainly isn’t 
doing anything clever at the moment in terms of windowing/ aggregation with 
previously seen data etc.

Thanks for the advice, very helpful.
Conrad

From: Mark Payne <marka...@hotmail.com<mailto:marka...@hotmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Thursday, 2 June 2016 at 14:42
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: Spark or custom processor?

Conrad,

Typically, the way that we like to think about using NiFi vs. something like 
Spark or Storm is whether
the processing is Simple Event Processing or Complex Event Processing. Simple 
Event Processing
encapsulates those tasks where you are able to operate on a single piece of 
data by itself (or in correlation
with an Enrichment Dataset). So tasks like enrichment, splitting, and 
transformation are squarely within
the wheelhouse of NiFi.

When we talk about doing Complex Event Processing, we are generally talking 
about either processing data
from multiple streams together (think JOIN operations) or analyzing data across 
time windows (think calculating
norms, standard deviation, etc. over the last 30 minutes). The idea here is to 
derive a single new "insight" from
windows of data or joined streams of data - not to transform or enrich 
individual pieces of data. For this, we would
recommend something like Spark, Storm, Flink, etc.

In terms of scalability, NiFi certainly was not designed to scale outward in 
the way that Spark was. With Spark you
may be scaling to thousands of nodes, but with NiFi you would get a pretty poor 
user experience because each change
in the UI must be replicated to all of those nodes. That being said, NiFi does 
scale up very well to take full advantage
of however much CPU and disks you have available. We typically see processing 
of several terabytes of data per day
on a single node, so we have generally not needed to scale out to hundreds or 
thousands of nodes.

I hope this helps to clarify when/where to use each one. If there are things 
that are still unclear or if you have more
questions, as always, don't hesitate to shoot another email!

Thanks
-Mark


On Jun 2, 2016, at 9:28 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:

Hi,
ListenSyslog (using the approach that is being discussed currently in anot

Re: Spark or custom processor?

2016-06-02 Thread Conrad Crampton
Bryan, thanks for this. I wasn’t aware of the nuances between UDP and TCP 
syslog streams. The RPG approach certainly makes my head hurt every time I 
refer back to it, so this would certainly simplify things – and also remove the 
single point of failure of processing on the primary node only (though it introduces 
another one by way of the UDP load balancer).
Cheers
Conrad

From: Bryan Bende <bbe...@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Thursday, 2 June 2016 at 14:47
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Spark or custom processor?

In addition to what Mark said, regarding getting the logs into NiFi...

I've found that when syslog servers forward messages over TCP, they typically 
open a single connection, so in this case sending to the primary node probably 
makes sense.

If you are forwarding over UDP, you might be able to stick a UDP load balancer 
in front of NiFi so that the logs are being routed to the ListenSyslog on 
each node, and then you wouldn't need the RPG.
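
(For example, something like nginx's stream module can do plain UDP round-robin – names and ports invented:

stream {
    upstream nifi_syslog_udp {
        server nifi1:5140;
        server nifi2:5140;
    }
    server {
        listen 5140 udp;
        proxy_pass nifi_syslog_udp;
    }
})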

Just something to think about.

On Thu, Jun 2, 2016 at 9:42 AM, Mark Payne 
<marka...@hotmail.com<mailto:marka...@hotmail.com>> wrote:
Conrad,

Typically, the way that we like to think about using NiFi vs. something like 
Spark or Storm is whether
the processing is Simple Event Processing or Complex Event Processing. Simple 
Event Processing
encapsulates those tasks where you are able to operate on a single piece of 
data by itself (or in correlation
with an Enrichment Dataset). So tasks like enrichment, splitting, and 
transformation are squarely within
the wheelhouse of NiFi.

When we talk about doing Complex Event Processing, we are generally talking 
about either processing data
from multiple streams together (think JOIN operations) or analyzing data across 
time windows (think calculating
norms, standard deviation, etc. over the last 30 minutes). The idea here is to 
derive a single new "insight" from
windows of data or joined streams of data - not to transform or enrich 
individual pieces of data. For this, we would
recommend something like Spark, Storm, Flink, etc.

In terms of scalability, NiFi certainly was not designed to scale outward in 
the way that Spark was. With Spark you
may be scaling to thousands of nodes, but with NiFi you would get a pretty poor 
user experience because each change
in the UI must be replicated to all of those nodes. That being said, NiFi does 
scale up very well to take full advantage
of however much CPU and disks you have available. We typically see processing 
of several terabytes of data per day
on a single node, so we have generally not needed to scale out to hundreds or 
thousands of nodes.

I hope this helps to clarify when/where to use each one. If there are things 
that are still unclear or if you have more
questions, as always, don't hesitate to shoot another email!

Thanks
-Mark


On Jun 2, 2016, at 9:28 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:

Hi,
ListenSyslog (using the approach that is being discussed currently in another 
thread – ListenSyslog running on primary node as RPG, all other nodes 
connecting to the port that the RPG exposes).
Various enrichment, routing on attributes etc. and finally into HDFS as Avro.
I want to branch off at an appropriate point in the flow and do some further 
realtime analysis – I’ve got the output port feeding the Spark process working 
fine (notwithstanding the issue that you have been so kind to help with 
previously with the SSLContext), just thinking about whether this is the most 
appropriate solution.

I have dabbled with a custom processor (for url splitting/ enriching 
etc. – probably could have done it with the ExecuteScript processor in hindsight) so 
am comfortable with going this route if that is deemed more appropriate.

Thanks
Conrad

From: Bryan Bende <bbe...@gmail.com<mailto:bbe...@gmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Thursday, 2 June 2016 at 13:12
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: Spark or custom processor?

Conrad,

I would think that you could do this all in NiFi.

How do the log files come into NiFi? TailFile, ListenUDP/ListenTCP, 
List+FetchFile?

-Bryan


On Thu, Jun 2, 2016 at 6:41 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
Hi,
Any advice on ‘best’ architectural approach whereby some processing function 
has to be applied to every flow file in a dataflow with some (possible) output 
based on flowfile content.
e.g. inspect log files for specific ip then send message to syslog

approach 1
Spark
Output port from NiFi -> Spark listens to that stream

Re: Spark or custom processor?

2016-06-02 Thread Conrad Crampton
Mark,
A very helpful explanation and distinction on appropriate use for NiFi. I think 
my particular use case currently (probably) falls into the Simple Event 
Processing. I say ‘probably’ because I am bringing in some other data to 
compare the data against (bad domains and maybe others), but certainly isn’t 
doing anything clever at the moment in terms of windowing/ aggregation with 
previously seen data etc.

Thanks for the advice, very helpful.
Conrad

From: Mark Payne <marka...@hotmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Thursday, 2 June 2016 at 14:42
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Spark or custom processor?

Conrad,

Typically, the way that we like to think about using NiFi vs. something like 
Spark or Storm is whether
the processing is Simple Event Processing or Complex Event Processing. Simple 
Event Processing
encapsulates those tasks where you are able to operate on a single piece of 
data by itself (or in correlation
with an Enrichment Dataset). So tasks like enrichment, splitting, and 
transformation are squarely within
the wheelhouse of NiFi.

When we talk about doing Complex Event Processing, we are generally talking 
about either processing data
from multiple streams together (think JOIN operations) or analyzing data across 
time windows (think calculating
norms, standard deviation, etc. over the last 30 minutes). The idea here is to 
derive a single new "insight" from
windows of data or joined streams of data - not to transform or enrich 
individual pieces of data. For this, we would
recommend something like Spark, Storm, Flink, etc.

In terms of scalability, NiFi certainly was not designed to scale outward in 
the way that Spark was. With Spark you
may be scaling to thousands of nodes, but with NiFi you would get a pretty poor 
user experience because each change
in the UI must be replicated to all of those nodes. That being said, NiFi does 
scale up very well to take full advantage
of however much CPU and disks you have available. We typically see processing 
of several terabytes of data per day
on a single node, so we have generally not needed to scale out to hundreds or 
thousands of nodes.

I hope this helps to clarify when/where to use each one. If there are things 
that are still unclear or if you have more
questions, as always, don't hesitate to shoot another email!

Thanks
-Mark


On Jun 2, 2016, at 9:28 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:

Hi,
ListenSyslog (using the approach that is being discussed currently in another 
thread – ListenSyslog running on primary node as RPG, all other nodes 
connecting to the port that the RPG exposes).
Various enrichment, routing on attributes etc. and finally into HDFS as Avro.
I want to branch off at an appropriate point in the flow and do some further 
realtime analysis – I’ve got the output port feeding the Spark process working 
fine (notwithstanding the issue that you have been so kind to help with 
previously with the SSLContext), just thinking about whether this is the most 
appropriate solution.

I have dabbled with a custom processor (for url splitting/ enriching 
etc. – probably could have done it with the ExecuteScript processor in hindsight) so 
am comfortable with going this route if that is deemed more appropriate.

Thanks
Conrad

From: Bryan Bende <bbe...@gmail.com<mailto:bbe...@gmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Thursday, 2 June 2016 at 13:12
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: Spark or custom processor?

Conrad,

I would think that you could do this all in NiFi.

How do the log files come into NiFi? TailFile, ListenUDP/ListenTCP, 
List+FetchFile?

-Bryan


On Thu, Jun 2, 2016 at 6:41 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
Hi,
Any advice on ‘best’ architectural approach whereby some processing function 
has to be applied to every flow file in a dataflow with some (possible) output 
based on flowfile content.
e.g. inspect log files for specific ip then send message to syslog

approach 1
Spark
Output port from NiFi -> Spark listens to that stream -> processes and outputs 
accordingly
Advantages – scale spark job on Yarn, decoupled (reusable) from NiFi
Disadvantages – adds complexity, decoupled from NiFi.

Approach 2
NiFi
Custom processor -> PutSyslog
Advantages – reuse existing NiFi processors/ capability, obvious flow (design 
intent)
Disadvantages – scale??

Any comments/ advice/ experience of either approaches?

Thanks
Conrad




Re: Spark or custom processor?

2016-06-02 Thread Conrad Crampton
Hi,
ListenSyslog (using the approach that is being discussed currently in another 
thread – ListenSyslog running on primary node as RPG, all other nodes 
connecting to the port that the RPG exposes).
Various enrichment, routing on attributes etc. and finally into HDFS as Avro.
I want to branch off at an appropriate point in the flow and do some further 
realtime analysis – I’ve got the output port feeding the Spark process working 
fine (notwithstanding the issue that you have been so kind to help with 
previously with the SSLContext), just thinking about whether this is the most 
appropriate solution.

I have dabbled with a custom processor (for url splitting/ enriching 
etc. – probably could have done it with the ExecuteScript processor in hindsight) so 
am comfortable with going this route if that is deemed more appropriate.

Thanks
Conrad

From: Bryan Bende <bbe...@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Thursday, 2 June 2016 at 13:12
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Spark or custom processor?

Conrad,

I would think that you could do this all in NiFi.

How do the log files come into NiFi? TailFile, ListenUDP/ListenTCP, 
List+FetchFile?

-Bryan


On Thu, Jun 2, 2016 at 6:41 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
Hi,
Any advice on ‘best’ architectural approach whereby some processing function 
has to be applied to every flow file in a dataflow with some (possible) output 
based on flowfile content.
e.g. inspect log files for specific ip then send message to syslog

approach 1
Spark
Output port from NiFi -> Spark listens to that stream -> processes and outputs 
accordingly
Advantages – scale spark job on Yarn, decoupled (reusable) from NiFi
Disadvantages – adds complexity, decoupled from NiFi.

Approach 2
NiFi
Custom processor -> PutSyslog
Advantages – reuse existing NiFi processors/ capability, obvious flow (design 
intent)
Disadvantages – scale??

Any comments/ advice/ experience of either approaches?

Thanks
Conrad








Spark or custom processor?

2016-06-02 Thread Conrad Crampton
Hi,
Any advice on ‘best’ architectural approach whereby some processing function 
has to be applied to every flow file in a dataflow with some (possible) output 
based on flowfile content.
e.g. inspect log files for specific ip then send message to syslog

approach 1
Spark
Output port from NiFi -> Spark listens to that stream -> processes and outputs 
accordingly
Advantages – scale spark job on Yarn, decoupled (reusable) from NiFi
Disadvantages – adds complexity, decoupled from NiFi.

Approach 2
NiFi
Custom processor -> PutSyslog
Advantages – reuse existing NiFi processors/ capability, obvious flow (design 
intent)
Disadvantages – scale??

Any comments/ advice/ experience of either approaches?

Thanks
Conrad






Re: Use Case...Please help

2016-05-24 Thread Conrad Crampton
I suspect then that mapping the network drive on the NiFi cluster is out of the 
question as is a standalone NiFi instance on it.
Other options then – are these logs ‘emitted’ by syslog? If so, use the ListenSyslog 
processor; if not, then I’m struggling. Perhaps you could spin up a lightweight 
machine/ server/ Docker container that you can map the Philips network drive to, 
then use a local NiFi instance on it as I suggested before?

There are many other ingestion processors which you could explore – I haven’t 
used any others so can’t help, but the docs give a good run-down of them. Does 
the Philips network drive have an API to interact with? If so, you could use 
the ExecuteProcess/ ExecuteScript processor.

Conrad

From: "Tripathi, Shiv Deepak" 
<shiv.deepak.tripa...@philips.com<mailto:shiv.deepak.tripa...@philips.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Tuesday, 24 May 2016 at 12:48
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: RE: Use Case...Please help

You want to move data from Philips network (mapped drive) to HDFS (Amazon 
cloud) with a NiFi installation hosted on Amazon too? Yes (NiFi installation on 
Amazon only).

When I say "added as a service in Hortonworks Hadoop cluster" – I mean that I 
have created a Hadoop cluster on the cloud and installed NiFi on that Hadoop cluster.

The Philips network drive is a storage drive. I don’t think NiFi can be 
installed there as it is not a server or machine.

Thanks,
Deepak


From: Conrad Crampton [mailto:conrad.cramp...@secdata.com]
Sent: Tuesday, May 24, 2016 4:31 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Use Case...Please help

So as far as I understand this
You want to move data from Philips network (mapped drive) to HDFS (Amazon 
cloud) with a NiFi installation hosted on Amazon too?

As I said before, to use ListFile or FetchFile the mapped drive has to be local 
to the NiFi server, and your NiFi is running on a remote cloud server (as you 
stated – I don’t quite understand what you mean when you say "added as a 
service in Hortonworks Hadoop cluster").
If you need to get these log files from a remote machine you could use an 
instance of NiFi running on that machine (where the Philips network data is 
generated), then use ListFile -> GetFile, then use a combination of Remote 
Process Groups/ Site-to-Site communication using an output port (on the local 
instance) and an input port (on the cluster) with your clustered NiFi version. This is 
something that has been recommended to me for a similar use case in a previous 
thread (I haven’t tried it out yet though). Other alternatives would be to use 
ListSFTP and set up an SFTP server on that machine.
Once you have picked up the files in the clustered NiFi, depending on what 
you want to do with the data in those files: SplitText (to make 
multiple FlowFiles, typically one per line), then ExtractText (to parse those lines 
to get attribute data), UpdateAttribute to further process, then probably 
finally MergeContent to create tar files of these lines, then PutHDFS to 
finally store them (probably using attribute data to partition appropriately).

I’ve made a lot of assumptions again as I still am not totally sure of what you 
want to do, but hopefully you have some pointers to move you forward.

Regards
Conrad


From: "Tripathi, Shiv Deepak" 
<shiv.deepak.tripa...@philips.com<mailto:shiv.deepak.tripa...@philips.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Tuesday, 24 May 2016 at 11:10
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: RE: Use Case...Please help

Thanks for the time you spent on my use case.

I mounted one of my Windows drives where my test input data resides and it’s 
working.

As of now I am not satisfied with the implementation which I did. I am trying 
to extend it.

My real use case is:

SourceLocation: Philips Network

Screenshot1: “SourceDir.jpg”
The highlighted drive is my network drive, which I have currently mapped to my machine. I 
want NiFi to pick up files directly from here – that is, even if I don’t map it to my 
machine, it should still pick up from the network drive by specifying the path.

Nifi Installation:

Nifi is installed on cloud ec2 instances and added as a service in Hortonworks 
Hadoop cluster.

Destination:

S3 and HDFS on Amazon cloud:

Could you please assist me with which processors I need to use, in what order, and how to 
specify the path.

Thanks,
Deepak


From: Simon Elliston Ball [mailto:si...@simonellistonball.com]
Sent: Monday, May 23, 2016 6:25 PM
To: users@nifi.apache.org<mailto:users@nifi.apache

Re: Spark & NiFi question

2016-05-24 Thread Conrad Crampton
Hi Bryan
Firstly, let me apologise for my constant stream of emails on this that appear 
not to be taking any of your replies into consideration. I thought no one was 
looking at it! My email client/ server appears to have stopped letting any 
emails relating to this thread though even though I get all others in the list! 
I must appear to be a complete numbnuts! I checked the archive list on the 
mail-archives website and found all of your posts!
Having been on this mailing list for a while now, I couldn’t quite believe 
no-one was assisting given the usually brilliant responses (so I definitely 
concur with Joe’s previous comment) :-)

Anyway, I can’t thank you enough Bryan for confirming that I, in fact, am not 
going mad and there is a bug here. I will park the work here and wait for 0.7 
to be released as I know what I want to do actually works on the Spark end 
(having proved it on a local (insecure) NiFi).

Thanks again,
Conrad


From: Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Monday, 23 May 2016 at 16:04
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: Spark & NiFi question

Hi,
I don’t know if I’m hitting some bug here but something doesn’t make sense.
With ssl debug on I get the following
NiFi Receiver, READ: TLSv1.2 Application Data, length = 1648
Padded plaintext after DECRYPTION:  len = 1648
: 65 A2 B8 34 DF 20 6B 95   56 88 97 16 7A EC 8F E3  e..4. k.V...z...
0010: 48 54 54 50 2F 31 2E 31   20 32 30 30 20 4F 4B 0D  HTTP/1.1 200 OK.
0020: 0A 44 61 74 65 3A 20 4D   6F 6E 2C 20 32 33 20 4D  .Date: Mon, 23 M
0030: 61 79 20 32 30 31 36 20   31 34 3A 34 39 3A 33 39  ay 2016 14:49:39
0040: 20 47 4D 54 0D 0A 53 65   72 76 65 72 3A 20 4A 65   GMT..Server: Je
0050: 74 74 79 28 39 2E 32 2E   31 31 2E 76 32 30 31 35  tty(9.2.11.v2015
0060: 30 35 32 39 29 0D 0A 43   61 63 68 65 2D 43 6F 6E  0529)..Cache-Con
0070: 74 72 6F 6C 3A 20 70 72   69 76 61 74 65 2C 20 6E  trol: private, n
0080: 6F 2D 63 61 63 68 65 2C   20 6E 6F 2D 73 74 6F 72  o-cache, no-stor
0090: 65 2C 20 6E 6F 2D 74 72   61 6E 73 66 6F 72 6D 0D  e, no-transform.
00A0: 0A 56 61 72 79 3A 20 41   63 63 65 70 74 2D 45 6E  .Vary: Accept-En
00B0: 63 6F 64 69 6E 67 2C 20   55 73 65 72 2D 41 67 65  coding, User-Age
00C0: 6E 74 0D 0A 44 61 74 65   3A 20 4D 6F 6E 2C 20 32  nt..Date: Mon, 2
00D0: 33 20 4D 61 79 20 32 30   31 36 20 31 34 3A 34 39  3 May 2016 14:49
00E0: 3A 33 39 20 47 4D 54 0D   0A 43 6F 6E 74 65 6E 74  :39 GMT..Content
00F0: 2D 54 79 70 65 3A 20 61   70 70 6C 69 63 61 74 69  -Type: applicati
0100: 6F 6E 2F 6A 73 6F 6E 0D   0A 56 61 72 79 3A 20 41  on/json..Vary: A
0110: 63 63 65 70 74 2D 45 6E   63 6F 64 69 6E 67 2C 20  ccept-Encoding,
0120: 55 73 65 72 2D 41 67 65   6E 74 0D 0A 43 6F 6E 74  User-Agent..Cont
0130: 65 6E 74 2D 4C 65 6E 67   74 68 3A 20 31 32 38 35  ent-Length: 1285
0140: 0D 0A 0D 0A 7B 22 72 65   76 69 73 69 6F 6E 22 3A  ."revision":
0150: 7B 22 63 6C 69 65 6E 74   49 64 22 3A 22 39 34 38  ."clientId":"948
0160: 66 62 34 31 33 2D 65 39   37 64 2D 34 32 37 65 2D  fb413-e97d-427e-
0170: 61 34 38 36 2D 31 31 63   39 65 37 31 63 63 62 62  a486-11c9e71ccbb
0180: 32 22 7D 2C 22 63 6F 6E   74 72 6F 6C 6C 65 72 22  2".,"controller"
0190: 3A 7B 22 69 64 22 3A 22   31 38 63 38 39 64 32 33  :."id":"18c89d23
01A0: 2D 61 35 31 65 2D 34 35   35 38 2D 62 30 31 61 2D  -a51e-4558-b01a-
01B0: 33 66 36 30 64 66 31 31   63 39 61 64 22 2C 22 6E  3f60df11c9ad","n
01C0: 61 6D 65 22 3A 22 4E 69   46 69 20 46 6C 6F 77 22  ame":"NiFi Flow"
01D0: 2C 22 63 6F 6D 6D 65 6E   74 73 22 3A 22 22 2C 22  ,"comments":"","
01E0: 72 75 6E 6E 69 6E 67 43   6F 75 6E 74 22 3A 31 36  runningCount":16
01F0: 34 2C 22 73 74 6F 70 70   65 64 43 6F 75 6E 74 22  4,"stoppedCount"
0200: 3A 34 33 2C 22 69 6E 76   61 6C 69 64 43 6F 75 6E  :43,"invalidCoun
0210: 74 22 3A 31 2C 22 64 69   73 61 62 6C 65 64 43 6F  t":1,"disabledCo
0220: 75 6E 74 22 3A 30 2C 22   69 6E 70 75 74 50 6F 72  unt":0,"inputPor
0230: 74 43 6F 75 6E 74 22 3A   37 2C 22 6F 75 74 70 75  tCount":7,"outpu
0240: 74 50 6F 72 74 43 6F 75   6E 74 22 3A 31 2C 22 72  tPortCount":1,"r
0250: 65 6D 6F 74 65 53 69 74   65 4C 69 73 74 65 6E 69  emoteSiteListeni
0260: 6E 67 50 6F 72 74 22 3A   39 38 37 30 2C 22 73 69  ngPort":9870,"si
0270: 74 65 54 6F 53 69 74 65   53 65 63 75 72 65 22 3A  teToSiteSecure":
0280: 74 72 75 65 2C 22 69 6E   73 74 61 6E 63 65 49 64  true,"instanceId
0290: 22 3A 22 30 35 38 30 63   35 31 38 2D 39 62 63 37  ":"0580c518-9bc7
02A0: 2D 34 37 38 33 2D 39 32   34 38 2D 35 38 30 61 36  -

Re: Spark & NiFi question

2016-05-23 Thread Conrad Crampton
9 2D 62   39 39 61 2D 38 64 34 34  a-4a29-b99a-8d44
03C0: 65 66 37 38 66 30 31 30   22 2C 22 6E 61 6D 65 22  ef78f010","name"
03D0: 3A 22 48 44 46 53 57 65   62 73 65 6E 73 65 53 65  :"HDFSWebsenseSe
03E0: 63 75 72 69 74 79 22 2C   22 63 6F 6D 6D 65 6E 74  curity","comment
03F0: 73 22 3A 22 22 2C 22 73   74 61 74 65 22 3A 22 53  s":"","state":"S
0400: 54 4F 50 50 45 44 22 7D   2C 7B 22 69 64 22 3A 22  TOPPED".,."id":"
0410: 33 61 66 30 33 66 66 36   2D 39 62 65 37 2D 33 32  3af03ff6-9be7-32
0420: 35 61 2D 61 63 66 33 2D   63 36 62 39 61 37 64 32  5a-acf3-c6b9a7d2
0430: 31 36 65 33 22 2C 22 6E   61 6D 65 22 3A 22 50 6F  16e3","name":"Po
0440: 72 74 20 39 30 39 39 20   49 6E 63 6F 6D 69 6E 67  rt 9099 Incoming
0450: 20 53 79 73 6C 6F 67 73   22 2C 22 63 6F 6D 6D 65   Syslogs","comme
0460: 6E 74 73 22 3A 22 22 2C   22 73 74 61 74 65 22 3A  nts":"","state":
0470: 22 52 55 4E 4E 49 4E 47   22 7D 2C 7B 22 69 64 22  "RUNNING".,."id"
0480: 3A 22 65 65 34 31 37 64   35 61 2D 62 64 39 38 2D  :"ee417d5a-bd98-
0490: 33 32 65 61 2D 61 63 35   38 2D 63 36 32 33 64 66  32ea-ac58-c623df
04A0: 35 65 64 64 66 35 22 2C   22 6E 61 6D 65 22 3A 22  5eddf5","name":"
04B0: 50 6F 72 74 20 39 31 30   31 20 49 6E 63 6F 6D 69  Port 9101 Incomi
04C0: 6E 67 20 53 79 73 6C 6F   67 73 22 2C 22 63 6F 6D  ng Syslogs","com
04D0: 6D 65 6E 74 73 22 3A 22   22 2C 22 73 74 61 74 65  ments":"","state
04E0: 22 3A 22 52 55 4E 4E 49   4E 47 22 7D 2C 7B 22 69  ":"RUNNING".,."i
04F0: 64 22 3A 22 39 34 37 30   38 30 61 36 2D 34 65 61  d":"947080a6-4ea
0500: 66 2D 33 37 64 37 2D 62   36 32 62 2D 39 37 62 61  f-37d7-b62b-97ba
0510: 62 35 37 66 34 64 39 38   22 2C 22 6E 61 6D 65 22  b57f4d98","name"
0520: 3A 22 50 6F 72 74 20 39   31 30 30 20 49 6E 63 6F  :"Port 9100 Inco
0530: 6D 69 6E 67 20 53 79 73   6C 6F 67 73 22 2C 22 63  ming Syslogs","c
0540: 6F 6D 6D 65 6E 74 73 22   3A 22 22 2C 22 73 74 61  omments":"","sta
0550: 74 65 22 3A 22 52 55 4E   4E 49 4E 47 22 7D 2C 7B  te":"RUNNING".,.
0560: 22 69 64 22 3A 22 63 33   37 34 35 64 37 65 2D 39  "id":"c3745d7e-9
0570: 62 66 66 2D 33 31 31 32   2D 38 65 33 63 2D 39 36  bff-3112-8e3c-96
0580: 34 61 66 62 39 63 36 36   37 33 22 2C 22 6E 61 6D  4afb9c6673","nam
0590: 65 22 3A 22 50 6F 72 74   20 39 31 30 32 20 49 6E  e":"Port 9102 In
05A0: 63 6F 6D 69 6E 67 20 53   79 73 6C 6F 67 73 22 2C  coming Syslogs",
05B0: 22 63 6F 6D 6D 65 6E 74   73 22 3A 22 22 2C 22 73  "comments":"","s
05C0: 74 61 74 65 22 3A 22 52   55 4E 4E 49 4E 47 22 7D  tate":"RUNNING".
05D0: 5D 2C 22 6F 75 74 70 75   74 50 6F 72 74 73 22 3A  ],"outputPorts":
05E0: 5B 7B 22 69 64 22 3A 22   61 62 38 36 62 37 34 36  [."id":"ab86b746
05F0: 2D 37 39 63 33 2D 34 30   31 65 2D 62 35 30 35 2D  -79c3-401e-b505-
0600: 39 64 39 34 30 35 62 32   32 62 33 31 22 2C 22 6E  9d9405b22b31","n
0610: 61 6D 65 22 3A 22 53 70   61 72 6B 20 74 65 73 74  ame":"Spark test
0620: 20 6F 75 74 22 2C 22 63   6F 6D 6D 65 6E 74 73 22   out","comments"
0630: 3A 22 22 2C 22 73 74 61   74 65 22 3A 22 52 55 4E  :"","state":"RUN
0640: 4E 49 4E 47 22 7D 5D 7D   7D 15 C4 DA 96 85 23 76  NING".]...#v
0650: 2B DB 4B 46 5A 9A DD 4F   9B EF D8 46 70 FF CD EC  +.KFZ..O...Fp...
0660: 99 19 31 F3 7F CC C1 14   07 06 06 06 06 06 06 06  ..1.
16/05/23 15:49:39 WARN EndpointConnectionPool: EndpointConnectionPool[Cluster 
URL=https://yarn-cm1.mis-cds.local:9090/nifi/] Unable to refresh Remote Group's 
peers due to java.io.IOException: Unable to communicate with 
yarn-cm1.mis-cds.local:9870 because it requires Secure Site-to-Site 
communications, but this instance is not configured for secure communications
16/05/23 15:49:39 WARN EndpointConnectionPool: EndpointConnectionPool[Cluster 
URL=https://yarn-cm1.mis-cds.local:9090/nifi/] Unable to refresh Remote Group's 
peers due to java.io.IOException: Unable to communicate with 
yarn-cm1.mis-cds.local:9870 because it requires Secure Site-to-Site 
communications, but this instance is not configured for secure communications
Exception in thread "NiFi Receiver" java.lang.NullPointerException
at org.apache.nifi.spark.NiFiReceiver$ReceiveRunnable.run(NiFiReceiver.java:150)
at java.lang.Thread.run(Thread.java:745)

Which clearly shows that secure site to site communication is true
"r
0250: 65 6D 6F 74 65 53 69 74   65 4C 69 73 74 65 6E 69  emoteSiteListeni
0260: 6E 67 50 6F 72 74 22 3A   39 38 37 30 2C 22 73 69  ngPort":9870,"si
0270: 74 65 54 6F 53 69 74 65   53 65 63 75 72 65 22 3A  teToSiteSecure":
0280: 74 72 75 65 2C 22 69 6E 

Re: Use Case...Please help

2016-05-23 Thread Conrad Crampton
Hi,
A bit hard to tell with the information you have given, but it appears to me 
that you are trying to fetch a file from the file system and transfer to HDFS 
but you are referring to a file that isn’t actually available on the machine 
NiFi is running on. The ‘Input Directory Location’ attribute (according to the 
docs) states this is used to preserve state (I assume of the files being read 
in the directory) either locally or across the cluster – it isn’t there to 
indicate that the file is remote (to the Nifi server).
If the assumptions I am making are true, then you can’t use the ListFile processor 
to get a file from a server that isn’t the local NiFi one.
You will need to get the files in a different way (FTP etc.).
Conrad

From: "Tripathi, Shiv Deepak" 
>
Reply-To: "users@nifi.apache.org" 
>
Date: Monday, 23 May 2016 at 11:28
To: "users@nifi.apache.org" 
>
Subject: RE: Use Case...Please help

Can anybody help me please? I am new and got stuck.


Thanks,
Deepak
From: Tripathi, Shiv Deepak [mailto:shiv.deepak.tripa...@philips.com]
Sent: Sunday, May 22, 2016 11:10 PM
To: users@nifi.apache.org
Subject: RE: Use Case...Please help

Hi Mark,

In order to implement apache nifi.

I downloaded the Hortonworks sandbox and installed Apache NiFi on that. It’s working 
fine in the scenario below.

Scenario 1: My input directory is in the local file system on HDP (screenshot name 
“listfilelocaldir”) and output is on the HDFS file system.

For all processors in the dataflow please see screenshot – “HDP sandbox local to 
HDFS”

Scenario 2: could you please tell me which processors, and in what order, I need 
to use if I want to send files from 
\\btc7n001\Ongoing-MR\MRI\Deepak  
(a password-enabled network drive mapped to my machine) to the HDP cluster created in 
VMplayer.

It’s not recognizing the input directory at all. Please see the screenshot 
named “Usecaseinputdir.jpeg”.

Please help me.

Thanks,
Deepak


From: Mark Payne [mailto:marka...@hotmail.com]
Sent: Monday, May 16, 2016 6:19 PM
To: users@nifi.apache.org
Subject: Re: Use Case...Please help

Deepak,

Yes, you should be able to do so.

Thanks
-Mark

On May 16, 2016, at 8:44 AM, Tripathi, Shiv Deepak 
> 
wrote:

Thanks a lot Mark.

Looking forward to try it out.

If I understood correctly, then I can drop the log copying script and staging 
machine and directly pull the logs from the repository.

Please confirm.

Thanks,
Deepak

From: Mark Payne [mailto:marka...@hotmail.com]
Sent: Monday, May 16, 2016 5:06 PM
To: users@nifi.apache.org
Subject: Re: Use Case...Please help

Deepak,

Thanks for providing such a detailed description of your use case. I think NiFi 
would be an excellent
tool to help you out here!

As I mentioned before, you would typically use ListFile -> FetchFile to pull 
the data in. Clearly, here,
though, you want to be more selective about what you pull in. You can 
accomplish this by using a
RouteOnAttribute processor. So you'd have something like: ListFile -> 
RouteOnAttribute -> FetchFile.
The RouteOnAttribute processor is very powerful and allows you to configure how 
to route each piece
of data based on whatever attributes are available. The ListFile Processor adds 
the following attributes
to each piece of data that it pulls in:

filename (name of file)
path (relative path of file)
absolute.path (absolute directory of file)
fs.owner (owner of the file)
fs.group (group that the file belongs to)
fs.lastModified (last modified date)
fs.length (file length)
fs.permissions (file permissions, such as rw-rw-r--)

From these, you can make all sorts of routing decisions, based on name, 
timestamp, etc. You can choose
to terminate data that does not meet your criteria.
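
For example – the property name and expression here are purely illustrative, not 
from your flow – you could add a dynamic property to RouteOnAttribute such as:

    logs_only = ${filename:endsWith('.log')}

Each dynamic property becomes its own relationship, so anything matching the 
expression is routed to 'logs_only', and everything else goes to 'unmatched', 
which you can auto-terminate if you don't need it.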

When you use FetchFile, you have the option of deleting the source file, moving 
it elsewhere, or leaving
it as-is. So you wouldn't need to delete it if you don't want to. This is 
possible because ListFile keeps track
of what has been 'listed'. So it won't ingest duplicate data, but it will pick 
up new files (if any existing
file is modified, it will pick up the new version of the file.)

You can then use UnpackContent if you want to unzip the data, or you can leave 
it zipped. After the FetchFile,
you can also use a RouteOnAttribute processor to separate out the XML from the 
log files and put those to
different directories in HDFS.

Does this sound like it will provide you all that you need?

Thanks
-Mark



On May 16, 2016, at 3:06 AM, Tripathi, Shiv Deepak 
> 
wrote:

Hi Mark,

I am very happy to see the detailed reply. I am very thankful to you. 

Re: Spark & NiFi question

2016-05-23 Thread Conrad Crampton
Hi,
An update to this, but still not working.
I have now set the keystore and truststore as system properties, and included these 
as part of the SiteToSiteClientConfig building. I have used a cert that I have 
for one of the servers in my cluster, as I know they can communicate over SSL 
with the NCM – my 6 node cluster works over SSL and has remote ports working (I 
read from syslog on a primary server then distribute to the others via remote ports, 
as suggested somewhere else).
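
For reference, this is roughly how I am building the config now – a sketch only: 
the paths are as before, passwords redacted, and I am assuming the 
keystore/truststore methods on the 0.6.x site-to-site client Builder.

SiteToSiteClientConfig config = new SiteToSiteClient.Builder()
        .url("https://yarn-cm1.mis-cds.local:9090/nifi")
        .portName("Spark test out")
        // keystore/truststore handed to the client directly, as well as via the
        // javax.net.ssl system properties (passwords redacted)
        .keystoreFilename("/spark-processor.jks")
        .keystorePass("********")
        .keystoreType(KeystoreType.JKS)
        .truststoreFilename("/cacerts.jks")
        .truststorePass("********")
        .truststoreType(KeystoreType.JKS)
        .buildConfig();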
When I try now to connect to output port via Spark, I get a
"EndpointConnectionPool[Cluster URL=https://yarn-cm1.mis-cds.local:9090/nifi/] 
Unable to refresh Remote Group's peers due to java.io.IOException: Unable to 
communicate with yarn-cm1.mis-cds.local:9870 because it requires Secure 
Site-to-Site communications, but this instance is not configured for secure 
communications"
Exception even though I know Secure Site-to-Site communication is working (9870 
being the port set up for remote s2s comms in nifi.properties), so I am now 
really confused!!

Does the port that I wish to read from need to be set up with remote process 
group (conceptually I’m struggling with how to do this for an output port), or 
is it sufficient to be ‘just an output port’?

I have this working when connecting to an unsecured (http) instance of NiFi 
running on my laptop with Spark and a standard output port. Does it make a 
difference that my production cluster is a cluster and therefore needs setting 
up differently?

So many questions but I’m stuck now so any suggestions welcome.
Thanks
Conrad

From: Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Friday, 20 May 2016 at 09:16
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: SPOOFED: Re: Spark & NiFi question

Thanks for the pointers Bryan, however wrt your first suggestion: I tried 
without setting the SSL properties as system properties and got an ‘unable to find 
ssl path’ error – this gets resolved by doing as I have done (but of course this 
may be a red herring). I initially tried setting them on the site builder but got the 
same error as below – it appears to make no difference to what is logged in 
nifi-users.log whether I include the SSL props on the site builder or not, I get the 
same error, viz:

2016-05-20 08:59:47,082 INFO [NiFi Web Server-29590180] 
o.a.n.w.s.NiFiAuthenticationFilter Attempting request for 
(

Spark & NiFi question

2016-05-19 Thread Conrad Crampton
Hi,
Tried following a couple of blog posts about this [1], [2], but neither of 
these refer to using NiFi in clustered environment with SSL and I suspect this 
is where I am hitting problems (but don’t know where).

The blogs state that using an output port (in the root process group I.e. on 
main canvas) which I have done and tried to connect thus..

System.setProperty("javax.net.ssl.keyStore", "/spark-processor.jks");
System.setProperty("javax.net.ssl.keyStorePassword", “*");
System.setProperty("javax.net.ssl.trustStore", “/cacerts.jks");

SiteToSiteClientConfig config = new SiteToSiteClient.Builder()
.url("https://yarn-cm1.mis-cds.local:9090/nifi;)
.portName("Spark test out")
.buildConfig();

SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("NiFi 
Spark Log Processor");
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new 
Duration(5000));
JavaReceiverInputDStream<NiFiDataPacket> packetStream = jssc.receiverStream(new 
NiFiReceiver(config, StorageLevel.MEMORY_ONLY()));

JavaDStream<String> text = packetStream.map(dataPacket -> new 
String(dataPacket.getContent(), StandardCharsets.UTF_8));
text.print();
jssc.start();
jssc.awaitTermination();

The error I am getting is

16/05/19 16:39:03 WARN ReceiverSupervisorImpl: Restarting receiver with delay 
2000 ms: Failed to receive data from NiFi
java.io.IOException: Server returned HTTP response code: 401 for URL: 
https://yarn-cm1.mis-cds.local:9090/nifi-api/controller
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at 
sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1889)
at 
sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1884)
at java.security.AccessController.doPrivileged(Native Method)
at 
sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1883)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1456)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
at 
sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
at 
org.apache.nifi.remote.util.NiFiRestApiUtil.getController(NiFiRestApiUtil.java:69)
at 
org.apache.nifi.remote.client.socket.EndpointConnectionPool.refreshRemoteInfo(EndpointConnectionPool.java:891)
at 
org.apache.nifi.remote.client.socket.EndpointConnectionPool.getPortIdentifier(EndpointConnectionPool.java:878)
at 
org.apache.nifi.remote.client.socket.EndpointConnectionPool.getOutputPortIdentifier(EndpointConnectionPool.java:862)
at 
org.apache.nifi.remote.client.socket.SocketClient.getPortIdentifier(SocketClient.java:81)
at 
org.apache.nifi.remote.client.socket.SocketClient.createTransaction(SocketClient.java:123)
at org.apache.nifi.spark.NiFiReceiver$ReceiveRunnable.run(NiFiReceiver.java:149)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Server returned HTTP response code: 401 for 
URL: https://yarn-cm1.mis-cds.local:9090/nifi-api/controller
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1839)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at 
sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
at 
org.apache.nifi.remote.util.NiFiRestApiUtil.getController(NiFiRestApiUtil.java:66)
... 7 more

Any pointers would be helpful in getting this working. I don’t know if I have 
to set up a remote process group with the output port (not sure how this 
works), or what. When I go to 
https://yarn-cm1.mis-cds.local:9090/nifi-api/controller in the browser, I get 
an access denied error.
I have created a keystore signed by the RootCA used to sign all the self-signed 
certs for the cluster.

Running 0.6.1, 6 node cluster.

Thanks
Conrad

[1] - 
https://community.hortonworks.com/articles/12708/nifi-feeding-data-to-spark-streaming.html
[2] - https://blogs.apache.org/nifi/entry/stream_processing_nifi_and_spark



Re: Logstash/ Filebeat/ Lumberjack -> Nifi

2016-05-09 Thread Conrad Crampton
Thanks for this – you make some very interesting points about the use of 
Logstash, and you are correct, I am only just looking at Logstash. I will now 
look to use NiFi instead, if possible, to connect to my central cluster.
Regards
Conrad

From: Andrew Psaltis <psaltis.and...@gmail.com<mailto:psaltis.and...@gmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Saturday, 7 May 2016 at 16:43
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: Logstash/ Filebeat/ Lumberjack -> Nifi

Hi Conrad,
Based on your email it sounds like you are potentially just getting started 
with Logstash. The one thing I can share is that up until recently I worked in 
an environment where we had ~3,000 nodes deployed and all either had Logstash 
or Flume (was transitioning to Logstash). We used Puppet and the Logstash 
module was in the base templates so as App developers provisioned new nodes 
Logstash was automatically deployed and configured. I can tell you that it 
seems really easy at first, however, my team was always messing with, tweaking, 
and troubleshooting the Logstash scripts as we wanted to ingest different data 
sources, modify how the data was captured, or fix bugs. Knowing now what I do 
about NiFi, if I had a chance to do it over again (will be talking to old 
colleagues about it) I would just use Nifi on all of those edge nodes and then 
send the data to central NiFi cluster. To me there are at least several huge 
benefits to this:

  1.  You use one tool, which provides an amazingly easy and very powerful way 
to control and adjust the dataflow all without having to muck with any scripts. 
You can easily filter / enrich / transform the data at the edge node all via a 
UI.
  2.  You get provenance information from the edge all the way back. This is 
very powerful, you can actually answer the questions from others of "how come 
my log entry never made it to System X" or even better how the data was changed 
along the way. The "why did my log entry make it to System X" sometimes can be 
answered via searching through logs, but that also assumes you have the 
information in the logs to begin with. I can tell you that these questions will 
come up. We had data that would go through a pipeline and finally into HDFS. 
And we would get the questions from app developers when they queried the data 
in Hive and wanted to know why certain log entries were missing.

Hope this helps.

In good health,
Andrew

On Sat, May 7, 2016 at 8:15 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
Hi Bryan,
Some good tips and validation of my thinking.
It did occur to me to use the standalone NiFi and as I have no particular need 
to use Logstash for any other reason.
Thanks
Conrad

From: Bryan Bende <bbe...@gmail.com<mailto:bbe...@gmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Friday, 6 May 2016 at 14:56
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: Logstash/ Filebeat/ Lumberjack -> Nifi

Hi Conrad,

I am not that familiar with LogStash, but as you mentioned there is a PR for 
Lumberjack processors [1] which is not yet released, but could help if you are 
already using LogStash.
If LogStash has outputs for TCP, UDP, or syslog then like you mentioned, it 
seems like this could work well with ListenTCP, ListenUDP, or ListenSyslog.

I think the only additional benefit of Lumberjack is that it is an application 
level protocol that provides additional reliability on top of the networking 
protocols, meaning if ListenLumberjack receives an event over TCP it would then 
acknowledge that NiFi has successfully received and stored the data, since TCP 
can only guarantee it was delivered to the socket, but the application could 
have dropped it.

Although MiNiFi is not yet released, a possible solution is to run standalone 
NiFi instances on the servers where your logs are, with a simple flow like 
TailFile -> Remote Process Group which sends the logs back to a central NiFi 
instance over Site-To-Site.

Are you able to share any more info about what kind of logs they are and how 
they are being produced?
If they are coming from Java applications using logback or log4j, and if you 
have control over those applications, you can also use a specific appender like 
a UDP appender to send directly over to ListenUDP in NiFi.

Hope that helps.

-Bryan

[1] https://github.com/apache/nifi/pull/290

On Fri, May 6, 2016 at 3:33 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com&g

ConvertAvroSchema locale default error

2016-05-06 Thread Conrad Crampton
Hi,
I am seeing this error in my logs (may or may not be linked to upgrade to 0.6.1)

java.lang.IllegalArgumentException: Invalid locale format: default
 at org.apache.commons.lang.LocaleUtils.toLocale(LocaleUtils.java:110) 
~[na:na]
 at 
org.apache.nifi.processors.kite.ConvertAvroSchema.onTrigger(ConvertAvroSchema.java:277)
 ~[na:na]
….


I have a number of flows that use the ConvertAvroSchema processor but I just have 
the default locale of ‘default’ in there, which would appear to be the problem. 
I’m running on Java 8 if that makes a difference.

Running >locale on a node in my cluster with the problem returns the same as 
nodes without the error.
[root@yarn-cm1 opt]# locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
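
For what it’s worth, the stack trace points at commons-lang’s LocaleUtils, which 
(as far as I can tell) only accepts locale strings of the language or 
language_COUNTRY form, so a value such as en_GB parses while the literal string 
‘default’ does not. A tiny sketch that reproduces the error I’m seeing (assuming 
commons-lang 2.x on the classpath):

import org.apache.commons.lang.LocaleUtils;

public class LocaleCheck {
    public static void main(String[] args) {
        // "en_GB" follows the language_COUNTRY form that LocaleUtils accepts
        System.out.println(LocaleUtils.toLocale("en_GB"));
        // the literal string "default" is not a valid locale format and throws
        // java.lang.IllegalArgumentException: Invalid locale format: default
        System.out.println(LocaleUtils.toLocale("default"));
    }
}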

Should I manually set the locale in ConvertAvroSchema or do I need to do 
something else?

Thanks
Conrad




Re: Nifi versions and upgrades

2016-05-06 Thread Conrad Crampton
Ok, will do
Cheers
Conrad




On 06/05/2016, 13:18, "Joe Witt" <joe.w...@gmail.com> wrote:

>Conrad,
>
>Glad the upgrade is back on track.
>
>As for the kite/avro and locale case probably best to start another
>thread as it will have a better chance of catching the eye of folks
>with expertise there.
>
>Thanks
>Joe
>
>On Fri, May 6, 2016 at 8:05 AM, Conrad Crampton
><conrad.cramp...@secdata.com> wrote:
>> Hi Joe,
>> Thanks for your quick response (again).
>> You prompted me to look at every node in the cluster with ps aux | grep nifi 
>> and found that two of my nodes were still running 0.5.1. I do remember now 
>> that I was experiencing an odd thing with these in that everytime I did a 
>> service nifi stop, I would see nifi start up again (using ps) which was a 
>> bit odd but I had thought I had sorted that.
>> However, having now sorted out these two nodes all appears well again. 
>> Correct version reported and all processors available. ;-)
>>
>> However…
>> I am seeing
>> java.lang.IllegalArgumentException: Invalid locale format: default
>> at 
>> org.apache.commons.lang.LocaleUtils.toLocale(LocaleUtils.java:110) ~[na:na]
>> at 
>> org.apache.nifi.processors.kite.ConvertAvroSchema.onTrigger(ConvertAvroSchema.java:277)
>>  ~[na:na]
>>
>>
>> I have a number of flows that use convertAvroSchema processor but I just 
>> have the default locale of ‘default’ in there which would appear to be the 
>> problem. I’m running on Java 8 if that make a difference.
>>
>> Thanks
>> Conrad
>>
>>
>>
>> On 06/05/2016, 12:22, "Joe Witt" <joe.w...@gmail.com> wrote:
>>
>>>Conrad,
>>>
>>>Inside the conf/nifi.properties file there is a property called
>>>
>>>  nifi.version=
>>>
>>>You will need to change that to the version you've upgrade to if you
>>>just carried forward the old config file.  That value is what ends up
>>>being displayed in the web-ui.
>>>
>>>Now, having said this if you are running and see old processors and
>>>such listed then it means the NiFi instance you're connecting to is
>>>likely still an old one.  Are you sure you stopped the old one before
>>>starting the new one?  Could you show a listing of the nifi lib
>>>directory as found in the ps listing you reference?
>>>
>>>Thanks
>>>Joe
>>>
>>>
>>>
>>>On Fri, May 6, 2016 at 5:36 AM, Conrad Crampton
>>><conrad.cramp...@secdata.com> wrote:
>>>> Hi,
>>>> I have followed the advice on upgrade planning [1] and [2] and had success
>>>> since upgrading from 0.4.1 to 0.6.1 (via .5x) or so I thought!
>>>> I am using ansible to perform the upgrades to 6 node cluster but then go
>>>> into each server to do a ./nifi.sh install to install new version as
>>>> service. Running service nifi start shows correct bootstrap messages in 
>>>> that
>>>> 0.6.1 is running with correct home (/opt/nifi-0.6.1) and all seems fine.
>>>> There is a /work directory expanded with all correct versions of nars etc.
>>>> and ps aux | grep nifi shows that nifi is in fact running from
>>>> /opt/nifi-0.6.1.
>>>>
>>>> So what’s the problem?
>>>>
>>>> In the UI, I was trying to use the new ListenTCP processor and it isn’t
>>>> listed when I drag the processors onto the workspace (I get 125 processors
>>>> listed), and when I do help…about, I get 0.5.1 as version.
>>>>
>>>> Can anyone suggest how to force UI to update (if indeed this is the
>>>> problem)??
>>>>
>>>> Thanks
>>>> Conrad
>>>>
>>>>
>>>> [1] https://cwiki.apache.org/confluence/display/NIFI/Upgrading+NiFi
>>>> [2] https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance
>>>>
>>>>
>>>> SecureData, combating cyber threats
>>>>
>>>> 
>>>>
>>>> The information contained in this message or any of its attachments may be
>>>> privileged and confidential and intended for the exclusive use of the
>>>> intended recipient. If you are not the intended recipient any disclosure,
>>>> reproduction, distribution or other dissemination or use of this
>>>> communications is strictly prohibited. The views expressed in this email 
>>>> are
>>>> those of the individual and not necessarily of SecureData Europe Ltd. Any
>>>> prices quoted are only valid if followed up by a formal written quote.
>>>>
>>>> SecureData Europe Limited. Registered in England & Wales 04365896.
>>>> Registered Address: SecureData House, Hermitage Court, Hermitage Lane,
>>>> Maidstone, Kent, ME16 9NT
>>>
>>>


Re: Nifi versions and upgrades

2016-05-06 Thread Conrad Crampton
Hi Joe,
Thanks for your quick response (again).
You prompted me to look at every node in the cluster with ps aux | grep nifi 
and found that two of my nodes were still running 0.5.1. I do remember now that 
I was experiencing an odd thing with these in that every time I did a service 
nifi stop, I would see nifi start up again (using ps), which was a bit odd, but I 
had thought I had sorted that.
However, having now sorted out these two nodes all appears well again. Correct 
version reported and all processors available. ;-)

However…
I am seeing 
java.lang.IllegalArgumentException: Invalid locale format: default
at org.apache.commons.lang.LocaleUtils.toLocale(LocaleUtils.java:110) 
~[na:na]
at 
org.apache.nifi.processors.kite.ConvertAvroSchema.onTrigger(ConvertAvroSchema.java:277)
 ~[na:na]


I have a number of flows that use the ConvertAvroSchema processor but I just have 
the default locale of ‘default’ in there, which would appear to be the problem. 
I’m running on Java 8 if that makes a difference.

Thanks
Conrad



On 06/05/2016, 12:22, "Joe Witt" <joe.w...@gmail.com> wrote:

>Conrad,
>
>Inside the conf/nifi.properties file there is a property called
>
>  nifi.version=
>
>You will need to change that to the version you've upgrade to if you
>just carried forward the old config file.  That value is what ends up
>being displayed in the web-ui.
>
>Now, having said this if you are running and see old processors and
>such listed then it means the NiFi instance you're connecting to is
>likely still an old one.  Are you sure you stopped the old one before
>starting the new one?  Could you show a listing of the nifi lib
>directory as found in the ps listing you reference?
>
>Thanks
>Joe
>
>
>
>On Fri, May 6, 2016 at 5:36 AM, Conrad Crampton
><conrad.cramp...@secdata.com> wrote:
>> Hi,
>> I have followed the advice on upgrade planning [1] and [2] and had success
>> since upgrading from 0.4.1 to 0.6.1 (via .5x) or so I thought!
>> I am using ansible to perform the upgrades to 6 node cluster but then go
>> into each server to do a ./nifi.sh install to install new version as
>> service. Running service nifi start shows correct bootstrap messages in that
>> 0.6.1 is running with correct home (/opt/nifi-0.6.1) and all seems fine.
>> There is a /work directory expanded with all correct versions of nars etc.
>> and ps aux | grep nifi shows that nifi is in fact running from
>> /opt/nifi-0.6.1.
>>
>> So what’s the problem?
>>
>> In the UI, I was trying to use the new ListenTCP processor and it isn’t
>> listed when I drag the processors onto the workspace (I get 125 processors
>> listed), and when I do help…about, I get 0.5.1 as version.
>>
>> Can anyone suggest how to force UI to update (if indeed this is the
>> problem)??
>>
>> Thanks
>> Conrad
>>
>>
>> [1] https://cwiki.apache.org/confluence/display/NIFI/Upgrading+NiFi
>> [2] https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance
>>
>>
>> SecureData, combating cyber threats
>>
>> 
>>
>> The information contained in this message or any of its attachments may be
>> privileged and confidential and intended for the exclusive use of the
>> intended recipient. If you are not the intended recipient any disclosure,
>> reproduction, distribution or other dissemination or use of this
>> communications is strictly prohibited. The views expressed in this email are
>> those of the individual and not necessarily of SecureData Europe Ltd. Any
>> prices quoted are only valid if followed up by a formal written quote.
>>
>> SecureData Europe Limited. Registered in England & Wales 04365896.
>> Registered Address: SecureData House, Hermitage Court, Hermitage Lane,
>> Maidstone, Kent, ME16 9NT
>
>


Nifi versions and upgrades

2016-05-06 Thread Conrad Crampton
Hi,
I have followed the advice on upgrade planning [1] and [2] and have had success 
upgrading from 0.4.1 to 0.6.1 (via 0.5.x) – or so I thought!
I am using Ansible to perform the upgrades to the 6 node cluster, but then go into 
each server to do a ./nifi.sh install to install the new version as a service. 
Running service nifi start shows correct bootstrap messages in that 0.6.1 is 
running with correct home (/opt/nifi-0.6.1) and all seems fine. There is a 
/work directory expanded with all correct versions of nars etc. and ps aux | 
grep nifi shows that nifi is in fact running from /opt/nifi-0.6.1.

So what’s the problem?

In the UI, I was trying to use the new ListenTCP processor and it isn’t listed 
when I drag the processors onto the workspace (I get 125 processors listed), 
and when I do help…about, I get 0.5.1 as version.

Can anyone suggest how to force UI to update (if indeed this is the problem)??

Thanks
Conrad


[1] https://cwiki.apache.org/confluence/display/NIFI/Upgrading+NiFi
[2] https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance




Logstash/ Filebeat/ Lumberjack -> Nifi

2016-05-06 Thread Conrad Crampton
Hi,
Some advice if possible please. Whilst I would love to wait for the MiNiFi 
project to realise its objectives, as it sounds like exactly what I want from the 
initial suggestions, I have a pressing need to shift some log files on remote 
servers (to my DC) to my NiFi cluster. Having had a quick look at Logstash, it 
looks like it would provide what I want, but there doesn’t yet appear to be a 
simple way of getting files from Logstash to NiFi (I’m aware of the work going 
on for a Lumberjack processor, but it’s not in the current release).

The options currently would appear to be to use any number of output plugins in 
Logstash – TCP, UDP, syslog, Kafka, HTTP, RabbitMQ – then use the equivalent 
receiver in NiFi (with an intermediate service in some cases – Kafka, 
RabbitMQ).

Can anyone suggest the ‘best’ way here? I’m trying to prove a point about
cutting out another intermediate product, so this is something that has to be
in production now – I can always refactor at a later date to a ‘better’
solution (MiNiFi??).

Why don’t I ask on Logstash forums? You folks have always been a great help 
before ;-)

Thanks
Conrad

Nb. Of course not saying Logstash folks wouldn’t be equally helpful :-)




Re: ConvertJSONToAvro floating point / double errors

2016-04-08 Thread Conrad Crampton
I think I had the same or a similar problem (with an earlier version) and couldn’t
get a direct JSON->Avro conversion to work with any JSON numerical type. I ended up
working around it by having the Avro schema data types be string only, and then
doing a ConvertAvroSchema to move from the string-only schema to the final
intended one that includes the number types.
I can’t say whether the error is intended behaviour for some reason
unbeknown to me.
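
For illustration, a minimal sketch of that workaround, reusing Samuel’s field
names (schemas abbreviated, otherwise hypothetical): first do the JSON->Avro
conversion against a string-only schema, e.g.

{"name": "sample_float_raw", "type": "record", "fields": [
  {"name":"sampleLong", "type":["null", "string"]},
  {"name":"sampleInt", "type":["null", "string"]},
  {"name":"sampleFloat", "type":["null", "string"]} ]}

and then use ConvertAvroSchema to map from that schema to the typed one you
actually want:

{"name": "sample_float_avro", "type": "record", "fields": [
  {"name":"sampleLong", "type":["null", "long"]},
  {"name":"sampleInt", "type":["null", "int"]},
  {"name":"sampleFloat", "type":["null", "float"]} ]}
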
Regards
Conrad



On 07/04/2016, 21:49, "Samuel Piercy"  wrote:

>In Nifi 0.6.0 with Java 1.7, the conversion of JSON data to Avro appears to 
>reject the floating point values (both float and double).
>
>Using the ConvertJSONToAvro processor and the following examples causes errors.
>
>Sample Avro Schema:  
>{"name": " sample_float_avro", "type": "record", "fields": [   
>{"name":"sampleLong", "type":["null", "long"]}  ,{"name":"sampleInt", 
>"type":["null", "int"]}  ,{"name":"sampleFloat", "type":["null", "float"]} ]}
>
>Fail Example:
>{"sampleLong":123456, "sampleInt":345, "sampleFloat":10.1}
>
>Success Example:
>{"sampleLong":123456, "sampleInt":345, "sampleFloat":10}
>
>Is anyone else experiencing similar issues?
>
>Sam
>
>




Re: Sqoop Support in NIFI

2016-03-29 Thread Conrad Crampton
Hi,
Why use Sqoop at all? Use a combination of ExecuteSQL [1] and PutHDFS [2].
I have just replaced the use of Flume with a combination of ListenSyslog and
PutHDFS, which I guess is a similar architectural pattern.
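
As a rough sketch (property values are illustrative only, not from a real flow):

ExecuteSQL
  Database Connection Pooling Service: a DBCPConnectionPool pointing at the source RDBMS
  SQL select query: SELECT * FROM my_table
-> PutHDFS
  Hadoop Configuration Resources: /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
  Directory: /data/ingest/my_table

ExecuteSQL emits Avro, so you can drop a ConvertAvroToJSON in between if you want
JSON on HDFS, and schedule ExecuteSQL on a timer for incremental pulls.
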
HTH
Conrad


[1] http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExecuteSQL/index.html
[2] http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.hadoop.PutHDFS/index.html

From: prabhu Mahendran >
Reply-To: "users@nifi.apache.org" 
>
Date: Tuesday, 29 March 2016 at 07:27
To: "users@nifi.apache.org" 
>
Subject: Sqoop Support in NIFI

Hi,

I am new to NiFi.

I would like to know: is there any support for Sqoop through NiFi processors?

And how would the following be done with Sqoop's help:

Move data from Oracle, SQL Server and MySQL into HDFS, and vice versa.


Thanks,
Prabhu Mahendran









Re: Help on creating that flow that requires processing attributes in a flow content but need to preserve the original flow content

2016-03-22 Thread Conrad Crampton
My 2p.
If the kafka.key value is very simple JSON, you could use UpdateAttribute with
some Expression Language – specifically the string manipulation functions – to
extract the part you want.
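
For example (purely illustrative, and assuming the attribute holds something like
{"data": {"myKey": "myValue"}} as in Matt’s example), an UpdateAttribute property
named myKey with a value of

${kafka.key:substringAfter('"myKey": "'):substringBefore('"')}

would pull out myValue without ever touching the FlowFile content – brittle
compared to proper JSON parsing, but fine for very simple, predictable keys.
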
I like the power of the ExecuteScript processor, by the way.

And I agree, this community is phenomenally responsive and helpful.

Regards
Conrad




On 21/03/2016, 18:38, "McDermott, Chris Kevin (MSDU - STaTS/StorefrontRemote)" 
 wrote:

>Thanks everyone.  While I’m naturally disappointed that this doesn’t exist, I 
>am hyper charged about the responsiveness and enthusiasm of the NiFi community!
>
>From: Matt Burgess >
>Reply-To: "users@nifi.apache.org" 
>>
>Date: Monday, March 21, 2016 at 1:58 PM
>To: "users@nifi.apache.org" 
>>
>Subject: Re: Help on creating that flow that requires processing attributes in 
>a flow content but need to preserve the original flow content
>
>One way (in NiFi 0.5.0+) is to use the ExecuteScript processor, which gives 
>you full control over the session and flowfile(s).  For example if you had 
>JSON in your "kafka.key" attribute such as "{"data": {"myKey": "myValue"}}" , 
>you could use the following Groovy script to parse out the value of the 
>'data.myKey' field:
>
>def flowfile = session.get()
>if(!flowfile) return
>def json = new 
>groovy.json.JsonSlurper().parseText(flowfile.getAttribute('kafka.key'))
>flowfile = session.putAttribute(flowfile, 'myKey', json.data.myKey)
>session.transfer(flowfile, REL_SUCCESS)
>
>
>I put an example of this up as a Gist 
>(https://gist.github.com/mattyb149/478864017ec70d76f74f)
>
>A possible improvement could be to add a "jsonPath" function to Expression 
>Language, which could take any value (including an attribute) along with a 
>JSONPath expression to evaluate against it...
>
>Regards,
>Matt
>
>On Mon, Mar 21, 2016 at 1:48 PM, McDermott, Chris Kevin (MSDU - 
>STaTS/StorefrontRemote) 
>> wrote:
>Joe,
>
>Thanks for the reply.  I think I was not clear.
>
>The JSON I need to evaluate is in a FlowFile attribute (kafka.key) which I 
>need to be able to evaluate without modifying the original FlowFile content 
>(which was read from the Kafka topic).  What I can’t figure out is how to 
>squirrel away the flowfile content so that I can write the value of the 
>kafka.key attribute to the FlowFile content, so that I can process it with 
>EvaluateJsonPath, and then read content I squirreled away back into the 
>FlowFile content. I considered using the DistributedMapCache, but there 
>would be no guarantee that what I added to the cache would still be there when I 
>needed it back.
>
>
>
>
>On 3/21/16, 1:37 PM, "Joe Witt" 
>> wrote:
>
>>Chris,
>>
>>Sounds like you have the right flow in mind already.  EvaluateJSONPath
>>does not write content.  It merely evaluates the given jsonpath
>>expression against the content of the flowfile and if appropriate
>>creates a flowfile attribute of what it finds.
>>
>>For example if you have JSON from Twitter you can use EvaluateJsonPath
>>and add a property with a name
>>'twitter.user' and a value of '$.user.name'
>>
>>Once you run the tweets through each flow file will have an attribute
>>called 'twitter.user' with the name found in the message.  No
>>manipulation of content at all.  Just promotes things it finds to flow
>>file attributes.
>>
>>Thanks
>>Joe
>>
>>On Mon, Mar 21, 2016 at 1:34 PM, McDermott, Chris Kevin (MSDU -
>>STaTS/StorefrontRemote) 
>>> wrote:
>>> What I need to do is read a file from Kafka.  The Kafka key contains a JSON 
>>> string which I need to turn into FlowFile attributes while preserving the 
>>> original FlowFile content.  Obviously I can use EvaluateJsonPath but that 
>>> necessitates replacing the FlowFile content with the kafka.key attribute, 
>>> thus losing the original FlowFile content.  I feel like I’m missing 
>>> something fundamental.
>
>
>



NPE in ExtractText

2016-03-19 Thread Conrad Crampton
Hi,
I’m getting repeated NullPointerException reported for an ExtractText processor 
(processing the resultant splits from a ListenSyslog) thus:

datanode2-cm1.mis-cds.local:9092ExtractText[id=4372efbf-efcf-3065-acd4-b8fdb91d64fb]
 ExtractText[id=4372efbf-efcf-3065-acd4-b8fdb91d64fb] failed to process due to 
java.lang.NullPointerException; rolling back session: 
java.lang.NullPointerException

The same error appears for each node in my cluster too so not specific to one 
node.
I have enabled additional logging for the ExtractText processor (I think) by adding
a logger entry to each logback.xml on each node, but this doesn’t give any more
details as to why the NPE occurs.
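
For reference, the sort of entry I mean is something like the following (the
logger name is from memory, so treat it as an assumption):

<logger name="org.apache.nifi.processors.standard.ExtractText" level="DEBUG"/>

added inside the <configuration> element of each logback.xml.
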
FlowFiles are getting through the processor, but I’m concerned about the errors as
clearly something isn’t correct, so suggestions welcome.

Thanks
Conrad




Re: ListHDFS cache (again)

2016-03-19 Thread Conrad Crampton
Hi Mark,
I’m using 0.5.1 and yes I am using DistributedMapCacheClientService (that is 
the only option isn’t it?).
Deleting the local file I guess is the piece that I’m missing – recreating the
processor means it has a different id, and I suppose this is the reason it starts
from scratch after that.

I understand why it would be an edge case to be able to clear the cache for 
this processor, but there are times when a major balls up requires one to 
reprocess a load of files in HDFS!

Thanks, I’ll look out for 0.6

Conrad

From: Mark Payne <marka...@hotmail.com<mailto:marka...@hotmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Wednesday, 16 March 2016 at 12:36
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: ListHDFS cache (again)

Conrad,

If you are using a version that used DistributedMapCacheClientService, it also 
was saving state in a local file,
$NIFI_HOME/conf/state/

You would need to delete that file, as well. I am not sure though if it would 
notice that the file was deleted
without restarting NiFi. That processor wasn't really designed with the intent 
of allowing the user to clear
the cache.

That said, in NiFi 0.6.0, which should be coming very soon (hopefully next 
week?) it has been updated
to allow users to very easily view and reset the state. Hopefully that will 
ease the pain for you.

Thanks
-Mark

On Mar 16, 2016, at 4:33 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:

Hi,
The subject of invalidating/ resetting the cache/ forcing new listing returned 
from ListHDFS was raised back in December 2015 [1] with the ‘resolution’ to 
delete the backing DistributedMapCacheClientService and add a different one. I 
have tried this – recreated completely from scratch, different name, different 
node, different port – all manner of combinations, but still the listing 
returned from ListHDFS starts where it left off. Satanic forces are at work 
here on my cluster I’m sure!!!

The odd thing is that there was a moment when I ran the flow where it did 
start from scratch, but I can’t remember the circumstances of this. I’ve tried 
rebooting the NCM to no avail.

Any further suggestions beyond those in [1]? This is a bit of a show stopper, as I
can’t get the data into the flow the way I want no matter how I try.

I have got this working again, but only by deleting the processor and
recreating it. A bit of a faff, so I would be interested to hear whether anyone
else has had the same problem, and any tips on a better workaround than mine.

Thanks
Conrad

[1] 
https://mail-archives.apache.org/mod_mbox/nifi-users/201512.mbox/%3CCAEf2RqDGoTGkBd4dnLhuawPr4oOmFF7rRkrv_=ae8u2rq6o...@mail.gmail.com%3E








Re: NPE in ExtractText

2016-03-19 Thread Conrad Crampton
Hi,
I don’t know if this is expected behaviour, but I think I understand why this is 
happening now. I have a regexp in the ExtractText processor, viz:

(?s:^.+: (\d\d?)(\w\w\w)(\d{4}) ([\d ]\d:\d\d:\d\d) Product=(.+?) 
OriginIP=(.+?) Origin=(.+?) Action=(.+?) SIP=(.+?) Source=(.+?) SPort=(\d+?) 
DIP=(.+) Destination=(.+?) DPort=(\d+?) Protocol=(.+?)(?: ICMPType=(.+?) 
ICMPCode=(.+?))? IFName=(.+?) IFDirection=(.+?) Reason=(.+?) Rule=(.+?) 
PolicyName=(.+?) Info=(.+?) XlateSIP=(.+?) XlateSPort=([\d]+|\-?) 
XlateDIP=(.+?) XlateDPort=([\d]+|\-?)(.*)$)

I think the problem is this part: (?: ICMPType=(.+?) ICMPCode=(.+?))?. Because I
have made a non-capturing group optional, for those log lines that don’t contain
that section the processor can’t set the attribute index correctly, as the match
returns null for those capture groups. Obviously I haven’t gone too deep into the
code, but if I put a RouteOnContent processor before this one to test for the
string, and remove the optional part from the regexp (so I have two ExtractText
processors), then it works. It appears that all the NPEs were thrown for lines
that didn’t match the optional group.
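
A quick Groovy sketch of what I mean, using a cut-down, hypothetical version of
the pattern rather than the real firewall regexp:

import java.util.regex.Pattern

// optional non-capturing block containing capture groups, like the ICMP part above
def p = Pattern.compile(/Action=(\S+)(?: ICMPType=(\S+) ICMPCode=(\S+))? Protocol=(\S+)/)
def m = p.matcher('Action=drop Protocol=tcp')
assert m.matches()
assert m.group(1) == 'drop'
assert m.group(2) == null   // the optional groups simply come back null when absent
assert m.group(4) == 'tcp'

so anything that blindly dereferences every capture group will hit an NPE on
lines where the optional section is missing.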

Has this been observed before?

Thanks
Conrad

From: Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Wednesday, 16 March 2016 at 12:01
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: NPE in ExtractText

Hi,
I’m getting repeated NullPointerException reported for an ExtractText processor 
(processing the resultant splits from a ListenSyslog) thus:

datanode2-cm1.mis-cds.local:9092ExtractText[id=4372efbf-efcf-3065-acd4-b8fdb91d64fb]
 ExtractText[id=4372efbf-efcf-3065-acd4-b8fdb91d64fb] failed to process due to 
java.lang.NullPointerException; rolling back session: 
java.lang.NullPointerException

The same error appears for each node in my cluster too so not specific to one 
node.
I have enabled additional logging for the ExtractText processor (I think) by adding
a logger entry to each logback.xml on each node, but this doesn’t give any more
details as to why the NPE occurs.
FlowFiles are getting through the processor, but I’m concerned about the errors as
clearly something isn’t correct, so suggestions welcome.

Thanks
Conrad







Re: User Authentication with username and password

2016-03-10 Thread Conrad Crampton
Hi,
In case you missed this in an earlier thread, Matt Clarke [1] provided some 
very easy steps to create certificates for cluster SSL to work. I’m sure it 
would be easy to extend to create individual user certs. I’m sure this would be 
a pain for many users in which case I would recommend the LDAP way (again, very 
easy to set up following the docs).
Regards
Conrad

[1] 
http://mail-archives.apache.org/mod_mbox/nifi-users/201602.mbox/%3CCAC9dF2e_Kuf%2B_JVNM%2BVjiqcmA-rPwBiUc0kOZbvACWFC37XUtg%40mail.gmail.com%3E

From: Aldrin Piri >
Reply-To: "users@nifi.apache.org" 
>
Date: Friday, 11 March 2016 at 04:14
To: "users@nifi.apache.org" 
>
Subject: Re: User Authentication with username and password

Uwe,

Definitely been a frequently requested item by the community and as Andy 
pointed out, it is quite nuanced in getting just right in a way that manages to 
get as close to that delicate balance between usability and security short of 
the computer encased in concrete on the bottom of the ocean floor.

I think James has a good start in providing a basis of implementation; by extending
it and drawing on the expertise of the entire community, we should be able to find
an implementation that checks all the right boxes.  The parts are there in some
form, looking at the work that was performed to integrate LDAP and Active
Directory.  While the user-facing portions have very similar constructs, the heart
of the security model delegates to other systems.

For Uwe, and anyone that has any interest, I would suggest also checking out 
both the JIRA issue NIFI-1614 [1] and the associated PR [2] and provide some 
input on how such an implementation might look.  Please leave comments, uses, 
and functionality that would make sense to incorporate.  With some iterations 
and design we can find out how such a mechanism would work in both satisfying 
the design approach and principles for NiFi authentication and authorization, 
but doing so in a manner that treats system data and control with the utmost 
importance.

If appropriate, perhaps we could spin out a Wiki entry/feature proposal/design 
that folks could hash out all the constraints.  Such a model will require a 
fair bit of effort and consideration as NiFi has generally avoided doing too 
much outside the purview of dataflow and relied on already established and 
proven technologies.

Thanks for chiming in, Uwe and thanks, James, for the serendipitous PR, or 
extremely fast coding to meet Uwe's inquiry!

[1] https://issues.apache.org/jira/browse/NIFI-1614
[2] https://github.com/apache/nifi/pull/267

On Thu, Mar 10, 2016 at 9:40 PM, Andy LoPresto 
> wrote:
I think it is important to ensure everyone is using the same definitions here. 
As Matt pointed out, NiFi has capabilities for authentication (that is, 
determining an entity is WHO they claim to be) and authorization (determining 
if an entity can DO what they claim). The *AuthorizationProviders allow an 
authenticated user’s access to varying permissions to be determined. However, 
there is no current model for file-based or “simple” authentication in NiFi. As 
Matt stated, client authentication through certificates will allow user 
authentication based on the DN in the certificate, and LDAP authentication is 
also currently available. I am working on Kerberos authentication for 0.6.0 as 
well.

James has provided a PR for file-based authentication but in reviewing it I 
found a couple issues, which are not unique to his code, that prevent me from 
feeling comfortable with it as a safe and production-ready solution. User 
credential administration is a large effort and providing a temporary solution 
will unfortunately often be conscripted into a production environment and 
weaken the overall system security of the installation.

Unfortunately “simple” authentication really isn’t.


Andy LoPresto
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On Mar 10, 2016, at 1:55 PM, Matt Gilman 
> wrote:

When NiFi is running over HTTP everyone accesses the application as an 
anonymous user and has full access.

If you want to have individual user accounts, you'll need to first run NiFi 
over HTTPS. In order to do this, you'll need to obtain a server certificate for 
NiFi to use. These details are configured in nifi.security.* sections of the 
properties file. You can choose any port you'd like but typically you'll see 
443 or 8443.
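
For example (host names, paths and passwords below are placeholders only), the
relevant nifi.properties entries look something like:

nifi.web.https.host=nifi-host.example.com
nifi.web.https.port=8443
nifi.security.keystore=/opt/nifi/conf/keystore.jks
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=changeit
nifi.security.keyPasswd=changeit
nifi.security.truststore=/opt/nifi/conf/truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=changeit

(and blank out nifi.web.http.port so the plain HTTP listener is disabled).
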

Once this is set up, you'll have two choices for authentication.

The first is to issue client certificates for your users. These certificates 
will be loaded into your 

Re: StoreInKiteDataset - clustered?

2016-03-09 Thread Conrad Crampton
Bryan,
Great spot – thanks for the pointer. This isn’t how it is described in their 
user docs though!
All working now.
Thanks
Conrad

From: Bryan Bende <bbe...@gmail.com<mailto:bbe...@gmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Tuesday, 8 March 2016 at 17:05
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: StoreInKiteDataset - clustered?

Conrad,

I really have no experience with Kite, but the javadocs [1] suggest something 
like this should work:

dataset:hive://[metastore-host]:[metastore-port]/[path]

-Bryan

[1] 
http://kitesdk.org/docs/0.11.0/apidocs/org/kitesdk/data/DatasetRepositories.html#open(java.lang.String)

On Tue, Mar 8, 2016 at 11:32 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
Hi,
Having looked at the tests for StoreInKiteDataset processor I can’t see why my 
attempt at using this is failing with a "failed to process due to 
java.lang.IllegalArgumentException: Missing Hive MetaStore connection URI; 
rolling back session: java.lang.IllegalArgumentException: Missing Hive 
MetaStore connection URI” exception. My uri is dataset:hive:prd/logs which I 
created using the Kite CLI on one of my nodes in cluster.
One part of the puzzle that I don’t understand is where this processor gets its
knowledge of the metadata in order to store into the underlying storage structure
of the Kite dataset – in this case, Hive.
Do I have to somehow replicate the Kite metastore (should one exist) on each node
of my NiFi cluster (or Hadoop cluster) so NiFi finds it, or am I missing some
dependencies (as suggested by a previous thread that had no apparent successful
conclusion)?

Any pointers welcome.
Thanks
Conrad









StoreInKiteDataset performance (perfomance in general)

2016-03-09 Thread Conrad Crampton
Hi,
Writing to the above seems very slow. Can anyone give rough metrics on what
throughput should look like for a modest schema (30 attributes) writing to the
above (Hive as the underlying metastore)? 6 node cluster – 16 cores per node.
What should I be setting JVM/thread counts etc. to for optimal performance?
Thanks
Conrad




StoreInKiteDataset - clustered?

2016-03-08 Thread Conrad Crampton
Hi,
Having looked at the tests for StoreInKiteDataset processor I can’t see why my 
attempt at using this is failing with a "failed to process due to 
java.lang.IllegalArgumentException: Missing Hive MetaStore connection URI; 
rolling back session: java.lang.IllegalArgumentException: Missing Hive 
MetaStore connection URI” exception. My uri is dataset:hive:prd/logs which I 
created using the Kite CLI on one of my nodes in cluster.
One part of the puzzle that I don’t understand is where this processor gets its
knowledge of the metadata in order to store into the underlying storage structure
of the Kite dataset – in this case, Hive.
Do I have to somehow replicate the Kite metastore (should one exist) on each node
of my NiFi cluster (or Hadoop cluster) so NiFi finds it, or am I missing some
dependencies (as suggested by a previous thread that had no apparent successful
conclusion)?

Any pointers welcome.
Thanks
Conrad





UI performance slow

2016-03-03 Thread Conrad Crampton
Hi,
I have just upgraded my 6 node cluster to 0.5.1 and am experiencing slow 
performance in the UI.
I can’t necessarily attribute this to the upgrade because:

  1.  I didn’t use the cluster particularly much as I was developing/testing on
my laptop, so I don’t know what performance was like before.
  2.  I have increased the number of nodes in the cluster from 3 to 6 (just 
because I introduced Ansible into the mix to ease deployment)

The issues I am experiencing are with simple things like dragging processors
around and opening up dialogs (template list etc.). When doing these actions, I
get the little spinning circle next to the ‘stats last refreshed’ time, which
makes the UI experience a bit unhelpful.

Can anyone suggest some things to check to improve this?

I am using the default bootstrap.conf file without any changes (other than 
run.as=)

Thanks
Conrad




Re: Aw: Re: Regular Expressions

2016-03-02 Thread Conrad Crampton
Hi,
Yes, it is valid in the tools I use (didn’t try the online one as you have 
access to that).
Clearly nothing is group captured with this regexp though – but it matches.
Conrad

From: Uwe Geercken >
Reply-To: "users@nifi.apache.org" 
>
Date: Wednesday, 2 March 2016 at 16:00
To: "users@nifi.apache.org" 
>
Subject: Aw: Re: Regular Expressions

tks for your reply.

would you do me a favor and check if the expression further below is valid in 
yr regex tool?

tks.

Uwe
--
This message was sent from my Android mobile phone with WEB.DE Mail.

On 02.03.2016, 14:41, Joe Skora wrote:
RegexPal is pretty easy to use, and supports PCRE.

On Tue, Mar 1, 2016 at 4:30 PM, Uwe Geercken 
> wrote:
Hello,

I was wondering which tool people use to validate their regular expressions?

I was going throught some of the templates I found in the web and found one 
with following regular expression:

(?s:^.*$)

When using http://www.regexr.com/ which I find very good and complete, 
regexr.com tells me that the question mark at the beginning 
is invalid?

So which way do you write or validate your expressions?

Tks for feedback.

Uwe








Re: Nifi JSON event storage in HDFS

2016-03-02 Thread Conrad Crampton
Hi,
I have similar specifications about SQL access – those specifying this keep
saying Hive, but I don’t believe that is the real requirement (typical developer
knowing best, eh?) – I think it is just SQL access that is required. Drill is
more flexible (in my opinion – I am not affiliated with Drill in any way) and has
drivers for tooling access too (in a similar way to Hive). There is Spark
support for Avro too.
I’ll be interested to follow your progress on this.
Conrad

From: Mike Harding <mikeyhard...@gmail.com<mailto:mikeyhard...@gmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Wednesday, 2 March 2016 at 10:54
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: Nifi JSON event storage in HDFS

Hi Conrad,

Thanks for the heads up, I will investigate Apache Drill. I also forgot to 
mention that I have downstream requirements about which tools the data 
modellers are comfortable using - they want to use Hive and Spark as the data 
access engines primarily so the data needs to be persisted in HDFS in a way 
that it can be easily accessed by these services.

But your right - there is multiple ways of doing this and I'm hoping NiFi would 
help scope/simplify the pipeline design.

Cheers,
M

On 2 March 2016 at 10:38, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
Hi,
I am doing something similar, but having wrestled with Hive data population
(not from NiFi) and its performance, I am currently looking at Apache Drill as
my SQL abstraction layer over my Hadoop cluster (similar size to yours). To
this end, I have chosen Avro as my ‘persistence’ format and am using a number of
processors to get from raw data, through mapping attributes to JSON, to Avro (via
schemas), ultimately storing in HDFS. Querying this with Drill is then a breeze,
as the schema is already specified within the data, which Drill understands. The
schema can also be extended without impacting existing data.
HTH – I’m sure there are a ton of other ways to skin this particular cat though,
Conrad

From: Mike Harding <mikeyhard...@gmail.com<mailto:mikeyhard...@gmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Wednesday, 2 March 2016 at 10:33
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Nifi JSON event storage in HDFS

Hi All,

I currently have a small hadoop cluster running with HDFS and Hive. My ultimate 
goal is to leverage NiFi's ingestion and flow capabilities to store real-time 
external JSON formatted event data.

What I am unclear about is what the best strategy/design is for storing 
FlowFile data (i.e. JSON events in my case) within HDFS that can then be 
accessed and analysed in Hive tables.

Is much of the design in terms of storage handled in the NiFi flow or do I need 
to set something up external of NiFi to ensure I can query each JSON formatted 
event as a record in a Hive log table for example?

Any examples or suggestions much appreciated,

Thanks,
M








Re: Nifi JSON event storage in HDFS

2016-03-02 Thread Conrad Crampton
Hi,
I am doing something similar, but having wrestled with Hive data population
(not from NiFi) and its performance, I am currently looking at Apache Drill as
my SQL abstraction layer over my Hadoop cluster (similar size to yours). To
this end, I have chosen Avro as my ‘persistence’ format and am using a number of
processors to get from raw data, through mapping attributes to JSON, to Avro (via
schemas), ultimately storing in HDFS. Querying this with Drill is then a breeze,
as the schema is already specified within the data, which Drill understands. The
schema can also be extended without impacting existing data.
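
For what it’s worth, the rough shape of my flow (processor chain only, property
details omitted) is:

ListenSyslog -> ExtractText (regexp -> attributes) -> AttributesToJSON
  -> ConvertJSONToAvro (one schema per log type) -> PutHDFS
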
HTH – I’m sure there are a ton of other ways to skin this particular cat though,
Conrad

From: Mike Harding >
Reply-To: "users@nifi.apache.org" 
>
Date: Wednesday, 2 March 2016 at 10:33
To: "users@nifi.apache.org" 
>
Subject: Nifi JSON event storage in HDFS

Hi All,

I currently have a small hadoop cluster running with HDFS and Hive. My ultimate 
goal is to leverage NiFi's ingestion and flow capabilities to store real-time 
external JSON formatted event data.

What I am unclear about is what the best strategy/design is for storing 
FlowFile data (i.e. JSON events in my case) within HDFS that can then be 
accessed and analysed in Hive tables.

Is much of the design in terms of storage handled in the NiFi flow or do I need 
to set something up external of NiFi to ensure I can query each JSON formatted 
event as a record in a Hive log table for example?

Any examples or suggestions much appreciated,

Thanks,
M







Re: Authentication error - access denied exporting template

2016-02-22 Thread Conrad Crampton
Excellent news, thanks for such a great tool and the continuing hard work.
Conrad



On 22/02/2016, 15:56, "Joe Witt" <joe.w...@gmail.com> wrote:

>We are working to have an RC for it any day.  We're very close so
>today is feasible.
>
>Thanks
>Joe
>
>On Mon, Feb 22, 2016 at 10:18 AM, Conrad Crampton
><conrad.cramp...@secdata.com> wrote:
>> Matt,
>> Thanks for the update. I trawled the user list archive for anything similar,
>> didn’t think to check the issue log.
>> Is there a date for the 0.5.1 release?
>> Cheers
>> Conrad
>>
>> From: Matt Gilman <matt.c.gil...@gmail.com>
>> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>> Date: Monday, 22 February 2016 at 15:04
>> To: "users@nifi.apache.org" <users@nifi.apache.org>
>> Subject: Re: Authentication error - access denied exporting template
>>
>> Thanks for reporting this. The issue is specifically tied to downloading
>> resources (content or templates) and accessing UI extensions (like the
>> content viewer or custom UI) when logged in via LDAP. Using client
>> certificates should be working as expected.
>>
>> This token issue is addressed in NIFI-1497 [1] and will be included in the
>> upcoming 0.5.1 release. Sorry for the inconvenience.
>>
>> Matt
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-1497
>>
>> On Mon, Feb 22, 2016 at 9:48 AM, Conrad Crampton
>> <conrad.cramp...@secdata.com> wrote:
>>>
>>> Hi,
>>> I have a working NiFi installation on both my local machine and on a
>>> cluster. Both set up with certificates and https access and LDAP
>>> integration. All good.
>>> However, I have come across an issue where I can’t now export templates as
>>> I get an ‘access denied’ error in the UI, and in the nifi-user.log I get
>>> this stack trace…
>>>
>>> o.a.n.w.s.NiFiAuthenticationFilter Unable to authorize: An Authentication
>>> object was not found in the SecurityContext
>>>
>>> org.springframework.security.authentication.AuthenticationCredentialsNotFoundException:
>>> An Authentication object was not found in the SecurityContext
>>> at
>>> org.springframework.security.access.intercept.AbstractSecurityInterceptor.credentialsNotFound(AbstractSecurityInterceptor.java:378)
>>> ~[spring-security-core-4.0.3.RELEASE.jar:4.0.3.RELEASE]
>>> at
>>> org.springframework.security.access.intercept.AbstractSecurityInterceptor.beforeInvocation(AbstractSecurityInterceptor.java:222)
>>> ~[spring-security-core-4.0.3.RELEASE.jar:4.0.3.RELEASE]
>>> at
>>> org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:123)
>>> ~[spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
>>> at
>>> org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:90)
>>> ~[spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
>>> at
>>> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
>>> [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
>>> at
>>> org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:122)
>>> ~[spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
>>> at
>>> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
>>> [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
>>> at
>>> org.apache.nifi.web.security.NiFiAuthenticationFilter.authenticate(NiFiAuthenticationFilter.java:99)
>>> [nifi-web-security-0.4.1.jar:0.4.1]
>>> at
>>> org.apache.nifi.web.security.NiFiAuthenticationFilter.doFilter(NiFiAuthenticationFilter.java:60)
>>> [nifi-web-security-0.4.1.jar:0.4.1]
>>> at
>>> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
>>> [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
>>> at
>>> org.apache.nifi.web.security.NiFiAuthenticationFilter.authenticate(NiFiAuthenticationFilter.java:99)
>>> [nifi-web-security-0.4.1.jar:0.4.1]
>>> at
>>> org.apache.nifi.web.security.NiFiAuthenticationFilter.doFilter(NiFiAuthenticationFilter.java:60)
>>> [nifi-web-security-0.4.1.jar:0.4.1]
>>> at
>>> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
>>> [spring-security-web-

Re: Authentication error - access denied exporting template

2016-02-22 Thread Conrad Crampton
Matt,
Thanks for the update. I trawled the user list archive for anything similar, 
didn’t think to check the issue log.
Is there a date for the 0.5.1 release?
Cheers
Conrad

From: Matt Gilman <matt.c.gil...@gmail.com<mailto:matt.c.gil...@gmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Monday, 22 February 2016 at 15:04
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: Authentication error - access denied exporting template

Thanks for reporting this. The issue is specifically tied to downloading 
resources (content or templates) and accessing UI extensions (like the content 
viewer or custom UI) when logged in via LDAP. Using client certificates should 
be working as expected.

This token issue is addressed in NIFI-1497 [1] and will be included in the 
upcoming 0.5.1 release. Sorry for the inconvenience.

Matt

[1] https://issues.apache.org/jira/browse/NIFI-1497

On Mon, Feb 22, 2016 at 9:48 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
Hi,
I have a working NiFi installation on both my local machine and on a cluster. 
Both set up with certificates and https access and LDAP integration. All good.
However, I have come across an issue where I can’t now export templates as I 
get an ‘access denied’ error in the UI, and in the nifi-user.log I get this 
stack trace…

o.a.n.w.s.NiFiAuthenticationFilter Unable to authorize: An Authentication 
object was not found in the SecurityContext
org.springframework.security.authentication.AuthenticationCredentialsNotFoundException:
 An Authentication object was not found in the SecurityContext
at 
org.springframework.security.access.intercept.AbstractSecurityInterceptor.credentialsNotFound(AbstractSecurityInterceptor.java:378)
 ~[spring-security-core-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.access.intercept.AbstractSecurityInterceptor.beforeInvocation(AbstractSecurityInterceptor.java:222)
 ~[spring-security-core-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:123)
 ~[spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:90)
 ~[spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:122)
 ~[spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.apache.nifi.web.security.NiFiAuthenticationFilter.authenticate(NiFiAuthenticationFilter.java:99)
 [nifi-web-security-0.4.1.jar:0.4.1]
at 
org.apache.nifi.web.security.NiFiAuthenticationFilter.doFilter(NiFiAuthenticationFilter.java:60)
 [nifi-web-security-0.4.1.jar:0.4.1]
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.apache.nifi.web.security.NiFiAuthenticationFilter.authenticate(NiFiAuthenticationFilter.java:99)
 [nifi-web-security-0.4.1.jar:0.4.1]
at 
org.apache.nifi.web.security.NiFiAuthenticationFilter.doFilter(NiFiAuthenticationFilter.java:60)
 [nifi-web-security-0.4.1.jar:0.4.1]
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:111)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.apache.nifi.web.security.node.NodeAuthorizedUserFilter.doFilter(NodeAuthorizedUserFilter.java:112)
 [nifi-web-security-0.4.1.jar:0.4.1]
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:213)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:176)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.web.filter.DelegatingFilterProxy.invokeDeleg

Authentication error - access denied exporting template

2016-02-22 Thread Conrad Crampton
Hi,
I have a working NiFi installation on both my local machine and on a cluster. 
Both set up with certificates and https access and LDAP integration. All good.
However, I have come across an issue where I can’t now export templates as I 
get an ‘access denied’ error in the UI, and in the nifi-user.log I get this 
stack trace…

o.a.n.w.s.NiFiAuthenticationFilter Unable to authorize: An Authentication 
object was not found in the SecurityContext
org.springframework.security.authentication.AuthenticationCredentialsNotFoundException:
 An Authentication object was not found in the SecurityContext
at 
org.springframework.security.access.intercept.AbstractSecurityInterceptor.credentialsNotFound(AbstractSecurityInterceptor.java:378)
 ~[spring-security-core-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.access.intercept.AbstractSecurityInterceptor.beforeInvocation(AbstractSecurityInterceptor.java:222)
 ~[spring-security-core-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:123)
 ~[spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:90)
 ~[spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:122)
 ~[spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.apache.nifi.web.security.NiFiAuthenticationFilter.authenticate(NiFiAuthenticationFilter.java:99)
 [nifi-web-security-0.4.1.jar:0.4.1]
at 
org.apache.nifi.web.security.NiFiAuthenticationFilter.doFilter(NiFiAuthenticationFilter.java:60)
 [nifi-web-security-0.4.1.jar:0.4.1]
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.apache.nifi.web.security.NiFiAuthenticationFilter.authenticate(NiFiAuthenticationFilter.java:99)
 [nifi-web-security-0.4.1.jar:0.4.1]
at 
org.apache.nifi.web.security.NiFiAuthenticationFilter.doFilter(NiFiAuthenticationFilter.java:60)
 [nifi-web-security-0.4.1.jar:0.4.1]
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:111)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.apache.nifi.web.security.node.NodeAuthorizedUserFilter.doFilter(NodeAuthorizedUserFilter.java:112)
 [nifi-web-security-0.4.1.jar:0.4.1]
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:213)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:176)
 [spring-security-web-4.0.3.RELEASE.jar:4.0.3.RELEASE]
at 
org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:344)
 [spring-web-4.1.6.RELEASE.jar:4.1.6.RELEASE]
at 
org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:261)
 [spring-web-4.1.6.RELEASE.jar:4.1.6.RELEASE]
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
 [jetty-servlet-9.2.11.v20150529.jar:9.2.11.v20150529]
at 
org.apache.nifi.web.filter.ThreadLocalFilter.doFilter(ThreadLocalFilter.java:38)
 [classes/:na]
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
 [jetty-servlet-9.2.11.v20150529.jar:9.2.11.v20150529]
at org.apache.nifi.web.filter.TimerFilter.doFilter(TimerFilter.java:52) 
[classes/:na]
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
 [jetty-servlet-9.2.11.v20150529.jar:9.2.11.v20150529]
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) 
[jetty-servlet-9.2.11.v20150529.jar:9.2.11.v20150529]
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) 
[jetty-server-9.2.11.v20150529.jar:9.2.11.v20150529]
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) 

Re: splitText output appears to be getting dropped

2016-02-19 Thread Conrad Crampton
Hi,
Perfect!
I tried \n for linefeed – didn’t think of shift+enter!

The reason I was updating the filename early on in my flow was just because I
already had an UpdateAttribute there and it was a handy place to do so. I can put
it just before the PutFile though, so no major issue – I just wondered why this
was happening and whether it was by design (a feature) or a bug.

Thanks
Conrad

From: Bryan Bende <bbe...@gmail.com<mailto:bbe...@gmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Friday, 19 February 2016 at 16:16
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: splitText output appears to be getting dropped

Hello,

MergeContent has properties for header, demarcator, and footer, and also has a 
strategy property which specifies whether these values come from a file or 
inline text.

If you do inline text and specify a demarcator of a new line (shift + enter in 
the demarcator value) then binary concatenation will get you all of the lines 
merged together with new lines between them.
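
As a rough illustration (values to taste), the MergeContent settings would look
something like:

Merge Strategy: Bin-Packing Algorithm
Merge Format: Binary Concatenation
Delimiter Strategy: Text
Demarcator: a single newline (entered with shift + enter)
Minimum/Maximum Number of Entries: sized to suit your batching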

As far as the file naming, can you just wait until after RouteContent to rename 
them? They just need be renamed before the PutFile, but it doesn't necessarily 
have to be before RouteOnContent.

Let us know if that helps.

Thanks,

Bryan


On Fri, Feb 19, 2016 at 11:01 AM, Conrad Crampton 
<conrad.cramp...@secdata.com<mailto:conrad.cramp...@secdata.com>> wrote:
Hi,
Sorry to piggyback on this thread, but I have pretty much the same issue – I am
splitting log files -> RouteOnContent (various paths); two of these paths
(including unmatched) basically just need to get farmed off into a directory in
case they are needed later.
These go into a MergeContent processor where I would like to merge them into one
file – each FlowFile’s content as a line in the file, delimited by a line feed
(like the original file) – but whichever way I try this it doesn’t quite do what
I want. If I try Binary Concatenation the file ends up as one long line; if TAR,
each FlowFile is a separate file in a TAR (not surprisingly). There doesn’t seem
to be any way of merging FlowFile content into one file (ideally with similar
options to compress, specify number of files etc.).

Another related question to the answer below (which really helped me out with the
same issue): if I rename the filename early on in my flow, it appears to be
changed back to its original value by the time it reaches MergeContent, so I have
to put another UpdateAttribute step in after the merge to rename the file again.
The flow is:

UpdateAttribute (filename changed) -> RouteOnContent (same) -> UpdateAttribute (same) -> MergeContent (filename reverted) -> PutFile

If I put an extra UpdateAttribute before PutFile then it’s fine. Logging at each
of the above points shows the filename updated to ${uuid}-${filename}, but at the
‘reverted’ point it is back to the original filename.

Any suggestions, particularly on the first question?

Thanks
Conrad



From: Jeff Lord <jeffrey.l...@gmail.com<mailto:jeffrey.l...@gmail.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Date: Friday, 19 February 2016 at 03:22
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: Re: splitText output appears to be getting dropped

Matt,

Thanks a bunch!
That did the trick.
Out of curiosity, is there a better way to handle this than writing a single 
line out to multiple files? Each file contains a single string that will be used 
to build a URL.

-Jeff

On Thu, Feb 18, 2016 at 6:00 PM, Matthew Clarke 
<matt.clarke@gmail.com<mailto:matt.clarke@gmail.com>> wrote:

Jeff,
  It appears your files are being dropped because you are auto-terminating 
the failure relationship on your PutFile processor. When the SplitText processor 
splits the file by lines, every new FlowFile has the same filename as the 
original it came from. My guess is that the first file is being written to disk 
and all the others are failing because a file of the same name already exists in 
the target directory. Try adding an UpdateAttribute processor after the 
SplitText to rename all the files; the easiest way is to append each file's uuid 
to its filename. I also do not recommend auto-terminating failure relationships 
except in rare cases.
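
As a rough illustration of the collision and the fix (plain Java rather than 
NiFi API; the file name below is made up, and in the flow itself the rename is 
just an UpdateAttribute property such as filename = ${uuid}-${filename}):

    import java.util.UUID;

    public class RenameSketch {
        public static void main(String[] args) {
            // Every split inherits its parent's filename (hypothetical name), so
            // PutFile keeps seeing the same name and all but the first write fail.
            String inheritedFilename = "ids.txt";

            // Prefixing the FlowFile's uuid makes each name unique, which is what
            // filename = ${uuid}-${filename} achieves in UpdateAttribute.
            String uniqueFilename = UUID.randomUUID() + "-" + inheritedFilename;
            System.out.println(uniqueFilename);
        }
    }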

Matt

On Feb 18, 2016 8:36 PM, "Jeff Lord" 
<jeffrey.l...@gmail.com<mailto:jeffrey.l...@gmail.com>> wrote:
I have a pretty simple flow where I query for a list of ids using ExecuteProcess 
and then pass that list along to SplitText, where I am trying to split on each 
line to then dynamically build a URL further down the line using UpdateAttribute.

Re: Using the Content of a FlowFile in NiFi Expression Language?

2016-02-15 Thread Conrad Crampton
Hi,
This is exactly what I am using NiFi for mostly: parsing log files that have one 
line per FlowFile. To supplement Aldrin's answer, I am doing exactly this: using 
a regexp to parse the FlowFile content (in some cases I also pre-process the 
line with ReplaceTextWithMapping for lookup values), then using AttributesToJSON 
to make the FlowFile a single line of JSON, thus converting something 
semi-structured into a known format. Finally, I convert to Avro, amongst other 
things.
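
For anyone following the same pattern, a very rough sketch of the 
attributes-to-JSON step in plain Java (the field names and values are made up, 
no escaping is handled, and in the flow itself AttributesToJSON does this work):

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class AttributesToJsonSketch {
        public static void main(String[] args) {
            // Attributes as they might look after the regex parsing and
            // ReplaceTextWithMapping steps (values are made up).
            Map<String, String> attributes = new LinkedHashMap<>();
            attributes.put("timestamp", "2016-02-15T23:14:00Z");
            attributes.put("host", "web01");
            attributes.put("message", "user login succeeded");

            // One line of JSON per FlowFile, ready for conversion to Avro downstream.
            String json = attributes.entrySet().stream()
                    .map(e -> "\"" + e.getKey() + "\":\"" + e.getValue() + "\"")
                    .collect(Collectors.joining(",", "{", "}"));
            System.out.println(json);
        }
    }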
Hope that helps.
Conrad

From: Aldrin Piri
Reply-To: "users@nifi.apache.org"
Date: Tuesday, 16 February 2016 at 04:35
To: "users@nifi.apache.org"
Subject: Re: Using the Content of a FlowFile in NiFi Expression Language?

Jeff,

This would easily be handled by ExtractText [1], wherein you can specify a regex 
to extract values from the content and add them as attributes on the FlowFile. 
For your case, you could use an ExtractText instance with a dynamic property 
whose name is the desired attribute name and whose value is the regular 
expression. The InvokeHTTP could then make use of the extracted attribute via 
Expression Language, as you highlighted in your original message.

[1] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExtractText/index.html
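
As a rough sketch of the mechanics in plain Java (not the processors themselves; 
the attribute name record.id, the regex, and the URL are made-up examples): 
ExtractText runs the regex against the content and puts the capture group into 
an attribute, and InvokeHTTP then substitutes that attribute into its URL via 
Expression Language, e.g. http://example.com/api/${record.id}.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class ExtractSketch {
        public static void main(String[] args) {
            // Hypothetical single-line FlowFile content.
            String content = "id=12345";

            // Equivalent of an ExtractText dynamic property named record.id with
            // the regular expression id=(\d+).
            Matcher m = Pattern.compile("id=(\\d+)").matcher(content);
            if (m.find()) {
                String recordId = m.group(1); // would become the record.id attribute

                // Equivalent of InvokeHTTP with URL http://example.com/api/${record.id}
                String url = "http://example.com/api/" + recordId;
                System.out.println(url);
            }
        }
    }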

On Mon, Feb 15, 2016 at 11:14 PM, Jeff - Data Bean Australia wrote:
Hi,

The ${ attribute } expression can help using attributes of a FlowFile in NiFi 
Expression Language, but what about the content?

For example, if there is only one line in a FlowFile and I would like to use it 
as part of my URL for InvokeHTTP processor, how can I do it?

Thanks,
Jeff

--
Data Bean - A Big Data Solution Provider in Australia.



