Re: How to configure an ExecuteStreamCommand

2016-03-02 Thread jose antonio rodriguez diaz
Hello All,

Finally I got it working. In addition to the errors identified by Joe, I had 
other errors, specifically on the connection between PutFile and 
ExecuteStreamCommand: the success relationship wasn’t checked. To try to improve 
this data flow, what could I do to log failure relationships? What do you 
suggest? Any comment would be nice to hear, no matter if it is about any other 
way to improve the whole data flow.

Thanks again, Joe, for answering me.

Regards 


> On 3 Mar 2016, at 00:12, jose antonio rodriguez diaz wrote:
> 
> Hello Joe,
> 
> I am going to explain the whole picture of what I have and what I would like 
> to have. Right now I receive a file (in fact, I receive several files) on a 
> shared network drive called “Z”; then I manually move the file to a local 
> folder called “data” and finally I execute a batch file. This batch file 
> consumes (reads) the file and, after reading it, moves it to another folder 
> called “imported”. That’s why I am trying to invoke a batch program once the 
> file has been dropped in the “data” folder.
> 
> I have changed Max attribute length and set it to 256 and also left the 
> Output destination attribute empty. Even so, I still haven’t been able to 
> execute the batch file (there is no new file called foo.txt on my desktop).
> 
> Do you have any idea what I am doing wrong? I am pretty sure it should be an 
> easy fix. Please feel free to make any comment or suggestion regarding my case.
> 
> Thanks in advance.
> 
>> On 2 Mar 2016, at 17:36, Joe Percivall wrote:
>> 
>> Hello,
>> 
>> Welcome to NiFi!
>> 
>> I just tried running an ExecuteStreamCommand processor with the properties 
>> you have (I created a script and modified the paths to point to a folder 
>> that exists) and two things jump out. One, the Max attribute length must 
>> take an integer. If you set it to be a path the processor will be invalid 
>> and you'll see a yellow warning icon in the top left of the processor. This 
>> means the processor will not run and you'll see the flowfiles queue up in 
>> the relationship preceding it.
>> 
>> Second, the Output Destination Attribute is only for when you want to output 
>> the results of the command to an attribute instead of the content of a new 
>> flowfile (useful for running a command to find the character encoding of the 
>> contents). Using an integer for the max attribute length I am able to 
>> correctly run the script.
>> 
>> As a helpful hint, you can see the description of a property by hovering 
>> over the light blue "?" icon in the configure processor tab. Also you can 
>> see the documentation for the processor by right clicking on it and 
>> selecting "usage" from the list.
>> 
>> Also what will you eventually be doing with your script? The way the 
>> ExecuteStreamCommand is designed to work is by taking in a FlowFile and then 
>> running an external command on it. So you may make your flow more efficient 
>> and user friendly by putting the ExecuteStreamCommand between the Get and 
>> Put.
>> 
>> Hope that helps,
>> Joe
>> - - - - - - 
>> Joseph Percivall
>> linkedin.com/in/Percivall
>> e: joeperciv...@yahoo.com
>> 
>> 
>> 
>> 
>> On Sunday, February 28, 2016 4:53 PM, jose antonio rodriguez diaz 
>>  wrote:
>> Hello All,
>> 
>> I am just getting started with Apache NiFi, doing a kind of PoC (proof of 
>> concept). My dataflow is composed as follows:
>> GetFile->PutFile->ExecuteStreamCommand
>> 
>> The idea is to move a file from one folder to another and then execute a 
>> script. The first step (moving the file from one side to the other) works 
>> perfectly, but I haven't been able to execute the script. The script is very 
>> simple; I just want to create a file on my desktop.
>> 
>> The script, called script.sh, is located on my Desktop 
>> ($HOME/Desktop/script.sh):
>> 
>> #!/bin/bash
>> 
>> echo "This is a test" >> /Users/joseantoniorodriguez/Desktop/foo.txt
>> 
>> 
>> 
>> 
>> Also the ExecuteStreamCommand is configured as follows:
>> 
>> Command Path: /Users/joseantoniorodriguez/Desktop/script.sh
>> Ignore STDIN: true
>> Working directory: /Users/joseantoniorodriguez/Desktop
>> Argument delimiter: ;
>> Output destination attribute: /Users/joseantoniorodriguez/Desktop —> Is 
>> this necessary?
>> Max attribute length: /Users/joseantoniorodriguez/Desktop
>> 
>> 
>> The files I’m using to test are both CSVs, one about 324 KB and the other 
>> about 22 MB.
>> 
>> After executing, I could see the file had been moved from one folder to the 
>> other, but I didn't see any foo.txt file on my desktop; I also didn't see any 
>> error in the flow.
>> 
>> Could anybody give me a hand with this? I am pretty sure it is a trivial 
>> error or misconfiguration. By the way, the OS is Mac OS X.
>> 
>> Thanks in advance.
> 



StoreInKiteDataset Processor for Hive

2016-03-02 Thread prabhu Mahendran
Hi,

I have checked the NiFi 0.5.1 binary distribution and tried to use the Hive
support in the StoreInKiteDataset processor, but it fails as shown below.

java.lang.IllegalArgumentException: Missing Hive metastore connection URI


My target dataset URI is: dataset:hive:default/customers2


Can anyone help me solve the above issue? Is there a Hive dependency required
for storing it?

Best,
Prabhu Mahendran


Re: javascript executescript processor

2016-03-02 Thread Matt Burgess
Good idea! In the scripting processor NAR there are language-specific handlers 
that (among other things) import helpful packages and classes. If such a util 
class were included, the scripted processors could leverage it to make the 
scripts even easier to implement.
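
Just to sketch the idea (the ScriptUtils class and its methods below are 
hypothetical, not an existing API), a Javascript ExecuteScript body could then 
shrink from a hand-rolled StreamCallback to something like:

var flowFile = session.get();
if (flowFile != null) {
    // hypothetical helpers a bundled util class might expose
    var text = ScriptUtils.readContentAsString(session, flowFile);
    flowFile = ScriptUtils.writeStringToContent(session, flowFile, text.toUpperCase());
    session.transfer(flowFile, REL_SUCCESS);
}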

Regards,
Matt

> On Mar 2, 2016, at 9:59 PM, Sumanth Chinthagunta  wrote:
> 
> Thanks for the great blog, Matt.
> I'm thinking we should provide a util class like this, to reduce verbose code 
> for scripting users. 
> https://github.com/xmlking/nifi-scripting/blob/master/nifi-sumo-common/src/main/java/com/crossbusiness/nifi/processors/NiFiUtils.java
> 
> Sent from my iPhone
> 
>> On Mar 2, 2016, at 1:40 PM, Matt Burgess  wrote:
>> 
>> Ask and ye shall receive ;) I realize most of my examples are in Groovy so 
>> it was a good idea to do some non-trivial stuff in another language, thanks 
>> for the suggestion!
>> 
>> I recreated the JSON-to-JSON template but with Javascript as the language: 
>> http://funnifi.blogspot.com/2016/03/executescript-json-to-json-revisited.html
>> 
>> Regards,
>> Matt
>> 
>>> On Wed, Mar 2, 2016 at 10:52 AM, Mike Harding  
>>> wrote:
>>> Hi Matt,
>>> 
>>> Do you know if there is documentation that describes the ExecuteScript 
>>> JavaScript API at the moment? Just as a practical example, how would I 
>>> translate the Groovy code sample you walk through in this post:
>>> http://funnifi.blogspot.co.uk/2016/02/executescript-json-to-json-conversion.html
>>> 
>>> Thanks,
>>> M
>>> 
>>> 
>>> 
>>> 
>>> 
 On 1 March 2016 at 18:32, Mike Harding  wrote:
 Hi Matt,
 
 That's exactly what I'm looking for - much appreciated !
 
 Thanks,
 Mike
 
> On Tue, 1 Mar 2016 at 18:13, Matt Burgess  wrote:
> Mike,
> 
> I have a blog containing a few posts on how to use ExecuteScript and 
> InvokeScriptedProcessor: http://funnifi.blogspot.com
> 
> One contains an example using Javascript to get data from Hazelcast and 
> update flowfile attributes: 
> http://funnifi.blogspot.com/2016/02/executescript-using-modules.html
> 
> If you'd like to share what you'd like to do with ExecuteScript, I'd be 
> happy to help you get going!
> 
> Regards,
> Matt
> 
>> On Tue, Mar 1, 2016 at 11:53 AM, Mike Harding  
>> wrote:
>> Hi,
>> 
>> I'd like to utilise the ExecuteScript processor, but I understand that 
>> it's experimental. Can anyone point me in the direction of an example or 
>> tutorial, preferably using Javascript, on how to get started with it?
>> 
>> Thanks,
>> Mike 
>> 


Re: Nifi JSON event storage in HDFS

2016-03-02 Thread Christopher Wilson
I used the ConvertJsonToAvro and PutHDFS processors to land files into a
Hive warehouse. Once you get the Avro schema right it's easy. Look at the
avro-tools jar to help with the schema.
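
For illustration, a minimal Avro record schema for JSON events might look like
this (the field names are invented, not from a real flow):

{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "timestamp", "type": "long"},
    {"name": "payload", "type": ["null", "string"], "default": null}
  ]
}

avro-tools can also print the schema of an existing Avro file, e.g.
java -jar avro-tools-<version>.jar getschema some-file.avro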

Chris

On Wed, Mar 2, 2016, 4:59 AM Conrad Crampton 
wrote:

> Hi,
> I have similar specifications about SQL access – those specifying this
> keep saying Hive, but I don’t believe that is the requirement (typical
> developer knowing best eh?) - I think it is just SQL access that is
> required. Drill is more flexible (in my opinion – I am not affiliated to
> Drill in any way) and has drivers for tooling access too (in a similar way to
> what Hive has). There is Spark support for Avro too.
> I’ll be interested to follow your progress on this.
> Conrad
>
> From: Mike Harding 
> Reply-To: "users@nifi.apache.org" 
> Date: Wednesday, 2 March 2016 at 10:54
> To: "users@nifi.apache.org" 
> Subject: Re: Nifi JSON event storage in HDFS
>
> Hi Conrad,
>
> Thanks for the heads up, I will investigate Apache Drill. I also forgot to
> mention that I have downstream requirements about which tools the data
> modellers are comfortable using - they want to use Hive and Spark as the
> data access engines primarily so the data needs to be persisted in HDFS in
> a way that it can be easily accessed by these services.
>
> But you're right - there are multiple ways of doing this, and I'm hoping NiFi
> would help scope/simplify the pipeline design.
>
> Cheers,
> M
>
> On 2 March 2016 at 10:38, Conrad Crampton 
> wrote:
>
>> Hi,
>> I am doing something similar, but having wrestled with Hive data
>> population (not from NiFi) and its performance I am currently looking at
>> Apache Drill as my SQL abstraction layer over my Hadoop cluster (similar
>> size to yours). To this end, I have chosen Avro as my ‘persistence’ format
>> and am using a number of processors to get from raw data through mapping
>> attributes to JSON to Avro (via schemas) and ultimately storing in HDFS.
>> Querying this with Drill is a breeze then as the schema is already
>> specified within the data which Drill understands. The schema can also be
>> extended without impacting existing data too.
>> HTH – I’m sure there are a ton of other ways to skin this particular cat
>> though,
>> Conrad
>>
>> From: Mike Harding 
>> Reply-To: "users@nifi.apache.org" 
>> Date: Wednesday, 2 March 2016 at 10:33
>> To: "users@nifi.apache.org" 
>> Subject: Nifi JSON event storage in HDFS
>>
>> Hi All,
>>
>> I currently have a small hadoop cluster running with HDFS and Hive. My
>> ultimate goal is to leverage NiFi's ingestion and flow capabilities to
>> store real-time external JSON formatted event data.
>>
>> What I am unclear about is what the best strategy/design is for storing
>> FlowFile data (i.e. JSON events in my case) within HDFS that can then be
>> accessed and analysed in Hive tables.
>>
>> Is much of the design in terms of storage handled in the NiFi flow or do
>> I need to set something up external of NiFi to ensure I can query each JSON
>> formatted event as a record in a Hive log table for example?
>>
>> Any examples or suggestions much appreciated,
>>
>> Thanks,
>> M
>>


Re: How to configure an ExecuteStreamCommand

2016-03-02 Thread jose antonio rodriguez diaz
Hello Joe,

I am going to explain the whole picture of what I have and what I would like to 
have. Right now I receive a file (in fact, I receive several files) on a shared 
network drive called “Z”; then I manually move the file to a local folder 
called “data” and finally I execute a batch file. This batch file consumes 
(reads) the file and, after reading it, moves it to another folder called 
“imported”. That’s why I am trying to invoke a batch program once the file has 
been dropped in the “data” folder.

I have changed Max attribute length and set it to 256 and also left the 
Output destination attribute empty. Even so, I still haven’t been able to 
execute the batch file (there is no new file called foo.txt on my desktop).

Do you have any idea what I am doing wrong? I am pretty sure it should be an 
easy fix. Please feel free to make any comment or suggestion regarding my case.

Thanks in advance.

> On 2 Mar 2016, at 17:36, Joe Percivall wrote:
> 
> Hello,
> 
> Welcome to NiFi!
> 
> I just tried running an ExecuteStreamCommand processor with the properties 
> you have (I created a script and modified the paths to point to a folder that 
> exists) and two things jump out. One, the Max attribute length must take an 
> integer. If you set it to be a path the processor will be invalid and you'll 
> see a yellow warning icon in the top left of the processor. This means the 
> processor will not run and you'll see the flowfiles queue up in the 
> relationship preceding it.
> 
> Second, the Output Destination Attribute is only for when you want to output 
> the results of the command to an attribute instead of the content of a new 
> flowfile (useful for running a command to find the character encoding of the 
> contents). Using an integer for the max attribute length I am able to 
> correctly run the script.
> 
> As a helpful hint, you can see the description of a property by hovering over 
> the light blue "?" icon in the configure processor tab. Also you can see the 
> documentation for the processor by right clicking on it and selecting "usage" 
> from the list.
> 
> Also what will you eventually be doing with your script? The way the 
> ExecuteStreamCommand is designed to work is by taking in a FlowFile and then 
> running an external command on it. So you may make your flow more efficient 
> and user friendly by putting the ExecuteStreamCommand between the Get and Put.
> 
> Hope that helps,
> Joe
> - - - - - - 
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joeperciv...@yahoo.com
> 
> 
> 
> 
> On Sunday, February 28, 2016 4:53 PM, jose antonio rodriguez diaz 
>  wrote:
> Hello All,
> 
> I am just getting started with Apache NiFi, doing a kind of PoC (proof of 
> concept). My dataflow is composed as follows:
> GetFile->PutFile->ExecuteStreamCommand
> 
> The idea is to move a file from one folder to another and then execute a 
> script. The first step (moving the file from one side to the other) works 
> perfectly, but I haven't been able to execute the script. The script is very 
> simple; I just want to create a file on my desktop.
> 
> The script, called script.sh, is located on my Desktop ($HOME/Desktop/script.sh):
> 
> #!/bin/bash
> 
> echo "This is a test" >> /Users/joseantoniorodriguez/Desktop/foo.txt
> 
> 
> 
> 
> Also the ExecuteStreamCommand is configured as follows:
> 
> Command Path: /Users/joseantoniorodriguez/Desktop/script.sh
> Ignore STDIN: true
> Working directory: /Users/joseantoniorodriguez/Desktop
> Argument delimiter: ;
> Output destination attribute: /Users/joseantoniorodriguez/Desktop —> Is this 
> necessary?
> Max attribute length: /Users/joseantoniorodriguez/Desktop
> 
> 
> The files I’m using to test are both CSVs, one about 324 KB and the other about 22 MB.
> 
> After executing, I could see the file had been moved from one folder to the 
> other, but I didn't see any foo.txt file on my desktop; I also didn't see any 
> error in the flow.
> 
> Could anybody give me a hand with this? I am pretty sure it is a trivial 
> error or misconfiguration. By the way, the OS is Mac OS X.
> 
> Thanks in advance.



Re: javascript executescript processor

2016-03-02 Thread Matt Burgess
Ask and ye shall receive ;) I realize most of my examples are in Groovy so
it was a good idea to do some non-trivial stuff in another language, thanks
for the suggestion!

I recreated the JSON-to-JSON template but with Javascript as the language:
http://funnifi.blogspot.com/2016/03/executescript-json-to-json-revisited.html
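
For a taste of the Javascript version, a minimal read-modify-write ExecuteScript
body looks along these lines (a sketch, not the exact code from the post; the
added "processed" field is just an illustrative transformation):

var StreamCallback = Java.type("org.apache.nifi.processor.io.StreamCallback");
var IOUtils = Java.type("org.apache.commons.io.IOUtils");
var StandardCharsets = Java.type("java.nio.charset.StandardCharsets");

var flowFile = session.get();
if (flowFile != null) {
    // replace the flowfile content with a transformed copy of the incoming JSON
    flowFile = session.write(flowFile, new StreamCallback(function (inputStream, outputStream) {
        var json = JSON.parse(IOUtils.toString(inputStream, StandardCharsets.UTF_8));
        json.processed = true; // illustrative change
        IOUtils.write(JSON.stringify(json), outputStream, StandardCharsets.UTF_8);
    }));
    session.transfer(flowFile, REL_SUCCESS);
}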

Regards,
Matt

On Wed, Mar 2, 2016 at 10:52 AM, Mike Harding 
wrote:

> Hi Matt,
>
> Do you know if there is documentation that describes the ExecuteScript
> JavaScript API at the moment? Just as a practical example, how would I
> translate the Groovy code sample you walk through in this post:
> http://funnifi.blogspot.co.uk/2016/02/executescript-json-to-json-conversion.html
>
> Thanks,
> M
>
>
>
>
>
> On 1 March 2016 at 18:32, Mike Harding  wrote:
>
>> Hi Matt,
>>
>> That's exactly what I'm looking for - much appreciated !
>>
>> Thanks,
>> Mike
>>
>> On Tue, 1 Mar 2016 at 18:13, Matt Burgess  wrote:
>>
>>> Mike,
>>>
>>> I have a blog containing a few posts on how to use ExecuteScript and
>>> InvokeScriptedProcessor: http://funnifi.blogspot.com
>>>
>>> One contains an example using Javascript to get data from Hazelcast and
>>> update flowfile attributes:
>>> http://funnifi.blogspot.com/2016/02/executescript-using-modules.html
>>>
>>> If you'd like to share what you'd like to do with ExecuteScript, I'd be
>>> happy to help you get going!
>>>
>>> Regards,
>>> Matt
>>>
>>> On Tue, Mar 1, 2016 at 11:53 AM, Mike Harding 
>>> wrote:
>>>
 Hi,

 I'd like to utilise the ExecuteScript processor, but I understand that
 it's experimental. Can anyone point me in the direction of an example or
 tutorial, preferably using Javascript, on how to get started with it?

 Thanks,
 Mike

>>>
>>>
>


Re: How to configure an ExecuteStreamCommand

2016-03-02 Thread Joe Percivall
Hello,

Welcome to NiFi!

I just tried running an ExecuteStreamCommand processor with the properties you 
have (I created a script and modified the paths to point to a folder that 
exists) and two things jump out. One, the Max attribute length must take an 
integer. If you set it to be a path the processor will be invalid and you'll 
see a yellow warning icon in the top left of the processor. This means the 
processor will not run and you'll see the flowfiles queue up in the 
relationship preceding it.

Second, the Output Destination Attribute is only for when you want to output 
the results of the command to an attribute instead of the content of a new 
flowfile (useful for running a command to find the character encoding of the 
contents). Using an integer for the max attribute length I am able to correctly 
run the script.
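
For reference, a configuration along the lines below validated and ran for me 
(the paths are from your example; 256 is just an arbitrary integer, and leaving 
Output Destination Attribute empty sends the command output to the content of 
the outgoing flowfile):

Command Path: /Users/joseantoniorodriguez/Desktop/script.sh
Ignore STDIN: true
Working Directory: /Users/joseantoniorodriguez/Desktop
Argument Delimiter: ;
Output Destination Attribute: (empty)
Max Attribute Length: 256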

As a helpful hint, you can see the description of a property by hovering over 
the light blue "?" icon in the configure processor tab. Also you can see the 
documentation for the processor by right clicking on it and selecting "usage" 
from the list.

Also what will you eventually be doing with your script? The way the 
ExecuteStreamCommand is designed to work is by taking in a FlowFile and then 
running an external command on it. So you may make your flow more efficient and 
user friendly by putting the ExecuteStreamCommand between the Get and Put.
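
That is, instead of GetFile->PutFile->ExecuteStreamCommand, a shape like:

GetFile->ExecuteStreamCommand->PutFile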
 
Hope that helps,
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Sunday, February 28, 2016 4:53 PM, jose antonio rodriguez diaz 
 wrote:
Hello All,

I am just getting started with Apache NiFi, doing a kind of PoC (proof of 
concept). My dataflow is composed as follows:
GetFile->PutFile->ExecuteStreamCommand

The idea is to move a file from one folder to another and then execute a 
script. The first step (moving the file from one side to the other) works 
perfectly, but I haven't been able to execute the script. The script is very 
simple; I just want to create a file on my desktop.

The script, called script.sh, is located on my Desktop ($HOME/Desktop/script.sh):

#!/bin/bash

echo "This is a test" >> /Users/joseantoniorodriguez/Desktop/foo.txt




Also the ExecuteStreamCommand is configured as follows:

Command Path: /Users/joseantoniorodriguez/Desktop/script.sh
Ignore STDIN: true
Working directory: /Users/joseantoniorodriguez/Desktop
Argument delimiter: ;
Output destination attribute: /Users/joseantoniorodriguez/Desktop —> Is this 
necessary?
Max attribute length: /Users/joseantoniorodriguez/Desktop


The files I’m using to test are both CSVs, one about 324 KB and the other about 22 MB.

After executing, I could see the file had been moved from one folder to the 
other, but I didn't see any foo.txt file on my desktop; I also didn't see any 
error in the flow.

Could anybody give me a hand with this? I am pretty sure it is a trivial 
error or misconfiguration. By the way, the OS is Mac OS X.

Thanks in advance.


Re: Processor with State

2016-03-02 Thread Joe Percivall
I created a jira ticket to track this idea for a processor that enables 
updating an attribute using state, which should enable the very basics of data 
science: https://issues.apache.org/jira/browse/NIFI-1582
 
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Wednesday, March 2, 2016 11:19 AM, Joe Percivall  
wrote:
Hello Claudio,

Your use-case actually could leverage a couple of recently added features to 
create a really cool open-source processor. The two key features that were 
added are State Management and the ability to reference processor specific 
variables in expression language. You can take a look at RouteText to see both 
in action. 

By utilizing both you can create a processor that is configured with multiple 
Expression language expressions. There would be dynamic properties which would 
accept expression language and then store the evaluated value via state 
management. Then there would be a routing property (that supports expression 
language) that could simply add an attribute to the flowfile with the evaluated 
value, which would allow it to be used by following processors for routing.

This would allow you to do your use-case where you store the value for the 
incoming stream and route differently once you go over a threshold. It could 
even allow more complex use-cases. One instance that, I believe, would be 
possible is to keep a running average and standard deviation and route data to 
different locations based on its standard deviation.


You can think of this like an UpdateAttribute with the ability to store and 
calculate variables using expression language.
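
Purely as a sketch of the idea (none of this exists yet; the property names and 
the getStateValue function below are hypothetical), such a processor might be 
configured with dynamic properties like:

count = ${getStateValue('count'):plus(1)}
over.threshold = ${getStateValue('count'):gt(100)}

where each property is evaluated against the stored state and the result is 
written back to state and/or onto the flowfile as an attribute for routing.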
Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Thursday, February 25, 2016 1:12 PM, Claudio Caldato 
 wrote:



I expect that in the future I’ll need something a little more sophisticated but 
for now my problem is very simple:
I want to be able to trigger an alert (only once) when an attribute in an 
incoming stream, for instance, goes over a predefined threshold. The Processor 
should then trigger (only once again) another alert when the signal goes back 
to normal (below threshold). Basically a RouteByAttribute, but with memory.

Thanks 
Claudio





On 2/24/16, 8:56 PM, "Joe Witt"  wrote:

>Claudio
>
>Hello there and welcome to the NiFi community.  There are some
>processors available now that allow you to store values in distributed
>(across the cluster) maps and to retrieve them.  And now within
>processors there is the ability to interact with state management
>features built into the framework.  So the basic pieces are there.  I
>would like to better understand the idea though because it may be even
>more straight forward.
>
>Where does the state or signal come from that would prompt you to
>store a value away?  And is this source/signal separate from the feed
>of data you'd like to tag with this value?
>
>For example, we have the UpdateAttribute processor which can be used
>to tag attributes onto flow files going by.  You can of course simply
>call the rest api to change the tag being applied as needed and that
>can be done by whatever the signal/source is potentially.
>
>Thanks
>Joe
>
>On Wed, Feb 24, 2016 at 11:49 PM, Claudio Caldato
> wrote:
>>
>> I need to be able to store a simple value (it can be true/false) in the
>> processor across messages. Basically, I need a processor with a local state
>> (a set of properties) that I can use to set the value of properties on output
>> messages.
>>
>> Can it be done or do I need to build a custom processor?
>>
>> Thanks
>> Claudio
>>


Re: Processor with State

2016-03-02 Thread Joe Percivall
Hello Claudio,

Your use-case actually could leverage a couple of recently added features to 
create a really cool open-source processor. The two key features that were 
added are State Management and the ability to reference processor specific 
variables in expression language. You can take a look at RouteText to see both 
in action. 

By utilizing both you can create a processor that is configured with multiple 
Expression language expressions. There would be dynamic properties which would 
accept expression language and then store the evaluated value via state 
management. Then there would be a routing property (that supports expression 
language) that could simply add an attribute to the flowfile with the evaluated 
value, which would allow it to be used by following processors for routing.

This would allow you to do your use-case where you store the value for the 
incoming stream and route differently once you go over a threshold. It could 
even allow more complex use-cases. One instance that, I believe, would be 
possible is to keep a running average and standard deviation and route data to 
different locations based on its standard deviation.


You can think of this like an UpdateAttribute with the ability to store and 
calculate variables using expression language.
Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Thursday, February 25, 2016 1:12 PM, Claudio Caldato 
 wrote:



I expect that in the future I’ll need something a little more sophisticated but 
for now my problem is very simple:
I want to be able to trigger an alert (only once) when an attribute in an 
incoming stream, for instance, goes over a predefined threshold. The Processor 
should then trigger (only once again) another alert when the signal goes back 
to normal (below threshold). Basically a RouteByAttribute, but with memory.

Thanks 
Claudio





On 2/24/16, 8:56 PM, "Joe Witt"  wrote:

>Claudio
>
>Hello there and welcome to the NiFi community.  There are some
>processors available now that allow you to store values in distributed
>(across the cluster) maps and to retrieve them.  And now within
>processors there is the ability to interact with state management
>features built into the framework.  So the basic pieces are there.  I
>would like to better understand the idea though because it may be even
>more straight forward.
>
>Where does the state or signal come from that would prompt you to
>store a value away?  And is this source/signal separate from the feed
>of data you'd like to tag with this value?
>
>For example, we have the UpdateAttribute processor which can be used
>to tag attributes onto flow files going by.  You can of course simply
>call the rest api to change the tag being applied as needed and that
>can be done by whatever the signal/source is potentially.
>
>Thanks
>Joe
>
>On Wed, Feb 24, 2016 at 11:49 PM, Claudio Caldato
> wrote:
>>
>> I need to be able to store a simple value (it can be true/false) in the
>> processor across messages. Basically, I need a processor with a local state
>> (a set of properties) that I can use to set the value of properties on output
>> messages.
>>
>> Can it be done or do I need to build a custom processor?
>>
>> Thanks
>> Claudio
>>


Re: Aw: Re: Regular Expressions

2016-03-02 Thread Conrad Crampton
Hi,
Yes, it is valid in the tools I use (I didn’t try the online one as you have 
access to that). Clearly nothing is captured into a group with this regexp, 
though – but it matches. (For what it's worth, regexr.com tests against 
JavaScript's regex flavour, which doesn't support inline modifier groups like 
(?s:...), whereas NiFi's Java regex engine does.)
Conrad

From: Uwe Geercken
Reply-To: "users@nifi.apache.org"
Date: Wednesday, 2 March 2016 at 16:00
To: "users@nifi.apache.org"
Subject: Aw: Re: Regular Expressions

Thanks for your reply.

Would you do me a favor and check whether the expression further below is valid 
in your regex tool?

Thanks.

Uwe
--
This message was sent from my Android mobile phone with WEB.DE Mail.

On 02.03.2016, 14:41, Joe Skora wrote:
RegexPal is pretty easy to use, and supports PCRE.

On Tue, Mar 1, 2016 at 4:30 PM, Uwe Geercken wrote:
Hello,

I was wondering which tool people use to validate their regular expressions?

I was going through some of the templates I found on the web and found one 
with the following regular expression:

(?s:^.*$)

When using http://www.regexr.com/ which I find very good and complete, 
regexr.com tells me that the question mark at the beginning 
is invalid?

So which way do you write or validate your expressions?

Thanks for feedback.

Uwe






Aw: Re: Regular Expressions

2016-03-02 Thread Uwe Geercken
Thanks for your reply.

Would you do me a favor and check whether the expression further below is valid
in your regex tool?

Thanks.

Uwe
--
This message was sent from my Android mobile phone with WEB.DE Mail.

On 02.03.2016, 14:41, Joe Skora wrote:

> RegexPal is pretty easy to use, and supports PCRE.
>
> On Tue, Mar 1, 2016 at 4:30 PM, Uwe Geercken wrote:
>
>> Hello,
>>
>> I was wondering which tool people use to validate their regular expressions?
>>
>> I was going through some of the templates I found on the web and found one
>> with the following regular expression:
>>
>> (?s:^.*$)
>>
>> When using http://www.regexr.com/ which I find very good and complete,
>> regexr.com tells me that the question mark at the beginning is invalid?
>>
>> So which way do you write or validate your expressions?
>>
>> Thanks for feedback.
>>
>> Uwe


Re: javascript executescript processor

2016-03-02 Thread Mike Harding
Hi Matt,

Do you know if there is documentation that describes the ExecuteScript
JavaScript API at the moment? Just as a practical example, how would I
translate the Groovy code sample you walk through in this post:
http://funnifi.blogspot.co.uk/2016/02/executescript-json-to-json-conversion.html

Thanks,
M





On 1 March 2016 at 18:32, Mike Harding  wrote:

> Hi Matt,
>
> That's exactly what I'm looking for - much appreciated !
>
> Thanks,
> Mike
>
> On Tue, 1 Mar 2016 at 18:13, Matt Burgess  wrote:
>
>> Mike,
>>
>> I have a blog containing a few posts on how to use ExecuteScript and
>> InvokeScriptedProcessor: http://funnifi.blogspot.com
>>
>> One contains an example using Javascript to get data from Hazelcast and
>> update flowfile attributes:
>> http://funnifi.blogspot.com/2016/02/executescript-using-modules.html
>>
>> If you'd like to share what you'd like to do with ExecuteScript, I'd be
>> happy to help you get going!
>>
>> Regards,
>> Matt
>>
>> On Tue, Mar 1, 2016 at 11:53 AM, Mike Harding 
>> wrote:
>>
>>> Hi,
>>>
>>> I'd like to utilise the ExecuteScript processor, but I understand that
>>> it's experimental. Can anyone point me in the direction of an example or
>>> tutorial, preferably using Javascript, on how to get started with it?
>>>
>>> Thanks,
>>> Mike
>>>
>>
>>


Re: Regular Expressions

2016-03-02 Thread Joe Skora
RegexPal is pretty easy to use, and supports PCRE.

On Tue, Mar 1, 2016 at 4:30 PM, Uwe Geercken  wrote:

> Hello,
>
> I was wondering which tool people use to validate their regular
> expressions?
>
> I was going through some of the templates I found on the web and found
> one with the following regular expression:
>
> (?s:^.*$)
>
> When using http://www.regexr.com/ which I find very good and complete,
> regexr.com tells me that the question mark at the beginning is invalid?
>
> So which way do you write or validate your expressions?
>
> Thanks for feedback.
>
> Uwe
>


Re: Nifi JSON event storage in HDFS

2016-03-02 Thread Conrad Crampton
Hi,
I have similar specifications about SQL access – those specifying this keep 
saying Hive, but I don’t believe that is the requirement (typical developer 
knowing best eh?) - I think it is just SQL access that is required. Drill is 
more flexible (in my opinion – I am not affiliated to Drill in any way) and has 
drivers for tooling access too (in a similar way to
what Hive has). There is Spark support for Avro too.
I’ll be interested to follow your progress on this.
Conrad

From: Mike Harding
Reply-To: "users@nifi.apache.org"
Date: Wednesday, 2 March 2016 at 10:54
To: "users@nifi.apache.org"
Subject: Re: Nifi JSON event storage in HDFS

Hi Conrad,

Thanks for the heads up, I will investigate Apache Drill. I also forgot to 
mention that I have downstream requirements about which tools the data 
modellers are comfortable using - they want to use Hive and Spark as the data 
access engines primarily so the data needs to be persisted in HDFS in a way 
that it can be easily accessed by these services.

But you're right - there are multiple ways of doing this, and I'm hoping NiFi
would help scope/simplify the pipeline design.

Cheers,
M

On 2 March 2016 at 10:38, Conrad Crampton wrote:
Hi,
I am doing something similar, but having wrestled with Hive data population 
(not from NiFi) and its performance I am currently looking at Apache Drill as 
my SQL abstraction layer over my Hadoop cluster (similar size to yours). To 
this end, I have chosen Avro as my ‘persistence’ format and using a number of 
processors to get from raw data though mapping attributes to json to avro (via 
schemas) and ultimately storing in HDFS. Querying this with Drill is a breeze 
then as the schema is already specified within the data which Drill 
understands. The schema can also be extended without impacting existing data 
too.
HTH – I’m sure there are a ton of other ways to skin this particular cat though,
Conrad

From: Mike Harding
Reply-To: "users@nifi.apache.org"
Date: Wednesday, 2 March 2016 at 10:33
To: "users@nifi.apache.org"
Subject: Nifi JSON event storage in HDFS

Hi All,

I currently have a small hadoop cluster running with HDFS and Hive. My ultimate 
goal is to leverage NiFi's ingestion and flow capabilities to store real-time 
external JSON formatted event data.

What I am unclear about is what the best strategy/design is for storing 
FlowFile data (i.e. JSON events in my case) within HDFS that can then be 
accessed and analysed in Hive tables.

Is much of the design in terms of storage handled in the NiFi flow or do I need 
to set something up external of NiFi to ensure I can query each JSON formatted 
event as a record in a Hive log table for example?

Any examples or suggestions much appreciated,

Thanks,
M






Re: Nifi JSON event storage in HDFS

2016-03-02 Thread Conrad Crampton
Hi,
I am doing something similar, but having wrestled with Hive data population 
(not from NiFi) and its performance I am currently looking at Apache Drill as 
my SQL abstraction layer over my Hadoop cluster (similar size to yours). To 
this end, I have chosen Avro as my ‘persistence’ format and am using a number of 
processors to get from raw data through mapping attributes to JSON to Avro (via 
schemas) and ultimately storing in HDFS. Querying this with Drill is a breeze 
then as the schema is already specified within the data which Drill 
understands. The schema can also be extended without impacting existing data 
too.
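
As an illustration (the path and field names are invented), querying an Avro
file in HDFS from Drill is plain SQL against the file or directory:

SELECT id, `timestamp` FROM dfs.`/data/events/part-00000.avro` LIMIT 10;

No table definition is needed – Drill picks the schema up from the Avro file
itself.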
HTH – I’m sure there are a ton of other ways to skin this particular cat though,
Conrad

From: Mike Harding
Reply-To: "users@nifi.apache.org"
Date: Wednesday, 2 March 2016 at 10:33
To: "users@nifi.apache.org"
Subject: Nifi JSON event storage in HDFS

Hi All,

I currently have a small hadoop cluster running with HDFS and Hive. My ultimate 
goal is to leverage NiFi's ingestion and flow capabilities to store real-time 
external JSON formatted event data.

What I am unclear about is what the best strategy/design is for storing 
FlowFile data (i.e. JSON events in my case) within HDFS that can then be 
accessed and analysed in Hive tables.

Is much of the design in terms of storage handled in the NiFi flow or do I need 
to set something up external of NiFi to ensure I can query each JSON formatted 
event as a record in a Hive log table for example?

Any examples or suggestions much appreciated,

Thanks,
M





Nifi JSON event storage in HDFS

2016-03-02 Thread Mike Harding
Hi All,

I currently have a small hadoop cluster running with HDFS and Hive. My
ultimate goal is to leverage NiFi's ingestion and flow capabilities to
store real-time external JSON formatted event data.

What I am unclear about is what the best strategy/design is for storing
FlowFile data (i.e. JSON events in my case) within HDFS that can then be
accessed and analysed in Hive tables.

Is much of the design in terms of storage handled in the NiFi flow or do I
need to set something up external of NiFi to ensure I can query each JSON
formatted event as a record in a Hive log table for example?

Any examples or suggestions much appreciated,

Thanks,
M