Re: How do you recover a workflow ?

2016-11-10 Thread Alessio Palma
The point is that I have a workflow but sometimes things go wrong and I need to 
manually restart it; this action requires:

1) Change some parameters ( UpdateAttribute processor )

2) Fire a new flowfile, which will start the workflow again. Perhaps this is 
the most obscure point. We are using NiFi to execute some old cron jobs, and I'm 
using the GenerateFlowFile processor (CRON-driven scheduling strategy) to start 
the flow.
When the workflow does not complete, I use another GenerateFlowFile processor to 
fire a new flowfile, which lets me execute the flow again outside the schedule.

All these steps could be done faster if I could enter the values into some 
kind of form on the screen and fire a new flowfile by clicking a button, 
instead of starting/stopping an additional GenerateFlowFile processor.

Perhaps I'm doing it the wrong way. So how do you restart a workflow? Maybe 
this feature could help others with the same task.
I don't know... I'm just asking.





From: Jeff 
Sent: Friday, November 11, 2016 2:36:02 AM
To: users@nifi.apache.org
Subject: Re: How do you recover a workflow ?

Hello Alessio,

Could you provide some more details about your NiFi flow?

One of the triggers I used for manually starting processing in my flow was a 
GetFile processor (configured with "Keep Source File" set to false) watching for 
files in a directory; when I wanted to test the flow, I would just run the touch 
command to create a file that the GetFile processor would detect and emit a 
flowfile for.
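
For example, assuming the GetFile processor is watching /data/nifi/trigger (the 
path here is just an illustration), a single command is enough to kick the flow 
off once:

$ touch /data/nifi/trigger/run-now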

Depending on your use case, there might be a better source processor for 
flowfiles that you can use in your flow.

On Thu, Nov 10, 2016 at 6:55 AM Alessio Palma wrote:

Hello all,

what is the best practice to recover a workflow gone bad?

Currently I use a GenerateFlowFile processor attached to some entry point, 
which allows me to restart something: start it, then stop it, and a flowfile is 
created, but this is not the best option.

I really miss the option to inject a flowfile with a mouse click. Also, some way 
to display a basic interface where I can insert/modify values used in an 
UpdateAttribute processor would help a lot.

What do you think ?


AP



Re: UI is not opening after forming nifi 1.0.0 secure cluster in windows

2016-11-10 Thread Andy LoPresto
What is the error you receive in your browser when you try to navigate to the 
UI? Are you connecting to the correct port?

Can you run an OpenSSL s_client command to try to connect via the command line? 
You will need the CA cert, the client certificate, and the client private key 
to attempt the connection below.

$ openssl s_client -connect <host:port> -debug -state -cert <client-cert.pem> -key <client-key.pem> -CAfile <ca-cert.pem>


Are there any errors in $NIFI_HOME/logs/nifi-app.log or 
$NIFI_HOME/logs/nifi-bootstrap.log? Are there any entries in 
$NIFI_HOME/logs/nifi-user.log?
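
For example, from a Unix-like shell (findstr can do the same on Windows), 
something along these lines will surface recent problems in those logs, assuming 
$NIFI_HOME points at the install directory:

$ grep -iE 'error|exception' $NIFI_HOME/logs/nifi-app.log $NIFI_HOME/logs/nifi-bootstrap.log
$ tail -n 50 $NIFI_HOME/logs/nifi-user.log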

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Nov 10, 2016, at 8:41 PM, Manojkumar Ravichandran  
> wrote:
> 
> Hi,
> 
> I tried to form a secure cluster with NiFi 1.0.0 on Windows by following the 
> instructions at the link below:
> 
> http://bryanbende.com/development/2016/08/17/apache-nifi-1-0-0-authorization-and-multi-tenancy
>  
> 
> The log file suggests that the cluster has been formed and heartbeats are 
> transferring successfully; everything seems fine, and the log shows that the 
> URL has been launched on the specified port, but the UI does not open in the 
> browser on the cluster machines.
> 
> To overcome this, I turned off the firewall, but the UI still does not open 
> in the browser.
> 
> What could be the reason for this?
> 
> 
> 
> Regards,
> 
> Manojkumar R
> 





Re: How do you recover a workflow ?

2016-11-10 Thread Jeff
Hello Alessio,

Could you provide some more details about your NiFi flow?

One of the triggers I used for manually starting processing in my
flow was a GetFile processor (configured with "Keep Source File"
set to false) watching for files in a directory; when I wanted to test
the flow, I would just run the touch command to create a file that the
GetFile processor would detect and emit a flowfile for.

Depending on your use case, there might be a better source processor for
flowfiles that you can use in your flow.

On Thu, Nov 10, 2016 at 6:55 AM Alessio Palma <
alessio.pa...@docomodigital.com> wrote:

> Hello all,
>
> what is the best practice to recover a workflow gone bad?
>
> Currently I use a GenerateFlowFile processor attached to some entry point,
> which allows me to restart something: start it, then stop it, and a flowfile is
> created, but this is not the best option.
>
> I really miss the option to inject a flowfile with a mouse click. Also, some
> way to display a basic interface where I can insert/modify values used in an
> UpdateAttribute processor would help a lot.
>
> What do you think ?
>
>
> AP
>
>
>


Enable Compression on Remote Port?

2016-11-10 Thread Peter Wicks (pwicks)
When I have a Remote Process Group and I view its Remote Ports I can see that 
all my ports show "Compressed" as No.  How can I change this so that the ports 
use compression?


Re: NPE MergeContent processor

2016-11-10 Thread Oleg Zhurakousky
Conrad

Any chance you can provide a bit more info about your flow?
I was able to find a condition where something like this can happen, but it 
would have to be with some legacy NiFi distribution, so it’s a bit puzzling; 
I really want to see if we can close the loop on this.
In any event, I think it is safe to raise a JIRA for this one.

Cheers
Oleg

On Nov 10, 2016, at 10:06 AM, Conrad Crampton wrote:

Hi,
The processor continues to write (to HDFS – the next processor in flow) and 
doesn’t block any others coming into this processor (MergeContent), so not 
quite the same observed behaviour as NIFI-2015.
If there is anything else you would like me to do to help with this, I'm more 
than happy to help.
Regards
Conrad

From: Bryan Bende
Reply-To: "users@nifi.apache.org"
Date: Thursday, 10 November 2016 at 14:59
To: "users@nifi.apache.org"
Subject: Re: NPE MergeContent processor

Conrad,

Thanks for reporting this. I wonder if this is also related to:

https://issues.apache.org/jira/browse/NIFI-2015

Seems like there is some case where the UUID is ending up as null.

-Bryan


On Wed, Nov 9, 2016 at 11:57 AM, Conrad Crampton wrote:
Hi,
I saw this error after I upgraded to 1.0.0 but thought it was maybe due to the 
issues I had with that upgrade (entirely my fault it turns out!), but I have 
seen it a number of times since so I turned debugging on to get a better 
stacktrace. Relevant log section as below.
Nothing out of the ordinary, and I never saw this in v0.6.1 or below.
I would have raised a Jira issue, but after logging in to Jira it only let me 
create a service desk request (which didn’t sound right).
Regards
Conrad

2016-11-09 16:43:46,413 DEBUG [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=12c0bec7-68b7-3b60-a020-afcc7b4599e7] has chosen to yield its 
resources; will not be scheduled to run again for 1000 milliseconds
2016-11-09 16:43:46,414 DEBUG [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] Binned 42 FlowFiles
2016-11-09 16:43:46,418 INFO [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] Merged 
[StandardFlowFileRecord[uuid=5e846136-0a7a-46fb-be96-8200d5cdd33d,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=567158, 
length=2337],offset=0,name=17453303363322987,size=2337], 
StandardFlowFileRecord[uuid=a5f4bd55-82e3-40cb-9fa9-86b9e6816f67,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=573643, 
length=2279],offset=0,name=17453303351196175,size=2279], 
StandardFlowFileRecord[uuid=c1ca745b-660a-49cd-82e5-fa8b9a2f4165,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=583957, 
length=2223],offset=0,name=17453303531879367,size=2223], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=595617, 
length=2356],offset=0,name=,size=2356], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=705637, 
length=2317],offset=0,name=,size=2317], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=725376, 
length=2333],offset=0,name=,size=2333], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=728703, 
length=2377],offset=0,name=,size=2377]] into 
StandardFlowFileRecord[uuid=1ef3e5a0-f8db-49eb-935d-ed3c991fd631,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1478709819819-416, container=default, 
section=416], offset=982498, 
length=4576],offset=0,name=3649103647775837,size=4576]
2016-11-09 16:43:46,418 ERROR [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] failed to process session 
due to java.lang.NullPointerException: java.lang.NullPointerException
2016-11-09 16:43:46,422 ERROR [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent
java.lang.NullPointerException: null
at 

how to update nifi to the latest jolt version?

2016-11-10 Thread Sebastian Lagemann
Hello,

I’m trying to use the latest jolt specification, especially the operation 
"modify-overwrite-beta", introduced with jolt version 0.0.22. It seems that 
currently only version 0.0.21 is used by NiFi (see 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/pom.xml). I 
tried to update to the latest jolt version (from 0.0.21 to 0.0.24) in the 
corresponding pom.xml and rebuilt the whole project (after I removed one 
failing test in 
nifi-framework-bundle/nifi-framework/nifi-framework-cluster/src/test/java/org/apache/nifi/cluster/coordination/http/StandardHttpResponseMapperSpec.groovy,
 not the way to go, but I needed some fast results :-)).
NiFi is using the new .nar file, but unfortunately the jolt processor is still 
complaining that the jolt specification is invalid (see below).
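
A rebuild along these lines would have kept the failing spec in place and just 
skipped test execution, then swapped the rebuilt standard NAR into the running 
instance (a sketch only; the NAR target path and file name/version are 
assumptions to check after the build):

$ # from the root of the nifi source checkout: full build without running tests
$ mvn clean install -DskipTests
$ # copy the rebuilt standard NAR over the one shipped in lib/ (file name/version assumed)
$ cp nifi-nar-bundles/nifi-standard-bundle/nifi-standard-nar/target/nifi-standard-nar-*.nar $NIFI_HOME/lib/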

Does anyone have a hint/idea on how to use the latest jolt version? I had already 
thought of extracting the processor from the NiFi standard NAR bundle and 
creating a new, separately named processor for testing, but I guess there must be 
an easier solution?

The jolt spec I’m using:
[
  {
"operation": "modify-overwrite",
"spec": {
  "lastElementOfType": "=lastElement(@(1,type))"
}
  },
  {
"operation": "shift",
"spec": {
  "value": "@(1,lastElementOfType)",
  "_meta": {
"appId": "appId",
"userId": "userId"
  }
}
  }
]

The data I’m using:
  {
"type":["user","profile","personalInfo", "firstName"],
"value":"testname",
"_meta":{"appId":"test","userId":"56c1614b677c35cc3f28fbd0"}
  }

The expected flow file content afterwards:
{
  "appId" : "test",
  "firstName" : "testname",
  "playerId" : "56c1614b677c35cc3f28fbd0"
}

The jolt demo from http://jolt-demo.appspot.com/#inception delivers the expected results.

Thanks,

Sebastian



Re: Getting the number of logs

2016-11-10 Thread Peddy, Sai
Hi Joe,

Thanks for the response. Yes, that’s essentially what I needed. However, we are 
on an older version of NiFi and updating isn’t currently possible. We’ve decided 
it might be better to create a custom processor to use internally, which will 
essentially update a specific counter value in Counters, which is already 
available in NiFi. Then, using an API call from a different codebase, we will 
retrieve the counter value and reset it.
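
For reference, the polling side could look roughly like this (a sketch; it 
assumes an unsecured node on localhost:8080, and the reset call is an assumption 
to verify against the REST API docs for your NiFi version):

$ # list all counters and their current values
$ curl -s http://localhost:8080/nifi-api/counters
$ # reset a single counter by its id (endpoint assumed; <counter-id> is a placeholder)
$ curl -s -X PUT http://localhost:8080/nifi-api/counters/<counter-id>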

Thanks,
Sai Peddy
From: Joe Percivall 
Reply-To: "users@nifi.apache.org" , Joe Percivall 

Date: Wednesday, November 9, 2016 at 8:16 PM
To: "users@nifi.apache.org" 
Subject: Re: Getting the number of logs

Hello Sai,

I'm gonna paraphrase what I think your use-case is first; let me know if this 
is wrong. You want to keep track of the number of logs coming in, and every hour 
you want to document how many came in during that hour. Currently NiFi doesn't 
handle this type of "stateful" event processing very well, and with what NiFi 
currently offers you are very limited.

That said, I've done some work to help move NiFi into the "stateful" event 
processing space that may help you. I currently have an open PR [1] to add state 
to UpdateAttribute. This allows you to keep stateful values (like a count) and 
even acts as a stateful rule engine (using UpdateAttribute's 'Advanced Tab').

So in order to solve your use-case you can set up one stateful UpdateAttribute 
along your main flow that counts all your incoming FlowFiles. Then add a 
GenerateFlowFile processor running on an hourly cron job that is routed to the 
stateful UpdateAttribute to act as a trigger. When the Stateful UpdateAttribute 
is triggered it adds the count as an attribute of the triggering flowfile and 
resets the count. Then just do a RouteOnAttribute after the stateful 
UpdateAttribute to separate the triggering FlowFile from the incoming data and 
put it to ElasticSearch.

That may not have been the best explanation, and if not I can create a template 
and take screenshots tomorrow if you're interested. One thing to keep in mind, 
though: the stateful processing in this PR has a limitation in that it will only 
work with local state. So no tracking counts across a whole cluster, just per 
node.

[1] https://github.com/apache/nifi/pull/319

Joe

- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com


On Wednesday, November 9, 2016 11:41 AM, "Peddy, Sai" 
 wrote:

Hi All,

I previously posted this in the Dev listserv; moving it over to the Users listserv.

I’m currently working on a use case to track the number of individual logs that 
come in and put that information into Elasticsearch. I wanted to see if there is 
an easy way to do this and whether anyone has any good ideas.

Current approach I am considering: route the incoming log files to SplitText and 
RouteText processors to make sure no empty logs get through and to get the 
individual log count when files contain multiple logs. At the end of this, the 
total number of logs is visible in the UI queue, which displays the queueCount, 
but this information is not readily available to any processor. My current 
thought is to use the ExecuteScript processor to update a local file to keep 
track, and to insert the document into Elasticsearch hourly.

Any advice would be appreciated

Thanks,
Sai Peddy


The information contained in this e-mail is confidential and/or proprietary to 
Capital One and/or its affiliates and may only be used solely in performance of 
work or services for Capital One. The information transmitted herewith is 
intended only for use by the individual or entity to which it is addressed. If 
the reader of this message is not the intended recipient, you are hereby 
notified that any review, retransmission, dissemination, distribution, copying 
or other use of, or taking of any action in reliance upon this information is 
strictly prohibited. If you have received this communication in error, please 
contact the sender and delete the material from your computer.





RE: Nifi vs Sqoop

2016-11-10 Thread Provenzano Nicolas
Thanks Bryan.

From: Bryan Bende [mailto:bbe...@gmail.com]
Sent: Thursday, November 10, 2016 15:26
To: users@nifi.apache.org
Subject: Re: Nifi vs Sqoop

Hello,

I can't speak to a direct comparison between NiFi and Sqoop, but I can say that 
Sqoop is a specific tool that was built just for database extraction, so it can 
probably do some things NiFi can't, since NiFi is a general-purpose data flow 
tool.

That being said, NiFi does have the ability to extract from relational 
databases...

The GenerateTableFetch processor [1] would likely be what you want for more of 
a bulk-extraction, and QueryDatabaseTable [2] for incremental fetching

I believe the "Maximum Value Columns" property on QueryDatabaseTable is how you 
achieve finding new rows since last execution.

Thanks,

Bryan

[1] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.GenerateTableFetch/index.html
[2] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.QueryDatabaseTable/index.html


On Wed, Nov 9, 2016 at 4:37 AM, Provenzano Nicolas wrote:
Hi all,

I have the following requirements:

• I need to load a full SQL table on day 1,

• and then incrementally load new data (using a change data capture 
mechanism).

Initially, I was thinking using Sqoop to do it.

Looking at Nifi and especially the QueryDatabaseTable processor, I’m wondering 
if I could use Nifi instead.

Has anyone already compared the two for this, and what were the outcomes?

However, I can’t see how to configure QueryDatabaseTable to handle the new rows 
(for example, looking at a “lastmodificationdate” field and taking only the rows 
for which lastModificationDate > lastRequestDate).

Thanks in advance

BR

Nicolas



RE: Nifi vs Sqoop

2016-11-10 Thread Provenzano Nicolas
Hi Matt, 

It fully answers my question.

Thanks and regards,

Nicolas

-----Original Message-----
From: Matt Burgess [mailto:mattyb...@apache.org]
Sent: Thursday, November 10, 2016 15:32
To: users@nifi.apache.org
Subject: Re: Nifi vs Sqoop

Nicolas,

The Max Value Columns property of QueryDatabaseTable is the specification by 
which the processor fetches only the new lines. In your case you would put 
"lastmodificationdate" as the Max Value Column. The first time the processor is 
triggered, it will execute a "SELECT * from myTable" and get all the rows (as 
it does not yet know about "new" vs "old" rows). Then for the Max Value Column, 
it will keep track of the maximum value currently observed for that column.
The next time the processor is triggered, it will execute a "SELECT * FROM 
myTable WHERE lastModificationDate > the_max_value_seen_so_far".
Thus only rows whose value for the Max Value Column is greater than the current 
maximum will be returned. Then the maximum is again updated, and so on.

Does this answer your question (about QueryDatabaseTable)? If not, please let me 
know.

If your source table is large and/or you'd like to parallelize the fetching of 
rows from the table, consider the GenerateTableFetch processor [1] instead. 
Rather than _executing_ SQL like QueryDatabaseTable does, GenerateTableFetch 
_generates_ SQL, and will generate a number of flow files, each containing a 
SQL statement that grabs X rows from the table. If you supply a Max Value 
Column here, it too will perform incremental fetch after the initial one. These 
flow files can be distributed throughout your cluster (using a 
RemoteProcessGroup pointing to the same cluster, and an Input Port to receive 
the flow files), creating a parallel distributed fetch capability like Sqoop. 
From a scaling perspective, Sqoop uses MapReduce so it can scale with the size 
of your Hadoop cluster.
GenerateTableFetch can scale to the size of your NiFi cluster. You might choose 
NiFi or Sqoop based on the volume and velocity of your data.

Regards,
Matt

[1] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.GenerateTableFetch/index.html

On Wed, Nov 9, 2016 at 4:37 AM, Provenzano Nicolas  
wrote:
> Hi all,
>
>
>
> I have the following requirements :
>
>
>
> · I need to load at day 1 a full SQL table,
>
> · And then need to incrementally load new data (using capture data
> change mechanism).
>
>
>
> Initially, I was thinking using Sqoop to do it.
>
>
>
> Looking at Nifi and especially the QueryDatabaseTable processor, I’m 
> wondering if I could use Nifi instead.
>
>
>
> Has someone already compared both to do it and what were the outcomes ?
>
>
>
> I can’t see however how to configure the QueryDatabaseTable to handle 
> the new lines (for example, looking at a “lastmodificationdate” field 
> and taking only the lines for which lastModificationDate > lastRequestDate) ?
>
>
>
> Thanks in advance
>
>
>
> BR
>
>
>
> Nicolas


Re: NPE MergeContent processor

2016-11-10 Thread Conrad Crampton
Hi,
The processor continues to write (to HDFS – the next processor in flow) and 
doesn’t block any others coming into this processor (MergeContent), so not 
quite the same observed behaviour as NIFI-2015.
If there is anything else you would like me to do to help with this, I'm more 
than happy to help.
Regards
Conrad

From: Bryan Bende 
Reply-To: "users@nifi.apache.org" 
Date: Thursday, 10 November 2016 at 14:59
To: "users@nifi.apache.org" 
Subject: Re: NPE MergeContent processor

Conrad,

Thanks for reporting this. I wonder if this is also related to:

https://issues.apache.org/jira/browse/NIFI-2015

Seems like there is some case where the UUID is ending up as null.

-Bryan


On Wed, Nov 9, 2016 at 11:57 AM, Conrad Crampton wrote:
Hi,
I saw this error after I upgraded to 1.0.0 but thought it was maybe due to the 
issues I had with that upgrade (entirely my fault it turns out!), but I have 
seen it a number of times since so I turned debugging on to get a better 
stacktrace. Relevant log section as below.
Nothing out of the ordinary, and I never saw this in v0.6.1 or below.
I would have raised a Jira issue, but after logging in to Jira it only let me 
create a service desk request (which didn’t sound right).
Regards
Conrad

2016-11-09 16:43:46,413 DEBUG [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=12c0bec7-68b7-3b60-a020-afcc7b4599e7] has chosen to yield its 
resources; will not be scheduled to run again for 1000 milliseconds
2016-11-09 16:43:46,414 DEBUG [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] Binned 42 FlowFiles
2016-11-09 16:43:46,418 INFO [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] Merged 
[StandardFlowFileRecord[uuid=5e846136-0a7a-46fb-be96-8200d5cdd33d,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=567158, 
length=2337],offset=0,name=17453303363322987,size=2337], 
StandardFlowFileRecord[uuid=a5f4bd55-82e3-40cb-9fa9-86b9e6816f67,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=573643, 
length=2279],offset=0,name=17453303351196175,size=2279], 
StandardFlowFileRecord[uuid=c1ca745b-660a-49cd-82e5-fa8b9a2f4165,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=583957, 
length=2223],offset=0,name=17453303531879367,size=2223], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=595617, 
length=2356],offset=0,name=,size=2356], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=705637, 
length=2317],offset=0,name=,size=2317], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=725376, 
length=2333],offset=0,name=,size=2333], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=728703, 
length=2377],offset=0,name=,size=2377]] into 
StandardFlowFileRecord[uuid=1ef3e5a0-f8db-49eb-935d-ed3c991fd631,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1478709819819-416, container=default, 
section=416], offset=982498, 
length=4576],offset=0,name=3649103647775837,size=4576]
2016-11-09 16:43:46,418 ERROR [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] failed to process session 
due to java.lang.NullPointerException: java.lang.NullPointerException
2016-11-09 16:43:46,422 ERROR [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent
java.lang.NullPointerException: null
at 
org.apache.nifi.stream.io.DataOutputStream.writeUTF(DataOutputStream.java:300)
 ~[nifi-utils-1.0.0.jar:1.0.0]
at 
org.apache.nifi.stream.io.DataOutputStream.writeUTF(DataOutputStream.java:281)
 ~[nifi-utils-1.0.0.jar:1.0.0]
at 
org.apache.nifi.provenance.StandardRecordWriter.writeUUID(StandardRecordWriter.java:257)
 ~[na:na]
at 
org.apache.nifi.provenance.StandardRecordWriter.writeUUIDs(StandardRecordWriter.java:266)
 ~[na:na]
at 
org.apache.nifi.provenance.StandardRecordWriter.writeRecord(StandardRecordWriter.java:232)
 ~[na:na]
at 

Re: NPE MergeContent processor

2016-11-10 Thread Bryan Bende
Conrad,

Thanks for reporting this. I wonder if this is also related to:

https://issues.apache.org/jira/browse/NIFI-2015

Seems like there is some case where the UUID is ending up as null.

-Bryan


On Wed, Nov 9, 2016 at 11:57 AM, Conrad Crampton <
conrad.cramp...@secdata.com> wrote:

> Hi,
>
> I saw this error after I upgraded to 1.0.0 but thought it was maybe due to
> the issues I had with that upgrade (entirely my fault it turns out!), but I
> have seen it a number of times since so I turned debugging on to get a
> better stacktrace. Relevant log section as below.
>
> Nothing out of the ordinary, and I never saw this in v0.6.1 or below.
>
> I would have raised a Jira issue, but after logging in to Jira it only let
> me create a service desk request (which didn’t sound right).
>
> Regards
>
> Conrad
>
>
>
> 2016-11-09 16:43:46,413 DEBUG [Timer-Driven Process Thread-5]
> o.a.n.processors.standard.MergeContent 
> MergeContent[id=12c0bec7-68b7-3b60-a020-afcc7b4599e7]
> has chosen to yield its resources; will not be scheduled to run again for
> 1000 milliseconds
>
> 2016-11-09 16:43:46,414 DEBUG [Timer-Driven Process Thread-5]
> o.a.n.processors.standard.MergeContent 
> MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116]
> Binned 42 FlowFiles
>
> 2016-11-09 16:43:46,418 INFO [Timer-Driven Process Thread-5]
> o.a.n.processors.standard.MergeContent 
> MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116]
> Merged [StandardFlowFileRecord[uuid=5e846136-0a7a-46fb-be96-
> 8200d5cdd33d,claim=StandardContentClaim [resourceClaim=
> StandardResourceClaim[id=1475059643340-275849, container=default,
> section=393], offset=567158, 
> length=2337],offset=0,name=17453303363322987,size=2337],
> StandardFlowFileRecord[uuid=a5f4bd55-82e3-40cb-9fa9-86b9e6816f67,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1475059643340-275849,
> container=default, section=393], offset=573643, 
> length=2279],offset=0,name=17453303351196175,size=2279],
> StandardFlowFileRecord[uuid=c1ca745b-660a-49cd-82e5-fa8b9a2f4165,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1475059643340-275849,
> container=default, section=393], offset=583957, 
> length=2223],offset=0,name=17453303531879367,size=2223],
> StandardFlowFileRecord[uuid=,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1475059643340-275849,
> container=default, section=393], offset=595617, 
> length=2356],offset=0,name=,size=2356],
> StandardFlowFileRecord[uuid=,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1475059643340-275849,
> container=default, section=393], offset=705637, 
> length=2317],offset=0,name=,size=2317],
> StandardFlowFileRecord[uuid=,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1475059643340-275849,
> container=default, section=393], offset=725376, 
> length=2333],offset=0,name=,size=2333],
> StandardFlowFileRecord[uuid=,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1475059643340-275849,
> container=default, section=393], offset=728703, 
> length=2377],offset=0,name=,size=2377]]
> into StandardFlowFileRecord[uuid=1ef3e5a0-f8db-49eb-935d-
> ed3c991fd631,claim=StandardContentClaim [resourceClaim=
> StandardResourceClaim[id=1478709819819-416, container=default,
> section=416], offset=982498, length=4576],offset=0,name=
> 3649103647775837,size=4576]
>
> 2016-11-09 16:43:46,418 ERROR [Timer-Driven Process Thread-5]
> o.a.n.processors.standard.MergeContent 
> MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116]
> MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] failed to process
> session due to java.lang.NullPointerException:
> java.lang.NullPointerException
>
> 2016-11-09 16:43:46,422 ERROR [Timer-Driven Process Thread-5]
> o.a.n.processors.standard.MergeContent
>
> java.lang.NullPointerException: null
>
> at 
> org.apache.nifi.stream.io.DataOutputStream.writeUTF(DataOutputStream.java:300)
> ~[nifi-utils-1.0.0.jar:1.0.0]
>
> at 
> org.apache.nifi.stream.io.DataOutputStream.writeUTF(DataOutputStream.java:281)
> ~[nifi-utils-1.0.0.jar:1.0.0]
>
> at 
> org.apache.nifi.provenance.StandardRecordWriter.writeUUID(StandardRecordWriter.java:257)
> ~[na:na]
>
> at 
> org.apache.nifi.provenance.StandardRecordWriter.writeUUIDs(StandardRecordWriter.java:266)
> ~[na:na]
>
> at 
> org.apache.nifi.provenance.StandardRecordWriter.writeRecord(StandardRecordWriter.java:232)
> ~[na:na]
>
> at org.apache.nifi.provenance.PersistentProvenanceRepository
> .persistRecord(PersistentProvenanceRepository.java:766) ~[na:na]
>
> at org.apache.nifi.provenance.PersistentProvenanceRepository
> .registerEvents(PersistentProvenanceRepository.java:432) ~[na:na]
>
> at org.apache.nifi.controller.repository.StandardProcessSession.
> updateProvenanceRepo(StandardProcessSession.java:713)
> ~[nifi-framework-core-1.0.0.jar:1.0.0]
>
> at org.apache.nifi.controller.repository.
> 

Re: Host disconnected due to different workflow configuration

2016-11-10 Thread Alessio Palma
OK, understood.
It's still a bit fragile.
Is there a way to force a node to adopt a configuration using the mouse, 
without stopping and restarting the server?




From: Bryan Bende 
Sent: Thursday, November 10, 2016 3:19:16 PM
To: users@nifi.apache.org
Subject: Re: Host disconnected due to different workflow configuration

Hello,

In general NiFi does its best to prevent changes being made to the flow when 
one of the cluster nodes is down. For example, if you have a 3 node cluster and 
only 2 nodes are up, you can't change the flow.

When a request comes in to change the flow, let's say you drag a new processor 
onto the graph, it is sent to one of the nodes, which then does a two-phase 
commit with the rest of the nodes in the cluster.

The error message you got means that all the nodes responded successfully to 
the first phase, and on the second phase of the commit, one of the nodes 
encountered an error.
At this point the change was applied to the other nodes, and the node with the 
error was purposely disconnected from the cluster because it is in an 
inconsistent state.

If possible, can you see what other errors happened in the log of that node 
before the "host out of cluster..." message? The real problem is that some other 
issue on that node caused it to fail.

-Bryan

On Wed, Nov 9, 2016 at 5:45 AM, Alessio Palma wrote:

Hello all,

I experienced a host dropping out of the cluster and no longer able to
rejoin; the log reports that the workflow configuration has been changed and
is different from the one running in the cluster.
Due to this issue there is no way to rejoin the cluster.
To resolve this I stopped the whole cluster and copied the same
configuration to every host. After the restart everything worked well.
Is there a good way to prevent flow changes when not all the hosts in the
cluster are connected?




Re: Nifi vs Sqoop

2016-11-10 Thread Matt Burgess
Nicolas,

The Max Value Columns property of QueryDatabaseTable is the
specification by which the processor fetches only the new lines. In
your case you would put "lastmodificationdate" as the Max Value
Column. The first time the processor is triggered, it will execute a
"SELECT * from myTable" and get all the rows (as it does not yet know
about "new" vs "old" rows). Then for the Max Value Column, it will
keep track of the maximum value currently observed for that column.
The next time the processor is triggered, it will execute a "SELECT *
FROM myTable WHERE lastModificationDate > the_max_value_seen_so_far".
Thus only rows whose value for the Max Value Column is greater than
the current maximum will be returned. Then the maximum is again
updated, and so on.

Does this answer your question (about QueryDatabaseTable)? If not,
please let me know.

If your source table is large and/or you'd like to parallelize the
fetching of rows from the table, consider the GenerateTableFetch
processor [1] instead. Rather than _executing_ SQL like
QueryDatabaseTable does, GenerateTableFetch _generates_ SQL, and will
generate a number of flow files, each containing a SQL statement that
grabs X rows from the table. If you supply a Max Value Column here, it
too will perform incremental fetch after the initial one. These flow
files can be distributed throughout your cluster (using a
RemoteProcessGroup pointing to the same cluster, and an Input Port to
receive the flow files), creating a parallel distributed fetch
capability like Sqoop. From a scaling perspective, Sqoop uses
MapReduce so it can scale with the size of your Hadoop cluster.
GenerateTableFetch can scale to the size of your NiFi cluster. You
might choose NiFi or Sqoop based on the volume and velocity of your
data.

Regards,
Matt

[1] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.GenerateTableFetch/index.html

On Wed, Nov 9, 2016 at 4:37 AM, Provenzano Nicolas
 wrote:
> Hi all,
>
>
>
> I have the following requirements :
>
>
>
> · I need to load at day 1 a full SQL table,
>
> · And then need to incrementally load new data (using capture data
> change mechanism).
>
>
>
> Initially, I was thinking using Sqoop to do it.
>
>
>
> Looking at Nifi and especially the QueryDatabaseTable processor, I’m
> wondering if I could use Nifi instead.
>
>
>
> Has someone already compared both to do it and what were the outcomes ?
>
>
>
> I can’t see however how to configure the QueryDatabaseTable to handle the
> new lines (for example, looking at a “lastmodificationdate” field and taking
> only the lines for which lastModificationDate > lastRequestDate) ?
>
>
>
> Thanks in advance
>
>
>
> BR
>
>
>
> Nicolas


Re: Nifi vs Sqoop

2016-11-10 Thread Bryan Bende
Hello,

I can't speak to a direct comparison between NiFi and Sqoop, but I can say
that Sqoop is a specific tool that was built just for database extraction,
so it can probably do some things NiFi can't, since NiFi is a general-purpose
data flow tool.

That being said, NiFi does have the ability to extract from relational
databases...

The GenerateTableFetch processor [1] would likely be what you want for more
of a bulk-extraction, and QueryDatabaseTable [2] for incremental fetching

I believe the "Maximum Value Columns" property on QueryDatabaseTable is how
you achieve finding new rows since last execution.

Thanks,

Bryan

[1]
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.GenerateTableFetch/index.html
[2]
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.QueryDatabaseTable/index.html


On Wed, Nov 9, 2016 at 4:37 AM, Provenzano Nicolas <
nicolas.provenz...@gfi.fr> wrote:

> Hi all,
>
>
>
> I have the following requirements :
>
>
>
> · I need to load at day 1 a full SQL table,
>
> · And then need to incrementally load new data (using capture
> data change mechanism).
>
>
>
> Initially, I was thinking using Sqoop to do it.
>
>
>
> Looking at Nifi and especially the QueryDatabaseTable processor, I’m
> wondering if I could use Nifi instead.
>
>
>
> Has someone already compared both to do it and what were the outcomes ?
>
>
>
> I can’t see however how to configure the QueryDatabaseTable to handle the
> new lines (for example, looking at a “lastmodificationdate” field and
> taking only the lines for which lastModificationDate > lastRequestDate) ?
>
>
>
> Thanks in advance
>
>
>
> BR
>
>
>
> Nicolas
>


Re: Host disconnected due to different workflow configuration

2016-11-10 Thread Bryan Bende
Hello,

In general NiFi does its best to prevent changes being made to the flow
when one of the cluster nodes is down. For example, if you have a 3 node
cluster and only 2 nodes are up, you can't change the flow.

When a request comes in to change the flow, let's say you drag a new
processor onto the graph, it is sent to one of the nodes, which then does a
two-phase commit with the rest of the nodes in the cluster.

The error message you got means that all the nodes responded successfully
to the first phase, and on the second phase of the commit, one of the nodes
encountered an error.
At this point the change was applied to the other nodes, and the node with
the error was purposely disconnected from the cluster because it is in an
inconsistent state.

If possible, can you see what other errors happened in the log of that node
before the "host out of cluster..." message? The real problem is that some
other issue on that node caused it to fail.

-Bryan

On Wed, Nov 9, 2016 at 5:45 AM, Alessio Palma <
alessio.pa...@docomodigital.com> wrote:

> Hello all,
>
> I experienced a host dropping out of the cluster and no longer able to
> rejoin; the log reports that the workflow configuration has been changed and
> is different from the one running in the cluster.
> Due to this issue there is no way to rejoin the cluster.
> To resolve this I stopped the whole cluster and copied the same
> configuration to every host. After the restart everything worked well.
> Is there a good way to prevent flow changes when not all the hosts in the
> cluster are connected?
>
>
>


How do you recover a workflow ?

2016-11-10 Thread Alessio Palma
Hello all,

what is the best practice to recover a workflow gone bad?

Currently I use a GenerateFlowFile processor attached to some entry point, 
which allows me to restart something: start it, then stop it, and a flowfile is 
created, but this is not the best option.

I really miss the option to inject a flowfile with a mouse click. Also, some way 
to display a basic interface where I can insert/modify values used in an 
UpdateAttribute processor would help a lot.

What do you think ?


AP