Re: Connecting to OPCUA

2017-04-21 Thread Raveendran, Varsha
Thank you for pointing this out, Wade and Aldrin. I will set it up correctly.

I have a follow-up question. Currently the plan is to use the GetValue 
processor to continuously receive values on a Raspberry Pi, which will in turn 
push the values to a NiFi server installed in the cloud.

We first want to understand the rate at which the Pi can receive values 
using the OPC UA processors. That means GetValue must continuously hit the 
endpoint and fetch the values. Coming from a software background, I imagine 
this as a "while(true)" loop!
Currently I see that the processor only runs about 10 tasks/second and that 
the queue before GetValue fills up very quickly. I tried a couple of 
optimizations: I increased the concurrent tasks, decreased the run schedule so 
that it gets the values from the OPC server at a faster rate, and also lowered 
the yield duration.
However, none of these seems to increase the rate. We are hoping for at least 
1,000 tag values/second to be sent from the Pi into the cloud.
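For reference, a back-of-envelope bound may explain the ceiling. Assuming (and this is only a hypothetical model, not measured NiFi behavior) that each GetValue task issues one blocking read per tag, the achievable rate is capped by the server round-trip time, no matter how small the run schedule is:

```python
def max_poll_rate(concurrent_tasks: int, round_trip_s: float) -> float:
    """Rough upper bound on tag values/second when each task issues one
    blocking request per tag (illustrative model, not measured behavior)."""
    return concurrent_tasks / round_trip_s

# e.g. 4 concurrent tasks against a server with a 100 ms round trip:
print(max_poll_rate(4, 0.100))  # 40.0 values/second -- far below 1000
```

Under this model, reaching 1,000 values/second requires either many more concurrent requests or batching multiple tags per read (or an OPC UA subscription), rather than a faster run schedule alone.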

Any suggestions?
I'm new to NiFi and still learning.

Thanks & Regards,
Varsha

On 22-Apr-2017, at 02:19, Aldrin Piri wrote:

Hi Varsha,

I quickly went through the build process on my side: after building the 
dependent UA Java library, referencing that version in the listed project, and 
doing some cleanup, I had the GetNodeIds processor available in NiFi once I 
copied the resulting NARs into my lib directory.

Looking through that user's other repositories, it appears you were actually 
using this one: https://github.com/wadesalazar/Nifi-OPCUA-Processors, which has 
the processor you listed above, GetExpandedNodeIds.

On Thu, Apr 20, 2017 at 10:03 AM, Raveendran, Varsha wrote:
Hello,

I am following the steps described in 
https://community.hortonworks.com/articles/90355/collect-data-from-opc-ua-protocol.html
to connect to an OPC UA server from MiNiFi.

I built https://github.com/wadesalazar/NIFI-OPCUA from source using Maven. 
However, after adding the NAR file to the lib folder, I am unable to see 
GetNodeIds on the NiFi canvas.
Instead I see GetExpandedNodeIds. Do these processors perform the same 
function?

I'm able to connect to the specified OPC endpoint using GetExpandedNodeIds, 
though without the Controller Service created. I just wanted to confirm that 
I'm on the right track and did not make a mistake while building the NAR file.


Thanks & Regards,
Varsha

Registered Office: 130 Pandurang Budhkar Marg, Worli, Mumbai – 400018; 
Corporate Identity number: L28920MH1957PLC010839; Tel.: +91 (22) 3967 7000; 
Fax: +91 22 3967 7500;
Contact / Email: www.siemens.co.in/contact; Website: www.siemens.co.in. Sales 
Offices: Ahmedabad, Bengaluru, Bhopal, Bhubaneswar, Chandigarh, Chennai, 
Coimbatore, Gurgaon, Hyderabad, Jaipur, Jamshedpur, Kharghar, Kolkata, Lucknow, 
Kochi, Mumbai, Nagpur, Navi Mumbai, New Delhi,




Re: Security between MiNiFi and NiFi

2017-04-21 Thread Aldrin Piri
Hi Varsha,

The preferred mechanism for accomplishing this is via Site to Site and
Remote Processing Groups [1].  This makes use of the configured Security
Properties [2].  For other means of data egress, these would vary depending
on specific processor or extension used.

[1]
https://github.com/apache/nifi-minifi/blob/rel/minifi-0.1.0/minifi-docs/src/main/markdown/System_Admin_Guide.md#remote-process-groups
[2]
https://github.com/apache/nifi-minifi/blob/rel/minifi-0.1.0/minifi-docs/src/main/markdown/System_Admin_Guide.md#security-properties
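As an illustration (paths, passwords, and the remote URL below are placeholders, and the exact key names should be checked against the System Admin Guide linked above for your MiNiFi version), the relevant sections of MiNiFi's config.yml look roughly like:

```yaml
Security Properties:
  keystore: /opt/minifi/certs/keystore.jks         # placeholder path
  keystore type: JKS
  keystore password: changeit                      # placeholder
  key password: changeit                           # placeholder
  truststore: /opt/minifi/certs/truststore.jks     # placeholder path
  truststore type: JKS
  truststore password: changeit                    # placeholder
  ssl protocol: TLS

Remote Processing Groups:
  - name: NiFi on EC2
    url: https://your-nifi-host:9443/nifi          # placeholder URL
    timeout: 30 secs
    yield period: 10 sec
```

With Security Properties populated, Site-to-Site transmission to the Remote Process Group input port is performed over TLS.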

On Fri, Apr 21, 2017 at 12:24 PM, Raveendran, Varsha <
varsha.raveend...@siemens.com> wrote:

> Hello,
>
>
>
> Can you refer me to the documentation for setting up a secure
> communication between MiNiFi (on an IoT device) and NiFi on EC2 instance?
>
>
>
> Thanks,
>
> Varsha
>
>


Re: List processors

2017-04-21 Thread Bryan Bende
Ah I see, the providers say which scopes they support, and then the
framework only lets them be used for the appropriate scope, which
makes sense.

I suppose the WriteAheadLocalStateProvider could be modified to
support CLUSTER scope, although this seems like a bad idea to give
people an option that we know causes data loss. You could make your
own version of the WriteAheadLocalStateProvider with this change
though.

If you are running a cluster on 0.7, don't you already have ZooKeeper
in order to run the cluster? Just trying to see what the issue with
using ZooKeeper is.

On Fri, Apr 21, 2017 at 2:32 PM, Juan Sequeiros  wrote:
> Thanks Bryan just tried and NIFI does not start because of this:
>
> " Cannot use Cluster State Provider ( WriteAheadLocalStateProvider ) as it
> only supports scope(s) {LOCAL} but instance is configured to use scope
> CLUSTER"
>
>
> On Fri, Apr 21, 2017 at 2:12 PM Bryan Bende  wrote:
>>
>> Juan,
>>
>> I believe from the processor side of the things, when a processor
>> calls save/retrieve on the state manager, the processor has to specify
>> a context like CLUSTER or LOCAL. If you specify CLUSTER, and no
>> clustered state provider exists, then it will save it to the local
>> provider. This allows a processor to work seamlessly across a
>> standalone Nifi and a clustered Nifi.
>>
>> The issue is that NiFi is not allowed to start up in clustered mode
>> without a clustered state provider, generally that is the ZooKeeper
>> provider, although it is an extension point and someone can implement
>> their own.
>>
>> I would think you could do the following...
>>
>> The normal clustered provider looks like this in state-management.xml
>>
>> <cluster-provider>
>>     <id>zk-provider</id>
>>     <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
>>     <property name="Connect String"></property>
>>     <property name="Root Node">/nifi</property>
>>     <property name="Session Timeout">10 seconds</property>
>>     <property name="Access Control">Open</property>
>> </cluster-provider>
>>
>> If you take the config from the local provider and drop it in the
>> cluster provider...
>>
>> <cluster-provider>
>>     <id>local-cluster-provider</id>
>>     <class>org.apache.nifi.controller.state.providers.local.WriteAheadLocalStateProvider</class>
>>     <property name="Directory">./state/local</property>
>>     <property name="Always Sync">false</property>
>>     <property name="Partitions">16</property>
>>     <property name="Checkpoint Interval">2 mins</property>
>> </cluster-provider>
>>
>> Basically defining another instance of the local state provider as the
>> cluster provider.
>>
>> Not totally sure if this works, but theoretically it should.
>>
>> -Bryan
>>
>>
>> On Fri, Apr 21, 2017 at 1:54 PM, Juan Sequeiros 
>> wrote:
>> > To add more to the issue I see this on my log:
>> >
>> > "Failed to restore processor state; yielding java.io.IOException; Failed
>> > to
>> > obtain value from Zookeeper for component " with exception code
>> > CONNECTIONLOSS
>> >
>> > So this is confirming what I expected and at this point not sure if this
>> > is
>> > a bug or working as expected  Feels like the dataflow manager should
>> > configure how to handle state and be able to use ListS3 in cluster even
>> > though I do not have zookeeper?
>> >
>> >
>> >
>> > On Fri, Apr 21, 2017 at 10:57 AM Juan Sequeiros 
>> > wrote:
>> >>
>> >> Hello all,
>> >>
>> >> My preliminary testing shows that if I run ListS3 ( maybe all list
>> >> processors? )  processor on a cluster and that cluster is not running
>> >> or
>> >> configured to talk to zookeeper that he does not maintain state at all
>> >> even
>> >> though I would expect him to maintain state locally.
>> >>
>> >> EX: ListProcessor ( run on primary node ) > distribute to cluster and
>> >> use
>> >> ConsumeProcessor,
>> >>
>> >> * We accept the fact that if primary node changes it would lose state,
>> >> but
>> >> I want to maintain local state on cluster.
>> >>
>> >> I have tried changing the
>> >> nifi.state.management.provider.cluster=local-provider and he fails
>> >> since I
>> >> am clustered.
>> >>
>> >> I am certainly going to do more testing unless its definitely true that
>> >> if
>> >> clustered it only maintains zookeeper state.
>> >>
>> >> Or it might be processor dependent? My tests have been with ListS3 and
>> >> i
>> >> am on Apache NIFI 0.7


Re: List processors

2017-04-21 Thread Juan Sequeiros
Thanks Bryan, I just tried it and NiFi does not start because of this:

" Cannot use Cluster State Provider ( WriteAheadLocalStateProvider ) as it
only supports scope(s) {LOCAL} but instance is configured to use scope
CLUSTER"


On Fri, Apr 21, 2017 at 2:12 PM Bryan Bende  wrote:

> Juan,
>
> I believe from the processor side of the things, when a processor
> calls save/retrieve on the state manager, the processor has to specify
> a context like CLUSTER or LOCAL. If you specify CLUSTER, and no
> clustered state provider exists, then it will save it to the local
> provider. This allows a processor to work seamlessly across a
> standalone Nifi and a clustered Nifi.
>
> The issue is that NiFi is not allowed to start up in clustered mode
> without a clustered state provider, generally that is the ZooKeeper
> provider, although it is an extension point and someone can implement
> their own.
>
> I would think you could do the following...
>
> The normal clustered provider looks like this in state-management.xml
>
> <cluster-provider>
>     <id>zk-provider</id>
>     <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
>     <property name="Connect String"></property>
>     <property name="Root Node">/nifi</property>
>     <property name="Session Timeout">10 seconds</property>
>     <property name="Access Control">Open</property>
> </cluster-provider>
>
> If you take the config from the local provider and drop it in the
> cluster provider...
>
> <cluster-provider>
>     <id>local-cluster-provider</id>
>     <class>org.apache.nifi.controller.state.providers.local.WriteAheadLocalStateProvider</class>
>     <property name="Directory">./state/local</property>
>     <property name="Always Sync">false</property>
>     <property name="Partitions">16</property>
>     <property name="Checkpoint Interval">2 mins</property>
> </cluster-provider>
>
> Basically defining another instance of the local state provider as the
> cluster provider.
>
> Not totally sure if this works, but theoretically it should.
>
> -Bryan
>
>
> On Fri, Apr 21, 2017 at 1:54 PM, Juan Sequeiros 
> wrote:
> > To add more to the issue I see this on my log:
> >
> > "Failed to restore processor state; yielding java.io.IOException; Failed
> to
> > obtain value from Zookeeper for component " with exception code
> > CONNECTIONLOSS
> >
> > So this is confirming what I expected and at this point not sure if this
> is
> > a bug or working as expected  Feels like the dataflow manager should
> > configure how to handle state and be able to use ListS3 in cluster even
> > though I do not have zookeeper?
> >
> >
> >
> > On Fri, Apr 21, 2017 at 10:57 AM Juan Sequeiros 
> wrote:
> >>
> >> Hello all,
> >>
> >> My preliminary testing shows that if I run ListS3 ( maybe all list
> >> processors? )  processor on a cluster and that cluster is not running or
> >> configured to talk to zookeeper that he does not maintain state at all
> even
> >> though I would expect him to maintain state locally.
> >>
> >> EX: ListProcessor ( run on primary node ) > distribute to cluster and
> use
> >> ConsumeProcessor,
> >>
> >> * We accept the fact that if primary node changes it would lose state,
> but
> >> I want to maintain local state on cluster.
> >>
> >> I have tried changing the
> >> nifi.state.management.provider.cluster=local-provider and he fails
> since I
> >> am clustered.
> >>
> >> I am certainly going to do more testing unless its definitely true that
> if
> >> clustered it only maintains zookeeper state.
> >>
> >> Or it might be processor dependent? My tests have been with ListS3 and i
> >> am on Apache NIFI 0.7
>


Re: List processors

2017-04-21 Thread Bryan Bende
Juan,

I believe that, from the processor side of things, when a processor
calls save/retrieve on the state manager, it has to specify a scope
like CLUSTER or LOCAL. If you specify CLUSTER, and no clustered state
provider exists, then it will save to the local provider. This allows
a processor to work seamlessly across a standalone NiFi and a
clustered NiFi.

The issue is that NiFi is not allowed to start up in clustered mode
without a clustered state provider; generally that is the ZooKeeper
provider, although it is an extension point and someone can implement
their own.

I would think you could do the following...

The normal clustered provider looks like this in state-management.xml


<cluster-provider>
    <id>zk-provider</id>
    <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
    <property name="Connect String"></property>
    <property name="Root Node">/nifi</property>
    <property name="Session Timeout">10 seconds</property>
    <property name="Access Control">Open</property>
</cluster-provider>


If you take the config from the local provider and drop it in the
cluster provider...


<cluster-provider>
    <id>local-cluster-provider</id>
    <class>org.apache.nifi.controller.state.providers.local.WriteAheadLocalStateProvider</class>
    <property name="Directory">./state/local</property>
    <property name="Always Sync">false</property>
    <property name="Partitions">16</property>
    <property name="Checkpoint Interval">2 mins</property>
</cluster-provider>


Basically defining another instance of the local state provider as the
cluster provider.

Not totally sure if this works, but theoretically it should.

-Bryan


On Fri, Apr 21, 2017 at 1:54 PM, Juan Sequeiros  wrote:
> To add more to the issue I see this on my log:
>
> "Failed to restore processor state; yielding java.io.IOException; Failed to
> obtain value from Zookeeper for component " with exception code
> CONNECTIONLOSS
>
> So this is confirming what I expected and at this point not sure if this is
> a bug or working as expected  Feels like the dataflow manager should
> configure how to handle state and be able to use ListS3 in cluster even
> though I do not have zookeeper?
>
>
>
> On Fri, Apr 21, 2017 at 10:57 AM Juan Sequeiros  wrote:
>>
>> Hello all,
>>
>> My preliminary testing shows that if I run ListS3 ( maybe all list
>> processors? )  processor on a cluster and that cluster is not running or
>> configured to talk to zookeeper that he does not maintain state at all even
>> though I would expect him to maintain state locally.
>>
>> EX: ListProcessor ( run on primary node ) > distribute to cluster and use
>> ConsumeProcessor,
>>
>> * We accept the fact that if primary node changes it would lose state, but
>> I want to maintain local state on cluster.
>>
>> I have tried changing the
>> nifi.state.management.provider.cluster=local-provider and he fails since I
>> am clustered.
>>
>> I am certainly going to do more testing unless its definitely true that if
>> clustered it only maintains zookeeper state.
>>
>> Or it might be processor dependent? My tests have been with ListS3 and i
>> am on Apache NIFI 0.7


Re: List processors

2017-04-21 Thread Juan Sequeiros
To add more to the issue I see this on my log:

"Failed to restore processor state; yielding java.io.IOException; Failed to
obtain value from Zookeeper for component " with exception code
CONNECTIONLOSS

So this confirms what I expected, and at this point I am not sure whether this
is a bug or working as intended. It feels like the dataflow manager should be
able to configure how state is handled, and to use ListS3 in a cluster even
though I do not have ZooKeeper?



On Fri, Apr 21, 2017 at 10:57 AM Juan Sequeiros  wrote:

> Hello all,
>
> My preliminary testing shows that if I run ListS3 ( maybe all list
> processors? )  processor on a cluster and that cluster is not running or
> configured to talk to zookeeper that he does not maintain state at all even
> though I would expect him to maintain state locally.
>
> EX: ListProcessor ( run on primary node ) > distribute to cluster and use
> ConsumeProcessor,
>
> * We accept the fact that if primary node changes it would lose state, but
> I want to maintain local state on cluster.
>
> I have tried changing the
> nifi.state.management.provider.cluster=local-provider and he fails since I
> am clustered.
>
> I am certainly going to do more testing unless its definitely true that if
> clustered it only maintains zookeeper state.
>
> Or it might be processor dependent? My tests have been with ListS3 and i
> am on Apache NIFI 0.7
>


Security between MiNiFi and NiFi

2017-04-21 Thread Raveendran, Varsha
Hello,

Can you refer me to the documentation for setting up a secure communication 
between MiNiFi (on an IoT device) and NiFi on EC2 instance?

Thanks,
Varsha



Re: How to get ftp file according to Current date?

2017-04-21 Thread Pierre Villard
You could use the combination of ListFTP and FetchFTP (most of the time, a
better approach); between the two processors, use RouteOnAttribute to keep
only the flow files with the filename you are looking for.
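Concretely (an untested sketch using the Expression Language's now()/format and startsWith functions), the RouteOnAttribute property could be something like:

```
${filename:startsWith(${now():format('yyyyMMdd')})}
```

Flow files whose filename starts with today's yyyyMMdd stamp go to the matched relationship; the unmatched relationship can be auto-terminated.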

Pierre.

2017-04-21 13:29 GMT+02:00 prabhu Mahendran :

> I have tried the "GetFTP" processor, which downloads files from FTP
> according to two properties:
>
> 1. "FileFilterRegex" - the name of the file on the FTP server
> 2. "RemotePath" - the path of the FTP file.
>
> I want to download a file from the FTP server only if it has today's
> date prepended to the filename.
>
> *For example:*
>
> The file name is *20170421TempFile.txt*,
> which is on the FTP server.
>
> The current system date should be picked up automatically, *instead of
> giving the date value directly*.
>
> I have found that ${now()} gets the current system date, but I cannot
> use it in the *"FileFilterRegex"* property because that property does
> not support the Expression Language.
>
> Finally, I need to get the particular file with the current date.
>
> Can anyone give me some ideas or guide me to achieve this?
>


List processors

2017-04-21 Thread Juan Sequeiros
Hello all,

My preliminary testing shows that if I run ListS3 (and maybe all List
processors?) on a cluster, and that cluster is not running or configured to
talk to ZooKeeper, it does not maintain state at all, even though I would
expect it to maintain state locally.

EX: ListProcessor (run on primary node) > distribute to cluster and use
ConsumeProcessor.

* We accept the fact that if the primary node changes it would lose state, but
I want to maintain local state on the cluster.

I have tried changing nifi.state.management.provider.cluster=local-provider
and it fails since I am clustered.

I am certainly going to do more testing unless it is definitely true that,
when clustered, NiFi only maintains ZooKeeper state.

Or it might be processor dependent? My tests have been with ListS3 and I am
on Apache NiFi 0.7.


Re: Help with SFTP processor

2017-04-21 Thread James Keeney
Joe -

Still working with the SFTP chain. We have been testing and have encountered
a different problem: during some transfers, but not all, when transferring a
large number of files (>10), some of the files do not go through.

I'm getting no errors at all.

Here is the chain I'm using:

ListFile --> FetchFile --> PutSFTP

I used PutFile processors for the failure and reject relationships, and
terminated on success. No files are ending up in the directories I set aside
for failure and reject.

Any suggestions on how best to debug this?

Thanks.

On Thu, Apr 6, 2017 at 5:24 PM Joe Witt  wrote:

> I just mean while designing/interacting with the flow you can
> start/stop processors, you can click on the connections and say 'list
> queue' and then you can click on each object in the queue and see its
> attributes and content.  Really helps step through the flow at each
> step.
>
> On Thu, Apr 6, 2017 at 5:08 PM, James Keeney  wrote:
> > Thanks again. When you refer to live queue listing and data viewing what
> are
> > you referring to? The dashboard or something else.
> >
> > Jim K.
> >
> > On Thu, Apr 6, 2017 at 4:49 PM Joe Witt  wrote:
> >>
> >> No problem.  Remember you can use live queue listing and data viewing
> >> to see all the attributes we know about the object at each stage.
> >> That is exactly how I figured out how to wire this together and what I
> >> needed from each step.
> >>
> >> Thanks
> >> Joe
> >>
> >> On Thu, Apr 6, 2017 at 4:43 PM, James Keeney 
> wrote:
> >> > Thank you. That was the final detail I was not getting. The use of the
> >> > ${path} expression variable. I now see that I needed to look at the
> >> > writes
> >> > attributes section of the ListFile processor.
> >> >
> >> > Jim K.
> >> >
> >> > On Thu, Apr 6, 2017 at 2:30 PM Joe Witt  wrote:
> >> >>
> >> >> Jim,
> >> >>
> >> >> Yep I understand your question and how to support that is what I was
> >> >> trying to convey.
> >> >>
> >> >> ListFile should pull from "/home/source".  Let's say it finds the
> >> >> '/home/source/test/newfile.txt' file.
> >> >>
> >> >> The resulting flowfile will have an attribute called 'path' that says
> >> >> 'test'
> >> >>
> >> >> Then you use FetchFile to actually pull in the bytes.  'path' still
> >> >> says
> >> >> 'test'
> >> >>
> >> >> Then you use PutSFTP with the 'Remote Path' set to
> "/www/files/${path}"
> >> >>
> >> >> I have just verified that this works myself locally using
> >> >> List/Fetch/PutFile.  In your case you'd use PutSFTP.
> >> >>
> >> >> Thanks
> >> >> Joe
> >> >>
> >> >> On Thu, Apr 6, 2017 at 12:51 PM, James Keeney 
> >> >> wrote:
> >> >> > Thanks for getting back to me. I will follow up on the
> documentation
> >> >> > Pull
> >> >> > Request.
> >> >> >
> >> >> > As to the directory question, I wasn't specific enough. I've
> already
> >> >> > configured the setting you described.
> >> >> >
> >> >> > Here is what is going on:
> >> >> >
> >> >> > Say the source directory is /home/source and the destination is
> >> >> > /www/files
> >> >> >
> >> >> > This works:
> >> >> >
> >> >> > If a user drops the file text.txt into /home/source then I want
> that
> >> >> > to
> >> >> > be
> >> >> > /www/files/text.txt That is working as expected.
> >> >> >
> >> >> > This does not work
> >> >> >
> >> >> > If a user creates a subdirectory and drops a file, so
> >> >> > /home/source/test/newfile.txt then I want the destination to
> reflect
> >> >> > the
> >> >> > subdirectory as in /www/files/test/newfile.txt But what happens is
> >> >> > the
> >> >> > file
> >> >> > is being placed into the destination directory without the new
> >> >> > subdirectory.
> >> >> > So what is getting created is /www/files/newfile.txt and not
> >> >> > /www/files/test/newfile.txt
> >> >> >
> >> >> > Any suggestions?
> >> >> >
> >> >> > Jim K.
> >> >> >
> >> >> >
> >> >> > On Thu, Apr 6, 2017 at 12:19 PM Joe Witt 
> wrote:
> >> >> >>
> >> >> >> Jim,
> >> >> >>
> >> >> >> Glad you've made progress on the SFTP side.  Please file a JIRA
> with
> >> >> >> your suggestions for the docs and the ideal case then is you'd
> file
> >> >> >> a
> >> >> >> Pull Request
> >> >> >> (
> https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide)
> >> >> >> which actually provides the suggested documentation changes.
> >> >> >>
> >> >> >> For the ListFile/FetchFile -> PutSFTP[1] side the key property on
> >> >> >> PutSFTP to set is 'Remote Path'.  You'll want this value to have
> the
> >> >> >> base directory you need to write to which could be './' or could
> be
> >> >> >> 'some/place/to/write/to' and you'll also want it to reflect the
> >> >> >> directory structure from which you fetched the file locally.  This
> >> >> >> will be available to you from the 'path' attribute of the
> flowfile.
> >> >> >> This is set by the ListFile processor (see writes attributes) [2].
> >> >> >>
> 

Re: NiFi best practices to manage big flowfiles

2017-04-21 Thread Mark Payne
Simone,

There is a Feature Proposal that was put together on our wiki at [1] that 
proposes a way to have a FlowFile refer to content that lives elsewhere outside 
of the content repo itself. I think this is what you're getting at. It's a 
great idea, but I don't know that any progress has yet been made on it.

If this is something that you're interested in delving into developing we would 
certainly be more than happy to work with you guys on bringing this to fruition.

Thanks
-Mark


[1] https://cwiki.apache.org/confluence/display/NIFI/External+FlowFile+content

Sent from my iPhone

On Apr 21, 2017, at 7:50 AM, Andrew Grande wrote:


Let me ask you this. All those processing cli steps, do they change file 
format, content, etc? If yes, NiFi is not doing anything that you aren't doing 
already. E.g. unpacking a file requires space for the original and decompressed 
file to be available.

You can use ListFile and not move any files in NiFi. It will have a full file 
path as an attribute which you can pass around to your tool invocations.

HTH,
Andrew

On Fri, Apr 21, 2017, 7:17 AM Simone Giannecchini wrote:
Dear Andrew,
I am working with Damiano on this, so let me first thank you for your
indications.

The use case is as follows:

- a satellite acquisition is placed on a shared file system; it can be
significant in size, e.g. 10 GB
- it has to be pulled through a chain of operations out of a larger
DAG where the elements of the sequence is decided depending on the
data itself
- we will surely create a number of intermediate files as we are going
to use standard CLI tools for the processing
- the resulting file will be placed again in a shared file system to
be served by a cluster of mapping servers to generate maps on the fly

We are getting thousands of these files per day, hence we are trying to
minimize file move operations.

If you are still not sleeping, here is the point: can I avoid bringing the
original file into the content repository without having to customize too
many parts of NiFi, or are we stretching NiFi too far from its intended
usage patterns?

Thanks for your patience.


Regards,
Simone Giannecchini
==
GeoServer Professional Services from the experts!
Visit http://goo.gl/it488V for more information.
==
Ing. Simone Giannecchini
@simogeo
Founder/Director

GeoSolutions S.A.S.
Via di Montramito 3/A
55054  Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob:   +39  333 8128928

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

---

The information in this message and/or attachments, is intended solely
for the attention and use of the named addressee(s) and may be
confidential or proprietary in nature or covered by the provisions of
privacy act (Legislative Decree June, 30 2003, no.196 - Italy's New
Data Protection Code).Any use not in accord with its purpose, any
disclosure, reproduction, copying, distribution, or either
dissemination, either whole or partial, is strictly forbidden except
previous formal approval of the named addressee(s). If you are not the
intended recipient, please contact immediately the sender by
telephone, fax or e-mail and delete the information in this message
that has been received in error. The sender does not give any warranty
or accept liability as the content, accuracy or completeness of sent
messages and accepts no responsibility  for changes made after they
were sent or for other risks which arise as a result of e-mail
transmission, viruses, etc.


On Fri, Apr 21, 2017 at 1:01 PM, Andrew Grande wrote:
> Hi,
>
> First, there won't be multiple copies of a file within NiFi. If you pass
> around the content and don't change it (only attributes), it will merely
> point a reference to it, no more.
>
> You need to decide if you want to delete processed files, this is what
> GetFile does. Might want to look into ListFile/FetchFile instead, it
> maintains internal state of files already processed.
>
> Assuming you want to delete the file from the original location, you can use
> PutFile 

Re: NiFi best practices to manage big flowfiles

2017-04-21 Thread Andrew Grande
Let me ask you this. All those processing cli steps, do they change file
format, content, etc? If yes, NiFi is not doing anything that you aren't
doing already. E.g. unpacking a file requires space for the original and
decompressed file to be available.

You can use ListFile and not move any files in NiFi. It will have a full
file path as an attribute which you can pass around to your tool
invocations.
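For example (the script path below is hypothetical, and the exact attribute names written by ListFile — such as absolute.path and filename — should be verified against the ListFile documentation), an ExecuteStreamCommand fed by ListFile might be configured roughly as:

```
Command Path:      /usr/local/bin/process_raster.sh    (hypothetical script)
Command Arguments: ${absolute.path}/${filename}
```

The flow file content stays tiny (ListFile emits zero-byte flow files), so the 10 GB raster never enters the content repository; only its path is passed to the CLI tool.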

HTH,
Andrew

On Fri, Apr 21, 2017, 7:17 AM Simone Giannecchini <
simone.giannecch...@geo-solutions.it> wrote:

> Dear Andrew,
> I am working with Damiano on this, so let me first thank you for your
> indications.
>
> The use case is as follows:
>
> - a satellite acquisition is placed on a shared file system. It can be
> significative in size, e.g. 10GB
> - it has to be pulled through a chain of operations out of a larger
> DAG where the elements of the sequence is decided depending on the
> data itself
> - we will surely create a number of intermediate files as we are going
> to use standard CLI tools for the processing
> - the resulting file will be placed again in a shared file system to
> be served by a cluster of mapping servers to generate maps on the fly
>
> We are getting thousands of this files per day hence we are trying to
> minimize file move operations.
>
> If you are still not sleeping, here is the point. Can I avoid, without
> having to customize too many parts of NIFI, to bring the original file
> into the content repository or we are stretching NIFI to far from its
> intended usage patterns?
>
> Thanks for your  patience.
>
>
> Regards,
> Simone Giannecchini
>
>
> On Fri, Apr 21, 2017 at 1:01 PM, Andrew Grande  wrote:
> > Hi,
> >
> > First, there won't be multiple copies of a file within NiFi. If you pass
> > around the content and don't change it (only the attributes), NiFi will
> > merely pass a reference to it, nothing more.
> >
> > You need to decide whether you want to delete processed files; that is
> > what GetFile does. You might want to look into ListFile/FetchFile
> > instead; it maintains internal state of the files already processed.
> >
> > Assuming you want to delete the file from the original location, you can
> > use PutFile in your flow to write it to the new working directory and
> > connect the success relationship to ExecuteStreamCommand.
> >
> > Andrew
> >
> >
> > On Fri, Apr 21, 2017, 5:37 AM damiano.giampa...@geo-solutions.it
> >  wrote:
> >>
> >> Hi list,
> >>
> >> I'm a NiFi newbie and I'm trying to figure out the best way to use it
> >> as a batch ingestion system for satellite imagery as raster files.
> >> The files are pushed on the FS by an external system and then they must
> >> be processed and published through WMS protocols.
> >> I tried to draft the flow 

How to get ftp file according to Current date?

2017-04-21 Thread prabhu Mahendran
I have tried the "GetFTP" processor, which downloads a file from FTP
according to two properties:

1. "FileFilterRegex" - the name of the file on the FTP server
2. "RemotePath" - the path of the FTP file

I want to download a file from the FTP server only if today's date is
appended to the filename.

*For example:*

The file name is *20170421TempFile.txt*,
which is on the FTP server.

I need the current system date to be picked up automatically and matched
against the filename, *instead of giving the date value directly*.

I found that ${now()} gets the current system date, but I cannot use it in
the *"FileFilterRegex"* property because it does not have expression
language support.

In short, I need to fetch the particular file that carries the current date.

Can anyone give me some ideas or guide me to achieve this requirement?
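For reference, the regex-from-current-date idea being asked about can be sketched outside NiFi. This is an illustrative Python sketch (the `20170421TempFile.txt` naming pattern is taken from the example above; it is not a GetFTP configuration):

```python
# Illustration of a date-stamped filename filter: files like
# 20170421TempFile.txt should match only on that calendar day.
import re
from datetime import date

def todays_filter():
    # Same idea as NiFi expression language ${now():format('yyyyMMdd')}:
    # build the regex from the current system date, not a hard-coded one.
    today = date.today().strftime("%Y%m%d")
    return re.compile(r"^" + today + r".*\.txt$")

pattern = todays_filter()
fresh = date.today().strftime("%Y%m%d") + "TempFile.txt"
print(bool(pattern.match(fresh)))                   # True  (today's file)
print(bool(pattern.match("20000101TempFile.txt")))  # False (stale file)
```

Within NiFi itself, one commonly suggested workaround is to list files with ListFTP and then filter with RouteOnAttribute, whose properties do support expression language such as `${now():format('yyyyMMdd')}`.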


Re: NiFi best practices to manage big flowfiles

2017-04-21 Thread Simone Giannecchini
Dear Andrew,
I am working with Damiano on this, so let me first thank you for your
indications.

The use case is as follows:

- a satellite acquisition is placed on a shared file system. It can be
significative in size, e.g. 10GB
- it has to be pulled through a chain of operations out of a larger
DAG where the elements of the sequence is decided depending on the
data itself
- we will surely create a number of intermediate files as we are going
to use standard CLI tools for the processing
- the resulting file will be placed again in a shared file system to
be served by a cluster of mapping servers to generate maps on the fly

We are getting thousands of these files per day, hence we are trying to
minimize file move operations.

If you are still not sleeping, here is the point: can I avoid bringing
the original file into the content repository without having to customize
too many parts of NiFi, or are we stretching NiFi too far from its
intended usage patterns?

Thanks for your patience.


Regards,
Simone Giannecchini
==
GeoServer Professional Services from the experts!
Visit http://goo.gl/it488V for more information.
==
Ing. Simone Giannecchini
@simogeo
Founder/Director

GeoSolutions S.A.S.
Via di Montramito 3/A
55054  Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob:   +39  333 8128928

http://www.geo-solutions.it
http://twitter.com/geosolutions_it



On Fri, Apr 21, 2017 at 1:01 PM, Andrew Grande  wrote:
> Hi,
>
> First, there won't be multiple copies of a file within NiFi. If you pass
> around the content and don't change it (only the attributes), NiFi will
> merely pass a reference to it, nothing more.
>
> You need to decide whether you want to delete processed files; that is
> what GetFile does. You might want to look into ListFile/FetchFile instead;
> it maintains internal state of the files already processed.
>
> Assuming you want to delete the file from the original location, you can
> use PutFile in your flow to write it to the new working directory and
> connect the success relationship to ExecuteStreamCommand.
>
> Andrew
>
>
> On Fri, Apr 21, 2017, 5:37 AM damiano.giampa...@geo-solutions.it
>  wrote:
>>
>> Hi list,
>>
>> I'm a NiFi newbie and I'm trying to figure out the best way to use it as a
>> batch ingestion system for satellite imagery as raster files.
>> The files are pushed on the FS by an external system and then they must be
>> processed and published through WMS protocols.
>> I tried to draft the flow using the available NiFi processors;
>> summarizing, I used:
>>
>> - GetFile and PutFile processors to watch for incoming files to process
>> - UpdateAttribute to manage the location of the incoming files and the
>> intermediate processing products
>> - ExecuteStreamCommand to call the gdalwarp and gdaladdo command line
>> utilities to do the geospatial processing needed (http://www.gdal.org/)
>>
>> The issue with my use case is that the flowfiles represent big raster
>> files (1 to 6 GB), and I would like to minimize the number of copies of
>> that resource.
>>
>> I used the GetFile processor to watch a FileSystem folder for incoming
>> files.
>> Once a new file is found, it is imported in the 

Re: NiFi best practices to manage big flowfiles

2017-04-21 Thread Andrew Grande
Hi,

First, there won't be multiple copies of a file within NiFi. If you pass
around the content and don't change it (only the attributes), NiFi will
merely pass a reference to it, nothing more.

You need to decide whether you want to delete processed files; that is what
GetFile does. You might want to look into ListFile/FetchFile instead; it
maintains internal state of the files already processed.

Assuming you want to delete the file from the original location, you can
use PutFile in your flow to write it to the new working directory and
connect the success relationship to ExecuteStreamCommand.

Andrew

On Fri, Apr 21, 2017, 5:37 AM damiano.giampa...@geo-solutions.it <
damiano.giampa...@geo-solutions.it> wrote:

> Hi list,
>
> I'm a NiFi newbie and I'm trying to figure out the best way to use it as a
> batch ingestion system for satellite imagery as raster files.
> The files are pushed on the FS by an external system and then they must be
> processed and published through WMS protocols.
> I tried to draft the flow using the available NiFi processors;
> summarizing, I used:
>
> - GetFile and PutFile processors to watch for incoming files to process
> - UpdateAttribute to manage the location of the incoming files and the
> intermediate processing products
> - ExecuteStreamCommand to call the gdalwarp and gdaladdo command line
> utilities to do the geospatial processing needed (http://www.gdal.org/)
>
> The issue with my use case is that the flowfiles represent big raster
> files (1 to 6 GB), and I would like to minimize the number of copies of
> that resource.
>
> I used the GetFile processor to watch a filesystem folder for incoming
> files.
> Once a new file is found, it is imported into the NiFi internal content
> repository, so I can't reference it by its absolute FS path anymore since
> it has been transformed into a flowfile (did I misunderstand something
> here?). So I have to copy it again somewhere else on the FS to process it,
> because the geospatial processing utilities I have to use require the
> absolute path of the files to process.
>
> There may be a solution that better addresses the design of this flow;
> for example, I could watch not for the real file but for a txt file
> that describes its absolute FS path.
>
> Anyway, I was wondering if it is possible to configure the GetFile
> processor to use as the flowfile payload only the absolute path of the
> file found, but I guess that in that case I would have to write my own
> GetFile processor (the same could apply to other Get* processors).
>
> Does anyone have hints to suggest? Am I on the right path?
> I want to be sure that I am not overlooking some NiFi concept/feature
> that would allow me to better manage this use case.
>
> I hope to have been clear enough; any hints shared would be extremely
> appreciated!
>
>
> Best regards,
> Damiano
>
> --
>
> ==
> GeoServer Professional Services from the experts!
> Visit http://goo.gl/it488V for more information.
> ==
> Damiano Giampaoli
> Software Engineer
>
> GeoSolutions S.A.S.
> Via di Montramito 3/A
> 55054  Massarosa (LU)
> Italy
> phone: +39 0584 962313
> fax: +39 0584 1660272
> mob:   +39 333 8128928
>
> http://www.geo-solutions.it
> http://twitter.com/geosolutions_it
>

Re: Clustering Best Practices?

2017-04-21 Thread James McMahon
Indeed. A very important distinction. Thank you for the correction Andrew.
I'll be more careful with my terminology. -Jim

On Thu, Apr 20, 2017 at 6:40 PM, Andrew Grande  wrote:

> BTW, your NiFi instance is not single-threaded, it's a single node. It
> still runs multiple worker threads in the flow.
>
> Andrew
>
>
> On Thu, Apr 20, 2017, 7:01 AM James McMahon  wrote:
>
>> Good morning. I have established an initial single-threaded NiFi server
>> instance for my customers. It works well, but I anticipate increasing usage
>> as groups learn more about it. I also want to move beyond our single
>> threaded-ness.
>>
>> I would like to take the next step in the evolution of our NiFi
>> capability, implementing a clustered NiFi server configuration to help
>> me address the following requirements:
>> 1. increase our fault tolerance
>> 2. permit our configuration to scale to peak processing demands during
>> bulk data loads and as more customers begin to leverage our NiFi instance
>> 3. permit our configuration to load balance
>>
>> I do intend to begin by reading through the clustering sections in the
>> NiFi Sys Admin guide. I am also interested in hearing from our user
>> community, particularly regarding clustering "best practices" and practical
>> insights based on your experiences. Thanks in advance for any insights you
>> are willing to share.  -Jim
>>
>


NiFi best practices to manage big flowfiles

2017-04-21 Thread damiano.giampa...@geo-solutions.it
Hi list,

I'm a NiFi newbie and I'm trying to figure out the best way to use it as a
batch ingestion system for satellite imagery as raster files. The files are
pushed on the FS by an external system and then they must be processed and
published through WMS protocols. I tried to draft the flow using the
available NiFi processors; summarizing, I used:

- GetFile and PutFile processors to watch for incoming files to process
- UpdateAttribute to manage the location of the incoming files and the
intermediate processing products
- ExecuteStreamCommand to call the gdalwarp and gdaladdo command line
utilities to do the geospatial processing needed (http://www.gdal.org/)

The issue with my use case is that the flowfiles represent big raster files
(1 to 6 GB), and I would like to minimize the number of copies of that
resource.

I used the GetFile processor to watch a filesystem folder for incoming
files. Once a new file is found, it is imported into the NiFi internal
content repository, so I can't reference it by its absolute FS path anymore
since it has been transformed into a flowfile (did I misunderstand something
here?). So I have to copy it again somewhere else on the FS to process it,
because the geospatial processing utilities I have to use require the
absolute path of the files to process.

There may be a solution that better addresses the design of this flow; for
example, I could watch not for the real file but for a txt file that
describes its absolute FS path.

Anyway, I was wondering if it is possible to configure the GetFile processor
to use as the flowfile payload only the absolute path of the file found, but
I guess that in that case I would have to write my own GetFile processor
(the same could apply to other Get* processors).

Does anyone have hints to suggest? Am I on the right path? I want to be sure
that I am not overlooking some NiFi concept/feature that would allow me to
better manage this use case.

I hope to have been clear enough; any hints shared would be extremely
appreciated!

Best regards,
Damiano
-- 


==
GeoServer Professional Services from the experts!
Visit http://goo.gl/it488V for more information.
==

Damiano Giampaoli
Software Engineer


GeoSolutions S.A.S.
Via di Montramito 3/A
55054  Massarosa (LU)
Italy
phone: +39 0584 962313
fax:     +39 0584 1660272
mob:   +39 333 8128928

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

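
As a footnote to the "pointer file" idea discussed in this thread, here is a minimal Python sketch of the pattern (all paths are illustrative assumptions): the flow ingests only a tiny sidecar .txt whose content is the raster's absolute path, so the multi-GB raster itself never enters the content repository.

```python
import os
import tempfile

# Simulate the external system dropping a big raster plus a tiny pointer file.
workdir = tempfile.mkdtemp()
raster = os.path.join(workdir, "scene.tif")
open(raster, "w").close()                      # stand-in for the multi-GB raster

pointer = os.path.join(workdir, "scene.txt")   # what GetFile would actually ingest
with open(pointer, "w") as f:
    f.write(raster)

# What the flow does with the pointer: read the absolute path and hand it
# to a CLI tool (e.g. via ExecuteStreamCommand) without copying the raster.
with open(pointer) as f:
    raster_path = f.read().strip()

print(os.path.exists(raster_path))  # True: the original raster stays in place
```

The trade-off is that the flow must then manage the lifecycle of the referenced file itself (cleanup, failure handling), since NiFi's content repository no longer owns the data.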