exporting the canvas using the rest API

2018-06-19 Thread Knapp, Michael
Hi,

My team is developing a platform on Kubernetes that hosts NiFi instances for 
tenants.  We are not giving tenants direct access to the running NiFi pods 
(Docker containers).  They will only have indirect access, through NiFi’s REST 
API and through an API that I am developing.

After a tenant creates their canvas on our platform, I want to provide some 
means for them to export their canvas XML, basically to let them save it as 
another docker image.

Unfortunately, I have not been able to get my own code to use Kubernetes’s 
“Exec” service, and I have already lost a week trying.  Without it, I cannot 
easily export the canvas XML (flow.xml).  The exec service would have let my 
own API run arbitrary commands on the pod (Docker container), such as 
cat /path/to/nifi/conf/flow.xml.gz.
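
For reference, flow.xml.gz is gzip-compressed XML, so whichever route ends up 
delivering the bytes, the receiving side only has to decompress them.  Below is 
a minimal Java sketch of that step.  It assumes the bytes are already in memory 
and that the XML is UTF-8 encoded; the class and method names are made up for 
illustration and are not part of NiFi.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper; not part of NiFi. */
final class FlowXmlGzip {

    /** Decompresses gzip bytes (the format of flow.xml.gz) into text, assuming UTF-8 XML. */
    static String gunzipToString(byte[] gzipped) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Compresses text with gzip; used here only to fabricate sample input. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "<flowController/>" is a stand-in for the real file contents.
        byte[] sample = gzip("<flowController/>");
        System.out.println(gunzipToString(sample));
    }
}
```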

Is there any easy way for me to get the canvas (flow.xml) from NiFi using 
NiFi’s REST API?  I am using NiFi 1.3.

Michael Knapp


The information contained in this e-mail is confidential and/or proprietary to 
Capital One and/or its affiliates and may only be used solely in performance of 
work or services for Capital One. The information transmitted herewith is 
intended only for use by the individual or entity to which it is addressed. If 
the reader of this message is not the intended recipient, you are hereby 
notified that any review, retransmission, dissemination, distribution, copying 
or other use of, or taking of any action in reliance upon this information is 
strictly prohibited. If you have received this communication in error, please 
contact the sender and delete the material from your computer.


host names in a clustered NiFi

2017-08-08 Thread Knapp, Michael
Hi,

I’m trying to run NiFi in a clustered configuration within docker containers on 
kubernetes.  While I have NiFi starting in standalone mode, I get exceptions 
when launching it in clustered mode.  I’m having a lot of trouble figuring out 
what the host names really should be.

I looked through the system properties (web and cluster properties) and also 
the Admin Guide (Cluster setup section).  The description of these host 
properties is quite vague:

nifi.cluster.node.address – The fully qualified address of the node.  It is 
blank by default.
nifi.web.http.host – The HTTP host.  It is blank by default.
nifi.web.https.host – The HTTPS host.  It is blank by default.

These descriptions don’t explain how NiFi will behave based on these values, 
so I keep needing a lot of trial and error to make this work.  What happens if 
they are blank?  What happens if I insert the IP address?

Just asking for the host name is ambiguous: I have a Docker internal host name, 
an EC2 host name, a Kubernetes service name, load balancer host names, and a 
Route 53 host name.  The correct value really depends on how these are used in 
the code.  I could easily see these needing to be the Kubernetes service name, 
the internal host name, or the Route 53 host name.  What should I be using 
here?  Should all nodes in the cluster have the same value for this?

Here is the stack trace I am seeing at runtime:

2017-08-08 14:23:51,534 ERROR [NiFi logging handler] org.apache.nifi.StdErr 
Failed to start web server: Unresolved address
2017-08-08 14:23:51,535 ERROR [NiFi logging handler] org.apache.nifi.StdErr 
Shutting down...
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut 
2017-08-08 14:23:51,536 WARN [main] org.apache.nifi.web.server.JettyServer 
Failed to start web server... shutting down.
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut 
java.net.SocketException: Unresolved address
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut at 
sun.nio.ch.Net.translateToSocketException(Net.java:131)
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut at 
sun.nio.ch.Net.translateException(Net.java:157)
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut at 
sun.nio.ch.Net.translateException(Net.java:163)
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut at 
sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:76)
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut at 
org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:298)
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut at 
org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut at 
org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:236)
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut at 
org.eclipse.jetty.server.Server.doStart(Server.java:431)
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut at 
org.apache.nifi.web.server.JettyServer.start(JettyServer.java:705)
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut at 
org.apache.nifi.NiFi.<init>(NiFi.java:160)
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut at 
org.apache.nifi.NiFi.main(NiFi.java:267)
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut 
Caused by: java.nio.channels.UnresolvedAddressException: null
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut at 
sun.nio.ch.Net.checkAddress(Net.java:101)
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:218)
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut at 
sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
2017-08-08 14:23:51,536 INFO [NiFi logging handler] org.apache.nifi.StdOut ... 
9 common frames omitted
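
The trace above bottoms out in sun.nio.ch.Net.checkAddress, which is where the 
JDK rejects a socket address whose host name could not be resolved.  The 
following is a minimal Java sketch of that failure, independent of NiFi and 
Jetty; the class name BindCheck and the host nifi-0.example.invalid are made up 
for illustration.  It suggests, but does not prove, that the host value the web 
server was given does not resolve inside the container.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.UnresolvedAddressException;

/** Hypothetical diagnostic; not part of NiFi. */
final class BindCheck {

    /** Tries to bind a server socket to addr and reports what happened. */
    static String tryBind(InetSocketAddress addr) {
        try (ServerSocketChannel channel = ServerSocketChannel.open()) {
            channel.bind(addr);
            return "bound";
        } catch (UnresolvedAddressException e) {
            // Same root cause as the "Caused by" line in the trace.
            return "unresolved";
        } catch (IOException e) {
            return "io-error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Pass the candidate host name as the first argument; port 8080 is arbitrary.
        String host = args.length > 0 ? args[0] : "nifi-0.example.invalid";
        InetSocketAddress addr = new InetSocketAddress(host, 8080);
        System.out.println(host + " unresolved=" + addr.isUnresolved()
                + " result=" + tryBind(addr));
    }
}
```

Running this inside the container with each candidate name (pod host name, 
service name, and so on) is one way to see which of them the JVM can resolve.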


Michael Knapp



couple questions

2017-06-23 Thread Knapp, Michael
Hi,

My team is starting to do more and more with NiFi, and I had several questions 
for you.

First, we are thinking of having multiple separate NiFi flows but we want a 
single source for data provenance.  In the source code I only see these 
implementations: PersistentProvenanceRepository, VolatileProvenanceRepository, 
and MockProvenanceRepository.  I was hoping to find a web service that I could 
run separately from NiFi, and have all my NiFi clusters publish events to that. 
 Is there any public implementation like that?

Also, we are thinking seriously about using repositories that are not backed by 
the local file system.  I am helping an intern write an implementation of 
ContentRepository that is backed by S3; he has already had some success with 
this (we started by copying a lot from the VolatileContentRepository).  I’m 
also interested in implementations backed by Kafka and Pachyderm.  If that 
works, we will probably also need the other repositories to follow, 
specifically the FlowFileRepository.  Unfortunately, I cannot find much 
documentation on how to write these repositories.  I have been figuring things 
out by reviewing the source code and unit tests, but it is still very 
confusing to me.  So I was wondering:

1.   Has anybody been working on alternative ContentRepository 
implementations?  Specifically with S3, Pachyderm, Kafka, or some 
databases/datastores?

2.   Is there any thorough documentation regarding the contracts that these 
implementations must adhere to? (besides source code and unit tests)

I’m mainly interested in alternative repositories so I can make NiFi truly 
fault tolerant (one node dies, and the others immediately take over its work).  
It would also greatly simplify a lot of infrastructure/configuration management 
for us, could help us save some money, and might help us with compliance 
issues.  On the downside, it might hurt the file throughput.

Please let me know,

Michael Knapp





Re: [DISCUSS] Encrypted repositories (content, flowfile, provenance)

2017-01-24 Thread Knapp, Michael
I didn't realize ContentRepository was found using the Service Loader; I will 
try that out soon.


BTW I never got around to responding the last time you helped, but I remember 
that your advice helped me get NIFI working again.


Thanks again Bryan.


From: Bryan Bende <bbe...@gmail.com>
Sent: Tuesday, January 24, 2017 12:54:53 PM
To: dev@nifi.apache.org
Subject: Re: [DISCUSS] Encrypted repositories (content, flowfile, provenance)

Michael,

While processors, controller services, and reporting tasks are definitely
the most common extension points, you should still be able to deploy a NAR
with a custom repository implementation.

The provenance repository is a good example to look at since it is deployed
as its own NAR [1].

All extension points use the Java Service Loader, so the JAR that is
packaged in your NAR needs to have the appropriate
src/main/resources/META-INF/services file, as can be seen here for
provenance [2]. Without that file NiFi would not recognize the extension
point.
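
As a rough illustration of the mechanism described above: java.util.ServiceLoader
looks for a file under META-INF/services/ named after the fully qualified
interface, containing one implementation class name per line. The Java sketch
below uses a hypothetical DemoRepository interface rather than a real NiFi
class. Because no provider file exists for it, the lookup finds nothing, which
is the failure mode described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical extension point, standing in for an interface such as ProvenanceRepository. */
interface DemoRepository {
    String name();
}

/** Hypothetical helper; not part of NiFi. */
final class ProviderLookup {

    /** Returns the class names of every implementation that ServiceLoader can discover. */
    static <T> List<String> discover(Class<T> service) {
        List<String> found = new ArrayList<>();
        for (T implementation : ServiceLoader.load(service)) {
            found.add(implementation.getClass().getName());
        }
        return found;
    }

    public static void main(String[] args) {
        // A provider would be registered by a classpath resource named
        //   META-INF/services/<fully qualified name of DemoRepository>
        // whose content is the fully qualified name of the implementing class.
        // No such file exists in this sketch, so the list is empty.
        System.out.println(discover(DemoRepository.class));
    }
}
```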

Thanks,

Bryan

[1]
https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-provenance-repository-bundle
[2]
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/resources/META-INF/services/org.apache.nifi.provenance.ProvenanceRepository


On Tue, Jan 24, 2017 at 12:11 PM, Knapp, Michael <
michael.kn...@capitalone.com> wrote:

> Andy,
>
>
> I and several of my co-workers are very interested in seeing this feature
> added to NIFI.  I have been trying to write an
> EncryptedFileSystemRepository myself but unfortunately did not get very far
> before running into issues.
>
>
> So far I just tried a simple solution: I essentially extended
> FileSystemRepository, and overrode the "read" and "write" methods to wrap
> them with java's CipherInputStream or CipherOutputStream.  I also added
> ways to pass in a secret key.  If that had worked, I would have updated it
> to use AWS's KMS service to provide secret keys.  I do think we would need
> to roll keys periodically, but still haven't figured out how that would
> work.
>
>
> I ran into a lot of classpath issues.  First I tried making a thin jar and
> putting it on NIFI's lib, but unfortunately NIFI's classloader ensured that
> my jar did not see any of nifi's framework classes.  I tried making an uber
> jar but encountered a lot of other classpath issues.  I tried making a nar,
> but NIFI still could not find my EncryptedRepository when spring was wiring
> up the application.  It seems you can only extend repositories from within
> the source code.  So I tried putting my code into the source and
> re-building.  Unfortunately my work's proxy prevented me from getting some
> essential apache artifacts so I am unable to build NIFI from source.  I
> might try this again from a personal account.
>
>
> So it seems that with NIFI it is not easy to implement things that are not
> processors or controller services, since you have to add that to the source
> and re-build it.  Perhaps NIFI needs an easier way to add these
> implementations without re-building from source.
>
>
> Since I don't want to put the entire AWS core artifact and its transitive
> dependencies inside NIFI, I was thinking of using java's SPI framework to
> provide encryption services to the EncryptedRepository.
>
>
> I'm not sure I really answered any of your questions, but hopefully I gave
> you an idea of how we might use this.  I might be available to help develop
> and/or test the solution.
>
>
> Michael Knapp
>
> Capital One
>
> 
> From: Andy LoPresto <alopre...@apache.org>
> Sent: Tuesday, January 24, 2017 12:13:07 AM
> To: dev@nifi.apache.org
> Subject: [DISCUSS] Encrypted repositories (content, flowfile, provenance)
>
> I am working on building drop-in encrypted repositories for NiFi [1]. I am
> currently in the planning stages and have written up a fairly extensive
> Jira ticket documenting my plans (high-level) and some concerns (much
> bigger section). I would welcome community feedback in order to capture
> expectations, concerns (for security, performance, and usability), and any
> other valuable contributions. All of us are smarter than one of us, and I
> am not that smart. Thanks.
>
> [1] https://issues.apache.org/jira/browse/NIFI-3388
>
> Andy LoPresto
> alopre...@apache.org<mailto:alopre...@apache.org>
> alopresto.apa...@gmail.com<mailto:alopresto.apa...@gmail.com>
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

Re: [DISCUSS] Encrypted repositories (content, flowfile, provenance)

2017-01-24 Thread Knapp, Michael
Andy,


Several of my co-workers and I are very interested in seeing this feature 
added to NiFi.  I have been trying to write an EncryptedFileSystemRepository 
myself, but unfortunately I did not get very far before running into issues.


So far I have only tried a simple solution: I essentially extended 
FileSystemRepository and overrode the "read" and "write" methods to wrap their 
streams in Java's CipherInputStream or CipherOutputStream.  I also added ways 
to pass in a secret key.  If that had worked, I would have updated it to use 
AWS's KMS service to provide secret keys.  I do think we would need to rotate 
keys periodically, but I still haven't figured out how that would work.
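
For readers unfamiliar with the JDK cipher streams mentioned above, here is a 
minimal Java sketch of the wrapping step on its own.  It does not reproduce 
FileSystemRepository's method signatures; the class name CipherStreams, the 
choice of AES/CTR/NoPadding, and the way the key and IV are passed in are all 
assumptions made for illustration.  CTR mode gives confidentiality only, so a 
real repository would probably also want integrity protection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Hypothetical helper; not part of NiFi. */
final class CipherStreams {

    private static Cipher cipher(int mode, SecretKey key, byte[] iv) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(mode, key, new IvParameterSpec(iv));
            return c;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Wraps raw so that everything written to the result is encrypted first. */
    static OutputStream encrypting(OutputStream raw, SecretKey key, byte[] iv) {
        return new CipherOutputStream(raw, cipher(Cipher.ENCRYPT_MODE, key, iv));
    }

    /** Wraps raw so that everything read from the result is decrypted first. */
    static InputStream decrypting(InputStream raw, SecretKey key, byte[] iv) {
        return new CipherInputStream(raw, cipher(Cipher.DECRYPT_MODE, key, iv));
    }

    static byte[] encrypt(byte[] plain, SecretKey key, byte[] iv) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = encrypting(sink, key, iv)) {
            out.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    static byte[] decrypt(byte[] cipherText, SecretKey key, byte[] iv) {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = decrypting(new ByteArrayInputStream(cipherText), key, iv)) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                plain.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return plain.toByteArray();
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv); // CTR needs a fresh IV for every stream under one key
        byte[] stored = encrypt("flowfile content".getBytes(StandardCharsets.UTF_8), key, iv);
        System.out.println(new String(decrypt(stored, key, iv), StandardCharsets.UTF_8));
    }
}
```

The key rotation question raised above is not addressed here; one common 
approach is to store a key identifier next to each encrypted item, but that is 
a design choice outside this sketch.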


I ran into a lot of classpath issues.  First I tried making a thin JAR and 
putting it in NiFi's lib directory, but NiFi's classloader ensured that my JAR 
did not see any of NiFi's framework classes.  I tried making an uber JAR but 
encountered a lot of other classpath issues.  I tried making a NAR, but NiFi 
still could not find my EncryptedRepository when Spring was wiring up the 
application.  It seems you can only extend repositories from within the source 
code.  So I tried putting my code into the source and rebuilding.  
Unfortunately, my work's proxy prevented me from getting some essential Apache 
artifacts, so I am unable to build NiFi from source.  I might try this again 
from a personal account.


So it seems that with NiFi it is not easy to implement things that are not 
processors or controller services, since you have to add them to the source and 
rebuild.  Perhaps NiFi needs an easier way to add these implementations 
without rebuilding from source.


Since I don't want to put the entire AWS core artifact and its transitive 
dependencies inside NiFi, I was thinking of using Java's SPI framework to 
provide encryption services to the EncryptedRepository.


I'm not sure I really answered any of your questions, but hopefully I gave you 
an idea of how we might use this.  I might be available to help develop and/or 
test the solution.


Michael Knapp

Capital One


From: Andy LoPresto 
Sent: Tuesday, January 24, 2017 12:13:07 AM
To: dev@nifi.apache.org
Subject: [DISCUSS] Encrypted repositories (content, flowfile, provenance)

I am working on building drop-in encrypted repositories for NiFi [1]. I am 
currently in the planning stages and have written up a fairly extensive Jira 
ticket documenting my plans (high-level) and some concerns (much bigger 
section). I would welcome community feedback in order to capture expectations, 
concerns (for security, performance, and usability), and any other valuable 
contributions. All of us are smarter than one of us, and I am not that smart. 
Thanks.

[1] https://issues.apache.org/jira/browse/NIFI-3388

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69





problems with custom controller services, and other comments

2017-01-10 Thread Knapp, Michael
Devs,

For some reason NiFi is not working with some custom controller services I have 
written.  I wrote new implementations of AWSCredentialsProviderService that 
aim to work with session tokens.  I am hoping to run NiFi from my local machine 
and connect to AWS using session tokens.

Both implementations I tried fail when I try to create them from the web UI.  
I created a PutS3Object processor and configured the “AWS Credentials Provider 
Service” property.  From that property I tried to create a new service and 
selected my custom implementation.  When I click “create”, the value shown for 
the credentials provider service is the ID of the controller service, not its 
name.  My controller services require several properties to be set, but the web 
UI is not letting me set them.  Usually I see an arrow next to the property 
that allows me to configure a controller service, but I am not getting that 
now.  I looked in the nifi-app logs and do not see any exception; I have even 
set the logging to TRACE for everything and still don’t see any problem in the 
logs.  The PutS3Object processor is not validating because the controller 
service is found to be invalid.  I tried creating a unit test, and the service 
seems to work there, but I can’t use TestRunners because that is processor 
oriented, not meant for controller services.  I suspect that Spring’s 
aspect-oriented programming is somehow interfering with my service.

Does anybody know what I am doing wrong here?

Other unrelated comments:

1.   The first time you unpack NiFi it takes a very long time to start for me, 
half an hour or more.  The first time I tried installing it, I thought it was 
broken when really it was just taking forever.  I think new users will probably 
abandon NiFi if they can’t get it to start quickly out of the box, so you 
should make it easy for people to scale back their NiFi installation.  Really I 
would like to start it with just the minimum NAR files and add others as I need 
them.  Maybe a sub-directory in lib for the essential NARs could help people 
separate the essential stuff from the optional NARs.  Or maybe split the 
optional NARs into an “extra-lib”, and people can move those into lib as 
necessary for their goals.

2.   Building NiFi from source takes over an hour for me; really I just want 
to build the bare minimum needed for it to start.  I tried creating Maven 
profiles to build just the minimum pieces, but this proved to be non-trivial, 
as Maven does not seem to respect the “modules” tag in profiles, and the 
nifi-assembly project requires all of the optional NARs to be built as well.  
Creating this might be too complicated for me.  Has anybody thought about 
supporting a quick/minimal build?

3.   The “nifi-aws-processors” project is challenging to use because it both 
defines the interfaces for controller services (AWSCredentialsProviderService) 
and includes the service implementations.  I tried creating my own NAR with an 
implementation of AWSCredentialsProviderService, but since it depended on 
“nifi-aws-processors”, my NAR was also re-hosting all of the AWS processors.  
I was facing a lot of classpath issues because of this.  I worked around this 
by using Maven Shade to shade “nifi-aws-processors” into my own JAR while 
excluding the services it provided.  Then in my NAR project I had to exclude 
the dependency on “nifi-aws-processors”.  This was a lot of work on my part 
when all that was needed was to split that project into api, api-nar, impl, 
and impl-nar.

4.   I think it is very confusing that there is a “Controller Services” list 
for the entire NiFi canvas and separate ones for individual process groups.  
It seems that processors cannot use global controller services, and I am still 
uncertain why I would ever create a global one.  From the NiFi settings, I 
would also like to see controller services in process groups, and vice versa.  
From a processor, I would like to assign controller services that are global 
in scope, not limited to a process group.  I think this is something that will 
confuse and frustrate a lot of new developers, driving them to consider 
competing products.

5.   I think the developer guide needs some clarification on which JARs are 
provided and which are not.  New developers will be unsure whether to include 
“nifi-api” as a provided or compile dependency, and the same goes for 
nifi-framework-core.

6.   Perhaps the maven-nar-plugin could let people tell NIFI to only use 
services listed under a certain set of bundled-dependencies.  For example, my 
code depends on “nifi-aws-processors”, but I don’t want my nar to also host the 
services in that jar.  From the MOJO you are able to exclude entire artifacts, 
but you can’t exclude the services within certain artifacts.  This might be a 
problem to fix on the classloader side instead of from the 

SQL to CSV?

2016-12-28 Thread Knapp, Michael
Nifi Devs,

I noticed you have two processors (ExecuteSQL and QueryDatabaseTable) that 
perform SQL select statements and put the results into a flow file.  While I am 
not sure what the difference between them is, I did notice that they both 
produce Avro, with the schema inferred from the result set.  While the schema 
is included in the output file’s contents, I do not know of any easy way to get 
it from a *StreamCallback.  So I am wondering:


1.   Could we update the processors to support multiple output formats?  I 
think CSV should definitely be supported.  Parquet might also be useful for 
me.  JSON is an option, but since you already have a ConvertAvroToJSON 
processor that is not a big deal for me.

2.   Could we update the processors to include the schema as one of the 
output flow file attributes?

3.   Is there any utility to get an Avro schema from the input stream 
callback?

4.   Has anybody thought about writing a processor to convert Avro to CSV?  
Or even something more generic than that, a generic format conversion 
processor?  It could support CSV, JSON, Avro, Parquet, XML, and possibly others.
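
On the CSV side of questions 1 and 4, most of the work in emitting CSV is 
quoting fields correctly.  Below is a small Java sketch of RFC 4180 style 
quoting for one row of values that have already been converted to strings.  It 
does not touch Avro or JDBC; the class name CsvLine is hypothetical, and 
mapping null to an empty field is an assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of NiFi. */
final class CsvLine {

    /** Quotes a field if it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        if (field == null) {
            return ""; // assumption: SQL NULL becomes an empty field
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String doubled = field.replace("\"", "\"\""); // embedded quotes are doubled
        return needsQuotes ? "\"" + doubled + "\"" : doubled;
    }

    /** Joins the escaped fields of one row with commas (no trailing line break). */
    static String toLine(List<String> fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(escape(fields.get(i)));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLine(Arrays.asList("1", "a,b", "say \"hi\"", null)));
    }
}
```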

Please let me know,

Michael Knapp
Capital One




Re: encryption before writing to disk

2016-12-23 Thread Knapp, Michael
Great advice!  I just might wind up using that.

Is there any way to have some processors use one repository and other 
processors use the default repository?  If not, then I may need to have two 
separate nifi instances running for that.

On 12/23/16, 11:42 AM, "Pierre Villard" <pierre.villard...@gmail.com> wrote:

Kind of.

WriteAheadFlowFileRepository is for FlowFile repository, it only stores
attributes and flow file states [1].
For flow files content, you would need to extend FileSystemRepository.
And if you need to also encrypt provenance data, that's an additional
repository to extend.

Just a quick remark: I didn't say it in my previous mail, but a short term
solution for you could be to use the volatile repositories to keep
everything in memory instead of disks, but be aware of the limitations (in
case of a NiFi restart for example).

[1]
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#flowfile-repository

2016-12-23 17:26 GMT+01:00 Knapp, Michael <michael.kn...@capitalone.com>:

> Pierre,
>
> OK so let me see if I am interpreting your advice correctly:
> 1. Extend WriteAheadFlowFileRepository.
> 2. Override methods as necessary.
> 3. Make it encrypt everything that goes to disk.  I could have it use
> AWS’s KMS service to do that.  I might even have custom encryption logic
> here.
>
> Is that right?
>
> On 12/23/16, 10:23 AM, "Pierre Villard" <pierre.villard...@gmail.com>
> wrote:
>
> Hi Michael,
>
> The feature you describe is on the roadmap and you may find more details
> here [1] and/or participate into the discussion. Regarding the
> implementation you suggest, I think that implementing a custom repository
> instead of a controller service would be an easier approach (no need to
> change the processors). In any case, this would be a great feature and it
> is possible that Andy already started some developments on this subject.
>
> [1]
> https://cwiki.apache.org/confluence/display/NIFI/Security+Feature+Roadmap
>
> -Pierre
>
>
> 2016-12-23 15:43 GMT+01:00 Knapp, Michael <michael.kn...@capitalone.com>:
>
> > Nifi Developers,
> >
> > So I have a somewhat interesting task.  I want to run Nifi on AWS, but at
> > the same time there is a lot of red tape involved with putting data on
> > AWS.  Some data may not be placed on an AWS disk unless it is encrypted.
> > Running Nifi on top of an encrypted EBS is not considered good enough in
> > my case.  The ListenHTTP processor does not let people encrypt content
> > before it is written to disk.  You can encrypt the content downstream,
> > but at that point it has already been written to disk so that is too
> > late.  People can encrypt content before it is sent to Nifi, but in some
> > situations that may be very challenging for the developer, as they may
> > have limited access to the source of the data.
> >
> > I was thinking of modifying the ListenHTTP processor and other similar
> > processors.  I want to create a ControllerService interface that merely
> > returns a StreamCallback implementation.  The ListenHTTP processor would
> > take this as an optional property.  If that property is set, then the
> > processor will use that to modify/encrypt content before it is even
> > written to disk.  If the property is not set, then it will operate the
> > same way it does now.
> >
> > I looked for a good project to place this controller service interface
> > in, I feel like this service is so basic that it should really be part of
> > the framework’s core, but I don’t see any other controller services
> > there.  So my best guess for now is to put this in the
> > nifi-ssl-context-service-(api|bundle) projects.  I feel like this is not
> > really related to SSL, but that is the only project I found that has
> > controller services listed and is a dependency of both the nifi-kafka
> > projects and the nifi-standard-processors projects.  I think it would be
> > a waste to set up a new api/bundle pair just
> > for o

Re: encryption before writing to disk

2016-12-23 Thread Knapp, Michael
Pierre,

OK so let me see if I am interpreting your advice correctly:
1. Extend WriteAheadFlowFileRepository.  
2. Override methods as necessary.  
3. Make it encrypt everything that goes to disk.  I could have it use AWS’s KMS 
service to do that.  I might even have custom encryption logic here.

Is that right?

On 12/23/16, 10:23 AM, "Pierre Villard" <pierre.villard...@gmail.com> wrote:

Hi Michael,

The feature you describe is on the roadmap and you may find more details
here [1] and/or participate into the discussion. Regarding the
implementation you suggest, I think that implementing a custom repository
instead of a controller service would be an easier approach (no need to
change the processors). In any case, this would be a great feature and it
is possible that Andy already started some developments on this subject.

[1]
https://cwiki.apache.org/confluence/display/NIFI/Security+Feature+Roadmap

-Pierre


2016-12-23 15:43 GMT+01:00 Knapp, Michael <michael.kn...@capitalone.com>:

> Nifi Developers,
>
> So I have a somewhat interesting task.  I want to run Nifi on AWS, but at
> the same time there is a lot of red tape involved with putting data on
> AWS.  Some data may not be placed on an AWS disk unless it is encrypted.
> Running Nifi on top of an encrypted EBS is not considered good enough in
> my case.  The ListenHTTP processor does not let people encrypt content
> before it is written to disk.  You can encrypt the content downstream, but
> at that point it has already been written to disk so that is too late.
> People can encrypt content before it is sent to Nifi, but in some
> situations that may be very challenging for the developer, as they may
> have limited access to the source of the data.
>
> I was thinking of modifying the ListenHTTP processor and other similar
> processors.  I want to create a ControllerService interface that merely
> returns a StreamCallback implementation.  The ListenHTTP processor would
> take this as an optional property.  If that property is set, then the
> processor will use that to modify/encrypt content before it is even
> written to disk.  If the property is not set, then it will operate the
> same way it does now.
>
> I looked for a good project to place this controller service interface in,
> I feel like this service is so basic that it should really be part of the
> framework’s core, but I don’t see any other controller services there.  So
> my best guess for now is to put this in the
> nifi-ssl-context-service-(api|bundle) projects.  I feel like this is not
> really related to SSL, but that is the only project I found that has
> controller services listed and is a dependency of both the nifi-kafka
> projects and the nifi-standard-processors projects.  I think it would be a
> waste to set up a new api/bundle pair just for one interface.
>
> So my questions are:
>
> 1.   Do you think this is a good idea?
>
> 2.   Where should I put this code if I write it?
>
> Michael Knapp
> Capital One






encryption before writing to disk

2016-12-23 Thread Knapp, Michael
Nifi Developers,

So I have a somewhat interesting task.  I want to run Nifi on AWS, but at the 
same time there is a lot of red tape involved with putting data on AWS.  Some 
data may not be placed on an AWS disk unless it is encrypted.  Running Nifi on 
top of an encrypted EBS is not considered good enough in my case.  The 
ListenHTTP processor does not let people encrypt content before it is written 
to disk.  You can encrypt the content downstream, but at that point it has 
already been written to disk so that is too late.  People can encrypt content 
before it is sent to Nifi, but in some situations that may be very challenging 
for the developer, as they may have limited access to the source of the data.

I was thinking of modifying the ListenHTTP processor and other similar 
processors.  I want to create a ControllerService interface that merely returns 
a StreamCallback implementation.  The ListenHTTP processor would take this as 
an optional property.  If that property is set, then the processor will use 
that to modify/encrypt content before it is even written to disk.  If the 
property is not set, then it will operate the same way it does now.
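
To make the proposal above concrete, here is a minimal Java sketch of the 
optional-transform idea.  It is self-contained and does not use the NiFi API: 
StreamTransform is a stand-in with the same shape as NiFi's StreamCallback (an 
input stream and an output stream), ContentTransformService is a hypothetical 
name for the proposed controller service interface, and the XOR transform is a 
placeholder that is not encryption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Stand-in shaped like NiFi's StreamCallback, declared here so the sketch compiles alone. */
interface StreamTransform {
    void process(InputStream in, OutputStream out) throws IOException;
}

/** Hypothetical service interface: supplies the transform to apply before content is stored. */
interface ContentTransformService {
    StreamTransform getTransform();
}

/** Hypothetical stand-in for the receiving side of a processor such as ListenHTTP. */
final class IngestSketch {

    /** Placeholder transform (XOR with a constant). NOT encryption; it only makes the data flow visible. */
    static final StreamTransform XOR_DEMO = (in, out) -> {
        int b;
        while ((b = in.read()) != -1) {
            out.write(b ^ 0x5A);
        }
    };

    /** Writes body to store, through the service's transform when a service is configured. */
    static void ingest(InputStream body, OutputStream store, ContentTransformService service)
            throws IOException {
        if (service == null) {
            // Property not set: behave as before and copy the bytes unchanged.
            byte[] buffer = new byte[4096];
            int read;
            while ((read = body.read(buffer)) != -1) {
                store.write(buffer, 0, read);
            }
        } else {
            // Property set: content is transformed before it reaches the store.
            service.getTransform().process(body, store);
        }
    }

    /** Convenience wrapper for in-memory data. */
    static byte[] ingestBytes(byte[] body, ContentTransformService service) {
        ByteArrayOutputStream store = new ByteArrayOutputStream();
        try {
            ingest(new ByteArrayInputStream(body), store, service);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return store.toByteArray();
    }
}
```

In a real implementation the transform would wrap the stream in a cipher, and 
the service would be looked up from a processor property rather than passed as 
an argument.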

I looked for a good project to place this controller service interface in.  I 
feel like this service is so basic that it should really be part of the 
framework’s core, but I don’t see any other controller services there.  So my 
best guess for now is to put this in the nifi-ssl-context-service-(api|bundle) 
projects.  This is not really related to SSL, but that is the only project I 
found that has controller services listed and is a dependency of both the 
nifi-kafka projects and the nifi-standard-processors projects.  I think it 
would be a waste to set up a new api/bundle pair just for one interface.

So my questions are:

1.   Do you think this is a good idea?

2.   Where should I put this code if I write it?

Michael Knapp
Capital One

