Re: NiFi Registry -> NiFi

2021-07-16 Thread Andrew Grande
This is the Way, Team :)

Congrats on this effort!

Andrew

On Fri, Jul 16, 2021, 8:29 AM Matt Burgess  wrote:

> All,
>
> Now that [1] is merged, NiFi Registry is officially in the NiFi
> codebase! Thanks to all who participated in discussions, reviews, and
> testing. Here are some notes on what this means going forward:
>
> - All future releases of NiFi Registry will be in lockstep with NiFi,
> starting at the current release version of 1.14.0 (rather than 0.9.0).
> NiFi Registry artifacts are now built and published as part of the NiFi
> release process. This will allow NiFi Registry to immediately benefit
> from features in NiFi components (and vice versa).
>
> - Open Jira issues in the NIFIREG project have been moved to the NIFI
> Jira project, and Resolved issues with a Fix Version of 0.9.0 have
> been moved to the NIFI Jira project with a Fix Version of 1.14.0.
> Resolved issues for previous versions are retained in the NIFIREG Jira
> project for the purposes of archival/history, but the project is now
> effectively read-only. From now on, Jiras should be submitted against
> the NIFI project with a Component of "NiFi Registry".
>
> - Future PRs should be submitted against the nifi repo's main branch.
> If you have an open PR against the nifi-registry repo, you can port it
> to the nifi repo by adding ".patch" to the URL of your PR, downloading
> the patch, and applying it to a branch in your fork of the nifi repo,
> for example:
>
> git checkout -b NIFIREG-406
> git apply --directory='nifi-registry' ~/289.patch
>
> There is an open Jira case [2] to make the nifi-registry repo read-only
> so no more PRs can be submitted against it, and we can update NiFi's PR
> template to refer to NiFi Registry if need be.
>
> There's still work to be done, of course; contributions and PR reviews
> are most welcome!
>
> Regards,
> Matt
>
> [1] https://github.com/apache/nifi/pull/5065
> [2] https://issues.apache.org/jira/browse/INFRA-22112
>


Re: Apache NiFi Registry

2021-07-01 Thread Andrew Grande
Isn't the proper state for this use case enabled/disabled? NiFi will start
a PG and every schedulable component in it. If one needs to prevent this,
disable a processor.

Andrew

On Thu, Jul 1, 2021, 2:17 PM Phillip Lord 
wrote:

> Hello,
>
> My organization is considering utilizing the Registry.  From my testing it
> appears that versioning doesn't keep track of the state of components
> (stopped/started/etc).  Is this accurate?  Are there plans to have
> versioning keep track of this in future releases?
>
> I'm using NiFi 1.11.4 and Registry version 0.8.0
>
> Thanks,
> Phil Lord
>


Re: [discuss] we need to enable secure by default...

2021-02-09 Thread Andrew Grande
MySQL has been generating an admin password on default installs for, like,
forever. This workflow should be familiar to many users.

I'd suggest taking the automation tooling into account and how a production
rollout (user-provided password) would fit into the workflow.

Andrew

On Tue, Feb 9, 2021, 8:15 PM Tony Kurc  wrote:

> Joe,
> In addition to your suggestions, were you thinking of making this processor
> disabled by default as well?
>
> Tony
>
>
> On Tue, Feb 9, 2021, 11:04 PM Joe Witt  wrote:
>
> > Team
> >
> > While secure by default may not be practical, perhaps ‘not blatantly wide
> > open’ by default should be adopted.
> >
> > I think we should consider killing support for HTTP entirely and support
> > only HTTPS.  We should consider auto-generating a user and password (and
> > possibly a server cert) if nothing is configured, and log the generated
> > user and password.  Sure, it could still be configured to be non-secure,
> > but that would truly be an admin's fault.  Now it's just ‘on’.
> >
> > This tweet is a great example of why
> >
> > https://twitter.com/_escctrl_/status/1359280656174510081?s=21
> >
> >
> > Who agrees?  Who disagrees?   Please share ideas.
> >
> > Thanks
> >
>
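As a rough illustration of the "generate and log credentials when none are configured" idea being discussed, here is a minimal sketch. The conf file path and property names are made up for the example; they are not NiFi's actual configuration keys:

```shell
# Sketch: on startup, if no user is configured, generate random
# credentials, persist them, and log them once. "demo.security.*" and
# the conf file name are illustrative only.
set -e
CONF=${CONF:-./demo-login.properties}
if ! grep -q '^demo.security.user=' "$CONF" 2>/dev/null; then
    GEN_USER="admin-$(od -An -tx1 -N4 /dev/urandom | tr -d ' \n')"
    GEN_PASS=$(od -An -tx1 -N16 /dev/urandom | tr -d ' \n')
    printf 'demo.security.user=%s\ndemo.security.password=%s\n' \
        "$GEN_USER" "$GEN_PASS" >> "$CONF"
    echo "Generated credentials: $GEN_USER / $GEN_PASS (change before production!)" >&2
fi
grep '^demo.security.user=' "$CONF"
```

This mirrors the MySQL-style workflow Andrew mentions: a fresh install is never wide open, and the operator is told exactly where the one-time credentials came from.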


Re: nifi deploy automation - variables pass through (incl controller services)

2019-09-02 Thread Andrew Grande
I think there is confusion here. Changing *properties* is considered a
tracked change; the behavior is by design.

There are *variables* on the PG level which are considered non-trackable
environment settings and won't trigger a dirty flag.

There is a dedicated command in the cli to list and set these *variables*.

Hope it helps,
Andrew
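For reference, the Toolkit CLI commands for variables look roughly like this. The instance URL and process group id are placeholders, and the option names are from memory; verify them with `cli.sh nifi pg-set-var help` in your version:

```shell
# Illustrative only: requires a running NiFi instance.
./bin/cli.sh nifi pg-get-vars -u http://localhost:8080 -pgid <pg-id>
./bin/cli.sh nifi pg-set-var  -u http://localhost:8080 -pgid <pg-id> \
    -var dbHost -val prod-db.example.com
```

Because variables are treated as environment settings, setting them this way does not dirty the versioned flow the way a property change does.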

On Mon, Sep 2, 2019, 4:07 PM sivasankar Reddy  wrote:

> Hi Bryan,
>
> Thanks for the options. After pg-import or pg-change-version, if we use
> the NiFi REST API to update properties, it looks like the NiFi instance
> detects changes (compares with the bucket flow in the registry) and marks
> the process group with a *.
>
> We could run pg-change-version again, but changing the version just to
> set properties is probably not advisable.
>
> Any ideas around this will be helpful.
>
> Thanks
> Siva
>
> On 2019/08/23 16:23:38, Bryan Bende  wrote:
> > Well there are two different encryption parts here -
> > encryption-at-rest and encryption-in-transit...
> >
> > Encryption-at-rest would have to be part of your script that is
> > calling the REST API. Somehow you create a config file with encrypted
> > values, and your script needs to read those in and decrypt them in
> > memory and then put them into the content that will be sent in the
> > HTTP PUT or POST request.
> >
> > Encryption-in-transit would be handled by the fact that you would be
> > making an HTTPS request to NiFi, so the contents of the POST or PUT
> > request are encrypted in transit. This is the same thing that is
> > happening if you were in the NiFi UI and typed in a password into the
> > sensitive property field.
> >
> >
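A tiny shell sketch of the encryption-at-rest half Bryan describes: the config file stores only ciphertext, and the deploy script decrypts it in memory just before building the HTTPS request body. The cipher choice and passphrase handling here are illustrative, not a NiFi convention:

```shell
# Requires OpenSSL 1.1.1+ for -pbkdf2.
export CONFIG_KEY=demo-passphrase   # in practice, injected by your secret store

# Encrypt once when writing the config (this ciphertext is what gets stored):
stored=$(printf 's3cret-password' | openssl enc -aes-256-cbc -pbkdf2 -a -pass env:CONFIG_KEY)

# Later, the deploy script decrypts in memory right before the REST call:
plain=$(printf '%s\n' "$stored" | openssl enc -aes-256-cbc -pbkdf2 -a -d -pass env:CONFIG_KEY)
echo "$plain"   # would be placed into the JSON body of the HTTPS PUT
```

The plaintext never touches disk; encryption-in-transit is then handled by making the request over HTTPS, as noted above.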
> > On Fri, Aug 23, 2019 at 11:35 AM sivasankar Reddy 
> wrote:
> > >
> > > Hi Bryan,
> > >
> > > Thanks for the reply.
> > > I was able to get the JSON object and set the Password property.
> > > However, the password value has to come from a config file, where it
> > > is a plain string visible to anyone. Is there any way to keep the
> > > password encrypted in the config and have this set-property step take
> > > the decrypted value?
> > >
> > > That way we ensure the password stays sensitive. Any other ideas to
> > > achieve this?
> > >
> > >
> > > Regards,
> > > Siva
> > >
> > >
> > >
> > > On 2019/08/22 17:38:37, Bryan Bende  wrote:
> > > > The parameters work is on-going and will be in the next release which
> > > > is 1.10.0. Releases don't really have specific timelines, but
> > > > generally they happen every few months. Most likely 1.10.0 is a
> couple
> > > > of weeks away, but depends on when active work is completed and when
> > > > someone volunteers to make a release.
> > > >
> > > > There is no timeline for the "set-property" command since it was just
> > > > suggested as a new feature in this email thread :) It requires
> someone
> > > > creating a JIRA and deciding to work on it.
> > > >
> > > > All of the functionality in the CLI and NiPyAPI, and even NiFi's own
> > > > UI, is based on the REST API. So you can still perform a "set
> > > > property" by using the REST API to modify the configuration of a
> > > > processor, the CLI would just make it easier so that you wouldn't
> have
> > > > to understand the lower level API details. The best way to understand
> > > > the API calls is to use the UI while you have Chrome Dev Tools open,
> > > > and then perform the action you are interested in and look at the
> > > > requests made on the Network tab. You'll be able to see what URLs are
> > > > called and what the request and response bodies look like.
> > > >
> > > > On Thu, Aug 22, 2019 at 11:46 AM sivasankar Reddy <
> mail2ms...@gmail.com> wrote:
> > > > >
> > > > > Hi Bryan,
> > > > >
> > > > > Thanks for the reply. It looks like even set-property will be a new
> > > > > feature, as the current CLI doesn't have that.
> > > > >
> > > > > Could you please share timelines for these features if they are
> > > > > on the roadmap?
> > > > > 1. "set-property"
> > > > > 2. “parameters”
> > > > >
> > > > > Is the only option currently to set sensitive parameters
> > > > > manually, or is there another option through the CLI?
> > > > >
> > > > > Regards,
> > > > > Siva
> > > > >
> > > > > On 2019/08/22 04:10:46, Bryan Bende  wrote:
> > > > > > Currently there isn’t a good way with the CLI, we would need to
> add a
> > > > > > command like set-property that took the id of a component, and
> the name and
> > > > > > value for a property.
> > > > > >
> > > > > > The next release will have a new feature called “parameters”
> which will
> > > > > > solve this problem. You’ll be able to use a parameter in a
> sensitive
> > > > > > property and then use CLI to set parameter values.
> > > > > >
> > > > > > On Wed, Aug 21, 2019 at 12:29 PM sivasankar Reddy <
> mail2ms...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Bryan,
> > > > > > >
> > > > > > > Thanks for the update. Could you please elaborate on the
> > > > > > > sensitive-parameters aspect from a deployment point of view,
> > > > > > > using the CLI?
> > > > > > >
> > > > > > > How to set the sensitive parameters values after pg-import.
> > > > > > >
> > > > > > > 
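Until a set-property command exists, the underlying REST calls that the CLI or UI would make look roughly like this. The host, processor id, and revision number are placeholders; the entity shape follows the NiFi REST API, and the revision must be the current one fetched in the first call or NiFi will reject the update:

```shell
# Illustrative only: needs a running NiFi and a real processor id.
PROC_ID="processor-uuid-here"
NIFI=https://nifi.example.com:8443/nifi-api

# 1. Read the processor entity to learn its current revision
curl -s "$NIFI/processors/$PROC_ID"

# 2. Send back a minimal update: same revision, changed property
curl -s -X PUT -H 'Content-Type: application/json' \
  -d '{
        "revision": { "version": 0, "clientId": "deploy-script" },
        "component": {
          "id": "processor-uuid-here",
          "config": { "properties": { "Password": "new-value" } }
        }
      }' \
  "$NIFI/processors/$PROC_ID"
```

As Bryan suggests, watching the Network tab while editing the processor in the UI will show the exact request and response bodies for your NiFi version.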

Re: Nifi registry - extension-repository, extensions, bundles

2019-07-30 Thread Andrew Grande
Mike,

These are upcoming features, not yet released. It's no surprise you don't
see full reference docs where you'd expect them.

We should probably ask whether those will make it into NiFi 1.10.x or
whether there's a different plan.

Andrew

On Tue, Jul 30, 2019, 1:43 PM HWANG, MICHAEL (MICHAEL) <
mhw...@research.att.com> wrote:

> Hello
>
> I'm playing with Nifi and the Nifi registry and trying to understand the
> features and capabilities.  I noticed that in the Nifi registry API there
> are endpoints for "/bundles", "/extension-repository", "/extensions", but
> I don't understand how they are used or intended to be used.  I've read the
> descriptions in the API docs but it's not substantial enough for me to
> understand.  I've combed through the Nifi docs and grepped through the
> source code but there are no references to these endpoints.  In my sandbox
> instances, I can link to the registry API just fine and version control my
> process groups/flows but the three endpoints all are empty.
>
> Can I get information on their intended use or examples or pointers to
> code or documentation?  These APIs are extensive and seem like a great way
> to extend Nifi's capabilities but I just don't know enough.
>
> Thanks
>
> Mike
>


Re: RedisLabs license changes

2019-02-23 Thread Andrew Grande
Is there a separate license for a client driver?

On Sat, Feb 23, 2019, 9:57 AM Mike Thomsen  wrote:

>
> https://news.slashdot.org/story/19/02/22/2155223/redis-changes-its-open-source-licenseagain
>
> The reporting on it is sorta ambiguous as to whether core Redis has
> stopped being BSD or whether it's just RedisLabs' modules that are
> affected. I wanted to bring this up now for discussion because the initial
> reporting is very unfavorable about the implications for applications that
> use, let alone depend on, Redis for what Redis typically does.
>
> For now Redis.io still says it's 3 Clause BSD, but I am going to try to
> look into it more closely.
>
> Mike
>


Re: [DISCUSS] Deprecate processors who have Record oriented counterpart?

2019-02-23 Thread Andrew Grande
I'm not sure deprecating is warranted. In my experience, record-based
processors are very powerful but have a steep learning curve the way they
are in NiFi today, and, frankly, simple things should be dead simple.

Now, moving the record UX toward an easy extreme changes this equation,
but e.g. I never open a conversation with a new user by talking about
records, the Schema Registry, or NiFi Registry.

Maybe there's something coming up that I'm not aware of yet? Please share.

Andrew

On Sat, Feb 23, 2019, 7:43 AM Sivaprasanna 
wrote:

> Team,
>
> Ever since the Record-based processors were first introduced, there has
> been active development in improving the Record APIs and constant interest
> in introducing new Record-oriented processors. It has gone to a level
> where almost all the processors that deal with mainstream tech have a
> Record-based counterpart, such as the processors for MongoDB, Kafka,
> RDBMS, HBase, etc. These Record-based processors have overcome the
> limitations of the standard processors, letting us build flows which are
> concise and efficient, especially when we are dealing with structured
> data. Moreover, with the recent release of NiFi (1.9), we now have a new
> feature that offers schema inference, which simplifies the process of
> building flows with such processors even further. Having said that, I'm
> wondering if this is the right time to raise the question of deprecating
> processors which the community believes have a much better Record-oriented
> counterpart covering all the functionality currently offered by the
> standard processor.
>
> There are a few things that have to be talked about, like how a
> deprecated processor should be displayed in the UI, etc., but even before
> going down that route, I want to understand the community's thoughts on
> this.
>
> Thanks,
> Sivaprasanna
>


Re: NiFi, Docker Environment Variable for enabling debugging of NiFi inside a Docker container

2019-02-13 Thread Andrew Grande
Here's what I did previously, Erik. A user only had to specify an
additional -debug flag for the main shell script, which would in turn take
care of any bootstrap rewriting/generation to allow the remote JVM debug
session to connect. Maybe it could give a few ideas.

Andrew

On Wed, Feb 13, 2019, 7:34 AM Erik Anderson  wrote:

> > I was reading that email and was thinking of JVM debug options, with
> > suspend y/n. I guess it just shows we meant very different things by
> debug
> > mode. Maybe you could incorporate those into a PR too?
> >
> > Andrew
>
> Good point, Andrew, and sorry for the slow response. I had to look at how
> NiFi set the JVM properties.
>
> We had issues with Java and needed to manually set env Java flags.
>
> It seems all of the Java debugging flags are set in the bootstrap.conf,
> located here.
>
>
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-resources/src/main/resources/conf/bootstrap.conf
>
> For me, I chained together the Dockerfile's
> ENTRYPOINT ["my_company_entry_point.sh"]
> and then had it call the original entry point.
> In short, all of NiFi's properties are set via environment variables in a
> /etc/profile.d/nifi.sh on the host and propagate throughout the NiFi
> system, like I listed in
>
> https://github.com/dprophet/nifi/blob/master/nifi-docker/dockerhub/CONFIGURATION.md
>
> I normalized all NIFI specific environment variables to start with NIFI_
>
> Why? Because you set the
>
> export NIFI_FOO="my foo"
>
> and it's directly passed into the container and into the NiFi role account
> used to start the container. You now have access to the environment
> variables throughout the NiFi system.
>
> Example:
> docker run --name nifi --env-file <(env | grep NIFI_) --hostname nifi
>
> IMO, the Dockerfile entry point should allow a plug and play script so you
> can set these custom behaviors (both enterprise behaviors and custom
> developer/debugging). I doubt any enterprise will blindly pull a DockerHub
> container and run it. From my experience, a public container isn't
> enterprise-friendly.
>
> Andrew, define what you want for JVM debugging, what you would want to set
> (and unset), and I will take a look.
>
> Erik Anderson
> Bloomberg
> https://www.linkedin.com/in/erikanderson/
>
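For the JVM-debug side of this thread: NiFi's stock conf/bootstrap.conf (linked above) ships with a commented-out remote-debug argument along these lines, so uncommenting it (or having a -debug flag rewrite it) is enough to let a remote debugger attach on port 8000:

```properties
# Shipped commented-out in conf/bootstrap.conf; suspend=y would instead
# make the JVM wait for the debugger to attach before starting.
java.arg.debug=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000
```

In a container, the port also has to be published (e.g. `-p 8000:8000` on `docker run`) for the IDE to reach it.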


Re: NiFi, Docker Environment Variable for enabling debugging of NiFi inside a Docker container

2019-02-08 Thread Andrew Grande
Erik,

I was reading that email and was thinking of JVM debug options, with
suspend y/n. I guess it just shows we meant very different things by debug
mode. Maybe you could incorporate those into a PR too?

Andrew

On Fri, Feb 8, 2019, 2:15 PM Erik Anderson  wrote:

> I have a slight improvement to Environment variables that the NiFI docker
> system uses to bootstrap its nifi.properties files
>
> I documented it under
>
> https://issues.apache.org/jira/browse/NIFI-6013
>
> The commit is here.
>
>
> https://github.com/dprophet/nifi/commit/ea31ac6bd8f00944166e3e230af2040636c0505c
>
> *
>
> There are times, when learning NiFi, that I want to stop and start NiFi
> with debugging on. If NiFi was running inside a Docker container it was
> hard.
>
> What I did was I created a new environment variable within the NiFi
> bootstrapping system it uses to start itself up inside a docker container.
> FYI, some of those environment variables are documented here.
>
> https://github.com/apache/nifi/blob/master/nifi-docker/dockerhub/README.md
>
> What I did was introduce a new environment variable
>
> NIFI_DEBUG
>
> And in file
>
> https://github.com/apache/nifi/blob/master/nifi-docker/dockerhub/sh/start.sh
>
> I added
>
> if [[ -n $NIFI_DEBUG ]] ; then
> . "${scripts_dir}/update_logback.sh"
> fi
>
> Basically, if NIFI_DEBUG is set, for example:
>
> NIFI_DEBUG="org.apache.nifi.web.security=TRACE,org.apache.nifi.authorization=DEBUG,org.apache.nifi.web.api.config=TRACE"
>
> it updates the
> ${NIFI_HOME}/conf/logback.xml
>
> file to enable the debug logging levels of NiFi.
>
> This is a very fast mechanism to restart NiFi with debugging turned on.
> Very developer and workflow friendly.
>
> If of interest I will create a JIRA, put the code on a fork, and give a
> pull request.
>
> Erik Anderson
> Bloomberg
>
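A hedged sketch of what an update_logback.sh could do with that NIFI_DEBUG value — the real script isn't shown in the thread, so the parsing below is just one possible take:

```shell
# Turn NIFI_DEBUG="logger=LEVEL,logger=LEVEL,..." into logback <logger> lines.
NIFI_DEBUG="org.apache.nifi.web.security=TRACE,org.apache.nifi.authorization=DEBUG"
loggers=""
IFS=','
for entry in $NIFI_DEBUG; do
    name=${entry%%=*}     # part before '='
    level=${entry##*=}    # part after '='
    loggers="$loggers<logger name=\"$name\" level=\"$level\"/>
"
done
unset IFS
printf '%s' "$loggers"
# These lines would then be spliced into ${NIFI_HOME}/conf/logback.xml
# before (re)starting NiFi.
```

Printed output for the example value above is one `<logger .../>` element per comma-separated entry, ready to drop inside logback's `<configuration>` element.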


Re: Lowering the barrier of entry

2019-01-28 Thread Andrew Grande
ight content pass-through to reversing the text)
> > > > > > * here’s how to make a unit test (introduce the TestRunner
> > > framework, etc.)
> > > > > > 3. Running, building, installing
> > > > > > * Run your unit test from the IDE/maven
> > > > > > * Build the NAR module
> > > > > > * Install the NAR in NiFi lib/ or custom/
> > > > > > * Restart NiFi
> > > > > > * See the NAR loaded in the log
> > > > > > * Deploy the component on the canvas
> > > > > >
> > > > > > I imagine this being written more conversationally/blog-like than
> > > most of
> > > > > > our current reference documentation to be used as a split-screen
> > > > > > walkthrough. Each section could certainly link to the existing
> > > detailed
> > > > > > documentation for various topics, like the processor lifecycle,
> etc.
> > > > > >
> > > > > > Does this sounds like something that would have helped you?
> > > > > >
> > > > > > Andy LoPresto
> > > > > > alopre...@apache.org
> > > > > > alopresto.apa...@gmail.com
> > > > > > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D
> EF69
> > > > > >
> > > > > > > On Jan 25, 2019, at 1:59 PM, James Srinivasan <
> > > > > > james.sriniva...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > 9) Oh, and the wiki is a little hard to navigate and the
> contents
> > > rather
> > > > > > patchy
> > > > > > >
> > > > > > > On Fri, 25 Jan 2019 at 21:57, James Srinivasan
> > > > > > >  wrote:
> > > > > > >>
> > > > > > >> As someone relatively new to NiFi dev, here's my £0.02. (Yes,
> I
> > > > > > >> realise I could and possibly should submit PRs :)
> > > > > > >>
> > > > > > >> 1) I'm used to Java and Maven, so used the archetype. It
> worked
> > > fine,
> > > > > > >> it would have been nice if it set up unit tests for me.
> > > > > > >> 2) The User and Developer documentation is great and
> > > comprehensive.
> > > > > > >> Finding the developer docs is a little painful (handful of
> items
> > > at
> > > > > > >> the end of a scrolling list of 200+ processors)
> > > > > > >> 3) The Developer docs could possibly do with a little more
> > > clarity on
> > > > > > >> processor lifetime i.e. what is called when ^h^h^h - skimming
> back
> > > > > > >> over the docs, it looks pretty clear now
> > > > > > >> 4) Some example code for common operations e.g.
> getting/setting
> > > > > > >> attributes or reading/writing/modifying flowfile content
> would be
> > > > > > >> great.
> > > > > > >> 5) When using existing processors for inspiration, best
> practices
> > > > > > >> weren't always clear e.g. some generated properties inside
> > > > > > >> getSupportedPropertyDescriptors(), others generated a private
> > > static
> > > > > > >> list on init and returned that. Such differences are
> inevitable
> > > in a
> > > > > > >> large project, but it would be nice to have something blessed
> to
> > > start
> > > > > > >> from.
> > > > > > >> 6) (Minor niggle - layout of the docs doesn't work great on a
> > > phone
> > > > > > screen)
> > > > > > >> 7) I couldn't find (m?)any docs about the Groovy scripting
> API,
> > > but
> > > > > > >> the great blog posts by Matt Burgess and others were
> invaluable
> > > > > > >> 8) In case this all sounds too negative, NiFi is fab!
> > > > > > >>
> > > > > > >> On Fri, 25 Jan 2019 at 18:47, Andrew Grande <
> apere...@gmail.com>
> > > wrote:
> > > > > > >>>
> > > > > > >>> I am not against the archetype. But we need to spell out
> every
> > > step of
> > > > > > the

Re: Lowering the barrier of entry

2019-01-28 Thread Andrew Grande
ssibly do with a little more clarity
> on
> > > >> processor lifetime i.e. what is called when ^h^h^h - skimming back
> > > >> over the docs, it looks pretty clear now
> > > >> 4) Some example code for common operations e.g. getting/setting
> > > >> attributes or reading/writing/modifying flowfile content would be
> > > >> great.
> > > >> 5) When using existing processors for inspiration, best practices
> > > >> weren't always clear e.g. some generated properties inside
> > > >> getSupportedPropertyDescriptors(), others generated a private static
> > > >> list on init and returned that. Such differences are inevitable in a
> > > >> large project, but it would be nice to have something blessed to
> start
> > > >> from.
> > > >> 6) (Minor niggle - layout of the docs doesn't work great on a phone
> > > screen)
> > > >> 7) I couldn't find (m?)any docs about the Groovy scripting API, but
> > > >> the great blog posts by Matt Burgess and others were invaluable
> > > >> 8) In case this all sounds too negative, NiFi is fab!
> > > >>
> > > >> On Fri, 25 Jan 2019 at 18:47, Andrew Grande 
> wrote:
> > > >>>
> > > >>> I am not against the archetype. But we need to spell out every
> step of
> > > the
> > > >>> way. I'd like to see a user thinking about their custom logic ASAP
> > > rather
> > > >>> than fighting the tools to get started. Those steps should be
> > > brain-dead,
> > > >>> just reflexes, if you know what I mean. Hell, let them create a
> custom
> > > >>> processor project or prototype in a script by accident even! :)
> > > >>>
> > > >>> On Fri, Jan 25, 2019, 10:43 AM Bryan Bende 
> wrote:
> > > >>>
> > > >>>> That makes sense about the best practice for deploying to an
> > > >>>> additional lib directory.
> > > >>>>
> > > >>>> So for the project structure you are saying it would be easier to
> have
> > > >>>> a repo somewhere with essentially the same thing that is in the
> > > >>>> archetype, but they just clone it and rename it themselves (what
> the
> > > >>>> archetype does for you)?
> > > >>>>
> > > >>>> Something that I think would be awesome is if we could provide a
> > > >>>> web-based project initializer that would essentially run the
> archetype
> > > >>>> behind the scenes and then let you download the archive of the
> code,
> > > >>>> just like the spring-boot starter [1]. Not sure if their
> initializr is
> > > >>>> something that can be re-used and customized [2].
> > > >>>>
> > > >>>> The problem is we would need to host that somewhere.
> > > >>>>
> > > >>>> [1] https://start.spring.io/
> > > >>>> [2] https://github.com/spring-io/initializr
> > > >>>>
> > > >>>> On Fri, Jan 25, 2019 at 12:56 PM Andrew Grande <
> apere...@gmail.com>
> > > wrote:
> > > >>>>>
> > > >>>>> We assume they create new projects from archetypes every day.
> They
> > > don't.
> > > >>>>>
> > > >>>>> We also assume they know how to deploy new NARs. Most don't.
> > > Especially
> > > >>>> if
> > > >>>>> we want them to follow best practices and create an additional
> NAR
> > > >>>> bundles
> > > >>>>> directory entry in the config (vs dumping into nifi lib).
> > > >>>>>
> > > >>>>> I can attest that I feel a bit lost myself every time I need to
> come
> > > back
> > > >>>>> to this and refresh my brain synapses. If we could make these not
> > > require
> > > >>>>> any of that and make simple things dead simple.
> > > >>>>>
> > > >>>>> Andrew
> > > >>>>>
> > > >>>>> On Fri, Jan 25, 2019, 9:47 AM Bryan Bende 
> wrote:
> > > >>>>>
> > > >>>>>> Andrew,
> > > >>>>>>
> > > >>>>>> I'm not disagreeing w

Re: Lowering the barrier of entry

2019-01-25 Thread Andrew Grande
I am not against the archetype. But we need to spell out every step of the
way. I'd like to see a user thinking about their custom logic ASAP rather
than fighting the tools to get started. Those steps should be brain-dead,
just reflexes, if you know what I mean. Hell, let them create a custom
processor project or prototype in a script by accident even! :)

On Fri, Jan 25, 2019, 10:43 AM Bryan Bende  wrote:

> That makes sense about the best practice for deploying to an
> additional lib directory.
>
> So for the project structure you are saying it would be easier to have
> a repo somewhere with essentially the same thing that is in the
> archetype, but they just clone it and rename it themselves (what the
> archetype does for you)?
>
> Something that I think would be awesome is if we could provide a
> web-based project initializer that would essentially run the archetype
> behind the scenes and then let you download the archive of the code,
> just like the spring-boot starter [1]. Not sure if their initializr is
> something that can be re-used and customized [2].
>
> The problem is we would need to host that somewhere.
>
> [1] https://start.spring.io/
> [2] https://github.com/spring-io/initializr
>
> On Fri, Jan 25, 2019 at 12:56 PM Andrew Grande  wrote:
> >
> > We assume they create new projects from archetypes every day. They don't.
> >
> > We also assume they know how to deploy new NARs. Most don't. Especially
> if
> > we want them to follow best practices and create an additional NAR
> bundles
> > directory entry in the config (vs dumping into nifi lib).
> >
> > I can attest that I feel a bit lost myself every time I need to come back
> > to this and refresh my brain synapses. If we could make these not require
> > any of that and make simple things dead simple.
> >
> > Andrew
> >
> > On Fri, Jan 25, 2019, 9:47 AM Bryan Bende  wrote:
> >
> > > Andrew,
> > >
> > > I'm not disagreeing with your points, but I'm curious how you see
> > > those two ideas being different from the processor archetype and the
> > > wiki page with the archetype commands?
> > >
> > > Is it just that people don't know about it?
> > >
> > > -Bryan
> > >
> > > [1]
> > >
> https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions
> > >
> > > On Fri, Jan 25, 2019 at 12:23 PM Otto Fowler 
> > > wrote:
> > > >
> > > > I think this ties into my other discuss thread on refreshing the
> > > archetypes
> > > >
> > > >
> > > > On January 25, 2019 at 11:50:10, Andrew Grande (apere...@gmail.com)
> > > wrote:
> > > >
> > > > I consistently see my users struggling when they move up the nifi
> food
> > > > chain and start looking at custom processors. The good content about
> > > > prototyping processors via scripting processors and finalizing with
> a
> > > full
> > > > NAR bundle is everywhere but where it should be.
> > > >
> > > > A few simple changes could help (not *more* docs). They are great,
> much
> > > > better than in many other projects, but people are already drowning
> in
> > > > those.
> > > >
> > > > How about:
> > > >
> > > > + ISP has a pre-populated processor skeleton. A simple no-op to fill
> in
> > > is
> > > > miles better than a blank text area (which invokes a blank stare).
> > > >
> > > > + As much as we may look down on this, but... A simple guide to a
> full
> > > NAR
> > > > build as a series of copy/paste commands.
> > > >
> > > > There's more, but this should fit the context for now.
> > > >
> > > > Andrew
> > > >
> > > > On Fri, Jan 25, 2019, 8:13 AM Mike Thomsen 
> > > wrote:
> > > >
> > > > > One of the changes we should make is to create a separate guide for
> > > > product
> > > > > vendors on how to build and maintain a bundle. We're at that point
> > > where
> > > > > vendors will have to do it on their own as extension providers, so
> it
> > > > would
> > > > > be very helpful for them to have a simple and straight forward
> document
> > > > > showing them what should be there, best practices for
> maintainability
> > > and
> > > > > where to announce it.
> > > > >
> > > > > On Fri, Jan 25, 2019 at 9:59 AM Br

Re: Lowering the barrier of entry

2019-01-25 Thread Andrew Grande
We assume they create new projects from archetypes every day. They don't.

We also assume they know how to deploy new NARs. Most don't. Especially if
we want them to follow best practices and create an additional NAR bundles
directory entry in the config (vs dumping into nifi lib).

I can attest that I feel a bit lost myself every time I need to come back
to this and refresh my brain synapses. If we could make these not require
any of that and make simple things dead simple.

Andrew

On Fri, Jan 25, 2019, 9:47 AM Bryan Bende  wrote:

> Andrew,
>
> I'm not disagreeing with your points, but I'm curious how you see
> those two ideas being different from the processor archetype and the
> wiki page with the archetype commands?
>
> Is it just that people don't know about it?
>
> -Bryan
>
> [1]
> https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions
>
> On Fri, Jan 25, 2019 at 12:23 PM Otto Fowler 
> wrote:
> >
> > I think this ties into my other discuss thread on refreshing the
> archetypes
> >
> >
> > On January 25, 2019 at 11:50:10, Andrew Grande (apere...@gmail.com)
> wrote:
> >
> > I consistently see my users struggling when they move up the nifi food
> > chain and start looking at custom processors. The good content about
> > prototyping processors via scripting processors and finalizing with a
> full
> > NAR bundle is everywhere but where it should be.
> >
> > A few simple changes could help (not *more* docs). They are great, much
> > better than in many other projects, but people are already drowning in
> > those.
> >
> > How about:
> >
> > + ISP has a pre-populated processor skeleton. A simple no-op to fill in
> is
> > miles better than a blank text area (which invokes a blank stare).
> >
> > + As much as we may look down on this, but... A simple guide to a full
> NAR
> > build as a series of copy/paste commands.
> >
> > There's more, but this should fit the context for now.
> >
> > Andrew
> >
> > On Fri, Jan 25, 2019, 8:13 AM Mike Thomsen 
> wrote:
> >
> > > One of the changes we should make is to create a separate guide for
> > product
> > > vendors on how to build and maintain a bundle. We're at that point
> where
> > > vendors will have to do it on their own as extension providers, so it
> > would
> > > be very helpful for them to have a simple and straight forward document
> > > showing them what should be there, best practices for maintainability
> and
> > > where to announce it.
> > >
> > > On Fri, Jan 25, 2019 at 9:59 AM Bryan Bende  wrote:
> > >
> > > > I think we have a lot more documentation than most projects, but I
> > > > think an issue is that content is scattered in many different
> > > > locations, and some of the docs are huge reference guides where it
> can
> > > > be hard to find all the pieces of what you are trying to do.
> > > >
> > > > The first thing a new contributor wants to do is get the code and run
> > > > a build, and we do have a quick-start guide linked to on the site,
> but
> > > > I think there is a lot of extra information in there that is not
> > > > really relevant to someone just wanting to get the code and build. We
> > > > could have separate guides per OS like "Build NiFi on Linux", "Build
> > > > NiFi on Windows", etc, where each guide was 4-5 steps like:
> > > >
> > > > - Clone repo
> > > > - checkout master
> > > > - run maven
> > > > - cd to assembly
> > > > - ./bin/nifi.sh
> > > >
> > > > The next thing they want to do is contribute a change, and we have a
> > > > great contributor guide, but again I think there could be a very
> short
> > > > tutorial for the most common steps:
> > > >
> > > > - fork repo
> > > > - clone fork
> > > > - create branch
> > > > - make changes
> > > > - push branch
> > > > - submit pr
> > > >
> > > > and then say something like "for a more detailed description of the
> > > > contribution process, please reference the Contributor Guide".
> > > >
> > > > If we then make these getting started guides more prominent right in
> > > > the middle of the NiFi homepage, then maybe they will be easier to
> > > > find for new community members.
> > > >
> > > > We can keep extending this idea to other common 

Re: Lowering the barrier of entry

2019-01-25 Thread Andrew Grande
I consistently see my users struggling when they move up the NiFi food
chain and start looking at custom processors. The good content about
prototyping processors via scripting processors and finalizing with a full
NAR bundle is everywhere but where it should be.

A few simple changes could help (not *more* docs). The docs are great, much
better than in many other projects, but people are already drowning in
them.

How about:

+ ISP (InvokeScriptedProcessor) gets a pre-populated processor skeleton. A
simple no-op to fill in is miles better than a blank text area (which invokes
a blank stare).

+ As much as we may look down on this... a simple guide to a full NAR
build as a series of copy/paste commands.

There's more, but this should fit the context for now.

Andrew

On Fri, Jan 25, 2019, 8:13 AM Mike Thomsen  wrote:

> One of the changes we should make is to create a separate guide for product
> vendors on how to build and maintain a bundle. We're at that point where
> vendors will have to do it on their own as extension providers, so it would
> be very helpful for them to have a simple and straightforward document
> showing them what should be there, best practices for maintainability and
> where to announce it.
>
> On Fri, Jan 25, 2019 at 9:59 AM Bryan Bende  wrote:
>
> > I think we have a lot more documentation than most projects, but I
> > think an issue is that content is scattered in many different
> > locations, and some of the docs are huge reference guides where it can
> > be hard to find all the pieces of what you are trying to do.
> >
> > The first thing a new contributor wants to do is get the code and run
> > a build, and we do have a quick-start guide linked to on the site, but
> > I think there is a lot of extra information in there that is not
> > really relevant to someone just wanting to get the code and build. We
> > could have separate guides per OS like "Build NiFi on Linux", "Build
> > NiFi on Windows", etc, where each guide was 4-5 steps like:
> >
> > - Clone repo
> > - checkout master
> > - run maven
> > - cd to assembly
> > - ./bin/nifi.sh
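[Editor's note: the five steps above, transcribed literally as a sketch. The branch name and assembly path are assumptions that vary by release; the commands are wrapped in a function so nothing runs on paste.]

```shell
# Hypothetical transcription of the 5 steps above; the branch name and the
# assembly output path are assumptions (they change between NiFi releases).
build_and_run_nifi() {
  git clone https://github.com/apache/nifi.git && cd nifi  # clone repo
  git checkout master                                      # checkout master
  mvn clean install -DskipTests                            # run maven
  cd nifi-assembly/target/nifi-*-bin/nifi-*                # cd to assembly
  ./bin/nifi.sh start                                      # start NiFi
}
```

Calling `build_and_run_nifi` performs a full build, so expect it to take a while on first run.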
> >
> > The next thing they want to do is contribute a change, and we have a
> > great contributor guide, but again I think there could be a very short
> > tutorial for the most common steps:
> >
> > - fork repo
> > - clone fork
> > - create branch
> > - make changes
> > - push branch
> > - submit pr
> >
> > and then say something like "for a more detailed description of the
> > contribution process, please reference the Contributor Guide".
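[Editor's note: the six contribution steps above can be sketched end to end. This demo substitutes a throwaway local bare repository for a GitHub fork, so it runs entirely offline; the URLs, the NIFI-1234 branch name, and the file names are purely illustrative.]

```shell
# The same six steps against a local bare repo standing in for a GitHub fork.
set -e
FORK="$(mktemp -d)/nifi-fork.git"
git init -q --bare "$FORK"                 # stands in for: fork repo
WORK="$(mktemp -d)/nifi"
git clone -q "$FORK" "$WORK" 2>/dev/null   # clone fork
cd "$WORK"
git checkout -q -b NIFI-1234               # create branch, named for a Jira
echo "example fix" > fix.txt
git add fix.txt                            # make changes
git -c user.name=dev -c user.email=dev@example.com \
    commit -qm "NIFI-1234 example fix"
git push -q origin NIFI-1234               # push branch
# final step: open a pull request from that branch on GitHub (submit pr)
```

With a real fork, replace `$FORK` with your fork's clone URL and open the PR from the pushed branch.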
> >
> > If we then make these getting started guides more prominent right in
> > the middle of the NiFi homepage, then maybe they will be easier to
> > find for new community members.
> >
> > We can keep extending this idea to other common tasks beyond just
> > building and contributing.
> >
> >
> > On Thu, Jan 24, 2019 at 8:03 PM Andy LoPresto 
> > wrote:
> > >
> > > Hi folks,
> > >
> > > Based on some recent (and long-term) experiences, I wanted to discuss
> > with the community what we could do to lower the barrier of entry to
> using
> > & contributing to NiFi. I hope to get some good feedback from both
> > long-time and newer members, and determine some immediate concrete steps
> we
> > can take.
> > >
> > > Problems identified:
> > > * NiFi has a number of custom profiles, so a simple “mvn clean install”
> > in project root doesn’t get a new developer up and running immediately
> > > * The API is very well defined, but for new contributors, it can be a
> > challenge to know where to put functionality, and building a custom
> > processor + NAR and deploying isn’t a one-step process
> > > * Project size (and build size/time) is large. This can restrict the
> > minimum hardware necessary, elongate the development cycle, etc.
> > > * Some new users do not receive mailing list replies
> > >
> > > Possible solutions:
> > > * On a clean git clone, “mvn clean install” should build a working
> > instance. Maybe we provide a quickstart.sh script to handle the default
> > maven build, change to the target directory, and start NiFi?
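[Editor's note: a sketch of what that quickstart.sh could contain. The assembly output path and the -DskipTests flag are assumptions, not project-blessed values; the snippet below only generates the script and syntax-checks it.]

```shell
# Hypothetical quickstart.sh along the lines proposed above.
cd "$(mktemp -d)"
cat > quickstart.sh <<'EOF'
#!/usr/bin/env bash
set -e
mvn clean install -DskipTests                # default maven build
cd nifi-assembly/target/nifi-*-bin/nifi-*/   # change to the target directory
./bin/nifi.sh start                          # start NiFi
EOF
chmod +x quickstart.sh
bash -n quickstart.sh && echo "quickstart.sh syntax OK"
```

Running the generated script from a clean clone would then be the whole quickstart.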
> > > * Individual contributors have written excellent blogs, and
> > documentation exists, but making it more prominent or more easily
> accessed
> > could help?
> > > * Extension registry will solve all the world’s problems (related to
> > bundling and build time)
> > > * Not sure about this one — I don’t know if it’s because they’re not
> > subscribed, their mail client is blocking them, etc.
> > >
> > > I’ve said my bit, now I am eager to hear from other community members
> on
> > their experiences, steps that helped them, and suggestions for the future
> > to continue to make the NiFi community welcoming to new users. Thanks.
> > >
> > >
> > > Andy LoPresto
> > > alopre...@apache.org
> > > alopresto.apa...@gmail.com
> > > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> > >
> >
>


Re: [DISCUSS] Extension Registry

2018-11-13 Thread Andrew Grande
I would like to see a clear separation of blob and metadata storage. Most
often you'd see some object storage being already distributed and
replicated, let's think about an easy way to backup or migrate the metadata
between registry instances.

Andrew

On Tue, Nov 13, 2018, 11:32 AM Michael Moser  wrote:

> I have thought about this in the past, too.  Here's a scenario where I
> could never really lay down an approach I was happy with.
>
> Consider that a NiFi user searches the NiFi registry and finds the PutMongo
> processor.  Registry knows the PutMongo processor is in the
> nifi-mongodb-nar, and through its Nar-Dependency-Id that it has a
> dependency on a controller service interface in
> nifi-mongodb-client-service-api-nar.  Great, the user can then download and
> install those two nars.  How would we then suggest that the user also needs
> a MongoDBClientService controller service implementation, such as that in
> the nifi-mongodb-services-nar?
>
> I'm not looking for an answer now, of course, but I just wanted to feed the
> discussion.
>
> Thanks,
> -- Mike
>
>
> On Tue, Nov 13, 2018 at 1:34 PM Bryan Bende  wrote:
>
> > Mark,
> >
> > I think there are a couple of ways that could be solved, but
> > ultimately it would be up to how the users choose to setup/manage the
> > registry, or registries.
> >
> > The NiFi Registry security model is based around permissions to
> > buckets (read/write), and all versioned items belong to a bucket. So
> > you could think of each bucket as a mini extension repository for
> > which you can control access to specific users and groups, so there
> > could be a bucket of extensions for each of the NiFi instances in your
> > example. There can also be multiple registry instances registered with
> > a given NiFi so extensions can be pulled from multiple registries.
> >
> > The extensions an instance needs are based on the flows it is running;
> > the flows already have specific bundle coordinates for every component
> > in the flow. You can think of it as similar to Maven, where you declare
> > a dependency on library foo and the build goes out and gets it for you;
> > in this case it is a flow that declares a dependency on a bundle.
> >
> > Mike,
> >
> > Bundles would need to be uploaded to NiFi Registry (to a specific
> > bucket) as part of some TBD release process. At a minimum I was
> > envisioning NiFi CLI commands that can be pointed to a file or a
> > directory and upload the given bundles to registry. There could be
> > other options as well, possibly through a Maven plugin to release
> > directly into registry, or possibly to have a type of extension in
> > NiFi Registry that actually points to an external location, i.e. all
> > the NARs that end up in Maven central could somehow be imported into
> > NiFi Registry, but with pointers back to the content which actually
> > comes from Maven central. Lot of things to figure out here.
> >
> > -Bryan
> > On Tue, Nov 13, 2018 at 1:16 PM Joe Witt  wrote:
> > >
> > > Group selection based on tag names for bundles could probably do that.
> > > Meaning it could be a sorting/filtering mechanism in the NiFi/Registry
> > > interface perhaps.  Will be good to consider that UX as that
> > > progresses.
> > >
> > > As far as the different environments NiFi instances would certainly be
> > > able to load only referenced Nars for versioned flows so you'll get
> > > the optimal set (at runtime) automatically.  Very powerful.
> > > On Tue, Nov 13, 2018 at 1:12 PM Mark Bean 
> wrote:
> > > >
> > > > Joe,
> > > >
> > > > I envision the Registry being able to provide a subset of NARs
> > required for
> > > > a specific NiFi instance. The user may have a relatively small set of
> > NARs
> > > > required for a NiFi used for basic routing/distribution, and a
> > different
> > > > more extensive set of NARs required for a more robust NiFi instance
> > which
> > > > performs various forms of processing/transformations. The grouping I
> am
> > > > describing would be a way to select multiple NARs required for a
> > specific
> > > > NiFi instance.
> > > >
> > > > Expanding the scenario a little farther, suppose an integration/test
> > > > environment defines the group. Then, the production environment can
> > use the
> > > > group definition to pull (or ensure it possesses) only the relevant
> > NARs
> > > > necessary.
> > > >
> > > > -Mark
> > > >
> > > > On Tue, Nov 13, 2018 at 1:00 PM Joe Witt  wrote:
> > > >
> > > > > Mark
> > > > >
> > > > > Can you describe your use case from the user perspective both for
> the
> > > > > entity that would upload the items and demarcate them as a group as
> > > > > well as the user that would consume those bundles?
> > > > >
> > > > > I ask because the point here is that nars are themselves a 'group'
> in
> > > > > that they are a logical/contained grouping of extensions.  These
> can
> > > > > have relationships to other nars as we know.  And flows are
> designed
> > > > > against specific components that 

Re: ExecuteGroovyScript processor unable to resolve IOUtils

2018-04-22 Thread Andrew Grande
Max, which NiFi version are you using? Can you try adding the @Grab
annotation in your script, declaring the commons-io dependency? IIRC, it was
added recently.

Andrew

On Sun, Apr 22, 2018, 7:38 AM Max Viazovskyi  wrote:

> Recently I needed to write a custom script to override flow file content.
> When the script was prepared I found that it can be executed with the
> ExecuteScript processor, but ExecuteGroovyScript fails to compile the same
> script; it shows the error: unable to resolve class import
> org.apache.commons.io.IOUtils.
> <
> http://apache-nifi-developer-list.39713.n7.nabble.com/file/t949/groovy_script_error.png>
>
>
> Script is the following:
> import org.apache.commons.io.IOUtils
> import java.nio.charset.*
>
> def flowFile = session.get()
> if (!flowFile) return
>
>
> flowFile = session.write(flowFile,
>     { inputStream, outputStream ->
>         def text = org.apache.commons.io.IOUtils.toString(inputStream,
>             StandardCharsets.UTF_8)
>         text = text + '\n' + new Date()
>         outputStream.write(text.getBytes(StandardCharsets.UTF_8))
>     } as StreamCallback)
>
> session.transfer(flowFile, ExecuteScript.REL_SUCCESS)
>
> Also you could check it with template  GroovyScriptIsInvalid.xml
> <
> http://apache-nifi-developer-list.39713.n7.nabble.com/file/t949/GroovyScriptIsInvalid.xml>
>
>
> Thanks,
> Max
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/
>


Re: ListSFTP incoming relationship

2018-03-27 Thread Andrew Grande
The key here is that a ListXXX processor maintains state, and the directory
is part of that state. Allowing arbitrary directories via an expression would
create a never-ending stream of new entries in the state storage, effectively
engineering a distributed DoS attack on the NiFi node or the shared ZK quorum
(for when state is stored there).

Maybe if we focus on thinking about assumptions and restrictions the
processor should make to contain that risk...

Andrew

On Tue, Mar 27, 2018, 9:56 AM Bryan Bende  wrote:

> I'm not sure that would solve the problem because you'd still be
> limited to one directory. What most people are asking for is the
> ability to use a dynamic directory from an incoming flow file.
>
> I think we might be trying to fit two different use-cases into one
> processor which might not make sense.
>
> Scenario #1... There is a directory that is constantly receiving new
> data and has a significant amount of files, and I want to periodically
> find new files. This is what the current processors are optimized for.
>
> Scenario #2... There is a directory that is mostly static with a
> moderate/small number of files, and at points in my flow I want to
> dynamically perform a listing of this directory and retrieve the
> files. This is more geared towards the mentality of running a
> job/workflow.
>
>
>
>
> On Tue, Mar 27, 2018 at 9:36 AM, Otto Fowler 
> wrote:
> > What if the changes were ‘on top of’ some base set of properties, like
> > directory?
> > Like a filter, which, if present on the incoming file, would have the
> > LIST* processor list only things that match a name or attribute?
> >
> >
> >
> > On March 27, 2018 at 00:08:41, Joe Witt (joe.w...@gmail.com) wrote:
> >
> > Scott
> >
> > This idea has come up a couple of times and there is definitely
> > something intriguing to it. Where I think this idea stalls out though
> > is in implementation.
> >
> > While I agree that the other List* processors might similarly benefit
> > lets focus on ListFile. Today you tell ListFile what directory to
> > start looking for files in. It goes off scanning that directory for
> > hits and stores state about what it has already searched/seen. And it
> > is important to keep track of how much it has already scanned because
> > at times the search directory can be massive (hundreds of thousands or
> > more files and directories to scan, for example).
> >
> > In the proposed model the directory to be scanned could be provided
> > dynamically by looking at an attribute of an incoming flowfile (or
> > other criteria can be provided - not just the directory to scan). In
> > this case the ListFile processor goes on scanning against that now.
> > What about the previous directory (or directories) it was told to
> > scan? Does it still track those too? What if it starts scanning the
> > newly provided directory, hasn't finished pulling all the data or new
> > data is continually arriving, and it is told to switch to another
> > directory.
> >
> > I think if those questions can get solid answers and someone invests
> > time in creating a PR then this could be pretty powerful. Would be
> > good to see a written description of the use case(s) for this too.
> >
> > Thanks
> > Joe
> >
> > On Mon, Mar 26, 2018 at 11:58 PM, scott  wrote:
> >> Hello Devs,
> >>
> >> I would like to request a feature to a major processor, ListSFTP. But
> > before
> >> I do down the official road, I wanted to ask if anyone thought it was a
> >> terrible idea or impossible, etc. The request is to add support for an
> >> incoming relationship to the ListSFTP processor specifically, but I
> could
> >> see it added to many of the commonly used head processes, such as
> > ListFile.
> >> I would envision functionality more like InvokeHTTP or ExecuteSQL, where
> > an
> >> incoming flow file could initiate the action, and the attributes in the
> >> incoming flow file could be used to configure the processor actions.
> It's
> >> the configuration aspect that most appeals to me, because it opens it up
> > to
> >> being centrally or dynamically configured.
> >>
> >> Thanks,
> >>
> >> Scott
> >>
>


Re: [EXT] Re: NiFi Versioned Process Group Status Icons

2018-03-26 Thread Andrew Grande
I would factor in any plans for a local version history; it may add new
requirements to this status icon. Just saying there's potentially more to
that logic.

Andrew

On Sun, Mar 25, 2018, 10:17 PM Peter Wicks (pwicks) 
wrote:

> Matt,
>
> If you really want to enable awareness of this feature, enable the
> "Version" menu option even if no NiFi Registry has been enabled, and when
> the user clicks, "Start version control" take them to the help section for
> starting up a version control server and registering it.
>
> The existing "Up to date" and "Stale" status icons that show in the upper
> left corner are great. They only show up if the Process Group is versioned,
> and they aren't distracting you from the status of your flow in general;
> simple single icon. But the five new status icons at the bottom, even if I
> am actively using NiFi Registry, feel very busy and distracting.
>
> I am excited about NiFi Registry. I am just starting to use it and I think
> it's going to solve a lot of issues.
>
> Thanks,
>   Peter
>
> -Original Message-
> From: Matt Gilman [mailto:matt.c.gil...@gmail.com]
> Sent: Saturday, March 24, 2018 04:18
> To: dev@nifi.apache.org
> Subject: [EXT] Re: NiFi Versioned Process Group Status Icons
>
> Peter,
>
> The status icon for a specific Process Group is hidden until that group is
> versioned. That icon is positioned next to the group name.
>
> For the icons in the status bar and in the bottom of the Process Group, we
> were ultimately just trying to remain consistent. These reflect the counts
> of the encapsulated versioned Process Groups and does not include itself.
> Like we have with the status icons for the encapsulated Processors and
> Ports. Even if this Process Group is not configured to have any
> encapsulated components, we still render the counts as zero.
>
> Additionally, the presence of the icons helps drive awareness of the
> feature.
>
> Rob or Drew may have some additional insight to add. Hope this helps
>
> Matt
>
> On Fri, Mar 23, 2018 at 12:10 AM, Peter Wicks (pwicks) 
> wrote:
>
> > Why does NiFi show status icons for Versioned Process Groups on
> > servers that are not configured to connect to a NiFi Registry?
> >
> > Thanks,
> >   Peter
> >
>


Re: [DISCUSS] MiNiFi Flow Designer

2018-03-01 Thread Andrew Grande
My 2c.

Using NiFi Registry for both NiFi and MiNiFi flows is ideal. For the
designer, IMO, adding another process/service to host it is not great, as it
adds to the complexity of the overall ecosystem (the NiFi/MiNiFi Java/C++
stack is already quite involved with the C2 server, etc.). I would like to
see the familiar NiFi UI concept evolve to have a 'profile', e.g. on a group
level. This profile would dictate which processors are available for that
particular runtime. The same profile would be associated with a flow in the
registry.

The main concept is to stay within the same familiar design environment for
all deployment targets, but make NiFi understand what set of available
components and behaviors are available for the current target at hand.

Andrew

On Thu, Mar 1, 2018, 7:30 PM Jeff Zemerick  wrote:

> I think it sounds like a good idea. The majority of the MiNiFi flows I have
> made are fairly simple, and I have not found the YAML authoring to be
> overly burdensome. The more complex flows were done in NiFi and converted.
> So as long as the option to make the flows by hand remains (and I don't
> think there was anything in the proposal to the contrary), I think it would
> be a beneficial addition.
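[Editor's note: for readers unfamiliar with the hand-authored YAML Jeff mentions, a minimal sketch follows. The keys are approximated from the MiNiFi System Admin Guide and the exact schema depends on the config version; treat every key and value as illustrative.]

```yaml
# Minimal MiNiFi config.yml sketch (keys approximate, version-dependent).
MiNiFi Config Version: 3
Flow Controller:
  name: tail-and-ship
Processors:
  - name: TailAppLog
    class: org.apache.nifi.processors.standard.TailFile
    scheduling strategy: TIMER_DRIVEN
    scheduling period: 1 sec
    Properties:
      File to Tail: /var/log/app.log
```

For anything beyond a couple of processors, designing in NiFi and converting the flow, as Jeff describes, is usually the easier route.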
>
> Jeff
>
>
> On Thu, Mar 1, 2018 at 2:21 PM, Scott Aslan  wrote:
>
> > AndyC/Aldrin,
> >
> > The UI/UX design system sub project currently under discussion will be
> > based off of the work completed during the NiFi Registry and will not
> > contain any UI/UX functionality from NiFi. The long term goal would be to
> > upgrade angular in the NiFi UI in order to leverage this new design
> system.
> >
> > -Scotty
> >
> >
> > On Thu, Mar 1, 2018 at 1:45 PM, Kevin Doran  wrote:
> >
> > > Aldrin,
> > >
> > > Thanks a lot for taking the time to write up such a detailed and well
> > > thought out proposal.
> > >
> > > I really like the idea of publishing flows to NiFi Registry that then
> get
> > > deployed to MiNiFi agents via the C2 Server. I don't have much to add
> > > there, and I think that you've identified one of the big missing parts
> to
> > > make that experience user-friendly -- how to design and author the
> flows
> > in
> > > the first place.
> > >
> > > The MiNiFi Flow Designer would fill that need in a really elegant way
> > with
> > > a low learning curve for NiFi users. Having the ability to design and
> > > author flows in an environment other than the "runtime" environment one
> > is
> > > targeting would be a huge step forward. Not just for MiNiFi, but
> > > multi-environment deployments of NiFi, where one might be developing a
> > flow
> > > in an environment other than where they intend to run it. This would
> make
> > > the SDLC options for NiFi flows even more flexible, which is a big part
> > of
> > > the goal of the new NiFi Registry.
> > >
> > > It certainly presents some challenges, but I definitely think the
> upside
> > > for MiNiFi flow authoring would be huge.
> > >
> > > Thanks,
> > > Kevin
> > >
> > > On 2/28/18, 07:53, "Aldrin Piri"  wrote:
> > >
> > > It appears I accidentally left off the actual links.  Here they are
> > in
> > > all
> > > their glory:
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/MINIFI/
> > > MiNiFi+Command+and+Control
> > > [2]
> > > https://github.com/apache/nifi-minifi/blob/master/
> > > minifi-docs/src/main/markdown/System_Admin_Guide.md#config-file
> > > [3] https://issues.apache.org/jira/browse/MINIFICPP-36
> > > [4] https://github.com/apache/nifi-minifi/tree/master/minifi-c2
> > > [5]
> > > https://github.com/apache/nifi-minifi/tree/master/
> > > minifi-c2#configuration-providers
> > > [6]
> > > https://github.com/apache/nifi-minifi/blob/master/
> > > minifi-docs/src/main/markdown/System_Admin_Guide.md#
> > > automatic-warm-redeploy
> > > [7]
> > > https://cwiki.apache.org/confluence/display/NIFI/
> > > Extension+Repositories+%28aka+Extension+Registry%29+for+
> > > Dynamically-loaded+Extensions
> > > [8]
> > > https://cwiki.apache.org/confluence/display/MINIFI/
> > >
> MiNiFi+Command+and+Control#MiNiFiCommandandControl-FlowAuthorshipDetails
> > >
> > >
> > > On Wed, Feb 28, 2018 at 1:53 AM, Aldrin Piri  >
> > > wrote:
> > >
> > > > Hey folks,
> > > >
> > > > With the release of Registry, I’ve been contemplating leveraging
> it
> > > and
> > > > the current codebase of NiFi to facilitate ease of use for MiNiFi
> > > flow
> > > > design.  One of the areas I would like to form a concerted effort
> > > around is
> > > > that of the Command and Control (C2) functionality originally
> > > presented
> > > > when the MiNiFi effort was started and further expounded upon
> with
> > a
> > > > feature proposal [1].  In that proposal, while the names are
> dated,
> > > we have
> > > > components that fulfill some of the core building 

Re: Re: Re: Re: Re: Is there a REST API to run a dataflow on demand?

2018-02-22 Thread Andrew Grande
One could write a script and call it in 1 step. I don't believe there is
anything available OOTB.
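[Editor's note: a sketch of such a script. It assumes the NiFi 1.x REST API's PUT /processors/{id}/run-status endpoint; the revision handling is simplified, and a real script should GET the processor first and reuse its current revision.]

```shell
# Not OOTB: a one-step "run on demand" helper built on the NiFi REST API.
# Assumption: PUT /processors/{id}/run-status (NiFi 1.x); revision simplified.
NIFI_API="${NIFI_API:-http://localhost:8080/nifi-api}"

set_run_status() {  # usage: set_run_status <processor-id> <RUNNING|STOPPED>
  curl -s -X PUT -H 'Content-Type: application/json' \
    -d "{\"revision\":{\"version\":0},\"state\":\"$2\"}" \
    "$NIFI_API/processors/$1/run-status"
}

run_once() {        # start the flow's source processor, wait, stop it again
  set_run_status "$1" RUNNING
  sleep "${2:-10}"
  set_run_status "$1" STOPPED
}
```

Calling `run_once <source-processor-id>` triggers the flow without touching its cron schedule, which matches Boying's use case.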

Andrew

On Thu, Feb 22, 2018, 7:58 PM  wrote:

>  Thanks a lot for your help.
>
> Yes. that is what I do to trigger a dataflow on demand.
> But I want to know if there is an API that I can do this in one step.
>
>
>
> From:
> "Daniel Chaffelson" 
> To:
> dev@nifi.apache.org
> Date:
> 2018/02/23 04:46
> Subject:
> Re: Re: Re: Re: Is there a REST API to run a dataflow on demand?
>
>
>
> Hi Boying,
>
> I have been working on a NiFi Python Client SDK that might help you here,
> as the goal is to be able to replicate everyday actions taken in the NiFi
> GUI as well as extending it for CICD/SDLC work.
> For example with the following commands you would:
>
>1. get the reference object for a processor
>2. stop it if it is running
>3. change the scheduling period to 3s (or most other parameters)
>4. start it again
>
>
> import nipyapi
> processor = nipyapi.canvas.get_processor('MyProcessor')
> nipyapi.canvas.schedule_processor(processor, scheduled=False)
> update = nipyapi.nifi.ProcessorConfigDTO(
>     scheduling_period='3s'
> )
> processor = nipyapi.canvas.update_processor(processor, update)
> nipyapi.canvas.schedule_processor(processor, scheduled=True)
>
> If you need a different set of steps then please let me know and perhaps I
> can help.
> Those commands are currently in the master branch awaiting release:
> https://github.com/Chaffelson/nipyapi
>
> Thanks,
> Dan
>
> On Thu, Feb 22, 2018 at 7:41 AM  wrote:
>
> > Thanks very much, I'll try your suggestions.
> >
> >
> >
> > From:
> > James Wing 
> > To:
> > NiFi Dev List 
> > Date:
> > 2018/02/22 14:05
> > Subject:
> > Re: Re: Re: Is there a REST API to run a dataflow on demand?
> >
> >
> >
> > The NiFi API can be used to start and stop processors or process groups,
> > and this might solve your use case. But NiFi does not have an API to run
> > a processor only once, immediately, separate from its configured
> > schedule. I have solved similar problems in the past by creating two
> > separate upstream sources - one for scheduled operation, and one for
> > ad-hoc operation.
> > GenerateFlowFile, GetFile, or similar processors can be used to inject a
> > flowfile where you need to kick off the flow.
> >
> > Thanks,
> >
> > James
> >
> > On Wed, Feb 21, 2018 at 5:57 PM,  wrote:
> >
> > > Thanks a lot.
> > >
> > > But I want to know if there is a REST API that triggers a dataflow on
> > > demand?
> > > I don't find the API in the page.
> > >
> > >
> > >
> > >
> > > From:
> > > Charlie Meyer 
> > > To:
> > > dev@nifi.apache.org
> > > Date:
> > > 2018/02/22 09:36
> > > Subject:
> > > Re: Is there a REST API to run a dataflow on demand?
> > >
> > >
> > >
> > > Yep, when you make the changes in the UI, open developer tools in your
> > > browser and see what calls to the nifi api it is making then mimic
> those
> > > with code.
> > >
> > > The nifi team also kindly publishes
> > > https://nifi.apache.org/docs/nifi-docs/rest-api/index.html which help
> a
> > > lot.
> > >
> > > Best of luck!
> > >
> > > -Charlie
> > >
> > > On Wed, Feb 21, 2018 at 7:34 PM,  wrote:
> > >
> > > > Hi, team,
> > > >
> > > > We set up several NiFi dataflows for data processing.
> > > > These dataflows are configured to run once per day in the midnight.
> > > >
> > > > But sometimes, some dataflows are failed,I want to run the dataflow
> > > again
> > > > immediately after fixing the issue instead of waiting for running it
> > in
> > > > the midnight to
> > > > make sure that the issue is really fixed.
> > > >
> > > > The only way I know to do this is to change the time of running the
> > > > dataflow to the 5 mintutes from now for example
> > > > and then change it back to midnight.
> > > >
> > > > It's a little inconvenient.
> > > >
> > > > Is there any REST API that I can use to trigger the dataflow on
> demand
> > > > i.e. without change the time back and forth?
> > > >
> > > > Thanks
> > > >
> > > > Boying
> > > >
> > > >
> > > >
> > > >
> > > > This email message may contain confidential and/or privileged
> > > > information. If you are not the intended recipient, please do not
> > > > read, save, forward, disclose or copy the contents of this email or
> > > > open any file attached to this email. We will be grateful if you
> > > > could advise the sender immediately by replying to this email, and
> > > > delete this email and any attachment or links to this email
> > > > completely and immediately from your computer system.
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > 

Re: Will you accept contributions in Scala?

2018-02-10 Thread Andrew Grande
Wasn't there a warning trigger about the NiFi distro size from Apache
recently? IMO, before talking about alternative languages, solve the
modularity and NAR distribution problem. The implementation language of a
module won't matter much then; the point is that not everything has to go
into the core base distribution, but it can still be easily sourced from a
known repo, for example.

I have a feeling NiFi 1.6+ can be approaching 2GB distro size soon :)

Andrew

On Sat, Feb 10, 2018, 5:12 PM Joey Frazee  wrote:

> This probably necessitates a vote, yeah?
>
> Frankly, I’m usually happier writing Scala, and I’ve not encountered any
> problems using processors written in Scala, but I think it’ll be important
> to tread lightly.
>
> There’s a few things that pop into my head:
>
> - Maintainability and reviewability. A very very good Java developer need
> not, by definition, either know how to write or identify good Scala or spot
> problems and bugs.
> - Every Scala processor would either end up with a 5MB scala-lang.jar
> packaged into the .nar or we’d have to start including it in the core
> somewhere, if it’s not. It’s possible it might have already gotten pulled
> up from other dependencies.
> - Style. There’s a tremendous amount of variation in Scala style because
> of its type system, implicits, macros, and functional nature. There are
> very good people out there that can write good Scala that isn’t readable by
> the 99%.
> - Binary compatibility. Scala tends to be a little more brazen about
> breaking binary compatibility in major releases and those happen a bit more
> often than with Java. That’s not a problem for any potential source code in
> the project, but it could present some dependency issues someday.
> - Testing. There’s N > 1 test frameworks and testing styles within those,
> so there’s a lot of options for introducing more variability into the tests.
> - NiFi uses a lot of statics in setting up properties and relationships
> and the like, and idiomatic Scala approaches that stuff a bit differently,
> so it’ll be necessary to impose some style guidelines so there isn’t too
> much variation.
>
> That said, there are some things that won’t be problematic:
>
> - As mentioned, processors written in Scala do just work.
> - The scala-maven-plugin works just fine allowing mixed Java-Scala
> projects (btw, it’d probably be not super great to do mixed Java-Scala and
> mixed Maven-SBT though).
> - A lot of the above concerns could be addressed by having clear style
> guidelines.
>
> Another thing: most of the projects I see deliver separate jars for Scala
> components are delivering idiomatic APIs wrapping Java (or vice versa). I
> think publishing a separate set of jars/nars for stuff written in Scala
> would be odd since here it’d mostly be processors with new functionality
> and not functionality for using Scala. I could imagine a lib of implicits,
> traits, classes that could make the Scala development more enjoyable. That
> probably would make sense to deliver that way.
>
> -joey
>
> On Feb 10, 2018, 10:33 AM -0600, Bryan Bende , wrote:
> > I agree more with Andy about sticking with Java. The more varying
> languages
> > used, the more challenging it is to maintain. Once the code is part of
> the
> > Apache NiFi git repo, it is now the responsibility of the committers and
> > PMC members to maintain it.
> >
> > I’d even say I am somewhat against the groovy/Spock test code that Andy
> > mentioned. I have frequently spent hours trying to fix a Spock test that
> > broke from something I was working on. Every committer is familiar with
> > JUnit, but only a couple know Spock. Just using this as an example that
> > every committer knows Java, but only a couple probably know Scala,
> Clojure,
> > etc.
> >
> > On Sat, Feb 10, 2018 at 10:25 AM Jeff  wrote:
> >
> > > +1 to Joe's response. If you can develop a component in Groovy or Scala
> > > (or Clojure!) more quickly/comfortably, or if allowing components
> written
> > > in other languages would encourage people to contribute more, I'm all
> for
> > > it.
> > >
> > > On Sat, Feb 10, 2018 at 7:42 AM Joe Witt  wrote:
> > >
> > > > i personally would be ok with it for an extension/processor provided
> it
> > > > integrates well with the build.
> > > >
> > > > i would agree with andys view for core framework stuff but for
> > > extensions i
> > > > think we can do it like mikethomsen suggested.
> > > >
> > > > others?
> > > >
> > > > thanks
> > > > joe
> > > >
> > > > On Feb 10, 2018 7:30 AM, "Mike Thomsen" 
> wrote:
> > > >
> > > > > I'm just a community contributor, so take that FWIW, but a
> compromise
> > > > might
> > > > > be to publish the Scala code as separate maven modules to maven
> central
> > > > and
> > > > > then submit a thoroughly tested processor written in Java. As long
> as
> > > you
> > > > > have enough unit and integration tests to give 

Re: clear all flowfiles in all queues upon NiFi restart

2018-01-11 Thread Andrew Grande
Perhaps you could delete the repository directories when you need to
restart with no data?

On Thu, Jan 11, 2018, 9:16 PM 尹文才  wrote:

> Hi Mark, forgot to ask about VolatileFlowFileRepository you mentioned, if I
> switch to use VolatileFlowFileRepository, will NiFi swap out all the other
> FlowFiles to disk if a queue is already full?
> Is it just simply keeping all FlowFiles in memory?
>
> Regards,
> Ben
>
> 2018-01-12 12:07 GMT+08:00 尹文才 :
>
> > Thanks Mark, my case is that I'm using NiFi to do some ETL work and it's
> > possible that NiFi dies unexpectedly due to lack of system resources.
> After
> > NiFi restarts itself,
> > I will re-extract all the data from database and re-perform all the
> > operations, so I need to clear all possible FlowFiles that might exist in
> > any queue.
> >
> > Regards,
> > Ben
> >
> > 2018-01-12 11:49 GMT+08:00 Mark Payne :
> >
> >> Ben,
> >>
> >> I have to admit - that’s kind of an odd request :) I’m curious what the
> >> use case is, if you can share?
> >>
> >> Regardless, the easiest way would be to update nifi.properties so that
> >> the FlowFile repo that is used is the VolatileFlowFileRepository. This
> >> would avoid writing the FlowFile state to disk, so on restart you will
> lose
> >> all FlowFiles. The content will still be present, but nifi will delete
> it
> >> all on startup because there is no FlowFile associated with it.
> >>
> >> I’m on my phone right now so can’t easily tell you the exact name of the
> >> property to change but you’ll probably find it pretty quickly. The Admin
> >> Guide may well explain the different repositories as well.
> >>
> >> Thanks
> >> -Mark
> >>
> >> Sent from my iPhone
> >>
> >> > On Jan 11, 2018, at 10:31 PM, 尹文才  wrote:
> >> >
> >> > Hi guys, I'm trying to clear all FlowFiles in all queues when NiFi is
> >> > restarted, but I don't know the correct way to do this. I checked all
> >> > NiFi's guide documentation,
> >> > it seems there're 2 possible solutions:
> >> > 1. write a custom notification service: a notification service could
> be
> >> > notified when NiFi is restarted and then inside the service, delete
> all
> >> the
> >> > files inside content_repository, flowfile_repository and
> >> > provenance_repository.
> >> >   I know there're now 2 existing services: email and http. But I'm not
> >> > quite sure how to correctly write one and deploy it into my NiFi
> >> > environment, is there a tutorial on writing one notification service?
> >> >
> >> > 2. I know from the developer guide that by using the annotation
> >> @Shutdown
> >> > in a custom processor, the method could be called when NiFi is
> >> successfully
> >> > shut down. The problem with this approach is the method could
> >> >   not be guaranteed to be called when NiFi dies unexpectedly.
> >> >
> >> > Does anyone know what is the correct way to implement it? Thanks.
> >> >
> >> > Regards,
> >> > Ben
> >>
> >
> >
>
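
For reference, the property Mark mentions is the FlowFile repository implementation in conf/nifi.properties. A sketch (class name as used in the 1.x line; verify against the Admin Guide for your version):

```
# Default is the persistent write-ahead repository; the volatile one keeps
# FlowFile state in memory only, so all FlowFiles are lost on restart.
nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.VolatileFlowFileRepository
```

Note that, as Mark says, the content repository still persists to disk; orphaned content is cleaned up on startup.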


Re: Question about Nifi

2017-10-18 Thread Andrew Grande
Check out the nifi-app.log file under $NIFI_HOME/logs.

This is also customizable by editing the conf/logback.xml file.

Andrew
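
For reference, the default conf/logback.xml routes processor logging (getLogger().info()/error()) to logs/nifi-app.log via an appender roughly like the following abbreviated sketch (check the file shipped with your install):

```xml
<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
    <!-- rolling policy and encoder omitted; edit here to change path/format -->
</appender>
```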

On Wed, Oct 18, 2017, 7:58 AM ☆★☆★☆ <1506877...@qq.com> wrote:

> Hi,
>
> I am using nifi to develop some custom plugin(By Java),
> can you please tell me after I use getLogger().error(), or use
> getLogger().info(), where can I find the log file, where is the log file's
> path and filename
>
> BR


Re: AvroSchemaRegistry doesn't enjoy copy and paste?

2017-06-22 Thread Andrew Grande
Definitely something is auto replacing quotes, I can confirm pasting worked
fine before from a programmer's editor.

Andrew

On Thu, Jun 22, 2017, 9:06 AM Mark Payne  wrote:

> Andre,
>
> I've not seen this personally. I just clicked on the link you sent, copied
> the schema,
> and pasted it in, and it did not have any problems. What application are
> you copying
> the text from? I've certainly seen that some applications (specifically
> Microsoft Outlook
> and Office) love to take double-quotes and change them into other
> characters so that
> they look nicer. But if you then copy that and paste it, it is not pasting
> a double-quote but
> some other unicode character.
>
> Would recommend you open the below link in Chrome and copy from there and
> see if
> that works?
>
> Thanks
> -Mark
>
>
>
> > On Jun 22, 2017, at 8:56 AM, Andre  wrote:
> >
> > All,
> >
> > I was playing with the AvroSchemaRegistry and noticed it seems to not
> play
> > ball when the DFM pastes the schema into the dynamic property value.
> >
> > To test it I basically copied the demo schema from Mark's blog post[1]
> and
> > pasted into a NiFi 1.3.0 instance. To my surprise the controller would
> not
> > validate, instead it displayed:
> >
> > "was expecting double-quote to start field name"
> >
> > I also faced similar errors using the following schema:
> >
> >
> https://github.com/fluenda/SecuritySchemas/blob/master/CEFRev23/cefRev23_nifi.avsc
> >
> > Has anyone else seen this?
> >
> > Cheers
>
>
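
As an aside for anyone hitting this: the smart-quote substitution Mark describes is easy to undo programmatically. A minimal Java sketch (the pasted schema fragment is made up) that maps the common typographic double quotes back to ASCII before the text is used as JSON:

```java
// Some editors/mail clients (notably Outlook/Office) replace ASCII double
// quotes with typographic ("smart") quotes, which a JSON parser rejects with
// "was expecting double-quote to start field name".
public class SmartQuotes {

    static String normalizeQuotes(String s) {
        return s.replace('\u201C', '"')   // left double quotation mark
                .replace('\u201D', '"');  // right double quotation mark
    }

    public static void main(String[] args) {
        String pasted = "{\u201Ctype\u201D: \u201Crecord\u201D}"; // as pasted from Office
        System.out.println(normalizeQuotes(pasted)); // {"type": "record"}
    }
}
```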


Re: SplitText processor OOM larger input files

2017-06-02 Thread Andrew Grande
1 vcore, which is not even a full core (a shared and oversubscribed cpu
core). I'm not sure what you expected to see when you raised concurrency to
10 :)

There's a lot of things NiFi is doing behind the scenes, especially around
provenance recording. I don't recommend anything below 4 cores to have
meaningful experience. If in a cloud, go to 8 cores per VM, unless you are
designing for a low footprint with MiNiFi.

Andrew

On Fri, Jun 2, 2017, 6:30 AM Martin Eden <martineden...@gmail.com> wrote:

> Thanks Andrew,
>
> I have added UpdateAttribute processors to update the file names like you
> said. Now it works, writing out 1MB files at a time (updated the
> MergeContent MaxNumberOfEntries to 1 to achieve that since each line in
> my csv is 100 bytes).
>
> The current flow is:
> ListHDFS -> FetchHDFS -> UnpackContent -> SplitText(5000) -> SplitText(1)
> -> RouteOnContent -> MergeContent -> UpdateAttribute -> PutHDFS
>
>
> -> MergeContent -> UpdateAttribute -> PutHDFS
>
>
> -> MergeContent -> UpdateAttribute -> PutHDFS
>
> So now let's talk performance.
>
> With a 1 node NiFi running on a Google Compute Engine instance with 1 core
> and 3.7 GB RAM and a 20GB disk, when I feed one 300MB zip file
> (uncompressed 2.5GB csv text) to this flow it is basically never finishing
> the job of transferring all the data.
>
> The inbound queue of RouteOnContent is always red and outbound queues are
> mostly green so that indicates that this processor is the bottleneck. To
> mitigate this I increased its number of concurrent tasks to 10 and then
> observed tasks in progress 10, outbound queues temporarily red, avg task
> latency increased from 2ms to 20ms, cpu on box maxed out to 100% by the
> NiFi java process, load avg 5.
>
> I then decreased the number of concurrent tasks of RouteOnContent to 5 and
> the task average time dropped to about half as expected, with cpu still
> 100% taken by the NiFi java process.
>
> The RouteOnContent has 3 simple regexes that it applies.
>
> Questions:
>
> 1. Is it safe to say that I maxed out the performance of this flow on one
> box with 1 core and 3.8 GB ram?
>
> 2. The performance seems a lot lower than expected though which is
> worrying. Is this expected? I am planning to do this at much larger scale,
> hundreds of GBs.
>
> 3. Is the RouteOnContent that I am using hitting NiFi hard? Is this not a
> recommended use case? Is there anything obviously wrong in my flow?
> Doing a bit of digging around in docs, presentations and other people's
> experience it seems that NiFi's sweet spot is routing files based on
> metadata (properties) and not really based on the actual contents of the
> files.
>
> 4. Is Nifi suitable for large scale ETL. Copying and doing simple massaging
> of data from File System A to File System B? From Database A to Database B?
> This is what I am evaluating it for.
>
> I do see how running this on a box with more CPU and RAM, faster disks
> (vertical scaling) would improve the performance and then adding another
> node to the cluster. But I want to first validate the choice of
> benchmarking flow and understand the performance on one box.
>
> Thanks a lot for all the people for helping me on this thread on my NiFi
> evaluation journey. This is a really big plus for community support of
> NiFi.
>
> M
>
>
>
>
>
>
> On Thu, Jun 1, 2017 at 1:30 PM, Andrew Grande <apere...@gmail.com> wrote:
>
> > It looks like your max bin size is 1000 and 10MB. Every time you hit
> those,
> > it will write out a merged file. Update the filename attribute to be
> unique
> > before writing via PutHDFS.
> >
> > Andrew
> >
> > On Thu, Jun 1, 2017, 2:24 AM Martin Eden <martineden...@gmail.com>
> wrote:
> >
> > > Hi Joe,
> > >
> > > Thanks for the explanations. Really useful in understanding how it
> works.
> > > Good to know that in the future this will be improved.
> > >
> > > About the appending to HDFS issue let me recap. My flow is:
> > > ListHDFS -> FetchHDFS -> UnpackContent -> SplitText(5000) ->
> SplitText(1)
> > > -> RouteOnContent -> MergeContent -> PutHDFS -> hdfs://dir1/f.csv
> > >
> > >
> > > -> MergeContent -> PutHDFS -> hdfs://dir2/f.csv
> > >
> > >
> > > -> MergeContent -> PutHDFS -> hdfs://dir3/f.csv
> > >
> > > ListHDFS is monitoring an input folder where 300MB zip files are added
> > > periodically. Each file uncompressed is 2.5 GB csv.
> > >
> > > So I am writing out to hdfs from multiple P

Re: SplitText processor OOM larger input files

2017-06-01 Thread Andrew Grande
It looks like your max bin size is 1000 and 10MB. Every time you hit those,
it will write out a merged file. Update the filename attribute to be unique
before writing via PutHDFS.

Andrew
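
For the archives: one hedged way to do that with UpdateAttribute (the exact expression is not from this thread; `filename` is the standard attribute PutHDFS writes under):

```
filename = ${filename}-${now():toNumber()}-${UUID()}
```

now():toNumber() yields epoch milliseconds and UUID() guards against two merges landing in the same millisecond; both are standard NiFi Expression Language functions.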

On Thu, Jun 1, 2017, 2:24 AM Martin Eden  wrote:

> Hi Joe,
>
> Thanks for the explanations. Really useful in understanding how it works.
> Good to know that in the future this will be improved.
>
> About the appending to HDFS issue let me recap. My flow is:
> ListHDFS -> FetchHDFS -> UnpackContent -> SplitText(5000) -> SplitText(1)
> -> RouteOnContent -> MergeContent -> PutHDFS -> hdfs://dir1/f.csv
>
>
> -> MergeContent -> PutHDFS -> hdfs://dir2/f.csv
>
>
> -> MergeContent -> PutHDFS -> hdfs://dir3/f.csv
>
> ListHDFS is monitoring an input folder where 300MB zip files are added
> periodically. Each file uncompressed is 2.5 GB csv.
>
> So I am writing out to hdfs from multiple PutHDFS processors all of them
> having conflict resolution set to *APPEND* and different output folders.
>
> The name of the file will be however the same *f.csv*. It gets picked up
> from the name of the flow files which bear the name of the original
> uncompressed file. This happens I think in the MergeContent processor.
>
> Since all of these processors are running with 1 concurrent task, it seems
> that we cannot append concurrently to hdfs even if we are appending to
> different files in different folders for some reason. Any ideas how to
> mitigate this?
>
> It seems other people have encountered this
> <
> https://community.hortonworks.com/questions/61096/puthdfs-leaseexpiredexception-error-when-running-m.html
> >
> with NiFi but there is no conclusive solution. It does seem also that
> appending to hdfs is somewhat problematic
> <
> http://community.cloudera.com/t5/Storage-Random-Access-HDFS/How-to-append-files-to-HDFS-with-Java-quot-current-leaseholder/td-p/41369
> >
> .
>
> So stepping back, the reason I am doing append in the PutHDFS is because I
> did not manage to find a setting in the MergeContent processors that
> basically allows creation of multiple bundled flow files with the same root
> name but different sequence numbers or timestamps (like f.csv.1, f.csv.2
> ). They all get the same name which is f.csv. Is that possible somehow?
> See my detailed MergeContent processor config below.
>
> So basically I have a 2.5GB csv file that eventually gets broken up in
> lines and the lines gets merged together in bundles of 10 MB but when those
> bundles are emitted to the PutHDFS they have the same name as the original
> file over and over again. I would like them to have a different name based
> on a timestamp or sequence number let's say so that I can avoid the append
> conflict resolution in PutHDFS which is causing me grief right now. Is that
> possible?
>
> Thanks,
> M
>
>
> Currently my MergeContent processor config is:
>
> Merge Strategy: Bin-Packing Algorithm
> Merge Format: Binary Concatenation
> Attribute Strategy: Keep Only Common Attributes
> Correlation Attribute Name: (not set)
> Minimum Number of Entries: 1
> Maximum Number of Entries: 1000
> Minimum Group Size: 0 B
> Maximum Group Size: 10 MB
> Max Bin Age: (not set)
> Maximum number of Bins: 5
> Delimiter Strategy: Text
> Header File: (not set)
> Footer File: (not set)
> Demarcator File: (not set)
> Compression Level: 1
> Keep Path: false
>
>
>
> On Wed, May 31, 2017 at 3:52 PM, Joe Witt  wrote:
>
> > Split failed before even with backpressure:
> > - yes that backpressure kicks in when destination queues for a given
> > processor have reached their target size (in count of flowfiles or
> > total size represented).  However, to clarify why the OOM happened it
> > is important to realize that it is not about 'flow files over a quick
> > period of time' but rather 'flow files held within a single process
> > session.  Your SplitText was pulling a single flowfile but then
> > creating lets say 1,000,000 resulting flow files and then committing
> > that change.  That happens within a session.  But all those flow file
> > objects (not their content) are held in memory and at such high
> > numbers it creates excessive heap usage.  The two phase divide/conquer
> > approach Koji suggested solves that and eventually we need to solve
> > that by swapping out the flowfiles to disk within a session.  We
> > actually do swap out flowfiles sitting on queues after a certain
> > threshold is reached for this very reason.  This means you should be
> > able to have many millions of flowfiles sitting around in the flow for
> > whatever reason and not hit memory problems.
> >
> > Hope that helps there.
> >
> > On PutHDFS it looks like possibly two things are trying to append to
> > the same file?  If yes I'd really recommend not appending but rather
> > use MergeContent to create data bundles of a given size then write
> > those to HDFS.
> >
> > Thanks
> > Joe
> >
> > On Wed, May 31, 2017 at 10:33 AM, 
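
Joe's point about per-session fan-out is the reason for the SplitText(5000) -> SplitText(1) chain used earlier in this thread. A plain-Java sketch of the same divide-and-conquer idea (illustrative only, not NiFi code) — bounding how many pieces any one step produces:

```java
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseSplit {

    // Phase 1: split a large set of lines into coarse chunks (like SplitText 5000);
    // Phase 2: split one chunk at a time into single lines (like SplitText 1).
    // Bounding the fan-out of each step is what keeps per-session heap usage low.
    static List<List<String>> chunk(List<String> lines, int size) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += size) {
            out.add(new ArrayList<>(lines.subList(i, Math.min(i + size, lines.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 12; i++) lines.add("row-" + i);
        List<List<String>> coarse = chunk(lines, 5);       // phase 1: chunks of 5
        System.out.println(coarse.size());                 // 3
        List<List<String>> fine = chunk(coarse.get(0), 1); // phase 2 on one chunk
        System.out.println(fine.size());                   // 5
    }
}
```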

Re: Not able to see the uploaded file in S3

2017-05-25 Thread Andrew Grande
Looks like you have checked the auto-terminate relationship and the upload
fails silently. Make sure you expose the failure relationship from the
PutS3 by sending it e.g. to a funnel.

On Thu, May 25, 2017, 7:00 AM suman@cuddle.ai 
wrote:

> Hi All,
>
> I have a simple flow consists of following processors.
>
> GetFile-->PutS3Object
>
> The flow is successful but not able to see the file in S3.
> When I checked NiFi DataProvenance of PutS3Object in details section i can
> see the below message.
>
> Details
> Auto-Terminated by failure Relationship
>
> in PutS3Object i have given bucket name and used
> AWSCredentialsProviderController Service and the user is having proper
> rights for the S3 Bucket.
>
>
> Please let me know if i am doing anything wrong.
>
>
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/Not-able-to-see-the-uploaded-file-in-S3-tp15978.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>


Re: Partitioning from actual Data (FlowFile) in NiFi

2017-05-15 Thread Andrew Grande
Yes to compress. The output of the merge step is a larger piece of data, no
larger or older than what the merge step is configured for. It can produce
partial, smaller buckets if configured with a max age attribute.

Andrew

On Mon, May 15, 2017, 5:28 AM Anshuman Ghosh 
wrote:

> Thank you so much Bryan :-)
> It is working fine now as the following workflow
>
> *Consume from Kafka ==> *
> *Evaluate JSON path (Timestamp) ==> *
> *Update Attribute to get year, month and day; since we receive a 19 digit
> long Timestamp value , we had to use the following trick
>
> (**${Click.RequestTimestamp:toString():substring(0,13):toNumber():format("",
> "GMT")}**) ==> Convert JSON to Avro ==> *
> *Merge Content on similar Attribute (Timestamp - Date) ==> *
> *Write merged FlowFile onto Google Cloud Storage (GCS) buckets*
>
> Let me know whether it can be further improved.
> Also will it be okay to use a "*CompressContent*" processor right after the
> merge step?
>
>
> Thanking you in advance!
>
> __
>
> *Kind Regards,*
> *Anshuman Ghosh*
> *Contact - +49 179 9090964*
>
>
>
> On Thu, May 11, 2017 at 4:44 PM, Joe Witt  wrote:
>
> > Cool.  Bryan offers a good approach now.  And this JIRA captures a
> > really powerful way to do it going forward
> > https://issues.apache.org/jira/browse/NIFI-3866
> >
> > Thanks
> > Joe
> >
> > On Thu, May 11, 2017 at 10:41 AM, Bryan Bende  wrote:
> > > If your data is JSON, then you could extract the date field from the
> > > JSON before you convert to Avro by using EvaluateJsonPath.
> > >
> > > From there lets say you have an attribute called "time" with the unix
> > > timestamp, you could use an UpdateAttribute processor to create
> > > attributes for each part of the timestamp:
> > >
> > > time.year = ${time:format("yyyy", "GMT")}
> > > time.month = ${time:format("MM", "GMT")}
> > > time.day = ${time:format("dd", "GMT")}
> > >
> > > Then in PutHDFS you can do something similar to what you were already
> > doing:
> > >
> > > /year=${time.year}/month=${time.month}/day=${time.day}/
> > >
> > > As Joe mentioned there is a bunch of new record reader/writer related
> > > capabilities in 1.2.0, and there is a follow JIRA to add a "record
> > > path" which would allow you to extract a value (like your date field)
> > > from any data format.
> > >
> > > On Thu, May 11, 2017 at 10:04 AM, Anshuman Ghosh
> > >  wrote:
> > >> Hello Joe,
> > >>
> > >> Regret for the inconvenience, I would keep that in mind going forward!
> > >>
> > >> Thank you for your suggestion :-)
> > >> We have recently built NiFi from the master branch, so it should be
> > similar
> > >> to 1.2.0
> > >> We receive data in JSON format and then convert to Avro before writing
> > to
> > >> HDFS.
> > >> The date filed here is an Unix timestamp of 19 digit (bigint)
> > >>
> > >> It would be really great if you can help a bit on how we can achieve
> the
> > >> same with Avro here.
> > >> Thanking you in advance!
> > >>
> > >>
> > >> __
> > >>
> > >> *Kind Regards,*
> > >> *Anshuman Ghosh*
> > >> *Contact - +49 179 9090964*
> > >>
> > >>
> > >> On Thu, May 11, 2017 at 3:53 PM, Joe Witt  wrote:
> > >>
> > >>> Anshuman
> > >>>
> > >>> Hello.  Please avoid directly addressing specific developers and
> > >>> instead just address the mailing list you need (dev or user).
> > >>>
> > >>> If your data is CSV, for example, you can use RouteText to
> efficiently
> > >>> partition the incoming sets by matching field/column values and in so
> > >>> doing you'll now have the flowfile attribute you need for that group.
> > >>> Then you can merge those together with MergeContent for like
> > >>> attributes and when writing to HDFS you can use that value.
> > >>>
> > >>> With the next record reader/writer capabilities in Apache NiFI 1.2.0
> > >>> we can now provide a record oriented PartitionRecord processor which
> > >>> will then also let you easily do this pattern on all kinds of
> > >>> formats/schemas in a nice/clean way.
> > >>>
> > >>> Joe
> > >>>
> > >>> On Thu, May 11, 2017 at 9:49 AM, Anshuman Ghosh
> > >>>  wrote:
> > >>> > Hello everyone,
> > >>> >
> > >>> > It would be great if you can help me implementing this use-case
> > >>> >
> > >>> > Is there any way (NiFi processor) to use an attribute (field/
> column)
> > >>> value
> > >>> > for partitioning when writing the final FlowFile to HDFS/ other
> > storage.
> > >>> > Earlier we were using simple system date
> > >>> > (/year=${now():format('yyyy')}/month=${now():format('MM')}/
> > >>> day=${now():format('dd')}/)
> > >>> > for this but that doesn't make sense when we consume old data from
> > Kafka
> > >>> and
> > >>> > want to partition on original date (a date field inside Kafka
> > message)
> > >>> >
> > >>> >
> > >>> > Thank you!
> > >>> > __
> > >>> >
> > >>> > Kind Regards,
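
For readers following the timestamp trick at the top of this thread: a plain-Java sketch of the same logic outside NiFi. The sample timestamp is hypothetical; it mirrors the Expression Language calls substring(0,13) and format(..., "GMT"):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class PartitionFromTimestamp {

    // The 19-digit RequestTimestamp is epoch nanoseconds; the first 13 digits
    // are epoch milliseconds -- the substring(0,13) trick from the thread.
    static long toMillis(String nineteenDigitTimestamp) {
        return Long.parseLong(nineteenDigitTimestamp.substring(0, 13));
    }

    // Mirrors the UpdateAttribute expressions ${time:format("yyyy"/"MM"/"dd", "GMT")}.
    static String partitionPath(long epochMillis) {
        DateTimeFormatter f = DateTimeFormatter
                .ofPattern("'year='yyyy'/month='MM'/day='dd")
                .withZone(ZoneOffset.UTC);
        return f.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // Hypothetical value: 2017-05-11T12:00:00Z expressed in epoch nanoseconds.
        long millis = toMillis("1494504000000000000");
        System.out.println(partitionPath(millis)); // year=2017/month=05/day=11
    }
}
```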

Re: Zookeeper issues at initial Cluster startup

2017-02-28 Thread Andrew Grande
The delay on startup is due to the lack of entropy in your server, which is
typical for cloud environments. I.e. SecureRandom doesn't yet have enough
randomness to initialize and blocks waiting. Search for "nifi entropy" in the
archives; there were workarounds posted.

Andrew
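
The workaround usually posted for this is to point the JVM at the non-blocking entropy source via conf/bootstrap.conf (a sketch; use whatever java.arg index is free in your file):

```
# conf/bootstrap.conf -- use /dev/urandom so SecureRandom does not block
java.arg.15=-Djava.security.egd=file:/dev/./urandom
```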

On Tue, Feb 28, 2017, 3:34 PM Mark Bean  wrote:

> I did indeed have a port problem. Thank you for leading me to that. So, I'm
> using the default port of 2181 for zookeeper. So, I updated the zookeeper
> connect string (in both state-management.xml and nifi.properties) to be:
> FQDN1:2181,FQDN2:2181,FQDN3:2181
>
> I am continuing to use :11001:11000 in place of the recommended :2888:3888
> in the zookeeper.properties file. This is due to available ports.
> server.1=:11001:11000
> server.2=:11001:11000
> server.3=:11001:11000
>
> I also had a cut/paste error. I acutally had:
> nifi.cluster.flow.election.max.candidates=2 (not 201, as originally stated)
> The rationale was that once 2 of 3 Nodes connected, the flow could be
> accepted. In either case, based on your comments, I set this to 3 since
> there are 3 Nodes
> nifi.cluster.flow.election.max.candidates=3
>
> Also, I reduced the wait time to 2 mins (from the default 5 mins) hoping to
> either connect or fail more quickly:
> nifi.cluster.flow.election.max.wait.time=2 mins
>
> I cleaned everything out of state except for the ./state/zookeeper/myid
> file. And, I removed all flow.xml.gz files. Now, I am seeing the same
> "Background retry gave up" errors continuously being reported in the
> nifi-app.log on one Node. The other two Nodes remain hung with the last
> nifi-app.log entry being:
> INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Loaded 122 properties
> from /nifi.properties
>
> As noted earlier, on the Node generating the errors, immediately before
> they begin, I see
> 2017-02-28 14:22:46,489 INFO [main]
> o.a.n.c.l.e.CuratorLeaderElectionManager
> CuratorLeaderElectionManager[stopped=false] Attempted to register Leader
> Election for role 'Cluster Coordinator' but this role is already registered
> 2017-02-28 14:22:53,506 INFO [Curator-Framework-0]
> o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
> 2017-02-28 14:22:53,510 INFO [Curator-ConnectionStateManager-0]
> o.a.n.c.l.e.CuratorLeaderElectionManager
>
> org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@1e175829
> Connection State changed to SUSPENDED
>
> Is it expected that the connection state is SUSPENDED?
> What reasons might cause the two Nodes to apparently hang with no errors or
> warnings before connecting to the cluster? From the log files, I can't even
> tell if the Nodes are trying to connect.
>
>
> On Tue, Feb 28, 2017 at 2:36 PM, Jeff  wrote:
>
> > Mark,
> > There's some copy/paste errors in my last response as well.  Sorry!
> > server.1=:2888:3888
> > server.2=:2888:3888
> > server.3=:2888:3888
> >
> > On Tue, Feb 28, 2017 at 2:31 PM Jeff  wrote:
> >
> > > Mark,
> > >
> > > In my original response, I said that in zookeeper.propertiers, the
> > > server.N properties should be set to the host:port of your ZK server,
> and
> > > that was pretty ambiguous.  It should not be set to the same port as
> > > clientPort.
> > >
> > > As Bryan mentioned, with the default clientPort set to 2181, typically
> > the
> > > server.N properties are set to hostname:2888:3888.  In your case, you
> > might
> > > want to try something like the following, as long as these ports are
> not
> > > currently in use:
> > > server.1=:2888:3888
> > > server.2=:2888:3888
> > > server.3=:2888:3888
> > >
> > > Also, your settings for leader elections:
> > > nifi.cluster.flow.election.max.wait.time=5 mins
> > > nifi.cluster.flow.election.max.candidates=201
> > >
> > > This will wait for 201 election candidates to connect, or 5 minutes.
> You
> > > might want to set the max candidates to 3, since you have 3 nodes in
> your
> > > cluster.
> > >
> > > The contents of ./state/zookeeper look correct, you should be okay
> there.
> > >
> > >
> > > On Tue, Feb 28, 2017 at 2:19 PM Bryan Bende  wrote:
> > >
> > > Mark,
> > >
> > > I am not totally sure, but there could be an issue with the ports in
> > > some of the connect strings.
> > >
> > > In zookeeper.properties there is an entry for clientPort which
> > > defaults to 2181, the value of this property is what should be
> > > referenced in nifi.zookeeper.connect.string and state-management.xml
> > > Connect String, so if you left it alone then:
> > >
> > > FQDN1:2181,FQDN2:2181,FQDN3:2181
> > >
> > > In the server entries in zookeeper.properties, I believe they should
> > > be referencing different ports. For example, when using the default
> > > clientPort=2181 the server entries are typically like:
> > >
> > > server.1=localhost:2888:3888
> > >
> > > From the ZooKeeper docs the definition for these two ports is:
> > >
> > > "There are two port numbers n. The 
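
Pulling the thread's advice together, a sketch of the embedded-ZooKeeper settings for a three-node cluster (hostnames are placeholders; ports assume the defaults are free):

```
# conf/zookeeper.properties
clientPort=2181
server.1=node1.example.com:2888:3888
server.2=node2.example.com:2888:3888
server.3=node3.example.com:2888:3888

# conf/nifi.properties (and the Connect String in state-management.xml)
nifi.zookeeper.connect.string=node1.example.com:2181,node2.example.com:2181,node3.example.com:2181
```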

Re: Central processor repository

2017-02-25 Thread Andrew Grande
Uwe,

Are you aware of this one? https://issues.apache.org/jira/browse/NIFI-2168

Andrew

On Sat, Feb 25, 2017, 7:53 AM u...@moosheimer.com  wrote:

> Pretty good idea!
> Would appreciate a place where we all can upload sources (together with
> some information) and everybody can use or modify it.
>
> Best Regards,
> Uwe
>
> > Am 25.02.2017 um 13:10 schrieb Uwe Geercken :
> >
> > Hello,
> >
> > I remember a while ago that there was a short discussion if it would be
> good to have a central place somewhere where people could upload the
> processors they created and others could download them from this central
> point. It would make life easier compared to surfing the web to find them
> and I believe would also add to the popularity of Nifi.
> >
> > I don't know if this has been discussed more deeply. Are there any plans
> for such a central repository?
> >
> > Rgds,
> >
> > Uwe
>
>


Re: flow as code and minify scaling/isolation

2017-02-24 Thread Andrew Grande
Hi,

I think all processors acting as clients do isolate Kerberos keytabs and
client certificates.

The Kafka situation is a current design limitation of Kafka, not NiFi. The
good news is there's an effort underway to have Kafka not rely on global
singleton config and specify those per connection instead. But this is more
in the Kafka 0.11.x line.

Andrew

On Fri, Feb 24, 2017, 4:23 PM hunter morgan 
wrote:

> thanks for the links.
>
> i'm thinking that having the option of getting a template out of it or
> running in minifi would be good enough. i was sad to find that the rest api
> didn't seem to be included in minifi, so with it, accessible template
> export. i'm gonna look at that this weekend. glad to have more direction.
>
> yeah there is an impedance mismatch so far. but the minifi yaml config
> looks
> like the closest official completed work to such a workflow. i have mixed
> feelings about the flow repository stuff that's going on, but that's
> probably because i'm a dev that likes my existing tools (git, vi, cli
> goodness).
>
> it's hard to provide secure multitenant capability in nifi and isolate
> keytabs/jass/keystores between users, especially when processors use code
> (like kafka clients) that require or document using jvm opts to configure
> global jaas.
>
>
> also i think i wasn't joined to the list or something, so i should find out
> quicker next time there's a response.
>
>
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/flow-as-code-and-minify-scaling-isolation-tp14564p14963.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>


Re: [DISCUSS] Scale-out/Object Storage - taming the diversity of processors

2017-02-21 Thread Andrew Grande
I am observing one assumption in this thread. For some reason we are
implying all these will be hadoop compatible file systems. They don't
always have an HDFS plugin, nor should they as a mandatory requirement.
Untie completely from the Hadoop nar. This allows for effective minifi
interaction without the weight of hadoop libs for example. Massive size
savings where it matters.

For the deployment, it's easy enough for an admin to either rely on a
standard tar or rpm if the NAR modules are already available in the distro
(well, I won't talk registry till it arrives). Mounting a common directory
on every node or distributing additional jars everywhere, plus configs, and
then keeping it consistent across is something which can be avoided by
simpler packaging.

Andrew

On Tue, Feb 21, 2017, 6:47 PM Andre <andre-li...@fucs.org> wrote:

> Andrew,
>
> Thank you for contributing.
>
> On 22 Feb 2017 10:21 AM, "Andrew Grande" <apere...@gmail.com> wrote:
>
> Andre,
>
> I came across multiple NiFi use cases where going through the HDFS layer
> and the fs plugin may not be possible. I.e. when no HDFS layer present at
> all, so no NN to connect to.
>
>
> Not sure I understand what you mean.
>
>
> Another important aspect is operations. Current PutHDFS model with
> additional jar location, well, it kinda works, but I very much dislike it.
> Too many possibilities for a human error in addition to deployment pain,
> especially in a cluster.
>
>
> Fair enough. Would you mind expanding a bit on what sort of  challenges
> currently apply in terms of cluster deployment?
>
>
> Finally, native object storage processors have features which may not even
> apply to the HDFS layer. E.g. the Azure storage has Table storage, etc.
>
>
> This is a very valid point, but I am sure there are exceptions (in this case a NoSQL
> DB operating under the umbrella term of "storage").
>
> I perhaps should have made it more explicit but the requirements are:
>
> - existence of a hadoop compatible interface
> - ability to handle files
>
> Again, thank you for the input, truly appreciated.
>
> Andre
>
> I agree consolidating various efforts is worthwhile, but only within a
> context of a specific storage solution. Not 'unifying' them into a single
> layer.
>
> Andrew
>
> On Tue, Feb 21, 2017, 6:10 PM Andre <andre-li...@fucs.org> wrote:
>
> > dev,
> >
> > I was having a chat with Pierre around PR#379 and we thought it would be
> > worth sharing this with the wider group:
> >
> >
> > I recently noticed that we merged a number of PRs around
> > scale-out/cloud-based object stores into master.
> >
> > Would it make sense to start considering adopting a pattern where
> > Put/Get/ListHDFS are used in tandem with implementations of the
> > hadoop.filesystem interfaces instead of creating new processors, except
> > where a particular deficiency/incompatibility in the hadoop.filesystem
> > implementation exists?
> >
> > Candidates for removal / non merge would be:
> >
> > - Alluxio (PR#379)
> > - WASB (PR#626)
> >  - Azure* (PR#399)
> > - *GCP (recently merged as PR#1482)
> > - *S3 (although this has been in code so it would have to be deprecated)
> >
> > The pattern would be pretty much the same as the one documented and
> > successfully deployed here:
> >
> > https://community.hortonworks.com/articles/71916/connecting-
> > to-azure-data-lake-from-a-nifi-dataflow.html
> >
> > Which means that in the case of Alluxio, one would use the properties
> > documented here:
> >
> > https://www.alluxio.com/docs/community/1.3/en/Running-
> > Hadoop-MapReduce-on-Alluxio.html
> >
> > While with Google Cloud Storage we would use the properties documented
> > here:
> >
> > https://cloud.google.com/hadoop/google-cloud-storage-connector
> >
> > I noticed that specific processors could have the ability to handle
> > properties particular to a filesystem; however, I would like to believe the
> > same issue would plague Hadoop users, and it is therefore reasonable to
> > believe the Hadoop-compatible implementations would have ways of exposing
> > those properties as well?
> >
> > In the case where the properties are exposed, we could perhaps simply
> > adjust the *HDFS processors to use dynamic properties to pass those to the
> > underlying module, therefore providing a way to expose particular settings
> > of the underlying storage platform.
> >
> > Any opinion would be welcome
> >
> > PS-sent it again with proper subject label
> >
>
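The dynamic-property pass-through idea above can be sketched in a few lines. This is not NiFi code: the property names and the framework-property list are hypothetical, and a plain map stands in for Hadoop's Configuration object.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class DynamicPropertySketch {
    // Properties the (hypothetical) *HDFS processor defines itself;
    // everything else is treated as a dynamic property and forwarded
    // to the underlying Hadoop-compatible filesystem configuration.
    static final Set<String> FRAMEWORK_PROPS =
            Set.of("Directory", "Conflict Resolution Strategy");

    public static Map<String, String> toFsConfig(Map<String, String> processorProps) {
        Map<String, String> fsConf = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : processorProps.entrySet()) {
            if (!FRAMEWORK_PROPS.contains(e.getKey())) {
                // e.g. fs.alluxio.impl=..., fs.gs.project.id=...
                fsConf.put(e.getKey(), e.getValue());
            }
        }
        return fsConf;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("Directory", "/data");             // framework property, kept out
        props.put("fs.gs.project.id", "my-project"); // dynamic property, passed through
        System.out.println(toFsConfig(props));
    }
}
```

This is the "dynamic properties to pass those to the underlying module" pattern in miniature: the processor stays storage-agnostic and connector-specific keys ride along untouched.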


Re: [DISCUSS] Scale-out/Object Storage - taming the diversity of processors

2017-02-21 Thread Andrew Grande
Andre,

I came across multiple NiFi use cases where going through the HDFS layer
and the fs plugin may not be possible, e.g. when no HDFS layer is present at
all, so there is no NameNode to connect to.

Another important aspect is operations. Current PutHDFS model with
additional jar location, well, it kinda works, but I very much dislike it.
Too many possibilities for a human error in addition to deployment pain,
especially in a cluster.

Finally, native object storage processors have features which may not even
apply to the HDFS layer. E.g. the Azure storage has Table storage, etc.

I agree consolidating various efforts is worthwhile, but only within a
context of a specific storage solution. Not 'unifying' them into a single
layer.

Andrew

On Tue, Feb 21, 2017, 6:10 PM Andre  wrote:

> dev,
>
> I was having a chat with Pierre around PR#379 and we thought it would be
> worth sharing this with the wider group:
>
>
> I recently noticed that we merged a number of PRs around
> scale-out/cloud-based object stores into master.
>
> Would it make sense to start considering adopting a pattern where
> Put/Get/ListHDFS are used in tandem with implementations of the
> hadoop.filesystem interfaces instead of creating new processors, except
> where a particular deficiency/incompatibility in the hadoop.filesystem
> implementation exists?
>
> Candidates for removal / non merge would be:
>
> - Alluxio (PR#379)
> - WASB (PR#626)
>  - Azure* (PR#399)
> - *GCP (recently merged as PR#1482)
> - *S3 (although this has been in code so it would have to be deprecated)
>
> The pattern would be pretty much the same as the one documented and
> successfully deployed here:
>
> https://community.hortonworks.com/articles/71916/connecting-
> to-azure-data-lake-from-a-nifi-dataflow.html
>
> Which means that in the case of Alluxio, one would use the properties
> documented here:
>
> https://www.alluxio.com/docs/community/1.3/en/Running-
> Hadoop-MapReduce-on-Alluxio.html
>
> While with Google Cloud Storage we would use the properties documented
> here:
>
> https://cloud.google.com/hadoop/google-cloud-storage-connector
>
> I noticed that specific processors could have the ability to handle
> properties particular to a filesystem; however, I would like to believe the
> same issue would plague Hadoop users, and it is therefore reasonable to
> believe the Hadoop-compatible implementations would have ways of exposing
> those properties as well?
>
> In the case where the properties are exposed, we could perhaps simply adjust
> the *HDFS processors to use dynamic properties to pass those to the
> underlying module, therefore providing a way to expose particular settings
> of the underlying storage platform.
>
> Any opinion would be welcome
>
> PS-sent it again with proper subject label
>


Re: Add hidden show capability to sensitive text-area boxes

2017-02-12 Thread Andrew Grande
Hi,

NiFi never pulls sensitive values back to the UI once set; it's a one-way
operation only. Today, the UI shows a 'sensitive value is set' hint to
indicate it's not empty. Maybe you can provide more background on what you
are trying to achieve in that context?

Thanks,
Andrew

On Sun, Feb 12, 2017, 7:43 AM Ramakrishnan Venkatachalam <
ramkrish9...@gmail.com> wrote:

> Hi,
>
> In controller-services, for example HiveConnectionPool, there is a property
> called "Password". One of our clients who uses NiFi wishes to add the
> capability of hidden show (input type="password"). I understand the property
> list table is dynamically populated using jQuery (please correct me if I am
> wrong).
>
> Below is what i suggest:
>
> When we mention like below in this case for password property:
>
> public static final PropertyDescriptor DB_PASSWORD = new
> PropertyDescriptor.Builder().name("hive-db-password")
> .displayName("Password")
> .description("The password for the database user")
> .defaultValue(null)
> .required(false)
> .sensitive(true)
> .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
> .build();
>
> i suggest to add
>
>  .hiddenshow(true) (or any other name)
>
> so that the framework can add jQuery and CSS appropriately to give the
> password field hidden-show behavior ("***"); we could even replace the
> textarea tag with input type='password' only for these cases.
>
>
> I would like to hear your views on this.
>
> Thanks
> Ramakrishnan V
>
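A minimal sketch of the proposed builder flag. The names are hypothetical (NiFi's real PropertyDescriptor.Builder has no hiddenShow method); the point is only that a boolean on the descriptor would be enough for the UI to render an input type="password" field instead of a textarea.

```java
public class PropertySketch {
    // Toy descriptor: just enough state for the UI decision discussed above.
    public static final class Descriptor {
        public final String name;
        public final boolean sensitive;
        public final boolean hiddenShow;
        Descriptor(String name, boolean sensitive, boolean hiddenShow) {
            this.name = name;
            this.sensitive = sensitive;
            this.hiddenShow = hiddenShow;
        }
    }

    public static final class Builder {
        private String name;
        private boolean sensitive;
        private boolean hiddenShow;
        public Builder name(String n) { this.name = n; return this; }
        public Builder sensitive(boolean s) { this.sensitive = s; return this; }
        // The proposed flag: tells the UI to render a masked input.
        public Builder hiddenShow(boolean h) { this.hiddenShow = h; return this; }
        public Descriptor build() { return new Descriptor(name, sensitive, hiddenShow); }
    }

    public static void main(String[] args) {
        Descriptor d = new Builder()
                .name("hive-db-password")
                .sensitive(true)
                .hiddenShow(true)
                .build();
        System.out.println(d.name + " hiddenShow=" + d.hiddenShow);
    }
}
```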


Re: Dealing with cluster errors

2017-02-10 Thread Andrew Grande
Joe,

External ZK quorum would be my first move. And make sure those boxes have
fast disks and no heavy load from other processes.

Andrew

On Fri, Feb 10, 2017, 7:23 AM Joe Gresock  wrote:

> I should add that the flows on the individual nodes appear to be processing
> the data just fine, and the solution I've found so far is to just wait for
> the data to subside, after which point the console comes up successfully.
> So, no complaint on the durability of the underlying data flows.  It's just
> problematic that I can't reliably make changes to the flow during high
> traffic periods.
>
> On Fri, Feb 10, 2017 at 12:00 PM, Joe Gresock  wrote:
>
> > We have a 7-node cluster and we currently use the embedded zookeepers on
> 3
> > of the nodes.  I've noticed that when we have a high volume in our flow
> > (which is causing the CPU to be hit pretty hard), I have a really hard
> time
> > getting the console page to come up, as it cycles through the following
> > error messages when I reload the page:
> >
> >
> >- An unexpected error has occurred.  Please check the logs.  (there is
> >never any error in the logs for this one)
> >- Could not replicate request to  because the node is not
> >connected   (this is never the current host I'm trying to hit, which
> makes
> >the error text feel a bit irrelevant to the user.  i.e., "I wasn't
> trying
> >to replicate a request to that node, I just want to load the console
> on
> >this node")
> >- An error occurred communicating with the application core.  Please
> >check the logs and fix any configuration issues before restarting.
> (Again,
> >can't find any errors in nifi-app.log or nifi-user.log)
> >
> > I can go about a half-hour reloading the page before it comes up once,
> and
> > then I can only get maybe one action in before it auto-refreshes and
> shows
> > me one of the above error messages again.
> >
> > My first thought was that using some external zookeeper servers would
> > improve this, but that's just a hunch.  Has anyone encountered this
> > behavior with high data volume?
> > Joe
> >
> > --
> > I know what it is to be in need, and I know what it is to have plenty.  I
> > have learned the secret of being content in any and every situation,
> > whether well fed or hungry, whether living in plenty or in want.  I can
> > do all this through him who gives me strength.*-Philippians 4:12-13*
> >
>
>
>
> --
> I know what it is to be in need, and I know what it is to have plenty.  I
> have learned the secret of being content in any and every situation,
> whether well fed or hungry, whether living in plenty or in want.  I can do
> all this through him who gives me strength.*-Philippians 4:12-13*
>


Re: Setting a node as primary?

2017-02-09 Thread Andrew Grande
Not anymore. The assumption is that every node in a cluster can become primary
in the event of failover, and each node has to be configured the same across
the cluster.

Andrew

On Thu, Feb 9, 2017, 5:58 AM Joe Gresock  wrote:

> Is there a way to request a node to become PRIMARY?  I seem to remember
> this from 0.x, but can't find a way in 1.x.
>
> In the PUT /controller/cluster/nodes/{nodeId} documentation, it says "The
> node configuration. The only configuration that will be honored at this
> endpoint is the status or primary flag."
>
> However, setting "roles" : ["PRIMARY"] doesn't seem to do anything.  Also,
> looking at the StandardNiFiServiceFacade code, there doesn't appear to be a
> way to do anything but connect or disconnect a node.
>
>
> Thanks,
> Joe
>
> --
> I know what it is to be in need, and I know what it is to have plenty.  I
> have learned the secret of being content in any and every situation,
> whether well fed or hungry, whether living in plenty or in want.  I can do
> all this through him who gives me strength.*-Philippians 4:12-13*
>


Re: [ANNOUNCE] New Apache NiFi Committer Joey Frazee

2017-01-03 Thread Andrew Grande
Great job, Joey, congrats :)

On Tue, Jan 3, 2017 at 2:42 PM Aldrin Piri  wrote:

> On behalf of the Apache NiFI PMC, I am very pleased to announce that Joey
> Frazee has accepted the PMC's invitation to become a committer on the
> Apache NiFi project. We greatly appreciate all of Joey's hard work and
> generous contributions and look forward to continued involvement in the
> project.
>
> Joey's contributions include support for the HL7, JMS, and EventHub
> extensions. Joey can also be found assisting on the mailing lists, as well
> as with articles and repositories based around the NiFi community.
>
> Congrats, Joey!
>


Re: Adding JVM arguments when running NiFi...

2016-12-19 Thread Andrew Grande
Sounds like switching to a LinkedHashMap could address the JVM startup
flag ordering?

Andrew
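The ordering problem Russ describes below comes from java.arg.n keys being read into an unordered map. A small illustration (not NiFi's actual bootstrap code) of one way out: sort on the numeric suffix so order-dependent flags come out predictably regardless of which map the properties were parsed into. The flag values mirror the JFR example; the n values are hypothetical.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class JvmArgOrder {
    // Extract java.arg.N values ordered by N, regardless of the
    // iteration order of the map the properties were parsed into.
    public static List<String> orderedArgs(Map<String, String> props) {
        return props.entrySet().stream()
                .filter(e -> e.getKey().startsWith("java.arg."))
                .sorted(Comparator.comparingInt((Map.Entry<String, String> e) ->
                        Integer.parseInt(e.getKey().substring("java.arg.".length()))))
                .map(Map.Entry::getValue)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>(); // iteration order unspecified
        props.put("java.arg.42", "-XX:StartFlightRecording=duration=120m,filename=recording.jfr");
        props.put("java.arg.31", "-XX:+FlightRecorder");
        props.put("java.arg.30", "-XX:+UnlockCommercialFeatures");
        // The numeric sort guarantees the unlock flag precedes the JFR flags.
        System.out.println(orderedArgs(props));
    }
}
```

A LinkedHashMap alone would preserve file order; sorting the numeric suffix additionally lets users control ordering explicitly via the n values, which is what the java.arg.n convention seems to promise.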

On Thu, Dec 15, 2016, 1:11 PM Russell Bateman  wrote:

> Joe,
>
> I have more detail and success to report...
>
> There is no problem in NiFi /per se/. It was not really necessary to dig
> down into the code because there's no problem there. It's just
> how java.arg./n/ is consumed that creates trouble. One must play around
> with the value of /n/ when argument ordering is crucial. In the present
> case, I was able to turn on JFR using the following (in bold) in
> /conf/bootstrap.conf/:
>
> #Set headless mode by default
> java.arg.14=-Djava.awt.headless=true
>
> *# Light up the Java Flight Recorder...**
> **java.arg.32=-XX:+UnlockCommercialFeatures**
> **java.arg.31=-XX:+FlightRecorder**
>
> **java.arg.42=-XX:StartFlightRecording=duration=120m,filename=recording.jfr*
>
> # Master key in hexadecimal format for encrypted sensitive
> configuration values
> nifi.bootstrap.sensitive.key=
>
> The point was that -XX:+UnlockCommercialFeatures must absolutely precede
> the other two options. Because of these arguments passing through
> HashMap, some values of /n/ will fail to sort an argument before others
> and the addition of other arguments might also come along and disturb
> this order. What I did here was by trial and error. I guess NiFi code
> could man-handle /n/ in every case such that the order be respected. I'm
> guessing that my case is the only or one of the very rare times options
> are order-dependent. I wouldn't personally up-vote a JIRA to fix this.
>
> Hoping this forum has an adequate profile out there in Googleland, this
> very discussion will be enough to help the next guy.
>
> Russ
>
> P.S. My trial and error went pretty fast using this command line between
> reassigning instances of /n/ for these arguments. It took me maybe 6
> tries. (Script /bounce-nifi.sh/ does little more than shut down, then
> start up NiFi, something I do a hundred times per day across at least
> two versions.)
>
> ~/dev/nifi/nifi-1.1.0/logs $ rm *.log ; bounce-nifi.sh 1.1.0 ; tail
> -f nifi-bootstrap.log
>
>
>
>
> On 12/14/2016 07:41 PM, Joe Witt wrote:
> > The way you set it up initially is what I'd thought would have done
> > the trick.  Perhaps we're not ordering the arguments in the same
> > manner supplied.  Will need to look into that.
> >
> > Thanks
> > Joe
> >
> > On Wed, Dec 14, 2016 at 9:00 PM, Russell Bateman 
> wrote:
> >> Of course, as many have done, I've run Java applications with JFR
> enabled
> >> using these options against this very JVM (jdk1.8.0_112). So, it isn't a
> >> problem for the JVM I'm using. I haven't finished digging down into
> >> ProcessBuilder, or deeper, to figure out why these options are not
> getting
> >> love. I'll get back at it tomorrow and report back.
> >>
> >>
> >> On 12/14/2016 11:05 AM, Russell Bateman wrote:
> >>>
> >>> I've doctored /conf/bootstrap.conf/ to contain these additional lines:
> >>>
> >>>  java.arg.15=-XX:+UnlockCommercialFeatures
> >>>  java.arg.16=-XX:+FlightRecorder
> >>>
> >>>
> java.arg.17=-XX:StartFlightRecording=duration=120m,filename=recording.jfr
> >>>
> >>>
> >>> In the end, NiFi's grumpy about this and won't start (from
> >>> /logs/nifi-bootstrap.log/):
> >>>
> >>> 2016-12-14 10:39:36,489 ERROR [NiFi logging handler]
> >>> org.apache.nifi.StdErr Error: *To use 'FlightRecorder', first unlock
> using
> >>> -XX:+UnlockCommercialFeatures.*
> >>> 2016-12-14 10:39:36,489 ERROR [NiFi logging handler]
> >>> org.apache.nifi.StdErr Error: Could not create the Java Virtual
> Machine.
> >>> 2016-12-14 10:39:36,489 ERROR [NiFi logging handler]
> >>> org.apache.nifi.StdErr Error: A fatal exception has occurred. Program
> will
> >>> exit.
> >>> 2016-12-14 10:39:36,507 INFO [main] org.apache.nifi.bootstrap.RunNiFi
> NiFi
> >>> never started. Will not restart NiFi
> >>>
> >>> I tried using all options as one (in case the order is disturbed,
> which it
> >>> was):
> >>>
> >>>
> >>>
> java.arg.15=-XX:+UnlockCommercialFeatures-XX:+FlightRecorder-XX:StartFlightRecording=duration=120m,filename=recording.jfr
> >>>
> >>>
> >>> and then I get:
> >>>
> >>> *2016-12-14 10:50:07,574 ERROR [NiFi logging handler]
> >>> org.apache.nifi.StdErr Unrecognized VM option 'UnlockCommercialFeatures
> >>> -XX:+FlightRecorder
> >>> -XX:StartFlightRecording=duration=120m,filename=recording.jfr'*
> >>> 2016-12-14 10:50:07,574 ERROR [NiFi logging handler]
> >>> org.apache.nifi.StdErr Error: Could not create the Java Virtual
> Machine.
> >>> 2016-12-14 10:50:07,574 ERROR [NiFi logging handler]
> >>> org.apache.nifi.StdErr Error: A fatal exception has occurred. Program
> will
> >>> exit.
> >>> 2016-12-14 10:50:07,598 INFO [main] org.apache.nifi.bootstrap.RunNiFi
> NiFi
> >>> never started. Will not restart NiFi
> >>>
> >>> Here's the second command line from /logs/nifi-bootstrap.log/, which
> I've

Re: Another holiday request...

2016-12-02 Thread Andrew Grande
Sure, thanks for the feedback, but credit is not mine :)

I shared a general design tone the team mentioned on a few occasions, and I
respect it. The community managed to skew it a little with NiFi 1.1.0; for
example, note how 'Change Color' now has a more pronounced effect on the
processor than before. Also, note the addition of the backpressure gauges and
throttling indicators to connections. So yes, they are listening, and
feedback is always welcome.

Andrew

On Fri, Dec 2, 2016 at 10:43 AM Russell Bateman <r...@windofkeltia.com>
wrote:

> Thanks, Andrew. That's a good point that I had missed. I knew some
> indication of troubles reached the triangle icon on the processor's
> representation, but confess I had not realized there was a documented
> threshold. This will work fine (though I'll bet our guys would like the
> more carnival atmosphere).
>
> Russ
>
> P.S. I hope my attempt at humor didn't offend.
>
> On 12/02/2016 07:08 AM, Andrew Grande wrote:
> > I think the team was going for a toned down style anyway. One way to
> draw a
> > user's attention, which is available today already, is to use a standard
> > log statement with proper level. This will issue a bulletin both on the
> > processor (and visually indicate a note) as well as add it to a global
> list
> > of bulletins. By default processors ignore anything lower than WARN, but
> > this is adjustable in processor settings.
> >
> > Andrew
> >
> > On Thu, Dec 1, 2016 at 5:47 PM Russell Bateman <r...@windofkeltia.com>
> > wrote:
> >
> >> In pursuit of yet more wacky functionality, I wondered about a good way
> >> to get a NiFi user's attention (yeah, the person standing looking at the
> >> UI) when some condition, threshold, etc. is met as detected by a custom
> >> processor. In view of the season, is it possible to turn the processor
> >> representation flashing green or red?
> >>
> >> Okay, seriously, I guess I could just throw ProcessException or
> >> discreetly (sniff) drop something into the log. (Do any better ideas
> >> come to mind?)
> >>
> >> If NiFi is to gain greater public recognition and popularity, we need to
> >> consider how to attract Hollywood through cool visual effects. I'm
> >> anticipating seeing Tom Cruise looking at NiFi on-screen during a tense
> >> moment in MI-6!
> >>
> >> ;-)
> >>
> >> Thanks,
> >>
> >> Russ
> >>
>
>


Re: Another holiday request...

2016-12-02 Thread Andrew Grande
I think the team was going for a toned down style anyway. One way to draw a
user's attention, which is available today already, is to use a standard
log statement with proper level. This will issue a bulletin both on the
processor (and visually indicate a note) as well as add it to a global list
of bulletins. By default processors ignore anything lower than WARN, but
this is adjustable in processor settings.

Andrew

On Thu, Dec 1, 2016 at 5:47 PM Russell Bateman 
wrote:

> In pursuit of yet more wacky functionality, I wondered about a good way
> to get a NiFi user's attention (yeah, the person standing looking at the
> UI) when some condition, threshold, etc. is met as detected by a custom
> processor. In view of the season, is it possible to turn the processor
> representation flashing green or red?
>
> Okay, seriously, I guess I could just throw ProcessException or
> discreetly (sniff) drop something into the log. (Do any better ideas
> come to mind?)
>
> If NiFi is to gain greater public recognition and popularity, we need to
> consider how to attract Hollywood through cool visual effects. I'm
> anticipating seeing Tom Cruise looking at NiFi on-screen during a tense
> moment in MI-6!
>
> ;-)
>
> Thanks,
>
> Russ
>


Re: PostHttp processor: read timeout error (socketTimeoutException)

2016-11-15 Thread Andrew Grande
The InfluxDB REST endpoint can get overloaded and time out. Also, see if you
get better results with NiFi's InvokeHTTP processor.

Andrew

On Mon, Nov 14, 2016, 7:00 AM balacode63  wrote:

> Hi all,
>
> I'm using /postHttp/ processor to post data to influxdb. I'm getting the
> following error frequently.
>
> /"Failed to Post.due to java.net.SocketTimeoutException:Read timed
> out;
> transferring to failure: java.net.SocketTimeoutException:Read timed out;"
> /
> <
> http://apache-nifi-developer-list.39713.n7.nabble.com/file/n13879/error.png
> >
>
> Properties for posthttp processor:
> Data timeout: 60 seconds
> Connection timeout: 60 seconds
>
> Note:
> Influxdb and nifi are running in different servers.
>
> Please guide me to fix this issue.
>
> Thanks,
> Bala
>
>
>
>
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/PostHttp-processor-read-timeout-error-socketTimeoutException-tp13879.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>


Re: Uploading a csv file into an RDBMS table quickly like LOAD DATA

2016-11-14 Thread Andrew Grande
You could e.g. use NiFi to get the csv file, put it on a file system and
invoke the native mysql loader command via ExecuteStreamCommand.

Andrew
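As a concrete sketch, this is the kind of command line an ExecuteStreamCommand processor could be configured to run (Command Path "mysql", arguments below). The database, table, and file path are placeholders, and LOAD DATA LOCAL must be enabled on both the MySQL server and client for it to work. (For Teradata, the analogous approach would invoke its native load utility instead.)

```java
import java.util.List;

public class LoadDataCommand {
    // Build the mysql invocation for a bulk CSV load; all identifiers
    // here are hypothetical examples, not values from the thread.
    public static List<String> buildCommand(String db, String table, String csvPath) {
        String sql = "LOAD DATA LOCAL INFILE '" + csvPath + "' INTO TABLE " + table
                + " FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'";
        return List.of("mysql", "--local-infile=1", db, "-e", sql);
    }

    public static void main(String[] args) {
        System.out.println(buildCommand("staging", "events", "/tmp/incoming.csv"));
    }
}
```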

On Sun, Nov 13, 2016, 6:57 PM raghav130593  wrote:

> I receive tab-delimited CSV files from an email and I want to insert them
> into a Teradata DB table. Right now, I am going through a sequence of tasks
> where I SplitText -> ExtractText -> ReplaceText (where I create the SQL
> statement) -> PutSQL. I am creating a row-by-row SQL insert here, and there
> are thousands of records, which causes a slow insertion process. Is there
> any way I can do something like the LOAD DATA INFILE command in
> MySQL, where the CSV data is loaded into the table at a very high speed? Is
> there any way I can merge this SQL command with the incoming CSV file
> in NiFi?
>
>
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/Uploading-a-csv-file-into-an-RDBMS-table-quickly-like-LOAD-DATA-tp13878.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>


Re: GetSFTP backpressure question

2016-10-28 Thread Andrew Grande
UpdateAttribute is a good example of soft-threshold behavior. By default,
this processor operates in micro-batches of 100. So, regardless of whether you
set backpressure to, e.g., 5, it will commit 100 events and only then be
throttled back by the connection. A user will see 100 events in the connection
even though 5 was the limit.

I'd say it's important to understand this behavior and that the flow will
be enabled again as soon as the backlog drops to under 5, but not sure if
there's a generic fix, or even if a fix is due.

Andrew
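The soft-threshold behavior can be modeled in a few lines: the framework checks the limit once, before triggering the processor, and the processor then commits its whole micro-batch even if that overshoots. A toy model of the interaction, not NiFi framework code:

```java
public class SoftBackpressure {
    // Simulate a connection with a soft backpressure limit feeding from a
    // processor that always commits a fixed micro-batch when triggered.
    public static int run(int limit, int batchSize, int maxTriggers) {
        int queued = 0;
        for (int i = 0; i < maxTriggers; i++) {
            if (queued >= limit) {
                break;            // framework-side check, done pre-trigger only
            }
            queued += batchSize;  // processor commits the full batch regardless
        }
        return queued;
    }

    public static void main(String[] args) {
        // Limit 5, batch 100: one trigger overshoots to 100, then throttled.
        System.out.println(run(5, 100, 10));
    }
}
```

With limit 5 and batch 100, the queue settles at 100: exactly the user-visible overshoot described above, and it clears again once the backlog drops back under the limit.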

On Fri, Oct 28, 2016, 7:55 AM Joe Witt  wrote:

> Great questions and discussion points here and I agree with your
> statement about the importance of honoring back pressure targets the
> user believes they set.
>
> The way back pressure works is that before a processor is given a
> thread to execute (each onTrigger cycle) the framework checks all
> possible output relationships and ensures that at that moment in time
> all of them have space available according to the limits set on those
> connection (size or number of things).  Once that processor is given
> the thread to execute its onTrigger cycle it is up to that processor
> to be a good steward and the framework does offer a method for that
> processor to check if all destinations have space available which is
> important if for efficiency reasons it chooses to do more than one
> thing at a time.  The processor doesn't get to know how close or how
> full the queues are that it writes to so that is important to
> understand as well.  To the processor the destinations are either full
> or have space available.
>
> This sort of back pressure is an optimistic approach and really means
> these are enforced as soft limits and as you point out can be exceeded
> in some cases.  It basically means that the back pressure target can
> be exceeded by however much data could be produced by a processor in a
> single execution cycle once it is given a thread.
>
> I believe the user's expectation is well articulated via the current
> mechanism of setting the max values on the connections and it is then
> important that processors get written or improved to better honor that
> or that they document for the user under what conditions they could
> exceed the backpressure target.
>
> Thanks
> Joe
>
> On Fri, Oct 28, 2016 at 7:30 AM, Joe Gresock  wrote:
> > I have a NiFilosophical question that came up when I had a GetSFTP
> > processor running to a back-pressured connection.
> >
> > My GetSFTP is configured with max selects = 100, and the files in the
> > remote directory are nearly 1GB each.  The queue has a backpressure of
> 2GB,
> > and I assumed each run of GetSFTP would stop feeding files once it hit
> > backpressure.
> >
> > I was initially puzzled when I started periodically seeing huge backlogs
> > (71GB) on each worker in the cluster in this particular queue, until I
> > looked at the queued count/bytes stats (very useful tool, btw):
> >
> > Queued bytes statistics 
> > Queued count statistics 
> >
> > Now it's evident that GetSFTP continues to emit files until it hits the
> max
> > selects, regardless of backpressure.  I think I understand why
> backpressure
> > couldn't necessarily trump this behavior (e.g., what if a processor
> needed
> > to emit a query result set in batches.. what would you do with the flow
> > files it wanted to emit if you suddenly hit backpressure?)
> >
> > So my questions are:
> > - Do you think it's the user's responsibility to be aware of cases when
> > backpressure is overridden by a processor's implementation?  I think this
> > is important to understand, because backpressure is usually in place to
> > prevent a full disk, which is a fairly critical requirement.
> > - Is there something we can do to document this so it's more universally
> > understood?
> > - Perhaps the GetSFTP Max Selects property can indicate that it will
> > override backpressure?  In which case, are there other processors that
> > would need similar documentation?
> > - Or do we want a more universal approach, like putting this caveat in
> the
> > general documentation?
> >
> > Joe
> >
> > --
> > I know what it is to be in need, and I know what it is to have plenty.  I
> > have learned the secret of being content in any and every situation,
> > whether well fed or hungry, whether living in plenty or in want.  I can
> do
> > all this through him who gives me strength.*-Philippians 4:12-13*
>


Re: Processor running slow in production, not locally

2016-10-05 Thread Andrew Grande
Just a sanity check: has the number of open file handles been increased as per
the quickstart document? You might need many more for your flow.

Another tip, when your server experiences undesired hiccups like that try
running 'nifi.sh dump save-in-this-file.txt' and investigate/share where
NiFi threads are being held back.

Andrew

On Tue, Oct 4, 2016, 10:54 AM Russell Bateman <
russell.bate...@perfectsearchcorp.com> wrote:

> We use the templating to create FHIR XML, in this case, a
>
> 
>...
>
> 
>
> construct that includes a base-64 encoding of a PDF, the flowfile
> contents coming into the templating processor. These can get to be
> megabytes in size though our sample data was just under 1Mb.
>
> Yesterday, I built a new, reduced flow restricting the use of my
> /VelocityTemplating/ processor to perform only the part of that task
> that I suspected would be taking so much time, that is, copying the
> base-64 data into the template in place of the VTL macro. However, I
> could not reproduce the problem though I did this on the very production
> server (actually, more of a staging server, but it was the very server
> where the trouble was detected in the first place).
>
> Predictably (that is if, like me, you believe Murphy reigns supreme in
> this universe), the action using the very files in question took
> virtually no time at all, just as had been my experience running on my
> local development host. I then slightly expanded the new flow to take in
> some of the other trappings of the original one (but, it was the
> templating that was reported as being the bottleneck--minutes to fill
> out the template instead of milliseconds). In short, I could not
> replicate the problem. True, the moon is in a different phase than late
> last week when this was reported.
>
> I will come back here and report if and when we stumble upon this, it
> reoccurs and/or we took a decision about anything, for the benefit of
> the community. At present, we're looking to force re-ingestion of the
> run, using the original flow design, including the documents that
> reportedly experienced this trouble to see if it happens yet again.
>
> In the meantime, I can say:
>
> - I keep no state in this processor (indeed, I try not to and don't
> think I have anything stateful in any of our custom processors).
> - The server runs some 40 cores, 128Gb RAM on 12Tb of disk,
> dedicated hardware, CentOS 7, recently built and installed.
> - Reportedly, I learned, little else was going on on the server at
> the same time, either in NiFi or elsewhere.
> - NiFi heap is configured to be 12Gb.
> - Not so far along yet as to understand thread usage or garbage
> collection state.
>
> Again, thanks for the suggestions from both of you.
>
> Russ
>
>
> On 10/03/2016 06:28 PM, Joe Witt wrote:
> > Russ,
> >
> > As Jeff points out lack of available threads could be a factor flow
> > slower processing times but this would manifest itself by you seeing
> > that the processor isn't running very often.  If it is that the
> > process itself when executing takes much longer than on the other box
> > then it is probably best to look at some other culprits.  To check
> > this out you can view the status history and look at the average
> > number of tasks and average task time for this process.  Does it look
> > right to you in terms of how often it runs, how long it takes, and is
> > the amount of time it takes growing?
> >
> > If you find that performance of this processor itself is slowing then
> > consider a few things.
> > 1) Does it maintain some internal state and if so is the data
> > structure it is using efficient for lookups?
> > 2) How does your heap look?  Is there a lot of garbage collection
> > activity?  Are there any full garbage collections and if so how often?
> >   It should generally be the case in a well configured and designed
> > system that full garbage collections never occur (ever).
> > 3) Attaching a remote debugger and/or running profilers on it can be
> > really illuminating.
> >
> > JOe
> >
> > On Mon, Oct 3, 2016 at 11:26 AM, Jeff  wrote:
> >> Russel,
> >>
> >> This sounds like it's an environmental issue.  Are you able to see the
> heap
> >> usage on the production machine?  Are there enough available threads to
> get
> >> the throughput you are observing when you run locally?  Have you
> >> double-checked the scheduling tab on the processor config to make sure
> it
> >> is running as aggressively as it runs locally?
> >>
> >> I have run into this sort of thing before, and it was because of
> flowfile
> >> congestion in other areas of the flow, and there were no threads
> available
> >> for other processors to get through their own queues.
> >>
> >> Just trying to think through some of the obvious/high level things that
> >> might be affecting your flow...
> >>
> >> - Jeff
> >>
> >> On Mon, Oct 3, 2016 at 9:43 AM Russell Bateman <
> >> 

Re: Custom expression language functions

2016-09-11 Thread Andrew Grande
I think there is a decent compromise available (or maybe it's there
already): invoking a public static method via reflection. There are
deployment and packaging limitations, sure, but at least it makes it easier
for users to invoke utility functions (or custom ones).
Andrew
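
The reflection-based approach described above can be sketched in plain Java: resolve a public static method by class and method name and invoke it, much as a hypothetical custom EL function dispatcher might. The class and method names below are illustrative only — this is not NiFi's EL implementation.

```java
import java.lang.reflect.Method;

public class StaticFunctionInvoker {

    // Resolve and invoke a public static method by name. Note getMethod()
    // matches parameter types exactly, so callers must pass arguments whose
    // runtime classes match the declared signature (e.g. String for
    // parseInt(String)).
    public static Object invokeStatic(String className, String methodName, Object... args)
            throws Exception {
        Class<?>[] argTypes = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            argTypes[i] = args[i].getClass();
        }
        Method method = Class.forName(className).getMethod(methodName, argTypes);
        return method.invoke(null, args); // null receiver: static method
    }

    public static void main(String[] args) throws Exception {
        // A hypothetical EL call such as ${'42':callStatic('java.lang.Integer', 'parseInt')}
        // would reduce to this reflective dispatch:
        System.out.println(invokeStatic("java.lang.Integer", "parseInt", "42")); // prints 42
    }
}
```

The packaging/deployment limitation Andrew mentions is visible here: the target class must already be on the NiFi classpath for `Class.forName` to find it.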

On Sun, Sep 11, 2016, 5:24 PM Joe Witt  wrote:

> Gunjan
>
> Can you show some specific el statements youd like to be able to do?
>
> Thanks
> Joe
>
> On Sep 11, 2016 4:24 PM, "Gunjan Dave"  wrote:
>
> > Hi Joe,
> > Having the ability to extend or customize EL functions is important. In
> > fact, a few important functions are missing, like the ability to fetch
> > from the distributed cache or readily use certain Java classes such as
> > java.lang.*, java.io.*, java.math.*, etc.
> >
> > Adding a processor is certainly feasible, but it's an extra hop. With the
> > ability to write global functions it could be achieved in a single
> > processor instead of two.
> >
> > This would prove most useful in the Advanced Rules tab of the
> > UpdateAttribute processor.
> >
> >
> >
> > On Sun, Sep 11, 2016, 12:07 PM Gunjan Dave 
> > wrote:
> >
> > > Hi Team, is there a way I can create custom functions which can then be
> > > referred in expression language and also be reusable.
> > >
> > > Thanks
> > > GUNJAN
> > >
> >
>


Re: ExecuteSql output formats.

2016-09-03 Thread Andrew Grande
Are you thinking about some sort of converter which can be referenced à la
a controller service?

Andrew

On Sat, Sep 3, 2016, 8:31 AM Toivo Adams  wrote:

> Hi,
>
> Currently Avro is supported, but often other formats are needed. The
> workaround is to let ExecuteSql create the FlowFile in Avro and immediately
> after that (using another processor) convert Avro to some other format.
> This is a waste of time and resources.
>
> Maybe it's wise to add a pluggable output format converter?
> We can define a ResultsetConverter interface and add an OutputConverter
> property to the ExecuteSql processor.
> The OutputConverter property accepts a ResultsetConverter implementation
> class name.
> If the property has a value, ExecuteSql will try to use the provided
> converter instead of Avro.
> This way everyone can create their own output formats as needed.
>
> I am ready to implement such functionality with a simple
> ResultsetConverter implementation.
>
> What do you think?
>
> Thanks
> Toivo
>
>
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/ExecuteSql-output-formats-tp13265.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>
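
Toivo's proposal can be sketched as a small pluggable-converter pattern: a processor property would carry an implementation class name, which is then loaded by reflection. All names below are illustrative, not NiFi API; a real version would consume a java.sql.ResultSet rather than a row list.

```java
import java.util.List;

// The interface the proposed OutputConverter property would point at.
interface ResultSetConverter {
    // In NiFi this would take a java.sql.ResultSet; a row list keeps the
    // sketch self-contained and runnable.
    String convert(List<Object[]> rows);
}

// One example implementation: naive CSV (no quoting/escaping).
class CsvResultSetConverter implements ResultSetConverter {
    @Override
    public String convert(List<Object[]> rows) {
        StringBuilder sb = new StringBuilder();
        for (Object[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(row[i]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}

public class ConverterDemo {
    // Mirrors the proposed property semantics: look the implementation up
    // by fully qualified class name and instantiate it reflectively.
    static ResultSetConverter load(String className) throws Exception {
        return (ResultSetConverter) Class.forName(className)
                .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        ResultSetConverter conv = load("CsvResultSetConverter");
        System.out.print(conv.convert(List.of(
                new Object[]{"id", "name"},
                new Object[]{1, "nifi"})));
        // prints:
        // id,name
        // 1,nifi
    }
}
```

The design choice worth noting: because the converter is resolved by class name, new formats can be dropped in without touching ExecuteSql itself — the same extension mechanism NiFi already uses for NAR-packaged components.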


Re: Apache NiFi - 3 tier Architecture

2016-09-01 Thread Andrew Grande
Nishad,

You can split NiFi clusters at will and connect them securely. Take a look
at Remote Process Group and the underlying site-to-site protocol.

In fact, linking data centers and layers is one of the mainstream NiFi use
cases.

Andrew

On Thu, Sep 1, 2016, 1:05 PM Nishad Hameed  wrote:

> Hi All,
>
> We are planning to use the "Apache NiFi" for our data flow and scheduling
> service. But as we have 3 tier architecture, we need to see if we can split
> NiFi.
>
> Like
>
> 1.   Web/Presentation layer
>
> 2.   Application layer
>
> 3.   Data Layer
>
> Is there any project that has already been implemented this way?
>
> Waiting for your feedback.
>
> Thanks & Regards
> Nishad C Hameed
>
> MACBIS/T-MSIS
> M:+1(301) 335-5346
> L:+1(301) 977-7884 x 681
>
>
> This electronic mail (including any attachments) may contain information
> that is privileged, confidential, and/or otherwise protected from
> disclosure to anyone other than its intended recipient(s). Any
> dissemination or use of this electronic email or its contents (including
> any attachments) by persons other than the intended recipient(s) is
> strictly prohibited. If you have received this message in error, please
> notify the sender by reply email and delete the original message (including
> any attachments) in its entirety.
>


Re: Nifi Cross Account Download With A Profile Flag

2016-08-31 Thread Andrew Grande
Debug logging can be set in a processor itself in the UI, too.

On Wed, Aug 31, 2016, 5:34 PM James Wing  wrote:

> Keren,
>
> Which version of NiFi are you using?
>
> One thing I noticed in your configuration of FetchS3Object is you are
> setting both the Access Key and Secret Key properties with the AWS
> Credentials Provider.  When you are using the AWS Credentials Provider
> Service, you should not specify keys.
>
> A more certainly helpful thing to do is enable debug logging for the AWS
> processor package by adding a line like the following to conf/logback.xml:
>
> <logger name="org.apache.nifi.processors.aws" level="DEBUG"/>
>
> With the debug logging enabled, there are messages indicating which
> credential type is being attempted.  Your settings for the AWS Credentials
> Provider look appropriate.  The controller service is indeed designed to
> refresh the STS token automagically using the AWS SDK classes for temporary
> credentials.
>
> Last, you might experiment with configuring
> AWSCredentialsProviderControllerService to use your named CLI profile
> "crossaccountrole", which should also work.
>
> Thanks,
>
> James
>
> On Wed, Aug 31, 2016 at 1:44 PM, Tseytlin, Keren <
> keren.tseyt...@capitalone.com> wrote:
>
> > Hi All!
> >
> > Looking for some help on enabling Cross Account communication within
> Nifi!
> >
> > My goal: There are files stored from CloudTrail in an S3 bucket in VPC B.
> > My Nifi machines are in VPC A. I want Nifi to be able to get those files
> > from VPC B. VPC A and VPC B need to be communicating in the FetchS3Object
> > component.
> >
> > See this link for some additional info: http://docs.aws.amazon.com/
> > awscloudtrail/latest/userguide/cloudtrail-sharing-logs-assume-role.html
> >
> > I have communication working manually on the Nifi machines in VPC A when
> I
> > use the AWS CLI. The process is as follows:
> >
> > 1. Run "aws sts assume-role" on my Nifi machine (VPC A) to assume a role
> > I've created in VPC B that is configured to have access to the S3 bucket
> in
> > VPC B.
> >
> > 2. This will generate temporary keys that need to be refreshed every
> > hour. There is no way to have assume role create permanent keys. Export
> the
> > keys as environment variables.
> >
> > 3. Set up ~/.aws/config to have a profile "crossaccountrole" that
> > connects to the arn of the role created in VPC B.
> >
> > 4. Run the following command: "aws s3 cp s3://<bucket>/<object> <file name locally> --profile crossaccountrole"
> >
> > Most importantly, if I ever try to run this without the --profile flag,
> > then it will not allow me to download the file.  It seems like perhaps to
> > get it to work with Nifi I need a place to pass in the profile that needs
> > to be used in order for the communication to work.
> >
> > I've been trying to implement this in Nifi. Within the FetchS3Object, I
> > have created an AWSCredentialsProviderService which has the following
> > properties:
> >
> > ·  Access Key: VPC A access key
> >
> > ·  Secret Key: VPC A secret key
> >
> > ·  Assume Role ARN: VPC B role
> >
> > ·  Assume Role Session Name: crossaccountrole
> >
> > ·  Session Time: 3600
> > The general properties in the FetchS3Object are as follows:
> >
> > ·  Bucket: VPC B bucket name
> >
> > ·  Object: Filename of VPC B bucket object
> >
> > ·  Access Key: VPC A access key
> >
> > ·  Secret Key: VPC A secret key
> >
> > ·  AWS Credentials Provider Service: 
> >
> > However, when this tries to run I get Access Denied. I've been going
> > through the source code for Nifi and I'm not sure if short-lived tokens
> get
> > passed through. Can anyone please provide me some guidance or suggestions
> > on how to get this to work? :)
> >
> > Best,
> > Keren
> > 
> >
> > The information contained in this e-mail is confidential and/or
> > proprietary to Capital One and/or its affiliates and may only be used
> > solely in performance of work or services for Capital One. The
> information
> > transmitted herewith is intended only for use by the individual or entity
> > to which it is addressed. If the reader of this message is not the
> intended
> > recipient, you are hereby notified that any review, retransmission,
> > dissemination, distribution, copying or other use of, or taking of any
> > action in reliance upon this information is strictly prohibited. If you
> > have received this communication in error, please contact the sender and
> > delete the material from your computer.
> >
>


Re: PostHTTP Penalize file on HTTP 5xx response

2016-08-31 Thread Andrew Grande
Wasn't HTTP 400 Bad Request meant for that? 500 only means the server
failed, not necessarily due to user input.

Andrew

On Wed, Aug 31, 2016, 10:16 AM Mark Payne  wrote:

> Hey Chris,
>
> I think it is reasonable to penalize when we receive a 500 response. 500
> means Internal Server Error, and it is
> very reasonable to believe that the Internal Server Error occurred due to
> the specific input (i.e., that it may not
> always occur with different input). So penalizing the FlowFile so that it
> can be retried after a little bit is reasonable
> IMO.
>
> When using the prioritizers, any FlowFile that is penalized will not hold
> up other FlowFiles. They are always at the
> bottom of the queue until the penalization expires.
>
> Thanks
> -Mark
>
>
> > On Aug 31, 2016, at 10:06 AM, McDermott, Chris Kevin (MSDU -
> STaTS/StorefrontRemote)  wrote:
> >
> > I wanted to ask if it would be at all sane to have the PostHTTP
> processor penalize a flowfile on a 5xx response.  5xx indicates that the
> request may be good but cannot be handled by the server.  Currently it
> seems the processor routes files eliciting this response to the failure
> output but does not penalize them.  What do we think of adding such
> penalization?
> >
> > On a related note: if a penalized file is routed to a funnel that
> is connected to a processor via a connection with the OldestFlowFileFirst
> prioritizer, will the consumption of files from that connection be blocked
> until the penalization period is over?
> >
> > What I am trying to accomplish is this: I am using PostHTTP to send
> files to a web service that is throttling incoming data by returning a 500
> response.  When that happens I want to slow down files being sent to that
> service.
> >
> > Thanks,
> >
> > Chris McDermott.
> >
> > Remote Business Analytics
> > STaTS/StoreFront Remote
> > HPE Storage
> > Hewlett Packard Enterprise
> > Mobile: +1 978-697-5315
> >
> >
>
>
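
The penalization semantics Mark describes can be modeled with a toy queue: a penalized item stays in the queue but is skipped by poll() until its penalty expires, so it never blocks other items. This is only an illustration of the behavior, not NiFi's actual queue implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class PenaltyQueue<T> {

    private static final class Entry<T> {
        final T value;
        final long expiresAt; // epoch millis when the penalty ends (0 = not penalized)
        Entry(T value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final List<Entry<T>> entries = new ArrayList<>();

    public void offer(T value) { entries.add(new Entry<>(value, 0)); }

    public void offerPenalized(T value, long penaltyMillis) {
        entries.add(new Entry<>(value, System.currentTimeMillis() + penaltyMillis));
    }

    // Returns the first entry whose penalty has expired; penalized entries are
    // skipped (effectively sitting at the bottom of the queue) rather than
    // blocking the entries behind them.
    public T poll() {
        long now = System.currentTimeMillis();
        for (int i = 0; i < entries.size(); i++) {
            if (entries.get(i).expiresAt <= now) {
                return entries.remove(i).value;
            }
        }
        return null; // everything still penalized
    }

    public static void main(String[] args) {
        PenaltyQueue<String> q = new PenaltyQueue<>();
        q.offerPenalized("penalized", 60_000); // skipped for the next 60s
        q.offer("normal");
        System.out.println(q.poll()); // prints normal
        System.out.println(q.poll()); // prints null (only the penalized item remains)
    }
}
```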


Re: State sharing

2016-07-11 Thread Andrew Grande
Sumo,

Something lightweight I devised here, backed by a simple in-memory
concurrent map, for cases when a distributed map is too much:
https://github.com/aperepel/nifi-csv-bundle/tree/master/nifi-csv-processors/src/main/java/org/apache/nifi/processors/lookup

In a cluster, though, it's the true distributed replicated cache that one
must explicitly design for and use. While the state framework is OK, the
default implementation backed by ZooKeeper is not meant for high-speed
concurrent use. Distributed caches are, however.

Andrew
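
For the single-JVM case Andrew mentions, the lookup state can be as simple as a shared ConcurrentHashMap, where computeIfAbsent gives atomic get-or-load semantics without explicit locking. This is a generic sketch, not the code from the linked bundle, and as noted above it does not replicate across cluster nodes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class InMemoryLookup {

    // Shared across processor threads in one JVM; NOT shared across cluster nodes.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Atomically returns the cached value, invoking the loader only on the
    // first access for a given key.
    public String lookup(String key, Function<String, String> loader) {
        return cache.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        InMemoryLookup lookup = new InMemoryLookup();
        // First call loads; the second returns the cached value without reloading.
        System.out.println(lookup.lookup("us", k -> "United States"));  // prints United States
        System.out.println(lookup.lookup("us", k -> "SHOULD NOT LOAD")); // prints United States
    }
}
```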

On Sun, Jul 10, 2016, 10:58 PM Sumanth Chinthagunta <xmlk...@gmail.com>
wrote:

> Thanks Bryan.
> It would be nice to get support for state sharing across different
> processors in the future.
> -Sumo
>
> Sent from my iPhone
>
> > On Jul 10, 2016, at 7:39 PM, Bryan Bende <bbe...@gmail.com> wrote:
> >
> > Sumo,
> >
> > Two different processor instances (different UUIDs) can not share state
> > that is stored through the state manager. For something like this you
> would
> > likely use the distributed map cache.
> >
> > As Andrew mentioned, the state is accessible across the cluster, so a
> > given processor can access the state from any node because the processor
> > will have the same UUID on each node.
> >
> > -Bryan
> >
> >> On Sunday, July 10, 2016, Andrew Grande <apere...@gmail.com> wrote:
> >>
> >> Sumo,
> >>
> >> IIRC there's a scope one selects when setting state. If you invoke with
> >> cluster scope, the state will be stored in ZooKeeper by default. Otherwise
> >> it's just local to this node.
> >>
> >> Andrew
> >>
> >> On Sun, Jul 10, 2016, 10:17 PM Sumanth Chinthagunta <xmlk...@gmail.com
> >> <javascript:;>>
> >> wrote:
> >>
> >>> If I set state from one ExecuteScript processor via stateManager , can
> I
> >>> access same state from other processor ?
> >>> Thanks
> >>> Sumo
> >>>
> >>> Sent from my iPhone
> >
> >
> > --
> > Sent from Gmail Mobile
>


Re: State sharing

2016-07-10 Thread Andrew Grande
Sumo,

IIRC there's a scope one selects when setting state. If you invoke with
cluster scope, the state will be stored in ZooKeeper by default. Otherwise
it's just local to this node.

Andrew

On Sun, Jul 10, 2016, 10:17 PM Sumanth Chinthagunta 
wrote:

> If I set state from one ExecuteScript processor via stateManager , can I
> access same state from other processor ?
> Thanks
> Sumo
>
> Sent from my iPhone


Re: Rollbacks

2016-03-15 Thread Andrew Grande
Devin,

What you're asking for is a contradictory requirement. One trades individual
message transactional control (and the overhead it requires) for the higher
throughput of micro-batching (but lesser control). In short, you can't expect
to roll back a message and not affect the whole batch.

However, you could 'commit' the batch as received by your processor and take on
the responsibility of storing, tracking, and committing/rolling back individual
messages yourself for the downstream connection. But then, why?

In general, one should leverage NiFi 'Scheduling' tab and have the 
micro-batching aspect controlled via the framework. Unless you really really 
have a very good reason to do it yourself.

Hope this helps,
Andrew
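
The trade-off described above can be illustrated with a generic batch loop: the batch is accepted as a whole, and a per-item failure routes only that item to a failure bucket rather than rolling back everything. This is plain Java for illustration — in NiFi the analogue is transferring individual FlowFiles to a failure relationship instead of calling session.rollback().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class BatchProcessor {

    // Outcome of a batch: successfully transformed results plus the raw
    // items that failed (the "failure relationship" of this sketch).
    public record Result<T, R>(List<R> succeeded, List<T> failed) {}

    // Processes every item; a throwing item lands in 'failed' while the rest
    // of the batch still completes. The batch as a whole is never rolled back.
    public static <T, R> Result<T, R> processBatch(List<T> batch, Function<T, R> work) {
        List<R> succeeded = new ArrayList<>();
        List<T> failed = new ArrayList<>();
        for (T item : batch) {
            try {
                succeeded.add(work.apply(item));
            } catch (RuntimeException e) {
                failed.add(item); // route to "failure", don't abort the batch
            }
        }
        return new Result<>(succeeded, failed);
    }

    public static void main(String[] args) {
        Result<String, Integer> r =
                processBatch(List.of("1", "oops", "3"), Integer::parseInt);
        System.out.println(r.succeeded() + " / failed: " + r.failed());
        // prints [1, 3] / failed: [oops]
    }
}
```

This mirrors the pattern Devin is after below: get the list, handle each item's error individually, and transfer failures to their own relationship rather than rolling back the session.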




On 3/7/16, 5:00 PM, "Devin Fisher"  wrote:

>Question about rollbacks. I have a processor that grabs a list of
>FlowFiles from session.get(100) and then processes each flow file one at
>a time.  If there is an error with a single FlowFile, I want to be able
>to roll it back (and only this failed FlowFile) and transfer it to the
>FAILED relationship. But reading the javadoc for ProcessSession I don't get
>the sense that I can do that.
>
>Is my workflow wrong, should I only get one at a time from the session and
>commit after each one?
>
>Devin


Re: Content Repo Large.. Archive in there?

2015-10-23 Thread Andrew Grande
Ryan,

./conf/archive is to create a snapshot of your entire flow, not the content 
repository data. See the attached screenshot (Settings menu on the right).

Andrew




On 10/23/15, 12:47 PM, "ryan.andrew.hendrick...@gmail.com on behalf of Ryan H" 
 
wrote:

>Hi,
>   I'm noticing my Content Repo growing large.  There's a number of files...
>
>content_repo/837/archive/144...-837
>
>   Is this new in 0.3.0?  My conf file says any archiving should be going
>into ./conf/archive, but i don't see anything in there.
>
>Thanks,
>Ryan


Re: Content Repo Large.. Archive in there?

2015-10-23 Thread Andrew Grande
Attachments don't go through, view at imagebin: http://ibin.co/2K3SwR0z8yWX




On 10/23/15, 12:52 PM, "Andrew Grande" <agra...@hortonworks.com> wrote:

>Ryan,
>
>./conf/archive is to create a snapshot of your entire flow, not the content 
>repository data. See the attached screenshot (Settings menu on the right).
>
>Andrew
>
>
>
>
>On 10/23/15, 12:47 PM, "ryan.andrew.hendrick...@gmail.com on behalf of Ryan H" 
><ryan.andrew.hendrick...@gmail.com on behalf of rhendrickson.w...@gmail.com> 
>wrote:
>
>>Hi,
>>   I'm noticing my Content Repo growing large.  There's a number of files...
>>
>>content_repo/837/archive/144...-837
>>
>>   Is this new in 0.3.0?  My conf file says any archiving should be going
>>into ./conf/archive, but i don't see anything in there.
>>
>>Thanks,
>>Ryan


Re: Newbie Dev Contribution : Questions

2015-10-15 Thread Andrew Grande
Venky,

Thanks for offering to help. The best way is to use the extensive
testing framework in NiFi. E.g., take a look at
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/test/java/org/apache/nifi/processors/standard/TestRouteOnAttribute.java

Andrew




On 10/15/15, 3:26 PM, "Venkatesh Sellappa"  wrote:

>Devs, 
>
>Firstly, I wanted to commend the fantastic product and the great
>documentation on the website.
>The Contributor's guide and the User Guide is especially good.
>
>I have been going through the code and trying to see if I can contribute on
>some of the newbie, beginner Jiras.
>
>As part of doing this, I ran into one particular blind spot:
>
>Is there a way to debug/run the individual components of the project in an
>IDE? This was not immediately obvious from the documentation.
>
>What's the best place to look for this ?
>
>Venky.
>
>
>
>
>
>--
>View this message in context: 
>http://apache-nifi-developer-list.39713.n7.nabble.com/Newbie-Dev-Contribution-Questions-tp3118.html
>Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
>


Re: Ingest Original data from External system by data's dependent condition

2015-10-13 Thread Andrew Grande
A typical pattern/workaround for this situation was to copy, e.g., the JSON _in
full_ into an attribute, leaving the payload in a binary format. But, as you
can imagine, it's not ideal: FlowFile memory and disk pressure will rise
significantly, duplicating data already held in the content repo.

Andrew




On 10/13/15, 9:21 AM, "Joe Witt"  wrote:

>Hello
>
>Is the only reason for converting from AVRO or whatever to JSON so
>that you can extract attributes?
>
>I recommend not converting the data simply so that you can do that.  I
>recommend building processes to extract attributes from the raw.  I
>believe we have JIRA's targeted for the next release to do this for
>AVRO just like JSON.  If you have other custom formats in mind i
>recommend building 'ExtractXYZAttributes'.
>
>There is no mechanism in play today where we convert from format A to
>B and then in the resulting B we keep the original A hanging around
>that object.  You can do this of course by making archive/container
>formats to hold both but this is also not recommended.
>
>Does this make sense?
>
>Thanks
>Joe
>
>On Tue, Oct 13, 2015 at 9:06 AM, Oleg Zhurakousky
> wrote:
>> Sorry, I meant to say that you have to enrich the original file with a 
>> correlation attribute, otherwise there is nothing to correlate on.
>> I am not sure if NiFi has any implementation of ContentEnricher (EIP), 
>> perhaps UpdateAttribute will do the trick.
>>
>> Oleg
>>
>>> On Oct 13, 2015, at 8:21 AM, yejug  wrote:
>>>
>>> Hi Oleg
>>>
> >>> Thanks for the response. Maybe I'm missing something (I cannot find your
> >>> image =)), but your suggestion doesn't seem appropriate.
> >>>
> >>> Two types of flowFiles arrive at the MergeContent processor:
> >>> 1) flow files with the original content (Avro) but without a populated
> >>> "correlation" attribute, coming directly from GetKafka
> >>> 2) flow files with parsed content (JSON) and a populated
> >>> "correlation" attribute
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context: 
>>> http://apache-nifi-developer-list.39713.n7.nabble.com/Ingest-Original-data-from-External-system-by-data-s-dependent-condition-tp3093p3096.html
>>> Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
>>>
>>
>