Re: "External" extensions

2015-11-02 Thread xmlking
In the Node.js/npm world, each module has a package.json that declares which
Node version and which external modules it depends on.
Similarly, I am thinking NAR modules could declare which versions of the JVM
and NiFi they depend on, as well as which other modules they depend on. An NPM
(NiFi Package Manager) could then warn users if the module they are trying to
install doesn't match their runtime.
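
To make the idea concrete, here is a minimal Python sketch of the kind of check such a package manager might run before installing a module. The manifest layout (the "engines" and "dependencies" keys), the module names, the function name check_compatibility, and the simple "at least this version" rule are all assumptions made for illustration; none of this is an existing NiFi format or API.

```python
# Hypothetical NAR manifest, loosely modelled on package.json.
# Every key name here is an assumption, not an existing NiFi format.
MANIFEST = {
    "name": "nifi-example-nar",
    "version": "1.0.0",
    "engines": {"jvm": "1.7", "nifi": "0.3.0"},  # minimum versions required
    "dependencies": {"nifi-standard-services-api-nar": "0.3.0"},
}


def parse_version(text):
    """Turn '0.3.0' into (0, 3, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_problems(requirements, available):
    """Compare required minimum versions against what is available."""
    problems = []
    for name, minimum in requirements.items():
        actual = available.get(name)
        if actual is None or parse_version(actual) < parse_version(minimum):
            problems.append(f"{name}: requires >= {minimum}, found {actual}")
    return problems


def check_compatibility(manifest, runtime, installed):
    """Return a list of warnings; an empty list means the module fits."""
    warnings = find_problems(manifest.get("engines", {}), runtime)
    warnings += find_problems(manifest.get("dependencies", {}), installed)
    return warnings
```

Calling check_compatibility(MANIFEST, {"jvm": "1.6", "nifi": "0.3.0"}, {}) would report the outdated JVM and the missing dependency, which is the kind of warning described above.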

-Sumo




> On Nov 1, 2015, at 2:14 PM, Oleg Zhurakousky  
> wrote:
> 
> Well the question still remains unanswered, what relationship those projects 
> have to ASF distribution of NiFi? I seriously doubt that anyone on this list 
> suggests that all have to be part of the release. And if they are not then 
> they are just individual projects managed in/out of ASF, right?
> 
> Sent from my iPhone
> 
>> On Nov 1, 2015, at 17:10, Adam Estrada  wrote:
>> 
>> The elasticsearch project has a really cool plugin utility that 
>> automatically downloads and builds plugins from GitHub, BitBucket, etc...
>> 
>> Has anyone taken a look at that?
>> 
>> A
>> 
>> Sent from my iPhone
>> 
>>> On Nov 1, 2015, at 3:54 PM, Benson Margulies  wrote:
>>> 
>>> ASF policy; a PMC should not be in the business of creating and
>>> maintaining code 'somewhere else' and/or under another license, for
>>> fear of confusion.
>>> 
>>> Gray area -- some PMC members can be in that business, as long as the
>>> boundary is clear.
>>> 
>>> There was a thing called 'apache extras' for this. Unfortunately, it
>>> was hosted as part of google code, which is defunct. As far as I know,
>>> various plans to replace it have not come to fruition, but I might be
>>> behind.
>>> 
>>> 
>>> 
>>> 
>>> 
 On Sun, Nov 1, 2015 at 3:04 PM,   wrote:
 How about maintaining a registry like npm https://www.npmjs.com or 
 https://github.com/jspm/registry where individuals host their modules on 
 github and users can discover them via registry?
 
 Sent from my iPad
 
> On Nov 1, 2015, at 10:35 AM, Joe Witt  wrote:
> 
> " but raises several questions, all pertaining to the relationship of
> this project with ASF, its ownership and control."
> 
> ...that is what I'm struggling to respond to as well.
> 
> It feels like the right path within the ASF is to establish child
> projects of Apache NiFi.  I think we knew we needed to do this anyway
> as we've mentioned before.  It just might be time now...
> 
> On Sun, Nov 1, 2015 at 1:34 PM, Oleg Zhurakousky
>  wrote:
>> Tony, plenty of opinion but so are the questions/concerns.
>> Managing it on GitHub is perfect, but raises several questions, all 
>> pertaining to the relationship of this project with ASF, its ownership 
>> and control.
>> Perhaps some PMCs on the list can shed some light as to how it could be 
>> done?
>> 
>> Cheers
>> Oleg
>> 
>> Sent from my iPhone
>> 
>>> On Nov 1, 2015, at 13:08, Adam Estrada  wrote:
>>> 
>>> This has been suggested before. It's a great idea!!! I suggest creating 
>>> a repo on github for NiFi-Processors or something like that. There are 
>>> many more folks searching through GitHub than on the Apache wikis, IMO. 
>>> This will inevitably help spread the word...
>>> 
>>> A
>>> 
>>> Sent from my iPhone
>>> 
 On Nov 1, 2015, at 12:55 PM, Tony Kurc  wrote:
 
 Not very strong opinions on this?
> On Oct 30, 2015 10:53 AM, "Joe Witt"  wrote:
> 
> Tony,
> 
> I completely agree we should do this.  A quick github search reveals
> there are some nice utilities/processors folks have built for NiFi but
> for which they're not necessarily going to submit them as PRs.  We
> should link to these as much as possible but we should also help folks
> understand these aren't 'apache' things and are not of the Apache NiFi
> community directly but they are good for users and developers to know
> about.
> 
> Perhaps a wiki page linking to these is good provided we have the
> above sort of disclaimer and a healthy recognition such references
> will become stale...
> 
> Thanks
> Joe
> 
>> On Fri, Oct 30, 2015 at 10:48 AM, Tony Kurc  wrote:
>> All,
>> I wanted to start a conversation about projects that are good for 
>> people
>> using or developing NiFi, but either can't or don't belong in the 
>> source
>> tree. This could be due to licensing issues (for example not 
>> compatible
> (or
>> not yet determined if it is compatible (GPL [1])) with the Apache

RE: Next release?

2015-11-02 Thread Rick Braddy
Joe,

This reminds me... are there any entry or exit criteria (from a defects 
perspective) established for NiFi releases?  In other words, what are the 
criteria for determining when the code is ready for release and production use?

Thanks
Rick

-----Original Message-----
From: Joe Witt [mailto:joe.w...@gmail.com] 
Sent: Monday, November 02, 2015 8:56 AM
To: dev@nifi.apache.org
Subject: Re: Next release?

Team...we def need to address or move a good bit of ticketage to move towards 
an RC.  It isn't critical we do it 'now' but we should strive for 6 to 8 week 
release cycles in my view.

We should also decouple the framework/app releases from those of processors in 
my view but we can kick off another thread for discussion there.

Thanks
Joe
On Oct 29, 2015 11:50 AM, "Joe Witt"  wrote:

> mike - that is good to know.  Look forward to seeing the ticket.  If 
> you can put the thread dumps up that would obviously be awesome though 
> I recognize why that is non-trivial.
>
> Thanks
> Joe
>
> On Thu, Oct 29, 2015 at 11:18 AM, Michael Moser 
> wrote:
> > All,
> >
> > On an extremely busy cluster that I work with, I've noticed some 
> > thread starvation issues on the NCM.  It manifests as the "spinning 
> > wheel of death" when refreshing the NiFi UI.  Thread and heap dumps 
> > point to the WebClusterManager in the framework. I've made some 
> > small quick-win
> changes
> > that I'm testing now, but would appreciate feedback from the community.
> I
> > will write up a ticket shortly that explains it, but would like to 
> > see it in 0.4.0 if reviewers agree with the changes.
> >
> > Thanks,
> > -- Mike
> >
> >
> > On Thu, Oct 29, 2015 at 10:04 AM, Joe Witt  wrote:
> >
> >> I haven't done it in a while.  Am happy to take it.  We need to 
> >> scrub
> the
> >> items assigned to 040 and pick our must haves ...
> >> On Oct 29, 2015 9:20 AM, "Sean Busbey"  wrote:
> >>
> >> > Hi Folks!
> >> >
> >> > Tomorrow marks 6 weeks since the 0.3.0 release. Any one up for 
> >> > starting a release candidate?
> >> >
> >> > --
> >> > Sean
> >> >
> >>
>


RE: Next release?

2015-11-02 Thread Joe Witt
The current process is outlined in our release guide.  The main idea is that
everyone who wishes to participate in release validation does so from the RC.
Unit tests are of course run by the builds, but we rely on people power for
system-level testing, and that is part of the validation folks should do.  We
obviously can't test every feature and environment with this model.  The more
CI we can get established, the better we can do, but we still have much room
for improvement in validating releases.
On Nov 2, 2015 10:00 AM, "Rick Braddy"  wrote:

> Joe,
>
> This reminds me... are there any entry or exit criteria (from a defects
> perspective) established for NiFi releases?  In other words, what is the
> criteria for determining when the code is ready for release and production
> use?
>
> Thanks
> Rick
>
> -Original Message-
> From: Joe Witt [mailto:joe.w...@gmail.com]
> Sent: Monday, November 02, 2015 8:56 AM
> To: dev@nifi.apache.org
> Subject: Re: Next release?
>
> Team...we def need to address or move a good bit of ticketage to move
> towards an RC.  It isn't critical we do it 'now' but we should strive for 6
> to 8 week release cycles in my view.
>
> We should also decouple the framework/app releases from those of
> processors in my view but we can kick off another thread for discussion
> there.
>
> Thanks
> Joe
> On Oct 29, 2015 11:50 AM, "Joe Witt"  wrote:
>
> > mike - that is good to know.  Look forward to seeing the ticket.  If
> > you can put the thread dumps up that would obviously be awesome though
> > I recognize why that is non-trivial.
> >
> > Thanks
> > Joe
> >
> > On Thu, Oct 29, 2015 at 11:18 AM, Michael Moser 
> > wrote:
> > > All,
> > >
> > > On an extremely busy cluster that I work with, I've noticed some
> > > thread starvation issues on the NCM.  It manifests as the "spinning
> > > wheel of death" when refreshing the NiFi UI.  Thread and heap dumps
> > > point to the WebClusterManager in the framework. I've made some
> > > small quick-win
> > changes
> > > that I'm testing now, but would appreciate feedback from the community.
> > I
> > > will write up a ticket shortly that explains it, but would like to
> > > see it in 0.4.0 if reviewers agree with the changes.
> > >
> > > Thanks,
> > > -- Mike
> > >
> > >
> > > On Thu, Oct 29, 2015 at 10:04 AM, Joe Witt  wrote:
> > >
> > >> I haven't done it in a while.  Am happy to take it.  We need to
> > >> scrub
> > the
> > >> items assigned to 040 and pick our must haves ...
> > >> On Oct 29, 2015 9:20 AM, "Sean Busbey"  wrote:
> > >>
> > >> > Hi Folks!
> > >> >
> > >> > Tomorrow marks 6 weeks since the 0.3.0 release. Any one up for
> > >> > starting a release candidate?
> > >> >
> > >> > --
> > >> > Sean
> > >> >
> > >>
> >
>


Re: LogAttribute - Sending that output to a custom logger?

2015-11-02 Thread Joe Witt
Mark

All fair points.  Can you please point out which processor docs
specifically should be better?  Let's fix 'em.  You will quickly lose that
new-user vibe and stop noticing what needs to improve.  We need to
make the new user experience awesome.

Thanks
Joe
On Nov 2, 2015 10:08 AM, "Mark Petronic"  wrote:

> My primary use is for understanding Nifi. I like to direct various
> processors output into both their logical next processor stage as well as
> into a log attribute processor. Then I tail the Nifi app log file and watch
> what happens - in real time. I do not intend to use this for long term log
> retention. I agree that providence is the right choice for that. So, the
> only reason I wanted to allow configuration of a custom logger was simply
> to isolate all the attribute-rich logging from the normal logging because I
> was primarily interested in the attribute flows as a way to (a) better
> understand what a processor emits because, frankly, the documentation of
> some of the processors is very sparse. So, I learn imperatively, so to
> speak. I say that as a new user. I feel I should be able to get a pretty
> good understanding of a processor by reading the usage. But I am finding
> that the documentation, in some cases, is more like what I like to refer to
> as, "note to self" documentation. Great if you are the guy who wrote the
> processor with those "insights" - not so great if you are not the
> developer. So, then I need to dig up the code. That should not be needed as
> the first step of understanding a processor as a new user. There is some
> well documented processors but not all are, IMHO. (b) Validate my flows
> with some test data and verify attribute values look correct and routing is
> happen on them as expected, etc. Again, easier, IMO, to see in the logs
> than digging into the providence data.
>
> Maybe this is just a good "private" feature for me so maybe I will just
> create a private version to use on my own. I already have it working but
> would need more polish to achieve PR status. Maybe this is the sort of
> thing that others would not find beneficial? That's fine. There are others
> ways I can contribute in the future. I'm still having fun! :)
>
> On Sun, Nov 1, 2015 at 12:41 PM, Joe Witt  wrote:
>
> > Mark Petronic,
> >
> > I share Payne's perspective on this.  But I'd also like to work with
> > you to better understand the workflow.  For those of us that have used
> > this tool for a long time there is a lot we take for granted from a
> > new user perspective.  We believe the provenance feature to provide a
> > far superior option to understanding how an item went through the
> > system and the timing and what we knew when and so on.  But, it would
> > be great to understand it from your perspective as someone learning
> > NiFi.  Not meaning to take away from your proposed contrib - that
> > would be great too.  Just want to see if the prov user experience
> > solves what you're looking for and if not can we make it do that.
> >
> > Thanks
> > Joe
> >
> > On Sun, Nov 1, 2015 at 11:23 AM, Mark Payne 
> wrote:
> > > Mark,
> > >
> > > To make sure that I understand what you're proposing, you want to add a
> > property to
> > > LogAttribute that allows users to provide a custom logger name?
> > >
> > > If that is indeed what you are suggesting then I think it's a great
> idea.
> > >
> > > That being said, in practice I rarely ever use LogAttribute and we even
> > considered removing
> > > it from the codebase before we open sourced, because the Data
> Provenance
> > provides a
> > > much better view of what's going on to debug your flows.
> > >
> > > I know you're pretty new to NiFi, so if you've not yet had a chance to
> > play with the Provenance,
> > > you can see the section in the User Guide at
> >
> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data-provenance
> > <
> >
> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data-provenance
> > >
> > >
> > > If you're interested in updating the LogAttribute processor, though,
> > we'd be happy to have
> > > that contribution added, as it does make the Processor more usable.
> > >
> > > Thanks
> > > -Mark
> > >
> > >> On Oct 31, 2015, at 12:35 PM, Mark Petronic 
> > wrote:
> > >>
> > >> From the code, it appears it cannot be done as the attribute logging
> > >> goes the same getLogger() instance as the normal nifi-app traces. Has
> > >> anyone considered making that configurable, maybe allowing you do
> > >> define a different logger name for LogAttribute then creating that
> > >> logger definition in log back conf allowing flexibility? I'm using
> > >> attribute logging heavily as I try to better learn/debug Nifi (it give
> > >> you a nice 'under the hood' view of the flow) and build up some flows
> > >> and feel it would be beneficial to be able to capture the LogAttribte
> > >> message by themselves for 

Re: LogAttribute - Sending that output to a custom logger?

2015-11-02 Thread Aldrin Piri
We greatly appreciate contributions.  Your proposed approach sounds great,
and if you are willing to give us a few cycles pointing out, and optionally
correcting, the items that need improvement, we will certainly incorporate
them.

Thanks!

On Mon, Nov 2, 2015 at 1:28 PM, Mark Petronic 
wrote:

> I'm sort of in the camp of "don't come with a complaint if you don't come
> with a solution" and hesitated to even raise the documentation comment
> without just fixing it myself. How about this, I just do some updates on
> some processor docs myself and use that as my first contribution to work
> through the process of committing to this project?
>
> But, to give you one quick example, EvaluateJSONPath (which, btw has pretty
> good docs otherwise) does not mention HOW to extract the JSON you are
> interested in. I had to look at the code to figure out it used this:
> https://github.com/jayway/JsonPath. Ok, that was not hard, I admit, but,
> as
> a user, should I need to look at the code for such information? I submit,
> no. Me personally, I like to dig into the code. So, this is more a comment
> on "overall goodness" for the general new user experience.
>
> I agree with your assessment of 'new user vibe' as I am starting to not
> notice it as much. lol
>
> On Mon, Nov 2, 2015 at 10:15 AM, Joe Witt  wrote:
>
> > Mark
> >
> > All fair points.  Can you please point out which processor docs
> > specifically should be better.  Let's fix em..you will quickly lose that
> > new user vibe and not notice what needs to improve as much.  We need to
> > make the new user experience awesome.
> >
> > Thanks
> > Joe
> > On Nov 2, 2015 10:08 AM, "Mark Petronic"  wrote:
> >
> > > My primary use is for understanding Nifi. I like to direct various
> > > processors output into both their logical next processor stage as well
> as
> > > into a log attribute processor. Then I tail the Nifi app log file and
> > watch
> > > what happens - in real time. I do not intend to use this for long term
> > log
> > > retention. I agree that providence is the right choice for that. So,
> the
> > > only reason I wanted to allow configuration of a custom logger was
> simply
> > > to isolate all the attribute-rich logging from the normal logging
> > because I
> > > was primarily interested in the attribute flows as a way to (a) better
> > > understand what a processor emits because, frankly, the documentation
> of
> > > some of the processors is very sparse. So, I learn imperatively, so to
> > > speak. I say that as a new user. I feel I should be able to get a
> pretty
> > > good understanding of a processor by reading the usage. But I am
> finding
> > > that the documentation, in some cases, is more like what I like to
> refer
> > to
> > > as, "note to self" documentation. Great if you are the guy who wrote
> the
> > > processor with those "insights" - not so great if you are not the
> > > developer. So, then I need to dig up the code. That should not be
> needed
> > as
> > > the first step of understanding a processor as a new user. There is
> some
> > > well documented processors but not all are, IMHO. (b) Validate my flows
> > > with some test data and verify attribute values look correct and
> routing
> > is
> > > happen on them as expected, etc. Again, easier, IMO, to see in the logs
> > > than digging into the providence data.
> > >
> > > Maybe this is just a good "private" feature for me so maybe I will just
> > > create a private version to use on my own. I already have it working
> but
> > > would need more polish to achieve PR status. Maybe this is the sort of
> > > thing that others would not find beneficial? That's fine. There are
> > others
> > > ways I can contribute in the future. I'm still having fun! :)
> > >
> > > On Sun, Nov 1, 2015 at 12:41 PM, Joe Witt  wrote:
> > >
> > > > Mark Petronic,
> > > >
> > > > I share Payne's perspective on this.  But I'd also like to work with
> > > > you to better understand the workflow.  For those of us that have
> used
> > > > this tool for a long time there is a lot we take for granted from a
> > > > new user perspective.  We believe the provenance feature to provide a
> > > > far superior option to understanding how an item went through the
> > > > system and the timing and what we knew when and so on.  But, it would
> > > > be great to understand it from your perspective as someone learning
> > > > NiFi.  Not meaning to take away from your proposed contrib - that
> > > > would be great too.  Just want to see if the prov user experience
> > > > solves what you're looking for and if not can we make it do that.
> > > >
> > > > Thanks
> > > > Joe
> > > >
> > > > On Sun, Nov 1, 2015 at 11:23 AM, Mark Payne 
> > > wrote:
> > > > > Mark,
> > > > >
> > > > > To make sure that I understand what you're proposing, you want to
> > add a
> > > > property to
> > > > > LogAttribute that allows 

Re: LogAttribute - Sending that output to a custom logger?

2015-11-02 Thread Adam Taft
This thread has forked into two different conversations:  1. improvements
to LogAttribute processor; 2. improvements to processor documentation.

1)  re: improvements to LogAttribute - we already have NIFI-67 [1] that
suggests a number of improvements to LogAttribute.  One of these is the use
of a custom name for the logger so that logback rules can be written
against that name.
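
As a rough illustration of the custom-logger idea, the sketch below uses Python's standard logging module rather than logback, purely as an analogy. A dedicated, named logger gets its own handler and does not propagate to the root logger, so attribute output stays out of the main application log. The logger name "nifi.attributes" and the helper log_attributes are hypothetical and are not part of NiFi or NIFI-67.

```python
import io
import logging

# Stand-in for the main application log.
app_stream = io.StringIO()
root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(logging.StreamHandler(app_stream))

# Dedicated logger with its own destination; the name is hypothetical.
attr_stream = io.StringIO()
attr_logger = logging.getLogger("nifi.attributes")
attr_logger.setLevel(logging.INFO)
attr_logger.addHandler(logging.StreamHandler(attr_stream))
attr_logger.propagate = False  # keep attribute output out of the app log


def log_attributes(attributes):
    """Write one key=value line per attribute to the dedicated logger."""
    for key in sorted(attributes):
        attr_logger.info("%s=%s", key, attributes[key])


log_attributes({"filename": "a.txt", "uuid": "1234"})
logging.getLogger("app").info("normal application message")
```

Because rules can be attached to the logger's name, the attribute lines can be routed, filtered, or rotated independently of everything else.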

While the provenance engine is great for many scenarios, in my opinion, it
doesn't replace the need for true text-based logging.  The tooling for log
processing is very mature and there's no ability to "grep" a provenance
repository, migrate or offload provenance logs into deep storage, store log
events into a database, or do any other cool syslogd or logback type
things.  Being able to capture and log a flowfile at the exact right place
in the data flow and processing it using the command line is an extremely
valuable tool in the toolkit.

For a long time, I've wanted to work on at least some of the things
mentioned in NIFI-67 and will hopefully get to do so, time permitting.  Having
a custom "name" for the LogAttribute processor seems like a no-brainer.
Contributions for this would definitely be welcome!

2) improvements to processor documentation - I agree. Even as a somewhat
seasoned NiFi user, I still have a hard time reading and understanding the
processor documentation.  I often do exactly what Mark P. describes and
go directly to the source instead.  Any contribution towards better
processor documentation is greatly appreciated!

[1] https://issues.apache.org/jira/browse/NIFI-67


On Mon, Nov 2, 2015 at 1:54 PM, Aldrin Piri  wrote:

> We greatly appreciate contributions.  Your prescribed approach sounds great
> and if you are willing to give us a few cycles pointing out, and optionally
> correcting, the items that are in need of improvement, we will certainly
> incorporate.
>
> Thanks!
>
> On Mon, Nov 2, 2015 at 1:28 PM, Mark Petronic 
> wrote:
>
> > I'm sort of in the camp of "don't come with a complaint if you don't come
> > with a solution" and hesitated to even raise the documentation comment
> > without just fixing it myself. How about this, I just do some updates on
> > some processor docs myself and use that as my first contribution to work
> > through the process of committing to this project?
> >
> > But, to give you one quick example, EvaluateJSONPath (which, btw has
> pretty
> > good docs otherwise) does not mention HOW to extract the JSON you are
> > interested in. I had to look at the code to figure out it used this:
> > https://github.com/jayway/JsonPath. Ok, that was not hard, I admit, but,
> > as
> > a user, should I need to look at the code for such information? I submit,
> > no. Me personally, I like to dig into the code. So, this is more a
> comment
> > on "overall goodness" for the general new user experience.
> >
> > I agree with your assessment of 'new user vibe' as I am starting to not
> > notice it as much. lol
> >
> > On Mon, Nov 2, 2015 at 10:15 AM, Joe Witt  wrote:
> >
> > > Mark
> > >
> > > All fair points.  Can you please point out which processor docs
> > > specifically should be better.  Let's fix em..you will quickly lose
> that
> > > new user vibe and not notice what needs to improve as much.  We need to
> > > make the new user experience awesome.
> > >
> > > Thanks
> > > Joe
> > > On Nov 2, 2015 10:08 AM, "Mark Petronic" 
> wrote:
> > >
> > > > My primary use is for understanding Nifi. I like to direct various
> > > > processors output into both their logical next processor stage as
> well
> > as
> > > > into a log attribute processor. Then I tail the Nifi app log file and
> > > watch
> > > > what happens - in real time. I do not intend to use this for long
> term
> > > log
> > > > retention. I agree that providence is the right choice for that. So,
> > the
> > > > only reason I wanted to allow configuration of a custom logger was
> > simply
> > > > to isolate all the attribute-rich logging from the normal logging
> > > because I
> > > > was primarily interested in the attribute flows as a way to (a)
> better
> > > > understand what a processor emits because, frankly, the documentation
> > of
> > > > some of the processors is very sparse. So, I learn imperatively, so
> to
> > > > speak. I say that as a new user. I feel I should be able to get a
> > pretty
> > > > good understanding of a processor by reading the usage. But I am
> > finding
> > > > that the documentation, in some cases, is more like what I like to
> > refer
> > > to
> > > > as, "note to self" documentation. Great if you are the guy who wrote
> > the
> > > > processor with those "insights" - not so great if you are not the
> > > > developer. So, then I need to dig up the code. That should not be
> > needed
> > > as
> > > > the first step of understanding a processor as a new user. There is
> > some
> > > > well documented processors but not all 

[GitHub] nifi pull request: NIFI-1051 Allowed FileSystemRepository to skip ...

2015-11-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/111




Re: LogAttribute - Sending that output to a custom logger?

2015-11-02 Thread davidrsm...@btinternet.com
Hi

Where I work we have created an attribute logger of our own. It is a fairly 
simple affair which uses a regex to determine which attributes to log, and 
writes them as key/value pairs to a file whose location is determined by a 
user property. I'm happy to put this out there if anyone is interested.
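
For readers who want a feel for how small such a logger can be, here is a minimal Python sketch of the approach described above. It is not the actual processor: the function name write_matching_attributes, the key=value line format, and the choice to match the regex against the whole attribute name are assumptions made for illustration.

```python
import re


def write_matching_attributes(attributes, pattern, path):
    """Append attributes whose names match `pattern` to `path`.

    Each selected attribute is written as one key=value line. The
    pattern must match the whole attribute name (an assumption).
    """
    regex = re.compile(pattern)
    selected = {k: v for k, v in attributes.items() if regex.fullmatch(k)}
    with open(path, "a", encoding="utf-8") as handle:
        for key in sorted(selected):
            handle.write(f"{key}={selected[key]}\n")
    return selected
```

Here the `path` argument plays the role of the user property that sets the file location, and `pattern` plays the role of the attribute-selection regex.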


- Reply message -
From: "Adam Taft" 
Date: Mon, Nov 2, 2015 19:23
Subject: LogAttribute - Sending that output to a custom logger?
To: 

This thread has forked into two different conversations:  1. improvements
to LogAttribute processor; 2. improvements to processor documentation.

1)  re: improvements to LogAttribute - we already have NIFI-67 [1] that
suggests a number of improvements to LogAttribute.  One of these is the use
of a custom name for the logger so that logback rules can be written
against that name.

While the provenance engine is great for many scenarios, in my opinion, it
doesn't replace the need for true text-based logging.  The tooling for log
processing is very mature and there's no ability to "grep" a provenance
repository, migrate or offload provenance logs into deep storage, store log
events into a database, or do any other cool syslogd or logback type
things.  Being able to capture and log a flowfile at the exact right place
in the data flow and processing it using the command line is an extremely
valuable tool in the toolkit.

For a long time, I've wanted to work on at least some of the things
mentioned in NIFI-67 and will hopefully get to do so time willing.  Having
a custom "name" for the LogAttribute processor seems like a no-brainer.
Contributions for this should definitely be welcome!

2) improvements to processor document - I agree, even as a somewhat
seasoned NIFI user, I still have a hard time reading and understanding the
processor documentation.  I often do exactly what Mark P. suggests and
instead go directly to the source.  Any contribution towards better
processor documentation is greatly appreciated!

[1] https://issues.apache.org/jira/browse/NIFI-67


On Mon, Nov 2, 2015 at 1:54 PM, Aldrin Piri  wrote:

> We greatly appreciate contributions.  Your prescribed approach sounds great
> and if you are willing to give us a few cycles pointing out, and optionally
> correcting, the items that are in need of improvement, we will certainly
> incorporate.
>
> Thanks!
>
> On Mon, Nov 2, 2015 at 1:28 PM, Mark Petronic 
> wrote:
>
> > I'm sort of in the camp of "don't come with a complaint if you don't come
> > with a solution" and hesitated to even raise the documentation comment
> > without just fixing it myself. How about this, I just do some updates on
> > some processor docs myself and use that as my first contribution to work
> > through the process of committing to this project?
> >
> > But, to give you one quick example, EvaluateJSONPath (which, btw has
> pretty
> > good docs otherwise) does not mention HOW to extract the JSON you are
> > interested in. I had to look at the code to figure out it used this:
> > https://github.com/jayway/JsonPath. Ok, that was not hard, I admit, but,
> > as
> > a user, should I need to look at the code for such information? I submit,
> > no. Me personally, I like to dig into the code. So, this is more a
> comment
> > on "overall goodness" for the general new user experience.
> >
> > I agree with your assessment of 'new user vibe' as I am starting to not
> > notice it as much. lol
> >
> > On Mon, Nov 2, 2015 at 10:15 AM, Joe Witt  wrote:


Re: Common data exchange formats and tabular data

2015-11-02 Thread Matthew Burgess
Hello all,

I am new to the NiFi community but I have a good amount of experience with
ETL tools and applications that process lots of tabular data. In my
experience, JSON is only useful as the common format for tabular data if it
has a "flat" schema, in which case there aren't any advantages for JSON over
other formats such as CSV. However, I've seen lots of "CSV" files that don't
seem to adhere to any standard, so I would presume NiFi would need to require
a strict format such as RFC 4180 (http://www.rfc-base.org/txt/rfc-4180.txt).

However CSV isn't a natural way to express the schema of the rows, so JSON
or YAML is probably a better choice. There's a format called Tabular Data
Package that combines CSV and JSON for tabular data serialization:
http://dataprotocols.org/tabular-data-package/
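
To show how the two pieces fit together, here is a small Python sketch that pairs a JSON descriptor with RFC 4180 style CSV text and uses the declared field types to cast each value. The descriptor is a simplified one in the spirit of Tabular Data Package, not a complete or validated instance of that spec, and the sample data, the CASTS table, and the read_rows helper are assumptions for illustration.

```python
import csv
import io
import json

# Simplified descriptor: the schema lives in JSON, the rows live in CSV.
DESCRIPTOR = json.loads("""
{
  "name": "example",
  "resources": [{
    "path": "data.csv",
    "schema": {"fields": [
      {"name": "id", "type": "integer"},
      {"name": "name", "type": "string"},
      {"name": "score", "type": "number"}
    ]}
  }]
}
""")

# CRLF line endings and a quoted field containing a comma, as in RFC 4180.
CSV_TEXT = 'id,name,score\r\n1,"Smith, Ann",9.5\r\n2,Bob,7\r\n'

# Only the three types used above are handled in this sketch.
CASTS = {"integer": int, "number": float, "string": str}


def read_rows(descriptor, csv_text):
    """Yield each CSV row as a dict with values cast per the schema."""
    fields = descriptor["resources"][0]["schema"]["fields"]
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    for record in reader:
        yield {f["name"]: CASTS[f["type"]](record[f["name"]]) for f in fields}
```

The CSV stays trivially streamable row by row, while the JSON side carries everything a consumer needs to interpret the columns.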

Avro is similar, but the schema must always be provided with the data. In
the case of NiFi DataFlows, it's likely more efficient to send the schema
once as an initialization packet (I can't remember the real term in NiFi),
then the rows can be streamed individually, in batches of user-defined size,
sampled, etc.
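
A minimal Python sketch of that "schema first, then rows" idea follows. The message shapes (dicts tagged "schema" or "rows") and the function name stream_with_schema are hypothetical; this is not how NiFi FlowFiles or Avro containers are actually laid out.

```python
from itertools import islice


def stream_with_schema(schema, rows, batch_size):
    """Yield one schema message, then row batches of at most batch_size."""
    yield {"type": "schema", "fields": schema}
    iterator = iter(rows)
    while True:
        # islice pulls rows lazily, so the source is never fully loaded.
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        yield {"type": "rows", "rows": batch}
```

A consumer reads the first message to learn the column layout and can then process each batch without the schema being repeated.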

Having said all that, there are projects like Apache Drill that can handle
non-flat JSON files and still present them in tabular format. They have
functions like KVGEN and FLATTEN to transform the document(s) into tabular
format. In the use cases you present below, you already know the data is
tabular and as such, the extra data model transformation is not needed.  If
this is desired, it should be apparent that a Streaming JSON processor would
be necessary; otherwise, for large tabular datasets you'd have to read the
whole JSON file into memory to parse individual rows.
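
The following Python sketch illustrates both points under a simplifying assumption: the input holds one JSON document per line, so each row can be parsed on its own. The flatten helper collapses nested objects into dotted column names; it is only an illustration of the idea and is not a reimplementation of Drill's KVGEN or FLATTEN functions.

```python
import io
import json


def flatten(document, prefix=""):
    """Collapse nested objects into a single level with dotted keys."""
    row = {}
    for key, value in document.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row


def stream_rows(lines):
    """Parse one JSON document per line without loading the whole input."""
    for line in lines:
        if line.strip():  # skip blank lines
            yield flatten(json.loads(line))


SOURCE = io.StringIO(
    '{"id": 1, "user": {"name": "a", "geo": {"lat": 1.5}}}\n'
    '\n'
    '{"id": 2, "user": {"name": "b"}}\n'
)
ROWS = list(stream_rows(SOURCE))
```

A single large JSON array would instead need an incremental parser, which is the "Streaming JSON processor" case mentioned above.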

Regards,
Matt

From:  Toivo Adams 
Reply-To:  
Date:  Monday, November 2, 2015 at 5:12 AM
To:  
Subject:  Common data exchange formats and tabular data

All,
Some processors get/put data in tabular form (PutSQL, ExecuteSQL, soon
Cassandra).
It would be very nice to be able to use such processors in a pipeline, where
the previous processor's output is the next processor's input. To achieve
this, processors should use a common data exchange format.

JSON is the most widely used; it's simple and readable. But JSON lacks a
schema. A schema can be very useful for automating data inserts/updates.

Avro has a schema, but is somewhat more complicated and not widely used
(yet?).

Please see also:

https://issues.apache.org/jira/browse/NIFI-978

https://issues.apache.org/jira/browse/NIFI-901

Opinions?

Thanks
Toivo




--
View this message in context:
http://apache-nifi-developer-list.39713.n7.nabble.com/Common-data-exchange-formats-and-tabular-data-tp3508.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.





Re: LogAttribute - Sending that output to a custom logger?

2015-11-02 Thread Mark Petronic
My primary use is for understanding NiFi. I like to direct various
processors' output into both their logical next processor stage and a
LogAttribute processor. Then I tail the NiFi app log file and watch what
happens, in real time. I do not intend to use this for long-term log
retention; I agree that provenance is the right choice for that. So, the
only reason I wanted to allow configuration of a custom logger was to
isolate all the attribute-rich logging from the normal logging, because I
was primarily interested in the attribute flows as a way to:

(a) Better understand what a processor emits because, frankly, the
documentation of some of the processors is very sparse. So, I learn
imperatively, so to speak. I say that as a new user. I feel I should be able
to get a pretty good understanding of a processor by reading the usage. But
I am finding that the documentation, in some cases, is more like what I like
to refer to as "note to self" documentation. Great if you are the guy who
wrote the processor with those "insights"; not so great if you are not the
developer. So, then I need to dig up the code. That should not be needed as
the first step of understanding a processor as a new user. There are some
well-documented processors, but not all are, IMHO.

(b) Validate my flows with some test data and verify that attribute values
look correct, that routing is happening on them as expected, etc. Again,
this is easier, IMO, to see in the logs than by digging into the provenance
data.
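
As an analogy for the isolation described above, here is a minimal sketch
using Python's standard logging module. NiFi itself is Java and uses logback,
so this is not the LogAttribute implementation; the logger names
"app-demo" and "attribute-demo" and the helpers `make_logger` and
`log_attributes` are assumptions for illustration only. The point is that a
separately named logger with its own destination keeps attribute output out
of the general application log.

```python
import io
import logging


def make_logger(name, stream):
    """Create a named logger that writes only to its own stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # do not also send records to the root logger
    logger.addHandler(logging.StreamHandler(stream))
    return logger


def log_attributes(logger, attributes):
    """Log each attribute as 'key = value', in sorted key order."""
    for key in sorted(attributes):
        logger.info("%s = %s", key, attributes[key])


# Two in-memory "log files" standing in for separate appenders.
app_stream = io.StringIO()
attr_stream = io.StringIO()

app_logger = make_logger("app-demo", app_stream)
attr_logger = make_logger("attribute-demo", attr_stream)

app_logger.info("processor started")
log_attributes(attr_logger, {"uuid": "1234", "filename": "a.txt"})
```

After this runs, `attr_stream` holds only the attribute lines and
`app_stream` holds only the normal trace, which is the separation a
configurable logger name would allow in a logback configuration.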

Maybe this is just a good "private" feature for me so maybe I will just
create a private version to use on my own. I already have it working but
would need more polish to achieve PR status. Maybe this is the sort of
thing that others would not find beneficial? That's fine. There are other
ways I can contribute in the future. I'm still having fun! :)

On Sun, Nov 1, 2015 at 12:41 PM, Joe Witt  wrote:

> Mark Petronic,
>
> I share Payne's perspective on this.  But I'd also like to work with
> you to better understand the workflow.  For those of us that have used
> this tool for a long time there is a lot we take for granted from a
> new user perspective.  We believe the provenance feature to provide a
> far superior option to understanding how an item went through the
> system and the timing and what we knew when and so on.  But, it would
> be great to understand it from your perspective as someone learning
> NiFi.  Not meaning to take away from your proposed contrib - that
> would be great too.  Just want to see if the prov user experience
> solves what you're looking for and if not can we make it do that.
>
> Thanks
> Joe
>
> On Sun, Nov 1, 2015 at 11:23 AM, Mark Payne  wrote:
> > Mark,
> >
> > To make sure that I understand what you're proposing, you want to add a
> > property to LogAttribute that allows users to provide a custom logger
> > name?
> >
> > If that is indeed what you are suggesting then I think it's a great idea.
> >
> > That being said, in practice I rarely ever use LogAttribute and we even
> > considered removing it from the codebase before we open sourced, because
> > the Data Provenance provides a much better view of what's going on to
> > debug your flows.
> >
> > I know you're pretty new to NiFi, so if you've not yet had a chance to
> > play with the Provenance, you can see the section in the User Guide at
> > http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data-provenance
> >
> > If you're interested in updating the LogAttribute processor, though,
> > we'd be happy to have that contribution added, as it does make the
> > Processor more usable.
> >
> > Thanks
> > -Mark
> >
> >> On Oct 31, 2015, at 12:35 PM, Mark Petronic  wrote:
> >>
> >> From the code, it appears it cannot be done, as the attribute logging
> >> goes to the same getLogger() instance as the normal nifi-app traces. Has
> >> anyone considered making that configurable, maybe allowing you to
> >> define a different logger name for LogAttribute and then creating that
> >> logger definition in the logback conf, allowing flexibility? I'm using
> >> attribute logging heavily as I try to better learn/debug NiFi (it gives
> >> you a nice 'under the hood' view of the flow) and build up some flows,
> >> and feel it would be beneficial to be able to capture the LogAttribute
> >> messages by themselves for more clarity on what is happening. I would
> >> not mind maybe trying to implement this feature as my first crack at
> >> contributing to the project. Seems like a fairly easy one that would
> >> allow me to "go through the motions" of a full pull request process
> >> and iron out the process. Anyone have any thoughts on this?
> >
>


Re: LogAttribute - Sending that output to a custom logger?

2015-11-02 Thread Adam Taft
David,

This sounds like a slightly different use case than the NiFi standard
LogAttribute processor.  It sounds like your processor is more of a generic
attribute converter and file writer.  The LogAttribute processor is
designed to interact with the underlying NiFi logging subsystem, not
necessarily just to write files.

That being said, your processor may be a useful contribution to Apache
NiFi.  Specifically, the value-add of your processor might be in the
key-value format you've defined to output the flowfile attributes.  It
might be interesting to see this expressed as an attribute-to-payload
converter, chained together with potentially other processors like PutFile
in the dataflow.
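
A minimal Python sketch of the attribute-to-payload idea, to make the
discussion concrete. This is not David's processor (its code has not been
posted) and not NiFi API; the function name `attributes_to_payload`, the
`key=value` line format, and the sample attributes are assumptions for
illustration. It selects attributes whose names fully match a regex and
renders them as text that a downstream step such as PutFile could write out.

```python
import re


def attributes_to_payload(attributes, pattern):
    """Render attributes whose names fully match `pattern` as key=value lines."""
    regex = re.compile(pattern)
    lines = [
        "{}={}".format(key, attributes[key])
        for key in sorted(attributes)
        if regex.fullmatch(key)
    ]
    # One attribute per line; an empty selection yields an empty payload.
    return "\n".join(lines) + ("\n" if lines else "")


# Hypothetical flowfile attributes, for illustration only.
sample = {"filename": "a.txt", "path": "/tmp", "uuid": "1234"}
payload = attributes_to_payload(sample, r"file.*|uuid")
```

Keeping the conversion separate from the file writing is what would let it
be chained with other processors rather than tied to one output location.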

If you want to contribute your processor, I would recommend making it
available on GitHub (or similar) for review by the Apache NiFi community.
Just post a link of your contribution here or even issue a pull request for
your processor.  It would at least be evaluated and considered for
inclusion.

Hope this helps.

Adam


On Mon, Nov 2, 2015 at 5:39 PM, davidrsm...@btinternet.com <
davidrsm...@btinternet.com> wrote:

> Hi
>
> Where I work we have created an attribute logger of our own. It is a
> fairly simple affair which uses a regex to determine which attributes to
> log, and writes them as key-value pairs to a file, whose location is
> determined by a user property. I'm happy to put this out there if anyone is
> interested.
>
> Sent from my HTC
>
>
> - Reply message -
> From: "Adam Taft" 
> Date: Mon, Nov 2, 2015 19:23
> Subject: LogAttribute - Sending that output to a custom logger?
> To: 
>
> This thread has forked into two different conversations:  1. improvements
> to LogAttribute processor; 2. improvements to processor documentation.
>
> 1)  re: improvements to LogAttribute - we already have NIFI-67 [1] that
> suggests a number of improvements to LogAttribute.  One of these is the use
> of a custom name for the logger so that logback rules can be written
> against that name.
>
> While the provenance engine is great for many scenarios, in my opinion, it
> doesn't replace the need for true text-based logging.  The tooling for log
> processing is very mature and there's no ability to "grep" a provenance
> repository, migrate or offload provenance logs into deep storage, store log
> events into a database, or do any other cool syslogd or logback type
> things.  Being able to capture and log a flowfile at the exact right place
> in the data flow and processing it using the command line is an extremely
> valuable tool in the toolkit.
>
> For a long time, I've wanted to work on at least some of the things
> mentioned in NIFI-67 and will hopefully get to do so time willing.  Having
> a custom "name" for the LogAttribute processor seems like a no-brainer.
> Contributions for this should definitely be welcome!
>
> 2) improvements to processor documentation - I agree, even as a somewhat
> seasoned NIFI user, I still have a hard time reading and understanding the
> processor documentation.  I often do exactly what Mark P. suggests and
> instead go directly to the source.  Any contribution towards better
> processor documentation is greatly appreciated!
>
> [1] https://issues.apache.org/jira/browse/NIFI-67
>
>
> On Mon, Nov 2, 2015 at 1:54 PM, Aldrin Piri  wrote:
>
> > We greatly appreciate contributions.  Your prescribed approach sounds
> > great and if you are willing to give us a few cycles pointing out, and
> > optionally correcting, the items that are in need of improvement, we
> > will certainly incorporate.
> >
> > Thanks!
> >
> > On Mon, Nov 2, 2015 at 1:28 PM, Mark Petronic  wrote:
> >
> > > I'm sort of in the camp of "don't come with a complaint if you don't
> > > come with a solution" and hesitated to even raise the documentation
> > > comment without just fixing it myself. How about this, I just do some
> > > updates on some processor docs myself and use that as my first
> > > contribution to work through the process of committing to this
> > > project?
> > >
> > > But, to give you one quick example, EvaluateJSONPath (which, btw has
> > > pretty good docs otherwise) does not mention HOW to extract the JSON
> > > you are interested in. I had to look at the code to figure out it used
> > > this: https://github.com/jayway/JsonPath. Ok, that was not hard, I
> > > admit, but, as a user, should I need to look at the code for such
> > > information? I submit, no. Me personally, I like to dig into the code.
> > > So, this is more a comment on "overall goodness" for the general new
> > > user experience.
> > >
> > > I agree with your assessment of 'new user vibe' as I am starting to not
> > > notice it as much. lol
> > >
> > > On Mon, Nov 2, 2015 at 10:15 AM, Joe Witt  wrote:
>
>
>


Re: LogAttribute - Sending that output to a custom logger?

2015-11-02 Thread Joe Witt
https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide

On Mon, Nov 2, 2015 at 9:04 PM, Adam Taft  wrote:
> David,
>
> This sounds like a slightly different use case than the NiFi standard
> LogAttribute processor.  It sounds like your processor is more of a generic
> attribute converter and file writer.  The LogAttribute processor is
> designed to interact with the underlying NiFi logging subsystem, not
> necessarily just to write files.
>
> That being said, your processor may be a useful contribution to Apache
> NiFi.  Specifically, the value-add of your processor might be in the
> key-value format you've defined to output the flowfile attributes.  It
> might be interesting to see this expressed as an attribute-to-payload
> converter, chained together with potentially other processors like PutFile
> in the dataflow.
>
> If you want to contribute your processor, I would recommend making it
> available on GitHub (or similar) for review by the Apache NiFi community.
> Just post a link of your contribution here or even issue a pull request for
> your processor.  It would at least be evaluated and considered for
> inclusion.
>
> Hope this helps.
>
> Adam
>
>
> On Mon, Nov 2, 2015 at 5:39 PM, davidrsm...@btinternet.com <
> davidrsm...@btinternet.com> wrote:
>
>> Hi
>>
>> Where I work we have created an attribute logger of our own. It is a
>> fairly simple affair which uses a regex to determine which attributes to
>> log, and writes them as key-value pairs to a file, whose location is
>> determined by a user property. I'm happy to put this out there if anyone is
>> interested.
>>
>> Sent from my HTC
>>
>>
>> - Reply message -
>> From: "Adam Taft" 
>> Date: Mon, Nov 2, 2015 19:23
>> Subject: LogAttribute - Sending that output to a custom logger?
>> To: 
>>
>> This thread has forked into two different conversations:  1. improvements
>> to LogAttribute processor; 2. improvements to processor documentation.
>>
>> 1)  re: improvements to LogAttribute - we already have NIFI-67 [1] that
>> suggests a number of improvements to LogAttribute.  One of these is the use
>> of a custom name for the logger so that logback rules can be written
>> against that name.
>>
>> While the provenance engine is great for many scenarios, in my opinion, it
>> doesn't replace the need for true text-based logging.  The tooling for log
>> processing is very mature and there's no ability to "grep" a provenance
>> repository, migrate or offload provenance logs into deep storage, store log
>> events into a database, or do any other cool syslogd or logback type
>> things.  Being able to capture and log a flowfile at the exact right place
>> in the data flow and processing it using the command line is an extremely
>> valuable tool in the toolkit.
>>
>> For a long time, I've wanted to work on at least some of the things
>> mentioned in NIFI-67 and will hopefully get to do so time willing.  Having
>> a custom "name" for the LogAttribute processor seems like a no-brainer.
>> Contributions for this should definitely be welcome!
>>
>> 2) improvements to processor documentation - I agree, even as a somewhat
>> seasoned NIFI user, I still have a hard time reading and understanding the
>> processor documentation.  I often do exactly what Mark P. suggests and
>> instead go directly to the source.  Any contribution towards better
>> processor documentation is greatly appreciated!
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-67
>>
>>
>> On Mon, Nov 2, 2015 at 1:54 PM, Aldrin Piri  wrote:
>>
>> > We greatly appreciate contributions.  Your prescribed approach sounds
>> > great and if you are willing to give us a few cycles pointing out, and
>> > optionally correcting, the items that are in need of improvement, we
>> > will certainly incorporate.
>> >
>> > Thanks!
>> >
>> > On Mon, Nov 2, 2015 at 1:28 PM, Mark Petronic  wrote:
>> >
>> > > I'm sort of in the camp of "don't come with a complaint if you don't
>> > > come with a solution" and hesitated to even raise the documentation
>> > > comment without just fixing it myself. How about this, I just do some
>> > > updates on some processor docs myself and use that as my first
>> > > contribution to work through the process of committing to this
>> > > project?
>> > >
>> > > But, to give you one quick example, EvaluateJSONPath (which, btw has
>> > > pretty good docs otherwise) does not mention HOW to extract the JSON
>> > > you are interested in. I had to look at the code to figure out it used
>> > > this: https://github.com/jayway/JsonPath. Ok, that was not hard, I
>> > > admit, but, as a user, should I need to look at the code for such
>> > > information? I submit, no. Me personally, I like to dig into the code.
>> > > So, this is more a comment on "overall goodness" for the general new
>> > > user experience.
>> > >
>> > > I agree with your assessment of