Re: [RESULT][VOTE] Release Apache NiFi 0.7.1

2016-10-19 Thread Tony Kurc
Awesome! Nice work pulling this together and finding some of the edge cases
in releases. Since you can't perform that move step in dist, I'll take care
of that this evening.

On Thu, Oct 20, 2016 at 12:12 AM, Joe Skora  wrote:

> Apache NiFi Community, I am pleased to announce that the 0.7.1 release of
> Apache NiFi passes with
> 4 +1 (binding) votes
> 4 +1 (non-binding) votes
> 0 0 votes
> 0 -1 votes
> Thanks to all who helped make this release possible. Here is the PMC vote thread:
> https://lists.apache.org/thread.html/991a1837d62f74fa437d2c14ce83ecac1bbed0f9e4e63999b4b5@%3Cdev.nifi.apache.org%3E
>


[RESULT][VOTE] Release Apache NiFi 0.7.1

2016-10-19 Thread Joe Skora
Apache NiFi Community, I am pleased to announce that the 0.7.1 release of
Apache NiFi passes with
4 +1 (binding) votes
4 +1 (non-binding) votes
0 0 votes
0 -1 votes
Thanks to all who helped make this release possible. Here is the PMC vote thread:
https://lists.apache.org/thread.html/991a1837d62f74fa437d2c14ce83ecac1bbed0f9e4e63999b4b5@%3Cdev.nifi.apache.org%3E


Re: 0.7.1 website updates

2016-10-19 Thread Joe Skora
I'll add that to the revisions I have in progress.

On Wed, Oct 19, 2016 at 10:27 PM, Joe Witt  wrote:

> Definitely agree.  We should update the docs for that section to indicate
> that it should only occur for releases on the latest line.
>
> Thanks
> Joe
>
> On Oct 19, 2016 10:05 PM, "Joe Skora"  wrote:
>
> > The current NiFi release guide
> >  describes
> > 14 steps to complete a release.  Of those, steps 7 through 11 involve
> > updating the application documentation on the NiFi website.
> >
> > 7. From a nifi.tar.gz collect the docs/html/* files and svn commit them to
> > https://svn.apache.org/repos/asf/nifi/site/trunk/docs/nifi-docs/html/
> > 8. From a nifi.tar.gz collect the
> > nifi-framework-nar.nar/META-INF/bundled-dependencies/nifi-web-api.war/docs/rest-api/*
> > files and svn commit them to
> > https://svn.apache.org/repos/asf/nifi/site/trunk/docs/nifi-docs/rest-api/
> >
> > 9. Run an instance of NiFi
> >
> > 10. Copy nifi/work/docs/components/* and svn commit to
> > https://svn.apache.org/repos/asf/nifi/site/trunk/docs/nifi-docs/components/
> > 11. wget http://localhost:8080/nifi-docs/documentation and svn commit to
> > https://svn.apache.org/repos/asf/nifi/site/trunk/docs/nifi-docs/index.html
> >
> >
> > Since the release of 1.0.0 the documentation on the website is based on
> > the 1.x branch, so it seems that steps 7 through 11 should not be performed
> > for the 0.7.1 release.
> >
> > Do you agree?
> >
>


Re: 0.7.1 website updates

2016-10-19 Thread Joe Witt
Definitely agree.  We should update the docs for that section to indicate
that it should only occur for releases on the latest line.

Thanks
Joe

On Oct 19, 2016 10:05 PM, "Joe Skora"  wrote:

> The current NiFi release guide
>  describes
> 14 steps to complete a release.  Of those, steps 7 through 11 involve
> updating the application documentation on the NiFi website.
>
> 7. From a nifi.tar.gz collect the docs/html/* files and svn commit them to
> https://svn.apache.org/repos/asf/nifi/site/trunk/docs/nifi-docs/html/
> 8. From a nifi.tar.gz collect the
> nifi-framework-nar.nar/META-INF/bundled-dependencies/nifi-web-api.war/docs/rest-api/*
> files and svn commit them to
> https://svn.apache.org/repos/asf/nifi/site/trunk/docs/nifi-docs/rest-api/
>
> 9. Run an instance of NiFi
>
> 10. Copy nifi/work/docs/components/* and svn commit to
> https://svn.apache.org/repos/asf/nifi/site/trunk/docs/nifi-docs/components/
> 11. wget http://localhost:8080/nifi-docs/documentation and svn commit to
> https://svn.apache.org/repos/asf/nifi/site/trunk/docs/nifi-docs/index.html
>
>
> Since the release of 1.0.0 the documentation on the website is based on the
> 1.x branch, so it seems that steps 7 through 11 should not be performed for
> the 0.7.1 release.
>
> Do you agree?
>


0.7.1 website updates

2016-10-19 Thread Joe Skora
The current NiFi release guide
 describes
14 steps to complete a release.  Of those, steps 7 through 11 involve
updating the application documentation on the NiFi website.

7. From a nifi.tar.gz collect the docs/html/* files and svn commit them to
https://svn.apache.org/repos/asf/nifi/site/trunk/docs/nifi-docs/html/
8. From a nifi.tar.gz collect the
nifi-framework-nar.nar/META-INF/bundled-dependencies/nifi-web-api.war/docs/rest-api/*
files and svn commit them to
https://svn.apache.org/repos/asf/nifi/site/trunk/docs/nifi-docs/rest-api/

9. Run an instance of NiFi

10. Copy nifi/work/docs/components/* and svn commit to
https://svn.apache.org/repos/asf/nifi/site/trunk/docs/nifi-docs/components/
11. wget http://localhost:8080/nifi-docs/documentation and svn commit to
https://svn.apache.org/repos/asf/nifi/site/trunk/docs/nifi-docs/index.html


Since the release of 1.0.0 the documentation on the website is based on the
1.x branch, so it seems that steps 7 through 11 should not be performed for
the 0.7.1 release.

Do you agree?


Re: IMPORTANT -- Lunch and Learn - NiFi

2016-10-19 Thread Devin Fisher
Sorry, this was for some company nifi training. Did not mean to send to
nifi dev mailing list.

Sorry again.

Devin

On Wed, Oct 19, 2016 at 11:43 AM, Joe Percivall <
joeperciv...@yahoo.com.invalid> wrote:

> Hello Devin,
>
> I'm a bit confused but I don't think you meant to send this to the Apache
> dev mailing list.
>
> Joe
>  - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joeperciv...@yahoo.com
>
>
>
>
> On Wednesday, October 19, 2016 1:40 PM, Devin Fisher <
> devin.fis...@perfectsearchcorp.com> wrote:
> New link for Template: (fixed a typo)
>
> https://drive.google.com/open?id=0B_BTD18gUE3CeHAxSExfd3Q2T2s
>
>
> On Wed, Oct 19, 2016 at 11:23 AM, Devin Fisher <
> devin.fis...@perfectsearchcorp.com> wrote:
>
> > This week Info: (Hopefully this one is correct)
> >
> > 
> > Nifi HW Template
> >  -- You
> > will need this for the Homework.
> > Slide Show Presentation
> >  8t6xovAbkqo9CQDoy7BISTTjzsV0pkIA>
> >
>


Re: CsvToAttributes processor

2016-10-19 Thread Matt Burgess
As an alternative to n^2 processors, there was some discussion a little
while back about having Controller Service instances do data format
conversions [1]. However, that's a complex issue and might not get
integrated in the near term.
useful task, and that when we get the extension registry (and/or the
controller services), we can update the processors accordingly.

Regards,
Matt

[1] 
http://apache-nifi-developer-list.39713.n7.nabble.com/Looking-for-feedback-on-my-WIP-Design-td13097.html

On Wed, Oct 19, 2016 at 1:58 PM, Andy LoPresto  wrote:
> I like Matt’s idea. Currently there are ConvertCSVToAvro and
> ConvertAvroToJSON processors, but no processor that directly converts CSV to
> JSON. Keeping the content in the content claim, as Joe and Matt pointed out,
> will greatly improve performance over loading it into attributes. If
> attribute-based routing is desired, an UpdateAttribute processor can follow
> on to update a single attribute from the content without polluting it with
> unnecessary data.
>
> While I am not a proponent of creating n^2 processors just to do format
> conversions, I think CSV to JSON is a common-enough and useful-enough task
> that this would be beneficial. And once we get the extension registry,
> people can go nuts with n^2 conversion processors.
>
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Oct 19, 2016, at 1:14 PM, Matt Foley  wrote:
>
> For the specific use case of processing CSV files (and possibly “flat” db
> tables), would many of the same goals be met if the simple list of “bare”
> values in each record was turned into easily parsable key/value pairs?
> Perhaps either JSON or YAML format?  But still left in the content rather
> than moved into the attribute list, so as to avoid the problems Joe stated.
> Granted each downstream processor will have to re-parse the content, but
> it’s fast and easy - for instance, in python one can read such content into
> a {dictionary} with just a couple lines of code.  Indexers consume it well,
> too, or can be taught to do so.
>
> Thanks,
> --Matt
>
>
> On 10/19/16, 6:02 AM, "Joe Witt"  wrote:
>
>Francois
>
>Thanks for starting the discussion and this is indeed the type of
>thing people would find helpful.  One thing I'd want to flag with this
>approach is the impact it will have on performance at higher rates.
>We're starting to see people wanting to do this more and more where
>they'll take the content of a flowfile and turn it into attributes.
>This can put a lot of pressure on the heap and garbage collection and
>is best to avoid if you want to achieve sustained high performance.
>Keeping the content in its native form or converting it to another
>form will yield much higher sustained throughput as we can stream
>those things from their underlying storage in the content repository
>to their new form in the repository or to another system all while
>only ever having as much in memory as your technique for operating
>on them requires. So for example we can do things like compress a 1GB
>file and only have say 1KB in memory (as an example).  But by taking
>the content and turning it into attributes on the flow file the flow
>file object (not its content) will be in memory most of the time and
>this is where problems can occur.  It would be better to have pushing
>to elastic be driven off the content, though this admittedly
>introduces a different challenge, which is 'well, what format of that
>content does it expect'?  We have some examples of this pattern now in
>our SQL processors for instance which are built around a specific data
>format but we need to do better and offer generic or pluggable ways to
>read record oriented data from a variety of formats and not have the
>processors be specific to the underlying format where possible and
>appropriate.  The key is to do this without forcing some goofy
>normalization format that will kill performance as well and which
>would make it more brittle.
>
>So, anyway, I said all that to say that it is great you've offered to
>contribute it and I think you certainly should.  We should just take
>care to document its intended use and limitations on performance to
>consider, and enable it to limit how many columns/fields get turned
>into attributes maybe by setting a max or by having a
>whitelist/blacklist type model.  Even if it won't achieve highest
>sustained performance I suspect this will be quite helpful for people
>as is.
>
>Thanks!
>Joe
>
>On Wed, Oct 19, 2016 at 6:50 AM, Uwe Geercken 
> wrote:
>
> Francois,
>
> very nice. Thanks.
>
> I was working on a simple version a while ago. But it had another
> scope: I wanted to have a Nifi processor to merge CSV data with a template
> from a template engine (e.g. Apache Velocity).

Re: CsvToAttributes processor

2016-10-19 Thread Andy LoPresto
I like Matt’s idea. Currently there are ConvertCSVToAvro and ConvertAvroToJSON 
processors, but no processor that directly converts CSV to JSON. Keeping the 
content in the content claim, as Joe and Matt pointed out, will greatly improve 
performance over loading it into attributes. If attribute-based routing is 
desired, an UpdateAttribute processor can follow on to update a single 
attribute from the content without polluting it with unnecessary data.

While I am not a proponent of creating n^2 processors just to do format 
conversions, I think CSV to JSON is a common-enough and useful-enough task that 
this would be beneficial. And once we get the extension registry, people can go 
nuts with n^2 conversion processors.


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Oct 19, 2016, at 1:14 PM, Matt Foley  wrote:
> 
> For the specific use case of processing CSV files (and possibly “flat” db 
> tables), would many of the same goals be met if the simple list of “bare” 
> values in each record was turned into easily parsable key/value pairs?  
> Perhaps either JSON or YAML format?  But still left in the content rather 
> than moved into the attribute list, so as to avoid the problems Joe stated.  
> Granted each downstream processor will have to re-parse the content, but it’s 
> fast and easy - for instance, in python one can read such content into a 
> {dictionary} with just a couple lines of code.  Indexers consume it well, 
> too, or can be taught to do so.
> 
> Thanks,
> --Matt
> 
> 
> On 10/19/16, 6:02 AM, "Joe Witt"  wrote:
> 
>Francois
> 
>Thanks for starting the discussion and this is indeed the type of
>thing people would find helpful.  One thing I'd want to flag with this
>approach is the impact it will have on performance at higher rates.
>We're starting to see people wanting to do this more and more where
>they'll take the content of a flowfile and turn it into attributes.
>This can put a lot of pressure on the heap and garbage collection and
>is best to avoid if you want to achieve sustained high performance.
>Keeping the content in its native form or converting it to another
>form will yield much higher sustained throughput as we can stream
>those things from their underlying storage in the content repository
>to their new form in the repository or to another system all while
>only ever having as much in memory as your technique for operating
>on them requires. So for example we can do things like compress a 1GB
>file and only have say 1KB in memory (as an example).  But by taking
>the content and turning it into attributes on the flow file the flow
>file object (not its content) will be in memory most of the time and
>this is where problems can occur.  It would be better to have pushing
>to elastic be driven off the content, though this admittedly
>introduces a different challenge, which is 'well, what format of that
>content does it expect'?  We have some examples of this pattern now in
>our SQL processors for instance which are built around a specific data
>format but we need to do better and offer generic or pluggable ways to
>read record oriented data from a variety of formats and not have the
>processors be specific to the underlying format where possible and
>appropriate.  The key is to do this without forcing some goofy
>normalization format that will kill performance as well and which
>would make it more brittle.
> 
>So, anyway, I said all that to say that it is great you've offered to
>contribute it and I think you certainly should.  We should just take
>care to document its intended use and limitations on performance to
>consider, and enable it to limit how many columns/fields get turned
>into attributes maybe by setting a max or by having a
>whitelist/blacklist type model.  Even if it won't achieve highest
>sustained performance I suspect this will be quite helpful for people
>as is.
> 
>Thanks!
>Joe
> 
>On Wed, Oct 19, 2016 at 6:50 AM, Uwe Geercken  wrote:
>> Francois,
>> 
>> very nice. Thanks.
>> 
>> I was working on a simple version a while ago. But it had another
>> scope: I wanted to have a Nifi processor to merge CSV data with a template
>> from a template engine (e.g. Apache Velocity). I will review my code and 
>> have a look at your processor.
>> 
>> Where can we get it? Github?
>> 
>> Rgds,
>> 
>> Uwe
>> 
>>> Sent: Wednesday, October 19, 2016 at 11:10 AM
>>> From: "François Prunier" 
>>> To: dev@nifi.apache.org
>>> Subject: CsvToAttributes processor
>>> 
>>> Hello Nifi folks,
>>> 
> >> I've built a processor to parse CSV files with headers and turn each
> >> line into a flowfile. Each resulting flowfile has as many attributes as
> >> the number of columns. Each attribute has the name of a column with the
> >> corresponding value for the line.

Re: IMPORTANT -- Lunch and Learn - NiFi

2016-10-19 Thread Joe Percivall
Hello Devin,

I'm a bit confused but I don't think you meant to send this to the Apache dev 
mailing list. 

Joe
 - - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Wednesday, October 19, 2016 1:40 PM, Devin Fisher 
 wrote:
New link for Template: (fixed a typo)

https://drive.google.com/open?id=0B_BTD18gUE3CeHAxSExfd3Q2T2s


On Wed, Oct 19, 2016 at 11:23 AM, Devin Fisher <
devin.fis...@perfectsearchcorp.com> wrote:

> This week Info: (Hopefully this one is correct)
>
> 
> Nifi HW Template
>  -- You
> will need this for the Homework.
> Slide Show Presentation
> 
>


Re: IMPORTANT -- Lunch and Learn - NiFi

2016-10-19 Thread Devin Fisher
New link for Template: (fixed a typo)

https://drive.google.com/open?id=0B_BTD18gUE3CeHAxSExfd3Q2T2s

On Wed, Oct 19, 2016 at 11:23 AM, Devin Fisher <
devin.fis...@perfectsearchcorp.com> wrote:

> This week Info: (Hopefully this one is correct)
>
> 
> Nifi HW Template
>  -- You
> will need this for the Homework.
> Slide Show Presentation
> 
>


Re: IMPORTANT -- Lunch and Learn - NiFi

2016-10-19 Thread Devin Fisher
This week Info: (Hopefully this one is correct)


Nifi HW Template
 -- You will
need this for the Homework.
Slide Show Presentation



Re: IMPORTANT -- Lunch and Learn - NiFi

2016-10-19 Thread Devin Fisher
This week Info:

Nifi HW Template
 -- You will
need this for the Homework.
Slide Show Presentation


On Wed, Oct 12, 2016 at 11:37 AM, Lara Leavitt <
lara.leav...@perfectsearchcorp.com> wrote:

> Hi,
>
> I've installed nifi locally, but the spreadsheet is locked so I can't mark it.
>
> -Lara
>
> On Mon, Oct 10, 2016 at 1:06 PM, Devin Fisher <
> devin.fis...@perfectsearchcorp.com> wrote:
>
>> This week (on Wednesday during lunch and learn) we as a team are going to
>> be learning about our ETL tool NIFI.
>>
>> For this training, everyone will need to do a few things before we get
>> into the training. I've broken these prerequisites into two categories.
>>
>> *Please DO RIGHT NOW:*
>>
>>1. Go to the HW spreadsheet
>>
>> 
>>and make sure your name is on the list. (if not please add it)
>>
>> (if your name was not on the list, you may consider checking the
>> employee directory
>> 
>> and make sure you are on that list)
>>
>> Please *Do Before Wednesday:*
>>
>>    1. Have an IMAT Nifi instance available for your use. This can take
>>the following forms:
>>   - A fully installed IMAT appliance (installed and configured).
>>   - ETL dev sandbox (mostly applies to Russ and I).
>>   - Local Installed using instructions below.
>>    2. Once you have an IMAT Nifi instance for you and only you to use,
>>    please revisit the HW spreadsheet
>>
>> 
>>  and
>>mark your name as having Nifi Available.
>>
>>
>> Local Install of Nifi:
>>
>>1. Make sure you have Java 7 or greater installed. Test: java -version
>>2. Download Nifi with IMAT
>>
>>extensions from fortandy. (http://10.10.10.240/data/user
>>/devin.fisher/lunchLearn/nifi-0.7.0-bin.zip
>>
>>)
>>3. Extract Nifi zip to any directory.
>>    4. In */nifi-0.7.0/bin*, run:
>>   1. Windows: *run-nifi.bat*
>>   2. Linux/macOS: *./nifi.sh start*
>>5. Allow time for nifi to start up (up to 3 mins)
>>6. In browser go to http://127.0.0.1:8799/nifi
>>
>> If you have any issues installing, please let me know.
>>
>
>
>
> --
> Lara Leavitt
> User Interface Engineer
> Perfect Search Corporation
>


Re: CsvToAttributes processor

2016-10-19 Thread Matt Foley
For the specific use case of processing CSV files (and possibly “flat” db 
tables), would many of the same goals be met if the simple list of “bare” 
values in each record was turned into easily parsable key/value pairs?  Perhaps 
either JSON or YAML format?  But still left in the content rather than moved 
into the attribute list, so as to avoid the problems Joe stated.  Granted each 
downstream processor will have to re-parse the content, but it’s fast and easy 
- for instance, in python one can read such content into a {dictionary} with 
just a couple lines of code.  Indexers consume it well, too, or can be taught 
to do so.

Thanks,
--Matt
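
A minimal Java sketch of that re-parse step (assuming Jackson is on the
classpath; `line` is a hypothetical String holding one record's JSON text,
so this is illustrative only):

ObjectMapper mapper = new ObjectMapper();
// one JSON object per CSV record, e.g. {"col1":"a","col2":"b","col3":"c"}
Map<String, String> record = mapper.readValue(line,
        new TypeReference<Map<String, String>>() {});
String col1 = record.get("col1"); // values stay in the content, not in attributes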


On 10/19/16, 6:02 AM, "Joe Witt"  wrote:

Francois

Thanks for starting the discussion and this is indeed the type of
thing people would find helpful.  One thing I'd want to flag with this
approach is the impact it will have on performance at higher rates.
We're starting to see people wanting to do this more and more where
they'll take the content of a flowfile and turn it into attributes.
This can put a lot of pressure on the heap and garbage collection and
is best to avoid if you want to achieve sustained high performance.
Keeping the content in its native form or converting it to another
form will yield much higher sustained throughput as we can stream
those things from their underlying storage in the content repository
to their new form in the repository or to another system all while
only ever having as much in memory as your technique for operating
on them requires. So for example we can do things like compress a 1GB
file and only have say 1KB in memory (as an example).  But by taking
the content and turning it into attributes on the flow file the flow
file object (not its content) will be in memory most of the time and
this is where problems can occur.  It would be better to have pushing
to elastic be driven off the content, though this admittedly
introduces a different challenge, which is 'well, what format of that
content does it expect'?  We have some examples of this pattern now in
our SQL processors for instance which are built around a specific data
format but we need to do better and offer generic or pluggable ways to
read record oriented data from a variety of formats and not have the
processors be specific to the underlying format where possible and
appropriate.  The key is to do this without forcing some goofy
normalization format that will kill performance as well and which
would make it more brittle.

So, anyway, I said all that to say that it is great you've offered to
contribute it and I think you certainly should.  We should just take
care to document its intended use and limitations on performance to
consider, and enable it to limit how many columns/fields get turned
into attributes maybe by setting a max or by having a
whitelist/blacklist type model.  Even if it won't achieve highest
sustained performance I suspect this will be quite helpful for people
as is.

Thanks!
Joe

On Wed, Oct 19, 2016 at 6:50 AM, Uwe Geercken  wrote:
> Francois,
>
> very nice. Thanks.
>
> I was working on a simple version a while ago. But it had another
> scope: I wanted to have a Nifi processor to merge CSV data with a template
> from a template engine (e.g. Apache Velocity). I will review my code and
> have a look at your processor.
>
> Where can we get it? Github?
>
> Rgds,
>
> Uwe
>
>> Sent: Wednesday, October 19, 2016 at 11:10 AM
>> From: "François Prunier" 
>> To: dev@nifi.apache.org
>> Subject: CsvToAttributes processor
>>
>> Hello Nifi folks,
>>
>> I've built a processor to parse CSV files with headers and turn each
>> line into a flowfile. Each resulting flowfile has as many attributes as
>> the number of columns. Each attribute has the name of a column with the
>> corresponding value for the line.
>>
>> For example, this CSV file:
>>
>> col1,col2,col3
>> a,b,c
>> d,e,f
>>
>> would generate two flowfiles with the following attributes:
>>
>> col1 = a
>> col2 = b
>> col3 = c
>>
>> and
>>
>> col1 = d
>> col2 = e
>> col3 = f
>>
>> As of now, you can configure the charset plus delimiter, quote and
>> escape character. It's based on the commons-csv parser.
>>
>> It's very handy if you want to, for example, index a CSV file into
>> elasticsearch.
>>
>> Would you guys be interested in a pull request to add this processor to
>> the main code base? It needs a bit more documentation and cleanup that
>> I would need to add in, but it's already successfully used in production.
>>
>> Best regards,
>> --
>> *François Prunier*
>> *Hurence* - /Your Big Data experts/
>> http://www.hurence.com
>> *mobile:* +33 6 38 68 60 50

Re: [VOTE] Release Apache NiFi 0.7.1 (RC1)

2016-10-19 Thread Pierre Villard
+1 (non-binding)

Full build with contrib-check, ran a few workflows successfully.

2016-10-19 2:39 GMT+02:00 Koji Kawamura :

> +1 (non-binding)
>
> - git tag and commit ID verified
> - Downloaded nifi-0.7.1-source-release.zip and verified hashes
> - Full build finished without issue
> - Application runs without issue on Mac with java 1.8.0_111-b14
>
> - Found a trivial typo on release note wiki page, updated the wiki
> page: JolTransformJSON -> JoltTransformJSON
>
> On Wed, Oct 19, 2016 at 3:40 AM, Tony Kurc  wrote:
> > +1 (binding)
> >
> > reviewed License and notice
> > Built successfully from source using openjdk 1.8.0 on amazon linux.
> > verified hashes and signature
> > ran a few simple flows without issue
> >
> >
> >
> >
> > On Tue, Oct 18, 2016 at 2:28 PM, Michael Moser  wrote:
> >
> >> +1 (non-binding)
> >>
> >> git tag and commit ID verified
> >> Downloaded nifi-0.7.1-source-release.zip and verified hashes
> >> Source builds with RAT checks using Java 1.7.0_79
> >> Application runs and a few templates work as expected on CentOS 6
> >>
> >>
> >>
> >> On Tue, Oct 18, 2016 at 1:38 PM, Joe Skora  wrote:
> >>
> >> > Yeah, I'll get my keys added.  I didn't want to move the target commit
> >> > once we had that zeroed in.
> >> >
> >> >
> >> > On Tue, Oct 18, 2016 at 6:27 PM, Joe Witt  wrote:
> >> >
> >> > > +1 (binding).
> >> > >
> >> > > full build verification steps all passed/were as expected.  L&N looks
> >> > > good.  Simple live testing behaved as expected.
> >> > >
> >> > > Something to address for future purposes is please don't forget to add
> >> > > your keys to the KEYS file in the 0.x and master.
> >> > >
> >> > > Thanks for stepping up to take on the RM work and nice job.
> >> > >
> >> > > Joe
> >> > >
> >> > > On Tue, Oct 18, 2016 at 12:40 PM, James Wing  wrote:
> >> > > > +1, I was able to verify hashes, build with tests and checks, run
> >> > > > NiFi and do simple testing.  Thanks for putting this release
> >> > > > together, Joe.
> >> > > >
> >> > > >
> >> > > > On Sun, Oct 16, 2016 at 8:32 PM, Joe Skora  wrote:
> >> > > >
> >> > > >> Hello,
> >> > > >>
> >> > > >> I am pleased to be calling this vote for the source release of
> >> > > >> Apache NiFi nifi-0.7.1.
> >> > > >>
> >> > > >> The source zip, including signatures, digests, etc. can be found at:
> >> > > >> https://repository.apache.org/content/repositories/orgapachenifi-1091
> >> > > >>
> >> > > >> The Git tag is nifi-0.7.1-RC1
> >> > > >> The Git commit ID is 421d5e61553e5fa160af9e0cc9fdc237af46906d
> >> > > >> https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=421d5e61553e5fa160af9e0cc9fdc237af46906d
> >> > > >> https://github.com/apache/nifi/commit/421d5e61553e5fa160af9e0cc9fdc237af46906d
> >> > > >>
> >> > > >> Checksums of nifi-0.7.1-source-release.zip:
> >> > > >> MD5: a15fc40ec887d82440f2de05ef71f810
> >> > > >> SHA1: 1565f4e123478e91fd26022b939d9d2f6ea6a2cf
> >> > > >>
> >> > > >> Release artifacts are signed with the following key:
> >> > > >> https://people.apache.org/keys/committer/jskora.asc
> >> > > >>
> >> > > >> KEYS file available here:
> >> > > >> https://dist.apache.org/repos/dist/release/nifi/KEYS
> >> > > >>
> >> > > >> 41 issues were closed/resolved for this release:
> >> > > >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12338025
> >> > > >>
> >> > > >> Release note highlights can be found here:
> >> > > >> https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.7.1
> >> > > >>
> >> > > >> The vote will be open for 72 hours.
> >> > > >> Please download the release candidate and evaluate the necessary
> >> > > >> items including checking hashes, signatures, build from source, and
> >> > > >> test. Then please vote:
> >> > > >>
> >> > > >> [ ] +1 Release this package as nifi-0.7.1
> >> > > >> [ ] +0 no opinion
> >> > > >> [ ] -1 Do not release this package because...
> >> > > >>
> >> > >
> >> >
> >>
>


Re: [DISCUSS] NiFi 1.1.0 release

2016-10-19 Thread Joe Witt
Team,

There are 31 open JIRAs at present tagged to Apache NiFi 1.1.0.  Let's
avoid putting more in there for now, at least without a discussion.
Of the 31 JIRAs, the vast majority need review, so we should be
able to close these down fairly quickly as long as we don't let the
list grow.

Thanks
joe

On Fri, Oct 14, 2016 at 4:39 PM, Edgardo Vega  wrote:
> Joe,
>
> Appreciate the offer, but it isn't my PR. I was just using it as an example. All
> mine are currently closed, which I greatly appreciate.
>
> Cheers,
>
> Edgardo
>
> On Friday, October 14, 2016, Joe Witt  wrote:
>
>> Edgardo,
>>
>> You mentioned a PR from August. I'd be happy to help you work that
>> through review.
>>
>> Thanks
>> Joe
>>
>> On Fri, Oct 14, 2016 at 10:45 AM, Edgardo Vega wrote:
>> > I have agreed that at this point a release is important. My goal was to
>> > try to squeeze in as much goodness as possible into the release, but the
>> > important bug fixes should come first. Getting 1.x into a state where the
>> > release notes don't say that it is geared toward developers and testers
>> > is really huge.
>> >
>> > I think Nifi is a great community; otherwise I wouldn't participate in the
>> > mailing list, create Jira tickets, and submit pull requests. I am only
>> > trying to strengthen the great thing that is going on here. We can always
>> > do better. I was not trying to put down this community, only to
>> > participate and make it better. I think this conversation is an indication
>> > of how great this community is.
>> >
>> > Maybe I am being sensitive about this issue and trying to strengthen the
>> > nifi community even more, after coming from a conference where it was
>> > reported that there was lots of excitement at first but now participation
>> > in the community has really died down and they are struggling. I don't
>> > want to see that happen here.
>> >
>> > Cheers,
>> >
>> > Edgardo
>> >
>> >
>> >
>> >
>> On Fri, Oct 14, 2016 at 9:37 AM, Andre wrote:
>> >
>> >> Edgardo,
>> >>
>> >> Thank you for your feedback. We hear your comments, and as a committer
>> >> I can share that we are constantly looking to improve the PR process,
>> >> having already taken many of the steps you suggest.
>> >>
>> >> However, it is important to notice that the number of PRs should not be
>> >> seen as a metric of engagement by the development community: most of us
>> >> will submit PRs so that our work can be carefully reviewed by our peers,
>> >> and some of us will use JIRA patches to provide contributions.
>> >>
>> >> Having said that, it is true that some PRs may sit idle for a long time,
>> >> and we are working to improve this pipeline.
>> >>
>> >> It was therefore no coincidence that I browsed most of the PRs,
>> >> performing a triage of items that have been superseded or diverged from
>> >> the current code base.
>> >>
>> >> In fact, less than a month ago the dev team closed a number of stalled
>> >> and superseded PRs (commit cc5e827aa1dfe2f376e9836380ba63c15269eea8).
>> >>
>> >> Despite all the above, I think Joe has a point. The master branch
>> >> contains a series of important bug fixes and I suspect the community
>> >> would benefit from a release sooner rather than later.
>> >>
>> >> Once again, thank you for your feedback and contribution. It is good to
>> >> have you here.
>> >>
>> >> Andre
>> >>
>> On Fri, Oct 14, 2016 at 11:30 PM, Edgardo Vega wrote:
>> >>
>> >> > Joe - You are correct, I was mentioning the PRs that are currently
>> >> > open.
>> >> >
>> >> > Regardless of how it happens, I believe reducing the count of open PRs
>> >> > is extremely important. Maybe I was hoping that the release could be a
>> >> > forcing function to make that happen. I believe that developers are
>> >> > more willing to contribute when they see that their PRs will actually
>> >> > be accepted and merged into the code base. Having a low number of open
>> >> > PRs in progress is a great indication that the main nifi developers
>> >> > are fully engaged with the community.
>> >> >
>> >> > There are a few PRs that don't have any comments from committers at
>> >> > all. I found one from August in that state. If that was my PR, I don't
>> >> > think I would be so willing to put another one in anytime soon. I do
>> >> > get that sometimes PRs get stalled by the originator; if so, maybe a
>> >> > rule about closing them after a certain amount of time, or having them
>> >> > taken over by a core contributor if they think it worthwhile.
>> >> >
>> >> > I would like to give a shoutout to James Wing: on my last PR he was
>> >> > quick to review, provided great comments, testing, and even some
>> >> > additional code. It was a great PR experience.
>> >> >
>> >> > Cheers,
>> >> >
>> >> > Edgardo
>> >> >
>> >> >
>> >> >
>> >> > On Thu, Oct 13, 2016 at 4:14 PM, Joe Percivall <
>> >> > joeperciv...@yahoo.com.invalid> wrote:
>> >> >
>> >> > > Joe, I think you misread. Edgardo i

Re: CsvToAttributes processor

2016-10-19 Thread Joe Witt
Francois

Thanks for starting the discussion and this is indeed the type of
thing people would find helpful.  One thing I'd want to flag with this
approach is the impact it will have on performance at higher rates.
We're starting to see people wanting to do this more and more where
they'll take the content of a flowfile and turn it into attributes.
This can put a lot of pressure on the heap and garbage collection and
is best to avoid if you want to achieve sustained high performance.
Keeping the content in its native form or converting it to another
form will yield much higher sustained throughput as we can stream
those things from their underlying storage in the content repository
to their new form in the repository or to another system all while
only ever having as much in memory as your technique for operating
on them requires. So for example we can do things like compress a 1GB
file and only have say 1KB in memory (as an example).  But by taking
the content and turning it into attributes on the flow file the flow
file object (not its content) will be in memory most of the time and
this is where problems can occur.  It would be better to have pushing
to elastic be driven off the content, though this admittedly
introduces a different challenge, which is 'well, what format of that
content does it expect'?  We have some examples of this pattern now in
our SQL processors for instance which are built around a specific data
format but we need to do better and offer generic or pluggable ways to
read record oriented data from a variety of formats and not have the
processors be specific to the underlying format where possible and
appropriate.  The key is to do this without forcing some goofy
normalization format that will kill performance as well and which
would make it more brittle.

So, anyway, I said all that to say that it is great you've offered to
contribute it and I think you certainly should.  We should just take
care to document its intended use and limitations on performance to
consider, and enable it to limit how many columns/fields get turned
into attributes maybe by setting a max or by having a
whitelist/blacklist type model.  Even if it won't achieve highest
sustained performance I suspect this will be quite helpful for people
as is.

Thanks!
Joe
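
A minimal sketch of that streaming pattern, using the compress-a-1GB-file
example with NiFi's StreamCallback (REL_SUCCESS and the buffer size here are
illustrative, not taken from the message above):

// content claim -> gzip -> new content claim, holding only one small buffer
flowFile = session.write(flowFile, new StreamCallback() {
  @Override
  public void process(InputStream in, OutputStream out) throws IOException {
    try (GZIPOutputStream gzipOut = new GZIPOutputStream(out)) {
      final byte[] buffer = new byte[8192];
      int len;
      while ((len = in.read(buffer)) != -1) {
        gzipOut.write(buffer, 0, len); // only ~8 KB in memory at a time
      }
    }
  }
});
session.transfer(flowFile, REL_SUCCESS);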

On Wed, Oct 19, 2016 at 6:50 AM, Uwe Geercken  wrote:
> Francois,
>
> very nice. Thanks.
>
> I was working on a simple version a while ago. But it had another
> scope: I wanted to have a Nifi processor to merge CSV data with a template
> from a template engine (e.g. Apache Velocity). I will review my code and have 
> a look at your processor.
>
> Where can we get it? Github?
>
> Rgds,
>
> Uwe
>
>> Sent: Wednesday, October 19, 2016 at 11:10 AM
>> From: "François Prunier" 
>> To: dev@nifi.apache.org
>> Subject: CsvToAttributes processor
>>
>> Hello Nifi folks,
>>
>> I've built a processor to parse CSV files with headers and turn each
>> line into a flowfile. Each resulting flowfile has as many attributes as
>> the number of columns. Each attribute has the name of a column with the
>> corresponding value for the line.
>>
>> For example, this CSV file:
>>
>> col1,col2,col3
>> a,b,c
>> d,e,f
>>
>> would generate two flowfiles with the following attributes:
>>
>> col1 = a
>> col2 = b
>> col3 = c
>>
>> and
>>
>> col1 = d
>> col2 = e
>> col3 = f
>>
>> As of now, you can configure the charset plus delimiter, quote and
>> escape character. It's based on the commons-csv parser.
>>
>> It's very handy if you want to, for example, index a CSV file into
>> elasticsearch.
>>
>> Would you guys be interested in a pull request to add this processor to
>> the main code base? It needs a bit more documentation and cleanup that
>> I would need to add in, but it's already successfully used in production.
>>
>> Best regards,
>> --
>> *François Prunier*
>> *Hurence* - /Your Big Data experts/
>> http://www.hurence.com
>> *mobile:* +33 6 38 68 60 50
>>
>>


Re: Creating and Writing to an Avro File

2016-10-19 Thread Bryan Bende
Hello, and welcome!

1) If you want to create a flow file and write data to it you would need to
do something like the following:

FlowFile flowFile = session.create();

flowFile = session.write(flowFile, new OutputStreamCallback() {
  @Override
  public void process(OutputStream out) throws IOException {
    // write whatever bytes you need into the new content claim
    try (OutputStream outputStream = new BufferedOutputStream(out)) {
      outputStream.write(...);
    }
  }
});

session.transfer(flowFile, SOME_RELATIONSHIP);

Of course you could use session.append() depending on what you want to do.
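
If you do need to build the content up in a loop (as in the array case from
the original question), a rough sketch with session.append(); `records` and
SOME_RELATIONSHIP are illustrative names:

flowFile = session.create();
for (final String record : records) {
  // each append() call adds to the end of the flowfile's existing content
  flowFile = session.append(flowFile, new OutputStreamCallback() {
    @Override
    public void process(OutputStream out) throws IOException {
      out.write((record + "\n").getBytes(StandardCharsets.UTF_8));
    }
  });
}
session.transfer(flowFile, SOME_RELATIONSHIP);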

2) By default each processor is scheduled with a run schedule of 0 seconds,
which means as fast as possible.
If you go into the scheduling tab of the processor you can select the
appropriate scheduling strategy.

-Bryan


On Wed, Oct 19, 2016 at 2:48 AM, Arsalan Siddiqi  wrote:

> Hi
> I am new to Nifi. I am trying to read data from a cache and then create a
> new flow file and write the contents of the cache to the Flowfiles content.
> Firstly no flow file is created when I run my processor. As in the code i
> am
> looping over an array, I think i need to use the append function to append
> the contents to the flow file. At the moment I am guessing it is just
> overwriting the contents with the latest object of the array. At the moment
> I just want it to create a file with some content and then later I can
> figure out how i can manipulate it.
>
> Another thing is how can i schedule the processor to run only once. At
> first
> I had a log info statement inside the while loop and i see that even when I
> run the processor for a second, the log info runs over 10 times. The
> number of entries in the cache are 300 so I know that the loop will run 300
> times, thus printing the log info at least 300 times, but how to i make it
> execute only once.
>
>  n13654/writeavro.png>
>
>
>
> --
> View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Creating-and-Writing-to-an-Avro-File-tp13654.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>


Aw: CsvToAttributes processor

2016-10-19 Thread Uwe Geercken
Francois,

very nice. Thanks.

I was working on a simple version a while ago. But it had another scope:
I wanted to have a Nifi processor to merge CSV data with a template from a
template engine (e.g. Apache Velocity). I will review my code and have a look 
at your processor.

Where can we get it? Github?

Rgds,

Uwe

> Sent: Wednesday, October 19, 2016 at 11:10 AM
> From: "François Prunier" 
> To: dev@nifi.apache.org
> Subject: CsvToAttributes processor
>
> Hello Nifi folks,
> 
> I've built a processor to parse CSV files with headers and turn each
> line into a flowfile. Each resulting flowfile has as many attributes as
> the number of columns. Each attribute has the name of a column with the
> corresponding value for the line.
> 
> For example, this CSV file:
> 
> col1,col2,col3
> a,b,c
> d,e,f
>
> would generate two flowfiles with the following attributes:
>
> col1 = a
> col2 = b
> col3 = c
>
> and
>
> col1 = d
> col2 = e
> col3 = f
> 
> As of now, you can configure the charset plus delimiter, quote and 
> escape character. It's based on the commons-csv parser.
> 
> It's very handy if you want to, for example, index a CSV file into 
> elasticsearch.
> 
> Would you guys be interested in a pull request to add this processor to
> the main code base? It needs a bit more documentation and cleanup that
> I would need to add in, but it's already successfully used in production.
> 
> Best regards,
> -- 
> *François Prunier*
> *Hurence* - /Your Big Data experts/
> http://www.hurence.com
> *mobile:* +33 6 38 68 60 50
> 
>


CsvToAttributes processor

2016-10-19 Thread François Prunier

Hello Nifi folks,

I've built a processor to parse CSV files with headers and turn each
line into a flowfile. Each resulting flowfile has as many attributes as
the number of columns. Each attribute has the name of a column with the
corresponding value for the line.


For example, this CSV file:

col1,col2,col3
a,b,c
d,e,f

would generate two flowfiles with the following attributes:

col1 = a
col2 = b
col3 = c

and

col1 = d
col2 = e
col3 = f

As of now, you can configure the charset plus delimiter, quote and 
escape character. It's based on the commons-csv parser.
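
The core loop of such a processor might look roughly like the following with
commons-csv (a sketch based on this description, not the actual code; `in`,
`charset`, `original`, and REL_SUCCESS are illustrative):

CSVFormat format = CSVFormat.DEFAULT
    .withFirstRecordAsHeader()   // column names come from the header row
    .withDelimiter(',');
try (CSVParser parser = format.parse(new InputStreamReader(in, charset))) {
  for (CSVRecord record : parser) {
    // one flowfile per line; each column becomes an attribute
    FlowFile split = session.create(original);
    split = session.putAllAttributes(split, record.toMap());
    session.transfer(split, REL_SUCCESS);
  }
}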


It's very handy if you want to, for example, index a CSV file into 
elasticsearch.


Would you guys be interested in a pull request to add this processor to
the main code base? It needs a bit more documentation and cleanup that
I would need to add in, but it's already successfully used in production.


Best regards,
--
*François Prunier*
*Hurence* - /Your Big Data experts/
http://www.hurence.com
*mobile:* +33 6 38 68 60 50



Creating and Writing to an Avro File

2016-10-19 Thread Arsalan Siddiqi
Hi
I am new to Nifi. I am trying to read data from a cache and then create a
new flow file and write the contents of the cache to the flowfile's content.
Firstly, no flow file is created when I run my processor. As the code is
looping over an array, I think I need to use the append function to append
the contents to the flow file. At the moment I am guessing it is just
overwriting the contents with the latest object of the array. At the moment
I just want it to create a file with some content, and then later I can
figure out how I can manipulate it.

Another thing is how can I schedule the processor to run only once? At first
I had a log info statement inside the while loop and I see that even when I
run the processor for a second, the log info runs over 10 times. The
number of entries in the cache is 300 so I know that the loop will run 300
times, thus printing the log info at least 300 times, but how do I make it
execute only once?

<writeavro.png>



--
View this message in context: 
http://apache-nifi-developer-list.39713.n7.nabble.com/Creating-and-Writing-to-an-Avro-File-tp13654.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.