[GitHub] nifi pull request: [NIFI-1394] - Unit test creates resources in HO...

2016-01-15 Thread smarthi
GitHub user smarthi opened a pull request:

https://github.com/apache/nifi/pull/174

[NIFI-1394] - Unit test creates resources in HOME directory

Changed the DB_LOCATION to be "/tmp/test/h2"
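
For readers unfamiliar with the problem, here is a minimal Java sketch of the general idea: keep test databases out of the user's HOME directory. This is not the code from the pull request, which simply hard-codes "/tmp/test/h2". It swaps in a per-run temporary directory instead of a fixed path, and the names TestDbLocation, newDbLocation and the "nifi-test-h2" prefix are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class TestDbLocation {

    // Hypothetical helper: returns a database path inside a fresh temporary
    // directory (created under java.io.tmpdir) instead of under user.home.
    static Path newDbLocation() throws IOException {
        Path dir = Files.createTempDirectory("nifi-test-h2");
        return dir.resolve("h2");
    }

    public static void main(String[] args) throws IOException {
        System.out.println("DB_LOCATION = " + newDbLocation());
    }
}
```

A fixed path such as "/tmp/test/h2" is simpler to inspect after a failed run; a fresh directory per run avoids collisions between concurrent builds.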

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/smarthi/incubator-nifi NIFI-1394

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/174.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #174


commit 8c822c4560a2137a98dbe2ad384cc1018b096ff3
Author: smarthi 
Date:   2016-01-16T06:48:06Z

[NIFI-1394] - Unit test creates resources in HOME directory






Re: NiFi 0.4.1 InvokeHttp processor POST error issue

2016-01-15 Thread Adam Taft
Joe,

Awesome.  +1 to "Use Chunked Encoding" as 'false' by default.  I think for
historical purposes of HTTP, if you know the size of your payload, you
should explicitly assert it using content-length.  Sending payloads with
"normal" transfer encoding by default would be a great strategy.

From a pragmatic point of view, if you know the size of your payload,
there's really no reason to use chunked encoding.  Having this feature
property exposed at all is just a nicety, I guess.
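
To make the difference concrete, here is a minimal Java sketch of how the same body is framed on the wire in each mode. The names BodyFraming, withContentLength and chunked are made up for illustration, the chunk size is arbitrary, and the sketch assumes an ASCII body so that character count equals byte count. The framing rules themselves are those of HTTP/1.1.

```java
import java.nio.charset.StandardCharsets;

final class BodyFraming {

    // Fixed-length framing: the header announces the byte count up front and
    // the body follows unchanged.
    static String withContentLength(String body) {
        int length = body.getBytes(StandardCharsets.US_ASCII).length;
        return "Content-Length: " + length + "\r\n\r\n" + body;
    }

    // Chunked framing: every chunk is prefixed with its size in hex, and a
    // zero-length chunk ends the body. No total length is ever sent.
    // Assumes an ASCII body (one char == one byte).
    static String chunked(String body, int chunkSize) {
        StringBuilder out = new StringBuilder("Transfer-Encoding: chunked\r\n\r\n");
        for (int i = 0; i < body.length(); i += chunkSize) {
            String chunk = body.substring(i, Math.min(body.length(), i + chunkSize));
            out.append(Integer.toHexString(chunk.length())).append("\r\n")
               .append(chunk).append("\r\n");
        }
        return out.append("0\r\n\r\n").toString();
    }

    public static void main(String[] args) {
        String body = "a=1&b=2";
        System.out.println(withContentLength(body));
        System.out.println(chunked(body, 4));
    }
}
```

A server that only understands the first form will see the hex size lines of the second form as part of the payload, or reject the request outright.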

Adam





Re: NiFi 0.4.1 InvokeHttp processor POST error issue

2016-01-15 Thread Joe Percivall
Adam,

You are right and it looks like OkHttp could easily support it.

If you look at this line of OkHttp [1], you'll see that if the contentLength is
not set then it will use chunked encoding, and otherwise it won't. If we add a
property to InvokeHttp that lets the user choose whether to use chunked
encoding, and adjust the contentLength method of the RequestBody accordingly,
we can enable the option.

I tried making a simple change in InvokeHttp to the RequestBody to just
implement the contentLength method with the correct value, combined with
updating the mime-type to "application/x-www-form-urlencoded" to send as the
content-type, and I successfully ran the template. I've created a ticket [2]
and will have a patch up for the change shortly.

I'm going to assume that the default value for the "Use Chunked Encoding"
property will be false.

[1] https://github.com/square/okhttp/blob/parent-2.7.1/okhttp/src/main/java/com/squareup/okhttp/Call.java#L263
[2] https://issues.apache.org/jira/browse/NIFI-1405
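
The decision described above can be sketched in plain Java. This is a model only: it uses stand-in types rather than the real OkHttp classes so that it compiles without the library, and it is not the InvokeHttp patch. The names FramingChoice, Body, framingHeaders and bodyFor are hypothetical; the one detail borrowed from OkHttp is the convention that a contentLength of -1 means "unknown".

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FramingChoice {

    // Hypothetical stand-in for a request body. -1 means "length unknown",
    // mirroring the convention of OkHttp's RequestBody.contentLength().
    interface Body {
        default long contentLength() {
            return -1;
        }
    }

    // Picks the framing header from the length the body reports.
    static Map<String, String> framingHeaders(Body body) {
        Map<String, String> headers = new LinkedHashMap<>();
        long length = body.contentLength();
        if (length != -1) {
            headers.put("Content-Length", Long.toString(length));
        } else {
            headers.put("Transfer-Encoding", "chunked");
        }
        return headers;
    }

    // Hypothetical helper: what a "Use Chunked Encoding" property could toggle.
    // The flowfile size is always known, so reporting it is always possible.
    static Body bodyFor(long flowFileSize, boolean useChunkedEncoding) {
        return new Body() {
            @Override
            public long contentLength() {
                return useChunkedEncoding ? -1 : flowFileSize;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(framingHeaders(bodyFor(249, false)));
        System.out.println(framingHeaders(bodyFor(249, true)));
    }
}
```

With the property set to false the body reports its real size and the request carries Content-Length; with it set to true the body reports -1 and the request falls back to chunked encoding.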
Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com






Re: ListenLumberjack processor is working

2016-01-15 Thread Bryan Bende
Andre,

Very cool that you have made progress here. Being able to integrate with
logstash will be very useful.

I think the refactoring I'm doing for the RELP stuff should help reduce the
amount of code that had to be carried over from ListenSyslog. I'm happy to
help you update your code once my changes are in. Sorry it hasn't gotten in
sooner.

-Bryan




Re: NiFi 0.4.1 InvokeHttp processor POST error issue

2016-01-15 Thread Adam Taft
Joe,

Just as a quick observation, this statement isn't completely accurate:

> "... and can stream the contents instead of loading into memory"

The original InvokeHTTP code (pre okhttp) explicitly set the content-length
header, because it was known (the flowfile payload content length is always
known).  This does not, however, imply that the entire contents were loaded
into memory.  The previous InvokeHTTP used the
#setFixedLengthStreamingMode(long), which is described as:

"This method is used to enable streaming of a HTTP request body without
internal buffering, when the content length is known in advance." [1]

HttpURLConnection doesn't need to buffer if the length is known in
advance.  It's only when it doesn't know the length that it either needs to
buffer to determine it or use chunked encoding.

I think it's important to be able to support non-chunked encoded POST
requests.  There are many "legacy" (or even "broken") web services that
don't work with chunked encoding, obviously like in this case.

Unfortunately, I don't recall that okhttp has similar direct support for
"fixed length streaming".  It's probable that a custom implementation of
okhttp.RequestBody would need to be created to support this. [2]

[1]
https://docs.oracle.com/javase/8/docs/api/java/net/HttpURLConnection.html#setFixedLengthStreamingMode-long-

[2] http://square.github.io/okhttp/3.x/okhttp/okhttp3/RequestBody.html
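
As an illustration of the fixed-length streaming mode described above, here is a minimal sketch using only java.net.HttpURLConnection. It is not the original InvokeHTTP code; the class name FixedLengthPost, the method post and the buffer size are assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

final class FixedLengthPost {

    // Streams exactly `length` bytes from `content` to `url` and returns the
    // HTTP status code. The request carries a Content-Length header.
    static int post(URL url, InputStream content, long length, String contentType)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", contentType);
        // The length is known in advance, so the body is streamed without
        // internal buffering and without chunked encoding.
        conn.setFixedLengthStreamingMode(length);
        // When the length is NOT known, the alternative would be:
        // conn.setChunkedStreamingMode(0);
        try (OutputStream out = conn.getOutputStream()) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = content.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Only one 8 KB buffer is held in memory at a time, whatever the payload size. If the number of bytes written does not match the declared length, HttpURLConnection raises an IOException rather than sending a malformed request.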

On Thu, Jan 14, 2016 at 10:29 PM, Joe Percivall <
joeperciv...@yahoo.com.invalid> wrote:

> Hello Evan,
>
> Glad to hear you're enjoying NiFi!
>
> I was able to replicate your results so I dug in a bit and noticed in
> Wireshark that the "Transfer-Encoding" header for InvokeHttp was set to
> "chunked". When I tried using the same flag for curl it failed so I'm
> relatively confident that is the problem. Currently InvokeHttp requires
> using chunked encoding for POST (primarily because you don't need to know
> the content-length and can stream the contents instead of loading into
> memory).
>
> PostHttp does have a "Use Chunked Encoding" option which would solve your
> problem except that it doesn't work properly. PostHttp is using the
> "EntityTemplate" which streams the content, so the content length is never
> set and thus it will always use chunked encoding. I created a
> ticket for it [1].
>
>
> Also as a note, when creating a template you have to either explicitly
> select the connections or not select anything and create a template for the
> whole canvas (your template didn't have any connections).
>
> [1] https://issues.apache.org/jira/browse/NIFI-1396
>
> Cheers,
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joeperciv...@yahoo.com
>
>
>
> On Thursday, January 14, 2016 8:07 PM, "yuchen@thomsonreuters.com" <
> yuchen@thomsonreuters.com> wrote:
>
>
>
>
> Hi Guys,
>
> Not sure if sending this email is the correct way to raise an issue; if
> not, let me know where to post the issue, thanks.
>
> We are using the NiFi InvokeHttp processor to POST to a webpage.
> URL:
> http://www.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx
> Request header: Content-Type: application/x-www-form-urlencoded
> POST Data:
> txt_stock_code=24984&sel_DateOfReleaseFrom_y=2016&sel_DateOfReleaseFrom_m=01&sel_DateOfReleaseFrom_d=04&sel_DateOfReleaseTo_y=2016&sel_DateOfReleaseTo_m=01&sel_DateOfReleaseTo_d=11&IsFromNewList=False&sel_tier_1=-2&sel_tier_2_group=-2&sel_tier_2=-2
>
> To make sure the request header and request body are correct, we used
> Fiddler to compose the POST request, and the response shows that the
> request header and POST data are correct.
>
>
> The attached file is the template we are using. It works fine on version
> 0.3.0, but not on the latest version, 0.4.1.
>
> So we suppose it is a potential defect of the InvokeHttp processor in this
> version.
> We checked the source code to try to locate the issue and found that it is
> using com.squareup.okhttp.Request to do the request, so we did not dig any
> further into the issue.
> Currently we are using curl to do the POST as a workaround.
>
> Let me know your comments, thanks.
>
> Finally, NiFi is a great tool!!! You guys are awesome!!!
>
> Best Regards,
> Evan from Thomson Reuters
>


ListenLumberjack processor is working

2016-01-15 Thread Andre
Hey folks,

I've managed to make progress on ListenLumberjack. The code is a bit
'spaghettic' at the moment, with a serious amount of logging enabled
to allow some additional troubleshooting, but overall it "works".

I am strongly considering refactoring the code as a whole once Bryan completes
the ListenRELP processor.

Functional code (I guess? :D ) should be available here:

https://github.com/trixpan/nifi-lumberjack-bundle/

Known issues:
* If logstash-forwarder goes silent for too long the processor will raise a
Timeout. I couldn't find evidence of a keep-alive within Lumberjack, so I am
considering catching this error and logging it at debug level.
* I suspect the code may have some memory leaks.
* Tests haven't been created yet. To be honest, I have never written unit
tests in my whole life, so it will be another ride. :-)

My results were the following:

Single thread, 2 sec runs
2016/01/15 23:52:27.589014 Registrar: processing 4000 events
2016/01/15 23:52:29.169361 Registrar: processing 4000 events
2016/01/15 23:52:30.552031 Registrar: processing 4000 events
2016/01/15 23:52:32.998425 Registrar: processing 4000 events
2016/01/15 23:52:35.411438 Registrar: processing 4000 events
2016/01/15 23:52:37.062141 Registrar: processing 4000 events
2016/01/15 23:52:39.468577 Registrar: processing 4000 events
2016/01/15 23:52:40.940890 Registrar: processing 4000 events
2016/01/15 23:52:43.480875 Registrar: processing 4000 events
2016/01/15 23:52:45.026758 Registrar: processing 4000 events

4 threads, 2 sec runs
2016/01/15 23:56:03.376303 Registrar: processing 4000 events
2016/01/15 23:56:03.443074 Registrar: processing 4000 events
2016/01/15 23:56:03.471795 Registrar: processing 4000 events
2016/01/15 23:56:03.508283 Registrar: processing 4000 events
2016/01/15 23:56:03.534002 Registrar: processing 4000 events
2016/01/15 23:56:03.562387 Registrar: processing 4000 events
2016/01/15 23:56:03.587744 Registrar: processing 4000 events
2016/01/15 23:56:03.622716 Registrar: processing 4000 events
2016/01/15 23:56:03.649074 Registrar: processing 4000 events
2016/01/15 23:56:03.675780 Registrar: processing 4000 events

Would anyone have a decent logstash testbed to put some extra pressure
against the processor?
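
For comparing the two runs above, a rough events-per-second figure can be derived from the Registrar timestamps. The sketch below is only one way of reading the logs: it assumes each line marks one completed batch and divides the batches after the first line by the time between the first and last lines. The names RegistrarRate and eventsPerSecond are made up for the example.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

final class RegistrarRate {

    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss.SSSSSS");

    // Events per second between the first and last log line, counting only
    // the batches reported after the first line (assumption: one line == one
    // completed batch of eventsPerBatch events).
    static double eventsPerSecond(List<String> lines, int eventsPerBatch) {
        LocalDateTime first = timestamp(lines.get(0));
        LocalDateTime last = timestamp(lines.get(lines.size() - 1));
        double seconds = Duration.between(first, last).toNanos() / 1e9;
        return (lines.size() - 1) * (double) eventsPerBatch / seconds;
    }

    // The timestamp is the first 26 characters of a line,
    // e.g. "2016/01/15 23:52:27.589014".
    private static LocalDateTime timestamp(String line) {
        return LocalDateTime.parse(line.substring(0, 26), STAMP);
    }
}
```

Fed the ten single-thread lines with 4000 events per batch, this gives roughly 2,060 events per second. The ten four-thread lines span only about 0.3 seconds, so the much larger figure they produce (on the order of 120,000 events per second) probably says more about a short burst than about sustained throughput.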