Re: FetchS3Object leak

2017-03-22 Thread David Hesson
Hi James,

Sorry - you are right. I searched for close and didn't notice it was in a
try-with-resources block. The odd thing is that we only allow a single task to
be scheduled at a time, so the pool timeout led me to think it was leaking
resources.

I'll continue investigating and see if I can find out what's going on.

Thanks!

On 2017-03-22 16:34 (-0400), James Wing wrote:
> David,
> 
> Can you clarify which part of the FetchS3Object code looks problematic to
> you?  From a quick look, I found one use of S3Object in FetchS3Object.java,
> line ~106:
> 
> try (final S3Object s3Object = client.getObject(request)) {
>     flowFile = session.importFrom(s3Object.getObjectContent(), flowFile);
>     attributes.put("s3.bucket", s3Object.getBucketName());
> 
> I believe declaring the variable within the try block will lead to its
> proper and certain closure, but I'm not 100% on all the fine print with
> that.  Is this what you are referring to, and does it not work as I hope?
> 
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/FetchS3Object.java#L106
> 
> Thanks,
> 
> James
> 
> 
> On Wed, Mar 22, 2017 at 12:41 PM, David Hesson wrote:
> 
> > Greetings,
> >
> > In investigating a connection pool issue we were having during development,
> > I was checking the FetchS3Object code to see how it reads content from S3.
> > I don't see a close() invocation
> > on the S3Object in the FetchS3Object processor. I believe this can lead to
> > leaks on that object.
> >
> > We were seeing logs like the following after trying to process some 90k
> > objects from S3:
> > INFO [Timer-Driven Process Thread-55] com.amazonaws.http.AmazonHttpClient
> > Unable to execute HTTP request: Timeout waiting for connection from pool
> >
> > Is the S3Object not closed because the stream content is lazily loaded
> > later in the flow (when accessed)? I didn't check the ProcessSession
> > implementation which reads the input stream. Just figured I'd ask and see
> > if you all were aware, or whether this is by design for some reason.
> >
> > Thanks,
> > dh
> >
> 


Re: FetchS3Object leak

2017-03-22 Thread James Wing
David,

Can you clarify which part of the FetchS3Object code looks problematic to
you?  From a quick look, I found one use of S3Object in FetchS3Object.java,
line ~106:

try (final S3Object s3Object = client.getObject(request)) {
    flowFile = session.importFrom(s3Object.getObjectContent(), flowFile);
    attributes.put("s3.bucket", s3Object.getBucketName());

I believe declaring the variable within the try block will lead to its
proper and certain closure, but I'm not 100% on all the fine print with
that.  Is this what you are referring to, and does it not work as I hope?
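
For reference, here is a minimal sketch of the try-with-resources behavior in
question. TrackedResource is a hypothetical stand-in for S3Object, not NiFi or
AWS SDK code; it only records the order of events, to show that close() runs
both on normal exit and when the body of the try block throws.

```java
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesDemo {

    // Hypothetical stand-in for S3Object: records when it is opened, read, and closed.
    static final class TrackedResource implements AutoCloseable {
        private final List<String> events;

        TrackedResource(List<String> events) {
            this.events = events;
            events.add("open");
        }

        void read(boolean fail) {
            events.add("read");
            if (fail) {
                throw new IllegalStateException("simulated read failure");
            }
        }

        @Override
        public void close() {
            events.add("close");
        }
    }

    // Runs one try-with-resources block and returns the ordered event log.
    static List<String> run(boolean fail) {
        List<String> events = new ArrayList<>();
        try (TrackedResource resource = new TrackedResource(events)) {
            resource.read(fail);
        } catch (IllegalStateException e) {
            // The resource has already been closed by the time this handler runs.
            events.add("caught");
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [open, read, close]
        System.out.println(run(true));  // [open, read, close, caught]
    }
}
```

In both runs "close" is logged without any explicit close() call, which is why
a text search for close() in a processor written this way can come up empty.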

https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/FetchS3Object.java#L106

Thanks,

James


On Wed, Mar 22, 2017 at 12:41 PM, David Hesson wrote:

> Greetings,
>
> In investigating a connection pool issue we were having during development,
> I was checking the FetchS3Object code to see how it reads content from S3.
> I don't see a close() invocation
> on the S3Object in the FetchS3Object processor. I believe this can lead to
> leaks on that object.
>
> We were seeing logs like the following after trying to process some 90k
> objects from S3:
> INFO [Timer-Driven Process Thread-55] com.amazonaws.http.AmazonHttpClient
> Unable to execute HTTP request: Timeout waiting for connection from pool
>
> Is the S3Object not closed because the stream content is lazily loaded
> later in the flow (when accessed)? I didn't check the ProcessSession
> implementation which reads the input stream. Just figured I'd ask and see
> if you all were aware, or whether this is by design for some reason.
>
> Thanks,
> dh
>


FetchS3Object leak

2017-03-22 Thread David Hesson
Greetings,

In investigating a connection pool issue we were having during development,
I was checking the FetchS3Object code to see how it reads content from S3.
I don't see a close() invocation
on the S3Object in the FetchS3Object processor. I believe this can lead to
leaks on that object.

We were seeing logs like the following after trying to process some 90k
objects from S3:
INFO [Timer-Driven Process Thread-55] com.amazonaws.http.AmazonHttpClient
Unable to execute HTTP request: Timeout waiting for connection from pool
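
As a rough illustration of the failure mode suspected here, this sketch shows
how leases that are never closed exhaust a bounded pool. TinyPool and Lease are
hypothetical classes built on java.util.concurrent.Semaphore; they are not the
AWS SDK's connection manager, and the exception message merely imitates the log
line above.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolExhaustionDemo {

    // Hypothetical bounded pool: one permit per available connection.
    static final class TinyPool {
        private final Semaphore permits;

        TinyPool(int size) {
            this.permits = new Semaphore(size);
        }

        // Waits up to timeoutMillis for a free permit, then gives up.
        Lease acquire(long timeoutMillis) {
            try {
                if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    throw new IllegalStateException("Timeout waiting for connection from pool");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for connection", e);
            }
            return new Lease(permits);
        }
    }

    // Hypothetical lease: closing it returns the permit to the pool exactly once.
    static final class Lease implements AutoCloseable {
        private final Semaphore permits;
        private boolean closed;

        Lease(Semaphore permits) {
            this.permits = permits;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                permits.release();
            }
        }
    }

    // Returns how many of the attempted acquisitions succeeded before a timeout.
    static int countSuccessful(int poolSize, int attempts, boolean closeLeases) {
        TinyPool pool = new TinyPool(poolSize);
        int succeeded = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                Lease lease = pool.acquire(10);
                succeeded++;
                if (closeLeases) {
                    lease.close();
                }
            } catch (IllegalStateException e) {
                break; // pool exhausted, or the thread was interrupted
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        System.out.println(countSuccessful(2, 5, false)); // 2
        System.out.println(countSuccessful(2, 5, true));  // 5
    }
}
```

With a single sequential caller, the pool only runs dry when leases are not
returned, which is the reasoning behind suspecting a leak.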

Is the S3Object not closed because the stream content is lazily loaded
later in the flow (when accessed)? I didn't check the ProcessSession
implementation which reads the input stream. Just figured I'd ask and see
if you all were aware, or whether this is by design for some reason.

Thanks,
dh