Re: GetSQS causes high CPU usage

2015-11-03 Thread Joe Witt
Adam,

Just wanted to follow up on this. Have you had any better results, and
should we file a JIRA for what you're seeing?

Thanks
Joe

On Tue, Oct 20, 2015 at 7:58 PM, Adam Lamar wrote:
> Adam,
>
> Thanks for the reply!
>
> Amazon supports (and recommends) long polling on SQS queues [1]. The GetSQS
> code doesn't attempt long polling at all, but I wasn't sure whether this was
> intentional or the option had just never been added. With a 20-second
> long poll, the processor would make 3 requests per minute instead of 60 at
> a 1-second run schedule, assuming the queue stayed empty during that time.
>
> Another data point: even during the high CPU usage, the GetSQS processor was
> only making one request per second to SQS (verified via tcpdump). While not
> ideal from a billing perspective, doesn't it seem wrong that one request per
> second is causing such high CPU usage?
>
> Perhaps to muddy the waters a bit, I played with the run schedule yesterday,
> and even now that I've turned it back to 1 second, CPU usage remains low.
> Previously I could start and stop GetSQS repeatedly and observe the high CPU
> usage, but now I can't reproduce it. If I can reproduce the issue
> consistently in the future, I'll be sure to post again.
>
> Cheers,
> Adam
>
>
> [1]
> http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-long-polling.html
>
>
> On 10/20/15 4:37 AM, Adam Estrada wrote:
>>
>> Adam,
>>
>> I suspect that GetSQS is polling Amazon to check for data. SQS is not
>> exactly like a standard message broker, in that the client has to poll
>> for messages rather than having them pushed. Anyway, throw a wait time in
>> there and see if that fixes it. This will also help lower your monthly
>> Amazon bill...
>>
>> Adam
>>
>>
>>> On Oct 19, 2015, at 11:41 PM, Adam Lamar wrote:
>>>
>>> Hi everybody!
>>>
>>> I've been testing NiFi 0.3.0 with the GetSQS processor to fetch objects
>>> from an S3 bucket as they're created. My flow looks like this:
>>>
>>> GetSQS
>>> SplitJson
>>> ExtractText
>>> FetchS3Object
>>> PutFile
>>>
>>> I noticed that GetSQS causes high CPU usage: about 90% of one core. If I
>>> turn off GetSQS, CPU usage immediately drops to 2%. If I turn GetSQS back
>>> on with the run schedule at 10 seconds, it stays at 2%.
>>>
>>> Would it be worth using setWaitTimeSeconds [1] to make the SQS receive a
>>> blocking call? Alternatively, should GetSQS default to a longer run
>>> schedule?
>>>
>>> Cheers,
>>> Adam
>>>
>>>
>>> [1]
>>> http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/sqs/model/ReceiveMessageRequest.html#setWaitTimeSeconds(java.lang.Integer)
>
>


Re: GetSQS causes high CPU usage

2015-11-03 Thread Adam Lamar

Hey Joe,

I think there are two possible JIRAs.

1) Add long polling support using setWaitTimeSeconds(). This should be
really easy, and I can take a crack at a pull request. Here's a JIRA:
https://issues.apache.org/jira/browse/NIFI-1103


2) Investigate the high CPU usage. I saw it for several days initially,
but it went away after I adjusted the run schedule (from 1 second to
10 seconds and back to 1 second). I have CPU charts showing the high
usage and the corresponding drop, but I need to reproduce the issue.


I'll circle back in a few days when I get some time to work on it.

Cheers,
Adam


Re: GetSQS causes high CPU usage

2015-10-20 Thread Adam Lamar

Adam,

Thanks for the reply!

Amazon supports (and recommends) long polling on SQS queues [1]. The
GetSQS code doesn't attempt long polling at all, but I wasn't sure whether
this was intentional or the option had just never been added. With a
20-second long poll, the processor would make 3 requests per minute
instead of 60 at a 1-second run schedule, assuming the queue stayed
empty during that time.
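
The request-rate arithmetic above can be checked with a small model. This is a sketch under stated assumptions, not GetSQS or NiFi code, and `SqsPollRate` is a hypothetical name. It assumes an empty queue, one concurrent task, and that an empty long poll returns only after the full wait time. This thread does not say whether the scheduler overlaps the run-schedule delay with the wait or adds the delay after each call completes, so the model computes both. The "3 requests per minute" figure corresponds to the overlapping case.

```java
/**
 * Rough model of ReceiveMessage calls per time window against an EMPTY queue.
 * Assumptions (hypothetical, not taken from the GetSQS source):
 *  - one concurrent task, so calls never overlap;
 *  - an empty long poll returns only after the full wait time;
 *  - delayAfterCompletion == false: cycle = max(run schedule, wait time);
 *    delayAfterCompletion == true:  cycle = run schedule + wait time.
 */
class SqsPollRate {

    // SQS accepts a wait time from 0 (short polling) to 20 seconds.
    static final int MAX_WAIT_SECONDS = 20;

    static long requestsPerWindow(int runScheduleSeconds, int waitTimeSeconds,
                                  int windowSeconds, boolean delayAfterCompletion) {
        if (runScheduleSeconds < 1) {
            // A 0-second schedule would need a different model; not covered here.
            throw new IllegalArgumentException("run schedule must be >= 1 second");
        }
        if (waitTimeSeconds < 0 || waitTimeSeconds > MAX_WAIT_SECONDS) {
            throw new IllegalArgumentException("wait time must be 0-20 seconds");
        }
        long cycleSeconds = delayAfterCompletion
                ? (long) runScheduleSeconds + waitTimeSeconds
                : Math.max(runScheduleSeconds, waitTimeSeconds);
        return windowSeconds / cycleSeconds;
    }

    public static void main(String[] args) {
        // 1-second run schedule, short polling: one call per second.
        System.out.println(requestsPerWindow(1, 0, 60, false));  // prints 60
        // Same schedule with a 20-second long poll, delay overlapping the wait.
        System.out.println(requestsPerWindow(1, 20, 60, false)); // prints 3
        // Same, but the 1-second delay is added after each 20-second poll.
        System.out.println(requestsPerWindow(1, 20, 60, true));  // prints 2
    }
}
```

In both cases the empty-queue request count drops by more than an order of magnitude compared with short polling every second.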


Another data point: even during the high CPU usage, the GetSQS processor
was only making one request per second to SQS (verified via tcpdump).
While not ideal from a billing perspective, doesn't it seem wrong that one
request per second is causing such high CPU usage?
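
One way to sanity-check the intuition that one request per second should be cheap is to measure how much CPU a thread actually consumes over a wall-clock window. The sketch below uses the JDK's standard `ThreadMXBean`. `CpuFraction` is a hypothetical name, and its loop bodies are stand-ins rather than GetSQS code: a sleep represents a poller that waits between requests, and an empty body represents a busy loop. A thread that sleeps between calls should show a CPU fraction near zero, while a spinning thread approaches a full core. By that reasoning, about 90% of a core at one request per second would suggest something spinning rather than the requests themselves.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Measures the fraction of wall-clock time the current thread spends on CPU. */
class CpuFraction {

    static double measure(Runnable loopBody, long wallMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time not supported on this JVM");
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        long windowNanos = wallMillis * 1_000_000L;
        long wallStart = System.nanoTime();
        long cpuStart = mx.getCurrentThreadCpuTime();
        // Repeat the loop body until the wall-clock window has elapsed.
        while (System.nanoTime() - wallStart < windowNanos) {
            loopBody.run();
        }
        long cpuNanos = mx.getCurrentThreadCpuTime() - cpuStart;
        long wallNanos = System.nanoTime() - wallStart;
        return (double) cpuNanos / wallNanos;
    }

    /** Stand-in for a poller that waits between requests (hypothetical). */
    static void sleepBriefly() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        double idle = measure(CpuFraction::sleepBriefly, 500);
        double busy = measure(() -> { }, 500);
        System.out.printf("sleeping loop: %.2f of a core%n", idle);
        System.out.printf("spinning loop: %.2f of a core%n", busy);
    }
}
```

The exact fractions depend on the machine and JVM. The useful signal is the gap between the two loops.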


Perhaps to muddy the waters a bit, I played with the run schedule
yesterday, and even now that I've turned it back to 1 second, CPU usage
remains low. Previously I could start and stop GetSQS repeatedly and
observe the high CPU usage, but now I can't reproduce it. If I can
reproduce the issue consistently in the future, I'll be sure to post again.


Cheers,
Adam


[1] 
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-long-polling.html

