Re: GetSQS causes high CPU usage

2015-10-20 Thread Adam Lamar

Adam,

Thanks for the reply!

Amazon supports (and recommends) long polling on SQS queues[1]. The 
GetSQS code doesn't attempt long polling at all, but I wasn't sure if 
this was intentional or if the option had just never been added. With a 
20 second long poll, the processor would make 3 requests per minute 
instead of 60, assuming the queue was empty during that time.
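
As a rough illustration of that arithmetic, here is a minimal Java sketch. 
The class and method names (PollingRate, requestsPerMinute) are hypothetical 
and are not part of NiFi or the AWS SDK. It assumes the queue stays empty, so 
every request lasts the full interval before the next one is issued.

```java
public class PollingRate {
    // Hypothetical helper: number of receive requests issued per minute
    // when each request occupies `secondsPerRequest` seconds, assuming
    // the queue is empty so no request returns early.
    static int requestsPerMinute(int secondsPerRequest) {
        if (secondsPerRequest <= 0) {
            throw new IllegalArgumentException("secondsPerRequest must be positive");
        }
        return 60 / secondsPerRequest;
    }

    public static void main(String[] args) {
        // 1 second run schedule, no long polling
        System.out.println("1 s schedule:   " + requestsPerMinute(1) + " requests/minute");
        // 20 second long poll
        System.out.println("20 s long poll: " + requestsPerMinute(20) + " requests/minute");
    }
}
```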


Another data point - even during high CPU usage, the GetSQS processor 
was only making one request per second to SQS (verified via tcpdump). 
While not ideal from a billing perspective, doesn't it seem wrong that 1 
request a second is causing such high CPU?


Perhaps to muddy the waters a bit: I played with the run schedule 
yesterday, and even now that I've turned it back to 1 second, CPU usage 
remains low. Before, I could start/stop GetSQS repeatedly and 
observe the high CPU usage, but now I can't reproduce it. If I'm able to 
consistently reproduce the issue in the future, I'll be sure to post again.


Cheers,
Adam


[1] 
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-long-polling.html


On 10/20/15 4:37 AM, Adam Estrada wrote:

Adam,

I suspect that GetSQS is polling Amazon to check for data. SQS is not exactly 
like your standard message broker in that you have to force the poll. Anyway, 
throw a wait time in there and see if that fixes it. This will also help lower 
your monthly Amazon bill...

Adam



On Oct 19, 2015, at 11:41 PM, Adam Lamar wrote:

Hi everybody!

I've been testing NiFi 0.3.0 with the GetSQS processor to fetch objects from an 
S3 bucket as they're created. My flow looks like this:

GetSQS
SplitJson
ExtractText
FetchS3Object
PutFile

I noticed that GetSQS causes high CPU usage - about 90% of one core. If I turn 
off GetSQS, CPU usage immediately drops to 2%. If I turn GetSQS back on with 
the run schedule at 10, it stays at 2%.

Would it be worth using setWaitTimeSeconds [1] to make the SQS receive a 
blocking call? Alternatively, should GetSQS default to a longer run schedule?

Cheers,
Adam


[1] 
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/sqs/model/ReceiveMessageRequest.html#setWaitTimeSeconds(java.lang.Integer)




Re: output port

2015-10-20 Thread Bryan Bende
Ok, can you describe the flow in NiFi leading up to the output port? What
kind of data is in the content of the FlowFiles?

On Tuesday, October 20, 2015, Rama Krishna Manne <
chaitanya.mann...@gmail.com> wrote:

>> so you are seeing the messages reach the output port and then get removed
>> from the queue
>
> Yes.
>
>> And on the Spark side the NiFi Spark receiver never receives anything? Or
>> it receives messages, but they have no content?
>
> It receives the messages, but they have no content to run computations on.
>
> On Mon, Oct 19, 2015 at 2:14 PM, Bryan Bende wrote:
>
>> Hello,
>>
>> Just to clarify, so you are seeing the messages reach the output port and
>> then get removed from the queue? And on the Spark side, the NiFi Spark
>> receiver never receives anything? Or it receives messages, but they have no
>> content?
>>
>> -Bryan
>>
>>
>> On Monday, October 19, 2015, Rama Krishna Manne <
>> chaitanya.mann...@gmail.com> wrote:
>>
>>> I have a flow in which messages are emitted to an output port, and
>>> Apache Spark pulls the messages from the port. I see that the messages are
>>> pulled by Spark, but I cannot see the data pulled (cannot do any
>>> computations).
>>>
>>> I tried a different way: the messages are pushed to Apache Kafka, and Spark
>>> pulls messages from the Kafka queue instead of the output port. This worked.
>>>
>>> My question is: why didn't it work for the output port? And is using Kafka
>>> instead of the output port a good flow?



Re: output port

2015-10-20 Thread Rama Krishna Manne
> so you are seeing the messages reach the output port and then get removed
> from the queue

Yes.

> And on the Spark side the NiFi Spark receiver never receives anything? Or
> it receives messages, but they have no content?

It receives the messages, but they have no content to run computations on.

On Mon, Oct 19, 2015 at 2:14 PM, Bryan Bende wrote:

> Hello,
>
> Just to clarify, so you are seeing the messages reach the output port and
> then get removed from the queue? And on the Spark side, the NiFi Spark
> receiver never receives anything? Or it receives messages, but they have no
> content?
>
> -Bryan
>
>
> On Monday, October 19, 2015, Rama Krishna Manne <
> chaitanya.mann...@gmail.com> wrote:
>
>> I have a flow in which messages are emitted to an output port, and
>> Apache Spark pulls the messages from the port. I see that the messages are
>> pulled by Spark, but I cannot see the data pulled (cannot do any
>> computations).
>>
>> I tried a different way: the messages are pushed to Apache Kafka, and Spark
>> pulls messages from the Kafka queue instead of the output port. This worked.
>>
>> My question is: why didn't it work for the output port? And is using Kafka
>> instead of the output port a good flow?