Re: [Spark 1.5+] ReceiverTracker seems not to stop Kinesis receivers

2016-02-23 Thread Roberto Coluccio
Any chance anyone has had a look at this?

Thanks!

On Wed, Feb 10, 2016 at 10:46 AM, Roberto Coluccio <
roberto.coluc...@gmail.com> wrote:

> Thanks Shixiong!
>
> I'm attaching the thread dumps (I printed the Spark UI after expanding all
> the elements, hope that's fine) and the related stderr (INFO level) executor
> logs. There are 3 of them. The thread dumps were collected at the time the
> StreamingContext was (trying to) shut down, i.e. when I saw the following
> logs in the driver's stderr:
>
> 16/02/10 15:46:25 INFO ApplicationMaster: Final app status: SUCCEEDED, 
> exitCode: 0
> 16/02/10 15:46:25 INFO StreamingContext: Invoking stop(stopGracefully=true) 
> from shutdown hook
> 16/02/10 15:46:25 INFO ReceiverTracker: Sent stop signal to all 3 receivers
> 16/02/10 15:46:35 INFO ReceiverTracker: Waiting for receiver job to terminate 
> gracefully
>
>
> Then, from 15:50 onwards, the driver started reporting logs again, as it
> kept on processing as usual. You might find some exceptions in the executor
> logs right at the 15:50 timestamp.
>
> Thank you very much in advance!
>
> Roberto
>
>
>
> On Tue, Feb 9, 2016 at 6:25 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> Could you do a thread dump in the executor that runs the Kinesis receiver
>> and post it? It would be great if you could provide the executor log as well.
>>
>> On Tue, Feb 9, 2016 at 3:14 PM, Roberto Coluccio <
>> roberto.coluc...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> Can anybody kindly help me out a little bit here? I just verified that the
>>> problem is still there on Spark 1.6.0 and emr-4.3.0 as well. It's
>>> definitely a Kinesis-related issue, since with Spark 1.6.0 I can
>>> successfully get Streaming drivers to terminate with no issue IF I don't
>>> use Kinesis and don't open any receivers.
>>>
>>> Thank you!
>>>
>>> Roberto
>>>
>>>
>>> On Tue, Feb 2, 2016 at 4:40 PM, Roberto Coluccio <
>>> roberto.coluc...@gmail.com> wrote:
>>>
 Hi,

 I've been struggling with an issue ever since I tried to upgrade my Spark
 Streaming solution from 1.4.1 to 1.5+.

 I have a Spark Streaming app which creates 3 ReceiverInputDStreams
 leveraging the KinesisUtils.createStream API.

 I used to leverage a timeout to terminate my app gracefully
 (StreamingContext.awaitTerminationOrTimeout(timeout), with SparkConf
 spark.streaming.stopGracefullyOnShutdown=true).

 I used to submit my Spark app on EMR in yarn-cluster mode.

 Everything worked fine up to Spark 1.4.1 (on EMR AMI 3.9).

 Since I upgraded (tried with Spark 1.5.2 on emr-4.2.0 and Spark 1.6.0
 on emr-4.3.0) I can't get the app to actually terminate. The logs tell me it
 tries to, but no confirmation that the receivers stopped is ever received.
 Instead, when the timer gets to the next period, the StreamingContext
 continues its processing for a while (then it gets killed with SIGTERM 15;
 YARN's vmem and pmem kills are disabled).

 ...

 16/02/02 21:22:08 INFO ApplicationMaster: Final app status: SUCCEEDED, 
 exitCode: 0
 16/02/02 21:22:08 INFO StreamingContext: Invoking 
 stop(stopGracefully=true) from shutdown hook
 16/02/02 21:22:08 INFO ReceiverTracker: Sent stop signal to all 3 receivers
 16/02/02 21:22:18 INFO ReceiverTracker: Waiting for receiver job to 
 terminate gracefully
 16/02/02 21:22:52 INFO ContextCleaner: Cleaned shuffle 141
 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
 172.31.3.140:50152 in memory (size: 23.9 KB, free: 2.1 GB)
 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
 ip-172-31-3-141.ec2.internal:41776 in memory (size: 23.9 KB, free: 1224.9 
 MB)
 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
 ip-172-31-3-140.ec2.internal:36295 in memory (size: 23.9 KB, free: 1224.0 
 MB)
 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
 ip-172-31-3-141.ec2.internal:56428 in memory (size: 23.9 KB, free: 1224.9 
 MB)
 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
 ip-172-31-3-140.ec2.internal:50542 in memory (size: 23.9 KB, free: 1224.7 
 MB)
 16/02/02 21:22:52 INFO ContextCleaner: Cleaned accumulator 184
 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
 172.31.3.140:50152 in memory (size: 3.0 KB, free: 2.1 GB)
 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
 ip-172-31-3-141.ec2.internal:41776 in memory (size: 3.0 KB, free: 1224.9 
 MB)
 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
 ip-172-31-3-141.ec2.internal:56428 in memory (size: 3.0 KB, free: 1224.9 
 MB)
 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
 ip-172-31-3-140.ec2.internal:36295 in memory (size: 3.0 KB, free: 1224.0 
 MB)
 16/02/02 21:22:52 INFO 

Re: [Spark 1.5+] ReceiverTracker seems not to stop Kinesis receivers

2016-02-09 Thread Shixiong(Ryan) Zhu
Could you do a thread dump in the executor that runs the Kinesis receiver
and post it? It would be great if you could provide the executor log as well.

On Tue, Feb 9, 2016 at 3:14 PM, Roberto Coluccio  wrote:

> Hello,
>
> Can anybody kindly help me out a little bit here? I just verified that the
> problem is still there on Spark 1.6.0 and emr-4.3.0 as well. It's
> definitely a Kinesis-related issue, since with Spark 1.6.0 I can
> successfully get Streaming drivers to terminate with no issue IF I don't
> use Kinesis and don't open any receivers.
>
> Thank you!
>
> Roberto
>
>
> On Tue, Feb 2, 2016 at 4:40 PM, Roberto Coluccio <
> roberto.coluc...@gmail.com> wrote:
>
>> Hi,
>>
>> I've been struggling with an issue ever since I tried to upgrade my Spark
>> Streaming solution from 1.4.1 to 1.5+.
>>
>> I have a Spark Streaming app which creates 3 ReceiverInputDStreams
>> leveraging the KinesisUtils.createStream API.
>>
>> I used to leverage a timeout to terminate my app gracefully
>> (StreamingContext.awaitTerminationOrTimeout(timeout), with SparkConf
>> spark.streaming.stopGracefullyOnShutdown=true).
>>
>> I used to submit my Spark app on EMR in yarn-cluster mode.
>>
>> Everything worked fine up to Spark 1.4.1 (on EMR AMI 3.9).
>>
>> Since I upgraded (tried with Spark 1.5.2 on emr-4.2.0 and Spark 1.6.0 on
>> emr-4.3.0) I can't get the app to actually terminate. The logs tell me it
>> tries to, but no confirmation that the receivers stopped is ever received.
>> Instead, when the timer gets to the next period, the StreamingContext
>> continues its processing for a while (then it gets killed with SIGTERM 15;
>> YARN's vmem and pmem kills are disabled).
>>
>> ...
>>
>> 16/02/02 21:22:08 INFO ApplicationMaster: Final app status: SUCCEEDED, 
>> exitCode: 0
>> 16/02/02 21:22:08 INFO StreamingContext: Invoking stop(stopGracefully=true) 
>> from shutdown hook
>> 16/02/02 21:22:08 INFO ReceiverTracker: Sent stop signal to all 3 receivers
>> 16/02/02 21:22:18 INFO ReceiverTracker: Waiting for receiver job to 
>> terminate gracefully
>> 16/02/02 21:22:52 INFO ContextCleaner: Cleaned shuffle 141
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
>> 172.31.3.140:50152 in memory (size: 23.9 KB, free: 2.1 GB)
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
>> ip-172-31-3-141.ec2.internal:41776 in memory (size: 23.9 KB, free: 1224.9 MB)
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
>> ip-172-31-3-140.ec2.internal:36295 in memory (size: 23.9 KB, free: 1224.0 MB)
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
>> ip-172-31-3-141.ec2.internal:56428 in memory (size: 23.9 KB, free: 1224.9 MB)
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
>> ip-172-31-3-140.ec2.internal:50542 in memory (size: 23.9 KB, free: 1224.7 MB)
>> 16/02/02 21:22:52 INFO ContextCleaner: Cleaned accumulator 184
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
>> 172.31.3.140:50152 in memory (size: 3.0 KB, free: 2.1 GB)
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
>> ip-172-31-3-141.ec2.internal:41776 in memory (size: 3.0 KB, free: 1224.9 MB)
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
>> ip-172-31-3-141.ec2.internal:56428 in memory (size: 3.0 KB, free: 1224.9 MB)
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
>> ip-172-31-3-140.ec2.internal:36295 in memory (size: 3.0 KB, free: 1224.0 MB)
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
>> ip-172-31-3-140.ec2.internal:50542 in memory (size: 3.0 KB, free: 1224.7 MB)
>> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 680 for time 145444830 
>> ms for checkpointing
>> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 708 for time 145444830 
>> ms for checkpointing
>> 16/02/02 21:25:00 INFO TransformedDStream: Slicing from 145444800 ms to 
>> 145444830 ms (aligned to 145444800 ms and 145444830 ms)
>> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 777 for time 145444830 
>> ms for checkpointing
>> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 801 for time 145444830 
>> ms for checkpointing
>> 16/02/02 21:25:00 INFO JobScheduler: Added jobs for time 145444830 ms
>> 16/02/02 21:25:00 INFO JobGenerator: Checkpointing graph for time 
>> 145444830 ms
>> 16/02/02 21:25:00 INFO DStreamGraph: Updating checkpoint data for time 
>> 145444830 ms
>> 16/02/02 21:25:00 INFO JobScheduler: Starting job streaming job 
>> 145444830 ms.0 from job set of time 145444830 ms
>>
>> ...
>>
>>
>> Please, this is really blocking the upgrade process to the latest Spark
>> versions and I really don't know how to work around it.
>>
>> Any help would be very much appreciated.
>>
>> Thank you,
>>
>> Roberto
>>
>>
>>
>


Re: [Spark 1.5+] ReceiverTracker seems not to stop Kinesis receivers

2016-02-09 Thread Roberto Coluccio
Hello,

Can anybody kindly help me out a little bit here? I just verified that the
problem is still there on Spark 1.6.0 and emr-4.3.0 as well. It's
definitely a Kinesis-related issue, since with Spark 1.6.0 I can
successfully get Streaming drivers to terminate with no issue IF I don't
use Kinesis and don't open any receivers.

Thank you!

Roberto


On Tue, Feb 2, 2016 at 4:40 PM, Roberto Coluccio  wrote:

> Hi,
>
> I've been struggling with an issue ever since I tried to upgrade my Spark
> Streaming solution from 1.4.1 to 1.5+.
>
> I have a Spark Streaming app which creates 3 ReceiverInputDStreams
> leveraging the KinesisUtils.createStream API.
>
> I used to leverage a timeout to terminate my app gracefully
> (StreamingContext.awaitTerminationOrTimeout(timeout), with SparkConf
> spark.streaming.stopGracefullyOnShutdown=true).
>
> I used to submit my Spark app on EMR in yarn-cluster mode.
>
> Everything worked fine up to Spark 1.4.1 (on EMR AMI 3.9).
>
> Since I upgraded (tried with Spark 1.5.2 on emr-4.2.0 and Spark 1.6.0 on
> emr-4.3.0) I can't get the app to actually terminate. The logs tell me it
> tries to, but no confirmation that the receivers stopped is ever received.
> Instead, when the timer gets to the next period, the StreamingContext
> continues its processing for a while (then it gets killed with SIGTERM 15;
> YARN's vmem and pmem kills are disabled).
>
> ...
>
> 16/02/02 21:22:08 INFO ApplicationMaster: Final app status: SUCCEEDED, 
> exitCode: 0
> 16/02/02 21:22:08 INFO StreamingContext: Invoking stop(stopGracefully=true) 
> from shutdown hook
> 16/02/02 21:22:08 INFO ReceiverTracker: Sent stop signal to all 3 receivers
> 16/02/02 21:22:18 INFO ReceiverTracker: Waiting for receiver job to terminate 
> gracefully
> 16/02/02 21:22:52 INFO ContextCleaner: Cleaned shuffle 141
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
> 172.31.3.140:50152 in memory (size: 23.9 KB, free: 2.1 GB)
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
> ip-172-31-3-141.ec2.internal:41776 in memory (size: 23.9 KB, free: 1224.9 MB)
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
> ip-172-31-3-140.ec2.internal:36295 in memory (size: 23.9 KB, free: 1224.0 MB)
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
> ip-172-31-3-141.ec2.internal:56428 in memory (size: 23.9 KB, free: 1224.9 MB)
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
> ip-172-31-3-140.ec2.internal:50542 in memory (size: 23.9 KB, free: 1224.7 MB)
> 16/02/02 21:22:52 INFO ContextCleaner: Cleaned accumulator 184
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
> 172.31.3.140:50152 in memory (size: 3.0 KB, free: 2.1 GB)
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
> ip-172-31-3-141.ec2.internal:41776 in memory (size: 3.0 KB, free: 1224.9 MB)
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
> ip-172-31-3-141.ec2.internal:56428 in memory (size: 3.0 KB, free: 1224.9 MB)
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
> ip-172-31-3-140.ec2.internal:36295 in memory (size: 3.0 KB, free: 1224.0 MB)
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
> ip-172-31-3-140.ec2.internal:50542 in memory (size: 3.0 KB, free: 1224.7 MB)
> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 680 for time 145444830 
> ms for checkpointing
> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 708 for time 145444830 
> ms for checkpointing
> 16/02/02 21:25:00 INFO TransformedDStream: Slicing from 145444800 ms to 
> 145444830 ms (aligned to 145444800 ms and 145444830 ms)
> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 777 for time 145444830 
> ms for checkpointing
> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 801 for time 145444830 
> ms for checkpointing
> 16/02/02 21:25:00 INFO JobScheduler: Added jobs for time 145444830 ms
> 16/02/02 21:25:00 INFO JobGenerator: Checkpointing graph for time 
> 145444830 ms
> 16/02/02 21:25:00 INFO DStreamGraph: Updating checkpoint data for time 
> 145444830 ms
> 16/02/02 21:25:00 INFO JobScheduler: Starting job streaming job 145444830 
> ms.0 from job set of time 145444830 ms
>
> ...
>
>
> Please, this is really blocking the upgrade process to the latest Spark
> versions and I really don't know how to work around it.
>
> Any help would be very much appreciated.
>
> Thank you,
>
> Roberto
>
>
>


[Spark 1.5+] ReceiverTracker seems not to stop Kinesis receivers

2016-02-02 Thread Roberto Coluccio
Hi,

I've been struggling with an issue ever since I tried to upgrade my Spark
Streaming solution from 1.4.1 to 1.5+.

I have a Spark Streaming app which creates 3 ReceiverInputDStreams
leveraging the KinesisUtils.createStream API.

I used to leverage a timeout to terminate my app gracefully
(StreamingContext.awaitTerminationOrTimeout(timeout), with SparkConf
spark.streaming.stopGracefullyOnShutdown=true).

I used to submit my Spark app on EMR in yarn-cluster mode.
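
In case it helps, here is a minimal sketch of the setup just described, built
against the Spark 1.5/1.6-era Kinesis API. It is not my actual code: the app
and stream names, endpoint, region, batch interval and timeout below are
placeholders.

import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

object KinesisGracefulShutdownSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("my-streaming-app")
      // ask Spark to stop the StreamingContext gracefully from the shutdown hook
      .set("spark.streaming.stopGracefullyOnShutdown", "true")

    val batchInterval = Seconds(300)
    val ssc = new StreamingContext(conf, batchInterval)

    // 3 Kinesis receivers, one ReceiverInputDStream each (placeholder names)
    val streams = (1 to 3).map { _ =>
      KinesisUtils.createStream(
        ssc,
        "my-streaming-app",                        // Kinesis app / DynamoDB checkpoint table name
        "my-kinesis-stream",                       // Kinesis stream name
        "https://kinesis.us-east-1.amazonaws.com", // endpoint
        "us-east-1",                               // region
        InitialPositionInStream.LATEST,
        batchInterval,                             // Kinesis checkpoint interval
        StorageLevel.MEMORY_AND_DISK_2)
    }

    // placeholder output operation; the real processing graph goes here
    ssc.union(streams).count().print()

    ssc.start()
    // run for a fixed amount of time, then return from main and let the
    // shutdown hook invoke stop(stopGracefully=true), which is the step
    // where the receivers never confirm they stopped
    val timeoutMs = 6L * 60 * 60 * 1000 // illustrative value
    ssc.awaitTerminationOrTimeout(timeoutMs)
  }
}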

Everything worked fine up to Spark 1.4.1 (on EMR AMI 3.9).

Since I upgraded (tried with Spark 1.5.2 on emr-4.2.0 and Spark 1.6.0 on
emr-4.3.0) I can't get the app to actually terminate. The logs tell me it
tries to, but no confirmation that the receivers stopped is ever received.
Instead, when the timer gets to the next period, the StreamingContext
continues its processing for a while (then it gets killed with SIGTERM 15;
YARN's vmem and pmem kills are disabled).

...

16/02/02 21:22:08 INFO ApplicationMaster: Final app status: SUCCEEDED,
exitCode: 0
16/02/02 21:22:08 INFO StreamingContext: Invoking
stop(stopGracefully=true) from shutdown hook
16/02/02 21:22:08 INFO ReceiverTracker: Sent stop signal to all 3 receivers
16/02/02 21:22:18 INFO ReceiverTracker: Waiting for receiver job to
terminate gracefully
16/02/02 21:22:52 INFO ContextCleaner: Cleaned shuffle 141
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0
on 172.31.3.140:50152 in memory (size: 23.9 KB, free: 2.1 GB)
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0
on ip-172-31-3-141.ec2.internal:41776 in memory (size: 23.9 KB, free:
1224.9 MB)
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0
on ip-172-31-3-140.ec2.internal:36295 in memory (size: 23.9 KB, free:
1224.0 MB)
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0
on ip-172-31-3-141.ec2.internal:56428 in memory (size: 23.9 KB, free:
1224.9 MB)
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0
on ip-172-31-3-140.ec2.internal:50542 in memory (size: 23.9 KB, free:
1224.7 MB)
16/02/02 21:22:52 INFO ContextCleaner: Cleaned accumulator 184
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0
on 172.31.3.140:50152 in memory (size: 3.0 KB, free: 2.1 GB)
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0
on ip-172-31-3-141.ec2.internal:41776 in memory (size: 3.0 KB, free:
1224.9 MB)
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0
on ip-172-31-3-141.ec2.internal:56428 in memory (size: 3.0 KB, free:
1224.9 MB)
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0
on ip-172-31-3-140.ec2.internal:36295 in memory (size: 3.0 KB, free:
1224.0 MB)
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0
on ip-172-31-3-140.ec2.internal:50542 in memory (size: 3.0 KB, free:
1224.7 MB)
16/02/02 21:25:00 INFO StateDStream: Marking RDD 680 for time
145444830 ms for checkpointing
16/02/02 21:25:00 INFO StateDStream: Marking RDD 708 for time
145444830 ms for checkpointing
16/02/02 21:25:00 INFO TransformedDStream: Slicing from 145444800
ms to 145444830 ms (aligned to 145444800 ms and 145444830
ms)
16/02/02 21:25:00 INFO StateDStream: Marking RDD 777 for time
145444830 ms for checkpointing
16/02/02 21:25:00 INFO StateDStream: Marking RDD 801 for time
145444830 ms for checkpointing
16/02/02 21:25:00 INFO JobScheduler: Added jobs for time 145444830 ms
16/02/02 21:25:00 INFO JobGenerator: Checkpointing graph for time
145444830 ms
16/02/02 21:25:00 INFO DStreamGraph: Updating checkpoint data for time
145444830 ms
16/02/02 21:25:00 INFO JobScheduler: Starting job streaming job
145444830 ms.0 from job set of time 145444830 ms

...


Please, this is really blocking the upgrade process to the latest Spark
versions and I really don't know how to work around it.

Any help would be very much appreciated.

Thank you,

Roberto