Re: [Spark 1.5+] ReceiverTracker seems not to stop Kinesis receivers
Any chance anyone has given this a look? Thanks!

On Wed, Feb 10, 2016 at 10:46 AM, Roberto Coluccio <roberto.coluc...@gmail.com> wrote:

> Thanks Shixiong!
>
> I'm attaching the thread dumps (I printed the Spark UI after expanding all
> the elements, hope that's fine) and the related stderr (INFO level) executor
> logs. There are 3 of them. The thread dumps were collected at the time the
> StreamingContext was (trying to) shut down, i.e. when I saw the following
> logs in the driver's stderr:
>
> 16/02/10 15:46:25 INFO ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
> 16/02/10 15:46:25 INFO StreamingContext: Invoking stop(stopGracefully=true) from shutdown hook
> 16/02/10 15:46:25 INFO ReceiverTracker: Sent stop signal to all 3 receivers
> 16/02/10 15:46:35 INFO ReceiverTracker: Waiting for receiver job to terminate gracefully
>
> Then, from 15:50 onward, the driver started to report logs again as if it
> were continuing to process as usual. You may find some exceptions with
> exactly that 15:50 timestamp in the executor logs.
>
> Thank you very much in advance!
>
> Roberto
Re: [Spark 1.5+] ReceiverTracker seems not to stop Kinesis receivers
Could you do a thread dump in the executor that runs the Kinesis receiver and post it? It would be great if you could provide the executor log as well.

On Tue, Feb 9, 2016 at 3:14 PM, Roberto Coluccio <roberto.coluc...@gmail.com> wrote:
[quoted message below]
Re: [Spark 1.5+] ReceiverTracker seems not to stop Kinesis receivers
Hello,

can anybody kindly help me out a little bit here? I just verified the problem is still there on Spark 1.6.0 and emr-4.3.0 as well. It's definitely a Kinesis-related issue, since with Spark 1.6.0 I can successfully get Streaming drivers to terminate with no issue IF I don't use Kinesis and open any Receivers.

Thank you!

Roberto

On Tue, Feb 2, 2016 at 4:40 PM, Roberto Coluccio <roberto.coluc...@gmail.com> wrote:
[quoted message below]
[Spark 1.5+] ReceiverTracker seems not to stop Kinesis receivers
Hi,

I've been struggling with an issue ever since I tried to upgrade my Spark Streaming solution from 1.4.1 to 1.5+.

I have a Spark Streaming app which creates 3 ReceiverInputDStreams leveraging the KinesisUtils.createStream API.

I used to leverage a timeout to terminate my app gracefully (StreamingContext.awaitTerminationOrTimeout(timeout), with SparkConf spark.streaming.stopGracefullyOnShutdown=true).

I used to submit my Spark app on EMR in yarn-cluster mode.

Everything worked fine up to Spark 1.4.1 (on EMR AMI 3.9).

Since I upgraded (tried with Spark 1.5.2 on emr-4.2.0 and Spark 1.6.0 on emr-4.3.0) I can't get the app to actually terminate. The logs tell me it tries to, but no confirmation that the receivers stopped is ever logged. Instead, when the timer gets to the next period, the StreamingContext continues its processing for a while (it then gets killed with a SIGTERM 15; YARN's vmem and pmem kills are disabled).

...

16/02/02 21:22:08 INFO ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
16/02/02 21:22:08 INFO StreamingContext: Invoking stop(stopGracefully=true) from shutdown hook
16/02/02 21:22:08 INFO ReceiverTracker: Sent stop signal to all 3 receivers
16/02/02 21:22:18 INFO ReceiverTracker: Waiting for receiver job to terminate gracefully
16/02/02 21:22:52 INFO ContextCleaner: Cleaned shuffle 141
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 172.31.3.140:50152 in memory (size: 23.9 KB, free: 2.1 GB)
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on ip-172-31-3-141.ec2.internal:41776 in memory (size: 23.9 KB, free: 1224.9 MB)
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on ip-172-31-3-140.ec2.internal:36295 in memory (size: 23.9 KB, free: 1224.0 MB)
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on ip-172-31-3-141.ec2.internal:56428 in memory (size: 23.9 KB, free: 1224.9 MB)
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on ip-172-31-3-140.ec2.internal:50542 in memory (size: 23.9 KB, free: 1224.7 MB)
16/02/02 21:22:52 INFO ContextCleaner: Cleaned accumulator 184
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 172.31.3.140:50152 in memory (size: 3.0 KB, free: 2.1 GB)
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on ip-172-31-3-141.ec2.internal:41776 in memory (size: 3.0 KB, free: 1224.9 MB)
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on ip-172-31-3-141.ec2.internal:56428 in memory (size: 3.0 KB, free: 1224.9 MB)
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on ip-172-31-3-140.ec2.internal:36295 in memory (size: 3.0 KB, free: 1224.0 MB)
16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on ip-172-31-3-140.ec2.internal:50542 in memory (size: 3.0 KB, free: 1224.7 MB)
16/02/02 21:25:00 INFO StateDStream: Marking RDD 680 for time 145444830 ms for checkpointing
16/02/02 21:25:00 INFO StateDStream: Marking RDD 708 for time 145444830 ms for checkpointing
16/02/02 21:25:00 INFO TransformedDStream: Slicing from 145444800 ms to 145444830 ms (aligned to 145444800 ms and 145444830 ms)
16/02/02 21:25:00 INFO StateDStream: Marking RDD 777 for time 145444830 ms for checkpointing
16/02/02 21:25:00 INFO StateDStream: Marking RDD 801 for time 145444830 ms for checkpointing
16/02/02 21:25:00 INFO JobScheduler: Added jobs for time 145444830 ms
16/02/02 21:25:00 INFO JobGenerator: Checkpointing graph for time 145444830 ms
16/02/02 21:25:00 INFO DStreamGraph: Updating checkpoint data for time 145444830 ms
16/02/02 21:25:00 INFO JobScheduler: Starting job streaming job 145444830 ms.0 from job set of time 145444830 ms

...

Please, this is really blocking the upgrade to the latest Spark versions, and I don't know how to work around it.

Any help would be very much appreciated.

Thank you,

Roberto
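For reference, the setup described above can be sketched in Scala roughly as follows. This is a minimal illustration of the pattern (three Kinesis receivers plus a timeout-bounded graceful stop), not the actual app: the app name, stream name, region, batch interval, and 60-minute timeout are all placeholders.

```scala
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

object KinesisShutdownSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kinesis-shutdown-sketch")
      // Ask Spark to stop the StreamingContext gracefully from its shutdown hook.
      .set("spark.streaming.stopGracefullyOnShutdown", "true")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Three ReceiverInputDStreams, as in the app above (names are placeholders).
    val streams = (1 to 3).map { _ =>
      KinesisUtils.createStream(
        ssc, "my-kinesis-app", "my-kinesis-stream",
        "kinesis.us-east-1.amazonaws.com", "us-east-1",
        InitialPositionInStream.LATEST, Seconds(30),
        StorageLevel.MEMORY_AND_DISK_2)
    }
    // ... union the streams and define the processing here ...

    ssc.start()
    // Block for at most the timeout; returns false if the time elapsed
    // without the context having stopped.
    val stopped = ssc.awaitTerminationOrTimeout(60 * 60 * 1000L)
    if (!stopped) ssc.stop(stopSparkContext = true, stopGracefully = true)
  }
}
```

With `spark.streaming.stopGracefullyOnShutdown=true`, simply exiting `main` after the timeout should also trigger a graceful stop via the shutdown hook; the explicit `ssc.stop(...)` call above makes the intent visible either way.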