Thanks for the help. I set --executor-cores and it works now. I had been using --total-executor-cores and didn't realize that had changed.
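For context on why the fix works: spark-submit's help text lists --total-executor-cores under "Spark standalone and Mesos only", so on YARN the per-executor --executor-cores (together with --num-executors) is what controls the core count. Following point 2 of the quoted reply, each receiver holds one executor core for the lifetime of the streaming job, and batch tasks can only use whatever is left over. The arithmetic below is a minimal sketch of that reasoning. The helper `cores_left_for_tasks` is hypothetical and calls no Spark API; it assumes exactly one core per receiver, as the reply describes.

```python
def cores_left_for_tasks(num_executors: int, executor_cores: int, num_receivers: int) -> int:
    """Return how many executor cores remain for batch tasks once receivers start.

    Assumption (from the thread): each receiver permanently occupies one core.
    """
    total_cores = num_executors * executor_cores
    # Never report a negative count, even if there are more receivers than cores.
    return max(0, total_cores - num_receivers)


# One YARN container with the default of 1 core and one Kafka receiver:
print(cores_left_for_tasks(num_executors=1, executor_cores=1, num_receivers=1))  # 0: map stage is blocked
# The same setup after passing --executor-cores 2 to spark-submit:
print(cores_left_for_tasks(num_executors=1, executor_cores=2, num_receivers=1))  # 1
```

In other words, with receiver-based input the total executor core count has to be strictly greater than the number of receivers, or no batch ever gets processed.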
Tathagata Das <t...@databricks.com> wrote on Friday, July 10, 2015 at 3:11 AM:
> 1. There will be a long-running job with the description "start()", as that is
> the job that runs the receivers. It will never end.
>
> 2. You need to set the number of cores given to the Spark executors by the
> YARN container. That is the SparkConf setting spark.executor.cores, or
> --executor-cores in spark-submit. Since it defaults to 1, your only container
> has one core, which is occupied by the receiver, leaving no cores to run the
> map tasks. So the map stage is blocked.
>
> 3. Note these log lines, especially "15/07/09 18:29:00 INFO
> receiver.ReceiverSupervisorImpl: Received stop signal". I think your
> streaming context is somehow being shut down too early, which is causing the
> KafkaReceiver to stop. That is something you should debug.
>
> 15/07/09 18:27:13 INFO consumer.ConsumerFetcherThread: [ConsumerFetcherThread-adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-0-42], Starting
> 15/07/09 18:27:13 INFO consumer.ConsumerFetcherManager: [ConsumerFetcherManager-1436437633199] Added fetcher for partitions ArrayBuffer([[adhoc_data,0], initOffset 53 to broker id:42,host:szq1.appadhoc.com,port:9092] )
> 15/07/09 18:27:13 INFO storage.MemoryStore: ensureFreeSpace(1680) called with curMem=96628, maxMem=16669841817
> 15/07/09 18:27:13 INFO storage.MemoryStore: Block input-0-1436437633600 stored as bytes in memory (estimated size 1680.0 B, free 15.5 GB)
> 15/07/09 18:27:13 WARN storage.BlockManager: Block input-0-1436437633600 replicated to only 0 peer(s) instead of 1 peers
> 15/07/09 18:27:14 INFO receiver.BlockGenerator: Pushed block input-0-1436437633600
> 15/07/09 18:29:00 INFO receiver.ReceiverSupervisorImpl: Received stop signal
> 15/07/09 18:29:00 INFO receiver.ReceiverSupervisorImpl: Stopping receiver with message: Stopped by driver:
> 15/07/09 18:29:00 INFO consumer.ZookeeperConsumerConnector: [adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201], ZKConsumerConnector shutting down
> 15/07/09 18:29:00 INFO consumer.ConsumerFetcherManager: [ConsumerFetcherManager-1436437633199] Stopping leader finder thread
> 15/07/09 18:29:00 INFO consumer.ConsumerFetcherManager$LeaderFinderThread: [adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-leader-finder-thread], Shutting down
> 15/07/09 18:29:00 INFO consumer.ConsumerFetcherManager$LeaderFinderThread: [adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-leader-finder-thread], Stopped
> 15/07/09 18:29:00 INFO consumer.ConsumerFetcherManager$LeaderFinderThread: [adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-leader-finder-thread], Shutdown completed
> 15/07/09 18:29:00 INFO consumer.ConsumerFetcherManager: [ConsumerFetcherManager-1436437633199] Stopping all fetchers
> 15/07/09 18:29:00 INFO consumer.ConsumerFetcherThread: [ConsumerFetcherThread-adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-0-42], Shutting down
> 15/07/09 18:29:01 INFO consumer.SimpleConsumer: Reconnect due to socket error: java.nio.channels.ClosedByInterruptException
> 15/07/09 18:29:01 INFO consumer.ConsumerFetcherThread: [ConsumerFetcherThread-adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-0-42], Stopped
> 15/07/09 18:29:01 INFO consumer.ConsumerFetcherThread: [ConsumerFetcherThread-adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-0-42], Shutdown completed
> 15/07/09 18:29:01 INFO consumer.ConsumerFetcherManager: [ConsumerFetcherManager-1436437633199] All connections stopped
> 15/07/09 18:29:01 INFO zkclient.ZkEventThread: Terminate ZkClient event thread.
> 15/07/09 18:29:01 INFO zookeeper.ZooKeeper: Session: 0x14e70eedca00315 closed
> 15/07/09 18:29:01 INFO zookeeper.ClientCnxn: EventThread shut down
> 15/07/09 18:29:01 INFO consumer.ZookeeperConsumerConnector: [adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201], ZKConsumerConnector shutdown completed in 74 ms
> 15/07/09 18:29:01 INFO receiver.ReceiverSupervisorImpl: Called receiver onStop
> 15/07/09 18:29:01 INFO receiver.ReceiverSupervisorImpl: Deregistering receiver 0
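To follow up on point 3, one way to start debugging is to pull the receiver lifecycle entries out of the executor log, so their timestamps can be lined up against what the driver was doing at that moment. The sketch below assumes the log4j layout shown in the pasted log ("yy/MM/dd HH:mm:ss LEVEL logger: message"); the function name `receiver_events` and the `LOG_LINE` pattern are hypothetical and not part of Spark.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed log4j layout from the thread: "yy/MM/dd HH:mm:ss LEVEL logger: message"
LOG_LINE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\w+) ([\w.$]+): (.*)$")


def receiver_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (timestamp, message) for every ReceiverSupervisorImpl log entry."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        # Keep only entries whose logger name ends in ReceiverSupervisorImpl.
        if match and match.group(3).endswith("ReceiverSupervisorImpl"):
            yield match.group(1), match.group(4)


# A few entries copied from the log in this thread.
sample = """\
15/07/09 18:27:14 INFO receiver.BlockGenerator: Pushed block input-0-1436437633600
15/07/09 18:29:00 INFO receiver.ReceiverSupervisorImpl: Received stop signal
15/07/09 18:29:00 INFO receiver.ReceiverSupervisorImpl: Stopping receiver with message: Stopped by driver:
15/07/09 18:29:01 INFO receiver.ReceiverSupervisorImpl: Called receiver onStop
"""

events = list(receiver_events(sample.splitlines()))
for timestamp, message in events:
    print(timestamp, message)
```

Run over the full executor log, the first "Received stop signal" timestamp (18:29:00 here) is the point to look for in the driver log; since the receiver reports "Stopped by driver", whatever the driver logged just before then is the likely trigger.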