How many executors and total cores does your application acquire?
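
Each receiver pins one executor core for its entire lifetime, so with 100
receivers you need well over 100 cores in total before any batch processing
can run at all. A quick sanity check from the driver (a sketch: the conf keys
and defaults assume a YARN deployment, adjust to your submit command):

    // Compare receiver slots against available cores; the numbers and
    // defaults here are assumptions about this particular deployment.
    val numReceivers = 100  // one per Kinesis shard in your setup
    val numExecutors = sc.getConf.getInt("spark.executor.instances", 2)
    val coresPerExecutor = sc.getConf.getInt("spark.executor.cores", 1)
    val totalCores = numExecutors * coresPerExecutor
    require(totalCores > numReceivers,
      s"$totalCores cores cannot host $numReceivers receivers plus batch processing")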

td

On Thu, Oct 8, 2015 at 6:11 PM, Bharath Mukkati <sparknewbie1...@gmail.com>
wrote:

> Hi Spark Users,
>
> I am testing my application on Spark 1.5 and kinesis-asl-1.5. The
> streaming application starts but I see a ton of stages scheduled for
> ReceiverTracker (submitJob at ReceiverTracker.scala:557).
>
> In the driver logs I see this sequence repeat:
> 15/10/09 00:10:54 INFO ReceiverTracker: Starting 100 receivers
> 15/10/09 00:10:54 INFO ReceiverTracker: ReceiverTracker started
>
> 15/10/09 00:10:54 INFO ReceiverTracker: Receiver 0 started
> 15/10/09 00:10:54 DEBUG ClosureCleaner: +++ Cleaning closure <function1>
> (org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9)
> +++
> 15/10/09 00:10:54 DEBUG ClosureCleaner:  + declared fields: 3
> 15/10/09 00:10:54 DEBUG ClosureCleaner:      public static final long
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.serialVersionUID
> 15/10/09 00:10:54 DEBUG ClosureCleaner:      private final scala.Option
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.checkpointDirOption$1
> 15/10/09 00:10:54 DEBUG ClosureCleaner:      private final
> org.apache.spark.util.SerializableConfiguration
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.serializableHadoopConf$1
> 15/10/09 00:10:54 DEBUG ClosureCleaner:  + declared methods: 2
> 15/10/09 00:10:54 DEBUG ClosureCleaner:      public final java.lang.Object
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(java.lang.Object)
> 15/10/09 00:10:54 DEBUG ClosureCleaner:      public final void
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(scala.collection.Iterator)
> 15/10/09 00:10:54 DEBUG ClosureCleaner:  + inner classes: 0
> 15/10/09 00:10:54 DEBUG ClosureCleaner:  + outer classes: 0
> 15/10/09 00:10:54 DEBUG ClosureCleaner:  + outer objects: 0
> 15/10/09 00:10:54 DEBUG ClosureCleaner:  + populating accessed fields
> because this is the starting closure
> 15/10/09 00:10:54 DEBUG ClosureCleaner:  + fields accessed by starting
> closure: 0
> 15/10/09 00:10:54 DEBUG ClosureCleaner:  + there are no enclosing objects!
> 15/10/09 00:10:54 DEBUG ClosureCleaner:  +++ closure <function1>
> (org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9)
> is now cleaned +++
>
> ...
> (and so on for 100 receivers)
>
> And then I start seeing ...
> 15/10/09 00:11:02 INFO ReceiverTracker: Restarting Receiver 36
> .. and so on for the other receivers
>
> After which I see the Receiver started logs:
> 15/10/09 00:11:02 INFO ReceiverTracker: Receiver 20 started
> ..
> Then the Restarting Receiver logs appear again.
>
> After a while the driver hangs: no new logs appear, although the app seems
> to be running. The streaming console shows scheduled stages and jobs.
>
> There are no ERROR logs in the driver. However, I do see the following
> exceptions in the DEBUG logs:
>
> akka.remote.ShutDownAssociation: Shut down address:
> akka.tcp://driverPropsFetcher@ip-<xxx>:57886
> Caused by: akka.remote.transport.Transport$InvalidAssociationException:
> The remote system terminated the association because it is shutting down.
> ] from Actor[akka://sparkDriver/deadLetters]
> 15/10/09 00:10:37 DEBUG
> AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
> message AssociationError [akka.tcp://sparkDriver@<xxx>:39053] <-
> [akka.tcp://driverPropsFetcher@<xxx>:57886]: Error [Shut down address:
> akka.tcp://driverPropsFetcher@<xxx>:57886] [
> akka.remote.ShutDownAssociation: Shut down address:
> akka.tcp://driverPropsFetcher@<xxx>:57886
> Caused by: akka.remote.transport.Transport$InvalidAssociationException:
> The remote system terminated the association because it is shutting down.
> ] from Actor[akka://sparkDriver/deadLetters]
>
> In one of the executor logs I see the following sequence:
>
> application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
> 00:45:37 WARN receiver.ReceiverSupervisorImpl: Skip stopping receiver
> because it has not yet stared
> application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
> 00:45:37 INFO receiver.BlockGenerator: Stopping BlockGenerator
> application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
> 00:45:37 INFO receiver.BlockGenerator: Waiting for block pushing thread to
> terminate
> application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
> 00:45:37 INFO receiver.BlockGenerator: Pushing out the last 0 blocks
> application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
> 00:45:37 INFO receiver.BlockGenerator: Stopped block pushing thread
> application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
> 00:45:37 INFO receiver.BlockGenerator: Stopped BlockGenerator
> application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
> 00:45:37 INFO receiver.ReceiverSupervisorImpl: Waiting for receiver to be
> stopped
> application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
> 00:45:37 INFO receiver.ReceiverSupervisorImpl: Stopped receiver without
> error
> application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
> 00:45:38 INFO receiver.BlockGenerator: Started BlockGenerator
> application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
> 00:45:38 INFO receiver.BlockGenerator: Started block pushing thread
> application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
> 00:45:38 INFO receiver.ReceiverSupervisorImpl: Stopping receiver with
> message: Registered unsuccessfully because Driver refused to start receiver
> 46:
>
>
> There is no data in the Kinesis stream the app reads from. The stream has
> 100 shards, and the app starts 100 receivers, one per shard.
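>
> For reference, the receivers are created with the usual
> one-receiver-per-shard union pattern, roughly like this (a sketch: ssc,
> batchInterval, and the app/stream/endpoint names are placeholders for my
> actual values):
>
>     import org.apache.spark.storage.StorageLevel
>     import org.apache.spark.streaming.kinesis.KinesisUtils
>     import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
>
>     // One receiver per shard, then union into a single DStream.
>     // batchInterval doubles as the Kinesis checkpoint interval here.
>     val numShards = 100
>     val streams = (0 until numShards).map { _ =>
>       KinesisUtils.createStream(ssc, "my-app-name", "my-stream-name",
>         "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
>         InitialPositionInStream.LATEST, batchInterval,
>         StorageLevel.MEMORY_AND_DISK_2)
>     }
>     val unionedStream = ssc.union(streams)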
>
> Has anyone else seen this behavior? Any ideas on how to debug this and
> track down the root cause would be very helpful.
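>
> In case it helps, to cut the noise I can narrow the DEBUG output to just
> the receiver scheduler from the driver (a sketch, using the log4j 1.x API
> that Spark 1.5 bundles):
>
>     import org.apache.log4j.{Level, Logger}
>
>     // Keep receiver scheduling decisions visible, quiet everything else.
>     Logger.getRootLogger.setLevel(Level.WARN)
>     Logger.getLogger("org.apache.spark.streaming.scheduler").setLevel(Level.DEBUG)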
>
> Thanks,
>
> Bharath
>