Re: Spark 2.4.0 Master going down

2019-02-28 Thread Lokesh Kumar Padhnavis
Hi Akshay

Thanks for the response. Please find the answers to your questions below.

1. We are running Spark in cluster mode, with Spark's standalone cluster manager.
2. All the ports are open. We preconfigure the ports on which communication
should happen and modify the firewall rules to allow traffic on them; a sketch
of how such ports can be pinned is shown after this list. (The functionality is
fine until the Spark master goes down after 60 minutes.)
3. Memory/GC statistics of all the components (the columns match jstat -gcutil output):

Spark Master:
   S0     S1      E      O      M    CCS   YGC    YGCT  FGC    FGCT      GCT
 0.00   0.00  12.91  35.11  97.08  95.80     5   0.239    2   0.197    0.436

Spark Worker:
   S0     S1      E      O      M    CCS   YGC    YGCT  FGC    FGCT      GCT
51.64   0.00  46.66  27.44  97.57  95.85    10   0.381    2   0.233    0.613

Spark Submit Process (Driver):
   S0     S1      E      O      M    CCS   YGC    YGCT  FGC    FGCT      GCT
 0.00  63.57  93.82  26.29  98.24  97.53  4663 124.648  109  20.910  145.558

Spark executor (Coarse grained):
   S0     S1      E      O      M    CCS   YGC    YGCT  FGC    FGCT      GCT
 0.00  69.77  17.74  31.13  95.67  90.44  7353 556.888    5   1.572  558.460
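As a point of reference for answer 2, here is a minimal sketch (spark-shell
style Scala) of how the communication ports can be pinned so that firewall
rules stay valid. The configuration keys are standard Spark properties; the
master URL and port numbers are hypothetical placeholders, not our actual
values.

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical host and port values; spark.driver.port, spark.blockManager.port
// and spark.port.maxRetries are standard Spark configuration keys.
val conf = new SparkConf()
  .setAppName("streaming-job")
  .setMaster("spark://master-host:7077")     // standalone cluster manager
  .set("spark.driver.port", "40000")         // fixed driver RPC port
  .set("spark.blockManager.port", "40010")   // fixed block manager port
  .set("spark.port.maxRetries", "4")         // limit the fallback port range

val spark = SparkSession.builder().config(conf).getOrCreate()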



On Thu, Feb 28, 2019 at 3:13 PM Akshay Bhardwaj <
akshay.bhardwaj1...@gmail.com> wrote:

> Hi Lokesh,
>
> Please provide further information to help identify the issue.
>
> 1) Are you running in standalone mode or cluster mode? If cluster, then
> is it a Spark master/slave or YARN/Mesos?
> 2) Have you tried checking if all ports between your master and the
> machine with IP 192.168.43.167 are accessible?
> 3) Have you checked the memory consumption of the executors/driver running
> in the cluster?
>
>
> Akshay Bhardwaj
> +91-97111-33849
>
>
> On Wed, Feb 27, 2019 at 8:27 PM lokeshkumar  wrote:
>
>> Hi All
>>
>> We are running Spark version 2.4.0 and we run a few Spark streaming jobs
>> listening on Kafka topics. We receive an average of 10-20 messages per
>> second. The Spark master has been going down after 1-2 hours of running,
>> and along with it the Spark executors also get killed. The exception is
>> given below.
>>
>> This was not happening with Spark 2.1.1; it started happening with Spark
>> 2.4.0. Any help/suggestion is appreciated.
>>
>> The exception that we see is
>>
>> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
>> at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
>> at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
>> at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:281)
>> at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
>> Caused by: org.apache.spark.rpc.RpcTimeoutException: Cannot receive any
>> reply from 192.168.43.167:40007 in 120 seconds. This timeout is controlled
>> by spark.rpc.askTimeout
>> at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
>> at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
>> at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
>> at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>> at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
>> at scala.util.Try$.apply(Try.scala:192)
>> at scala.util.Failure.recover(Try.scala:216)
>> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
>> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>> at org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
>> at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
>> at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>> at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>> at scala.concurrent.Promise$class.complete(Promise.scala:55)
>> at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:157)
>> at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
>> at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>> at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
>> at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(Bat
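
For context on the setup described in the quoted message above (a few Spark
streaming jobs listening on Kafka topics at 10-20 messages per second), here is
a minimal sketch of such a job in Scala. The thread does not say whether the
jobs use the DStream or Structured Streaming API; this sketch assumes
Structured Streaming, and the broker address, topic name, and app name are
hypothetical placeholders.

import org.apache.spark.sql.SparkSession

// Hypothetical app name; requires the spark-sql-kafka-0-10 package on the classpath.
val spark = SparkSession.builder()
  .appName("kafka-streaming-example")
  .getOrCreate()

// Read a Kafka topic as a streaming DataFrame (broker and topic are placeholders).
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker-1:9092")
  .option("subscribe", "events")
  .load()

// Kafka values arrive as binary; cast to string and echo to the console
// purely for illustration.
val query = events.selectExpr("CAST(value AS STRING) AS value")
  .writeStream
  .format("console")
  .start()

query.awaitTermination()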

Re: Spark 2.4.0 Master going down

2019-02-28 Thread Akshay Bhardwaj
Hi Lokesh,

Please provide further information to help identify the issue.

1) Are you running in standalone mode or cluster mode? If cluster, then
is it a Spark master/slave or YARN/Mesos?
2) Have you tried checking if all ports between your master and the machine
with IP 192.168.43.167 are accessible?
3) Have you checked the memory consumption of the executors/driver running
in the cluster?


Akshay Bhardwaj
+91-97111-33849


On Wed, Feb 27, 2019 at 8:27 PM lokeshkumar  wrote:

> Hi All
>
> We are running Spark version 2.4.0 and we run a few Spark streaming jobs
> listening on Kafka topics. We receive an average of 10-20 messages per second.
> The Spark master has been going down after 1-2 hours of running, and along
> with it the Spark executors also get killed. The exception is given below.
>
> This was not happening with Spark 2.1.1; it started happening with Spark
> 2.4.0. Any help/suggestion is appreciated.
>
> The exception that we see is
>
> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
> at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
> at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
> at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:281)
> at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> Caused by: org.apache.spark.rpc.RpcTimeoutException: Cannot receive any
> reply from 192.168.43.167:40007 in 120 seconds. This timeout is controlled
> by spark.rpc.askTimeout
> at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
> at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
> at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
> at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
> at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
> at scala.util.Try$.apply(Try.scala:192)
> at scala.util.Failure.recover(Try.scala:216)
> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> at org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
> at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> at scala.concurrent.Promise$class.complete(Promise.scala:55)
> at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:157)
> at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
> at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
> at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
> at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
> at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
> at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:157)
> at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:206)
> at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:243)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent
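
The RpcTimeoutException quoted above is governed by spark.rpc.askTimeout, which
falls back to spark.network.timeout (120 seconds by default). Below is a
minimal sketch (spark-shell style Scala) of raising these timeouts on the
driver; the 300s value is purely an illustrative assumption, and widening the
timeout only raises the threshold rather than explaining why the executor at
192.168.43.167:40007 stopped replying.

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// spark.rpc.askTimeout and spark.network.timeout are standard Spark
// configuration keys; the 300s value is an illustrative assumption only.
val conf = new SparkConf()
  .setAppName("streaming-job")
  .set("spark.network.timeout", "300s")   // base network timeout
  .set("spark.rpc.askTimeout", "300s")    // RPC ask timeout named in the stack trace

val spark = SparkSession.builder().config(conf).getOrCreate()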