Configuration:

Driver memory we tried: 2GB / 4GB / 5GB
Executor memory we tried: 4GB / 5GB
We even reduced spark.memory.fraction to 0.2 (we are not using the cache)
VM memory: 32 GB, 8 cores
SPARK_WORKER_MEMORY we tried: 30GB / 24GB
SPARK_WORKER_CORES: 32 (because the jobs are not CPU bound)
SPARK_WORKER_INSTANCES: 1
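
For reference, this is roughly how the settings above map onto a
spark-submit invocation and spark-env.sh; a sketch only, with placeholder
host and jar values, not our exact command:

spark-submit \
  --driver-memory 4g \
  --executor-memory 4g \
  --conf spark.memory.fraction=0.2 \
  --master spark://<master-host>:6066 \
  --deploy-mode cluster \
  <application-jar>

# conf/spark-env.sh on each worker node
export SPARK_WORKER_MEMORY=30g
export SPARK_WORKER_CORES=32
export SPARK_WORKER_INSTANCES=1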


What we suspect is that there is not enough space for user classes /
objects, or that cleanup for these is not happening frequently enough.
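
One way to narrow down which JVM is actually exhausting its heap (driver,
executor, or the worker daemon itself) is to enable heap dumps and GC
logging on each process; a sketch only, assuming Java 8 GC flags and
writable dump paths:

# conf/spark-defaults.conf (dump paths are examples)
spark.driver.extraJavaOptions    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/driver.hprof -verbose:gc -XX:+PrintGCDetails
spark.executor.extraJavaOptions  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor.hprof -verbose:gc -XX:+PrintGCDetails

# conf/spark-env.sh -- JVM options for the standalone daemons (master and worker)
export SPARK_DAEMON_JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/worker.hprof"

The resulting .hprof file shows which process died and what was retaining
the heap (user classes/objects vs. Spark internals).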





On Sat, May 9, 2020 at 12:30 AM Amit Sharma <resolve...@gmail.com> wrote:

> What memory you are assigning per executor. What is the driver memory
> configuration?
>
>
> Thanks
> Amit
>
> On Fri, May 8, 2020 at 12:59 PM Hrishikesh Mishra <sd.hri...@gmail.com>
> wrote:
>
>> We submit the Spark job through the spark-submit command, like the one below.
>>
>>
>> sudo /var/lib/pf-spark/bin/spark-submit \
>> --total-executor-cores 30 \
>> --driver-cores 2 \
>> --class com.hrishikesh.mishra.Main\
>> --master spark://XX.XX.XXX.19:6066  \
>> --deploy-mode cluster  \
>> --supervise
>> http://XX.XX.XXX.19:90/jar/fk-runner-framework-1.0-SNAPSHOT.jar
>>
>>
>>
>>
>> We have a Python HTTP server where we host all the jars.
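>>
>> A one-liner like the following is enough for that (the directory is a
>> placeholder; port 90 matches the URL above):
>>
>> cd /path/to/jars && python3 -m http.server 90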
>>
>> The user killed the driver driver-20200508153502-1291, and it's visible
>> in the log as well, but this is not the problem. The OOM is separate from
>> this.
>>
>> 20/05/08 15:36:55 INFO Worker: Asked to kill driver
>> driver-20200508153502-1291
>>
>> 20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
>>
>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
>> /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed
>>
>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
>> /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed
>>
>> 20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application
>> app-20200508153654-11776 removed, cleanupLocalDirs = true
>>
>> 20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was
>> killed by user
>>
>> *20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception
>> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
>> stacktrace] was thrown by a user handler's exceptionCaught() method while
>> handling the following exception:*
>>
>> *java.lang.OutOfMemoryError: Java heap space*
>>
>> *20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught
>> exception in thread Thread[dispatcher-event-loop-6,5,main]*
>>
>> *java.lang.OutOfMemoryError: Java heap space*
>>
>> *20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception
>> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
>> stacktrace] was thrown by a user handler's exceptionCaught() method while
>> handling the following exception:*
>>
>> *java.lang.OutOfMemoryError: Java heap space*
>>
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>
>> 20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called
>>
>> 20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory
>> /grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922
>>
>>
>> On Fri, May 8, 2020 at 9:27 PM Jacek Laskowski <ja...@japila.pl> wrote:
>>
>>> Hi,
>>>
>>> It's been a while since I worked with Spark Standalone, but I'd check
>>> the logs of the workers. How do you spark-submit the app?
>>>
>>> Did you check the /grid/1/spark/work/driver-20200508153502-1291 directory?
>>>
>>> Regards,
>>> Jacek Laskowski
>>> ----
>>> https://about.me/JacekLaskowski
>>> "The Internals Of" Online Books <https://books.japila.pl/>
>>> Follow me on https://twitter.com/jaceklaskowski
>>>
>>>
>>>
>>> On Fri, May 8, 2020 at 2:32 PM Hrishikesh Mishra <sd.hri...@gmail.com>
>>> wrote:
>>>
>>>> Thanks, Jacek, for the quick response.
>>>> Due to our system constraints, we can't move to Structured Streaming
>>>> now, but YARN can definitely be tried out.
>>>>
>>>> But my problem is that I'm not able to figure out where the issue is:
>>>> the driver, the executor, or the worker. Even the exceptions give no
>>>> clue. Please see the exception below; I'm unable to spot the cause of
>>>> the OOM.
>>>>
>>>> 20/05/08 15:36:55 INFO Worker: Asked to kill driver
>>>> driver-20200508153502-1291
>>>>
>>>> 20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
>>>>
>>>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
>>>> /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed
>>>>
>>>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
>>>> /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed
>>>>
>>>> 20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application
>>>> app-20200508153654-11776 removed, cleanupLocalDirs = true
>>>>
>>>> 20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was
>>>> killed by user
>>>>
>>>> *20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception
>>>> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
>>>> stacktrace] was thrown by a user handler's exceptionCaught() method while
>>>> handling the following exception:*
>>>>
>>>> *java.lang.OutOfMemoryError: Java heap space*
>>>>
>>>> *20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught
>>>> exception in thread Thread[dispatcher-event-loop-6,5,main]*
>>>>
>>>> *java.lang.OutOfMemoryError: Java heap space*
>>>>
>>>> *20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception
>>>> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
>>>> stacktrace] was thrown by a user handler's exceptionCaught() method while
>>>> handling the following exception:*
>>>>
>>>> *java.lang.OutOfMemoryError: Java heap space*
>>>>
>>>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>>>
>>>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>>>
>>>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>>>
>>>> 20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called
>>>>
>>>> 20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory
>>>> /grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, May 8, 2020 at 5:14 PM Jacek Laskowski <ja...@japila.pl> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Sorry for being perhaps too harsh, but when you asked "Am I missing
>>>>> something?" and I noticed "Kafka Direct Stream" and "Spark Standalone
>>>>> Cluster", I immediately thought "Yeah... please upgrade your Spark env
>>>>> to use Spark Structured Streaming at the very least and/or use YARN as
>>>>> the cluster manager".
>>>>>
>>>>> Another thought was that the user code (your code) could be leaking
>>>>> resources, so Spark eventually reports heap-related errors that may not
>>>>> necessarily be Spark's.
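>>>>>
>>>>> A classic example of such a leak (hypothetical code, not necessarily
>>>>> yours) is opening a client per partition per batch on the executors and
>>>>> never closing it:
>>>>>
>>>>> stream.foreachRDD { rdd =>
>>>>>   rdd.foreachPartition { records =>
>>>>>     val client = new SomeHttpClient() // hypothetical client class
>>>>>     records.foreach(r => client.send(r))
>>>>>     // missing client.close() -- instances pile up until the heap fills
>>>>>   }
>>>>> }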
>>>>>
>>>>> Regards,
>>>>> Jacek Laskowski
>>>>> ----
>>>>> https://about.me/JacekLaskowski
>>>>> "The Internals Of" Online Books <https://books.japila.pl/>
>>>>> Follow me on https://twitter.com/jaceklaskowski
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 7, 2020 at 1:12 PM Hrishikesh Mishra <sd.hri...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I am getting an out-of-memory error in the worker log for streaming
>>>>>> jobs every couple of hours, after which the worker dies. There is no
>>>>>> shuffle, no aggregation, no caching in the job; it's just a
>>>>>> transformation.
>>>>>> I'm not able to identify where the problem is, the driver or the
>>>>>> executor. And why does the worker die after the OOM? The streaming
>>>>>> job should die, not the worker. Am I missing something?
>>>>>>
>>>>>> Driver Memory:  2g
>>>>>> Executor memory: 4g
>>>>>>
>>>>>> Spark Version:  2.4
>>>>>> Kafka Direct Stream
>>>>>> Spark Standalone Cluster.
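>>>>>>
>>>>>> The job shape is roughly the following (a simplified sketch, not the
>>>>>> actual code; the broker, topic, and the transform/sink functions are
>>>>>> made up, and ssc is an existing StreamingContext):
>>>>>>
>>>>>> import org.apache.kafka.common.serialization.StringDeserializer
>>>>>> import org.apache.spark.streaming.kafka010.KafkaUtils
>>>>>> import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
>>>>>> import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
>>>>>>
>>>>>> val kafkaParams = Map[String, Object](
>>>>>>   "bootstrap.servers" -> "broker:9092",
>>>>>>   "key.deserializer" -> classOf[StringDeserializer],
>>>>>>   "value.deserializer" -> classOf[StringDeserializer],
>>>>>>   "group.id" -> "example-group")
>>>>>>
>>>>>> val stream = KafkaUtils.createDirectStream[String, String](
>>>>>>   ssc, PreferConsistent,
>>>>>>   Subscribe[String, String](Seq("example-topic"), kafkaParams))
>>>>>>
>>>>>> // transformation only: no shuffle, no aggregation, no caching
>>>>>> stream.map(record => transform(record.value)).foreachRDD(_.foreach(sink))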
>>>>>>
>>>>>>
>>>>>> 20/05/06 12:52:20 INFO SecurityManager: SecurityManager:
>>>>>> authentication disabled; ui acls disabled; users  with view permissions:
>>>>>> Set(root); groups with view permissions: Set(); users  with modify
>>>>>> permissions: Set(root); groups with modify permissions: Set()
>>>>>>
>>>>>> 20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught
>>>>>> exception in thread Thread[ExecutorRunner for
>>>>>> app-20200506124717-10226/0,5,main]
>>>>>>
>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>
>>>>>> at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)
>>>>>>
>>>>>> at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
>>>>>>
>>>>>> at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)
>>>>>>
>>>>>> at
>>>>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown
>>>>>> Source)
>>>>>>
>>>>>> at
>>>>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
>>>>>> Source)
>>>>>>
>>>>>> at
>>>>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
>>>>>> Source)
>>>>>>
>>>>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>>>>
>>>>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>>>>
>>>>>> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>>>>>
>>>>>> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>>>>>
>>>>>> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>>>>>
>>>>>> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>>>>>>
>>>>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>>>>>>
>>>>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>>>>>>
>>>>>> at
>>>>>> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>>>>>>
>>>>>> at
>>>>>> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>>>>>>
>>>>>> at
>>>>>> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>>>>>>
>>>>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>>>>>>
>>>>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>>>>>>
>>>>>> at
>>>>>> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>>>>>>
>>>>>> at
>>>>>> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>>>>>>
>>>>>> at
>>>>>> org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>>>>>>
>>>>>> at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>>>>>>
>>>>>> at
>>>>>> org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>>>>>>
>>>>>> at
>>>>>> org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>>>>>>
>>>>>> 20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing
>>>>>> driver driver-20200505181719-1187
>>>>>>
>>>>>> 20/05/06 12:53:38 INFO DriverRunner: Killing driver process!
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>> Hrishi
>>>>>>
>>>>>
