Configuration:
- Driver memory we tried: 2GB / 4GB / 5GB
- Executor memory we tried: 4GB / 5GB
- Even reduced spark.memory.fraction to 0.2 (we are not using cache)
- VM: 32 GB memory, 8 cores
- SPARK_WORKER_MEMORY we tried: 30GB / 24GB
- SPARK_WORKER_CORES: 32 (because jobs are not CPU bound)
- SPARK_WORKER_INSTANCES: 1
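Consolidated, the combinations above look roughly like the following sketch (this shows one combination we tried; the other tried values are noted in comments):

```shell
# spark-env.sh on the worker nodes (one combination we tried)
export SPARK_WORKER_MEMORY=24g     # also tried 30g
export SPARK_WORKER_CORES=32       # jobs are not CPU bound
export SPARK_WORKER_INSTANCES=1

# and on spark-submit (other tried values: driver 2g/5g, executor 5g):
# --conf spark.driver.memory=4g \
# --conf spark.executor.memory=4g \
# --conf spark.memory.fraction=0.2
```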
What we suspect is that there is not enough space for user classes/objects, or that cleanup for these is not happening frequently enough.

On Sat, May 9, 2020 at 12:30 AM Amit Sharma <resolve...@gmail.com> wrote:

> What memory are you assigning per executor? What is the driver memory
> configuration?
>
> Thanks
> Amit
>
> On Fri, May 8, 2020 at 12:59 PM Hrishikesh Mishra <sd.hri...@gmail.com> wrote:
>
>> We submit the Spark job through the spark-submit command, like the one below.
>>
>> sudo /var/lib/pf-spark/bin/spark-submit \
>>   --total-executor-cores 30 \
>>   --driver-cores 2 \
>>   --class com.hrishikesh.mishra.Main \
>>   --master spark://XX.XX.XXX.19:6066 \
>>   --deploy-mode cluster \
>>   --supervise \
>>   http://XX.XX.XXX.19:90/jar/fk-runner-framework-1.0-SNAPSHOT.jar
>>
>> We have a Python HTTP server where we host all the jars.
>>
>> The user killed the driver driver-20200508153502-1291, and that is visible
>> in the log too, but this is not the problem. The OOM is separate from this.
>>
>> 20/05/08 15:36:55 INFO Worker: Asked to kill driver driver-20200508153502-1291
>> 20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed
>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed
>> 20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application app-20200508153654-11776 removed, cleanupLocalDirs = true
>> 20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was killed by user
>> 20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
>> java.lang.OutOfMemoryError: Java heap space
>> 20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[dispatcher-event-loop-6,5,main]
>> java.lang.OutOfMemoryError: Java heap space
>> 20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
>> java.lang.OutOfMemoryError: Java heap space
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>> 20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called
>> 20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory /grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922
>>
>> On Fri, May 8, 2020 at 9:27 PM Jacek Laskowski <ja...@japila.pl> wrote:
>>
>>> Hi,
>>>
>>> It's been a while since I worked with Spark Standalone, but I'd check
>>> the logs of the workers. How do you spark-submit the app?
>>> Did you check the /grid/1/spark/work/driver-20200508153502-1291 directory?
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> ----
>>> https://about.me/JacekLaskowski
>>> "The Internals Of" Online Books <https://books.japila.pl/>
>>> Follow me on https://twitter.com/jaceklaskowski
>>>
>>> On Fri, May 8, 2020 at 2:32 PM Hrishikesh Mishra <sd.hri...@gmail.com> wrote:
>>>
>>>> Thanks Jacek for the quick response.
>>>> Due to our system constraints, we can't move to Structured Streaming
>>>> now. But YARN can definitely be tried out.
>>>>
>>>> But my problem is that I'm not able to figure out where the issue is:
>>>> driver, executor, or worker. Even the exceptions are clueless. Please see
>>>> the exception below; I'm unable to spot the cause of the OOM.
>>>>
>>>> 20/05/08 15:36:55 INFO Worker: Asked to kill driver driver-20200508153502-1291
>>>> 20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
>>>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed
>>>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed
>>>> 20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application app-20200508153654-11776 removed, cleanupLocalDirs = true
>>>> 20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was killed by user
>>>> 20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
>>>> java.lang.OutOfMemoryError: Java heap space
>>>> 20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[dispatcher-event-loop-6,5,main]
>>>> java.lang.OutOfMemoryError: Java heap space
>>>> 20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
>>>> java.lang.OutOfMemoryError: Java heap space
>>>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>>> 20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called
>>>> 20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory /grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922
>>>>
>>>> On Fri, May 8, 2020 at 5:14 PM Jacek Laskowski <ja...@japila.pl> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Sorry for being perhaps too harsh, but when you asked "Am I missing
>>>>> something?" and I noticed "Kafka Direct Stream" and "Spark Standalone
>>>>> Cluster", I immediately thought "Yeah... please upgrade your Spark env to
>>>>> use Spark Structured Streaming at the very least and/or use YARN as the
>>>>> cluster manager".
>>>>>
>>>>> Another thought was that the user code (your code) could be leaking
>>>>> resources, so Spark eventually reports heap-related errors that may not
>>>>> necessarily be Spark's.
>>>>>
>>>>> Pozdrawiam,
>>>>> Jacek Laskowski
>>>>> ----
>>>>> https://about.me/JacekLaskowski
>>>>> "The Internals Of" Online Books <https://books.japila.pl/>
>>>>> Follow me on https://twitter.com/jaceklaskowski
>>>>>
>>>>> On Thu, May 7, 2020 at 1:12 PM Hrishikesh Mishra <sd.hri...@gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I am getting an out-of-memory error in the worker log for streaming jobs
>>>>>> every couple of hours, after which the worker dies. There is no shuffle,
>>>>>> no aggregation, no caching in the job; it's just a transformation.
>>>>>> I'm not able to identify where the problem is, driver or executor.
>>>>>> And why does the worker die after the OOM? The streaming job should die
>>>>>> instead. Am I missing something?
>>>>>>
>>>>>> Driver Memory: 2g
>>>>>> Executor memory: 4g
>>>>>>
>>>>>> Spark Version: 2.4
>>>>>> Kafka Direct Stream
>>>>>> Spark Standalone Cluster
>>>>>>
>>>>>> 20/05/06 12:52:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
>>>>>> 20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[ExecutorRunner for app-20200506124717-10226/0,5,main]
>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>> at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)
>>>>>> at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
>>>>>> at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)
>>>>>> at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown Source)
>>>>>> at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
>>>>>> at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>>>>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>>>> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>>>>> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>>>>> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>>>>> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>>>>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>>>>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>>>>>> at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>>>>>> at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>>>>>> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>>>>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>>>>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>>>>>> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>>>>>> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>>>>>> at org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>>>>>> at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>>>>>> at org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>>>>>> at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>>>>>>
>>>>>> 20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing driver driver-20200505181719-1187
>>>>>> 20/05/06 12:53:38 INFO DriverRunner: Killing driver process!
>>>>>>
>>>>>> Regards
>>>>>> Hrishi
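One detail worth noting about the stack trace above: the OOM is thrown inside an ExecutorRunner thread in the Worker daemon's own JVM (while building a SecurityManager), whose heap is sized by SPARK_DAEMON_MEMORY (default 1g), not by spark.driver.memory or spark.executor.memory. A minimal spark-env.sh sketch to enlarge that heap and capture a dump on OOM, assuming standalone mode; the dump path is a placeholder:

```shell
# spark-env.sh on each worker node -- a diagnostic sketch, not a verified fix.
# SPARK_DAEMON_MEMORY sizes the Master/Worker daemons' own heap (default 1g);
# SPARK_DAEMON_JAVA_OPTS passes extra JVM flags to those daemons.
export SPARK_DAEMON_MEMORY=4g
export SPARK_DAEMON_JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/grid/1/spark/dumps"
```

The resulting .hprof file can then be opened in a heap analyzer such as Eclipse MAT or jvisualvm to see which objects dominate the Worker's heap when it dies.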