Logger overridden when using JavaSparkContext
Hi there,

we're having a strange problem here using Spark in a Java application via the JavaSparkContext:

We are using java.util.logging.* for logging in our application with two handlers (ConsoleHandler + FileHandler):

{{{
.handlers=java.util.logging.ConsoleHandler, java.util.logging.FileHandler

.level=FINE

java.util.logging.ConsoleHandler.level=INFO
java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter

java.util.logging.FileHandler.level=FINE
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
java.util.logging.FileHandler.limit=1024
java.util.logging.FileHandler.count=5
java.util.logging.FileHandler.append=true
java.util.logging.FileHandler.pattern=%t/delivery-model.%u.%g.txt

java.util.logging.SimpleFormatter.format=%1$tY-%1$tm-%1$td %1$tH:%1$tM:%1$tS %5$s%6$s%n
}}}

The thing is that when the JavaSparkContext is started, the logging stops.

The log4j.properties for Spark looks like this:

{{{
log4j.rootLogger=WARN, theConsoleAppender
log4j.additivity.io.datapath=false
log4j.appender.theConsoleAppender=org.apache.log4j.ConsoleAppender
log4j.appender.theConsoleAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.theConsoleAppender.layout.ConversionPattern=%d{-MM-dd HH:mm:ss} %m%n
}}}

Obviously I am not an expert in the logging architecture yet, but I really need to understand how the handlers of our JUL logging are changed by the Spark library.

Thanks in advance!
Re: Logger overridden when using JavaSparkContext
I checked the handlers of my root logger (java.util.logging.Logger.getLogger("")), which were a ConsoleHandler and a FileHandler. After the JavaSparkContext was created, the root logger only contained an 'org.slf4j.bridge.SLF4JBridgeHandler'.
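As a way to reproduce this observation, the root logger's handlers can be dumped before and after the context is created. A minimal sketch, assuming only standard java.util.logging and the Spark Java API; the app name and local master URL are placeholders, not values from this thread:

{{{
import java.util.logging.Handler;
import java.util.logging.Logger;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class HandlerCheck {

    // The root logger is addressed by the empty string.
    static void dumpRootHandlers(String when) {
        for (Handler h : Logger.getLogger("").getHandlers()) {
            System.out.println(when + ": " + h.getClass().getName());
        }
    }

    public static void main(String[] args) {
        dumpRootHandlers("before");  // expected: ConsoleHandler, FileHandler

        JavaSparkContext jsc = new JavaSparkContext(
                new SparkConf().setAppName("handler-check").setMaster("local[*]"));

        dumpRootHandlers("after");   // observed in this thread: only SLF4JBridgeHandler
        jsc.close();
    }
}
}}}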
Re: Logger overridden when using JavaSparkContext
Okay, I solved this problem... It was my own fault: I had configured the handlers on the root logger of java.util.logging. Using an explicit logger name for the handlers/level solved it.
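As an illustration of that fix, the idea is to attach the handlers to an explicitly named logger instead of the root logger. This is only a sketch of the approach, not the exact configuration used here; the io.datapath logger name is borrowed from the package names that appear elsewhere in this thread:

{{{
# Attach the handlers to a named logger instead of the root logger, so a
# framework that replaces the root logger's handlers no longer silences the app.
io.datapath.handlers=java.util.logging.ConsoleHandler, java.util.logging.FileHandler
io.datapath.level=FINE
io.datapath.useParentHandlers=false

java.util.logging.ConsoleHandler.level=INFO
java.util.logging.FileHandler.level=FINE
}}}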
Re: No active SparkContext
Just to mark this question as closed: we experienced an OOM exception on the master, which we never saw from the driver, but which made the driver crash.
Re: No active SparkContext
On 24.03.2016 at 10:34, Simon Hafner wrote:
> 2016-03-24 9:54 GMT+01:00 Max Schmidt <m...@datapath.io>:
> > we're using with the java-api (1.6.0) a ScheduledExecutor that continuously
> > executes a SparkJob to a standalone cluster.
> I'd recommend Scala.

Why should I use Scala?

> > After each job we close the JavaSparkContext and create a new one.
> Why do that? You can happily reuse it. Pretty sure that also causes
> the other problems, because you have a race condition on waiting for
> the job to finish and stopping the Context.

I do that because it is a very common pattern to create an object for a specific "job" and release its resources when it is done.

The first problem that came to my mind was that the appName is immutable once the JavaSparkContext has been created, so to me it is not possible to reuse the JavaSparkContext for jobs with different IDs (which we want to see in the web UI).

And of course it is possible to wait for closing the JavaSparkContext gracefully, except when there is some asynchronous action in the background?
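For reference, a single long-lived JavaSparkContext can be reused while still labelling individual runs in the web UI via job groups. A minimal sketch against the Spark 1.6 Java API; the master URL and the group/description strings are made-up placeholders:

{{{
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ReusedContextExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("scheduled-measurements")   // fixed application name
                .setMaster("spark://master:7077");      // placeholder master URL
        JavaSparkContext jsc = new JavaSparkContext(conf);

        // Label each scheduled run so it is distinguishable in the web UI
        // instead of creating a fresh context per run.
        jsc.setJobGroup("measurement-2016-03-24", "nightly measurement run");
        long count = jsc.parallelize(Arrays.asList(1, 2, 3)).count();
        System.out.println("count = " + count);
        jsc.clearJobGroup();

        // Close the context only once, when the whole application shuts down.
        jsc.close();
    }
}
}}}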
Re: apache spark errors
> es, TID = 47709
>
> 644989 [Executor task launch worker-13] ERROR
> org.apache.spark.executor.Executor - Managed memory leak detected;
> size = 5326260 bytes, TID = 47863
>
> 720701 [Executor task launch worker-12] ERROR
> org.apache.spark.executor.Executor - Managed memory leak detected;
> size = 5399578 bytes, TID = 48959
>
> 1147961 [Executor task launch worker-16] ERROR
> org.apache.spark.executor.Executor - Managed memory leak detected;
> size = 5251872 bytes, TID = 54922
>
> How can I fix this?
>
> With kind regards,
>
> Michel
Re: No active SparkContext
On 2016-03-24 18:00, Mark Hamstra wrote:
> You seem to be confusing the concepts of Job and Application. A Spark Application has a SparkContext. A Spark Application is capable of running multiple Jobs, each with its own ID, visible in the webUI.

Obviously I mixed them up, but then I would like to know how my Java application should be constructed if I wanted to submit periodic 'Applications' to my cluster.

Did anyone use http://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/launcher/package-summary.html for this scenario?
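For completeness, the org.apache.spark.launcher package linked above can start a separate Spark Application per scheduled run from plain Java. A minimal sketch; the master URL, jar path, and main class below are placeholders, not values from this thread:

{{{
import org.apache.spark.launcher.SparkLauncher;

public class PeriodicSubmit {
    public static void main(String[] args) throws Exception {
        // Each invocation becomes its own Spark Application with its own ID in the web UI.
        Process spark = new SparkLauncher()
                .setMaster("spark://master:7077")                 // placeholder master URL
                .setAppResource("/path/to/measurement-job.jar")   // placeholder application jar
                .setMainClass("io.datapath.measurement.Main")     // placeholder main class
                .setAppName("measurement-" + System.currentTimeMillis())
                .launch();

        int exitCode = spark.waitFor();   // block until the submitted application finishes
        System.out.println("spark-submit exited with " + exitCode);
    }
}
}}}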
No active SparkContext
Hi there,

we're using the Java API (1.6.0) with a ScheduledExecutor that continuously submits a Spark job to a standalone cluster.

After each job we close the JavaSparkContext and create a new one.

But sometimes the scheduling JVM crashes with:

{{{
24.03.2016-08:30:27:375# error - Application has been killed. Reason: All masters are unresponsive! Giving up.
24.03.2016-08:30:27:398# error - Error initializing SparkContext.
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
This stopped SparkContext was created at:

org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
io.datapath.spark.AbstractSparkJob.createJavaSparkContext(AbstractSparkJob.java:53)
io.datapath.measurement.SparkJobMeasurements.work(SparkJobMeasurements.java:130)
io.datapath.measurement.SparkMeasurementScheduler.lambda$submitSparkJobMeasurement$30(SparkMeasurementScheduler.java:117)
io.datapath.measurement.SparkMeasurementScheduler$$Lambda$17/1568787282.run(Unknown Source)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)

The currently active SparkContext was created at:

(No active SparkContext.)

    at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:106)
    at org.apache.spark.SparkContext.getSchedulingMode(SparkContext.scala:1578)
    at org.apache.spark.SparkContext.postEnvironmentUpdate(SparkContext.scala:2179)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:579)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
    at io.datapath.spark.AbstractSparkJob.createJavaSparkContext(AbstractSparkJob.java:53)
    at io.datapath.measurement.SparkJobMeasurements.work(SparkJobMeasurements.java:130)
    at io.datapath.measurement.SparkMeasurementScheduler.lambda$submitSparkJobMeasurement$30(SparkMeasurementScheduler.java:117)
    at io.datapath.measurement.SparkMeasurementScheduler$$Lambda$17/1568787282.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
24.03.2016-08:30:27:402# info - SparkMeasurement - finished.
}}}

Any guess?
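As an illustration of the pattern described above (whether to reuse one context instead is discussed elsewhere in this thread), here is a minimal sketch of a scheduled run that creates a fresh JavaSparkContext, does its work, and always closes the context before the run returns, so runs are serialized and never touch a stopped context. The class names, master URL, and schedule below are illustrative, not the code from this thread:

{{{
import java.util.Arrays;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ScheduledSparkRun {
    public static void main(String[] args) {
        // Single-threaded scheduler: runs are serialized, so the previous
        // context is fully closed before the next one is created.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(ScheduledSparkRun::runOnce, 0, 15, TimeUnit.MINUTES);
    }

    static void runOnce() {
        SparkConf conf = new SparkConf()
                .setAppName("measurement-" + System.currentTimeMillis())
                .setMaster("spark://master:7077");   // placeholder master URL
        JavaSparkContext jsc = new JavaSparkContext(conf);
        try {
            long count = jsc.parallelize(Arrays.asList(1, 2, 3)).count();
            System.out.println("count = " + count);
        } finally {
            jsc.close();   // always release the context, even if the job fails
        }
    }
}
}}}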
Where to set properties for the retainedJobs/Stages?
Can somebody tell me the interaction between the properties:

spark.ui.retainedJobs
spark.ui.retainedStages
spark.history.retainedApplications

I know from the bug tracker that the last one describes the number of applications the history server holds in memory.

Can I set the properties in spark-env.sh? And where?
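For reference, these keys are ordinary Spark configuration properties, so they would typically go into conf/spark-defaults.conf (or be passed via --conf) rather than spark-env.sh. A sketch with arbitrary example values, not recommendations:

{{{
# conf/spark-defaults.conf (example values)
# Per application: how many jobs/stages the live web UI keeps before discarding old ones.
spark.ui.retainedJobs               1000
spark.ui.retainedStages             1000
# History server: how many completed applications it keeps loaded in memory.
spark.history.retainedApplications  50
}}}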
Re: Where to set properties for the retainedJobs/Stages?
Yes, but the docs don't say for which process these configs are valid, so do I have to set them for the history server? The daemon? The workers?

And what if I use the Java API instead of spark-submit for the jobs? I guess that spark-defaults.conf is then ignored for the Java API?

On 2016-04-01 18:58, Ted Yu wrote:
> You can set them in spark-defaults.conf
>
> See also https://spark.apache.org/docs/latest/configuration.html#spark-ui
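For applications started directly from Java (not via spark-submit), one option is to set such properties programmatically on the SparkConf before creating the context. A minimal sketch; the master URL, event-log directory, and values are placeholders, not recommendations:

{{{
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ConfiguredContext {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("configured-app")
                .setMaster("spark://master:7077")            // placeholder master URL
                // UI retention is a per-application (driver) setting:
                .set("spark.ui.retainedJobs", "1000")
                .set("spark.ui.retainedStages", "1000")
                // Event logging so the history server can show finished applications:
                .set("spark.eventLog.enabled", "true")
                .set("spark.eventLog.dir", "hdfs:///spark-events");  // placeholder directory

        JavaSparkContext jsc = new JavaSparkContext(conf);
        // ... run jobs ...
        jsc.close();
    }
}
}}}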
Re: Where to set properties for the retainedJobs/Stages?
Okay, I put the properties into spark-defaults.conf, but they are not recognized: they don't appear in the 'Environment' tab during an application execution. spark.eventLog.enabled, for example.

On 01.04.2016 at 21:22, Ted Yu wrote:
> Please read https://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties w.r.t. spark-defaults.conf