Logger overridden when using JavaSparkContext

2016-01-11 Thread Max Schmidt
Hi there,

we're having a strange problem here using Spark in a Java application
via the JavaSparkContext:

We are using java.util.logging.* for logging in our application with two
handlers (ConsoleHandler + FileHandler):

{{{
.handlers=java.util.logging.ConsoleHandler, java.util.logging.FileHandler

.level = FINE

java.util.logging.ConsoleHandler.level=INFO
java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter

java.util.logging.FileHandler.level= FINE
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
java.util.logging.FileHandler.limit=1024
java.util.logging.FileHandler.count=5
java.util.logging.FileHandler.append= true
java.util.logging.FileHandler.pattern=%t/delivery-model.%u.%g.txt

java.util.logging.SimpleFormatter.format=%1$tY-%1$tm-%1$td
%1$tH:%1$tM:%1$tS %5$s%6$s%n
}}}

The thing is that when the JavaSparkContext is started, the logging stops.

The log4j.properties for spark looks like this:

{{{
log4j.rootLogger=WARN, theConsoleAppender
log4j.additivity.io.datapath=false
log4j.appender.theConsoleAppender=org.apache.log4j.ConsoleAppender
log4j.appender.theConsoleAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.theConsoleAppender.layout.ConversionPattern=%d{yyyy-MM-dd
HH:mm:ss} %m%n
}}}

Obviously I am not an expert in the logging architecture yet, but I
really need to understand how the handlers of our JUL logging get changed
by the Spark library.

Thanks in advance!






Re: Logger overridden when using JavaSparkContext

2016-01-11 Thread Max Schmidt
I checked the handlers of my root logger
(java.util.logging.Logger.getLogger("")), which were
a ConsoleHandler and a FileHandler.

After the JavaSparkContext was created, the rootLogger only contained a
'org.slf4j.bridge.SLF4JBridgeHandler'.
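
For reference, a minimal sketch that makes the observed handler swap visible (class name and local master URL are only for illustration; it relies on nothing beyond the public JUL and JavaSparkContext APIs):

{{{
import java.util.logging.Handler;
import java.util.logging.Logger;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class HandlerCheck {
    // The root logger is addressed by the empty string.
    private static void dumpRootHandlers(String when) {
        for (Handler h : Logger.getLogger("").getHandlers()) {
            System.out.println(when + ": " + h.getClass().getName());
        }
    }

    public static void main(String[] args) {
        dumpRootHandlers("before");   // ConsoleHandler + FileHandler from logging.properties
        JavaSparkContext jsc = new JavaSparkContext(
                new SparkConf().setAppName("handler-check").setMaster("local[*]"));
        dumpRootHandlers("after");    // only org.slf4j.bridge.SLF4JBridgeHandler was observed here
        jsc.stop();
    }
}
}}}

This is consistent with the jul-to-slf4j bridge being installed somewhere on the classpath; its usual installation first removes the existing root handlers and then adds the SLF4JBridgeHandler.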






Re: Logger overridden when using JavaSparkContext

2016-01-11 Thread Max Schmidt

Okay, I solved this problem...
It was my own fault: I had configured the handlers/level on the root
logger of java.util.logging.

Using an explicit logger name for the handlers/level solved it.
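
For anyone else running into this, a minimal sketch of what that fix could look like in the logging.properties, assuming the application loggers live under the io.datapath package (handlers attached to a named logger are not affected when something replaces the root logger's handlers):

{{{
# Attach the handlers to a named application logger instead of the root logger.
io.datapath.handlers=java.util.logging.ConsoleHandler, java.util.logging.FileHandler
io.datapath.level=FINE
io.datapath.useParentHandlers=false

java.util.logging.ConsoleHandler.level=INFO
java.util.logging.FileHandler.level=FINE
}}}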







Re: No active SparkContext

2016-03-31 Thread Max Schmidt
Just to mark this question as closed: we experienced an OOM exception on
the master, which we didn't see on the driver, but which made it crash.

-- 
*Max Schmidt, Senior Java Developer* | m...@datapath.io
<mailto:m...@datapath.io> | LinkedIn
<https://www.linkedin.com/in/maximilian-schmidt-9893b7bb/>
Datapath.io
 
Decreasing AWS latency.
Your traffic optimized.

Datapath.io GmbH
Mainz | HRB Nr. 46222
Sebastian Spies, CEO



Re: No active SparkContext

2016-03-24 Thread Max Schmidt
On 24.03.2016 at 10:34, Simon Hafner wrote:
> 2016-03-24 9:54 GMT+01:00 Max Schmidt <m...@datapath.io>:
> > we're using with the java-api (1.6.0) a ScheduledExecutor that
> > continuously executes a SparkJob to a standalone cluster.
> I'd recommend Scala.
Why should I use Scala?
>
> > After each job we close the JavaSparkContext and create a new one.
> Why do that? You can happily reuse it. Pretty sure that also causes
> the other problems, because you have a race condition on waiting for
> the job to finish and stopping the Context.
I do that because it is a very common pattern to create an object for a
specific "job" and release its resources when it's done.

The first problem that came to my mind was that the appName is immutable
once the JavaSparkContext has been created, so to me it is not possible to
reuse the JavaSparkContext for jobs with different IDs (that we want to
see in the webUI).

And of course it is possible to wait and close the JavaSparkContext
gracefully, except when there is some asynchronous action still running in
the background?
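
For what it's worth, a minimal sketch (only the public JavaSparkContext API, everything else placeholder) of distinguishing periodic runs inside a single long-lived context via job groups; the group id and description show up per job in the webUI instead of requiring a new appName:

{{{
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class JobGroupSketch {
    public static void main(String[] args) {
        // One context for the lifetime of the scheduler.
        JavaSparkContext jsc = new JavaSparkContext(
                new SparkConf().setAppName("measurement-scheduler")   // placeholder name
                               .setMaster("spark://master:7077"));    // placeholder master URL

        // Each scheduled run gets its own job group instead of its own context.
        String runId = "measurement-" + System.currentTimeMillis();   // hypothetical run id
        jsc.setJobGroup(runId, "periodic measurement run " + runId);
        try {
            // ... build RDDs and trigger the actions for this run ...
        } finally {
            jsc.clearJobGroup();
        }
    }
}
}}}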




Re: apache spark errors

2016-03-24 Thread Max Schmidt
es, TID = 47709
>
> 644989 [Executor task launch worker-13] ERROR
> org.apache.spark.executor.Executor  - Managed memory leak
> detected; size = 5326260 bytes, TID = 47863
>
> 720701 [Executor task launch worker-12] ERROR
> org.apache.spark.executor.Executor  - Managed memory leak
> detected; size = 5399578 bytes, TID = 48959
>
> 1147961 [Executor task launch worker-16] ERROR
> org.apache.spark.executor.Executor  - Managed memory leak
> detected; size = 5251872 bytes, TID = 54922
>
>
> How can I fix this?
>
> With kind regards,
>
> Michel




Re: No active SparkContext

2016-03-24 Thread Max Schmidt

On 2016-03-24 18:00, Mark Hamstra wrote:

You seem to be confusing the concepts of Job and Application.  A
Spark Application has a SparkContext.  A Spark Application is capable
of running multiple Jobs, each with its own ID, visible in the webUI.


Obviously I mixed it up, but then I would like to know how my Java
application should be constructed if I wanted to submit periodic
'Applications' to my cluster.

Did anyone use the

http://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/launcher/package-summary.html

for this scenario?
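
In case it helps, a minimal sketch of that launcher API (Spark 1.6; jar path, main class and master URL are placeholders) for submitting one separate application per scheduled run:

{{{
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LauncherSketch {
    public static void main(String[] args) throws Exception {
        // Submits a self-contained Spark application; meant to be called from the scheduler.
        SparkAppHandle handle = new SparkLauncher()
                .setAppResource("/path/to/measurement-job.jar")    // placeholder jar
                .setMainClass("io.datapath.measurement.MainJob")   // hypothetical main class
                .setMaster("spark://master:7077")                  // placeholder master URL
                .setAppName("measurement-" + System.currentTimeMillis())
                .startApplication();

        // Poll the handle (or register a listener) until the application reaches a final state.
        while (!handle.getState().isFinal()) {
            Thread.sleep(1000);
        }
    }
}
}}}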








No active SparkContext

2016-03-24 Thread Max Schmidt
Hi there,

we're using, with the Java API (1.6.0), a ScheduledExecutor that
continuously executes a SparkJob against a standalone cluster.

After each job we close the JavaSparkContext and create a new one.

But sometimes the Scheduling JVM crashes with:

24.03.2016-08:30:27:375# error - Application has been killed. Reason:
All masters are unresponsive! Giving up.
24.03.2016-08:30:27:398# error - Error initializing SparkContext.
java.lang.IllegalStateException: Cannot call methods on a stopped
SparkContext.
This stopped SparkContext was created at:

org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
io.datapath.spark.AbstractSparkJob.createJavaSparkContext(AbstractSparkJob.java:53)
io.datapath.measurement.SparkJobMeasurements.work(SparkJobMeasurements.java:130)
io.datapath.measurement.SparkMeasurementScheduler.lambda$submitSparkJobMeasurement$30(SparkMeasurementScheduler.java:117)
io.datapath.measurement.SparkMeasurementScheduler$$Lambda$17/1568787282.run(Unknown
Source)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)

The currently active SparkContext was created at:

(No active SparkContext.)

at
org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:106)
at
org.apache.spark.SparkContext.getSchedulingMode(SparkContext.scala:1578)
at
org.apache.spark.SparkContext.postEnvironmentUpdate(SparkContext.scala:2179)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:579)
at
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at
io.datapath.spark.AbstractSparkJob.createJavaSparkContext(AbstractSparkJob.java:53)
at
io.datapath.measurement.SparkJobMeasurements.work(SparkJobMeasurements.java:130)
at
io.datapath.measurement.SparkMeasurementScheduler.lambda$submitSparkJobMeasurement$30(SparkMeasurementScheduler.java:117)
at
io.datapath.measurement.SparkMeasurementScheduler$$Lambda$17/1568787282.run(Unknown
Source)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
24.03.2016-08:30:27:402# info - SparkMeasurement - finished.

Any guess?



Where to set properties for the retainedJobs/Stages?

2016-04-01 Thread Max Schmidt
Can somebody tell me the interaction between the properties:

spark.ui.retainedJobs
spark.ui.retainedStages
spark.history.retainedApplications

I know from the bug tracker that the last one describes the number of
applications the history server holds in memory.

Can I set the properties in the spark-env.sh? And where?




Re: Where to set properties for the retainedJobs/Stages?

2016-04-01 Thread Max Schmidt
Yes, but the doc doesn't say a word about which process the configs apply
to, so do I have to set them for the history server? The daemon? The
workers?

And what if I use the Java API instead of spark-submit for the jobs?

I guess that spark-defaults.conf is ignored when using the Java API?
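
If I read the docs correctly, spark-defaults.conf is only picked up by spark-submit and the Spark daemons, so with a programmatically created context the application-side limits would have to go onto the SparkConf; a minimal sketch (property names from the configuration page, everything else placeholder):

{{{
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class RetentionConfSketch {
    public static void main(String[] args) {
        // spark.ui.retained* are application (driver) settings, set where the context is built.
        SparkConf conf = new SparkConf()
                .setAppName("measurement-scheduler")      // placeholder
                .setMaster("spark://master:7077")         // placeholder
                .set("spark.ui.retainedJobs", "200")
                .set("spark.ui.retainedStages", "500");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        // spark.history.retainedApplications is read by the history server process
        // (e.g. via its spark-defaults.conf or SPARK_HISTORY_OPTS), not by the application.
        jsc.stop();
    }
}
}}}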


On 2016-04-01 18:58, Ted Yu wrote:

You can set them in spark-defaults.conf

See also https://spark.apache.org/docs/latest/configuration.html#spark-ui [1]






Links:
--
[1] https://spark.apache.org/docs/latest/configuration.html#spark-ui








Re: Where to set properties for the retainedJobs/Stages?

2016-04-04 Thread Max Schmidt
Okay, I put the props into spark-defaults.conf, but they are not recognized,
as they don't appear in the 'Environment' tab during an application
execution.

spark.eventLog.enabled for example.
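
For comparison, a sketch of a spark-defaults.conf carrying these settings (the event log directory is a placeholder); as far as I can tell it only takes effect for processes that actually load it, i.e. spark-submit-launched drivers and the daemons, which would explain why nothing shows up in the Environment tab of an application whose SparkConf is built in code:

{{{
# Read by spark-submit and the Spark daemons (e.g. the history server).
spark.eventLog.enabled              true
spark.eventLog.dir                  hdfs://namenode:8021/spark-events
spark.ui.retainedJobs               200
spark.ui.retainedStages             500
spark.history.retainedApplications  50
}}}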

On 01.04.2016 at 21:22, Ted Yu wrote:
> Please read
> https://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties
> w.r.t. spark-defaults.conf
