RE: Logging in executors

2016-04-18 Thread Ashic Mahtab
I spent ages on this recently, and here's what I found:
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///local/file/on.executor.properties"

works. Alternatively, you can do:

--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=filename.properties" --files="path/to/filename.properties"
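
For reference, a complete submit command for the second approach might look
like this (the master URL, class and jar names are placeholders):

bin/spark-submit --master spark://host:7077 \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=filename.properties" \
  --files path/to/filename.properties \
  --class com.example.Main app.jar

Spark ships anything passed via --files to each executor's working
directory, which is presumably why the bare filename given to
-Dlog4j.configuration resolves there.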
log4j.properties files packaged with the application don't seem to have any
effect. This is likely because log4j gets initialised before your application
classes are loaded. You can, however, reinitialise log4j as part of your
application code. That also worked for us, but we went the extraJavaOptions
route as it was less invasive on the application side.
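
For completeness, a minimal sketch of that programmatic approach (assuming
log4j 1.2 and a properties file bundled in the application jar; the object
and file names are illustrative):

import org.apache.log4j.PropertyConfigurator

// Call ExecutorLogging.init() once per executor JVM (e.g. at the top of a
// mapPartitions block) before the first log statement.
object ExecutorLogging {
  @volatile private var configured = false

  def init(): Unit = synchronized {
    if (!configured) {
      // Loads the bundled file from the application jar's classpath;
      // getResource returns null if the file isn't actually in the jar.
      val url = getClass.getClassLoader.getResource("log4j-executor.properties")
      if (url != null) PropertyConfigurator.configure(url)
      configured = true
    }
  }
}
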
-Ashic.

Re: Logging in executors

2016-04-18 Thread Ted Yu
Looking through this thread, I don't see the Spark version you use.

Can you tell us the Spark release?

Thanks

Re: Logging in executors

2016-04-18 Thread Carlos Rojas Matas
Thanks Ted, I already checked it, but it's not the same. I'm working with
standalone Spark; the examples refer to HDFS paths, so I assume the Hadoop 2
Resource Manager is used. I've tried all possible flavours. The only one that
worked was changing spark-defaults.conf on every machine. I'll go with this
for now, but the extra Java opts for the executor are definitely not working,
at least for logging configuration.
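
For the record, what ended up working is roughly the following in
spark-defaults.conf on each machine (the paths here are illustrative, not
the exact ones):

spark.driver.extraJavaOptions    -Dlog4j.configuration=file:///opt/spark/conf/log4j-driver.properties
spark.executor.extraJavaOptions  -Dlog4j.configuration=file:///opt/spark/conf/log4j-executor.properties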

Thanks,
-carlos.

Re: Logging in executors

2016-04-15 Thread Ted Yu
See this thread: http://search-hadoop.com/m/q3RTtsFrd61q291j1

Re: Logging in executors

2016-04-15 Thread Carlos Rojas Matas
Hi guys,

any clue on this? Clearly
spark.executor.extraJavaOptions=-Dlog4j.configuration is not working on the
executors.

Thanks,
-carlos.

Re: Logging in executors

2016-04-13 Thread Carlos Rojas Matas
Hi Yong,

thanks for your response. As I said in my first email, I've tried both the
reference to a classpath resource (env/dev/log4j-executor.properties) and
the file:// protocol. Also, the driver logging is working fine, and I'm
using the same kind of reference.

Below is the content of my classpath:

[image: Inline image 1]

Plus this is the content of the exploded fat jar assembled with sbt
assembly plugin:

[image: Inline image 2]


This folder is at the root level of the classpath.
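
For what it's worth, the presence of the file at the root of the assembly
can be checked directly (the jar name here is illustrative):

jar tf target/scala-2.11/app-assembly.jar | grep log4j
# should list env/dev/log4j-executor.properties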

Thanks,
-carlos.

RE: Logging in executors

2016-04-13 Thread Yong Zhang
Is the env/dev/log4j-executor.properties file within your jar file? Does the
path match what you specified as env/dev/log4j-executor.properties?

If you read the log4j document here:
https://logging.apache.org/log4j/1.2/manual.html

When you specify log4j.configuration=my_custom.properties, you have 2
options:

1) my_custom.properties has to be in the jar (or on the classpath). In your
case, since you specify the package path, you need to make sure they match
in your jar file.
2) Use log4j.configuration=file:///tmp/my_custom.properties. In this way,
you need to make sure the file my_custom.properties exists in the /tmp
folder on ALL of your worker nodes.

Yong

Re: Logging in executors

2016-04-13 Thread Carlos Rojas Matas
Thanks for your response, Ted. You're right, there was a typo. I changed it;
now I'm executing:

bin/spark-submit --master spark://localhost:7077 \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=env/dev/log4j-driver.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=env/dev/log4j-executor.properties" \
  --class

The content of this file is:

# Set everything to be logged to the console
log4j.rootCategory=INFO, FILE
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p
%c{1}: %m%n

log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File=/tmp/executor.log
log4j.appender.FILE.ImmediateFlush=true
log4j.appender.FILE.Threshold=debug
log4j.appender.FILE.Append=true
log4j.appender.FILE.MaxFileSize=100MB
log4j.appender.FILE.MaxBackupIndex=5
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p
%c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
log4j.logger.com.despegar.p13n=DEBUG

# SPARK-9183: Settings to avoid annoying messages when looking up
nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR


Finally, the code on which I'm using logging in the executor is:

// Requires: org.apache.spark.streaming.dstream.DStream, org.apache.spark.rdd.RDD,
// org.apache.spark.streaming.Time; Metric and ResultHandler are our own types.
def groupAndCount(keys: DStream[(String, List[String])])(handler: ResultHandler) = {

  val result = keys.reduceByKey((prior, current) => {
    prior ::: current
  }).flatMap {
    case (date, keys) =>
      val rs = keys.groupBy(x => x).map(
        obs => {
          val (d, t) = date.split("@") match {
            case Array(d, t) => (d, t)
          }
          // Import and logger deliberately inside the closure, so they are
          // created on the executor rather than serialised from the driver.
          import org.apache.log4j.Logger
          val logger: Logger = Logger.getRootLogger
          logger.info(s"Metric retrieved $d")
          Metric("PV", d, obs._1, t, obs._2.size)
        }
      )
      rs
  }

  result.foreachRDD((rdd: RDD[Metric], time: Time) => {
    handler(rdd, time)
  })

}


Originally, the import and the logger object were outside the map function.
I'm also using the root logger just to see if it's working, but nothing gets
logged. I've checked that the property is set correctly on the executor side
through println(System.getProperty("log4j.configuration")) and it's OK, but
logging is still not working.
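
That check was a throwaway snippet along these lines (a sketch, assuming a
SparkContext sc is in scope; note the output goes to each executor's stdout,
not the driver's):

// Print the log4j system property from inside the executors.
sc.parallelize(1 to 100).foreachPartition { _ =>
  println(s"log4j.configuration = ${System.getProperty("log4j.configuration")}")
}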

Thanks again,
-carlos.


Re: Logging in executors

2016-04-13 Thread Ted Yu
bq. --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=env/dev/log4j-driver.properties"

I think the above may have a typo: you refer to log4j-driver.properties in
both arguments.

FYI

On Wed, Apr 13, 2016 at 8:09 AM, Carlos Rojas Matas 
wrote:

> Hi guys,
>
> I'm trying to enable logging in the executors but with no luck.
>
> According to the official documentation and several blogs, this should be
> done by passing
> "spark.executor.extraJavaOptions=-Dlog4j.configuration=[my-file]" to the
> spark-submit tool. I've tried both sending a reference to a classpath
> resource and using the "file:" protocol, but nothing happens. Of course in
> the latter case I've used the --files option on the command line, although
> it's not clear where this file is uploaded on the worker machine.
>
> However, I was able to make it work by setting the properties in the
> spark-defaults.conf file, pointing to each one of the configurations on the
> machine. This approach has a big drawback though: if I change something in
> the log4j configuration I need to change it on every machine (and I'm not
> sure whether restarting is required), which is not what I'm looking for.
>
> The complete command I'm using is as follows:
>
> bin/spark-submit --master spark://localhost:7077 --conf
> "spark.driver.extraJavaOptions=-Dlog4j.configuration=env/dev/log4j-driver.properties"
> --conf
> "spark.executor.extraJavaOptions=-Dlog4j.configuration=env/dev/log4j-driver.properties"
> --class [my-main-class] [my-jar].jar
>
>
> Both files are in the classpath and are reachable -- already tested with
> the driver.
>
> Any comments will be welcomed.
>
> Thanks in advance.
> -carlos.