Re: (YARN CLUSTER MODE) Where to find logs within Spark RDD processing function ?

2016-04-29 Thread nguyen duc tuan
What does the web UI show? What do you see when you click on the "stderr"
and "stdout" links? These links should contain the stdout and stderr output
for each executor.
As for your custom logging in the executors, are you sure you checked
"${spark.yarn.app.container.log.dir}/spark-app.log"? The actual location of
this file on each executor is
${yarn.nodemanager.remote-app-log-dir}/{applicationId}/${spark.yarn.app.container.log.dir}/spark-app.log
(the yarn.nodemanager.remote-app-log-dir setting can be found in
yarn-site.xml in the Hadoop config folder).
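
If you are not sure what that directory resolves to on your cluster, you can
look the setting up directly; a quick sketch, assuming the common
/etc/hadoop/conf location for the Hadoop config (adjust to your install):

  # Prints the <name> line plus the following line, which normally holds the <value>.
  grep -A1 "yarn.nodemanager.remote-app-log-dir" /etc/hadoop/conf/yarn-site.xml
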
For example, when I click the "stdout" link for hslave-13 in the web UI, I
get the link
http://hslave-13:8042/node/containerlogs/container_1459219311185_2456_01_04/tuannd/stdout?start=-4096,
which means the file on hslave-13 is located at
${yarn.nodemanager.remote-app-log-dir}/appId/container_1459219311185_2456_01_04/spark-app.log

I also see that you forgot to ship the log4j.properties file to the
executors in your spark-submit command. Each executor tries to find
log4j.properties in its working directory; if the file is not found there,
your logging settings are ignored. You have to add --files
/path/to/your/log4j.properties so that the file is shipped to the executors,
as in the sketch below.

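A minimal sketch of the full submit command with the properties file shipped
(the application class and jar are placeholders, not taken from your command):

  # --files copies log4j.properties into each container's working directory,
  # so the bare file name in -Dlog4j.configuration resolves inside the container.
  spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --files /path/to/your/log4j.properties \
    --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
    --class com.example.YourApp \
    your-app.jar
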
Finally, in order to debug what is happening in the executors, you can
simply write directly to stdout or stderr. That is much easier to check than
going to each executor and hunting for your log file :)
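
If log aggregation is enabled on the cluster (yarn.log-aggregation-enable set
to true), you can also pull every container's stdout/stderr after the
application finishes; for example, using the application id implied by the
container name above:

  # Fetch the aggregated logs (including each executor's stdout and stderr files)
  # once the application has ended. Requires YARN log aggregation.
  yarn logs -applicationId application_1459219311185_2456 | less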

2016-04-29 21:30 GMT+07:00 dev loper :


Re: (YARN CLUSTER MODE) Where to find logs within Spark RDD processing function ?

2016-04-29 Thread dev loper
Hi Ted & Nguyen,

@Ted, I was under the belief that the log4j.properties file would be picked
up from the application classpath if no file path is specified. Please
correct me if I am wrong. I tried your approach as well; still I couldn't
find the logs.

@nguyen, I am running it on a YARN cluster, so the Spark UI redirects me to
the YARN UI. I couldn't see the logs there either. I checked the logs on both
the master and the worker (I am running a cluster with one master and one
worker). I even tried yarn logs, and the statements are not turning up there
either. Do the yarn logs include executor logs as well?


Requesting your help to identify the issue.

On Fri, Apr 29, 2016 at 7:32 PM, Ted Yu  wrote:


Re: (YARN CLUSTER MODE) Where to find logs within Spark RDD processing function ?

2016-04-29 Thread Ted Yu
Please use the following syntax:

--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///local/file/log4j.properties"

FYI
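
In a full spark-submit invocation that could look like the sketch below (the
application class and jar are placeholders); note that with a file: URI the
properties file must already exist at that exact path on every node where the
driver or an executor may run:

  # Sketch only: /local/file/log4j.properties must be present on every node.
  spark-submit \
    --master yarn --deploy-mode cluster \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///local/file/log4j.properties" \
    --class com.example.YourApp your-app.jar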

On Fri, Apr 29, 2016 at 6:03 AM, dev loper  wrote:



Re: (YARN CLUSTER MODE) Where to find logs within Spark RDD processing function ?

2016-04-29 Thread nguyen duc tuan
These are the executors' logs, not the driver logs. To see these log files,
you have to go to the executor machines where the tasks are running. To see
what you print to stdout or stderr, you can either go to the executor
machines directly (the output is stored in "stdout" and "stderr" files
somewhere on each executor machine; see the sketch below for the typical
location) or view it through the web UI.
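
A rough sketch of where those files usually sit on an executor host; the log
root is whatever yarn.nodemanager.log-dirs points to on your nodes, and the
ids below are placeholders:

  # Container logs live under the NodeManager's local log directory.
  # <yarn.nodemanager.log-dirs>, <appId> and <containerId> are placeholders.
  ls <yarn.nodemanager.log-dirs>/application_<appId>/container_<containerId>/
  # typically: stderr  stdout  (plus any files your appenders write there)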

2016-04-29 20:03 GMT+07:00 dev loper :



(YARN CLUSTER MODE) Where to find logs within Spark RDD processing function ?

2016-04-29 Thread dev loper
Hi Spark Team,

I have asked the same question on Stack Overflow, no luck yet.

http://stackoverflow.com/questions/36923949/where-to-find-logs-within-spark-rdd-processing-function-yarn-cluster-mode?noredirect=1#comment61419406_36923949

I am running my Spark application on a YARN cluster. No matter what I do, I
am not able to get the logs within the RDD function printed. Below you can
find the sample snippet I have written for the RDD processing function; I
have simplified the code to illustrate the syntax I used. When I run it
locally I am able to see the logs, but not in cluster mode. Neither
System.err.println nor the logger seems to be working, yet I can see all my
driver logs. I even tried to log using the root logger, but it was not
working at all within the RDD processing function. I was desperate to see
the log messages, so finally I found a guide to declare the logger as
transient (https://www.mapr.com/blog/how-log-apache-spark), but even that
didn't help.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import org.apache.spark.api.java.function.PairFlatMapFunction;

import scala.Tuple2;

class SampleFlatMapFunction implements PairFlatMapFunction<Tuple2<String,String>,String,String> {

    private static final long serialVersionUID = 6565656322667L;
    transient Logger executorLogger = LogManager.getLogger("sparkExecutor");

    // Re-create the transient logger when the function is deserialized on an executor.
    private void readObject(java.io.ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        executorLogger = LogManager.getLogger("sparkExecutor");
    }

    @Override
    public Iterable<Tuple2<String,String>> call(Tuple2<String,String> tuple) throws Exception {

        executorLogger.info(" log testing from  executorLogger ::");
        System.err.println(" log testing from  executorLogger system error stream ");

        List<Tuple2<String,String>> updates = new ArrayList<>();
        // process the tuple, expand it and add the results to the list
        return updates;
    }
}

My Log4j Configuration is given below

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppender.File=/var/log/spark/spark.log
log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n

log4j.appender.RollingAppenderU=org.apache.log4j.DailyRollingFileAppender

log4j.appender.RollingAppenderU.File=${spark.yarn.app.container.log.dir}/spark-app.log
log4j.appender.RollingAppenderU.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppenderU.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppenderU.layout.ConversionPattern=[%p] %d %c %M - %m%n


# By default, everything goes to console and file
log4j.rootLogger=INFO, RollingAppender, console

# My custom logging goes to another file
log4j.logger.sparkExecutor=INFO, stdout, RollingAppenderU


I have tried yarn logs and the Spark UI logs; nowhere could I see the log
statements from the RDD processing functions. I tried the approaches below,
but they didn't work:

yarn logs -applicationId

I also checked the HDFS path below:

/tmp/logs/


I am running my spark-submit command with the arguments below; even then it
is not working:

  --master yarn --deploy-mode cluster   --conf
"spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties"
 --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties"

Can somebody guide me on logging within Spark RDD and map functions? What am
I missing in the above steps?

Thanks

Dev