Re: How to get yarn logs to display in the spark or yarn history-server?

2015-02-27 Thread Christophe Préaud
Yes, spark.yarn.historyServer.address is used to access the spark history 
server from yarn; it is not needed if you only use the yarn history server.
It may be possible to have both history servers running, but I have not tried 
that yet.

Besides, as far as I understand, the yarn and spark history servers have two 
different purposes:
- the yarn history server is for looking at your application logs after the 
application has finished
- the spark history server is for looking at your application in the spark web 
UI (the one with the Stages, Storage, Environment and Executors tabs) after 
it has finished

Regards,
Christophe.
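
A minimal sketch of the Spark-side settings discussed in this reply, assuming the application writes event logs to a directory that the Spark history server can read. The host name, port, and HDFS path are hypothetical placeholders, not values from this thread; the three property names are standard Spark settings. The helper only prints lines in spark-defaults.conf format and does not touch any cluster.

```shell
#!/bin/sh
# Print spark-defaults.conf lines for use with a Spark history server.
#   $1: history server address as host:port (hypothetical value in the call below)
#   $2: event log directory written by applications (hypothetical path below)
spark_history_conf() {
  printf 'spark.eventLog.enabled %s\n' true
  printf 'spark.eventLog.dir %s\n' "$2"
  # Only needed when YARN should link finished applications to the Spark
  # history server; leave it out if you only use the yarn history server.
  printf 'spark.yarn.historyServer.address %s\n' "$1"
}

spark_history_conf "history.example.com:18080" "hdfs:///spark-history"
```

The Spark history server process itself must be pointed at the same directory (the spark.history.fs.logDirectory property); how that is passed depends on the Spark version, so it is left out of the sketch.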

On 26/02/2015 20:30, Colin Kincaid Williams wrote:
 Right now I have set spark.yarn.historyServer.address in my spark configs to 
have yarn point to the spark-history server. Then from your mail it sounds like 
I should try another setting, or remove it completely. I also noticed that the 
aggregated log files appear in a directory in hdfs under application/spark vs. 
application/yarn or similar. I will review my configurations and see if I can 
get this working.

Thanks,

Colin Williams



Re: How to get yarn logs to display in the spark or yarn history-server?

2015-02-26 Thread Christophe Préaud
You can see this information in the yarn web UI using the configuration I 
provided in my former mail (click on the application id, then on logs; you will 
then be automatically redirected to the yarn history server UI).


Re: How to get yarn logs to display in the spark or yarn history-server?

2015-02-24 Thread Christophe Préaud
Hi Colin,

Here is how I have configured my hadoop cluster to have yarn logs available 
through both the yarn CLI and the _yarn_ history server (with gzip compression 
and 10 days retention):

1. Add the following properties to yarn-site.xml on each node manager and on 
the resource manager:
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>864000</value>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://dc1-kdp-dev-hadoop-03.dev.dc1.kelkoo.net:19888/jobhistory/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-aggregation.compression-type</name>
    <value>gz</value>
  </property>

2. Restart yarn and then start the yarn history server on the server defined in 
the yarn.log.server.url property above:

/opt/hadoop/sbin/mr-jobhistory-daemon.sh stop historyserver # should fail if historyserver is not yet started
/opt/hadoop/sbin/stop-yarn.sh
/opt/hadoop/sbin/start-yarn.sh
/opt/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver


It may be slightly different for you if the resource manager and the history 
server are not on the same machine.

Hope it will work for you as well!
Christophe.
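
A small sketch of how the values in step 1 fit together, assuming a POSIX shell. The two helper functions are hypothetical names introduced only for illustration; the property names and values are the ones from the message above. It shows that a 10-day retention corresponds to 864000 seconds and prints stanzas in the yarn-site.xml format.

```shell
#!/bin/sh
# Convert a retention period in days to seconds.
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

# Print one <property> stanza in yarn-site.xml format.
#   $1: property name, $2: property value
yarn_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

yarn_property yarn.log-aggregation-enable true
yarn_property yarn.log-aggregation.retain-seconds "$(days_to_seconds 10)"
yarn_property yarn.nodemanager.log-aggregation.compression-type gz
```

The printed stanzas still have to be placed inside the existing configuration element of yarn-site.xml by hand; the sketch does not edit the file.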

On 24/02/2015 06:31, Colin Kincaid Williams wrote:
 Hi,

 I have been trying to get my yarn logs to display in the spark history-server 
 or yarn history-server. I can see the log information


 yarn logs -applicationId application_1424740955620_0009
 15/02/23 22:15:14 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
 to us3sm2hbqa04r07-comp-prod-local


 Container: container_1424740955620_0009_01_02 on 
 us3sm2hbqa07r07.comp.prod.local_8041
 ===
 LogType: stderr
 LogLength: 0
 Log Contents:

 LogType: stdout
 LogLength: 897
 Log Contents:
 [GC [PSYoungGen: 262656K->23808K(306176K)] 262656K->23880K(1005568K), 
 0.0283450 secs] [Times: user=0.14 sys=0.03, real=0.03 secs]
 Heap
  PSYoungGen  total 306176K, used 111279K [0xeaa8, 
 0x0001, 0x0001)
   eden space 262656K, 33% used 
 [0xeaa8,0xeffebbe0,0xfab0)
   from space 43520K, 54% used 
 [0xfab0,0xfc240320,0xfd58)
   to   space 43520K, 0% used 
 [0xfd58,0xfd58,0x0001)
  ParOldGen   total 699392K, used 72K [0xbff8, 
 0xeaa8, 0xeaa8)
   object space 699392K, 0% used 
 [0xbff8,0xbff92010,0xeaa8)
  PSPermGen   total 35328K, used 34892K [0xbad8, 
 0xbd00, 0xbff8)
   object space 35328K, 98% used 
 [0xbad8,0xbcf93088,0xbd00)



 Container: container_1424740955620_0009_01_03 on 
 us3sm2hbqa09r09.comp.prod.local_8041
 ===
 LogType: stderr
 LogLength: 0
 Log Contents:

 LogType: stdout
 LogLength: 896
 Log Contents:
 [GC [PSYoungGen: 262656K->23725K(306176K)] 262656K->23797K(1005568K), 
 0.0358650 secs] [Times: user=0.28 sys=0.04, real=0.04 secs]
 Heap
  PSYoungGen  total 306176K, used 65712K [0xeaa8, 
 0x0001, 0x0001)
   eden space 262656K, 15% used 
 [0xeaa8,0xed380bf8,0xfab0)
   from space 43520K, 54% used 
 [0xfab0,0xfc22b4f8,0xfd58)
   to   space 43520K, 0% used 
 [0xfd58,0xfd58,0x0001)
  ParOldGen   total 699392K, used 72K [0xbff8, 
 0xeaa8, 0xeaa8)
   object space 699392K, 0% used 
 [0xbff8,0xbff92010,0xeaa8)
  PSPermGen   total 29696K, used 29486K [0xbad8, 
 0xbca8, 0xbff8)
   object space 29696K, 99% used 
 [0xbad8,0xbca4b838,0xbca8)



 Container: container_1424740955620_0009_01_01 on 
 us3sm2hbqa09r09.comp.prod.local_8041
 ===
 LogType: stderr
 LogLength: 0
 Log Contents:

 LogType: stdout
 LogLength: 21
 Log Contents:
 Pi is roughly 3.1416

 I can see some details for the application in the spark history-server at 
 this URL: 
 http://us3sm2hbqa04r07.comp.prod.local:18080/history/application_1424740955620_0009/jobs/
 When running in spark-master mode, I can see the stdout and stderr 
 somewhere in the spark history-server. How do I get the information 
 shown above into the Spark history-server?
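
To check from the command line which containers and log types were aggregated, the output shown above can be summarized with a short filter. This is only a sketch: it assumes the `Container:`, `LogType:` and `LogLength:` line format exactly as it appears in this message (other Hadoop versions may print it differently), and `summarize_yarn_logs` is a hypothetical helper name.

```shell
#!/bin/sh
# Summarize `yarn logs` output read from stdin: one line per log,
# in the form "<container-id> <log-type> <length>".
summarize_yarn_logs() {
  awk '
    $1 == "Container:" { container = $2 }            # remember current container
    $1 == "LogType:"   { type = $2 }                 # remember current log type
    $1 == "LogLength:" { print container, type, $2 } # emit one summary line
  '
}

# Usage on a cluster (application ID taken from the message above):
# yarn logs -applicationId application_1424740955620_0009 | summarize_yarn_logs
```

A non-zero length for stdout or stderr confirms that log aggregation captured that stream, independently of either history server UI.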



Re: How to get yarn logs to display in the spark or yarn history-server?

2015-02-24 Thread Colin Kincaid Williams
Looks like in my tired state, I didn't mention spark the whole time.
However, it might be implied by the application log above. Spark log
aggregation appears to be working, since I can run the yarn command above.
I do have yarn logging set up for the yarn history server. I was trying to
use the spark history-server, but maybe I should try setting

spark.yarn.historyServer.address

to the yarn history-server, instead of the spark history-server? I tried
this configuration when I started, but didn't have much luck.

Are your spark apps, run in yarn client or cluster mode, showing up in your
yarn history server? If so, can you share any spark settings?


Re: How to get yarn logs to display in the spark or yarn history-server?

2015-02-24 Thread Imran Rashid
the spark history server and the yarn history server are totally
independent.  Spark knows nothing about yarn logs, and vice versa, so
unfortunately there isn't any way to get all the info in one place.


Re: How to get yarn logs to display in the spark or yarn history-server?

2015-02-24 Thread Colin Kincaid Williams
So back to my original question.

I can see the spark logs using the example above:

yarn logs -applicationId application_1424740955620_0009

This shows yarn log aggregation working. I can see the stdout and stderr
in the container information above. How can I get this information in a
web UI? Is this not currently supported?
