Re: Job History Logs for spark jobs submitted on YARN

2016-01-21 Thread nsalian
Hello,

Thanks for the question.
1) Typically, the YARN ResourceManager will print out the Aggregate
Resource Allocation for an application once you have looked up that
application by its application id.
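For example, the yarn CLI can show this directly (a sketch only: the
application id below is hypothetical, and the exact report format varies
by Hadoop version):

    $ yarn application -status application_1453240000000_0001
    ...
    Aggregate Resource Allocation : 123456 MB-seconds, 78 vcore-seconds

The same figure is also visible on the application's page in the
ResourceManager web UI.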

2) As with MapReduce, there is a parameter that can be set either in
spark-defaults.conf or in the application-specific configuration:
spark.eventLog.dir=hdfs://:8020/user/spark/applicationHistory
This is the location from which the Spark History Server reads the event
logs once an application has completed.
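A minimal sketch of the relevant entries in spark-defaults.conf (the
namenode host below is a placeholder to fill in for your cluster):

    # Write event logs so the History Server can reconstruct the UI later
    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs://<namenode-host>:8020/user/spark/applicationHistory
    # Where the History Server reads those logs from
    spark.history.fs.logDirectory    hdfs://<namenode-host>:8020/user/spark/applicationHistory

Note that spark.eventLog.enabled must be true, or no event logs are
written at all.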

3) The Spark History Server UI has tabs that let you look at the
information you need:
Jobs
Stages
Storage
Environment
Executors

The Executors tab in particular gives more detailed per-executor
information, such as:
Storage Memory
Disk Used
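If you would rather pull these numbers programmatically, the History
Server also exposes a REST API (Spark 1.4 and later); a sketch, assuming
the default port 18080 and a hypothetical application id:

    $ curl http://<history-server-host>:18080/api/v1/applications/application_1453240000000_0001/executors

This returns a JSON list of the executors, with fields such as memoryUsed
and diskUsed matching what the Executors tab shows.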

Hopefully that helps.
Thank you.




-
Neelesh S. Salian
Cloudera



Re: Job History Logs for spark jobs submitted on YARN

2016-01-12 Thread Arkadiusz Bicz
Hi,

You can check out http://spark.apache.org/docs/latest/monitoring.html;
you can monitor HDFS usage as well as memory usage per job, per executor,
and for the driver. I have connected it to Graphite for storage and
Grafana for visualization, and also to collectd, which provides all the
server-node metrics such as disk, memory, and CPU utilization.
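In case it helps, a minimal sketch of the Graphite sink setup in
conf/metrics.properties (the Graphite host is a placeholder for your own
setup):

    # Report all Spark metrics (driver, executors, master, worker) to Graphite
    *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
    *.sink.graphite.host=<graphite-host>
    *.sink.graphite.port=2003
    *.sink.graphite.period=10
    *.sink.graphite.unit=seconds

Grafana then reads the series straight out of Graphite.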

On Tue, Jan 12, 2016 at 10:50 AM, laxmanvemula  wrote:
> I observe that YARN job history logs are created in /user/history/done
> (*.jhist files) for all MapReduce jobs, such as Hive and Pig. But for Spark
> jobs submitted in yarn-cluster mode, these logs are not being created.
>
> I would like to see the resource utilization of Spark jobs. Is there any
> other place where I can find the resource utilization of Spark jobs (CPU,
> memory, etc.)? Or is there a configuration to be set so that job history
> logs are created just as they are for other MapReduce jobs?
