Thanks a lot, Sonal. I will give it a try.

Regards
Sumit Chawla
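P.S. To make sure I am reading your suggestion correctly, here is a rough,
untested sketch of what I plan to set up. The metrics.properties line and the
sink name "reifier" are from your mail; the endpoint URL (host/port) and the
small polling script are my own assumptions about my setup:

    # conf/metrics.properties, as in Sonal's example:
    #   *.sink.reifier.class=org.apache.spark.metrics.sink.MetricsServlet
    #
    # Then poll the metrics endpoint for JSON while the job is running.
    import json
    import time
    import urllib.request

    # Assumed endpoint; adjust host/port/path for your master or driver UI.
    METRICS_URL = "http://localhost:8080/metrics/applications/json"

    def poll_metrics(interval_secs=10):
        while True:
            with urllib.request.urlopen(METRICS_URL) as resp:
                metrics = json.loads(resp.read().decode("utf-8"))
            # Expecting Dropwizard-style JSON (gauges, counters, timers, ...).
            for name, gauge in metrics.get("gauges", {}).items():
                print(name, gauge.get("value"))
            time.sleep(interval_secs)

    if __name__ == "__main__":
        poll_metrics()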
On Wed, Dec 7, 2016 at 10:45 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:

> You can try updating metrics.properties for the sink of your choice. In
> our case, we add the following to get application metrics in JSON format
> over HTTP:
>
> *.sink.reifier.class=org.apache.spark.metrics.sink.MetricsServlet
>
> Here we have defined a sink named reifier whose class is MetricsServlet.
> Then you can poll <master ui>/metrics/applications/json.
>
> Also take a look at https://github.com/hammerlab/spark-json-relay to see
> if it serves your needs.
>
> Thanks,
> Sonal
> Nube Technologies <http://www.nubetech.co>
> <http://in.linkedin.com/in/sonalgoyal>
>
> On Wed, Dec 7, 2016 at 1:10 AM, Chawla,Sumit <sumitkcha...@gmail.com> wrote:
>
>> Any pointers on this?
>>
>> Regards
>> Sumit Chawla
>>
>> On Mon, Dec 5, 2016 at 8:30 PM, Chawla,Sumit <sumitkcha...@gmail.com> wrote:
>>
>>> An example implementation I found is https://github.com/groupon/spark-metrics.
>>>
>>> Does anyone have experience using this? I am more interested in
>>> something for PySpark specifically.
>>>
>>> The above link pointed to
>>> https://github.com/apache/spark/blob/master/conf/metrics.properties.template.
>>> I need to spend some time reading it, but any quick pointers will be
>>> appreciated.
>>>
>>> Regards
>>> Sumit Chawla
>>>
>>> On Mon, Dec 5, 2016 at 8:17 PM, Chawla,Sumit <sumitkcha...@gmail.com> wrote:
>>>
>>>> Hi Manish
>>>>
>>>> I am specifically looking for something similar to the following:
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/common/index.html#accumulators--counters
>>>>
>>>> Flink has the concept of Accumulators, where users can keep their own
>>>> custom counters, etc. While the application is executing, these
>>>> counters are queryable through the REST API provided by the Flink
>>>> monitoring backend. This way you don't have to wait for the program
>>>> to complete.
>>>>
>>>> Regards
>>>> Sumit Chawla
>>>>
>>>> On Mon, Dec 5, 2016 at 5:53 PM, manish ranjan <cse1.man...@gmail.com> wrote:
>>>>
>>>>> http://spark.apache.org/docs/latest/monitoring.html
>>>>>
>>>>> You can even install tools like dstat
>>>>> <http://dag.wieers.com/home-made/dstat/>, iostat
>>>>> <http://linux.die.net/man/1/iostat>, and iotop
>>>>> <http://linux.die.net/man/1/iotop>; *collectd* can likewise provide
>>>>> fine-grained profiling on individual nodes.
>>>>>
>>>>> If you are using Mesos as the resource manager, Mesos also exposes
>>>>> metrics for the running job.
>>>>>
>>>>> ~Manish
>>>>>
>>>>> On Mon, Dec 5, 2016 at 4:17 PM, Chawla,Sumit <sumitkcha...@gmail.com> wrote:
>>>>>
>>>>>> Hi All
>>>>>>
>>>>>> I have a long-running job which takes hours and hours to process
>>>>>> data. How can I monitor the operational efficiency of this job? I am
>>>>>> interested in something like Storm/Flink-style user
>>>>>> metrics/aggregators, which I can monitor while my job is running.
>>>>>> Using these metrics I want to track per-partition performance in
>>>>>> processing items. As of now, the only way for me to get these
>>>>>> metrics is after the job finishes.
>>>>>>
>>>>>> One possibility is for Spark to flush the metrics to an external
>>>>>> system every few seconds, and then use that external system to
>>>>>> monitor them. However, I wanted to see if Spark supports any such
>>>>>> use case out of the box.
>>>>>>
>>>>>> Regards
>>>>>> Sumit Chawla
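P.P.S. For the Flink-style "query counters while the job is running" idea I
mentioned further down in the thread, the closest PySpark workaround I can
think of is polling a plain accumulator from a driver-side thread. Accumulator
updates only reach the driver as tasks complete, and this is my own rough,
untested sketch (names and sizes are placeholders), not an out-of-the-box
Spark feature:

    import threading
    import time

    from pyspark import SparkContext

    sc = SparkContext(appName="accumulator-progress-sketch")
    processed = sc.accumulator(0)

    def process_partition(items):
        for item in items:
            # ... real per-item work would go here ...
            processed.add(1)

    def report_progress(interval_secs=5):
        # .value is only readable on the driver; it reflects finished tasks.
        while True:
            print("items processed so far:", processed.value)
            time.sleep(interval_secs)

    reporter = threading.Thread(target=report_progress)
    reporter.daemon = True
    reporter.start()

    rdd = sc.parallelize(range(1000000), 100)
    rdd.foreachPartition(process_partition)  # long-running action
    print("final count:", processed.value)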