Re: Monitoring single-run job statistics
Hi Stephan,

Thank you for your answer. I would love to contribute, but currently I have no capacity, as I am buried with my thesis. I will reach out after graduating :)

Best regards / Pozdrawiam,
Filip Łęczycki

2016-01-05 10:35 GMT+01:00 Stephan Ewen <se...@apache.org>:

> Hi Filip!
>
> There are thoughts and efforts to extend Flink to push the result
> statistics of Flink jobs to the YARN timeline server. That way, you can
> explore jobs that are completed.
>
> Since the whole web dashboard in Flink has a pure REST design, this is a
> quite straightforward fix.
>
> Given the capacity I see in the community, I cannot promise that this
> will be fixed immediately. Let me know, though, if you are interested in
> contributing an addition there, and I can walk you through the steps that
> would be needed.
>
> Greetings,
> Stephan
>
> On Mon, Jan 4, 2016 at 9:17 PM, Filip Łęczycki <filipleczy...@gmail.com> wrote:
>
>> Hi Till,
>>
>> Thank you for your answer; however, I am sorry to hear that. I was
>> reluctant to execute jobs on a long-running Flink cluster because multiple
>> jobs would cloud the YARN statistics regarding CPU and memory time, as
>> well as Flink's garbage collector statistics in the log, since they would
>> be recorded for the whole Flink cluster instead of a single job.
>>
>> Do you know whether there is a way to extract the mentioned stats (CPU
>> time, memory time, GC time) for a single job run on a long-running Flink
>> cluster?
>>
>> I will be very grateful for an answer :)
>>
>> Best regards / Pozdrawiam,
>> Filip Łęczycki
>>
>> 2016-01-04 10:05 GMT+01:00 Till Rohrmann <till.rohrm...@gmail.com>:
>>
>>> Hi Filip,
>>>
>>> At the moment it is not possible to retrieve the job statistics after
>>> the job has finished with flink run -m yarn-cluster. The reason is that
>>> the YARN cluster is only alive as long as the job is executing. Thus, I
>>> would recommend that you execute your jobs on a long-running Flink
>>> cluster on YARN.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Fri, Jan 1, 2016 at 11:29 PM, Filip Łęczycki <filipleczy...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am running Flink apps on a YARN cluster and I am trying to get some
>>>> benchmarks. When I start a long-running Flink cluster on my YARN
>>>> cluster, I have access to the web UI and the REST API that provide
>>>> statistics for the deployed jobs (as described here:
>>>> https://ci.apache.org/projects/flink/flink-docs-master/internals/monitoring_rest_api.html).
>>>> I was wondering whether it is possible to get such information about a
>>>> single job triggered with 'flink run -m yarn-cluster ...'. After the
>>>> job is finished, there is no Flink client running, so I cannot use the
>>>> REST API to get the stats.
>>>>
>>>> Thanks for any help :)
>>>>
>>>> Best regards / Pozdrawiam,
>>>> Filip Łęczycki
Re: Monitoring single-run job statistics
Hi Till,

Thank you for your answer; however, I am sorry to hear that. I was reluctant to execute jobs on a long-running Flink cluster because multiple jobs would cloud the YARN statistics regarding CPU and memory time, as well as Flink's garbage collector statistics in the log, since they would be recorded for the whole Flink cluster instead of a single job.

Do you know whether there is a way to extract the mentioned stats (CPU time, memory time, GC time) for a single job run on a long-running Flink cluster?

I will be very grateful for an answer :)

Best regards / Pozdrawiam,
Filip Łęczycki

2016-01-04 10:05 GMT+01:00 Till Rohrmann <till.rohrm...@gmail.com>:

> Hi Filip,
>
> At the moment it is not possible to retrieve the job statistics after the
> job has finished with flink run -m yarn-cluster. The reason is that the
> YARN cluster is only alive as long as the job is executing. Thus, I would
> recommend that you execute your jobs on a long-running Flink cluster on
> YARN.
>
> Cheers,
> Till
>
> On Fri, Jan 1, 2016 at 11:29 PM, Filip Łęczycki <filipleczy...@gmail.com> wrote:
>
>> Hi all,
>>
>> I am running Flink apps on a YARN cluster and I am trying to get some
>> benchmarks. When I start a long-running Flink cluster on my YARN cluster,
>> I have access to the web UI and the REST API that provide statistics for
>> the deployed jobs (as described here:
>> https://ci.apache.org/projects/flink/flink-docs-master/internals/monitoring_rest_api.html).
>> I was wondering whether it is possible to get such information about a
>> single job triggered with 'flink run -m yarn-cluster ...'. After the job
>> is finished, there is no Flink client running, so I cannot use the REST
>> API to get the stats.
>>
>> Thanks for any help :)
>>
>> Best regards / Pozdrawiam,
>> Filip Łęczycki
Monitoring single-run job statistics
Hi all,

I am running Flink apps on a YARN cluster and I am trying to get some benchmarks. When I start a long-running Flink cluster on my YARN cluster, I have access to the web UI and the REST API that provide statistics for the deployed jobs (as described here: https://ci.apache.org/projects/flink/flink-docs-master/internals/monitoring_rest_api.html). I was wondering whether it is possible to get such information about a single job triggered with 'flink run -m yarn-cluster ...'. After the job is finished, there is no Flink client running, so I cannot use the REST API to get the stats.

Thanks for any help :)

Best regards / Pozdrawiam,
Filip Łęczycki
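[For readers of the archive: while the cluster is alive, the monitoring REST API linked above serves plain JSON over HTTP, so any HTTP client can read it. The sketch below only assembles the endpoint URLs described in those docs; the host, port, and job ID are illustrative placeholders, not values from this thread, and 8081 is the default web UI port.]

```java
// Sketch: assembling monitoring REST API URLs (placeholder host/port/job ID).
public class FlinkRestPaths {

    // Lists all jobs known to the JobManager,
    // e.g. curl http://localhost:8081/jobs
    static String jobsUrl(String host, int port) {
        return "http://" + host + ":" + port + "/jobs";
    }

    // Details (vertices, state, timestamps) for one job.
    static String jobDetailsUrl(String host, int port, String jobId) {
        return jobsUrl(host, port) + "/" + jobId;
    }

    public static void main(String[] args) {
        System.out.println(jobsUrl("localhost", 8081));
        System.out.println(jobDetailsUrl("localhost", 8081, "7684be6004e4"));
    }
}
```

Fetching these URLs only works while the JobManager is up, which is exactly why the stats vanish after a per-job yarn-cluster run finishes.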
Re: Job Statistics
Hi Jean,

I think it would be a nice-to-have feature to display some metrics on the command line after a job has completed. We already have the run time and the accumulator results available at the CLI, and printing those would be easy. What metrics in particular are you looking for?

Best,
Max

On Thu, Jun 18, 2015 at 3:41 PM, Jean Bez <jeanluca...@gmail.com> wrote:

> Hi Fabian,
>
> I am trying to compare some examples on Hadoop, Spark, and Flink. If
> possible, I would like to see the job statistics, like the report given by
> Hadoop. Since I am running these examples on a large cluster, it would be
> much better if I could obtain such data directly from the console.
>
> Thanks!
> Jean
>
> Em 18/06/2015 04:55, Fabian Hueske <fhue...@gmail.com> escreveu:
>
>> Hi Jean,
>>
>> What kind of job execution stats are you interested in?
>>
>> Cheers,
>> Fabian
>>
>> 2015-06-18 9:01 GMT+02:00 Matthias J. Sax <mj...@informatik.hu-berlin.de>:
>>
>>> Hi,
>>>
>>> The CLI cannot show any job statistics. However, you can use the
>>> JobManager web interface that is accessible at port 8081 from a browser.
>>>
>>> -Matthias
>>>
>>> On 06/17/2015 10:13 PM, Jean Bez wrote:
>>>
>>>> Hello,
>>>>
>>>> Is it possible to view job statistics after a job has finished
>>>> executing, directly in the command line? If so, could you please
>>>> explain how? I could not find any mention of this in the docs. I also
>>>> tried setting the logs to debug mode, but no other information was
>>>> presented.
>>>>
>>>> Thank you!
>>>>
>>>> Regards,
>>>> Jean
Re: Job Statistics
Hi Fabian,

I am trying to compare some examples on Hadoop, Spark, and Flink. If possible, I would like to see the job statistics, like the report given by Hadoop. Since I am running these examples on a large cluster, it would be much better if I could obtain such data directly from the console.

Thanks!
Jean

Em 18/06/2015 04:55, Fabian Hueske <fhue...@gmail.com> escreveu:

> Hi Jean,
>
> What kind of job execution stats are you interested in?
>
> Cheers,
> Fabian
>
> 2015-06-18 9:01 GMT+02:00 Matthias J. Sax <mj...@informatik.hu-berlin.de>:
>
>> Hi,
>>
>> The CLI cannot show any job statistics. However, you can use the
>> JobManager web interface that is accessible at port 8081 from a browser.
>>
>> -Matthias
>>
>> On 06/17/2015 10:13 PM, Jean Bez wrote:
>>
>>> Hello,
>>>
>>> Is it possible to view job statistics after a job has finished
>>> executing, directly in the command line? If so, could you please explain
>>> how? I could not find any mention of this in the docs. I also tried
>>> setting the logs to debug mode, but no other information was presented.
>>>
>>> Thank you!
>>>
>>> Regards,
>>> Jean
Re: Job Statistics
Hi Maximilian,

The metrics I am interested in are I/O, run time, and communication. Could you please provide an example of how to obtain such results?

Thank you!!

2015-06-18 10:45 GMT-03:00 Maximilian Michels <m...@apache.org>:

> Hi Jean,
>
> I think it would be a nice-to-have feature to display some metrics on the
> command line after a job has completed. We already have the run time and
> the accumulator results available at the CLI, and printing those would be
> easy. What metrics in particular are you looking for?
>
> Best,
> Max
>
> On Thu, Jun 18, 2015 at 3:41 PM, Jean Bez <jeanluca...@gmail.com> wrote:
>
>> Hi Fabian,
>>
>> I am trying to compare some examples on Hadoop, Spark, and Flink. If
>> possible, I would like to see the job statistics, like the report given
>> by Hadoop. Since I am running these examples on a large cluster, it would
>> be much better if I could obtain such data directly from the console.
>>
>> Thanks!
>> Jean
>>
>> Em 18/06/2015 04:55, Fabian Hueske <fhue...@gmail.com> escreveu:
>>
>>> Hi Jean,
>>>
>>> What kind of job execution stats are you interested in?
>>>
>>> Cheers,
>>> Fabian
>>>
>>> 2015-06-18 9:01 GMT+02:00 Matthias J. Sax <mj...@informatik.hu-berlin.de>:
>>>
>>>> Hi,
>>>>
>>>> The CLI cannot show any job statistics. However, you can use the
>>>> JobManager web interface that is accessible at port 8081 from a browser.
>>>>
>>>> -Matthias
>>>>
>>>> On 06/17/2015 10:13 PM, Jean Bez wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Is it possible to view job statistics after a job has finished
>>>>> executing, directly in the command line? If so, could you please
>>>>> explain how? I could not find any mention of this in the docs. I also
>>>>> tried setting the logs to debug mode, but no other information was
>>>>> presented.
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Regards,
>>>>> Jean
Re: Job Statistics
Hello Max,

I will try to do that! Do you know if I could obtain data about the I/O and communication as well? From what I could understand, I can get the runtime and the accumulator results only. Is that right?

2015-06-18 11:37 GMT-03:00 Maximilian Michels <m...@apache.org>:

> Hi Jean,
>
> As I said, there is currently only the run time available. You can print
> the run time and accumulator results to stdout by retrieving the
> JobExecutionResult from the ExecutionEnvironment:
>
> JobExecutionResult result = env.execute();
> System.out.println("runtime: " + result.getNetRuntime());
> for (Map.Entry<String, Object> entry :
>         result.getAllAccumulatorResults().entrySet()) {
>     System.out.println(entry.getKey() + ": " + entry.getValue());
> }
>
> You would do that in your Flink program. You could also store metrics in
> the accumulators. However, since you're trying to compare different
> systems, I'd advise you to use some external tools for monitoring
> resource usage, like Ganglia or collectd.
>
> Best,
> Max
>
> On Thu, Jun 18, 2015 at 4:11 PM, Jean Bez <jeanluca...@gmail.com> wrote:
>
>> Hi Maximilian,
>>
>> The metrics I am interested in are I/O, run time, and communication.
>> Could you please provide an example of how to obtain such results?
>>
>> Thank you!!
>>
>> 2015-06-18 10:45 GMT-03:00 Maximilian Michels <m...@apache.org>:
>>
>>> Hi Jean,
>>>
>>> I think it would be a nice-to-have feature to display some metrics on
>>> the command line after a job has completed. We already have the run time
>>> and the accumulator results available at the CLI, and printing those
>>> would be easy. What metrics in particular are you looking for?
>>>
>>> Best,
>>> Max
>>>
>>> On Thu, Jun 18, 2015 at 3:41 PM, Jean Bez <jeanluca...@gmail.com> wrote:
>>>
>>>> Hi Fabian,
>>>>
>>>> I am trying to compare some examples on Hadoop, Spark, and Flink. If
>>>> possible, I would like to see the job statistics, like the report given
>>>> by Hadoop. Since I am running these examples on a large cluster, it
>>>> would be much better if I could obtain such data directly from the
>>>> console.
>>>>
>>>> Thanks!
>>>> Jean
>>>>
>>>> Em 18/06/2015 04:55, Fabian Hueske <fhue...@gmail.com> escreveu:
>>>>
>>>>> Hi Jean,
>>>>>
>>>>> What kind of job execution stats are you interested in?
>>>>>
>>>>> Cheers,
>>>>> Fabian
>>>>>
>>>>> 2015-06-18 9:01 GMT+02:00 Matthias J. Sax <mj...@informatik.hu-berlin.de>:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The CLI cannot show any job statistics. However, you can use the
>>>>>> JobManager web interface that is accessible at port 8081 from a
>>>>>> browser.
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>> On 06/17/2015 10:13 PM, Jean Bez wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Is it possible to view job statistics after a job has finished
>>>>>>> executing, directly in the command line? If so, could you please
>>>>>>> explain how? I could not find any mention of this in the docs. I
>>>>>>> also tried setting the logs to debug mode, but no other information
>>>>>>> was presented.
>>>>>>>
>>>>>>> Thank you!
>>>>>>>
>>>>>>> Regards,
>>>>>>> Jean
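[For readers of the archive: Max's suggestion to "store metrics in the accumulators" boils down to updating named counters as records are processed and then reading the final name-to-value map after the job, which is what getAllAccumulatorResults() returns. The following is a self-contained plain-Java sketch of that idea; it deliberately uses no Flink APIs, and all names here are illustrative.]

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch of the accumulator pattern: user code increments named
// counters while processing records, and the final map of results is read
// after the "job" completes -- mirroring what a Flink program would later
// obtain from JobExecutionResult.getAllAccumulatorResults().
public class AccumulatorSketch {

    public static Map<String, Object> run(String[] records) {
        Map<String, Object> accumulators = new LinkedHashMap<>();
        long lines = 0;
        long chars = 0;
        for (String record : records) {  // stand-in for the dataflow's processing phase
            lines++;
            chars += record.length();
        }
        accumulators.put("num-lines", lines);
        accumulators.put("num-chars", chars);
        return accumulators;
    }

    public static void main(String[] args) {
        Map<String, Object> results = run(new String[] {"hello", "flink"});
        // Same printing loop as in Max's snippet above.
        for (Map.Entry<String, Object> entry : results.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

In a real Flink job the counters would be registered on the runtime context inside the operators, but the reporting shape at the end is the same: a map of accumulator names to values.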
Re: Job Statistics
Hi,

I tried to view this directly from the web interface, but I could not find any further information about the completed jobs. I have the list, but when I open an entry, no further information is provided. Is this correct?

2015-06-18 15:10 GMT-03:00 Jean Bez <jeanluca...@gmail.com>:

> Hello Max,
>
> I will try to do that! Do you know if I could obtain data about the I/O
> and communication as well? From what I could understand, I can get the
> runtime and the accumulator results only. Is that right?
>
> 2015-06-18 11:37 GMT-03:00 Maximilian Michels <m...@apache.org>:
>
>> Hi Jean,
>>
>> As I said, there is currently only the run time available. You can print
>> the run time and accumulator results to stdout by retrieving the
>> JobExecutionResult from the ExecutionEnvironment:
>>
>> JobExecutionResult result = env.execute();
>> System.out.println("runtime: " + result.getNetRuntime());
>> for (Map.Entry<String, Object> entry :
>>         result.getAllAccumulatorResults().entrySet()) {
>>     System.out.println(entry.getKey() + ": " + entry.getValue());
>> }
>>
>> You would do that in your Flink program. You could also store metrics in
>> the accumulators. However, since you're trying to compare different
>> systems, I'd advise you to use some external tools for monitoring
>> resource usage, like Ganglia or collectd.
>>
>> Best,
>> Max
>>
>> On Thu, Jun 18, 2015 at 4:11 PM, Jean Bez <jeanluca...@gmail.com> wrote:
>>
>>> Hi Maximilian,
>>>
>>> The metrics I am interested in are I/O, run time, and communication.
>>> Could you please provide an example of how to obtain such results?
>>>
>>> Thank you!!
>>>
>>> 2015-06-18 10:45 GMT-03:00 Maximilian Michels <m...@apache.org>:
>>>
>>>> Hi Jean,
>>>>
>>>> I think it would be a nice-to-have feature to display some metrics on
>>>> the command line after a job has completed. We already have the run
>>>> time and the accumulator results available at the CLI, and printing
>>>> those would be easy. What metrics in particular are you looking for?
>>>>
>>>> Best,
>>>> Max
>>>>
>>>> On Thu, Jun 18, 2015 at 3:41 PM, Jean Bez <jeanluca...@gmail.com> wrote:
>>>>
>>>>> Hi Fabian,
>>>>>
>>>>> I am trying to compare some examples on Hadoop, Spark, and Flink. If
>>>>> possible, I would like to see the job statistics, like the report
>>>>> given by Hadoop. Since I am running these examples on a large cluster,
>>>>> it would be much better if I could obtain such data directly from the
>>>>> console.
>>>>>
>>>>> Thanks!
>>>>> Jean
>>>>>
>>>>> Em 18/06/2015 04:55, Fabian Hueske <fhue...@gmail.com> escreveu:
>>>>>
>>>>>> Hi Jean,
>>>>>>
>>>>>> What kind of job execution stats are you interested in?
>>>>>>
>>>>>> Cheers,
>>>>>> Fabian
>>>>>>
>>>>>> 2015-06-18 9:01 GMT+02:00 Matthias J. Sax <mj...@informatik.hu-berlin.de>:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> The CLI cannot show any job statistics. However, you can use the
>>>>>>> JobManager web interface that is accessible at port 8081 from a
>>>>>>> browser.
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>> On 06/17/2015 10:13 PM, Jean Bez wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Is it possible to view job statistics after a job has finished
>>>>>>>> executing, directly in the command line? If so, could you please
>>>>>>>> explain how? I could not find any mention of this in the docs. I
>>>>>>>> also tried setting the logs to debug mode, but no other information
>>>>>>>> was presented.
>>>>>>>>
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Jean
Re: Job Statistics
Hi!

There are no I/O or record statistics collected at the moment; this is work in progress. A new web frontend that visualizes those is also in the works, so this is going to improve soon, but for now there is no easy way to grab those numbers.

If you are interested in contributing, I could pull you into some of the discussions about collecting and reporting metrics.

Greetings,
Stephan

On Thu, Jun 18, 2015 at 1:42 PM, Jean Bez <jeanluca...@gmail.com> wrote:

> Hi,
>
> I tried to view this directly from the web interface, but I could not find
> any further information about the completed jobs. I have the list, but
> when I open an entry, no further information is provided. Is this correct?
>
> 2015-06-18 15:10 GMT-03:00 Jean Bez <jeanluca...@gmail.com>:
>
>> Hello Max,
>>
>> I will try to do that! Do you know if I could obtain data about the I/O
>> and communication as well? From what I could understand, I can get the
>> runtime and the accumulator results only. Is that right?
>>
>> 2015-06-18 11:37 GMT-03:00 Maximilian Michels <m...@apache.org>:
>>
>>> Hi Jean,
>>>
>>> As I said, there is currently only the run time available. You can print
>>> the run time and accumulator results to stdout by retrieving the
>>> JobExecutionResult from the ExecutionEnvironment:
>>>
>>> JobExecutionResult result = env.execute();
>>> System.out.println("runtime: " + result.getNetRuntime());
>>> for (Map.Entry<String, Object> entry :
>>>         result.getAllAccumulatorResults().entrySet()) {
>>>     System.out.println(entry.getKey() + ": " + entry.getValue());
>>> }
>>>
>>> You would do that in your Flink program. You could also store metrics in
>>> the accumulators. However, since you're trying to compare different
>>> systems, I'd advise you to use some external tools for monitoring
>>> resource usage, like Ganglia or collectd.
>>>
>>> Best,
>>> Max
>>>
>>> On Thu, Jun 18, 2015 at 4:11 PM, Jean Bez <jeanluca...@gmail.com> wrote:
>>>
>>>> Hi Maximilian,
>>>>
>>>> The metrics I am interested in are I/O, run time, and communication.
>>>> Could you please provide an example of how to obtain such results?
>>>>
>>>> Thank you!!
>>>>
>>>> 2015-06-18 10:45 GMT-03:00 Maximilian Michels <m...@apache.org>:
>>>>
>>>>> Hi Jean,
>>>>>
>>>>> I think it would be a nice-to-have feature to display some metrics on
>>>>> the command line after a job has completed. We already have the run
>>>>> time and the accumulator results available at the CLI, and printing
>>>>> those would be easy. What metrics in particular are you looking for?
>>>>>
>>>>> Best,
>>>>> Max
>>>>>
>>>>> On Thu, Jun 18, 2015 at 3:41 PM, Jean Bez <jeanluca...@gmail.com> wrote:
>>>>>
>>>>>> Hi Fabian,
>>>>>>
>>>>>> I am trying to compare some examples on Hadoop, Spark, and Flink. If
>>>>>> possible, I would like to see the job statistics, like the report
>>>>>> given by Hadoop. Since I am running these examples on a large
>>>>>> cluster, it would be much better if I could obtain such data directly
>>>>>> from the console.
>>>>>>
>>>>>> Thanks!
>>>>>> Jean
>>>>>>
>>>>>> Em 18/06/2015 04:55, Fabian Hueske <fhue...@gmail.com> escreveu:
>>>>>>
>>>>>>> Hi Jean,
>>>>>>>
>>>>>>> What kind of job execution stats are you interested in?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Fabian
>>>>>>>
>>>>>>> 2015-06-18 9:01 GMT+02:00 Matthias J. Sax <mj...@informatik.hu-berlin.de>:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> The CLI cannot show any job statistics. However, you can use the
>>>>>>>> JobManager web interface that is accessible at port 8081 from a
>>>>>>>> browser.
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>> On 06/17/2015 10:13 PM, Jean Bez wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Is it possible to view job statistics after a job has finished
>>>>>>>>> executing, directly in the command line? If so, could you please
>>>>>>>>> explain how? I could not find any mention of this in the docs. I
>>>>>>>>> also tried setting the logs to debug mode, but no other
>>>>>>>>> information was presented.
>>>>>>>>>
>>>>>>>>> Thank you!
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Jean
Re: Job Statistics
Hi,

The CLI cannot show any job statistics. However, you can use the JobManager web interface that is accessible at port 8081 from a browser.

-Matthias

On 06/17/2015 10:13 PM, Jean Bez wrote:

> Hello,
>
> Is it possible to view job statistics after a job has finished executing,
> directly in the command line? If so, could you please explain how? I could
> not find any mention of this in the docs. I also tried setting the logs to
> debug mode, but no other information was presented.
>
> Thank you!
>
> Regards,
> Jean
Re: Job Statistics
Hi Jean,

What kind of job execution stats are you interested in?

Cheers,
Fabian

2015-06-18 9:01 GMT+02:00 Matthias J. Sax <mj...@informatik.hu-berlin.de>:

> Hi,
>
> The CLI cannot show any job statistics. However, you can use the
> JobManager web interface that is accessible at port 8081 from a browser.
>
> -Matthias
>
> On 06/17/2015 10:13 PM, Jean Bez wrote:
>
>> Hello,
>>
>> Is it possible to view job statistics after a job has finished executing,
>> directly in the command line? If so, could you please explain how? I
>> could not find any mention of this in the docs. I also tried setting the
>> logs to debug mode, but no other information was presented.
>>
>> Thank you!
>>
>> Regards,
>> Jean
Job Statistics
Hello,

Is it possible to view job statistics after a job has finished executing, directly in the command line? If so, could you please explain how? I could not find any mention of this in the docs. I also tried setting the logs to debug mode, but no other information was presented.

Thank you!

Regards,
Jean