Re: Aggregation of Streaming UI Statistics for multiple jobs
You can explore the REST API: https://spark.apache.org/docs/2.0.2/monitoring.html#rest-api

On Sun, May 27, 2018 at 10:18 AM, skmishra wrote:
> Hi,
>
> I am working on a streaming use case where I need to run multiple Spark
> streaming applications at the same time and measure their throughput and
> latencies. The Spark UI provides all the statistics, but if I want to run
> more than 100 applications at the same time, I have no clue how to
> aggregate these statistics. Opening 100 windows and collecting all the
> data is not an easy job. If you could provide any help on how to collect
> these statistics from code, I can write a script to run my experiment.
> Any help is greatly appreciated. Thanks in advance.
>
> Regards,
> Sitakanta Mishra
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
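A sketch of such a script, assuming Spark 2.x DStream applications whose UIs are reachable over HTTP: it pulls the /streaming/statistics endpoint from each application's monitoring REST API and combines the figures. The field names (avgInputRate, avgTotalDelay) follow the Spark 2.x REST API docs; the UI base URLs are whatever hosts/ports your applications bind to.

```python
import json
from urllib.request import urlopen

def fetch_json(url):
    """Fetch one REST API endpoint and parse the JSON response."""
    with urlopen(url) as resp:
        return json.load(resp)

def collect_stats(ui_base_urls):
    """For each app UI (e.g. http://host:4040), pull streaming statistics."""
    stats = []
    for base in ui_base_urls:
        for app in fetch_json(base + "/api/v1/applications"):
            stats.append(fetch_json(
                "%s/api/v1/applications/%s/streaming/statistics"
                % (base, app["id"])))
    return stats

def aggregate(stats):
    """Combine per-app statistics: total input rate, mean total delay."""
    total_rate = sum(s.get("avgInputRate", 0.0) for s in stats)
    delays = [s["avgTotalDelay"] for s in stats if "avgTotalDelay" in s]
    mean_delay = sum(delays) / len(delays) if delays else None
    return {"totalInputRate": total_rate, "meanTotalDelay": mean_delay}
```

Run `collect_stats` against the list of UI URLs, then feed the result to `aggregate`; no windows to open by hand.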
Re: Big data visualization
Yes Amin, Spark is primarily being used for ETL here. Once you transform the data, you can store it in any NoSQL DB that supports your use case. The BI dashboard app can then connect to that DB for reports and visualization.

HTH,
Deepak

On Mon, May 28, 2018, 05:47 amin mohebbi wrote:
> I am working on an analytics application using Apache Spark to store and
> analyze data. Spark might be used as an ETL application to aggregate
> different metrics and then join with the aggregated metrics. The data
> sources are flat files that are coming from two different sources (interval
> meter data and customer information) on a daily basis (65 GB per day of
> time-series data). The end users are BI users, so we cannot provide them
> notebook visualization. They can only use Power BI, Tableau or Excel to
> do self-service filters for run-time analytics, graphing the data and
> reporting.
>
> So, my question is: what are the best tools to implement this pipeline?
> I do not think storing Parquet or ORC files in a file system is a good
> choice in production, and I think we have to deposit the data somewhere
> (a time-series or standard DB); please correct me if I am wrong.
>
> 1- Where to store the data? File system / time-series DB / Azure Cosmos /
> standard DB?
> 2- Is it the right way to use Spark as the ETL and aggregation
> application, store the output somewhere, and use Power BI for reporting
> and dashboard purposes?
>
> Best Regards,
> Amin Mohebbi
> PhD candidate in Software Engineering at University of Malaysia
> Tel: +60 18 2040 017
> E-Mail: tp025...@ex.apiit.edu.my
> amin_...@me.com
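To make the pipeline shape concrete (aggregate in the ETL step, join with customer info, land the result in a store the BI tool can query), here is a plain-Python sketch. sqlite3 stands in for the target database, and the column names (customer_id, kwh) are made up for illustration; in the real pipeline Spark would do this aggregation with DataFrame groupBy/join over the 65 GB of flat files before writing to the chosen store.

```python
import sqlite3

# Hypothetical daily interval-meter rows: (customer_id, interval_kwh).
readings = [("c1", 1.5), ("c1", 2.0), ("c2", 0.7)]
# Hypothetical customer-information source.
customers = {"c1": "Alice", "c2": "Bob"}

# Aggregate per customer (the ETL step), then join with customer info.
totals = {}
for cid, kwh in readings:
    totals[cid] = totals.get(cid, 0.0) + kwh

# Land the result somewhere a BI tool (Power BI / Tableau) can query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_usage (customer_id TEXT, name TEXT, kwh REAL)")
conn.executemany("INSERT INTO daily_usage VALUES (?, ?, ?)",
                 [(cid, customers[cid], kwh) for cid, kwh in totals.items()])
rows = conn.execute(
    "SELECT customer_id, kwh FROM daily_usage ORDER BY customer_id").fetchall()
```

The point of the sketch is the separation: Spark owns the heavy aggregation, the database owns serving, and the BI tool only ever sees the small aggregated table.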
Big data visualization
I am working on an analytics application using Apache Spark to store and analyze data. Spark might be used as an ETL application to aggregate different metrics and then join with the aggregated metrics. The data sources are flat files that are coming from two different sources (interval meter data and customer information) on a daily basis (65 GB per day of time-series data). The end users are BI users, so we cannot provide them notebook visualization. They can only use Power BI, Tableau or Excel to do self-service filters for run-time analytics, graphing the data and reporting.

So, my question is: what are the best tools to implement this pipeline? I do not think storing Parquet or ORC files in a file system is a good choice in production, and I think we have to deposit the data somewhere (a time-series or standard DB); please correct me if I am wrong.

1- Where to store the data? File system / time-series DB / Azure Cosmos / standard DB?
2- Is it the right way to use Spark as the ETL and aggregation application, store the output somewhere, and use Power BI for reporting and dashboard purposes?

Best Regards,
Amin Mohebbi
PhD candidate in Software Engineering at University of Malaysia
Tel: +60 18 2040 017
E-Mail: tp025...@ex.apiit.edu.my
amin_...@me.com
Spark AsyncEventQueue doubt
Hi,

I'm getting the ERROR and WARN below when running a fairly heavy calculation on a dataset:

> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
> setLogLevel(newLevel).
> 2018-05-27 12:51:11 ERROR AsyncEventQueue:70 - Dropping event from queue
> appStatus. This likely means one of the listeners is too slow and cannot
> keep up with the rate at which tasks are being started by the scheduler.
> 2018-05-27 12:51:11 WARN AsyncEventQueue:66 - Dropped
> com.codahale.metrics.Counter@1d8423d1 events from appStatus since Thu Jan
> 01 05:30:00 IST 1970.
> [Stage 7784:>(9 + 3) / 34][Stage 7786:(53 + 38) / 200][Stage 7789:(74 +
> 53) / 200]2018-05-27 12:52:14 WARN AsyncEventQueue:66 - Dropped
> com.codahale.metrics.Counter@1d8423d1 events from appStatus since Sun May
> 27 12:51:11 IST 2018.

The same "Dropped com.codahale.metrics.Counter@1d8423d1 events from appStatus" WARN then repeats roughly once a minute, from 12:53:14 through 13:02:36 IST, with the "since" timestamp advancing each time.

Even though my job is not failing, why am I getting these?

Thanks,
Aakash.
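For reference, these messages mean the in-process listener that feeds the UI/app-status store cannot drain its event queue fast enough, so events are dropped; the job itself is unaffected, only the UI and metrics become less accurate. A sketch of the usual mitigation, assuming Spark 2.3+ where the queue size is configurable (the capacity value below is an illustrative choice, not a recommendation):

```properties
# spark-defaults.conf (or pass via --conf on spark-submit)
# Default capacity is 10000 events per queue; raising it gives a slow
# listener (here the appStatus listener) more headroom before drops occur.
spark.scheduler.listenerbus.eventqueue.capacity   30000
```

A larger queue costs more driver memory, so this trades memory for UI accuracy; if drops persist, reducing the number of tasks/stages generating events is the other lever.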