Re: Standard practices for building dashboards for spark processed data

2020-02-26 Thread Breno Arosa
I have been using Athena/Presto to read the parquet files in the data lake; if you are already saving data to S3, I think this is the easiest option. Then I use Redash or Metabase to build dashboards (they have different limitations); both are very intuitive to use and easy to set up with Docker.
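
A minimal sketch of querying those parquet files through Athena with boto3, assuming the table is already registered in the Glue/Athena catalog; the database, table, and result-bucket names here are hypothetical placeholders:

    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    # Run a query over the parquet-backed table; Redash or Metabase can be
    # pointed at the same Athena data source for dashboards.
    response = athena.start_query_execution(
        QueryString="SELECT event_date, count(*) AS events "
                    "FROM events_parquet GROUP BY event_date",
        QueryExecutionContext={"Database": "datalake"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )
    print(response["QueryExecutionId"])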

Re: Standard practices for building dashboards for spark processed data

2020-02-26 Thread Aniruddha P Tekade
Hi Roland, Thank you for your reply. That's quite helpful. I think I should try InfluxDB then. But I am curious: in the case of Prometheus, would writing a custom exporter be a good choice and solve the purpose efficiently? Grafana is not something I want to drop. Best, Aniruddha
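
A minimal sketch of the custom-exporter approach mentioned above, using the official prometheus_client Python library; the metric name and the compute_lag() helper are hypothetical stand-ins for whatever the pipeline actually measures:

    import time
    from prometheus_client import Gauge, start_http_server

    # Hypothetical pipeline metric; replace with real measurements.
    PIPELINE_LAG = Gauge("pipeline_output_lag_seconds",
                         "Lag between event time and parquet write time")

    def compute_lag():
        # Hypothetical stand-in: read the real lag from the pipeline's output.
        return 0.0

    if __name__ == "__main__":
        # Exposes http://localhost:8000/metrics for Prometheus to scrape;
        # Grafana then reads from Prometheus as usual.
        start_http_server(8000)
        while True:
            PIPELINE_LAG.set(compute_lag())
            time.sleep(15)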

Re: Standard practices for building dashboards for spark processed data

2020-02-25 Thread Roland Johann
Hi Ani, Prometheus is not well suited for ingesting explicit timeseries data. Its purpose is technical monitoring. If you want to monitor your Spark jobs with Prometheus you can publish the metrics so Prometheus can scrape them. What you probably are looking for is a timeseries database that
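
A minimal sketch of writing explicit timeseries points to a timeseries database such as InfluxDB, using the influxdb Python client (1.x API); the database, measurement, and tag names are hypothetical, and Grafana would then read from this database:

    from influxdb import InfluxDBClient

    # Hypothetical database and measurement names.
    client = InfluxDBClient(host="localhost", port=8086,
                            database="pipeline_metrics")
    client.write_points([{
        "measurement": "events_processed",
        "tags": {"job": "delta_ingest"},
        "fields": {"count": 1042},
    }])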

Standard practices for building dashboards for spark processed data

2020-02-25 Thread Aniruddha P Tekade
Hello, I am trying to build a data pipeline that uses Spark structured streaming with the Delta project and runs on Kubernetes. Due to this, I get my output files only in parquet format. Since I am asked to use Prometheus and Grafana for building the dashboard for this pipeline, I run an
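
For context, a minimal sketch of the kind of pipeline described: Spark structured streaming writing a Delta table, which stores its data as parquet files. This assumes the Delta Lake package is on the classpath; the Kafka source and S3 paths are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("delta-ingest").getOrCreate()

    # Hypothetical streaming source.
    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "kafka:9092")
              .option("subscribe", "events")
              .load())

    # Delta writes its data as parquet files under the table path,
    # which is why the pipeline's output ends up in parquet format.
    query = (stream.writeStream
             .format("delta")
             .option("checkpointLocation", "s3a://bucket/checkpoints/events")
             .start("s3a://bucket/tables/events"))

    query.awaitTermination()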