Re: Clustering and Large-scale analysis of Hive Queries

2018-08-03 Thread Gopal Vijayaraghavan
> I am interested in working on a project that takes a large number of Hive
> queries (as well as their metadata, like amount of resources used etc) and
> finds common sub queries and expensive query groups etc.

This was roughly the central research topic of one of the Hive CBO devs,

Re: Clustering and Large-scale analysis of Hive Queries

2018-07-26 Thread Thai Bui
I don’t see any project especially tuned for Hive doing what you described. I have encountered this problem recently as the number of users and queries grew exponentially in my company and I’m interested. We are currently collecting Hive internal metrics in order to do certain analysis (don’t

Re: Clustering and Large-scale analysis of Hive Queries

2018-07-25 Thread Johannes Alberti
Did you guys already look at Dr Elephant? https://engineering.linkedin.com/blog/2016/04/dr-elephant-open-source-self-serve-performance-tuning-hadoop-spark Not sure if there is anything you might find useful, but I would be interested in hearing about the good and the bad of Dr Elephant with Hive.

Clustering and Large-scale analysis of Hive Queries

2018-07-25 Thread Zheng Shao
Hi, I am interested in working on a project that takes a large number of Hive queries (as well as their metadata, like amount of resources used etc) and finds common sub queries and expensive query groups etc. Is there any existing work in this domain? Happy to collaborate as well if there
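
As a rough illustration of the "expensive query groups" idea in the original post, here is a minimal Python sketch. It is not taken from any project mentioned in this thread. The function names (fingerprint, expensive_groups) and the input format, a list of (sql_text, cpu_seconds) pairs, are assumptions made for the example. It groups queries by a regex-normalized text fingerprint. Finding common sub queries would likely need a real parser or the query plans, which this sketch does not attempt.

```python
import re
from collections import defaultdict


def fingerprint(sql: str) -> str:
    """Collapse literals, case, and whitespace so queries that differ only
    in constants map to the same string (hypothetical helper)."""
    s = sql.strip().rstrip(";").lower()
    s = re.sub(r"'(?:[^']|'')*'", "?", s)      # single-quoted string literals
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)   # standalone numeric literals
    s = re.sub(r"\s+", " ", s)                 # runs of whitespace
    return s


def expensive_groups(log):
    """log: iterable of (sql_text, cpu_seconds) pairs (assumed format).
    Returns (fingerprint, query_count, total_cpu_seconds) tuples,
    most expensive group first."""
    totals = defaultdict(lambda: [0, 0.0])
    for sql, cpu in log:
        entry = totals[fingerprint(sql)]
        entry[0] += 1
        entry[1] += cpu
    groups = [(fp, n, cost) for fp, (n, cost) in totals.items()]
    return sorted(groups, key=lambda g: g[2], reverse=True)


if __name__ == "__main__":
    sample = [
        ("SELECT * FROM t WHERE id = 1", 10.0),
        ("select *  from t where id = 2;", 5.0),
        ("SELECT name FROM u WHERE city = 'NYC'", 3.0),
    ]
    for fp, n, cost in expensive_groups(sample):
        print(n, cost, fp)
```

Text normalization like this is cheap but coarse: it treats reordered predicates or renamed aliases as different queries, so it is only a starting point for clustering.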