Hi everyone,

Is there any tool available to analyze and tune the performance of Hive queries? And what is the state of the art?
I have some experience tuning Pig, which meant manually clicking through the JobTracker web pages, collecting pieces of data from here and there, and guessing what might be wrong. That was a slow and uncomfortable process, so before I dive into Hive, I'd like to hear about your experience.

PS: For individual jobs, we built a tool called Starfish: http://www.cs.duke.edu/starfish/release.html . It can analyze a job's performance and profile the job for auto-tuning. It could be used for Hive too, but it currently doesn't capture Hive-specific information or the interactions among jobs.

Thanks,
Jie