Hi Sparkers, hoping for insight here: I'm running a simple "describe mytable", where mytable is a partitioned Hive table.
Spark produces the following times:

    Query 1 of 1, Rows read: 50
    Elapsed time (seconds) - Total: 73.02, SQL query: 72.831, Reading results: 0.189

whereas Hive over the same metastore shows:

    Query 1 of 1, Rows read: 47
    Elapsed time (seconds) - Total: 0.44, SQL query: 0.204, Reading results: 0.236

I'm focusing on the metastore because the Thriftserver couldn't start up at all until I increased hive.metastore.client.socket.timeout to 600. Why would metastore access from Spark's Thriftserver be so much slower than from Hive?

The issue is pretty urgent for me, as I ran into it during a push to a production cluster (the QA metastore table is smaller, and QA is a different cluster that didn't show the problem). Is there a known issue with metastore access? I only see https://issues.apache.org/jira/browse/SPARK-5923, but I'm using Postgres.

We are upgrading from Shark, and both Hive and Shark process this query a lot faster. Describe table is not in itself a critical query for me, but I'm seeing a performance hit in other queries as well, and I suspect the metastore interaction is the cause (e.g. https://www.mail-archive.com/user@spark.apache.org/msg26242.html).
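For reference, here is how I applied the timeout workaround mentioned above (a sketch; the exact placement depends on your deployment -- either hive-site.xml on the Spark Thriftserver's classpath, or a --hiveconf override at launch):

    <!-- hive-site.xml visible to the Spark Thriftserver -->
    <property>
      <name>hive.metastore.client.socket.timeout</name>
      <value>600</value>
    </property>

    # or as an override when starting the Thriftserver:
    sbin/start-thriftserver.sh --hiveconf hive.metastore.client.socket.timeout=600

(Note that newer Hive versions expect a time unit on this value, e.g. "600s"; on the Hive version I'm running, a bare number of seconds works.)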