> I wanted to know why is it necessary to remove the Hive jars from the Spark build as mentioned on this
Because SparkSQL was originally based on Hive and still uses the Hive AST to parse SQL.

The org.apache.spark.sql.hive package contains the parser, which has hard references to Hive's internal AST. Unfortunately that AST is auto-generated code (HiveParser.TOK_TABNAME etc). Every time Hive makes a release, those constants change in value. They are private API precisely because they carry no backwards-compatibility guarantee, and SparkSQL violates that by depending on them.

So Hive-on-Spark forces mismatched versions of Hive classes, because the dependency is circular: Hive(v1) -> Spark -> Hive(v2). That follows from the basic laws of causality. Spark cannot depend on a version of Hive that is unreleased, and a Hive-on-Spark release cannot depend on a version of Spark that is unreleased.

Cheers,
Gopal
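
To make the token-constant problem above concrete, here is a toy sketch in Python. It is not Hive or Spark code: the two token tables, the helper functions, and every numeric value are invented for illustration, and only the name TOK_TABNAME comes from the explanation above. It assumes that a grammar change between releases renumbers the generated constants, and that the consumer carries the value it was built against.

```python
# Hypothetical token tables. The names mimic generated parser constants,
# but the numeric values are made up for this illustration.
HIVE_V1_TOKENS = {"TOK_TABLE_OR_COL": 10, "TOK_TABNAME": 11, "TOK_TABREF": 12}

# Assumption: v2 adds a grammar rule, so the generator shifts later constants.
HIVE_V2_TOKENS = {
    "TOK_TABLE_OR_COL": 10,
    "TOK_TABLESAMPLE": 11,
    "TOK_TABNAME": 12,
    "TOK_TABREF": 13,
}


def parse_table_name(tokens):
    """Stand-in for a parser: returns the token type id of a table-name node."""
    return tokens["TOK_TABNAME"]


# The consumer was built against v2, so it holds v2's value as a plain
# integer (much like a compile-time constant baked into a class file).
EXPECTED_TABNAME = HIVE_V2_TOKENS["TOK_TABNAME"]


def consumer_recognizes(node_type):
    """True if the consumer identifies the node as a table name."""
    return node_type == EXPECTED_TABNAME


matches_v2 = consumer_recognizes(parse_table_name(HIVE_V2_TOKENS))
matches_v1 = consumer_recognizes(parse_table_name(HIVE_V1_TOKENS))
print(matches_v2, matches_v1)
```

With both versions of the parser classes on the classpath, whichever one wins decides whether the consumer still recognizes its own nodes, which is why the Hive jars have to be kept out of the Spark build in this setup.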