> > I'm not actually using Hive at the moment - in fact, I'm trying to avoid
> > it if I can. I'm just wondering whether Spark has anything similar I can
> > leverage?
Let me clarify: you do not need Hive installed, and what I'm suggesting is completely self-contained in Spark SQL. When you use a HiveContext, we support the Hive Query Language for expressing partitioned tables, but execution is done using RDDs. If you don't manually configure a Hive installation, Spark will just create a local metastore in the current directory. In the future we are planning to support non-HiveQL mechanisms for expressing partitioning.
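As a minimal sketch of what that looks like (assuming a Spark 1.x deployment with an existing SparkContext `sc`; the table and column names here are purely illustrative):

```scala
import org.apache.spark.sql.hive.HiveContext

// No Hive installation is needed; without one, Spark creates a
// local metastore in the current directory.
val hiveContext = new HiveContext(sc)

// HiveQL DDL for a partitioned table. The query is expressed in
// HiveQL, but execution happens via RDDs.
hiveContext.sql("""
  CREATE TABLE IF NOT EXISTS events (id INT, payload STRING)
  PARTITIONED BY (dt STRING)
""")

// Queries that filter on the partition column read only the
// matching partitions.
val df = hiveContext.sql("SELECT * FROM events WHERE dt = '2015-01-01'")
```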