If the link to PR/1819 is broken, here it is: https://github.com/apache/spark/pull/1819.
On Sun, Aug 10, 2014 at 5:56 PM, Eric Friedman <eric.d.fried...@gmail.com> wrote:

> Thanks Michael, I can try that too.
>
> I know you guys aren't in sales/marketing (thank G-d), but given all the
> hoopla about the CDH<->DataBricks partnership, it'd be awesome if you guys
> were somewhat more aligned, by which I mean that the DataBricks releases on
> Apache that say "for CDH5" would actually work on CDH5. I know Cloudera has
> to qualify them for support and so on, but if DataBricks development
> treated mainstream CDH as the primary deployment target, well that would be
> great. Of course I'm being selfish. *smile*
>
> On Aug 10, 2014, at 2:43 PM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> I imagine it's not the only instance of this kind of problem people
>> will ever encounter. Can you rebuild Spark with this particular
>> release of Hive?
>
> Unfortunately the Hive APIs that we use change too much from release to
> release to make this possible. There is a JIRA for compiling Spark SQL
> against Hive 13: SPARK-2706
> <https://issues.apache.org/jira/browse/SPARK-2706>.
>
>> If I try to add hive-exec-0.12.0-cdh5.0.3.jar to my SPARK_CLASSPATH, in
>> order to get DeprecatedParquetInputFormat, I find out that there is an
>> incompatibility in the SerDeUtils class. Spark's Hive snapshot expects to
>> find
>
> Instead of including CDH's version of Hive, I'd try just including the
> Hive jars for Parquet from here:
> http://mvnrepository.com/artifact/com.twitter/parquet-hive-bundle/1.5.0
>
> However, support for this is a work in progress. You'll likely need to
> make sure you have a version of Spark that includes this commit (added last
> Friday):
> https://github.com/apache/spark/commit/9016af3f2729101027e33593e094332f05f48d92
>
> Another option would be to try this *experimental* patch: pr/1819.
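The classpath suggestion in the quoted reply can be sketched as follows. This is a minimal sketch, not a verified recipe: it assumes the parquet-hive-bundle-1.5.0.jar has already been downloaded from Maven Central to a local directory (the path below is hypothetical), and that you are on a Spark 1.x release, where the SPARK_CLASSPATH environment variable is read when launching the driver and executors.

```shell
# Hypothetical local path to the Twitter Parquet Hive bundle (version 1.5.0,
# as suggested above); adjust to wherever you actually downloaded the jar.
JAR="$HOME/jars/parquet-hive-bundle-1.5.0.jar"

# Prepend the jar to SPARK_CLASSPATH, preserving any existing entries.
# The ${VAR:+...} expansion only appends the ":" separator when
# SPARK_CLASSPATH was already non-empty.
export SPARK_CLASSPATH="$JAR${SPARK_CLASSPATH:+:$SPARK_CLASSPATH}"
echo "$SPARK_CLASSPATH"
```

The point of this approach, per the reply above, is to get the Parquet/Hive classes onto the classpath without pulling in CDH's full hive-exec jar, which conflicts with the Hive version Spark SQL was compiled against.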