Thanks Michael, I can try that too. 

I know you guys aren't in sales/marketing (thank G-d), but given all the hoopla 
about the CDH<->Databricks partnership, it'd be awesome if you were somewhat 
more aligned, by which I mean that the Databricks releases posted on Apache 
that say "for CDH5" would actually work on CDH5. I know Cloudera has to qualify 
them for support and so on, but if Databricks development treated mainstream 
CDH as the primary deployment target, well, that would be great. Of course I'm 
being selfish. *smile*
On Aug 10, 2014, at 2:43 PM, Michael Armbrust <mich...@databricks.com> wrote:

>> I imagine it's not the only instance of this kind of problem people
>> will ever encounter. Can you rebuild Spark with this particular
>> release of Hive?
> 
> Unfortunately, the Hive APIs that we use change too much from release to 
> release to make this possible.  There is a JIRA for compiling Spark SQL 
> against Hive 13: SPARK-2706.
> 
>> if I try to add hive-exec-0.12.0-cdh5.0.3.jar to my SPARK_CLASSPATH, in 
>> order to get DeprecatedParquetInputFormat, I find out that there is an 
>> incompatibility in the SerDeUtils class.  Spark's Hive snapshot expects to 
>> find 
> 
> 
> Instead of including CDH's version of Hive, I'd try just including the Hive 
> jars for Parquet from here: 
> http://mvnrepository.com/artifact/com.twitter/parquet-hive-bundle/1.5.0
> 
> However, support for this is a work in progress.  You'll likely need to make 
> sure you have a version of Spark that includes this commit (added last 
> Friday) 
> https://github.com/apache/spark/commit/9016af3f2729101027e33593e094332f05f48d92
> 
> Another option would be to try this experimental patch: pr/1819.
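For concreteness, the jar-swap Michael suggests might look roughly like the following in `conf/spark-env.sh` (a sketch only: the jar filename follows the standard Maven layout for the `com.twitter:parquet-hive-bundle:1.5.0` artifact linked above, and `/opt/jars` is a hypothetical download location):

```shell
# Sketch: put the self-contained parquet-hive bundle on Spark's classpath
# instead of CDH's full hive-exec jar, which clashes with Spark's Hive
# snapshot (e.g. in SerDeUtils).
# Assumes the jar was fetched from Maven Central, e.g.:
#   /opt/jars/parquet-hive-bundle-1.5.0.jar  (hypothetical path)
export SPARK_CLASSPATH="/opt/jars/parquet-hive-bundle-1.5.0.jar:$SPARK_CLASSPATH"
```

Note this only supplies the Parquet/Hive input-format classes; per the thread, a Spark build containing the commit above is still required.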
