Re: Missing 'com.cloudera.kudu.hive.KuduStorageHandler'

Frank Heimerzheim Tue, 13 Dec 2016 07:06:01 -0800

Hello Todd,

thanks a lot for the clarification.


Greetings
Frank

2016-12-13 15:36 GMT+01:00 Todd Lipcon <[email protected]>:

> Hi Frank,
>
> I'm sorry to say that the Java storage handler implementation you're
> looking for doesn't exist. The Hive metastore requires that non-HDFS
> storage engines set some value for the 'storage handler' property, so
> Impala uses that special string to denote a Kudu table in the HMS. However,
> there is no such Java implementation; Impala detects this class name and
> uses its own implementation to plan and execute queries against Kudu.
>
> The Hive support for Kudu is tracked here:
> https://issues.apache.org/jira/browse/HIVE-12971
> This work isn't committed to the Hive project but there is a prototype on
> github that you could try. Note that it's not being actively developed by
> the Kudu dev community at this point in time, but if you get it working,
> please report back with your experiences.
>
> Thanks
> -Todd
>
> On Tue, Dec 13, 2016 at 6:12 PM, Frank Heimerzheim <[email protected]>
> wrote:
>
>> Hello,
>>
>> within the impala-shell i can create an external table and thereafter
>> select and insert data from an underlying kudu table. Within the statement
>> for creation of the table an 'StorageHandler' will be set to
>>  'com.cloudera.kudu.hive.KuduStorageHandler'. Everything works fine as
>> there exists apparently an *.jar with the referenced library within.
>>
>> When trying to select from a hive-shell there is an error that the
>> handler is not available. Trying to 'rdd.collect()' from an hiveCtx within
>> an sparkSession i also get an error JavaClassNotFoundException as
>> the KuduStorageHandler is not available.
>>
>> I then tried to find a jar in my system with the intention to copy it to
>> all my data nodes. Sadly i couldn't find the specific jar. I think it
>> exists in the system as impala apparently is using it. For a test i've
>> changed the 'StorageHandler' in the creation statement to
>> 'com.cloudera.kudu.hive.KuduStorageHandler_foo'. The create statement
>> worked. Also the select from impala, but it didn't return any data. There
>> was no error as i expected. The test was just for the case impala would in
>> a magic way select data from kudu without an correct 'StorageHandler'.
>> Apparently this is not the case and impala has access to an
>>  'com.cloudera.kudu.hive.KuduStorageHandler'.
>>
>> Long story, short question:
>> In which *.jar i can find the
>> 'com.cloudera.kudu.hive.KuduStorageHandler'?
>> Is the approach to copy the jar per hand to all nodes an appropriate way
>> to bring spark in a position to work with kudu?
>> What about the beeline-shell from hive and the possibility to read from
>> kudu?
>>
>> My Environment: Cloudera 5.7 with kudu and impala-kudu from installed
>> parcels. Build a working python-kudu library successfully from scratch (git)
>>
>> Thanks a lot!
>> Frank
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
