Re: Missing 'com.cloudera.kudu.hive.KuduStorageHandler'

Frank Heimerzheim Mon, 09 Jan 2017 02:54:57 -0800

Hello Todd,

one additional question:


There is a KuduContext in org.apache.kudu.spark.kudu._ which provides
read/write/update for use with Scala and Spark. I'm now looking for a
similar solution for Python and Spark. I've found
https://github.com/bkvarda/iot_demo which looks fine at first glance, but I
would much prefer an "official" solution. Is anything to be
expected in the near future? Or is there a way, which I don't know of yet,
to use the Scala library from Python?

Thanks
Frank
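
For the Python question above, one possible route is the DataFrame data
source that ships in the same kudu-spark jar as KuduContext, addressed from
PySpark by its format name. This is a minimal sketch, not an official Python
API. It assumes the kudu-spark jar matching your Spark and Scala version is on
the classpath (for example via --jars). The helper names read_kudu_table and
append_to_kudu_table, the master address, and the table name are all
hypothetical.

```python
# Sketch: use the kudu-spark DataFrame source from PySpark.
# Assumption: the kudu-spark jar is already on the Spark classpath.

KUDU_FORMAT = "org.apache.kudu.spark.kudu"  # package that also holds KuduContext


def read_kudu_table(spark, master, table):
    """Return a DataFrame backed by a Kudu table.

    `spark` is anything exposing `.read`, i.e. a SparkSession (Spark 2.x)
    or a SQLContext/HiveContext (Spark 1.6).
    """
    return (spark.read.format(KUDU_FORMAT)
            .option("kudu.master", master)   # host:port of the Kudu master
            .option("kudu.table", table)     # Kudu table name, not the HMS name
            .load())


def append_to_kudu_table(df, master, table):
    """Write the rows of `df` into an existing Kudu table in append mode."""
    (df.write.format(KUDU_FORMAT)
       .option("kudu.master", master)
       .option("kudu.table", table)
       .mode("append")
       .save())


# Hypothetical usage inside a PySpark job:
#   df = read_kudu_table(spark, "kudu-master.example.com:7051", "my_table")
#   df.show()
```

Only the Scala library does the work here; Python just passes the format
name and options through, so no separate Python binding for Spark is needed.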

2016-12-13 16:05 GMT+01:00 Frank Heimerzheim <[email protected]>:

> Hello Todd,
>
> thanks a lot for the clarification.
>
> Greetings
> Frank
>
> 2016-12-13 15:36 GMT+01:00 Todd Lipcon <[email protected]>:
>
>> Hi Frank,
>>
>> I'm sorry to say that the Java storage handler implementation you're
>> looking for doesn't exist. The Hive metastore requires that non-HDFS
>> storage engines set some value for the 'storage handler' property, so
>> Impala uses that special string to denote a Kudu table in the HMS. However,
>> there is no such Java implementation: Impala detects this class name and
>> uses its own implementation to plan and execute queries against Kudu.
>>
>> The Hive support for Kudu is tracked here:
>> https://issues.apache.org/jira/browse/HIVE-12971
>> This work isn't committed to the Hive project but there is a prototype on
>> github that you could try. Note that it's not being actively developed by
>> the Kudu dev community at this point in time, but if you get it working,
>> please report back with your experiences.
>>
>> Thanks
>> -Todd
>>
>> On Tue, Dec 13, 2016 at 6:12 PM, Frank Heimerzheim <[email protected]>
>> wrote:
>>
>>> Hello,
>>>
>>> Within the impala-shell I can create an external table and thereafter
>>> select and insert data from an underlying Kudu table. Within the statement
>>> for creation of the table a 'StorageHandler' will be set to
>>> 'com.cloudera.kudu.hive.KuduStorageHandler'. Everything works fine, as
>>> there apparently exists a *.jar containing the referenced library.
>>>
>>> When trying to select from a hive-shell there is an error that the
>>> handler is not available. Trying 'rdd.collect()' from a hiveCtx within
>>> a sparkSession I also get an error, JavaClassNotFoundException, as
>>> the KuduStorageHandler is not available.
>>>
>>> I then tried to find a jar on my system with the intention of copying it
>>> to all my data nodes. Sadly I couldn't find the specific jar. I think it
>>> exists on the system, as Impala is apparently using it. As a test I've
>>> changed the 'StorageHandler' in the creation statement to
>>> 'com.cloudera.kudu.hive.KuduStorageHandler_foo'. The create statement
>>> worked. So did the select from Impala, but it didn't return any data. There
>>> was no error as I expected. The test was just to check whether Impala would
>>> somehow magically select data from Kudu without a correct 'StorageHandler'.
>>> Apparently this is not the case, and Impala has access to a
>>> 'com.cloudera.kudu.hive.KuduStorageHandler'.
>>>
>>> Long story, short question:
>>> In which *.jar can I find the 'com.cloudera.kudu.hive.KuduStorageHandler'?
>>> Is copying the jar by hand to all nodes an appropriate way
>>> to put Spark in a position to work with Kudu?
>>> What about the beeline shell from Hive and the possibility of reading from
>>> Kudu?
>>>
>>> My environment: Cloudera 5.7 with Kudu and impala-kudu installed from
>>> parcels. I successfully built a working python-kudu library from scratch (git).
>>>
>>> Thanks a lot!
>>> Frank
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
