Hello, from the impala-shell I can create an external table and afterwards select from and insert into an underlying Kudu table. In the CREATE statement the storage handler is set to 'com.cloudera.kudu.hive.KuduStorageHandler'. Everything works fine, so apparently there is a *.jar somewhere on the system that contains the referenced class.
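For reference, this is roughly what the table creation looks like. The snippet wraps the DDL in the impyla client only so the example is self-contained; in practice I type the equivalent statement into impala-shell, and the host name, table names and the exact set of kudu.* properties are placeholders from memory rather than copies from my cluster:

```python
# Sketch only: impyla is used here just to make the example runnable as-is;
# in practice I run the equivalent DDL in impala-shell.
from impala.dbapi import connect

conn = connect(host='impalad-host.example.com', port=21050)  # placeholder host
cur = conn.cursor()

# External mapping table over an existing Kudu table. The relevant part for my
# question is the 'storage_handler' property; names and addresses are placeholders.
cur.execute("""
    CREATE EXTERNAL TABLE my_mapping_table
    TBLPROPERTIES(
      'storage_handler'       = 'com.cloudera.kudu.hive.KuduStorageHandler',
      'kudu.table_name'       = 'my_kudu_table',
      'kudu.master_addresses' = 'kudu-master.example.com:7051'
    )
""")

cur.close()
conn.close()
```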
When I try to select from that table in the hive-shell, I get an error that the storage handler is not available. Trying to rdd.collect() via a HiveContext in a Spark session likewise fails with a Java ClassNotFoundException because the KuduStorageHandler is not available (a minimal sketch of that attempt follows below).

I then tried to locate the jar on my system, intending to copy it to all data nodes, but I could not find it anywhere. It must exist somewhere, since Impala is apparently using it. As a test I changed the storage handler in the CREATE statement to 'com.cloudera.kudu.hive.KuduStorageHandler_foo'. The CREATE statement succeeded, and a SELECT from Impala also ran, but it returned no data and, contrary to my expectation, no error either. The test was only to rule out that Impala somehow reads from Kudu in some magic way without a correct storage handler; apparently that is not the case, so Impala does have access to a 'com.cloudera.kudu.hive.KuduStorageHandler'.

Long story, short questions:
- In which *.jar can I find the 'com.cloudera.kudu.hive.KuduStorageHandler'?
- Is copying that jar by hand to all nodes an appropriate way to put Spark in a position to work with Kudu?
- What about the beeline-shell from Hive, is it possible to read from Kudu there?
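For completeness, here is a minimal sketch of the Spark attempt (PySpark on the Spark 1.6 that ships with CDH 5.7; the table name is the placeholder from the DDL sketch above):

```python
# Minimal reproduction of the failing Spark read via the Hive metastore.
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(conf=SparkConf().setAppName("kudu-storage-handler-test"))
hiveCtx = HiveContext(sc)

# 'my_mapping_table' is the placeholder name from the DDL sketch above.
df = hiveCtx.sql("SELECT * FROM my_mapping_table")

# On my cluster this fails with a Java ClassNotFoundException for
# com.cloudera.kudu.hive.KuduStorageHandler once the rows are actually fetched.
rows = df.rdd.collect()
print(len(rows))

sc.stop()
```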
My environment: Cloudera 5.7 with Kudu and Impala-Kudu installed from parcels; I also built a working python-kudu library successfully from source (git).

Thanks a lot!
Frank