Re: Missing 'com.cloudera.kudu.hive.KuduStorageHandler'

Frank Heimerzheim Tue, 07 Feb 2017 06:18:08 -0800

Hello,

For quite a while I've worked successfully with
https://maven2repo.com/org.apache.kudu/kudu-spark_2.10/1.2.0/jar


For a while I ignored a problem with the Kudu data type int8. With the
connector I can't write to int8 columns: an int in Python always brings up
errors like

"java.lang.IllegalArgumentException: id isn't [Type: int64, size: 8, Type:
unixtime_micros, size: 8], it's int8"

As Python isn't statically typed, the connector has to find a suitable
Java/Kudu type for a Python int. Apparently the Python int is mapped to
int64/unixtime_micros and not to int8, which Kudu expects here.

As a quick workaround, all my integer columns in Kudu are int64 at the moment.

In the long run I can't accept this waste of disk space or, even worse, I/O.
Any idea when I will be able to store int8 from Python/Spark to Kudu?

With the "normal" Python API everything works fine; only the
Spark/Kudu/Python connector brings up the problem.
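
One possible workaround for the int8 mismatch described above, given only as a minimal sketch and not verified against kudu-spark 1.2.0: cast the column to Spark's ByteType before writing, so that the DataFrame schema no longer carries the 64-bit type inferred for a Python int. The helper names `fits_int8` and `append_with_int8` and their parameters are hypothetical, and the sketch assumes the connector maps Spark ByteType to Kudu int8 and accepts an append through the `org.apache.kudu.spark.kudu` data source.

```python
# Sketch only: helper names are hypothetical, and the Spark part is an
# assumption that has not been verified against kudu-spark 1.2.0.

INT8_MIN, INT8_MAX = -128, 127  # value range of a Kudu int8 column


def fits_int8(values):
    """Return True if every value can be stored in an int8 column."""
    return all(INT8_MIN <= v <= INT8_MAX for v in values)


def append_with_int8(df, column, kudu_master, kudu_table):
    """Cast `column` to Spark's ByteType, then append the DataFrame to Kudu.

    Assumption: the connector maps Spark ByteType to Kudu int8.
    """
    # Imported here so fits_int8 stays usable without a Spark installation.
    from pyspark.sql.functions import col
    from pyspark.sql.types import ByteType

    narrowed = df.withColumn(column, col(column).cast(ByteType()))
    (narrowed.write
        .format("org.apache.kudu.spark.kudu")
        .option("kudu.master", kudu_master)
        .option("kudu.table", kudu_table)
        .mode("append")
        .save())
```

Checking the values with `fits_int8` before the cast is advisable, since a value outside -128..127 cannot be represented in int8 and would not be stored faithfully after narrowing.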

As so often: Thanks in advance for your excellent help!

Frank

2016-12-13 12:12 GMT+01:00 Frank Heimerzheim <[email protected]>:

> Hello,
>
> Within the impala-shell I can create an external table and then select
> and insert data from an underlying Kudu table. The CREATE statement sets
> the 'StorageHandler' to 'com.cloudera.kudu.hive.KuduStorageHandler'.
> Everything works fine, so apparently a *.jar containing the referenced
> class exists.
>
> When I try to select from a hive-shell, there is an error that the handler
> is not available. When I call 'rdd.collect()' via a hiveCtx within a
> sparkSession, I also get a JavaClassNotFoundException because the
> KuduStorageHandler is not available.
>
> I then tried to find the jar on my system, intending to copy it to all my
> data nodes. Sadly I couldn't find the specific jar. I think it exists on
> the system, as Impala apparently uses it. As a test I changed the
> 'StorageHandler' in the CREATE statement to
> 'com.cloudera.kudu.hive.KuduStorageHandler_foo'. The CREATE statement
> worked, and so did the select from Impala, but it didn't return any data.
> There was no error, although I had expected one. The test was only meant to
> check whether Impala would somehow magically select data from Kudu without
> a correct 'StorageHandler'. Apparently this is not the case, and Impala has
> access to a 'com.cloudera.kudu.hive.KuduStorageHandler'.
>
> Long story, short question:
> In which *.jar can I find the 'com.cloudera.kudu.hive.KuduStorageHandler'?
> Is copying the jar by hand to all nodes an appropriate way to enable Spark
> to work with Kudu?
> What about the beeline shell from Hive and the possibility of reading from
> Kudu?
>
> My environment: Cloudera 5.7 with kudu and impala-kudu installed from
> parcels. I successfully built a working python-kudu library from scratch
> (git).
>
> Thanks a lot!
> Frank
>
