Hello,

for quite a while I've worked successfully with
https://maven2repo.com/org.apache.kudu/kudu-spark_2.10/1.2.0/jar
For a while I ignored a problem with the Kudu data type int8. With the connector I can't write a Python int to an int8 column; it always raises errors like

"java.lang.IllegalArgumentException: id isn't [Type: int64, size: 8, Type: unixtime_micros, size: 8], it's int8"

As Python isn't statically typed, the connector tries to find a suitable Java/Kudu type for a Python int. Apparently the Python int is matched to int64/unixtime_micros and not to int8, which is what Kudu expects for this column.

As a quick workaround, all my integer columns in Kudu are int64 at the moment. In the long run I can't accept this waste of disk space or, even worse, I/O.

Any idea when I will be able to store int8 from Python/Spark to Kudu?

With the "normal" Python API everything works fine; only the Spark/Kudu/Python connector shows the problem.

As so often: thanks in advance for your excellent help!

Frank

2016-12-13 12:12 GMT+01:00 Frank Heimerzheim <[email protected]>:

> Hello,
>
> within the impala-shell I can create an external table and thereafter
> select and insert data from an underlying Kudu table. Within the statement
> for creation of the table, a 'StorageHandler' is set to
> 'com.cloudera.kudu.hive.KuduStorageHandler'. Everything works fine, as
> there apparently exists a *.jar containing the referenced library.
>
> When trying to select from a hive-shell, there is an error that the handler
> is not available. Trying to 'rdd.collect()' from a hiveCtx within a
> sparkSession, I also get an error JavaClassNotFoundException, as
> the KuduStorageHandler is not available.
>
> I then tried to find a jar on my system with the intention of copying it to
> all my data nodes. Sadly, I couldn't find the specific jar. I think it
> exists on the system, as Impala is apparently using it. As a test I
> changed the 'StorageHandler' in the creation statement to
> 'com.cloudera.kudu.hive.KuduStorageHandler_foo'. The create statement
> worked. So did the select from Impala, but it didn't return any data. There
> was no error as I expected. The test was just to check whether Impala would
> somehow magically select data from Kudu without a correct 'StorageHandler'.
> Apparently this is not the case, and Impala has access to a
> 'com.cloudera.kudu.hive.KuduStorageHandler'.
>
> Long story, short question:
> In which *.jar can I find the 'com.cloudera.kudu.hive.KuduStorageHandler'?
> Is copying the jar by hand to all nodes an appropriate way
> to put Spark in a position to work with Kudu?
> What about the beeline-shell from Hive and the possibility to read from
> Kudu?
>
> My environment: Cloudera 5.7 with kudu and impala-kudu installed from
> parcels. Built a working python-kudu library successfully from scratch (git).
>
> Thanks a lot!
> Frank
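Regarding the int8 problem described at the top of this message, the width mismatch can be illustrated without Spark or Kudu. A Python int has no fixed width, so anything that maps it to a column type has to choose one, and a 64-bit type is the safe default. The sketch below uses only plain Python. The helpers `fits_int8` and `narrowest_int_type` are hypothetical names for illustration and are not part of kudu-python or kudu-spark.

```python
# Signed integer column types, narrowest first: (type name, width in bits).
INT_TYPES = [("int8", 8), ("int16", 16), ("int32", 32), ("int64", 64)]


def fits(value, bits):
    """Return True if value is representable as a signed integer of this width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def fits_int8(value):
    """Return True if a Python int can be stored in a signed 8-bit column."""
    return fits(value, 8)


def narrowest_int_type(values):
    """Return the name of the narrowest signed type that holds every value.

    Hypothetical helper: it shows the decision a mapping layer would have to
    make if it inspected the data instead of defaulting to int64.
    """
    for name, bits in INT_TYPES:
        if all(fits(v, bits) for v in values):
            return name
    raise OverflowError("a value does not fit in int64")


ids = [1, 42, 127]
print(narrowest_int_type(ids))          # int8
print(narrowest_int_type(ids + [128]))  # int16
```

A range check like this only shows that the values themselves would fit in int8; whether the connector accepts them still depends on the type it assigns to the column.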
