On Tue, Feb 7, 2017 at 6:17 AM, Frank Heimerzheim <[email protected]> wrote:
> Hello,
>
> for quite a while I've worked successfully with
> https://maven2repo.com/org.apache.kudu/kudu-spark_2.10/1.2.0/jar
>
> For a while I ignored a problem with the Kudu datatype int8. With the
> connector I can't write int8: a Python int will always bring up errors
> like
>
> "java.lang.IllegalArgumentException: id isn't [Type: int64, size: 8, Type:
> unixtime_micros, size: 8], it's int8"
>
> As Python isn't statically typed, the connector tries to find a suitable
> type for a Python int in Java/Kudu. Apparently the Python int is matched
> to int64/unixtime_micros and not to int8, which is what Kudu expects at
> this place.
>
> As a quick solution, all my ints in Kudu are int64 at the moment.
>
> In the long run I can't accept this waste of HDD space or, even worse,
> I/O. Any idea when I will be able to store int8 from Python/Spark to Kudu?
>
> With the "normal" Python API everything works fine; only the
> Spark/Kudu/Python connector brings up the problem.
>

Not 100% sure I'm following. You're using pyspark here? Can you post a bit
of sample code that reproduces the issue?

-Todd

> 2016-12-13 12:12 GMT+01:00 Frank Heimerzheim <[email protected]>:
>
>> Hello,
>>
>> within impala-shell I can create an external table and then select and
>> insert data from an underlying Kudu table. In the CREATE TABLE statement
>> a 'StorageHandler' is set to 'com.cloudera.kudu.hive.KuduStorageHandler'.
>> Everything works fine, as there apparently exists a *.jar containing the
>> referenced library.
>>
>> When trying to select from a Hive shell, I get an error that the handler
>> is not available. Trying 'rdd.collect()' from a hiveCtx within a
>> SparkSession, I also get a Java ClassNotFoundException, as the
>> KuduStorageHandler is not available.
>>
>> I then tried to find the jar on my system, with the intention of copying
>> it to all my data nodes. Sadly I couldn't find the specific jar. I think
>> it exists on the system, as Impala is apparently using it.
>> For a test I've changed the 'StorageHandler' in the CREATE statement to
>> 'com.cloudera.kudu.hive.KuduStorageHandler_foo'. The CREATE statement
>> worked, and so did the SELECT from Impala, but it didn't return any
>> data. There was no error, as I expected. The test was just for the case
>> that Impala would somehow select data from Kudu without a correct
>> 'StorageHandler'. Apparently this is not the case, and Impala has access
>> to a 'com.cloudera.kudu.hive.KuduStorageHandler'.
>>
>> Long story, short question:
>> In which *.jar can I find 'com.cloudera.kudu.hive.KuduStorageHandler'?
>> Is copying the jar by hand to all nodes an appropriate way to enable
>> Spark to work with Kudu?
>> What about the beeline shell from Hive and the possibility of reading
>> from Kudu?
>>
>> My environment: Cloudera 5.7 with Kudu and impala-kudu installed from
>> parcels. I built a working python-kudu library successfully from
>> scratch (git).
>>
>> Thanks a lot!
>> Frank
>>
>
>

--
Todd Lipcon
Software Engineer, Cloudera
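On the int8 question in the thread above: a Python int carries no fixed width, so whatever writes it to Kudu has to pick a column type, and according to the quoted error the connector sent int64 to an int8 column. A plausible workaround, not confirmed anywhere in this thread, is to cast the column explicitly in PySpark before writing, e.g. `df.withColumn("id", df["id"].cast(ByteType()))` with `ByteType` from `pyspark.sql.types`; whether kudu-spark 1.2.0 then writes the value as int8 is an assumption to verify. Since a narrowing cast does not by itself guarantee that the values fit, the pure-Python sketch below checks them against Kudu's signed integer ranges first. The helpers `narrowest_kudu_int_type` and `check_fits` are hypothetical illustrations, not part of any Kudu or Spark API.

```python
# Kudu's signed integer column types and the value range each one can hold.
KUDU_INT_TYPES = [
    ("int8", -(2 ** 7), 2 ** 7 - 1),
    ("int16", -(2 ** 15), 2 ** 15 - 1),
    ("int32", -(2 ** 31), 2 ** 31 - 1),
    ("int64", -(2 ** 63), 2 ** 63 - 1),
]


def narrowest_kudu_int_type(values):
    """Return the smallest Kudu integer type that can hold every value.

    Hypothetical helper: Python ints have no fixed width, so the type must be
    chosen from the data or declared explicitly.
    """
    values = list(values)
    lo = min(values, default=0)
    hi = max(values, default=0)
    for name, type_min, type_max in KUDU_INT_TYPES:
        if type_min <= lo and hi <= type_max:
            return name
    raise OverflowError(f"values in [{lo}, {hi}] do not fit in a Kudu int64")


def check_fits(values, kudu_type):
    """Raise ValueError if any value lies outside the range of `kudu_type`.

    Hypothetical helper: intended to run before narrowing a column to match
    an existing Kudu schema (e.g. an int8 column).
    """
    ranges = {name: (lo, hi) for name, lo, hi in KUDU_INT_TYPES}
    lo, hi = ranges[kudu_type]
    bad = [v for v in values if not lo <= v <= hi]
    if bad:
        raise ValueError(
            f"{len(bad)} value(s) out of {kudu_type} range, e.g. {bad[0]}"
        )
```

For a column that already exists as int8 in Kudu, `check_fits(ids, "int8")` is the relevant call; `narrowest_kudu_int_type` is more useful when designing a new schema and deciding whether int8 is enough.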

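Regarding the search for the jar that contains 'com.cloudera.kudu.hive.KuduStorageHandler': a jar is a zip archive, and a class `a.b.C` is stored in it as the entry `a/b/C.class`, so a short scan can report which jars on a node actually contain the class. This is a sketch, not a Cloudera tool. The default search root `/opt/cloudera/parcels` is an assumption about where the parcels are installed (pass another directory as the first argument), and the scan can only find the class if it is packaged in a jar somewhere under that root.

```python
import os
import sys
import zipfile

# Class being looked for, as named in the CREATE TABLE statement.
CLASS_NAME = "com.cloudera.kudu.hive.KuduStorageHandler"


def class_entry(class_name):
    """Convert a Java class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def find_jars_with_class(root, class_name):
    """Walk `root` and yield the path of every .jar that contains `class_name`."""
    entry = class_entry(class_name)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".jar"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        yield path
            except (zipfile.BadZipFile, OSError):
                # Skip corrupt or unreadable jars instead of aborting the scan.
                continue


if __name__ == "__main__":
    # Assumption: CDH parcels are installed under /opt/cloudera/parcels.
    search_root = sys.argv[1] if len(sys.argv) > 1 else "/opt/cloudera/parcels"
    for jar_path in find_jars_with_class(search_root, CLASS_NAME):
        print(jar_path)
```

If the scan finds nothing, that suggests the handler is not shipped as a standalone jar under that root, rather than proving Impala has no access to it.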