Hi all,
I want to test CarbonData with TPC-DS data. When I try to load the catalog_returns table, I get this error:
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: There is an unexpected error: unable to generate the mdkey
    at org.apache.carbondata.processing.newflow.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:125)
    at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:48)
    at org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD$$anon$1.<init>(NewCarbonDataLoadRDD.scala:243)
    at org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD.compute(NewCarbonDataLoadRDD.scala:220)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: unable to generate the mdkey
    at org.apache.carbondata.processing.newflow.steps.DataWriterProcessorStepImpl.processBatch(DataWriterProcessorStepImpl.java:181)
    at org.apache.carbondata.processing.newflow.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:111)
    ... 11 more
Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@67098e0 rejected from java.util.concurrent.ThreadPoolExecutor@5b91b608[Shutting down, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 24]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
    at org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.addDataToStore(CarbonFactDataHandlerColumnar.java:466)
    at org.apache.carbondata.processing.newflow.steps.DataWriterProcessorStepImpl.processBatch(DataWriterProcessorStepImpl.java:178)
    ... 12 more
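
If I read the root cause right, the data writer step is handing work to a producer/consumer thread pool that has already been shut down. Just to illustrate what that JDK exception means (this snippet is only an illustration, nothing CarbonData-specific):

import java.util.concurrent.{Executors, RejectedExecutionException}

// a single-thread pool, like the "pool size = 1" executor in the trace above
val pool = Executors.newFixedThreadPool(1)
pool.shutdown()   // pool enters the "Shutting down" state

try {
  // submitting after shutdown is refused by the default AbortPolicy,
  // which is exactly the RejectedExecutionException shown above
  pool.submit(new Runnable { def run(): Unit = println("never runs") })
} catch {
  case e: RejectedExecutionException => println("rejected: " + e.getMessage)
}

So it looks like the writer's internal pool was shut down while addDataToStore was still submitting batches.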
The table was created with this SQL:
create table if not exists tpcds_carbon.catalog_returns
(
cr_returned_date_sk int ,
cr_returned_time_sk int ,
cr_item_sk int ,
cr_refunded_customer_sk int ,
cr_refunded_cdemo_sk int ,
cr_refunded_hdemo_sk int ,
cr_refunded_addr_sk int ,
cr_returning_customer_sk int ,
cr_returning_cdemo_sk int ,
cr_returning_hdemo_sk int ,
cr_returning_addr_sk int ,
cr_call_center_sk int ,
cr_catalog_page_sk int ,
cr_ship_mode_sk int ,
cr_warehouse_sk int ,
cr_reason_sk int ,
cr_order_number int ,
cr_return_quantity int ,
cr_return_amount decimal(7,2) ,
cr_return_tax decimal(7,2) ,
cr_return_amt_inc_tax decimal(7,2) ,
cr_fee decimal(7,2) ,
cr_return_ship_cost decimal(7,2) ,
cr_refunded_cash decimal(7,2) ,
cr_reversed_charge decimal(7,2) ,
cr_store_credit decimal(7,2) ,
cr_net_loss decimal(7,2)
) STORED BY 'carbondata'
TBLPROPERTIES ('DICTIONARY_INCLUDE'='cr_item_sk,cr_order_number')

(cr_item_sk and cr_order_number are in DICTIONARY_INCLUDE because these two columns form the real primary key.)
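
If it matters, the dictionary setting can be double-checked from the same session with something like:

carbon.sql("DESCRIBE FORMATTED tpcds_carbon.catalog_returns").show(100, false)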
And I load the data with:
carbon.sql("load data inpath
'hdfs://AAA:9000/tpcds/source/catalog_returns/catalog_returns_1_4.dat' into
table carbon_catalog_returns2
OPTIONS('DELIMITER'='|','fileheader'='cr_returned_date_sk,cr_returned_time_sk,cr_item_sk,cr_refunded_customer_sk,cr_refunded_cdemo_sk,cr_refunded_hdemo_sk,cr_refunded_addr_sk,cr_returning_customer_sk,cr_returning_cdemo_sk,cr_returning_hdemo_sk,cr_returning_addr_sk,cr_call_center_sk,cr_catalog_page_sk,cr_ship_mode_sk,cr_warehouse_sk,cr_reason_sk,cr_order_number,cr_return_quantity,cr_return_amount,cr_return_tax,cr_return_amt_inc_tax,cr_fee,cr_return_ship_cost,cr_refunded_cash,cr_reversed_charge,cr_store_credit,cr_net_loss')")
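
For completeness, the "carbon" session used above is a CarbonSession created roughly like this (the store path here is a placeholder, not the real one):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

// placeholder store path, for illustration only
val carbon = SparkSession
  .builder()
  .appName("tpcds-carbondata-load")
  .getOrCreateCarbonSession("hdfs://AAA:9000/carbon/store")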
Does anyone know what went wrong?
2017-08-24
lk_hadoop