Hi,

Which version of CarbonData and Spark are you using? How much data are you loading, and what is the machine configuration?
I have tried loading catalog_returns with 20 MB of data on my local machine and it succeeded. I used the latest master branch and the spark-2.1 version. Also, please send the complete log; the log excerpt you provided does not show the actual cause.

Regards,
Ravindra.

On 24 August 2017 at 14:02, lk_hadoop <[email protected]> wrote:

> hi, all
> I want to test CarbonData using TPC-DS data. When I try to load the table catalog_returns, I get this error:
>
> org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: There is an unexpected error: unable to generate the mdkey
>         at org.apache.carbondata.processing.newflow.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:125)
>         at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:48)
>         at org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD$$anon$1.<init>(NewCarbonDataLoadRDD.scala:243)
>         at org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD.compute(NewCarbonDataLoadRDD.scala:220)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>         at org.apache.spark.scheduler.Task.run(Task.scala:99)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: unable to generate the mdkey
>         at org.apache.carbondata.processing.newflow.steps.DataWriterProcessorStepImpl.processBatch(DataWriterProcessorStepImpl.java:181)
>         at org.apache.carbondata.processing.newflow.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:111)
>         ... 11 more
> Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@67098e0 rejected from java.util.concurrent.ThreadPoolExecutor@5b91b608[Shutting down, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 24]
>         at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
>         at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
>         at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
>         at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
>         at org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.addDataToStore(CarbonFactDataHandlerColumnar.java:466)
>         at org.apache.carbondata.processing.newflow.steps.DataWriterProcessorStepImpl.processBatch(DataWriterProcessorStepImpl.java:178)
>         ... 12 more
>
> I created the table with this SQL:
>
> create table if not exists tpcds_carbon.catalog_returns
> (
>   cr_returned_date_sk int,
>   cr_returned_time_sk int,
>   cr_item_sk int,
>   cr_refunded_customer_sk int,
>   cr_refunded_cdemo_sk int,
>   cr_refunded_hdemo_sk int,
>   cr_refunded_addr_sk int,
>   cr_returning_customer_sk int,
>   cr_returning_cdemo_sk int,
>   cr_returning_hdemo_sk int,
>   cr_returning_addr_sk int,
>   cr_call_center_sk int,
>   cr_catalog_page_sk int,
>   cr_ship_mode_sk int,
>   cr_warehouse_sk int,
>   cr_reason_sk int,
>   cr_order_number int,
>   cr_return_quantity int,
>   cr_return_amount decimal(7,2),
>   cr_return_tax decimal(7,2),
>   cr_return_amt_inc_tax decimal(7,2),
>   cr_fee decimal(7,2),
>   cr_return_ship_cost decimal(7,2),
>   cr_refunded_cash decimal(7,2),
>   cr_reversed_charge decimal(7,2),
>   cr_store_credit decimal(7,2),
>   cr_net_loss decimal(7,2)
> ) STORED BY 'carbondata'
> TBLPROPERTIES ('DICTIONARY_INCLUDE'='cr_item_sk,cr_order_number')
> -- because these two columns are the real PK
>
> and I load the data with:
>
> carbon.sql("load data inpath 'hdfs://AAA:9000/tpcds/source/catalog_returns/catalog_returns_1_4.dat' into table carbon_catalog_returns2 OPTIONS('DELIMITER'='|','fileheader'='cr_returned_date_sk,cr_returned_time_sk,cr_item_sk,cr_refunded_customer_sk,cr_refunded_cdemo_sk,cr_refunded_hdemo_sk,cr_refunded_addr_sk,cr_returning_customer_sk,cr_returning_cdemo_sk,cr_returning_hdemo_sk,cr_returning_addr_sk,cr_call_center_sk,cr_catalog_page_sk,cr_ship_mode_sk,cr_warehouse_sk,cr_reason_sk,cr_order_number,cr_return_quantity,cr_return_amount,cr_return_tax,cr_return_amt_inc_tax,cr_fee,cr_return_ship_cost,cr_refunded_cash,cr_reversed_charge,cr_store_credit,cr_net_loss')")
>
> Does anyone know what is wrong?
>
> 2017-08-24
> ------------------------------
> lk_hadoop

--
Thanks & Regards,
Ravi
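One more observation on the trace itself: the root cause at the bottom is a plain JDK mechanism rather than anything CarbonData-specific. A task submitted to a ThreadPoolExecutor that has already begun shutdown is refused by the default AbortPolicy with exactly this RejectedExecutionException, which suggests something shut the writer's executor down before the last batch was submitted. A minimal standalone sketch of that mechanism (plain JDK code, illustrative names only, not CarbonData internals):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectAfterShutdown {
    public static void main(String[] args) {
        // A single-threaded pool, matching "pool size = 1" in the trace.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

        pool.submit(() -> { });  // accepted while the pool is running

        pool.shutdown();         // pool enters the "Shutting down" state

        try {
            // The default AbortPolicy rejects any task submitted after shutdown.
            pool.submit(() -> { });
            System.out.println("accepted");
        } catch (RejectedExecutionException e) {
            System.out.println("rejected");
        }
    }
}
```

Seen that way, the "unable to generate the mdkey" message is probably a secondary symptom: some earlier failure likely triggered executor shutdown first, so the primary error should appear higher up in the complete load log, which is why the full log is needed.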
