Hi all, when I run the query "select count(*) from action_carbondata where
starttimestr = 20180301;" against CarbonData, an error occurs. This is the
error info:
###################
0: jdbc:hive2://localhost:10000> select count(*) from action_carbondata where
starttimestr = 20180301;
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task
12 in stage 7.0 failed 4 times, most recent failure: Lost task 12.3 in stage
7.0 (TID 173, sz-pg-entanalytics-research-001.tendcloud.com, executor 1):
org.apache.spark.util.TaskCompletionListenerException:
org.apache.carbondata.core.scan.executor.exception.QueryExecutionException:
Previous exception in task: java.util.concurrent.ExecutionException:
java.util.concurrent.ExecutionException: java.io.IOException:
org.apache.thrift.protocol.TProtocolException: Required field 'data_chunk_list'
was not present! Struct: DataChunk3(data_chunk_list:null)
org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.updateScanner(AbstractDataBlockIterator.java:136)
org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.processNextBatch(DataBlockIteratorImpl.java:64)
org.apache.carbondata.core.scan.result.iterator.VectorDetailQueryResultIterator.processNextBatch(VectorDetailQueryResultIterator.java:46)
org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextBatch(VectorizedCarbonRecordReader.java:283)
org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextKeyValue(VectorizedCarbonRecordReader.java:171)
org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:391)
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown Source)
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
org.apache.spark.scheduler.Task.run(Task.scala:108)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
at org.apache.spark.scheduler.Task.run(Task.scala:118)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace: (state=,code=0)
###################
The create table statement:
CREATE TABLE action_carbondata(
cur_appversioncode integer,
cur_appversionname integer,
cur_browserid integer,
cur_carrierid integer,
cur_channelid integer,
cur_cityid integer,
cur_countryid integer,
cur_ip string,
cur_networkid integer,
cur_osid integer,
cur_provinceid integer,
deviceproductoffset long,
duration integer,
eventcount integer,
eventlabelid integer,
eventtypeid integer,
organizationid integer,
platformid integer,
productid integer,
relatedaccountproductoffset long,
sessionduration integer,
sessionid string,
sessionstarttime long,
sessionstatus integer,
sourceid integer,
starttime long,
starttimestr string )
partitioned by (eventid int)
STORED BY 'carbondata'
TBLPROPERTIES ('partition_type'='Hash','NUM_PARTITIONS'='39',
'SORT_COLUMNS'='productid,sourceid,starttimestr,platformid,organizationid,eventtypeid,eventlabelid,cur_channelid,cur_provinceid,cur_countryid,cur_cityid,cur_osid,cur_appversioncode,cur_appversionname,cur_carrierid,cur_networkid,cur_browserid,sessionstatus,cur_ip');
Sample values of the "starttimestr" column: 20180303, 20180304.
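One detail that may or may not be related to the Thrift error: "starttimestr" is declared as string in the DDL above, but the failing query compares it with the unquoted numeric literal 20180301, which forces an implicit cast. A variant worth ruling out (just a sanity check, not a confirmed fix for the missing 'data_chunk_list') keeps the comparison string-to-string:

```sql
-- starttimestr is a string column, so compare against a string literal;
-- this only rules out the implicit numeric cast, it does not by itself
-- explain the missing 'data_chunk_list' in the Thrift DataChunk3 struct.
select count(*) from action_carbondata where starttimestr = '20180301';
```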
Any advice is appreciated.