Re: Select" query failed when executing "COMPACT" and "CLEAN".
Hi,

I can't reproduce it with Spark 2.1 + CarbonData 1.1.1. Maybe the compaction had not completely finished when you ran the query. Have you tried executing the query in the same shell, after the compaction and clean have finished?

Regards,
Liang

yixu2001 wrote
> dev
>
> spark2.1 + carbondata1.1.1
>
> "Select" query failed when executing "COMPACT" and "CLEAN".
>
> After my cluster had been running for a period of time, many fragments
> had accumulated in it. Then I ran the following two commands:
> "ALTER table e_carbon.offer COMPACT 'MAJOR';"
> "CLEAN FILES FOR TABLE e_carbon.offer;"
>
> While the above two commands were being executed, I ran a query
> statement in another shell, such as:
> "select offer_id,count(*) from e_carbon.offer group by offer_id having
> count(*)>1"
>
> It failed with the following information:
> [...]
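For reference, a minimal sketch of that sequence, run in a single spark-shell session and assuming the CarbonContext cc from the original post (whether each call blocks until the command fully completes is an assumption here):

    // Run compaction and clean first, in order; each cc.sql call is
    // expected to return only after its command has completed.
    cc.sql("ALTER TABLE e_carbon.offer COMPACT 'MAJOR'")
    cc.sql("CLEAN FILES FOR TABLE e_carbon.offer")

    // Query only after both commands have returned, so the scan cannot
    // pick up segment files that CLEAN FILES is about to delete.
    cc.sql("select offer_id, count(*) from e_carbon.offer group by offer_id having count(*) > 1").show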
Select" query failed when executing "COMPACT" and "CLEAN".
dev

spark2.1 + carbondata1.1.1

"Select" query failed when executing "COMPACT" and "CLEAN".

After my cluster had been running for a period of time, many fragments had accumulated in it. Then I ran the following two commands:

"ALTER table e_carbon.offer COMPACT 'MAJOR';"
"CLEAN FILES FOR TABLE e_carbon.offer;"

While the above two commands were being executed, I ran a query statement in another shell, such as:

"select offer_id,count(*) from e_carbon.offer group by offer_id having count(*)>1"

It failed with the following information:

"scala> cc.sql("select offer_id,count(*) from e_carbon.offer group by offer_id having count(*)>1").show;

[Stage 163:==> (60 + 9) / 173]18/01/16 16:38:17 WARN scheduler.TaskSetManager: Lost task 149.0 in stage 163.0 (TID 6854, HDD012, executor 106): org.apache.spark.util.TaskCompletionListenerException: java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File does not exist: /user/e_carbon/public/carbon.store/e_carbon/offer_inst_c/Fact/Part0/Segment_22.15/part-0-7_batchno0-0-1516091074114.carbondata
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:652)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)

    at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:105)
    at org.apache.spark.scheduler.Task.run(Task.scala:112)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

18/01/16 16:38:17 WARN scheduler.TaskSetManager: Lost task 150.0 in stage 163.0 (TID 6855, HDD013, executor 95): org.apache.carbondata.core.datastore.exception.IndexBuilderException: Block B-tree loading failed
    at org.apache.carbondata.core.datastore.BlockIndexStore.fillLoadedBlocks(BlockIndexStore.java:286)
    at org.apache.carbondata.core.datastore.BlockIndexStore.getAll(BlockIndexStore.java:194)
    at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:132)
    at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:187)
    at org.apache.carbondata.core.scan.executor.impl.VectorDetailQueryExecutor.execute(VectorDetailQueryExecutor.java:36)
    at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.initialize(VectorizedCarbonRecordReader.java:112)
    at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:213)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolEx
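If this is indeed a race between the query and CLEAN FILES, one way to check from the second shell is to inspect segment status before querying. An illustrative sketch, not taken from this thread (the status names follow the CarbonData documentation):

    // List the segments of the table. After a MAJOR compaction, the merged
    // source segments are shown as "Compacted" until CLEAN FILES physically
    // removes their files; a query that starts while that removal is in
    // flight can hit the FileNotFoundException shown above.
    cc.sql("SHOW SEGMENTS FOR TABLE e_carbon.offer").show(100, false)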