Chetan Bhat created CARBONDATA-4321:
---------------------------------------

             Summary: Major Compaction of a table with multiple big data loads 
each having different sort scopes fails
                 Key: CARBONDATA-4321
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4321
             Project: CarbonData
          Issue Type: Bug
          Components: data-load
    Affects Versions: 2.3.0
         Environment: SUSE/CentOS, Spark 3.1.1
            Reporter: Chetan Bhat
         Attachments: Failure_Logs.txt

Test Steps:

1. From Spark beeline, create a table with more than 100 columns, using the gzip compression format.
2. Run 3 big data loads into the table, each with a different sort scope (global_sort, local_sort, no_sort).
3. Execute major compaction on the table.

create table JL_r3
(
p_cap_time String,
city String,
product_code String,
user_base_station String,
user_belong_area_code String,
user_num String,
user_imsi String,
user_id String,
user_msisdn String,
dim1 String,
dim2 String,
dim3 String,
dim4 String,
dim5 String,
dim6 String,
dim7 String,
dim8 String,
dim9 String,
dim10 String,
dim11 String,
dim12 String,
dim13 String,
dim14 String,
dim15 String,
dim16 String,
dim17 String,
dim18 String,
dim19 String,
dim20 String,
dim21 String,
dim22 String,
dim23 String,
dim24 String,
dim25 String,
dim26 String,
dim27 String,
dim28 String,
dim29 String,
dim30 String,
dim31 String,
dim32 String,
dim33 String,
dim34 String,
dim35 String,
dim36 String,
dim37 String,
dim38 String,
dim39 String,
dim40 String,
dim41 String,
dim42 String,
dim43 String,
dim44 String,
dim45 String,
dim46 String,
dim47 String,
dim48 String,
dim49 String,
dim50 String,
dim51 String,
dim52 String,
dim53 String,
dim54 String,
dim55 String,
dim56 String,
dim57 String,
dim58 String,
dim59 String,
dim60 String,
dim61 String,
dim62 String,
dim63 String,
dim64 String,
dim65 String,
dim66 String,
dim67 String,
dim68 String,
dim69 String,
dim70 String,
dim71 String,
dim72 String,
dim73 String,
dim74 String,
dim75 String,
dim76 String,
dim77 String,
dim78 String,
dim79 String,
dim80 String,
dim81 String,
M1 double,
M2 double,
M3 double,
M4 double,
M5 double,
M6 double,
M7 double,
M8 double,
M9 double,
M10 double )
stored as carbondata
TBLPROPERTIES('table_blocksize'='256','sort_columns'='dim81','carbon.column.compressor'='gzip');
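
As a sanity check when reproducing (a suggested step, not part of the original scenario), the effective table properties can be confirmed with standard Spark SQL; the output should show sort_columns = dim81, table_blocksize = 256 and the gzip column compressor:

-- Suggested verification (not in the original repro): inspect the stored table properties
DESCRIBE FORMATTED JL_r3;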

0: jdbc:hive2://10.21.19.14:23040/default> LOAD DATA INPATH 'hdfs://hacluster/chetan/Bigdata_bulk.csv'
INTO TABLE JL_r3
OPTIONS('sort_scope'='global_sort', 'DELIMITER'=',', 'QUOTECHAR'='"',
'BAD_RECORDS_ACTION'='FORCE', 'BAD_RECORDS_LOGGER_ENABLE'='TRUE', 'IS_EMPTY_DATA_BAD_RECORD'='TRUE',
'FILEHEADER'='p_cap_time,city,product_code,user_base_station,user_belong_area_code,user_num,user_imsi,user_id,user_msisdn,dim1,dim2,dim3,dim4,dim5,dim6,dim7,dim8,dim9,dim10,dim11,dim12,dim13,dim14,dim15,dim16,dim17,dim18,dim19,dim20,dim21,dim22,dim23,dim24,dim25,dim26,dim27,dim28,dim29,dim30,dim31,dim32,dim33,dim34,dim35,dim36,dim37,dim38,dim39,dim40,dim41,dim42,dim43,dim44,dim45,dim46,dim47,dim48,dim49,dim50,dim51,dim52,dim53,dim54,dim55,dim56,dim57,dim58,dim59,dim60,dim61,dim62,dim63,dim64,dim65,dim66,dim67,dim68,dim69,dim70,dim71,dim72,dim73,dim74,dim75,dim76,dim77,dim78,dim79,dim80,dim81,M1,M2,M3,M4,M5,M6,M7,M8,M9,M10');
+-------------+
| Segment ID  |
+-------------+
| 0           |
+-------------+
1 row selected (41.011 seconds)
0: jdbc:hive2://10.21.19.14:23040/default> LOAD DATA INPATH 'hdfs://hacluster/chetan/Bigdata_bulk.csv'
INTO TABLE JL_r3
OPTIONS('sort_scope'='local_sort', 'DELIMITER'=',', 'QUOTECHAR'='"',
'BAD_RECORDS_ACTION'='FORCE', 'BAD_RECORDS_LOGGER_ENABLE'='TRUE', 'IS_EMPTY_DATA_BAD_RECORD'='TRUE',
'FILEHEADER'='p_cap_time,city,product_code,user_base_station,user_belong_area_code,user_num,user_imsi,user_id,user_msisdn,dim1,dim2,dim3,dim4,dim5,dim6,dim7,dim8,dim9,dim10,dim11,dim12,dim13,dim14,dim15,dim16,dim17,dim18,dim19,dim20,dim21,dim22,dim23,dim24,dim25,dim26,dim27,dim28,dim29,dim30,dim31,dim32,dim33,dim34,dim35,dim36,dim37,dim38,dim39,dim40,dim41,dim42,dim43,dim44,dim45,dim46,dim47,dim48,dim49,dim50,dim51,dim52,dim53,dim54,dim55,dim56,dim57,dim58,dim59,dim60,dim61,dim62,dim63,dim64,dim65,dim66,dim67,dim68,dim69,dim70,dim71,dim72,dim73,dim74,dim75,dim76,dim77,dim78,dim79,dim80,dim81,M1,M2,M3,M4,M5,M6,M7,M8,M9,M10');
+-------------+
| Segment ID  |
+-------------+
| 1           |
+-------------+
1 row selected (17.094 seconds)
0: jdbc:hive2://10.21.19.14:23040/default> LOAD DATA INPATH 'hdfs://hacluster/chetan/Bigdata_bulk.csv'
INTO TABLE JL_r3
OPTIONS('sort_scope'='no_sort', 'DELIMITER'=',', 'QUOTECHAR'='"',
'BAD_RECORDS_ACTION'='FORCE', 'BAD_RECORDS_LOGGER_ENABLE'='TRUE', 'IS_EMPTY_DATA_BAD_RECORD'='TRUE',
'FILEHEADER'='p_cap_time,city,product_code,user_base_station,user_belong_area_code,user_num,user_imsi,user_id,user_msisdn,dim1,dim2,dim3,dim4,dim5,dim6,dim7,dim8,dim9,dim10,dim11,dim12,dim13,dim14,dim15,dim16,dim17,dim18,dim19,dim20,dim21,dim22,dim23,dim24,dim25,dim26,dim27,dim28,dim29,dim30,dim31,dim32,dim33,dim34,dim35,dim36,dim37,dim38,dim39,dim40,dim41,dim42,dim43,dim44,dim45,dim46,dim47,dim48,dim49,dim50,dim51,dim52,dim53,dim54,dim55,dim56,dim57,dim58,dim59,dim60,dim61,dim62,dim63,dim64,dim65,dim66,dim67,dim68,dim69,dim70,dim71,dim72,dim73,dim74,dim75,dim76,dim77,dim78,dim79,dim80,dim81,M1,M2,M3,M4,M5,M6,M7,M8,M9,M10');
+-------------+
| Segment ID  |
+-------------+
| 2           |
+-------------+
1 row selected (9.062 seconds)
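At this point the table has three segments (0, 1, 2), one per load, each written with a different sort scope. Before compacting, this can be confirmed with CarbonData's segment listing (a suggested check, not part of the original session):

-- Suggested check (not in the original log): list the segments created by the three loads
SHOW SEGMENTS FOR TABLE JL_r3;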
0: jdbc:hive2://10.21.19.14:23040/default> alter table JL_r3 compact 'major';

 

Issue: Major compaction of a table with multiple big data loads, each having a different sort scope, fails. Per the stack trace below, the failure surfaces in CompactionResultSortProcessor.readAndLoadDataFromSortTempFiles (the sort-based result processor that compaction uses for such segments), after the page-consumer thread in CarbonFactDataHandlerColumnar is interrupted while closing the data handler.

0: jdbc:hive2://10.21.19.14:23040/default> alter table JL_r3 compact 'major';
Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
org.apache.spark.sql.AnalysisException: Compaction failed. Please check logs 
for more info. Exception in compaction Job aborted due to stage failure: Task 0 
in stage 2813.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
2813.0 (TID 136282) (vm1 executor driver): java.lang.Exception: Error in close 
data handler
        at 
org.apache.carbondata.processing.merger.CompactionResultSortProcessor.readAndLoadDataFromSortTempFiles(CompactionResultSortProcessor.java:407)
        at 
org.apache.carbondata.processing.merger.CompactionResultSortProcessor.execute(CompactionResultSortProcessor.java:186)
        at 
org.apache.carbondata.spark.rdd.CarbonMergerRDD$$anon$1.<init>(CarbonMergerRDD.scala:258)
        at 
org.apache.carbondata.spark.rdd.CarbonMergerRDD.internalCompute(CarbonMergerRDD.scala:120)
        at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:84)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: 
org.apache.carbondata.core.datastore.exception.CarbonDataWriterException:
        at 
org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.processWriteTaskSubmitList(CarbonFactDataHandlerColumnar.java:475)
        at 
org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.closeHandler(CarbonFactDataHandlerColumnar.java:494)
        at 
org.apache.carbondata.processing.merger.CompactionResultSortProcessor.readAndLoadDataFromSortTempFiles(CompactionResultSortProcessor.java:404)
        ... 14 more
Caused by: java.util.concurrent.ExecutionException: 
org.apache.carbondata.core.datastore.exception.CarbonDataWriterException:
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at 
org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.processWriteTaskSubmitList(CarbonFactDataHandlerColumnar.java:472)
        ... 16 more
Caused by: 
org.apache.carbondata.core.datastore.exception.CarbonDataWriterException:
        at 
org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar$Consumer.call(CarbonFactDataHandlerColumnar.java:685)
        at 
org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar$Consumer.call(CarbonFactDataHandlerColumnar.java:656)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        ... 3 more
Caused by: java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at 
org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar$TablePageList.get(CarbonFactDataHandlerColumnar.java:584)
        at 
org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar$Consumer.call(CarbonFactDataHandlerColumnar.java:674)
        ... 5 more

Driver stacktrace:
        at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:361)
        at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263)
        at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at 
org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
        at 
org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
        at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
        at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
        at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
        at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.sql.AnalysisException: Compaction failed. Please 
check logs for more info. Exception in compaction Job aborted due to stage 
failure: Task 0 in stage 2813.0 failed 1 times, most recent failure: Lost task 
0.0 in stage 2813.0 (TID 136282) (vm1 executor driver): java.lang.Exception: 
Error in close data handler
        at 
org.apache.carbondata.processing.merger.CompactionResultSortProcessor.readAndLoadDataFromSortTempFiles(CompactionResultSortProcessor.java:407)
        at 
org.apache.carbondata.processing.merger.CompactionResultSortProcessor.execute(CompactionResultSortProcessor.java:186)
        at 
org.apache.carbondata.spark.rdd.CarbonMergerRDD$$anon$1.<init>(CarbonMergerRDD.scala:258)
        at 
org.apache.carbondata.spark.rdd.CarbonMergerRDD.internalCompute(CarbonMergerRDD.scala:120)
        at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:84)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: 
org.apache.carbondata.core.datastore.exception.CarbonDataWriterException:
        at 
org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.processWriteTaskSubmitList(CarbonFactDataHandlerColumnar.java:475)
        at 
org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.closeHandler(CarbonFactDataHandlerColumnar.java:494)
        at 
org.apache.carbondata.processing.merger.CompactionResultSortProcessor.readAndLoadDataFromSortTempFiles(CompactionResultSortProcessor.java:404)
        ... 14 more
Caused by: java.util.concurrent.ExecutionException: 
org.apache.carbondata.core.datastore.exception.CarbonDataWriterException:
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at 
org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.processWriteTaskSubmitList(CarbonFactDataHandlerColumnar.java:472)
        ... 16 more
Caused by: 
org.apache.carbondata.core.datastore.exception.CarbonDataWriterException:
        at 
org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar$Consumer.call(CarbonFactDataHandlerColumnar.java:685)
        at 
org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar$Consumer.call(CarbonFactDataHandlerColumnar.java:656)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        ... 3 more
Caused by: java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at 
org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar$TablePageList.get(CarbonFactDataHandlerColumnar.java:584)
        at 
org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar$Consumer.call(CarbonFactDataHandlerColumnar.java:674)
        ... 5 more

Driver stacktrace:
        at 
org.apache.spark.sql.util.CarbonException$.analysisException(CarbonException.scala:23)
        at 
org.apache.spark.sql.execution.command.management.CarbonAlterTableCompactionCommand.$anonfun$processData$3(CarbonAlterTableCompactionCommand.scala:197)
        at org.apache.carbondata.events.package$.withEvents(package.scala:27)
        at 
org.apache.spark.sql.execution.command.management.CarbonAlterTableCompactionCommand.processData(CarbonAlterTableCompactionCommand.scala:185)
        at 
org.apache.spark.sql.execution.command.AtomicRunnableCommand.$anonfun$run$3(package.scala:162)
        at 
org.apache.spark.sql.execution.command.Auditable.runWithAudit(package.scala:118)
        at 
org.apache.spark.sql.execution.command.Auditable.runWithAudit$(package.scala:114)
        at 
org.apache.spark.sql.execution.command.AtomicRunnableCommand.runWithAudit(package.scala:155)
        at 
org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:168)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
        at 
org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
        at 
org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
        at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
        at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
        at 
org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:650)
        at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:325)
        ... 16 more (state=,code=0)

 

Expected Result: The compaction should succeed.
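
As a triage aid (an untested suggestion, using CarbonData's documented CUSTOM compaction syntax), compacting smaller groups of segments may help narrow down which combination of sort scopes triggers the failure:

-- Untested triage sketch: compact segment pairs separately to isolate the failing combination
ALTER TABLE JL_r3 COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (0,1);
ALTER TABLE JL_r3 COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (1,2);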

 


