Iñigo Martinez created KYLIN-3555:
-------------------------------------

Summary: Garbage collection on HBase step fails with S3 selected as storage
Key: KYLIN-3555
URL: https://issues.apache.org/jira/browse/KYLIN-3555
Project: Kylin
Issue Type: Bug
Components: Job Engine
Affects Versions: v2.4.1
Reporter: Iñigo Martinez
Attachments: Screenshot from 2018-09-11 12-31-25.png
When building a cube with S3 selected as storage, the build process fails at the last step. Although S3 has been defined as storage, the cleanup task also tries to delete the job paths from HDFS, where, of course, the files do not exist. The log shows that the paths on s3://XXXXXXX-emr-kylin are dropped first. The step then repeats the cleanup on HDFS and fails with a FileNotFoundException when it lists the parent directory /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1, which does not exist there.

{code:java}
2018-09-11 12:27:56,311 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:78 : Drop HDFS path on FileSystem: s3://XXXXXXX-emr-kylin
2018-09-11 12:27:57,364 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:87 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/fact_distinct_columns is dropped.
2018-09-11 12:27:58,104 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:87 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/hfile is dropped.
2018-09-11 12:27:58,140 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:78 : Drop HDFS path on FileSystem: hdfs://ip-10-0-1-63.eu-west-1.compute.internal:8020
2018-09-11 12:27:58,142 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:90 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/fact_distinct_columns not exists.
2018-09-11 12:27:58,147 ERROR [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:68 : job:f8416975-eea6-4500-9cb7-4374f28451dc-15 execute finished with exception
java.io.FileNotFoundException: File /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1 does not exist.
	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:904)
	at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:964)
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:961)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:971)
	at org.apache.kylin.storage.hbase.steps.HDFSPathGarbageCollectionStep.dropHdfsPathOnCluster(HDFSPathGarbageCollectionStep.java:95)
	at org.apache.kylin.storage.hbase.steps.HDFSPathGarbageCollectionStep.doWork(HDFSPathGarbageCollectionStep.java:65)
	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
	at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:69)
	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
	at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
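A minimal sketch of a cleanup pass that tolerates a missing directory, assuming the fix is simply to check that the parent exists before listing it. It uses java.nio.file on the local filesystem as a stand-in for Hadoop's FileSystem so it runs without a cluster. The class and method names (TolerantPathCleanup, deleteRecursively, isExistingEmptyDir, dropPaths) are hypothetical and this is not Kylin's HDFSPathGarbageCollectionStep. Removing the parent once it becomes empty is also an assumption about why the step lists the parent at all; the stack trace only confirms that dropHdfsPathOnCluster calls listStatus on it.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: java.nio.file stands in for Hadoop's FileSystem.
// This is NOT Kylin's HDFSPathGarbageCollectionStep.
final class TolerantPathCleanup {

    // Deletes a file or directory tree, deepest entries first.
    // Returns false (instead of failing) when the path does not exist.
    static boolean deleteRecursively(Path path) throws IOException {
        if (!Files.exists(path)) {
            return false;
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(path)) {
            // Reverse lexicographic order puts children before their parents.
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : entries) {
            Files.delete(p);
        }
        return true;
    }

    // True only if dir exists and has no entries. The existence check is the
    // guard: listing a missing directory is what throws in the stack trace
    // (here Files.list would throw NoSuchFileException).
    static boolean isExistingEmptyDir(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return !entries.findAny().isPresent();
        }
    }

    // Drops each path, then removes its parent if that left it empty
    // (assumed behaviour). Returns how many paths were actually deleted.
    static int dropPaths(List<Path> paths) throws IOException {
        int dropped = 0;
        for (Path path : paths) {
            if (deleteRecursively(path)) {
                dropped++;
            }
            Path parent = path.getParent();
            if (parent != null && isExistingEmptyDir(parent)) {
                Files.delete(parent);
            }
        }
        return dropped;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("gc-sketch");
        Path cubeDir = root.resolve("cube");
        Files.createDirectories(cubeDir.resolve("hfile"));

        // First pass: an existing path is dropped, a missing one is skipped.
        int first = dropPaths(List.of(cubeDir.resolve("hfile"), cubeDir.resolve("fact_distinct_columns")));
        // Second pass: the parent never existed (like the HDFS pass); no exception.
        Path absent = root.resolve("other-fs").resolve("cube");
        int second = dropPaths(List.of(absent.resolve("hfile")));

        System.out.println(first + " " + second + " " + Files.exists(cubeDir)); // prints "1 0 false"
        Files.delete(root);
    }
}
{code}

The second call in main mirrors the failing HDFS pass: the job directory was never created on that filesystem, so the guarded listing returns false and the pass finishes with nothing dropped instead of aborting the whole build.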