Nishant Bangarwa created HIVE-21628:
---------------------------------------
             Summary: Use druid-s3-extensions when using S3 as druid deep storage
                 Key: HIVE-21628
                 URL: https://issues.apache.org/jira/browse/HIVE-21628
             Project: Hive
          Issue Type: Task
            Reporter: Nishant Bangarwa

Currently, DruidStorageHandler always uses druid-hdfs-extensions for S3 as well as HDFS. The HDFS extension pushes the segment to an intermediate directory and then renames it to the final path. This causes two problems:
1) The rename makes an additional copy of the data, which the druid-s3 extension avoids.
2) The rename may fail when the pushed file is not yet visible, due to the eventually consistent model of S3. Refer to the exception below:
{code}
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: No such file or directory: s3a://edws-nishant-test/druid/druid-1555443464-ggdf/data/workingDirectory/.staging-hive_20190417170114_a7fb3dcd-623b-46ca-bb87-9aac2fb50c6c/intermediateSegmentDir/default.cmv_basetable_d_7/11b3ceeb8d2843508336aac3347687cb/0_index.zip
	at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
	at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
	at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
	at org.apache.hadoop.hive.druid.io.DruidRecordWriter.pushSegments(DruidRecordWriter.java:184)
	... 22 more
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: No such file or directory: s3a://edws-nishant-test/druid/druid-1555443464-ggdf/data/workingDirectory/.staging-hive_20190417170114_a7fb3dcd-623b-46ca-bb87-9aac2fb50c6c/intermediateSegmentDir/default.cmv_basetable_d_7/11b3ceeb8d2843508336aac3347687cb/0_index.zip
	at org.apache.hive.druid.com.google.common.base.Throwables.propagate(Throwables.java:160)
	at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.mergeAndPush(AppenderatorImpl.java:665)
	at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.lambda$push$0(AppenderatorImpl.java:528)
	at org.apache.hive.druid.com.google.common.util.concurrent.Futures$1.apply(Futures.java:713)
	at org.apache.hive.druid.com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:861)
	... 3 more
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://edws-nishant-test/druid/druid-1555443464-ggdf/data/workingDirectory/.staging-hive_20190417170114_a7fb3dcd-623b-46ca-bb87-9aac2fb50c6c/intermediateSegmentDir/default.cmv_basetable_d_7/11b3ceeb8d2843508336aac3347687cb/0_index.zip
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
	at org.apache.hadoop.fs.FileSystem.getFileLinkStatus(FileSystem.java:2727)
	at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:1560)
	at org.apache.hadoop.fs.HadoopFsWrapper.rename(HadoopFsWrapper.java:53)
	at org.apache.hive.druid.io.druid.storage.hdfs.HdfsDataSegmentPusher.copyFilesWithChecks(HdfsDataSegmentPusher.java:168)
	at org.apache.hive.druid.io.druid.storage.hdfs.HdfsDataSegmentPusher.push(HdfsDataSegmentPusher.java:149)
	at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.lambda$mergeAndPush$3(AppenderatorImpl.java:647)
	at org.apache.hive.druid.io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:63)
	at org.apache.hive.druid.io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81)
	at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.mergeAndPush(AppenderatorImpl.java:638)
	... 6 more
{code}
This task is to add the ability to switch to using druid-s3-extensions when the S3A file scheme is used for the Druid storage directory.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
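For illustration, a minimal sketch of the scheme-based switch this task describes. The class and method names here are hypothetical, not from the eventual patch; the extension names follow the issue text:

```java
import java.net.URI;

// Hypothetical sketch (not the actual Hive change): pick which Druid
// extension to load based on the deep-storage directory's URI scheme,
// so that s3a:// paths use druid-s3-extensions instead of the HDFS pusher.
public class DruidExtensionSelector {

    /** Returns the Druid extension to load for the given deep-storage directory. */
    public static String extensionFor(String storageDirectory) {
        String scheme = URI.create(storageDirectory).getScheme();
        if ("s3a".equalsIgnoreCase(scheme)) {
            // druid-s3-extensions uploads the segment directly to its final
            // S3 key, avoiding the copy-then-rename done by the HDFS pusher
            // that fails under S3's eventual consistency.
            return "druid-s3-extensions";
        }
        // Any other scheme keeps the existing HDFS-based behaviour.
        return "druid-hdfs-extensions";
    }

    public static void main(String[] args) {
        System.out.println(extensionFor("s3a://some-bucket/druid"));
        System.out.println(extensionFor("hdfs://namenode:8020/apps/druid"));
    }
}
```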