Nishant Bangarwa created HIVE-21628:
---------------------------------------

             Summary: Use druid-s3-extensions when using S3 as druid deep 
storage
                 Key: HIVE-21628
                 URL: https://issues.apache.org/jira/browse/HIVE-21628
             Project: Hive
          Issue Type: Task
            Reporter: Nishant Bangarwa


Currently DruidStorageHandler always uses druid-hdfs-extensions, for S3 as well 
as HDFS.
The HDFS extension pushes the segment to an intermediate directory and then 
renames it to the final path. 
1) The rename causes an additional copy of the data, which druid-s3-extensions 
avoids.
2) The rename may fail when the pushed file is not yet visible due to the 
eventually consistent model of S3. Refer to the exception below - 

{code} 
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
java.io.FileNotFoundException: No such file or directory: 
s3a://edws-nishant-test/druid/druid-1555443464-ggdf/data/workingDirectory/.staging-hive_20190417170114_a7fb3dcd-623b-46ca-bb87-9aac2fb50c6c/intermediateSegmentDir/default.cmv_basetable_d_7/11b3ceeb8d2843508336aac3347687cb/0_index.zip
        at 
org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
        at 
org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
        at 
org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
        at 
org.apache.hadoop.hive.druid.io.DruidRecordWriter.pushSegments(DruidRecordWriter.java:184)
        ... 22 more
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: No such 
file or directory: 
s3a://edws-nishant-test/druid/druid-1555443464-ggdf/data/workingDirectory/.staging-hive_20190417170114_a7fb3dcd-623b-46ca-bb87-9aac2fb50c6c/intermediateSegmentDir/default.cmv_basetable_d_7/11b3ceeb8d2843508336aac3347687cb/0_index.zip
        at 
org.apache.hive.druid.com.google.common.base.Throwables.propagate(Throwables.java:160)
        at 
org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.mergeAndPush(AppenderatorImpl.java:665)
        at 
org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.lambda$push$0(AppenderatorImpl.java:528)
        at 
org.apache.hive.druid.com.google.common.util.concurrent.Futures$1.apply(Futures.java:713)
        at 
org.apache.hive.druid.com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:861)
        ... 3 more
Caused by: java.io.FileNotFoundException: No such file or directory: 
s3a://edws-nishant-test/druid/druid-1555443464-ggdf/data/workingDirectory/.staging-hive_20190417170114_a7fb3dcd-623b-46ca-bb87-9aac2fb50c6c/intermediateSegmentDir/default.cmv_basetable_d_7/11b3ceeb8d2843508336aac3347687cb/0_index.zip
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
        at 
org.apache.hadoop.fs.FileSystem.getFileLinkStatus(FileSystem.java:2727)
        at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:1560)
        at org.apache.hadoop.fs.HadoopFsWrapper.rename(HadoopFsWrapper.java:53)
        at 
org.apache.hive.druid.io.druid.storage.hdfs.HdfsDataSegmentPusher.copyFilesWithChecks(HdfsDataSegmentPusher.java:168)
        at 
org.apache.hive.druid.io.druid.storage.hdfs.HdfsDataSegmentPusher.push(HdfsDataSegmentPusher.java:149)
        at 
org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.lambda$mergeAndPush$3(AppenderatorImpl.java:647)
        at 
org.apache.hive.druid.io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:63)
        at 
org.apache.hive.druid.io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81)
        at 
org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.mergeAndPush(AppenderatorImpl.java:638)
        ... 6 more
{code}   
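The push-then-rename flow at the root of the trace can be sketched with plain java.nio (a simplified illustration of the staged write and rename visible in the stack frames, not the actual HdfsDataSegmentPusher code): the segment zip is first written under an intermediate directory, then moved to the final path, and it is the existence check inside that second step which an eventually consistent store can fail.

{code}
import java.io.IOException;
import java.nio.file.*;

public class PushThenRename {
    // Simplified sketch of the HDFS-pusher flow seen in the stack trace:
    // 1) write the segment zip into an intermediate directory,
    // 2) rename it to the final segment path.
    // On HDFS the rename is a cheap metadata operation; on S3A it becomes a
    // copy plus delete, and the pre-rename getFileStatus can miss a
    // just-written object under S3's eventually consistent listing.
    static Path pushSegment(byte[] segmentZip, Path workingDir, Path finalDir)
            throws IOException {
        Path intermediate = Files
                .createDirectories(workingDir.resolve("intermediateSegmentDir"))
                .resolve("0_index.zip");
        Files.write(intermediate, segmentZip);   // step 1: staged write

        Path target = Files.createDirectories(finalDir).resolve("0_index.zip");
        // step 2: the rename that throws FileNotFoundException on S3A when
        // the staged object is not yet visible
        return Files.move(intermediate, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("druid-push");
        Path pushed = pushSegment(new byte[]{1, 2, 3},
                tmp.resolve("working"), tmp.resolve("segments"));
        System.out.println(Files.exists(pushed));
    }
}
{code}

On a local filesystem the move always succeeds because the write is immediately visible; the S3A failure above arises precisely because that visibility guarantee does not hold.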

This task is to add the ability to switch to druid-s3-extensions when the S3A 
file scheme is used for the Druid storage directory. 
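A minimal sketch of the intended switch, assuming selection is keyed on the URI scheme of the configured storage directory (the class and method names here are illustrative, not actual Hive or Druid APIs):

{code}
import java.net.URI;

public class DeepStorageExtensionSelector {
    // Hypothetical helper: choose the Druid extension from the
    // deep-storage directory's URI scheme.
    static String selectExtension(String storageDirectory) {
        String scheme = URI.create(storageDirectory).getScheme();
        if ("s3a".equals(scheme) || "s3n".equals(scheme) || "s3".equals(scheme)) {
            return "druid-s3-extensions";
        }
        return "druid-hdfs-extensions";
    }

    public static void main(String[] args) {
        System.out.println(selectExtension("s3a://bucket/druid/segments"));
        System.out.println(selectExtension("hdfs://namenode:8020/druid/segments"));
    }
}
{code}

With this kind of dispatch, an s3a:// storage directory would load the S3 pusher, which uploads the segment directly to its final key and avoids both the extra copy and the rename.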



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
