[jira] [Updated] (HIVE-22001) AcidUtils.getAcidState() can fail if Cleaner is removing files at the same time

2019-08-06 Thread Jason Dere (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-22001:
--
   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Committed to master, thanks for review [~ashutoshc]

> AcidUtils.getAcidState() can fail if Cleaner is removing files at the same 
> time
> ---
>
> Key: HIVE-22001
> URL: https://issues.apache.org/jira/browse/HIVE-22001
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22001.1.patch
>
>
> Had one user hit the following error during getSplits
> {noformat}
> 2019-07-06T14:33:03,067 ERROR [4640181a-3eb7-4b3e-9a40-d7a8de9a570c 
> HiveServer2-HttpHandler-Pool: Thread-415519]: SessionState 
> (SessionState.java:printError(1247)) - Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1560947172646_2452_6199_00, diagnostics=[Vertex 
> vertex_1560947172646_2452_6199_00 [Map 1] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: hive_table initializer failed, 
> vertex=vertex_1560947172646_2452_6199_00 [Map 1], java.lang.RuntimeException: 
> ORC split generation failed with exception: java.io.FileNotFoundException: 
> File hdfs://path/to/hive_table/oiddatemmdd=20190706/delta_0987070_0987070 
> does not exist.
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1870)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1958)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:524)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:779)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.ExecutionException: 
> java.io.FileNotFoundException: File 
> hdfs://path/to/hive_table/oiddatemmdd=20190706/delta_0987070_0987070 does 
> not exist.
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1809)
> ... 17 more
> Caused by: java.io.FileNotFoundException: File 
> hdfs://path/to/hive_table/oiddatemmdd=20190706/delta_0987070_0987070 does 
> not exist.
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1059)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1119)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1116)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1126)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1953)
> at 
> 

[jira] [Updated] (HIVE-22001) AcidUtils.getAcidState() can fail if Cleaner is removing files at the same time

2019-08-05 Thread Jason Dere (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-22001:
--
Attachment: HIVE-22001.1.patch

> AcidUtils.getAcidState() can fail if Cleaner is removing files at the same 
> time
> ---
>
> Key: HIVE-22001
> URL: https://issues.apache.org/jira/browse/HIVE-22001
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-22001.1.patch
>
>
> Had one user hit the following error during getSplits
> {noformat}
> 2019-07-06T14:33:03,067 ERROR [4640181a-3eb7-4b3e-9a40-d7a8de9a570c 
> HiveServer2-HttpHandler-Pool: Thread-415519]: SessionState 
> (SessionState.java:printError(1247)) - Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1560947172646_2452_6199_00, diagnostics=[Vertex 
> vertex_1560947172646_2452_6199_00 [Map 1] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: hive_table initializer failed, 
> vertex=vertex_1560947172646_2452_6199_00 [Map 1], java.lang.RuntimeException: 
> ORC split generation failed with exception: java.io.FileNotFoundException: 
> File hdfs://path/to/hive_table/oiddatemmdd=20190706/delta_0987070_0987070 
> does not exist.
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1870)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1958)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:524)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:779)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.ExecutionException: 
> java.io.FileNotFoundException: File 
> hdfs://path/to/hive_table/oiddatemmdd=20190706/delta_0987070_0987070 does 
> not exist.
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1809)
> ... 17 more
> Caused by: java.io.FileNotFoundException: File 
> hdfs://path/to/hive_table/oiddatemmdd=20190706/delta_0987070_0987070 does 
> not exist.
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1059)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1119)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1116)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1126)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1953)
> at 
> org.apache.hadoop.hive.ql.io.AcidUtils$MetaDataFile.chooseFile(AcidUtils.java:1903)
> at 
> 

[jira] [Updated] (HIVE-22001) AcidUtils.getAcidState() can fail if Cleaner is removing files at the same time

2019-08-05 Thread Jason Dere (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-22001:
--
Status: Patch Available  (was: Open)

> AcidUtils.getAcidState() can fail if Cleaner is removing files at the same 
> time
> ---
>
> Key: HIVE-22001
> URL: https://issues.apache.org/jira/browse/HIVE-22001
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-22001.1.patch
>
>
> Had one user hit the following error during getSplits
> {noformat}
> 2019-07-06T14:33:03,067 ERROR [4640181a-3eb7-4b3e-9a40-d7a8de9a570c 
> HiveServer2-HttpHandler-Pool: Thread-415519]: SessionState 
> (SessionState.java:printError(1247)) - Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1560947172646_2452_6199_00, diagnostics=[Vertex 
> vertex_1560947172646_2452_6199_00 [Map 1] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: hive_table initializer failed, 
> vertex=vertex_1560947172646_2452_6199_00 [Map 1], java.lang.RuntimeException: 
> ORC split generation failed with exception: java.io.FileNotFoundException: 
> File hdfs://path/to/hive_table/oiddatemmdd=20190706/delta_0987070_0987070 
> does not exist.
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1870)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1958)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:524)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:779)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.ExecutionException: 
> java.io.FileNotFoundException: File 
> hdfs://path/to/hive_table/oiddatemmdd=20190706/delta_0987070_0987070 does 
> not exist.
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1809)
> ... 17 more
> Caused by: java.io.FileNotFoundException: File 
> hdfs://path/to/hive_table/oiddatemmdd=20190706/delta_0987070_0987070 does 
> not exist.
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1059)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1119)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1116)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1126)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1953)
> at 
> org.apache.hadoop.hive.ql.io.AcidUtils$MetaDataFile.chooseFile(AcidUtils.java:1903)
> at 
>