[jira] [Updated] (HIVE-17404) Orc split generation cache does not handle files without file tail

2019-03-19 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17404:
-
Attachment: HIVE-17404.2.patch

> Orc split generation cache does not handle files without file tail
> --
>
> Key: HIVE-17404
> URL: https://issues.apache.org/jira/browse/HIVE-17404
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Prasanth Jayachandran
>Assignee: Aditya Shah
>Priority: Critical
> Attachments: HIVE-17404.2.patch, HIVE-17404.patch
>
>
> Some old files do not have Orc FileTail. If file tail does not exist, split 
> generation should fallback to old way of storing footers. 
> This can result in exceptions like below
> {code}
> ORC split generation failed with exception: Malformed ORC file. Invalid 
> postscript length 9
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1735)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1822)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:450)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:569)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.orc.FileFormatException: Malformed ORC file. Invalid 
> postscript length 9
>   at org.apache.orc.impl.ReaderImpl.ensureOrcFooter(ReaderImpl.java:297)
>   at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:470)
>   at 
> org.apache.hadoop.hive.ql.io.orc.LocalCache.getAndValidate(LocalCache.java:103)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.getSplits(OrcInputFormat.java:804)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.runGetSplitsSync(OrcInputFormat.java:922)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.generateSplitWork(OrcInputFormat.java:891)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.scheduleSplits(OrcInputFormat.java:1763)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1707)
>   ... 15 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-17404) Orc split generation cache does not handle files without file tail

2019-03-17 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-17404:
---
  Attachment: HIVE-17404.patch
Target Version/s:   (was: 3.0.0, 2.4.0)
  Status: Patch Available  (was: Open)

Have submitted a patch which Adds a check for ORC bytes in Orctail before 
putting it in the local cache. This issue was faced because in HIVE-16133 we 
minimize the tail data stored in the cache. This cause a call to extractTails 
which rebuilds the OrcTail while using it. This further causes a check for 
footer and results in an error being thrown. Because for old orc files when the 
tail is not present we check the head for the “ORC” text, but in the case where 
we just have a tail as in this call, it causes an exception.

cc [~prasanth_j] [~rajesh.balamohan] [~andrewom]

> Orc split generation cache does not handle files without file tail
> --
>
> Key: HIVE-17404
> URL: https://issues.apache.org/jira/browse/HIVE-17404
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Prasanth Jayachandran
>Assignee: Aditya Shah
>Priority: Critical
> Attachments: HIVE-17404.patch
>
>
> Some old files do not have Orc FileTail. If file tail does not exist, split 
> generation should fallback to old way of storing footers. 
> This can result in exceptions like below
> {code}
> ORC split generation failed with exception: Malformed ORC file. Invalid 
> postscript length 9
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1735)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1822)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:450)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:569)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.orc.FileFormatException: Malformed ORC file. Invalid 
> postscript length 9
>   at org.apache.orc.impl.ReaderImpl.ensureOrcFooter(ReaderImpl.java:297)
>   at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:470)
>   at 
> org.apache.hadoop.hive.ql.io.orc.LocalCache.getAndValidate(LocalCache.java:103)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.getSplits(OrcInputFormat.java:804)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.runGetSplitsSync(OrcInputFormat.java:922)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.generateSplitWork(OrcInputFormat.java:891)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.scheduleSplits(OrcInputFormat.java:1763)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1707)
>   ... 15 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-17404) Orc split generation cache does not handle files without file tail

2017-10-13 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17404:
-
Priority: Critical  (was: Blocker)

> Orc split generation cache does not handle files without file tail
> --
>
> Key: HIVE-17404
> URL: https://issues.apache.org/jira/browse/HIVE-17404
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
>
> Some old files do not have Orc FileTail. If file tail does not exist, split 
> generation should fallback to old way of storing footers. 
> This can result in exceptions like below
> {code}
> ORC split generation failed with exception: Malformed ORC file. Invalid 
> postscript length 9
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1735)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1822)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:450)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:569)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.orc.FileFormatException: Malformed ORC file. Invalid 
> postscript length 9
>   at org.apache.orc.impl.ReaderImpl.ensureOrcFooter(ReaderImpl.java:297)
>   at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:470)
>   at 
> org.apache.hadoop.hive.ql.io.orc.LocalCache.getAndValidate(LocalCache.java:103)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.getSplits(OrcInputFormat.java:804)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.runGetSplitsSync(OrcInputFormat.java:922)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.generateSplitWork(OrcInputFormat.java:891)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.scheduleSplits(OrcInputFormat.java:1763)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1707)
>   ... 15 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17404) Orc split generation cache does not handle files without file tail

2017-08-29 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17404:
-
Summary: Orc split generation cache does not handle files without file tail 
 (was: Orc split generation cache does not handle files with file tail)

> Orc split generation cache does not handle files without file tail
> --
>
> Key: HIVE-17404
> URL: https://issues.apache.org/jira/browse/HIVE-17404
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Blocker
>
> Some old files do not have Orc FileTail. If file tail does not exist, split 
> generation should fallback to old way of storing footers. 
> This can result in exceptions like below
> {code}
> ORC split generation failed with exception: Malformed ORC file. Invalid 
> postscript length 9
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1735)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1822)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:450)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:569)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.orc.FileFormatException: Malformed ORC file. Invalid 
> postscript length 9
>   at org.apache.orc.impl.ReaderImpl.ensureOrcFooter(ReaderImpl.java:297)
>   at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:470)
>   at 
> org.apache.hadoop.hive.ql.io.orc.LocalCache.getAndValidate(LocalCache.java:103)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.getSplits(OrcInputFormat.java:804)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.runGetSplitsSync(OrcInputFormat.java:922)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.generateSplitWork(OrcInputFormat.java:891)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.scheduleSplits(OrcInputFormat.java:1763)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1707)
>   ... 15 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)