[jira] [Commented] (ORC-437) Make acid schema checks case insensitive
[ https://issues.apache.org/jira/browse/ORC-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687201#comment-16687201 ]

Eugene Koifman commented on ORC-437:
------------------------------------
+1 pending tests

> Make acid schema checks case insensitive
> ----------------------------------------
>
>                 Key: ORC-437
>                 URL: https://issues.apache.org/jira/browse/ORC-437
>             Project: ORC
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 1.5.3
>            Reporter: Vaibhav Gumashta
>            Assignee: Vaibhav Gumashta
>            Priority: Major
>         Attachments: ORC-437.1.patch
>
> When reading an ORC file, SchemaEvolution tries to determine whether it is in the ACID-compliant format by comparing field names with the ACID event column names in {{SchemaEvolution.checkAcidSchema}}. It would be good to make this comparison case insensitive.
> This requirement comes from HIVE-20699, where a Hive query is used to run compaction (and hence write the compacted data to the bucket files via a HiveQL query). Since Hive converts all column names to lower case, the compacted files end up with lower-case ACID schema columns. The change is much simpler when made in ORC.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
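The comparison the patch targets can be sketched as a standalone check. This is an illustrative version, not the actual {{SchemaEvolution.checkAcidSchema}} code; the ACID event column names listed below are taken from the standard ACID writer schema:

```java
import java.util.Arrays;
import java.util.List;

class AcidSchemaCheck {
  // ACID event columns in writer order (names as the standard writer emits them).
  private static final List<String> ACID_EVENT_FIELDS = Arrays.asList(
      "operation", "originalTransaction", "bucket", "rowId",
      "currentTransaction", "row");

  // equalsIgnoreCase lets lower-cased, Hive-written schemas match too.
  static boolean checkAcidSchema(List<String> fieldNames) {
    if (fieldNames.size() != ACID_EVENT_FIELDS.size()) {
      return false;
    }
    for (int i = 0; i < fieldNames.size(); i++) {
      if (!ACID_EVENT_FIELDS.get(i).equalsIgnoreCase(fieldNames.get(i))) {
        return false;
      }
    }
    return true;
  }
}
```

With a check like this, a compacted file whose columns come back as {{operation, originaltransaction, ...}} is still recognized as an ACID file.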
[jira] [Commented] (ORC-437) Make acid schema checks case insensitive
[ https://issues.apache.org/jira/browse/ORC-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686872#comment-16686872 ]

Eugene Koifman commented on ORC-437:
------------------------------------
[~vgumashta], I made you an ORC committer in the Jira.

> Make acid schema checks case insensitive
> ----------------------------------------
>
>                 Key: ORC-437
>                 URL: https://issues.apache.org/jira/browse/ORC-437

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Resolved] (ORC-389) Add ability to not decode Acid metadata columns
[ https://issues.apache.org/jira/browse/ORC-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman resolved ORC-389.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 1.5

Committed to master & branch-1.5.

> Add ability to not decode Acid metadata columns
> -----------------------------------------------
>
>                 Key: ORC-389
>                 URL: https://issues.apache.org/jira/browse/ORC-389
>             Project: ORC
>          Issue Type: Improvement
>          Components: ACID
>    Affects Versions: 1.5.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Major
>             Fix For: 1.5
>
> For example, the split is from base_x and there are no relevant delete_delta/

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
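The idea behind the improvement can be illustrated with a column-include mask. Everything here is hypothetical: the flat field indexing, the helper name, and the assumption that fields 0-4 are the ACID metadata columns and field 5 is the {{row}} struct; the real ORC reader tracks columns through the full type tree:

```java
class AcidInclude {
  // Hypothetical helper: build an include mask over the top-level fields of
  // an ACID file. When the split comes from a base and no delete deltas
  // apply, the metadata columns (operation, originalTransaction, bucket,
  // rowId, currentTransaction) are never consulted, so decoding them is
  // wasted work; only the "row" field (index 5 here) needs to be read.
  static boolean[] buildInclude(int fieldCount, boolean needAcidMetadata) {
    boolean[] include = new boolean[fieldCount];
    for (int i = 0; i < fieldCount; i++) {
      include[i] = needAcidMetadata || i >= 5; // keep only "row" when possible
    }
    return include;
  }
}
```

A caller would pass {{needAcidMetadata = false}} exactly in the base-with-no-delete-deltas case the description mentions.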
[jira] [Updated] (ORC-389) Add ability to not decode Acid metadata columns
[ https://issues.apache.org/jira/browse/ORC-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated ORC-389:
-------------------------------
    Affects Version/s: 1.5.0

> Add ability to not decode Acid metadata columns
> -----------------------------------------------
>
>                 Key: ORC-389
>                 URL: https://issues.apache.org/jira/browse/ORC-389

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Assigned] (ORC-389) Add ability to not decode Acid metadata columns
[ https://issues.apache.org/jira/browse/ORC-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman reassigned ORC-389:
----------------------------------
    Assignee: Eugene Koifman

> Add ability to not decode Acid metadata columns
> -----------------------------------------------
>
>                 Key: ORC-389
>                 URL: https://issues.apache.org/jira/browse/ORC-389

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (ORC-389) Add ability to not decode Acid metadata columns
[ https://issues.apache.org/jira/browse/ORC-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated ORC-389:
-------------------------------
    Component/s: ACID

> Add ability to not decode Acid metadata columns
> -----------------------------------------------
>
>                 Key: ORC-389
>                 URL: https://issues.apache.org/jira/browse/ORC-389

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Assigned] (ORC-223) FileDump utility should print user metadata
[ https://issues.apache.org/jira/browse/ORC-223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman reassigned ORC-223:
----------------------------------

> FileDump utility should print user metadata
> -------------------------------------------
>
>                 Key: ORC-223
>                 URL: https://issues.apache.org/jira/browse/ORC-223
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>
> Currently it doesn't.
> Repro: take any data file from an Acid table; it has a "hive.acid.key.index" key in its user metadata which doesn't show up in the output.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
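What "print user metadata" would look like can be sketched with a small formatter. To keep it self-contained it takes a plain map rather than an ORC {{Reader}} (which exposes the same data via {{getMetadataKeys()}} / {{getMetadataValue(key)}}); the class name and output layout below are illustrative, not the FileDump implementation:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

class UserMetadataDump {
  // Render user metadata entries the way a dump tool might: one "key=value"
  // line per entry, decoding each raw ByteBuffer value as UTF-8.
  static String format(Map<String, ByteBuffer> metadata) {
    StringBuilder sb = new StringBuilder("User Metadata:\n");
    for (Map.Entry<String, ByteBuffer> e : metadata.entrySet()) {
      sb.append("  ").append(e.getKey()).append('=')
        // duplicate() so decoding does not consume the caller's buffer
        .append(StandardCharsets.UTF_8.decode(e.getValue().duplicate()))
        .append('\n');
    }
    return sb.toString();
  }
}
```

Run against a file from an ACID table, output of this style would surface the missing "hive.acid.key.index" entry.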
[jira] [Updated] (ORC-195) FileFormatException should include file name in the message
[ https://issues.apache.org/jira/browse/ORC-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated ORC-195:
-------------------------------
    Description:

Here is one example:
{noformat}
ReaderImpl.extractFileTail(FileSystem fs, Path path, long maxFileLength) throws IOException
{noformat}
has
{noformat}
if (size <= OrcFile.MAGIC.length()) {
  throw new FileFormatException("Not a valid ORC file");
}
{noformat}
which in the logs looks like:
{noformat}
2017-05-18T12:08:23,572 WARN [Thread-360] mapred.LocalJobRunner: job_local150767050_0007
java.lang.Exception: org.apache.orc.FileFormatException: Not a valid ORC file
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489) ~[hadoop-mapreduce-client-common-2.8.0.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549) [hadoop-mapreduce-client-common-2.8.0.jar:?]
Caused by: org.apache.orc.FileFormatException: Not a valid ORC file
    at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:511) ~[orc-core-1.3.3.jar:1.3.3]
    at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:378) ~[orc-core-1.3.3.jar:1.3.3]
    at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:63) ~[classes/:?]
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:90) ~[classes/:?]
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:2279) ~[classes/:?]
    at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:665) ~[classes/:?]
    at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:642) ~[classes/:?]
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) ~[hadoop-mapreduce-client-core-2.8.0.jar:?]
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) ~[hadoop-mapreduce-client-core-2.8.0.jar:?]
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-2.8.0.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270) ~[hadoop-mapreduce-client-common-2.8.0.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_25]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_25]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_25]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_25]
    at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_25]
{noformat}

Note: the exception surfaces only the generic message, with no indication of which file was invalid.
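The kind of change the ticket asks for can be sketched as follows. The class and method here are standalone stand-ins, and the exact message wording is illustrative, not the committed fix:

```java
import java.io.IOException;

class MagicCheck {
  // Standalone stand-in for org.apache.orc.FileFormatException.
  static class FileFormatException extends IOException {
    FileFormatException(String message) { super(message); }
  }

  // Including the path turns an anonymous "Not a valid ORC file" log line
  // into an actionable one that names the offending file.
  static void checkFileSize(String path, long size, int magicLength)
      throws FileFormatException {
    if (size <= magicLength) {
      throw new FileFormatException("Not a valid ORC file " + path
          + " (length=" + size + ")");
    }
  }
}
```

With this, the compactor stack trace above would name the bad bucket file directly instead of forcing a search through the input directories.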
[jira] [Commented] (ORC-154) add OrcFile.WriterOptions.clone()
[ https://issues.apache.org/jira/browse/ORC-154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15930927#comment-15930927 ]

Eugene Koifman commented on ORC-154:
------------------------------------
Thanks.

> add OrcFile.WriterOptions.clone()
> ---------------------------------
>
>                 Key: ORC-154
>                 URL: https://issues.apache.org/jira/browse/ORC-154
>             Project: ORC
>          Issue Type: Improvement
>    Affects Versions: 1.3.3
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>             Fix For: 1.4.0
>
>         Attachments: ORC-154.01.patch

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (ORC-154) add OrcFile.WriterOptions.clone()
[ https://issues.apache.org/jira/browse/ORC-154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated ORC-154:
-------------------------------
    Attachment: ORC-154.01.patch

> add OrcFile.WriterOptions.clone()
> ---------------------------------
>
>                 Key: ORC-154
>                 URL: https://issues.apache.org/jira/browse/ORC-154

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
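The pattern the ticket asks for can be sketched with a simplified fluent options class. {{Options}} below is hypothetical, with made-up fields; the real {{OrcFile.WriterOptions}} has many more settings, but the {{Cloneable}} mechanics are the same:

```java
// Sketch of a cloneable fluent-options class (not the real WriterOptions).
class Options implements Cloneable {
  private int bufferSize = 256 * 1024;
  private String compress = "ZLIB";

  Options bufferSize(int size) { this.bufferSize = size; return this; }
  Options compress(String codec) { this.compress = codec; return this; }
  int getBufferSize() { return bufferSize; }
  String getCompress() { return compress; }

  @Override
  public Options clone() {
    try {
      // A shallow copy suffices here because all fields are value-like.
      return (Options) super.clone();
    } catch (CloneNotSupportedException e) {
      throw new AssertionError(e); // cannot happen: we implement Cloneable
    }
  }
}
```

The point of {{clone()}} is to let a caller derive per-writer variants from one base configuration without the variants mutating each other.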