[jira] [Commented] (ORC-437) Make acid schema checks case insensitive

2018-11-14 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/ORC-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687201#comment-16687201
 ] 

Eugene Koifman commented on ORC-437:


+1 pending tests

> Make acid schema checks case insensitive
> 
>
> Key: ORC-437
> URL: https://issues.apache.org/jira/browse/ORC-437
> Project: ORC
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 1.5.3
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Major
> Attachments: ORC-437.1.patch
>
>
> When reading from an Orc file, SchemaEvolution tries to determine 
> whether the file uses the Acid-compliant format by comparing its field names 
> with the Acid event names in {{SchemaEvolution.checkAcidSchema}}. It would be 
> good to make this comparison case-insensitive.
> This requirement comes from HIVE-20699, where a Hive query is used to 
> run compaction (and hence write the compacted data to the bucket files via a 
> HiveQL query). Since Hive converts all column names to lowercase, the 
> compacted files end up with lowercase Acid schema columns. The change is 
> much simpler when made in Orc.
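A minimal sketch of the requested comparison. It assumes the Acid event struct's top-level fields are operation, originalTransaction, bucket, rowId, currentTransaction and row, compared positionally. The class and method names are hypothetical; the real check in {{SchemaEvolution.checkAcidSchema}} works on an ORC TypeDescription rather than a plain list of names, and this is not the ORC-437.1.patch.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of a case-insensitive Acid schema check.
 * The top-level field names are passed in directly instead of a TypeDescription.
 */
public class AcidSchemaCheckSketch {
  // Assumed top-level field names of an Acid event struct, in file order.
  static final List<String> ACID_EVENT_FIELD_NAMES = Arrays.asList(
      "operation", "originalTransaction", "bucket",
      "rowId", "currentTransaction", "row");

  /** Returns true if fieldNames match the Acid event names, ignoring case. */
  static boolean isAcidSchema(List<String> fieldNames) {
    if (fieldNames.size() != ACID_EVENT_FIELD_NAMES.size()) {
      return false;
    }
    for (int i = 0; i < fieldNames.size(); i++) {
      // equalsIgnoreCase lets lowercased names written by Hive still match
      if (!ACID_EVENT_FIELD_NAMES.get(i).equalsIgnoreCase(fieldNames.get(i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> lowerCased = Arrays.asList(
        "operation", "originaltransaction", "bucket",
        "rowid", "currenttransaction", "row");
    System.out.println(isAcidSchema(lowerCased));              // true
    System.out.println(isAcidSchema(Arrays.asList("a", "b"))); // false
  }
}
```

With an exact-match comparison the lowercased list would be rejected and the compacted file would not be recognized as Acid; ignoring case is the whole change.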



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ORC-437) Make acid schema checks case insensitive

2018-11-14 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/ORC-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686872#comment-16686872
 ] 

Eugene Koifman commented on ORC-437:


[~vgumashta], I made you an ORC committer in the Jira

 






[jira] [Resolved] (ORC-389) Add ability to not decode Acid metadata columns

2018-07-25 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ORC-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman resolved ORC-389.

   Resolution: Fixed
Fix Version/s: 1.5

committed to master & branch-1.5

> Add ability to not decode Acid metadata columns
> ---
>
> Key: ORC-389
> URL: https://issues.apache.org/jira/browse/ORC-389
> Project: ORC
>  Issue Type: Improvement
>  Components: ACID
>Affects Versions: 1.5.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Fix For: 1.5
>
>
> for example, the split is from base_x and there are no relevant delete_delta/
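The description above is cut off, so the following is only an illustration of one way a reader could avoid decoding the Acid metadata columns: ORC selects columns with a boolean include array indexed by column id (the kind of array passed to Reader.Options.include(boolean[])). The sketch assumes the Acid layout struct<operation,originalTransaction,bucket,rowId,currentTransaction,row:struct<...>> and ORC's pre-order column numbering. The class name is hypothetical, and this is not necessarily how ORC-389 was implemented.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: build an include mask that skips the five Acid
 * metadata columns and keeps only the user "row" struct.
 * Assumed column ids (pre-order): 0 = root struct, 1..5 = metadata columns,
 * 6 = row struct, 7.. = the row struct's descendants.
 */
public class AcidIncludeMaskSketch {
  static final int FIRST_METADATA_COLUMN = 1;
  static final int LAST_METADATA_COLUMN = 5;
  static final int ROW_COLUMN = 6;

  /** @param rowDescendants number of column ids under the row struct */
  static boolean[] includeRowOnly(int rowDescendants) {
    boolean[] include = new boolean[ROW_COLUMN + 1 + rowDescendants];
    Arrays.fill(include, true);
    // Keep the root (id 0) but do not decode the metadata columns.
    for (int id = FIRST_METADATA_COLUMN; id <= LAST_METADATA_COLUMN; id++) {
      include[id] = false;
    }
    return include;
  }

  public static void main(String[] args) {
    // row:struct<a:int,b:string> contributes two descendant ids (7 and 8)
    System.out.println(Arrays.toString(includeRowOnly(2)));
    // prints [true, false, false, false, false, false, true, true, true]
  }
}
```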





[jira] [Updated] (ORC-389) Add ability to not decode Acid metadata columns

2018-07-25 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ORC-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated ORC-389:
---
Affects Version/s: 1.5.0

> Add ability to not decode Acid metadata columns
> ---
>
> Key: ORC-389
> URL: https://issues.apache.org/jira/browse/ORC-389
> Project: ORC
>  Issue Type: Improvement
>  Components: ACID
>Affects Versions: 1.5.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> for example, the split is from base_x and there are no relevant delete_delta/





[jira] [Assigned] (ORC-389) Add ability to not decode Acid metadata columns

2018-07-25 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ORC-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned ORC-389:
--

Assignee: Eugene Koifman

> Add ability to not decode Acid metadata columns
> ---
>
> Key: ORC-389
> URL: https://issues.apache.org/jira/browse/ORC-389
> Project: ORC
>  Issue Type: Improvement
>  Components: ACID
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> for example, the split is from base_x and there are no relevant delete_delta/





[jira] [Updated] (ORC-389) Add ability to not decode Acid metadata columns

2018-07-25 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ORC-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated ORC-389:
---
Component/s: ACID

> Add ability to not decode Acid metadata columns
> ---
>
> Key: ORC-389
> URL: https://issues.apache.org/jira/browse/ORC-389
> Project: ORC
>  Issue Type: Improvement
>  Components: ACID
>Reporter: Eugene Koifman
>Priority: Major
>
> for example, the split is from base_x and there are no relevant delete_delta/





[jira] [Assigned] (ORC-223) FileDump utility should print user metadata

2017-08-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/ORC-223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned ORC-223:
--


> FileDump utility should print user metadata
> ---
>
> Key: ORC-223
> URL: https://issues.apache.org/jira/browse/ORC-223
> Project: ORC
>  Issue Type: Improvement
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> Currently it doesn't.
> Repro: take any data file from an Acid table. It has a "hive.acid.key.index" 
> key in its user metadata, which doesn't show up in the output.
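A minimal sketch of how a FileDump-style tool could print user metadata. With an org.apache.orc.Reader, the keys and values would come from reader.getMetadataKeys() and reader.getMetadataValue(key). Here, a Map stands in for the reader so the example runs on its own. The class name is hypothetical, values are assumed to be UTF-8 text, and the value in main is a placeholder, not the real format of hive.acid.key.index.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: format user metadata entries as "key=value" lines. */
public class UserMetadataDumpSketch {
  static List<String> formatUserMetadata(Map<String, ByteBuffer> metadata) {
    List<String> lines = new ArrayList<>();
    lines.add("User Metadata:");
    for (Map.Entry<String, ByteBuffer> e : metadata.entrySet()) {
      // duplicate() so decoding does not move the caller's buffer position
      String value = StandardCharsets.UTF_8.decode(e.getValue().duplicate()).toString();
      lines.add("  " + e.getKey() + "=" + value);
    }
    return lines;
  }

  public static void main(String[] args) {
    Map<String, ByteBuffer> md = new LinkedHashMap<>();
    // placeholder value; a real file would supply its own bytes
    md.put("hive.acid.key.index",
        ByteBuffer.wrap("example-value".getBytes(StandardCharsets.UTF_8)));
    formatUserMetadata(md).forEach(System.out::println);
  }
}
```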



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ORC-195) FileFormatException should include file name in the message

2017-05-18 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/ORC-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated ORC-195:
---
Description: 
Here is one example: 
{noformat}
ReaderImpl.extractFileTail(FileSystem fs, Path path, long maxFileLength) throws 
IOException 
{noformat}
has 
{noformat}
  if (size <= OrcFile.MAGIC.length()) {
throw new FileFormatException("Not a valid ORC file");
  }
{noformat}

which in the logs looks like

{noformat}
2017-05-18T12:08:23,572  WARN [Thread-360] mapred.LocalJobRunner: job_local150767050_0007
java.lang.Exception: org.apache.orc.FileFormatException: Not a valid ORC file
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489) ~[hadoop-mapreduce-client-common-2.8.0.jar:?]
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549) [hadoop-mapreduce-client-common-2.8.0.jar:?]
Caused by: org.apache.orc.FileFormatException: Not a valid ORC file
	at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:511) ~[orc-core-1.3.3.jar:1.3.3]
	at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:378) ~[orc-core-1.3.3.jar:1.3.3]
	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:63) ~[classes/:?]
	at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:90) ~[classes/:?]
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:2279) ~[classes/:?]
	at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:665) ~[classes/:?]
	at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:642) ~[classes/:?]
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) ~[hadoop-mapreduce-client-core-2.8.0.jar:?]
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) ~[hadoop-mapreduce-client-core-2.8.0.jar:?]
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-2.8.0.jar:?]
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270) ~[hadoop-mapreduce-client-common-2.8.0.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_25]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_25]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_25]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_25]
	at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_25]
{noformat}
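A minimal sketch of what the requested message could look like. It assumes the check compares the file size against the 3-byte "ORC" magic, as in the snippet above. FileFormatException is declared locally as a stand-in for org.apache.orc.FileFormatException, checkMinimumSize is a hypothetical helper, and the path in main is made up for illustration. This is not the committed fix.

```java
import java.io.IOException;

/** Hypothetical sketch: put the file path into the "Not a valid ORC file" message. */
public class FileNameInErrorSketch {
  // Local stand-in for org.apache.orc.FileFormatException.
  static class FileFormatException extends IOException {
    FileFormatException(String message) { super(message); }
  }

  // ORC files start with the magic string "ORC".
  static final String MAGIC = "ORC";

  static void checkMinimumSize(String path, long size) throws FileFormatException {
    if (size <= MAGIC.length()) {
      // Naming the file makes the log line actionable on its own.
      throw new FileFormatException("Not a valid ORC file " + path
          + " (size=" + size + ")");
    }
  }

  public static void main(String[] args) {
    try {
      checkMinimumSize("/hypothetical/table/delta_1_1/bucket_00000", 0);
    } catch (FileFormatException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With the path in the message, a failure like the CompactorMR one above would identify the offending file directly in the log.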


[jira] [Commented] (ORC-154) add OrcFile.WriterOptions.clone()

2017-03-17 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930927#comment-15930927
 ] 

Eugene Koifman commented on ORC-154:


thanks

> add OrcFile.WriterOptions.clone()
> -
>
> Key: ORC-154
> URL: https://issues.apache.org/jira/browse/ORC-154
> Project: ORC
>  Issue Type: Improvement
>Affects Versions: 1.3.3
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 1.4.0
>
> Attachments: ORC-154.01.patch
>
>
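The issue gives no details beyond its title, so here is only a hypothetical sketch of the usual Java pattern for such a method: implement Cloneable and override clone() with a public, covariant return type. The class and its two fields are illustrative and are not the real OrcFile.WriterOptions fields. One common reason for such a method is to derive per-file options from a shared template without mutating it, as main() shows.

```java
/** Hypothetical options builder with a clone() method. */
public class WriterOptionsSketch implements Cloneable {
  private long stripeSize = 64L * 1024 * 1024;
  private int bufferSize = 256 * 1024;

  public WriterOptionsSketch stripeSize(long value) { stripeSize = value; return this; }
  public WriterOptionsSketch bufferSize(int value) { bufferSize = value; return this; }
  public long getStripeSize() { return stripeSize; }
  public int getBufferSize() { return bufferSize; }

  @Override
  public WriterOptionsSketch clone() {
    try {
      // Shallow copy is enough here because every field is a primitive.
      return (WriterOptionsSketch) super.clone();
    } catch (CloneNotSupportedException e) {
      // Cannot happen: this class implements Cloneable.
      throw new AssertionError(e);
    }
  }

  public static void main(String[] args) {
    WriterOptionsSketch base = new WriterOptionsSketch().stripeSize(128L * 1024 * 1024);
    WriterOptionsSketch copy = base.clone().bufferSize(64 * 1024);
    System.out.println(base.getBufferSize() + " " + copy.getBufferSize());
    // prints "262144 65536"
  }
}
```

super.clone() makes a shallow copy; an options class that holds mutable objects (for example a Hadoop Configuration) would need to copy those explicitly as well.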




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (ORC-154) add OrcFile.WriterOptions.clone()

2017-03-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/ORC-154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated ORC-154:
---
Attachment: ORC-154.01.patch

> add OrcFile.WriterOptions.clone()
> -
>
> Key: ORC-154
> URL: https://issues.apache.org/jira/browse/ORC-154
> Project: ORC
>  Issue Type: Improvement
>Affects Versions: 1.3.3
>Reporter: Eugene Koifman
> Fix For: 1.4.0
>
> Attachments: ORC-154.01.patch
>
>



