
[jira] [Commented] (ORC-54) Evolve schemas based on field name rather than index

2016-08-17 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/ORC-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425919#comment-15425919
 ] 

Lefty Leverenz commented on ORC-54:
---

At least the new ORC configuration parameter (*orc.tolerate.missing.schema*) 
needs to be documented.  But what about the field name functionality?

> Evolve schemas based on field name rather than index
> 
>
> Key: ORC-54
> URL: https://issues.apache.org/jira/browse/ORC-54
> Project: Orc
>  Issue Type: Improvement
>Reporter: Mark Wagner
>Assignee: Mark Wagner
> Fix For: 1.2.0
>
>
> Schema evolution as it stands today allows adding fields to the end of 
> schemas or removing them from the end. However, because it is based on the 
> index of the column, you can only ever add or remove -- not both.
> ORC files have the full schema information of their contents, so there's 
> actually enough metadata to support changing columns anywhere in the schema.
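
To make the limitation described above concrete, here is a minimal sketch (not ORC code) contrasting index-based and name-based matching. The class and method names are hypothetical, and flat lists of column names stand in for real schemas.

```java
import java.util.Arrays;
import java.util.List;

class EvolutionMappingSketch {
  // For each reader column, returns the index of the file column that would
  // supply it, or -1 if there is none (the reader would see nulls).

  // Index-based: reader column i is always paired with file column i.
  static int[] mapByIndex(List<String> fileCols, List<String> readerCols) {
    int[] result = new int[readerCols.size()];
    for (int i = 0; i < result.length; i++) {
      result[i] = i < fileCols.size() ? i : -1;
    }
    return result;
  }

  // Name-based: reader column i is paired with the file column of the same name.
  static int[] mapByName(List<String> fileCols, List<String> readerCols) {
    int[] result = new int[readerCols.size()];
    for (int i = 0; i < result.length; i++) {
      result[i] = fileCols.indexOf(readerCols.get(i));
    }
    return result;
  }

  public static void main(String[] args) {
    // The reader dropped "name" from the middle and added "email" at the end.
    List<String> file = Arrays.asList("id", "name", "age");
    List<String> reader = Arrays.asList("id", "age", "email");
    System.out.println(Arrays.toString(mapByIndex(file, reader)));
    System.out.println(Arrays.toString(mapByName(file, reader)));
  }
}
```

With index-based matching the reader's "age" is silently paired with the file's "name" column; with name-based matching it finds the right column and "email" is reported as missing.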



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (ORC-54) Evolve schemas based on field name rather than index

2016-08-17 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/ORC-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425915#comment-15425915
 ] 

Lefty Leverenz commented on ORC-54:
---

Should this be documented in the wiki?


[jira] [Commented] (ORC-54) Evolve schemas based on field name rather than index

2016-08-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ORC-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424754#comment-15424754
 ] 

ASF GitHub Bot commented on ORC-54:
---

Github user asfgit closed the pull request at:

https://github.com/apache/orc/pull/55



[jira] [Commented] (ORC-54) Evolve schemas based on field name rather than index

2016-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ORC-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423575#comment-15423575
 ] 

ASF GitHub Bot commented on ORC-54:
---

Github user prasanthj commented on a diff in the pull request:

https://github.com/apache/orc/pull/55#discussion_r75038675
  
--- Diff: java/core/src/java/org/apache/orc/impl/SchemaEvolution.java ---
@@ -20,59 +20,132 @@
 
 import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.HashMap;
 import java.util.List;
+import java.util.Map;
+import java.util.regex.Pattern;
 
+import org.apache.orc.Reader;
 import org.apache.orc.TypeDescription;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 /**
- * Take the file types and the (optional) configuration column names/types and
- * see if there has been schema evolution.
+ * Infer and track the evolution between the schema as stored in the file and
+ * the schema that has been requested by the reader.
  */
 public class SchemaEvolution {
   // indexed by reader column id
   private final TypeDescription[] readerFileTypes;
   // indexed by reader column id
-  private final boolean[] included;
+  private final boolean[] readerIncluded;
+  // indexed by file column id
+  private final boolean[] fileIncluded;
   private final TypeDescription fileSchema;
   private final TypeDescription readerSchema;
   private boolean hasConversion = false;
+  private final boolean isAcid;
+
   // indexed by reader column id
   private final boolean[] ppdSafeConversion;
 
-  public SchemaEvolution(TypeDescription fileSchema, boolean[] includedCols) {
-this(fileSchema, null, includedCols);
+  private static final Logger LOG =
+LoggerFactory.getLogger(SchemaEvolution.class);
+  private static final Pattern missingMetadataPattern =
+Pattern.compile("_col\\d+");
+
+  public static class IllegalEvolutionException extends RuntimeException {
+public IllegalEvolutionException(String msg) {
+  super(msg);
+}
+  }
+
+  public SchemaEvolution(TypeDescription fileSchema,
+ Reader.Options options) {
+this(fileSchema, null, options);
   }
 
   public SchemaEvolution(TypeDescription fileSchema,
  TypeDescription readerSchema,
- boolean[] includedCols) {
-this.included = includedCols == null ? null :
+ Reader.Options options) {
+boolean allowMissingMetadata = options.getTolerateMissingSchema();
+boolean[] includedCols = options.getInclude();
+this.readerIncluded = includedCols == null ? null :
   Arrays.copyOf(includedCols, includedCols.length);
+this.fileIncluded = new boolean[fileSchema.getMaximumId() + 1];
 this.hasConversion = false;
 this.fileSchema = fileSchema;
+isAcid = checkAcidSchema(fileSchema);
 if (readerSchema != null) {
-  if (checkAcidSchema(fileSchema)) {
+  if (isAcid) {
 this.readerSchema = createEventSchema(readerSchema);
   } else {
 this.readerSchema = readerSchema;
   }
-  this.readerFileTypes = new TypeDescription[this.readerSchema.getMaximumId() + 1];
-  buildConversionFileTypesArray(fileSchema, this.readerSchema);
+  this.readerFileTypes =
+new TypeDescription[this.readerSchema.getMaximumId() + 1];
+  int positionalLevels = 0;
+  if (!hasColumnNames(isAcid? getBaseRow(fileSchema) : fileSchema)){
+if (!this.fileSchema.equals(this.readerSchema)) {
+  if (!allowMissingMetadata) {
+throw new RuntimeException("Found that schema metadata is missing"
++ " from file. This is likely caused by"
++ " a writer earlier than HIVE-4243. Will"
++ " not try to reconcile schemas");
+  } else {
+LOG.warn("Column names are missing from this file. This is"
++ " caused by a writer earlier than HIVE-4243. The reader will"
++ " reconcile schemas based on index. File type: " +
+this.fileSchema + ", reader type: " + this.readerSchema);
+positionalLevels = isAcid ? 2 : 1;
--- End diff --

What does positional level mean? Is it the real row level? 
Does the ACID file schema look like this: 
struct,struct[real_cols]>>? If so, can you leave a comment 
about it?
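
For readers following the diff: the pattern `_col\d+` is what flags a file whose column names were lost. How `hasColumnNames` is implemented is not shown in the excerpt, so the following is only a sketch under the assumption that a struct counts as having names when at least one field name does not look like a placeholder. The class name and the list-of-names input are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class MissingNamesSketch {
  // Same regular expression as missingMetadataPattern in the diff.
  private static final Pattern MISSING = Pattern.compile("_col\\d+");

  // Assumption: names are considered present if any field name is not a
  // "_col<N>" placeholder. An empty field list therefore yields false.
  static boolean hasColumnNames(List<String> fieldNames) {
    for (String name : fieldNames) {
      if (!MISSING.matcher(name).matches()) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(hasColumnNames(Arrays.asList("_col0", "_col1")));
    System.out.println(hasColumnNames(Arrays.asList("id", "_col1")));
  }
}
```

Because `matches()` requires the whole string to match, a real column that merely starts with `_col` (say `_col0_backup`) is still treated as a genuine name.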



[jira] [Commented] (ORC-54) Evolve schemas based on field name rather than index

2016-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ORC-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423517#comment-15423517
 ] 

ASF GitHub Bot commented on ORC-54:
---

GitHub user omalley opened a pull request:

https://github.com/apache/orc/pull/55

ORC-54: Evolve schemas based on field name rather than index

This is an updated version of Mark's patch that fixes evolution of ACID 
files and rebases it to the current trunk.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/omalley/orc orc-54

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/orc/pull/55.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #55


commit e37bc797562f4c219bb733390f8e17a79d7ecec8
Author: Mark Wagner 
Date:   2016-08-16T22:30:51Z

ORC-54: Evolve schemas based on field name rather than index





[jira] [Commented] (ORC-54) Evolve schemas based on field name rather than index

2016-07-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ORC-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364827#comment-15364827
 ] 

ASF GitHub Bot commented on ORC-54:
---

Github user omalley commented on a diff in the pull request:

https://github.com/apache/orc/pull/40#discussion_r69785443
  
--- Diff: java/core/src/java/org/apache/orc/impl/SchemaEvolution.java ---
@@ -85,55 +142,78 @@ void buildMapping(TypeDescription fileType,
 // check the easy case first
 if (fileType.getCategory() == readerType.getCategory()) {
   switch (readerType.getCategory()) {
-case BOOLEAN:
-case BYTE:
-case SHORT:
-case INT:
-case LONG:
-case DOUBLE:
-case FLOAT:
-case STRING:
-case TIMESTAMP:
-case BINARY:
-case DATE:
-  // these are always a match
-  break;
-case CHAR:
-case VARCHAR:
-  // HIVE-13648: Look at ORC data type conversion edge cases (CHAR, VARCHAR, DECIMAL)
-  isOk = fileType.getMaxLength() == readerType.getMaxLength();
-  break;
-case DECIMAL:
-  // HIVE-13648: Look at ORC data type conversion edge cases (CHAR, VARCHAR, DECIMAL)
-  // TODO we don't enforce scale and precision checks, but probably should
-  break;
-case UNION:
-case MAP:
-case LIST: {
-  // these must be an exact match
-  List<TypeDescription> fileChildren = fileType.getChildren();
-  List<TypeDescription> readerChildren = readerType.getChildren();
-  if (fileChildren.size() == readerChildren.size()) {
-for(int i=0; i < fileChildren.size(); ++i) {
-  buildMapping(fileChildren.get(i), readerChildren.get(i));
-}
-  } else {
-isOk = false;
+  case BOOLEAN:
+  case BYTE:
+  case SHORT:
+  case INT:
+  case LONG:
+  case DOUBLE:
+  case FLOAT:
+  case STRING:
+  case TIMESTAMP:
+  case BINARY:
+  case DATE:
+// these are always a match
+break;
+  case CHAR:
+  case VARCHAR:
+// HIVE-13648: Look at ORC data type conversion edge cases (CHAR, VARCHAR, DECIMAL)
+isOk = fileType.getMaxLength() == readerType.getMaxLength();
+break;
+  case DECIMAL:
+// HIVE-13648: Look at ORC data type conversion edge cases (CHAR, VARCHAR, DECIMAL)
+// TODO we don't enforce scale and precision checks, but probably should
+break;
+  case UNION:
+  case MAP:
+  case LIST: {
+// these must be an exact match
+List<TypeDescription> fileChildren = fileType.getChildren();
+List<TypeDescription> readerChildren = readerType.getChildren();
+if (fileChildren.size() == readerChildren.size()) {
+  for (int i = 0; i < fileChildren.size(); ++i) {
+buildMapping(fileChildren.get(i), readerChildren.get(i), useFieldNames);
--- End diff --

When you are recursing, useFieldNames should always be true. Prior to 
HIVE-4243 only the top-level column names were lost. In recursive types the 
field names were correct.
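
A rough illustration of that point, not the patch itself: pair fields by position only for a limited number of levels from the root, and by name below that. The `Struct` type, the `leaves` helper and the `map` method are hypothetical stand-ins; the depth parameter is named after the `positionalLevels` variable that appears in the later pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PositionalLevelSketch {
  // Hypothetical stand-in for a struct type. children.get(i) is null when
  // field i is a primitive.
  static class Struct {
    final List<String> names;
    final List<Struct> children;
    Struct(List<String> names, List<Struct> children) {
      this.names = names;
      this.children = children;
    }
  }

  // Builds a struct whose fields are all primitives.
  static Struct leaves(String... names) {
    return new Struct(Arrays.asList(names),
        Collections.nCopies(names.length, (Struct) null));
  }

  // Appends "readerField<-fileField" for every reader field that has a source.
  // While positionalLevels > 0 fields are paired by position; below that
  // depth they are paired by name.
  static void map(Struct file, Struct reader, int positionalLevels,
                  String prefix, List<String> out) {
    for (int r = 0; r < reader.names.size(); r++) {
      int f;
      if (positionalLevels > 0) {
        f = r < file.names.size() ? r : -1;
      } else {
        f = file.names.indexOf(reader.names.get(r));
      }
      if (f < 0) {
        continue; // no matching file field; the reader would see nulls
      }
      String path = prefix + reader.names.get(r);
      out.add(path + "<-" + file.names.get(f));
      Struct fileChild = file.children.get(f);
      Struct readerChild = reader.children.get(r);
      if (fileChild != null && readerChild != null) {
        map(fileChild, readerChild, Math.max(0, positionalLevels - 1),
            path + ".", out);
      }
    }
  }

  public static void main(String[] args) {
    // Top-level names were lost (_col0, _col1); nested names survived.
    Struct file = new Struct(Arrays.asList("_col0", "_col1"),
        Arrays.asList((Struct) null, leaves("x", "y")));
    Struct reader = new Struct(Arrays.asList("id", "point"),
        Arrays.asList((Struct) null, leaves("y", "x")));
    List<String> out = new ArrayList<>();
    map(file, reader, 1, "", out);
    System.out.println(out);
  }
}
```

With one positional level, `id` and `point` are matched by position, while the reordered nested fields `y` and `x` are still matched correctly by name.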



[jira] [Commented] (ORC-54) Evolve schemas based on field name rather than index

2016-07-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ORC-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364809#comment-15364809
 ] 

ASF GitHub Bot commented on ORC-54:
---

Github user omalley commented on a diff in the pull request:

https://github.com/apache/orc/pull/40#discussion_r69784363
  
--- Diff: java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java ---
@@ -201,6 +201,10 @@ public void nextVector(ColumnVector previous,
 public BitFieldReader getPresent() {
   return present;
 }
+
--- End diff --

What do you think about moving the building of writerIncluded into the 
SchemaEvolution class? I think it would be pretty natural there and building it 
would come naturally out of buildMapping.



[jira] [Commented] (ORC-54) Evolve schemas based on field name rather than index

2016-07-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ORC-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364751#comment-15364751
 ] 

ASF GitHub Bot commented on ORC-54:
---

Github user omalley commented on a diff in the pull request:

https://github.com/apache/orc/pull/40#discussion_r69779322
  
--- Diff: java/core/src/java/org/apache/orc/OrcConf.java ---
@@ -82,6 +82,12 @@
   "If ORC reader encounters corrupt data, this value will be used to\n" +
   "determine whether to skip the corrupt data or throw exception.\n" +
   "The default behavior is to throw exception."),
+  TOLERATE_MISSING_SCHEMA("orc.tolerate.missing.schema",
+  "hive.exec.orc.tolerate.missing.schema",
+  true,
+  "Writers earlier than HIVE-4243 may have inaccurate schema metadata.\n"
+  + "This setting will enable best effort schema evolution rather\n"
--- End diff --

I'd suggest that the comment include something about using position in the 
top level structure as the fallback and that it support adding new columns to 
the end and changing types, but not deleting or reordering the top level 
columns.



[jira] [Commented] (ORC-54) Evolve schemas based on field name rather than index

2016-07-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ORC-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364723#comment-15364723
 ] 

ASF GitHub Bot commented on ORC-54:
---

Github user omalley commented on a diff in the pull request:

https://github.com/apache/orc/pull/40#discussion_r69777633
  
--- Diff: java/core/src/java/org/apache/orc/impl/ReaderImpl.java ---
@@ -572,7 +572,14 @@ public RecordReader rows(Options options) throws IOException {
 boolean[] include = options.getInclude();
 // if included columns is null, then include all columns
 if (include == null) {
-  include = new boolean[types.size()];
+  int size;
+  TypeDescription readSchema = options.getSchema();
+  if (readSchema != null){
+size = readSchema.getMaximumId() + 1;
--- End diff --

I'd suggest doing
```
if (readSchema == null) {
  readSchema = schema;
}
```
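
Spelled out a little further, the suggestion amounts to defaulting the read schema to the file schema first, so the include array can be sized in one place. This is only a sketch: `Schema` is a hypothetical stand-in for TypeDescription, `defaultInclude` is an invented helper, and filling the array with `true` is an assumption based on the "include all columns" comment in the diff.

```java
import java.util.Arrays;

class DefaultIncludeSketch {
  // Hypothetical stand-in for TypeDescription; only the maximum column id
  // is needed to size the include array.
  static class Schema {
    private final int maximumId;
    Schema(int maximumId) {
      this.maximumId = maximumId;
    }
    int getMaximumId() {
      return maximumId;
    }
  }

  // Returns an "include everything" array sized for the read schema,
  // falling back to the file schema when no read schema was requested.
  static boolean[] defaultInclude(Schema fileSchema, Schema readSchema) {
    if (readSchema == null) {
      readSchema = fileSchema;
    }
    boolean[] include = new boolean[readSchema.getMaximumId() + 1];
    Arrays.fill(include, true);
    return include;
  }

  public static void main(String[] args) {
    System.out.println(defaultInclude(new Schema(3), null).length);
    System.out.println(defaultInclude(new Schema(3), new Schema(5)).length);
  }
}
```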



[jira] [Commented] (ORC-54) Evolve schemas based on field name rather than index

2016-06-16 Thread Mark Wagner (JIRA)


[ 
https://issues.apache.org/jira/browse/ORC-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334675#comment-15334675
 ] 

Mark Wagner commented on ORC-54:


I'm reading through your patch on HIVE-13974. Wasn't aware of that work. Still 
need to read through it some more to grok, but I agree that there's overlap in 
that both of the patches are making the distinction around the included array 
explicit.

I don't see a reason Hive shouldn't be able to read these datasets as long as 
the OrcSerde declares the reader schema in the Options. We use Hive for reading 
Orc, so that's definitely something I'm paying attention to.


[jira] [Commented] (ORC-54) Evolve schemas based on field name rather than index

2016-06-16 Thread Matt McCline (JIRA)


[ 
https://issues.apache.org/jira/browse/ORC-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333415#comment-15333415
 ] 

Matt McCline commented on ORC-54:
-

Thanks for the heads up.  Big change -- seems like it is on a direct collision 
course with HIVE-13974, where the reader schema needs to be used for determining 
the included array.  I need to understand how this relates to Hive.  Will Hive 
be able to read these tables?

[~hagleitn] [~omalley]


[jira] [Commented] (ORC-54) Evolve schemas based on field name rather than index

2016-06-16 Thread Mark Wagner (JIRA)


[ 
https://issues.apache.org/jira/browse/ORC-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333251#comment-15333251
 ] 

Mark Wagner commented on ORC-54:


[~mmccline], I've just posted a PR. Let me know your thoughts. Thanks!


[jira] [Commented] (ORC-54) Evolve schemas based on field name rather than index

2016-06-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ORC-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333242#comment-15333242
 ] 

ASF GitHub Bot commented on ORC-54:
---

GitHub user wagnermarkd opened a pull request:

https://github.com/apache/orc/pull/40

ORC-54: Evolve schemas based on field name rather than index

Change evolution to compare column names when building mapping.

Misc notes:
- I've added a setting orc.tolerate.missing.schema to allow working with 
data before HIVE-4243 was fixed. This is important for anyone who has older 
data, though it limits the evolutions that can be supported
- A lot of changes in RecordReaderImpl to make the reader column vs writer 
column difference explicit
- SARG tests are passing, but there's no test coverage for SARG + 
evolution. Will work on that.
- Removed some unneeded checked exceptions.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wagnermarkd/orc ORC-54-public

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/orc/pull/40.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #40


commit c24200900e55437a242034eda6a5cd5122cc3acc
Author: Mark Wagner 
Date:   2016-05-18T21:24:16Z

ORC-54: Evolve schemas based on field name rather than index





[jira] [Commented] (ORC-54) Evolve schemas based on field name rather than index

2016-06-07 Thread Matt McCline (JIRA)


[ 
https://issues.apache.org/jira/browse/ORC-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319680#comment-15319680
 ] 

Matt McCline commented on ORC-54:
-

[~mwagner] [~owen.omalley] [~hagleitn] I need to be involved in this.

