[GitHub] [incubator-iceberg] aokolnychyi merged pull request #509: Remove flaky Hive table tests

2019-10-01 Thread GitBox
aokolnychyi merged pull request #509: Remove flaky Hive table tests
URL: https://github.com/apache/incubator-iceberg/pull/509
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] prodeezy commented on issue #506: Handling Date type conversion as epochDay integer ordinal

2019-10-01 Thread GitBox
prodeezy commented on issue #506: Handling Date type conversion as epochDay 
integer ordinal 
URL: https://github.com/apache/incubator-iceberg/pull/506#issuecomment-536901662
 
 
   Thanks @rdblue! 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] mykolasmith closed issue #508: java.lang.IllegalStateException: Already closed file for partition

2019-10-01 Thread GitBox
mykolasmith closed issue #508: java.lang.IllegalStateException: Already closed 
file for partition
URL: https://github.com/apache/incubator-iceberg/issues/508
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] mykolasmith commented on issue #508: java.lang.IllegalStateException: Already closed file for partition

2019-10-01 Thread GitBox
mykolasmith commented on issue #508: java.lang.IllegalStateException: Already 
closed file for partition
URL: 
https://github.com/apache/incubator-iceberg/issues/508#issuecomment-537083082
 
 
   Thanks, solved.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on issue #509: Remove flaky Hive table tests

2019-10-01 Thread GitBox
rdblue commented on issue #509: Remove flaky Hive table tests
URL: https://github.com/apache/incubator-iceberg/pull/509#issuecomment-537106532
 
 
   > do we think CATALOG_CACHE in HiveCatalogs is related to the problem with 
these tests?
   
   Yes! I forgot about that one, but it puts us over the limit: Spark will have 
its own connection from the built-in metastore. Then we have a pool from 
`HiveCatalogs` that is reused, with two connections. One more connection for 
the direct client used to clean up, and two more for the `HiveCatalog` 
instance. That comes to 6, which is over the limit. Maybe if we add back the 
change to share the client pool for cleanup with the catalog used in tests?
   
   Let's bring these tests back in a PR, but keep this in master for now to 
unblock the release and avoid so many test failures for PRs.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-01 Thread GitBox
rdsr commented on a change in pull request #207: Add external schema mappings 
for files written with name-based schemas #40
URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r330302946
 
 

 ##
 File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java
 ##
 @@ -100,36 +106,46 @@ public Schema union(Schema union, List options) {
   }
 
   @Override
+  @SuppressWarnings("checkstyle:CyclomaticComplexity")
   public Schema array(Schema array, Schema element) {
-if (array.getLogicalType() instanceof LogicalMap) {
+if (array.getLogicalType() instanceof LogicalMap || 
AvroSchemaUtil.isKeyValueSchema(array.getElementType())) {
   Schema keyValue = array.getElementType();
-  int keyId = AvroSchemaUtil.getFieldId(keyValue.getField("key"));
-  int valueId = AvroSchemaUtil.getFieldId(keyValue.getField("value"));
+  Integer keyId = fieldId(keyValue.getField("key"));
+  Integer valueId = fieldId(keyValue.getField("value"));
+  if (keyId == null) {
+Preconditions.checkState(valueId == null, "Map schema %s has value id 
but not key id", array);
 
 Review comment:
   Makes sense


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-01 Thread GitBox
rdsr commented on a change in pull request #207: Add external schema mappings 
for files written with name-based schemas #40
URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r330357554
 
 

 ##
 File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java
 ##
 @@ -100,36 +106,46 @@ public Schema union(Schema union, List options) {
   }
 
   @Override
+  @SuppressWarnings("checkstyle:CyclomaticComplexity")
   public Schema array(Schema array, Schema element) {
-if (array.getLogicalType() instanceof LogicalMap) {
+if (array.getLogicalType() instanceof LogicalMap || 
AvroSchemaUtil.isKeyValueSchema(array.getElementType())) {
   Schema keyValue = array.getElementType();
-  int keyId = AvroSchemaUtil.getFieldId(keyValue.getField("key"));
-  int valueId = AvroSchemaUtil.getFieldId(keyValue.getField("value"));
+  Integer keyId = fieldId(keyValue.getField("key"));
+  Integer valueId = fieldId(keyValue.getField("value"));
+  if (keyId == null) {
+Preconditions.checkState(valueId == null, "Map schema %s has value id 
but not key id", array);
 
 Review comment:
   For now I've added a log warning. Is adding logging for both [key missing 
and value present and value missing key present] a good idea, or do u see this 
to be a expected usage in which case I can even remove the existing logging


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-01 Thread GitBox
rdsr commented on a change in pull request #207: Add external schema mappings 
for files written with name-based schemas #40
URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r330357554
 
 

 ##
 File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java
 ##
 @@ -100,36 +106,46 @@ public Schema union(Schema union, List options) {
   }
 
   @Override
+  @SuppressWarnings("checkstyle:CyclomaticComplexity")
   public Schema array(Schema array, Schema element) {
-if (array.getLogicalType() instanceof LogicalMap) {
+if (array.getLogicalType() instanceof LogicalMap || 
AvroSchemaUtil.isKeyValueSchema(array.getElementType())) {
   Schema keyValue = array.getElementType();
-  int keyId = AvroSchemaUtil.getFieldId(keyValue.getField("key"));
-  int valueId = AvroSchemaUtil.getFieldId(keyValue.getField("value"));
+  Integer keyId = fieldId(keyValue.getField("key"));
+  Integer valueId = fieldId(keyValue.getField("value"));
+  if (keyId == null) {
+Preconditions.checkState(valueId == null, "Map schema %s has value id 
but not key id", array);
 
 Review comment:
   For now I've added a log warning. Is adding logging for both [key missing 
and value present and value missing key present] a good idea, or do u see this 
to be a expected usage in which case I can remove even remove the existing 
logging


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-01 Thread GitBox
rdsr commented on a change in pull request #207: Add external schema mappings 
for files written with name-based schemas #40
URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r330287265
 
 

 ##
 File path: core/src/main/java/org/apache/iceberg/avro/BuildAvroProjection.java
 ##
 @@ -165,13 +165,14 @@ public Schema array(Schema array, Supplier 
element) {
   try {
 Schema keyValueSchema = array.getElementType();
 Schema.Field keyField = keyValueSchema.getFields().get(0);
+Schema.Field keyProjection = element.get().getField("key");
 Schema.Field valueField = keyValueSchema.getFields().get(1);
 Schema.Field valueProjection = element.get().getField("value");
 
 // element was changed, create a new array
-if (valueProjection.schema() != valueField.schema()) {
+if (keyProjection.schema() != keyField.schema() || 
valueProjection.schema() != valueField.schema()) {
   return 
AvroSchemaUtil.createProjectionMap(keyValueSchema.getFullName(),
-  AvroSchemaUtil.getFieldId(keyField), keyField.name(), 
keyField.schema(),
+  AvroSchemaUtil.getFieldId(keyField), keyField.name(), 
keyProjection.schema(),
 
 Review comment:
   That makes sense. I'll remove


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-01 Thread GitBox
rdsr commented on a change in pull request #207: Add external schema mappings 
for files written with name-based schemas #40
URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r330305047
 
 

 ##
 File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java
 ##
 @@ -58,21 +66,19 @@ public Schema record(Schema record, List names, 
List fields) {
   // case where the converted field is non-null is when a map or list is
   // selected by lower IDs.
   if (selectedIds.contains(fieldId)) {
-filteredFields.add(copyField(field, field.schema()));
+filteredFields.add(copyField(field, field.schema(), fieldId));
   } else if (fieldSchema != null) {
-hasChange = true;
-filteredFields.add(copyField(field, fieldSchema));
+filteredFields.add(copyField(field, fieldSchema, fieldId));
   }
 }
 
-if (hasChange) {
+if (filteredFields.size() > 0) {
 
 Review comment:
   Makes sense. I've updated the patch


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #497: Support retaining last N snapshots

2019-10-01 Thread GitBox
rdblue commented on a change in pull request #497: Support retaining last N 
snapshots
URL: https://github.com/apache/incubator-iceberg/pull/497#discussion_r330158860
 
 

 ##
 File path: core/src/main/java/org/apache/iceberg/RemoveSnapshots.java
 ##
 @@ -82,6 +83,18 @@ public ExpireSnapshots expireOlderThan(long 
timestampMillis) {
 return this;
   }
 
+  @Override
+  public ExpireSnapshots retainLast(int numSnapshots) {
+Preconditions.checkArgument(1 < numSnapshots,
+"Number of snapshots to retain must be at least 1, cannot be: %s", 
numSnapshots);
+LOG.info("Retaining last {} snapshots and expiring older snapshots", 
numSnapshots);
+Snapshot nthSnapshot = findNthSnapshot(numSnapshots);
+if (nthSnapshot != null) {
+  expireOlderThan(nthSnapshot.timestampMillis());
 
 Review comment:
   If both work with one another, there is no need to let the caller know if 
both are called. I think the right solution is to make the two settings 
compatible.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-01 Thread GitBox
rdblue commented on a change in pull request #207: Add external schema mappings 
for files written with name-based schemas #40
URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r330279824
 
 

 ##
 File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java
 ##
 @@ -58,21 +66,19 @@ public Schema record(Schema record, List names, 
List fields) {
   // case where the converted field is non-null is when a map or list is
   // selected by lower IDs.
   if (selectedIds.contains(fieldId)) {
-filteredFields.add(copyField(field, field.schema()));
+filteredFields.add(copyField(field, field.schema(), fieldId));
   } else if (fieldSchema != null) {
-hasChange = true;
-filteredFields.add(copyField(field, fieldSchema));
+filteredFields.add(copyField(field, fieldSchema, fieldId));
   }
 }
 
-if (hasChange) {
+if (filteredFields.size() > 0) {
 
 Review comment:
   This shouldn't remove the `hasChange` flag. The purpose of `hasChange` is to 
signal when this is using the projection of a field. That way, if there are no 
known changed fields and all fields were projected, we use the original record. 
That's better for caching.
   
   Since you now need to detect the case where the field ID is missing so that 
that record gets recreated, you should add a case above that detects when it 
isn't there and sets `hasChange = true`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-01 Thread GitBox
rdblue commented on a change in pull request #207: Add external schema mappings 
for files written with name-based schemas #40
URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r330283342
 
 

 ##
 File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java
 ##
 @@ -100,36 +106,46 @@ public Schema union(Schema union, List options) {
   }
 
   @Override
+  @SuppressWarnings("checkstyle:CyclomaticComplexity")
   public Schema array(Schema array, Schema element) {
-if (array.getLogicalType() instanceof LogicalMap) {
+if (array.getLogicalType() instanceof LogicalMap || 
AvroSchemaUtil.isKeyValueSchema(array.getElementType())) {
   Schema keyValue = array.getElementType();
-  int keyId = AvroSchemaUtil.getFieldId(keyValue.getField("key"));
-  int valueId = AvroSchemaUtil.getFieldId(keyValue.getField("value"));
+  Integer keyId = fieldId(keyValue.getField("key"));
+  Integer valueId = fieldId(keyValue.getField("value"));
+  if (keyId == null) {
+Preconditions.checkState(valueId == null, "Map schema %s has value id 
but not key id", array);
 
 Review comment:
   This could happen if the mapping has key, but not value. I'm not sure that 
it's a good idea to throw an exception in this case. Maybe just warn and not 
project?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org