[GitHub] [incubator-iceberg] aokolnychyi merged pull request #509: Remove flaky Hive table tests
aokolnychyi merged pull request #509: Remove flaky Hive table tests URL: https://github.com/apache/incubator-iceberg/pull/509 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] prodeezy commented on issue #506: Handling Date type conversion as epochDay integer ordinal
prodeezy commented on issue #506: Handling Date type conversion as epochDay integer ordinal URL: https://github.com/apache/incubator-iceberg/pull/506#issuecomment-536901662 Thanks @rdblue! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] mykolasmith closed issue #508: java.lang.IllegalStateException: Already closed file for partition
mykolasmith closed issue #508: java.lang.IllegalStateException: Already closed file for partition URL: https://github.com/apache/incubator-iceberg/issues/508 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] mykolasmith commented on issue #508: java.lang.IllegalStateException: Already closed file for partition
mykolasmith commented on issue #508: java.lang.IllegalStateException: Already closed file for partition URL: https://github.com/apache/incubator-iceberg/issues/508#issuecomment-537083082 Thanks, solved. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] rdblue commented on issue #509: Remove flaky Hive table tests
rdblue commented on issue #509: Remove flaky Hive table tests URL: https://github.com/apache/incubator-iceberg/pull/509#issuecomment-537106532 > do we think CATALOG_CACHE in HiveCatalogs is related to the problem with these tests? Yes! I forgot about that one, but it puts us over the limit: Spark will have its own connection from the built-in metastore. Then we have a pool from `HiveCatalogs` that is reused, with two connections. One more connection for the direct client used to clean up, and two more for the `HiveCatalog` instance. That comes to 6, which is over the limit. Maybe if we add back the change to share the client pool for cleanup with the catalog used in tests? Let's bring these tests back in a PR, but keep this in master for now to unblock the release and avoid so many test failures for PRs. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40
rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r330302946 ## File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java ## @@ -100,36 +106,46 @@ public Schema union(Schema union, List options) { } @Override + @SuppressWarnings("checkstyle:CyclomaticComplexity") public Schema array(Schema array, Schema element) { -if (array.getLogicalType() instanceof LogicalMap) { +if (array.getLogicalType() instanceof LogicalMap || AvroSchemaUtil.isKeyValueSchema(array.getElementType())) { Schema keyValue = array.getElementType(); - int keyId = AvroSchemaUtil.getFieldId(keyValue.getField("key")); - int valueId = AvroSchemaUtil.getFieldId(keyValue.getField("value")); + Integer keyId = fieldId(keyValue.getField("key")); + Integer valueId = fieldId(keyValue.getField("value")); + if (keyId == null) { +Preconditions.checkState(valueId == null, "Map schema %s has value id but not key id", array); Review comment: Makes sense This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40
rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r330357554 ## File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java ## @@ -100,36 +106,46 @@ public Schema union(Schema union, List options) { } @Override + @SuppressWarnings("checkstyle:CyclomaticComplexity") public Schema array(Schema array, Schema element) { -if (array.getLogicalType() instanceof LogicalMap) { +if (array.getLogicalType() instanceof LogicalMap || AvroSchemaUtil.isKeyValueSchema(array.getElementType())) { Schema keyValue = array.getElementType(); - int keyId = AvroSchemaUtil.getFieldId(keyValue.getField("key")); - int valueId = AvroSchemaUtil.getFieldId(keyValue.getField("value")); + Integer keyId = fieldId(keyValue.getField("key")); + Integer valueId = fieldId(keyValue.getField("value")); + if (keyId == null) { +Preconditions.checkState(valueId == null, "Map schema %s has value id but not key id", array); Review comment: For now I've added a log warning. Is adding logging for both [key missing and value present and value missing key present] a good idea, or do u see this to be a expected usage in which case I can even remove the existing logging This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40
rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r330357554 ## File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java ## @@ -100,36 +106,46 @@ public Schema union(Schema union, List options) { } @Override + @SuppressWarnings("checkstyle:CyclomaticComplexity") public Schema array(Schema array, Schema element) { -if (array.getLogicalType() instanceof LogicalMap) { +if (array.getLogicalType() instanceof LogicalMap || AvroSchemaUtil.isKeyValueSchema(array.getElementType())) { Schema keyValue = array.getElementType(); - int keyId = AvroSchemaUtil.getFieldId(keyValue.getField("key")); - int valueId = AvroSchemaUtil.getFieldId(keyValue.getField("value")); + Integer keyId = fieldId(keyValue.getField("key")); + Integer valueId = fieldId(keyValue.getField("value")); + if (keyId == null) { +Preconditions.checkState(valueId == null, "Map schema %s has value id but not key id", array); Review comment: For now I've added a log warning. Is adding logging for both [key missing and value present and value missing key present] a good idea, or do u see this to be a expected usage in which case I can remove even remove the existing logging This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40
rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r330287265 ## File path: core/src/main/java/org/apache/iceberg/avro/BuildAvroProjection.java ## @@ -165,13 +165,14 @@ public Schema array(Schema array, Supplier element) { try { Schema keyValueSchema = array.getElementType(); Schema.Field keyField = keyValueSchema.getFields().get(0); +Schema.Field keyProjection = element.get().getField("key"); Schema.Field valueField = keyValueSchema.getFields().get(1); Schema.Field valueProjection = element.get().getField("value"); // element was changed, create a new array -if (valueProjection.schema() != valueField.schema()) { +if (keyProjection.schema() != keyField.schema() || valueProjection.schema() != valueField.schema()) { return AvroSchemaUtil.createProjectionMap(keyValueSchema.getFullName(), - AvroSchemaUtil.getFieldId(keyField), keyField.name(), keyField.schema(), + AvroSchemaUtil.getFieldId(keyField), keyField.name(), keyProjection.schema(), Review comment: That makes sense. I'll remove This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40
rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r330305047 ## File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java ## @@ -58,21 +66,19 @@ public Schema record(Schema record, List names, List fields) { // case where the converted field is non-null is when a map or list is // selected by lower IDs. if (selectedIds.contains(fieldId)) { -filteredFields.add(copyField(field, field.schema())); +filteredFields.add(copyField(field, field.schema(), fieldId)); } else if (fieldSchema != null) { -hasChange = true; -filteredFields.add(copyField(field, fieldSchema)); +filteredFields.add(copyField(field, fieldSchema, fieldId)); } } -if (hasChange) { +if (filteredFields.size() > 0) { Review comment: Makes sense. I've updated the patch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #497: Support retaining last N snapshots
rdblue commented on a change in pull request #497: Support retaining last N snapshots URL: https://github.com/apache/incubator-iceberg/pull/497#discussion_r330158860 ## File path: core/src/main/java/org/apache/iceberg/RemoveSnapshots.java ## @@ -82,6 +83,18 @@ public ExpireSnapshots expireOlderThan(long timestampMillis) { return this; } + @Override + public ExpireSnapshots retainLast(int numSnapshots) { +Preconditions.checkArgument(1 < numSnapshots, +"Number of snapshots to retain must be at least 1, cannot be: %s", numSnapshots); +LOG.info("Retaining last {} snapshots and expiring older snapshots", numSnapshots); +Snapshot nthSnapshot = findNthSnapshot(numSnapshots); +if (nthSnapshot != null) { + expireOlderThan(nthSnapshot.timestampMillis()); Review comment: If both work with one another, there is no need to let the caller know if both are called. I think the right solution is to make the two settings compatible. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40
rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r330279824 ## File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java ## @@ -58,21 +66,19 @@ public Schema record(Schema record, List names, List fields) { // case where the converted field is non-null is when a map or list is // selected by lower IDs. if (selectedIds.contains(fieldId)) { -filteredFields.add(copyField(field, field.schema())); +filteredFields.add(copyField(field, field.schema(), fieldId)); } else if (fieldSchema != null) { -hasChange = true; -filteredFields.add(copyField(field, fieldSchema)); +filteredFields.add(copyField(field, fieldSchema, fieldId)); } } -if (hasChange) { +if (filteredFields.size() > 0) { Review comment: This shouldn't remove the `hasChange` flag. The purpose of `hasChange` is to signal when this is using the projection of a field. That way, if there are no known changed fields and all fields were projected, we use the original record. That's better for caching. Since you now need to detect the case where the field ID is missing so that that record gets recreated, you should add a case above that detects when it isn't there and sets `hasChange = true`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40
rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r330283342 ## File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java ## @@ -100,36 +106,46 @@ public Schema union(Schema union, List options) { } @Override + @SuppressWarnings("checkstyle:CyclomaticComplexity") public Schema array(Schema array, Schema element) { -if (array.getLogicalType() instanceof LogicalMap) { +if (array.getLogicalType() instanceof LogicalMap || AvroSchemaUtil.isKeyValueSchema(array.getElementType())) { Schema keyValue = array.getElementType(); - int keyId = AvroSchemaUtil.getFieldId(keyValue.getField("key")); - int valueId = AvroSchemaUtil.getFieldId(keyValue.getField("value")); + Integer keyId = fieldId(keyValue.getField("key")); + Integer valueId = fieldId(keyValue.getField("value")); + if (keyId == null) { +Preconditions.checkState(valueId == null, "Map schema %s has value id but not key id", array); Review comment: This could happen if the mapping has key, but not value. I'm not sure that it's a good idea to throw an exception in this case. Maybe just warn and not project? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org