[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408598302 ## File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java ## @@ -104,15 +105,15 @@ public static Schema addMetadataFields(Schema schema) { List parentFields = new ArrayList<>(); Schema.Field commitTimeField = -new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Schema.Field commitSeqnoField = -new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Schema.Field recordKeyField = -new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Schema.Field partitionPathField = -new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Schema.Field fileNameField = -new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Review comment: > parquet-avro reader breaks as it expects every field in parquet schema to be present in avro. I have a fix to parquet-avro to skip those fields, will send a PR to show so we can discuss whether it is a good idea or not Yes, please. Would love to see how you are approaching that fix. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408472216 ## File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java ## @@ -104,15 +105,15 @@ public static Schema addMetadataFields(Schema schema) { List parentFields = new ArrayList<>(); Schema.Field commitTimeField = -new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Schema.Field commitSeqnoField = -new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Schema.Field recordKeyField = -new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Schema.Field partitionPathField = -new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Schema.Field fileNameField = -new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Review comment: Ok.. IMO hoodie metadata fields will never be written as null. If you strongly feel about doing this change, can we think of some alternate solution rather than going back to deprecated constructors. Also wanted to check with you how are you generating schema like you mentioned before? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408469635 ## File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java ## @@ -206,7 +207,11 @@ private static GenericRecord rewrite(GenericRecord record, LinkedHashSet GenericRecord newRecord = new GenericData.Record(newSchema); for (Schema.Field f : fieldsToWrite) { if (record.get(f.name()) == null) { -newRecord.put(f.name(), f.defaultVal()); +if (f.defaultVal() instanceof Null) { + newRecord.put(f.name(), null); Review comment: Ok so when using f.defaultVal(), NullNode corresponds to JsonProperties.Null class and hence the test case fails. Good catch :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408445302 ## File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java ## @@ -206,7 +207,11 @@ private static GenericRecord rewrite(GenericRecord record, LinkedHashSet GenericRecord newRecord = new GenericData.Record(newSchema); for (Schema.Field f : fieldsToWrite) { if (record.get(f.name()) == null) { -newRecord.put(f.name(), f.defaultVal()); +if (f.defaultVal() instanceof Null) { + newRecord.put(f.name(), null); Review comment: But these tests are passing on the master branch without the patch that you did. :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408445302 ## File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java ## @@ -206,7 +207,11 @@ private static GenericRecord rewrite(GenericRecord record, LinkedHashSet GenericRecord newRecord = new GenericData.Record(newSchema); for (Schema.Field f : fieldsToWrite) { if (record.get(f.name()) == null) { -newRecord.put(f.name(), f.defaultVal()); +if (f.defaultVal() instanceof Null) { + newRecord.put(f.name(), null); Review comment: But these tests are passing on the master branch without the patch that you did. :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408436224 ## File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java ## @@ -104,15 +105,15 @@ public static Schema addMetadataFields(Schema schema) { List parentFields = new ArrayList<>(); Schema.Field commitTimeField = -new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Schema.Field commitSeqnoField = -new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Schema.Field recordKeyField = -new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Schema.Field partitionPathField = -new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Schema.Field fileNameField = -new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Review comment: Ok.. I am still trying to understand what difference does it make if the schema is generated without defaultValue or if it is generated with defaultValue as null. Can you please provide more points to support this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408435191 ## File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java ## @@ -206,7 +207,11 @@ private static GenericRecord rewrite(GenericRecord record, LinkedHashSet GenericRecord newRecord = new GenericData.Record(newSchema); for (Schema.Field f : fieldsToWrite) { if (record.get(f.name()) == null) { -newRecord.put(f.name(), f.defaultVal()); +if (f.defaultVal() instanceof Null) { + newRecord.put(f.name(), null); Review comment: Which test you are talking about? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408425373 ## File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java ## @@ -206,7 +207,11 @@ private static GenericRecord rewrite(GenericRecord record, LinkedHashSet GenericRecord newRecord = new GenericData.Record(newSchema); for (Schema.Field f : fieldsToWrite) { if (record.get(f.name()) == null) { -newRecord.put(f.name(), f.defaultVal()); +if (f.defaultVal() instanceof Null) { + newRecord.put(f.name(), null); Review comment: Why do you need this? If f.defaultVal() is null, it will be automatically populated like that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408423175 ## File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java ## @@ -104,15 +105,15 @@ public static Schema addMetadataFields(Schema schema) { List parentFields = new ArrayList<>(); Schema.Field commitTimeField = -new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Schema.Field commitSeqnoField = -new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Schema.Field recordKeyField = -new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Schema.Field partitionPathField = -new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Schema.Field fileNameField = -new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", (Object) null); +new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, METADATA_FIELD_SCHEMA, "", NullNode.getInstance()); Review comment: By doing this change, we are effectively going back to deprecated constructors again. Can you please explain the purpose of doing this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services