[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine

2020-04-14 Thread GitBox
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding 
proper default to hudi metadata fields and proper handling to rewrite routine
URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408598302
 
 

 ##
 File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
 ##
 @@ -104,15 +105,15 @@ public static Schema addMetadataFields(Schema schema) {
 List parentFields = new ArrayList<>();
 
 Schema.Field commitTimeField =
-new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field commitSeqnoField =
-new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field recordKeyField =
-new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field partitionPathField =
-new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field fileNameField =
-new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 
 Review comment:
   > parquet-avro reader breaks as it expects every field in parquet schema to 
be present in avro. I have a fix to parquet-avro to skip those fields, will 
send a PR to show so we can discuss whether it is a good idea or not
   
   Yes, please. Would love to see how you are approaching that fix. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine

2020-04-14 Thread GitBox
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding 
proper default to hudi metadata fields and proper handling to rewrite routine
URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408472216
 
 

 ##
 File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
 ##
 @@ -104,15 +105,15 @@ public static Schema addMetadataFields(Schema schema) {
 List parentFields = new ArrayList<>();
 
 Schema.Field commitTimeField =
-new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field commitSeqnoField =
-new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field recordKeyField =
-new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field partitionPathField =
-new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field fileNameField =
-new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 
 Review comment:
   Ok.. IMO hoodie metadata fields will never be written as null. If you 
strongly feel about doing this change, can we think of some alternate solution 
rather than going back to deprecated constructors. Also wanted to check with 
you how are you generating schema like you mentioned before? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine

2020-04-14 Thread GitBox
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding 
proper default to hudi metadata fields and proper handling to rewrite routine
URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408469635
 
 

 ##
 File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
 ##
 @@ -206,7 +207,11 @@ private static GenericRecord rewrite(GenericRecord 
record, LinkedHashSet
 GenericRecord newRecord = new GenericData.Record(newSchema);
 for (Schema.Field f : fieldsToWrite) {
   if (record.get(f.name()) == null) {
-newRecord.put(f.name(), f.defaultVal());
+if (f.defaultVal() instanceof Null) {
+  newRecord.put(f.name(), null);
 
 Review comment:
   Ok so when using f.defaultVal(), NullNode corresponds to JsonProperties.Null 
class and hence the test case fails. Good catch :) 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine

2020-04-14 Thread GitBox
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding 
proper default to hudi metadata fields and proper handling to rewrite routine
URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408445302
 
 

 ##
 File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
 ##
 @@ -206,7 +207,11 @@ private static GenericRecord rewrite(GenericRecord 
record, LinkedHashSet
 GenericRecord newRecord = new GenericData.Record(newSchema);
 for (Schema.Field f : fieldsToWrite) {
   if (record.get(f.name()) == null) {
-newRecord.put(f.name(), f.defaultVal());
+if (f.defaultVal() instanceof Null) {
+  newRecord.put(f.name(), null);
 
 Review comment:
   But these tests are passing on the master branch without the patch that you 
did. :) 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine

2020-04-14 Thread GitBox
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding 
proper default to hudi metadata fields and proper handling to rewrite routine
URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408445302
 
 

 ##
 File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
 ##
 @@ -206,7 +207,11 @@ private static GenericRecord rewrite(GenericRecord 
record, LinkedHashSet
 GenericRecord newRecord = new GenericData.Record(newSchema);
 for (Schema.Field f : fieldsToWrite) {
   if (record.get(f.name()) == null) {
-newRecord.put(f.name(), f.defaultVal());
+if (f.defaultVal() instanceof Null) {
+  newRecord.put(f.name(), null);
 
 Review comment:
   But these tests are passing on the master branch without the patch that you 
did. :) 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine

2020-04-14 Thread GitBox
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding 
proper default to hudi metadata fields and proper handling to rewrite routine
URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408436224
 
 

 ##
 File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
 ##
 @@ -104,15 +105,15 @@ public static Schema addMetadataFields(Schema schema) {
 List parentFields = new ArrayList<>();
 
 Schema.Field commitTimeField =
-new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field commitSeqnoField =
-new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field recordKeyField =
-new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field partitionPathField =
-new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field fileNameField =
-new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 
 Review comment:
   Ok.. I am still trying to understand what difference does it make if the 
schema is generated without defaultValue or if it is generated with 
defaultValue as null. Can you please provide more points to support this? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine

2020-04-14 Thread GitBox
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding 
proper default to hudi metadata fields and proper handling to rewrite routine
URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408435191
 
 

 ##
 File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
 ##
 @@ -206,7 +207,11 @@ private static GenericRecord rewrite(GenericRecord 
record, LinkedHashSet
 GenericRecord newRecord = new GenericData.Record(newSchema);
 for (Schema.Field f : fieldsToWrite) {
   if (record.get(f.name()) == null) {
-newRecord.put(f.name(), f.defaultVal());
+if (f.defaultVal() instanceof Null) {
+  newRecord.put(f.name(), null);
 
 Review comment:
   Which test you are talking about? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine

2020-04-14 Thread GitBox
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding 
proper default to hudi metadata fields and proper handling to rewrite routine
URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408425373
 
 

 ##
 File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
 ##
 @@ -206,7 +207,11 @@ private static GenericRecord rewrite(GenericRecord 
record, LinkedHashSet
 GenericRecord newRecord = new GenericData.Record(newSchema);
 for (Schema.Field f : fieldsToWrite) {
   if (record.get(f.name()) == null) {
-newRecord.put(f.name(), f.defaultVal());
+if (f.defaultVal() instanceof Null) {
+  newRecord.put(f.name(), null);
 
 Review comment:
   Why do you need this? If f.defaultVal() is null, it will be automatically 
populated like that. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine

2020-04-14 Thread GitBox
pratyakshsharma commented on a change in pull request #1513: [HUDI-793] Adding 
proper default to hudi metadata fields and proper handling to rewrite routine
URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r408423175
 
 

 ##
 File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
 ##
 @@ -104,15 +105,15 @@ public static Schema addMetadataFields(Schema schema) {
 List parentFields = new ArrayList<>();
 
 Schema.Field commitTimeField =
-new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field commitSeqnoField =
-new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field recordKeyField =
-new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field partitionPathField =
-new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field fileNameField =
-new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 
 Review comment:
   By doing this change, we are effectively going back to deprecated 
constructors again. Can you please explain the purpose of doing this? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services