[GitHub] [hudi] nsivabalan commented on issue #2675: [SUPPORT] Unable to query MOR table after schema evolution

2021-04-02 Thread GitBox


nsivabalan commented on issue #2675:
URL: https://github.com/apache/hudi/issues/2675#issuecomment-812710497


   Closing this, as we have a tracking JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on issue #2675: [SUPPORT] Unable to query MOR table after schema evolution

2021-03-30 Thread GitBox


nsivabalan commented on issue #2675:
URL: https://github.com/apache/hudi/issues/2675#issuecomment-810288191


   There are two code paths in HoodieSparkSqlWriter:
   (1) AvroConversionUtils.convertStructTypeToAvroSchema(df.schema, structName, nameSpace)
   (2) HoodieSparkUtils.createRdd(df, schema, structName, nameSpace)
   
   (1) uses SchemaConverters.toAvroType(...)
   (2) uses our custom converter function (createConverterToAvro) in AvroConversionHelper.
   
   What I meant is that fixing (1) is strictly needed, and that is what I tried out. Fixing (2) is not strictly required, since that schema does not get serialized in the commit metadata. But yes, we can try to keep both in sync. I am all for it.
   
   Regarding testing:
   - You can run the usual unit tests and integration tests. [This](https://github.com/apache/hudi) should have details on running tests.
   - I assume you will write tests covering schema evolution for the new code you put up.
   - For testing schema evolution, you can follow the steps you used to report this issue. We don't have end-to-end schema evolution tests for MOR, as you might have realized from this issue.
   
   






[GitHub] [hudi] nsivabalan commented on issue #2675: [SUPPORT] Unable to query MOR table after schema evolution

2021-03-27 Thread GitBox


nsivabalan commented on issue #2675:
URL: https://github.com/apache/hudi/issues/2675#issuecomment-808828818


   Yes, your approach should work. The only change is that we might have to fix it where we generate the Avro schema from the df schema in HoodieSparkSqlWriter. E.g.:
https://github.com/nsivabalan/hudi/commit/43b3fc845a7b2ea4c68f1b3fc3e13b41bfb2d17e
   (My regenerateSchema method is not full-fledged, but it does work for MOR with an evolved schema for the string type. The actual fix should look like what you have in your commit.) We need to fix the schema in HoodieSparkSqlWriter because that is what gets serialized in the commit metadata. I am not sure whether we need to fix HoodieSparkUtils.createRdd().
   
   Please go ahead and open up a PR. Would be happy to review. 
   






[GitHub] [hudi] nsivabalan commented on issue #2675: [SUPPORT] Unable to query MOR table after schema evolution

2021-03-24 Thread GitBox


nsivabalan commented on issue #2675:
URL: https://github.com/apache/hudi/issues/2675#issuecomment-806348073


   Yes, you are right. I was able to reproduce the issue (local Spark) and have filed a [bug](https://issues.apache.org/jira/browse/HUDI-1716).
   I am yet to try out the Hive issue, but it could be the same. Appreciate any contribution :)






[GitHub] [hudi] nsivabalan commented on issue #2675: [SUPPORT] Unable to query MOR table after schema evolution

2021-03-24 Thread GitBox


nsivabalan commented on issue #2675:
URL: https://github.com/apache/hudi/issues/2675#issuecomment-805805044


   1. Do you use the RowBasedSchemaProvider, and hence can't explicitly provide a schema? If you were to use your own schema registry, you might as well provide an updated schema to Hudi while writing.
   2. Got it. It would be nice to have a contribution. I can help review the patch.
   In the meantime, I will try out schema evolution on my end with a local setup.






[GitHub] [hudi] nsivabalan commented on issue #2675: [SUPPORT] Unable to query MOR table after schema evolution

2021-03-22 Thread GitBox


nsivabalan commented on issue #2675:
URL: https://github.com/apache/hudi/issues/2675#issuecomment-804178486


   You can add null as the default value for your new field, if that would work for you.
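
To make the suggestion concrete, here is a minimal sketch of what "null as the default" looks like in an Avro record schema. It uses plain Python dicts rather than any Avro or Hudi API, and the record name `Trip`, the field names, and the helper `add_nullable_field` are all hypothetical. The new field is declared as a union with `"null"` listed first, since the Avro specification has traditionally tied a union field's default to the first branch of the union.

```python
import copy
import json

# Hypothetical schema the table was originally written with.
original_schema = {
    "type": "record",
    "name": "Trip",
    "fields": [{"name": "id", "type": "string"}],
}


def add_nullable_field(schema, name, avro_type):
    """Return a copy of `schema` with an extra optional field.

    The field is a union of "null" and `avro_type`, with null as the
    default, so records written before the field existed can still be read.
    """
    evolved = copy.deepcopy(schema)
    evolved["fields"].append(
        {"name": name, "type": ["null", avro_type], "default": None}
    )
    return evolved


evolved_schema = add_nullable_field(original_schema, "rider", "string")

# Python's None is serialized as JSON null, which is what Avro expects.
print(json.dumps(evolved_schema, indent=2))
```

Whether a nullable field is acceptable depends on the data. If the column must never be null, a non-null default of the right type would be the alternative.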






[GitHub] [hudi] nsivabalan commented on issue #2675: [SUPPORT] Unable to query MOR table after schema evolution

2021-03-22 Thread GitBox


nsivabalan commented on issue #2675:
URL: https://github.com/apache/hudi/issues/2675#issuecomment-804177670


   Yeah, Hudi just relies on Avro's schema compatibility in general. From the [specification](http://avro.apache.org/docs/current/spec.html#Schema+Resolution), it looks like adding a new field without a default will error out.
   ```
   if the reader's record schema has a field with no default value, and 
writer's schema does not have a field with the same name, an error is signalled.
   ```
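
The quoted rule can be illustrated with a small, simplified check. This is only a sketch of that one rule for top-level record fields, written against plain dicts. It is not Avro's resolver and not Hudi code: it ignores aliases, nested records, and type promotion, and the schemas and the function name `unresolved_reader_fields` are hypothetical.

```python
def unresolved_reader_fields(writer_schema, reader_schema):
    """List reader fields that would make resolution fail under the quoted rule.

    A reader field is a problem when the writer schema has no field with the
    same name and the reader field declares no default.
    """
    writer_names = {field["name"] for field in writer_schema["fields"]}
    return [
        field["name"]
        for field in reader_schema["fields"]
        if field["name"] not in writer_names and "default" not in field
    ]


# Hypothetical schema used when the older files were written.
writer = {"type": "record", "name": "Trip",
          "fields": [{"name": "id", "type": "string"}]}

# Evolved reader schema: new field added without a default.
reader_without_default = {"type": "record", "name": "Trip", "fields": [
    {"name": "id", "type": "string"},
    {"name": "rider", "type": "string"},
]}

# Evolved reader schema: same field, but nullable with a null default.
reader_with_default = {"type": "record", "name": "Trip", "fields": [
    {"name": "id", "type": "string"},
    {"name": "rider", "type": ["null", "string"], "default": None},
]}

print(unresolved_reader_fields(writer, reader_without_default))
print(unresolved_reader_fields(writer, reader_with_default))
```

The first call reports `rider` as unresolvable, while the second returns an empty list because the field carries a default.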

