[GitHub] [incubator-hudi] umehrot2 commented on issue #1406: [HUDI-713] Fix conversion of Spark array of struct type to Avro schema

2020-03-25 Thread GitBox
umehrot2 commented on issue #1406: [HUDI-713] Fix conversion of Spark array of 
struct type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1406#issuecomment-604137304
 
 
   > Sorry, did not mean to hijack this fix. Just trying to understand how 
it'll break compatibility while we are here. All this schema namespace business 
only matters before writing parquet files, right? Once you are able to write 
parquet, it should be readable by parquet-avro for merging (which has nothing 
to do with apache-spark-avro or databricks-spark-avro). What causes the 
breakage?
   
   All I can think of is, since the old namespace is stored in the 
`parquet.avro.schema` in the actual parquet file, it might conflict with the 
new schema that has a different namespace. 
   @zhedoubushishi is looking into this.
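
To make the suspected conflict concrete, here is a minimal sketch, assuming two record schemas that differ only in their namespace. The record name `trip` and the namespaces `example.old_ns` and `example.new_ns` are hypothetical placeholders, not the namespaces either Avro converter actually generates. It only shows that the Avro full names differ while the fields are identical. Whether that difference is what breaks merging is still the open question above.

```python
import json


def avro_fullname(schema):
    """Return the Avro full name of a named schema (record, enum, fixed)."""
    name = schema["name"]
    namespace = schema.get("namespace", "")
    # Per the Avro spec, a dotted name is already a full name and any
    # separate namespace attribute is ignored.
    if "." in name or not namespace:
        return name
    return namespace + "." + name


# Hypothetical schema as it might be stored in an old parquet file.
OLD_SCHEMA_JSON = """
{"type": "record", "name": "trip", "namespace": "example.old_ns",
 "fields": [{"name": "id", "type": "string"}]}
"""

# Hypothetical schema produced by the new converter for the same data.
NEW_SCHEMA_JSON = """
{"type": "record", "name": "trip", "namespace": "example.new_ns",
 "fields": [{"name": "id", "type": "string"}]}
"""

old_schema = json.loads(OLD_SCHEMA_JSON)
new_schema = json.loads(NEW_SCHEMA_JSON)

# The field lists are identical, but the full names are not.
same_fields = old_schema["fields"] == new_schema["fields"]
same_fullname = avro_fullname(old_schema) == avro_fullname(new_schema)

print("fields match:", same_fields)
print("full names match:", same_fullname)
```

In this sketch the fields compare equal while the full names do not, which is the kind of mismatch that could surface when a file's stored schema is compared with a newly generated one.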
   
   One good thing is that at least it should not affect users using 
`FilebasedSchemaProvider` or `SchemaRegistryProvider` with `DeltaStreamer`, in 
which case, from what I see, we directly use the schema the user has passed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] umehrot2 commented on issue #1406: [HUDI-713] Fix conversion of Spark array of struct type to Avro schema

2020-03-23 Thread GitBox
umehrot2 commented on issue #1406: [HUDI-713] Fix conversion of Spark array of 
struct type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1406#issuecomment-602933007
 
 
   > > > So anyone who has written data using databricks-avro will face issues 
reading.
   > 
   > By this you mean reading for merging data (i.e. during ingestion/writing) 
or querying via Spark/Hive/Presto?
   
   Yeah, I mean writing additional data using `spark-avro` on top of an old 
table written with `databricks-avro`. Querying should not be affected.




[GitHub] [incubator-hudi] umehrot2 commented on issue #1406: [HUDI-713] Fix conversion of Spark array of struct type to Avro schema

2020-03-23 Thread GitBox
umehrot2 commented on issue #1406: [HUDI-713] Fix conversion of Spark array of 
struct type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1406#issuecomment-602846762
 
 
   > LGTM overall..
   > 
   > @umehrot2 @zhedoubushishi generally speaking, this schema namespace 
mismatch: is this a backwards incompatible change? I.e. if people have 
written data using 0.5.1, could they use master/0.6.0 to read and write without 
pain?
   
   @vinothchandar with 0.5.1 you currently cannot even write some of these 
complex data types, such as arrays or structs. So this is actually a fix, and it 
is not backwards incompatible with 0.5.1 since that uses `spark-avro`. However, 
it will be backwards incompatible with `databricks-avro`. So anyone who has 
written data using `databricks-avro` will face issues reading.




[GitHub] [incubator-hudi] umehrot2 commented on issue #1406: [HUDI-713] Fix conversion of Spark array of struct type to Avro schema

2020-03-16 Thread GitBox
umehrot2 commented on issue #1406: [HUDI-713] Fix conversion of Spark array of 
struct type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1406#issuecomment-599776958
 
 
   > @umehrot2 are you interested in reviewing this? :)
   
   For sure. I have to review it internally as well either way :)

