[GitHub] [incubator-hudi] vinothchandar commented on pull request #1566: [HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode

2020-05-14 Thread GitBox


vinothchandar commented on pull request #1566:
URL: https://github.com/apache/incubator-hudi/pull/1566#issuecomment-628749732


   @bvaradar this and #1518 are again related. Can you take both of these home?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1566: [HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode

2020-05-05 Thread GitBox


vinothchandar commented on pull request #1566:
URL: https://github.com/apache/incubator-hudi/pull/1566#issuecomment-624114815


   >Also the above plan only works for the combination of AvroKafkaSource and SchemaRegistryProvider. Thoughts?
   
   All for improving the AvroKafkaSource/SR combo, since it's heavily used, but 
our framework needs to be improved generally.







[GitHub] [incubator-hudi] vinothchandar commented on pull request #1566: [HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode

2020-05-05 Thread GitBox


vinothchandar commented on pull request #1566:
URL: https://github.com/apache/incubator-hudi/pull/1566#issuecomment-624114147


   @pratyakshsharma @afilipchik IIUC, using the Confluent Avro Kafka decoders 
etc. will integrate with SR and fetch and decode using the latest schema for us, 
which we will use as the schema for the write as well. There is another PR 
tracking/fixing this (a lot of these schema PRs interplay quite a bit :))
   
   
   On the initial suggestion, @pratyakshsharma, I was merely suggesting a better 
contract for `SchemaProvider`, where `getSourceSchema()` is supposed to return 
the latest source schema as of that time, not a cached copy based on what was 
fetched in the constructor. Existing schema providers do a mix of these.
   
   Filebased/JdbcBased fetch the schema once in the constructor and keep serving 
it, whereas SchemaRegistry/RowBased fetch again when `getSourceSchema()` is 
called. So no need to create the SchemaRegistryProvider instance every run, 
simply call `getSourceSchema()` every run?
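
   To make the two behaviours concrete, here is a minimal, self-contained sketch. It is not Hudi code: the nested `SchemaProvider` interface, `CachingProvider`, `RefreshingProvider`, and the `Supplier`-based "registry" are all hypothetical stand-ins, and the schema is modelled as a plain `String` rather than an Avro schema. It only illustrates the difference between fetching once in the constructor and fetching on every `getSourceSchema()` call.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class SchemaProviderSketch {

    // Hypothetical stand-in for the provider contract discussed above.
    // A String is used in place of an Avro schema to keep the sketch dependency-free.
    interface SchemaProvider {
        String getSourceSchema();
    }

    // Fetches once in the constructor and keeps serving that copy
    // (the behaviour described for the Filebased/JdbcBased providers).
    static final class CachingProvider implements SchemaProvider {
        private final String schema;

        CachingProvider(Supplier<String> fetcher) {
            this.schema = fetcher.get();
        }

        @Override
        public String getSourceSchema() {
            return schema;
        }
    }

    // Fetches again on every call, so one long-lived instance always
    // returns the latest schema (the behaviour described for SchemaRegistry/RowBased).
    static final class RefreshingProvider implements SchemaProvider {
        private final Supplier<String> fetcher;

        RefreshingProvider(Supplier<String> fetcher) {
            this.fetcher = fetcher;
        }

        @Override
        public String getSourceSchema() {
            return fetcher.get();
        }
    }

    public static void main(String[] args) {
        // Assumption: a counter simulates a schema that evolves between runs.
        AtomicInteger version = new AtomicInteger(1);
        Supplier<String> registry = () -> "schema-v" + version.get();

        SchemaProvider cached = new CachingProvider(registry);
        SchemaProvider fresh = new RefreshingProvider(registry);

        // Simulate three runs of a continuous-mode loop, reusing both instances.
        for (int run = 1; run <= 3; run++) {
            System.out.println("run " + run
                + ": cached=" + cached.getSourceSchema()
                + ", fresh=" + fresh.getSourceSchema());
            version.incrementAndGet(); // the source schema evolves between runs
        }
    }
}
```

   In this sketch the caching provider keeps reporting the schema it saw at construction time, while the refreshing provider picks up each new version without being re-instantiated, which is the contract suggested above.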
   







[GitHub] [incubator-hudi] vinothchandar commented on pull request #1566: [HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode

2020-04-27 Thread GitBox


vinothchandar commented on pull request #1566:
URL: https://github.com/apache/incubator-hudi/pull/1566#issuecomment-620199568


   @afilipchik interested in taking a run at this? :) 


