[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #5984: Add row based schema validation code to detect schema mismatch

2020-09-09 Thread GitBox


Jackie-Jiang commented on pull request #5984:
URL: https://github.com/apache/incubator-pinot/pull/5984#issuecomment-689836238


   > Most of the cases can be covered by validating pinot schema and avro 
schema. One tricky thing is that when all the fields are required to be 
fetched, we convert the avro generic record to string first, then parse it as a 
json:
   > 
https://github.com/apache/incubator-pinot/blob/d54b04a2562f86dfb3adaa02ff400951d8108738/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java#L49
   > 
   > ```
   >   /**
   >* Converts from a GenericRecord to a json map
   >*/
   >   public static Map genericRecordToJson(GenericRecord 
genericRecord) {
   > try {
   >   String jsonString = genericRecord.toString();
   >   return DEFAULT_MAPPER.readValue(jsonString, new 
TypeReference>() {
   >   });
   > } catch (IOException e) {
   >   throw new IllegalStateException("Caught exception when converting 
generic record " + genericRecord + " to JSON");
   > }
   >   }
   > ```
   > 
   > The data type of the value from the k-v pair might get changed.
   
   I don't think we ever init record extractor without the fields. Also, 
converting to string then serializing as json doesn't seem correct. We should 
fix that instead of adding the row based validation.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #5984: Add row based schema validation code to detect schema mismatch

2020-09-08 Thread GitBox


Jackie-Jiang commented on pull request #5984:
URL: https://github.com/apache/incubator-pinot/pull/5984#issuecomment-689256109


   High level question, do we really need row level validation? What is the 
overhead of this validation?
   Seems like the validation only applies to the Avro reader. In that case, 
validating Pinot schema and Avro schema should be enough. IMO we are paying a 
lot of overhead for very little gain.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org