Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12163 )
Change subject: IMPALA-7087: Read Parquet decimal columns with lower precision/scale
......................................................................


Patch Set 1:

> Starting a new thread, since the last one was getting long.
>
> I started working on getting this patch in line with Hive's
> behavior - e.g. if an overflow occurs, set the row to NULL and emit
> a warning. I hit an issue in the design of ScalarColumnReader::ConvertSlot.
> The ConvertSlot method returns a bool, if true the execution
> continues, if false the query fails. There is no clean way for
> ConvertSlot to set the tuple to NULL and continue processing.
> ScalarColumnReader::ValidateValue on the other hand does allow for
> this. If ValidateValue returns false, and the parse_status is ok()
> then the tuple is set to NULL and execution continues. If it
> returns false and the parse_status() is an error, it aborts the
> query. If it returns true, execution continues.
>
> I checked and all current implementations of ConvertSlot always
> return true, no matter what. So I propose changing ConvertSlot's
> return semantics to match those of ValidateValue.
>
> Any objections?

I would still prefer the "exclude the whole file with warning if overflow is possible" approach, as it would:
- give a useful error message
- probably make conversion faster, as a simple multiplication would be enough
- be simpler and need less testing

If this is not an option (because enabling the potentially overflowing conversion is really useful in some use cases), then I agree with changing ConvertSlot() as you described.
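To make the two options concrete, here is a rough, standalone C++ sketch. It is not Impala code: ParseStatus, Outcome, Interpret, ConversionCannotOverflow and this toy ConvertSlot are hypothetical names invented for illustration, and the toy assumes unscaled values that fit in int64_t (precision up to 18) and a target scale no smaller than the file scale. It only mirrors the return semantics described above for ValidateValue, plus a file-level precision/scale check of the kind the "exclude the whole file" approach would need.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a parse status; not Impala's Status class.
struct ParseStatus {
  bool is_ok = true;
  std::string msg;
  bool ok() const { return is_ok; }
};

enum class Outcome { kContinue, kSetNull, kAbortQuery };

// Maps (return value, parse status) to the three outcomes described for
// ValidateValue: true -> continue; false + ok -> NULL; false + error -> abort.
Outcome Interpret(bool returned, const ParseStatus& parse_status) {
  if (returned) return Outcome::kContinue;
  return parse_status.ok() ? Outcome::kSetNull : Outcome::kAbortQuery;
}

// File-level check: true if every DECIMAL(file_p, file_s) value fits in
// DECIMAL(tbl_p, tbl_s), i.e. no fractional or integer digits can be lost.
bool ConversionCannotOverflow(int file_p, int file_s, int tbl_p, int tbl_s) {
  return tbl_s >= file_s && (tbl_p - tbl_s) >= (file_p - file_s);
}

int64_t Pow10(int n) {
  assert(n >= 0 && n <= 18);  // 10^18 still fits in int64_t
  int64_t result = 1;
  for (int i = 0; i < n; ++i) result *= 10;
  return result;
}

// Toy per-value conversion with the proposed semantics. Rescales the unscaled
// value in *slot from file_s to tbl_s. Returns false with an ok status on
// overflow (caller sets NULL), false with an error status on unsupported input.
bool ConvertSlot(int64_t* slot, int file_s, int tbl_p, int tbl_s,
                 ParseStatus* status) {
  if (file_s < 0 || tbl_s < file_s || tbl_s > tbl_p || tbl_p > 18) {
    status->is_ok = false;
    status->msg = "unsupported decimal conversion";
    return false;
  }
  const int64_t factor = Pow10(tbl_s - file_s);
  // |value * factor| must stay below 10^tbl_p, so |value| must be below limit.
  const int64_t limit = Pow10(tbl_p) / factor;
  if (*slot >= limit || *slot <= -limit) return false;  // overflow -> NULL
  *slot *= factor;
  return true;
}
```

If ConversionCannotOverflow() holds for a file, the per-value work reduces to the single multiplication in the last step and the overflow branch is never taken; otherwise each value needs the range check, and the caller needs the three-way interpretation to turn an overflow into NULL rather than a query failure.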
--
To view, visit http://gerrit.cloudera.org:8080/12163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafc8efd12379a39756e3e70f022a81a636dadb61
Gerrit-Change-Number: 12163
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 Jan 2019 22:11:01 +0000
Gerrit-HasComments: No