Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12163 )

Change subject: IMPALA-7087: Read Parquet decimal columns with lower 
precision/scale
......................................................................


Patch Set 1:

> Starting a new thread, since the last one was getting long.
 >
 > I started working on getting this patch in line with Hive's
 > behavior - e.g. if an overflow occurs, set the row to NULL and emit
 > a warning. I hit an issue in the design of ScalarColumnReader::ConvertSlot.
 > The ConvertSlot method returns a bool: if true, execution
 > continues; if false, the query fails. There is no clean way for
 > ConvertSlot to set the tuple to NULL and continue processing.
 > ScalarColumnReader::ValidateValue on the other hand does allow for
 > this. If ValidateValue returns false, and the parse_status is ok()
 > then the tuple is set to NULL and execution continues. If it
 > returns false and the parse_status() is an error, it aborts the
 > query. If it returns true, execution continues.
 >
 > I checked and all current implementations of ConvertSlot always
 > return true, no matter what. So I propose changing ConvertSlot's
 > return semantics to match those of ValidateValue.
 >
 > Any objections?

I would still prefer the "exclude the whole file with warning if overflow is 
possible" approach, as it would:
- give a useful error message
- probably make conversion faster, as a simple multiplication would be enough
- be simpler and need less testing
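
To make the file-level idea concrete, here is a rough sketch of what such a
check could look like. The names DecimalType, CanConvertWithoutOverflow and
ScaleUp are hypothetical and not taken from the patch; the sketch also assumes
the table scale is not lower than the file scale and that values fit in an
int64_t (precision of at most 18).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a decimal column type (not an Impala class).
struct DecimalType {
  int precision;  // total number of decimal digits
  int scale;      // digits after the decimal point
};

// Returns true if every value of 'file_type' fits into 'table_type', so the
// whole file can be read without a per-value overflow check.
// Assumption: lowering the scale is treated as not convertible in this sketch.
bool CanConvertWithoutOverflow(const DecimalType& file_type,
                               const DecimalType& table_type) {
  if (table_type.scale < file_type.scale) return false;
  // Digits before the decimal point that a file value may use.
  int integer_digits = file_type.precision - file_type.scale;
  return integer_digits + table_type.scale <= table_type.precision;
}

// Rescales an unscaled decimal value by multiplying with 10^scale_diff.
// Assumption: the caller already verified that the result cannot overflow.
int64_t ScaleUp(int64_t unscaled, int scale_diff) {
  assert(scale_diff >= 0);
  int64_t multiplier = 1;
  for (int i = 0; i < scale_diff; ++i) multiplier *= 10;
  return unscaled * multiplier;
}
```

With this check done once per file, the per-value conversion reduces to the
multiplication in ScaleUp; files that fail the check would be skipped with a
warning.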

If this is not an option (because enabling the potentially overflowing 
conversion is really useful in some use cases), then I agree with changing 
ConvertSlot() as you described.
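
For reference, the ValidateValue-style return semantics described in the
quoted message could be summarized roughly as below. This is only a sketch:
ParseStatus, RowAction and InterpretResult are hypothetical stand-ins, not
the actual Impala types or functions.

```cpp
#include <string>

// Hypothetical stand-in for the reader's parse status (not Impala's Status).
struct ParseStatus {
  bool is_ok;
  std::string msg;
  bool ok() const { return is_ok; }
};

// What the caller does after ConvertSlot/ValidateValue returns.
enum class RowAction { kContinue, kSetNull, kAbortQuery };

// Maps the (return value, parse status) pair to the caller's action:
//   true                  -> keep the value and continue
//   false, status ok      -> set the slot to NULL and continue
//   false, status error   -> abort the query
RowAction InterpretResult(bool returned, const ParseStatus& parse_status) {
  if (returned) return RowAction::kContinue;
  return parse_status.ok() ? RowAction::kSetNull : RowAction::kAbortQuery;
}
```

Under this scheme an overflowing decimal would return false with an ok
status, so the value becomes NULL and a warning can be emitted without
failing the query.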


--
To view, visit http://gerrit.cloudera.org:8080/12163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafc8efd12379a39756e3e70f022a81a636dadb61
Gerrit-Change-Number: 12163
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 Jan 2019 22:11:01 +0000
Gerrit-HasComments: No
