Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/12163 )
Change subject: IMPALA-7087: Read Parquet decimal columns with lower precision/scale ...................................................................... Patch Set 1: (1 comment) Thanks for taking a look @Csaba! I've only responded to your comment on overflow handling so far, as I want to get that behavior locked down first. http://gerrit.cloudera.org:8080/#/c/12163/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12163/1//COMMIT_MSG@25 PS1, Line 25: If the underlying Parquet files contain rows that cannot be converted to : the higher scale without overflow, an error is thrown and the query : fails. This is different from other engines such as Hive which set the : row to NULL and allow the query to run to completion without error. > I do not know the use case, so I may be wrong, but I dislike the idea of fa Yes, the example you provided would trigger an overflow. The formula you provided sounds like it should work. I couldn't find much info in the SQL standard on how to handle these types of overflows, so maybe the behavior is ambiguous. For relational databases, you get an error when you try to insert a row that causes an overflow. However, the error is upon row insertion, so the table is still queryable after a failed attempt at inserting the row. As mentioned in the comments, Hive sets the row to NULL, and the query runs to completion. So sounds like there are three options: (1) the current behavior in the patch, (2) use the formula you provided to prevent overflows in which case no runtime errors or warnings are necessary, or (3) set the row to NULL and print out a warning (I think we do this for other types of Parquet scanning errors?). FWIW, the current behavior is exactly what you described. If you put a Parquet file into a table with a lower precision / scale, the table will be unqueryable until the files are removed. So if we went with option (2), any file inside the table that doesn't match the formula will make the table unqueryable. The main difference with this patch is that the table will be unqueryable only if certain rows that overflow are present. I don't feel strongly about any of the approaches, and I'm not super familiar with how Impala handles these types of issues, so open to suggestions. -- To view, visit http://gerrit.cloudera.org:8080/12163 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafc8efd12379a39756e3e70f022a81a636dadb61 Gerrit-Change-Number: 12163 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Comment-Date: Mon, 07 Jan 2019 16:26:28 +0000 Gerrit-HasComments: Yes
