Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12163 )

Change subject: IMPALA-7087: Read Parquet decimal columns with lower 
precision/scale
......................................................................


Patch Set 1:

(1 comment)

Thanks for taking a look @Csaba! I've only responded to your comment on 
overflow handling so far, as I want to get that behavior locked down first.

http://gerrit.cloudera.org:8080/#/c/12163/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12163/1//COMMIT_MSG@25
PS1, Line 25: If the underlying Parquet files contain rows that cannot be 
converted to
            : the higher scale without overflow, an error is thrown and the 
query
            : fails. This is different from other engines such as Hive which 
set the
            : row to NULL and allow the query to run to completion without 
error.
> I do not know the use case, so I may be wrong, but I dislike the idea of fa
Yes, the example you provided would trigger an overflow. The formula you 
provided sounds like it should work.

I couldn't find much info in the SQL standard on how to handle these types of 
overflows, so maybe the behavior is ambiguous. For relational databases, you 
get an error when you try to insert a row that causes an overflow. However, the 
error is upon row insertion, so the table is still queryable after a failed 
attempt at inserting the row.

As mentioned in the comments, Hive sets the row to NULL, and the query runs to 
completion.

So sounds like there are three options: (1) the current behavior in the patch, 
(2) use the formula you provided to prevent overflows in which case no runtime 
errors or warnings are necessary, or (3) set the row to NULL and print out a 
warning (I think we do this for other types of Parquet scanning errors?).

FWIW, the current behavior is exactly what you described. If you put a Parquet 
file into a table with a lower precision / scale, the table will be unqueryable 
until the files are removed. So if we went with option (2), any file inside the 
table that doesn't match the formula will make the table unqueryable. The main 
difference with this patch is that the table will be unqueryable only if 
certain rows that overflow are present.

I don't feel strongly about any of the approaches, and I'm not super familiar 
with how Impala handles these types of issues, so open to suggestions.



--
To view, visit http://gerrit.cloudera.org:8080/12163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafc8efd12379a39756e3e70f022a81a636dadb61
Gerrit-Change-Number: 12163
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Comment-Date: Mon, 07 Jan 2019 16:26:28 +0000
Gerrit-HasComments: Yes

Reply via email to