Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13189 )

Change subject: IMPALA-7370: DATE: Read/Write to parquet.
......................................................................


Patch Set 2:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/13189/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13189/2//COMMIT_MSG@12
PS2, Line 12: Parquet uses DATE logical type for dates. DATE logical type 
annotates
            : an INT32 that stores the number of days from the Unix epoch, 1 
January
            : 1970.
> Gregorian stuff + Hive could be mentioned here, and I would also prefer to
Done. Added some e2e tests too.


http://gerrit.cloudera.org:8080/#/c/13189/2/be/src/exec/parquet/parquet-column-readers.cc
File be/src/exec/parquet/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/13189/2/be/src/exec/parquet/parquet-column-readers.cc@957
PS2, Line 957:   // The range was already checked during the int32_t->DateValue 
conversion, which
             :   // sets the date to invalid if it was out of range.
> optional:
I'll leave it like this for now. We can figure out a better way later.


http://gerrit.cloudera.org:8080/#/c/13189/2/be/src/exec/parquet/parquet-column-stats.inline.h
File be/src/exec/parquet/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/13189/2/be/src/exec/parquet/parquet-column-stats.inline.h@171
PS2, Line 171:   DateValue* result = reinterpret_cast<DateValue*>(slot);
             :   int size = buffer.size();
             :   const uint8_t* data = reinterpret_cast<const 
uint8_t*>(buffer.data());
             :   if (ParquetPlainEncoder::Decode<DateValue, 
parquet::Type::INT32>(data, data + size,
             :       size, result) == -1) {
             :     return false;
             :   }
> This looks very similar to line 148-155. It would be nice to extract it int
Done


http://gerrit.cloudera.org:8080/#/c/13189/2/be/src/exec/parquet/parquet-common.h
File be/src/exec/parquet/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/13189/2/be/src/exec/parquet/parquet-common.h@370
PS2, Line 370: i
> nit: "i" suggests a loop variable to me, so I would prefer a name like tmp
Done


http://gerrit.cloudera.org:8080/#/c/13189/2/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

http://gerrit.cloudera.org:8080/#/c/13189/2/common/thrift/generate_error_codes.py@408
PS2, Line 408:   ("PARQUET_DATE_OUT_OF_RANGE", 134,
             :    "Parquet file '$0' column '$1' contains an out of range date. 
"
             :    "The valid date range is 0000-01-01..9999-12-31."),
> Can you create file with a corrupt value to cover related paths?
I've added the e2e test. This test revealed a missing bit in my implementation, 
thanks!


http://gerrit.cloudera.org:8080/#/c/13189/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/13189/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1267
PS2, Line 1267:     // Right now, scanning DATE values is only supported for 
TEXT and PARQUET fileformats.
> I feel that listing unsupported formats could be done right above findUnsup
Done


http://gerrit.cloudera.org:8080/#/c/13189/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1341
PS2, Line 1341: ff != HdfsFileFormat.TEXT && ff != HdfsFileFormat.PARQUET
> A function like "IsTypeSupported()" or "GetSupportedType()" could be added
Done


http://gerrit.cloudera.org:8080/#/c/13189/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1341
PS2, Line 1341:  if (ff != HdfsFileFormat.TEXT && ff != HdfsFileFormat.PARQUET) 
{
              :         return part;
              :       }
> nit: can this be merged into one line?
Done


http://gerrit.cloudera.org:8080/#/c/13189/2/testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test
File 
testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test:

http://gerrit.cloudera.org:8080/#/c/13189/2/testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test@7
PS2, Line 7: ---- QUERY
> nit: Would it make sense to merge this step with the previous one as it doe
Added expected results to the previous section. I think it makes more sense 
than merging the two test cases together.


http://gerrit.cloudera.org:8080/#/c/13189/2/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test:

http://gerrit.cloudera.org:8080/#/c/13189/2/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@236
PS2, Line 236: # Note: dictionary filtering currently does not work on dates
> IMPALA-4994 could be mentioned here, it would make it easier to cleanup rel
Done


http://gerrit.cloudera.org:8080/#/c/13189/2/tests/common/impala_connection.py
File tests/common/impala_connection.py:

http://gerrit.cloudera.org:8080/#/c/13189/2/tests/common/impala_connection.py@285
PS2, Line 285:   r = None
             :     try:
             :       r = self.__fetch_results(handle, 
profile_format=profile_format)
             :     finally:
             :       if r is None:
             :         # Try to close the query handle but ignore any 
exceptions not to overwrite the
             :         # original exception raised by '__fetch_results'.
             :         try:
             :           self.close_query(handle)
             :         except Exception:
             :           pass
             :       else:
             :         self.close_query(handle)
             :         return r
> Is this an intentional change for "date on parquet"? Isn't this an unrelate
I decided to include this fix here as it broke some of the hs2 DATE tests in 
test_date_queries.py.

I could create a separate patch set only for this, although probably that would 
be an overkill.


http://gerrit.cloudera.org:8080/#/c/13189/2/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

http://gerrit.cloudera.org:8080/#/c/13189/2/tests/query_test/test_insert_parquet.py@667
PS2, Line 667:     values.
> nit: Date could be mentioned in this comment + the reason for handling it s
Done


http://gerrit.cloudera.org:8080/#/c/13189/2/tests/query_test/test_insert_parquet.py@669
PS2, Line 669:     # Expected values for functional.date_tbl
> nit: for me this comment doesn't add any value
Removed it.



-- 
To view, visit http://gerrit.cloudera.org:8080/13189
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67da03754531660bc8de3b6935580d46deae1814
Gerrit-Change-Number: 13189
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Fri, 03 May 2019 15:15:03 +0000
Gerrit-HasComments: Yes

Reply via email to