Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8747 )

Change subject: IMPALA-5993: Fix the file offset in value parsing error
......................................................................


Patch Set 1:

(12 comments)

I looked through the change. I can't say that I completely understand the 
design of the C++ part, so my comments for that part are simple local ones.

http://gerrit.cloudera.org:8080/#/c/8747/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8747/7//COMMIT_MSG@11
PS7, Line 11: hit an error, it always prints the end of the file as the
nit: hits


http://gerrit.cloudera.org:8080/#/c/8747/7/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8747/7/be/src/exec/hdfs-scanner.h@365
PS7, Line 365:
typo: calculated


http://gerrit.cloudera.org:8080/#/c/8747/7/be/src/exec/hdfs-scanner.h@373
PS7, Line 373: ers to ex
typo: difference


http://gerrit.cloudera.org:8080/#/c/8747/7/be/src/exec/hdfs-scanner.h@373
PS7, Line 373: aths in Parquet scanner. Returns the status
Do you mean "taking read_bytes as argument instead of calculating it"?


http://gerrit.cloudera.org:8080/#/c/8747/7/be/src/exec/hdfs-scanner.h@375
PS7, Line 375:   Status ScannerDebugAction() WARN_UNUSED_RESULT {
nit: remove extra space


http://gerrit.cloudera.org:8080/#/c/8747/7/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8747/7/be/src/exec/hdfs-scanner.cc@497
PS7, Line 497:       // In this branch, we've just materialized slots not 
referenced by any conjunct.
Checking LogHasSpace() is not necessary here - LogError() checks if there is 
space for the log anyway, so the check is only useful to avoid the runtime cost 
of creating the error message if it would be dropped anyway. This is important 
if the logic is called a lot of time, for example for millions of rows. This 
part will be called only once (per fragment), as the function returns on the 
first error.


http://gerrit.cloudera.org:8080/#/c/8747/7/be/src/exec/hdfs-scanner.cc@675
PS7, Line 675:       stream_->filename(), line, pos, offset);
             :   state_->LogError(ErrorMsg(TErrorCode::GENERAL, s));
             : }
The Substitute should be moved inside the if block - as I mentioned in line 
497, the reason to call LogHasSpace() is to avoid the runtime cost of creating 
the error message.


http://gerrit.cloudera.org:8080/#/c/8747/7/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/8747/7/tests/query_test/test_scanners.py@847
PS7, Line 847:
             :
             :
             : 
             :
             :
             :
             :
             :
Can you explain why this function is needed? Other scanner test classes do not 
seem to do similar initialization.


http://gerrit.cloudera.org:8080/#/c/8747/7/tests/query_test/test_scanners.py@875
PS7, Line 875:
This step could be skipped, if the the table was created first (which creates 
an HDFS directory) and the data was copied afterwards.


http://gerrit.cloudera.org:8080/#/c/8747/7/tests/query_test/test_scanners.py@890
PS7, Line 890:
Why not use the default location ( /test-warehouse/$db_name.db/$table_name )? 
This line could be removed in that case.

Note that assuming that the databases are in /test-warehouse/ makes the tests 
less flexible, but other tests already rely on that, and the tests are expected 
to run on the Impala minicluster - many tests would fail on a different 
cluster, for example planner tests are sensitive to many cluster parameters. So 
I think that it is not a goal at the moment to make the tests runnable on other 
clusters. If it will be a goal, then a general solution should be created 
(probably in ImpalaTestSuite), not custom solutions for different tests.


http://gerrit.cloudera.org:8080/#/c/8747/7/tests/query_test/test_scanners.py@893
PS7, Line 893:
             :
nit: indentation and spaces around '='

In Impala, new blocks are indented with 2 spaces, while continuations of broken 
lines are intended with 4 spaces (most of the times...). This is a difference 
compared to PEP8, but makes the python code more consistent with C++ and Java 
code in Impala.

See the python part of 
https://cwiki.apache.org/confluence/display/IMPALA/Impala+Style+Guide for the 
list differences from PEP8.


http://gerrit.cloudera.org:8080/#/c/8747/7/tests/query_test/test_scanners.py@899
PS7, Line 899:
I would prefer to use self.execute_query(). The result it returns has a .log 
member, which can be split by "\n" to get the list of warnings.

Another way to do this is to create a .test file in testdata/workloads, list 
the warnings in the "---- ERRORS" section and run it with self.run_test_case(). 
See 
https://github.com/apache/impala/blob/89b29d3a89cca041c541476e25435dc0d34344c1/tests/common/test_result_verifier.py#L314
 for the way it processes the logs.



--
To view, visit http://gerrit.cloudera.org:8080/8747
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f
Gerrit-Change-Number: 8747
Gerrit-PatchSet: 1
Gerrit-Owner: Kim Jin Chul <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Kim Jin Chul <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Comment-Date: Tue, 03 Apr 2018 15:28:43 +0000
Gerrit-HasComments: Yes

Reply via email to