Kim Jin Chul has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8747 )

Change subject: IMPALA-5993: Fix the file offset in value parsing error
......................................................................


Patch Set 7:

(25 comments)

Hi Lars,

Sorry I thought I already replied your comments, but I didn't submit a botton 
when I answered them. Now the latest patch set is ready for your review. Thanks 
for your notification.

http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@368
PS6, Line 368:   /// to the error.
> Explain what error_file_offsets does in comment
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@382
PS6, Line 382: r. Ret
> Wouldn't this report an invalid value (column inside row)?
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@382
PS6, Line 382:   /// readers to exercise various failure paths in Parquet 
scanner. Returns the status
> It seems confusing to me that there is now a LogColumnParseError and Report
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@391
PS6, Line 391:
> Start comments with three /// like the other lines.
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@393
PS6, Line 393:  tuple (e.g. fields[0] ma
> No need to mention its type since it's the same as in the declaration itsel
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@408
PS6, Line 408:   /// the reason that the out error parameters are typed uint8_t 
instead of bool. We need
> This looks like it could have some performance impact. Did you do any perf
Done, the computation is moved to the error reporting function. By the way, 
HdfsSequenceScanner::ProcessRange needs the computation because I could not 
find a way to apply the computation logic. See line#337 
https://gerrit.cloudera.org/#/c/8747/7/be/src/exec/hdfs-sequence-scanner.cc


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@233
PS6, Line 233:     uint8_t* error_fields, uint8_t* error_in_row) {
> nit: wrap at 90 characters.
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@254
PS6, Line 254:   return EvalConjuncts(tuple_row);
> Is +1 for the delimiter? If so, can you add a brief comment?
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@578
PS6, Line 578:   DCHECK_EQ(replaced, 1);
> Please indent function calls 4 characters, here and elsewhere.
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@759
PS6, Line 759:
> nit: line wrapping.
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@761
PS6, Line 761:
> Why does this not check state_->HasLogSpace() but the method below does?
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@768
PS6, Line 768:
> I think the message was clearer before the change.
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/8747/6/testdata/data/README@136
PS6, Line 136: signed_integer_logical_types
> Can you think of a name that captures the nature of the problems? Most of t
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@30
PS6, Line 30: from parquet.ttypes import ConvertedType
> I think elsewhere we just use "os.path.join()" when we need it, so for cons
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@841
PS6, Line 841:
             :   @classmethod
> nit: missing word?
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@842
PS6, Line 842:   @classmethod
             :   def get_workload(cls):
> Rephrase to say what it should do, not what the problem was.
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@852
PS6, Line 852:       self.hdfs_client = get_hdfs_client_from_conf(hdfs_conf)
> If you change this to "if p.c.o...: " it will also cover the empty string.
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@860
PS6, Line 860:       "invalid_int":
> I think it might be cleaner if every test prepared their own test data sepa
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@862
PS6, Line 862:         "dir_path": os.path.join(tmpdir.strpath, "invalid_int"),
> This will break with parallel test runs. Use a tmpdir fixture for pytest in
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@876
PS6, Line 876: put
> nit: putting
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@888
PS6, Line 888:       WITH SERDEPROPERTIES ('field.delim'='\t', 
'serialization.format'='\t')
> nit: Will this fit within 90 characters of the previous line?
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@891
PS6, Line 891:     """
> Might be better to inline this to reduce complexity?
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@911
PS6, Line 911: "Er
> Can you think of a better name?
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@917
PS6, Line 917:
> Is each of these errors required? Can the order of the messages ever change
Done


http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@930
PS6, Line 930:
> These should be separate test_..() methods.
Done



--
To view, visit http://gerrit.cloudera.org:8080/8747
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f
Gerrit-Change-Number: 8747
Gerrit-PatchSet: 7
Gerrit-Owner: Kim Jin Chul <[email protected]>
Gerrit-Reviewer: Kim Jin Chul <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Comment-Date: Tue, 16 Jan 2018 19:51:07 +0000
Gerrit-HasComments: Yes

Reply via email to