Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 )
Change subject: IMPALA-5993: Fix the file offset in value parsing error ...................................................................... Patch Set 7: (25 comments) Hi Lars, Sorry I thought I already replied your comments, but I didn't submit a botton when I answered them. Now the latest patch set is ready for your review. Thanks for your notification. http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@368 PS6, Line 368: /// to the error. > Explain what error_file_offsets does in comment Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@382 PS6, Line 382: r. Ret > Wouldn't this report an invalid value (column inside row)? Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@382 PS6, Line 382: /// readers to exercise various failure paths in Parquet scanner. Returns the status > It seems confusing to me that there is now a LogColumnParseError and Report Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@391 PS6, Line 391: > Start comments with three /// like the other lines. Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@393 PS6, Line 393: tuple (e.g. fields[0] ma > No need to mention its type since it's the same as in the declaration itsel Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@408 PS6, Line 408: /// the reason that the out error parameters are typed uint8_t instead of bool. We need > This looks like it could have some performance impact. Did you do any perf Done, the computation is moved to the error reporting function. By the way, HdfsSequenceScanner::ProcessRange needs the computation because I could not find a way to apply the computation logic. See line#337 https://gerrit.cloudera.org/#/c/8747/7/be/src/exec/hdfs-sequence-scanner.cc http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@233 PS6, Line 233: uint8_t* error_fields, uint8_t* error_in_row) { > nit: wrap at 90 characters. Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@254 PS6, Line 254: return EvalConjuncts(tuple_row); > Is +1 for the delimiter? If so, can you add a brief comment? Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@578 PS6, Line 578: DCHECK_EQ(replaced, 1); > Please indent function calls 4 characters, here and elsewhere. Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@759 PS6, Line 759: > nit: line wrapping. Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@761 PS6, Line 761: > Why does this not check state_->HasLogSpace() but the method below does? Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@768 PS6, Line 768: > I think the message was clearer before the change. Done http://gerrit.cloudera.org:8080/#/c/8747/6/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/8747/6/testdata/data/README@136 PS6, Line 136: signed_integer_logical_types > Can you think of a name that captures the nature of the problems? Most of t Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@30 PS6, Line 30: from parquet.ttypes import ConvertedType > I think elsewhere we just use "os.path.join()" when we need it, so for cons Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@841 PS6, Line 841: : @classmethod > nit: missing word? Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@842 PS6, Line 842: @classmethod : def get_workload(cls): > Rephrase to say what it should do, not what the problem was. Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@852 PS6, Line 852: self.hdfs_client = get_hdfs_client_from_conf(hdfs_conf) > If you change this to "if p.c.o...: " it will also cover the empty string. Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@860 PS6, Line 860: "invalid_int": > I think it might be cleaner if every test prepared their own test data sepa Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@862 PS6, Line 862: "dir_path": os.path.join(tmpdir.strpath, "invalid_int"), > This will break with parallel test runs. Use a tmpdir fixture for pytest in Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@876 PS6, Line 876: put > nit: putting Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@888 PS6, Line 888: WITH SERDEPROPERTIES ('field.delim'='\t', 'serialization.format'='\t') > nit: Will this fit within 90 characters of the previous line? Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@891 PS6, Line 891: """ > Might be better to inline this to reduce complexity? Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@911 PS6, Line 911: "Er > Can you think of a better name? Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@917 PS6, Line 917: > Is each of these errors required? Can the order of the messages ever change Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@930 PS6, Line 930: > These should be separate test_..() methods. Done -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 7 Gerrit-Owner: Kim Jin Chul <[email protected]> Gerrit-Reviewer: Kim Jin Chul <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Comment-Date: Tue, 16 Jan 2018 19:51:07 +0000 Gerrit-HasComments: Yes
