Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19699 )

Change subject: IMPALA-10798: Prototype a simple JSON File reader
......................................................................


Patch Set 18:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/19699/18//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19699/18//COMMIT_MSG@23
PS18, Line 23:
Could you add a section for the current limitations? E.g.

Limitations
 - multiline json objects are not supported
 - compressed json files are not supported

Does this patch support parsing complex types from json files?


http://gerrit.cloudera.org:8080/#/c/19699/18/be/src/exec/json-parser.h
File be/src/exec/json-parser.h:

http://gerrit.cloudera.org:8080/#/c/19699/18/be/src/exec/json-parser.h@45
PS18, Line 45: begin and end
nit: add quotes on var names, i.e. 'begin' and 'end'


http://gerrit.cloudera.org:8080/#/c/19699/18/be/src/exec/json-parser.h@49
PS18, Line 49: some
nit: remove "some"


http://gerrit.cloudera.org:8080/#/c/19699/18/be/src/exec/json-parser.h@68
PS18, Line 68: ///   bool AddNumber(int index, const char* str, uint32_t len);
nit: it'd be helpful to give a doc link here. E.g.

 /// See more in https://rapidjson.org/classrapidjson_1_1_handler.html


http://gerrit.cloudera.org:8080/#/c/19699/18/be/src/exec/json-parser.h@143
PS18, Line 143:         !memcmp(field_found_.data(), field_found_.data() + 1, 
field_found_.size() - 1)));
This is not that readable. Could you add some comments and break them into 
smaller conditions?


http://gerrit.cloudera.org:8080/#/c/19699/18/be/src/exec/json-parser.h@169
PS18, Line 169:           rapidjson::kParseStopWhenDoneFlag;
What about kParseInsituFlag? Can we use it in our scenario?


http://gerrit.cloudera.org:8080/#/c/19699/18/be/src/exec/json-parser.h@230
PS18, Line 230:     DCHECK(!IsRequiredField());
nit: This is a bit confusing. It might be better to use 
DCHECK_EQ(current_field_idx_, -1) directly.


http://gerrit.cloudera.org:8080/#/c/19699/18/be/src/exec/json-parser.h@242
PS18, Line 242:   bool StartObject() {
nit: could you add a comment before each methods mentioning they are interfaces 
of handler used in RapidJson? e.g.

  /// Handler methods used in Rapidjson


http://gerrit.cloudera.org:8080/#/c/19699/18/be/src/exec/json/hdfs-json-scanner.cc
File be/src/exec/json/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/19699/18/be/src/exec/json/hdfs-json-scanner.cc@114
PS18, Line 114: the scan range before
nit: "the previous scan range in the same file" might be more clear


http://gerrit.cloudera.org:8080/#/c/19699/18/be/src/exec/json/hdfs-json-scanner.cc@160
PS18, Line 160: if(
nit: add a space after "if"



--
To view, visit http://gerrit.cloudera.org:8080/19699
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569
Gerrit-Change-Number: 19699
Gerrit-PatchSet: 18
Gerrit-Owner: Zihao Ye <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zihao Ye <[email protected]>
Gerrit-Comment-Date: Tue, 18 Jul 2023 13:41:06 +0000
Gerrit-HasComments: Yes

Reply via email to