Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19699 )

Change subject: IMPALA-10798: Prototype a simple JSON File reader
......................................................................


Patch Set 22:

(9 comments)

The patch is looking good! I mostly look into the error handling in this round.

Not sure if we have tests about exceeding mem_limit. If not, we can add a test 
to write a json file with huge strings and read it in a query with a low 
mem_limit that will be exceeded.

http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json-parser.h
File be/src/exec/json-parser.h:

http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json-parser.h@37
PS22, Line 37:
Can we move this file into be/src/exec/json if it's only used by 
hdfs-json-scanner?
Also do you plan to move the implementation codes into a json-parser.cc file?


http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json-parser.h@59
PS22, Line 59: /// means return true on success.
nit: "The following functions materialize output tuples. Functions with void 
return type must succeed. Functions with bool return type return true on 
succeed, and return false to stop parsing the whole scan range."


http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json-parser.h@213
PS22, Line 213: be
nit: "been"


http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json/hdfs-json-scanner.h
File be/src/exec/json/hdfs-json-scanner.h:

http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json/hdfs-json-scanner.h@47
PS22, Line 47: /// exactly one scanner.
Let's also mention the error handling, i.e. how different kinds of errors are 
handled.


http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json/hdfs-json-scanner.cc
File be/src/exec/json/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json/hdfs-json-scanner.cc@188
PS22, Line 188:        << desc->col_pos() - scan_node_->num_partition_keys()
It'd be more helpful to print the column name and table name, e.g.
Error converting column 'key1' of table my_tbl to bigint


http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json/hdfs-json-scanner.cc@211
PS22, Line 211:   return Status::OK();
It seems we should return a non-ok status for unrecoverable situations, e.g. 
running out of memory.


http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json/hdfs-json-scanner.cc@300
PS22, Line 300:   if (LIKELY(text_converter_->WriteSlot(slot_desc, &aux_type, 
tuple_, data, len, true,
A follow-up question for 
https://gerrit.cloudera.org/c/19699/18/be/src/exec/json-parser.h#303

This can return false in copying strings when we run out of memory:
https://github.com/apache/impala/blob/af3f56e6d1605a56f7bd02b0af35be980a7e4c63/be/src/exec/text-converter.inline.h#L96

It seems we will return true in HandleConvertError() and let RapidJSON continue 
parsing. Can we stop it and report the mem issue? Or did I miss something?


http://gerrit.cloudera.org:8080/#/c/19699/22/testdata/data/json_test/malformed.json
File testdata/data/json_test/malformed.json:

http://gerrit.cloudera.org:8080/#/c/19699/22/testdata/data/json_test/malformed.json@2
PS22, Line 2: {"bool_col":False,"int_col":1,"float_col":0.1,"string_col":abc123}
Can we also add these cases?

{ }
[ ]
( )
{"string_col":"abc123"}
["string_col", "abc123"]


http://gerrit.cloudera.org:8080/#/c/19699/21/testdata/data/json_test/multiline.json
File testdata/data/json_test/multiline.json:

http://gerrit.cloudera.org:8080/#/c/19699/21/testdata/data/json_test/multiline.json@4
PS21, Line 4: 1234
            : 567
> > Just curious, what the behavior if this is parsed as a numeric
Ack



--
To view, visit http://gerrit.cloudera.org:8080/19699
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569
Gerrit-Change-Number: 19699
Gerrit-PatchSet: 22
Gerrit-Owner: Zihao Ye <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zihao Ye <[email protected]>
Gerrit-Comment-Date: Mon, 24 Jul 2023 11:32:52 +0000
Gerrit-HasComments: Yes

Reply via email to