Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19699 )

Change subject: IMPALA-10798: Prototype a simple JSON File reader
......................................................................


Patch Set 23:

(7 comments)

The patch is in good shape. I ran the following command to go through all tests 
on text format:

 git grep file_format tests | grep "'text'"

Identified some tests that would be good to add for json.

http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json-parser.h
File be/src/exec/json-parser.h:

http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json-parser.h@37
PS22, Line 37:
> > Can we move this file into be/src/exec/json if it's only used by
It seems to be tightly coupled with hdfs-json-scanner so I think we can put it 
in /json unless it can be reused in other places.

Moving the implementation codes to json-parser.cc helps to speedup 
recompilation when you have code changes. Also helps to make this header file 
shorter and easier for going through. You can keep some short methods in the 
header file and just move large methods like Parse().


http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json/hdfs-json-scanner.cc
File be/src/exec/json/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json/hdfs-json-scanner.cc@300
PS22, Line 300:   // TODO: Support Invoke CodeGend WriteSlot
> > A follow-up question for https://gerrit.cloudera.org/c/19699/18/be/src/ex
I checked other scanners and they have the same behavior. I think it's ok to 
ignore failures and just set nulls on the slot as long as we can report the mem 
issue in other places, e.g. report mem issue when we want to read more data. 
Currently, we have 'buffer_status_' that can report this. So I think the 
current implementation is ok.


http://gerrit.cloudera.org:8080/#/c/19699/23/tests/data_errors/test_data_errors.py
File tests/data_errors/test_data_errors.py:

http://gerrit.cloudera.org:8080/#/c/19699/23/tests/data_errors/test_data_errors.py@128
PS23, Line 128:     self.run_test_case('DataErrorsTest/hdfs-scan-node-errors', 
vector)
Can we add a similar test for json?


http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_cancellation.py
File tests/query_test/test_cancellation.py:

http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_cancellation.py@113
PS23, Line 113: 'text'
Let's add json here


http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_chars.py
File tests/query_test/test_chars.py:

http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_chars.py@37
PS23, Line 37: 'text'
Let's test json here


http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_chars.py@68
PS23, Line 68: 'text'
Let's test json here as well


http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_date_queries.py
File tests/query_test/test_date_queries.py:

http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_date_queries.py@45
PS23, Line 45: 'text'
Let's add json here. Please also update the above comment. DATE type is also 
supported in orc and json?



--
To view, visit http://gerrit.cloudera.org:8080/19699
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569
Gerrit-Change-Number: 19699
Gerrit-PatchSet: 23
Gerrit-Owner: Zihao Ye <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zihao Ye <[email protected]>
Gerrit-Comment-Date: Tue, 15 Aug 2023 01:08:55 +0000
Gerrit-HasComments: Yes

Reply via email to