Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19699 )
Change subject: IMPALA-10798: Prototype a simple JSON File reader ...................................................................... Patch Set 23: (7 comments) The patch is in good shape. I ran the following command to go through all tests on text format: git grep file_format tests | grep "'text'" Identified some tests that would be good to add for json. http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json-parser.h File be/src/exec/json-parser.h: http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json-parser.h@37 PS22, Line 37: > > Can we move this file into be/src/exec/json if it's only used by It seems to be tightly coupled with hdfs-json-scanner so I think we can put it in /json unless it can be reused in other places. Moving the implementation codes to json-parser.cc helps to speedup recompilation when you have code changes. Also helps to make this header file shorter and easier for going through. You can keep some short methods in the header file and just move large methods like Parse(). http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json/hdfs-json-scanner.cc File be/src/exec/json/hdfs-json-scanner.cc: http://gerrit.cloudera.org:8080/#/c/19699/22/be/src/exec/json/hdfs-json-scanner.cc@300 PS22, Line 300: // TODO: Support Invoke CodeGend WriteSlot > > A follow-up question for https://gerrit.cloudera.org/c/19699/18/be/src/ex I checked other scanners and they have the same behavior. I think it's ok to ignore failures and just set nulls on the slot as long as we can report the mem issue in other places, e.g. report mem issue when we want to read more data. Currently, we have 'buffer_status_' that can report this. So I think the current implementation is ok. http://gerrit.cloudera.org:8080/#/c/19699/23/tests/data_errors/test_data_errors.py File tests/data_errors/test_data_errors.py: http://gerrit.cloudera.org:8080/#/c/19699/23/tests/data_errors/test_data_errors.py@128 PS23, Line 128: self.run_test_case('DataErrorsTest/hdfs-scan-node-errors', vector) Can we add a similar test for json? http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_cancellation.py File tests/query_test/test_cancellation.py: http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_cancellation.py@113 PS23, Line 113: 'text' Let's add json here http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_chars.py File tests/query_test/test_chars.py: http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_chars.py@37 PS23, Line 37: 'text' Let's test json here http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_chars.py@68 PS23, Line 68: 'text' Let's test json here as well http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_date_queries.py File tests/query_test/test_date_queries.py: http://gerrit.cloudera.org:8080/#/c/19699/23/tests/query_test/test_date_queries.py@45 PS23, Line 45: 'text' Let's add json here. Please also update the above comment. DATE type is also supported in orc and json? -- To view, visit http://gerrit.cloudera.org:8080/19699 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569 Gerrit-Change-Number: 19699 Gerrit-PatchSet: 23 Gerrit-Owner: Zihao Ye <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Zihao Ye <[email protected]> Gerrit-Comment-Date: Tue, 15 Aug 2023 01:08:55 +0000 Gerrit-HasComments: Yes
