Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/22289 )
Change subject: IMPALA-12927: Support specifying format for reading JSON BINARY columns ...................................................................... Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/22289/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/22289/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@572 PS1, Line 572: > Done, you are right, we indeed need to consider the case where a table can Sadly this is still not a proper solution, as different tables used in the query can have different JSON_BINARY_FORMAT parameter, while the query option changed here is global.See my other comment for details on how could this be propagated to the backend. http://gerrit.cloudera.org:8080/#/c/22289/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/22289/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@570 PS3, Line 570: String specificFormat = : tbl_.getMetaStoreTable().getParameters().get(JSON_BINARY_FORMAT); I am concerned that we are not completely correct here. I think that this can be also overridden per partition as a serde property. The following line in Hive looks up a combination of table and serde properties, though I don't know what the exact rule is: https://github.com/apache/hive/blob/dd5b05e4be89a3c705675871450821b66b8ec4d7/serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java#L135 I bumped into some confusing behavior related to serde properties vs table properties during another review https://gerrit.cloudera.org/#/c/22049/ This review contains an example of propagating potentially per partition properties to the backand: https://gerrit.cloudera.org/#/c/22049/8/common/thrift/CatalogObjects.thrift One issue is that Hive is buggy ( https://issues.apache.org/jira/browse/HIVE-28719 ), another is that Impala looks for specifically table property or serde propery, while in Hive you can use the same name in both. Created IMPALA-13748 about this. -- To view, visit http://gerrit.cloudera.org:8080/22289 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idf61fa3afc0f33caa63fbc05393e975733165e82 Gerrit-Change-Number: 22289 Gerrit-PatchSet: 3 Gerrit-Owner: Zihao Ye <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zihao Ye <[email protected]> Gerrit-Comment-Date: Tue, 11 Feb 2025 10:50:31 +0000 Gerrit-HasComments: Yes
