Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22289 )

Change subject: IMPALA-12927: Support specifying format for reading JSON BINARY 
columns
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/22289/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/22289/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@572
PS1, Line 572:
> Done, you are right, we indeed need to consider the case where a table can
Sadly this is still not a proper solution, as different tables used in the 
query can have different JSON_BINARY_FORMAT parameter, while the query option 
changed here is global.See my other comment for details on how could this be 
propagated to the backend.


http://gerrit.cloudera.org:8080/#/c/22289/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/22289/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@570
PS3, Line 570:     String specificFormat =
             :         
tbl_.getMetaStoreTable().getParameters().get(JSON_BINARY_FORMAT);
I am concerned that we are not completely correct here.
I think that this can be also overridden per partition as a serde property. The 
following line in Hive looks up a combination of table and serde properties, 
though I don't know what the exact rule is:

https://github.com/apache/hive/blob/dd5b05e4be89a3c705675871450821b66b8ec4d7/serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java#L135

I bumped into some confusing behavior related to serde properties vs table 
properties during another review https://gerrit.cloudera.org/#/c/22049/
This review contains an example of propagating potentially per partition 
properties to the backand: 
https://gerrit.cloudera.org/#/c/22049/8/common/thrift/CatalogObjects.thrift

One issue is that Hive is buggy (
https://issues.apache.org/jira/browse/HIVE-28719 ), another is that Impala 
looks for specifically table property or serde propery, while in Hive you can 
use the same name in both. Created IMPALA-13748 about this.



--
To view, visit http://gerrit.cloudera.org:8080/22289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf61fa3afc0f33caa63fbc05393e975733165e82
Gerrit-Change-Number: 22289
Gerrit-PatchSet: 3
Gerrit-Owner: Zihao Ye <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zihao Ye <[email protected]>
Gerrit-Comment-Date: Tue, 11 Feb 2025 10:50:31 +0000
Gerrit-HasComments: Yes

Reply via email to