[ https://issues.apache.org/jira/browse/CALCITE-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Volodymyr Vysotskyi resolved CALCITE-3138. ------------------------------------------ Resolution: Fixed Fixed in [1e62d3d|https://github.com/apache/calcite/commit/1e62d3d64fc217d14016702237b4f8d56b3683f2]. Thanks, [~IhorHuzenko], for fixing this issue and [~danny0405] for the review! > RelStructuredTypeFlattener doesn't restructure ROW type fields > --------------------------------------------------------------- > > Key: CALCITE-3138 > URL: https://issues.apache.org/jira/browse/CALCITE-3138 > Project: Calcite > Issue Type: Bug > Components: core > Reporter: Haisheng Yuan > Assignee: Igor Guzenko > Priority: Major > Labels: pull-request-available > Fix For: 1.21.0 > > Attachments: ROW_repro.patch > > Time Spent: 7h 20m > Remaining Estimate: 0h > > 1) RelStructuredTypeFlattener.restructureFields(structType) doesn't support > ROW type. However, ROW type is flattened by > RelStructuredTypeFlattener just like struct. So when user queries one column > with complex type ROW, after flattening and restructuring > top level project returns reference to first inner primitive field of the > requested column. > 2) Another finding is related to ITEM expression applied to array of structs > column. For example, let's imagine a table with column of > type ARRAY<STRUCT<a:INT, b:STRUCT<x:INT,y:INT>>>. When user requests is SQL > array_column[1], Calcite generates ITEM($0, 1) , where $0 is ref to array > column > from Scan and 1 is index literal. Current flattener generates two field acess > expressions ITEM($0, 1).a, ITEM($0, 1).b but dont take into account > that ITEM($0, 1).b returns struct which also should be flattened. > 3) In some cases applying of ITEM after flattenning is no longer possible. > For example, consider column with type > STRUCT<a:INT, b:STRUCT<x:INT,y:INT>>. User requests column['b'] in query and > Calcite creates ITEM($0,'b'). > After flattenning Scan is covered by LogicalProject($0.a, $0.b.x, $0.b.y) and > the old projection ITEM($0,'b') can't > be applied anymore. So now it should be converted to refer only subset of > fields ($1,$2) from flattening project. > UPDATES IN EXPECTED TEST RESULTS: > ---------------------------------------------------------------------------------- > TEST CASE: SqlToRelConverterTest.testNestedColumnType() > {code} > select empa.home_address.zip from sales.emp_address empa where > empa.home_address.city = 'abc' > {code} > OLD RESULT: > {code} > LogicalProject(ZIP=[$4]) > LogicalFilter(condition=[=($3, 'abc':VARCHAR(20))]) > LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], > ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], > ZIP7=[$2.ZIP], STATE8=[$2.STATE]) > LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) > {code} > 1. Above in logical filter, condition references to field $3 which is > ZIP=[$1.ZIP] field from previous project, > however in original query filtering should be done by CITY field. > 2. Also the top level project references to $4 field, which is > STATE=[$1.STATE] field from project, but original > query requested ZIP field. > > UPDATED RESULT: > {code} > LogicalProject(ZIP=[$3]) > LogicalFilter(condition=[=($2, 'abc')]) > LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], > ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], > ZIP7=[$2.ZIP], STATE8=[$2.STATE]) > LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) > {code} > ---------------------------------------------------------------------------------- > TEST CASE: SqlToRelConverterTest.testStructTypeAlias() > {code} > select t.r AS myRow from (select row(row(1)) r from dept) t > {code} > OLD RESULT: > {code} > LogicalProject(MYROW$$0$$0=[1]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > {code} > 1. Inside the subselect of row(row(1)) type of returned column is > RecordType(RecordType(INTEGER EXPR$0) EXPR$0), > but the top level project uses flattened expression and returns to user > literal 1 with type RecordType(INTEGER MYROW$$0$$0), > although the type doesn't match type returned by row(row(1)) expression. > 2. Also it's suspicious that caller expects returned column to have name > 'myRow' but gets 'MYROW$$0$$0'. > > UPDATED RESULT: > {code} > LogicalProject(MYROW=[ROW(ROW($0))]) > LogicalProject(MYROW$$0$$0=[1]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > {code} > ---------------------------------------------------------------------------------- > TEST CASE: SqlToRelConverterTest.testFlattenRecords() > {code} > select employees[1] from dept_nested > {code} > OLD RESULT: > {code} > LogicalProject(EXPR$0=[$0]) > LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], > EXPR$0$2=[ITEM($3, 1).DETAIL]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) > {code} > 1. Given selection of element by index from array, which by definition > contain elements of type: > {code} > STRUCT<EMPNO:INT, > ENAME:VARCHAR(10), > DETAIL:STRUCT<SKILLS:ARRAY<STRUCT<TYPE:VARCHAR(10), > DESC:VARCHAR(20), > > OTHERS:STRUCT<A:VARCHAR(10),B:VARCHAR(10)>>>>> > {code} > First problem is that the intermediate project on top of the scan expected to > return only flattened types. > But last projection expression returns `DETAIL` field of array element which > has type STRUCT and contain array > field named 'SKILLS'. > 2. The top level project returns incorrect flattened EMPNO:INT field from > retrieved array element instead > of whole array element of type struct. > > UPDATED RESULT: > {code} > LogicalProject(EXPR$0=[ROW($0, $1, ROW($2))]) > LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], > EXPR$0$2$0=[ITEM($3, 1).DETAIL.SKILLS]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) > {code} > Other few updates in RelToSqlConverterStructsTest.java and PlannerTest.java > are more explicit and visible in PR. -- This message was sent by Atlassian Jira (v8.3.2#803003)