[jira] [Created] (CALCITE-3677) Add assertion to EnumerableTableScan constructor to validate if the table is suitable for enumerable scan
Vladimir Sitnikov created CALCITE-3677:
--
Summary: Add assertion to EnumerableTableScan constructor to validate if the table is suitable for enumerable scan
Key: CALCITE-3677
URL: https://issues.apache.org/jira/browse/CALCITE-3677
Project: Calcite
Issue Type: Bug
Components: core
Affects Versions: 1.21.0
Reporter: Vladimir Sitnikov
Assignee: Vladimir Sitnikov

RelOptTableImpl#toRel (and some test methods) call EnumerableTableScan.create explicitly even when the table is not implementable as an enumerable scan. That does not hurt much during SQL-to-rel translation; however, it is unfortunate when the resulting plan fails during execution. So EnumerableTableScan should have an assertion to prevent such invalid uses.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
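The proposed guard can be sketched with a simplified model. The names below (Table, ScannableTable, canHandle) are illustrative stand-ins for the corresponding Calcite types, not Calcite's actual API; the point is only the fail-fast shape of the check:

```java
public class EnumerableScanGuard {
  // Stand-in for org.apache.calcite.schema.Table.
  interface Table {}

  // Stand-in marker for a table that an enumerable scan can actually read.
  interface ScannableTable extends Table {}

  // Capability check: only tables with a scannable implementation qualify.
  static boolean canHandle(Table table) {
    return table instanceof ScannableTable;
  }

  // The assertion proposed in CALCITE-3677: reject unsuitable tables when
  // the scan node is constructed, instead of failing later at execution.
  static void create(Table table) {
    if (!canHandle(table)) {
      throw new AssertionError(
          "Table is not suitable for an enumerable scan: " + table);
    }
    // ... construct the scan node ...
  }
}
```

With such a guard, an invalid use in RelOptTableImpl#toRel surfaces at planning time with a clear message rather than as an obscure runtime failure.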
[jira] [Created] (CALCITE-3676) VolcanoPlanner.dumpGraphviz should handle exception gracefully
Haisheng Yuan created CALCITE-3676:
--
Summary: VolcanoPlanner.dumpGraphviz should handle exception gracefully
Key: CALCITE-3676
URL: https://issues.apache.org/jira/browse/CALCITE-3676
Project: Calcite
Issue Type: Bug
Components: core
Reporter: Haisheng Yuan

When VolcanoPlanner can't generate a plan because some RelSubset doesn't have a best rel, it dumps the sets and the Graphviz representation. Since there was an error during planning, we want to see the sets in the exception message; but dumping Graphviz may hit another exception (we can't guarantee all the sets and subsets are correctly generated for Graphviz), which prevents the Sets info from being dumped at all, even though that info is rather helpful in a production system.

{code:java}
Caused by: java.lang.AssertionError
	at org.apache.calcite.util.PartiallyOrderedSet.findParentsChildren(PartiallyOrderedSet.java:318)
	at org.apache.calcite.util.PartiallyOrderedSet.findParents(PartiallyOrderedSet.java:308)
	at org.apache.calcite.util.PartiallyOrderedSet.add(PartiallyOrderedSet.java:226)
	at org.apache.calcite.plan.volcano.VolcanoPlanner.dumpGraphviz(VolcanoPlanner.java:1320)
	at org.apache.calcite.plan.volcano.VolcanoPlanner.dump(VolcanoPlanner.java:1194)
	at org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:606)
	at org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:307)
	at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:649)
{code}
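The graceful-handling idea can be sketched generically: always emit the Sets section, and degrade to a note if the Graphviz section throws. The two suppliers below are hypothetical stand-ins for VolcanoPlanner's internal dump steps, not its real signatures:

```java
import java.util.function.Supplier;

public class GracefulDump {
  // Compose the diagnostic dump so that a failure while rendering the
  // Graphviz section cannot swallow the (more valuable) Sets section.
  static String dump(Supplier<String> dumpSets, Supplier<String> dumpGraphviz) {
    StringBuilder sb = new StringBuilder();
    sb.append(dumpSets.get()); // always emitted
    try {
      sb.append(dumpGraphviz.get()); // best effort only
    } catch (RuntimeException | AssertionError e) {
      // Record the failure instead of propagating it, so the caller still
      // sees the Sets info in the exception message.
      sb.append("Error when dumping Graphviz: ").append(e);
    }
    return sb.toString();
  }
}
```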
[DISCUSS] CALCITE-3661, CALCITE-3665, MaterializationTest vs HR schema statistics
Hi,

It looks like MaterializationTest heavily relies on inaccurate statistics for the hr.emps and hr.depts tables.

I was trying to improve statistics estimation for better join planning (see https://github.com/apache/calcite/pull/1712), and it looks like better estimates open the eyes of the optimizer: it now realizes it does not really need to use a materialized view for a 4-row table. In other words, the cost of the table access is more-or-less the same as the cost of the materialized-view access.

It looks like the way to go here is to add an hr_with_extra_rows schema that contains the same emps and depts tables, but with bigger tables. Adding rows to the existing emps table is not an option because it would invalidate lots of tests.

Does anybody have better ideas?

Vladimir
Re: [DISCUSS] Stream tables vs hash joins
+1 on having a property on RelNode that makes it known which node is a stream and which is not.

In Apache Beam SQL's practice, stream joins are already metadata-driven: if one side is a stream and the other side is not, we use a hash-join-like implementation, but build the hash table on the non-stream side.

Technically, it is feasible to build the hash table on the stream side even if it is infinite (the rationale, I guess, is that the stream is very small and joins against TB-level non-stream data); new stream data would then update the hash table. In that case the implementation has to update the table as new data arrives, which turned out to be difficult to implement.

-Rui

On Fri, Jan 3, 2020 at 10:19 AM Vladimir Sitnikov <sitnikov.vladi...@gmail.com> wrote:

> Hi,
>
> Stream tables do not play very well with hash joins. In other words, if a
> hash join tried to build a lookup table out of a stream, it could simply
> run out of memory.
>
> Is there metadata or something like that to identify stream-like inputs,
> so that a hash join would ensure it does not try to build a lookup table
> out of the stream?
>
> The case is org.apache.calcite.test.StreamTest#testStreamToRelationJoin,
> which transforms to the following. The plan is wrong because it would
> build the hash lookup out of the second input, which happens to be an
> (infinite?) (STREAM).
>
> As a temporary workaround, I will increase the estimated rowcount for the
> orders table to 100'000, but it would be nice to make those decisions
> metadata-driven.
>
> EnumerableProject(ROWTIME=[$2], ORDERID=[$3], SUPPLIERID=[$1]): rowcount = 3000.0, cumulative cost = {6950.0 rows, 9650.0 cpu, 0.0 io}, id = 603
>   EnumerableHashJoin(condition=[=($0, $6)], joinType=[inner]): rowcount = 3000.0, cumulative cost = {3950.0 rows, 650.0 cpu, 0.0 io}, id = 602
>     EnumerableInterpreter: rowcount = 200.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io}, id = 599
>       BindableTableScan(table=[[STREAM_JOINS, PRODUCTS]]): rowcount = 200.0, cumulative cost = {2.0 rows, 2.0102 cpu, 0.0 io}, id = 122
>     EnumerableProject(ROWTIME=[$0], ID=[$1], PRODUCT=[$2], UNITS=[$3], PRODUCT0=[CAST($2):VARCHAR(32) NOT NULL]): rowcount = 100.0, cumulative cost = {150.0 rows, 550.0 cpu, 0.0 io}, id = 601
>       EnumerableInterpreter: rowcount = 100.0, cumulative cost = {50.0 rows, 50.0 cpu, 0.0 io}, id = 600
>         BindableTableScan(table=[[STREAM_JOINS, ORDERS, (STREAM)]]): rowcount = 100.0, cumulative cost = {1.0 rows, 1.01 cpu, 0.0 io}, id = 182
>
> Vladimir
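The metadata-driven build-side decision discussed above can be sketched as follows. The Input class and its isStream flag are illustrative stand-ins for a RelNode plus the proposed "is this input a stream?" metadata, not anything that exists in Calcite today:

```java
public class StreamAwareJoin {
  // Minimal model of a join input: a name, whether it is a (potentially
  // infinite) stream, and its estimated row count.
  static final class Input {
    final String name;
    final boolean isStream;
    final double estimatedRows;

    Input(String name, boolean isStream, double estimatedRows) {
      this.name = name;
      this.isStream = isStream;
      this.estimatedRows = estimatedRows;
    }
  }

  /** Returns the side on which to build the hash table, or null when both
   * sides are streams and a hash join is not applicable at all. */
  static Input chooseBuildSide(Input left, Input right) {
    if (left.isStream && right.isStream) {
      return null; // needs a dedicated streaming join instead
    }
    if (left.isStream) {
      return right; // never materialize the stream side
    }
    if (right.isStream) {
      return left;
    }
    // Both inputs are bounded: pick the smaller one, as usual.
    return left.estimatedRows <= right.estimatedRows ? left : right;
  }
}
```

In the plan above, such a rule would force the build side onto PRODUCTS regardless of the estimated row counts, because ORDERS is marked (STREAM).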
[DISCUSS] Stream tables vs hash joins
Hi,

Stream tables do not play very well with hash joins. In other words, if a hash join tried to build a lookup table out of a stream, it could simply run out of memory.

Is there metadata or something like that to identify stream-like inputs, so that a hash join would ensure it does not try to build a lookup table out of the stream?

The case is org.apache.calcite.test.StreamTest#testStreamToRelationJoin, which transforms to the following. The plan is wrong because it would build the hash lookup out of the second input, which happens to be an (infinite?) (STREAM).

As a temporary workaround, I will increase the estimated rowcount for the orders table to 100'000, but it would be nice to make those decisions metadata-driven.

EnumerableProject(ROWTIME=[$2], ORDERID=[$3], SUPPLIERID=[$1]): rowcount = 3000.0, cumulative cost = {6950.0 rows, 9650.0 cpu, 0.0 io}, id = 603
  EnumerableHashJoin(condition=[=($0, $6)], joinType=[inner]): rowcount = 3000.0, cumulative cost = {3950.0 rows, 650.0 cpu, 0.0 io}, id = 602
    EnumerableInterpreter: rowcount = 200.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io}, id = 599
      BindableTableScan(table=[[STREAM_JOINS, PRODUCTS]]): rowcount = 200.0, cumulative cost = {2.0 rows, 2.0102 cpu, 0.0 io}, id = 122
    EnumerableProject(ROWTIME=[$0], ID=[$1], PRODUCT=[$2], UNITS=[$3], PRODUCT0=[CAST($2):VARCHAR(32) NOT NULL]): rowcount = 100.0, cumulative cost = {150.0 rows, 550.0 cpu, 0.0 io}, id = 601
      EnumerableInterpreter: rowcount = 100.0, cumulative cost = {50.0 rows, 50.0 cpu, 0.0 io}, id = 600
        BindableTableScan(table=[[STREAM_JOINS, ORDERS, (STREAM)]]): rowcount = 100.0, cumulative cost = {1.0 rows, 1.01 cpu, 0.0 io}, id = 182

Vladimir
[jira] [Created] (CALCITE-3675) SQL to Rel conversion is broken for coalesce on nullable field
Anton Haidai created CALCITE-3675:
--
Summary: SQL to Rel conversion is broken for coalesce on nullable field
Key: CALCITE-3675
URL: https://issues.apache.org/jira/browse/CALCITE-3675
Project: Calcite
Issue Type: Bug
Affects Versions: next
Reporter: Anton Haidai

Reproducible in master (06ac187a342f82a4b69e4c752ccdce0c269a350d): 1.22.0-SNAPSHOT

SqlToRelConverterTest:
{code}
@Test public void testCoalesceOnNullableField() {
  final String sql = "select coalesce(mgr, 0) from emp";
  sql(sql).ok();
}
{code}

Error:
{code}
Conversion to relational algebra failed to preserve datatypes:
validated type:
RecordType(INTEGER NOT NULL EXPR$0) NOT NULL
converted type:
RecordType(INTEGER EXPR$0) NOT NULL
rel:
LogicalProject(EXPR$0=[CASE(IS NOT NULL($3), $3, 0)])
  LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{code}
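The validated type looks right: when the fallback is a non-null literal, coalesce can never return null. A small Java analogue of the CASE(IS NOT NULL($3), $3, 0) rewrite shown in the rel makes the nullability argument concrete (this is an illustration of the semantics, not Calcite code):

```java
public class CoalesceNullability {
  // Mirrors CASE(IS NOT NULL($3), $3, 0): the result is null only if
  // both branches can be null, and here the fallback branch is the
  // non-null literal 0. Hence the derived type should be
  // INTEGER NOT NULL (the validated type); the converted nullable
  // INTEGER is the bug.
  static int coalesce(Integer mgr, int fallback) {
    return mgr != null ? mgr : fallback;
  }
}
```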
[jira] [Created] (CALCITE-3674) EnumerableMergeJoinRule fails with NPE on nullable join keys
Vladimir Sitnikov created CALCITE-3674:
--
Summary: EnumerableMergeJoinRule fails with NPE on nullable join keys
Key: CALCITE-3674
URL: https://issues.apache.org/jira/browse/CALCITE-3674
Project: Calcite
Issue Type: Bug
Components: core
Affects Versions: 1.21.0
Reporter: Vladimir Sitnikov

Sample exception:
{noformat}
Caused by: java.lang.NullPointerException
	at java.lang.Short.compareTo(Short.java:445)
	at java.lang.Short.compareTo(Short.java:43)
	at org.apache.calcite.linq4j.EnumerableDefaults$MergeJoinEnumerator.advance(EnumerableDefaults.java:3866)
	at org.apache.calcite.linq4j.EnumerableDefaults$MergeJoinEnumerator.moveNext(EnumerableDefaults.java:3918)
	at org.apache.calcite.linq4j.EnumerableDefaults.aggregate(EnumerableDefaults.java:118)
	at org.apache.calcite.linq4j.DefaultEnumerable.aggregate(DefaultEnumerable.java:104)
	at Baz.bind(Unknown Source)
	at org.apache.calcite.jdbc.CalcitePrepare$CalciteSignature.enumerable(CalcitePrepare.java:355)
	at org.apache.calcite.jdbc.CalciteConnectionImpl.enumerable(CalciteConnectionImpl.java:315)
{noformat}
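The trace shows Short.compareTo being invoked with a null key. One likely ingredient of a fix is a null-aware key ordering; the sketch below is a minimal illustration using the standard library, not Calcite's actual patch:

```java
import java.util.Comparator;

public class NullSafeMergeKeys {
  // Orders nullable join keys without NPEs by sorting nulls first.
  // Note that for an inner equi-join, rows with null keys can also be
  // skipped outright, since NULL = NULL is not true in SQL; the merge
  // enumerator still needs a total order like this one to advance safely.
  static final Comparator<Short> KEY_ORDER =
      Comparator.nullsFirst(Comparator.<Short>naturalOrder());
}
```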
[jira] [Created] (CALCITE-3670) Add ability to release resources when connection is closed
Vladimir Sitnikov created CALCITE-3670:
--
Summary: Add ability to release resources when connection is closed
Key: CALCITE-3670
URL: https://issues.apache.org/jira/browse/CALCITE-3670
Project: Calcite
Issue Type: New Feature
Components: core
Reporter: Vladimir Sitnikov

Calcite schemas often hold resources of external systems; however, it is not clear when those resources should be released. What if Calcite closed every schema that implements Closeable when the Calcite connection is closed?
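The proposal can be sketched generically: on connection close, walk the registered schemas and close any that implement Closeable. ClosingConnection and its schema list are hypothetical stand-ins for CalciteConnectionImpl and its root-schema traversal:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

public class ClosingConnection implements AutoCloseable {
  private final List<Object> schemas;

  public ClosingConnection(List<Object> schemas) {
    this.schemas = schemas;
  }

  @Override public void close() {
    for (Object schema : schemas) {
      // Only schemas that opt in by implementing Closeable are closed;
      // everything else is left untouched.
      if (schema instanceof Closeable) {
        try {
          ((Closeable) schema).close(); // release external resources
        } catch (IOException e) {
          // Best effort: keep closing the remaining schemas.
        }
      }
    }
  }
}
```

An opt-in marker like this keeps the change backward compatible: existing schemas that do not implement Closeable behave exactly as before.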