[jira] [Created] (CALCITE-3677) Add assertion to EnumerableTableScan constructor to validate if the table is suitable for enumerable scan

2020-01-03 Thread Vladimir Sitnikov (Jira)
Vladimir Sitnikov created CALCITE-3677:
--

 Summary: Add assertion to EnumerableTableScan constructor to 
validate if the table is suitable for enumerable scan
 Key: CALCITE-3677
 URL: https://issues.apache.org/jira/browse/CALCITE-3677
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.21.0
Reporter: Vladimir Sitnikov
Assignee: Vladimir Sitnikov


RelOptTableImpl#toRel (and some test methods) call EnumerableTableScan.create 
explicitly even when the table cannot be implemented with an enumerable scan.

This does not hurt much during SQL-to-rel translation; however, it is 
unfortunate when the plan later fails during execution.

EnumerableTableScan should therefore have an assertion in its constructor to 
prevent invalid uses.
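If such an assertion were added, it might look roughly like the sketch below. The interfaces here are simplified stand-ins to keep the example self-contained; the real check would involve Calcite's actual org.apache.calcite.schema.ScannableTable, FilterableTable, and QueryableTable, and canHandle is a hypothetical helper name.

```java
// Simplified stand-ins for Calcite's table interfaces (the real assertion
// would check org.apache.calcite.schema.ScannableTable etc.).
interface Table {}
interface ScannableTable extends Table {}
interface FilterableTable extends Table {}
interface QueryableTable extends Table {}

public class EnumerableScanCheck {
  /** Returns whether a table can be implemented as an enumerable scan;
   *  the EnumerableTableScan constructor could then do:
   *  assert canHandle(table) : "Table is not suitable for enumerable scan". */
  static boolean canHandle(Table table) {
    return table instanceof ScannableTable
        || table instanceof FilterableTable
        || table instanceof QueryableTable;
  }

  public static void main(String[] args) {
    System.out.println(canHandle(new ScannableTable() {})); // true
    System.out.println(canHandle(new Table() {}));          // false
  }
}
```

With the assertion in the constructor, an invalid EnumerableTableScan would fail at plan construction time rather than during execution.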



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-3676) VolcanoPlanner.dumpGraphviz should handle exceptions gracefully

2020-01-03 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3676:
--

 Summary: VolcanoPlanner.dumpGraphviz should handle exceptions 
gracefully
 Key: CALCITE-3676
 URL: https://issues.apache.org/jira/browse/CALCITE-3676
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Haisheng Yuan


When VolcanoPlanner cannot generate a plan because some RelSubset has no best 
rel, it dumps the sets and a Graphviz representation. Since an error occurred 
during planning, we want to see the sets in the exception message; but the 
Graphviz dump may itself hit another exception (we cannot guarantee that all 
sets and subsets are well-formed for Graphviz), causing the sets info to be 
lost entirely, and that info is rather helpful in a production system.


{code:java}
Caused by: java.lang.AssertionError
at 
org.apache.calcite.util.PartiallyOrderedSet.findParentsChildren(PartiallyOrderedSet.java:318)
at 
org.apache.calcite.util.PartiallyOrderedSet.findParents(PartiallyOrderedSet.java:308)
at 
org.apache.calcite.util.PartiallyOrderedSet.add(PartiallyOrderedSet.java:226)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.dumpGraphviz(VolcanoPlanner.java:1320)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.dump(VolcanoPlanner.java:1194)
at 
org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:606)
at 
org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:307)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:649)
{code}
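One way to make the dump resilient, sketched with an illustrative method shape rather than VolcanoPlanner's actual signatures: emit the sets info first, and treat the Graphviz step as best-effort so a failure there cannot discard what was already collected.

```java
public class SafeDump {
  /** Dumps the sets first, then attempts the Graphviz dump; a failure in
   *  the Graphviz step must not discard the sets info already collected.
   *  The Runnable-based shape is illustrative, not VolcanoPlanner's API. */
  static String dump(StringBuilder sb, Runnable dumpSets, Runnable dumpGraphviz) {
    dumpSets.run();          // the part we always want in the exception message
    try {
      dumpGraphviz.run();    // best effort: sets/subsets may be malformed
    } catch (Throwable t) {
      sb.append("Graphviz dump failed: ").append(t).append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    StringBuilder sb = new StringBuilder();
    String out = dump(sb,
        () -> sb.append("Set#0: ...\n"),
        () -> { throw new AssertionError("bad subset"); });
    System.out.print(out); // sets info survives, Graphviz failure is noted
  }
}
```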






[DISCUSS] CALCITE-3661, CALCITE-3665, MaterializationTest vs HR schema statistics

2020-01-03 Thread Vladimir Sitnikov
Hi,

It looks like MaterializationTest heavily relies on inaccurate statistics
for hr.emps and hr.depts tables.

I was trying to improve statistic estimation for better join planning (see
https://github.com/apache/calcite/pull/1712 ),
and it looks like the better estimates open the optimizer's eyes: it now
realizes it does not really need the materialized view
for a 4-row table.

In other words, the cost of the table access is more-or-less the same as
the cost of the materialized view access.

It looks like the way to go here is to add an hr_with_extra_rows schema that
contains the same emps and depts tables, but with more rows.
Adding rows to the existing emps table is not an option because it would
invalidate lots of tests.

Does anybody have better ideas?

Vladimir


Re: [DISCUSS] Stream tables vs hash joins

2020-01-03 Thread Rui Wang
+1 on having a property on RelNode that tells whether a node is a stream
or non-stream. In Apache Beam SQL's practice, stream joins are already
metadata-driven: if one side is a stream and the other side is
non-stream, we use a hash-join-like implementation and build the hash table
on the non-stream side.


Technically, it is feasible to build the hash table on the stream side even if
it is infinite (the rationale being that the stream is very small while it
joins against TB-scale non-stream data). Newly arriving stream data would
update the hash table, so the implementation has to update the table
accordingly as data arrives, which turned out to be
difficult to implement.
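The build-side decision described here could be sketched as follows. chooseBuildSide is a hypothetical helper, not Beam's or Calcite's actual API; in a real planner the two booleans would come from relational metadata.

```java
public class BuildSideChooser {
  enum Side { LEFT, RIGHT, NONE }

  /** Hypothetical metadata-driven choice of the hash-join build side:
   *  never build on a (possibly unbounded) stream; if both inputs are
   *  streams, a plain hash join does not apply. */
  static Side chooseBuildSide(boolean leftIsStream, boolean rightIsStream) {
    if (leftIsStream && rightIsStream) {
      return Side.NONE;       // needs a different join strategy entirely
    }
    if (leftIsStream) {
      return Side.RIGHT;      // build on the bounded right input
    }
    if (rightIsStream) {
      return Side.LEFT;       // build on the bounded left input
    }
    return Side.RIGHT;        // both bounded: e.g. pick the smaller side
  }

  public static void main(String[] args) {
    System.out.println(chooseBuildSide(false, true)); // LEFT
  }
}
```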


-Rui

On Fri, Jan 3, 2020 at 10:19 AM Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Hi,
>
> Stream tables do not play well with hash joins.
> In other words, if a hash join tried to build a lookup table out of a
> stream, it could simply run out of memory.
>
> Is there metadata (or something like it) to identify stream-like inputs, so
> a hash join can ensure it does not try to build a lookup table out of the
> stream?
>
> The case is org.apache.calcite.test.StreamTest#testStreamToRelationJoin,
> which transforms to the following plan.
> The plan is wrong because it builds the hash lookup out of the second
> input, which happens to be an (infinite?) stream (STREAM).
>
> As a temporary workaround, I will increase the estimated rowcount for
> orders table to 100'000, but it would be nice to make those decisions
> metadata-driven.
>
> EnumerableProject(ROWTIME=[$2], ORDERID=[$3], SUPPLIERID=[$1]): rowcount =
> 3000.0, cumulative cost = {6950.0 rows, 9650.0 cpu, 0.0 io}, id = 603
>   EnumerableHashJoin(condition=[=($0, $6)], joinType=[inner]): rowcount =
> 3000.0, cumulative cost = {3950.0 rows, 650.0 cpu, 0.0 io}, id = 602
> EnumerableInterpreter: rowcount = 200.0, cumulative cost = {100.0 rows,
> 100.0 cpu, 0.0 io}, id = 599
>   BindableTableScan(table=[[STREAM_JOINS, PRODUCTS]]): rowcount =
> 200.0, cumulative cost = {2.0 rows, 2.0102 cpu, 0.0 io}, id =
> 122
> EnumerableProject(ROWTIME=[$0], ID=[$1], PRODUCT=[$2], UNITS=[$3],
> PRODUCT0=[CAST($2):VARCHAR(32) NOT NULL]): rowcount = 100.0, cumulative
> cost = {150.0 rows, 550.0 cpu, 0.0 io}, id = 601
>   EnumerableInterpreter: rowcount = 100.0, cumulative cost = {50.0
> rows, 50.0 cpu, 0.0 io}, id = 600
> BindableTableScan(table=[[STREAM_JOINS, ORDERS, (STREAM)]]):
> rowcount = 100.0, cumulative cost = {1.0 rows, 1.01 cpu, 0.0 io}, id = 182
>
> Vladimir
>


[DISCUSS] Stream tables vs hash joins

2020-01-03 Thread Vladimir Sitnikov
Hi,

Stream tables do not play well with hash joins.
In other words, if a hash join tried to build a lookup table out of a
stream, it could simply run out of memory.

Is there metadata (or something like it) to identify stream-like inputs, so
a hash join can ensure it does not try to build a lookup table out of the
stream?

The case is org.apache.calcite.test.StreamTest#testStreamToRelationJoin,
which transforms to the following plan.
The plan is wrong because it builds the hash lookup out of the second
input, which happens to be an (infinite?) stream (STREAM).

As a temporary workaround, I will increase the estimated rowcount for
orders table to 100'000, but it would be nice to make those decisions
metadata-driven.

EnumerableProject(ROWTIME=[$2], ORDERID=[$3], SUPPLIERID=[$1]): rowcount =
3000.0, cumulative cost = {6950.0 rows, 9650.0 cpu, 0.0 io}, id = 603
  EnumerableHashJoin(condition=[=($0, $6)], joinType=[inner]): rowcount =
3000.0, cumulative cost = {3950.0 rows, 650.0 cpu, 0.0 io}, id = 602
EnumerableInterpreter: rowcount = 200.0, cumulative cost = {100.0 rows,
100.0 cpu, 0.0 io}, id = 599
  BindableTableScan(table=[[STREAM_JOINS, PRODUCTS]]): rowcount =
200.0, cumulative cost = {2.0 rows, 2.0102 cpu, 0.0 io}, id =
122
EnumerableProject(ROWTIME=[$0], ID=[$1], PRODUCT=[$2], UNITS=[$3],
PRODUCT0=[CAST($2):VARCHAR(32) NOT NULL]): rowcount = 100.0, cumulative
cost = {150.0 rows, 550.0 cpu, 0.0 io}, id = 601
  EnumerableInterpreter: rowcount = 100.0, cumulative cost = {50.0
rows, 50.0 cpu, 0.0 io}, id = 600
BindableTableScan(table=[[STREAM_JOINS, ORDERS, (STREAM)]]):
rowcount = 100.0, cumulative cost = {1.0 rows, 1.01 cpu, 0.0 io}, id = 182

Vladimir


[jira] [Created] (CALCITE-3675) SQL to Rel conversion is broken for coalesce on nullable field

2020-01-03 Thread Anton Haidai (Jira)
Anton Haidai created CALCITE-3675:
-

 Summary: SQL to Rel conversion is broken for coalesce on nullable 
field
 Key: CALCITE-3675
 URL: https://issues.apache.org/jira/browse/CALCITE-3675
 Project: Calcite
  Issue Type: Bug
Affects Versions: next
Reporter: Anton Haidai


Reproducible on master (06ac187a342f82a4b69e4c752ccdce0c269a350d): 
1.22.0-SNAPSHOT

SqlToRelConverterTest:
{code}
  @Test public void testCoalesceOnNullableField() {
final String sql = "select coalesce(mgr, 0) from emp";
sql(sql).ok();
  }
{code}

Error:
{code}
Conversion to relational algebra failed to preserve datatypes:
validated type:
RecordType(INTEGER NOT NULL EXPR$0) NOT NULL
converted type:
RecordType(INTEGER EXPR$0) NOT NULL
rel:
LogicalProject(EXPR$0=[CASE(IS NOT NULL($3), $3, 0)])
  LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{code}
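The type rule being violated can be stated as a small check. This is only a model of the COALESCE nullability rule, not Calcite's type-derivation code, and coalesceIsNullable is a hypothetical helper name.

```java
public class CoalesceNullability {
  /** COALESCE is nullable only if every operand is nullable; a NOT NULL
   *  operand anywhere in the list guarantees a value, making the result
   *  NOT NULL. This is the rule the converted rel above violates for
   *  coalesce(mgr, 0). */
  static boolean coalesceIsNullable(boolean... operandNullable) {
    for (boolean nullable : operandNullable) {
      if (!nullable) {
        return false;   // this operand always supplies a value
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // coalesce(mgr, 0): mgr nullable, literal 0 NOT NULL -> result NOT NULL
    System.out.println(coalesceIsNullable(true, false)); // false
  }
}
```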





[jira] [Created] (CALCITE-3674) EnumerableMergeJoinRule fails with NPE on nullable join keys

2020-01-03 Thread Vladimir Sitnikov (Jira)
Vladimir Sitnikov created CALCITE-3674:
--

 Summary: EnumerableMergeJoinRule fails with NPE on nullable join 
keys
 Key: CALCITE-3674
 URL: https://issues.apache.org/jira/browse/CALCITE-3674
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.21.0
Reporter: Vladimir Sitnikov


Sample exception:

{noformat}
Caused by: java.lang.NullPointerException
at java.lang.Short.compareTo(Short.java:445)
at java.lang.Short.compareTo(Short.java:43)
at 
org.apache.calcite.linq4j.EnumerableDefaults$MergeJoinEnumerator.advance(EnumerableDefaults.java:3866)
at 
org.apache.calcite.linq4j.EnumerableDefaults$MergeJoinEnumerator.moveNext(EnumerableDefaults.java:3918)
at 
org.apache.calcite.linq4j.EnumerableDefaults.aggregate(EnumerableDefaults.java:118)
at 
org.apache.calcite.linq4j.DefaultEnumerable.aggregate(DefaultEnumerable.java:104)
at Baz.bind(Unknown Source)
at 
org.apache.calcite.jdbc.CalcitePrepare$CalciteSignature.enumerable(CalcitePrepare.java:355)
at 
org.apache.calcite.jdbc.CalciteConnectionImpl.enumerable(CalciteConnectionImpl.java:315){noformat}
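For illustration, the NPE is exactly what Short.compareTo does on a null receiver, and the JDK's null-aware comparators are one way to give null keys a defined order. This is a sketch of one option, not the actual fix; a real merge join also has to decide that NULL keys never match the join condition, which a comparator alone does not express.

```java
import java.util.Comparator;

public class NullSafeShortCompare {
  public static void main(String[] args) {
    Short a = null;
    Short b = 5;
    // a.compareTo(b) is what throws the NPE in the stack trace above.
    // A null-aware comparator sorts nulls to a defined position instead:
    Comparator<Short> cmp = Comparator.nullsFirst(Comparator.naturalOrder());
    System.out.println(cmp.compare(a, b) < 0);  // true: null sorts first
    System.out.println(cmp.compare(b, b));      // 0
  }
}
```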





[jira] [Created] (CALCITE-3670) Add ability to release resources when connection is closed

2020-01-03 Thread Vladimir Sitnikov (Jira)
Vladimir Sitnikov created CALCITE-3670:
--

 Summary: Add ability to release resources when connection is closed
 Key: CALCITE-3670
 URL: https://issues.apache.org/jira/browse/CALCITE-3670
 Project: Calcite
  Issue Type: New Feature
  Components: core
Reporter: Vladimir Sitnikov


Calcite schemas often hold resources for external systems; however, it is 
not clear when those resources should be released.

What if Calcite closed schemas that implement Closeable when the Calcite 
connection is closed?
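A minimal sketch of the idea, assuming a hypothetical close hook and schema registry (closeSchemas is not Calcite's actual API):

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

public class ConnectionCloseSketch {
  /** Sketch: on connection close, close each registered schema that
   *  implements Closeable; other schemas are left untouched. */
  static void closeSchemas(List<Object> schemas) {
    for (Object schema : schemas) {
      if (schema instanceof Closeable) {
        try {
          ((Closeable) schema).close();
        } catch (IOException e) {
          // log and continue: one failing schema must not block the rest
        }
      }
    }
  }

  public static void main(String[] args) {
    boolean[] closed = {false};
    Closeable schema = () -> closed[0] = true;
    closeSchemas(List.of(new Object(), schema));
    System.out.println(closed[0]); // true
  }
}
```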


