What does SqlSimpleParser.Query#purgeXXX do?

2019-11-05 Thread 曲修成
Hi All:
When I use org.apache.calcite.sql.advise.SqlSimpleParser.Query#simplify,
I have a problem:

Input SQL:
select id,name from emp

But the output SQL is:
select * from emp

This is because org.apache.calcite.sql.advise.SqlSimpleParser.Query#purgeSelect
purges the select list.

There are many methods like purgeXXX.

Why?
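
A minimal sketch to reproduce this (assuming the single-argument
SqlSimpleParser constructor and the simplifySql(String, int) overload;
signatures vary slightly across Calcite versions). SqlSimpleParser is used by
SqlAdvisor for completion hints, which is why clauses that do not contain the
cursor get simplified away:

  import org.apache.calcite.sql.advise.SqlSimpleParser;

  public class SimplifyRepro {
    public static void main(String[] args) {
      // "_suggest_" is an arbitrary hint token.
      SqlSimpleParser parser = new SqlSimpleParser("_suggest_");
      String sql = "select id,name from emp";
      // Cursor at the end of the statement, i.e. inside the FROM clause.
      // The select list does not contain the cursor, so Query#purgeSelect
      // replaces it with "*".
      System.out.println(parser.simplifySql(sql, sql.length()));
    }
  }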


Re: Re: Re: Problem with converters and possibly rule matching

2019-11-05 Thread Haisheng Yuan
Hi Vladimir,

The code in the PHYSICAL convention at L44 looks weird; I think it always returns true.
https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/HazelcastConventions.java#L44

Try this:
fromTraits.containsIfApplicable(Convention.PHYSICAL)
&& toTraits.containsIfApplicable(Convention.PHYSICAL);
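
For reference, a hedged sketch of how the fixed convention definition could
look (PhysicalRel mirrors the marker interface in the reproducer and is an
assumption, not a Calcite class):

  import org.apache.calcite.plan.Convention;
  import org.apache.calcite.plan.RelTraitSet;
  import org.apache.calcite.rel.RelNode;

  public final class Conventions {
    /** Marker interface for physical rels (assumed, mirroring the reproducer). */
    public interface PhysicalRel extends RelNode {}

    public static final Convention PHYSICAL =
        new Convention.Impl("PHYSICAL", PhysicalRel.class) {
          @Override public boolean useAbstractConvertersForConversion(
              RelTraitSet fromTraits, RelTraitSet toTraits) {
            // Create AbstractConverters only between two physical trait
            // sets, instead of unconditionally returning true.
            return fromTraits.containsIfApplicable(PHYSICAL)
                && toTraits.containsIfApplicable(PHYSICAL);
          }
        };

    private Conventions() {}
  }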

Adding an AbstractConverter on logical operators is meaningless. Calcite is
mixing the concepts of logical and physical together, which is sad.

BTW, using two conventions is inappropriate and wrong.

- Haisheng

--
From: Vladimir Ozerov
Date: 2019-11-05 18:02:15
To: Haisheng Yuan
Cc: dev@calcite.apache.org (dev@calcite.apache.org)
Subject: Re: Re: Problem with converters and possibly rule matching

Hi Haisheng,

I think I already tried something very similar to what you explained, but it
did not give an optimal plan. Please let me describe what I did. I would
appreciate your feedback.

1) We start with a simple operator tree Root <- Project <- Scan, where the root 
is a final aggregator in the distributed query engine:
-> LogicalRoot
 -> LogicalProject
  -> LogicalScan

2) First, we convert the Root and enforce SINGLETON distribution on a child:
-> PhysicalRoot[SINGLETON]
 -> Enforcer#1[SINGLETON]
  -> LogicalProject
   -> LogicalScan

3) Then the project's rule is invoked. It doesn't know the distribution of the 
input, so it requests ANY distribution. Note that we have to set ANY to the 
project as well since we do not know the distribution of the input:
-> PhysicalRoot[SINGLETON]
 -> Enforcer#1[SINGLETON]
  -> PhysicalProject[ANY]
   -> Enforcer#2[ANY]
    -> LogicalScan

4) Finally, the physical scan is created and its distribution is resolved. 
Suppose that it is REPLICATED, i.e. the whole result set is located on all 
nodes.
-> PhysicalRoot[SINGLETON]
 -> Enforcer#1[SINGLETON]
  -> PhysicalProject[ANY]
   -> Enforcer#2[ANY]
    -> PhysicalScan[REPLICATED]

5) Now that all logical nodes are converted, we start resolving enforcers. The
second one is a no-op, since REPLICATED satisfies ANY:
-> PhysicalRoot[SINGLETON]
 -> Enforcer#1[SINGLETON]
  -> PhysicalProject[ANY]
   -> PhysicalScan[REPLICATED]

6) But the first enforcer now requires an Exchange, since ANY doesn't satisfy 
SINGLETON!
-> PhysicalRoot[SINGLETON]
 -> SingletonExchange[SINGLETON]
  -> PhysicalProject[ANY] // <= unresolved!
   -> PhysicalScan[REPLICATED]

The resulting plan requires data movement only because we didn't know the
precise distribution of the PhysicalProject when it was created. But if I
enable Convention.Impl.canConvertConvention, bottom-up propagation kicks in, and the
correct plan is produced because now LogicalProject has a chance to be 
converted to PhysicalProject with the concrete distribution. The optimized plan 
looks like this (since REPLICATED satisfies SINGLETON):
-> PhysicalRoot[SINGLETON]
 -> PhysicalProject[REPLICATED]
  -> PhysicalScan[REPLICATED]
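
For reference, the satisfies() relation these steps rely on, as a
self-contained sketch (illustrative names, not Calcite's RelDistribution API):

  enum Distribution {
    ANY, REPLICATED, PARTITIONED, SINGLETON;

    boolean satisfies(Distribution required) {
      if (required == ANY) {
        return true;            // any concrete distribution satisfies ANY
      }
      if (this == REPLICATED && required == SINGLETON) {
        return true;            // a full copy on every node acts as a singleton
      }
      return this == required;  // otherwise the traits must match exactly
    }
  }
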
You may see this in action in my reproducer:
1) Test producing "bad" plan: 
https://github.com/devozerov/calcite-optimizer/blob/master/src/test/java/devozerov/OptimizerTest.java#L45
2) Root enforces SINGLETON on Project: 
https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/physical/RootPhysicalRule.java#L45
3) Project enforces default (ANY) distribution on Scan: 
https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/physical/ProjectPhysicalRule.java#L49

Please let me know if this flow is similar to what you meant.

Regards,
Vladimir.
Mon, 4 Nov 2019 at 10:33, Haisheng Yuan:

Hi Vladimir,

This can still be done through a top-down request approach.

The PhysicalFilter operator should request ANY distribution from its child
operator, unless there is an outer reference in the filter condition, in which
case PhysicalFilter should request SINGLETON or BROADCAST distribution. So in
your case, PhysicalFilter requests ANY, and its required distribution will be
enforced on the filter's output.
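
A sketch of that request logic, as it might appear inside a physical filter
rule (requiredFromChild is a name of my own; the correlation check uses the
standard RexVisitor pattern):

  import org.apache.calcite.rel.RelDistribution;
  import org.apache.calcite.rel.RelDistributions;
  import org.apache.calcite.rel.core.Filter;
  import org.apache.calcite.rex.RexCorrelVariable;
  import org.apache.calcite.rex.RexNode;
  import org.apache.calcite.rex.RexVisitorImpl;
  import org.apache.calcite.util.Util;

  RelDistribution requiredFromChild(Filter filter) {
    return hasOuterReference(filter.getCondition())
        ? RelDistributions.SINGLETON // correlated: gather rows on one node
        : RelDistributions.ANY;      // plain filter: accept any distribution
  }

  boolean hasOuterReference(RexNode condition) {
    try {
      condition.accept(new RexVisitorImpl<Void>(true) {
        @Override public Void visitCorrelVariable(RexCorrelVariable v) {
          throw Util.FoundOne.NULL;  // found an outer reference; stop walking
        }
      });
      return false;
    } catch (Util.FoundOne e) {
      return true;
    }
  }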

Regarding index usage, you should have a FilterTableScan2IndexGet logical
transformation rule, and an IndexGet2IndexScan physical implementation rule.
Note that IndexGet is a logical operator and IndexScan is a physical operator;
both are also used by SQL Server.
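
A rough skeleton of the first rule under those assumptions (IndexGet is
hypothetical; Calcite itself ships neither operator):

  import org.apache.calcite.plan.RelOptRule;
  import org.apache.calcite.plan.RelOptRuleCall;
  import org.apache.calcite.rel.logical.LogicalFilter;
  import org.apache.calcite.rel.logical.LogicalTableScan;

  public class FilterTableScan2IndexGetRule extends RelOptRule {
    public FilterTableScan2IndexGetRule() {
      super(operand(LogicalFilter.class,
          operand(LogicalTableScan.class, none())));
    }

    @Override public void onMatch(RelOptRuleCall call) {
      LogicalFilter filter = call.rel(0);
      LogicalTableScan scan = call.rel(1);
      // If an index on scan's table covers filter.getCondition(), register
      // the logical alternative, e.g.
      //   call.transformTo(IndexGet.create(scan, filter.getCondition()));
      // where IndexGet.create is a hypothetical factory method.
    }
  }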

- Haisheng

--
From: Vladimir Ozerov
Date: 2019-11-01 17:30:26
To:
Subject: Re: Problem with converters and possibly rule matching

Hi Stamatis,

Thank you for your reply. I also thought that we may set the distribution
trait during logical planning because it is known in advance. And the
example I gave will work! :-) But unfortunately, it will work only because
the tree is very simple, and the Project is adjacent to the Scan. This is
how my reproducer will work in that case:
1) Root: enforce "SINGLETON" on Project
2) Project: check the logical Scan, infer the already resolved distribution,
then convert to [PhysicalProject <- PhysicalScan]

Re: [DISCUSS] Towards Avatica 1.16.0

2019-11-05 Thread Francis Chuang
I just wanted to follow up on the Avatica 1.16.0 release as I think it 
would be nice to get it out in the next couple of weeks.


Can someone please review the following if they have spare cycles?
- CALCITE-3401 https://github.com/apache/calcite-avatica/pull/115
- CALCITE- https://github.com/apache/calcite-avatica/pull/110
- CALCITE-3162 https://github.com/apache/calcite-avatica/pull/106
- CALCITE-3163 https://github.com/apache/calcite-avatica/pull/105

If CALCITE-2704 (https://github.com/apache/calcite-avatica/pull/85) can be
finished, that would be great too.


Can someone please see if 
https://github.com/apache/calcite-avatica/pull/96 is valid or should be 
closed?


I think it's also possible to get CALCITE-3158 (replace Maven with 
Gradle) into this release (thanks, Vladimir!)


Francis

On 22/10/2019 9:31 am, Vladimir Sitnikov wrote:

Francis>files updated to switch over to using a Gradle docker container to
run
Francis>the build/release steps as well.

Docker does unify the environment; however, I'm not sure it is good to make
Docker the only option for the release.
Gradle enables single-command "build+test+publish to SVN/Nexus" workflow,
so I guess it might make docker.sh obsolete.

Could you give it a try?
Here are the commands:
https://github.com/vlsi/vlsi-release-plugins/tree/master/plugins/stage-vote-release-plugin#testing-release-procedure

Francis>If the change to move to Gradle is checked in

To the best of my knowledge, the missing bits are:
* Documentation (site) update to reflect Maven -> Gradle
* Errorprone
* OWASP plugin
* "Unused dependency"
* "Used but undeclared dependency"
* Removal of pom.xml files

I'm not really sure there's a hard requirement to implement all of the
above before flipping the switch.
I'm inclined to think that OWASP and the "unused/used dependency" checks can be
implemented later. Feel free to correct me.

Vladimir



Re: Re: [DISCUSS] On-demand traitset request

2019-11-05 Thread Jinfeng Ni
@Haisheng, @Xiening,

Thanks for pointing that previous email out. Overall, I agree that
the physical trait enforcement should be done in the engine, not in
the rule. A rule should only specify the request and the corresponding
transformation, and let the engine explore the search space. It would be
great if we could revamp the Volcano optimizer framework to work that way.

In terms of search space, it's always a tradeoff between the space
searched and the optimality of the plan found. I think it's fine for
the engine to explore a potentially big search space, as long as it has an
effective "bound-and-prune" strategy. In the original Volcano paper,
there is a way to prune the search space based on the best plan found
so far, using the parameter "limit".  When an implementable plan is
found, a "real" cost is obtained, which could be used to prune
un-necessary search space.  That's actually the advantage of Volcano's
"top-down" approach. However,  seems to me that Calcite's Volcano did
not apply that approach effectively, because of the existence of
AbstractConverter.
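
A compact sketch of that "limit"-based pruning (hypothetical memo types,
assuming an acyclic memo; not Calcite code):

  import java.util.List;

  final class BranchAndBound {
    interface Expression {
      double localCost();
      List<Group> inputs();
    }

    interface Group {
      List<Expression> expressions();
    }

    /** Returns the cost of the cheapest plan in the group, or {@code limit}
     * if every alternative was pruned against the bound. */
    static double optimize(Group group, double limit) {
      double best = limit;
      for (Expression expr : group.expressions()) {
        double cost = expr.localCost();
        if (cost >= best) {
          continue; // the local cost alone already exceeds the bound
        }
        for (Group input : expr.inputs()) {
          cost += optimize(input, best - cost); // pass the remaining budget
          if (cost >= best) {
            break; // prune the rest of this alternative
          }
        }
        best = Math.min(best, cost); // a cheaper plan tightens the bound
      }
      return best;
    }
  }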


On Sun, Nov 3, 2019 at 10:12 PM Haisheng Yuan  wrote:
>
> Hi Jinfeng,
>
> I think you might have missed the email about the proposed API for physical 
> operators that I sent out previously in [1].
>
> We don't need to request all the permutations, which is also impossible in 
> practice; the search space would explode.
>
> In the example in email [1], I already talked about your concern that passing 
> down the parent request into children may lead to a less optimal plan. Basically 
> the join operator can send 2 collation optimization requests: one passes the 
> request down, the other ignores the parent's request.
>
> Using AbstractConverter to enforce properties is inappropriate; it hands 
> all the optimization work to physical operator providers, meaning there is 
> almost no physical-level optimization mechanism in Calcite. SQL Server and 
> Greenplum's optimizers, which are Cascades-framework based, implement the 
> property enforcement in the core optimizer engine, not through 
> AbstractConverter and rules; physical operators just need to implement those 
> methods (or similar ones) I mentioned in email [1]. My goal is to completely 
> abolish AbstractConverter.
>
> [1] 
> http://mail-archives.apache.org/mod_mbox/calcite-dev/201910.mbox/%3cd75b20f4-542a-4a73-897e-66ab426494c1.h.y...@alibaba-inc.com%3e
>
> - Haisheng
>
> --
> From: Jinfeng Ni
> Date: 2019-11-01 14:10:30
> To:
> Subject: Re: [DISCUSS] On-demand traitset request
>
> Hi Xiening,
>
> "Let say if R and S doesn’t have sorting properties at all. In your
> case, we would end up adding enforcers for LHS and RHS to get
> collation (a, b, c). Then we would need another enforcer to get
> collation (b, c). This is a sub optimal plan as we could have use (b,
> c, a) for join."
>
> In such a case, for step 2, when MergeJoin requests a permutation match of
> (a, b, c) on both its inputs, it is not necessary to end up with
> collation (a, b, c) only. Since it requests a "permutation", MJ could ask for
> all possible satisfying collations, which include (b, c, a). In other
> words, the steps I described did not exclude such a plan.
>
> You may argue it would increase the search space. However, by
> limiting the search space without exploring all possible choices, we may
> lose the chance of getting the 'optimal' plan we want. For instance, in the
> above example, the idea of passing the "on demand" trait request (b, c)
> from Agg to MJ is to avoid an unnecessary sort on (b, c). In cases where
> the join condition has good filtering, such a sort of the join output could
> be quite cheap. Yet in the plan enumeration, since we use the "on demand"
> trait request from the parent to guide the actions of MJ, I'm not sure if
> we may restrict the choices we consider in the legs of the join, whose
> cardinality could be larger and play a bigger role in the overall
> cost.
>
> In other words, by using "on demand" trait requests, we may restrict
> the plan choices, possibly in the operators with larger data
> sizes.
>
> In the current implementation of VolcanoPlanner, I feel the root cause
> of long planning time is not exploring all possible satisfying traits.
> It is actually the unnecessary AbstractConverters added to the
> equivalence class.
>
>
> On Fri, Oct 18, 2019 at 10:39 PM Xiening Dai  wrote:
> >
> > Thanks for sharing. I like the way you model this problem, Jinfeng.
> >
> > There’s one minor issue with your example. Let’s say R and S don’t have 
> > sorting properties at all. In your case, we would end up adding enforcers 
> > for LHS and RHS to get collation (a, b, c). Then we would need another 
> > enforcer to get collation (b, c). This is a suboptimal plan, as we could 
> > have used (b, c, a) for the join.
> >
> > I think in step #2, the join operator would need to take the agg trait 
> > requirement into account. Then it would have two options -
> >
> > 1) require 

Re: Geofunction

2019-11-05 Thread Julian Hyde
I’m surprised we don’t support INTEGER to DECIMAL, since it is lossless 
(unlike, say, converting from INTEGER to REAL). Though I don’t recall what the 
SQL standard says about that conversion. 
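
A quick Java analogue of the losslessness argument (16_777_217 = 2^24 + 1 is
the first integer a 32-bit float cannot represent):

  int i = 16_777_217;
  java.math.BigDecimal d = java.math.BigDecimal.valueOf(i);
  float r = i;                  // implicit INTEGER -> REAL analogue
  System.out.println(d);        // 16777217 (exact)
  System.out.println(r);        // 1.6777216E7 (precision lost)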

It is reasonable to be able to call the GIS functions with integer arguments. 

Julian

> On Nov 5, 2019, at 4:21 AM, Kirils Mensikovs  wrote:
> 
> 
> Here is JIRA ticket:
> https://issues.apache.org/jira/browse/CALCITE-3477
> 
> If you can point me where can I fix it, it would be nice.
> 
> Thanks,
> -Kiril
> 
>> On Nov 4, 2019, at 9:41 PM, Julian Hyde  wrote:
>> 
>> Can you log a bug, please?
>> 
>>> On Nov 1, 2019, at 5:01 AM, Kirils Mensikovs  wrote:
>>> 
>>> Hi, 
>>> 
>>> I am using ST_MakePoint(1.0, 1.0) and that works fine. However, when I try 
>>> to run the following command ST_MakePoint(1, 1), I get an exception:
>>> `No applicable constructor/method found for actual parameters "int, int"`
>>> 
>>> How can I automatically cast int to BigDecimal?
>>> 
>>> Thanks,
>>> -Kiril
>> 
> 


[avatica] STRUCT type information missing in type name

2019-11-05 Thread Alessandro Solimando
Hello,
I noticed that the type information inside the "name" field of
"ColumnMetaData" is dropped for the inner components of STRUCT, while it is
preserved for MULTISET, ARRAY, LIST, and MAP.

For example, the following "RelRecordType":

> RecordType(
>   INTEGER f_int,
>   INTEGER NOT NULL ARRAY f_list_1,
>   (VARCHAR NOT NULL, VARCHAR NOT NULL) MAP f_map_1,
>   DOUBLE NOT NULL MULTISET f_set,
>   RecordType(
>     BIGINT f_tuple_1_0,
>     VARBINARY f_tuple_1_1,
>     TIMESTAMP(0) f_tuple_1_2
>   ) f_tuple_1
> ) NOT NULL


is "translated" into the following when printing the elements of
"ResultSetMetaData":

> [f_int INTEGER,
>  f_list_1 INTEGER ARRAY,
>  f_map_1 (VARCHAR, VARCHAR) MAP,
>  f_set DOUBLE MULTISET,
>  f_tuple_1 STRUCT]


while I was expecting something like:

> [f_int INTEGER,
>  f_list_1 INTEGER ARRAY,
>  f_map_1 (VARCHAR, VARCHAR) MAP,
>  f_set DOUBLE MULTISET,
>  f_tuple_1 (BIGINT, VARBINARY, TIMESTAMP) STRUCT]


The difference comes from
"org.apache.calcite.avatica.ColumnMetaData", where you
have:

> public static StructType struct(List<ColumnMetaData> columns);
> public static ArrayType array(AvaticaType componentType, String typeName,
> Rep rep);
> public static ScalarType scalar(int type, String typeName, Rep rep);


For struct you don't pass "typeName", and the type name is set to "STRUCT"
inside the constructor.

For uniformity I'd build the type name (inside the aforementioned "struct"
method) similarly to the other "collections", that is: "(columnTypeName_1,
..., columnTypeName_n) STRUCT".
For this aim, the information inside the parameter "List<ColumnMetaData>
columns" suffices.

What do you think? Is there any reason I am missing for this different
treatment of collections?

Best regards,
Alessandro


[jira] [Created] (CALCITE-3477) Geofunction do not support int type as input

2019-11-05 Thread Kirils Mensikovs (Jira)
Kirils Mensikovs created CALCITE-3477:
-

 Summary: Geofunction do not support int type as input
 Key: CALCITE-3477
 URL: https://issues.apache.org/jira/browse/CALCITE-3477
 Project: Calcite
  Issue Type: Bug
  Components: spatial
Reporter: Kirils Mensikovs


A geospatial function with an integer parameter fails. The expected behavior is 
to automatically cast all numeric values to BigDecimal.

Example: select ST_MAKEPOINT(1.0, 1)

Returns:

Error: Error while executing SQL "select ST_MAKEPOINT(1.0, 1)": Error while
compiling generated Java code:

public org.apache.calcite.linq4j.Enumerable bind(final org.apache.calcite.DataContext root) {
  final org.apache.calcite.linq4j.Enumerable _inputEnumerable =
      org.apache.calcite.linq4j.Linq4j.asEnumerable(new Integer[] { 0 });
  return new org.apache.calcite.linq4j.AbstractEnumerable(){
      public org.apache.calcite.linq4j.Enumerator enumerator() {
        return new org.apache.calcite.linq4j.Enumerator(){
            public final org.apache.calcite.linq4j.Enumerator inputEnumerator =
                _inputEnumerable.enumerator();
            public void reset() {
              inputEnumerator.reset();
            }
            public boolean moveNext() {
              return inputEnumerator.moveNext();
            }
            public void close() {
              inputEnumerator.close();
            }
            public Object current() {
              final java.math.BigDecimal v = $L4J$C$new_java_math_BigDecimal_1_0_;
              return org.apache.calcite.runtime.GeoFunctions.ST_MakePoint(v, 1);
            }
            static final java.math.BigDecimal $L4J$C$new_java_math_BigDecimal_1_0_ =
                new java.math.BigDecimal("1.0");
          };
      }
    };
}

public Class getElementType() {
  return org.apache.calcite.runtime.GeoFunctions.Geom.class;
}

(state=,code=0)

 





Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2019-11-05 Thread Vladimir Ozerov
Hi Xiening,

I read the thread about on-demand trait requests. It seems pretty similar
to what I am trying to achieve, as it facilitates the bottom-up propagation
of physical traits. In fact, both your and my strategy propagate traits
bottom-up, but I do this through rules, which also fire bottom-up, while in
your case only the traits are propagated bottom-up, while rules continue
working in a top-down fashion.

However, I am thinking of how I would potentially implement my optimizer
with your approach, and it feels like, with on-demand traits, the resulting
implementation of metadata queries may become so complex that it will look
like another set of rules, parallel to the already existing ruleset. For
example, consider that I have a couple of distributed tables in an OLTP
application. These tables have a number of indexes, and I would like to join
them. First, I have a number of choices on how to join the tables with respect
to distribution. Then, I have a number of choices on which access method to
use, because sometimes it is beneficial to pick index scans instead of table
scans even without index conditions, for example to preserve a convenient
collation. So when my logical scan receives such a metadata request, it
typically cannot return all possible combinations, because there are too many
of them. Instead, some heuristic or cost-based logic will be used to calculate
a couple of the most promising ones. But it seems that we would have to
duplicate the same logic in the corresponding rule, wouldn't we?

I would love to read your design because this is a really interesting
topic, and it is of great importance for the distributed engines developed
on top of Calcite since proper use of distribution and collation is the key
success factor for efficient query optimization.

Regards,
Vladimir.

Fri, 1 Nov 2019 at 00:40, Xiening Dai:

> Actually we solved this problem in our setup using a mechanism called
> “Pull-Up Traits”, which explores the possible trait sets of children’s inputs
> to decide the parent’s physical properties. In order to determine a child’s
> input trait, you would have to look at the child’s children, all the way down
> to the leaf nodes or a barrier. A barrier is a rel node which cannot derive
> any traits regardless of the input. A good example would be a user-defined
> function which would throw off any distribution or collation. Then we realized
> just pulling up is not enough; sometimes we would need to look at the parent’s
> requirement as well. So we try to solve this in a unified framework, which
> we call “On Demand Trait”, and implement it as part of the framework so
> anyone can benefit. I hope Haisheng can share a design doc once we
> have more concrete ideas.
>
>
> > On Oct 31, 2019, at 11:37 AM, Jinfeng Ni  wrote:
> >
> > Hi Vladimir,
> >
> > The SubsetTransformer interface and the iteration over the RelNodes
> > within a RelSubset in Drill are implemented exactly to do this trait
> > propagation. We also had to rely on AbstractConverter to fire the
> > necessary rules to avoid the CanNotPlan issue. At some point, the Calcite
> > community chose to remove AbstractConverter and Drill had to add it
> > back, which is probably one of the main reasons for us to continue
> > using a Calcite fork. I still remember we constantly had to deal with
> > the dilemma between "CanNotPlan" and long planning time due to the
> > explored search space.
> >
> > Glad to see more people are joining the effort to solve this long
> > overdue issue, something missing in Calcite's core optimizer framework
> > "since before Calcite was Calcite" (Jacques's words).
> >
> > Jinfeng
> >
> >
> > On Thu, Oct 31, 2019 at 3:38 AM Vladimir Ozerov 
> wrote:
> >>
> >> Hi Danny,
> >>
> >> Thank you very much for the links. What is described here is pretty much
> >> similar to the problem I describe. Especially the discussion about trait
> >> propagation, as this is basically what I need - to explore the potential
> >> traits of children before optimizing parents. And this is basically what
> >> Drill already does with its SubsetTransformer:
> >> 1) There is a SubsetTransformer interface, which iterates over physical
> >> relations of the given subset [1]
> >> 2) If you want to make a physical project, you iterate over physical
> >> relations of the input subset and create possible physical projects [2]
> >> 3) But if you cannot find any physical input, then you trigger creation of
> >> a "bad" physical project, which is very likely to have a poor cost because
> >> it cannot take advantage of the input's distribution information [3]
> >> So, step 2 is the trait set propagation needed by many distributed
> >> engines. Step 3 is an attempt to work around the current VolcanoPlanner
> >> behavior, where a parent rule is fired only if the produced child node has
> >> a compatible trait set.
> >>
> >> I do not know Calcite's architecture that well, but at first glance the
> >> proposed ability to re-fire rules of a specific Rel seems good 
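
A minimal sketch of the subset-iteration idea from steps 1-3 above, written as
it would appear inside a RelOptRule (the PHYSICAL convention constant is the
assumed physical convention; copying the logical project stands in for
building a real PhysicalProject):

  import java.util.Collections;
  import org.apache.calcite.plan.RelOptRuleCall;
  import org.apache.calcite.plan.RelTraitSet;
  import org.apache.calcite.plan.volcano.RelSubset;
  import org.apache.calcite.rel.RelNode;
  import org.apache.calcite.rel.logical.LogicalProject;

  void transformPerInputTraits(RelOptRuleCall call, LogicalProject project,
      RelSubset input) {
    for (RelNode rel : input.getRelList()) {
      RelTraitSet traits = rel.getTraitSet();
      if (traits.contains(PHYSICAL)) {
        // In a real rule this would construct a PhysicalProject; the copy
        // merely illustrates propagating the child's trait set upward.
        RelNode newInput = convert(input, traits);
        call.transformTo(
            project.copy(traits, Collections.singletonList(newInput)));
      }
    }
  }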

Re: Re: Problem with converters and possibly rule matching

2019-11-05 Thread Vladimir Ozerov
Hi Haisheng,

I think I already tried something very similar to what you explained, but it
did not give an optimal plan. Please let me describe what I did. I would
appreciate your feedback.

1) We start with a simple operator tree Root <- Project <- Scan, where the
root is a final aggregator in the distributed query engine:
-> LogicalRoot
 -> LogicalProject
  -> LogicalScan

2) First, we convert the Root and enforce SINGLETON distribution on a child:
-> PhysicalRoot[SINGLETON]
 -> Enforcer#1[SINGLETON]
  -> LogicalProject
   -> LogicalScan

3) Then the project's rule is invoked. It doesn't know the distribution of
the input, so it requests ANY distribution. Note that we have to set ANY to
the project as well since we do not know the distribution of the input:
-> PhysicalRoot[SINGLETON]
 -> Enforcer#1[SINGLETON]
  -> PhysicalProject[ANY]
   -> Enforcer#2[ANY]
    -> LogicalScan

4) Finally, the physical scan is created and its distribution is resolved.
Suppose that it is REPLICATED, i.e. the whole result set is located on all
nodes.
-> PhysicalRoot[SINGLETON]
 -> Enforcer#1[SINGLETON]
  -> PhysicalProject[ANY]
   -> Enforcer#2[ANY]
    -> PhysicalScan[REPLICATED]

5) Now that all logical nodes are converted, we start resolving enforcers.
The second one is a no-op, since REPLICATED satisfies ANY:
-> PhysicalRoot[SINGLETON]
 -> Enforcer#1[SINGLETON]
  -> PhysicalProject[ANY]
   -> PhysicalScan[REPLICATED]

6) But the first enforcer now requires an Exchange, since ANY doesn't
satisfy SINGLETON!
-> PhysicalRoot[SINGLETON]
 -> SingletonExchange[SINGLETON]
  -> PhysicalProject[ANY] // <= unresolved!
   -> PhysicalScan[REPLICATED]

The resulting plan requires data movement only because we didn't know the
precise distribution of the PhysicalProject when it was created. But if I
enable Convention.Impl.canConvertConvention, bottom-up propagation kicks
in, and the correct plan is produced because now LogicalProject has a
chance to be converted to PhysicalProject with the concrete distribution.
The optimized plan looks like this (since REPLICATED satisfies SINGLETON):
-> PhysicalRoot[SINGLETON]
 -> PhysicalProject[REPLICATED]
  -> PhysicalScan[REPLICATED]

You may see this in action in my reproducer:
1) Test producing "bad" plan:
https://github.com/devozerov/calcite-optimizer/blob/master/src/test/java/devozerov/OptimizerTest.java#L45
2) Root enforces SINGLETON on Project:
https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/physical/RootPhysicalRule.java#L45
3) Project enforces default (ANY) distribution on Scan:
https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/physical/ProjectPhysicalRule.java#L49

Please let me know if this flow is similar to what you meant.

Regards,
Vladimir.

Mon, 4 Nov 2019 at 10:33, Haisheng Yuan:

> Hi Vladimir,
>
> This can still be done through a top-down request approach.
>
> The PhysicalFilter operator should request ANY distribution from its child
> operator, unless there is an outer reference in the filter condition, in which
> case PhysicalFilter should request SINGLETON or BROADCAST distribution. So
> in your case, PhysicalFilter requests ANY, and its required distribution will
> be enforced on the filter's output.
>
> Regarding index usage, you should have a FilterTableScan2IndexGet logical
> transformation rule, and an IndexGet2IndexScan physical implementation rule.
> Note that IndexGet is a logical operator and IndexScan is a physical
> operator; both are also used by SQL Server.
>
> - Haisheng
>
> --
> From: Vladimir Ozerov
> Date: 2019-11-01 17:30:26
> To:
> Subject: Re: Problem with converters and possibly rule matching
>
> Hi Stamatis,
>
> Thank you for your reply. I also thought that we may set the distribution
> trait during logical planning because it is known in advance. And the
> example I gave will work! :-) But unfortunately, it will work only because
> the tree is very simple, and the Project is adjacent to the Scan. This is
> how my reproducer will work in that case:
> 1) Root: enforce "SINGLETON" on Project
> 2) Project: check the logical Scan, infer the already resolved distribution,
> then convert to [PhysicalProject <- PhysicalScan]
> 3) Resolve the Root enforcer, adding an Exchange if needed.
>
> But this stops working as soon as a plan becomes more complex so that it is
> impossible to infer the distribution from the child immediately. E.g.:
> LogicalRoot [distribution=SINGLETON]
> -> LogicalProject // We are here now and cannot produce the physical project
>  -> LogicalFilter[distribution=?]
>   -> LogicalScan[distribution=REPLICATED]
>
> This is where your suggestion with cascading enforcement may kick in. But
> now consider that instead of having a REPLICATED distribution, which
> satisfies SINGLETON, we have a PARTITIONED distribution. It doesn't satisfy
> SINGLETON, so we have to insert the exchange.
>
> Before:
> PhysicalRoot [SINGLETON]
> -> PhysicalProject