Re: Best way to merge a set of Operators in a single Operator

2017-05-22 Thread Γιώργος Θεοδωράκης
In order to make it work, I had to explicitly define the row type in the
core operator I have created:

public abstract class AggrCalc extends SingleRel {
  ...
  // constructor: set the row type explicitly
  protected AggrCalc(...) {
    super(cluster, traits, child);
    this.rowType = wantedRowType;
  }
}

I am not sure if this is a good way to do it from the beginning (e.g. I
lose wanted information about the original operators), but I think it
serves my purpose, as this rule is enforced before execution.
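
For reference, here is a rough sketch of the kind of rule I mean; the operand
pattern matches (Calc <- Aggregate <- Calc), and MyAggrCalc and its constructor
arguments are just illustrative placeholders, not the actual code:

import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.Aggregate;
import org.apache.calcite.rel.core.Calc;

public class AggrCalcMergeRule extends RelOptRule {
  public AggrCalcMergeRule() {
    // call.rel(0) is the top Calc, call.rel(1) the Aggregate, call.rel(2) the bottom Calc.
    super(operand(Calc.class, operand(Aggregate.class, operand(Calc.class, any()))),
        "AggrCalcMergeRule");
  }

  @Override public void onMatch(RelOptRuleCall call) {
    final Calc topCalc = call.rel(0);
    final Aggregate aggregate = call.rel(1);
    final Calc bottomCalc = call.rel(2);
    // The merged node must report the row type of the subtree it replaces
    // (the top Calc); otherwise transformTo() fails with
    // "Cannot add expression of different type to set".
    final RelNode merged = new MyAggrCalc(topCalc.getCluster(), topCalc.getTraitSet(),
        bottomCalc.getInput(), topCalc.getProgram(), bottomCalc.getProgram(),
        aggregate.getAggCallList(), aggregate.getGroupSet(),
        topCalc.getRowType()); // the "wantedRowType" set in the constructor above
    call.transformTo(merged);
  }
}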

2017-05-22 12:13 GMT+03:00 Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>:

> Hello,
> I tried to write something by myself, and your example helped. However, I
> am stuck with this error:
>
> I have created an operator that holds the proper info (program1, program2,
> AggList, GroupList, ...) about two CALCs and one Aggregate in the following
> formation:
> (Calc1->Aggregate->Calc2) => CustomOperator
>
> and I get this error when I try to transform them to my custom operator:
>
> Exception in thread "main" java.lang.AssertionError: Cannot add expression
> of different type to set:
> set type is RecordType(Calc1 output schema)
> expression type is RecordType(Calc2 output schema)
> ...
>
> I have seen that there is a check when calling the transformTo() method. Is
> there any trivial way to overcome this error?
>
> In addition, in your example, DruidQuery starts from the bottom of the
> operators' tree and you only have to worry about the row type of the last
> RelNode in the stack. Would I have a problem with the kind of rule I am
> trying to create? If I had only (Calc->Aggregate), would it be easier?
>
>
> 2017-05-12 5:34 GMT+03:00 Julian Hyde <jh...@apache.org>:
>
>> It seems that "composite" operators crop up quite often. Having kept
>> the operators separate in older adapters like the Mongo adapter, I
>> took a different approach in the Druid adapter, and I'm quite pleased with
>> it.
>>
>> DruidQuery contains a "stack" of internal relational operators. They
>> are held in the field
>>
>>   final ImmutableList<RelNode> rels;
>>
>> The initial DruidQuery contains just a TableScan. It grows as other
>> operators (Filter, Project, Aggregate) are pushed down to Druid, one
>> at a time.
>>
>> The internal RelNodes are not visible to the planner but are
>> nevertheless set up pretty much as they would be if they were outside
>> the DruidQuery. The row type of the query is the row type of the last
>> RelNode in the stack.
>>
>> The "signature()" method helps figure out whether an operation can be
>> pushed onto a DruidQuery. It returns a string that indicates the
>> sequence of operations. For example, a TableScan followed by a Filter
>> followed by a Project returns "sfp", and the rule to push an Aggregate
>> into Druid knows that it can succeed because "sfpa" is in the list of
>> valid signatures.
>>
>> Julian
>>
>>
>>
>> On Thu, May 11, 2017 at 4:16 AM, Γιώργος Θεοδωράκης
>> <giwrgosrth...@gmail.com> wrote:
>> > I am trying to "separate" certain subsets of Operators in a query tree
>> and
>> > transform them to a more "general" RelNode implementation, that holds
>> the
>> > information required to rebuild them. I want to implement something more
>> > general than CALC (for more types of operators), that works like this:
>> >
>> > Operator1 -> Operator2 -> Operator3   ===Enforcing Rules with certain
>> > Conditions==>
>> >
>> > Operator1 -> (Operator2 -> Operator3) == Operator1 -> MergedOperators2,3
>> > (we can distinguish that this operator is built from Operators 2 and 3)
>> >
>> > Can anyone suggest a possible starting point?
>>
>
>


Best way to merge a set of Operators in a single Operator

2017-05-11 Thread Γιώργος Θεοδωράκης
I am trying to "separate" certain subsets of Operators in a query tree and
transform them to a more "general" RelNode implementation that holds the
information required to rebuild them. I want to implement something more
general than CALC (for more types of operators) that works like this:

Operator1 -> Operator2 -> Operator3   ===Enforcing Rules with certain
Conditions==>

Operator1 -> (Operator2 -> Operator3) == Operator1 -> MergedOperators2,3
(we can distinguish that this operator is built from Operators 2 and 3)

Can anyone suggest a possible starting point?


Re: Understanding the science and concepts behind Calcite

2017-05-09 Thread Γιώργος Θεοδωράκης
Here (http://www.redbook.io/index.html) are some more interesting readings
about database systems, with references to the relevant papers.

Chapter 7 is about query optimization (it mentions Calcite), and earlier
chapters also discuss System R, Volcano and some basic optimization concepts.

2017-05-09 9:59 GMT+03:00 Ismaël Mejía :

> This thread is really good, and considering that this is a 'common' question
> on the mailing list, maybe it would be a good idea to create a section for
> this on the Calcite website.
>
> I add my little contribution for the moment. The Database Course by Andy Pavlo
> covers many interesting subjects on databases and in particular discusses
> query planning in two lessons.
>
> http://15721.courses.cs.cmu.edu/spring2016/schedule.html
>
>
> On Sun, Apr 30, 2017 at 10:09 PM, Muhammad Gelbana 
> wrote:
> > Thanks everyone, this is very helpful !!
> >
> > And if anyone has anything else that could be of help, please share it.
> >
> > *-*
> > *Muhammad Gelbana*
> > http://www.linkedin.com/in/mgelbana
> >
> > On Sun, Apr 30, 2017 at 9:09 PM, Eli Levine  wrote:
> >
> >> Relational algebra concepts are presented well in "Database Management
> >> Systems" by Ramakrishnan/Gehrke. That's what I used for my undergrad
> >> DBMS course.
> >>
> >> Eli
> >>
> >> On Sun, Apr 30, 2017 at 10:44 AM, Khai Tran
> >> wrote:
> >> > I don't know of any undergrad database course teaching the Volcano
> >> > optimizer. It's probably too hard for undergrad level. For relational
> >> > algebra and a System-R style optimizer, this is a good one:
> >> > http://pages.cs.wisc.edu/~cs564-1/schedule.html
> >> >
> >> > On Sat, Apr 29, 2017 at 3:17 PM, Julian Hyde 
> wrote:
> >> >
> >> >> Adding dev@drill to the cc list, because Muhammad also asked the
> >> question
> >> >> there. But please reply to dev@calcite only.
> >> >>
> >> >> I gave a talk “Why you should care about relational algebra”[1],
> >> intended
> >> >> for an audience of people who know SQL, but with a lot of details
> about
> >> >> algebra and algebraic transformations.
> >> >>
> >> >> And you could do a lot worse than read Graefe & McKenna’s original
> >> Volcano
> >> >> planner paper[2]. Also Graefe's later Cascades paper. (Graefe just
> >> received
> >> >> the 2017 SIGMOD Edgar F. Codd Innovations Award[3], well deserved.
> >> Frankly,
> >> >> you should read everything he ever wrote! I love his work on hybrid
> hash
> >> >> join and comparing sort-based and hash-based algorithms.)
> >> >>
> >> >> Most of the terms Graefe uses are the same as we use in Calcite. We
> say
> >> >> “filter” because “select” confuses everyone who knows SQL. Graefe’s
> >> >> “physical properties” are our “traits” and he has another term for
> what
> >> we
> >> >> call “importance”. Our “converter” is his “enforcer”. Our “row type”
> is
> >> his
> >> >> “schema”. Our “set” is his “equivalence class”, and our “subset” is
> an
> >> >> equivalence class combined with a particular set of physical
> >> properties. We
> >> >> mix his “logical” and “physical” algebras into one algebra, and
> >> introduce a
> >> >> new concept of “calling convention” so that you can mix logical
> algebra
> >> >> with multiple physical algebras in hybrid plans.
> >> >>
> >> >> Does anyone know of a good undergraduate treatment of relational
> algebra
> >> >> and query optimization?
> >> >>
> >> >> Julian
> >> >>
> >> >> [1] https://calcite.apache.org/community/#more-talks
> >> >>
> >> >> [2] http://www.cs.colorado.edu/department/publications/reports/docs/CU-CS-563-91.pdf
> >> >>
> >> >> [3] https://sigmod.org/sigmod-awards/
> >> >>
> >> >> > On Apr 29, 2017, at 10:22 AM, Muhammad Gelbana <
> m.gelb...@gmail.com>
> >> >> wrote:
> >> >> >
> >> >> > I'm trying to understand the scientific concepts behind Calcite
> and I
> >> was
> >> >> > wondering if anyone would kindly recommend
> >> articles\papers\books\topic-
> >> >> titles
> >> >> > that would help me understand Calcite from the ground up.
> >> >> >
> >> >> > For instance, I'm not fully understanding what are:
> >> >> >
> >> >> >   - Relational expressions
> >> >> >   - Row expressions
> >> >> >   - Calling conventions
> >> >> >   - Relational traits
> >> >> >   - Relational traits definitions
> >> >> >
> >> >> > I'm currently looking for books about "Relational Algebra", but when
> >> >> > I look into one, I can't find anything about traits or calling
> >> >> > conventions. Or am I not searching for the correct keywords?
> >> >>
> >> >>
> >>
>


CumulativeCost not working while searching for the optimal plan in Volcano

2017-05-02 Thread Γιώργος Θεοδωράκης
Hello,

I am trying to implement a cost model in which some of an operator's
parameters are computed from the estimates of its children. For example, if I
have Filter2(Filter1(Scan)), I want to use the cpu estimate I have for
Filter1 to compute the cost parameters of the Filter2 operator.

Code from the Filter's computeSelfCost method:

@Override public RelOptCost computeSelfCost(RelOptPlanner planner,
    RelMetadataQuery mq) {
  double rowCount = mq.getRowCount(this);
  double selectivity = mq.getSelectivity(this.getInput(), this.getCondition());
  double rate = selectivity
      * ((SaberCostBase) mq.getCumulativeCost(this.getInput())).getRate();
  double cpuCost = SaberCostBase.Cs * rate;
  double window = selectivity
      * ((SaberCostBase) mq.getCumulativeCost(this.getInput())).getWindow();
  window = (window < 1) ? 1 : window; // fix window size in order to be >= 1
  double R = (((SaberCostBase)
      mq.getCumulativeCost(this.getInput())).getCpu() + cpuCost) / rate;
  SaberCostFactory costFactory = (SaberCostFactory) planner.getCostFactory();
  return costFactory.makeCost(rowCount, cpuCost, 0, rate, 0, window, R);
}
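
(For context: the makeCost(...) call above assumes a cost factory extended with
the extra streaming dimensions. Roughly something like the following sketch; the
exact signature is an assumption based on that call, not the real SaberCostFactory:)

import org.apache.calcite.plan.RelOptCost;
import org.apache.calcite.plan.RelOptCostFactory;

// Extends the standard (rowCount, cpu, io) factory with rate, memory, window
// and R, matching the seven-argument makeCost(...) call used above.
public interface SaberCostFactory extends RelOptCostFactory {
  RelOptCost makeCost(double rowCount, double cpu, double io,
      double rate, double memory, double window, double r);
}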

When I compute the cumulative cost of my final plan I get correct estimates.
However, I found out with breakpoints that, when I use the Volcano optimizer
to enforce some rules, both
((SaberCostBase) mq.getCumulativeCost(this.getInput())).getRate() and
((SaberCostBase) mq.getCumulativeCost(this.getInput())).getCpu() return
zeros instead of the correct numbers. By inspecting the "this" variable, I
discovered that the correct estimates can be found in
this->input->bestCost, which I can't use.

Is there any obvious way of accessing the estimates of the previous
operators while using the Volcano optimization procedure? Is it somehow
possible to do it?

Thank you in advance,
George


Re: Exception in VolcanoPlanner while using custom Operators and a rule that transposes two Filters.

2017-05-02 Thread Γιώργος Θεοδωράκης
Thank you. My rule was creating LogicalFilters in its last two lines
instead of my custom operators, and that is why it failed.
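
For anyone hitting the same issue, a rough sketch of the corrected onMatch; it
assumes the custom operator exposes a create(input, condition) factory analogous
to LogicalFilter.create, which is an assumption about my custom code rather than
Calcite API:

public void onMatch(RelOptRuleCall call) {
  final Filter filter = call.rel(0);
  final Filter secFilter = call.rel(1);
  // Build the swapped filters in the custom convention instead of as
  // LogicalFilter, so the planner is not asked to implement purely logical
  // (infinite-cost) nodes.
  final RelNode newFilter =
      SaberFilterRel.create(secFilter.getInput(), filter.getCondition());
  final RelNode newSecFilter =
      SaberFilterRel.create(newFilter, secFilter.getCondition());
  call.transformTo(newSecFilter);
}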

2017-05-01 22:32 GMT+03:00 Julian Hyde <jh...@apache.org>:

> I think what’s happening is this. The root is logical, so you’re asking it
> to produce a logical plan. Logical plans have infinite cost. So they can’t
> be implemented.
>
> > On Apr 30, 2017, at 2:52 PM, Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>
> wrote:
> >
> > Hello,
> >
> > I have written a very simple rule for pushing a filter through filter,
> > which worked perfectly when I applied it with Volcano on the regular
> > implementation of operators. Here is the code of my rule:
> >
> >  ...
> >  public void onMatch(RelOptRuleCall call) {
> >...
> >final LogicalFilter newFilter =
> > LogicalFilter.create(secFilter.getInput(), filter.getCondition());
> >final LogicalFilter newSecFilter = LogicalFilter.create(newFilter,
> > secFilter.getCondition());
> >call.transformTo(newSecFilter);
> >  }
> >
> > In order to introduce a new cost model, I have created my custom
> operators.
> > However, I see the following error when trying to use this specific rule
> in
> > Volcano:
> >
> > My query is:
> > select * from (
> > select * from s.orders
> > where s.orders.units > 10) as s1
> > where s1.productid = 15 or s1.productid = 17;
> >
> > Exception in thread "main"
> > org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node
> > [rel#36:Subset#1.LOGICAL.[0]] could not be implemented; planner state:
> >
> > Root: rel#33:Subset#2.LOGICAL.[]
> > Original rel:
> >
> > Sets:
> > Set#0, type: RecordType(TIMESTAMP(0) rowtime, INTEGER orderid, INTEGER
> > productid, INTEGER units, INTEGER customerid)
> > rel#28:Subset#0.NONE.[0], best=rel#5, importance=0.405
> > rel#5:LogicalTableScan.NONE.[[0]](table=[s, orders]), rowcount=500.0,
> > cumulative cost={inf}
> > rel#35:Subset#0.LOGICAL.[0], best=rel#34, importance=0.81
> > rel#34:SaberTableScanRel.LOGICAL.[[0]](table=[s, orders]),
> rowcount=500.0,
> > cumulative cost={500.0 rows, 0.0 cpu, 0.0 io, 500.0 rate, 0.0 memory, 1.0
> > window, 0.0 R}
> > Set#1, type: RecordType(TIMESTAMP(0) rowtime, INTEGER orderid, INTEGER
> > productid, INTEGER units, INTEGER customerid)
> > rel#30:Subset#1.NONE.[0], best=rel#47, importance=0.45
> > rel#29:LogicalFilter.NONE.[[0]](input=rel#28:Subset#0.NONE.[
> 0],condition=>($3,
> > 10)), rowcount=250.0, cumulative cost={inf}
> > rel#47:LogicalFilter.NONE.[[0]](input=rel#35:Subset#0.
> LOGICAL.[0],condition=>($3,
> > 10)), rowcount=250.0, cumulative cost={inf}
> > rel#36:Subset#1.LOGICAL.[0], best=null, importance=0.9
> > rel#39:SaberFilterRel.LOGICAL.[[0]](input=rel#35:Subset#0.
> LOGICAL.[0],condition=>($3,
> > 10)), rowcount=250.0, cumulative cost={501.0 rows, 1.0 cpu, 0.0 io, 0.0
> > rate, 0.0 memory, 0.0 window, 0.0 R}
> > Set#2, type: RecordType(TIMESTAMP(0) rowtime, INTEGER orderid, INTEGER
> > productid, INTEGER units, INTEGER customerid)
> > rel#32:Subset#2.NONE.[0], best=rel#31, importance=0.0
> > rel#31:LogicalFilter.NONE.[[0]](input=rel#30:Subset#1.NONE.[
> 0],condition=OR(=($2,
> > 15), =($2, 17))), rowcount=62.5, cumulative cost={inf}
> > rel#43:LogicalFilter.NONE.[[0]](input=rel#42:Subset#3.NONE.[
> 0],condition=>($3,
> > 10)), rowcount=62.5, cumulative cost={inf}
> > rel#33:Subset#2.LOGICAL.[], best=rel#37, importance=0.0
> > rel#37:SaberFilterRel.LOGICAL.[[0]](input=rel#36:Subset#1.
> LOGICAL.[0],condition=OR(=($2,
> > 15), =($2, 17))), rowcount=62.5, cumulative cost={inf}
> > rel#45:SaberFilterRel.LOGICAL.[[0]](input=rel#44:Subset#3.
> LOGICAL.[0],condition=>($3,
> > 10)), rowcount=62.5, cumulative cost={502.0 rows, 2.0 cpu, 0.0 io, 0.0
> > rate, 0.0 memory, 0.0 window, 0.0 R}
> > rel#38:Subset#2.LOGICAL.[0], best=rel#37, importance=0.0
> > rel#37:SaberFilterRel.LOGICAL.[[0]](input=rel#36:Subset#1.
> LOGICAL.[0],condition=OR(=($2,
> > 15), =($2, 17))), rowcount=62.5, cumulative cost={inf}
> > rel#45:SaberFilterRel.LOGICAL.[[0]](input=rel#44:Subset#3.
> LOGICAL.[0],condition=>($3,
> > 10)), rowcount=62.5, cumulative cost={502.0 rows, 2.0 cpu, 0.0 io, 0.0
> > rate, 0.0 memory, 0.0 window, 0.0 R}
> > Set#3, type: RecordType(TIMESTAMP(0) rowtime, INTEGER orderid, INTEGER
> > productid, INTEGER units, INTEGER customerid)
> > rel#42:Subset#3.NONE.[0], best=rel#40, importance=0.45
> > rel#40:LogicalFilter.NONE.[[0]](input=rel#35:Subset#0.
> LOGICAL.[0],conditio

Exception in VolcanoPlanner while using custom Operators and a rule that transposes two Filters.

2017-04-30 Thread Γιώργος Θεοδωράκης
Hello,

I have written a very simple rule for pushing a filter through another filter,
which worked perfectly when I applied it with Volcano on the regular
implementation of the operators. Here is the code of my rule:

  ...
  public void onMatch(RelOptRuleCall call) {
    ...
    final LogicalFilter newFilter =
        LogicalFilter.create(secFilter.getInput(), filter.getCondition());
    final LogicalFilter newSecFilter =
        LogicalFilter.create(newFilter, secFilter.getCondition());
    call.transformTo(newSecFilter);
  }

In order to introduce a new cost model, I have created my custom operators.
However, I see the following error when trying to use this specific rule in
Volcano:

My query is:
select * from (
select * from s.orders
where s.orders.units > 10) as s1
where s1.productid = 15 or s1.productid = 17;

Exception in thread "main"
org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node
[rel#36:Subset#1.LOGICAL.[0]] could not be implemented; planner state:

Root: rel#33:Subset#2.LOGICAL.[]
Original rel:

Sets:
Set#0, type: RecordType(TIMESTAMP(0) rowtime, INTEGER orderid, INTEGER
productid, INTEGER units, INTEGER customerid)
rel#28:Subset#0.NONE.[0], best=rel#5, importance=0.405
rel#5:LogicalTableScan.NONE.[[0]](table=[s, orders]), rowcount=500.0,
cumulative cost={inf}
rel#35:Subset#0.LOGICAL.[0], best=rel#34, importance=0.81
rel#34:SaberTableScanRel.LOGICAL.[[0]](table=[s, orders]), rowcount=500.0,
cumulative cost={500.0 rows, 0.0 cpu, 0.0 io, 500.0 rate, 0.0 memory, 1.0
window, 0.0 R}
Set#1, type: RecordType(TIMESTAMP(0) rowtime, INTEGER orderid, INTEGER
productid, INTEGER units, INTEGER customerid)
rel#30:Subset#1.NONE.[0], best=rel#47, importance=0.45
rel#29:LogicalFilter.NONE.[[0]](input=rel#28:Subset#0.NONE.[0],condition=>($3,
10)), rowcount=250.0, cumulative cost={inf}
rel#47:LogicalFilter.NONE.[[0]](input=rel#35:Subset#0.LOGICAL.[0],condition=>($3,
10)), rowcount=250.0, cumulative cost={inf}
rel#36:Subset#1.LOGICAL.[0], best=null, importance=0.9
rel#39:SaberFilterRel.LOGICAL.[[0]](input=rel#35:Subset#0.LOGICAL.[0],condition=>($3,
10)), rowcount=250.0, cumulative cost={501.0 rows, 1.0 cpu, 0.0 io, 0.0
rate, 0.0 memory, 0.0 window, 0.0 R}
Set#2, type: RecordType(TIMESTAMP(0) rowtime, INTEGER orderid, INTEGER
productid, INTEGER units, INTEGER customerid)
rel#32:Subset#2.NONE.[0], best=rel#31, importance=0.0
rel#31:LogicalFilter.NONE.[[0]](input=rel#30:Subset#1.NONE.[0],condition=OR(=($2,
15), =($2, 17))), rowcount=62.5, cumulative cost={inf}
rel#43:LogicalFilter.NONE.[[0]](input=rel#42:Subset#3.NONE.[0],condition=>($3,
10)), rowcount=62.5, cumulative cost={inf}
rel#33:Subset#2.LOGICAL.[], best=rel#37, importance=0.0
rel#37:SaberFilterRel.LOGICAL.[[0]](input=rel#36:Subset#1.LOGICAL.[0],condition=OR(=($2,
15), =($2, 17))), rowcount=62.5, cumulative cost={inf}
rel#45:SaberFilterRel.LOGICAL.[[0]](input=rel#44:Subset#3.LOGICAL.[0],condition=>($3,
10)), rowcount=62.5, cumulative cost={502.0 rows, 2.0 cpu, 0.0 io, 0.0
rate, 0.0 memory, 0.0 window, 0.0 R}
rel#38:Subset#2.LOGICAL.[0], best=rel#37, importance=0.0
rel#37:SaberFilterRel.LOGICAL.[[0]](input=rel#36:Subset#1.LOGICAL.[0],condition=OR(=($2,
15), =($2, 17))), rowcount=62.5, cumulative cost={inf}
rel#45:SaberFilterRel.LOGICAL.[[0]](input=rel#44:Subset#3.LOGICAL.[0],condition=>($3,
10)), rowcount=62.5, cumulative cost={502.0 rows, 2.0 cpu, 0.0 io, 0.0
rate, 0.0 memory, 0.0 window, 0.0 R}
Set#3, type: RecordType(TIMESTAMP(0) rowtime, INTEGER orderid, INTEGER
productid, INTEGER units, INTEGER customerid)
rel#42:Subset#3.NONE.[0], best=rel#40, importance=0.45
rel#40:LogicalFilter.NONE.[[0]](input=rel#35:Subset#0.LOGICAL.[0],condition=OR(=($2,
15), =($2, 17))), rowcount=125.0, cumulative cost={inf}
rel#44:Subset#3.LOGICAL.[0], best=rel#46, importance=0.9
rel#46:SaberFilterRel.LOGICAL.[[0]](input=rel#35:Subset#0.LOGICAL.[0],condition=OR(=($2,
15), =($2, 17))), rowcount=125.0, cumulative cost={501.0 rows, 1.0 cpu, 0.0
io, 0.0 rate, 0.0 memory, 0.0 window, 0.0 R}


at
org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:443)
at
org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:465)
at
org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:293)
at
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:666)
at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:368)
at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:313)
at calcite.planner.SaberPlanner.getLogicalPlan(SaberPlanner.java:257)
at calcite.Tester.main(Tester.java:241)


Any hints on what I am doing wrong?

Thank you in advance,
George


Re: doubt about how to access a lowerbound value in window

2017-02-09 Thread Γιώργος Θεοδωράκης
I try something like this:

Iterate until you find your Window rel, and then:

LogicalWindow windowAgg = (LogicalWindow) rel;
int windowRange = createWindowFrame(windowAgg.getConstants());

...

private int createWindowFrame(List<RexLiteral> constants) {
  int windowFrame = 0;
  for (RexLiteral con : constants) {
    windowFrame += Integer.parseInt(con.toString());
  }
  // fix unbounded window range
  //if (windowFrame == 0)
  //  windowFrame++;
  return windowFrame;
}

2017-02-09 15:19 GMT+02:00 Stefano Bortoli :

> Hi all,
>
>
>
> I am new to the community and I  am still figuring out things. So please,
> apologize if I am not precise enough.
>
>
>
> I am trying to work with streaming windows, and I would need to access
> some values related to the boundaries of the window.
>
>
>
> So far, the way I found was to use the window group keys, and use its
> first value to get a value from the window.constants lists.
>
> **
>
> Integer lowerBoundKey = group.keys.asList().get(0);
>
> Object lowerbound = constants.get(lowerBoundKey).getValue2();
>
> **
>
> However, this method seems to be a little arbitrary as I would expect to
> be using the lowerbound offset somehow.
>
>
>
> Intuitively, I was expecting to use the lowerbound offset name or index
> position to get the value from somewhere, but I am having problems
> understanding how. I feel I am missing something, and I would be grateful
> if someone can give me some hint about it.
>
> Thanks a lot in advance!
> Stefano
>
>


Re: JoinAssociateRule Exception

2017-01-19 Thread Γιώργος Θεοδωράκης
I created a JIRA issue (I hope I did it right as it is my first). Thank you
Julian.

2017-01-19 22:12 GMT+02:00 Julian Hyde <jh...@apache.org>:

> Can you log a JIRA case for this? I will answer there.
>
> On Thu, Jan 19, 2017 at 11:25 AM, Γιώργος Θεοδωράκης
> <giwrgosrth...@gmail.com> wrote:
> > Hello,
> >
> > I have created my own operators and Convention to apply my custom cost
> > logic. I have tried many rules with both Volcano and HepPlanner and
> > everything works fine. When I apply LoptOptimizeRule I get the correct
> > output. However, when I try to use:
> > JoinPushThroughJoinRule.LEFT,
> > JoinPushThroughJoinRule.RIGHT,
> > JoinAssociateRule.INSTANCE,
> > JoinCommuteRule.INSTANCE
> > something goes wrong. I try to apply them in VolcanoPlanner in the same
> > step I use my converter rules to change the initial Logical Operators to
> > the custom ones.
> > I declare them as:
> > static final RelOptRule SABER_JOIN_PUSH_THROUGH_JOIN_RULE_RIGHT = new
> > JoinPushThroughJoinRule("JoinPushThroughJoinRule", true,
> > SaberJoinRel.class, SaberRelFactories.SABER_LOGICAL_BUILDER);
> > static final RelOptRule SABER_JOIN_PUSH_THROUGH_JOIN_RULE_LEFT = new
> > JoinPushThroughJoinRule("JoinPushThroughJoinRule", false,
> > SaberJoinRel.class, SaberRelFactories.SABER_LOGICAL_BUILDER);
> >
> > static final RelOptRule SABER_JOIN_COMMUTE_RULE = new
> > JoinCommuteRule(SaberJoinRel.class,
> > SaberRelFactories.SABER_LOGICAL_BUILDER, false);
> > So, when I use the four of them I get:
> >
> > Exception in thread "main" java.lang.AssertionError: Internal error:
> Error
> > while applying rule SaberProjectRule, args
> > [rel#171:LogicalProject.NONE.[](input=rel#170:Subset#20.
> LOGICAL.[],rowtime=$8,customerid=$9,phone=$10,rowtime0=$0,orderid=$1,
> productid=$2,units=$3,customerid0=$4,rowtime00=$5,
> productid0=$6,description=$7)]
> > at org.apache.calcite.util.Util.newInternal(Util.java:792)
> > at
> > org.apache.calcite.plan.volcano.VolcanoRuleCall.
> onMatch(VolcanoRuleCall.java:236)
> > at
> > org.apache.calcite.plan.volcano.VolcanoPlanner.
> findBestExp(VolcanoPlanner.java:819)
> > at org.apache.calcite.tools.Programs$RuleSetProgram.run(
> Programs.java:334)
> > at org.apache.calcite.prepare.PlannerImpl.transform(
> PlannerImpl.java:308)
> > at calcite.planner.SaberPlanner.getLogicalPlan(SaberPlanner.java:257)
> > at calcite.Tester.main(Tester.java:183)
> > Caused by: java.lang.AssertionError: Internal error: Error occurred while
> > applying rule SaberProjectRule
> > at org.apache.calcite.util.Util.newInternal(Util.java:792)
> > at
> > org.apache.calcite.plan.volcano.VolcanoRuleCall.
> transformTo(VolcanoRuleCall.java:148)
> > at
> > org.apache.calcite.plan.RelOptRuleCall.transformTo(
> RelOptRuleCall.java:225)
> > at
> > org.apache.calcite.rel.convert.ConverterRule.onMatch(
> ConverterRule.java:117)
> > at
> > org.apache.calcite.plan.volcano.VolcanoRuleCall.
> onMatch(VolcanoRuleCall.java:213)
> > ... 5 more
> > Caused by: org.apache.calcite.rel.metadata.CyclicMetadataException
> > at
> > org.apache.calcite.rel.metadata.CyclicMetadataException.(
> CyclicMetadataException.java:28)
> > at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
> > at
> > org.apache.calcite.rel.metadata.RelMetadataQuery.
> getRowCount(RelMetadataQuery.java:201)
> > at
> > org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.
> java:132)
> > at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
> > at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
> > at
> > org.apache.calcite.rel.metadata.RelMetadataQuery.
> getRowCount(RelMetadataQuery.java:201)
> >
> > 
> >
> > If it helps, when I don't use JoinCommuteRule I don't get an exception,
> but
> > I get wrong result and when I don't use JoinAssociateRule I get:
> >
> > Exception in thread "main" java.lang.StackOverflowError
> > at
> > com.google.common.collect.ImmutableCollection.(
> ImmutableCollection.java:157)
> > at com.google.common.collect.ImmutableList.(
> ImmutableList.java:313)
> > at
> > com.google.common.collect.SingletonImmutableList.(
> SingletonImmutableList.java:35)
> > at com.google.common.collect.ImmutableList.of(ImmutableList.java:76)
> > at org.apache.calcite.rel.SingleRel.getInputs(SingleRel.java:63)
> > at
> > org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(
> RelSubset.java:461)
> >
> > Thank you in advance,
> > George
>


JoinAssociateRule Exception

2017-01-19 Thread Γιώργος Θεοδωράκης
Hello,

I have created my own operators and Convention to apply my custom cost
logic. I have tried many rules with both Volcano and HepPlanner and
everything works fine. When I apply LoptOptimizeRule I get the correct
output. However, when I try to use:
JoinPushThroughJoinRule.LEFT,
JoinPushThroughJoinRule.RIGHT,
JoinAssociateRule.INSTANCE,
JoinCommuteRule.INSTANCE
something goes wrong. I try to apply them in VolcanoPlanner in the same
step in which I use my converter rules to change the initial logical
operators to the custom ones.
I declare them as:

static final RelOptRule SABER_JOIN_PUSH_THROUGH_JOIN_RULE_RIGHT =
    new JoinPushThroughJoinRule("JoinPushThroughJoinRule", true,
        SaberJoinRel.class, SaberRelFactories.SABER_LOGICAL_BUILDER);

static final RelOptRule SABER_JOIN_PUSH_THROUGH_JOIN_RULE_LEFT =
    new JoinPushThroughJoinRule("JoinPushThroughJoinRule", false,
        SaberJoinRel.class, SaberRelFactories.SABER_LOGICAL_BUILDER);

static final RelOptRule SABER_JOIN_COMMUTE_RULE =
    new JoinCommuteRule(SaberJoinRel.class,
        SaberRelFactories.SABER_LOGICAL_BUILDER, false);

So, when I use the four of them I get:

Exception in thread "main" java.lang.AssertionError: Internal error: Error
while applying rule SaberProjectRule, args
[rel#171:LogicalProject.NONE.[](input=rel#170:Subset#20.LOGICAL.[],rowtime=$8,customerid=$9,phone=$10,rowtime0=$0,orderid=$1,productid=$2,units=$3,customerid0=$4,rowtime00=$5,productid0=$6,description=$7)]
at org.apache.calcite.util.Util.newInternal(Util.java:792)
at
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:236)
at
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:819)
at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:334)
at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:308)
at calcite.planner.SaberPlanner.getLogicalPlan(SaberPlanner.java:257)
at calcite.Tester.main(Tester.java:183)
Caused by: java.lang.AssertionError: Internal error: Error occurred while
applying rule SaberProjectRule
at org.apache.calcite.util.Util.newInternal(Util.java:792)
at
org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:148)
at
org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:225)
at
org.apache.calcite.rel.convert.ConverterRule.onMatch(ConverterRule.java:117)
at
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:213)
... 5 more
Caused by: org.apache.calcite.rel.metadata.CyclicMetadataException
at
org.apache.calcite.rel.metadata.CyclicMetadataException.(CyclicMetadataException.java:28)
at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
at
org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:201)
at
org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:132)
at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
at
org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:201)



If it helps: when I don't use JoinCommuteRule I don't get an exception, but
I get a wrong result; and when I don't use JoinAssociateRule I get:

Exception in thread "main" java.lang.StackOverflowError
at
com.google.common.collect.ImmutableCollection.(ImmutableCollection.java:157)
at com.google.common.collect.ImmutableList.(ImmutableList.java:313)
at
com.google.common.collect.SingletonImmutableList.(SingletonImmutableList.java:35)
at com.google.common.collect.ImmutableList.of(ImmutableList.java:76)
at org.apache.calcite.rel.SingleRel.getInputs(SingleRel.java:63)
at
org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:461)

Thank you in advance,
George


Re: Plug a new cost model to Calcite

2016-12-22 Thread Γιώργος Θεοδωράκης
I have written two converter rules to test my custom TableScan and Project
on Volcano (as Convention I use new Convention.Impl("LOGICAL", SaberRel.class)).
To begin with, I only want to convert Calcite's initial logical operators to my
custom logical ones. I have added the two converter rules to my planner,
together with AbstractConverter.ExpandConversionRule.INSTANCE.
However, I get this error:

Exception in thread "main"
org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node
[rel#8:Subset#0.ENUMERABLE.[]] could not be implemented; planner state:

Root: rel#8:Subset#0.ENUMERABLE.[]
Original rel:

Sets:
Set#0, type: RecordType(TIMESTAMP(0) rowtime, INTEGER productid, INTEGER
description)
rel#4:Subset#0.NONE.[0], best=null, importance=0.0
rel#2:LogicalTableScan.NONE.[[0]](table=[s, products]), rowcount=16384.0,
cumulative cost={inf}
rel#5:LogicalProject.NONE.[[0]](input=rel#4:Subset#0.NONE.[0],rowtime=$0,productid=$1,description=$2),
rowcount=16384.0, cumulative cost={inf}
rel#8:Subset#0.ENUMERABLE.[], best=null, importance=0.0
rel#10:Subset#0.LOGICAL.[0], best=rel#9, importance=0.0
rel#9:SaberTableScanRel.LOGICAL.[[0]](table=[s, products]),
rowcount=16384.0, cumulative cost={16384.0 rows, 16385.0 cpu, 0.0 io,
16384.0 rate, 0.0 memory, 0.0 window, 0.0 network}
rel#11:SaberProjectRel.LOGICAL.[[0]](input=rel#10:Subset#0.LOGICAL.[0],rowtime=$0,productid=$1,description=$2),
rowcount=16384.0, cumulative cost={32768.0 rows, 147457.0 cpu, 0.0 io,
16384.0 rate, 0.0 memory, 1.0 window, 0.0 network}


at
org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:443)
at
org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:293)
at
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:835)
at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:334)
at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:308)
at calcite.planner.SaberPlanner.getLogicalPlan(SaberPlanner.java:309)
at calcite.Tester.main(Tester.java:70)

Why does it still try to implement the Enumerable convention? When I add the
Enumerable rules I get the Enumerable plan and not my custom one. What
should I do?
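
For reference, a rough sketch of the convention/converter-rule pair I am
describing; SaberRel, SaberProjectRel and its constructor arguments are
placeholders for my own classes, not Calcite API:

import org.apache.calcite.plan.Convention;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.convert.ConverterRule;
import org.apache.calcite.rel.logical.LogicalProject;

// The custom calling convention shared by all custom operators.
public interface SaberRel extends RelNode {
  Convention LOGICAL = new Convention.Impl("LOGICAL", SaberRel.class);
}

// Converts a LogicalProject (Convention.NONE) into the custom convention.
class SaberProjectRule extends ConverterRule {
  SaberProjectRule() {
    super(LogicalProject.class, Convention.NONE, SaberRel.LOGICAL,
        "SaberProjectRule");
  }

  @Override public RelNode convert(RelNode rel) {
    final LogicalProject project = (LogicalProject) rel;
    // Request the input in the custom convention as well.
    final RelNode input = convert(project.getInput(),
        project.getInput().getTraitSet().replace(SaberRel.LOGICAL));
    // SaberProjectRel is my own operator (placeholder constructor here).
    return new SaberProjectRel(project.getCluster(),
        project.getTraitSet().replace(SaberRel.LOGICAL),
        input, project.getProjects(), project.getRowType());
  }
}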

2016-12-17 22:11 GMT+02:00 Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>:

> Hello,
>
> I think I have understood the basics about RelOptCost, transformer and
> converter rules and how to use them in Volcano in order to set my own
> cost-model, from examples I have seen in Drill and Hive mainly.
> As I am thinking about it right now, I should:
> 1)define my cost model and optimization goal (Volcano uses Lt method) in a
> CustomCostBase class that implements RelOptCost
> 2)define my operators with the same logic as they are and override their
> computeCost method
> 3)define my proper converter and transform rules
>
> However, in order to use Volcano, I have to use some rules with Convention
> that is not NONE. So I think I have to implement my own custom Convention
> that my rules should have and convert the logical rules to my custom ones.
> Then enforce my rules on these custom rules and get an optimized plan.
>
> Do I miss something in the logic I describe above? Do you have any
> suggestions on how to do it in another way?
>
> Thank you in advance,
> George
>
>


Join Rules using Volcano or Hep Planner

2016-11-24 Thread Γιώργος Θεοδωράκης
Hello,

I am trying to get an optimized join reorder for a given RelNode. Until
now, I used VolcanoPlanner with these rules and it worked well for a small
number of Joins :
JoinPushThroughJoinRule.LEFT,
JoinPushThroughJoinRule.RIGHT,
JoinAssociateRule.INSTANCE

When I change the above rules to:
 JoinToMultiJoinRule.INSTANCE,
 LoptOptimizeJoinRule.INSTANCE
I still get the optimized result for the small number of tests I have run.

However, when I try the latter rules in HepPlanner, I don't get an
optimized plan. Are these rules supposed to work only in VolcanoPlanner?
Also what is the difference between choosing (JoinPushThroughJoinRule +
JoinAssociateRule) vs (JoinToMultiJoinRule + LoptOptimizeJoinRule)? As I
see here
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/tools/Programs.java#L217,
it chooses LoptOptimizeJoinRule if there are 6 or more Joins. Could I use
it for all cases, or is it better to use (JoinPushThroughJoinRule +
JoinAssociateRule) for a small number of Joins?
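
(For reference, Programs exposes that threshold logic directly; a minimal sketch,
where "joinRules" is just a placeholder for whatever rule collection you already
register:)

// Keeps the given join-reordering rules for small queries and switches to
// JoinToMultiJoinRule + LoptOptimizeJoinRule once there are at least 6 joins.
Program joinOrdering = Programs.heuristicJoinOrder(joinRules, false, 6);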

Thank you in advance,
George


Re: Create a FilterPushThroughFilterRule

2016-11-16 Thread Γιώργος Θεοδωράκης
I think I fixed it from the last time, when I didn't swap the filters
correctly:

  public void onMatch(RelOptRuleCall call) {
    final Filter filter = call.rel(0);
    final Filter secFilter = call.rel(1);
    final LogicalFilter newFilter =
        LogicalFilter.create(secFilter.getInput(), filter.getCondition());
    final LogicalFilter newSecFilter =
        LogicalFilter.create(newFilter, secFilter.getCondition());
    call.transformTo(newSecFilter);
  }

I haven't thought about special cases, and for now it chooses whether to swap
the filters according to the number of rows in the output (and the selectivity
of the operators). In the current cost model it makes only a little difference,
but with a more accurate approximation of selectivities it could be very
helpful for optimizing the final plan.
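
(A rough sketch of how such a selectivity check could be expressed with the
standard metadata query; the comparison itself is just one possible policy, not
the actual rule:)

// Only fire when the outer predicate is the more selective of the two on the
// inner filter's input, i.e. when swapping should reduce the intermediate rows.
@Override public boolean matches(RelOptRuleCall call) {
  final Filter filter = call.rel(0);      // outer filter
  final Filter secFilter = call.rel(1);   // inner filter
  final RelMetadataQuery mq = RelMetadataQuery.instance();
  final Double outerSel =
      mq.getSelectivity(secFilter.getInput(), filter.getCondition());
  final Double innerSel =
      mq.getSelectivity(secFilter.getInput(), secFilter.getCondition());
  return outerSel != null && innerSel != null && outerSel < innerSel;
}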


2016-11-16 2:09 GMT+02:00 Julian Hyde <jh...@apache.org>:

> Can you log a JIRA case, and include the full error stack?
> (VolcanoRuleCall.java:236 is just a “re-throw”).
>
> > On Nov 15, 2016, at 3:59 PM, Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>
> wrote:
> >
> > Hello Julian,
> >
> > I get no matter what I do this exception:
> > Exception in thread "main" java.lang.AssertionError: Internal error:
> Error
> > while applying rule FilterPushThroughFilter, args
> > [rel#15:LogicalFilter.NONE.[[0]](input=rel#12:Subset#1.
> NONE.[0],condition==($1,
> > 5)), rel#11:LogicalFilter.NONE.[[0]](input=rel#10:Subset#0.NONE.[
> 0],condition=>($1,
> > 5))]
> > at org.apache.calcite.util.Util.newInternal(Util.java:792)
> > at org.apache.calcite.plan.volcano.VolcanoRuleCall.
> onMatch(VolcanoRuleCall
> > .java:236)
> > at org.apache.calcite.plan.volcano.VolcanoPlanner.
> findBestExp(VolcanoPlanner
> > .java:819)
> > at org.apache.calcite.tools.Programs$RuleSetProgram.run(
> Programs.java:334)
> > at org.apache.calcite.prepare.PlannerImpl.transform(
> PlannerImpl.java:308)
> > at calcite.planner.SaberPlanner.getLogicalPlan(SaberPlanner.java:287)
> > at calcite.Tester.main(Tester.java:68)
> > ...
> >
> > I try to use something like :
> >
> > final Filter filter = call.rel(0);
> > final Filter secFilter = call.rel(1);
> > final LogicalFilter newFilter = LogicalFilter.create(filter.getInput(),
> > secFilter.getCondition());
> > final LogicalFilter newSecFilter = LogicalFilter.create(newFilter,
> filter.
> > getCondition());
> > call.transformTo(newSecFilter);
> >
> > but nothing happens. Volcano does nothing and Hep crashes. Also I think
> > RelBuilder doesn't try to merge them. Do you have any suggestions?
> >
> > Thanks,
> > George
> >
> > 2016-11-14 21:17 GMT+02:00 Julian Hyde <jh...@apache.org>:
> >
> >> You have two calls to “build”. That looks wrong, because “build” pops an
> >> entry off the stack.
> >>
> >> If your intention is to swap the filters then you should push them in
> the
> >> reverse order than you are currently pushing them.
> >>
> >> Lastly, RelBuilder.filter might try to merge consecutive filter nodes.
> If
> >> it does — and I don’t recall whether it does — your filter had better be
> >> flagged non-deterministic (or something) to prevent the merge from
> >> happening.
> >>
> >> Julian
> >>
> >>> On Nov 14, 2016, at 1:06 AM, Γιώργος Θεοδωράκης <
> giwrgosrth...@gmail.com>
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I want to create a rule that pushes a filter through another filter ( I
> >>> don't merge them) according to their selectivities to optimize the
> final
> >>> plan. I am using other rules as templates to create it but I keep
> getting
> >>> errors, as I haven't understood correctly the basics. I want to have
> >>> something like this :
> >>>
> >>> public void onMatch(RelOptRuleCall call) {
> >>>   final Filter filter = call.rel(0);
> >>>   final Filter secFilter = call.rel(1);
> >>>   final RelBuilder relBuilder = call.builder();
> >>>   relBuilder.push(filter)
> >>>   .filter(secFilter.getCondition())
> >>>   .build();
> >>>   call.transformTo(relBuilder.build());
> >>> }
> >>
> >>
>
>


Create a FilterPushThroughFilterRule

2016-11-14 Thread Γιώργος Θεοδωράκης
Hello,

I want to create a rule that pushes a filter through another filter (I
don't merge them) according to their selectivities, to optimize the final
plan. I am using other rules as templates to create it, but I keep getting
errors, as I haven't understood the basics correctly. I want to have
something like this:

  public void onMatch(RelOptRuleCall call) {
    final Filter filter = call.rel(0);
    final Filter secFilter = call.rel(1);
    final RelBuilder relBuilder = call.builder();
    relBuilder.push(filter)
        .filter(secFilter.getCondition())
        .build();
    call.transformTo(relBuilder.build());
  }
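
(For such a rule to match, the operand tree must bind the outer filter to
call.rel(0) and the inner one to call.rel(1); a minimal sketch of the rule's
constructor, with the class name as a placeholder:)

// Constructor of the rule class; the onMatch(...) above completes it.
public FilterPushThroughFilterRule() {
  super(operand(Filter.class, operand(Filter.class, any())),
      "FilterPushThroughFilterRule");
}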


Re: Window Semantics for Streams

2016-11-03 Thread Γιώργος Θεοδωράκης
I think the right syntax is like this:

"select s.orders.productid, SUM(units) over pr "
+ "from s.orders "
+ "window pr as (PARTITION BY productid ORDER BY productid ROWS BETWEEN 5
PRECEDING AND 10 FOLLOWING)"
);

or this :

"select s.orders.productid, SUM(units) over (pr ROWS BETWEEN 5 PRECEDING
AND 10 FOLLOWING) "
+ "from s.orders "
+ "window pr as (PARTITION BY productid ORDER BY productid)"
);

Although with the second way the bounds are undefined for me.
Do you get right the bounds for your query?

2016-11-03 11:22 GMT+02:00 Radu Tudoran <radu.tudo...@huawei.com>:

> Hi,
>
> I am also working on a similar topic and encountered a problem with the
> window definition and parsing it's syntax.
>
> I am following the example to define the windows with the OVER clause. And
> I am trying to use also the Partition BY clause when defining the window.
> Can I get some help to find the proper syntax to define the window. It
> seems that the example syntax did not work for me as shown below.
>
>
> SELECT CLIENT_NAME,
> SUM(AMOUNT) OVER prodW (RANGE INTERVAL '10' MINUTE PRECEDING) AS
> m10
>   FROM inputdata
>   WINDOW prodW AS (ORDER BY ETIME PARTITION BY ID)
>
> However this leads to a parsing error
>
>
> Encountered "(" at line 1, column 46.
> Was expecting one of:
> "FROM" ...
> "," ...
> "AS" ...
> ...
> at org.apache.calcite.sql.parser.impl.SqlParserImpl.convertException(
> SqlParserImpl.java:388)
> at org.apache.calcite.sql.parser.impl.SqlParserImpl.
> normalizeException(SqlParserImpl.java:119)
> at org.apache.calcite.sql.parser.SqlParser.parseQuery(
> SqlParser.java:131)
> at org.apache.calcite.sql.parser.SqlParser.parseStmt(SqlParser.
> java:156)
> at org.apache.calcite.prepare.PlannerImpl.parse(PlannerImpl.
> java:167)
> at org.huawei.demo.sqlparsercsv.LocalParser3.main(
> LocalParser3.java:68)
>
>
> Alternatively I tried to define the window completely
>
>
> SELECT CLIENT_NAME,
> SUM(AMOUNT) OVER prodW AS m10
>   FROM inputdata
>   WINDOW prodW AS (ORDER BY ETIME RANGE INTERVAL '10' MINUTE PRECEDING
> PARTITION BY ID)
>
> Which leads to a different error:
>
> Exception in thread "main" org.apache.calcite.sql.parser.SqlParseException:
> Encountered "PARTITION" at line 1, column 158.
> Was expecting one of:
> ")" ...
> "ALLOW" ...
> "DISALLOW" ...
>
> at org.apache.calcite.sql.parser.impl.SqlParserImpl.
> convertException(SqlParserImpl.java:388)
> at org.apache.calcite.sql.parser.impl.SqlParserImpl.
> normalizeException(SqlParserImpl.java:119)
> at org.apache.calcite.sql.parser.SqlParser.parseQuery(
> SqlParser.java:131)
> at org.apache.calcite.sql.parser.SqlParser.parseStmt(SqlParser.
> java:156)
> at org.apache.calcite.prepare.PlannerImpl.parse(PlannerImpl.
> java:167)
> at org.huawei.demo.sqlparsercsv.LocalParser3.main(
> LocalParser3.java:74)
>
>
>
>
>
> -Original Message-
> From: Julian Hyde [mailto:jh...@apache.org]
> Sent: Wednesday, November 02, 2016 7:14 PM
> To: dev@calcite.apache.org
> Subject: Re: Window Semantics for Streams
>
> As you know, streams.html is a specification. We do not claim that it is
> all implemented.
>
> Did you do a search of the existing tests? JdbcTest.testWinAgg2 features
> windows that have a variety of bounds, and produces the correct results.
> There are also tests in winagg.iq.
>
> I suspect that the “constants” field of Window is not output as part of
> the explain for Window (or LogicalWindow). The $2 and $3 refer to those
> hidden constants.
>
> Julian
>
> > On Nov 2, 2016, at 10:53 AM, Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>
> wrote:
> >
> > Hello,
> >
> > Can someone inform me if we can define the bounds of sliding windows
> > with OVER in Calcite at this moment? I am trying to define sliding
> > windows according to the examples given in https://calcite.apache
> > .org/docs/stream.html and I keep getting wrong results. Some examples
> > and the plans (I use ProjectToWindowRUle) they generate are :
> > 1)
> >   "select s.orders.productid, SUM(units) over pr " + "from
> > s.orders " + "window pr as (ORDER BY productid ROWS BETWEEN 5
> > PRECEDING AND
> > 10 FOLLOWING)"
> >
> > ==>
> > LogicalProject(productid=[$0], EXPR$1=[CASE(>($2, 0),
> > CAST($3):INTEGER,
> > null)]) LogicalWindow(window#0=[window(

Re: Window Semantics for Streams

2016-11-02 Thread Γιώργος Θεοδωράκης
Hello,

Can someone tell me whether we can define the bounds of sliding windows with
OVER in Calcite at this moment? I am trying to define sliding windows
according to the examples given in https://calcite.apache.org/docs/stream.html
and I keep getting wrong results. Some examples and the plans they generate
(I use ProjectToWindowRule) are:
1)
"select s.orders.productid, SUM(units) over pr "
    + "from s.orders "
    + "window pr as (ORDER BY productid ROWS BETWEEN 5 PRECEDING AND 10 FOLLOWING)"

==>
LogicalProject(productid=[$0], EXPR$1=[CASE(>($2, 0), CAST($3):INTEGER, null)])
  LogicalWindow(window#0=[window(partition {} order by [0] rows between $2 PRECEDING and $3 FOLLOWING aggs [COUNT($1), $SUM0($1)])])
    LogicalProject(productid=[$1], units=[$2])
      LogicalTableScan(table=[[s, orders]])

in which, the numbers I have used are "converted" to columns.

2)
"select s.orders.productid, SUM(units) over (ORDER BY productid ROWS
BETWEEN 5 PRECEDING AND 10 FOLLOWING) " + "from s.orders "

For this I get the same plan as before.

3)The same goes for RANGE :

 "select s.orders.productid, SUM(units) over (ORDER BY productid
RANGE BETWEEN 5 PRECEDING AND 10 FOLLOWING) "
 + "from s.orders "

4)
 "select s.orders.productid, SUM(units) over (ORDER BY productid
RANGE 3600 PRECEDING) "
 + "from s.orders "

==>
LogicalProject(productid=[$0], EXPR$1=[CASE(>($2, 0), CAST($3):INTEGER,
null)])
  LogicalWindow(window#0=[window(partition {} order by [0] range between $2
PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])])
LogicalProject(productid=[$1], units=[$2])
  LogicalTableScan(table=[[s, orders]])

Can someone provide me a working example?

Thanks,
George

2016-10-18 20:33 GMT+03:00 Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>:

> Also this query in Calcite :
>
>  "select * from ("
> + "select s.orders.productid , avg(units) OVER (product
> ROWS BETWEEN 10 PRECEDING and 5 FOLLOWING)"   + " as m10, "
> + "AVG(units) OVER (product RANGE INTERVAL '7' DAY
> PRECEDING) AS d7 "
> + "from s.orders "
> + " WINDOW product AS (PARTITION BY productId)) "
> + "where m10>d7 "
>  );
>
> gives me after optimization the following plan, that doesn't have any
> window boundaries :
>
> LogicalFilter(condition=[>($1, $2)])
>   LogicalProject(productid=[$0], m10=[CAST(/($1, $2)):INTEGER NOT NULL],
> d7=[CAST(/($3, $4)):INTEGER NOT NULL])
> LogicalProject(productid=[$0], $1=[$2], $2=[$3], $3=[$4], $4=[$5])
>   LogicalWindow(window#0=[window(partition {0} order by [] rows
> between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1),
> COUNT($1)])], window#1=[window(partition {0} order by [] range between
> UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1), COUNT($1)])])
> LogicalProject(productid=[$1], units=[$2])
>   LogicalTableScan(table=[[s, orders]])
>
> 2016-10-18 20:23 GMT+03:00 Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>:
>
>> Hi,
>>
>> I was wondering if there is any possible way to define windows with SQl
>> in Calcite for queries that don't have an aggregate function? For example,
>> I want to define the queries from Linear Road Benchmark of the STREAM
>> project (http://infolab.stanford.edu/stream/cql-benchmark.html):
>>
>> 1)
>> SELECT DISTINCT car_id
>> FROM CarSegStr [RANGE 30 SECONDS];
>>
>> 2)
>> SELECT car_id, exp_way, dir, seg
>> FROM CarSegStr [PARTITION BY car_id ROWS 1], CurActiveCars
>> WHERE CarSegStr.car_id = CurActiveCars.car_id;
>>
>> Thank you in advance,
>> George.
>>
>
>


Re: How to use VolcanoPlanner

2016-11-01 Thread Γιώργος Θεοδωράκης
I fixed the errors (they occurred because of the way I added the
EnumerableRules in Volcano).

I have implemented something like what you suggested:
1) I use VolcanoPlanner (with both my simple rules and the EnumerableRules) and
get a plan made of Enumerable operators. As I have found out the hard way, I
cannot get a plan with logical operators (which is what I would prefer) from
VolcanoPlanner, and I have to use some convention rules and traits.
2) I use HepPlanner with two custom rules I created, which push
EnumerableProjects down through EnumerableJoins (this is what I wanted). The
problem is that my final plan has every node as an Enumerable operator, except
for the LogicalProjects, because of how I implemented the custom rules. The
result is fine for now, but I will try to fix it.

 If I am missing something, I look forward to better suggestions.
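
For completeness, a rough sketch of how I imagine the two phases could be chained
as a single Program; the rule lists here are illustrative, not my actual rule sets:

Program program = Programs.sequence(
    // phase 1: exhaustive (Volcano) optimization with the Enumerable rules
    Programs.ofRules(
        EnumerableRules.ENUMERABLE_TABLE_SCAN_RULE,
        EnumerableRules.ENUMERABLE_FILTER_RULE,
        EnumerableRules.ENUMERABLE_PROJECT_RULE,
        EnumerableRules.ENUMERABLE_JOIN_RULE),
    // phase 2: heuristic (Hep) pass that pushes projections below joins
    Programs.hep(
        ImmutableList.of(ProjectJoinTransposeRule.INSTANCE),
        /* noDag= */ true,
        DefaultRelMetadataProvider.INSTANCE));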

Thanks,
George

2016-11-01 20:09 GMT+02:00 Colm McHugh <colm.mch...@gmail.com>:

> What errors did you get? It should be possible to use both Volcano and hep
> when query planning (Drill does this, and possibly others).
> It superficially sounds like applying a heuristics pass that includes the
> project pushdown (and any other rules you may want to apply)
> after the volcano pass should work.
>
> On Tue, Nov 1, 2016 at 4:05 AM, Γιώργος Θεοδωράκης <
> giwrgosrth...@gmail.com>
> wrote:
>
> > I am wondering if is it possible to push down projections in Volcano
> > generally. With the cost model of Volcano, a projection adds rows and cpu
> > cost and it can't be chosen. For example  for the next query:
> >
> > "select s.products.productid "
> > + "from s.products,s.orders "
> > + "where s.orders.productid = s.products.productid  "
> > + "and units>5 "
> >
> > , the optimized plan I get from Volcano is this
> >
> > EnumerableProject(productid=[$0]): rowcount = 5.3995,
> > cumulative cost = {51.555956815368326 rows, 37.4 cpu, 0.0 io}, id = 86
> >   EnumerableJoin(condition=[=($0, $3)], joinType=[inner]): rowcount =
> > 5.3995, cumulative cost = {46.15595681536833 rows, 32.0 cpu,
> > 0.0 io}, id = 85
> > EnumerableTableScan(table=[[s, products]]): rowcount = 6.0,
> cumulative
> > cost = {6.0 rows, 7.0 cpu, 0.0 io}, id = 40
> > EnumerableFilter(condition=[>($2, 5)]): rowcount = 6.0, cumulative
> > cost
> > = {18.0 rows, 25.0 cpu, 0.0 io}, id = 84
> >   EnumerableTableScan(table=[[s, orders]]): rowcount = 12.0,
> cumulative
> > cost = {12.0 rows, 13.0 cpu, 0.0 io}, id = 59
> >
> > However the plan I would like to get is the one with projections pushed
> > down, which has bigger cumulative cost as we would have two more
> > projections at the source of inputs (with additional rowCount and cpu
> cost)
> > with Volcano's cost model, but smaller in most cases as we reduce the
> > memory needed(which is not taken in consideration here).
> >
> > Are there any suggestions? At first I used hepPlanner, but I didn't get
> > optimized plans for join order with the built-in
> > rules(using JoinPushThroughJoinRule.RIGHT and
> JoinPushThroughJoinRule.LEFT
> > never stops, and using MultiJoinRules doesn't do anything). Then, I tried
> > to use VolcanoPlanner and I got optimized plans for join order with no
> > pushed-down projections. I also tried to use a combination of both
> planners
> > but I got errors. What should I do?
> >
> > 2016-10-27 16:00 GMT+03:00 Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>:
> >
> > > I fixed the second error by changing my Statistics to:
> > >
> > > public Statistic getStatistic() {
> > > int rowCount = rows.size();
> > > return Statistics.of(rowCount, ImmutableList.of());
> > > }
> > >
> > > Any suggestions of a better solution are welcome. The project push-down
> > > problem still exists...
> > >
> > > 2016-10-27 15:42 GMT+03:00 Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com
> >:
> > >
> > >> Hi,
> > >> I was missing the implementations of operators, and I added the built
> in
> > >> EnumerableRules until I create my own, in order to fix it. However,
> the
> > >> plan I get from Volcano Optimizer is different from the one I get from
> > >> HepPlanner, although I use the same rules. My problem is about
> > Projection
> > >> push-down. The hepPlanner pushes Projections to the bottom of the
> > RelNode
> > >> tree, and VolcanoPlanner keeps them always at the top (doesn't push
> them
> > >> through joins). I use these rules in both :
> > >>
> >

Re: How to use VolcanoPlanner

2016-11-01 Thread Γιώργος Θεοδωράκης
I am wondering whether it is possible to push down projections in Volcano
at all. With Volcano's cost model, a projection adds row and cpu cost, so the
pushed-down alternative can't be chosen. For example, for the next query:

"select s.products.productid "
+ "from s.products,s.orders "
+ "where s.orders.productid = s.products.productid  "
+ "and units>5 "

, the optimized plan I get from Volcano is this

EnumerableProject(productid=[$0]): rowcount = 5.3995,
cumulative cost = {51.555956815368326 rows, 37.4 cpu, 0.0 io}, id = 86
  EnumerableJoin(condition=[=($0, $3)], joinType=[inner]): rowcount =
5.3995, cumulative cost = {46.15595681536833 rows, 32.0 cpu,
0.0 io}, id = 85
EnumerableTableScan(table=[[s, products]]): rowcount = 6.0, cumulative
cost = {6.0 rows, 7.0 cpu, 0.0 io}, id = 40
EnumerableFilter(condition=[>($2, 5)]): rowcount = 6.0, cumulative cost
= {18.0 rows, 25.0 cpu, 0.0 io}, id = 84
  EnumerableTableScan(table=[[s, orders]]): rowcount = 12.0, cumulative
cost = {12.0 rows, 13.0 cpu, 0.0 io}, id = 59

However, the plan I would like to get is the one with the projections pushed
down. Under Volcano's cost model it has a bigger cumulative cost, as we would
have two more projections at the sources of the inputs (with additional
rowCount and cpu cost), but in most cases it is cheaper because we reduce the
memory needed (which is not taken into consideration here).

Are there any suggestions? At first I used HepPlanner, but I didn't get
optimized join orders with the built-in rules (using
JoinPushThroughJoinRule.RIGHT and JoinPushThroughJoinRule.LEFT never
terminates, and using the MultiJoin rules doesn't do anything). Then I tried
VolcanoPlanner and got optimized join orders, but with no pushed-down
projections. I also tried to use a combination of both planners, but I got
errors. What should I do?

2016-10-27 16:00 GMT+03:00 Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>:

> I fixed the second error by changing my Statistics to:
>
> public Statistic getStatistic() {
> int rowCount = rows.size();
> return Statistics.of(rowCount, ImmutableList.of());
> }
>
> Any suggestions of a better solution are welcome. The project push-down
> problem still exists...
>
> 2016-10-27 15:42 GMT+03:00 Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>:
>
>> Hi,
>> I was missing the implementations of operators, and I added the built in
>> EnumerableRules until I create my own, in order to fix it. However, the
>> plan I get from Volcano Optimizer is different from the one I get from
>> HepPlanner, although I use the same rules. My problem is about Projection
>> push-down. The hepPlanner pushes Projections to the bottom of the RelNode
>> tree, and VolcanoPlanner keeps them always at the top (doesn't push them
>> through joins). I use these rules in both :
>>
>>ProjectRemoveRule.INSTANCE,
>>ProjectJoinTransposeRule.INSTANCE,
>> ProjectFilterTransposeRule.INSTANCE, /*it is better to use
>> filter first and then project*/
>>ProjectTableScanRule.INSTANCE,
>>ProjectWindowTransposeRule.INSTANCE,
>>ProjectMergeRule.INSTANCE
>>
>> and
>> EnumerableRules.ENUMERABLE_TABLE_SCAN_RULE,
>> EnumerableRules.ENUMERABLE_PROJECT_RULE,
>> ...
>>
>> Finally , when trying to use aggregate I get this error:
>>
>> Exception in thread "main" java.lang.AssertionError: Internal error:
>> Error while applying rule EnumerableTableScanRule, args
>> [rel#4:LogicalTableScan.NONE.[](table=[s, orders])]
>> at org.apache.calcite.util.Util.newInternal(Util.java:792)
>> at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(
>> VolcanoRuleCall.java:236)
>> at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(
>> VolcanoPlanner.java:819)
>> at org.apache.calcite.tools.Programs$RuleSetProgram.run(Program
>> s.java:334)
>> at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:308)
>> at calcite.VolcanoTester.main(VolcanoTester.java:106)
>> Caused by: java.lang.AssertionError: Internal error: Error occurred while
>> applying rule EnumerableTableScanRule
>> at org.apache.calcite.util.Util.newInternal(Util.java:792)
>> at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(
>> VolcanoRuleCall.java:148)
>> at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRul
>> eCall.java:225)
>> at org.apache.calcite.rel.convert.ConverterRule.onMatch(Convert
>> erRule.java:117)
>> at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(
>> VolcanoRuleCall.java:213)
>> ... 4 more
>&

Re: How to use VolcanoPlanner

2016-10-27 Thread Γιώργος Θεοδωράκης
lativeCost(RelMetadataQuery.java:258)
at org.apache.calcite.plan.volcano.VolcanoPlanner.getCost(VolcanoPlanner.java:1128)
at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:336)
at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(RelSubset.java:319)
at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:348)
at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(RelSubset.java:319)
at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:348)
at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(RelSubset.java:319)
at org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1830)
at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1766)
at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:1032)
at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1052)
at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1942)
at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:136)
... 7 more

I define the Statistics in the tables I use like this:
public Statistic getStatistic() {
  int rowCount = rows.size();
  return Statistics.of(rowCount, null); // add List
}
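
The change that eventually fixed this (see the follow-up higher up in this thread) was to pass an empty key list instead of null. A minimal sketch, assuming the table keeps its data in an in-memory list called rows:

import com.google.common.collect.ImmutableList;
import org.apache.calcite.schema.Statistic;
import org.apache.calcite.schema.Statistics;

  // Inside the Table implementation; "rows" is the in-memory data.
  @Override public Statistic getStatistic() {
    // An empty key list instead of null avoids the failure seen above while
    // the VolcanoPlanner is computing costs.
    return Statistics.of(rows.size(), ImmutableList.of());
  }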


Thanks,
George

2016-10-16 7:28 GMT+03:00 Jungtaek Lim <kabh...@gmail.com>:

> Hi George,
>
> This patch is a ported version (with small fixes) of Milinda's samza-sql
> implementation for Storm SQL:
> https://github.com/apache/storm/pull/1736
>
> In this patch I removed the HepPlanner and just rely on the VolcanoPlanner
> (so the patch may be closer to what you want).
> For now I also removed the code regarding metadata, since I'm not clear on
> how it works and what it helps with, but I'll re-address it once I can find
> its usage and benefits.
>
> Hope this helps.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Tue, Oct 4, 2016 at 7:08 PM, Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>
> wrote:
>
> > I think I did as you said:
> >
> > https://github.com/giwrgostheod/Calcite-Saber/blob/master/src/main/java/
> calcite/VolcanoTester.java
> >
> > and I get for every query I use:
> > Exception in thread "main"
> > org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node
> > [rel#10:Subset#2.NONE.[]] could not be implemented; planner state:
> > Root: rel#10:Subset#2.NONE.[]
> > Original rel:
> > 
> > at
> >
> > org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(
> RelSubset.java:443)
> > at
> >
> > org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.
> java:293)
> > at
> >
> > org.apache.calcite.plan.volcano.VolcanoPlanner.
> findBestExp(VolcanoPlanner.java:835)
> > at org.apache.calcite.tools.Programs$RuleSetProgram.run(
> Programs.java:334)
> > at org.apache.calcite.prepare.PlannerImpl.transform(
> PlannerImpl.java:308)
> > at calcite.VolcanoTester.main(VolcanoTester.java:77)
> >
> > My table is defined here:
> >
> > https://github.com/giwrgostheod/Calcite-Saber/blob/master/src/main/java/
> calcite/utils/OrdersTableFactory.java
> >
> >
> > Thank you for your time,
> > George
> >
> >
> > 2016-10-04 2:38 GMT+03:00 Jordan Halterman <jordan.halter...@gmail.com>:
> >
> > > The link you provided is a pretty good example. Build a FrameworkConfig
> > > with your schema, parser config, and other information, and use it to
> > > create a Planner. That Planner uses a VolcanoPlanner internally. What's
> > > missing from that particular example is just the addition of programs.
> > > Programs are effectively sets of rules you will use to optimize your
> > query.
> > > So, to add your FilterProjectTransposeRule to the planner, call this
> when
> > > building your FrameworkConfig:
> > >
> > > .programs(Programs.ofRules(FilterProjectTransposeRule.INSTANCE))
> > >
> > > That adds your program(s) to the set of programs in the planner, and
> > those
> > > programs can be accessed to optimize the query. Use the planner to
> > parse()
> > > your query, validate() your query, and then convert() your query into a
> > > logical plan. Then call...
> > >
> > > RelTraitSet traitSet = planner.emptyTraitSet().
> replace(Convention.NONE);
> > > planner.transform(0, traitSet, logicalPlan);
> > >
> > > to apply the rules you added to the configuration. That should use the
> > > VolcanoPlanner to apply the rules you added in your Program.

Re: Streaming Queries Optimization

2016-10-21 Thread Γιώργος Θεοδωράκης
Sorry for not being precise when I defined my problem (the subject line must
be misleading). The optimization on joins is for regular relational joins, not
streaming, and I can't find a combination of rules to achieve it. The number
of joins would be small (under 8), if that helps.

As for my second question, I wish to change the cost model you are using in
Calcite to something like this:
Rate-Based Query Optimization for Streaming Information Sources (
http://www-db.cs.wisc.edu/niagara/papers/rates_crc.pdf),
which is rate-based cost optimization. Every operator has its own constant
F(operator) that defines the output rate of tuples. For example, if we have a
filter and a project there would be two plans:

a) Project -> Filter : rin * F(project) = rout(project), then rout(project) * F(filter) = rout(final)
b) Filter -> Project : rin * F(filter) = rout(filter), then rout(filter) * F(project) = rout(final)

where rin = the rate of tuples of the input stream and rout = the rate of the
output tuples, and we choose the plan with the higher rout(final). So, my
question is whether I can use the predefined logical operators of Calcite,
change their cost, and, instead of trying to find the minimum cost (as it is
now), try to find the maximum output rate.
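
To make this concrete, one way I could imagine plugging the rate in is through computeSelfCost() on custom logical operators, encoding "maximize rout" as "minimize 1/rout" so that the planner's usual minimization still applies. A rough sketch only (RateFilter and F_FILTER are made-up names and values, and this is not a faithful implementation of the paper's model):

import org.apache.calcite.plan.RelOptCluster;
import org.apache.calcite.plan.RelOptCost;
import org.apache.calcite.plan.RelOptPlanner;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.Filter;
import org.apache.calcite.rel.metadata.RelMetadataQuery;
import org.apache.calcite.rex.RexNode;

public class RateFilter extends Filter {
  private static final double F_FILTER = 0.2;   // assumed constant F(filter)

  protected RateFilter(RelOptCluster cluster, RelTraitSet traits,
      RelNode child, RexNode condition) {
    super(cluster, traits, child, condition);
  }

  @Override public Filter copy(RelTraitSet traitSet, RelNode input, RexNode condition) {
    return new RateFilter(getCluster(), traitSet, input, condition);
  }

  @Override public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery mq) {
    double rin = mq.getRowCount(getInput());   // stand-in for the input rate
    double rout = rin * F_FILTER;              // rout = rin * F(filter)
    // The planner always minimizes, so report 1/rout as the cost; the cheapest
    // plan is then exactly the plan with the highest output rate.
    return planner.getCostFactory().makeCost(1.0 / rout, 0, 0);
  }
}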

Thank you in advance,
George


2016-10-21 19:32 GMT+03:00 Julian Hyde <jh...@apache.org>:

> I suspect that streaming join requires a plan of a very different
> shape than a regular relational join. If you're joining a stream to a
> table, and the table is small, then you can use a map join (aka a hash
> join with a small "build" side), so that's well understood from the
> world of database query optimization. But joining two streams is very
> different to joining two tables: for one thing, it is a union:
>
>  stream1 join history-of-stream2
>  union
>  history-of-stream1 join stream2
>
> And second, the size of the maps might be different than the "size" of
> the streams. If they're smaller than memory then virtually any plan
> will be OK.
>
> So where to start is easy: Figure out what physical plan you want the
> planner to create. Then work backwards and figure out a cost model
> whereby that plan is better than the other alternatives, and write
> transformation rules that can validly create that physical plan from
> your logical plan.
>
> Julian
>
>
> On Fri, Oct 21, 2016 at 8:35 AM, Γιώργος Θεοδωράκης
> <giwrgosrth...@gmail.com> wrote:
> > Hi,
> >
> > I have two questions:
> >
> > 1)When trying to optimize the join order of a query what rules should I
> > use? For example I have this query:
> > "select s.orders.productid  "
> > + "from  s.products, s.customers,s.orders "
> > + "where s.orders.productid = s.products.productid and "
> > + "s.customers.customerid=s.orders.
> > customerid  "
> > with these sizes => orders [15 rows], products [5 rows], customers [10
> > rows]. I am using the HepPlanner with these rules for joins:
> > this.hepPlanner.addRule(JoinToMultiJoinRule.INSTANCE);
> > this.hepPlanner.addRule(LoptOptimizeJoinRule.INSTANCE);
> > this.hepPlanner.addRule(MultiJoinOptimizeBushyRule.INSTANCE);
> > this.hepPlanner.addRule(JoinPushThroughJoinRule.LEFT);
> >
> > but nothing happens, and the final logical plan changes according to the
> > order I define the tables after FROM in the query. Is there something I
> am
> > missing? What would the LoptOptimizeJoinRule do?
> >
> > 2)Is it possible to change the cost model of project,filter,aggregate and
> > join to a rate-based, from which I want to get the maximum rate instead
> of
> > the minimum cost for optimization? Should I create new rules or override
> > the old ones? Any hints on where to start?
> >
> > 2016-10-10 15:33 GMT+03:00 Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>:
> >
> >> Hello,
> >>
> >> I am trying to optimize the logical/physical plan of a given streaming
> >> query with Calcite and execute it in a separate engine. So far, I am
> using
> >> heuristic planner and some cost-based push-down rules and get a
> >> "relational" optimization on the plan. By relational, I mean that this
> is
> >> basic optimization that I would get if my query was executed in a
> >> relational database and wasn't a stream. As a result I am not optimizing
> >> the query with streaming criteria at all.
> >>
> >> Can someone give any suggestions on further optimization on streaming
> >> queries? Is there anything more to do using Calcite, or the optimization
> >> ends by using the built in rules? Finally, any related work would be
> >> welcome.
> >>
> >> Thanks in advance,
> >> George.
> >>
>


Re: Window Semantics for Streams

2016-10-18 Thread Γιώργος Θεοδωράκης
Also, this query in Calcite:

select * from (
  select s.orders.productid,
    avg(units) OVER (product ROWS BETWEEN 10 PRECEDING and 5 FOLLOWING) as m10,
    AVG(units) OVER (product RANGE INTERVAL '7' DAY PRECEDING) AS d7
  from s.orders
  WINDOW product AS (PARTITION BY productId))
where m10 > d7

gives me, after optimization, the following plan, which doesn't have any
window boundaries:

LogicalFilter(condition=[>($1, $2)])
  LogicalProject(productid=[$0], m10=[CAST(/($1, $2)):INTEGER NOT NULL], d7=[CAST(/($3, $4)):INTEGER NOT NULL])
    LogicalProject(productid=[$0], $1=[$2], $2=[$3], $3=[$4], $4=[$5])
      LogicalWindow(window#0=[window(partition {0} order by [] rows between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1), COUNT($1)])], window#1=[window(partition {0} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1), COUNT($1)])])
        LogicalProject(productid=[$1], units=[$2])
          LogicalTableScan(table=[[s, orders]])

2016-10-18 20:23 GMT+03:00 Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>:

> Hi,
>
> I was wondering if there is any possible way to define windows with SQL in
> Calcite for queries that don't have an aggregate function? For example, I
> want to define the queries from Linear Road Benchmark of the STREAM project
> (http://infolab.stanford.edu/stream/cql-benchmark.html):
>
> 1)
> SELECT DISTINCT car_id
> FROM CarSegStr [RANGE 30 SECONDS];
>
> 2)
> SELECT car_id, exp_way, dir, seg
> FROM CarSegStr [PARTITION BY car_id ROWS 1], CurActiveCars
> WHERE CarSegStr.car_id = CurActiveCars.car_id;
>
> Thank you in advance,
> George.
>


Window Semantics for Streams

2016-10-18 Thread Γιώργος Θεοδωράκης
Hi,

I was wondering if there is any possible way to define windows with SQL in
Calcite for queries that don't have an aggregate function? For example, I
want to define the queries from Linear Road Benchmark of the STREAM project
(http://infolab.stanford.edu/stream/cql-benchmark.html):

1)
SELECT DISTINCT car_id
FROM CarSegStr [RANGE 30 SECONDS];

2)
SELECT car_id, exp_way, dir, seg
FROM CarSegStr [PARTITION BY car_id ROWS 1], CurActiveCars
WHERE CarSegStr.car_id = CurActiveCars.car_id;

Thank you in advance,
George.


Streaming Queries Optimization

2016-10-10 Thread Γιώργος Θεοδωράκης
Hello,

I am trying to optimize the logical/physical plan of a given streaming
query with Calcite and execute it in a separate engine. So far, I am using
the heuristic planner and some cost-based push-down rules and get a
"relational" optimization of the plan. By relational, I mean the basic
optimization that I would get if my query were executed in a relational
database and weren't a stream. As a result, I am not optimizing the query
with streaming criteria at all.

Can someone give any suggestions on further optimization of streaming
queries? Is there anything more to do using Calcite, or does the optimization
end with the built-in rules? Finally, any pointers to related work would be
welcome.

Thanks in advance,
George.


Re: How to use VolcanoPlanner

2016-10-04 Thread Γιώργος Θεοδωράκης
I think I did as you said:
https://github.com/giwrgostheod/Calcite-Saber/blob/master/src/main/java/calcite/VolcanoTester.java

and I get for every query I use:
Exception in thread "main" org.apache.calcite.plan.RelOptPlanner$CannotPlanException:
Node [rel#10:Subset#2.NONE.[]] could not be implemented; planner state:
Root: rel#10:Subset#2.NONE.[]
Original rel:

at org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:443)
at org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:293)
at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:835)
at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:334)
at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:308)
at calcite.VolcanoTester.main(VolcanoTester.java:77)

My table is defined here:
https://github.com/giwrgostheod/Calcite-Saber/blob/master/src/main/java/calcite/utils/OrdersTableFactory.java
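
The planner part of it looks roughly like this (a simplified sketch; the single rule is only an example):

import org.apache.calcite.plan.Convention;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.FilterProjectTransposeRule;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;
import org.apache.calcite.tools.Programs;

public class VolcanoTesterSketch {
  static RelNode optimize(SchemaPlus schema, String sql) throws Exception {
    FrameworkConfig config = Frameworks.newConfigBuilder()
        .defaultSchema(schema)
        .programs(Programs.ofRules(FilterProjectTransposeRule.INSTANCE))
        .build();
    Planner planner = Frameworks.getPlanner(config);
    SqlNode parsed = planner.parse(sql);
    SqlNode validated = planner.validate(parsed);
    RelNode logical = planner.convert(validated);
    // Program 0 is the rule set registered above; Convention.NONE asks for a
    // logical result (a physical convention would ask for an implemented plan).
    RelTraitSet traits = planner.getEmptyTraitSet().replace(Convention.NONE);
    return planner.transform(0, traits, logical);
  }
}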


Thank you for your time,
George


2016-10-04 2:38 GMT+03:00 Jordan Halterman <jordan.halter...@gmail.com>:

> The link you provided is a pretty good example. Build a FrameworkConfig
> with your schema, parser config, and other information, and use it to
> create a Planner. That Planner uses a VolcanoPlanner internally. What's
> missing from that particular example is just the addition of programs.
> Programs are effectively sets of rules you will use to optimize your query.
> So, to add your FilterProjectTransposeRule to the planner, call this when
> building your FrameworkConfig:
>
> .programs(Programs.ofRules(FilterProjectTransposeRule.INSTANCE))
>
> That adds your program(s) to the set of programs in the planner, and those
> programs can be accessed to optimize the query. Use the planner to parse()
> your query, validate() your query, and then convert() your query into a
> logical plan. Then call...
>
> RelTraitSet traitSet = planner.emptyTraitSet().replace(Convention.NONE);
> planner.transform(0, traitSet, logicalPlan);
>
> to apply the rules you added to the configuration. That should use the
> VolcanoPlanner to apply the rules you added in your Program. The trait set
> that's passed to that method is the required output trait set. So, if you
> wanted to convert the logical plan into some physical convention, you'd
> pass your physical convention instead of Convention.NONE. I can respond
> with a full example if you need it in a bit. I just don't have the capacity
> to write it ATM.
>
> On Mon, Oct 3, 2016 at 8:51 AM, Γιώργος Θεοδωράκης <
> giwrgosrth...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I want to parse an Sql query and transform it to an optimized relational
> > plan (not convert it to physical !!) using calcite rules based on my
> > database schema and metadata. Right now, the only helpful example I have
> > found for my purpose is taken from
> > https://github.com/milinda/samza-sql/blob/master/samza-
> > sql-planner/src/main/java/org/apache/samza/sql/planner/QueryPlanner.java
> > ,
> > in which a simple Planner is used for parsing and validating Sql and a
> > HepPlanner is used for searching for an optimized plan based on imported
> > rules.
> >
> > Is there any way to use in my case the VolcanoPlanner? The only examples
> I
> > have seen so far from the test classes suggest that it should be used for
> > converting relational expressions to physical ones. How can I make the
> > Volcano Planner "see" my SchemaPlus schema ,when I can only define
> > RelOptSchema? Can someone provide me with a complete example of using
> > Volcano Planner and adding rules, such
> > as FilterProjectTransposeRule.INSTANCE?
> >
> > Thanks in advance,
> > George
> >
>


How to use VolcanoPlanner

2016-10-03 Thread Γιώργος Θεοδωράκης
Hi,

I want to parse an SQL query and transform it to an optimized relational
plan (not convert it to a physical one!) using Calcite rules based on my
database schema and metadata. Right now, the only helpful example I have
found for my purpose is
https://github.com/milinda/samza-sql/blob/master/samza-sql-planner/src/main/java/org/apache/samza/sql/planner/QueryPlanner.java,
in which a simple Planner is used for parsing and validating SQL and a
HepPlanner is used to search for an optimized plan based on imported
rules.

Is there any way to use the VolcanoPlanner in my case? The only examples I
have seen so far, from the test classes, suggest that it should be used for
converting relational expressions to physical ones. How can I make the
VolcanoPlanner "see" my SchemaPlus schema, when I can only define a
RelOptSchema? Can someone provide me with a complete example of using the
VolcanoPlanner and adding rules, such as FilterProjectTransposeRule.INSTANCE?

Thanks in advance,
George


Re: Using Metadata in Query Optimization

2016-09-25 Thread Γιώργος Θεοδωράκης
I believe it has to do with the implementation of my tables, as I get fixed
numbers:
1)select * from products => Rows:100
2)select * from products where productid > 5 => Rows:50
3)select * from products where productid = 5 => Rows:25 (and the exact same
numbers for the table orders)

How can I define a table that gives back correct metadata for the rows?
Right now my tables implement ScannableTable.
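
Something like the following sketch is what I have in mind: a ScannableTable whose getStatistic() reports the real row count (the class name and columns here are just placeholders):

import com.google.common.collect.ImmutableList;
import org.apache.calcite.DataContext;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.ScannableTable;
import org.apache.calcite.schema.Statistic;
import org.apache.calcite.schema.Statistics;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.sql.type.SqlTypeName;

import java.util.List;

public class InMemoryOrdersTable extends AbstractTable implements ScannableTable {
  private final List<Object[]> rows;   // the actual data

  public InMemoryOrdersTable(List<Object[]> rows) {
    this.rows = rows;
  }

  @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
    return typeFactory.builder()
        .add("productid", SqlTypeName.INTEGER)
        .add("units", SqlTypeName.INTEGER)
        .build();
  }

  @Override public Enumerable<Object[]> scan(DataContext root) {
    return Linq4j.asEnumerable(rows);
  }

  // Report the real row count so that RelMetadataQuery.getRowCount() and the
  // costs derived from it reflect the actual data instead of the defaults.
  @Override public Statistic getStatistic() {
    return Statistics.of(rows.size(), ImmutableList.of());
  }
}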



2016-09-24 21:15 GMT+03:00 Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>:

> Hello,
>
> I am using a HepPlanner for query optimization on logical operators. When
> I run the optimizations, I get an optimized plan according to the rules I
> have used but wrong metadata results. My code is :
>
> SqlNode sqlNode = planner.parse(query);
> SqlNode validatedSqlNode = planner.validate(sqlNode);
> final RelNode convertedNode = planner.convert(validatedSqlNode);
> final RelMetadataProvider provider = convertedNode.getCluster().
> getMetadataProvider();
>
> final List<RelMetadataProvider> list = Lists.newArrayList(provider);
> hepPlanner.registerMetadataProviders(list);
> final RelMetadataProvider cachingMetaDataProvider = new
> CachingRelMetadataProvider(ChainedRelMetadataProvider.of(list),
> hepPlanner);
> convertedNode.accept(new MetaDataProviderModifier(
> cachingMetaDataProvider));
>
> hepPlanner.setRoot(convertedNode);
> RelNode rel = hepPlanner.findBestExp(); // when I print the logical plan
> it is correct
> rel.accept(new MetaDataProviderModifier(provider));
>
> final RelMetadataQuery mq = RelMetadataQuery.instance();
> RelOptCost relCost= mq.getCumulativeCost(rel);
> System.out.println("Plan cost is : " + relCost.toString()); //when I print
> the cost I get //a fixed number that has nothing to do with the actual size
> of my tables
>
>
> I always get the same numbers from cumulative cost, even if I change the
> contents of my tables for the same query. However, the numbers change
> before and after optimizing the query, but are still very large. What am I
> doing wrong?
>
> Thank you in advance,
>  George
>


Using Metadata in Query Optimization

2016-09-24 Thread Γιώργος Θεοδωράκης
Hello,

I am using a HepPlanner for query optimization on logical operators. When I
run the optimizations, I get an optimized plan according to the rules I
have used but wrong metadata results. My code is :

SqlNode sqlNode = planner.parse(query);
SqlNode validatedSqlNode = planner.validate(sqlNode);
final RelNode convertedNode = planner.convert(validatedSqlNode);
final RelMetadataProvider provider =
convertedNode.getCluster().getMetadataProvider();

final List<RelMetadataProvider> list = Lists.newArrayList(provider);
hepPlanner.registerMetadataProviders(list);
final RelMetadataProvider cachingMetaDataProvider = new
CachingRelMetadataProvider(ChainedRelMetadataProvider.of(list), hepPlanner);
convertedNode.accept(new MetaDataProviderModifier(cachingMetaDataProvider));

hepPlanner.setRoot(convertedNode);
RelNode rel = hepPlanner.findBestExp(); // when I print the logical plan it
is correct
rel.accept(new MetaDataProviderModifier(provider));

final RelMetadataQuery mq = RelMetadataQuery.instance();
RelOptCost relCost= mq.getCumulativeCost(rel);
System.out.println("Plan cost is : " + relCost.toString()); //when I print
the cost I get //a fixed number that has nothing to do with the actual size
of my tables


I always get the same numbers from cumulative cost, even if I change the
contents of my tables for the same query. However, the numbers change
before and after optimizing the query, but are still very large. What am I
doing wrong?

Thank you in advance,
 George


Re: Query optimization by using Rules

2016-09-17 Thread Γιώργος Θεοδωράκης
Thank you for the quick response Julian,

I am interested mainly in logical transformation rules. I am trying to create
an optimized logical plan and then transform it into a physical one in my
engine. The engine is a streaming engine that uses ByteBuffers, and I am
wondering whether it is possible to optimize simple queries with the following
procedure:

1) create a Calcite schema that my engine uses, without using all the input
data. My Calcite schema would have only dummy data that are a small
percentage of the real data!! =>
2) validate the query and enforce logical rules to improve the logical plan =>
3) transform this logical plan to a physical one in my engine

As I have seen in some of your examples, the optimizations use metadata.
However, my engine is simplified in terms of operators, as it has only one
join and some of the key operators (aggregate, filter, project, expressions).
I am not very experienced in query-optimization logic, and my question is: if
I use dummy data, would my results be wrong because of false metadata? Should
I create an adapter like the CSV one for the ByteBuffers that receive the
streaming data (if that is possible)?
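
For example, I was thinking of keeping the dummy rows in the table but reporting the statistics I expect from the real stream, so that the logical rules see realistic metadata even though the data is fake. A rough sketch (the number and names are placeholders):

import com.google.common.collect.ImmutableList;
import org.apache.calcite.schema.Statistic;
import org.apache.calcite.schema.Statistics;

  // Inside the dummy table implementation ("rows" holds only the dummy data).
  @Override public Statistic getStatistic() {
    double expectedRows = 1_000_000d; // assumed cardinality of the real stream, not rows.size()
    return Statistics.of(expectedRows, ImmutableList.of());
  }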

Thanks for your time,
George

2016-09-17 19:27 GMT+03:00 Julian Hyde <jh...@apache.org>:

> The form of your test data depends on the kind of rules you are writing.
> If you are aiming to push down a lot of operations down to a particular
> engine (e.g. Druid can handle everything except Join) then you should run
> on that engine. If you are interested mainly in logical transformation
> rules then the capabilities of the engine are less important and you could
> run on say the Csv adapter. For convenience & familiarity I use JDBC_SCOTT,
> and if I want a larger data set with a rich model I use JDBC_FOODMART.
>
> Lastly, if you are writing unit tests for correctness of the rules, and
> don’t want to execute queries, create a sub-class of RelOptRulesTest.
>
> Julian
>
>
> > On Sep 17, 2016, at 8:08 AM, Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > I am trying to create a basic planner that enforces rules on simple
> > queries. At the moment I have created a planner from the examples (and
> > samza-sql integration I found online) and used HepPlanner for testing
> some
> > rules. My question is which form should my test data be? I am using
> > something like JdbcTest.HrSchema()  right now and I am wondering if I
> > should create Tables that implement Scannable and Filterable in order to
> > implement optimizations.
> >
> > Thanks,
> > George
>
>


Query optimization by using Rules

2016-09-17 Thread Γιώργος Θεοδωράκης
Hi,

I am trying to create a basic planner that enforces rules on simple
queries. At the moment I have created a planner from the examples (and the
samza-sql integration I found online) and used a HepPlanner for testing some
rules. My question is: what form should my test data take? I am using
something like JdbcTest.HrSchema() right now, and I am wondering if I
should create Tables that implement ScannableTable and FilterableTable in
order to implement the optimizations.
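
Right now the rule-testing part looks roughly like this (a simplified sketch with a single rule; convertedNode is what planner.convert(validatedSqlNode) returns):

import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgram;
import org.apache.calcite.plan.hep.HepProgramBuilder;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.FilterProjectTransposeRule;

public class RuleTestSketch {
  static RelNode optimize(RelNode convertedNode) {
    HepProgram program = new HepProgramBuilder()
        .addRuleInstance(FilterProjectTransposeRule.INSTANCE)
        .build();
    HepPlanner hepPlanner = new HepPlanner(program);
    hepPlanner.setRoot(convertedNode);
    return hepPlanner.findBestExp();
  }
}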

Thanks,
George


Re: Simple Streaming Scenario

2016-09-10 Thread Γιώργος Θεοδωράκης
After many tries, in order to build a streaming query application on CSV in
Eclipse I used:
As Maven dependencies:
- commons-io 2.4, commons-logging 1.1.3, commons-lang3 3.2, janino 2.7.6,
  eigenbase-properties 1.1.5, avatica 1.8.0, opencsv 2.3, json-simple 1.1

and as external jars:
- calcite-core 1.9.0, calcite-example-csv 1.9.0, calcite-linq4j 1.9.0, after
  creating them with the mvn install command.

2016-09-06 21:15 GMT+03:00 Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>:

> I've tried with avatica, avatica-metrics, standalone-server and server,
> all in version 1.8.0 jars from maven repository as dependencies(at first
> only with avatica and avatica-metrics and then with other combinations) and
> 1.9.0-SNAPSHOT versions of calcite core and csv with their sources. At
> first I got an eigenbase-properties NoClass Exception, so I added version
> 1.1.5 from maven repositories. Then, I got java.lang.NoClassDefFoundError:
> org/codehaus/commons/compiler/CompileException , and I am stuck with
> these exception after many combinations and tries.
>
> 2016-09-06 18:46 GMT+03:00 Julian Hyde <jhyde.apa...@gmail.com>:
>
>> That still sounds like a version mismatch. Note that avatica releases are
>> separate from calcite these days. Therefore calcite-core-1.9.0-SNAPSHOT,
>> for instance, depends upon avatica-metrics-1.8.0.
>>
>> I think you should use 1.9.0-SNAPSHOT versions of calcite- jars, and
>> 1.8.0 versions of avatica- jars.
>>
>> Julian
>>
>>
>> > On Sep 6, 2016, at 7:45 AM, Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>
>> wrote:
>> >
>> > I have imported as external jars calcite-example-csv, calcite-core,
>> > avatica, linq4j, avatica-metrics, avatica-standalone with all their
>> > sources and tests, and I get: Exception in thread "main"
>> > java.lang.AbstractMethodError:
>> > org.apache.calcite.config.CalciteConnectionProperty.valueClass()Ljava/lang/Class;
>> > at org.apache.calcite.avatica.ConnectionConfigImpl$PropEnv.getEnum(ConnectionConfigImpl.java:228).
>> > Is there something else I am missing.
>> >
>> > Also, when I ran the streaming test from CsvTest.java and I add a
>> filter in
>> > the query, I get an error. Is filter implemented for streaming query by
>> > defining the query with a string, or should I define with another way?
>> >
>> >
>> > 2016-09-05 21:03 GMT+03:00 Julian Hyde <jh...@apache.org>:
>> >
>> >> You have mismatched versions. CANCEL_FLAG will be added to calcite-core
>> >> only in version 1.9. So, if you are using 1.9.0-SNAPSHOT version of
>> >> example/csv you should use a 1.9.0-SNAPSHOT version of other Calcite
>> jars.
>> >> (You can build and install in your local repo using ‘mvn install’.)
>> >>
>> >> 1.9 should be released in a couple of weeks.
>> >>
>> >> Julian
>> >>
>> >>> On Sep 5, 2016, at 3:04 AM, Γιώργος Θεοδωράκης <
>> giwrgosrth...@gmail.com>
>> >> wrote:
>> >>>
>> >>> It has to do with the dependencies. When I am running the sample code
>> >> with
>> >>> changes as Test in
>> >>> ../calcite-master/example/csv/src/test/java/org/apache/calcite/test/
>> I
>> >> have
>> >>> no errors. However, when I try to create my own project in Eclipse and
>> >>> after I  have imported everything with maven repositories (from
>> calcite
>> >> 1.8
>> >>> version and as external .jar the calcite-example-csv-1.9.0-SNAP
>> SHOT.jar
>> >>> from the github version) the same error occurs :
>> >>>
>> >>> Exception in thread "main" java.lang.NoSuchFieldError: CANCEL_FLAG
>> >>> at
>> >>> org.apache.calcite.adapter.csv.CsvScannableTable.scan(
>> >> CsvScannableTable.java:48)
>> >>> at
>> >>> org.apache.calcite.interpreter.TableScanNode.
>> >> createScannable(TableScanNode.java:117)
>> >>> at
>> >>> org.apache.calcite.interpreter.TableScanNode.
>> >> create(TableScanNode.java:94)
>> >>> at org.apache.calcite.interpreter.Nodes$CoreCompiler.visit(
>> Nodes.java:
>> >> 68)
>> >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >>> at
>> >>> sun.reflect.NativeMethodAccessorImpl.invoke(
>> >> NativeMethodAccessorImpl.java:57)
&

Import Calcite as a maven project in Eclipse

2016-09-10 Thread Γιώργος Θεοδωράκης
Hello, I am trying to import the latest version of Calcite into Eclipse. I have
downloaded the source code from GitHub as a zip, used these commands:
$ mvn install
$ mvn eclipse:eclipse
and finally imported the project as an existing Maven project. However, I
get many errors (in core's pom.xml, classes missing) and I don't know what
to do. Any suggestions?


Re: Simple Streaming Scenario

2016-09-06 Thread Γιώργος Θεοδωράκης
I have imported as external jars calcite-example-csv, calcite-core, avatica,
linq4j, avatica-metrics, avatica-standalone with all their sources and
tests, and I get: Exception in thread "main" java.lang.AbstractMethodError:
org.apache.calcite.config.CalciteConnectionProperty.valueClass()Ljava/lang/Class;
at org.apache.calcite.avatica.ConnectionConfigImpl$PropEnv.getEnum(ConnectionConfigImpl.java:228).
Is there something else I am missing?

Also, when I run the streaming test from CsvTest.java and add a filter to
the query, I get an error. Is filtering implemented for streaming queries
defined with a query string, or should I define it in another way?


2016-09-05 21:03 GMT+03:00 Julian Hyde <jh...@apache.org>:

> You have mismatched versions. CANCEL_FLAG will be added to calcite-core
> only in version 1.9. So, if you are using 1.9.0-SNAPSHOT version of
> example/csv you should use a 1.9.0-SNAPSHOT version of other Calcite jars.
> (You can build and install in your local repo using ‘mvn install’.)
>
> 1.9 should be released in a couple of weeks.
>
> Julian
>
> > On Sep 5, 2016, at 3:04 AM, Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>
> wrote:
> >
> > It has to do with the dependencies. When I am running the sample code
> with
> > changes as Test in
> > ../calcite-master/example/csv/src/test/java/org/apache/calcite/test/ I
> have
> > no errors. However, when I try to create my own project in Eclipse and
> > after I  have imported everything with maven repositories (from calcite
> 1.8
> > version and as external .jar the calcite-example-csv-1.9.0-SNAPSHOT.jar
> > from the github version) the same error occurs :
> >
> > Exception in thread "main" java.lang.NoSuchFieldError: CANCEL_FLAG
> > at
> > org.apache.calcite.adapter.csv.CsvScannableTable.scan(
> CsvScannableTable.java:48)
> > at
> > org.apache.calcite.interpreter.TableScanNode.
> createScannable(TableScanNode.java:117)
> > at
> > org.apache.calcite.interpreter.TableScanNode.
> create(TableScanNode.java:94)
> > at org.apache.calcite.interpreter.Nodes$CoreCompiler.visit(Nodes.java:
> 68)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at
> > sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
> > at
> > sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at
> > org.apache.calcite.util.ReflectUtil.invokeVisitorInternal(
> ReflectUtil.java:257)
> > at org.apache.calcite.util.ReflectUtil.invokeVisitor(
> ReflectUtil.java:214)
> > at org.apache.calcite.util.ReflectUtil$1.invokeVisitor(
> ReflectUtil.java:471)
> > at
> > org.apache.calcite.interpreter.Interpreter$Compiler.visit(Interpreter.
> java:476)
> > at
> > org.apache.calcite.interpreter.Interpreter$Compiler.visitRoot(
> Interpreter.java:433)
> > at org.apache.calcite.interpreter.Interpreter.
> (Interpreter.java:75)
> > at Baz.bind(Unknown Source)
> > at
> > org.apache.calcite.jdbc.CalcitePrepare$CalciteSignature.enumerable(
> CalcitePrepare.java:327)
> > at
> > org.apache.calcite.jdbc.CalciteConnectionImpl.enumerable(
> CalciteConnectionImpl.java:282)
> > at
> > org.apache.calcite.jdbc.CalciteMetaImpl._createIterable(
> CalciteMetaImpl.java:553)
> > at
> > org.apache.calcite.jdbc.CalciteMetaImpl.createIterable(
> CalciteMetaImpl.java:544)
> > at
> > org.apache.calcite.avatica.AvaticaResultSet.execute(
> AvaticaResultSet.java:187)
> > at
> > org.apache.calcite.jdbc.CalciteResultSet.execute(
> CalciteResultSet.java:65)
> > at
> > org.apache.calcite.jdbc.CalciteResultSet.execute(
> CalciteResultSet.java:44)
> > at
> > org.apache.calcite.avatica.AvaticaConnection$1.execute(
> AvaticaConnection.java:605)
> > at
> > org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(
> CalciteMetaImpl.java:599)
> > at
> > org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(
> AvaticaConnection.java:613)
> > at
> > org.apache.calcite.avatica.AvaticaStatement.executeInternal(
> AvaticaStatement.java:139)
> > at
> > org.apache.calcite.avatica.AvaticaStatement.executeQuery(
> AvaticaStatement.java:208)
> > at stream_test.CsvTest.checkSql(CsvTest.java:122)
> > at stream_test.CsvTest.checkSql(CsvTest.java:76)
> > at stream_test.TestQuery.main(TestQuery.java:13)
> >
> > Is there anything else I have to import or have I done something wrong?
> >
> > 2016-09-05 1:11 GMT+03:00 Julian Hyde <jhyde.apa...@gmail.com>:
>

Re: Simple Streaming Scenario

2016-09-05 Thread Γιώργος Θεοδωράκης
It has to do with the dependencies. When I run the sample code with my
changes as a Test in
../calcite-master/example/csv/src/test/java/org/apache/calcite/test/ I have
no errors. However, when I try to create my own project in Eclipse, after
I have imported everything from Maven repositories (version 1.8 of Calcite,
plus calcite-example-csv-1.9.0-SNAPSHOT.jar from the GitHub version as an
external jar), the same error occurs:

Exception in thread "main" java.lang.NoSuchFieldError: CANCEL_FLAG
at org.apache.calcite.adapter.csv.CsvScannableTable.scan(CsvScannableTable.java:48)
at org.apache.calcite.interpreter.TableScanNode.createScannable(TableScanNode.java:117)
at org.apache.calcite.interpreter.TableScanNode.create(TableScanNode.java:94)
at org.apache.calcite.interpreter.Nodes$CoreCompiler.visit(Nodes.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.calcite.util.ReflectUtil.invokeVisitorInternal(ReflectUtil.java:257)
at org.apache.calcite.util.ReflectUtil.invokeVisitor(ReflectUtil.java:214)
at org.apache.calcite.util.ReflectUtil$1.invokeVisitor(ReflectUtil.java:471)
at org.apache.calcite.interpreter.Interpreter$Compiler.visit(Interpreter.java:476)
at org.apache.calcite.interpreter.Interpreter$Compiler.visitRoot(Interpreter.java:433)
at org.apache.calcite.interpreter.Interpreter.<init>(Interpreter.java:75)
at Baz.bind(Unknown Source)
at org.apache.calcite.jdbc.CalcitePrepare$CalciteSignature.enumerable(CalcitePrepare.java:327)
at org.apache.calcite.jdbc.CalciteConnectionImpl.enumerable(CalciteConnectionImpl.java:282)
at org.apache.calcite.jdbc.CalciteMetaImpl._createIterable(CalciteMetaImpl.java:553)
at org.apache.calcite.jdbc.CalciteMetaImpl.createIterable(CalciteMetaImpl.java:544)
at org.apache.calcite.avatica.AvaticaResultSet.execute(AvaticaResultSet.java:187)
at org.apache.calcite.jdbc.CalciteResultSet.execute(CalciteResultSet.java:65)
at org.apache.calcite.jdbc.CalciteResultSet.execute(CalciteResultSet.java:44)
at org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:605)
at org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:599)
at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:613)
at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:139)
at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:208)
at stream_test.CsvTest.checkSql(CsvTest.java:122)
at stream_test.CsvTest.checkSql(CsvTest.java:76)
at stream_test.TestQuery.main(TestQuery.java:13)

Is there anything else I have to import or have I done something wrong?

2016-09-05 1:11 GMT+03:00 Julian Hyde <jhyde.apa...@gmail.com>:

> It might be case-sensitivity. Try double-quoting the column names in your
> query.
>
> Julian
>
> > On Sep 4, 2016, at 09:43, Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>
> wrote:
> >
> > I have correctly used sqlline to run queries on a streaming table, but
> now
> > I face problems trying to implement it programmatically with java. I have
> > made an attempt, but haven't got it running yet (
> > http://stackoverflow.com/questions/39318653/create-a-
> streaming-example-with-calcite-using-csv
> > ).
> > Can somebody help me by giving a template or finding what's wrong with my
> > code?
> >
> > Thank you in advance,
> > George
> >
> > 2016-09-03 18:14 GMT+03:00 Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>:
> >
> >> When I tried a query like SELECT STREAM ss.depts.deptno FROM ss.depts
> >> WHERE ss.depts.deptno < 30; it gave me a correct answer on the
> SDEPTS.cvs
> >> in sales folder with both my json and model-stream-table.json. I only
> had
> >> to declare better where to find the tables and the columns, because with
> >> only the name it wouldn't run. I still haven't fixed my sOrders.csv yet,
> >> but I suppose it has to do with how I have created.
> >>
> >> 2016-09-03 15:39 GMT+03:00 Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com
> >:
> >>
> >>> I am trying to create a simple streaming query ( like SELECT STREAM *
> >>> FROM ORDERS WHERE units > 10). I have created a stream using a socket
> that
> >>> saves the orders in an sOrders.csv file and I have changed the
> >>> model-stream-table.json like this:
> >>> {
> >>>  version: '1.0',
> >>>  defaultSchema: 'CUSTOM_TABLE',
> >>>  schemas: [
> >>>{
> >>>  name: 'CUSTOM_TABLE',
> >&g

Re: Simple Streaming Scenario

2016-09-03 Thread Γιώργος Θεοδωράκης
When I tried a query like SELECT STREAM ss.depts.deptno FROM ss.depts WHERE
ss.depts.deptno < 30; it gave me a correct answer on the SDEPTS.csv in the
sales folder with both my json and model-stream-table.json. I only had to
declare more precisely where to find the tables and the columns, because with
only the name it wouldn't run. I still haven't fixed my sOrders.csv yet, but I
suppose it has to do with how I have created it.

2016-09-03 15:39 GMT+03:00 Γιώργος Θεοδωράκης <giwrgosrth...@gmail.com>:

> I am trying to create a simple streaming query ( like SELECT STREAM * FROM
> ORDERS WHERE units > 10). I have created a stream using a socket that saves
> the orders in an sOrders.csv file and I have changed the
> model-stream-table.json like this:
> {
>   version: '1.0',
>   defaultSchema: 'CUSTOM_TABLE',
>   schemas: [
> {
>   name: 'CUSTOM_TABLE',
>   tables: [
> {
>   name: 'ORDERS',
>   type: 'custom',
>   factory: 'org.apache.calcite.adapter.csv.CsvStreamTableFactory',
>   stream: {
> stream: true
>   },
>   operand: {
> file: '/home/hduser/Desktop/sOrders.csv',
> flavor: "scannable"
>   }
> }
>   ]
> }
>
> , because when I had defaultSchema: 'STREAM' it showed: Error while
> executing SQL "SELECT STREAM * FROM orders": From line 1, column 22 to line
> 1, column 27: Table 'ORDERS' not found (state=,code=0).
>
> Everything is good, until I try to project or filter a column when it
> shows: Error while executing SQL "SELECT STREAM productId FROM orders":
> From line 1, column 15 to line 1, column 23: Column 'PRODUCTID' not found
> in any table (state=,code=0). When I type !columns I get: CUSTOM_TABLE |
> ORDERS | productId   | 4 | JavaType(class...
>
> To solve my problem, should I write from the start 
> CsvStreamFilterableTable.java
> etc. or are these operations already implemented?
>
> Thank you in advance,
> George
>
>
>


Simple Streaming Scenario

2016-09-03 Thread Γιώργος Θεοδωράκης
I am trying to create a simple streaming query (like SELECT STREAM * FROM
ORDERS WHERE units > 10). I have created a stream using a socket that saves
the orders in an sOrders.csv file and I have changed the
model-stream-table.json like this:
{
  version: '1.0',
  defaultSchema: 'CUSTOM_TABLE',
  schemas: [
{
  name: 'CUSTOM_TABLE',
  tables: [
{
  name: 'ORDERS',
  type: 'custom',
  factory: 'org.apache.calcite.adapter.csv.CsvStreamTableFactory',
  stream: {
stream: true
  },
  operand: {
file: '/home/hduser/Desktop/sOrders.csv',
flavor: "scannable"
  }
}
  ]
}

, because when I had defaultSchema: 'STREAM' it showed: Error while
executing SQL "SELECT STREAM * FROM orders": From line 1, column 22 to line
1, column 27: Table 'ORDERS' not found (state=,code=0).

Everything is good, until I try to project or filter a column when it
shows: Error while executing SQL "SELECT STREAM productId FROM orders":
From line 1, column 15 to line 1, column 23: Column 'PRODUCTID' not found
in any table (state=,code=0). When I type !columns I get: CUSTOM_TABLE |
ORDERS | productId   | 4 | JavaType(class...

To solve my problem, should I write CsvStreamFilterableTable.java etc. from
scratch, or are these operations already implemented?

Thank you in advance,
George


Calcite - Spark Adapter

2016-06-05 Thread Γιώργος Θεοδωράκης
Hello,

My name is George and I am an undergraduate computer science student. I am
doing some research for my diploma thesis on query optimization in
distributed systems. After reading some basics about the Calcite project, I
thought I could use it as an SQL optimizer on top of Spark.
I have a Hadoop cluster running on multiple machines, and I run SQL queries
with Spark SQL on data saved in a data warehouse (Hive). My goal is to
optimize certain queries by pushing rules and functions down to the nodes
with a framework like Calcite. However, I haven't found any related
documentation, and I am not sure whether it is even possible to access the
metadata of Hive through Calcite and run the optimizations on Spark. Can
you help me?

Thank you in advance.