RelNode rewrite

2020-02-13 Thread Christian Beikov

Hello,

I'm trying to generate SQL for a trigger, which requires a little rewrite 
of a RelNode structure, and I wanted to know if anyone could help me 
figure out how to do a few things:


 * I need to swap the left and right inputs of a LogicalJoin. Doing that
   messes up predicates and projections though. Is there a utility I could
   use, or how would I do that?
 * I need to introduce an "alias" like "NEW" into a RelNode, for which I
   want to create a RexNode "NEW.key1". I want to add a WHERE condition
   based on the NEW alias. Not sure how I would do that though. Maybe only
   at the SqlNode level?
 * Can I somehow determine which columns of a table are unique or part of
   the primary key? I want to create a predicate like (NEW.key1, NEW.key2) =
   (existing.key1, existing.key2).

Thanks in advance!

Regards,

Christian



Re: RelNode rewrite

2020-02-14 Thread Christian Beikov

Hey Jess!

On 14.02.2020 at 02:16, Jess Balint wrote:

Hey Christian,

On Thu, Feb 13, 2020 at 6:30 PM Christian Beikov 
wrote:



   * I need to swap left and right of a LogicalJoin. Doing that messes up
 predicates and projections though. Is there a utility I could use or
 how would I do that?


You can use JoinCommuteRule, or check its code for your needs.

Thanks for the hint. I just hope the re-projection isn't confusing for 
the DBMS optimizer. It would be cool if a RelNode could be rewritten 
bottom-up to avoid this. Or is there some kind of rule I can apply to 
optimize the re-projection away?
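
A minimal sketch of both steps, assuming a LogicalJoin "join" is in scope 
(the rule and helper names below are from recent Calcite versions; untested):

// JoinCommuteRule's static helper swaps the inputs, rewrites the condition,
// and puts a compensating LogicalProject on top to restore the field order.
RelNode swapped = JoinCommuteRule.swap(join, true);

// The compensating projection can often be merged or removed afterwards,
// e.g. in a HepPlanner pass:
HepProgram program = new HepProgramBuilder()
    .addRuleInstance(ProjectMergeRule.INSTANCE)
    .addRuleInstance(ProjectRemoveRule.INSTANCE)
    .build();
HepPlanner hepPlanner = new HepPlanner(program);
hepPlanner.setRoot(swapped);
RelNode cleaned = hepPlanner.findBestExp();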

   * I need to introduce an "alias" like "NEW" into a RelNode, for which I
 want to create a RexNode "NEW.key1". I want to add a WHERE
 condition based on the NEW alias. Not sure how I would do that
 though. Maybe only at the SqlNode level?


I don't understand what you're after here. The WHERE condition will be a
filter node. Can you just add it by attaching something to the RelNode?


So in a SQL trigger, there are the aliases "NEW" and "OLD" available, which 
I want to reference in a correlation predicate. I essentially want to 
correlate the RelNode of the changed table with the "NEW" alias, and I 
try to do that through the primary key. Let's consider the example query 
"select * from table1 t". For a trigger on "table1" I essentially 
want to create the query "select * from table1 t where (t.id) = (NEW.id)".
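
Attaching such a WHERE condition to an existing plan is the easy part; a 
minimal sketch with RelBuilder, assuming the parsed plan "input" and a 
FrameworkConfig "config" are in scope, with a placeholder literal where 
NEW.id would go:

RelBuilder b = RelBuilder.create(config);
RelNode filtered = b.push(input)
    .filter(b.equals(b.field("id"), b.literal(1L))) // placeholder for NEW.id
    .build();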


This is what I have so far; maybe you could point me in the right 
direction? The uniqueKeys are unfortunately null, although the node is a 
JdbcTableScan.


private RexNode createNewCorrelatedJoinCondition(LogicalJoin join, RelNode node) {
    // Correlate through the unique key columns of the scanned table.
    Set<ImmutableBitSet> uniqueKeys = node.getCluster().getMetadataQuery().getUniqueKeys(node);
    ImmutableBitSet uniqueFieldBitSet = uniqueKeys.iterator().next();
    List<RelDataTypeField> fieldList = node.getRowType().getFieldList();
    RelBuilder relBuilder = RelBuilder.create(config);
    RexNode condition = join.getCondition();
    List<RexNode> conjuncts = new ArrayList<>(uniqueFieldBitSet.size() + 1);
    conjuncts.add(condition);
    for (Integer index : uniqueFieldBitSet) {
        // NEW.<key> = existing.<key> for every unique key column
        conjuncts.add(
            relBuilder.equals(
                relBuilder.field("NEW", fieldList.get(index).getName()),
                relBuilder.field(fieldList.get(index).getIndex())));
    }
    return RexUtil.composeConjunction(relBuilder.getRexBuilder(), conjuncts);
}

Thanks,

Christian



Correlation with a variable

2020-02-16 Thread Christian Beikov

Hello,

I asked this before, but I guess the question got too big, so I thought 
splitting it up might be better.

I am trying to generate a query from a relational model on which I did a 
few transformations, but I don't know how to refer to a "variable".


In a SQL trigger, there usually are two variables, "OLD" and "NEW", which I 
want to be able to refer to. I tried introducing a "transient scan", but 
unfortunately that won't work, because NEW is not a relation and must not 
be quoted. I will work around this for now by introducing a temporary 
relation in the trigger so that I can refer to it, but ideally I want to 
refer to the variable directly.


The simplest example SQL that I want to be able to produce would look 
like this:


select NEW."some_column"

The tricky part here is that NEW is not quoted. I don't know how I can 
represent this in a relational expression.


Thanks in advance for any help!

Regards,

Christian



Re: Correlation with a variable

2020-02-17 Thread Christian Beikov

Hey Danny,

it's not a view, it's a variable in PL/SQL with a row type. The thing 
is, variable names must not be quoted, but I have no idea how to avoid 
quoting for this single use case with the relational algebra model in 
Calcite.


Regards,

Christian

On 18.02.2020 at 04:22, Danny Chan wrote:

From the case you gave, the “variable” seems to be a view? Sorry, I’m not 
familiar with the traditional RDBMS.

Best,
Danny Chan



Re: Correlation with a variable

2020-02-18 Thread Christian Beikov
My issue is not about parsing. I already have the relational model; I
parsed a query to which I want to add a condition on a RelNode. Now I
want to add a RexNode to the LogicalFilter node that renders to:
NEW."somecolumn"

How would I construct a RexNode that renders to that when converting the
RelNode to SQL? Do I have to extend the SqlDialect to support that?

Danny Chan wrote on Tue, 18 Feb 2020, 15:12:

> If you want to make NEW a normal SQL identifier, you should override it in
> the parser to make it unreserved.


Re: Correlation with a variable

2020-02-18 Thread Christian Beikov

Hey Stamatis,

thanks for your help.

On 18.02.2020 at 19:34, Stamatis Zampetakis wrote:

Hi Christian,

Long story short: maybe you can achieve what you want by adding a new type
of RexNode. From what I remember, quoting is applied to identifiers (schema,
table, column names), so maybe if OLD/NEW are parameters (e.g.,
RexNamedParam) then possibly it does not make sense to quote them.

Right, and actually it is wrong to quote these variables/parameters, i.e. 
it's an error, as PG complains that there is no relation with that name.

At row level triggers OLD and NEW refer to a single tuple/row at each point
in time. In this case, OLD and NEW are row type variables, or better say
parameters, with the same type as the table. In Calcite, there are
index-based parameters (RexDynamicParam) but not named parameters as the
one you seem to need. I think named parameters are useful in various
scenarios so possibly we could expand the RexNode hierarchy.
In this case to build the plan probably it suffices to create and plug the
parameter wherever you need.
That sounds great and just about what I would need. Is there an issue 
for this already?
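
A minimal sketch of what plugging an index-based parameter into a filter 
condition could look like today, assuming a RelOptCluster "cluster" and a 
BIGINT key column at ordinal 0 (note that a RexDynamicParam renders as "?" 
rather than as "NEW.key1"):

RexBuilder rexBuilder = cluster.getRexBuilder();
RelDataType bigintType = rexBuilder.getTypeFactory().createSqlType(SqlTypeName.BIGINT);
// Index-based parameter standing in for NEW.key1; a named variant does not exist yet.
RexNode newKey1 = rexBuilder.makeDynamicParam(bigintType, 0);
// existing.key1 = NEW.key1
RexNode condition = rexBuilder.makeCall(SqlStdOperatorTable.EQUALS,
    rexBuilder.makeInputRef(bigintType, 0),
    newKey1);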

In statement-level triggers, OLD and NEW refer to sets of tuples/rows at each
point in time. In this case, OLD and NEW appear as (temporary)
relations/tables with the same type as the table. In terms of
implementation, I assume that the user-defined query acts as a subquery
correlated with OLD/NEW as necessary.

Correct, but right now I am using row-level triggers. I'll try to 
introduce some kind of synthetic view that holds the state so I can use a 
TransientScan for now.

In this case, to build the plan, you probably need to introduce scan
operations over the OLD/NEW tables and create a correlation with the rest of
the query.

As of PG 10, the REFERENCING clause can be used to introduce a temporary 
view for the old and new relations.

Best,
Stamatis




Translation of SQL EXISTS

2020-02-19 Thread Christian Beikov

Hello,

I'm a bit confused about how the SQL EXISTS predicate is translated. I'd 
assume that EXISTS is translated in relational algebra to a SEMI-join and 
NOT EXISTS to an ANTI-join, but it's not.


PlannerImpl p = new PlannerImpl(config);
SqlNode sqlNode = p.parse("delete from _order o where exists"
    + " (select 1 from order_position p where o.id = p.order_id)");
p.validate(sqlNode);
RelRoot rel = p.rel(sqlNode);
RelToSqlConverter sqlConverter = new RelToSqlConverter(dialect);
SqlImplementor.Result result = sqlConverter.visitChild(0, rel.rel);
sqlWriter.format(result.asStatement());

Worse, when printing this, I only get DELETE FROM "public"."_order", i.e. 
the EXISTS part is not rendered. This is the plan I get:


LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], flattened=[true])
  LogicalProject(inputs=[0])
    LogicalProject(inputs=[0], exprs=[[CAST($1):BIGINT, CAST($2):BOOLEAN]])
      LogicalJoin(condition=[=($0, $1)], joinType=[inner])
        JdbcTableScan(table=[[adhoc, _order]])
        LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
          LogicalProject(exprs=[[$1, true]])
            JdbcTableScan(table=[[adhoc, order_position]])

I'd expect something along the lines of:

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], flattened=[true])
  LogicalProject(inputs=[0])
    LogicalJoin(condition=[=($0, $1)], joinType=[semi])
      JdbcTableScan(table=[[adhoc, _order]])
      JdbcTableScan(table=[[adhoc, order_position]])

and for NOT EXISTS:

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], flattened=[true])
  LogicalProject(inputs=[0])
    LogicalJoin(condition=[=($0, $1)], joinType=[anti])
      JdbcTableScan(table=[[adhoc, _order]])
      JdbcTableScan(table=[[adhoc, order_position]])

Am I missing something, or does the current aggregate-based translation 
make sense?


I constructed relational algebra structures for some other statements 
with SEMI- and ANTI-joins and already noticed that these join types 
weren't handled in 
org.apache.calcite.rel.rel2sql.RelToSqlConverter#visit(org.apache.calcite.rel.core.Join), 
which I fixed locally. Is the lack of a translation intentional?


Is such a translation of SEMI- and ANTI-joins to EXISTS and NOT EXISTS an 
over-simplification, or would you say it's correct? As far as I 
understood from https://en.wikipedia.org/wiki/Relational_algebra, it is 
correct.


I'd be happy to contribute that back. I haven't looked into the SQL-to-Rel 
translation of EXISTS and NOT EXISTS to SEMI- and ANTI-joins yet, but I 
assume that's not that hard and I could add it.
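
For illustration, the expected plan shape can also be built directly with 
RelBuilder; a minimal sketch, assuming a FrameworkConfig "config" and the 
column names from the example:

RelBuilder b = RelBuilder.create(config);
RelNode semiJoinPlan = b
    .scan("adhoc", "_order")
    .scan("adhoc", "order_position")
    .semiJoin(                            // antiJoin(...) for the NOT EXISTS case
        b.equals(
            b.field(2, 0, "id"),          // _order.id
            b.field(2, 1, "order_id")))   // order_position.order_id
    .build();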


Regards,

Christian



Re: Translation of SQL EXISTS

2020-02-19 Thread Christian Beikov

Hey Haisheng,

it is nice to have a rule that detects such patterns, but do you agree 
that it would be better to generate SEMI- and ANTI-joins already during 
SQL-to-Rel conversion?


I implemented the Rel-to-Sql conversion for SEMI- and ANTI-joins, and it 
seems to work for my use cases, but it feels a bit ugly, as I have to do 
some transformations which might be ad hoc. I had to replace identifiers 
that refer to the outcome of the SEMI- and ANTI-joins with the select 
items of the LHS, because the SEMI- and ANTI-join does not get an alias.


See for yourself: 
https://github.com/beikov/calcite/commit/6ca67e2e8df44c2081e71407ee6893a961f36271


I'll add a few tests and a comment, but could anyone please comment on 
whether this approach looks ok?


Regards,

Christian

On 19.02.2020 at 19:03, Haisheng Yuan wrote:

Hi Christian,

For the query in your example, Calcite first generates an inner join plan 
with an aggregate child, then through SemiJoinRule transforms the inner 
join to a semi- or anti-semi-join. The reason to have an inner join is 
that it allows join commutativity, which is good for generating a 
potentially better plan with a nested-loop join or hash join.

Admittedly, this process in Calcite is counter-intuitive. It should be 
in reverse order: first generate a semi- or anti-semi-join, then 
generate an inner/outer join.


- Haisheng
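
For reference, a sketch of applying the rule Haisheng mentions in a 
HepPlanner pass, assuming the RelRoot "rel" from the earlier snippet 
(SemiJoinRule comes in PROJECT and JOIN variants that match the 
Join-plus-Aggregate pattern):

HepProgram program = new HepProgramBuilder()
    .addRuleInstance(SemiJoinRule.PROJECT)
    .addRuleInstance(SemiJoinRule.JOIN)
    .build();
HepPlanner hepPlanner = new HepPlanner(program);
hepPlanner.setRoot(rel.rel);
RelNode withSemiJoins = hepPlanner.findBestExp();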





Re: How to optimize repeated RelNode Structures? (CALCITE-3806)

2020-02-20 Thread Christian Beikov
I guess what you are looking for is something like the SubPlan node of
PostgreSQL? Such a node would materialize the results to be used by
multiple other nodes, or in a nested-loop context, to avoid accessing a
source relation multiple times.

It would be cool if Calcite could create a SubPlan and let the union-all
node use that instead. Something like this:

Materialize(name=Sub1)
  LogicalProject(EMPNO=[$0])
    LogicalFilter(condition=[OR(>=($0, 7369), >=($0, 7369))])
      LogicalTableScan(table=[[scott, EMP]])
LogicalUnion(all=[true])
  LogicalProject(EMPNO=[$0])
    LogicalFilter(condition=[>=($0, 7369)])
      LogicalTableScan(table=[[Sub1]])
  LogicalProject(EMPNO=[$0])
    LogicalFilter(condition=[>=($0, 7369)])
      LogicalTableScan(table=[[Sub1]])

Of course there are further optimizations that could be done, like removing
filters etc., but that's just a general example.

Not sure if the Spool operator does that, but such a translation could also
come in handy for distributed queries, where every row access to a remote
system has protocol overhead.

Danny Chan wrote on Thu, 20 Feb 2020, 15:27:

> Calcite has a Spool operator, maybe you can check that.
>
> Anjali Shrishrimal wrote on Thu, 20 Feb 2020, 15:20:
>
> > Hi everybody,
> >
> > I would like to have your suggestions on CALCITE-3806.
> > Asking it here as suggested by Julian.
> >
> > If a RelNode tree contains a subtree whose result can be obtained by some
> > other part of the same tree, can we optimize it? And how can we express
> > that in the plan?
> >
> > For example, let's say the input structure looks like this:
> >
> > LogicalUnion(all=[true])
> >   LogicalProject(EMPNO=[$0])
> >     LogicalFilter(condition=[>=($0, 7369)])
> >       LogicalTableScan(table=[[scott, EMP]])
> >   LogicalProject(EMPNO=[$0])
> >     LogicalFilter(condition=[>=($0, 7369)])
> >       LogicalTableScan(table=[[scott, EMP]])
> >
> > In this case,
> >
> > LogicalProject(EMPNO=[$0])
> >   LogicalFilter(condition=[>=($0, 7369)])
> >     LogicalTableScan(table=[[scott, EMP]])
> >
> > is repeated. It is going to fetch the same data twice.
> > Can we save one fetch? Can we somehow tell the union's 2nd input to make
> > use of the union's 1st input? Is there any way to express that in the plan?
> >
> > Also, if the structure was like this:
> >
> > LogicalUnion(all=[true])
> >   LogicalProject(EMPNO=[$0])
> >     LogicalFilter(condition=[>=($0, 7369)])
> >       LogicalTableScan(table=[[scott, EMP]])
> >   LogicalProject(EMPNO=[$0])
> >     LogicalFilter(condition=[>=($0, 8000)])
> >       LogicalTableScan(table=[[scott, EMP]])
> >
> > the second part of the union can perform its filtering on the fetched
> > data of the 1st part (as the second's output is a subset of the first's
> > output).
> >
> > Does Calcite provide such kinds of optimizations?
> > If not, what are the challenges to doing so?
> >
> > Would love to hear your thoughts.
> >
> > Thank you,
> > Anjali Shrishrimal


Re: Translation of SQL EXISTS

2020-02-20 Thread Christian Beikov

Alright, so here is the PR for the Rel-to-Sql translation.

I thought about what you wrote here already, Julian, but thanks for 
mentioning it.


Would be great if this could make it into 1.22.0, or is it too late for 
that already?


On 19.02.2020 at 22:49, Julian Hyde wrote:

Agreed, it should be in reverse order. Translate to semi-join (or
anti-join for NOT EXISTS), then optionally use a rule to rewrite semi-
or anti-join to Join+Aggregate.

Note that if the EXISTS is in a disjunction (e.g. "delete from orders
where exists (select null from order_items where ...) or shipping >
20") we cannot use a semi-join. We have to use a left join, using an
indicator column on the right-hand side that will be null iff there is
no match. Which is what we do currently.





Re: Translation of SQL EXISTS

2020-02-20 Thread Christian Beikov

Yeah right, somehow forgot to add the link.

On 20.02.2020 at 21:46, Julian Hyde wrote:

Thanks. I guess you mean https://issues.apache.org/jira/browse/CALCITE-3810.









Re: TableModify does not keep UPSERT keyword

2020-03-17 Thread Christian Beikov
AFAIK MERGE has different concurrency semantics than what some DBMSs call 
UPSERT. PostgreSQL, for example, has guaranteed insert-or-update 
semantics, whereas MERGE could end with constraint violation errors: 
https://wiki.postgresql.org/wiki/UPSERT


Maybe it's worth adding that to the relational model after all?

On 17.03.2020 at 07:17, Enrico Olivelli wrote:

On Mon, 16 Mar 2020, 23:55, Julian Hyde wrote:


Change the unparse operation for the dialect so that you generate UPSERT
rather than MERGE.

IIRC we did this for another dialect - maybe Phoenix.


Julian,
In my case (HerdDB) I need to see the presence of UPSERT in the RelNode
(EnumerableTableModify or LogicalTableModify).
I saw the JIRA where you introduced UPSERT for Phoenix.
I will check deeper, thanks

Enrico




Julian


On Mar 16, 2020, at 1:22 AM, Enrico Olivelli wrote:

On Mon, 16 Mar 2020, 09:06, Stamatis Zampetakis wrote:


Hi Enrico,

I have the impression that UPSERT is currently supported only at the
parser level [1], so it seems normal that you don't find something
relevant in LogicalTableModify etc. Note that the SQL standard equivalent
is the MERGE statement [2], but this also seems to be supported only at
the parser/validator level [3].
I guess it is not necessary to introduce more things in TableModify, since
UPSERT seems to be syntactic sugar. I think that most of the work can be
done in RelToSqlConverter [4] and possibly the rest of the code can be
left intact.


I would like to send a patch that introduces the ability to keep the
SqlInsert 'keywords' and pass them to TableModify.
Would that be a good approach?

The alternative is to introduce a new Operation, but it would be a more
invasive change and I think it is not worth it.

Enrico



Best,
Stamatis

[1] https://issues.apache.org/jira/browse/CALCITE-492
[2] https://en.wikipedia.org/wiki/Merge_(SQL)
[3] https://issues.apache.org/jira/browse/CALCITE-985
[4] https://github.com/apache/calcite/blob/d234626227954eefffe49f42abec65c649ffe3a6/core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java#L2395

On Sun, Mar 15, 2020 at 6:53 PM Enrico Olivelli 
wrote:

Hi,
I am trying to use UPSERT, but it seems to me that in TableModify (or
better, LogicalTableModify or EnumerableTableModify) there is no way to
distinguish an INSERT from an UPSERT.

Am I missing something?

Regards
Enrico



Re: TableModify does not keep UPSERT keyword

2020-03-18 Thread Christian Beikov
I'd argue that these technical details are in fact the reason why this 
is different functionality that deserves special handling.


I expect an upsert statement like `INSERT INTO tbl(a, b) VALUES (1, 2) ON 
CONFLICT DO NOTHING` to never produce a constraint violation.


This is functionally different from a statement like `INSERT INTO tbl(a, 
b) SELECT a, b FROM (VALUES (1, 2)) x(a, b) WHERE NOT EXISTS (SELECT 1 
FROM tbl y WHERE x.a = y.a AND x.b = y.b)`, which might cause a constraint 
violation.


On the other hand, an upsert statement like `INSERT INTO tbl(a, b) 
VALUES (1, 2) ON CONFLICT DO UPDATE` guarantees that at the end of the 
statement the tuple `(1, 2)` exists. There are other variants that 
provide different functionality (conflict handler conditions, a special 
update clause, etc.), and overall, for DBMSs that do not support the ON 
CONFLICT clause, it is necessary to fall back to PL/SQL that handles 
constraint violations in exception handlers within loops, to ensure the 
same behavior.


If Calcite transforms such an UPSERT statement to a MERGE statement, it 
must at least be flagged as requiring atomicity, to be able to generate 
the correct logic for the backend that it runs on.


On 17.03.2020 at 22:28, Julian Hyde wrote:

I don't think there's a significant difference between UPSERT and
MERGE. The differences are in marketing (which keyword to use) and in
technical details (e.g. concurrency semantics). Not worth splitting a
core concept over. We spend a lot of effort keeping like-things-like.




Sequence support

2017-08-10 Thread Christian Beikov

Hello,

I am new to Calcite, but I have already gotten quite far with integrating 
it into a test project. Now I am a bit stuck and not sure what is happening.


I would like to use Calcite for data federation, but first I am trying 
to simply put it between the application and the actual JDBC connection. 
I read something on the website saying that operations aren't pushed down 
to JDBC schemas yet? That would be unfortunate, as that is something I 
actually need. Can you confirm what the status of that is?


While trying to insert data, I noticed that SQL sequences don't seem to 
be supported. Is there a special undocumented syntax for retrieving 
sequence values? I tried "next value for SEQUENCE", as something in 
SqlKind was suggesting that might be the appropriate syntax, but it 
didn't work.


Finally, I also noticed that transactions don't seem to work properly. 
Apparently CalciteMetaImpl#commit throws an 
UnsupportedOperationException? Am I missing something here or is that a 
current limitation?


Thanks in advance!



Materialization left join rewrite

2017-08-20 Thread Christian Beikov
I just discovered that queries containing left joins aren't rewritten 
when materializations are available, and wondered why that is a 
limitation. I looked a bit into the implementation and found that 
org.apache.calcite.rel.metadata.RelMdAllPredicates#getAllPredicates(org.apache.calcite.rel.core.Join, 
org.apache.calcite.rel.metadata.RelMetadataQuery) returns null when 
encountering a non-inner join.


For an inner join, the method returns essentially the join predicate, so 
I thought a left join should be similar. How about returning an OR node 
that asserts that input refs of "outer-joined" tables are null if 
unmatched?


Let's consider an example query like "from emp e left join dept d on 
e.deptno = d.deptno".

If the join were an inner join, the return would be "=(e.deptno, 
d.deptno)". In case of a left join, I'd suppose it returns 
"OR(IS_NULL(d.deptno), =(e.deptno, d.deptno))". Would that be sufficient 
to implement left join support, or do you have something different in 
mind? I couldn't find a ticket for that particular case to discuss a 
possible implementation, so please direct me if you already track that 
issue.
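
For concreteness, a minimal sketch of building the proposed predicate with 
a RexBuilder, assuming "rexBuilder" is in scope and that e.deptno and 
d.deptno are input refs $7 and $9 of type INTEGER (the ordinals are 
illustrative):

RelDataType intType = rexBuilder.getTypeFactory().createSqlType(SqlTypeName.INTEGER);
RexNode empDeptno = rexBuilder.makeInputRef(intType, 7);
RexNode deptDeptno = rexBuilder.makeInputRef(intType, 9);
// OR(IS_NULL(d.deptno), =(e.deptno, d.deptno))
RexNode leftJoinPredicate = rexBuilder.makeCall(SqlStdOperatorTable.OR,
    rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, deptDeptno),
    rexBuilder.makeCall(SqlStdOperatorTable.EQUALS, empDeptno, deptDeptno));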




Casts because of nullability differences

2017-08-23 Thread Christian Beikov

Hello,

I came to notice that the use of a nullable column in a context where a 
non-nullable one is expected causes a cast that is essentially useless. Am 
I missing something, or is there a reason why that is done?




Re: Casts because of nullability differences

2017-08-23 Thread Christian Beikov
I am using materializations, and my materialization table has nullable 
columns. Calcite correctly determines that the underlying query can 
never produce null for these columns, but I didn't encode that into the 
table definition. I was just wondering why the casts are generated in 
the SQL that is sent to the owner of the materialization table.


Thanks for the explanation. I suppose the cast causes proper exceptions 
to be thrown when null is encountered, instead of simply an NPE? Or is 
the cast in such a case really unnecessary?



On 23.08.2017 at 20:01, Julian Hyde wrote:

I presume you’re talking about RexNodes. It makes a lot of things easier if the 
arguments to a function call are EXACTLY the type required by that function. The 
system inserts implicit casts where necessary.

If you’re generating Java code (in Enumerable convention) there is a lot of 
difference between INTEGER NOT NULL and INTEGER. The former can be represented 
by int, the latter only by java.lang.Integer.

Julian







Re: Casts because of nullability differences

2017-08-23 Thread Christian Beikov
I get that the cast is done "internally" to handle nulls properly, but 
why would such a cast end up in the SQL that gets sent to the DBMS? Can 
you point me in a direction where to look for these generated casts? I'd 
like to try whether I can avoid rendering these casts, as some DBMS 
planners might get confused.



On 23.08.2017 at 21:20, Julian Hyde wrote:

It’s difficult to say for sure whether the cast is necessary. For some of 
us, consistency is an end in itself. :)

If a null was to occur, then yes, the cast would throw. But it’s also possible 
that Calcite can deduce that a null is impossible, in which case it wouldn’t 
even generate the logic.

I have this argument with Ashutosh from time to time. He tends to say we can 
never be sure which expressions might or might not produce nulls, so why bother 
being so strict? I’d rather use every piece of information we can get, because 
it might allow us to optimize.

By the way, there are cases where we can deduce that columns are not-nullable 
but we cannot change the row-type due to the RelOptRule contract. We ought to 
be generating RelMdPredicates in these cases.

Julian








Re: Materialization left join rewrite

2017-08-23 Thread Christian Beikov
FWIW, I use an implementation of that idea locally and did a few 
end-to-end tests, and it seems to work so far.

I can prepare a PR so we can discuss this, if you tell me an issue number 
or whether I should create one for this.



On 24.08.2017 at 02:13, Julian Hyde wrote:

Jesus,

Can you answer this question for Christian please? Seems like a reasonable use 
case, and your area of expertise.

Julian






Re: Casts because of nullability differences

2017-08-23 Thread Christian Beikov
RelOptMaterialization creates the cast, which eventually calls into 
RexUtil.generateCastExpressions. That method, probably rightfully, is 
strict about nulls. The question is whether the cast should be 
optimized away later, or whether it shouldn't be created in the first 
place. What do you say?



On 24.08.2017 at 02:05, Julian Hyde wrote:

You didn’t say that the casts were ending up in generated SQL. That would be a 
bug. Probably in RelToSqlConverter.







Materialized view case sensitivity problem

2017-08-24 Thread Christian Beikov

Hey,

I have configured Lex.MYSQL_ANSI, but when a query gets parsed, the 
column names of select items are upper-cased.

I'm having problems with matching the row types of materialized views 
and the source SQL because of that. Any idea how to fix that?




Re: Materialized view case sensitivity problem

2017-08-24 Thread Christian Beikov
It seems org.apache.calcite.prepare.CalcitePrepareImpl#prepare2_ misses a 
call to 
org.apache.calcite.sql.parser.SqlParser.ConfigBuilder#setCaseSensitive 
to configure the parser according to the Lex configuration. Is that a 
bug or expected?
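
For comparison, configuring the casing explicitly works when parsing 
standalone; a minimal sketch mirroring what Lex.MYSQL_ANSI implies (the 
query string is illustrative):

SqlParser.Config parserConfig = SqlParser.configBuilder()
    .setLex(Lex.MYSQL_ANSI)
    .setCaseSensitive(Lex.MYSQL_ANSI.caseSensitive)
    .build();
SqlParser parser = SqlParser.create("select id, name from document", parserConfig);
SqlNode sqlNode = parser.parseQuery();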







Re: Materialized view case sensitivity problem

2017-08-24 Thread Christian Beikov
Actually, it seems the root cause is that the materialization uses the 
wrong configuration.

org.apache.calcite.materialize.MaterializationService.DefaultTableFactory#createTable 
creates a new connection with the default configuration, which uses 
TO_UPPER casing. Would it be OK for it to receive a 
CalciteConnectionConfig?









Re: Materialized view case sensitivity problem

2017-08-24 Thread Christian Beikov

Well, my use case is apparently something that isn't supported yet :D

I'd like to have a view, instead of a table, act as the materialization 
table for a materialized view. The reason is simply that the actual data 
is in a different non-SQL schema and I need the view to do data conversion.

I noticed that materializations were only looking for tables 
specifically and adapted the code to also look up views, but now I think 
I am stuck. Since the view is a LogicalProject, the use of getTable() in 
RelOptMaterialization won't work. Do you have an idea how I could make 
that work?



On 24.08.2017 at 12:57, Julian Hyde wrote:

Or supply your own TableFactory? I'm not quite sure of your use case.
I've only tested cases where materialized views are "internal",
therefore they work fine with Calcite's default dialect.







Re: Materialized view case sensitivity problem

2017-08-24 Thread Christian Beikov
My main problem is the row type equality assertion in 
org.apache.calcite.plan.SubstitutionVisitor#go(org.apache.calcite.rel.mutable.MutableRel).

Imagine I have a table "document" with columns "id" and "name". When the 
JdbcSchema reads the structure, it gets the column names in upper case. 
Now I register a materialized view for a query like "select id, name from 
document". The materialization table for that view is, in my case, again 
a view, defined like "select ... AS `id`, ... AS `name` from ...".

The row type of my view is correctly "id, name". The row type of the 
table "document" is "ID, NAME" because the JdbcSchema gets upper-cased 
names. Initially, the row type of the query for the materialized view is 
also correct, but during the "trim fields" phase the row type gets 
replaced with the types from the table. Is this replacement of field 
types even correct?

Because of that, the assertion in the substitution visitor fails. What 
would be the appropriate solution for this mismatch?









Re: Materialization left join rewrite

2017-08-24 Thread Christian Beikov
I created https://issues.apache.org/jira/browse/CALCITE-1965 to track 
this issue.

Apart from tests that are already failing on master, it seems that my 
implementation (https://github.com/apache/calcite/pull/526) doesn't fail 
any other tests. I'd love some feedback, as that is a very essential 
feature for me.



Mit freundlichen Grüßen,

*Christian Beikov*
Am 24.08.2017 um 02:13 schrieb Julian Hyde:

Jesus,

Can you answer this question for Christian please? Seems like a reasonable use 
case, and your area of expertise.

Julian


On Aug 20, 2017, at 12:25 PM, Christian Beikov  
wrote:

I just discovered that queries containing left joins aren't rewritten when 
materializations are available and wondered why that is a limitation. I looked 
a bit into the implementation and found out that 
org.apache.calcite.rel.metadata.RelMdAllPredicates#getAllPredicates(org.apache.calcite.rel.core.Join,
 org.apache.calcite.rel.metadata.RelMetadataQuery) returns null when 
encountering a non-inner join.

For an inner join, the method returns essentially the join predicate, so I thought a left 
join should be similar. How about returning an OR node that asserts input refs of 
"outer-joined" tables are null if unmatched?

Let's consider an example query like "from emp e left join dept d on e.deptno = 
d.deptno"

If the join were an inner join, the return would be "=(e.deptno, d.deptno)". In case of a 
left join I'd suppose it returns "OR(IS_NULL(d.deptno),=(e.deptno, d.deptno))". Would 
that be sufficient to implement left join support or do you have something different in mind? I 
couldn't find a ticket for that particular case to discuss a possible implementation so please 
direct me if you already tracked that issue.

--

Kind regards,
----
*Christian Beikov*




Re: Materialized view case sensitivity problem

2017-08-24 Thread Christian Beikov
I apparently had a different problem that led me to believe the view 
was the problem. In fact, the actual query was the problem.


So I have the query for the materialized view "select id as `id`, name 
as `name` from document" and the query for the normal view "select 
cast(_MAP['id'] AS bigint) AS `id`, cast(_MAP['name'] AS varchar(255)) 
AS `name` from elasticsearch_raw.document_index".


Now when I send the query "select id as col1, name as col2 from 
document", the row type at first is "col1 bigint, col2 varchar(255)" and 
later it becomes "ID bigint, NAME varchar(255)", which is to some 
extent a good thing. The materialization logic determines it can 
substitute the query, but during the substitution it compares that row 
type with the one from the view. The Jdbc schema receives the columns in 
upper case, which is why the row type of the sent query is in upper 
case. Either the comparison should be case insensitive, or I simply 
upper case the names of the columns in the view, which is what I did now.


Doing that will unfortunately cause a little mismatch in the ES adapter 
which expects that the field names have the same case as the fields of 
the row type. This is why I adapted some rules to extract the correctly 
cased field name from the _MAP expression.


Now the question is, should the comparison be case insensitive or should 
I rely on the fact that the JDBC schema will always have upper-cased 
column names?
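
If the comparison were made case insensitive, it could look roughly like 
the following sketch. This is only an illustration of the comparison I 
have in mind, not the actual SubstitutionVisitor code:

import java.util.List;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeField;

public class RowTypeComparisonSketch {
  /** Compares two row types field by field, matching field names
   *  case-insensitively and field types exactly. */
  public static boolean rowTypesMatchIgnoreCase(RelDataType a, RelDataType b) {
    List<RelDataTypeField> aFields = a.getFieldList();
    List<RelDataTypeField> bFields = b.getFieldList();
    if (aFields.size() != bFields.size()) {
      return false;
    }
    for (int i = 0; i < aFields.size(); i++) {
      RelDataTypeField af = aFields.get(i);
      RelDataTypeField bf = bFields.get(i);
      if (!af.getName().equalsIgnoreCase(bf.getName())
          || !af.getType().equals(bf.getType())) {
        return false;
      }
    }
    return true;
  }
}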



Kind regards,
----
*Christian Beikov*
On 24.08.2017 at 21:00, Julian Hyde wrote:

Rather than "select id, name from document” could you create your view as 
"select `id`, `name` from document” (or however the back-end system quotes 
identifiers). Then “id” would still be in lower-case when the JDBC adapter queries the 
catalog.


On Aug 24, 2017, at 5:17 AM, Christian Beikov  
wrote:

My main problem is the row type equality assertion in 
org.apache.calcite.plan.SubstitutionVisitor#go(org.apache.calcite.rel.mutable.MutableRel)

Imagine I have a table "document" with columns "id" and "name". When the JdbcSchema reads the 
structure, it gets column names in upper case. Now I register a materialized view for a query like "select id, name from 
document". The materialized table for that view is in my case a view again defined like "select ... AS `id`, ... AS 
`name` from ...".

The row type of my view correctly is "id, name". The row type of the table "document" is "ID, 
NAME" because the JdbcSchema gets upper cased names. Initially, the row type of the query for the materialized 
view is also correct, but during the "trim fields" phase the row type gets replaced with the types from the 
table. Is this replacement of field types even correct?

Because of that, the assertion in the substitution visitor fails. What would be 
the appropriate solution for this mismatch?


Kind regards,

*Christian Beikov*
On 24.08.2017 at 12:57, Julian Hyde wrote:

Or supply your own TableFactory? I'm not quite sure of your use case.
I've only tested cases where materialized views are "internal",
therefore they work fine with Calcite's default dialect.

On Thu, Aug 24, 2017 at 3:21 AM, Christian Beikov
 wrote:

Actually, it seems the root cause is that the materialization uses the wrong
configuration.

org.apache.calcite.materialize.MaterializationService.DefaultTableFactory#createTable
creates a new connection with the default configuration that does TO_UPPER.
Would it be ok for it to receive a CalciteConnectionConfig?


Kind regards,

*Christian Beikov*
On 24.08.2017 at 11:36, Christian Beikov wrote:

Seems org.apache.calcite.prepare.CalcitePrepareImpl#prepare2_ misses a
call to
org.apache.calcite.sql.parser.SqlParser.ConfigBuilder#setCaseSensitive to
configure the parser according to the LEX configuration. Is that a bug or
expected?


Kind regards,

*Christian Beikov*
On 24.08.2017 at 11:24, Christian Beikov wrote:

Hey,

I have configured Lex.MYSQL_ANSI but when a query gets parsed, the column
names of select items are upper-cased.

I'm having problems with matching the row types of materialized views and
the source sql because of that. Any idea how to fix that?

--

Kind regards,

*Christian Beikov*




Re: Materialized view case sensitivity problem

2017-08-24 Thread Christian Beikov
I guess it's partly the DBMS's fault, because it either treats 
identifiers case-insensitively or its schema metadata API returns column 
names in upper case. Not sure if we could do anything about that in the 
JDBC adapter.


I think I will just inspect the JDBC schema before preparing any 
materializations to avoid the casing problem and having to rely on a 
specific case. Thanks for the clarifications!



Kind regards,
----
*Christian Beikov*
On 24.08.2017 at 21:52, Jesus Camacho Rodriguez wrote:

I never hit this issue as we do not go through the JDBC adaptor when we
use the MV rewriting within Hive.

I am not familiar with that code path, but I guess no matter whether it is
MV or a table definition, we should end up doing the same wrt casing column
names, thus there should be no need for case insensitive comparison?

- Jesús



On 8/24/17, 12:19 PM, "Christian Beikov"  wrote:


I apparently had a different problem that led me to believe the view
was the problem. In fact, the actual query was the problem.

So I have the query for the materialized view "select id as `id`, name
as `name` from document" and the query for the normal view "select
cast(_MAP['id'] AS bigint) AS `id`, cast(_MAP['name'] AS varchar(255))
AS `name` from elasticsearch_raw.document_index".

Now when I send the query "select id as col1, name as col2 from
document", the row type at first is "col1 bigint, col2 varchar(255)" and
later it becomes "ID bigint, NAME varchar(255)", which is to some
extent a good thing. The materialization logic determines it can
substitute the query, but during the substitution it compares that row
type with the one from the view. The Jdbc schema receives the columns in
upper case, which is why the row type of the sent query is in upper
case. Either the comparison should be case insensitive, or I simply
upper case the names of the columns in the view, which is what I did now.

Doing that will unfortunately cause a little mismatch in the ES adapter
which expects that the field names have the same case as the fields of
the row type. This is why I adapted some rules to extract the correctly
cased field name from the _MAP expression.

Now the question is, should the comparison be case insensitive or should
I rely on the fact, that the JDBC schema will always have upper cased
column names?


Kind regards,
----
*Christian Beikov*
On 24.08.2017 at 21:00, Julian Hyde wrote:

Rather than "select id, name from document” could you create your view as 
"select `id`, `name` from document” (or however the back-end system quotes 
identifiers). Then “id” would still be in lower-case when the JDBC adapter queries the 
catalog.


On Aug 24, 2017, at 5:17 AM, Christian Beikov  
wrote:

My main problem is the row type equality assertion in 
org.apache.calcite.plan.SubstitutionVisitor#go(org.apache.calcite.rel.mutable.MutableRel)

Imagine I have a table "document" with columns "id" and "name". When the JdbcSchema reads the 
structure, it gets column names in upper case. Now I register a materialized view for a query like "select id, name from 
document". The materialized table for that view is in my case a view again defined like "select ... AS `id`, ... AS 
`name` from ...".

The row type of my view correctly is "id, name". The row type of the table "document" is "ID, 
NAME" because the JdbcSchema gets upper cased names. Initially, the row type of the query for the materialized 
view is also correct, but during the "trim fields" phase the row type gets replaced with the types from the 
table. Is this replacement of field types even correct?

Because of that, the assertion in the substitution visitor fails. What would be 
the appropriate solution for this mismatch?


Kind regards,

*Christian Beikov*
On 24.08.2017 at 12:57, Julian Hyde wrote:

Or supply your own TableFactory? I'm not quite sure of your use case.
I've only tested cases where materialized views are "internal",
therefore they work fine with Calcite's default dialect.

On Thu, Aug 24, 2017 at 3:21 AM, Christian Beikov
 wrote:

Actually, it seems the root cause is that the materialization uses the wrong
configuration.

org.apache.calcite.materialize.MaterializationService.DefaultTableFactory#createTable
creates a new connection with the default configuration that does TO_UPPER.
Would it be ok for it to receive a CalciteConnectionConfig?


Kind regards,

*Christian Beikov*
On 24.08.2017 at 11:36, Christian Beikov wrote:

Seems org.apa

Re: Casts because of nullability differences

2017-08-25 Thread Christian Beikov
It apparently has something to do with MV substitutions as I couldn't 
reproduce it in any other way than using a query that can be substituted 
by an MV. Hard to say where the problem might be. Is there a specific 
phase in the planning for this optimization that might not be triggered 
anymore after substitutions? Any idea where I should dig in?



Kind regards,
----
*Christian Beikov*
On 24.08.2017 at 21:31, Julian Hyde wrote:

Optimized away later. In fact during SQL generation. SQL has no concept of 
“casting away nullability”.


On Aug 23, 2017, at 6:10 PM, Christian Beikov  
wrote:

RelOptMaterialization creates the cast which calls eventually into 
RexUtil.generateCastExpressions. That method, probably rightfully, is strict 
about nulls. The question is, whether the cast should be optimized away later 
or it shouldn't be created in the first place. What do you say?


Kind regards,

*Christian Beikov*
On 24.08.2017 at 02:05, Julian Hyde wrote:

You didn’t say that the casts were ending up in generated SQL. That would be a 
bug. Probably in RelToSqlConverter.



On Aug 23, 2017, at 4:35 PM, Christian Beikov  
wrote:

I get that the cast is done "internally" to handle nulls properly, but why 
would such a cast end up in the SQL that gets sent to the DBMS? Can you point me in a 
direction where to look for these generated casts? I'd like to see if I can avoid 
rendering these casts, as some DBMS planners might get confused.


Kind regards,
--------
*Christian Beikov*
On 23.08.2017 at 21:20, Julian Hyde wrote:

It’s difficult to say for sure whether the cast is necessary. For some of us, 
consistency is an end in itself. :)

If a null was to occur, then yes, the cast would throw. But it’s also possible 
that Calcite can deduce that a null is impossible, in which case it wouldn’t 
even generate the logic.

I have this argument with Ashutosh from time to time. He tends to say we can 
never be sure which expressions might or might not produce nulls, so why bother 
being so strict? I’d rather use every piece of information we can get, because 
it might allow us to optimize.

By the way, there are cases where we can deduce that columns are not-nullable 
but we cannot change the row-type due to the RelOptRule contract. We ought to 
be generating RelMdPredicates in these cases.

Julian




On Aug 23, 2017, at 11:33 AM, Christian Beikov  
wrote:

I am using materializations and my materialization table has nullable columns. 
Calcite correctly determines that the underlying query can never produce null 
for these columns, but I didn't encode that into the table definition. I was 
just wondering why the casts are generated in the SQL that is sent to the owner 
of the materialization table.

Thanks for the explanation, I suppose the cast causes proper exceptions to be 
thrown when null is encountered instead of simply a NPE? Or is the cast in such 
a case really unnecessary?


Kind regards,
----
*Christian Beikov*
On 23.08.2017 at 20:01, Julian Hyde wrote:

I presume you’re talking about RexNodes. It makes a lot of things easier if the 
arguments to a function call are EXACTLY the types required by that function. The 
system inserts implicit casts where necessary.

If you’re generating Java code (in Enumerable convention) there is a lot of 
difference between INTEGER NOT NULL and INTEGER. The former can be represented 
by int, the latter only by java.lang.Integer.

Julian



On Aug 23, 2017, at 2:08 AM, Christian Beikov  
wrote:

Hello,

I came to notice that the use of a nullable column in a context where a 
non-nullable is expected, causes a cast that is essentially useless. Am I 
missing something, or is there a reason why that is done?

--

Kind regards,
----
*Christian Beikov*




Materialization performance

2017-08-27 Thread Christian Beikov
Hey, I have been looking a bit into how materialized views perform 
during the planning because of a very long test 
run (MaterializationTest#testJoinMaterializationUKFK6) and the current 
state is problematic.


CalcitePrepareImpl#getMaterializations always reparses the SQL and down 
the line, there is a lot of expensive work (e.g. predicate and lineage 
determination) done during planning that could easily be pre-calculated 
and cached during materialization creation.


There is also a bit of a thread safety problem with the current 
implementation. Unless there is a different safety mechanism that I 
don't see, the sharing of the MaterializationService and thus also the 
maps in MaterializationActor via a static instance between multiple 
threads is problematic.


Since I mentioned thread safety, how is Calcite supposed to be used in a 
multi-threaded environment? Currently I use a connection pool that 
initializes the schema on new connections, but that is not really nice. 
I suppose caches are also bound to the connection? A thread safe context 
that can be shared between connections would be nice to avoid all that 
repetitive work.


Are these known issues that you have already thought about fixing, or should 
I log JIRAs for them and fix them to the best of my knowledge? I'd more 
or less keep the service shared but would implement it using a copy-on-write 
strategy, since I'd expect schema changes to be rare after startup.


Regarding the repetitive work that partly happens during planning, I'd 
suggest doing that during materialization registration instead, as is 
already mentioned in CalcitePrepareImpl#populateMaterializations. Would 
that be ok?


--

Kind regards,
--------
*Christian Beikov*


Re: Materialization performance

2017-08-28 Thread Christian Beikov
If the metadata was cached, that would be awesome, especially because 
that would also improve the performance regarding the metadata retrieval 
for the query currently being planned, although I am not sure how the 
caching would work since the RelNodes are mutable.


Have you considered implementing the filter tree index explained in the 
paper? As far as I understood, the whole thing only works when a 
redundant table elimination is implemented. Is that the case? If so, or 
if it can be done easily, I'd propose we initialize all the lookup 
structures during registration and use them during planning. This will 
improve planning time drastically and essentially handle the scalability 
problem you mention.


What other MV-related issues are on your personal todo list Jesus? I 
read the paper now and think I can help you in one place or another if 
you want.



Kind regards,
----
*Christian Beikov*
On 28.08.2017 at 08:13, Jesus Camacho Rodriguez wrote:

Hive does not use the Calcite SQL parser, thus we follow a different path
and did not experience the problem on the Calcite end. However, FWIW we
avoided reparsing the SQL every time a query was being planned by
creating/managing our own cache too.

The metadata providers implement some caching, thus I would expect that once
you avoid reparsing every MV, the retrieval time of predicates, lineage, etc.
would improve (at least after using the MV for the first time). However,
I agree that the information should be inferred when the MV is loaded.
In fact, maybe just making some calls to the metadata providers while the MVs
are being loaded would do the trick (Julian should confirm this).

Btw, probably you will find another scalability issue as the number of MVs
grows large with the current implementation of the rewriting, since the
pre-filtering implementation in place does not discard many of the views that
are not valid to rewrite a given query, and rewriting is attempted with all
of them.
This last bit is work that I would like to tackle shortly, but I have not
created the corresponding JIRA yet.

-Jesús
  




On 8/27/17, 10:43 PM, "Rajat Venkatesh"  wrote:


Thread safety and repeated parsing are both problems. We have experience with
managing 10s of materialized views. Repeated parsing takes more time than
execution of the query itself. We also have a similar problem where
concurrent queries (with a different set of materialized views potentially)
may be planned at the same time. We solved it by maintaining a cache
and carefully setting the cache in a thread local.
Relevant code for inspiration:
https://github.com/qubole/quark/blob/master/optimizer/src/main/java/org/apache/calcite/prepare/Materializer.java
https://github.com/qubole/quark/blob/master/optimizer/src/main/java/org/apache/calcite/plan/QuarkMaterializeCluster.java



On Sun, Aug 27, 2017 at 6:50 PM Christian Beikov 
wrote:


Hey, I have been looking a bit into how materialized views perform
during the planning because of a very long test
run (MaterializationTest#testJoinMaterializationUKFK6) and the current
state is problematic.

CalcitePrepareImpl#getMaterializations always reparses the SQL and down
the line, there is a lot of expensive work (e.g. predicate and lineage
determination) done during planning that could easily be pre-calculated
and cached during materialization creation.

There is also a bit of a thread safety problem with the current
implementation. Unless there is a different safety mechanism that I
don't see, the sharing of the MaterializationService and thus also the
maps in MaterializationActor via a static instance between multiple
threads is problematic.

Since I mentioned thread safety, how is Calcite supposed to be used in a
multi-threaded environment? Currently I use a connection pool that
initializes the schema on new connections, but that is not really nice.
I suppose caches are also bound to the connection? A thread safe context
that can be shared between connections would be nice to avoid all that
repetitive work.

Are these known issues that you have already thought about fixing, or should
I log JIRAs for them and fix them to the best of my knowledge? I'd more
or less keep the service shared but would implement it using a copy-on-write
strategy, since I'd expect schema changes to be rare after startup.

Regarding the repetitive work that partly happens during planning, I'd
suggest doing that during materialization registration instead, as is
already mentioned in CalcitePrepareImpl#populateMaterializations. Would
that be ok?

--

Kind regards,
--------
*Christian Beikov*





Re: Materialization performance

2017-08-29 Thread Christian Beikov
I'd like to stick to trying to figure out how to support outer joins for 
now and when I have an implementation for that, I'd look into the filter 
tree index if you haven't done it by then.



Kind regards,
--------
*Christian Beikov*
On 28.08.2017 at 20:01, Jesus Camacho Rodriguez wrote:

Christian,

The implementation of the filter tree index is what I was referring to
indeed. In the initial implementation I focused on the rewriting coverage,
but now that the first part is finished, it is at the top of my list as
I think it is critical to make the whole query rewriting algorithm work
at scale. However, I have not started yet.

The filter tree index will help to filter not only based on the tables used
by a given query, but also for queries that do not meet the equivalence
classes conditions, filter conditions, etc. We could implement all the
preconditions mentioned in the paper, and we could add our own additional
ones. I also think that in a second version, we might need to add
some kind of ranking/limit, as many views might meet the preconditions for
a given query.

It seems you understood how it should work, so if you could help to
quickstart that work by maybe implementing a first version of the filter
tree index with a couple of basic conditions (table matching and EC matching?),
that would be great. I could review any of the contributions you make.

-Jesús





On 8/28/17, 3:22 AM, "Christian Beikov"  wrote:


If the metadata was cached, that would be awesome, especially because
that would also improve the prformance regarding the metadata retrival
for the query currently being planned, although I am not sure how the
caching would work since the RelNodes are mutable.

Have you considered implementing the filter tree index explained in the
paper? As far as I understood, the whole thing only works when a
redundant table elimination is implemented. Is that the case? If so, or
if it can be done easily, I'd propose we initialize all the lookup
structures during registration and use them during planning. This will
improve planning time drastically and essentially handle the scalability
problem you mention.

What other MV-related issues are on your personal todo list Jesus? I
read the paper now and think I can help you in one place or another if
you want.


Kind regards,
----
*Christian Beikov*
On 28.08.2017 at 08:13, Jesus Camacho Rodriguez wrote:

Hive does not use the Calcite SQL parser, thus we follow a different path
and did not experience the problem on the Calcite end. However, FWIW we
avoided reparsing the SQL every time a query was being planned by
creating/managing our own cache too.

The metadata providers implement some caching, thus I would expect that once
you avoid reparsing every MV, the retrieval time of predicates, lineage, etc.
would improve (at least after using the MV for the first time). However,
I agree that the information should be inferred when the MV is loaded.
In fact, maybe just making some calls to the metadata providers while the MVs
are being loaded would do the trick (Julian should confirm this).

Btw, probably you will find another scalability issue as the number of MVs
grows large with the current implementation of the rewriting, since the
pre-filtering implementation in place does not discard many of the views that
are not valid to rewrite a given query, and rewriting is attempted with all
of them.
This last bit is work that I would like to tackle shortly, but I have not
created the corresponding JIRA yet.

-Jesús
   




On 8/27/17, 10:43 PM, "Rajat Venkatesh"  wrote:


Thread safety and repeated parsing are both problems. We have experience with
managing 10s of materialized views. Repeated parsing takes more time than
execution of the query itself. We also have a similar problem where
concurrent queries (with a different set of materialized views potentially)
may be planned at the same time. We solved it by maintaining a cache
and carefully setting the cache in a thread local.
Relevant code for inspiration:
https://github.com/qubole/quark/blob/master/optimizer/src/main/java/org/apache/calcite/prepare/Materializer.java
https://github.com/qubole/quark/blob/master/optimizer/src/main/java/org/apache/calcite/plan/QuarkMaterializeCluster.java



On Sun, Aug 27, 2017 at 6:50 PM Christian Beikov 
wrote:


Hey, I have been looking a bit into how materialized views perform
during the planning because of a very long test
run (MaterializationTest#testJoinMaterializationUKFK6) and the current
state is problematic.

CalcitePrepareImpl#getMaterializations always reparses the SQL and down
the line, there is a lot of expensive work (e.g. predicate and lineage
determination) done during planning that could easily be pre-calculated
and cached during materialization creation.

There is al

Re: Materialization performance

2017-08-29 Thread Christian Beikov
If it were a proper actor like you described it, concurrency wouldn't be 
a problem, but right now it is just a global holder for non-concurrent 
hash maps, which is the problem. Currently, it's simply not thread-safe.


I don't see a benefit in having request and response queues; I'd rather 
make registration and retrieval synchronous. Maybe you could explain to 
me why you were favoring that model? Having to go through concurrent 
queues for every planner invocation seems to me like overkill. I'd 
rather have immutable state being CASed (compare-and-swap) to make the 
querying cheap and do updates in an optimistic concurrency control manner.


What do you say? Would that be a possibility?


Kind regards,
--------
*Christian Beikov*
On 28.08.2017 at 21:31, Julian Hyde wrote:

I gave some thought to performance and thread safety when I added materialized 
view support. I didn’t follow through and test at high load and parallelism 
because at that point, functionality was more important. I’m glad we’re having 
the discussion now.

The solution I settled on is the actor model[1]. That means that one thread is 
responsible for accessing a critical data structure (in this case, the set of 
valid materialized views). Other threads are not allowed to modify, or even 
see, mutable state. They can send immutable requests, and get immutable objects 
back in response.

This is manifested as the MaterializationService and MaterializationActor; see 
the comment in the latter:

   // Not an actor yet -- TODO make members private and add request/response
   // queues
  
If we did that I think we would be well on the way to a thread-safe architecture. We can improve performance further, if necessary, by reducing the work that has to be done by the actor, as long as it alone is responsible for the mutable state.
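
As a toy sketch of that shape (not Calcite code, all names invented): one 
thread owns the only mutable map, and other threads only exchange messages 
through blocking queues:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** One thread runs the actor loop; any thread may use call(). */
class ActorSketch implements Runnable {
  static final class Request {
    final String key;    // what to register or look up
    final Object value;  // null means "look up only"
    final BlockingQueue<Object> replyTo = new ArrayBlockingQueue<>(1);
    Request(String key, Object value) { this.key = key; this.value = value; }
  }

  private final BlockingQueue<Request> mailbox = new ArrayBlockingQueue<>(64);
  private final Map<String, Object> state = new HashMap<>(); // actor-only state

  public void run() {
    try {
      for (;;) {
        Request request = mailbox.take();
        if (request.value != null) {
          state.put(request.key, request.value);
        }
        // Blocking queues reject null, so signal absence with a placeholder.
        Object result = state.get(request.key);
        request.replyTo.put(result == null ? "<absent>" : result);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  Object call(String key, Object value) throws InterruptedException {
    Request request = new Request(key, value);
    mailbox.put(request);          // one put ...
    return request.replyTo.take(); // ... and one take per request
  }
}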


Julian

[1] https://en.wikipedia.org/wiki/Actor_model 
<https://en.wikipedia.org/wiki/Actor_model>


On Aug 28, 2017, at 11:01 AM, Jesus Camacho Rodriguez 
 wrote:

Christian,

The implementation of the filter tree index is what I was referring to
indeed. In the initial implementation I focused on the rewriting coverage,
but now that the first part is finished, it is at the top of my list as
I think it is critical to make the whole query rewriting algorithm work
at scale. However, I have not started yet.

The filter tree index will help to filter not only based on the tables used
by a given query, but also for queries that do not meet the equivalence
classes conditions, filter conditions, etc. We could implement all the
preconditions mentioned in the paper, and we could add our own additional
ones. I also think that in a second version, we might need to add
some kind of ranking/limit, as many views might meet the preconditions for
a given query.

It seems you understood how it should work, so if you could help to
quickstart that work by maybe implementing a first version of the filter
tree index with a couple of basic conditions (table matching and EC matching?),
that would be great. I could review any of the contributions you make.

-Jesús





On 8/28/17, 3:22 AM, "Christian Beikov"  wrote:


If the metadata was cached, that would be awesome, especially because
that would also improve the prformance regarding the metadata retrival
for the query currently being planned, although I am not sure how the
caching would work since the RelNodes are mutable.

Have you considered implementing the filter tree index explained in the
paper? As far as I understood, the whole thing only works when a
redundant table elimination is implemented. Is that the case? If so, or
if it can be done easily, I'd propose we initialize all the lookup
structures during registration and use them during planning. This will
improve planning time drastically and essentially handle the scalability
problem you mention.

What other MV-related issues are on your personal todo list Jesus? I
read the paper now and think I can help you in one place or another if
you want.


Kind regards,
----
*Christian Beikov*
On 28.08.2017 at 08:13, Jesus Camacho Rodriguez wrote:

Hive does not use the Calcite SQL parser, thus we follow a different path
and did not experience the problem on the Calcite end. However, FWIW we
avoided reparsing the SQL every time a query was being planned by
creating/managing our own cache too.

The metadata providers implement some caching, thus I would expect that once
you avoid reparsing every MV, the retrieval time of predicates, lineage, etc.
would improve (at least after using the MV for the first time). However,
I agree that the information should be inferred when the MV is loaded.
In fact, maybe just making some calls to the metadata providers while the MVs
are being loaded would do the trick (Julian should confi

Re: Materialization performance

2017-08-29 Thread Christian Beikov
Imagine the holder of the various hash maps is immutable, let's call it 
"actor". When a new registration is done, we create a copy of that 
holder and CAS it. When we query, we simply get the current value and 
access its maps. So MaterializationService could have an 
AtomicReference to a holder "actor" just like right now, but we make the 
maps immutable and create copies whenever a change occurs. We could hide 
such details behind a message passing interface so that remote models 
can be implemented too, but that seems like a next step.
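
A minimal sketch of that copy-on-write scheme, with made-up names and a 
single map standing in for the actor's state:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

import com.google.common.collect.ImmutableMap;

/** Readers see consistent immutable snapshots; writers copy, modify
 *  and CAS, retrying if another writer won the race in the meantime. */
class CopyOnWriteStateSketch {
  private final AtomicReference<ImmutableMap<String, Object>> holder =
      new AtomicReference<>(ImmutableMap.of());

  Object lookup(String key) {
    return holder.get().get(key); // plain read of an immutable snapshot
  }

  void register(String key, Object materialization) {
    for (;;) {
      ImmutableMap<String, Object> current = holder.get();
      Map<String, Object> copy = new HashMap<>(current);
      copy.put(key, materialization);
      if (holder.compareAndSet(current, ImmutableMap.copyOf(copy))) {
        return; // CAS succeeded, the new snapshot is published
      }
      // Another registration slipped in; the loop re-reads and retries.
    }
  }
}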


The materialization concurrency issue isn't the only problem; what 
about the general usage in multithreaded environments? The whole schema 
is currently bound to a CalciteConnection. It would be nice if all the 
context could be shared between multiple connections so that we avoid 
having to initialize every connection. Do you have any plans to tackle 
that or am I not seeing how to achieve this?



Kind regards,
--------
*Christian Beikov*
On 29.08.2017 at 19:40, Julian Hyde wrote:

I'd rather have immutable state being CASed (compare-and-swap) to make
the querying cheap and do updates in an optimistic concurrency control manner.

Compare and swap only works for one memory address. You can't use it
to, say, debit one bank account and credit another.

The set of valid materializations is just about the only mutable state
in Calcite and I think it will need to be several interconnected data
structures. So, compare-and-swap (or its high-level equivalent,
ConcurrentHashMap) won't cut it.

So we could use locks/monitors (the "synchronized" keyword) or we
could use an actor. The key difference between the two is who does the
work. With a monitor, each customer grabs the key (there is only one
key), walks into the bank vault, and moves the money from one deposit
box to another. With an actor, there is a bank employee in the vault
who is the only person allowed to move money around.

The work done is the same in both models. There are performance
advantages of the actor model (the data structures will tend to exist
in one core's cache) and there are code simplicity advantages (the
critical code is all in one class or package).

The overhead of two puts/gets on an ArrayBlockingQueue per request is
negligible. And besides, you can switch to a non-actor implementation
of the service if Calcite is single-threaded.

I haven't thought out the details of multi-tenant. It is not true to
say that this is "not a primary requirement for
the Calcite project." Look at the "data grid (cache)" on the diagram
in my "Optiq" talk [1] from 2013. Dynamic materialized views were in
from the very start. There can be multiple instances of the actor
(each with their own request/response queues), so you could have one
per tenant. Also, it is very straightforward to make the actors
remote, replacing the queues with RPC over a message broker. Remote
actors are called services.

Julian

[1] 
https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework

On Tue, Aug 29, 2017 at 8:25 AM, Jesus Camacho Rodriguez
 wrote:

LGTM, I think by the time we have support for the outer joins, I might have
had time to finish the filter tree index implementation too.

-Jesús



On 8/29/17, 3:11 AM, "Christian Beikov"  wrote:


I'd like to stick to trying to figure out how to support outer joins for
now and when I have an implementation for that, I'd look into the filter
tree index if you haven't done it by then.


Kind regards,

*Christian Beikov*
On 28.08.2017 at 20:01, Jesus Camacho Rodriguez wrote:

Christian,

The implementation of the filter tree index is what I was referring to
indeed. In the initial implementation I focused on the rewriting coverage,
but now that the first part is finished, it is at the top of my list as
I think it is critical to make the whole query rewriting algorithm work
at scale. However, I have not started yet.

The filter tree index will help to filter not only based on the tables used
by a given query, but also for queries that do not meet the equivalence
classes conditions, filter conditions, etc. We could implement all the
preconditions mentioned in the paper, and we could add our own additional
ones. I also think that in a second version, we might need to add
some kind of ranking/limit, as many views might meet the preconditions for
a given query.

It seems you understood how it should work, so if you could help to
quickstart that work by maybe implementing a first version of the filter
tree index with a couple of basic conditions (table matching and EC matching?),
that would be great. I could review any of the contributions you make.

-Jesús





On 8/28/17, 3:22 AM, "Christian Beikov" wrote:

Re: Materialization performance

2017-08-30 Thread Christian Beikov
Of course steps 2 and 3 depend on what you read, but if a change happens 
in the meantime your CAS will fail in step 3, since all changes are done 
through such a CAS, so you have to "redo" the transaction or parts of 
it. That's basically optimistic locking :)


The important part is that the whole holder is replaced, so you can 
guarantee safety with a single CAS. Imagine the actor field in 
MaterializationService is wrapped by an AtomicReference and all maps in 
the MaterializationActor are immutable. The only way to change a thing 
is to read the actor, create a copy of it with the new state and do a 
CAS through the atomic reference. That would already solve all thread 
safety issues that the current design has.


Could you maybe comment on the context sharing between connections part too?


Kind regards,
--------
*Christian Beikov*
On 30.08.2017 at 21:31, Julian Hyde wrote:

Consider a “transaction” that involves reads and writes:

   Read from a data structure
   Do some stuff
   Write to the data structure

If steps 2 and 3 depend on what you read in step 1, then you need to prevent 
anyone from writing until you have written. A simple CAS won’t solve this. The 
simplest solution is for the whole transaction to be in a critical section. It 
doesn’t really matter whether that is implemented using an actor or 
synchronized blocks.

We are mostly in agreement - especially about using immutable data structures 
for anything shared between threads.

Julian



On Aug 29, 2017, at 2:01 PM, Christian Beikov  
wrote:

Imagine the holder of the various hash maps is immutable, let's call it "actor". When a 
new registration is done, we create a copy of that holder and CAS it. When we query, we simply get 
the current value and access its maps. So MaterializationService could have an AtomicReference to 
a holder "actor" just like right now, but we make the maps immutable and create copies 
whenever a change occurs. We could hide such details behind a message passing interface so that 
remote models can be implemented too, but that seems like a next step.

The materialization concurrency issue isn't the only problem; what about the 
general usage in multithreaded environments? The whole schema is currently 
bound to a CalciteConnection. It would be nice if all the context could be 
shared between multiple connections so that we avoid having to initialize every 
connection. Do you have any plans to tackle that or am I not seeing how to 
achieve this?


Kind regards,
--------
*Christian Beikov*
On 29.08.2017 at 19:40, Julian Hyde wrote:

I'd rather have immutable state being CASed (compare-and-swap) to make
the querying cheap and do updates in an optimistic concurrency control manner.

Compare and swap only works for one memory address. You can't use it
to, say, debit one bank account and credit another.

The set of valid materializations is just about the only mutable state
in Calcite and I think it will need to be several interconnected data
structures. So, compare-and-swap (or its high-level equivalent,
ConcurrentHashMap) won't cut it.

So we could use locks/monitors (the "synchronized" keyword) or we
could use an actor. The key difference between the two is who does the
work. With a monitor, each customer grabs the key (there is only one
key), walks into the bank vault, and moves the money from one deposit
box to another. With an actor, there is a bank employee in the vault
who is the only person allowed to move money around.

The work done is the same in both models. There are performance
advantages of the actor model (the data structures will tend to exist
in one core's cache) and there are code simplicity advantages (the
critical code is all in one class or package).

The overhead of two puts/gets on an ArrayBlockingQueue per request is
negligible. And besides, you can switch to a non-actor implementation
of the service if Calcite is single-threaded.

I haven't thought out the details of multi-tenant. It is not true to
say that this is "not a primary requirement for
the Calcite project." Look at the "data grid (cache)" on the diagram
in my "Optiq" talk [1] from 2013. Dynamic materialized views were in
from the very start. There can be multiple instances of the actor
(each with their own request/response queues), so you could have one
per tenant. Also, it is very straightforward to make the actors
remote, replacing the queues with RPC over a message broker. Remote
actors are called services.

Julian

[1] 
https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework

On Tue, Aug 29, 2017 at 8:25 AM, Jesus Camacho Rodriguez
 wrote:

LGTM, I think by the time we have support for the outer joins, I might have
had t

Re: Materialization performance

2017-08-31 Thread Christian Beikov
My CAS scheme was merely meant for materialization registration. The 
retry is an implementation detail and would happen inside the 
registration method; the user wouldn't notice it. APIs stay the way 
they are; I'd only change the way the MaterializationActor is accessed.


The (root) schema is one part that I'd like to see shared, but I 
guess the type factory as well as the CalciteServer should be shared too 
between connections. Is there anything else you think that can/should be 
shared?


I could implement the discussed sharing as a javax.sql.DataSource if you 
want, so we can discuss specifics. Along the way I'd try to do some 
general performance improvements regarding concurrency synchronization 
mechanisms. Would that be ok?



Kind regards,
--------
*Christian Beikov*
On 31.08.2017 at 00:02, Julian Hyde wrote:

Schema is the context you are referring to. Schema has a longer
lifespan than Connection, and if you make an immutable one (which we
recommend) you can share it among connections.

Your CAS scheme would work but requires each user to create a copy of
all of the materialization state. This is potentially large (thousands
of tables) and rapidly changing. Also, your scheme requires the user
to re-try. I think the actor model is better suited for this.

On Wed, Aug 30, 2017 at 2:14 PM, Christian Beikov
 wrote:

Of course steps 2 and 3 depend on what you read, but if a change happens in
the meantime your CAS will fail in step 3, since all changes are done
through such a CAS, so you have to "redo" the transaction or parts of it.
That's basically optimistic locking :)

The important part is that the whole holder is replaced, so you can
guarantee safety with a single CAS. Imagine the actor field in
MaterializationService is wrapped by an AtomicReference and all maps in the
MaterializationActor are immutable. The only way to change a thing is to
read the actor, create a copy of it with the new state and do a CAS through
the atomic reference. That would already solve all thread safety issues that
the current design has.

Could you maybe comment on the context sharing between connections part too?


Kind regards,
--------
*Christian Beikov*
On 30.08.2017 at 21:31, Julian Hyde wrote:

Consider a “transaction” that involves reads and writes:

Read from a data structure
Do some stuff
Write to the data structure

If steps 2 and 3 depend on what you read in step 1, then you need to
prevent anyone from writing until you have written. A simple CAS won’t solve
this. The simplest solution is for the whole transaction to be in a critical
section. It doesn’t really matter whether that is implemented using an actor
or synchronized blocks.

We are mostly in agreement - especially about using immutable data
structures for anything shared between threads.

Julian



On Aug 29, 2017, at 2:01 PM, Christian Beikov
 wrote:

Imagine the holder of the various hash maps is immutable, let's call it
"actor". When a new registration is done, we create a copy of that holder
and CAS it. When we query, we simply get the current value and access its
maps. So MaterializationService could have an AtomicReference to a holder
"actor" just like right now, but we make the maps immutable and create
copies whenever a change occurs. We could hide such details behind a message
passing interface so that remote models can be implemented too, but that
seems like a next step.

The materialization concurrency issue isn't the only problem; what about
the general usage in multithreaded environments? The whole schema is
currently bound to a CalciteConnection. It would be nice if all the context
could be shared between multiple connections so that we avoid having to
initialize every connection. Do you have any plans to tackle that or am I
not seeing how to achieve this?


Kind regards,
----
*Christian Beikov*
On 29.08.2017 at 19:40, Julian Hyde wrote:

I'd rather have immutable state being CASed (compare-and-swap) to make
the querying cheap and do updates in an optimistic concurrency control
manner.

Compare and swap only works for one memory address. You can't use it
to, say, debit one bank account and credit another.

The set of valid materializations is just about the only mutable state
in Calcite and I think it will need to be several interconnected data
structures. So, compare-and-swap (or its high-level equivalent,
ConcurrentHashMap) won't cut it.

So we could use locks/monitors (the "synchronized" keyword) or we
could use an actor. The key difference between the two is who does the
work. With a monitor, each customer grabs the key (there is only one
key), walks into the bank vault,

Re: JdbcAdapter Sort Limit and Offset

2017-08-31 Thread Christian Beikov
I might not be qualified to answer how to handle limit/offset, but I'd 
like to step in and say that we had a related discussion regarding where 
to put sequence support. How about introducing unparse methods in the 
SqlDialect for that stuff and letting the dialects handle it however the 
DBMS supports it? Since the dialect is constructed with the database 
metadata, it could even do further self-configuration, i.e. check whether 
settings are enabled on the DBMS that allow the use of a certain 
capability.


If we want to support DBMSs that don't have a clause we can use 
for limit/offset, we need to wrap the query and use whatever the 
DBMS has to offer, e.g. ROWNUM.


I'd like to suggest we introduce a SqlDialectFactory which can be 
configured on the JdbcSchema. The sole purpose of that factory is to 
read the database metadata and construct an appropriate SqlDialect; 
that dialect can then do some further self-configuration with the same 
metadata. We'd make the current SqlDialect abstract and create 
implementations for at least every DBMS that we support. We could still 
have components like e.g. a SequenceExtractor or LimitOffsetRenderer, 
but the dialects choose the implementation based on the database 
metadata.
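
Roughly, I'm imagining something like the following sketch; the factory 
interface is the part I'm proposing, the names are illustrative, and none 
of this exists yet:

import java.sql.DatabaseMetaData;
import java.sql.SQLException;

import org.apache.calcite.sql.SqlDialect;

/** Proposed: pluggable on JdbcSchema, builds a dialect from metadata. */
interface SqlDialectFactory {
  SqlDialect create(DatabaseMetaData metaData) throws SQLException;
}

/** A default factory could dispatch on the product name and let the
 *  chosen dialect configure itself further from the same metadata. */
class DefaultSqlDialectFactory implements SqlDialectFactory {
  public SqlDialect create(DatabaseMetaData metaData) throws SQLException {
    String productName = metaData.getDatabaseProductName();
    if ("PostgreSQL".equals(productName)) {
      // e.g. return a PostgreSQL-specific dialect here (hypothetical class);
      // an Oracle dialect could wrap queries with ROWNUM for limit/offset.
    }
    return SqlDialect.create(metaData); // fall back to today's behavior
  }
}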


What do you think?


Kind regards,
--------
*Christian Beikov*
On 31.08.2017 at 02:23, Luis Fernando Kauer wrote:

Hi,

I was really upset that Calcite wouldn't push the LIMIT keyword to the database, not
even in simple queries like "select * from t limit 10".
Actually, not even ORDER BY was pushed to the database even in simple queries.
So I decided to take a look into it and opened Jira [CALCITE-1906] some time 
ago.
I found out that SortRemoveRule is applied when there is a 
RelCollationTraitDef, because it assumes that the table is already ordered, so 
JdbcSortRule must replace the input's treatSet with RelCollations.EMPTY, just 
like MongoDB and Cassandra adapters do.
Besides that, a simple fix to JdbcSortRule and JdbcSort allows it to push limit 
and fetch to the database.
Unfortunately, after fixing this locally, other issues appeared.
When issuing a query with sort and no limit/fetch, like:
SELECT empno, ename FROM "SCOTT"."EMP" where ename='CLARK' order by ename
the selected plan is:
JdbcToEnumerableConverter
  JdbcProject(EMPNO=[$0], ENAME=[$1])
    JdbcSort(sort0=[$1], dir0=[ASC])
      JdbcFilter(condition=[=($1, 'CLARK')])
        JdbcTableScan(table=[[SCOTT, EMP]])

Notice that JdbcProject is above JdbcSort, and that's because 
SortProjectTransposeRule is applied.
The problem is that the generated SQL is:
SELECT "EMPNO", "ENAME"
FROM (SELECT *
FROM "SCOTT"."EMP"
WHERE "ENAME" = 'CLARK'
ORDER BY "ENAME" NULLS LAST) AS "t0"
Noticed that the JdbcProject uses a subquery instead of being built at the same 
query level.
The problem is that SqlImplementor.Clause is ordered, "SELECT" comes before 
"ORDER_BY", and SqlImplementor.Builder has the following check:
for (Clause clause : clauses) {
  if (maxClause.ordinal() > clause.ordinal()
      || (maxClause == clause && !nonWrapSet.contains(clause))) {
    needNew = true;
  }
}
So what happens is that "PROJECT" is added after "ORDER_BY" and the code above 
flags the need for a new query.
Changing SELECT's position in SqlImplementor.Clause to be last fixes this issue, but 
I don't know if there is any side effect or problem in doing this.
I also noticed that after fixing all this, the 
JdbcAdapterTests.testColumnNullability was failing and the problem seems to be 
how JdbcAdapter is generating the SQL for HSQLDB now that the limit clause is 
pushed to the database. The generated SQL is:
SELECT "employee_id", "position_id"
FROM "foodmart"."employee"
FETCH NEXT 10 ROWS ONLY
And the error is:
Caused by: org.hsqldb.HsqlException: unexpected token: NEXT : line: 3
Seems that HSQLDB has a problem with FETCH NEXT without OFFSET. One solution could be to replace 
"NEXT" with "FIRST" in SqlPrettyWriter, but that would apply to all the databases 
that support this feature; or we could just make HSQLDB not support OffsetFetch in SqlDialect, so it 
would use LIMIT/OFFSET instead.
Sorry to bring all this together, but one thing led to another, and fixing one 
thing started to break other tests and I would really like to have this fixed.
Best regards,
Luis Fernando




Sequence support

2017-09-07 Thread Christian Beikov

Hey,

I'm having some problems implementing CALCITE-1940 and wanted to ask you 
for advice about how to tackle it.


A query like "select next value for seq1" where "seq1" is a sequence in 
e.g. PostgreSQL will generate a query plan like "LogicalProject(NEXT 
VALUE FOR seq1) -> LogicalValues( (0) )".


As far as I understood, the convention to use for executing such a query 
is determined based on the input. The planner now always chooses the 
"EnumerableValues" implementation for "LogicalValues" thus the statement 
is never pushed down to PostgreSQL.


My question is, how would I be able to make the expression "NEXT VALUE 
FOR seq1", represented by the new subtype RexSeqCall, be executed on 
PostgreSQL? The sequence "seq1" is an entry in the JdbcSchema, but the 
current code doesn't seem to know that. During planning it seems only 
the input types are considered for the convention determination.


I hope you understand the problem I am having. Does anyone have ideas on how 
to fix that?


I thought it might be possible to write a planner rule that finds such 
expressions and transforms them, but that's only one part I guess. 
Currently, the planner thinks a plan like "EnumerableProject(NEXT VALUE 
FOR seq1) -> EnumerableValues( (0) )" is better than "JdbcProject(NEXT 
VALUE FOR seq1) -> JdbcValues( (0) )", which it isn't. Any ideas how I 
could fix that?
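
For the "finds such expressions" part, a sketch of what the detection could 
look like. Since RexSeqCall is only my proposal and doesn't exist, this 
assumes the expression is a plain RexCall on the standard NEXT_VALUE 
operator instead; it could serve as the matching condition of such a rule:

import org.apache.calcite.rex.RexCall;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexVisitorImpl;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.util.Util;

public class SequenceCallDetector {
  /** Returns whether the expression contains a NEXT VALUE FOR call. */
  public static boolean containsSequenceCall(RexNode node) {
    try {
      node.accept(
          new RexVisitorImpl<Void>(true) {
            @Override public Void visitCall(RexCall call) {
              if (call.getOperator() == SqlStdOperatorTable.NEXT_VALUE) {
                throw Util.FoundOne.NULL; // found one, stop visiting
              }
              return super.visitCall(call);
            }
          });
      return false;
    } catch (Util.FoundOne e) {
      return true;
    }
  }
}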


--

Kind regards,

*Christian Beikov*


Re: adding CREATE TABLE DDL support?

2017-09-10 Thread Christian Beikov

Sounds like a good idea.


Kind regards,

*Christian Beikov*
On 08.09.2017 at 19:28, Julian Hyde wrote:

It’s in core/src/test/codegen/includes/parserImpls.ftl (i.e. a parser extension 
used in tests) but we don’t want to add it to the core parser because other 
projects want to have their own CREATE TABLE syntax.

I have been thinking of moving DDL into a separate module, so if people want to 
use Calcite+DDL in production they could do so. What do people think?

Julian




On Sep 8, 2017, at 7:53 AM, 王翔宇  wrote:

Looks like CREATE is not currently supported by the parser. I wanna add it.




Re: 1.14.0 Release candidate

2017-09-13 Thread Christian Beikov
I could set up test automation on Travis CI for most of the different
databases if you want, with some description and scripts for how to run the
Docker containers for DBs locally. Would that help?
It's almost as flexible as Jenkins.

Regards,
Christian

On 13.09.2017 at 22:46, "Michael Mior" wrote:

> Created https://issues.apache.org/jira/browse/CALCITE-1985 to track the
> issues with the MongoDB adapter. If someone else could have a quick look
> that would be great. Failed test output is in there.
>
> --
> Michael Mior
> mm...@apache.org
>
> 2017-09-13 14:29 GMT-04:00 Michael Mior :
>
> > That's the major blocker here. I don't want to make a release when the
> > Cassandra adapter is completely broken for me. Although if others can
> > confirm that there are no issues, then maybe we should proceed and I'll
> > figure out the issues with my environment later.
> >
> > There are also a couple integration test failures on MongoDB that I'd
> like
> > to resolve, but at least one of those seems spurious as the generated
> plan
> > also seems ok. I'll open up JIRA for those and any failures I encounter
> > with details in case someone else is able to have a quick look.
> >
> > Other than test failures, I don't believe there are any outstanding bug
> > fixes or PRs that need to be merged. As mentioned earlier, if someone
> could
> > run integration tests for Oracle, that would be great. I had a brief look
> > over the Coverity scan results and didn't see anything that looks worth
> > blocking over, but I think it would be a good idea to have a more
> thorough
> > review in the future and also to set up some models to prune spurious
> > warnings so it's easier to take action in the future.
> >
> > --
> > Michael Mior
> > mm...@apache.org
> >
> > 2017-09-13 14:10 GMT-04:00 Julian Hyde :
> >
> >> How close are we to a release candidate? Commits to master are paused,
> >> so let's either make an RC soon or re-open the branch to commits.
> >>
> >> Michael and I seem to be deadlocked on
> >> https://issues.apache.org/jira/browse/CALCITE-1981. Michael gets a
> >> build error every time, I never get a build error or a runtime error.
> >> (I got one once - due to stale jars on my class path, I think.)
> >>
> >> Can someone else please try to reproduce the build error? Then we'll
> >> know whether it's me or Michael who has a messed-up environment.
> >>
> >> Julian
> >>
> >>
> >> On Fri, Sep 8, 2017 at 1:59 PM, Julian Hyde  wrote:
> >> > I don’t think there is anyone who has knowledge of MongoDB and time to
> >> make the fixes.
> >> >
> >> >> On Sep 7, 2017, at 10:06 AM, Michael Mior 
> wrote:
> >> >>
> >> >> Thanks. I'm also seeing some integration test failures for the
> MongoDB
> >> >> adapter. If someone more familiar with Mongo could check that out,
> that
> >> >> would be great.
> >> >>
> >> >> --
> >> >> Michael Mior
> >> >> mm...@apache.org
> >> >>
> >> >> 2017-09-07 12:51 GMT-04:00 Julian Hyde :
> >> >>
> >> >>> OK, doing that now. The coverity config is a little rusty, so it may
> >> >>> take a while before I have it working. I'll let you know.
> >> >>>
> >> >>> I discovered a couple of tests that were failing on windows, so I'll
> >> >>> be committing fixes for those also.
> >> >>>
> >> >>> Other PRs should wait until after the release.
> >> >>>
> >> >>> On Wed, Sep 6, 2017 at 6:21 PM, Michael Mior 
> >> wrote:
> >>  Could you also trigger a new Coverity scan? I don't have write
> >> access to
> >>  your repo.
> >> 
> >>  --
> >>  Michael Mior
> >>  mm...@apache.org
> >> 
> >>  2017-09-06 15:50 GMT-04:00 Julian Hyde :
> >> 
> >> > I’m good to go.
> >> >
> >> > Today, I will check that Calcite build / test on still works
> >> Windows.
> >> >
> >> > Julian
> >> >
> >> >> On Sep 6, 2017, at 11:48 AM, Michael Mior 
> >> wrote:
> >> >>
> >> >> As far as I know, all changes people are hoping to have in the
> >> 1.14.0
> >> >> release have landed. Please speak up if that is not the case,
> >> >>> otherwise
> >> > I'm
> >> >> hoping to prepare RC0 for tomorrow.
> >> >>
> >> >> --
> >> >> Michael Mior
> >> >> mm...@apache.org
> >> >
> >> >
> >> >>>
> >> >
> >>
> >
> >
>


Re: Using a calcite DB locally as well as exposing it over Avatica?

2017-09-17 Thread Christian Beikov
I might be wrong, but as far as my knowledge goes, you can't have more than
a single connection to the same schema right now. I am planning to work on
that soon by implementing a datasource. As part of that I would take a look
at how Avatica's remote capabilities fit in. Until then, I'd rather
stick to testing for now.

Regards,
Christian


On 18.09.2017 at 8:36 AM, "Nick Stephen" wrote:

Hi folks,

We’ve started experimenting with Calcite for some work and have been
running it in-process using a datasource wrapped around a local connection
(using jdbc:calcite).

I’d now like to expose this same database remotely using avatica, but can’t
see any APIs in avatica which can take a Connection object. If I create a
Meta with the same local URL then I understandably get a different DB
instance.

Could someone point out the appropriate API, or are there any examples of
exposing a calcite DB remotely using avatica whilst still being able to use
it locally in-process to avoid the additional network hop.

Thanks!

Nick


Travis CI build

2017-09-19 Thread Christian Beikov
I just wanted to give you a little update on this since I have to wait 
for a response now.


I tried to debug why the build hangs on the Travis CI infrastructure and 
it seems this is related to a known issue regarding parallel execution 
of parameterized tests: https://issues.apache.org/jira/browse/SUREFIRE-1264


With parallel execution disabled, the tests should run without an 
issue. I suggest we introduce a property for configuring this and 
disable parallel execution on Travis CI until this issue is resolved. 
Would that be ok?


--

Kind regards,

*Christian Beikov*


Re: 1.14.0 Release candidate

2017-09-19 Thread Christian Beikov
I'd say whoever broke the build should take a look at it, but that's 
hard to tell now I guess.


I'm working on setting up build automation on Travis CI which will help 
with seeing what or who broke the build, but that will only help in the 
future. For now I'd say: depending on whether the MongoDB adapter is
considered production-ready, either ignore the problems or cancel the release.



Mit freundlichen Grüßen,
--------
*Christian Beikov*
Am 20.09.2017 um 04:18 schrieb Jesus Camacho Rodriguez:

Michael,

It seems nobody is willing to tackle CALCITE-1985 shortly, thus
we need to unblock the situation.

I would suggest we move it to 1.15.0 instead of qualifying it
as blocker for 1.14.0, since it seems it may cause a performance
regression for the MongoDB adapter, but there is no loss of
functionality and rest of tests are passing. We could add a
release note if necessary.

Otherwise, we need to cancel the release and wait for it to
be fixed, since we have been in the release cycle for a while
now holding all commits to master branch.

I would like to hear what others think about this too?
Other ideas?

Thanks,
-Jesús




On 9/20/17, 12:08 AM, "Michael Mior"  wrote:


Anyone willing to look at https://issues.apache.org/jira/browse/CALCITE-1985?
As far as I'm concerned, this is the only blocker for the release since it
seems I'm the only one who has issues with the Cassandra adapter.

--
Michael Mior
mm...@apache.org

2017-09-14 14:50 GMT-04:00 Michael Mior :


LGTM. I'm just waiting on resolving CALCITE-1981 and CALCITE-1985. Jesús,
any changes related to MV rewriting that may have caused CALCITE-1981?
Also, would appreciate anyone having a look at the Mongo failures in
CALCITE-1985. Other than that, I think we're good to release.

--
Michael Mior
mm...@uwaterloo.ca

2017-09-14 14:46 GMT-04:00 Julian Hyde :


Michael,

May I commit https://github.com/apache/calcite/pull/537? It's just a
javadoc change.

Julian


On Wed, Sep 13, 2017 at 9:25 PM, Christian Beikov
 wrote:

I could set up test automation on Travis CI for most of the different
databases if you want, with some description and scripts for how to run the
docker containers for DBs locally. Would that help?
It's almost as flexible as Jenkins.

Regards,
Christian

Am 13.09.2017 22:46 schrieb "Michael Mior" :


Created https://issues.apache.org/jira/browse/CALCITE-1985 to track

the

issues with the MongoDB adapter. If someone else could have a quick

look

that would be great. Failed test output is in there.

--
Michael Mior
mm...@apache.org

2017-09-13 14:29 GMT-04:00 Michael Mior :


That's the major blocker here. I don't want to make a release when

the

Cassandra adapter is completely broken for me. Although if others can
confirm that there are no issues, then maybe we should proceed and

I'll

figure out the issues with my environment later.

There are also a couple integration test failures on MongoDB that I'd

like

to resolve, but at least one of those seems spurious as the generated

plan

also seems ok. I'll open up JIRA for those and any failures I

encounter

with details in case someone else is able to have a quick look.

Other than test failures, I don't believe there are any outstanding

bug

fixes or PRs that need to be merged. As mentioned earlier, if someone

could

run integration tests for Oracle, that would be great. I had a brief

look

over the Coverity scan results and didn't see anything that looks

worth

blocking over, but I think it would be a good idea to have a more

thorough

review in the future and also to set up some models to prune spurious
warnings so it's easier to take action in the future.

--
Michael Mior
mm...@apache.org

2017-09-13 14:10 GMT-04:00 Julian Hyde :


How close are we to a release candidate? Commits to master are

paused,

so let's either make an RC soon or re-open the branch to commits.

Michael and I seem to be deadlocked on
https://issues.apache.org/jira/browse/CALCITE-1981. Michael gets a
build error every time, I never get a build error or a runtime

error.

(I got one once - due to stale jars on my class path, I think.)

Can someone else please try to reproduce the build error? Then we'll
know whether it's me or Michael who has a messed-up environment.

Julian


On Fri, Sep 8, 2017 at 1:59 PM, Julian Hyde 

wrote:

I don’t think there is anyone who has knowledge of MongoDB and

time to

make the fixes.

On Sep 7, 2017, at 10:06 AM, Michael Mior 

wrote:

Thanks. I'm also seeing some integration test failures for the

MongoDB

adapter. If someone more familiar with Mongo could check that

out,

that

would be great.

--
Michael Mior
mm...@apache.org

2017-09-07 12:51 GMT-04:00 Julian Hyde :


OK, doing that now. The coverity config is a little rusty, so

it may

take a while before I have it working. I'll let you know.

Re: Travis CI build

2017-09-22 Thread Christian Beikov
Here is the PR for disabling parallelism on Travis CI. I also took the 
opportunity to fix the JDK7 problem.


https://github.com/apache/calcite/pull/542


Mit freundlichen Grüßen,

*Christian Beikov*
Am 20.09.2017 um 03:34 schrieb Julian Hyde:

Adding a parameter to control parallelism sounds fine. I’d prefer that the 
tests continue to run in parallel by default, but it’s fine to override in CI.



On Sep 19, 2017, at 4:34 AM, Christian Beikov  
wrote:

I just wanted to give you a little update on this since I have to wait for a 
response now.

I tried to debug why the build hangs on the Travis CI infrastructure and it 
seems this is related to a known issue regarding parallel execution of 
parameterized tests: https://issues.apache.org/jira/browse/SUREFIRE-1264

When disabling the parallel execution, the tests should run without an issue. I 
suggest we introduce a property for configuring this and disable parallel 
execution on Travis CI until this issue is resolved. Would that be ok?

--

Mit freundlichen Grüßen,

*Christian Beikov*




Re: Fwd: ElasticSearch5 not working

2017-09-26 Thread Christian Beikov
Sqlline is problematic as it will have both Elasticsearch versions on 
the classpath. I haven't thought about that before. In the meantime you 
could remove old versions of libraries from the 
"target/fullclasspath.txt" file to work around the classpath problem.


@Julian Hyde: How should we deal with something like this? We'd need 
isolated "modules" instead of a flat class path for sqlline. Maybe we 
could build sqlline on JBoss 
Modules(https://github.com/jboss-modules/jboss-modules) to achieve that?



Mit freundlichen Grüßen,
----
*Christian Beikov*
Am 27.09.2017 um 01:30 schrieb Kunwar Deep Singh Toor:

-- Forwarded message --
From: Kunwar Deep Singh Toor 
Date: Mon, Sep 25, 2017 at 8:16 PM
Subject: ElasticSearch5 not working
To: dev-i...@calcite.apache.org


Hi,

  I have been trying to connect to Elasticsearch using the ES adapter
through sqlline. I was able to connect to ES 2.1.4 using the ES2 adapter
but when I try connecting to ES 5.1.1 using the ES5 adapter the following
error pops up, I checked the logs for ElasticSearch and it does not show  a
connection attempt being made.

The following is the error:
java.lang.NoSuchMethodError: org.elasticsearch.transport.cl
ient.PreBuiltTransportClient.addPlugins(Ljava/util/Collectio
n;Ljava/util/Collection;)Ljava/util/Collection;
 at org.elasticsearch.transport.client.PreBuiltTransportClient.<
init>(PreBuiltTransportClient.java:130)
 at org.elasticsearch.transport.client.PreBuiltTransportClient.<
init>(PreBuiltTransportClient.java:116)
 at org.elasticsearch.transport.client.PreBuiltTransportClient.<
init>(PreBuiltTransportClient.java:106)
 at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Sche
ma.open(Elasticsearch5Schema.java:119)
 at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Sche
ma.<init>(Elasticsearch5Schema.java:74)
 at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Sche
maFactory.create(Elasticsearch5SchemaFactory.java:56)
 at org.apache.calcite.model.ModelHandler.visit(ModelHandler.java:269)
 at org.apache.calcite.model.JsonCustomSchema.accept(JsonCustomS
chema.java:45)
 at org.apache.calcite.model.ModelHandler.visit(ModelHandler.java:195)
 at org.apache.calcite.model.ModelHandler.<init>(ModelHandler.java:87)
 at org.apache.calcite.jdbc.Driver$1.onConnectionInit(Driver.java:104)
 at org.apache.calcite.avatica.UnregisteredDriver.connect(Unregi
steredDriver.java:139)
 at sqlline.DatabaseConnection.connect(DatabaseConnection.java:156)
 at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:204)
 at sqlline.Commands.connect(Commands.java:1095)
 at sqlline.Commands.connect(Commands.java:1001)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
ssorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
thodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHa
ndler.java:38)
 at sqlline.SqlLine.dispatch(SqlLine.java:791)
 at sqlline.SqlLine.begin(SqlLine.java:668)
 at sqlline.SqlLine.start(SqlLine.java:373)
 at sqlline.SqlLine.main(SqlLine.java:265)

I have recently started working on calcite and any help would be much
appreciated.

Thanks,

Kunwar Singh





Re: Fwd: ElasticSearch5 not working

2017-09-27 Thread Christian Beikov
Depends if you want to be able to connect to an ES2 and ES5 server at the
same time through sqlline.

Am 27.09.2017 17:53 schrieb "Josh Elser" :

> A quick solution is to add some documentation around how to give a
> specific version of ElasticSearch at build-time. This would help folks
> build Calcite with the exact version that they want (assuming that the API
> hasn't changed and the adapter actually works with that version of ES)
>
> We don't need to have multiple versions of ElasticSearch on the classpath
> at once, right?
>
> On 9/27/17 2:13 AM, Christian Beikov wrote:
>
>> Sqlline is problematic as it will have both elastic search versions on
>> the classpath. I haven't thought about that before. In the meantime you
>> could remove old versions of libraries from the "target/fullclasspath.txt"
>> file to workaround the classpath problem.
>>
>> @Julian Hyde: How should we deal with something like this? We'd need
>> isolated "modules" instead of a flat class path for sqlline. Maybe we could
>> build sqlline on JBoss Modules(https://github.com/jbo
>> ss-modules/jboss-modules) to achieve that?
>>
>>
>> Mit freundlichen Grüßen,
>> 
>> *Christian Beikov*
>> Am 27.09.2017 um 01:30 schrieb Kunwar Deep Singh Toor:
>>
>>> -- Forwarded message --
>>> From: Kunwar Deep Singh Toor 
>>> Date: Mon, Sep 25, 2017 at 8:16 PM
>>> Subject: ElasticSearch5 not working
>>> To: dev-i...@calcite.apache.org
>>>
>>>
>>> Hi,
>>>
>>>   I have been trying to connect to Elasticsearch using the ES adapter
>>> through sqlline. I was able to connect to ES 2.1.4 using the ES2 adapter
>>> but when I try connecting to ES 5.1.1 using the ES5 adapter the following
>>> error pops up, I checked the logs for ElasticSearch and it does not
>>> show  a
>>> connection attempt being made.
>>>
>>> The following is the error:
>>> java.lang.NoSuchMethodError: org.elasticsearch.transport.cl
>>> ient.PreBuiltTransportClient.addPlugins(Ljava/util/Collectio
>>> n;Ljava/util/Collection;)Ljava/util/Collection;
>>>  at org.elasticsearch.transport.client.PreBuiltTransportClient.<
>>> init>(PreBuiltTransportClient.java:130)
>>>  at org.elasticsearch.transport.client.PreBuiltTransportClient.<
>>> init>(PreBuiltTransportClient.java:116)
>>>  at org.elasticsearch.transport.client.PreBuiltTransportClient.<
>>> init>(PreBuiltTransportClient.java:106)
>>>  at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Sche
>>> ma.open(Elasticsearch5Schema.java:119)
>>>  at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Sche
>>> ma.<init>(Elasticsearch5Schema.java:74)
>>>  at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Sche
>>> maFactory.create(Elasticsearch5SchemaFactory.java:56)
>>>  at org.apache.calcite.model.ModelHandler.visit(ModelHandler.
>>> java:269)
>>>  at org.apache.calcite.model.JsonCustomSchema.accept(JsonCustomS
>>> chema.java:45)
>>>  at org.apache.calcite.model.ModelHandler.visit(ModelHandler.
>>> java:195)
>>>  at org.apache.calcite.model.ModelHandler.<init>(ModelHandler.
>>> java:87)
>>>  at org.apache.calcite.jdbc.Driver$1.onConnectionInit(Driver.
>>> java:104)
>>>  at org.apache.calcite.avatica.UnregisteredDriver.connect(Unregi
>>> steredDriver.java:139)
>>>  at sqlline.DatabaseConnection.connect(DatabaseConnection.java:156)
>>>  at sqlline.DatabaseConnection.getConnection(DatabaseConnection.
>>> java:204)
>>>  at sqlline.Commands.connect(Commands.java:1095)
>>>  at sqlline.Commands.connect(Commands.java:1001)
>>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>>> ssorImpl.java:62)
>>>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>>> thodAccessorImpl.java:43)
>>>  at java.lang.reflect.Method.invoke(Method.java:498)
>>>  at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHa
>>> ndler.java:38)
>>>  at sqlline.SqlLine.dispatch(SqlLine.java:791)
>>>  at sqlline.SqlLine.begin(SqlLine.java:668)
>>>  at sqlline.SqlLine.start(SqlLine.java:373)
>>>  at sqlline.SqlLine.main(SqlLine.java:265)
>>>
>>> I have recently started working on calcite and any help would be much
>>> appreciated.
>>>
>>> Thanks,
>>>
>>> Kunwar Singh
>>>
>>>
>>
>>


Re: ElasticSearch5 not working

2017-09-27 Thread Christian Beikov
Well, it's not so simple, I suppose. You also need to exclude the 
elasticsearch2 Maven module from the pom.xml, since that one is actually 
contributing the Elasticsearch client to the classpath. For the sake of 
simplicity, we could introduce profiles that each pull in only a specific 
ES version, but that obviously doesn't scale to other possible version 
combinations.


I can introduce a profile if you want, but a real fix as I said needs 
classloader magic like what e.g. JBoss Modules does.



Mit freundlichen Grüßen,

*Christian Beikov*
Am 27.09.2017 um 20:28 schrieb Julian Hyde:

I agree with Christian that it’s just a problem with sqlline’s class path. The 
simplest solution is to remove elasticsearch2 from the line that builds the 
class path:

for module in core cassandra druid elasticsearch2 elasticsearch5 file mongodb 
spark splunk example/csv example/function; do
   CP=${CP}${module}/target/classes:
   CP=${CP}${module}/target/test-classes:
done

Then sqlline will work out of the box for ES5. People who want to run ES2 will 
have to edit the script.

Kunwar, Can you please log a JIRA case?

Christian, can you fix it?

It’s conceivable that someone would want to combine data from an ES2 and ES5 
instance, but we don’t have an answer for them, whether they invoke Calcite via 
sqlline or any other way. The incompatible libraries will clash, unless there 
is some class-loader magic keeping them in different namespaces. The right 
answer is probably something like Jboss modules, but it’s low priority.

Julian



On Sep 27, 2017, at 9:29 AM, Christian Beikov  
wrote:

Depends if you want to be able to connect to an ES2 and ES5 server at the
same time through sqlline.

Am 27.09.2017 17:53 schrieb "Josh Elser" :


A quick solution is to add some documentation around how to give a
specific version of ElasticSearch at build-time. This would help folks
build Calcite with the exact version that they want (assuming that the API
hasn't changed and the adapter actually works with that version of ES)

We don't need to have multiple versions of ElasticSearch on the classpath
at once, right?

On 9/27/17 2:13 AM, Christian Beikov wrote:


Sqlline is problematic as it will have both elastic search versions on
the classpath. I haven't thought about that before. In the meantime you
could remove old versions of libraries from the "target/fullclasspath.txt"
file to workaround the classpath problem.

@Julian Hyde: How should we deal with something like this? We'd need
isolated "modules" instead of a flat class path for sqlline. Maybe we could
build sqlline on JBoss Modules(https://github.com/jbo
ss-modules/jboss-modules) to achieve that?


Mit freundlichen Grüßen,
--------
*Christian Beikov*
Am 27.09.2017 um 01:30 schrieb Kunwar Deep Singh Toor:


-- Forwarded message --
From: Kunwar Deep Singh Toor 
Date: Mon, Sep 25, 2017 at 8:16 PM
Subject: ElasticSearch5 not working
To: dev-i...@calcite.apache.org


Hi,

  I have been trying to connect to Elasticsearch using the ES adapter
through sqlline. I was able to connect to ES 2.1.4 using the ES2 adapter
but when I try connecting to ES 5.1.1 using the ES5 adapter the following
error pops up, I checked the logs for ElasticSearch and it does not
show  a
connection attempt being made.

The following is the error:
java.lang.NoSuchMethodError: org.elasticsearch.transport.cl
ient.PreBuiltTransportClient.addPlugins(Ljava/util/Collectio
n;Ljava/util/Collection;)Ljava/util/Collection;
 at org.elasticsearch.transport.client.PreBuiltTransportClient.<
init>(PreBuiltTransportClient.java:130)
 at org.elasticsearch.transport.client.PreBuiltTransportClient.<
init>(PreBuiltTransportClient.java:116)
 at org.elasticsearch.transport.client.PreBuiltTransportClient.<
init>(PreBuiltTransportClient.java:106)
 at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Sche
ma.open(Elasticsearch5Schema.java:119)
 at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Sche
ma.<init>(Elasticsearch5Schema.java:74)
 at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Sche
maFactory.create(Elasticsearch5SchemaFactory.java:56)
 at org.apache.calcite.model.ModelHandler.visit(ModelHandler.
java:269)
 at org.apache.calcite.model.JsonCustomSchema.accept(JsonCustomS
chema.java:45)
 at org.apache.calcite.model.ModelHandler.visit(ModelHandler.
java:195)
 at org.apache.calcite.model.ModelHandler.<init>(ModelHandler.
java:87)
 at org.apache.calcite.jdbc.Driver$1.onConnectionInit(Driver.
java:104)
 at org.apache.calcite.avatica.UnregisteredDriver.connect(Unregi
steredDriver.java:139)
 at sqlline.DatabaseConnection.connect(DatabaseConnection.java:156)
 at sqlline.DatabaseConnection.getConnection(DatabaseConnection.
java:204)
 at

Re: ElasticSearch5 not working

2017-09-28 Thread Christian Beikov
The elasticsearch2 module depends on the Elasticsearch client driver 
libraries in version 2.x, and the elasticsearch5 module depends on the 
Elasticsearch client driver libraries in version 5.x. That's the 
version conflict Kunwar was running into.



Mit freundlichen Grüßen,
----
*Christian Beikov*
Am 28.09.2017 um 18:35 schrieb Julian Hyde:

Christian,

I don’t understand. The elasticsearch5 module does not depend on the 
elasticsearch2 module, only on core and linq4j. Therefore if it (and its 
dependencies) are added to sqlline’s class path, it should have everything it 
needs. What am I missing?

Julian



On Sep 27, 2017, at 11:02 PM, Christian Beikov  
wrote:

Well it's not so simple I suppose. You also need to exclude the elasticsearch2 
maven module from the pom.xml since that one is actually contributing the 
elasticsearch client to the classpath. For the sake of simplicity, we could 
introduce profiles that only have a specific ES version, but that doesn't scale 
obviously regarding possible other version combinations.

I can introduce a profile if you want, but a real fix as I said needs 
classloader magic like what e.g. JBoss Modules does.


Mit freundlichen Grüßen,
--------
*Christian Beikov*
Am 27.09.2017 um 20:28 schrieb Julian Hyde:

I agree with Christian that it’s just a problem with sqlline’s class path. The 
simplest solution is to remove elasticsearch2 from the line that builds the 
class path:

for module in core cassandra druid elasticsearch2 elasticsearch5 file mongodb 
spark splunk example/csv example/function; do
   CP=${CP}${module}/target/classes:
   CP=${CP}${module}/target/test-classes:
done

Then sqlline will work out of the box for ES5. People who want to run ES2 will 
have to edit the script.

Kunwar, Can you please log a JIRA case?

Christian, can you fix it?

It’s conceivable that someone would want to combine data from an ES2 and ES5 
instance, but we don’t have an answer for them, whether they invoke Calcite via 
sqlline or any other way. The incompatible libraries will clash, unless there 
is some class-loader magic keeping them in different namespaces. The right 
answer is probably something like Jboss modules, but it’s low priority.

Julian



On Sep 27, 2017, at 9:29 AM, Christian Beikov  
wrote:

Depends if you want to be able to connect to an ES2 and ES5 server at the
same time through sqlline.

Am 27.09.2017 17:53 schrieb "Josh Elser" :


A quick solution is to add some documentation around how to give a
specific version of ElasticSearch at build-time. This would help folks
build Calcite with the exact version that they want (assuming that the API
hasn't changed and the adapter actually works with that version of ES)

We don't need to have multiple versions of ElasticSearch on the classpath
at once, right?

On 9/27/17 2:13 AM, Christian Beikov wrote:


Sqlline is problematic as it will have both elastic search versions on
the classpath. I haven't thought about that before. In the meantime you
could remove old versions of libraries from the "target/fullclasspath.txt"
file to workaround the classpath problem.

@Julian Hyde: How should we deal with something like this? We'd need
isolated "modules" instead of a flat class path for sqlline. Maybe we could
build sqlline on JBoss Modules(https://github.com/jbo
ss-modules/jboss-modules) to achieve that?


Mit freundlichen Grüßen,
--------
*Christian Beikov*
Am 27.09.2017 um 01:30 schrieb Kunwar Deep Singh Toor:


-- Forwarded message --
From: Kunwar Deep Singh Toor 
Date: Mon, Sep 25, 2017 at 8:16 PM
Subject: ElasticSearch5 not working
To: dev-i...@calcite.apache.org


Hi,

  I have been trying to connect to Elasticsearch using the ES adapter
through sqlline. I was able to connect to ES 2.1.4 using the ES2 adapter
but when I try connecting to ES 5.1.1 using the ES5 adapter the following
error pops up, I checked the logs for ElasticSearch and it does not
show  a
connection attempt being made.

The following is the error:
java.lang.NoSuchMethodError: org.elasticsearch.transport.cl
ient.PreBuiltTransportClient.addPlugins(Ljava/util/Collectio
n;Ljava/util/Collection;)Ljava/util/Collection;
 at org.elasticsearch.transport.client.PreBuiltTransportClient.<
init>(PreBuiltTransportClient.java:130)
 at org.elasticsearch.transport.client.PreBuiltTransportClient.<
init>(PreBuiltTransportClient.java:116)
 at org.elasticsearch.transport.client.PreBuiltTransportClient.<
init>(PreBuiltTransportClient.java:106)
 at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Sche
ma.open(Elasticsearch5Schema.java:119)
 at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Sche
ma.<init>(Elasticsearch5Schema.java:74)
  

Re: ElasticSearch5 not working

2017-09-28 Thread Christian Beikov
Maybe I am wrong, but I was assuming that the command "mvn 
dependency:build-classpath" would include the dependencies of all modules 
regardless of whether we include the elasticsearch2 target/classes or not.



Mit freundlichen Grüßen,

*Christian Beikov*
Am 28.09.2017 um 19:16 schrieb Julian Hyde:

I understand that. But if we remove calcite-elasticsearch2 from sqlline’s 
classpath, sqlline will still get calcite-elasticsearch5 and the elastic search 
5.x client driver libraries that it depends upon.

And the elastic search 2.x client driver libraries will not be on sqlline's 
class path, because calcite-elasticsearch2 is the only module that depends on 
them. So there will be no clash.

Julian



On Sep 28, 2017, at 9:53 AM, Christian Beikov  
wrote:

The elasticsearch2 module depends on the elasticsearch client driver libraries 
with version 2.x and the elasticsearch5 module depends on the elasticsearch 
client driver libraries with version 5.x. That's the version conflict Kunwar 
was running into.


Mit freundlichen Grüßen,
--------
*Christian Beikov*
Am 28.09.2017 um 18:35 schrieb Julian Hyde:

Christian,

I don’t understand. The elasticsearch5 module does not depend on the 
elasticsearch2 module, only on core and linq4j. Therefore if it (and its 
dependencies) are added to sqlline’s class path, it should have everything it 
needs. What am I missing?

Julian



On Sep 27, 2017, at 11:02 PM, Christian Beikov  
wrote:

Well it's not so simple I suppose. You also need to exclude the elasticsearch2 
maven module from the pom.xml since that one is actually contributing the 
elasticsearch client to the classpath. For the sake of simplicity, we could 
introduce profiles that only have a specific ES version, but that doesn't scale 
obviously regarding possible other version combinations.

I can introduce a profile if you want, but a real fix as I said needs 
classloader magic like what e.g. JBoss Modules does.


Mit freundlichen Grüßen,
----
*Christian Beikov*
Am 27.09.2017 um 20:28 schrieb Julian Hyde:

I agree with Christian that it’s just a problem with sqlline’s class path. The 
simplest solution is to remove elasticsearch2 from the line that builds the 
class path:

for module in core cassandra druid elasticsearch2 elasticsearch5 file mongodb 
spark splunk example/csv example/function; do
   CP=${CP}${module}/target/classes:
   CP=${CP}${module}/target/test-classes:
done

Then sqlline will work out of the box for ES5. People who want to run ES2 will 
have to edit the script.

Kunwar, Can you please log a JIRA case?

Christian, can you fix it?

It’s conceivable that someone would want to combine data from an ES2 and ES5 
instance, but we don’t have an answer for them, whether they invoke Calcite via 
sqlline or any other way. The incompatible libraries will clash, unless there 
is some class-loader magic keeping them in different namespaces. The right 
answer is probably something like Jboss modules, but it’s low priority.

Julian



On Sep 27, 2017, at 9:29 AM, Christian Beikov  
wrote:

Depends if you want to be able to connect to an ES2 and ES5 server at the
same time through sqlline.

Am 27.09.2017 17:53 schrieb "Josh Elser" :


A quick solution is to add some documentation around how to give a
specific version of ElasticSearch at build-time. This would help folks
build Calcite with the exact version that they want (assuming that the API
hasn't changed and the adapter actually works with that version of ES)

We don't need to have multiple versions of ElasticSearch on the classpath
at once, right?

On 9/27/17 2:13 AM, Christian Beikov wrote:


Sqlline is problematic as it will have both elastic search versions on
the classpath. I haven't thought about that before. In the meantime you
could remove old versions of libraries from the "target/fullclasspath.txt"
file to workaround the classpath problem.

@Julian Hyde: How should we deal with something like this? We'd need
isolated "modules" instead of a flat class path for sqlline. Maybe we could
build sqlline on JBoss Modules(https://github.com/jbo
ss-modules/jboss-modules) to achieve that?


Mit freundlichen Grüßen,
--------
*Christian Beikov*
Am 27.09.2017 um 01:30 schrieb Kunwar Deep Singh Toor:


-- Forwarded message --
From: Kunwar Deep Singh Toor 
Date: Mon, Sep 25, 2017 at 8:16 PM
Subject: ElasticSearch5 not working
To: dev-i...@calcite.apache.org


Hi,

  I have been trying to connect to Elasticsearch using the ES adapter
through sqlline. I was able to connect to ES 2.1.4 using the ES2 adapter
but when I try connecting to ES 5.1.1 using the ES5 adapter the following
error pops up, I checked the logs for ElasticSearch and it does not show a
connection attempt being made.

Re: Does calcite have working ElasticSearch 5 adapter?

2017-10-02 Thread Christian Beikov
It does support basic operations, just like the ES2 adapter, since it is 
based on it.


I am not an ES user, so I don't plan on doing this work myself; maybe you 
or someone else could step in?



Mit freundlichen Grüßen,
----
*Christian Beikov*
Am 01.10.2017 um 05:57 schrieb kant kodali:

Hi All,

The Calcite Elasticsearch 5 adapter webpage says the adapter currently
doesn't support aggregations. However, when I look at this code
https://github.com/apache/calcite/tree/master/elasticsearch5/src/main/java/org/apache/calcite/adapter/elasticsearch5

It doesn't look like it supports the basic queries either. May I know when
we can possibly expect this?

Thanks!





Re: Does calcite have working ElasticSearch 5 adapter?

2017-10-02 Thread Christian Beikov

You can see what is supported by looking at the tests for the adapter.


Mit freundlichen Grüßen,

*Christian Beikov*
Am 02.10.2017 um 10:17 schrieb kant kodali:

where can I find all the operations the current ES5 adapter can support?
any ES5 examples? I have to deep dive into code before I can commit towards
contribution but I can see what I will be able to do.

On Mon, Oct 2, 2017 at 12:52 AM, Christian Beikov <
christian.bei...@gmail.com> wrote:


It does support basic operations, just as the ES2 adapter since it is
based on it.

I am no ES user so I don't plan on doing this work, maybe you or someone
else could step in?


Mit freundlichen Grüßen,
----
*Christian Beikov*

Am 01.10.2017 um 05:57 schrieb kant kodali:


Hi All,

The calcite elastic search 5 adapter webpage says the adapter currently
doesn't support aggregations however when I look at this code
https://github.com/apache/calcite/tree/master/elasticsearch5
/src/main/java/org/apache/calcite/adapter/elasticsearch5

It doesn't look like it supports the basic queries either. May I know when
we can possibly expect this?

Thanks!






Re: Need to extract only the values in the field

2017-10-09 Thread Christian Beikov

The following should do it:

select cast(_MAP['POP'] as integer) from "elasticsearch".zips

In the tests of the adapter you can see other usages:

https://github.com/apache/calcite/blob/cc20ca13db4d506d9d4d1b861dd1c7ac3944e56e/elasticsearch2/src/test/resources/elasticsearch-zips-model.json#L37
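
And since you mentioned joining with a Postgres table: once both schemas 
are in the same model (assuming your Postgres schema is mapped there, 
here called "postgrestest"), that join is plain SQL across schemas. A 
sketch with made-up table and column names:

select z."POP", s."name"
from (select cast(_MAP['POP'] as integer) as "POP",
             cast(_MAP['STATE'] as varchar(2)) as "STATE"
      from "elasticsearch".zips) as z
join "postgrestest"."states" as s on s."abbr" = z."STATE"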


Mit freundlichen Grüßen,
--------
*Christian Beikov*
Am 09.10.2017 um 19:40 schrieb AshwinKumar AshwinKumar:

Hi Team,

Could you please help here. I am stuck on this problem for like 2 days now.

Thanks,
Ashwin

On Mon, Oct 9, 2017 at 4:42 PM, AshwinKumar AshwinKumar <
aash...@g.clemson.edu> wrote:


Hi Team,

I have an elasticsearch view called zips . Below is the table data -

0: jdbc:calcite:model=./elasticsearch2/src/te> select POP from
"elasticsearch".ZIPS;
+--+
| POP  |
+--+
| {POP=13367}  |
| {POP=1652}   |
| {POP=3184}   |
| {POP=43704}  |
| {POP=2084}   |
| {POP=1350}   |
| {POP=8194}   |
| {POP=1732}   |
| {POP=9808}   |
| {POP=4441}   |
+--+
10 rows selected (0.319 seconds)

Could you please let me know if there is a way to select only the values
in POP field using SQL. Like for eg I need only the integer values like
13367,1652 and so on from the tables. I need to join these values with
another table in postgres schema.

Thanks,
Ashwin





Re: Need to extract only the values in the field

2017-10-10 Thread Christian Beikov

Hey,

I'm not the original author of the ES2 adapter but I must admit, it 
seems odd that "select POP FROM "elasticsearch".ZIPS" doesn't work. 
Could you try ES5 and tell me if that works for you? I changed a few 
things related to projections for my test purposes which might just be 
what the ES2 adapter is missing.


Maybe the original ES2 author could step up and explain the rationale 
behind the way it works currently?



Mit freundlichen Grüßen,
----
*Christian Beikov*
Am 10.10.2017 um 16:47 schrieb AshwinKumar AshwinKumar:

Hi Christian,

Below is my json file - {
   "version": "1.0",
   "defaultSchema": "elasticsearch1",
   "schemas": [
 {
   "name": "postgrestest",
   "type": "custom",
   "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory",
   "operand": {
 "jdbcDriver": "org.postgresql.Driver",
 "jdbcUrl": "jdbc:postgresql://localhost/bigdawg_catalog",
 "jdbcUser": "pguser",
 "jdbcPassword": "test"
   }
 },
 {
   "type": "custom",
   "name": "elasticsearch_raw",
   "factory":
"org.apache.calcite.adapter.elasticsearch2.Elasticsearch2SchemaFactory",
   "operand": {
 "coordinates": "{'127.0.0.1': 9300}",
 "userConfig": "{'bulk.flush.max.actions': 25,
'bulk.flush.max.size.mb': 1}",
 "index": "usa"
   }
 },
 {
   "name": "elasticsearch",
   "tables": [
 {
   "name": "ZIPS",
   "type": "view",
   "sql": [
 "select cast(_MAP['CITY'] AS varchar(20)) AS \"CITY\",\n",
 " cast(_MAP['LOC'][0] AS float) AS \"LONGITUDE\",\n",
 " cast(_MAP['LOC'][1] AS float) AS \"LATITUDE\",\n",
 " cast(_MAP['POP'] AS integer) AS \"POP\",\n",
 " cast(_MAP['STATE'] AS varchar(2)) AS \"STATE\",\n",
 " cast(_MAP['IDNUM'] AS varchar(5)) AS \"ID\"\n",
 "from \"elasticsearch_raw\".\"ZIPS\""
   ]
 }
   ]
 }


Earlier I was executing - select POP from "elasticsearch".ZIPS ; I got the
below error -

0: jdbc:calcite:model=./elasticsearch2/src/te>* select POP  from
"elasticsearch".ZIPS;*
++
|POP |
++
java.lang.ClassCastException: java.util.HashMap cannot be cast to
java.lang.Integer
 at
org.apache.calcite.avatica.util.AbstractCursor$IntAccessor.getInt(AbstractCursor.java:531)
 at org.apache.calcite.avatica.AvaticaSite.get(AvaticaSite.java:339)
 at
org.apache.calcite.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:409)
 at sqlline.Rows$Row.<init>(Rows.java:157)
 at sqlline.IncrementalRows.hasNext(IncrementalRows.java:66)
 at sqlline.TableOutputFormat.print(TableOutputFormat.java:33)
 at sqlline.SqlLine.print(SqlLine.java:1648)
 at sqlline.Commands.execute(Commands.java:834)
 at sqlline.Commands.sql(Commands.java:733)
 at sqlline.SqlLine.dispatch(SqlLine.java:795)
 at sqlline.SqlLine.begin(SqlLine.java:668)
 at sqlline.SqlLine.start(SqlLine.java:373)
 at sqlline.SqlLine.main(SqlLine.java:265)
0: jdbc:calcite:model=./elasticsearch2/src/te>

I tried the one which you suggested. I am getting the below error -

0: jdbc:calcite:model=./elasticsearch2/src/te> *select cast(_MAP['POP'] as
integer) from "elasticsearch".ZIPS;*
2017-10-10 15:44:33,334 [main] ERROR -
org.apache.calcite.sql.validate.SqlValidatorException: Column '_MAP' not
found in any table
2017-10-10 15:44:33,347 [main] ERROR -
org.apache.calcite.runtime.CalciteContextException: From line 1, column 13
to line 1, column 16: Column '_MAP' not found in any table
Error: Error while executing SQL "select cast(_MAP['POP'] as integer) from
"elasticsearch".ZIPS": From line 1, column 13 to line 1, column 16: Column
'_MAP' not found in any table (state=,code=0)

All the varchar fields are working fine. For eg -

0: jdbc:calcite:model=./elasticsearch2/src/te> select STATE from
"elasticsearch".ZIPS;
+---+
| STATE |
+---+
| {STATE=MA} |
| {STATE=MA} |
| {STATE=MA} |
| {STATE=MA} |
| {STATE=MA} |
| {STATE=MA} |
| {STATE=MA} |
| {STATE=MA} |
| {STATE=MA} |
| {STATE=MA} |
+

Re: [ANNOUNCE] New committer: Christian Beikov

2017-10-15 Thread Christian Beikov

Thank you all for the kind words! :)


Mit freundlichen Grüßen,

*Christian Beikov*
Am 13.10.2017 um 09:45 schrieb Chris Baynes:

Congrats Christian!

On Fri, Oct 13, 2017 at 5:22 AM, He Zhiqiang  wrote:


Congratulations and welcome!

Regards
Zhiqiang He


From: Julian Hyde 
Sent: Thursday, October 12, 2017 2:11
To: dev@calcite.apache.org
Subject: [ANNOUNCE] New committer: Christian Beikov

On behalf of the PMC I am delighted to announce Christian Beikov as a new
Calcite committer.

Christian’s first contribution[1] — quite out the blue — was a new adapter
for Elasticsearch5. Since then he has made various improvement’s to
Calcite’s support for federation and materialization, and has been active
in design discussions and helping users on the dev list.

Please give Christian a warm welcome to the project!

Julian

[1] https://github.com/apache/calcite/commits/master?author=beikov <
https://github.com/apache/calcite/commits/master?author=beikov>





Re: need help with Elasticsearch adapter

2017-12-29 Thread Christian Beikov
The problem is documented in this issue already: 
https://issues.apache.org/jira/browse/CALCITE-1994


It's not so easy to use the elasticsearch adapter with sqlline 
unfortunately.



Mit freundlichen Grüßen,
----
*Christian Beikov*
Am 26.12.2017 um 09:51 schrieb bryanbzhou(周彬):

Hi, all.


I met some problem with Elasticsearch adapter. Is there someone can help me ?


I try to test Elasticsearch adapter following 
https://calcite.apache.org/docs/elasticsearch_adapter.html.


First . I create a elacticsearch with docker on my macbook with following 
command:

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" 
docker.elastic.co/elasticsearch/elasticsearch:5.6.1


Then. I edit es.json


{

   "version": "1.0",

   "defaultSchema": "elasticsearch",

   "schemas": [

 {

   "type": "custom",

   "name": "elasticsearch",

   "factory": 
"org.apache.calcite.adapter.elasticsearch5.Elasticsearch5SchemaFactory",

   "operand": {

 "coordinates": "{‘127.0.0.1': 9300}",

 "userConfig": "{ 'bulk.flush.max.size.mb': 10}",

 "index": "usa"

   }

 }

   ]

}


And try to connect to ES with sqlline "!connect jdbc:calcite:model=es.json 
elastic changeme"

But I got an error:


java.lang.NoSuchMethodError: 
org.elasticsearch.transport.client.PreBuiltTransportClient.addPlugins(Ljava/util/Collection;Ljava/util/Collection;)Ljava/util/Collection;

at 
org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:130)

at 
org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:116)

at 
org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:106)

at 
org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Schema.open(Elasticsearch5Schema.java:119)

at 
org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Schema.<init>(Elasticsearch5Schema.java:74)

at 
org.apache.calcite.adapter.elasticsearch5.Elasticsearch5SchemaFactory.create(Elasticsearch5SchemaFactory.java:56)

at org.apache.calcite.model.ModelHandler.visit(ModelHandler.java:270)

at org.apache.calcite.model.JsonCustomSchema.accept(JsonCustomSchema.java:45)

at org.apache.calcite.model.ModelHandler.visit(ModelHandler.java:196)

at org.apache.calcite.model.ModelHandler.<init>(ModelHandler.java:88)

at org.apache.calcite.jdbc.Driver$1.onConnectionInit(Driver.java:104)

at 
org.apache.calcite.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:139)

at sqlline.DatabaseConnection.connect(DatabaseConnection.java:156)

at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:204)

at sqlline.Commands.connect(Commands.java:1095)

at sqlline.Commands.connect(Commands.java:1001)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:38)

at sqlline.SqlLine.dispatch(SqlLine.java:791)

at sqlline.SqlLine.begin(SqlLine.java:668)

at sqlline.SqlLine.start(SqlLine.java:373)

at sqlline.SqlLine.main(SqlLine.java:265)




Re: Maven wrapper

2017-12-29 Thread Christian Beikov
I haven't had the need for something like that up until now in any of 
the projects I've worked on. I guess if there were some feature from a 
very recent Maven release that we would like to make use of, it would 
make sense to use that wrapper script. Since the current minimum version 
required for a build seems to be satisfied by the installations most 
users have, I don't see a reason for doing this yet. I'd suggest we do 
this when a new Maven version is released that might not be widespread 
yet.
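
For what it's worth, the day-to-day difference for contributors would just 
be invoking the checked-in wrapper script, e.g.

./mvnw clean install

instead of a locally installed mvn; the wrapper downloads the pinned Maven 
version on first use.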



Mit freundlichen Grüßen,
--------
*Christian Beikov*
Am 29.12.2017 um 21:42 schrieb Julian Hyde:

We have a pull request for a maven wrapper[1]. People would not need to install 
maven to build Calcite, but we would include shell/cmd scripts and a bootstrap 
.jar. It would allow us to use a specific version of maven (not that this has 
been a problem to date). It would make our release process a bit more 
complicated (we’d be shipping a binary in the .jar file) and we’d have to 
change instructions in several locations.

What do you all think of the idea?

I don’t think we should change something as fundamental as our build process 
unless there is a substantial majority in favor.

Julian

[1] https://issues.apache.org/jira/browse/CALCITE-2112 
<https://issues.apache.org/jira/browse/CALCITE-2112>




Re: Use Calcite adapter with Elastic search and Java

2018-02-15 Thread Christian Beikov
Did you take a look at the documentation? 
https://calcite.apache.org/docs/elasticsearch_adapter.html


If you use ES5+ you have to use an empty userConfig as the ES Client 
doesn't support the properties that are listed in the example.


You can also take a look at the 
tests(https://github.com/apache/calcite/blob/master/elasticsearch5/src/test/java/org/apache/calcite/test/Elasticsearch5AdapterIT.java) 
in the Calcite repository to find usage examples. Currently you can't use 
the ES5 adapter together with sqlline, as there are classpath problems, 
but nobody has stepped up to fix it yet.
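
As a starting point, here is a minimal, untested sketch of using the 
adapter from plain Java via JDBC; the model file es.json and the 
"elasticsearch"."zips" names below are assumptions that depend entirely 
on your model:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class EsAdapterExample {
  public static void main(String[] args) throws Exception {
    try (Connection connection = DriverManager.getConnection(
            "jdbc:calcite:model=es.json");
        Statement statement = connection.createStatement();
        // _MAP is the raw ES document; cast out the fields you need
        ResultSet resultSet = statement.executeQuery(
            "select cast(_MAP['POP'] as integer) as \"POP\" "
                + "from \"elasticsearch\".\"zips\"")) {
      while (resultSet.next()) {
        System.out.println(resultSet.getInt("POP"));
      }
    }
  }
}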



Mit freundlichen Grüßen,

*Christian Beikov*
Am 14.02.2018 um 10:15 schrieb Saurabh Pathak:

Hello,

I want to use the Calcite adapter with Elasticsearch and Java, but I have 
no idea how to use the Calcite Elasticsearch adapter. Could you please let 
us know how to use the Calcite adapter with Elasticsearch and Java?

Thanks in advance for your cooperation

Thanks & Regards
Saurabh Pathak




Re: ElasticSearch. Explicit Definitions, Literals and RelBuilder

2018-05-17 Thread Christian Beikov
1) I wasn't aware of an API to retrieve a schema. We could definitely 
use that to avoid the need for views.


2) I agree, seems like a bug.


Mit freundlichen Grüßen,
----
*Christian Beikov*
Am 17.05.2018 um 13:51 schrieb Michael Mior:

1) I'm not too familiar with ES so I didn't realize you could define
mappings. I don't see any obvious reason why we couldn't use those assuming
they're exposed via the ES API.

2) I was not aware of this and not sure whether it was intentional. It
seems like a bug to me though.

3) Whenever you want to see how to build a particular query, you may find
it helpful to run "EXPLAIN PLAN FOR " in sqlline.

--
Michael Mior
mm...@uwaterloo.ca


Le jeu. 17 mai 2018 à 01:26, Andrei Sereda  a écrit :


Hello Calcite Devs,

I have some questions about ES adapter and custom predicates / projections
in Calcite. Your help is much appreciated.

1) All ES examples use a view (ZIPS
<
https://github.com/apache/calcite/blob/master/elasticsearch5/src/test/resources/elasticsearch-zips-model.json

)

which does explicit type cast, name alias and  dictionary access (via _MAP)
for each field. If such view is not defined beforehand, making ad-hoc
queries becomes tedious. Is there a way to make it more user-friendly (eg.
using existing ES mapping) ?

Example: select cast(_MAP['city'] AS varchar(20)) AS \"city\" from ...

Why some adapters require explicit definition (eg. Mongo / ES) while others
don't (eg. Geode)

2) When not using explicit casting (or field alias) query literals are
converted to lower case:

SQL: select * from "elastic" where _MAP['Foo'] = 'BAR' (note upper-case)
ES Query: { "term": { "foo" : "bar" }} (note lower-case)

This is less intuitive. Is there a way to switch it off (customize) ?

3) How to build the following query using Algebra Builder (RelBuilder)

select * from "elastic" where _MAP['Foo'] = 'Bar'

I presume one has to use SqlItemOperator but I couldn't get it to work. Is
this related to (1) and (2) ?

Regards,
Andrei.





Re: ElasticSearch Adapters. Calcite vs Dremio

2018-05-17 Thread Christian Beikov
I don't know Dremio, but I guess the ES adapter parts were just 
implemented independently to be able to get something faster. If the 
authors of Dremio want to contribute back, we'd definitely integrate such 
work.
work.



Mit freundlichen Grüßen,

*Christian Beikov*
Am 17.05.2018 um 13:56 schrieb Michael Mior:

It's my understanding that while Dremio plugins serve a similar purpose,
they are different from Calcite adapters. In any case, it could be worth
looking at what Dremio is doing to see if there are any useful features to
pull into Calcite.

--
Michael Mior
mm...@uwaterloo.ca


Le jeu. 17 mai 2018 à 00:28, Andrei Sereda  a écrit :


Hello,

I've noticed that Dremio implemented
<https://github.com/dremio/dremio-oss/tree/master/plugins/elasticsearch>
their own (calcite) adapters for ElasticSearch.

According to documentation
<https://www.dremio.com/tutorials/unlocking-sql-on-elasticsearch/> those
adapters support ES aggregations (lacking in calcite) plus more advanced
features like date ranges or native scripting
<
https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-getting-started.html

.

Is there a reason they're not backported / used in calcite ?  More
specifically is there any (technical) reason for those plugins to be
developed independently ?

Thanks in advance,
Andrei.





Re: ElasticSearch. Explicit Definitions, Literals and RelBuilder

2018-05-17 Thread Christian Beikov
I honestly don't know the reason for that. I mainly just reused the 
original implementation, which was committed here: 
https://github.com/apache/calcite/commit/f3caf13b9f1cd92f95dcf27716466bf2133e1ed7#diff-3ebf1b729d2fdd03596ff01088977e63


Seems like an easy fix. Would you mind creating the PR? :)
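
For reference, the fix under discussion would just drop the lower-casing 
from the line quoted below; a one-line sketch, untested:

return "\"query\" : " + builder.toJsonString(map).replaceAll("\\s+", "");

i.e. keep the whitespace stripping but leave the literal's case alone.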


Mit freundlichen Grüßen,
----
*Christian Beikov*
Am 17.05.2018 um 17:43 schrieb Andrei Sereda:

Regarding (2), the lower-case problem: the issue seems to be in
ElasticsearchFilter.Translator
<https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/adapter/elasticsearch/ElasticsearchFilter.java#L109>,
which can be easily fixed. The offending line is:

return "\"query\" : " + builder.toJsonString(map).replaceAll("\\s+", "")
    .toLowerCase(Locale.ROOT);

Any reason for such an explicit toLowerCase() transformation?


On Thu, May 17, 2018 at 11:14 AM, Michael Mior  wrote:


I'm not too familiar with the RelBuilder API myself unfortunately. I think
for your ???, you want builder.field(0) although builder.field("_MAP") may
also work.
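
Putting that into the snippet gives roughly the following; a sketch,
untested. Note that builder.field(0) only resolves once the scan is on the
builder's stack, so the ITEM call has to be constructed after scan():

// config is the FrameworkConfig from the original snippet
final RelBuilder builder = RelBuilder.create(config);
final RelNode node = builder
    .scan("elasticsearch_raw", "table")
    .filter(
        builder.call(SqlStdOperatorTable.EQUALS,
            builder.call(SqlStdOperatorTable.ITEM,
                builder.field(0), builder.literal("Foo")),
            builder.literal("BAR")))
    .build();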

--
Michael Mior
mm...@uwaterloo.ca


Le jeu. 17 mai 2018 à 10:39, Andrei Sereda  a écrit :


Hi Michael and Christian,

Thanks for your replies.

Regarding (3) RelBuilder below is the query plan. I'm not sure how to
express element =(ITEM($0, 'Foo') using API.

Query Plan
0: jdbc:calcite:model=target/test-classes/mod>  explain plan for select

*

from "elasticsearch_raw"."table" where _MAP['Foo'] = 'BAR';

| ElasticsearchToEnumerableConverter
   ElasticsearchFilter(condition=[=(ITEM($0, 'Foo'), 'BAR')])
 ElasticsearchTableScan(table=[[elasticsearch_raw,table]])


API
// Using RelBuilder API to construct simple ES query
// select * from "elasticsearch_raw"."table" where _MAP['Foo'] = 'BAR';
// use RexInputRef here ?
RexNode item = builder.call(SqlStdOperatorTable.ITEM, ???,
builder.literal("Foo"));
final RelNode node = builder
     .scan("elasticsearch_raw", "table")
 .filter(builder.call(SqlStdOperatorTable.EQUALS, item,
builder.literal("BAR")))
 .build();

Regards,
Andrei.




On Thu, May 17, 2018 at 9:16 AM, Christian Beikov <
christian.bei...@gmail.com> wrote:


1) I wasn't aware of an API to retrieve a schema. We could definitely

use

that to avoid the need for views.

2) I agree, seems like a bug.


Mit freundlichen Grüßen,




*Christian Beikov*

Am 17.05.2018 um 13:51 schrieb Michael Mior:


1) I'm not too familiar with ES so I didn't realize you could define
mappings. I don't see any obvious reason why we couldn't use those
assuming
they're exposed via the ES API.

2) I was not aware of this and not sure whether it was intentional. It
seems like a bug to me though.

3) Whenever you want to see how to build a particular query, you may

find

it helpful to run "EXPLAIN PLAN FOR " in sqlline.

--
Michael Mior
mm...@uwaterloo.ca


Le jeu. 17 mai 2018 à 01:26, Andrei Sereda  a

écrit :

Hello Calcite Devs,

I have some questions about ES adapter and custom predicates /
projections
in Calcite. Your help is much appreciated.

1) All ES examples use a view (ZIPS
<
https://github.com/apache/calcite/blob/master/elasticsearch5
/src/test/resources/elasticsearch-zips-model.json


)


which does explicit type cast, name alias and  dictionary access (via
_MAP)
for each field. If such view is not defined beforehand, making ad-hoc
queries becomes tedious. Is there a way to make it more user-friendly
(eg.
using existing ES mapping) ?

Example: select cast(_MAP['city'] AS varchar(20)) AS \"city\" from

...

Why some adapters require explicit definition (eg. Mongo / ES) while
others
don't (eg. Geode)

2) When not using explicit casting (or field alias) query literals

are

converted to lower case:

SQL: select * from "elastic" where _MAP['Foo'] = 'BAR' (note

upper-case)

ES Query: { "term": { "foo" : "bar" }} (note lower-case)

This is less intuitive. Is there a way to switch it off (customize) ?

3) How to build the following query using Algebra Builder

(RelBuilder)

select * from "elastic" where _MAP['Foo'] = 'Bar'

I presume one has to use SqlItemOperator but I couldn't get it to

work.

Is
this related to (1) and (2) ?

Regards,
Andrei.






Re: ElasticSearch use of rest client (instead of TransportClient)

2018-05-18 Thread Christian Beikov

Hey Andrei,

that would be awesome! Do you know by any chance if the low level client 
is also compatible with older ES versions?
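
My rough understanding of what that would look like (a sketch against the 
5.x low-level client, untested; host, port and index name are made up): 
since it speaks plain HTTP, the coupling to a particular server version 
should be much weaker than with TransportClient.

import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class RestClientSketch {
  public static void main(String[] args) throws Exception {
    try (RestClient client = RestClient.builder(
            new HttpHost("localhost", 9200, "http")).build()) {
      // A match-all search against the "usa" index
      Response response = client.performRequest("GET", "/usa/_search");
      System.out.println(EntityUtils.toString(response.getEntity()));
    }
  }
}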



Mit freundlichen Grüßen,

*Christian Beikov*
Am 18.05.2018 um 20:45 schrieb Andrei Sereda:

Hello,

ES TransportClient is deprecated in 7.0 (to be removed
<https://www.elastic.co/guide/en/elasticsearch/client/java-api/master/transport-client.html>
in 8.0) in favor of http rest client(s). Would you consider a contribution
switching to Rest Low-Level Client
<https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-low.html>(which
has much fewer dependencies) ?


Thanks,
Andrei.





Re: ElasticSearch use of rest client (instead of TransportClient)

2018-05-18 Thread Christian Beikov
That's mainly because the Java drivers changed in a way that made it 
impossible to use the same adapter. I might be wrong, but I think the 
ES5 adapter doesn't work with an ES2 server instance just like the ES2 
adapter doesn't work with an ES5+ server instance.


If all of this could just go away, that would be great :)


Mit freundlichen Grüßen,
--------
*Christian Beikov*
Am 18.05.2018 um 21:19 schrieb Andrei Sereda:

Yes it should be, since it is just an http client (apache http).
ElasticSearch Rest API (query API) didn't change much
<https://www.elastic.co/guide/en/elasticsearch/reference/5.0/breaking-changes-5.0.html>
.

Next question would be : why there is a need in two separate modules
elasticsearch2 and elasticsearch5

On Fri, May 18, 2018 at 3:11 PM, Christian Beikov <
christian.bei...@gmail.com> wrote:


Hey Andrei,

that would be awesome! Do you know by any chance if the low level client
is also compatible with older ES versions?


Mit freundlichen Grüßen,
--------
*Christian Beikov*
Am 18.05.2018 um 20:45 schrieb Andrei Sereda:


Hello,

ES TransportClient is deprecated in 7.0 (to be removed
<https://www.elastic.co/guide/en/elasticsearch/client/java-a
pi/master/transport-client.html>
in 8.0) in favor of http rest client(s). Would you consider a contribution
switching to Rest Low-Level Client
<https://www.elastic.co/guide/en/elasticsearch/client/java-r
est/current/java-rest-low.html>(which
has much fewer dependencies) ?


Thanks,
Andrei.






Re: Elasticsearch integration test failures

2018-06-10 Thread Christian Beikov
I don't see a reason for running ES tests again on a "real" ES instance. 
What does that even mean?


With a local node, we would be testing functionality against almost the 
real thing. The only difference is that a real setup has a cluster, but 
that doesn't change the functionality. So I'm all in on using just a 
local node which is also something I wanted to do for a long time, but 
you got to it before I did.


Do the tests succeed with the local node?


Mit freundlichen Grüßen,
--------
*Christian Beikov*
Am 07.06.2018 um 20:47 schrieb Andrei Sereda:

Hi Volodymyr,

We're trying to bridge the gap between unit and integration tests (which
are run infrequently) using fakes
<https://martinfowler.com/bliki/InMemoryTestDatabase.html> in Calcite.

Currently the plan is to use fongo (see CALCITE-2345 for mongo adapter) and
local elastic node (see CALCITE-2347 for elastic).

During unit tests code will run against fake instances. For IT profile they
will run against real instances (as before).

Regards,
Andrei.


On Thu, Jun 7, 2018 at 2:32 PM, Julian Hyde  wrote:


Unfortunately, I don’t know. We don’t run the integration tests as often
as we should.


On Jun 7, 2018, at 10:20 AM, Vova Vysotskyi  wrote:

Hi all,

I tried to run Elasticsearch integration tests for the current master,

and

many tests failed.

Is it a known problem, or just an issue of my env?

I noticed that one failure appears after the change made in CALCITE-2316.
Most of the other failures are observed for 1.16 version also.

Kind regards,
Volodymyr Vysotskyi






Re: Elasticsearch integration test failures

2018-06-11 Thread Christian Beikov
If you migrate the existing tests to use the local node in that PR, I'd 
consider merging it, provided that Travis CI reports no test failures.



Mit freundlichen Grüßen,
----
*Christian Beikov*
Am 11.06.2018 um 14:33 schrieb Andrei Sereda:

Do the tests succeed with the local node?

IT tests haven't been migrated to local node yet. I wanted to do it as
separate commit.
There are new tests which run against embedded instance successfully. Pls
see PR 716 for details.


I don't see a reason for running ES tests again on a "real" ES instance.

What does that even mean?
The only reason, I see, in connecting to external instance is testing
compatibility with different versions of the engine.

On Mon, Jun 11, 2018 at 7:16 AM, Michael Mior  wrote:


Good point. Since ES is written in Java and we're running the same code
testing on a local node, it's probably not necessary to test against a
separate ES cluster.
--
Michael Mior
mm...@apache.org


Le lun. 11 juin 2018 à 01:44, Christian Beikov 
I don't see a reason for running ES tests again on a "real" ES instance.
What does that even mean?

With a local node, we would be testing functionality against almost the
real thing. The only difference is that a real setup has a cluster, but
that doesn't change the functionality. So I'm all in on using just a
local node which is also something I wanted to do for a long time, but
you got to it before I did.

Do the tests succeed with the local node?


Mit freundlichen Grüßen,
--------
*Christian Beikov*
Am 07.06.2018 um 20:47 schrieb Andrei Sereda:

Hi Volodymyr,

We're trying to bridge the gap between unit and integration tests

(which

are run infrequently) using fakes
<https://martinfowler.com/bliki/InMemoryTestDatabase.html> in Calcite.

Currently the plan is to use fongo (see CALCITE-2345 for mongo adapter)

and

local elastic node (see CALCITE-2347 for elastic).

During unit tests code will run against fake instances. For IT profile

they

will run against real instances (as before).

Regards,
Andrei.


On Thu, Jun 7, 2018 at 2:32 PM, Julian Hyde  wrote:


Unfortunately, I don’t know. We don’t run the integration tests as

often

as we should.


On Jun 7, 2018, at 10:20 AM, Vova Vysotskyi 

wrote:

Hi all,

I tried to run Elasticsearch integration tests for the current

master,

and

many tests failed.

Is it a known problem, or just an issue of my env?

I noticed that one failure appears after the change made in

CALCITE-2316.

Most of the other failures are observed for 1.16 version also.

Kind regards,
Volodymyr Vysotskyi






Re: Elasticsearch integration test failures

2018-06-11 Thread Christian Beikov
For me that's fine, though it would be great if you could fix those, if
your time permits.

Andrei Sereda  schrieb am Mo., 11. Juni 2018, 22:12:

> There are some tests which have been failing for a while. Is it OK if I
> temporarily mark them @Ignore(d)?


Re: Elasticsearch integration test failures

2018-06-11 Thread Christian Beikov
I'm fine with a single PR as long as you split the commits up.

Andrei Sereda wrote on Mon, Jun 11, 2018, 23:24:

> Hi Christian,
>
> I'm working in parallel on CALCITE-2331, evaluation of predicate "(A or B)
> and C" for ES. Can I fix existing bugs as part of that (or a separate)
> commit?
>
> I was planning to do it just not in the same PR.


Re: Calcite travis jdk11 failed while compilation of calcite-elasticsearch2 on travis

2018-06-12 Thread Christian Beikov
It seems the latest Java 11 build is stricter about the manifest 
format when reading. Also see other reports here:


 * https://bugs.eclipse.org/bugs/show_bug.cgi?id=535774
 * 
https://issues.jenkins-ci.org/browse/JENKINS-51871?workflowName=JNJira+%2B+In-Review&stepId=1

Not sure if there is anything you can do. I'd assume other tests that 
use e.g. Netty would fail compilation as well.


Kind regards,
----
*Christian Beikov*
On 12.06.2018 at 14:11, Sergey Nuyanzin wrote:

Hello everyone,

I have an issue related to the compilation of calcite-elasticsearch2 on
Travis. As far as I can see, it affects only JDK 11, while it compiles fine
with the others, e.g. https://travis-ci.org/apache/calcite/builds/391198921

Could anyone please share knowledge on how to overcome this?





Re: Elasticsearch integration test failures

2018-06-12 Thread Christian Beikov
Interesting, I tested both environments back then and it seemed to work. 
It's a good thing that we are about to replace the IT tests with local ES 
nodes. This will finally allow us to run these tests regularly.


Regarding ES2 and ES5, I'd suggest you have a separate 
ElasticsearchProjectRule for each version. In Elasticsearch2Table and 
Elasticsearch5Table you add the rules to the ElasticsearchTableScan 
constructor.
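
For what it's worth, a minimal standalone sketch of the version-dispatch idea (the class names here are illustrative stand-ins, not the actual adapter code): the version-specific table exposes the script prefix it needs, so shared translation code stays version-agnostic.

```java
// Sketch only: the version-specific table tells the shared Project
// translator which script prefix to use.
abstract class AbstractEsTable {
  /** ES 2.x scripted fields use "_source", ES 5.x uses "params._source". */
  protected String scriptedFieldPrefix() {
    return "_source"; // ES2-style default
  }

  String scriptFor(String field) {
    // e.g. "_source.city" (ES2) vs "params._source.city" (ES5)
    return scriptedFieldPrefix() + "." + field;
  }
}

class Es5Table extends AbstractEsTable {
  @Override protected String scriptedFieldPrefix() {
    return "params._source";
  }
}
```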



Kind regards,
----
*Christian Beikov*
On 13.06.2018 at 06:15, Andrei Sereda wrote:

The issue seems to have started since CALCITE-1967 (ES5 support).
ES2 expects keyword _source in scripted fields while ES5 param._source.
See  ElasticsearchProject
<https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/adapter/elasticsearch/ElasticsearchProject.java#L79>
.

Also, ES2 doesn't like it when both script_fields and _source are present in
the query. So if there are some explicit projections, all have to be present
in script_fields.

Any elegant way to differentiate at runtime between ES2 and ES5 inside
ElasticsearchProject?








Re: Calcite travis jdk11 failed while compilation of calcite-elasticsearch2 on travis

2018-06-12 Thread Christian Beikov
I think this might actually be a JDK bug. Let's wait a bit for a new EA 
release.



Kind regards,
----
*Christian Beikov*
On 12.06.2018 at 22:14, Julian Hyde wrote:

What would be the command-line arguments to achieve this? I will add them to my 
nightly build scripts.

I already skip certain Guava versions based on JDK.

Julian




On Jun 12, 2018, at 12:42 PM, Kevin Risden  wrote:

We might want to skip JDK11 with ES 2.x. 2.x is EOL from Elastic. I can
look into this if it helps.

Kevin Risden








Re: ElasticSearch use of rest client (instead of TransportClient)

2018-06-22 Thread Christian Beikov
The original reason for having separate adapters was because the ES 
Java-Client SDKs require different library versions which aren't binary 
compatible. Having separate modules just seemed to be the simplest 
solution. If you can make sure this is not going to be a problem for 
users, I'd be all for unifying the adapters. Changing the dependency and 
the schema factory name are IMO not problematic.



Kind regards,

*Christian Beikov*
On 22.06.2018 at 19:07, Andrei Sereda wrote:

1) If we go the single (and separate) ES adapter route, people will have to
change their existing maven dependencies as well as the ES schema configuration
(at least the SchemaFactory name). I'm not sure if there are any explicit (or
implicit) backwards compatibility policies in calcite. There are (albeit small)
implications for the end-user.

On Fri, Jun 22, 2018 at 12:54 PM Michael Mior  wrote:


1) I personally would be open to this unless there's strong evidence of use
of the ES2 adapter.

2) Calcite already depends on Jackson in core and both ES modules, so this
isn't a concern.
--
Michael Mior
mm...@apache.org



On Fri, Jun 22, 2018 at 12:37, Andrei Sereda wrote:


Some questions regarding this change:

1) Should one remove the ES2 and ES5 adapters (maven modules) in favor of a
single one: just ES? This would be a backwards-incompatible change. Or keep
them as is and create a new module? There is also quite a bit of ES
related code in calcite-core.

2) Since I need to create / parse JSON formats, the ES adapter would have to
depend on some JSON library (most likely the existing Jackson). Is that
acceptable?



On Fri, May 18, 2018 at 4:29 PM Andrei Sereda  wrote:


I believe this shouldn't be an issue with the http client (contrary to the
native transport)

On Fri, May 18, 2018, 16:16 Christian Beikov <christian.bei...@gmail.com>
wrote:


That's mainly because the Java drivers changed in a way that made it
impossible to use the same adapter. I might be wrong, but I think the
ES5 adapter doesn't work with an ES2 server instance, just like the ES2
adapter doesn't work with an ES5+ server instance.

If all of this could just go away, that would be great :)


Kind regards,
--------
*Christian Beikov*
On 18.05.2018 at 21:19, Andrei Sereda wrote:

Yes it should be, since it is just an http client (apache http).
ElasticSearch Rest API (query API) didn't change much
<https://www.elastic.co/guide/en/elasticsearch/reference/5.0/breaking-changes-5.0.html>.

Next question would be : why there is a need in two separate modules
elasticsearch2 and elasticsearch5

On Fri, May 18, 2018 at 3:11 PM, Christian Beikov <
christian.bei...@gmail.com> wrote:


Hey Andrei,

that would be awesome! Do you know by any chance if the low-level client
is also compatible with older ES versions?


Kind regards,
--------
*Christian Beikov*
On 18.05.2018 at 20:45, Andrei Sereda wrote:


Hello,

ES TransportClient is deprecated in 7.0 (to be removed
<https://www.elastic.co/guide/en/elasticsearch/client/java-api/master/transport-client.html>
in 8.0) in favor of http rest client(s). Would you consider a contribution
switching to the Rest Low-Level Client
<https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-low.html>
(which has far fewer dependencies)?


Thanks,
Andrei.








Re: ElasticSearch use of rest client (instead of TransportClient)

2018-06-22 Thread Christian Beikov

Looks great!


Kind regards,

*Christian Beikov*
On 22.06.2018 at 20:24, Michael Mior wrote:

Looks good to me but I'll defer to Christian since I know little about ES.
Thanks for this Andrei!
--
Michael Mior
mm...@apache.org



On Fri, Jun 22, 2018 at 14:13, Andrei Sereda wrote:


CALCITE-2376 <https://issues.apache.org/jira/browse/CALCITE-2376>

Please let me know if you agree with the plan









Re: [2/2] calcite git commit: [CALCITE-2347] running ElasticSearch in embedded mode for unit tests of ES adapter (Andrei Sereda)

2018-06-22 Thread Christian Beikov
Huh? How is that possible? I thought the checks that are run by the 
Travis CI build would catch these kinds of errors.



Kind regards,

*Christian Beikov*
On 22.06.2018 at 23:11, Julian Hyde wrote:

Looks like this change broke “mvn site” (perhaps also “mvn 
javadoc:test-javadoc”).

[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-site-plugin:3.7:site (default-site) on project 
calcite: Error generating maven-javadoc-plugin:3.0.1:test-aggregate report:
[ERROR] Exit code: 1 - 
/home/jhyde/regress/calcite/elasticsearch2/src/test/java/org/apache/calcite/adapter/elasticsearch2/EmbeddedElasticNode.java:29:
 error: package org.elasticsearch.node.internal does not exist
[ERROR] import org.elasticsearch.node.internal.InternalSettingsPreparer;
[ERROR]   ^
[ERROR]


On Jun 21, 2018, at 3:39 AM, mm...@apache.org wrote:

[CALCITE-2347] running ElasticSearch in embedded mode for unit tests of ES 
adapter (Andrei Sereda)

After discussion on dev-list Integration tests (for ES) have been removed. 
They're now
superseded by unit tests (which execute queries against a real elastic instance)

Added local file (zips-mini.json) which contains a small subset of original 
zips.json
(allows to bootstrap tests faster)

Created separate ES JUnit rule which can be re-used across different tests.

Both v2 and v5 of ES adapters are supported.

Close apache/calcite#716


Project: http://git-wip-us.apache.org/repos/asf/calcite/repo
Commit: http://git-wip-us.apache.org/repos/asf/calcite/commit/c12cb4b0
Tree: http://git-wip-us.apache.org/repos/asf/calcite/tree/c12cb4b0
Diff: http://git-wip-us.apache.org/repos/asf/calcite/diff/c12cb4b0

Branch: refs/heads/master
Commit: c12cb4b0de1baa3f7cbb9952ee350fdd1701662d
Parents: 37944bb
Author: Andrei Sereda 
Authored: Thu May 31 18:19:10 2018 -0400
Committer: Michael Mior 
Committed: Thu Jun 21 06:38:50 2018 -0400

--
.../AbstractElasticsearchTable.java |  12 +
.../elasticsearch/ElasticsearchProject.java |  61 ++-
elasticsearch2/pom.xml  |   6 +
.../Elasticsearch2Enumerator.java   |  12 +-
.../elasticsearch2/Elasticsearch2Schema.java|  16 +-
.../elasticsearch2/Elasticsearch2Table.java |   9 +-
.../ElasticSearch2AdapterTest.java  | 395 ++
.../elasticsearch2/EmbeddedElasticNode.java | 147 +++
.../elasticsearch2/EmbeddedElasticRule.java |  97 +
.../org/apache/calcite/test/ElasticChecker.java |  49 +++
.../calcite/test/Elasticsearch2AdapterIT.java   | 270 -
.../resources/elasticsearch-zips-model.json |  50 ---
.../src/test/resources/zips-mini.json   | 149 +++
elasticsearch5/pom.xml  |  31 ++
.../elasticsearch5/Elasticsearch5Schema.java|  17 +-
.../elasticsearch5/Elasticsearch5Table.java |  11 +-
.../ElasticSearch5AdapterTest.java  | 399 +++
.../elasticsearch5/EmbeddedElasticNode.java | 153 +++
.../elasticsearch5/EmbeddedElasticRule.java |  98 +
.../org/apache/calcite/test/ElasticChecker.java |  49 +++
.../calcite/test/Elasticsearch5AdapterIT.java   | 270 -
.../resources/elasticsearch-zips-model.json |  50 ---
elasticsearch5/src/test/resources/log4j2.xml|  16 +
.../src/test/resources/zips-mini.json   | 149 +++
pom.xml |  20 +-
25 files changed, 1866 insertions(+), 670 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/calcite/blob/c12cb4b0/core/src/main/java/org/apache/calcite/adapter/elasticsearch/AbstractElasticsearchTable.java
--
diff --git 
a/core/src/main/java/org/apache/calcite/adapter/elasticsearch/AbstractElasticsearchTable.java
 
b/core/src/main/java/org/apache/calcite/adapter/elasticsearch/AbstractElasticsearchTable.java
index 0980469..8cc5933 100644
--- 
a/core/src/main/java/org/apache/calcite/adapter/elasticsearch/AbstractElasticsearchTable.java
+++ 
b/core/src/main/java/org/apache/calcite/adapter/elasticsearch/AbstractElasticsearchTable.java
@@ -75,6 +75,18 @@ public abstract class AbstractElasticsearchTable extends 
AbstractQueryableTable
 relOptTable, this, null);
   }

+  /**
+   * In ES 5.x scripted fields start with {@code params._source.foo} while in ES 2.x
+   * {@code _source.foo}. Helper method to build the correct query based on the runtime version of elastic.
+   *
+   * @see <a href="https://github.com/elastic/elasticsearch/issues/20068">_source variable</a>
+   * @see <a href="https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-fields.html">Scripted Fields</a>
+   */
+  protected String scriptedFieldPrefix() {
+// this is default

Re: Elasticsearch Adapter. Removal of Mapping Types (by vendor). Index == Table

2018-06-28 Thread Christian Beikov
Is there an API to discover indexes? If there is, I'd suggest we allow a 
config option to make the adapter discover the possible indexes. 
We'd still have to adapt the code a bit, but internally, the schema 
could just keep a cache mapping type names to index names and be able to 
support both scenarios.
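
For illustration, a rough standalone sketch of such discovery, assuming the 2018-era low-level REST client API (the host, port, and parsing step are placeholders, not adapter code):

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class IndexDiscovery {
  public static void main(String[] args) throws Exception {
    try (RestClient client = RestClient.builder(
        new HttpHost("localhost", 9200, "http")).build()) {
      // "/_aliases" returns a JSON object keyed by index name.
      Response response = client.performRequest("GET", "/_aliases");
      String json = EntityUtils.toString(response.getEntity());
      // Parse the top-level keys (e.g. with Jackson) to get the index names,
      // then register one table per discovered index in the schema.
      System.out.println(json);
    }
  }
}
```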



Kind regards,

*Christian Beikov*
On 29.06.2018 at 00:12, Andrei Sereda wrote:

1) What's the time horizon for the current adapter no longer working with these
changes to ES?

The current adapter will keep working for a while with the existing setup. The
problem is nomenclature and ease of use.

Their new SQL concepts mapping
<https://www.elastic.co/guide/en/elasticsearch/reference/current/_mapping_concepts_across_sql_and_elasticsearch.html>
drops the notion of an ES type (which before was the equivalent of an RDBMS
table) and uses the ES index as the new table equivalent (before, an ES index
was equal to a database). Most users use elastic this way (one type, one
index): index == table.

Currently calcite requires a schema per index. In RDBMS parlance, a database
per table (I'd like to change that).


2) Any guess how complicated it would be to maintain code paths for both
behaviours? I know this is probably really challenging to estimate, but I
really have no idea of the scope of these changes. Would it mean two
different ES adapters?

One can have just separate calcite schema implementations (same adapter /
module):
1) LegacySchema (old). A schema can have only one index (but multiple
types). Type == table in this case.
2) NewSchema (new). A single schema can have multiple indexes (type is
dropped). Index == table in this case.


3) Do we really need compatibility with the current version of the adapter?
IMO this depends on what versions of ES we would lose support for and how
complex it would be for users of the current ES adapter to make updates for
any Calcite API changes.

The issue is not in the adapter but in how the calcite schema exposes tables.
Should it expose the index as an individual table (new), or the ES type (old)?

Andrei.

On Thu, Jun 28, 2018 at 5:23 PM Michael Mior  wrote:


Unfortunately I know very little about ES so I'm not in a great position to
assess the impact of these changes. I will say that legacy
compatibility is great, but maintaining two sets of logic is always a
challenge. A few follow-up questions:

1) What's the time horizon for the current adapter no longer working with
these changes to ES?

2) Any guess how complicated it would be to maintain code paths for both
behaviours? I know this is probably really challenging to estimate, but I
really have no idea of the scope of these changes. Would it mean two
different ES adapters?

3) Do we really need compatibility with the current version of the adapter?
IMO this depends on what versions of ES we would lose support for and how
complex it would be for users of the current ES adapter to make updates for
any Calcite API changes.

Thanks for your continued work on the ES adapter Andrei!

--
Michael Mior
mm...@apache.org



Le jeu. 28 juin 2018 à 12:57, Andrei Sereda  a écrit :


Hello,

Elastic announced
<https://www.elastic.co/guide/en/elasticsearch/reference/master/removal-of-types.html>
that they will be deprecating mapping types in ES6, and indexes will be
single-typed only.

The historical analogy <https://www.elastic.co/blog/index-vs-type> between
RDBMS and elastic was that an index is equivalent to a database and a type
corresponds to a table in that database. In a couple of releases (ES6-8)
this shall no longer be true.

The recent SQL addition
<https://www.elastic.co/blog/elasticsearch-6-3-0-released> to elastic
confirms this trend
<https://www.elastic.co/guide/en/elasticsearch/reference/current/_mapping_concepts_across_sql_and_elasticsearch.html>.

An index is equivalent to a table and there are no more ES types.

I would like to propose to include this logic in the Calcite ES adapter, i.e.
expose each ES single-typed index as a separate table inside a calcite
schema. This is in contrast to the current integration, where a schema can
only have a single index. The current approach forces you to create multiple
schemas to query single-typed indexes (on the same ES cluster).

Legacy compatibility can always be controlled with configuration
parameters.

Do you agree with such changes? If yes, would you consider a PR?

Regards,
Andrei.





Re: Elasticsearch Adapter. Removal of Mapping Types (by vendor). Index == Table

2018-06-29 Thread Christian Beikov
IMO the best solution would be to make it configurable by introducing a 
"table_mapping" config with the values

 * type - every type in the known indices is mapped as a table
 * index - every known index is mapped as a table

We'd probably also need a "type_field" configuration for defining which 
field to use for the type determination, as one of the possible future 
ways to do things is to introduce a custom field: 
https://www.elastic.co/guide/en/elasticsearch/reference/master/removal-of-types.html#_custom_type_field_2


We already detect the ES version, so we can set a smart default for this 
setting. Let's make the index config param optional.

 * When no index is given, we discover indexes; the default for
   "table_mapping" then is "index"
 * When an index is given, we only discover types according to the
   "type_field" configuration, and the default for "table_mapping" is "type"

This would also allow discovering indexes but still using "type" as the 
"table_mapping".


What do you think?

Mit freundlichen Grüßen,

*Christian Beikov*
Am 29.06.2018 um 02:41 schrieb Andrei Sereda:

Yes. There is an API to list all indexes / types in elastic. They can be
automatically imported into a schema.

What needs to be agreed upon is how to expose those elements in a calcite
schema (naming / behaviour).

1) Many (most?) setups are single type per index. The natural way to name
would be "elastic.$index" (elastic being the schema name). Multiple indexes
would be under the same schema: "elastic.index1", "elastic.index2", etc.

2) What if an index has several types: should they be exported as calcite
tables "elastic.$index_type1", "elastic.$index_type2"? Or (current behaviour)
as "elastic.type1" and "elastic.type2"? Or as a subschema
"elastic.$index.type1"?

Now what if one has a combination of (1) and (2)?
Setup (2) is already deprecated (and will be unsupported in the next version).



Re: Elasticsearch Adapter. Removal of Mapping Types (by vendor). Index == Table

2018-06-29 Thread Christian Beikov
I'm not sure what the benefit of allowing users to specify this scheme 
would be. We'd have to parse it, interpret it, make sure the expressions 
don't result in conflicting names, etc.

IMO a simple mode configuration would be way easier to implement and 
would probably cover 99% of the use cases.



Kind regards,
--------
*Christian Beikov*
On 29.06.2018 at 20:19, Julian Hyde wrote:

Andrei,

I'm not an ES user so I don't fully understand this issue, but my two
cents anyway...

Can you show how those examples affect SQL against the ES adapter
and/or how they affect JSON models?

You seem to be using '_' as a separator character. Are we sure that
people will never use it in index or type name? Separator characters
often cause problems.

Julian




On Fri, Jun 29, 2018 at 10:58 AM, Andrei Sereda  wrote:

I agree there should be a configuration option. How about the following
approach.

Expose both variables ${index} and ${type} in the configuration (JSON) and the
user will use them to generate the table name in the calcite schema.

Example
"table_name": "${type}" // current
"table_name": "${index}" // new (default?)
"table_name": "${index}_${type}" // most generic. supports multiple types
per index





On Fri, Jun 29, 2018 at 9:26 AM Michael Mior  wrote:


I think it sounds like you and Andrei are in a good position to tackle this
one so I'm happy to have you both work on whatever solution you think is
best.

--
Michael Mior
mm...@apache.org




Re: Elasticsearch Adapter. Removal of Mapping Types (by vendor). Index == Table

2018-06-30 Thread Christian Beikov
I like the idea of the regex filter; it might be cool to have something 
like that in general for all adapters, but it's fine if you do it just 
for ES now. I guess you are considering include and exclude pattern 
parameters?
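
A minimal standalone sketch, assuming hypothetical include/exclude regexp operands (none of this is actual adapter code):

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class IndexFilter {
  private final Pattern include; // null => include everything
  private final Pattern exclude; // null => exclude nothing

  IndexFilter(String includeRegex, String excludeRegex) {
    this.include = includeRegex != null ? Pattern.compile(includeRegex) : null;
    this.exclude = excludeRegex != null ? Pattern.compile(excludeRegex) : null;
  }

  /** Keeps only the discovered index names that should be exposed as tables. */
  List<String> filter(List<String> discoveredIndexes) {
    return discoveredIndexes.stream()
        .filter(n -> include == null || include.matcher(n).matches())
        .filter(n -> exclude == null || !exclude.matcher(n).matches())
        .collect(Collectors.toList());
  }
}
```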


I'm more for a mode parameter and not letting the user decide the name 
explicitly. Either the types or the index names will have to have 
meaningful unique names, or the user will have to map certain indexes to 
a different schema. IMO that's a good solution.



Kind regards,
--------
*Christian Beikov*
On 30.06.2018 at 16:43, Andrei Sereda wrote:

Christian / Michael,

Can you please weigh in on your preferred solution and I'll implement it.

One more question. Sometimes it is nice to be able to filter (limit) the
indexes (tables) exposed by calcite. Say my cluster has 10 indexes but I
want the user to query only one. Would you be opposed if I add a configuration
parameter which allows specifying a (e.g. regexp) filter for ES indexes?


On Fri, Jun 29, 2018 at 11:17 PM Andrei Sereda  wrote:


That's a reasonable alternative.

On Fri, Jun 29, 2018 at 7:57 PM Julian Hyde  wrote:


Maybe there could be a separator char as one of the adapter’s parameters.
People should choose a value, say ‘$’ or ‘#’, that is legal in an unquoted
SQL identifier but does not occur in any of their index or type names.

If not specified, the adapter would end up in a simple mode, say looking
for indexes first, then looking for types, and people would need to make
sure indexes and types have distinct names. After the transition to
single-type indexes, people could stop using the parameter.

Julian



On Jun 29, 2018, at 4:43 PM, Andrei Sereda  wrote:

That's a valid point. Then the user would define a different pattern like
"i$index_t$type" for his cluster.

I think we should first answer whether such scenarios should be supported
by calcite (given that they're already deprecated by the vendor). If yes,
what should the collision strategy be? A user-defined pattern like above, or
failure, or an auto-generated name?

On Fri, Jun 29, 2018, 19:14 Julian Hyde  wrote:


In elastic, the (index/type) pair is guaranteed to be unique, therefore
"${index}_${type}" will also be unique (as a string). This is only necessary
when we have several types per index. A valid question is whether the user
should be allowed such flexibility.

Uniqueness is not my concern.

Suppose there is an index called "x_y" with a type called "z", and
another index called "x" with a type called "y_z". If I write "x_y_z"
it's not clear how it should be broken into index/type.


On Fri, Jun 29, 2018 at 3:15 PM, Andrei Sereda wrote:

Can you show how those examples affect SQL against the ES adapter
and/or how they affect JSON models?

The discussion is about how to properly bridge the (index/type) concept from
ES into the relational world. The proposal to use placeholders ($index /
$type) affects only how a table is named in calcite. They're not used as SQL
literals, i.e. it affects only the configuration phase of the schema.
Pretty much we're doing a string replace to derive the table name from
($index/$type).

You seem to be using '_' as a separator character. Are we sure that
people will never use it in index or type name? Separator characters
often cause problems.

In elastic, the (index/type) pair is guaranteed to be unique, therefore
"${index}_${type}" will also be unique (as a string). This is only necessary
when we have several types per index. A valid question is whether the user
should be allowed such flexibility.




Re: [VOTE] [CALCITE-490] Remove "End File.java" comments from the end of files

2018-09-10 Thread Christian Beikov

+1

Am 10.09.2018 um 10:49 schrieb Vladimir Sitnikov:

Hi,

Could we have a vote on coding style regarding "// End File.java" in
Calcite source files?
"//End..." comments clutter Git history, they consume screen space, and
they slow down the development by causing Checkstyle violations.

Could we vote on the style regarding "//End File.java"

[ ] +1, remove the ending "// End File.java" comment, ensure files end with
single blank line
[ ] -1, keep "// End File.java" since ...


https://www.apache.org/foundation/voting.html#votes-on-code-modification
Whole numbers are recommended for this type of vote, as the opinion being

expressed is Boolean: 'I approve/do not approve of this change.'

...
A veto without a justification is invalid and has no weight.


Note that back in 2014 Julian mentioned 5 thoughts which are questionable.

Julian>And it helps ensure that the file ends with a line ending.

This is just false. The comment line has nothing to do with the line ending.
A rule of "file must end with a single blank line" can be verified without a
"dedicated comment line".

Julian>It indicates the name of the file (useful if the file has been
renamed).

If a file is renamed, then the ending comment HAS to be updated, otherwise
Checkstyle fails the build.
In other words, the comment always matches the "current" file name, thus it
cannot help detect renames.

Julian>The comment makes it easy to see that you are looking at the last
page of a file in an editor.

This is true, however it does sound like a very rare use-case for me.
Editors often show file name on their own (tab name, application title,
navigation gutter).
Of course, certain UI elements in the IDE can be disabled, however I still
think the IDE is much better at telling you which file you are looking at than
a comment in the very last line of the file.

If we follow that logic, should we add a "filename comment" after every 25
lines so one always knows which file is open? That is just nonsense.

Julian>It verifies that the file has not been truncated

If the file is truncated, the compiler will bail out. A unit test could
detect that as well.
On the other hand, the "ending comment" does not protect from an accidental
cut from the middle of the file.

Julian>and provides a buffer against truncation issues.

I don't get it, however it mentions truncation, thus I assume that is not a
valid justification. Truncation is to be managed by Git / code review /
compiler / test suite.

Vladimir



Re: scrolling in ElasticSearch adapter

2018-10-24 Thread Christian Beikov

Hey,

not sure if this should be an SQL keyword. JDBC specifies various 
constants that can be used at statement creation time: 
https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html


Not sure though if or how these configurations are accessible for data 
stores or dialects, but IMO using these would be the proper way.


Regards

Christian

Am 24.10.2018 um 18:44 schrieb Andrei Sereda:

Hello,


I was thinking about adding [scrolling functionality](
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html)
to the elastic search adapter. Since scrolling has a non-negligible effect on
the cluster, it should be selectively enabled on a per-query basis. So,
likely, the user has to explicitly set a "scroll flag" somewhere.

Most natural way seems in SQL. [Calcite sql grammar](
https://calcite.apache.org/docs/reference.html) has `SCROLL` keyword
(unused to my knowledge). There were also discussions about adding hints to
Calcite.

### Examples
```sql
-- special sql keyword ?
SCROLL select * from elastic;

-- assuming hints are available in calcite
/* HINT: scroll */ select * from elastic;
```

What people think about this use-case ? Are there better ideas ?

Regards,
Andrei.



Re: scrolling in ElasticSearch adapter

2018-10-24 Thread Christian Beikov
In JDBC one can configure a fetch size which would reflect the number of 
rows to be fetched initially, but also subsequently. 
https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)


According to what you are writing, ES behavior is what 
TYPE_SCROLL_INSENSITIVE would do, i.e. provide a snapshot view that isn't 
affected by changes.


IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2, R3, R4, 
... and view R1, R2, then R3 is deleted and you fetch the next rows, you 
wouldn't see R3.


According to the JDBC spec 
(https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int)) 
you don't have to support all modes. Usually, user code doesn't use 
scrolling that much, but at least forward scrolling makes sense.
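
To make the JDBC side concrete, a small sketch of what a caller would write (the connection URL, table name, and fetch size are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ScrollExample {
  public static void main(String[] args) throws Exception {
    try (Connection c =
             DriverManager.getConnection("jdbc:calcite:model=model.json");
         // Snapshot semantics, matching the ES scroll behavior described above.
         Statement s = c.createStatement(
             ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY)) {
      s.setFetchSize(500); // driver hint: rows per round trip
      try (ResultSet rs = s.executeQuery("select * from \"elastic\".\"zips\"")) {
        while (rs.next()) {
          System.out.println(rs.getString(1));
        }
      }
    }
  }
}
```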


On 24.10.2018 at 21:38, Andrei Sereda wrote:

Hi Julian,

Scrolling (in elastic) does not only mean “open a cursor” but also iterating
over a consistent snapshot. From the docs:

The results that are returned from a scroll request reflect the state of
the index at the time that the initial search request was made, like a
snapshot in time. Subsequent changes to documents (index, update or delete)
will only affect later search requests.

So pagination (fetch / offset) can’t exactly replicate this functionality.

The problem with scrolling (in elastic) is that it is expensive and can’t
(shouldn’t) be enabled by default.

There is one more “issue”. Currently select * from elastic returns at most
10 rows (in calcite). This is consistent with elastic behaviour, which
limits the result set to 10 documents (unless a size is specified). When
returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
<https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE>
or a SQL hint), does it mean returning the whole elastic index? I’m not at
ease with returning different results based on hints or cursor settings.

Andrei.

On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde  wrote:


It seems to me that Elasticsearch scroll means return a cursor - a
collection of rows that you iterate over, and you may not read all of them.
This is the default operation of JDBC.

So, I guess we need to give the user a way to signal their intent to read
all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS
hints[1] seem close to this. We would want the hints to be acted upon by
both the optimizer and the JDBC transport.

Related is pagination. SQL has FETCH and OFFSET, which allow you to
retrieve different pieces of a large result set in separate statements or
(using query parameters) executions. It would be useful if the server could
be given a hint to cache a statement across page requests.

Julian

[1]
https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924





Re: scrolling in ElasticSearch adapter

2018-10-25 Thread Christian Beikov

Hey Andrei,

I don't have an answer for how you can access these settings from within 
the adapter, nor how one could do that via RelNodes, but the suggestion to 
use DataContext for that purpose sounds reasonable. Maybe someone else 
has an idea?


Anyway, since these are settings that don't affect the general semantics 
of the query/statement and also usually require a special API to be 
used, I'd rather see these aspects not end up in the query string.


On 25.10.2018 at 02:15, Andrei Sereda wrote:

Christian,

I like TYPE_SCROLL_INSENSITIVE / fetchSize in PreparedStatement
generally, but have some reservations (questions):

How to pass resultSetType / fetchSize from a PreparedStatement to RelNodes?
What if the user doesn’t use JDBC (eg. RelBuilders)?
On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov
 wrote:

In JDBC one can configure a fetch size which would reflect the amount of
rows to be fetched initially, but also subsequently.
https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)

According to what you are writing, ES behvior is what
TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view that isn't
affected by changes.

IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2, R3, R4,
... and view R1, R2, then R3 is deleted and you fetch the next rows, you
wouldn't see R3.

According to the JDBC spec
(https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int)
) you don't have to support all modes. Usually, user code doesn't use
scrolling that much, but at least forward scrolling makes sense.

Am 24.10.2018 um 21:38 schrieb Andrei Sereda:

Hi Julian,

Scrolling (in elastic) does not only mean “open a cursor” but also iterate
over consistent snapshot. From docs:

The results that are returned from a scroll request reflect the state of
the index at the time that the initial search request was made, like a
snapshot in time. Subsequent changes to documents (index, update or delete)
will only affect later search requests.

So pagination (fetch / offset) can’t exactly replicate this functionality.

The problem with scrolling (in elastic) is that it is expensive and can’t
(shouldn’t) be enabled it by default.

There is one more “issue”. Currently select * from elastic returns at most
10 rows (in calcite). This is consistent with elastic behaviour which
limits result set to 10 documents (unless size is specified). When
returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
<https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE>
or SQL hint) does it mean return whole elastic index ? I’m not at ease with
returning different results based on hints or cursor settings.

Andrei.

On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde  wrote:


It seems to me that Elasticsearch scroll means return a cursor - a
collection of rows that you iterate over, and you may not read all of them.
This is the default operation of JDBC.

So, I guess we need to give the user a way to signal their intent to read
all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS
hints[1] seem close to this. We would want the hints to be acted upon by
both the optimizer and the JDBC transport.

Related is pagination. SQL has FETCH and OFFSET, which allow you to
retrieve different pieces of a large result set in separate statements or
(using query parameters) executions. It would be useful if the server could
be given a hint to cache a statement across page requests.
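
For example, a hedged sketch of paging with parameterized OFFSET/FETCH; the table and ordering column are placeholders, and whether dynamic parameters are accepted in that position depends on the backend:

```java
import java.sql.PreparedStatement;
import java.sql.ResultSet;

String sql = "select * from elastic order by id"
    + " offset ? rows fetch next ? rows only";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
  ps.setInt(1, page * pageSize); // rows to skip
  ps.setInt(2, pageSize);        // page size
  try (ResultSet rs = ps.executeQuery()) {
    while (rs.next()) {
      // process one page of the result set
    }
  }
}
```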

Julian

[1]
https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924


On Oct 24, 2018, at 11:19 AM, Christian Beikov <

christian.bei...@gmail.com> wrote:

Hey,

not sure if this should be an SQL keyword. JDBC specifies various
constants that can be used at statement creation time:
https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html

Not sure though if or how these configurations are accessible for data
stores or dialects, but IMO using these would be the proper way.

Regards

Christian


Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
Hello,

I was thinking about adding [scrolling functionality](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html)
to the elastic search adapter. Since scrolling has a non-negligible effect on
the cluster, it should be selectively enabled on a per-query basis. So, likely,
the user has to explicitly set a "scroll flag" somewhere.

The most natural way seems to be in SQL. The [Calcite sql grammar](https://calcite.apache.org/docs/reference.html)
has a `SCROLL` keyword (unused, to my knowledge). There were also discussions
about adding hints to Calcite.

### Examples
```sql
-- special sql keyword ?
SCROLL select * from elastic;

-- assuming hints are available in calcite
/* HINT: scroll */ select * from elastic;
```

What do people think about this use case? Are there better ideas?

Regards,
Andrei.



Re: [DISCUSS] Move gitbox notification emails to another list?

2019-02-27 Thread Christian Beikov

Hey,

Yeah, just make those emails stop, please. They are pretty annoying.

On 27.02.2019 at 11:13, Francis Chuang wrote:

Hey all,

I wanted to gauge your opinions regarding the gitbox emails being sent 
to the dev@ list. I am finding these emails to be quite noisy (there 
is an email every time there is activity in our Github repos). I end 
up deleting most of these emails without reading them and I feel that 
there is a lot of noise compared to signal.


How do you guys feel about moving these to another list (for example 
git...@calcite.apache.org)? As Vladimir mentioned in another thread, 
these emails are quite useful and important as they serve as a 
searchable archive of activity in our Github repos.


If we do move the emails to a different list, what do we do with the 
emails that have been sent to dev@? Do we nominate someone to forward 
a copy to the new list?


Francis


EnumerableTableScan array/multiset handling

2024-05-24 Thread Christian Beikov

Hello,

in my recent experiments I ran into some issues when trying to unnest an 
array of struct.


The query is roughly this: select t.id, e.value1 from MyTable t, 
unnest(t.structArray) e


EnumerableTableScan#fieldExpression will then try to generate code that 
converts the value of the "structArray" column to a List, which is 
where the problems start to arise. This code does not seem to be tested 
at all, because it currently generates a compile error, due to a missing 
cast to "Iterable". It also assumes the data is already available in the 
JavaRowFormat.CUSTOM representation, but AFAIU, it could be in any format.


When using RelRecordType for structs, regular struct columns seem to 
expect JavaRowFormat.ARRAY, but struct arrays don't seem to behave the 
same way.


What is the expected data format that an enumerator should return for 
struct arrays that are typed as RelRecordType?


To support formats per type it might be nice to allow specifying the 
JavaRowFormat on RelDataType. Wdyt?


Also, is there a way to allow working with custom Java types for 
table/struct rows? From looking at AbstractCursor#createAccessor, it 
seems the Avatica code currently only works with classes that expose 
public fields.
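
For context, a minimal example of the row shape that does work today - a plain class with public fields, as the reflective/accessor code expects; the class and fields here are made up:

```java
// Works with the current accessor code: public fields, no getters required.
public class Department {
  public final int deptno;
  public final String name;

  public Department(int deptno, String name) {
    this.deptno = deptno;
    this.name = name;
  }
}
```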


Regards,

Christian


Re: EnumerableTableScan array/multiset handling

2024-06-03 Thread Christian Beikov

Hi Julian,

it seems to me that 
org.apache.calcite.rel.type.RelDataTypeFactoryImpl.JavaType, which is a 
subtype of RelDataType, would be the best place to model this. How about 
I add an Accessor contract and a field to JavaType to take care of 
producing expressions for the enumerable adapter? By default, this would 
use an implementation based on public fields or record accessors, 
depending on the class type. In addition, I would also add an implementation 
that works by calling getter methods, since that is what I need 
for my purposes.
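
A rough sketch of the proposed contract, purely for illustration - neither the interface nor the getter-based variant exists today, and all names are hypothetical:

```java
import org.apache.calcite.linq4j.tree.Expression;
import org.apache.calcite.linq4j.tree.Expressions;
import org.apache.calcite.rel.type.RelDataTypeField;

/** Hypothetical contract: produces an expression that reads a field from a row. */
public interface Accessor {
  Expression access(Expression row, RelDataTypeField field);
}

/** Getter-based variant: generates row.getFoo() instead of row.foo. */
class GetterAccessor implements Accessor {
  @Override public Expression access(Expression row, RelDataTypeField field) {
    String name = field.getName();
    String getter = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
    return Expressions.call(row, getter); // linq4j method-call expression
  }
}
```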


Wdyt?

Regards,

Christian

On 31.05.2024 at 03:02, Julian Hyde wrote:

Thanks for doing these experiments, Christian, and documenting what you found.

I think you’re running into the limitations of ReflectiveSchema. It works with 
POJOs (java classes with public fields) but hasn’t been tested for richer 
variations and therefore just doesn’t work. In many cases, it can be fixed 
(see https://issues.apache.org/jira/browse/CALCITE-6244 for example).

I’m uneasy about extending RelDataType to return JavaRowFormat. That seems to 
be introducing a physical property into a logical data type; also, it is 
surfacing the details of one particular adapter (enumerable). Maybe there is a 
way to surface this information via annotations or the java.sql.Wrapper 
interface without extending RelDataType.

Julian



On May 24, 2024, at 11:33 AM, Christian Beikov  
wrote:

Hello,

in my recent experiments I ran into some issues when trying to unnest an array 
of struct.

The query is roughly this: select t.id, e.value1 from MyTable t, 
unnest(t.structArray) e

EnumerableTableScan#fieldExpression will then try to generate code that converts the value of the 
"structArray" column to a List, which is where the problems start to arise. This 
code does not seem to be tested at all, because it currently generates a compile error, due to a missing 
cast to "Iterable". It also assumes the data is already available in the JavaRowFormat.CUSTOM 
representation, but AFAIU, it could be in any format.

When using RelRecordType for structs, regular struct columns seem to expect 
JavaRowFormat.ARRAY, but struct arrays don't seem to behave the same way.

What is the expected data format that an enumerator should return for struct 
arrays that are typed as RelRecordType?

To support formats per type it might be nice to allow specifying the 
JavaRowFormat on RelDataType. Wdyt?

Also, is there a way to allow working with custom Java types for table/struct 
rows? From looking at AbstractCursor#createAccessor, it seems the Avatica code 
currently only works with classes that expose public fields.

Regards,

Christian

Re: Support for mv rewrite between different join types

2024-08-13 Thread Christian Beikov

Hi,

I'd be very curious to see this being implemented. I wanted to tackle 
this in the past as well, but was unable to fully understand the 
papers in that area and hence failed to implement this properly. Nice to 
see it being picked up again.


Regards,

Christian

On 13.08.2024 at 18:50, suibianwanwan wrote:

Hello, everyone. As the use of materialized views becomes more and more 
widespread, there will likely be more and more users of materialized view 
rewriting.
Systems such as Oracle and Starrocks, which previously rewrote only on the 
basis of SPJG, have gradually added support for rewriting across different 
join types, i.e. SPOJG (O meaning outer join).
I think we can build on UnifyRule by adding rewrites between different join 
types.
I actually proposed this in a PR about a year ago when I was new to calcite and 
didn't know enough about it then, but now I think I'm ready to start over, and 
before I do so again, I'd like to hear your thoughts and have a discussion!

In addition, I just joined the dev list and apologize for asking questions and 
starting discussions in jira issues in the past without understanding the community 
rules. Thank you all for your help!

Thank you,
suibianwanwan


[jira] [Created] (CALCITE-3810) Implement Rel-to-Sql translation for ANTI and SEMI join

2020-02-20 Thread Christian Beikov (Jira)
Christian Beikov created CALCITE-3810:
-

 Summary: Implement Rel-to-Sql translation for ANTI and SEMI join
 Key: CALCITE-3810
 URL: https://issues.apache.org/jira/browse/CALCITE-3810
 Project: Calcite
  Issue Type: Improvement
  Components: core
Affects Versions: 1.21.0
Reporter: Christian Beikov


Currently, when constructing an ANTI or SEMI join in the relational algebra, 
it's not possible to render it to a SQL AST.





[jira] [Created] (CALCITE-3840) Re-aliasing of VALUES that has column aliases produces wrong SQL in the JDBC adapter

2020-03-04 Thread Christian Beikov (Jira)
Christian Beikov created CALCITE-3840:
-

 Summary: Re-aliasing of VALUES that has column aliases produces 
wrong SQL in the JDBC adapter
 Key: CALCITE-3840
 URL: https://issues.apache.org/jira/browse/CALCITE-3840
 Project: Calcite
  Issue Type: Bug
  Components: jdbc-adapter
Affects Versions: 1.21.0
Reporter: Christian Beikov


Rendering a VALUES relnode to e.g. PostgreSQL will produce {{FROM 
(VALUES((1))) AS t(col_alias)}} where "t" is a static alias. When e.g. joining 
with such a VALUES, the RelToSqlConverter tries to re-alias this with a unique 
alias, but fails because it produces {{FROM (VALUES((1))) AS t(col_alias) AS 
newAlias}}.

The fix is to replace the static table alias instead.





[jira] [Created] (CALCITE-1940) Implement dialect specific support for sequences

2017-08-13 Thread Christian Beikov (JIRA)
Christian Beikov created CALCITE-1940:
-

 Summary: Implement dialect specific support for sequences
 Key: CALCITE-1940
 URL: https://issues.apache.org/jira/browse/CALCITE-1940
 Project: Calcite
  Issue Type: Improvement
Reporter: Christian Beikov
Assignee: Julian Hyde


The Calcite parser and validator already support sequences, but the push-down 
to the JDBC level is currently limited. Since sequences are not supported by 
all DBMSs, every SQL dialect should have the possibility to render its own way 
of extracting sequence information or incrementing the value.
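
A hypothetical sketch of what such a dialect hook could look like; no such method exists on SqlDialect today, it only illustrates the idea:

```java
import org.apache.calcite.sql.SqlDialect;
import org.apache.calcite.sql.SqlWriter;
import org.apache.calcite.sql.dialect.PostgresqlSqlDialect;

// Hypothetical hook, for illustration only.
public class SequenceAwarePostgresqlDialect extends PostgresqlSqlDialect {
  public SequenceAwarePostgresqlDialect(SqlDialect.Context context) {
    super(context);
  }

  /** Would render NEXT VALUE FOR mySeq as PostgreSQL's nextval('mySeq'). */
  public void unparseSequenceNextVal(SqlWriter writer, String sequenceName) {
    writer.print("nextval('" + sequenceName + "')");
  }
}
```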





[jira] [Created] (CALCITE-1965) Support outer joins for materialized views

2017-08-24 Thread Christian Beikov (JIRA)
Christian Beikov created CALCITE-1965:
-

 Summary: Support outer joins for materialized views
 Key: CALCITE-1965
 URL: https://issues.apache.org/jira/browse/CALCITE-1965
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Christian Beikov
Assignee: Julian Hyde


Currently, only inner joins are supported for materialized view substitutions. 
Support for outer joins involves creating new pulled-up predicates that 
represent the semantics of the join. For a join predicate like 
"a.id = b.id", the inner join just pulls up that predicate. For a left 
join like e.g. {{select * from a left join b on a.id = b.id}}, the actual 
pulled-up predicate would be {{OR(=(a.id, b.id),ISNULL(b.id))}}. For a right 
join it would be {{OR(=(a.id, b.id),ISNULL(a.id))}}, and for a full outer join 
it would be {{OR(=(a.id, b.id),ISNULL(a.id),ISNULL(b.id))}}.
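
For illustration, a hedged RelBuilder sketch of constructing the pulled-up predicate for the left-join case, assuming both join inputs are on the builder's stack and expose an "id" field:

```java
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.tools.RelBuilder;

// a.id = b.id, referencing input 0 (a) and input 1 (b) of the two-input join
RexNode joinCond = relBuilder.equals(
    relBuilder.field(2, 0, "id"),
    relBuilder.field(2, 1, "id"));
// left join: OR(=(a.id, b.id), ISNULL(b.id))
RexNode pulledUp = relBuilder.or(
    joinCond,
    relBuilder.isNull(relBuilder.field(2, 1, "id")));
```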





[jira] [Created] (CALCITE-1966) Support for normal views to act as materialization table

2017-08-24 Thread Christian Beikov (JIRA)
Christian Beikov created CALCITE-1966:
-

 Summary: Support for normal views to act as materialization table
 Key: CALCITE-1966
 URL: https://issues.apache.org/jira/browse/CALCITE-1966
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Christian Beikov
Assignee: Julian Hyde


Currently, it seems a materialized view can only have its materialization in a 
real table, although it might be nice to be able to use a view instead. Since 
external data stores like e.g. elasticsearch might have different data 
representations, the mismatch has to be overcome by doing casts in a view.





[jira] [Created] (CALCITE-1967) Add support for elastic search 5

2017-08-24 Thread Christian Beikov (JIRA)
Christian Beikov created CALCITE-1967:
-

 Summary: Add support for elastic search 5
 Key: CALCITE-1967
 URL: https://issues.apache.org/jira/browse/CALCITE-1967
 Project: Calcite
  Issue Type: Improvement
Reporter: Christian Beikov
Assignee: Julian Hyde


The elastic search adapter seems to target versions before 5.x, but it would be 
nice to also have an adapter for ES 5+, so I created one that is quite similar to 
the existing adapter.





[jira] [Created] (CALCITE-1971) Use embedded elastic search in tests

2017-08-29 Thread Christian Beikov (JIRA)
Christian Beikov created CALCITE-1971:
-

 Summary: Use embedded elastic search in tests
 Key: CALCITE-1971
 URL: https://issues.apache.org/jira/browse/CALCITE-1971
 Project: Calcite
  Issue Type: Improvement
Reporter: Christian Beikov
Assignee: Julian Hyde


As discussed in CALCITE-1967, we should use the embedded elastic search server 
instead, to be able to run integration tests locally.


