RelNode rewrite
Hello, I'm trying to generate SQL for a trigger, which requires a small rewrite of a RelNode structure, and I wanted to know if anyone could help me figure out a few things:

* I need to swap the left and right sides of a LogicalJoin. Doing that messes up predicates and projections, though. Is there a utility I could use, or how would I do that?
* I need to introduce an "alias" like "NEW" into a RelNode, for which I want to create a RexNode "NEW.key1". I want to add a WHERE condition based on the NEW alias. Not sure how I would do that though; on the SqlNode level only, maybe?
* Can I somehow determine which columns of a table are unique or part of the primary key? I want to create a predicate like (NEW.key1, NEW.key2) = (existing.key1, existing.key2).

Thanks in advance!

Regards,
Christian
Re: RelNode rewrite
Hey Jess!

Am 14.02.2020 um 02:16 schrieb Jess Balint:
> > * I need to swap left and right of a LogicalJoin. Doing that messes up predicates and projections though. Is there a utility I could use or how would I do that?
>
> You can use JoinCommuteRule, or check its code for your needs.

Thanks for the hint. I just hope the re-projection isn't confusing for the DBMS optimizer. It would be cool if a RelNode could be rewritten bottom-up to avoid this. Or is there some kind of Rule I can apply to optimize the re-projection away?

> > * I need to introduce an "alias" like "NEW" into a RelNode for which I want to create a RexNode "NEW.key1". I want to add a WHERE condition based on the NEW alias.
>
> I don't understand what you're after here. The WHERE condition will be a filter node. Can you just add it by attaching something to the rel node?

So in a SQL trigger, the aliases "NEW" and "OLD" are available, which I want to reference in a correlation predicate. I essentially want to correlate the RelNode of the changed table with the "NEW" alias, and I try to do that through the primary key. Let's consider the example query "select * from table1 t". For a trigger on "table1" I essentially want to create the query "select * from table1 t where (t.id) = (NEW.id)".

This is what I have so far; maybe you could point me in the right direction? The uniqueKeys are unfortunately null, although the node is a JdbcTableScan.
private RexNode createNewCorrelatedJoinCondition(LogicalJoin join, RelNode node) {
    Set<ImmutableBitSet> uniqueKeys = node.getCluster().getMetadataQuery().getUniqueKeys(node);
    ImmutableBitSet uniqueFieldBitSet = uniqueKeys.iterator().next();
    List<RelDataTypeField> fieldList = node.getRowType().getFieldList();
    RelBuilder relBuilder = RelBuilder.create(config);
    RexNode condition = join.getCondition();
    List<RexNode> conjuncts = new ArrayList<>(uniqueFieldBitSet.size() + 1);
    conjuncts.add(condition);
    for (Integer index : uniqueFieldBitSet) {
        conjuncts.add(
            relBuilder.equals(
                relBuilder.field("NEW", fieldList.get(index).getName()),
                relBuilder.field(fieldList.get(index).getIndex())));
    }
    return RexUtil.composeConjunction(rexBuilder, conjuncts);
}

Thanks,
Christian
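The per-column conjuncts built in the snippet above boil down to a tuple comparison over the key columns. As a plain-Java illustration of that folding (not Calcite API; the extractor list stands in for the unique-key bit set, and int[] rows are a stand-in for tuples), the composed predicate looks like:

```java
import java.util.List;
import java.util.Objects;
import java.util.function.BiPredicate;
import java.util.function.Function;

public class KeyEquality {
    // Fold one equality check per key column into a single conjunction,
    // mirroring what RexUtil.composeConjunction does over the key-field
    // equality RexNodes.
    static <T> BiPredicate<T, T> keyEquality(List<Function<T, Object>> keyExtractors) {
        return (a, b) -> keyExtractors.stream()
                .allMatch(f -> Objects.equals(f.apply(a), f.apply(b)));
    }

    public static void main(String[] args) {
        // key columns 0 and 1; column 2 is non-key payload
        List<Function<int[], Object>> keys = List.of(r -> r[0], r -> r[1]);
        BiPredicate<int[], int[]> eq = keyEquality(keys);
        System.out.println(eq.test(new int[]{1, 2, 9}, new int[]{1, 2, 7})); // true
        System.out.println(eq.test(new int[]{1, 2, 9}, new int[]{1, 3, 9})); // false
    }
}
```

The point of composing a conjunction rather than a row-value comparison is that each conjunct can be rendered independently, which matches how the RexNode tree is unparsed.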
Correlation with a variable
Hello,

I asked this before, but I guess the question got too big, so I thought splitting it up might be better.

I am trying to generate a query from a relational model on which I did a few transformations, but I don't know how to refer to a "variable". In a SQL trigger, there are usually two variables, "OLD" and "NEW", which I want to be able to refer to. I tried introducing a "transient scan", but unfortunately that won't work, because this is not a relation and can't be quoted. I will work around this for now by introducing a temporary relation in the trigger so that I can refer to it, but ideally I want to refer to the variable directly.

The simplest example SQL that I want to be able to produce would look like this:

select NEW."some_column"

The tricky part here is that NEW is not quoted. I don't know how I can represent this in a relational expression.

Thanks in advance for any help!

Regards,
Christian
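One way to picture the quoting problem described here: the unparser treats every identifier uniformly, while trigger variables would need a small allow-list that escapes quoting. A hypothetical sketch (plain Java, not Calcite's SqlDialect API; the names are illustrative):

```java
import java.util.Set;

public class TriggerVarQuoting {
    // Trigger variables must stay unquoted, or e.g. PostgreSQL reports
    // that no relation with that name exists.
    static final Set<String> TRIGGER_VARS = Set.of("NEW", "OLD");

    // Render a qualified column reference, quoting ordinary identifiers
    // but leaving trigger variables bare.
    static String qualifiedRef(String qualifier, String column) {
        String q = TRIGGER_VARS.contains(qualifier) ? qualifier : "\"" + qualifier + "\"";
        return q + ".\"" + column + "\"";
    }

    public static void main(String[] args) {
        System.out.println(qualifiedRef("NEW", "some_column")); // NEW."some_column"
        System.out.println(qualifiedRef("t", "some_column"));   // "t"."some_column"
    }
}
```

In Calcite the decision would have to live wherever identifiers are unparsed, which is why the thread below ends up discussing a new kind of RexNode rather than a dialect tweak.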
Re: Correlation with a variable
Hey Danny,

it's not a view, it's a variable in PL/SQL with a row type. The thing is, variable names must not be quoted, but I have no idea how to avoid quoting for this single use case with the relational algebra model in Calcite.

Regards,
Christian

Am 18.02.2020 um 04:22 schrieb Danny Chan:
> From the case you gave, the "variable" seems to be a view? Sorry, I'm not familiar with the traditional RDBMS.
>
> Best,
> Danny Chan
Re: Correlation with a variable
My issue is not about parsing. I already have the relational model: I parsed a query to which I want to add a condition on a RelNode. Now I want to add a RexNode to the LogicalFilter node that renders to:

NEW."somecolumn"

How would I construct a RexNode that renders to that when converting the RelNode to SQL? Do I have to extend the SqlDialect to support that?

Danny Chan schrieb am Di., 18. Feb. 2020, 15:12:
> If you want to make NEW a normal sql identifier, you should override it in the parser to make it unreserved.
Re: Correlation with a variable
Hey Stamatis,

thanks for your help.

Am 18.02.2020 um 19:34 schrieb Stamatis Zampetakis:
> Long story short: maybe you can achieve what you want by adding a new type of RexNode. From what I remember, quoting is applied to identifiers (schema, table, column names), so if OLD/NEW are parameters (e.g., RexNamedParam) then possibly it does not make sense to quote them.

Right, and actually it is wrong to quote these variables/parameters, i.e. it's an error, as PG complains that there is no relation with that name.

> In row-level triggers, OLD and NEW refer to a single tuple/row at each point in time. In this case, OLD and NEW are row-type variables, or better say parameters, with the same type as the table. In Calcite, there are index-based parameters (RexDynamicParam) but not named parameters like the one you seem to need. I think named parameters are useful in various scenarios, so possibly we could expand the RexNode hierarchy. In this case, to build the plan, it probably suffices to create and plug in the parameter wherever you need it.

That sounds great and just about what I would need. Is there an issue for this already?

> In statement-level triggers, OLD and NEW refer to sets of tuples/rows at each point in time. In this case, OLD and NEW appear as (temporary) relations/tables with the same type as the table. In terms of implementation, I assume that the user-defined query acts as a subquery correlated with OLD/NEW as necessary.

Correct, but right now I am using row-level triggers. I'll try to introduce some kind of synthetic view that holds the state so I can use a TransientScan for now.

> In this case, to build the plan, you probably need to introduce scan operations over the OLD/NEW tables and create a correlation with the rest of the query.

As of PG10, the REFERENCING clause can be used to introduce a temporary view for an old and new relation.

> Best,
> Stamatis

On Tue, Feb 18, 2020 at 7:03 PM Christian Beikov wrote:
> My issue is not about parsing. [...]
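Stamatis's RexNamedParam idea can be pictured with a toy expression hierarchy: a named parameter is a new kind of leaf node that unparses as a bare name, while ordinary identifiers keep their quoting. This is a hypothetical sketch of the concept, not Calcite's RexNode API (all names here are made up):

```java
public class NamedParamSketch {
    interface Expr { String toSql(); }

    // The "RexNamedParam"-like node: a named parameter is not an
    // identifier, so it is intentionally never quoted.
    record NamedParam(String name) implements Expr {
        public String toSql() { return name; }
    }

    // An ordinary identifier qualifier, quoted as usual.
    record ColumnQualifier(String name) implements Expr {
        public String toSql() { return "\"" + name + "\""; }
    }

    record FieldAccess(Expr target, String field) implements Expr {
        public String toSql() { return target.toSql() + ".\"" + field + "\""; }
    }

    record Eq(Expr left, Expr right) implements Expr {
        public String toSql() { return left.toSql() + " = " + right.toSql(); }
    }

    public static void main(String[] args) {
        Expr cond = new Eq(
            new FieldAccess(new ColumnQualifier("t"), "id"),
            new FieldAccess(new NamedParam("NEW"), "id"));
        System.out.println(cond.toSql()); // "t"."id" = NEW."id"
    }
}
```

Because the quoting decision is encoded in the node type rather than in the dialect, no SqlDialect extension would be needed once such a node exists.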
Translation of SQL EXISTS
Hello,

I'm a bit confused about how the SQL EXISTS predicate is translated. I'd assume that EXISTS is translated in relational algebra to a SEMI-join and NOT EXISTS to an ANTI-join, but it's not.

PlannerImpl p = new PlannerImpl(config);
SqlNode sqlNode = p.parse("delete from _order o where exists (select 1 from order_position p where o.id = p.order_id)");
p.validate(sqlNode);
RelRoot rel = p.rel(sqlNode);
RelToSqlConverter sqlConverter = new RelToSqlConverter(dialect);
SqlImplementor.Result result = sqlConverter.visitChild(0, rel.rel);
sqlWriter.format(result.asStatement());

Worse, when printing this, I only get

DELETE FROM "public"."_order"

i.e. the EXISTS part is not rendered. This is the plan I get:

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], flattened=[true])
  LogicalProject(inputs=[0])
    LogicalProject(inputs=[0], exprs=[[CAST($1):BIGINT, CAST($2):BOOLEAN]])
      LogicalJoin(condition=[=($0, $1)], joinType=[inner])
        JdbcTableScan(table=[[adhoc, _order]])
        LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
          LogicalProject(exprs=[[$1, true]])
            JdbcTableScan(table=[[adhoc, order_position]])

I'd expect something along the lines of

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], flattened=[true])
  LogicalProject(inputs=[0])
    LogicalJoin(condition=[=($0, $1)], joinType=[semi])
      JdbcTableScan(table=[[adhoc, _order]])
      JdbcTableScan(table=[[adhoc, order_position]])

and for NOT EXISTS

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], flattened=[true])
  LogicalProject(inputs=[0])
    LogicalJoin(condition=[=($0, $1)], joinType=[anti])
      JdbcTableScan(table=[[adhoc, _order]])
      JdbcTableScan(table=[[adhoc, order_position]])

Am I missing something, or does the current aggregate-based translation make sense?
I constructed relational algebra structures for some other statements with SEMI- and ANTI-joins and already noticed that these join types weren't handled in org.apache.calcite.rel.rel2sql.RelToSqlConverter#visit(org.apache.calcite.rel.core.Join), which I fixed locally. Is the lack of a translation intentional? Is such a translation of SEMI- and ANTI-join to EXISTS and NOT EXISTS an over-simplification or would you say it's correct? As far as I understood from https://en.wikipedia.org/wiki/Relational_algebra this is correct. I'd be happy to contribute that back. I didn't look into the Sql-to-Rel translation for EXISTS and NOT EXISTS to SEMI- and ANTI-join yet, but I assume that's not that hard and I could add that. Regards, Christian
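For intuition on why the semi-/anti-join plans above are equivalent to EXISTS / NOT EXISTS, here is a self-contained sketch over plain collections (not Calcite code; the lists stand in for the join-key columns of _order and order_position): a semi-join keeps left rows with at least one match, an anti-join keeps those with none, and neither duplicates left rows the way a plain inner join would.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SemiAntiJoin {
    // semi-join: left rows having at least one match on the join key (EXISTS)
    static List<Integer> semiJoin(List<Integer> left, List<Integer> rightKeys) {
        Set<Integer> keys = new HashSet<>(rightKeys);
        return left.stream().filter(keys::contains).collect(Collectors.toList());
    }

    // anti-join: left rows having no match on the join key (NOT EXISTS)
    static List<Integer> antiJoin(List<Integer> left, List<Integer> rightKeys) {
        Set<Integer> keys = new HashSet<>(rightKeys);
        return left.stream().filter(o -> !keys.contains(o)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> orderIds = List.of(1, 2, 3);
        List<Integer> positionOrderIds = List.of(2, 3, 3); // duplicates don't multiply output
        System.out.println(semiJoin(orderIds, positionOrderIds)); // [2, 3]
        System.out.println(antiJoin(orderIds, positionOrderIds)); // [1]
    }
}
```

The inner-join plan in the thread gets the same effect by first deduplicating the right side with an Aggregate grouped on the join key, which is exactly the shape the rule-based rewrite later recognizes as a semi-join.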
Re: Translation of SQL EXISTS
Hey Haisheng,

it is nice to have a rule that detects such patterns, but do you agree that it would be better to generate SEMI- and ANTI-joins already during Sql-to-Rel conversion?

I implemented the Rel-to-Sql conversion for SEMI- and ANTI-joins and it seems to work for my use cases, but it feels a bit ugly, as I have to do some transformations which might be ad hoc. I had to replace identifiers that refer to the outcome of the SEMI- and ANTI-joins with the select items of the LHS, because the SEMI- and ANTI-join does not get an alias. See for yourself: https://github.com/beikov/calcite/commit/6ca67e2e8df44c2081e71407ee6893a961f36271

I'll add a few tests and a comment, but could anyone please comment on whether this approach looks ok?

Regards,
Christian

Am 19.02.2020 um 19:03 schrieb Haisheng Yuan:
> Hi Christian,
>
> For the query in your example, Calcite first generates an inner-join plan with an aggregate child, then through SemiJoinRule transforms the inner join into a semi- or anti-semi-join. The reason to have the inner join is that it allows join commutativity, which is good for generating a potentially better plan with a nested-loop join or hash join. Admittedly, this process in Calcite is counter-intuitive. It should be in reverse order: first generate a semi- or anti-semi-join, then generate an inner/outer join.
>
> - Haisheng
Re: How to optimize repeated RelNode Structures? (CALCITE-3806)
I guess what you are looking for is something like the SubPlan node of PostgreSQL? Such a node would materialize the results to be used by multiple other nodes, or in a nested-loop context, to avoid accessing a source relation multiple times. It would be cool if Calcite could create a SubPlan and let the union-all node use that instead. Something like this:

Materialize(name=Sub1)
  LogicalProject(EMPNO=[$0])
    LogicalFilter(condition=[OR(>=($0, 7369), >=($0, 7369))])
      LogicalTableScan(table=[[scott, EMP]])
LogicalUnion(all=[true])
  LogicalProject(EMPNO=[$0])
    LogicalFilter(condition=[>=($0, 7369)])
      LogicalTableScan(table=[[Sub1]])
  LogicalProject(EMPNO=[$0])
    LogicalFilter(condition=[>=($0, 7369)])
      LogicalTableScan(table=[[Sub1]])

Of course there are further optimizations that could be done, like removing filters etc., but that's just a general example. Not sure if the Spool operator does that, but such a translation could also come in handy for distributed queries, where every row access to a remote system has protocol overhead.

Danny Chan schrieb am Do., 20. Feb. 2020, 15:27:
> Calcite has a Spool operator, maybe you can check that.
>
> Anjali Shrishrimal 于2020年2月20日 周四下午3:20写道:
> > Hi everybody,
> >
> > I would like to have your suggestions on CALCITE-3806. Asking it here as suggested by Julian.
> >
> > If a RelNode tree contains a subtree whose result can be obtained from some other part of the same tree, can we optimize it? And how do we express that in the plan?
> > For example, let's say the input structure looks like this:
> >
> > LogicalUnion(all=[true])
> >   LogicalProject(EMPNO=[$0])
> >     LogicalFilter(condition=[>=($0, 7369)])
> >       LogicalTableScan(table=[[scott, EMP]])
> >   LogicalProject(EMPNO=[$0])
> >     LogicalFilter(condition=[>=($0, 7369)])
> >       LogicalTableScan(table=[[scott, EMP]])
> >
> > In this case,
> >
> > LogicalProject(EMPNO=[$0])
> >   LogicalFilter(condition=[>=($0, 7369)])
> >     LogicalTableScan(table=[[scott, EMP]])
> >
> > is repeated. It is going to fetch the same data twice. Can we save one fetch? Can we somehow tell the 2nd input of the union to make use of the union's 1st input? Is there any way to express that in the plan?
> >
> > Also, if the structure was like this:
> >
> > LogicalUnion(all=[true])
> >   LogicalProject(EMPNO=[$0])
> >     LogicalFilter(condition=[>=($0, 7369)])
> >       LogicalTableScan(table=[[scott, EMP]])
> >   LogicalProject(EMPNO=[$0])
> >     LogicalFilter(condition=[>=($0, 8000)])
> >       LogicalTableScan(table=[[scott, EMP]])
> >
> > the second part of the union could perform its filtering on the fetched data of the 1st part (as the second's output is a subset of the first's output).
> >
> > Does Calcite provide such kinds of optimizations? If not, what are the challenges in doing so?
> >
> > Would love to hear your thoughts.
> >
> > Thank you,
> > Anjali Shrishrimal
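Calcite's Spool operator, which Danny mentions, captures exactly this materialize-once idea. As a language-level analogy (plain Java, not the Spool API), a memoizing supplier evaluates the shared subtree once and serves both union branches from the cache:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SpoolDemo {
    // Wrap a source so it is evaluated at most once, like a table spool
    // feeding several consumers.
    static <T> Supplier<T> spool(Supplier<T> source) {
        return new Supplier<>() {
            private T cached;
            private boolean done;
            @Override public T get() {
                if (!done) {
                    cached = source.get();
                    done = true;
                }
                return cached;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger scans = new AtomicInteger(); // counts base-table reads
        Supplier<List<Integer>> emp = spool(() -> {
            scans.incrementAndGet();
            return List.of(7369, 7499, 7521);
        });
        // UNION ALL of two branches over the same shared subtree
        List<Integer> union = Stream.concat(
                emp.get().stream().filter(e -> e >= 7369),
                emp.get().stream().filter(e -> e >= 7369))
            .collect(Collectors.toList());
        System.out.println(union.size()); // 6: both branches produce all rows
        System.out.println(scans.get());  // 1: the base table was scanned once
    }
}
```

The planner-level problem is harder than this analogy, of course: the optimizer also has to detect that two subtrees are equivalent before it can share them.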
Re: Translation of SQL EXISTS
Alright, so here is the PR for the Rel-to-Sql translation. I thought about what you wrote here already, Julian, but thanks for mentioning it. It would be great if this could make it into 1.22.0, or is it too late for that already?

Am 19.02.2020 um 22:49 schrieb Julian Hyde:
> Agreed, it should be in reverse order. Translate to a semi-join (or anti-join for NOT EXISTS), then optionally use a rule to rewrite the semi- or anti-join to Join + Aggregate.
>
> Note that if the EXISTS is in a disjunction (e.g. "delete from orders where exists (select null from order_items where ...) or shipping > 20") we cannot use a semi-join. We have to use a left join, with an indicator column on the right-hand side that will be null iff there is no match. Which is what we do currently.
Re: Translation of SQL EXISTS
Yeah right, somehow forgot to add the link.

Am 20.02.2020 um 21:46 schrieb Julian Hyde:
> Thanks. I guess you mean https://issues.apache.org/jira/browse/CALCITE-3810.
Re: TableModify does not keep UPSERT keyword
AFAIK MERGE has different concurrency semantics than what some DBMSs call UPSERT. PostgreSQL, for example, has guaranteed insert-or-update semantics, whereas MERGE could end with constraint-violation errors: https://wiki.postgresql.org/wiki/UPSERT

Maybe it's worth adding that to the relational model after all?

Am 17.03.2020 um 07:17 schrieb Enrico Olivelli:
> Il Lun 16 Mar 2020, 23:55 Julian Hyde ha scritto:
> > Change the unparse operation for the dialect so that you generate UPSERT rather than MERGE. IIRC we did this for another dialect - maybe Phoenix.
>
> Julian,
> in my case (HerdDB) I need to see the presence of UPSERT in the RelNode (EnumerableTableModify or LogicalTableModify). I saw the JIRA where you introduced UPSERT for Phoenix. I will check deeper, thanks.
>
> Enrico

> On Mar 16, 2020, at 1:22 AM, Enrico Olivelli wrote:
> > Il Lun 16 Mar 2020, 09:06 Stamatis Zampetakis ha scritto:
> > > Hi Enrico,
> > >
> > > I have the impression that UPSERT is currently supported only at the parser level [1], so it seems normal that you don't find something relevant in LogicalTableModify etc. Note that the SQL standard equivalent is the MERGE statement [2], but this also seems to be supported only at the parser/validator level [3]. I guess it is not necessary to introduce more things in TableModify, since UPSERT seems to be syntactic sugar. I think that most of the work can be done in RelToSqlConverter [4] and possibly the rest of the code can be left intact.
> >
> > I would like to send a patch that introduces the ability to keep the SqlInsert 'keywords' and pass them to TableModify. Would that be a good approach? The alternative is to introduce a new Operation, but that would be a more invasive change and I think it is not worth it.
> >
> > Enrico
> >
> > > Best,
> > > Stamatis
> > >
> > > [1] https://issues.apache.org/jira/browse/CALCITE-492
> > > [2] https://en.wikipedia.org/wiki/Merge_(SQL)
> > > [3] https://issues.apache.org/jira/browse/CALCITE-985
> > > [4] https://github.com/apache/calcite/blob/d234626227954eefffe49f42abec65c649ffe3a6/core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java#L2395
> >
> > On Sun, Mar 15, 2020 at 6:53 PM Enrico Olivelli wrote:
> > > Hi,
> > > I am trying to use UPSERT, but it seems to me that in TableModify (or rather LogicalTableModify or EnumerableTableModify) there is no way to distinguish an INSERT from an UPSERT. Am I missing something?
> > >
> > > Regards
> > > Enrico
Re: TableModify does not keep UPSERT keyword
I'd argue that these technical details are in fact the reason why this is a different functionality, that deserves special handling. I expect an upsert statement like `INSERT INTO tbl(a, b) VALUES(1, 2) ON CONFLICT DO NOTHING` to never produce a constraint violation. This is functionally different from a statement like `INSERT INTO tbl(a, b) SELECT a, b (VALUES(1, 2)) x(a, b) WHERE NOT EXISTS(SELECT 1 FROM tbl y WHERE x.a = y.a AND x.b = y.b)` which might cause the constraint violation. On the other hand, an upsert statement like `INSERT INTO tbl(a, b) VALUES(1, 2) ON CONFLICT DO UPDATE` guarantees that at the end of the statement, there is the tuple `(1, 2)`. There are other variants that provide different functionality(conflict handler conditions, special update clause, etc.) and overall, for DBMS that do not support the ON CONFLICT clause, it is necessary to fallback to PL/SQL to handle constraint violations in exception handlers within loops to ensure the same behvaior. If Calcite transforms such an UPSERT statement to a MERGE statement, it must at least be flagged to require atomicity to be able to generate the correct logic for the backend that this is running on. Am 17.03.2020 um 22:28 schrieb Julian Hyde: I don't think there's a significant difference between the UPSERT and MERGE. The differences are in marketing (which keyword to use) and in technical details (e.g. concurrency semantics). Not worth splitting a core concept over. We spend a lot of effort keeping like-things-like. On Tue, Mar 17, 2020 at 1:11 AM Christian Beikov wrote: AFAIK MERGE has different concurrency semantics than what some DBMS call UPSERT. PostgreSQL for example has a guaranteed insert or update semantics whereas MERGE could end with constraint violation errors: https://wiki.postgresql.org/wiki/UPSERT Maybe it's worth adding that to the relational model after all? 
Am 17.03.2020 um 07:17 schrieb Enrico Olivelli: Il Lun 16 Mar 2020, 23:55 Julian Hyde ha scritto: Change the unparse operation for the dialect so that you generate UPSERT rather than MERGE. IIRC we did this for another dialect - maybe Phoenix. Julian, in my case (HerdDB) I need to see the presence of UPSERT in the RelNode (EnumerableTableModify or LogicalTableModify). I saw the JIRA where you introduced UPSERT for Phoenix. I will check deeper, thanks. Enrico Julian

On Mar 16, 2020, at 1:22 AM, Enrico Olivelli wrote: Il Lun 16 Mar 2020, 09:06 Stamatis Zampetakis ha scritto: Hi Enrico, I have the impression that UPSERT is currently supported only at the parser level [1], so it seems normal that you don't find something relevant in LogicalTableModify etc. Note that the SQL standard equivalent is the MERGE statement [2], but this also seems to be supported only at the parser/validator level [3]. I guess it is not necessary to introduce more things in TableModify since UPSERT seems to be syntactic sugar. I think that most of the work can be done in RelToSqlConverter [4] and possibly the rest of the code can be left intact.

I would like to send a patch that introduces the ability to keep the SqlInsert 'keywords' and pass them to TableModify. Would it be a good approach? The alternative is to introduce a new Operation, but it would be a more invasive change and I think it is not worth it. Enrico

Best, Stamatis [1] https://issues.apache.org/jira/browse/CALCITE-492 [2] https://en.wikipedia.org/wiki/Merge_(SQL) [3] https://issues.apache.org/jira/browse/CALCITE-985 [4] https://github.com/apache/calcite/blob/d234626227954eefffe49f42abec65c649ffe3a6/core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java#L2395

On Sun, Mar 15, 2020 at 6:53 PM Enrico Olivelli wrote: Hi, I am trying to use UPSERT but it seems to me that in TableModify (or rather LogicalTableModify or EnumerableTableModify) there is no way to distinguish an INSERT from an UPSERT. Am I missing something?
Regards Enrico
Sequence support
Hello, I am new to Calcite but I already got quite far with integrating it into a test project. Now I am a bit stuck and I am not sure what is happening. I would like to use Calcite for data federation, but first I am trying to simply put it between the application and the actual JDBC connection.

I read something on the website that operations aren't pushed down to JDBC schemas yet? That would be unfortunate, as that is something I actually need. Can you confirm what the status of that is?

While trying to insert data, I noticed that SQL sequences don't seem to be supported. Is there a special undocumented syntax for retrieving sequence values? I tried "next value for SEQUENCE" as something in SqlKind was suggesting that might be the appropriate syntax, but it didn't work.

Finally, I also noticed that transactions don't seem to work properly. Apparently CalciteMetaImpl#commit throws an UnsupportedOperationException? Am I missing something here or is that a current limitation? Thanks in advance! -- Mit freundlichen Grüßen, ---- *Christian Beikov*
Materialization left join rewrite
I just discovered that queries containing left joins aren't rewritten when materializations are available and wondered why that is a limitation. I looked a bit into the implementation and found out that org.apache.calcite.rel.metadata.RelMdAllPredicates#getAllPredicates(org.apache.calcite.rel.core.Join, org.apache.calcite.rel.metadata.RelMetadataQuery) returns null when encountering a non-inner join. For an inner join, the method returns essentially the join predicate, so I thought a left join should be similar. How about returning an OR node that asserts input refs of "outer-joined" tables are null if unmatched? Let's consider an example query like "from emp e left join dept d on e.deptno = d.deptno" If the join were an inner join, the return would be "=(e.deptno, d.deptno)". In case of a left join I'd suppose it returns "OR(IS_NULL(d.deptno),=(e.deptno, d.deptno))". Would that be sufficient to implement left join support or do you have something different in mind? I couldn't find a ticket for that particular case to discuss a possible implementation so please direct me if you already tracked that issue. -- Mit freundlichen Grüßen, -------- *Christian Beikov*
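To sanity-check the proposed predicate, here is a self-contained sketch (plain Java over hypothetical deptno values, not Calcite's RexNode/RelMdAllPredicates API) that runs a nested-loop left join and verifies that `OR(IS_NULL(d.deptno), =(e.deptno, d.deptno))` holds for every emitted row:

```java
import java.util.ArrayList;
import java.util.List;

public class LeftJoinPredicateDemo {
    // Each output row is {e.deptno, d.deptno}; d.deptno is null when unmatched.
    static List<Integer[]> leftJoin(int[] empDeptnos, int[] deptDeptnos) {
        List<Integer[]> out = new ArrayList<>();
        for (int e : empDeptnos) {
            boolean matched = false;
            for (int d : deptDeptnos) {
                if (e == d) {                     // join condition: e.deptno = d.deptno
                    out.add(new Integer[] {e, d});
                    matched = true;
                }
            }
            if (!matched) {
                out.add(new Integer[] {e, null}); // left join pads unmatched rows with nulls
            }
        }
        return out;
    }

    // The predicate proposed for a left join:
    // OR(IS_NULL(d.deptno), =(e.deptno, d.deptno))
    static boolean predicateHolds(Integer eDeptno, Integer dDeptno) {
        return dDeptno == null || eDeptno.equals(dDeptno);
    }
}
```

Every row a left join produces either matched (so the join condition holds) or was padded with nulls (so IS_NULL holds), which is the intuition behind the proposed return value.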
Casts because of nullability differences
Hello, I came to notice that the use of a nullable column in a context where a non-nullable one is expected causes a cast that is essentially useless. Am I missing something, or is there a reason why that is done? -- Mit freundlichen Grüßen, *Christian Beikov*
Re: Casts because of nullability differences
I am using materializations and my materialization table has nullable columns. Calcite correctly determines that the underlying query can never produce null for these columns, but I didn't encode that into the table definition. I was just wondering why the casts are generated in the SQL that is sent to the owner of the materialization table. Thanks for the explanation; I suppose the cast causes proper exceptions to be thrown when null is encountered instead of simply an NPE? Or is the cast in such a case really unnecessary? Mit freundlichen Grüßen, ---- *Christian Beikov*

Am 23.08.2017 um 20:01 schrieb Julian Hyde: I presume you're talking about RexNodes. It makes a lot of things easier if the arguments to a function call are EXACTLY the types required by that function. The system inserts implicit casts where necessary. If you're generating Java code (in Enumerable convention) there is a lot of difference between INTEGER NOT NULL and INTEGER. The former can be represented by int, the latter only by java.lang.Integer. Julian

On Aug 23, 2017, at 2:08 AM, Christian Beikov wrote: Hello, I came to notice that the use of a nullable column in a context where a non-nullable one is expected causes a cast that is essentially useless. Am I missing something, or is there a reason why that is done? -- Mit freundlichen Grüßen, ---- *Christian Beikov*
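Julian's point about `INTEGER NOT NULL` vs `INTEGER` can be seen directly in plain Java (a self-contained illustration, not Calcite-generated code): a nullable value must be represented as `java.lang.Integer`, and moving it into an `int` context unboxes it, which throws an NPE when the value is null. That is why the generated code has to track nullability precisely and insert explicit casts.

```java
public class NullabilityDemo {
    // INTEGER NOT NULL can be represented by the primitive int.
    static int addNotNull(int a, int b) {
        return a + b;
    }

    // INTEGER (nullable) needs the boxed java.lang.Integer.
    static Integer addNullable(Integer a, Integer b) {
        if (a == null || b == null) {
            return null; // SQL semantics: NULL propagates
        }
        return a + b;
    }

    // A "cast" from nullable to non-nullable: unboxing a null throws an NPE.
    static int castToNotNull(Integer a) {
        return a; // throws NullPointerException if a is null
    }
}
```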
Re: Casts because of nullability differences
I get that the cast is done "internally" to handle nulls properly, but why would such a cast end up in the SQL that gets sent to the DBMS? Can you point me in a direction where to look for these generated casts? I'd like to see whether I can avoid rendering these casts, as some DBMS planners might get confused. Mit freundlichen Grüßen, -------- *Christian Beikov*

Am 23.08.2017 um 21:20 schrieb Julian Hyde: It's difficult to say for sure whether the cast is necessary. For some of us, consistency is an end in itself. :) If a null were to occur, then yes, the cast would throw. But it's also possible that Calcite can deduce that a null is impossible, in which case it wouldn't even generate the logic. I have this argument with Ashutosh from time to time. He tends to say we can never be sure which expressions might or might not produce nulls, so why bother being so strict? I'd rather use every piece of information we can get, because it might allow us to optimize. By the way, there are cases where we can deduce that columns are not-nullable but we cannot change the row-type due to the RelOptRule contract. We ought to be generating RelMdPredicates in these cases. Julian

On Aug 23, 2017, at 11:33 AM, Christian Beikov wrote: I am using materializations and my materialization table has nullable columns. Calcite correctly determines that the underlying query can never produce null for these columns, but I didn't encode that into the table definition. I was just wondering why the casts are generated in the SQL that is sent to the owner of the materialization table. Thanks for the explanation; I suppose the cast causes proper exceptions to be thrown when null is encountered instead of simply an NPE? Or is the cast in such a case really unnecessary? Mit freundlichen Grüßen, -------- *Christian Beikov*

Am 23.08.2017 um 20:01 schrieb Julian Hyde: I presume you're talking about RexNodes. It makes a lot of things easier if the arguments to a function call are EXACTLY the types required by that function. The system inserts implicit casts where necessary. If you're generating Java code (in Enumerable convention) there is a lot of difference between INTEGER NOT NULL and INTEGER. The former can be represented by int, the latter only by java.lang.Integer. Julian

On Aug 23, 2017, at 2:08 AM, Christian Beikov wrote: Hello, I came to notice that the use of a nullable column in a context where a non-nullable one is expected causes a cast that is essentially useless. Am I missing something, or is there a reason why that is done? -- Mit freundlichen Grüßen, -------- *Christian Beikov*
Re: Materialization left join rewrite
FWIW I use an implementation of that idea locally and did a few end-to-end tests, and it seems to work so far. I can prepare a PR so we can discuss this; just tell me an issue number, or whether I should create one for this. Mit freundlichen Grüßen, *Christian Beikov*

Am 24.08.2017 um 02:13 schrieb Julian Hyde: Jesus, can you answer this question for Christian please? Seems like a reasonable use case, and your area of expertise. Julian

On Aug 20, 2017, at 12:25 PM, Christian Beikov wrote: I just discovered that queries containing left joins aren't rewritten when materializations are available and wondered why that is a limitation. I looked a bit into the implementation and found out that org.apache.calcite.rel.metadata.RelMdAllPredicates#getAllPredicates(org.apache.calcite.rel.core.Join, org.apache.calcite.rel.metadata.RelMetadataQuery) returns null when encountering a non-inner join. For an inner join, the method returns essentially the join predicate, so I thought a left join should be similar. How about returning an OR node that asserts input refs of "outer-joined" tables are null if unmatched? Let's consider an example query like "from emp e left join dept d on e.deptno = d.deptno". If the join were an inner join, the return would be "=(e.deptno, d.deptno)". In case of a left join I'd suppose it returns "OR(IS_NULL(d.deptno),=(e.deptno, d.deptno))". Would that be sufficient to implement left join support or do you have something different in mind? I couldn't find a ticket for that particular case to discuss a possible implementation so please direct me if you already tracked that issue. -- Mit freundlichen Grüßen, -------- *Christian Beikov*
Re: Casts because of nullability differences
RelOptMaterialization creates the cast, which eventually calls into RexUtil.generateCastExpressions. That method, probably rightfully, is strict about nulls. The question is whether the cast should be optimized away later, or whether it shouldn't be created in the first place. What do you say? Mit freundlichen Grüßen, ---- *Christian Beikov*

Am 24.08.2017 um 02:05 schrieb Julian Hyde: You didn't say that the casts were ending up in generated SQL. That would be a bug. Probably in RelToSqlConverter.

On Aug 23, 2017, at 4:35 PM, Christian Beikov wrote: I get that the cast is done "internally" to handle nulls properly, but why would such a cast end up in the SQL that gets sent to the DBMS? Can you point me in a direction where to look for these generated casts? I'd like to see whether I can avoid rendering these casts, as some DBMS planners might get confused. Mit freundlichen Grüßen, -------- *Christian Beikov*

Am 23.08.2017 um 21:20 schrieb Julian Hyde: It's difficult to say for sure whether the cast is necessary. For some of us, consistency is an end in itself. :) If a null were to occur, then yes, the cast would throw. But it's also possible that Calcite can deduce that a null is impossible, in which case it wouldn't even generate the logic. I have this argument with Ashutosh from time to time. He tends to say we can never be sure which expressions might or might not produce nulls, so why bother being so strict? I'd rather use every piece of information we can get, because it might allow us to optimize. By the way, there are cases where we can deduce that columns are not-nullable but we cannot change the row-type due to the RelOptRule contract. We ought to be generating RelMdPredicates in these cases. Julian

On Aug 23, 2017, at 11:33 AM, Christian Beikov wrote: I am using materializations and my materialization table has nullable columns. Calcite correctly determines that the underlying query can never produce null for these columns, but I didn't encode that into the table definition. I was just wondering why the casts are generated in the SQL that is sent to the owner of the materialization table. Thanks for the explanation; I suppose the cast causes proper exceptions to be thrown when null is encountered instead of simply an NPE? Or is the cast in such a case really unnecessary? Mit freundlichen Grüßen, -------- *Christian Beikov*

Am 23.08.2017 um 20:01 schrieb Julian Hyde: I presume you're talking about RexNodes. It makes a lot of things easier if the arguments to a function call are EXACTLY the types required by that function. The system inserts implicit casts where necessary. If you're generating Java code (in Enumerable convention) there is a lot of difference between INTEGER NOT NULL and INTEGER. The former can be represented by int, the latter only by java.lang.Integer. Julian

On Aug 23, 2017, at 2:08 AM, Christian Beikov wrote: Hello, I came to notice that the use of a nullable column in a context where a non-nullable one is expected causes a cast that is essentially useless. Am I missing something, or is there a reason why that is done? -- Mit freundlichen Grüßen, -------- *Christian Beikov*
Materialized view case sensitivity problem
Hey, I have configured Lex.MYSQL_ANSI but when a query gets parsed, the column names of select items are "to-upper-cased". I'm having problems with matching the row types of materialized views and the source sql because of that. Any idea how to fix that? -- Mit freundlichen Grüßen, -------- *Christian Beikov*
Re: Materialized view case sensitivity problem
Seems org.apache.calcite.prepare.CalcitePrepareImpl#prepare2_ misses a call to org.apache.calcite.sql.parser.SqlParser.ConfigBuilder#setCaseSensitive to configure the parser according to the LEX configuration. Is that a bug or expected? Mit freundlichen Grüßen, *Christian Beikov* Am 24.08.2017 um 11:24 schrieb Christian Beikov: Hey, I have configured Lex.MYSQL_ANSI but when a query gets parsed, the column names of select items are "to-upper-cased". I'm having problems with matching the row types of materialized views and the source sql because of that. Any idea how to fix that? -- Mit freundlichen Grüßen, -------- *Christian Beikov*
Re: Materialized view case sensitivity problem
Actually, it seems the root cause is that the materialization uses the wrong configuration. org.apache.calcite.materialize.MaterializationService.DefaultTableFactory#createTable creates a new connection with the default configuration that does TO_UPPER. Would it be ok for it to receive a CalciteConnectionConfig? Mit freundlichen Grüßen, *Christian Beikov* Am 24.08.2017 um 11:36 schrieb Christian Beikov: Seems org.apache.calcite.prepare.CalcitePrepareImpl#prepare2_ misses a call to org.apache.calcite.sql.parser.SqlParser.ConfigBuilder#setCaseSensitive to configure the parser according to the LEX configuration. Is that a bug or expected? Mit freundlichen Grüßen, *Christian Beikov* Am 24.08.2017 um 11:24 schrieb Christian Beikov: Hey, I have configured Lex.MYSQL_ANSI but when a query gets parsed, the column names of select items are "to-upper-cased". I'm having problems with matching the row types of materialized views and the source sql because of that. Any idea how to fix that? -- Mit freundlichen Grüßen, -------- *Christian Beikov*
Re: Materialized view case sensitivity problem
Well, my use case is apparently something that isn't supported yet :D I'd like to have a view instead of a table act as the materialization table for a materialized view. The reason is simply that the actual data is in a different non-SQL schema and I need the view to do data conversion. I noticed that the materialization code was only looking for tables specifically, and I adapted it to also look up views, but now I think I am stuck. Since the view is a LogicalProject, the use of getTable() in RelOptMaterialization won't work. Do you have an idea how I could make that work? Mit freundlichen Grüßen, -------- *Christian Beikov*

Am 24.08.2017 um 12:57 schrieb Julian Hyde: Or supply your own TableFactory? I'm not quite sure of your use case. I've only tested cases where materialized views are "internal", therefore they work fine with Calcite's default dialect.

On Thu, Aug 24, 2017 at 3:21 AM, Christian Beikov wrote: Actually, it seems the root cause is that the materialization uses the wrong configuration. org.apache.calcite.materialize.MaterializationService.DefaultTableFactory#createTable creates a new connection with the default configuration that does TO_UPPER. Would it be ok for it to receive a CalciteConnectionConfig? Mit freundlichen Grüßen, -------- *Christian Beikov*

Am 24.08.2017 um 11:36 schrieb Christian Beikov: Seems org.apache.calcite.prepare.CalcitePrepareImpl#prepare2_ misses a call to org.apache.calcite.sql.parser.SqlParser.ConfigBuilder#setCaseSensitive to configure the parser according to the LEX configuration. Is that a bug or expected? Mit freundlichen Grüßen, -------- *Christian Beikov*

Am 24.08.2017 um 11:24 schrieb Christian Beikov: Hey, I have configured Lex.MYSQL_ANSI but when a query gets parsed, the column names of select items are "to-upper-cased". I'm having problems with matching the row types of materialized views and the source sql because of that. Any idea how to fix that? -- Mit freundlichen Grüßen, -------- *Christian Beikov*
Re: Materialized view case sensitivity problem
My main problem is the row type equality assertion in org.apache.calcite.plan.SubstitutionVisitor#go(org.apache.calcite.rel.mutable.MutableRel). Imagine I have a table "document" with columns "id" and "name". When the JdbcSchema reads the structure, it gets column names in upper case. Now I register a materialized view for a query like "select id, name from document". The materialized table for that view is in my case a view again, defined like "select ... AS `id`, ... AS `name` from ...". The row type of my view correctly is "id, name". The row type of the table "document" is "ID, NAME" because the JdbcSchema gets upper-cased names. Initially, the row type of the query for the materialized view is also correct, but during the "trim fields" phase the row type gets replaced with the types from the table. Is this replacement of field types even correct? Because of that, the assertion in the substitution visitor fails. What would be the appropriate solution for this mismatch? Mit freundlichen Grüßen, *Christian Beikov*

Am 24.08.2017 um 12:57 schrieb Julian Hyde: Or supply your own TableFactory? I'm not quite sure of your use case. I've only tested cases where materialized views are "internal", therefore they work fine with Calcite's default dialect.

On Thu, Aug 24, 2017 at 3:21 AM, Christian Beikov wrote: Actually, it seems the root cause is that the materialization uses the wrong configuration. org.apache.calcite.materialize.MaterializationService.DefaultTableFactory#createTable creates a new connection with the default configuration that does TO_UPPER. Would it be ok for it to receive a CalciteConnectionConfig? Mit freundlichen Grüßen, ---- *Christian Beikov*

Am 24.08.2017 um 11:36 schrieb Christian Beikov: Seems org.apache.calcite.prepare.CalcitePrepareImpl#prepare2_ misses a call to org.apache.calcite.sql.parser.SqlParser.ConfigBuilder#setCaseSensitive to configure the parser according to the LEX configuration. Is that a bug or expected?
Mit freundlichen Grüßen, ---- *Christian Beikov* Am 24.08.2017 um 11:24 schrieb Christian Beikov: Hey, I have configured Lex.MYSQL_ANSI but when a query gets parsed, the column names of select items are "to-upper-cased". I'm having problems with matching the row types of materialized views and the source sql because of that. Any idea how to fix that? -- Mit freundlichen Grüßen, *Christian Beikov*
Re: Materialization left join rewrite
I created https://issues.apache.org/jira/browse/CALCITE-1965 to track this issue. Apart from tests that are already failing on master, it seems that my implementation (https://github.com/apache/calcite/pull/526) doesn't fail any other tests. I'd love some feedback, as that is an essential feature for me. Mit freundlichen Grüßen, *Christian Beikov*

Am 24.08.2017 um 02:13 schrieb Julian Hyde: Jesus, can you answer this question for Christian please? Seems like a reasonable use case, and your area of expertise. Julian

On Aug 20, 2017, at 12:25 PM, Christian Beikov wrote: I just discovered that queries containing left joins aren't rewritten when materializations are available and wondered why that is a limitation. I looked a bit into the implementation and found out that org.apache.calcite.rel.metadata.RelMdAllPredicates#getAllPredicates(org.apache.calcite.rel.core.Join, org.apache.calcite.rel.metadata.RelMetadataQuery) returns null when encountering a non-inner join. For an inner join, the method returns essentially the join predicate, so I thought a left join should be similar. How about returning an OR node that asserts input refs of "outer-joined" tables are null if unmatched? Let's consider an example query like "from emp e left join dept d on e.deptno = d.deptno". If the join were an inner join, the return would be "=(e.deptno, d.deptno)". In case of a left join I'd suppose it returns "OR(IS_NULL(d.deptno),=(e.deptno, d.deptno))". Would that be sufficient to implement left join support or do you have something different in mind? I couldn't find a ticket for that particular case to discuss a possible implementation so please direct me if you already tracked that issue. -- Mit freundlichen Grüßen, ---- *Christian Beikov*
Re: Materialized view case sensitivity problem
I apparently had a different problem that led me to believe the view was the problem. In fact, the actual query was the problem. So I have the query for the materialized view "select id as `id`, name as `name` from document" and the query for the normal view "select cast(_MAP['id'] AS bigint) AS `id`, cast(_MAP['name'] AS varchar(255)) AS `name` from elasticsearch_raw.document_index". Now when I send the query "select id as col1, name as col2 from document", the row type at first is "col1 bigint, col2 varchar(255)" and later it becomes "ID bigint, NAME varchar(255)", which is to some extent a good thing. The materialization logic determines it can substitute the query, but during the substitution it compares that row type with the one from the view. The Jdbc schema receives the columns in upper case, which is why the row type of the sent query is in upper case. Either the comparison should be case insensitive, or I simply upper-case the names of the columns in the view, which is what I did now. Doing that will unfortunately cause a little mismatch in the ES adapter, which expects that the field names have the same case as the fields of the row type. This is why I adapted some rules to extract the correctly cased field name from the _MAP expression. Now the question is, should the comparison be case insensitive or should I rely on the fact that the JDBC schema will always have upper-cased column names? Mit freundlichen Grüßen, ---- *Christian Beikov*

Am 24.08.2017 um 21:00 schrieb Julian Hyde: Rather than "select id, name from document" could you create your view as "select `id`, `name` from document" (or however the back-end system quotes identifiers). Then "id" would still be in lower-case when the JDBC adapter queries the catalog.
On Aug 24, 2017, at 5:17 AM, Christian Beikov wrote: My main problem is the row type equality assertion in org.apache.calcite.plan.SubstitutionVisitor#go(org.apache.calcite.rel.mutable.MutableRel). Imagine I have a table "document" with columns "id" and "name". When the JdbcSchema reads the structure, it gets column names in upper case. Now I register a materialized view for a query like "select id, name from document". The materialized table for that view is in my case a view again, defined like "select ... AS `id`, ... AS `name` from ...". The row type of my view correctly is "id, name". The row type of the table "document" is "ID, NAME" because the JdbcSchema gets upper-cased names. Initially, the row type of the query for the materialized view is also correct, but during the "trim fields" phase the row type gets replaced with the types from the table. Is this replacement of field types even correct? Because of that, the assertion in the substitution visitor fails. What would be the appropriate solution for this mismatch? Mit freundlichen Grüßen, *Christian Beikov*

Am 24.08.2017 um 12:57 schrieb Julian Hyde: Or supply your own TableFactory? I'm not quite sure of your use case. I've only tested cases where materialized views are "internal", therefore they work fine with Calcite's default dialect.

On Thu, Aug 24, 2017 at 3:21 AM, Christian Beikov wrote: Actually, it seems the root cause is that the materialization uses the wrong configuration. org.apache.calcite.materialize.MaterializationService.DefaultTableFactory#createTable creates a new connection with the default configuration that does TO_UPPER. Would it be ok for it to receive a CalciteConnectionConfig? Mit freundlichen Grüßen, *Christian Beikov*

Am 24.08.2017 um 11:36 schrieb Christian Beikov: Seems org.apache.calcite.prepare.CalcitePrepareImpl#prepare2_ misses a call to org.apache.calcite.sql.parser.SqlParser.ConfigBuilder#setCaseSensitive to configure the parser according to the LEX configuration.
Is that a bug or expected? Mit freundlichen Grüßen, *Christian Beikov* Am 24.08.2017 um 11:24 schrieb Christian Beikov: Hey, I have configured Lex.MYSQL_ANSI but when a query gets parsed, the column names of select items are "to-upper-cased". I'm having problems with matching the row types of materialized views and the source sql because of that. Any idea how to fix that? -- Mit freundlichen Grüßen, *Christian Beikov*
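If the comparison were made case insensitive, the name matching could be as simple as comparing field names positionally with a case-insensitive collation. A generic stdlib sketch of that idea (hypothetical helper, not the actual SubstitutionVisitor code):

```java
import java.util.List;

public class RowTypeNameMatcher {
    // Returns true if both field-name lists match positionally, ignoring case
    // (e.g. "ID"/"NAME" from the JDBC catalog vs "id"/"name" from the view).
    static boolean fieldsMatch(List<String> a, List<String> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!a.get(i).equalsIgnoreCase(b.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

The trade-off discussed in the thread is exactly this: a matcher like the above hides the casing mismatch, while the alternative is to normalize the view's column names to the catalog's casing up front.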
Re: Materialized view case sensitivity problem
I guess it's partly the DBMS's fault, because it either treats identifiers case-insensitively or the schema metadata API returns column names in upper case. Not sure if we could do anything about that in the JDBC adapter. I think I will just inspect the JDBC schema before preparing any materializations to avoid the casing problem and having to rely on a specific case. Thanks for the clarifications! Mit freundlichen Grüßen, ---- *Christian Beikov*

Am 24.08.2017 um 21:52 schrieb Jesus Camacho Rodriguez: I never hit this issue as we do not go through the JDBC adaptor when we use the MV rewriting within Hive. I am not familiar with that code path, but I guess no matter whether it is an MV or a table definition, we should end up doing the same wrt casing column names, thus there should be no need for case insensitive comparison? - Jesús

On 8/24/17, 12:19 PM, "Christian Beikov" wrote: I apparently had a different problem that led me to believe the view was the problem. In fact, the actual query was the problem. So I have the query for the materialized view "select id as `id`, name as `name` from document" and the query for the normal view "select cast(_MAP['id'] AS bigint) AS `id`, cast(_MAP['name'] AS varchar(255)) AS `name` from elasticsearch_raw.document_index". Now when I send the query "select id as col1, name as col2 from document", the row type at first is "col1 bigint, col2 varchar(255)" and later it becomes "ID bigint, NAME varchar(255)", which is to some extent a good thing. The materialization logic determines it can substitute the query, but during the substitution it compares that row type with the one from the view. The Jdbc schema receives the columns in upper case, which is why the row type of the sent query is in upper case. Either the comparison should be case insensitive, or I simply upper-case the names of the columns in the view, which is what I did now. Doing that will unfortunately cause a little mismatch in the ES adapter, which expects that the field names have the same case as the fields of the row type. This is why I adapted some rules to extract the correctly cased field name from the _MAP expression. Now the question is, should the comparison be case insensitive or should I rely on the fact that the JDBC schema will always have upper-cased column names? Mit freundlichen Grüßen, ---- *Christian Beikov*

Am 24.08.2017 um 21:00 schrieb Julian Hyde: Rather than "select id, name from document" could you create your view as "select `id`, `name` from document" (or however the back-end system quotes identifiers). Then "id" would still be in lower-case when the JDBC adapter queries the catalog.

On Aug 24, 2017, at 5:17 AM, Christian Beikov wrote: My main problem is the row type equality assertion in org.apache.calcite.plan.SubstitutionVisitor#go(org.apache.calcite.rel.mutable.MutableRel). Imagine I have a table "document" with columns "id" and "name". When the JdbcSchema reads the structure, it gets column names in upper case. Now I register a materialized view for a query like "select id, name from document". The materialized table for that view is in my case a view again, defined like "select ... AS `id`, ... AS `name` from ...". The row type of my view correctly is "id, name". The row type of the table "document" is "ID, NAME" because the JdbcSchema gets upper-cased names. Initially, the row type of the query for the materialized view is also correct, but during the "trim fields" phase the row type gets replaced with the types from the table. Is this replacement of field types even correct? Because of that, the assertion in the substitution visitor fails. What would be the appropriate solution for this mismatch? Mit freundlichen Grüßen, *Christian Beikov*

Am 24.08.2017 um 12:57 schrieb Julian Hyde: Or supply your own TableFactory? I'm not quite sure of your use case.
I've only tested cases where materialized views are "internal", therefore they work fine with Calcite's default dialect. On Thu, Aug 24, 2017 at 3:21 AM, Christian Beikov wrote: Actually, it seems the root cause is that the materialization uses the wrong configuration. org.apache.calcite.materialize.MaterializationService.DefaultTableFactory#createTable creates a new connection with the default configuration that does TO_UPPER. Would it be ok for it to receive a CalciteConnectionConfig? Mit freundlichen Grüßen, *Christian Beikov* Am 24.08.2017 um 11:36 schrieb Christian Beikov: Seems org.apa
Re: Casts because of nullability differences
It apparently has something to do with MV substitutions as I couldn't reproduce it in any other way than using a query that can be substituted by an MV. Hard to say where the problem might be. Is there a specific phase in the planning for this optimization that might not be triggered anymore after substitutions? Any idea where I should dig into? Mit freundlichen Grüßen, ---- *Christian Beikov* Am 24.08.2017 um 21:31 schrieb Julian Hyde: Optimized away later. In fact during SQL generation. SQL has no concept of “casting away nullability”. On Aug 23, 2017, at 6:10 PM, Christian Beikov wrote: RelOptMaterialization creates the cast which calls eventually into RexUtil.generateCastExpressions. That method, probably rightfully, is strict about nulls. The question is, whether the cast should be optimized away later or it shouldn't be created in the first place. What do you say? Mit freundlichen Grüßen, *Christian Beikov* Am 24.08.2017 um 02:05 schrieb Julian Hyde: You didn’t say that the casts were ending up in generated SQL. That would be a bug. Probably in RelToSqlConverter. On Aug 23, 2017, at 4:35 PM, Christian Beikov wrote: I get that the cast is done "internally" to handle nulls properly, but why would such a cast end up in the SQL that gets sent to the DBMS? Can you point me to a direction where to look for these generated casts? I'd like to try if I can avoid rendering these casts, as some DBMS planners might get confused. Mit freundlichen Grüßen, -------- *Christian Beikov* Am 23.08.2017 um 21:20 schrieb Julian Hyde: It’s difficult to say for sure whether the cast is necessary. For some of us consistency, is an end in itself. :) If a null was to occur, then yes, the cast would throw. But it’s also possible that Calcite can deduce that a null is impossible, in which case it wouldn’t even generate the logic. I have this argument with Ashutosh from time to time. 
He tends to say we can never be sure which expressions might or might not produce nulls, so why bother being so strict? I'd rather use every piece of information we can get, because it might allow us to optimize. By the way, there are cases where we can deduce that columns are not-nullable but we cannot change the row-type due to the RelOptRule contract. We ought to be generating RelMdPredicates in these cases. Julian On Aug 23, 2017, at 11:33 AM, Christian Beikov wrote: I am using materializations and my materialization table has nullable columns. Calcite correctly determines that the underlying query can never produce null for these columns, but I didn't encode that into the table definition. I was just wondering why the casts are generated in the SQL that is sent to the owner of the materialization table. Thanks for the explanation. I suppose the cast causes proper exceptions to be thrown when null is encountered instead of simply an NPE? Or is the cast in such a case really unnecessary? Kind regards, ---- *Christian Beikov* On 23.08.2017 at 20:01, Julian Hyde wrote: I presume you're talking about RexNodes. It makes a lot of things easier if the arguments to a function call are EXACTLY the type required by that function. The system inserts implicit casts where necessary. If you're generating Java code (in Enumerable convention) there is a lot of difference between INTEGER NOT NULL and INTEGER. The former can be represented by int, the latter only by java.lang.Integer. Julian On Aug 23, 2017, at 2:08 AM, Christian Beikov wrote: Hello, I came to notice that the use of a nullable column in a context where a non-nullable one is expected causes a cast that is essentially useless. Am I missing something, or is there a reason why that is done? -- Kind regards, ---- *Christian Beikov*
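Julian's int vs. java.lang.Integer point can be seen in plain Java. The following is a minimal illustration (not Calcite code) of why the nullability "cast" gives a clear failure at the boundary instead of a silent one:

```java
public class NullabilityDemo {
  public static void main(String[] args) {
    // INTEGER NOT NULL maps to the primitive int: a null is impossible.
    int notNullable = 42;

    // INTEGER (nullable) needs the boxed type, which can hold null.
    Integer nullable = null;

    // Using the boxed value where a primitive is expected triggers an
    // implicit unboxing "cast"; with a null it fails with a clear
    // NullPointerException rather than producing a wrong value.
    try {
      int unboxed = nullable; // NullPointerException here
      System.out.println(unboxed);
    } catch (NullPointerException e) {
      System.out.println("cast of null to int rejected");
    }
    System.out.println(notNullable);
  }
}
```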
Materialization performance
Hey, I have been looking a bit into how materialized views perform during planning because of a very long test run (MaterializationTest#testJoinMaterializationUKFK6), and the current state is problematic. CalcitePrepareImpl#getMaterializations always reparses the SQL, and down the line there is a lot of expensive work (e.g. predicate and lineage determination) done during planning that could easily be pre-calculated and cached during materialization creation. There is also a bit of a thread-safety problem with the current implementation. Unless there is a different safety mechanism that I don't see, the sharing of the MaterializationService, and thus also the maps in MaterializationActor, via a static instance between multiple threads is problematic. Since I mentioned thread safety, how is Calcite supposed to be used in a multi-threaded environment? Currently I use a connection pool that initializes the schema on new connections, but that is not really nice. I suppose caches are also bound to the connection? A thread-safe context that can be shared between connections would be nice to avoid all that repetitive work. Are these known issues which you have thought about how to fix, or should I log JIRAs for them and fix them to the best of my knowledge? I'd more or less keep the service shared but would implement it using a copy-on-write strategy, since I'd expect seldom schema changes after startup. Regarding the repetitive work that partly happens during planning, I'd suggest doing it during materialization registration instead, as is already mentioned in CalcitePrepareImpl#populateMaterializations. Would that be OK? -- Kind regards, -------- *Christian Beikov*
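A minimal sketch of the pre-calculation idea above: cache the expensive per-materialization work keyed by the SQL text, so it runs once at registration instead of on every planning pass. The class and method names here are hypothetical, not Calcite API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache: expensive parse/analysis runs once per SQL string. */
public class MaterializationCache<V> {
  private final Map<String, V> cache = new ConcurrentHashMap<>();
  private final AtomicInteger parses = new AtomicInteger();

  public V get(String sql, Function<String, V> expensiveParse) {
    // computeIfAbsent invokes the parser at most once per distinct SQL string
    return cache.computeIfAbsent(sql, s -> {
      parses.incrementAndGet();
      return expensiveParse.apply(s);
    });
  }

  /** How many times the expensive work actually ran (for demonstration). */
  public int parseCount() {
    return parses.get();
  }
}
```

In the real code, the cached value would hold the parsed RelNode plus the derived predicates and lineage, populated in CalcitePrepareImpl#populateMaterializations.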
Re: Materialization performance
If the metadata was cached, that would be awesome, especially because it would also improve the performance of metadata retrieval for the query currently being planned, although I am not sure how the caching would work since the RelNodes are mutable. Have you considered implementing the filter tree index explained in the paper? As far as I understood, the whole thing only works when redundant table elimination is implemented. Is that the case? If so, or if it can be done easily, I'd propose we initialize all the lookup structures during registration and use them during planning. This will improve planning time drastically and essentially handle the scalability problem you mention. What other MV-related issues are on your personal to-do list, Jesus? I read the paper now and think I can help you in one place or another if you want. Kind regards, ---- *Christian Beikov* On 28.08.2017 at 08:13, Jesus Camacho Rodriguez wrote: Hive does not use the Calcite SQL parser, thus we follow a different path and did not experience the problem on the Calcite end. However, FWIW, we avoided reparsing the SQL every time a query was being planned by creating/managing our own cache too. The metadata providers implement some caching, thus I would expect that once you avoid reparsing every MV, the retrieval time of predicates, lineage, etc. would improve (at least after using the MV for the first time). However, I agree that the information should be inferred when the MV is loaded. In fact, maybe just making some calls to the metadata providers while the MVs are being loaded would do the trick (Julian should confirm this). Btw, you will probably find another scalability issue as the number of MVs grows large with the current implementation of the rewriting, since the pre-filtering implementation in place does not discard many of the views that are not valid to rewrite a given query, and rewriting is attempted with all of them.
This last bit is work that I would like to tackle shortly, but I have not created the corresponding JIRA yet. -Jesús On 8/27/17, 10:43 PM, "Rajat Venkatesh" wrote: Thread safety and repeated parsing is a problem. We have experience with managing 10s of materialized views. Repeated parsing takes more time than execution of the query itself. We also have a similar problem where concurrent queries (with a potentially different set of materialized views) may be planned at the same time. We solved it by maintaining a cache and carefully setting the cache in a thread local. Relevant code for inspiration: https://github.com/qubole/quark/blob/master/optimizer/src/main/java/org/apache/calcite/prepare/Materializer.java https://github.com/qubole/quark/blob/master/optimizer/src/main/java/org/apache/calcite/plan/QuarkMaterializeCluster.java On Sun, Aug 27, 2017 at 6:50 PM Christian Beikov wrote: [...]
Re: Materialization performance
I'd like to stick to trying to figure out how to support outer joins for now, and when I have an implementation for that, I'd look into the filter tree index if you haven't done it by then. Kind regards, -------- *Christian Beikov* On 28.08.2017 at 20:01, Jesus Camacho Rodriguez wrote: Christian, The implementation of the filter tree index is what I was referring to indeed. In the initial implementation I focused on the rewriting coverage, but now that the first part is finished, it is at the top of my list, as I think it is critical to make the whole query rewriting algorithm work at scale. However, I have not started yet. The filter tree index will help to filter not only based on the tables used by a given query, but also for queries that do not meet the equivalence classes conditions, filter conditions, etc. We could implement all the preconditions mentioned in the paper, and we could add our own additional ones. I also think that in a second version we might need to add some kind of ranking/limit, as many views might meet the preconditions for a given query. It seems you understood how it should work, so if you could help to quickstart that work by maybe implementing a first version of the filter tree index with a couple of basic conditions (table matching and EC matching?), that would be great. I could review any of the contributions you make. -Jesús On 8/28/17, 3:22 AM, "Christian Beikov" wrote: [...]
Re: Materialization performance
If it were a proper actor like you described, concurrency wouldn't be a problem, but right now it is just a global holder for non-concurrent hash maps, which is the problem. Currently, it's simply not thread safe. I don't see a benefit in having request and response queues; I'd rather make registration and retrieval synchronous. Maybe you could explain to me why you were favoring that model? Having to go through concurrent queues for every planner invocation seems like overkill to me. I'd rather have immutable state being CASed (compare-and-swap) to make the querying cheap, and do updates in an optimistic concurrency control manner. What do you say? Would that be a possibility? Kind regards, -------- *Christian Beikov* On 28.08.2017 at 21:31, Julian Hyde wrote: I gave some thought to performance and thread safety when I added materialized view support. I didn't follow through and test at high load and parallelism because at that point, functionality was more important. I'm glad we're having the discussion now. The solution I settled on is the actor model [1]. That means that one thread is responsible for accessing a critical data structure (in this case, the set of valid materialized views). Other threads are not allowed to modify, or even see, mutable state. They can send immutable requests, and get immutable objects back in response. This is manifested as the MaterializationService and MaterializationActor; see the comment in the latter: // Not an actor yet -- TODO make members private and add request/response // queues If we did that I think we would be well on the way to a thread-safe architecture. We can improve performance further, if necessary, by reducing the work that has to be done by the actor, as long as it alone is responsible for the mutable state.
Julian [1] https://en.wikipedia.org/wiki/Actor_model On Aug 28, 2017, at 11:01 AM, Jesus Camacho Rodriguez wrote: [...]
Re: Materialization performance
Imagine the holder of the various hash maps is immutable; let's call it "actor". When a new registration is done, we create a copy of that holder and CAS it. When we query, we simply get the current value and access its maps. So MaterializationService could have an AtomicReference to a holder "actor" just like right now, but we make the maps immutable and create copies whenever a change occurs. We could hide such details behind a message-passing interface so that remote models can be implemented too, but that seems like a next step. The materialization concurrency issue isn't the only problem; what about general usage in multithreaded environments? The whole schema is currently bound to a CalciteConnection. It would be nice if all the context could be shared between multiple connections so that we avoid having to initialize every connection. Do you have any plans to tackle that, or am I not seeing how to achieve this? Kind regards, -------- *Christian Beikov* On 29.08.2017 at 19:40, Julian Hyde wrote: I'd rather have immutable state being CASed (compare-and-swap) to make the querying cheap and do updates in an optimistic concurrency control manner. Compare-and-swap only works for one memory address. You can't use it to, say, debit one bank account and credit another. The set of valid materializations is just about the only mutable state in Calcite, and I think it will need to be several interconnected data structures. So compare-and-swap (or its high-level equivalent, ConcurrentHashMap) won't cut it. So we could use locks/monitors (the "synchronized" keyword) or we could use an actor. The key difference between the two is who does the work. With a monitor, each customer grabs the key (there is only one key), walks into the bank vault, and moves the money from one deposit box to another. With an actor, there is a bank employee in the vault who is the only person allowed to move money around. The work done is the same in both models.
There are performance advantages of the actor model (the data structures will tend to exist in one core's cache) and there are code simplicity advantages (the critical code is all in one class or package). The overhead of two puts/gets on an ArrayBlockingQueue per request is negligible. And besides, you can switch to a non-actor implementation of the service if Calcite is single-threaded. I haven't thought out the details of multi-tenant. It is not true to say that this is "not a primary requirement for the Calcite project." Look at the "data grid (cache)" on the diagram in my "Optiq" talk [1] from 2013. Dynamic materialized views were in from the very start. There can be multiple instances of the actor (each with their own request/response queues), so you could have one per tenant. Also, it is very straightforward to make the actors remote, replacing the queues with RPC over a message broker. Remote actors are called services. Julian [1] https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework On Tue, Aug 29, 2017 at 8:25 AM, Jesus Camacho Rodriguez wrote: LGTM, I think by the time we have support for the outer joins, I might have had time to finish the filter tree index implementation too. -Jesús On 8/29/17, 3:11 AM, "Christian Beikov" wrote: [...]
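The actor pattern Julian describes (one thread owning the mutable state, other threads sending immutable requests and reading responses from per-request queues) can be sketched as follows. All names here are hypothetical, not the actual MaterializationActor API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical actor: only the actor thread ever touches {@code state}. */
public class RegistryActor implements Runnable {
  public static final class Request {
    public final String key;    // null key = poison pill (shutdown)
    public final String value;  // null value = pure lookup
    public final BlockingQueue<String> response = new ArrayBlockingQueue<>(1);
    public Request(String key, String value) { this.key = key; this.value = value; }
  }

  private final BlockingQueue<Request> inbox = new ArrayBlockingQueue<>(64);
  private final Map<String, String> state = new HashMap<>(); // actor-private

  /** Sends a request and blocks for the actor's reply. */
  public String call(Request r) {
    try {
      inbox.put(r);
      return r.response.take();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public void shutdown() {
    try {
      inbox.put(new Request(null, null));
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  @Override public void run() {
    try {
      while (true) {
        Request r = inbox.take();
        if (r.key == null) return; // poison pill
        if (r.value != null) state.put(r.key, r.value);
        r.response.put(state.getOrDefault(r.key, "<absent>"));
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

Because the HashMap is confined to the actor thread, it needs no synchronization; the queues are the only shared, thread-safe structures.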
Re: Materialization performance
Of course steps 2 and 3 depend on what you read, but if a change happens in the meantime, your CAS will fail in step 3 since all changes are done through such a CAS, so you have to "redo" the transaction or parts of it. That's basically optimistic locking :) The important part is that the whole holder is replaced, so you can guarantee safety with a single CAS. Imagine the actor field in MaterializationService is wrapped by an AtomicReference and all maps in the MaterializationActor are immutable. The only way to change a thing is to read the actor, create a copy of it with the new state, and do a CAS through the atomic reference. That would already solve all the thread-safety issues that the current design has. Could you maybe comment on the context sharing between connections part too? Kind regards, -------- *Christian Beikov* On 30.08.2017 at 21:31, Julian Hyde wrote: Consider a "transaction" that involves reads and writes: 1. Read from a data structure 2. Do some stuff 3. Write to the data structure If steps 2 and 3 depend on what you read in step 1, then you need to prevent anyone from writing until you have written. A simple CAS won't solve this. The simplest solution is for the whole transaction to be in a critical section. It doesn't really matter whether that is implemented using an actor or synchronized blocks. We are mostly in agreement - especially about using immutable data structures for anything shared between threads. Julian On Aug 29, 2017, at 2:01 PM, Christian Beikov wrote: [...]
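The copy-on-write scheme Christian describes (immutable holder, single CAS, retry on contention) can be sketched like this; the class names are hypothetical:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical copy-on-write registry: the whole holder is immutable and
 * replaced in a single CAS, so readers never lock. */
public class CowRegistry {
  private static final class Holder {
    final Map<String, String> materializations;
    Holder(Map<String, String> m) {
      this.materializations = Collections.unmodifiableMap(m);
    }
  }

  private final AtomicReference<Holder> current =
      new AtomicReference<>(new Holder(new HashMap<>()));

  /** Lock-free read of an immutable snapshot. */
  public Map<String, String> snapshot() {
    return current.get().materializations;
  }

  /** Read, copy, modify, CAS; retry if another writer won the race. */
  public void register(String name, String sql) {
    while (true) {
      Holder old = current.get();
      Map<String, String> copy = new HashMap<>(old.materializations);
      copy.put(name, sql);
      if (current.compareAndSet(old, new Holder(copy))) {
        return;
      }
    }
  }
}
```

Julian's objection applies here: each write copies the whole state, which is costly when there are thousands of materializations and frequent changes.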
Re: Materialization performance
My CAS scheme was merely meant for materialization registration. The retry is an implementation detail and would happen inside the registration method; the user wouldn't notice it. APIs stay the way they are; I'd only change the way the MaterializationActor is accessed. The (root) schema is one part that I'd like to see being shared, but I guess the type factory as well as the CalciteServer should be shared between connections too. Is there anything else you think can/should be shared? I could implement the discussed sharing as a javax.sql.DataSource if you want, so we can discuss specifics. Along the way I'd try to do some general performance improvements regarding concurrency synchronization mechanisms. Would that be OK? Kind regards, -------- *Christian Beikov* On 31.08.2017 at 00:02, Julian Hyde wrote: Schema is the context you are referring to. Schema has a longer lifespan than Connection, and if you make an immutable one (which we recommend) you can share it among connections. Your CAS scheme would work but requires each user to create a copy of all of the materialization state. This is potentially large (thousands of tables) and rapidly changing. Also, your scheme requires the user to re-try. I think the actor model is better suited for this. On Wed, Aug 30, 2017 at 2:14 PM, Christian Beikov wrote: [...]
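The sharing Julian recommends (one long-lived immutable schema context, many short-lived connections) can be sketched generically. This is not Calcite API, just the shape of the design; all names are hypothetical:

```java
/** Hypothetical sketch of sharing one immutable context across connections. */
public class SharedContextDemo {
  /** Stands in for the shared root schema, type factory, etc. */
  static final class SharedContext {
    final Object rootSchema;
    SharedContext(Object rootSchema) { this.rootSchema = rootSchema; }
  }

  /** Built once at startup; hands out cheap connections. */
  static final class ConnectionFactory {
    private final SharedContext context;
    ConnectionFactory(SharedContext context) { this.context = context; }
    LightConnection open() { return new LightConnection(context); }
  }

  /** Per-connection object: references the shared context, so opening a
   * connection does no schema initialization of its own. */
  static final class LightConnection {
    final SharedContext context;
    LightConnection(SharedContext context) { this.context = context; }
  }
}
```

The key property is that every connection sees the same immutable context instance, which is exactly what makes pooling cheap.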
Re: JdbcAdapter Sort Limit and Offset
I might not be qualified to answer how to handle limit/offset, but I'd like to step in and say that we had a related discussion regarding where to put sequence support. How about introducing unparse methods in the SqlDialect for that stuff and letting the dialects handle it however a DBMS supports it? Since the dialect is constructed with the database metadata, it could even do further self-configuration, i.e. check whether configurations are activated on the DBMS that enable the use of a certain capability. If we want to support DBMSs that don't have a clause we can use for limit/offset, we need to wrap the query and use whatever the DBMS has to offer, e.g. ROWNUM. I'd like to suggest we introduce a SqlDialectFactory which can be configured on the JdbcSchema. The sole purpose of that factory is to read the database metadata and construct an appropriate SqlDialect, and that dialect can then do some self-configuration with that metadata. We'd make the current SqlDialect abstract and create implementations for at least every DBMS that we support. We could still have components like e.g. a SequenceExtractor or LimitOffsetRenderer, but the dialects choose the implementation based on the database metadata. What do you think? Kind regards, -------- *Christian Beikov* On 31.08.2017 at 02:23, Luis Fernando Kauer wrote: Hi, I was really upset that Calcite wouldn't push the limit keyword to the database, not even in simple queries like "select * from t limit 10". Actually, not even ORDER BY was pushed to the database, even in simple queries. So I decided to take a look into it and opened Jira [CALCITE-1906] some time ago.
I found out that SortRemoveRule is applied when there is a RelCollationTraitDef, because it assumes that the table is already ordered, so JdbcSortRule must replace the input's traitSet with RelCollations.EMPTY, just like the MongoDB and Cassandra adapters do. Besides that, a simple fix to JdbcSortRule and JdbcSort allows it to push limit and fetch to the database. Unfortunately, after fixing this locally, other issues appeared. When issuing a query with a sort and no limit/fetch, like: SELECT empno, ename FROM "SCOTT"."EMP" where ename='CLARK' order by ename the selected plan is: JdbcToEnumerableConverter JdbcProject(EMPNO=[$0], ENAME=[$1]) JdbcSort(sort0=[$1], dir0=[ASC]) JdbcFilter(condition=[=($1, 'CLARK')]) JdbcTableScan(table=[[SCOTT, EMP]]) Notice that JdbcProject is above JdbcSort, and that's because SortProjectTransposeRule is applied. The problem is that the generated SQL is: SELECT "EMPNO", "ENAME" FROM (SELECT * FROM "SCOTT"."EMP" WHERE "ENAME" = 'CLARK' ORDER BY "ENAME" NULLS LAST) AS "t0" Notice that the JdbcProject uses a subquery instead of being built at the same query level. The problem is that SqlImplementor.Clause is ordered and "SELECT" comes before "ORDER_BY", and SqlImplementor.Builder has the following check: for (Clause clause : clauses) { if (maxClause.ordinal() > clause.ordinal() || (maxClause == clause && !nonWrapSet.contains(clause))) { needNew = true; } } So what happens is that "PROJECT" is added after "ORDER_BY" and the code above flags the need for a new query. Moving SELECT to the last position in SqlImplementor.Clause fixes this issue, but I don't know if there is any side effect or problem in doing so. I also noticed that after fixing all this, JdbcAdapterTest.testColumnNullability was failing, and the problem seems to be how JdbcAdapter generates the SQL for HSQLDB now that the limit clause is pushed to the database.
The generated SQL is: SELECT "employee_id", "position_id" FROM "foodmart"."employee" FETCH NEXT 10 ROWS ONLY And the error is: Caused by: org.hsqldb.HsqlException: unexpected token: NEXT : line: 3 It seems that HSQLDB has a problem with FETCH NEXT without OFFSET. One solution could be to change "NEXT" to "FIRST" in SqlPrettyWriter, but that would apply to all databases that support this feature; alternatively, we could just make HSQLDB not support OffsetFetch in SqlDialect, so that it uses LIMIT/OFFSET instead. Sorry to bring all this up together, but one thing led to another, fixing one thing started to break other tests, and I would really like to have this fixed. Best regards, Luis Fernando
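The subquery wrapping that Luis traces to SqlImplementor.Clause can be modelled in a few lines (a toy model with illustrative names, not the real SqlImplementor): with SELECT ordered before ORDER_BY, a project arriving above a sort has a lower ordinal than the highest clause already emitted, which flags the need for a wrapping subquery; moving SELECT to the last position makes the check pass.

```java
// Toy model of the ordinal check quoted above; names are illustrative,
// not Calcite's actual SqlImplementor code.
public class ClauseDemo {
    // The ordering Luis describes: SELECT sits before ORDER_BY.
    enum Clause { FROM, WHERE, GROUP_BY, HAVING, SELECT, ORDER_BY }

    // A clause with a lower ordinal than the highest clause already present
    // cannot be added at the same query level, so a subquery is needed.
    static boolean needNewQuery(Clause maxExisting, Clause incoming) {
        return maxExisting.ordinal() > incoming.ordinal();
    }

    public static void main(String[] args) {
        // JdbcProject above JdbcSort: SELECT arrives after ORDER_BY -> wrap.
        System.out.println(needNewQuery(Clause.ORDER_BY, Clause.SELECT));
        // A select added above a filter stays at the same level -> no wrap.
        System.out.println(needNewQuery(Clause.WHERE, Clause.SELECT));
    }
}
```

With ORDER_BY moved before SELECT in the enum, the first call would return false, which is exactly the effect of the fix Luis proposes.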
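Christian's SqlDialectFactory proposal and the HSQLDB fallback Luis suggests can be sketched together; every name below is hypothetical (a simplified model, not Calcite's real SqlDialect API): the factory picks a dialect from the JDBC product name, and each dialect decides whether to render OFFSET ... FETCH or fall back to LIMIT/OFFSET.

```java
// Hypothetical sketch of the SqlDialectFactory proposal; class and method
// names are illustrative, not Calcite's real API.
public class DialectSketch {
    abstract static class Dialect {
        // Dialects that reject FETCH NEXT (the HSQLDB case above) can
        // override this to fall back to LIMIT/OFFSET.
        boolean supportsOffsetFetch() { return true; }

        String renderFetchOffset(Integer offset, Integer fetch) {
            StringBuilder sb = new StringBuilder();
            if (supportsOffsetFetch()) {
                if (offset != null) sb.append(" OFFSET ").append(offset).append(" ROWS");
                if (fetch != null)  sb.append(" FETCH NEXT ").append(fetch).append(" ROWS ONLY");
            } else {
                if (fetch != null)  sb.append(" LIMIT ").append(fetch);
                if (offset != null) sb.append(" OFFSET ").append(offset);
            }
            return sb.toString();
        }
    }

    static class AnsiDialect extends Dialect { }

    static class HsqldbDialect extends Dialect {
        @Override boolean supportsOffsetFetch() { return false; }
    }

    // The factory would normally read DatabaseMetaData.getDatabaseProductName()
    // from the JdbcSchema's connection; a plain string stands in here.
    static Dialect createDialect(String productName) {
        if ("HSQL Database Engine".equals(productName)) {
            return new HsqldbDialect();
        }
        return new AnsiDialect();
    }
}
```

A dialect chosen this way could then self-configure further from the same metadata, as the proposal suggests.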
Sequence support
Hey, I'm having some problems implementing CALCITE-1940 and wanted to ask you for advice on how to tackle it. A query like "select next value for seq1", where "seq1" is a sequence in e.g. PostgreSQL, will generate a query plan like "LogicalProject(NEXT VALUE FOR seq1) -> LogicalValues( (0) )". As far as I understood, the convention to use for executing such a query is determined based on the input. The planner now always chooses the "EnumerableValues" implementation for "LogicalValues", thus the statement is never pushed down to PostgreSQL. My question is, how would I be able to make the expression "NEXT VALUE FOR seq1", represented by the new subtype RexSeqCall, be executed on PostgreSQL? The sequence "seq1" is an entry in the JdbcSchema, but the current code doesn't seem to know that. During planning, it seems only the input types are considered for determining the convention. I hope you understand the problem I am having. Does anyone have ideas on how to fix that? I thought it might be possible to write a planner rule that finds such expressions and transforms them, but that's only one part, I guess. Currently, the planner thinks a plan like "EnumerableProject(NEXT VALUE FOR seq1) -> EnumerableValues( (0) )" is better than "JdbcProject(NEXT VALUE FOR seq1) -> JdbcValues( (0) )", which it isn't. Any ideas how I could fix that? -- Kind regards, *Christian Beikov*
Re: adding CREATE TABLE DDL support?
Sounds like a good idea. Kind regards, *Christian Beikov* On 08.09.2017 at 19:28, Julian Hyde wrote: It's in core/src/test/codegen/includes/parserImpls.ftl (i.e. a parser extension used in tests) but we don't want to add it to the core parser because other projects want to have their own CREATE TABLE syntax. I have been thinking of moving DDL into a separate module, so if people want to use Calcite+DDL in production they could do so. What do people think? Julian On Sep 8, 2017, at 7:53 AM, 王翔宇 wrote: Looks like CREATE is not currently supported by the parser. I want to add it.
Re: 1.14.0 Release candidate
I could set up test automation on Travis CI for most of the different databases if you want, with some description and scripts for how to run the docker containers for DBs locally. Would that help? It's almost as flexible as Jenkins. Regards, Christian On 13.09.2017 at 22:46, "Michael Mior" wrote: > Created https://issues.apache.org/jira/browse/CALCITE-1985 to track the > issues with the MongoDB adapter. If someone else could have a quick look > that would be great. Failed test output is in there. > > -- > Michael Mior > mm...@apache.org > > 2017-09-13 14:29 GMT-04:00 Michael Mior : > > > That's the major blocker here. I don't want to make a release when the > > Cassandra adapter is completely broken for me. Although if others can > > confirm that there are no issues, then maybe we should proceed and I'll > > figure out the issues with my environment later. > > > > There are also a couple integration test failures on MongoDB that I'd > like > > to resolve, but at least one of those seems spurious as the generated > plan > > also seems ok. I'll open up JIRA for those and any failures I encounter > > with details in case someone else is able to have a quick look. > > > > Other than test failures, I don't believe there are any outstanding bug > > fixes or PRs that need to be merged. As mentioned earlier, if someone > could > > run integration tests for Oracle, that would be great. I had a brief look > > over the Coverity scan results and didn't see anything that looks worth > > blocking over, but I think it would be a good idea to have a more > thorough > > review in the future and also to set up some models to prune spurious > > warnings so it's easier to take action in the future. > > > > -- > > Michael Mior > > mm...@apache.org > > > > 2017-09-13 14:10 GMT-04:00 Julian Hyde : > > > >> How close are we to a release candidate? Commits to master are paused, > >> so let's either make an RC soon or re-open the branch to commits.
> >> > >> Michael and I seem to be deadlocked on > >> https://issues.apache.org/jira/browse/CALCITE-1981. Michael gets a > >> build error every time, I never get a build error or a runtime error. > >> (I got one once - due to stale jars on my class path, I think.) > >> > >> Can someone else please try to reproduce the build error? Then we'll > >> know whether it's me or Michael who has a messed-up environment. > >> > >> Julian > >> > >> > >> On Fri, Sep 8, 2017 at 1:59 PM, Julian Hyde wrote: > >> > I don’t think there is anyone who has knowledge of MongoDB and time to > >> make the fixes. > >> > > >> >> On Sep 7, 2017, at 10:06 AM, Michael Mior > wrote: > >> >> > >> >> Thanks. I'm also seeing some integration test failures for the > MongoDB > >> >> adapter. If someone more familiar with Mongo could check that out, > that > >> >> would be great. > >> >> > >> >> -- > >> >> Michael Mior > >> >> mm...@apache.org > >> >> > >> >> 2017-09-07 12:51 GMT-04:00 Julian Hyde : > >> >> > >> >>> OK, doing that now. The coverity config is a little rusty, so it may > >> >>> take a while before I have it working. I'll let you know. > >> >>> > >> >>> I discovered a couple of tests that were failing on windows, so I'll > >> >>> be committing fixes for those also. > >> >>> > >> >>> Other PRs should wait until after the release. > >> >>> > >> >>> On Wed, Sep 6, 2017 at 6:21 PM, Michael Mior > >> wrote: > >> Could you also trigger a new Coverity scan? I don't have write > >> access to > >> your repo. > >> > >> -- > >> Michael Mior > >> mm...@apache.org > >> > >> 2017-09-06 15:50 GMT-04:00 Julian Hyde : > >> > >> > I’m good to go. > >> > > >> > Today, I will check that Calcite build / test on still works > >> Windows. > >> > > >> > Julian > >> > > >> >> On Sep 6, 2017, at 11:48 AM, Michael Mior > >> wrote: > >> >> > >> >> As far as I know, all changes people are hoping to have in the > >> 1.14.0 > >> >> release have landed. 
Please speak up if that is not the case, > >> >>> otherwise > I'm > >> hoping to prepare RC0 for tomorrow. > >> >> > >> >> -- > >> >> Michael Mior > >> >> mm...@apache.org
Re: Using a calcite DB locally as well as exposing it over Avatica?
I might be wrong, but as far as my knowledge goes, you can't have more than a single connection to the same schema right now. I am planning to work on that soon by implementing a datasource. As part of that, I would take a look at how Avatica's remote capabilities fit in. Until then, I'd rather stick to testing for now. Regards, Christian On 18.09.2017 at 8:36 AM, "Nick Stephen" wrote: Hi folks, We've started experimenting with Calcite for some work and have been running it in-process using a datasource wrapped around a local connection (using jdbc:calcite). I'd now like to expose this same database remotely using Avatica, but can't see any APIs in Avatica which can take a Connection object. If I create a Meta with the same local URL then I understandably get a different DB instance. Could someone point out the appropriate API, or are there any examples of exposing a Calcite DB remotely using Avatica whilst still being able to use it locally in-process to avoid the additional network hop? Thanks! Nick
Travis CI build
I just wanted to give you a little update on this since I have to wait for a response now. I tried to debug why the build hangs on the Travis CI infrastructure, and it seems this is related to a known issue regarding parallel execution of parameterized tests: https://issues.apache.org/jira/browse/SUREFIRE-1264 When disabling parallel execution, the tests run without an issue. I suggest we introduce a property for configuring this and disable parallel execution on Travis CI until this issue is resolved. Would that be ok? -- Kind regards, *Christian Beikov*
Re: 1.14.0 Release candidate
I'd say whoever broke the build should take a look at it, but that's hard to tell now, I guess. I'm working on setting up build automation on Travis CI which will help with seeing what or who broke the build, but that will only help in the future. For now I'd say, depending on whether the MongoDB adapter is considered production ready, ignore the problems or cancel the release. Kind regards, -------- *Christian Beikov* On 20.09.2017 at 04:18, Jesus Camacho Rodriguez wrote: Michael, It seems nobody is willing to tackle CALCITE-1985 shortly, thus we need to unblock the situation. I would suggest we move it to 1.15.0 instead of qualifying it as a blocker for 1.14.0, since it seems it may cause a performance regression for the MongoDB adapter, but there is no loss of functionality and the rest of the tests are passing. We could add a release note if necessary. Otherwise, we need to cancel the release and wait for it to be fixed, since we have been in the release cycle for a while now holding all commits to the master branch. I would like to hear what others think about this too. Other ideas? Thanks, -Jesús On 9/20/17, 12:08 AM, "Michael Mior" wrote: Anyone willing to look at https://issues.apache.org/jira/browse/CALCITE-1985? As far as I'm concerned, this is the only blocker for the release since it seems I'm the only one who has issues with the Cassandra adapter. -- Michael Mior mm...@apache.org 2017-09-14 14:50 GMT-04:00 Michael Mior : LGTM. I'm just waiting on resolving CALCITE-1981 and CALCITE-1985. Jesús, any changes related to MV rewriting that may have caused CALCITE-1981? Also, would appreciate anyone having a look at the Mongo failures in CALCITE-1985. Other than that, I think we're good to release. -- Michael Mior mm...@uwaterloo.ca 2017-09-14 14:46 GMT-04:00 Julian Hyde : Michael, May I commit https://github.com/apache/calcite/pull/537? It's just a javadoc change.
Julian
Re: Travis CI build
Here is the PR for disabling parallelism on Travis CI. I also took the opportunity to fix the JDK7 problem. https://github.com/apache/calcite/pull/542 Kind regards, *Christian Beikov* On 20.09.2017 at 03:34, Julian Hyde wrote: Adding a parameter to control parallelism sounds fine. I'd prefer that the tests continue to run in parallel by default, but it's fine to override in CI. On Sep 19, 2017, at 4:34 AM, Christian Beikov wrote: I just wanted to give you a little update on this since I have to wait for a response now. I tried to debug why the build hangs on the Travis CI infrastructure and it seems this is related to a known issue regarding parallel execution of parameterized tests: https://issues.apache.org/jira/browse/SUREFIRE-1264 When disabling the parallel execution, the tests should run without an issue. I suggest we introduce a property for configuring this and disable parallel execution on Travis CI until this issue is resolved. Would that be ok? -- Kind regards, *Christian Beikov*
Re: Fwd: ElasticSearch5 not working
Sqlline is problematic as it will have both Elasticsearch versions on the classpath. I haven't thought about that before. In the meantime you could remove old versions of libraries from the "target/fullclasspath.txt" file to work around the classpath problem. @Julian Hyde: How should we deal with something like this? We'd need isolated "modules" instead of a flat classpath for sqlline. Maybe we could build sqlline on JBoss Modules (https://github.com/jboss-modules/jboss-modules) to achieve that? Kind regards, ---- *Christian Beikov* On 27.09.2017 at 01:30, Kunwar Deep Singh Toor wrote: -- Forwarded message -- From: Kunwar Deep Singh Toor Date: Mon, Sep 25, 2017 at 8:16 PM Subject: ElasticSearch5 not working To: dev-i...@calcite.apache.org Hi, I have been trying to connect to Elasticsearch using the ES adapter through sqlline. I was able to connect to ES 2.1.4 using the ES2 adapter, but when I try connecting to ES 5.1.1 using the ES5 adapter the following error pops up. I checked the logs for Elasticsearch and it does not show a connection attempt being made.
The following is the error: java.lang.NoSuchMethodError: org.elasticsearch.transport.client.PreBuiltTransportClient.addPlugins(Ljava/util/Collection;Ljava/util/Collection;)Ljava/util/Collection; at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:130) at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:116) at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:106) at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Schema.open(Elasticsearch5Schema.java:119) at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Schema.<init>(Elasticsearch5Schema.java:74) at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5SchemaFactory.create(Elasticsearch5SchemaFactory.java:56) at org.apache.calcite.model.ModelHandler.visit(ModelHandler.java:269) at org.apache.calcite.model.JsonCustomSchema.accept(JsonCustomSchema.java:45) at org.apache.calcite.model.ModelHandler.visit(ModelHandler.java:195) at org.apache.calcite.model.ModelHandler.<init>(ModelHandler.java:87) at org.apache.calcite.jdbc.Driver$1.onConnectionInit(Driver.java:104) at org.apache.calcite.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:139) at sqlline.DatabaseConnection.connect(DatabaseConnection.java:156) at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:204) at sqlline.Commands.connect(Commands.java:1095) at sqlline.Commands.connect(Commands.java:1001) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:38) at sqlline.SqlLine.dispatch(SqlLine.java:791) at sqlline.SqlLine.begin(SqlLine.java:668) at sqlline.SqlLine.start(SqlLine.java:373) at sqlline.SqlLine.main(SqlLine.java:265) I have recently started working on Calcite and any help would be much appreciated. Thanks, Kunwar Singh
Re: Fwd: ElasticSearch5 not working
Depends on whether you want to be able to connect to an ES2 and an ES5 server at the same time through sqlline. On 27.09.2017 at 17:53, "Josh Elser" wrote: > A quick solution is to add some documentation around how to give a > specific version of ElasticSearch at build-time. This would help folks > build Calcite with the exact version that they want (assuming that the API > hasn't changed and the adapter actually works with that version of ES) > > We don't need to have multiple versions of ElasticSearch on the classpath > at once, right?
Re: ElasticSearch5 not working
Well, it's not so simple, I suppose. You also need to exclude the elasticsearch2 maven module from the pom.xml since that one is actually contributing the elasticsearch client to the classpath. For the sake of simplicity, we could introduce profiles that only have a specific ES version, but obviously that doesn't scale to other possible version combinations. I can introduce a profile if you want, but a real fix, as I said, needs classloader magic like what e.g. JBoss Modules does. Kind regards, *Christian Beikov* On 27.09.2017 at 20:28, Julian Hyde wrote: I agree with Christian that it's just a problem with sqlline's class path. The simplest solution is to remove elasticsearch2 from the line that builds the class path: for module in core cassandra druid elasticsearch2 elasticsearch5 file mongodb spark splunk example/csv example/function; do CP=${CP}${module}/target/classes: CP=${CP}${module}/target/test-classes: done Then sqlline will work out of the box for ES5. People who want to run ES2 will have to edit the script. Kunwar, Can you please log a JIRA case? Christian, Can you fix it? It's conceivable that someone would want to combine data from an ES2 and ES5 instance, but we don't have an answer for them, whether they invoke Calcite via sqlline or any other way. The incompatible libraries will clash, unless there is some class-loader magic keeping them in different namespaces. The right answer is probably something like JBoss Modules, but it's low priority. Julian
Re: ElasticSearch5 not working
The elasticsearch2 module depens on the elasticsearch client driver libraries with version 2.x and the elasticsearch5 module depends on the elasticsearch client driver libraries with version 5.x. That's the version conflict Kunwar was running into. Mit freundlichen Grüßen, ---- *Christian Beikov* Am 28.09.2017 um 18:35 schrieb Julian Hyde: Christian, I don’t understand. The elasticsearch5 module does not depend on the elasticsearch2 module, only on core and linq4j. Therefore if it (and its dependencies) are added to sqlline’s class path, it should have everything it needs. What am I missing? Julian On Sep 27, 2017, at 11:02 PM, Christian Beikov wrote: Well it's not so simple I suppose. You also need to exclude the elasticsearch2 maven module from the pom.xml since that one is actually contributing the elasticsearch client to the classpath. For the sake of simplicity, we could introduce profiles that only have a specific ES version, but that doesn't scale obviously regarding possible other version combinations. I can introduce a profile if you want, but a real fix as I said needs classloader magic like what e.g. JBoss Modules does. Mit freundlichen Grüßen, -------- *Christian Beikov* Am 27.09.2017 um 20:28 schrieb Julian Hyde: I agree with Christian that it’s just a problem with sqlline’s class path. The simplest solution is to remove elasticsearch2 from the line that builds the class path: for module in core cassandra druid elasticsearch2 elasticsearch5 file mongodb spark splunk example/csv example/function; do CP=${CP}${module}/target/classes: CP=${CP}${module}/target/test-classes: done Then sqlline will work out of the box for ES5. People who want to run ES2 will have to edit the script. Kunwar, Can you please log a JIRA case? Christian, Can you you fix it? It’s conceivable that someone would want to combine data from an ES2 and ES5 instance, but we don’t have an answer for them, whether they invoke Calcite via sqlline or any other way. 
The incompatible libraries will clash, unless there is some class-loader magic keeping them in different namespaces. The right answer is probably something like Jboss modules, but it’s low priority. Julian On Sep 27, 2017, at 9:29 AM, Christian Beikov wrote: Depends if you want to be able to connect to an ES2 and ES5 server at the same time through sqlline. Am 27.09.2017 17:53 schrieb "Josh Elser" : A quick solution is to add some documentation around how to give a specific version of ElasticSearch at build-time. This would help folks build Calcite with the exact version that they want (assuming that the API hasn't changed and the adapter actually works with that version of ES) We don't need to have multiple versions of ElasticSearch on the classpath at once, right? On 9/27/17 2:13 AM, Christian Beikov wrote: Sqlline is problematic as it will have both elastic search versions on the classpath. I haven't thought about that before. In the meantime you could remove old versions of libraries from the "target/fullclasspath.txt" file to workaround the classpath problem. @Julian Hyde: How should we deal with something like this? We'd need isolated "modules" instead of a flat class path for sqlline. Maybe we could build sqlline on JBoss Modules(https://github.com/jbo ss-modules/jboss-modules) to achieve that? Mit freundlichen Grüßen, -------- *Christian Beikov* Am 27.09.2017 um 01:30 schrieb Kunwar Deep Singh Toor: -- Forwarded message -- From: Kunwar Deep Singh Toor Date: Mon, Sep 25, 2017 at 8:16 PM Subject: ElasticSearch5 not working To: dev-i...@calcite.apache.org Hi, I have been trying to connect to the ElasticSearch suing ES adapter through sqlline. I was able to connect to ES 2.1.4 using the ES2 adapter but when I try connecting to ES 5.1.1 using the ES5 adapter the following error pops up, I checked the logs for ElasticSearch and it does not show a connection attempt being made. 
The following is the error:

java.lang.NoSuchMethodError: org.elasticsearch.transport.client.PreBuiltTransportClient.addPlugins(Ljava/util/Collection;Ljava/util/Collection;)Ljava/util/Collection;
    at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:130)
    at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:116)
    at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:106)
    at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Schema.open(Elasticsearch5Schema.java:119)
    at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Schema.<init>(Elasticsearch5Schema.java:74)
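Julian's suggested fix above amounts to a small edit in the sqlline script's classpath loop. Here is a sketch of what the edited loop would look like, with elasticsearch2 removed from the module list quoted above (the final check is added here for illustration; it is not part of the sqlline script):

```shell
# Build sqlline's classpath without the elasticsearch2 module, so that
# only the ES 5.x client driver libraries end up on the classpath.
CP=""
for module in core cassandra druid elasticsearch5 file mongodb spark splunk example/csv example/function; do
  CP=${CP}${module}/target/classes:
  CP=${CP}${module}/target/test-classes:
done

# Sanity check: elasticsearch2 no longer appears anywhere in the classpath.
case "$CP" in
  *elasticsearch2*) echo "ES2 still on the classpath" ;;
  *)                echo "ES2 excluded" ;;
esac
# prints: ES2 excluded
```

People who still need ES2 would edit the list back, which is exactly the trade-off discussed in the thread.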
Re: ElasticSearch5 not working
Maybe I am wrong, but I was assuming that the command "mvn dependency:build-classpath" will include the dependencies of all modules, regardless of whether we include the elasticsearch2 target/classes or not.

Kind regards,
*Christian Beikov*

On 28.09.2017 at 19:16, Julian Hyde wrote:
I understand that. But if we remove calcite-elasticsearch2 from sqlline's classpath, sqlline will still get calcite-elasticsearch5 and the Elasticsearch 5.x client driver libraries that it depends upon. And the Elasticsearch 2.x client driver libraries will not be on sqlline's class path, because calcite-elasticsearch2 is the only module that depends on them. So there will be no clash. Julian

On Sep 28, 2017, at 9:53 AM, Christian Beikov wrote:
The elasticsearch2 module depends on the Elasticsearch client driver libraries in version 2.x, and the elasticsearch5 module depends on the Elasticsearch client driver libraries in version 5.x. That's the version conflict Kunwar was running into.
Re: Does calcite have working ElasticSearch 5 adapter?
It does support basic operations, just like the ES2 adapter, since it is based on it. I am no ES user, so I don't plan on doing this work; maybe you or someone else could step in?

Kind regards,
*Christian Beikov*

On 01.10.2017 at 05:57, kant kodali wrote:
Hi All, the Calcite Elasticsearch 5 adapter webpage says the adapter currently doesn't support aggregations, but when I look at this code https://github.com/apache/calcite/tree/master/elasticsearch5/src/main/java/org/apache/calcite/adapter/elasticsearch5 it doesn't look like it supports basic queries either. May I know when we can possibly expect this? Thanks!
Re: Does calcite have working ElasticSearch 5 adapter?
You can see what is supported by looking at the tests for the adapter.

Kind regards,
*Christian Beikov*

On 02.10.2017 at 10:17, kant kodali wrote:
Where can I find all the operations the current ES5 adapter supports? Any ES5 examples? I have to deep-dive into the code before I can commit to contributing, but I'll see what I can do.

On Mon, Oct 2, 2017 at 12:52 AM, Christian Beikov <christian.bei...@gmail.com> wrote:
It does support basic operations, just like the ES2 adapter, since it is based on it. I am no ES user, so I don't plan on doing this work; maybe you or someone else could step in?
Re: Need to extract only the values in the field
The following should do it:

select cast(_MAP['POP'] as integer) from "elasticsearch".zips

In the tests of the adapter you can see other usages: https://github.com/apache/calcite/blob/cc20ca13db4d506d9d4d1b861dd1c7ac3944e56e/elasticsearch2/src/test/resources/elasticsearch-zips-model.json#L37

Kind regards,
*Christian Beikov*

On 09.10.2017 at 19:40, AshwinKumar AshwinKumar wrote:
Hi Team, could you please help here? I have been stuck on this problem for two days now. Thanks, Ashwin

On Mon, Oct 9, 2017 at 4:42 PM, AshwinKumar AshwinKumar <aash...@g.clemson.edu> wrote:
Hi Team, I have an Elasticsearch view called zips. Below is the table data:

0: jdbc:calcite:model=./elasticsearch2/src/te> select POP from "elasticsearch".ZIPS;
+-------------+
|     POP     |
+-------------+
| {POP=13367} |
| {POP=1652}  |
| {POP=3184}  |
| {POP=43704} |
| {POP=2084}  |
| {POP=1350}  |
| {POP=8194}  |
| {POP=1732}  |
| {POP=9808}  |
| {POP=4441}  |
+-------------+
10 rows selected (0.319 seconds)

Could you please let me know if there is a way to select only the values in the POP field using SQL? For example, I need only the integer values like 13367, 1652 and so on from the table. I need to join these values with another table in a Postgres schema. Thanks, Ashwin
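As a side note on what that cast accomplishes: the raw rows come back as single-column map-style values like {POP=13367}, and the view projection pulls the integer value out. A rough shell illustration of that extraction (purely illustrative; the adapter does this inside Calcite, not in a shell):

```shell
row='{POP=13367}'
# Pull the integer value out of the map-style string, analogous to what
# cast(_MAP['POP'] as integer) does to the underlying ES document.
pop=$(printf '%s' "$row" | sed -n 's/^{POP=\([0-9]*\)}$/\1/p')
echo "$pop"
# prints: 13367
```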
Re: Need to extract only the values in the field
Hey, I'm not the original author of the ES2 adapter, but I must admit it seems odd that select POP from "elasticsearch".ZIPS doesn't work. Could you try ES5 and tell me if that works for you? I changed a few things related to projections for my test purposes, which might be just what the ES2 adapter is missing. Maybe the original ES2 author could step up and explain the rationale behind the way it currently works?

Kind regards,
*Christian Beikov*

On 10.10.2017 at 16:47, AshwinKumar AshwinKumar wrote:
Hi Christian, below is my JSON file:

{
  "version": "1.0",
  "defaultSchema": "elasticsearch1",
  "schemas": [
    {
      "name": "postgrestest",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.jdbc.JdbcSchema$Factory",
      "operand": {
        "jdbcDriver": "org.postgresql.Driver",
        "jdbcUrl": "jdbc:postgresql://localhost/bigdawg_catalog",
        "jdbcUser": "pguser",
        "jdbcPassword": "test"
      }
    },
    {
      "type": "custom",
      "name": "elasticsearch_raw",
      "factory": "org.apache.calcite.adapter.elasticsearch2.Elasticsearch2SchemaFactory",
      "operand": {
        "coordinates": "{'127.0.0.1': 9300}",
        "userConfig": "{'bulk.flush.max.actions': 25, 'bulk.flush.max.size.mb': 1}",
        "index": "usa"
      }
    },
    {
      "name": "elasticsearch",
      "tables": [
        {
          "name": "ZIPS",
          "type": "view",
          "sql": [
            "select cast(_MAP['CITY'] AS varchar(20)) AS \"CITY\",\n",
            " cast(_MAP['LOC'][0] AS float) AS \"LONGITUDE\",\n",
            " cast(_MAP['LOC'][1] AS float) AS \"LATITUDE\",\n",
            " cast(_MAP['POP'] AS integer) AS \"POP\",\n",
            " cast(_MAP['STATE'] AS varchar(2)) AS \"STATE\",\n",
            " cast(_MAP['IDNUM'] AS varchar(5)) AS \"ID\"\n",
            "from \"elasticsearch_raw\".\"ZIPS\""
          ]
        }
      ]
    }
  ]
}

Earlier I was executing select POP from "elasticsearch".ZIPS; and got the error below:

0: jdbc:calcite:model=./elasticsearch2/src/te> select POP from "elasticsearch".ZIPS;
java.lang.ClassCastException: java.util.HashMap cannot be cast to java.lang.Integer
    at org.apache.calcite.avatica.util.AbstractCursor$IntAccessor.getInt(AbstractCursor.java:531)
    at org.apache.calcite.avatica.AvaticaSite.get(AvaticaSite.java:339)
    at org.apache.calcite.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:409)
    at sqlline.Rows$Row.<init>(Rows.java:157)
    at sqlline.IncrementalRows.hasNext(IncrementalRows.java:66)
    at sqlline.TableOutputFormat.print(TableOutputFormat.java:33)
    at sqlline.SqlLine.print(SqlLine.java:1648)
    at sqlline.Commands.execute(Commands.java:834)
    at sqlline.Commands.sql(Commands.java:733)
    at sqlline.SqlLine.dispatch(SqlLine.java:795)
    at sqlline.SqlLine.begin(SqlLine.java:668)
    at sqlline.SqlLine.start(SqlLine.java:373)
    at sqlline.SqlLine.main(SqlLine.java:265)

I tried the one which you suggested and I am getting the error below:

0: jdbc:calcite:model=./elasticsearch2/src/te> select cast(_MAP['POP'] as integer) from "elasticsearch".ZIPS;
2017-10-10 15:44:33,334 [main] ERROR - org.apache.calcite.sql.validate.SqlValidatorException: Column '_MAP' not found in any table
2017-10-10 15:44:33,347 [main] ERROR - org.apache.calcite.runtime.CalciteContextException: From line 1, column 13 to line 1, column 16: Column '_MAP' not found in any table
Error: Error while executing SQL "select cast(_MAP['POP'] as integer) from "elasticsearch".ZIPS": From line 1, column 13 to line 1, column 16: Column '_MAP' not found in any table (state=,code=0)

All the varchar fields are working fine. For example:

0: jdbc:calcite:model=./elasticsearch2/src/te> select STATE from "elasticsearch".ZIPS;
+------------+
|   STATE    |
+------------+
| {STATE=MA} |
| {STATE=MA} |
| {STATE=MA} |
| {STATE=MA} |
| {STATE=MA} |
| {STATE=MA} |
| {STATE=MA} |
| {STATE=MA} |
| {STATE=MA} |
| {STATE=MA} |
+
Re: [ANNOUNCE] New committer: Christian Beikov
Thank you all for the kind words! :)

Kind regards,
*Christian Beikov*

On 13.10.2017 at 09:45, Chris Baynes wrote:
Congrats Christian!

On Fri, Oct 13, 2017 at 5:22 AM, He Zhiqiang wrote:
Congratulations and welcome! Regards, Zhiqiang He

From: Julian Hyde
Sent: Thursday, October 12, 2017 2:11
To: dev@calcite.apache.org
Subject: [ANNOUNCE] New committer: Christian Beikov

On behalf of the PMC I am delighted to announce Christian Beikov as a new Calcite committer. Christian's first contribution [1] (quite out of the blue) was a new adapter for Elasticsearch 5. Since then he has made various improvements to Calcite's support for federation and materialization, and has been active in design discussions and in helping users on the dev list. Please give Christian a warm welcome to the project! Julian
[1] https://github.com/apache/calcite/commits/master?author=beikov <https://github.com/apache/calcite/commits/master?author=beikov>
Re: need help with Elasticsearch adapter
The problem is documented in this issue already: https://issues.apache.org/jira/browse/CALCITE-1994 Unfortunately, it's not so easy to use the Elasticsearch adapter with sqlline.

Kind regards,
*Christian Beikov*

On 26.12.2017 at 09:51, bryanbzhou(周彬) wrote:
Hi all, I've run into a problem with the Elasticsearch adapter. Can someone help me? I tried to test the Elasticsearch adapter following https://calcite.apache.org/docs/elasticsearch_adapter.html.

First, I created an Elasticsearch instance with Docker on my MacBook using the following command:

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:5.6.1

Then I edited es.json:

{
  "version": "1.0",
  "defaultSchema": "elasticsearch",
  "schemas": [
    {
      "type": "custom",
      "name": "elasticsearch",
      "factory": "org.apache.calcite.adapter.elasticsearch5.Elasticsearch5SchemaFactory",
      "operand": {
        "coordinates": "{'127.0.0.1': 9300}",
        "userConfig": "{ 'bulk.flush.max.size.mb': 10}",
        "index": "usa"
      }
    }
  ]
}

And I tried to connect to ES with sqlline: !connect jdbc:calcite:model=es.json elastic changeme

But I got an error:

java.lang.NoSuchMethodError: org.elasticsearch.transport.client.PreBuiltTransportClient.addPlugins(Ljava/util/Collection;Ljava/util/Collection;)Ljava/util/Collection;
    at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:130)
    at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:116)
    at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:106)
    at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Schema.open(Elasticsearch5Schema.java:119)
    at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5Schema.<init>(Elasticsearch5Schema.java:74)
    at org.apache.calcite.adapter.elasticsearch5.Elasticsearch5SchemaFactory.create(Elasticsearch5SchemaFactory.java:56)
    at org.apache.calcite.model.ModelHandler.visit(ModelHandler.java:270)
    at org.apache.calcite.model.JsonCustomSchema.accept(JsonCustomSchema.java:45)
    at org.apache.calcite.model.ModelHandler.visit(ModelHandler.java:196)
    at org.apache.calcite.model.ModelHandler.<init>(ModelHandler.java:88)
    at org.apache.calcite.jdbc.Driver$1.onConnectionInit(Driver.java:104)
    at org.apache.calcite.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:139)
    at sqlline.DatabaseConnection.connect(DatabaseConnection.java:156)
    at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:204)
    at sqlline.Commands.connect(Commands.java:1095)
    at sqlline.Commands.connect(Commands.java:1001)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:38)
    at sqlline.SqlLine.dispatch(SqlLine.java:791)
    at sqlline.SqlLine.begin(SqlLine.java:668)
    at sqlline.SqlLine.start(SqlLine.java:373)
    at sqlline.SqlLine.main(SqlLine.java:265)
Re: Maven wrapper
I haven't needed something like that up until now in any of the projects I've worked on. I guess if there were some feature from a very recent Maven release that we wanted to use, it would make sense to adopt the wrapper script. Since the current minimum version required for a build seems to be satisfied by the installations most users have, I don't see a reason for doing this yet. I'd suggest we do this when a new Maven version is released that might not be widespread yet.

Kind regards,
*Christian Beikov*

On 29.12.2017 at 21:42, Julian Hyde wrote:
We have a pull request for a Maven wrapper [1]. People would not need to install Maven to build Calcite, but we would include shell/cmd scripts and a bootstrap .jar. It would allow us to use a specific version of Maven (not that this has been a problem to date). It would make our release process a bit more complicated (we'd be shipping a binary in the .jar file) and we'd have to change instructions in several locations. What do you all think of the idea? I don't think we should change something as fundamental as our build process unless there is a substantial majority in favor. Julian
[1] https://issues.apache.org/jira/browse/CALCITE-2112 <https://issues.apache.org/jira/browse/CALCITE-2112>
Re: Use Calcite adapter with Elastic search and Java
Did you take a look at the documentation? https://calcite.apache.org/docs/elasticsearch_adapter.html If you use ES5+, you have to use an empty userConfig, as the ES client doesn't support the properties that are listed in the example. You can also take a look at the tests (https://github.com/apache/calcite/blob/master/elasticsearch5/src/test/java/org/apache/calcite/test/Elasticsearch5AdapterIT.java) in the Calcite repository to find usage examples. Currently you can't use the ES5 adapter together with sqlline, as there are classpath problems, but nobody has stepped up to fix them yet.

Kind regards,
*Christian Beikov*

On 14.02.2018 at 10:15, Saurabh Pathak wrote:
Hello, I want to use the Calcite adapter with Elasticsearch and Java, but I have no idea how to use the Calcite Elasticsearch adapter. Could you please let us know how to use the Calcite adapter with Elasticsearch and Java? Thanks in advance for your cooperation. Thanks & Regards, Saurabh Pathak
Re: ElasticSearch. Explicit Definitions, Literals and RelBuilder
1) I wasn't aware of an API to retrieve a schema. We could definitely use that to avoid the need for views. 2) I agree, it seems like a bug.

Kind regards,
*Christian Beikov*

On 17.05.2018 at 13:51, Michael Mior wrote:
1) I'm not too familiar with ES, so I didn't realize you could define mappings. I don't see any obvious reason why we couldn't use those, assuming they're exposed via the ES API. 2) I was not aware of this and am not sure whether it was intentional. It seems like a bug to me, though. 3) Whenever you want to see how to build a particular query, you may find it helpful to run "EXPLAIN PLAN FOR <query>" in sqlline. -- Michael Mior mm...@uwaterloo.ca

On Thu, May 17, 2018 at 01:26, Andrei Sereda wrote:
Hello Calcite devs, I have some questions about the ES adapter and custom predicates / projections in Calcite. Your help is much appreciated.

1) All ES examples use a view (ZIPS <https://github.com/apache/calcite/blob/master/elasticsearch5/src/test/resources/elasticsearch-zips-model.json>) which does an explicit type cast, a name alias and dictionary access (via _MAP) for each field. If such a view is not defined beforehand, making ad-hoc queries becomes tedious. Is there a way to make it more user-friendly (e.g. using the existing ES mapping)? Example: select cast(_MAP['city'] AS varchar(20)) AS \"city\" from ... Why do some adapters require explicit definitions (e.g. Mongo / ES) while others don't (e.g. Geode)?

2) When not using explicit casting (or a field alias), query literals are converted to lower case:
SQL: select * from "elastic" where _MAP['Foo'] = 'BAR' (note upper case)
ES query: { "term": { "foo" : "bar" }} (note lower case)
This is unintuitive. Is there a way to switch it off (customize it)?

3) How do I build the following query using the algebra builder (RelBuilder)?
select * from "elastic" where _MAP['Foo'] = 'Bar'
I presume one has to use SqlItemOperator, but I couldn't get it to work. Is this related to (1) and (2)?

Regards, Andrei.
Re: ElasticSearch Adapters. Calcite vs Dremio
I don't know Dremio, but I guess the ES adapter parts were just implemented independently to get something working faster. If the authors of Dremio want to contribute back, we'd definitely integrate such work.

Kind regards,
*Christian Beikov*

On 17.05.2018 at 13:56, Michael Mior wrote:
It's my understanding that while Dremio plugins serve a similar purpose, they are different from Calcite adapters. In any case, it could be worth looking at what Dremio is doing to see if there are any useful features to pull into Calcite. -- Michael Mior mm...@uwaterloo.ca

On Thu, May 17, 2018 at 00:28, Andrei Sereda wrote:
Hello, I've noticed that Dremio implemented <https://github.com/dremio/dremio-oss/tree/master/plugins/elasticsearch> their own (Calcite) adapters for Elasticsearch. According to the documentation <https://www.dremio.com/tutorials/unlocking-sql-on-elasticsearch/>, those adapters support ES aggregations (lacking in Calcite) plus more advanced features like date ranges or native scripting <https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-getting-started.html>. Is there a reason they're not backported / used in Calcite? More specifically, is there any (technical) reason for those plugins to be developed independently? Thanks in advance, Andrei.
Re: ElasticSearch. Explicit Definitions, Literals and RelBuilder
I honestly don't know the reason for that. I mainly just reused the original implementation, which was committed here: https://github.com/apache/calcite/commit/f3caf13b9f1cd92f95dcf27716466bf2133e1ed7#diff-3ebf1b729d2fdd03596ff01088977e63 Seems like an easy fix. Would you mind creating the PR? :)

Kind regards,
*Christian Beikov*

On 17.05.2018 at 17:43, Andrei Sereda wrote:
Regarding (2), the lower-case problem: the issue seems to be in ElasticsearchFilter.Translator <https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/adapter/elasticsearch/ElasticsearchFilter.java#L109>, which can be easily fixed:

return "\"query\" : " + builder.toJsonString(map).replaceAll("\\s+", "")
    .toLowerCase(Locale.ROOT);

Any reason for such an explicit toLowerCase() transformation?

On Thu, May 17, 2018 at 11:14 AM, Michael Mior wrote:
I'm not too familiar with the RelBuilder API myself, unfortunately. I think for your ???, you want builder.field(0), although builder.field("_MAP") may also work. -- Michael Mior mm...@uwaterloo.ca

On Thu, May 17, 2018 at 10:39 AM, Andrei Sereda wrote:
Hi Michael and Christian, thanks for your replies. Regarding (3) RelBuilder, below is the query plan. I'm not sure how to express the element =(ITEM($0, 'Foo') using the API.

Query plan:
0: jdbc:calcite:model=target/test-classes/mod> explain plan for select * from "elasticsearch_raw"."table" where _MAP['Foo'] = 'BAR';
ElasticsearchToEnumerableConverter
  ElasticsearchFilter(condition=[=(ITEM($0, 'Foo'), 'BAR')])
    ElasticsearchTableScan(table=[[elasticsearch_raw,table]])

API:
// Using the RelBuilder API to construct a simple ES query
// select * from "elasticsearch_raw"."table" where _MAP['Foo'] = 'BAR';
// use RexInputRef here ?
RexNode item = builder.call(SqlStdOperatorTable.ITEM, ???, builder.literal("Foo"));
final RelNode node = builder
    .scan("elasticsearch_raw", "table")
    .filter(builder.call(SqlStdOperatorTable.EQUALS, item, builder.literal("BAR")))
    .build();

Regards, Andrei.
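The literal-mangling Andrei describes in (2) can be re-enacted outside the adapter: the quoted Translator code strips whitespace from the serialized query and then lower-cases the whole string, so the literal BAR is lower-cased along with the field name. A shell analogue of that transformation (illustrative only, not the adapter's actual code path):

```shell
json='{ "term": { "Foo" : "BAR" } }'
# Strip whitespace and lower-case the entire serialized query, as the
# quoted replaceAll(...).toLowerCase(...) call does. Note that the
# literal BAR is lower-cased too, not just the field name.
sent=$(printf '%s' "$json" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
echo "$sent"
# prints: {"term":{"foo":"bar"}}
```

This is why removing the toLowerCase() call (rather than the whitespace stripping) is the fix under discussion.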
Re: ElasticSearch use of rest client (instead of TransportClient)
Hey Andrei, that would be awesome! Do you know by any chance whether the low-level client is also compatible with older ES versions?

Kind regards,
*Christian Beikov*

On 18.05.2018 at 20:45, Andrei Sereda wrote:
Hello, the ES TransportClient is deprecated in 7.0 (to be removed <https://www.elastic.co/guide/en/elasticsearch/client/java-api/master/transport-client.html> in 8.0) in favor of HTTP REST client(s). Would you consider a contribution switching to the Rest Low-Level Client <https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-low.html> (which has far fewer dependencies)? Thanks, Andrei.
Re: ElasticSearch use of rest client (instead of TransportClient)
That's mainly because the Java drivers changed in a way that made it impossible to use the same adapter. I might be wrong, but I think the ES5 adapter doesn't work with an ES2 server instance, just like the ES2 adapter doesn't work with an ES5+ server instance. If all of this could just go away, that would be great :)

Kind regards,
*Christian Beikov*

On 18.05.2018 at 21:19, Andrei Sereda wrote:
Yes, it should be, since it is just an HTTP client (Apache HTTP). The Elasticsearch REST API (query API) didn't change much <https://www.elastic.co/guide/en/elasticsearch/reference/5.0/breaking-changes-5.0.html>. The next question would be: why is there a need for two separate modules, elasticsearch2 and elasticsearch5?
Re: Elasticsearch integration test failures
I don't see a reason for running ES tests again on a "real" ES instance. What does that even mean? With a local node, we would be testing functionality against almost the real thing. The only difference is that a real setup has a cluster, but that doesn't change the functionality. So I'm all in on using just a local node, which is also something I've wanted to do for a long time, but you got to it before I did. Do the tests succeed with the local node?

Kind regards,
*Christian Beikov*

On 07.06.2018 at 20:47, Andrei Sereda wrote:
Hi Volodymyr, we're trying to bridge the gap between unit and integration tests (which are run infrequently) using fakes <https://martinfowler.com/bliki/InMemoryTestDatabase.html> in Calcite. Currently the plan is to use fongo (see CALCITE-2345 for the Mongo adapter) and a local Elasticsearch node (see CALCITE-2347 for Elastic). During unit tests the code will run against fake instances. For the IT profile, it will run against real instances (as before). Regards, Andrei.

On Thu, Jun 7, 2018 at 2:32 PM, Julian Hyde wrote:
Unfortunately, I don't know. We don't run the integration tests as often as we should.

On Jun 7, 2018, at 10:20 AM, Vova Vysotskyi wrote:
Hi all, I tried to run the Elasticsearch integration tests for the current master, and many tests failed. Is this a known problem, or just an issue with my environment? I noticed that one failure appears after the change made in CALCITE-2316. Most of the other failures are observed for version 1.16 as well. Kind regards, Volodymyr Vysotskyi
Re: Elasticsearch integration test failures
If you migrate the existing tests to use the local node in that PR, I'd consider merging it, given that Travis CI reports no test failures.

Kind regards,
*Christian Beikov*

On 11.06.2018 at 14:33, Andrei Sereda wrote:
> Do the tests succeed with the local node?
The IT tests haven't been migrated to the local node yet. I wanted to do that as a separate commit. There are new tests which run against the embedded instance successfully. Please see PR 716 for details.
> I don't see a reason for running ES tests again on a "real" ES instance. What does that even mean?
The only reason I see for connecting to an external instance is testing compatibility with different versions of the engine.

On Mon, Jun 11, 2018 at 7:16 AM, Michael Mior wrote:
Good point. Since ES is written in Java and we're running the same code when testing on a local node, it's probably not necessary to test against a separate ES cluster. -- Michael Mior mm...@apache.org

On Mon, Jun 11, 2018 at 01:44, Christian Beikov <christian.bei...@gmail.com> wrote:
I don't see a reason for running ES tests again on a "real" ES instance. What does that even mean? With a local node, we would be testing functionality against almost the real thing. The only difference is that a real setup has a cluster, but that doesn't change the functionality. So I'm all in on using just a local node, which is also something I've wanted to do for a long time, but you got to it before I did. Do the tests succeed with the local node?
Re: Elasticsearch integration test failures
For me that's fine, though it would be great if you could fix those, if your time permits. Andrei Sereda schrieb am Mo., 11. Juni 2018, 22:12: > There are some tests which have been failing for a while. Is it OK if I > temporarily mark them @Ignore(d)? > > On Mon, Jun 11, 2018 at 3:48 PM, Christian Beikov < > christian.bei...@gmail.com> wrote: > > > If you migrate the existing tests to use the local node in that PR, I'd > > consider merging that PR given that Travis CI reports no test failures.
Re: Elasticsearch integration test failures
I'm fine with a single PR as long as you split the commits up. Andrei Sereda schrieb am Mo., 11. Juni 2018, 23:24: > Hi Christian, > > I'm working in parallel on CALCITE-2331, evaluation of the predicate "(A or B) > and C" for ES. Can I fix existing bugs as part of that (or a separate) commit? > > I was planning to do it, just not in the same PR. > > On Mon, Jun 11, 2018 at 4:31 PM, Christian Beikov < > christian.bei...@gmail.com> wrote: > > > For me that's fine, though it would be great if you could fix those given > > your time permits it.
Re: Calcite travis jdk11 failed while compilation of calcite-elasticsearch2 on travis
It seems the latest Java 11 build is stricter about the manifest format when reading. Also see other reports here: * https://bugs.eclipse.org/bugs/show_bug.cgi?id=535774 * https://issues.jenkins-ci.org/browse/JENKINS-51871?workflowName=JNJira+%2B+In-Review&stepId=1 Not sure if there is anything you can do. I'd assume other tests that use e.g. Netty would fail compilation as well. Mit freundlichen Grüßen, ---- *Christian Beikov* Am 12.06.2018 um 14:11 schrieb Sergey Nuyanzin: Hello everyone, I have an issue related to the compilation of calcite-elasticsearch2 on Travis. As far as I can see, it relates only to JDK 11, while it compiles fine with the others, e.g. https://travis-ci.org/apache/calcite/builds/391198921 Could anyone please share knowledge on how to overcome this?
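For context on what "reading the manifest" involves: the JDK parses `META-INF/MANIFEST.MF` with `java.util.jar.Manifest`, and the JAR File Specification constrains the format (short header lines, a restricted attribute-name grammar, newline-terminated sections). A minimal, hedged sketch of round-tripping a manifest through that parser — this is not the failing Netty/ES manifest itself, just an illustration of the machinery that became stricter:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

public class ManifestProbe {

    // Parses manifest text with the JDK's own parser and returns the
    // requested main attribute, or null if parsing fails.
    public static String mainAttribute(String manifestText, String name) {
        try {
            Manifest mf = new Manifest(
                new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            return mf.getMainAttributes().getValue(name);
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // A well-formed manifest: short lines, terminated by a newline.
        String ok = "Manifest-Version: 1.0\nBuilt-By: calcite\n\n";
        System.out.println(mainAttribute(ok, "Built-By")); // prints "calcite"
    }
}
```

A manifest that violates the format would make the parser throw, which is the kind of failure a stricter JDK build surfaces at dependency-reading time.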
Re: Elasticsearch integration test failures
Interesting, I tested both environments back then and it seemed to work. Good thing that we are about to replace the IT tests with local ES nodes. This will finally allow us to run these tests regularly. Regarding ES2 and ES5, I'd suggest you have a separate ElasticsearchProjectRule for each version. In Elasticsearch2Table and Elasticsearch5Table you add the rules to the ElasticsearchTableScan constructor. Mit freundlichen Grüßen, ---- *Christian Beikov* Am 13.06.2018 um 06:15 schrieb Andrei Sereda: The issue seems to have started with CALCITE-1967 (ES5 support). ES2 expects the keyword _source in scripted fields while ES5 expects params._source. See ElasticsearchProject <https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/adapter/elasticsearch/ElasticsearchProject.java#L79>. Also, ES2 doesn't like it when both script_fields and _source are present in the query. So if there are explicit projections, all of them have to be present in script_fields. Any elegant way to differentiate ES2 vs ES5 at runtime inside ElasticsearchProject? On Mon, Jun 11, 2018 at 5:46 PM Christian Beikov wrote: I'm fine with a single PR as long as you split commits up.
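The ES2/ES5 scripted-field difference discussed above can be handled by a small helper that picks the prefix from the detected ES major version: a hedged sketch of the idea (the class and method names here are illustrative, not the adapter's actual API, although the commit that later landed added a similar `scriptedFieldPrefix()` helper):

```java
// Illustrative sketch: ES 2.x scripts address fields as "_source.foo",
// while ES 5.x requires "params._source.foo".
public class ScriptedFieldPrefix {

    // Chooses the script prefix based on the server's major version.
    public static String prefix(int esMajorVersion) {
        return esMajorVersion >= 5 ? "params._source" : "_source";
    }

    // Builds the script expression for one projected field.
    public static String fieldScript(int esMajorVersion, String fieldName) {
        return prefix(esMajorVersion) + "." + fieldName;
    }

    public static void main(String[] args) {
        System.out.println(fieldScript(2, "state")); // prints "_source.state"
        System.out.println(fieldScript(5, "state")); // prints "params._source.state"
    }
}
```

The version itself would come from the cluster at runtime (e.g. the info endpoint), which is why the dispatch has to happen inside the adapter rather than at compile time.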
Re: Calcite travis jdk11 failed while compilation of calcite-elasticsearch2 on travis
I think this might actually be a JDK bug. Let's wait a bit for a new EA release. Mit freundlichen Grüßen, ---- *Christian Beikov* Am 12.06.2018 um 22:14 schrieb Julian Hyde: What would be the command-line arguments to achieve this? I will add them to my nightly build scripts. I already skip certain Guava versions based on JDK. Julian On Jun 12, 2018, at 12:42 PM, Kevin Risden wrote: We might want to skip JDK11 with ES 2.x. 2.x is EOL from Elastic. I can look into this if it helps. Kevin Risden On Tue, Jun 12, 2018, 10:54 Christian Beikov wrote: Seems the latest Java 11 build will be stricter about the manifest format when reading.
Re: ElasticSearch use of rest client (instead of TransportClient)
The original reason for having separate adapters was that the ES Java client SDKs require different library versions which aren't binary compatible. Having separate modules just seemed to be the simplest solution. If you can make sure this is not going to be a problem for users, I'd be all for unifying the adapters. Changing the dependency and the schema factory name is IMO not problematic. Mit freundlichen Grüßen, *Christian Beikov* Am 22.06.2018 um 19:07 schrieb Andrei Sereda: 1) If we go the single (and separate) ES adapter route, people will have to change their existing Maven dependencies as well as their ES schema configuration (at least the SchemaFactory name). I'm not sure if there are any explicit (or implicit) backwards-compatibility policies in Calcite. There are (albeit small) implications for the end user. On Fri, Jun 22, 2018 at 12:54 PM Michael Mior wrote: 1) I personally would be open to this unless there's strong evidence of use of the ES2 adapter. 2) Calcite already depends on Jackson in core and both ES modules, so this isn't a concern. -- Michael Mior mm...@apache.org Le ven. 22 juin 2018 à 12:37, Andrei Sereda a écrit : Some questions regarding this change: 1) Should one remove the ES2 and ES5 adapters (Maven modules) in favor of a single one: just ES? This would be a backwards-incompatible change. Or keep them as is and create a new module? There is also quite a bit of ES-related code in calcite-core. 2) Since I need to create / parse JSON formats, the ES adapter would have to depend on some JSON library (most likely the existing Jackson). Is that acceptable? On Fri, May 18, 2018 at 4:29 PM Andrei Sereda wrote: I believe this shouldn't be an issue with the http client (contrary to the native transport) On Fri, May 18, 2018, 16:16 Christian Beikov < christian.bei...@gmail.com wrote: That's mainly because the Java drivers changed in a way that made it impossible to use the same adapter. 
I might be wrong, but I think the ES5 adapter doesn't work with an ES2 server instance, just like the ES2 adapter doesn't work with an ES5+ server instance. If all of this could just go away, that would be great :) Mit freundlichen Grüßen, -------- *Christian Beikov* Am 18.05.2018 um 21:19 schrieb Andrei Sereda: Yes, it should be, since it is just an http client (Apache HTTP). The ElasticSearch Rest API (query API) didn't change much <https://www.elastic.co/guide/en/elasticsearch/reference/5.0/breaking-changes-5.0.html>. The next question would be: why is there a need for two separate modules, elasticsearch2 and elasticsearch5? On Fri, May 18, 2018 at 3:11 PM, Christian Beikov < christian.bei...@gmail.com> wrote: Hey Andrei, that would be awesome! Do you know by any chance if the low-level client is also compatible with older ES versions? Mit freundlichen Grüßen, -------- *Christian Beikov* Am 18.05.2018 um 20:45 schrieb Andrei Sereda: Hello, ES TransportClient is deprecated in 7.0 (to be removed <https://www.elastic.co/guide/en/elasticsearch/client/java-api/master/transport-client.html> in 8.0) in favor of http rest client(s). Would you consider a contribution switching to the Rest Low-Level Client <https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-low.html> (which has far fewer dependencies)? Thanks, Andrei.
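The reason the low-level REST client sidesteps the version problem is that an ES search is just an HTTP request against a predictable endpoint, with no version-specific binary transport involved. A hedged sketch of the endpoint construction (host, port, and index name here are illustrative, not taken from the adapter):

```java
public class EsEndpoints {

    // Builds the URL of the _search endpoint for a given index.
    // With the low-level REST client you would pass the path part of this
    // to a plain HTTP request; the same path works across ES 2.x and 5.x.
    public static String searchEndpoint(String host, int port, String index) {
        return "http://" + host + ":" + port + "/" + index + "/_search";
    }

    public static void main(String[] args) {
        System.out.println(searchEndpoint("localhost", 9200, "zips"));
        // prints "http://localhost:9200/zips/_search"
    }
}
```

Because only the JSON request body (not the wire protocol) differs between versions, one module can serve multiple server versions, which is what made unifying the adapters feasible.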
Re: ElasticSearch use of rest client (instead of TransportClient)
Looks great! Mit freundlichen Grüßen, *Christian Beikov* Am 22.06.2018 um 20:24 schrieb Michael Mior: Looks good to me but I'll defer to Christian since I know little about ES. Thanks for this Andrei! -- Michael Mior mm...@apache.org Le ven. 22 juin 2018 à 14:13, Andrei Sereda a écrit : CALCITE-2376 <https://issues.apache.org/jira/browse/CALCITE-2376> Please let me know if you agree with the plan On Fri, Jun 22, 2018 at 1:47 PM Christian Beikov < christian.bei...@gmail.com> wrote: The original reason for having separate adapters was because the ES Java-Client SDKs require different library versions which aren't binary compatible.
Re: [2/2] calcite git commit: [CALCITE-2347] running ElasticSearch in embedded mode for unit tests of ES adapter (Andrei Sereda)
Huh? How is that possible? I thought the checks that are run by the Travis CI build catch these kinds of errors. Mit freundlichen Grüßen, *Christian Beikov* Am 22.06.2018 um 23:11 schrieb Julian Hyde: Looks like this change broke “mvn site” (perhaps also “mvn javadoc:test-javadoc”). [ERROR] Failed to execute goal org.apache.maven.plugins:maven-site-plugin:3.7:site (default-site) on project calcite: Error generating maven-javadoc-plugin:3.0.1:test-aggregate report: [ERROR] Exit code: 1 - /home/jhyde/regress/calcite/elasticsearch2/src/test/java/org/apache/calcite/adapter/elasticsearch2/EmbeddedElasticNode.java:29: error: package org.elasticsearch.node.internal does not exist [ERROR] import org.elasticsearch.node.internal.InternalSettingsPreparer; [ERROR] ^ [ERROR] On Jun 21, 2018, at 3:39 AM, mm...@apache.org wrote: [CALCITE-2347] running ElasticSearch in embedded mode for unit tests of ES adapter (Andrei Sereda) After discussion on dev-list Integration tests (for ES) have been removed. They're now superseded by unit tests (which execute queries against a real elastic instance) Added local file (zips-mini.json) which contains a small subset of original zips.json (allows to bootstrap tests faster) Created separate ES JUnit rule which can be re-used across different tests. Both v2 and v5 of ES adapters are supported. 
Close apache/calcite#716 Project: http://git-wip-us.apache.org/repos/asf/calcite/repo Commit: http://git-wip-us.apache.org/repos/asf/calcite/commit/c12cb4b0 Tree: http://git-wip-us.apache.org/repos/asf/calcite/tree/c12cb4b0 Diff: http://git-wip-us.apache.org/repos/asf/calcite/diff/c12cb4b0 Branch: refs/heads/master Commit: c12cb4b0de1baa3f7cbb9952ee350fdd1701662d Parents: 37944bb Author: Andrei Sereda Authored: Thu May 31 18:19:10 2018 -0400 Committer: Michael Mior Committed: Thu Jun 21 06:38:50 2018 -0400 -- .../AbstractElasticsearchTable.java | 12 + .../elasticsearch/ElasticsearchProject.java | 61 ++- elasticsearch2/pom.xml | 6 + .../Elasticsearch2Enumerator.java | 12 +- .../elasticsearch2/Elasticsearch2Schema.java| 16 +- .../elasticsearch2/Elasticsearch2Table.java | 9 +- .../ElasticSearch2AdapterTest.java | 395 ++ .../elasticsearch2/EmbeddedElasticNode.java | 147 +++ .../elasticsearch2/EmbeddedElasticRule.java | 97 + .../org/apache/calcite/test/ElasticChecker.java | 49 +++ .../calcite/test/Elasticsearch2AdapterIT.java | 270 - .../resources/elasticsearch-zips-model.json | 50 --- .../src/test/resources/zips-mini.json | 149 +++ elasticsearch5/pom.xml | 31 ++ .../elasticsearch5/Elasticsearch5Schema.java| 17 +- .../elasticsearch5/Elasticsearch5Table.java | 11 +- .../ElasticSearch5AdapterTest.java | 399 +++ .../elasticsearch5/EmbeddedElasticNode.java | 153 +++ .../elasticsearch5/EmbeddedElasticRule.java | 98 + .../org/apache/calcite/test/ElasticChecker.java | 49 +++ .../calcite/test/Elasticsearch5AdapterIT.java | 270 - .../resources/elasticsearch-zips-model.json | 50 --- elasticsearch5/src/test/resources/log4j2.xml| 16 + .../src/test/resources/zips-mini.json | 149 +++ pom.xml | 20 +- 25 files changed, 1866 insertions(+), 670 deletions(-) -- http://git-wip-us.apache.org/repos/asf/calcite/blob/c12cb4b0/core/src/main/java/org/apache/calcite/adapter/elasticsearch/AbstractElasticsearchTable.java -- diff --git 
a/core/src/main/java/org/apache/calcite/adapter/elasticsearch/AbstractElasticsearchTable.java b/core/src/main/java/org/apache/calcite/adapter/elasticsearch/AbstractElasticsearchTable.java index 0980469..8cc5933 100644 --- a/core/src/main/java/org/apache/calcite/adapter/elasticsearch/AbstractElasticsearchTable.java +++ b/core/src/main/java/org/apache/calcite/adapter/elasticsearch/AbstractElasticsearchTable.java @@ -75,6 +75,18 @@ public abstract class AbstractElasticsearchTable extends AbstractQueryableTable relOptTable, this, null); } + /** + * In ES 5.x scripted fields start with {@code params._source.foo} while in ES 2.x + * {@code _source.foo}. Helper method to build correct query based on runtime version of elastic. + * + * @see <a href="https://github.com/elastic/elasticsearch/issues/20068">_source variable</a> + * @see <a href="https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-fields.html">Scripted Fields</a> + */ + protected String scriptedFieldPrefix() { +// this is default
Re: Elasticsearch Adapter. Removal of Mapping Types (by vendor). Index == Table
Is there an API to discover indexes? If there is, I'd suggest we allow a config option to make the adapter discover the available indexes. We'd still have to adapt the code a bit, but internally, the schema could just keep a cached type-name-to-index-name map and be able to support both scenarios. Mit freundlichen Grüßen, *Christian Beikov* Am 29.06.2018 um 00:12 schrieb Andrei Sereda: 1) What's the time horizon for the current adapter no longer working with these changes to ES? The current adapter will keep working for a while with the existing setup. The problem is nomenclature and ease of use. Their new SQL concepts mapping <https://www.elastic.co/guide/en/elasticsearch/reference/current/_mapping_concepts_across_sql_and_elasticsearch.html> drops the notion of the ES type (which before was the equivalent of an RDBMS table) and uses the ES index as the new table equivalent (before, an ES index was equivalent to a database). Most users use elastic this way (one type, one index): index == table. Currently calcite requires a schema per index; in RDBMS parlance, a database per table (I'd like to change that). 2) Any guess how complicated it would be to maintain code paths for both behaviours? I know this is probably really challenging to estimate, but I really have no idea of the scope of these changes. Would it mean two different ES adapters? One can have separate calcite schema implementations (same adapter / module): 1) LegacySchema (old). A schema can have only one index (but multiple types). Type == table in this case. 2) NewSchema (new). A single schema can have multiple indexes (type is dropped). Index == table in this case. 3) Do we really need compatibility with the current version of the adapter? IMO this depends on what versions of ES we would lose support for and how complex it would be for users of the current ES adapter to make updates for any Calcite API changes. The issue is not in the adapter but in how the calcite schema exposes tables. 
Should it expose an index as an individual table (new), or an ES type (old)? Andrei. On Thu, Jun 28, 2018 at 5:23 PM Michael Mior wrote: Unfortunately I know very little about ES so I'm not in a great position to assess the impact of these changes. I will say that legacy compatibility is great, but maintaining two sets of logic is always a challenge. A few follow-up questions: 1) What's the time horizon for the current adapter no longer working with these changes to ES? 2) Any guess how complicated it would be to maintain code paths for both behaviours? I know this is probably really challenging to estimate, but I really have no idea of the scope of these changes. Would it mean two different ES adapters? 3) Do we really need compatibility with the current version of the adapter? IMO this depends on what versions of ES we would lose support for and how complex it would be for users of the current ES adapter to make updates for any Calcite API changes. Thanks for your continued work on the ES adapter, Andrei! -- Michael Mior mm...@apache.org Le jeu. 28 juin 2018 à 12:57, Andrei Sereda a écrit : Hello, Elastic announced <https://www.elastic.co/guide/en/elasticsearch/reference/master/removal-of-types.html> that they will be deprecating mapping types in ES6 and indexes will be single-typed only. The historical analogy <https://www.elastic.co/blog/index-vs-type> between RDBMS and elastic was that an index is equivalent to a database and a type corresponds to a table in that database. In a couple of releases (ES6-8) this shall no longer be true. The recent SQL addition <https://www.elastic.co/blog/elasticsearch-6-3-0-released> to elastic confirms this trend <https://www.elastic.co/guide/en/elasticsearch/reference/current/_mapping_concepts_across_sql_and_elasticsearch.html>. An index is equivalent to a table and there are no more ES types. I would like to propose to include this logic in the Calcite ES adapter, i.e., expose each ES single-typed index as a separate table inside the calcite schema. 
This is in contrast to the current integration, where a schema can only have a single index. The current approach forces you to create multiple schemas to query single-typed indexes (on the same ES cluster). Legacy compatibility can always be controlled with configuration parameters. Do you agree with such changes? If yes, would you consider a PR? Regards, Andrei.
Re: Elasticsearch Adapter. Removal of Mapping Types (by vendor). Index == Table
IMO the best solution would be to make it configurable by introducing a "table_mapping" config with the values * type - every type in the known indices is mapped as a table * index - every known index is mapped as a table We'd probably also need a "type_field" configuration for defining which field to use for the type determination, as one of the possible future ways to do things is to introduce a custom field: https://www.elastic.co/guide/en/elasticsearch/reference/master/removal-of-types.html#_custom_type_field_2 We already detect the ES version, so we can set a smart default for this setting. Let's make the index config param optional. * When no index is given, we discover indexes, and the default for "table_mapping" is "index" * When an index is given, we only discover types according to the "type_field" configuration, and the default for "table_mapping" is "type" This would also allow discovering indexes but still using "type" as the "table_mapping". What do you think? Mit freundlichen Grüßen, *Christian Beikov* Am 29.06.2018 um 02:41 schrieb Andrei Sereda: Yes. There is an API to list all indexes / types in elastic. They can be automatically imported into a schema. What needs to be agreed upon is how to expose those elements in the calcite schema (naming / behaviour). 1) Many (most?) setups are single type per index. The natural way to name would be "elastic.$index" (elastic being the schema name). Multiple indexes would be under the same schema: "elastic.index1", "elastic.index2", etc. 2) What if an index has several types? Should they be exported as calcite tables "elastic.$index_type1", "elastic.$index_type2"? Or (current behaviour) as "elastic.type1" and "elastic.type2"? Or as a subschema "elastic.$index.type1"? Now what if one has a combination of (1) and (2)? Setup (2) is already deprecated (and will be unsupported in the next version) On Thu, Jun 28, 2018 at 7:31 PM Christian Beikov wrote: Is there an API to discover indexes? 
If there is, I'd suggest we allow a config option to make the adapter discover the possible indexes. We'd still have to adapt the code a bit, but internally, the schema could just keep a cache mapping type name to index name and be able to support both scenarios. Kind regards, *Christian Beikov*

On 29.06.2018 at 00:12, Andrei Sereda wrote: 1) What's the time horizon for the current adapter no longer working with these changes to ES? The current adapter will be working for a while with existing setups. The problem is nomenclature and ease of use. Their new SQL concepts mapping < https://www.elastic.co/guide/en/elasticsearch/reference/current/_mapping_concepts_across_sql_and_elasticsearch.html > drops the notion of an ES type (which before was the equivalent of an RDBMS table) and uses an ES index as the new table equivalent (before, an ES index was equal to a database). Most users use Elastic this way (one type, one index): index == table. Currently Calcite requires a schema per index, in RDBMS parlance a database per table (I'd like to change that). 2) Any guess how complicated it would be to maintain code paths for both behaviours? I know this is probably really challenging to estimate, but I really have no idea of the scope of these changes. Would it mean two different ES adapters? One can have just two separate Calcite schema implementations (same adapter / module): 1) LegacySchema (old). A schema can have only one index (but multiple types). Type == table in this case. 2) NewSchema (new). A single schema can have multiple indexes (type is dropped). Index == table in this case. 3) Do we really need compatibility with the current version of the adapter? IMO this depends on what versions of ES we would lose support for and how complex it would be for users of the current ES adapter to make updates for any Calcite API changes. The issue is not in the adapter but how the Calcite schema exposes tables. Should it expose an index as an individual table (new), or an ES type (old)? Andrei.
On Thu, Jun 28, 2018 at 5:23 PM Michael Mior wrote: Unfortunately I know very little about ES so I'm not in a great position to assess the impact of these changes. I will say that legacy compatibility is great, but maintaining two sets of logic is always a challenge. A few follow-up questions: 1) What's the time horizon for the current adapter no longer working with these changes to ES? 2) Any guess how complicated it would be to maintain code paths for both behaviours? I know this is probably really challenging to estimate, but I really have no idea of the scope of these changes. Would it mean two different ES adapters? 3) D
Re: Elasticsearch Adapter. Removal of Mapping Types (by vendor). Index == Table
I'm not sure what the benefit of allowing users to specify this scheme would be. We'd have to parse it, interpret it, make sure the expressions don't result in conflicting names, etc. IMO a simple mode configuration would be way easier to implement and probably cover 99% of the use cases. Kind regards, -------- *Christian Beikov*

On 29.06.2018 at 20:19, Julian Hyde wrote: Andrei, I'm not an ES user so I don't fully understand this issue, but my two cents anyway... Can you show how those examples affect SQL against the ES adapter and/or how they affect JSON models? You seem to be using '_' as a separator character. Are we sure that people will never use it in an index or type name? Separator characters often cause problems. Julian

On Fri, Jun 29, 2018 at 10:58 AM, Andrei Sereda wrote: I agree there should be a configuration option. How about the following approach. Expose both variables ${index} and ${type} in the configuration (JSON) and the user will use them to generate the table name in the Calcite schema. Example:
"table_name": "${type}" // current
"table_name": "${index}" // new (default?)
"table_name": "${index}_${type}" // most generic, supports multiple types per index

On Fri, Jun 29, 2018 at 9:26 AM Michael Mior wrote: I think it sounds like you and Andrei are in a good position to tackle this one so I'm happy to have you both work on whatever solution you think is best. -- Michael Mior mm...@apache.org Le ven.
29 juin 2018 à 04:19, Christian Beikov wrote: [...]
Re: Elasticsearch Adapter. Removal of Mapping Types (by vendor). Index == Table
I like the idea of the regex filter; it might be cool to have something like that in general for all adapters, but it's fine if you do it just for ES now. I guess you are considering include and exclude pattern parameters? I'm more for a mode parameter and not letting the user decide the name explicitly. Either the types or the index names will have to have meaningful unique names, or the user will have to map certain indexes to a different schema. IMO that's a good solution. Kind regards, -------- *Christian Beikov*

On 30.06.2018 at 16:43, Andrei Sereda wrote: Christian / Michael, Can you please weigh in on your preferred solution and I'll implement it. One more question. Sometimes it is nice to be able to filter (limit) the indexes (tables) exposed by Calcite. Say my cluster has 10 indexes but I want the user to query only one. Would you be opposed if I add a configuration parameter which allows specifying a (e.g. regexp) filter for ES indexes? On Fri, Jun 29, 2018 at 11:17 PM Andrei Sereda wrote: That's a reasonable alternative. On Fri, Jun 29, 2018 at 7:57 PM Julian Hyde wrote: Maybe there could be a separator char as one of the adapter's parameters. People should choose a value, say '$' or '#', that is legal in an unquoted SQL identifier but does not occur in any of their index or type names. If not specified, the adapter would end up in a simple mode, say looking for indexes first, then looking for types, and people would need to make sure indexes and types have distinct names. After the transition to single-type indexes, people could stop using the parameter. Julian On Jun 29, 2018, at 4:43 PM, Andrei Sereda wrote: That's a valid point. Then the user would define a different pattern like "i$index_t$type" for his cluster. I think we should first answer whether such scenarios should be supported by Calcite (given that they're already deprecated by the vendor). If yes, what should the collision strategy be?
A user-defined pattern like above, or failure, or an auto-generated name? On Fri, Jun 29, 2018, 19:14 Julian Hyde wrote: In Elastic the (index/type) pair is guaranteed to be unique, therefore "${index}_${type}" will also be unique (as a string). This is only necessary when we have several types per index. A valid question is whether the user should be allowed such flexibility. Uniqueness is not my concern. Suppose there is an index called "x_y" with a type called "z", and another index called "x" with a type called "y_z". If I write "x_y_z" it's not clear how it should be broken into index/type. On Fri, Jun 29, 2018 at 3:15 PM, Andrei Sereda wrote: Can you show how those examples affect SQL against the ES adapter and/or how they affect JSON models? The discussion is how to properly bridge the (index/type) concept from ES into the relational world. The proposal to use placeholders ($index / $type) affects only how the table is named in Calcite. They're not used as SQL literals, i.e. it affects only the configuration phase of the schema. Pretty much we're doing string replace to derive the table name from ($index/$type). You seem to be using '_' as a separator character. Are we sure that people will never use it in an index or type name? Separator characters often cause problems. In Elastic the (index/type) pair is guaranteed to be unique, therefore "${index}_${type}" will also be unique (as a string). This is only necessary when we have several types per index. A valid question is whether the user should be allowed such flexibility. On Fri, Jun 29, 2018 at 2:19 PM Julian Hyde wrote: [...]
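Andrei's ${index}/${type} proposal boils down to plain placeholder substitution when deriving the Calcite table name. A minimal sketch under that assumption (the `EsTableNames` class and `tableName` method are hypothetical; only the "table_name" templates come from the thread):

```java
/** Sketch of the proposed "${index}" / "${type}" table-name templating. */
public class EsTableNames {

  /** Derives a Calcite table name from an ES (index, type) pair and a template. */
  public static String tableName(String template, String index, String type) {
    return template.replace("${index}", index).replace("${type}", type);
  }

  public static void main(String[] args) {
    // "table_name": "${index}_${type}" -- most generic, multiple types per index
    System.out.println(tableName("${index}_${type}", "orders", "order"));
    // Julian's ambiguity example: index "x_y" / type "z" and index "x" / type "y_z"
    // both yield "x_y_z", which is why a user-chosen separator char was suggested.
    System.out.println(tableName("${index}_${type}", "x_y", "z"));
    System.out.println(tableName("${index}_${type}", "x", "y_z"));
  }
}
```

The collision in the last two calls is exactly the reason a configurable separator (or a simple mode parameter instead of free-form templates) was proposed.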
Re: [VOTE] [CALCITE-490] Remove "End File.java" comments from the end of files
+1

On 10.09.2018 at 10:49, Vladimir Sitnikov wrote: Hi, Could we have a vote on coding style regarding "// End File.java" in Calcite source files? "//End..." comments clutter Git history, they consume screen space, and they slow down development by causing Checkstyle violations. Could we vote on the style regarding "//End File.java": [ ] +1, remove the ending "// End File.java" comment, ensure files end with a single blank line [ ] -1, keep "// End File.java" since ... https://www.apache.org/foundation/voting.html#votes-on-code-modification Whole numbers are recommended for this type of vote, as the opinion being expressed is Boolean: 'I approve/do not approve of this change.' ... A veto without a justification is invalid and has no weight. Note: back in 2014 Julian mentioned 5 thoughts which are questionable. Julian> And it helps ensure that the file ends with a line ending. This is just false. The comment line has nothing to do with line endings. A rule of "file must end with a single blank line" can be verified without a dedicated comment line. Julian> It indicates the name of the file (useful if the file has been renamed). If a file is renamed, then the ending comment HAS to be updated, otherwise Checkstyle fails the build. In other words, the comment always matches the "current" file name, thus it cannot help to detect renames. Julian> The comment makes it easy to see that you are looking at the last page of a file in an editor. This is true, however it sounds like a very rare use-case to me. Editors often show the file name on their own (tab name, application title, navigation gutter). Of course, certain UI elements in the IDE can be disabled, however I still think the IDE is much better at telling you which file you are looking at than a comment in the very last line of the file. If we follow that logic, should we add a "filename comment" after every 25 lines so one always knows which file is open? That is just nonsense.
Julian> It verifies that the file has not been truncated. If the file is truncated, the compiler will bail out. A unit test could detect that as well. On the other hand, the ending comment does not protect from an accidental cut in the middle of the file. Julian> and provides a buffer against truncation issues. I don't get it; however, it mentions truncation, thus I assume that is not a valid justification. Truncation is to be managed by Git / code review / compiler / test suite. Vladimir
Re: scrolling in ElasticSearch adapter
Hey, not sure if this should be an SQL keyword. JDBC specifies various constants that can be used at statement creation time: https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html Not sure though if or how these configurations are accessible for data stores or dialects, but IMO using these would be the proper way. Regards, Christian

On 24.10.2018 at 18:44, Andrei Sereda wrote: Hello, I was thinking about adding [scrolling functionality](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html) to the Elasticsearch adapter. Since scrolling has a non-negligible effect on the cluster, it should be selectively enabled on a per-query basis. So, likely, the user has to explicitly set a "scroll flag" somewhere. The most natural way seems to be in SQL. The [Calcite SQL grammar](https://calcite.apache.org/docs/reference.html) has a `SCROLL` keyword (unused to my knowledge). There were also discussions about adding hints to Calcite.

### Examples

```sql
-- special sql keyword ?
SCROLL select * from elastic;

-- assuming hints are available in calcite
/* HINT: scroll */ select * from elastic;
```

What do people think about this use-case? Are there better ideas? Regards, Andrei.
Re: scrolling in ElasticSearch adapter
In JDBC one can configure a fetch size which reflects the number of rows to be fetched initially, but also subsequently: https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int) According to what you are writing, ES behavior is what TYPE_SCROLL_INSENSITIVE would do, i.e. provide a snapshot view that isn't affected by changes. IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2, R3, R4, ... and view R1, R2, then R3 is deleted and you fetch the next rows, you wouldn't see R3. According to the JDBC spec (https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int)) you don't have to support all modes. Usually, user code doesn't use scrolling that much, but at least forward scrolling makes sense.

On 24.10.2018 at 21:38, Andrei Sereda wrote: Hi Julian, Scrolling (in Elastic) does not only mean "open a cursor" but also iterating over a consistent snapshot. From the docs: The results that are returned from a scroll request reflect the state of the index at the time that the initial search request was made, like a snapshot in time. Subsequent changes to documents (index, update or delete) will only affect later search requests. So pagination (fetch / offset) can't exactly replicate this functionality. The problem with scrolling (in Elastic) is that it is expensive and can't (shouldn't) be enabled by default. There is one more "issue". Currently select * from elastic returns at most 10 rows (in Calcite). This is consistent with Elastic behaviour, which limits the result set to 10 documents (unless size is specified). When returning a cursor (e.g. using JDBC TYPE_SCROLL_SENSITIVE <https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE> or an SQL hint), does it mean returning the whole Elastic index? I'm not at ease with returning different results based on hints or cursor settings. Andrei.
On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde wrote: It seems to me that Elasticsearch scroll means returning a cursor: a collection of rows that you iterate over, and you may not read all of them. This is the default operation of JDBC. So, I guess we need to give the user a way to signal their intent to read all rows versus only the first few. Oracle's FIRST_ROWS and ALL_ROWS hints [1] seem close to this. We would want the hints to be acted upon by both the optimizer and the JDBC transport. Related is pagination. SQL has FETCH and OFFSET, which allow you to retrieve different pieces of a large result set in separate statements or (using query parameters) executions. It would be useful if the server could be given a hint to cache a statement across page requests. Julian [1] https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924 On Oct 24, 2018, at 11:19 AM, Christian Beikov wrote: [...]
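Christian's suggestion maps ES scroll semantics onto the standard JDBC result-set types, since a scroll iterates a point-in-time snapshot, which matches TYPE_SCROLL_INSENSITIVE. A small decision helper sketching that mapping (the `ScrollDecision` class and `useScroll` method are hypothetical; the `ResultSet` constants are the real JDBC API):

```java
import java.sql.ResultSet;

/** Sketch: decide whether an ES query should use a scroll context,
 *  based on the JDBC result-set type requested at statement creation. */
public class ScrollDecision {

  /**
   * ES scrolling returns results reflecting a snapshot of the index,
   * which is what TYPE_SCROLL_INSENSITIVE promises; plain forward-only
   * iteration needs no scroll context (ES returns at most `size` docs).
   */
  public static boolean useScroll(int resultSetType) {
    return resultSetType == ResultSet.TYPE_SCROLL_INSENSITIVE;
  }

  public static void main(String[] args) {
    System.out.println(useScroll(ResultSet.TYPE_FORWARD_ONLY));       // false
    System.out.println(useScroll(ResultSet.TYPE_SCROLL_INSENSITIVE)); // true
  }
}
```

How to thread the statement's resultSetType and fetch size down to the RelNodes (or to non-JDBC callers using RelBuilder) is exactly the open question raised later in the thread.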
Re: scrolling in ElasticSearch adapter
Hey Andrei, I don't have an answer for how you can access these settings from within the adapter, nor how one could do that via RelNodes, but the suggestion to use DataContext for that purpose sounds reasonable. Maybe someone else has an idea? Anyway, since these are settings that don't affect the general semantics of the query/statement and also usually require a special API to be used, I'd rather see these aspects not end up in the query string.

On 25.10.2018 at 02:15, Andrei Sereda wrote: Christian, I like TYPE_SCROLL_INSENSITIVE / fetchSize in PreparedStatement generally but have some reservations (questions): How to pass resultSetType / fetchSize from PreparedStatement to RelNodes? What if the user doesn't use JDBC (e.g. RelBuilders)? On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov wrote: [...]
Re: [DISCUSS] Move gitbox notification emails to another list?
Hey, yeah, just make those emails stop please. They are pretty annoying.

On 27.02.2019 at 11:13, Francis Chuang wrote: Hey all, I wanted to gauge your opinions regarding the gitbox emails being sent to the dev@ list. I am finding these emails to be quite noisy (there is an email every time there is activity in our GitHub repos). I end up deleting most of these emails without reading them and I feel that there is a lot of noise compared to signal. How do you guys feel about moving these to another list (for example git...@calcite.apache.org)? As Vladimir mentioned in another thread, these emails are quite useful and important as they serve as a searchable archive of activity in our GitHub repos. If we do move the emails to a different list, what do we do with the emails that have been sent to dev@? Do we nominate someone to forward a copy to the new list? Francis
EnumerableTableScan array/multiset handling
Hello, in my recent experiments I ran into some issues when trying to unnest an array of structs. The query is roughly this: select t.id, e.value1 from MyTable t, unnest(t.structArray) e

EnumerableTableScan#fieldExpression will then try to generate code that converts the value of the "structArray" column to a List, which is where the problems start to arise. This code does not seem to be tested at all, because it currently generates a compile error due to a missing cast to "Iterable". It also assumes the data is already available in the JavaRowFormat.CUSTOM representation, but AFAIU, it could be in any format. When using RelRecordType for structs, regular struct columns seem to expect JavaRowFormat.ARRAY, but struct arrays don't seem to behave the same way. What is the expected data format that an enumerator should return for struct arrays that are typed as RelRecordType? To support formats per type, it might be nice to allow specifying the JavaRowFormat on RelDataType. Wdyt? Also, is there a way to allow working with custom Java types for table/struct rows? From looking at AbstractCursor#createAccessor, it seems the Avatica code currently only works with classes that expose public fields. Regards, Christian
Re: EnumerableTableScan array/multiset handling
Hi Julian, it seems to me that org.apache.calcite.rel.type.RelDataTypeFactoryImpl.JavaType, which is a subtype of RelDataType, would be the best place to model this. How about I add an Accessor contract and a field to JavaType to take care of producing expressions for the enumerable adapter? By default, this will use an implementation based on public fields or record accessors, depending on the class type. In addition, I would also add an implementation that works by calling getter methods, since that is what I need for my purposes. Wdyt? Regards, Christian

On 31.05.2024 at 03:02, Julian Hyde wrote: Thanks for doing these experiments, Christian, and documenting what you found. I think you're running into the limitations of ReflectiveSchema. It works with POJOs (Java classes with public fields) but hasn't been tested for richer variations and therefore just doesn't work. In many cases, it can be fixed (see https://issues.apache.org/jira/browse/CALCITE-6244 for example). I'm uneasy about extending RelDataType to return JavaRowFormat. That seems to be introducing a physical property into a logical data type; also, it is surfacing the details of one particular adapter (enumerable). Maybe there is a way to surface this information via annotations or the java.sql.Wrapper interface without extending RelDataType. Julian On May 24, 2024, at 11:33 AM, Christian Beikov wrote: [...]
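For context, the row shape that ReflectiveSchema and the Avatica accessor code are known to handle is a POJO with public fields. A hypothetical sketch of such a shape for the example query (all names come from the query above; whether struct arrays in this form work end-to-end is exactly the open question of the thread):

```java
/** POJO table shape with public fields, the representation the
 *  reflective/enumerable machinery is known to handle. */
public class MyTable {
  public int id;
  public Entry[] structArray;   // the "array of struct" column from the example

  /** Nested struct element; again only public fields. */
  public static class Entry {
    public String value1;
  }
}
```

Getter-based classes (the case Christian wants to support via a JavaType Accessor contract) do not fit this shape, which is why AbstractCursor#createAccessor fails for them.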
Re: Support for mv rewrite between different join types
Hi, I'd be very curious to see this being implemented. I wanted to tackle this in the past as well, but was unable to properly understand the papers in that area and hence failed to implement it properly. Nice to see it being picked up again. Regards, Christian

On 13.08.2024 at 18:50, suibianwanwan wrote: Hello, everyone. As the use of materialized views becomes more and more widespread, there will presumably be more and more users of materialized view rewriting. Engines such as Oracle and StarRocks have, on top of SPJG rewriting, gradually added support for rewriting between different types of join, i.e. SPOJG (O means outer join). I think we can build on UnifyRule by adding rewrites between different join types. I actually opened a PR about this about a year ago when I was new to Calcite and didn't know enough about it then, but now I think I'm ready to start over, and before I do so again, I'd like to hear your thoughts and have a discussion! In addition, I just joined the dev list and apologize for asking questions and holding discussions in Jira issues in the past without understanding the community rules. Thank you all for your help! Thank you, suibianwanwan
[jira] [Created] (CALCITE-3810) Implement Rel-to-Sql translation for ANTI and SEMI join
Christian Beikov created CALCITE-3810: - Summary: Implement Rel-to-Sql translation for ANTI and SEMI join Key: CALCITE-3810 URL: https://issues.apache.org/jira/browse/CALCITE-3810 Project: Calcite Issue Type: Improvement Components: core Affects Versions: 1.21.0 Reporter: Christian Beikov Currently, when constructing an ANTI or SEMI join in the relational algebra, it is not possible to render it to a SQL AST. -- This message was sent by Atlassian Jira (v8.3.4#803005)
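One plausible translation target, since standard SQL has no SEMI/ANTI join syntax, is EXISTS / NOT EXISTS subqueries. A sketch of that output shape (the `SemiAntiSql` helper and all table/column names are made up for illustration; this is not the RelToSqlConverter API):

```java
/** Sketch: the SQL shape a SEMI or ANTI join could be rendered to. */
public class SemiAntiSql {

  public enum JoinType { SEMI, ANTI }

  /** SEMI keeps left rows with a match on the right; ANTI keeps those without. */
  public static String render(JoinType type, String left, String right, String condition) {
    String op = (type == JoinType.SEMI) ? "EXISTS" : "NOT EXISTS";
    return "SELECT * FROM " + left + " WHERE " + op
        + " (SELECT 1 FROM " + right + " WHERE " + condition + ")";
  }

  public static void main(String[] args) {
    System.out.println(render(JoinType.SEMI, "emp", "dept", "emp.deptno = dept.deptno"));
    System.out.println(render(JoinType.ANTI, "emp", "dept", "emp.deptno = dept.deptno"));
  }
}
```

Dialects with native syntax (e.g. Hive's LEFT SEMI JOIN) could override this shape per dialect.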
[jira] [Created] (CALCITE-3840) Re-aliasing of VALUES that has column aliases produces wrong SQL in the JDBC adapter
Christian Beikov created CALCITE-3840: - Summary: Re-aliasing of VALUES that has column aliases produces wrong SQL in the JDBC adapter Key: CALCITE-3840 URL: https://issues.apache.org/jira/browse/CALCITE-3840 Project: Calcite Issue Type: Bug Components: jdbc-adapter Affects Versions: 1.21.0 Reporter: Christian Beikov Rendering a VALUES RelNode to e.g. PostgreSQL will produce {{FROM (VALUES((1))) AS t(col_alias)}} where "t" is a static alias. When e.g. joining with such a VALUES, the RelToSqlConverter tries to re-alias this with a unique alias, but fails because it produces {{FROM (VALUES((1))) AS t(col_alias) AS newAlias}}. The fix is to replace the static table alias instead.
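The described fix can be illustrated on plain SQL strings (purely a sketch; the actual fix operates on the SqlNode tree, and the `ValuesRealias` helper is hypothetical):

```java
/** Sketch: re-aliasing a VALUES clause that carries column aliases. */
public class ValuesRealias {

  /** Buggy behavior: appending yields two AS clauses, which is invalid SQL. */
  public static String appendAlias(String sql, String alias) {
    return sql + " AS " + alias;
  }

  /** Proposed fix: replace the static table alias, keeping the column aliases. */
  public static String replaceAlias(String sql, String oldAlias, String newAlias) {
    return sql.replace(" AS " + oldAlias + "(", " AS " + newAlias + "(");
  }

  public static void main(String[] args) {
    String values = "(VALUES ((1))) AS t(col_alias)";
    System.out.println(appendAlias(values, "t0"));        // invalid: two AS clauses
    System.out.println(replaceAlias(values, "t", "t0"));  // valid: alias replaced
  }
}
```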
[jira] [Created] (CALCITE-1940) Implement dialect specific support for sequences
Christian Beikov created CALCITE-1940:

Summary: Implement dialect specific support for sequences
Key: CALCITE-1940
URL: https://issues.apache.org/jira/browse/CALCITE-1940
Project: Calcite
Issue Type: Improvement
Reporter: Christian Beikov
Assignee: Julian Hyde

The Calcite parser and validator already support sequences, but the push-down to the JDBC level is currently limited. Since sequences are not supported on all DBMS, every SQL dialect should have the possibility to render its own way of extracting sequence information or incrementing the value.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
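The dialect differences in question can be sketched as follows. The rendered syntaxes are the documented forms for PostgreSQL, Oracle, and SQL Server (2012+); the dispatch structure itself is only illustrative of what a per-dialect hook could produce.

```java
// Sketch: per-dialect rendering of "next value" for a sequence.
class SequenceSql {
    enum Dialect { POSTGRESQL, ORACLE, SQL_SERVER }

    static String nextValue(Dialect dialect, String sequence) {
        switch (dialect) {
            case POSTGRESQL: return "nextval('" + sequence + "')";       // function call
            case ORACLE:     return sequence + ".NEXTVAL";               // pseudo-column
            case SQL_SERVER: return "NEXT VALUE FOR " + sequence;        // ANSI-style syntax
            default: throw new IllegalArgumentException("unknown dialect");
        }
    }
}
```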
[jira] [Created] (CALCITE-1965) Support outer joins for materialized views
Christian Beikov created CALCITE-1965:

Summary: Support outer joins for materialized views
Key: CALCITE-1965
URL: https://issues.apache.org/jira/browse/CALCITE-1965
Project: Calcite
Issue Type: Improvement
Components: core
Reporter: Christian Beikov
Assignee: Julian Hyde

Currently, only inner joins are supported for materialized view substitutions. Supporting outer joins involves creating new pulled-up predicates that represent the semantics of the join. For a join predicate like "a.id = b.id", an inner join just pulls up that predicate. For a left join like {{select * from a left join b on a.id = b.id}}, the actual pulled-up predicate would be {{OR(=(a.id, b.id), ISNULL(b.id))}}. For a right join it would be {{OR(=(a.id, b.id), ISNULL(a.id))}}, and for a full outer join it would be {{OR(=(a.id, b.id), ISNULL(a.id), ISNULL(b.id))}}.
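The predicate construction described above can be sketched like this, working on predicate strings rather than RexNodes for brevity (a real implementation would build the disjunction with RexBuilder):

```java
// Sketch: build the pulled-up predicate for a join condition like
// "a.id = b.id" depending on the join type. String-based for brevity;
// a real version would construct RexNodes instead.
class PulledUpPredicate {
    enum JoinType { INNER, LEFT, RIGHT, FULL }

    static String pullUp(JoinType type, String eq, String leftKey, String rightKey) {
        switch (type) {
            case INNER: return eq;                                        // condition holds as-is
            case LEFT:  return "OR(" + eq + ", ISNULL(" + rightKey + "))"; // right side may be null-extended
            case RIGHT: return "OR(" + eq + ", ISNULL(" + leftKey + "))";  // left side may be null-extended
            case FULL:  return "OR(" + eq + ", ISNULL(" + leftKey + "), ISNULL(" + rightKey + "))";
            default: throw new IllegalArgumentException("unknown join type");
        }
    }
}
```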
[jira] [Created] (CALCITE-1966) Support for normal views to act as materialization table
Christian Beikov created CALCITE-1966:

Summary: Support for normal views to act as materialization table
Key: CALCITE-1966
URL: https://issues.apache.org/jira/browse/CALCITE-1966
Project: Calcite
Issue Type: Improvement
Components: core
Reporter: Christian Beikov
Assignee: Julian Hyde

Currently, it seems a materialized view can only have its materialization in a real table, although it might be nice to be able to use a view instead. Since external data stores like e.g. Elasticsearch might have different data representations, the mismatch has to be overcome by doing casts in a view.
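The kind of adapting view described above could be generated along these lines; all view, table, and column names in the usage are hypothetical, and the sketch only shows the cast-per-column idea.

```java
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: generate an adapting view that casts columns of an external
// table (e.g. an Elasticsearch-backed one) to the types the
// materialization is declared with.
class CastView {
    static String createView(String viewName, String table, Map<String, String> columnTypes) {
        // One CAST(col AS type) AS col projection per column.
        String cols = columnTypes.entrySet().stream()
            .map(e -> "CAST(" + e.getKey() + " AS " + e.getValue() + ") AS " + e.getKey())
            .collect(Collectors.joining(", "));
        return "CREATE VIEW " + viewName + " AS SELECT " + cols + " FROM " + table;
    }
}
```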
[jira] [Created] (CALCITE-1967) Add support for elastic search 5
Christian Beikov created CALCITE-1967:

Summary: Add support for elastic search 5
Key: CALCITE-1967
URL: https://issues.apache.org/jira/browse/CALCITE-1967
Project: Calcite
Issue Type: Improvement
Reporter: Christian Beikov
Assignee: Julian Hyde

The Elasticsearch adapter seems to target versions before 5.x, but it would be nice to also have an adapter for ES 5+, so I created one that is quite similar to the existing adapter.
[jira] [Created] (CALCITE-1971) Use embedded elastic search in tests
Christian Beikov created CALCITE-1971:

Summary: Use embedded elastic search in tests
Key: CALCITE-1971
URL: https://issues.apache.org/jira/browse/CALCITE-1971
Project: Calcite
Issue Type: Improvement
Reporter: Christian Beikov
Assignee: Julian Hyde

As discussed in CALCITE-1967, we should use an embedded Elasticsearch server instead, to be able to run integration tests locally.