MaterializationService.THREAD_INSTANCE vs INSTANCE

2018-09-26 Thread Luis Fernando Kauer
Hi.

I'm trying to use Calcite to optimize queries rewriting the queries to use 
materialized views.

First, I found the following doc page about materialization on the Calcite website 
by googling, but I couldn't find a link to it from Calcite's website. Anyway, it 
does not explain how to add materialized views to the schema.

http://calcite.apache.org/docs/materialized_views

I'm trying to follow how the Cassandra Adapter does it.  It uses the Hook.TRIMMED 
hook to add the materialized views.
Is that the recommended approach?

I noticed that MaterializationService.instance() first tries to get an instance 
from a ThreadLocal field (THREAD_INSTANCE) and, if that is null, falls back to the 
static field (INSTANCE). The comments in the class say that the ThreadLocal 
instance is supposed to be used for testing.

However, the THREAD_INSTANCE field is initialized with 
ThreadLocal.withInitial(MaterializationService::new), so it seems that it will 
never be null, making the instance() method always return the THREAD_INSTANCE 
instance.
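For reference, the relevant code looks roughly like the sketch below (paraphrased 
from memory, not the exact Calcite source):

private static final MaterializationService INSTANCE = new MaterializationService();

/** Used for testing. */
private static final ThreadLocal<MaterializationService> THREAD_INSTANCE =
    ThreadLocal.withInitial(MaterializationService::new);

public static MaterializationService instance() {
  final MaterializationService materializationService = THREAD_INSTANCE.get();
  if (materializationService != null) {
    return materializationService;
  }
  return INSTANCE;  // never reached, because withInitial never yields null
}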

The problem is that I'm running the statements in worker threads and sometimes 
the materialized view is used and sometimes not. I think the problem is that 
the materialization is not available when running in a different thread.

Is that a bug?

Any tips on how to add materialized views to an AbstractSchema?

Regards,

Luis Fernando


Benchmarking Calcite - starting the conversation on the targets and design of the benchmark

2018-02-04 Thread Luis Fernando Kauer
I'd like to share the following link that I came across some time ago about a 
product that uses Calcite to optimize Spark queries. 
https://www.datascience.com/blog/grunion-data-science-tools-query-optimizer-apache-spark





[jira] [Created] (CALCITE-2081) NPE in join of 2 subqueries using window functions

2017-12-08 Thread Luis Fernando Kauer (JIRA)
Luis Fernando Kauer created CALCITE-2081:


 Summary: NPE in join of 2 subqueries using window functions
 Key: CALCITE-2081
 URL: https://issues.apache.org/jira/browse/CALCITE-2081
 Project: Calcite
  Issue Type: Bug
Reporter: Luis Fernando Kauer
Assignee: Julian Hyde


NPE when joining 2 subqueries that use window functions.

Testing with a new test case in JdbcTest:
{code}
  /** Tests for a join of subqueries using Window Functions */
  @Test public void testJoinWithWinAgg() {
    final String sql = "select a.*, b.r from\n"
        + "(select \"deptno\", first_value(\"empid\") over \n"
        + "(partition by \"deptno\" order by \"commission\") as r\n"
        + "from \"hr\".\"emps\") a\n"
        + "left join\n"
        + "(select \"deptno\", last_value(\"empid\") over \n"
        + "(partition by \"deptno\" order by \"commission\") as r\n"
        + "from \"hr\".\"emps\") b\n"
        + "on a.\"deptno\" = b.\"deptno\"";
    CalciteAssert.hr()
        .query(sql)
        .runs();
  }
{code}

Debugging this I found out that the NPE occurs at Expressions.declare (line 
2937) in initializer.getType() because initializer is null.  The statement is
{code}
int prevStart;
{code}
This statement does not have an initializer, but the "declare" method tries to get 
the initializer's type, generating the NPE.
This happens when joining 2 subqueries that use window functions because they 
end up using the same variables.  In BlockBuilder.append (line 124) it checks 
whether the variable already exists, and the problem starts there.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: BindableTableScan is losing aliased named

2017-11-17 Thread Luis Fernando Kauer
Please insert the following test in ScannableTableTest and run:
   /** ProjectableFilterableTable project push down using alias. */
  @Test public void testPFPushDownWithAlias() throws Exception {
    final StringBuilder buf = new StringBuilder();
    final Table table = new BeatlesProjectableFilterableTable(buf, true);
    CalciteAssert.that()
        .with(newSchema("s", "beatles", table))
        .query("select \"i\" as theKey from \"s\".\"beatles\"")
        .explainContains("EnumerableInterpreter\n"
            + "  BindableTableScan(table=[[s, beatles]], projects=[[0]])\n")
        .returnsUnordered("THEKEY=4",
            "THEKEY=4",
            "THEKEY=6",
            "THEKEY=5");  }
It passes for me.
You'll need the updated version of this test class, because it was recently 
refactored.
It's usually easier for other people to test the problem if you create runnable 
test cases showing it.

On Friday, November 17, 2017 14:29:09 BRST, Enrico Olivelli <eolive...@gmail.com> wrote:
 
 In the RowType I have 'k1' and not "thekey"

Enrico

2017-11-17 17:20 GMT+01:00 Luis Fernando Kauer <lfka...@yahoo.com.br.invalid
>:

>  Did you execute the query?
> ProjectRemoveRule removes the Project when it is "trivial".  Since the
> only used project was pushed to BindableTableScan, the Project would only
> set the alias, but that can be done in row type.
> The result is correct because RowType is preserved with the alias.  It
> just does not show in the plan.
> However, it seems that repeating a column with different aliases generates
> an error:
> SELECT k1 theKey, k1 theKey2 FROM tblspace1.tsql where k1 ='mykey2'
>
> Regards,
> Luis Fernando
>
>    On Friday, November 17, 2017 12:27:49 BRST, Enrico Olivelli <
> eolive...@gmail.com> wrote:
>
>  Hi,
> I have a ProjectableFilterableTable, it seems to me that the BindableTableScan
> RelNode does not keep track of column name aliases in the original
> projection
>
> Example:
>
> Query:SELECT k1 theKey FROM tblspace1.tsql where k1 ='mykey2'
>
> -- Logical Plan
> LogicalProject(THEKEY=[$0])
>  LogicalFilter(condition=[=($0, 'mykey2')])
>    EnumerableTableScan(table=[[tblspace1, tsql]])
>
> -- Best Plan
> EnumerableInterpreter
>  BindableTableScan(table=[[tblspace1, tsql]], filters=[[=($0,
> 'mykey2')]],
> projects=[[0]])
>
> Is this correct ?
> IMHO it would be better to keep something like an EnumerableProjection which
> renames the fields
>
> Am I missing something ?
> Thanks
> Enrico
>
>
  

Re: BindableTableScan is losing aliased named

2017-11-17 Thread Luis Fernando Kauer
 Did you execute the query?
ProjectRemoveRule removes the Project when it is "trivial".  Since the only 
used project was pushed to BindableTableScan, the Project would only set the 
alias, but that can be done in row type.
The result is correct because RowType is preserved with the alias.  It just 
does not show in the plan.
However, it seems that repeating a column with different aliases generates an 
error:
SELECT k1 theKey, k1 theKey2 FROM tblspace1.tsql where k1 ='mykey2'

Regards,
Luis Fernando

On Friday, November 17, 2017 12:27:49 BRST, Enrico Olivelli wrote:
 
 Hi,
I have a ProjectableFilterableTable, it seems to me that the BindableTableScan
RelNode does not keep track of column name aliases in the original
projection

Example:

Query:SELECT k1 theKey FROM tblspace1.tsql where k1 ='mykey2'

-- Logical Plan
LogicalProject(THEKEY=[$0])
  LogicalFilter(condition=[=($0, 'mykey2')])
    EnumerableTableScan(table=[[tblspace1, tsql]])

-- Best Plan
EnumerableInterpreter
  BindableTableScan(table=[[tblspace1, tsql]], filters=[[=($0, 'mykey2')]],
projects=[[0]])

Is this correct ?
IMHO it would be better to keep something like an EnumerableProjection which
renames the fields

Am I missing something ?
Thanks
Enrico
  

Re: Project + Sort on single and on multiple columns

2017-11-14 Thread Luis Fernando Kauer
There is something I don't get about Sort in Calcite.  Actually, it seems to 
happen already when parsing the SQL. 
Sort does not limit the projects, but it requires that the projects used for 
ordering are included in its input.
How can a Sort be the top node when there are projects used for ordering that 
are not in the select list? I thought there should always be a top project node 
selecting the projects to return when it needs to restrict or change from the 
previous node.  The problem is that the lack of a Project node on top 
restricting the projects will make adapters return more columns than are 
actually necessary.


Example 1:
>> SELECT empno FROM scott.emp ORDER BY ename

  Logical Plan:
LogicalSort(sort0=[$1], dir0=[ASC])
  LogicalProject(EMPNO=[$0], ENAME=[$1])
    JdbcTableScan(table=[[SCOTT, EMP]])

  Physical Plan:
 JdbcToEnumerableConverter
  JdbcProject(ENAME=[$1], EMPNO=[$0])
    JdbcSort(sort0=[$0], dir0=[ASC])
      JdbcTableScan(table=[[SCOTT, EMP]])


 The generated SQL to query the database is:
 SELECT ENAME, EMPNO FROM (SELECT * FROM SCOTT.EMP ORDER BY EMPNO NULLS LAST) AS t


---

Example 2:
> SELECT ename FROM scott.emp ORDER BY empno limit 10

  Logical Plan:
LogicalSort(sort0=[$1], dir0=[ASC], fetch=[10])
  LogicalProject(EMPNO=[$0], ENAME=[$1])
    JdbcTableScan(table=[[SCOTT, EMP]])


  Physical Plan:
 JdbcToEnumerableConverter
  JdbcProject(EMPNO=[$0], ENAME=[$1])
    JdbcSort(sort0=[$1], dir0=[ASC], fetch=[10])
      JdbcTableScan(table=[[SCOTT, EMP]])


 The generated SQL to query the database is:

 SELECT EMPNO, ENAME FROM (SELECT * FROM SCOTT.EMP ORDER BY ENAME NULLS LAST 
FETCH NEXT 10 ROWS ONLY) AS t0



I'm facing 3 main problems:
1. There is no Project node above Sort limiting the projects, even in the Logical Plan.
2. RelToSqlConverter does not work correctly with Sort.
3. Sometimes JdbcProject is above JdbcSort and other times it is below.

Can anybody help me figure this out?

On Saturday, November 11, 2017 09:35:40 BRST, Christian Tzolov 
<ctzo...@pivotal.io> wrote:
 
 @Julian, this is exactly what I would expect as logical behavior.

In fact it (consistently) behaves like this (e.g. Sort is after Project)
when the Sort is on a single column! But for some reason, if the Sort is
performed on two columns (the same ones used in the Project), then the Sort is
performed before the Project!

This would be expected if the Sort columns were not contained in the
Project. So I wonder whether, somewhere during the Sort optimization,
this column check gets confused?

On 11 November 2017 at 03:05, Julian Hyde <jh...@apache.org> wrote:

> While the cost of Project does not depend heavily on the number of input
> columns, the cost of Sort (or at least a typical Sort algorithm such as
> external merge sort) does depend on the number of columns (or more
> precisely on the average row size in bytes). So, if the Project reduces the
> number of columns (as most Projects do) then the Sort will have lower cost
> if performed after the Project, because it is handling fewer bytes.
>
>
> > On Nov 10, 2017, at 10:32 AM, Luis Fernando Kauer
> <lfka...@yahoo.com.br.INVALID> wrote:
> >
> > I'm trying to fix https://issues.apache.org/jira/browse/CALCITE-1906
> and I'm facing a similar problem.
> > After managing to make JdbcSort to work, sometimes the JdbcProject is
> above JdbcSort and the generated SQL is wrong because RelToSqlConverter
> uses SqlImplementator.Clause enum to decide when to create subqueries, but
> since ORDER_BY is after SELECT, once it gets to JdbcProject it can't use
> the same query because it already used ORDER_BY.
> > The rule responsible for this is SortProjectTransposeRule.  The opposite
> rule is ProjectSortTransposeRule, but this one only matches if the sort
> node is exactly Sort.class, so it ends up not matching.
> > Is pushing the Project above Sort usually a good final plan or is it
> done to allow other rules to match?  If it is not, maybe we should solve
> this in the core project.
> >
> >
> >
> >    On Friday, November 10, 2017 15:57:34 BRST, Michael Mior <
> mm...@uwaterloo.ca> wrote:
> >
> > Since the cost of the project doesn't depend on the number of columns
> being
> > projected or the size of the input, putting the project before or after
> the
> > sort will result in the same estimated cost. One approach would be to
> scale
> > the cost of the projection based on the fraction of columns projected.
> >
> > --
> > Michael Mior
> > mm...@apache.org
> >
> > 2017-11-10 12:42 GMT-05:00 Christian Tzolov <ctzo...@pivotal.io>:
> >
> >> I've observed in my no-sql adapter implementation that for q
> >>

Re: Aliases for schemas

2017-11-14 Thread Luis Fernando Kauer
Hi,
FrameworkConfig has getDefaultSchema(), from javadoc "Returns the default 
schema that should be checked before looking at the root schema.  Returns null 
to only consult the root schema".
So, if you are using Frameworks.newConfigBuilder(), you can set it using 
".defaultSchema(rootSchema.getSubSchema("one"))", for example.
On Tuesday, November 14, 2017 10:59:33 BRST, Enrico Olivelli wrote:
 
 Hi,
(I am playing with the planner, you know)
I have several "schema" in my system, like "one", "two"
So my root schema is built this way:
final SchemaPlus rootSchema = Frameworks.createRootSchema(true);
for (String tableSpace : listTableSpaces()) {
    SchemaPlus schema = rootSchema.add(tableSpace, new AbstractSchema());
}

Queries like
SELECT * from one.mytable
SELECT * from two.mytable

get parsed and validated.

But I would like to set a "default" schema to be used when the query does
not explicitly refer to a schema, like
default-schema='one'
SELECT * from mytable

which I would expect to be parsed as
SELECT * from one.mytable

Is it possible ?


Bonus question, separate from the former: it seems that the parser
complains if I try to use a schema with the name "default", using MYSQL5_5
compatibility for instance. Is there any tweak to make it work?

Calcite is really great!
Enrico
  

Re: Project + Sort on single and on multiple columns

2017-11-10 Thread Luis Fernando Kauer
 I'm trying to fix https://issues.apache.org/jira/browse/CALCITE-1906 and I'm 
facing a similar problem.
After managing to make JdbcSort to work, sometimes the JdbcProject is above 
JdbcSort and the generated SQL is wrong because RelToSqlConverter uses 
SqlImplementator.Clause enum to decide when to create subqueries, but since 
ORDER_BY is after SELECT, once it gets to JdbcProject it can't use the same 
query because it already used ORDER_BY.
The rule responsible for this is SortProjectTransposeRule.  The opposite rule 
is ProjectSortTransposeRule, but this one only matches if the sort node is 
exactly Sort.class, so it ends up not matching.
Is pushing the Project above Sort usually a good final plan or is it done to 
allow other rules to match?  If it is not, maybe we should solve this in the 
core project.



Em sexta-feira, 10 de novembro de 2017 15:57:34 BRST, Michael Mior 
 escreveu:  
 
 Since the cost of the project doesn't depend on the number of columns being
projected or the size of the input, putting the project before or after the
sort will result in the same estimated cost. One approach would be to scale
the cost of the projection based on the fraction of columns projected.

--
Michael Mior
mm...@apache.org

2017-11-10 12:42 GMT-05:00 Christian Tzolov :

> I've observed in my no-sql adapter implementation that for queries with
> Project + Sort by ONE column the Project is pushed (as expected) before the
> Sort, but for Sort on MULTIPLE columns the Sort is before the Project.
> For example, for a query with one sort column:
>
> SELECT yearPublished FROM BookMaster ORDER BY yearPublished ASC
>
> The plan looks like expected (project before the sort):
>
> PLAN=GeodeToEnumerableConverterRel
>   *GeodeSortRel*(sort0=[$0], dir0=[ASC])
>     GeodeProjectRel(yearPublished=[$2])
>       GeodeTableScanRel(table=[[TEST, BookMaster]])
>
> But for sort with two columns:
>
> SELECT yearPublished, itemNumber from BookMaster ORDER BY yearPublished
> ASC, itemNumber ASC
>
> the plan is:
>
> PLAN=GeodeToEnumerableConverterRel
>   GeodeProjectRel(yearPublished=[$2], itemNumber=[$0])
>     *GeodeSortRel*(sort0=[$2], sort1=[$0], dir0=[ASC], dir1=[ASC])
>       GeodeTableScanRel(table=[[TEST, BookMaster]])
>
> I'm not sure I can explain why in the second case the Sort appears before the Project?
> Here are my cost functions:
>
> * GeodeSortRel:
> https://github.com/tzolov/calcite/blob/geode-1.3/geode/
> src/main/java/org/apache/calcite/adapter/geode/rel/GeodeSortRel.java#L51
>
> * GeodeProjectRel:
> https://github.com/tzolov/calcite/blob/4a631d9055340f64f5e644454551f9
> 64ea08f9de/geode/src/main/java/org/apache/calcite/adapter/geode/rel/
> GeodeProjectRel.java#L52
>
> Cheers,
> Christian
>  

Re: Recycling Planner objects

2017-11-10 Thread Luis Fernando Kauer
Sorry,
My knowledge hasn't reached that far.
Currently the tutorial and example code is mostly for creating new adapters.
I think we need tutorial, documentation and example code to work directly with 
the planner, because it is a lot harder.
Regards,
Luis Fernando
   On Thursday, November 9, 2017 06:52:26 BRST, Enrico Olivelli wrote:
 
 I am going to use Calcite Planner inside my DBMS.
I would like to understand which objects can be reused and which objects
are treated in a dynamic fashion.

Planner object: is this cacheable? I see it is not stateless. Is it better
to always create a new instance or could it be recycled?

Table statistics:
are they evaluated every time a query is to be planned?
at least the table size

Schemas:
in a live system the schema changes, but as far as I have understood the Schema
is part of the Framework config.
Which is the best strategy? To drop the current framework config and all
cached objects in case of a DDL command?

Execution Plans:
Are RelNodes cacheable? I see they have strong references to the planner
which built the object.
I am looking for a cache like QUERY -> PhysicalPlan

Regards
Enrico
  

Re: Volcano planner

2017-11-08 Thread Luis Fernando Kauer
 If you intend to run a query then you should follow the tutorial and try to 
change the csv adapter.  You can add the table to the schema at runtime using 
something like:
-

Class.forName("org.apache.calcite.jdbc.Driver");
Properties info = new Properties();
info.setProperty("lex", "MYSQL_ANSI");

final Connection connection = DriverManager.getConnection("jdbc:calcite:", 
info);
CalciteConnection conn = connection.unwrap(CalciteConnection.class);
SchemaPlus root = conn.getRootSchema();
root.add("MYTABLE", new TableImpl());
Statement statement = conn.createStatement();
ResultSet rs = statement.executeQuery("SELECT * FROM MYTABLE");
-

But if you only want to parse, validate and optimize the query plan, you can 
use something like:
-
    Table table = new TableImpl();
    final SchemaPlus rootSchema = Frameworks.createRootSchema(true);
    SchemaPlus schema = rootSchema.add("x", new AbstractSchema());
    schema.add("MYTABLE", table);
    List<RelTraitDef> traitDefs = new ArrayList<>();
    traitDefs.add(ConventionTraitDef.INSTANCE);
    traitDefs.add(RelCollationTraitDef.INSTANCE);
    SqlParser.Config parserConfig = 
       SqlParser.configBuilder(SqlParser.Config.DEFAULT)
      .setCaseSensitive(false)
      .build();
      
    final FrameworkConfig config = Frameworks.newConfigBuilder()
        .parserConfig(parserConfig)
        .defaultSchema(schema)
        .traitDefs(traitDefs)
        // define the rules you want to apply

        .programs(Programs.ofRules(Programs.RULE_SET))
        .build();
    Planner planner = Frameworks.getPlanner(config);
    SqlNode n = planner.parse(" SELECT * FROM MYTABLE WHERE ID < 10");
    n = planner.validate(n);
    RelNode root = planner.rel(n).project();
    System.out.println(RelOptUtil.dumpPlan("-- Logical Plan", root, 
SqlExplainFormat.TEXT, 
SqlExplainLevel.DIGEST_ATTRIBUTES));
    RelOptCluster cluster = root.getCluster();
    final RelOptPlanner optPlanner = cluster.getPlanner();
    RelTraitSet desiredTraits =
        cluster.traitSet().replace(EnumerableConvention.INSTANCE);
    final RelNode newRoot = optPlanner.changeTraits(root, desiredTraits);
    optPlanner.setRoot(newRoot);
    RelNode bestExp = optPlanner.findBestExp();        
System.out.println(RelOptUtil.dumpPlan("-- Best Plan", bestExp, 
SqlExplainFormat.TEXT, 
SqlExplainLevel.DIGEST_ATTRIBUTES));
 -

The main problem was that you were not setting the desired trait to use 
EnumerableConvention.
You can see that instead of implementing all the interfaces you should use the 
available builders and classes.
Also for implementing Table I think you should extend AbstractTable instead of 
implementing Table interface and you can use Statistics.of instead of 
implementing Statistic interface if it is simple:
-

  private static class TableImpl extends AbstractTable {
    public TableImpl() {}

    @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
      RelDataTypeFactory.Builder builder = new RelDataTypeFactory.Builder(typeFactory);
      return builder.add("id", typeFactory.createSqlType(SqlTypeName.INTEGER))
          .add("name", typeFactory.createSqlType(SqlTypeName.VARCHAR)).build();
    }

    @Override public Statistic getStatistic() {
      return Statistics.of(15D, ImmutableList.of(),
          ImmutableList.of(RelCollations.of(0), RelCollations.of(1)));
    }
  }
-


On Wednesday, November 8, 2017 12:15:34 BRST, Enrico Olivelli wrote:
 
 Hi,
I am playing with the planner but I can't get it to work for a very simple
query.
The table is
 MYTABLE(id integer, name varchar)  (the definition is given in the code
snippet)
the query is "SELECT * FROM MYTABLE"

The error is:
org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node
[rel#7:Subset#0.NONE.[0, 1].any] could not be implemented; planner state:

Root: rel#7:Subset#0.NONE.[0, 1].any
Original rel:
LogicalProject(subset=[rel#6:Subset#1.NONE.[0, 1].any], id=[$0],
name=[$1]): rowcount = 15.0, cumulative cost = {15.0 rows, 30.0 cpu, 0.0
io}, id = 5
  EnumerableTableScan(subset=[rel#4:Subset#0.ENUMERABLE.[0, 1].any],
table=[[default, MYTABLE]]): rowcount = 15.0, cumulative cost = {15.0 rows,
16.0 cpu, 0.0 io}, id = 2

Sets:
Set#0, type: RecordType(INTEGER id, VARCHAR name)
    rel#4:Subset#0.ENUMERABLE.[0, 1].any, best=rel#2, importance=0.9
        rel#2:EnumerableTableScan.ENUMERABLE.[[0, 1]].any(table=[default,
MYTABLE]), rowcount=15.0, cumulative cost={15.0 rows, 16.0 cpu, 0.0 io}
        rel#9:EnumerableProject.ENUMERABLE.[[0,
1]].any(input=rel#4:Subset#0.ENUMERABLE.[0, 

[jira] [Created] (CALCITE-2044) Tweak cost of BindableTableScan to make sure Project is pushed through Aggregate

2017-11-08 Thread Luis Fernando Kauer (JIRA)
Luis Fernando Kauer created CALCITE-2044:


 Summary: Tweak cost of BindableTableScan to make sure Project is 
pushed through Aggregate
 Key: CALCITE-2044
 URL: https://issues.apache.org/jira/browse/CALCITE-2044
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Luis Fernando Kauer
Assignee: Julian Hyde
Priority: Minor


Similar to [CALCITE-1876].
Projects are not pushed to BindableTableScan when using 
ProjectableFilterableTable with aggregate functions.
The reason is that the cost of BindableTableScan does not take projects (and 
filters) into account, so the planner chooses a plan with the Project node removed 
by ProjectRemoveRule.
Tweaking the cost to use the number of used projects solved the problem.
Any suggestion on the cost formula to take both projects and filters into 
account?
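A sketch of the kind of tweak I mean (not a final patch), applied to 
BindableTableScan.computeSelfCost:
{code:java}
@Override public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery mq) {
  // Scale the parent cost by the fraction of columns actually projected;
  // the small constant keeps the cost non-zero when no projects are used.
  final double fraction = (0.01 + projects.size()) / (double) identity().size();
  return super.computeSelfCost(planner, mq).multiplyBy(fraction);
}
{code}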



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: ProjectableFilterableTable.scan() and expensive columns

2017-11-06 Thread Luis Fernando Kauer
 Alexey,

The example of computeSelfCost multiplies the cost by projects.size() / 
identity().size(), where projects.size() is the number of columns to be read 
(used) and identity().size() is the total number of columns of the table. So the 
cost is reduced if fewer columns need to be read instead of reading them all.
You could use projects.size() / table.getRowType().getFieldCount() instead, but 
it's the same thing.
Or you could use something more complex like in DruidQuery.computeSelfCost.
If you plan on using ProjectableFilterableTable then you'll have to fix 
BindableTableScan.computeSelfCost, because that's what is used to calculate the 
cost and currently it does not use the projects size.
But you can use TranslatableTable just like in the CSV Adapter.  The problem with 
not pushing the used projects when using some aggregate functions is also 
solved by implementing computeSelfCost in CsvTableScan:

  @Override public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery mq) {
    // Cast to double so the fraction is not truncated by integer division.
    return super.computeSelfCost(planner, mq)
        .multiplyBy((double) fields.length / table.getRowType().getFieldCount());
  }

And by using TranslatableTable you have full control on how to compute the cost 
to tailor your needs without having to change anything in Calcite, just in your 
adapter.
If you need to push filters down to your table, you'll need to implement that 
through rules.  This can be more advanced because you'll need to learn how to 
create rules but it will be worth it, because you will understand Calcite 
better and be able to solve other problems by yourself.
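For reference, a rule skeleton looks roughly like the sketch below.  The class name 
is a placeholder and the onMatch body is left empty: a real adapter would build its 
own scan node carrying the condition and pass it to transformTo.

import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.rel.core.TableScan;
import org.apache.calcite.rel.logical.LogicalFilter;
import org.apache.calcite.rex.RexNode;

public class MyFilterRule extends RelOptRule {
  public static final MyFilterRule INSTANCE = new MyFilterRule();

  private MyFilterRule() {
    // Match a LogicalFilter sitting directly on top of a table scan.
    super(operand(LogicalFilter.class, operand(TableScan.class, none())), "MyFilterRule");
  }

  @Override public void onMatch(RelOptRuleCall call) {
    final LogicalFilter filter = call.rel(0);
    final TableScan scan = call.rel(1);
    final RexNode condition = filter.getCondition();
    // Build your adapter's own scan node that evaluates 'condition' at the source
    // and replace the matched pair with it: call.transformTo(newScan);
  }
}

You then register it with the planner (for example planner.addRule(MyFilterRule.INSTANCE)) 
so it is fired during optimization.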

On Monday, November 6, 2017 12:09:06 BRST, Alexey Roytman 
<alexey.royt...@oracle.com> wrote:
 
 Hello, Julian.
The changes that belong to the Calcite itself -- definitely, I will 
contribute them. For now I have only javadoc-related in pull request.

I understand that my approach shall be flexible and universal enough, 
and aligned with Calcite's ideology (that I'm not acquainted with).

So, for now my only idea is to have an interface 
ProjectableFilterableTableScanCostEstimator that has a:

@Override public RelOptCost computeScanCost(RelOptPlanner planner, 
RelMetadataQuery mq, final List list, final int[] projected);

And at some moment during VolcanoPlanner.findBestExp() flow, we need to 
check whether a (table.table instanceof 
ProjectableFilterableTableScanCostEstimator) and if yes, then multiply 
by computeScanCost(). Maybe, the right place is somewhere near 
BindableTableScan.computeSelfCost(), but I don't remember whether I have 
an access to table.table there...

Regarding your approach 
(https://issues.apache.org/jira/browse/CALCITE-1933) of 
ExpressionComplexity -- I don't get where shall I have an instance of 
this, and how do I push such a beast to ProjectableFilterableTable 
instance. I.e. I don't understand the idea of data composition here...

- Alexey.

On 11/01/2017 10:37 PM, Julian Hyde wrote:
> Alexey,
>
> Do you intend to contribute your changes back to Calcite? If not, feel free 
> to disregard the following (but also downgrade the amount of free advice that 
> you expect from this list; we all have other priorities).
>
> If you do intend to contribute back, you should keep in mind that some 
> changes will be more acceptable than others. It’s not sufficient that the 
> changes solve your problem; your  changes must not cause problems for other 
> people.
>
> For instance, adding a field to SqlTypeName is probably not good because it 
> doesn’t allow people to customize the cost. Also, when extending a class or 
> interface by adding methods and fields, ask yourself whether it was intended 
> to be extended.
>
> The absolute most useful code you can contribute is a test case that covers 
> all of your requirements and is simple for everyone else to run. Then we can 
> safely refactor your implementation and know that you are still getting what 
> you need.
>
> Are you aware of https://issues.apache.org/jira/browse/CALCITE-1933 ?  There I 
> suggested adding a new kind of metadata called ExpressionComplexity. You could 
> write your own handler that takes into account SqlTypeName or whatever you like.
>
> Julian
>
>
>
>> On Oct 31, 2017, at 10:13 AM, Luis Fernando Kauer 
>> <lfka...@yahoo.com.br.INVALID> wrote:
>>
>> Hi,
>> I

Re: ProjectableFilterableTable.scan() and expensive columns

2017-10-31 Thread Luis Fernando Kauer
 Hi,
 I'm also a newbie and I'm just trying to help with what I've learned so far.
It's all about the rules.  I strongly recommend taking a look into:
https://calcite.apache.org/docs/howto.html#tracing

Especially about setting the logger level to DEBUG.

By doing this you can check which rules are being fired, the costs and chosen 
plan.
As I mentioned in my last reply, the problem seems to be that 
BindableTableScan.computeSelfCost does not use the information about used 
projects and filters to compute the cost of the BindableTableScan.
Many rules are applied that convert the nodes, and at the end the plan with the 
lowest cost is chosen.  Among the generated plans are plans that push the 
projects to the BindableTableScan and others that do not.  The total cost of 
the plan that pushes the projects and the filter is probably the same as or 
higher than the cost of the plan that pushes only the filter and removes the 
projects (through AggregateProjectMergeRule), and the planner ends up choosing 
the latter.
So maybe just by fixing computeSelfCost it solves the problem.
Just for the purpose of testing this, I changed computeSelfCost to:
@Override public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery mq) {
  return super.computeSelfCost(planner, mq)
      .multiplyBy((0.01 + projects.size()) / identity().size());
}
And only by doing this, the planner chose a plan that pushes the projects and 
filters to BindableTableScan when used with aggregation.
You can also override getStatistic() for your implementation of 
ProjectableFilterableTable to provide some statistics to be used in computing 
costs.
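For example, assuming your table extends AbstractTable, something along these lines 
(the 100D row count is just a placeholder, not a value from your data):

  @Override public Statistic getStatistic() {
    // A rough estimate: row count only, no unique keys.
    return Statistics.of(100D, ImmutableList.of());
  }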
Calcite also allows other ways to provide information and statistics but that's 
beyond my knowledge.
Regards,
Luis Fernando


On Tuesday, October 31, 2017 13:38:05 BRST, Alexey Roytman 
<alexey.royt...@oracle.com> wrote:
 
 Sorry for the delay, Luis Fernando.

Please see below, as there are a number of answers.


On 10/26/2017 09:37 PM, Luis Fernando Kauer wrote:
>  I'm sorry, but I have no idea what you are talking about.
> Cassandra Adapter has code to translate the plan to run the query in 
> Cassandra Server.
> If you are only interested in querying CSV files I don't see how copying that 
> code without understanding it will help you.
[Alexey] I need neither the Cassandra nor the CSV adapter. Cassandra was 
mentioned by Julian, so I started investigating it. The CSV files were 
used because this way I can create a working test to share with the 
community to ask questions.

> First of all, you need to decide whether you will use 
> ProjectableFilterableTable or TranslatableTable.
[Alexey] I would like to use ProjectableFilterableTable, but it does a 
poor job for me: starting from a certain level of nesting, it wants all 
projections. And for my task that is too heavy a query: my numeric columns 
are remotely calculated, with a different (and unpredictable) amount of 
work for each column.

[Alexey] So, as Julian mentioned Cassandra's interface, I started with 
it. There was a complication: it has both the Translatable and the 
Queryable interfaces. For the Queryable interface, the RexNodes are 
translated to List (all these translateBinary2() functions etc.) 
and passed via a reflection call to the query. But for me, at the query 
level, I need the RexNodes themselves, and I don't want to parse these 
List back!

[Alexey] So, I've started again, but with the Druid adapter, which uses 
only Translatable interface.

> You must try to understand how the rules work and how to check which rules 
> are being fired and which ones are being chosen.
[Alexey] I do try. And when I reach a certain understanding, then 
obviously I won't ask the kind of newbie questions I'm asking now :-))
> Did you follow the tutorial for creating CSV Adapter? It creates a rule to 
> push the used projects to the table scan. That is a great start.
[Alexey] I did. But even in flavor=translatable it is very simplistic, 
and did not do the job of having both filters and projections at the 
lowest level...

> It's a good idea to take a look at the built in rules available in Calcite 
> too.
> You should take a look into FilterTableScanRule and ProjectTableScanRule, 
> which are the rules that push the projects and filters used with 
> ProjectableFilterableTable into a BindableTableScan, and the other rules int 
> Bindables.java.
[Alexey] I totally agree with you. But when I look at the code there, I 
understand even less than now. This will improve with time, but for now 
this is what I have...
> The rules work fine when there is no aggregate function, pushing both filter 
> and projects into BindableTableScan.  The problem seems to be with 
> AggregateProjectMergeRule which removes the Project from the plan.
> If you remove the filter from your test cases you'll see that the projects 
> are pushed to the BindableTableScan.
> I was able to simulate your problem

Re: ProjectableFilterableTable.scan() and expensive columns

2017-10-26 Thread Luis Fernando Kauer
 BindableTableScan.computeSelfCost does not take into account whether there are 
filters or how many projects are used when computing the cost; it just multiplies 
the cost by 0.01.
Just by changing the cost to take into account the number of used projects over the 
identity (all projects), the planner chooses the plan that pushes projects 
and filter to BindableTableScan, because the overall cost is lower than 
removing the Project by AggregateProjectMergeRule.
Any suggestion or example on a good way to compute the cost taking into account 
the projects and filters?
On Thursday, October 26, 2017 16:38:14 BRST, Luis Fernando Kauer 
<lfka...@yahoo.com.br.INVALID> wrote:
 
  I'm sorry, but I have no idea what you are talking about.
Cassandra Adapter has code to translate the plan to run the query in Cassandra 
Server.
If you are only interested in querying CSV files I don't see how copying that 
code without understanding it will help you.
First of all, you need to decide whether you will use 
ProjectableFilterableTable or TranslatableTable.
You must try to understand how the rules work and how to check which rules are 
being fired and which ones are being chosen.
Did you follow the tutorial for creating CSV Adapter? It creates a rule to push 
the used projects to the table scan. That is a great start.
It's a good idea to take a look at the built in rules available in Calcite too.
You should take a look into FilterTableScanRule and ProjectTableScanRule, which 
are the rules that push the projects and filters used with 
ProjectableFilterableTable into a BindableTableScan, and the other rules in 
Bindables.java.
The rules work fine when there is no aggregate function, pushing both filter 
and projects into BindableTableScan.  The problem seems to be with 
AggregateProjectMergeRule which removes the Project from the plan.
If you remove the filter from your test cases you'll see that the projects are 
pushed to the BindableTableScan.
I was able to simulate your problem using 
ScannableTableTest.testProjectableFilterable2WithProject changing the query 
into "select \"k\", count(*) from (select \"k\",\"j\" from \"s\".\"beatles\" 
where \"i\" = 4) x group by \"k\"".
The plan:
LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
  LogicalProject(k=[$1])
    LogicalFilter(condition=[=($0, 4)])
      LogicalProject(i=[$0], k=[$2])
        LogicalTableScan(table=[[s, beatles]])

PhysicalPlan:
EnumerableAggregate(group=[{2}], EXPR$1=[COUNT()]): rowcount = 10.0, cumulative 
cost = {61.25 rows, 50.0 cpu, 0.0 io}, id = 112
  EnumerableInterpreter: rowcount = 100.0, cumulative cost = {50.0 rows, 50.0 
cpu, 0.0 io}, id = 110
    BindableTableScan(table=[[s, beatles]], filters=[[=($0, 4)]]): rowcount = 
100.0, cumulative cost = {1.0 rows, 1.01 cpu, 0.0 io}, id = 62


If I disable AggregateProjectMergeRule, the physical plan is:
EnumerableAggregate(group=[{0}], EXPR$1=[COUNT()]): rowcount = 10.0, cumulative 
cost = {61.25 rows, 50.0 cpu, 0.0 io}, id = 102
  EnumerableInterpreter: rowcount = 100.0, cumulative cost = {50.0 rows, 50.0 
cpu, 0.0 io}, id = 100
    BindableTableScan(table=[[s, beatles]], filters=[[=($0, 4)]], 
projects=[[2]]): rowcount = 100.0, cumulative cost = {1.0 rows, 1.01 cpu, 0.0 
io}, id = 78


Regards,

Luis Fernando



    On Thursday, October 26, 2017 13:19:46 BRST, Alexey Roytman 
<alexey.royt...@oracle.com> wrote:
 
 Thanks for the hints.

I've tried to use [i.e. copy-pasted a lot of] Cassandra*.java for my 
CSV-files example. It's really too wordy! So much code I need to 
understand later!..

But what bothers me most for now is the fact that I just cannot pass 
List to [my modification of] CassandraTable.query(); I need to 
convert it to some string form within List using 
CassandraFilter.Translator, and then, when passed to [my modification 
of] CassandraTable.query(), I need to parse these List back... 
Is there a way to eliminate this back-and-forth serialization/deserialization?

- Alexey.

(P.S. Sorry for not keeping the email thread for now...)

Julian Hyde wrote wrote:
> By "write a rule" I mean write a class that extends RelOptRule. An 
> example is CassandraRules.CassandraFilterRule.
> ProjectableFilterableTable was "only" designed for the case that 
> occurs 80% of the time but requires 20% of the functionality. Rules 
> run in a richer environment so have more power and flexibility.


Re: ProjectableFilterableTable.scan() and expensive columns

2017-10-26 Thread Luis Fernando Kauer
 I'm sorry, but I have no idea what you are talking about.
Cassandra Adapter has code to translate the plan to run the query in Cassandra 
Server.
If you are only interested in querying CSV files I don't see how copying that 
code without understanding it will help you.
First of all, you need to decide whether you will use 
ProjectableFilterableTable or TranslatableTable.
You must try to understand how the rules work and how to check which rules are 
being fired and which ones are being chosen.
Did you follow the tutorial for creating CSV Adapter? It creates a rule to push 
the used projects to the table scan. That is a great start.
It's a good idea to take a look at the built in rules available in Calcite too.
You should take a look into FilterTableScanRule and ProjectTableScanRule, which 
are the rules that push the projects and filters used with 
ProjectableFilterableTable into a BindableTableScan, and the other rules in 
Bindables.java.
The rules work fine when there is no aggregate function, pushing both filter 
and projects into BindableTableScan.  The problem seems to be with 
AggregateProjectMergeRule which removes the Project from the plan.
If you remove the filter from your test cases you'll see that the projects are 
pushed to the BindableTableScan.
I was able to simulate your problem using 
ScannableTableTest.testProjectableFilterable2WithProject changing the query 
into "select \"k\", count(*) from (select \"k\",\"j\" from \"s\".\"beatles\" 
where \"i\" = 4) x group by \"k\"".
The plan:
LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
  LogicalProject(k=[$1])
    LogicalFilter(condition=[=($0, 4)])
      LogicalProject(i=[$0], k=[$2])
        LogicalTableScan(table=[[s, beatles]])

PhysicalPlan:
EnumerableAggregate(group=[{2}], EXPR$1=[COUNT()]): rowcount = 10.0, cumulative 
cost = {61.25 rows, 50.0 cpu, 0.0 io}, id = 112
  EnumerableInterpreter: rowcount = 100.0, cumulative cost = {50.0 rows, 50.0 
cpu, 0.0 io}, id = 110
    BindableTableScan(table=[[s, beatles]], filters=[[=($0, 4)]]): rowcount = 
100.0, cumulative cost = {1.0 rows, 1.01 cpu, 0.0 io}, id = 62


If I disable AggregateProjectMergeRule, the physical plan is:
EnumerableAggregate(group=[{0}], EXPR$1=[COUNT()]): rowcount = 10.0, cumulative 
cost = {61.25 rows, 50.0 cpu, 0.0 io}, id = 102
  EnumerableInterpreter: rowcount = 100.0, cumulative cost = {50.0 rows, 50.0 
cpu, 0.0 io}, id = 100
    BindableTableScan(table=[[s, beatles]], filters=[[=($0, 4)]], 
projects=[[2]]): rowcount = 100.0, cumulative cost = {1.0 rows, 1.01 cpu, 0.0 
io}, id = 78


Regards,

Luis Fernando



On Thursday, October 26, 2017 13:19:46 BRST, Alexey Roytman wrote:
 
 Thanks for the hints.

I've tried to use [i.e. copy-pasted a lot of] Cassandra*.java for my 
CSV-files example. It's really too wordy! So much code I need to 
understand later!..

But what bothers me most for now is the fact that I just cannot pass 
List to [my modification of] CassandraTable.query(); I need to 
convert it to some string form within List using 
CassandraFilter.Translator, and then, when passed to [my modification 
of] CassandraTable.query(), I need to parse these List back... 
Is there a way to eliminate this back-and-forth serialization/deserialization?

- Alexey.

(P.S. Sorry for not keeping the email thread for now...)

Julian Hyde wrote wrote:
> By "write a rule" I mean write a class that extends RelOptRule. An 
> example is CassandraRules.CassandraFilterRule.
> ProjectableFilterableTable was "only" designed for the case that 
> occurs 80% of the time but requires 20% of the functionality. Rules 
> run in a richer environment so have more power and flexibility.



Re: ProjectableFilterableTable.scan() and expensive columns

2017-10-23 Thread Luis Fernando Kauer
 Hi,
Can you give us some examples of the queries you tested? Include the query 
plan Calcite generated.  (Use EXPLAIN PLAN FOR your query.)
Currently, aggregates with no column reference, like count(*), generate a plan 
that scans all projects when using ProjectableFilterableTable. I'm not sure if 
there is a Jira for that already.
The other option is to use TranslatableTable, but be aware that you'll have to 
implement rules to push the necessary projects to the table scan, unlike when 
using ProjectableFilterableTable, which has many rules built into Calcite for that.
See also: [CALCITE-1876] Create a rule to push the projections used in aggregate 
functions - ASF JIRA

The Druid adapter calls Druid to execute the table scan. In the CSV Adapter it is 
executed by Calcite using an Enumerator.
Always check the query plan first and create rules to optimize it the way you 
want.
Enable tracing if you want to check which rules were applied.

https://calcite.apache.org/docs/howto.html#tracing



Re: Re: Assunto: querying elastisearch fields

2017-10-05 Thread Luis Fernando Kauer


  Your problem, again, is about the names being case sensitive. 
Try using double quotes around column and table names to make sure they are 
interpreted correctly or use them all in uppercase, both in the json model and 
in the query. 
You can also set a different lex property for the connection that is more 
flexible, like MYSQL_ANSI, or just set the caseSensitive property to false. 
http://calcite.apache.org/docs/adapter.htm
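For example, when opening the connection (a sketch; the model path is hypothetical):

Properties info = new Properties();
info.setProperty("model", "path/to/elasticsearch-model.json");  // your json model
info.setProperty("lex", "MYSQL_ANSI");
// or: info.setProperty("caseSensitive", "false");
Connection connection = DriverManager.getConnection("jdbc:calcite:", info);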


Assunto: querying elastisearch fields

2017-10-05 Thread Luis Fernando Kauer
Did you take a look at the test cases for some examples of use? 
https://github.com/apache/calcite/blob/master/elasticsearch5/src/test/java/org/apache/calcite/test/Elasticsearch5AdapterIT.java
What have you tried? 

  


Re: Support of query for complex/nested data in calcite

2017-10-05 Thread Luis Fernando Kauer
 I created:
[CALCITE-1999] UNNEST generates an NPE when the parameter is null - ASF JIRA
[CALCITE-2000] UNNEST a collection that has a field with nested data generates 
an Exception - ASF JIRA

The syntax you posted as the equivalent standard query did not work (wrong 
syntax):
  select d.name, e.empid, e.salary from hr.depts d left cross join 
unnest(d.employees) e

What I managed to get working for a left join is:
  select d.name, e.empid, e.salary from hr.depts d left join unnest(d.employees) 
e on true

The syntax below also worked, but it uses an inner join:
  select d.name, e.empid, e.salary from hr.depts d cross join unnest(d.employees) e

Regards,
Luis Fernando
On Wednesday, October 4, 2017 17:54:23 BRT, Julian Hyde 
<jhyde.apa...@gmail.com> wrote:
 
 Regarding nulls and FlatList. Can you please log a bug.

Regarding the Impala query

  select d.name, e.empid, e.salary from hr.depts d left join d.employees e

The equivalent standard query is

  select d.name, e.empid, e.salary from hr.depts d left cross join 
unnest(d.employees) e

I can see how the Impala’s syntactic sugar might be helpful, although frankly 
it’s not much shorter. Feel free to log a JIRA case.

Julian


> On Oct 4, 2017, at 12:07 PM, Luis Fernando Kauer 
> <lfka...@yahoo.com.br.INVALID> wrote:
> 
> 
> 
> Hi Joel.
> 
> Nice to hear someone else is also concerned about this.
> I'm more concerned about nested structures than unnesting simple arrays.
> What I found so far:
> - the nested collection must exist (can't be null) or it generates an NPE.  In the
>   "hr" schema, try creating a department with the employees collection set to null.
>   Unfortunately, this is a very common issue for me.
> - the join type is "inner" by default, but since I have many empty/null nested
>   collections, I often need "left join" instead.
> - a nested structure collection inside a nested structure collection (nested more
>   than one level) always generates errors when trying to unnest the first level.
>   This is because Calcite uses FlatLists internally and a nested collection field
>   usually contains an array or List, which are not Comparable, so when it tries to
>   add the value of the column with nested data into the FlatList it generates an error.
> - unnest reads the value of the column passed as parameter with the nested data for
>   the current record and iterates through it.  So the column must return the whole
>   nested collection with all nested fields loaded.  But it would be nice if somehow
>   it could select only the nested fields that are used in the query.
> - It may not be the standard, but I like Impala's syntax, where you can simply query:
>     select d.name, e.empid, e.salary from hr.depts d left join d.employees e
>   I think it is very simple to use and to understand.
> Best regards,
> 
> Luis Fernando
> 
> 
> On Wednesday, October 4, 2017 11:26:38 BRT, Joel Pfaff 
> <joel.pf...@gmail.com> wrote: 
> 
> 
> 
> 
> 
> Hello,
> 
> Sorry I was not registered in the mailing list when Luis Fernando Kauer
> sent the mail, so I cannot reply directly to his.
> But we share a similar concern with ability to implement more
> precise/efficient queries on complex documents using SQL.
> 
> Currently in our system, the management of complex types requires using
> UNNEST, and there is no way to express a transformation inside a nested
> document without flattening it first. Or this flattening operation can be
> expensive if it has to be followed by a GROUP BY clause to recreate a
> similar document but with some nested data being filtered and/or aggregated
> (min/max/avg/concatenation).
> 
> For similar purposes, Google Big Query proposes UNNEST and ARRAY keywords
> (see  https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays)
> and by allowing implicit joins between simple columns and nested columns,
> it allows computations to stay local to the document.
> 
> Inside Impala, it is possible to create correlated sub-queries that apply
> on a repetition inside a document (see
> https://www.cloudera.com/documentation/enterprise/latest/topics/impala_complex_types.html).
> There is another kind of syntax, different from the one from Big Query, but
> it works.
> 
> In Presto, it is possible to apply lambda function to map/array to
> transform the structure and apply filter on it. So you have filter,
> map_filter function to filter array and map respectively (see
> https://prestodb.io/docs/current/functions/lambda.html#filter). So it is
> yet another syntax.
> 
> From calcite's code, I can see that a good part of the code related to the
> parsing is already here, there is already support of MAP/ARRAY/MULTISET,
> and I am wondering what would be missing/different from Google Big Query
> s

[jira] [Created] (CALCITE-2000) UNNEST a collection that has a field with nested data generates an Exception

2017-10-05 Thread Luis Fernando Kauer (JIRA)
Luis Fernando Kauer created CALCITE-2000:


 Summary: UNNEST a collection that has a field with nested data 
generates an Exception
 Key: CALCITE-2000
 URL: https://issues.apache.org/jira/browse/CALCITE-2000
 Project: Calcite
  Issue Type: Bug
Reporter: Luis Fernando Kauer
Assignee: Julian Hyde


When unnesting a column that has another level of nesting it generates the 
following exception:
{noformat}
Caused by: java.lang.ArrayStoreException
at java.lang.System.arraycopy(Native Method)
at 
org.apache.calcite.runtime.SqlFunctions$ProductComparableListEnumerator.current(SqlFunctions.java:2312)
at 
org.apache.calcite.runtime.SqlFunctions$ProductComparableListEnumerator.current(SqlFunctions.java:1)
at 
org.apache.calcite.linq4j.EnumerableDefaults$17$1.current(EnumerableDefaults.java:1968)
at 
org.apache.calcite.linq4j.EnumerableDefaults$11$1.moveNext(EnumerableDefaults.java:1225)
at Baz$4$1.moveNext(Unknown Source)
at 
org.apache.calcite.linq4j.Linq4j$EnumeratorIterator.(Linq4j.java:680)
at org.apache.calcite.linq4j.Linq4j.enumeratorIterator(Linq4j.java:98)
at 
org.apache.calcite.linq4j.AbstractEnumerable.iterator(AbstractEnumerable.java:33)
at org.apache.calcite.avatica.MetaImpl.createCursor(MetaImpl.java:90)
at 
org.apache.calcite.avatica.AvaticaResultSet.execute(AvaticaResultSet.java:206)
at 
org.apache.calcite.jdbc.CalciteResultSet.execute(CalciteResultSet.java:67)
at 
org.apache.calcite.jdbc.CalciteResultSet.execute(CalciteResultSet.java:1)
at 
org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:630)
at 
org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:607)
at 
org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:638)
at 
org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:149)
... 29 more
{noformat}

The problem is that Calcite uses FlatLists internally that expect items to 
implement Comparable.  
However, a column with a nested collection will have an array or a List as the 
column value, which do not implement Comparable, so it generates the Exception.
To test this in JdbcTest, I created a field in Employee with a collection.
For simplicity, I created a field called dependents in Employee class:

{code:java}
public static class HrSchema {
    @Override public String toString() {
      return "HrSchema";
    }

    public final Dependent[] dependents = {
        new Dependent(10, "Michael"),
        new Dependent(10, "Jane"),
    };
    public final Employee[] emps = {
        new Employee(100, 10, "Bill", 1, 1000, Arrays.asList(dependents)),
        new Employee(200, 20, "Eric", 8000, 500, Collections.emptyList()),
        new Employee(150, 10, "Sebastian", 7000, null, Collections.emptyList()),
        new Employee(110, 10, "Theodore", 11500, 250, Collections.emptyList()),
    };
public final Department[] depts = {
  new Department(10, "Sales", Arrays.asList(emps[0], emps[2]),
  new Location(-122, 38)),
  new Department(30, "Marketing", Collections.emptyList(),
  new Location(0, 52)),
  new Department(40, "HR", Collections.singletonList(emps[1]), null),
};
public final Dependent[] locations = {
  new Dependent(10, "San Francisco"),
  new Dependent(20, "San Diego"),
};

public QueryableTable foo(int count) {
  return Smalls.generateStrings(count);
}

public TranslatableTable view(String s) {
  return Smalls.view(s);
}
  }

  public static class Employee {
public final int empid;
public final int deptno;
public final String name;
public final float salary;
public final Integer commission;
public final List dependents;

public Employee(int empid, int deptno, String name, float salary,
Integer commission) {
  this(empid, deptno, name, salary, commission, 
Collections.emptyList());
}

public Employee(int empid, int deptno, String name, float salary,
Integer commission, List dependents) {
  this.empid = empid;
  this.deptno = deptno;
  this.name = name;
  this.salary = salary;
  this.commission = commission;
  this.dependents = dependents;
}

@Override public String toString() {
  return "Employee [empid: " + empid + ", deptno: " + deptno
  + ", name: " + name + "]";
}

@Override public boolean equals(Object obj) {
  return obj == this
  || obj instanceof Employee
  && empid == ((Employee) obj).empid;
}
  }
{code}

Just running the test case JdbcTest.testUnnestArrayColumn generates the 
exception.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CALCITE-1999) UNNEST generates a NPE when the parameter is null

2017-10-05 Thread Luis Fernando Kauer (JIRA)
Luis Fernando Kauer created CALCITE-1999:


 Summary: UNNEST generates a NPE when the parameter is null
 Key: CALCITE-1999
 URL: https://issues.apache.org/jira/browse/CALCITE-1999
 Project: Calcite
  Issue Type: Bug
Reporter: Luis Fernando Kauer
Assignee: Julian Hyde


It happens when trying to unnest a column with a nested collection when the 
column is null.
To reproduce it, I changed JdbcTest, adding a Department with a null collection 
of employees to HrSchema: 
{code:java}
 new Department(50, "NULL", null, null),
{code}
Just running the test case JdbcTest.testUnnestArrayColumn() results in the NPE.
The NPE also happens when running the following query:
{code:java}
select * from unnest(array(select * from \"hr\".\"depts\" where \"deptno\"=-1)) 
a
{code}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Support of query for complex/nested data in calcite

2017-10-04 Thread Luis Fernando Kauer


Hi Joel.

Nice to hear someone else is also concerned about this.
I'm more concerned about nested structures than unnesting simple arrays. What I 
found so far:
- the nested collection must exist (can't be null) or it generates an NPE.  In the 
  "hr" schema, try creating a department with the employees collection set to null. 
  Unfortunately, this is a very common issue for me.
- the join type is "inner" by default, but since I have many empty/null nested 
  collections, I often need "left join" instead.
- a nested structure collection inside a nested structure collection (nested more 
  than one level) always generates errors when trying to unnest the first level.  
  This is because Calcite uses FlatLists internally and a nested collection field 
  usually contains an array or List, which are not Comparable, so when it tries to 
  add the value of the column with nested data into the FlatList it generates an error.
- unnest reads the value of the column passed as parameter with the nested data for 
  the current record and iterates through it.  So the column must return the whole 
  nested collection with all nested fields loaded.  But it would be nice if somehow 
  it could select only the nested fields that are used in the query.
- It may not be the standard, but I like Impala's syntax, where you can simply query:
    select d.name, e.empid, e.salary from hr.depts d left join d.employees e
  I think it is very simple to use and to understand.
Best regards,

Luis Fernando
 
 
 On Wednesday, October 4, 2017 11:26:38 BRT, Joel Pfaff 
<joel.pf...@gmail.com> wrote: 





Hello,

Sorry I was not registered in the mailing list when Luis Fernando Kauer
sent the mail, so I cannot reply directly to his.
But we share a similar concern with ability to implement more
precise/efficient queries on complex documents using SQL.

Currently in our system, the management of complex types requires using
UNNEST, and there is no way to express a transformation inside a nested
document without flattening it first. Or this flattening operation can be
expensive if it has to be followed by a GROUP BY clause to recreate a
similar document but with some nested data being filtered and/or aggregated
(min/max/avg/concatenation).

For similar purposes, Google Big Query proposes UNNEST and ARRAY keywords
(see  https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays)
and by allowing implicit joins between simple columns and nested columns,
it allows computations to stay local to the document.

Inside Impala, it is possible to create correlated sub-queries that apply
on a repetition inside a document (see
https://www.cloudera.com/documentation/enterprise/latest/topics/impala_complex_types.html).
There is another kind of syntax, different from the one from Big Query, but
it works.

In Presto, it is possible to apply lambda function to map/array to
transform the structure and apply filter on it. So you have filter,
map_filter function to filter array and map respectively (see
https://prestodb.io/docs/current/functions/lambda.html#filter). So it is
yet another syntax.

From Calcite's code, I can see that a good part of the code related to the
parsing is already here, there is already support of MAP/ARRAY/MULTISET,
and I am wondering what would be missing/different from Google Big Query
syntax?

So I started to play, and I managed to run such a query end to end using a
test in JdbcTest:
@Test public void testUnnestArrayColumnWithFilter() {
  CalciteAssert.hr()
      .query("select d.\"deptno\",\n"
          + " array(select x.\"empid\" * 2 from UNNEST(d.\"employees\") AS x WHERE x.\"empid\" > 2) AS doubled_empno"
          + " from \"hr\".\"depts\" as d")
      .returnsUnordered(
          "deptno=10; DOUBLED_EMPNO=[{200}, {300}]",
          "deptno=30; DOUBLED_EMPNO=[]",
          "deptno=40; DOUBLED_EMPNO=[{400}]");
}

SQL Parsing and Validation and Execution appeared to work just fine, and
that looks really nice.

I then used the explainContains() on that query to get access to the plan,
and I got:
PLAN=
EnumerableCalc(expr#0..4=[{inputs}], deptno=[$t0], DOUBLED_EMPNO=[$t4])
  EnumerableCorrelate(correlation=[$cor0], joinType=[inner],
requiredColumns=[{2}])
    EnumerableTableScan(table=[[hr, depts]])
    EnumerableCollect(field=[EXPR$0])
      EnumerableCalc(expr#0..4=[{inputs}], expr#5=[2], expr#6=[*($t0,
$t5)], expr#7=[>($t0, $t5)], EXPR$0=[$t6], $condition=[$t7])
        EnumerableUncollect
          EnumerableCalc(expr#0=[{inputs}], expr#1=[$cor0],
expr#2=[$t1.employees], employees=[$t2])
            EnumerableValues(tuples=[[{ 0 }]])

I wanted to check the plan generated by calcite in these queries, to
validate that the aggregations stay local to the document, and do not
trigger a full flatten/group by, in the case it is not needed.
And I am ho

Query complex/nested data

2017-09-13 Thread Luis Fernando Kauer
Hi,
We have a lot of data in complex formats, similar to a JSON file, with nested 
collections, and I would like to know the best approach to implement an adapter 
to read it efficiently.
Users will probably end up joining the top-level struct with the nested 
collections of structs when they need the nested data, so this is something I 
want to make sure works. But I would like the adapter to be able to select only 
the used columns to be read from the nested collections, because they can be 
very large.
It seems the function UNNEST is the way to make the nested collection available 
as a table in Calcite, but the whole collection is read with all the columns, so 
it is not efficient. Probably I'll need rules to push the projects used from the 
nested collections down to the TableScan.
It would be even better if the query didn't need to use a function like UNNEST, 
but simply allowed a join between the table and the nested collection field.
Is there anything available for this? Any suggestions?
Best regards,
Luis Fernando


Re: Two newbie questions regarding Calcite Cassandra Adapter

2017-09-04 Thread Luis Fernando Kauer
Hi,

I'll try to answer your questions since nobody else did.
But I'm still fairly new to Calcite and I'm still learning it.

I'll answer your questions in reverse order because the answer to the second 
question is important to answer the first question.

2. ConverterRules: how Calcite knows which LogicalProject conversion rule to 
apply
Calcite has the concept of "Convention", which determines the calling 
convention.  Relational expressions start with Convention.NONE and each adapter 
sets its own convention when implementing TranslatableTable.toRel.
CassandraRel implements a Convention for Cassandra.
MongoRel implements a Convention for MongoDB.
CassandraTable.toRel returns a CassandraTableScan with a traitSet of 
CassandraRel.CONVENTION.
If you check the rules, there are 2 types basically:
- Normal rule (extends RelOptRule): rule defined by "operands"
- Conversion rule (extends ConverterRule): defined by a class and a traitSet 
(convention)
The converter rule uses the convention of the input to match the rule.
CassandraProjectRule is a ConverterRule so it matches only when the input has 
CassandraRel.CONVENTION. So it won't match a LogicalProject with 
MongoRel.CONVENTION.
The adapters also have a rule to convert between its convention and the 
EnumerableConvention.INSTANCE (CassandraToEnumerableConverter).
So the convention is the key to know which conversion rule to match and apply.
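A minimal sketch of that pattern is below. MyRel, MyProject and MyRel.CONVENTION are hypothetical placeholders an adapter would define, not Cassandra's actual classes; only the Calcite classes imported here are real.

import org.apache.calcite.plan.Convention;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.convert.ConverterRule;
import org.apache.calcite.rel.logical.LogicalProject;

/** Converter rule sketch: fires only for a LogicalProject that is being
 * converted from Convention.NONE into this adapter's own convention. */
public class MyProjectRule extends ConverterRule {
  public static final MyProjectRule INSTANCE = new MyProjectRule();

  private MyProjectRule() {
    // in-trait NONE, out-trait MyRel.CONVENTION: a LogicalProject whose plan is
    // being converted into another adapter's convention will not match here.
    super(LogicalProject.class, Convention.NONE, MyRel.CONVENTION, "MyProjectRule");
  }

  @Override public RelNode convert(RelNode rel) {
    final LogicalProject project = (LogicalProject) rel;
    final RelTraitSet traitSet = project.getTraitSet().replace(MyRel.CONVENTION);
    // MyProject is the adapter-specific Project implementation (hypothetical).
    return new MyProject(project.getCluster(), traitSet,
        convert(project.getInput(), MyRel.CONVENTION),
        project.getProjects(), project.getRowType());
  }
}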
1. Different plans with different "Limit" clause

Looking into calcite-cassandra adapter I noticed that some of its rules are not 
convertion rules and they try to convert/apply to nodes already converted to 
Enumerable.
Take a look at CassandraLimitRule:
  super(operand(EnumerableLimit.class, 
operand(CassandraToEnumerableConverter.class, any())), "CassandraLimitRule");
The problem with this approach (IMHO) is that it depends heavily on the overall 
costs to have the CassandraLimit matched and chosen, but this kind of node 
should always be pushed.
That's what is happening when the limit is 1 in your query. 
LogicalSort is used for sorting but also for limit (fetch) and offset.
Cassandra adapter has a conversion rule for Sort but not when offset or limit 
is set.
I don't know the details why this approach was chosen, maybe it was because the 
jdbc adapter is a reference and it does not implement limit and offset 
correctly.
But I think CassandraSortRule could use the limit (fetch) and offset 
information and set it in the implement method, with no need for CassandraLimit.
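A rough sketch of what that convert method could look like (MySort and MyRel.CONVENTION are hypothetical placeholders; this mirrors the approach the MongoDB adapter takes rather than Cassandra's current code):

@Override public RelNode convert(RelNode rel) {
  final Sort sort = (Sort) rel;
  final RelTraitSet traitSet =
      sort.getTraitSet().replace(MyRel.CONVENTION).replace(sort.getCollation());
  // Keep offset and fetch instead of refusing to convert, so LIMIT/OFFSET can be
  // pushed down together with the sort, with no separate Limit node needed.
  return new MySort(rel.getCluster(), traitSet,
      convert(sort.getInput(), traitSet.replace(RelCollations.EMPTY)),
      sort.getCollation(), sort.offset, sort.fetch);
}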
Trying to solve this by only changing the costs is hard because the planner 
generates many different plans, and you'll have to make sure the costs will 
always be smaller while keeping the enumerable nodes.

Hope it helps.

Best regards,

Luis Fernando

On Friday, September 1, 2017, at 1:13, Junwei Li 
 wrote:



1. I happen to notice that the value in limit could change the query plan.

For example, if I do explain plan for "select \"tweet_id\" from \"userline\" limit 1", the plan is:

PLAN=EnumerableCalc(expr#0..2=[{inputs}], tweet_id=[$t2])
  EnumerableLimit(fetch=[1])
    CassandraToEnumerableConverter
      CassandraTableScan(table=[[twissandra, userline]])

But if the limit value is no smaller than 8, such as for "select \"tweet_id\" from \"userline\" limit 8", the plan is:

PLAN=CassandraToEnumerableConverter
  CassandraLimit(fetch=[8])
    CassandraProject(tweet_id=[$2])
      CassandraTableScan(table=[[twissandra, userline]])

Since computeSelfCost() depends on the row count, I guess that is why the
query plans are different. But I want to make sure the query plan stays the
same as the second one even when the limit value is smaller than 8. How should I
tweak computeSelfCost()?




2. If I do an inner join query that spans across Cassandra and another
database, such as MongoDB, an example for illustration purposes:

"with\n" +
"tweet as (\n" +
"select \"tweet_id\" as id from \"twissandra\".\"userline\" where \"username\"='!PUBLIC!' order by \"time\")\n," +
"metrics as (\n" +
"select \"id\" as id, \"metricName\", \"value\"\n" +
"from \"mongodb\".METRIC\n" +
"where \"metricName\" = 'cpu.usage.average'\n" +
")\n" +
"select tweet.id, metrics.\"value\" from tweet\n" +
"inner join metrics using(id)\n" +
"where metrics.\"value\" > 70"

According to CassandraProjectRule, super(LogicalProject.class,
"CassandraProjectRule"), it seems like as long as there is a
LogicalProject.class, this CassandraProjectRule will get triggered. Does
this mean a LogicalProject that belongs to MongoDB will also get matched with
CassandraProjectRule? But when I do explain plan, the generated plan does
separate the projections correctly between Cassandra and MongoDB. How can
Calcite tell which projection belongs to which schema? Thanks.




BindableTableScan.computeSelfCost does not use projects or filters info into account

2017-08-28 Thread Luis Fernando Kauer
At first I was using TranslatableTable for my adapters (which read files in 
different formats), but I realized that many rules apply only to 
ProjectableFilterableTable, so I started using it to take advantage of all 
these built-in rules.
Restricting the projects that need to be scanned is very important because it 
can greatly reduce the time to scan the table.

Unfortunately I'm having trouble making Calcite select a plan that pushes the 
used projects to the table scan in simple queries that have a filter.  The plan 
selected has only the filters and no projects.
What seems to be happening is that BindableFilterRule and BindableProjectRule 
are applied generating different plans, some with only the filter, some with 
only the projects and some with both.
The planner has to decide which one to choose, but 
BindableTableScan.computeSelfCost just multiplies the cost by 0.01, without 
taking into account the restriction of projects or the existence of filters.
Since the different plans with and without the projects and filters end up with 
the same cost, the planner chooses the first plan, which in my case has been a 
BindableTableScan with only the filters, generated by the BindableFilterRule.
The query is something like:
select c1, c2 from t where c3=10

Changing BindableTableScan.computeSelfCost to calculate different costs 
according to projects and filters made the planner choose the right plan, but 
it would be nice to allow each adapter/table to define its own costs, since 
projects and filters can have different costs.
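A minimal sketch of the kind of cost adjustment described above (the factors are arbitrary illustrations, not Calcite's actual implementation; projects and filters are assumed to be the scan's pushed-down project ordinals and filter list):

@Override public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery mq) {
  final double rowCount = mq.getRowCount(this);
  final int totalFields = table.getRowType().getFieldCount();
  final int usedFields = projects == null ? totalFields : projects.size();
  // Fewer projected columns => cheaper scan; pushed-down filters get a flat discount,
  // so plans that push projects and filters into the scan win on cost.
  final double projectFactor = (double) usedFields / totalFields;
  final double filterFactor = (filters == null || filters.isEmpty()) ? 1d : 0.5d;
  return planner.getCostFactory()
      .makeCost(rowCount, rowCount * projectFactor * filterFactor, 0);
}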


Re: Rule to remove right part of a left join when no project is used after the join

2017-07-28 Thread Luis Fernando Kauer
> I think you also need a check that the right-hand key is unique. For
> example, it's not valid to rewrite
>
>   select d.name from dept left join emp using (deptno)
>
> to
>
> select d.name from dept
>
> because the second query gives fewer rows.


Yes, you are right. I forgot about these cases where it can increase the number 
of rows.
My main use case for the rule is to simplify a query that uses a view with many 
joins to dimension tables.
So how can I check if the right-hand key is unique?
Jdbc-adapter doesn't seem to load any information about keys.
You mentioned RelReferentialConstraint.
How can I use it?
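For the uniqueness part, a rough sketch of one way to check it with the metadata query inside the rule's onMatch (untested; join is the LogicalJoin from the rule quoted below, and it assumes an equi-join condition; JoinInfo, RelMetadataQuery and ImmutableBitSet are existing Calcite classes):

final RelMetadataQuery mq = call.getMetadataQuery();
final JoinInfo joinInfo = join.analyzeCondition();
// joinInfo.rightKeys are ordinals within the right input, so they can be passed
// to areColumnsUnique directly; a null result means "unknown" and is treated as unsafe.
final RelNode right = join.getInput(1);
if (!Boolean.TRUE.equals(
    mq.areColumnsUnique(right, ImmutableBitSet.of(joinInfo.rightKeys)))) {
  return;  // right side may multiply rows; not safe to remove the join
}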

I'll try to implement your other comments too.

Thank you.

Luis Fernando

On Friday, July 28, 2017, at 14:27, Julian Hyde <jh...@apache.org> 
wrote:



That looks good.

I think you also need a check that the right-hand key is unique. For
example, it's not valid to rewrite

  select d.name from dept left join emp using (deptno)

to

select d.name from dept

because the second query gives fewer rows.

I think there's some library code to ensure that a join only uses
fields from a particular input.

Often people choose to parameterize rule with classes. Your rule, for
instance, could work on any sub-class of Project and Join, not just
LogicalProject and LogicalJoin.

I also strongly recommend that you add sufficient tests to
RelOptRulesTest to prove that the rule handles the main cases.
Including cases, like above, where it would be wrong to fire the rule.

If you intend to contribute this at some point, please file a JIRA
case. You can attach a PR later.

Julian



On Fri, Jul 28, 2017 at 7:53 AM, Luis Fernando Kauer
<lfka...@yahoo.com.br.invalid> wrote:
> I'm still learning Calcite and how to convert and transform the nodes, but I 
> managed to create a rule to do this that seems to be working, but I would 
> like your comments about flaws and improvements.
> It drops the right hand side of a join when the join type is "left" and the 
> project above the join includes only field reference to the input from the 
> left.
>
> public class LeftJoinRemoveRule extends RelOptRule {
>   public static final LeftJoinRemoveRule INSTANCE =
>       new LeftJoinRemoveRule();
>
>   //~ Constructors -----------------------------------------------------------
>
>   /** Creates a LeftJoinRemoveRule. */
>   private LeftJoinRemoveRule() {
>     super(operand(LogicalProject.class,
>         operand(LogicalJoin.class, any())));
>   }
>
>   //~ Methods ----------------------------------------------------------------
>
>   public void onMatch(RelOptRuleCall call) {
>     LogicalProject project = call.rel(0);
>     LogicalJoin join = call.rel(1);
>     RelNode input = join.getInput(0);
>     int fieldCount = input.getRowType().getFieldCount();
>     if (join.getJoinType() != JoinRelType.LEFT) {
>       return;
>     }
>     for (RexNode rex : project.getProjects()) {
>       if (!(rex instanceof RexInputRef)) {
>         return;
>       }
>       RexInputRef ref = (RexInputRef) rex;
>       if (ref.getIndex() >= fieldCount) {
>         return;
>       }
>     }
>     call.transformTo(project.copy(project.getTraitSet(),
>         ImmutableList.of(input)));
>   }
> }
>
> Best regards,
>
> Luis Fernando
>
> On Friday, July 21, 2017, at 22:21, Julian Hyde <jh...@apache.org> 
> wrote:
>
>
>
> As you're probably aware, It's only safe to drop a table if (a) none
> of its fields are used, (b) the join is to a unique key of that table,
> (c) there is a referential constraint to that table.
>
> Without conditions (b) and (c) are necessary, the table could be
> converting each row on the left to zero or more than one rows, and
> therefore removing the join would affect the result.
>
> This kind of schema is common in data warehousing environments, so I
> created the Lattice construct. Tables in a lattice implicitly have
> many-to-one relationships. That makes it safe to transform a query on,
> say (Sales JOIN Customer) to use a materialized view on (Sales JOIN
> Customer) JOIN Product), because adding/removing the Product table
> will not affect the row count.
>
> With Lattices the referential constraints are implicit, but we didn't
> have the explicit referential constraints that made it safe to do this
> transform. Jesus added them (class RelReferentialConstraint) as part
> of https://issues.apache.org/jira/browse/CALCITE-1731.
> Maybe you can leverage them in a rule you write. It would match
> (Project (Join A B)) and convert to (Project A).
>
> I don't recall how to add rules. Some people use Frameworks or Planner.
>
> Julian
>
>
>
>
> On Fri, Jul 21, 2017 at 12:42 PM, Luis Fernando Kauer
> <lfka...@yahoo.com.br.inv

Re: Rule to remove right part of a left join when no project is used after the join

2017-07-28 Thread Luis Fernando Kauer
I'm still learning Calcite and how to convert and transform the nodes, but I 
managed to create a rule to do this that seems to be working, but I would like 
your comments about flaws and improvements.
It drops the right hand side of a join when the join type is "left" and the 
project above the join includes only field reference to the input from the left.

public class LeftJoinRemoveRule extends RelOptRule {
  public static final LeftJoinRemoveRule INSTANCE =
      new LeftJoinRemoveRule();

  //~ Constructors -----------------------------------------------------------

  /** Creates a LeftJoinRemoveRule. */
  private LeftJoinRemoveRule() {
    super(operand(LogicalProject.class,
        operand(LogicalJoin.class, any())));
  }

  //~ Methods ----------------------------------------------------------------

  public void onMatch(RelOptRuleCall call) {
    LogicalProject project = call.rel(0);
    LogicalJoin join = call.rel(1);
    RelNode input = join.getInput(0);
    int fieldCount = input.getRowType().getFieldCount();
    if (join.getJoinType() != JoinRelType.LEFT) {
      return;
    }
    for (RexNode rex : project.getProjects()) {
      if (!(rex instanceof RexInputRef)) {
        return;
      }
      RexInputRef ref = (RexInputRef) rex;
      if (ref.getIndex() >= fieldCount) {
        return;
      }
    }
    call.transformTo(project.copy(project.getTraitSet(),
        ImmutableList.of(input)));
  }
}

Best regards,

Luis Fernando

On Friday, July 21, 2017, at 22:21, Julian Hyde <jh...@apache.org> 
wrote:



As you're probably aware, It's only safe to drop a table if (a) none
of its fields are used, (b) the join is to a unique key of that table,
(c) there is a referential constraint to that table.

Without conditions (b) and (c) are necessary, the table could be
converting each row on the left to zero or more than one rows, and
therefore removing the join would affect the result.

This kind of schema is common in data warehousing environments, so I
created the Lattice construct. Tables in a lattice implicitly have
many-to-one relationships. That makes it safe to transform a query on,
say (Sales JOIN Customer) to use a materialized view on (Sales JOIN
Customer) JOIN Product), because adding/removing the Product table
will not affect the row count.

With Lattices the referential constraints are implicit, but we didn't
have the explicit referential constraints that made it safe to do this
transform. Jesus added them (class RelReferentialConstraint) as part
of https://issues.apache.org/jira/browse/CALCITE-1731.
Maybe you can leverage them in a rule you write. It would match
(Project (Join A B)) and convert to (Project A).

I don't recall how to add rules. Some people use Frameworks or Planner.

Julian




On Fri, Jul 21, 2017 at 12:42 PM, Luis Fernando Kauer
<lfka...@yahoo.com.br.invalid> wrote:
> Hi,
>
> I have a schema with a lot of normalized tables, which makes it hard for 
> users to build the queries.
>
> I would like to create views with all the joins already included so that 
> users could just select the fields without worrying with multiple tables and 
> join conditions.
>
>
> But the optimizer should drop all the tables that are not actually used (when 
> possible), otherwise the performance is not good.
>
> Any suggestion?
>
> How can I register my own rules besides the build in rules when connecting 
> through CalciteConnection?
>
> Best regards,
>
> Luis Fernando


[jira] [Created] (CALCITE-1906) JdbcSortRule has a bug and it is never chosen

2017-07-27 Thread Luis Fernando Kauer (JIRA)
Luis Fernando Kauer created CALCITE-1906:


 Summary: JdbcSortRule has a bug and it is never chosen
 Key: CALCITE-1906
 URL: https://issues.apache.org/jira/browse/CALCITE-1906
 Project: Calcite
  Issue Type: Bug
  Components: jdbc-adapter
Reporter: Luis Fernando Kauer
Assignee: Julian Hyde


JdbcSortRule tries to push sort and limit operations to the database.
Currently offset and limit operations are explicitly not pushed to the database 
(prevented by the rule) but even sort operations end up not being pushed.
Checking how other adapters deal with this, like Mongo and Cassandra adapters, 
I realized that the convert function from JdbcSortRule is different from the 
others.
Jdbc-adapter:
{{
if (sort.offset != null || sort.fetch != null) {
  // Cannot implement "OFFSET n FETCH n" currently.
  return null;
}
final RelTraitSet traitSet = sort.getTraitSet().replace(out);
return new JdbcSort(rel.getCluster(), traitSet,
    convert(sort.getInput(), traitSet), sort.getCollation());
}}
mongodb-adapter:
{{
final RelTraitSet traitSet =
    sort.getTraitSet().replace(out)
        .replace(sort.getCollation());
return new MongoSort(rel.getCluster(), traitSet,
    convert(sort.getInput(), traitSet.replace(RelCollations.EMPTY)),
    sort.getCollation(), sort.offset, sort.fetch);
}}
Fixing JdbcSortRule so that it is just like those others, and removing the code 
that prevented the rule from matching when limit or offset are used, seems to 
solve the problem: JdbcSortRule is now applied and both sort and limit are 
pushed to the database.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Rule to remove right part of a left join when no project is used after the join

2017-07-21 Thread Luis Fernando Kauer
Hi,

I have a schema with a lot of normalized tables, which makes it hard for users 
to build the queries.

I would like to create views with all the joins already included so that users 
could just select the fields without worrying with multiple tables and join 
conditions.


But the optimizer should drop all the tables that are not actually used (when 
possible), otherwise the performance is not good.

Any suggestion?

How can I register my own rules besides the built-in rules when connecting 
through CalciteConnection?
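One route, following the Frameworks/Planner suggestion in Julian's reply above, is sketched below; it assumes building a Planner instead of using the plain connection is acceptable, rootSchema is whatever SchemaPlus you already have, and LeftJoinRemoveRule is the rule from the follow-up message. FrameworkConfig, Frameworks, Programs and Planner live in org.apache.calcite.tools.

FrameworkConfig config = Frameworks.newConfigBuilder()
    .defaultSchema(rootSchema)
    // A program containing just this rule; combine it with Calcite's standard
    // rule set if the default optimizations should still apply.
    .programs(Programs.ofRules(LeftJoinRemoveRule.INSTANCE))
    .build();
Planner planner = Frameworks.getPlanner(config);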

Best regards,

Luis Fernando


Re: Explain Plan for aggregating a single column in CSV Adapter

2017-07-11 Thread Luis Fernando Kauer
Hi,

If I change CsvTranslatableTable so that it implements 
ProjectableFilterableTable instead of TranslatableTable and implement the scan 
method, Calcite's own rules apply and the plan gets right, scanning only the 
used field in the aggregate function.
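For context, the ProjectableFilterableTable contract is just a scan method that receives the pushed-down filters and the projected column ordinals. Below is a minimal, self-contained sketch (not the actual CSV adapter code; the table and its hard-coded rows are made up for illustration):

import java.util.List;
import org.apache.calcite.DataContext;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.schema.ProjectableFilterableTable;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.sql.type.SqlTypeName;

/** Toy table with columns (ID, NAME); only the requested columns are materialized. */
public class ToyTable extends AbstractTable implements ProjectableFilterableTable {
  private final Object[][] rows = {{1, "Fred"}, {2, "Eric"}, {3, "John"}};

  @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
    return typeFactory.builder()
        .add("ID", SqlTypeName.INTEGER)
        .add("NAME", SqlTypeName.VARCHAR)
        .build();
  }

  @Override public Enumerable<Object[]> scan(DataContext root, List<RexNode> filters,
      int[] projects) {
    // Filters left in the list are applied by Calcite on top of the scan; a smarter
    // table would consume (remove) the ones it can evaluate itself.
    final int[] fields = projects != null ? projects : new int[] {0, 1};
    final Object[][] projected = new Object[rows.length][];
    for (int i = 0; i < rows.length; i++) {
      projected[i] = new Object[fields.length];
      for (int j = 0; j < fields.length; j++) {
        projected[i][j] = rows[i][fields[j]];
      }
    }
    return Linq4j.asEnumerable(projected);
  }
}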

However, now I realized that 

"select count(*) from EMPS" generates the plan:
EnumerableAggregate(group=[{}], EXPR$0=[COUNT()]) 
CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

"select * from EMPS" generates the plan:
CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

Notice that the count(*) query generates a plan that scans all fields, requiring 
all of them to be converted unnecessarily.
Even when using ProjectableFilterableTable the plan scans all fields, yet the 
plan for "select count(name) from EMPS" scans just one field.
What could be the best approach to handle count(*) without having to scan 
all fields?

Best regards,

Luis Fernando





On Thursday, July 6, 2017, at 18:05, Julian Hyde <jh...@apache.org> 
wrote:



Calcite should realize that Aggregate has an implied Project (because it only 
uses a few columns) and push that projection into the CsvTableScan, but it 
doesn’t.

I think we need a new rule for Aggregate on a TableScan of a 
ProjectableFilterableTable. Can you create a JIRA case please?

I created a test case. It currently fails:

diff --git a/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java 
b/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
index 00c59ee..2402872 100644
--- a/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
+++ b/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
@@ -241,6 +241,13 @@ public Void apply(ResultSet resultSet) {
 .ok();
   }

+  @Test public void testAggregateImpliesProject() throws SQLException {
+    final String sql = "select max(name) from EMPS";
+    final String plan = "PLAN=EnumerableAggregate(group=[{}], EXPR$0=[MAX($0)])\n"
+        + "  CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])\n";
+    sql("smart", "explain plan for " + sql).returns(plan).ok();
+  }
+
   @Test public void testFilterableSelect() throws SQLException {
 sql("filterable-model", "select name from EMPS").ok();
   }


Julian


> On Jul 6, 2017, at 1:23 PM, Luis Fernando Kauer 
> <lfka...@yahoo.com.br.INVALID> wrote:
> 
> Hi,
> I'm trying to understand the CSV Adapter and how the rules are fired. The 
> CsvProjectTableScanRule gets fired when I use CsvTranslatableTable. But I 
> don't understand why I'm getting a plan that scans all fields when I use an 
> aggregate function. For example:
>
> explain plan for select name from emps;
> CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])
>
> explain plan for select max(name) from emps;
> EnumerableAggregate(group=[{}], EXPR$0=[MAX($1)])
>   CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>
> I noticed that the rule gets fired and at that point it shows just 1 field 
> being used. But the last time CsvTableScan.deriveRowType() gets called it has 
> all the fields set, and it's not the instance created by the rule, but the 
> first instance created with all the fields.
> Can anybody explain whether this is a bug or whether this is supposed to 
> happen with aggregate functions?
> Best regards,
> Luis Fernando Kauer