Re: JavaScript Avatica client

2019-09-12 Thread Gabriel Reid
Hi,

Personally, I'd be happy to contribute it -- it's currently MIT licensed
(as a matter of convention of my employer), but I'll double-check with my
employer about whether there are any qualms about just contributing it to
the project directly.

For now, it's certainly fine with me if you link to it from the Calcite
site.

- Gabriel


On Tue, Sep 10, 2019 at 7:24 PM Julian Hyde  wrote:

> I have to say… if you wish to contribute this, we would definitely
> consider it. It would not be the first client in a non-JVM language:
> Francis wrote the Go driver and now it is part of Calcite/Avatica.
>
> Julian
>
>
> > On Sep 10, 2019, at 3:03 AM, Francis Chuang 
> wrote:
> >
> > Hi Gabriel,
> >
> > It's always very exciting to see new clients/drivers for Avatica that
> are contributed by the community.
> >
> > We maintain a list of clients for Avatica on this page:
> https://calcite.apache.org/avatica/docs/
> >
> > Would it be okay to add your Javascript client to that page?
> >
> > Francis
> >
> > On 10/09/2019 5:45 pm, Gabriel Reid wrote:
> >> Hi,
> >> I just wanted to let anyone who's interested know that I've recently
> been
> >> working on a JavaScript (i.e. Node.js) client for Avatica. The current
> >> status is that it does what I need it to do (i.e. it can execute a query
> >> and return the results), and it could potentially be useful to others
> (as
> >> well as being built on further), although I'm sure that it doesn't cover
> >> all use cases (e.g. currently only a subset of data types are handled).
> >> The GitHub repo is here: https://github.com/waylayio/avatica-js
> >> The npm entry is here: https://www.npmjs.com/package/avaticajs
> >> - Gabriel
>
>


Re: Issues in exposing data via TableFunction vs TableMacro

2019-09-12 Thread Gabriel Reid
Hi Julian,

On Tue, Sep 10, 2019 at 5:19 PM Julian Feinauer <
j.feina...@pragmaticminds.de> wrote:

>
> when going through the code I just had another idea.
> Currently a TableFunction is executed as an EnumerableTableFunctionScan,
> which is generated from a LogicalTableFunctionScan by the rule
> EnumerableTableFunctionScanRule.
> What if you just removed that rule and added a custom rule of yours which
> translates it to a TableScan of your choosing?
>
>

That is indeed something I had thought about.

In reference to your earlier question, yes, there are parameters needed for
the TableFunction, so I think that the DrillTable approach wouldn't work in
my case.

I didn't see any easy way to alter rules outside of RelNode.register, but I
assume that that is indeed possible, so I'll look further into this
approach. Thanks for the advice.

- Gabriel





> Julian
>
> Am 10.09.19, 08:13 schrieb "Julian Feinauer" <
> j.feina...@pragmaticminds.de>:
>
> Hi Gabriel,
>
> that's an interesting question for me too.
> Do you need parameters for those "dynamic tables"?
> If not, you could do something similar to what Drill does: just
> implement a Schema which always returns "true" if someone asks for a Table,
> and then return a Table implementation that you provide, where you can hook
> in later and add the functionality that you in fact need. This can then
> also be used in optimization, as you can then control your custom Table type.
> Perhaps it helps to look at the DrillTable class in [1].
>
> On a side note, I'm trying to figure out what would be necessary to make
> TableFunction also work with TranslatableTable.
> Would you mind opening an issue in Jira for that?
>
> Julian
>
> [1]
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillTable.java
>
> Am 10.09.19, 03:25 schrieb "Gabriel Reid" :
>
> Hi,
>
> I'm currently using a combination of TableFunctions and
> TableMacros to
> expose various dynamic (relatively unstructured) data sources via
> Calcite.
> The underlying data sources are such that data can only be
> retrieved by
> first specifying what you want (i.e. there is no catalog of all
> data that
> is available).
>
> I'm currently handling this by using a combination of
> TableFunctions and
> TableMacros.
>
> The issue that I'm running into comes when I want to implement
> custom
> planner rules for the underlying functionality. As far as I can
> see, it's
> not possible to register planner rules based on a
> TableFunctionImpl,
> because a TableFunctionImpl only exposes a ScannableTable, so
> there's no
> chance to hook into RelOptNode.register.
>
> On the other hand, implementing a TableMacro does allow returning a
> TranslatableTable, which in turn allows intercepting the call to
> RelOptNode.register to register rules. However, TableMacros
> require that
> all parameters are literals, and I'm providing timestamps, via
> TIMESTAMPADD() and CURRENT_TIMESTAMP() calls, which then doesn't
> work for
> TableMacros (all parameters to a table macro need to be literals,
> otherwise
> query validation fails in Calcite).
>
> I'm wondering if I'm missing some built-in functionality which
> would make
> it possible to have a dynamic table function/macro that can also be
> manipulated via custom planner rules.
>
> Options (which may or may not exist) that I can think of are:
> * something that would/could visit all macro parameters ahead of
> time and
> resolve things like CURRENT_TIMESTAMP() to a literal, before
> further query
> validation occurs
> * register rules somewhere outside of RelOptNode.register (e.g.
> when the
> schema is first created)
>
> Are there any currently-working options in Calcite that can help
> me do what
> I'm trying to do? And if there aren't and I would add such a thing
> to
> Calcite, are there any suggestions as to what the most appropriate
> approach
> would be (either one of the two options I listed above, or
> something else)?
>
> Thanks,
>
> Gabriel
>
>
>
>
>


Issues in exposing data via TableFunction vs TableMacro

2019-09-10 Thread Gabriel Reid
Hi,

I'm currently using a combination of TableFunctions and TableMacros to
expose various dynamic (relatively unstructured) data sources via Calcite.
The underlying data sources are such that data can only be retrieved by
first specifying what you want (i.e. there is no catalog of all data that
is available).

I'm currently handling this by using a combination of TableFunctions and
TableMacros.

The issue that I'm running into comes when I want to implement custom
planner rules for the underlying functionality. As far as I can see, it's
not possible to register planner rules based on a TableFunctionImpl,
because a TableFunctionImpl only exposes a ScannableTable, so there's no
chance to hook into RelOptNode.register.

On the other hand, implementing a TableMacro does allow returning a
TranslatableTable, which in turn allows intercepting the call to
RelOptNode.register to register rules. However, TableMacros require that
all parameters are literals, and I'm providing timestamps, via
TIMESTAMPADD() and CURRENT_TIMESTAMP() calls, which then doesn't work for
TableMacros (all parameters to a table macro need to be literals, otherwise
query validation fails in Calcite).

I'm wondering if I'm missing some built-in functionality which would make
it possible to have a dynamic table function/macro that can also be
manipulated via custom planner rules.

Options (which may or may not exist) that I can think of are:
* something that would/could visit all macro parameters ahead of time and
resolve things like CURRENT_TIMESTAMP() to a literal, before further query
validation occurs
* register rules somewhere outside of RelOptNode.register (e.g. when the
schema is first created)

Are there any currently-working options in Calcite that can help me do what
I'm trying to do? And if there aren't and I would add such a thing to
Calcite, are there any suggestions as to what the most appropriate approach
would be (either one of the two options I listed above, or something else)?
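The first option above can be sketched without any Calcite dependencies. The following is a hypothetical, self-contained model of the idea only: the `Expr`/`Call`/`Literal` types stand in for Calcite's SqlNode tree and are not the real API. The rewrite walks a macro's argument expressions before validation and folds a non-deterministic call such as CURRENT_TIMESTAMP() into a literal, so the macro sees only literal arguments.

```java
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Illustrative sketch (not Calcite code): pre-validation folding of
 * CURRENT_TIMESTAMP() into a literal in a table macro's argument tree.
 */
public class MacroArgFolding {
  interface Expr {}

  static final class Literal implements Expr {
    final Object value;
    Literal(Object value) { this.value = value; }
  }

  static final class Call implements Expr {
    final String operator;
    final List<Expr> operands;
    Call(String operator, List<Expr> operands) {
      this.operator = operator;
      this.operands = operands;
    }
  }

  /** Recursively replaces foldable calls with literals. */
  static Expr fold(Expr e, Instant now) {
    if (e instanceof Call) {
      Call c = (Call) e;
      if (c.operator.equals("CURRENT_TIMESTAMP") && c.operands.isEmpty()) {
        // Resolve to a fixed literal for this query, as option 1 suggests.
        return new Literal(now);
      }
      List<Expr> folded = c.operands.stream()
          .map(op -> fold(op, now))
          .collect(Collectors.toList());
      return new Call(c.operator, folded);
    }
    return e; // literals pass through unchanged
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2019-09-10T00:00:00Z");
    // Models TIMESTAMPADD(HOUR, -1, CURRENT_TIMESTAMP())
    Expr arg = new Call("TIMESTAMPADD",
        List.of(new Literal("HOUR"), new Literal(-1),
            new Call("CURRENT_TIMESTAMP", List.of())));
    Call folded = (Call) fold(arg, now);
    System.out.println(((Literal) folded.operands.get(2)).value);
    // prints 2019-09-10T00:00:00Z
  }
}
```

A real implementation would also need to fold the enclosing TIMESTAMPADD call once all of its operands are literals; the sketch stops at the innermost rewrite for brevity.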

Thanks,

Gabriel


JavaScript Avatica client

2019-09-10 Thread Gabriel Reid
Hi,

I just wanted to let anyone who's interested know that I've recently been
working on a JavaScript (i.e. Node.js) client for Avatica. The current
status is that it does what I need it to do (i.e. it can execute a query
and return the results), and it could potentially be useful to others (as
well as being built on further), although I'm sure that it doesn't cover
all use cases (e.g. currently only a subset of data types are handled).

The GitHub repo is here: https://github.com/waylayio/avatica-js
The npm entry is here: https://www.npmjs.com/package/avaticajs

- Gabriel


[jira] [Created] (CALCITE-3333) Add time-based ResultSet frame size limiting

2019-09-09 Thread Gabriel Reid (Jira)
Gabriel Reid created CALCITE-3333:
-

 Summary: Add time-based ResultSet frame size limiting
 Key: CALCITE-3333
 URL: https://issues.apache.org/jira/browse/CALCITE-3333
 Project: Calcite
  Issue Type: New Feature
  Components: avatica
Reporter: Gabriel Reid


The size of a single JDBC ResultSet frame returned in a single 
{{prepareAndExecute}} or {{fetch}} invocation is currently 100 rows, meaning 
that each retrieval of a portion of a ResultSet will send 100 rows over the 
wire. This frame size may be too big in some situations, and too small in 
others.

If the underlying data source being queried can provide thousands of (small) 
records per second, then only reading them at 100 per RPC call will be 
unnecessarily slow.

On the other hand, if the underlying data source is only providing records at a 
rate of 1 per second, then it will take 100 seconds for each RPC call to 
return, which can lead to timeouts (particularly if the Avatica server is 
sitting behind a proxy that has a strict request timeout).

The main factors to take into account when choosing an ideal frame size to 
return for each RPC call are:
* make the frames small enough that they don't overload either Avatica server 
or the client with overly large amounts of data at one time
* make the frames large enough so that the percentage of total query time that 
is spent only on RPC overhead is minimized

The general idea of this ticket is to add pluggable "frame size limiting" 
functionality, so that frame size limiting can be done based on the number of 
rows, the number of bytes, the amount of time spent building a frame, or any 
other property or combination of properties.
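The pluggable limiting described above could look roughly like the following. This is a self-contained, illustrative sketch only: the `FrameLimiter` interface and all names here are assumptions, not part of the Avatica API. The limiter is consulted after each row is appended, and several limiters can be combined so that a frame is sent as soon as any one of them triggers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/**
 * Illustrative sketch (not Avatica code) of pluggable frame-size limiting:
 * a frame is cut off by row count, elapsed build time, or any combination.
 */
public class FrameLimiting {

  /** Decides when a frame under construction is "full". */
  interface FrameLimiter {
    /** Called once per appended row; true means the frame should be sent. */
    boolean isFull(int rowCount, long elapsedMillis);
  }

  static FrameLimiter rowLimit(int maxRows) {
    return (rows, elapsed) -> rows >= maxRows;
  }

  static FrameLimiter timeLimit(long maxMillis) {
    return (rows, elapsed) -> elapsed >= maxMillis;
  }

  /** A frame is full when any of the given limiters says so. */
  static FrameLimiter anyOf(FrameLimiter... limiters) {
    return (rows, elapsed) -> {
      for (FrameLimiter l : limiters) {
        if (l.isFull(rows, elapsed)) {
          return true;
        }
      }
      return false;
    };
  }

  /** Builds one frame from a row source under the given limiter. */
  static <T> List<T> buildFrame(Supplier<T> rowSource, FrameLimiter limiter,
      Supplier<Long> clockMillis) {
    long start = clockMillis.get();
    List<T> frame = new ArrayList<>();
    while (true) {
      frame.add(rowSource.get());
      if (limiter.isFull(frame.size(), clockMillis.get() - start)) {
        return frame;
      }
    }
  }

  public static void main(String[] args) {
    // Simulated clock advancing 20ms per row: a 100ms time limit cuts the
    // frame off after 5 rows, long before the 100-row limit is reached.
    long[] clock = {0};
    List<Integer> frame = buildFrame(
        () -> { clock[0] += 20; return 1; },
        anyOf(rowLimit(100), timeLimit(100)),
        () -> clock[0]);
    System.out.println(frame.size()); // prints 5
  }
}
```

This matches the slow-source scenario above: a source producing roughly one row per 20ms would otherwise hold the RPC open until 100 rows had accumulated.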

Note that CALCITE-2322 contains some work to allow configuring the size of a 
single frame on a Connection or Statement (via the {{setFetchSize}} method), 
although it's not yet merged in. That ticket would also be useful, and does not 
conflict with the general intent of this ticket.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: Optimization of join between a small table and a large table

2019-08-08 Thread Gabriel Reid
Hi Stamatis,

Thank you so much, this is exactly the kind of info I was looking for! I
think I can figure out how to go forward based on this, thanks again.

- Gabriel

On Thu, Aug 8, 2019 at 8:19 AM Stamatis Zampetakis 
wrote:

> Hi Gabriel,
>
> What you want indeed seems to be a nested loop join; there has been a
> relevant discussion in the dev list [1].
> You may also find relevant the ongoing discussion on CALCITE-2979 [2].
>
> Best,
> Stamatis
>
> [1]
>
> https://lists.apache.org/thread.html/d9f95683e66009872a53e7e617295158b98746b550d2bf68230b3096@%3Cdev.calcite.apache.org%3E
>
> [2] https://issues.apache.org/jira/browse/CALCITE-2979
>
> On Wed, Aug 7, 2019 at 2:56 PM Gabriel Reid 
> wrote:
>
> > Hi,
> >
> > I'm currently working on a custom Calcite adapter, and I've got a
> situation
> > where I want to join a small table with a large table, basically just
> using
> > the small table to filter the large table. Conceptually, I'm talking
> about
> > something that is equivalent to the following query:
> >
> > select emp.name
> > from emp join dept on emp.deptid = dept.id
> > where dept.name = 'Sales'
> >
> > I've got converter rules which push down filtering on all tables,
> which
> > make a big difference in performance. The above query currently results in
> an
> > EnumerableHashJoin over a filtered scan over the 'dept' table, and an
> > unfiltered scan over the 'emp' table.
> >
> > What I would like to accomplish is that this is converted into (I
> think)
> > a nested loop join between 'dept' and 'emp', so that the filtered scan is
> > done once over 'dept', and then a filtered scan is done for each entry of
> > the 'dept' table using the 'id' value from that entry as a filter value
> on
> > the scan on the 'emp' table.
> >
> > I've been able to get a nested loop join to be used, but I haven't
> managed
> > to have the 'id' values from the 'dept' table to be used to filter the
> > 'emp' table. Instead, the full 'emp' table is scanned for each iteration
> of
> > the 'dept' table.
> >
> > And now my questions: are my hopes/expectations here realistic, and/or is
> > there a much better/easier way of accomplishing the same thing? Is there
> > prior art somewhere within the Calcite code base or elsewhere? Or should
> > this just be working by default?
> >
> > I would assume that this isn't that unusual of a situation, which is why
> I
> > was expecting that there would already be something like this somewhere
> (or
> > I'm doing the wrong thing), but I haven't managed to find any clear
> > pointers in any one direction yet.
> >
> > Thanks in advance for any advice!
> >
> > - Gabriel
> >
>


Optimization of join between a small table and a large table

2019-08-07 Thread Gabriel Reid
Hi,

I'm currently working on a custom Calcite adapter, and I've got a situation
where I want to join a small table with a large table, basically just using
the small table to filter the large table. Conceptually, I'm talking about
something that is equivalent to the following query:

select emp.name
from emp join dept on emp.deptid = dept.id
where dept.name = 'Sales'

I've got converter rules which push down filtering on all tables, which
make a big difference in performance. The above query currently results in an
EnumerableHashJoin over a filtered scan over the 'dept' table, and an
unfiltered scan over the 'emp' table.

What I would like to accomplish is that this is converted into (I think)
a nested loop join between 'dept' and 'emp', so that the filtered scan is
done once over 'dept', and then a filtered scan is done for each entry of
the 'dept' table using the 'id' value from that entry as a filter value on
the scan on the 'emp' table.
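The desired plan can be illustrated with a small self-contained sketch. This is not Calcite code and none of the names below are real Calcite APIs; it just models the shape of the execution being asked for: one filtered scan over the small side, then one filtered scan of the large side per matching outer row, with the join key pushed down as the filter value.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch (not Calcite code) of a nested loop join where the
 * outer row's join key is pushed down into each inner scan.
 */
public class NestedLoopPushdown {
  record Dept(int id, String name) {}
  record Emp(String name, int deptId) {}

  // Stand-in for a source that can only be scanned with a filter pushed down.
  static List<Emp> scanEmpByDeptId(List<Emp> emps, int deptId) {
    return emps.stream().filter(e -> e.deptId() == deptId).toList();
  }

  static List<String> salesEmployees(List<Dept> depts, List<Emp> emps) {
    List<String> result = new ArrayList<>();
    // Outer side: filtered scan over the small 'dept' table, done once.
    for (Dept d : depts) {
      if (!d.name().equals("Sales")) {
        continue;
      }
      // Inner side: one filtered scan per matching dept row, using its id
      // as the pushed-down filter value on 'emp' (no full 'emp' scan).
      for (Emp e : scanEmpByDeptId(emps, d.id())) {
        result.add(e.name());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Dept> depts = List.of(new Dept(1, "Sales"), new Dept(2, "HR"));
    List<Emp> emps = List.of(new Emp("Alice", 1), new Emp("Bob", 2),
        new Emp("Carol", 1));
    System.out.println(salesEmployees(depts, emps)); // prints [Alice, Carol]
  }
}
```

The planner-rule question is then how to get Calcite to produce this shape, i.e. to route the outer row's 'id' into the inner scan's pushed-down filter rather than scanning 'emp' unfiltered on every iteration.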

I've been able to get a nested loop join to be used, but I haven't managed
to have the 'id' values from the 'dept' table to be used to filter the
'emp' table. Instead, the full 'emp' table is scanned for each iteration of
the 'dept' table.

And now my questions: are my hopes/expectations here realistic, and/or is
there a much better/easier way of accomplishing the same thing? Is there
prior art somewhere within the Calcite code base or elsewhere? Or should
this just be working by default?

I would assume that this isn't that unusual of a situation, which is why I
was expecting that there would already be something like this somewhere (or
I'm doing the wrong thing), but I haven't managed to find any clear
pointers in any one direction yet.

Thanks in advance for any advice!

- Gabriel